Lab 03 - Take a sad plot and make it better

Image by Fauxels from Pexels

The American Association of University Professors (AAUP) is a non-profit membership association of faculty and other academic professionals. This report compiled by the AAUP shows trends in instructional staff employees between 1975 and 2011. The report begins with the following data visualisation.

In this lab you will discuss in your groups what makes a good data visualisation and create a better visualisation for the above data.

Learning goals


Set-up

Please find the team members you were grouped with in last week’s workshop.

In today’s lab you will continue to work collaboratively from the same repository. An important learning objective for today is to be aware of merge conflicts and to learn how to resolve them when pushing changes to your team’s shared repository.

As with last week, it is very important that you follow the instructions carefully and that only specified team members do the activity at stated times.

Form pairs, give each member of your team a number (it can be different from last week) and look out for the following emoji sequences, which indicate who should be completing the activity:

If it’s your turn to be the navigator 🧭, then guide your partner in doing their task, and keep a watchful eye on these instructions.

If it is not your turn, then advise your team member in doing their task. Do not make any changes to your work and do not make any pushes to or pulls from GitHub. Keep your hands off the keyboard!


Creating a collaborative repository

Let’s first set up GitHub.

🚘🧭😐😐😐😐 (Member 1 only - pair 1)

You are the maintainer of the GitHub repository for today’s lab worksheet. This means that you will need to clone today’s lab template and add your team members as collaborators so that they can contribute.

First, log in to GitHub and create a new repository by cloning today’s lab template project. To remind you of the steps:

Next, to add your team members as collaborators:

(😐🚘🚘🚘🚘🚘 - You should receive a collaboration invitation via email; accept it.)


Version control R project

🚘🚘🚘🚘🚘🚘 (For all)

Once everyone has been added to the collaborative repository, open RStudio and create a new version control project using the GitHub repository you have just made. To remind you of the steps:

PAUSE: Ensure that all team members have successfully created an R project and have pulled the current content from GitHub. Everyone, hands off the computer unless it is your turn!


Information about merges and merge conflicts

Please read this section before proceeding

What happens when you push your committed changes from your computer to a repository on GitHub?

It may appear that GitHub simply replaces the version it has with the latest version that you have on your computer. This may not seem problematic when working by yourself, but it would be a major issue when working collaboratively. Say that you and your friend are working collaboratively: your friend pushes their work first and then you push your work afterwards. If GitHub simply replaced old code with new code, you would risk losing your friend’s work!

What happens instead is that the existing and new versions of the files are merged.

In more detail, when you push there is an initial check that the version currently on GitHub matches the version your computer saw the last time it communicated with GitHub (either via a pull or a previous push). If they are the same, then GitHub will happily replace its files with the most recent version on your computer when you push your latest committed changes.

However, when working collaboratively, your team member may push their changes which would mean that your personal copy of the repository will be behind the version on GitHub.

In this case, GitHub will stop you from pushing your changes to the shared repository. When this happens, you will need to explicitly “merge” your work with your collaborator’s work before you can push.

If your changes and your collaborator’s changes are in different files or in different parts of the same file, your work will be merged automatically on your next ⬇️ pull from the shared repository. This is what happened last week each time you pulled the latest changes from your team’s repository.

However, if you and your collaborator have made changes to the same part of a file, then the files cannot be merged automatically. This is called a merge conflict, as the merge procedure does not know which change you want to keep and which to overwrite. You will have to decide how to reconcile the differences.

When there is a merge conflict, additional conflict markers will appear in the file to indicate where the conflict is. This will look like:

<<<<<<< HEAD
goldilocks %>% filter(porridge == "Too Hot")
=======
goldilocks %>% filter(porridge == "Just Right")
>>>>>>> some1alpha2numeric3string4

The code <<< HEAD indicates the start of the conflict and >>> identifies the end. The content in the middle is partitioned by === to separate your changes (top) from the latest version on GitHub (bottom).

Your job is to reconcile the changes: edit the file so that it incorporates the best of both versions, and then delete the conflict markers (the <<<, ===, and >>> lines).

Once you have reconciled the changes, you should then stage and commit the results. Only then will you be permitted to push your changes to GitHub.
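Before committing, it can help to check that no conflict markers are left behind. The following is only a minimal sketch: `find_conflict_markers()` is a hypothetical helper written for this illustration (it is not part of the lab template or of any package), and the demonstration file is a temporary file containing the example conflict shown above.

```r
# Hypothetical helper (an assumption, not part of the lab template):
# report the lines of a file that still contain Git conflict markers.
find_conflict_markers <- function(path) {
  lines <- readLines(path, warn = FALSE)
  # A marker is seven <, = or > characters at the start of a line,
  # followed by a space or the end of the line.
  is_marker <- grepl("^(<{7}|={7}|>{7})( |$)", lines)
  data.frame(
    line = which(is_marker),
    text = lines[is_marker],
    stringsAsFactors = FALSE
  )
}

# Demonstration on a temporary file holding the example conflict
demo_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "<<<<<<< HEAD",
  "goldilocks %>% filter(porridge == \"Too Hot\")",
  "=======",
  "goldilocks %>% filter(porridge == \"Just Right\")",
  ">>>>>>> some1alpha2numeric3string4"
), demo_file)

markers <- find_conflict_markers(demo_file)
print(markers)
```

If the returned data frame has zero rows, no markers were found. Note that a line consisting of exactly seven `=` characters used for another purpose would also be flagged, so treat the result as a prompt to look at the file rather than as a guarantee.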

Note

If Git can automatically resolve the merge conflict, you might see the following message:

If this happens, we recommend running the first command shown in the command line (not the R console): git config pull.rebase false. Once you do this, the message should not appear again in the future.


Making a merge conflict - add your name

Whilst you are waiting for your turn, either help each other with their steps or look ahead to the next section on discussing data visualisations.

🚘🧭🚘🧭🚘🧭 (Members 1, 3, 5 - all pairs)

Open lab-03.Rmd. Type your own name and your navigator’s name at the top of the file and 🧶 knit the document. ✅ Commit your changes, but do not push to GitHub!

Everyone, hands off your computer.

🚘🧭😐😐😐😐 (Member 1 only - pair 1)

Hands on your computer.

⬆️ Push your changes to GitHub. This should happen as usual with no issue.

Hands off your computer.

๐Ÿ˜๐Ÿ˜๐Ÿš˜๐Ÿงญ๐Ÿ˜๐Ÿ˜ (Member 3 only - pair 2)

Hands on your computer.

Attempt to ⬆️ push your changes to GitHub. This time you will see a message similar to:

To https://github.com/StuDent/lab-03
 ! [rejected]        HEAD -> main (fetch first)
error: failed to push some refs to 'https://github.com/StuDent/lab-03'

This error message indicates that GitHub has rejected the push: the shared repository now contains member 1’s changes, which member 3 has not yet merged with their own.

To resolve the merge conflict:

  1. ⬇️ Pull the latest version currently on GitHub. You will see the following message:
CONFLICT (content): Merge conflict in lab-03.Rmd
Automatic merge failed; fix conflicts and then commit the result.
  2. Return to lab-03.Rmd. At the top of your file you should now see the following, which indicates where the merge conflict is and what it involves:
---
title: "Take a sad plot and make it better"
<<<<<<< HEAD
author: "User1, User2, Erik, Ozan, User5, User6"
=======
author: "Clara, Tania, User3, User4, User5, User6"
>>>>>>> 40f6ad0a10c26482b377
date: "`r Sys.Date()`"
output: html_document
---
  3. Resolve the conflict so that the names of all members of pairs 1 and 2 are listed as the author. For example:
---
title: "Take a sad plot and make it better"
author: "Clara, Tania, Erik, Ozan, User5, User6"
date: "`r Sys.Date()`"
output: html_document
---
  4. 🧶 Knit your document and verify that the author line in the output is correct.

  5. ✅ Commit the changes with an informative message, for example “resolved merge conflict with authors”.

  6. ⬆️ Push your changes to GitHub. This time there should not be any issue.

Hands off your computer.

๐Ÿ˜๐Ÿ˜๐Ÿ˜๐Ÿ˜๐Ÿš˜๐Ÿงญ (Member 5 only - pair 3)

Hands on your computer.

Follow the same instructions as member 3. Begin by attempting to ⬆️ push your changes, which will result in an error message indicating that your personal version is behind the version on GitHub. Then:

  1. ⬇️ Pull the latest updates from GitHub.
  2. Find the merge conflict.
  3. Resolve the merge conflict.
  4. 🧶 Knit your document.
  5. ✅ Commit your changes with an informative message.
  6. ⬆️ Push your changes to GitHub.

Hands off your computer.

🚘🧭😐😐😐😐 (Member 1 only - pair 1)

Although you have contributed your name to the author line, you have not yet had the chance to experience a merge conflict. It is now your turn!

Hands on your computer.

Edit the author line of your version of lab-03.Rmd so that it only has your name and your group’s team name. 🧶 Knit the document and ✅ commit your changes.

If you now attempt to ⬆️ push, you will be faced with an error message. Follow the above steps to resolve your merge conflict.

Hands off your computer.

🚘🚘🚘🚘🚘🚘 (For all)

Finally, everybody ⬇️ pull the latest changes from the shared repository. You should all now have the same document, containing everyone’s name and your team name. Provided that you all followed the above instructions carefully, there should not be any further merge conflicts.

Tips for collaborating via GitHub


Questioning data visualisations

Look at the following data visualisations. Discuss with your team members what might be problematic about each image. Do any of the visualisations have a problem with any of the 4 respects (people, data, mathematics and computer)?

Image 1

Image 2

Image 3

Image 4


Exercises

In your groups, take it in turns to work collaboratively on the following exercises.

Remember to regularly 🧶 knit, ✅ commit and ⬆️ push your work to your shared repository on GitHub. If you are faced with a merge conflict, then carefully follow the above instructions to reconcile the conflict before pushing your changes. If you come across an issue that you are unsure how to resolve, then please ask a tutor for assistance.

Packages & Data

For the following exercises, you will need some of the data wrangling functions from the tidyverse package and the data visualisation functions from the ggplot2 package. Ensure that you have the following two lines of code at the top of lab-03.Rmd to make these functions available to you. (Loading tidyverse already attaches ggplot2, so the second line is harmless but not strictly required.)

library(tidyverse)
library(ggplot2)

Let’s start by loading the data from the AAUP that was used to create the data visualisation shown at the beginning of this worksheet.

staff <- read_csv("data/instructional-staff.csv")

Exercise 1.

While you work on these exercises, aim to create a merge conflict so that members 2, 4 and 6 can also practise resolving one. For example, in pair 1, member 1 can make a minor edit in the answer box (like adding placeholder text), then commit and push this change to the repository. At the same time, member 2 should work on the actual answer for the exercise (without pulling the recent changes). When member 2 tries to push, they’ll encounter an error because their copy is behind the version on GitHub, and pulling will then produce a merge conflict. Follow the instructions provided above to resolve the conflict.

View the data. Discuss as a team the following questions and write down your answer.

Exercise 2.

When creating a data visualisation, it is generally preferable to have the data set in a long format. That is to say, each row should relate to a unique case/observation.

If the data set is in a wide format then we need to reshape its structure by pivoting from wide to long using pivot_longer(). The animation below shows how this function works, as well as its counterpart pivot_wider().

Quick reminder: the function has the following arguments:

pivot_longer(data, cols, names_to = "name")
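As an illustration of these arguments, here is a small sketch on made-up data. The `sales_wide` data frame and its column names are hypothetical and have nothing to do with the AAUP data; the point is only to show how each wide column becomes a pair of name and value entries in the long version.

```r
library(tidyr)
library(tibble)

# Hypothetical toy data (not the AAUP data):
# one row per fruit, one column per month.
sales_wide <- tribble(
  ~fruit,   ~jan, ~feb, ~mar,
  "apple",    10,   12,    9,
  "banana",    7,    8,   11
)

# Pivot every column except `fruit`:
# the old column names go into `month`, the cell values into `units`.
sales_long <- sales_wide %>%
  pivot_longer(
    cols = -fruit,
    names_to = "month",
    values_to = "units"
  )

sales_long
```

Each of the 2 fruits now has one row per month, so the long version has 2 x 3 = 6 rows and three columns (fruit, month, units).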

Fill in the blanks in the following code chunk to pivot the staff data longer and save it as a new data frame called staff_long.

staff_long <- ___ %>%
  ___(
    cols = ____, 
    names_to = "_____",
    values_to = "percent"
    )

Inspect staff_long. How many rows does it have? Does this correspond to your answer from Exercise 1?

Exercise 3.

We will begin by plotting instructional staff employment trends as a dot plot. Copy the following code that creates a dot plot of percentage on the y-axis against year on the x-axis, with the dots coloured based on the faculty_type. Ensure that you understand what each part of the code is doing.

ggplot(data = staff_long,
       mapping = aes(x = year, 
                     y = percent, 
                     colour = faculty_type)) +
  geom_point()

Exercise 4.

Perhaps the trend over time can be better visualised using lines rather than dots. Edit the above code to use the geom_line() command.

What is wrong with the graph? Have a look at the data and the dot plot for clues as to what might be wrong before progressing to the next exercise. (You do not need to say how to fix it here; that is the next question!)

Exercise 5.

In the dot plot from exercise 3, notice that the scaling along the x-axis is not consistent. The physical distance between each pair of neighbouring years is the same, but numerically there are 14 years between the first two cases and 2 years between the last two!

This is because the year variable in staff_long is a "character" variable, not a numerical variable.
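The difference can be seen without any plotting. In the sketch below, the year values are hypothetical, chosen only to match the gaps described above. The positions computed for the character version are an illustration of how a discrete axis treats labels (each distinct label takes the next slot), not a call into ggplot2 itself.

```r
# Hypothetical year values, chosen to match the gaps described above
years_chr <- c("1975", "1989", "2009", "2011")  # stored as text
years_num <- c(1975, 1989, 2009, 2011)          # stored as numbers

class(years_chr)
class(years_num)

# Text labels are treated as categories: each distinct label simply
# takes the next position, so neighbouring labels are equally spaced.
positions_chr <- as.integer(factor(years_chr))
diff(positions_chr)

# Numbers keep their real distances from one another.
diff(years_num)
```

The first `diff()` gives equal steps of 1, whereas the second gives gaps of 14, 20 and 2 years, which is the spacing a time axis should show.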

Complete the following code to edit the variable type of year from character to numerical.

staff_long <- staff_long %>%
  mutate(year = ______(year))

Now create the line plot described in exercise 4 to illustrate how the faculty proportions have changed over time.

Exercise 6.

Improve the line plot from the previous exercise by fixing up its labels (title, axis labels, and legend label) as well as any other components you think could benefit from improvement.
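As a reminder of how labels are attached to a plot, here is a minimal sketch using `labs()` on made-up data. The `toy` data frame, its columns and the label text are all hypothetical; you will need to choose labels that suit the AAUP data yourself.

```r
library(ggplot2)

# Hypothetical toy data (not the AAUP data)
toy <- data.frame(
  year  = c(2000, 2010, 2020, 2000, 2010, 2020),
  share = c(40, 45, 55, 60, 55, 45),
  group = rep(c("A", "B"), each = 3)
)

p <- ggplot(toy, aes(x = year, y = share, colour = group)) +
  geom_line() +
  labs(
    title = "Share by group over time",  # plot title
    x = "Year",                          # x-axis label
    y = "Share (%)",                     # y-axis label
    colour = "Group"                     # legend title for the colour aesthetic
  )

p
```

The legend title is set through the name of the aesthetic it describes (here `colour`), which is why it appears as an argument alongside `x` and `y`.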

Exercise 7.

Suppose the objective of this plot was to show that the proportion of part-time faculty has gone up over time compared to other instructional staff types. What changes would you propose making to this plot to tell this story? Write down your idea(s). The more precise you are, the easier the next step will be. Get creative, and think about how you can modify the dataset to give you new/different variables to work with.

Exercise 8.

Implement at least one of the ideas you came up with in the previous exercise. You should produce an improved data visualisation and accompany it with a brief paragraph describing the choices you made in your improvement, specifically discussing what you didn’t like in the original plot and why, and how you addressed these issues in the visualisation you created.


Finishing off

At the end of the lab, you need to ensure that you have your own personal copy of today’s work. Please follow these instructions carefully:

🚘🚘🚘🚘🚘🚘 (For all)

Everybody, 🧶 knit, ✅ commit and ⬆️ push any remaining changes to your group’s shared repository on GitHub. In doing so, ensure that you resolve any merge conflicts.

Once the version on GitHub contains everybody’s contribution, ⬇️ pull the latest changes so that your personal copy is up to date.

๐Ÿ˜๐Ÿš˜๐Ÿš˜๐Ÿš˜๐Ÿš˜๐Ÿš˜ (All except member 1)

On GitHub, create your own copy of the shared repository. You can do this using the same instructions as at the start when copying today’s template repository, but importing from member 1’s GitHub account rather than the course account.

If you want to continue to work on today’s lab after the workshop, then you will need to create a new version control project with your personal copy of the repository that you have just created.

🚘😐😐😐😐😐 (Member 1 only)

At the end of the workshop, you want to ensure that only you can make further changes to the shared repository. To do this, you will need to remove your team members’ collaboration permissions: