Image by Fauxels from Pexels
The American Association of University Professors (AAUP) is a non-profit membership association of faculty and other academic professionals. This report compiled by the AAUP shows trends in instructional staff employees between 1975 and 2011. The report begins with the following data visualisation.
In this lab you will discuss in your groups what makes a good data visualisation and create a better visualisation for the above data.
Please find the team members that you worked with in last week’s workshop.
In today’s lab you will continue to work collaboratively from the same repository. An important learning objective for today is to be aware of merge conflicts and learn how to resolve them when pushing changes to your team’s shared repository.
As with last week, it is very important that you follow the instructions carefully and that only specified team members do the activity at stated times.
Form pairs, give each member of your team a number (it can be different from last week) and look out for the member labels, such as (Member 1 only - pair 1), that indicate who should be completing the activity.
If it’s your turn to be the navigator 🧭, then guide your partner in doing their task, and keep a watchful eye on these instructions.
If it is not your turn, then advise your team member in doing their task. Do not make any changes to your work and do not make any pushes to or pulls from GitHub. Keep your hands off the keyboard!
Let’s first set up GitHub.
(Member 1 only - pair 1)
You are the maintainer of the GitHub repository for today’s lab worksheet. This means that you will need to make a copy of today’s lab template and add your team members as collaborators so that they can add their contributions.
First, log onto GitHub and create a new repository by importing today’s lab template project. To remind you of the steps:
Go to Your repositories in your GitHub account and then click on the green New button.
Click on Import a repository and type/copy the URL of today’s lab template project: https://github.com/uoeIDS/lab-03-template
Add an appropriate name to your repository, say lab-03, and click on Begin import.
Next, to add your team members as collaborators:
(All other members: you should receive a collaboration invitation via email. Accept this.)
(For all)
Once everyone has been added to the collaborative repository, open RStudio and create a new version control project using the GitHub repository you have just made. To remind you of the steps:
Open RStudio and go to File > New Project…
Select Version Control and then Git. Type/paste the URL of the repository you have just created.
Browse to an appropriate location for the project and then click on Create Project.
PAUSE: Ensure that all team members have successfully created an R project and have pulled the current content from GitHub. Everyone, hands off the computer unless it is your turn!
Please read this section before proceeding
What happens when you push your committed changes from your computer to a repository on GitHub?
It may appear that GitHub simply replaces the version it has with the latest version that you have on your computer. This may not appear to be problematic when working by yourself, but there is a major issue when working collaboratively. Say that you and your friend are working collaboratively, your friend pushes their work first and then you push your work afterwards. If GitHub simply replaced old code with new code then you would risk losing your friend’s work!
Instead, GitHub attempts to merge the existing and new files.
More precisely, there is an initial check to verify that the version currently on GitHub matches the version on your computer at the last time you communicated with GitHub (either via a pull or the previous push). If they are the same, then GitHub will happily replace the old files with the most recent version on your computer when you push the latest committed changes.
However, when working collaboratively, your team member may push their changes first, which would mean that your personal copy of the repository is behind the version on GitHub.
In this case, GitHub will stop you from pushing your changes to the shared repository. When this happens, you will need to explicitly “merge” your work and your collaborator’s work before you can push.
If your changes and your collaborator’s changes are in different files or in different parts of the same file, your work will be automatically merged on your next ⬇️ pull from the shared repository. This is what happened last week each time you pulled the latest changes from your team’s repository.
However, if you and your collaborator have made changes to the same part of a file, then it is not possible to automatically merge the files. This is called a merge conflict, as the merge procedure does not know which change you want to keep and which to overwrite. You will have to decide how to reconcile the differences.
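The behaviour described above can be tried out safely on your own computer. The following is a minimal sketch, not part of the lab itself: it assumes Git is available on the command line, uses a local bare repository (the hypothetical shared.git) as a stand-in for GitHub, and uses the made-up names member1, member3, notes.txt and ideas.txt. The two members change different files, so the rejected push is fixed by a pull that merges automatically.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)            # throwaway folder for this illustration
cd "$work"

# A bare repository stands in for the shared GitHub repository (assumption).
git init --quiet --bare shared.git

# Helper (hypothetical): clone the shared repository and set a local identity.
make_clone() {
  git clone --quiet shared.git "$1" 2>/dev/null
  git -C "$1" config user.name "$1"
  git -C "$1" config user.email "$1@example.com"
  git -C "$1" config pull.rebase false   # merge, rather than rebase, on pull
}

# Member 1 creates the first version and pushes it.
make_clone member1
printf 'title: lab\n' > member1/notes.txt
git -C member1 add notes.txt
git -C member1 commit --quiet -m "first version"
git -C member1 push --quiet -u origin HEAD

# Member 3 takes a copy, then member 1 pushes a further change.
make_clone member3
printf 'author: member1\n' >> member1/notes.txt
git -C member1 commit --quiet -am "add member1 as author"
git -C member1 push --quiet

# Member 3, unaware of this, commits a change to a different file.
printf 'plot ideas\n' > member3/ideas.txt
git -C member3 add ideas.txt
git -C member3 commit --quiet -m "add ideas"

# The push is rejected because member 3's copy is behind the shared version.
if git -C member3 push --quiet 2>/dev/null; then
  push_result="accepted"
else
  push_result="rejected"
fi
echo "first push attempt: $push_result"

# Pulling merges the two sets of changes automatically (different files),
# after which the push goes through.
git -C member3 pull --quiet --no-edit
git -C member3 push --quiet
```

After the pull, member 3's folder contains both member 1's edit to notes.txt and their own ideas.txt, without anyone having to edit a file by hand.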
When there is a merge conflict, additional conflict markers will appear in the file to indicate where the conflict is. This will look like:
The code <<< HEAD indicates the start of the conflict and >>> identifies the end. The content in the middle is partitioned by === to separate your changes (top) from the latest version on GitHub (bottom).
Your job is to reconcile the changes: edit the file so that it incorporates the best of both versions, and then delete the conflict markers (the <<<, ===, and >>> lines).
Once you have reconciled the changes, you should then stage and commit the results. Only then will you be permitted to push your changes to GitHub.
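To see the conflict markers for yourself outside of the lab, the sketch below stages a small conflict on purpose. It is an illustration under assumptions rather than the lab's own procedure: a local bare repository (the hypothetical shared.git) plays the role of GitHub, and member1, member3 and lab.txt are made-up names. Both members edit the same line, so the pull cannot merge automatically. In the printed file, Git writes each marker as seven repeated characters, and the closing marker is followed by an identifier for the incoming version.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)            # throwaway folder for this illustration
cd "$work"

# A bare repository stands in for the shared GitHub repository (assumption).
git init --quiet --bare shared.git

# Helper (hypothetical): clone the shared repository and set a local identity.
make_clone() {
  git clone --quiet shared.git "$1" 2>/dev/null
  git -C "$1" config user.name "$1"
  git -C "$1" config user.email "$1@example.com"
  git -C "$1" config pull.rebase false   # merge, rather than rebase, on pull
}

# Member 1 pushes a first version; member 3 then takes a copy.
make_clone member1
printf 'author: TBC\n' > member1/lab.txt
git -C member1 add lab.txt
git -C member1 commit --quiet -m "first version"
git -C member1 push --quiet -u origin HEAD
make_clone member3

# Both members now edit the same line; member 1 pushes first.
printf 'author: member1\n' > member1/lab.txt
git -C member1 commit --quiet -am "member1 as author"
git -C member1 push --quiet
printf 'author: member3\n' > member3/lab.txt
git -C member3 commit --quiet -am "member3 as author"

# Pulling stops with a merge conflict (non-zero exit status).
git -C member3 pull --quiet --no-edit >/dev/null 2>&1 || echo "pull reported a conflict"

# The file now contains conflict markers around the two versions.
cat member3/lab.txt
markers=$(grep -c '^<<<<<<< HEAD' member3/lab.txt)

# Reconcile by hand: keep both names and remove the marker lines,
# then stage, commit and push.
printf 'author: member1, member3\n' > member3/lab.txt
git -C member3 add lab.txt
git -C member3 commit --quiet -m "resolved merge conflict with authors"
git -C member3 push --quiet
```

Here the reconciliation is done by overwriting the file from the script. In the lab you would make the same edit by hand in RStudio before staging and committing.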
If Git can automatically resolve the merge conflict, you might see the following message:
If this happens, we recommend running the first command shown in the command line (not the R console):
git config pull.rebase false
Once you do this, the message should not appear again in the future.
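As a rough sketch of what this command does, the snippet below runs it inside a throwaway repository (created here only for illustration) and then reads the setting back. In the lab you would instead run the single git config line from a terminal opened in your project folder. Run this way, the setting is stored for that repository only.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)            # throwaway repository, for illustration only
git init --quiet "$repo"
cd "$repo"

# Tell Git to merge (rather than rebase) when a pull finds diverging work.
git config pull.rebase false

# Read the setting back to confirm that it was stored.
setting=$(git config --get pull.rebase)
echo "pull.rebase = $setting"   # prints: pull.rebase = false
```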
Whilst you are waiting for your turn, either help each other with their steps or look ahead to the next section on discussing data visualisations.
(Members 1, 3, 5 - all pairs)
Open lab-03.Rmd. Type your own name and your navigator’s name at the top of the file and 🧶 knit the document.
✅ Commit your changes, but do not push to GitHub!
Everyone, hands off your computer.
(Member 1 only - pair 1)
Hands on your computer.
⬆️ Push your changes to GitHub. This should happen as usual with no issue.
Hands off your computer.
(Member 3 only - pair 2)
Hands on your computer.
Attempt to ⬆️ push your changes to GitHub. This time you will see a message similar to:
This error message indicates that GitHub could not accept the push: member 3’s copy is behind the version on GitHub, which now contains the changes made by member 1.
To resolve the merge conflict:
⬇️ Pull the latest changes from GitHub.
Open lab-03.Rmd and you should now see conflict markers at the top of your file that indicate where and what the merge conflict was.
Edit the file to reconcile the changes and delete the conflict markers, as described above.
🧶 Knit your document and verify that the author line in the output is correct.
✅ Commit the changes with an informative message, for example resolved merge conflict with authors.
⬆️ Push your changes to GitHub. This time there should not be any issue.
Hands off your computer.
(Member 5 only - pair 3)
Hands on your computer.
Follow the same instructions as member 3. Begin by attempting to ⬆️ push your changes, which will result in an error message indicating that your personal version is behind the version on GitHub. Then resolve the merge conflict in the same way.
Hands off your computer.
(Member 1 only - pair 1)
Although you have contributed your name to the author line, you have not yet had the chance to experience a merge conflict. It is now your turn!
Hands on your computer.
Edit the author line of your version of lab-03.Rmd so that it only has your name and your group’s team name. 🧶 Knit the document and ✅ commit your changes.
If you now attempt to ⬆️ push you will be faced with an error message. Follow the above steps to resolve your merge conflict.
Hands off your computer.
(For all)
Finally, everybody ⬇️ pull the latest changes from the shared repository. You should all now have the same document that has everyone’s name and your team name. Provided that you all followed the above instructions carefully, there should not be any further merge conflicts.
In the Git panel, look out for the message Your branch is ahead of 'origin/master' by 1 commit. This indicates that your personal version is ahead of the version on GitHub. In this case, it is advisable to pull before you push anything to minimise any communication errors.
Look at the following data visualisations. Have a discussion with your team members about what might be problematic with the images. Do any of the visualisations have a problem with the 4 respects: people, data, mathematics and computer?
In your groups, take it in turns to work collaboratively on the following exercises.
Remember to regularly 🧶 knit, ✅ commit and ⬆️ push your work to your shared repository on GitHub. If you are faced with a merge conflict, then carefully follow the above instructions to reconcile the conflict before pushing your changes. If you come across an issue that you are unsure how to resolve, then please ask a tutor for assistance.
For the following exercises, you will need to use some of the data wrangling functions from the tidyverse package and the data visualisation code from the ggplot2 package. Ensure that you have the following two lines of code at the top of lab-03.Rmd to make the commands available to you.
Let’s start by loading the data from the AAUP that was used to create the data visualisation shown at the beginning of this worksheet.
While you work on these exercises, aim to create a merge conflict so that members 2, 4 and 6 can also practise resolving one. For example, in pair 1, member 1 can make a minor edit in the answer box (like adding placeholder text), then commit and push this change to the repository. At the same time, member 2 should work on the actual answer for the exercise (without pulling the recent changes). When member 2 tries to push, they’ll encounter an error because their copy is behind the version on GitHub. Follow the instructions provided above to resolve the conflict.
View the data. Discuss as a team the following questions and write down your answer.
Is the staff data wide or long?
When creating a data visualisation, it is generally preferable to have the data set in a long format. That is to say, each row should relate to a unique case/observation.
If the data set is in a wide format then we need to reshape its structure by pivoting from wide to long using pivot_longer(). The animation below shows how this function works, as well as its counterpart pivot_wider().
Quick reminder: the function has the following arguments:
data: the data frame, as usual.
cols: specifies the columns to pivot into longer format.
names_to: the name of the column where column names of pivoted variables go (character string).
values_to: the name of the column where data in pivoted variables go (character string).
Fill in the blanks in the following code chunk to pivot the staff data longer and save it as a new data frame called staff_long.
Inspect staff_long. How many rows does it have? Does this correspond to your answer from Exercise 1?
We will begin by plotting instructional staff employment trends as a dot plot. Copy the following code that creates a dot plot of percentage on the y-axis against year on the x-axis, with the dots coloured based on the faculty_type. Ensure that you understand what each part of the code is doing.
Perhaps the trend over time can be better visualised using lines rather than dots. Edit the above code to use the geom_line() command.
What is wrong with the graph? Have a look at the data and the dot plot for clues as to what might be wrong before progressing to the next exercise. (You do not need to say how to fix it here; that is the next question!)
In the dot plot from exercise 3, notice that the scaling along the x-axis is not consistent. The physical distance between each pair of consecutive years is the same, but numerically there are 14 years between the first two cases and 2 years between the last two!
The reason for this is that the year variable in staff_long is a "character" variable, not a numerical variable.
Complete the following code to change the variable type of year from character to numerical.
Now create the line plot described in exercise 4 to illustrate how the faculty proportions have changed over time.
Improve the line plot from the previous exercise by fixing up its labels (title, axis labels, and legend label) as well as any other components you think could benefit from improvement.
Suppose the objective of this plot was to show that the proportion of part-time faculty has gone up over time compared to other instructional staff types. What changes would you propose making to this plot to tell this story? Write down your idea(s). The more precise you are, the easier the next step will be. Get creative, and think about how you can modify the dataset to give you new/different variables to work with.
Implement at least one of the ideas you came up with in the previous exercise. You should produce an improved data visualisation and accompany your visualisation with a brief paragraph describing the choices you made in your improvement, specifically discussing what you didn’t like in the original plot and why, and how you addressed this in the visualisation you created.
At the end of the lab, you need to ensure that you have your own personal copy of today’s work. Please follow these instructions carefully:
(For all)
Everybody, 🧶 knit, ✅ commit and ⬆️ push any remaining changes to your group’s shared repository on GitHub. In doing so, ensure that you resolve any merge conflicts.
Once the version on GitHub contains everybody’s contribution, ⬇️ pull the latest changes so that your personal copy is up-to-date.
(All except member 1)
On GitHub, create your own copy of the shared repository. You can do this using the same instructions as at the start when copying today’s template repository, but importing from member 1’s GitHub account rather than the course account.
If you want to continue to work on today’s lab after the workshop, then you will need to create a new version control project with the personal copy of the repository that you have just created.
(Member 1 only)
At the end of the workshop, you want to ensure that only you can make further changes to the shared repository. To do this, you will need to remove the collaboration permissions of your team members, as follows: