Lab 02 - Global plastic waste

Image by Willfried Wende from Pixabay

Plastic pollution is a major and growing problem, negatively affecting the health of oceans and wildlife. Our World in Data provides extensive data at various levels, including global, per-country, and over time. For this lab we focus on data from 2019.

Additionally, National Geographic ran a data visualization communication contest on plastic waste as seen here. The winners, Perpetual Plastic, created a physical data visualisation sculpture on Bali’s beaches out of washed up flip-flops and other plastic debris.

Learning goals

Before we get started, please ensure that you have RStudio installed, that you have a GitHub account, and that you are able to push and pull correctly. If not, please follow the set-up instructions here. Ask a tutor for help if you have any problems.


In today’s lab you will be working collaboratively from the same repository in GitHub. This will be extremely useful in your projects for sharing out the workload among the team members.

❗ For today, it is very important that you follow the instructions carefully to avoid creating any merge conflicts. Ask a tutor for help if this happens to your team. We will discuss how to resolve merge conflicts in the next lab.


Getting started

Today you will be working in a team of no more than 6 people. It is important that you take turns to work on each activity, one person at a time.

If your group consists of fewer than 6 people, take turns doing the tasks of the missing members.

You will also get started with pair programming techniques. What is pair programming? From Codecademy.com:

In pair programming, one person is the “driver,” and the other is the “navigator.” The driver is the person at the keyboard who’s actively writing code. The navigator observes, checks code for accuracy, and keeps an eye on the bigger picture.

Pair programmers switch roles regularly, so both partners stay engaged. They also work collaboratively, determining which tasks need to be done.

If your group has an odd number of people, one pair can be replaced by a group of three people, who alternate the 🚘 driver and 🧭 navigator roles.

We will be using the 🚘 emoji to describe the “driver” and the 🧭 emoji to describe the “navigator”. In your group, form pairs with each person sitting next to their partner. Each member of the pair will alternate between the 🚘 driver role and the 🧭 navigator role.

Give each member of your team a number and look out for the following emoji sequence to indicate who should be completing the activity:

If it’s your turn to be the navigator 🧭, then guide your partner through their task. Remember: anyone who is not the driver 🚘 should not make any changes to their work and should not push to or pull from GitHub. Keep your hands off the keyboard!

Register your team on Wooclap

This week we will start to finalize the teams that will later work on the group projects. To collect initial information on the teams and their members, it is important that one of you fills out this Wooclap form:


Creating a collaborative repository

Let’s first set up GitHub.

🚘🧭😐😐😐😐 (Member 1 only - pair 1)

You are the maintainer of the GitHub repository for today’s lab worksheet. This means that you will need to make a copy of today’s lab template and add your team members as collaborators so that they can contribute.

First, log in to GitHub and create a new repository by copying today’s lab template project. To remind you of the steps:

Next, to add your team members as collaborators:

(😐🚘🚘🚘🚘🚘 - You should receive a collaboration invitation via email; accept it.)


Version control R project

🚘🚘🚘🚘🚘🚘 (For all)

Once everyone has been added to the collaborative repository, open RStudio and create a new version control project using the GitHub repository you have just made. To remind you of the steps:

PAUSE: Ensure that all team members have successfully created an R project and have pulled the current content from GitHub. Everyone, hands off the computer unless it is your turn!


Adding your own name

MERGE CONFLICT: A Git merge conflict occurs when Git is unable to automatically resolve differences in code between two commits. Git can usually merge changes automatically when they affect different lines or different files; a conflict arises when two commits change the same lines of the same file. For some general instructions on how to resolve merge conflicts, have a look at this page.

You are going to take turns adding your own name to the author string at the top of lab-02.Rmd.

🚘🧭😐😐😐😐 (Member 1 only - pair 1)

😐😐🚘🧭😐😐 (Member 3 only - pair 2)

😐😐😐😐🚘🧭 (Member 5 only - pair 3)

🚘🚘🚘🚘🚘🚘 (For all)

Everybody, ⬇️ Pull the latest changes from the shared repository so that the version you have has everyone’s name.

Congratulations! You have now started working collaboratively from the same repository on GitHub. Now let’s do some data science…


Packages

Before getting started with the Exercises, run the following code in the Console to load the packages you will need for today’s lab.

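library(tidyverse)  # loads dplyr, readr, tidyr and other core tidyverse packages
library(readxl)     # provides read_excel() for .xlsx files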

Loading the data

The data for this lab are contained in 4 different files within the data folder, each saved as a different file type, as detailed below:

File 1: mismanaged-plastic-waste-per-capita.csv

File 2: per-capita-ocean-plastic-waste.txt

File 3: UN_country_population.tsv

File 4: UN_country_region.xlsx
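Each file type needs its own reader function. The sketch below uses tiny inline strings (hypothetical, made-up values rather than the real lab data) to show how readr's read_csv(), read_csv2() and read_tsv() parse comma-, semicolon- and tab-separated text. It assumes the .txt file is semicolon-delimited with a comma decimal mark, which is what the read_csv2() call used later implies. read_excel() needs an actual .xlsx file, so it is not shown here.

```r
library(readr)

# Hypothetical one-row examples of each delimited format (made-up values).
# I() tells readr to treat the string as literal data rather than a file path.
csv_text  <- "code,value\nAUS,1.5"    # comma-separated, "." as decimal mark
csv2_text <- "code;value\nAUS;1,5"    # semicolon-separated, "," as decimal mark
tsv_text  <- "code\tvalue\nAUS\t1.5"  # tab-separated

from_csv  <- read_csv(I(csv_text), show_col_types = FALSE)
from_csv2 <- read_csv2(I(csv2_text), show_col_types = FALSE)
from_tsv  <- read_tsv(I(tsv_text), show_col_types = FALSE)

# All three readers should give a numeric `value` column equal to 1.5
from_csv
from_csv2
from_tsv
```

If a file is read with the wrong function, the usual symptom is a single column containing the whole line, or numbers parsed as text.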

Note: The two data sources spell some country names differently, such as “Turkey” and “Turkiye”, so the data should be joined on the ISO3 alpha-code (the ISO 3166-1 alpha-3 country code stored in the code column) rather than on the name.
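To see why the name is an unreliable key, the sketch below joins two one-row tibbles (hypothetical data with made-up values) that describe the same country under the two spellings. Matching on name finds nothing, while matching on the ISO3 code "TUR" lines the rows up.

```r
library(dplyr)

# Hypothetical rows with made-up values: one country, spelled differently
plastic <- tibble(code = "TUR", name = "Turkey",  waste = 0.5)
un_pop  <- tibble(code = "TUR", name = "Turkiye", population = 85)

# Joining on the country name finds no match because the spellings differ
by_name <- inner_join(plastic, un_pop, by = "name")
nrow(by_name)  # 0

# Joining on the ISO3 code matches the two rows
by_code <- inner_join(plastic, un_pop, by = "code")
nrow(by_code)  # 1
```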

Also, given that all data sets have a name variable, I suggest using select() prior to joining the data sets in order to remove all but one of the name columns. Otherwise, the resulting joined data will have the variables name.x and name.y but not name, which may be confusing.
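The effect of that select() can be checked on a small example. The sketch below uses two hypothetical tibbles (made-up values) that both carry a name column; dplyr's joins add the default suffixes .x and .y to any non-key column that appears in both inputs.

```r
library(dplyr)

# Hypothetical data sets (made-up values) that both carry a `name` column
plastic <- tibble(code = c("AUS", "BRA"), name = c("Australia", "Brazil"),
                  waste = c(0.1, 0.2))
un_pop  <- tibble(code = c("AUS", "BRA"), name = c("Australia", "Brazil"),
                  population = c(26, 215))

# Without select(): both name columns are kept, with suffixes added
joined_raw <- left_join(un_pop, plastic, by = "code")
names(joined_raw)    # "code" "name.x" "population" "name.y" "waste"

# With select(-name) on one side: a single `name` column survives
joined_clean <- un_pop %>%
  select(-name) %>%
  left_join(plastic, by = "code")
names(joined_clean)  # "code" "population" "name" "waste"
```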


Load & join each data set.

Let’s do more collaborative work.

🧭🚘😐😐😐😐 (Member 2 only - pair 1)

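# Read the comma-separated mismanaged plastic waste file
data1 <- read_csv("data/mismanaged-plastic-waste-per-capita.csv")
# Start the combined data frame from this first data set
plastic_data_all <- data1
plastic_data_all %>% head(n = 10)  # preview the first 10 rows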

😐😐🧭🚘😐😐 (Member 4 only - pair 2)

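# read_csv2() reads semicolon-separated files with "," as the decimal mark
data2 <- read_csv2("data/per-capita-ocean-plastic-waste.txt")
plastic_data_all <- data2 %>%
  select(-name) %>%                         # drop the duplicate name column
  left_join(plastic_data_all, by = "code")  # match rows on the ISO3 code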

😐😐😐😐🧭🚘 (Member 6 only - pair 3)

# Read the tab-separated UN population file
data3 <- read_tsv("data/UN_country_population.tsv")
plastic_data_all <- data3 %>%
  select(-name) %>%
  right_join(plastic_data_all, by = "code")  # keep every row of plastic_data_all


The UN data contain more rows (run nrow(data3)) than the plastic waste data set (run nrow(data1)). This is because the plastic waste data set only contains countries/territories with a coastline, whilst the UN data contain population data on all countries/territories, whether they are coastal or landlocked. If we instead ran data3 %>% left_join(data1, by = "code"), the plastic waste data would be added to the UN data, but because there are no plastic waste data for landlocked countries, those entries would be filled with NAs. These rows can be removed with drop_na(), which drops every row that contains at least one NA. Therefore, the following code should produce the same rows, provided the plastic waste data have no missing values of their own (drop_na() would remove such rows too):

data3 %>% 
  select(-name) %>%
  left_join(plastic_data_all, by = "code") %>%
  drop_na()
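The equivalence can be checked on a toy example. In the sketch below (hypothetical tibbles, made-up values), the UN table includes a landlocked country, Austria ("AUT"), that is absent from the coastal-only plastic table. Both approaches should keep only the two coastal countries.

```r
library(dplyr)
library(tidyr)

# Hypothetical data (made-up values): AUT is landlocked, so it has
# no row in the coastal-only plastic waste table
un_pop  <- tibble(code = c("AUS", "AUT", "BRA"),
                  name = c("Australia", "Austria", "Brazil"),
                  population = c(26, 9, 215))
plastic <- tibble(code = c("AUS", "BRA"),
                  name = c("Australia", "Brazil"),
                  waste = c(0.1, 0.2))

# Option 1: keep every row of the plastic data
via_right <- un_pop %>%
  select(-name) %>%
  right_join(plastic, by = "code")

# Option 2: keep every UN row, then drop rows containing any NA (here, AUT)
via_left <- un_pop %>%
  select(-name) %>%
  left_join(plastic, by = "code") %>%
  drop_na()

via_right
via_left
```

Keep in mind that drop_na() looks at every column, so with real data it would also discard a coastal country that happens to have a missing value anywhere in the row; right_join() does not have that side effect.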

🚘🧭😐😐😐😐 (Member 1 only - pair 1)

# Read the Excel file of UN regions
data4 <- read_excel("data/UN_country_region.xlsx")
plastic_data_all <- data4 %>%
  select(-name) %>%
  right_join(plastic_data_all, by = "code")  # keep every row of plastic_data_all

🚘🚘🚘🚘🚘🚘 (For all)

Everybody, ⬇️ Pull the latest changes from the shared repository. Check that all 4 files are loaded and that they are joined into a single data frame.


Exercises

Now that you have loaded and joined the data, let’s do some investigating.

Please continue to work collaboratively, using pair programming and taking turns in contributing to the questions. When the team member changes, remember to begin with a ⬇️ Pull from GitHub and then finish with a 🧶 ✅ ⬆️ Knit, Commit and Push.

Only one person should have their hands on their computer at any one time, to minimise the chance of merge conflicts.

Exercise 1

Exercise 2

Exercise 3

Exercise 4

The variable names in plastic_data_all are quite long; let’s shorten them.

In addition, the region name "Latin America and The Caribbean" is much longer than the other region names, so consider replacing this text with a suitable acronym.

Finally, create a frequency table or calculate an interesting statistic per region that uses the renamed variables.
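One possible shape for this exercise is sketched below on a small hypothetical stand-in for plastic_data_all. The column name mismanaged_plastic_waste_per_capita_kg, the region labels other than "Latin America and The Caribbean", and all values are made up for illustration; check names(plastic_data_all) for the real variable names. The acronym "LAC" is just one reasonable choice.

```r
library(dplyr)

# Hypothetical stand-in for plastic_data_all (made-up names and values)
plastic_demo <- tibble(
  code = c("BRA", "MEX", "AUS", "IDN"),
  region = c("Latin America and The Caribbean",
             "Latin America and The Caribbean", "Oceania", "Asia"),
  mismanaged_plastic_waste_per_capita_kg = c(0.3, 0.2, 0.1, 0.4)
)

plastic_short <- plastic_demo %>%
  # rename(new_name = old_name) shortens a long variable name
  rename(mismanaged_pc = mismanaged_plastic_waste_per_capita_kg) %>%
  # Replace the long region label with an acronym; other regions are unchanged
  mutate(region = if_else(region == "Latin America and The Caribbean",
                          "LAC", region))

# Frequency table: number of countries per region
plastic_short %>% count(region)

# A per-region statistic using the renamed variable
plastic_short %>%
  group_by(region) %>%
  summarise(mean_mismanaged_pc = mean(mismanaged_pc, na.rm = TRUE))
```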


Wrapping up

At the end of the lab, you need to ensure that you have your own personal copy of today’s work. Please follow these instructions carefully:

🚘🚘🚘🚘🚘🚘 (For all)

Everybody, ⬇️ Pull the latest changes from the shared repository.

😐🚘🚘🚘🚘🚘 (All except member 1)

On GitHub, create your own copy of the shared repository. You can do this using the same instructions as at the start when copying today’s template repository, but importing from member 1’s GitHub account rather than the course account.

If you want to continue to work on today’s lab after the workshop, then you will need to create a new version control project with your personal copy of the repository that you have just created.

🚘😐😐😐😐😐 (Member 1 only)

At the end of the workshop, you will want to ensure that only you can make further changes to the shared repository. This means removing your team members’ collaboration permissions. To do this:


That’s all for today. In next week’s lab we will continue to work collaboratively and look at how to resolve merge conflicts.