Part 1

This guide teaches you one component of metagenomic analysis: how to plot abundances at the “phylum” level for each metagenomics sample.

(1) Log in to RStudio Instance (URL/username in the Google Sheet)

An instance of RStudio, an editor for R, is already running on AWS and waiting for you to use. To access it, open the URL provided in the Google Sheet in a web browser.

Log in to your assigned RStudio instance.

(2) Generate a barplot using a subset of Kraken samples

In this section, you will learn how to use R to generate two plots that help visualize the similarities and differences among a subset of the MetaSUB samples and compare them to metagenomics samples from the Human Microbiome Project.

We have provided a subset of metagenomics data (taxa_table.csv) and one with, for your convenience.

First, we will plot our samples as a stacked barplot. A stacked barplot shows each set of numbers as a single column of segments stacked on top of one another, with each segment colored by its label. In our case, each set of numbers is a taxonomic profile: the numerical abundances measured in one sample, each labeled by the microbial taxon (for example, a species or a phylum) it belongs to.
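
To make the idea of stacked segments concrete, here is a minimal sketch in base R using an invented two-sample profile. The column names (sample, taxon, rank, percent_abundance) are assumed to match those in taxa_table.csv, and the abundance values are made up for illustration only.

```r
# Toy taxonomic profile in "long" format: one row per (sample, taxon) pair.
# Column names are assumed to mirror taxa_table.csv; the values are invented.
toy <- data.frame(
  sample = c("s1", "s1", "s1", "s2", "s2", "s2"),
  taxon = c("Firmicutes", "Proteobacteria", "Actinobacteria",
            "Firmicutes", "Proteobacteria", "Actinobacteria"),
  rank = "phylum",
  percent_abundance = c(50, 30, 20, 10, 60, 30)
)

# Each sample becomes one column of the plot; each taxon is one colored segment.
# xtabs() reshapes the long table into a taxon-by-sample table of segment heights.
segments <- xtabs(percent_abundance ~ taxon + sample, data = toy)
print(segments)

# The total height of each column is the sum of its segments.
column_heights <- colSums(segments)
print(column_heights)
```

In this toy table every column adds up to the same total, so the two bars would have equal height and differ only in how that height is divided among the phyla.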

Here’s how we make this plot in R:

library(ggplot2)   # This line loads the ggplot2 plotting library into our environment

taxa = read.csv('taxa_table.csv', header=TRUE, sep=',')  # Read our taxonomic table into a computational object from a file

taxa = taxa[taxa$rank == 'phylum',]  # This filters our taxonomic table to a single taxonomic rank: one of kingdom, phylum, class, order, family, genus, species (written in lowercase, as in 'phylum' here). Play around with a few other ranks.

ggplot(taxa, aes(x=sample, y=percent_abundance, fill=taxon)) + # This creates a ggplot object. Can you figure out what the aes(...) section is doing?
  geom_bar(stat="identity") +  # This tells ggplot to draw bars whose heights are the values in our data
  xlab('Sample') +  # These two lines tell ggplot what our axis labels should be
  ylab('Abundance') +
  labs(fill='Phylum') +  # This sets the title of the color legend
  theme(axis.text.x = element_text(angle = 90)) # This rotates the x-axis text 90 degrees

Here is a video of this step being performed: https://youtu.be/WABocs_Gu4Y.

After this is complete:

  1. Run the script line by line.
  2. Modify the code: edit your script to generate a barplot at the “family” level.
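
When trying other ranks, a misspelled or differently capitalized rank name matches no rows, and the filter quietly returns an empty table. The sketch below shows one way to guard against that. The function filter_rank is a hypothetical helper, not part of the workshop materials, and the small table is invented so that the example runs without taxa_table.csv. It assumes rank values are stored in lowercase, as the 'phylum' filter in the script suggests.

```r
# Hypothetical helper (not part of the workshop materials): filter a taxonomic
# table to one rank, and stop with a clear message if that rank is absent.
filter_rank <- function(taxa, rank) {
  available <- unique(as.character(taxa$rank))
  if (!(rank %in% available)) {
    stop("Rank '", rank, "' not found. Available ranks: ",
         paste(available, collapse = ", "))
  }
  taxa[taxa$rank == rank, , drop = FALSE]
}

# Small invented table so the example runs without taxa_table.csv.
toy <- data.frame(
  sample = c("s1", "s1", "s2", "s2"),
  taxon = c("Firmicutes", "Bacillaceae", "Firmicutes", "Bacillaceae"),
  rank = c("phylum", "family", "phylum", "family"),
  percent_abundance = c(40, 15, 55, 20)
)

# Keep only the family-level rows.
family_rows <- filter_rank(toy, "family")
print(family_rows)

# A capitalized rank does not match the lowercase values, so this raises an
# error instead of silently returning an empty table. tryCatch() captures the
# error message so the script can keep running.
result <- tryCatch(filter_rank(toy, "Family"),
                   error = function(e) conditionMessage(e))
print(result)
```

With the real data, the same check can be done by hand: running unique(taxa$rank) right after read.csv, and before any filtering, lists the rank names that are actually present in the table.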