Important Announcement 1

Before we start, everyone must complete the Pre-VTP survey:

If you haven’t done so yet, please take 5 minutes to complete it now.

Important Announcement 2

Here is a Google Sheet that contains Sample Assignments, AWS and R addresses/info, and Participant Info:


  1. Introduction to the project.

  2. Rationale for the MetaSub Project

  3. Overview: bioinformatics analyses you’ll perform

1. Introduction to the project.

The primary goal of this project is to use the Linux command line and the R programming language to characterize, quantify, and visualize the species composition of urban microbiome samples (i.e. subway swabs), working from raw data to completed analysis.

What is bioinformatics?

Think, Room (4-5 ppl), Share:

  • 1-minute: think about one lab, activity, or lesson you previously taught that had an element you could have explored using bioinformatics analysis. (Explain.)

  • 3-minutes: divide into breakout rooms and discuss with your group

  • 2-minutes: re-join the main meeting room and present 1-2 examples to the other participants.

Why is the metagenomics application of bioinformatics useful?

In case you (or more likely, your students) are thinking, “How would someone be able to use this knowledge in real life?”, here are three real-world applications of the metagenomics analysis technique you will learn:

  • Research Applications: you can ask questions about the similarity of microbiome samples, the prevalence/emergence of antimicrobial-resistant strains, etc. The Metasub Project, the source of the data you’ll analyze in this VTP, is an example of this application.

  • Clinical Applications: This technique is already being employed by some biotech startups, like Karius, to rapidly diagnose blood-borne infections.

  • Biotech/Industry Applications: Some companies offer microbial surveillance services to identify and monitor the presence of resistant pathogens (e.g. on hospital surfaces). One such startup, Biotia, was spun out of the Mason lab (the same Cornell Med lab that created the Metasub project) and uses many of the same techniques that you will employ in this VTP.

In a broader sense, bioinformatics is a subdivision of data science; principles learned in the former are highly relevant to the latter. We’ll point these out as we go along.

History of metagenomics:

Example: Clinical Application of Metagenomics:

Think, Room (4-5 ppl), Share:

  • 1-minute: Come up with one potential application of metagenomics not discussed here.

  • 3-minutes: divide into breakout rooms and discuss with your group

  • 2-minutes: re-join the main meeting room and present 1-2 examples from your breakout room to the other participants.

2. Rationale for the MetaSub Project

To accomplish the goals of this project, we will use data from the Metasub project, an effort to characterize the “built environment” microbiomes of mass transit systems around the world, headed by Dr. Chris Mason’s lab at Weill Cornell Medical Center.

Here’s the recent Metasub paper in case you haven’t seen it and would like to review it later:

(To be frank, this article is a bit challenging to read, so we suggest you review it later on if you’re inclined, after you’ve done a bit of the project.)

Some additional information in case you’re interested:

Metasub was born out of a project called PathoMap (also from the Mason lab), which began in summer 2013 to profile the New York City metagenome across mass-transit areas of the built environment in, around, and below the city, with a focus on the subway.

Here’s the Pathomap paper in case you would like to review it later:

Pathomap sought to establish baseline profiles across the subway system, identify potential bio-threats, and provide an additional level of data that the city can use to create a “smart city,” i.e., one that uses high-dimensional data to improve city planning, management, and human health.

Metasub extended the Pathomap project based on the recognition that NYC is not the only city in the world that could benefit from a systematic, longitudinal metagenomic profile of its subway system.

Although the NYC subway has the most stations, it ranks 7th in the world in terms of the number of riders per year. The world’s busiest subways vary widely in population density, length, and climate, ranging from cold (Moscow) to temperate (New York City, Paris) to subtropical (Mexico City) and tropical (São Paulo).

To address this gap in our knowledge of the built environment, the Mason lab created Metasub: an international consortium of laboratories to establish a world-wide “DNA map” of microbiomes in mass transit systems.

Take a look at Figure 1, which provides an overview of the Pathomap project’s design and execution: