Ready-made & custom-built Virtual Training Projects in advanced bioinformatics and other data science techniques, with tracks for professionals, trainees, and students.
With the guidance of expert mentors and scheduled cohort meetings, Virtual Training Project (VTP) participants use our instructional platform—featuring step-by-step tasks, AI-enabled Q&A, and a group collaboration channel—to reproduce analyses from published research and cultivate skills at the intersection of data science & biology.
Virtual Training Project Examples
Bioinformatics + Genomics
MetaSUB is an international consortium, anchored by the Mason Lab at Weill Cornell Medical Center, that aims to characterize the microbiome of the urban built environment. Its member laboratories are working to establish a worldwide "DNA map" of microbiomes in mass transit systems.
In this virtual version of the project, participants characterize, quantify, and visualize microbial genome data using sequenced swabs from public urban environments. In the Linux terminal, participants perform genomic data QC, genome alignment, and taxonomic characterization; in R, they carry out PCA clustering and visualization.
Tech + Data Science
Alexapath, a biotech startup funded by the Bill and Melinda Gates Foundation, developed a smart microscope to screen for cervical cancer in developing nations at reduced cost and with increased throughput.
Advances in computer vision, deep learning, and computer hardware have made it possible to automate the analysis of images, which increases efficiency and reduces cost. In this VTP, participants train, test, and optimize a Convolutional Neural Network, implemented in TensorFlow, using cervical cancer image data collected from Alexapath's microscopy platform.
Transcriptomics + Cellular/Molecular Neuroscience
The habenula is a brain structure critical for several cognitive behaviors and has been proposed to contribute to the neurobiological underpinnings of depression and addiction, but its neuron subtypes had remained unknown. Asst. Prof. Michael Wallace, a neuroscientist at Boston University School of Medicine and formerly of Harvard Medical School, used high-throughput single-cell transcriptional profiling, monosynaptic retrograde tracing, and multiplexed in situ hybridization to characterize the cells of the mouse habenula.
In this VTP, participants profile the single-cell transcriptome of dissociated cells from a single mouse habenula sample. In an assigned R instance (hosted on a high-performance AWS cluster), each participant performs library QC, data filtering, clustering, principal component analysis, t-SNE dimensionality reduction, cell-type assignment, gene expression analysis, and merged-sample analysis.
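To give a flavor of the library QC and data filtering step, here is a minimal sketch in Python (the VTP itself works in R). It assumes each cell is summarized by a gene count, a total read count, and a mitochondrial read count. The function name, field names, thresholds, and toy cells are all hypothetical and chosen for illustration only; they are not taken from the project or the published study.

```python
def filter_cells(cells, min_genes=200, max_mito_fraction=0.1):
    """Keep cells with enough detected genes and a low mitochondrial fraction.

    Both thresholds are illustrative assumptions, not the VTP's settings.
    """
    kept = []
    for cell in cells:
        total = cell["total_counts"]
        # Treat a cell with no counts as failing QC.
        mito_fraction = cell["mito_counts"] / total if total else 1.0
        if cell["n_genes"] >= min_genes and mito_fraction <= max_mito_fraction:
            kept.append(cell)
    return kept


# Invented toy cells, not real habenula data.
toy_cells = [
    {"id": "cell_a", "n_genes": 1500, "total_counts": 4000, "mito_counts": 120},
    {"id": "cell_b", "n_genes": 90, "total_counts": 150, "mito_counts": 5},
    {"id": "cell_c", "n_genes": 800, "total_counts": 2000, "mito_counts": 600},
]

kept_ids = [cell["id"] for cell in filter_cells(toy_cells)]
print(kept_ids)
```

In this toy run, cell_b is dropped for having too few detected genes and cell_c for its high mitochondrial fraction, a pattern often associated with damaged cells.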
Economics + Large-scale Data
The Opportunity Atlas is the first comprehensive dataset on children's outcomes across neighborhoods in the US. Led by Dr. Raj Chetty of Harvard University and Opportunity Insights, the dataset was built using individual-level data from the US Census Bureau, federal income tax returns and American Community Surveys. The project has implications for social and economic policies to promote income mobility.
In this VTP, each participant investigates the power of several variables, such as household income and incarceration rates, to predict social mobility in an assigned city. Using the R statistical programming language and its popular library ggplot2, participants compute regressions and correlations and generate histograms, scatterplots, and other visualizations. To conclude, participants conduct a similar analysis on a US city/locale of their choice and compare the results to those from their assigned city.
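As a rough illustration of the regression and correlation step, the sketch below computes a Pearson correlation and a least-squares line in plain Python (participants in the VTP use R). The function names and the toy numbers are hypothetical; the values are invented for the example and are not Opportunity Atlas data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


# Invented toy values: a predictor (e.g. an income measure) and an outcome
# (e.g. a mobility measure) for five hypothetical neighborhoods.
predictor = [1, 2, 3, 4, 5]
outcome = [2, 4, 5, 4, 5]

r = pearson_r(predictor, outcome)
slope, intercept = fit_line(predictor, outcome)
print(f"r = {r:.3f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
```

Running the same two functions on a second city's values would allow the kind of side-by-side comparison described above.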