MILRD is an education organization that offers Virtual Training Projects (VTPs) in advanced bioinformatics and other data science techniques for professionals, trainees, and students (graduate to high-school level).
The purpose of today’s presentation is to provide an overview of our VTPs and discuss volunteer opportunities for Illumina staff scientists and bioinformaticians.
VTPs are 1-2 week training projects that teach bioinformatic pipelines for analyzing data in a specific subdomain, e.g., metagenomics. These projects are generally based on reproducing the results from peer-reviewed scientific literature, starting with raw data and working through to data visualization and inference. When integrated into a course, VTPs bring bioinformatic and data science competencies into the classroom environment.
MILRD Scholarship participants completed one of four VTPs:
Course integration VTPs include:
Participants are introduced to the project, provided bioinformatics tasks, and assigned AWS instances and sample IDs on this platform.
Bifx instructions example:
Participants are shown a complete analysis with a single sample and then asked to execute the same set of steps on their sample.
AWS-hosted RStudio Instance:
We currently use Slack for group collaboration, but are implementing our own group channel functionality.
Checkpoints Results Each set of VTP analysis tasks is broken down into Checkpoints (CPs).
CP Template:
The Metagenomics + Microbial Surveillance VTP was created in collaboration with Mason Lab at Cornell Medical Center.
Throughout the VTP, each participant characterizes, quantifies, and visualizes microbial metagenomics data from sequenced swabs of public urban environments on their own AWS High Performance Compute instance. In the Linux terminal, they perform genomic data quality control, genome alignment, and taxonomic characterization & abundance quantification. In R, they visualize results and conduct a principal component analysis. To conclude, they investigate their most abundant species and use the Patric database to consider how they would determine the strains of these species.
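As a rough illustration of the principal component analysis step, the sketch below runs base R's `prcomp()` on a small abundance table. The table is simulated and the names (`counts`, `rel_abund`, the sample and taxon labels) are hypothetical; this is not the VTP's actual code or data.

```r
set.seed(1)
# Hypothetical abundance table: 6 samples (rows) x 4 taxa (columns), simulated counts
counts <- matrix(rpois(24, lambda = 50), nrow = 6,
                 dimnames = list(paste0("sample", 1:6), paste0("taxon", 1:4)))
# Convert counts to relative abundances so each sample sums to 1
rel_abund <- counts / rowSums(counts)
# Principal component analysis on the centered relative abundances
pca <- prcomp(rel_abund, center = TRUE, scale. = FALSE)
# Proportion of variance captured by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
# Sample coordinates on the first two components (the usual PCA scatter plot axes)
pca$x[, 1:2]
```

Plotting the first two columns of `pca$x` gives the familiar sample-level PCA scatter plot.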
Linux Steps R Steps Subset Component
Checkpoint Example: https://milrd.org/wp-content/uploads/2022/06/Screen-Shot-2022-06-07-at-2.26.46-PM.png
Scholarship participants from underrepresented and/or socioeconomically disadvantaged backgrounds and high-need organizations are sourced via our collaborators, including Social Good Fund, MindsOf Initiative, Science Teachers Association of New York State, and the State University of New York.
In Q4 of 2020, MILRD received funds from the Illumina Corporate Foundation and multiple individual donors to enroll aspiring scientists at the high school and college levels from underrepresented backgrounds and teachers from high-need schools in MILRD’s Virtual Training Projects (VTPs). This was our first round of grant funding to support disadvantaged students. MILRD used these funds to support scholarship students and teachers through January 2022. All told, 170 scholarship participants were supported with these funds.
Category | Percentage |
High Schoolers | 18% |
Undergraduates | 74% |
Teachers | 8% |
Category | Percentage |
Male | 42% |
Female | 58% |
¹ Participants are presented with a blank text box when asked to report their gender.
Category | Percentage |
White | 42% |
Black or African American | 34% |
Asian | 7% |
Hispanic or Latino | 7% |
Native Hawaiian or Other Pacific Islanders | 2% |
Two or More Races | 1% |
American Indian or Alaska Native | 0% |
Other | 8% |
VTP | Percentage (# Participants) |
Metagenomics + Microbial Surveillance | 67% (115) |
Single-cell Transcriptomics + Lung Cell Characterization | 13% (22) |
Variant Calling + COVID-19 | 12% (20) |
Single-cell Transcriptomics + Habenula Neuron Characterization | 8% (13) |
The project was successful overall. All participants reported they would recommend the VTPs to other students. A majority (> 60% of the students) requested information on our 2-3 week extension VTPs. Many (> 15% of the students) have served as Assistant Mentors to new cohorts of the VTPs they previously completed, and in multiple cases, helped participants who had a lot more education/experience than themselves.
An impact assessment of MILRD scholarship undergraduate and high school students from the MILRD/MindsOf collaboration showed substantial increases in self-reported knowledge across all assessed categories following VTP completion.
In 2022, we plan to (1) enroll whole classes at high schools and colleges/universities, for which we have already secured multiple course integrations, and (2) enroll more teachers in VTPs for Professional Development/CTLE credit to promote integration of MILRD’s VTPs, and subsets of VTPs, into their courses.
To understand the impact of our VTPs, we conducted a pre/post self-efficacy study with 20 students from the MILRD/MindsOf project who completed the Metagenomics + Microbial Surveillance VTP between January 1, 2022 and January 31, 2022. We selected this VTP because the greatest number of students completed it.
This study assessed the Metagenomics + Microbial Surveillance VTP as an intervention to increase knowledge of: genomics data format/structure, metagenomics sample processing & analysis, Linux/Bash terminal use, and applications of bioinformatics tools.
The study design is a within-subjects (pre-post) design where we assess relevant dependent measures immediately before and immediately after workshop participation. There is no control group.
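To make the within-subjects logic concrete, here is a minimal sketch using made-up ratings for 20 students (the vectors `pre_scores` and `post_scores` are hypothetical, not the study data). It shows that a paired t-test is the same as a one-sample t-test on each student's post-minus-pre difference, with n - 1 degrees of freedom.

```r
# Hypothetical self-ratings for 20 students, in the same student order pre and post
pre_scores  <- c(0, 1, 0, 0, 2, 1, 0, 0, 1, 0, 0, 3, 0, 1, 0, 0, 1, 0, 2, 0)
post_scores <- c(3, 4, 3, 2, 5, 4, 3, 4, 4, 2, 3, 5, 4, 3, 3, 4, 4, 2, 5, 3)

# Paired t-test: each student serves as their own comparison
paired_test <- t.test(post_scores, pre_scores, paired = TRUE)

# Equivalent one-sample t-test on the per-student differences
diff_test <- t.test(post_scores - pre_scores)

paired_test$statistic  # t statistic
paired_test$parameter  # degrees of freedom: number of students minus 1
```

Because there is no control group, a test like this describes change over time rather than isolating the effect of the intervention.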
Of the 20 MILRD/MindsOf students in this study, 17 identified as ‘Black or African American’ and 3 identified as ‘Asian or Pacific Islander’. No students indicated they were of Hispanic or Latino descent.
The group was not large and varied enough to assess gender and education level as covariates:
library(flextable)
library(magrittr)
# Participant breakdown by gender and education level (n = 20)
df_3 <- data.frame(Category = c("Gender*", "Education Level"), Participants = c("18: identified as 'Female'; 1: identified as 'Male'; 1: not reported", "17: 'undergraduate'; 3: 'high school'"))
table_3 <- df_3 %>% flextable()
table_3 <- bold(table_3, bold = TRUE, part = "header")
# Header labels are keyed by the data frame's column names (Category, Participants)
table_3 <- set_header_labels(table_3, Category = "Category", Participants = "Participants") %>% autofit()
table_3
Category | Participants |
Gender* | 18: identified as 'Female'; 1: identified as 'Male'; 1: not reported |
Education Level | 17: 'undergraduate'; 3: 'high school' |
Students were asked to rate their knowledge level for each survey statement on a whole-number scale from 0 (None) to 6 (Expert):
Independent Variable
: Time of Assessment (pre, post)
Dependent Measures
: knowledge of (1) genomics data format/structure, (2) metagenomics sample processing, (3) metagenomics analysis, (4) Linux/Bash terminal use, (5) R/RStudio use, and (6) applications of bioinformatics tools.
Overall, students reported substantial increases in knowledge across all assessed categories following VTP completion: (a.) genomics data format knowledge (Cohen’s d = 2.67), (b.) metagenomics data collection/processing knowledge (Cohen’s d = 4.15), (c.) metagenomics analysis knowledge (Cohen’s d = 3.28), (d.) Linux terminal/Bash knowledge (Cohen’s d = 2.37), (e.) R/RStudio knowledge (Cohen’s d = 3.29), (f.) bioinformatics application knowledge (Cohen’s d = 2.21).
Please note: because students replied to the survey with whole-number answers, some of the lines that connect pre/post responses overlap; thus fewer than 20 distinct lines may be visible in a plot.
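The overlap can be checked by counting distinct pre/post pairs. The sketch below uses a small hypothetical set of five responses (`pre_resp` and `post_resp` are illustrative names, not columns from the survey files): students with identical pre and post answers share one line in the plot.

```r
# Hypothetical whole-number responses for 5 students
pre_resp  <- c(0, 0, 1, 0, 1)
post_resp <- c(3, 3, 4, 2, 4)

# Each unique (pre, post) combination is drawn as one visible connecting line
pairs <- data.frame(pre = pre_resp, post = post_resp)
n_visible_lines <- nrow(unique(pairs))
n_visible_lines  # 3 distinct lines for 5 students
```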
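library(ggpubr)   # ggpaired() for the paired pre/post plots
library(effsize)  # cohen.d() for effect sizes
# Pre- and post-VTP survey responses (assumed one row per student, in the same order in both files)
pre <- read.csv('pre_metagenomics_for_csv.csv', header = TRUE, sep = ',')
post <- read.csv('post_metagenomics_for_csv.csv', header = TRUE, sep = ',')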
#I understand how genomics data are structured_formatted
pre_genomics_data <- pre$I.understand.how.genomics.data.are.structured_formatted
post_genomics_data <- post$I.understand.how.genomics.data.are.structured_formatted
summary(pre_genomics_data)
summary(post_genomics_data)
mean(post_genomics_data) - mean(pre_genomics_data)
sd(pre_genomics_data)
sd(post_genomics_data)
# Paired test: rows of pre and post are assumed to be in the same student order
t.test(post_genomics_data, pre_genomics_data, paired = TRUE)
cohen.d(post_genomics_data, pre_genomics_data)
genomics_data <- data.frame(Pre_VTP = pre_genomics_data, Post_VTP = post_genomics_data)
ggpaired(genomics_data, cond1 = "Pre_VTP", cond2 = "Post_VTP",
color = "condition", fill = NULL, line.color = "black", palette = 'ucscgb', title = "I understand how genomics data are structured/formatted", xlab = " ", ylab = "Student Response")
A paired-samples t-test showed that, as hypothesized, participants reported lower levels of genomics data format knowledge before completing the intervention (M = 0.550, SD = 0.999) than after (M = 3.55, SD = 1.23), Mdiff = 3.0, t(20) = 8.45, p < .0001. The effect size is very large (Cohen’s d = 2.67).
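As a sanity check on the effect size, the sketch below recomputes Cohen's d from the rounded summary statistics reported above. It assumes d was calculated with a pooled standard deviation (the default behavior of `cohen.d()` for unpaired input, as far as we know); the variable names are illustrative.

```r
# Summary statistics as reported for the genomics data format item
m_diff  <- 3.0    # mean post - mean pre
sd_pre  <- 0.999
sd_post <- 1.23

# Pooled SD for two equal-sized sets of responses (n = 20 each)
sd_pooled <- sqrt((sd_pre^2 + sd_post^2) / 2)

# Cohen's d: mean difference expressed in pooled-SD units
d <- m_diff / sd_pooled
round(d, 2)
```

This gives roughly 2.68, in line with the reported 2.67; the small gap comes from working with rounded inputs.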
#I understand how metagenomics data are collected and processed
pre_metagenomics_collected <- pre$I.understand.how.metagenomics.data.are.collected.and.processed
post_metagenomics_collected <- post$I.understand.how.metagenomics.data.are.collected.and.processed
summary(pre_metagenomics_collected)
summary(post_metagenomics_collected)
mean(post_metagenomics_collected) - mean(pre_metagenomics_collected)
sd(pre_metagenomics_collected)
sd(post_metagenomics_collected)
# Paired test: rows of pre and post are assumed to be in the same student order
t.test(post_metagenomics_collected, pre_metagenomics_collected, paired = TRUE)
cohen.d(post_metagenomics_collected, pre_metagenomics_collected)
metagenomics_collected <- data.frame(Pre_VTP = pre_metagenomics_collected, Post_VTP = post_metagenomics_collected)
ggpaired(metagenomics_collected, cond1 = "Pre_VTP", cond2 = "Post_VTP",
color = "condition", fill = NULL, line.color = "black", palette = 'ucscgb', title = "I understand how metagenomics data are collected and processed", xlab = "", ylab = "Student Response")
A paired-samples t-test showed that, as hypothesized, participants reported lower levels of metagenomics data collection/processing knowledge before completing the intervention (M = 0.450, SD = 0.605) than after (M = 3.85, SD = 0.988), Mdiff = 3.40, t(20) = 13.1, p < .0001. The effect size is very large (Cohen’s d = 4.15).
#I understand how metagenomics data are analyzed
pre_metagenomics_analyzed <- pre$I.understand.how.metagenomics.data.are.analyzed
post_metagenomics_analyzed <- post$I.understand.how.metagenomics.data.are.analyzed
summary(pre_metagenomics_analyzed)
summary(post_metagenomics_analyzed)
mean(post_metagenomics_analyzed) - mean(pre_metagenomics_analyzed)
sd(pre_metagenomics_analyzed)
sd(post_metagenomics_analyzed)
# Paired test: rows of pre and post are assumed to be in the same student order
t.test(post_metagenomics_analyzed, pre_metagenomics_analyzed, paired = TRUE)
cohen.d(post_metagenomics_analyzed, pre_metagenomics_analyzed)
metagenomics_analyzed <- data.frame(Pre_VTP = pre_metagenomics_analyzed, Post_VTP = post_metagenomics_analyzed)
ggpaired(metagenomics_analyzed, cond1 = "Pre_VTP", cond2 = "Post_VTP",
color = "condition", fill = NULL, line.color = "black", palette = 'ucscgb', title = "I understand how metagenomics data are analyzed", xlab = "", ylab = "Student Response")
A paired-samples t-test showed that, as hypothesized, participants reported lower levels of metagenomics analysis knowledge before completing the intervention (M = 0.400, SD = 0.598) than after (M = 3.65, SD = 1.27), Mdiff = 3.25, t(20) = 10.4, p < .0001. The effect size is very large (Cohen’s d = 3.28).
#I understand how to use the Linux/Bash terminal
pre_linux_bash <- pre$I.understand.how.to.use.the.Linux_Bash.terminal
post_linux_bash <- post$I.understand.how.to.use.the.Linux_Bash.terminal
summary(pre_linux_bash)
summary(post_linux_bash)
mean(post_linux_bash) - mean(pre_linux_bash)
sd(pre_linux_bash)
sd(post_linux_bash)
# Paired test: rows of pre and post are assumed to be in the same student order
t.test(post_linux_bash, pre_linux_bash, paired = TRUE)
cohen.d(post_linux_bash, pre_linux_bash)
linux_bash <- data.frame(Pre_VTP = pre_linux_bash, Post_VTP = post_linux_bash)
ggpaired(linux_bash, cond1 = "Pre_VTP", cond2 = "Post_VTP",
color = "condition", fill = NULL, line.color = "black", palette = 'ucscgb', title = "I understand how to use the Linux/Bash terminal", xlab = "", ylab = "Student Response")
A paired-samples t-test showed that, as hypothesized, participants reported lower levels of Linux terminal/Bash knowledge before completing the intervention (M = 0.450, SD = 1.28) than after (M = 3.65, SD = 1.42), Mdiff = 3.20, t(20) = 7.48, p < .0001. The effect size is very large (Cohen’s d = 2.37).
#I understand how to use the R programming language and R_RStudio
pre_r <- pre$I.understand.how.to.use.R_RStudio
post_r <- post$I.understand.how.to.use.R_RStudio
summary(pre_r)
summary(post_r)
mean(post_r) - mean(pre_r)
sd(pre_r)
sd(post_r)
# Paired test: rows of pre and post are assumed to be in the same student order
t.test(post_r, pre_r, paired = TRUE)
cohen.d(post_r, pre_r)
r <- data.frame(Pre_VTP = pre_r, Post_VTP = post_r)
ggpaired(r, cond1 = "Pre_VTP", cond2 = "Post_VTP",
color = "condition", fill = NULL, line.color = "black", palette = 'ucscgb', title = "I understand how to use R/RStudio", xlab = "", ylab = "Student Response")
A paired-samples t-test showed that, as hypothesized, participants reported lower levels of R/RStudio knowledge before completing the intervention (M = 0.350, SD = 0.8123) than after (M = 3.80, SD = 1.24), Mdiff = 3.45, t(20) = 10.4, p < .0001. The effect size is very large (Cohen’s d = 3.29).
#I understand how bioinformatics tools can be used to answer a scientific question
pre_scientific_question <- pre$I.understand.how.bioinformatics.tools.can.be.used.to.answer.a.scientific.question
post_scientific_question <- post$I.understand.how.bioinformatics.tools.can.be.used.to.answer.a.scientific.question
summary(pre_scientific_question)
summary(post_scientific_question)
mean(post_scientific_question) - mean(pre_scientific_question)
sd(pre_scientific_question)
sd(post_scientific_question)
# Paired test: rows of pre and post are assumed to be in the same student order
t.test(post_scientific_question, pre_scientific_question, paired = TRUE)
cohen.d(post_scientific_question, pre_scientific_question)
scientific_question <- data.frame(Pre_VTP = pre_scientific_question, Post_VTP = post_scientific_question)
ggpaired(scientific_question, cond1 = "Pre_VTP", cond2 = "Post_VTP",
color = "condition", fill = NULL, line.color = "black", palette = 'ucscgb', title = "I understand how bioinformatics tools can help answer a scientific question", xlab = "", ylab = "Student Response")
A paired-samples t-test showed that, as hypothesized, participants reported lower levels of bioinformatics application knowledge before completing the intervention (M = 1.25, SD = 1.25) than after (M = 3.95, SD = 1.19), Mdiff = 2.7, t(20) = 6.99, p < .0001. The effect size is very large (Cohen’s d = 2.21).
Participants had overwhelmingly positive experiences:
All participants reported they would recommend the VTPs to other students.
A majority (> 60% of the students) requested information about our 2-3 week extension VTPs, if/when funds become available. VTP Extension Projects allow participants to dive deeper into a subject and its analysis methods after completing the initial VTP. In Extension Projects, participants complete a customized and independent research project with that VTP’s dataset and present a capstone to an established domain expert who can assess the work and provide an evaluation that can be used for a recommendation letter if desired.
Many (> 15% of the students) have served as Assistant Mentors (AMs) to new cohorts of the VTPs they previously completed, and in multiple cases, helped participants who had a lot more education/experience than themselves. The AM role is similar to that of a TA. We ask AMs for a 4-hour time commitment throughout the VTP, and they help new participants address the issues they encountered during their cohort (AMs are not expected to be experts or to address questions they are not comfortable answering). Assistant Mentoring helps participants gain a deeper understanding of the material and interact with more researchers from academia and industry. MILRD pays an honorarium to each AM for their efforts.
Naomi Zakimi was a Genetics and Genomics major at UC Davis who completed the Metagenomics + Microbial Surveillance VTP. She completed her degree shortly afterward and now works at a bioinformatics lab (also at UC Davis).
Naomi later served as an Assistant Mentor to a new Metagenomics VTP cohort and additionally completed our Single-cell Transcriptomics + Lung Cell Characterization VTP.
Thank you…for this amazing opportunity! I learned so much from this VTP and found the topic very interesting
Anthony completed the Variant Calling + COVID-19 VTP as a high school student in Jamaica. He is currently a first-year student at the University of Miami, double majoring in Finance and Mathematics with a minor in Computer Science.
Here are Anthony’s words about his experience in the MILRD VTP program:
This [VTP] program was so beneficial to my development. As a prospective Data Scientist, I got the opportunity to work in an RStudio environment and get more acquainted with R. It has definitely helped me to solidify my knowledge of the programming language. One of my favourite parts was interacting with the mentors, along with the other students doing the program with me. It was a unique opportunity to interact with persons more knowledgeable than me without feeling left out… The VTP gave me unprecedented access to research materials and undoubtedly developed my data and computational skills. I would recommend this VTP to anyone who is interested in a data related field.
…I was surrounded (virtually) with extremely smart individuals, some with PhDs, who made me feel welcome and excited to tackle the problems at hand. It was by no means easy, but by collaborating with the group and working together to fix bugs, we were successful in the end…. As a recent self-taught programmer at the time, I had never worked with such a large code base. That said, I was able to learn a few tips and tricks on how to structure and write clean and effective code, which has been helping me ever since.
It has helped me so much that I have gone on to do many exciting things with code and I even decided to tack on a Computer Science minor to my already strenuous course load…. All in all, my participation in the program helped me a lot and I am very grateful.
MILRD has secured course integrations and scholarship students in collaboration with MindsOf and Steppingstone Scholars.
MindsOf Initiative is a mentorship program that supports aspiring professionals of Caribbean heritage across a broad range of fields. This initiative provides career guidance and training opportunities to high schoolers, undergraduates, and young professionals. MindsOf and MILRD collaborate to provide cost-free VTPs to underrepresented minority students.
Steppingstone Scholars is an educational social mobility non-profit organization. For low-income students in the City of Philadelphia, there are often no clear pathways to college or the workforce. Since 1999, Steppingstone has been working to address this systemic problem by creating not just one pathway, but many. Steppingstone Ventures programming reaches over 1000 students per year, preparing students for their futures while serving as an innovation hub for better ways to support students with a focus on STEM enrichment and university partnerships.
ILMN staff can volunteer to support MILRD Scholarship VTPs via two modalities. Each is a 1-hour time commitment.
Modality 1: Asynchronous cohort group-channel feedback
VTP Mentors will:
Modality 1 Example:
Modality 2: Post-VTP Career Talk
Those interested in volunteering can complete our MILRD Volunteer Interest Form for Illumina Staff.
Questions and comments can be directed to [email protected].