In Q4 of 2020, MILRD received funds from the Illumina Corporate Foundation and multiple individual donors to enroll aspiring scientists at the high school and college levels from underrepresented backgrounds and teachers from high-need schools in MILRD’s Virtual Training Projects (VTPs). MILRD used these funds to support scholarship students and teachers through January 2022. All told, 170 scholarship participants were supported with these funds.
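MILRD Scholarship participants each completed one of four VTPs: Metagenomics + Microbial Surveillance, Single-cell Transcriptomics + Lung Cell Characterization, Variant Calling + COVID-19, and Single-cell Transcriptomics + Habenula Neuron Characterization.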
Scholarship participants from underrepresented and/or socioeconomically disadvantaged backgrounds and high-need organizations were sourced via our collaborators, including Social Good Fund, MindsOf Initiative, Science Teachers Association of New York State, and the State University of New York.
Category | Percentage |
High Schoolers | 18% |
Undergraduates | 74% |
Teachers | 8% |
Category | Percentage |
Male | 42% |
Female | 58% |
¹ Participants are presented with a blank text box when asked to report their gender.
Category | Percentage |
White | 42% |
Black or African American | 34% |
Asian | 7% |
Hispanic or Latino | 7% |
Native Hawaiian or Other Pacific Islanders | 2% |
Two or More Races | 1% |
American Indian or Alaska Native | 0% |
Other | 8% |
VTP | Percentage (# Participants) |
Metagenomics + Microbial Surveillance | 67% (115) |
Single-cell Transcriptomics + Lung Cell Characterization | 13% (22) |
Variant Calling + COVID-19 | 12% (20) |
Single-cell Transcriptomics + Habenula Neuron Characterization | 8% (13) |
The project was successful overall. All participants reported they would recommend the VTPs to other students. A majority (> 60% of the students) requested information on our 2-3 week extension VTPs. Many (> 15% of the students) have served as Assistant Mentors to new cohorts of the VTPs they previously completed and, in multiple cases, helped participants with considerably more education or experience than they had.
An impact assessment of MILRD scholarship undergraduate and high school students from the MILRD/MindsOf collaboration showed substantial increases in self-reported knowledge across all assessed categories following VTP completion.
In 2022, we plan to (1) enroll whole classes at high schools and colleges/universities, for which we have already secured multiple course integrations, and (2) enroll more teachers in VTPs for Professional Development/CTLE credit to promote integration of MILRD’s VTPs, and subsets of VTPs, into their courses.
To understand the impact of our VTPs, we conducted a pre/post self-efficacy study with 20 students from the MILRD/MindsOf project who completed the Metagenomics + Microbial Surveillance VTP between January 1, 2022 and January 31, 2022. We selected this VTP because the greatest number of students completed it.
This study assessed the Metagenomics + Microbial Surveillance VTP as an intervention to increase knowledge of: genomics data format/structure, metagenomics sample processing & analysis, Linux/Bash terminal use, R/RStudio use, and applications of bioinformatics tools.
The study design is a within-subjects (pre-post) design where we assess relevant dependent measures immediately before and immediately after workshop participation. There is no control group.
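As a minimal sketch of how a within-subjects (pre-post) comparison of this kind can be analysed in R, the example below runs a paired t-test on made-up ratings for five hypothetical participants. The vectors `pre_scores` and `post_scores` are illustrative and are not the study data; the one assumption is that position i in both vectors refers to the same participant.

```r
# Illustrative ratings for five hypothetical participants (not study data).
pre_scores  <- c(0, 1, 0, 2, 1)
post_scores <- c(3, 4, 2, 5, 4)

# Paired t-test: each participant serves as their own control,
# so the test is computed on the per-participant differences.
result <- t.test(post_scores, pre_scores, paired = TRUE)

# Degrees of freedom for a paired test are n - 1.
result$parameter

# Mean of the per-participant differences (post minus pre).
result$estimate
```

Without `paired = TRUE`, `t.test()` treats the two vectors as independent samples, which does not match a design in which the same people answer both surveys.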
Of the 20 MILRD/MindsOf students in this study, 17 identified as ‘Black or African American’ and 3 identified as ‘Asian or Pacific Islander’. No students indicated they were of Hispanic or Latino descent.
The group was neither large nor varied enough to assess gender and education level as covariates:
library(flextable)
library(magrittr)
# Demographic breakdown of the 20 study participants
df_3 <- data.frame(
  Category = c("Gender*", "Education Level"),
  Participants = c("18: identified as 'Female'; 1: identified as 'Male'; 1: not reported", "17: 'undergraduate'; 3: 'high school'")
)
table_3 <- df_3 %>% flextable()
table_3 <- bold(table_3, bold = TRUE, part = "header")
# Header labels are keyed by the column names in df_3 (Category, Participants)
table_3 <- set_header_labels(table_3, Category = "Category", Participants = "Participants") %>% autofit()
table_3
Category | Participants |
Gender* | 18: identified as 'Female'; 1: identified as 'Male'; 1: not reported |
Education Level | 17: 'undergraduate'; 3: 'high school' |
Students were asked to rate their knowledge level for each survey item on a whole-number scale from 0 (None) to 6 (Expert):
Independent Variable
: Time of Assessment (pre, post)
Dependent Measures
: knowledge of (1) genomics data format/structure, (2) metagenomics sample processing, (3) metagenomics analysis, (4) Linux/Bash terminal use, (5) R/RStudio use, and (6) applications of bioinformatics tools.
Overall, students reported substantial increases in knowledge across all assessed categories following VTP completion: (a) genomics data format knowledge (Cohen’s d = 2.67), (b) metagenomics data collection/processing knowledge (Cohen’s d = 4.15), (c) metagenomics analysis knowledge (Cohen’s d = 3.28), (d) Linux/Bash terminal knowledge (Cohen’s d = 2.37), (e) R/RStudio knowledge (Cohen’s d = 3.29), and (f) bioinformatics application knowledge (Cohen’s d = 2.21).
Please note: because students replied to the survey with whole-number answers, some of the lines that connect pre/post responses lie on top of each other, so fewer than 20 distinct lines may be visible in a given plot.
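To illustrate why overlapping lines occur, the sketch below counts how many distinct (pre, post) response pairs exist in a small made-up data set. The data frame `responses` and its values are hypothetical and are not the study data; the column names simply mirror the `Pre_VTP` and `Post_VTP` labels used in the plots.

```r
# Illustrative whole-number ratings for six hypothetical participants (not study data).
responses <- data.frame(
  Pre_VTP  = c(0, 0, 1, 0, 1, 2),
  Post_VTP = c(3, 3, 4, 3, 4, 5)
)

# Participants with identical (pre, post) pairs are drawn as the same line segment,
# so the number of visible lines equals the number of distinct rows.
distinct_lines <- nrow(unique(responses))
overlapping    <- nrow(responses) - distinct_lines

distinct_lines
overlapping
```

Running the same two lines on a real item's pre/post data frame would show how many of the 20 lines are hidden behind others.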
#library(vioplot)
library(ggpubr)
#library(ggplot2)
library(effsize)
pre = read.csv('pre_metagenomics_for_csv.csv', header=TRUE, sep=',')
post = read.csv('post_metagenomics_for_csv.csv', header=TRUE, sep=',')
#I understand how genomics data are structured_formatted
pre_genomics_data <- pre$I.understand.how.genomics.data.are.structured_formatted
post_genomics_data <- post$I.understand.how.genomics.data.are.structured_formatted
summary(pre_genomics_data)
summary(post_genomics_data)
mean(post_genomics_data) - mean(pre_genomics_data)
sd(pre_genomics_data)
sd(post_genomics_data)
t.test(post_genomics_data, pre_genomics_data, paired = TRUE)  # paired: same participants pre and post, rows in matching order
cohen.d(post_genomics_data, pre_genomics_data)
genomics_data <- data.frame(Pre_VTP = pre_genomics_data, Post_VTP = post_genomics_data)
ggpaired(genomics_data, cond1 = "Pre_VTP", cond2 = "Post_VTP",
color = "condition", fill = NULL, line.color = "black", palette = 'ucscgb', title = "I understand how genomics data are structured/formatted", xlab = " ", ylab = "Student Response")
A paired-samples t-test showed that, as hypothesized, participants reported lower levels of genomics data format knowledge before completing the intervention (M = 0.550, SD = 0.999) than after (M = 3.55, SD = 1.23), Mdiff = 3.0, t(20) = 8.45, p < .0001. The effect size is very large (Cohen’s d = 2.67).
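As a rough check on how an effect size of this magnitude arises, the sketch below recomputes Cohen's d from the rounded means and standard deviations quoted above, assuming the common pooled-standard-deviation definition for two sets of scores of equal size. The variable names are illustrative, and because the inputs are rounded the result agrees with the reported value only approximately.

```r
# Summary statistics as quoted in the text for the genomics data format item.
mean_pre  <- 0.550
sd_pre    <- 0.999
mean_post <- 3.55
sd_post   <- 1.23

# Pooled SD for two sets of scores of equal size.
sd_pooled <- sqrt((sd_pre^2 + sd_post^2) / 2)

# Cohen's d: the mean difference expressed in pooled-SD units.
d <- (mean_post - mean_pre) / sd_pooled
round(d, 2)
```

The value lands within rounding error of the reported d = 2.67: the average rating rose by well over two pooled standard deviations.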
#I understand how metagenomics data are collected and processed
pre_metagenomics_collected <- pre$I.understand.how.metagenomics.data.are.collected.and.processed
post_metagenomics_collected <- post$I.understand.how.metagenomics.data.are.collected.and.processed
summary(pre_metagenomics_collected)
summary(post_metagenomics_collected)
mean(post_metagenomics_collected) - mean(pre_metagenomics_collected)
sd(pre_metagenomics_collected)
sd(post_metagenomics_collected)
t.test(post_metagenomics_collected, pre_metagenomics_collected, paired = TRUE)  # paired: same participants pre and post, rows in matching order
cohen.d(post_metagenomics_collected, pre_metagenomics_collected)
metagenomics_collected <- data.frame(Pre_VTP = pre_metagenomics_collected, Post_VTP = post_metagenomics_collected)
ggpaired(metagenomics_collected, cond1 = "Pre_VTP", cond2 = "Post_VTP",
color = "condition", fill = NULL, line.color = "black", palette = 'ucscgb', title = "I understand how metagenomics data are collected and processed", xlab = "", ylab = "Student Response")
A paired-samples t-test showed that, as hypothesized, participants reported lower levels of metagenomics data collection/processing knowledge before completing the intervention (M = 0.450, SD = 0.605) than after (M = 3.85, SD = 0.988), Mdiff = 3.40, t(20) = 13.1, p < .0001. The effect size is very large (Cohen’s d = 4.15).
#I understand how metagenomics data are analyzed
pre_metagenomics_analyzed <- pre$I.understand.how.metagenomics.data.are.analyzed
post_metagenomics_analyzed <- post$I.understand.how.metagenomics.data.are.analyzed
summary(pre_metagenomics_analyzed)
summary(post_metagenomics_analyzed)
mean(post_metagenomics_analyzed) - mean(pre_metagenomics_analyzed)
sd(pre_metagenomics_analyzed)
sd(post_metagenomics_analyzed)
t.test(post_metagenomics_analyzed, pre_metagenomics_analyzed, paired = TRUE)  # paired: same participants pre and post, rows in matching order
cohen.d(post_metagenomics_analyzed, pre_metagenomics_analyzed)
metagenomics_analyzed <- data.frame(Pre_VTP = pre_metagenomics_analyzed, Post_VTP = post_metagenomics_analyzed)
ggpaired(metagenomics_analyzed, cond1 = "Pre_VTP", cond2 = "Post_VTP",
color = "condition", fill = NULL, line.color = "black", palette = 'ucscgb', title = "I understand how metagenomics data are analyzed", xlab = "", ylab = "Student Response")
A paired-samples t-test showed that, as hypothesized, participants reported lower levels of metagenomics analysis knowledge before completing the intervention (M = 0.400, SD = 0.598) than after (M = 3.65, SD = 1.27), Mdiff = 3.25, t(20) = 10.4, p < .0001. The effect size is very large (Cohen’s d = 3.28).
#I understand how to use the Linux/Bash terminal
pre_linux_bash <- pre$I.understand.how.to.use.the.Linux_Bash.terminal
post_linux_bash <- post$I.understand.how.to.use.the.Linux_Bash.terminal
summary(pre_linux_bash)
summary(post_linux_bash)
mean(post_linux_bash) - mean(pre_linux_bash)
sd(pre_linux_bash)
sd(post_linux_bash)
t.test(post_linux_bash, pre_linux_bash, paired = TRUE)  # paired: same participants pre and post, rows in matching order
cohen.d(post_linux_bash, pre_linux_bash)
linux_bash <- data.frame(Pre_VTP = pre_linux_bash, Post_VTP = post_linux_bash)
ggpaired(linux_bash, cond1 = "Pre_VTP", cond2 = "Post_VTP",
color = "condition", fill = NULL, line.color = "black", palette = 'ucscgb', title = "I understand how to use the Linux/Bash terminal", xlab = "", ylab = "Student Response")
A paired-samples t-test showed that, as hypothesized, participants reported lower levels of Linux/Bash terminal knowledge before completing the intervention (M = 0.450, SD = 1.28) than after (M = 3.65, SD = 1.42), Mdiff = 3.20, t(20) = 7.48, p < .0001. The effect size is very large (Cohen’s d = 2.37).
#I understand how to use the R programming language and R/RStudio
pre_r <- pre$I.understand.how.to.use.R_RStudio
post_r <- post$I.understand.how.to.use.R_RStudio
summary(pre_r)
summary(post_r)
mean(post_r) - mean(pre_r)
sd(pre_r)
sd(post_r)
t.test(post_r, pre_r, paired = TRUE)  # paired: same participants pre and post, rows in matching order
cohen.d(post_r, pre_r)
r <- data.frame(Pre_VTP = pre_r, Post_VTP = post_r)
ggpaired(r, cond1 = "Pre_VTP", cond2 = "Post_VTP",
color = "condition", fill = NULL, line.color = "black", palette = 'ucscgb', title = "I understand how to use R/RStudio", xlab = "", ylab = "Student Response")
A paired-samples t-test showed that, as hypothesized, participants reported lower levels of R/RStudio knowledge before completing the intervention (M = 0.350, SD = 0.8123) than after (M = 3.80, SD = 1.24), Mdiff = 3.45, t(20) = 10.4, p < .0001. The effect size is very large (Cohen’s d = 3.29).
#I understand how bioinformatics tools can be used to answer a scientific question
pre_scientific_question <- pre$I.understand.how.bioinformatics.tools.can.be.used.to.answer.a.scientific.question
post_scientific_question <- post$I.understand.how.bioinformatics.tools.can.be.used.to.answer.a.scientific.question
summary(pre_scientific_question)
summary(post_scientific_question)
mean(post_scientific_question) - mean(pre_scientific_question)
sd(pre_scientific_question)
sd(post_scientific_question)
t.test(post_scientific_question, pre_scientific_question, paired = TRUE)  # paired: same participants pre and post, rows in matching order
cohen.d(post_scientific_question, pre_scientific_question)
scientific_question <- data.frame(Pre_VTP = pre_scientific_question, Post_VTP = post_scientific_question)
ggpaired(scientific_question, cond1 = "Pre_VTP", cond2 = "Post_VTP",
color = "condition", fill = NULL, line.color = "black", palette = 'ucscgb', title = "I understand how bioinformatics tools can help answer a scientific question", xlab = "", ylab = "Student Response")
A paired-samples t-test showed that, as hypothesized, participants reported lower levels of bioinformatics application knowledge before completing the intervention (M = 1.25, SD = 1.25) than after (M = 3.95, SD = 1.19), Mdiff = 2.7, t(20) = 6.99, p < .0001. The effect size is very large (Cohen’s d = 2.21).
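The six per-item analyses above follow the same steps, so they could be wrapped in a single helper and applied to every survey column. The sketch below is one possible arrangement in base R, not the code used for this report: `summarise_item`, `pre_demo`, `post_demo`, and the item names are all hypothetical, and the numbers are made up for illustration.

```r
# Hypothetical helper: summarise one survey item given matched pre/post vectors.
summarise_item <- function(pre_scores, post_scores) {
  stopifnot(length(pre_scores) == length(post_scores))
  test <- t.test(post_scores, pre_scores, paired = TRUE)
  data.frame(
    mean_pre  = mean(pre_scores),
    mean_post = mean(post_scores),
    mean_diff = mean(post_scores) - mean(pre_scores),
    t         = unname(test$statistic),
    df        = unname(test$parameter),
    p_value   = test$p.value
  )
}

# Illustrative data frames with two made-up items (not study data).
pre_demo  <- data.frame(item_a = c(0, 1, 0, 2), item_b = c(1, 0, 1, 1))
post_demo <- data.frame(item_a = c(3, 4, 2, 5), item_b = c(4, 3, 3, 5))

# Apply the helper to every shared column and stack the results into one table.
results <- do.call(rbind, lapply(names(pre_demo), function(item) {
  cbind(item = item, summarise_item(pre_demo[[item]], post_demo[[item]]))
}))
results
```

Keeping the steps in one function means a change to the analysis, such as adding an effect size column, is made once rather than six times.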
Participants had overwhelmingly positive experiences:
All participants reported they would recommend the VTPs to other students.
A majority (> 60% of the students) requested information about our 2-3 week extension VTPs, if/when funds become available. VTP Extension Projects allow participants to dive deeper into a subject and its analysis methods after completing the initial VTP. In Extension Projects, participants complete a customized and independent research project with that VTP’s dataset and present a capstone to an established domain expert who can assess the work and provide an evaluation that can be used for a recommendation letter if desired.
Many (> 15% of the students) have served as Assistant Mentors (AMs) to new cohorts of the VTPs they previously completed and, in multiple cases, helped participants with considerably more education or experience than they had. The AM role is similar to that of a TA. We ask AMs for a 4-hour time commitment throughout the VTP, and they help new participants address the issues they themselves encountered during their own cohort (AMs are not expected to be experts or to address questions they are not comfortable answering). Assistant Mentoring helps participants gain a deeper understanding of the material and interact with more researchers from academia and industry. MILRD pays an honorarium to each AM for their efforts.
Naomi Zakimi was a Genetics and Genomics major at UC Davis and completed the Metagenomics + Microbial Surveillance VTP. She completed her degree shortly after completing the VTP and is now working at a bioinformatics lab (also at UC Davis).
Naomi later served as an Assistant Mentor to a new Metagenomics VTP cohort and additionally completed our Single-cell Transcriptomics + Lung Cell Characterization VTP.
Thank you…for this amazing opportunity! I learned so much from this VTP and found the topic very interesting
Anthony completed the Variant Calling + COVID-19 VTP as a high school student in Jamaica. He is currently a first-year student at the University of Miami, double majoring in Finance and Mathematics with a minor in Computer Science.
Here are Anthony’s words about his experience in the MILRD VTP program:
This [VTP] program was so beneficial to my development. As a prospective Data Scientist, I got the opportunity to work in an RStudio environment and get more acquainted with R. It has definitely helped me to solidify my knowledge of the programming language. One of my favourite parts was interacting with the mentors, along with the other students doing the program with me. It was a unique opportunity to interact with persons more knowledgeable than me without feeling left out… The VTP gave me unprecedented access to research materials and undoubtedly developed my data and computational skills. I would recommend this VTP to anyone who is interested in a data related field.
…I was surrounded (virtually) with extremely smart individuals, some with PhDs, who made me feel welcome and excited to tackle the problems at hand. It was by no means easy, but by collaborating with the group and working together to fix bugs, we were successful in the end…. As a recent self-taught programmer at the time, I had never worked with such a large code base. That said, I was able to learn a few tips and tricks on how to structure and write clean and effective code, which has been helping me ever since.
It has helped me so much that I have gone on to do many exciting things with code and I even decided to tack on a Computer Science minor to my already strenuous course load…. All in all, my participation in the program helped me a lot and I am very grateful.
In 2022, we plan to:
(1) Scale our programs more quickly by enrolling whole classes at high schools and colleges; we have already secured many course integrations;
(2) Enroll more teachers in VTPs for Professional Development/CTLE credit to promote integration of MILRD’s VTPs into the courses they are teaching; and
(3) Experiment with launching subsets of VTPs so teachers can integrate bioinformatics/computational biology exercises into their courses, even if they are unable to commit 1-2 weeks to a full VTP.
Per (1), one of MILRD’s focuses in 2021 was to seek out partnerships with schools and education organizations so that we could more easily enroll large student cohorts moving forward. This effort really started to bear fruit in Q4 of 2021 and is the result of two notable initiatives:
MILRD presented a VTP workshop, and was a non-profit exhibitor, at the annual conference of the Science Teachers Association of New York State (STANYS), New York’s oldest professional organization of Pre-K to University public-school science educators.
We integrated one of our VTPs into a SUNY Binghamton Undergraduate Microbiology Course.
Both of these initiatives have led to partnerships with schools and education organizations. We have confirmed course integrations at high-need colleges, high schools and education organizations totaling over 500 students, pending procurement of funds.
Per (2) and (3), MILRD recently completed our first teacher professional development (PD) event for CTLE credit in collaboration with STANYS (8 hours CTLE credit, Feb 1-10, 2022) with 9 teachers. We plan to do more PD events with STANYS and are actively recruiting other organizations/schools to partner with for these offerings.
The STANYS PD event highlighted that some teachers want to integrate bioinformatics/data science into their biology courses, but that the 1-2 week commitment for a full VTP isn’t feasible for the courses they teach. We’re working to offer VTP subsets so teachers can integrate smaller (0.5 - 1.5 hours total effort) bioinformatics/data science exercises into their courses.