Executive Summary

In Q4 of 2020, MILRD received funds from the Illumina Corporate Foundation and multiple individual donors to enroll aspiring scientists at the high school and college levels from underrepresented backgrounds and teachers from high-need schools in MILRD’s Virtual Training Projects (VTPs). MILRD used these funds to support scholarship students and teachers through January 2022. All told, 170 scholarship participants were supported with these funds.

MILRD Scholarship participants completed one of four VTPs:

  1. Metagenomics + Microbial Surveillance (Collaboration with Mason Lab, Cornell Medical Center)
  2. Single-cell Transcriptomics + Lung Cell Characterization (Collaboration with Dr. Martina Bradic, Memorial Sloan Kettering Cancer Center)
  3. Variant Calling + COVID-19 (Collaboration with Dr. Adriana Heguy, NYU Medical Center)
  4. Single-cell Transcriptomics + Habenula Neuron Characterization (Collaboration with Dr. Mike Wallace, Sabatini Lab at Harvard Medical School)

Each VTP takes 1-2 weeks and requires 15-20 hours of independent work (in addition to the three scheduled hour-long meetings).

VTPs include:

  • Unlimited support from expert mentors (group channel + video conference calls)
  • Access to all required high-performance compute resources (AWS), analysis tools, and software
  • Access to all source data required to complete the project
  • Optional Pre-VTP Preparation (completed the week prior to the VTP)

Scholarship Participant Breakdown

Scholarship participants from underrepresented and/or socioeconomically disadvantaged backgrounds and high-need organizations were sourced via our collaborators, including Social Good Fund, MindsOf Initiative, Science Teachers Association of New York State, and the State University of New York.

Education/Career Level
Gender*

*Participants are presented with a blank text box when asked to report their gender.

Race & Ethnicity
VTP Enrollment

Mentors

Conclusions

The project was successful overall. All participants reported they would recommend the VTPs to other students. A majority (more than 60%) of students requested information on our 2-3 week extension VTPs. More than 15% of students have served as Assistant Mentors to new cohorts of the VTPs they previously completed and, in multiple cases, helped participants who had considerably more education or experience than they did.

An impact assessment of MILRD scholarship undergraduate and high school students from the MILRD/MindsOf collaboration showed substantial increases in self-reported knowledge across all assessed categories following VTP completion.

In 2022, we plan to (1) enroll whole classes at high schools and colleges/universities, for which we have already secured multiple course integrations, and (2) enroll more teachers in VTPs for Professional Development/CTLE credit to promote integration of MILRD’s VTPs, and subsets of VTPs, into their courses.

Impact Assessment

Overview

To understand the impact of our VTPs, we conducted a pre/post self-efficacy study with 20 students from the MILRD/MindsOf project who completed the Metagenomics + Microbial Surveillance VTP between January 1, 2022 and January 31, 2022. We selected this VTP because the greatest number of students completed it.

Variables and Measures

This study assessed the Metagenomics + Microbial Surveillance VTP as an intervention to increase knowledge of: genomics data format/structure, metagenomics sample processing & analysis, Linux/Bash terminal use, R/RStudio use, and applications of bioinformatics tools.

The study uses a within-subjects (pre-post) design: we assessed the dependent measures immediately before and immediately after workshop participation. There is no control group.
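Because each student answers the same questions before and after the VTP, a within-subjects design is typically analyzed on per-student differences. The sketch below illustrates this with made-up scores for five hypothetical students (not study data) and base R only. Note that t.test() treats its two inputs as independent samples unless paired = TRUE is passed.

```r
# Hypothetical pre/post scores for five students, listed in the same order
# (illustrative values only; these are not responses from the study)
pre_scores  <- c(0, 1, 0, 2, 1)
post_scores <- c(3, 4, 2, 5, 3)

# Paired t-test: tests whether the mean per-student change differs from zero
paired_result <- t.test(post_scores, pre_scores, paired = TRUE)
paired_df <- unname(paired_result$parameter)  # degrees of freedom = n - 1

# Effect size for paired data: mean change divided by the SD of the changes
score_change <- post_scores - pre_scores
d_z <- mean(score_change) / sd(score_change)

print(paired_result)
print(d_z)
```

A paired test on n students has n - 1 degrees of freedom, and its effect size depends on how consistent the per-student changes are, so it can differ from an unpaired Cohen’s d computed on the same scores.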

Of the 20 MILRD/MindsOf students in this study, 17 identified as ‘Black or African American’ and 3 identified as ‘Asian or Pacific Islander’. No students indicated they were of Hispanic or Latino descent.

The group was neither large nor varied enough to assess gender and education level as covariates:

library(flextable)
library(magrittr)
# Summarize participant gender and education level in a two-column table
df_3 <- data.frame(Category = c("Gender*", "Education Level"), Participants = c("18: identified as 'Female'; 1: identified as 'Male'; 1: not reported", "17: 'undergraduate'; 3: 'high school'"))
table_3 <- df_3 %>% flextable()
table_3 <- bold(table_3, bold = TRUE, part = "header")
# The second column is named Participants, so that is the column to relabel
table_3 <- set_header_labels(table_3, Category = "Category", Participants = "Breakdown") %>% autofit()
table_3
*Participants are presented with a blank text box when asked to report their gender.

Students were asked to rate their knowledge level for each of these statements on a whole-number scale from 0 (None) to 6 (Expert):

  • I understand how genomics data are structured/formatted
  • I understand how metagenomics data are collected and processed
  • I understand how metagenomics data are analyzed
  • I understand how to use the Linux/Bash terminal
  • I understand how to use R/RStudio
  • I understand how bioinformatics tools can be used to answer a scientific question
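Before analysis, it can help to confirm that every response is a whole number within the scale's range. The helper below is only a sketch: is_valid_response is a hypothetical name, the example responses are made up, and the default bounds of 0 and 6 are taken from the scale endpoints described above (adjust them if the survey used different bounds).

```r
# Hypothetical helper: TRUE for whole-number responses within the scale bounds
is_valid_response <- function(x, min_score = 0, max_score = 6) {
  !is.na(x) & x == round(x) & x >= min_score & x <= max_score
}

# Made-up responses, including a fraction, an out-of-range value, and a blank
responses <- c(0, 3, 6, 2.5, 7, NA)
valid <- is_valid_response(responses)
print(valid)
```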

Independent Variable: Time of Assessment (pre, post)

Dependent Measures: knowledge of (1) genomics data format/structure, (2) metagenomics sample processing, (3) metagenomics analysis, (4) Linux/Bash terminal use, (5) R/RStudio use, and (6) applications of bioinformatics tools.

Results

Overall, students reported substantial increases in knowledge across all assessed categories following VTP completion: (a.) genomics data format knowledge (Cohen’s d = 2.67), (b.) metagenomics data collection/processing knowledge (Cohen’s d = 4.15), (c.) metagenomics analysis knowledge (Cohen’s d = 3.28), (d.) Linux/Bash terminal knowledge (Cohen’s d = 2.37), (e.) R/RStudio knowledge (Cohen’s d = 3.29), (f.) bioinformatics application knowledge (Cohen’s d = 2.21).

Please note: because students responded with whole numbers, some of the lines connecting pre/post responses overlap, so fewer than 20 distinct lines may be visible in a plot.
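One way to see how many lines a paired plot can actually show is to count the distinct pre/post response pairs. The sketch below uses made-up responses for six hypothetical students (not study data) and base R only; the variable names are illustrative.

```r
# Made-up whole-number responses for six students, in the same order
pre_scores  <- c(0, 0, 1, 0, 1, 0)
post_scores <- c(3, 3, 4, 3, 4, 5)
response_pairs <- data.frame(Pre_VTP = pre_scores, Post_VTP = post_scores)

# Each distinct (pre, post) pair is drawn as one visible line
n_visible_lines <- nrow(unique(response_pairs))

# Count how many students share each pair
response_pairs$students <- 1
pair_counts <- aggregate(students ~ Pre_VTP + Post_VTP,
                         data = response_pairs, FUN = sum)

print(n_visible_lines)
print(pair_counts)
```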

a. Genomics data format

library(ggpubr)   # ggpaired() for the pre/post plots
library(effsize)  # cohen.d() for effect sizes

# Load the pre- and post-VTP survey responses
pre <- read.csv('pre_metagenomics_for_csv.csv', header = TRUE, sep = ',')
post <- read.csv('post_metagenomics_for_csv.csv', header = TRUE, sep = ',')

# I understand how genomics data are structured/formatted
pre_genomics_data <- pre$I.understand.how.genomics.data.are.structured_formatted
post_genomics_data <- post$I.understand.how.genomics.data.are.structured_formatted

# Descriptive statistics
summary(pre_genomics_data)
summary(post_genomics_data)

mean(post_genomics_data) - mean(pre_genomics_data)

sd(pre_genomics_data)
sd(post_genomics_data)

# Without paired = TRUE, t.test() runs a Welch two-sample t-test
t.test(post_genomics_data, pre_genomics_data)

# cohen.d() defaults to an unpaired effect size based on the pooled SD
cohen.d(post_genomics_data, pre_genomics_data)

# ggpaired() connects values row by row, so both files must list students in the same order
genomics_data <- data.frame(Pre_VTP = pre_genomics_data, Post_VTP = post_genomics_data)
ggpaired(genomics_data, cond1 = "Pre_VTP", cond2 = "Post_VTP",
         color = "condition", fill = NULL, line.color = "black", palette = 'ucscgb', title = "I understand how genomics data are structured/formatted", xlab = "", ylab = "Student Response")

A Welch two-sample t-test showed that, as hypothesized, participants reported lower levels of genomics data format knowledge before completing the intervention (M = 0.550, SD = 0.999) than after (M = 3.55, SD = 1.23), Mdiff = 3.00, t = 8.45, p < .0001. The effect size is very large (Cohen’s d = 2.67).
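As a rough check, the reported effect size and t statistic can be approximately recovered from the rounded means and standard deviations alone. The sketch below assumes 20 responses at each time point and an unpaired comparison (a pooled-SD Cohen’s d and a Welch standard error). The variable names are illustrative, and small differences from the reported values are expected because the inputs are rounded.

```r
# Rounded summary statistics reported for the genomics data format item
n <- 20                               # responses at each time point
mean_pre <- 0.550; sd_pre <- 0.999
mean_post <- 3.55; sd_post <- 1.23

mean_diff <- mean_post - mean_pre

# Cohen's d with a pooled SD (equal group sizes, so the variances are averaged)
pooled_sd <- sqrt((sd_pre^2 + sd_post^2) / 2)
cohens_d <- mean_diff / pooled_sd

# Welch t statistic: mean difference over the unpaired standard error
se_unpaired <- sqrt(sd_pre^2 / n + sd_post^2 / n)
t_unpaired <- mean_diff / se_unpaired

print(round(c(mean_diff = mean_diff, cohens_d = cohens_d, t = t_unpaired), 2))
```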

b. Metagenomics data collection

# I understand how metagenomics data are collected and processed

pre_metagenomics_collected <- pre$I.understand.how.metagenomics.data.are.collected.and.processed
post_metagenomics_collected <- post$I.understand.how.metagenomics.data.are.collected.and.processed

summary(pre_metagenomics_collected)
summary(post_metagenomics_collected)

mean(post_metagenomics_collected) - mean(pre_metagenomics_collected)

sd(pre_metagenomics_collected)
sd(post_metagenomics_collected)

# Without paired = TRUE, t.test() runs a Welch two-sample t-test
t.test(post_metagenomics_collected, pre_metagenomics_collected)

# cohen.d() defaults to an unpaired effect size based on the pooled SD
cohen.d(post_metagenomics_collected, pre_metagenomics_collected)

metagenomics_collected <- data.frame(Pre_VTP = pre_metagenomics_collected, Post_VTP = post_metagenomics_collected)
ggpaired(metagenomics_collected, cond1 = "Pre_VTP", cond2 = "Post_VTP",
         color = "condition", fill = NULL, line.color = "black", palette = 'ucscgb', title = "I understand how metagenomics data are collected and processed", xlab = "", ylab = "Student Response")

A Welch two-sample t-test showed that, as hypothesized, participants reported lower levels of metagenomics data collection/processing knowledge before completing the intervention (M = 0.450, SD = 0.605) than after (M = 3.85, SD = 0.988), Mdiff = 3.40, t = 13.1, p < .0001. The effect size is very large (Cohen’s d = 4.15).

c. Metagenomics analysis

#I understand how metagenomics data are analyzed    

pre_metagenomics_analyzed <- pre$I.understand.how.metagenomics.data.are.analyzed
post_metagenomics_analyzed  <- post$I.understand.how.metagenomics.data.are.analyzed

summary(pre_metagenomics_analyzed)
summary(post_metagenomics_analyzed)

mean(post_metagenomics_analyzed) - mean(pre_metagenomics_analyzed)

sd(pre_metagenomics_analyzed)
sd(post_metagenomics_analyzed)

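# Without paired = TRUE, t.test() runs a Welch two-sample t-test
t.test(post_metagenomics_analyzed, pre_metagenomics_analyzed)

# cohen.d() defaults to an unpaired effect size based on the pooled SD
cohen.d(post_metagenomics_analyzed, pre_metagenomics_analyzed)

metagenomics_analyzed <- data.frame(Pre_VTP = pre_metagenomics_analyzed, Post_VTP = post_metagenomics_analyzed)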
ggpaired(metagenomics_analyzed, cond1 = "Pre_VTP", cond2 = "Post_VTP",
         color = "condition", fill = NULL, line.color = "black", palette = 'ucscgb', title = "I understand how metagenomics data are analyzed", xlab = "", ylab = "Student Response")

A Welch two-sample t-test showed that, as hypothesized, participants reported lower levels of metagenomics analysis knowledge before completing the intervention (M = 0.400, SD = 0.598) than after (M = 3.65, SD = 1.27), Mdiff = 3.25, t = 10.4, p < .0001. The effect size is very large (Cohen’s d = 3.28).

d. Linux/Bash terminal

#I understand how to use the Linux/Bash terminal    

pre_linux_bash <- pre$I.understand.how.to.use.the.Linux_Bash.terminal
post_linux_bash  <- post$I.understand.how.to.use.the.Linux_Bash.terminal

summary(pre_linux_bash)
summary(post_linux_bash)

mean(post_linux_bash) - mean(pre_linux_bash)

sd(pre_linux_bash)
sd(post_linux_bash)

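# Without paired = TRUE, t.test() runs a Welch two-sample t-test
t.test(post_linux_bash, pre_linux_bash)

# cohen.d() defaults to an unpaired effect size based on the pooled SD
cohen.d(post_linux_bash, pre_linux_bash)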


linux_bash <- data.frame(Pre_VTP = pre_linux_bash, Post_VTP = post_linux_bash)
ggpaired(linux_bash, cond1 = "Pre_VTP", cond2 = "Post_VTP",
         color = "condition", fill = NULL, line.color = "black", palette = 'ucscgb', title = "I understand how to use the Linux/Bash terminal", xlab = "", ylab = "Student Response")

A Welch two-sample t-test showed that, as hypothesized, participants reported lower levels of Linux/Bash terminal knowledge before completing the intervention (M = 0.450, SD = 1.28) than after (M = 3.65, SD = 1.42), Mdiff = 3.20, t = 7.48, p < .0001. The effect size is very large (Cohen’s d = 2.37).

e. R/RStudio

# I understand how to use R/RStudio

pre_r <- pre$I.understand.how.to.use.R_RStudio
post_r <- post$I.understand.how.to.use.R_RStudio

summary(pre_r)
summary(post_r)

mean(post_r) - mean(pre_r)

sd(pre_r)
sd(post_r)

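# Without paired = TRUE, t.test() runs a Welch two-sample t-test
t.test(post_r, pre_r)

# cohen.d() defaults to an unpaired effect size based on the pooled SD
cohen.d(post_r, pre_r)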

r <- data.frame(Pre_VTP = pre_r, Post_VTP = post_r)
ggpaired(r, cond1 = "Pre_VTP", cond2 = "Post_VTP",
         color = "condition", fill = NULL, line.color = "black", palette = 'ucscgb', title = "I understand how to use R/RStudio", xlab = "", ylab = "Student Response")

A Welch two-sample t-test showed that, as hypothesized, participants reported lower levels of R/RStudio knowledge before completing the intervention (M = 0.350, SD = 0.812) than after (M = 3.80, SD = 1.24), Mdiff = 3.45, t = 10.4, p < .0001. The effect size is very large (Cohen’s d = 3.29).

f. Bioinformatics applications

#I understand how bioinformatics tools can be used to answer a scientific question

pre_scientific_question <- pre$I.understand.how.bioinformatics.tools.can.be.used.to.answer.a.scientific.question
post_scientific_question <- post$I.understand.how.bioinformatics.tools.can.be.used.to.answer.a.scientific.question

summary(pre_scientific_question)
summary(post_scientific_question)

mean(post_scientific_question) - mean(pre_scientific_question)

sd(pre_scientific_question)
sd(post_scientific_question)

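# Without paired = TRUE, t.test() runs a Welch two-sample t-test
t.test(post_scientific_question, pre_scientific_question)

# cohen.d() defaults to an unpaired effect size based on the pooled SD
cohen.d(post_scientific_question, pre_scientific_question)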

scientific_question <- data.frame(Pre_VTP = pre_scientific_question, Post_VTP = post_scientific_question)
ggpaired(scientific_question, cond1 = "Pre_VTP", cond2 = "Post_VTP",
         color = "condition", fill = NULL, line.color = "black", palette = 'ucscgb', title = "I understand how bioinformatics tools can help answer a scientific question", xlab = "", ylab = "Student Response")

A Welch two-sample t-test showed that, as hypothesized, participants reported lower levels of bioinformatics application knowledge before completing the intervention (M = 1.25, SD = 1.25) than after (M = 3.95, SD = 1.19), Mdiff = 2.70, t = 6.99, p < .0001. The effect size is very large (Cohen’s d = 2.21).
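The six item-level analyses above repeat the same steps, so they could be wrapped in one helper. The sketch below is an illustration rather than the code used for this report: summarize_item is a hypothetical name, the example scores are made up, and the effect size is computed by hand in base R as an unpaired, pooled-SD Cohen’s d (intended to mirror the default behavior of cohen.d()).

```r
# summarize_item() is a hypothetical helper, not part of the original analysis
summarize_item <- function(pre_scores, post_scores) {
  n_pre <- length(pre_scores)
  n_post <- length(post_scores)
  mean_diff <- mean(post_scores) - mean(pre_scores)
  # Pooled SD across the two time points
  pooled_sd <- sqrt(((n_pre - 1) * var(pre_scores) +
                     (n_post - 1) * var(post_scores)) / (n_pre + n_post - 2))
  # Unpaired Welch test, which is what t.test() runs by default
  welch <- t.test(post_scores, pre_scores)
  data.frame(
    mean_pre = mean(pre_scores),
    mean_post = mean(post_scores),
    mean_diff = mean_diff,
    t = unname(welch$statistic),
    p = welch$p.value,
    cohens_d = mean_diff / pooled_sd
  )
}

# Made-up responses for four hypothetical students (not study data)
example <- summarize_item(pre_scores = c(0, 1, 0, 1), post_scores = c(3, 4, 3, 4))
print(example)
```

With the survey data loaded, the same call could be made once per item by passing the matching column from pre and post.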

Student Feedback

Overview

Participants had overwhelmingly positive experiences:

  • All participants reported they would recommend the VTPs to other students.

  • A majority (more than 60%) of students requested information about our 2-3 week extension VTPs, if/when funds become available. VTP Extension Projects allow participants to dive deeper into a subject and its analysis methods after completing the initial VTP. In Extension Projects, participants complete a customized, independent research project with that VTP’s dataset and present a capstone to an established domain expert, who can assess the work and provide an evaluation that can be used for a recommendation letter if desired.

  • More than 15% of students have served as Assistant Mentors (AMs) to new cohorts of the VTPs they previously completed and, in multiple cases, helped participants who had considerably more education or experience than they did. The AM role is similar to that of a TA. We ask AMs for a 4-hour time commitment throughout the VTP, and they help new participants address the issues they themselves encountered during their own cohort (AMs are not expected to be experts or to address questions they are not comfortable answering). Assistant Mentoring helps participants gain a deeper understanding of the material and interact with more researchers from academia and industry. MILRD pays an honorarium to each AM for their efforts.

Scholarship Spotlight: Naomi Zakimi

Naomi Zakimi was a Genetics and Genomics major at UC Davis when she completed the Metagenomics + Microbial Surveillance VTP. She completed her degree shortly afterward and is now working at a bioinformatics lab (also at UC Davis).

Naomi later served as an Assistant Mentor to a new Metagenomics VTP cohort and additionally completed our Single-cell Transcriptomics + Lung Cell Characterization VTP.

Thank you…for this amazing opportunity! I learned so much from this VTP and found the topic very interesting

Scholarship Spotlight: Anthony Givans

Anthony completed the Variant Calling + COVID-19 VTP as a high school student in Jamaica. He is currently a first-year student at the University of Miami, double majoring in Finance and Mathematics with a minor in Computer Science.

Here are Anthony’s words about his experience in the MILRD VTP program:

This [VTP] program was so beneficial to my development. As a prospective Data Scientist, I got the opportunity to work in an RStudio environment and get more acquainted with R. It has definitely helped me to solidify my knowledge of the programming language. One of my favourite parts was interacting with the mentors, along with the other students doing the program with me. It was a unique opportunity to interact with persons more knowledgeable than me without feeling left out… The VTP gave me unprecedented access to research materials and undoubtedly developed my data and computational skills. I would recommend this VTP to anyone who is interested in a data related field.

…I was surrounded (virtually) with extremely smart individuals, some with PhDs, who made me feel welcome and excited to tackle the problems at hand. It was by no means easy, but by collaborating with the group and working together to fix bugs, we were successful in the end…. As a recent self-taught programmer at the time, I had never worked with such a large code base. That said, I was able to learn a few tips and tricks on how to structure and write clean and effective code, which has been helping me ever since.

It has helped me so much that I have gone on to do many exciting things with code and I even decided to tack on a Computer Science minor to my already strenuous course load…. All in all, my participation in the program helped me a lot and I am very grateful.

Future Directions

In 2022, we plan to:

  1. Scale our programs more quickly by enrolling whole classes at high schools and colleges (we have already secured many course integrations);

  2. Enroll more teachers in VTPs for Professional Development/CTLE credit to promote integration of MILRD’s VTPs into the courses they are teaching; and

  3. Experiment with launching subsets of VTPs so teachers can integrate bioinformatics/computational biology exercises into their courses, even if they are unable to commit 1-2 weeks to a full VTP.

Per (1), one of MILRD’s focuses in 2021 was to seek out partnerships with schools and education organizations so that we could more easily enroll large student cohorts going forward. This effort began to bear fruit in Q4 of 2021 as the result of two notable initiatives:

  1. MILRD presented a VTP workshop, and was a non-profit exhibitor, at the annual conference of the Science Teachers Association of New York State (STANYS), New York’s oldest professional organization of Pre-K to University public-school science educators.

  2. We integrated one of our VTPs into a SUNY Binghamton Undergraduate Microbiology Course.

Both of these initiatives have led to partnerships with schools and education organizations. We have confirmed course integrations at high-need colleges, high schools and education organizations totaling over 500 students, pending procurement of funds.

Per (2) and (3), MILRD recently completed our first teacher professional development (PD) event for CTLE credit in collaboration with STANYS (8 hours CTLE credit, Feb 1-10, 2022) with 9 teachers. We plan to do more PD events with STANYS and are actively recruiting other organizations/schools to partner with for these offerings.

The STANYS PD event highlighted that some teachers want to integrate bioinformatics/data science into their biology courses, but that the 1-2 week commitment for a full VTP isn’t feasible for the courses they teach. We’re working to offer VTP subsets so teachers can integrate smaller (0.5-1.5 hours total effort) bioinformatics/data science exercises into their courses.