Recent Publications

An endeavor central to precision medicine is predictive biomarker discovery; such biomarkers define patient subpopulations that stand to benefit …

Adenoid cystic carcinoma (ACC) is the second most common cancer type arising from the salivary gland. The frequent occurrence of …

Ionizing radiation is a well-appreciated health risk, a precipitant of DNA damage, and a contributor to DNA methylation variability. …

The widespread availability of high-dimensional biological data has made the simultaneous screening of many biological characteristics …

Covariance matrices play fundamental roles in myriad statistical procedures. When the observations in a dataset far outnumber the …

Software

uniCATE implements semiparametric inference procedures for variable importance parameters that assess biomarkers’ capacity to modify treatment effects in high-dimensional clinical trials.


The cvCovEst R package implements a data-adaptive framework for asymptotically optimal covariance matrix estimator selection in high dimensions.


The scPCA R package implements sparse contrastive principal component analysis (scPCA), a variant of PCA that extracts sparse, stable, and interpretable signal by contrasting a target dataset with a background dataset.

Teaching

University of California, Berkeley

  • The Foundations of Data Science, Data 8 (Summer ’20) – Instructor
  • Statistical Analysis of Categorical Data, PBHLTH 241 (Spring ’20) – Graduate Student Instructor
  • Principles and Techniques of Data Science, Data 100 (Spring ’19, Fall ’19) – Graduate Student Instructor
  • Introduction to Probability and Statistics in Biology and Public Health, PBHLTH 142 (Fall ’18) – Graduate Student Instructor

Selected Experiences


Data Science Intern

Genentech / Roche

May 2021 – Present · Remote
Develop flexible, interpretable approaches for predictive biomarker discovery and benchmark them against competing methods. Implement an R function that produces swimmer plots for efficiently summarizing Phase I clinical trial data.

Graduate Student Researcher

University of California, Berkeley Superfund

Aug 2020 – Present · Berkeley, CA, United States
Develop and apply novel statistical methods to analyze data collected by the organization’s environmental health scientists and epidemiologists, with the goal of better understanding how chemical exposures affect human health.

Instructor, Data 8: The Foundations of Data Science

University of California, Berkeley

May 2020 – Aug 2020 · Berkeley, CA, United States
Taught foundational concepts in statistics and computer science to over 400 students while managing a team of teaching assistants.

Graduate Student Intern

Sutter Health – Research, Development and Dissemination

Jun 2019 – Aug 2019 · Walnut Creek, CA, United States
Developed a statistical learning pipeline to assess a patient’s risk of developing sepsis during a hospital visit.