I am a fourth-year PhD student in applied statistics at Yale, with a strong background in statistics and proficient programming skills. My research focuses on statistical learning methods with applications to genetic and genomic data, specifically genetic risk prediction and clustering methods for gene co-expression networks. I have extensive experience with large datasets from various research projects, which has deepened my interest in solving real-world problems with data.


2013-2018 Ph.D. in Biostatistics, Yale University
2009-2013 B.S. in Mathematics and Statistics, Peking University

Honors and Awards

2017 Silver medal (top 4% among 3,307) in Kaggle Challenge: Quora Question Pairs
2017 First Place in Citadel and Correlation One Datathon at Duke University
2013 Outstanding Graduate of Peking University
2012 Outstanding Academic Performance Award in Peking University
2012 Xianzi Zen Scholarship


2017 Applied Survival Analysis
2016 Multivariate Statistics
2014-2016 Computational Statistics
2014 Intro to Statistical Thinking
A stats tutorial


I work with Dr. Hongyu Zhao on statistical genetics, focusing on risk prediction, Bayesian statistics, data integration and network-based methods. Furthermore, I am interested in machine/deep learning approaches and their application in biomedical studies.


[8] Hu Y., Lu Q., Liu W., Zhang Y., Li M., Zhao H. Joint modeling of genetically correlated diseases and functional annotations increases accuracy of polygenic risk prediction. (PLOS Genetics, in press)
[7] Lu Q.*, Powles R.*, Abdallah S., Ou D., Wang Q., Hu Y., Lu Y., Liu W., Mukherjee S., Crane P., Zhao H. Systematic tissue-specific functional annotation of the human genome highlights immune-related DNA elements for late-onset Alzheimer's disease. (Under review)
[6] Hu Y.*, Lu Q.*, Powles R., Yao X., Yang C., Fang F., Xu X., Zhao H. Leveraging functional annotations in genetic risk prediction for human complex diseases. (PLOS Computational Biology, in press)
[5] Hu, Y., Zhao, H. (2016). CCor: a whole genome network-based similarity measure between two genes. Biometrics, 72(4):1216.
[4] Li, M., Foli, Y., Liu, Z., Wang, G., Hu Y., et al. (2016) High frequency of mitochondrial DNA mutations in HIV-infected treatment-experienced individuals. HIV Medicine, 18(1), 45-55.
[3] Lu Q., Hu Y., Sun J., Cheng Y., Cheung K., Zhao H. (2015). A statistical framework to predict functional non-coding regions in the human genome through integrated analysis of annotation data. Scientific Reports, 5, 10576.
[2] Xi, R., Li, Y., Hu, Y. (2015). Bayesian Quantile Regression Based on the Empirical Likelihood with Spike and Slab Priors. Bayesian Analysis, 11, 821-855.
[1] Lu, Q., Yao, X., Hu, Y., Zhao, H. (2015). GenoWAP: Post-GWAS Prioritization Through Integrated Analysis of Genomic Functional Annotation. Bioinformatics, 32(4): 542-548.


08/2016 JSM 2016, Chicago, IL
04/2016 The 30th NESS, Yale University, New Haven, CT (poster)



Joint modeling of genetically correlated diseases and integrative functional annotations to predict disease risk using large-scale GWAS summary statistics. (Python package)


Genetic risk prediction using large-scale GWAS summary data and integrative functional annotations. (Python package)


A network-based approach to measure gene co-expression and detect modules. (R package)


Post-GWAS Prioritization through Integrated Analysis of Functional Annotation. (Python package)


Address: 60 College Street, New Haven, CT, 06520
Email: yiming DOT hu AT yale DOT edu
Yale Biostatistics
Dr. Hongyu Zhao's Lab | Center for Statistical Genomics and Proteomics