"Integrative Statistical Methods for Genetic Biomarker Identification" by Yuhan Xie

Integrative Statistical Methods for Genetic Biomarker Identification

Date of Award

Spring 2023

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Public Health

First Advisor

Zhao, Hongyu

Abstract

Genome-wide association studies (GWAS) have successfully identified thousands of genetic associations for many human traits and diseases. Individuals in GWAS are most commonly genotyped with microarrays. Many of the variants identified in these studies lie in intergenic regions, which makes their functional impact difficult to study. With the availability of large-scale multi-omics resources, post-GWAS analyses such as fine mapping, functional enrichment analyses, and network-assisted analyses have helped connect the functional consequences of identified variants with phenotypes of interest. Despite these efforts, a large part of heritability remains unexplained by the genetic variants identified in GWAS. One hypothesis is that this missing heritability may be due to rare variants. The development of next-generation sequencing technologies such as whole exome sequencing (WES) and whole genome sequencing (WGS) has provided great opportunities for researchers to study associations between rare variants and phenotypes. Among rare variants, de novo variants (DNVs) represent an extreme case given their very low occurrence. Mounting biological evidence has pinpointed the importance of coding germline DNVs in many diseases, such as congenital heart disease and neurodevelopmental disorders. Given the rarity of DNVs, gene-based analyses have been conducted to combine information within a gene and thereby boost the statistical power of risk gene identification.

Despite recent successes in human genetics studies, association studies based on GWAS and WES data still face many challenges. First, statistical power can be hampered by relatively small sample sizes, as in pharmacogenomics GWAS and WES studies of rare diseases. Second, as more data become available from different diseases and platforms, there is a need to make the best use of these resources for genetic biomarker identification. Third, scalability is an important factor in methodology development because of the large volume of genomic data. To address these challenges and help downstream research select the most worthwhile genetic biomarkers to follow up in biological experiments, I have developed three computationally efficient statistical methods that can boost statistical power in risk gene identification and improve biological interpretation. First, I establish a statistical framework that integrates DNVs from multiple traits and incorporates external functional annotation information. Second, I propose a statistical method that combines DNVs with functional connectivity among genes from protein-protein interaction (PPI) databases. Third, I develop a model-based meta-analysis method that can jointly test prognostic and predictive effects and assess the replicability of identified genetic biomarkers without using an independent cohort. Applications of these three methods have identified novel genetic biomarkers and may lead to new discoveries of disease mechanisms and individualized treatments.

This document is currently not available here.
