Leveraging Omics Data for Better Treatment: Computational Methods for Biomarker Identification, Drug Repositioning, and Polypharmacy Risk Assessment

Date of Award

Spring 2022

Document Type


Degree Name

Doctor of Philosophy (PhD)


Computational Biology and Bioinformatics

First Advisor

Zhao, Hongyu


With the advancement of high-throughput sequencing and massively parallel technologies, a growing volume of omics data is available for biomedical research. Genomics, transcriptomics, proteomics, metabolomics, and microbiomics data help biomedical researchers probe complex biological systems and dissect the underlying genetic mechanisms from different perspectives. Despite the great success of molecular biology research, the potential value of omics data has not been fully realized, especially for treatment discovery. This thesis focuses on developing computational methods for biomarker identification, drug repositioning, and polypharmacy risk assessment, three substantial tasks in treatment discovery and development.

The first contribution of this thesis is the development of a novel transcriptome-wide association analysis method for biomarker identification. Transcriptome-wide association studies (TWAS) have several advantages over traditional genome-wide association studies (GWAS): because they perform gene-level association tests, they reduce the multiple-testing burden and produce interpretable association results. In this project, we developed a novel TWAS method based on joint bounded-variable least squares. We trained our expression imputation models on genotype and RNA-sequencing data from the updated version of the Genotype-Tissue Expression (GTEx) project, and their imputation accuracy exceeded that of other state-of-the-art methods. For the transcripts of interest, we incorporated non-coding transcripts into our analysis pipeline by performing specific expression adjustment procedures. For the association tests, we performed TWAS for a number of traits based on their GWAS summary statistics and identified novel genes with significant associations.

The second contribution of this thesis is a computational drug repositioning framework based on multiomics data.
Drug discovery is challenging because of its long research cycle and high capital investment, and computational drug repositioning appears to be a cost-effective alternative to de novo drug discovery. In this project, we proposed a signature-matching-based drug repositioning framework that draws on multiple gene expression studies and TWAS. The disease signature was curated from reversible genes identified through differential expression analysis, network analysis, pathway analysis, and TWAS. By matching the disease signature against the cell-line-specific drug signatures in a perturbation database, repurposed drug candidates can be identified through connectivity scores. We applied this framework to discover repurposed drugs for non-alcoholic steatohepatitis. The validation experiments showed that four of the seven top recommended drug candidates had in vitro lipid reduction efficacy.

The third contribution of this thesis is the development and validation of a polypharmacy burden score based on adverse drug-drug interactions. Polypharmacy is common in older populations and has been associated with an increased risk of adverse health outcomes. One of the main reasons for this increased risk is the increase in drug-drug interactions (DDIs). In this project, we trained a new DDI prediction model and built a novel polypharmacy burden score with ensemble information from a knowledge base and prediction algorithms. The multi-class classification algorithm for DDIs was trained on known DDIs and curated, empirically safe co-prescription pairs. The polypharmacy burden score was validated in the Veterans Aging Cohort Study (VACS) EHR database by performing association tests for hospitalization and by comparing the FIB-4 index and eGFR between high-burden-score and low-burden-score cohorts.
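
The signature-matching step in the drug repositioning framework can be illustrated with a toy calculation. The sketch below is not the scoring used in the thesis, which the abstract does not specify. It assumes a deliberately simplified connectivity score: the mean drug-induced expression change over disease-up genes minus the mean change over disease-down genes, so that strongly negative scores suggest the drug reverses the disease signature. The function names, gene names, drug names, and values are all hypothetical.

```python
from statistics import mean


def connectivity_score(disease_up, disease_down, drug_signature):
    """Simplified, illustrative connectivity score (an assumption, not the
    thesis's definition).

    disease_up / disease_down: sets of genes up- or down-regulated in disease.
    drug_signature: dict mapping gene -> drug-induced expression change.
    A negative score means the drug tends to lower disease-up genes and
    raise disease-down genes, i.e. it reverses the disease signature.
    """
    up = [drug_signature[g] for g in disease_up if g in drug_signature]
    down = [drug_signature[g] for g in disease_down if g in drug_signature]
    if not up or not down:
        # Not enough overlap between the signatures to score this drug.
        return 0.0
    return mean(up) - mean(down)


def rank_candidates(disease_up, disease_down, drug_signatures):
    """Rank drugs from most reversing (most negative score) to least."""
    scores = {
        drug: connectivity_score(disease_up, disease_down, signature)
        for drug, signature in drug_signatures.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical toy data: two disease-up genes and two disease-down genes.
disease_up = {"GENE_A", "GENE_B"}
disease_down = {"GENE_C", "GENE_D"}
drug_signatures = {
    "drug_1": {"GENE_A": -2.0, "GENE_B": -1.0, "GENE_C": 1.5, "GENE_D": 0.5},
    "drug_2": {"GENE_A": 1.0, "GENE_B": 2.0, "GENE_C": -1.0, "GENE_D": -2.0},
    "drug_3": {"GENE_A": 0.1, "GENE_B": -0.1, "GENE_C": 0.0, "GENE_D": 0.0},
}

ranking = rank_candidates(disease_up, disease_down, drug_signatures)
for drug, score in ranking:
    print(f"{drug}: {score:+.2f}")
```

In this toy example, drug_1 ranks first because it pushes expression in the opposite direction to the disease signature, while drug_2 mimics the disease and ranks last. Scores used in practice are typically rank-based enrichment statistics computed over genome-wide profiles rather than a difference of means.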
