Date of Award

January 2019

Document Type


Degree Name

Medical Doctor (MD)



First Advisor

Mark Gerstein


Large-scale sequencing projects have substantially advanced our understanding of the genetic mediators of cancer. However, interpreting the functional significance of genetic variants in cancer remains challenging. The difficulty lies both in annotating functionally significant regions and variants and in discovering how interactions among genetic loci affect cancer development.

In this context, the objectives of this thesis are to A) describe two novel methods for annotating coding and non-coding regions and variants that may have significance in cancer, and B) describe several methods by which such annotations may be used to assess how interactions or aggregative effects among cancer variants contribute to oncogenesis.

For objective A), we present a method for annotating upstream open reading frames (uORFs). uORFs are latent in mRNA transcripts and are thought to modify translation of coding sequences by altering ribosome activity. To determine which uORFs are likely to be biologically relevant, we built a simple Bayesian classifier using 89 attributes of uORFs labeled as active in ribosome profiling experiments. We also describe ALoFT (annotation of loss-of-function transcripts), a software tool that annotates loss-of-function variants and predicts their disease-causing potential. For objective B), we applied ALoFT and an information-theoretic measure called ‘synergy’ to variants from the Pan-Cancer Analysis of Whole Genomes (PCAWG) project. We also describe how network approaches may be used to understand the functional effects of cancer variants. In particular, we describe how data from the Encyclopedia of DNA Elements (ENCODE) project may be used to develop a network-based understanding of interactions among cancer variants.
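The abstract names the 'synergy' measure but does not define it. As an illustration only, the sketch below assumes one common information-theoretic formulation: the information two variables provide jointly about an outcome, minus the information each provides alone, I(A,B;Y) - I(A;Y) - I(B;Y). This is not confirmed to be the exact definition used in the thesis, and the function names and toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, from paired samples."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of X
    py = Counter(ys)            # marginal counts of Y
    pxy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def synergy(a, b, outcome):
    """Assumed definition: I(A,B;Y) - I(A;Y) - I(B;Y).

    Positive values suggest the two variables are jointly more informative
    about the outcome than the sum of their individual contributions.
    """
    joint = list(zip(a, b))  # treat the pair (A, B) as a single variable
    return (
        mutual_information(joint, outcome)
        - mutual_information(a, outcome)
        - mutual_information(b, outcome)
    )


# Toy data (hypothetical): the outcome is the XOR of two binary variant
# indicators, so neither variant alone is informative but the pair is.
variant_a = [0, 0, 1, 1]
variant_b = [0, 1, 0, 1]
outcome_xor = [0, 1, 1, 0]
xor_synergy = synergy(variant_a, variant_b, outcome_xor)

# Fully redundant case: both variants carry the same information.
redundant_synergy = synergy([0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 0, 1])
```

Under this assumed definition, the XOR case yields a positive score (purely cooperative information), while the redundant case yields a negative one. Estimates from small samples are biased, so a real analysis would need permutation testing or a similar correction.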

Our conclusions related to A) are that variants in non-coding regions affecting uORFs may have a role in cancer development, while loss-of-function variants call for a more nuanced interpretation of their effect in cancer, which can be predicted using the ALoFT software tool. Our conclusions related to B) are that, beyond the classical driver-passenger paradigm, cooperative effects and regulatory or network effects play a considerable role in the genetic basis for altered biology in cancer cells.


This thesis is restricted to Yale network users only. This thesis is permanently embargoed from public release.