Date of Award
Fall 2022
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Computational Biology and Bioinformatics
First Advisor
Kleinstein, Steven
Abstract
Recent advances in high-throughput experiments and systems biology approaches have resulted in hundreds of publications profiling immune responses to exposures such as pathogens and vaccines. Immune profiling is important for an emerging class of disease diagnostic technologies, as well as for understanding the mechanisms underlying disease pathogenesis and vaccine response. Massive amounts of public data are available for biomarker discovery and mechanistic profiling, including both datasets capturing transcriptional profiles of the immune system and published signatures describing changes that occur (e.g., in gene expression or cellular phenotypes) following infection or vaccination. This wealth of public data presents both opportunities and challenges for advancing our understanding of immune responses to infection and vaccination. Applying transcriptional immune signatures for clinical diagnostics requires that signatures robustly detect the pathogen of interest without cross-reacting with unintended conditions (e.g., comorbidities, other pathogens). While methods exist to discover signatures and validate their robustness using massive public data, no framework exists for evaluating signature cross-reactivity (i.e., biological specificity) or for designing highly specific infection signatures. This challenge is compounded by the difficulty of identifying published signatures in the first place, as signatures appear in the literature in many heterogeneous formats, including figures, tables, and text. In this thesis, we address limitations in the evaluation, discovery, interpretation, and dissemination of immune signatures. To evaluate the clinical applicability of published infection signatures, we begin by building a computational framework to benchmark robustness and cross-reactivity.
Our framework leverages 17,105 transcriptional profiles curated from more than 150 datasets that capture immune responses to more than 35 unique pathogens as well as non-infectious conditions. We apply this framework to 30 published infection signatures, and demonstrate that while many signatures are robust, nearly all predict unintended infections or non-infectious immune states such as aging. To promote signature development, we make this framework publicly available so that researchers can query the performance of their own signatures. We next propose and apply a framework for discovering pathogen-specific signatures of infection. We discover an 11-gene signature of COVID-19 infection from both massive public and new multi-omics datasets. This signature is robust in independent cohorts and, unlike all other published COVID-19 signatures, does not cross-react with other viral or bacterial infections, COVID-19 comorbidities, or confounders. Using reference cell type expression vectors and single-cell RNA-sequencing datasets to interpret the biology of our signature, we identified distinct roles for plasmablasts and memory T cells in determining signature performance. Signal from plasmablasts mediated COVID-19 detection, while signal from memory T cells controlled cross-reactivity with other viral infections. Finally, to facilitate access to and querying of published signatures, we build the HIPC Dashboard. This web-enabled application relies on a newly developed data model for capturing published immune signatures in a standardized, machine-readable format, as well as manual curation of more than 600 signatures describing human vaccine responses. This system enables researchers to access and rapidly interrogate immune signatures and will aid in building a broader understanding of immune responses.
The methods presented in this thesis allow us to discover highly specific and interpretable biomarkers across biological systems and share both curated datasets and published findings with the broader research community.
Recommended Citation
Chawla, Daniel G., "Identifying Immunological Biomarkers in Massive Public Data" (2022). Yale Graduate School of Arts and Sciences Dissertations. 733.
https://elischolar.library.yale.edu/gsas_dissertations/733