Bayesian Methods to Integrate Multimodal Biomedical Data for Biomarker and Subpopulation Identification

Date of Award

Spring 1-1-2025

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Public Health

First Advisor

Zhao, Hongyu

Abstract

Across biomedical research, extracting latent information from large-scale, complex data to accurately characterize population heterogeneity remains a critical challenge. This thesis comprises three chapters, each focusing on a distinct application: multi-state brain functional connectome subtyping, biomarker discovery across multiple endpoints, and diagnosis-enhanced rare disease analysis integrating risk scores. These studies share a common methodological foundation in Bayesian modeling, which enables the integration of multimodal data and the identification of hidden structures within complex biomedical data.

The first chapter introduces an innovative Bayesian nonparametric network-variate clustering method, MMBeans, to identify neurodevelopmental subtypes based on multi-state functional connectivity. The method also accommodates network topology by modeling state-specific modular structures and extracting informative features for subtyping. Applied to the Adolescent Brain Cognitive Development (ABCD) study, MMBeans identifies distinct neurodevelopmental subtypes and brain sub-network phenotypes across cognitive states, revealing neurobiological heterogeneity and suggesting promising directions for further research in neuroscience.

The second chapter extends Bayesian modeling to biomarker discovery in the presence of multiple endpoints. Traditional approaches that rely on binary responder/non-responder classification oversimplify disease complexity, resulting in significant information loss. To address this, we developed a Bayesian factor analysis model, mixBMIMIC, which integrates multiple clinical endpoints of varying data types to jointly identify reliable biomarkers and assess treatment-effect heterogeneity in inflammatory bowel diseases (IBDs), using data from the GEMINI studies. This method enhances biomarker selection accuracy and, by capturing underlying disease heterogeneity more effectively, supports improved personalized treatment strategies.

The third chapter applies the Bayesian modeling approach from the second chapter to rare disease analysis, focusing on narcolepsy using UK Biobank data. By incorporating narcolepsy-specific disease risk scores and polygenic risk scores (N-DRS and N-PRS) within the mixBMIMIC model, we demonstrate that risk estimation becomes more comprehensive when additional surrogate outcomes are integrated alongside traditional ICD-10-based diagnoses, leading to more robust predictor identification. This study facilitates early detection and improves case identification for rare diseases.

Overall, this thesis contributes to the development and application of Bayesian modeling for integrating multi-type data and uncovering hidden structures within heterogeneous populations in biomedical research. It also provides insights for future applications of statistical learning in precision medicine, supporting efforts toward personalized diagnostics and targeted interventions.

