Date of Award

Spring 2021

Document Type


Degree Name

Doctor of Philosophy (PhD)


Public Health

First Advisor

Ma, Shuangge


As the nation’s largest healthcare payer, the Medicare program generates an enormous volume of medical data. With an increasing emphasis on evidence-based care, how to effectively handle and draw inferences from heterogeneous and noisy healthcare data remains an important question. High-quality analysis could improve the quality, planning, and administration of health services, evaluate comparative therapies, and advance research on epidemiology and disease etiology. This is especially true for older adults, since this population’s health conditions are generally complicated by multimorbidity, and the healthcare system for older adults is riddled with administrative and regulatory complexities. Taking advantage of the large-scale and comprehensive Medicare data, this dissertation focuses on outcome research, human disease networks, and comparative effectiveness research for older adults.

Healthcare outcome measures such as mortality, readmission, length of stay (LOS), and medical costs have been extensively studied. However, existing analyses generally focus on a single disease (or at most a few pre-selected and closely related diseases) or on all diseases combined. It is increasingly evident that human diseases are interconnected. Motivated by the emerging human disease network (HDN) analysis, we conduct network analyses of disease interconnections based on healthcare outcome measures.

First, we propose a clinical treatment HDN that analyzes inpatient LOS data. In the network graph, each node represents one disease, and two nodes are linked by an edge if their disease-specific LOS are correlated (conditional on the LOS of all other diseases). To accommodate zero-inflated LOS data, we propose a network construction approach based on the multivariate Hurdle model. We analyze the Medicare inpatient data for the period from January 2008 to December 2018.
Based on the constructed network, key network properties such as connectivity, modules/hubs, and temporal variation are analyzed. The results are found to be biomedically sensible, especially from a treatment perspective. A closer examination also reveals novel findings that have received little or no attention in individual-disease studies. This work has been published in Statistics in Medicine.

Second, considering that many healthcare outcomes are closely related to each other, we propose a high-dimensional clinical treatment HDN that can incorporate multiple outcomes. We construct a clinical treatment HDN on LOS and readmission and note that the proposed method can be easily generalized to other outcomes of different data types. To deal with uniquely challenging data distributions (high dimensionality and zero inflation), a new network construction approach is developed based on the integrative analysis of generalized linear models. Data analysis is conducted using the Medicare inpatient data from January 2010 to December 2018. The network structure and properties are found to be similar to those of the LOS HDN (in Chapter 2) but provide additional insights into disease interconnections by considering both LOS and readmission. The proposed clinical treatment HDNs can promote a better understanding of human diseases and their interconnections, guide more efficient disease management and healthcare resource allocation, and foster complex network analysis. The manuscript of this work has been drafted and is ready for submission.

Comparative effectiveness research aims to directly compare the outcomes of two or more healthcare strategies for addressing a particular medical condition. Such analysis can provide information about the risks, benefits, and costs of different treatment options and thus guide better clinical decisions. While conducting a randomized controlled trial is the gold-standard approach, it has several limitations.
Efforts have been made to utilize healthcare record data in comparative effectiveness research. To estimate and compare the causal effects of treatments/interventions, we use the Medicare data to emulate target clinical trials and develop a deep learning-based analysis approach. Under emulation, target clinical trials are explicitly “assembled” using the Medicare data. As such, statistical methods for clinical trials can be directly applied to estimate causal effects.

With emulation analysis, we evaluate the effectiveness and safety outcomes of rivaroxaban versus dabigatran for Medicare patients with atrial fibrillation. The results show that dabigatran is superior in terms of time to any primary event (including ischemic stroke, other thromboembolic events, major bleeding, and death), major bleeding, and mortality. This work has been submitted to Clinical Epidemiology.

Considering that many regression-based statistical methods (e.g., the Cox proportional hazards model for survival data) impose overly strict assumptions on the data, we further develop an innovative deep learning-based analysis strategy. With the “emulation + deep learning” approach, we study the survival outcomes of endovascular repair versus open aortic repair for Medicare patients with abdominal aortic aneurysms. It is found that endovascular repair has a survival advantage in both the short and the long term. This work has been published in Entropy.

Departing from and advancing beyond the existing literature, this dissertation extends the scope of outcome research, human disease networks, and comparative effectiveness research. The findings in this dissertation are shown to have scientific merit, and the methodological developments may have other applications and serve as prototypes for future analysis.
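
As an illustration of the edge definition used for the clinical treatment HDN (two diseases are linked when their LOS remain correlated after conditioning on all other diseases), the following minimal Python sketch computes partial correlations from an inverse covariance matrix on simulated Gaussian data. It is a simplified stand-in, not the multivariate Hurdle model proposed in the dissertation: it ignores zero inflation entirely, and the function names, the 0.1 threshold, and the simulated variables are assumptions made purely for illustration.

```python
import numpy as np


def partial_correlations(X):
    """Partial correlation matrix from the inverse sample covariance.

    X has one row per observation and one column per variable
    (hypothetically, one column per disease-specific outcome).
    """
    precision = np.linalg.inv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def edges(pcor, threshold=0.1):
    """Pairs (i, j), i < j, whose absolute partial correlation exceeds
    the threshold (an arbitrary illustrative cutoff)."""
    p = pcor.shape[0]
    return [(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(pcor[i, j]) > threshold]


# Simulated chain a -> b -> c: a and c are marginally correlated,
# but conditionally independent given b.
rng = np.random.default_rng(0)
n = 5000
a = rng.normal(size=n)
b = a + rng.normal(size=n)
c = b + rng.normal(size=n)
X = np.column_stack([a, b, c])

pcor = partial_correlations(X)
network = edges(pcor)
marginal_ac = np.corrcoef(a, c)[0, 1]
```

In this toy chain, the first and third variables are clearly correlated marginally, yet no edge is drawn between them once the middle variable is conditioned on. This is the distinction between a conditional-dependence network and one built from pairwise correlations alone.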