Network Analysis of Disease Outcomes and Comparative Effectiveness Research via Mining Registry and Administrative Claims Data

Date of Award

Spring 1-1-2025

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Public Health

First Advisor

Ma, Shuangge

Abstract

Over the past decade, systematic and integrated information in electronic health records, registry data, and claims data has advanced research on clinical treatment outcomes, disease interactions, and treatment comparative effectiveness. Extensive analysis of treatment outcomes can facilitate more effective resource management and a better understanding of diseases. While most research focuses on single diseases or many diseases combined, recent interest has turned toward network analysis of treatment outcomes. However, existing studies face limitations, such as ineffective methods, neglect of the zero-inflated nature of outcome data, stringent model assumptions, lack of attention to heterogeneity and relevant covariates, and low data quality. Comparative effectiveness research aims to compare the outcomes of treatment strategies for a specific condition, offering insights into the risks and benefits of each to support more informed clinical decision-making. While randomized controlled trials are the gold standard for evaluating such effects, they can be limited by high costs, small sample sizes, an inability to reflect real-world practice, and feasibility concerns. In recent years, trial emulation has emerged as a promising alternative, using large observational datasets to estimate treatment effects in a trial-like framework. Despite its potential, methodological development for emulation remains limited. This dissertation develops methods for constructing human disease networks (HDNs) tailored to clinical treatment outcomes in a heterogeneous Taiwanese population and in older cancer populations in the United States. It also develops novel trial emulation approaches to evaluate treatment strategies for various cancer types. First, we focus on the Taiwan National Health Insurance Research Database (NHIRD), a large population-level claims database, and construct HDNs for the number of outpatient visits and medical costs.
Advancing significantly beyond the existing literature, the proposed approach addresses population heterogeneity and the effects of covariates. Additionally, the proposed approach effectively accommodates zero inflation, the Poisson distribution, high dimensionality, and network sparsity. In the analysis of NHIRD data, multiple subject groups are identified based on the heterogeneous, interconnected outpatient visit and medical cost data. The interconnections and network modules are analyzed and compared across groups. This work has been accepted by The Annals of Applied Statistics. Second, we aim to extract insights from the Surveillance, Epidemiology, and End Results (SEER)-Medicare data and construct HDNs for the number of inpatient and outpatient visits for ten distinct cancer populations, primarily focusing on the context of cancer care. The proposed deep neural network-based approach can accommodate a high proportion of zeros in the data and capture intricate conditional dependencies between pairs of diseases. We compare the constructed HDNs across cancer populations. The interconnections and network modules for each cancer population are found to have sound implications. This work has been submitted. Third, we develop a Bayesian emulation approach that combines real-world evidence from observational data with prior information from published literature, including propensity score estimation via Bayesian logistic regression and a weighted Bayesian Weibull accelerated failure time model. Using SEER-Medicare data, we emulate a target trial to evaluate the comparative effectiveness of partial hepatectomy and ablation in overall survival for early-stage hepatocellular carcinoma (HCC) patients. Our findings on the difference between the two treatments are inconclusive, providing further insight into HCC clinical treatment. This work has been published in Life.
To address limitations in regression analysis, we further develop a deep learning-based analysis pipeline, including a propensity score step, a weighted survival analysis step, and a bootstrap inference step. An emulated trial is designed to evaluate the relative effectiveness of lumpectomy and mastectomy in overall survival for female patients with early-stage breast cancer. This analysis demonstrates the power of “mining large data + deep learning-based analysis”. This work has been published in The Yale Journal of Biology and Medicine. This dissertation significantly advances existing research by expanding the scope of human disease networks, clinical treatment outcome analysis, and comparative effectiveness research. The proposed methods and techniques offer broad applicability and demonstrate scientific merit. The findings may serve as prototypes for further research and applications.
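
As a rough illustration of the three-step structure named in the abstract (propensity score, weighted survival analysis, bootstrap inference), the following Python sketch may help. It is not the dissertation's pipeline: the deep learning components are replaced here by plain logistic regression and an inverse-probability-weighted Kaplan-Meier estimator, and all function names, the simulated data, and the clipping thresholds are assumptions made only for this example.

```python
import numpy as np

def fit_propensity(X, t, iters=25, ridge=1e-6):
    """Step 1 (stand-in): logistic regression via Newton's method.
    Returns the estimated P(T = 1 | X) for every subject."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        grad = Xd.T @ (t - p) - ridge * beta
        hess = (Xd * (p * (1.0 - p))[:, None]).T @ Xd + ridge * np.eye(Xd.shape[1])
        beta += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-Xd @ beta))

def weighted_km(time, event, weights, t_eval):
    """Step 2 (stand-in): weighted Kaplan-Meier estimate of S(t_eval)."""
    surv = 1.0
    for u in np.unique(time[(event == 1) & (time <= t_eval)]):
        at_risk = weights[time >= u].sum()
        died = weights[(time == u) & (event == 1)].sum()
        surv *= 1.0 - died / at_risk
    return surv

def ipw_survival_difference(X, t, time, event, t_eval):
    """Difference in weighted survival at t_eval, treated minus control."""
    ps = np.clip(fit_propensity(X, t), 0.01, 0.99)  # assumed trimming bounds
    w = np.where(t == 1, 1.0 / ps, 1.0 / (1.0 - ps))
    s1 = weighted_km(time[t == 1], event[t == 1], w[t == 1], t_eval)
    s0 = weighted_km(time[t == 0], event[t == 0], w[t == 0], t_eval)
    return s1 - s0

def bootstrap_ci(X, t, time, event, t_eval, n_boot=200, seed=0):
    """Step 3: percentile bootstrap interval, refitting steps 1 and 2 each time."""
    rng = np.random.default_rng(seed)
    n = len(t)
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws.append(ipw_survival_difference(X[idx], t[idx], time[idx], event[idx], t_eval))
    return np.percentile(draws, [2.5, 97.5])

def simulate(n=1000, seed=1):
    """Hypothetical data: covariates drive both treatment and survival,
    while treatment itself has no effect on survival."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))
    p = 1.0 / (1.0 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
    t = rng.binomial(1, p)
    latent = rng.exponential(np.exp(1.0 - 0.7 * X[:, 0]))
    cens = rng.exponential(5.0, size=n)
    return X, t, np.minimum(latent, cens), (latent <= cens).astype(int)

if __name__ == "__main__":
    X, t, time, event = simulate()
    print("IPW survival difference at t=2:", ipw_survival_difference(X, t, time, event, 2.0))
    print("95% bootstrap interval:", bootstrap_ci(X, t, time, event, 2.0, n_boot=50))
```

Refitting the propensity model inside every bootstrap replicate is a deliberate choice in this sketch, so that the interval reflects uncertainty in the weights as well as in the survival estimate.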

This document is currently not available here.