Date of Award
Fall 2023
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Public Health
First Advisor
Crawford, Forrest
Abstract
This dissertation consists of three essays, each tackling issues related to "identification" in the statistical methods used in public health research. In statistics, "identification" refers to the process of learning the values of the parameters or characteristics of a model or system after obtaining an infinite number of observations from it. In these essays, I draw attention to the limitations of current methods, which often hinge on specifying a limited set of data generating processes (DGPs). However, these DGPs may fail to correspond accurately with the intricacies of actual situations or with the pre-existing knowledge within the field. To address these limitations, I propose identification strategies for DGPs that more accurately reflect the realities of public health. Additionally, I develop procedures to estimate the parameters of these models based on the observed data. By enhancing the identification strategies and estimation procedures, we can improve the credibility and applicability of public health research.
In Chapter 1, co-authored with Forrest W. Crawford, we discuss the impact of the time scale used when recording and analyzing time-varying data for causal inference. In particular, we formalize causal inference for densely sampled trajectories and point out that sparsely sampled trajectories may result in substantial errors. Existing methods focus on identifying causal effects of treatments applied at discrete time points, but in reality, treatments often happen continuously or at a finer time scale than the one used for measurement or analysis. This discrepancy can introduce bias, distorting our understanding of the treatment's actual effects. To address this gap, we propose a framework to formally conceptualize and operationally account for this bias. This approach underscores the need for judicious selection of the time scale for data collection and analysis. It also carries significant implications for interpreting research results, particularly within areas like comparative effectiveness and health policy evaluation.
Chapter 2, again a collaborative work with Forrest W. Crawford, builds upon the framework established in Chapter 1 to present a novel methodology to identify and estimate the causal effects of treatments applied continuously in time, utilizing densely measured trajectory data. While statistical methodology for estimating the causal effect of a time-varying treatment, measured discretely in time, is well developed, the existing discrete-time methods do not generalize easily to continuous time. We propose a novel approach that successfully removes confounding bias, despite the entanglement of uncountably infinite variables. This research is particularly pertinent given recent advancements in data collection technologies. The advent of sophisticated tools such as physiological monitors, wearable digital devices, and environmental sensors now allows for almost continuous data collection. This rich, dense data can be harnessed to yield profound insights into the impact of continuous-time treatments of public health interest, employing our newly proposed methodology.
In Chapter 3, a joint effort with Luk Van Baelen, Els Plettinckx, and Forrest W. Crawford, we utilize capture-recapture survey (CRC) data to develop a method for estimating the size of populations that are challenging to count directly, including people with COVID-19, drug users, sex workers, and victims of conflict and trafficking. Existing methods can be restrictive and might result in biased estimates. We propose a new method that leverages available empirical information about the dependence structures between CRC samples, using the econometric theory of partial identification to establish robust inferential procedures. This will provide a more reliable estimate of the population size, and thus help public health policymakers plan and deliver services effectively. We have implemented open-source software for the proposed procedure in an R package for general CRC experiments. In a comprehensive real-world study, we apply our new methodology to estimate the number of people who inject drugs in Brussels, Belgium, using heterogeneous survey data. This work has led to a paper published in the Journal of Survey Statistics and Methodology.
Recommended Citation
Sun, Jinghao, "Novel Methods for Identification and Inference in Public Health" (2023). Yale Graduate School of Arts and Sciences Dissertations. 1167.
https://elischolar.library.yale.edu/gsas_dissertations/1167