Date of Award
Fall 2022
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Statistics and Data Science
First Advisor
Zhou, Huibin
Abstract
Transfer learning is an area of statistics and machine learning research that asks the following question: how do we build successful learning algorithms when the data available for training a model differ qualitatively from the data on which we hope the model will perform well? In this thesis, we focus on a specific area of transfer learning called label shift, also known as quantification. Under label shift, the discrepancy between training and test data is confined to a shift in the marginal distribution of the response variable, while the conditional distribution of the features given the response is assumed to stay the same. In this setting, accurately inferring the new distribution of the response variable is both an important estimation task in its own right and a crucial step in adapting the learning algorithm to the new data. We make two contributions to this field. First, we present a new procedure, SELSE, which estimates the shift in the distribution of the response variable. Second, we prove that SELSE is semiparametric efficient within a large family of quantification algorithms: the asymptotic variance matrix of SELSE's normalized error is the smallest, in the positive semidefinite order, among all algorithms in that family. This family includes nearly all existing algorithms, such as the ACC/PACC quantifiers and maximum-likelihood-based quantifiers like EMQ and MLLS. In empirical experiments, SELSE is competitive with, and in many cases outperforms, existing state-of-the-art quantification methods, especially when the number of test samples far exceeds the number of training samples.
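To make concrete what a quantifier computes, the sketch below implements textbook versions of two baselines named in the abstract, ACC and EMQ. It does not reproduce SELSE, whose construction is not described here. ACC estimates the classifier's confusion matrix on labeled training data and solves a linear system against the distribution of predicted labels on the test data. EMQ repeatedly reweights the classifier's test posteriors by the ratio of the current prior estimate to the training prior until the estimate stops changing. The function names `acc_estimate` and `emq_estimate` are hypothetical. The sketch assumes integer labels `0..K-1`, that every class appears in the training data, and, for EMQ, that the posteriors are calibrated for the training prior.

```python
import numpy as np


def acc_estimate(y_train, yhat_train, yhat_test, n_classes):
    """Adjusted Classify & Count (illustrative sketch): solve C q = p for q.

    C[i, j] estimates P(predicted = i | true = j) from labeled training data;
    p[i] is the fraction of test points predicted as class i.
    Assumes every class occurs at least once in y_train.
    """
    y_train, yhat_train, yhat_test = map(np.asarray, (y_train, yhat_train, yhat_test))
    C = np.zeros((n_classes, n_classes))
    for j in range(n_classes):
        mask = y_train == j
        # Column j: distribution of predicted labels among true class-j points.
        C[:, j] = np.bincount(yhat_train[mask], minlength=n_classes) / mask.sum()
    p = np.bincount(yhat_test, minlength=n_classes) / len(yhat_test)
    q, *_ = np.linalg.lstsq(C, p, rcond=None)
    # The raw solution can leave the probability simplex; clip and renormalize.
    q = np.clip(q, 0.0, None)
    return q / q.sum()


def emq_estimate(test_probs, train_prior, max_iter=1000, tol=1e-10):
    """EM re-estimation of the test class prior (illustrative sketch).

    test_probs[n, k] is the classifier's posterior P(y = k | x_n), assumed
    calibrated for the training class prior train_prior.
    """
    test_probs = np.asarray(test_probs, dtype=float)
    train_prior = np.asarray(train_prior, dtype=float)
    q = train_prior.copy()
    for _ in range(max_iter):
        # E-step: rescale each posterior by the prior ratio and renormalize rows.
        w = test_probs * (q / train_prior)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: the new prior estimate is the average adjusted posterior.
        q_new = w.mean(axis=0)
        converged = np.abs(q_new - q).max() < tol
        q = q_new
        if converged:
            break
    return q
```

For example, if the training confusion matrix has columns (0.8, 0.2) and (0.3, 0.7) and 17 of 40 test points are predicted as class 0, `acc_estimate` recovers the test prior (0.25, 0.75). In practice the confusion matrix is usually estimated from held-out or cross-validated predictions rather than from the data the classifier was fit on. Clipping and renormalizing is only one simple way to keep the ACC estimate on the simplex.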
Recommended Citation
Chow, Brandon Tse Wei, "A Semiparametric Efficient Approach To Label Shift Estimation and Quantification" (2022). Yale Graduate School of Arts and Sciences Dissertations. 836.
https://elischolar.library.yale.edu/gsas_dissertations/836