"Optimization and Sequential Decision Making: Implicit Loss Functions a" by Ruitu Xu

Date of Award

Fall 2023

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Statistics and Data Science

First Advisor

Lafferty, John

Abstract

Machine learning, as a transformative tool, has demonstrated remarkable efficacy across diverse applications. This dissertation examines a selection of learning algorithms, analyzing their learning dynamics and providing theoretical guarantees on their performance. First, we study a novel neural network learning paradigm that uses a set of fixed random weights, representing more biologically plausible neural connections, during the back-propagation training process. We establish a theoretical framework that shows both the convergence of the training loss and the mechanism by which the forward-propagated network weights align with the randomized backward-propagated weights. We then turn to model-agnostic meta-learning, examining an algorithm that seeks a parametric initialization conducive to an array of distinct learning tasks. Our findings reveal that, in the continuous-time limit, the training dynamics of this algorithm converge linearly to an approximate stationary point of the meta-learning loss when the task losses are strongly convex. Building upon these insights, we introduce a computationally efficient training algorithm anchored in our theoretical discoveries.

Sequential decision-making, in which actions unfold over time and can influence subsequent events, is a sophisticated and expansive area of study that this dissertation also critically assesses. Our analysis covers a wide spectrum of decision-making problems, beginning with a heterogeneous-agent macroeconomic model comprising an infinite ensemble of households and firms competing in a labor market. This model, representative of a broad class of macroeconomic inquiries, is recast as a mean-field game, and we introduce a novel, data-driven reinforcement learning framework that identifies the regularized competitive equilibrium of the game at a sub-linear rate.

We subsequently study risk-sensitive reinforcement learning under the entropic risk measure. We pioneer a definition of sub-optimality gaps tailored to this risk measure and, drawing upon it, provide non-asymptotic and logarithmic regret bounds for two model-free algorithms in episodic Markov decision processes (MDPs). These theoretical guarantees constitute a substantial leap beyond existing results and match our gap-dependent lower bounds up to a logarithmic factor.

Our study further probes linear contextual bandits with heteroscedastic noise, bringing forth a novel, noise-adaptive Thompson-sampling-style algorithm. This algorithm adapts to noise of unknown variance and achieves theoretical guarantees spanning both the worst case of constant-variance noise and the case of deterministic rewards, thereby forging a continuum between these extremes. Finally, our focus transitions to generalized linear bandits. Whereas conventional algorithms in this domain are overwhelmingly tethered to the computationally intensive task of computing the maximum likelihood estimate (MLE) at every iteration, we offer an alternative paradigm: by harnessing online stochastic gradient descent (SGD), we introduce a computationally efficient algorithm that leverages past data through a streamlined SGD update at each step while simultaneously deploying Thompson sampling for exploration. This approach delivers state-of-the-art regret and time-complexity performance in real-world settings.
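To make the random-backward-weight training paradigm described above more concrete, the following is a minimal, illustrative sketch of feedback-alignment-style training in NumPy: the error is propagated backward through a fixed random matrix B rather than the transposed forward weights, and the alignment between the trained forward weights and B is monitored. The network sizes, toy regression task, and cosine alignment metric are assumptions made for illustration only, not the dissertation's actual setup or results.

```python
# Minimal sketch (assumed setup): two-layer network trained with a fixed
# random feedback matrix B in place of W2.T during the backward pass.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = A x + noise (illustrative only).
d_in, d_hidden, d_out, n = 20, 64, 5, 512
A_true = rng.normal(size=(d_out, d_in))
X = rng.normal(size=(n, d_in))
Y = X @ A_true.T + 0.01 * rng.normal(size=(n, d_out))

# Forward weights (trained) and fixed random feedback weights B (never trained).
W1 = rng.normal(size=(d_hidden, d_in)) / np.sqrt(d_in)
W2 = rng.normal(size=(d_out, d_hidden)) / np.sqrt(d_hidden)
B = rng.normal(size=(d_hidden, d_out)) / np.sqrt(d_out)

lr = 1e-2
for step in range(2001):
    # Forward pass with a ReLU hidden layer.
    H_pre = X @ W1.T
    H = np.maximum(H_pre, 0.0)
    Y_hat = H @ W2.T
    E = Y_hat - Y  # output error

    # Backward pass: the error is sent back through the fixed random matrix B
    # instead of W2.T, which exact back-propagation would use.
    delta_H = (E @ B.T) * (H_pre > 0.0)

    # Gradient-style updates for the forward weights only.
    W2 -= lr * E.T @ H / n
    W1 -= lr * delta_H.T @ X / n

    if step % 500 == 0:
        loss = 0.5 * np.mean(np.sum(E ** 2, axis=1))
        # Alignment between forward weights W2 and feedback weights B,
        # measured as the cosine similarity of the flattened matrices.
        cos = np.sum(W2 * B.T) / (np.linalg.norm(W2) * np.linalg.norm(B))
        print(f"step {step:4d}  loss {loss:.4f}  alignment(W2, B^T) {cos:+.3f}")
```

In this kind of sketch, the printed cosine typically grows over training, which is the informal sense in which the forward weights "align" with the random backward weights.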
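Similarly, the closing idea of replacing the per-round MLE in generalized linear bandits with a single online SGD step plus Thompson-sampling exploration can be illustrated with a small sketch. The logistic-reward environment, constant step-size schedule, and Gaussian perturbation scale below are illustrative assumptions, not the dissertation's algorithm or its tuning.

```python
# Minimal sketch (assumed setup): a Thompson-sampling-style generalized linear
# bandit that updates its parameter with one online SGD step per round instead
# of refitting the MLE on all past data.
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d, n_arms, horizon = 5, 10, 5000
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)

theta = np.zeros(d)      # running SGD estimate of the GLM parameter
step_size = 0.5          # base SGD step size (assumed)
explore_scale = 0.1      # std. dev. of the Thompson-sampling perturbation (assumed)

cum_regret = 0.0
for t in range(1, horizon + 1):
    # Fresh contexts for each arm at this round.
    contexts = rng.normal(size=(n_arms, d)) / np.sqrt(d)

    # Thompson-sampling exploration: act greedily w.r.t. a perturbed parameter.
    theta_tilde = theta + explore_scale * rng.normal(size=d) / np.sqrt(t)
    arm = int(np.argmax(contexts @ theta_tilde))

    # Bernoulli reward through the logistic link.
    p = sigmoid(contexts[arm] @ theta_star)
    reward = float(rng.random() < p)

    # One online SGD step on the logistic loss of the newest sample only.
    grad = (sigmoid(contexts[arm] @ theta) - reward) * contexts[arm]
    theta -= (step_size / np.sqrt(t)) * grad

    cum_regret += np.max(sigmoid(contexts @ theta_star)) - p

print(f"estimate error {np.linalg.norm(theta - theta_star):.3f}, "
      f"cumulative regret {cum_regret:.1f}")
```

The design point this sketch is meant to convey is the cost profile: each round costs O(d) for the update rather than the cost of a full maximum-likelihood refit over the growing history.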
