Date of Award
Spring 2024
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Statistics and Data Science
First Advisor
Sekhon, Jasjeet
Abstract
Statistical learning has proven highly effective across a wide array of applications. Among its many areas, one of particular interest is the study of decision making in dynamic environments, known as sequential decision making. With the rapid advancement of reinforcement learning and deep learning, practitioners increasingly encounter dynamic environments in real-world settings such as robot control, vehicle navigation, medical science, and finance. This thesis concentrates on statistical estimation, evaluation, and optimization for sequential decision making problems. In such problems, a prevalent statistical model for the dynamic environment is the Markov Decision Process (MDP), characterized by its state space, action space, transition probabilities, and other known or unknown structural elements (formalized schematically after the abstract). These characteristics determine the complexity of the environment, and data are collected through interaction with it. The richness of the data, together with the complexity of the environment, determines the difficulty of a sequential decision making problem. A common objective in this domain is to learn a policy that, when executed, achieves high performance within the dynamic environment. This thesis introduces novel statistical algorithms for accurate policy evaluation and policy learning with provable performance guarantees.

In Chapter 2, we study the offline policy evaluation problem under a linearly structured MDP. We devise a novel statistical algorithm that effectively exploits the variance information in the dataset; the resulting estimator exhibits asymptotically smaller error than standard methods (see the regression sketch below).

Chapter 3 focuses on policy learning in a goal-oriented framework known as the Stochastic Shortest Path problem. Here, we introduce two novel algorithms capable of discovering policies that reach the goal state at sub-linear cost (see the value-iteration sketch below).

Chapter 4 addresses the dynamic bipartite matching problem, which arises in applications such as ride-hailing and food delivery. We propose a novel framework, the Markov Matching Market, to model this dynamic matching problem, and present algorithms that achieve optimal matching with minimal (sub-linear) utility loss (see the matching sketch below).
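For readers skimming this record, here is a minimal schematic of the MDP model alluded to in the abstract, in standard textbook notation (the discounted formulation below is illustrative and not taken from the dissertation itself):

    \[
      \mathcal{M} = (\mathcal{S}, \mathcal{A}, P, r, \gamma),
    \]

where \(\mathcal{S}\) is the state space, \(\mathcal{A}\) the action space, \(P(s' \mid s, a)\) the transition probability, \(r(s, a)\) the reward, and \(\gamma \in (0, 1)\) a discount factor. A policy \(\pi(a \mid s)\) is judged by its value function

    \[
      V^{\pi}(s) = \mathbb{E}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t} r(s_t, a_t) \;\middle|\; s_0 = s,\ a_t \sim \pi(\cdot \mid s_t),\ s_{t+1} \sim P(\cdot \mid s_t, a_t) \right];
    \]

policy evaluation estimates \(V^{\pi}\) for a fixed \(\pi\), while policy learning searches for a \(\pi\) that maximizes it.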
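To make the Chapter 2 idea concrete, the following is a minimal Python sketch of inverse-variance-weighted ridge regression, the generic statistical device behind "utilizing variance information" under linear function approximation. The function name, feature matrix, and variance estimates here are illustrative assumptions, not the dissertation's actual algorithm (which, among other things, must also estimate the variances themselves):

    import numpy as np

    def variance_weighted_ridge(Phi, y, var, lam=1.0):
        """Inverse-variance-weighted ridge regression (illustrative sketch).

        Phi : (n, d) array of state-action features phi(s_i, a_i)
        y   : (n,) regression targets, e.g. r_i + gamma * V_hat(s'_i)
        var : (n,) estimated conditional variances sigma_i^2 (assumed given)
        lam : ridge regularization strength

        Returns theta minimizing
            sum_i (y_i - phi_i @ theta)**2 / var_i + lam * ||theta||^2,
        i.e. down-weighting high-variance transitions.
        """
        w = 1.0 / np.clip(var, 1e-8, None)   # inverse-variance weights
        A = Phi.T @ (w[:, None] * Phi) + lam * np.eye(Phi.shape[1])
        b = Phi.T @ (w * y)
        return np.linalg.solve(A, b)

    # Toy usage with synthetic heteroscedastic data.
    rng = np.random.default_rng(0)
    n, d = 500, 5
    Phi = rng.normal(size=(n, d))
    theta_true = rng.normal(size=d)
    var = rng.uniform(0.1, 2.0, size=n)      # per-sample noise levels
    y = Phi @ theta_true + rng.normal(scale=np.sqrt(var))
    theta_hat = variance_weighted_ridge(Phi, y, var, lam=0.1)
    print(np.linalg.norm(theta_hat - theta_true))

Down-weighting high-variance transitions is the general mechanism by which such estimators can attain smaller asymptotic error than their unweighted counterparts.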
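For orientation on Chapter 3's setting, here is an illustrative tabular value-iteration sketch for a Stochastic Shortest Path instance with known dynamics; the dissertation's algorithms, by contrast, must learn while interacting with unknown P and c, which is what makes sub-linear cost nontrivial:

    import numpy as np

    def ssp_value_iteration(P, c, goal, iters=1000, tol=1e-8):
        """Value iteration for a known SSP instance (illustrative sketch only).

        P    : (S, A, S) transition probabilities P[s, a, s']
        c    : (S, A) positive per-step costs
        goal : index of the absorbing, cost-free goal state

        Returns V, the minimal expected cost-to-goal, and a greedy policy.
        """
        S, A, _ = P.shape
        V = np.zeros(S)
        for _ in range(iters):
            Q = c + P @ V                 # Q[s, a] = c(s, a) + E[V(s')]
            Q[goal, :] = 0.0              # goal is absorbing and cost-free
            V_new = Q.min(axis=1)
            if np.max(np.abs(V_new - V)) < tol:
                V = V_new
                break
            V = V_new
        return V, Q.argmin(axis=1)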
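Finally, for Chapter 4, the per-step decision in a dynamic matching market can be viewed as a max-weight bipartite matching under estimated utilities. Below is a minimal sketch using SciPy's assignment solver, with the utility matrix assumed given; in the Markov Matching Market these utilities must be learned online, which is where the sub-linear utility loss guarantee comes in:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    # Illustrative one-step matching: utilities[i, j] is the (estimated)
    # utility of matching demand-side agent i with supply-side agent j.
    rng = np.random.default_rng(1)
    utilities = rng.uniform(size=(4, 6))

    rows, cols = linear_sum_assignment(utilities, maximize=True)
    total = utilities[rows, cols].sum()
    print(list(zip(rows.tolist(), cols.tolist())), total)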
Recommended Citation
Min, Yifei, "Statistical Theory and Algorithms for Sequential Decision Making" (2024). Yale Graduate School of Arts and Sciences Dissertations. 1401.
https://elischolar.library.yale.edu/gsas_dissertations/1401