Date of Award

Spring 2021

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Statistics and Data Science

First Advisor

Barron, Andrew

Abstract

Mixtures of distributions provide a flexible model for heterogeneous data, but this versatility comes at the cost of computational difficulty. We study the task of generating samples from the "greedy" Gaussian mixture posterior. While it is widely known that Gibbs sampling can be slow to converge, concrete results quantifying this behavior are scarce. In this dissertation, we establish conditions under which the number of steps required by the Gibbs sampler is exponential in the separation of the data clusters. Further, we analyze the efficacy of potential solutions. The simulated tempering algorithm uses an auxiliary temperature variable to flatten the target density (reducing the effective cluster separation). As existing implementations are poorly suited to the unusual properties of the mixture posterior, we adapt simulated tempering by flattening the individual likelihood components (referred to as internal annealing). However, this is no universal solution, and we characterize conditions under which the original cause of slow convergence will persist. An alluring alternative is subsample annealing, which instead flattens the posterior by reducing the size of the observed subsample. Still, this approach is sensitive to the ordering of the data, and we prove that a single poorly chosen datum can be sufficient to prevent rapid convergence.
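
For intuition, here is a minimal illustrative sketch of the Gibbs sampler the abstract refers to, applied to a two-component Gaussian location mixture with known weights and unit variance. All names and parameter values below (e.g. gibbs_gaussian_mixture, delta, tau) are hypothetical and not taken from the dissertation; the point is only to show the label/mean alternation whose mixing time, per the abstract, can grow exponentially with the cluster separation delta.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data from a well-separated two-component mixture;
# the separation `delta` controls how slowly the Gibbs sampler mixes.
delta = 6.0
n = 100
true_labels = rng.integers(0, 2, size=n)
x = rng.normal(loc=np.where(true_labels == 0, -delta / 2, delta / 2), scale=1.0)

def gibbs_gaussian_mixture(x, n_steps, sigma=1.0, tau=10.0, rng=rng):
    """Standard Gibbs scheme: alternate between sampling cluster labels
    given the component means, and the means given the labels.
    Priors: equal weights, means ~ N(0, tau^2), unit observation variance."""
    n = len(x)
    mu = rng.normal(0.0, tau, size=2)   # initialize component means
    z = rng.integers(0, 2, size=n)      # initialize cluster labels
    trace = np.empty((n_steps, 2))
    for t in range(n_steps):
        # Sample each label from its full conditional
        # (normalized component likelihoods, equal prior weights).
        logp = -0.5 * (x[:, None] - mu[None, :]) ** 2 / sigma**2
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        z = (rng.random(n) < p[:, 1]).astype(int)
        # Sample each mean from its conjugate Gaussian full conditional.
        for k in range(2):
            nk = np.sum(z == k)
            var = 1.0 / (nk / sigma**2 + 1.0 / tau**2)
            mean = var * x[z == k].sum() / sigma**2
            mu[k] = rng.normal(mean, np.sqrt(var))
        trace[t] = np.sort(mu)  # sort to sidestep label switching
    return trace

trace = gibbs_gaussian_mixture(x, n_steps=2000)
print("posterior mean estimates (sorted component means):", trace[500:].mean(axis=0))
```

With delta large, the sampler almost never reassigns a whole block of labels at once, so it can remain stuck near one mode of the posterior for a very long time; this is the slow-convergence phenomenon that the dissertation quantifies and that tempering-style methods attempt to remedy.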
