Date of Award
Spring 1-1-2025
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Statistics and Data Science
First Advisor
Barron, Andrew
Abstract
In this work, we consider a Bayesian method for training single-hidden-layer neural networks with $\ell_{1}$-controlled weights: we define posterior distributions using different subsets of the training data and combine the resulting posterior means to form our estimators. We consider both a joint Bayesian model for all parameters of the network at once and a greedy Bayes model that trains the neurons one at a time based on the residuals of previous fits. The log-likelihoods of the posterior distributions we define are multimodal and non-concave, so sampling algorithms such as Markov chain Monte Carlo (MCMC) will not mix rapidly when sampling the posteriors directly. By introducing an auxiliary random variable, we produce a mixture distribution that we call a log-concave coupling. With a continuous uniform prior over the $\ell_{1}$ ball, the conditional distributions of this mixture are log-concave, and the mixing distribution itself is log-concave when the number of parameters in the network exceeds the squared number of data points. The mixture distribution can therefore be sampled efficiently to produce samples from our original target density. For a discrete uniform prior over the $\ell_{1}$ ball intersected with a grid of small spacing, we study the performance of our posterior mean estimator in both an arbitrary-sequence regret sense and a statistical risk sense. Let $g$ be a target function and let $\tilde{g}$ be its projection onto the closure of the convex hull of signed neurons scaled by a constant. With neuron weight vectors of dimension $d$ and $N$ data points, we show that an estimator defined by a combination of our posterior means in the joint sampling problem has arbitrary-sequence regret and statistical risk within $O([(\log d)/N]^{1/4})$ of the regret and risk of $\tilde{g}$. For the greedy construction, the additional regret and risk improve to a third-root power.
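The coupling construction described above can be sketched schematically as follows. This is an illustrative outline only, with generic symbols ($w$ for the network weights, $\xi$ for the auxiliary variable, $p_0$ for the prior, $\ell$ for the negative log-likelihood) rather than the dissertation's exact notation:

```latex
% Target posterior over weights w: multimodal, not log-concave in general
p(w \mid \mathrm{data}) \;\propto\; \exp\{-\ell(w)\}\, p_0(w)

% Introduce an auxiliary random variable \xi via a chosen conditional kernel,
% forming the joint (coupled) distribution
p(w, \xi) \;=\; p(w \mid \mathrm{data})\, p(\xi \mid w)

% Sample in two stages: first the mixing (marginal) distribution of \xi,
% then the reverse conditional of w given \xi
p(\xi) \;=\; \int p(w, \xi)\, dw,
\qquad
p(w \mid \xi) \;=\; \frac{p(w, \xi)}{p(\xi)}
```

If the kernel $p(\xi \mid w)$ is chosen so that both the conditional $p(w \mid \xi)$ and the marginal $p(\xi)$ are log-concave, each stage can be sampled efficiently by standard log-concave samplers, and a draw of $\xi \sim p(\xi)$ followed by $w \sim p(w \mid \xi)$ yields an exact draw from the original target posterior.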
Recommended Citation
McDonald, Curtis James, "Computation and Estimation for Neural Networks via Log-Concave Coupling" (2025). Yale Graduate School of Arts and Sciences Dissertations. 1625.
https://elischolar.library.yale.edu/gsas_dissertations/1625