Date of Award
Spring 2025
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Engineering and Applied Science
First Advisor
Panda, Priya
Abstract
The widespread integration of Artificial Intelligence (AI) into everyday life is driving a paradigm shift towards deploying machine learning models on a massive, distributed ecosystem of edge devices. While this decentralization opens opportunities for data-driven applications, it introduces a fundamental trilemma: the concurrent pursuit of high performance, operational efficiency, and stringent data privacy. State-of-the-art AI models, particularly Large Language Models (LLMs), demand computational and memory resources that far exceed the capabilities of typical edge devices, creating significant bottlenecks in both training and inference. Furthermore, the distributed nature of data gives rise to statistical heterogeneity and communication overhead, while the sensitivity of user data necessitates robust privacy-preserving mechanisms.

This dissertation posits that overcoming these multifaceted challenges requires a holistic, cross-stack research methodology spanning hardware, algorithms, architectures, and models. It presents a portfolio of six novel contributions that collectively form a comprehensive framework for building practical, high-performance AI systems on resource-constrained, heterogeneous, and privacy-sensitive edge devices.

The first set of contributions addresses training efficiency. DC-NAS is a divide-and-conquer federated Neural Architecture Search method that accelerates architecture discovery. FedSNN shows that Spiking Neural Networks (SNNs) are a scalable and energy-efficient alternative to traditional models. At the hardware level, HaLo-FL is a hardware-aware federated learning framework that co-designs the training process, using adaptive low-precision techniques to match the specific constraints of diverse clients.

Next, this work turns to fine-tuning LLMs. FedPEFT is a specialized framework that reduces communication and training complexity for federated transformer models while maintaining accuracy. For privacy-preserving fine-tuning, ECLIPSE introduces a split-compute architecture that combines on-device private components with a cloud-based backbone under differential privacy. Finally, to optimize inference, FSD presents a fast speculative decoding framework for edge-cloud systems that reduces latency by maximizing client-server parallelism.

Collectively, these contributions offer a multi-layered set of solutions that advance distributed machine learning and chart a clear path towards efficient, scalable, and private AI on edge devices.
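The sketches below illustrate, in simplified Python, some of the primitives the abstract's contributions build on; they are illustrative reconstructions, not code from the dissertation. First, the leaky integrate-and-fire (LIF) dynamics underlying the SNNs studied in FedSNN. The leak factor and threshold here are assumed values, not FedSNN's hyperparameters:

import torch

def lif_step(v, x, tau=0.9, v_th=1.0):
    # v: membrane potential carried over from the previous timestep
    # x: weighted input current at this timestep
    # tau (leak) and v_th (threshold) are illustrative constants
    v = tau * v + x                      # leaky integration
    spike = (v >= v_th).float()          # emit a binary spike at threshold
    v = v - spike * v_th                 # soft reset where a spike fired
    return v, spike

# One neuron population over T timesteps; the energy advantage comes
# from sparse, binary spike activity replacing dense activations.
v = torch.zeros(128)
for x in torch.randn(10, 128):           # T=10 illustrative input currents
    v, spikes = lif_step(v, x)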
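HaLo-FL's adaptive low-precision training can be pictured with uniform fake quantization, where each client trains at a bitwidth matched to its hardware budget. This is a generic sketch of the mechanism, not HaLo-FL's precise scheme, and the bitwidths are illustrative:

import torch

def fake_quantize(w, bits):
    # Uniform symmetric fake quantization of a tensor to `bits` bits.
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

# A capable client might train at 8 bits while a constrained one uses 4:
w = torch.randn(64, 64)
w_strong_client = fake_quantize(w, bits=8)
w_weak_client = fake_quantize(w, bits=4)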
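The communication savings in FedPEFT come from exchanging only a small set of tunable parameters rather than full model weights. A minimal sketch of that pattern, assuming LoRA-style adapter keys and uniform federated averaging (the key-matching interface is hypothetical, not FedPEFT's actual API):

import torch

def client_upload(model, trainable_keys):
    # Upload only parameters whose names match the tunable set
    # (e.g., adapters or biases); everything else stays on the device.
    return {k: v.detach().clone()
            for k, v in model.state_dict().items()
            if any(t in k for t in trainable_keys)}

def server_average(client_updates):
    # FedAvg restricted to the uploaded PEFT parameters
    # (uniform client weights for simplicity).
    return {k: torch.stack([u[k] for u in client_updates]).mean(dim=0)
            for k in client_updates[0]}

Because only the small adapter tensors cross the network, per-round communication scales with the adapter size rather than the full transformer.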
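ECLIPSE's privacy guarantee rests on differential privacy for the on-device private components. The standard DP-SGD-style step below (clip each example's gradient, average, add Gaussian noise) shows the generic mechanism; the clipping norm and noise multiplier are illustrative, not ECLIPSE's settings:

import torch

def dp_noisy_grad(per_example_grads, clip_norm=1.0, sigma=0.8):
    # per_example_grads: (batch, ...) gradients, one slice per example.
    # clip_norm and sigma are illustrative privacy parameters.
    b = per_example_grads.shape[0]
    flat = per_example_grads.reshape(b, -1)
    norms = flat.norm(dim=1, keepdim=True).clamp(min=1e-8)
    clipped = flat * (clip_norm / norms).clamp(max=1.0)       # per-example clipping
    mean = clipped.mean(dim=0)
    noise = torch.randn_like(mean) * (sigma * clip_norm / b)  # Gaussian mechanism
    return (mean + noise).reshape(per_example_grads.shape[1:])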
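Finally, FSD builds on speculative decoding: a small on-device draft model proposes several tokens, and the large server-side model verifies them in a single parallel pass. Below is a greedy, batch-size-1 sketch of that primitive, with draft, target, and k as assumed interfaces; FSD's client-server scheduling is more elaborate than this:

import torch

@torch.no_grad()
def speculative_step(draft, target, ids, k=4):
    # `draft` and `target` map (batch, seq) token ids to
    # (batch, seq, vocab) logits; batch size 1 is assumed.
    n = ids.shape[1]
    proposal = ids
    for _ in range(k):                                   # cheap drafting on device
        nxt = draft(proposal)[:, -1].argmax(-1, keepdim=True)
        proposal = torch.cat([proposal, nxt], dim=-1)
    logits = target(proposal)                            # one parallel server pass
    preds = logits[:, n - 1:].argmax(-1)                 # target's pick per new slot
    drafted = proposal[:, n:]
    match = (preds[:, :k] == drafted).long()
    agree = int(match.cumprod(dim=-1).sum())             # length of accepted prefix
    # Keep the accepted prefix plus the target's token at the first
    # mismatch (or its bonus token when all k drafts were accepted).
    return torch.cat([proposal[:, :n + agree], preds[:, agree:agree + 1]], dim=-1)

Each call advances the sequence by at least one verified token, and by up to k + 1 when the draft agrees with the target, which is where the latency reduction comes from.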
Recommended Citation
Venkatesha, Yeshwanth, "Resource-Aware Distributed Machine Learning: Unified Approaches for Private and Efficient On-Device Intelligence" (2025). Yale Graduate School of Arts and Sciences Dissertations. 1546.
https://elischolar.library.yale.edu/gsas_dissertations/1546