Date of Award

Fall 1-1-2025

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Electrical Engineering (ENAS)

First Advisor

Panda, Priyadarshini

Abstract

The growing complexity of neural networks, from spiking neural networks (SNNs) to vision transformers (ViTs) and large language models (LLMs), poses major challenges for deployment on edge devices with limited compute and memory resources. This thesis addresses these challenges by proposing a suite of co-optimization strategies that span the device, circuit, architecture, and algorithm layers to enable efficient AI inference on both analog and digital compute-in-memory (CiM) platforms. The first contribution, SpikeSim, is an end-to-end simulation framework for evaluating SNNs on CiM hardware; it uncovers key tradeoffs in neuronal memory, latency, and crossbar mapping. The second, XPert, is a differentiable co-search framework that jointly optimizes the neural architecture and circuit-level parameters such as ADC precision and crossbar size, achieving significant gains in energy and area efficiency. To handle the high latency and memory demands of ViTs, two frameworks, PIVOT and TReX, are introduced. They leverage input-dependent modulation of inference effort and attention reuse, and demonstrate high energy-delay-area product (EDAP) savings on both image and language tasks. Finally, MEADOW addresses the data movement bottlenecks in LLMs by proposing a memory-efficient dataflow (TPHS) and a weight-packing strategy, enabling low-latency inference on edge platforms with limited DRAM bandwidth. Together, these contributions provide a cross-stack roadmap for deploying modern AI models on constrained hardware, emphasizing the importance of full-system co-design in overcoming limitations in energy, latency, and scalability. This thesis lays a foundation for state-of-the-art, intelligent, and efficient edge AI.
