Date of Award

Fall 1-1-2025

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Electrical Engineering (ENAS)

First Advisor

Tassiulas, Leandros

Abstract

Real-world signals (networks, sequences, and space-time measurements) are held together by structured correlations that are neither purely local nor purely global and that evolve over time. This dissertation shows how learning and exploiting such correlation structure leads to models that are accurate, scalable, and interpretable, and how the same structure can guide sampling and editing. We begin with spatial correlations on graphs, treating the observed topology as a prior and learning the interaction structure that truly governs the nodes. The approach jointly (i) models linear dependencies via trainable graph filters, (ii) captures non-linear cues through similarity matching (kernel/attention affinities), and (iii) infers latent edges to repair missing or spurious links. The result is both a strong predictor and an interpretable correlation map that highlights subgraphs and motifs at scale. We then carry the same principle to long sequences by viewing tokens as nodes in an implicit affinity graph and exploiting those correlations to make transformer attention both expressive and efficient. Multi-hop attention diffusion propagates information over a learned sparse token graph to connect distant tokens within a single layer, while attention tensorization folds long inputs into low-order tensors so that short-range attention along each axis composes into long-range interactions. Together, these retain the benefits of full attention at sub-quadratic cost and enable robust length extrapolation. Next, we move from one-dimensional sequence correlations to two-dimensional space-time. We model node interactions jointly across space and time by learning adaptive graphs over time and applying spatio-temporal attention that is local in space yet preserves long temporal pathways, yielding robustness to topology drift and missing sensors.
Beyond prediction along the temporal dimension, we treat time series as a first-class modality alongside language and integrate them into multimodal LLMs: shared space-time embeddings and cross-modal attention let the model condition forecasts on textual context and verbalize quantitative trends. This unified treatment supports applications in financial analysis, weather forecasting, and computer network telemetry. Finally, we develop structure-guided generation along two tracks. For images, we replace a single entangled prompt with item-specific prompts and grouped cross-attention, yielding a unified editor (D-Edit) that supports text/image/mask edits and item removal with local, faithful changes. For dynamical data, we introduce graph-coupled diffusion/flow on learned space-time graphs: forward noise and reverse updates are topology-preconditioned (e.g., Laplacian shaping with conservative edge mixing), and conditioning is posed as boundary constraints. Across both tracks, the principle is the same: align the generator with correlation structure to achieve controllable, calibrated sampling and editing in images and space-time systems. Taken together, the thesis advances one principle: learn the correlation structure that ties together nodes, tokens, and space-time, and reuse it for prediction and generation. The resulting systems deliver state-of-the-art accuracy with sub-quadratic compute, remain robust under topology drift and missing data, and keep decisions traceable. This perspective offers a practical template for scalable, controllable, and interpretable models across traffic, text, and network telemetry.
