Date of Award

January 2025

Document Type

Open Access Thesis

Degree Name

Master of Public Health (MPH)

Department

School of Public Health

First Advisor

Mark Gerstein

Abstract

Traditional motif discovery tools often fail to capture cooperative interactions between transcription factors (TFs). This study uses DNABERT, a transformer-based deep learning model, to analyze TF binding sites (TFBS) through attention mechanisms and k-mer tokenization. Fine-tuned on ENCODE ChIP-seq data, DNABERT achieved high predictive accuracy and identified co-binders such as FOS and JUND through motif enrichment analysis. The model inferred position weight matrices (PWMs) that align with JASPAR motifs and highlighted the regulatory importance of sequences near gene start regions. Comparisons with BPNet showed strong correlations, supporting DNABERT’s ability to capture TF-specific patterns. Although constrained by dataset biases and computational demands, DNABERT advances correlation analysis of co-binding TFs by combining attention-driven interpretability with sequence context analysis. This framework offers a foundation for exploring cooperative TF interactions and their roles in gene regulation, with future applications in disease research and multi-modal genomic integration.
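
As a rough illustration of the k-mer tokenization mentioned in the abstract, the sketch below splits a DNA sequence into overlapping k-mers with a stride of one base. The function name `kmer_tokenize`, the example sequence, and the choice of k = 6 are assumptions made for illustration; the abstract does not state which k the thesis used, and this is not the thesis's actual preprocessing code.

```python
def kmer_tokenize(sequence: str, k: int = 6) -> list[str]:
    """Split a DNA sequence into overlapping k-mers (stride 1).

    Hypothetical helper for illustration; k=6 is an assumed default.
    Sequences shorter than k yield an empty list.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    # Slide a window of width k across the sequence, one base at a time.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Example: an 8-base sequence yields 8 - 6 + 1 = 3 overlapping 6-mers.
tokens = kmer_tokenize("ATGCGTAC", k=6)
print(tokens)
```

Each k-mer then serves as one input token, so neighbouring tokens share k - 1 bases and the model sees every position in several overlapping contexts.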

Comments

This is an Open Access Thesis.

