Date of Award
January 2025
Document Type
Open Access Thesis
Degree Name
Master of Public Health (MPH)
Department
School of Public Health
First Advisor
Mark Gerstein
Abstract
Traditional motif discovery tools often fail to capture cooperative interactions between transcription factors (TFs). This study uses DNABERT, a transformer-based deep learning model, to analyze TF binding sites (TFBSs) through attention mechanisms and k-mer tokenization. Fine-tuned on ENCODE ChIP-seq data, DNABERT achieved high predictive accuracy and identified co-binders such as FOS and JUND through motif enrichment analysis. The model inferred position weight matrices (PWMs) that align with JASPAR motifs and highlighted the regulatory importance of sequences near gene start regions. Comparisons with BPNet showed strong correlations, supporting DNABERT’s ability to capture TF-specific patterns. Although constrained by dataset biases and computational demands, DNABERT advances co-binder TF correlation analysis by combining attention-driven interpretability with sequence context analysis. This framework offers a foundation for exploring cooperative TF interactions and their roles in gene regulation, with future applications in disease research and multi-modal genomic integration.
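As a rough illustration of the k-mer tokenization mentioned in the abstract, the sketch below splits a DNA sequence into overlapping k-mers with a stride of 1. The function name `kmer_tokenize`, the default k = 6, and the toy input sequence are assumptions made for illustration; the abstract does not state the value of k or the exact preprocessing used in the thesis.

```python
def kmer_tokenize(sequence: str, k: int = 6) -> list[str]:
    """Split a DNA sequence into overlapping k-mers (stride 1).

    Hypothetical helper for illustration only; k = 6 is an assumed default,
    not a value taken from the thesis.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = sequence.upper()
    # A sequence shorter than k yields no tokens.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Toy 9-bp sequence (not from the thesis data).
tokens = kmer_tokenize("TGACTCATG", k=6)
print(" ".join(tokens))
```

Each token shares k - 1 bases with its neighbor, so a sequence of length n yields n - k + 1 tokens for the model to consume.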
Recommended Citation
Zhao, Zetong, "DNABERT Reveals TF Motifs Information in DNA with Attention" (2025). Public Health Theses. 2574.
https://elischolar.library.yale.edu/ysphtdl/2574

Comments
This is an Open Access Thesis.