Date of Award
Spring 1-1-2025
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Chemistry
First Advisor
Batista, Victor
Abstract
Drug discovery is a challenging endeavor for multiple reasons, including the complexity of protein-ligand interactions at the molecular level, the vastness of chemical space, and safety constraints related to unintended biological effects. In this dissertation, I present a suite of novel deep learning approaches that address these challenges from multiple angles. I first introduce HAC-Net, a deep learning model that was, at the time, the state of the art for predicting protein-ligand binding affinity. HAC-Net was used to identify a potential inhibitor of a G protein-coupled receptor whose overexpression leads to cancer, diabetes, and multiple sclerosis, as well as a potential antivirulence drug for drug-resistant staphylococcal infections. HAC-Net provides chemists with a predictive tool for rational drug design by enabling accurate modeling of molecular interactions in biochemically relevant systems. Building upon that work, I developed T-ALPHA, the current state-of-the-art deep learning model for predicting protein-ligand binding affinity, which incorporates an uncertainty-aware self-learning method for protein-specific alignment. T-ALPHA not only improves upon HAC-Net by offering superior predictive accuracy on experimental structures but, importantly, also retains state-of-the-art performance on computationally generated structures, enabling chemists to obtain accurate binding affinity estimates even in the absence of experimentally determined structures. Beyond discriminative tasks, I demonstrate the ability of generative machine learning methods to intelligently navigate chemical space and locate desirable regions. I created ChemSpaceAL, the first active learning methodology for fine-tuning a molecular generative model toward a specified protein target. ChemSpaceAL is designed to be computationally efficient and is particularly applicable to the creation of protein target-specific molecular libraries.
We are currently using this methodology in collaboration with the Lisi group at Brown University to design small-molecule binders to the HNH domain of CRISPR-Cas9 that enhance its specificity for target DNA sequences. Recognizing the importance of considering off-target safety effects in addition to on-target potency, I created CardioGenAI, a generative machine learning-based framework for re-engineering drugs to reduce hERG-related cardiotoxicity while preserving their primary pharmacology, and applied it to specific programs within Pfizer R&D that faced hERG liabilities. This framework is particularly valuable for medicinal chemists seeking to optimize lead compounds for reduced cardiotoxicity early in the drug discovery pipeline. Collectively, these efforts advance early-stage drug discovery by providing chemists with computational tools that complement experimental approaches, facilitating the investigation of biochemically significant systems at the molecular level.
Recommended Citation
Kyro, Gregory William, "Deep Learning Methods for Protein–Small Molecule Interactions with Applications to Early-Stage Drug Discovery" (2025). Yale Graduate School of Arts and Sciences Dissertations. 1537.
https://elischolar.library.yale.edu/gsas_dissertations/1537