Date of Award
Fall 1-1-2025
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Chemistry
First Advisor
Jorgensen, William
Abstract
The drug discovery pipeline is both expensive and prone to high attrition, with many compounds failing in late-stage clinical trials. Computer-aided drug design (CADD) has emerged as a pivotal part of the pipeline, mitigating these challenges by enabling earlier and more cost-effective assessment of compound viability. This work focuses on two major aspects of CADD: absorption, distribution, metabolism, excretion, and toxicity (ADMET) predictions and free energy calculations. First, neural network models are developed for the prediction of aqueous solubility and cytotoxicity, two key contributors to late-stage drug attrition. Ensemble models of graph convolutional networks (GCNs) and graph attention networks (GATs), named SolNet-GCN, SolNet-GAT, ToxNet-GCN, and ToxNet-GAT, are trained and tested on curated datasets. The solubility models outperform established approaches, both in cross-validation and on an independent validation set. For cytotoxicity prediction, a dataset of non-nucleoside reverse transcriptase inhibitors (NNRTIs) with corresponding experimental measurements is compiled. The cytotoxicity models outperform traditional descriptor-based models, a first step toward meeting the unmet need for accurate toxicity predictions in the drug discovery process. Next, absolute binding free energy (ABFE) calculation protocols are optimized within the context of SARS-CoV-2 main protease inhibitors. Metadynamics, an enhanced sampling technique, is compared against free energy perturbation (FEP) methods. Metadynamics simulations leverage collective variables (CVs) to reconstruct the potential of mean force (PMF), thus allowing for the estimation of ABFE. Consensus docking, a methodology introduced in this work, is also explored for its ability to rank inhibitors based on binding affinity. This work highlights potential uses for these methods in the hit-to-lead pipeline.
Finally, a validation protocol for molecular mechanics force fields via boiling point predictions of organic liquids is introduced. Molecular dynamics (MD) and Monte Carlo (MC) simulations are performed, and the Gibbs free energy of vaporization is calculated. The Optimized Potentials for Liquid Simulations all-atom (OPLS-AA) force field is employed for all simulations. With mean absolute errors of 7.41% (MD) and 10.58% (MC), this work shows that the parameterization of the OPLS-AA force field using enthalpies of vaporization and gas-phase properties is sufficient for accurate boiling point predictions from molecular simulations.
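As a rough illustration of how a boiling point can follow from a Gibbs free energy of vaporization, the sketch below locates the temperature at which the free energy of vaporization at 1 atm changes sign, using linear interpolation between sampled temperatures. It is not the protocol used in the dissertation, which the abstract does not describe in detail; the function name `boiling_point_from_dg` and the sample values are hypothetical and chosen only to make the example runnable.

```python
from typing import Sequence


def boiling_point_from_dg(temps_k: Sequence[float], dg_vap: Sequence[float]) -> float:
    """Estimate the temperature (K) at which dG_vap crosses zero.

    Assumes dG_vap is positive below the boiling point and negative above it,
    and interpolates linearly between the two bracketing samples.
    Hypothetical helper for illustration only.
    """
    if len(temps_k) != len(dg_vap) or len(temps_k) < 2:
        raise ValueError("need at least two matching (T, dG) samples")

    # Sort samples by temperature so neighbours bracket the crossing.
    pairs = sorted(zip(temps_k, dg_vap))

    for (t0, g0), (t1, g1) in zip(pairs, pairs[1:]):
        if g0 == 0.0:
            return t0
        if g0 > 0.0 >= g1:
            # Linear interpolation to the zero of dG_vap(T).
            return t0 + (t1 - t0) * g0 / (g0 - g1)

    raise ValueError("dG_vap does not change sign over the sampled range")


# Synthetic values (not from the dissertation): temperatures in K, dG in kcal/mol.
example_temps = [300.0, 350.0, 400.0]
example_dg = [2.0, 0.5, -1.0]
estimated_tb = boiling_point_from_dg(example_temps, example_dg)
```

In practice the sampled free energies would come from the MD or MC simulations, and a finer temperature grid or a fitted curve would likely replace the simple two-point interpolation.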
Recommended Citation
Saar, Anastasia, "Streamlining Drug Discovery with the Aid of Artificial Intelligence and Physics-Based Modeling" (2025). Yale Graduate School of Arts and Sciences Dissertations. 1803.
https://elischolar.library.yale.edu/gsas_dissertations/1803