Parkour in Protein Morphospace
Date of Award
Spring 1-1-2025
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Computational Biology and Bioinformatics
First Advisor
Machta, Benjamin
Abstract
Proteins can accumulate mutations over evolution and form distinct lineages through gene duplications. However, not all mutations are equally admissible: evolution is constrained to traverse the parts of sequence space that are functionally viable. The connectivity structure of protein morphospace, the space of functional sequences, remains a mystery. In this dissertation, we use recent advancements in artificial intelligence to poke at this question. In the first chapter, we propose the One Fell Swoop method to identify sensible mutations within a protein sequence and estimate the sequence's functional plausibility. We evaluate its performance against experimental measurements of protein function from deep mutational scanning (DMS) experiments curated in the ProteinGym benchmark, and explore how it can be used for protein design. In the second chapter, we define a search protocol through protein morphospace that uses One Fell Swoop to connect arbitrary pairs of homologous sequences. We found viable paths between protein pairs that share just 18% sequence identity. Notably, we observed that some intermediates along a path can acquire structures not shared by the end-point sequences but found in other members of the protein family. In the third chapter, we discuss how surface-level features in biological sequences can lead to inflated fitness scores as evaluated by transformer-based language models. Looking forward, we speculate that a measure of the ease of interpolating between two sequences could be used as a proxy for the likelihood of homology between them.
Recommended Citation
Kantroo, Pranav, "Parkour in Protein Morphospace" (2025). Yale Graduate School of Arts and Sciences Dissertations. 1684.
https://elischolar.library.yale.edu/gsas_dissertations/1684