Parkour in Protein Morphospace

Date of Award

Spring 1-1-2025

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computational Biology and Bioinformatics

First Advisor

Machta, Benjamin

Abstract

Proteins can accumulate mutations over evolution and form distinct lineages through gene duplications. However, not all mutations are equally admissible: evolution is constrained to traverse the parts of sequence space that are functionally viable. The connectivity structure of protein morphospace, the space of functional sequences, remains a mystery. In this dissertation, we use recent advances in artificial intelligence to poke at this question. In the first chapter, we propose the One Fell Swoop method to identify sensible mutations within a protein sequence and estimate its functional plausibility. We evaluate its performance against experimental measurements of protein function from deep mutational scanning (DMS) experiments curated in the ProteinGym benchmark, and explore how it can be used for protein design. In the second chapter, we define a search protocol through protein morphospace that uses One Fell Swoop to connect arbitrary pairs of homologous sequences. We found viable paths between protein pairs that share just 18% sequence identity. Notably, we observed that some intermediates along a path can acquire structures not shared by the end-point sequences but found in other members of the protein family. In the third chapter, we discuss how surface-level features in biological sequences can lead to inflated fitness scores as evaluated by transformer-based language models. Looking forward, we speculate that a measure of the ease of interpolating between two sequences could be used as a proxy for the likelihood of homology between them.

This document is currently not available here.
