"Learning Latent Representations for Biological Molecules" by Tianxiao Li

Date of Award

Fall 2023

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computational Biology and Bioinformatics

First Advisor

Gerstein, Mark

Abstract

Recent progress in machine learning has led to phenomenal successes in predicting the properties of biological molecules from large biological data. However, how much the models really “understand” the nature of the data remains an important question. Representation learning aims to extract information-rich, low-dimensional vector representations from complex, high-dimensional data. Such representations not only facilitate visualization and interpretation of the data, but also generalize flexibly and efficiently to other downstream tasks without having to re-learn universal patterns in every application. Furthermore, in generative modelling, the latent representations can be directly manipulated to steer generation towards desired properties. These capabilities are of particular importance to the computational modelling of biological molecules, as we seek to enhance our understanding of their properties and utilize them for the design of proteins and drugs.

In this dissertation, we present several works on latent representation learning for biological molecules, utilizing methods ranging from latent variable models to language models and offering novel insights into the pivotal explanatory patterns within their functions and structures. With these representation learning methods, we are able to quantify complex regulatory changes, identify DNA sequence patterns related to genomic variant effects, and manipulate the properties of known molecular templates.
