Date of Award
Spring 1-1-2025
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Chemistry
First Advisor
Batista, Victor
Abstract
Throughout human history, our ancestors have continuously expanded the frontiers of their interest in nature. Chemistry is at the heart of such exploration. Yet, despite countless discoveries and experiments, we are still working through the space of possibilities within our finite time. With an ever-growing amount of information and open access to public resources, we are experiencing an inflection point. This moment redefines how efficiently we can solve problems, and the key lies in integrating the growing body of insight accumulated over the entire history of knowledge. Algorithms, in contrast to human effort, scale with the amount of available hardware. In many cases where humans cannot enumerate every possibility to find answers, algorithms fill the gap. Machine learning is a special class of algorithms that leverages gradient descent and statistical patterns. Its ability to find solutions using processes similar to neural activation in the human brain makes it increasingly powerful in areas once thought to be human-dominant. Just as Moore's law predicts that the number of transistors on an integrated circuit doubles roughly every two years, our computational capacity will continue to grow exponentially.
I believe that in the future, we will no longer need to explore and deduce everything ourselves. Instead, we can comprehend information more efficiently by studying it through the lens of algorithms, which can be treated not just as a simple collection, but as an entity encompassing all human knowledge. My research has two aims: 1) to harness the scaling nature of algorithms and 2) to integrate them intelligently for the study of Chemistry. The term Continization, which I have coined, denotes a process that transforms discrete descriptions into continuous representations.
The formal definition is: a process through which discrete representations such as words, molecules, and other abstract concepts are represented as continuous variables that preserve relational properties and contextual dependencies. In Chemistry, many descriptions are discrete, especially those involving categories. People speak of "a molecule", "a chemical reaction", or "a synthesis route". While one cannot traditionally measure "how far one molecule is from another", after continization such measurements gain meaning and can be defined in a specific mathematical space. Solutions in such spaces can then be sought by optimization with gradient descent. Through this approach, we can not only interpolate among existing possibilities but also extrapolate to territories beyond the scope of prior knowledge.
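The idea of measuring distances and optimizing in a continuous space can be illustrated with a minimal sketch. It is not the dissertation's method: the two-dimensional coordinates in `EMBEDDING` are hand-picked, hypothetical values (a real model would learn them), and the quadratic objective in `descend` is an assumed stand-in for whatever property one actually wants to optimize.

```python
import math

# Hypothetical, hand-picked 2-D coordinates for a few discrete items.
# A real continization would learn these from data.
EMBEDDING = {
    "methanol": (1.0, 0.0),
    "ethanol": (2.0, 0.0),
    "propanol": (3.0, 0.0),
    "benzene": (0.0, 4.0),
}


def distance(a, b):
    """Euclidean distance between two named items in the toy space."""
    return math.dist(EMBEDDING[a], EMBEDDING[b])


def descend(start, target, lr=0.1, steps=200):
    """Minimize the assumed objective f(x) = |x - target|^2 by gradient descent."""
    x = list(start)
    for _ in range(steps):
        grad = [2.0 * (xi - ti) for xi, ti in zip(x, target)]  # gradient of f
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
    return tuple(x)


def nearest(point):
    """Map a continuous point back to the closest known discrete item."""
    return min(EMBEDDING, key=lambda name: math.dist(EMBEDDING[name], point))


if __name__ == "__main__":
    print(distance("methanol", "ethanol"))
    print(distance("methanol", "benzene"))
    optimum = descend(EMBEDDING["methanol"], target=(2.8, 0.1))
    print(optimum, nearest(optimum))
```

In this toy space, "how far one molecule is from another" becomes an ordinary distance, and the point reached by gradient descent need not coincide with any listed item, which is why `nearest` is used to decode it back to a discrete description.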
Recommended Citation
Li, Haote, "Continization and Language Representation of Chemical Information" (2025). Yale Graduate School of Arts and Sciences Dissertations. 1767.
https://elischolar.library.yale.edu/gsas_dissertations/1767