Date of Award
January 2025
Document Type
Thesis
Degree Name
Master of Public Health (MPH)
Department
School of Public Health
First Advisor
Hongyu Zhao
Abstract
Understanding the genetic basis of rare diseases remains a central challenge in human genomics, especially when conventional methods rely on case-control labels that may oversimplify complex clinical variation. In this thesis, we introduce SimiSKATO, a novel approach that integrates phenotype embeddings with rare variant association testing. Using PERADIGM-generated similarity scores to represent clinical resemblance between individuals, SimiSKATO treats these scores as continuous traits within the SKAT-O framework. By applying this method to rare diseases in the UK Biobank, we demonstrate that SimiSKATO not only replicates known gene-disease associations but also identifies biologically plausible novel candidates missed by other methods. This embedding-based framework expands the toolkit for rare variant studies and underscores the value of integrating rich clinical data into genetic discovery.
Recommended Citation
Wang, Changheng, "Integrating Rare Variant Association Studies With Embedding-Based Methods: SimiSKATO" (2025). Public Health Theses. 2562.
https://elischolar.library.yale.edu/ysphtdl/2562
Comments
This thesis is restricted to Yale network users only. It will be made publicly available on 12/16/2025.