Date of Award
Fall 2023
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Public Health
First Advisor
Hoh, Josephine
Abstract
Background: Domestic dogs (Canis lupus familiaris) share more than 300 diseases with humans and develop them spontaneously in similar ways, making them an excellent animal model for studying human heritable diseases. Dogs have also been favored as a model organism for mapping complex traits that are hard to map in humans, because intensive artificial selection has produced drastically different phenotypes and limited genetic heterogeneity across breeds. Previous studies that profiled breed-specific traits, or that used genome-wide association studies (GWAS) to refine loci associated with characteristic morphological features, have yielded substantial genetic insight into known traits observed among dog breeds. This dissertation addresses the question from the reverse perspective: whether there are breed-specific genotypes that may underlie currently unknown phenotypes. To identify such genotypes, this dissertation proposes and investigates a novel genetic concept in modern dog breeds, the breed-specific genetic signature (BSGS). The BSGS profile of a breed characterizes all genetic variants that are both homogeneously enriched in that breed and universally absent in all other breeds. The genetic uniqueness of BSGS makes them high-priority candidate variants behind the distinctive breed-defining traits of the corresponding breeds. Deciphering and prioritizing BSGS is therefore important for revealing as yet unknown gene-trait associations in dogs and, in turn, for generating public health implications for humans. Overall, this dissertation leveraged large-scale whole genome sequencing data and a wide variety of computational approaches to discover the breed-specific genetic underpinnings of breed-specific phenotypes in dogs and to empower dogs as an effective model organism for resolving complex human traits. The specific aims were: Aim 1. To collect, generate, and assemble high-quality whole genome sequencing data into a comprehensive genetic variant catalog representing a highly diversified dog population (Chapter 2). Aim 2. To systematically profile genetic signatures exclusive to single dog breeds and breed pairs. Under this aim, three types of short variants were investigated independently: single nucleotide polymorphisms (Chapter 3), short insertions and deletions (Chapter 4), and short tandem repeats (Chapter 5). Aim 3. To analyze the colocalization of breed-specific genetic signatures and to investigate the selection driving long breed-specific genomic segments (Chapter 6).

Methods: The analytical approaches for each aim were: Aim 1. High-quality raw whole genome sequencing (WGS) data covering a wide variety of well-defined dog breeds (452 dogs from 97 breeds) were both collected from public archives and generated in the Hoh lab. A novel quality control tool was developed to comprehensively evaluate key hidden quality metrics of raw sequencing data and to generate essential parameters for quality trimming. Trimmed sequencing data were assembled based on the most up-to-date high-resolution dog reference genome. High-quality genomic variants were catalogued by variant type. Aim 2. A novel multi-threaded computational tool leveraging unsupervised machine learning algorithms was developed to efficiently discover BSGS throughout the genome in qualified samples (412 dogs from 76 breeds). BSGS profiles were analyzed both within and across breeds. Functional annotation was incorporated to evaluate the potential protein-altering effects of each BSGS. Selected high-impact BSGS were experimentally validated by genotyping. Shared genomic signatures were examined to establish a novel breed similarity metric and to infer common evolutionary history between breeds. Potential causes of certain signature patterns were evaluated from an evolutionary perspective.
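As a rough illustration of the BSGS criterion stated in the Background (a variant homogeneously enriched in one breed and absent from all others), the following minimal Python sketch applies a strict version of that rule to a toy table. It is not the dissertation's tool, which is described as multi-threaded and based on unsupervised machine learning. The function name, the `min_dogs` threshold, the variant identifiers, and the interpretation of "homogeneously enriched" as "carried by every sampled dog of the breed" are all assumptions made for illustration.

```python
from collections import defaultdict


def find_breed_specific_variants(breed_of, carriers, min_dogs=2):
    """Return {variant_id: breed} for variants carried by every sampled dog
    of exactly one breed and by no dog of any other breed.

    breed_of : dict mapping dog ID -> breed name (hypothetical input layout)
    carriers : dict mapping variant ID -> set of dog IDs carrying the variant
    min_dogs : assumed minimum number of sampled dogs required per breed
    """
    # Group dog IDs by breed.
    members = defaultdict(set)
    for dog, breed in breed_of.items():
        members[breed].add(dog)

    signatures = {}
    for variant, dogs in carriers.items():
        if not dogs:
            continue  # variant not observed in any dog
        breeds_hit = {breed_of[d] for d in dogs}
        if len(breeds_hit) != 1:
            continue  # carried by dogs of more than one breed
        breed = next(iter(breeds_hit))
        # Strict assumption: every sampled dog of the breed carries it.
        if len(members[breed]) >= min_dogs and dogs == members[breed]:
            signatures[variant] = breed
    return signatures


# Toy data, invented for illustration only.
breed_of = {"d1": "Akita", "d2": "Akita", "d3": "Boxer", "d4": "Boxer"}
carriers = {
    "chr1:100:A>G": {"d1", "d2"},  # all Akitas, no Boxers
    "chr2:200:C>T": {"d1", "d3"},  # shared across breeds
    "chr3:300:G>A": {"d3"},        # only one of the two Boxers
}
print(find_breed_specific_variants(breed_of, carriers))
```

On the toy data only the first variant qualifies. A practical criterion would likely need to tolerate missing genotypes and sequencing error rather than demand exact set equality, which is one reason a statistical or clustering approach may be preferable to this strict rule.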
Aim 3. BSGS of different variant types were pooled and analyzed collectively. Long breed-specific genomic segments were identified using canonical correlation analyses and an adaptive sliding-window algorithm. Both spatial and annotation information was incorporated to identify BSGS-enriched genomic regions. Evidence of intensive selection over breed-specific genomic segments was analyzed and used to prioritize potential functional variants associated with breed-defining traits.

Results: This dissertation generated a large number of previously unreported results. To avoid computational artifacts, I collaborated with members of the Hoh lab to validate several findings without bias, using additional dogs and different experimental methods. The computer code, the entire dataset, and the full BSGS results are available at the website “https://medicine.yale.edu/lab/hoh/data/” (listed in the Appendices section). Parts of the work have been published in the journal BMC Genomics. Below, I briefly highlight several significant findings. Aim 1. I constructed a comprehensive dog whole-genome variant catalog containing 28 million high-quality short genomic variants for 452 dogs from 97 breeds. Around three quarters of the variants in this collection were SNPs, and the remaining quarter were short INDELs. Dog INDELs were much more polymorphic than SNPs: more than a third of INDELs were multi-allelic, compared with fewer than five percent of SNPs. Although only a tiny fraction of variants were located in exons, INDELs were significantly more enriched than SNPs within those functional regions. Aim 2. Among 144 functional BSGS identified across the entire dog WGS collection, nine BSGS with significant protein-altering effects were highlighted; these could potentially be linked to breed-specific phenotypes yet to be defined. Four novel nonsense BSGS with predicted prominent protein-truncation effects were found.
SLC28A1 and SLAMF8 were severely truncated in Bernese Mountain Dog and Samoyed, respectively. PIBF1 and MYH16 were partially truncated in Bull Terrier and Basset Hound, respectively. Four INDELs resulting in either frameshifts or codon disruptions were found. Norwich Terrier and Airedale Terrier carried frameshift variants in ZDHHC1 and OR5J2, respectively. Chow Chow carried an INDEL causing a disruptive in-frame deletion in SIPA1L1. Bernese Mountain Dog carried an INDEL leading to the loss of one codon in CENPU. One functional breed-specific STR contraction causing the loss of consecutive glycines was found exclusively in the coding region of TENT5A in Bull Terrier. Together with archaeological timelines and breeding records, the BSGS profiles showed that ancient breeds generally carried more BSGS than recently established ones. The relative number of BSGS also revealed that certain breeds, such as Bichon Frise, had suffered population bottlenecks, which challenges the common assumption of their reportedly long breed history. The newly established GS-sharing-based similarity metric revealed close relationships between breed pairs both with and without high morphological similarity, such as Collie-Shetland Sheepdog and Alaskan Malamute-Chow Chow. These results further point to as yet unknown breed-defining phenotypes shared by seemingly distantly related breeds. Aim 3. A total of 51 long breed-specific genomic segments from 19 breeds were each found to contain at least one functional BSGS. Among them, six influential functional BSGS with significant protein-changing effects (three nonsense BSGS, two BSGS causing in-frame deletions, and one BSGS with a frameshift effect) were located within the corresponding breed-specific segments. Seven breed-specific genomic structures enriched with multiple functional BSGS were discovered as marks of past selection events in four breeds.
Specifically, one long-range missense BSGS trio (AQP3-NFX1-NOL6) was found in Akita. One breed-specific genomic structure containing two missense BSGS, in FBXO40 and GOLGB1, was found in Alaskan Malamute. Two breed-specific genomic structures were found in Boxer, one containing functional BSGS in MYO15B and SAP30BP and the other in CD300A and OTOP2. Three functional BSGS pairs (CTRL-SLC12A4, ZDHHC1-PLEKHG4, and PCNX3-SNX32) were identified in three breed-specific stretches in Norwich Terrier. Strong correlation between these functional BSGS and clusters of surrounding nonfunctional BSGS further indicated their role as potential selection targets underlying certain breed-defining phenotypes.

Conclusions: Understanding the unique breed structure of modern purebred dogs is key to successfully designing and conducting canine-based medical studies. As the results of this dissertation indicate, the genetic background of dogs can vary greatly across breeds, making the choice of appropriate breeds essential to such studies. Given the strong relationship between human and dog breed-specific traits, these results could be of considerable interest to researchers and to a wider audience. Novel genetic signatures that potentially differentiate dog breeds were uncovered. Several influential functional genetic signatures may indicate as yet unknown breed-specific phenotypic traits or disease predispositions. Overall, the BSGS profiles open the door to further investigation of unknown breed-specific phenotypes such as disease predispositions. Additionally, the method and computational tool developed in this dissertation can be applied to WGS collections of other dog breeds as well as to model organisms other than dogs. This study will stimulate new thinking, as breed-specific genetic signatures may offer overarching relevance of animal models to human health and disease.
Recommended Citation
Li, Zicheng, "A Multidisciplinary Approach With Dogs As The Model Organism To Identify Whole-Genome Breed-Specific Genotypes Potentially Relating To Human Complex Traits" (2023). Yale Graduate School of Arts and Sciences Dissertations. 1215.
https://elischolar.library.yale.edu/gsas_dissertations/1215