Date of Award

January 2014

Document Type


Degree Name

Medical Doctor (MD)



First Advisor

Richard P. Lifton

Subject Area(s)

Genetics, Medicine, Biology


A RARE VIEW OF CODING MUTATIONS AND PLASMA LIPID LEVELS. Aniruddh P. Patel, Sekar Kathiresan. Center for Human Genetics Research, Massachusetts General Hospital, Harvard Medical School, Boston, MA and Program in Medical and Population Genetics, the Broad Institute of Harvard and MIT, Cambridge, MA (Sponsored by Richard P. Lifton, Department of Genetics, Yale University School of Medicine, New Haven, CT).

Plasma low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), and triglycerides (TG) are quantitative, heritable risk factors for coronary heart disease. Genome-wide association studies (GWAS) of common DNA sequence variants have identified many loci associated with plasma lipid levels. Targeted re-sequencing of exons has been proposed as a strategy to pinpoint causal variants and genes within GWAS loci. Additionally, genotyping of rare and low-frequency variants in large cohorts using an exome array has been proposed as a method to assess the contribution of rare variation to plasma lipid levels at the population level.

We tested the hypothesis that each genomic region identified by GWAS as significantly associated with HDL-C level contains at least one gene causal for HDL-C metabolism. We performed solution-based hybrid selection of 4,118 exons at 407 genes within 47 loci associated with HDL-C and subsequently sequenced individuals drawn from the extremes of the HDL-C distribution (high HDL-C, n=385, mean=102 mg/dl or low HDL-C, n=334, mean=32 mg/dl) using next-generation sequencing technology. We tested whether rare coding sequence variants, individually or aggregated within a gene, were associated with HDL-C. To replicate findings, we performed follow-up genotyping using the Exome Array (Illumina HumanExome BeadChip) in independent participants with extremely high HDL-C (n=514, mean=98 mg/dl) or low HDL-C (n=580, mean=32 mg/dl). Through sequencing, we identified 8,138 rare (minor allele frequency < 5%) missense, nonsense, or splice-site variants. Across discovery sequencing and replication genotyping, we found 3 variants to be significantly associated with HDL-C. Of these, none were novel. In gene-level association analyses, where rare variants within each gene are collapsed, only the CETP gene was associated with plasma HDL-C (P = 2.0 x 10^-6). After sequencing genes from GWAS loci in participants with extremely high or low HDL-C, we did not identify any new rare coding sequence variants with a strong effect on HDL-C. These results provide insight regarding the design of similar sequencing studies for cardiovascular traits with respect to sample size, follow-up, and analysis methodology.
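The gene-level collapsing analysis described above can be illustrated with a minimal sketch. The abstract does not state which collapsing test was used, so this example assumes one common choice: each participant is reduced to a carrier indicator (at least one rare allele in the gene), and carrier counts in the high and low HDL-C groups are compared with a two-sided Fisher's exact test. The function names and the toy genotype layout (one list of rare-allele counts per participant) are hypothetical.

```python
from math import comb

def collapse_carriers(genotypes):
    """Collapse per-variant rare-allele counts into one carrier flag per person.

    genotypes: list of lists; each inner list holds one person's rare-allele
    counts (0, 1, or 2) at every rare variant in the gene (assumed layout).
    """
    return [1 if any(g > 0 for g in person) else 0 for person in genotypes]

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at least as unlikely as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

def burden_test(high_genotypes, low_genotypes):
    """Compare carrier counts between high and low HDL-C groups for one gene."""
    high = collapse_carriers(high_genotypes)
    low = collapse_carriers(low_genotypes)
    a, c = sum(high), sum(low)          # carriers in each group
    b, d = len(high) - a, len(low) - c  # non-carriers in each group
    return (a, b, c, d), fisher_exact_two_sided(a, b, c, d)

# Toy data (not from the study): 4 participants per group, 2 variants each.
table, p = burden_test(
    [[1, 0], [0, 1], [0, 2], [0, 0]],
    [[0, 0], [0, 0], [0, 0], [0, 0]],
)
print(table, round(p, 4))
```

Collapsing trades per-variant resolution for power: variants too rare to test individually still contribute to the gene-level carrier count.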

We then tested the hypothesis that rare coding and splice-site mutations contribute to inter-individual variability in plasma lipid concentrations in the population. We contributed to the design of a new, rare-variant genotyping array based on the sequences of the protein-coding regions of ~18,500 genes ("the exome") in >12,000 individuals. This genotyping array ("the Exome Chip") includes approximately 250,000 non-synonymous and splice-site mutations and is estimated to capture nearly all such variation with a >1:1000 allele frequency in the European population. We obtained Exome Chip genotype data in >130,000 individuals from 58 studies. Within each study, we tested the association of plasma lipids with individual rare variants. To combine statistical evidence across studies, we performed meta-analysis. Top results for each trait replicated established associations in the genes APOE, CETP, and APOA5 for LDL-C, HDL-C, and TG, respectively. We identified 11 new genes associated with plasma lipid levels: ABCA6 with LDL-C (C1359R, frequency = 1:100, effect = +8.2 mg/dl, P = 9.7 x 10^-32), SERPINA1 with LDL-C (E366K, frequency = 2:100, effect = +3.1 mg/dl, P = 2.3 x 10^-7), REST with LDL-C (R645W, frequency = 6:10,000, effect = +13.7 mg/dl, P = 5.0 x 10^-7), FBLN1 with LDL-C (H695R, frequency = 2:100, effect = -2.7 mg/dl, P = 5.3 x 10^-7), CCDC117 with LDL-C (T232I, frequency = 9:1000, effect = -4.3 mg/dl, P = 7.3 x 10^-7), TMED6 with HDL-C (F6L, frequency = 4:100, effect = -0.8 mg/dl, P = 4.4 x 10^-9), CDC25A with HDL-C (Q24H, frequency = 3:100, effect = -1.0 mg/dl, P = 8.4 x 10^-8), MAP1A (P2349L, frequency = 3:100) with HDL-C (effect = -1.4 mg/dl, P = 3.9 x 10^-14) and TG (effect = +8.4 mg/dl, P = 3.2 x 10^-26), PRRC2A with TG (S1219Y, frequency = 2:100, effect = +6.6 mg/dl, P = 4.6 x 10^-17), COL18A1 with TG (V125I, frequency = 1:1000, effect = +18.0 mg/dl, P = 1.3 x 10^-7), and EDEM3 with TG (P746S, frequency = 1:100, effect = -5.4 mg/dl, P = 2.4 x 10^-7).
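The step of combining per-study evidence can be sketched as follows. The abstract does not specify the meta-analysis method, so this example assumes a standard fixed-effects, inverse-variance-weighted approach: each study contributes an effect estimate (in mg/dl per allele) and its standard error, and more precise studies receive more weight. The function name and the toy inputs are hypothetical, not values from the study.

```python
from math import sqrt, erfc

def inverse_variance_meta(estimates):
    """Fixed-effects meta-analysis of one variant across studies.

    estimates: list of (beta, se) tuples, one per study (assumed input format).
    Returns (pooled beta, pooled se, z score, two-sided P value).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    # Pooled effect is the precision-weighted mean of the study effects.
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = sqrt(1.0 / total)
    z = beta / se
    # Two-sided P value from the standard normal distribution.
    p = erfc(abs(z) / sqrt(2.0))
    return beta, se, z, p

# Toy example: two equally precise studies with similar effect estimates.
beta, se, z, p = inverse_variance_meta([(8.0, 2.0), (9.0, 2.0)])
print(beta, se, z, p)
```

Under this assumption, pooling shrinks the standard error as studies are added, which is what gives rare variants with modest per-study evidence a chance to reach significance across many cohorts.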

In addition, at some genes previously known to affect lipids, we identified new associations for variants: APOC3 R19Stop (frequency = 3:10,000) with HDL-C (effect = +11 mg/dl, P = 9.9 x 10^-12) and with TG (effect = -65.9 mg/dl, P = 5.8 x 10^-23), and APOC3 splice-site IVS2+1 G>A (frequency = 2:1000) with HDL-C (effect = +10.6 mg/dl, P = 3.5 x 10^-42) and with TG (effect = -65.2 mg/dl, P = 2.0 x 10^-81). Using the Exome Chip rare-variant genotyping array, we have discovered several new genes and variants associated with plasma lipids.