Date of Award

Spring 2021

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Molecular Biophysics and Biochemistry

First Advisor

Neugebauer, Karla

Abstract

Eukaryotic genes contain non-coding sequences called introns. The removal of introns from pre-mRNAs, termed splicing, is carried out by the spliceosome, a multi-megadalton molecular complex of proteins and RNAs. Splicing occurs co-transcriptionally across multiple cell types and species. The Neugebauer lab has developed single-molecule nascent RNA sequencing methods, including single molecule intron tracking (SMIT) and long-read sequencing (LRS) of nascent RNA, to visualize the precursors, intermediates, and products of transcription and splicing in budding and fission yeasts. Using these methods, the lab estimated the kinetics of single intron removal in both yeasts by relating the 3′ end of nascent RNA (the position of RNA Polymerase II) to the progress of the splicing reaction. In both species of yeast, splicing proceeded rapidly and co-transcriptionally. In comparison to yeast genes, mammalian genes are much more complex: on average they contain eight long introns surrounded by short exons. It was unclear how the presence of many more long introns, often with more poorly conserved splice site sequences, would affect how splicing and transcription are coordinated. Thus, I have optimized new methods to isolate nascent RNA and analyze co-transcriptional splicing in mammalian cells. To determine how splicing is integrated with transcription elongation and 3′ end formation in mammalian cells, I performed long-read sequencing of individual nascent RNAs and PRO-seq during murine erythropoiesis. I chose murine erythroid leukemia (MEL) cells as a model system, as they can be easily differentiated in vitro, and they express a subset of erythroid-specific genes at high levels. Many studies of gene expression have historically been carried out in erythroblasts, and the biogenesis of β-globin mRNA, the most highly expressed transcript in erythroblasts, was the focus of many seminal studies on the mechanisms of pre-mRNA splicing.
I isolated nascent, chromatin-associated RNAs from MEL cells before and after induction of terminal erythroid differentiation and performed long-read sequencing on the Pacific Biosciences Sequel platform. Splicing was not accompanied by transcriptional pausing and was detected when RNA polymerase II (Pol II) was within 75 to 300 nucleotides of 3′ splice sites, often during transcription of the downstream exon. Interestingly, several hundred introns displayed abundant splicing intermediates, suggesting that splicing delays can take place between the two catalytic steps of splicing. Overall, splicing efficiencies were correlated among introns within the same transcript, and intron retention was associated with inefficient 3′ end cleavage. Remarkably, a thalassemia patient-derived mutation introducing a cryptic 3′ splice site improves both splicing and 3′ end cleavage of individual β-globin transcripts, demonstrating functional coupling between the two co-transcriptional processes as a determinant of productive gene output. Thus, I conclude that highly expressed pre-mRNAs in MEL cells are largely spliced co-transcriptionally, and that the mammalian spliceosome can assemble and act rapidly on this set of pre-mRNAs. A previously unappreciated level of cross-talk between splicing and 3′ end cleavage efficiencies is involved in erythroid development. Together, this work provides a high-resolution description of mammalian gene expression and shows that short-read RNA sequencing of bulk RNA can conceal coordinated behaviours that can only be observed at the level of individual nascent transcripts.
