RNA splicing
RNA splicing is a fundamental post-transcriptional modification process in eukaryotic cells, in which non-coding introns are precisely excised from precursor messenger RNA (pre-mRNA) transcripts and the remaining coding exons are ligated together to produce mature mRNA ready for translation into proteins.[1] The mechanism was discovered in 1977 through studies of adenovirus transcripts, which revealed that genes are organized into discontinuous segments and overturned the prevailing view of continuous coding sequences. The process is catalyzed by the spliceosome, a large ribonucleoprotein complex composed of five small nuclear RNAs (snRNAs: U1, U2, U4, U5, and U6) and over 100 associated proteins. The spliceosome assembles stepwise on the pre-mRNA to recognize splice sites and execute two sequential transesterification reactions: the first cleaves the 5' splice site and links the 5' end of the intron to the branch point, forming a lariat intermediate, and the second ligates the exons while releasing the intron.[2] Splicing is highly accurate and tightly regulated, ensuring that only functional mRNAs are produced, and its dysregulation is implicated in numerous diseases, including cancers and neurodegenerative disorders.[3]

A key feature of RNA splicing is alternative splicing, which allows a single gene to generate multiple mRNA isoforms by selectively including or excluding exons, thereby greatly expanding proteome diversity from a limited genome; over 90% of human multi-exon genes are estimated to undergo alternative splicing, yielding thousands of variants.[4] This versatility is crucial for cellular differentiation, tissue-specific functions, and responses to environmental cues, as seen in the nervous system, where alternative splicing modulates neuronal excitability and synaptic plasticity.[5] Regulation occurs through cis-acting elements (such as exonic and intronic splicing enhancers and silencers) and trans-acting factors (splicing regulators like SR proteins and hnRNPs) that fine-tune splice site selection. Recent cryo-electron microscopy structures have revealed dynamic conformational changes in the spliceosome that underpin these regulatory mechanisms.[6] Defects in the splicing machinery, such as mutations in spliceosomal components, contribute to pathologies like spinal muscular atrophy and myelodysplastic syndromes, highlighting splicing as a therapeutic target.[7]

The discovery of RNA splicing by Phillip Sharp and Richard Roberts in 1977 earned them the 1993 Nobel Prize in Physiology or Medicine, underscoring its transformative impact on molecular biology.[8] Since then, advances in high-throughput sequencing have illuminated the prevalence and complexity of splicing across species, from yeast to humans, and its evolutionary conservation as a driver of genomic efficiency.[9] Ongoing research continues to uncover novel splicing variants and modulators, promising new avenues for understanding gene regulation and developing splicing-targeted interventions.[10]

Overview
Definition and process
RNA splicing is a critical post-transcriptional modification process in which non-coding sequences known as introns are removed from a precursor messenger RNA (pre-mRNA) transcript, and the remaining coding sequences, called exons, are precisely joined together to produce a mature mRNA ready for translation into protein.[1] This process ensures that only the functional coding information is retained, allowing for the accurate expression of genes in eukaryotic cells.[11]

The basic mechanism of RNA splicing involves two sequential transesterification reactions. First, specific splice sites are recognized: the 5' splice site typically begins with a GU dinucleotide, the branch point features an adenosine residue located 20–50 nucleotides upstream of the 3' splice site, and the 3' splice site ends with an AG dinucleotide. In the initial step, the 2'-OH group of the branch point adenosine performs a nucleophilic attack on the phosphodiester bond at the 5' splice site, releasing the 5' exon and forming a lariat intermediate in which the 5' end of the intron is joined to the branch point through a 2'-5' phosphodiester bond.[12] In the second step, the 3'-OH of the freed 5' exon attacks the phosphodiester bond at the 3' splice site, ligating the two exons and releasing the intron lariat.[13]

RNA splicing occurs across eukaryotes, archaea, and certain bacteria, though its prevalence and machinery vary. In higher eukaryotes such as humans, introns are present in over 97% of protein-coding genes, often comprising the majority of transcript length and making splicing indispensable for proper gene expression.[14] Unlike other RNA processing events such as 5' capping or 3' polyadenylation, which primarily stabilize the mRNA, splicing directly alters the coding sequence by selecting and joining exons, thereby determining the final protein isoform.[15]

Historical discovery
The discovery of RNA splicing began in 1977, when two groups independently identified discontinuous gene structures in adenovirus, revealing that eukaryotic genes are composed of coding sequences interrupted by non-coding regions, later termed introns. Phillip A. Sharp's team at the Massachusetts Institute of Technology used electron microscopy to visualize hybrid molecules of adenovirus mRNA annealed to viral DNA, showing that mRNA sequences were spliced from separate genomic segments.[16] Concurrently, Richard J. Roberts's group at Cold Spring Harbor Laboratory employed similar techniques to map cytoplasmic poly(A)+ RNA transcripts from adenovirus type 2, demonstrating collinear but non-contiguous alignment with the genome and thus establishing the concept of split genes.[17] This breakthrough challenged the prevailing view of continuous genes and laid the foundation for understanding pre-mRNA processing; Sharp and Roberts were awarded the 1993 Nobel Prize in Physiology or Medicine for their contributions.[18]

In the early 1980s, investigations into the machinery of splicing advanced significantly. Michael R. Lerner and Joan A. Steitz at Yale University proposed that small nuclear ribonucleoproteins (snRNPs), recently identified as abundant nuclear particles containing small nuclear RNAs (snRNAs), play a central role in pre-mRNA splicing, based on immunoprecipitation assays with autoimmune sera and on the particles' ability to bind specifically to intron sequences. This work introduced the spliceosome as a dynamic ribonucleoprotein complex mediating splicing in higher eukaryotes. Independently, Thomas R. Cech's laboratory at the University of Colorado discovered self-splicing in the ribosomal RNA intron of Tetrahymena thermophila, where the RNA itself catalyzed its excision without protein assistance, providing the first evidence of RNA's enzymatic activity and expanding the catalytic repertoire of RNA molecules.[19] Cech and Sidney Altman, who identified catalytic RNA in RNase P, shared the 1989 Nobel Prize in Chemistry for these discoveries.[20]

During the 1990s and 2000s, detailed mapping of spliceosome components emerged, particularly in humans, through proteomic and genetic approaches. Comprehensive analyses using tandem affinity purification and mass spectrometry on yeast and human models identified over 300 proteins associated with the human spliceosome, including many novel factors involved in assembly and catalysis. The Human Genome Project further highlighted the prevalence of splicing, estimating from expressed sequence tag (EST) alignments to the draft genome that alternative splicing affects approximately 60% of human genes, underscoring its role in proteomic diversity.

In the 2010s, structural biology transformed splicing research, with cryo-electron microscopy (cryo-EM) providing near-atomic insights into spliceosome dynamics. Kiyoshi Nagai's group at the MRC Laboratory of Molecular Biology resolved the structure of the yeast spliceosome immediately after the branching step at 3.8 Å resolution, revealing key conformational changes and interactions in the catalytic core.
Entering the 2020s, long-read sequencing technologies, such as Pacific Biosciences and Oxford Nanopore, unveiled unprecedented splicing complexity in human transcriptomes, identifying thousands of novel isoforms and tissue-specific events that short-read methods overlooked, thus refining estimates of splicing diversity across cell types.[21] In 2024, researchers published the first comprehensive blueprint of the human spliceosome, identifying its core composition of approximately 150 proteins with specialized regulatory functions, further advancing insights into splicing mechanisms and potential therapeutic targets.[22]

Types of Splicing Pathways
Spliceosomal splicing
Spliceosomal splicing is the predominant mechanism for intron removal from nuclear pre-mRNA in eukaryotic cells, carried out by the spliceosome, a large ribonucleoprotein complex that assembles de novo on each intron. This process ensures the production of mature mRNA by excising non-coding introns and ligating coding exons, with the major spliceosome handling the vast majority of introns in most eukaryotes.[23]

The spliceosome comprises small nuclear ribonucleoproteins (snRNPs) and numerous associated proteins, enabling precise recognition and catalysis. The major spliceosome includes four key snRNP particles: U1, U2, U4/U6 (a di-snRNP), and U5, each containing a uridine-rich small nuclear RNA (snRNA) bound to specific proteins. These components recognize conserved splice site sequences at intron boundaries and facilitate the splicing reaction.[23] In contrast, the minor spliceosome processes a small subset of atypical U12-dependent introns using analogous but distinct snRNPs: U11, U12, U4atac/U6atac, and U5.[24]

Assembly of the spliceosome proceeds through a series of dynamic, stepwise complexes on the pre-mRNA substrate. It begins with the commitment complex (E complex), in which U1 snRNP binds the 5' splice site while SF1 and the U2 auxiliary factor (U2AF) associate with the branch point sequence and the adjacent polypyrimidine tract; U2 snRNP then binds to form the pre-spliceosome (A complex).[25] The tri-snRNP (U4/U6·U5) then joins to create the pre-catalytic B complex, which rearranges to the activated B* complex and ultimately the C complex for intron excision. This ordered recruitment ensures fidelity, with rearrangements driven by ATP-dependent helicases and protein factors.[25]

Two primary models describe how splice sites are recognized during assembly: intron definition and exon definition.
In the intron definition model, prevalent in organisms with short introns like yeast, the spliceosome initially pairs the 5' and 3' splice sites across the intron.[26] Conversely, in the exon definition model, common in vertebrates with longer introns, initial recognition occurs across the exon: U1 and U2 snRNPs bind the opposing splice sites flanking the exon, and these cross-exon interactions precede intron removal.[27] The two models reflect adaptations to genomic architecture, with the consensus sequences at splice sites guiding the initial binding events in both cases.[26]

Most spliceosomal introns are U2-dependent and recognized by the major spliceosome, while U12-dependent introns, comprising about 0.35% of human introns, require the minor spliceosome and often feature AU-AC termini instead of the typical GU-AG.[24] In the human genome, introns average around 3 kb in length, vastly exceeding the typical exon size of about 145 nucleotides, which contributes to the difficulty of accurate splicing.[28]

Trans-splicing represents a specialized variant of spliceosomal splicing in certain eukaryotes, in which a short leader sequence from one RNA molecule is joined to the 5' end of an independent pre-mRNA exon, rather than ligating exons from the same transcript.[29] This process, mediated by snRNPs similar to those of cis-splicing, occurs prominently in trypanosomes, where it adds a spliced leader to all mRNAs to resolve polycistronic transcripts, and in Caenorhabditis elegans, where it affects about 70% of genes and adds either SL1 or SL2 leaders.[30] Though rare in vertebrates, it highlights the spliceosome's versatility.[29]

For exceptionally long introns, recursive splicing provides a mechanism to subdivide removal into multiple steps, using internal "ratchet" sites that mimic 3' splice sites.
In Drosophila melanogaster, where introns can exceed 50 kb, this stepwise process enhances splicing accuracy by iteratively excising portions of the intron, as seen in the 74-kb Ultrabithorax intron.[31] Recursive sites are enriched and conserved in long introns, preventing aberrant splicing and maintaining efficiency.[32]

Self-splicing
Self-splicing refers to a form of RNA splicing in which the intron excises itself from the precursor RNA through ribozyme activity, independent of protein enzymes. This process was first demonstrated in 1982 with the ribosomal RNA precursor from the ciliate Tetrahymena thermophila, where the 413-nucleotide intervening sequence (IVS) was shown to excise and circularize autocatalytically under in vitro conditions mimicking physiological ionic strength, without requiring additional factors beyond a guanosine cofactor.[33]

Group I introns are the most extensively studied class of self-splicing introns, characterized by a conserved secondary structure featuring paired helices and an internal guide sequence that aligns the 5' splice site with the 3' hydroxyl of a guanosine cofactor. These introns are predominantly found in organellar genomes (mitochondria and chloroplasts), ribosomal RNA genes of protists and fungi, and bacteriophage genomes, with prokaryotic origins suggesting horizontal transfer to eukaryotic organelles. The splicing mechanism proceeds via two transesterification reactions: first, an exogenous guanosine (or GTP/GMP) attacks the 5' splice site, cleaving the upstream exon and attaching to the intron's 5' end; second, the newly freed 3' hydroxyl of the upstream exon attacks the 3' splice site, forming the ligated exons and releasing the linear intron, which often circularizes afterward through a further transesterification. This guanosine-dependent pathway requires divalent metal ions like Mg²⁺ for catalysis and is highly efficient in vitro, with rate constants approaching physiological speeds.[34][35]

Group II introns, another major class of self-splicing elements, are structurally more complex, with six helical domains, and exhibit a branching mechanism analogous to that of spliceosomal introns, forming a lariat intermediate.
These introns are common in mitochondrial and chloroplast genomes of fungi, plants, and algae, as well as in bacterial genomes, where they often encode a multifunctional reverse transcriptase-like protein that promotes their mobility as retroelements. Splicing initiates with the 2' hydroxyl of a bulged adenosine (the branch point) within domain VI attacking the 5' splice site, generating a lariat intron and freeing the upstream exon; the second transesterification then joins the exons and releases the lariat intron, again facilitated by Mg²⁺ ions in the active site. Unlike Group I introns, Group II introns can splice in the absence of exogenous cofactors, though some rely on maturase proteins encoded within the intron for stability in vivo.[36][12]

Self-splicing introns of both groups have prokaryotic origins. More than 42,000 Group I introns had been identified across nature as of 2025, and Group II introns number in the thousands, primarily in bacterial and organellar contexts, reflecting a sporadic distribution and horizontal mobility that contributed to their spread into eukaryotic lineages.[37] Group I and II introns are evolutionarily linked to the emergence of spliceosomal splicing through shared catalytic cores.[35][38]

tRNA and minor spliceosomal splicing
tRNA splicing occurs in eukaryotes and archaea, where introns are typically located within the anticodon loop of pre-tRNA transcripts. These introns are removed through a protein-dependent pathway involving distinct enzymatic steps, contrasting with self-splicing mechanisms in bacteria. The process begins with site-specific cleavage by a heterotetrameric tRNA splicing endonuclease complex, composed of subunits homologous to the Sen proteins in yeast (Sen2, Sen34, Sen54, and Sen15), which recognizes structural features of the pre-tRNA rather than sequence alone.[39] In yeast, the endonuclease generates exons with 5'-hydroxyl and 2',3'-cyclic phosphate termini, leaving the intron as a linear fragment.[40] The subsequent ligation step seals the exons using a multifunctional ligase, such as Trl1 in yeast, which first opens the 2',3'-cyclic phosphate to a 2'-phosphate intermediate before forming the standard 3'-5' phosphodiester bond. This pathway ensures the production of mature tRNA capable of participating in translation, with the cyclic phosphate intermediate being a hallmark of the eukaryotic and archaeal tRNA splicing mechanism. A well-studied example is the intron in the yeast tRNA(Tyr) gene (SUP6), where removal is essential not only for maturation but also for proper post-transcriptional modification of the tRNA.[41]

In addition to tRNA processing, minor spliceosomal splicing handles a rare class of nuclear pre-mRNA introns known as U12-type, which constitute approximately 0.4% of human introns and often feature AU-AC terminal dinucleotides instead of the typical GU-AG.[42] The minor spliceosome employs specialized small nuclear ribonucleoproteins (snRNPs), U11/U12 and U4atac/U6atac, along with the shared U5 snRNP, to recognize and excise these introns through a process analogous to, but distinct from, major spliceosomal activity.[43] These U12-type introns are enriched in genes expressed in neurons, suggesting specialized roles in neural function and development.[44] Representative examples include AT-AC introns in human genes such as ATR, which encodes a key DNA damage response kinase and relies on minor spliceosome components for accurate isoform production.[45]

Biochemical Mechanisms
Splice site recognition and consensus sequences
In eukaryotic pre-mRNA splicing, splice site recognition begins with the identification of conserved sequence motifs at the exon-intron boundaries and within introns, which serve as docking sites for small nuclear ribonucleoproteins (snRNPs) and auxiliary factors. These motifs ensure precise excision of introns and ligation of exons, with deviations from consensus often requiring additional regulatory elements for efficient processing. The core signals include the 5' splice site (5' SS), branch point sequence (BPS), and 3' splice site (3' SS), each exhibiting species-specific consensus patterns derived from extensive sequence analyses and modern computational tools such as position weight matrices (PWMs) and databases like U12DB.[46]

The 5' SS is defined by a nearly invariant GU dinucleotide at the start of the intron, forming part of a broader consensus sequence such as CAG|GURAGU in mammals, where the vertical bar denotes the cleavage point at the exon-intron boundary (exon to the left, intron to the right) and R represents a purine. This GU motif, first identified in viral and cellular genes, is essential for base-pairing with the 5' end of U1 snRNA, initiating spliceosome assembly. Upstream of the 5' SS, sequences resembling polypyrimidine tracts can influence recognition in certain contexts, though they are more prominently associated with the 3' SS. Mutations altering the GU dinucleotide abolish splicing, underscoring its critical role.

The BPS, located approximately 20–50 nucleotides upstream of the 3' SS, features a conserved adenosine residue that acts as the nucleophile in the first transesterification step, forming a lariat intermediate. In mammals, the BPS consensus is YNCURAC (Y = pyrimidine, N = any nucleotide, R = purine; the A at the sixth position is the branch point adenosine), a motif identified through mutational analysis of rabbit β-globin pre-mRNA.
This sequence binds SF1/mBBP and facilitates U2 snRNP association, with the distance to the 3' SS influencing efficiency; optimal spacing enhances lariat formation.[47]

The 3' SS consists of an AG dinucleotide immediately downstream of a polypyrimidine tract (Py tract), typically 12–20 uridine/cytidine-rich nucleotides that promote U2 auxiliary factor (U2AF) binding. This Py-AG arrangement, conserved across metazoans, was established in early intron sequencing studies and is crucial for defining the acceptor site, with the Py tract compensating for weak AG contexts by recruiting U2AF65. The scanning model posits that U2AF searches downstream from the BPS for the first suitable AG, ensuring accurate cleavage.

Splice site recognition is modulated by cis-regulatory elements, including exonic splicing enhancers (ESEs) and intronic splicing enhancers (ISEs), which bind serine/arginine-rich (SR) proteins to stabilize core site interactions, particularly for suboptimal sequences. Conversely, exonic splicing silencers (ESSs) and intronic splicing silencers (ISSs) recruit heterogeneous nuclear ribonucleoproteins (hnRNPs), such as hnRNP A1, to repress splice site usage. ESE motifs, often purine-rich (e.g., GAR repeats), were first characterized in the immunoglobulin μ chain gene and promote exon inclusion by recruiting SR proteins like SF2/ASF. ISEs and ISSs, identified through systematic screens, similarly influence site choice; for instance, G-rich ISEs bind hnRNP F/H to enhance upstream exon definition. These elements are essential for fine-tuning splicing fidelity.

Variations in splice site consensus exist, notably in U12-type introns, a minor class (well under 1% of human introns) processed by the minor spliceosome and often featuring AU-AC terminal dinucleotides instead of GU-AG. These introns were discovered through computational analysis of divergent 5' SS sequences and exhibit an extended 5' SS consensus such as /RTATCCTTT/, with higher conservation due to their rarity.[48] U12-type introns also have a distinct BPS (UUCCUAAC).
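As a rough illustration of how the U12-type motifs quoted above might be used computationally, the following Python sketch labels an intron sequence by exact matches to the extended 5' SS motif (/RTATCCTTT/, written here in RNA letters) and the U12-type BPS (UUCCUAAC). Because U12-type introns are described as often, not always, carrying AU-AC termini, the sketch keys on the motifs and not on the terminal dinucleotides. The function name `classify_intron`, the 40-nucleotide BPS search window, and the toy sequences are assumptions made for illustration; real classifiers typically score position weight matrices and do not demand exact matches.

```python
import re

# RNA form of the extended U12-type 5' SS motif /RTATCCTTT/ (R = A or G).
U12_FIVE_SS = re.compile(r"[AG]UAUCCUUU")
# U12-type branch point sequence quoted in the text.
U12_BPS = "UUCCUAAC"


def classify_intron(intron, bps_window=40):
    """Label an intron (5'->3', first to last intron base) by exact motif matches.

    bps_window is a hypothetical choice: the BPS is searched for only in the
    last `bps_window` nucleotides of the intron.
    """
    rna = intron.upper().replace("T", "U")  # accept DNA or RNA letters
    if U12_FIVE_SS.match(rna) and U12_BPS in rna[-bps_window:]:
        return "U12-type candidate"
    if rna.startswith("GU") and rna.endswith("AG"):
        return "U2-type candidate"
    return "unclassified"


# Toy introns invented for illustration (not real genomic sequences).
examples = {
    "AU-AC, U12 motifs": "AUAUCCUUU" + "GAGA" * 5 + "UUCCUAAC" + "UUGUUCAC",
    "GU-AG, U12 motifs": "GUAUCCUUU" + "GAGA" * 5 + "UUCCUAAC" + "UUGUUCAG",
    "GU-AG, no U12 motifs": "GUAAGU" + "GAGA" * 5 + "UACUAAC" + "UCUUUCUUUCUUAG",
}
labels = {name: classify_intron(seq) for name, seq in examples.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```

Exact matching is deliberately strict here; it will miss genuine U12-type introns whose motifs deviate from consensus, which is the situation PWM-based scoring is meant to handle.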
Weak splice sites, deviating significantly from consensus (e.g., non-GU 5' SS), depend on auxiliary factors like SR proteins binding ESEs/ISEs to compensate for poor base-pairing with snRNAs, as demonstrated in mutagenesis studies of β-globin introns. Such enhancements can increase splicing efficiency by 10- to 100-fold for suboptimal sites.

| Splice Site | Consensus Motif (Mammals) | Key Features | Binding Factor |
|---|---|---|---|
| 5' SS | CAG\|GURAGU | \| = cleavage point; GU dinucleotide invariant; R = purine | U1 snRNP |
| BPS | YNCURAC (20–50 nt upstream of 3' SS) | A at position 6 = branch adenosine; Y = pyrimidine; N = any nucleotide; R = purine | SF1, U2 snRNP |
| 3' SS | (YnC)AG\| | \| = cleavage point; Py tract (YnC, n = 12–20); AG invariant | U2AF |
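
The consensus motifs in the table can be turned into a small scanning routine. The Python sketch below, offered as a simplified illustration and not as a validated splice site predictor, converts the IUPAC motifs into regular expressions, locates one candidate intron in a toy transcript, and then mimics the two transesterification steps with string slicing. The function names, the strict all-pyrimidine reading of the Py tract, the choice of the first 5' SS match, and the toy sequence are all assumptions made for the example.

```python
import re

# IUPAC codes appearing in the consensus motifs, written for the RNA alphabet.
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U",
         "R": "[AG]", "Y": "[CU]", "N": "[ACGU]"}


def consensus_to_regex(motif):
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in motif)


FIVE_SS = re.compile(consensus_to_regex("GURAGU"))  # intron side of CAG|GURAGU
BPS = re.compile(consensus_to_regex("YNCURAC"))     # branch A is the 6th base
THREE_SS = re.compile(r"[CU]{12,20}AG")             # strict Py tract, then AG


def find_intron(pre_mrna):
    """Return (donor, branch, acceptor_end) indices, or None if nothing fits."""
    donor = FIVE_SS.search(pre_mrna)
    if donor is None:
        return None
    for bps in BPS.finditer(pre_mrna, donor.end()):
        branch = bps.start() + 5                      # index of the branch adenosine
        acceptor = THREE_SS.search(pre_mrna, branch)  # first Py-AG downstream
        if acceptor is None:
            continue
        ag_index = acceptor.end() - 2                 # index of the A in the AG
        if 20 <= ag_index - branch <= 50:             # BPS-to-3' SS spacing from the table
            return donor.start(), branch, acceptor.end()
    return None


def splice(pre_mrna, donor, branch, acceptor_end):
    """Mimic the two transesterification steps with string slicing."""
    # Step 1: the branch adenosine attacks the 5' SS, freeing the 5' exon and
    # leaving a lariat whose loop runs from the intron's 5' end to the branch A.
    exon1 = pre_mrna[:donor]
    lariat_loop = pre_mrna[donor:branch + 1]
    lariat_tail = pre_mrna[branch + 1:acceptor_end]
    # Step 2: the freed 5' exon attacks the 3' SS, ligating the exons and
    # releasing the lariat intron.
    exon2 = pre_mrna[acceptor_end:]
    return exon1 + exon2, lariat_loop, lariat_tail


# Toy transcript invented for illustration (not a real gene).
exon1 = "AUGGCCCAG"
intron = ("GUAAGU" + "AAGGAGAGGAAG" + "UACUAAC"
          + "GAAUGA" + "UCUUUCUUUCUUUC" + "AG")
exon2 = "GGCAAAUGA"
pre_mrna = exon1 + intron + exon2

sites = find_intron(pre_mrna)
mature, lariat_loop, lariat_tail = splice(pre_mrna, *sites)
print("donor, branch, acceptor end:", sites)
print("mature mRNA:", mature)
```

Exact regular-expression matching treats every consensus position as mandatory, so weak sites of the kind discussed above would be missed; scoring with position weight matrices is the usual way to tolerate such deviations.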