
Exon

An exon is a segment of a gene's DNA sequence in eukaryotic organisms that is transcribed and retained in the mature messenger RNA (mRNA) after intervening non-coding sequences, known as introns, are removed during RNA splicing. Exons typically encode portions of proteins but can also include untranslated regions (UTRs) at the 5' or 3' ends of the mRNA that influence stability, localization, and translation efficiency. The term "exon," short for "expressed region," was coined by biochemist Walter Gilbert in 1978 to describe these functional units of split genes.

The discovery of exons and introns fundamentally reshaped molecular biology by revealing that most eukaryotic genes are discontinuous, consisting of exons separated by non-coding introns. This breakthrough was achieved independently in 1977 by Phillip Sharp at the Massachusetts Institute of Technology and Richard Roberts at Cold Spring Harbor Laboratory through experiments on adenovirus RNA, which demonstrated that mRNA is assembled from non-contiguous segments. Their work earned them the 1993 Nobel Prize in Physiology or Medicine for elucidating the split structure of genes. Exons are recognized and joined by the spliceosome, a complex of small nuclear ribonucleoproteins (snRNPs) that identifies exon-intron boundaries via conserved sequence motifs such as the 5' splice site (GU), the branch point, and the 3' splice site (AG).

Exons play a pivotal role in generating proteomic diversity through alternative splicing, a regulated process in which different exon combinations from the same pre-mRNA produce multiple mRNA isoforms and protein variants. In humans, approximately 95% of multi-exon genes undergo alternative splicing, enabling a single gene to encode numerous functional products essential for development, tissue specificity, and response to environmental cues. This mechanism amplifies the coding capacity of the genome, and it contributes to disease when dysregulated, as seen in cancer and neurodegenerative disorders where aberrant exon inclusion or skipping alters protein function.

Definition and Basics

Definition

In eukaryotic genomes, genes are typically organized as interrupted sequences consisting of exons and introns, unlike prokaryotic genes, which generally lack introns and are transcribed into continuous mRNA molecules. Exons are the segments of DNA (or of the corresponding RNA transcripts) that are retained in the mature messenger RNA (mRNA) after introns are removed by RNA splicing. This process ensures that only exon sequences are incorporated into the final mRNA product exported from the nucleus for translation or other functions.

Exons serve as the foundational units of gene expression: they can encode sequences that form proteins in protein-coding genes, or contribute to the structure of functional RNA molecules in non-coding genes, such as those producing ribosomal RNAs or microRNAs. Within protein-coding transcripts, exons are divided into coding regions, which are translated into polypeptides, and untranslated regions (UTRs): the 5' UTR, which regulates translation initiation, and the 3' UTR, which influences mRNA stability and localization. UTR exons are not translated, but they play essential regulatory roles in modulating translation efficiency.

Types of Exons

Exons are broadly classified into coding and non-coding types based on their role in protein synthesis. Coding exons contain sequences that are translated into amino acids to form part of a protein; together they make up the open reading frame (ORF) of the messenger RNA (mRNA). In contrast, non-coding exons do not contribute to the protein-coding sequence. They include untranslated regions (UTRs) such as the 5' UTR, which precedes the start codon, and the 3' UTR, which follows the stop codon, as well as exons in non-coding RNA (ncRNA) genes, like microRNAs or long non-coding RNAs, that perform regulatory functions without translation. For instance, UTR exons regulate mRNA stability, localization, and translation efficiency through interactions with RNA-binding proteins and microRNAs.

Exons are further categorized by their position within the transcript as initial (or first), internal, or terminal (or last) exons, each with distinct structural and functional features. Initial exons, located at the 5' end of the pre-mRNA, include the transcription start site and 5' UTR, and they acquire the 7-methylguanosine cap structure shortly after transcription initiation to protect the mRNA and facilitate ribosome binding. Internal exons are positioned between the initial and terminal exons, are flanked by splice sites on both ends, and often contain coding sequences that are conserved across species because of their protein-coding roles. Terminal exons, at the 3' end, encompass the 3' UTR and the polyadenylation signal, which triggers cleavage and addition of the poly(A) tail to enhance mRNA stability and translation. This positional classification influences splicing patterns and post-transcriptional modifications.

In the context of alternative splicing, certain exons exhibit variable inclusion, leading to subtypes such as cassette exons and mutually exclusive exons. Cassette exons are optional internal exons that can be included or skipped in the mature mRNA, allowing for isoform diversity. Mutually exclusive exons involve the selection of one exon from a pair or cluster to the exclusion of the others, which is common in genes requiring precise functional variants, such as the Dscam gene in Drosophila, where it generates thousands of neuronal isoforms that support neural wiring specificity. Additionally, intron retention can leave introns in the mature transcript, where they effectively function as exons, though they often introduce premature stop codons that lead to mRNA degradation or non-productive isoforms; retained introns are sometimes initially annotated as novel exons in genomic studies.

Exons also display functional diversity based on the genes they belong to, particularly in housekeeping versus tissue-specific contexts. Housekeeping genes, essential for basic cellular functions like metabolism and cytoskeletal maintenance, typically feature constitutive exons with low alternative splicing rates to ensure uniform expression across tissues. In contrast, tissue-specific genes, such as those involved in neural or muscle development, often incorporate alternative exons, especially initial or cassette types, to enable regulated, localized expression; for example, alternative first exons in the DLG1 gene drive brain-specific isoforms. This flexibility allows exons to fine-tune protein function in response to cellular needs.

Historical Development

Discovery

The discovery of exons emerged from groundbreaking experiments in the late 1970s that revealed eukaryotic genes are not continuous but composed of discontinuous coding segments separated by non-coding intervening sequences. In 1977, Phillip A. Sharp and his team at the Massachusetts Institute of Technology used RNA-DNA hybridization techniques on adenovirus 2 late mRNA, forming hybrid structures that were visualized by electron microscopy; these images showed the mRNA pairing with non-contiguous DNA regions, with unpaired DNA loops marking the intervening sequences later termed introns. Independently that same year, Richard J. Roberts and colleagues at Cold Spring Harbor Laboratory applied similar hybridization and electron microscopy methods to adenovirus 2, mapping mRNA to multiple separated DNA segments and confirming the split gene structure across several viral transcripts. Their parallel findings demonstrated that genes can consist of expressed exons interspersed with introns that are removed during RNA processing, fundamentally altering the understanding of gene organization.

Early evidence for this discontinuous architecture came from electron micrographs of these hybrids, in which introns appeared as looped-out DNA segments excluded from mRNA pairing. This looped-out pattern provided direct proof that coding sequences (exons) are not linearly contiguous in the genome, challenging the prevailing model of colinearity between DNA and protein. Soon after, the split structure was extended to cellular genes, first in the chicken ovalbumin gene in November 1977, where two interruptions were detected in the coding sequences. It was then confirmed in mammalian genes through hybridization studies on the beta-globin gene. In 1978, analysis of the mouse beta-globin gene revealed two intervening sequences that divide the coding region into three non-contiguous exons, confirming the presence of introns in a well-studied mammalian gene essential for hemoglobin production.

The term "exon" was coined in 1978 by biochemist Walter Gilbert to describe the expressed sequences that are spliced together to form mature mRNA, in contrast to introns, the removed intervening regions; Gilbert proposed this nomenclature in a seminal commentary on the implications of split genes for evolution and protein diversity. For their pioneering work on split genes and RNA splicing, Sharp and Roberts shared the 1993 Nobel Prize in Physiology or Medicine.

Key Milestones

In the 1980s, the development of cDNA cloning techniques enabled researchers to isolate and sequence DNA copies derived from messenger RNA, allowing exon boundaries to be mapped precisely by comparing cDNA sequences with genomic DNA. This approach was instrumental in elucidating gene structures, as demonstrated in early applications to eukaryotic genes where cDNA libraries revealed discontinuous exon arrangements. Concurrently, Northern blotting emerged as a key method for detecting RNA transcripts and confirming exon connectivity: it visualizes mature mRNA sizes, and hybridization with exon-specific probes helps delineate boundaries. The standardization of reverse transcription polymerase chain reaction (RT-PCR) during this decade further advanced exon validation, providing a sensitive tool to amplify and verify specific exon junctions from RNA templates and thus confirm splicing patterns in various genes.

The 1990s brought significant progress with the launch of the Human Genome Project in 1990, which spurred the creation of computational tools for exon prediction to annotate rapidly growing genomic sequence data. A pivotal advance was GENSCAN, an algorithm published in 1997 that predicted complete gene structures, including exon locations, by modeling splice site probabilities and coding potential, achieving 75-80% accuracy on human gene sets. Alternative splicing had been demonstrated as early as 1980 in immunoglobulin genes such as the mu heavy chain, where differential inclusion of exons produces membrane-bound versus secreted isoforms, highlighting the role of splicing in immune diversity. Building on these early discoveries, studies of alternative splicing gained broader traction during this period with genomic-scale analyses.

The release of the human genome draft sequence in 2001 was a landmark achievement, revealing that exons constitute only about 1-2% of the genome, far lower than prior estimates of 5% or more, and reshaping understanding of genomic organization and the prevalence of non-coding regions. This finding, derived from the efforts of the International Human Genome Sequencing Consortium, underscored the challenges of exon annotation and catalyzed refinements in prediction algorithms.

Genomic Organization

Exon-Intron Architecture

In eukaryotic genes, the mature mRNA is derived from a primary transcript known as pre-mRNA, which consists of an alternating series of exons and introns. Exons represent the segments retained in the final mRNA, while introns are intervening sequences removed during splicing. This architecture typically features one or more exons at the 5' and 3' termini of the gene, with introns interspersed between them, allowing for the modular assembly of coding information. The boundaries defining exon-intron junctions are highly conserved and characterized by specific consensus motifs essential for spliceosome recognition. At the 5' splice site, the intron begins with the nearly invariant dinucleotide GU (or GT in DNA), often embedded in a broader sequence such as MAG|GURAGU, where M is A or C and R is a purine. The 3' splice site, marking the end of the intron, concludes with the dinucleotide AG, preceded by a polypyrimidine tract—a stretch of pyrimidine-rich nucleotides (primarily U and C)—and an upstream branch point sequence featuring a critical adenine (A) residue, typically within 20–50 nucleotides of the 3' splice site. These motifs, including the branch point A that forms a lariat intermediate during splicing, ensure precise cleavage and ligation. Splice site recognition operates through two primary models: the intron definition model and the exon definition model, which depend on the relative lengths of introns and exons. In the intron definition model, prevalent in organisms with short introns such as yeast (Saccharomyces cerevisiae, where average intron length is approximately 250 nucleotides), the spliceosome assembles across the intron by directly pairing the 5' and 3' splice sites. 
Conversely, in the exon definition model, common in vertebrates with longer introns (average ~3–7 kb in humans), recognition begins across the exon, involving interactions between the 3' splice site of the upstream intron and the 5' splice site of the downstream intron, facilitated by exon-binding factors like SR proteins. This exon-centric mechanism accommodates the challenges posed by expansive introns, promoting efficient splicing of small internal exons (often 50–300 nucleotides).
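The boundary rules described above can be illustrated with a minimal Python sketch. It checks only the nearly invariant GT...AG dinucleotides at intron ends (the DNA form of GU...AG) before joining exons; real splice site recognition also depends on the extended consensus, the polypyrimidine tract, and the branch point, none of which are modeled here. The function names, the 0-based half-open coordinate convention, and the toy sequences are assumptions made for illustration.

```python
def has_canonical_splice_sites(genomic: str, intron_start: int, intron_end: int) -> bool:
    """Return True if the intron genomic[intron_start:intron_end] starts with GT and ends with AG."""
    intron = genomic[intron_start:intron_end].upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def splice(genomic: str, exons: list[tuple[int, int]]) -> str:
    """Join exon intervals (0-based, half-open) after checking every intervening intron."""
    exons = sorted(exons)
    # Each intron runs from the end of one exon to the start of the next.
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        if not has_canonical_splice_sites(genomic, prev_end, next_start):
            raise ValueError(f"non-canonical intron at {prev_end}-{next_start}")
    return "".join(genomic[start:end] for start, end in exons)


# Toy gene (hypothetical sequences): two exons separated by one GT...AG intron.
exon1 = "ATGGCCAAG"
intron = "GTAAGTCTTTTCTCTCACTTTCCTTCCAG"
exon2 = "GTGCTGTAA"
toy_gene = exon1 + intron + exon2

exon_coords = [(0, len(exon1)), (len(exon1) + len(intron), len(toy_gene))]
mature = splice(toy_gene, exon_coords)
print(mature)
```

In this toy case the spliced product is simply the two exon strings concatenated; an interval list whose gap does not begin with GT and end with AG raises an error instead of being joined silently.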

Size Distribution and Genomic Contribution

In humans, the average length of internal exons is approximately 147 base pairs (bp), with most exons falling between 50 and 300 bp. Across eukaryotic genes, including those of humans, exon sizes span a broad range, from as short as 1 bp to over 90,000 bp in extreme cases, though the majority are under 500 bp. Introns are considerably larger, with an average length of about 3,356 bp. Protein-coding genes typically contain 5 to 10 exons, with an average of 8.8 exons per gene. This pattern reflects a higher exon density in more complex organisms, where genes often contain multiple exons to support diverse splicing outcomes.

Exons constitute only 1-1.1% of the total human genome, despite encoding 100% of the protein-coding sequences. This small proportion underscores how compact coding regions are relative to non-coding DNA. Comparative analyses across species reveal that invertebrates generally feature fewer but longer exons per gene than vertebrates, which have more numerous, shorter exons. This shift correlates with increasing organismal complexity, as seen in the expansion of exon counts during vertebrate evolution from invertebrate ancestors.
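As a back-of-the-envelope illustration of these figures, the sketch below estimates what share of a typical gene's transcribed span is exonic, using the averages quoted above. It is a rough calculation under stated assumptions: the internal-exon average is applied to every exon, the intron count is taken as one fewer than the exon count, and the helper name `exonic_fraction` is introduced here for illustration only.

```python
# Averages quoted in the text for human genes.
AVG_EXON_BP = 147          # average internal exon length
AVG_INTRON_BP = 3356       # average intron length
AVG_EXONS_PER_GENE = 8.8   # average exon count per protein-coding gene


def exonic_fraction(n_exons: float, exon_bp: float, intron_bp: float) -> float:
    """Fraction of a gene's transcribed span occupied by exons.

    Assumes n_exons exons separated by n_exons - 1 introns (simplification).
    """
    exonic = n_exons * exon_bp
    intronic = max(n_exons - 1, 0) * intron_bp
    return exonic / (exonic + intronic)


fraction = exonic_fraction(AVG_EXONS_PER_GENE, AVG_EXON_BP, AVG_INTRON_BP)
print(f"Exonic share of an average gene span: {fraction:.1%}")  # about 4.7%
```

Under these assumptions exons make up roughly 5% of an average gene's span, which is consistent with the much smaller genome-wide share of 1-1.1%, since a large part of the genome lies outside genes altogether.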

Recent Discoveries in Exon Annotation

In 2024, researchers used exon trapping to identify approximately one million previously unannotated exons in the human genome, significantly expanding the known transcriptomic landscape beyond the annotations that followed initial genome sequencing. This discovery, derived from analyzing diverse human samples, revealed novel isoforms and regulatory elements that were missed by short-read technologies, highlighting the limitations of earlier annotation efforts.

Advances in cryptic exon detection have further illuminated hidden splicing events, particularly those regulated by RNA-binding proteins. A 2025 study employing long-read sequencing uncovered a TDP-43-dependent cryptic exon in the MNAT1 gene, whose inclusion disrupts normal splicing and is associated with neurodegeneration in conditions like amyotrophic lateral sclerosis (ALS). This finding underscores how proteinopathies can activate latent exons within introns, altering gene function in neuronal contexts.

Similarly, investigations into hybrid exons, which are formed through coordinated transcription and splicing, have revealed evolutionary mechanisms for generating novel transcript structures. In 2024, genomic analyses demonstrated that hybrid exons arise from nucleotide-level coordination of promoter activity and splice site selection, enabling adaptive isoform diversity across species. These exons often integrate upstream transcriptional starts with downstream splicing, contributing to regulatory flexibility in gene expression.

Recent multi-species annotation efforts have enhanced resources for exon-intron structures. A 2024 phylogenetic study across 590 eukaryotic species updated intron-exon architecture data using existing annotations and revealed conserved architectural patterns. Such database expansions facilitate cross-species analyses of splicing mechanisms. Collectively, these discoveries revise prior estimates, indicating that alternative splicing affects more than 90% of human multi-exon genes, thereby amplifying proteomic complexity and underscoring the genome's untapped regulatory potential.

Structure and Function

Molecular Structure

Exons are characterized by nucleotide compositions that differentiate them from intronic sequences. Notably, exons display a higher GC content, averaging approximately 7% greater than that of flanking introns, which contributes to increased nucleosome occupancy and chromatin packaging in coding regions. This bias arises from evolutionary pressures favoring stable secondary structures and efficient transcription in protein-coding areas. Additionally, codon usage within exons is biased, particularly near exon-intron boundaries, where synonymous codons are selected to avoid disrupting splice site recognition while optimizing translation efficiency.

At the biophysical level, exons often harbor secondary structural elements such as stem-loops or hairpins formed by base-pairing within the sequence. These structures can modulate splicing efficiency by influencing the accessibility of regulatory motifs, with certain hairpins repressing inclusion of specific exons in a tissue-dependent manner. Embedded within these sequences are short cis-regulatory motifs known as exonic splicing enhancers (ESEs) and exonic splicing silencers (ESSs). ESEs, typically 6-8 nucleotides long, serve as binding sites for serine/arginine-rich (SR) proteins that promote exon recognition by the spliceosome, whereas ESSs recruit repressive factors like hnRNP proteins to inhibit splicing.

Functional exons show high sequence conservation across diverse species, reflecting their role in encoding conserved protein domains and regulatory elements. This evolutionary stability is evident in orthologous exons shared among mammals, where nucleotide identity often exceeds 80% due to purifying selection against deleterious mutations. Exons are delimited by conserved splice site sequences at their boundaries, ensuring precise intron removal during RNA processing.

Role in RNA Processing

Exons play a central role in the maturation of pre-messenger RNA (pre-mRNA) through splicing, in which they are precisely joined together after the removal of intervening introns. The spliceosome, a large ribonucleoprotein complex composed of five small nuclear ribonucleoproteins (snRNPs) and over 150 proteins, catalyzes this process via two sequential transesterification reactions. In the first step, the 2'-OH group of an adenosine at the intron's branch point attacks the 5' splice site, leading to cleavage at the exon-intron boundary and formation of a lariat intermediate containing the intron. In the second step, the 3'-OH of the upstream exon attacks the 3' splice site, resulting in intron release and ligation of the adjacent exons to form mature mRNA.

Sequences at the exon-intron boundaries serve as critical scaffolds for spliceosome assembly and recognition. The 5' splice site, marked by a GU dinucleotide at the start of the intron immediately downstream of the exon, is recognized by the U1 snRNP through base-pairing with the 5' end of its snRNA, initiating spliceosome recruitment. Similarly, the 3' splice site, marked by an AG dinucleotide at the end of the intron immediately upstream of the next exon, interacts with U2 auxiliary factor (U2AF), which helps U2 snRNP bind stably at the branch point sequence within the intron. These exon-flanking motifs ensure accurate definition of exon boundaries, and their disruption often leads to splicing errors.

Following successful splicing, the exon junction complex (EJC) is deposited approximately 20-24 nucleotides upstream of each exon-exon junction on the mature mRNA. Composed of the core proteins eIF4A3, MAGOH, Y14, and MLN51, the EJC marks the splicing event and facilitates downstream processes such as nuclear export by interacting with export factors like TAP/p15, enhancing mRNA transport to the cytoplasm. The EJC also contributes to quality control by promoting nonsense-mediated decay (NMD) of mRNAs with premature termination codons located more than 50 nucleotides upstream of an exon junction, thereby preventing the production of truncated proteins.

Aberrant splicing, such as exon skipping or intron retention, can arise from mutations in exon boundary sequences or spliceosome components, leading to disease. For instance, a point mutation in the 5' splice site following exon 20 of the IKBKAP gene causes skipping of that exon in familial dysautonomia, resulting in a truncated IKAP protein and neuronal dysfunction. Exon skipping caused by splice site variants underlies other inherited disorders as well, highlighting the pathological consequences of disrupted exon processing.
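The 50-nucleotide rule for nonsense-mediated decay mentioned above can be expressed as a small Python sketch. It is a deliberately simplified model: it looks only at the distance between the end of a stop codon and the final exon-exon junction (a stop more than 50 nucleotides upstream of any junction is necessarily more than 50 nucleotides upstream of the last one), and it ignores the many known exceptions to the rule. The function name, argument conventions, and toy exon lengths are assumptions for illustration.

```python
def predicts_nmd(exon_lengths: list[int], stop_end: int, threshold: int = 50) -> bool:
    """Apply the simplified 50-nt rule to a spliced transcript.

    exon_lengths: lengths of the exons in transcript order.
    stop_end: 0-based position in the spliced mRNA just after the stop codon.
    Returns True if the stop codon ends more than `threshold` nt upstream
    of the last exon-exon junction.
    """
    if len(exon_lengths) < 2:
        return False  # no junctions, so no downstream junction to trigger the rule
    last_junction = sum(exon_lengths[:-1])  # position where the final exon begins
    return last_junction - stop_end > threshold


# Toy transcript (hypothetical lengths): junctions at positions 120 and 270.
toy_exons = [120, 150, 200]

print(predicts_nmd(toy_exons, stop_end=100))  # premature stop far upstream of the last junction
print(predicts_nmd(toy_exons, stop_end=250))  # stop within 50 nt of the last junction
print(predicts_nmd(toy_exons, stop_end=400))  # normal stop in the last exon
```

Only the first call reports a predicted NMD target; the other two stop codons lie too close to, or downstream of, the final junction.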

Alternative Splicing Mechanisms

Alternative splicing mechanisms allow exons to be variably included or excluded in mature mRNA transcripts, thereby generating multiple protein isoforms from a single gene and expanding proteomic diversity. These processes are tightly regulated and follow several distinct patterns: exon skipping, where one or more exons are omitted from the final mRNA; intron retention, in which introns are retained alongside exons; mutually exclusive exons, where only one of two or more exons is included; and poison exons, which introduce premature termination codons (PTCs) that trigger nonsense-mediated decay (NMD) to regulate gene expression. Exon skipping is the most prevalent mechanism in humans, while intron retention often occurs in specific cellular contexts such as neuronal differentiation.

Regulation of these mechanisms relies on cis-acting elements within the pre-mRNA, such as exonic splicing enhancers (ESEs) and silencers (ESSs), which interact with trans-acting factors including SR proteins and heterogeneous nuclear ribonucleoproteins (hnRNPs). SR proteins typically bind ESEs to promote exon inclusion by recruiting the spliceosome, whereas hnRNPs often bind ESSs or intronic splicing silencers (ISSs) to repress splicing and favor exon skipping or intron retention. This antagonism enables tissue-specific splicing patterns; for instance, in brain tissue, hnRNP A1 promotes skipping of certain neural exons, while SRSF1 enhances inclusion of muscle-specific isoforms. Such regulation is crucial for developmental processes, where alternative splicing adjusts isoform ratios in response to cellular signals.

Through these mechanisms, a single gene can produce thousands of isoforms, dramatically increasing functional diversity. A prime example is the Drosophila Dscam gene, which can generate over 38,000 protein variants through mutually exclusive splicing of four exon clusters, aiding neuronal self-avoidance and wiring specificity. In human genes, variable exon inclusion likewise modulates protein interactions, though on a less extreme scale.

Aberrant alternative splicing contributes to pathologies, particularly cancer, where dysregulated exon inclusion promotes tumor progression. For example, in colon and other cancers, increased inclusion of CD44 variable exon v6 (CD44v6) enhances tumor cell migration and metastasis by altering hyaluronan binding and signaling. This shift often results from overexpression of splicing factors like Tra2β, which favors v6 inclusion, underscoring how splicing dysregulation can drive oncogenic phenotypes. Poison exons also play a role in tumorigenesis, as their aberrant inclusion can silence tumor suppressor genes via NMD.
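The figure of over 38,000 Dscam variants follows from simple combinatorics: one exon is chosen from each mutually exclusive cluster, so the number of possible combinations is the product of the cluster sizes. The short sketch below works through that arithmetic. The cluster sizes used (12, 48, 33, and 2 alternatives for exons 4, 6, 9, and 17) are the commonly cited values for Drosophila Dscam and are not stated in the text above, so they should be read as an assumption.

```python
from math import prod

# Commonly cited numbers of alternatives in each mutually exclusive
# exon cluster of Drosophila Dscam (assumed here, not given in the text).
dscam_clusters = {
    "exon 4": 12,
    "exon 6": 48,
    "exon 9": 33,
    "exon 17": 2,
}

# One alternative is chosen per cluster, so the choices multiply.
n_isoforms = prod(dscam_clusters.values())
print(f"Potential Dscam isoforms: {n_isoforms:,}")  # 38,016
```

The same multiplication applies to any gene with independent mutually exclusive clusters, which is why a handful of clusters can yield tens of thousands of potential isoforms.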

Applications and Techniques

Experimental Identification Methods

Classical methods for identifying exons relied on techniques that detect and map specific RNA transcripts. Northern blotting involves hybridizing labeled probes to RNA separated by gel electrophoresis, allowing visualization of transcript sizes and confirmation of exon presence in mature mRNAs. This method was instrumental in early exon characterization because it distinguishes full-length transcripts from potential splicing variants based on size differences. Similarly, the S1 nuclease protection assay uses single-stranded DNA probes complementary to target RNAs; unhybridized regions are digested by S1 nuclease, leaving only the hybridized exon sequences for quantification and precise boundary mapping. These assays provided high sensitivity for low-abundance transcripts but were limited to predefined probes and too labor-intensive for genome-wide analysis.

Sequencing-based approaches revolutionized exon detection by enabling transcriptome-wide profiling. Expressed sequence tags (ESTs), short partial sequences from cDNA clones, were among the first data used to identify exons systematically: aligning them to genomic DNA reveals splicing patterns and novel transcripts. Large-scale EST sequencing identified thousands of expressed genes and mRNA abundance patterns in model organisms, contributing significantly to early transcriptome annotation. RNA-Seq, which applies high-throughput short-read sequencing to cDNA, quantifies exon usage across the transcriptome by mapping reads to reference genomes, detecting differential splicing and novel exons at high resolution. This method outperforms microarrays in sensitivity and dynamic range, capturing rare isoforms and tissue-specific exon inclusion.

Advanced sequencing technologies have enhanced exon characterization by resolving complex isoforms. Long-read platforms like PacBio generate full-length transcripts, accurately assembling multi-exon structures and identifying alternative splicing events that short reads fragment. For instance, PacBio Iso-Seq has revealed novel isoforms in disease-associated genes by spanning entire coding regions without assembly errors. Oxford Nanopore sequencing similarly provides direct RNA reads, enabling detection of full-length isoforms and splice variants in single cells, with applications in mapping alternative splicing in neural tissues. Complementing these, crosslinking and immunoprecipitation sequencing (CLIP-seq) identifies splicing factor binding sites near exons, elucidating regulatory interactions that influence exon inclusion. Integrating CLIP-seq data with RNA-Seq has pinpointed context-specific splicing factors, improving predictions of exon functionality. Emerging single-cell techniques, such as single-cell RNA sequencing (scRNA-seq) variants including scSplice and targeted assays like Nanostring, enable exon-level resolution of alternative splicing in heterogeneous cell populations, revealing cell-type-specific exon usage in diseases like cancer and neurodegeneration as of 2025.

Computational methods complement experimental data for exon prediction. Tools like GENSCAN employ hidden Markov models to scan genomic sequences for exon-intron boundaries based on statistical patterns, achieving high accuracy in eukaryotic gene annotation. Such ab initio predictors integrate splice site motifs, such as the GT-AG rule, to delineate exons without prior transcript evidence. Recent advances incorporate machine learning, with deep neural networks like SpliceAI predicting splice sites and exon structures from DNA sequence alone and outperforming traditional models in variant effect assessment. These tools benchmark favorably on diverse genomes, enhancing annotation of short or atypical exons.
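To make the read-mapping step more concrete, the sketch below shows one way exon blocks could be recovered from a spliced alignment. It assumes alignments in the SAM convention, where the CIGAR operation `N` marks a skipped reference region (an intron) and `M`, `D`, `=`, and `X` consume reference bases; the function name and the example coordinates are hypothetical, and real pipelines would normally rely on an established alignment library rather than hand-written parsing.

```python
import re


def exon_blocks(ref_start: int, cigar: str) -> list[tuple[int, int]]:
    """Return exon blocks (0-based, half-open reference intervals) for one spliced read.

    ref_start: 0-based reference position where the alignment begins.
    cigar: SAM CIGAR string; 'N' operations are treated as introns.
    """
    blocks = []
    pos = ref_start          # current reference position
    block_start = ref_start  # start of the exon block being extended
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op == "N":
            # An intron closes the current exon block and opens a new one after it.
            if pos > block_start:
                blocks.append((block_start, pos))
            pos += length
            block_start = pos
        elif op in "MD=X":
            pos += length    # these operations consume reference bases
        # I, S, H, and P do not advance the reference position.
    if pos > block_start:
        blocks.append((block_start, pos))
    return blocks


# Hypothetical read spanning three exons and two introns.
blocks = exon_blocks(1000, "50M200N75M1500N30M")
print(blocks)  # [(1000, 1050), (1250, 1325), (2825, 2855)]
```

The gaps between consecutive blocks correspond to the inferred introns, so collecting blocks from many reads gives the exon-exon junction evidence that short-read and long-read methods both rely on.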

Therapeutic Applications

One prominent therapeutic strategy targeting exons is exon skipping, which uses antisense oligonucleotides (AONs) to modulate splicing and bypass defective exons in genetic disorders. In Duchenne muscular dystrophy (DMD), eteplirsen (Exondys 51), a phosphorodiamidate morpholino oligomer, induces skipping of exon 51 in the DMD gene, restoring the reading frame and enabling partial dystrophin production in the approximately 13-14% of DMD patients whose mutations are amenable to this approach. The U.S. Food and Drug Administration granted accelerated approval to eteplirsen on September 19, 2016, making it the first oligonucleotide therapy for splicing modulation in DMD, though confirmatory trials are required to verify clinical benefits such as improved motor function.

CRISPR-Cas9 has emerged as a precise tool for exon excision or insertion in genetic diseases, with advances in specificity enhancing its therapeutic potential from 2023 to 2025. By directing guide RNAs to specific exons, this approach enables reframing or replacement of mutated sequences, as demonstrated in DMD models where variant Cas9 enzymes like SpCas9-LRVQR restored dystrophin expression through exon 53 reframing in patient-derived cells. Recent innovations, including base editing variants and other improved editors, allow kilobase-scale insertions of functional exons with reduced off-target effects, supporting applications in muscular dystrophies and other monogenic conditions.

RNA exon editing via trans-splicing is a non-DNA-altering approach to mutation correction, in which synthetic RNAs replace defective exons in pre-mRNA to produce functional proteins. This strategy is particularly suited to large genes exceeding 5 kb, as it allows entire exons or multi-exon segments to be replaced using a single vector, addressing diverse mutations with broad applicability. In 2024, trans-splicing efficiencies improved through optimization and bioinformatics enabled a phase 1/2 clinical trial (NCT06467344) for Stargardt disease, using ACDN-01 to correct multiple ABCA4 exons and potentially benefit up to 70% of patients.

Poison exons, cryptic or cassette exons that trigger nonsense-mediated decay (NMD) upon inclusion, are being harnessed for targeted mRNA degradation in cancer therapies through splice modulation. In SF3B1-mutant tumors, antisense oligonucleotides promote inclusion of a poison exon in BRD9, leading to degradation of its mRNA and suppression of oncogenic activity. Similarly, in other studies, AON-driven inclusion of a TRA2β poison exon reduced protein expression and inhibited cancer cell growth. For amyotrophic lateral sclerosis (ALS), splice modulation targets cryptic poison exons arising from TDP-43 dysfunction; a novel MNAT1 cryptic exon, identified via long-read sequencing, induces NMD-mediated degradation and was confirmed in ALS/FTD patient tissues, offering potential for therapeutic intervention to mitigate neurodegeneration.
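The logic behind exon skipping, restoring the reading frame by removing an additional exon, comes down to arithmetic modulo 3, as the following sketch illustrates. The exon lengths are illustrative values for a stretch of the DMD gene and are assumptions that should be checked against a current annotation; the function name is likewise hypothetical, and the check ignores everything except the total number of nucleotides removed from the transcript.

```python
# Illustrative exon lengths in nucleotides (assumed values, not from the text).
exon_lengths = {48: 186, 49: 102, 50: 109, 51: 233, 52: 118}


def is_in_frame(lengths: dict[int, int], missing: set[int]) -> bool:
    """True if the exons absent from the mRNA remove a multiple of 3 nucleotides."""
    removed = sum(lengths[exon] for exon in missing)
    return removed % 3 == 0


deleted = {49, 50}          # exons lost through a genomic deletion
skipped = deleted | {51}    # the same deletion plus therapeutic skipping of exon 51

print(is_in_frame(exon_lengths, deleted))  # False: 211 nt removed, frame shifted
print(is_in_frame(exon_lengths, skipped))  # True: 444 nt removed, frame restored
```

With these assumed lengths, the deletion alone disrupts the reading frame, while additionally skipping exon 51 removes a multiple of three nucleotides, which is the situation in which a shortened but partially functional protein can be produced.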

Misconceptions and Terminology

Common Misuses of the Term

One common misuse of the term "exon" involves equating it directly with the entire gene or assuming that all exons exclusively encode proteins, thereby overlooking the presence of untranslated regions (UTRs) and exons in non-coding RNAs. Exons are defined as DNA sequences that are transcribed into RNA and retained in the mature transcript after splicing, but only a subset—less than 30% in humans—actually code for amino acids, with the remainder contributing to regulatory elements like UTRs or non-coding RNAs such as microRNAs and long non-coding RNAs. This misconception persists in some scientific literature, textbooks, and technologies like whole-exome sequencing, which primarily targets protein-coding regions and thus captures less than 25% of the total exome, leading to an incomplete representation of exonic diversity. Another frequent error is applying the concept of exons to prokaryotic genes, which generally lack introns and thus do not undergo the splicing process that defines exons in eukaryotes. In prokaryotes, such as , genes are typically continuous coding sequences without the interspersed non-coding introns found in eukaryotic genomes, making the distinction between exons and introns irrelevant; the entire transcribed region functions analogously to a single exon, but the terminology is eukaryotic-specific and arises from the evolutionary absence of spliceosomal introns in prokaryotes. This misuse can confuse discussions by imposing eukaryotic frameworks on prokaryotic systems, where no splicing occurs due to coupled transcription-translation mechanisms. A related overgeneralization stems from extrapolating exon characteristics from model organisms, such as assuming uniform exon sizes across species, which ignores significant variation in exon length and structure. 
For instance, average internal exon lengths vary across species, being around 147 base pairs in humans, approximately 200 base pairs in nematodes like , and 179 base pairs in plants like , decreasing further with increasing intron numbers within genes; this variability reflects evolutionary adaptations and cannot be uniformly applied without species-specific context. Such assumptions can lead to errors in annotating genomes from non-model organisms or predicting splicing patterns. In popular and simplified educational materials, exons are often portrayed as straightforward "building blocks" of genes that directly assemble into proteins, neglecting the essential role of splicing and the modular nature of pre-mRNA processing. This oversimplification, echoed in some broader communications, reinforces the protein-coding bias and underemphasizes how exons contribute to RNA stability, localization, and regulation beyond mere . Exons are fundamentally distinguished from introns in eukaryotic genes, as exons represent the transcribed sequences that are retained and joined together in the mature messenger RNA (mRNA) after splicing, whereas introns are the intervening non-coding sequences that are excised during RNA processing. This retention of exons ensures they contribute to the final coding or regulatory elements of the mRNA, while introns are degraded post-removal to prevent interference with translation. Unlike codons, which are specific triplets of in the mRNA that directly individual during protein , exons are larger genomic segments that encompass multiple codons and may include untranslated regions. An exon typically spans dozens to hundreds of , allowing it to contain several codons, but it is not equivalent to a single codon unit; instead, exons serve as modular blocks in pre-mRNA that are processed to form the continuous coding sequence. 
Protein domains, as functional and structural modules within the folded polypeptide chain, differ from exons in that domains operate at the level of protein architecture and often span portions of one or multiple exons, reflecting an evolutionary correlation rather than a strict one-to-one correspondence. While exon-intron boundaries can align with domain edges in some genes, promoting modular evolution through exon shuffling, domains are defined by their biochemical roles, such as enzymatic activity, independent of the underlying genomic exon structure.

Exons must also be differentiated from isoforms, which are variant forms of mRNA or protein arising from alternative splicing events that selectively include or exclude specific exons from the same gene. A single exon is a fixed genomic element, whereas isoforms are the diverse products generated by combinatorial use of exons, enabling functional diversity without altering the exon sequences themselves.
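
The difference between exons (fixed elements) and isoforms (combinations of those elements) can be sketched as follows. This is a toy model under stated assumptions: the exon names, the `CASSETTE` set, and the `isoforms` function are hypothetical, and the sketch enumerates every include/skip combination, whereas real alternative splicing is regulated and typically produces only a subset of the possible combinations.

```python
from itertools import product

# Hypothetical gene with four exons; E2 and E3 are assumed to be
# cassette exons that may be either included or skipped.
EXONS = ["E1", "E2", "E3", "E4"]
CASSETTE = {"E2", "E3"}

def isoforms(exons, cassette):
    """Enumerate exon combinations, keeping genomic order.

    Exons outside `cassette` are always included; each cassette
    exon is independently included or skipped.
    """
    optional = [e for e in exons if e in cassette]
    result = []
    for choices in product([True, False], repeat=len(optional)):
        included = dict(zip(optional, choices))
        result.append(tuple(e for e in exons if included.get(e, True)))
    return result

for iso in isoforms(EXONS, CASSETTE):
    print("-".join(iso))
```

With two cassette exons the sketch lists four isoforms, from the full-length E1-E2-E3-E4 to the shortest E1-E4. The exon list itself never changes; only the selection does.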
