
Exome

The exome consists of the collective exons in a genome, the segments of DNA that are transcribed into mRNA and predominantly translated into proteins, thereby encoding the functional proteome. In the human genome, the exome spans approximately 1-2% of the total DNA sequence, containing roughly 180,000 exons distributed across about 20,000 protein-coding genes. Exome sequencing selectively captures and analyzes these coding regions using targeted hybridization or amplification methods, offering a cost-effective alternative to whole-genome sequencing by focusing on areas enriched for disease-causing variants. This approach has driven major advances in identifying causal mutations for rare genetic disorders, with reported diagnostic yields of 25-58% in clinical settings for undiagnosed cases, particularly Mendelian diseases. Key achievements include the rapid discovery of novel disease genes since the early 2010s, accelerating precision medicine applications. However, limitations persist: exome methods can miss structural variants and non-coding regulatory sequences implicated in complex traits, so broader genomic assays are often needed for comprehensive causal inference.

Definition and Biological Foundations

Core Definition

The exome comprises the aggregate of all exons within a genome, representing the protein-coding regions of genes that are transcribed into messenger RNA (mRNA) and subsequently translated into proteins. Exons are the DNA segments whose transcribed sequence is retained in mature mRNA after intronic sequences are spliced out during RNA processing, forming the template for protein synthesis. This definition emphasizes the exome's role as the functional subset of the genome directly responsible for encoding amino acid sequences in polypeptides, excluding non-coding elements such as introns, promoters, enhancers, and intergenic regions.

In the human genome, the exome encompasses approximately 180,000 exons across roughly 20,000-25,000 protein-coding genes, spanning about 30 million base pairs and constituting 1-2% of the total 3 billion base pair genome. Despite its small size relative to non-coding DNA, this compact region harbors the majority (approximately 85%) of known disease-causing variants, underscoring its disproportionate biomedical significance.

Biologically, the exome's primary function lies in determining the proteome's diversity through sequence variations that alter protein structure, function, or expression levels, thereby influencing phenotypic traits and susceptibility to disorders. Mutations within exonic sequences, such as single-nucleotide variants or insertions/deletions, can disrupt reading frames or cause amino acid substitutions, leading to loss-of-function or gain-of-function effects in cellular processes. While the exome does not capture regulatory or structural genomic elements, its focus on coding exons provides a targeted lens for understanding Mendelian and complex genetic diseases rooted in protein dysfunction.

Relationship to Genome Structure

The exome consists of all exons across the protein-coding genes in a genome, representing the segments that are retained in mature mRNA after splicing and primarily encode amino acid sequences for proteins. These exons are embedded within the broader genomic architecture as discontinuous units, interspersed by introns, which are non-coding sequences that are transcribed but excised during RNA processing. This intron-exon organization, first elucidated in the late 1970s, enables alternative splicing, whereby different exon combinations from a single gene can produce multiple protein isoforms, expanding functional diversity from a limited number of genes.

In the human genome, which spans approximately 3.2 billion base pairs, the exome constitutes roughly 1-1.5% of the total sequence, equivalent to about 30-45 million base pairs. This includes around 180,000 exons distributed across approximately 20,000 protein-coding genes, with internal exons (excluding untranslated regions) forming the core coding portions. The vast majority of the genome (over 98%) comprises non-exonic elements, including introns (which can exceed exons in length by orders of magnitude within individual genes), regulatory sequences, repetitive elements, and intergenic regions, underscoring the exome's compact role within a predominantly non-coding genome.

This structural relationship highlights the exome's efficiency in encoding functional proteins amid genomic complexity: exons often cluster in gene-dense regions but remain fragmented, which facilitates evolutionary flexibility such as exon shuffling or domain swapping. While not all exonic bases are strictly protein-coding (some contribute to untranslated regions or non-coding RNAs), the exome's focus on these elements prioritizes variants with direct impacts on protein function over the more diffuse effects of variants in non-coding regions.

Functional Role in Protein Coding

The exome consists of the exonic sequences within protein-coding genes, which collectively span approximately 1-2% of the human genome, or about 30-60 million base pairs across roughly 20,000 genes. These exons serve as the primary template for protein synthesis: their sequences are transcribed into pre-messenger RNA (pre-mRNA) transcripts that include both exons and intervening introns. During RNA processing, introns are precisely excised through splicing, and the exons are ligated to form mature mRNA, preserving the sequential order of exonic coding regions.

The coding portions of these exons, known as coding DNA sequences (CDS), are then translated by ribosomes in the cytoplasm, where each triplet codon specifies one of 20 amino acids or a stop signal, directly dictating the primary sequence of the resulting polypeptide chain. This sequence determines higher-order protein structures, including alpha helices, beta sheets, and domains essential for enzymatic activity, structural integrity, signaling, and molecular interactions.

Not all exonic sequences code for proteins. Exons also encompass untranslated regions (UTRs) at the 5' and 3' ends, which modulate mRNA stability, localization, and translation efficiency but do not contribute to the polypeptide chain. Nonetheless, the coding sequences within exons represent the core functional unit for protein coding, as their integrity ensures faithful conversion of genetic information into functional proteins that underpin cellular processes. Alterations in exonic sequences, such as single-nucleotide changes, can disrupt this fidelity by introducing amino acid substitutions or truncations, though many such variants are neutral because of the redundancy of the genetic code and the robustness of proteins.
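The codon-to-amino-acid mapping described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production annotation tool: it assumes the standard nuclear genetic code, a CDS whose length is a multiple of three, and 0-based positions. The function names `translate` and `classify_substitution` and the example sequence are invented for this sketch.

```python
from itertools import product

# Standard genetic code, with amino acids listed in TCAG order for the
# first, second, and third codon position ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def classify_substitution(cds: str, pos: int, alt: str) -> str:
    """Classify a single-base change at 0-based position `pos` of the CDS."""
    cds = cds.upper()
    start = pos - pos % 3            # first base of the affected codon
    ref_codon = cds[start:start + 3]
    offset = pos - start
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"          # same amino acid (code redundancy)
    if alt_aa == "*":
        return "nonsense"            # premature stop, truncated protein
    if ref_aa == "*":
        return "stop_lost"
    return "missense"                # amino acid substitution


if __name__ == "__main__":
    example_cds = "ATGGCTGAATAA"     # hypothetical CDS: Met-Ala-Glu-stop
    print(translate(example_cds))
    for pos, alt in [(5, "C"), (6, "T"), (7, "T")]:
        print(pos, alt, classify_substitution(example_cds, pos, alt))
```

The three example substitutions hit the same short sequence but fall into different classes, which is why many exonic single-base changes are neutral while others truncate or alter the protein.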

Historical Development

Discovery of Exons and Gene Structure

In the decades preceding 1977, the structure of eukaryotic genes was widely assumed to be colinear with the polypeptide products they encoded, with the uninterrupted coding sequences of prokaryotic models extended to higher organisms. This view, rooted in earlier genetic studies such as those on bacterial operons, persisted because there was no direct evidence for discontinuities in eukaryotic DNA, despite hints from mismatched hybridization patterns.

The discovery of discontinuous gene structure occurred in 1977 through independent experiments by Richard J. Roberts at Cold Spring Harbor Laboratory and Phillip A. Sharp at the Massachusetts Institute of Technology, using adenovirus as a model system. Both groups employed R-loop mapping, hybridizing poly(A)-containing viral mRNA to double-stranded genomic DNA under conditions that displace one DNA strand, forming RNA-DNA hybrids visualized via electron microscopy. This revealed distinct hybridized segments interrupted by unpaired DNA loops, indicating that genes comprise non-contiguous coding regions separated by intervening non-coding sequences. Sharp's team published findings in Proceedings of the National Academy of Sciences demonstrating at least one large intervening sequence in the adenovirus hexon gene, while Roberts' work in Cell identified multiple interruptions in late mRNAs, confirming the mosaic nature of eukaryotic genes.

These observations showed that primary transcripts (pre-mRNA) include both exons, the regions retained in mature mRNA, and introns, which are excised via splicing so that exons are ligated into functional messages. The split-gene model explained discrepancies between gene size and mRNA length and laid the foundation for understanding alternative splicing, where variable exon inclusion generates protein diversity from single genes. Roberts and Sharp received the 1993 Nobel Prize in Physiology or Medicine for these discoveries. The nomenclature "exon" for expressed, spliced segments and "intron" for intervening, removed sequences was proposed by Walter Gilbert in 1978, formalizing the structural elements in a Nature article.

Subsequent studies extended the finding to cellular genes, such as the ovalbumin and immunoglobulin genes in 1978, verifying the ubiquity of introns in eukaryotes and their role in post-transcriptional processing. This shift from a continuous to a modular view of gene architecture enabled later concepts like the exome, the collective exonic portions of the genome targeted in sequencing for protein-coding variation analysis.

Emergence of Exome Sequencing

Exome sequencing emerged in the late 2000s as a targeted approach to interrogate protein-coding regions amid the high costs and data volume of whole-genome sequencing enabled by next-generation sequencing platforms. These platforms, commercialized around 2005–2007 by companies like Illumina and 454 Life Sciences, generated millions of short reads in parallel, but early applications strained computational and interpretive resources for non-coding DNA, which comprises over 98% of the genome yet harbors fewer known disease-causing variants. Exome sequencing addressed this by employing hybridization-based capture methods, using probes arrayed on beads or chips, to selectively enrich exons (the ~180,000 coding segments totaling approximately 30–60 megabases) prior to sequencing. This strategy leveraged the observation that ~85% of known disease-associated mutations in Mendelian disorders occur in exons, prioritizing the regions most likely to harbor causal variants over exhaustive genomic coverage.

The first proof-of-principle demonstration of whole exome sequencing for gene discovery came in 2009, when Ng et al. applied massively parallel sequencing to the exomes of four individuals affected by Miller syndrome, a rare craniofacial disorder. By capturing and sequencing ~1% of the genome, they identified compound heterozygous mutations in the DHODH gene as the cause, filtering variants against unaffected relatives and population databases to pinpoint pathogenicity, a result that confirmed the approach's efficacy for recessive disorders. This study, published in Nature Genetics, marked the initial use of exome sequencing to resolve an unknown causal gene in a Mendelian condition, building on prior targeted resequencing but scaling it exome-wide via commercial capture kits like those from Agilent or NimbleGen, which achieved ~70–90% enrichment efficiency for targeted regions. Similar efforts later identified mutations in TTN for familial dilated cardiomyopathy, underscoring exome sequencing's utility in heterogeneous phenotypes.

Rapid adoption followed due to exome sequencing's cost-effectiveness (per-sample expenses fell to under $1,000 by the mid-2010s, versus millions for early whole-genome efforts) and its focus on interpretable data, facilitating discoveries in undiagnosed cases. Between 2009 and 2011, applications expanded to de novo mutations in neurodevelopmental disorders such as autism, with studies like those from the Autism Genome Project revealing novel variants in synaptic genes. Methodological refinements, including improved bait designs for splice sites and UTRs, enhanced coverage uniformity, mitigating the biases in GC-rich regions that plagued initial arrays. By prioritizing empirical variant calling in coding regions over harder-to-interpret non-coding analysis, exome sequencing catalyzed a shift toward clinically actionable genomics, though one reliant on accurate reference annotations from projects like GENCODE.

Key Milestones in Application

In 2009, researchers led by Sarah B. Ng and Jay Shendure at the University of Washington conducted the first successful application of whole exome sequencing (WES) to identify a causative gene for a rare Mendelian disorder, sequencing the protein-coding regions in four affected individuals from three kindreds with Miller syndrome and pinpointing biallelic mutations in the DHODH gene, which encodes an enzyme in pyrimidine biosynthesis. This proof-of-principle study, published in early 2010, achieved approximately 75% coverage of targeted exons at 20-fold depth using array-based capture and massively parallel sequencing on the Illumina platform, highlighting WES's efficiency over whole-genome approaches for variant discovery in coding regions where most known disease-causing mutations reside. The finding validated WES as a targeted, cost-effective method for monogenic disease gene identification, reducing the sequencing burden from billions to roughly 30 million base pairs.

Building on this, 2010 saw WES extended to sporadic neurodevelopmental disorders, with studies employing trio sequencing (proband and parents) to detect de novo mutations. For instance, Veltman and colleagues identified disruptive variants in genes like DOCK8 and SCN2A in children with intellectual disability, with high-confidence calls in ~95% of targeted exons. Concurrently, WES uncovered the genetic basis of Kabuki syndrome via mutations in MLL2 (now KMT2D), reported by Ng et al. in a cohort of 10 affected individuals, demonstrating scalability to small cohorts and emphasizing heterozygous loss-of-function variants. These applications shifted the field from linkage-based mapping to direct variant interrogation, accelerating discovery rates. By 2011, WES had identified causal variants for over 20 Mendelian conditions, including Schinzel-Giedion syndrome (SETBP1), and variants contributing to autism spectrum disorders in large cohorts, where de novo events in regulatory genes were enriched.

Clinical translation advanced in 2012, with clinical laboratories implementing WES in diagnostic pipelines for undiagnosed pediatric cases, yielding positive molecular diagnoses in ~18% of trios with suspected genetic disorders through hybrid capture kits covering >95% of consensus coding sequences. This milestone marked WES's transition from research tool to routine clinical test, supported by falling costs (under $1,000 per exome by mid-decade) and improved bioinformatics for variant interpretation. Subsequent years featured large-scale consortia applications, such as the 2013 Deciphering Developmental Disorders project in the UK, which applied WES to over 4,000 trios and diagnosed ~27% of cases with novel or known variants, informing genotype-phenotype correlations.

In oncology, studies in the early 2010s, such as those by Kandoth et al., used WES on tumor-normal pairs to catalog somatic mutations across multiple cancer types, revealing mutated pathways in ~90% of samples and paving the way for precision oncology. By 2015, WES had contributed to ~1,000 disease gene discoveries, with diagnostic rates in clinical cohorts reaching 25-40%, though limited by its inability to assess non-coding variants. These milestones underscore WES's impact on resolving the genetic causes of both rare monogenic and more complex disorders.

Sequencing Methodologies

Principles of Next-Generation Sequencing

Next-generation sequencing (NGS), also known as massively parallel sequencing, enables the simultaneous analysis of millions to billions of short DNA fragments, achieving throughput orders of magnitude higher than Sanger sequencing's chain-termination method, which processes one sequence at a time. Introduced commercially around 2005 with platforms like the 454 Genome Sequencer, NGS centers on parallelizing the sequencing reaction across immobilized DNA clusters or single molecules, reducing per-base costs from approximately $10 in the early 2000s to under $0.01 by 2020. This shift supports applications in genomics, including targeted approaches like exome sequencing, by generating vast datasets of short reads (typically 50–300 base pairs) that are reconstructed via computational alignment.

The core workflow commences with nucleic acid extraction from biological samples, yielding high-quality DNA or RNA free of contaminants, followed by library preparation. In library preparation, genomic DNA is fragmented mechanically or enzymatically into segments of 100–500 base pairs, ends are repaired for blunt or A-overhang ligation, and platform-specific adapters, containing indices for multiplexing and priming sequences, are attached via ligation. Clonal amplification then occurs, either through emulsion PCR (emPCR) for bead-based systems or solid-phase bridge amplification on flow cells, producing clonal clusters of up to 10^9 molecules per square millimeter to enhance signal detection. For targeted sequencing such as exomes, hybridization capture with biotinylated probes complementary to exonic regions enriches the library prior to sequencing, focusing on ~1–2% of the genome.

Sequencing itself employs detection of nucleotide incorporation or ligation events in real time. In the dominant sequencing-by-synthesis (SBS) methods, used by Illumina platforms that process over 90% of NGS data, reversible terminator nucleotides labeled with distinct fluorophores are added by DNA polymerase; incorporation halts extension, fluorescence is imaged to identify the base (A, C, G, or T), terminators are cleaved, and the cycle repeats for each position. Alternative principles include sequencing by ligation (e.g., Applied Biosystems SOLiD), where fluorescently labeled di- or trinucleotide probes are ligated to the template and queried in two-base encoding to reduce errors, and ion semiconductor sequencing (Ion Torrent), which detects pH changes from hydrogen ion release during nucleotide incorporation without optics. These methods yield raw signals converted to base calls, with error rates around 0.1–1% per base, mitigated by high coverage (often 30–100x for exomes).

Post-sequencing, bioinformatics pipelines handle data analysis in three stages: primary analysis for base calling and quality scoring (e.g., Phred scores of Q30 indicating 99.9% accuracy), secondary analysis for alignment to reference genomes using tools like BWA or Bowtie, and tertiary analysis for variant detection via callers such as GATK, which model sequencing errors and population frequencies. This framework, emphasizing scalability and error correction through redundancy, has driven NGS adoption since its validation in large-scale projects such as the 1000 Genomes Project, though it introduces challenges like PCR-induced biases and short-read ambiguities in repetitive regions.
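The relationship between Phred quality scores and base-call accuracy mentioned above follows the standard definition Q = -10 · log10(p), where p is the estimated probability that the base call is wrong. The sketch below shows the conversion in both directions and decodes a FASTQ quality string, assuming the common Phred+33 ASCII encoding; the function names are illustrative only.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Probability that a base call with Phred score q is incorrect."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred score corresponding to an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def decode_fastq_quality(quality_string: str, offset: int = 33) -> list[int]:
    """Decode an ASCII-encoded FASTQ quality string (Phred+33 by default)."""
    return [ord(char) - offset for char in quality_string]


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        accuracy = 1 - phred_to_error_prob(q)
        print(f"Q{q}: accuracy {accuracy:.4%}")
    print(decode_fastq_quality("II?5"))
```

Each 10-point increase in Q corresponds to a tenfold drop in error probability, so Q20 means 1 error in 100 calls and Q30 means 1 in 1,000.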

Whole Exome Sequencing Techniques

Whole exome sequencing (WES) targets the approximately 1-2% of the human genome comprising protein-coding exons, using next-generation sequencing (NGS) platforms after targeted enrichment to focus on these regions. The core technique involves preparing a sequencing library from genomic DNA, enriching for exonic sequences via hybridization capture, and generating high-depth sequence data to identify variants. This approach reduces sequencing costs compared to whole-genome sequencing by prioritizing functionally relevant areas, typically achieving 100x average coverage across ~30-60 Mb of targeted exome space.

Library preparation begins with high-quality genomic DNA input, requiring at least 150 ng (preferably 500 ng) of purified, non-degraded DNA, extracted for example via phenol:chloroform methods to ensure integrity. DNA is fragmented to 150-300 bp using mechanical shearing (e.g., ultrasonication) or enzymatic methods like the transposases in kits such as Illumina Nextera. Fragments undergo end repair, A-tailing, and ligation of platform-specific adapters for multiplexing and amplification, followed by limited PCR cycles (typically 6-12) to generate the library while minimizing bias. Formalin-fixed paraffin-embedded (FFPE) samples can be used but often yield lower-quality libraries due to degradation.

Target enrichment predominantly employs solution-based hybridization capture, where biotinylated oligonucleotide probes (baits), designed to cover consensus coding sequences (CCDS) and additional untranslated regions, hybridize to library fragments in solution. Captured targets are isolated using streptavidin-coated magnetic beads, non-hybridized off-target DNA is washed away, and post-capture PCR amplification enriches the pool. Common commercial kits include Agilent SureSelect (targeting ~50 Mb with 120 bp RNA baits, effective for indel detection), Roche NimbleGen SeqCap EZ (~64 Mb with 55-105 bp DNA probes, offering uniform coverage of GC-rich regions), and Illumina TruSeq or IDT xGen panels (~39-62 Mb, using transposase-based prep for efficiency). Array-based capture, using probes immobilized on microarrays, is less common due to longer hybridization times and lower throughput. PCR-amplification-based methods are limited to smaller gene panels and do not scale to whole-exome coverage.

Sequencing occurs on short-read NGS platforms, with Illumina systems (e.g., NovaSeq or HiSeq) dominating due to high throughput and accuracy; paired-end 150 bp reads are standard, with at least 45 million reads per sample generated for ~100x mean depth across captured regions. Alternative platforms like Ion Torrent provide semiconductor-based detection but are less prevalent for WES owing to shorter reads and homopolymer errors. Post-sequencing, quality control assesses on-target rate (typically 50-80%), duplication levels, and coverage uniformity to identify biases in GC-rich or repetitive exons. Variations in probe design and hybridization conditions influence capture efficiency, with modern kits reducing off-target capture and improving variant detection in challenging regions.
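As a rough check on the figures above, mean on-target depth can be estimated from read count, read length, on-target rate, and target size. The sketch below is a back-of-the-envelope calculation, not a replacement for measuring coverage from aligned reads: it assumes "45 million reads" counts individual 150 bp reads, ignores overlap between read pairs, and treats coverage as uniform. The function name and the optional duplication parameter are introduced here for illustration.

```python
def estimated_mean_depth(
    n_reads: int,
    read_length_bp: int,
    target_size_bp: int,
    on_target_rate: float,
    duplication_rate: float = 0.0,
) -> float:
    """Approximate mean depth over the capture target.

    Usable bases = reads x read length x fraction on target x fraction
    that are not PCR duplicates; depth = usable bases / target size.
    """
    usable_bases = (
        n_reads * read_length_bp * on_target_rate * (1.0 - duplication_rate)
    )
    return usable_bases / target_size_bp


if __name__ == "__main__":
    # Illustrative inputs taken from the ranges quoted in the text.
    depth = estimated_mean_depth(
        n_reads=45_000_000,
        read_length_bp=150,
        target_size_bp=45_000_000,   # assumed ~45 Mb capture target
        on_target_rate=0.70,
    )
    print(f"Estimated mean depth: {depth:.0f}x")
```

With a 45 Mb target and a 70% on-target rate, these inputs give a mean depth close to the ~100x figure; a lower on-target rate or a larger target pushes the required read count up proportionally.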

Comparison to Whole Genome Sequencing

Whole exome sequencing (WES) selectively captures and sequences the exons, which constitute approximately 1-2% of the genome and encode proteins, in contrast to whole genome sequencing (WGS), which analyzes the entire ~3 billion base pairs, including non-coding introns, regulatory elements, and intergenic regions. This focused approach enables higher sequencing depth (often 100x or more) within targeted regions for an equivalent resource investment, improving sensitivity for detecting single nucleotide variants and small insertions/deletions in coding sequences. WGS, while providing uniform but shallower coverage (typically 30x genome-wide), better resolves structural variants, copy number variations, and non-coding mutations that WES may overlook due to capture inefficiencies or off-target gaps.

Cost remains a primary differentiator, with WES historically 2-5 times less expensive than WGS owing to reduced data volume (WES generates ~4-12 gigabases per sample versus ~90-120 gigabases for WGS), which lowers sequencing, storage, and bioinformatics demands. As of 2023, WES costs ranged from $500-1,000 per sample, compared to $1,000-2,000 for WGS, though WGS prices have declined faster due to advances in high-throughput platforms, with parity projected in some clinical contexts by 2025. WES thus suits targeted investigations of protein-coding diseases, where non-coding contributions are minimal, but WGS offers superior comprehensiveness for complex or undiagnosed cases involving regulatory or structural alterations.

Aspect | Whole Exome Sequencing (WES) | Whole Genome Sequencing (WGS)
Genomic coverage | ~1-2% (exons only); higher depth in targets (95-160x mean depth covers 95% of coding regions at ≥20x) | 100%; shallower uniform depth (e.g., 30x genome-wide, 98% of coding regions at ≥20x)
Variant detection | Excels in coding SNVs/indels; misses ~5-10% of exonic variants due to capture bias; limited for structural/non-coding variants | Detects broader variants including non-coding, copy number, and structural; higher overall rare variant yield
Cost and data load | Lower (~$500-1,000); less storage and analysis burden | Higher (~$1,000-2,000); greater storage and compute needs, but decreasing
Diagnostic yield | High for Mendelian/rare coding disorders (20-40% solve rate) | Marginally higher (up to 10% more in trios); better for novel/non-coding causes

Despite WES's efficiency, its reliance on hybridization capture introduces biases, such as uneven coverage (e.g., GC-rich regions underrepresented), potentially reducing accuracy compared to WGS's PCR-free methods. WGS, however, demands advanced computational infrastructure for variant calling across vast non-coding regions, where most variants are benign, complicating interpretation without prior disease hypotheses. In practice, WES predominates in clinical diagnostics for its balance of yield and feasibility, while WGS is preferred for research into population-level or non-Mendelian variation.

Applications in Research and Medicine

Diagnostic Uses in Rare Diseases

Whole exome sequencing (WES) has become a primary tool for diagnosing rare genetic diseases, particularly those suspected to be Mendelian in origin, by identifying pathogenic variants in protein-coding regions where approximately 85% of known disease-causing mutations reside. In clinical settings, WES is often applied to pediatric patients with undiagnosed developmental delays, intellectual disabilities, or congenital anomalies, enabling a molecular diagnosis in cases unresolved by traditional testing. A 2019 study of over 3,000 unrelated patients with suspected genetic disorders reported a diagnostic yield of 25% for WES, rising to 40% in trios (patient plus parents) due to improved variant filtering via inheritance patterns. This yield reflects the method's ability to detect de novo mutations, which account for up to 50% of cases in sporadic severe disorders like autism spectrum disorder or epileptic encephalopathies.

Real-world implementations, such as the UK's Deciphering Developmental Disorders (DDD) project, had sequenced over 13,000 trios by 2020, yielding diagnoses in 28% of previously undiagnosed cases and identifying novel gene-disease associations in 15%. Similarly, the Undiagnosed Diseases Network (UDN) in the United States, operational since 2013, integrates WES with phenotypic data, achieving a 35-40% solve rate for cases after extensive prior testing, often pinpointing variants in genes like SCN1A for Dravet syndrome or PIGA for congenital disorders of glycosylation. These successes stem from WES's cost-effectiveness (around $500-1,000 per exome as of 2023) compared to whole genome sequencing, and from its focus on interpretable coding variants amenable to ACMG guidelines for pathogenicity classification.

WES's diagnostic utility extends to adult-onset rare diseases, such as hereditary cardiomyopathies or ataxias, where reanalysis of prior sequencing data has increased yields by 10-20% over time due to accumulating variant databases like gnomAD, which by 2024 catalogs over 730,000 exomes for benign variant benchmarking. However, yield varies by disease category: it is highest (up to 50%) in neurodevelopmental disorders with high de novo mutation rates and lower (10-15%) in heterogeneous adult-onset phenotypes. Integration with RNA sequencing or functional assays enhances confirmation, as seen in a 2022 cohort where 11% of provisional WES diagnoses were refined via transcriptomics, underscoring the value of validating candidate causal variants. Despite these advances, negative WES results do not rule out non-coding or structural variants, prompting sequential or combined approaches in persistent cases.
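The gain from trio sequencing comes largely from simple inheritance logic: a variant present in the child but absent from both parents is a de novo candidate. The sketch below illustrates that filtering step for autosomal variants, with genotypes encoded as alternate-allele counts (0, 1, or 2). The class, function, category labels, and variant identifiers are all hypothetical, and real pipelines also weigh read depth, genotype quality, and sex chromosomes, none of which is modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrioGenotype:
    variant_id: str
    child: int    # alternate-allele count: 0, 1, or 2
    mother: int
    father: int


def classify_inheritance(g: TrioGenotype) -> str:
    """Assign a coarse inheritance category to an autosomal variant."""
    if g.child == 0:
        return "absent_in_child"
    if g.mother == 0 and g.father == 0:
        return "de_novo_candidate"
    if g.child == 2:
        # A homozygous child needs one alternate allele from each parent.
        if g.mother >= 1 and g.father >= 1:
            return "homozygous_inherited"
        return "mendelian_inconsistent"
    # Heterozygous child: impossible if both parents are homozygous alternate.
    if g.mother == 2 and g.father == 2:
        return "mendelian_inconsistent"
    return "heterozygous_inherited"


def de_novo_candidates(genotypes: list[TrioGenotype]) -> list[str]:
    """Return identifiers of variants seen in the child but in neither parent."""
    return [
        g.variant_id
        for g in genotypes
        if classify_inheritance(g) == "de_novo_candidate"
    ]


if __name__ == "__main__":
    trio_calls = [  # hypothetical variant identifiers
        TrioGenotype("var_1", child=1, mother=0, father=0),
        TrioGenotype("var_2", child=1, mother=1, father=0),
        TrioGenotype("var_3", child=2, mother=1, father=1),
        TrioGenotype("var_4", child=2, mother=1, father=0),
    ]
    for call in trio_calls:
        print(call.variant_id, classify_inheritance(call))
```

Variants flagged as Mendelian-inconsistent are worth a second look rather than discarding, since they can reflect genotyping error, a deletion on one allele, or uniparental disomy.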

Insights into Mendelian and Complex Disorders

Exome sequencing has profoundly impacted the discovery of causative variants in Mendelian disorders, which are typically monogenic conditions following predictable inheritance patterns such as autosomal dominant, recessive, or X-linked. In 2010, it was first applied successfully to identify biallelic mutations in the DHODH gene as the cause of Miller syndrome, a rare craniofacial disorder, marking the initial proof-of-principle for using this method in human disease gene discovery. By 2011, over 30 Mendelian disease genes had been identified through exome sequencing, accelerating the pace beyond traditional linkage-based approaches. Clinical studies have reported diagnostic yields of approximately 25% in cohorts of patients with suspected genetic disorders evaluated via trio exome sequencing, where parental samples help distinguish de novo from inherited variants. As of 2019, next-generation sequencing methods, predominantly exome sequencing, accounted for about 36% (1,268 out of 3,549) of all reported Mendelian disease genes, demonstrating its efficiency in pinpointing rare, high-penetrance coding variants that were previously elusive. Ongoing efforts, such as those by the Centers for Mendelian Genomics, continue to uncover genes for hundreds of rare conditions by expanding phenotype-gene associations through large-scale sequencing.

In complex disorders, characterized by polygenic architectures and environmental interactions, exome sequencing provides insights primarily into rare, protein-altering variants that contribute to disease risk, complementing genome-wide association studies focused on common variants. Analysis of exomes from 281,104 participants revealed that rare coding variants explain a substantial portion of heritability for several traits, with some variants conferring odds ratios exceeding 10 for specific conditions. For instance, in serious mental illnesses, whole exome sequencing in multiply affected families has highlighted an enrichment of ultra-rare, damaging variants in genes involved in synaptic function, suggesting a role for such mutations alongside polygenic risk scores. However, its contributions remain incremental compared to Mendelian applications, as complex traits often involve non-coding regulatory elements outside the exome's scope, limiting resolution of full causal mechanisms without integration with whole-genome data. In complex neurodevelopmental disorders, exome sequencing yields positive findings in 10-40% of cases, often revealing oligogenic contributions that inform recurrence risks.

These insights underscore exome sequencing's strength in detecting functionally interpretable variants (single-nucleotide changes, small indels, and copy number alterations in coding regions) but highlight the need for orthogonal validation, such as functional assays, to confirm pathogenicity amid challenges like incomplete penetrance. Annual discovery rates for Mendelian genes have reached around 300 via exome-based approaches, sustaining momentum in cataloging the estimated 4,000-8,000 such disorders while gradually refining polygenic models for common diseases.

Broader Genomic and Population Studies

The Genome Aggregation Database (gnomAD) aggregates exome sequencing data from over 730,000 individuals across diverse ancestries, enabling precise estimation of allele frequencies for coding variants and identification of population-specific patterns of genetic variation. This resource has revealed that loss-of-function variants in constrained genes occur at lower frequencies than expected under neutrality, reflecting purifying selection against deleterious coding mutations, with rates varying by ancestry due to differences in effective population size and demographic history. In large cohorts, exome data facilitates gene-burden analyses to quantify the contribution of rare coding variants to disease heritability, as demonstrated in studies of immune-mediated disorders where such variants explain a significant portion of risk in European-descent populations.

Exome sequencing has also been applied to dissect population structure and admixture, providing higher resolution for coding regions than SNP arrays in some contexts, particularly for rare variants that inform recent evolutionary history. In isolated populations, such as that of the island of Vis in Croatia, whole-exome sequencing uncovered elevated frequencies of homozygous loss-of-function variants attributable to founder effects and genetic drift, highlighting how reduced diversity amplifies the detectability of selection signals in coding sequences. These analyses underscore the utility of exome data in modeling population bottlenecks, where coding variants under selection can serve as markers of adaptive processes.

In evolutionary genetics, exome-wide scans have detected signatures of polygenic adaptation in coding regions, such as heightened selection on pigmentation-related genes in Arctic indigenous groups like the Nganasans, evidenced by an excess of derived alleles in functional exons compared to neutral expectations. Similarly, exome data from temperate plant populations have identified selective sweeps at loci under climatic pressure through reduced polymorphism in targeted exons. Such findings emphasize that while exome sequencing captures only protein-coding regions, it offers insights into functional evolution by prioritizing variants with direct phenotypic effects, though interpretations must account for incomplete coverage of regulatory elements.

Limitations and Criticisms

Technical and Coverage Shortcomings

Whole exome sequencing (WES) exhibits uneven coverage across targeted exons: sequence reads are distributed non-uniformly, leaving low-coverage regions that compromise variant calling accuracy. This variability arises from capture kit inefficiencies and sequencing biases, where certain genomic features such as GC-rich or repetitive sequences are underrepresented, resulting in effective coverage below the targeted 95% of coding regions in many datasets. Capture efficiency remains a persistent technical limitation, with platforms such as Agilent SureSelect yielding 42-58% of reads on target and Illumina TruSeq around 45-46%, necessitating higher sequencing depths to compensate for off-target reads and achieve adequate exon coverage.

Even modern kits, which cover over 97.5% of targets at 10x depth and 95% at 20x, still suffer from platform-specific biases that exacerbate undercoverage in medically relevant genes, such as those with high sequence homology or pseudogenes. The short-read technologies on which WES relies struggle to detect insertions/deletions (indels) and structural variants because of alignment ambiguities in complex regions like homopolymers, contributing to error rates and false negatives. Non-uniformity is further compounded by sample-specific factors, including DNA quality and library preparation artifacts, which can reduce callable regions by 10-20% in clinical applications. Together, these shortcomings limit the sensitivity of WES for rare variants and often require supplementary methods such as targeted resequencing for validation.
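
To make the link between on-target rate and required sequencing output concrete, the following sketch estimates how much raw sequence is needed for a given mean depth and computes the share of positions covered at a minimum depth. It is a simplified model under stated assumptions: the 30 Mb target size, the 100x goal, the function names, and the toy depth values are illustrative, and duplicates and coverage unevenness are ignored.

```python
def required_raw_bases(target_size_bp: int, mean_depth: float,
                       on_target_fraction: float) -> float:
    """Raw sequenced bases needed to reach a mean depth over the target.

    Simplified: ignores duplicate reads, trimming, and uneven coverage.
    """
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    return target_size_bp * mean_depth / on_target_fraction


def fraction_covered(depths, min_depth: int = 20) -> float:
    """Share of target positions with depth >= min_depth."""
    depths = list(depths)
    if not depths:
        raise ValueError("no positions supplied")
    return sum(d >= min_depth for d in depths) / len(depths)


TARGET_BP = 30_000_000  # assumed exome target size (illustrative)

for frac in (0.42, 0.58):  # on-target range quoted above for one platform
    gigabases = required_raw_bases(TARGET_BP, 100, frac) / 1e9
    print(f"on-target {frac:.0%}: ~{gigabases:.1f} Gb raw sequence for 100x mean depth")

# Toy per-base depths (made up) to show the breadth-of-coverage metric.
print(fraction_covered([5, 20, 35, 50]))
```

Under this model, halving the on-target fraction doubles the raw output needed for the same mean depth, which is the cost of off-target reads described above.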

Challenges in Variant Interpretation

Interpreting variants identified through whole exome sequencing (WES) presents substantial hurdles because of the high volume of detected alterations, often thousands per sample, most of which are benign common polymorphisms rather than disease-causing changes. Distinguishing pathogenic variants requires integrating multiple lines of evidence, including population allele frequencies, computational predictions of functional impact, and segregation patterns in families, yet these tools frequently yield inconclusive results. The American College of Medical Genetics and Genomics (ACMG) guidelines provide a framework for classifying variants as pathogenic, likely pathogenic, benign, likely benign, or of uncertain significance (VUS), but applying them remains subjective and resource-intensive.

A predominant issue is the prevalence of VUS: over 70% of unique variants in databases like ClinVar are classified as such, and the share has grown over time as genomic data expand without corresponding functional validation. In WES and genome sequencing contexts, VUS are reported in approximately 22.5% of cases, fewer than with multi-gene panels (32.6%), but such results are nondiagnostic and complicate clinical decision-making. Among tested individuals, 41% harbor at least one VUS and 31.7% receive only VUS results, often leading to retesting or delayed diagnoses as evidence accumulates; 10-15% of reclassified VUS shift to pathogenic or likely pathogenic.

Additional pitfalls include incomplete penetrance, where pathogenic variants do not consistently manifest phenotypes, and phenocopies, in which environmental or other non-genetic factors mimic hereditary patterns. Technical artifacts from variant calling, such as alignment errors in repetitive regions or pseudogenes, can generate false positives that evade initial filters, necessitating orthogonal validation such as Sanger sequencing. Rare variants in understudied populations lack robust frequency data, exacerbating misclassification risks, while the absence of comprehensive functional assays, due to ethical and logistical constraints, limits what can be concluded about missense or synonymous changes. These factors contribute to diagnostic uncertainty, with unsolved cases often attributable to interpretive complexity rather than sequencing failures.
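
The role of population allele frequency as one line of evidence can be sketched as a first-pass triage step. The example below is a minimal illustration, not a classification method: the `Variant` fields, the gene labels, and the 1% cut-off are all hypothetical choices, and a real workflow would combine many more evidence types.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    gene: str                       # placeholder gene label
    consequence: str                # e.g. "missense", "synonymous", "frameshift"
    population_af: Optional[float]  # None when no population frequency is available


MAX_AF = 0.01  # illustrative cut-off, not a guideline value


def needs_review(variant: Variant, max_af: float = MAX_AF) -> bool:
    """Keep variants that are rare or lack frequency data; drop common ones."""
    if variant.population_af is None:
        return True  # missing data is not evidence of rarity, so keep for review
    return variant.population_af < max_af


def triage(variants: List[Variant], max_af: float = MAX_AF) -> List[Variant]:
    """Return the subset of variants that passes the frequency filter."""
    return [v for v in variants if needs_review(v, max_af)]


calls = [
    Variant("GENE_A", "missense", 0.23),       # common polymorphism
    Variant("GENE_B", "frameshift", 0.00002),  # rare
    Variant("GENE_C", "missense", None),       # no frequency data
]
for v in triage(calls):
    print(v.gene, v.consequence, v.population_af)
```

Keeping variants without frequency data reflects the problem noted above for understudied populations: the filter cannot exclude them, so they remain for manual interpretation.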

Economic and Practical Barriers

The high cost of whole exome sequencing (WES) remains a primary economic barrier, with estimates ranging from $555 to $5,169 per sample and often exceeding $2,000 in clinical settings according to recent analyses. These figures encompass sequencing, library preparation, and basic analysis but exclude downstream bioinformatics and interpretation, which can add hundreds of dollars per sample depending on complexity. Compared with targeted panels, WES is more expensive because of its broader coverage, which limits its routine use despite superior diagnostic yields in undiagnosed cases.

Reimbursement challenges exacerbate access issues, as insurers frequently deny coverage for WES owing to perceived insufficient evidence of clinical utility and high financial burden, with denial rates reaching 47.5% in some U.S. cohorts. In resource-constrained regions, adoption is further hindered by a lack of public funding and infrastructure for scaling WES, confining it to research or affluent private sectors. Even in high-income settings, hospitals face economic disincentives, as upfront investments in sequencing platforms and data storage pay off only through high-volume applications, which many lack.

Practical barriers include the need for specialized computational infrastructure to handle the gigabytes of data generated per exome, which accumulate to terabytes across cohorts, requiring robust servers, software pipelines, and ongoing maintenance whose costs are not always included in sequencing quotes. Variant interpretation demands multidisciplinary teams of bioinformaticians, geneticists, and clinicians, whose scarcity delays implementation and increases operational overhead. Turnaround times for WES, typically 1-2 weeks in optimized labs but extending to months in standard clinical workflows, exceed those of targeted tests, posing challenges for time-sensitive diagnostics such as pediatric or prenatal cases. In underserved populations, these factors compound with logistical hurdles, such as sample transport and limited genetic counseling, underscoring systemic inequities in genomic testing deployment.

Ethical and Societal Implications

Informed consent for exome sequencing is complicated by the test's broad scope, which can reveal pathogenic variants, variants of uncertain significance (VUS), and unsolicited secondary findings beyond the primary diagnostic aim, often overwhelming patients with uncertainties that traditional genetic tests do not produce. Clinical geneticists typically address these elements (test purpose, potential outcomes, familial implications, and result interpretation) during sessions lasting 30–45 minutes, using layered approaches in which a concise initial overview is followed by tailored details, analogies, and encouragement of questions to enhance comprehension. However, challenges persist, including time constraints in mainstream clinical settings, patient misconceptions about the repercussions of testing, language barriers, and varying literacy levels, which genetic counselors mitigate by prioritizing collaborative decision-making, expectation management, and assessment of understanding over exhaustive technical explanations. In pediatric cases, consent must navigate additional ethical tensions, such as obtaining child assent where feasible and weighing long-term implications for relatives, while avoiding therapeutic misconceptions about guaranteed diagnoses.

Privacy risks arise because exome sequencing generates voluminous data covering the protein-coding regions, approximately 1–2% of the genome, which contain highly identifiable single nucleotide polymorphisms (SNPs) susceptible to re-identification attacks even in ostensibly anonymized datasets. Sharing exome data in databases amplifies these vulnerabilities, as demonstrated by 2024 analyses showing that linking genomic variants from public and private repositories can deanonymize individuals, potentially exposing sensitive information without consent. Under frameworks like HIPAA, genetic data may be disclosed to healthcare providers without explicit permission in certain scenarios, heightening the risk of misuse by insurers, employers, or forensic entities, as seen in cases leveraging consumer databases for identifications. Consent processes must therefore explicitly cover data sharing, secondary uses, and safeguards, with recommendations emphasizing patient-centric controls, such as opt-in sharing, to balance the benefits against these persistent threats.

Management of Incidental Findings

In exome sequencing, incidental findings, also termed secondary findings, consist of pathogenic or likely pathogenic variants in genes unrelated to the primary diagnostic indication but associated with significant health risks amenable to intervention. They arise from the broad coverage of coding regions, which can reveal risks for conditions such as hereditary cancers or cardiovascular disorders. Reporting prioritizes findings with established clinical utility to enable preventive measures, while avoiding disclosure of variants of uncertain significance that could cause undue anxiety without actionable benefit.

The American College of Medical Genetics and Genomics (ACMG) established foundational guidelines in 2013, recommending that clinical laboratories actively search for and report such variants in a minimum set of 56 genes linked to highly penetrant, treatable or preventable conditions, in categories including cancer susceptibility (e.g., TP53, PTEN) and cardiac arrhythmias (e.g., KCNQ1). Variants qualify for reporting if they meet criteria for known pathogenic or expected pathogenic status based on population data, functional evidence, and segregation studies; the gene list has since been updated (e.g., ACMG SF v3.0 and later versions) through ongoing curation by expert panels to incorporate new evidence on actionability. Under these recommendations, laboratories performing constitutional exome sequencing include this analysis and report findings to ordering clinicians regardless of patient age, though pre-test counseling must inform patients of the possibility, allowing opt-out by declining sequencing altogether. Post-disclosure management entails multidisciplinary coordination: genetic counseling to interpret variant pathogenicity and penetrance, confirmatory testing via targeted methods, and specialist referrals for surveillance or therapy, such as enhanced screening or implantation of cardioverter-defibrillators.

Actionable incidental findings occur in roughly 1-3% of exome sequences across large cohorts, with rates varying by ancestry and sequencing depth; for example, they were found in 3.02% of the eMERGE network's 21,915 participants, while unsolicited findings were reported in 0.58% of 16,482 pediatric cases. Non-ACMG-recommended incidental variants, which constitute the majority, are typically not pursued due to insufficient evidence of net benefit, though some programs offer opt-in for broader reporting, with uptake rates of about 50% among informed patients. Key challenges include clinician burden from interpreting results and coordinating long-term follow-up, potential psychological distress to patients from unanticipated risks, and resource strain in settings where providers report concerns over direct-to-patient reporting and sustained management needs. Ethical tensions persist between beneficence, in the sense of delivering potentially lifesaving information, and respect for autonomy, with critics arguing that mandatory reporting overrides patient preferences, while proponents emphasize a professional duty akin to other medical disclosures. Empirical data underscore low recontact rates for updated interpretations (under 10% in follow-up studies), highlighting the need for standardized protocols to balance disclosure with practical feasibility.
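
The reporting rule described above (a curated gene list combined with a pathogenic or likely pathogenic classification) can be sketched as a simple filter. This is an illustrative assumption-laden example: the gene set contains only the three genes named in the text rather than the actual ACMG list, `GENE_X` is a placeholder, and the function name and input format are hypothetical.

```python
REPORTABLE_CLASSES = {"pathogenic", "likely pathogenic"}

# Only the genes named in the text above; NOT the full ACMG secondary findings list.
EXAMPLE_GENES = frozenset({"TP53", "PTEN", "KCNQ1"})


def reportable_secondary_findings(calls, gene_list=EXAMPLE_GENES):
    """Return (gene, classification) pairs that meet both reporting conditions."""
    return [
        (gene, classification)
        for gene, classification in calls
        if gene in gene_list and classification.lower() in REPORTABLE_CLASSES
    ]


calls = [
    ("TP53", "Pathogenic"),               # listed gene, reportable class
    ("KCNQ1", "Uncertain significance"),  # listed gene, but a VUS
    ("GENE_X", "Likely pathogenic"),      # reportable class, gene not on the list
]
print(reportable_secondary_findings(calls))
```

Only the first call passes both conditions, which mirrors why variants of uncertain significance and variants outside the curated list are generally not returned.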

Debates in Prenatal and Pediatric Contexts

In prenatal exome sequencing, obtaining valid informed consent poses significant challenges due to the intricate nature of genomic data, the emotional distress associated with fetal anomalies, and the compressed timeline for decision-making, often within weeks of invasive testing. Parents may receive generic consent forms to mitigate information overload, yet debates persist on whether this suffices for understanding risks like variants of uncertain significance (VUS), which can comprise up to 20-30% of results and complicate reproductive choices such as termination in the absence of definitive pathogenicity. The return and management of findings further fuel controversy, including incidental discoveries like non-paternity (reported in 1-2% of cases) or carrier status for adult-onset disorders, raising questions about parental autonomy versus duties to the future child. Professional responsibilities extend to reanalysis over time and counseling on non-actionable results, with the American College of Medical Genetics and Genomics (ACMG) issuing "points to consider" rather than prescriptive guidelines, underscoring unresolved tensions between diagnostic potential and potential harm from uncertainty.

In pediatric contexts, exome sequencing's diagnostic yield of approximately 25-40% for congenital anomalies or developmental delay supports its endorsement by ACMG as a first- or second-tier test, yet ethical debates focus on unsolicited or secondary findings, which parents often request despite conflicts with the child's future autonomy and right not to know. These findings, actionable in about 1-4% of cases per ACMG criteria, can reveal risks irrelevant to childhood, creating burdens like parental anxiety or strained family dynamics, with limited empirical data on long-term impacts. Consent processes remain contentious, balancing parental authority with assent requirements for older children (typically from ages 7-12), amid concerns over misinterpretation of probabilistic results and the gap between extensive data and actionable interventions. Access inequities, driven by variability in insurance coverage and geographic shortages of expertise, amplify debates on equity, as lower socioeconomic status correlates with reduced uptake despite potential benefits.

Empirical Data and Statistics

Genomic Proportions and Variant Statistics

The exome, comprising the protein-coding portions of the genome, represents approximately 1-1.5% of the total genomic sequence, equivalent to roughly 30-45 million base pairs out of the approximately 3 billion base pairs in the haploid human genome. Despite this limited proportion, exonic regions harbor about 85% of known disease-associated genetic variants, as mutations altering protein sequences are disproportionately linked to Mendelian disorders. In whole exome sequencing (WES) of unrelated individuals of European ancestry, the median number of coding variants per individual totals around 18,400-20,000 single nucleotide variants (SNVs) and small insertions/deletions (indels), with approximately half being synonymous (not altering the amino acid sequence) and the remainder nonsynonymous. Nonsynonymous variants include missense changes (altering a single amino acid) and predicted loss-of-function (pLoF) variants such as nonsense mutations, frameshifts, or splice-site disruptions, which occur at medians of about 8,700 and 120 per individual, respectively. Across large cohorts like the UK Biobank's initial 49,960 exomes, these variants collectively catalog over 4 million unique coding positions, with rare pLoF alleles (allele frequency <0.01%) accounting for fewer than 1% of total exonic variation but enriched in disease-relevant genes.

| Variant Type | Median per Individual | Approximate Proportion of Total Coding Variants |
| --- | --- | --- |
| Synonymous | 9,584 | ~50% |
| Missense | 8,702 | ~47% |
| pLoF | 120 | ~1% |

Data derived from exome sequencing of 49,960 UK Biobank participants.

Synonymous variants predominate because they are largely selectively neutral, while missense and pLoF variants are subject to purifying selection and appear at lower population frequencies; for instance, pLoF alleles are depleted in essential genes compared to non-essential ones. Other estimates put the number of nonsynonymous variants per exome at 10,000-12,000, somewhat above the roughly 8,800 implied by the medians in the table; only a small fraction of these (typically <1%) are rare and potentially pathogenic, necessitating prioritization tools for clinical interpretation.
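
The approximate proportions in the table can be checked directly from the per-individual medians. The short sketch below does that arithmetic; it treats the three medians as if they summed to a per-individual total, which is a simplification, since medians of separate categories need not add up to the median of the total.

```python
# Median coding variants per individual, as listed in the table above.
medians = {"synonymous": 9_584, "missense": 8_702, "pLoF": 120}

total = sum(medians.values())
shares = {kind: count / total for kind, count in medians.items()}

for kind, count in medians.items():
    print(f"{kind:<11}{count:>6}  {shares[kind]:.1%}")
print(f"{'total':<11}{total:>6}")

# Missense plus pLoF gives the nonsynonymous count implied by these medians.
nonsynonymous = medians["missense"] + medians["pLoF"]
print("nonsynonymous:", nonsynonymous)
```

The computed shares (about 52%, 47%, and under 1%) are consistent with the rounded values in the table.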

Diagnostic Yield and Success Rates

The diagnostic yield of clinical exome sequencing (ES), defined as the proportion of cases in which a molecular diagnosis explains the patient's phenotype, varies by patient population and testing context but typically ranges from 25% to 40% in pediatric cohorts with suspected rare genetic disorders. A 2023 meta-analysis of ES in pediatric rare diseases reported an aggregated yield of approximately 37.8%, with higher rates in trio sequencing (proband plus parents) than in proband-only analysis. In a cohort of 868 children with neurodevelopmental disorders, the yield was 27% overall, rising to 34% for intellectual disability and 32% for epileptic encephalopathies. These figures reflect ES's strength in identifying single-nucleotide variants and small indels in coding regions, which account for a majority of Mendelian disease causes.

In adult patients, diagnostic yields are generally lower, often 10-20%, due to greater phenotypic heterogeneity, later onset, and confounding environmental factors. A 2025 study of adult rare disease referrals reported yields varying from 6.1% for certain neuromuscular indications to 42.9% for select metabolic disorders, with neurodevelopmental phenotypes yielding 13.3%. Prenatal ES yields are similarly modest, around 20-30%, limited by fetal tissue availability and incomplete penetrance data. Factors influencing yield include prior negative testing (yield is higher once common variants have been excluded), phenotypic specificity, and bioinformatics pipelines; for instance, reanalysis of unsolved cases can increase yield by 10-15% over time as databases expand.

Technical success rates for ES exceed 95% in most clinical labs, encompassing successful capture, sequencing coverage (typically >95% of the exome at 20x depth), and variant calling, though challenges arise in samples with low DNA quality and in regions with high GC content. Clinical utility extends beyond diagnosis, with 60-80% of positive cases informing management changes, such as targeted therapies or avoiding ineffective treatments. Comparative studies show ES yields exceeding those of chromosomal microarray in some pediatric settings (ES: 27.1% vs. CMA: 13.6% for short stature), but falling below those of whole-genome sequencing (WGS) by 5-10% because non-coding variants are missed. Ongoing improvements in variant interpretation databases continue to refine these metrics.

| Context | Diagnostic Yield Range | Key Reference |
| --- | --- | --- |
| Pediatric rare diseases (trio ES) | 30-40% | Meta-analysis, 2023 |
| Neurodevelopmental disorders | 25-35% | Cohort of 868 children, 2023 |
| Adult rare diseases | 10-20% | Indication-specific, 2025 |
| Epilepsy/encephalopathies | 30-43% | Specialized cohorts, 2024 |
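
Because a diagnostic yield is a proportion estimated from a finite cohort, it carries sampling uncertainty that depends on cohort size. The sketch below computes a yield with a Wilson score interval. The count of 234 diagnosed cases is a hypothetical figure chosen only so that the yield is close to the 27% reported for the cohort of 868 children; it is not taken from that study.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


diagnosed, cohort = 234, 868  # hypothetical count approximating a 27% yield
low, high = wilson_interval(diagnosed, cohort)
print(f"yield {diagnosed / cohort:.1%}, 95% interval {low:.1%} to {high:.1%}")
```

Intervals of this width are one reason yields from cohorts of a few hundred patients should be compared cautiously across studies.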

Comparative Efficacy Metrics

Exome sequencing (ES) typically achieves diagnostic yields of 25% to 40% in unselected cohorts of patients with suspected rare genetic disorders, with higher rates (up to 58%) in pediatric cases after negative conventional testing. In direct comparisons with whole-genome sequencing (WGS), ES demonstrates slightly lower but comparable efficacy for identifying coding variants, which predominate in Mendelian diseases; one modeling study in children with suspected genetic disorders reported yields of 58% for first-line ES versus 64% for WGS, attributing the difference to WGS's detection of non-coding and structural variants missed by ES. Meta-analyses indicate variability, with some showing ES yields exceeding those of WGS (40% versus 34%) due to cohort heterogeneity and ES's focus on high-confidence exonic regions, though WGS generally offers incremental benefits (5-10% additional diagnoses) at higher computational and interpretive costs.

Compared to targeted gene panels, ES provides superior breadth for heterogeneous or undiagnosed cases, though panels excel in cost and speed when a narrow differential is suspected. In primary immunodeficiencies, targeted panels yielded diagnoses in 56% of 780 patients, with sequential ES adding only 2% more (total 58%), while standalone ES in challenging subsets reached 45%; panels cost $1,700 per test with a turnaround of under 4 weeks, versus $2,500 and 3 months for ES. Broader reviews confirm ES's advantage (30-40% yield) over small panels (10-20%) in unselected cohorts, as panels limit detection to predefined genes, potentially missing novel or atypical variants. Versus chromosomal microarray analysis (CMA), ES detects sequence-level variants absent in CMA, yielding combined diagnostic rates of 20-30% in some conditions, where ES alone contributes 15-25% beyond CMA's copy number focus.

Cost-effectiveness analyses favor ES over WGS for initial broad screening, with ES testing at €1,800 ($1,958) versus WGS at €3,700 ($4,024), though WGS proves viable as a first-line test (€21,000-€30,000 incremental cost per additional diagnosis) in severely ill infants to expedite comprehensive results.

| Sequencing Method | Typical Diagnostic Yield | Key Contexts | Cost (per test) |
| --- | --- | --- | --- |
| Exome Sequencing (ES) | 25-58% | Pediatric rare diseases, post-negative testing | €1,800 ($1,958) |
| Whole-Genome Sequencing (WGS) | 34-64% | Comprehensive variant detection, including non-coding | €3,700 ($4,024) |
| Targeted Panels | 10-56% | Focused differentials (e.g., immunodeficiencies) | $1,700 |
| Conventional testing | 43% | Initial cytogenetic or single-gene tests | €450 ($489) |

These metrics underscore ES's balance of efficacy and practicality for exonic-focused diagnostics, with WGS reserved for unresolved cases requiring non-coding interrogation, though empirical gains from WGS remain modest relative to increased data volume and analysis demands.
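
The notion of incremental cost per additional diagnosis can be illustrated with a simple ratio of the cost difference to the yield difference between two strategies. The sketch below pairs the per-test costs (€1,800 and €3,700) with the 58% and 64% yields quoted above; treating these as belonging to the same comparison is an assumption, and the function name is hypothetical. It is a naive per-test calculation, so it should not be expected to reproduce the published €21,000-€30,000 range, which comes from a fuller model.

```python
def incremental_cost_per_diagnosis(cost_a: float, yield_a: float,
                                   cost_b: float, yield_b: float) -> float:
    """Extra cost of strategy B over A, divided by B's extra diagnoses per patient."""
    extra_yield = yield_b - yield_a
    if extra_yield <= 0:
        raise ValueError("strategy B must have a higher yield than strategy A")
    return (cost_b - cost_a) / extra_yield


# ES as strategy A, WGS as strategy B, using figures quoted in this section.
icer = incremental_cost_per_diagnosis(cost_a=1_800, yield_a=0.58,
                                      cost_b=3_700, yield_b=0.64)
print(f"about €{icer:,.0f} per additional diagnosis")
```

Because the yield difference sits in the denominator, small changes in the assumed gain from WGS shift the result substantially, which helps explain why cost-effectiveness conclusions differ between studies.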

References

  1. [1]
    Exome - National Human Genome Research Institute
    An exome is the sequence of all the exons in a genome, reflecting the protein-coding portion of a genome. In humans, the exome is about 1.5% of the genome.
  2. [2]
    Sequencing Your Genome: What Does It Mean? - PMC - NIH
    The human genome contains about 180,000 exons, which are collectively called an exome. An exome comprises about 1% of the human genome and hence is about 30 ...
  3. [3]
    Exome sequencing in genetic disease: recent advances and ... - NIH
    May 6, 2020 · Exome sequencing (ES) is the targeted sequencing of nearly every protein-coding region of the genome. Typically, either a hybridization capture ...
  4. [4]
    A three-year follow-up study evaluating clinical utility of exome ...
    Sep 10, 2020 · Exome sequencing (ES) has become one of the important diagnostic tools in clinical genetics with a reported diagnostic rate of 25–58%.
  5. [5]
    Genes to therapy: a comprehensive literature review of whole ...
    Nov 10, 2024 · Genes to therapy: a comprehensive literature review of whole-exome sequencing in neurology and neurosurgery ... coding regions of the ...
  6. [6]
    What are whole exome sequencing and whole genome sequencing?
    Jul 28, 2021 · Together, all the exons in a genome are known as the exome, and the method of sequencing them is known as whole exome sequencing. This ...Missing: biology | Show results with:biology
  7. [7]
    Not all exons are protein coding: Addressing a common misconception
    Apr 12, 2023 · ... coding regions rather than all transcribed regions. Previous article in issue; Next article in issue. Keywords. exons. introns. splicing. exome ...Missing: paper | Show results with:paper
  8. [8]
    What is exome sequencing? - Broad Institute
    Oct 15, 2010 · The “exome” consists of all the genome's exons, which are the coding portions of genes.
  9. [9]
    Exome - an overview | ScienceDirect Topics
    The exome represents the total protein coding sequence of all known genes, basically all the exons of the approximately 25,000 gene loci currently recognized.
  10. [10]
    Exome Sequencing: Current and Future Perspectives - PMC - NIH
    This review will focus on exome sequencing, a method that targets only a subset of the genome, often the protein coding portion, significantly reducing the ...
  11. [11]
    Finding the lost treasures in exome sequencing data - PMC - NIH
    Aug 22, 2013 · The exome represents approximately 1–1.5% of the human genome with approximately 50 million bp, but it accounts for over 85% of all mutations ...
  12. [12]
    Whole Genome vs Exome Sequencing - Illumina
    See how combining whole-genome and exome sequencing can yield important insights into variants related to autoimmune disorders such as lupus. Read interview ...
  13. [13]
    The human genome contains over a million autonomous exons - PMC
    There are ∼181,000 annotated internal exons within the ∼20,000 known human protein-coding genes; these constitute roughly 1% of the human genome. An even larger ...
  14. [14]
    Evolution and Functional Impact of Rare Coding Variation from ... - NIH
    By capturing and sequencing all protein-coding exons (i.e., the exome, which comprises ∼1 to 2% of the human genome), exome sequencing is a powerful approach ...<|separator|>
  15. [15]
    Not all exons are protein coding: Addressing a common misconception
    Apr 12, 2023 · We demonstrate that only a fraction of exonic sequences are protein coding and highlight the importance of non-coding exonic regions.What Is An Exon? · Figure 1 · The Non-Coding Exome Plays...
  16. [16]
    Large-scale whole-exome sequencing analyses identified protein ...
    Jul 15, 2024 · Compared to GWAS, whole-exome sequencing (WES) focused on protein-coding regions, potentially unmasking variants directly associated with IMDs.
  17. [17]
    The Nobel Prize in Physiology or Medicine 1993 - Press release
    This simple picture of gene structure completely changed when Richard J. Roberts and Phillip A. Sharp in 1977 independently discovered that genes could be ...
  18. [18]
    1977: Introns Discovered
    Apr 26, 2013 · Richard Roberts' and Phil Sharp's labs showed that eukaryotic genes contain many interruptions, called introns.
  19. [19]
    Discovery of RNA splicing and genes in pieces - PNAS
    The discovery of pre-mRNA splicing and the corollary that most genes of multicellular organisms are split into pieces, i.e., exons, separated by longer introns, ...
  20. [20]
    Finding the tail end: The discovery of RNA splicing - PMC - NIH
    Dec 23, 2019 · Sharp compares discovering RNA splicing to finding the Rosetta Stone. His revelation earned him the 1993 Nobel Prize in Physiology or Medicine, ...
  21. [21]
    Exome sequencing: the sweet spot before whole genomes - PMC
    Exome capture allows an unbiased investigation of the complete protein-coding regions in the genome. Researchers can use exome capture to focus on a critical ...
  22. [22]
    The evolution of next-generation sequencing technologies - PMC
    In 1953, Watson and Crick first presented the double helical structure of a DNA molecule. Their findings unearthed the desire to elucidate the exact composition ...
  23. [23]
    The Rise and Rise of Exome Sequencing | Public Health Genomics
    Nov 30, 2016 · Beginning in 2009, the advent of exome sequencing has contributed significantly towards new discoveries of heritable germline mutations and ...
  24. [24]
    Using Whole Exome Sequencing to Walk From Clinical Practice to ...
    Feb 6, 2013 · Ng et al, in 2009, completed the first proof-of-principle study demonstrating the feasibility of using exome sequencing to identify causal ...
  25. [25]
    Application of Whole Exome Sequencing to Identify Disease ... - NIH
    To further facilitate the efficiency of this approach, whole exome sequencing (WES) was first developed in 2009. Over the past three years, multiple groups ...
  26. [26]
    Exome sequencing: the expert view | Genome Biology | Full Text
    Sep 14, 2011 · Because exomes focus on exons, which include coding regions of genes, and because most high-penetrance (Mendelian or nearly so) variation is ...<|separator|>
  27. [27]
    Exome sequencing identifies the cause of a Mendelian disorder - PMC
    We demonstrate the first successful application of exome sequencing to discover the gene for a rare, Mendelian disorder of unknown cause, Miller syndrome.
  28. [28]
    Exome sequencing identifies the cause of a mendelian disorder
    We demonstrate the first successful application of exome sequencing to discover the gene for a rare mendelian disorder of unknown cause, Miller syndrome.
  29. [29]
    Whole-exome sequencing for finding de novo mutations in sporadic ...
    Dec 21, 2010 · Recent work has used a family-based approach and whole-exome sequencing to identify de novo mutations in sporadic cases of mental ...
  30. [30]
    Unlocking Mendelian disease using exome sequencing
    Sep 14, 2011 · Exome sequencing is revolutionizing Mendelian disease gene identification. This results in improved clinical diagnosis, more accurate genotype-phenotype ...
  31. [31]
    Exome sequencing explained: a practical guide to its clinical ...
    Dec 9, 2015 · This review provides a practical guide for clinicians and genomic informaticians on the clinical application of whole-exome sequencing.
  32. [32]
    Genome Sequencing for Diagnosing Rare Diseases
    Jun 5, 2024 · Genetic variants that cause rare disorders may remain elusive even after expansive testing, such as exome sequencing.
  33. [33]
    Overview of Next Generation Sequencing Technologies - PMC - NIH
    The purpose of this review is to provide a compendium of NGS methodologies and associated applications. Each brief discussion is followed by web links to the ...
  34. [34]
    Next-Generation Sequencing Technology: Current Trends and ... - NIH
    The basic principle for short-read sequencing involves sequencing by synthesis based on enrichment through hybridization, amplification, or fragmentation.
  35. [35]
    NGS for Beginners | Learn the basics of NGS - Illumina
    Next-generation sequencing involves four basic steps: extraction, library preparation, sequencing, and data analysis. Extraction. During extraction, nucleic ...NGS Workflow Steps · NGS vs Sanger Sequencing · Bioinformatics for Beginners
  36. [36]
    [PDF] Whole Exome Sequencing and Analysis
    Feb 21, 2024 · A1. Whole Exome Sequencing (WES) is an efficient strategy to selectively sequence the coding regions (exons) of a genome, typically human, ...
  37. [37]
    Genes to therapy: a comprehensive literature review of whole ...
    Nov 10, 2024 · Solution-based exome sequencing. In the solution-based WES method, DNA samples undergo fragmentation, generating manageable fragments.
  38. [38]
    WES vs WGS vs Custom Panels - Roche Sequencing Solutions
    Compared to WGS, WES greatly reduces sequencing costs by focusing on only ~2–5% of the genome. Thus, for researchers whose focus is protein-coding regions, WES ...
  39. [39]
    Comparison of Exome and Genome Sequencing Technologies for ...
    At higher sequencing depth (95x–160x), WES successfully captures 95% of the coding regions with a minimal coverage of 20x, compared with 98% for WGS at 87‐fold ...
  40. [40]
    Whole-genome sequencing is more powerful than whole-exome ...
    WGS is currently more expensive than WES, but its cost should decrease more rapidly than that of WES. We compared WES and WGS on six unrelated individuals. The ...
  41. [41]
    Panels, WES, or WGS: Which is Best for Rare Disease Diagnosis?
    Compared with WES, WGS generates extensive data, and the cost of storing and analyzing this data is two to three times higher than that of WES, although ...
  42. [42]
    Improved diagnostic yield compared with targeted gene sequencing ...
    Aug 3, 2017 · Emerging prospective data suggest that the early use of WES in diagnostic evaluations can result in cost savings and improved diagnostic yields, ...
  43. [43]
    Measuring coverage and accuracy of whole-exome sequencing in ...
    The accuracy of variant detection in coding regions is lower for whole-exome sequencing (WES) than whole-genome sequencing, even at equivalent coverage., ...<|control11|><|separator|>
  44. [44]
    gnomAD
    The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators, with the goal of aggregating and ...
  45. [45]
    Clinical Whole-Exome Sequencing for the Diagnosis of Mendelian ...
    Oct 2, 2013 · Whole-exome sequencing is a diagnostic approach for the identification of molecular defects in patients with suspected genetic disorders.
  46. [46]
    Mendelian Gene Discovery: Fast and Furious with No End in Sight
    Sep 5, 2019 · The impact has been rapid and profound. NGS-based approaches (primarily ES) have led to ∼36% (1,268/3,549) of all reported Mendelian gene ...
  47. [47]
    Centers for Mendelian Genomics uncovering the genomic basis of ...
    Aug 6, 2015 · Large DNA sequencing studies have enabled the CMGs to discover hundreds of individuals who "expand" a phenotype, or physical characteristics, of ...
  48. [48]
    Rare variant contribution to human disease in 281,104 UK Biobank ...
    Aug 10, 2021 · Exome sequencing has revolutionized our understanding of rare diseases, uncovering causal rare variants for hundreds of these disorders. However ...
  49. [49]
    Whole exome sequencing in dense families suggests genetic ...
    Dec 7, 2022 · Whole Exome Sequencing (WES) studies provide important insights into the genetic architecture of serious mental illness (SMI).
  50. [50]
    Exome sequencing and complex disease: practical aspects of rare ...
    Instead of characterizing function in model systems, exome sequencing potentially allows for evaluating the functional consequence of pathogenic mutations ...
  51. [51]
    The contribution of whole-exome sequencing to intellectual disability ...
    Whole-exome sequencing (WES) is useful for molecular diagnosis, family genetic counseling, and prognosis of intellectual disability (ID).
  52. [52]
    Whole exome sequencing - Insights - Mayo Clinic Labs
    Mayo Clinic Laboratories' WES test utilizes next-generation sequencing to investigate approximately 20,000 genes in patients with suspected hereditary disorders ...
  53. [53]
    NIH funds new effort to discover genetic causes of single-gene ...
    Jul 19, 2021 · Recently, researchers have been identifying about 300 Mendelian disease genes each year using a technique called whole-exome sequencing.
  54. [54]
    gnomAD v4.0
    Nov 1, 2023 · This release is nearly 5x larger than the combined v2/v3 releases and consists of two callsets: exome sequencing data from 730,947 individuals, ...
  55. [55]
    Whole-exome sequencing to analyze population structure ... - PNAS
    May 31, 2016 · We compared the information provided by whole-exome sequencing (WES) and genome-wide single-nucleotide variant arrays in terms of principal ...
  56. [56]
    Whole-exome sequencing in an isolated population from the ... - NIH
    Apr 6, 2016 · This work confirms the isolate status of Vis population by means of whole-exome sequence and reveals the pattern of loss-of-function mutations, ...
  57. [57]
    Exome Sequencing Provides Evidence of Polygenic Adaptation to a ...
    Sep 12, 2017 · In this study, we use whole-exome sequencing data from the Nganasans and Yakuts to infer the evolutionary history of these two indigenous ...
  58. [58]
    Whole-exome sequencing unravels how selection and gene flow ...
    Nov 17, 2020 · Looking for the needle in a downsized haystack: Whole-exome sequencing unravels how selection and gene flow have shaped climatic adaptation in ...
  59. [59]
    Novel metrics to measure coverage in whole exome sequencing ...
    Apr 13, 2017 · A major shortcoming of WES is uneven coverage of sequence reads over the exome targets contributing to many low coverage regions, which hinders ...
  60. [60]
    Novel metrics to measure coverage in whole exome sequencing ...
    Whole Exome Sequencing (WES) is a powerful clinical diagnostic tool for discovering the genetic basis of many diseases. A major shortcoming of WES is uneven ...
  61. [61]
    Utility and limitations of exome sequencing as a genetic diagnostic ...
    Jun 15, 2018 · Technical challenges were identified, including inadequate capture and coverage of HL genes. Additional considerations of ES include ...
  62. [62]
    Comparative evaluation of four exome enrichment solutions in 2024
    Jan 27, 2025 · All kits showed high target coverage, with 10x coverage exceeding 97.5% and 20x coverage above 95%.
  63. [63]
    Clinical exome sequencing—Mistakes and caveats - PMC
    Clinical exome sequencing mistakes include issues with data quality, high error rates, nonuniform coverage, and difficulty detecting insertions/deletions.
  64. [64]
    Navigating the Limitations of Whole Exome Sequencing in Complex ...
    WES has limitations in complex regions like homopolymers, where it can have issues with variant calling due to technical limitations and biases.
  65. [65]
    Mapinsights: deep exploration of quality issues and error profiles in ...
    Jun 28, 2023 · We applied Mapinsights on community standard open-source datasets and identified various quality issues including technical errors related to ...
  66. [66]
    Limitations of exome sequencing in detecting rare and undiagnosed ...
    Mar 19, 2020 · Exome sequencing may miss diagnoses; 33% of diagnoses in this study were not solved exclusively by ES, requiring additional testing.
  67. [67]
    Challenges in Medical Applications of Whole Exome/Genome ... - NIH
    Also relevant to medical sequencing is inadequate capture of 1,000 to 2,000 genes for variant detection, as the current WES offers adequate capture for ~ 80 to ...
  68. [68]
    Best practices for the interpretation and reporting of clinical whole ...
    Apr 8, 2022 · Genome test interpretation and reporting represents a challenge to laboratories seeking to implement, or maximize the diagnostic potential of, ...
  69. [69]
    Unsolved challenges of clinical whole-exome sequencing
    Aug 11, 2016 · We designed a systematic review of the literature to identify the most important challenges directly reported by technology users.
  70. [70]
    Systematic gaps in reporting variants of uncertain significance (VUS ...
    More than 70% of all unique variants in the ClinVar database are labeled as VUS, with the rate of VUS identification growing over time. VUS are nondiagnostic ...
  71. [71]
    GeneDx Announces Data Demonstrating that Whole Exome and ...
    Aug 21, 2023 · Variants of uncertain significance were reported less frequently on exome and genome sequencing (22.5%) than multi-gene panels (32.6%). The ...
  72. [72]
    Variants of Uncertain Significance in Hereditary Disease Genetic ...
    Oct 25, 2023 · Among all individuals tested, 692 227 (41.0%) had at least 1 VUS and 535 385 (31.7%) had only VUS results. The number of VUSs per individual ...
  73. [73]
    The challenge of genetic variants of uncertain clinical significance
    As new evidence becomes available, VUS may be re-classified. Current data suggest that 10 to 15% of re-classified VUS will be upgraded to likely pathogenic/ ...
  74. [74]
    Identifying Causal Variants in Mendelian Diseases: Common Pitfalls ...
    Incorrect variant annotation, incomplete penetrance, and phenocopies are common pitfalls in the identification of causal variants; Careful interpretation of ...
  75. [75]
    Best practices for variant calling in clinical sequencing
    Oct 26, 2020 · However, it should be noted that NGS data are prone to certain types of artifactual variant calls, many of which are related to errors in short- ...
  76. [76]
    Reducing INDEL calling errors in whole genome and exome ...
    Oct 28, 2014 · Common INDEL issues, such as realignment errors, errors near perfect repeat regions, and an incomplete reference genome have caused problems for ...
  77. [77]
    Bioinformatics of germline variant discovery for rare disease ...
    Jan 23, 2024 · The ongoing process of making long read sequencing available for clinical diagnostics can be expected to solve many of the problems mentioned in ...
  78. [78]
    The challenges in the interpretation of genetic variants detected by ...
    Oct 12, 2023 · A major challenge in applying WES and array CGH is the interpretation of VUS variants. Although segregation study is a crucial step to overcome ...
  79. [79]
    Are whole-exome and whole-genome sequencing approaches cost ...
    Cost estimates for WES range from $555 to $5,169 and WGS from $1,906 to $24,810, but the evidence base is limited to support their widespread use.
  80. [80]
    Pricing | Bioinformatics and Systems Biology Core
    Sample-based services and charges: ; Exome-seq. Up-to 12 samples: $100/sample; Additional samples: $75/sample. output ; WGS. Up-to 12 samples: $150/sample ...
  81. [81]
    Efficacy and economics of targeted panel versus whole exome ...
    While cost and reimbursement barriers continue to exist for both targeted panels and WES, these barriers are greater for WES, despite having a higher diagnostic ...
  82. [82]
    [PDF] Identifying Insurance Barriers in Obtaining Exome Sequencing in the ...
    Insurance barriers include payor denial due to financial burden and lack of data supporting clinical utility. 47.5% of patients were denied coverage.
  83. [83]
    Challenges and recommendations to increasing the use of exome ...
    Jan 13, 2023 · The widespread adoption of exome sequencing and whole genome sequencing in Brazil is limited by various factors: cost and lack of funding, reimbursement, ...
  84. [84]
    The Economics of Genomic Medicine: Why Hospitals Can't Afford to ...
    Oct 16, 2025 · Are whole-exome and whole-genome sequencing approaches cost-effective? A systematic review of the literature. Genetics in Medicine, 20(10) ...
  85. [85]
    Genomic Sequencing Procedure Cost and Value Models
    Whole Exome Sequencing (WES) model: modified default cost of WES to $2439, which is the average WES cost in the microcosting summary sheet; Microcost ...
  86. [86]
    Unsolved challenges of clinical whole-exome sequencing
    Aug 11, 2016 · WES results can sometimes take longer to obtain than more targeted tests, which may challenge their implementation in a clinically relevant ...
  87. [87]
    What's the cost of Whole Exome Sequencing (Only need raw data in ...
    Apr 24, 2025 · We do WES in about 1-2 weeks... And it costs around €999. It beats the 8 months normal clinical labs take in my country. It is expensive though.
  88. [88]
    Improving access to exome sequencing in a medically underserved ...
    The Texome Project is a 4-year study reducing barriers to genomic testing for underserved individuals, providing free exome sequencing and genetic evaluation.
  89. [89]
    Challenges to Informed Consent for Exome Sequencing
    Our results suggest that genetic counselors report intentions to prioritize individual patient needs when obtaining informed consent for exome sequencing.
  90. [90]
    Informed consent practices for exome sequencing: An interview ...
    Feb 11, 2022 · The use of clinical genomic sequencing gives rise to challenges regarding informed consent, as it can yield more, and more complex results.
  91. [91]
    Privacy and ethical challenges in next-generation sequencing - NIH
    NGS raises ethical concerns regarding privacy, informed consent, return of results, and the use of machine learning, due to the large amount of data generated.
  92. [92]
    'Anonymous' genetic databases vulnerable to privacy leaks - Nature
    Oct 14, 2024 · The ability to link private and public data sets could be putting research participants' private health information at risk.
  93. [93]
    ACMG Recommendations for Reporting of Incidental Findings ... - NIH
    Here, we provide the recommendations of the ACMG Working Group on Incidental Findings in Clinical Exome and Genome Sequencing (hereafter referred to as the ...
  94. [94]
  95. [95]
    Frequency of genomic incidental findings among 21,915 eMERGE ...
    Here we present frequencies and types of medically actionable incidental findings to support informed decision-making by patients, participants, and ...
  96. [96]
    Lessons learned from unsolicited findings in clinical exome ... - Nature
    Oct 25, 2021 · UFs were identified in 0.58% (95/16,482) of index patients, indicating that the overall frequency of UFs in clinical WES is low. Fewer UFs were ...
  97. [97]
    Overall frequency of Incidental Findings across the eMERGEIII-IF ...
    We identified 661 actionable findings unrelated to participant test indication, resulting in an overall IF rate of 3.02%.
  98. [98]
    Opt‐in for secondary findings as part of diagnostic whole‐exome ...
    Apr 6, 2023 · From the 3263 studied participants, 50.4% (n = 1643) opted in to receive the secondary findings. Of those who opted in, 45 (2.7%) had secondary ...
  99. [99]
    Challenges and practical solutions for managing secondary ...
    Nov 9, 2021 · This study aimed to explore primary care providers' challenges and potential solutions for managing secondary findings.
  100. [100]
    Return of non-ACMG recommended incidental genetic findings to ...
    Nov 21, 2022 · ACMG recommendations for reporting of incidental findings in clinical exome and genome sequencing. Genet Med. 2013;15(7):565–74. Article ...
  101. [101]
    Opening Pandora's box?: ethical issues in prenatal whole genome ...
    Ethical concerns include valid consent, managing information, professional responsibilities, priority setting, and duties towards the future child.
  102. [102]
    the complex social and ethical terrain of prenatal exome sequencing
    Nov 7, 2022 · Prenatal exome sequencing (PES) for the diagnosis ... Opening Pandora's box?: ethical issues in prenatal whole genome and exome sequencing.
  103. [103]
  104. [104]
    The Challenges of Performing Exome Sequencing in Structurally ...
    Oct 11, 2024 · When performing pES in structurally normal fetuses challenges lie not only in variant interpretation but also in clinical interpretation. Even ...
  105. [105]
    ACMG Practice Guidelines Exome and genome sequencing for ...
    An evidence-based clinical practice guideline for the use of exome and genome sequencing (ES/GS) in the care of pediatric patients with one or more congenital ...
  106. [106]
    The full spectrum of ethical issues in pediatric genome-wide ...
    Sep 6, 2021 · Genome-wide sequencing, as whole exome or whole genome sequencing (WGS/ WES), can be used to identify variations in a person's genetic code ...
  107. [107]
    Whole-exome sequencing in pediatrics: parents' considerations ...
    Jul 27, 2016 · Parents' preferences for unsolicited findings (UFs) from diagnostic whole-exome sequencing (WES) for their children remain largely unexplored.
  108. [108]
    The full spectrum of ethical issues in pediatric genome-wide ...
    Variations in insurance coverage, parental socioeconomic status, and geographic location are three factors that may limit access to germline genomic sequencing.
  109. [109]
    Whole Exome Sequencing - Center for Genetic Medicine
    For example, the human exome contains approximately 85 percent of all known disease-related variants.
  110. [110]
    Whole-Exome Sequencing | Element Biosciences
    Jan 9, 2024 · Comprising only 1% of the genome, exons nonetheless harbor 85% of the genetic mutations known to cause disease. By isolating and assessing only ...
  111. [111]
    Exome sequencing and characterization of 49,960 individuals in the ...
    Oct 21, 2020 · Here we describe the release of exome-sequence data for the first 49,960 study participants, revealing approximately 4 million coding variants ( ...
  112. [112]
    Whole Exome Sequencing - an overview | ScienceDirect Topics
    An average human exome contains between 20,000 and 25,000 coding variants, of which ~ 10,000 variants are nonsynonymous, and can potentially consist of the ...
  113. [113]
    A meta-analysis of diagnostic yield and clinical utility of genome and ...
    Purpose: To systematically evaluate the diagnostic yield and clinical utility of genome sequencing (GS) and exome sequencing (ES; genome-wide sequencing [GWS]) ...
  114. [114]
    Diagnostic Yield of Next-Generation Sequencing for Rare Pediatric ...
    Jun 9, 2025 · A large meta-analysis on pediatric patients with suspected genetic disorders reports success rates of 38.6% for WGS and 37.8% for WES [12]. WGS ...
  115. [115]
    Diagnostic yield of clinical exome sequencing in 868 children with ...
    A molecular diagnosis was reached in 27% of cases. Significantly higher yields of respectively 34% and 32% were observed in patients with intellectual ...
  116. [116]
    P168: Diagnostic yield of exome sequencing in adults with rare ...
    Diagnostic yield varied greatly by indication for testing (range 6.1-42.9%). The most common indications were neurodevelopmental (13.3%), neuromuscular (11.5%), ...
  117. [117]
    Diagnostic Yield of Exome Sequencing in a Diverse Pediatric and ...
    The diagnostic yield was significantly higher in pediatric compared to prenatal cases. Overall, out of 529 pediatric probands, 141 (26.7%) had a positive ( ...
  118. [118]
    Diagnostic efficacy and clinical utility of whole-exome sequencing in ...
    Nov 20, 2024 · Overall diagnostic yield and clinical utility achieved in our cohort were 43% and 76%, respectively. This diagnostic yield is consistent with ...
  119. [119]
    Meta-analysis of the diagnostic and clinical utility of exome and ...
    May 13, 2023 · This meta-analysis aims to compare the diagnostic and clinical utility of exome sequencing (ES) vs genome sequencing (GS) in pediatric and ...
  120. [120]
    Molecular Diagnostic Yield of Exome Sequencing and ...
    Sep 11, 2023 · This systematic review and meta-analysis provides high-level evidence supporting the diagnostic efficacy of ES and CMA in patients with short stature.
  121. [121]
    Diagnostic Yield of Genome Sequencing Versus Exome ... - PubMed
    Jun 16, 2025 · This study aims to systematically review and meta-analyze the diagnostic power of ES versus GS in pediatric populations with rare diseases.
  122. [122]
    Whole-Genome vs Whole-Exome Sequencing in Suspected Genetic ...
    Jan 26, 2024 · Whole-exome sequencing (WES) analyzes protein-coding sections of the genome, while whole-genome sequencing (WGS) analyzes both coding and ...
  123. [123]
    Molecular Diagnostic Yield of Exome Sequencing and ... - PubMed
    This systematic review and meta-analysis provides high-level evidence supporting the diagnostic efficacy of ES and CMA in patients with short stature.
  124. [124]
    Comparing the diagnostic and clinical utility of WGS and WES with ...
    Jul 17, 2023 · The pooled diagnostic utility showed that WES (0.40, 95% CI 0.34-0.45, I²=90%), was qualitatively greater than WGS (0.34, 95% CI 0.29-0.39, I²= ...
  125. [125]
    A Comparison of Whole Genome Sequencing to Multi-Gene Panel ...
    Whole exome and whole genome sequencing (WGS) are entering clinical use, posing questions about their incremental value compared with disease-specific multi- ...