Gene duplication

Gene duplication is a fundamental evolutionary process whereby a segment of DNA containing a gene is copied, resulting in two or more identical copies within the genome that can subsequently diverge through mutation. This provides organisms with additional genetic material, allowing one copy to retain its original function while the duplicate acquires novel roles, thereby contributing to evolutionary innovation and adaptation without disrupting essential processes.
Gene duplications occur through several distinct mechanisms: whole-genome duplication (WGD), where the entire set of chromosomes is replicated, often seen in plants and early vertebrates; tandem duplication, involving adjacent copies formed by unequal crossing over during meiosis; segmental duplication, which copies large chromosomal regions; and retroduplication, where reverse-transcribed mRNA creates intronless gene copies via retrotransposons. These processes vary in frequency across species. For instance, WGD events have shaped over 64% of genes in many genomes, while tandem duplications account for about 10% of genes in some species. WGD also gives rise to polyploidy, which is particularly prevalent in flowering plants.
Evolutionarily, gene duplication serves as a primary source of genetic innovation by enabling neofunctionalization, where the duplicate gains a new function, or subfunctionalization, where the original functions are partitioned between copies. This has driven key adaptations, such as enhanced stress resistance in plants and the diversification of gene families, with nearly all genes tracing back to ancient duplications. However, duplications can also contribute to disease when dosage imbalances occur; over 80% of human disease-associated genes are reported to have undergone duplication. Overall, the retention and divergence of duplicates, with a typical half-life of around 4 million years, underscore their role in long-term genome evolution across eukaryotes.

Fundamentals

Definition and Types

Gene duplication is a fundamental evolutionary process in which a segment of DNA containing a functional gene is copied within the genome, resulting in two or more identical or nearly identical copies of the original gene. This duplication creates genetic redundancy, allowing one copy to maintain the original function while the other may accumulate mutations without immediate deleterious effects. The process is widespread across eukaryotes and prokaryotes, contributing to genome expansion and functional innovation, as first systematically explored in Susumu Ohno's seminal work.
Gene duplications are classified into several types based on their genomic scale and mechanism of origin. Tandem duplications occur when copies are generated adjacent to each other on the same chromosome, often through errors in recombination, resulting in gene clusters. Dispersed duplications produce non-adjacent copies scattered across the genome, typically via transposition events such as retrotransposition or DNA-mediated movement. Segmental duplications involve larger blocks of DNA, encompassing multiple genes, duplicated within or between chromosomes. Whole-genome duplications (WGD), also known as polyploidy events, replicate the entire genome, producing multiple copies of all genes simultaneously; these are particularly common in plants but have occurred in vertebrate lineages as well.
At the molecular level, gene duplication immediately introduces functional redundancy: the duplicate copies share overlapping functions and are initially under relaxed purifying selection, because mutations in one copy are buffered by the other. This reduces selective pressure on the duplicates, permitting neutral or slightly deleterious changes to accumulate without disrupting function, though most duplicates are eventually lost or pseudogenized. Functional divergence, if it occurs, arises later through processes like neofunctionalization or subfunctionalization, but the initial phase is characterized by preserved sequence similarity and functional redundancy.
A classic example of the impact of whole-genome duplication is seen in the Hox gene clusters of vertebrates, where two rounds of WGD early in vertebrate evolution produced four clusters (HoxA-D) from a single ancestral cluster, enabling spatial patterning innovations in body plans such as paired appendages.

Historical Context

The concept of gene duplication emerged in the early 20th century through cytogenetic studies in plants, where polyploidy (whole-genome duplication) was recognized as a common mechanism contributing to speciation and variation. A Dutch botanist first described polyploid mutants in 1907, and by the 1910s, researchers like Albert F. Blakeslee and Øjvind Winge had identified polyploidy in various angiosperms, attributing it to chromosome doubling that amplified gene copies and facilitated evolutionary novelty. These observations laid foundational evidence for duplication events at the genomic scale, particularly in plants, where polyploidy was estimated to occur in up to 70% of species by mid-century.
In animals, early insights came from Drosophila research in the 1930s. Calvin B. Bridges demonstrated in 1936 that the Bar eye phenotype resulted from a tandem duplication of a chromosomal segment, providing the first direct evidence of segmental gene duplication and its phenotypic effects. This work hinted at duplication as a source of genetic variation, though it was viewed primarily as a cytological anomaly rather than an evolutionary driver.
By the 1960s, the discovery of multigene families further illuminated the prevalence of duplications. For instance, ribosomal DNA (rDNA) was identified as a tandemly repeated multigene family in Drosophila by Ritossa and Spiegelman in 1965, revealing hundreds of identical copies essential for ribosome production. Similar findings for other gene families, including the immunoglobulin genes, underscored that duplications generated families of related sequences, challenging the notion of genes as unique loci.
The modern view of gene duplication as a major evolutionary mechanism crystallized in 1970 with Susumu Ohno's seminal book Evolution by Gene Duplication, which argued that duplications provide raw material for innovation by freeing redundant copies from selective constraints, allowing divergence into new functions.
This perspective was integrated with Motoo Kimura's neutral theory of molecular evolution, proposed in 1968 and expanded in the 1970s, which posits that many duplications and subsequent mutations are selectively neutral and are fixed by genetic drift rather than adaptive pressure, thus explaining the abundance of pseudogenes and paralogs in genomes. Confirmation accelerated in the 1980s and 1990s with DNA sequencing technologies. For example, sequencing of the human beta-globin cluster in 1980 revealed ancient duplications underlying globin evolution, while the 1996 yeast genome sequence identified widespread paralogs from a whole-genome duplication event approximately 100 million years ago. These molecular data validated Ohno's hypothesis at scale, showing that duplications accounted for 15–20% of eukaryotic genes.
Early reception of these ideas was marked by debates over whether duplications primarily drive adaptive innovation or accumulate neutrally. Ohno's adaptive emphasis faced skepticism from neutralists, who argued that most fixed duplicates contribute little to fitness and are lost or silenced, as evidenced by high rates of pseudogenization in genomes. Proponents of adaptive models, however, highlighted cases like the Hox gene clusters, sequenced in the 1990s, where duplications correlated with morphological complexity. This tension persisted into the late 20th century, shaping models that balanced neutral drift with occasional positive selection in duplicate retention.

Mechanisms

Unequal Crossing Over

Unequal crossing over is a key mechanism of gene duplication that occurs during recombination, particularly in meiosis, when misaligned homologous chromosomes or sister chromatids exchange genetic material unevenly. This misalignment leads to one recombinant chromatid receiving an extra copy of a gene or segment, while the reciprocal product carries a deletion. The process is homology-dependent, relying on sequence similarity to initiate pairing, but errors in alignment result in non-allelic homologous recombination (NAHR), producing tandem duplications.
At the molecular level, repetitive sequences play a critical role in facilitating misalignment. Low-copy repeats (LCRs), which are paralogous segments longer than 1 kb with over 90% sequence identity, mediate NAHR by promoting ectopic pairing between non-allelic sites. Similarly, Alu elements, abundant short interspersed nuclear elements, can drive unequal exchanges because of their high copy number and sequence homology, often resulting in local duplications or larger copy-number variants. These events typically yield tandem arrays, where duplicated genes are arranged in direct orientation adjacent to the original copy, enhancing the potential for further evolutionary change. Segmental duplications, involving large (often >10 kb) non-tandem copies of chromosomal regions, can also arise via NAHR between dispersed LCRs, contributing to genomic architecture and disease susceptibility.
The frequency of unequal crossing over is elevated in genomic regions enriched with LCRs or Alu elements, as these repeats increase the likelihood of misalignment during meiosis. Such hotspots are common in gene clusters prone to instability, where even short stretches of homology (e.g., 25–39 bp of identity) can suffice for recombination. In human sperm, for instance, de novo duplications occur at rates around 10^{-5} per gamete, predominantly through intermolecular exchanges between homologous chromosomes.
A prominent example is the duplication within the human alpha-globin gene cluster on chromosome 16, where unequal crossing over between the alpha2 (HBA2) and alpha1 (HBA1) genes generates anti-3.7 kb duplications, resulting in three alpha-globin genes (the ααα configuration). This event, driven by the Z-box homology blocks flanking the genes, is the reciprocal of a common deletion and underscores how such mechanisms contribute to both normal variation and disease predisposition.
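
The reciprocal outcome described above (one product gains a copy, the other loses one) can be illustrated with a toy model. This is only a sketch: it treats whole genes as list elements and ignores the hybrid genes that real exchanges within homology blocks produce. The function name `unequal_crossover` and the flanking labels are hypothetical.

```python
def unequal_crossover(chromatid_a, chromatid_b, break_a, break_b):
    """Swap the tails of two chromatids at possibly different break positions.

    Equal break positions model a normal, aligned crossover; unequal
    positions model a misaligned exchange.
    """
    recombinant_1 = chromatid_a[:break_a] + chromatid_b[break_b:]
    recombinant_2 = chromatid_b[:break_b] + chromatid_a[break_a:]
    return recombinant_1, recombinant_2


def count_alpha_genes(chromatid):
    """Count elements whose label marks them as alpha-globin genes."""
    return sum(1 for element in chromatid if element.startswith("HBA"))


# Both chromatids start with the normal two-gene arrangement.
normal = ["upstream", "HBA2", "HBA1", "downstream"]

# Misalignment: chromatid A breaks after HBA1, chromatid B breaks after HBA2.
duplicated, deleted = unequal_crossover(normal, normal, break_a=3, break_b=2)

print(duplicated, count_alpha_genes(duplicated))
print(deleted, count_alpha_genes(deleted))
```

With these break positions, one recombinant carries three alpha-globin genes and the other carries one, mirroring the duplication and its reciprocal deletion.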

Replication-Based Errors

Errors during DNA replication represent a primary mechanism for generating small-scale gene duplications, particularly those involving short tandem repeats (STRs). In this process, known as replication slippage or slipped-strand mispairing, the DNA polymerase temporarily dissociates from the template strand within repetitive sequences, leading to misalignment upon re-annealing. This slippage can cause the polymerase to skip forward (resulting in deletions) or repeat a segment (producing duplications) of the template, typically affecting sequences under 1 kb in length. Such errors are exacerbated in regions rich in STRs, where the repetitive nature facilitates strand dissociation during the S phase of the cell cycle.
At the molecular level, replication fork stalling plays a central role, often triggered by non-B DNA structures such as hairpins or triplexes formed in repetitive or AT-rich sequences during strand unwinding. The fork stalling and template switching (FoSTeS) model describes how a stalled fork disengages, with the nascent strand invading a secondary template via microhomology (typically 2–15 bp), resuming synthesis and incorporating duplicated material. This mechanism accounts for both simple tandem duplications and more complex rearrangements with junctional microhomologies or insertions. Non-B structures, like stable hairpins in CAG/CTG repeats, impede polymerase progression, increasing the likelihood of template switching and duplication events. Low-fidelity DNA polymerases, particularly those with weak proofreading, further promote slippage by stabilizing misaligned intermediates during synthesis.
These errors are more frequent for microduplications under 1 kb, occurring at elevated rates in regions of genomic instability, such as fragile sites or late-replicating domains. Replication timing influences susceptibility, with late-replicating regions exhibiting higher rates due to prolonged exposure to endogenous stresses and reduced repair efficiency.
Experimental inhibition of replication (e.g., via aphidicolin) generates non-recurrent copy number variants (CNVs), including duplications, at frequencies mimicking spontaneous events, with breakpoints often showing microhomologies consistent with FoSTeS. Small tandem duplications of 15–300 bp are observed in up to 25% of certain alleles, underscoring their prevalence in genomic variation.
A representative example is the expansion of CAG trinucleotide repeats in the HTT gene, associated with Huntington's disease. Slippage during replication of these repeats leads to duplication of the triplet units, with hairpin formation on the nascent strand promoting further iterations and expansions beyond 36 repeats, resulting in toxic protein aggregates. This process highlights how replication errors in STRs can drive pathological duplications while contributing to evolutionary variation in repeat copy number.
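
To make the repeat-expansion example concrete, the sketch below counts the units in the longest uninterrupted CAG tract of a sequence and mimics backward slippage by appending extra units to the first tract. It is a toy model: the sequence is made up, the helper names are hypothetical, and the only figure taken from the text is the 36-repeat mark.

```python
import re

EXPANSION_MARK = 36  # repeat count cited in the text; tracts beyond it count as expanded


def longest_cag_run(sequence):
    """Return the number of CAG units in the longest uninterrupted CAG tract."""
    runs = re.findall(r"(?:CAG)+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)


def add_slipped_units(sequence, extra_units):
    """Toy slippage: append extra CAG units to the end of the first CAG tract."""
    match = re.search(r"(?:CAG)+", sequence.upper())
    if match is None:
        return sequence  # no tract to expand
    return sequence[:match.end()] + "CAG" * extra_units + sequence[match.end():]


# Made-up sequence with a 20-unit tract between arbitrary flanks.
allele = "ATG" + "CAG" * 20 + "CCGCCA"
expanded = add_slipped_units(allele, extra_units=17)

print(longest_cag_run(allele), longest_cag_run(allele) > EXPANSION_MARK)
print(longest_cag_run(expanded), longest_cag_run(expanded) > EXPANSION_MARK)
```

Real genotyping has to deal with interrupted tracts and sequencing errors, which this sketch deliberately ignores.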

Transposition Events

Transposition events contribute to gene duplication through retrotransposition, a process in which mature mRNA transcripts are reverse-transcribed into complementary DNA (cDNA) and inserted at new, essentially random genomic locations, generating retrogene copies of the original gene. This RNA-mediated mechanism differs from direct DNA duplication by relying on an intermediary transcript, often using the enzymatic machinery of endogenous retroelements to carry out the insertion.
At the molecular level, long interspersed nuclear element-1 (LINE-1 or L1) retrotransposons play a central role by providing the reverse transcriptase enzyme, which converts the mRNA into cDNA via target-primed reverse transcription. The resulting retrogenes typically lack introns, because the source mRNA has already been spliced, and they often insert without their original promoters or regulatory elements. They retain a poly(A) tail at the 3' end but may be transcriptionally silent at first unless new regulatory sequences are acquired nearby. These characteristics distinguish retrogenes from intron-containing duplicates formed by other mechanisms.
Retrotransposition is particularly prevalent in mammalian genomes, where LINE-1 activity has driven a significant portion of processed pseudogene formation, accounting for about 70% of non-functional duplicates. In the human genome, estimates indicate approximately 8,000 to 17,000 retrocopies, many of which originated from primate lineage expansions around 40–50 million years ago. This abundance underscores the role of retrotransposition in genomic plasticity, though most retrogenes become pseudogenes, with a subset evolving new functions after fixation. A notable example of retrotransposition's impact on gene family expansion involves the PGAM family, where functional retrocopies like PGAM5 have arisen and acquired new roles in cellular processes.

Chromosomal Alterations

Chromosomal alterations represent a major mechanism for generating gene duplications on a large scale, primarily through aneuploidy and polyploidy, which result in the gain or multiplication of entire chromosomes or genomes, thereby creating multiple copies of numerous genes simultaneously. Aneuploidy involves the abnormal gain or loss of one or more chromosomes, leading to an imbalance in gene dosage in which affected cells possess extra or fewer copies of the genes on those chromosomes. This often arises from nondisjunction, the failure of homologous chromosomes or sister chromatids to separate properly during meiosis or mitosis, which disrupts normal chromosome segregation and produces gametes or daughter cells with altered chromosome numbers. In contrast, polyploidy entails the duplication of the entire genome, instantly doubling or multiplying gene copies across all chromosomes. It can occur through mechanisms such as hybridization between species (leading to allopolyploidy) or endoreduplication, where cells undergo repeated DNA replication without mitosis or cytokinesis. These alterations extend beyond single-gene events, affecting vast genomic regions and providing raw material for evolutionary innovation.
Aneuploidy is typically transient in most organisms due to its disruptive effects on cellular function, but it can become fixed in certain lineages, contributing to gene copy variation. Polyploidy, however, is far more stable and prevalent, particularly in plants, where it serves as a key driver of speciation and diversification. Recent estimates suggest that polyploidy accompanies approximately 15% of speciation events in angiosperms, though older studies proposed higher figures of 30–80%. In animals, polyploidy and related aneuploid events are rarer owing to challenges in reproduction and development, yet they have played pivotal roles in major evolutionary transitions, such as in vertebrates. For instance, two rounds of whole-genome duplication (2R) occurred in the ancestral vertebrate lineage approximately 500–600 million years ago, followed by a third round (3R) in teleost fish, which expanded gene families essential for complex traits like the nervous and immune systems.
These events underscore how chromosomal alterations can facilitate rapid genomic reconfiguration without relying on incremental small-scale duplications.

Evolutionary Implications

Duplication Rates

Gene duplication rates are typically estimated through phylogenetic analyses that reconstruct the divergence times of paralogous gene pairs using molecular clocks calibrated against known evolutionary timelines. These methods use the number of synonymous substitutions per synonymous site (Ks) between duplicates to infer when duplications occurred, providing a framework to quantify both ongoing small-scale events and episodic bursts from whole-genome duplications (WGDs).
In eukaryotes, the average duplication rate is approximately 0.01 events per gene per million years, based on genomic surveys of species including humans, nematodes, and fruit flies. This rate reflects primarily tandem and segmental duplications, with estimates varying by lineage; for instance, rates in vertebrates range from 0.0005 to 0.004 duplications per gene per million years when focusing on recent events. In the human genome, duplicated genes constitute about 8–20% of the total gene content, underscoring the cumulative impact of these events over evolutionary time. Plants exhibit generally higher effective duplication rates, often exceeding 0.01 per gene per million years when polyploidy-driven WGDs are included; these are far more prevalent in plants than in animals and can double the gene complement instantaneously. For example, many plant lineages show elevated retention of duplicates, with half-lives of 17–25 million years compared to 3–7 million years in animals, due to these polyploid events.
Several factors influence these rates across taxa. Larger genome sizes correlate with higher duplication frequencies, as expanded non-coding regions facilitate segmental duplications and transposon-mediated events. Recombination hotspots, where unequal crossing over is more likely, also elevate local duplication rates by promoting non-allelic homologous recombination. Selection modulates net rates by favoring retention of duplicates under dosage constraints or with novel functions while purging redundant copies; purifying selection is stronger in essential genes, leading to faster loss of disrupted copies.
Variation is evident across taxa. For instance, teleost fishes display accelerated duplication dynamics following their ancient WGD event approximately 300–450 million years ago, resulting in higher proportions of paralogs (up to 20–30% in some species) and elevated tandem duplication rates compared to other vertebrates. This burst contributed to the diversification of teleosts, which comprise over half of all vertebrate species.
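
The Ks-based dating mentioned above is commonly done with the relation T = Ks / (2r), where r is the synonymous substitution rate per site per year and the factor 2 accounts for both copies accumulating changes independently. The sketch below assumes a strict molecular clock; the rate value used is a placeholder for illustration, not a figure from this article.

```python
def duplication_age_years(ks, rate_per_site_per_year):
    """Estimate the age of a duplication from synonymous divergence.

    Assumes a constant substitution rate (strict clock) and that both
    copies have diverged independently since the duplication: T = Ks / (2r).
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    return ks / (2.0 * rate_per_site_per_year)


# Hypothetical inputs: Ks = 0.1 and an assumed rate of 2.5e-9 per site per year.
age_years = duplication_age_years(ks=0.1, rate_per_site_per_year=2.5e-9)
print(f"{age_years / 1e6:.1f} million years")
```

Under these assumed inputs the pair dates to roughly 20 million years. Estimates from large Ks values become unreliable as synonymous sites saturate.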

Neofunctionalization

Neofunctionalization refers to the evolutionary process whereby, after gene duplication, one paralog acquires a novel function, such as a new enzymatic activity or a distinct expression pattern, while the other copy preserves the original ancestral role. This divergence enables the innovation of new traits without disrupting established functions, contributing to adaptive evolution across lineages. The concept builds on the initial redundancy created by duplication, which provides a genetic buffer for mutational experimentation.
At the molecular level, neofunctionalization arises from relaxed purifying selection on the duplicate copy, allowing neutral or slightly deleterious mutations to accumulate until beneficial ones confer selective advantages. These adaptive changes often involve alterations in regulatory regions, leading to novel spatiotemporal expression, or structural modifications like domain shuffling that enable new interactions or catalytic properties. For instance, mutations in promoter sequences can shift expression to new tissues, while domain shuffling might repurpose binding sites for different substrates. Such mechanisms have been observed in enzymes, where duplicated copies develop enhanced specificity or catalyze entirely new reactions.
Evidence for neofunctionalization emerges from comparative genomics, which reveals paralogous genes with specialized roles that diverged after duplication. A prominent example is the globin gene family in vertebrates, where ancient duplications led to paralogs like the alpha and beta hemoglobins adapting distinct functions in oxygen transport and storage across developmental stages and tissues, such as fetal versus adult forms. Similarly, in insects, the bithorax complex demonstrates neofunctionalization through gene duplicates that acquired unique regulatory roles in body patterning. These cases highlight how paralogs evolve non-overlapping functions, supported by sequence divergence and functional assays.
Theoretical models underpin neofunctionalization, with Susumu Ohno's foundational framework proposing that gene duplication supplies the raw material for evolutionary novelty by freeing one copy from selective constraints. Ohno emphasized that this redundancy fosters innovation, as seen in genome expansions. Quantitative models extend this by estimating the probability of fixation for advantageous mutations in duplicates under positive selection, which is approximately 2s (where s is the selection coefficient), compared with 1/(2N) for a new neutral mutation in a diploid population of size N. This difference influences the likelihood of permanent divergence. These probabilistic approaches, informed by population genetics, predict higher neofunctionalization rates in large populations with strong selective pressures.
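
The 2s approximation can be checked numerically against Kimura's diffusion formula for the fixation probability of a new mutation. The sketch below assumes a diploid population whose effective size equals its census size N and a mutation starting as a single copy (frequency 1/(2N)); the parameter values are illustrative only.

```python
import math


def fixation_probability(s, population_size):
    """Kimura's diffusion approximation for a single new mutation.

    Assumes a diploid population with effective size equal to census size N
    and an initial frequency of 1/(2N). For s = 0 the neutral limit 1/(2N)
    is returned directly to avoid dividing zero by zero.
    """
    if s == 0:
        return 1.0 / (2 * population_size)
    numerator = 1.0 - math.exp(-2.0 * s)
    denominator = 1.0 - math.exp(-4.0 * population_size * s)
    return numerator / denominator


n = 10_000
for s in (0.0, 0.001, 0.01):
    p = fixation_probability(s, n)
    print(f"s={s}: P(fix)={p:.6f}, 2s={2 * s:.6f}, neutral={1 / (2 * n):.6f}")
```

For small positive s and large N the result stays close to 2s, while the neutral case falls back to 1/(2N), which is why even weak positive selection greatly raises the odds that a diverged duplicate is fixed.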

Subfunctionalization and Dosage Effects

Subfunctionalization occurs when duplicated genes partition the ancestral gene's functions between the copies, thereby reducing redundancy and promoting the retention of both paralogs. This process typically involves complementary degenerative mutations that eliminate subsets of the original regulatory elements or protein domains in each duplicate, leading to a division of labor such as tissue-specific expression or specialized biochemical roles. For instance, one copy may retain expression in certain tissues while the other takes over in different ones, so that the combined functions match the pre-duplication state. This mechanism was formalized in the duplication-degeneration-complementation (DDC) model, which posits that neutral mutations in cis-regulatory sequences, like promoters, can stochastically partition ancestral expression patterns, making both copies essential for viability.
At the molecular level, subfunctionalization often arises through mutations affecting promoters, enhancers, or splicing sites, which alter expression timing, location, or isoform production without creating novel functions. Changes in alternative splicing can further drive this by fixing different splice variants in each paralog, preserving the ancestral function while distributing subroles. In the cytochrome P450 (CYP) family, involved in liver detoxification, duplicates have subfunctionalized to specialize in metabolizing distinct substrates, with one paralog targeting specific xenobiotics while another handles endogenous compounds, enhancing adaptive responses to environmental toxins. This partitioning contrasts with neofunctionalization, where duplicates acquire entirely new functions, but both can contribute to long-term gene retention.
Dosage effects refer to the selective pressures maintaining balanced copy numbers in duplicated genes, particularly those encoding stoichiometric components of protein complexes, where imbalances disrupt macromolecular assembly or cellular homeostasis.
Histone genes exemplify this: following duplication, yeast histone paralogs are retained to preserve precise stoichiometry, with strong purifying selection against dosage imbalances and mechanisms like gene conversion minimizing divergence. Such balance is critical because excess or deficient gene products can impair complex formation; for instance, overexpressed histones in yeast can trigger chromosome segregation errors. In metazoans, dosage imbalances from segmental duplications or aneuploidy often lead to developmental disorders or cancer predisposition, as seen in conditions like Down syndrome, where extra copies of dosage-sensitive genes perturb stoichiometric networks.

Gene Loss and Redundancy

Following gene duplication, one common evolutionary outcome is the loss of one or both copies, often through the accumulation of deleterious mutations that render the gene non-functional, transforming it into a pseudogene. This process typically begins shortly after duplication, as redundant copies experience relaxed purifying selection, allowing slightly deleterious mutations (such as frameshifts, premature stop codons, or promoter disruptions) to accumulate and fix via genetic drift. In many cases, the redundant copy decays neutrally until it is completely silenced or deleted from the genome, contributing to the observation that the vast majority of duplicate genes are lost within a few million years. Estimates suggest that 50–80% of duplicates may be lost or pseudogenized within this timeframe, depending on the lineage and duplication mechanism; after some whole-genome duplication events, 30–65% of duplicates were eliminated over tens of millions of years.
Redundancy resolution after duplication is heavily influenced by dosage sensitivity: genes involved in balanced complexes or stoichiometric interactions are less likely to lose a copy because of the disruptive effects of altered gene dosage. The gene balance hypothesis posits that such dosage-sensitive genes, including many transcription factors and signaling components, experience stronger selection against imbalance, leading to higher retention rates of duplicates compared to dosage-insensitive genes. For instance, essential genes (those whose loss is lethal) are disproportionately retained as duplicates, as their loss would compromise critical functions without the buffering effect of redundancy. This selective pressure helps maintain genomic stability by preserving copies that mitigate dosage perturbations, while non-essential, dosage-tolerant genes are more prone to rapid elimination.
Evolutionary patterns of gene loss vary with population size and ecological context, with faster pseudogenization observed in smaller populations where genetic drift accelerates the fixation of disabling mutations. In neutral models of decay, the rate of pseudogene formation approximates the genomic deleterious mutation rate (typically 10^{-5} to 10^{-6} per site per generation), but at small effective population sizes (e.g., Ne < 10^6), drift dominates, shortening the half-life of duplicates to as little as 1–5 million years on average across eukaryotes. A notable example is the pseudogenization of mammalian olfactory receptor genes, where rapid expansions via duplication were followed by extensive losses (up to 50% pseudogenes in humans), likely due to relaxed selection in species with diminished reliance on olfaction, such as primates. These patterns underscore how gene loss streamlines genomes by removing redundant or non-adaptive sequences, reducing metabolic costs and mutational targets while adapting to niche-specific pressures.
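
The half-life figures quoted in this article translate directly into expected retention if duplicate loss is modeled as simple exponential decay. That constant-rate model is an assumption made for this sketch; real loss rates vary with gene function, dosage sensitivity, and time since duplication.

```python
import math


def fraction_retained(elapsed_my, half_life_my):
    """Expected fraction of duplicates still present after elapsed_my million
    years, assuming loss at a constant rate (exponential decay)."""
    return 0.5 ** (elapsed_my / half_life_my)


def loss_rate_per_my(half_life_my):
    """Per-million-year loss rate implied by a given half-life."""
    return math.log(2) / half_life_my


# Half-lives of 4 and 20 million years chosen to fall within ranges cited in the text.
for half_life in (4.0, 20.0):
    kept = fraction_retained(elapsed_my=20.0, half_life_my=half_life)
    print(f"half-life {half_life} My: {kept:.3f} retained after 20 My")
```

Comparing the two half-lives shows how strongly long-term retention depends on the loss rate: after 20 million years, about 3% of duplicates remain with a 4-million-year half-life, against half with a 20-million-year half-life.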

Detection Methods

Computational Identification

Computational identification of gene duplications relies on analyzing genome sequence data to detect paralogous genes (copies arising within the same lineage) through in silico algorithms that assess sequence similarity, genomic context, and evolutionary relationships. Key criteria include high sequence similarity, typically requiring greater than 30–50% amino acid identity over substantial portions of the protein length (e.g., >70–90% coverage), to infer homology; synteny breaks, where conserved gene order is disrupted, indicating duplication events; and paralog clustering, which groups genes into families based on shared ancestry. Tools like BLAST (Basic Local Alignment Search Tool) are foundational for initial local alignments, scanning genomes for similar sequences with e-value thresholds to filter spurious matches.
Methods for detection encompass whole-genome alignments to pinpoint segmental duplicates, where tools such as MCScanX identify collinear blocks of homologous genes (requiring at least five pairs with minimal gaps) to reveal duplicated segments often spanning tens to hundreds of kilobases. For ancient duplications, phylogenetic tree reconciliation integrates gene trees, built from multiple sequence alignments using substitution models like WAG or HKY, with species trees to infer duplication nodes by detecting inconsistencies such as excess terminal branches. These approaches enable timing of events relative to speciation, distinguishing within-species paralogs from between-species orthologs.
Challenges in these methods include accurately distinguishing paralogs (duplication-derived) from orthologs (speciation-derived), which often requires multi-species comparisons to resolve ambiguous topologies, and handling assembly errors in repetitive regions that can artifactually inflate duplication counts or misalign segments. False positives from fragmented assemblies, particularly in low-coverage genomes, necessitate filtering steps like reciprocal best hits or synteny validation.
A prominent example is Ensembl's paralogy predictions, which employ a pipeline inspired by the TreeFam methodology: genes are clustered via BLAST-based similarity (e.g., e-value < 1e-5), followed by multiple alignments and phylogenetic tree construction with TreeBeST for reconciliation, identifying duplications across vertebrate genomes with high precision for families like the Hox genes.
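
The similarity criteria above can be sketched as a filter over alignment hits followed by a reciprocal-best-hit check. This is an illustrative sketch only: the `Hit` record, the gene names, and the scores are made up, and the thresholds (30% identity, 70% coverage, e-value below 1e-5) are simply the lower bounds quoted in the text rather than the settings of any particular tool.

```python
from collections import namedtuple

# Hypothetical record for one within-genome protein alignment hit.
Hit = namedtuple("Hit", "query subject identity coverage evalue bitscore")


def filter_hits(hits, min_identity=30.0, min_coverage=70.0, max_evalue=1e-5):
    """Drop self-hits and hits that fail the identity, coverage, or e-value cutoffs."""
    return [
        h for h in hits
        if h.query != h.subject
        and h.identity >= min_identity
        and h.coverage >= min_coverage
        and h.evalue < max_evalue
    ]


def reciprocal_best_pairs(hits):
    """Return candidate paralog pairs whose members are each other's best hit."""
    best = {}
    for h in filter_hits(hits):
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    pairs = set()
    for query, hit in best.items():
        back = best.get(hit.subject)
        if back is not None and back.subject == query:
            pairs.add(tuple(sorted((query, hit.subject))))
    return pairs


toy_hits = [
    Hit("geneA", "geneA", 100.0, 100.0, 0.0, 700.0),  # self-hit, ignored
    Hit("geneA", "geneB", 62.0, 95.0, 1e-80, 310.0),
    Hit("geneB", "geneA", 62.0, 94.0, 1e-79, 305.0),
    Hit("geneC", "geneA", 25.0, 40.0, 1e-3, 45.0),    # fails every cutoff
]
print(reciprocal_best_pairs(toy_hits))
```

A pair flagged this way is only a candidate. As the text notes, separating paralogs from orthologs and timing the duplication still requires synteny evidence or tree reconciliation.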

Array-Based Techniques

Array-based techniques, particularly array comparative genomic hybridization (array CGH) microarrays, enable the detection of gene duplications by identifying copy number variations (CNVs) across the genome. In array CGH, genomic DNA from a test sample is labeled with one fluorophore (e.g., Cy3), while reference DNA is labeled with another (e.g., Cy5), and both are hybridized to an array of immobilized DNA probes, such as bacterial artificial chromosome (BAC) clones or oligonucleotides. The ratio of fluorescence intensities for each probe reflects relative copy number differences. Specifically, a log2-transformed ratio, log2(test/reference), greater than 0 indicates a copy number gain, including duplications, with values around 0.58 (log2 of 3/2) corresponding to a single-copy gain in a diploid genome. This method was pioneered in the late 1990s to achieve higher resolution than traditional cytogenetic methods for analyzing DNA copy number alterations.
Resolution has improved significantly with successive array designs. Early BAC-based arrays offered megabase (Mb)-scale resolution due to larger probe sizes (100–200 kb), suitable for detecting large segmental duplications but limited for smaller events. Subsequent oligonucleotide and single nucleotide polymorphism (SNP) arrays improved this to the kilobase (kb) scale, with probe densities enabling detection of CNVs as small as 1–10 kb, particularly effective for recent duplications not obscured by sequence divergence. These advances allow array CGH to identify both germline and somatic duplications, though it detects only unbalanced changes and may miss low-level mosaicism below 20–30% cellular prevalence.
In population genetics, array CGH has been instrumental in mapping CNV landscapes, revealing widespread gene duplications that contribute to human genetic diversity, as seen in studies profiling hundreds of individuals. In disease diagnostics, it aids in identifying pathogenic duplications associated with developmental disorders, congenital anomalies, and cancers, often as a first-line test replacing karyotyping due to its genome-wide coverage.
However, a key limitation is its inability to readily distinguish tandem duplications (adjacent copies) from dispersed ones (non-adjacent), as it reports net copy number without structural context, necessitating orthogonal methods like fluorescence in situ hybridization (FISH) for clarification. A notable example from the 2000s involved array CGH in the Human Genome Project era, where BAC-based platforms identified thousands of segmental duplications and associated CNVs, contributing to assemblies like hg17 and hg18 by highlighting duplication hotspots prone to genomic instability. For instance, high-density aCGH experiments targeted these regions, uncovering over 1,400 copy-number variable regions (CNVRs) in diverse human populations and linking duplications to evolutionary expansions in gene families such as those involved in immunity.
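
The log2-ratio arithmetic used in array CGH is easy to verify. The sketch below computes the expected ratio for a given copy number against a diploid reference and applies simple calling thresholds; the ±0.3 cutoffs are hypothetical values chosen for illustration, since real pipelines segment noisy probe data before calling.

```python
import math


def expected_log2_ratio(test_copies, reference_copies=2):
    """Ideal log2(test/reference) for a given copy number, assuming a pure,
    non-mosaic sample and perfectly proportional hybridization."""
    return math.log2(test_copies / reference_copies)


def call_probe(log2_ratio, gain_threshold=0.3, loss_threshold=-0.3):
    """Classify one probe using hypothetical fixed thresholds."""
    if log2_ratio >= gain_threshold:
        return "gain"
    if log2_ratio <= loss_threshold:
        return "loss"
    return "neutral"


for copies in (1, 2, 3, 4):
    ratio = expected_log2_ratio(copies)
    print(f"{copies} copies: log2 ratio {ratio:+.2f} -> {call_probe(ratio)}")
```

A heterozygous duplication (three copies) gives about +0.58 and a heterozygous deletion gives -1. In mosaic samples the observed values shrink toward zero, which is why low-level mosaicism can be missed.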

Sequencing Approaches

Next-generation sequencing (NGS) technologies have revolutionized the detection of gene duplications by enabling high-throughput analysis of copy number variations (CNVs) and structural variants (SVs) at base-pair resolution. Read-depth analysis, a primary method in NGS, quantifies duplication events by measuring the normalized coverage of sequencing reads across genomic regions, where increased read depth indicates copy number gains. Paired-end mapping complements this by identifying SVs, including duplications, through discrepancies in the expected distance or orientation between read pairs, which signal insertions or rearrangements. These approaches build on earlier array-based techniques for CNV detection but offer superior resolution for mapping duplication breakpoints.

Long-read sequencing technologies, such as Pacific Biosciences' single-molecule real-time (SMRT) sequencing and Oxford Nanopore Technologies (ONT) sequencing, address limitations of short-read NGS by producing reads spanning tens to hundreds of kilobases, effectively resolving complex gene duplications within repetitive genomic contexts. These methods excel at assembling segmental duplications (low-copy repeats with high sequence identity) by spanning homologous regions that short reads often collapse or misalign. For instance, polyploid phasing algorithms applied to long-read data have enabled the de novo assembly of duplicated loci, distinguishing alleles in heterozygous duplications.

In the 2020s, advances in long-read sequencing have significantly improved the resolution of segmental duplications with greater than 95% sequence identity, with complete telomere-to-telomere assemblies revealing previously hidden duplication structures in the human genome. These improvements stem from enhanced base-calling accuracy and hybrid assembly pipelines integrating short- and long-read data, achieving near-perfect reconstruction of duplicated regions that were intractable in earlier drafts.
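As a rough illustration of the paired-end mapping idea, the sketch below classifies a read pair from a standard forward/reverse short-read library by mate orientation and spacing. The `ReadPair` type, the category labels, and the insert-size bounds of 200-600 bp are assumptions made for this example; real callers estimate the insert-size distribution from the data and require clusters of supporting pairs. Outward-facing ("everted") pairs are the classic signature of a tandem duplication.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Leftmost mapped coordinate and strand of each mate (hypothetical structure)."""
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair: ReadPair, min_insert: int = 200, max_insert: int = 600) -> str:
    """Assign a coarse structural-variant signature to one read pair."""
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"
    # Order the mates by genomic position so orientation can be read left to right.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == right_strand:
        return "inversion-like"            # both mates on the same strand
    if left_strand == "-" and right_strand == "+":
        return "tandem-duplication-like"   # everted (outward-facing) pair
    span = right_pos - left_pos            # approximate insert size, ignoring read length
    if span > max_insert:
        return "deletion-like"
    if span < min_insert:
        return "insertion-like"
    return "concordant"


if __name__ == "__main__":
    print(classify_pair(ReadPair("chr1", 1000, "+", "chr1", 1400, "-")))
    print(classify_pair(ReadPair("chr1", 1000, "-", "chr1", 5000, "+")))
```

A single discordant pair is weak evidence on its own; the labels end in "-like" to reflect that these are signatures to be confirmed, not calls.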
Integration of sequencing with CRISPR-Cas9 enrichment has further advanced validation: targeted capture of duplicated loci followed by long-read sequencing confirms structural variants and resolves causal alleles in complex regions.

Despite this progress, challenges persist, particularly with short-read sequencing in repetitive regions, where high sequence similarity leads to mapping ambiguities and false positives in duplication calls. Quantification errors in read-depth analysis are also common because of biases from GC content or mappability, which can under- or overestimate copy numbers in duplicated segments. Long-read technologies mitigate some of these issues but can have higher per-base error rates, so computational polishing is needed for accurate duplication annotation.

Hi-C sequencing provides a complementary three-dimensional view for duplication detection by capturing chromatin interactions, revealing spatial proximity between duplicated loci that may indicate functional or evolutionary relationships.

Recent pangenome studies from 2023 to 2025 have used these sequencing approaches to uncover hidden duplications across diverse human populations, with graph-based pangenomes identifying novel SVs in non-reference alleles that short-read methods missed. For example, the Human Pangenome Reference Consortium's 2023 assembly highlighted population-specific gene duplications through long-read integration, improving our understanding of structural variation diversity. The 2025 Data Release 2 further expanded the pangenome with additional phased diploid assemblies from diverse ancestries, improving the identification of population-specific gene duplications and structural variants.
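Read-depth analysis, as described in this section, can be sketched in a few lines of Python. The example assumes per-window mean depths have already been computed, normalizes them to the genome-wide median, and scales by ploidy; the function names and the 0.5-copy margin are illustrative assumptions. It deliberately omits the GC-content and mappability corrections that real tools apply, which is one source of the quantification errors noted above.

```python
from statistics import median
from typing import List


def estimate_copy_number(window_depths: List[float], ploidy: int = 2) -> List[float]:
    """Estimate copy number per window as ploidy * depth / median depth."""
    baseline = median(window_depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    return [ploidy * depth / baseline for depth in window_depths]


def call_gains(window_depths: List[float],
               ploidy: int = 2,
               min_extra_copies: float = 0.5) -> List[int]:
    """Return indices of windows whose estimated copy number exceeds ploidy by a margin."""
    copy_numbers = estimate_copy_number(window_depths, ploidy)
    return [i for i, cn in enumerate(copy_numbers) if cn >= ploidy + min_extra_copies]


if __name__ == "__main__":
    # Eight windows at roughly 30x coverage, two of them at roughly 45x (about 3 copies).
    depths = [30, 31, 29, 30, 45, 46, 30, 28]
    print(call_gains(depths))
```

Using the median as the baseline assumes that most of the genome is at the normal copy number, which holds for typical germline samples but may not hold for highly rearranged tumor genomes.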

Nomenclature and Annotation

Naming Conventions

Gene duplication results in paralogous genes that require standardized nomenclature to support consistent scientific communication and database integration. The HUGO Gene Nomenclature Committee (HGNC) establishes these conventions for human genes, ensuring unique symbols that reflect evolutionary relationships without implying unverified functions. For paralogs arising from duplication, HGNC assigns a shared root symbol followed by distinguishing suffixes, typically Arabic numerals (e.g., 1, 2) or letters (e.g., A, B), based on sequence similarity, chromosomal location, or inferred function. Gene families, often expanded by duplications, use root symbols such as CYP for the cytochrome P450 superfamily, with the remainder of the symbol (as in CYP2D6) identifying the specific member. Pseudogenes, which are non-functional duplicates, receive a "P" suffix, as in CYP2D7P, to denote their inactivated status. These rules prioritize stability, with updates only for newly resolved duplications or to correct ambiguities, overseen by HGNC in collaboration with international experts.

Naming principles emphasize brevity and specificity: chromosomal location informs symbols for genes of unknown function (e.g., location-based identifiers), while sequence homology or functional clues guide family assignments. Challenges arise with ancient duplications, where extensive sequence divergence makes paralog identification and orthology assignment ambiguous and complicates consistent labeling across species. The HGNC mitigates this through rigorous review, but entrenched provisional names (e.g., FAM for "family with sequence similarity") can persist until better evidence emerges.

A prominent example is the HOX gene clusters, products of ancient whole-genome duplications, where paralogs are named by cluster (e.g., HOXA, HOXB) and positional numeral (e.g., HOXA1, HOXB1), reflecting their collinear arrangement and shared homeobox domain. This system highlights duplication events while avoiding functional speculation.
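The HOX naming scheme lends itself to a small parsing example. The sketch below groups human HOX symbols by their positional numeral, so that members of the same paralog group across clusters end up together. The regular expression and function name are assumptions for illustration; the pattern covers only the four human clusters (A to D) and is not a general HGNC symbol parser.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Cluster letter (A-D) followed by the positional numeral, e.g. HOXA1 or HOXD13.
HOX_PATTERN = re.compile(r"^HOX([A-D])(\d+)$")


def group_hox_paralogs(symbols: Iterable[str]) -> Dict[int, List[str]]:
    """Group HOX gene symbols by paralog group number; non-HOX symbols are ignored."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for symbol in symbols:
        match = HOX_PATTERN.match(symbol)
        if match:
            groups[int(match.group(2))].append(symbol)
    # Return a plain dict ordered by paralog group, with members sorted by cluster.
    return {number: sorted(members) for number, members in sorted(groups.items())}


if __name__ == "__main__":
    print(group_hox_paralogs(["HOXA1", "HOXB1", "HOXD13", "HOXA13", "CYP2D6"]))
```

Grouping by numeral rather than by cluster letter reflects that genes sharing a number (such as HOXA1 and HOXB1) are the paralogs derived from cluster duplication.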

Database Resources

Several key databases serve as essential repositories for gene duplication data, enabling researchers to access annotated genomic regions, evolutionary histories, and comparative analyses across species. These resources integrate high-throughput sequencing data to facilitate the study of duplication events, their ages, and functional implications, while providing tools for visualization and programmatic access.

Ensembl's Compara database offers comprehensive paralog trees derived from gene orthology and paralogy predictions, where paralogues are identified as genes sharing a most recent common ancestor via duplication events. These trees annotate duplication ages through reconciliation with species trees, distinguishing recent from ancient duplications, and include synteny viewers for visualizing conserved genomic blocks affected by duplications. The platform supports API access for querying homology data and has incorporated 2020s sequencing advancements, such as long-read assemblies, in its latest releases, including Ensembl 115 (September 2025) with expanded vertebrate and invertebrate genome coverage.

The UCSC Genome Browser provides dedicated tracks for segmental duplications, displaying putative duplicated regions with color-coded levels of support based on sequence similarity and alignment evidence (data from 2013, last updated 2014 for GRCh38/hg38). This resource aids in identifying low-copy repeats and tandem duplicates within human and other mammalian genomes. While the browser integrates recent assemblies like GRCh38.p14 (2023), the specific segmental duplication track has not been updated; for refined boundaries from newer data, such as the Telomere-to-Telomere (T2T) Consortium's CHM13 assembly (2022), users may employ custom tracks or external resources.
For plant-specific analyses, Phytozome hosts comparative genomics data across hundreds of Archaeplastida species, using tools like InParanoid-DIAMOND to cluster paralogous gene families and detect duplication-driven expansions. It features synteny browsers via JBrowse and BioMart for cross-species queries, with post-2020 updates including over 149 new genomes (up to October 2025, e.g., Nicotiana benthamiana v1.0) and improved homology alignments from long-read sequencing. As of Phytozome v14 (2025), it incorporates pangenome datasets such as BrachyPan (54 Brachypodium distachyon lines) and CowpeaPan (8 Vigna unguiculata genomes) to enhance duplication detection in diverse accessions.

DupMasker is a specialized annotation tool for segmental duplications, particularly in primates, employing a library of consensus duplicon sequences (based on 2008 data) to mask and annotate duplicated regions with metrics like percent divergence and alignment scores. Integrated with RepeatMasker, it outputs GFF-formatted results for downstream analysis and supports modern search engines like RMBlast. For analyses with recent primate assemblies, supplementation with updated repeat libraries is recommended.

OrthoDB complements these by cataloging orthologs and paralogs across eukaryotes and prokaryotes, using hierarchical orthology inference to distinguish duplication-derived paralogs from speciation-derived orthologs. This enables cross-species comparisons of gene family evolution, with tools for phyloprofiling duplication patterns in diverse taxa. The latest version, OrthoDB v12.2 (updated 2024), covers 5,952 eukaryotic species with expanded gene loci coordinates and CDS data.

| Database | Key Features for Gene Duplication | Primary Organisms | Access Methods |
| --- | --- | --- | --- |
| Ensembl Compara | Paralog trees, duplication age annotation, synteny viewers | Vertebrates, invertebrates | Web interface, API, BioMart |
| UCSC Genome Browser | Segmental duplication tracks with similarity levels (2013 data, updated 2014) | Mammals (e.g., human) | Interactive browser, custom tracks |
| Phytozome | Paralogy clustering, synteny via JBrowse, pangenome datasets | Plants (Archaeplastida) | BioMart, genome browsers |
| DupMasker | Duplicon annotation, divergence metrics (2008 library) | Primates | Command-line tool, GFF output |
| OrthoDB | Ortholog-paralog distinction, phyloprofiles (v12.2, 2024) | Eukaryotes, prokaryotes | Web search, downloads |

Recent advancements in pangenomics have addressed gaps in non-model organisms by enabling the inclusion of diverse accessions in databases like Ensembl and Phytozome. Integrations as of 2025 (e.g., Ensembl's ongoing pangenome projects and Phytozome's BrachyPan) are improving duplication detection in species that lack a single reference genome.

Pathological and Applied Aspects

Role in Disease Amplification

Gene duplications can contribute to disease pathogenesis through mechanisms that alter gene dosage, particularly oncogene amplification and copy number variations (CNVs). In cancers, oncogene amplification often arises through errors such as unequal crossing over, leading to increased copy numbers of proto-oncogenes such as MYC on chromosome 8q24. This process results in extrachromosomal DNA elements or intrachromosomal homogeneously staining regions that drive uncontrolled cell proliferation. Similarly, CNVs involving gene duplications are implicated in neurodevelopmental disorders, where dosage imbalances disrupt neural development; for instance, duplications in regions like 16p11.2 or 22q11.2 are associated with neurodevelopmental and neuropsychiatric conditions through effects on synaptic function and neuronal connectivity.

In oncology, amplified gene duplications elevate oncoprotein levels, promoting hallmarks of cancer such as sustained proliferation and evasion of apoptosis. A prominent example is HER2 (ERBB2) amplification on chromosome 17q12, observed in approximately 15-20% of breast cancers, which enhances signaling through the PI3K/AKT and MAPK pathways to accelerate tumor growth. This amplification is therapeutically targeted by trastuzumab, a monoclonal antibody that binds the extracellular domain of HER2, inhibiting dimerization and downstream signaling while recruiting immune effectors for antibody-dependent cellular cytotoxicity. Such targeted therapies have improved survival rates, with trastuzumab-based regimens reducing recurrence risk by up to 50% in HER2-positive cases.

Beyond cancer, gene duplications underlie several genetic disorders by perturbing protein stoichiometry in cellular processes. Charcot-Marie-Tooth disease type 1A (CMT1A), the most common inherited neuropathy, results from a 1.4 Mb duplication on chromosome 17p12 encompassing the PMP22 gene, leading to 1.5- to 2-fold overexpression of peripheral myelin protein 22. This excess disrupts Schwann cell myelination of peripheral nerves, causing progressive muscle weakness and sensory loss with onset typically in the first or second decade of life. The duplication accounts for 70-80% of CMT1 cases and arises de novo in about 25% of patients.

From an evolutionary perspective, gene duplications that initially provided adaptive advantages, such as expanded dosage for immune or metabolic functions, can predispose modern humans to disease when dysregulated. Dosage-sensitive genes, which are intolerant to copy number changes, are enriched in genomic regions prone to recurrent duplications, linking ancient duplication events to contemporary disorders like congenital anomalies and cancers. This evolutionary legacy highlights how duplicated genes, while fostering innovation, also create vulnerabilities that are exploited in pathological contexts.

Recent advances in 2024 and 2025 have used liquid biopsies to detect gene amplifications in circulating tumor DNA (ctDNA), enabling non-invasive monitoring of disease progression and therapy response. Studies demonstrate that ultrasensitive next-generation sequencing of ctDNA can identify amplifications like MYC or HER2 with >95% specificity in advanced cancers, correlating with tumor burden and the emergence of resistance. For example, a 2025 multicenter trial showed that ctDNA-based detection of amplifications achieved accuracy comparable to tissue biopsies, facilitating personalized adjustments to targeted therapies. These findings underscore the role of liquid biopsies in enabling earlier intervention for duplication-driven malignancies.

Applications in Biotechnology

Gene duplication has been harnessed in biotechnology through synthetic techniques that create diverse gene libraries. CRISPR-Cas9 systems enable targeted synthetic duplications by inducing double-strand breaks whose repair can copy specific genomic segments for library construction. This approach is particularly useful when multiplexed variants generate libraries of duplicated regulatory elements or coding sequences that are screened for enhanced functions, such as improved variants.

Directed evolution uses gene duplicates as scaffolds to accelerate the development of novel proteins with desired properties. By introducing duplicate copies of a gene into a host organism, followed by random mutagenesis and selection, researchers can evolve one copy while preserving the original function, mimicking natural neofunctionalization. A notable example involves β-propeller protein scaffolds built through multiple gene duplications and rearrangements, which provide a stable framework for evolution toward new catalytic activities. This method has been refined to include computational design of interface evolution between duplicated domains, yielding proteins with large gains in binding affinity or specificity.

In metabolic engineering, intentional gene duplications enhance biofuel production by increasing enzymatic flux through key pathways. Duplicating genes involved in ethanol metabolism boosts tolerance and yield under industrial conditions, as seen in strains engineered for second-generation bioethanol from lignocellulosic feedstocks. Similarly, polyploid breeding induces whole-genome duplications to improve crop traits, leading to larger fruits, higher yields, and stress resistance; for example, tetraploid varieties of some crops exhibit enhanced vigor and nutrient content compared to diploids. These polyploids arise from colchicine-induced chromosome doubling, which facilitates the fixation of beneficial alleles across multiple copies.
Recent advances from 2023 to 2025 have focused on multiplexed and larger-scale duplication techniques. Amplification editing, a CRISPR-based method, enables precise, programmable duplication of endogenous genes at up to chromosomal scale for higher expression levels in mammalian cells, reportedly with minimal off-target effects. This has been applied to boost yields of therapeutic proteins, such as monoclonal antibodies, by duplicating production cassettes in producer cells. Such multiplexed duplications also raise concerns, including risks of unintended genomic instability and questions of equitable access, because alterations could be heritable if applied to germline cells. This has prompted calls for stringent oversight similar to broader genome editing guidelines. Validation of these duplications often relies on sequencing to confirm integration fidelity.

A classic biotechnological example is the duplication of the human insulin gene in bacterial expression systems to meet pharmaceutical demand. Recombinant insulin production began with synthetic genes cloned into high-copy plasmids in Escherichia coli, effectively duplicating the insulin sequence across multiple plasmid copies per cell to achieve gram-scale yields; this approach revolutionized diabetes treatment by providing scalable, human-identical insulin. Subsequent optimizations, including codon adaptation and multi-copy integration, further amplified output, with modern systems producing over 10 g/L in fermenters.

References

  1. [1]
    Duplication - National Human Genome Research Institute
    Duplication is a type of mutation that involves the production of one or more copies of a gene or region of a chromosome.
  2. [2]
    An Overview of Duplicated Gene Detection Methods - NIH
    Gene duplication is an important evolutionary mechanism allowing to provide new genetic material and thus opportunities to acquire new gene functions for an ...
  3. [3]
    Gene duplication and evolution in recurring polyploidization ...
    Feb 21, 2019 · Dispersed duplication (DSD) happens through unpredictable and random patterns by mechanisms that remain unclear, generating two gene copies that ...
  4. [4]
    Gene Duplication - an overview | ScienceDirect Topics
    Gene duplication is defined as a major mechanism of evolutionary change that occurs when a gene is duplicated during DNA replication, allowing the additional ...
  5. [5]
    The early stages of duplicate gene evolution - PNAS
    Duplicate genes are believed to be a major mechanism for the establishment of new gene functions and the generation of evolutionary novelty.Missing: definition | Show results with:definition
  6. [6]
    Evolution by Gene Duplication | SpringerLink
    Aug 23, 2014 · Book Title: Evolution by Gene Duplication · Authors: Susumu Ohno · Publisher: Springer Berlin, Heidelberg · eBook Packages: Springer Book Archive.
  7. [7]
    The evolution of gene duplications: classifying and distinguishing ...
    Jan 6, 2010 · The evolution of gene duplications: classifying and distinguishing between models. Hideki Innan &; Fyodor Kondrashov. Nature Reviews Genetics ...
  8. [8]
    Hox cluster duplications and the opportunity for evolutionary novelties
    We propose that the constraints on vertebrate Hox cluster structure lead to an association between the retention of duplicated Hox clusters and adaptive ...
  9. [9]
    The frequency of polyploid speciation in vascular plants - PNAS
    Since its discovery in 1907, polyploidy has been recognized as an important phenomenon in vascular plants, and several lines of evidence indicate that most ...
  10. [10]
    Ecological studies of polyploidy in the 100 years following its discovery
    By the 1940s, polyploidy was known to be a common and recurrent form of genetic variation in plants, and implicated as a factor in ecological adaptation and ...
  11. [11]
    Selectionism and Neutralism in Molecular Evolution - Oxford Academic
    In the 1960s several more different multigene families (Dayhoff 1969) were discovered, and this discovery set forth the study of evolution of multigene families ...
  12. [12]
    Evolution by the birth-and-death process in multigene families of the ...
    Multigene families whose member genes have the same function are generally believed to undergo concerted evolution that homogenizes the DNA sequences of the ...
  13. [13]
    Gene duplication as a mechanism of genomic adaptation to a ...
    Oct 12, 2012 · The development of the theory of gene duplications reflects that of the neutral theory of molecular evolution. The strong claim of neutrality of ...
  14. [14]
    The Neutralist/Selectionist Debate in Molecular Evolution
    Feb 5, 2024 · The neutral theory was based mainly on 2 observations: (i) that within-species genetic polymorphism is substantial and (ii) that proteins evolve ...Missing: 1970s | Show results with:1970s
  15. [15]
    Evolution of Repeated DNA Sequences by Unequal Crossover
    Evolution of Repeated DNA Sequences by Unequal Crossover: DNA whose sequence is not maintained by selection will develop periodicities as a result of random ...
  16. [16]
    Mechanisms of structural chromosomal rearrangement formation
    Jun 14, 2022 · Most recurrent rearrangements are caused by a mechanism named Non-Allelic Homologous Recombination (NAHR) that occurs between Low Copy Repeats ( ...
  17. [17]
    Genetic Proof of Unequal Meiotic Crossovers in Reciprocal Deletion ...
    The present study provides further evidence for a model in which reciprocal deletion and duplication syndromes arise from unequal crossing over between LCRs.
  18. [18]
    Alu-mediated diverse and complex pathogenic copy-number ...
    Alu repetitive elements are known to be major contributors to genome instability by generating Alu-mediated copy-number variants (CNVs).
  19. [19]
    Processes of de novo duplication of human α-globin genes | PNAS
    These exchanges most likely arise by unequal crossover at meiosis and should generate reciprocal duplication products, namely ααα chromosomes carrying ...<|control11|><|separator|>
  20. [20]
    Instability of repetitive DNA sequences: The role of replication in ...
    Open in Viewer Unequal crossing-over between circular molecules. Unequal crossing-over between direct repeats borne on plasmids generates a circular dimer ...Deletion Assays · Results And Discussion · Homology-Dependent...
  21. [21]
    Replication slippage involves DNA polymerase pausing and ...
    Genome rearrangements can take place by a process known as replication slippage or copy‐choice recombination. The slippage occurs between repeated sequences ...Missing: microduplications | Show results with:microduplications
  22. [22]
  23. [23]
    The role of fork stalling and DNA structures in causing chromosome ...
    Alternative non-B form DNA structures, also called secondary structures, can form in certain DNA sequences under conditions that produce single-stranded DNA ...
  24. [24]
    Replication slippage of different DNA polymerases is ... - PubMed
    Sep 24, 1999 · Replication slippage is a particular type of error caused by DNA polymerases believed to occur both in bacterial and eukaryotic cells.
  25. [25]
    Replication Stress and Mechanisms of CNV Formation - PMC - NIH
    We have found that agents that perturb replication induce a high frequency of CNVs in normal human cells that resemble non-recurrent CNVs in humans in all ...
  26. [26]
    Chromosomal breaks at the origin of small tandem DNA duplications
    ### Summary of Replication Slippage Causing Small-Scale Duplications (<1kb)
  27. [27]
    DNA Replication Timing, Genome Stability and Cancer - PMC - NIH
    The observations described above indicate that replication timing influences the mutation rate of different genomic regions in the germline, and over long ...
  28. [28]
    RNA-Mediated Gene Duplication and Retroposons - NIH
    A class of mammalian retroposons, long interspersed element-1 (LINE1, L1), has been shown to be involved in the reverse transcription of retrogenes.
  29. [29]
    Gene Duplication: The Genomic Trade in Spare Parts - PMC - NIH
    The duplication of genes and their subsequent diversification has had a key role in evolution. A range of fates can befall a duplicated gene.Missing: types | Show results with:types
  30. [30]
    Diversity through duplication: Whole-genome sequencing reveals ...
    Feb 25, 2014 · While an estimated 8,000–17,000 retrocopies exist in the human genome reference sequence, the extent of variation between individuals in terms ...
  31. [31]
    Extensive Copy-Number Variation of the Human Olfactory Receptor ...
    We have undertaken a detailed study of copy-number variation of ORs to elucidate the selective and mechanistic forces acting on this gene family.
  32. [32]
    Polyploidy | Learn Science at Scitable - Nature
    Polyploidy is the heritable condition of possessing more than two complete sets of chromosomes. Polyploids are common among plants, as well as among certain ...
  33. [33]
    Polyploidy in fungi: evolution after whole-genome duplication
    Apr 4, 2012 · Aneuploidy designs the occurrence of one or more extra or missing chromosomes by comparison with the normal haploid/diploid state of the species ...
  34. [34]
    Aneuploidy & chromosomal rearrangements (article) - Khan Academy
    Disorders of chromosome number are caused by nondisjunction, which occurs when pairs of homologous chromosomes or sister chromatids fail to separate during ...
  35. [35]
    Polyploidy: A Biological Force From Cells to Ecosystems
    Polyploidy, resulting from the duplication of the entire genome of an organism or cell, greatly affects genes and genomes, cells and tissues, organisms, ...
  36. [36]
    The Dynamic Fungal Genome: Polyploidy, Aneuploidy and Copy ...
    This review addresses the prevalence of polyploidy, aneuploidy, and copy number variation across diverse fungal species.
  37. [37]
    Molecular Genetic Features of Polyploidization and ...
    We provide the first evidence that aneuploidy exceeds eupolyploidy in the diploid crosses, suggesting aneuploidization is a leading cause of genome duplication.
  38. [38]
    Evidence for Polyploidy in Majority of Angiosperms - Science
    Three published estimates of the frequency of polyploidy in angiosperms (30 to 35 percent, 47 percent, and 70 to 80 percent) were tested by estimating the ...
  39. [39]
    Two Rounds of Whole Genome Duplication in the Ancestral Vertebrate
    We confirmed the results of earlier studies that there remains little signal of these events in numbers of duplicated genes, gene tree topology, or the number ...
  40. [40]
    Three rounds (1R/2R/3R) of genome duplications and the evolution ...
    Jun 6, 2006 · ... entire genome duplications (2R) were proposed during the early evolution of vertebrates. Most glycolytic enzymes occur as several copies in ...
  41. [41]
    The evolutionary dynamics of plant duplicate genes - ScienceDirect
    The half-life to silencing and loss of a gene duplicate in A. thaliana, however, is estimated at 23.4 million years, which is 3–7-fold higher than for animal ...Missing: vs | Show results with:vs
  42. [42]
    The relationship of recombination rate, genome structure, and ... - NIH
    Sep 16, 2015 · Recombination rate is negatively correlated with genome size, which is likely caused by the removal of LTR retrotransposons. After correcting ...
  43. [43]
    Zebrafish: unraveling genetic complexity through duplicated genes
    Jul 30, 2024 · One study determined that zebrafish had the highest rate of tandem (duplicates located within 10 kb of each other) and intrachromosomal (copies ...
  44. [44]
    Fossilized cell structures identify an ancient origin for the teleost ...
    Jul 23, 2021 · Teleost fishes comprise one-half of all vertebrate species and possess a duplicated genome. This whole-genome duplication (WGD) occurred on the ...<|control11|><|separator|>
  45. [45]
    Neofunctionalization - an overview | ScienceDirect Topics
    Neofunctionalization is defined as the mechanism by which novel functions arise through gene duplication, where one gene copy retains the ancestral function ...
  46. [46]
    Evolution of new enzymes by gene duplication and divergence
    Apr 6, 2020 · In plants, at least 50% of genes have arisen due to segmental duplication, WGD, or even whole-genome triplication [[21]]. In bacteria, ...
  47. [47]
    Neofunctionalization of Duplicated Genes Under the Pressure ... - NIH
    This article develops basic population genetics theories on the process to permanent neofunctionalization under the pressure of gene conversion.
  48. [48]
    Adaptive Functional Divergence Among Triplicated α-Globin Genes ...
    The globin superfamily of genes therefore provides an excellent example of how physiological pathways can be elaborated and refined through functional and ...
  49. [49]
    The multiple fates of gene duplications - NIH
    Mar 7, 2022 · The Drosophila bithorax complex is a classical example of neofunctionalization of duplicated genes, in which a set of homeobox genes is ...
  50. [50]
    Evolution by Gene Duplication
    In stock Free deliveryextremely effective in policing alleHe mutations which arise in already existing gene loci. Because of natural selection, organisms have been able to adapt to.
  51. [51]
    Evidence that strong positive selection drives neofunctionalization in ...
    Theoretical work suggests that both genetic drift and positive selection may play a role in the fixation and early evolution of duplicate genes (3–5). In ...Missing: quantitative | Show results with:quantitative
  52. [52]
    The probability of duplicate gene preservation by subfunctionalization
    Gene duplicates are frequently preserved by subfunctionalization, whereby both members of a pair experience degenerative mutations that reduce their joint ...Missing: paper | Show results with:paper
  53. [53]
    Alternative Splicing and Subfunctionalization Generates Functional ...
    Subfunctionalization can occur when an ancestral gene carries out more than one function. If one duplicated copy mutates so that it loses one of the functions, ...Missing: basis | Show results with:basis
  54. [54]
    Evolutionary interplay between sister cytochrome P450 genes ...
    Oct 7, 2016 · Rapid subfunctionalization accompanied by prolonged and substantial neofunctionalization in duplicate gene evolution. Genetics 169, 1157–1164 ( ...
  55. [55]
    Patterns of Gene Conversion in Duplicated Yeast Histones Suggest ...
    The “dosage balance hypothesis” postulates that genes whose functions involve precise interactions with other genes' products will be under selection against ...
  56. [56]
    Dosage-sensitive genes in evolution and disease - BMC Biology
    Sep 1, 2017 · These dosage-sensitive genes may confer an advantage upon copy number change, but more typically they are associated with disease.
  57. [57]
    Science | AAAS
    **Summary:**
  58. [58]
    Duplication and DNA segmental loss in the rice genome ...
    Jan 14, 2005 · Following the duplications, there have been large-scale chromosomal rearrangements and deletions. About 30–65% of duplicated genes were lost ...Duplicated Genes And Blocks · Estimation Of Gene Loss Rate · Results
  59. [59]
    Gene balance hypothesis: Connecting issues of dosage sensitivity ...
    We summarize, in this review, the evidence that genomic balance influences gene expression, quantitative traits, dosage compensation, aneuploid syndromes.Missing: histone | Show results with:histone
  60. [60]
    Duplication and Retention Biases of Essential and Non-Essential ...
    Duplicate essential genes are more likely to be retained in the long term than non-essential duplicate genes. It is often the case that genes with an essential ...
  61. [61]
    Evolution of olfactory receptor genes in the human genome - PNAS
    Sep 24, 2003 · These genes appear to have been generated by tandem gene duplication. However, the relationships between genomic clusters and phylogenetic ...Missing: retrotransposition | Show results with:retrotransposition
  62. [62]
  63. [63]
    High resolution analysis of DNA copy number variation using ...
    Oct 1, 1998 · We describe here our implementation of array CGH. We demonstrate its ability to measure copy number with high precision in the human genome, and to analyse ...
  64. [64]
    Detecting DNA Copy Number Alteration in Array-Based CGH Data
    The log2 intensity ratios of a single copy loss would be -1, and a single copy gain would be 0.58. The goal is to effectively identify locations of gains or ...
  65. [65]
    High resolution analysis of DNA copy number variation using ...
    Comparative genomic hybridization (CGH) was developed for genome-wide analysis of DNA sequence copy number in a single experiment.
  66. [66]
    BAC to the future! or oligonucleotides: a perspective for micro array ...
    The array CGH technique (Array Comparative Genome Hybridization) has been developed to detect chromosomal copy number changes on a genome-wide and/or ...
  67. [67]
    Array-based comparative genomic hybridization: clinical contexts for ...
    Commercially available whole-genome BAC arrays have moderate resolution (1 Mb), and no high-resolution BAC arrays (such as tiling arrays) are available on the ...
  68. [68]
    The array CGH and its clinical applications - ScienceDirect.com
    Array comparative genomic hybridization (aCGH) is a technique enabling high-resolution, genome-wide screening of segmental genomic copy number variations ...
  69. [69]
    Yield of additional genetic testing after chromosomal microarray for ...
    CMA results may show a copy-number gain that could represent either a tandem duplication or an unbalanced insertion. Distinguishing between these two ...
  70. [70]
    The genomic architecture of segmental duplications and associated ...
    Using high-density aCGH experiments specifically designed to interrogate putative segmental duplications, we identified 3583 CNVs in a panel of 17 genetically ...
  71. [71]
    Detection of structural variation using target captured next ... - Nature
    Dec 19, 2018 · Next-generation sequencing (NGS) is an efficient method for SV detection because of its high-throughput, low cost, and base-pair resolution.
  72. [72]
    Detecting copy number variation in next generation sequencing data ...
    Aug 31, 2021 · Four different approaches are currently used for detecting CNVs from NGS data [14, 15]; paired-end mapping based detection (PE), split read ...
  73. [73]
    Long-read sequence and assembly of segmental duplications - PMC
    Jun 17, 2019 · We developed a computational method based on polyploid phasing of long sequence reads to resolve collapsed regions of segmental duplications within genome ...
  74. [74]
    PhaseDancer: a novel targeted assembler of segmental duplications ...
    Sep 11, 2023 · PhaseDancer works with next generation sequencing long-read data e.g. Oxford Nanopore or PacBio. Starting with an initial anchor sequence ...
  75. [75]
    Segmental duplications and their variation in a complete human ...
    Apr 1, 2022 · This raises the percent estimate of the human genome that is segmentally duplicated from 5.4 to 6.7%. However, five SD-related gaps remained ...
  76. [76]
    Structural polymorphism and diversity of human segmental ... - Nature
    Jan 8, 2025 · We present a population genetics survey of SDs by analyzing 170 human genome assemblies (from 85 samples representing 38 Africans and 47 non-Africans)
  77. [77]
    An efficient CRISPR-Cas9 enrichment sequencing strategy for ...
    Aug 27, 2022 · Here we improve and demonstrate the use of CRISPR-Cas9 enrichment combined with long-read sequencing technology to resolve the MYB10 region in the linkage ...
  78. [78]
    Expectations and blind spots for structural variation detection from ...
  79. [79]
    A Strategy of Assessing Gene Copy Number Differentiation Between ...
    Feb 10, 2025 · Detection of CNVs using next-generation sequencing (NGS) read depth is cost-effective but challenged by high sequence similarity and GC ...
  80. [80]
    Long-Read DNA Sequencing: Recent Advances and Remaining ...
    Aug 25, 2023 · Long-read sequencing (LRS) permits routine sequencing of human DNA fragments tens to hundreds of kilobase pairs in size, using both real-time ...
  81. [81]
    Integration of Hi-C with short and long-read genome sequencing ...
    Oct 29, 2022 · We investigate 11 individuals with complex genomic rearrangements including germline chromothripsis by combining short- and long-read genome sequencing (GS) ...
  82. [82]
    Genotyping sequence-resolved copy number variation using ...
    Oct 17, 2025 · It is challenging to distinguish paralogs from orthologs in complex rearrangements (for example, Fig. 2a). To only obtain divergence values ...
  83. [83]
    Guidelines for Human Gene Nomenclature - PMC - NIH
    Feb 1, 2021 · Here we present the current HUGO Gene Nomenclature Committee (HGNC) guidelines for naming not only protein-coding but also RNA genes and pseudogenes.
  84. [84]
    Problems with Paralogs: The Promise and Challenges of Gene ...
    Duplicated genes (paralogs) are especially difficult to detect and functionally characterize in nonmodel organisms, and often go overlooked. Because they are so ...
  85. [85]
    Comparative Genomics
  86. [86]
    Paralogues View - Ensembl
    Paralogues are defined in Ensembl as genes for which the most common ancestor node is a duplication event. These ancestral duplications are represented by red ...
  87. [87]
    Segmental Dups Track Settings - UCSC Genome Browser
    Segmental duplications play an important role in both genomic disease and gene evolution. ... This method has become known as WGAC (whole-genome assembly ...
  88. [88]
    Phytozome
    Search and visualization tools let users quickly find and analyze genes or genomic regions of interest. Query-based data access is provided by Phytozome's ...
  89. [89]
    DupMasker Download Page - RepeatMasker
    DupMasker/RepeatMasker use a sequence search engine to perform their searches. Currently DupMasker only supports the RMBlast and WUBlast/ABBlast engines.
  90. [90]
    DupMasker: A tool for annotating primate segmental duplications - NIH
    Results from these various applications illustrate the utility of this software tool. Analysis of regions associated with genomic disorders. Duplication-rich ...
  91. [91]
    OrthoDB
  92. [92]
    Amplification units containing human N-myc and c-myc genes. - PNAS
    Alternatively unequal crossing over and gene conversion, which have the effect of homogenizing repeated-sequence arrays, may be contributing to ...
  93. [93]
    Recent progress in understanding mechanisms of mammalian DNA ...
    Evidence for unequal crossing-over as the mechanism for amplification of some homogeneously staining regions. Cancer Genet. Cytogenet., 29 (1987), pp. 139 ...
  94. [94]
    A review of the cognitive impact of neurodevelopmental and ... - Nature
    Apr 8, 2023 · Many rare copy number variants are associated with neurodevelopmental and neuropsychiatric conditions (ND-CNV), including schizophrenia and autism spectrum ...
  95. [95]
    Targeting HER2-positive breast cancer: advances and future ...
    Nov 7, 2022 · Unlike trastuzumab, which binds to ECD IV of HER2, pertuzumab binds to ECD II, preventing HER2 heterodimerization with HER1, HER3 and HER4, ...
  96. [96]
    PMP22 related neuropathies: Charcot-Marie-Tooth disease type 1A ...
    Mar 19, 2014 · This autosomal dominantly inherited demyelinating form of CMT is caused by a 1.5 Mb duplication on chromosome 17p11.2 [15,16], containing the ...
  97. [97]
    Genomic disorders: A window into human gene and genome evolution
    Gene duplications alter the genetic constitution of organisms and can be a driving force of molecular evolution in humans and the great apes.
  98. [98]
    Liquid biopsy in cancer: current status, challenges and future ...
    Dec 2, 2024 · Several molecular markers can be detected by liquid biopsy, such as circulating tumor cells (CTCs), circulating tumor DNA (ctDNA), tumor-derived ...
  99. [99]
    Monitoring and Assessment of Circulating Tumor DNA in Cancers ...
    Oct 24, 2025 · Liquid biopsy with circulating tumor DNA (ctDNA) has rapidly emerged as a new paradigm for assessing tumor burden, genetic heterogeneity, and ...
  100. [100]
    Prospective Multicenter Study Evaluating a Combined Circulating ...
    Jun 26, 2025 · LHM ctDNA is noninferior to G360 ctDNA, but not tissue NGS. Treatment outcomes based on liquid biopsy are comparable with those based on tissue NGS.
  101. [101]
    Efficient inversions and duplications of mammalian regulatory DNA ...
    In addition, DNA fragment duplications and deletions could also be generated by CRISPR through trans-allelic recombination between the Cas9-induced double- ...
  102. [102]
    Beyond Cutting: CRISPR-Driven Synthetic Biology Toolkit for Next ...
    Techniques include CRISPR-Cas9 Assisted Recombineering (CRASAR), which induces targeted double-strand breaks near a gene concurrent with random mutagenesis, ...
  103. [103]
    Selection of chromosomal DNA libraries using a multiplex CRISPR ...
    Aug 19, 2014 · An optimized CRISPR-Cas9 system enables multiplexed genome engineering for evolving biomolecules and pathways from chromosomally integrated ...
  104. [104]
    Design of protein function leaps by directed domain interface evolution
    The processes of gene duplication and subsequent sequence divergence (1) have been successfully recapitulated in directed evolution and computational protein ...
  105. [105]
    Engineering of β-propeller protein scaffolds by multiple gene ...
    The results show that the β-propeller scaffold is an attractive platform for future engineering work, particularly in experiments in which directed evolution ...
  106. [106]
    Improving industrial yeast strains: exploiting natural and artificial ...
    Duplication of the ancestral ADH gene as well as duplication of several other genes involved in ethanol metabolism, combined with the ability of Saccharomyces ...
  107. [107]
    Polyploidy and Crop Improvement - Udall - 2006 - ACSESS - Wiley
    Nov 1, 2006 · We review this evidence and discuss the relevance of genome duplication to crop improvement. Polyploidy provides genome buffering, increased ...
  108. [108]
    In vitro Ploidy Manipulation for Crop Improvement - Frontiers
    Jun 2, 2020 · In vitro regeneration systems provide a powerful tool for manipulating ploidy to facilitate breeding and development of new crops.
  109. [109]
    Amplification editing enables efficient and precise duplication of ...
    Jul 25, 2024 · We develop a genome editing tool named Amplification Editing (AE) that enables programmable DNA duplication with precision at chromosomal scale.
  110. [110]
    What are the ethical issues surrounding gene therapy? - MedlinePlus
    Feb 28, 2022 · Because gene therapy involves making changes to the body's basic building blocks (DNA), it raises many unique ethical concerns.
  111. [111]
    Genome editing of polyploid crops: prospects, achievements ... - NIH
    Breeding by crossing and selection is the essence of crop improvement, for example for introducing disease resistance traits. Polyploids have multiple alleles ...
  112. [112]
    Expression in Escherichia coli of chemically synthesized genes for ...
    Synthetic genes for human insulin A and B chains were cloned separately in plasmid pBR322. The cloned synthetic genes were then fused to an Escherichia coli ...
  113. [113]
    Expression and purification of recombinant human insulin from E ...
    A new pIBAINS expression vector was constructed that provides greater efficiency in the production of recombinant human insulin.