
Pseudogene

A pseudogene is a segment of an organism's DNA that closely resembles a functional gene in sequence and structure but has become inactivated, typically through the accumulation of disruptive mutations such as frameshift mutations, premature stop codons, or deletions that prevent the production of a functional protein product. These genomic elements primarily arise from evolutionary processes, including the duplication of functional genes followed by degenerative mutations or the retrotransposition of processed messenger RNA (mRNA) back into the genome, resulting in copies that lack the regulatory elements needed for proper expression. While historically dismissed as nonfunctional "junk DNA," pseudogenes are now recognized for their roles in gene regulation and other biological processes, challenging earlier views of their irrelevance. Pseudogenes are ubiquitous across eukaryotic genomes and are nearly as abundant as protein-coding genes, with annotations indicating approximately 15,000 in the human genome. They are classified into three main types based on their origins and structures: processed pseudogenes, which form via retrotransposition and lack introns and promoter regions; duplicated (or unprocessed) pseudogenes, which retain the intron-exon architecture of their parental genes but accumulate disabling mutations; and unitary pseudogenes, which arise from in situ inactivation of a single gene copy without duplication events. Comprehensive annotations, such as those from the GENCODE project, have identified 14,701 pseudogenes in humans (as of release 49, September 2025), including 10,638 processed, 3,536 duplicated, and 290 unitary types, with ongoing efforts to refine these counts through integrated genomic data. 
Beyond their structural similarity to genes, pseudogenes exert functional influence primarily through transcription into non-coding RNAs that modulate gene expression, for example by acting as decoys for microRNAs (miRNAs), as precursors of small interfering RNAs (siRNAs), or as stabilizers or destabilizers of parental mRNA. For instance, the pseudogene PTENP1 competes with the tumor suppressor PTEN for miRNA binding, thereby enhancing PTEN expression and suppressing tumor growth, while disruptions of such interactions contribute to some cancers. Similarly, the HMGA1 pseudogene influences expression of its parental gene and is implicated in disease pathogenesis through competition for mRNA stability factors. These regulatory mechanisms underscore the evolutionary conservation of pseudogenes (some are traceable back 40 million years) and their emerging therapeutic potential, including targeting with antisense oligonucleotides or siRNAs to restore balanced regulation in disease.

Definition and Characteristics

Definition

A pseudogene is a segment of DNA that closely resembles a functional gene but has become inactivated, typically rendering it non-coding through the accumulation of disabling mutations such as frameshift mutations, premature stop codons, or deletions that disrupt its reading frame. Pseudogenes often arise as paralogous copies of parental genes through duplication or retrotransposition, while unitary pseudogenes result from the direct inactivation of functional genes; they are generally considered defunct relics of evolutionary processes that can no longer produce functional proteins.

The term "pseudogene" was coined in 1977 by Jacq et al., who identified untranscribed sequences homologous to the functional 5S rRNA gene in the genome of the African clawed frog (Xenopus laevis). This discovery highlighted pseudogenes as genomic duplicates that mimic active genes in sequence but fail to undergo proper transcription, setting the stage for recognizing them across eukaryotic genomes.

Unlike functional genes, pseudogenes do not typically produce viable transcripts or translate into functional products, which distinguishes them as non-protein-coding elements in the genome. A subset may retain partial transcriptional activity, though this does not generally lead to functional outcomes. As of GENCODE release 49 (2025), the human genome contains 14,701 annotated pseudogenes, including 10,638 processed, 3,536 unprocessed (duplicated), and 290 unitary, representing a significant portion of gene-like sequences.
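The class counts quoted above can be turned into shares of the total with a few lines of arithmetic. The sketch below uses only the numbers stated in the text; the variable and function names are illustrative, and the "not itemized" remainder is simply what is left after subtracting the three named classes (the text does not say which categories it contains).

```python
# Counts as quoted in the text for GENCODE release 49.
TOTAL_PSEUDOGENES = 14_701
COUNTS = {"processed": 10_638, "unprocessed": 3_536, "unitary": 290}


def class_shares(counts, total):
    """Return each class's share of the total as a percentage (one decimal)."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


shares = class_shares(COUNTS, TOTAL_PSEUDOGENES)
# Pseudogenes counted in the total but not assigned to the three named classes.
not_itemized = TOTAL_PSEUDOGENES - sum(COUNTS.values())

for name, pct in shares.items():
    print(f"{name:12s} {COUNTS[name]:>6d}  {pct:5.1f}%")
print(f"{'not itemized':12s} {not_itemized:>6d}")
```

The three named classes do not sum to the stated total, so the rounded shares (about 72%, 24%, and 2%) add up to slightly less than 100%.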

Molecular Characteristics

Pseudogenes are genomic sequences that resemble functional genes but are rendered non-functional primarily through the accumulation of disabling mutations. These typically include insertions and deletions that induce frameshifts, disrupting the reading frame and leading to aberrant protein products if transcribed; nonsense mutations that introduce premature stop codons, truncating translation; and point mutations that alter the start codon, preventing initiation of protein synthesis. Such features mean that pseudogenes cannot produce viable proteins, which distinguishes them from their parental genes.

In terms of sequence composition, pseudogenes maintain a high degree of sequence similarity to their corresponding functional genes, often exceeding 80% identity at the nucleotide level, particularly in recently formed instances. For example, processed pseudogenes show an average identity of about 80.3% to the coding sequences (CDS) of their parents, while duplicated pseudogenes average 76.9%. Over evolutionary time this similarity erodes as neutral mutations accumulate without purifying selection, resulting in greater divergence in older pseudogenes.

Genomically, pseudogenes are commonly situated in intergenic regions or adjacent to clusters of functional genes, which limits their interference with functional loci. Processed pseudogenes, derived from retrotransposition, characteristically lack introns and associated regulatory elements such as promoters, reflecting their origin from mature mRNA intermediates. This contributes to their transcriptional silencing in most cases.

Detection of pseudogenes relies on bioinformatics approaches, such as sequence alignment with tools like BLAST to identify homologous regions that harbor disabling mutations relative to annotated genes. Experimental validation often involves reverse transcription polymerase chain reaction (RT-PCR) to demonstrate the absence of mature mRNA transcripts, supporting non-functionality. These methods enable comprehensive annotation in genome annotation projects such as GENCODE.

The persistence of pseudogenes in genomes stems from their exemption from selective pressure, which allows them to evolve neutrally. Consequently, their mutation rate mirrors that of non-coding neutral DNA, estimated at approximately $1.2 \times 10^{-8}$ per site per generation in humans, which makes them useful as genomic fossils.
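The disabling features described in this section (a lost start codon, a premature stop codon, a frameshift) can be checked mechanically once a candidate sequence has been matched to its parental coding sequence. The following is a minimal sketch under simplifying assumptions: both sequences are ungapped DNA strings read in the parent's frame, and a frameshift is inferred only from the net length difference. The function name and return keys are hypothetical, not part of any annotation pipeline.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def disabling_features(parent_cds: str, candidate: str) -> dict:
    """Flag simple disabling features in `candidate` relative to `parent_cds`.

    Assumes both inputs are ungapped DNA strings starting at the parent's
    start codon; real pipelines work from gapped alignments instead.
    """
    parent_cds, candidate = parent_cds.upper(), candidate.upper()
    # Split the candidate into complete codons in frame 0.
    codons = [candidate[i:i + 3] for i in range(0, len(candidate) - 2, 3)]
    # A stop codon anywhere before the final codon truncates translation.
    premature = next(
        (i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS), None
    )
    return {
        "start_codon_lost": not candidate.startswith("ATG"),
        "premature_stop_codon_index": premature,
        # A net length change that is not a multiple of 3 shifts the frame.
        "frameshift": (len(candidate) - len(parent_cds)) % 3 != 0,
    }


parent = "ATGGCTTGGAAATAA"            # ATG GCT TGG AAA TAA
nonsense_copy = "ATGGCTTGAAAATAA"     # TGG -> TGA introduces a stop
deletion_copy = "ATGGCTGGAAATAA"      # one base deleted
print(disabling_features(parent, nonsense_copy))
print(disabling_features(parent, deletion_copy))
```

A length-based frameshift test misses compensating indels, which is one reason alignment-based approaches are preferred in practice.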

Classification and Origins

Processed Pseudogenes

Processed pseudogenes, also known as retropseudogenes, originate from the retrotransposition of messenger RNA (mRNA) molecules, which are reverse-transcribed into complementary DNA (cDNA) and inserted at random into the genome. This process is primarily mediated by the enzymatic machinery of long interspersed nuclear element-1 (LINE-1 or L1) retrotransposons, which provide the reverse transcriptase and endonuclease activities necessary for integration. Unlike duplicated pseudogenes, they form without DNA-level duplication of the parental locus, so they are inserted at ectopic chromosomal locations independent of the parental gene's position.

Distinguishing features of processed pseudogenes include their intronless structure, because the precursor mRNA has already been spliced, and the absence of the promoter and other regulatory elements of the original genomic locus. They typically retain a polyadenylate (poly-A) tail at the 3' end, a remnant of the mRNA's post-transcriptional modification, and are often flanked by short target site duplications (TSDs) of 4-20 base pairs, generated by the staggered cleavage of the target DNA during L1-mediated insertion. These characteristics generally render them transcriptionally inactive upon insertion, and disabling mutations such as frameshifts or premature stop codons then accumulate over time through neutral evolution.

Processed pseudogenes are the most prevalent type in mammalian genomes, comprising approximately 72% of all pseudogenes, with 10,638 high-confidence examples (as of GENCODE v47, 2024) derived from roughly 2,500 distinct parental genes. In humans, they are particularly abundant for highly expressed genes such as those encoding ribosomal proteins, reflecting the availability of mRNA substrates for retrotransposition. Their evolutionary timeline indicates relative youth compared to duplicated pseudogenes, as evidenced by lineage-specific insertions and lower divergence in recent copies, which show higher sequence similarity to their functional counterparts (often >80% identity).
Detection of processed pseudogenes relies on computational approaches that identify genomic sequences with high similarity to parental mRNAs but lacking introns and regulatory sequences. Methods typically involve BLAST-based alignments requiring >70% coverage of the coding sequence, absence of splice sites, presence of poly-A tails in at least 30% of cases, and multiple frame-disrupting mutations to confirm non-functionality. These criteria distinguish them from active retrogenes, which may regain functionality through secondary mutations or novel regulatory acquisition.
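The criteria listed above can be expressed as a simple filter over alignment summaries. This is a sketch, not the procedure of any particular pipeline: the `AlignmentHit` record and its fields are hypothetical, "multiple" disruptions is interpreted as at least two, and the poly-A tail is treated as supporting evidence rather than a requirement, since the text says it is present in only a fraction of cases.

```python
from dataclasses import dataclass


@dataclass
class AlignmentHit:
    """Summary of one genomic hit aligned to a parental mRNA (hypothetical record)."""
    cds_coverage: float      # fraction of the parental coding sequence covered, 0.0-1.0
    has_introns: bool        # True if parental introns are retained in the hit
    has_polya_tail: bool     # True if a 3' poly-A tract follows the hit
    frame_disruptions: int   # frameshifts plus premature stop codons


def is_processed_candidate(hit: AlignmentHit,
                           min_coverage: float = 0.70,
                           min_disruptions: int = 2) -> bool:
    """Apply the coverage, intron, and disruption criteria described in the text."""
    return (hit.cds_coverage > min_coverage
            and not hit.has_introns
            and hit.frame_disruptions >= min_disruptions)


def polya_fraction(hits: list[AlignmentHit]) -> float:
    """Fraction of hits carrying a poly-A tail; 0.0 for an empty list."""
    return sum(h.has_polya_tail for h in hits) / len(hits) if hits else 0.0


hits = [
    AlignmentHit(0.95, False, True, 3),   # intronless, disrupted: candidate
    AlignmentHit(0.95, True, False, 3),   # introns retained: not processed
    AlignmentHit(0.50, False, False, 4),  # coverage too low
    AlignmentHit(0.90, False, False, 0),  # intact frame: possible retrogene
]
print([is_processed_candidate(h) for h in hits], polya_fraction(hits))
```

The last hit illustrates the distinction drawn in the text: an intronless copy with an intact reading frame is a possible active retrogene rather than a pseudogene.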

Duplicated Pseudogenes

Duplicated pseudogenes, also known as non-processed or unprocessed pseudogenes, originate from events such as unequal crossing-over during meiosis or whole-genome duplications, which copy segments of DNA that include functional genes along with their introns and often their promoters. These mechanisms preserve the genomic context of the parent gene, distinguishing them from other pseudogene types. Following duplication, one copy typically loses functionality through inactivating mutations while the other retains its role, because redundancy relaxes selective pressure on the duplicate.

Key features of duplicated pseudogenes include their structural similarity to the parental gene, such as retention of the intron-exon structure, and, in many cases, proximity to the original locus. Recently duplicated copies retain high sequence similarity to their functional counterparts because there has been limited time for divergence. Subtypes are classified by duplication mode: tandem duplicated pseudogenes arise from local duplications near the parent gene, often in clusters, as seen in gene families whose tandem arrays have expanded through repeated unequal crossing-over; segmental duplicated pseudogenes result from larger chromosomal duplications and are more dispersed across the genome.

Duplicated pseudogenes are prevalent in lineages such as vertebrates, where frequent genome duplications contribute to their abundance; in the human genome, they constitute approximately 24% of all pseudogenes, numbering 3,536 out of 14,701 total (as of GENCODE v47, 2024). After duplication, these pseudogenes decay through the accumulation of disabling mutations, such as frameshifts or nonsense mutations, at a neutral evolutionary rate, unconstrained by purifying selection because the functional copy provides redundancy. This process highlights their role as evolutionary byproducts of gene family expansion.

Unitary Pseudogenes

Unitary pseudogenes arise from the inactivation of a single-copy functional gene through the accumulation of disabling mutations at its original genomic locus, without any duplication or retrotransposition event. This typically occurs when a gene that was once under purifying selection becomes dispensable, for example because of changes in environmental pressures or dietary availability, allowing neutral mutations to become fixed in the population. Unlike other pseudogene classes, unitary pseudogenes represent direct losses of gene function in a lineage, often reflecting evolutionary adaptations or relaxations in selection.

Key features of unitary pseudogenes include retention of the original genomic context, such as introns, flanking regulatory elements, and chromosomal location, which distinguishes them from processed pseudogenes that lack these structures. They are characterized by inactivating mutations such as premature stop codons, frameshifts, or deletions that disrupt the reading frame, rendering them incapable of producing functional proteins. These pseudogenes may be polymorphic within populations if the inactivation is recent, but many are fixed across species, indicating ancient losses. Functionally, they are enriched in categories such as olfactory receptors, immune-related genes, and metabolic enzymes, where redundancy or obsolescence facilitates pseudogenization.

In the human genome, unitary pseudogenes constitute a minor fraction of the total pseudogene repertoire, with 290 loci (as of GENCODE v47, 2024), representing approximately 2% of the 14,701 pseudogenes. This low prevalence reflects the fact that they require the complete loss of a unique functional gene without a paralogous copy to compensate. A classic example is the GULO pseudogene in primates, including humans, where mutations inactivated the gene encoding L-gulonolactone oxidase, an enzyme essential for vitamin C biosynthesis; this loss occurred after the divergence from strepsirrhine primates and is fixed in haplorrhines, which obtain vitamin C from fruit in the diet. Similarly, the UOX pseudogene, whose functional counterpart encodes urate oxidase in purine metabolism, exemplifies unitary inactivation in hominoids, contributing to elevated uric acid levels that may have adaptive benefits such as enhanced antioxidant activity.

Evolutionarily, unitary pseudogenes highlight gene losses that coincide with lineage-specific adaptations, such as shifts in immune responses or metabolic pathways under relaxed selection. These losses appear to be distributed relatively uniformly across evolutionary branches, suggesting ongoing gene attrition rather than bursts tied to major events. They provide insights into how genomes streamline by discarding obsolete functions, potentially reducing mutational load while preserving regulatory architectures for possible later reuse.

Identification of unitary pseudogenes relies on comparative genomics: the orthologous gene is functional in outgroup species (e.g., Gulo is intact in mouse) but disrupted in the focal lineage, as revealed by sequence alignments that expose inactivating mutations. This approach, combined with manual curation of genomic annotations, distinguishes them from other pseudogenes by confirming the absence of functional paralogs and the presence of original locus markers. Bioinformatic pipelines further validate candidates by assessing mutation spectra and evolutionary patterns.

Polymorphic Pseudogenes

Polymorphic pseudogenes are genomic loci where alleles vary within a population, such that some individuals carry a functional gene while others carry a version inactivated by loss-of-function (LoF) mutations, with the non-functional allele reaching appreciable frequency (e.g., >1%) but not yet fixed across the species. These sites are transitional, capturing genes in the early stages of pseudogenization before they become stably non-functional in all members of the population.

They originate from recent deleterious mutations in otherwise functional genes, including premature stop codons, frameshift indels, or disruptions to splice sites, which abolish protein-coding potential without complete fixation. Such mutations often arise in dynamic multigene families under relaxed selective pressure, allowing the inactivated allele to persist and spread through genetic drift or weak positive selection in specific contexts, such as adaptation to pathogens. Over time, these polymorphic states can evolve into fixed pseudogenes if the LoF allele becomes predominant.

Key features include population-specific variation in functionality, with common LoF mechanisms such as CGA-to-TGA transitions at CpG sites accounting for a significant portion of cases (e.g., 31 of 119 premature stop codons identified). A prominent example is the CASP12 gene, where a polymorphic stop codon results in a non-functional allele at ~94% frequency in non-African populations, potentially conferring resistance to sepsis following bacterial infection by altering inflammatory responses, though this allele is rarer in African populations. Other examples include the ABO blood group gene and RHD, where allelic inactivation affects antigen expression in subsets of individuals.

Prevalence is relatively low overall but elevated in rapidly evolving gene families, with 232 polymorphic pseudogenes cataloged in the human genome, including 66 in olfactory receptors and 166 non-olfactory cases, many enriched in immunity-related loci such as IFNL4 and CCR5. These are particularly notable in immunity genes, where LoF variants in 179 genes span diverse populations, reflecting ongoing diversification in response to environmental pressures. They illuminate recent evolutionary dynamics: as markers of ongoing gene loss detectable through population genomics, they influence individual traits and the degree of redundancy in multigene clusters. Studies of these pseudogenes show how neutral or adaptive processes shape genomic variation at the individual and population levels.
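The frequency-based definition above translates directly into a small classification rule. In the sketch below the 1% threshold is the example value given in the text, the labels are illustrative, and the genotype calculation assumes Hardy-Weinberg proportions, which is a simplification that real populations may not satisfy.

```python
def pseudogenization_state(lof_allele_freq: float, min_freq: float = 0.01) -> str:
    """Label a locus by the population frequency of its loss-of-function allele."""
    if not 0.0 <= lof_allele_freq <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    if lof_allele_freq >= 1.0:
        return "fixed pseudogene"
    if lof_allele_freq > min_freq:
        return "polymorphic pseudogene"
    return "functional gene with rare LoF variant"


def homozygous_null_fraction(lof_allele_freq: float) -> float:
    """Expected fraction of individuals with two LoF alleles (Hardy-Weinberg, q^2)."""
    return lof_allele_freq ** 2


# Using the ~94% allele frequency quoted in the text for the CASP12 example.
q = 0.94
print(pseudogenization_state(q), round(homozygous_null_fraction(q), 4))
```

Under these assumptions, an allele at 94% frequency leaves roughly 88% of individuals without any functional copy, which shows how close to fixation such a locus already is.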

Evolutionary Significance

Formation Mechanisms

Pseudogenes arise through several primary mechanisms that disrupt gene function or generate non-functional copies: retrotransposition, gene duplication, and direct inactivation by mutation. Retrotransposition involves the reverse transcription of mRNA into cDNA, which is then integrated into the genome, typically mediated by long interspersed elements (LINEs) such as L1, resulting in processed pseudogenes that lack introns and regulatory elements. Gene duplication occurs via tandem or segmental chromosomal events, producing duplicated pseudogenes that retain intron-exon structures but accumulate disabling mutations over time. Direct inactivation happens when mutations such as frameshifts, nonsense codons, or deletions render an existing gene non-functional without duplication, leading to unitary pseudogenes.

At the molecular level, transposable elements play a crucial role in processed pseudogene formation by providing the enzymatic machinery for retrotransposition; LINE-1 elements encode the reverse transcriptase and endonuclease that mobilize mRNA and insert it, often at ectopic sites. In contrast, duplicated pseudogenes frequently originate from recombination errors, including unequal crossing-over or non-allelic homologous recombination, which misalign repetitive sequences and generate redundant copies prone to subsequent degeneration. These processes are influenced by genome architecture, with higher densities of repetitive elements increasing the likelihood of erroneous recombination.

In vertebrates, pseudogene formation often outpaces gene loss because of frequent duplication events and relaxed selective pressure on redundant copies; this disparity is amplified in larger genomes with elevated retrotransposon activity. Duplication history further modulates these rates; in mammals, processed pseudogenes are the most abundant type, three to four times as numerous as non-processed pseudogenes.

Historically, pseudogene bursts have followed major genomic events, such as the two rounds of whole-genome duplication (2R) in ancestral vertebrates around 500 million years ago, which generated extensive paralogous copies, many of which degenerated into pseudogenes through nonfunctionalization. These events provided raw material for increased complexity but led to widespread pseudogenization as the default fate of most duplicates. Recent work highlights how recombinative deletion limits pseudogene persistence: a 2025 study in angiosperms reported that post-polyploidization gene loss occurs primarily via DNA deletions facilitated by elevated recombination rates, rather than by gradual pseudogenization, suggesting that similar dynamics may reduce pseudogene accumulation in vertebrate lineages over evolutionary time. This mechanism underscores the transient nature of many pseudogenes, as recombination erodes non-essential sequences in regions of high meiotic activity.
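The three formation mechanisms leave different structural signatures, which is how the resulting classes are usually told apart. The following sketch reduces that logic to two yes/no observations; the function and argument names are hypothetical, and the rule deliberately ignores complications such as parental genes that are themselves intronless.

```python
def classify_pseudogene(retains_parental_introns: bool,
                        functional_paralog_in_genome: bool) -> str:
    """Assign a pseudogene class from two structural observations.

    Simplifying assumptions: the parental gene has introns, and a functional
    paralog elsewhere in the same genome implies a copy-generating mechanism.
    """
    if not functional_paralog_in_genome:
        # No surviving functional copy: the original gene itself was inactivated.
        return "unitary"
    if retains_parental_introns:
        # DNA-level duplication copies introns along with exons.
        return "duplicated"
    # Retrotransposition copies spliced mRNA, so introns are absent.
    return "processed"


for introns, paralog in [(False, True), (True, True), (True, False)]:
    print(introns, paralog, classify_pseudogene(introns, paralog))
```

In practice, additional evidence such as poly-A tails, target site duplications, and synteny with outgroup species would be combined with these two observations.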

Role in Genome Evolution

Pseudogenes play a pivotal role in genome evolution by serving as a reservoir of genetic material that can occasionally be reactivated or repurposed to generate novel functional elements through processes like neofunctionalization. Although such reactivation events are rare, they provide raw material for evolutionary innovation, allowing degraded copies to evolve new functions under selective pressure. For instance, certain pseudogenes have been shown to contribute to adaptive evolution by supplying sequences that can be co-opted for regulatory or structural roles in the genome.

As neutral elements, pseudogenes accumulate mutations at a rate reflective of the underlying mutation rate, making them valuable markers for studying evolutionary divergence and neutral evolution in genomes. Their substitution patterns, unhindered by purifying selection, allow researchers to estimate divergence times between species and to detect signatures of selection in nearby functional genes by comparison. This neutral reference property has been particularly useful in analyzing sequence variation and the evolutionary forces driving genomic diversity.

Pseudogenes contribute to genome bloat by expanding the non-coding fraction, which can influence overall mutation rates and recombination dynamics across the genome. In lineages with high duplication rates, such as mammals, the proliferation of pseudogenes, especially processed ones, leads to increased genome size and complexity, in contrast with the more streamlined bacterial genomes, which maintain lower pseudogene loads because of efficient deletion. This disparity highlights how pseudogene retention correlates with genome architecture in complex eukaryotes versus prokaryotes. Recent research indicates that pseudogenization plays a subordinate role compared to DNA deletion in the aftermath of whole-genome duplications, where recombinative processes more effectively prune redundant sequences to resolve genomic instability. 
In polyploidization events, elevated recombination facilitates the loss of duplicated material via deletion rather than gradual pseudogenization, underscoring that pseudogenes are often incidental byproducts rather than primary drivers of post-duplication streamlining.
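The use of pseudogenes as neutral clocks, mentioned above, rests on a short calculation: correct the observed fraction of differing sites for multiple hits, then divide by the rate at which two lineages accumulate substitutions. The sketch below uses the Jukes-Cantor correction as one simple choice of substitution model (an assumption, not something the text specifies) and the per-generation mutation rate quoted earlier in the article as the default.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Substitutions per site implied by a fraction `p` of differing sites (JC69)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75 under the Jukes-Cantor model")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def generations_since_divergence(p: float, mu: float = 1.2e-8) -> float:
    """Estimated generations since two neutrally evolving copies diverged.

    `mu` is the per-site, per-generation mutation rate; the factor of 2
    reflects that both lineages accumulate changes independently.
    """
    return jukes_cantor_distance(p) / (2.0 * mu)


# Example: orthologous pseudogenes in two species that differ at 1% of sites.
print(round(generations_since_divergence(0.01)))
```

The factor of two applies when both copies evolve neutrally, as with orthologous pseudogenes in two species; comparing a pseudogene with its selectively constrained parent gene would need a different treatment.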

Functional Roles

Regulatory Functions

Pseudogenes exert regulatory functions primarily through their transcribed RNAs, which act as non-coding regulators of gene expression without producing functional proteins. Approximately 20% of human pseudogenes are transcribed into RNA, enabling them to participate in a range of cellular processes. These transcripts often function by modulating microRNA (miRNA) activity or by influencing epigenetic modifications.

One key mechanism involves pseudogene-derived RNAs serving as miRNA sponges, also known as competing endogenous RNAs (ceRNAs), which sequester miRNAs and thereby derepress target mRNAs. This competitive binding prevents miRNAs from silencing their natural targets, allowing fine-tuned control of gene expression. A seminal example is the PTENP1 pseudogene, which shares miRNA response elements (MREs) with its parent gene PTEN; by acting as a ceRNA, PTENP1 sequesters miRNAs such as miR-21, thereby increasing PTEN protein levels and suppressing tumor growth. The ceRNA hypothesis, which underpins this regulatory model, highlights how pseudogene transcripts can integrate into broader RNA-mediated control systems.

Pseudogene transcripts also contribute to epigenetic regulation by recruiting histone-modifying complexes to target loci, leading to transcriptional silencing. For instance, an Oct4 pseudogene-derived long non-coding RNA (lncRNA) forms a complex with the histone methyltransferase SUV39H1, directing H3K9 trimethylation to the promoter of the parental Oct4 gene and thereby repressing its expression during differentiation. This mechanism exemplifies how pseudogene lncRNAs can enforce epigenetic barriers that maintain cell fate decisions. Additionally, in certain contexts such as mouse oocytes, pseudogene-derived small interfering RNAs (siRNAs) have been shown to regulate gene expression and to silence transposable elements. Recent studies have also highlighted roles for pseudogene-derived lncRNAs in regulating cancer stem cells through ceRNA mechanisms.
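The sponge effect described above can be illustrated with a deliberately crude titration model. It assumes that a limited pool of miRNA molecules is shared equally among all binding sites, whether they sit on the parental mRNA or on the pseudogene transcript, and that every site has the same affinity. These assumptions, the function name, and the numbers are illustrative only; this is not a quantitative model of PTEN and PTENP1.

```python
def site_occupancy(mirna_molecules: float,
                   target_sites: float,
                   decoy_sites: float) -> float:
    """Average miRNA occupancy per binding site when all sites compete equally.

    Returns a value between 0 and 1; lower occupancy on the target mRNA
    means weaker repression in this toy model.
    """
    total_sites = target_sites + decoy_sites
    if total_sites <= 0:
        raise ValueError("at least one binding site is required")
    return min(1.0, mirna_molecules / total_sites)


# 100 miRNA molecules and 200 sites on the parental mRNA.
without_decoy = site_occupancy(100, target_sites=200, decoy_sites=0)
# Adding 200 decoy sites on a pseudogene transcript dilutes the miRNA pool.
with_decoy = site_occupancy(100, target_sites=200, decoy_sites=200)
print(without_decoy, with_decoy)
```

In this model, doubling the number of competing sites halves the occupancy on the parental transcript, which captures the qualitative idea of derepression through miRNA sequestration.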

Protein-Coding Potential

Pseudogenes, long viewed as genomic fossils lacking protein-coding capacity because of disruptive mutations, have been found to retain or recover the capacity to produce functional peptides in certain contexts, thereby functioning as "pseudo-pseudogenes." These cases arise when disabling mutations are bypassed or corrected, allowing translation of viable open reading frames (ORFs). Mechanisms that enable this include reversal of frameshifts or premature stop codons through secondary mutations, alternative splicing that excludes mutated exons, partial transcription that initiates away from disruptive elements, and the generation of chimeric transcripts through fusion with neighboring functional genes. Such processes permit low-level expression of peptides that can exhibit biological activity, as evidenced by proteogenomic analyses across cell lines and tissues.

A notable example is the pseudogene MAPK6P4, which encodes an 84-amino-acid micropeptide that promotes angiogenic behavior of endothelial cells, including tube formation, in part by upregulating VEGFR2 protein levels. The peptide's functionality was supported by knockdown experiments showing reduced angiogenesis, highlighting how pseudogene-derived products can contribute to vascular development and potentially to pathological angiogenesis in diseases such as cancer. Similarly, the pseudogene NANOGP8 produces a protein that enhances tumorigenic properties in stem-like cancer cells by mimicking the parental NANOG gene's role in pluripotency maintenance. These instances illustrate that select pseudogenes can yield peptides with specific roles despite their relic status.

Quantitative evidence from ribosome profiling and mass spectrometry indicates that roughly 5% of human pseudogenes are translated at detectable levels, often yielding short peptides under specific conditions, though expression is typically low compared with canonical protein-coding genes. Proteogenomic studies have detected peptides from over 1,500 pseudogenes across diverse datasets, with about 26% of noncanonical proteins deriving from pseudogenic ORFs, underscoring an underappreciated coding reservoir.

Detection relies on ribosome profiling (Ribo-seq) to map ribosome occupancy on pseudogene transcripts and on mass spectrometry-based proteomics, including data-independent acquisition mass spectrometry (DIA-MS) and immunopeptidomics, to verify the presence of peptides in cellular proteomes. These methods have revealed that pseudogene translation often occurs in a tissue- or condition-specific manner, evading traditional annotation biases.

From an evolutionary perspective, translated pseudogenes serve as a source of novel protein innovation, with approximately 19% of pseudogenes predicted to have coding potential that could facilitate adaptation, for example through micropeptides that modulate cellular responses. Recent work (as of 2024) emphasizes their role in generating functional diversity, potentially including contributions to stress responses via small regulators, though functional validation remains ongoing for many candidates. This coding potential expands the functional repertoire beyond annotated protein-coding genes, prompting reevaluation of the evolutionary utility of pseudogenes.
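A first computational step when assessing coding potential is to look for surviving open reading frames in a pseudogene transcript. The sketch below is a minimal forward-strand ORF scan; the function name and the `min_codons` cutoff are illustrative assumptions, and an ORF found this way is only a candidate that would still need Ribo-seq or mass spectrometry support of the kind described above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str, min_codons: int = 10) -> str:
    """Return the longest ATG-to-stop ORF on the forward strand (any of 3 frames).

    `min_codons` counts coding codons excluding the stop; an empty string is
    returned if no ORF reaches that length.
    """
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open an ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]         # include the stop codon
                if len(orf) > len(best):
                    best = orf
                start = None                   # look for the next ORF in this frame
    return best if len(best) // 3 - 1 >= min_codons else ""


transcript = "CC" + "ATG" + "GCT" * 12 + "TAA" + "GG"
print(longest_orf(transcript))
```

Scanning the reverse strand and allowing non-ATG start codons would be natural extensions, but they are omitted here to keep the example short.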

Involvement in Diseases

Pseudogenes have been implicated in various human diseases through aberrant expression and regulatory dysregulation, often acting as competing endogenous RNAs (ceRNAs) that interfere with microRNA-mediated control of parental genes. For instance, loss of the PTENP1 pseudogene, which normally suppresses tumorigenesis by competing for miRNAs that target the PTEN tumor suppressor, promotes oncogenesis in several cancers. This dysregulation exemplifies how pseudogene-derived non-coding RNAs can drive pathological processes by altering gene regulatory networks.

In cancer, pseudogenes show promise as biomarkers because of their differential expression in tumors. The AURKAPS1 pseudogene is upregulated in head and neck squamous cell carcinomas (HNSCC), correlating with aggressive phenotypes and differential survival outcomes, which supports its use in diagnostic profiling and in predicting radiotherapy response. Such patterns highlight the potential of pseudogene expression profiles for non-invasive cancer detection and personalized treatment strategies.

Pseudogenes also contribute to genetic disorders through recombination and gene conversion events, which in turn complicate molecular diagnosis. In congenital adrenal hyperplasia (CAH) due to 21-hydroxylase deficiency, the high sequence homology between the CYP21A1P pseudogene and the functional CYP21A2 gene leads to the transfer of deleterious mutations and, via unequal recombination, to chimeric alleles. Recent advances in long-read sequencing have improved detection of these variants, resolving complex genotypes in cases that remained ambiguous with traditional methods.

Links to neurological conditions involve pseudogene-derived transcripts that modulate neurodegeneration pathways. For example, the GBAP1 pseudogene acts as a ceRNA that sponges miR-22-3p to regulate GBA expression, influencing lysosomal function and α-synuclein aggregation in Parkinson's disease. These mechanisms underscore the role of pseudogenes in disrupting neuronal homeostasis, as reviewed in recent studies on non-coding RNAs in neurodegenerative diseases.

Therapeutic strategies that target pseudogenes are emerging, particularly CRISPR-based editing to correct disease-associated loci. In chronic granulomatous disease (CGD) caused by NCF1 mutations, CRISPR-Cas9 targeting of the NCF1 pseudogenes (NCF1B and NCF1C) restores functional p47phox expression in patient-derived cells, though it carries a risk of chromosomal rearrangements; optimized approaches such as Cas12a are reported to mitigate these issues for safer modulation. Together with the ceRNA dysregulation described above, these findings support pseudogenes as viable targets for precision therapies.

Examples in Organisms

Eukaryotic Examples

In humans, the olfactory receptor (OR) gene family exemplifies pseudogene accumulation, with approximately 52% of the roughly 800 OR genes classified as pseudogenes because of disabling mutations such as frameshifts and premature stop codons. This high rate of pseudogenization, compared to about 20% in mice, is linked to evolutionary changes in olfactory capabilities, reflecting reduced selective pressure for scent detection in primate lineages adapted to varied diets and environments. Another prominent pseudogene is GULO (L-gulonolactone oxidase), inactivated by mutations in the coding sequence, which led to the loss of endogenous vitamin C synthesis in the common ancestor of haplorhine primates approximately 60 million years ago.

Among other mammals, the beta-globin cluster contains one of the earliest identified pseudogenes, Hbb-bh1, described as a non-functional relative of the functional beta-globin genes with sequence defects that prevent proper transcription and translation. Early globin pseudogenes of this kind provided initial evidence of how pseudogenes form and highlighted their role as genomic relics of active genes.

In plants, polyploidy events often result in elevated pseudogene numbers, as redundant copies accumulate disabling mutations. For instance, in bread wheat (Triticum aestivum), a hexaploid arising from hybridization and genome duplication, numerous disease resistance genes, particularly of the nucleotide-binding site leucine-rich repeat (NBS-LRR) type, have pseudogenized after polyploidization, contributing to over 80% pseudogene content in some resistance gene clusters. These pseudogenes reflect loss of selective pressure on redundant copies in the expanded genome.

Non-mammalian eukaryotes also feature pseudogenes from duplication events, as seen in Drosophila species. In the repleta group of Drosophila, the alcohol dehydrogenase (Adh) locus underwent duplications that produced pseudogenes such as Adh-ψ, which evolved neutrally after acquiring inactivating mutations, illustrating how tandem duplications can generate non-functional copies that accumulate changes at a rate consistent with relaxed purifying selection.

A notable functional example is the Oct4 pseudogene (Oct4-pg1), a processed pseudogene that produces a transcript reported to act as a microRNA sponge regulating the parent Oct4 gene, thereby influencing pluripotency and self-renewal. This demonstrates how some pseudogenes can acquire regulatory roles despite originating as inactive copies.

Prokaryotic Examples

Pseudogenes in prokaryotes, including and , are typically fewer in number and shorter in length than those in eukaryotes, reflecting the compact nature of prokaryotic that favor efficient replication and strong selection against non-essential sequences. These pseudogenes often arise from processes such as (HGT), where acquired fail to integrate functionally, or from phage integration events that introduce defective copies into the . In a comprehensive survey of 64 prokaryotic , approximately 7,000 candidate pseudogenes were identified, comprising 1-5% of all gene-like sequences, with pseudogenes showing anomalous codon usage indicative of HGT origins in over 19% of cases—more than twice the rate for functional . Unlike the more persistent pseudogenes in eukaryotic , prokaryotic pseudogenes exhibit rapid turnover, often being deleted shortly after formation due to selective pressures maintaining genome streamlining. Prevalence varies significantly by lifestyle, with higher proportions observed in pathogenic and obligate intracellular , where reduction accompanies . For instance, obligate intracellular pathogens like Rickettsia prowazekii harbor up to 24% pseudogenes, while Mycobacterium leprae contains around 36.5% of its as pseudogenes, reflecting decay in non-essential genes during intracellular evolution. In contrast, free-living maintain lower levels, around 1-3%. Pathogenic strains generally show elevated pseudogene frequencies (about 3.9% versus 3.3% in non-pathogens), potentially contributing to modulation through . A notable example is , where pseudogenes constitute approximately 1-2% of the but play a key role in pathogenicity islands, contributing to and within the M. tuberculosis complex. Analysis of multiple strains revealed that in silico-predicted pseudogenes, often resulting from frameshifts or insertions, enhance population-level variation and are transcribed at levels suggesting residual regulatory functions. 
In Escherichia coli, laboratory evolution experiments demonstrate pseudogene dynamics, such as the repair of frameshifted pseudogenes under selective pressure, which enables adaptation to new carbon sources; for instance, in long-term cultures, mutations restore function to pseudogenes such as dcuS, facilitating fumarate utilization. Evolutionarily, prokaryotic pseudogenes often represent temporary inactivation during environmental adaptation, providing a reservoir for potential reactivation or serving as neutral markers of recent HGT events, but unlike the stable pseudogene pools of eukaryotes they are swiftly purged. This transience supports rapid genome streamlining: pseudogenization rates are higher in adapting populations, but deletions dominate over time.

Recent studies from 2023-2024 highlight pseudogenes' involvement in bacterial evolution, including their role in mobility transitions that influence gene dissemination; for example, phylogenetic analyses of prokaryotic pangenomes show pseudogenes accumulating in accessory genomic elements, aiding adaptation to selective pressures such as antibiotic exposure.
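
The frameshifts and premature stop codons described above as typical signatures of in silico-predicted pseudogenes can be illustrated with a minimal Python sketch. It is not the method used by any study cited here: the function name `disabling_features` and the toy sequences are hypothetical, and the check assumes the candidate is already trimmed to start at the expected start codon and that only the three standard stop codons apply. Real annotation pipelines align candidates to the parent gene or protein rather than comparing lengths.

```python
# Illustrative only: a toy check for two classic disabling features.
# Assumes the standard stop codons and a candidate read 5'->3' from its start codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def disabling_features(candidate: str, parent_length: int) -> list[str]:
    """Return the disabling features found in a candidate coding sequence.

    candidate     -- DNA sequence of the gene-like candidate
    parent_length -- length (nt) of the intact parent coding sequence
    """
    seq = candidate.upper()
    features = []

    # A net insertion/deletion that is not a multiple of 3 shifts the reading frame.
    if (len(seq) - parent_length) % 3 != 0:
        features.append("frameshift")

    # Scan in-frame codons, skipping the final complete codon,
    # where a stop codon is expected in an intact gene.
    last_codon_start = len(seq) - len(seq) % 3 - 3
    for i in range(0, last_codon_start, 3):
        if seq[i:i + 3] in STOP_CODONS:
            features.append(f"premature stop at codon {i // 3 + 1}")
            break

    return features


parent = "ATGGCTGAATTTTAA"        # ATG GCT GAA TTT TAA (toy parent gene)
frameshifted = "ATGGCTGATTTTAA"   # one base deleted from the third codon
nonsense = "ATGGCTTAATTTTAA"      # third codon mutated to a stop (TAA)

print(disabling_features(parent, len(parent)))
print(disabling_features(frameshifted, len(parent)))
print(disabling_features(nonsense, len(parent)))
```

A frameshift reversion of the kind described for dcuS would, in this toy model, correspond to a compensating insertion or deletion that brings the length difference back to a multiple of three.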

References

  1. [1]
    Biochemistry, Pseudogenes - StatPearls - NCBI Bookshelf - NIH
    Pseudogenes are universal and plentiful within genomes. They originate from the decay of duplicated genes throughout evolution and resemble functional genes ...
  2. [2]
    Pseudogenes: Pseudo-functional or key regulators in health and ...
    Pseudogenes have long been labeled as “junk” DNA, failed copies of genes that arise during the evolution of genomes. However, recent results are challenging ...
  3. [3]
    Not so pseudo anymore: pseudogenes as therapeutic targets - PMC
    Pseudogenes are junk DNA gene remnants generated by inactivating mutations or the loss of regulatory sequences, often following gene duplication or ...
  4. [4]
    The GENCODE pseudogene resource - PMC - PubMed Central
    Pseudogenes are defined as defunct genomic loci with sequence similarity to functional genes but lacking coding potential due to the presence of disruptive ...
  5. [5]
    Pseudogenes and Their Genome-Wide Prediction in Plants - PMC
    Nov 28, 2016 · A pseudogene is generally defined as a defective paralogous copy of a functional gene (“parent gene” or “cognate gene”) that has lost its ...
  6. [6]
    Systematic functional interrogation of human pseudogenes using ...
    Aug 23, 2021 · The human genome encodes over 14000 pseudogenes that are evolutionary relics of protein-coding genes and commonly considered as ...
  7. [7]
    Human Release Statistics - GENCODE
    Pseudogenes, 14701. - processed pseudogenes, 10638. - unprocessed pseudogenes, 3536. - unitary pseudogenes, 290. Immunoglobulin/T-cell receptor gene segments.
  8. [8]
    Pseudogenes in the ENCODE regions: Consensus annotation ... - NIH
    Pseudogenes are usually defined as defunct copies of genes that have lost their potential as DNA templates for functional products (Vanin 1985; Mighell et al.
  9. [9]
    Evolution and function of developmentally dynamic pseudogenes in ...
    Nov 8, 2022 · Pseudogenes are defined as genomic regions that resemble functional genes, contain gene-disabling mutations, and lack regulatory elements ...
  10. [10]
    Estimate of the mutation rate per nucleotide in humans - PMC - NIH
    The average mutation rate was estimated to be approximately 2.5 x 10(-8) mutations per nucleotide site or 175 mutations per diploid genome per generation.
  11. [11]
    Human LINE retrotransposons generate processed pseudogenes
    Here we show that the human LINE retrotransposons, which transpose through the reverse transcription of their own transcript, can also mobilize transcribed DNA ...
  12. [12]
    Processed pseudogenes: A substrate for evolutionary innovation
    Sep 27, 2021 · LINE-1 retrotransposons can mobilise mRNAs in trans to form processed pseudogenes in new genomic locations. Over time, processed pseudogenes ...
  13. [13]
    Millions of Years of Evolution Preserved: A Comprehensive Catalog ...
    Here we report the identification of ∼8000 high-confidence processed pseudogenes in the human genome, which originate from ∼2500 distinct functional ...
  14. [14]
    Splitting pairs: the diverging fates of duplicated genes - Nature
    Nov 1, 2002 · Here, we review the mechanisms that lead to retention versus loss of duplicated genes and consider the broader implications at both a genetic and an ...
  15. [15]
    Evolution of olfactory receptor genes in the human genome - PNAS
    Sep 24, 2003 · Olfactory receptor (OR) genes form the largest known multigene family ... These genes appear to have been generated by tandem gene duplication.
  16. [16]
    Vertebrate pseudogenes - ScienceDirect.com
    Feb 25, 2000 · Pseudogenes are common and are encountered in a diverse range of life forms, but particularly vertebrates. Genome complexity has evolved by the ...
  17. [17]
    Identification and analysis of unitary pseudogenes - PubMed Central
    Novel human pseudogenes are identified that had previous functionality and their age is estimated. The rate of loss-of-function occurred uniformly.
  18. [18]
    Decreased Transcription Factor Binding Levels Nearby Primate ...
    Two classic examples of unitary pseudogenes in human are urate oxidase, Uox, and l-gulonolactone oxidase, Gulo, which are functional in the livers of most ...
  19. [19]
    Polymorphic pseudogenes in the human genome - NIH
    Nov 2, 2024 · In this study, we focus on polymorphic pseudogenes, a unique and relatively unexplored type of pseudogene whose inactivating mutations have not yet been fixed ...
  20. [20]
    Coding, or non-coding, that is the question | Cell Research - Nature
    Jul 25, 2024 · Processed pseudogenes, which represent the most abundant class, derive from a retrotransposition event. They do not contain introns, are located ...
  21. [21]
    Structural characterization and duplication modes of pseudogenes ...
    Mar 5, 2021 · We compared the relative importance of whole genome, tandem, proximal, transposed and dispersed duplication modes in the pseudo and functional gene complements.
  22. [22]
    Transcriptional activity and strain-specific history of mouse ... - Nature
    Jul 29, 2020 · By definition, pseudogenes are DNA sequences that contain disabling mutations rendering them unable to produce a fully functional protein.
  23. [23]
    The reconstruction of evolutionary dynamics of processed ... - PNAS
    Oct 28, 2024 · Retrotransposons, epigenetically silenced, highly repetitive virus-like elements, constitute nearly half of mammalian genomes.
  24. [24]
    Detecting non-allelic homologous recombination from high ...
    Apr 8, 2015 · In particular, notice that 381 genes and 12 pseudogenes genes were duplicated completely, and 19 novel genes were formed via fusion.
  25. [25]
    Pseudogenes: Implications in Disease and Diagnostics
    Only 1-2% of the mammalian genome codes for the protein ... The human genome has numerous pseudogenes, of which approximately 10,000 to 20,000 are known.
  26. [26]
  27. [27]
    [PDF] "Pseudogenes and Their Evolution". In - University of Michigan
    Nov 15, 2010 · Although the total number of detected pseudogenes varies among vertebrates, the number of processed pseudogenes is usually approximately three ...
  28. [28]
    Two Rounds of Whole Genome Duplication in the Ancestral Vertebrate
    The hypothesis that the relatively large and complex vertebrate genome was created by two ancient, whole genome duplications has been hotly debated, ...
  29. [29]
    Duplicate gene evolution and expression in the wake of vertebrate ...
    Feb 8, 2008 · To study events that trigger duplicate gene persistence after whole genome duplication in vertebrates, we have analyzed molecular evolution ...
  30. [30]
    The subordinate role of pseudogenization to recombinative deletion ...
    Jul 9, 2025 · Pseudogenes are typically defined as genomic sequences that resemble functional protein-coding genes but have lost their function. In this study ...
  31. [31]
    Identification and structural characterization of pseudogenes in ...
    Jul 30, 2025 · Analysing these unitary pseudogenes has elucidated the evolutionary trajectories of closely related species and, in some cases, has provided ...
  32. [32]
    Pseudogenes act as a neutral reference for detecting selection in ...
    Jan 4, 2024 · Our work demonstrates that comparing with pseudogenes can improve inferences of the evolutionary forces driving pangenome variation.
  33. [33]
    Origins of New Genes and Pseudogenes | Learn Science at Scitable
    The abundance of pseudogenes in a given genome usually depends on rates of gene duplication and loss. Mammals appear to have a high number of processed ...
  34. [34]
    Comparative analysis of pseudogenes across three phyla - PNAS
    In human, despite the fact that pseudogenes are almost as numerous as protein-coding genes (4), only 25% of genes have a pseudogene counterpart. Consequently, ...
  35. [35]
    Full article: Pseudogenes are not pseudo any more
    Jan 1, 2012 · In human, about 12,000 DNA sequences show evidence of being pseudogenes (~8,000 processed pseudogenes plus ~4,000 duplicated ...
  36. [36]
    A coding-independent function of gene and pseudogene ... - Nature
    Jun 24, 2010 · We find that PTENP1 is biologically active as it can regulate cellular levels of PTEN and exert a growth-suppressive role.
  37. [37]
    Epigenetic silencing of Oct4 by a complex containing SUV39H1 and ...
    Jul 9, 2015 · Epigenetic silencing of Oct4 by a complex containing SUV39H1 and Oct4 pseudogene lncRNA ... histone modification and an RNA component. Nat ...
  38. [38]
    Inferring pseudogene–MiRNA associations based on an ensemble ...
    May 31, 2023 · Accumulating evidence shows that pseudogenes can function as microRNAs (miRNAs) sponges and regulate gene expression.
  39. [39]
    Proteomics Can Rise to the Challenge of Pseudogenes' Coding ...
    A pseudogene is a gene homologous to a known protein-coding gene, also called the parental gene, but contains a frameshift and/or deleterious mutations.
  40. [40]
    Pseudogene MAPK6P4-encoded functional peptide promotes ...
    Oct 18, 2023 · We show that pseudogene MAPK6P4 deficiency represses VEGFR2 and VE-cadherin protein expression levels, as well as inhibits the proliferation, migration, ...
  41. [41]
    Are Human Translated Pseudogenes Functional? - PMC - NIH
    Sometimes, a peptide was mapped to multiple pseudogenes. These peptides were removed, resulting in the final dataset of 75 unique translated pseudogenes.
  42. [42]
    Many lncRNAs, 5'UTRs, and pseudogenes are translated and ... - eLife
    Dec 19, 2015 · Using a new bioinformatic method to analyze ribosome profiling data, we show that 40% of lncRNAs and pseudogene RNAs expressed in human cells are translated.
  43. [43]
    Methylation of the PTENP1 pseudogene as potential epigenetic ...
    Jan 22, 2021 · The processed pseudogene PTENP1 is involved in the regulation of the expression of the PTEN and acts as a tumor suppressor in many types of ...
  44. [44]
    PTEN regulates expression of its pseudogene in glioblastoma cells ...
    In addition to PTEN gene, humans and several primates possess processed PTEN pseudogene (PTENP1) that gives rise to long non-coding RNA lncPTENP1-S.
  45. [45]
    AURKAPS1, HERC2P2 and SDHAP1 pseudogenes - PubMed
    Feb 19, 2025 · AURKAPS1 is a potential biomarker for HNSCC patients. This pseudogene is associated with changes in DNA repair, which should be more deeply analyzed in the ...
  46. [46]
    Improved Genetic Characterization of Congenital Adrenal ... - PubMed
    Jun 24, 2024 · Genetic analysis of congenital adrenal hyperplasia (CAH) has been challenging because of high homology between CYP21A2 and its pseudogene ...
  47. [47]
  48. [48]
    Long non‐coding RNAs as key regulators of neurodegenerative ...
    Feb 12, 2025 · LncRNAs influence the formation of protein aggregates by facilitating protein overexpression through the regulation of gene transcription and translation.
  49. [49]
    Gene editing of NCF1 loci is associated with homologous ... - PubMed
    Oct 9, 2024 · CRISPR-based genome editing of pseudogene-associated disorders, such as p47 phox -deficient chronic granulomatous disease (p47 CGD), ...
  50. [50]
    Comparative Genomics Search for Losses of Long-Established ...
    For example, the human-specific loss of CASP12 [58,59] was not identified by our analysis because the latest human genome assembly (NCBI release 36) has the ...
  51. [51]
    Pseudogenes and Their Genome-Wide Prediction in Plants - MDPI
    The majority of pseudogenes are generated from functional progenitor genes either by gene duplication (duplicated pseudogenes) or retro-transposition (processed ...
  52. [52]
    Unusual molecular evolution of an Adh pseudogene in Drosophila.
    The Adh locus in Drosophila species which are members of the repleta group contains products of one or two duplication events. In all species examined to ...
  53. [53]
    Stem cell regulatory function mediated by expression of a ... - PubMed
    ES cell-specific expression of Oct4 regulates stem cell pluripotency and self-renewing state. Although Oct4 expression has been reported in adult tissues during ...