
Intergenic region

Intergenic regions are the non-coding DNA sequences situated between protein-coding genes in a genome, comprising the majority of the DNA in eukaryotic organisms such as humans, where they account for approximately 75% of the total genomic content.^[1] These regions were historically viewed as non-functional "junk DNA," but extensive genomic studies have revealed their critical roles in cellular processes.^[2] A primary function of intergenic regions is to house cis-regulatory elements, including enhancers, silencers, and insulators, which modulate the expression of nearby genes by influencing transcription initiation, activation, or repression.^[2]^[3] Enhancers within these regions can act over long distances, sometimes spanning megabases, to loop and interact with promoters, thereby enabling tissue-specific and developmental-stage-specific gene regulation in eukaryotes.^[4] Additionally, intergenic regions serve as sources for non-coding RNAs, such as long intergenic non-coding RNAs (lincRNAs) and microRNAs (miRNAs), which further fine-tune gene expression through mechanisms like chromatin modification and post-transcriptional regulation.^[5] The study of intergenic regions has transformed genomics, particularly through projects like ENCODE, which demonstrated pervasive transcription across these areas and blurred traditional distinctions between genic and intergenic spaces by identifying widespread functional elements.^[6] In prokaryotes, intergenic regions are shorter and often contain promoters and operators essential for operon regulation, contrasting with the more complex, expansive regulatory landscapes in eukaryotes.^[7] Variations in intergenic sequences contribute to phenotypic diversity, disease susceptibility, and evolutionary adaptation, underscoring their importance beyond mere spacing between genes.^[8]

Definition and Basics

Definition

An intergenic region is a segment of DNA located between two consecutive genes on a chromosome, typically spanning from the transcription termination site of the upstream gene (after its stop codon and associated terminator sequences) to the transcription start site of the downstream gene. Such regions often include the promoter of the downstream gene but exclude its coding sequence beginning at the start codon. They are predominantly non-coding, meaning they do not encode proteins, but they often contain functional elements such as regulatory sequences that can influence nearby gene activity.^[9]^[10]

The concept of intergenic regions emerged prominently during early genome sequencing efforts, particularly with the Human Genome Project's draft sequence published in 2001, which classified approximately 75% of the human genome as intergenic DNA based on initial gene annotations. This terminology evolved from earlier notions of "spacer DNA," a term used in pre-genomics studies since the 1970s to describe non-transcribed sequences separating genes, especially in contexts like ribosomal DNA repeats. The precise delineation of intergenic boundaries became essential for genome annotation as sequencing technologies advanced.^[1]^[11]

A key distinction exists between intergenic and intragenic regions: intergenic sequences lie entirely outside the defined boundaries of any gene, whereas intragenic regions are located within a single gene and include its exons, introns, and associated untranslated regions. For example, in bacterial genomes like Escherichia coli, intergenic regions are typically short, averaging 100-200 base pairs, and often separate genes in operons; in contrast, human intergenic regions vary widely and can extend over megabases between genes. Although these regions can carry regulatory elements that modulate gene expression, they are defined structurally, by their position relative to annotated genes, rather than by function.^[12]^[13]^[14]
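
The intergenic/intragenic distinction described above reduces to a coordinate lookup once gene boundaries are fixed. The following is a minimal sketch, assuming a toy annotation of non-overlapping genes on a single chromosome; the gene names, coordinates, and the `classify` helper are hypothetical and not taken from any real annotation.

```python
from bisect import bisect_right

# Hypothetical toy annotation: (start, end, name), 1-based inclusive,
# sorted by start and assumed non-overlapping.
GENES = [(1_000, 2_500, "geneA"), (4_000, 9_000, "geneB"), (15_000, 16_200, "geneC")]
STARTS = [g[0] for g in GENES]

def classify(pos):
    """Return ('intragenic', gene) or ('intergenic', (left_gene, right_gene))."""
    i = bisect_right(STARTS, pos) - 1      # last gene starting at or before pos
    if i >= 0 and pos <= GENES[i][1]:
        return ("intragenic", GENES[i][2])
    left = GENES[i][2] if i >= 0 else None
    right = GENES[i + 1][2] if i + 1 < len(GENES) else None
    return ("intergenic", (left, right))

print(classify(1_500))   # inside geneA
print(classify(3_000))   # between geneA and geneB
```

Real annotations contain overlapping and nested genes on both strands, so a production lookup would typically use an interval tree rather than a single sorted list.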

Genomic Context

Intergenic regions represent the non-genic portions of the genome, situated between annotated genes, and constitute the majority of genomic space in many eukaryotes. In the human genome, these regions encompass approximately 75% of the total sequence, with protein-coding exons accounting for only about 1.1% and introns covering roughly 24%. This distribution underscores the predominance of non-coding DNA, where intergenic sequences form the bulk outside of transcribed gene units. In contrast, prokaryotic genomes exhibit much higher coding densities; for example, in Escherichia coli K-12, intergenic regions comprise about 12% of the 4.6 Mb genome, while protein-coding sequences occupy around 88%. These proportions highlight fundamental differences in genome organization, with eukaryotic genomes expanded by extensive non-coding elements compared to the compact structure in bacteria.

Intergenic regions are positioned adjacent to key regulatory elements such as promoters, which initiate transcription at the 5' end of genes, terminators that signal transcription cessation at the 3' end, and enhancers that modulate gene expression from potentially distant sites. While primarily defined by the absence of coding or functional gene sequences, intergenic areas may harbor pseudogenes (non-functional gene copies) or repetitive elements like transposons, though these are distinguished from core intergenic space through annotation processes. This adjacency facilitates the integration of intergenic sequences into broader genomic architecture, influencing spatial organization and potential interactions with nearby functional units.^[15]

Identifying intergenic regions poses significant annotation challenges, primarily due to the reliance on genome assembly and gene prediction algorithms that delineate gene boundaries with varying accuracy. Tools like GENSCAN employ probabilistic models to predict gene structures based on sequence features such as codon usage, splice site signals, and exon-intron compositions, thereby defining intergenic regions as the residual non-predicted segments. However, inaccuracies in predicting long introns, alternative splicing, or low-expression genes can lead to over- or underestimation of intergenic extents, particularly in complex eukaryotic genomes where repetitive sequences complicate assembly. Advances in long-read sequencing have improved resolution, but bioinformatics pipelines continue to refine these identifications to minimize misannotation.

The non-coding nature of intergenic regions renders them evolutionary hotspots for insertions and deletions (indels), which accumulate at higher rates than in coding sequences due to reduced selective constraints. These structural variants contribute to genome size variation and plasticity across species, as discussed under Evolutionary Dynamics.^[16]
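
Because intergenic space is defined as whatever remains after gene prediction, its estimated extent depends directly on which genes the predictor finds. A minimal sketch of that residual calculation follows, assuming half-open `[start, end)` coordinates on a hypothetical 10,000 bp contig; the function names and intervals are illustrative and do not reproduce GENSCAN's output format.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals [start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]

def intergenic_intervals(genes, genome_length):
    """Complement of the merged gene intervals within [0, genome_length)."""
    gaps, cursor = [], 0
    for start, end in merge_intervals(genes):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < genome_length:
        gaps.append((cursor, genome_length))
    return gaps

def intergenic_fraction(genes, genome_length):
    """Fraction of the sequence not covered by any predicted gene."""
    gaps = intergenic_intervals(genes, genome_length)
    return sum(end - start for start, end in gaps) / genome_length

# Hypothetical predictions; the first two overlap and are merged.
predicted = [(500, 2_000), (1_800, 3_000), (6_000, 8_500)]
missed_one = predicted[:2]   # same contig, but the third gene went undetected

print(intergenic_fraction(predicted, 10_000))    # 0.5
print(intergenic_fraction(missed_one, 10_000))   # 0.75
```

In this toy case a single missed gene raises the apparent intergenic fraction from one half to three quarters, which is the kind of overestimation the paragraph above describes.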

Structural Properties

Composition and Sequence Features

Intergenic regions exhibit distinct nucleotide compositions that differ between prokaryotes and eukaryotes. In eukaryotic genomes, these regions are often AT-rich, with enrichment of homopolymeric poly(dA:dT) tracts that contribute to nucleosome-depleted areas and influence chromatin organization.^[17] In prokaryotic genomes, intergenic regions display variable GC content, typically ranging from 40% to 60% in many bacterial species, and generally lower than that of adjacent coding sequences due to mutational biases and selection pressures.^[18] Additionally, intergenic sequences across both domains frequently harbor repetitive motifs, including microsatellites and transposable elements, which can comprise a substantial portion of non-coding DNA and arise from transposition events or replication slippage.^[19]^[20]

Secondary structures within intergenic DNA arise from sequence features that enable folding into stable conformations. Inverted repeats, common in these regions, have the potential to form hairpin loops or stem-loop structures that affect DNA stability and processing; for instance, in the yeast Saccharomyces cerevisiae, approximately 33.5% of identified inverted repeats are wholly contained within intergenic regions, with clustering near gene 3′ flanks.^[21] These structures can influence local supercoiling and extrusion of cruciforms, though their functional constraints vary by genomic context.^[21]

Boundary markers delineate the starts and ends of intergenic regions, often through specific sequence motifs tied to transcription machinery. In bacteria, rho-independent terminators act as key downstream boundaries, consisting of GC-rich stem-loop hairpins followed by polyuridine (U) tracts in the RNA transcript (corresponding to AT-rich sequences in the DNA) that promote RNA polymerase release and define intergenic onset.^[22] In eukaryotes, poly-A tracts similarly mark gene boundaries, with poly-T sequences precisely at 5′ ends and poly-A at 3′ ends observed in organisms like Dictyostelium discoideum, aiding in transcription termination and chromatin demarcation.^[23]

Detection and characterization of intergenic composition rely on advanced sequencing technologies. Next-generation sequencing (NGS) enables high-resolution mapping of nucleotide profiles and repetitive elements in these regions, revealing their heterogeneity beyond simple AT/GC biases.^[24] The ENCODE project, launched in 2003 with pilot-phase results published in 2007 and genome-wide analyses in 2012, has uncovered hidden complexities in human intergenic sequences through integrated analyses of transcription, chromatin accessibility, and regulatory elements, showing that over 30% of transcribed bases originate from intergenic areas with diverse biochemical signatures.^[24]
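
The compositional features discussed in this section (GC content and inverted repeats capable of forming stem-loops) can be computed directly from sequence. The sketch below is illustrative only: the stem length, loop range, and toy sequence are assumptions, and the exact-match search stands in for the thermodynamic models that real terminator and hairpin predictors use.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def gc_content(seq):
    """Fraction of G and C bases in a DNA string (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]

def find_hairpin_candidates(seq, stem=6, min_loop=3, max_loop=8):
    """Return (left_start, loop_length, stem_seq) for each perfect inverted
    repeat: a stem of `stem` bases followed, after a short loop, by its
    reverse complement."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        left = seq[i:i + stem]
        target = reverse_complement(left)
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if seq[j:j + stem] == target:
                hits.append((i, loop, left))
    return hits

# Hypothetical terminator-like sequence: GC-rich stem, 4-base loop,
# matching stem, then a T tract (the U tract in the transcript).
toy = "ATC" + "GCCGCC" + "TAAT" + "GGCGGC" + "TTTTTTTT"
print(round(gc_content(toy), 3))
print(find_hairpin_candidates(toy))
```

The search reports only perfect repeats; tolerating mismatches or G-U pairing would require scoring candidate stems rather than testing string equality.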

Length and Distribution

Intergenic regions in bacterial genomes are typically compact, with average lengths ranging from 100 to 300 base pairs, reflecting the high gene density and streamlined architecture of prokaryotic chromosomes.^[25] For example, in Escherichia coli, the median intergenic length is approximately 134 base pairs, allowing for efficient packing of essential regulatory elements within limited non-coding space.^[26] In contrast, eukaryotic intergenic regions exhibit much greater variability and scale, often spanning from 1 kilobase to over 1 megabase, with a median intergenic length in the human genome of approximately 48 kilobases.^[27] This expansion accommodates complex regulatory networks and repetitive elements.^[14]

The distribution of intergenic regions across genomes is uneven, influenced by chromatin organization and genome architecture. In eukaryotes, intergenic regions tend to be shorter and more numerous in gene-dense euchromatin, where tight spacing facilitates coordinated gene expression, and longer in heterochromatin, which is gene-poor and enriched for repressive elements.^[28] Genome size further modulates this pattern; viral genomes maintain highly compact intergenic spaces, often with overlapping genes and minimal non-coding DNA to optimize replication efficiency.^[29] Conversely, plant genomes feature expansive intergenic regions due to abundant transposable elements and polyploidy, contributing to their overall larger sizes compared to compact bacterial or viral counterparts.^[30]

Variability in intergenic lengths is shaped by factors such as tandem gene arrangements, which minimize spacing between co-regulated genes. For instance, histone gene clusters in eukaryotes like Drosophila are organized in tandem arrays with short intergenic regions, enabling synchronized replication-dependent expression.^[31] Comparative genomics studies reveal that intergenic lengths generally increase with organismal complexity, correlating with expanded regulatory needs in multicellular lineages, as evidenced by broader intergenic distributions in vertebrates versus prokaryotes.^[32]

To quantify and visualize these lengths and distributions, researchers employ computational tools such as the UCSC Genome Browser, which integrates annotated gene models to display intergenic intervals and chromatin states across species.^[33] This resource allows for precise measurement of region sizes and patterns through interactive tracks, facilitating comparative analyses based on annotated coordinates rather than sequence composition.^[34]
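
Summary figures such as a median intergenic length are derived from the gaps between consecutive annotated genes. A minimal sketch of that calculation is shown here, assuming half-open `[start, end)` gene coordinates on one chromosome; the coordinates are invented for illustration and are not real E. coli or human values.

```python
from statistics import median

def intergenic_lengths(genes):
    """Lengths of the gaps between consecutive genes.

    `genes` is a list of half-open (start, end) intervals on one chromosome;
    adjacent or overlapping genes contribute no gap.
    """
    if not genes:
        return []
    genes = sorted(genes)
    lengths = []
    prev_end = genes[0][1]
    for start, end in genes[1:]:
        if start > prev_end:
            lengths.append(start - prev_end)
        prev_end = max(prev_end, end)
    return lengths

# Hypothetical compact, bacteria-like arrangement.
toy_genes = [(0, 900), (1_000, 2_400), (2_550, 3_100), (3_100, 4_000), (4_400, 5_000)]
gaps = intergenic_lengths(toy_genes)
print(gaps)           # [100, 150, 400]
print(median(gaps))   # 150
```

The median is used instead of the mean because intergenic length distributions are strongly right-skewed, especially in eukaryotes where a few gene deserts span megabases.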

Biological Functions

Regulatory Mechanisms

Intergenic regions harbor a variety of non-coding regulatory elements that orchestrate gene activity by facilitating or inhibiting transcription initiation and elongation. These elements include promoters, enhancers, silencers, and insulators, which interact with transcription factors, co-activators, and repressors to modulate chromatin accessibility and RNA polymerase recruitment.^[35] In eukaryotic genomes, such regions often span thousands of base pairs and enable precise spatiotemporal control of gene expression.^[36]

Core promoters, typically located immediately upstream of transcription start sites in intergenic spaces, serve as platforms for assembling the pre-initiation complex. A classic example is the TATA box, a conserved AT-rich sequence motif situated approximately 20-30 base pairs upstream of the start site, which binds TATA-binding protein (TBP) to initiate transcription.^[37] Distal enhancers, conversely, can reside up to 1 megabase away within intergenic DNA and loop to contact promoters via chromatin folding, thereby boosting transcription rates through mediator complexes and histone acetyltransferases.^[35] These enhancers are enriched in intergenic regions and exhibit tissue-specific activity, as demonstrated by genome-wide mapping studies.^[38]

Silencer sequences in intergenic regions act as binding sites for transcriptional repressors, dampening gene expression by recruiting histone deacetylases or blocking activator access. Insulators, often mediated by the CCCTC-binding factor (CTCF), prevent inappropriate enhancer-promoter interactions and delineate chromatin domains; for instance, CTCF sites in intergenic areas block enhancer activity in a directional manner, maintaining spatial organization.^[39] Approximately 50% of CTCF binding sites occur in intergenic regions, underscoring their role in genome topology.^[39]

Epigenetic modifications further fine-tune intergenic regulatory functions. Active intergenic regions, particularly enhancers, are marked by histone H3 lysine 4 monomethylation (H3K4me1), which correlates with open chromatin and transcription factor binding, while H3K27 acetylation enhances accessibility.^[40] In contrast, DNA methylation at CpG islands within intergenic promoters represses transcription by inhibiting transcription factor binding and promoting nucleosome compaction; hypomethylation in these areas facilitates activation.^[41]

Notable examples illustrate these mechanisms. In Escherichia coli, the intergenic region of the lac operon contains operator sites that bind the LacI repressor, blocking RNA polymerase progression and regulating lactose-inducible transcription.^[42] In humans, the beta-globin locus control region (LCR), an intergenic hypersensitive site cluster upstream of the gene cluster, coordinates erythroid-specific expression by integrating enhancers and insulators, including CTCF-bound elements at HS5.^[43]
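
The positional description of the TATA box lends itself to a simple motif scan upstream of a known transcription start site (TSS). The following is a rough sketch under stated assumptions: the pattern `TATA[AT]A[AT]` is a simplified consensus, the 20-35 bp window (measured to the first base of the motif) is an illustrative choice, and the sequence is synthetic. Many real promoters lack a recognizable TATA box, so a miss here says little about promoter activity.

```python
import re

# Simplified TATA-box consensus; the lookahead allows overlapping matches.
TATA = re.compile(r"(?=(TATA[AT]A[AT]))")

def tata_hits(seq, tss, window=(20, 35)):
    """Return (bp_upstream, motif) for TATA-like motifs whose first base lies
    between window[0] and window[1] bp upstream of the 0-based index `tss`."""
    seq = seq.upper()
    lo, hi = window
    hits = []
    for m in TATA.finditer(seq):
        upstream = tss - m.start(1)
        if lo <= upstream <= hi:
            hits.append((upstream, m.group(1)))
    return hits

# Synthetic promoter: 20 bp filler, a TATA-like motif, 23 bp filler, then the TSS.
promoter = "GC" * 10 + "TATAAAA" + "GC" * 11 + "G" + "ACGT"
print(tata_hits(promoter, tss=50))   # [(30, 'TATAAAA')]
```

Position weight matrices scored against background nucleotide frequencies are the more usual approach in practice; the regular expression is used here only to keep the example short.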

Involvement in Gene Expression

Intergenic regions play a critical role in transcription initiation by serving as platforms for RNA polymerase II (Pol II) recruitment, often through the integration of regulatory elements that facilitate the assembly of pre-initiation complexes. For instance, transcribed intergenic enhancers exhibit Pol II occupancy and nascent transcription, enabling precise recruitment at distal sites to initiate gene expression in a tissue-specific manner.^[44] These regions can also harbor pausing sites where Pol II accumulates shortly after initiation, allowing for regulatory control before productive elongation; such pausing is mediated by conserved DNA sequence motifs in intergenic areas, influencing the timing and efficiency of transcription across metazoan genomes.^[45]

In addition to initiation, intergenic regions contribute to transcriptional elongation by providing sequences that modulate Pol II processivity and prevent interference between adjacent genes. RNA polymerase mapping studies in plants have identified bidirectional transcription in intergenic zones that influences elongation, highlighting how these non-coding areas help ensure coordinated expression of neighboring loci.^[46]

Intergenic regions enable alternative promoter usage, which generates tissue-specific mRNA isoforms, particularly in disease contexts like cancer. Distal CpG islands within intergenic spaces can act as alternative promoters, driving the expression of protein isoforms with distinct functional properties; for example, in colorectal cancer, such intergenic promoters produce isoforms of genes like HNF4A that promote tumor progression through altered transcriptional regulation.^[47] Tumor-specific alternative transcription start sites in intergenic regions have been observed in prostate cancer, where they lead to isoform switching that enhances oncogenic signaling via genes such as TCF12.^[48]

Regarding mRNA processing and stability, 3' UTR-proximal intergenic elements influence polyadenylation signals by harboring cryptic processing sites that affect cleavage and poly(A) tail addition. In histone genes, transcription extending into 3' intergenic DNA creates cryptic polyadenylation sites downstream of the mature 3' end, which, if utilized, destabilize the mRNA by altering its processing and export efficiency.^[49] These intergenic features can thus modulate mRNA decay rates, ensuring rapid turnover during cell cycle regulation.

Experimental evidence from CRISPR-based editing studies since 2012 demonstrates how intergenic deletions alter gene expression levels, particularly at GWAS-identified loci for diseases. For example, CRISPR-Cas9 editing of an intergenic regulatory region near EPDR1, associated with bone mineral density via GWAS, confirmed its role in modulating target gene expression and disease risk.^[50] Similarly, targeted CRISPR activation at schizophrenia-linked non-coding GWAS signals has shown upregulation of nearby genes like FOXO3, linking these intergenic variants to altered expression profiles in neuronal models.^[51] Such studies underscore the functional impact of intergenic variants on expression without coding changes.

Variations Across Organisms

In Prokaryotes

In prokaryotes, intergenic regions are characteristically compact, typically ranging from 50 to 500 base pairs in length, reflecting the streamlined architecture of bacterial and archaeal genomes that prioritizes coding efficiency. These short spacers often separate genes within operons or divergent gene pairs and frequently harbor bidirectional promoters, enabling coordinated transcription of adjacent genes in opposite directions from a shared regulatory element. This arrangement facilitates rapid gene expression control in response to environmental cues, as seen in many bacterial species where divergent operons share promoter sequences to optimize resource use in nutrient-limited conditions.^[52]^[53]

Specific structural features within these intergenic regions include Rho-dependent terminators and transcription attenuators, which play critical roles in fine-tuning gene expression. Rho-dependent terminators, located primarily at the 3' ends of genes in intergenic spaces, facilitate the release of RNA polymerase when the Rho factor binds nascent RNA lacking strong secondary structure, thereby preventing read-through into downstream regions and recycling transcription machinery. A prominent example is the attenuator in the trp operon leader sequence of Escherichia coli, a 162-base-pair intergenic region upstream of the structural genes that forms alternative RNA hairpins to modulate transcription based on tryptophan availability, either terminating early under high tryptophan levels or allowing full operon expression when levels are low. Additionally, intergenic regions serve as hotspots for horizontal gene transfer, where mobile elements like insertion sequences integrate, promoting genetic exchange and adaptation in dynamic microbial environments. Recent studies as of 2025 have also revealed that intergenic regions in bacteria, such as those in Enterobacteriaceae, encode numerous small proteins (microproteins), contributing to a previously unexplored functional landscape.^[54]^[55]^[56]^[57]

Functionally, the intergenic sequences flanking antibiotic resistance genes are often associated with mobile elements such as transposons and integrons, which enable the dissemination of these genes across bacterial populations. For instance, in pathogens like Pseudomonas aeruginosa, intergenic insertions of mobile elements near resistance loci, such as those encoding beta-lactamases, allow rapid acquisition and expression of resistance under selective pressure from antibiotics. Bacterial pangenome analyses further highlight intergenic variability, revealing that these non-coding regions exhibit higher sequence diversity than core genes, contributing to phenotypic differences across strains and facilitating adaptation in diverse ecological niches.^[58]^[59]

Recent metagenomics studies from the 2020s have uncovered intergenic roles in microbial community dynamics, particularly in encoding regulatory elements for quorum sensing signals that coordinate behaviors like biofilm formation and virulence. In wastewater treatment microbiomes, for example, functional screens identified acyl-homoserine lactone (AHL) quorum sensing genes within intergenic contexts, enabling density-dependent communication among diverse bacteria to enhance collective resilience against stressors. These findings underscore how intergenic variability drives community-level interactions in complex environments.^[60]

In Eukaryotes

In eukaryotes, intergenic regions are typically much larger and more structurally diverse than in prokaryotes, often comprising vast stretches of non-coding DNA that play critical roles in gene regulation and genome architecture. These regions, sometimes referred to as intergenic deserts, can span hundreds of kilobases and serve as reservoirs for regulatory elements that modulate gene expression across developmental and environmental contexts. Unlike the compact operon-adjacent spacers in bacteria, eukaryotic intergenic spaces enable complex, long-range interactions essential for multicellularity. Recent advances as of 2024 have further elucidated enhancer-promoter specificity in these regions, involving phase separation in super-enhancers to drive precise gene activation.^[61]^[62]

A prominent feature of eukaryotic intergenic regions is their capacity to harbor long non-coding RNAs (lncRNAs), which are transcribed from intergenic loci and influence chromatin states and transcriptional programs, particularly in developmental genes. For instance, many lncRNAs act as scaffolds for protein complexes or as decoys for transcription factors, thereby fine-tuning the expression of nearby genes involved in cell differentiation. Additionally, large intergenic deserts frequently contain super-enhancers, which are clusters of enhancers bound by high densities of transcription factors and mediators, driving robust, cell-type-specific activation of developmental genes such as those in the HOX clusters. These super-enhancers often produce enhancer RNAs (eRNAs) that stabilize chromatin loops and amplify signaling.^[63]^[64]

Intergenic regions also contribute significantly to chromatin organization within topologically associating domains (TADs), which are self-interacting chromatin segments averaging 100 kb to 1 Mb in size that compartmentalize the genome in three dimensions. Boundaries of TADs often reside in intergenic spaces enriched with insulators like CTCF-binding sites, which prevent ectopic enhancer-promoter contacts and maintain stable 3D genome folding essential for coordinated gene regulation. Disruptions in these intergenic elements can lead to misfolding and diseases such as congenital disorders, underscoring their role in preserving spatial genome integrity across eukaryotic species.^[65]^[66]

In plant genomes, intergenic regions are frequently dominated by transposable elements (TEs), which constitute up to 85% of the non-coding space in species like maize and wheat, facilitating adaptation through epigenetic silencing and insertion-induced variability. These TEs can mobilize under stress, altering nearby gene expression and promoting traits such as drought resistance or flowering time shifts, as observed in Arabidopsis thaliana populations. In animal models, such as Drosophila melanogaster, intergenic regions host Polycomb response elements (PREs) that recruit Polycomb repressive complexes to silence developmental genes, ensuring stable epigenetic memory during embryogenesis. These PREs, often spanning 1-2 kb, integrate signals from multiple transcription factors to maintain Hox gene repression patterns.^[67]^[68]^[69]

Recent advances in single-cell sequencing technologies since 2015 have revealed cell-type-specific transcriptional activity within human intergenic regions, highlighting dynamic enhancer and lncRNA expression that varies across tissues and states. For example, single-cell RNA sequencing of immune cells has identified hundreds of intergenic lncRNAs uniquely upregulated in subsets like T-helper cells, modulating immune responses, while chromatin accessibility assays show tissue-specific opening of intergenic super-enhancers in brain neurons versus hepatocytes. These findings emphasize the heterogeneity of intergenic contributions to cellular identity in humans.^[70]^[71]

Evolutionary Dynamics

Conservation Patterns

Functional intergenic elements, such as promoters and enhancers, exhibit higher levels of sequence conservation compared to neutral spacers due to their regulatory roles. PhastCons scores, which estimate for each nucleotide the probability (from 0 to 1) that it lies within a conserved element, are notably elevated in these functional regions; for instance, robust cis-regulatory elements in intergenic DNA average around 0.27, while random neutral sequences score approximately 0.03.^[72] In contrast, protein-coding exons display much higher conservation, with average PhastCons scores of about 0.65.^[72] This disparity underscores the purifying selection acting on functional non-coding sequences to maintain regulatory integrity.^[73]

Selective pressures on intergenic regions vary by function, with strong negative selection preserving regulatory motifs essential for gene control. Transcription factor binding sites and other motifs in intergenic DNA show reduced polymorphism and divergence, indicative of purifying selection, even for moderate-affinity sites.^[74] Conversely, certain intergenic regions, particularly those involved in immune responses, experience positive selection; comparisons between human and chimpanzee genomes reveal accelerated evolution in non-coding sequences near pathogen recognition genes, such as those in the MHC region, adapting to selective pressures from infectious agents.^[75]

Comparative genomic alignments across mammals, such as those generated by Ensembl's multi-species pipelines, demonstrate that intergenic regions retain roughly 20-30% sequence identity on average, far lower than the near 100% conservation observed in orthologous exons.^[76] These alignments highlight conserved non-coding elements (CNCs) as discrete, highly preserved segments within otherwise variable intergenic space, often comprising less than 5% of total non-coding DNA but showing exon-like constraint levels.^[77]

Phylogenetic footprinting has been a key method for identifying CNCs, leveraging cross-species alignments to detect evolutionarily stable non-coding sequences likely harboring regulatory functions. This approach, applied to vertebrate genomes, uncovers footprints of conservation in intergenic regions that align poorly overall but contain motif-rich cores under selection.^[78] Recent 2020s pan-genome projects, including the Human Pangenome Reference Consortium, have refined these insights by incorporating structural variants across diverse populations, revealing that conserved intergenic elements exhibit low variability even in non-reference assemblies, thus updating CNC catalogs with greater resolution.^[79]
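
Comparisons of average conservation between element classes, and the identification of conserved blocks within variable intergenic sequence, both start from a per-base score track. The sketch below assumes such a track is already available as a plain list of values between 0 and 1; the scores, threshold, and minimum length are invented for illustration. The run-finding step is a simple threshold heuristic, not the hidden Markov model that PhastCons itself uses.

```python
from statistics import mean

def mean_score(scores, start, end):
    """Mean per-base score over the half-open interval [start, end)."""
    return mean(scores[start:end])

def conserved_blocks(scores, threshold=0.5, min_len=3):
    """Return [start, end) runs in which every base scores >= threshold."""
    blocks, run_start = [], None
    # A sentinel below any real score closes a run that reaches the end.
    for i, s in enumerate(list(scores) + [float("-inf")]):
        if s >= threshold and run_start is None:
            run_start = i
        elif s < threshold and run_start is not None:
            if i - run_start >= min_len:
                blocks.append((run_start, i))
            run_start = None
    return blocks

# Hypothetical per-base conservation scores along a short intergenic stretch.
track = [0.02, 0.05, 0.01, 0.90, 0.95, 0.80, 0.85, 0.03, 0.04, 0.60, 0.70, 0.02]
print(conserved_blocks(track))           # [(3, 7)]
print(round(mean_score(track, 3, 7), 3)) # 0.875
print(round(mean_score(track, 0, 3), 3)) # 0.027
```

The two-base run at positions 9-10 is discarded by `min_len`, which mirrors the common practice of ignoring very short conserved stretches that are likely to arise by chance.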

Role in Genome Evolution

Intergenic regions serve as mutation hotspots, exhibiting elevated rates of single nucleotide polymorphisms (SNPs) and insertions/deletions (indels) compared to coding sequences, which primarily drive neutral evolution by allowing genetic variation to accumulate without immediate fitness consequences.^[80] These non-coding areas, often comprising repetitive elements and low-complexity sequences, experience indel mutation rates that are approximately 10% of SNP rates but contribute significantly to genomic diversity through neutral drift.^[81] For instance, genome-wide analyses reveal more high-frequency SNV and indel hotspots in intergenic spaces than predicted by background mutation models, underscoring their role in facilitating evolutionary flexibility.^[82]

Intergenic duplications exemplify how these regions contribute to the emergence of novel genes, particularly in the evolution of olfactory receptor (OR) families. Large-scale, multi-chromosomal duplications originating from intergenic segments have expanded the human OR gene repertoire, with hundreds of copies arising through tandem and segmental events that initially reside in non-coding contexts before potential co-option into functional roles.^[83] In primates, such duplications have driven the diversification of OR genes, enabling adaptive responses to environmental olfactory cues via subsequent positive selection on duplicated variants.^[84]

Adaptive evolution also leverages regulatory variation outside coding sequence, as seen in the lactase persistence trait in humans, where a SNP (rs4988235, -13910*T) in an enhancer element approximately 14 kb upstream of the LCT gene arose around 10,000 years ago in pastoralist populations. Although this enhancer technically lies within an intron of the neighboring MCM6 gene, it acts on LCT as a distal regulatory element in the same way as intergenic enhancers. The variant enhances LCT expression post-weaning, conferring a selective advantage in dairy-consuming societies and demonstrating how non-coding mutations can rapidly fix under positive selection.^[85]

Similarly, intergenic regions act as breakpoints for genome rearrangements, including chromosomal inversions and translocations, which reorganize gene order and promote speciation by suppressing recombination within inverted segments.^[86] Evidence from comparative genomics, including reconstructions of ancestral mammalian genomes, shows that many inversion endpoints localize to large intergenic intervals to minimize gene disruption, thereby facilitating structural evolution.^[87]

Theoretical frameworks, building on Motoo Kimura's neutral theory of molecular evolution proposed in 1968, have been adapted post hoc to explain intergenic drift, positing that most non-coding mutations are selectively neutral and fixed by random genetic drift rather than adaptive forces.^[88] Subsequent developments, such as extensions to eukaryotic non-coding sequences in the 1970s and beyond, highlight how intergenic regions embody nearly neutral evolution, where slightly deleterious variants accumulate at rates governed by population size and drift, contrasting with stronger selection in coding areas.^[89] This model underscores the intergenic contribution to long-term genomic fluidity without compromising essential functions.
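
The neutral expectation invoked above can be made explicit with the standard textbook derivation of the neutral substitution rate. The symbols are introduced here for illustration and do not appear in the cited sources: N is the diploid population size, μ the neutral mutation rate per site per generation, and k the rate at which new mutations reach fixation.

```latex
% New neutral mutations arising per site per generation in a diploid population:
%   2N\mu
% Probability that any one new neutral mutation eventually fixes by drift:
%   1/(2N)
\[
  k \;=\; \underbrace{2N\mu}_{\text{new mutations}} \times
          \underbrace{\frac{1}{2N}}_{\text{fixation probability}}
    \;=\; \mu
\]
```

Under strict neutrality the population size cancels, so the substitution rate equals the mutation rate. For slightly deleterious variants the fixation probability falls below 1/(2N) and depends on N, which is why the nearly neutral model ties intergenic divergence to population size.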

References

[1]
Intergenic Regions
Intergenic regions are the stretches of DNA located between genes. In humans, intergenic regions are non-protein-coding and comprise a large majority of the ...
[2]
What is noncoding DNA?: MedlinePlus Genetics
Jan 19, 2021 · Noncoding DNA contains sequences that act as regulatory elements, determining when and where genes are turned on and off.
[3]
Extended intergenic DNA contributes to neuron-specific expression ...
May 18, 2022 · Intergenic regions contain a large number of cis-regulatory DNA elements, such as enhancers, which perform a variety of functions leading to ...
[4]
Classification of human genomic regions based on experimentally ...
For instance, enhancers can be as far as one mega base pairs (1 Mbp) from the target gene in eukaryotes [3], and can be both upstream and downstream of the ...
[5]
The functions and unique features of long intergenic non-coding RNA
LincRNAs and mRNAs can positively or negatively regulate the expression of their own genes, or target other genes, by interacting with chromatin-modifying ...
[6]
3 Characterization of intergenic regions and gene definition - Nature
Jan 1, 2019 · The prevalence and analysis of ENCODE data are changing the definition and characterization of intergenic and genic regions.
[7]
The regulatory content of intergenic DNA shapes genome architecture
Intergenic distance between genes within operons is likely to underestimate the size of DNA used to regulate these genes and this underestimate could contribute ...
[8]
Extended intergenic DNA contributes to neuron-specific expression ...
May 18, 2022 · Intergenic regions contain a large number of cis-regulatory DNA elements, such as enhancers, which perform a variety of functions leading to ...
[9]
Intergenic Region - an overview | ScienceDirect Topics
An intergenic region is defined as a segment of DNA located between two genes, which may contain regulatory elements or SNPs, and is often the focus of ...
[10]
The sequence of the human genome - PubMed
Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. ... Human Genome Project*; Humans; Introns ...
[11]
Spacer DNA - an overview | ScienceDirect Topics
Intergenic spacer regions often show a higher degree of variability than the coding genes, making the former more useful for analyses at a lower taxonomic ...
[12]
Definition of intragenic and intergenic regions - Bio-protocol
An integration site was defined as being located in the intragenic region if the annotated integration site is located within the gene body of any ...
[13]
Genome-Wide Analyses in Bacteria Show Small-RNA Enrichment ...
A third observation of our study is that the average sizes and distributions of intergenic-region lengths are very similar among the species analyzed, ...
[14]
Genes, pseudogenes, and Alu sequence organization ... - PNAS
There are a total of 44 pairs of (+,+) genes with median intergenic length 28,950 bp. The median intergenic lengths, 35,568 bp, of (−,−) and 28,905 bp of (+,+) ...
[15]
An Integrated Encyclopedia of DNA Elements in the Human Genome
In a pilot phase covering 1% of the genome, the ENCODE project annotated 60% of mammalian evolutionarily constrained bases, but also identified many additional ...
[16]
The origin, evolution, and functional impact of short insertion ...
Short insertions and deletions (indels) are the second most abundant form of human genetic variation, but our understanding of their origins and functional ...
[17]
Homopolymer tract organization in the human malarial parasite ...
Oct 3, 2014 · Homopolymeric tracts, particularly poly dA.dT, are enriched within the intergenic sequences of eukaryotic genomes where they appear to act ...
[18]
Genome and sequence determinants governing the expression of ...
Jun 8, 2020 · Bacterial intergenic regions tend to be lower in GC content than ... According to our model, low GC content bacteria have evolved ...
[19]
Repetitive DNA sequence detection and its role in the human genome
Sep 19, 2023 · TRs can be found in intergenic regions and in both the non-coding and coding regions of a variety of genes. Moreover, TRs occur ...
[20]
Transposable Elements as a Source of Novel Repetitive DNA in the ...
A number of studies have provided examples of TE sequences that give rise to new repetitive classes, such as microsatellites, minisatellites, and satellite DNA ...
[21]
The distribution of inverted repeat sequences in the Saccharomyces ...
Hairpins, the most common RNA secondary structural elements, are produced by intramolecular Watson–Crick binding. The DNA sequence encoding an RNA hairpin must ...
[22]
Bacterial Transcription Terminators: The RNA 3′-End Chronicles
Intrinsic termination, sometimes called Rho-independent termination, refers to dissociation of the EC caused solely by interactions of DNA and RNA with RNAP ...
[23]
Unusual combinatorial involvement of poly-A/T tracts in organizing ...
We find that Dictyostelium genes are demarcated precisely at their 5′ ends by poly-T tracts and precisely at their 3′ ends by poly-A tracts.
[24]
An integrated encyclopedia of DNA elements in the human genome
Sep 5, 2012 · Excluding RNA elements and broad histone elements, 44.2% of the genome is covered. Smaller proportions of the genome are occupied by regions of ...
[25]
The Evolution of Bacterial Genome Architecture - Frontiers
May 29, 2017 · Whereas intergenic regions typically constitute 10 ± 5% of a bacterial genome, species subject to drift sometimes can have much greater ...
[26]
Confining euchromatin/heterochromatin territory: jumonji crosses the ...
Heterochromatin is typically highly condensed, gene-poor, and transcriptionally silent, whereas euchromatin is less condensed, gene-rich, and more accessible to ...
[27]
Gene overlapping and size constraints in the viral world
May 21, 2016 · We sought a unified evolutionary explanation that accounts for their genome sizes, gene overlapping and capsid properties.
[28]
The maize genome as a model for efficient sequence analysis of ...
The genomes of flowering plants vary in size from about 0.1 to over 100 gigabase pairs (Gbp), mostly because of polyploidy and variation in the abundance of ...
[29]
Transcription of histone gene cluster by differential core-promoter ...
The 100 copies of tandemly arrayed Drosophila linker (H1) and core (H2A/B and H3/H4) histone gene cluster are coordinately regulated during the cell cycle.
[30]
The size of the genome and the complexity of living beings - Mètode
Feb 25, 2013 · However, in eukaryotes there is no correlation between genome size and the complexity of the organism. This is known as the C-value paradox. The ...
[31]
Genome Browser User's Guide
The Genome Browser offers multiple tools that can correctly convert coordinates between different assembly releases. For more information on conversion tools, ...
[32]
UCSC Genome Browser Table Browser Tutorial
The UCSC Table Browser is a flexible tool for accessing and exporting data from genome browser tracks. This tutorial introduces the Table Browser interface and ...
[33]
Enhancers, gene regulation, and genome organization - PMC
Apr 1, 2021 · Typically located at long genomic distances from their target genes, enhancers may be in upstream or downstream intergenic regions, in intronic ...
[34]
Transcriptional regulation by promoters with enhancer function - NIH
Promoters are located in close proximity to the 5′ end of genes and capable of inducing gene expression.
[35]
Core Promoters in Transcription: Old Problem, New Insights - PMC
The TATA box, TATAA, (TSS), the first core promoter element to be identified, was biochemically found to be located 20–30 bp upstream of the transcription start ...
[36]
Genome-Wide Prediction and Validation of Intergenic Enhancers in ...
Gene expression in eukaryotes is regulated by the orchestrated binding of regulatory proteins to promoters, enhancers, and other cis-regulatory DNA elements ( ...
[37]
CTCF: An Architectural Protein Bridging Genome Topology ... - NIH
Approximately 50% of CTCF binding sites reside within intergenic regions, ~15% are located near promoters and ~40% are intragenic (exons and introns) (Fig.1).
[38]
CpG islands under selective pressure are enriched with H3K4me3 ...
Analyzing thirteen human cell lines, we found H3K4me3, H3K27ac and H3K36me3 enrichment in the CGIs that experienced selective events. Further studies using ...
[39]
The interplay between DNA and histone methylation - PubMed Central
DNA hypomethylation across intergenic regions and DNA hypermethylation at promoter CpG islands have been described in many cancer contexts, independently of ...
[40]
A Novel Molecular Switch - PMC - PubMed Central
The operator of the lac operon, a short stretch of DNA (~17 base pairs), is composed of two nearly identical half sites that is located between the end of the ...
[41]
Intergenic Transcription in the Human β-Globin Gene Cluster - NIH
Several kilobases upstream of the ɛ-globin gene are at least five DNase I-hypersensitive sites (HS1 to HS5) which constitute the locus control region (LCR).
[42]
The landscape of RNA polymerase II transcription initiation in C ...
Based on the overlap of transcription initiation clusters with mapped transcription factor binding sites, we define 2361 transcribed intergenic enhancers.
[43]
Conserved DNA sequence features underlie pervasive RNA ...
Based on their location, pausing sites were classified into one of four major categories: promoter-proximal, gene-body, antisense or intergenic. For defining ...
[44]
RNA polymerase mapping in plants identifies intergenic regulatory ...
Our results suggest that bidirectional transcription can identify intergenic genomic regions in plants that play an important role in transcription regulation.
[45]
Distal CpG islands can serve as alternative promoters to transcribe ...
We further hypothesized that the tissue-specific usage of CGIs as alternative promoters may be regulated by cell-type–specific transcription factors (TFs).
[46]
Tumor-specific usage of alternative transcription start sites in ...
Oct 14, 2011 · Extensive alternative splicing and dual promoter usage generate Tcf-1 protein isoforms with differential transcription control properties.
[47]
Expression of mouse histone genes: transcription into 3' intergenic ...
Expression of mouse histone genes: transcription into 3' intergenic DNA and ... mRNA production and stability in serum-stimulated mouse 3T6 fibroblasts.
[48]
CRISPR‐Cas9–Mediated Genome Editing Confirms EPDR1 ... - NIH
CRISPR-Cas9 genome editing in the hFOB1.19 cell model supports previous observations, where this regulatory region harboring GWAS-implicated variation operates ...
[49]
From GWAS signal to function: targeted CRISPR activation enables ...
Oct 1, 2025 · Our study demonstrates that activating genomic regions harboring specific non-coding GWAS SNPs can modulate gene expression, suggesting that ...
[50]
Widespread divergent transcription from bacterial and archaeal ...
Bidirectional promoters enable co-regulation of divergent genes and are enriched in both intergenic and horizontally acquired regions. Divergent transcription ...
[51]
Growth Temperature and Genome Size in Bacteria Are Negatively ...
Apr 5, 2013 · Specifically, with increasing habitat temperature and decreasing genome size, the proportion of genomic DNA in intergenic regions decreases.
[52]
Rho directs widespread termination of intragenic and stable RNA ...
We found ≈200 Rho-terminated loci that were divided evenly into 2 classes: intergenic (at the ends of genes) and intragenic (within genes).
[53]
Regulation of Bacterial Gene Expression by Transcription Attenuation
The key elements required for attenuation control of trp operon expression are found in a 162-bp leader region, which is defined as the region between the ...
[54]
Adaptive evolution of hybrid bacteria by horizontal gene transfer - PMC
We conclude that HGT opens windows of positive selection for the subsequent evolution by point mutations; this effect is most pronounced in intergenic regions.
[55]
Mobile Genetic Elements Associated with Antimicrobial Resistance
This review aims to outline the characteristics of the major types of mobile genetic elements involved in acquisition and spread of antibiotic resistance
[56]
A Rapid, Large-Scale Pan-Genome Analysis Tool for Intergenic ...
Apr 1, 2018 · However, despite overwhelming evidence that variation in intergenic regions in bacteria can directly influence phenotypes, most current ...
[57]
Functional metagenomic analysis of quorum sensing signaling in a ...
Oct 28, 2021 · We performed a metagenomic screen for AHL genes in an activated sludge microbial community from the Ulu Pandan wastewater treatment plant (WWTP) in Singapore.
[58]
Long non-coding RNAs: definitions, functions, challenges ... - Nature
Jan 3, 2023 · Most lncRNAs evolve more rapidly than protein-coding sequences, are cell type specific and regulate many aspects of cell differentiation and ...
[59]
Gene regulation by long non-coding RNAs and its biological functions
Dec 22, 2020 · Evidence accumulated over the past decade shows that long non-coding RNAs (lncRNAs) are widely expressed and have key roles in gene regulation.
[60]
Superenhancers as master gene regulators and novel therapeutic ...
Feb 1, 2023 · Superenhancers (SEs), identified as novel epigenetic regulatory elements, are clusters of enhancers with cell-type specificity that can drive the aberrant ...
[61]
Principles of genome folding into topologically associating domains
Apr 10, 2019 · The genome of many species is organized into domains of preferential internal chromatin interactions called “topologically associating domains” (TADs).
[62]
Evolutionary stability of topologically associating domains is ...
Aug 7, 2018 · TADs contribute to gene regulation by restricting chromatin interactions of regulatory sequences, such as enhancers, with their target genes.
[63]
Novel Insights into Plant Genome Evolution and Adaptation as ...
Here, we review some of the most updated examples on the roles of transposable elements (TEs) in plant genome evolution and adaptation through epigenetics ...
[64]
Transposable Elements Contribute to the Adaptation of Arabidopsis ...
Aug 9, 2018 · Our results highlight the importance of variations in TEs for the adaptation of plants in general in the context of rapid global climate change.
[65]
Polycomb Group Response Elements in Drosophila and Vertebrates
In Drosophila, there are specific regulatory DNA elements called Polycomb group response elements (PREs) that bring PcG protein complexes to the DNA. Drosophila ...
[66]
A single-cell atlas of chromatin accessibility in the human genome
Nov 24, 2021 · This rich resource provides a foundation for the analysis of gene regulatory programs in human cell types across tissues, life stages, and organ systems.
[67]
Cell type-specific novel long non-coding RNA and circular RNA in ...
We identified hundreds of novel non-coding RNA genes and showed that the majority have cell type-dependent expression.
[68]
Epigenome and interactome profiling uncovers principles of distal ...
Oct 10, 2025 · In this calculation, the mean PhastCons scores were 0.274 for robust cCREs, 0.179 for E7 segments, 0.648 for exons, and 0.031 for the random ...
[69]
Conservation Track Settings - UCSC Genome Browser
The phastCons scores, by contrast, represent probabilities of negative selection and range between 0 and 1.
[70]
Conservation and regulatory associations of a wide affinity range of ...
We found that not only high affinity binding sites, but also numerous moderate and low affinity binding sites, are under negative selection in the mouse genome.
[71]
Comparative sequencing of human and chimpanzee MHC class I ...
This report describes a large-scale single-contig comparison between human and chimpanzee genomes via the sequence analysis of almost one-half of the ...
[72]
Multiple genome alignments - Ensembl
Multiple alignments are calculated between groups of genomes. These are used to calculate ancestral sequences, age of base, conservation scores and constrained ...
[73]
Genomic Locations of Conserved Noncoding Sequences and Their ...
Mar 26, 2016 · The conservation levels of the CNSs are significantly higher than those of random sequences and lincRNA exons. Purifying selection on CNSs is ...
[74]
Conserved non-coding elements and cis regulation
Apr 1, 2013 · Phylogenetic footprinting. A technique to identify potential CRMs within conserved non-coding sequences through comparison with orthologous ...
[75]
Strong Heterogeneity in Mutation Rate Causes Misleading ...
Given that we focused our analyses on noncoding regions, which are essentially neutrally evolving, these indel hotspots are unlikely to result from selection, ...
[76]
High rate of mutation and efficient removal by selection of structural ...
The inferred SV mutation rate is roughly 10% of the SNV rate and ~30% of the short indel rate, indicating that SVs comprise about 8% of new mutations, or ...
[77]
The landscape and driver potential of site-specific hotspots across ...
May 13, 2021 · Genome-wide we find more high-frequency SNV and indel hotspots than expected given mutational background models. ... neutral somatic mutation rate ...
[78]
Large multi-chromosomal duplications encompass many members ...
The human genome contains thousands of genes that encode a diverse repertoire of odorant receptors (ORs). We report here on the identification and ...
[79]
Complex Evolution of 7E Olfactory Receptor Genes in Segmental ...
Most OR genes have arisen by local duplication, but some, especially in humans, have duplicated interchromosomally (Trask et al.
[80]
On the Evolution of Lactase Persistence in Humans - Annual Reviews
Aug 31, 2017 · The human lactase persistence-associated SNP −13910*T enables in vivo functional persistence of lactase promoter-reporter transgene expression.
[81]
Reconstructing the History of Yeast Genomes | PLOS Genetics
May 15, 2009 · Fourth, if an endpoint of two inversions or translocations falls in a large intergenic region between two genes, it becomes less clear ...
[82]
Reconstruction of ancestral chromosome architecture and gene ...
May 31, 2016 · We retraced all chromosomal rearrangements, includ- ing gene losses, gene duplications, chromosomal inversions and translocations at single gene ...
[83]
The importance of the Neutral Theory in 1968 and 50 years on - PMC
The Neutral Theory of Molecular Evolution asserts that most de novo mutations are either sufficiently deleterious in their effects on fitness that they have ...
[84]
Neutral Theory, Transposable Elements, and Eukaryotic Genome ...
Apr 23, 2018 · Kimura's fundamental concept of neutral mutation-random drift, which was published 50 years ago, is re-examined in light of its pervasive influence on ...