
Gene structure

A gene is a discrete segment of deoxyribonucleic acid (DNA) that serves as the basic unit of heredity, encoding the instructions for synthesizing a functional product, most commonly a protein, through transcription into messenger RNA (mRNA) and subsequent translation. In eukaryotic organisms, which include animals, plants, fungi, and protists, genes exhibit a complex, modular structure consisting of regulatory elements, coding sequences interrupted by non-coding regions, and flanking untranslated sequences, enabling precise control of expression and alternative splicing for protein diversity. This contrasts with prokaryotic genes in bacteria and archaea, which are typically continuous coding sequences without introns and are often organized in operons for coordinated regulation. The core components of a eukaryotic gene include promoters, short DNA sequences located upstream of the coding region where RNA polymerase II and general transcription factors assemble to initiate transcription. Promoters often feature conserved motifs like the TATA box, approximately 25 base pairs upstream of the transcription start site, which helps recruit the transcription initiation complex. Upstream or downstream of the promoter, enhancers act as distal regulatory elements that can loop to interact with the promoter, boosting transcription rates in a tissue- or condition-specific manner by binding activator proteins. The coding portion of the gene is divided into exons, which are retained in the mature mRNA and translated into amino acid sequences, interspersed with introns, typically longer non-coding sequences that are transcribed but removed during splicing by the spliceosome. At the 5' and 3' ends of the gene lie untranslated regions (UTRs), which do not code for protein but play crucial roles in mRNA stability, localization, and translation efficiency; the 3' UTR also contains the polyadenylation signal (e.g., AAUAAA in the transcript) that directs cleavage and addition of a poly(A) tail of about 200 adenosine residues.
In the human genome, approximately 19,800 protein-coding genes have coding sequences that constitute just 1.5% of the 3 billion base pairs, with the remainder comprising introns, regulatory elements, and other non-coding DNA. This intricate organization allows for extensive regulatory complexity, including epigenetic modifications like DNA methylation and histone acetylation that influence chromatin accessibility and gene activity. Overall, eukaryotic gene structure facilitates adaptive responses to environmental cues and developmental needs, underscoring its evolutionary significance.

Fundamental Concepts

Definition of a Gene

The concept of a gene traces its origins to Gregor Mendel's 1865 experiments with pea plants, which demonstrated that heredity is transmitted through discrete, stable units rather than by continuous blending of traits. These units, later termed genes, were initially understood as abstract factors governing inheritance patterns, with the term "gene" coined by Wilhelm Johannsen in 1909 to describe such heritable elements independent of visible structures. Over the early 20th century, cytogenetic studies linked genes to chromosomes, establishing them as physical entities on these structures, though their molecular nature remained elusive until the mid-1900s. A pivotal advancement came in 1941 with George Beadle and Edward Tatum's experiments on the bread mold Neurospora crassa, which led to the "one gene–one enzyme" hypothesis: each gene specifies a single enzyme required for a biochemical reaction. This idea was refined in the 1950s as "one gene–one polypeptide" to account for proteins composed of multiple chains. Concurrently, Francis Crick articulated the central dogma in 1958, positing that genetic information flows from DNA to RNA to protein, framing genes as sequences directing this process. In modern molecular biology, a gene is defined as a specific locus on a chromosome comprising coding sequences and associated regulatory elements that enable the transcription of RNA and, where applicable, its translation into a functional product. This encompasses both protein-coding genes, which encode polypeptides (often called structural genes), and non-coding genes that produce functional RNAs such as ribosomal RNA (rRNA), transfer RNA (tRNA), and microRNA (miRNA) without being translated into proteins. In the human genome, for instance, the estimated number of protein-coding genes has stabilized at around 19,400 as of 2025, and their coding sequences occupy less than 2% of the total DNA sequence.

Basic Components of Gene Structure

Genes consist of segments of double-stranded DNA, composed of four nucleotide bases: adenine (A), thymine (T), cytosine (C), and guanine (G), which pair specifically (A with T, C with G) to form the double helix structure. In some viruses, genes are instead segments of single-stranded or double-stranded RNA, where uracil (U) replaces thymine. These nucleotides are linked via phosphodiester bonds in a linear chain with inherent polarity, running from the 5' end (where the phosphate group is attached to the 5' carbon of the sugar) to the 3' end (where the hydroxyl group is on the 3' carbon), which dictates the direction of synthesis and reading during transcription and translation. At the core of a gene's coding potential is the open reading frame (ORF), a continuous sequence of nucleotides that begins with a start codon (ATG in DNA, AUG in RNA) and ends with one of three stop codons (TAA, TAG, or TGA), without intervening stop codons in the reading frame. The ORF encodes the amino acid sequence of a protein and is read in triplets called codons, each specifying an amino acid or signaling termination, as defined by the nearly universal genetic code. This frame establishes the translatable portion of the gene, distinguishing it from non-coding sequences. Genes are flanked by untranslated regions (UTRs) at both ends: the 5' UTR upstream of the start codon and the 3' UTR downstream of the stop codon, which do not code for protein but influence mRNA stability, localization, and translation efficiency. In prokaryotes, the 5' UTR often contains a ribosome binding site, such as the Shine-Dalgarno sequence (typically AGGAGG or variants), which facilitates ribosome attachment near the start codon. These UTRs vary in length and sequence but are essential for regulating gene expression. Gene lengths exhibit significant variability across organisms, with typical prokaryotic genes averaging around 1 kilobase (kb) and eukaryotic genes commonly ranging from 10 to 50 kb, reflecting differences in regulatory complexity and non-coding content.
For example, the typical length of human genes is approximately 24 kb. This variability arises primarily from the size of intragenic and flanking non-coding elements, whereas the core coding sequence (ORF) varies far less in length. In chromosomal context, genes exist as discrete linear segments along DNA molecules organized into chromosomes, separated by intergenic regions that may contain regulatory sequences or non-functional DNA. These intergenic spaces buffer genes and accommodate elements that modulate expression, though their precise roles are detailed in discussions of regulatory components.
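As a rough illustration of the ORF definition above, the following Python sketch scans the three forward reading frames of a DNA string for stretches that begin with ATG and end at the first in-frame stop codon. The function name `find_orfs`, the `min_codons` parameter, and the example sequence are hypothetical; reverse-strand frames and alternative start codons are deliberately ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """Return sorted (start, end, sequence) tuples for forward-strand ORFs.

    start is a 0-based index; end is exclusive and includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None                   # look for the next ORF
    return sorted(orfs)

# Toy sequence: ATG AAA TGA sits in the third forward frame.
print(find_orfs("CCATGAAATGACC"))
```

Real gene finders add many refinements (both strands, codon usage statistics, ribosome binding sites), so this should be read only as a minimal sketch of the start-to-stop rule.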

Shared Features Across Domains

Transcriptional Units

A transcriptional unit is defined as the contiguous segment of DNA that is transcribed into a single RNA molecule by RNA polymerase, together with the promoter and terminator sequences that delimit it, with boundaries marked by the transcription start site (TSS) upstream and the transcription termination site downstream. This unit serves as the fundamental functional module for gene expression across all domains of life, ensuring that genetic information is accurately copied from DNA to RNA. In its simplest form, the transcriptional unit captures the essential elements required for RNA synthesis, from the precise location where transcription initiates to the point where it concludes. The transcription process within a unit proceeds through three main stages: initiation, elongation, and termination. Initiation occurs when RNA polymerase, often in complex with accessory factors such as sigma factors in bacteria or general transcription factors in eukaryotes, binds to the promoter and unwinds the DNA at or near the TSS to form an open complex, allowing the first nucleotide to be incorporated at the +1 position. During elongation, the polymerase moves along the template strand, synthesizing a complementary RNA chain at a rate of approximately 20–50 nucleotides per second in bacteria and slower in eukaryotes, maintaining a transcription bubble of about 12–14 base pairs. Termination is triggered by specific sequences or structures, such as rho-independent hairpins in prokaryotes or polyadenylation signals in eukaryotes, leading to the release of the polymerase and the primary transcript from the DNA template. The resulting primary transcript is a direct copy of the unit's non-template strand (with U replacing T), serving as pre-mRNA in eukaryotes or mature mRNA in prokaryotes. In most cases, a single transcriptional unit corresponds to one gene, producing RNA for a single protein or functional RNA molecule; however, in certain prokaryotic systems, one unit can include multiple genes transcribed as a polycistronic message.
The TSS, conventionally numbered +1 (there is no position 0), provides a reference point for mapping, with upstream regions (negative coordinates) containing promoter elements and downstream regions (positive coordinates) including the coding sequence and untranslated regions. For instance, in bacteria, the sigma subunit of RNA polymerase recognizes conserved promoter motifs to direct accurate initiation at the TSS.
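Two of the conventions described above can be sketched in a few lines of Python: the transcript matches the non-template strand with U in place of T, and positions are numbered relative to the TSS with +1 at the start site and -1 immediately upstream. The function names and the short example sequence are hypothetical, and the sketch ignores promoter recognition and termination entirely.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def transcribe(template_5to3):
    """Return the RNA (written 5'->3') made from a template strand given 5'->3'."""
    # The reverse complement of the template is the non-template (coding) strand.
    non_template = template_5to3.upper().translate(COMPLEMENT)[::-1]
    return non_template.replace("T", "U")

def tss_coordinate(index, tss_index):
    """Convert a 0-based index on the non-template strand to TSS-relative numbering."""
    offset = index - tss_index
    # Positions at or after the TSS count from +1; upstream positions are negative.
    return offset + 1 if offset >= 0 else offset

print(transcribe("TCATTTCAT"))    # AUGAAAUGA
print(tss_coordinate(100, 100))   # 1
print(tss_coordinate(65, 100))    # -35
```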

Core Regulatory Elements

Core regulatory elements are DNA sequences essential for controlling gene expression by facilitating the recruitment of transcriptional machinery and modulating transcription rates. The promoter represents the primary core element, typically encompassing a region from approximately -40 to +1 base pairs relative to the transcription start site, where it serves as the binding site for RNA polymerase and general transcription factors to initiate transcription. These promoters contain consensus sequences that enable precise assembly of the pre-initiation complex, ensuring accurate and efficient gene activation across organisms. Terminators function as critical endpoints of the transcriptional unit, signaling the cessation of RNA synthesis and release of the polymerase through structural features such as GC-rich hairpin loops followed by poly-U tracts, which destabilize the transcription elongation complex. Proximal control regions, often considered enhancer-like elements adjacent to the core promoter, include domain-specific motifs such as the -10 and -35 boxes in prokaryotes or CAAT and GC boxes in eukaryotes that fine-tune initiation rates by binding activator transcription factors. These elements collectively provide platforms for sequence-specific DNA-binding domains in transcription factors, such as the helix-turn-helix and zinc finger motifs, which recognize short DNA motifs to regulate transcriptional output. Repressive elements, such as operators in prokaryotes and silencers in eukaryotes, counteract activation by recruiting transcription factors that inhibit promoter activity through mechanisms like steric hindrance or, in eukaryotes, chromatin modification, thereby establishing boundaries for expression patterns. While distal enhancers in eukaryotes extend this regulation over longer distances, core elements provide the foundational proximal framework.

Prokaryotic Gene Organization

Operons and Polycistronic Genes

In prokaryotes, an operon is defined as a functional unit of DNA comprising a cluster of genes that are transcribed together from a single promoter into a polycistronic messenger RNA (mRNA) molecule, allowing coordinated expression of multiple proteins. This polycistronic mRNA encodes several proteins, each translated from its own open reading frame (ORF), typically with internal ribosome binding sites facilitating translation of each ORF. The concept of the operon was first proposed by François Jacob and Jacques Monod in their seminal 1961 paper, based on genetic and biochemical studies of the lac operon in Escherichia coli, where they described repressor-operator interactions as a mechanism for regulating gene expression in response to environmental signals. Structurally, an operon consists of a promoter region upstream, followed by an operator site, a leader sequence, one or more structural genes (ORFs), and a terminator sequence downstream. The leader sequence, often untranslated, can include regulatory elements such as ribosome binding sites or sequences that form secondary structures influencing transcription or translation. In the lac operon of E. coli, for example, three genes (lacZ, lacY, and lacA) encode proteins for lactose metabolism, transcribed as a single mRNA unit. Operons are classified by their regulation: constitutive operons are continuously expressed, while most are inducible or repressible. Inducible operons, like the lac operon, are typically off but activated by an inducer (e.g., allolactose, a lactose derivative, or its synthetic analog IPTG, binding the repressor to relieve blockage). Repressible operons, such as the trp operon in E. coli, are on under normal conditions but repressed by a corepressor (e.g., tryptophan binding the repressor). The trp operon additionally employs transcription attenuation, where high tryptophan levels promote formation of a terminator hairpin in the leader sequence, causing premature transcription termination; low tryptophan levels allow an antiterminator structure to form, enabling full operon expression.
Approximately 50% of genes in bacterial genomes, such as E. coli, are organized into operons, with higher prevalence among those encoding proteins for shared metabolic pathways. This clustering provides advantages like stoichiometric control of protein levels and rapid, energy-efficient responses to environmental changes by co-regulating functionally related genes from one transcriptional event.
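The three regulation classes described above reduce to a small truth table, sketched here in Python. This is a deliberately simplified on/off model with hypothetical names; it leaves out attenuation, partial induction, and any other layer of control.

```python
def operon_is_transcribed(regulation, effector_present):
    """Toy on/off model of constitutive, inducible, and repressible operons."""
    if regulation == "constitutive":
        return True                      # always expressed
    if regulation == "inducible":
        # lac-like: the inducer binds the repressor and relieves the block
        return effector_present
    if regulation == "repressible":
        # trp-like: the corepressor enables the repressor to bind the operator
        return not effector_present
    raise ValueError(f"unknown regulation class: {regulation!r}")

for kind in ("constitutive", "inducible", "repressible"):
    print(kind, operon_is_transcribed(kind, False), operon_is_transcribed(kind, True))
```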

Promoter and Terminator Structures

In prokaryotes, promoters serve as critical DNA sequences that recruit RNA polymerase to initiate transcription, primarily through recognition by sigma factors. In Escherichia coli, the housekeeping sigma factor σ⁷⁰ directs RNA polymerase to promoters featuring two conserved hexameric elements: the -35 box with consensus sequence TTGACA, located approximately 35 base pairs upstream of the transcription start site (TSS), and the -10 box (Pribnow box) with consensus TATAAT, situated about 10 base pairs upstream of the TSS. These elements facilitate initial binding and melting of the DNA to form the open complex, with the -35 box interacting primarily with region 4 of σ⁷⁰ and the -10 box with region 2. Promoter strength is significantly influenced by the spacing between the -35 and -10 boxes, with an optimal separation of 17 base pairs maximizing transcription efficiency; deviations, such as insertions or deletions, can reduce expression levels by up to 100-fold due to suboptimal alignment of sigma factor domains. Mutations in these consensus sequences further modulate promoter activity, as demonstrated in systematic studies where altering key nucleotides in the -10 box decreased open complex formation rates. Additional promoter features enhance specificity and strength in prokaryotes. The extended -10 region, immediately upstream of the -10 box, often contains TG motifs that stabilize interactions with σ⁷⁰ region 3.0 (also called region 2.5), contributing to promoter recognition in a subset of strong promoters. UP elements, AT-rich sequences located upstream of the -35 box (typically from -40 to -60), recruit the alpha subunit of RNA polymerase via its C-terminal domain, boosting transcription up to 20-fold in E. coli promoters like those of the rRNA operons.
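To make the consensus-and-spacing idea concrete, the sketch below slides along a DNA string and scores candidate σ⁷⁰-type promoters by counting matches to the TTGACA and TATAAT hexamers, trying spacers of 15 to 19 base pairs and preferring 17 when scores tie. The scoring scheme, function names, and test sequence are assumptions made for illustration, not a published promoter-prediction method.

```python
MINUS35 = "TTGACA"
MINUS10 = "TATAAT"

def matches(a, b):
    """Number of positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b))

def best_sigma70_site(seq, spacers=range(15, 20)):
    """Return the best-scoring (-35, spacer, -10) arrangement in seq, or None."""
    seq = seq.upper()
    best = None
    for i in range(len(seq) - 12 - min(spacers) + 1):
        for sp in spacers:
            j = i + 6 + sp                       # start of the -10 hexamer
            if j + 6 > len(seq):
                continue
            score = matches(seq[i:i + 6], MINUS35) + matches(seq[j:j + 6], MINUS10)
            candidate = (score, -abs(sp - 17), i, sp)   # prefer 17 bp spacers
            if best is None or candidate > best:
                best = candidate
    if best is None:
        return None
    score, _, i, sp = best
    return {"score": score, "minus35_start": i, "spacer": sp}

# Hypothetical sequence with perfect hexamers separated by 17 bp.
demo = "GC" + "TTGACA" + "C" * 17 + "TATAAT" + "GCGCGCA"
print(best_sigma70_site(demo))
```

A maximum score of 12 means both hexamers match the consensus exactly; natural promoters usually score lower, which is one reason weight-matrix models are preferred over simple match counting.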
Variations in promoter architecture allow differential regulation; housekeeping promoters, recognized by σ⁷⁰, drive constitutive expression of essential genes under normal growth conditions, while stress-inducible promoters often utilize alternative sigma factors like σˢ (RpoS) for stationary phase or osmotic stress responses, featuring suboptimal -10 or -35 matches that favor σˢ selectivity over σ⁷⁰. Prokaryotic terminators signal the end of transcription, preventing read-through into downstream genes. Rho-independent terminators, also known as intrinsic terminators, consist of a GC-rich inverted repeat forming a stable stem-loop structure (typically 7-10 base pairs with 4-5 GC pairs) followed by a run of 6-8 uracil residues in the nascent RNA; the hairpin pauses RNA polymerase, and the weak rU-dA hybrids destabilize the transcription complex, causing dissociation. These are common in E. coli operons, such as the trp attenuator, where the stem-loop's stability (ΔG ≈ -15 to -25 kcal/mol) correlates with termination efficiency exceeding 90%. In contrast, Rho-dependent terminators lack such hairpins and rely on the Rho factor, a ring-shaped hexameric protein that binds C-rich, G-poor "rut" sites on the nascent RNA via its RNA-binding domain, then uses ATP-dependent helicase activity to translocate 5' to 3' along the RNA and disrupt the elongating polymerase, often at unstructured pause sites. Rho termination is prevalent in highly expressed genes like those in ribosomal operons, ensuring rapid recycling of RNA polymerase. Archaea exhibit promoter and terminator structures that bridge bacterial and eukaryotic features, reflecting their phylogenetic position. Archaeal promoters typically include a TATA-like box (consensus TTTAA[A/T]A) centered 25-30 base pairs upstream of the TSS, recognized by the TATA-binding protein (TBP), which bends DNA to facilitate recruitment of transcription factor B (TFB), a homolog of eukaryotic TFIIB. This TBP-TFB complex positions RNA polymerase for initiation, with the B recognition element (BRE) upstream of the TATA box enhancing specificity in many archaeal species.
Archaeal terminators primarily involve intrinsic termination at oligo(dT) tracts on the non-template strand, producing U-rich 3' ends that disrupt the transcription elongation complex without RNA hairpins, alongside factor-dependent mechanisms such as the helicase Eta and the ribonuclease FttA. Transcript cleavage by aCPSF enhances termination efficiency in many species. Consensus sequences for prokaryotic promoters and terminators are derived from multiple sequence alignments of experimentally verified regulatory regions, using computational tools to identify overrepresented motifs. For instance, the MEME (Multiple Em for Motif Elicitation) suite employs expectation-maximization algorithms to discover position weight matrices from E. coli promoter datasets, yielding refined consensuses that account for sequence variability and improve prediction accuracy in genomic scans. Such alignments, initially compiled from dozens of promoters, have been expanded to thousands via high-throughput sequencing, confirming the core elements' conservation across bacteria.
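A position weight matrix of the kind mentioned above can be sketched by simple counting over sites that are already aligned. The example below is not the expectation-maximization procedure used by motif-discovery tools; it only shows how aligned hexamers become log-odds scores. The five input sites, the pseudocount, and the uniform background frequency are all hypothetical choices.

```python
import math
from collections import Counter

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds position weight matrix from equal-length aligned sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(len(sites[0])):
        counts = Counter(site[pos] for site in sites)
        pwm.append({
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm

def score(pwm, seq):
    """Sum the per-position log-odds for a candidate site."""
    return sum(column[base] for column, base in zip(pwm, seq))

def consensus(pwm):
    """Highest-scoring base at each position."""
    return "".join(max("ACGT", key=column.get) for column in pwm)

# Hypothetical aligned -10 hexamers, each one mismatch or fewer from TATAAT.
sites = ["TATAAT", "TATGAT", "TACAAT", "GATAAT", "TATACT"]
pwm = build_pwm(sites)
print(consensus(pwm))
print(round(score(pwm, "TATAAT"), 2), round(score(pwm, "GGGCCC"), 2))
```

Unlike a bare consensus string, the matrix retains how strongly each position is conserved, so near-consensus sites still receive intermediate scores.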

Eukaryotic Gene Organization

Exons, Introns, and Splicing

In eukaryotic genes, the coding sequence is typically organized into a discontinuous structure known as split genes, where exons (segments that are retained and translated into protein) alternate with introns, which are non-coding sequences removed during a post-transcriptional processing step called splicing. This architecture allows for the production of mature messenger RNA (mRNA) by excising introns and ligating exons, a process essential for accurate gene expression in eukaryotes. The discovery of this split gene organization came in 1977 through electron microscopy studies of adenovirus transcripts hybridized to viral DNA, revealing looped-out intron regions between colinear exon segments; this work by Susan M. Berget, Claire Moore, and Phillip A. Sharp demonstrated that eukaryotic genes are interrupted by non-coding sequences. Independently, Richard Roberts' group reported similar findings for adenovirus, establishing the prevalence of introns across eukaryotic genomes. For their pioneering identification of split genes and RNA splicing, Sharp and Roberts shared the 1993 Nobel Prize in Physiology or Medicine. Splicing is mediated by the spliceosome, a large ribonucleoprotein complex, which recognizes specific consensus sequences at intron boundaries: the 5' splice site typically begins with the dinucleotide GU (GT in DNA), and the 3' splice site ends with AG. Within the intron, a branch point sequence, often featuring an adenosine (A) residue within a YNCURAC motif (where Y is a pyrimidine and R is a purine), supplies the nucleophile for the first transesterification step, forming a lariat intermediate. The spliceosome assembles stepwise, with U1 small nuclear ribonucleoprotein (snRNP) binding the 5' splice site via base-pairing and U2 snRNP interacting with the branch point to facilitate intron excision and exon joining. Splicing can be constitutive, where all exons are invariably included in the mature mRNA, or alternative, allowing variable exon inclusion to generate multiple protein isoforms from a single gene.
In humans, approximately 95% of multi-exon genes undergo alternative splicing, enabling proteomic diversity through mechanisms like exon skipping, mutually exclusive exons, or intron retention. On average, a human protein-coding gene contains about 9 exons, spans 27 kilobases (kb) in genomic length, and has a coding sequence of roughly 1.3 kb, with introns comprising the majority of the span and contributing to regulatory complexity by accommodating splicing variants. Beyond their removal, introns play roles in mRNA nuclear export, often by recruiting export factors during splicing, and in enhancing transcript stability, such as through protection against degradation pathways. These functions underscore introns' contributions to fine-tuning gene expression in eukaryotes, in contrast to prokaryotes, which generally lack introns.
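The GU...AG boundary rule lends itself to a short sketch: given a pre-mRNA string and intron coordinates, the function below checks that each intron starts with GU and ends with AG, then joins the flanking exons. The function name, coordinate convention (0-based, end-exclusive), and example sequence are hypothetical, and branch-point recognition and non-canonical splice sites are ignored.

```python
def splice(pre_mrna, introns):
    """Remove introns given as (start, end) pairs and return the joined exons.

    Raises ValueError if an intron lacks the canonical GU...AG boundaries.
    """
    pre_mrna = pre_mrna.upper()
    exons, cursor = [], 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"non-canonical intron at {start}-{end}: {intron}")
        exons.append(pre_mrna[cursor:start])   # exon upstream of this intron
        cursor = end
    exons.append(pre_mrna[cursor:])            # final exon
    return "".join(exons)

# Toy transcript: exon 1 + a 15-nt intron + exon 2.
pre = "AUGGCC" + "GUAAGUUUCUAACAG" + "UUUUAA"
print(splice(pre, [(6, 21)]))   # AUGGCCUUUUAA
```

Exon skipping can be mimicked with the same function by passing a single longer intron that spans an internal exon, provided the combined span still begins with GU and ends with AG.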

Distal Regulatory Elements

Distal regulatory elements in eukaryotic genomes are cis-acting DNA sequences located far from the transcription start sites of their target genes, often spanning distances up to 1 megabase (Mb), that modulate gene expression through long-range interactions. These elements include enhancers, which activate transcription; silencers, which repress it; and insulators, which delineate functional domains to prevent regulatory interference. Unlike proximal core promoters, distal elements rely on chromatin architecture to exert their effects, enabling precise, tissue-specific control of gene activity essential for development and cellular differentiation. Enhancers are the most prominent distal activators, binding transcription factors (TFs) and co-activators to stimulate transcription by looping to target promoters, a process facilitated by protein complexes such as cohesin and Mediator. These loops bring enhancers into physical proximity with promoters, often within topologically associating domains (TADs) identified through chromatin conformation capture techniques, which reveal megabase-scale chromatin folds that organize regulatory interactions. A classic example is the β-globin locus control region (LCR), a powerful enhancer cluster located 6–22 kb upstream of the β-globin genes, which drives high-level, erythroid-specific expression by interacting with promoters via looping and recruiting factors like NF-E2; its activity is confined to erythroid cells, illustrating tissue specificity. Silencers function as distal repressive elements, counteracting activation by recruiting repressive complexes to inhibit transcription, often through disruption of enhancer-promoter loops or deposition of silencing marks. Polycomb response elements (PREs) exemplify silencers, binding Polycomb repressive complexes (PRC1 and PRC2), with PRC2 catalyzing H3K27 trimethylation, thereby maintaining developmental gene repression; for instance, PRC2-bound regions repress target genes via long-range contacts. These elements ensure that inappropriate gene expression is prevented in specific cellular contexts.
Insulators, or boundary elements, safeguard genomic domains by blocking enhancer-promoter cross-talk and halting the spread of repressive states. The CCCTC-binding factor (CTCF) is the primary insulator-binding protein in vertebrates, recognizing thousands of sites across the genome (such as the 13,804 identified in human fibroblasts) that demarcate active and repressive domains, as shown by genome-wide binding analyses enriched at boundaries. CTCF sites, often organized in tandem, cooperate with cohesin to anchor loops that insulate TADs, preventing regulatory spillover between adjacent loci. The functional integration of distal elements depends on three-dimensional (3D) genome organization, where Hi-C data demonstrate that looping within TADs positions enhancers and silencers near promoters, while insulators define boundaries to maintain domain autonomy; this organization is dynamically regulated across cell types, with loops varying in ~28% of interactions. The human genome contains an estimated 100,000 to 400,000 enhancers, vastly outnumbering the ~20,000 protein-coding genes, underscoring their regulatory dominance. Among these, super-enhancers (large clusters of typical enhancers bound densely by Mediator and master TFs) particularly drive cell identity by robustly activating lineage-specific genes, a concept established in 2013 through ChIP-seq studies in embryonic stem cells and differentiated lineages. Enhancer evolution is characterized by rapid sequence turnover across species, with many elements gained or lost over evolutionary time, yet functional conservation is preserved through core TF binding motifs that maintain regulatory logic. Comparative analyses across 20 mammalian species reveal that while enhancer sequences diverge, motifs for key TFs like CEBPA in liver-specific enhancers remain enriched in conserved regions, allowing adaptation while retaining essential gene control. This turnover contributes to species-specific traits, contrasting with the more stable core motifs that ensure cross-species functionality.
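The insulating role of domain boundaries can be sketched as a simple coordinate check: two loci are treated as able to interact only if no boundary lies between them. This is a strong simplification (real insulation is probabilistic and cell-type dependent), and the function name and coordinates below are hypothetical.

```python
from bisect import bisect_right

def same_tad(pos_a, pos_b, boundaries):
    """True if two positions on one chromosome lie between the same boundaries."""
    boundaries = sorted(boundaries)
    # Each position is assigned the index of the domain it falls into.
    return bisect_right(boundaries, pos_a) == bisect_right(boundaries, pos_b)

# Hypothetical boundary positions (bp) on one chromosome.
boundaries = [1_000_000, 2_500_000]
print(same_tad(1_200_000, 2_000_000, boundaries))   # True: same domain
print(same_tad(900_000, 1_200_000, boundaries))     # False: boundary between them
```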

Comparative and Specialized Structures

Key Differences Between Prokaryotes and Eukaryotes

Prokaryotic genes are typically continuous coding sequences lacking introns, with an average length of approximately 1 kb, contributing to their compact organization. In contrast, eukaryotic genes are interrupted by introns, which expand their total length significantly; for example, a typical eukaryotic gene may span tens to hundreds of kilobases due to introns that can total over 175 kb in some cases. This structural difference reflects the absence of spliceosomal introns in prokaryotes, where genes are streamlined for efficient replication and expression. Gene regulation in prokaryotes relies on operons, which cluster functionally related genes under shared promoters for coordinated control through operators adjacent to the promoter. Eukaryotes, however, employ distal enhancers located far from promoters, often thousands of base pairs away, to activate transcription through looping interactions, alongside alternative splicing that generates protein diversity from a single gene. These mechanisms allow eukaryotes finer, combinatorial control suited to multicellular complexity. The intron-early theory posits that introns originated in a common ancestor of prokaryotes and eukaryotes, facilitating early exon shuffling before widespread loss in prokaryotic lineages. Supporting this, self-splicing introns (group I and II) are present in approximately 25% of eubacterial genomes, though rare and typically few per genome, indicating relictual retention rather than routine use. Eukaryotic DNA is packaged into nucleosomes, each comprising ~147 bp of DNA wrapped around a histone octamer, which restricts access to genes and requires remodeling for transcription. Prokaryotic DNA, by contrast, is generally not packaged into nucleosomes (although many archaea possess histone homologs), enabling more direct access by RNA polymerase. Prokaryotic genomes devote ~90% of their sequence to protein-coding regions, minimizing non-coding DNA for rapid proliferation.
Eukaryotic genomes, however, allocate less than 2% to coding sequences, with the majority comprising non-coding elements like introns and regulatory regions that support sophisticated control. Archaea bridge these domains, featuring bacterial-like operons for polycistronic transcription alongside eukaryotic-like promoters recognized by TATA-binding protein and transcription factor B homologs. This hybrid organization reflects the shared ancestry of the archaeal and eukaryotic transcription initiation machinery.
Aspect            | Prokaryotes                           | Eukaryotes
Gene Continuity   | Continuous, no introns (~1 kb avg.)   | Interrupted by introns (tens-hundreds kb total)
Regulation        | Operons, proximal operators           | Distal enhancers, alternative splicing
DNA Packaging     | Naked DNA                             | Nucleosome-wrapped (histone octamers)
Coding Proportion | ~90% of genome                        | <2% of genome
Intron Presence   | Rare self-splicing (~25% species)     | Abundant spliceosomal introns
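The coding proportion and average gene length quoted above allow a back-of-the-envelope gene count, sketched below. The 4.6 Mb genome size used for E. coli is an assumption not stated in the text, and the function name is hypothetical; the result is only an order-of-magnitude estimate.

```python
def estimate_gene_count(genome_bp, coding_fraction, mean_gene_bp):
    """Rough gene count: coding base pairs divided by mean gene length."""
    return round(genome_bp * coding_fraction / mean_gene_bp)

# Assumed E. coli genome size of 4.6 Mb, ~90% coding, ~1 kb per gene.
print(estimate_gene_count(4_600_000, 0.90, 1_000))   # 4140
```

The same arithmetic is much less informative for eukaryotes, where gene span is dominated by introns rather than coding sequence.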

Gene Structure in Viruses and Organelles

Gene structure in viruses exhibits remarkable diversity, reflecting their compact genomes that typically range from approximately 10 to 200 kb in size and utilize single-stranded DNA (ssDNA), double-stranded DNA (dsDNA), or RNA as genetic material. Most viral genes lack introns, enabling efficient transcription and translation without splicing, though exceptions exist such as the self-splicing group I introns in bacteriophage T4, where introns interrupt genes like nrdB and nrdD and are removed through their own ribozyme activity. To maximize coding capacity within these constrained genomes, viruses frequently employ overlapping reading frames, as seen in human immunodeficiency virus type 1 (HIV-1), where the gag and pol genes overlap by 205 nucleotides, and expression of the pol-encoded enzymes requires a programmed -1 ribosomal frameshift mediated by a slippery sequence and a downstream RNA structure. This frameshifting mechanism, occurring at about 5-10% efficiency, produces the essential Gag-Pol polyprotein while conserving genomic space. In organelles, gene structure retains prokaryotic-like features due to their endosymbiotic origins from ancient bacteria, with mitochondrial and chloroplast genomes typically organized as circular DNA molecules that encode a limited set of genes involved in energy production. Human mitochondrial DNA (mtDNA), a 16.5 kb circular molecule, contains 37 genes (13 protein-coding, 22 tRNA, and 2 rRNA) and lacks introns, although mitochondrial introns occur in some other species; in humans, both strands are transcribed as long polycistronic precursors that are then processed into individual RNAs. Chloroplast genomes similarly feature bacterial-style organization, with most genes arranged in polycistronic operons transcribed from single promoters, separated by short intergenic regions, and encoding about 100-120 genes for photosynthesis and housekeeping functions.
In kinetoplastid parasites like trypanosomes, mitochondrial RNA editing introduces or deletes uridines (U) at hundreds of sites across multiple genes, guided by small RNAs, to create functional mRNAs from cryptic pre-edited transcripts. These specialized structures underscore the evolutionary adaptations in viruses and organelles, where compact, intron-poor genomes with overlapping elements or post-transcriptional modifications optimize expression under selective pressures for replication efficiency and host interaction.
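The effect of a programmed -1 frameshift on codon reading can be sketched as follows: codons are read in the original frame up to the shift point, after which reading resumes one nucleotide back, so a single nucleotide is read twice. The function names and the toy RNA, built around a UUUUUUA-type slippery heptamer, are illustrative assumptions; the sketch does not model frameshift efficiency or the stimulatory RNA structure.

```python
def codons(rna, start=0):
    """Split rna into complete triplets beginning at index start."""
    return [rna[i:i + 3] for i in range(start, len(rna) - 2, 3)]

def read_with_minus1_shift(rna, shift_after):
    """Read frame-0 codons up to shift_after, then continue in the -1 frame."""
    if shift_after % 3 != 0 or shift_after < 3:
        raise ValueError("shift_after must be a positive multiple of 3")
    # The nucleotide at shift_after - 1 is decoded twice: once in each frame.
    return codons(rna[:shift_after]) + codons(rna, shift_after - 1)

rna = "AAUUUUUUAGGGAAGAUC"
print(codons(rna))                      # frame 0: AAU UUU UUA GGG AAG AUC
print(read_with_minus1_shift(rna, 9))   # shifted: AAU UUU UUA AGG GAA GAU
```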

  17. [17]
    Shine-Dalgarno Sequences Play an Essential Role in the ...
    The classical mechanism for translation initiation in bacteria depends on ribosome positioning within the 5′ UTR of the mRNA through hybridization of a ...
  18. [18]
    Average Gene Length Is Highly Conserved in Prokaryotes ... - PubMed
    The average length of genes in a eukaryote is larger than in a prokaryote, implying that evolution of complexity is related to change of gene lengths.
  19. [19]
    Average gene length - Various - BNID 111922
    The coding sequence of a gene in the eukaryote kingdom is on average 445 bp longer than that in the prokaryotes."
  20. [20]
    1 - Chromosome Structure - EdTech Books
    The bacterial chromosome includes intergenic DNA sequences. Intergenic sequences are located between structural genes and are not typically transcribed.
  21. [21]
    2.1: Overview of Transcription - Biology LibreTexts
    Jul 13, 2021 · The transcribed grey DNA region in each of the three panels are the transcription unit of the gene. Termination sites are typically 3' to, or ...
  22. [22]
    Biochemistry, Replication and Transcription - StatPearls - NCBI - NIH
    The process occurs in three stages-initiation, elongation, and termination. Once the DNA is formed, it undergoes the process of transcription synthesizing ...Introduction · Fundamentals · Molecular Level · Testing
  23. [23]
    Eukaryotic core promoters and the functional basis of transcription ...
    Transcription typically initiates at a defined position, the transcription start site (TSS), at the 5' end of a gene, which we refer to as gene start. The TSS ...
  24. [24]
    Bacterial Sigma Factors and Anti-Sigma Factors: Structure, Function ...
    Jun 26, 2015 · Sigma factors are multi-domain subunits of bacterial RNA polymerase (RNAP) that play critical roles in transcription initiation.
  25. [25]
    Structural basis for transcription antitermination at bacterial intrinsic ...
    Jul 11, 2019 · The intrinsic terminator is characterized by a G-C rich hairpin followed by a U-track. Transcription termination at intrinsic terminators ...
  26. [26]
    Helix-turn-helix, zinc-finger, and leucine-zipper motifs for ... - PubMed
    Four distinct structural motifs have been proposed for the DNA-binding domains of eukaryotic transcriptional regulatory proteins; the helix-turn-helix, ...
  27. [27]
    Transcriptional silencers: driving gene expression with the brakes on
    Silencers are regulatory DNA elements that reduce transcription from their target promoters; they are the repressive counterparts of enhancers.
  28. [28]
    How to find genomic regions relevant for gene regulation - PMC - NIH
    ... 5–10 %, harbor key functional elements responsible for the regulation ... Gene Regulatory Elements across the Genome from Mammalian Cells. Cold Spring ...
  29. [29]
    Operon - an overview | ScienceDirect Topics
    An operon is a cluster of genes in prokaryotes that are transcribed together into a single mRNA molecule, encoding multiple proteins.
  30. [30]
    Operon mRNAs are organized into ORF-centric structures that ...
    Bacterial mRNAs are organized into operons consisting of discrete open reading frames (ORFs) in a single polycistronic mRNA.
  31. [31]
    Genetic regulatory mechanisms in the synthesis of proteins - PubMed
    1961 Jun:3:318-56. doi: 10.1016/s0022-2836(61)80072-7. Authors. F JACOB, J MONOD. PMID: 13718526; DOI: 10.1016/s0022-2836(61)80072-7. No abstract available ...
  32. [32]
    2.5: Gene and Operon - Biology LibreTexts
    Mar 5, 2021 · In prokaryotes, genes which encode proteins with relationships in a metabolic pathway form Operons - which produce polycistronic mRNA's.<|control11|><|separator|>
  33. [33]
    The trp operon (article) | Khan Academy
    Like regulation by the trp repressor, attenuation is a mechanism for reducing expression of the trp operon when levels of tryptophan are high. However, rather ...
  34. [34]
    Attenuation in the control of expression of bacterial operons - Nature
    Feb 26, 1981 · Attenuation in the control of expression of bacterial operons. Charles Yanofsky. Nature volume 289, pages 751–758 (1981)Cite this article.
  35. [35]
  36. [36]
    Operon - an overview | ScienceDirect Topics
    3.15). One advantage of operons is that the genes within the operon are regulated simultaneously. Usually these genes are not transcribed all the time. The ...
  37. [37]
    Rho-dependent transcription termination in bacteria recycles RNA ...
    Mar 14, 2019 · The bacterial RNA helicase, Rho, is a transcription termination protein that dislodges the elongation complexes. Here, we show that Rho ...
  38. [38]
    The structural basis for the oriented assembly of a TBP/TFB ... - PNAS
    Here we present the 2.4-Å crystal structure of archaeal TBP and the C-terminal core of TFB (TFB c ) in a complex with an extended TATA-box-containing promoter.Abstract · Sign Up For Pnas Alerts · Results And Discussion<|separator|>
  39. [39]
    Activation of Archaeal Transcription Mediated by Recruitment ... - NIH
    Archaeal promoters consist of a TATA box and a purine-rich adjacent upstream sequence (transcription factor B (TFB)-responsive element (BRE)), which are bound ...
  40. [40]
    The Nobel Prize in Physiology or Medicine 1993 - NobelPrize.org
    The Nobel Prize in Physiology or Medicine 1993 was awarded jointly to Richard J. Roberts and Phillip A. Sharp for their discoveries of split genes.
  41. [41]
    Deep surveying of alternative splicing complexity in the human ...
    Nov 2, 2008 · By combining mRNA-Seq and EST-cDNA sequence data, we estimate that transcripts from ∼95% of multiexon genes undergo alternative splicing and ...
  42. [42]
    Landscape of cohesin-mediated chromatin loops in the ... - Nature
    Jul 29, 2020 · Enhancers often exert their influence on gene expression over large distances through direct 3D chromatin contacts with multiple distal ...
  43. [43]
    Locus control regions - PMC - PubMed Central - NIH
    The enhancer activity of the β-globin LCR is tissue specific; that is, the expression of globin genes is confined to erythroid cells when linked to the β-globin ...
  44. [44]
    H3K27me3-rich genomic regions can function as silencers ... - Nature
    Jan 29, 2021 · Polycomb Group (PcG) proteins including Polycomb Repressive Complexes, PRC1 and PRC2 are widely recognized to mediate gene silencing of ...
  45. [45]
    Global analysis of the insulator binding protein CTCF in chromatin ...
    CTCF (CCCTC-binding factor) is the only known major insulator-binding protein in the vertebrates and has been shown to bind many enhancer-blocking elements.
  46. [46]
    Tandem CTCF sites function as insulators to balance spatial ...
    Mar 23, 2020 · CTCF is a key insulator-binding protein, and mammalian genomes contain numerous CTCF sites, many of which are organized in tandem.
  47. [47]
    Enhancers: five essential questions - PMC - PubMed Central - NIH
    It is estimated that the human genome contains hundreds of thousands of enhancers, so understanding these gene-regulatory elements is a crucial goal ... 5–10 ...
  48. [48]
  49. [49]
  50. [50]
    Enhancer Evolution across 20 Mammalian Species - PMC
    Third, highly conserved enhancers are enriched for TF binding motifs for liver-specific regulators such as CEBPA and PBX1, whereas highly conserved proximal ...
  51. [51]
    Evolution of exceptionally large genes in prokaryotes - PubMed
    Mar 6, 2008 · Analysis of bacterial genomic sequences revealed an average bacterial gene size of approximately 1 kb. However, genes with a size >10 kb were ...Missing: length | Show results with:length
  52. [52]
    The Complexity of Eukaryotic Genomes - The Cell - NCBI Bookshelf
    Whereas most prokaryotic genes are represented only once in the genome, many eukaryotic genes are present in multiple copies, called gene families. In some ...
  53. [53]
    Toward a resolution of the introns early/late debate: Only phase zero ...
    Such genes have no introns in their prokaryotic forms but introns in their eukaryotic homologs. ... prokaryotic genes (1, 2). Thus, such models predict that these ...
  54. [54]
  55. [55]
    Enhancer–promoter specificity in gene transcription - Nature
    Apr 25, 2024 · In this review, we provide an overview of recent progress in the eukaryotic gene transcription field pertaining to enhancer–promoter specificity.
  56. [56]
    The rise and falls of introns | Heredity - Nature
    Feb 1, 2006 · The Introns Early theory proposed that introns were present in the common ancestor of prokaryotes and eukaryotes, where they were merely the ...
  57. [57]
    Sequential splicing of a group II twintron in the marine ... - Nature
    Nov 18, 2015 · They can be detected in approximately 25% of eubacterial genomes and mostly in low numbers. This might be the main reason for the extreme rarity ...
  58. [58]
  59. [59]
    Primary Role of the Nucleosome - Cell Press
    Aug 6, 2020 · The formation and spurious transcription of NFRs epitomize the role of the nucleosome. Naked DNA must be exposed, by the removal of a nucleosome ...
  60. [60]
    The novel EHEC gene asa overlaps the TEGT transporter ... - Nature
    Dec 14, 2018 · Bacterial genes are densely packed in the genome. Typically, more than 90% of a prokaryotic sequence is covered with gene sequences reading in ...Missing: percentage | Show results with:percentage
  61. [61]
    Mechanism of Transcription Initiation at an Activator-Dependent ...
    Gene expression by s54RNAP requires activator ATPases, which bind to promoter-distal enhancer DNA sequences (Buck et al., 2000;. Popham et al., 1989; ...
  62. [62]
    Transcription in Archaea - PNAS
    With the recent indications for the presence of operons in Eucarya (46), some of the archaeal/bacterial factors may be classified as universal if family ...
  63. [63]
    Promoter-proximal elongation regulates transcription in archaea
    Sep 17, 2021 · Archaeal promoters seem to comprise fewer promoter elements compared to their bacterial and eukaryotic counterparts, but it is possible that ...<|control11|><|separator|>
  64. [64]
    Gene overlapping and size constraints in the viral world - PMC - NIH
    May 21, 2016 · Gene overlapping can be a convenient mechanism to introduce new reading frames on top of an already compact genome, providing an easy expansion ...
  65. [65]
    Multiple self-splicing introns in bacteriophage T4 - ScienceDirect.com
    Multiple introns, and the prospect that these occur within several genes in the same metabolic pathway, suggest a possible regulatory role for splicing in T4.
  66. [66]
    HIV expression strategies: Ribosomal frameshifting is directed by a ...
    The pol gene of the human immunodeficiency virus (HIV-1) is expressed as a gag:pol fusion, arising from a ribosomal frameshift that brings the overlapping, out ...Missing: ssDNA dsDNA
  67. [67]
    The Human Immunodeficiency Virus Type 1 Ribosomal ... - NIH
    The overlapping region of gag and pol genes contains the −1 ribosomal frameshift signal. ... HIV-1 involves a potential intramolecular triplex RNA structure ...
  68. [68]
    What's in a name? How organelles of endosymbiotic origin can be ...
    Feb 4, 2019 · Mitochondria and plastids evolved from free-living bacteria, but are now considered integral parts of the eukaryotic species in which they live.
  69. [69]
    Human Mitochondrial DNA: Particularities and Diseases - PMC
    Oct 1, 2021 · The mitochondrial DNA is a circular molecule of about 16.6 kb (16,569 bp) and unlike the nuclear genome has no introns. ... These mutations are ...Missing: 16.5 | Show results with:16.5
  70. [70]
    Chloroplast gene expression: Recent advances and perspectives
    Similar to those of bacteria, most chloroplast genes are organized as operons transcribed from single promoters and are separated by relatively small non-coding ...
  71. [71]
    Trypanosome RNA editing: the complexity of getting U in and taking ...
    RNA editing was first described in trypanosomatids, when it was found that four non-encoded uridine (U) residues were added to the mitochondrial (mt) mRNA ...