Eukaryotic transcription

Eukaryotic transcription is the enzymatic process by which RNA is synthesized from a DNA template within the nucleus of eukaryotic cells, enabling the expression of genetic information through the production of various RNA types essential for cellular function.^[1] Unlike prokaryotic transcription, which occurs in the cytoplasm with a single RNA polymerase, eukaryotic transcription involves three distinct nuclear RNA polymerases: RNA polymerase I (Pol I), which synthesizes most ribosomal RNAs (rRNAs); RNA polymerase II (Pol II), responsible for transcribing all protein-coding messenger RNAs (mRNAs) and many non-coding RNAs; and RNA polymerase III (Pol III), which produces transfer RNAs (tRNAs), 5S rRNA, and other small RNAs.^[1] This compartmentalization and specialization allow for precise control over gene expression in complex multicellular organisms.^[2] The process unfolds in three main stages: initiation, elongation, and termination. During initiation, RNA polymerases assemble with general transcription factors at promoter regions to form preinitiation complexes (PICs), often facilitated by sequence-specific DNA-binding factors and coactivators like the Mediator complex.^[3] Elongation follows promoter clearance, where the polymerase travels along the DNA, synthesizing RNA at rates of approximately 20–80 nucleotides per second for Pol II, while overcoming nucleosomal barriers through chromatin remodeling and histone modifications.^[4] Termination occurs upon recognition of specific signals, such as polyadenylation sites for Pol II transcripts, leading to polymerase release and RNA processing events like 5'-capping, splicing, and 3'-polyadenylation that are often co-transcriptionally coupled.^[3] Regulation of eukaryotic transcription is multifaceted and primarily occurs at the initiation stage, involving enhancers, silencers, and a vast array of transcription factors that respond to cellular signals, developmental cues, and environmental stimuli.^[1] The 
C-terminal domain (CTD) of Pol II, composed of heptapeptide repeats, plays a crucial role by undergoing dynamic phosphorylation—such as at serine 5 during initiation and serine 2 during elongation—to recruit processing and regulatory factors.^[2] Chromatin structure further modulates accessibility, with histone variants, covalent modifications (e.g., acetylation and methylation), and ATP-dependent remodelers like SWI/SNF ensuring that transcription is both activated and repressed as needed for processes like cell differentiation and stress response.^[3] This elaborate regulatory network underscores the evolutionary adaptation of eukaryotic transcription to support the complexity of higher organisms.^[1]

Core Transcription Machinery

RNA polymerases

Eukaryotic cells contain three distinct nuclear RNA polymerases, each specialized for transcribing specific classes of RNA essential for cellular function. RNA polymerase I (Pol I) synthesizes the precursor of most ribosomal RNAs (rRNAs), excluding 5S rRNA, and accounts for up to 60% of total cellular transcription in rapidly dividing cells.^[5] This polymerase is predominantly localized in the nucleolus, where it associates with ribosomal DNA (rDNA) clusters to facilitate ribosome biogenesis.^[6] Structurally, human Pol I comprises 13 subunits, including a large catalytic subunit and accessory components that enable high processivity on rDNA templates.^[7]

RNA polymerase II (Pol II) is responsible for transcribing all messenger RNAs (mRNAs) and most small nuclear RNAs (snRNAs), which are critical for protein coding and RNA splicing, respectively.^[6] Pol II consists of 12 subunits, the fewest of the three nuclear polymerases, and its largest subunit features a unique C-terminal domain (CTD) composed of tandem heptapeptide repeats with the consensus sequence YSPTSPS.^[8] The CTD's phosphorylation states, particularly on serine residues (e.g., Ser2 and Ser5), dynamically regulate transcription stages from initiation to termination by serving as a binding platform for RNA processing factors.^[9]

RNA polymerase III (Pol III) transcribes transfer RNAs (tRNAs), 5S rRNA, and various small non-coding RNAs involved in translation and RNA processing.^[10] This polymerase is composed of 17 subunits in humans, including core elements shared with other polymerases and unique lobes for promoter recognition on internal control regions of its target genes.^[11] Pol III exhibits distinct promoter specificity compared to Pol I and Pol II, relying on transcription factor IIIC (TFIIIC), which binds the A- and B-box elements of tRNA genes and is recruited to 5S rRNA genes through TFIIIA.^[10]

All three polymerases are multi-subunit enzymes with a conserved core architecture homologous to that of bacterial RNA
polymerase, featuring a claw-like structure with a clamp domain that grips DNA to stabilize the transcription bubble, as exemplified in high-resolution structures of Pol II.^[12] Evolutionary conservation is evident in the shared catalytic subunits (e.g., RPB1 homologs) across Pol I, II, and III, reflecting a common ancestry while allowing specialization through unique accessory subunits.^[13] These polymerases differ markedly in sensitivity to the fungal toxin α-amanitin: Pol II is highly sensitive (inhibited at nanomolar concentrations), Pol III shows intermediate resistance (requiring micromolar levels), and Pol I is largely resistant even at high doses. This differential inhibition has been instrumental in distinguishing their activities in biochemical assays.^[14]

General transcription factors

In eukaryotic cells, general transcription factors (GTFs) are a set of basal proteins that enable RNA polymerases to accurately recognize promoters and initiate transcription, distinct from sequence-specific activators. For RNA polymerase II (Pol II), which transcribes protein-coding genes, the GTFs include TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH, which collectively form the preinitiation complex (PIC) in a stepwise manner.^[15] These factors ensure promoter-specific initiation by interacting with core promoter elements like the TATA box and facilitating DNA unwinding.^[16] TFIID initiates PIC assembly by binding the TATA box via its TATA-binding protein (TBP) subunit, which bends the DNA, while TBP-associated factors (TAFs) contact other promoter motifs and stabilize the interaction; TFIIA enhances this binding by preventing non-specific associations.^[15] TFIIB then docks adjacent to TBP, recognizing the BRE element and bridging to Pol II to define the start site.^[15] Pol II arrives pre-bound to TFIIF, a heterotetramer that positions the enzyme over the promoter and suppresses non-productive binding.^[15] TFIIE recruits TFIIH and aids in open complex formation, while TFIIH's XPB and XPD helicase subunits unwind ~15 bp of DNA and its cdk7 kinase phosphorylates the Pol II C-terminal domain (CTD) heptapeptide repeats to promote escape into elongation.^[15] The assembly order is TFIID → TFIIA/TFIIB → Pol II-TFIIF → TFIIE → TFIIH.^[15] The Mediator complex, comprising over 20 subunits, serves a basal co-activator role by stabilizing the PIC through interactions with Pol II's CTD and GTFs.^[15] For RNA polymerase I (Pol I), which synthesizes ribosomal RNA from rDNA promoters, the primary GTFs are SL1 and UBF. 
SL1, a TBP-containing complex with TAF subunits (TAF1A, TAF1B, TAF1C, TAF1D, TAF12), binds the core promoter element (roughly -45 to +20 relative to the start site) to specify Pol I recruitment.^[17] UBF, an HMG-box protein, binds upstream control elements (UCE and proximal control element), enhancing DNA flexibility and stabilizing SL1 binding to facilitate PIC formation.^[18] TIF-IA (mammalian Rrn3 homolog) associates with Pol I's stalk and bridges it to the SL1-promoter complex, enabling closed-to-open complex transition.^[17] Assembly proceeds with UBF and SL1 binding first, followed by Pol I-TIF-IA recruitment.^[17] RNA polymerase III (Pol III) transcribes short non-coding RNAs like tRNAs and 5S rRNA using GTFs TFIIIA, TFIIIB, and TFIIIC, which recognize internal promoters. TFIIIA, featuring nine zinc fingers, specifically binds the internal control region of 5S rRNA genes to initiate assembly.^[19] TFIIIC, a large complex, recognizes A-box and B-box elements in internal promoters of tRNA and other genes, then recruits TFIIIB—a complex of TBP, TFIIB-related factor 1 (Brf1), and Bdp1—which positions Pol III via its TFIIF-like and TFIIE-like subunits.^[19] For type 1 promoters (e.g., 5S rRNA), the order is TFIIIA → TFIIIC → TFIIIB → Pol III; for type 2 (tRNA), it is TFIIIC → TFIIIB → Pol III; type 3 (U6 snRNA) uses SNAPc instead of TFIIIC.^[16] Across Pol I, II, and III, GTFs share conserved modules like TBP for promoter anchoring and TFIIB/Brf homologs for polymerase docking, underscoring mechanistic similarities.^[16]
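
The recruitment orders described above can be written down as plain data. The following Python sketch is illustrative only: the dictionary keys and the `next_factor` helper are hypothetical names chosen for this example, and the strictly linear model ignores any alternative or concerted assembly pathways. Type 3 Pol III promoters are omitted because the text gives only the SNAPc substitution, not a full order.

```python
# Recruitment orders as stated in the text. Key names are illustrative labels.
ASSEMBLY_ORDER = {
    "Pol II": ["TFIID", "TFIIA/TFIIB", "Pol II-TFIIF", "TFIIE", "TFIIH"],
    "Pol I": ["UBF/SL1", "Pol I-TIF-IA"],
    "Pol III type 1": ["TFIIIA", "TFIIIC", "TFIIIB", "Pol III"],
    "Pol III type 2": ["TFIIIC", "TFIIIB", "Pol III"],
}


def next_factor(system, bound):
    """Return the next component to recruit, or None if assembly is complete.

    `bound` lists the components already on the promoter, in order.
    Raises ValueError if `bound` does not follow the stated order.
    """
    order = ASSEMBLY_ORDER[system]
    bound = list(bound)
    if bound != order[:len(bound)]:
        raise ValueError(f"{bound} is not a valid partial assembly for {system}")
    if len(bound) == len(order):
        return None
    return order[len(bound)]


if __name__ == "__main__":
    print(next_factor("Pol II", ["TFIID"]))
    print(next_factor("Pol III type 1", ["TFIIIA", "TFIIIC"]))
```

Representing each pathway as an ordered list makes the shared logic visible: type 1 and type 2 Pol III promoters differ only by the initial TFIIIA step.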

Initiation of Transcription

Eukaryotic promoters and enhancers

In eukaryotic transcription, promoters are cis-regulatory DNA sequences that specify the site of transcription initiation by RNA polymerases. For RNA polymerase II (Pol II), which transcribes mRNA-encoding genes, the core promoter is a modular region typically spanning about 40 base pairs upstream to 40 base pairs downstream of the transcription start site (TSS). This region contains sequence motifs that collectively determine the basal level of transcription and facilitate recognition by general transcription factors.^[20] One prominent core promoter element is the TATA box, with the consensus sequence TATAAA, positioned approximately 25 to 35 base pairs upstream of the TSS. Found in roughly 10-30% of human Pol II promoters, the TATA box is more prevalent in genes with regulated, tissue-specific expression, such as those involved in development or stress responses, compared to housekeeping genes that often lack it.^[21] The Initiator (Inr) element, encompassing the TSS, has the consensus sequence YYANWYY (Y = pyrimidine, N = any nucleotide, W = A or T) and is present in nearly all Pol II-transcribed genes, aiding in precise start site selection.^[22] Additional Pol II core elements include the TFIIB recognition element (BRE), located upstream of the TATA box with consensus sequences like SSRCGCC (BREu) or RKYACCC (BREd), which modulates initiation efficiency. 
Downstream elements such as the downstream promoter element (DPE), with consensus RGWYVT at +28 to +32 bp relative to the TSS, and the motif ten element (MTE), consensus CSRSC at +20 to +27 bp, cooperate with Inr to enhance promoter strength, particularly in TATA-less promoters common in Drosophila and mammals.^[20] These elements exhibit variability; for instance, housekeeping genes like those encoding actin or GAPDH often feature CpG-rich, TATA-less promoters with dispersed Inr and DPE usage, while tissue-specific genes, such as β-globin, more frequently incorporate TATA boxes for sharp TSS selection.^[23] RNA polymerase I (Pol I) promoters, responsible for ribosomal RNA (rRNA) synthesis from rDNA repeats, consist of a core element spanning -45 to +20 bp around the TSS and an upstream control element (UCE) from -156 to -107 bp. The core element directs basal initiation, while the UCE boosts transcription efficiency up to 10-fold, with both elements showing species-specific sequence conservation but high overall homology in vertebrates.^[24] RNA polymerase III (Pol III) promoters, which drive small non-coding RNAs like tRNAs and 5S rRNA, are classified into three types based on element positioning. Type 1 promoters, as in 5S rRNA genes, feature internal A box (consensus TRGCNNGNG) and C box (GWTCRNNAGC) sequences within the transcribed region, separated by an intermediate element. Type 2 promoters, typical for tRNA genes, contain internal A box (TGGCNNAGTG) and B box (GGTTCGATTCC) elements downstream of the TSS. Type 3 promoters, found in U6 snRNA genes, resemble Pol II promoters with an upstream proximal sequence element (PSE) at -60 to -40 bp and a distal sequence element (DSE) further upstream, often including a TATA box.^[25] Enhancers are distal cis-regulatory elements that increase transcription rates from promoters, often located up to 100 kilobases away, either upstream or downstream, and function independently of orientation relative to the gene. 
First identified in the SV40 viral genome, enhancers loop to contact promoters via chromatin interactions, enabling tissue-specific regulation; for example, the immunoglobulin heavy chain enhancer in B cells drives lymphoid-specific expression through combinatorial binding motifs.^[26] Unlike core promoters, enhancers show greater sequence diversity and are enriched in housekeeping genes for broad activity, whereas tissue-specific enhancers, like those in muscle genes, exhibit precise motif arrangements for developmental control.^[27]
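
The consensus sequences quoted in this section use IUPAC ambiguity codes (for example Y for pyrimidine and W for A or T). As a minimal sketch of how such a consensus can be matched against a DNA sequence, the Python snippet below expands each code into a regular-expression character class. The function names and the toy sequence are assumptions made for illustration; real promoter prediction relies on position weight matrices and positional constraints rather than exact consensus matching.

```python
import re

# Standard IUPAC nucleotide ambiguity codes, expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus such as 'YYANWYY' into a regex string."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_motif(sequence, consensus):
    """Return 0-based start positions of all matches, including overlapping ones."""
    # A lookahead makes every match zero-width, so overlapping hits are reported.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    toy = "GGTATAAAGGCCAATCAGTCTGG"  # hypothetical sequence, not a real promoter
    print("TATA box:", find_motif(toy, "TATAAA"))
    print("Inr:", find_motif(toy, "YYANWYY"))
```

Because short degenerate motifs such as the Inr match frequently by chance, a hit from this kind of scan is only a candidate; position relative to the TSS has to be checked separately.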

Preinitiation complex assembly

The assembly of the preinitiation complex (PIC) in eukaryotic transcription initiation for RNA polymerase II (Pol II) occurs through a highly ordered, sequential recruitment of general transcription factors (GTFs) to the core promoter, typically involving recognition of elements such as the TATA box.^[28] The process begins with the binding of TFIID, a multi-subunit complex containing the TATA-binding protein (TBP), which recognizes and sharply bends the DNA at the TATA box, facilitating subsequent interactions.^[29] TFIIA then associates with the TFIID-DNA complex, stabilizing the TBP-DNA interaction and preventing non-specific binding.^[30] Following this, TFIIB binds to the TBP-DNA-TFIIA platform, bridging the promoter to the arriving Pol II core enzyme, which is escorted by TFIIF; TFIIF helps position Pol II over the start site and stabilizes the downstream DNA duplex.^[31] The recruitment concludes with TFIIE, which binds to the partial complex and recruits TFIIH, a multi-functional ATPase/helicase complex essential for completing PIC formation.^[32] The resulting closed complex maintains the promoter DNA in a double-stranded configuration, with Pol II positioned upstream of the transcription start site (+1).^[30] TFIIH's XPB subunit harbors ATPase activity that supports the structural integrity of this closed state, although hydrolysis is poised for subsequent steps.^[33] The fully assembled Pol II PIC is a massive macromolecular assembly, approximately 2 MDa in size, comprising over 50 polypeptides that collectively ensure precise positioning of the catalytic active site at the +1 nucleotide for accurate initiation.^[29] This stability is critical for the complex's role in scanning the promoter to identify the start site, enhancing transcriptional fidelity.^[34] In contrast, PIC assembly for RNA polymerase I (Pol I), which transcribes ribosomal RNA genes, involves the upstream binding factor (UBF) and the selectivity factor SL1 (a TBP-TAF complex analogous 
to TFIID). UBF binds to upstream control elements and nucleates chromatin remodeling, while SL1 recognizes the core promoter and recruits Pol I alongside the core factor CF or Rrn3.^[35] For RNA polymerase III (Pol III), which handles small non-coding RNAs, assembly starts with TFIIIC binding to internal promoter elements (A and B boxes), followed by recruitment of TFIIIB (comprising TBP, Brf1, and Bdp1), which then positions Pol III at the start site.^[36] These Pol I and Pol III mechanisms highlight specialized adaptations from the Pol II paradigm, tailored to their distinct promoter architectures.^[37]

Promoter melting and open complex formation

Following the assembly of the preinitiation complex (PIC) at the promoter, promoter melting marks the transition to the open complex, where the DNA duplex unwinds to expose single-stranded template DNA for initial RNA synthesis.^[38] This critical step is driven by the general transcription factor TFIIH, whose XPB subunit (Ssl2 in yeast) exhibits ATP-dependent translocase activity that rotates and threads downstream DNA into the RNA polymerase II (Pol II) active site cleft, generating torsional stress that facilitates DNA unwinding.^[39]^[40] The resulting transcription bubble typically spans approximately 12-15 base pairs, centered around the transcription start site, with an initial ~6 bp opening that expands upon ATP hydrolysis by XPB.^[41] In the open complex, the unwound promoter DNA separates into template and non-template strands, positioning the single-stranded template within the Pol II active site while Pol II adopts a post-translocation state, ready for the first phosphodiester bond formation without an initial RNA nucleotide.^[42] This ATP-dependent process requires hydrolysis by XPB to power the helicase-like translocation, ensuring stable bubble formation despite the energy barrier of DNA duplex separation.^[43] TFIIE plays a key role in stabilizing the open complex by interacting with Pol II and TFIIH, buttressing the Pol II clamp to prevent collapse of the unwound DNA and enhancing XPB activity through conformational regulation.^[29] High-resolution cryo-EM structures of the eukaryotic open complex reveal the precise separation of template and non-template strands, with TFIIH in expanded conformations that accommodate the bubble and highlight XPB's arched insertion into the downstream DNA duplex.^[38]^[44] These models demonstrate how the non-template strand is expelled from the Pol II cleft, while the template strand threads through the active site, underscoring the coordinated dynamics of PIC components during melting.^[42]

Abortive initiation and promoter escape

Following the formation of the open complex, RNA polymerase II (Pol II) engages in abortive initiation, a phase characterized by the repeated synthesis and release of short RNA transcripts typically ranging from 2 to 10 nucleotides in length without polymerase clearance from the promoter.^[45] This process involves multiple cycles where Pol II initiates RNA chain extension but dissociates prematurely due to the instability of the early transcription complex and high off-rates, resulting in non-productive transcripts that do not progress to elongation.^[45] In eukaryotic systems, abortive initiation is less pronounced than in bacteria, with fewer cycling events, as evidenced by in vitro assays showing reduced abortive products beyond 4-10 nt when NTP concentrations are physiological (around 200 µM).^[45] Promoter escape marks the critical transition from this inefficient initiation to stable elongation, where Pol II clears the promoter and forms a processive elongation complex. A key regulatory event is the phosphorylation of serine 5 (Ser5) on the C-terminal domain (CTD) heptapeptide repeats of Pol II's largest subunit by the kinase subunit of TFIIH, specifically Kin28 in yeast or Cdk7 in metazoans.^[46] This phosphorylation disrupts stable interactions between the unphosphorylated CTD and preinitiation complex (PIC) components, such as Mediator and general transcription factors, thereby releasing Pol II from promoter-proximal contacts and facilitating escape around 15-20 nt of RNA synthesis.^[46] Structural studies reveal that escape proceeds in three major steps: displacement of the TFIIB B-reader element by the growing RNA, collapse of the transcription bubble at approximately 17-18 bases, and dissociation of initiation factors like TFIIB and TFIIH, enabling the polymerase to advance beyond +30 nt.^[47] The scaffold complex model describes how the Mediator complex and the Pol II holoenzyme contribute to this transition, with Mediator initially stabilizing the 
PIC and then dissociating upon CTD phosphorylation to allow promoter escape while leaving a residual "scaffold" at the promoter for subsequent initiation events. In this framework, the holoenzyme (comprising Pol II, Mediator, and associated factors) undergoes a compositional shift in which TFIIH-mediated CTD Ser5 phosphorylation promotes Mediator release, enhancing the efficiency of Pol II clearance without fully committing to elongation. This model is supported by in vivo and in vitro evidence showing that Mediator remains partially bound post-escape, coordinating reinitiation in a "transcription factory" manner.^[45]

The efficiency of promoter escape is modulated by promoter strength: stronger promoters, such as those featuring optimal TATA box to transcription start site spacing (30-31 bp), exhibit faster escape rates and reduced abortive cycling due to more stable initial complexes.^[45] For instance, canonical TATA elements promote bubble collapse at 9-10 nt, accelerating clearance compared to weaker or non-consensus promoters, which prolong abortive initiation and lower productive transcription yields.^[45] This promoter-dependent variation underscores how sequence context influences the energetic barriers to escape, ensuring regulated gene expression.

Elongation of Transcription

Elongation factors and processivity

During the elongation phase of eukaryotic transcription, RNA polymerase II (Pol II) is assisted by various accessory proteins known as elongation factors, which enhance its processivity—the ability to synthesize long RNA transcripts without dissociating from the DNA template. These factors counteract barriers such as nucleosomes and secondary DNA structures, ensuring efficient progression at rates sufficient for cellular demands.^[48] A key positive elongation factor is P-TEFb, composed of cyclin-dependent kinase 9 (CDK9) and cyclin T, which promotes productive elongation by phosphorylating serine 2 (Ser2) residues on the C-terminal domain (CTD) heptapeptide repeats of Pol II. This phosphorylation releases Pol II from promoter-proximal pausing and facilitates the transition to sustained elongation by modifying interactions with other factors. P-TEFb is recruited to gene bodies in a transcription-dependent manner, amplifying its impact on global mRNA synthesis.^[49]^[50] The negative elongation factors DSIF (Spt4/Spt5 heterodimer) and NELF initially contribute to promoter-proximal pausing but are repurposed for elongation upon phosphorylation by P-TEFb. Phosphorylation of DSIF and NELF dissociates NELF from the elongation complex, converting DSIF into a positive factor that stabilizes Pol II on the template and suppresses premature termination. This transition is essential for efficient escape from initiation into elongation, enabling processive transcription.^[51]^[52] DSIF (Spt4/Spt5) and the histone chaperone Spt6 further support elongation by maintaining chromatin integrity during Pol II passage. Spt4/Spt5 binds to Pol II and nascent RNA, promoting forward translocation and preventing backtracking, while Spt6 associates with the polymerase to facilitate nucleosome disassembly ahead of the transcription bubble and reassembly behind it through histone H3-H4 binding and recycling. 
These actions preserve epigenetic marks and prevent chromatin disruption, allowing Pol II to traverse nucleosomal arrays without stalling.^[53]^[54]^[55] In eukaryotes, these factors collectively enable Pol II to elongate at speeds of approximately 20-60 nucleotides per second in vivo, varying by gene context and cell type, which is critical for timely gene expression. DSIF and Spt6, in particular, play anti-backtracking roles by tethering Pol II to DNA, reducing dissociation events and enhancing overall processivity.^[4]^[56] The super-elongation complex (SEC), incorporating AFF4, ELL proteins, ENL/AF9, and P-TEFb, further boosts processivity for rapid transcription of specific genes, such as immediate early genes and those involved in development. AFF4 scaffolds the complex, while ELL prevents Pol II pausing by suppressing transient termination, enabling burst-like transcription rates that exceed standard elongation. SEC is particularly vital in contexts requiring high transcriptional output, like cellular stress responses.^[57]^[58]
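
To give a sense of scale for the quoted in vivo elongation rates, the following Python sketch converts a rate and a gene length into a transcription time. The 100,000-nt gene length is an assumed example value, and the calculation assumes a constant rate with no pausing or backtracking, so it is a lower-bound estimate rather than a measurement.

```python
def transcription_time_minutes(gene_length_nt, rate_nt_per_s):
    """Minutes needed to traverse a gene at a constant elongation rate."""
    if rate_nt_per_s <= 0:
        raise ValueError("rate_nt_per_s must be positive")
    return gene_length_nt / rate_nt_per_s / 60.0


if __name__ == "__main__":
    gene_length = 100_000  # assumed example length in nucleotides
    for rate in (20, 60):  # range quoted in the text, nt per second
        minutes = transcription_time_minutes(gene_length, rate)
        print(f"{rate} nt/s: {minutes:.1f} min")
```

Under these assumptions a long gene occupies a polymerase for tens of minutes, which illustrates why factors that raise processivity matter for timely expression.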

Transcription fidelity and proofreading

During eukaryotic transcription elongation by RNA polymerase II (Pol II), fidelity is maintained through multiple checkpoints to ensure accurate nucleotide incorporation, with an overall error rate of approximately 1 in 10^6 nucleotides. The primary mechanism begins with base selection in the active site, where the trigger loop of the RPB1 subunit collapses to position the incoming nucleoside triphosphate (NTP) for correct Watson-Crick base pairing with the template DNA, discriminating against mismatches with kinetic rates that are 10^3 to 10^4 times slower than for correct NTPs. This selectivity arises from the precise geometry of the active site, which stabilizes matched base pairs while imposing energetic penalties on mismatches, resulting in an initial misincorporation error rate of about 1 in 10^4 to 10^5.^[59]^[60] Post-incorporation fidelity is enhanced by kinetic barriers that hinder the extension of mismatched nucleotides, preventing their propagation into the RNA chain. Mismatch-specific strategies vary: purine-purine mismatches (e.g., G·A) induce structural distortions that slow translocation and extension by over 4,000-fold relative to correct pairs, while pyrimidine-pyrimidine mismatches (e.g., T·U) form wobble pairs that fray the RNA 3' end, displacing it by more than 2 Å from the catalytic site and disrupting metal ion coordination essential for phosphodiester bond formation. These barriers provide a window for error detection before further elongation.^[61] Proofreading occurs when a mismatch triggers backtracking of the transcription complex, extruding the erroneous 3' RNA end into the secondary channel of Pol II. The rudder and lid domains of RPB1 play critical roles here: the rudder separates the RNA from the DNA template downstream of the active site, while the lid grips the RNA upstream, stabilizing the RNA:DNA hybrid and facilitating backtracking upon mismatch detection by altering hybrid stability. 
This backtracked state positions the mismatched RNA for cleavage, where Pol II's intrinsic endonucleolytic activity hydrolyzes the phosphodiester bond 2-9 nucleotides upstream, excising the error as a dinucleotide or oligonucleotide. The proofreading step further reduces the error rate by 10- to 100-fold, depending on the mismatch type.^[62]^[61]^[60] The cleavage factor TFIIS (transcription factor IIS) significantly amplifies proofreading efficiency by binding to the secondary channel of backtracked Pol II, opening the active site to promote cleavage of mismatched RNA up to 10^5 times faster than the intrinsic rate. TFIIS stabilizes the backtracked conformation and coordinates a second catalytic metal ion, enabling precise endonucleolytic hydrolysis at the mismatch site, which is particularly effective for single-nucleotide errors. This stimulated cleavage ensures high fidelity during elongation, with TFIIS-dependent proofreading accounting for much of the overall accuracy in vivo. Elongation factors like Spt4/5 can indirectly support fidelity by modulating backtracking dynamics, though their primary role is processivity.^[63]^[60]
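
The figures in this section can be combined arithmetically. The Python sketch below divides the quoted initial misincorporation rates by the quoted proofreading improvements, treating the two steps as independent multiplicative filters. That independence is a simplifying assumption, and the function names are hypothetical.

```python
def residual_error_rate(misincorporation_rate, proofreading_fold):
    """Error rate left after proofreading removes errors by a given fold."""
    if proofreading_fold < 1:
        raise ValueError("proofreading_fold must be at least 1")
    return misincorporation_rate / proofreading_fold


def expected_errors(transcript_length_nt, error_rate):
    """Mean number of errors expected in a transcript of the given length."""
    return transcript_length_nt * error_rate


# Ranges quoted in the text.
MISINCORPORATION_RATES = (1e-4, 1e-5)
PROOFREADING_FOLDS = (10, 100)

if __name__ == "__main__":
    for rate in MISINCORPORATION_RATES:
        for fold in PROOFREADING_FOLDS:
            print(f"{rate:.0e} / {fold} = {residual_error_rate(rate, fold):.0e}")
```

The four combinations span roughly 1 in 10^5 to 1 in 10^7, which brackets the overall error rate of about 1 in 10^6 stated at the start of the section.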

Pausing, poising, and backtracking

In eukaryotic transcription elongation, promoter-proximal pausing occurs shortly after initiation, where RNA polymerase II (Pol II) stalls approximately 20-60 nucleotides downstream of the transcription start site, preventing premature progression into productive elongation.^[64] This pause is mediated by the negative elongation factor (NELF) complex and the DRB sensitivity-inducing factor (DSIF), which form a stable complex with the initiating form of Pol II (Pol IIa) and the nascent RNA transcript.^[64] NELF binds directly to the emerging RNA chain, while DSIF associates with Pol II to restrict its forward movement, thereby poising the polymerase for rapid activation upon receiving appropriate regulatory signals.^[64]

A classic example of promoter-proximal pausing and poising is observed in the Drosophila heat shock genes, such as hsp70, where Pol II accumulates in a paused state under non-stress conditions, enabling swift transcriptional activation in response to heat shock. This poised configuration, established by NELF and DSIF, allows for the near-instantaneous recruitment of additional Pol II molecules and the transition to the elongating form (Pol IIo) upon induction, facilitating a robust protective response against cellular stress.^[64] In uninduced cells, the paused Pol II at hsp70 is transcriptionally engaged but stalled, ensuring that gene expression can be rapidly amplified without the delay of de novo initiation complex assembly.^[65]

Backtracking represents another regulatory mechanism during elongation, wherein Pol II undergoes reverse translocation due to slippage in the RNA:DNA hybrid, displacing the 3' end of the nascent RNA into the polymerase's secondary channel by 8-14 nucleotides. This backward movement stabilizes misaligned complexes, leading to transcriptional pausing or arrest, particularly at sequence obstacles or during error correction. Resolution of backtracked states requires cleavage of the extruded RNA segment to realign the 3' end with the active site, restoring elongation competence.

In eukaryotes, transcription factor IIS (TFIIS), a functional analog of bacterial GreB, enhances Pol II's intrinsic endonucleolytic activity to cleave backtracked transcripts, typically producing fragments of 8-14 nucleotides that match the extent of reversal. TFIIS inserts into the polymerase's secondary channel, donating acidic residues to stabilize the catalytic magnesium ions and promote phosphodiester bond hydrolysis.^[66] This cleavage not only rescues stalled complexes but also contributes to transcriptional fidelity by removing mismatched or aberrant RNA segments, as backtracking often occurs following incorporation errors. In the context of promoter-proximal pausing, such as at Drosophila hsp70, TFIIS is essential for efficient release from stall sites, further underscoring its role in poising and activation.

Co-transcriptional RNA processing

Co-transcriptional RNA processing encompasses the modification events that mature the nascent pre-mRNA during transcription elongation by RNA polymerase II (Pol II), ensuring efficient coupling between synthesis and quality control of the transcript. These processes are orchestrated primarily through the C-terminal domain (CTD) of Pol II's largest subunit (RPB1), a tail of tandem heptapeptide repeats (52 in humans; consensus YSPTSPS) that undergoes dynamic phosphorylation. Phosphorylation patterns on CTD residues, particularly serines 2 and 5, serve as a platform for recruiting distinct processing machineries, preventing uncapped or improperly spliced RNAs from accumulating and linking processing fidelity to transcriptional progression.

The initial modification, 5' capping, occurs co-transcriptionally shortly after transcription initiation, when the nascent RNA chain reaches 20-30 nucleotides in length. This step involves the addition of a 7-methylguanosine cap (m7GpppN) to the 5' triphosphate end by three enzymatic activities: RNA triphosphatase (removes the γ-phosphate), guanylyltransferase (adds GMP), and methyltransferase (adds the methyl group). Recruitment of the capping enzyme to Pol II is mediated by direct binding of its guanylyltransferase subunit to the CTD phosphorylated at serine 5 (Ser5-P), a mark established by the Kin28/Cdk7 kinase during promoter clearance; this interaction allosterically activates the enzyme and restricts capping to Pol II transcripts.

Splicing, the removal of introns and ligation of exons, also proceeds co-transcriptionally for most introns, with spliceosome assembly initiated as soon as splice sites emerge from the polymerase exit channel. The U1 small nuclear ribonucleoprotein (snRNP) is the first component recruited, binding the 5' splice site at exon-intron boundaries via base-pairing with its U1 snRNA; this commitment complex forms within seconds of transcription across the site in yeast and mammalian systems.
Subsequent spliceosome maturation involves U2 snRNP binding the branch point and recruitment of the tri-snRNP, facilitated by interactions with the CTD; phosphorylation at serine 2 (Ser2-P), catalyzed by kinases like P-TEFb (Ctk1 in yeast), promotes association of splicing factors such as U2AF and SR proteins, enhancing splice site recognition and enabling sequential intron removal during elongation. Preparation for 3' end formation begins during elongation, with cleavage and polyadenylation factors scanning the nascent RNA for processing signals. The cleavage and polyadenylation specificity factor (CPSF) complex, comprising subunits such as CPSF-160, CPSF-73, CPSF-30, and WDR33, recognizes the polyadenylation signal (PAS) AAUAAA (or variants like AUUAAA) located 10-30 nucleotides upstream of the cleavage site; this hexanucleotide motif is contacted directly by the CPSF-30 and WDR33 subunits, with CPSF-160 acting as a scaffold, often in concert with downstream GU/U-rich elements recognized by the cleavage stimulation factor (CstF). CPSF recruitment to the elongating Pol II occurs via interactions with the Ser2-P CTD and associated elongation factors. Once the signal is recognized, the endonuclease CPSF-73 cleaves the transcript at the defined site, and poly(A) polymerase then adds the poly(A) tail to the new 3' end.
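The signal-to-cleavage-site spacing described above can be illustrated with a short sequence scan. The following Python sketch is only an illustration, not a model of how CPSF behaves in vivo: the function name `find_pas_sites`, its parameters, and the choice to measure the 10-30 nucleotide gap from the 3' end of the hexamer are assumptions made here, and downstream GU/U-rich elements are ignored.

```python
import re

# Canonical polyadenylation signal and the common variant named in the text.
PAS_HEXAMERS = ("AAUAAA", "AUUAAA")

def find_pas_sites(rna, min_gap=10, max_gap=30):
    """Return a list of (hexamer, start, window_first, window_last) tuples.

    start is the 0-based index of the hexamer. window_first..window_last
    (inclusive) are the positions lying min_gap..max_gap nt past the
    hexamer's 3' end (an assumed convention for this sketch). The window
    is not clipped, so it may extend beyond a short transcript.
    """
    # Accept DNA-style input by converting T to U.
    rna = rna.upper().replace("T", "U")
    # A lookahead is used so that overlapping hexamers are all reported.
    pattern = re.compile("(?=(" + "|".join(PAS_HEXAMERS) + "))")
    hits = []
    for match in pattern.finditer(rna):
        start = match.start()
        hexamer_end = start + 6
        hits.append((match.group(1), start,
                     hexamer_end + min_gap, hexamer_end + max_gap))
    return hits

if __name__ == "__main__":
    transcript = "GGG" + "AAUAAA" + "C" * 40
    for hexamer, start, first, last in find_pas_sites(transcript):
        print(hexamer, "at", start, "candidate cleavage window", first, "to", last)
```

A hexamer match alone is a weak predictor, since the sequence occurs by chance throughout transcripts; real site choice also depends on the auxiliary elements and CTD-linked recruitment described above.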

Termination of Transcription

Pol II-dependent termination mechanisms

In eukaryotic cells, transcription termination by RNA polymerase II (Pol II) primarily occurs through the torpedo model, where cleavage and polyadenylation at the poly(A) site generate a free 5' end on the downstream RNA fragment that is rapidly degraded by the 5'-3' exonuclease Rat1 (in yeast) or Xrn2 (in humans), acting as a "torpedo" to catch up with and displace the elongating Pol II complex from the DNA template.^[67] This process is tightly coupled to 3' end formation, ensuring efficient release of the mature mRNA and recycling of Pol II for subsequent rounds of transcription. The model integrates elements of an older allosteric mechanism, where conformational changes in Pol II triggered by processing factors contribute to weakening polymerase-DNA interactions, but the nuclease chase is the key effector for termination.^[67]^[68] The cleavage at the poly(A) site is mediated by the cleavage and polyadenylation specificity factor (CPSF) complex, which recognizes the AAUAAA signal and associated downstream elements, producing the upstream RNA for polyadenylation and the downstream fragment for degradation.^[69] Rat1/Xrn2 loads onto this 5' end and processively degrades the nascent RNA, requiring association with the Rai1 cofactor for efficient activity near Pol II; upon reaching the polymerase, the exonuclease displaces Pol II by disrupting its interactions with the RNA-DNA hybrid and template DNA.^[70] Torpedo loading and function depend on cleavage factor IA (CF IA), whose subunits in yeast include Pcf11, Clp1, Rna14, and Rna15, and which bridges the processing machinery to Pol II via interactions with the phosphorylated C-terminal domain (CTD) of the largest subunit, Rpb1.^[71] Specifically, phosphorylation of serine 2 (Ser2-P) in the CTD heptapeptide repeats (YSPTSPS) during elongation recruits termination factors such as Pcf11 and Rtt103, which bind cooperatively to Ser2-P marks to stabilize the subcomplex at the 3' end of genes and facilitate exonuclease access.^[72]^[73] As a fail-safe mechanism, Pol II employs intrinsic pausing and backtracking, where the polymerase stalls and extrudes a 3' RNA flap that can be cleaved endonucleolytically by the active site of Pol II itself, generating entry points for exonucleases if the primary torpedo pathway is impaired.^[74] This backtracked cleavage, enhanced by factors like TFIIS, prevents persistent polymerase stalling and ensures termination even in the absence of poly(A)-dependent processing, as observed in some non-coding transcripts or under mutagenic conditions.^[68] These redundant pathways highlight the robustness of Pol II termination, minimizing read-through transcription that could interfere with downstream gene expression.
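The "chase" at the heart of the torpedo model can be expressed as simple kinematics. The sketch below is a deliberately crude toy, assuming constant rates and a polymerase that never pauses or backtracks; the function name `torpedo_catch_up` and all parameter values are hypothetical, and the exonuclease rate in particular is an illustrative number rather than a measured one.

```python
def torpedo_catch_up(lead_nt, pol_rate_nt_s, exo_rate_nt_s):
    """Toy kinematic estimate for the torpedo chase.

    lead_nt        -- nucleotides between the cleavage site and Pol II
                      when the exonuclease starts degrading
    pol_rate_nt_s  -- Pol II elongation rate (nt/s), assumed constant
    exo_rate_nt_s  -- exonuclease degradation rate (nt/s), assumed constant

    Returns (time_s, extra_nt_transcribed), or None when the exonuclease
    is not faster than the polymerase and so never catches up.
    """
    if lead_nt < 0 or pol_rate_nt_s < 0 or exo_rate_nt_s <= 0:
        raise ValueError("lead and rates must be non-negative, exo rate positive")
    closing_rate = exo_rate_nt_s - pol_rate_nt_s
    if closing_rate <= 0:
        return None
    time_s = lead_nt / closing_rate
    # Distance Pol II keeps transcribing past its starting point before contact.
    return time_s, pol_rate_nt_s * time_s

if __name__ == "__main__":
    print(torpedo_catch_up(lead_nt=100, pol_rate_nt_s=20, exo_rate_nt_s=30))
    print(torpedo_catch_up(lead_nt=100, pol_rate_nt_s=40, exo_rate_nt_s=40))
```

In this toy picture, anything that slows the polymerase relative to the exonuclease shortens both the chase time and the read-through distance, which is one way to see why pausing and termination are so often linked.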

Pol I- and Pol III-specific termination

In eukaryotic cells, termination of RNA polymerase I (Pol I) transcription, which synthesizes the majority of ribosomal RNA (rRNA), relies on sequence-specific roadblock factors that pause the elongating polymerase. In mammals, the transcription termination factor TTF-I binds to Sal box sequences—short DNA motifs (11–18 bp) located in the intergenic spacer downstream of the pre-rRNA coding region—inducing Pol I pausing and subsequent dissociation of the transcription complex.^[75] Multiple Sal box elements (T1–T10) enhance termination efficiency and may facilitate gene looping, bringing terminators into proximity with upstream promoters to promote Pol I recycling and reinitiation.^[24] In yeast, the analogous mechanism involves Reb1 (or its homolog Nsi1) binding to terminator sites in the ribosomal DNA intergenic spacer, such as the T1 site ~93 nucleotides downstream of the 25S rRNA gene, which pauses Pol I.^[76] Efficient termination requires the 5′–3′ exonuclease Rat1 (Xrn2 in mammals), recruited following endonucleolytic cleavage of the nascent pre-rRNA by Rnt1 at sites in the 3′ external transcribed spacer; Rat1 then degrades the RNA from the 5′ end, "torpedoing" the paused polymerase and promoting its release.^[76] Without Rat1 activity, Pol I often reads through the primary terminator, relying on secondary sites like T2 or the replication fork barrier for eventual dissociation.^[75] RNA polymerase III (Pol III), responsible for transcribing small non-coding RNAs such as tRNAs and 5S rRNA, employs an intrinsic termination mechanism independent of additional protein factors in many cases. 
Termination occurs at stretches of 4–6 thymidines (oligo-dT) in the non-template DNA strand, corresponding to oligo-dA in the template, which produce a run of uridines (oligo-U) at the RNA 3′ end; the weak rU:dA base pairs destabilize the RNA:DNA hybrid, causing Pol III to stall and dissociate without precise cleavage.^[77] Efficiency depends on the length and flanking sequences, with at least 4 Ts sufficient in vertebrates and 5 or more in Saccharomyces cerevisiae, often resulting in heterogeneous oligo-U tracts of variable length.^[77] In yeast, the ATP-dependent helicase Sen1 enhances Pol III termination by binding to the stalled complex and unwinding the RNA:DNA hybrid, facilitating polymerase displacement and transcript release, particularly for genes with suboptimal oligo-dT signals; this acts as a fail-safe mechanism to prevent read-through transcription.^[78] Sen1's role underscores a conserved function across polymerases, though Pol III termination remains more direct than the coupled cleavage-exonuclease processes in Pol I and Pol II.^[77]
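The oligo-dT rule described above lends itself to a simple scan of the non-template strand. The following Python sketch assumes a plain-string DNA input; the function name `find_oligo_dt_terminators` and the `min_run` parameter are introduced here for illustration, and flanking-sequence effects mentioned in the text are not modeled.

```python
import re

def find_oligo_dt_terminators(non_template_dna, min_run=4):
    """Return (start, length) for each maximal run of T at least min_run long.

    min_run=4 mirrors the vertebrate minimum quoted in the text; pass
    min_run=5 to approximate the Saccharomyces cerevisiae requirement.
    start is a 0-based index into the non-template strand.
    """
    if min_run < 1:
        raise ValueError("min_run must be at least 1")
    dna = non_template_dna.upper()
    # T{n,} is greedy, so each match is a maximal run of thymidines.
    pattern = re.compile("T{%d,}" % min_run)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(dna)]

if __name__ == "__main__":
    gene_end = "GCGCTTTTTGCATTTGC"
    print(find_oligo_dt_terminators(gene_end))             # vertebrate-style threshold
    print(find_oligo_dt_terminators(gene_end, min_run=6))  # stricter threshold
```

Raising `min_run` is a rough stand-in for a weaker terminator: runs that fall below the threshold are the kind of suboptimal signals at which, per the text, Sen1 is proposed to assist.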

Transcription Regulation

Chromatin remodeling and epigenetic control

Chromatin remodeling and epigenetic modifications play crucial roles in regulating access to DNA for transcription in eukaryotes by dynamically altering nucleosome structure and stability. These processes enable the transition between condensed, transcriptionally repressive chromatin (heterochromatin) and open, accessible states (euchromatin), thereby controlling the recruitment of RNA polymerase II and associated factors. Epigenetic marks, including covalent modifications on histones and DNA, provide heritable information that influences transcriptional states without altering the underlying genetic sequence.^[79] Histone acetylation, catalyzed by histone acetyltransferases (HATs) such as p300 and CBP, neutralizes the positive charge on lysine residues, reducing the affinity between histones and negatively charged DNA to promote an open chromatin conformation conducive to transcription activation. For instance, acetylation of histone H3 and H4 tails at promoter regions facilitates the binding of bromodomain-containing proteins that further stabilize the transcriptional machinery. In contrast, histone methylation exhibits context-dependent effects: trimethylation of histone H3 at lysine 4 (H3K4me3), deposited by SET1/MLL complexes, marks active promoters and enhancers, correlating with high levels of transcription by recruiting chromatin readers like TAF3 in the preinitiation complex. Conversely, trimethylation of H3 at lysine 27 (H3K27me3), mediated by the Polycomb Repressive Complex 2 (PRC2) containing EZH2, enforces gene repression by compacting chromatin and inhibiting polymerase progression, particularly at developmental loci.^[80]^[81] ATP-dependent chromatin remodeling complexes, such as the SWI/SNF family, utilize the energy from ATP hydrolysis to slide, eject, or restructure nucleosomes, thereby exposing promoter regions for transcriptional initiation. 
The SWI/SNF complex, featuring the BRG1 or BRM ATPase subunits, disrupts nucleosome-DNA contacts to facilitate activator binding and polymerase recruitment at target genes. Complementarily, the ISWI family remodelers, including complexes like ACF and CHRAC, promote regular nucleosome spacing and array assembly, which supports both activation and repression by maintaining ordered chromatin architecture that influences transcription factor access. These remodelers often cooperate with histone modifications to fine-tune chromatin dynamics during gene expression.^[82]^[83] DNA methylation, primarily at cytosine residues in CpG dinucleotides, contributes to transcriptional silencing by recruiting methyl-CpG-binding domain proteins that compact chromatin and exclude transcription factors, with dense methylation at promoter-proximal CpG islands strongly correlating with gene repression. Demethylation is facilitated by ten-eleven translocation (TET) enzymes, which oxidize 5-methylcytosine to 5-hydroxymethylcytosine and further intermediates, enabling active removal of repressive marks and potential reactivation of silenced loci. This balance ensures stable yet reversible control over transcriptional accessibility.^[84]^[85] Epigenetic inheritance of these modifications occurs during DNA replication, where parental histones with marks like H3K4me3 or H3K27me3 are randomly segregated to daughter strands, and histone chaperones such as CAF-1 facilitate the propagation of these patterns through read-write mechanisms involving methyltransferases that restore marks on newly synthesized histones. This semi-conservative process maintains transcriptional memory across cell divisions, ensuring lineage-specific gene expression profiles. These mechanisms collectively influence preinitiation complex assembly by modulating chromatin accessibility at promoters.^[86]^[87]

Gene-specific activation and enhancers

Gene-specific activation in eukaryotic transcription is primarily mediated by sequence-specific transcription activators that bind to enhancer regions, distal DNA sequences that boost the activity of target promoters. These activators recruit the transcriptional machinery to specific genes, enabling precise control over expression in response to cellular signals. Enhancers can function over long distances, often tens to hundreds of kilobases away from their target promoters, and their activity is largely independent of orientation and position.^[27] Transcription activators typically consist of a DNA-binding domain (DBD) and an activation domain (AD). Common DBD motifs include zinc fingers, which coordinate zinc ions to stabilize interactions with DNA bases, as seen in the transcription factor Sp1, and helix-turn-helix motifs, where an alpha helix recognizes the major groove of DNA, exemplified by homeodomain proteins. Other activators use different folds: NF-κB, a Rel family member, binds κB sites through its Rel homology domain, which adopts an immunoglobulin-like β-sandwich fold, to activate immune response genes, and p53 recognizes response elements in genes involved in cell cycle arrest and apoptosis through a β-sandwich DBD stabilized by a coordinated zinc ion.^[88]^[89]^[90]^[91] Co-activators bridge activators to the core transcriptional machinery, enhancing recruitment and activity. The Mediator complex, a large multi-subunit co-activator, interacts with activator ADs via its head, middle, and tail modules and binds the C-terminal domain (CTD) of RNA polymerase II (Pol II) to stabilize pre-initiation complex formation. 
p300, a histone acetyltransferase (HAT), serves as another key co-activator by acetylating histones to loosen chromatin and directly interacting with activators like p53 to facilitate Pol II recruitment; its HAT activity is essential for enhancer function.^[92]^[93]^[94] Enhancer-promoter communication often involves chromatin looping, where enhancers are brought into physical proximity with promoters. The Mediator complex, in conjunction with cohesin, facilitates these loops by tethering enhancer-bound activators to promoter-bound Mediator, while CTCF proteins at loop boundaries insulate domains to prevent inappropriate interactions. This looping mechanism enables enhancers to contact promoters across large genomic distances, as demonstrated in studies of developmental genes where cohesin extrusion along DNA fibers positions enhancers near targets.^[95]^[27]^[96] Super-enhancers represent clusters of multiple enhancers bound by high densities of activators, Mediator, and Pol II, driving robust expression of genes critical for cell identity, such as those encoding master transcription factors in embryonic stem cells. These regions span larger genomic areas (typically >12.5 kb) with elevated levels of histone acetylation and bromodomain protein occupancy compared to typical enhancers, leading to heightened transcriptional output and sensitivity to perturbations. However, the concept of super-enhancers has been subject to debate, with some researchers arguing they represent a quantitative rather than qualitative difference from typical enhancers.^[97]^[98]^[99]^[100]

Gene-specific repression and silencers

Gene-specific repression in eukaryotic transcription involves targeted mechanisms that inhibit the expression of particular genes, often through sequence-specific DNA-binding proteins known as repressors. These repressors directly bind to regulatory elements near target genes, preventing the assembly or activity of the transcription initiation complex. A prominent example is the repressor element-1 silencing transcription factor (REST), also known as neuron-restrictive silencer factor (NRSF), which binds to repressor element-1 (RE1) sites in the promoters of neuronal genes, thereby blocking their transcription in non-neuronal cells to maintain cell-type specificity during development.^[101]^[102] Another mode of repression, termed quenching, occurs when repressors compete with activators for binding to overlapping or adjacent DNA sites, thereby reducing activator occupancy and transcriptional activation without directly altering chromatin structure. This competitive quenching is particularly evident in short-range regulatory contexts, such as in Drosophila loci where repressors like those in the even-skipped stripe 2 enhancer displace activators to fine-tune spatial gene expression patterns.^[103]^[104] Co-repressors enhance the repressive activity of DNA-binding repressors by recruiting chromatin-modifying enzymes that compact nucleosomes and inhibit RNA polymerase access. Histone deacetylases (HDACs), such as HDAC1 and HDAC2, are key co-repressors that remove acetyl groups from histones, promoting a closed chromatin conformation that silences transcription. 
The Sin3 complex, a multi-subunit assembly containing HDACs, acts as a scaffold for these interactions; for instance, REST recruits the Sin3-HDAC complex to RE1 sites, amplifying repression of neuronal genes through deacetylation of histones H3 and H4.^[105]^[106]^[107] Silencers are cis-regulatory DNA elements that mediate long-range repression, distinct from promoters and enhancers, by insulating genes from activating signals or imposing heritable silencing. Insulator elements, such as those bound by CCCTC-binding factor (CTCF), block enhancer-promoter interactions by forming chromatin loops that physically separate regulatory domains, thereby preventing inappropriate activation in eukaryotic genomes. Polycomb repressive complexes (PRCs) associate with silencers to enforce stable repression; PRC2 catalyzes trimethylation of histone H3 at lysine 27 (H3K27me3), which recruits PRC1 to monoubiquitinate histone H2A at lysine 119 (H2AK119ub), compacting chromatin and inhibiting Pol II elongation at developmental genes like Hox clusters.^[108]^[109]^[110]^[111] A major class of eukaryotic repressors featuring silencer-like activity includes KRAB-containing zinc finger proteins (KRAB-ZFPs), the largest family of transcription factors in vertebrates, which bind specific DNA motifs via their C2H2 zinc finger domains. The KRAB domain recruits co-repressors like KAP1 (KRAB-associated protein 1), which in turn assembles HDACs and heterochromatin protein 1 (HP1) to silence endogenous retroviruses and developmental genes, often over long distances through heterochromatin spreading. Unlike bacterial quorum sensing, which coordinates population-level responses via diffusible signals, KRAB-ZFPs provide precise, cell-intrinsic repression tailored to eukaryotic complexity, such as in silencing transposable elements to maintain genomic stability.^[112]^[113]^[114]

Post-initiation control of elongation and termination

Post-initiation control in eukaryotic transcription primarily modulates the progression of RNA polymerase II (Pol II) through elongation and ensures precise termination, thereby fine-tuning gene expression levels and preventing aberrant transcription. After promoter escape, Pol II encounters regulatory checkpoints that influence its processivity, such as inhibitory complexes that maintain a paused state or promote premature termination in non-productive regions. These mechanisms integrate signals from cellular conditions to activate productive elongation or enforce termination fidelity, often coupling transcription with downstream RNA processing events.^[48] A key regulator of elongation is the positive transcription elongation factor b (P-TEFb), a cyclin-dependent kinase complex comprising CDK9 and cyclin T that phosphorylates the C-terminal domain (CTD) of Pol II at serine 2, as well as negative elongation factors DSIF and NELF, to facilitate pause release and productive elongation. P-TEFb activity is tightly controlled through sequestration in an inhibitory complex with HEXIM1 or HEXIM2 proteins and the 7SK small nuclear ribonucleoprotein (snRNP), where HEXIM binds 7SK RNA to expose domains that inhibit CDK9 kinase activity, thereby preventing premature elongation activation.^[115] Release of P-TEFb from this complex occurs upon recruitment by activators like BRD4 or Tat in viral contexts, allowing phosphorylation and elongation progression; this dynamic inhibition-activation cycle ensures elongation responds to developmental or stress signals.^[116] Attenuation mechanisms involve premature termination of Pol II shortly after initiation, particularly in promoter-proximal or non-coding regions, to silence spurious transcription and maintain genomic stability. 
For instance, promoter upstream transcripts (PROMPTs), short non-coding RNAs transcribed bidirectionally from sequences ~1-2 kb upstream of protein-coding gene start sites, are rapidly terminated via the torpedo pathway, where the XRN2 exonuclease degrades the RNA and displaces Pol II after cleavage by the cleavage and polyadenylation specificity factor (CPSF).^[117] This process is widespread, affecting over 50% of human genes, and helps prevent antisense interference or unproductive transcripts from accumulating.^[118] Termination fidelity is enhanced by coupling with 3' end processing, where the Integrator complex plays a pivotal role in snRNA genes by recognizing the snRNA promoter and cleaving nascent transcripts ~20-40 nucleotides downstream of the mature 3' end, triggering Pol II release through recruitment of termination factors like XRN2.^[119] For protein-coding genes, termination often involves alternative polyadenylation (polyA) site choice, where Pol II processivity and CTD phosphorylation influence selection among tandem polyA signals; high elongation rates favor distal sites for longer 3' UTRs, while pausing or attenuation promotes proximal cleavage, linking termination efficiency to mRNA isoform diversity and stability.^[120]^[121] Antisense transcription can interfere with sense strand elongation and termination by converging Pol II complexes that collide, leading to backtracking, stalling, or premature dissociation of the sense polymerase, particularly in head-on orientations that exacerbate roadblock effects. This interference is condition-specific, as seen in yeast where antisense RNAs repress sense genes by extending into promoters and altering chromatin, thereby modulating elongation rates without direct sequence complementarity.^[122]^[123]

Specialized Transcription Processes

Transcription-coupled DNA repair

Transcription-coupled nucleotide excision repair (TC-NER) is a specialized subpathway of nucleotide excision repair (NER) that preferentially repairs DNA lesions on the transcribed strand of active genes in eukaryotes. When RNA polymerase II (Pol II) encounters a bulky DNA lesion, such as a cyclobutane pyrimidine dimer induced by ultraviolet (UV) light, transcription elongation stalls, triggering the recruitment of repair factors to resolve the blockage and restore transcription. This process ensures that actively transcribed genes are repaired more rapidly than non-transcribed regions, prioritizing the maintenance of gene expression fidelity.^[124] Central to TC-NER initiation are the Cockayne syndrome proteins B (CSB, encoded by ERCC6) and A (CSA, encoded by ERCC8), which form a complex that recognizes the stalled Pol II and facilitates the assembly of the repair machinery. CSB, an SNF2 family ATPase, binds directly to the stalled polymerase and promotes its backtracking, a process where Pol II reverses along the DNA template to extrude the 3' end of the nascent RNA and expose the DNA lesion for verification and incision. CSA, a WD40-repeat protein, acts as an adaptor that ubiquitinates and recruits additional factors, including the TFIIH helicase and core NER proteins like XPA, to the site, enabling dual incision of the damaged strand approximately 20-30 nucleotides away from the lesion. This backtracking step is crucial, as it provides physical access to the lesion without dissociating Pol II from the DNA, and in some contexts, factors like Rad51 may stabilize the reversed complex to prevent collapse into double-strand breaks. 
Recent advances as of 2024 have elucidated transcription-coupled repair of DNA-protein crosslinks (DPCs), where stalled Pol II recruits factors for DPC proteolysis, and coordination with repair-independent eviction via the p97-proteasome pathway to maintain genome stability.^[125]^[126]^[127]^[128]^[129] Mutations in CSA or CSB genes disrupt TC-NER, leading to Cockayne syndrome, a rare autosomal recessive disorder characterized by severe neurological dysfunction, premature aging, and extreme sensitivity to UV-induced DNA damage due to inefficient repair of transcription-blocking lesions. Patients with Cockayne syndrome exhibit hyperphotosensitivity and defective recovery of RNA synthesis following UV exposure, underscoring the pathway's role in cellular viability. Beyond canonical TC-NER, non-canonical transcription-coupled base excision repair (TC-BER) operates for oxidative lesions, where Pol II stalling recruits DNA glycosylases such as NEIL1 and NEIL2 to initiate repair of base modifications like 8-oxoguanine on the transcribed strand. This coupling enhances the efficiency of repairing transcription-impeding oxidative damage in eukaryotes.^[124]^[130]

Dysregulation of transcription in cancer

Dysregulation of eukaryotic transcription plays a central role in oncogenesis by altering gene expression patterns that promote uncontrolled cell proliferation, survival, and metastasis. Mutations and aberrant activity in RNA polymerase II (Pol II) components, including its C-terminal domain (CTD), contribute to these changes. In cancer, Pol II factors such as POLR2A (encoding the RPB1 subunit with the CTD) are frequently overexpressed, driving tumor progression in gastric, ovarian, and acute myeloid leukemia cells through enhanced transcriptional output and cell cycle advancement.^[131] Additionally, oncogenic super-enhancers hijack Pol II machinery to amplify proto-oncogenes like MYC, forming large clusters of enhancers that sustain high MYC expression in multiple myeloma and other malignancies, thereby promoting transcriptional addiction and tumor maintenance.^[132] Epigenetic alterations further dysregulate transcription in cancer, often manifesting as global DNA hypomethylation that reactivates silenced oncogenes and transposons, alongside focal hypermethylation of tumor suppressor promoters. This imbalance disrupts chromatin accessibility and Pol II recruitment, fostering a pro-tumorigenic transcriptional landscape common across various cancer types. Histone deacetylase (HDAC) inhibitors counteract these changes by increasing histone acetylation to restore normal gene expression; vorinostat (Zolinza), approved by the FDA in 2006 for cutaneous T-cell lymphoma, exemplifies this therapeutic approach by inducing cell cycle arrest and apoptosis in transformed cells.^[133]^[134] Fusion proteins from chromosomal translocations exemplify direct interference with transcriptional activation. 
In acute promyelocytic leukemia (APL), the PML-RARα fusion protein aberrantly recruits corepressors to retinoic acid response elements, blocking Pol II-mediated activation of differentiation genes and maintaining the leukemic state; all-trans retinoic acid therapy reverses this by relieving repression and promoting degradation of the fusion.^[135] Recent advances, particularly post-2020, highlight vulnerabilities in transcriptional elongation as therapeutic targets. Genome-wide CRISPR/Cas9 screens in patient-derived glioblastoma stem cells have identified dependencies on Pol II elongation factors, such as those in the super elongation complex, revealing that inhibiting YY1-driven elongation machinery not only impairs tumor growth but also activates interferon responses to potentiate anti-PD-1 immunotherapy.^[136] Similarly, CRISPR screens have uncovered roles for Pol II pausing factors like NELF in enhancing CD8+ T cell anti-tumor immunity, where NELF suppression boosts effector functions and tumor infiltration, suggesting pausing modulation as a strategy to improve immunotherapy efficacy in solid tumors. Post-2022 developments include enhancer reprogramming as a driver of transcriptional dysregulation in diverse cancers and transcription-replication conflicts leading to oncogenic mutations, opening avenues for targeted therapies like inhibiting ubiquitinated oncogenic transcription factors (as of 2025).^[137]^[138]^[139]

Comparisons with Prokaryotic Transcription

Differences in machinery and initiation

Eukaryotic transcription employs three distinct nuclear RNA polymerases—RNA polymerase I (Pol I), II (Pol II), and III (Pol III)—each specialized for transcribing specific classes of genes, such as ribosomal RNAs by Pol I, protein-coding messenger RNAs by Pol II, and transfer RNAs and 5S ribosomal RNA by Pol III.^[140] In contrast, prokaryotes utilize a single RNA polymerase (RNAP) enzyme, primarily the housekeeping σ70-RNAP holoenzyme, which handles all transcription needs.^[140] This multiplicity in eukaryotes reflects the compartmentalization of transcription within the nucleus and the need to coordinate diverse gene expression programs, whereas the singular prokaryotic RNAP enables rapid, unified responses to environmental cues.^[141] A key distinction lies in the auxiliary factors required for promoter recognition and assembly. Eukaryotic Pol II transcription depends on general transcription factors (GTFs), including TFIID, TFIIA, TFIIB, TFIIF, TFIIE, and TFIIH, which collectively form the preinitiation complex (PIC) to recruit and position the polymerase at promoters.^[31] Prokaryotes, however, rely on sigma (σ) factors that associate transiently with the RNAP core to form the holoenzyme, with σ70 being the primary factor for most housekeeping genes in bacteria like Escherichia coli.^[141] These σ factors directly recognize promoter sequences without the need for an extensive multi-component assembly, allowing for simpler and faster initiation compared to the GTF-mediated process in eukaryotes.^[140] Initiation in eukaryotes involves a multi-step, ordered assembly of the PIC on the core promoter. 
The process begins with TFIID binding to promoter elements via its TATA-binding protein (TBP) subunit, followed by recruitment of TFIIA and TFIIB, then Pol II pre-associated with TFIIF, and finally TFIIE and TFIIH to melt the DNA and form the open complex.^[31] This stepwise recruitment ensures precise positioning but introduces regulatory checkpoints absent in prokaryotes, where the pre-formed holoenzyme binds directly to the promoter, scans for the start site, and initiates without intermediate assemblies.^[141] Notably, while many eukaryotic promoters feature a TATA box recognized by TBP, prokaryotic promoters lack a true TATA equivalent in the eukaryotic sense, relying instead on conserved motifs for holoenzyme docking.^[140] Promoter architecture further highlights these differences. Eukaryotic promoters consist of a core region (often including the TATA box, initiator, and downstream elements) proximal to the transcription start site, augmented by distal enhancers that loop to interact with the core via mediator complexes for combinatorial regulation.^[140] Prokaryotic promoters, by comparison, are more compact, defined primarily by the -35 box (TTGACA) and -10 box (TATAAT) sequences recognized by σ70, with minimal reliance on distant regulatory elements.^[141] This core-only simplicity in prokaryotes supports polycistronic operons, whereas eukaryotic enhancers enable tissue-specific and signal-responsive expression across large genomic distances.^[140]
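The compactness of the σ70 promoter can be made concrete with a small consensus search. The sketch below is a hedged illustration rather than a promoter predictor: the function names are hypothetical, mismatches are scored by plain Hamming distance against the two consensus hexamers quoted above, and the 16-18 bp spacer range between the boxes is an assumption added here (it is not stated in the text).

```python
MINUS_35 = "TTGACA"  # -35 box consensus quoted in the text
MINUS_10 = "TATAAT"  # -10 box consensus quoted in the text

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def best_sigma70_match(dna, spacers=(16, 17, 18)):
    """Return (mismatches, minus35_start, spacer) for the best-scoring site.

    Every position is tried as a -35 box start, paired with a -10 box
    placed `spacer` bp after the -35 box ends. The spacer range is an
    assumption of this sketch. Returns None if the sequence is too short.
    Ties are broken by the leftmost position, then the shortest spacer.
    """
    dna = dna.upper()
    best = None
    for i in range(len(dna) - 6 + 1):
        mismatches_35 = hamming(dna[i:i + 6], MINUS_35)
        for spacer in spacers:
            j = i + 6 + spacer  # start of the candidate -10 box
            if j + 6 > len(dna):
                continue
            total = mismatches_35 + hamming(dna[j:j + 6], MINUS_10)
            candidate = (total, i, spacer)
            if best is None or candidate < best:
                best = candidate
    return best

if __name__ == "__main__":
    promoter = "GG" + MINUS_35 + "A" * 17 + MINUS_10 + "GG"
    print(best_sigma70_match(promoter))
```

No comparably short search captures a eukaryotic Pol II promoter, since recognition there depends on the stepwise assembly of general transcription factors and on distal elements rather than on two fixed hexamers.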

Differences in regulation and termination

Eukaryotic transcription regulation is markedly more complex than prokaryotic regulation, incorporating layers such as chromatin remodeling and epigenetic modifications that control DNA accessibility for RNA polymerase II (Pol II). In contrast, prokaryotes primarily rely on operons, clusters of genes transcribed as polycistronic mRNAs under a single promoter, to coordinate expression of related functions, with attenuation mechanisms allowing premature termination based on translation rates.^[140] Eukaryotes employ distal enhancers, which are cis-regulatory DNA sequences that loop to promoters via the Mediator complex and transcription factors to activate gene-specific expression over long distances.^[142] Prokaryotes, lacking such long-range elements, also use riboswitches, RNA elements in the nascent transcript that sense metabolites and alter secondary structure to modulate transcription or translation without requiring additional proteins.^[142] This fundamental difference in logic stems from the need for eukaryotes to regulate diverse cell types and developmental stages, whereas prokaryotic regulation emphasizes rapid responses to environmental cues through direct promoter-operator interactions. Transcription termination in eukaryotes for Pol II-transcribed genes follows the torpedo model, where cleavage and polyadenylation specificity factor (CPSF) recognizes polyadenylation signals, triggering endonucleolytic cleavage of the nascent RNA; a 5'-3' exonuclease (Rat1 in yeast, Xrn2 in mammals) then degrades the downstream RNA fragment, catches up to Pol II, and promotes its release.^[143] Prokaryotic termination, by comparison, occurs via two main mechanisms: Rho-dependent, where the Rho helicase binds to unstructured RNA and translocates 5'-3' to dissociate RNA polymerase from the template, or intrinsic (Rho-independent), relying on a GC-rich hairpin in the RNA followed by a uridine-rich tract that weakens the RNA:DNA hybrid.^[140] Unlike eukaryotic Pol II transcripts, prokaryotic mRNAs are not capped, spliced, or polyadenylated co-transcriptionally, as transcription and translation are coupled in the cytoplasm without nuclear barriers.^[142] The evolutionary divergence in these processes is largely attributed to nuclear compartmentalization in eukaryotes, which physically separates transcription from translation, allowing for intricate post-transcriptional controls and enabling the evolution of complex multicellularity through fine-tuned gene regulation.^[144] In prokaryotes, the absence of a nucleus facilitates immediate translation of nascent mRNAs, prioritizing efficiency over the layered regulatory depth seen in eukaryotes. This compartmentalization thus underpins the greater regulatory sophistication in eukaryotes, accommodating larger genomes and diverse cellular functions.^[142]