
CAAT box

The CAAT box, also known as the CCAAT box, is a conserved cis-regulatory DNA sequence element located in the promoter regions of numerous eukaryotic genes, typically positioned 50 to 100 base pairs upstream of the transcription start site. It functions as a binding site for specific transcription factors that facilitate the assembly of the transcription initiation complex, thereby enhancing or regulating the rate of RNA polymerase II-mediated transcription. The consensus sequence for the CAAT box is generally given as 5'-GGCCAATCT-3', built around the core pentamer CCAAT, though it tolerates some variation while maintaining core functionality. This element is distinct from core promoter motifs like the TATA box but often works in concert with them to direct accurate transcription initiation.

The primary transcription factor that binds the CAAT box is Nuclear Factor Y (NF-Y), a heterotrimeric complex consisting of NF-YA, NF-YB, and NF-YC subunits, which recognizes the CCAAT motif with high specificity and recruits additional co-activators to the promoter. The CAAT box plays a critical role in tissue-specific and inducible gene expression across diverse biological processes by modulating promoter strength in response to cellular signals. Mutations or deletions in the CAAT box can significantly impair transcription efficiency, as demonstrated in studies of viral and cellular genes, underscoring its evolutionary conservation from yeast to humans. In plants, CAAT box-binding factors such as NF-Y orthologs further highlight its broad importance, including in hormone signaling.

Fundamentals

Definition and Discovery

The CAAT box, also known as the CCAAT box, is a conserved cis-regulatory DNA sequence found in the promoter regions of many eukaryotic genes. It consists of a short motif, typically 5-10 base pairs in length, that functions as a binding site for specific transcription factors, thereby enhancing the rate of transcription initiation by RNA polymerase II. This element contributes to the recruitment of the basal transcription machinery and is essential for efficient gene expression in both viral and cellular contexts.

The CAAT box was first identified in 1980 in the promoter regions of eukaryotic genes, such as the chicken ovalbumin gene. It was subsequently recognized in viral promoters, including the major late promoter (MLP) of human adenovirus type 2, where it appears as an inverted sequence (ATTGG) approximately 70-80 base pairs upstream of the transcription start site, alongside the TATA box. Functional analyses in the early 1980s demonstrated its key role in directing accurate transcription initiation by RNA polymerase II.

The CAAT box plays a vital role in both basal and regulated transcription, influencing the expression of genes involved in tissue-specific and developmental processes. It is present in approximately 25-30% of eukaryotic promoters, particularly those driving housekeeping and inducible genes, and its activity helps modulate transcription levels in response to cellular signals. For instance, mutation or absence of the CAAT box in model promoters like the adenovirus MLP leads to reduced transcriptional efficiency, underscoring its importance in gene regulation.

Consensus Sequence

The CAAT box is characterized by a core consensus of CCAAT. The pentamer is highly invariant across eukaryotic promoters, and its first and fourth positions are fully conserved. Flanking nucleotides enhance specificity, yielding an extended consensus of GGCCAATCT in many animal systems, with the GG dinucleotide upstream and the CT dinucleotide downstream providing additional binding affinity. Variations in the sequence introduce degeneracy while maintaining functionality, such as GG(T/C)CAATCT, where the third position allows thymine or cytosine without abolishing recognition. In plants, the motif often deviates to CAAAT or simply CAAT, reflecting organism-specific adaptations, though the core CCAAT remains prevalent; for instance, the CAAT box motif reported in high α-tocopherol soybeans is CAAAT, whereas typical soybeans carry CCAAT. This evolutionary conservation spans from yeast, where the minimal 5'-CCAAT-3' motif suffices for binding, to mammals, with organism-specific differences in flanking regions ensuring regulatory precision across eukaryotes.

Detection of CAAT boxes traditionally relies on experimental techniques such as DNase I footprinting, which reveals protected DNA regions indicative of protein binding, and electrophoretic mobility shift assays (EMSA), which confirm sequence-specific interactions through gel retardation. Complementing these, bioinformatics approaches scan promoter sequences against motif collections, such as JASPAR for general eukaryotes or PLACE for plants, using position weight matrices derived from aligned binding sites to identify potential CAAT elements computationally.
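The position weight matrix approach mentioned above can be illustrated with a minimal sketch. The aligned sites below are invented for illustration (they are not taken from JASPAR or PLACE), and the function names, pseudocount, uniform background, and threshold are all assumptions rather than settings from any particular tool.

```python
import math

# Hypothetical aligned CAAT-box sites, made up for this example only.
SITES = [
    "GGCCAATCT",
    "GGTCAATCT",
    "GACCAATCA",
    "GGCCAATCG",
    "AGCCAATCT",
]


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Build a log-odds position weight matrix (one dict per column)."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Yield (offset, window, score) for each window scoring >= threshold."""
    width = len(pwm)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            yield start, window, score


# Usage: scan a toy sequence with an arbitrary, assumed threshold.
pwm = build_pwm(SITES)
for offset, window, score in scan("TTTTGGCCAATCTAAAA", pwm, threshold=8.0):
    print(offset, window, round(score, 2))
```

In practice the matrix would come from a curated database and the threshold would be calibrated against known sites; the sketch only shows the scoring logic.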

Role in Gene Regulation

Location in Promoters

The CAAT box is typically positioned 50 to 150 base pairs upstream of the transcription start site (TSS) in eukaryotic promoters, serving as a proximal regulatory element. This placement allows it to influence assembly of the transcription machinery without directly overlapping the core promoter. In many cases, the CAAT box sits more specifically between -60 and -100 bp relative to the TSS. For instance, in the major late promoter of adenovirus type 2, the inverted CAAT box is located approximately 80 base pairs upstream of the TSS, where it coordinates with upstream promoter elements to drive efficient transcription.

Unlike the TATA box, which occupies a relatively fixed position around -30 bp, the CAAT box shows considerable positional variability across promoters, enabling flexible integration into diverse regulatory architectures. Promoters may contain multiple CAAT boxes, often at varying distances from the TSS, which can amplify transcriptional activation through the binding of additional factors. The CAAT box functions in both forward and reverse orientations, as seen in the inverted CAAT box of the adenovirus major late promoter. Beyond promoters, CAAT boxes are occasionally found in enhancers, where they contribute to long-range regulation, and in introns, particularly the first intron of some genes, influencing post-initiation processes.

Evolutionarily, the CAAT box is a hallmark of eukaryotic promoters; it is widespread across eukaryotic genomes but absent from prokaryotes, which rely instead on sigma factor-binding motifs such as the -10 and -35 boxes. Its prevalence is notably higher in promoters of housekeeping genes, which often feature TATA-less, CpG-rich architectures, than in tissue-specific genes, which more frequently combine TATA boxes with CAAT elements. This distribution underscores the CAAT box's role in supporting constitutive, broad expression patterns essential for cellular maintenance.
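Because the element can occur in either orientation and at variable distances from the TSS, a simple search has to check both strands and report positions relative to the TSS. The following is a minimal sketch under stated assumptions: the function names, the default window of -150 to -50, and the convention of reporting the 5'-most base of each match on the given strand are illustrative choices, not an established tool's behavior.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_caat_boxes(promoter, tss_index, core="CCAAT", window=(-150, -50)):
    """Return sorted (position, strand) pairs for core matches in the window.

    promoter  : sequence of the coding strand, written 5'->3'
    tss_index : 0-based index of the +1 nucleotide within promoter
    position  : offset of the match's first base relative to the TSS,
                so the base immediately upstream of +1 is -1
    strand    : "+" for CCAAT as written, "-" for the inverted form (ATTGG)
    """
    hits = []
    for strand, motif in (("+", core), ("-", reverse_complement(core))):
        start = promoter.find(motif)
        while start != -1:
            position = start - tss_index  # negative values lie upstream
            if window[0] <= position <= window[1]:
                hits.append((position, strand))
            start = promoter.find(motif, start + 1)
    return sorted(hits)


# Usage with a synthetic promoter: an inverted box at -80 and a forward box at -60.
toy = "G" * 20 + "ATTGG" + "G" * 15 + "CCAAT" + "G" * 65
print(find_caat_boxes(toy, tss_index=100))
```

Exact string matching is used here for brevity; a scored matrix scan would tolerate the sequence variants described in the Consensus Sequence section.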

Interaction with Core Promoter Elements

The CAAT box exhibits significant synergy with the TATA box in facilitating transcription initiation by RNA polymerase II, primarily through enhanced recruitment of the basal transcription machinery. In reporter gene assays, the presence of a functional CAAT box upstream of the TATA box can amplify TATA-driven transcriptional initiation by approximately 5- to 10-fold, as observed in constructs derived from viral and cellular promoters. For instance, in the major late promoter of subgroup C human adenoviruses, mutations disrupting both the CAAT and TATA boxes are lethal for viral replication, underscoring their interdependent roles in promoter activation, while CAAT mutations combined with disruptions in adjacent elements reduce transcription by up to 6-fold.

Beyond the TATA box, the CAAT box cooperates with other promoter elements, such as GC boxes and initiator (Inr) sequences, to establish a modular promoter architecture that fine-tunes gene expression. This cooperation allows for combinatorial control, where the spatial arrangement of these elements, often with the CAAT box positioned around -80 bp relative to the transcription start site, enables synergistic activation across diverse contexts. In some cases, CAAT boxes contribute to bidirectional promoter activity, supporting transcription from both DNA strands in divergent gene pairs, as seen in certain eukaryotic promoters where they integrate with upstream regulatory modules to drive balanced expression.

The functional impact of these interactions is evident in the modulation of overall promoter strength, where the CAAT box acts as a critical auxiliary element rather than an independent driver. Mutations in the CAAT box disrupt this cooperation, leading to substantial reductions in transcriptional output; for example, in the human beta-globin promoter, CAAT box alterations decrease expression by approximately 10-fold in erythroid cell lines, highlighting its necessity for tissue-specific gene regulation. Such disruptions not only impair basal transcription but also abolish responses to upstream enhancers, emphasizing the CAAT box's role in integrating core promoter signals for robust initiation.

Binding Factors in Animals

CCAAT Enhancer Binding Proteins (C/EBPs)

The CCAAT enhancer binding proteins (C/EBPs) constitute a family of transcription factors in mammals that recognize and bind CCAAT/enhancer motifs, playing pivotal roles in gene regulation. These motifs are palindromic sequences containing a CCAAT pentamer, distinct from the promoter-proximal CAAT box bound primarily by Nuclear Factor Y (NF-Y). The family comprises six members: C/EBPα, C/EBPβ, C/EBPδ, C/EBPε, C/EBPγ, and C/EBPζ (also known as CHOP). All members share a highly conserved basic leucine zipper (bZIP) domain at their C-terminus, which mediates DNA binding through the basic region and dimerization through the leucine zipper motif. These proteins predominantly form homodimers or heterodimers with other family members, enabling cooperative binding to DNA and enhancing transcriptional activation.

C/EBPs are specific for CCAAT/enhancer sequences (consensus 5'-RTTGCGYAAY-3', where R is a purine and Y is a pyrimidine) located in the enhancers and promoters of target genes, with notable examples including liver-specific genes and the acute-phase C-reactive protein (CRP) gene. Their expression is often tissue-specific; for instance, C/EBPα, C/EBPβ, and C/EBPδ are highly expressed in hepatocytes of the liver and in adipocytes, where they coordinate developmental and physiological processes. This targeted expression allows C/EBPs to fine-tune gene expression in response to cellular needs.

In terms of regulatory functions, C/EBPs are essential activators of the acute-phase response during inflammation, as well as of metabolic regulation. C/EBPβ and C/EBPδ are particularly critical in these contexts: C/EBPβ mediates inflammatory signaling by binding promoters of cytokine genes such as TNF, IL-8, and G-CSF in response to stimuli such as lipopolysaccharide (LPS) and IL-1, while both proteins drive the initial stages of adipocyte differentiation. Post-translational modifications, notably phosphorylation, modulate their activity; for example, phosphorylation of C/EBPβ at threonine-235 increases its DNA binding affinity and transcriptional potency, thereby influencing the strength of transcriptional activation.

Nuclear Factor Y (NF-Y)

Nuclear Factor Y (NF-Y) is the primary transcription factor that binds the CAAT box in animal promoters, recognizing the 5'-GGCCAATCT-3' consensus or variants thereof. NF-Y is a heterotrimeric complex composed of NF-YA, NF-YB, and NF-YC subunits, which assemble to contact the CCAAT motif specifically and recruit co-activators that enhance transcription initiation. This factor is essential for basal and regulated expression of numerous genes in animals, complementing the role of C/EBPs in enhancer contexts.

Binding Mechanism in Animals

The binding of CCAAT/enhancer-binding proteins (C/EBPs) to CCAAT/enhancer motifs in animal cells begins with dimerization mediated by the C-terminal leucine zipper domain, which forms a parallel coiled-coil structure stabilized by hydrophobic interactions involving leucine residues and by interhelical salt bridges, such as those between Asp320-Arg325' and Glu334-Arg339'. This dimerization positions the adjacent basic regions (residues 285–300 in C/EBPα), which lie N-terminal to the zipper, to extend as continuous α-helices that insert into the major groove of DNA, adopting a fork-like "scissors-grip" configuration that clamps the DNA duplex. The basic regions make sequence-specific contacts with the CCAAT/enhancer motif (consensus 5'-RTTGCGYAAY-3'), primarily through hydrogen bonding and electrostatic interactions that recognize the core sequence elements.

Crystal structures of the C/EBPα basic leucine zipper (bZIP) domain bound to a DNA site (ATTGCGCAAT) reveal key atomic interactions, including Arg289 forming hydrogen bonds with the N7 of adenine at position 3 (A3) in the major groove and with nearby phosphate groups, while Arg300 engages in electrostatic contacts with guanines at positions G1 and G-2. Additional specificity arises from Asn292 hydrogen-bonding to the bases at T-4 and A3, and from Val296 sterically restricting purines at T-3 to favor a pyrimidine. These interactions occur symmetrically across the dyad axis, with each monomer contacting one half-site (e.g., TGCG and CAAT), enabling high-affinity binding (Kd ≈ 10-50 for optimal sites).

Cooperative binding is enhanced by interactions with co-activators such as CBP/p300. C/EBPβ recruits p300 via the E1A-binding domain of p300, leading to mutual stabilization on the promoter and to phosphorylation of p300's C-terminal activation domain (e.g., at Ser1849 and Thr1851), which modulates histone acetylation and chromatin accessibility. This recruitment facilitates further assembly by bridging C/EBP to the Mediator complex, specifically the active form lacking CDK8 and containing CRSP70/MED23, which in turn associates with RNA polymerase II to promote preinitiation complex (PIC) formation and transcriptional initiation at target genes.

Phosphorylation provides allosteric regulation of binding affinity; for instance, mitogen-activated protein kinase (MAPK) pathways phosphorylate C/EBPβ at sites such as the Thr235-Pro236 motif, inducing conformational changes that enhance DNA contact and increase binding affinity by 2- to 5-fold, as observed in mobility shift assays with phosphorylated isoforms. Similarly, casein kinase II phosphorylation in the basic region boosts transactivation without altering specificity, underscoring post-translational control in response to signals like Ras activation.

Binding Factors in Plants

Nuclear Factor Y (NF-Y)

Nuclear Factor Y (NF-Y) serves as the primary transcription factor that binds the CAAT box in plant promoters, operating as a heterotrimeric complex of NF-YA, NF-YB, and NF-YC subunits. This complex is highly conserved across eukaryotes, including both plants and animals, where it recognizes and binds the CCAAT motif to activate transcription of target genes. In plants, NF-Y predominantly regulates genes associated with development, such as those involved in embryogenesis, as well as stress response pathways that enable adaptation to environmental challenges.

NF-Y has diverse and essential roles in plants, particularly in controlling promoters of key developmental and adaptive genes in species like Arabidopsis thaliana and maize (Zea mays). In Arabidopsis, NF-Y complexes are critical for seed development through regulation of embryogenesis and for drought tolerance through modulation of genes responsive to water deficit, with the subunit AtNF-YA5 as one example. Similarly, in maize, NF-Y factors like ZmNF-YB2 contribute to improved drought resistance and yield stability under stress conditions. These functions underscore NF-Y's importance in agronomic traits, making it a target for crop improvement strategies.

Evolutionarily, the plant NF-Y complex is homologous to the yeast CCAAT-binding heterotrimer HAP2/3/5, reflecting deep conservation in the transcriptional machinery. However, plant NF-Y families have undergone expansions and adaptations, particularly in integrating environmental signals such as light and hormones to fine-tune gene expression in response to developmental cues and abiotic stresses. This evolutionary divergence allows plant NF-Y to orchestrate context-specific regulation beyond the roles described in fungi.

NF-Y Subunits and Complex Assembly

The Nuclear Factor Y (NF-Y) transcription factor complex in plants is a heterotrimer composed of three distinct subunits: NF-YA, which functions as the sequence-specific DNA-binding subunit, and NF-YB and NF-YC, which contain histone fold domains (HFDs) that mediate dimerization and provide a scaffold for complex assembly. The HFDs of NF-YB and NF-YC are structurally similar to those of histones H2B and H2A, respectively, enabling stable protein-protein interactions and non-sequence-specific DNA contacts. In plant genomes, such as that of Arabidopsis thaliana, each subunit is encoded by multiple paralogous genes (10 for NF-YA, 13 for NF-YB, and 13 for NF-YC), giving up to 10 × 13 × 13 = 1,690 theoretical heterotrimeric combinations that contribute to functional diversity.

Assembly of the NF-Y complex begins with the formation of an NF-YB/NF-YC heterodimer through their HFDs, a process stabilized by conserved interface residues, including hydrogen bonds and hydrophobic interactions within the α-helices and loops of the folds. This dimer serves as an obligatory scaffold that subsequently recruits NF-YA via its conserved C-terminal domain, which interacts with the HFD dimer through a trimerization interface involving salt bridges and hydrogen bonds, such as those formed by arginine residues in NF-YA. The resulting trimer is essential for DNA binding, as neither the dimer alone nor individual subunits exhibit high-affinity interaction with the target motif.

The assembled NF-Y trimer binds the CAAT box by clamping the DNA double helix, with NF-YA providing specificity for the core CCAAT pentamer through direct base contacts via conserved residues like arginines and histidines, while the NF-YB/NF-YC HFDs wrap around the adjacent minor groove. In plants, this binding shows enhanced affinity for variants such as CAAAT, facilitated by flexible recognition of flanking bases (e.g., preferences for C at the +1 position and G at -1 relative to the core). Crystal structures of Arabidopsis NF-Y trimers in complex with DNA, resolved at 2.5 Å, reveal histone-like interactions that position the HFDs to mimic nucleosome core particle contacts, underscoring the structural basis for the complex's adaptability to diverse genomic contexts.
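The combinatorial potential implied by the paralog counts quoted above (10 NF-YA, 13 NF-YB, 13 NF-YC) can be made concrete with a short enumeration that follows the assembly order described in this section: NF-YB/NF-YC dimers first, then addition of NF-YA. The subunit labels below are placeholders generated for counting, and the result is a theoretical upper bound that assumes every pairing is possible, which the text does not claim.

```python
from itertools import product

# Placeholder labels; only the counts (10, 13, 13) come from the text.
NF_YA = [f"NF-YA{i}" for i in range(1, 11)]
NF_YB = [f"NF-YB{i}" for i in range(1, 14)]
NF_YC = [f"NF-YC{i}" for i in range(1, 14)]

# Step 1: every possible NF-YB/NF-YC histone-fold dimer.
dimers = list(product(NF_YB, NF_YC))

# Step 2: each dimer can, in principle, recruit any NF-YA paralog.
trimers = [(a, b, c) for (b, c) in dimers for a in NF_YA]

print(len(dimers), "possible dimers")
print(len(trimers), "possible trimers")
```

Which of these combinations actually form in a given tissue depends on expression patterns and subunit compatibility, so the count is best read as a ceiling on diversity.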
