
Start codon

The start codon is a specific sequence of three nucleotides in messenger RNA (mRNA) that marks the point at which translation, the process of synthesizing a protein, begins; it directs the ribosome to place the first amino acid. In both prokaryotes and eukaryotes, the most common start codon is AUG, which encodes methionine in eukaryotes and N-formylmethionine in bacteria, serving dual roles as both an initiation signal and the codon for methionine at internal positions. This codon is recognized by initiator transfer RNA (tRNA), which binds to the ribosome's P site to begin polypeptide synthesis.

While AUG predominates, alternative start codons exist and can expand the proteome's diversity, particularly under specific cellular conditions or in certain organisms. In bacteria, GUG and UUG can also function as start codons and still direct the incorporation of formylmethionine, though with lower efficiency than AUG. In eukaryotes, non-AUG codons such as CUG, GUG, and UUG are used in a subset of mRNAs, sometimes resulting in proteins with non-methionine N-termini, and their selection is influenced by the surrounding sequence context known as the Kozak sequence. A 2017 study indicated that at least 47 of the 64 possible triplet codons can initiate translation in Escherichia coli, challenging traditional views and highlighting the flexibility of start codon recognition.

Accurate start codon selection is critical, because errors can lead to out-of-frame or truncated proteins and, potentially, cellular dysfunction. In eukaryotes, ribosomal scanning from the mRNA's 5' cap selects the first suitable AUG, modulated by initiation factors such as eIF1. Non-canonical start codons, while less efficient, play regulatory roles and are also used in mitochondrial and chloroplast genomes, underscoring their biological significance.

Overview

Definition and Function

A start codon is a sequence of three nucleotides, or trinucleotide, in messenger RNA (mRNA) that specifies the initiation site for protein synthesis by the ribosome. In the standard genetic code, the primary start codon is AUG, which codes for the amino acid methionine but serves a distinct role in signaling the beginning of translation.

The primary function of the start codon is to recruit the initiator transfer RNA (tRNA), which carries N-formylmethionine in bacteria or unmodified methionine in eukaryotes, to the ribosome-mRNA complex. This recruitment facilitates assembly of the ribosomal initiation complex, including binding of the small ribosomal subunit to the mRNA and subsequent joining of the large subunit, thereby establishing the correct reading frame for decoding subsequent codons. By defining the start point, the start codon ensures that synthesis of the polypeptide chain proceeds accurately from the N-terminus to the C-terminus, preventing misreading of the genetic message.

Start codons exhibit near-universal conservation across all domains of life, underscoring the shared evolutionary ancestry of the genetic code. This universality, with AUG as the predominant initiator in most organisms, reflects the code's ancient origins, though rare exceptions occur in specialized systems such as mitochondria.

The absence or mutation of a start codon typically prevents translation initiation, resulting in no protein production or the use of an alternative downstream start site, which often yields truncated or non-functional proteins. If translation instead begins out of frame, the result is an aberrant polypeptide with an incorrect amino acid sequence and a probable loss of biological activity.

Context in the Genetic Code

The standard genetic code consists of 64 possible triplets (codons) formed from the four bases adenine (A), cytosine (C), guanine (G), and uracil (U) in messenger RNA (mRNA), which specify the 20 standard amino acids and three stop signals during protein translation. This code is nearly universal across all domains of life, with the codon AUG assigned to the amino acid methionine (Met) both at internal positions and as the primary initiation signal. The following table summarizes the standard genetic code, organized by the first base (rows) and second base (columns) of each codon, with the third base varying within each cell:
| First base | U | C | A | G | Third base |
| --- | --- | --- | --- | --- | --- |
| U | UUU (Phe), UUC (Phe), UUA (Leu), UUG (Leu) | UCU (Ser), UCC (Ser), UCA (Ser), UCG (Ser) | UAU (Tyr), UAC (Tyr), UAA (Stop), UAG (Stop) | UGU (Cys), UGC (Cys), UGA (Stop), UGG (Trp) | U, C, A, G |
| C | CUU (Leu), CUC (Leu), CUA (Leu), CUG (Leu) | CCU (Pro), CCC (Pro), CCA (Pro), CCG (Pro) | CAU (His), CAC (His), CAA (Gln), CAG (Gln) | CGU (Arg), CGC (Arg), CGA (Arg), CGG (Arg) | U, C, A, G |
| A | AUU (Ile), AUC (Ile), AUA (Ile), AUG (Met, Start) | ACU (Thr), ACC (Thr), ACA (Thr), ACG (Thr) | AAU (Asn), AAC (Asn), AAA (Lys), AAG (Lys) | AGU (Ser), AGC (Ser), AGA (Arg), AGG (Arg) | U, C, A, G |
| G | GUU (Val), GUC (Val), GUA (Val), GUG (Val) | GCU (Ala), GCC (Ala), GCA (Ala), GCG (Ala) | GAU (Asp), GAC (Asp), GAA (Glu), GAG (Glu) | GGU (Gly), GGC (Gly), GGA (Gly), GGG (Gly) | U, C, A, G |

The codon AUG exhibits dual functionality within this code: it encodes methionine for incorporation at internal sites during elongation, and it serves as the start codon that initiates translation, where it specifies N-formylmethionine (fMet) in bacteria or unmodified methionine in eukaryotes. This distinction arises because the initiator form of methionine tRNA recognizes AUG in a specific ribosomal context that prioritizes initiation over routine amino acid addition.

The start codon also establishes the reading frame for the entire mRNA sequence, defining the correct phase (an offset of 0, +1, or +2 nucleotides) so that subsequent nucleotides are grouped into the intended codons without frameshift errors. Without this defined starting point, translation could produce non-functional polypeptides from misaligned triplets.

The role of AUG as the initiating triplet was elucidated in the 1960s through cell-free experiments, notably by Marshall Nirenberg and Philip Leder, who used synthetic triplets and ribosome-binding assays to identify AUG as the codon that promotes the binding of methionyl-tRNA and initiates polypeptide synthesis.
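As a rough illustration of how the start codon fixes the reading frame, the sketch below finds the first AUG in an mRNA string, reports its frame offset (0, 1, or 2), and translates in steps of three until a stop codon. The function name `translate_from_first_aug` and the one-letter amino acid string are illustrative choices, not part of any cited method; the input is assumed to contain only A, C, G, and U.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acids for the standard code, listed with the first,
# second, and third base each cycling through U, C, A, G ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_from_first_aug(mrna):
    """Return (start_index, frame_offset, peptide) for the first AUG, or None."""
    start = mrna.find("AUG")
    if start == -1:
        return None
    peptide = []
    # Step through the message three nucleotides at a time from the start codon.
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon ends translation
            break
        peptide.append(aa)
    return start, start % 3, "".join(peptide)


result = translate_from_first_aug("GGAUGGCCUAAUG")
```

Here the AUG sits at index 2, so the message is read in frame 2 as AUG GCC UAA; shifting the start by one nucleotide would regroup the same sequence into entirely different codons.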

Recognition and Initiation

Decoding by the Ribosome

In prokaryotes, the ribosome recognizes the start codon through base-pairing between a purine-rich Shine-Dalgarno (SD) sequence, typically AGGAGG, located 4–12 nucleotides upstream of the AUG, and the anti-SD sequence (CCUCC) at the 3' end of the 16S rRNA in the small ribosomal subunit. This interaction positions the start codon precisely in the ribosomal P site, facilitating assembly of the 70S initiation complex. In eukaryotes, recognition involves the Kozak consensus sequence, GCC(A/G)CCAUGG, surrounding the AUG start codon, which enhances selection by the 40S small ribosomal subunit during scanning from the mRNA 5' cap. The anticodon of the initiator tRNA base-pairs with the start codon during this decoding step.

The start codon is decoded directly in the peptidyl (P) site of the ribosome, unlike during elongation, where incoming codons occupy the aminoacyl (A) site. This ensures the initiator tRNA is positioned for the first peptide bond. Fidelity of start codon selection is maintained by proofreading mechanisms that involve GTP hydrolysis by initiation factors: IF2 in prokaryotes and eIF2 in eukaryotes. GTP is hydrolyzed upon correct codon-anticodon pairing, which commits the ribosome to initiation and rejects mismatches, significantly enhancing the accuracy of start codon selection.
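The Shine-Dalgarno spacing rule described above can be sketched as a simple sequence search. This is a toy model only: it assumes an exact match to a `GGAGG` core (real SD sequences vary and are usually scored by hybridization energy), and the function name `find_sd_starts` and the definition of the gap as the number of nucleotides between the end of the core and the start codon are illustrative assumptions.

```python
import re


def find_sd_starts(mrna, sd_core="GGAGG", min_gap=4, max_gap=12):
    """List (start_index, codon, gap) for start codons preceded by an SD-like core.

    gap = number of nucleotides between the end of the SD core and the codon.
    """
    hits = []
    # The lookahead lets overlapping candidate codons be reported.
    for match in re.finditer(r"(?=(AUG|GUG|UUG))", mrna):
        start = match.start()
        for gap in range(min_gap, max_gap + 1):
            sd_end = start - gap              # index just past the SD core
            sd_begin = sd_end - len(sd_core)
            if sd_begin >= 0 and mrna[sd_begin:sd_end] == sd_core:
                hits.append((start, match.group(1), gap))
    return hits


hits = find_sd_starts("AAAGGAGGAAUAAUAUGGCUUAA")
```

In this example the only candidate is the AUG at index 14, which lies 6 nucleotides downstream of the `GGAGG` core.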

Initiator tRNA and Methionine

The initiator transfer RNA (tRNAiMet) is a specialized tRNA that recognizes the start codon AUG through its anticodon sequence CAU, enabling the delivery of methionine to initiate protein synthesis. Unlike elongator tRNAs, tRNAiMet exhibits unique structural features, including a conserved A1:U72 base pair in the aminoacyl acceptor stem of eukaryotic initiators and three consecutive G:C base pairs in the anticodon stem (positions 29–31 paired with 41–39), which collectively promote direct binding to the ribosomal P site and enhance fidelity in start codon selection. In eukaryotes, additional post-transcriptional modifications, such as N6-threonylcarbamoyladenosine (t6A) at position 37 adjacent to the anticodon, stabilize codon-anticodon interactions and improve decoding efficiency at the AUG start site.

In bacteria, the methionine carried by tRNAiMet is modified to N-formylmethionine (fMet) after charging, a reaction catalyzed by the enzyme methionyl-tRNA formyltransferase (encoded by the fmt gene) using 10-formyltetrahydrofolate as the formyl donor. This formylation occurs specifically on the initiator tRNAiMet and not on the elongator tRNAMet, because structural determinants in the initiator tRNA allow selective recognition by the formyltransferase, thereby committing the amino acid exclusively to initiation. In eukaryotes, the methionine remains unmodified, as cytosolic formyltransferase activity is absent, though fMet is used in mitochondrial and chloroplast translation, reflecting their bacterial ancestry. The N-terminal fMet of bacterial proteins is deformylated by peptide deformylase, and the resulting methionine is frequently removed by methionine aminopeptidase, exposing the penultimate residue for further processing or degradation signals.

Both initiator and elongator tRNAMet are charged with methionine by the same methionyl-tRNA synthetase (MetRS), which recognizes conserved identity elements such as the anticodon CAU and acceptor stem sequences without distinguishing between the two tRNAs during aminoacylation. The specificity for initiation arises downstream: in bacteria, formylation by Fmt ensures that fMet-tRNAiMet is directed to initiation factors such as IF2, while in eukaryotes, unmodified Met-tRNAiMet preferentially binds eIF2-GTP because of unique structural motifs, such as the A1:U72 pair and T-loop features (e.g., A54).

Key distinctions between initiator and elongator tRNAs prevent the former from participating in elongation. In fungi and plants, tRNAiMet often carries a 2'-O-ribosyl phosphate modification at position 64 (A64) in the TΨC stem, which sterically hinders binding to the elongation factor eEF1A and favors P-site accommodation. Mammalian tRNAiMet lacks this A64 modification but relies on base-pairing differences (e.g., U50:A64 instead of the elongator's G:C) and reduced affinity for eEF1A to exclude it from elongation. In bacteria, tRNAfMet shows lower affinity for elongation factor Tu (EF-Tu) than elongator tRNAMet does, attributed to mismatches in the acceptor stem and T-arm, while exhibiting higher affinity for IF2 to ensure preferential recruitment to the ribosome during start codon decoding. These adaptations collectively ensure that initiator tRNA binds the ribosomal P site first, establishing the reading frame without competing in internal elongation cycles.
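The antiparallel pairing between the CAU anticodon and a candidate start codon can be checked position by position. The sketch below is a simplification that counts only Watson-Crick pairs; it ignores wobble geometry, tRNA modifications, and the ribosomal context, and the function name `pairing_pattern` is illustrative.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def pairing_pattern(codon, anticodon="CAU"):
    """Return three booleans (codon positions 1-3): True where Watson-Crick paired."""
    # Codon and anticodon are both written 5'->3' and pair antiparallel,
    # so codon position 1 faces anticodon position 3.
    partner = anticodon[::-1]
    return [(c, a) in WATSON_CRICK for c, a in zip(codon, partner)]


patterns = {codon: pairing_pattern(codon) for codon in ("AUG", "GUG", "UUG", "CUG")}
```

AUG pairs at all three positions, whereas GUG, UUG, and CUG each mismatch only at the first codon position, which is the position where non-standard pairing with the initiator tRNA has to be tolerated.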

Domain-Specific Variations

Bacterial Start Codons

In bacteria, the primary start codon is AUG, accounting for approximately 83% of protein-coding genes in model organisms such as Escherichia coli. This codon is recognized with the help of the Shine-Dalgarno (SD) sequence, a purine-rich motif typically located 4–9 nucleotides upstream of the start codon, which base-pairs with a complementary anti-SD sequence at the 3' end of the 16S rRNA in the 30S subunit. This interaction positions the ribosome precisely at the start site, facilitating efficient initiation complex formation.

Alternative start codons include GUG (approximately 14% of genes) and UUG (approximately 3% of genes), which are decoded by the same initiator tRNAfMet as AUG. The anticodon of tRNAfMet (5'-CAU-3') tolerates wobble pairing at the first codon position, allowing recognition of GUG and UUG while incorporating N-formylmethionine (fMet) as the initial amino acid in all cases. These non-AUG codons are typically associated with stronger or more conserved SD sequences, which compensate for reduced base-pairing stability with the initiator tRNA.

Non-AUG start codons are more frequently used in specific contexts, such as leaderless mRNAs lacking extended 5' untranslated regions or internal initiation sites within polycistronic transcripts. Their translation efficiency is lower than that of AUG, with GUG and UUG supporting initiation at roughly 10–70% of the AUG rate depending on the SD context and assay conditions. Genes initiating with non-AUG codons often exhibit reduced expression levels compared to AUG-initiated genes.

Reporter assays using constructs such as GFP and nanoluciferase have demonstrated the functionality of GUG and UUG as start codons, producing full-length proteins with the correct N-terminal fMet. These experiments confirm that non-AUG initiation occurs without altering the reading frame, as the initiator tRNA occupies the ribosomal P site and elongation proceeds from the defined start position.
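Usage figures like the 83% / 14% / 3% split quoted above come from tallying the first codon of each annotated coding sequence. A minimal sketch of that tally follows; the function name `start_codon_usage` and the toy input are hypothetical, and real analyses would read annotated genomes rather than a hand-written list.

```python
from collections import Counter


def start_codon_usage(coding_sequences):
    """Return the fraction of sequences beginning with each start codon (RNA alphabet)."""
    # Accept DNA or RNA spelling by converting T to U before counting.
    counts = Counter(seq[:3].upper().replace("T", "U") for seq in coding_sequences)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


# Toy input, not real gene sequences.
usage = start_codon_usage(["ATGAAATAA", "GTGCCCTAA", "ATGGGGTAA", "TTGAAATAA"])
```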

Eukaryotic Start Codons

In eukaryotic cytoplasmic translation, the primary start codon is AUG, which accounts for over 99% of annotated protein-coding open reading frames across diverse eukaryotes, including mammals. This codon is recognized by the initiator methionyl-tRNA during the scanning process: the 43S pre-initiation complex, comprising the 40S ribosomal subunit, eukaryotic initiation factors, and Met-tRNAi^Met, binds near the 5' cap structure of the mRNA and migrates downstream in a 5'-to-3' direction to locate the first suitable AUG. This cap-dependent scanning mechanism ensures efficient initiation at the optimal site, in contrast to the Shine-Dalgarno-mediated direct binding in bacteria.

The efficiency of AUG recognition is strongly influenced by the surrounding context, known as the Kozak sequence. The optimal motif in vertebrates is GCCGCCACCAUGG, where the purine (A or G) at position -3 relative to the A of AUG and the G at +4 are particularly critical for ribosomal positioning and stable anticodon-codon pairing. Mutations at these positions, especially at -3, can reduce translation initiation efficiency by more than 10-fold in mammalian cell systems, as demonstrated by mutagenesis experiments that quantified expression. Suboptimal contexts thus modulate protein synthesis levels, providing a layer of translational regulation.

Although AUG predominates, rare non-canonical start codons such as CUG (normally coding for leucine), GUG, and UUG are used in eukaryotic cytoplasmic translation, comprising approximately 1–2% of initiation sites in organisms like yeast under specific conditions. These alternatives, often with efficiencies 20–60% of AUG depending on context, occur in stress responses or for particular genes, such as the UUG-initiated isoform of the GRS1 tRNA synthetase in yeast. Context-dependent selection of these sites enables the production of protein isoforms with distinct N-termini, as seen in oncogenes where alternative initiation at upstream CUG codons generates longer variants; for instance, the CUG-initiated c-Myc1 isoform has been implicated in tumorigenesis.
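The two critical Kozak positions described in this section can be checked with a few lines of code. The sketch below grades an AUG by whether it has a purine at -3 and a G at +4 (with the A of AUG as +1); the three-level labels and the function names are illustrative assumptions, and `scan_for_start` models only strict first-AUG scanning, not leaky scanning or reinitiation.

```python
def kozak_strength(mrna, aug_index):
    """Grade the context of the AUG at aug_index using the -3 and +4 positions."""
    minus3 = mrna[aug_index - 3] if aug_index >= 3 else None
    plus4 = mrna[aug_index + 3] if aug_index + 3 < len(mrna) else None
    # Count how many of the two key positions match the consensus.
    matches = (minus3 in ("A", "G")) + (plus4 == "G")
    return ("weak", "adequate", "strong")[matches]


def scan_for_start(mrna):
    """Return (index, strength) of the first AUG from the 5' end, or None."""
    index = mrna.find("AUG")
    if index == -1:
        return None
    return index, kozak_strength(mrna, index)


optimal = scan_for_start("GCCGCCACCAUGGCG")
poor = scan_for_start("UUUUUUAUGCCC")
```

The vertebrate consensus GCCGCCACCAUGG scores as "strong" under this scheme, while an AUG flanked by U at -3 and C at +4 scores as "weak".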

Archaeal Start Codons

In archaea, the primary start codon for translation is AUG, which is used in the vast majority of cases, often exceeding 90% across analyzed genomes and transcripts. This codon pairs with an initiator tRNA that is structurally similar to its eukaryotic counterpart and carries an unmodified methionine residue without N-formylation, distinguishing it from the bacterial system. The archaeal initiator tRNAMeti features a CAU anticodon and is charged by a methionyl-tRNA synthetase homologous to eukaryotic enzymes, ensuring precise recognition of AUG during initiation complex assembly.

Alternative start codons are employed infrequently in archaea, reflecting a partial similarity to bacterial mechanisms but with lower overall frequency. GUG and UUG serve as non-canonical initiators in a minority of transcripts, typically less than 10–20% depending on the species and mRNA type, and they code for methionine when functioning as starts despite their standard assignment. In certain species, such as Sulfolobus solfataricus, GUG accounts for about 12% of leaderless mRNAs and 20% of leadered ones, while UUG is even rarer at 0–7%. Additionally, AUA, which normally decodes isoleucine, has been documented as a start codon in specific genes, such as the L12 ribosomal protein gene in S. solfataricus, where it initiates translation in its natural context. CUG usage remains exceptionally rare and is not a dominant alternative in archaeal systems.

Archaeal translation initiation has a hybrid character, combining eukaryotic-like factors with prokaryotic ribosomal elements. The key GTPase aIF2, a homolog of eukaryotic eIF2, consists of α, β, and γ subunits and delivers the initiator tRNA to the P site of the small ribosomal subunit in a manner akin to eukaryotes, promoting fidelity in start codon selection. However, the ribosomes themselves resemble bacterial 70S particles, with prokaryotic-like rRNA and proteins that facilitate direct binding rather than cap-dependent recruitment. Accessory factors such as aIF1 and aIF1A further enhance specificity by monitoring the anticodon-codon interaction and preventing initiation at suboptimal sites.

A distinctive feature of archaeal mRNAs is the prevalence of leaderless transcripts, which lack 5' untranslated regions and initiate directly at the start codon, mirroring bacterial strategies and comprising 69% of mRNAs in S. solfataricus and 72% in Haloferax volcanii. Leadered mRNAs, which do have a 5' untranslated region, can be recruited via Shine-Dalgarno (SD) sequences that base-pair with the 16S rRNA anti-SD, enabling efficient 30S subunit binding without scanning. Some leadered mRNAs in certain archaeal lineages instead incorporate scanning elements, where the pre-initiation complex moves along the 5' UTR to locate the first suitable AUG, blending prokaryotic direct binding with eukaryotic scanning principles. Efficiency studies in Sulfolobus species underscore AUG dominance; for instance, in cell-free systems, AUG-initiated leaderless mRNAs exhibit up to 88% usage and higher translational output than GUG alternatives, with SD motifs boosting overall initiation rates by stabilizing ribosome-mRNA interactions.

Mitochondrial and Chloroplast Start Codons

In vertebrate mitochondria, the genetic code deviates from the standard code such that the codons AUA and AUG both encode methionine, with AUA serving as an alternative start codon in addition to AUG, which remains the primary initiator. Furthermore, the codon AUU, which encodes isoleucine at internal positions, can also function as a start codon and is translated as methionine during initiation. In human mitochondrial DNA, which encodes 13 proteins, four of these (ND1, ND2, ND3, and ND5) use non-AUG start codons (AUA for ND1, ND3, and ND5; AUU for ND2), representing approximately 31% of the protein-coding genes. This expanded usage is facilitated by a single mitochondrial tRNA^Met (mt-tRNA^Met) with the anticodon CAU, modified at the wobble position with 5-formylcytosine (f^5C), enabling it to decode both AUG and AUA as methionine for both initiation and elongation.

Mitochondrial translation initiates with N-formylmethionine (fMet), similar to bacterial systems: the charged mt-tRNA^Met is formylated by mitochondrial methionyl-tRNA formyltransferase before binding to the ribosomal P site. Unlike the bacterial fMet, which often remains N-terminal or is processed by deformylation and further cleavage, the mitochondrial fMet is typically deformylated post-translationally by peptide deformylases, resulting in an unmodified N-terminal methionine in the mature protein. In human mitochondria, mt-tRNA^Met is the sole tRNA for methionine and supports efficient initiation at these alternative codons despite the compact genome's high A+T bias.

Chloroplasts employ a genetic code closely resembling the bacterial standard, with AUG as the primary start codon encoding methionine, but they also use non-AUG alternatives such as GUG and UUG, which initiate translation with methionine at efficiencies of about 10–15% relative to AUG. These non-AUG starts are observed in specific genes, such as the psbC gene in plants like tobacco and in Chlamydomonas, where GUG serves as the functional initiator despite an upstream AUG that is not used. Unlike vertebrate mitochondria, chloroplasts do not reassign AUA to methionine; AUA encodes isoleucine internally, and non-AUG initiation is limited to GUG and UUG without broader codon expansions. Chloroplast translation also begins with fMet, charged to the initiator tRNA^fMet (with anticodon CAU), which is formylated like its bacterial counterpart and derives from the cyanobacterial endosymbiont.

The use of expanded or alternative start codons in mitochondria and chloroplasts traces back to their endosymbiotic origins from free-living bacteria (alpha-proteobacteria for mitochondria and cyanobacteria for chloroplasts), in which the ancestral bacterial machinery allowed flexible initiation at GUG and UUG. Over evolutionary time, mitochondrial codes underwent further modifications, such as the reassignment of AUA to methionine, likely driven by genome compaction and reduction of the tRNA set to a minimum, enabling broader start codon recognition without additional isoacceptors. In contrast, chloroplast codes stayed closer to the bacterial code, with non-AUG starts providing an additional regulatory layer, as seen in plant cpDNA, where about 5% of genes may initiate at non-AUG sites. This bacterial ancestry is evident in the shared use of formylated initiator methionine and the reliance on prokaryotic-like ribosomes for decoding.
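The vertebrate mitochondrial deviations described in this section can be expressed as a small patch on top of the standard codon table. The sketch below applies the well-known reassignments (AUA to Met, UGA to Trp, AGA and AGG to stop) and treats AUG, AUA, and AUU as permitted starts, reading whichever one opens the sequence as methionine. The names `VERT_MITO`, `MITO_STARTS`, and `translate_mito` are illustrative.

```python
from itertools import product

BASES = "UCAG"
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}

# Vertebrate mitochondrial code = standard code plus four reassignments.
VERT_MITO = dict(STANDARD)
VERT_MITO.update({"AUA": "M", "UGA": "W", "AGA": "*", "AGG": "*"})

# Start codons named in the text for vertebrate mitochondria.
MITO_STARTS = {"AUG", "AUA", "AUU"}


def translate_mito(cds):
    """Translate a coding sequence with the vertebrate mitochondrial code."""
    if cds[:3] not in MITO_STARTS:
        raise ValueError(f"{cds[:3]!r} is not an accepted mitochondrial start codon")
    peptide = ["M"]  # the initiating codon is read as (formyl)methionine
    for i in range(3, len(cds) - 2, 3):
        aa = VERT_MITO[cds[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


peptide = translate_mito("AUUAUAUGAAGA")
non_aug_percent = round(4 / 13 * 100)  # ND1, ND2, ND3, ND5 out of 13 genes
```

The toy sequence AUU AUA UGA AGA reads as Met-Met-Trp followed by a stop, whereas the standard code would read the same internal codons as Ile, then stop at UGA.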

Non-Canonical Start Codons

Upstream Open Reading Frames

Upstream open reading frames (uORFs) are short sequences within the 5′ untranslated regions (UTRs) of mRNAs that begin with an AUG start codon and terminate at an in-frame stop codon, typically encoding peptides of 10 to 100 amino acids in length. Although uORFs use the AUG start codon, they represent non-canonical initiation events because of their location in the 5′ UTR and their regulatory function. These elements are prevalent in eukaryotic transcripts, with over 50% of human mRNAs containing at least one uORF. In contrast, uORFs are rare in bacterial mRNAs but occur as leader peptides associated with riboswitches, where their translation influences downstream expression or mRNA stability.

uORFs primarily regulate translation of the main open reading frame (ORF) by impeding ribosomal scanning from the 5′ cap, leading to reduced initiation at the primary coding sequence. Common mechanisms include ribosome stalling during uORF translation, which blocks access to the main ORF, and modulation of reinitiation, where ribosomes completing a uORF may reacquire initiation factors to translate downstream sequences under specific conditions.

A classic example of reinitiation control occurs in the GCN4 gene of yeast (Saccharomyces cerevisiae), where four uORFs in the 5′ UTR respond to amino acid starvation. Under nutrient-replete conditions, ribosomes that have translated the permissive uORF1 quickly reinitiate at the inhibitory uORFs 2–4, preventing main ORF initiation; during starvation, reduced ternary complex availability delays reinitiation, so ribosomes bypass these uORFs and instead initiate at GCN4, activating the general control pathway.

In mammals, the ATF4 transcription factor exemplifies stress-responsive uORF regulation, with two uORFs in its 5′ leader: uORF1 permits efficient reinitiation, while uORF2 overlaps the main AUG and inhibits basal translation. Under integrated stress conditions such as endoplasmic reticulum stress, phosphorylation of eIF2α delays reinitiation after uORF1, enabling ribosomes to skip uORF2 and initiate at the ATF4 ORF, thereby inducing adaptive responses such as antioxidant gene expression.
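Following the definition at the start of this section, uORFs can be located by looking for AUG codons upstream of the main start and reading each in frame to the first stop codon. The sketch below is a minimal version under that definition; the function name `find_uorfs` and the example transcript are hypothetical, and length filters (such as the 10 to 100 amino acid range) are left out.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_uorfs(mrna, main_start):
    """Return (start, end) index pairs for AUG-initiated ORFs beginning upstream.

    `end` is the index just past the stop codon. A uORF may terminate before
    the main start or run past it (overlapping the main ORF).
    """
    uorfs = []
    for start in range(main_start):
        if mrna[start:start + 3] != "AUG":
            continue
        # Read in frame from the upstream AUG until the first stop codon.
        for i in range(start + 3, len(mrna) - 2, 3):
            if mrna[i:i + 3] in STOP_CODONS:
                uorfs.append((start, i + 3))
                break
    return uorfs


# Toy transcript: uORF (AUG CCC UAA) at index 2, main ORF AUG at index 14.
uorfs = find_uorfs("GGAUGCCCUAAGGGAUGAAAUGA", main_start=14)
```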

Natural Non-AUG Starts

In various organisms, translation can initiate at non-AUG codons, allowing the production of proteins with N-terminal amino acids other than methionine in some cases, though often still incorporating formylmethionine in bacteria or methionine in eukaryotes via specialized mechanisms. In bacteria such as Escherichia coli, GUG (normally encoding valine) serves as a start codon for approximately 14% of genes, including the lacI repressor, but initiation occurs with N-formylmethionine delivered by the initiator tRNA^fMet, which base-pairs with the second and third positions of the codon. Similarly, UUG (normally leucine) initiates translation for about 3% of E. coli genes, again with fMet, as demonstrated by comprehensive measurements of initiation efficiencies across all 64 codons. These non-AUG starts in prokaryotes typically exhibit 20–50% of the efficiency of AUG, relying on wobble pairing with the initiator tRNA and strong Shine-Dalgarno sequences for recognition.

In eukaryotes, non-AUG initiation more frequently results in alternative N-terminal residues, particularly under stress conditions or in viral contexts, where elongator tRNAs may be recruited instead of the standard initiator tRNA^iMet. For instance, in human cells, the CUG codon (normally leucine) initiates translation of an N-terminally extended isoform of fibroblast growth factor 2 (FGF2), incorporating leucine at the start via eIF2A-mediated decoding, which promotes nuclear localization and is upregulated in cancer and stress responses. Another example is the murine leukemia virus (MLV) gag region, where an upstream CUG codon initiates a longer glycosylated Gag (gPr80gag) protein with leucine rather than methionine, contributing to viral protein diversity and immune evasion through altered MHC class I presentation. In viral genomes, the Sendai virus P/C mRNA uses an ACG codon (normally threonine) to initiate the C' protein, decoding as threonine and extending the protein by 11 residues compared to the C protein initiated at a downstream AUG, facilitating nested protein expression essential for viral replication. These eukaryotic and viral non-AUG starts generally show reduced efficiency, ranging from 10–70% relative to AUG depending on Kozak context and cellular conditions, and often involve wobble pairing or alternative initiation factors such as eIF2A during stress, when eIF2 is phosphorylated.

Such non-canonical initiations expand the proteome, enabling regulatory isoforms in development, stress responses, and viral strategies, sometimes followed by post-translational processing. For example, in mammalian heat shock responses, CUG initiation on the MRPL18 mRNA produces a leucine-started truncated isoform that enhances synthesis of heat shock proteins. Recent work suggests that non-canonical start codons confer context-dependent advantages for commensal E. coli in the murine gut. Discovery of these events has been advanced by ribosome profiling, which maps ribosome-protected fragments to reveal non-AUG sites in endogenous transcripts, and by site-directed mutagenesis studies confirming functional protein production from these codons.
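Most of the natural non-AUG starts named in this section (GUG, UUG, CUG, ACG) differ from AUG at a single nucleotide. The sketch below enumerates every codon with that property; calling these "near-cognate" codons is a labeling convention assumed here rather than a term used in the text above, and the function name is illustrative.

```python
def near_cognate_codons(reference="AUG"):
    """List all codons that differ from the reference at exactly one position."""
    variants = []
    for position in range(3):
        for base in "ACGU":
            if base != reference[position]:
                variants.append(reference[:position] + base + reference[position + 1:])
    return variants


codons = near_cognate_codons()
```

For AUG this gives nine codons: three with a first-position change (CUG, GUG, UUG), three with a second-position change (AAG, ACG, AGG), and three with a third-position change (AUA, AUC, AUU).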

Engineered and Synthetic Starts

Engineered and synthetic start codons expand the genetic code beyond canonical AUG initiation, enabling the site-specific incorporation of non-natural amino acids (ncAAs) at protein N-termini for biotechnology applications. Reassignment techniques, such as amber suppression, repurpose the UAG stop codon as a start codon using orthogonal initiator tRNAs in genomically recoded Escherichia coli strains lacking TAG codons. An engineered amber initiator tRNACUAfMet decodes UAG with high orthogonality, minimizing off-target initiation at internal sites while achieving 20–60% efficiency relative to AUG, depending on context and strain optimization. Quadruplet codons, such as AUGC, further enable code expansion through frameshift-capable tRNAs that read four bases as one unit, allowing ncAA insertion; these systems yield 1–3% efficiency compared to triplets but support multiplexed decoding for enhanced proteome diversity.

In E. coli, the pyrrolysyl-tRNA synthetase (PylRS)/tRNAPylCUA pair has been adapted to reassign UAG as a start codon, charging the tRNA with ncAAs such as azido-lysine variants for bioorthogonal reactions. This orthogonal system, derived from archaea and evolved for bacterial compatibility, facilitates N-terminal ncAA incorporation at efficiencies up to 50% in optimized recoded strains, avoiding competition from release factor 1. Orthogonal initiator tRNAs, such as itRNATy2, extend this approach to encode multiple distinct ncAAs (e.g., p-azidophenylalanine or propargyl-lysine) at UAG starts, as confirmed by functional reporter assays.

These synthetic starts support bioconjugation by installing reactive handles, such as azides at the N-terminus for copper-free click chemistry, enabling conjugation to drugs, fluorophores, or polymers without disrupting folding. In directed evolution, they allow library diversification with ncAAs, yielding variants with improved stability or activity. Overall efficiencies reach 30–80% with tuned expression of synthetases and tRNAs and with reduced release factor activity, though toxicity from mischarging limits scalability.

Post-2020 advances include quadruplet-decoding tRNAs for simultaneous ncAA incorporation at multiple sites, including N-termini, in eukaryotic models. In mammalian cells, engineered initiator tRNAs initiate at non-AUG codons (e.g., CUG or GUG) with up to 70% efficiency, diversifying N-terminal residues for therapeutic protein production; these are loosely inspired by natural non-AUG starts but rely on synthetic anticodon modifications. CRISPR-based editing integrates such orthogonal systems into mammalian genomes, enabling stable, heritable expression of ncAA-modified therapeutics such as cytokines or antibodies.
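Reassigning UAG as a start codon is only unambiguous when the coding sequence does not also use UAG in frame as a stop, which is the situation recoded strains are built to guarantee. The sketch below shows that check and the start codon swap at the sequence level only; the helper names are hypothetical, and nothing here models the orthogonal tRNA, synthetase, or release factor biology.

```python
def in_frame_codons(cds):
    """Split a coding sequence into consecutive in-frame codons."""
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]


def uses_amber(cds):
    """True if UAG appears as an in-frame codon anywhere in the sequence."""
    return "UAG" in in_frame_codons(cds)


def with_amber_start(cds):
    """Replace the first codon with UAG, refusing sequences that already use UAG."""
    if uses_amber(cds):
        raise ValueError("sequence already contains an in-frame UAG codon")
    return "UAG" + cds[3:]


recoded = with_amber_start("AUGGCCUAA")
```

Note that an out-of-frame UAG, as in AUG AUA GUA, does not count, because the ribosome never reads it as a codon in the established reading frame.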

  25. [25]
    Mechanism of Translation Initiation in Eukaryotes - NCBI - NIH
    The complex m7GpppX·eIF4E·eIF4G directs the 43S preinitiation complex to the mRNA 5' end. The 4E-BPs specifically inhibit cap-dependent translation ...
  26. [26]
    The roles of individual eukaryotic translation initiation factors in ...
    A 43S complex comprising a 40S ribosomal subunit in association with initiator tRNA and eukaryotic initiation factors (eIFs) binds to the capped 5′ end of an ...
  27. [27]
    Non-AUG translation: a new start for protein synthesis in eukaryotes
    This review by Kearse and Wilusz discusses the profound impact of non-AUG start codons in eukaryotic translation.
  28. [28]
  29. [29]
    Alternative translation of the proto-oncogene c-myc by an ... - PubMed
    Dec 19, 1997 · The human proto-oncogene c-myc encodes two proteins, c-Myc1 and c-Myc2, from two initiation codons, CUG and AUG, respectively.
  30. [30]
    Recent Advances in Archaeal Translation Initiation - PMC
    Sep 18, 2020 · Recent Advances in Archaeal Translation Initiation. Emmanuelle ... percentage of AUG, GUG, and UUG start codons are indicated. The ...
  31. [31]
  32. [32]
  33. [33]
  34. [34]
    Selection of initiator tRNA and start codon by mammalian ...
    Jan 29, 2025 · Translation initiation in mammalian mitochondria is characterized by the use of leaderless messenger RNAs (mRNAs) and non-AUG start codons, ...
  35. [35]
    Unconventional decoding of the AUA codon as methionine by ... - NIH
    Therefore, the single tRNAMet in mammalian mitochondria should recognize both the AUA and AUG codons as Met, serving as both the elongator and initiator tRNA.
  36. [36]
    Detection of Nα-terminally formylated native proteins by a pan-N ...
    Mar 27, 2023 · N-formyl methionine (fMet)-containing proteins are produced in bacteria, eukaryotic organelles mitochondria and plastids, and even in ...
  37. [37]
    The human mitochondrial tRNAMet: Structure/function relationship ...
    The bacterial and eukaryotic cytoplasmic initiator tRNAMetCAU are unmodified at the wobble position. The f5C modification has been found in the mitochondrial ...
  38. [38]
    Translation of psbC mRNAs Starts from the Downstream GUG, not ...
    Most psbC mRNAs of many organisms possess two possible initiation codons, AUG and GUG, and their coding regions are generally annotated from the upstream AUG.
  39. [39]
    Chloroplasts evolved an additional layer of translational regulation ...
    Jan 17, 2023 · Although AUG is the universal start codon for protein synthesis, several non-AUG codons have been shown to function as start codons13,14,15,16, ...
  40. [40]
    Homology between chloroplast and prokaryotic initiator tRNA ...
    Oct 25, 1980 · The nucleotide sequence of a chloroplast methionine initiator tRNA from spinach has been determined. Although from a eukaryotic organism, ...
  41. [41]
    The Genetic Systems of Mitochondria and Plastids - NCBI - NIH
    It is widely accepted that mitochondria and plastids evolved from bacteria that were engulfed by nucleated ancestral cells. As a relic of this evolutionary ...
  42. [42]
    Methionine on the rise: how mitochondria changed their codon usage
    Aug 30, 2016 · This change is apparent in the tRNAMet, which in the canonical codon usage only recognizes AUG codons. However, in mammalian mitochondria the ...
  43. [43]
    Determinants of genome-wide distribution and evolution of uORFs in ...
    Feb 17, 2021 · Upstream open reading frames (uORFs) are short open reading frames (ORFs) that have start codons located in the 5′ untranslated regions (UTRs) ...
  44. [44]
    Secondary structures that regulate mRNA translation provide ...
    Oct 3, 2023 · One of the most common elements is upstream open reading frames (uORFs) that encode short peptides and are present in approximately 50% of human ...
  45. [45]
    Conserved uORF Nascent Peptides that Control Translation - PMC
    Because of this coupling, the translation of bacterial uORFs (also known as bacterial leader peptides) can affect mRNA production.
  46. [46]
    Molecular mechanisms of translational control - Nature
    Oct 1, 2004 · Upstream open reading frames (uORFs) normally function as negative regulators by reducing translation from the main ORF. Green ovals ...
  47. [47]
    Reinitiation involving upstream ORFs regulates ATF4 mRNA ... - PNAS
    Aug 3, 2004 · Translational expression of GCN4 occurs by a mechanism that involves four upstream ORFs (uORFs) in the 5′ noncoding portion of the GCN4 mRNA.
  48. [48]
    Unanticipated Antigens: Translation Initiation at CUG with Leucine
    We found that when CUG acts as an alternate initiation codon, it can be decoded as leucine rather than the expected methionine residue. The leucine start does ...
  49. [49]
    Non-AUG translation initiation in mammals | Genome Biology
    May 9, 2022 · At a mechanistic level, start codon recognition depends on codon-anticodon interaction between AUG and Met-tRNAi.
  50. [50]
    Genetic Code: Expanding codon size - eLife
    May 11, 2022 · Engineering transfer RNAs to read codons consisting of four bases requires changes in tRNA that go beyond the anticodon sequence.
  51. [51]
    Engineering Pyrrolysine Systems for Genetic Code Expansion and ...
    Sep 5, 2024 · We describe the development of genetic code expansion, from E. coli to all domains of life, using PylRS/tRNA Pyl pairs, and the development of systems that ...
  52. [52]
    Multiplex suppression of four quadruplet codons via tRNA directed ...
    Sep 29, 2021 · Quadruplet codons, which may enable an expanded genetic code with up to 255 unique assignable amino acids (4permutations = 255 quadruplet codons ...
  53. [53]
    Engineered initiator tRNAs can effectively start translation at non ...
    Jan 20, 2025 · Other start codons such as AAG and GUC have been shown to be efficient in translation initiation with corresponding initiator tRNA anticodon ...
  54. [54]
    CRISPR/Cas9 therapeutics: progress and prospects - Nature
    Jan 16, 2023 · First research using CRISPR technology for disease treatment. In the months after CRISPR/Cas9 was shown to function in mammalian cells ...