Start codon
The start codon is the sequence of three nucleotides in messenger RNA (mRNA) at which the ribosome begins translation, the synthesis of a protein. In both prokaryotes and eukaryotes, the most common start codon is AUG, which is decoded as methionine in eukaryotes and as N-formylmethionine in bacteria; it therefore serves both as the initiation signal and as the codon for internal methionine residues.[1][2] The start codon is recognized by the initiator transfer RNA (tRNA), which binds in the ribosome's P-site and supplies the first amino acid of the polypeptide chain.[3]

Although AUG predominates, alternative start codons exist and can expand the diversity of the proteome, particularly under specific cellular conditions or in certain organisms. In bacteria, GUG and UUG can also function as start codons; they usually still direct the incorporation of N-formylmethionine, though they initiate less efficiently than AUG.[2] In eukaryotes, non-AUG codons such as CUG, GUG, and UUG are used in a subset of mRNAs, sometimes yielding proteins with non-methionine N-termini, and their selection is influenced by the surrounding nucleotide context, known as the Kozak sequence.[3] A 2017 study indicated that at least 47 of the 64 possible triplet codons may initiate translation in bacteria, challenging traditional views and highlighting the flexibility of start codon recognition.[4]

Accurate start codon selection is critical for proper gene expression: errors can produce out-of-frame translation or truncated proteins, potentially causing cellular dysfunction or disease.
In eukaryotes, the ribosome scans from the mRNA's 5' cap and typically selects the first AUG in a suitable context, a choice modulated by initiation factors such as eIF1 and eIF2.[5] Non-canonical start codons, although less efficient, help regulate translation during stress and development and are used in mitochondrial and viral genomes, underscoring their biological significance beyond canonical initiation.[3]

Overview
Definition and Function
A start codon is a sequence of three nucleotides (a trinucleotide) in messenger RNA (mRNA) that specifies the site at which the ribosome initiates protein translation.[6] In the standard genetic code, the primary start codon is AUG, which codes for the amino acid methionine and additionally signals the beginning of translation.[6]

The primary function of the start codon is to recruit the initiator transfer RNA (tRNA), which carries N-formylmethionine in bacteria or unmodified methionine in eukaryotes, to the ribosome-mRNA complex.[6] This recruitment drives assembly of the ribosomal initiation complex: the small ribosomal subunit binds the mRNA, and the large subunit then joins, which establishes the reading frame used to decode all subsequent codons.[6] By defining the starting point, the start codon ensures that the polypeptide chain is synthesized accurately from the N-terminus to the C-terminus and that the genetic message is not misread.[6]

Start codons are conserved almost universally across all domains of life, reflecting the shared evolutionary ancestry of the genetic code.[7] AUG is the predominant initiator in most organisms, which points to the code's ancient origins, though rare exceptions occur in specialized systems such as mitochondria.[6]

The absence or mutation of a start codon typically prevents translation initiation, so that either no protein is produced or an alternative downstream start site is used, which often yields truncated or non-functional proteins.[6] If translation instead begins out of frame, every downstream codon is misread, producing an aberrant polypeptide with an incorrect amino acid sequence and a potential loss of biological activity.[6]

Context in the Genetic Code
The standard genetic code consists of 64 possible triplets (codons) formed from the four nucleotide bases adenine (A), cytosine (C), guanine (G), and uracil (U) in messenger RNA (mRNA). Of these, 61 specify the 20 standard amino acids and three act as stop signals during protein translation.[8] The code is nearly universal across all domains of life, with AUG assigned to the amino acid methionine (Met) at internal positions and also serving as the primary initiation signal.[9]

The following table summarizes the standard genetic code. Rows give the first base of each codon and columns give the second base; within each cell, the four codons are listed in order of the third base (U, C, A, G), as repeated in the last column:

| First base | U | C | A | G | Third base |
|---|---|---|---|---|---|
| U | UUU (Phe) UUC (Phe) UUA (Leu) UUG (Leu) | UCU (Ser) UCC (Ser) UCA (Ser) UCG (Ser) | UAU (Tyr) UAC (Tyr) UAA (Stop) UAG (Stop) | UGU (Cys) UGC (Cys) UGA (Stop) UGG (Trp) | U C A G |
| C | CUU (Leu) CUC (Leu) CUA (Leu) CUG (Leu) | CCU (Pro) CCC (Pro) CCA (Pro) CCG (Pro) | CAU (His) CAC (His) CAA (Gln) CAG (Gln) | CGU (Arg) CGC (Arg) CGA (Arg) CGG (Arg) | U C A G |
| A | AUU (Ile) AUC (Ile) AUA (Ile) AUG (Met, Start) | ACU (Thr) ACC (Thr) ACA (Thr) ACG (Thr) | AAU (Asn) AAC (Asn) AAA (Lys) AAG (Lys) | AGU (Ser) AGC (Ser) AGA (Arg) AGG (Arg) | U C A G |
| G | GUU (Val) GUC (Val) GUA (Val) GUG (Val) | GCU (Ala) GCC (Ala) GCA (Ala) GCG (Ala) | GAU (Asp) GAC (Asp) GAA (Glu) GAG (Glu) | GGU (Gly) GGC (Gly) GGA (Gly) GGG (Gly) | U C A G |
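
As an illustration of how the table and the start codon together define a reading frame, the following Python sketch builds the standard code from the 64 triplets and translates an mRNA from the first start codon to the first in-frame stop. The function name `translate_from_start`, the example sequences, and the simplifications are assumptions made for illustration only: the sketch picks the first matching triplet with no Kozak or Shine-Dalgarno context, and always places methionine at position 1. It is not a model of real initiation.

```python
from itertools import product

BASES = "UCAG"

# One-letter amino acids in table order: first base, then second base,
# then third base, each running over U, C, A, G ("*" marks a stop codon).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)

# Map each of the 64 codons to its amino acid (or "*" for stop).
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_from_start(mrna, start_codons=("AUG",)):
    """Return (start_index, peptide) for the first start codon, or None.

    Hypothetical helper: reads 5'->3', takes the first triplet found in
    start_codons, then decodes in that frame until an in-frame stop codon
    or the end of the sequence.
    """
    mrna = mrna.upper().replace("T", "U")
    for i in range(len(mrna) - 2):
        if mrna[i:i + 3] in start_codons:
            break
    else:
        return None  # no start codon: no initiation in this simplified model

    # Simplification: the initiator tRNA supplies methionine at position 1,
    # whichever start codon was matched.
    peptide = ["M"]
    for j in range(i + 3, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[j:j + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return i, "".join(peptide)


# Made-up example sequence: the first AUG sets the frame.
print(translate_from_start("GGAUGGCUAUGAAAUAA"))  # → (2, 'MAMK')
# Same sequence with the first AUG mutated to AAG: a downstream AUG is used
# instead, giving a truncated product.
print(translate_from_start("GGAAGGCUAUGAAAUAA"))  # → (8, 'MK')
# GUG is only used when it is explicitly allowed as a start codon.
print(translate_from_start("GGGUGGCUUAA", start_codons=("AUG", "GUG")))  # → (2, 'MA')
```

In the second call, losing the first AUG shifts initiation to the next in-frame AUG and shortens the product, which mirrors the truncation effect described above. With a sequence that contains no permitted start codon, the function returns `None`.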