Protein primary structure

The primary structure of a protein is the linear sequence of amino acids linked by peptide bonds to form a polypeptide chain. The sequence is written from the amino-terminal (N-terminus) end to the carboxyl-terminal (C-terminus) end, and each position is occupied by one of the 20 standard amino acids, whose side chains give the protein its distinctive properties. The primary structure is the foundational level of protein organization and encodes the information the protein needs to fold into its functional three-dimensional form.

The primary structure determines the higher levels of protein organization (secondary, tertiary, and quaternary structure) through interactions among the backbone and the amino acid side chains, including hydrogen bonds, ionic bonds, hydrophobic effects, and disulfide bridges. These interactions dictate the protein's overall shape, stability, and biological function, enabling roles in processes such as enzymatic catalysis, molecular recognition, and cellular signaling. Even a single amino acid substitution in the primary sequence can disrupt folding and lead to loss of function or disease, as seen in many genetic mutations.

In living organisms, the primary structure is established during protein synthesis through transcription of DNA into messenger RNA (mRNA) and subsequent translation at ribosomes, where transfer RNA (tRNA) molecules deliver specific amino acids according to the mRNA codon sequence. Historically, primary structures were determined by methods such as Edman degradation, which identifies amino acids one at a time; modern approaches rely on mass spectrometry and on automated DNA sequencing, which allows protein sequences to be inferred from the corresponding genes. Each protein has a characteristic sequence, and homologous proteins retain similar sequences across species, underscoring the evolutionary and functional significance of this level of structure.

Definition and Fundamentals

Definition

The primary structure of a protein refers to the linear sequence of amino acids covalently linked by peptide bonds to form a polypeptide chain. This sequence is conventionally described from the amino terminus (N-terminus), where the free amino group is located, to the carboxyl terminus (C-terminus), where the free carboxyl group resides. The key components of this structure are the 20 standard amino acids encoded by the genetic code, which are joined through their alpha-amino and alpha-carboxyl groups via peptide bonds, resulting in an unbranched chain unless post-translational modifications occur. These bonds are amide linkages formed by dehydration (condensation) reactions, and each peptide group is rigid and planar, giving the backbone the regular repeating unit that underlies the one-dimensional nature of the primary structure.

Unlike higher levels of protein organization, the primary structure is the simplest, sequential arrangement and does not consider spatial folding, hydrogen bonding patterns, or the non-covalent interactions that give rise to secondary, tertiary, or quaternary structures. For instance, in the hormone insulin, the primary structure consists of two distinct polypeptide chains, A (21 residues) and B (30 residues), which emerge as separate sequences after enzymatic cleavage of a precursor protein, proinsulin, and are connected by disulfide bonds in the mature form.

Biological Importance

The primary structure of a protein, defined by its linear sequence of amino acids, is fundamental to its biological function because it determines higher-order folding and thus the precise three-dimensional arrangement necessary for activity. Specific sequences enable the formation of active sites in enzymes, where catalytic residues interact with substrates to facilitate reactions, and they set binding affinities for ligands, cofactors, or other molecules through the complementary physicochemical properties of side chains. For instance, the arrangement of polar, nonpolar, acidic, and basic residues in the primary sequence dictates the interactions that stabilize secondary structures such as alpha helices and beta sheets, ultimately positioning residues for enzymatic catalysis or molecular recognition.

The combinatorial diversity arising from the 20 standard amino acids allows for an enormous repertoire of proteins, far exceeding the needs of any organism and underpinning biological diversity. For a typical protein of 100 residues, the theoretical number of possible sequences is 20^100, which exceeds 10^130, enabling the evolution of specialized functions tailored to diverse cellular environments. This diversity provides the raw material for evolution: mutations, such as single nucleotide polymorphisms or insertions and deletions, alter the primary structure and can confer adaptive advantages such as enhanced stability or novel binding properties in response to environmental pressures.

Alterations in primary structure due to mutations can also lead to pathological conditions by disrupting normal protein function. A classic example is sickle cell anemia, caused by a single point mutation in the β-globin gene that replaces glutamic acid (Glu) with valine (Val) at the sixth position of the hemoglobin β-chain, resulting in abnormal hemoglobin polymerization, red blood cell deformation, and impaired oxygen transport. Such changes show how even minor sequence variations can cascade into severe disease, emphasizing the role of the primary structure in maintaining physiological homeostasis.
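The size of sequence space quoted above follows directly from the alphabet size and chain length. A minimal sketch of the arithmetic, where the function name `log10_sequence_space` is an illustrative choice and not part of any library:

```python
import math


def log10_sequence_space(length: int, alphabet_size: int = 20) -> float:
    """Return log10 of the number of distinct sequences of a given length."""
    # alphabet_size ** length can be astronomically large, so work in log space.
    return length * math.log10(alphabet_size)


exponent = log10_sequence_space(100)
print(f"20^100 is about 10^{exponent:.1f}")  # prints: 20^100 is about 10^130.1
```

Working in logarithms keeps the calculation readable for any chain length, although Python integers could also represent 20^100 exactly.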

Synthesis

Biological Synthesis

The biological synthesis of a protein's primary structure occurs through translation, in which the nucleotide sequence of messenger RNA (mRNA) is decoded by ribosomes to assemble a linear polypeptide chain from amino acids. Ribosomes, composed of ribosomal RNA (rRNA) and proteins, serve as the molecular machines that carry out this decoding, while transfer RNA (tRNA) molecules act as adaptors, each carrying a specific amino acid and bearing an anticodon that base-pairs with a complementary codon on the mRNA. This codon-anticodon recognition ensures that the sequence of three-nucleotide codons in the mRNA directly dictates the order of amino acids in the protein, establishing the primary structure with high precision.

The genetic code underlying this process is a non-overlapping triplet code: successive groups of three nucleotides (codons) in the mRNA are read sequentially without overlap, each specifying one of the 20 standard amino acids or serving as a signal for termination. The code is degenerate, meaning that most amino acids are encoded by multiple synonymous codons (up to six for some, such as leucine), which provides redundancy and robustness against certain mutations. The start codon AUG initiates translation in nearly all organisms by coding for N-formylmethionine in prokaryotes or methionine in eukaryotes, and the triplet nature of the code was established experimentally through frame-shift and decoding studies in the 1960s.

Translation proceeds in three main phases: initiation, elongation, and termination. During initiation, the small ribosomal subunit binds the mRNA at the 5' cap (in eukaryotes) or at the Shine-Dalgarno sequence (in prokaryotes), locates the AUG start codon (by scanning, in eukaryotes), and assembles with the large subunit and the initiator tRNA to form the 70S (prokaryotes) or 80S (eukaryotes) initiation complex, aided by initiation factors. Elongation follows, with the ribosome's peptidyl (P) site holding the growing chain and the aminoacyl (A) site accepting the next cognate aminoacyl-tRNA. Peptide bond formation occurs via the ribosome's peptidyl transferase activity, which transfers the nascent chain to the new aminoacyl-tRNA; elongation factor-driven translocation then moves the ribosome three nucleotides along the mRNA and shifts the deacylated tRNA to the exit (E) site, from which it is released. Termination is triggered when a stop codon (UAA, UAG, or UGA) enters the A site, where it is recognized by release factors (e.g., RF1/RF2 in prokaryotes or eRF1 in eukaryotes), leading to hydrolytic release of the completed polypeptide from the tRNA and dissociation of the ribosomal subunits.

To maintain the fidelity of primary structure formation, several mechanisms ensure accurate codon decoding and amino acid incorporation, with overall error rates held to approximately 1 in 10^4 residues. Aminoacyl-tRNA synthetases (aaRSs) play a central role by catalyzing the specific attachment of amino acids to their cognate tRNAs. They achieve initial specificity through substrate recognition and rely on proofreading (editing) domains to hydrolyze misactivated amino acids or misacylated tRNAs, reducing error rates from a potential 1 in 200 misactivations to 1 in 10^4 or lower. Additional checks occur at the ribosome, including induced-fit conformational changes that discriminate against near-cognate tRNAs and kinetic proofreading during GTP hydrolysis by elongation factors, which together minimize mistranslation that could disrupt protein function.
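The decoding rule described above (non-overlapping triplets read from a start codon until a stop codon) can be sketched in a few lines. This is a simplified illustration, not a model of the ribosome: it assumes the standard genetic code, takes the first AUG as the start site, and ignores initiation signals such as the 5' cap or the Shine-Dalgarno sequence. The names `CODON_TABLE` and `translate` are illustrative, and the example mRNA is a short fragment based on the start of the β-globin coding sequence.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G order for each
# codon position; "*" marks the three stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon."""
    mrna = mrna.upper().replace("T", "U")  # accept DNA-style input as well
    start = mrna.find("AUG")
    if start == -1:
        return ""
    residues = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: release the chain
            break
        residues.append(amino_acid)
    return "".join(residues)


normal = "GGAUGGUGCACCUGACUCCUGAGGAGUAAGC"
sickle = "GGAUGGUGCACCUGACUCCUGUGGAGUAAGC"  # GAG -> GUG point mutation
print(translate(normal))  # MVHLTPEE
print(translate(sickle))  # MVHLTPVE
```

The second call shows how a single-nucleotide change alters one residue of the primary structure. The Glu-to-Val change falls at position 6 when the initiator methionine, which is removed from the mature chain, is not counted.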

Chemical Synthesis

Chemical synthesis of protein primary structures enables the laboratory assembly of polypeptides with defined sequences, independently of biological processes. The cornerstone method is solid-phase peptide synthesis (SPPS), introduced by Robert Bruce Merrifield in 1963, which builds the peptide chain stepwise while it is anchored to an insoluble resin support. The approach lends itself to automation: amino acids are added sequentially from the C-terminus to the N-terminus, giving precise control over the primary structure.

In SPPS, protected amino acids, typically carrying N-terminal Boc or Fmoc groups and side-chain protecting groups, are used to prevent unwanted reactions. The process involves iterative cycles of activation, coupling, and deprotection. Activation converts the carboxyl group of the incoming amino acid into a reactive species, often using carbodiimides such as dicyclohexylcarbodiimide (DCC) to form an O-acylisourea intermediate, which promotes efficient amide bond formation. Coupling attaches the activated amino acid to the free N-terminal amine of the resin-bound peptide chain, typically achieving per-step yields exceeding 99% under optimized conditions. Deprotection then removes the N-terminal protecting group (by acid treatment for Boc or base treatment for Fmoc), exposing the amine for the next cycle, while the resin allows byproducts to be separated by simple filtration. Upon completion, the peptide is cleaved from the resin (e.g., using hydrogen fluoride in Boc chemistry) and purified, commonly by reversed-phase high-performance liquid chromatography (HPLC), to isolate the target sequence at high purity.

Despite its efficiency, SPPS has practical limitations. Incomplete couplings accumulate, which leads to a practical length limit of roughly 50 to 100 residues; beyond this, overall yields drop significantly, and side reactions and aggregation on the resin make matters worse. Racemization, the partial conversion of L-amino acids to D-isomers during activation and coupling, poses another risk, particularly with residues such as cysteine or serine, so reagents and conditions must be chosen carefully to keep the loss of stereochemical integrity below 1%.

SPPS has had a transformative effect on the production of therapeutic peptides with custom primary structures. For instance, oxytocin, a nonapeptide hormone, was among the early successes synthesized via SPPS in the late 1960s, demonstrating the method's viability for biologically active molecules that are now used clinically for labor induction and the treatment of postpartum hemorrhage. This capability has expanded to over 100 FDA-approved peptide drugs as of 2024, underscoring the role of SPPS in pharmaceutical development.
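The practical length limit follows from compounding the per-step coupling yield over the whole chain. The sketch below is a rough estimate only: it assumes a constant yield at every coupling, assumes the first residue is already attached to the resin, and ignores deprotection, cleavage, and purification losses. The function name `overall_yield` is illustrative.

```python
def overall_yield(n_residues: int, per_step_yield: float = 0.99) -> float:
    """Estimate the fraction of chains that are full length after synthesis."""
    # Assumption: the first residue is preloaded on the resin, so a chain of
    # n residues needs n - 1 coupling steps.
    couplings = max(n_residues - 1, 0)
    return per_step_yield ** couplings


for length in (9, 30, 50, 100):
    print(f"{length:>3} residues: {overall_yield(length):.1%} full-length product")
```

At 99% per step, about 61% of chains are full length for a 50-residue peptide and about 37% for 100 residues. At 97% per step the 50-residue figure falls to roughly 22%, which is why small improvements in coupling efficiency matter so much for long targets.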

Determination

Classical Methods

The classical methods for determining protein primary structure relied on chemical labeling, selective hydrolysis, and chromatographic analysis to identify amino acid sequences step by step, and were developed primarily in the mid-20th century. One foundational approach was end-group labeling, pioneered by Frederick Sanger, which targets the N-terminal amino acid of a polypeptide chain. In this method, the protein is reacted with 2,4-dinitrofluorobenzene (DNFB), also known as Sanger's reagent, to form a stable dinitrophenyl (DNP) derivative at the free amino group of the N-terminal residue. The labeled protein is then subjected to complete acid hydrolysis, which cleaves all peptide bonds and releases the individual amino acids, including the DNP-labeled N-terminal one, which can be identified and quantified by chromatography thanks to its distinctive yellow color and solubility properties. This technique identifies the N-terminal residue but is limited to end groups and requires additional strategies for internal sequences.

To allow sequential analysis beyond the end groups, Pehr Edman introduced a degradation method in 1950 that removes N-terminal residues one at a time from an otherwise intact peptide. The peptide is treated with phenylisothiocyanate (PITC), which reacts specifically with the N-terminal amino group to form a phenylthiocarbamyl (PTC) derivative. Mild acid treatment then cleaves this derivative, which is released as a phenylthiohydantoin (PTH) amino acid, leaving the rest of the peptide chain intact for further cycles. Each released PTH-amino acid is identified by chromatography, typically HPLC or thin-layer chromatography, allowing sequences of up to 50-60 residues to be determined with high specificity, although yields decrease in later cycles because the reactions are incomplete. Edman degradation complemented end-group labeling by providing a cyclic way to elucidate longer stretches of the primary structure without destroying the remainder of the chain.

For larger proteins, where direct sequencing of the full chain was impractical, proteolytic digestion with specific enzymes was used to fragment the polypeptide into smaller, overlapping peptides whose sequences could be determined individually and then assembled. Enzymes such as trypsin, which cleaves peptide bonds after lysine and arginine residues, or chymotrypsin, which targets aromatic amino acids, were used to generate predictable fragments. These were separated by chromatography or electrophoresis, sequenced using end-group or Edman methods, and aligned on the basis of overlaps between digests made with different enzymes or by partial acid hydrolysis. This overlap strategy was essential for reconstructing the complete sequence because it resolved ambiguities in fragment order.

These methods culminated in the first complete determination of a protein's primary structure: the sequencing of bovine insulin by Sanger's group in the early 1950s. Insulin, a 51-residue hormone with two disulfide-linked chains, was oxidized to separate the A (21 residues) and B (30 residues) chains, which were then fragmented by proteolytic enzymes and partial acid hydrolysis to yield over 50 peptides. Sequencing these peptides by DNP labeling and piecing together their overlaps revealed the exact order of residues, and subsequent work located the two interchain and one intrachain disulfide bonds, confirming that proteins possess a defined linear sequence of amino acids. This landmark achievement, published in 1951 for the B chain and 1953 for the A chain, established that a protein has a specific, defined sequence and earned Sanger the 1958 Nobel Prize in Chemistry.
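The fragment-and-overlap strategy described above depends on enzymes that cut at predictable positions. A minimal in-silico sketch follows, assuming simplified cleavage rules (cut after the listed residues unless the next residue is proline) and a made-up test sequence; the function name `digest` and the example peptide are hypothetical, and real protease specificity is more nuanced.

```python
import re


def digest(sequence: str, cut_after: str) -> list:
    """Split a one-letter sequence after any residue listed in cut_after.

    Simplifying assumption: cleavage is blocked when the next residue is
    proline. Zero-width splitting requires Python 3.7 or later.
    """
    pattern = rf"(?<=[{cut_after}])(?!P)"
    return [piece for piece in re.split(pattern, sequence.upper()) if piece]


peptide = "AGKRPLMRTEFWGK"           # hypothetical sequence for illustration
tryptic = digest(peptide, "KR")       # trypsin-like: after Lys or Arg
chymotryptic = digest(peptide, "FWY")  # chymotrypsin-like: after aromatics
print(tryptic)       # ['AGK', 'RPLMR', 'TEFWGK']
print(chymotryptic)  # ['AGKRPLMRTEF', 'W', 'GK']
```

Because the two digests cut at different positions, each fragment from one digest spans a junction in the other (for example, `AGKRPLMRTEF` bridges the first three tryptic cut sites), which is the information used to order the fragments.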

Modern Techniques

Modern techniques for determining protein primary structure have advanced significantly since the late 20th century, enabling high-throughput analysis of complex proteomes and direct sequencing of peptides. These methods draw on mass spectrometry, genomic sequencing, and computational tools to achieve greater speed, sensitivity, and scalability than earlier approaches, and they are often combined in proteomics pipelines.

Mass spectrometry (MS) is a cornerstone of contemporary proteomics, particularly tandem MS (MS/MS), which fragments peptides to generate sequence-specific ions for identification. In MS/MS workflows, proteins are digested into peptides, ionized, and subjected to collision-induced dissociation or other fragmentation techniques to produce fragment ions whose mass-to-charge ratios reveal the amino acid order through database matching or de novo sequencing algorithms. This approach excels at resolving ambiguous sequences and handling mixtures, and de novo sequencing is particularly useful for novel proteins that lack genomic references. Electrospray ionization (ESI) and matrix-assisted laser desorption/ionization (MALDI) are the key ionization methods: ESI produces multiply charged ions suited to online coupling with liquid chromatography (LC), while MALDI generates predominantly singly charged ions and is well suited to imaging and high-molecular-weight analysis. Because ESI is a soft ionization method, it preserves labile modifications, enabling detection of post-translational modifications (PTMs) alongside the primary sequence.

Next-generation sequencing (NGS) provides an indirect yet powerful route to protein primary structure by determining DNA or RNA sequences, which are then translated into amino acid sequences using the genetic code. NGS platforms, such as those from Illumina or Ion Torrent, run millions of sequencing reads in parallel to assemble genomes or transcriptomes rapidly, allowing coding regions (exons) and their codon-based protein products to be inferred. This method is especially valuable for organisms with sequenced genomes, where proteome-wide sequences can be predicted ab initio, though the predictions require validation against direct protein data to account for splicing variants or errors. For example, NGS-enabled whole-genome sequencing has supported proteome annotation in well-studied organisms such as humans, where roughly 20,000 protein-coding genes have been identified.

Computational prediction tools complement experimental methods by testing whether a candidate primary sequence is plausible, predicting its three-dimensional structure and assessing the stability of the fold. AlphaFold, developed by DeepMind, applies deep learning to evolutionary multiple sequence alignments to predict three-dimensional protein structures from input sequences, and so indicates whether a sequence is consistent with biophysical constraints. While not a direct sequencing tool, AlphaFold can aid validation in cases of sequencing ambiguity, such as distinguishing isoforms, by indicating how well variants fold into stable structures; in benchmarks it has achieved a median all-atom RMSD of about 1.5 Å, and it has been applied across the human proteome. Limitations include its reliance on known sequences as input and reduced performance for disordered regions or novel folds.

Emerging techniques as of 2025 include nanopore protein sequencing, which enables direct, single-molecule analysis of polypeptide chains by detecting changes in ionic current as residues pass through a nanopore. Combined with machine learning for signal interpretation, these methods offer the potential for label-free, high-throughput sequencing of native proteins, addressing limitations of digestion-based approaches.

Proteomics workflows integrate these techniques for large-scale primary structure determination, with liquid chromatography-tandem mass spectrometry (LC-MS/MS) as the gold standard for bottom-up analysis. In a typical LC-MS/MS pipeline, proteins are extracted, reduced, alkylated, and enzymatically digested (e.g., with trypsin) into peptides, which are separated by reversed-phase LC before ESI and MS/MS fragmentation. Spectral data are searched against databases such as UniProt using search software such as MaxQuant for identification and assembly into protein sequences, achieving proteome coverage of 5,000 to 10,000 proteins per run in complex samples. These workflows also detect PTMs, such as phosphorylation or glycosylation, by identifying mass shifts in fragment ions, with neutral loss scans improving site localization. High-resolution instruments such as Orbitrap analyzers provide sub-ppm mass accuracy, enabling confident sequencing even of PTM-bearing peptides.
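Database matching in these workflows rests on comparing observed masses with masses computed from candidate sequences. The sketch below shows that calculation under simple assumptions: monoisotopic residue masses for the 20 standard amino acids, unmodified termini, and phosphorylation as the only PTM. The names `peptide_mass` and `mass_to_charge` are illustrative and do not come from any search engine.

```python
# Monoisotopic residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056     # H2O added for the free N- and C-termini
PROTON = 1.007276    # mass of the charge carrier in positive-ion mode
PHOSPHO = 79.96633   # mass shift of one phosphorylation (HPO3)


def peptide_mass(sequence: str, n_phospho: int = 0) -> float:
    """Monoisotopic mass of a neutral peptide with optional phospho groups."""
    residues = sum(RESIDUE_MASS[aa] for aa in sequence.upper())
    return residues + WATER + n_phospho * PHOSPHO


def mass_to_charge(mass: float, charge: int) -> float:
    """m/z of a peptide ion carrying `charge` protons."""
    return (mass + charge * PROTON) / charge


mass = peptide_mass("TEFWGK")  # hypothetical tryptic peptide
for z in (1, 2, 3):
    print(f"z={z}: m/z = {mass_to_charge(mass, z):.4f}")
```

Note that leucine and isoleucine have identical masses, which is one reason mass-based sequencing cannot distinguish them without additional fragmentation evidence.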

Representation and Notation

Sequence Notation

Protein primary sequences are conventionally written from the N-terminus to the C-terminus, reflecting the direction of polypeptide chain synthesis in biological systems. This left-to-right notation in linear text representations ensures consistency across the literature and databases. Two systems exist for denoting amino acids in sequences: the three-letter code, which uses abbreviated names such as Ala for alanine, and the one-letter code, which uses single characters such as A for alanine. The one-letter code is preferred for compact representation of long sequences, while the three-letter code is more readable for short segments or when emphasizing specific residues. These abbreviations are standardized by the International Union of Pure and Applied Chemistry (IUPAC) and the International Union of Biochemistry and Molecular Biology (IUBMB). The IUPAC-IUBMB recommendations specify codes for the 20 standard proteinogenic amino acids, as well as for non-standard ones incorporated into some proteins, such as selenocysteine (denoted Sec or U) and pyrrolysine (Pyl or O). Below is a table of the standard abbreviations:
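| Amino Acid | Three-Letter Code | One-Letter Code |
|---|---|---|
| Alanine | Ala | A |
| Arginine | Arg | R |
| Asparagine | Asn | N |
| Aspartic acid | Asp | D |
| Cysteine | Cys | C |
| Glutamine | Gln | Q |
| Glutamic acid | Glu | E |
| Glycine | Gly | G |
| Histidine | His | H |
| Isoleucine | Ile | I |
| Leucine | Leu | L |
| Lysine | Lys | K |
| Methionine | Met | M |
| Phenylalanine | Phe | F |
| Proline | Pro | P |
| Serine | Ser | S |
| Threonine | Thr | T |
| Tryptophan | Trp | W |
| Tyrosine | Tyr | Y |
| Valine | Val | V |
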
For ambiguous residues, such as those left undetermined by sequencing, the one-letter code "X" (or three-letter "Xaa") indicates an unknown amino acid. Other ambiguity codes include "B" for aspartic acid (D) or asparagine (N), "Z" for glutamic acid (E) or glutamine (Q), and "J" for isoleucine (I) or leucine (L); gaps in alignments are conventionally written as "-".

In databases, protein sequences are commonly stored and exchanged in the FASTA format, which begins with a header line starting with ">" followed by an identifier, and then gives the sequence in one-letter code, often wrapped at 60-80 characters per line for readability. The UniProt database, a comprehensive resource for protein sequences and annotations, displays and archives entries using these IUPAC one-letter codes, with canonical sequences serving as the reference for positional numbering. This standardization facilitates computational analysis, alignment, and sharing across bioinformatics tools.
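The notation rules above map directly onto a small parser. The following sketch reads FASTA-formatted text and converts one-letter codes to the three-letter form; the record identifier `demo_peptide` and the function names are made up for illustration, and header conventions of specific databases are not handled.

```python
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    # Non-standard and ambiguity codes
    "U": "Sec", "O": "Pyl", "B": "Asx", "Z": "Glx", "J": "Xle", "X": "Xaa",
}


def parse_fasta(text: str) -> dict:
    """Return {identifier: sequence} from FASTA-formatted text."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The identifier is the first whitespace-delimited token.
            current_id = line[1:].split()[0]
            records[current_id] = ""
        elif current_id is not None:
            records[current_id] += line  # join wrapped sequence lines
    return records


def to_three_letter(sequence: str) -> str:
    """Convert a one-letter sequence to hyphen-separated three-letter codes."""
    return "-".join(THREE_LETTER[aa] for aa in sequence.upper())


example = """>demo_peptide hypothetical record for illustration
MVHL
TPEE
"""
for name, seq in parse_fasta(example).items():
    print(name, seq, to_three_letter(seq))
# demo_peptide MVHLTPEE Met-Val-His-Leu-Thr-Pro-Glu-Glu
```

Reading left to right, the output runs from the N-terminus to the C-terminus, matching the convention described above.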

Structural Representations

The primary structure of a protein can be represented visually by linear diagrams that depict the polypeptide chain as a sequential series of amino acids connected by peptide bonds, often drawn as a string of beads or a straight chain to emphasize the covalent linkages without implying three-dimensional folding. These diagrams typically use circles or beads for residues and lines for bonds, show the N-terminal to C-terminal directionality, and allow specific sequences or motifs to be identified. Such representations make the linear order and potential interaction sites easy to follow and are standard in biochemical illustrations.

Sequence logos provide a graphical method for encoding the conservation and variability within aligned protein sequences. Letters are stacked at each position, and the height of each symbol reflects its frequency or information content, measured in bits. Developed originally for nucleic acids but widely adapted for proteins, this approach summarizes motifs or domains in the primary structures of multiple homologs, with taller stacks indicating higher conservation and color-coding often distinguishing physicochemical properties. For instance, in protein families, sequence logos reveal conserved residues critical for function, such as catalytic sites in enzymes.

Software tools such as PyMOL can visualize the primary chain by rendering the polypeptide backbone as a continuous tube or line trace, often with side chains added to show the sequence, before higher-level representations are applied. PyMOL can load structures from databases such as the Protein Data Bank and display their sequences alongside editable chains, which is useful for annotating specific residues or bonds. Complementing this, Ramachandran plots give a diagrammatic view of the backbone conformational constraints inherent to the primary structure, plotting the allowed phi (φ) and psi (ψ) dihedral angles for each residue type to show the sterically feasible regions that limit possible chain geometries. These plots, derived from energy calculations, show how the sequence of amino acids influences local flexibility, with glycine allowing broader ranges because of its small side chain.

A representative example is the primary structure of myoglobin, a 153-residue oxygen-binding protein, which is often depicted in linear diagrams with helical regions marked as segments A through H (e.g., A: residues 3-18, E: 58-77), illustrating how the sequence predisposes certain stretches to alpha-helical conformations on the basis of amino acid composition. This visualization highlights eight helical segments connected by non-helical loops and helps in interpreting how elements of the primary sequence contribute to the protein's overall fold.
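The letter heights in a sequence logo can be computed from a column of an alignment. The sketch below uses the basic definition (maximum information for 20 amino acids minus the observed Shannon entropy) and deliberately omits the small-sample correction used by published logo tools; the alignment is a made-up example and the function names are illustrative.

```python
import math
from collections import Counter


def column_information(column: str, alphabet_size: int = 20) -> float:
    """Information content (bits) of one alignment column, ignoring gaps."""
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    entropy = -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(residues).values()
    )
    return math.log2(alphabet_size) - entropy


def letter_heights(column: str) -> dict:
    """Height of each letter: its frequency times the column information."""
    residues = [aa for aa in column if aa != "-"]
    info = column_information(column)
    return {
        aa: info * count / len(residues)
        for aa, count in Counter(residues).items()
    }


alignment = ["GDSGG", "GDSGA", "GDAGG", "GESGG"]  # hypothetical aligned motif
for position, column in enumerate(zip(*alignment), start=1):
    column = "".join(column)
    print(position, column, f"{column_information(column):.2f} bits")
```

A fully conserved column reaches the maximum of log2(20), about 4.32 bits, while a column split evenly between two residues scores one bit less.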

Modifications

Post-Translational Modifications

Post-translational modifications (PTMs) are covalent alterations to the amino acid residues of a protein that occur after its ribosomal synthesis, expanding the functional diversity of the primary sequence without altering the length of the polypeptide backbone. These modifications add chemical groups, such as phosphate groups, sugars, or lipid moieties, to specific side chains, influencing protein stability, localization, activity, and interactions. PTMs are primarily enzymatic processes mediated by dedicated enzymes such as kinases, glycosyltransferases, and ubiquitin ligases, enabling dynamic regulation of cellular processes including signaling and metabolism. Over 400 distinct types of PTMs have been identified across proteomes as of 2024, with mass spectrometry-based proteomics serving as the primary method for their detection and mapping because of its high-throughput capability for identifying modification sites and stoichiometries.

Among the most prevalent PTMs are phosphorylation, glycosylation, ubiquitination, and acetylation, each targeting specific residues and serving regulatory roles. Phosphorylation adds a phosphate group from ATP to serine (Ser), threonine (Thr), or tyrosine (Tyr) residues. It is catalyzed by protein kinases, reversibly activates or inhibits enzymatic activity, and drives signal transduction pathways. Glycosylation attaches carbohydrate moieties either N-linked to asparagine (Asn) in the Asn-X-Ser/Thr sequon (where X is any amino acid except proline) via oligosaccharyltransferase in the endoplasmic reticulum, or O-linked to Ser or Thr by Golgi-resident glycosyltransferases, enhancing folding, stability, and cell-cell recognition. Ubiquitination conjugates ubiquitin, a 76-amino-acid protein, to lysine (Lys) residues through a cascade of E1-activating, E2-conjugating, and E3-ligase enzymes, often forming polyubiquitin chains that signal proteasomal degradation or alter protein trafficking and interactions. Acetylation transfers an acetyl group from acetyl-CoA to the ε-amino group of Lys or to the N-terminal amine. It is carried out by histone acetyltransferases (HATs) or non-histone acetyltransferases and neutralizes positive charges, modulating protein-DNA binding and enzymatic function.

These PTMs play critical roles in cellular regulation and signaling. For instance, histone acetylation at Lys residues on histone tails promotes chromatin relaxation and transcription in epigenetic control, as demonstrated in studies of acetyltransferase activity during development and disease. Similarly, phosphorylation by cyclin-dependent kinases (CDKs) on Thr and Ser residues of cell cycle regulators, such as the retinoblastoma protein, drives orderly progression through the G1/S and G2/M transitions by sequentially activating downstream targets. Such modifications underscore the adaptability of the primary structure, and their reversibility, through phosphatases, deglycosylases, deubiquitinases, and deacetylases, allows rapid responses to environmental cues.
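Because the N-linked glycosylation sequon is defined purely by sequence (Asn-X-Ser/Thr with X not proline), candidate sites can be found with a pattern scan. This is a sketch for locating candidates only: whether a site is actually glycosylated depends on cellular context that the sequence alone does not reveal. The function name and the test sequence are hypothetical.

```python
import re

# Lookahead so that overlapping sequons (e.g. N-N-T-S) are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_n_glycosylation_sites(sequence: str) -> list:
    """Return 1-based positions of Asn residues in Asn-X-Ser/Thr sequons."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


sequence = "MNGSANPTNNTS"  # hypothetical sequence for illustration
print(find_n_glycosylation_sites(sequence))  # [2, 9, 10]
```

In the example, the asparagine at position 6 is skipped because it is followed by proline, while positions 9 and 10 both qualify because their sequons overlap.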

Other Modifications

In addition to post-translational modifications that alter side chains, the primary structure of proteins can undergo other alterations that change the linear sequence or connectivity of the polypeptide chain, such as proteolytic cleavage and enzymatic ligation. These processes are essential for protein maturation, activation, and functional diversification during or after translation.

Proteolytic cleavage is a key mechanism for reshaping primary structure by excising segments from the polypeptide chain, often mediated by specific endoproteases. It includes the removal of N-terminal signal peptides during co-translational translocation into the endoplasmic reticulum, where signal peptidases cleave the hydrophobic signal sequence to yield the mature protein, ensuring proper localization. In zymogen activation, inactive precursors are converted to active enzymes by limited proteolysis; for example, trypsinogen is cleaved by enterokinase at a specific Lys-Ile bond, which triggers a conformational change that completes the active site and enables catalysis. Similarly, in insulin maturation, proinsulin undergoes sequential cleavages by prohormone convertases (PC1/3 and PC2) and carboxypeptidase E in the Golgi and secretory granules, removing the connecting C-peptide to form mature insulin, which consists of A and B chains linked by disulfide bonds. These cleavages not only refine the primary sequence but also prevent premature activity and facilitate packaging.

Enzymatic ligation is the counterpart of cleavage: it joins polypeptide segments, changing chain connectivity while leaving the sequences of the joined segments intact. Intein-mediated protein splicing is a prominent example, in which inteins (self-splicing protein elements) catalyze their own excision from a precursor protein and simultaneously ligate the flanking extein sequences through a series of nucleophilic attacks, forming a native peptide bond. This process occurs in bacteria, archaea, and eukaryotes, and is harnessed in biotechnology for the purification and semisynthesis of proteins. Transpeptidation, another ligation mechanism, involves enzymes such as sortases or asparaginyl endopeptidases that catalyze the transfer of peptide segments, often through a thioester or other acyl-enzyme intermediate, to form new isopeptide or peptide bonds; for instance, archaeal connectase performs sequence-specific transpeptidation to join protein fragments efficiently. These ligations enable the assembly of multidomain proteins from separate modules, expanding functional diversity.

Relation to Higher Structures

Secondary Structure

The primary structure of a protein, consisting of its linear sequence of amino acids, fundamentally determines the local folding patterns that form secondary structures such as α-helices and β-sheets, through the inherent propensities of individual residues. Certain amino acids show strong preferences for specific secondary elements because of their side-chain properties and backbone flexibility. For instance, alanine has a high propensity for α-helices because its small methyl side chain minimizes steric hindrance and is compatible with backbone hydrogen bonding, while valine favors β-sheets because its branched aliphatic side chain promotes hydrophobic packing in extended conformations. In contrast, proline disrupts α-helices and β-sheets but has a strong propensity for β-turns, as its cyclic side chain restricts backbone rotation and introduces the kink needed to reverse chain direction. These propensities are derived statistically from analyses of known protein structures and reflect the physicochemical contributions of side chains to local stability.

Prediction of secondary structure from primary sequence relies on empirical parameters that quantify these preferences, with the Chou-Fasman method providing a foundational approach for assigning α-helices, β-sheets, and turns. In this method, each amino acid is assigned propensity values (P_α for helices, P_β for sheets, and P_t for turns) based on its observed frequency in secondary structures from crystallographic data. Regions where the average P_α exceeds 1.00 over a window of six residues are predicted to be helical, similar thresholds apply to β-sheets, and breakers such as proline terminate elements. The parameters, originally tabulated from 29 non-homologous proteins, enable a rule-based assignment that shows how runs of high-propensity residues nucleate and extend secondary elements, achieving accuracies of around 50-60% for broad classifications.

The formation of secondary structures involves cooperative transitions in which the primary sequence governs nucleation (the energetically unfavorable initiation of a short structural segment) and propagation (the favorable extension of that segment through sequential hydrogen-bond formation). In α-helix formation, nucleation requires overcoming an entropy penalty for aligning the first few residues and is often favored by helix-forming residues such as alanine or by suitable residues at N-terminal positions, while propagation depends on different interactions, such as backbone hydrogen bonding stabilized by residues like glutamate. This process is modeled by the Zimm-Bragg theory, which treats helix-coil transitions as a one-dimensional Ising-like system with a nucleation parameter σ (typically 10^-4 to 10^-2) and a propagation parameter s (greater than 1 for helix-stabilizing residues), illustrating how primary sequence motifs control the balance between coil and ordered states. A representative example is the coiled coil, where repeating heptad sequences, written (a-b-c-d-e-f-g)_n with hydrophobic residues at the a and d positions, promote the nucleation of individual α-helices and their association into a dimer, which is essential for assembly in structural proteins.
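The roles of σ and s can be made concrete with a small numerical version of the helix-coil model. The sketch below is a simplified homopolymer form of the Zimm-Bragg model under stated assumptions: every residue has the same s, a coil residue has weight 1, a helical residue following a helical residue has weight s, and a helical residue following a coil residue has weight σs. The function names are illustrative, and the fractional helicity is obtained by numerical differentiation of ln Z rather than from a closed-form expression.

```python
import math


def partition_function(n: int, s: float, sigma: float) -> float:
    """Sum of statistical weights over all helix/coil states of n residues."""
    # Running weights of chains whose last residue is coil or helix; the
    # chain is treated as if preceded by a coil residue.
    coil, helix = 1.0, 0.0
    for _ in range(n):
        coil, helix = coil + helix, coil * sigma * s + helix * s
    return coil + helix


def fractional_helicity(n: int, s: float, sigma: float, h: float = 1e-6) -> float:
    """Mean fraction of helical residues: (1/n) * d ln Z / d ln s."""
    upper = math.log(partition_function(n, s * (1 + h), sigma))
    lower = math.log(partition_function(n, s * (1 - h), sigma))
    return (upper - lower) / (n * (math.log(1 + h) - math.log(1 - h)))


for s in (0.8, 1.0, 1.2, 1.5):
    theta = fractional_helicity(n=50, s=s, sigma=1e-3)
    print(f"s = {s:.1f}: fractional helicity = {theta:.3f}")
```

With a small σ the transition from coil to helix becomes sharp as s passes through values near 1, which is the cooperativity the model is meant to capture. Setting σ = 1 removes the nucleation penalty, and each residue then behaves independently with helicity s / (1 + s).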

Tertiary and Quaternary Structures

The primary structure of a protein encodes the information necessary for its native conformation, as established by Anfinsen's thermodynamic hypothesis, which posits that the amino acid sequence determines the thermodynamically stable three-dimensional structure under physiological conditions. This hypothesis, derived from experiments on ribonuclease A refolding, implies that the native state represents the global free-energy minimum, which the protein reaches by descending a funnel-shaped energy landscape in which unfolded ensembles progressively lose conformational entropy while lowering their energy. In this funnel model, the primary sequence biases the energy landscape to favor productive folding pathways, avoiding kinetic traps that could lead to misfolding.

Tertiary structure arises from long-range interactions between residues that are distant in the primary sequence, primarily driven by the hydrophobic effect, in which non-polar side chains cluster to form a stabilizing core shielded from aqueous solvent. This burial of hydrophobic side chains, such as those of leucine and valine, contributes the dominant energetic force for folding, with the core providing mechanical rigidity. Covalent disulfide bonds between cysteine residues further stabilize the tertiary fold by linking spatially separated segments, as observed in the refolding of proteins like ribonuclease A, where these bonds lock the structure against unfolding. Hydrogen bonds, ionic interactions, and van der Waals contacts between polar and charged residues complement these, fine-tuning the overall architecture dictated by the sequence.

In quaternary structures, the primary sequences of individual subunits encode specific motifs that mediate inter-subunit interfaces, enabling assembly into functional complexes. For instance, in hemoglobin, the α and β subunit sequences feature complementary hydrophobic and electrostatic patches at their interfaces, such as the α1β2 contact involving residues like Tyr42 (α) and Asp99 (β), which facilitate tetramer formation and allostery. These sequence-determined interfaces bury significant surface area upon assembly, enhancing stability and enabling cooperative functions like oxygen binding at heme sites coordinated by invariant histidines in the sequences.

Advances in computational prediction have leveraged primary sequence to model tertiary and quaternary structures with unprecedented accuracy, exemplified by AlphaFold, which applies deep learning to multiple sequence alignments to infer 3D coordinates from evolutionary patterns. This AI approach, achieving near-atomic accuracy for many globular proteins, underscores how sequence encodes folding by capturing residue-residue distance distributions. However, predictions falter for intrinsically disordered proteins, whose sequences lack strong evolutionary constraints and fail to form stable cores, resulting in low-confidence models that reflect ensemble dynamics rather than unique folds.
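
As a rough illustration of how hydrophobic stretches can be read off a primary sequence, the sketch below computes a sliding-window average of Kyte-Doolittle hydropathy values. The Kyte-Doolittle scale, the window size of 9, the function name `hydropathy_profile`, and the toy sequence are all assumptions introduced here for illustration and do not come from the text above. A high score only hints at a segment that might be buried; it does not predict a fold.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
# (positive = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Mean hydropathy of every full window along a one-letter sequence.

    Returns an empty list if the sequence is shorter than the window.
    Raises ValueError for characters outside the 20 standard residues.
    """
    scores = []
    for residue in sequence.upper():
        if residue not in KYTE_DOOLITTLE:
            raise ValueError(f"unsupported residue: {residue!r}")
        scores.append(KYTE_DOOLITTLE[residue])
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical toy sequence: a hydrophobic stretch followed by a polar tail.
    toy = "MKTLLVAIVLGAAEDKRSESGNPQ"
    for start, value in enumerate(hydropathy_profile(toy), start=1):
        print(f"window starting at residue {start:2d}: {value:+.2f}")
```

Averaging over a window rather than scoring single residues smooths out isolated polar or non-polar positions, so that only sustained hydrophobic or hydrophilic segments stand out.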

Historical Development

Early Discoveries

The concept of proteins as complex organic substances began to take shape in the 19th century through chemical analyses that revealed their composition. In 1820, French chemist Henri Braconnot conducted early hydrolysis experiments on gelatin using sulfuric acid and heat, breaking it down into a sweet-tasting substance he called "sugar of gelatin" (later named glycine), one of the first amino acids to be identified, suggesting that proteins could be decomposed into simpler components. Building on this, Dutch chemist Gerardus Johannes Mulder analyzed various animal and plant materials in the 1830s, finding they shared a consistent elemental composition rich in nitrogen and carbon; upon Jöns Jacob Berzelius's suggestion, Mulder coined the term "protein" (from the Greek "proteios," meaning primary) in his 1838 publication to describe this ubiquitous substance, proposing it as a fundamental building block of life. Mulder further demonstrated through alkaline hydrolysis with sodium hydroxide that proteins from sources like egg white, blood, and gluten yielded similar nitrogenous products, hinting at a polymeric structure composed of amino acid-like units.

Advancing into the early 20th century, German chemist Emil Fischer provided critical evidence for the linkage between amino acids in proteins. In 1901, Fischer and Ernest Fourneau synthesized the first dipeptide, glycyl-glycine, which joins two glycine molecules, demonstrating that amino acids could be linked via an amide bond. Fischer extended this work, synthesizing polypeptides of up to 18 residues by 1907, and proposed the "peptide bond" (a specific carboxyl-amino linkage) as the repeating unit connecting amino acids in proteins, based on hydrolysis studies that released free amino acids from natural proteins matching his synthetic ones. These experiments established proteins as linear chains of amino acids, though the exact sequence remained unknown.

Fischer's collaborator, Emil Abderhalden, contributed to early efforts at characterizing sequences in the 1900s through detailed hydrolysis and enzymatic digestion studies. Joining Fischer's lab in 1902, Abderhalden analyzed protein hydrolysates using proteolytic enzymes to isolate and identify dipeptides and tripeptides, such as alanylglycine, confirming their presence in natural proteins and supporting the idea of specific arrangements rather than random aggregates. His work involved partial sequencing by stepwise degradation, revealing recurring motifs in fibroin and other proteins, though limited by the technology of the era.

By the 1930s, physical methods began to suggest ordered internal structures within proteins, complementing chemical insights. Austrian-born biophysicist Max Perutz, working at Cambridge's Cavendish Laboratory, obtained the first X-ray diffraction patterns of hemoglobin crystals in 1937, after crystallizing the protein from horse blood; these patterns indicated a highly ordered molecular arrangement, implying that the amino acid chain adopted a specific three-dimensional configuration essential for function. Perutz's initial photographs revealed fibrous patterns akin to those in oriented fibers, providing the first evidence that proteins like hemoglobin possessed regular, non-random primary sequences underlying their crystalline order.

Key Milestones

In 1951, Frederick Sanger and his collaborator Hans Tuppy published the first complete amino acid sequence of the phenylalanyl chain of bovine insulin, the first full sequence determined for a polypeptide chain. This breakthrough culminated in the full sequencing of insulin's two chains by 1955, demonstrating that proteins possess defined linear sequences of amino acids essential to their function. Sanger's meticulous use of partial hydrolysis, chromatography, and end-group analysis established the foundational principles for protein sequencing, earning him the Nobel Prize in Chemistry in 1958 for this pioneering work.

Edman degradation, developed by Swedish biochemist Pehr Edman, revolutionized protein sequencing by enabling the sequential removal and identification of N-terminal residues without disrupting the rest of the polypeptide chain. First described in 1949 and refined through the 1950s, this cyclization method using phenylisothiocyanate allowed for the analysis of up to 50-60 residues in peptides, far surpassing prior techniques limited to short fragments. Its automation in the 1960s and 1970s, particularly through instruments like the spinning-cup sequenator, facilitated the sequencing of longer proteins, accelerating protein research.

The advent of DNA sequencing in the 1970s bridged molecular genetics and protein biochemistry, allowing primary structures to be inferred from corresponding gene sequences via the genetic code. In 1977, Walter Gilbert and Allan Maxam introduced a chemical cleavage method that directly sequenced DNA by generating base-specific fragments, while Frederick Sanger independently developed chain-termination sequencing using dideoxynucleotides, which became the dominant technique for its efficiency and accuracy. These methods enabled the rapid determination of nucleotide orders in genes, thereby predicting amino acid sequences in encoded proteins and transforming the study of primary structures from labor-intensive direct protein analysis to genome-driven inference.

The completion of the Human Genome Project in 2003 provided a reference sequence for the human genome, encompassing approximately 20,000 protein-coding genes and enabling large-scale inference of protein sequences. This milestone allowed scientists to deduce the primary structures of virtually all human proteins from genomic data, bypassing traditional sequencing limitations and fostering comparative genomics to identify sequence variations across species. By integrating with post-genomic tools, it laid the groundwork for systematic annotation of protein sequences and their functional implications.

Advancements in computational prediction culminated with DeepMind's AlphaFold, which from 2018 onward dramatically enhanced the utility of primary sequences by accurately modeling three-dimensional structures. AlphaFold's debut at the CASP13 competition in 2018 showcased improved sequence-based folding predictions, but its 2020-2021 iteration (AlphaFold 2) achieved near-experimental accuracy for diverse proteins, as validated in CASP14. By 2022, the system had predicted structures for nearly all known protein sequences in public databases, linking primary structure directly to higher-order folds and accelerating drug discovery and evolutionary studies.
