Protein secondary structure
Protein secondary structure refers to the local conformation of the polypeptide backbone in a protein, characterized by repeating patterns such as alpha-helices and beta-sheets, which are stabilized by hydrogen bonds between the carbonyl oxygen and amide hydrogen atoms of the peptide backbone.[1] These structures represent the first level of folding beyond the primary amino acid sequence and are crucial for determining the overall three-dimensional architecture and function of proteins.[1] The concept of secondary structure was first proposed in 1951 by Linus Pauling, Robert Corey, and Herman Branson, who described the alpha-helix and beta-sheet as fundamental motifs based on stereochemical constraints and hydrogen bonding patterns.[2]

The most common secondary structural elements include the alpha-helix, a right-handed coiled structure with approximately 3.6 amino acid residues per turn and a pitch of 5.4 Å, where hydrogen bonds form between the carbonyl group of residue i and the amide nitrogen of residue i+4.[3] In this motif, the phi (Φ) and psi (Ψ) dihedral angles are typically around -57° and -47°, respectively, allowing side chains to project outward from the helix axis.[3] Alpha-helices are prevalent in both globular and fibrous proteins, such as keratin, and contribute to the stability of transmembrane segments.[1]

Beta-pleated sheets consist of two or more beta-strands (extended polypeptide segments) aligned in either parallel or antiparallel orientation and linked by hydrogen bonds between backbone atoms of adjacent strands.[1] In antiparallel beta-sheets, the strands run in opposite directions with dihedral angles of approximately Φ = -139° and Ψ = 135°, while parallel sheets have angles around Φ = -119° and Ψ = 113°; both configurations result in a pleated appearance with side chains alternating above and below the plane.[3] These sheets often form the core of globular proteins and can assemble into more complex forms like beta-barrels in membrane proteins, such as porins.[1]

In addition to helices and sheets, proteins feature irregular secondary elements like beta-turns and loops, which connect the regular motifs and allow the chain to reverse direction or adopt flexible conformations without extensive hydrogen bonding networks.[1] These regions, often involving specific amino acids like glycine or proline, facilitate the packing of secondary structures into higher-order tertiary folds.[1] Disruptions in secondary structure, such as excessive beta-sheet formation, are implicated in protein misfolding diseases including Alzheimer's and prion disorders.[1]

Definition and Fundamentals
Definition
Protein secondary structure refers to the local conformation of the polypeptide backbone in a protein, characterized by regular, repeating patterns stabilized primarily by hydrogen bonds between the carbonyl oxygen and amide hydrogen atoms within the backbone, independent of side-chain interactions.[1] This level of structure emerges from the primary amino acid sequence and serves as a fundamental building block for the protein's overall three-dimensional architecture.[4]

The concept of secondary structure was pioneered by Linus Pauling and Robert Corey, who in 1951 proposed the alpha helix as a coiled configuration where hydrogen bonds form between residues separated by three intervening amino acids along the chain, and the beta pleated sheet as a layered arrangement of extended strands linked laterally by interchain hydrogen bonds.[2][5] These structures, along with less regular elements such as turns and loops, allow the polypeptide to fold compactly while maintaining stability through non-covalent interactions.[6]

In the alpha helix, the backbone adopts a right-handed spiral with 3.6 residues per turn and a pitch of approximately 5.4 Å, enabling efficient packing in both soluble and membrane proteins.[1] Beta sheets, in contrast, feature extended chains in a zigzag pattern, either parallel (strands running in the same direction) or antiparallel (opposite directions), forming the core of many globular proteins like enzymes.[1] Turns and loops, often involving 3-5 residues, connect these motifs and frequently occur at the protein surface, facilitating flexibility and interactions with other molecules.[6]

Secondary structure elements are essential for protein function, as they dictate folding pathways, stability, and the positioning of functional groups, with disruptions leading to diseases such as amyloidosis.[4]

Historical Background
The early investigations into protein secondary structure relied heavily on X-ray diffraction studies of fibrous proteins. In the 1930s, William T. Astbury and his collaborators at the University of Leeds analyzed keratin fibers from hair and wool, identifying distinct diffraction patterns they termed the "alpha" and "beta" forms, corresponding to unstretched and stretched states, respectively. These observations suggested regular, repeating structural features in proteins but lacked atomic-level detail because of the limitations of the technology at the time.

A breakthrough came in 1951 when Linus Pauling, Robert B. Corey, and Herman R. Branson at the California Institute of Technology proposed specific atomic models for protein secondary structures using wire-and-cardboard model-building techniques informed by known covalent bond lengths, angles, and van der Waals radii. In April, they described the α-helix, a right-handed coil stabilized by intra-chain hydrogen bonds between the carbonyl oxygen of residue i and the amide hydrogen of residue i+4, with 3.7 residues per turn and a pitch of 5.4 Å.[2] A month later, in May, Pauling and Corey introduced the β-pleated sheet, a layered configuration where adjacent polypeptide chains form inter-chain hydrogen bonds, creating extended, pleated structures observed in the beta form of silk fibroin and keratin.[5] These models were derived without full X-ray crystallographic data for entire proteins, relying instead on stereochemical feasibility and consistency with Astbury's diffraction patterns.[7]

The concept of secondary structure as a distinct level of protein organization was formalized in 1952 by Danish biochemist Kaj Ulrik Linderstrøm-Lang during his lectures at Stanford University, where he distinguished it from primary (amino acid sequence) and tertiary (overall fold) structure; the quaternary level (multi-subunit assembly) was added to the hierarchy later.[8] This framework gained empirical validation in the late 1950s with the first X-ray crystal structures of proteins, such as myoglobin, solved by John C. Kendrew in 1958, which prominently featured α-helices as predicted by Pauling. These developments laid the foundation for understanding how local hydrogen-bonding patterns dictate protein folding and function.[7]

Types of Secondary Structures
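As orientation for the element types covered in this section, the canonical backbone dihedral angles quoted in this article can be used to label a residue by its nearest reference conformation. The sketch below is only illustrative: the function names and the 40° tolerance are assumptions made here, not part of any published assignment scheme, and real assignment programs rely on hydrogen-bond patterns across several residues rather than single-residue angles.

```python
import math

# Canonical (phi, psi) pairs in degrees, taken from the values quoted in this article.
CANONICAL_PHI_PSI = {
    "alpha-helix": (-57.0, -47.0),
    "antiparallel beta-strand": (-139.0, 135.0),
    "parallel beta-strand": (-119.0, 113.0),
}


def angular_difference(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_canonical(phi, psi, cutoff=40.0):
    """Label a residue by the closest canonical (phi, psi) pair.

    `cutoff` is a hypothetical tolerance in degrees; residues farther than
    this from every reference pair are labelled "other".
    """
    best_label, best_dist = "other", float("inf")
    for label, (phi0, psi0) in CANONICAL_PHI_PSI.items():
        dist = math.hypot(angular_difference(phi, phi0),
                          angular_difference(psi, psi0))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= cutoff else "other"


print(nearest_canonical(-60.0, -45.0))   # alpha-helix
print(nearest_canonical(-140.0, 130.0))  # antiparallel beta-strand
print(nearest_canonical(60.0, 45.0))     # other
```

Because the parallel and antiparallel reference points lie close together, this kind of lookup cannot reliably tell the two strand types apart; that distinction depends on the relative direction of the paired strands.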
Alpha Helix
The alpha helix is a prevalent motif in protein secondary structure, consisting of a right-handed helical coil formed by the polypeptide backbone, where each backbone amide group (N-H) forms a hydrogen bond with the carbonyl group (C=O) of the amino acid four residues earlier in the sequence. This intra-chain hydrogen bonding pattern stabilizes the structure, with bond lengths typically around 2.8–3.0 Å between the donor and acceptor atoms. The configuration ensures that all amino acid residues are stereochemically equivalent, with side chains projecting outward from the helix axis.[2]

Proposed in 1951 by Linus Pauling, Robert Corey, and Herman Branson through model-building informed by X-ray diffraction data from amino acids and simple peptides, the alpha helix was one of the first regular secondary structures predicted for proteins, predating experimental confirmation. Their work identified it as a 3.7-residue helix (later refined to 3.6), emphasizing its compatibility with the planar peptide bonds and van der Waals radii of atoms in the backbone. This prediction was validated shortly after by X-ray crystallography of proteins like myoglobin, where alpha helices comprise about 75% of the structure.[2][9]

Geometrically, the alpha helix features 3.6 amino acid residues per complete turn, a helical pitch (advance along the axis per turn) of 5.4 Å, and a translation of 1.5 Å per residue, resulting in a tightly coiled cylinder approximately 1–2 nm in diameter depending on side-chain bulk. The characteristic phi (φ) and psi (ψ) backbone dihedral angles are approximately -57° and -47°, respectively, placing the structure within the allowed region of the Ramachandran plot for non-glycine residues. These parameters arise from optimizing hydrogen bond geometry and minimizing steric clashes, with slight variations observed in real proteins due to side-chain interactions or environmental factors.[10][11]

In soluble proteins, alpha helices account for roughly 30% of all residues, serving as scaffolds for tertiary structure formation through packing against other helices or sheets, often mediated by hydrophobic interactions between nonpolar side chains. Certain amino acids, such as alanine, leucine, and methionine, have high helix-forming propensities due to their ability to stabilize the core via van der Waals contacts, while proline disrupts helices by introducing kinks owing to its cyclic side chain. Helices also contribute to functional roles, such as in DNA-binding proteins with helix-turn-helix motifs or in membrane proteins where amphipathic helices span lipid bilayers. For instance, in hemoglobin, alpha helices form the oxygen-binding pockets, enabling cooperative function.[12][11]

The stability of alpha helices is influenced by both local sequence and global context; isolated helices in short peptides are marginally stable in aqueous solution but are reinforced in proteins by capping interactions at the ends (e.g., asparagine or serine forming additional hydrogen bonds) and by electrostatic interactions such as salt bridges between charged side chains at positions i and i+4. Thermal denaturation studies show helix melting temperatures around 50–60°C for model peptides, underscoring the cooperative nature of unfolding. Variants like the pi-helix (4.4 residues per turn) are rarer, comprising less than 1% of helical structures, as they accommodate suboptimal hydrogen bonding geometries.[12]

Beta Sheet
The β-sheet, also known as the β-pleated sheet, is a prevalent form of regular secondary structure in proteins, composed of two or more β-strands (extended segments of polypeptide chain) aligned adjacently and stabilized by a network of hydrogen bonds between their backbone carbonyl oxygen and amide hydrogen atoms. This configuration allows for efficient packing of the polypeptide backbone, with the side chains projecting alternately above and below the plane of the sheet. The structure was originally proposed by Linus Pauling and Robert B. Corey in 1951 as a "pleated sheet" layer of polypeptide chains, where the pleats arise from the zigzag arrangement of the peptide planes, enabling optimal hydrogen bonding between adjacent chains in an extended conformation.[5]

β-Strands in a sheet typically span 5 to 10 amino acid residues, adopting a nearly fully extended backbone with characteristic Ramachandran dihedral angles of φ ≈ −140° and ψ ≈ +130°; these angles position the backbone for interstrand hydrogen bonding while minimizing steric clashes. The hydrogen bonds form a ladder-like pattern across strands, with typical N–H···O distances of about 2.9 Å and near-linear geometry (N–H···O angles close to 180°). In practice, β-sheets are rarely flat; they exhibit a right-handed twist of approximately 30° per residue due to favorable side-chain orientations and backbone rigidity, which enhances stability and allows the sheet to curve into motifs like β-barrels or β-propellers.[13]

β-Sheets occur in two primary topologies: antiparallel, where adjacent strands run in opposite directions (N-terminus to C-terminus), and parallel, where strands run in the same direction. In antiparallel sheets, hydrogen bonds alternate directly between paired carbonyl and amide groups, resulting in more uniform and stronger bonds compared to the slightly distorted, wider-spaced bonds in parallel sheets; this makes antiparallel arrangements generally more stable and common in isolated sheets. Parallel sheets, often embedded within larger mixed topologies, tend to require longer connecting loops and are frequently buried in protein cores to shield their less optimal bonding from solvent. Amino acid preferences differ markedly between the two: antiparallel strands favor hydrophobic residues like valine and isoleucine for tight packing, while parallel strands show a propensity for asparagine and aspartate, which can form additional side-chain hydrogen bonds to compensate for backbone irregularities.[14]

Distortions such as β-bulges (insertions of extra residues that disrupt the regular hydrogen-bonding pattern) commonly occur to accommodate sequence variations or functional needs, allowing sheets to bend or adjust without losing overall integrity. In fibrous proteins like silk fibroin from Bombyx mori, antiparallel β-sheets dominate, with stacked crystalline layers of Gly-Ala repeats providing exceptional tensile strength (up to 1 GPa) due to the dense hydrogen-bond network and intersheet van der Waals interactions.[13][15]

In globular proteins, β-sheets often form the core of structural domains, as seen in the immunoglobulin fold (PDB ID 1icf), where antiparallel sheets create a β-sandwich stabilized by a hydrophobic interface, or in triosephosphate isomerase (PDB ID 1tim), featuring a parallel β-sheet barrel surrounded by α-helices. These motifs underscore the β-sheet's role in mediating protein folding, stability, and interactions; aberrant β-sheet aggregation, as in amyloid fibrils, is implicated in diseases like Alzheimer's, where cross-β structures propagate via templated misfolding.[16]

Turns and Loops
Turns and loops represent irregular regions of protein secondary structure that lack the repetitive hydrogen bonding patterns of alpha helices and beta sheets, instead serving primarily to connect these regular elements and enable the overall three-dimensional folding of the polypeptide chain.[6] These motifs are essential for reversing the direction of the backbone, accommodating spatial constraints, and contributing to protein stability and function, often comprising hydrophilic residues exposed to solvent.[17] In typical globular proteins, turns and loops account for approximately 20-30% of residues, with their flexibility allowing dynamic conformational changes critical for enzymatic activity and molecular recognition.[18]

Beta-turns, the most common type of turn, involve four consecutive amino acid residues (i to i+3) where the chain direction reverses sharply, defined by a Cα(i) to Cα(i+3) distance of less than 7 Å and often stabilized by a hydrogen bond between the carbonyl oxygen of residue i and the amide hydrogen of residue i+3.[19] First proposed by Venkatachalam in 1968 through stereochemical modeling of peptide units, beta-turns were identified as a distinct secondary structure motif alongside helices and sheets, with initial classifications into types I, II, and III based on backbone dihedral angles (φ, ψ).[19] Subsequent refinements, beginning with Richardson in 1981, yielded the eight types in common use (I, I', II, II', VIa, VIb, IV, VIII), characterized by specific φ and ψ values at the two central residues; for example, type I features φ2 ≈ -60°, ψ2 ≈ -30°, φ3 ≈ -90°, ψ3 ≈ 0°, while type II turns favor glycine at the third position (i+2).[20] Type I and type IV are the most prevalent, occurring in about 38% and 32% of beta-turns, respectively, and they frequently feature asparagine, aspartic acid, or proline at key positions due to their ability to adopt strained conformations.[17]

Other turn types include gamma-turns, which span three residues with a Cα(i) to Cα(i+2) distance under 7 Å and a hydrogen bond from i to i+2, occurring as classic (φ ≈ 70°, ψ ≈ -60°) or inverse (φ ≈ -70°, ψ ≈ 60°) variants stabilized by residues like asparagine or serine.[21] Alpha-turns span five residues and are closed by a hydrogen bond between residues i and i+4, while the rarer pi-turns span six residues with an i to i+5 hydrogen bond; wider motifs such as beta-hairpin loops bridge adjacent strands in beta sheets.[21] These tight turns are crucial for compact folding, with statistical analyses showing they cluster at protein surfaces and interfaces, influencing stability through side-chain interactions.

Loops, in contrast, are longer irregular segments (typically 5-30 residues) that connect distant secondary structure elements without strict hydrogen bonding patterns, often adopting variable conformations classified as "coil" in secondary structure assignment schemes like DSSP.[22] A prominent subclass, omega (ω) loops, consists of 6 or more residues forming a rigid, loop-shaped structure whose ends lie within 5-10 Å of each other in space even though they may be up to 18 residues apart in sequence, as defined by Leszczynski and Rose in 1986 through analysis of 67 high-resolution protein structures revealing 270 such motifs.[22] These loops, frequently surface-exposed and hydrophilic, function as independent folding units and are enriched in functional sites, such as active centers in enzymes like subtilisin where they contribute to substrate binding.[22] Overall, turns and loops enhance protein versatility, with mutations in these regions often linked to diseases like cystic fibrosis due to disrupted folding or ligand interactions.[23]

Classification Systems
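The distance criterion for beta-turns given in the preceding section, a Cα(i) to Cα(i+3) separation below 7 Å, is a small example of assigning structure directly from coordinates, which is what the systems in this section do at scale. The sketch below applies only that single test. The function name and the toy hairpin coordinates are hypothetical, and a real turn definition would also exclude helical residues, since consecutive helix residues satisfy the same distance test.

```python
import math


def find_turn_candidates(ca_coords, max_distance=7.0):
    """Return start indices i where Cα(i) and Cα(i+3) are closer than max_distance (Å).

    ca_coords is a list of (x, y, z) tuples, one per residue, in chain order.
    """
    hits = []
    for i in range(len(ca_coords) - 3):
        if math.dist(ca_coords[i], ca_coords[i + 3]) < max_distance:
            hits.append(i)
    return hits


# Hypothetical planar hairpin: one strand along +x, a two-residue reversal,
# then a return strand 5.4 Å away. Consecutive Cα atoms are about 3.8 Å apart.
toy_hairpin = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0),
    (14.1, 2.7, 0.0),
    (11.4, 5.4, 0.0), (7.6, 5.4, 0.0), (3.8, 5.4, 0.0), (0.0, 5.4, 0.0),
]

print(find_turn_candidates(toy_hairpin))  # [2, 3]
```

Only the two four-residue windows that straddle the chain reversal pass the test; windows lying entirely within either extended strand span more than 10 Å.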
DSSP Classification
The Dictionary of Secondary Structure of Proteins (DSSP) is an algorithm for assigning secondary structure elements to the amino acid residues in a protein based on its three-dimensional atomic coordinates, rather than predicting structure from sequence. Developed by Wolfgang Kabsch and Chris Sander in 1983, DSSP analyzes the pattern of hydrogen bonds within the protein backbone to identify structural motifs, approximating intuitive notions of secondary structure through objective criteria. It processes PDB files or equivalent coordinate data to classify each residue into one of eight states (nine since DSSP 4), focusing on hydrogen-bonded and geometrical features such as backbone dihedral angles and bond distances. This method has become the de facto standard for secondary structure annotation in structural biology, with over 400 citations annually and integration into major databases like the Protein Data Bank (PDB).[24][25]

At its core, the DSSP algorithm detects hydrogen bonds using an electrostatic model: it computes the interaction energy between the partial charges of a backbone C=O group and a backbone N–H group and assigns a hydrogen bond when that energy falls below −0.5 kcal/mol. The cutoff is deliberately generous, so it tolerates moderate deviations from ideal donor-acceptor distance and linearity. These bonds define primary elements like alpha-helices (sequential i to i+4 bonds) and beta-sheets (antiparallel or parallel bridges between strands). Residues not involved in such bonds are classified based on local geometry, for example as bends. The assignment is residue-specific, allowing for nuanced descriptions beyond binary helix/sheet categories, and it handles irregularities like distorted helices or isolated bridges. This hydrogen-bond-centric approach ensures consistency across protein structures, though it can be sensitive to coordinate resolution and refinement quality.[24][26]

DSSP classifies residues into the following nine secondary structure types (eight original plus one new in DSSP 4), each denoted by a single-letter code:

| Code | Structure Type | Description |
|---|---|---|
| H | α-helix | Right-handed coil with 3.6 residues per turn, stabilized by i to i+4 hydrogen bonds. |
| G | 3₁₀-helix | Tighter helix with 3.0 residues per turn, i to i+3 bonds, often at helix ends. |
| I | π-helix | Wider helix with 4.4 residues per turn, i to i+5 bonds, less common. |
| E | Extended strand | Part of a β-sheet, involved in extended hydrogen-bonded ladders. |
| B | β-bridge | Isolated β-pair without full sheet formation. |
| T | Hydrogen-bonded turn | Short motif (e.g., type I or II) with non-helical hydrogen bonds. |
| S | Bend | Curved backbone without hydrogen bonds, based on dihedral angles. |
| - | Loop | Irregular coil with no defined hydrogen bonds or geometry. |
| P | κ-helix (poly-proline II) | Left-handed extended helix with approximately 3 residues per turn, common in unstructured regions and transmembrane proteins. |
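
The electrostatic hydrogen-bond test that underlies these assignments can be sketched in a few lines. The constants below (partial charges of 0.42e and 0.20e, the factor 332, and the −0.5 kcal/mol cutoff) are the ones given by Kabsch and Sander (1983); the function names and the example coordinates are illustrative assumptions, and this is not the DSSP program's actual code.

```python
import math

Q1 = 0.42            # partial charge on the C=O group, in units of e
Q2 = 0.20            # partial charge on the N-H group, in units of e
F = 332.0            # dimensional factor giving kcal/mol for distances in Å
HBOND_CUTOFF = -0.5  # kcal/mol; lower energies count as hydrogen bonds


def hbond_energy(c, o, n, h):
    """Electrostatic energy (kcal/mol) between a C=O group and an N-H group.

    Each argument is an (x, y, z) coordinate in Å.
    """
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1 * Q2 * F * (1.0 / r_on + 1.0 / r_ch - 1.0 / r_oh - 1.0 / r_cn)


def is_hbond(c, o, n, h):
    """True if the pair passes the energy cutoff."""
    return hbond_energy(c, o, n, h) < HBOND_CUTOFF


# Hypothetical collinear C=O···H-N arrangement with an N···O distance of 2.9 Å.
c, o = (0.0, 0.0, 0.0), (1.23, 0.0, 0.0)
h, n = (3.13, 0.0, 0.0), (4.13, 0.0, 0.0)
print(round(hbond_energy(c, o, n, h), 2), is_hbond(c, o, n, h))
```

For this idealized geometry the energy comes out near −2.9 kcal/mol, well below the cutoff, which illustrates why the −0.5 kcal/mol threshold also admits stretched or bent hydrogen bonds.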
Other Assignment Methods
In addition to the DSSP algorithm, several other computational methods have been developed to assign protein secondary structures from atomic coordinates, employing diverse criteria such as dihedral angles, backbone geometry, and knowledge-based potentials to address limitations in hydrogen-bond-centric approaches.[26] These alternatives often aim to improve consistency in identifying irregular or edge elements like helix caps and strand distortions, though they can yield varying assignments for the same structure due to differing definitions.[29]

The STRIDE algorithm uses a combination of hydrogen bond patterns and empirical phi/psi dihedral angle propensities derived from known protein structures to classify residues into alpha-helices, beta-strands, or coils, providing smoother transitions at secondary structure boundaries compared to DSSP.[29] It incorporates a spline-fitting procedure for beta-strands to better capture twisted conformations, achieving higher agreement with manual assignments in benchmark tests on globular proteins.[30]

DEFINE relies solely on Cα atom coordinates, comparing inter-residue distances and angles to idealized geometries of helices and sheets without considering hydrogen bonds, which makes it computationally efficient for low-resolution models.[31] This method identifies structural motifs by matching expected distance patterns, such as the roughly 5.4 Å separation between Cα(i) and Cα(i+2) in alpha-helices, and has been influential in early automated analyses of supersecondary structures.[32]

P-SEA assigns secondary structures using only the Cα trace, applying pattern recognition on local curvature and torsion angles to delineate helices and strands, often resulting in fewer assigned helices but more extended strands than hydrogen-bond-based methods.[33] It excels in handling distorted elements by prioritizing backbone linearity, with applications in fold recognition where precise atomic data is unavailable.[26]

More recent tools like KAKSI focus on phi/psi dihedral angles and Cα distances to emphasize linear helices while minimizing assignments to curved or kinked variants, reducing the over-assignment of irregular structures observed in older methods.[26] Similarly, SEGNO employs geometric criteria including residue distances, bond angles, and virtual torsion angles to classify elements, incorporating evolutionary conservation signals for enhanced accuracy in divergent protein families.[34]

| Method | Primary Criteria | Key Advantages | Original Reference |
|---|---|---|---|
| STRIDE | Hydrogen bonds + dihedral angles | Better edge detection | Frishman & Argos (1995)[29] |
| DEFINE | Cα distances and angles | Efficiency for coarse models | Richards & Kundrot (1988)[31] |
| P-SEA | Cα trace curvature/torsion | Handles distortions | Labesse et al. (1997)[33] |
| KAKSI | Dihedrals + Cα distances | Linear element focus | Martin et al. (2005)[26] |
| SEGNO | Geometry + evolutionary signals | Reflects physical properties | Sonego et al. (2005)[34] |
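
The idealized Cα distance patterns that Cα-only methods compare against can be derived from the helix parameters quoted earlier in this article (3.6 residues per turn and a rise of 1.5 Å per residue). The following is a minimal sketch under one extra assumption, a Cα radius of 2.3 Å about the helix axis, which is a commonly used approximate value and is not taken from this article; the function names are hypothetical and do not correspond to any of the programs in the table.

```python
import math

RESIDUES_PER_TURN = 3.6
RISE_PER_RESIDUE = 1.5  # Å along the helix axis
CA_RADIUS = 2.3         # Å from the axis to each Cα (assumed value)


def ideal_helix_ca(n_residues):
    """Cα coordinates of an idealized right-handed alpha helix along the z axis."""
    step = 2.0 * math.pi / RESIDUES_PER_TURN  # 100 degrees per residue
    return [
        (CA_RADIUS * math.cos(i * step),
         CA_RADIUS * math.sin(i * step),
         i * RISE_PER_RESIDUE)
        for i in range(n_residues)
    ]


def ca_distance(coords, i, k):
    """Distance in Å between Cα(i) and Cα(i+k)."""
    return math.dist(coords[i], coords[i + k])


coords = ideal_helix_ca(6)
pitch = RESIDUES_PER_TURN * RISE_PER_RESIDUE  # 3.6 x 1.5 = 5.4 Å per turn
print(f"pitch = {pitch:.1f} Å")
for k in range(1, 5):
    # Approximately 3.8, 5.4, 5.0 and 6.2 Å for k = 1, 2, 3, 4
    print(f"Cα(i) to Cα(i+{k}): {ca_distance(coords, 0, k):.1f} Å")
```

The non-monotonic pattern, in which Cα(i+3) is closer than Cα(i+2), is what distinguishes a helix from an extended strand in a Cα-only trace, where the separation grows steadily with sequence distance.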