Protein complex

A protein complex is a group of two or more polypeptide chains that associate non-covalently, either stably or transiently, to form a functional multimolecular assembly capable of executing coordinated biological tasks. These assemblies range in size from small oligomers of two to four subunits to large structures with over 30 subunits, and they are distinguished from multidomain proteins by their composition of separate, interacting polypeptide chains rather than a single chain with multiple domains. Protein complexes are fundamental to cellular organization and function, serving as the primary units for processes such as enzymatic catalysis and structural maintenance within the cell. For instance, the anaphase-promoting complex (APC/C), comprising 11 core proteins, regulates cell cycle progression by targeting specific proteins for degradation, while the SAGA/TFIID complex, with 14 subunits, facilitates transcription by modifying chromatin structure. Disruptions in protein complex formation or stability are implicated in numerous diseases, such as cancer and neurodegenerative disorders, where misassembly leads to loss of function or aberrant signaling.

The study of protein complexes has advanced through techniques like affinity purification-mass spectrometry (AP-MS), which identifies interacting partners, and proximity labeling methods such as BioID, which enable the mapping of dynamic interactomes in living cells. Recent large-scale efforts, including the BioPlex network documenting nearly 120,000 human protein interactions, have revealed that most proteins participate in complexes, underscoring their prevalence and evolutionary conservation across species. These insights highlight protein complexes as co-evolved units essential for integrating cellular responses and maintaining homeostasis.

Overview

Definition

A protein complex is an assembly of two or more polypeptide chains that interact through non-covalent forces to form a functional unit. These interactions include hydrogen bonds, ionic bonds (electrostatic attractions), van der Waals forces, and hydrophobic effects, which collectively provide the specificity and stability required for the complex's biological role without forming covalent linkages. Unlike individual proteins, which are single polypeptide chains that fold into functional structures, protein complexes involve multiple chains cooperating to achieve functions that a solitary protein cannot perform alone. Protein complexes are also distinct from protein aggregates, which are typically non-specific, disordered accumulations of misfolded proteins. Aggregates are often associated with pathological conditions such as neurodegenerative diseases and lack the organized, functional architecture of true complexes.

Classic examples illustrate these principles. Hemoglobin, a heterotetrameric complex in vertebrates, consists of two α subunits and two β subunits in a roughly tetrahedral arrangement, enabling cooperative oxygen binding and transport. In contrast, the proteasome is a large multi-subunit complex: the 26S form comprises a cylindrical 20S core particle (formed by 28 subunits in four stacked rings) capped by 19S regulatory particles, and it carries out targeted protein degradation in eukaryotic cells. The stoichiometry and spatial arrangement of subunits in such complexes are critical for their stability and efficiency, and are often determined by the complementary surfaces of the interacting polypeptides.

Biological Importance

Protein complexes are fundamental to the execution of most cellular functions, with estimates indicating that the majority (nearly 70%) of human proteins operate as components of such assemblies rather than as isolated monomers. Proteomic analyses, including those compiled in the CORUM database, underscore this prevalence by cataloging over 7,000 experimentally verified complexes across mammalian systems, highlighting their role in coordinating diverse biochemical activities. By assembling multiple protein subunits, these complexes enable multifunctionality, allowing a single entity to integrate catalytic, regulatory, and structural activities with enhanced efficiency and specificity. This modularity reduces cellular resource demands while minimizing off-target effects, as seen in core processes where precise spatiotemporal control is critical.

Comprehensive mapping efforts, such as hu.MAP 3.0, derived from over 25,000 mass spectrometry experiments (as of 2024), have identified thousands of distinct complexes, illustrating the scale of this organizational principle in human cells. Protein complexes are indispensable for key cellular processes, and their disruption compromises organismal fitness. Aberrations in complex assembly or stability, often arising from genetic or environmental stressors, are implicated in major diseases such as cancer and neurodegeneration, leading to dysregulated signaling, metabolic imbalances, and genomic instability.

Functions

Structural Roles

Protein complexes play crucial roles in forming scaffolds that provide structural integrity and selective barriers within the cell. The nuclear pore complex (NPC), a massive protein assembly composed of approximately 30 different nucleoporins forming an octagonal scaffold, perforates the nuclear envelope to facilitate controlled nucleocytoplasmic transport while acting as a selective permeability barrier. This scaffold, with its central channel lined by phenylalanine-glycine (FG) repeat nucleoporins, allows passive diffusion of small molecules but restricts larger cargoes unless they are bound to transport receptors, thereby maintaining nuclear compartmentalization. Similarly, in the cytoskeleton, filament-motor protein complexes serve as dynamic scaffolds for cellular motility and shape maintenance; the filaments provide tracks for motor proteins, enabling force generation and structural support.

Protein complexes also contribute to compartmentalization by creating specialized microenvironments that organize biochemical reactions. In mitochondria, the respiratory chain complexes I–IV, embedded in the inner mitochondrial membrane, form higher-order assemblies known as supercomplexes or respirasomes, which spatially segregate pathways to enhance efficiency and limit reactive oxygen species leakage. These complexes channel electrons from NADH and FADH₂ to oxygen, coupling oxidation to proton translocation across the membrane and thus establishing the proton gradients essential for ATP synthesis within the confined cristae architecture.

Beyond intracellular roles, protein complexes ensure mechanical stability in extracellular matrices. Collagen fibrils, supramolecular assemblies of triple-helical collagen molecules cross-linked into staggered arrays, provide tensile strength and resilience to tissues such as tendons. The fibrils, with diameters of 50–200 nm, withstand high physiological stresses while resisting enzymatic degradation under load. This structural reinforcement maintains tissue integrity against mechanical forces.

Allosteric regulation in protein complexes often arises from ligand-induced structural changes that propagate through the assembly, modulating function at sites the ligand does not contact directly. In multi-subunit complexes, binding at distal sites can induce conformational shifts, such as rigid-body rotations or loop rearrangements, that alter inter-subunit interfaces and thereby influence binding affinities or catalytic efficiencies across the complex. For instance, in hemoglobin, a heterotetrameric complex, oxygen binding to one subunit triggers structural rearrangements that enhance oxygen uptake by the remaining subunits, exemplifying how conformational changes underpin regulatory control.
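
The cooperative oxygen binding of hemoglobin mentioned above is commonly summarized with the Hill equation. The following is a minimal sketch and not part of the source material: the function name `hill_saturation` is hypothetical, and the default half-saturation pressure (about 26 mmHg) and Hill coefficient (about 2.8) are textbook approximations assumed here purely for illustration.

```python
def hill_saturation(p_o2, p50=26.0, n=2.8):
    """Fractional saturation Y = pO2^n / (P50^n + pO2^n).

    p_o2 : oxygen partial pressure in mmHg
    p50  : pressure at half saturation (assumed textbook value)
    n    : Hill coefficient; n = 1 means no cooperativity (assumed value)
    """
    if p_o2 < 0:
        raise ValueError("partial pressure must be non-negative")
    return p_o2 ** n / (p50 ** n + p_o2 ** n)


if __name__ == "__main__":
    # Compare a cooperative tetramer with a hypothetical non-cooperative binder.
    for p in (20.0, 40.0, 100.0):
        cooperative = hill_saturation(p)
        independent = hill_saturation(p, n=1.0)
        print(f"pO2={p:6.1f} mmHg  cooperative={cooperative:.2f}  "
              f"non-cooperative={independent:.2f}")
```

With n = 1 the curve is hyperbolic; a larger Hill coefficient steepens it around the half-saturation pressure, which is one simple way to express how binding at one subunit raises the affinity of the others.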

Regulatory and Catalytic Roles

Protein complexes play crucial roles in catalysis by assembling multiple subunits that specialize in distinct aspects of enzymatic activity, enabling efficient and regulated biochemical reactions. In multi-subunit enzymes like RNA polymerase II (Pol II), core subunits such as Rpb1 and Rpb2 form the catalytic center responsible for nucleotide addition and DNA binding, while accessory factors like the Mediator complex, comprising around 20 proteins, facilitate promoter recognition and integrate signals from transcription factors. This specialization allows Pol II to synthesize messenger RNA with high fidelity during eukaryotic transcription initiation and elongation. The division between core and accessory components ensures that the enzyme responds dynamically to cellular cues, such as activators that bridge enhancers to the basal machinery, thereby modulating catalytic output.

In signal transduction, protein complexes involving G-protein-coupled receptors (GPCRs) amplify extracellular signals through coordinated subunit interactions. GPCRs form assemblies with heterotrimeric G-proteins, where ligand binding induces conformational changes that promote GDP/GTP exchange on the Gα subunit, leading to dissociation of the G-protein subunits and activation of downstream effectors that produce second messengers. These complexes enhance signal amplification by allowing one activated receptor to engage multiple G-proteins, generating diverse signaling profiles via promiscuous coupling to the G_s, G_i/o, or G_q/11 families. Accessory proteins like arrestins further diversify outputs by scaffolding kinases for pathways independent of G-proteins, such as MAPK activation.

Regulatory checkpoints in the cell cycle rely on protein complexes like the anaphase-promoting complex/cyclosome (APC/C), a multi-subunit E3 ubiquitin ligase that orchestrates progression through mitosis and G1. APC/C targets substrates such as securin and cyclin B for proteasomal degradation, which activates separase for sister chromatid separation and inactivates Cdk1 for mitotic exit, respectively. This activity is tightly controlled by the co-activators Cdc20 (in metaphase) and Cdh1 (in G1), which confer substrate specificity via motifs like the D-box. The ubiquitin-proteasome system (UPS) itself depends on a large complex: the 26S proteasome, comprising a 20S catalytic core and a 19S regulatory particle, unfolds and degrades polyubiquitinated proteins in an ATP-dependent manner to maintain protein homeostasis and regulate the turnover of short-lived regulators.

The proximity of subunits within these complexes enhances specificity by localizing reactive intermediates and reducing off-target interactions. In multi-enzyme assemblies, such as those mimicking natural cascades, spatial proximity channels substrates between active sites, minimizing diffusion losses and increasing local concentrations to boost reaction rates up to fivefold while limiting side products. This principle underlies the functional precision of complexes like Pol II and APC/C, where subunit interfaces ensure selective catalysis and regulation without extraneous reactivity.

Classification

Obligate vs Non-Obligate Complexes

Protein complexes are classified as obligate or non-obligate based on the structural and functional independence of their constituent subunits. In obligate complexes, the individual protomers (subunits) cannot fold into stable, functional structures independently; their stability and activity depend on assembly into the complex, and dissociation typically leads to unfolding or aggregation. For instance, the hemoglobin tetramer exemplifies an obligate complex: free α- and β-globin subunits are unstable and prone to aggregation without their partners, necessitating rapid assembly for oxygen transport function. This dependency arises from extensive intersubunit interfaces that contribute to overall folding, often involving hydrophobic cores buried upon association.

In contrast, non-obligate complexes form between protomers that are stable and capable of independent function, associating reversibly to enhance or modulate activity without requiring the interaction for basic structural integrity. A classic example is the enzyme-substrate complex, such as in kinase-substrate interactions, where the enzyme and substrate maintain their folded states alone but bind to facilitate catalysis, allowing for dynamic regulation. These associations enable modularity in cellular processes, permitting subunits to participate in multiple partnerships as needed.

Biophysically, the distinction is often quantified by the equilibrium dissociation constant (K_d), which measures binding affinity. Obligate complexes exhibit very low K_d values, typically below 10⁻⁹ M (1 nM), reflecting their high stability and the rarity of dissociation under physiological conditions. Non-obligate complexes display higher K_d values, often in the micromolar range, supporting reversible interactions. Functionally, obligate complexes suit permanent roles in core cellular machinery, such as multisubunit enzymes, while non-obligate ones promote adaptability in signaling and regulatory pathways. This classification underscores how subunit autonomy influences complex design and evolutionary conservation.
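
To make the K_d comparison above concrete, the sketch below computes the fraction of a protein that is bound at equilibrium for a simple 1:1 interaction. It is an illustration under stated assumptions, not a method from the source: the function name `fraction_bound` is hypothetical, the binding partner is assumed to be in excess so that its free concentration is effectively constant, and the 100 nM partner concentration is an arbitrary example value.

```python
def fraction_bound(free_partner_molar, kd_molar):
    """Equilibrium fraction bound for 1:1 binding: [L] / ([L] + K_d).

    Assumes the partner is in excess, so its free concentration is
    approximately equal to its total concentration.
    """
    if free_partner_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be >= 0 and K_d must be > 0")
    return free_partner_molar / (free_partner_molar + kd_molar)


if __name__ == "__main__":
    partner = 100e-9  # 100 nM partner concentration (arbitrary example)
    for label, kd in (("K_d = 1 nM", 1e-9), ("K_d = 1 uM", 1e-6)):
        print(f"{label}: fraction bound = {fraction_bound(partner, kd):.3f}")
```

Under these assumptions, a nanomolar K_d leaves almost all of the protein in the complex at 100 nM partner, whereas a micromolar K_d leaves most of it free, which is consistent with the reversible character of non-obligate interactions.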

Transient vs Stable Complexes

Protein complexes are classified into transient and stable categories based on the duration and reversibility of their subunit associations, which influence their roles in cellular processes. Transient complexes form and dissociate rapidly, often in response to specific signals, with lifetimes typically ranging from seconds to minutes. In contrast, stable complexes persist for extended periods, sometimes throughout the cell cycle or longer, maintaining structural integrity for essential functions. This distinction is determined primarily by the kinetics of association and dissociation, where transient interactions exhibit high off-rates that allow quick disassembly.

Transient complexes are crucial for dynamic cellular responses, such as signal transduction pathways. For instance, in the mitogen-activated protein kinase (MAPK) cascade, kinases transiently associate with substrates and scaffolds to propagate signals, enabling rapid adaptation to environmental cues like growth factors. These interactions are often regulated by post-translational modifications, such as phosphorylation, which modulate binding affinities and facilitate timely dissociation. The short-lived nature of these complexes ensures specificity and prevents prolonged signaling that could lead to cellular dysfunction.

Stable complexes, on the other hand, support housekeeping functions that require consistent activity, such as protein synthesis. The ribosome exemplifies a stable complex, in which ribosomal subunits assemble stoichiometrically and remain associated for the duration of translation events, ensuring efficient polypeptide synthesis. These complexes typically feature low dissociation rates, often reinforced by multiple interaction interfaces that confer thermodynamic stability. In some cases, stable complexes are also obligate, with subunits that are interdependent for folding, though the classification here focuses on temporal persistence.

The biological context underscores the functional divergence: transient complexes enable adaptability in processes like immune responses, while stable ones provide reliability for core metabolic pathways. Experimental quantification of these kinetics relies on biophysical binding assays, which reveal dissociation constants (K_d) in the micromolar range for transient interactions versus the nanomolar range for stable ones. This temporal classification highlights how protein complexes balance flexibility and permanence to sustain cellular function.
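
The link between off-rates, lifetimes, and K_d described above can be sketched numerically. The example below is illustrative only: the helper names are hypothetical, it assumes simple two-state 1:1 binding (so K_d = k_off / k_on and the half-life of the complex is ln 2 / k_off), and the rate constants are assumed example values rather than measurements from the source.

```python
import math


def kd_from_rates(k_on, k_off):
    """K_d (M) from k_on (M^-1 s^-1) and k_off (s^-1) for 1:1 binding."""
    if k_on <= 0 or k_off <= 0:
        raise ValueError("rate constants must be positive")
    return k_off / k_on


def complex_half_life(k_off):
    """Half-life (s) of the complex under first-order dissociation."""
    if k_off <= 0:
        raise ValueError("k_off must be positive")
    return math.log(2) / k_off


if __name__ == "__main__":
    k_on = 1e6  # assumed association rate constant, M^-1 s^-1
    for label, k_off in (("transient-like", 1.0), ("stable-like", 1e-3)):
        kd = kd_from_rates(k_on, k_off)
        t_half = complex_half_life(k_off)
        print(f"{label}: K_d = {kd:.1e} M, half-life = {t_half:.1f} s")
```

With the same assumed on-rate, a thousandfold lower off-rate lowers K_d from the micromolar to the nanomolar range and lengthens the half-life of the complex by the same factor, which is the kinetic basis of the transient versus stable distinction.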

Fuzzy Complexes

Fuzzy protein complexes represent a class of biomolecular assemblies in which intrinsically disordered regions (IDRs) of proteins maintain conformational heterogeneity and dynamic disorder even upon binding to their partners, enabling specific interactions without the need for complete structural folding. This contrasts with traditional rigid complexes, as the bound state features an ensemble of interconverting conformations rather than a single discrete structure. A classic example is the p53-MDM2 complex, where the intrinsically disordered transactivation domain of p53 binds to the MDM2 protein while retaining partial disorder, which enhances binding affinity through multivalent interactions.

Key characteristics of fuzzy complexes include their topological diversity, such as polymorphic binding, in which multiple binding modes coexist, or clamp-like structures with flanking disordered segments that stabilize the interaction. These features facilitate allostery by allowing structural changes to propagate across the complex, and they allow adaptation to environmental cues, such as post-translational modifications that tune the conformational landscape. The dynamic nature arises from sequence-encoded propensities: the distribution of interaction hotspots and folding energies dictates the degree of fuzziness, which is often modeled computationally to predict binding behavior.

The functional advantages of fuzzy interactions lie in their ability to enable rapid responses to cellular signals through transient, low-affinity contacts that can quickly form and dissociate, in contrast with slower rigid binding. Additionally, they can achieve high specificity via cumulative weak interactions across disordered regions, reducing the entropic penalty of binding and allowing fine-tuned regulation, as seen in signaling pathways where fuzziness promotes versatility without sacrificing precision.

Representative examples include transcription factor complexes, such as the c-Myc-Max-DNA assembly, where disordered regions in c-Myc enable dynamic DNA recognition and cooperative binding to regulate gene expression. In viral contexts, the nucleoprotein-phosphoprotein complex of the measles virus exemplifies fuzziness, with disordered tails facilitating viral genome packaging and potentially aiding immune evasion through adaptable interfaces. Recent advances in characterizing fuzzy complexes have used nuclear magnetic resonance (NMR) spectroscopy to map conformational heterogeneity and single-molecule techniques, such as Förster resonance energy transfer (FRET), to visualize conformational fluctuations in real time. Database resources like FuzDB now catalog these interactions.

Composition

Homomultimeric vs Heteromultimeric Complexes

Protein complexes are classified based on the identity of their subunits into homomultimeric and heteromultimeric types. Homomultimeric complexes, also known as homooligomers, consist of multiple identical polypeptide chains derived from the same gene, and assemble through symmetric interactions. In contrast, heteromultimeric complexes, or heterooligomers, are composed of two or more distinct subunit types encoded by different genes, allowing for specialized functional roles within the assembly. This distinction influences the complexity of assembly and the functional versatility of the complex.

A classic example of a homomultimeric complex is the Arc repressor, a homodimeric protein from bacteriophage P22 in which two identical subunits bind DNA to regulate gene expression, relying on symmetric interfaces for stability. For heteromultimeric complexes, DNA polymerase III exemplifies the architecture, featuring distinct catalytic (alpha subunit), proofreading (epsilon subunit), and processivity (beta clamp) components that coordinate replication fidelity and efficiency. These examples highlight how subunit homogeneity in homomultimers simplifies interactions, while heterogeneity in heteromultimers enables division of labor.

Homomultimeric complexes frequently display high degrees of symmetry in their quaternary structure, such as cyclic (C_n), dihedral (D_n), or icosahedral arrangements, which minimize energetic costs and maximize interface complementarity during assembly. For instance, many viral capsids adopt icosahedral symmetry with identical coat protein subunits to enclose the genome efficiently. Heteromultimeric complexes, however, often exhibit lower symmetry or pseudo-symmetry due to the diverse shapes and functions of their subunits, leading to more asymmetric overall architectures that accommodate specific binding sites. The symmetry bias in homomultimers arises from the identical nature of the subunits, which facilitates rapid and stable oligomerization.

Evolutionarily, homomultimeric complexes typically emerge from gene duplication events, where a monomeric protein duplicates and the paralogs retain self-interaction capabilities, often resulting in symmetric oligomers as a simpler path to functional enhancement. In contrast, heteromultimeric complexes evolve through mechanisms like gene fusion, which links distinct domains into multi-subunit assemblies, or through divergence of duplicated subunits that lose self-interaction while gaining specificity for each other, promoting functional specialization. These patterns underscore how duplication drives homomultimer simplicity, while fusion and co-evolution enable the complexity of heteromultimers.

Essential Proteins

Essential proteins, also described as indispensable subunits, are those polypeptides within a protein complex whose absence disrupts the overall assembly, stability, or functional activity of the complex. These subunits are critical for maintaining the structural integrity and operational efficacy of the multiprotein assembly, often serving as the foundational elements around which other components organize. For instance, in the ribosome, proteins of the large subunit such as uL2 and uL3 are vital for stabilizing the rRNA-based peptidyl transferase center (PTC), the catalytic site responsible for peptide bond formation during protein synthesis; their depletion leads to impaired translation.

Identification of essential proteins typically involves genetic perturbation techniques like gene knockout or knockdown combined with proteomic analysis to assess complex integrity. Genome-scale knockout libraries have revealed that approximately 45% of proteins participating in complexes are essential for cell viability, far exceeding the 19% essentiality rate in the broader proteome, highlighting their disproportionate role in cellular fitness. These methods detect essentiality by monitoring changes in complex stoichiometry or activity after perturbation, often by quantifying subunit abundances.

Essential proteins fulfill diverse roles, including acting as scaffolds to nucleate assembly, providing catalytic active sites, or forming key interfaces for subunit interactions, and they exhibit high evolutionary conservation across species due to their contributions to core cellular processes. Scaffold proteins, such as those in signaling complexes, organize multiple partners into functional units by offering docking platforms that enhance efficiency and specificity. Catalytic essential subunits, like the beta subunits in the proteasome's 20S core, execute proteolytic degradation for protein homeostasis. Interface providers mediate inter-subunit contacts, ensuring stable architecture, as seen in conserved complexes like the anaphase-promoting complex, where orthologous subunits maintain interaction networks across eukaryotes. This conservation underscores their indispensability, with many essential subunits showing sequence and structural similarity across eukaryotes.

The critical nature of essential proteins positions them as prime targets for therapeutic intervention, particularly in disease contexts where complex dysregulation occurs. For example, the beta-5 subunit (PSMB5) of the proteasome, an essential catalytic component, is targeted by inhibitors such as bortezomib, which disrupt protein degradation in cancer cells and lead to cell death; analogous subunits in pathogens are also being explored as drug targets. Such targeting exploits the vulnerability of complexes that rely on these subunits.

Intragenic Complementation

Intragenic complementation is a genetic phenomenon observed in multimeric proteins, where two different mutant alleles of the same gene can restore partial or full function when co-expressed, due to the assembly of hybrid complexes from the defective subunits. This occurs primarily in homomultimeric proteins, where intersubunit interactions allow one subunit to compensate for the defect in another, often by masking structural flaws or reconstituting catalytic sites at subunit interfaces. The mechanism relies on random mixing of subunits during assembly, enabling the formation of hybrid oligomers that exhibit higher activity than homooligomers of either mutant alone.

A key requirement for intragenic complementation is that the mutations affect distinct functional domains or surfaces within the protein, without severely disrupting overall folding or oligomerization. This is prevalent in oligomeric enzymes where the active site spans multiple subunits, allowing compensatory interactions to bypass individual defects. For instance, in human argininosuccinate lyase (ASL), a homotetrameric enzyme essential for the urea cycle, complementation between alleles with mutations in the amino-terminal region (affecting oligomerization) and the carboxy-terminal region (affecting catalysis) restores enzymatic activity by stabilizing the tetramer and reforming the interface-based active site.

The phenomenon was historically elucidated in the 1950s through Seymour Benzer's complementation studies on the rII locus of bacteriophage T4, which demonstrated how mutations within a single locus could interact to influence protein function. These experiments, using cis-trans tests on phage plaques, revealed non-recessive behaviors in multimeric contexts, paving the way for recognizing intragenic effects in oligomeric assemblies. Subsequent work on fungal enzymes, such as xanthine dehydrogenase, confirmed the role of subunit mixing in complementation.

Intragenic complementation serves as a powerful tool in genetic analysis to dissect the quaternary structures of protein complexes, identifying critical intersubunit contacts and functional modularity. By testing pairs of mutants for restored activity, researchers can map interaction domains and predict dominance patterns, aiding the study of enzyme architecture and mutation effects in diseases like urea cycle disorders.
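
The random mixing of subunits that underlies intragenic complementation can be illustrated with a simple binomial model. This is a sketch under explicit assumptions rather than a description of any specific experiment: the function names are hypothetical, the two mutant subunits are assumed to be expressed in a fixed ratio, to fold equally well, and to assemble without any preference for like or unlike partners.

```python
from math import comb


def oligomer_distribution(n_subunits, frac_a):
    """Probability of each oligomer composition under random mixing.

    Returns a list where entry k is the probability that an oligomer of
    n_subunits contains exactly k copies of mutant subunit A, given that
    a fraction frac_a of the subunit pool is A (the rest is mutant B).
    """
    if n_subunits < 1 or not 0.0 <= frac_a <= 1.0:
        raise ValueError("need n_subunits >= 1 and 0 <= frac_a <= 1")
    return [
        comb(n_subunits, k) * frac_a ** k * (1.0 - frac_a) ** (n_subunits - k)
        for k in range(n_subunits + 1)
    ]


def hybrid_fraction(n_subunits, frac_a=0.5):
    """Fraction of oligomers containing at least one subunit of each type."""
    dist = oligomer_distribution(n_subunits, frac_a)
    return 1.0 - dist[0] - dist[-1]


if __name__ == "__main__":
    for n in (2, 4):
        print(f"{n}-mer: hybrid fraction = {hybrid_fraction(n):.3f}")
```

Under these assumptions, equal expression of two alleles in a homotetramer such as ASL yields hybrid tetramers in 7 of 8 cases, so even partial rescue in hybrids can restore a substantial share of total activity.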

Structure Determination

Experimental Techniques

Experimental techniques for determining the structures of protein complexes primarily rely on biophysical methods that provide atomic-level insights into their architecture, interactions, and dynamics. These approaches have been instrumental in elucidating the organization of large macromolecular assemblies, such as ribosomes and chaperonins, enabling a deeper understanding of their biological functions.

X-ray crystallography remains a cornerstone for obtaining high-resolution atomic models of protein complexes, particularly those that can be crystallized. This technique involves growing crystals of the complex, exposing them to X-rays, and analyzing the diffraction patterns to reconstruct the three-dimensional structure. Seminal applications include the determination of ribosome structures in the early 2000s, where resolutions reached approximately 3 Å for the 30S subunit, revealing key RNA-protein interactions and binding sites. Since then, advancements in synchrotron sources and phasing methods have allowed structures of even larger complexes, like the 70S ribosome at up to 2.1 Å resolution, to be solved, providing precise details on catalytic sites and conformational changes.

Cryo-electron microscopy (cryo-EM) has revolutionized the structural analysis of large and heterogeneous protein complexes, especially those resistant to crystallization. Samples are flash-frozen in vitreous ice, imaged under electron beams, and computationally reconstructed into density maps. The 2017 Nobel Prize in Chemistry recognized Jacques Dubochet, Joachim Frank, and Richard Henderson for developing this method, which by the 2020s had achieved resolutions better than 3 Å for numerous complexes, with the nuclear pore complex resolved at 3.2 Å. This technique excels at capturing native-like states and flexibility in megadalton-scale assemblies, such as viral capsids and supercomplexes.

Nuclear magnetic resonance (NMR) spectroscopy is particularly suited for studying smaller protein complexes (typically <100 kDa) or dynamic regions within larger ones, offering insights into solution-state conformations and transient interactions. By measuring nuclear spin interactions in a magnetic field, NMR provides residue-specific information on flexibility and interfaces. For instance, it has characterized fuzzy complexes involving intrinsically disordered regions, such as the p53-MDM2 interaction, where dynamic ensembles reveal multivalent binding modes at residue-level resolution. Recent solid-state NMR extensions have probed larger assemblies, like amyloid fibrils, at resolutions approaching 1 Å for rigid domains.

Despite their strengths, these techniques face limitations that often necessitate complementary approaches. X-ray crystallography requires high-quality crystals, which can be challenging to obtain for flexible or membrane-embedded complexes, and crystal packing can trap non-native conformations. Cryo-EM, while accommodating heterogeneity, demands substantial sample quantities and computational resources for image processing, with resolutions sometimes limited by beam-induced motion in sensitive samples. NMR is constrained by molecular size and requires isotopic labeling, making it less suitable for very large or insoluble complexes. These challenges highlight the value of integrating experimental data with computational validation for comprehensive structural insights.

Computational Methods

Computational methods for predicting and modeling protein complex structures rely on algorithms that simulate interfaces, predict assemblies from sequences, and analyze , often integrating from multiple sources to generate testable hypotheses. These approaches enable the exploration of complex formation without direct experimentation, though they are typically validated against structures from techniques like or cryo-EM. Key tools include docking algorithms for interface prediction, models for structure prediction, simulations for post-assembly refinement, and specialized for reference and training . Protein-protein docking algorithms computationally predict the 3D arrangement of interacting proteins by sampling possible binding orientations and scoring them based on biophysical criteria such as shape complementarity, electrostatics, and desolvation effects. Rigid-body methods, which treat proteins as inflexible during the initial search, are foundational; for instance, ZDOCK employs a fast transform-based search to generate thousands of potential poses, followed by scoring with an energy function that rewards contacts and penalizes steric clashes. This approach has demonstrated success in blind challenges, achieving top-ranked predictions for unbound complexes with ligand root-mean-square deviations below 10 in critical benchmarks. More advanced variants incorporate partial flexibility through ensemble or post-search refinement, improving accuracy for challenging cases like antibody-antigen interactions. Deep learning-based predictors have markedly advanced the field by enabling accurate complex structure prediction directly from protein sequences, bypassing the need for individual monomer structures. 
AlphaFold-Multimer, developed by DeepMind and released in 2021, extends the AlphaFold2 architecture to handle multiple chains by jointly modeling intra- and inter-protein interactions during the Evoformer and structure module stages, leveraging multiple sequence alignments to infer coevolutionary signals at interfaces. On diverse benchmarks, it achieves median interface root-mean-square deviations under 4 Å for over 60% of heteromeric dimers and outperforms traditional docking for many cases, particularly when experimental monomer structures are unavailable. Subsequent refinements, such as improved multiple sequence alignment strategies, have further boosted performance for larger assemblies up to decamers. More recent developments, like AlphaFold 3 released in 2024, have enhanced multimer predictions by incorporating small molecules and nucleic acids, achieving even higher accuracy across diverse biomolecular complexes. Molecular dynamics (MD) simulations provide insights into the temporal evolution and stability of predicted or determined protein complexes by propagating atomic trajectories under classical force fields that account for bonded and non-bonded interactions. , an open-source software package optimized for , facilitates these simulations through efficient algorithms like particle-mesh Ewald for long-range and for constraint handling, enabling routine studies of complex dynamics on microsecond timescales for systems with hundreds of thousands of atoms. In protein complex research, MD refines docked models by revealing transient fluctuations at interfaces, quantifying binding free energies via techniques like , and identifying allosteric effects that stabilize assemblies. Databases underpin these computational pipelines by supplying curated structural and interaction data for model training, benchmarking, and hypothesis generation. 
The Protein Data Bank (PDB), maintained by the Worldwide Protein Data Bank consortium, archived nearly 200,000 experimentally determined 3D structures of proteins, nucleic acids, and their complexes as of 2022 (over 250,000 as of 2025), including atomic coordinates and validation metrics essential for assessing prediction accuracy. The STRING database compiles physical and functional protein-protein associations from literature, experiments, and computational predictions across thousands of organisms, with confidence scores derived from orthogonal evidence to prioritize likely complex partners. Complementarily, the CORUM database offers a manually curated inventory of over 4,000 mammalian protein complexes as of 2022 (7,193 as of 2024), detailing subunit stoichiometries, functions, and disease associations to support targeted modeling efforts.
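
The rigid-body search described above for docking can be illustrated with a toy example. The sketch below is not ZDOCK's scoring function: it works on a small 2D grid, considers translations only (no rotations), and uses assumed values for the contact reward and the clash penalty. It loops over every translation directly, which is the same correlation that fast Fourier transform-based methods evaluate for all translations at once. The shapes `receptor_cells` and `ligand_cells` and the function names are hypothetical.

```python
# Toy 2D rigid-body docking by exhaustive translational search.
# All scoring values and shapes below are illustrative assumptions.

CORE_PENALTY = -9  # assumed penalty for a ligand cell overlapping the receptor
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def receptor_grid(size, occupied):
    """Score grid: CORE_PENALTY inside the receptor; each empty cell holds
    the number of receptor cells it touches (its contact reward)."""
    occupied = set(occupied)
    grid = [[0] * size for _ in range(size)]
    for x, y in occupied:
        grid[x][y] = CORE_PENALTY
        for dx, dy in NEIGHBORS:
            nx, ny = x + dx, y + dy
            inside = 0 <= nx < size and 0 <= ny < size
            if inside and (nx, ny) not in occupied:
                grid[nx][ny] += 1
    return grid


def score_pose(grid, ligand, shift):
    """Sum the grid values under the translated ligand; None if off-grid."""
    size = len(grid)
    total = 0
    for x, y in ligand:
        px, py = x + shift[0], y + shift[1]
        if not (0 <= px < size and 0 <= py < size):
            return None
        total += grid[px][py]
    return total


def dock(grid, ligand):
    """Score every translation and return poses sorted best-first."""
    size = len(grid)
    poses = []
    for dx in range(-size, size):
        for dy in range(-size, size):
            score = score_pose(grid, ligand, (dx, dy))
            if score is not None:
                poses.append((score, (dx, dy)))
    poses.sort(key=lambda pose: (-pose[0], pose[1]))
    return poses


# Hypothetical shapes: a U-shaped receptor with a pocket at (3, 3)
# and a T-shaped ligand whose stem fits that pocket.
receptor_cells = [(2, 2), (2, 3), (2, 4), (3, 2), (3, 4)]
ligand_cells = [(0, 1), (1, 0), (1, 1), (1, 2)]

poses = dock(receptor_grid(8, receptor_cells), ligand_cells)
best_score, best_shift = poses[0]
print("best pose:", best_shift, "score:", best_score)
```

Real docking programs extend this idea to 3D grids, sample rotations as well as translations, and add electrostatic and desolvation terms to the score; the ranked list of poses plays the same role as the thousands of candidate poses mentioned above.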

Assembly

Mechanisms of Formation

Protein complexes often assemble via a nucleation step, in which a subset of subunits forms an initial core that serves as a scaffold for subsequent additions, followed by stepwise incorporation of remaining components to achieve the mature structure. This ordered process minimizes kinetic traps and off-target interactions, as evidenced in the assembly of the 20S proteasome core particle, where the outer α-ring nucleates first before templating the sequential addition of β-subunits, starting with β2, followed by β3, β4, β5, and β6, with β1 incorporating at variable stages and β7 added last to complete the structure. Recent cryo-EM studies have provided detailed views of these intermediates, highlighting the roles of assembly chaperones like POMP in guiding β-subunit incorporation. Chaperonins like GroEL exemplify this in their own oligomeric formation and substrate assistance, where monomeric GroEL units assemble into rings that facilitate stepwise encapsulation and folding of client proteins. Molecular chaperones are integral to these assembly mechanisms, binding transiently to hydrophobic regions of folding intermediates to prevent aggregation and guide productive interactions. The Hsp70 family, for example, plays a key role in ribosome biogenesis by associating with nascent polypeptides on the ribosome, shielding them from misfolding and promoting their integration into pre-ribosomal complexes, as seen with the Ssb chaperone that interacts cotranslationally to support 40S subunit maturation. Similarly, chaperonins provide a protected environment for stepwise subunit addition in multi-protein assemblies by sequestering unfolded chains within their cavity until competent for association. These chaperone interventions ensure high-fidelity assembly without permanent incorporation into the final complex. Assembly of dynamic protein complexes frequently requires energy input from nucleoside triphosphate hydrolysis to power conformational rearrangements and subunit dynamics. 
ATP hydrolysis drives the GroEL/GroES cycle, where binding of seven ATP molecules per ring induces allosteric expansion of the folding chamber, enabling substrate release and recycling for iterative assembly steps. In GTP-dependent systems, such as septin ring formation during cytokinesis, hydrolysis modulates interface affinities, allowing initial association via NC interfaces before GTP-triggered shifts promote stepwise elongation into ordered filaments. These energy-dependent steps confer reversibility and adaptability to cellular needs. To maintain proteostasis, quality control systems actively disassemble and degrade aberrant complexes through ubiquitin-mediated proteolysis. The ubiquitin-proteasome pathway targets faulty assemblies by recognizing exposed degrons on unassembled subunits, such as in the COG complex, where the E3 ligase Not4 ubiquitylates free Cog1 for degradation; excess subunits of other complexes are likewise cleared to prevent toxic buildup. This selective disassembly, often chaperone-assisted, ensures only functional complexes persist, with evolutionary conservation underscoring its fundamental role in cellular homeostasis.

Evolutionary Significance

Protein complexes have evolved primarily through gene duplication events, beginning with the formation of homomultimeric structures from ancient homomeric interactions. Duplication of genes encoding homomeric proteins frequently results in paralogous subunits that assemble into complexes, with approximately 30% of known protein complexes in yeast and the Protein Data Bank featuring such duplicated subunits. These homomultimers represent ancestral cores that are highly conserved across species, providing a foundation for subsequent evolutionary diversification. Later in evolution, heteromultimeric complexes arose through mechanisms like gene fusions, where separate genes merge into a single open reading frame, optimizing subunit interactions and assembly order. Evidence from genomic analyses shows that about 3.7% of heteromeric subunit pairs are linked to such fusion events, with fusions often conserving the sequential assembly pathways of the original components. The adaptive advantages of protein complexes include enhanced functional innovation and increased robustness to genetic perturbations. By incorporating moonlighting proteins (those capable of performing multiple independent functions), complexes enable regulatory mechanisms and allosteric control, allowing subunits to contribute to diverse cellular processes without compromising primary roles. For instance, gene duplication followed by divergence in multimeric assemblies can generate heterodimers from homodimers, fostering entirely new functionalities such as substrate channeling. Regarding robustness, multimeric structures provide compensatory buffering against mutations, where defects in one subunit can be mitigated by others, thereby maintaining complex stability and activity under fluctuating conditions. This resilience is particularly evident in models, where unilateral fluctuations in subunit concentrations are tolerated more effectively than in monomeric proteins. 
Core protein complexes exhibit remarkable conservation across the domains of life, underscoring their ancient origins. The 20S proteasome, a key proteolytic complex, is structurally preserved in archaea, eukaryotes, and select bacteria like actinobacteria, with its cylindrical architecture of α- and β-subunits tracing back to early cellular life. While it is not universally present in all bacteria, suggesting possible horizontal gene transfer or domain-specific retention, its presence in Archaea and Eukarya indicates an origin predating the divergence of these lineages. Eukaryotes have seen recent expansions in complex diversity, with increased subunit specialization and regulatory layers, such as additional activators, enhancing proteolytic specificity compared to simpler prokaryotic versions. Recent metagenomic studies have illuminated the evolutionary diversity of protein complexes in microbial ecosystems, revealing adaptations in uncultured lineages. Analyses of environmental microbiomes have identified novel, highly divergent protein families forming complexes involved in metabolic pathways, expanding our view of pre-LUCA-like innovations in prokaryotes. These findings highlight how horizontal gene transfer and lineage-specific duplications drive complex evolution in microbes, contributing to functional plasticity across global microbial communities.
