
Protein engineering

Protein engineering is the design and construction of new or modified proteins with desired structural, functional, or stability properties through the manipulation of their sequences, typically using recombinant DNA technology, site-directed mutagenesis, directed evolution, or computational design approaches. The field emerged in the late 1970s alongside advances in recombinant DNA technology and molecular biology, with a pivotal milestone being the 1982 FDA approval of recombinant human insulin (Humulin), the first protein therapeutic produced via engineered Escherichia coli, which overcame limitations of animal-derived insulins such as immunogenicity and supply constraints. Earlier roots trace to the 1890s with the use of animal-derived antibodies for diphtheria treatment, but recombinant technologies enabled scalable production and precise modifications. By the 1990s, techniques like site-directed mutagenesis allowed targeted alterations to protein structures, while directed evolution introduced random mutation libraries screened for improved traits, accelerating the field's growth into a cornerstone of biotechnology.
Key methods in protein engineering include rational design, which relies on structural knowledge from X-ray crystallography or NMR to predict and introduce specific mutations for enhanced activity or stability; directed evolution, involving iterative cycles of random mutagenesis, recombination (e.g., DNA shuffling), and screening to evolve proteins without prior structural data; and computational protein design, which uses algorithms and molecular modeling to create de novo sequences that fold into target structures. Additional strategies encompass PEGylation to extend circulation half-life by attaching polyethylene glycol chains, Fc fusion to leverage antibody recycling via the neonatal Fc receptor, and glycoengineering to alter glycosylation patterns for improved therapeutic properties. Emerging approaches integrate machine learning and artificial intelligence for predicting and optimizing designs, as seen in tools like AlphaFold for structure prediction and recent advancements such as AlphaFold 3 for multi-modal predictions.
Applications of protein engineering span therapeutics, industrial biocatalysis, and biosensors, with over 400 approved protein-based drugs as of 2025 and a global market exceeding $440 billion annually (as of 2024). In medicine, engineered proteins treat conditions like diabetes (e.g., long-acting insulin analogs such as glargine, made via site-specific mutations), cancer (e.g., antibody-drug conjugates like Kadcyla, which link antibodies to cytotoxins for targeted delivery), and autoimmune diseases (e.g., etanercept, a TNF receptor Fc fusion protein). Industrially, engineered enzymes improve catalytic efficiency in non-aqueous environments and enable more sustainable processes by replacing harsh chemical catalysts. In research, stimulus-responsive proteins serve as smart drug delivery systems for controlled release and as biosensors for detecting toxins, with ongoing innovations focusing on de novo designs for novel functions like virus-mimicking nanoparticles.

Overview

Definition and Principles

Protein engineering is the deliberate modification of a protein's amino acid sequence to achieve desired structural, functional, or stability enhancements, typically through techniques such as recombinant DNA technology, mutagenesis, and computational modeling. This process allows for the creation of novel proteins that may not occur naturally, by altering the genetic instructions that encode them. At its foundation, protein engineering relies on the principle that a protein's primary sequence of amino acids determines its folding into secondary structures (like alpha-helices and beta-sheets) and tertiary structures, which in turn govern its function, such as enzymatic activity or molecular recognition. Strategic amino acid substitutions can fine-tune these properties; for instance, replacing a buried polar residue with a hydrophobic one may increase stability by strengthening the hydrophobic core, while changes at active sites can enhance catalytic efficiency or substrate specificity. These interventions exploit the intimate link between sequence, structure, and function to optimize performance metrics like binding affinity or resistance to environmental stressors.
Proteins arise through biosynthesis, a cellular process where messenger RNA (mRNA), transcribed from DNA, is translated by ribosomes according to the genetic code, a nearly universal set of 64 codons that specify 20 standard amino acids or stop signals. In natural evolution, genetic variations arise randomly via mutations and are selected over generations for adaptive advantages, gradually refining protein functions in response to environmental pressures. Protein engineering, by contrast, accelerates and directs this variation using human-guided methods to introduce precise changes, bypassing the slow pace of natural selection.
This field is important because it enables the design of proteins with tailored properties unattainable through natural means, with applications in biotechnology that include engineered enzymes for industrial catalysis, therapeutic proteins for disease treatment, and sustainable materials. Such innovations address challenges in medicine, such as developing more effective biologics, and in industry, where stable biocatalysts reduce reliance on chemical processes.

Historical Development

The foundations of protein engineering emerged in the 1970s with the discovery of restriction enzymes, which enabled precise manipulation of DNA and laid the groundwork for recombinant DNA technology. In 1970, Hamilton O. Smith identified the first site-specific (type II) restriction endonuclease from Haemophilus influenzae, allowing scientists to cut DNA at specific sites, a breakthrough recognized by the 1978 Nobel Prize in Physiology or Medicine, which he shared with Werner Arber and Daniel Nathans. This tool facilitated the creation of the first recombinant proteins, exemplified by Genentech's production of human insulin in 1978 using E. coli to express the A and B chains separately, marking the debut of genetically engineered therapeutic proteins. Concurrently, site-directed mutagenesis was developed by Michael Smith in 1978, introducing targeted mutations into DNA via oligonucleotide hybridization, a method that earned him the 1993 Nobel Prize in Chemistry (shared with Kary Mullis for PCR).
The 1990s saw a paradigm shift toward evolutionary approaches, with Frances Arnold pioneering directed evolution in 1993 by randomly mutating the subtilisin E gene and screening variants for enhanced activity in organic solvents, earning her half of the 2018 Nobel Prize in Chemistry. This technique mimicked natural selection in vitro, accelerating protein optimization beyond rational design limitations. Complementing this, Willem P.C. Stemmer introduced DNA shuffling in 1994, a recombination method that fragmented and reassembled related genes to generate diverse libraries, significantly boosting evolutionary efficiency.
In the 2000s and 2010s, computational tools transformed the field, with the Rosetta software suite, developed by David Baker's laboratory starting in the late 1990s, enabling de novo protein design by sampling conformational spaces to predict stable folds and sequences. Homology modeling advanced alongside the exponential growth of the Protein Data Bank (PDB), which expanded from about 3,000 structures in 1995 to over 110,000 by the end of 2015, providing richer templates for predicting structures of uncrystallized proteins.
High-throughput evolution scaled up with phage-assisted continuous evolution (PACE), introduced by David R. Liu's laboratory in 2011, which linked protein function to bacteriophage replication for rapid, continuous variant selection.
The 2020s integrated artificial intelligence, with DeepMind's AlphaFold 2 achieving unprecedented accuracy in structure prediction during the 2020 CASP14 competition. Predicted structures for the human proteome and other model organisms were released in 2021, and coverage was extended to nearly all catalogued proteins in 2022, giving engineers atomic-level models without experimental structure determination.
Key milestones include the 1978 Nobel Prize for restriction enzymes, which enabled recombinant DNA technology; the 1993 prize for site-directed mutagenesis; the 2018 prize for directed evolution and phage display; and the 2024 Nobel Prize in Chemistry for computational protein design (David Baker) and AI-driven structure prediction (Demis Hassabis and John Jumper).

Fundamental Concepts

Protein Structure and Stability

Proteins exhibit a hierarchical organization of structure that dictates their function and stability, comprising four distinct levels. The primary structure refers to the linear sequence of amino acids linked by peptide bonds, which serves as the foundational blueprint determining all higher-order arrangements. Secondary structure arises from local folding patterns stabilized by hydrogen bonds between backbone atoms, primarily forming alpha helices and beta sheets. Tertiary structure represents the overall three-dimensional conformation achieved through interactions among side chains, while quaternary structure involves the assembly of multiple polypeptide subunits into a functional complex, as seen in hemoglobin. This structural hierarchy ensures that proteins can perform specific biological roles, but disruptions at any level can compromise stability.
Protein stability is maintained by a balance of non-covalent and covalent interactions that favor the native folded state over unfolded conformations. The hydrophobic core, formed by burial of non-polar residues away from aqueous solvent, provides the primary driving force for folding through the hydrophobic effect, which minimizes unfavorable water-hydrocarbon contacts. Hydrogen bonds between polar groups further stabilize secondary and tertiary elements, while disulfide bridges (covalent bonds between cysteine residues) enhance rigidity, particularly in extracellular proteins. Salt bridges, or ionic interactions between oppositely charged side chains, contribute to electrostatic stabilization, though their net effect can vary with solvent exposure. These factors collectively lower the free energy of the folded state, enabling proteins to resist denaturation under physiological conditions.
The thermodynamics of protein folding is governed by the Gibbs free energy change, where the folded state is thermodynamically favored when ΔG < 0.
This is expressed as:
$$\Delta G = \Delta H - T \Delta S$$
Here, ΔH represents the enthalpy change from interactions like hydrogen bonding and van der Waals forces, T is the absolute temperature, and ΔS is the entropy change, which includes the entropic cost of restricting chain flexibility offset by solvent entropy gains from hydrophobic burial. Proteins typically fold with marginal stability, where ΔG_folding ranges from -5 to -15 kcal/mol, making them sensitive to environmental perturbations. Denaturation curves, obtained from techniques like circular dichroism or differential scanning calorimetry, track the folded population as a function of temperature or denaturant concentration, revealing a cooperative unfolding transition. The melting temperature (T_m), defined as the midpoint of this transition where half the protein is unfolded, serves as a key metric of thermal stability, often ranging from 40–80°C for mesophilic proteins.
In natural systems, molecular chaperones play a crucial role in enhancing protein stability by preventing misfolding and aggregation during synthesis or stress. These proteins, such as Hsp70 and GroEL, bind exposed hydrophobic regions in nascent or unfolded polypeptides, providing a protected environment for correct folding and inhibiting off-pathway associations. Chaperone activity is essential for maintaining proteostasis, particularly in crowded cellular environments where unfolded proteins risk irreversible aggregation.
In protein engineering, understanding these structural and stability principles guides targeted modifications to improve folding efficiency and resilience. Mutations that improve packing in the hydrophobic core can enhance stability and increase T_m by 5–10°C without altering function. Conversely, destabilizing mutations, often involving charged residue introductions in the core, can disrupt folding pathways and promote partial unfolding.
A common instability issue in engineered proteins is aggregation into amyloid-like fibrils, where exposed hydrophobic surfaces lead to β-sheet-rich assemblies that impair solubility and activity. For instance, mutations in amyloid-forming proteins have been used to stabilize oligomeric forms for studying neurodegenerative diseases, highlighting the need to mitigate such propensities through surface charge engineering. These biophysical insights underscore the importance of balancing stability enhancements with functional preservation in design strategies.
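The relation ΔG = ΔH − TΔS and the definition of T_m can be illustrated with a minimal two-state (folded/unfolded) sketch. It assumes that ΔH and ΔS are constant with temperature (no heat-capacity term), and the numerical values used in the usage note are hypothetical, chosen only so that ΔG at room temperature falls in the -5 to -15 kcal/mol range quoted above. The function names are illustrative, not from any library.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def delta_g_folding(delta_h, delta_s, temp_k):
    """Gibbs free energy of folding, dG = dH - T*dS (kcal/mol)."""
    return delta_h - temp_k * delta_s


def fraction_folded(delta_h, delta_s, temp_k):
    """Two-state population of the folded state at temperature temp_k.

    With K_unfold = exp(dG_fold / RT), the folded fraction is 1 / (1 + K_unfold).
    """
    dg = delta_g_folding(delta_h, delta_s, temp_k)
    return 1.0 / (1.0 + math.exp(dg / (R_KCAL * temp_k)))


def melting_temperature(delta_h, delta_s):
    """Temperature at which dG = 0, i.e. half the molecules are unfolded."""
    return delta_h / delta_s
```

With the assumed values ΔH = -100 kcal/mol and ΔS = -0.3 kcal/(mol·K), `melting_temperature(-100.0, -0.3)` is about 333 K (roughly 60°C), and `fraction_folded` evaluated over a range of temperatures traces the cooperative, sigmoidal unfolding transition described above. Real proteins deviate from this picture because ΔH and ΔS themselves depend on temperature.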

Genetic Basis of Protein Variation

The central dogma of molecular biology describes the flow of genetic information from DNA to messenger RNA (mRNA) and subsequently to proteins, where DNA serves as the template for transcription into mRNA, which is then translated into amino acid sequences during protein synthesis. This unidirectional transfer ensures that genetic instructions encoded in nucleotide sequences are converted into functional polypeptides, forming the basis for protein diversity. The genetic code, comprising 64 possible triplets of nucleotides (codons), specifies 20 standard amino acids and three stop signals, with redundancy known as degeneracy allowing multiple codons to encode the same amino acid. This degeneracy arises because most amino acids are represented by two to six synonymous codons, which differ primarily in the third nucleotide position, enabling variations in DNA sequence without altering the protein product. Such flexibility in the code underpins natural and engineered protein variation by permitting sequence changes that can influence translation efficiency or protein properties. Genetic mutations introduce diversity at the nucleotide level, with point mutations being the most common, where a single base substitution can be synonymous (no amino acid change) or nonsynonymous (resulting in a different amino acid, such as missense mutations that alter side chain properties). Insertions or deletions (indels) of nucleotides not in multiples of three cause frameshift mutations, shifting the reading frame and often leading to truncated or aberrant proteins with altered downstream sequences. These alterations can disrupt protein function, stability, or interactions, though some may confer adaptive advantages. In natural populations, single nucleotide polymorphisms (SNPs) and other polymorphisms represent common forms of genetic variation, with nonsynonymous SNPs potentially changing amino acid sequences and contributing to protein diversity across individuals or species. 
For instance, SNPs occurring at rates of about 1 per 1,000 bases in humans can lead to subtle functional differences in proteins, influencing traits or disease susceptibility.
In protein engineering, codon bias (the preferential use of certain synonymous codons in highly expressed genes) serves as a key entry point for designing synthetic genes to optimize expression in heterologous systems, for example by replacing codons that are rare in Escherichia coli to avoid translational pauses and enhance yield. This optimization accounts for host-specific tRNA availability, improving protein production without changing the amino acid sequence. Additionally, the baseline fidelity of DNA replication, with error rates around 10⁻⁹ per base pair due to proofreading and repair mechanisms, sets a natural floor that mutagenesis strategies must greatly exceed to generate diverse protein libraries. Even the rare mutations that arise at this low error rate can subtly alter protein folding and stability, as explored in related structural analyses.
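The distinction between synonymous, missense, and nonsense point mutations follows directly from the codon table, and can be sketched in a few lines. The example below builds the standard genetic code (DNA alphabet, bases ordered T, C, A, G) and classifies a single-base substitution within one codon. The function name and the extra "stop-loss" label for mutations that remove a stop codon are illustrative choices, not terminology from the text above.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_point_mutation(codon, position, new_base):
    """Classify a single-base substitution at `position` (0, 1, or 2) of `codon`."""
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"      # amino acid codon became a stop codon
    if before == "*":
        return "stop-loss"     # stop codon became an amino acid codon
    return "missense"


def degeneracy():
    """Number of codons encoding each amino acid (stop signals under '*')."""
    counts = {}
    for aa in CODON_TABLE.values():
        counts[aa] = counts.get(aa, 0) + 1
    return counts
```

For example, `classify_point_mutation("GAA", 2, "G")` is synonymous (GAA and GAG both encode glutamate), while `classify_point_mutation("GAA", 0, "A")` is a missense change to lysine. The `degeneracy()` counts show why third-position changes are so often silent: leucine, serine, and arginine have six codons each, whereas methionine and tryptophan have one.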

Engineering Approaches

Rational Design

Rational design in protein engineering involves hypothesis-driven modifications to protein sequences based on established structure-function relationships, aiming to predict and implement targeted changes that alter specific properties such as stability, activity, or specificity. This approach contrasts with directed evolution by relying on prior knowledge of the protein's atomic structure and evolutionary conservation to guide minimal alterations, typically involving few variants rather than large libraries. The process emphasizes precision to avoid unintended disruptions, making it suitable for well-characterized proteins where detailed mechanistic insights are available.
The core strategy proceeds through sequential steps: first, structural modeling of the target protein using computational tools to visualize key regions like active sites or binding interfaces; second, prediction of beneficial mutations by analyzing how changes might stabilize interactions or reposition residues; and third, experimental validation of the designed variants through biophysical assays and structural confirmation. For instance, molecular dynamics simulations or energy minimization can forecast mutation impacts on folding or catalysis before synthesis. This iterative cycle allows refinement based on empirical data, ensuring modifications align with the protein's functional goals.
Key tools in rational design include sequence alignments to identify conserved residues critical for function, which inform mutation choices by highlighting positions tolerant to change. Structural analysis via X-ray crystallography provides high-resolution atomic coordinates to map interaction networks, while nuclear magnetic resonance (NMR) spectroscopy reveals dynamic aspects in solution, both essential for pinpointing mutable sites without compromising overall fold. These methods enable designers to target specific motifs, such as catalytic triads in enzymes, for precise engineering.
A representative application is site-directed mutagenesis to tweak active site residues in proteases, exemplified by engineering subtilisin to alter substrate specificity. In this case, mutations at positions 156 and 166 that changed the charge of the binding pocket (for example, replacing glutamate with glutamine or serine) shifted preference toward oppositely charged substrates at the P1 position, increasing catalytic efficiency (k_cat/K_m) up to 1900-fold for complementary pairs while decreasing it for mismatched ones, demonstrating control over electrostatic interactions in the binding pocket. This seminal work established rational design's potential for tailoring enzyme selectivity, influencing subsequent efforts in enzyme engineering.
Rational design offers high precision for targeted outcomes, often achieving functional improvements with small numbers of variants, but it demands extensive prior knowledge of the protein's structure and mechanism, limiting applicability to novel or poorly understood targets. Success rates for single mutations typically range from 10–50%, depending on the complexity of the desired change, as unpredictable long-range effects can reduce efficacy compared to more exploratory methods.

Directed Evolution

Directed evolution is a powerful protein engineering strategy that emulates Darwinian natural selection in vitro to enhance or confer novel functions on proteins, particularly when structural or mechanistic details are insufficient for rational design. The process begins with a starting gene encoding a protein of interest, often a natural enzyme or one modestly improved via rational approaches, followed by the generation of genetic diversity to create a library of variants. These variants are expressed in host cells or cell-free systems, and high-throughput screening or selection identifies those exhibiting superior performance under imposed conditions, such as altered temperature, pH, or substrate specificity. The cycle of diversification, expression, and selection is repeated iteratively, typically 3–10 rounds, until variants with substantially improved properties emerge, enabling optimization across rugged fitness landscapes that are challenging to navigate predictively.
Genetic diversity is primarily generated through random mutagenesis techniques, such as error-prone polymerase chain reaction (epPCR), which exploits the low fidelity of DNA polymerases like Taq under conditions of imbalanced dNTPs or added Mn²⁺ to achieve a controlled mutation rate of approximately 10⁻³ to 10⁻⁴ errors per base pair, yielding libraries with 1–3 amino acid substitutions per protein on average. This randomness introduces point mutations that can beneficially alter protein folding, active sites, or interactions without requiring prior knowledge of the structure. Complementarily, recombination methods like DNA shuffling fragment and reassemble related homologous genes, facilitating the combination of distant beneficial mutations into single variants and accelerating functional gains beyond what point mutagenesis alone can achieve; for instance, shuffling β-lactamase homologs increased antibiotic resistance over 300-fold in three generations.
To impose selection pressures, variants are subjected to stringent assays that link protein function directly to detectable signals, enabling the isolation of rare improved clones from libraries of 10⁶–10¹⁰ members. High-throughput screening methods, such as fluorescence-activated cell sorting (FACS), utilize reporter substrates to quantify traits like binding affinity, where variants with a stronger signal indicate tighter interactions. For enzymatic properties, selection systems might employ growth-based complementation in auxotrophic hosts or colorimetric halos on agar plates to detect elevated activity or stability, as in protease assays measuring substrate hydrolysis. These approaches ensure survival or enrichment of functional variants under conditions mimicking industrial or therapeutic demands, such as high temperatures or non-natural solvents.
Key milestones in directed evolution include the 1993 demonstration by Frances Arnold's group, who applied sequential epPCR rounds to evolve the mesophilic protease subtilisin E for catalysis in 60% dimethylformamide, achieving a 256-fold activity increase and proving the method's efficacy for non-natural environments. This work laid the foundation for broader applications, including the engineering of thermostable DNA polymerases; for example, compartmentalized self-replication enabled the evolution of Taq polymerase variants with 11-fold higher thermostability for robust PCR.
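The diversify, screen, and select cycle described in this section can be sketched as a toy simulation. Everything here is an illustrative assumption rather than a model of any real experiment: mutations are applied directly to the amino acid sequence instead of the DNA, the per-residue mutation rate is arbitrary, and the "assay" is a hypothetical fitness function that simply counts matches to a made-up target sequence.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def mutate(seq, rate, rng):
    """Random mutagenesis: each residue is redrawn with probability `rate`.

    The redrawn residue may equal the original, so the effective
    substitution rate is slightly below `rate`.
    """
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else aa for aa in seq)


def toy_fitness(seq, target):
    """Hypothetical stand-in for a screening assay: matches to a target."""
    return sum(a == b for a, b in zip(seq, target))


def directed_evolution(parent, target, rounds=5, library_size=200, rate=0.05, seed=0):
    """Iterate diversification and selection, keeping the best variant each round."""
    rng = random.Random(seed)
    history = [toy_fitness(parent, target)]
    for _ in range(rounds):
        # The parent stays in the pool, so fitness can never decrease.
        library = [parent] + [mutate(parent, rate, rng) for _ in range(library_size)]
        parent = max(library, key=lambda s: toy_fitness(s, target))
        history.append(toy_fitness(parent, target))
    return parent, history
```

Calling `directed_evolution("A" * 20, "MKTAYIAKQRQISFVKSHFS")` returns the best variant and the fitness after each round. Taking only the single best clone forward is the simplest selection scheme; real campaigns often carry several hits into the next round or recombine them, and the fitness landscape is measured experimentally rather than known in advance.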

Computational and AI-Driven Methods

Computational methods in protein engineering leverage bioinformatics and physics-based simulations to predict and design protein structures and functions, enabling the exploration of vast sequence spaces without extensive wet-lab experimentation. These approaches integrate sequence analysis, energy minimization, and machine learning to model how mutations affect folding, stability, and interactions, facilitating targeted modifications for enhanced properties such as catalytic efficiency or binding affinity. By automating predictions, they complement experimental strategies and accelerate the design of novel proteins for applications in biotechnology and medicine.
Structure prediction forms the cornerstone of computational protein engineering, encompassing ab initio, homology modeling, and threading techniques to generate three-dimensional models from amino acid sequences. Ab initio methods, such as those implemented in the Rosetta software suite, rely on physics-based energy functions to simulate folding from first principles, assembling fragments of known structures and minimizing global energy to identify native-like conformations. For proteins without detectable homologs, fragment assembly and centroid-based low-resolution modeling have achieved near-atomic accuracy for some small proteins in community-wide assessments like CASP.
Homology modeling constructs structures by aligning a target sequence to experimentally determined templates of related proteins, then refining side-chain placements and loop regions using spatial restraints derived from the template's coordinates. Tools like MODELLER optimize these models by satisfying distance and dihedral angle restraints, yielding reliable predictions when sequence identity exceeds 30%, which is common for engineering variants within protein families.
Threading, or template-based fold recognition, extends this to distant homologs by evaluating how well a query sequence fits into structural frameworks from the Protein Data Bank, using scoring functions that account for burial, secondary structure compatibility, and pairwise interactions. Methods like I-TASSER have successfully folded proteins up to 200 residues by combining threading restraints with ab initio assembly, improving fold identification accuracy to over 70% for hard targets.
Advancements in artificial intelligence have revolutionized structure prediction, with deep learning models surpassing traditional methods in speed and precision. AlphaFold 2, developed by DeepMind, employs an attention-based neural network trained on multiple sequence alignments (MSAs) and structural data to predict atomic-level structures, achieving a median global distance test score (GDT_TS) of 92.4 out of 100 in the CASP14 blind test across diverse proteins, including those lacking homologs. Building on this, AlphaFold 3 extends predictions to biomolecular complexes, incorporating diffusion modules to model interactions with ligands, nucleic acids, and modifications, with improved interface root-mean-square deviation (RMSD) below 2 Å for protein-protein contacts. For de novo design, diffusion models like RFdiffusion fine-tune RoseTTAFold networks to generate novel backbones from noise, conditioned on functional motifs or symmetries, enabling the creation of binders and enzymes with experimental success rates exceeding 10% for designed scaffolds.
Coevolutionary analysis extracts structural insights from sequence covariation across homologs, inferring residue contacts that stabilize folds during evolution. By constructing MSAs from protein families, methods like direct-coupling analysis (DCA) compute statistical dependencies between residue pairs, filtering indirect correlations via mean-field approximations to predict contacts with precision up to 80% for top-scoring pairs in beta-sheet proteins.
The EVfold approach applies DCA to diverse families, generating distance restraints for folding simulations that recover native topologies for 81% of tested proteins up to 240 residues. A foundational metric in these analyses is mutual information (MI), which quantifies coevolution between residues i and j as:
$$I(i;j) = \sum_{x_i, x_j} p(x_i, x_j) \log \frac{p(x_i, x_j)}{p(x_i)\, p(x_j)}$$
where p(x_i, x_j) is the joint probability of amino acids at positions i and j, and p(x_i), p(x_j) are marginals; high MI values (>2 bits) often indicate contacting pairs, aiding in constraint-based design.
Multivalent protein design uses computational modeling to engineer assemblies that enhance avidity through repeated binding motifs, crucial for therapeutics like nanoparticle vaccines. Rosetta's symmetric docking and interface design protocols optimize multi-component structures by minimizing energies for oligomerization and ligand presentation, as demonstrated in the creation of 60-subunit nanoparticles displaying viral antigens with uniform geometry and stability. These methods enforce geometric constraints and score multimeric interfaces, yielding designs where experimental binding affinities increase by orders of magnitude due to cooperative effects, without relying on evolutionary templates.
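The mutual information formula given in this section can be computed directly from the columns of a multiple sequence alignment. The sketch below is a bare-bones version under several assumptions: probabilities are raw frequencies, the logarithm is base 2 so that the result is in bits, and there is no sequence weighting, pseudocount, gap handling, or background correction, all of which practical coevolution pipelines typically add. The function name is illustrative.

```python
import math
from collections import Counter


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    `msa` is a list of equal-length aligned sequences.
    """
    n = len(msa)
    joint = Counter((seq[i], seq[j]) for seq in msa)   # counts for p(x_i, x_j)
    col_i = Counter(seq[i] for seq in msa)             # counts for p(x_i)
    col_j = Counter(seq[j] for seq in msa)             # counts for p(x_j)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi
```

For the toy alignment `["AKL", "AKL", "DEL", "DEL"]`, columns 0 and 1 vary together perfectly between two equally frequent states, giving 1 bit, while column 2 is invariant and so carries no information about any other column (0 bits). Because MI also picks up indirect and phylogenetic correlations, methods such as DCA, mentioned above, go further to separate direct couplings from transitive ones.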

Hybrid and Semi-Rational Strategies

Hybrid and semi-rational strategies in protein engineering integrate elements of rational design with directed evolution techniques to enhance the efficiency of protein optimization by leveraging prior knowledge to guide variant generation and selection. These approaches aim to create targeted libraries that are smaller and more informative than those produced by purely random methods, thereby reducing the experimental burden while increasing the likelihood of identifying beneficial mutations.
Semi-rational design typically involves the construction of focused libraries through site-saturation mutagenesis (SSM) at predicted functional hotspots, such as catalytic residues or binding sites identified via structural analysis or sequence alignments. For instance, SSM systematically replaces specific residues with all 20 natural amino acids, allowing exploration of diverse substitutions at key positions without exhaustive randomization of the entire protein sequence. This method has been successfully applied to enzymes like lipases and cytochrome P450s, where mutations near active sites improved substrate specificity and enantioselectivity, often yielding variants with up to 100-fold enhancements in activity. By concentrating diversity on a limited number of sites (e.g., 5–10 residues, randomized individually or in small groups), semi-rational SSM libraries typically contain 10³ to 10⁴ variants, compared to 10⁹ or more for full-gene random mutagenesis, enabling higher hit rates of 1–10% for functional improvements.
Hybrid workflows further combine computational or structural rational priming with subsequent directed evolution rounds to refine variants iteratively. In these pipelines, initial candidates are pre-selected using computational modeling tools or energy calculations to identify promising mutations, followed by evolutionary screening to accumulate synergistic changes.
A prominent example is SCHEMA-guided recombination, which computationally predicts compatible crossover points in homologous proteins by minimizing structural disruptions from interacting residue pairs, as quantified by a disruption energy score (E). This approach has generated chimeric libraries of beta-lactamases and other enzymes with over 50% functional chimeras, far exceeding random recombination yields, and has facilitated the engineering of thermostable variants for practical applications. Similarly, ancestral sequence reconstruction (ASR) serves as a robust starting point by inferring ancient protein sequences from phylogenetic data, often yielding enzymes with superior stability (such as beta-lactosidases active at 70°C versus 50°C for modern homologs) before subjecting them to directed evolution for further optimization. These strategies collectively reduce library sizes from 10¹² potential variants in unconstrained mutagenesis to manageable 10⁴ scales, achieving hit rates up to 100-fold higher than unguided methods while preserving evolutionary exploration.
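The idea behind the SCHEMA disruption score can be sketched as a count of residue-residue contacts that a chimera "breaks": a contact between positions i and j counts as broken when the pair of residues found there in the chimera does not occur at those positions in any parent. This is a simplified illustration, not the published implementation. The contact list here is hypothetical (in practice it would be derived from a parent structure with a distance cutoff), the parents are assumed to be pre-aligned and of equal length, and all function names are invented for the example.

```python
def make_chimera(parents, crossovers, block_sources):
    """Assemble a chimera from aligned parents.

    `crossovers` are the positions where a new block starts;
    `block_sources[k]` is the index of the parent supplying block k.
    """
    bounds = [0] + list(crossovers) + [len(parents[0])]
    return "".join(
        parents[src][start:end]
        for src, start, end in zip(block_sources, bounds, bounds[1:])
    )


def schema_disruption(chimera, parents, contacts):
    """Count contacts whose residue pair in the chimera is absent from every parent."""
    broken = 0
    for i, j in contacts:
        pair = (chimera[i], chimera[j])
        if all((p[i], p[j]) != pair for p in parents):
            broken += 1
    return broken
```

With toy parents `"ALKVEG"` and `"SIRTDN"`, a single crossover at position 3 gives the chimera `"ALKTDN"`. For the assumed contacts `[(0, 1), (1, 4), (2, 5)]`, the first contact lies within one block and is preserved, while the other two span the crossover and pair residues never seen together in a parent, so E = 2. Choosing crossover points that keep contacting residues in the same block lowers E, which is the rationale for the higher fraction of functional chimeras noted above.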

Experimental Techniques

Mutagenesis and Library Generation

Mutagenesis and library generation are essential steps in protein engineering, enabling the creation of diverse variant libraries for subsequent screening or selection. Random mutagenesis methods introduce nonspecific genetic changes across the target gene, mimicking natural mutation to explore broad sequence space. A foundational technique is error-prone PCR, first described by Leung et al., which employs low-fidelity DNA polymerases like Taq under suboptimal conditions, such as the addition of Mn²⁺ ions, unbalanced dNTP concentrations, or increased cycle numbers, resulting in mutation rates of approximately 0.5–2% per nucleotide position. This approach favors transitions over transversions but allows control over mutation frequency, typically yielding libraries with 10⁶–10⁸ variants when expressed in bacterial hosts. Chemical mutagens, such as ethyl methanesulfonate (EMS), alkylate guanine bases to induce primarily G/C to A/T transitions during replication, offering an alternative route to random point mutations. Biological mutator strains, exemplified by the E. coli XL1-Red strain engineered with defects in DNA proofreading (mutD5) and mismatch repair (mutS), propagate plasmids at mutation rates 1,000–5,000 times higher than wild-type cells, producing diverse libraries through continuous replication without PCR artifacts.
Focused mutagenesis targets specific codons or regions to generate more efficient libraries with reduced size and bias, prioritizing positions informed by structural or computational analysis. Site-saturation mutagenesis (SSM) employs degenerate oligonucleotides with NNK triplets (N = A/C/G/T, K = G/T) at selected sites, encoding all 20 amino acids with only one stop codon (TAG). Each NNK position comprises 32 codons, so libraries reach sizes of 10³–10⁵ when two or three sites are randomized simultaneously. This method, applied extensively in directed evolution studies by Reetz and colleagues, minimizes redundancy and stop codon incorporation compared to NNN codons, facilitating high-quality libraries via overlap extension or QuikChange protocols.
Sequence saturation mutagenesis (SeSaM) advances this by using trinucleotide phosphoramidites or cassettes to insert random codons directly, avoiding nucleotide-level biases and stop codons entirely, which results in equimolar representation of all 20 amino acids and supports transversion-rich mutations for broader chemical diversity.
Advanced variants of these techniques allow tailored diversity, such as biased spectra or structural alterations. Ω-PCR, an overlap extension-based method, enables controlled mutagenesis in error-prone conditions by adjusting primer overlaps and polymerase fidelity, useful for emphasizing specific mutation types like transversions in targeted regions. Transposon insertion mutagenesis facilitates random in-frame insertions within a gene, promoting domain-level variations or extensions without full recombination, often yielding libraries of 10⁵–10⁷ transformants in E. coli. Insertion and deletion mutagenesis, through approaches like InDel-Assembly, generates variants with precise insertions or deletions (e.g., 1–9 residues) to alter loop lengths or secondary structures, creating focused libraries of 10⁴–10⁶ variants in yeast or bacterial systems with transformation efficiencies up to 10⁹ transformants per μg DNA.
Overall, library sizes typically range from 10⁶ to 10⁹ variants, limited by host transformation efficiency (e.g., 10⁸–10⁹ in electrocompetent E. coli, 10⁶–10⁷ in yeast), which determines how much of the designed sequence space can actually be covered for functional discovery.
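The NNK arithmetic quoted in this section (32 codons, all 20 amino acids, one stop codon) can be checked by enumerating the degenerate codon against the standard genetic code. The sketch below also includes a rough estimate of how many clones must be screened to have a given chance of observing any particular variant. That estimate assumes every codon combination is equally likely and sampled independently, which real libraries only approximate, and the function names are illustrative.

```python
import math
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Subset of IUPAC degenerate base codes needed for NNK and NNN.
DEGENERATE = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT"}


def expand_codons(scheme):
    """All concrete codons encoded by a degenerate triplet such as 'NNK'."""
    return ["".join(c) for c in product(*(DEGENERATE[b] for b in scheme))]


def summarize(scheme):
    """Return (number of codons, number of amino acids, number of stop codons)."""
    encoded = [CODON_TABLE[c] for c in expand_codons(scheme)]
    return len(encoded), len(set(encoded) - {"*"}), encoded.count("*")


def clones_for_coverage(n_variants, probability=0.95):
    """Clones to sample so a given variant appears with the stated probability.

    Assumes n_variants > 1 equally likely variants, sampled independently.
    """
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - 1.0 / n_variants))
```

`summarize("NNK")` confirms 32 codons covering 20 amino acids with a single stop, versus 64 codons and three stops for `summarize("NNN")`. Under the equal-probability assumption, `clones_for_coverage(32)` gives about 95 clones for one NNK site, in line with the common rule of thumb of screening roughly three times the library size; the requirement grows as 32 to the power of the number of sites, which is why simultaneous saturation is usually limited to a few positions.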

Recombination and Chimeragenesis

Recombination and chimeragenesis involve the exchange of gene segments from multiple parental proteins to create chimeric variants with potentially improved or novel functions, enabling the exploration of vast sequence spaces beyond single point mutations. This approach leverages evolutionary principles by mimicking natural recombination, often requiring some sequence homology between parents for efficient crossover events. In protein engineering, these methods generate diverse libraries for subsequent screening or selection and are particularly useful for enhancing activity, stability, or specificity in chimeric constructs.

In vitro recombination techniques dominated early developments in chimeragenesis, starting with DNA shuffling, introduced by Stemmer in 1994. This method fragments homologous parental genes into random pieces of 10-300 base pairs by partial DNase I digestion, then reassembles them through self-primed PCR, yielding chimeras whose number of crossovers increases with the sequence identity of the parents. Applied to TEM-1 β-lactamase, it produced variants with a reported 32,000-fold increase in cefotaxime resistance after three rounds of shuffling. A related technique, the staggered extension process (StEP), developed by Zhao, Arnold, and colleagues in 1998, uses PCR cycles with very short extension times to promote incremental template switching among homologous genes, avoiding fragmentation and reducing bias toward parental sequences. StEP was demonstrated by recombining subtilisin E variants to improve thermostability.

For generating chimeras from low-homology parents, incremental truncation for the creation of hybrid enzymes (ITCHY), pioneered by Ostermeier et al. in 1999, employs exonuclease III to progressively truncate the parental genes, followed by ligation of the truncated fragments to form random single-crossover libraries independent of homology. ITCHY enables the creation of hybrid libraries between genes with little sequence homology, such as those encoding glycinamide ribonucleotide transformylases from E. coli and humans, using reporter fusions such as beta-galactosidase for in-frame selection.
To enhance crossover rates in such libraries, random chimeragenesis on transient templates (RACHITT), described by Coco et al. in 2001, anneals single-stranded parental gene fragments onto a uracil-containing single-stranded template, trims and fills in the hybridized fragments, ligates them, and then destroys the template by uracil-DNA glycosylase treatment to favor recombinants over parental recovery. RACHITT reportedly achieved over 50% chimeric content in its libraries.

Modular assembly methods like Golden Gate shuffling, described by Engler et al. in 2009, use type IIS restriction enzymes to create seamless, directionally cloned chimeras from predefined modules that need not be homologous, enabling one-pot multi-fragment recombination with reported efficiencies exceeding 90% for up to eight parts. This has facilitated the engineering of hybrid pathways, such as modular polyketide synthases for novel polyketide production. Mimicking natural exon shuffling, SCRATCHY, introduced by Lutz et al. in 2001, combines ITCHY truncation (in its variant based on α-phosphorothioate nucleotides) with selection for in-frame fusions to reduce frameshifts, followed by DNA shuffling to increase the number of crossovers. SCRATCHY has reportedly been used to generate libraries from a xylanase gene and a non-homologous partner, yielding chimeras with 10-fold higher activity on insoluble substrates.

In vivo recombination methods offer continuous or high-efficiency alternatives. Homologous recombination in yeast via gap repair, established by Oldenburg et al. in 1997, assembles overlapping fragments into linearized plasmids during transformation, exploiting yeast's efficient homologous recombination machinery for chimeric library construction. This has been applied to evolve hybrid antibodies with improved affinity. For accelerated evolution, phage-assisted continuous evolution (PACE), developed by Esvelt et al. in 2011, links protein function to phage replication in continuously flowing E. coli cultures, allowing many rounds of mutation, selection, and replication per day without manual intervention. In its first demonstration, PACE evolved T7 RNA polymerase variants that recognize a different promoter and initiate transcription with ATP or CTP rather than GTP.
These techniques have produced enzymes with synergistic properties, such as chimeric lipases combining thermostability from one parent with broad substrate specificity from another, demonstrating recombination's power to create functional diversity for industrial and therapeutic applications.
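The dependence of homology-based methods such as DNA shuffling and StEP on sequence identity can be illustrated with a toy simulation. The sketch below is not a model of any published protocol: it assumes pre-aligned parents of equal length, permits template switching only immediately after a short window in which all parents are identical, and uses an arbitrary switching probability. All names and parameters are hypothetical.

```python
import random


def shuffle_chimera(parents, min_identity_window=3, switch_prob=0.3, rng=None):
    """Build one chimera from pre-aligned parents of equal length.

    A template switch is allowed only right after `min_identity_window`
    positions at which every parent carries the same residue, mimicking the
    requirement for local homology. Returns (chimera, number_of_crossovers).
    """
    rng = rng or random.Random()
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be pre-aligned to equal length")

    current = rng.randrange(len(parents))  # index of the template being copied
    chimera = []
    crossovers = 0
    for i in range(length):
        window = slice(max(0, i - min_identity_window), i)
        homologous = (
            i >= min_identity_window
            and len({p[window] for p in parents}) == 1
        )
        if homologous and rng.random() < switch_prob:
            new = rng.randrange(len(parents))
            if new != current:
                crossovers += 1
                current = new
        chimera.append(parents[current][i])
    return "".join(chimera), crossovers


parent_a = "ATGGCTAAAGGTCTGACC"
parent_b = "ATGGCAAAAGGCCTGACT"
rng = random.Random(42)
for _ in range(3):
    print(shuffle_chimera([parent_a, parent_b], rng=rng))
```

With parents that share no identical windows the function never switches templates, which is the situation that homology-independent methods such as ITCHY are designed to address.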

Screening and Selection Systems

In protein engineering, screening and selection systems are high-throughput methods for identifying superior variants from large engineered libraries by evaluating their functional properties, such as enzymatic activity or binding affinity. These approaches enable the rapid assessment of millions to trillions of variants, bridging the gap between library generation and practical application. Screening measures the performance of each variant individually without linking it to survival, while selection imposes a survival advantage on functional variants, allowing iterative enrichment.

Screening methods often use fluorescence-activated cell sorting (FACS) coupled with display technologies, such as yeast surface display, in which protein variants are fused to a cell-wall anchor and labeled with fluorescent probes to quantify binding or activity. This enables sorting of up to 10⁸ cells per hour based on fluorescence intensity, facilitating affinity maturation of antibodies or enzymes. Microfluidic droplet systems encapsulate individual cells or variants in picoliter volumes, allowing compartmentalized activity assays, such as enzymatic turnover detected by fluorescence, with sorting rates on the order of thousands of droplets per second for ultrahigh-throughput evaluation. Plate-based colorimetric tests, performed in multi-well formats, provide a simpler, lower-throughput alternative that detects activity through chromogenic substrates producing visible color changes, suitable for initial screening of up to about 10⁴ variants.

Selection systems couple protein function to host cell survival, enabling stringent enrichment without manual sorting. Antibiotic resistance linkage, often via fusion of the target protein to β-lactamase, confers resistance to ampicillin only when the variant stabilizes the fusion or activates the enzyme, allowing growth-based selection of stable or active proteins from libraries exceeding 10⁹ variants.
Growth-based auxotrophic complementation restores essential biosynthetic pathways in nutrient-deficient media; for instance, variants restoring methionine biosynthesis in auxotrophic E. coli enable colony formation, supporting selection for functional enzymes with enrichment factors of up to 10³-fold per round. Phage-assisted continuous evolution (PACE) accelerates this by linking protein activity to bacteriophage propagation in E. coli hosts, running many mutation and selection cycles per day without manual intervention, as demonstrated in evolving RNA polymerase specificity.

Quantitative metrics in these systems include enrichment factors, which measure the fold increase in the frequency of functional variants relative to inactive ones (typically 10² to 10⁵ per round in FACS-based workflows), providing insight into selection stringency. However, false positives can arise from promiscuous binders or cheater cells that bypass the assay without true function, reducing effective enrichment by up to 50% in biosensor-based screens; strategies like biosensor desensitization mitigate this by raising detection thresholds. Recent advances integrate machine learning to predict hits from screening data, using models trained on sequence-activity pairs to prioritize variants for validation, with reported improvements of up to 10-fold in the rate at which functional variants are identified from diverse libraries.
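The enrichment factor described above can be turned into a rough estimate of how many rounds a campaign needs. The following sketch assumes, as a simplification, that each round multiplies the odds of functional to non-functional variants by a constant factor and ignores false positives; the function names are illustrative.

```python
import math


def enrichment_factor(fraction_before, fraction_after):
    """Fold increase in the fraction of functional variants across one round."""
    return fraction_after / fraction_before


def rounds_to_reach(initial_fraction, target_fraction, per_round_enrichment):
    """Rounds needed if each round multiplies the functional:non-functional odds
    by `per_round_enrichment` (a constant-enrichment assumption)."""
    odds_start = initial_fraction / (1.0 - initial_fraction)
    odds_target = target_fraction / (1.0 - target_fraction)
    return math.ceil(math.log(odds_target / odds_start) / math.log(per_round_enrichment))


# One functional clone per million, aiming for a population that is half functional.
for per_round in (1e2, 1e3, 1e5):
    rounds = rounds_to_reach(1e-6, 0.5, per_round)
    print(f"enrichment {per_round:.0e} per round -> {rounds} round(s)")
```

Working with odds rather than raw fractions keeps the estimate sensible as the functional fraction approaches 1, where a fraction cannot keep growing by the same fold.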

Applications

Enzyme Optimization

Enzyme optimization in protein engineering aims to enhance the catalytic performance of enzymes for industrial and biotechnological applications by improving key parameters such as the specificity constant k_cat/K_M, which measures catalytic efficiency; thermostability, to withstand high temperatures during processing; and solvent tolerance, to operate in non-aqueous environments. These modifications enable enzymes to achieve higher turnover numbers, often exceeding 10³ s⁻¹ for optimized variants, and increased half-lives at elevated temperatures, such as retaining over 80% activity after 100 hours at 60°C. Solvent tolerance improvements, for instance, allow enzymes to maintain activity in organic media like dimethylformamide (DMF), where wild-type counterparts denature rapidly.

A landmark case in directed evolution involved optimizing subtilisin E, a serine protease, for activity in polar organic solvents during the 1990s. Through sequential random mutagenesis and screening, researchers generated variants with up to 38-fold higher activity in 85% DMF compared to the parent enzyme while preserving proteolytic function in aqueous media. This work demonstrated how iterative evolution could adapt enzymes to non-natural environments, paving the way for biocatalysis in organic synthesis. Screening techniques facilitated the identification of these beneficial mutations from large libraries.

In the realm of computational and AI-driven methods, de novo design has produced enzymes for novel reactions, exemplified by Kemp eliminases. Starting from the seminal computational designs, subsequent AI-optimized variants achieved k_cat/K_M values of approximately 10⁵ M⁻¹ s⁻¹ for the Kemp elimination of 5-nitrobenzisoxazole, providing rate accelerations of over 10⁶-fold relative to the uncatalyzed reaction and enabling efficient proton abstraction in designed active sites. These enzymes, often refined without extensive laboratory evolution, highlight the potential of computational design to predict and stabilize catalytic motifs for reactions lacking natural counterparts.
For industrial biofuel production, protein engineering of lipases has focused on enhancing methanol tolerance and reusability. Directed evolution of a Proteus mirabilis lipase yielded variants like Dieselzyme 4, which exhibited significantly increased stability and reusability in up to 40% methanol, facilitating biodiesel synthesis from waste oils with high yields under mild conditions. Such optimizations reduce costs and improve scalability by increasing the enzyme's operational lifetime in solvent-heavy reactions.

At industrial scales, engineering glucose isomerase has revolutionized high-fructose corn syrup (HFCS) production. Engineering of the Thermoanaerobacter ethanolicus enzyme produced thermostable variants operating at 90°C, boosting fructose yields to 55% while extending operational half-life to over 500 hours, thereby lowering enzyme costs by 60-70% in commercial processes. The enzyme's k_cat/K_M improved to around 10⁵ M⁻¹ s⁻¹, enabling continuous immobilized-column operations that process millions of tons of HFCS annually.
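The kinetic and stability parameters used throughout this section are related by simple formulas. The sketch below computes k_cat/K_M, a Michaelis-Menten rate, and the half-life implied by a residual-activity measurement, assuming first-order inactivation; that assumption and the numerical inputs are illustrative and not taken from the studies cited above.

```python
import math


def catalytic_efficiency(kcat_per_s, km_molar):
    """Specificity constant k_cat/K_M in M^-1 s^-1."""
    return kcat_per_s / km_molar


def michaelis_menten_rate(kcat_per_s, km_molar, enzyme_molar, substrate_molar):
    """Initial rate v = k_cat [E] [S] / (K_M + [S]) in M s^-1."""
    return kcat_per_s * enzyme_molar * substrate_molar / (km_molar + substrate_molar)


def half_life_from_residual(residual_fraction, time_h):
    """Half-life (hours) implied by the fraction of activity left after `time_h`,
    assuming first-order (single-exponential) inactivation."""
    return time_h * math.log(0.5) / math.log(residual_fraction)


def residual_activity(half_life_h, time_h):
    """Fraction of activity remaining after `time_h` for a given half-life."""
    return 0.5 ** (time_h / half_life_h)


# Illustrative values: k_cat = 100 s^-1 and K_M = 1 mM give 1e5 M^-1 s^-1.
print(f"k_cat/K_M = {catalytic_efficiency(100.0, 1e-3):.3g} M^-1 s^-1")
print(f"v at [S] = K_M: {michaelis_menten_rate(100.0, 1e-3, 1e-9, 1e-3):.3g} M/s")

# "80% activity after 100 hours" corresponds to this half-life under first-order decay.
t_half = half_life_from_residual(0.80, 100.0)
print(f"implied half-life: {t_half:.0f} h")
print(f"activity left after 500 h: {residual_activity(t_half, 500.0):.2f}")
```

At a substrate concentration equal to K_M the rate is half of k_cat·[E], and at concentrations far below K_M the rate is approximately (k_cat/K_M)·[E]·[S], which is why the specificity constant is the usual figure of merit for comparing engineered variants.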

Therapeutic Protein Design

Therapeutic protein design involves the targeted modification of proteins to enhance their therapeutic potential for medical applications, with a primary emphasis on improving pharmacokinetic properties, biological activity, and safety profiles. Engineers alter protein structures to achieve greater stability against proteolysis, prolonged circulation times, and minimized immune responses, which are critical for effective treatment and patient tolerability. This process often integrates rational and semi-rational approaches to tailor proteins like antibodies and cytokines for specific disease targets, ensuring they maintain functionality while overcoming physiological barriers.

A major focus in therapeutic protein design is the humanization of antibodies to reduce immunogenicity while preserving antigen-binding affinity. Complementarity-determining region (CDR) grafting transfers the CDRs from a non-human antibody onto a human framework, minimizing foreign epitopes and enabling safer clinical use. This technique has been widely adopted in the development of humanized monoclonal antibodies, where CDR grafting retains over 90% of the original binding potency in many cases. For cytokines, half-life extension strategies such as PEGylation covalently attach polyethylene glycol (PEG) chains to the protein surface, reducing renal clearance and enzymatic degradation; for instance, PEGylated interferons like peginterferon alfa-2a exhibit a 10- to 20-fold increase in serum half-life compared to their unmodified counterparts, allowing longer dosing intervals for conditions like hepatitis C. Additional strategies include Fc engineering of antibodies to modulate antibody-dependent cellular cytotoxicity (ADCC), where mutations in the Fc region, such as those enhancing binding to FcγRIIIa receptors, can increase ADCC activity by up to 50-fold, boosting antitumor effects without altering the antigen-binding site.
Deimmunization further addresses immunogenicity by computationally identifying and removing T-cell epitopes through targeted substitutions, significantly reducing predicted immunogenic sequences while preserving protein function. Computational design tools aid in optimizing candidate sequences during these processes.

Despite these advances, challenges persist in formulation and delivery. Protein aggregation in therapeutic formulations, often triggered by hydrophobic interactions or stress during manufacturing, can lead to reduced efficacy and potential immunogenicity, with aggregation levels exceeding 1-5% posing regulatory hurdles. Oral delivery faces significant barriers, including proteolytic degradation in the gastrointestinal tract and poor mucosal permeability, resulting in bioavailability below 1% for most unmodified proteins. Regulatory oversight by the FDA ensures the safety and efficacy of engineered biologics; for example, biosimilars of the therapeutic antibody adalimumab (Humira), such as adalimumab-aaty (Yuflyma), have undergone rigorous evaluation for structural and functional similarity, and multiple such biosimilars have reached the US market from 2023 onward to expand access while maintaining therapeutic equivalence.

Materials and Biosensors

Protein engineering has enabled the development of advanced biomaterials by modifying natural protein structures to enhance biocompatibility, mechanical properties, and biodegradability for applications in scaffolds and tissue constructs. Silk fibroin, derived from silkworm cocoons, has been engineered through genetic modifications and recombinant expression to create variants with tunable beta-sheet content, improving solubility and gelation for printed scaffolds in tissue engineering. These engineered fibroin bioinks support cell viability and proliferation, forming porous structures that mimic extracellular matrices for tissue repair and regeneration. Similarly, collagen, the primary component of connective tissues, is produced recombinantly in heterologous hosts to yield human-like collagen with reduced immunogenicity and enhanced stability for scaffolds. Recombinant human collagen variants incorporate specific mutations to improve assembly and cross-linking, facilitating the creation of hydrogels and decellularized matrices that promote cell adhesion and vascularization in engineered tissue models.

Key design principles in protein engineering for materials and biosensors leverage modular domains to control self-assembly and responsiveness. Multimerization domains, such as designed coiled-coils, enable precise oligomerization of protein subunits into higher-order structures like nanofibers and cages, driving self-assembly in biomaterials. These coiled-coil motifs, with their heptad repeat sequences, allow orthogonal interactions for hierarchical organization, as seen in synthetic protein hydrogels. Responsiveness is achieved through conformational switches engineered into protein scaffolds, where pH-sensitive histidine networks or azobenzene-based light-responsive elements induce reversible folding changes. For instance, proteins with buried histidines exhibit sharp pH-dependent transitions from compact to extended states, enabling stimuli-responsive materials that adapt to environmental cues like acidity in wounds.
Light-induced switches, incorporating photoisomerizable groups, trigger alpha-helix uncoiling for dynamic control of binding in biosensors.

In biosensors, engineered proteins provide sensitive, real-time detection of analytes through conformational or luminescent changes at abiotic interfaces. Luciferase enzymes, such as NanoLuc variants, have been allosterically modified to couple analyte binding (for example, of small molecules or ions) with enhanced luminescence, allowing wash-free detection in point-of-care devices. These synthetic allostery designs achieve sub-nanomolar sensitivity for metabolites like glucose and can be integrated into portable platforms for diagnostics. Affibody scaffolds, small three-helix bundle proteins derived from staphylococcal protein A, are engineered for high-affinity binding to targets like disease biomarkers, forming compact probes for lateral flow assays in diagnostics. Optimized affibodies with mutated binding surfaces enable rapid, antibody-free detection of proteins in serum, supporting multiplexed point-of-care tests for infectious diseases.

Notable examples illustrate the integration of these principles in functional devices. Virus-like particles (VLPs), assembled from engineered coat proteins like those from avian retroviruses, encapsulate therapeutic cargos through surface modifications with coiled-coil adapters, enabling targeted delivery across cellular barriers without viral genetic material. These protein-only VLPs, with diameters of 100-150 nm, achieve efficient cytosolic release of enzymes or antibodies. Amyloid-inspired nanowires, constructed from beta-sheet-rich peptides, form conductive one-dimensional structures for bioelectronic interfaces. Engineering amyloidogenic sequences with metal-binding motifs yields nanowires up to microns in length with conductivities exceeding 10 S/cm, suitable for biosensors and energy-harvesting materials.

Notable Examples and Case Studies

Industrial Enzymes

Protein engineering has significantly advanced the development of industrial enzymes, enabling their optimization for large-scale manufacturing processes such as biofuel production, food processing, and detergent formulation. Through techniques like directed evolution, wild-type enzymes are iteratively mutated and selected over multiple rounds to enhance properties like thermostability, solvent tolerance, and catalytic efficiency, transforming them into robust biocatalysts that outperform their natural counterparts in harsh conditions.

A prominent example is the engineering of Taq DNA polymerase, originally derived from the thermophilic bacterium Thermus aquaticus, which has been further evolved for enhanced thermostability in polymerase chain reaction (PCR) applications central to industrial biotechnology. Directed evolution methods, such as high-temperature isothermal compartmentalized self-replication, have produced variants like v5.9 (a chimera of Taq's large fragment, Klentaq, and Geobacillus stearothermophilus polymerase) that maintain activity after exposure to 95°C, improving processivity and reliability in high-throughput DNA amplification for diagnostics and manufacturing. These evolved polymerases facilitate scalable amplification workflows, reducing cycle times and error rates in industrial settings.

Another key case involves directed evolution of α-amylases for starch processing, where bacterial enzymes like those from Bacillus amyloliquefaciens are optimized for starch liquefaction in the food and detergent industries. Multi-round error-prone PCR and DNA shuffling have yielded mutants such as BAA 42, which shifts the pH optimum from 6 to 7 and boosts activity fivefold at pH 10, alongside a 1.5-fold increase in specific activity, making it well suited to alkaline processing at elevated temperatures. Similarly, variant BAA 29 achieves a ninefold higher specific activity while preserving the wild-type pH profile, enabling more efficient conversion of starch to glucose syrups and reducing processing times in syrup production. These engineered enzymes deliver substantial economic and environmental impacts, including cost reductions through process intensification.
For instance, in detergent formulations, evolved proteases and lipases enable effective cleaning at lower temperatures, yielding up to 50% energy savings by shifting wash cycles from 40°C to 20°C, which also extends fabric life and cuts operational expenses in commercial laundering. Sustainability benefits arise from replacing chemical catalysts with bio-based enzymes, minimizing waste and hazardous byproducts in processing and refining sectors and thereby lowering the overall environmental footprint of manufacturing.

Commercial successes underscore the field's maturity, with companies like Novozymes (now part of Novonesis) leading through a portfolio exceeding 500 industrial enzyme products tailored for applications in food, feed, and household care. This dominance reflects the progression from single wild-type enzymes to optimized variants via iterative engineering cycles, driving widespread adoption. The global industrial enzymes market, fueled by these innovations, is projected to reach approximately USD 8 billion in 2025, highlighting the sector's growth in sustainable bioprocessing.

Medical Therapeutics

Protein engineering has significantly advanced medical therapeutics by enabling the design of biologics with enhanced efficacy, specificity, and pharmacokinetic properties for treating diseases such as cancer and autoimmune disorders. Monoclonal antibodies represent a cornerstone of these innovations, with engineering strategies optimizing their binding affinity, effector functions, and circulation time to improve patient outcomes in oncology. For instance, pembrolizumab (Keytruda), a humanized IgG4 monoclonal antibody targeting the PD-1 receptor, uses an IgG4 backbone with a stabilizing hinge mutation (S228P), which keeps antibody-dependent cellular cytotoxicity low while maintaining a prolonged serum half-life of approximately 22 days, allowing for less frequent dosing in advanced non-small cell lung cancer and melanoma treatments. Clinical trials have demonstrated that pembrolizumab monotherapy yields a 5-year overall survival rate of up to 31.9% in patients with PD-L1-positive metastatic non-small cell lung cancer, a substantial improvement over historical benchmarks of around 15-20%.

Bispecific T-cell engagers, another engineered protein class, redirect cytotoxic T cells to tumor antigens, offering potent antitumor activity with reduced systemic toxicity compared to traditional chemotherapies. Blinatumomab, a bispecific single-chain variable fragment fusion protein targeting CD19 on B cells and CD3 on T cells, was designed to form a cytolytic synapse, leading to its approval for relapsed or refractory B-cell acute lymphoblastic leukemia. In clinical studies, blinatumomab improved median overall survival from 4.0 months with standard chemotherapy to 7.7 months in relapsed/refractory cases, with even greater benefits in minimal residual disease-negative patients, where it extended relapse-free survival by up to 25%.

Fc-fusion proteins extend the therapeutic utility of cytokines, hormones, and receptor domains by leveraging the Fc region's interaction with the neonatal Fc receptor (FcRn) to prolong serum half-life and enhance efficacy.
Examples include etanercept, a TNF receptor-Fc fusion for rheumatoid arthritis, which achieves a half-life of 4-5 days versus minutes for unfused TNF inhibitors, enabling weekly dosing and sustained disease control. Similarly, romiplostim, a thrombopoietin agonist-Fc fusion, stimulates platelet production in immune thrombocytopenia with a half-life extension that reduces dosing frequency from daily to weekly, improving convenience and adherence.

Chimeric antigen receptor (CAR) T-cell therapies rely on protein-engineered receptors expressed on T cells to confer tumor-specific recognition, bypassing MHC restriction for enhanced precision. The CAR construct, comprising an extracellular antigen-binding domain (often a single-chain variable fragment), a hinge and transmembrane region, and intracellular signaling motifs (e.g., CD3ζ and CD28 or 4-1BB), is optimized for high-affinity binding to targets like CD19 in B-cell malignancies. Approved therapies such as axicabtagene ciloleucel have achieved complete remission rates of 50-80% in refractory large B-cell lymphoma, with 3-year overall survival rates around 47%, a marked improvement over prior salvage rates below 30%.

Glycoengineering further refines these therapeutics by modulating N-linked glycosylation in the Fc domain to tune effector functions and mitigate adverse effects, such as excessive immune activation leading to cytokine release syndrome. For example, afucosylation enhances ADCC while reducing off-target inflammation, as reported for glycoengineered therapeutic antibodies, where it lowered infusion-related reactions by altering FcγRIIIa binding affinity. This approach has enabled dose reductions in treatment regimens, correlating with 20-30% improvements in tolerability profiles without compromising antitumor efficacy.

In recent developments as of 2025, artificial intelligence-driven de novo design has produced miniprotein inhibitors as novel antivirals, offering compact scaffolds with high stability and specificity. These AI-optimized miniproteins, such as multivalent decoys targeting the SARS-CoV-2 spike protein, neutralize viral variants with picomolar affinity and demonstrate prophylactic protection in animal models, potentially addressing emerging viral threats with fewer side effects than larger biologics.

Challenges and Future Directions

Limitations in Prediction and Scalability

One major limitation in protein engineering lies in the accurate prediction of mutational effects, particularly due to epistasis, where the impact of a mutation depends on the genetic background and interactions with other mutations. Epistasis complicates the forecasting of multi-mutation outcomes, as non-additive effects can drastically alter protein fitness in ways that single-mutation models fail to capture, reducing the success of rational design approaches. For instance, higher-order epistasis has been shown to play a critical role in sequence-function relationships, making it challenging to predict beneficial variants without extensive experimental validation. This unpredictability slows evolutionary processes in both natural and laboratory settings, often leading to suboptimal engineering outcomes.

Computational tools like AlphaFold have revolutionized structure prediction but remain limited in capturing protein dynamics, as they primarily output static structures rather than the conformational ensembles essential for function. AlphaFold's reliance on equilibrium states overlooks transient dynamics and allosteric effects, which are crucial for enzymatic activity and binding, thus hindering the design of proteins with desired kinetic properties. These gaps in AI-based prediction underscore the need for integrated models that incorporate dynamic simulations to better navigate complex fitness landscapes. Recent advancements, such as AlphaFold 3, released in May 2024, have improved predictions for multi-molecule complexes and some dynamic aspects, but challenges in modeling full dynamics persist.

Scalability in protein engineering is constrained by bottlenecks in library expression and screening, including the frequent formation of inclusion bodies during recombinant production in bacterial hosts, which results in insoluble, misfolded proteins that require costly refolding or alternative expression systems.
High-throughput screening of variant libraries, often comprising millions of candidates, incurs substantial expenses due to equipment, reagents, and labor demands. Experimental challenges further exacerbate these issues, such as off-target mutational effects that introduce unintended functional alterations and poor reproducibility when transferring engineered proteins between expression hosts, for example from E. coli to mammalian cells, where post-translational modifications differ significantly. Success rates in directed evolution remain low, with only a small fraction of generated variants typically exhibiting the desired functionality, highlighting the vast, rugged nature of protein fitness landscapes, in which most sequences are non-functional "holes." Addressing these landscapes requires improved mapping techniques to identify navigable paths, but current methods struggle with the combinatorial explosion of possibilities, limiting the efficiency of engineering campaigns.
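The non-additivity described under epistasis above can be quantified from four fitness measurements. The sketch below uses one common convention, the deviation of the double mutant from the sum of single-mutant effects on a log scale; other definitions exist, and the fitness values shown are made up for illustration.

```python
import math


def pairwise_epistasis(f_wt, f_a, f_b, f_ab):
    """Log-scale epistasis between mutations A and B.

    Returns 0 when the double mutant matches the multiplicative (log-additive)
    expectation, a positive value when it is fitter than expected, and a
    negative value when it is less fit than expected.
    """
    effect_a = math.log(f_a / f_wt)
    effect_b = math.log(f_b / f_wt)
    effect_ab = math.log(f_ab / f_wt)
    return effect_ab - (effect_a + effect_b)


def has_sign_epistasis(f_wt, f_a, f_b, f_ab):
    """True if mutation A is beneficial in one background and deleterious in the other."""
    a_in_wt = f_a - f_wt       # effect of A on the wild-type background
    a_in_b = f_ab - f_b        # effect of A on the background already carrying B
    return a_in_wt * a_in_b < 0


# Hypothetical relative fitness values (wild type = 1.0).
examples = {
    "log-additive": (1.0, 2.0, 3.0, 6.0),
    "synergistic": (1.0, 2.0, 3.0, 12.0),
    "sign epistasis": (1.0, 1.5, 2.0, 1.2),
}
for label, values in examples.items():
    eps = pairwise_epistasis(*values)
    print(f"{label}: epsilon = {eps:+.3f}, sign epistasis = {has_sign_epistasis(*values)}")
```

A model built only from single-mutant data implicitly assumes an epistasis term of zero for every pair, so any non-zero value is an error that such a model cannot correct without double-mutant measurements.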

Emerging Trends and Ethical Considerations

Protein language models (PLMs) represent a transformative emerging technology in protein engineering, enabling the prediction and design of protein structures and functions from sequence data alone. Models such as ESM-2, developed by Meta AI in 2022, leverage self-supervised learning on vast protein sequence datasets to generate embeddings that capture evolutionary relationships and physicochemical properties, facilitating zero-shot predictions of variant fitness and stability. These PLMs outperform traditional methods in tasks like secondary structure prediction and have been integrated into workflows for rapid prototyping of novel enzymes and therapeutics.

CRISPR-Cas systems are advancing in-cell protein engineering by allowing precise genomic modifications directly within living cells, bypassing the need for external expression systems. Engineered variants like nickases and base editors enable targeted insertions, deletions, or substitutions to optimize endogenous proteins for enhanced activity or specificity, as demonstrated in applications for metabolic pathway rewiring. Looking toward the 2030s, quantum computing holds promise for simulating complex protein dynamics at scales unattainable by classical computers, potentially accelerating the design of large multidomain proteins through variational quantum algorithms.

Hybrid approaches combining artificial intelligence with directed evolution are streamlining protein optimization by using machine learning to prioritize promising variants from massive libraries, substantially reducing the number of experimental iterations. For instance, AI-guided platforms integrate generative models with high-throughput assays to evolve enzymes with tailored catalytic properties. In synthetic biology, de novo protein design uses computational tools to assemble non-natural folds into entirely novel pathways, enabling the creation of custom metabolic routes for biofuel production or xenobiotic degradation.
Recent 2025 developments include AI-powered general-purpose strategies intended to make protein engineering more accessible, and newly described ancient rules of protein stability that can guide designs.

Ethical considerations in protein engineering are increasingly prominent due to dual-use risks, where technologies for beneficial applications, such as vaccine design, could be repurposed to engineer potent toxins or pathogens. Equity issues arise from unequal access to designer proteins, particularly in low-resource settings, where the cost of advanced tools can exacerbate global health disparities despite their potential to deliver affordable therapeutics. Intellectual property challenges further complicate the field, as overlapping patents on engineered proteins and AI algorithms hinder collaborative innovation and commercialization in biotechnology.

Looking ahead, protein engineering is poised to drive personalized medicine by enabling patient-specific protein therapeutics, such as customized antibodies for rare diseases, through iterative AI-optimization cycles. The global market for protein engineering is projected to reach approximately $10.4 billion by 2031, fueled by demand in biopharmaceuticals and industrial biocatalysis.

References

  1. [1]
    Protein Engineering: A New Frontier for Biological Therapeutics - PMC
    Protein engineering holds the potential to transform the metabolic drug landscape through the development of smart, stimulus-responsive drug systems.Missing: definition | Show results with:definition
  2. [2]
    Engineering protein-based therapeutics through structural and ...
    Apr 27, 2023 · This review chronicles both well-established and emerging design strategies that have enabled this paradigm shift by transforming protein-based structures.
  3. [3]
    2 Protein Engineering Methods and Applications - Academia.edu
    Protein engineering is the design of new enzymes or proteins with new or desirable functions. It is based on the use of recombinant DNA technology to change ...
  4. [4]
    Protein engineering in the deep learning era - PMC - PubMed Central
    Dec 26, 2024 · This review frames frequently researched problems in protein understanding and engineering from the perspective of deep learning.
  5. [5]
    Protein Engineering - an overview | ScienceDirect Topics
    Protein engineering is defined as the deliberate modification of amino acids in a protein, typically utilizing the known three-dimensional structure and ...
  6. [6]
    Protein Engineering - an overview | ScienceDirect Topics
    Protein engineering is defined as an approach to synthesize enzyme or protein variants by altering their sequences to select variants with improved properties ...Missing: paper | Show results with:paper
  7. [7]
    Structure-Function Relationship - an overview | ScienceDirect Topics
    Structure–function relationships refer to the connections between the atomic structure of proteins and their functional properties, often studied through ...
  8. [8]
    Sequence-structure-function relationships in the microbial protein ...
    Apr 26, 2023 · Here we explore areas of the protein universe where similar protein functions can be achieved by different sequences and different structures.
  9. [9]
    Protein Biosynthesis - an overview | ScienceDirect Topics
    Protein biosynthesis comprises the translation of the message contained in mRNA and the assembly of amino acids in the order dictated by mRNA. Three types of ...
  10. [10]
    Genetic Code - an overview | ScienceDirect Topics
    The genetic code refers to the system property that maps the nucleotide sequence of mRNA onto instructions for protein synthesis.
  11. [11]
    Advances in protein engineering and its application in synthetic ...
    Protein engineering has been used successfully in fields ranging from medicine to food science to biofuels. Applications of protein engineering include ...
  12. [12]
    The Nobel Prize in Physiology or Medicine 1978 - Press release
    Werner Arber started this field of research in Geneva during the 1960's. He discovered restriction enzymes. Arber was studying an earlier known phenomenon, “ ...
  13. [13]
    The Nobel Prize in Physiology or Medicine 1978 - NobelPrize.org
    The Nobel Prize in Physiology or Medicine 1978 was awarded jointly to Werner Arber, Daniel Nathans and Hamilton O. Smith for the discovery of restriction ...
  14. [14]
    Practically Useful: What the Rosetta Protein Modeling Suite Can Do ...
    Rosetta is a unified software package for protein structure prediction and functional design. ... Macromolecular modeling with Rosetta. Das, Rhiju; Baker, David.
  15. [15]
    Growth of novel protein structural data - PNAS
    Contrary to popular assumption, the rate of growth of structural data has slowed, and the Protein Data Bank (PDB) has not been growing exponentially since 1995.
  16. [16]
    Highly accurate protein structure prediction with AlphaFold - Nature
    Jul 15, 2021 · AlphaFold greatly improves the accuracy of structure prediction by incorporating novel neural network architectures and training procedures ...
  17. [17]
    Press release: The Nobel Prize in Chemistry 2024 - NobelPrize.org
    Oct 9, 2024 · Demis Hassabis and John Jumper have developed an AI model to solve a 50-year-old problem: predicting proteins' complex structures. These ...
  18. [18]
    Physiology, Proteins - PubMed
    Nov 14, 2022 · Proteins can be further defined by their four structural levels: primary, secondary, tertiary, and quaternary. The first level is the primary ...
  19. [19]
    Protein Structure - an overview | ScienceDirect Topics
    Protein structure has four levels of organization. The primary structure is represented by sequence of amino acids bound together by peptide bonds.
  20. [20]
    Role of Hydrophobic Interactions in Protein Folding
    Aug 29, 2006 · These interactions strengthen hydrogen bonds and electrostatic interactions between charged groups both by reducing the entropy of otherwise ...
  21. [21]
    Secondary Forces in Protein Folding - PMC - PubMed Central
    The dominant contributors to protein folding include the hydrophobic effect and conventional hydrogen bonding, along with Coulombic interactions and van der ...
  22. [22]
    Uncovering protein structure - PMC - PubMed Central - NIH
    The driving force for protein folding is a result of hydrophobic collapse, hydrogen bond formation, electrostatic interactions and van der Waals interactions ...
  23. [23]
    Principles of Protein Stability and Their Application in Computational ...
    Jan 26, 2018 · Similar to buried hydrogen bonds, buried salt bridges also have an important role in specifying the native conformation, since misfolded states, ...
  24. [24]
    [PDF] Energetics of Protein Folding - Stanford University
    The energetics of protein folding determine the 3D structure of a folded protein. Knowledge of the energetics is needed to predict the 3D structure.
  25. [25]
    Thermally versus Chemically Denatured Protein States | Biochemistry
    May 14, 2019 · Protein unfolding thermodynamic parameters are conventionally extracted from equilibrium thermal and chemical denaturation experiments.
  26. [26]
    Cold denaturation as a tool to measure protein stability - ScienceDirect
    Highlights · Protein stability is not described adequately by unfolding temperature. · The area under the stability curve offers a better assessment of stability.
  27. [27]
    Molecular chaperones in protein folding and proteostasis - Nature
    Jul 20, 2011 · Because protein molecules are highly dynamic, constant chaperone surveillance is required to ensure protein homeostasis (proteostasis). Recent ...
  28. [28]
    Mega-scale experimental analysis of protein folding stability ... - Nature
    Jul 19, 2023 · b, Two examples of stabilizing mutations found by our assay, along with the distribution of ΔΔG values for these mutation types. The ...
  29. [29]
    Using protein engineering to understand and modulate aggregation
    Feb 19, 2020 · Protein aggregation occurs through a variety of mechanisms, initiated by the unfolded, non-native, or even the native state itself.
  30. [30]
    Stabilization of neurotoxic Alzheimer amyloid-β oligomers by protein ...
    Here we use protein engineering to address these issues and to provide a method to stabilize toxic Aβ oligomers for structural and functional studies.
  31. [31]
    Central Dogma of Molecular Biology - Nature
    Aug 8, 1970 · Central Dogma of Molecular Biology. FRANCIS CRICK. Nature volume 227, pages 561–563 (1970)Cite this article.
  32. [32]
    [PDF] Marshall Nirenberg - Nobel Lecture
    The genetic code is shown in Fig. 3. Most triplets correspond to amino acids. Codons for the same amino acid usually differ only in the base occupying the ...
  33. [33]
  34. [34]
    Human non‐synonymous SNPs: server and survey - Oxford Academic
    Most human genetic variation is represented by single nucleotide polymorphisms (SNPs) and many of them are believed to cause phenotypic differences between ...
  35. [35]
    Comparison of two codon optimization strategies to enhance ...
    Mar 3, 2011 · Codon optimization affects translation rate which, in turn, may alter protein structure and function. It has been described that inclusion ...
  36. [36]
    DNA replication fidelity in Escherichia coli: a multi-DNA polymerase ...
    The error rate during DNA replication is as low as 10⁻⁹ to 10⁻¹¹ errors per base pair. How this low level is achieved is an issue of major interest.
  37. [37]
    Toward complete rational control over protein structure and function ...
    This review will focus primarily on structure-based protein design methodology, which can be divided into three broadly overlapping categories based on ...
  38. [38]
    Beyond directed evolution - semi-rational protein engineering ... - NIH
    This review focuses on recent engineering and design examples that require screening or selection of small libraries.
  39. [39]
    Advances in protein structure prediction and design - Nature
    Aug 15, 2019 · In this Review, we describe current approaches for protein structure prediction and design and highlight a selection of the successful applications they have ...
  40. [40]
    Rational design of enzyme activity and enantioselectivity - Frontiers
    Here, we reviewed the recent advances of applying the rational design strategy to engineer enzyme functions including activity and enantioselectivity.
  41. [41]
    Consensus protein design - PMC - NIH
    Jun 5, 2016 · Multiple sequence alignments (MSAs) and phylogenetic analyses have become standard tools for exploring sequence conservation (Steipe et al., ...
  42. [42]
    [PDF] DIRECTED EVOLUTION OF ENZYMES AND BINDING PROTEINS
    Oct 3, 2018 · Frances H Arnold reported the directed evolution of subtilisin E to obtain an enzyme variant which was active in a highly unnatural (denaturing) ...
  43. [43]
    Directed evolution of polymerase function by compartmentalized self ...
    We describe compartmentalized self-replication (CSR), a strategy for the directed evolution of enzymes, especially polymerases.
  44. [44]
    Improving fragment-based ab initio protein structure assembly using ...
    Aug 18, 2021 · Ab initio protein structure prediction ... Improved de novo structure prediction in CASP11 by incorporating coevolution information into Rosetta.
  45. [45]
    TOUCHSTONE: An ab initio protein structure prediction method that ...
    A threading-based method of secondary and tertiary restraint prediction has been developed and applied to ab initio folding.
  46. [46]
    Accurate structure prediction of biomolecular interactions ... - Nature
    May 8, 2024 · Here we describe our AlphaFold 3 model with a substantially updated diffusion-based architecture that is capable of predicting the joint structure of complexes.
  47. [47]
    De novo design of protein structure and function with RFdiffusion
    Jul 11, 2023 · De novo protein design seeks to generate proteins with specified structural and/or functional properties, for example, making a binding ...
  48. [48]
    Direct-coupling analysis of residue coevolution captures native ...
    Our findings suggest that contacts predicted by DCA can be used as a reliable guide to facilitate computational predictions of alternative protein conformations ...
  49. [49]
    Protein 3D Structure Computed from Evolutionary Sequence Variation
    In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co- ...
  50. [50]
    Mutual Information in Protein Multiple Sequence Alignments ...
    We used information theory to identify nonconserved amino acid residue pairs that coevolve for reasons of structure or function in a diverse set of protein ...
  51. [51]
    Tailored design of protein nanoparticle scaffolds for multivalent ...
    Aug 4, 2020 · We designed self-assembling protein nanoparticles with geometries tailored to present the ectodomains of influenza, HIV, and RSV viral glycoprotein trimers.
  52. [52]
  53. [53]
  54. [54]
  55. [55]
  56. [56]
    A method for random mutagenesis of a defined DNA segment using ...
    D. Leung, E. Chen, D. Goeddel · Published 1989 · Biology, ...
  57. [57]
    Critical evaluation of random mutagenesis by error-prone ... - PubMed
    May 1, 2009 · Error-prone PCR had the highest mutation rates, while chemical and biological methods had low mutation rates. A combination of methods is ...
  58. [58]
  59. [59]
    Random Mutagenesis Using a Mutator Strain
    Propagation of the genes cloned in plasmids through a mutator strain, like Escherichia coli XL1-red, produces randomly mutagenized plasmid libraries. This ...
  60. [60]
    Sequence saturation mutagenesis (SeSaM): a novel method for ...
    SeSaM is a conceptually novel and practically simple method that truly randomizes a target sequence at every single nucleotide position.
  61. [61]
    Robust one-Tube Ω-PCR Strategy Accelerates Precise Sequence ...
    Ω-PCR is based on an overlap extension site-directed mutagenesis technique, and is named for its characteristic Ω-shaped secondary structure during PCR. Ω-PCR ...
  62. [62]
    Methods for enzyme library creation: Which one will you choose?
    Jul 15, 2021 · The most basic approach for the generation of targeted mutations is site-directed mutagenesis (SDM), introduced by Nobel laureate Michael Smith ...
  63. [63]
    High Throughput Screening and Selection Methods for Directed ...
    In this review, we focus on high throughput screening and selection methods for evolutionary enzyme engineering and highlight their significant applications.
  64. [64]
    Applications of yeast surface display for protein engineering - NIH
    This review primarily focuses on the applications of yeast display from a protein engineering perspective, including examples of protein affinity maturation ...
  65. [65]
    Ultrahigh-throughput screening in drop-based microfluidics ... - PNAS
    We present a general ultrahigh-throughput screening platform using drop-based microfluidics that overcomes these limitations and revolutionizes both the scale ...
  66. [66]
    Cheating the cheater: Suppressing false positive enrichment during ...
    Our work outlines a methodology to successfully overcome enrichment of false positives during strain/protein engineering campaigns that utilize biosensors to ...
  67. [67]
    Machine learning-aided design and screening of an emergent ...
    Mar 5, 2024 · Here we describe a proof-of-principle of how such screening, in silico and in vitro, can be achieved for ML-generated variants of a protein that forms ...
  68. [68]
    Building Enzymes through Design and Evolution | ACS Catalysis
    Sep 7, 2023 · The optimization of KE07's activity to give a kcat/KM value of ∼2600 s⁻¹ M⁻¹ and an ∼10⁶-fold rate acceleration (kcat/kuncat) involved ...
  69. [69]
    Random Mutagenesis to Enhance Activity of Subtilisin E in Polar ...
    Nov 1, 1991 · Enzyme engineering for nonaqueous solvents II. Additive effects of mutations on the stability and activity of subtilisin E in polar organic ...
  70. [70]
    Directed evolution: Creating biocatalysts for the future - ScienceDirect
    You and Arnold, 1996. L. You, F.H. Arnold. Directed evolution of subtilisin E in Bacillus subtilis to enhance total activity in aqueous dimethylformamide.
  71. [71]
    Complete computational design of high-efficiency Kemp elimination ...
    Jun 18, 2025 · We present a fully computational workflow for designing efficient enzymes in TIM-barrel folds using backbone fragments from natural proteins.
  72. [72]
    Protein engineering of xylose (glucose) isomerase from ...
    Protein engineering of xylose (glucose) ... isomerase from Thermoanaerobacter ethanolicus and its application in production of high fructose corn syrup.
  73. [73]
    Industrial Applications of Engineered Glucose Isomerase
    Feb 9, 2024 · Engineered versions of glucose isomerase have dramatically improved production efficiency, sustainability, and cost-effectiveness of high-fructose corn syrup ...
  74. [74]
    Full article: Antibody humanization methods – a review and update
    CDR grafting with vernier zone retaining has been the most common method for production of humanized antibodies. For example, Herceptin (Trastuzumab), an FDA- ...
  75. [75]
    Research progress on the PEGylation of therapeutic proteins and ...
    This review presents recent progress in the development and application of PEGylated therapeutic proteins and peptides (TPPs).
  76. [76]
    What is the future of PEGylated therapies? - PMC - PubMed Central
    Nov 19, 2015 · PEGylation is the most established half-life extension technology in the clinic with proven safety in humans for over two decades.
  77. [77]
    Fc-Engineered Therapeutic Antibodies: Recent Advances and ...
    Fc engineering aims to enhance the effector functions or half-life of therapeutic antibodies by modifying their Fc regions.
  78. [78]
    Boosting therapeutic potency of antibodies by taming Fc domain ...
    Nov 18, 2019 · In this report, we review Fc engineering efforts to improve therapeutic potency, and propose future antibody engineering directions that can fulfill unmet ...
  79. [79]
    Deimmunization of protein therapeutics – Recent advances in ... - NIH
    Dec 29, 2020 · This review highlights the most recent advances and current challenges in the deimmunization of protein therapeutics, with a special focus on ...
  80. [80]
    Stabilization challenges and aggregation in protein-based ... - NIH
    Dec 11, 2023 · In this review, we have discussed some features of protein aggregation during production, formulation and storage as well as stabilization strategies.
  81. [81]
    Therapeutic Protein Aggregation: Mechanisms, Design, and Control
    This review presents how and why a mechanistic approach to design and control of aggregation can be valuable, as well as highlighting areas for improvement.
  82. [82]
    Barriers and Strategies for Oral Peptide and Protein Therapeutics ...
    Mar 21, 2025 · This review presents oral PPs as a promising platform, highlighting the key barriers and strategies to transform their therapeutic landscapes.
  83. [83]
    Oral delivery of therapeutic peptides and proteins - ScienceDirect.com
    This review provides an overview about the different barriers for oral peptide and protein delivery and highlights the progress made on lipid-based ...
  84. [84]
  85. [85]
    Navigating adalimumab biosimilars: an expert opinion - PMC - NIH
    Oct 19, 2023 · This article explores the characteristics of various adalimumab biosimilars to help clinicians navigate the various options available across Europe and the USA.
  86. [86]
    Evolution of a Thermophilic Strand-Displacing Polymerase Using ...
    Apr 9, 2018 · Compartmentalized self replication (CSR) is widely used for in vitro evolution of thermostable DNA polymerases able to perform PCR in emulsion.
  87. [87]
    Directed evolution of a bacterial α-amylase: Toward enhanced pH ...
    α-Amylases, in particular, microbial α-amylases, are widely used in industrial processes such as starch liquefaction and pulp processes, and more recently ...
  88. [88]
    New Enzyme Technology for liquid Detergents: Improved Washing ...
    After all, reducing the washing temperature from 40 °C to 20 °C can result in energy savings of up to 50 percent (Source: Lav temperatur/koldvask vaskemiddel, ...
  89. [89]
    NOVOZYMES — Driven by research and scientists - Nature
    Jan 11, 2001 · Novozymes is the world's largest discoverer, manufacturer and marketer of industrial enzymes, with over 500 products.
  90. [90]
    Industrial Enzymes Market Size, Share | Industry Report, 2033
    Market size value in 2025. USD 7,997.6 million. Revenue forecast in 2033. USD 12,640.2 million. Growth rate. CAGR of 6.2% from 2025 to 2033. Base year for ...
  91. [91]
    Fc-fusion proteins: new developments and future perspectives - PMC
    Perhaps most important, the presence of the Fc domain markedly increases their plasma half-life, which prolongs therapeutic activity, owing to its interaction ...
  92. [92]
    Pembrolizumab: Uses, Interactions, Mechanism of Action - DrugBank
    The terminal half-life of pembrolizumab is 22 days. Clearance. Clearance is moderately lower at steady-state (195 mL/day) than after the first dose (252 mL ...
  93. [93]
    KEYTRUDA® (pembrolizumab) Demonstrates Long-Term Survival ...
    Oct 20, 2025 · Five-year exploratory follow-up analysis of KEYNOTE-671 continued to show clinically meaningful improvements in overall survival and ...
  94. [94]
    Blinatumomab, a Bispecific T-cell Engager (BiTE(®)) for ... - PubMed
    Blinatumomab is a bispecific T-cell engager (BiTE(®)) antibody construct that transiently links CD19-positive B cells to CD3-positive T cells.
  95. [95]
    Impact of blinatumomab on patient outcomes in relapsed/refractory ...
    Oct 2, 2018 · In R/R B-cell ALL, blinatumomab was associated with an improved median overall survival of 7.7 months vs 4.0 months with traditional chemotherapy.
  96. [96]
    Blinatumomab for MRD-Negative Acute Lymphoblastic Leukemia in ...
    Jul 24, 2024 · The addition of blinatumomab to consolidation chemotherapy in adult patients in MRD-negative remission from BCP-ALL significantly improved overall survival.
  97. [97]
    Fc Fusion Protein: Structure, Function, and Clinical Application
    Fusion of IL-2 to an IgG Fc domain significantly extends its circulating half-life, thereby supporting more sustained therapeutic activity.
  98. [98]
    The Evolving Protein Engineering in the Design of Chimeric Antigen ...
    Dec 27, 2019 · Chimeric antigen receptors (CARs) are synthetic proteins engineered to be expressed on the cell surface of cytotoxic immune cells, such as T ...
  99. [99]
    CAR-T cell therapy for cancer: current challenges and future directions
    Jul 4, 2025 · This review offers an overview of the current development of CAR-T cell therapies for both hematologic and solid tumors, while examining the ...
  100. [100]
    Recent advances in antibody glycoengineering for the gain of ...
    Here, we review the recent approaches to achieve glycoengineered antibodies, including the genetic engineering of the expression system, the in vitro chemo- ...
  101. [101]
    Using glyco-engineering to produce therapeutic proteins - PMC - NIH
    Glyco-engineering offers great potential for the generation of glycoprotein therapeutics with reduced side effects and enhanced activity. Analysis of ...
  102. [102]
    Multivalent designed proteins neutralize SARS-CoV-2 variants of ...
    Designed miniprotein receptor mimics geometrically arrayed to match pathogen receptor binding sites could be a widely applicable antiviral therapeutic strategy ...
  103. [103]
    Improving prediction performance of general protein language ...
    Sep 7, 2024 · The universal PLM ESM2 is formulated leveraging approximately 65 million non-redundant protein sequences through masked pretraining, culminating ...
  104. [104]
    Protein Engineering Strategies to Expand CRISPR‐Cas9 Applications
    Aug 2, 2018 · In this review, we will discuss recent protein-engineering approaches to expand Cas9 ... CRISPR-Cas9 for genome engineering, Cell. (2014) ...
  105. [105]
    Quantum Computing in Biopharma: Future Prospects and Strategic ...
    May 1, 2025 · Quantum computing (QC) in the pharmaceutical sector is transitioning from academic research to a specialist, pre-utility phase, ...
  106. [106]
    Accelerated enzyme engineering by machine-learning guided cell ...
    Jan 20, 2025 · We develop a machine learning (ML)-guided platform that integrates cell-free DNA assembly, cell-free gene expression, and functional assays.
  107. [107]
    Protein engineering: security implications: The increasing ability to ...
    This article assesses the security risks that are associated with protein engineering and explores some ways of minimizing them.
  108. [108]
    AI-Enabled Protein Design: A Strategic Asset for Global Health and ...
    Oct 28, 2024 · Promoting broad and equitable access to AI tools for biodesign should be considered essential for building 21st century regional public ...
  109. [109]
    Protein engineering as a driver of innovation in therapeutics ...
    Sep 27, 2025 · Protein engineering has emerged as a transformative force in biotechnology, enabling breakthroughs in therapeutics, agriculture, ...
  110. [110]
    Protein Engineering Market to Reach US$ 10.4 Billion by 2031 ...
    Oct 30, 2024 · The global protein engineering market size is estimated to be valued at USD 3.61 billion in 2024 and is expected to surpass USD 10.4 billion by 2031, growing ...