
Amino acid replacement

Amino acid replacement, also referred to as amino acid substitution, is the naturally occurring or experimentally induced exchange of one or more amino acids in a protein's primary sequence for different amino acids. This process typically results from point mutations in the encoding DNA, such as single nucleotide polymorphisms that alter a codon, leading to the incorporation of a new amino acid during protein synthesis. Depending on the chemical properties of the original and replacement amino acids (size, charge, hydrophobicity, or polarity), the substitution can range from having minimal impact to causing significant changes in the protein's folding, stability, enzymatic activity, or interactions with other molecules.
Amino acid replacements are broadly classified into conservative and radical (or non-conservative) types based on the similarity of the physicochemical properties of the original and substituted amino acids. Conservative substitutions involve amino acids with comparable traits, such as replacing one hydrophobic residue with another (for example, leucine with isoleucine), which often preserves protein structure and function with little disruption. In contrast, radical substitutions exchange amino acids with dissimilar properties, such as a charged residue for a nonpolar one, potentially leading to misfolding, reduced stability, or loss of function. These classifications influence substitution rates, with conservative changes occurring more frequently because selection acts less strongly against them.
In evolution, replacements drive protein diversification and adaptation across species, shaped by mutational biases, genetic drift, and natural selection. Neutral or nearly neutral substitutions accumulate over time without affecting fitness, while beneficial ones enhance traits such as catalytic efficiency and become fixed in populations; deleterious replacements, however, are typically purged by purifying selection. A classic example is the sickle cell mutation in hemoglobin, where a glutamic acid-to-valine substitution at position 6 of the β-globin chain increases the protein's surface hydrophobicity, causing red blood cell sickling and the disease sickle cell anemia. Such changes underscore the role of substitutions in both disease pathology and evolutionary resilience, as the same mutation confers resistance to malaria in heterozygous carriers.
Beyond natural evolution, replacements are harnessed in protein engineering and bioinformatics to design novel proteins or analyze sequences. Experimental techniques such as site-directed mutagenesis enable precise substitutions to improve properties such as thermal stability or substrate specificity in enzymes. In bioinformatics, empirical replacement matrices, such as the BLOSUM or Le and Gascuel (LG) matrices, quantify substitution probabilities based on observed alignments, aiding sequence alignment, phylogenetic tree construction, and prediction of functional impacts. These tools, derived from large datasets of protein families, account for factors such as physicochemical similarity and evolutionary conservation to score potential replacements.

Fundamentals

Definition and context

Amino acid replacement refers to the substitution of one amino acid residue for another within a protein sequence, occurring through evolutionary processes, genetic mutations, or deliberate experimental modifications in protein engineering. These substitutions alter the primary structure of proteins, potentially influencing their folding, stability, and function. Proteins are composed of 20 standard amino acids, which are genetically encoded by the 64 possible nucleotide triplets, known as codons, in messenger RNA, derived from the DNA sequence via the standard (nearly universal) genetic code. Each codon specifies one of these amino acids or a stop signal, and most amino acids are represented by multiple synonymous codons, which gives the code redundancy. In nature, replacements often result from point mutations in DNA, such as single-nucleotide substitutions that change a codon to specify a different amino acid, thereby modifying the resulting protein sequence. This process plays a central role in protein evolution, where such changes accumulate over generations, driving divergence and adaptation, and is also integral to bioinformatics applications such as sequence alignment, where substitutions are scored to infer evolutionary relationships between proteins.
The concept emerged from early protein sequence comparisons in the 1960s, when limited data on homologous proteins revealed patterns of substitution that informed models of evolutionary change. Pioneering work by Margaret Dayhoff and colleagues, through the compilation of protein sequences in the Atlas of Protein Sequence and Structure (first published in 1965), laid the foundation for quantitative substitution matrices such as the point accepted mutation (PAM) matrices, which estimate the likelihood of one amino acid replacing another over evolutionary time.

Biological significance

Amino acid replacements can have profound functional effects on proteins, either preserving their native structure and activity or inducing alterations that compromise stability, folding, enzymatic kinetics, or molecular interactions. Substitutions that maintain similar side-chain properties, such as replacing one hydrophobic residue with another, often sustain protein conformation and biological roles, minimizing disruption to secondary and tertiary structure. Radical changes, however, can destabilize folding pathways, reduce thermal stability, or alter binding interfaces, leading to loss of function or gain of toxic properties. For example, in sickle cell anemia, a single glutamic acid to valine substitution at position 6 of the β-globin chain replaces a charged, hydrophilic residue with a neutral, hydrophobic one, creating a sticky surface patch that promotes deoxyhemoglobin polymerization, erythrocyte deformation, vaso-occlusion, and chronic hemolytic anemia. Disruptive replacements in active sites are especially severe; for instance, substitutions of catalytic residues in enzymes such as DNA polymerase β can reduce nucleotide incorporation fidelity and efficiency by more than 1,000-fold, showing how such changes directly impair substrate binding and stabilization.
From an evolutionary perspective, replacements are the protein-level outcome of nonsynonymous mutations, generating heritable variation that fuels adaptive evolution, neutral evolution, and occasionally speciation. Neutral replacements, which typically occur in solvent-exposed or non-conserved regions distant from functional cores, accumulate through genetic drift without fitness costs, increasing population-level genetic diversity and buffering against future selective pressures. Deleterious substitutions, particularly in conserved domains or active sites, face strong purifying selection and are rapidly eliminated, constraining evolutionary rates in essential proteins.
Adaptive replacements, conversely, can confer selective advantages—such as improved metabolic efficiency or pathogen resistance—driving positive Darwinian evolution and contributing to phenotypic divergence across lineages. These dynamics have broader implications for biodiversity and disease. Evolutionarily, clusters of adaptive substitutions in key proteins, like those enabling convergent adaptations in fish antifreeze glycoproteins, promote ecological specialization and lineage splitting, thereby increasing species diversity in heterogeneous environments. Pathologically, beyond sickle cell anemia and cystic fibrosis, such replacements underlie myriad disorders, from enzyme deficiencies to oncogenic transformations, underscoring their dual role in health and evolutionary innovation.

Classification of Replacements

Conservative replacements

Conservative replacements, also known as conservative substitutions, occur when one amino acid in a protein sequence is replaced by another with similar physicochemical properties, such as comparable size, charge, hydrophobicity, or polarity, which reduces the likelihood of significant alterations to the protein's structure or function. These substitutions are typically accepted by natural selection because the substituting residue functions similarly to the original, as observed in early empirical studies of protein evolution. The criteria for classifying a replacement as conservative rely on shared chemical characteristics of the side chains. For instance, amino acids with branched aliphatic side chains, such as valine, leucine, and isoleucine, are hydrophobic and pack similarly in protein cores. Similarly, the acidic residues aspartate and glutamate both carry negatively charged carboxyl groups, preserving electrostatic interactions, while the polar uncharged residues serine and threonine both carry hydroxyl groups that support hydrogen bonding.
Representative examples include substitutions among nonpolar aliphatic amino acids, such as isoleucine to valine or leucine, which are frequent because they barely change hydrophobicity; polar pairs such as serine to threonine; and charged pairs such as aspartate to glutamate or lysine to arginine. In natural proteins, these conservative changes are enriched, particularly in slowly evolving sequences, where they make up a higher proportion of fixed missense changes than radical substitutions do. For example, isoleucine-valine and aspartate-glutamate exchanges occur hundreds of times in observed datasets and receive positive PAM250 scores (+4 and +3, respectively), indicating high acceptability. Overall, conservative substitutions account for the majority of accepted changes in conserved protein regions, reflecting their prevalence in evolutionary alignments. These replacements offer advantages in evolution by being largely neutral or slightly beneficial, since they preserve protein folding, stability, and enzymatic activity without introducing disruptive shifts in properties.
This allows for subtle adaptations, such as optimizing local interactions, while avoiding deleterious effects that could impair fitness. In contrast to radical replacements, conservative ones facilitate gradual evolutionary refinement of functional proteins.
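As a minimal sketch of the grouping criteria described above, the Python below treats a replacement as conservative when both residues fall in the same property group. The `PROPERTY_GROUPS` table and the `is_conservative` function are hypothetical illustrations, not a standard classification; glycine, proline, cysteine, and alanine are deliberately left ungrouped, so any replacement involving them is reported as non-conservative.

```python
# Hypothetical property groups (an assumption); other groupings are equally defensible.
PROPERTY_GROUPS = {
    "aliphatic": "VLIM",  # hydrophobic aliphatic side chains
    "aromatic": "FYW",
    "hydroxyl": "ST",
    "amide": "NQ",
    "acidic": "DE",
    "basic": "KRH",
}

# Map each one-letter code to its group name; G, P, C, and A stay ungrouped.
GROUP_OF = {aa: name for name, members in PROPERTY_GROUPS.items() for aa in members}


def is_conservative(original: str, replacement: str) -> bool:
    """Return True if both residues share a group in PROPERTY_GROUPS."""
    if original == replacement:
        raise ValueError("identical residues are not a replacement")
    group = GROUP_OF.get(original)
    return group is not None and group == GROUP_OF.get(replacement)


if __name__ == "__main__":
    for pair in [("I", "V"), ("D", "E"), ("S", "T"), ("E", "V"), ("C", "S")]:
        label = "conservative" if is_conservative(*pair) else "non-conservative"
        print(pair, label)
```

A group-membership rule like this is easy to audit, but it is binary: it cannot express that some within-group swaps are more acceptable than others, which is what the graded metrics later in this article address.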

Radical replacements

Radical replacements involve the substitution of an amino acid with another that has markedly dissimilar physicochemical properties, such as a change in charge, hydrophobicity, polarity, or size, which can profoundly affect protein structure and stability. Unlike conservative replacements, these changes introduce disruptions beyond minor adjustments and often compromise the native conformation of the protein. The criteria for identifying radical replacements center on substantial differences in key attributes. For instance, a shift from an acidic residue such as glutamate (negatively charged) to a basic one such as lysine (positively charged) reverses electrostatic properties, potentially breaking salt bridges essential for structural integrity. Similarly, a large change in size, such as glycine (the smallest amino acid) to tryptophan (a large, bulky aromatic residue), can sterically hinder tight packing in protein cores or helices. Changes in hydrophobicity or polarity may also destabilize hydrophobic cores or solvent-exposed regions, while loss of a specific feature, such as the thiol group in cysteine-to-serine swaps, eliminates disulfide bond formation, which is critical for maintaining the tertiary structure of many extracellular proteins. Substitutions of proline by a flexible, charged residue show how replacing a rigid, helix-breaking residue can distort beta-turns and local secondary structure.
Representative examples illustrate these effects. In sickle cell anemia, the glutamic acid to valine substitution at β-globin position 6 turns a hydrophilic, charged surface residue into a hydrophobic one, promoting abnormal hemoglobin polymerization and red blood cell sickling. In osteogenesis imperfecta, a glycine to tryptophan mutation in the COL1A1 gene encoding type I collagen introduces a bulky side chain into the Gly-X-Y repeat motif, preventing proper triple-helix assembly and resulting in brittle bones.
The consequences of radical replacements frequently include loss of function through protein misfolding or aggregation, although rare gain-of-function effects can arise if the change creates novel interactions. These substitutions are particularly detrimental in conserved functional domains and often manifest as genetic diseases such as hemoglobinopathies or collagen disorders.
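The criteria for radical replacements can be sketched as a set of property checks. The Python below is illustrative only: the residue volumes are approximate literature values (Zamyatnin's, in cubic angstroms), histidine is assumed to be neutral, and the 50 Å³ `volume_cutoff` and the `radical_flags` helper are arbitrary choices made for this example.

```python
# Approximate residue volumes in cubic angstroms (Zamyatnin); illustrative inputs only.
VOLUME = {
    "G": 60.1, "A": 88.6, "S": 89.0, "C": 108.5, "D": 111.1,
    "P": 112.7, "N": 114.1, "T": 116.1, "E": 138.4, "V": 140.0,
    "Q": 143.8, "H": 153.2, "M": 162.9, "I": 166.7, "L": 166.7,
    "K": 168.6, "R": 173.4, "F": 189.9, "Y": 193.6, "W": 227.8,
}
# Side-chain charge near pH 7; histidine treated as neutral (an assumption).
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}
# Residues whose loss removes a special structural feature.
SPECIAL = {"G": "glycine flexibility", "P": "proline rigidity", "C": "cysteine thiol"}


def radical_flags(original: str, replacement: str, volume_cutoff: float = 50.0) -> list[str]:
    """List the property changes that make a replacement look radical."""
    flags = []
    q1, q2 = CHARGE.get(original, 0), CHARGE.get(replacement, 0)
    if q1 * q2 < 0:
        flags.append("charge reversal")
    elif q1 != q2:
        flags.append("charge gained or lost")
    if abs(VOLUME[original] - VOLUME[replacement]) > volume_cutoff:
        flags.append("large volume change")
    if original in SPECIAL and replacement != original:
        flags.append("loses " + SPECIAL[original])
    return flags


if __name__ == "__main__":
    for pair in [("E", "K"), ("G", "W"), ("C", "S"), ("I", "V")]:
        print(pair, radical_flags(*pair) or ["no radical flags"])
```

Returning the list of reasons, rather than a single yes/no answer, keeps the sketch close to the text: glutamate to lysine is flagged for charge reversal, glycine to tryptophan for both size and loss of glycine's flexibility, and isoleucine to valine for nothing.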

Physicochemical Metrics

Grantham's distance

Grantham's distance is a numerical metric designed to quantify the physicochemical dissimilarity between pairs of amino acids, originally proposed by Richard Grantham in 1974 to correlate with observed substitution frequencies in evolving proteins. It is computed from three side-chain properties: composition (the atomic weight ratio of non-carbon elements in end groups or rings to carbons in the side chain), polarity (on a relative scale reflecting electron distribution), and volume (molecular size in cubic angstroms), chosen for their strong correlation with evolutionary substitution rates. The distance is a weighted Euclidean distance, scaled so that the mean over all pairs of amino acids is 100: d_{i,j} = \rho \sqrt{\alpha (c_i - c_j)^2 + \beta (p_i - p_j)^2 + \gamma (v_i - v_j)^2} where c, p, and v are the composition, polarity, and volume values, respectively; \alpha = 1.833, \beta = 0.1018, and \gamma = 0.000399 weight the three properties so that they contribute comparably, and \rho = 50.723 sets the mean distance to 100. The resulting scores range from a minimum of 5, indicating high similarity (between leucine (L) and isoleucine (I)), to a maximum of 215, reflecting extreme dissimilarity (between cysteine (C) and tryptophan (W)); for comparison, aspartic acid (D) and glutamic acid (E) score 45, and glycine (G) and tryptophan (W) score 184. Scores below approximately 100 typically denote conservative replacements with minimal structural disruption, while those above 100 suggest radical changes likely to alter protein function. Grantham's distance has been widely applied in bioinformatics to evaluate the evolutionary cost of substitutions, to inform the design of substitution matrices for protein alignment (for example, by assessing substitution impacts alongside empirical matrices), and to predict the pathogenicity of missense variants in genetic studies. The complete set of 190 unique pairwise distances among the 20 standard amino acids is presented in the table below, ordered by one-letter codes as S, R, L, P, T, A, V, G, I, F, Y, C, H, Q, N, K, D, E, M, W (diagonal values are 0 for self-comparisons).
     S   R   L   P   T   A   V   G   I   F   Y   C   H   Q   N   K   D   E   M   W
S    0 110 145  74  58  99 124  56 142 155 144 112  89  68  46 121  65  80 135 177
R  110   0 102 103  71 112  96 125  97  97  77 180  29  43  86  26  96  54  91 101
L  145 102   0  98  92  96  32 138   5  22  36 198  99 113 153 107 172 138  15  61
P   74 103  98   0  38  27  68  42  95 114 110 169  77  76  91 103 108  93  87 147
T   58  71  92  38   0  58  69  59  89 103  92 149  47  42  65  78  85  65  81 128
A   99 112  96  27  58   0  64  60  94 113 112 195  86  91 111 106 126 107  84 148
V  124  96  32  68  69  64   0 109  29  50  55 192  84  96 133  97 152 121  21  88
G   56 125 138  42  59  60 109   0 135 153 147 159  98  87  80 127  94  98 127 184
I  142  97   5  95  89  94  29 135   0  21  33 198  94 109 149 102 168 134  10  61
F  155  97  22 114 103 113  50 153  21   0  22 205 100 116 158 102 177 140  28  40
Y  144  77  36 110  92 112  55 147  33  22   0 194  83  99 143  85 160 122  36  37
C  112 180 198 169 149 195 192 159 198 205 194   0 174 154 139 202 154 170 196 215
H   89  29  99  77  47  86  84  98  94 100  83 174   0  24  68  32  81  40  87 115
Q   68  43 113  76  42  91  96  87 109 116  99 154  24   0  46  53  61  29 101 130
N   46  86 153  91  65 111 133  80 149 158 143 139  68  46   0  94  23  42 142 174
K  121  26 107 103  78 106  97 127 102 102  85 202  32  53  94   0 101  56  95 110
D   65  96 172 108  85 126 152  94 168 177 160 154  81  61  23 101   0  45 160 181
E   80  54 138  93  65 107 121  98 134 140 122 170  40  29  42  56  45   0 126 152
M  135  91  15  87  81  84  21 127  10  28  36 196  87 101 142  95 160 126   0  67
W  177 101  61 147 128 148  88 184  61  40  37 215 115 130 174 110 181 152  67   0
A key limitation of Grantham's distance is that it relies solely on intrinsic side-chain properties, without considering site-specific factors such as a residue's structural environment (for example, buried versus solvent-exposed) or its interactions with neighboring residues, which can modulate how well a substitution is tolerated.
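Because the formula is fully specified, it can be evaluated directly. The Python sketch below assumes the side-chain property values (c, p, v) listed in `GRANTHAM_PROPERTIES`, transcribed from Grantham's 1974 property table; they should be checked against the original before reuse, and the function names are hypothetical.

```python
import math

# (composition c, polarity p, volume v) per residue, transcribed from
# Grantham's 1974 property table; verify against the original before reuse.
GRANTHAM_PROPERTIES = {
    "S": (1.42, 9.2, 32.0), "R": (0.65, 10.5, 124.0), "L": (0.00, 4.9, 111.0),
    "P": (0.39, 8.0, 32.5), "T": (0.71, 8.6, 61.0), "A": (0.00, 8.1, 31.0),
    "V": (0.00, 5.9, 84.0), "G": (0.74, 9.0, 3.0), "I": (0.00, 5.2, 111.0),
    "F": (0.00, 5.2, 132.0), "Y": (0.20, 6.2, 136.0), "C": (2.75, 5.5, 55.0),
    "H": (0.58, 10.4, 96.0), "Q": (0.89, 10.5, 85.0), "N": (1.33, 11.6, 56.0),
    "K": (0.33, 11.3, 119.0), "D": (1.38, 13.0, 54.0), "E": (0.92, 12.3, 83.0),
    "M": (0.00, 5.7, 105.0), "W": (0.13, 5.4, 170.0),
}

# Weighting (alpha, beta, gamma) and scaling (rho) constants from the formula.
ALPHA, BETA, GAMMA, RHO = 1.833, 0.1018, 0.000399, 50.723


def grantham_distance(a: str, b: str) -> float:
    """Weighted Euclidean distance between two residues (unrounded)."""
    ca, pa, va = GRANTHAM_PROPERTIES[a]
    cb, pb, vb = GRANTHAM_PROPERTIES[b]
    return RHO * math.sqrt(
        ALPHA * (ca - cb) ** 2 + BETA * (pa - pb) ** 2 + GAMMA * (va - vb) ** 2
    )


def classify(a: str, b: str, cutoff: float = 100.0) -> str:
    """Apply the rough 100-unit cutoff described in the text."""
    return "conservative" if grantham_distance(a, b) < cutoff else "radical"


if __name__ == "__main__":
    for a, b in [("L", "I"), ("D", "E"), ("G", "W"), ("C", "W")]:
        print(f"{a}-{b}: {grantham_distance(a, b):.1f} ({classify(a, b)})")
```

With these inputs the unrounded distances are about 4.9 for L-I, 44.6 for D-E, 183.8 for G-W, and 214.4 for C-W, matching the published table to within about one unit; the small gaps presumably come from rounding in the published constants or property values.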

Sneath's index

Sneath's index is a dissimilarity measure for amino acid replacements, introduced by P. H. A. Sneath in 1966 to quantify overall resemblance between the 20 standard amino acids based on 134 physicochemical and biochemical properties, such as molecular weight, hydrogen bonding capacity, and hydrophobicity. These properties are treated as binary attributes (present or absent), allowing the index to capture broad differences in chemical structure and potential biological activity. The index calculates the pairwise dissimilarity D from a fourfold (2 × 2) table of shared and non-shared attributes between two amino acids: D = 100 \times \frac{b + c}{a + b + c + d} where a is the number of properties present in both amino acids, b present only in the first, c only in the second, and d in neither. This yields scores from 0 (identical attribute profiles) to higher values (up to around 80-90) for the most dissimilar pairs. Further analysis in the original work reduces the properties to four key vectors representing composite factors such as aliphatic chain length, emphasizing structural effects. For instance, two aromatic residues can exhibit a low D score (approximately 25) because of their shared aromatic rings and hydrophobic profiles. As an early quantitative tool, Sneath's index facilitated evolutionary analyses of protein sequences by scoring replacement acceptability based on physicochemical divergence, influencing models of substitution rates. It has been incorporated into some alignment algorithms to penalize dissimilar substitutions, though often alongside empirical matrices such as PAM or BLOSUM. Despite its breadth, the index oversimplifies differences by relying solely on binary attributes, without weighting magnitudes or accounting for contextual interactions such as the structural environment of a site. Compared with Grantham's distance, it draws on a far broader property set but lacks focus on specific metrics such as polarity and volume.
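The fourfold-table calculation can be sketched in a few lines of Python. The attribute names and residue profiles below are a toy, hypothetical set chosen only to illustrate the arithmetic; they are not Sneath's 134 properties, and the `sneath_dissimilarity` name is illustrative.

```python
def sneath_dissimilarity(attrs_x: set[str], attrs_y: set[str], universe: set[str]) -> float:
    """D = 100 * (b + c) / (a + b + c + d) over binary (present/absent) attributes."""
    if not (attrs_x <= universe and attrs_y <= universe):
        raise ValueError("attributes must come from the shared universe")
    a = len(attrs_x & attrs_y)             # present in both
    b = len(attrs_x - attrs_y)             # present only in the first
    c = len(attrs_y - attrs_x)             # present only in the second
    d = len(universe - attrs_x - attrs_y)  # absent from both
    return 100.0 * (b + c) / (a + b + c + d)


# Hypothetical six-attribute universe and toy profiles (not Sneath's data).
UNIVERSE = {"aromatic", "hydrophobic", "polar", "charged", "hydroxyl", "sulfur"}
TOY_PROFILES = {
    "Phe": {"aromatic", "hydrophobic"},
    "Tyr": {"aromatic", "hydrophobic", "polar", "hydroxyl"},
    "Asp": {"polar", "charged"},
}

if __name__ == "__main__":
    for x, y in [("Phe", "Tyr"), ("Phe", "Asp")]:
        d = sneath_dissimilarity(TOY_PROFILES[x], TOY_PROFILES[y], UNIVERSE)
        print(f"{x}-{y}: D = {d:.1f}")
```

In this toy universe the two aromatic profiles share two attributes and differ in two, giving D = 100 × 2/6 ≈ 33, while the aromatic-acidic pair differs in four of six attributes and scores about 67. Counting shared absences (d) in the denominator is what distinguishes this simple-matching form from coefficients that ignore joint absences.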

Epstein's coefficient of difference

Epstein's coefficient of difference, developed by Charles J. Epstein in 1967, is a metric that quantifies the differences between amino acids based on their side-chain composition and residue volume, and it was used to analyze the non-random substitutions seen in evolving proteins. The coefficient helps evaluate the compatibility of replacements by considering structural and chemical properties that influence protein conformation and function. The formula for the coefficient is given by d = \left| \frac{V_1 - V_2}{\sqrt{V_1 \cdot V_2}} \right| + |C_1 - C_2|, where V represents the residue volume and C is a composition factor, such as a polarity or nonpolarity score. This approach normalizes the volume difference by the geometric mean of the two volumes and adds the absolute difference in composition to capture overall dissimilarity. Low values of d indicate high similarity between amino acids and correspond to conservative replacements. The metric was applied to approximately 100-150 amino acid pairs in early studies of protein sequences. Epstein's coefficient influenced the development of early substitution models in molecular evolution by highlighting preferences for replacements that minimize structural disruption, and it has proven useful in structural predictions, such as assessing the impact of mutations on folding and stability in homologous proteins. Compared with Sneath's index, which draws on a broad set of binary properties, Epstein's metric offers a more focused structural view by integrating volume with composition. A key limitation of Epstein's coefficient is its relatively light emphasis on polarity compared with subsequent metrics such as Grantham's distance, which explicitly weights composition, polarity, and volume.

Miyata's distance

Miyata's distance is a physicochemical metric proposed by Takashi Miyata, Sanzo Miyazawa, and Teruo Yasunaga in 1979 to quantify the dissimilarity between amino acids based on differences in side-chain polarity and volume. It simplifies the assessment of replacement costs by placing amino acids in a two-dimensional space defined by these properties, which facilitates the study of substitution patterns in protein sequences. The metric emphasizes how changes in polarity (affecting hydrophobicity and electrostatic interactions) and volume (affecting structural fit) contribute to evolutionary constraints. The distance d between two amino acids is computed as: d = \sqrt{\left(\frac{P_1 - P_2}{\sigma_P}\right)^2 + \left(\frac{V_1 - V_2}{\sigma_V}\right)^2} where P_1 and P_2 are the polarity values and V_1 and V_2 the volumes of the two amino acids, and \sigma_P and \sigma_V are the standard deviations of the polarity and volume differences, which put the two properties on a comparable scale. The distance is 0 for identical amino acids and increases for dissimilar pairs, such as glycine and arginine, where glycine has minimal volume and no charge, in contrast to arginine's large, positively charged side chain. Miyata's distance is applied primarily to calculate evolutionary distances between protein sequences and to build substitution matrices for phylogenetic analysis. For instance, it has been integrated into methods for estimating rates of nonsynonymous substitution from nucleotide sequences, accounting for functional constraints on protein-coding regions of mRNA. Its streamlined focus on two key properties makes it efficient for modeling conservative changes, and it has been used particularly in Japanese-led studies of protein divergence. However, its adoption remains limited outside these contexts, overshadowed by more comprehensive indices such as Grantham's, in part because results depend on how the polarity and volume scales are standardized. This sensitivity means implementations must use consistent, documented normalization.
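The effect of the standard-deviation scaling can be shown with a small Python sketch. The residues `x`, `y`, and `z`, their polarity and volume values, and the standard deviations `SD_P` and `SD_V` are hypothetical placeholders, not Miyata's published values; the point is only that the ranking of pairs can flip depending on whether the two properties are standardized.

```python
import math


def miyata_distance(p1: float, v1: float, p2: float, v2: float,
                    sd_p: float = 1.0, sd_v: float = 1.0) -> float:
    """Polarity and volume differences, each divided by its standard deviation.

    With the default sd_p = sd_v = 1.0 this reduces to an unscaled Euclidean distance.
    """
    return math.hypot((p1 - p2) / sd_p, (v1 - v2) / sd_v)


# Hypothetical (polarity, volume) values; NOT Miyata's published scales.
TOY = {"x": (5.0, 100.0), "y": (9.0, 110.0), "z": (5.5, 140.0)}
SD_P, SD_V = 2.0, 40.0  # placeholder standard deviations of the differences

if __name__ == "__main__":
    for other in ("y", "z"):
        raw = miyata_distance(*TOY["x"], *TOY[other])
        scaled = miyata_distance(*TOY["x"], *TOY[other], SD_P, SD_V)
        print(f"x-{other}: unscaled {raw:.2f}, scaled {scaled:.2f}")
```

Unscaled, the x-z pair looks nearly four times as distant as x-y because the volume differences dominate; after scaling, x-y becomes the more dissimilar pair, which is why the choice of normalization matters.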

Experimental exchangeability

Experimental exchangeability refers to the empirical assessment of how frequently one amino acid substitutes for another in protein sequences, derived from substitutions observed in aligned protein families rather than from theoretical physicochemical properties. These measures are captured in substitution matrices that quantify the likelihood of replacements based on real evolutionary data. Seminal work by Dayhoff and colleagues introduced the concept in the late 1960s, with the first point accepted mutation (PAM) matrices developed from alignments of closely related proteins, and refined it in 1978 using a dataset of 71 protein families encompassing 1,572 observed changes. The method counts substitutions in globally aligned sequences of closely related proteins, typically those sharing at least 85% sequence identity so that multiple mutations at the same site are rare, and normalizes these counts by the relative mutability of each amino acid to obtain exchange probabilities. The PAM1 matrix represents the changes expected after 1% divergence (one accepted point mutation per 100 residues) and is extrapolated to larger distances, such as PAM250, for more divergent sequences. In PAM250, the substitution of isoleucine (I) for valine (V), both hydrophobic branched-chain amino acids, receives a log-odds score of +4, indicating higher observed exchangeability than random expectation. Later refinements, such as the Jones-Taylor-Thornton (JTT) matrix, expanded this approach using a much larger database of globular protein sequences, again restricted to closely related sequences (at least 85% identical), to derive a 20×20 log-odds matrix that better captures substitution rates across a broader evolutionary range. These matrices, structured as 20×20 tables with rows and columns for each standard amino acid, provide log-odds scores in which positive values denote substitutions more frequent than chance. They serve as the foundation for probabilistic models in phylogenetic inference, such as maximum likelihood tree reconstruction under models like JTT + Γ, where Γ accounts for rate heterogeneity among sites.
Empirical matrices such as PAM and JTT complement theoretical physicochemical distances by grounding predictions in substitution frequencies actually observed in nature. Despite their utility, these matrices have limitations, including bias toward the protein families and sequence compositions available when they were constructed. PAM relied on a small dataset dominated by a limited number of well-studied protein families, potentially underrepresenting rare substitutions and those in non-globular contexts, while JTT improves coverage but still struggles with low-frequency events in underrepresented taxa.
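Two of the steps described above, extrapolating PAM1 to larger distances by matrix powers and converting probabilities into log-odds scores, can be sketched with a toy three-letter alphabet. The transition probabilities and background frequencies below are hypothetical, and the base-10 logarithm scaled by 10 is an assumed convention exposed as the `scale` parameter; real PAM matrices use 20 amino acids and Dayhoff's observed counts.

```python
import math

Matrix = list[list[float]]


def mat_mult(x: Matrix, y: Matrix) -> Matrix:
    """Multiply two square matrices."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_power(m: Matrix, power: int) -> Matrix:
    """Raise a square matrix to a non-negative integer power by repeated squaring."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = m
    while power > 0:
        if power & 1:
            result = mat_mult(result, base)
        base = mat_mult(base, base)
        power >>= 1
    return result


def log_odds(pam: Matrix, freqs: list[float], scale: float = 10.0) -> Matrix:
    """Score s(i, j) = scale * log10(P[i][j] / f[j]): observed change vs chance."""
    n = len(freqs)
    return [[scale * math.log10(pam[i][j] / freqs[j]) for j in range(n)] for i in range(n)]


# Toy alphabet with hypothetical numbers (not Dayhoff's data):
# P1[i][j] is the probability that residue i becomes j per 1% divergence.
P1 = [
    [0.990, 0.005, 0.005],
    [0.005, 0.990, 0.005],
    [0.005, 0.005, 0.990],
]
FREQS = [1 / 3, 1 / 3, 1 / 3]

if __name__ == "__main__":
    for n in (1, 250):
        scores = log_odds(mat_power(P1, n), FREQS)
        print(f"PAM{n}:", [[round(s, 1) for s in row] for row in scores])
```

For n = 1 the diagonal scores are strongly positive and the off-diagonal scores strongly negative; by n = 250 every score has shrunk toward zero, because after many substitutions the chance of ending at a given residue approaches its background frequency. This is why high-numbered PAM matrices discriminate less sharply between likely and unlikely exchanges.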

Amino Acid Characteristics

Typical amino acids

Typical amino acids, in the context of evolutionary replacements, are residues that are highly abundant in protein sequences and frequently participate in substitutions because of their physicochemical versatility and the minimal structural disruption their exchange causes. These include leucine (Leu), alanine (Ala), valine (Val), and serine (Ser), which together make up approximately 30-40% of all residues across proteomes. For example, leucine occurs at about 9.6% frequency in the UniProtKB/Swiss-Prot database, reflecting its prevalence in proteins. These amino acids share characteristics such as charge neutrality, small size, or hydrophobicity, which enhance their mutability and ease of exchange during evolution. Leucine, for instance, commonly exchanges with isoleucine and valine, owing to shared hydrophobic properties that preserve protein structure and function. Similarly, alanine's compact, non-polar side chain allows it to act as a versatile "wildcard" in sequence alignments, tolerating substitutions with other small residues without major energetic penalties. Serine's polar but uncharged nature further contributes to its flexibility in hydrophilic environments. In protein evolution, these typical amino acids play a key role in facilitating neutral substitutions, particularly in non-critical regions where changes do not compromise overall fitness, thereby allowing genetic drift to shape sequence diversity over time. This contrasts with idiosyncratic amino acids such as tryptophan and cysteine, which are rarer and more constrained in their replaceability.

Idiosyncratic amino acids

Idiosyncratic amino acids are a subset of residues, such as cysteine (Cys), tryptophan (Trp), glycine (Gly), methionine (Met), proline (Pro), and histidine (His), that have specialized chemical and structural roles in proteins and are therefore infrequently substituted during evolution. They are distinguished by unique properties that confer specific functions, such as Cys's ability to form disulfide bridges essential for protein stability and folding, or Trp's bulky indole ring, which mediates aromatic interactions in binding sites. Unlike the more versatile typical amino acids, which allow broader replacements while maintaining function, these residues are tightly constrained, resulting in high evolutionary conservation that preserves protein integrity. Key characteristics of these residues include generally low abundance in proteomes and reduced exchangeability with other residues. For instance, Trp is the rarest standard amino acid, making up approximately 1% of residues in eukaryotic proteins, reflecting its costly biosynthesis and specialized roles in stabilizing hydrophobic cores or mediating protein-protein interactions. Similarly, Cys occurs at low frequency (around 1-2%) and is often buried or located in catalytic sites, where its thiol group enables redox-sensitive functions, so substitutions are rare because they would disrupt these roles. Gly and Pro, while more abundant, are conserved for their effects on backbone flexibility and rigidity, respectively: Gly's lack of a side chain lets it occupy tight turns impossible for bulkier residues, and Pro's cyclic structure imposes kinks in secondary structures. Met and His further exemplify this pattern, with Met's thioether providing antioxidant protection and His's imidazole ring enabling proton transfer in active sites. Overall, their low exchangeability stems from these functional specificities, as shown by alignments in which these residues have higher conservation scores than aliphatic or polar residues.
Specific examples highlight their constrained replacements.
Cys rarely exchanges with serine (Ser), despite their structural similarity, because Ser lacks the thiol reactivity needed for disulfide formation or metal coordination in enzymes. Pro's rigidity limits its exchanges, since other residues cannot replicate its propensity to form cis peptide bonds, which is crucial for turns in many structural and signaling domains; as a result, Pro is often preserved across orthologs. His in catalytic triads, such as those of serine proteases, is highly conserved because its side-chain pKa lies near neutrality, which suits it to acid-base catalysis; mutations there impair activity unless compensated by other changes. Trp's conservation in ligand-binding pockets and in folds such as the Trp-cage miniprotein underscores its role in π-stacking, where alternatives such as phenylalanine (Phe) fail to match its photophysical or steric properties. Gly's presence in the selectivity filters of potassium channels and in β-turns exemplifies its irreplaceability where conformational flexibility is required. These idiosyncrasies have profound implications for protein evolution, as mutations at these sites are often deleterious, imposing strong selective pressures that enforce conservation. Disruptions, such as oxidation of Cys or replacement of His in active sites, can lead to loss of function, misfolding, or aggregation, limiting evolutionary flexibility at these positions. Specialized residues thus act as evolutionary bottlenecks, maintaining critical protein architectures across species while restricting adaptive substitutions.

Evolutionary Dynamics

Tendency for replacement

The tendency for replacement refers to the relative likelihood that a given amino acid or position in a protein will be substituted during evolution, quantified through mutability rates that reflect both genetic and physicochemical influences. These rates stem from the degeneracy of the genetic code, which allows multiple codons to specify the same amino acid and thereby affects the probability that a single-nucleotide change is non-synonymous, and from the bias toward transitions (purine-to-purine or pyrimidine-to-pyrimidine substitutions) over transversions: transitions occur 2-3 times more frequently and, because of the code's organization, often produce conservative changes. A key factor is the structure of the genetic code, in which amino acids with higher codon degeneracy have greater potential for replacement, because their multiple codons provide more routes by which substitutions can alter the encoded amino acid without immediate lethality. For instance, alanine is specified by four codons (GCU, GCC, GCA, GCG) that share the same first two bases, so every third-position change, whether a transition or a transversion, is synonymous; the layout of the code therefore determines which single-nucleotide changes alter the amino acid at all. In contrast, physicochemical barriers reduce the acceptance of certain replacements, such as transversions that lead to dissimilar amino acids, although the genetic code limits disruptive effects by clustering similar amino acids in codon space. Empirical examples highlight these tendencies: serine, encoded by six codons (UCU, UCC, UCA, UCG, AGU, AGC), shows high mutability owing to abundant substitution pathways, whereas tryptophan, with only one codon (UGG), shows lower propensity, since two of its nine possible single-nucleotide changes create stop codons and the rest cause major alterations. Relative mutability scores from analyses of aligned orthologous proteins in closely related species, such as Dayhoff et al.'s study of 71 protein families, assign serine a score of 120, arginine 65, and cysteine 20 (scaled to alanine at 100), reflecting observed accepted point mutations that combine codon-based mutation probabilities with chemical compatibility.
These data, derived from closely related sequences and normalized to 1% divergence to minimize multiple hits, show how features of the genetic code amplify replacement tendencies for amino acids like serine while constraining those for single-codon amino acids such as tryptophan.
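The codon-level reasoning above can be checked by enumerating every single-nucleotide change. The Python sketch below builds the standard genetic code and classifies each of the nine possible point mutations of each codon as synonymous, missense, or nonsense. It counts mutational opportunities only and does not model transition/transversion bias or selection; the `single_nucleotide_outcomes` name is hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code: codons listed in UCAG order (first base varies slowest).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def single_nucleotide_outcomes(amino_acid: str) -> dict[str, int]:
    """Classify every single-nucleotide change in every codon of an amino acid."""
    counts = Counter()
    for codon, aa in CODE.items():
        if aa != amino_acid:
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue  # not a mutation
                mutant = codon[:pos] + base + codon[pos + 1:]
                new_aa = CODE[mutant]
                if new_aa == amino_acid:
                    counts["synonymous"] += 1
                elif new_aa == "*":
                    counts["nonsense"] += 1  # creates a stop codon
                else:
                    counts["missense"] += 1
    return {kind: counts[kind] for kind in ("synonymous", "missense", "nonsense")}


if __name__ == "__main__":
    for aa in ("S", "A", "W"):
        print(aa, single_nucleotide_outcomes(aa))
```

For serine's six codons this yields 14 synonymous, 37 missense, and 3 nonsense changes out of 54; alanine gives 12, 24, and 0; tryptophan's single codon gives 0, 7, and 2, so two of its nine possible point mutations create stop codons.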

Patterns of replacement

Amino acid replacements in evolving proteins display systematic biases that favor substitutions preserving core physicochemical properties, such as hydrophobicity and charge, thereby minimizing structural disruption. Hydrophobic-to-hydrophobic swaps, such as leucine to isoleucine, are notably frequent, whereas pairs with large hydrophobicity differences are substituted at lower rates; charge-conserving changes, such as aspartate to glutamate, maintain electrostatic interactions. These biases arise from selective pressures that tolerate exchanges between similar residues more readily than radical ones, as evidenced by reduced rates for substitutions that substantially alter both size and charge. Recurring patterns include convergent (parallel) substitutions, in which identical replacements occur independently in different lineages, such as valine-to-isoleucine changes in multiple protein families, reflecting shared functional constraints. Site-specific patterns are common in conserved motifs such as active sites, where replacements preserve charge balance, as seen in aspartate-to-glutamate shifts in catalytic domains across distant taxa. These examples show how evolutionary pressures drive repeated, property-preserving changes at functionally critical positions. Such patterns are detected through phylogenetic reconstruction of ancestral sequences, which traces ancestral-to-derived changes, and through multiple sequence alignments, which quantify substitution frequencies across orthologous proteins. These analyses reveal trends such as a predominance of conservative replacements in core structural regions and elevated substitution rates in flexible, variable loops exposed to solvent, and they expose taxon-wide biases that build on the underlying propensities for particular exchanges. Empirical studies confirm that conservative substitutions make up the majority of fixed changes in globular proteins, with radical-to-conservative rate ratios typically below 1 across property classes such as charge and polarity.
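Counting how often each residue pair differs across aligned sequences is the first step in quantifying such patterns. The Python sketch below is a hypothetical, naive version that compares every pair of sequences directly; unlike the phylogenetic reconstruction described above, it ignores ancestry, so clusters of closely related sequences can inflate some counts. The function name and toy alignment are illustrative only.

```python
from collections import Counter
from itertools import combinations


def count_substitutions(alignment: list[str], gap: str = "-") -> Counter:
    """Tally unordered residue pairs that differ at the same aligned column.

    Every pair of sequences is compared directly (no phylogeny), and columns
    where either sequence has a gap are skipped.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for seq_a, seq_b in combinations(alignment, 2):
        for x, y in zip(seq_a, seq_b):
            if x == y or gap in (x, y):
                continue
            counts[tuple(sorted((x, y)))] += 1
    return counts


if __name__ == "__main__":
    toy_alignment = ["LIVDK", "IIVEK", "LIVE-"]  # hypothetical sequences
    for pair, n in count_substitutions(toy_alignment).most_common():
        print(pair, n)
```

In the toy alignment, the only observed exchanges are I-L and D-E, each counted twice; tallies like these, aggregated over many families, are the raw material from which conservative-versus-radical proportions are estimated.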
Patterns shift in extremophiles: thermophilic proteins show more hydrophobicity-enhancing replacements, such as nonpolar-to-more-hydrophobic swaps, in contrast to the more balanced compositions of mesophilic proteins, because adaptation to high temperature demands greater stability. Psychrophilic proteomes show contrasting biases, including reduced substitution rates and a preference for polar residues that maintain flexibility at low temperatures.

Implications for protein evolution

Amino acid replacements in proteins are governed by a balance between neutral processes and selective pressures, which together shape evolutionary trajectories. Under the neutral theory proposed by Motoo Kimura, most substitutions are effectively neutral and are fixed by genetic drift rather than because they confer an adaptive advantage, particularly in less constrained protein regions where they do not significantly alter fitness. In contrast, positive selection drives the fixation of advantageous replacements that enhance fitness, such as those improving enzymatic efficiency or environmental adaptation; such selection is often detectable as an elevated rate of nonsynonymous substitutions. These mechanisms interact so that proteins can evolve while maintaining core functions: neutral changes supply raw variation, and positive selection refines it into adaptations. Functional innovation frequently arises from targeted replacements that modify specificity or catalytic properties. For instance, in enzymes such as glyceraldehyde-3-phosphate dehydrogenase (GAPDH), positively selected residues near substrate-binding interfaces enable shifts in metabolic pathways, facilitating adaptation to changing cellular environments over evolutionary timescales. Such substitutions can repurpose binding sites or alter reaction mechanisms, as in the evolution of polyol oxidoreductases, where single changes confer bifunctionality that allows the enzymes to process multiple substrates. A classic example is the globin family, where key replacements in the heme-binding pocket, such as variation in proximal residues across vertebrate lineages, have tuned oxygen affinity and enabled adaptations from aquatic to terrestrial environments. Protein divergence rates closely track the frequency of tolerated replacements: functionally constrained regions diverge more slowly because purifying selection removes disruptive changes. For example, transmembrane segments in integral membrane proteins accumulate substitutions at half the rate of exposed regions, reflecting structural rigidity that limits viable substitutions.
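The contrast between neutral and positively selected replacements can be illustrated with Kimura's diffusion approximation for the probability that a single new mutation eventually fixes. The sketch below assumes a diploid population whose effective size equals its census size N and a genic selection coefficient s; real populations rarely satisfy these simplifications.

```python
import math


def fixation_probability(s: float, n: int) -> float:
    """Kimura's approximation u = (1 - e^{-2s}) / (1 - e^{-4Ns}) for a new
    mutation starting at frequency 1/(2N) in a diploid population of size N
    (effective size assumed equal to N)."""
    if s == 0:
        return 1.0 / (2 * n)  # neutral limit: fixation by genetic drift alone
    if -4 * n * s > 700:
        return 0.0  # strongly deleterious: exp() would overflow and u is ~0
    # expm1 keeps precision when s is small; the two minus signs cancel.
    return math.expm1(-2 * s) / math.expm1(-4 * n * s)


N = 1000
for s in (0.0, 0.01, -0.01):
    print(s, fixation_probability(s, N))
```

With N = 1000, a neutral mutation fixes with probability 1/(2N) = 0.0005, a mutation with s = 0.01 fixes with probability close to 2s (about 0.02), and one with s = -0.01 essentially never fixes. Because about 2Nμ neutral mutations arise per generation and each fixes with probability 1/(2N), the neutral substitution rate equals the mutation rate μ.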
Evolutionary models encode empirical replacement patterns in substitution matrices, which are used to infer phylogenies by estimating branch lengths and ancestral states from observed exchangeabilities. Complementarily, the dN/dS ratio quantifies selection by comparing the rate of nonsynonymous substitutions per nonsynonymous site (dN) with the rate of synonymous substitutions per synonymous site (dS): values above 1 signal positive selection driving adaptive divergence, values below 1 indicate purifying selection preserving function, and values near 1 are consistent with neutral evolution. Predicting the long-term evolutionary effects of individual replacements remains challenging because of epistatic interactions and context-dependent outcomes. Thermodynamic stability constraints limit most substitutions to those with small destabilizing effects (typically below 5 kcal/mol), yet compensatory mutations often follow to restore balance, creating effectively irreversible paths that entrench specific sequences. Epistasis further complicates forecasts, because the fitness effect of a replacement depends on the mutations that preceded it; when several mutations are needed for a new function, for instance, only a fraction of the possible mutational orderings are viable because some intermediates are incompatible. These factors make it difficult to extrapolate from single-site effects to proteome-wide evolution over long timescales.
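The dN/dS comparison can be sketched in a few lines once synonymous and nonsynonymous sites and differences have been counted (for example with the Nei-Gojobori counting method, which is not shown here). The sketch assumes those counts are already available and applies the Jukes-Cantor correction for multiple hits to each proportion; the counts in the usage line are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct a proportion of differing sites for multiple hits:
    d = -3/4 * ln(1 - 4p/3)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(n_sites: float, s_sites: float, n_diffs: float, s_diffs: float) -> float:
    """Ratio of nonsynonymous to synonymous substitutions per site."""
    dn = jukes_cantor(n_diffs / n_sites)
    ds = jukes_cantor(s_diffs / s_sites)
    if ds == 0:
        raise ValueError("no synonymous differences; dN/dS is undefined")
    return dn / ds


# Hypothetical counts: 300 nonsynonymous and 100 synonymous sites.
omega = dn_ds(n_sites=300, s_sites=100, n_diffs=6, s_diffs=10)
print(round(omega, 3), "purifying" if omega < 1 else "neutral or positive")
```

Here about 2% of nonsynonymous sites differ versus 10% of synonymous sites, giving a ratio near 0.19, the signature of purifying selection.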

Applications and Advances

In bioinformatics and sequence analysis

In bioinformatics, amino acid replacement concepts are integral to sequence alignment algorithms, which quantify substitution likelihoods to identify similarities between protein sequences. Tools such as BLAST (Basic Local Alignment Search Tool) score alignments with substitution matrices like BLOSUM, in which higher scores indicate more conservative replacements, based on substitution frequencies observed in conserved blocks of related proteins. BLOSUM matrices, particularly BLOSUM62, are derived from gapless local alignments and provide log-odds scores that compare the observed frequency of each substitution with the frequency expected by chance, enabling sensitive detection of distant homologs. These matrices score substitutions in both pairwise and multiple alignments, assigning low or negative scores to non-conservative changes that disrupt physicochemical properties. In pairwise searches, BLAST uses these scores to extend seed matches into high-scoring segment pairs, favoring conservative replacements when inferring functional similarity. For multiple alignments, progressive methods incorporate position-specific scoring to guide assembly, improving accuracy for diverse sequence sets. Profile hidden Markov models (HMMs) extend this approach by building profiles from alignments that model position-specific replacement probabilities, so that profile-based searches detect remote homologs more effectively than scoring with a single matrix. The empirical exchangeability data captured in these matrices underpins their reliability by linking observed substitutions to evolutionary tolerances. A key application is inferring protein homology from the number of conservative replacements: alignments rich in such substitutions support evolutionary relatedness despite sequence divergence.
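Scoring a gapless alignment with a substitution matrix is a sum of per-position log-odds scores. The sketch below hard-codes only a small subset of BLOSUM62 (seven residues) to stay self-contained; a full analysis would load the complete matrix, for example from a bioinformatics library, and would also handle gaps.

```python
# Subset of BLOSUM62 (log-odds scores in half-bit units) for I, L, V, D, E, K, R.
BLOSUM62_SUBSET = {
    ("I", "I"): 4, ("I", "L"): 2, ("I", "V"): 3, ("I", "D"): -3,
    ("I", "E"): -3, ("I", "K"): -3, ("I", "R"): -3,
    ("L", "L"): 4, ("L", "V"): 1, ("L", "D"): -4, ("L", "E"): -3,
    ("L", "K"): -2, ("L", "R"): -2,
    ("V", "V"): 4, ("V", "D"): -3, ("V", "E"): -2, ("V", "K"): -2, ("V", "R"): -3,
    ("D", "D"): 6, ("D", "E"): 2, ("D", "K"): -1, ("D", "R"): -2,
    ("E", "E"): 5, ("E", "K"): 1, ("E", "R"): 0,
    ("K", "K"): 5, ("K", "R"): 2,
    ("R", "R"): 5,
}


def pair_score(a: str, b: str) -> int:
    """BLOSUM62 is symmetric, so look the pair up in either order.
    Raises KeyError for residues outside this subset."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]


def gapless_score(seq_a: str, seq_b: str) -> int:
    """Sum of per-position substitution scores for an ungapped alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("ungapped alignment requires equal-length sequences")
    return sum(pair_score(a, b) for a, b in zip(seq_a, seq_b))


print(gapless_score("IVDK", "VIEK"))  # conservative swaps: 3 + 3 + 2 + 5 = 13
print(gapless_score("IVDK", "DKVR"))  # radical swaps: -3 - 2 - 3 + 2 = -6
```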
In genomic variant calling, tools such as SIFT and PolyPhen predict the pathogenicity of substitutions by evaluating conservation and physicochemical impact: SIFT uses sequence homology to score how intolerant a position is to change, while PolyPhen also integrates structural features. Recent advances apply machine learning to refine replacement predictions beyond what traditional matrices achieve. For instance, AlphaMissense fine-tunes an AlphaFold-derived model on variant population-frequency data to classify all possible human missense variants; it achieves about 80% agreement with experimental assays, outperforms tools such as SIFT, and classifies 89% of the roughly 71 million possible human missense variants as either likely benign or likely pathogenic. These machine learning approaches draw on training over vast sequence alignments to predict substitution effects in context, improving variant prioritization in clinical genetics.
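The intuition behind SIFT's conservation-based scoring can be sketched with a single alignment column: a replacement is suspect if the new residue is rare at that position relative to the most common residue. This is a deliberately simplified, SIFT-inspired score, not SIFT itself; the real tool additionally weights sequences and adds pseudocounts. The 0.05 cutoff mirrors SIFT's conventional threshold, and the alignment column is hypothetical.

```python
from collections import Counter


def tolerance_score(column: str, residue: str) -> float:
    """Frequency of `residue` in an alignment column divided by the frequency
    of the most common residue (gaps ignored; no pseudocounts or weighting)."""
    counts = Counter(aa for aa in column if aa != "-")
    if not counts:
        raise ValueError("column contains only gaps")
    return counts[residue] / max(counts.values())


def predict(column: str, residue: str, threshold: float = 0.05) -> str:
    """Label a replacement by comparing its score with the cutoff."""
    return "deleterious" if tolerance_score(column, residue) < threshold else "tolerated"


# Hypothetical column from 20 homologs: 17 Asp and 3 Glu.
column = "D" * 9 + "E" * 3 + "D" * 8
print(predict(column, "E"), predict(column, "V"))  # tolerated deleterious
```

Because the score is relative to the dominant residue, an unobserved residue such as Val scores 0, while Glu (3/17, about 0.18) clears the cutoff.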

In protein engineering and design

In protein engineering, site-directed mutagenesis is a key technique for introducing targeted amino acid replacements to enhance protein stability, often favoring conservative substitutions that minimize structural disruption. For instance, replacing asparagine with threonine at certain positions in T4 lysozyme has been shown to increase thermodynamic stability through improved hydrogen bonding, making the protein more resistant to unfolding. Such changes are chosen for physicochemical similarity, so that the core fold is preserved while thermostability improves, which matters in applications such as biocatalysis. Variant-selection strategies use distance metrics such as Grantham scores to prioritize replacements with low evolutionary distance and minimal functional impact when designing stable mutants. Complementary approaches use directed evolution, in which libraries of variants are generated by error-prone PCR or gene shuffling to scan replacements systematically, followed by screening or selection for desired traits such as catalytic efficiency. These methods enable iterative refinement, with conservative replacements often serving as a starting point that maintains baseline function during evolution. Notable examples include antibody engineering, where site-directed substitutions, even radical ones that alter charge or size, have markedly increased binding affinity: three engineered substitutions in an anti-p-azophenylarsonate antibody increased its affinity about 200-fold by optimizing interactions at the antigen-binding site. Similarly, green fluorescent protein (GFP) variants have been developed through targeted substitutions, such as replacing key residues in the chromophore environment to shift the emission spectrum, yielding variants with greater brightness and pH stability for imaging applications.
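Grantham's distance, one widely used metric of this kind, combines three side-chain properties (composition c, polarity p, and volume v) into a single score. The sketch below implements the usual form D = 50.723 * sqrt(1.833 Δc² + 0.1018 Δp² + 0.000399 Δv²); the property values and constants are transcribed from commonly reproduced versions of Grantham's 1974 table and should be checked against the original before use. Published Grantham matrices round the results to integers.

```python
import math

# (composition, polarity, volume) per residue, after Grantham (1974).
GRANTHAM_PROPS = {
    "S": (1.42, 9.2, 32.0), "R": (0.65, 10.5, 124.0), "L": (0.00, 4.9, 111.0),
    "P": (0.39, 8.0, 32.5), "T": (0.71, 8.6, 61.0), "A": (0.00, 8.1, 31.0),
    "V": (0.00, 5.9, 84.0), "G": (0.74, 9.0, 3.0), "I": (0.00, 5.2, 111.0),
    "F": (0.00, 5.2, 132.0), "Y": (0.20, 6.2, 136.0), "C": (2.75, 5.5, 55.0),
    "H": (0.58, 10.4, 96.0), "Q": (0.89, 10.5, 85.0), "N": (1.33, 11.6, 56.0),
    "K": (0.33, 11.3, 119.0), "D": (1.38, 13.0, 54.0), "E": (0.92, 12.3, 83.0),
    "M": (0.00, 5.7, 105.0), "W": (0.13, 5.4, 170.0),
}
ALPHA, BETA, GAMMA, SCALE = 1.833, 0.1018, 0.000399, 50.723


def grantham_distance(a: str, b: str) -> float:
    """Weighted Euclidean distance between two residues' property vectors."""
    (ca, pa, va), (cb, pb, vb) = GRANTHAM_PROPS[a], GRANTHAM_PROPS[b]
    return SCALE * math.sqrt(
        ALPHA * (ca - cb) ** 2 + BETA * (pa - pb) ** 2 + GAMMA * (va - vb) ** 2
    )


# Rank candidate replacements for a Glu position, most conservative first.
candidates = sorted("DQKVW", key=lambda aa: grantham_distance("E", aa))
print([(aa, round(grantham_distance("E", aa))) for aa in candidates])
```

Lower scores flag replacements closer to the original residue: Gln and Asp rank as the most conservative replacements for Glu, while the Glu-to-Val change seen in sickle cell hemoglobin scores about 121.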
These engineering efforts have produced improved therapeutics, such as affinity-matured antibodies for targeted cancer therapy, and robust industrial enzymes, such as thermostable hydrolases, in which substitutions enhance activity under harsh conditions. Overall, such replacements make it possible to create proteins with tailored properties and to carry them from laboratory design to practical use.

Recent developments

In recent years, protein language models have transformed the prediction of replacement effects by learning from vast sequence datasets to infer the functional and structural consequences of substitutions. The ESM-2 model, introduced in 2022, uses a transformer architecture to generate embeddings that capture evolutionary patterns, enabling zero-shot predictions of mutation effects on protein stability and interactions that often outperform traditional biophysical methods. Building on this, AlphaFold 3, released in 2024, uses a diffusion-based architecture to predict the joint structure of biomolecular complexes, which makes it possible to model how substitutions alter binding interfaces and conformations, as demonstrated in analyses of natural variants of Helicobacter proteins. Metrics for assessing replacements have also begun to incorporate three-dimensional structural context through graph-based approaches, evaluating substitution feasibility beyond linear alignments. A 2023 study using equivariant graph neural networks derived contextual encodings from protein atomic coordinates, yielding distance metrics that account for spatial interactions and improve predictions of which residues are exchangeable in the folded state. Similarly, graph denoising diffusion models from the same year have been applied to inverse protein folding, in which a target 3D structure guides the generation of sequences that can fold into it, revealing patterns of spatial exchangeability that traditional matrices overlook. Large-scale metagenomic analyses have uncovered new patterns of replacement in microbial communities under environmental stress, highlighting adaptive substitutions driven by climate-related pressures. A 2025 metagenomic survey of heat-stressed freshwater ecosystems identified enriched heat-tolerant hub species carrying specific variants, such as replacements in heat-shock proteins that correlate with elevated temperatures.
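The zero-shot scoring used with masked protein language models such as ESM-2 reduces to comparing the model's predicted probabilities for the mutant and wild-type residues at a masked position (a log-odds ratio). The sketch below assumes those probabilities have already been obtained from a model (running ESM-2 itself is not shown) and uses a hypothetical distribution, truncated to five residues for brevity.

```python
import math


def log_odds_score(position_probs: dict, wild_type: str, mutant: str) -> float:
    """log p(mutant) - log p(wild type) at one masked position.
    Negative scores mean the model finds the mutant less plausible."""
    return math.log(position_probs[mutant]) - math.log(position_probs[wild_type])


def rank_substitutions(position_probs: dict, wild_type: str) -> list:
    """All non-wild-type residues, ordered from most to least tolerated."""
    scores = {aa: log_odds_score(position_probs, wild_type, aa)
              for aa in position_probs if aa != wild_type}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical model output at a masked Asp position.
probs = {"D": 0.60, "E": 0.30, "N": 0.07, "V": 0.02, "W": 0.01}
print(round(log_odds_score(probs, "D", "E"), 3))  # -0.693
print(rank_substitutions(probs, "D"))  # ['E', 'N', 'V', 'W']
```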
These findings, drawn from diverse proteomes, also point to increased substitution rates in carbon-metabolism pathways under altered precipitation patterns, suggesting that climate change can accelerate evolutionary dynamics at the residue level. Efforts to close gaps in substitution modeling include the development of updated matrices derived from large, diverse proteomes that better capture variability across taxa. The 3Di substitution matrix, proposed in 2025, is built from structural alignments of thousands of proteins and quantifies substitution frequencies between 3Di structural states rather than amino acids, offering a starting point for resolving deep phylogenies, including those of underrepresented lineages, that are difficult to reconstruct from sequence alone. Complementing such models, CRISPR-based experiments have tested predicted substitutions by introducing targeted mutations and measuring phenotypic outcomes; 2025 studies engineering Cas9 variants, for example, confirmed that machine learning-guided replacements improved editing specificity and reduced off-target effects. Looking ahead, synthetic biology is advancing the incorporation of non-standard amino acids into proteins, expanding the repertoire of functional replacements for therapeutic and materials applications. In 2025, genome recoding platforms enabled the programmable synthesis of proteins containing unnatural residues, achieving yields up to 50% higher than prior methods by using orthogonal tRNA/aminoacyl-tRNA synthetase pairs. Developments in 2024 and 2025 in non-canonical amino acid incorporation, including modular cascades, have enabled site-specific integrations that confer novel properties, paving the way for designer proteins resilient to environmental challenges.
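Genetic code expansion for non-standard amino acids commonly reassigns a codon, most often the amber stop codon (UAG), through an orthogonal tRNA/synthetase pair. The toy translator below sketches only the coding consequence of such a reassignment: it uses the standard genetic code and optionally overrides chosen codons. The symbol "X" for a non-canonical residue is an arbitrary placeholder chosen for this example, and codons are written in the DNA alphabet for simplicity.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(dna: str, reassigned=None) -> str:
    """Translate a coding sequence until the first stop codon ('*').
    `reassigned` maps codons to new meanings, e.g. {"TAG": "X"}."""
    code = {**STANDARD_CODE, **(reassigned or {})}
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = code[dna[i:i + 3].upper()]
        if residue == "*":
            break  # stop codon terminates translation
        protein.append(residue)
    return "".join(protein)


gene = "ATGGAGTAGTGGTAA"
print(translate(gene))                # ME (standard code stops at TAG)
print(translate(gene, {"TAG": "X"}))  # MEXW (TAG recoded to a placeholder ncAA)
```

The same function shows an ordinary point substitution: `translate("GAG")` gives glutamate and `translate("GTG")` gives valine.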

References

  1. [1]
  2. [2]
    Sickle Cell Disease—Genetics, Pathophysiology, Clinical ...
    May 7, 2019 · A single base-pair point mutation (GAG to GTG) results in the substitution of the amino acid glutamic acid (hydrophilic) to Valine (hydrophobic) ...
  3. [3]
    Amino Acid Substitution - an overview | ScienceDirect Topics
    Amino acid substitutions refer to the replacement of one amino acid in a protein with another, which can impact the protein's structure and function based ...
  4. [4]
    Amino Acid Properties, Substitution Rates, and the Nearly Neutral ...
    For example, Smith (2003) defined amino acid substitutions as either “radical” (physicochemically different) or “conservative” (physicochemically similar) and ...
  5. [5]
    Improved General Amino Acid Replacement Matrix - Oxford Academic
    Amino acid replacement matrices are an essential basis of protein phylogenetics. They are used to compute substitution probabilities along phylogeny branches.
  6. [6]
    Amino Acid Substitution - an overview | ScienceDirect Topics
    Amino acid substitutions refer to changes in the amino acid sequence of proteins that can enhance the copying fidelity of viral polymerases, ...
  7. [7]
    What is Amino Acid Replacement? - News-Medical
    Neutral amino acid replacements occur when the new amino acid does not significantly alter the protein or its function. Deleterious amino acid replacements are ...
  8. [8]
    On the Evolutionary History of the Twenty Encoded Amino Acids - PMC
    α‐Amino acids are essential molecular constituents of life, twenty of which are privileged because they are encoded by the ribosomal machinery.
  9. [9]
    Types of mutations - Understanding Evolution
    A substitution is a mutation that exchanges one base for another (i.e., a change in a single “chemical letter” such as switching an A to a G). Such a ...
  10. [10]
    Amino acid substitution models - Evolution and Genomics
    A model of evolution in which amino acids mutate randomly and independently from one another but according to some predefined probabilities depending on the ...
  11. [11]
    Margaret Belle Dayhoff - PMC - NIH
    She originated one of the first substitution matrices, Point accepted mutations (PAM). The one-letter code used for amino acids was developed by her ...
  12. [12]
    Molecular mechanisms of cystic fibrosis – how mutations lead to ...
    Clinically, the most devastating CF phenotype is the impairment of lung function and airways disease. Mutations in the CFTR gene lead to defective CFTR ...
  13. [13]
    Amino acid substitution in the active site of DNA polymerase β ... - NIH
    May 16, 2013 · The D256E enzyme is more than 1,000-fold less active than the wild-type enzyme, and the crystal structures are subtly different in the active ...
  14. [14]
    Why highly expressed proteins evolve slowly | PNAS
    Proteins vary in their tolerance for amino acid substitutions (37), providing the necessary raw material for evolution, whereas misfolded-protein aggregation ...
  15. [15]
    Mass & secondary structure propensity of amino acids explain their ...
    Aug 10, 2017 · Substitution matrices, such as BLOSUM62 & BLOSUM100, define seven replacement pairs of amino acids. Our structural similar pairs do coincide ...<|control11|><|separator|>
  16. [16]
    Discovering functionally important sites in proteins - Nature
    Jul 13, 2023 · Since most proteins need to be folded to function, it may, however, be difficult to deconvolute the effects of amino acid substitutions on ...Missing: replacements | Show results with:replacements
  17. [17]
    [PDF] dayhoff-1978-apss.pdf
    An accepted point mutation is a replacement of one amino acid by another, accepted by natural selection, where the new amino acid usually functions similarly ...
  18. [18]
    A global analysis of conservative and non-conservative mutations in ...
    Dec 30, 2021 · We found that the most frequent substitutions are T > I, L > F, and A > V and we observe accumulation of hydrophobic residues. These findings ...
  19. [19]
    What is a conservative substitution? | Journal of Molecular Evolution
    A substitution of one amino acid residue for another has a far greater chance of being accepted if the two residues are similar in properties.
  20. [20]
    The amino-acid mutational spectrum of human genetic disease
    Oct 30, 2003 · Alternatively, a particular amino-acid substitution can be damaging to a human protein but be relatively frequent in the homologous family due ...
  21. [21]
    Identification of the Cysteine Residue Responsible for Disulfide ...
    Here we show, using a mutagenesis study, that a single cysteine-to-alanine substitution at the β2 extracellular residue Cys-26 is sufficient to disrupt the ...
  22. [22]
    The first glycine‐to‐tryptophan substitution in the COL1A1 gene ...
    Jun 24, 2022 · We have identified the first glycine to tryptophan substitution (p.Gly245Trp) located in the COL1A1 gene in a patient with progressively‐deforming type of ...
  23. [23]
    Amino Acid Difference Formula to Help Explain Protein Evolution
    A formula for difference between amino acids combines properties that correlate best with protein residue substitution frequencies.Missing: original | Show results with:original
  24. [24]
    [PDF] Calculate the Grantham Distance
    The Grantham distance attempts to provide a proxy for the evolutionary distance between two amino acids based on three key chemical properties: composition, ...
  25. [25]
    Grantham score - Ion Reporter
    The Grantham score attempts to predict the distance between two amino acids, in an evolutionary sense. A lower Grantham score reflects less evolutionary ...Missing: paper | Show results with:paper
  26. [26]
    Optimization of amino acid replacement costs by mutational ... - Nature
    Apr 21, 2017 · The empirical matrices appeared to minimize at best costs of amino acid replacement according to conformational parameter for alpha helix, which ...
  27. [27]
    Universal Evolutionary Index for Amino Acid Changes
    Aug 1, 2004 · The analysis shows that the EI values differ by at least 10-fold between conservative and radical changes but their relative ranking appears ...
  28. [28]
    The Exchangeability of Amino Acids in Proteins - PMC - NIH
    Formally related to these are various schemes to distinguish “conservative” from “radical” amino acid changes (Hughes et al. 1990; Hughes 1992; Rand et al. 2000 ...
  29. [29]
    Protein Structure, Neighbor Effect, and a New Index of Amino Acid ...
    The maximum likelihood values (table 11 ) show that setting all amino acid distances as equal is the worst, followed by Grantham's and Miyata's distances.Missing: limitations | Show results with:limitations
  30. [30]
    Two types of amino acid substitutions in protein evolution
    The frequency of amino acid substitutions, relative to the frequency expected by chance, decreases linearly with the increase in physico-chemical differences.
  31. [31]
    Molecular evolution of mRNA: a method for estimating ... - PubMed
    A method for estimating the evolutionary rates of synonymous and amino acid substitutions from homologous nucleotide sequences is presented.Missing: distance | Show results with:distance
  32. [32]
    PAM250 Scoring Matrix - THE GPM
    The matrix is frequently used to score aligned peptide sequences to determine the similarity of those sequences.
  33. [33]
    rapid generation of mutation data matrices from protein sequences
    Article contents. Cite. Cite. David T. Jones, William R. Taylor, Janet M. Thornton, The rapid generation of mutation data matrices from protein sequences ...Missing: JTT | Show results with:JTT
  34. [34]
    Substitution scoring matrices for proteins ‐ An overview - Trivedi
    Sep 21, 2020 · In this review, we describe the development and application of various substitution scoring matrices peculiar to proteins with standard and biased compositions.
  35. [35]
    UniProtKB/Swiss-Prot Release 2025_04 statistics - Expasy
    AMINO ACID COMPOSITION 6.1 Composition in percent for the complete database Ala (A) 8.25 Gln (Q) 3.93 Leu (L) 9.64 Ser (S) 6.65 Arg (R) 5.52 Glu (E) 6.71 ...
  36. [36]
    Alanine - Russell Lab
    Substitutions ... Alanine generally prefers to substitute with other small amino acids. Role in structure: Alanine is arguably the most boring amino acid.
  37. [37]
    Selectionism and Neutralism in Molecular Evolution - PMC - NIH
    Early molecular studies suggested that most amino acid substitutions in proteins are neutral or nearly neutral and the functional change of proteins occurs by a ...
  38. [38]
    Cysteine function governs its conservation and degeneration ... - NIH
    A major evolutionary outcome is that Cys usage on protein surfaces is limited, and the overall trend is that Cys is avoided in exposed regions of proteins ...
  39. [39]
    The Uniqueness of Tryptophan in Biology: Properties, Metabolism ...
    Tryptophan (Trp) holds a unique place in biology for a multitude of reasons. It is the largest of all twenty amino acids in the translational toolbox.
  40. [40]
    Glycine Residues Appear to Be Evolutionarily Conserved for Their ...
    Six glycine residues of human muscle acylphosphatase (AcP) are evolutionarily conserved across the three domains of life.
  41. [41]
    Positions of cysteine residues reveal local clusters and hidden ...
    Oct 29, 2024 · Cysteine is the least abundant amino acid in human proteins, and its position in protein sequences is often conserved across species.
  42. [42]
    Protein–protein interactions: Structurally conserved residues ... - PNAS
    Conservation of Trp on the protein surface indicates a highly likely binding site. To a lesser extent, conservation of Phe and Met also imply a binding site.
  43. [43]
    The protonation state of an evolutionarily conserved histidine ...
    Apr 1, 2019 · We postulate that the presence of this evolutionary histidine, which is only conserved in M, O and P subfamilies, is one of the many ...
  44. [44]
    Why is glycine a highly conserved amino acid? - Quora
    Jul 11, 2020 · Because glycine has just a hydrogen instead of a side chain, it is the least restrictive of how the polypeptide backbone can twist and turn. So ...
  45. [45]
    Conservation of cis prolyl bonds in proteins during evolution - PubMed
    Feb 15, 2005 · We observe that cis prolyl residues are more often conserved than trans prolyl residues, and both are more conserved than the surrounding amino ...
  46. [46]
    Methionine in proteins: The Cinderella of the proteinogenic amino ...
    Methionine in proteins fulfils an important antioxidant role, stabilizes the structure of proteins, participates in the sequence‐independent recognition of ...
  47. [47]
    The critical structural role of a highly conserved histidine residue in ...
    Oct 7, 2003 · The role of buried histidine residues in contributing to protein stability has been reported in a number of proteins, such as vigilin [15], ...<|separator|>
  48. [48]
    Proline: The Distribution, Frequency, Positioning, and Common ...
    Proline is an anomalous amino acid. Its nitrogen atom is covalently locked within a ring, thus it is the only proteinogenic amino acid with a constrained ...
  49. [49]
    An evolutionarily conserved tryptophan cage promotes folding of the ...
    Apr 18, 2025 · An evolutionarily conserved tryptophan cage promotes folding of the extended RNA recognition motif in the hnRNPR‐like protein family - PMC.
  50. [50]
    Glycine as a d-amino acid surrogate in the K+-selectivity filter | PNAS
    The K+ channel-selectivity filter consists of two absolutely conserved glycine residues. Crystal structures show that the first glycine in the selectivity ...
  51. [51]
    Molecular Frequency, Conservation and Role of Reactive Cysteines ...
    Many Cys residues have been shown to contribute to protein formation and stability in plants, primarily through formation of disulfide bonds, e.g. Bienert et al ...
  52. [52]
  53. [53]
    Evidence for the Selective Basis of Transition-to-Transversion ... - NIH
    Sep 25, 2017 · For example, a mutation that changes the charge of an amino acid is a “radical” change, whereas one that does not is a “conservative” change.
  54. [54]
    Genetic Codes - NCBI
    Detailed information on codon usage can be found at the Codon Usage Database. GenBank format by historical convention displays mRNA sequences using the DNA ...The Standard Code · The Mold, Protozoan, and... · The Bacterial, Archaeal and...
  55. [55]
    Fitting the standard genetic code into its triplet table - PNAS
    Aug 30, 2021 · The form of the standard genetic code (SGC) offers authoritative information about its origin. By calculating evolved coding tables via ...
  56. [56]
    [PDF] 22 A Model of Evolutionary Change in Proteins
    The relative mutabilities of the amino acids are shown in Table 21. On the average, Asn, Ser, Asp, and Glu are most mutable and Trp and Cys are least ...
  57. [57]
    Amino Acid Properties, Substitution Rates, and the Nearly Neutral ...
    Ratios of radical to conservative amino acid replacement are affected by mutational and compositional factors and may not be indicative of positive darwinian ...
  58. [58]
    Amino acid properties, substitution rates, and the nearly neutral theory
    Nov 6, 2024 · The nearly neutral theory predicts that greater selective constraint leads to slower rates of evolution; similarly, we expect amino acids that ...
  59. [59]
    Ratios of radical to conservative amino acid replacement ... - PubMed
    Ratios of radical to conservative amino acid replacement are affected by mutational and compositional factors and may not be indicative of positive Darwinian ...
  60. [60]
    Frequent and Widespread Parallel Evolution of Protein Sequences
    Some of the most common parallel substitutions in our data matrices involve amino acids with similar physicochemical properties, for example, valine and ...Materials And Methods · Results · Excess Homoplasy Is Largely...
  61. [61]
    Extensive parallelism in protein evolution - PMC - PubMed Central
    High, but below-neutral, rates of parallel amino acid replacements suggest that a majority of amino acid replacements that occur in evolution are subject to ...
  62. [62]
    A genetic polymorphism evolving in parallel in two cell ...
    Jan 12, 2013 · ... amino acid polymorphisms, Gly 335 ↔ Ser, Asp 503 ↔ Glu, and Ile 629 ↔ Val occur in both species, and in the first two cases are similar ...
  63. [63]
    Evolutionary patterns of amino acid substitutions in 12 Drosophila ...
    Dec 2, 2010 · We present a tool for the genome-wide analysis of frequencies and patterns of amino acid substitutions in multiple alignments of genes' coding ...
  64. [64]
    Ratios of Radical to Conservative Amino Acid Replacement are ...
    This method is based on the assumption that radical replacements are more likely than conservative replacements to improve the function of a protein.
  65. [65]
    Rates of conservative and radical nonsynonymous nucleotide ...
    The average radical/conservative rate ratio is 0.81 for charge changes, 0.85 for polarity changes, and 0.49 when both polarity and volume changes are ...
  66. [66]
    Comparative proteome analysis of psychrophilic versus mesophilic ...
    Jan 8, 2009 · We analyzed proteomes of psychrophilic and mesophilic bacterial species and compared the differences in amino acid composition and substitution patterns.Comparative Proteome... · Aromatic Amino Acids · Methods
  67. [67]
    Neutral Theory: The Null Hypothesis of Molecular Evolution | Learn Science at Scitable
    ### Summary: Neutral Theory vs Positive Selection in Context of Amino Acid Changes
  68. [68]
    The role of metabolism in shaping enzyme structures over ... - Nature
    Jul 9, 2025 · We reveal that metabolism shapes structural evolution across multiple scales, from species-wide metabolic specialization to network organization and the ...<|separator|>
  69. [69]
    Structural and functional innovations in the real-time evolution of ...
    Apr 17, 2017 · We found that selection affects enzyme structure and dynamics, and thus substrate preference, simultaneously and sequentially. Bifunctionality ...
  70. [70]
    Evolution of Hemoglobin and Its Genes - PMC - NIH
    The α-like globins are paralogous, meaning that they are homologous genes generated by gene duplication. ζ-globin is made in embryonic red cells, and α-globin ...
  71. [71]
    Selective Constraints, Amino Acid Composition, and the Rate of ...
    However, regression analyses showed that the variability in amino acid frequencies among proteins cannot explain more than 30% of the variability in ...
  72. [72]
    The Population Genetics of dN/dS - Research journals - PLOS
    The dN/dS ratio remains one of the most popular and reliable measures of evolutionary pressures on protein-coding regions. Much of its popularity stems from the ...
  73. [73]
    Genetic Constraints on Protein Evolution - PMC - PubMed Central
    Structural constraints have an impact on the tolerance of proteins to amino acid substitutions and therefore affect their rates of protein evolution. Several ...
  74. [74]
    Predicting the Effects of Amino Acid Substitutions on Protein Function
    Abstract. Nonsynonymous single nucleotide polymorphisms (nsSNPs) are coding variants that introduce amino acid changes in their corre- sponding proteins.<|control11|><|separator|>
  75. [75]
    Accurate proteome-wide missense variant effect prediction ... - Science
    Sep 22, 2023 · AlphaMissense takes as input an amino acid sequence and predicts the pathogenicity of all possible single amino acid changes at a given position ...
  76. [76]
    effect of Ile→Val mutations in human lysozyme - ScienceDirect.com
    The contribution of the hydrophobic residues to protein stability has been studied using site-directed mutagenesis followed by thermodynamic and structural ...
  77. [77]
    Comparing mutagenesis and simulations as tools for identifying ...
    Dec 24, 2018 · To link the effects of specific amino acid substitutions with adaptive variations in enzyme thermal stability, we combined site-directed ...
  78. [78]
    library construction methods for directed evolution - PMC - NIH
    Abstract. Directed molecular evolution and combinatorial methodologies are playing an increasingly important role in the field of protein engineering.
  79. [79]
    Using Evolution to Guide Protein Engineering: The Devil IS in the ...
    Second, conservation can be identified via the use of a.a. substitution matrices (e.g., BLOSUM-62 (20)). These matrices are empirically determined from ...Main Text · Data Thresholding Greatly... · Mutations At Some...<|control11|><|separator|>
  80. [80]
    three engineered amino acid substitutions can increase the affinity ...
    Structural correlates of high antibody affinity: three engineered amino acid substitutions can increase the affinity of an anti-p-azophenylarsonate antibody 200 ...Missing: radical | Show results with:radical
  81. [81]
    Engineering green fluorescent protein for improved brightness ...
    The simplest way to shift the emission color of GFP is to substitute histidine or tryptophan for the tyrosine in the chromophore, but such blue-shifted point ...
  82. [82]
    Protein Engineering for Industrial Biocatalysis: Principles ... - MDPI
    This review explores the principles and methodologies of protein engineering, emphasizing rational design, directed evolution, semi-rational approaches,
  83. [83]
    Site-Directed Mutagenesis as Applied to Biocatalysts - IntechOpen
    Feb 5, 2013 · The Site-directed Saturation Mutagenesis (SDSM) approach consists ofusing all 20 amino acids at a position in a protein. Based on structural ...2.1. Sequence-Based... · 2.1. 2. Protein Sequence... · 3.3. 1. Transglycosylation...
  84. [84]
    [PDF] Language models of protein sequences at the scale of ... - bioRxiv
    Jul 21, 2022 · ESM-2 is a transformer-based language model, and uses an attention mechanism to learn interaction patterns between pairs of amino acids in the ...
  85. [85]
    Expert-guided protein language models enable accurate and ...
    Its prediction accuracy exceeded all of the following: zero-shot ESM-2 log-odds ratios between the mutant and wild-type amino acids (Δρ = 0.036, one-tailed ...2 Materials And Methods · 2.2. 1 Datasets · 3 Results
  86. [86]
    Accurate structure prediction of biomolecular interactions ... - Nature
    May 8, 2024 · Here we describe our AlphaFold 3 model with a substantially updated diffusion-based architecture that is capable of predicting the joint structure of complexes.
  87. [87]
    AlphaFold 3 accurately models natural variants of Helicobacter ...
    Aug 12, 2025 · Even single amino acid substitutions may alter ligand binding affinity, enzymatic activity, and protein stability. Yet, due to limitations in ...
  88. [88]
    Contextual protein and antibody encodings from equivariant graph ...
    Jul 29, 2023 · The optimal residue identity at each position in a protein is determined by its structural, evolutionary, and functional context.
  89. [89]
    Graph Denoising Diffusion for Inverse Protein Folding
    Inverse protein folding, or inverse folding, aims to predict feasible amino acid (AA) sequences that can fold into a specified 3D protein structure [22]. The ...
  90. [90]
    Heat-tolerant network hub species make freshwater microbial ...
    Jun 7, 2025 · Metagenomics especially can recover full genomes from environmental microbes and further profile the taxonomic composition and functional ...
  91. [91]
    Long‐term elevated precipitation promotes an acid metabolic ...
    Jul 28, 2025 · Additionally, changes in precipitation patterns can influence microbial carbon metabolism by altering soil water availability, which affects ...
  92. [92]
    A General Substitution Matrix for Structural Phylogenetics - PMC - NIH
    Jun 6, 2025 · Our 3Di substitution matrix provides a starting point for revisiting many deep phylogenetic problems that have so far been extremely difficult to solve.
  93. [93]
    AI-guided Cas9 engineering provides an effective strategy to ...
    Sep 15, 2025 · Although ProMEP can currently predict the effects of any amino acid substitution in protein sequences, this strategy appears to only ...
  94. [94]
    Leveraging protein language models for cross-variant CRISPR ...
    Jul 2, 2025 · In this study, we proposed PLM-CRISPR, a novel deep learning-based model that leverages protein language models to capture Cas9 protein ( ...
  95. [95]
    Yale scientists recode the genome for programmable synthetic ...
    Feb 6, 2025 · A team of synthetic biologists have re-written the genetic code of an organism using a novel cellular platform for producing new classes of synthetic proteins.
  96. [96]
    Modular multi-enzyme cascades enable green and sustainable ...
    Aug 29, 2025 · Non-canonical amino acids (ncAAs) bearing diverse functional groups hold transformative potential in drug discovery, protein engineering, ...
  97. [97]
    Genetic Code Expansion: Recent Developments and Emerging ...
    Dec 31, 2024 · Notable developments include the refinement of aaRS/tRNA pairs, enhancements in screening methods, and the biosynthesis of noncanonical amino ...