
Homology modeling

Homology modeling, also known as comparative modeling, is a computational method in structural bioinformatics for predicting the three-dimensional (3D) structure of a target protein based on its amino acid sequence and the known structure of a homologous template protein that shares evolutionary ancestry. This approach exploits the fundamental principle that proteins with sufficient sequence similarity (typically greater than 30% identity) adopt comparable folds due to conserved evolutionary relationships, where structural conservation often outpaces sequence divergence. As one of the most established techniques in protein structure prediction, it has been instrumental in filling gaps in the Protein Data Bank (PDB) by enabling the modeling of sequences without experimentally determined structures, particularly when a suitable template is available.

The origins of homology modeling trace back over half a century to early studies on protein folding and sequence-structure relationships in the 1960s and 1970s, but it gained practical momentum in the 1980s with the expansion of the PDB and seminal works demonstrating the feasibility of template-based modeling. Key advancements occurred through the biennial Critical Assessment of Structure Prediction (CASP) experiments starting in 1994, which benchmarked and refined the method's accuracy, revealing that models can achieve root-mean-square deviation (RMSD) values below 1 Å for high-identity targets. By the 1990s, tools like MODELLER formalized the process, integrating statistical and physics-based refinements to address challenges such as alignment errors and loop flexibility.

The core workflow of homology modeling involves several sequential steps to construct and validate a reliable model. First, a suitable template is identified from structural databases like the PDB using sequence similarity searches (e.g., via BLAST or PSI-BLAST). Next, the target sequence is aligned to the template to map conserved regions, followed by backbone coordinate transfer, loop modeling for variable regions (using methods like database scanning or ab initio generation), and side-chain packing with rotamer libraries (e.g., SCWRL). The model is then refined through energy minimization or molecular dynamics simulations to resolve steric clashes, and finally validated using metrics such as Ramachandran plots, PROCHECK scores, or stereochemical assessments to ensure physical realism.

Homology modeling's significance lies in its accessibility and efficiency compared to experimental methods like X-ray crystallography or cryo-electron microscopy, making it a workhorse for applications in drug discovery, functional annotation, and structural genomics. In pharmaceuticals, it facilitates structure-based drug design by predicting binding sites for ligands, as seen in modeling G-protein coupled receptors (GPCRs) for drug development and in the design of inhibitors against targets like histone deacetylases (HDACs). Despite limitations, such as reduced accuracy for low-identity templates (below 30% identity) or membrane proteins, ongoing integration with machine learning and artificial intelligence continues to enhance its precision, even as deep learning methods such as AlphaFold complement it for challenging cases.

Introduction

Definition and Principles

Homology modeling is a computational technique used to predict the three-dimensional structure of a target protein by leveraging the known experimental structure of a homologous protein. This method relies on the fundamental evolutionary principle that protein structures are more conserved than their sequences over time, allowing structural similarities to persist even as sequences diverge. At its core, homology modeling assumes that proteins sharing a common evolutionary ancestry (termed homologs) adopt similar folds due to shared descent, in contrast to analogous structures that arise through convergent evolution without a common ancestor. A key indicator of reliable homology is sequence identity, typically exceeding 30%, which correlates with structural similarity and model accuracy; this percentage is calculated as:

\text{Sequence Identity (\%)} = \left( \frac{\text{Number of identical residues}}{\text{Total number of aligned residues}} \right) \times 100

Below this threshold, predictions become less dependable due to increased structural divergence. The basic workflow of homology modeling encompasses four high-level steps: identifying suitable template structures through sequence similarity searches, aligning the target and template sequences, constructing the atomic model based on the template's coordinates, and validating the resulting structure for consistency. This approach is particularly valuable for proteins with functional similarities, as it extrapolates conserved structural features to infer the target's tertiary fold.
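
As a concrete illustration of the identity calculation above, the following Python sketch counts identical residues over the non-gap columns of a pairwise alignment; the sequences and gap character are illustrative assumptions rather than output from any particular alignment tool.

    # Minimal sketch: percent identity over the aligned (non-gap) columns
    # of a pairwise alignment; inputs are placeholders for illustration.
    def percent_identity(aligned_target: str, aligned_template: str, gap: str = "-") -> float:
        identical = 0
        aligned_cols = 0
        for a, b in zip(aligned_target, aligned_template):
            if a == gap or b == gap:
                continue  # skip indel columns
            aligned_cols += 1
            if a == b:
                identical += 1
        return 100.0 * identical / aligned_cols if aligned_cols else 0.0

    # 8 identical residues out of 10 aligned (non-gap) columns -> 80.0
    print(percent_identity("MKTAYIAK-LV", "MKSAYLAKQLV"))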

Historical Development

Homology modeling emerged in the late 1960s as one of the earliest computational approaches to protein structure prediction, building on the observation that proteins with similar sequences often adopt similar three-dimensional folds. The foundational work was reported by Browne and colleagues in 1969, who manually constructed a model of bovine alpha-lactalbumin by aligning its sequence to that of hen egg-white lysozyme, a structurally known homolog, and fitting coordinates by hand with the aid of physical models. This coordinate-fitting method represented an early template-based strategy, relying on sequence similarity to infer structural correspondence, though it was labor-intensive and limited by the scarcity of available structures. The 1971 establishment of the Protein Data Bank (PDB) marked a pivotal shift, providing a centralized repository for experimentally determined structures that grew from just seven entries to thousands by the 1990s, enabling more systematic database-driven searches for homologous templates rather than ad hoc manual alignments.

In the 1980s, quantitative insights advanced the field: Chothia and Lesk's 1986 analysis of homologous protein pairs demonstrated a nonlinear relationship between sequence divergence and structural deviation, establishing that even distantly related sequences (with identities as low as roughly 20-30%) could retain core folds, thus justifying homology modeling for a broader range of targets.

The 1990s saw the rise of automated tools, transforming homology modeling from manual processes to computational pipelines. A landmark was the development of MODELLER in 1993 by Sali and Blundell, which automated model generation by satisfying spatial restraints derived from template structures and empirical potentials, significantly improving efficiency and accuracy. Concurrently, the inaugural Critical Assessment of Structure Prediction (CASP) experiment in 1994 introduced blind benchmarking, revealing homology modeling's strengths for targets with detectable homologs while highlighting the need for better alignments and loop handling; subsequent CASPs through the decade refined these aspects and solidified the method's role. By the 2000s, refinements focused on challenging regions, such as loop modeling, with Fiser et al.'s 2000 extensions to MODELLER incorporating statistical potentials for loop conformation prediction, enhancing overall model quality.

Homology modeling remained the dominant technique for structure prediction into the 2010s, applicable to over 50% of new sequences due to expanding structural databases, until advances like AlphaFold2 in 2020 demonstrated superior capabilities for cases without clear homologs. Subsequent advancements, such as AlphaFold3 in 2024, have further expanded capabilities to model protein interactions with other biomolecules.

Prerequisites

Protein Structure Fundamentals

Proteins are macromolecules composed of amino acids linked by peptide bonds, and their three-dimensional structures are crucial for function. The structure of a protein is organized into four hierarchical levels. At the primary level, the structure is defined by the linear sequence of amino acids, which determines all higher-order folding. Secondary structure elements, such as alpha helices and beta sheets, arise from hydrogen bonding between the backbone carbonyl oxygen and amide hydrogen atoms within the polypeptide chain. Tertiary structure represents the overall three-dimensional fold of a single polypeptide chain, stabilized by hydrophobic interactions that bury nonpolar residues in the core, as well as electrostatic interactions, van der Waals forces, and disulfide bonds between cysteine residues. Quaternary structure occurs in proteins with multiple subunits, where individual chains assemble into a functional complex, often further stabilized by the same types of interactions as in tertiary structure.

The folding of a protein from its primary sequence into a native structure is governed by thermodynamic principles. Anfinsen's thermodynamic hypothesis posits that the native structure is the one with the lowest Gibbs free energy, uniquely determined by the amino acid sequence under physiological conditions, as demonstrated by experiments refolding denatured ribonuclease. However, Levinthal's paradox highlights the immense conformational search space (a single 100-residue protein could theoretically adopt more than 10^47 possible conformations), yet proteins fold rapidly in milliseconds to seconds; this apparent conflict is resolved by the concept of an energy funnel, where the landscape guides folding toward the native state via partially folded intermediates. In vivo, molecular chaperones such as GroEL and Hsp70 assist folding by preventing aggregation and promoting correct pathways, particularly for larger proteins.

Key geometric constraints define allowable protein conformations. The Ramachandran plot maps the backbone dihedral angles phi (φ) and psi (ψ), revealing favored regions for alpha helices (φ ≈ -60°, ψ ≈ -45°), beta sheets (φ ≈ -120°, ψ ≈ +120°), and other motifs, based on steric hindrance from side chains and backbone atoms; disallowed regions cover much of the plot owing to atomic clashes. In folded proteins, the hydrophobic core exhibits high packing density, with the solvent-accessible surface area (SASA) of nonpolar residues typically reduced by 80-90% compared to the unfolded state, as these residues minimize exposure to water. Evolutionarily related proteins often conserve these tertiary folds despite sequence divergence, underscoring structure's role in function.

Experimental structures are stored in the Protein Data Bank (PDB) format, an ASCII file containing atomic coordinates (x, y, z in angstroms), residue identifiers, and metadata such as resolution from X-ray crystallography or cryo-EM; each entry includes a header with experimental details and, for NMR-derived entries, multiple models representing the conformational ensemble. Visualization tools like PyMOL render these coordinates in 3D, allowing rotation, zooming, and highlighting of secondary elements or surfaces for analysis.
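
To make the PDB format description concrete, the sketch below extracts Cα coordinates from a legacy PDB-format file using the fixed-width column layout of ATOM records; the file name is a placeholder, and for simplicity the parser ignores details such as alternate locations, chain selection, and multiple models.

    # Minimal sketch: read C-alpha coordinates from a legacy PDB-format file.
    # Column slices follow the fixed-width ATOM record layout; "model.pdb" is a placeholder.
    def read_ca_coordinates(path: str):
        coords = []
        with open(path) as handle:
            for line in handle:
                if line.startswith("ATOM") and line[12:16].strip() == "CA":
                    resname = line[17:20].strip()
                    resseq = int(line[22:26])
                    x = float(line[30:38])
                    y = float(line[38:46])
                    z = float(line[46:54])
                    coords.append((resname, resseq, (x, y, z)))
        return coords

    ca_atoms = read_ca_coordinates("model.pdb")
    print(len(ca_atoms), "C-alpha atoms read")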

Sequence Homology Concepts

In protein sequences, homology denotes a relationship of common evolutionary ancestry, resulting in shared features due to descent from a common ancestor. This conservation arises because functional constraints limit divergence, preserving key motifs across related proteins. Homologous proteins are classified into orthologs, which evolve via speciation events from a single ancestral gene in different lineages, and paralogs, which arise from gene duplication within a genome followed by divergence. In contrast, sequence similarity refers merely to observable matches in composition or order, representing a statistical measure without implying evolutionary relatedness; homology requires evidence of ancestry beyond mere resemblance.

Detecting sequence homology relies on alignment-based metrics that assess significance amid random variation. Tools like BLAST and its iterative extension PSI-BLAST perform local alignments to identify similar regions, using bit scores to quantify match quality while E-values estimate the probability of chance occurrences in a database search, with thresholds below 0.01 typically indicating reliable homology. However, challenges emerge in the "twilight zone" of sequence identity below 30%, where alignments become unreliable for inferring homology due to saturation of substitutions and structural divergence, often requiring advanced profile-based methods for detection.

Sequence conservation patterns reflect evolutionary pressures, with invariant residues frequently occurring in active sites to maintain catalytic or binding functions, as identified through phylogenetic analyses like the evolutionary trace method. Variable regions, such as surface loops, tolerate greater substitution and insertion rates due to reduced functional constraints, while core structural elements remain stable. To quantify evolutionary divergence while accounting for multiple unobserved substitutions, the Poisson correction for protein distances is used:
d = -\ln(1 - p)
where p represents the proportion of observed differences between aligned residues (gaps excluded via pairwise deletion), providing an estimate of substitutions per site.
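
A small numerical illustration of this correction is sketched below; the value of p is supplied by the user after gap columns have been excluded, as described above.

    import math

    # Minimal sketch: Poisson-corrected evolutionary distance from the observed
    # fraction of differing residues p (gap columns excluded beforehand).
    def poisson_distance(p: float) -> float:
        if not 0.0 <= p < 1.0:
            raise ValueError("p must lie in [0, 1); p = 1 implies an unbounded distance")
        return -math.log(1.0 - p)

    # Example: 20% observed differences -> about 0.223 substitutions per site
    print(poisson_distance(0.20))
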
In homology modeling, higher sequence identity strongly correlates with structural similarity, as measured by the root-mean-square deviation (RMSD) of atomic positions; for identities exceeding 40%, modeled structures typically achieve RMSD values below 2 Å relative to native templates, enabling reliable inference of tertiary folds from conserved cores. This relationship underpins the rationale for homology modeling, where detected homology predicts structural conservation despite moderate evolutionary distances.

Modeling Workflow

Template Selection

Template selection is a critical initial step in homology modeling, involving the identification of experimentally determined protein structures that serve as scaffolds for the target protein's three-dimensional model. These templates are sourced primarily from the Protein Data Bank (PDB), the central repository for atomic-level biomolecular structures, which as of November 2025 contains over 244,000 entries and continues to grow with thousands of new structures released annually. Specialized databases such as SCOP (Structural Classification of Proteins) and CATH (Class, Architecture, Topology, and Homologous superfamily) provide hierarchical classifications of protein folds, aiding in the selection of evolutionarily related templates by grouping structures into superfamilies based on structural similarity beyond sequence alone.

Search methods for templates fall into sequence-based and structure-based categories. Sequence-based approaches utilize profile-profile alignments to detect homologous sequences with higher sensitivity than pairwise comparisons; prominent tools include HHblits, which performs iterative profile hidden Markov model (HMM)-HMM searches against large clustered sequence databases such as Uniclust, and JackHMMER, which iteratively builds HMMs from the target to query sequence databases. Structure-based methods, particularly useful for remote homologs, employ fold recognition through threading algorithms that evaluate how well the target fits known folds; I-TASSER, for instance, integrates multiple threading programs to rank templates by alignment scores and structural compatibility.

Templates are selected based on stringent criteria to ensure model reliability, including sequence identity exceeding 30% (a threshold associated with conserved core structures), coverage of at least 70% of the target sequence to minimize gaps, resolution better than 2.5 Å for coordinate accuracy, and low B-factors indicating well-ordered regions; a simple filtering step implementing these criteria is sketched at the end of this section. Often, multiple templates are chosen to capture conformational variations, enabling multi-template modeling that combines alignments for improved accuracy.

Advanced techniques enhance template detection for cases with low sequence similarity. Co-evolution analysis derived from multiple sequence alignments (MSAs) of homologous proteins identifies residue contacts that reveal distant structural relationships, extending the detectable homology horizon. Additionally, deep learning-based profiles, such as those from protein language models, incorporate evolutionary and structural signals to improve remote homolog detection, outperforming traditional methods by over 10% in sensitivity on benchmark datasets.
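
The selection criteria above can be expressed as a simple filtering and ranking step, sketched below. The Template record, thresholds, and example hits are illustrative assumptions rather than the output of any particular search tool.

    # Minimal sketch: filter and rank candidate templates by the selection
    # criteria described above; all fields and thresholds are illustrative.
    from dataclasses import dataclass

    @dataclass
    class Template:
        pdb_id: str
        seq_identity: float   # percent identity to the target
        coverage: float       # fraction of the target covered by the alignment
        resolution: float     # experimental resolution in angstroms

    def select_templates(candidates, min_identity=30.0, min_coverage=0.70, max_resolution=2.5):
        passing = [t for t in candidates
                   if t.seq_identity >= min_identity
                   and t.coverage >= min_coverage
                   and t.resolution <= max_resolution]
        # Prefer higher identity, then higher coverage, then better (lower) resolution
        return sorted(passing, key=lambda t: (-t.seq_identity, -t.coverage, t.resolution))

    hits = [Template("1abc", 45.2, 0.92, 1.8), Template("2xyz", 28.0, 0.95, 1.5)]
    print([t.pdb_id for t in select_templates(hits)])  # -> ['1abc']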

Target-Template Alignment

Target-template alignment is a critical step in homology modeling, where the sequence of the target protein is aligned with that of the selected template to establish residue correspondences that guide subsequent structural modeling. This process typically begins with pairwise sequence alignment methods, which compute optimal alignments between two sequences using dynamic programming algorithms. The Needleman-Wunsch algorithm performs global alignment, seeking to align the entire lengths of the target and template sequences, while the Smith-Waterman algorithm conducts local alignment, focusing on the highest-scoring subsequence matches, which is particularly useful when the proteins share only conserved domains.

For improved accuracy, especially with evolutionarily divergent homologs, profile-based alignments are preferred over simple pairwise methods. These involve constructing position-specific scoring matrices (PSSMs) or hidden Markov models (HMMs) from multiple sequence alignments (MSAs) of the target and template families, capturing conserved patterns across homologs. Tools like Clustal Omega generate MSAs progressively, enabling profile-profile comparisons that enhance sensitivity in detecting remote homologs.

In homology modeling, alignments are often refined with structural and predictive information to account for three-dimensional constraints. Structural alignments, such as those produced by TM-align, superpose the template's atomic coordinates and realign sequences based on spatial proximity, helping to resolve ambiguities in regions with insertions or deletions (indels). Secondary structure predictions from tools like PSIPRED are incorporated to penalize mismatches between predicted target helices or sheets and the template's known structure, guiding more biologically plausible alignments. Indel penalties are typically affine, comprising a higher cost for gap opening and a lower cost for gap extension, to discourage excessive fragmentation while allowing realistic loop insertions.

Alignment quality is evaluated using scoring functions that quantify substitution likelihoods and penalize gaps. Substitution scores are derived from empirical matrices like BLOSUM (Blocks Substitution Matrix), which are clustered from conserved protein blocks to reflect observed evolutionary exchanges, or PAM (Point Accepted Mutation) matrices, based on closely related sequences extrapolated for greater divergence. The overall alignment score S is calculated as the sum of substitution scores minus affine gap penalties:

S = \sum_{(i,j)} s(a_i, b_j) - \sum_{\text{gaps}} (g_o + g_e \cdot l)

where s(a_i, b_j) is the score for aligning residues a_i and b_j, the first sum runs over aligned residue pairs, g_o is the gap opening penalty, g_e is the gap extension penalty, and l is the length of each gap. BLOSUM62, for instance, is widely used as a general-purpose matrix that balances sensitivity across moderately divergent alignments.

Challenges arise particularly in low-identity alignments (below 30%), where sequence similarity enters the "twilight zone", leading to multiple equally plausible alignments and potential errors in residue mapping that propagate to model inaccuracy. To address this, iterative refinement strategies, as implemented in software like MODELLER, repeatedly optimize the alignment by satisfying spatial restraints derived from the template structure and adjusting for stereochemical feasibility. These methods, rooted in comparative modeling principles, have been shown to improve alignment reliability even for templates with 20-40% identity.
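
For a concrete example of affine-gap scoring with BLOSUM62, the sketch below uses Biopython's PairwiseAligner (assuming Biopython is installed); the gap penalties and the target and template sequences are illustrative placeholders.

    # Minimal sketch: global target-template alignment with BLOSUM62 and affine
    # gap penalties using Biopython; sequences and penalties are placeholders.
    from Bio import Align
    from Bio.Align import substitution_matrices

    aligner = Align.PairwiseAligner()
    aligner.mode = "global"                     # Needleman-Wunsch-style alignment
    aligner.substitution_matrix = substitution_matrices.load("BLOSUM62")
    aligner.open_gap_score = -10.0              # gap opening penalty (g_o)
    aligner.extend_gap_score = -0.5             # gap extension penalty (g_e)

    target = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
    template = "MKTAYIAKQRQISFVKSHFSRQ"
    best = aligner.align(target, template)[0]
    print(best.score)
    print(best)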

Backbone Modeling

In homology modeling, the construction of the protein backbone for the conserved core regions begins with the direct transfer of structural coordinates from the identified template structure to the corresponding aligned residues in the target sequence. This process relies on the target-template alignment to map equivalent residues, allowing the backbone atoms (typically N, Cα, C, and O) of the template to be copied to the model, preserving the local geometry where sequence and structural similarity is high. For aligned residues in the core, the phi (φ) and psi (ψ) dihedral angles from the template are adopted to maintain the secondary structure elements such as alpha-helices and beta-sheets.

To ensure proper spatial placement of conserved segments, rigid-body superposition is applied, involving least-squares fitting to align the template's backbone with the frame of the growing target model. This minimizes the differences in atomic positions across the superimposed atoms, often focusing on Cα atoms in secondary structure regions so that the conserved core can be handled separately from variable regions. The quality of this fit is quantified using the root-mean-square deviation (RMSD), calculated as

\mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\| \mathbf{x}_i - \mathbf{y}_i \right\|^2}

where \mathbf{x}_i and \mathbf{y}_i are the coordinates of corresponding atoms from the target model and template, and N is the number of atoms; RMSD values below about 1 Å for core residues indicate high fidelity in conserved segments.

Following coordinate transfer, conformational adjustments are necessary to satisfy standard bond lengths and angles, as direct copying may introduce minor distortions due to sequence differences. Distance geometry methods are employed to generate an ensemble of possible backbone conformations that adhere to these geometric constraints, derived from empirical stereochemical libraries, while optimizing the overall structure to match the template's fold. This step embeds the partial coordinates into 3D space, resolving any inconsistencies in the conserved core without altering the template-derived dihedrals significantly.

When multiple templates are available, their backbones are integrated to improve accuracy by deriving a consensus model. Coordinates can be averaged after pairwise superposition of the templates' conserved regions, weighted by sequence or structural similarity, or combined through a consensus approach in which the most frequent conformation for each residue is selected. Tools like MODELLER facilitate this by satisfying spatial restraints aggregated from all templates during optimization, leading to a refined backbone with reduced RMSD compared to single-template models.
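
A minimal sketch of the superposition and RMSD calculation described above is shown below, using the Kabsch (least-squares) algorithm on corresponding Cα coordinates. It assumes NumPy is available and that the two coordinate arrays have already been matched residue-by-residue via the alignment.

    import numpy as np

    # Minimal sketch: least-squares (Kabsch) superposition of template C-alpha
    # coordinates onto the target frame, followed by the RMSD of the fit.
    def kabsch_rmsd(target_ca: np.ndarray, template_ca: np.ndarray) -> float:
        """target_ca, template_ca: (N, 3) arrays of corresponding C-alpha coordinates."""
        X = target_ca - target_ca.mean(axis=0)     # center both coordinate sets
        Y = template_ca - template_ca.mean(axis=0)
        H = Y.T @ X                                # 3x3 covariance matrix
        U, S, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(U @ Vt))         # guard against improper rotations
        R = U @ np.diag([1.0, 1.0, d]) @ Vt
        Y_fit = Y @ R                              # template rotated onto the target frame
        return float(np.sqrt(np.mean(np.sum((X - Y_fit) ** 2, axis=1))))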

Loop and Side-Chain Modeling

In homology modeling, loop regions arise primarily from insertions or deletions in the target-template alignment, where the conserved backbone provides fixed anchor points to constrain the modeling process. These flexible segments, often critical for protein function, are modeled separately after the core backbone construction, using the template-derived framework as a scaffold. Accurate loop prediction is essential, as errors here can propagate to overall model quality, with success rates typically higher for loops under 10 residues due to limited conformational space.

Loop modeling methods span database searching, ab initio approaches, and kinematic closures. Database scanning identifies candidate loop conformations by querying libraries of known structures from the Protein Data Bank (PDB), selecting fragments that match the loop sequence and satisfy geometric constraints between anchors; the LOOPP method exemplifies this by integrating fold recognition signals to generate atomic coordinates for variable regions. Ab initio techniques, such as those in Rosetta, assemble short fragments (3-9 residues) derived from the PDB, sampling backbone torsion angles via Monte Carlo methods to build initial conformations before refinement. Kinematic methods employ the cyclic coordinate descent (CCD) algorithm, which iteratively adjusts dihedral angles to close the loop by minimizing the distance violation between endpoint Cα atoms; CCD originated in robotics as a solution to inverse kinematics problems.

Loop closure imposes strict geometric constraints, primarily satisfying distance and angle restraints between the N- and C-terminal anchors while avoiding steric clashes with the surrounding structure. Sampling generates diverse conformations, often thousands per loop, which are then ranked and selected based on an energy function that penalizes deviations from ideal bond lengths, angles, and non-bonded interactions, ensuring the loop integrates seamlessly with the fixed backbone framework. For longer variable regions exceeding 10 residues, where database hits are sparse, conformational search algorithms such as Monte Carlo simulations explore broader conformational space, incorporating fragment insertion and energy-guided acceptance to sample low-energy states.

Side-chain modeling completes the atomic structure by placing side-chain atoms onto the backbone, using rotamer libraries that catalog observed χ-angle conformations from high-resolution PDB structures, often backbone-dependent for improved accuracy. The Dunbrack library, refined in SCWRL4, provides discrete rotamer sets for each residue type, enabling efficient packing. Packing algorithms resolve side-chain assignments across the model via dead-end elimination (DEE), a branch-and-bound strategy that prunes high-energy rotamer combinations early, followed by energy minimization for the remaining possibilities. Clashes are assessed using van der Waals repulsion terms, approximated as \sum_{ij} \epsilon_{ij} \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{12} for overlapping atom pairs, where \epsilon_{ij} and \sigma_{ij} are Lennard-Jones parameters (\sigma_{ij} approximately the sum of the van der Waals radii) and r_{ij} is the interatomic distance. This approach achieves near-native accuracy for core residues, with SCWRL4 outperforming predecessors by incorporating finer rotamer resolution and pairwise interactions.
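
As an illustration of clash assessment with the repulsive term above, the following sketch sums the (σ/r)^12 penalty over short-range atom pairs; the parameter values and distance cutoff are illustrative assumptions rather than values taken from a specific force field.

    import numpy as np

    # Minimal sketch: score steric clashes between candidate side-chain atoms and
    # the fixed model using a repulsive (sigma/r)^12 term; parameters are illustrative.
    def repulsive_clash_score(coords_a: np.ndarray, coords_b: np.ndarray,
                              sigma: float = 3.4, epsilon: float = 0.1,
                              cutoff: float = 4.5) -> float:
        """coords_a: (N, 3) and coords_b: (M, 3) arrays of non-bonded atom positions."""
        diff = coords_a[:, None, :] - coords_b[None, :, :]
        r = np.linalg.norm(diff, axis=-1)
        r = np.clip(r, 0.5, None)              # avoid numerical blow-up at tiny distances
        short_range = r < cutoff               # only short-range contacts contribute
        return float(np.sum(epsilon * (sigma / r[short_range]) ** 12))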

Model Refinement

Energy Minimization

Energy minimization serves as an essential post-construction step in homology modeling, aimed at relaxing the atomic coordinates of the initial model to a local energy minimum. This process alleviates steric clashes, incorrect bond lengths, and angle distortions that inevitably occur when transferring coordinates from the template structure to the target sequence, particularly in regions with insertions, deletions, or low sequence identity. By optimizing the structure under a molecular mechanics force field, energy minimization improves the overall geometry without significantly altering the fold, preparing the model for further refinement or validation.

The optimization relies on empirical force fields such as CHARMM or AMBER, which approximate the potential energy of the protein through a sum of physically motivated terms. Bonded interactions are modeled with harmonic potentials for bonds, E_{\text{bond}} = k(r - r_0)^2, where k is the force constant, r is the observed bond length, and r_0 is the equilibrium value; similar quadratic forms apply to bond angles, and torsional terms enforce stereochemical preferences. Non-bonded interactions include electrostatic forces via Coulomb's law and van der Waals attractions and repulsions via the Lennard-Jones potential, capturing short-range atomic overlaps and long-range charge effects. These force fields, parameterized from experimental and quantum mechanical data, enable efficient computation of the total energy surface for protein conformations.

Algorithms for minimization typically begin with the steepest descent method, which iteratively adjusts atomic positions in the direction opposite to the energy gradient to rapidly eliminate large clashes, followed by the more efficient conjugate gradient approach for convergence to finer minima. Implicit solvent models, such as Generalized Born with Surface Area (GBSA) approximations, are often incorporated to account for solvation effects during this vacuum or near-vacuum optimization, balancing computational cost with realism. The initial coarse model, assembled from template backbone coordinates and added loops and side chains, undergoes this local adjustment to ensure physical plausibility.

Convergence is assessed by monitoring the progressive decrease in total energy and the magnitude of the force (the negative energy gradient) on the atoms, typically halting when the root-mean-square (RMS) gradient norm drops below roughly 0.01 kcal/mol/Å, indicating a local minimum with negligible residual forces. This ensures the structure is free of major geometric artifacts while avoiding over-optimization that could distort biologically relevant features.
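
The sketch below illustrates steepest descent on the single harmonic bond term defined above, halting when the residual force falls below a tolerance. The force constant, equilibrium length, step size, and tolerance are illustrative values; real minimizers operate on thousands of coupled coordinates and many energy terms at once.

    # Minimal sketch: steepest-descent relaxation of one harmonic bond,
    # E = k (r - r0)^2, stopping when the gradient magnitude is small.
    def minimize_bond(r, k=300.0, r0=1.53, step=1e-3, grad_tol=1e-2, max_iter=10000):
        for _ in range(max_iter):
            grad = 2.0 * k * (r - r0)       # dE/dr, i.e. minus the force
            if abs(grad) < grad_tol:        # convergence: near-zero residual force
                break
            r -= step * grad                # move opposite to the gradient
        return r, k * (r - r0) ** 2

    # A distorted bond of 1.70 angstroms relaxes toward the ideal 1.53 angstroms
    print(minimize_bond(1.70))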

Refinement Methods

Refinement methods in homology modeling extend beyond initial minimization by employing dynamic simulations and statistical approaches to enhance model realism, addressing local distortions and global conformational sampling. These techniques aim to optimize side-chain packing, loop conformations, and overall backbone geometry while incorporating empirical knowledge from structural databases or experimental data. By iteratively sampling alternative conformations, refinement protocols can reduce the root-mean-square deviation (RMSD) from native structures, often achieving improvements of 1-3 Å in backbone accuracy for low-homology targets.

Molecular dynamics (MD) simulations represent a cornerstone of refinement, utilizing short trajectories on the nanosecond scale to explore conformational ensembles in explicit solvent environments. In typical protocols, homology models undergo initial equilibration followed by unrestrained MD at physiological temperatures (around 300 K), employing force fields such as GROMOS96 or CHARMM to propagate atomic motions. For instance, simulations of 5-100 ns have demonstrated RMSD reductions exceeding 1 Å in 18% of tested models, particularly benefiting regions with template deviations greater than 5 Å. To overcome sampling limitations in rugged energy landscapes, replica-exchange MD (REMD) variants distribute replicas across a range of temperatures (e.g., 280-320 K) and attempt periodic exchanges, enhancing exploration of near-native states; this approach improved secondary structure RMSD by an average of 0.82 Å across 15 diverse targets.

Knowledge-based refinement leverages statistical potentials derived from the Protein Data Bank (PDB) to guide optimization without relying solely on physics-based force fields. The DOPE (Discrete Optimized Protein Energy) potential, implemented in the MODELLER suite, computes atomic distance-dependent energies from a non-redundant set of 1,472 high-resolution PDB structures, using a finite-sphere reference state scaled to the protein's size to identify favorable interactions. During refinement, DOPE minimizes pseudo-energy scores to correct loop and side-chain inaccuracies, outperforming other potentials in native state detection and achieving correlation coefficients up to 0.7 with experimental RMSD. Additionally, these methods emphasize hydrogen bonding networks, scoring intra- and inter-residue bonds to stabilize secondary structure elements, which has proven effective in iterative loop modeling where homology templates are sparse.

Hybrid approaches integrate experimental restraints from techniques like nuclear magnetic resonance (NMR) spectroscopy or cryo-electron microscopy (cryo-EM) to constrain refinement, bridging gaps in homology-derived models. For example, protocols combining NMR chemical shifts with cryo-EM maps employ iterative Rosetta-MD simulations, where shift-derived restraints and density-based potentials guide fragment assembly and MD sampling over three cycles, yielding RMSD improvements of 1.2-3.1 Å for maps at 6-9 Å resolution. Such methods are particularly valuable for multi-domain proteins, where sparse data refine flexible regions without over-reliance on templates.

Machine learning-inspired refinement, drawing from deep learning architectures like AlphaFold, targets challenging loop regions by predicting residue-residue distances and refining via physics-based simulations. In hybrid pipelines, initial network-generated conformations undergo MD refinement to correct loop misfolds, improving local accuracy (GDT-TS scores) by up to 10-15 points in CASP benchmarks, especially for sequences with limited homology.
These approaches use neural network-derived potentials to prioritize diverse loop decoys before MD equilibration, enhancing sampling efficiency in non-template segments. Recent extensions, such as AlphaFold3 (released May 2024), further enable modeling of protein-ligand and multi-component complexes, enhancing refinement accuracy in hybrid pipelines by up to 20% for systems involving small molecules as of 2025.

Iterative protocols cycle through modeling, refinement, and evaluation to achieve convergence, often employing genetic algorithms or automated servers for alignment optimization and model rebuilding. In MODELLER-based frameworks, iterations alternate between generating alignments via crossover operations, building restraint-derived models, and scoring with composite potentials (e.g., statistical potentials such as DOPE combined with density fitting), resulting in 14% gains in residue-level accuracy for low-identity targets (<30%). This process continues until energy minima stabilize, typically after 5-10 cycles, providing robust ensembles for downstream applications; a minimal model-building run of this kind is sketched below.
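
The following sketch shows a minimal MODELLER-style comparative modeling run with DOPE assessment, following the pattern of MODELLER's published tutorial scripts. It assumes MODELLER is installed and that an alignment file pairing the target with a template (here the placeholder names target_template.ali, 1abcA, and target) has already been prepared.

    # Minimal sketch of a MODELLER run; file and sequence names are placeholders.
    from modeller import *
    from modeller.automodel import *

    env = environ()
    a = automodel(env,
                  alnfile='target_template.ali',   # prepared target-template alignment
                  knowns='1abcA',                  # template code in the alignment file
                  sequence='target',               # target code in the alignment file
                  assess_methods=(assess.DOPE, assess.GA341))
    a.starting_model = 1
    a.ending_model = 5                             # build five models and score each
    a.make()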

Validation and Assessment

Quality Metrics

Quality metrics in homology modeling assess the structural integrity, stereochemical correctness, and overall plausibility of generated models by evaluating local geometric features, global fold similarity, alignment fidelity to templates, and residue-specific reliability. These metrics help identify deviations from experimentally observed protein structures and guide model selection or refinement. Local metrics focus on atomic-level accuracy, while global and comparative metrics evaluate higher-order structural features.

Local metrics examine stereochemical and geometric validity at the residue or atom level. The Ramachandran plot assesses backbone dihedral angles (φ and ψ), with outliers indicating unfavorable conformations; models are considered acceptable if fewer than 5% of residues fall in disallowed regions, as this threshold aligns with high-quality experimental structures. Bond length and angle deviations from ideal values, typically measured against standard geometries, quantify covalent geometry errors, where deviations exceeding 0.02 Å for bonds or 2° for angles signal potential modeling artifacts. Clash scores, computed via tools like MolProbity, detect steric overlaps between non-bonded atoms, with scores below 10 clashes per 1,000 atoms indicating clash-free models comparable to crystal structures resolved at 2 Å.

Global metrics provide an overview of the model's overall fold and energetic favorability. The Global Distance Test-Total Score (GDT-TS) measures fold similarity by averaging the percentage of Cα atoms within specified distance cutoffs (1, 2, 4, and 8 Å) after optimal superposition, yielding scores from 0 to 1, where values above 0.5 denote reliable topology. Energy-based scores like DOPE (Discrete Optimized Protein Energy) evaluate statistical potentials derived from known structures, with lower (more negative) scores indicating models closer to native-like energy profiles. Similarly, QMEAN integrates local geometry, all-atom interactions, and solvation terms into a composite score normalized to a Z-score, where values near 0 or positive suggest good agreement with experimental data.

Template comparison metrics quantify how well the model reproduces the selected template's structure in aligned regions. The root-mean-square deviation (RMSD) of Cα atoms in core regions measures atomic positional accuracy, with values under 2 Å typically indicating high-fidelity modeling for sequence identities above 40%. The TM-score assesses topological similarity on a scale of 0 to 1, robust to local perturbations, where scores greater than 0.5 confirm the same fold as the template regardless of size differences.

Per-residue analysis highlights variable confidence across the model. B-factors (temperature factors) estimate positional uncertainty, with higher values (e.g., >50 Ų) in loop regions signaling lower reliability due to modeling ambiguity. Conservation-based weighting adjusts per-residue scores by evolutionary conservation, prioritizing residues in functional sites; methods like those in ConQuass correlate conservation profiles with structural quality, enhancing detection of reliable motifs in low-homology regions.
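
A simplified GDT-TS calculation is sketched below. Unlike the official implementation, which searches for an optimal superposition at each cutoff, this version assumes the model and reference Cα arrays have already been superposed and matched residue-by-residue.

    import numpy as np

    # Minimal sketch: GDT-TS from per-residue C-alpha distances, assuming the
    # model and reference structures are already optimally superposed.
    def gdt_ts(model_ca: np.ndarray, reference_ca: np.ndarray) -> float:
        """model_ca, reference_ca: (N, 3) arrays of matched C-alpha coordinates."""
        distances = np.linalg.norm(model_ca - reference_ca, axis=1)
        fractions = [np.mean(distances <= cutoff) for cutoff in (1.0, 2.0, 4.0, 8.0)]
        return float(np.mean(fractions))   # 0-1 scale; multiply by 100 for a percentage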

Benchmarking Approaches

Benchmarking approaches in homology modeling involve standardized, objective evaluations to compare the performance of prediction methods across diverse protein targets. The most prominent is the Critical Assessment of protein Structure Prediction (CASP), a community experiment initiated in 1994 that organizes blind prediction challenges using unpublished experimental structures. In CASP, participants submit models without knowledge of the target structures, which are revealed only after the submission deadline, allowing assessors to evaluate predictions against high-resolution experimental data such as X-ray crystallography or cryo-electron microscopy results. This blind protocol ensures unbiased assessment and has driven advancements in homology modeling by highlighting strengths and weaknesses in template selection, alignment, and model building.

Within CASP, homology modeling falls under the Template-Based Modeling (TBM) category, where targets are selected based on detectable structural homologs in the Protein Data Bank (PDB). Targets are further subdivided into easier cases (TBM-easy) with sequence identities greater than 40% to known templates, where modeling is straightforward, and harder cases (TBM-hard) with identities below 30%, requiring sophisticated alignment and refinement to achieve reliable models. Accuracy is quantified using metrics like the Global Distance Test High Accuracy (GDT-HA) score, which measures backbone similarity between predicted and experimental structures by averaging the percentage of aligned residues within specified distance cutoffs (0.5 Å, 1 Å, 2 Å, and 4 Å). Success rates are analyzed in identity bins, revealing that models for targets above 50% identity often exceed 90% GDT-HA, while those below 30% typically range from 40-60%, emphasizing the challenges of low-homology scenarios.

Over successive CASP editions, homology modeling has shown steady evolution, with average GDT-HA scores for top TBM predictions improving from approximately 65 in CASP10 (2012) to over 85 in CASP14 (2020), reflecting refinements in multiple-sequence alignment, template combination, and loop modeling techniques. These gains are attributed to better integration of evolutionary information and improved model quality estimation, though improvements have plateaued for high-identity targets while persisting for harder ones.

Complementing CASP, other benchmarks provide continuous or automated evaluations tailored to homology modeling. The Critical Assessment of Fully Automated Structure Prediction (CAFASP), run in parallel with early CASP rounds (e.g., CAFASP3 alongside CASP5 in 2002), focused exclusively on web-server-based methods without human intervention, assessing homology modeling targets via automated pipelines and GDT-based scores to promote accessible tools. For ongoing tracking, the EVA (Evaluation of Automatic servers for Prediction) system, operational since 2001, continuously evaluates public servers on newly released PDB structures, categorizing predictions by template identity and reporting aggregate GDT-HA and RMSD metrics to monitor real-time performance trends. Similarly, LiveBench offers large-scale, automated assessments of fold-recognition and homology modeling servers using periodic updates of test sets, emphasizing scalability and informing users on the accuracy of state-of-the-art automated modeling. In more recent editions, CASP15 (2022) and CASP16 (2024) have demonstrated that template-based modeling remains highly accurate, with top predictions in TBM categories achieving average GDT-HA scores exceeding 90, driven by deep learning enhancements that complement traditional approaches.

Accuracy and Limitations

Factors Influencing Accuracy

The accuracy of homology models is primarily determined by the degree of sequence identity between the target protein and the template structure. At sequence identities greater than 50%, backbone root-mean-square deviation (RMSD) values are typically below 1 Å, providing high-confidence models suitable for detailed applications. In the intermediate range of 20-30% identity, often described as the twilight zone of alignment reliability, models frequently exhibit errors of 2-5 Å, particularly in loop regions and surface-exposed areas, limiting their utility for precise predictions. Below 20% identity, modeling often fails to generate structurally sound models due to unreliable alignments and undetected remote homologs, resulting in RMSD values exceeding 5 Å or completely incorrect folds.

Template quality significantly modulates model precision beyond sequence identity alone. High-resolution templates (e.g., <2 Å) yield more accurate atomic coordinates, as lower-resolution structures introduce uncertainties in side-chain placements and backbone geometry. Incomplete templates that do not cover the full target length necessitate extrapolation, increasing errors in unaligned regions by up to 1-2 Å. Mismatches in oligomeric state between template and target can distort interface geometries, leading to RMSD deviations of 2-3 Å in multimeric assemblies. Employing multiple templates, selected for complementary coverage of the target's structure, improves average backbone RMSD by approximately 0.5 Å compared to single-template models by integrating diverse structural insights.

The inherent complexity of the target protein further influences modeling outcomes. Larger single-domain proteins (>300 residues) tend to accumulate more errors over extended sequences, elevating overall RMSD by 0.5-1 Å relative to smaller domains. Multi-domain targets pose additional challenges, as independent modeling of each domain followed by assembly can introduce inaccuracies in inter-domain orientations, with errors compounding to 1-2 Å in linker regions. Differences between ligand-bound (holo) and apo forms of the target affect local accuracy; for instance, modeling an apo target using a holo template may distort binding pockets by 1-3 Å, while the reverse can overlook induced-fit conformations critical for function.

Empirical correlations underscore these factors, with the Chothia-Lesk relation describing how structural divergence in core regions scales with sequence divergence, roughly approximated as RMSD ≈ 1.0 × (1 - identity/100) Å for homologous proteins. Assessments from recent CASP experiments, including CASP16 (2024), demonstrate that the integration of deep learning techniques enhances accuracy in low-identity cases (<30%), with hybrid AI-homology approaches improving alignment and refinement to reduce RMSD by up to 3-5 Å relative to traditional methods, often achieving values below 2 Å where purely template-based approaches previously exceeded 5 Å.

Common Errors and Mitigation

One prevalent error in homology modeling arises from inaccuracies in sequence alignment, particularly misaligned insertions and deletions (indels) that cause register shifts in the backbone, leading to distorted secondary structures and propagated errors in the model. These issues are especially common when sequence identity falls below 35%, as automated aligners may overlook conserved structural motifs. To mitigate this, manual inspection of alignments is recommended, supplemented by secondary structure predictions to guide realignment and ensure compatibility with known template folds.

Loop modeling often fails to produce realistic conformations, particularly for long loops exceeding 13 residues in low-homology regions, where the exponential growth in possible conformations results in high root-mean-square deviation (RMSD) values and unrealistic geometries influenced by the surrounding protein core. Such errors dominate model inaccuracies even at moderate sequence similarities above 35%, as template-derived loops may not capture target-specific flexibility. Mitigation involves generating an ensemble of loop structures using ab initio methods like conformational search with energy scoring or knowledge-based database scanning, followed by selection of low-energy variants through optimization.

Chirality and steric issues, such as incorrect amino acid handedness or atomic overlaps (steric clashes with non-bonded overlaps >0.4 Å), frequently occur after model construction, especially in side-chain placements and loop regions, leading to non-physical structures with Ramachandran outliers. These artifacts are more pronounced in homology models than in high-resolution experimental structures, with clashscores often exceeding acceptable thresholds (e.g., >10 clashes per 1,000 atoms for structures at ~2 Å resolution). Post-build validation using tools like MolProbity identifies these violations, while mitigation employs energy minimization (starting with steepest descent followed by conjugate gradient methods) or discrete sampling of alternative conformations to resolve clashes while preserving backbone integrity (Cα RMSD ≤1 Å).

Bias from reliance on a single template can cause overfitting to template-specific artifacts, such as domain distortions or non-conserved features, reducing overall model fidelity, particularly for targets with low sequence identity. This is evident in benchmarks where single-template models show limited improvement in TM-score compared to multi-template approaches. To address this, diverse templates should be selected and aligned, with model quality assessment (e.g., ProQ) used to integrate the best regions, yielding up to a 5% quality gain in difficult cases. Additionally, incorporating experimental restraints, such as NMR data, during model building helps counteract template-induced biases.

Applications

Biological and Medical Uses

Homology modeling plays a pivotal role in structure-function studies by enabling the prediction of three-dimensional structures for proteins that lack experimental structures, particularly those that are uncrystallized or recalcitrant to techniques like X-ray crystallography or cryo-electron microscopy. This approach has significantly expanded structural coverage of the human proteome, with template-based homology models providing reliable predictions for approximately 48% of residues prior to the advent of methods like AlphaFold, which have since increased coverage to over 75% of human protein residues as of 2022, with further refinements ongoing. For instance, resources like SWISS-MODEL have generated homology-derived structures for thousands of human proteins, facilitating the inference of functional motifs, binding sites, and conformational changes essential for understanding biological processes such as signaling and catalysis.

In drug discovery, homology models serve as foundational scaffolds for virtual screening and lead optimization, allowing researchers to simulate interactions with target proteins before synthesis and testing. Early applications demonstrated this utility in the development of HIV-1 protease inhibitors, where homology models built on related protease templates predicted the enzyme's active site geometry, guiding the design of peptidomimetic inhibitors that achieved micromolar potency and informed clinical candidates. More recently, such models have been employed to assess mutations in protease variants, enabling the prediction of binding affinities for approved inhibitors and supporting personalized therapy strategies. Similarly, homology modeling aids epitope prediction for vaccine development by reconstructing antigen surfaces to identify immunogenic regions, such as B-cell and T-cell epitopes on viral proteins, which has accelerated the design of subunit vaccines against viral pathogens through validation of exposed loops and helices.

Within evolutionary biology, homology modeling elucidates the evolution of protein folds across species by aligning sequences to conserved templates, revealing how structural variations arise from sequence mutations while preserving core architectures. This technique infers functional sites by mapping catalytic residues onto evolutionarily related structures, as seen in studies of enzyme families where homology-derived models trace the emergence of specificity from ancestral scaffolds, providing insights into adaptation and functional divergence over phylogenetic timescales. Such analyses have been instrumental in reconstructing ancestral protein states and understanding functional shifts, for example, in metalloproteases where modeled folds highlight conserved zinc-binding motifs amid sequence drift.

Notable case studies underscore the practical impact of homology modeling in urgent biological challenges. During the 2020 SARS-CoV-2 outbreak, homology models of the viral spike protein, templated from SARS-CoV structures, enabled rapid design of neutralizing antibodies by identifying receptor-binding domain epitopes, leading to computational screening of over 89,000 candidates and prioritization of those with predicted high affinity for therapeutic use. In biotechnology, homology modeling supports enzyme engineering by predicting structures to enhance catalytic efficiency, as demonstrated in the redesign of enzymes such as lipases for industrial applications, where modeled active sites guided mutations for improved stability and substrate specificity. These applications rely on prior validation of model quality to ensure predictive reliability in downstream experiments.

Integration with Modern Techniques

Homology modeling has found renewed synergy with artificial intelligence-driven methods, particularly in enhancing the accuracy of deep learning-based predictions. AlphaFold2, introduced in 2020, relies on multiple sequence alignments (MSAs) derived from homology searches to inform its Evoformer module, which processes evolutionary relationships to generate structural predictions with atomic accuracy even for proteins lacking close homologs. AlphaFold3, released in 2024, extends this by incorporating diffusion-based architectures for joint prediction of protein complexes with ligands and nucleic acids, yet it still leverages homology-informed MSAs to improve confidence in regions with sparse evolutionary data. These AI models often benefit from homology modeling in post-prediction refinement, where template-based adjustments correct local inaccuracies, such as in loop regions or interfaces, achieving up to 20% improvement in global distance test (GDT) scores for challenging targets.

Hybrid tools exemplify this integration, combining templates with deep learning frameworks to address limitations in prediction. RoseTTAFold, developed by the Baker laboratory, incorporates structural templates from homology searches as inputs to its trRosetta-derived network, enabling more accurate modeling of protein folds and complexes when sequence identity is moderate (20-40%). In the CASP16 assessment of 2024-2025, hybrid approaches using template-guided refinement outperformed pure AI predictions for protein-ligand complexes, particularly for pharmaceutical targets where ligand placement benefited from template-derived geometries, with top groups achieving root-mean-square deviation (RMSD) values under 2 Å for 70% of cases.

Recent advances further embed homology modeling in multimodal workflows, expanding its role beyond proteins. Tools like DiffDock employ diffusion generative models for protein-ligand docking, using homology-derived templates to initialize poses and refine predictions in low-confidence regions, resulting in success rates exceeding 38% in blind challenges. For low-homology scenarios, embedding techniques in protein language models, such as those in TM-Vec, detect remote structural similarities (below 20% sequence identity) by encoding homology alignments into deep networks, facilitating accurate fold recognition and template detection where traditional sequence searches fail.

Looking ahead, homology modeling remains essential for interpreting and correcting AI-generated structures, especially in dynamic or multi-component systems where AI methods struggle with uncertainty quantification. Software platforms like SWISS-MODEL have updated their pipelines by 2024 to incorporate AlphaFold-predicted structures as homology templates, streamlining hybrid workflows and supporting iterative refinement for over 200 million entries in integrated databases. This enduring role positions homology modeling as a complementary technique, enhancing AI's interpretability and applicability in biological research through 2025 and beyond.

  40. [40]
    Ten quick tips for homology modeling of high-resolution protein 3D ...
    Apr 2, 2020 · These quick tips cover the process of homology modeling and some postmodeling computations such as molecular docking and molecular dynamics (MD).
  41. [41]
    Comparative modelling of protein structure and its impact on ...
    Jun 30, 2005 · Along with alignment, loop modeling is probably the most difficult step in comparative modelling process. Errors in loops are the dominant ...
  42. [42]
    Automated Minimization of Steric Clashes in Protein Structures - NIH
    We describe a rapid, automated and robust protocol, Chiron, which efficiently resolves severe clashes in low-resolution structures and homology models.Missing: chirality | Show results with:chirality
  43. [43]
    SWISS-MODEL: homology modelling of protein structures and ...
    May 21, 2018 · Homology modelling has matured into an important technique in structural biology, significantly contributing to narrowing the gap between known ...
  44. [44]
    The structural coverage of the human proteome before and after ...
    Our results indicate that our current baseline for structural coverage of 48%, considering experimentally-derived or template-based homology models, elevates up ...<|separator|>
  45. [45]
    Molecular Modeling of the HIV-1 Protease and Its Substrate Binding ...
    The HIV-1 protease is predicted to interact with seven residues of the protein substrate. This information can be used to design protease inhibitors and ...<|separator|>
  46. [46]
    Prediction of HIV drug resistance based on the 3D protein structure
    Aug 4, 2021 · The 3D structure of each HIV-1 protease variant was predicted using homology modeling. Since several crystal structures of drug-protease ...
  47. [47]
    In silico Homology Modeling and Epitope Prediction of NadA as a ...
    In the present study, we exploited bioinformatics tools to better understand and determine the 3D structure of NadA and its functional residues to select B cell ...Results · Fig. 2 · Fig. 8
  48. [48]
    A review of visualisations of protein fold networks and their ...
    One way to investigate these relationships is by considering the space of protein folds and how one might move from fold to fold through similarity.Ii. Networks · (1). Similarity Networks · Iii. Key Questions
  49. [49]
    A unified analysis of evolutionary and population constraint ... - Nature
    Apr 11, 2024 · Prediction of protein secondary structure and active sites using the alignment of homologous sequences. J. Mol. Biol. 195, 957–961 (1987) ...
  50. [50]
    Functional Evolution of Proteins - PMC - NIH
    The functional evolution of proteins advances through gene duplication followed by functional drift, whereas molecular evolution occurs through random ...Results · Discussion · Figure 5
  51. [51]
    Rapid in silico design of antibodies targeting SARS-CoV-2 using ...
    Apr 10, 2020 · ... SARS-CoV-2 spike protein homology model. By Feb 13, 2020, we evaluated 89,263 mutant antibodies selected from a design space of 1040 (20 ...
  52. [52]
    Homology Modeling for Enzyme Design - PubMed
    Homology modeling is a very powerful tool in the absence of atomic structures for understanding the general fold of the enzyme, conserved residues, catalytic ...Missing: engineering | Show results with:engineering
  53. [53]
    Homology Modeling for Enzyme Design | Springer Nature Experiments
    Homology modeling is a very powerful tool in the absence of atomic structures for understanding the general fold of the enzyme, conserved residues, catalytic ...
  54. [54]
    Highly accurate protein structure prediction with AlphaFold - Nature
    Jul 15, 2021 · Predicting the three-dimensional structure that a protein will adopt based solely on its amino acid sequence—the structure prediction component ...
  55. [55]
    Accurate structure prediction of biomolecular interactions ... - Nature
    May 8, 2024 · The new AlphaFold model demonstrates substantially improved accuracy over many previous specialized tools: far greater accuracy for protein– ...Nobel Prize in Chemistry 2024 · Nature Machine Intelligence
  56. [56]
    Refinement of AlphaFold2 models against experimental and hybrid ...
    The overall idea in refinement is to first identify the most consistent model among a set of suggested models from AlphaFold2. The selected model was then ...
  57. [57]
    Home - CASP16 - Protein Structure Prediction Center
    The goals of CASP are to provide rigorous assessment of computational methods for modeling macromolecular structures and complexes so as to advance the state of ...CASP16 in numbers · Rankings: Ligands · Of /download_area/CASP16 · Target ListMissing: homology | Show results with:homology
  58. [58]
    (PDF) Modeling Protein–Protein and Protein–Ligand Interactions by ...
    Oct 25, 2025 · In the CASP16 experiment, our team employed hybrid computational strategies to predict both protein–protein and protein–ligand complex ...
  59. [59]
    Protein remote homology detection and structural alignment ... - Nature
    Sep 7, 2023 · We showcase the merits of TM-Vec models in the context of CATH and SWISS-MODEL to show how our tool can scale with regard to database size while ...<|separator|>
  60. [60]
    Blog - SWISS-MODEL
    OpenStructure Actions API · June 2025 · Structure Assessment web server publication · April 2024 · AlphaFoldDB structures included as homology modelling templates ...
  61. [61]
    An outlook on structural biology after AlphaFold: tools, limits and ...
    Sep 23, 2024 · AlphaFold and similar groundbreaking, AI-based tools, have revolutionized the field of structural bioinformatics, with their remarkable accuracy in ab-initio ...