
Structural alignment

Structural alignment is a computational technique in bioinformatics that superimposes and compares the three-dimensional structures of biological macromolecules, primarily proteins and nucleic acids, to establish correspondences between their atomic coordinates and identify similarities in spatial arrangements, folds, and functional motifs, irrespective of sequence similarity. This approach is essential for detecting evolutionary relationships and functional conservation in proteins with low sequence identity, where traditional sequence-based methods fail, as it reveals shared structural cores that imply common ancestry or biochemical roles. For instance, neuroglobin and other globins exhibit structural similarity despite diverging sequences, highlighting the role of alignment in uncovering distant homologs. Key applications include functional annotation and the classification of protein domains into folds within databases such as SCOP, CATH, and FSSP, which rely on structural alignments to organize the Protein Data Bank (PDB). Methods vary from rigid-body superimpositions, which assume fixed conformations for closely related structures, to flexible alignments that accommodate domain movements and loops, as in tools like FATCAT. Topology-independent approaches further handle permutations, such as circular shifts in chain connectivity, enhancing accuracy for diverse superfamilies. Overall, structural alignment underpins phylogenetic analyses and structure prediction pipelines, like threading, by quantifying similarity via metrics such as root-mean-square deviation (RMSD) of aligned atoms.

Fundamentals

Definition and Principles

Structural alignment is a computational method used to compare three-dimensional (3D) structures of biomolecules, primarily proteins, by establishing correspondences between their atomic coordinates to reveal similarities in spatial arrangement despite limited sequence similarity. This process identifies equivalent residues or atoms, enabling the superposition of structures to assess shared folds or functional motifs that may indicate evolutionary relationships. While most commonly applied to proteins, the approach extends to other macromolecules such as nucleic acids, and to ligands, wherever 3D geometry is informative.
To contextualize structural alignment, protein structures are hierarchically organized: the primary structure refers to the linear amino acid sequence; secondary structure encompasses local patterns such as α-helices and β-strands stabilized by hydrogen bonds; and tertiary structure describes the global fold resulting from non-covalent interactions. Structural alignment operates at the tertiary level, focusing on coordinate-based comparisons rather than sequence comparisons, because protein evolution conserves 3D folds more robustly than primary sequences, allowing detection of distant homologs with low sequence identity (often below 25%).
The core principles involve establishing a residue-to-residue (or atom-to-atom) correspondence and applying rigid-body transformations (rotations and translations) to overlay the structures optimally. This superposition minimizes spatial discrepancies, and similarity is then quantified through metrics like root-mean-square deviation (RMSD), which depends on spatial agreement rather than sequential order. Unlike sequence alignment, which relies on linear matching, structural alignment accounts for topological equivalences and evolutionary drifts in backbone geometry.
Historically, structural alignment emerged in the 1970s with pioneering work by Rossmann and Argos, who developed systematic methods to explore structural homology by comparing backbone conformations across proteins, initially focusing on shared functional sites like active centers.
Their approach laid the groundwork for modern tools by introducing iterative superposition techniques to detect subtle similarities. A key aspect is the least-squares fitting to minimize RMSD, formulated as:
\[ \text{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \| \mathbf{r}_i - (R \mathbf{t}_i + \mathbf{t}) \|^2} \]
where N is the number of aligned points, \mathbf{r}_i and \mathbf{t}_i are the coordinates of corresponding atoms in the target and template structures, R is a rotation matrix, and \mathbf{t} (without subscript) is the translation vector; the goal is to find the R and \mathbf{t} that minimize this value.

Applications

Structural alignment serves as a cornerstone in evolutionary studies within bioinformatics, enabling the detection of remote homologs (proteins sharing a common ancestor but exhibiting sequence similarity below 20%) by focusing on conserved three-dimensional folds rather than linear sequences. This approach reveals evolutionary relationships that sequence-based methods often miss, particularly for divergent protein families. For instance, databases such as SCOP (Structural Classification of Proteins) and CATH (Class, Architecture, Topology, Homologous superfamily) rely on structural alignments to classify proteins into hierarchical superfamilies, facilitating the systematic organization of the protein universe and inference of evolutionary histories.
In the realm of function prediction, structural alignment aids in inferring biochemical roles, especially for uncharacterized proteins, by mapping conserved structural motifs across related proteins. Alignments highlight fold conservation that correlates with functional similarity, even amid sequence divergence. A prominent example is the TIM barrel superfamily, a ubiquitous enzyme fold comprising eight α/β units, where structural alignments have elucidated shared catalytic mechanisms in diverse enzymes, allowing predictions of catalytic residues and substrate specificity for novel members.
For structure prediction and modeling, structural alignment underpins template-based approaches in homology modeling pipelines, where a target sequence is aligned to experimentally determined templates to generate atomic models. Tools like MODELLER exemplify this by using alignments to satisfy spatial restraints derived from template structures, producing reliable models for targets with detectable homologs and supporting downstream analyses.
In drug discovery, structural alignment facilitates the identification of analogous binding pockets in related proteins, enabling the comparison of ligand-binding sites to guide inhibitor design or drug repurposing.
This is particularly valuable in virtual screening workflows, where aligned structures allow compound libraries to be screened against multiple targets, prioritizing hits based on conserved pharmacophores and accelerating lead optimization.
Database searching represents another key application, where structural alignment tools query repositories like the Protein Data Bank (PDB) to retrieve proteins with similar folds, aiding in the functional annotation of newly solved structures. Dedicated servers perform exhaustive 3D comparisons to generate alignments and similarity scores, helping researchers contextualize novel folds within the broader structural landscape.

Representations and Data

Structure Representations

Protein structures for alignment are primarily represented by atomic coordinates in the macromolecular Crystallographic Information File (PDBx/mmCIF) format, the current standard, or the legacy PDB format, where in the latter each atom's position is specified using Cartesian coordinates x, y, and z in angstroms within ATOM or HETATM records. These coordinates capture the three-dimensional arrangement of all heavy atoms in the macromolecule, enabling precise spatial comparisons.
To enhance computational efficiency, especially for large-scale alignments, representations are often simplified to Cα atoms only, focusing on the alpha carbon of each residue to form a polyline backbone. This reduction preserves the overall fold while drastically lowering the number of points from thousands to hundreds per protein, facilitating faster superposition and similarity calculations.
Secondary structure encodings provide a higher-level abstraction, with the Dictionary of Secondary Structure of Proteins (DSSP) algorithm assigning states such as α-helices (H), β-strands (E), and coils (C or loop regions) based on hydrogen bonding patterns between backbone atoms. These assignments reduce the structure to a sequence of categorical labels, aiding in alignment by matching regular secondary elements while accommodating irregular coils.
Backbone dihedral angles offer a compact vector-based representation, where the conformation of each residue is encoded by the φ (phi) angle around the N-Cα bond and the ψ (psi) angle around the Cα-C bond, each ranging from -180° to 180°. These form a sequence vector that captures local geometry without relying on absolute positions, useful for comparing flexible regions.
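The dihedral encoding described above can be sketched in a few lines. This is a minimal illustration rather than any tool's implementation: the function names `dihedral` and `phi_psi` and the input layout (one `(N, CA, C)` coordinate triple per residue) are assumptions made for the example, and the sign convention is taken to be the usual one (0° for cis, ±180° for trans).

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = _scale(b1, 1.0 / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

def phi_psi(backbone):
    """backbone: list of (N, CA, C) coordinate triples, one per residue.

    Returns one (phi, psi) pair per residue; phi is undefined (None) for the
    first residue and psi for the last, since each needs a neighbouring residue.
    """
    angles = []
    for i, (n, ca, c) in enumerate(backbone):
        phi = dihedral(backbone[i - 1][2], n, ca, c) if i > 0 else None
        psi = dihedral(n, ca, c, backbone[i + 1][0]) if i + 1 < len(backbone) else None
        angles.append((phi, psi))
    return angles
```

Because the angles depend only on relative atom positions, two copies of the same chain in different orientations yield the same (φ, ψ) sequence, which is what makes this encoding convenient for comparison.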
Further abstractions include contact maps, which are binary matrices indicating pairs of residues in close proximity (e.g., Cα distance < 8 Å), torsion angle sequences beyond just φ and ψ (such as ω for peptide bonds), and intra-molecular distance matrices defined as D_{ij} = \| \mathbf{pos}_i - \mathbf{pos}_j \|, where \mathbf{pos}_i and \mathbf{pos}_j are the 3D coordinates of residues i and j. Distance matrices, in particular, transform the structure into a rotation- and translation-invariant form, ideal for alignment algorithms like DALI that optimize matrix overlaps.
Full atomic representations excel in capturing fine details like side-chain interactions and hydrogen bonds but incur high computational costs due to the large number of atoms, making them less suitable for rapid screening of protein databases. In contrast, Cα-only or abstracted models prioritize speed and scalability, though they may overlook subtle differences in flexible loops, where multi-conformer averaging or ensemble representations are sometimes employed to account for conformational variability.
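As a small illustration of the distance-matrix and contact-map representations, the sketch below builds both from a list of Cα coordinates. The function names and the `min_separation` parameter are assumptions made for the example; the 8 Å cutoff is the one quoted above.

```python
import math

def distance_matrix(coords):
    """D[i][j] = Euclidean distance between residues i and j (in Å)."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]

def contact_map(dmat, cutoff=8.0, min_separation=1):
    """Binary matrix: 1 where two distinct residues lie closer than `cutoff`.

    `min_separation` (hypothetical parameter) skips pairs closer than this
    many positions along the chain; the default only removes the diagonal.
    """
    n = len(dmat)
    return [[1 if abs(i - j) >= min_separation and dmat[i][j] < cutoff else 0
             for j in range(n)] for i in range(n)]

# Toy Cα trace (made-up coordinates, in Å).
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (10.0, 10.0, 10.0)]
D = distance_matrix(ca)
C = contact_map(D)

# Rotating the whole structure by 90° about z leaves the matrix unchanged.
rotated = [(-y, x, z) for x, y, z in ca]
D_rot = distance_matrix(rotated)
```

Comparing `D` and `D_rot` entry by entry shows the invariance that lets distance-based methods skip an explicit superposition step.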

Outputs from Alignments

Structural alignment processes generate tangible outputs that capture residue correspondences, quantify similarity, and enable visualization of three-dimensional overlaps between protein structures. These outputs are essential for downstream analyses in bioinformatics, such as evolutionary inference and functional annotation.
Alignment files primarily consist of residue mappings that identify equivalent residues across compared structures, often presented as lists of corresponding amino acids with their positions. These mappings are exported in standardized formats, including the PIR (Protein Information Resource) format, which includes protein identifiers, sequence alignments, and annotations for structural modeling applications, and aligned PDB files that incorporate superimposed atomic coordinates for direct use in visualization software. For instance, tools like CE-MC produce such files to represent multi-structure alignments with preserved spatial relationships.
Quantitative data derived from alignments include the root-mean-square deviation (RMSD), a metric of atomic fit quality computed for global structures or local subsets after superposition, typically reported in angstroms to indicate average displacement. Additionally, the number of aligned residues provides a count of the overlapping structural elements, reflecting the coverage of similarity. These values are standard in outputs from methods like TM-align, where they benchmark alignment robustness.
Visual outputs feature superimposed models, where aligned structures are overlaid in three-dimensional space to reveal conserved folds, and difference maps that depict positional variances, often color-coded by deviation magnitude for intuitive interpretation. Such representations are generated by alignment web servers, facilitating rapid assessment of conformational changes.
Unlike sequence alignments, where gaps denote insertions or deletions (indels), structural alignments treat gaps as regions of non-equivalent topology or flexibility, preserving continuous residue chains without implying evolutionary events. Some structural alignment databases exemplify this by providing gap-inclusive equivalence lists in searchable datasets, enabling analysis of discontinuous motifs in protein families.

Comparison Methods

Superposition Techniques

Superposition techniques form a foundational approach in structural alignment, focusing on geometrically overlaying molecular structures, particularly proteins, to identify spatial correspondences between their atomic coordinates. These methods assume rigid-body transformations (rotations and translations) without deformations, aiming to minimize the distance between equivalent atoms in the two structures. The core objective is to find the optimal transformation that aligns the structures as closely as possible, often serving as an initial step before more sophisticated alignment refinements.
Rigid-body superposition typically employs least-squares minimization to determine the best rotation and translation parameters, that is, the transformation that minimizes the sum of squared distances between corresponding points. A seminal method is the Kabsch algorithm, which computes the optimal rotation matrix directly for a fixed set of correspondences, commonly by singular value decomposition (SVD). The process begins with centering both sets of atomic coordinates by subtracting their respective centroids, so that the structures are translationally aligned at their geometric centers. Next, a covariance matrix is constructed from the centered coordinates, and SVD is applied to derive the rotation matrix that minimizes the root-mean-square deviation (RMSD), a common byproduct measure of alignment quality. This approach is computationally efficient and widely adopted for pairwise alignments of protein backbones, such as Cα atoms.
In cases of multi-domain proteins connected by flexible linkers, global superposition may distort the alignment by forcing rigid overlay across mobile regions, leading to poor fits in individual domains. Local superposition addresses this by treating subdomains as independent rigid bodies, superposing them separately to account for inter-domain flexibility. This segmented approach enhances accuracy for proteins with conformational variability due to linker dynamics.
To initiate superposition, especially for distantly related structures, initial seeding guides the selection of corresponding residues. Sequence alignments provide a preliminary mapping based on amino acid similarity, while secondary structure matches, such as aligning α-helices or β-sheets by direction and length, offer geometric priors to identify potential equivalents. Some methods combine gapless threading with secondary structure similarity to generate an initial set of aligned residues, which then informs the rigid-body transformation. These seeding strategies reduce the search space and improve convergence in iterative superposition.
A key challenge in superposition arises from insertions and deletions (indels) that manifest as topological discontinuities in 3D space, complicating the identification of equivalent points. Unlike sequence alignments, where indels are handled linearly, 3D indels require accounting for spatial gaps that can skew rotation estimates if not masked, potentially inflating RMSD in unaffected regions. Techniques often involve excluding indel-affected segments during the initial overlay or using dynamic programming to tolerate such variations, though this increases computational demands for large structures.
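The centering, covariance, and SVD steps of rigid-body superposition can be sketched with NumPy as follows. This is a minimal illustration that assumes the residue correspondences are already known and equally weighted; the function name `kabsch` and its return convention are choices made for the example, and the determinant check (a standard safeguard not spelled out above) prevents the fit from returning a mirror image.

```python
import numpy as np

def kabsch(mobile, target):
    """Least-squares rigid-body fit of `mobile` onto `target`.

    Both inputs are (N, 3) arrays of corresponding points. Returns (R, t, rmsd)
    such that mobile @ R.T + t is the best superposition onto target.
    """
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(target, dtype=float)
    p_centroid = P.mean(axis=0)
    q_centroid = Q.mean(axis=0)
    P0 = P - p_centroid              # centre both point sets
    Q0 = Q - q_centroid
    H = P0.T @ Q0                    # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_centroid - R @ p_centroid
    fitted = P @ R.T + t
    rmsd = float(np.sqrt(((fitted - Q) ** 2).sum(axis=1).mean()))
    return R, t, rmsd
```

In a full alignment method this fit is typically one step in a loop: a seed correspondence gives a first superposition, the superposition suggests a revised correspondence, and the two are refined alternately.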

Similarity Evaluation

Similarity evaluation in structural alignment quantifies the degree of resemblance between superimposed structures, providing a numerical basis for assessing evolutionary relationships, functional analogies, or modeling accuracy. These metrics address limitations of raw superposition by incorporating distance-based deviations, topological features, and statistical significance, often normalized to enable comparisons across proteins of varying sizes.
The root-mean-square deviation (RMSD) is a foundational metric, defined as the square root of the average squared distances between corresponding atoms after optimal superposition: \text{RMSD} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} d_i^2}, where d_i are the Euclidean distances for n atom pairs. Global RMSD evaluates the entire aligned structure, emphasizing overall fit but sensitive to outliers in flexible regions. Local RMSD, in contrast, focuses on specific subsets like active sites, mitigating the impact of conformational variability elsewhere. Backbone RMSD, typically computed using C\alpha atoms, reduces noise from side-chain orientations, while all-atom RMSD includes heavy atoms for a more comprehensive but noisier assessment.
Advanced scores overcome RMSD's length dependence and sensitivity to alignment gaps. The TM-score, ranging from 0 to 1, measures topological similarity in a size-independent manner: \text{TM-score} = \max \left[ \frac{1}{L} \sum_{i=1}^{L_{\text{ali}}} \frac{1}{1 + \left( \frac{d_i}{d_0(L)} \right)^2 } \right], where L is the length of the reference (target) protein used for normalization, L_{\text{ali}} the number of aligned residues, d_i the distance between the i-th pair of aligned residues, d_0(L) = 1.24 \sqrt[3]{L - 15} - 1.8 a length-dependent distance scale in Å, and the maximum is taken over superpositions; values above 0.5 generally indicate the same fold regardless of size.
The Global Distance Test Total Score (GDT-TS), used prominently in structure prediction assessments, averages the percentages of residues within distance cutoffs of 1 Å, 2 Å, 4 Å, and 8 Å from their references, yielding a 0-100 scale robust to local distortions.
Other metrics provide complementary insights into significance and higher-order features. The Z-score assesses statistical relevance by standardizing a raw similarity score against a distribution of random alignments: Z = (S - \mu_S)/\sigma_S, where S is the score, and \mu_S, \sigma_S are the mean and standard deviation from null models; high Z-scores (e.g., >6) denote non-random similarity. Contact overlap evaluates preservation of intra-molecular interactions by comparing binary contact maps, often using C\beta-C\beta distances within 8 Å, normalized as the fraction of shared contacts to gauge fold conservation.
Normalization is essential for fair comparisons, adjusting for protein length and alignment coverage to avoid bias toward larger structures. For instance, RMSD increases with chain length for random pairs, while TM-score and GDT-TS incorporate length scaling; thresholds like TM-score >0.5 or GDT-TS >60 reliably identify homologous folds in benchmarks. These metrics collectively enable robust interpretation of alignments, prioritizing fold-level conservation over atomic precision.
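The scores in this section can all be computed from the list of distances d_i between aligned residue pairs. The sketch below does so for a single, fixed superposition; it does not search over superpositions, so its TM-score and GDT-TS values are lower bounds on what a full implementation would report. The distance scale uses the standard TM-score form d_0(L) = 1.24 (L - 15)^{1/3} - 1.8, and the 0.5 Å floor applied for short chains is an assumption of this example, as are the function names.

```python
import math
import statistics

def rmsd(distances):
    """Root-mean-square of the aligned-pair distances."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))

def tm_d0(L):
    """Length-dependent distance scale of the TM-score, in Å.

    The floor of 0.5 Å for short chains is an assumption of this sketch.
    """
    if L <= 15:
        return 0.5
    return max(0.5, 1.24 * (L - 15) ** (1.0 / 3.0) - 1.8)

def tm_score(distances, L_ref):
    """TM-score term for one superposition, normalised by the reference length."""
    d0 = tm_d0(L_ref)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / L_ref

def gdt_ts(distances, L_ref, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Mean percentage of reference residues within each distance cutoff."""
    fractions = [sum(1 for d in distances if d <= c) / L_ref for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)

def z_score(score, null_scores):
    """Standardise `score` against scores obtained from a null model."""
    mu = statistics.mean(null_scores)
    sigma = statistics.pstdev(null_scores)
    return (score - mu) / sigma
```

Residues left unaligned contribute nothing to the sums but still count in `L_ref`, which is how these scores penalise low coverage where RMSD does not.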

Algorithms

Complexity Analysis

Pairwise structural alignment of proteins can be formulated as a graph matching problem, where protein structures are represented as graphs with residues as nodes and spatial interactions (such as contacts or distance constraints) as edges, requiring the identification of a maximum-weight matching between the two graphs. Alternatively, it is viewed as an optimization over permutations of residue indices to maximize a scoring function that accounts for spatial superposition and sequence similarity. These formulations capture the core challenge of aligning 3D coordinates while respecting the sequential order of residues, often leading to integer linear programming models with variables representing possible residue correspondences.
The optimal pairwise structural alignment problem is NP-hard, akin to the subgraph isomorphism problem, due to the combinatorial explosion in evaluating all possible residue mappings under 3D geometric constraints. Exact solutions exhibit time complexities of O(N^3) or worse, where N denotes the number of residues, arising from dynamic programming approaches over distance matrices or exhaustive enumeration in branch-and-bound frameworks; space requirements similarly scale poorly, often to O(N^4) in intermediate representations for unequal-length proteins. In contrast to 1D sequence alignment, which is solvable in polynomial time via O(N^2) dynamic programming, the 3D setting introduces non-local geometric dependencies that preclude efficient exact optimization. Modeling protein flexibility, such as through gap penalties for loops or non-rigid deformations, further exacerbates complexity by expanding the search space with variable-length insertions and tolerance for conformational variations.
These computational demands have significant practical implications, necessitating approximate methods for large-scale applications like database-wide searches across the Protein Data Bank (PDB), where exact alignment of thousands of structures against a query would be infeasible due to exponential runtimes even for moderate N (e.g., 100-300 residues). While exact algorithms remain valuable for verifying optimality in small, high-confidence cases, approximations enable scalable analyses but may sacrifice global optimality, as explored in subsequent sections on exact and approximate techniques.

Exact Algorithms

Exact algorithms for structural alignment seek to compute globally optimal solutions by exhaustively exploring the alignment space, often adapting techniques from sequence alignment to account for three-dimensional geometry. These methods guarantee the best possible alignment under a defined scoring function, such as maximizing the overlap of inter-residue distances or contact maps, but at high computational cost. The structural alignment problem is NP-hard, making exact approaches impractical for large proteins but valuable for precise benchmarking of approximate methods.
One prominent class involves extensions of dynamic programming, particularly adaptations of the Needleman-Wunsch algorithm originally developed for sequence alignment. In structural contexts, these employ double dynamic programming to align pairs of residues while incorporating geometric constraints, such as Cα atom distances or secondary structure similarities, to evaluate 3D compatibility.
Another key category utilizes integer programming formulations, which model the alignment as an optimization problem minimizing an energy-like function that captures structural similarity through distance matrices or overlap scores. These are typically solved via branch-and-bound or Lagrangian relaxation techniques to find the global optimum. A representative example is PAUL (Protein Alignment Using Lagrangian), which formulates pairwise alignment as an integer linear program based on inter-residue distances, using Lagrangian relaxation with variable splitting to efficiently compute optimal solutions by dualizing constraints and iteratively improving bounds. Similarly, DALIX applies integer linear programming with branch-and-cut methods to optimize the DALI scoring function, which rewards agreement between corresponding intra-molecular distances, achieving exact alignments that outperform baselines in up to 85% of cases across protein folds.
Despite their precision, exact algorithms are limited to small proteins, typically under 100 residues, due to time and memory demands, often O(n²m²) for proteins of lengths n and m.
For pairs around 200 residues, runtimes can extend to hours on standard hardware, as seen in benchmarks where DALIX requires up to 30 CPU hours for challenging instances from datasets such as RIPC. Consequently, these methods see little practical use beyond benchmarking and validation of faster heuristics, prioritizing conceptual optimality over scalability in routine analyses.

Approximate Algorithms

Approximate algorithms for structural alignment address the computational challenges of exact methods by employing heuristics that achieve near-optimal solutions in polynomial time, motivated by the NP-hard nature of the problem for general scoring functions. These approaches prioritize scalability for large-scale analyses while maintaining sufficient accuracy for biological insights, often running in O(n^2) time where n is the number of residues.
Iterative optimization forms a core strategy in approximate structural alignment, typically beginning with an initial seed alignment and refining it through repeated adjustments to minimize structural dissimilarity. For instance, some algorithms sample rigid-body transformations to align structures, iteratively optimizing scores like root-mean-square deviation (RMSD) via gradient-based or related search methods until convergence. This process starts with a coarse superposition and progressively refines residue correspondences, enabling handling of globular proteins with approximation guarantees within a factor ε of the optimum. Seeding strategies enhance this by initializing matches based on sequence similarity or secondary structure elements, such as identifying conserved helices or sheets to anchor the alignment before extension. These seeds reduce the search space, allowing iterative refinement to explore local optima efficiently without exhaustive enumeration.
For example, the SSAP (Sequential Structure Alignment Program) method, introduced by Orengo and colleagues, uses iterated double dynamic programming to generate high-scoring alignments by iteratively refining residue pairings based on vector representations of local geometry. This approach effectively handles the additional dimensionality of protein structures compared to linear sequences.
Progressive alignment extends pairwise approximations to multiple structures by hierarchically building alignments, starting with highly similar pairs and iteratively incorporating additional structures based on a guide tree derived from initial similarity scores. This method aligns two structures or sub-alignments at each step using dynamic programming on a simplified scoring function, propagating correspondences transitively to form a global multiple alignment. For example, tree-based progressive schemes first compute pairwise alignments and then merge them progressively, balancing the inclusion of distant homologs with computational tractability. Seeding in progressive contexts often leverages fragment-based matches from secondary structures to initiate pair alignments, ensuring robustness to conformational variations.
These approximate strategies involve inherent trade-offs between accuracy and speed, where heuristics like O(n^2) dynamic programming for fixed transformations yield high-quality alignments for proteins up to thousands of residues but may sacrifice optimality for very divergent structures. Iterative and progressive methods typically achieve 90-95% of exact scores in seconds to minutes, compared to hours for exact algorithms, making them suitable for database-scale searches while occasionally missing subtle local similarities. Quantitative benchmarks show that such approximations scale to alignments of 10-20 structures with RMSD errors under 2 Å for homologous pairs, prioritizing practical utility in evolutionary and functional studies.

Core Methods

Matrix-Based Methods

Matrix-based methods for structural alignment of proteins rely on representing three-dimensional structures as matrices of distances or similarities between residues, enabling the detection of topological equivalences even in the presence of conformational variations. These approaches typically construct intra-molecular distance matrices from Cα atomic coordinates, where each element reflects the Euclidean distance between a pair of residues, and then align structures by maximizing a similarity score between these matrices. This paradigm, originating in the late 1980s, facilitates robust comparisons by focusing on global structural patterns rather than rigid superpositions, making it particularly effective for proteins with flexible loops or domain insertions.
One seminal matrix-based method is DALI (Distance-matrix ALIgnment), introduced by Holm and Sander in 1993. DALI computes Cα distance matrices for each protein and decomposes them into overlapping submatrices describing hexapeptide-hexapeptide contact patterns, identifying similar submatrices in the two proteins and assembling them into larger consistent sets through Monte Carlo optimization. The alignment score rewards matched residue pairs whose intra-molecular distances agree and penalizes distortions, ultimately yielding a Z-score that quantifies significance relative to the scores expected for unrelated structures. A Z-score above 2 is typically taken to indicate structural similarity, which allows DALI to benchmark alignments against statistical expectations and achieve high sensitivity for remote homologs. Once the residue equivalences are established, the aligned residues can be superimposed to report a root-mean-square deviation (RMSD).
Another influential approach is SSAP (Sequential Structure Alignment Program), developed by Taylor and Orengo in 1989 and refined in subsequent works.
SSAP employs a vector-based representation derived from distance matrices, where inter-residue vectors are encoded to capture local structural environments, such as solvent accessibility and secondary structure propensities. Alignment proceeds via double dynamic programming: an initial scan aligns vectors in three dimensions, followed by a sequence-like optimization that scores pairwise environments using a matrix combining geometric and physicochemical similarities. This scoring function, which weights vector directions and lengths, produces a normalized score ranging from 0 to 100, with values above 70 often indicating significant similarity. SSAP's strength lies in its ability to handle non-sequential alignments while enforcing spatial consistency through iterative superposition, making it suitable for comparing domains within multidomain proteins.
Structural alphabet methods extend matrix-based alignment by discretizing continuous structural information into a finite alphabet of "letters" analogous to amino acid codes, allowing sequence alignment algorithms to be applied directly to encoded structures. These methods originated with efforts to identify recurrent local motifs, such as Rooman et al.'s automated detection of structural fragments in proteins using distance-based clustering. Residues or short segments are encoded based on backbone torsion angles, distances, or environmental features into an alphabet of prototypes; for instance, the 3Di alphabet assigns one of 20 states to each residue by considering its closest spatial neighbor, capturing interactions while reducing dependency on sequential order. Alignment then uses standard dynamic programming on these letter sequences, with substitution matrices derived from observed structural similarities, enabling efficient detection of conserved folds. This encoding preserves key topological features from underlying distance matrices, facilitating alignments robust to local distortions like loop variations.
Overall, matrix-based methods excel in robustness to local distortions due to their emphasis on distance-derived similarities rather than coordinate overlays, a characteristic rooted in their development when computational resources limited exact geometric searches. These techniques have become foundational for database scanning and fold classification, with DALI and SSAP remaining widely adopted for their effectiveness in identifying evolutionary relationships.

Fragment Assembly Methods

Fragment assembly methods in protein structural alignment involve identifying and combining short, local structural motifs or fragments from the proteins being compared to construct a global alignment. These approaches are particularly effective for detecting similarities in proteins with low sequence identity, where sequence-based methods may fail, by focusing on compatible local geometries rather than rigid whole-structure overlays.
The Combinatorial Extension (CE) algorithm, introduced by Shindyalov and Bourne in 1998, exemplifies this strategy by incrementally building alignments from aligned fragment pairs (AFPs). CE begins by scanning the two structures to identify pairs of short segments (typically 8-24 residues), one from each structure, whose root-mean-square deviation (RMSD) is below a threshold, such as 2.0 Å, ensuring geometric compatibility. These AFPs are then extended combinatorially by linking non-overlapping pairs that maintain spatial proximity and orientation, guided by a scoring function that accumulates similarity based on residue distances and alignment length. The process optimizes the path through a dynamic programming-like extension to maximize the cumulative score while minimizing gaps and distortions.
MAMMOTH, developed by Ortiz et al. in 2002, advances fragment assembly through multidimensional embedding of local structural environments. It represents each residue by a vector in a high-dimensional space capturing intra- and inter-residue distances within a local window (e.g., 10-15 residues), then applies principal component analysis (PCA) to reduce dimensionality to 3-6 principal axes that preserve structural variance. Fragments are matched by comparing these reduced embeddings, with alignments grown by assembling compatible segments that maximize a Z-score based on structural similarity, allowing for flexible handling of conformational variations. Superposition techniques are used post-assembly to refine the global fit of the aligned fragments.
In general, these methods rely on pre-computed fragment libraries derived from the PDB, where candidate matches are filtered using RMSD thresholds (often 1.5-3.0 Å) to ensure low structural deviation before assembly. This modular approach handles discontinuous alignments well, such as those in multi-domain proteins or structures with insertions and deletions, because gaps in the fragment chain are permitted without penalizing the overall similarity score as severely as in continuous methods. For instance, fragment-based alignment has been reported to perform well on proteins with <20% sequence identity, achieving alignments with RMSD values under 3 Å for homologous folds in benchmark tests.
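The way gaps are tolerated when fragments are assembled can be sketched as a chaining problem: pick an ordered, non-overlapping subset of fragment pairs with the highest total weight. The code below is a generic weighted-chain dynamic program, not the assembly procedure of CE or MAMMOTH. The name `chain_fragments`, the tuple layout `(i, j, weight)`, and the linear `gap_penalty` are assumptions of this sketch, and weights are assumed to be positive.

```python
def chain_fragments(afps, m, gap_penalty=0.0):
    """Select the best-scoring chain of fragment pairs.

    afps: list of (i, j, weight), where i and j are the start indices
          of a length-m fragment in structures A and B.
    A fragment may follow another only if it starts after the previous
    one ends in both structures; skipped residues count as a gap.
    Returns (total_score, chain) with the chain in sequence order.
    """
    order = sorted(afps)
    best = []   # best[k]: best chain score ending at order[k]
    prev = []   # prev[k]: index of the predecessor in that chain
    for k, (i, j, weight) in enumerate(order):
        top, arg = 0.0, None
        for p in range(k):
            pi, pj, _ = order[p]
            if pi + m <= i and pj + m <= j:
                gap = (i - pi - m) + (j - pj - m)
                candidate = best[p] - gap_penalty * gap
                if candidate > top:
                    top, arg = candidate, p
        best.append(weight + top)
        prev.append(arg)
    if not order:
        return 0.0, []
    k = max(range(len(order)), key=best.__getitem__)
    total = best[k]
    chain = []
    while k is not None:
        chain.append(order[k])
        k = prev[k]
    return total, chain[::-1]
```

With `gap_penalty=0.0` any amount of unaligned sequence between two fragments is free, which mirrors the lenient treatment of insertions described above. Raising the penalty pushes the result toward contiguous alignments.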

Geometric Methods

Geometric methods for structural alignment use directional and angular properties of protein backbones to identify similarities, relying on vector representations and angular encodings rather than direct coordinate overlays. These approaches optimize alignments by minimizing deviations in local backbone geometry, often using rotation matrices or pseudo-sequences derived from geometric features. By prioritizing continuous spatial characteristics, such methods detect structural homologs efficiently even when sequences diverge significantly.
TM-align exemplifies geometric alignment through optimization of a rotation matrix that superimposes protein structures while maximizing the TM-score, a scale-independent metric of topological similarity defined as \text{TM-score} = \max \left[ \frac{1}{L_{\text{target}}} \sum_{i=1}^{L_{\text{aligned}}} \frac{1}{1 + (d_i / d_0(L_{\text{target}}))^2} \right], where d_i is the distance between the i-th pair of aligned residues after superposition, and d_0(L) = 1.24 \sqrt[3]{L - 15} - 1.8 is a length-dependent scaling factor. The algorithm begins with an initial alignment guided by secondary structure elements, assigning higher scores to matching helices or strands to prioritize conserved geometric motifs, followed by iterative dynamic programming refinements using a distance-based scoring matrix S(i,j) = 1 / (1 + d_{ij}^2 / d_0(L_{\min})^2). This rotation-centric approach captures angular and directional similarity without relying on fragment decomposition.
Vector alignment methods represent protein backbones as sequences of direction vectors, typically unit vectors between consecutive Cα atoms, to capture local orientations and enable comparison via metrics like unit-vector root-mean-square deviation (URMS). In such approaches, structures are aligned by finding the optimal rotation that minimizes the sum of squared distances between corresponding vectors, often using dynamic programming to match vector sequences while accounting for gaps.
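The unit-vector representation can be illustrated with a short sketch. It covers only part of the method described above: it builds the direction vectors and measures their root-mean-square difference for two chains in a fixed orientation, and it omits the search for the optimal rotation and the gap handling. The names `unit_vectors` and `urms_fixed` are hypothetical.

```python
import math


def unit_vectors(ca):
    """Unit direction vectors between consecutive C-alpha atoms."""
    vectors = []
    for (x1, y1, z1), (x2, y2, z2) in zip(ca, ca[1:]):
        dx, dy, dz = x2 - x1, y2 - y1, z2 - z1
        norm = math.sqrt(dx * dx + dy * dy + dz * dz)
        if norm == 0.0:
            raise ValueError("consecutive atoms coincide")
        vectors.append((dx / norm, dy / norm, dz / norm))
    return vectors


def urms_fixed(ca_a, ca_b):
    """RMS difference between the unit vectors of two equal-length
    chains, evaluated in their current orientation (no rotation is
    optimized here, so this is an upper bound on the true URMS)."""
    ua, ub = unit_vectors(ca_a), unit_vectors(ca_b)
    if len(ua) != len(ub) or not ua:
        raise ValueError("chains must have equal length of at least 2")
    squared = sum(
        (a - b) ** 2 for va, vb in zip(ua, ub) for a, b in zip(va, vb)
    )
    return math.sqrt(squared / len(ua))
```

Because every vector is normalized, translating a chain or uniformly stretching its backbone spacing leaves the value unchanged. Two unit vectors can differ by at most 2, so the result always lies between 0 and 2.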
For instance, consensus shapes for protein families are derived by averaging vectors, preserving directional consistency across homologs with variable backbone spacing up to approximately 4 Å. These methods emphasize geometric invariance under rigid transformations, facilitating the identification of shared folds through vector topology. Dihedral angle comparisons complement vector representations by quantifying torsional similarities, where φ and ψ angles define local backbone conformations for finer-grained comparison.
A prominent 3D-to-1D encoding projects three-dimensional structures into one-dimensional pseudo-sequences using structural alphabets, such as the Protein Blocks (PBs), which approximate local geometries via pentapeptide-like motifs derived from φ and ψ dihedral angles. Each residue is assigned a PB by minimizing the deviation on angular values within a sliding window of five Cα atoms, transforming the 3D backbone into a 1D string suitable for sequence-like processing. Alignment then proceeds via dynamic programming on these PB sequences, employing a substitution matrix built from known structural alignments (e.g., from the PALI database) to score matches between blocks, supporting both local and global optimizations with gap penalties. This encoding preserves key angular and directional features while reducing the cost of comparison.
Geometric methods like TM-align and PB-based encodings are reported to combine high speed with high accuracy, particularly for twilight zone proteins with sequence identities below 30%, where TM-align achieves average TM-scores of 0.51 for high-confidence matches and detects remote homologs with RMSD up to 5 Å and coverage over 40%, outperforming coordinate-based tools in sensitivity. Vector and PB approaches similarly enable rapid alignments (under 1 minute per pair) with recognition rates exceeding 85% for distant relatives in large structural databases, balancing efficiency with geometric fidelity.
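The TM-score values quoted in this section can be made concrete with a small sketch. It evaluates the score for one fixed superposition, so the maximization over rotations that TM-align performs is left out. The standard scaling factor d_0(L) = 1.24 (L - 15)^(1/3) - 1.8 is used; the floor of 0.5 Å applied to short chains is an assumption of this sketch, as are the function names.

```python
def d0(length):
    """Length-dependent distance scale of the TM-score, in angstroms.
    The 0.5 floor for short chains is an assumption of this sketch."""
    if length <= 15:
        return 0.5
    return max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, l_target):
    """TM-score for one superposition.

    distances: distance in angstroms for each aligned residue pair.
    l_target:  length of the structure used for normalization.
    """
    if l_target <= 0:
        raise ValueError("l_target must be positive")
    scale = d0(l_target)
    total = sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances)
    return total / l_target


def pair_score(d_ij, l_min):
    """Entry S(i, j) of the distance-based scoring matrix used in the
    dynamic programming refinement."""
    return 1.0 / (1.0 + d_ij ** 2 / d0(l_min) ** 2)
```

A perfect superposition of every target residue gives 1.0. A residue pair separated by exactly d_0 contributes one half, so d_0 acts as the distance at which a match counts as half-good regardless of protein size.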

Extensions

Multiple Alignment

Multiple structural alignment extends the principles of pairwise structural alignment to simultaneously superimpose and compare more than two protein structures, enabling the identification of conserved cores and variable regions across a set of related proteins. This process typically uses pairwise alignments as building blocks, iteratively merging them to form a global superposition that captures evolutionary conservation and structural divergence. By aligning multiple structures, researchers can infer functional insights that are obscured in pairwise comparisons alone.
The primary challenges stem from the combinatorial explosion of possible residue equivalences as the number of input structures grows, which renders the problem NP-hard and makes exact solutions computationally intractable beyond small sets. Heuristic methods are thus essential to approximate optimal alignments efficiently. Another key difficulty is establishing a common superposition that minimizes overall deviations while accommodating conformational flexibility and insertions or deletions across the structures, which often requires iterative refinement to balance global and local similarities.
Prominent methods address these challenges through progressive or graph-based strategies. Progressive approaches, exemplified by MUSTANG (MUltiple STructural AligNment AlGorithm), begin with pairwise alignments scored on residue-residue contacts and local topology, then progressively incorporate additional structures along a guide tree without explicit gap penalties, allowing for flexibility in distant homologs and achieving high accuracy (e.g., 93.4% on benchmark families). Graph-based methods, such as POSA (Partial Order Structure Alignment), model protein backbones as partial order graphs to represent flexible alignments, enabling the detection of conserved regions present in subsets of structures and handling internal motions without assuming a linear order.
These techniques prioritize structural similarity over sequence, often outperforming sequence-based multiple alignments in cases of low homology.
Multiple structural alignments find critical applications in analyzing family-wide folds, where they reveal conserved structural motifs across homologous proteins despite sequence divergence, as seen in alignments of cyclin-dependent kinases that enable classification of active and inactive states with over 98% accuracy using derived features. In superfamily analysis, they facilitate the detection of distant evolutionary relationships by highlighting shared cores in diverse proteins, supporting functional predictions and the expansion of structural databases like SCOP or CATH.
Evaluation of multiple alignments relies on metrics that quantify alignment extent and superposition quality. Core RMSD is the RMSD computed solely over the strict core, that is, the positions that are equivalent in all structures and lie within 4 Å of one another; it measures structural agreement in conserved regions, and lower values indicate tighter alignments. Alignment length consistency, often expressed as the core size as a percentage of the shortest input structure, assesses the extent of overlap and robustness across the set, with higher percentages signaling more reliable evolutionary inferences.
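The two evaluation metrics can be sketched as follows. The sketch assumes that all structures have already been superposed into one frame and that the alignment is given as columns holding one residue index per structure, with `None` marking a gap. Measuring core RMSD as the root mean square of all pairwise distances within core columns is one possible reading of the definition, not the only one in use; the function names are hypothetical.

```python
import math
from itertools import combinations


def strict_core(structures, columns, cutoff=4.0):
    """Columns with no gaps whose residues all lie within `cutoff`
    angstroms of one another in the common frame."""
    core = []
    for column in columns:
        if any(index is None for index in column):
            continue
        points = [s[index] for s, index in zip(structures, column)]
        if all(math.dist(p, q) <= cutoff for p, q in combinations(points, 2)):
            core.append(column)
    return core


def core_rmsd(structures, core):
    """Root mean square of pairwise distances over core columns."""
    squared, count = 0.0, 0
    for column in core:
        points = [s[index] for s, index in zip(structures, column)]
        for p, q in combinations(points, 2):
            squared += math.dist(p, q) ** 2
            count += 1
    return math.sqrt(squared / count) if count else float("nan")


def core_percent(structures, core):
    """Core size as a percentage of the shortest input structure."""
    return 100.0 * len(core) / min(len(s) for s in structures)
```

Reporting the two numbers together matters because they trade off against each other: a tighter distance cutoff lowers the core RMSD but also shrinks the core.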

RNA Alignment

RNA structural alignment focuses on comparing three-dimensional (3D) conformations of ribonucleic acid (RNA) molecules, which exhibit unique topological features such as pseudoknots, diverse base-pairing interactions, and junction loops that distinguish them from protein structures. Pseudoknots arise when nucleotides in a single-stranded loop form base pairs with complementary nucleotides outside the loop, often stabilized by coaxial stacking of stems and non-canonical base triples involving loops. Base interactions in RNA include canonical Watson-Crick pairs (A-U, G-C) as well as non-canonical ones like Hoogsteen and wobble pairs, which contribute to the flexibility and functional diversity of RNA motifs. Junction loops, or multi-branched loops, serve as critical connectors between helical stems, organizing the overall 3D architecture and enabling complex folding patterns observed in functional RNAs like ribozymes and riboswitches.
The primary goal of RNA alignment is to identify correspondences that preserve secondary structure motifs, including base-pairing patterns and pseudoknotted regions, while accounting for the inherent flexibility and modular nature of RNA topologies. Such alignment highlights evolutionary relationships and functional similarities, for example in non-coding RNAs where structural integrity is key to regulatory roles. Unlike the rigid superposition used in general structural comparisons, RNA alignments often incorporate dynamic programming or graph-based approaches to handle discontinuous helices and loop-mediated interactions.
Key methods for 3D structural alignment include RMalign, which employs a size-independent scoring function called RMscore to evaluate similarity based on residue-residue distances and secondary structure elements, and which is reported to classify RNA structures better than earlier tools.
TOPAS provides a network-based approach for pairwise alignment of RNA sequences, constructing topological networks from predicted secondary structures that incorporate sequential and base-pairing edges, followed by probabilistic alignment to capture structural similarities even in pseudoknotted regions. For homology detection, rMSA integrates sequence search against databases like NCBI nt and RNAcentral with covariance model-based alignment, enhancing the accuracy of secondary structure prediction by at least 20% through improved detection of homologous RNAs.
Automated pipelines facilitate comprehensive analysis, with rMSA serving as a multi-stage tool that combines homology search, alignment, and structure modeling to streamline workflows for large-scale RNA datasets. Recent advances incorporate deep learning, such as the 2024 REDalign method, which uses a residual encoder-decoder network to predict and align RNA secondary structures by learning consensus patterns from sequence and structural data, offering high accuracy with reduced computational overhead compared to traditional dynamic programming.
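The notions of a topological network with sequential and base-pairing edges, and of a pseudoknot as a pair of crossing base pairs, can be illustrated with a short sketch. It reads a secondary structure in extended dot-bracket notation, where additional bracket types mark pseudoknotted stems. It is not the TOPAS implementation; the function names and edge labels are assumptions.

```python
BRACKETS = {"(": ")", "[": "]", "{": "}", "<": ">"}


def parse_pairs(dot_bracket):
    """Return sorted (i, j) base pairs from extended dot-bracket text.
    Any character that is not a bracket is treated as unpaired."""
    openers = {close: open_ for open_, close in BRACKETS.items()}
    stacks = {open_: [] for open_ in BRACKETS}
    pairs = []
    for position, char in enumerate(dot_bracket):
        if char in BRACKETS:
            stacks[char].append(position)
        elif char in openers:
            stack = stacks[openers[char]]
            if not stack:
                raise ValueError(f"unmatched {char!r} at {position}")
            pairs.append((stack.pop(), position))
    if any(stacks.values()):
        raise ValueError("unclosed bracket")
    return sorted(pairs)


def topology_edges(dot_bracket):
    """Edges of a simple structure network: backbone links between
    neighbouring nucleotides plus one edge per base pair."""
    backbone = [(i, i + 1, "backbone") for i in range(len(dot_bracket) - 1)]
    paired = [(i, j, "pair") for i, j in parse_pairs(dot_bracket)]
    return backbone + paired


def crossing_pairs(pairs):
    """Pairs (i, j) and (k, l) with i < k < j < l, the signature of a
    pseudoknot. `pairs` must be sorted by opening position."""
    return [
        (p, q)
        for index, p in enumerate(pairs)
        for q in pairs[index + 1:]
        if q[0] < p[1] < q[1]
    ]
```

For a plain hairpin such as `((...))` the crossing list is empty, while `((..[[..))..]]` yields four crossing pairs because each round-bracket pair interleaves with each square-bracket pair.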

Advances

Integration with Prediction Tools

Structural alignment plays a pivotal role in structure prediction pipelines, particularly for template selection in AlphaFold-Multimer, where alignments to known structures generate diverse multiple sequence alignments (MSAs) and structural templates to boost prediction accuracy. Methods like MULTICOM use both sequence and structure alignments to create these inputs, enabling more robust modeling of protein complexes by identifying homologous templates beyond pure sequence similarity. Furthermore, alignment facilitates the refinement of predicted structures against experimentally validated ones, correcting discrepancies in low-confidence regions and improving overall model quality.
Following prediction, structural alignment tools are essential for validating AlphaFold models, with TM-align commonly used to compare predicted structures to references and the TM-score used to assess fold similarity. This process integrates AlphaFold's per-residue confidence score, the predicted local distance difference test (pLDDT), to prioritize alignments in regions with pLDDT values below 70 and so guide targeted refinements.
In large-scale databases like the AlphaFold Protein Structure Database, Foldseek enables efficient structural searches and alignments across millions of predicted models, encoding structures into a 20-state 3Di alphabet for rapid, sensitive comparisons. This integration supports template discovery and validation by returning alignments with metrics like E-values and overlap percentages, streamlining workflows for researchers querying the database.
The synergy between structural alignment and prediction tools gained momentum after AlphaFold's dominance in CASP14 in 2020, where its high-accuracy predictions highlighted the need for alignment-based validation and refinement to handle novel folds.
By 2025, advancements include multimodal methods that align prediction outputs with cryo-EM densities for refined atomic models, and Distance-AF, which optimizes predictions using alignments to enhance structure accuracy. These developments, alongside October 2025 updates to the AlphaFold Database incorporating isoforms and expanded coverage, have further integrated alignment into iterative refinement pipelines.
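Selecting the low-confidence regions mentioned above can be sketched in a few lines. AlphaFold models distributed in PDB format carry the per-residue pLDDT in the B-factor column of each ATOM record, so the sketch reads that column from the Cα atoms using the fixed-width layout of the PDB format. The function names and the default threshold argument are assumptions; the value 70 comes from the text.

```python
def ca_plddt(pdb_text):
    """Map (chain, residue number) to the pLDDT stored in the
    B-factor field of each C-alpha ATOM record."""
    scores = {}
    for line in pdb_text.splitlines():
        # Fixed columns: atom name 13-16, chain 22, residue number
        # 23-26, temperature factor 61-66 (1-based).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            residue_number = int(line[22:26])
            scores[(chain, residue_number)] = float(line[60:66])
    return scores


def low_confidence(scores, threshold=70.0):
    """Residues whose pLDDT falls below the threshold, in order."""
    return sorted(key for key, value in scores.items() if value < threshold)
```

The returned residue list could then be used to mask or down-weight positions before a structural alignment, or to choose which segments to compare against an experimental reference first.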

Machine Learning Approaches

Machine learning approaches have made structural alignment faster and more scalable for both protein and RNA structures, which matters particularly in the era of vast predicted structure databases. These methods use neural networks to encode complex three-dimensional features into compact representations, reducing the computational burden of traditional geometric alignments while maintaining high accuracy. Key innovations include graph-based embeddings and deep architectures tailored for high-throughput searches and specific biomolecular types.
Graph neural networks (GNNs) have been instrumental in creating embeddings for rapid structural search. For instance, Foldseek employs a vector-quantized variational autoencoder (VQ-VAE), a type of neural network, to discretize protein structures into a 20-character "3Di" alphabet that captures tertiary residue-residue interactions, transforming alignment into efficient sequence comparison. This approach achieves near-linear scaling, enabling searches across millions of structures in seconds, with sensitivity comparable to traditional tools like TM-align but orders of magnitude faster.
Deep learning models have further advanced alignment for both proteins and RNA. SARST2 integrates artificial neural networks (ANNs) and decision trees in a filter-and-refine pipeline, combining sequence, secondary structure, and evolutionary data to align query proteins against massive databases like AlphaFold DB in under 4 minutes on standard hardware, with a reported accuracy of 96%. For RNA, REDalign uses a residual encoder-decoder network to align secondary structures by learning pattern-based correspondences, outperforming classical methods like RNAforester in accuracy on benchmark datasets while requiring less computation.
Recent benchmarking underscores the speed advantages of ML-driven tools.
A 2025 benchmark evaluated nine algorithms, including ML-based ones like Foldseek, TM-Vec, and DeepAlign, on downstream tasks, revealing that ML methods like Foldseek reduced runtime by up to 100-fold compared to geometric tools like TM-align, especially for large-scale datasets, without sacrificing quality.
Advances in ML have extended to AI-assisted protein design and post-AlphaFold data handling. In design, embeddings facilitate alignment of generated structures against natural templates to validate novelty, as seen in frameworks that invert prediction models for design. For large datasets, ML clustering via Foldseek has processed over 200 million AlphaFold structures, identifying structural families and enabling proteome-wide comparisons that were previously infeasible.
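Why a discrete structural alphabet makes search so fast can be shown with a toy sketch. Once every structure is a string over a small alphabet, cheap string operations such as k-mer lookups can discard most database entries before any expensive alignment runs. The code below is a generic k-mer Jaccard prefilter and does not reproduce Foldseek's prefilter, scoring, or parameters. The input strings are assumed to be already encoded, and `kmer_similarity`, `prefilter`, `k`, and `min_sim` are hypothetical names.

```python
def kmers(text, k=3):
    """Set of all length-k substrings of an encoded structure."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def kmer_similarity(a, b, k=3):
    """Jaccard similarity between the k-mer sets of two strings."""
    set_a, set_b = kmers(a, k), kmers(b, k)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def prefilter(query, database, k=3, min_sim=0.3):
    """Return (name, similarity) for database entries that share
    enough k-mers with the query, best hits first.

    database: dict mapping entry name to its encoded string.
    """
    hits = [
        (name, kmer_similarity(query, encoded, k))
        for name, encoded in database.items()
    ]
    kept = [hit for hit in hits if hit[1] >= min_sim]
    return sorted(kept, key=lambda hit: -hit[1])
```

Only the entries that survive such a filter would be passed on to a full alignment step, which is where the bulk of the speedup over all-against-all geometric comparison comes from.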

Tools

Web-Based Tools

Web-based tools for structural alignment provide accessible platforms that let researchers perform alignments directly in a web browser without requiring software installation, making them well suited to quick analyses or to users without advanced computational resources. These services typically accept uploads of Protein Data Bank (PDB) files or entry identifiers and output visualizations, scores, and aligned structures, though they often impose limits on input sizes or concurrent jobs to manage server load. Common scoring metrics, such as TM-score for global similarity or Z-scores for significance, help evaluate alignment quality.
The RCSB PDB Pairwise Structure Alignment tool, hosted by the Research Collaboratory for Structural Bioinformatics (RCSB), allows users to upload PDB files or specify existing entries for pairwise superposition of protein structures. It computes alignments using methods like TM-align, providing TM-scores to quantify structural similarity, and supports visualization of overlaid models. This free service emphasizes user-friendly interfaces for selecting chains and viewing results, with no installation needed, though large datasets may call for other options.
DaliLite, accessible via the Dali web server, serves as the online version of the DALI algorithm for pairwise and multiple alignments based on intra-molecular distance matrices. Users submit query structures against the PDB database or other inputs, receiving Z-scores that indicate the statistical significance of similarities, along with aligned coordinate files and structural neighborhoods. Designed for ease of use, it processes jobs asynchronously and limits inputs to manage computational demands, making it suitable for exploratory comparisons without local setup.
PDBeFold, provided by the Protein Data Bank in Europe (PDBe), offers a web interface for structural alignments based on secondary-structure matching (SSM), supporting both pairwise and multiple comparisons.
PDBeFold enables uploading of structures or searching against the PDB, outputting superposition visuals and similarity scores, with options to refine alignments iteratively. As a free, browser-based resource, it facilitates rapid assessments but restricts large-scale submissions to prevent overload.
For RNA structures, RNAhub (launched in 2025) is a specialized web server that automates the alignment of RNA homologs by integrating homology searches with secondary structure and covariation analysis. Users input an RNA sequence to generate multiple alignments that incorporate structural constraints, assessing covariation via tools like R-scape, which is particularly useful for non-coding RNAs. This no-install platform is free but caps query lengths and homology searches to ensure efficient processing.

Standalone Software

Standalone software for structural alignment provides downloadable programs that enable offline computation, supporting high-throughput analyses, customization via scripting, and integration with local workflows without reliance on web servers. These tools are particularly valuable for researchers handling large datasets or requiring reproducible, resource-controlled environments, and are often available as open-source options with binaries or source code for various operating systems.
TM-align is a widely used command-line tool for fast pairwise alignment based on a geometric approach that uses the TM-score to optimize superposition while prioritizing global topology. Developed in 2005, it combines speed and accuracy when comparing structures with low sequence similarity, making it suitable for scripting and pipeline integration in structural bioinformatics tasks. The tool outputs alignment details including TM-score, RMSD, and aligned residues, and is distributed as a precompiled binary for Linux, macOS, and Windows, with minimal dependencies.
Foldseek, introduced in 2023, is an open-source tool designed for rapid and sensitive large-scale searches and alignments by encoding structures into compact 1D sequences using a 20-state 3Di alphabet derived from inter-residue interactions. This encoding allows fast sequence-comparison techniques to perform structural comparisons at speeds 2-4 orders of magnitude faster than traditional methods, while maintaining high sensitivity for remote homolog detection. It supports monomer and multimer alignments, clustering, and searches on massive databases, with installation via conda or from source on Linux and macOS.
CE (Combinatorial Extension) is a fragment-based method that identifies and extends short aligned fragments to construct optimal global alignments, emphasizing rigid-body superpositions of protein backbones.
Implemented as a standalone program since its inception in 1998, CE supports pairwise alignments and, through the CE-MC extension, multiple alignments, enabling comparison of models against reference structures. Source code and binaries are available for download, compilable on systems with C dependencies, and it integrates well with tools for structural database analysis.
MaxCluster complements fragment-based approaches by providing a versatile command-line utility for pairwise structure comparison and clustering, computing metrics like RMSD, GDT-TS, and TM-score across large sets of models. Released around 2008, it facilitates high-throughput evaluation of predicted structures, such as those from folding simulations, with precompiled binaries for Linux, macOS, and Windows, and no external dependencies beyond standard libraries.
A recent addition is SARST2, released in 2025, which offers high-throughput, resource-efficient structural alignment against massive databases by transforming Ramachandran angles into sequential representations for accelerated similarity searches. It achieves higher speed and lower memory usage than its predecessors, making it well suited to aligning predicted structures, such as AlphaFold models, to explore evolutionary relationships at scale. The open-source implementation supports installation on Linux and macOS, with options for GPU acceleration to handle datasets exceeding millions of structures.
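Two of the metrics reported by such comparison utilities, RMSD and GDT-TS, can be sketched from a list of per-residue distances. The sketch evaluates both for a single, fixed superposition; production tools search over many superpositions to maximize GDT-TS, which is omitted here. It does not reproduce MaxCluster's code or command-line behavior, and the function names are hypothetical.

```python
import math


def rmsd(distances):
    """Root-mean-square deviation over aligned residue pairs."""
    if not distances:
        raise ValueError("no aligned pairs")
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def gdt_ts(distances, n_reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """GDT-TS on a 0-100 scale for one superposition: the mean, over
    the four cutoffs, of the fraction of reference residues whose
    aligned partner lies within that cutoff (angstroms)."""
    if n_reference <= 0:
        raise ValueError("n_reference must be positive")
    fractions = [
        sum(1 for d in distances if d <= cutoff) / n_reference
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)
```

RMSD is dominated by the worst-fitting residues, whereas GDT-TS counts how many residues fit well at several tolerances, so a model with one badly placed loop can have a poor RMSD and still a high GDT-TS.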

References

  1. [1]
    Pairwise Structure Alignment - RCSB PDB
    Sep 23, 2024 · Introduction. What is Structure Alignment? Structure alignment attempts to establish residue-residue correspondence between two or more ...
  2. [2]
    The meaning of alignment: lessons from structural diversity - PMC
    Dec 23, 2008 · Protein structural alignment provides a fundamental basis for deriving principles of functional and evolutionary relationships.
  3. [3]
    [PDF] A Novel Approach to Structure Alignment - Stanford University
    It enables the study of func- tional relationship between proteins and is very important for homology and threading methods in structure prediction.
  4. [4]
    RAPIDO: a web server for the alignment of protein structures ... - PMC
    The RMSD for a flexible superposition (RMSDf) is calculated as the RMSD over all Cα-atoms of the individual rigid bodies superimposed separately. The ...Missing: equation | Show results with:equation
  5. [5]
    Methods of protein structure comparison - PMC - NIH
    RMSD can be calculated for any type and subset of atoms; for example, Cα atoms of the entire protein, Cα atoms of all residues in a specific subset (e.g. the ...Methods Of Protein Structure... · 2 Methods · Rms Of Dihedral Angles
  6. [6]
    The difficulty of protein structure alignment under the RMSD - PMC
    Protein structure alignment is often modeled as the largest common point set (LCP) problem based on the Root Mean Square Deviation (RMSD).
  7. [7]
    Protein remote homology detection and structural alignment ... - Nature
    Sep 7, 2023 · The challenge of remote homology detection is identifying structurally similar proteins that do not necessarily have high sequence similarity.
  8. [8]
    CATH database: an extended protein family resource for structural ...
    These approaches used datasets of distant homologues selected from the structural classifications, such as SCOP and CATH, to determine the sensitivity of ...
  9. [9]
    CATHe: detection of remote homologues for CATH superfamilies ...
    CATH is a protein domain classification resource that exploits an automated workflow of structure and sequence comparison alongside expert manual curation ...
  10. [10]
    The TIM Barrel Architecture Facilitated the Early Evolution of Protein ...
    Jan 5, 2016 · The triosephosphate isomerase (TIM) barrel protein fold is a structurally repetitive architecture that is present in approximately 10 % of all enzymes.
  11. [11]
    Protein function annotation with Structurally Aligned Local Sites of ...
    Feb 28, 2013 · The top-ranked residues constitute the active site prediction. Structural alignments are obtained for sets of these local sites. Characteristic ...
  12. [12]
    Correlation of fitness landscapes from three orthologous TIM barrels ...
    Mar 6, 2017 · The TIM barrel superfamily contains 57 distinct families and encompasses five of the six enzyme commission functional categories catalysing at ...
  13. [13]
    Template-Based Protein Structure Modeling - PMC - PubMed Central
    Template-based protein structure modeling techniques rely on the study of principles that dictate the 3D structure of natural proteins from the theory of ...
  14. [14]
    Structure-Based Virtual Screening for Drug Discovery: Principles ...
    In this review, we focus on the principles and applications of Virtual Screening (VS) within the context of SBDD and examine different procedures.
  15. [15]
    PoLi: A Virtual Screening Pipeline Based On Template Pocket ... - NIH
    In PoLi, ligand binding pockets in the query protein's structure are predicted using two different approaches: first by global structural superposition of holo- ...
  16. [16]
    Dali server: structural unification of protein families - Oxford Academic
    May 24, 2022 · The Dali server for 3D protein structure comparison is widely used by crystallographers to relate new structures to pre-existing ones.
  17. [17]
    Dali server - ekhidna.biocenter.
    The Dali server is a network service for comparing protein structures in 3D. You submit the coordinates of a query protein structure and Dali compares them.
  18. [18]
    Dealing with Coordinates - PDB-101
    In PDB file format, the ATOM record is used to identify proteins or nucleic acid atoms, and the HETATM record is used to identify atoms in small molecules.Atomic-Level Data · Chains And Models · Temperature Factors
  19. [19]
    [PDF] PROTEIN DATA BANK ATOMIC COORDINATE AND ...
    The file contains records of 80 characters each, including HEADER, COMPND, SOURCE, ATOM, and HETATM records, with ATOM and HETATM records appearing in  ...
  20. [20]
    GOSSIP: a method for fast and accurate global alignment of protein ...
    In this article, we propose a global structural alignment method that has the following properties: it can be applied to any set of protein structures with Cα ...
  21. [21]
    Comprehensive Evaluation of Protein Structure Alignment Methods
    Aug 7, 2025 · The majority of the current structure alignment 4, 5 and protein classification methods use either C α -polylines [6][7][8] or the 3D ...
  22. [22]
    Secondary structure assignment that accurately reflects physical and ...
    Secondary structure is used in hierarchical classification of protein structures, identification of protein features, such as helix caps and loops, ...
  23. [23]
    Protein secondary structure: category assignment and predictability
    The conventional 'H' assignment of α-helices by DSSP contains approximately 20% residues without hydrogen bonds from residues i→i+4 and/or i−4←i. For the β- ...
  24. [24]
    Deep learning methods for protein torsion angle prediction
    Sep 18, 2017 · The conformation of the backbone of a protein can be largely represented by two torsion angles (phi and psi angles) associated with each Cα atom ...
  25. [25]
    Part 1: Protein Structure - Backbone torsion angles
    The protein backbone can be described in terms of the phi, psi and omega torsion angles of the bonds: The phi angle is the angle around the -N-CA- bond ...
  26. [26]
    Alignments of biomolecular contact maps | Interface Focus - Journals
    Jun 11, 2021 · Here, we focus on contact maps, ie undirected graphs with an ordered set of vertices. These serve as natural discretizations of RNA and protein structures.
  27. [27]
    Protein structure comparison by alignment of distance matrices
    Sep 5, 1993 · We have developed a novel algorithm (DALI) for optimal pairwise alignment of protein structures. The three-dimensional co-ordinates of each protein are used to ...
  28. [28]
    [PDF] Protein Structure Comparison by Alignment of Distance Matrices
    Here, we present a general approach for aligning a pair of proteins represented by two-dimensional matrices. The result is a set of structurally equiva- lent ...
  29. [29]
    Alignment of protein structures in the presence of domain motions
    As it is well known, protein molecules are flexible entities with internal movements ranging from the displacement of individual atoms to movements of entire ...
  30. [30]
    CE-MC: a multiple protein structure alignment server - PMC - NIH
    Output data. The CE-MC program outputs multiple alignments in four different formats, i.e. JOY/html, JOY/PostScript, text and FASTA formats. The JOY/html format ...
  31. [31]
    Alignment file (PIR)
    The first line of each sequence entry specifies the protein code after the >P1; line identifier. The line identifier must occur at the beginning of the line.
  32. [32]
    Pairwise Structure Alignment - RCSB PDB
    Sep 23, 2024 · This tool presents options for pairwise structure alignment of proteins. In the case of pairwise alignment, structures are always compared in pairs.Pairwise Structure Alignment · Documentation · ExamplesMissing: efficiency | Show results with:efficiency<|separator|>
  33. [33]
    TM-align: a protein structure alignment algorithm based on the ... - NIH
    We have developed TM-align, a new algorithm to identify the best structural alignment between protein pairs that combines the TM-score rotation matrix and ...Initial Structural Alignment · Results · Benchmark Test
  34. [34]
    SuperPose
    From a superposition of two or more structures, Superpose generates sequence alignments, structure alignments, PDB coordinates, RMSD statistics, Difference ...
  35. [35]
    Protein multiple alignments: sequence-based versus structure ...
    Concerning gap management, sequence-based programs set less gaps than structure-based programs. Concerning the databases, the alignments of the manually built ...Introduction · Materials and methods · Results · Discussion
  36. [36]
    Dali server: conservation mapping in 3D - PMC
    May 10, 2010 · Outputs. The Dali server, Dali Database and pairwise comparison use a common output format and share interactive analysis tools. The result ...
  37. [37]
    Matt: Local Flexibility Aids Protein Multiple Structure Alignment
    We introduce an algorithm that allows local flexibility in the structures when it brings them into closer alignment.
  38. [38]
    Fr-TM-align: a new protein structural alignment method based on ...
    TM-align employs a very simple approach that uses both gapless threading and secondary structure similarity to generate the initial set of equivalent residues.
  39. [39]
    The difficulty of protein structure alignment under the RMSD
    Jan 4, 2013 · The study identified two important factors in the problem's complexity: (1) The lack of a limit in the distance between the consecutive points ...
  40. [40]
    Comparative Analysis of Protein Structure Alignments - PMC
    Similar structures might display considerable structural variability and are often related by several insertions and deletions (indels) of considerable size.
  41. [41]
  42. [42]
  43. [43]
  44. [44]
    [PDF] Exact algorithms for pairwise protein structure alignment - CORE
    Finding the score- optimal residue correspondences is believed to be an NP-hard problem for almost all. 3-dimensional scoring functions. 2-dimensional scoring ...Missing: isomorphism | Show results with:isomorphism<|control11|><|separator|>
  45. [45]
    A comparison of algorithms for the pairwise alignment of biological ...
    The problem of global alignment is equivalent to the subgraph isomorphism problem, which is NP-complete (Cook, 1971), so aligners settle for approximate ...
  46. [46]
    Protein structure comparison using iterated double dynamic ... - NIH
    A protein structure comparison method is described that allows the generation of large populations of high-scoring alternate alignments.
  47. [47]
    An integrated approach to the analysis and modeling of protein ...
    This is a local alignment algorithm that focuses only on aligning the core structures defined by the threshold, and is similar to the Smith-Waterman ...
  48. [48]
    PAUL: protein structural alignment using integer linear programming ...
    Oct 19, 2009 · Protein structural alignment determines the three-dimensional superposition of protein structures by means of aligning the protein's residues.
  49. [49]
    DALIX: optimal DALI protein structure alignment - PubMed
    We present a mathematical model and exact algorithm for optimally aligning protein structures using the DALI scoring model.
  50. [50]
    [PDF] DALIX: optimal DALI protein structure alignment - Hal-Inria
    Apr 26, 2012 · For SISY, RIPC and SKOLNICK, a maximum running time of 30 CPU hours per instance is applied and for the short SCOPCath instances a time ...
  51. [51]
    Approximate protein structural alignment in polynomial time - PNAS
    Here, we study the structural alignment problem as a family of optimization problems and develop an approximate polynomial-time algorithm to solve them.
  52. [52]
    Algorithms for Multiple Protein Structure Alignment and ... - NIH
    The program output format of multiple alignments is ClustalX or PIR. ... Kolodny R and Linial N Approximate protein structural alignment in polynomial time.
  53. [53]
  54. [54]
    Protein structure alignment by incremental combinatorial extension ...
    The algorithm uses combinatorial extension (CE) of aligned fragment pairs (AFPs) based on local geometry to build an optimal protein structure alignment.
  55. [55]
    A database and tools for 3-D protein structure comparison and ...
    The database reported here is derived using the Combinatorial Extension (CE) algorithm which compares pairs of protein polypeptide chains and provides a list of ...
  56. [56]
    MAMMOTH (matching molecular models obtained from theory)
    Here, a new method for sequence-independent structural alignment is presented that allows comparison of an experimental protein structure with an arbitrary low- ...
  57. [57]
    MAMMOTH (Matching molecular models obtained from theory): An ...
    Apr 13, 2009 · A new method for sequence-independent structural alignment is presented that allows comparison of an experimental protein structure with an arbitrary low- ...
  58. [58]
    Geometric Methods for Protein Structure Comparison - SpringerLink
    Protein structural comparison is an important operation in molecular biology and bioinformatics. It plays a central role in protein analysis and design.
  59. [59]
    a protein structure alignment algorithm based on the TM-score
    We have developed TM-align, a new algorithm to identify the best structural alignment between protein pairs that combines the TM-score rotation matrix and ...
  60. [60]
    [PDF] Finding the Consensus Shape for a Protein Family
    The compared vectors are those between adjacent α-carbons along the protein backbone. We refer to these direction vectors as unit vectors because, for proteins, ...
  61. [61]
    SABERTOOTH: protein structural alignment based on a vectorial ...
    We show that protein comparison based on a vectorial representation of protein structure performs comparably to established algorithms based on coordinates.
  62. [62]
    Protein Block Expert (PBE): a web-based protein structure analysis ...
    Encoding protein 3D structures into 1D string using short structural prototypes or structural alphabets opens a new front for structure comparison and analysis.
  63. [63]
    MUSTANG: A multiple structural alignment algorithm
    May 30, 2006 · MUSTANG aligns residues on the basis of similarity in patterns of both residue–residue contacts and local structural topology.
  64. [64]
    Algorithms, applications, and challenges of protein structure alignment
    In this review, we first present a small survey of current methods for protein pairwise and multiple alignment, focusing on those that are publicly available ...
  65. [65]
    Multiple flexible structure alignment using partial order graphs
    A new method of multiple protein structure alignment, POSA (Partial Order Structure Alignment), was developed using a partial order graph representation of ...
  66. [66]
    A multiple protein structure alignment and feature extraction suite - NIH
    Apr 6, 2020 · A multiple structure alignment suite meant for homologous but sequentially divergent protein families which consistently returns accurate alignments.
  67. [67]
    Multiple structure alignment and consensus identification for proteins
    Feb 2, 2010 · This paper presents an algorithm to compute a multiple structure alignment for a set of proteins and to generate a consensus structure. The ...
  68. [68]
    Accuracy analysis of multiple structure alignments - PMC - NIH
    Protein structure alignment methods are essential for many different challenges in protein science, such as the determination of relations between proteins ...
  69. [69]
    Structure, stability and function of RNA pseudoknots involved in ...
    An RNA pseudoknot is a structural element of RNA which forms when nucleotides within one of the four types of single-stranded loops in a secondary structure ( ...
  70. [70]
    Structure of the Human Telomerase RNA Pseudoknot Reveals ...
    The telomerase pseudoknot has 8 nt in loop 1, which allows it to form additional stem 2-loop 1 Hoogsteen base triples. These multiple base triple interactions ...
  71. [71]
    RNAloops: a database of RNA multiloops - Oxford Academic
    Jul 9, 2022 · RNAloops is a self-updating database that stores multi-branched loops identified in the PDB-deposited RNA structures.
  72. [72]
    All-at-once RNA folding with 3D motif prediction framed by ... - Nature
    Oct 3, 2025 · Critical loops and junctions connect these helices and arrange them into a 3D structure. These nonhelical linker regions, called RNA 3D motifs ...
  73. [73]
    RMalign: an RNA structural alignment tool based on a novel scoring ...
    Apr 8, 2019 · In this study, we develop a novel RNA 3D structure alignment approach RMalign, which is based on a size-independent scoring function RMscore.
  74. [74]
    TOPAS: network-based structural alignment of RNA sequences
    Jan 10, 2019 · We propose a novel network-based scheme for pairwise structural alignment of RNAs. The proposed algorithm, TOPAS, builds on the concept of topological networks.
  75. [75]
    rMSA: A Sequence Search and Alignment Algorithm to Improve RNA ...
    rMSA is a multi-stage pipeline for automated RNA homology search and MSA construction. rMSA improves covariance-based RNA secondary structure prediction ...
  76. [76]
    accurate RNA structural alignment using residual encoder-decoder ...
    Nov 5, 2024 · REDalign presents a significant advancement in RNA secondary structural alignment, balancing high alignment accuracy with lower computational demands.
  77. [77]
    Enhancing alphafold-multimer-based protein complex structure ...
    MULTICOM enhances AlphaFold-Multimer predictions by generating diverse MSAs and structural templates using both sequence and structure alignments for AlphaFold ...
  78. [78]
    Enhancing AlphaFold-Multimer-based Protein Complex Structure ...
    May 18, 2023 · MULTICOM enhances AlphaFold-Multimer predictions by generating diverse MSAs and structural templates using both sequence and structure ...
  79. [79]
    Enhancing cryo-EM structure prediction with DeepTracer and ...
    In this study, we present DeepTracer-Refine, an automated method that refines AlphaFold predicted structures by aligning them to DeepTracers modeled structure.
  80. [80]
    Flexible fitting of AlphaFold2-predicted models to cryo-EM density ...
    Nov 18, 2024 · AlphaFold2 generally predicts highly accurate structures, but 18 of the 137 models of isolated chains exhibit a TM-score below 0.80. We achieved ...
  81. [81]
    AlphaFold two years on: Validation and impact - PNAS
    pTM stands for predicted TM-score, and is a measure of AlphaFold's expected global accuracy on a protein or complex. An interface-only version of pTM can be ...
  82. [82]
    Fast and accurate protein structure search with Foldseek - PMC - NIH
    Foldseek enables fast and sensitive comparison of large structure sets. It encodes structures as sequences over the 20-state 3Di alphabet.
  83. [83]
    AlphaFold database empowers researchers with enhanced ...
    Sep 12, 2024 · The integration provides a seamless and user-friendly experience, allowing for smooth navigation between sequence and structural data. This ...
  84. [84]
    Highly accurate protein structure prediction with AlphaFold - Nature
    Jul 15, 2021 · Here we provide the first computational method that can regularly predict protein structures with atomic accuracy even in cases in which no similar structure ...
  85. [85]
    Multimodal deep learning integration of cryo-EM and AlphaFold3 for ...
    Oct 31, 2025 · This model is further refined using AlphaFold3-predicted structures and density maps to build final atomic structures. MICA significantly ...
  86. [86]
    Distance-AF improves predicted protein structure models by ... - Nature
    Sep 30, 2025 · The use of AF2 models is further facilitated by the AlphaFold Database which now holds structure models of over 200 million proteins. AF2 has ...
  87. [87]
    EMBL-EBI and Google DeepMind renew partnership and release ...
    Oct 7, 2025 · This AFDB update ensures AlphaFold coverage for newly discovered proteins and incorporates protein isoforms, providing a more complete view of ...
  88. [88]
    Clustering predicted structures at the scale of the known protein ...
    Sep 13, 2023 · Here, we developed a structural-alignment-based clustering algorithm—Foldseek cluster—that can cluster hundreds of millions of structures ...
  89. [89]
    Pairwise Structure Alignment Tool - RCSB PDB
    This tool allows the selection of protein 3D structures for alignment. Use an existing PDB or Computed Structure Model entry ID, upload a local file with ...
  90. [90]
    PDBe < Fold < EMBL-EBI
    PDBeFold is used as a structure search engine in PDBePISA. PDBeFold queries may be launched from any web site (instructions).
  91. [91]
    RNAhub—an automated pipeline to search and align RNA ...
    An online server created for identifying and aligning homologous sequences is rMSA [10]. The server generates MSA searching sequences from the NCBI Nucleotide ...
  92. [92]
    RNAhub
    RNAhub is a computational tool developed to automate the retrieval and alignment of homologous RNA sequences, with the aim of facilitating RNA structural and ...
  93. [93]
    A protein structure alignment algorithm using TM-score rotation matrix
    TM-align is an algorithm for sequence independent protein structure comparisons. For two protein structures of unknown equivalence, TM-align first generates ...
  94. [94]
    Fast and accurate protein structure search with Foldseek - Nature
    May 8, 2023 · Structural alphabets thus reduce structure comparisons to much faster sequence alignments. Many ways to discretize the local amino acid backbone ...
  95. [95]
    Foldseek enables fast and sensitive comparisons of large structure ...
    Foldseek enables fast and sensitive comparisons of large protein structure sets, supporting monomer and multimer searches, as well as clustering.
  96. [96]
    MaxCluster - A tool for Protein Structure Comparison and Clustering
    MaxCluster is a command-line tool for the comparison of protein structures. It provides a simple interface for a large number of common structure comparison ...
  97. [97]
    SARST2 high-throughput and resource-efficient protein structure ...
    Sep 30, 2025 · Unlike TM-align, which aligns proteins solely based on structural information18, SARST2 and Foldseek incorporate amino acid sequence features ...
  98. [98]
    NYCU-10lab/sarst: An efficient protein structural alignment ... - GitHub
    SARST2 high-throughput and resource-efficient protein structure alignment against massive databases. Nature Communications (2025). https://doi.org/10.1038/ ...