
Phylogenetic tree

A phylogenetic tree is a branching diagram that illustrates the evolutionary relationships among a set of biological taxa, such as species or genes, based on inferred patterns of descent from common ancestors. These trees depict hypothesized hierarchies of ancestry, where branch points represent divergence events and branch lengths often indicate the amount of evolutionary change or time elapsed. Constructed from empirical data including morphological traits, fossil records, or molecular sequences like DNA or proteins, phylogenetic trees provide a framework for reconstructing evolutionary history and testing evolutionary hypotheses. The concept traces back to Charles Darwin's 1837 sketch of an abstract evolutionary tree and developed into formal cladistic methods in the 20th century that emphasize monophyletic groups, or clades, each comprising an ancestor and all of its descendants. Modern inference relies on computational algorithms such as maximum parsimony, which minimizes the number of evolutionary changes; maximum likelihood, which evaluates probabilistic models of sequence evolution; and Bayesian approaches, which incorporate prior probabilities. Despite their utility in reconstructing life's history, phylogenetic trees face challenges from phenomena like horizontal gene transfer, incomplete lineage sorting, and long-branch attraction, which can introduce systematic errors and incongruence across datasets and underscore the provisional nature of any single tree topology. These tools remain central to evolutionary biology, enabling predictions about trait evolution, disease transmission, and conservation priorities grounded in causal patterns of inheritance rather than mere similarity.

History

Pre-modern concepts

Early biological classifications emphasized linear hierarchies rather than branching relationships. Aristotle, in works such as Historia Animalium (c. 350 BCE), arranged organisms in the graded hierarchy later called the scala naturae, a continuous ladder running from inanimate matter and plants at the base to humans and deities at the apex, ranked by increasing complexity and faculties of the soul, with no notion of shared ancestry or temporal change. This static, teleological framework influenced subsequent thought, portraying nature as a fixed, graded continuum without evolutionary divergence. Medieval and later adaptations extended this into the Great Chain of Being, a Christianized hierarchy built on Aristotelian ideas that ordered all creation from minerals through plants, animals, humans, and angels to God in an unbroken series of forms, each occupying a unique rung without branching or descent. Such concepts prioritized fixed essences and divine order over relational histories, serving classificatory purposes but lacking diagrammatic trees or networks.

In the 18th century, naturalists occasionally employed tree-like or reticulated diagrams for limited relational depictions, though still divorced from evolutionary mechanisms. Buffon, in his Histoire naturelle (1753), illustrated a genealogical network of dog breeds, tracing putative descent from ancestral types via hybridization and environmental influence, yet framed it in terms of degeneration from original types rather than progressive branching. Similarly, Pallas (1766) proposed a tree with a compound trunk to symbolize organismal gradations, and a 1764 treatise speculated on potential branching in the chain for classificatory ends, but these remained non-evolutionary tools focused on observable affinities. These precursors highlighted affinities but did not hypothesize descent with modification, contrasting sharply with later phylogenetic models.

Development of cladistics and modern synthesis

The modern evolutionary synthesis, formulated primarily between 1936 and 1947, integrated Mendelian genetics with Darwinian natural selection, emphasizing mechanisms such as mutation, genetic drift, and selection acting on populations to explain evolutionary change. Key contributions included Theodosius Dobzhansky's Genetics and the Origin of Species (1937), which applied genetic principles to natural populations, and Ernst Mayr's Systematics and the Origin of Species (1942), which addressed speciation and the role of geographic isolation. Julian Huxley's Evolution: The Modern Synthesis (1942) synthesized these ideas into a cohesive framework, affirming universal common descent and branching phylogenetic patterns as outcomes of microevolutionary processes scaled up to macroevolution. However, while this synthesis solidified the theoretical basis for hierarchical evolutionary relationships, taxonomic practice often retained pre-synthesis elements, blending ancestry with overall similarity and permitting paraphyletic groups to reflect adaptive divergence. In the post-synthesis era, evolutionary taxonomy, championed by figures like Mayr, prioritized classifications that balanced phylogenetic history with phenotypic divergence, leading to inconsistencies in representing monophyletic clades via tree-like diagrams. This approach contrasted with emerging demands for a strictly genealogical system, in which taxa correspond exclusively to branches on a phylogenetic tree defined by shared derived characters (synapomorphies). Willi Hennig, a German entomologist, addressed these gaps through phylogenetic systematics, outlined in his 1950 German-language monograph Grundzüge einer Theorie der phylogenetischen Systematik. Hennig argued that true natural groups must be monophyletic, encompassing an ancestor and all its descendants, with branching diagrams (cladograms) serving as the primary tool for depicting sister-group relationships inferred from synapomorphies.

His method rejected paraphyletic assemblages, such as traditional "reptiles" excluding birds, insisting instead on rigorous testing via outgroup comparison and the principle of parsimony to minimize ad hoc assumptions in tree reconstruction. Hennig's ideas, developed amid fieldwork on insect disease vectors, initially faced resistance because they were published in German and because of his East German affiliation, but gained traction after the 1966 English translation of his work as Phylogenetic Systematics. Cladistics diverged from synthesis-era taxonomy by subordinating adaptive weighting to strict ancestry, challenging evolutionary taxonomists' inclusion of grade-based categories. By the 1970s, amid debates with phenetic (numerical) taxonomy, which emphasized overall similarity without explicit phylogeny, cladistics advanced through algorithmic implementations like parsimony analysis, enabling computational tree searches and formalizing phylogenetic trees as testable hypotheses of descent. This shift reinforced the modern synthesis's commitment to common descent while providing a deductive framework for classification, prioritizing empirical hypothesis testing over narrative evolutionary scenarios.

Rise of molecular phylogenetics

The foundations of molecular phylogenetics emerged in the 1960s with proposals to use protein sequences as documents of evolutionary history, emphasizing "semantides" such as polypeptide chains whose structures reflect genetic information with minimal functional constraint. In 1965, Émile Zuckerkandl and Linus Pauling advanced this approach by analyzing amino acid substitutions in proteins such as hemoglobin, positing a "molecular evolutionary clock" in which substitution rates are approximately constant over time, enabling divergence time estimates independent of the fossil record. Their work demonstrated that molecular differences could quantify phylogenetic divergence more objectively than morphological traits, though initial applications were limited by manual sequencing techniques and focused primarily on vertebrates. A pivotal advance came in the 1970s with Carl Woese's use of ribosomal RNA (rRNA) sequences, chosen for their conservation and universality across cellular life. By comparing 16S rRNA oligonucleotide catalogs from diverse prokaryotes, Woese and George Fox constructed the first universal phylogenetic tree in 1977, revealing three primary lineages (Bacteria, Archaea, initially termed archaebacteria, and Eukarya) and challenging the prevailing dichotomy of prokaryotes versus eukaryotes. This discovery, based on sequence dissimilarity metrics rather than phenotypic traits, established rRNA as a robust molecular chronometer and revealed deep evolutionary divergences undetectable by morphology, fundamentally reshaping domain-level classification. The 1980s and 1990s saw the rapid rise of DNA-based phylogenetics, driven by technological breakthroughs in DNA sequencing and analysis. Frederick Sanger's chain-termination method, developed in 1977 and automated in the mid-1980s, made DNA sequencing routine, while the polymerase chain reaction (PCR), invented by Kary Mullis in 1983 and commercialized in 1988, amplified target sequences for comparative studies.

These tools facilitated large-scale DNA-based phylogenies across taxa, supplanting protein data and morphological comparisons with vast datasets; by the 1990s, nuclear and mitochondrial genomes resolved fine-scale relationships, such as those within species complexes, and spurred refinements of maximum parsimony methods and maximum likelihood models that account for substitution rate heterogeneity. This era's data deluge underscored the power of molecular data to resolve cryptic divergences but also exposed challenges such as long-branch attraction artifacts, necessitating model-based corrections.

Definitions and Fundamental Properties

Basic diagrammatic representation

A phylogenetic tree illustrates evolutionary relationships among biological entities through a branching structure composed of nodes and branches. The terminal points, or leaves, of the tree represent the observed taxa, such as extant species or molecular sequences, while internal nodes denote hypothetical common ancestors where lineages diverge. Branches connecting these nodes symbolize the evolutionary lineages linking ancestors to descendants; unless lengths are specified, a branch indicates only a line of descent, not an amount of divergence. In its simplest form, the diagram resembles a tree with a hierarchical series of bifurcations, reflecting speciation events as points of divergence from shared ancestors. The topology, that is, the connectivity of branches and nodes, encodes the hypothesized relatedness: taxa joined at a more recent branching point share a more recent common ancestor. Orientation varies, with branches often extending from a root at the base or left toward tips at the top or right, but the branching pattern conveys monophyletic groupings (clades), each encompassing an ancestor and all its descendants. This diagrammatic form serves as a visual hypothesis of phylogeny, derived from comparative data like DNA sequences or morphological characters, rather than a literal depiction of historical events. Labels on tips specify the taxa, and node labels may indicate inferred ancestors or support values from analytical methods, though basic representations often omit quantitative branch lengths or temporal scales.
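To make the vocabulary of leaves, internal nodes, and clades concrete, the following minimal Python sketch builds the rooted tree ((A,B),C) and lists the clade (leaf set) below each internal node. The `Node` class and the helper names are hypothetical illustrations, not the API of any phylogenetics library.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A tree node: leaves carry a taxon label; internal nodes carry children."""
    label: str = ""
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def leaf_set(node: Node) -> frozenset:
    """Labels of all leaves descending from `node`, i.e. the clade it defines."""
    if node.is_leaf():
        return frozenset([node.label])
    return frozenset().union(*(leaf_set(child) for child in node.children))


def clades(node: Node) -> List[frozenset]:
    """Clade of every internal node in the subtree, root first."""
    if node.is_leaf():
        return []
    found = [leaf_set(node)]
    for child in node.children:
        found.extend(clades(child))
    return found


# Rooted tree ((A,B),C): A and B share a more recent common ancestor than either does with C.
tree = Node(children=[Node(children=[Node("A"), Node("B")]), Node("C")])
print(sorted(sorted(clade) for clade in clades(tree)))  # [['A', 'B'], ['A', 'B', 'C']]
```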

Rooted versus unrooted trees

A rooted phylogenetic tree features a designated root node that represents the most recent common ancestor of all included taxa, with branches directed away from the root to indicate the direction of evolutionary descent from ancestors to descendants. This structure imposes a temporal direction, allowing inferences about the order of divergences and the relative ages of lineages, as the root marks the point of origin for the group. In contrast, an unrooted phylogenetic tree omits a root, depicting only the topology of branching relationships among taxa without specifying evolutionary direction or an ancestral node. The distinction arises because an unrooted tree represents an equivalence class of rooted trees consistent with the same branching pattern; placing a root on any branch of an unrooted tree, terminal or internal, yields a compatible rooted version. Rooted trees are essential for reconstructing ancestral states or estimating divergence times, and the root is commonly placed using an outgroup, a taxon known to lie outside the ingroup of interest. Unrooted trees, however, prove useful when the root's position is uncertain or irrelevant, such as in initial assessments of relatedness from distance-based methods or when comparing evolutionary relationships solely by connectivity. For fully bifurcating trees with n labeled leaves, rooted versions contain 2n − 2 edges while unrooted versions contain 2n − 3: rooting inserts a root node that subdivides one edge, adding one node and one edge.
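The rooting relationship can be sketched in a few lines of Python, assuming an unrooted binary tree stored as an undirected edge list (a representation chosen only for brevity; `root_on_edge` is a hypothetical helper). Rooting on an edge inserts a new root node that subdivides it and then orients every edge away from that root, so a 4-taxon tree with 5 edges gives 5 distinct rooted trees of 6 edges each.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def root_on_edge(edges: List[Edge], target: Edge, root: str = "root") -> List[Edge]:
    """Subdivide `target` with a new root node and orient all edges away from it.

    Returns directed (parent, child) edges; works for any branch, terminal or internal.
    """
    u, v = target
    undirected = [e for e in edges if set(e) != {u, v}] + [(root, u), (root, v)]
    neighbours: Dict[str, List[str]] = {}
    for a, b in undirected:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    directed, seen, frontier = [], {root}, [root]
    while frontier:                        # breadth-first traversal from the root
        next_frontier = []
        for parent in frontier:
            for child in neighbours[parent]:
                if child not in seen:
                    seen.add(child)
                    directed.append((parent, child))
                    next_frontier.append(child)
        frontier = next_frontier
    return directed


# Unrooted 4-taxon tree AB|CD with internal nodes x and y: 2n - 3 = 5 edges.
unrooted = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"), ("D", "y")]
rooted_versions = [root_on_edge(unrooted, e) for e in unrooted]
print(len(rooted_versions), {len(r) for r in rooted_versions})  # 5 {6}
```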

Bifurcating versus multifurcating trees

Bifurcating phylogenetic trees, also termed binary trees, feature internal nodes that each split into precisely two descendant lineages, modeling evolutionary divergence as successive dichotomous events. This structure aligns with the gradual splitting of ancestral populations into two lineages and facilitates computational analysis under methods like parsimony or likelihood that assume resolved branching. For an unrooted bifurcating tree with n labeled leaves, the number of possible topologies is the double factorial (2n − 5)!!; for rooted trees it is (2n − 3)!!. Multifurcating trees, conversely, contain polytomous nodes, where one or more internal nodes branch into three or more immediate descendants, representing either truly simultaneous diversification (a hard polytomy) or an artifactual lack of resolution from limited data (a soft polytomy). Hard polytomies arise in scenarios of rapid radiation or incomplete lineage sorting, where very short internodes preclude distinguishing sequential bifurcations, as quantified by branch-length thresholds in statistical tests. Soft polytomies, prevalent in microbial phylogenies because informative sites are sparse, reflect insufficient phylogenetic signal rather than biological reality and often require additional data or resolution algorithms to distinguish them from bifurcations. The distinction affects tree inference and interpretation: bifurcating topologies imply full resolvability and are the default output of many algorithms, yet forcing true multifurcations into binary form risks inferring spurious relationships and inflating support values. Multifurcating representations better convey evidential uncertainty; when downstream analyses of dated phylogenies require binary trees, polytomy resolvers randomly resolve such nodes into bifurcations while propagating branch-length uncertainty, and multifurcations also complicate metrics such as tree-to-tree distances.

Multifurcations occur frequently in empirical datasets during tree searches, with prevalence tied to data informativeness and model complexity, underscoring the need for explicit testing, for example via likelihood ratio comparisons of multifurcating versus bifurcating alternatives.
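Random binary resolution of a polytomy can be sketched as follows, assuming trees stored as nested Python tuples with implicit zero-length new branches. The procedure repeatedly joins two randomly chosen children of a multifurcating node until only two remain; it is an illustration, not the algorithm used by any particular dating program, and it is not guaranteed to sample every possible resolution with equal probability.

```python
import random
from typing import List, Union

Tree = Union[str, tuple]  # a leaf label or a tuple of child subtrees


def resolve_polytomies(tree: Tree, rng: random.Random) -> Tree:
    """Return a fully bifurcating tree by randomly resolving every polytomy."""
    if isinstance(tree, str):
        return tree
    children = [resolve_polytomies(child, rng) for child in tree]
    while len(children) > 2:
        # Join two random children under a new internal node (implicitly zero length).
        i, j = sorted(rng.sample(range(len(children)), 2))
        merged = (children[i], children[j])
        del children[j], children[i]       # remove the higher index first
        children.append(merged)
    return tuple(children)


def leaves(tree: Tree) -> List[str]:
    return [tree] if isinstance(tree, str) else [x for child in tree for x in leaves(child)]


def is_binary(tree: Tree) -> bool:
    return isinstance(tree, str) or (len(tree) == 2 and all(is_binary(c) for c in tree))


star = ("A", "B", "C", "D", "E")           # five lineages from a single node
resolved = resolve_polytomies(star, random.Random(1))
print(resolved, is_binary(resolved), sorted(leaves(resolved)))
```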

Labeled versus unlabeled trees

A labeled phylogenetic tree assigns distinct identifiers, such as species names or molecular sequence labels, to its leaves in a bijective manner, so that every observed taxon corresponds to exactly one terminal node. This structure is fundamental to empirical phylogenetic inference, as it links branching patterns directly to specific biological entities, enabling hypothesis testing against data like genetic distances or morphological traits. Internal nodes remain unlabeled, representing hypothetical ancestors without predefined identities. In contrast, an unlabeled phylogenetic tree, often termed a tree shape or unlabeled topology, omits taxon-specific labels, treating all leaves as equivalent except for their positions in the branching hierarchy. These abstract forms emphasize the combinatorial structure of evolutionary branching, such as its balance or imbalance, independent of which particular taxa occupy the leaves. Unlabeled trees facilitate theoretical analyses, including the study of tree-shape distributions under models of speciation and extinction, where permuting labels does not alter the underlying pattern. The distinction affects enumeration and computational complexity: labeled trees outnumber unlabeled ones because labels distinguish otherwise isomorphic topologies, with the count of rooted binary labeled trees on n leaves given by the double factorial (2n − 3)!!. For example, n = 4 yields 15 labeled rooted binary topologies but only 2 unlabeled shapes: the balanced shape, exemplified by ((A,B),(C,D)), in which each child of the root has two leaves, and the fully unbalanced "caterpillar," exemplified by (((A,B),C),D). Unlabeled counts must account for symmetries, often via generating functions or recursions over equivalence classes, and grow much more slowly, aiding assessments of tree-space diversity without label-induced multiplicity. In practice, phylogenetic inference algorithms operate on labeled trees to preserve biological specificity, while unlabeled shapes inform metrics like tree balance indices or prior distributions in Bayesian models.
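The gap between labeled and unlabeled counts can be checked numerically. This is a sketch restricted to rooted binary trees: `labeled_rooted` evaluates (2n − 3)!!, while `unlabeled_rooted` (a hypothetical name) counts tree shapes with the standard recursion for the Wedderburn-Etherington numbers, splitting the leaves between the root's two subtrees and counting unordered pairs of shapes when both subtrees have the same size.

```python
from functools import lru_cache


def labeled_rooted(n: int) -> int:
    """Rooted binary trees with n labeled leaves: (2n-3)!! (taken as 1 for n <= 2)."""
    result = 1
    for k in range(3, 2 * n - 2, 2):   # multiply 3 * 5 * ... * (2n-3)
        result *= k
    return result


@lru_cache(maxsize=None)
def unlabeled_rooted(n: int) -> int:
    """Rooted binary tree shapes with n unlabeled leaves (Wedderburn-Etherington numbers)."""
    if n == 1:
        return 1
    total = 0
    # Root splits the leaves into subtrees of sizes i < n - i: shapes combine freely ...
    for i in range(1, (n + 1) // 2):
        total += unlabeled_rooted(i) * unlabeled_rooted(n - i)
    # ... but for equal sizes, swapping the two subtrees gives the same shape.
    if n % 2 == 0:
        half = unlabeled_rooted(n // 2)
        total += half * (half + 1) // 2
    return total


for n in range(2, 9):
    print(n, labeled_rooted(n), unlabeled_rooted(n))
```

For n = 4 the loop prints `4 15 2`, matching the 15 labeled topologies and 2 shapes described above; by n = 8 the counts are 135135 versus 23.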

Enumeration and mathematical properties

The enumeration of phylogenetic trees quantifies the number of distinct tree topologies possible for a given set of taxa, which is fundamental to understanding the combinatorial complexity of phylogenetic inference. For binary (fully bifurcating) trees, in which internal nodes have exactly three incident branches in unrooted representations or exactly two child subtrees in rooted ones, closed-form expressions exist when the leaves are labeled with distinct taxa. Unlabeled trees, which disregard taxon identities, are enumerated differently but are less relevant to empirical inference, where taxa are identifiable.

[Figure: Number of unrooted binary phylogenetic trees as a function of the number of leaves]

The number of unrooted binary phylogenetic trees with n labeled leaves, n ≥ 3, is given by the double factorial (2n − 5)!!, equivalent to the product $\prod_{i=3}^{n}(2i-5)$ or to $(2n-5)!/\bigl(2^{n-3}(n-3)!\bigr)$. These trees are connected acyclic graphs with n leaves of degree 1 and n − 2 internal nodes of degree 3, totaling 2n − 2 nodes and 2n − 3 edges. The formula arises recursively: the number for n taxa equals the number for n − 1 taxa multiplied by 2n − 5, because an unrooted binary tree on n − 1 leaves has 2n − 5 edges, any of which can be subdivided to attach the new leaf. This count excludes multifurcating trees, whose internal nodes can have degree greater than 3; their enumeration lacks a simple closed form and depends on the specified degrees. For rooted binary phylogenetic trees with n labeled leaves, n ≥ 2, the number is $(2n-3)!! = (2n-3)!/\bigl(2^{n-2}(n-2)!\bigr)$, where the root has degree 2 and the other internal nodes degree 3. Each unrooted tree corresponds to exactly 2n − 3 rooted trees, obtained by placing the root on any of its 2n − 3 edges, which gives the relationship (2n − 3)!! = (2n − 3) · (2n − 5)!! between the counts. Rooted binary trees have 2n − 1 nodes and 2n − 2 edges, one more of each than their unrooted counterparts because the root subdivides an edge, and the rooting imposes directionality from ancestor to descendants. The super-exponential growth, approximately $\sqrt{2}\,\bigl((2n-4)/e\bigr)^{n-2}$ for unrooted trees by Stirling's approximation, renders exhaustive enumeration infeasible for moderate n: there are 105 unrooted trees for n = 6 but roughly $2.2 \times 10^{20}$ for n = 20, motivating heuristic search algorithms in practice.
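The recursion behind these counts can be checked by brute force. In this minimal sketch (function names are hypothetical), `rooted_trees` builds every rooted binary topology by stepwise addition: each new leaf is attached above every existing node, that is, on each of the 2k − 1 branches of a k-leaf rooted tree counting a notional branch above the root, which reproduces the factor 2n − 3 in (2n − 3)!!. Because a rooted tree on n leaves corresponds to an unrooted tree on n + 1 leaves (the extra leaf acting as an outgroup), the 105 rooted trees on five taxa match the 105 unrooted trees on six.

```python
from typing import Iterator, List, Union

Tree = Union[str, tuple]  # a leaf label or a (left, right) pair


def double_factorial(k: int) -> int:
    """k!! = k * (k - 2) * ...; defined as 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def n_unrooted(n: int) -> int:  # n >= 3
    return double_factorial(2 * n - 5)


def n_rooted(n: int) -> int:    # n >= 2
    return double_factorial(2 * n - 3)


def _attach(node: Tree, new: str) -> Iterator[Tree]:
    """Attach `new` on the branch above `node`, then recursively above each descendant."""
    yield (node, new)
    if isinstance(node, tuple):
        left, right = node
        for sub in _attach(left, new):
            yield (sub, right)
        for sub in _attach(right, new):
            yield (left, sub)


def rooted_trees(leaves: List[str]) -> Iterator[Tree]:
    """Enumerate every rooted binary topology on `leaves` (len >= 2) by stepwise addition."""
    if len(leaves) == 2:
        yield (leaves[0], leaves[1])
        return
    for tree in rooted_trees(leaves[:-1]):
        yield from _attach(tree, leaves[-1])


print(n_unrooted(6), n_unrooted(20))            # 105 221643095476699771875
print(len(list(rooted_trees(list("ABCDE")))))   # 105
```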

Types and Variants

Cladograms

A cladogram is a diagram used in cladistics that depicts the branching pattern of evolutionary relationships among taxa, illustrating the sequence of divergence events based exclusively on shared derived characteristics, known as synapomorphies, without scaling branch lengths to reflect the extent of evolutionary change or elapsed time. The structure emphasizes topology, the relative order and nesting of clades, over quantitative metrics, with branches typically drawn at arbitrary or equal lengths to prioritize clarity in hierarchical grouping. This unscaled representation distinguishes cladograms from phylograms, whose branch lengths are proportional to the inferred amount of change, such as substitutions per site. In a cladogram, internal nodes represent hypothetical common ancestors, while terminal nodes (leaves) denote observed taxa, which may be extant species, genera, or fossil representatives. The diagram enforces monophyly, ensuring that each clade comprises an ancestor and all its descendants, and is derived from analyses that minimize homoplasy (similarity arising from convergent or parallel evolution, or from reversals) through criteria like maximum parsimony, under which the preferred tree requires the fewest evolutionary steps to explain the character distributions. Rooted cladograms incorporate an outgroup to polarize character states, establishing the direction of evolutionary change by treating the outgroup as retaining the ancestral condition relative to the ingroup. Unrooted cladograms, conversely, omit this polarity, showing only relative branching without implying a root, and are often used in exploratory analyses of molecular data. Cladograms are constructed from discrete morphological or molecular characters, scored as binary (present/absent) or multistate, with algorithms evaluating thousands of possible topologies to select those best supported by congruence among characters. In parsimony-based methods, for instance, character-fit indices quantify how well traits map onto the tree, disfavoring topologies that require excessive homoplasy such as reversals.

Support for clades is assessed via metrics like the decay index (Bremer support), which measures the number of additional steps required before a group is no longer recovered as monophyletic, or bootstrap resampling, which tests robustness by resampling characters with replacement to simulate data variability. These diagrams thus serve as hypotheses of phylogeny, subject to falsification by new evidence, underscoring cladistics' emphasis on testable, evidence-driven classifications over phenetic similarity alone.
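Bootstrap support rests on a simple resampling step: characters (columns) are drawn with replacement to build pseudo-replicate matrices of the original width, a tree is inferred from each, and a clade's support is the fraction of replicate trees containing it. The sketch below covers only the resampling and the final proportion; the matrix format (a dict of equal-length strings), the data, and the function names are hypothetical, and the tree-building step is left to whichever method is in use.

```python
import random
from typing import Dict, List, Set


def bootstrap_replicate(matrix: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Draw characters (columns) with replacement, keeping the original matrix width."""
    n_chars = len(next(iter(matrix.values())))
    columns = [rng.randrange(n_chars) for _ in range(n_chars)]
    return {taxon: "".join(states[c] for c in columns) for taxon, states in matrix.items()}


def clade_support(replicate_clades: List[Set[frozenset]], clade: frozenset) -> float:
    """Fraction of replicate trees (each summarized as its set of clades) containing `clade`."""
    return sum(clade in clades for clades in replicate_clades) / len(replicate_clades)


# Hypothetical binary (0/1) characters for four taxa.
matrix = {"A": "110010", "B": "110011", "C": "001101", "D": "001100"}
rng = random.Random(42)
replicates = [bootstrap_replicate(matrix, rng) for _ in range(100)]
print(replicates[0])

# Each replicate would be analysed with the chosen tree-building method; given the clades
# recovered from, say, three replicate trees, support is a simple proportion.
print(clade_support([{frozenset("AB")}, {frozenset("AB")}, {frozenset("AC")}], frozenset("AB")))
```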

Phylograms and ultrametric trees

A phylogram is a phylogenetic tree, usually drawn rooted, in which branch lengths are scaled to represent the amount of evolutionary change, typically measured as genetic distance or the number of substitutions per site. Unlike cladograms, where branch lengths are arbitrary and convey only topological relationships, phylograms incorporate quantitative data to illustrate relative amounts of evolutionary change along lineages, with longer branches indicating greater divergence. Because branch lengths are additive, the path length between any two leaves estimates the evolutionary distance between them (exactly, when the data are perfectly tree-like), allowing divergence magnitudes to be inferred from molecular or morphological data. Ultrametric trees are a constrained subclass of phylograms in which all leaves (tips) are equidistant from the root, implying a constant rate of evolution across lineages as assumed by the molecular clock hypothesis. In an ultrametric tree, branch lengths can be calibrated to time rather than raw divergence, so that the total distance from the root to any tip reflects chronological divergence under uniform evolutionary rates, an assumption often checked with relative rate tests. This structure facilitates dating of divergence events when fossil calibrations or clock-like rates are invoked, though violations of the clock assumption, such as rate heterogeneity among lineages, can distort ultrametric representations and necessitate relaxed clock models in modern analyses. Phylograms and ultrametric trees can be constructed with distance-based methods like neighbor-joining or least-squares optimization, in which input distance matrices are transformed into tree topologies with scaled edges; for ultrametric trees, additional constraints enforce tip equidistance, often after testing the pairwise distances for ultrametricity. These representations are particularly useful in molecular evolution for visualizing substitution patterns and temporal dynamics, but their accuracy depends on the additivity of the underlying distances and the validity of rate-constancy assumptions.
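Ultrametricity of a distance matrix can be checked with the three-point condition: a matrix fits an ultrametric tree exactly when, for every triple of taxa, the two largest of the three pairwise distances are equal. A minimal Python sketch follows; the tolerance and the two toy matrices (built by hand from a clock-like tree ((A,B),C),D and from the same tree with a faster-evolving lineage B) are illustrative assumptions.

```python
from itertools import combinations
from typing import List


def is_ultrametric(d: List[List[float]], tol: float = 1e-9) -> bool:
    """Three-point condition: in every triple, the two largest distances are equal."""
    for i, j, k in combinations(range(len(d)), 3):
        _, middle, largest = sorted((d[i][j], d[i][k], d[j][k]))
        if abs(largest - middle) > tol:
            return False
    return True


# Tips A, B, C, D on a clock-like tree: A and B split at height 1, C joins at 2, D at 3.
clocklike = [
    [0, 2, 4, 6],
    [2, 0, 4, 6],
    [4, 4, 0, 6],
    [6, 6, 6, 0],
]
# Same topology, but the branch to B is twice as long (B evolved faster).
rate_varying = [
    [0, 3, 4, 6],
    [3, 0, 5, 7],
    [4, 5, 0, 6],
    [6, 7, 6, 0],
]
print(is_ultrametric(clocklike), is_ultrametric(rate_varying))  # True False
```

The second matrix is still perfectly tree-like (additive), so it would be represented exactly by a phylogram, just not by an ultrametric tree.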

Chronograms

A chronogram is a dated phylogenetic tree in which branch lengths are scaled to represent time units, such as millions of years, rather than the amount of genetic or morphological change. This scaling enables direct estimation of divergence times between lineages, distinguishing chronograms from phylograms, whose branch lengths correspond to evolutionary metrics like the number of substitutions per site. Chronograms are typically ultrametric when all terminal taxa are sampled contemporaneously, meaning that the total branch length from the root to any tip equals the age of the root. Construction of a chronogram begins with inferring a topology from molecular or morphological data, often yielding an initial phylogram, which is then calibrated to time using external constraints such as fossil records or geological events. Strict molecular clock models assume constant evolutionary rates across branches, but these are rarely realistic; relaxed clock models instead accommodate rate variation while still enforcing temporal scaling. Bayesian frameworks, implemented in software such as BEAST, combine phylogenetic inference with divergence-time estimation by incorporating prior distributions on node ages derived from calibrations, typically yielding posterior distributions of chronograms that account for uncertainty in rates and calibrations. Chronograms support analyses requiring temporal context, such as ancestral state reconstruction under time-heterogeneous models, macroevolutionary rate comparisons, and historical biogeography, though model-fit assessments may favor phylograms in trait-evolution scenarios where the amount of substitution is a better proxy for opportunity for change. Errors in calibration or rate assumptions can propagate, and methods such as relative node dating have been proposed to correct chronograms by combining multiple fossil constraints with phylogenetic signal. In practice, chronograms often show confidence intervals on node ages as bars or shaded regions to convey estimation uncertainty.
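The simplest possible conversion from phylogram to chronogram, a strict clock with a single calibration, can be sketched as follows. It assumes an ultrametric input tree in a hypothetical (name, branch length, children) tuple format and one calibrated node; it illustrates the scaling idea only and is not a substitute for the relaxed-clock Bayesian dating described above.

```python
from typing import Dict, List, Tuple

# A node is (name, branch length to parent in substitutions/site, children).
Node = Tuple[str, float, List["Node"]]


def node_heights(node: Node, out: Dict[str, float]) -> float:
    """Store each node's height (path length down to its tips) in `out` and return it.

    Assumes an ultrametric tree, so the path through the first child is representative.
    """
    name, _, children = node
    height = 0.0
    for index, child in enumerate(children):
        child_height = node_heights(child, out) + child[1]
        if index == 0:
            height = child_height
    out[name] = height
    return height


def strict_clock_ages(tree: Node, calibrated: str, age: float) -> Dict[str, float]:
    """Convert heights to ages using the single rate implied by one calibrated node."""
    heights: Dict[str, float] = {}
    node_heights(tree, heights)
    rate = heights[calibrated] / age            # substitutions/site per time unit
    return {name: h / rate for name, h in heights.items()}


# Ultrametric phylogram ((A:0.01,B:0.01)AB:0.02,C:0.03)root; node AB calibrated at 10 Myr.
tree: Node = ("root", 0.0, [
    ("AB", 0.02, [("A", 0.01, []), ("B", 0.01, [])]),
    ("C", 0.03, []),
])
ages = strict_clock_ages(tree, calibrated="AB", age=10.0)
print({name: round(a, 6) for name, a in ages.items()})
```

With the implied rate of 0.001 substitutions/site per Myr, node AB sits at 10 Myr and the root at 30 Myr, while all tips are at age 0.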

Dendrograms and other hierarchical representations

Dendrograms represent hierarchical arrangements of taxa produced by clustering algorithms applied to distance or similarity matrices in phylogenetic analysis. These diagrams consist of leaves corresponding to observed taxa and internal nodes indicating successive merges of clusters, with node heights or branch lengths often reflecting the distance at which clusters are joined. Unlike cladograms, which rely on qualitative shared characters without quantitative scaling, dendrograms incorporate metric information from pairwise comparisons, such as genetic or morphological distances. A prominent method for generating dendrograms is the unweighted pair group method with arithmetic mean (UPGMA), an agglomerative clustering technique that produces rooted ultrametric trees. In ultrametric dendrograms, all terminal nodes (tips) are equidistant from the root, implying a strict molecular clock in which evolutionary rates remain constant across lineages; in UPGMA, each merge places the new node at half the average distance between the two clusters joined, so every root-to-leaf path has the same length. This assumption holds for data satisfying ultrametric conditions but fails under rate heterogeneity, potentially distorting true phylogenetic relationships by forcing unequal rates into a uniform framework. For instance, UPGMA applied to non-clocklike data may artifactually cluster fast-evolving taxa with distant relatives. Dendrograms differ from phylograms in their assumptions and interpretation: phylograms scale branches to evolutionary change (e.g., substitutions per site) without requiring tip synchrony, allowing variable rates, whereas UPGMA dendrograms enforce ultrametricity, prioritizing hierarchical similarity over additive path lengths. Non-ultrametric dendrograms can emerge from other algorithms, such as neighbor-joining, which yields additive trees that can be drawn as rooted hierarchies but are better suited to unrooted representations when evolutionary rates vary. Dendrograms thus serve exploratory roles in phenetics, emphasizing overall similarity rather than strictly homologous ancestry, though they risk conflating similarity with shared descent absent corroboration from character-based methods.
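A compact UPGMA sketch in Python follows, assuming a symmetric distance matrix and taxon names that contain no parentheses or commas. Clusters are named by their Newick-style nesting (without the trailing semicolon), and each merged node is placed at half the joining distance, which is what makes the result ultrametric. The function name and return format are illustrative choices, not a library API.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def upgma(labels: List[str], d: List[List[float]]) -> Tuple[str, Dict[str, float]]:
    """Cluster taxa by UPGMA; return the root cluster (as a nested string) and node heights."""
    size = {name: 1 for name in labels}            # leaves under each active cluster
    height = {name: 0.0 for name in labels}        # tips sit at height 0
    dist = {frozenset((labels[i], labels[j])): d[i][j]
            for i, j in combinations(range(len(labels)), 2)}
    while len(size) > 1:
        pair = min(dist, key=dist.get)             # closest pair of active clusters
        a, b = sorted(pair)
        new = f"({a},{b})"
        height[new] = dist[pair] / 2               # clock assumption: split at half the distance
        for c in size:
            if c not in pair:
                # Size-weighted mean = average over all leaf-to-leaf distances.
                dist[frozenset((new, c))] = (size[a] * dist[frozenset((a, c))]
                                             + size[b] * dist[frozenset((b, c))]) / (size[a] + size[b])
        size[new] = size.pop(a) + size.pop(b)
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
    root = next(iter(size))
    return root, height


distances = [
    [0, 2, 4, 6],
    [2, 0, 4, 6],
    [4, 4, 0, 6],
    [6, 6, 6, 0],
]
root, height = upgma(["A", "B", "C", "D"], distances)
print(root, height[root])  # (((A,B),C),D) 3.0
```

On clock-like input such as this matrix, UPGMA recovers the generating tree; on rate-variable data the same code would still return an ultrametric tree, which is exactly the distortion described above.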
Other hierarchical representations in phylogenetics extend beyond binary dendrograms to include multifurcating (polytomous) structures that represent uncertainty as soft polytomies, and visualizations such as radial or circular layouts for dense taxon sets. Textual formats, such as Newick notation, encode tree topologies hierarchically (e.g., ((A,B),C); for a rooted three-taxon tree), facilitating computational interchange while preserving nesting. These variants accommodate large-scale analyses, as in supertree methods that aggregate multiple trees into consensus hierarchies, but they require caution against overinterpreting clusters as clades without statistical support such as bootstrapping, which assesses node reliability by resampling characters and recomputing the distances.
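To show how Newick nesting mirrors the tree, here is a minimal topology-only writer and reader in Python. It is a sketch under strong assumptions (no branch lengths, internal labels, quoted names, comments, or whitespace), and the function names are hypothetical; real data are better handled by an established library such as Biopython's Bio.Phylo or DendroPy.

```python
from typing import Union

Tree = Union[str, tuple]  # a leaf label or a tuple of child subtrees (any arity)


def to_newick(tree: Tree) -> str:
    """Serialize a topology to Newick; branch lengths and internal labels are omitted."""
    def render(node: Tree) -> str:
        if isinstance(node, str):
            return node
        return "(" + ",".join(render(child) for child in node) + ")"
    return render(tree) + ";"


def from_newick(text: str) -> Tree:
    """Parse topology-only Newick without lengths, quoted labels, comments or whitespace."""
    text = text.strip().rstrip(";")
    pos = 0

    def parse() -> Tree:
        nonlocal pos
        if text[pos] == "(":
            pos += 1                          # consume "("
            children = [parse()]
            while text[pos] == ",":
                pos += 1                      # consume ","
                children.append(parse())
            pos += 1                          # consume ")"
            return tuple(children)
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos]

    return parse()


print(to_newick((("A", "B"), "C")))   # ((A,B),C);
print(from_newick("((A,B),C,D);"))    # (('A', 'B'), 'C', 'D')
```

The second example parses a tree whose root is a polytomy with three children, showing that the nesting handles multifurcations as naturally as bifurcations.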

Specialized diagrams (spindle, coral of life)


Spindle diagrams, also known as romerograms, depict evolutionary diversification and extinction patterns by plotting taxonomic diversity on the horizontal axis against geological time on the vertical axis, forming spindle-like shapes that widen during adaptive radiations and narrow during mass extinctions. These diagrams originated from the work of Alfred Romer in vertebrate paleontology and are particularly useful for illustrating macroevolutionary trends in the fossil record, such as the radiation of hoofed mammals during the Cenozoic era. Unlike bifurcating phylogenetic trees, spindle diagrams emphasize temporal changes in abundance rather than strict ancestor-descendant relationships, allowing the representation of paraphyletic groups and evolutionary grades.
The width of the spindle at any time slice approximates the number of families or genera, providing a visual proxy for biodiversity dynamics; for instance, vertebrate spindle diagrams show diversity peaks that correlate with ecological opportunities following extinction events. This format facilitates integration of paleontological data with phylogenetic hypotheses, though it sacrifices precise branching topologies for broader temporal and diversity insights.
The coral of life extends the phylogenetic tree concept to account for reticulate evolution, particularly horizontal gene transfer (HGT) in prokaryotes, in which lineages anastomose like coral branches rather than strictly diverging. The image was first invoked by Charles Darwin in 1837 to describe how extinct basal branches support living tips; the modern usage, popularized by W. Ford Doolittle, highlights that prokaryotic genomes often derive from multiple sources, rendering a single tree inadequate for deep phylogeny. In this model, vertical inheritance dominates in eukaryotes but is overlaid with HGT networks in Bacteria and Archaea, forming a "web" or "coral" structure whose dead basal segments are obscured by time.
Empirical evidence from genomic studies supports the coral model, as analyses of thousands of prokaryotic genomes reveal conflicting gene trees due to HGT events, estimated at 10-20% of gene histories in some lineages; these challenge single-tree reconstructions, although tree-like signals persist for informational genes. This representation underscores causal realism in phylogenetics, prioritizing observed mechanisms over idealized bifurcations, and informs interpretations of the history of life, in which microbial mergers shaped diversification.

Construction Methods

Data sources and preprocessing

Primary data sources for phylogenetic tree construction consist of molecular sequences, such as DNA, RNA, or amino acid alignments from homologous genes or genomic loci across taxa, which provide quantifiable variation for inferring evolutionary relationships. DNA sequences, particularly from mitochondrial, nuclear, or chloroplast genomes, predominate because of their abundance and their ability to resolve deep divergences when sufficient loci are sampled. Protein sequences supplement DNA data in cases of high saturation or compositional bias in nucleotides, because amino acid sequences typically saturate more slowly. These data are typically retrieved from public repositories like GenBank or the European Nucleotide Archive, which as of 2023 house over 10 million nucleotide sequences suitable for phylogenetics. Morphological data, derived from discrete phenotypic traits (e.g., anatomical structures or meristic counts), serve as an alternative or complementary source, especially for fossil-inclusive trees where molecular data are unavailable; however, such characters are prone to homoplasy and typically yield lower resolution than molecular datasets. In phylogenomics, whole-genome or transcriptome data from high-throughput sequencing expand the scale, incorporating thousands of loci to reduce stochastic error, though they demand far more computational resources than single-gene analyses. Preprocessing begins with quality control of raw sequences, including trimming adapters, filtering low-quality reads (e.g., Phred scores below 20), and assembling contigs from short-read data where needed, to ensure accurate homology assessment. Multiple sequence alignment follows, aligning homologous positions with progressive (e.g., Clustal Omega) or iterative (e.g., MAFFT) methods, which as of 2024 achieve over 95% accuracy for closely related sequences but require manual curation for divergent ones.
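Quality filtering of reads can be sketched in a few lines, assuming FASTQ records already parsed into (identifier, sequence, quality) tuples with Phred+33 quality encoding. The 3'-trimming rule, the threshold of 20, and the 30-base minimum length are illustrative assumptions; production pipelines normally rely on dedicated tools such as fastp or Trimmomatic.

```python
from typing import Iterable, List, Tuple

Read = Tuple[str, str, str]  # (identifier, sequence, quality string)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string; offset 33 is the Sanger/Illumina 1.8+ encoding."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq: str, qual: str, min_q: int = 20) -> Tuple[str, str]:
    """Remove trailing bases whose Phred score is below `min_q`."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def filter_reads(reads: Iterable[Read], min_q: int = 20, min_len: int = 30) -> List[Read]:
    """Trim each read, then keep it only if it stays long enough and its mean quality passes."""
    kept = []
    for read_id, seq, qual in reads:
        seq, qual = trim_3prime(seq, qual, min_q)
        if not seq or len(seq) < min_len:
            continue
        scores = phred_scores(qual)
        if sum(scores) / len(scores) >= min_q:
            kept.append((read_id, seq, qual))
    return kept


# 'I' encodes Q40, '#' Q2 and '+' Q10 under the Phred+33 offset.
reads = [
    ("r1", "ACGT" * 10, "I" * 36 + "#" * 4),   # good read with a poor 3' tail
    ("r2", "ACGT" * 10, "+" * 40),             # uniformly poor read
]
print([(read_id, len(seq)) for read_id, seq, _ in filter_reads(reads)])  # [('r1', 36)]
```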
Alignment refinement involves masking or trimming ambiguously aligned regions (e.g., with trimAl or Gblocks) to exclude noise from indels or hypervariable sites, reducing systematic bias in parsimony or likelihood calculations; studies report that this step can improve accuracy by 10-20% in simulated datasets. For multi-locus datasets, partitioning by gene or codon position occurs, alongside outgroup selection to root the tree and imputation or exclusion of missing data (which can affect up to 30% of cells in phylogenomic matrices) to preserve signal without introducing artifacts. Evolutionary models are tested preliminarily (e.g., via jModelTest), though full optimization is deferred to the tree-construction phase. These steps, often automated in workflows that feed inference programs such as IQ-TREE or RAxML-NG, minimize preprocessing artifacts that could propagate errors into downstream inference.
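Column trimming can be illustrated with a simple gap-fraction filter in Python. It is in the spirit of, but does not reproduce, the gap thresholds available in trimAl; treating "?" as missing data, the 0.5 cut-off, and the function name are assumptions made for the example.

```python
from typing import Dict


def trim_gappy_columns(alignment: Dict[str, str], max_gap_fraction: float = 0.5) -> Dict[str, str]:
    """Drop alignment columns in which more than `max_gap_fraction` of taxa have '-' or '?'."""
    seqs = list(alignment.values())
    n_taxa, n_cols = len(seqs), len(seqs[0])
    keep = [
        col for col in range(n_cols)
        if sum(seq[col] in "-?" for seq in seqs) / n_taxa <= max_gap_fraction
    ]
    return {taxon: "".join(seq[c] for c in keep) for taxon, seq in alignment.items()}


aln = {
    "A": "ATG-CCA--T",
    "B": "ATGACCA--T",
    "C": "ATG-CCAG-T",
    "D": "ATG-CTA--T",
}
print(trim_gappy_columns(aln))
# {'A': 'ATGCCAT', 'B': 'ATGCCAT', 'C': 'ATGCCAT', 'D': 'ATGCTAT'}
```

The three removed columns are each gapped in at least three of the four taxa; a single-taxon insertion such as B's "A" at column 4 is discarded along with them.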

Distance-based approaches

Distance-based approaches construct phylogenetic trees by first deriving a matrix of pairwise evolutionary distances from molecular sequences or other traits, then applying clustering or optimization algorithms to recover a tree topology consistent with these distances under assumptions of additivity or minimality. These methods convert raw data, such as aligned DNA or protein sequences, into corrected distances using substitution models like Jukes-Cantor (1969), which accounts for unobserved multiple hits, or more complex ones like the general time-reversible model. The resulting distance matrix serves as input for tree building, enabling rapid inference at the cost of discarding the site-specific pattern information retained by character-based alternatives. A foundational algorithm is the unweighted pair group method with arithmetic mean (UPGMA), developed by Sokal and Michener in 1958, which employs agglomerative hierarchical clustering. It iteratively merges the two clusters with the smallest average inter-cluster distance, updating distances to the merged cluster as size-weighted arithmetic means, and assumes a strict molecular clock, yielding an ultrametric tree whose terminal nodes lie at equal depths from the root. This assumption holds only if evolutionary rates are constant across lineages, limiting UPGMA's accuracy in heterogeneous datasets; violations such as varying substitution rates can produce incorrect topologies by forcing equidistant leaf placements. Despite this, UPGMA remains computationally simple, running in O(n^3) time for n taxa in a straightforward implementation (O(n^2) with more careful bookkeeping), and is useful for preliminary analyses or clock-like data such as some microbial phylogenies. The neighbor-joining (NJ) algorithm, introduced by Saitou and Nei in 1987, overcomes UPGMA's clock assumption by constructing additive trees that minimize estimated total branch length without enforcing ultrametricity.
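The Jukes-Cantor correction described above converts an observed proportion of differing sites p into an estimated number of substitutions per site, d = −(3/4) ln(1 − 4p/3). A minimal sketch follows, assuming nucleotide alignments and pairwise deletion of gaps and ambiguity codes (a choice made for this example); the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List


def p_distance(s1: str, s2: str) -> float:
    """Proportion of differing sites, ignoring columns with a gap or ambiguity in either."""
    pairs = [(a, b) for a, b in zip(s1.upper(), s2.upper()) if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """JC69 correction d = -(3/4) ln(1 - 4p/3); infinite once p reaches 3/4 (saturation)."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def jc_matrix(alignment: Dict[str, str]) -> List[List[float]]:
    """Symmetric matrix of JC69 distances between all pairs of aligned sequences."""
    names = list(alignment)
    d = [[0.0] * len(names) for _ in names]
    for i, j in combinations(range(len(names)), 2):
        p = p_distance(alignment[names[i]], alignment[names[j]])
        d[i][j] = d[j][i] = jukes_cantor(p)
    return d


print(round(jukes_cantor(0.1), 4))  # 0.1073: slightly more than the 0.1 observed
```

The correction grows rapidly as p approaches 3/4, which is why saturated sequences yield unreliable or infinite distances.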
NJ proceeds iteratively. At each step it selects the pair of "neighbors" (taxa or clusters) i and j that minimizes the criterion Q_ij = (n-2)d_ij - sum_k d_ik - sum_k d_jk. Here n is the current number of taxa or clusters and d_ij is the pairwise distance. NJ then joins the pair into a new node, estimates the two new branch lengths from the distance matrix, and updates the matrix with distances from the new node to all remaining nodes. The result is an unrooted tree suitable for rate-variable data. Empirical studies show NJ recovering correct topologies under moderate branch-length variation where UPGMA fails. The standard implementation runs in O(n^3) time, and heuristic variants approach O(n^2) in practice. NJ's efficiency has made it a staple for large-scale analyses, as in many early molecular phylogenies. It remains, however, a greedy heuristic and is sensitive to distance-estimation errors caused by saturation or compositional bias.

Other distance-based variants include minimum evolution (ME) and least-squares methods. ME explicitly searches for the tree that minimizes the sum of branch lengths estimated from corrected distances, often starting from an NJ tree and refining it by local optimization. Least-squares methods fit tree path lengths to the observed distances. These approaches excel in speed, handling datasets with thousands of taxa faster than likelihood-based methods. Their disadvantages include:

- information loss during matrix conversion;
- propagation of distance inaccuracies (e.g., undercorrection for multiple substitutions);
- inability to model complex processes such as site-specific rates except through prior averaging.

Empirical benchmarks indicate that distance methods perform robustly for closely related taxa but degrade with deep divergences or long-branch effects. This has prompted hybrid uses combined with bootstrapping for support assessment.
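A compact Python sketch of neighbor joining follows. It uses the Q criterion together with the standard branch-length and distance-update formulas: δ(i,u) = d_ij/2 + (r_i - r_j)/(2(n-2)), with r_i = sum_k d_ik, and d(u,k) = (d_ik + d_jk - d_ij)/2. The function name and the five-taxon test matrix are illustrative. Ties in Q are broken by label order, and the faster variants mentioned above are not attempted.

```python
def neighbor_joining(dist):
    """Saitou-Nei (1987) neighbor joining.

    dist: {(x, y): distance} covering every unordered pair of taxa (at least 3).
    Returns an unrooted tree as Newick text (the last join is a three-way node)."""
    d = {}
    for (x, y), value in dist.items():
        d[(x, y)] = d[(y, x)] = value
    nodes = sorted({x for pair in dist for x in pair})
    while len(nodes) > 3:
        n = len(nodes)
        r = {i: sum(d[(i, k)] for k in nodes if k != i) for i in nodes}  # row sums
        # Q_ij = (n - 2) d_ij - sum_k d_ik - sum_k d_jk; join the pair with the smallest Q
        i, j = min(((x, y) for idx, x in enumerate(nodes) for y in nodes[idx + 1:]),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        li = d[(i, j)] / 2 + (r[i] - r[j]) / (2 * (n - 2))  # branch from i to new node
        lj = d[(i, j)] - li                                 # branch from j to new node
        u = f"({i}:{li:g},{j}:{lj:g})"
        for k in nodes:
            if k not in (i, j):
                d[(u, k)] = d[(k, u)] = (d[(i, k)] + d[(j, k)] - d[(i, j)]) / 2
        nodes = sorted([k for k in nodes if k not in (i, j)] + [u])
    a, b, c = nodes
    # three nodes left: solve the star's three branch lengths exactly
    la = (d[(a, b)] + d[(a, c)] - d[(b, c)]) / 2
    return f"({a}:{la:g},{b}:{d[(a, b)] - la:g},{c}:{d[(a, c)] - la:g});"


names = "abcde"
matrix = [[0, 5, 9, 9, 8],
          [5, 0, 10, 10, 9],
          [9, 10, 0, 8, 7],
          [9, 10, 8, 0, 3],
          [8, 9, 7, 3, 0]]
dist = {(names[i], names[j]): matrix[i][j] for i in range(5) for j in range(i + 1, 5)}
print(neighbor_joining(dist))  # (((a:2,b:3):3,c:4):2,d:2,e:1);
```

The output is unrooted. The three-way node at the top level is only an artifact of writing the tree in Newick, and an outgroup would be needed to root it.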

Discrete character-based methods

Discrete character-based methods in phylogenetic reconstruction use discrete traits to evaluate evolutionary relationships among taxa directly, without first summarizing the data into pairwise distances. Such traits include morphological features, binary presence-absence data, and molecular sequence sites (nucleotides or amino acids treated as discrete states). These approaches retain the full informational content of individual characters, which allows shared derived states (synapomorphies) and potential homoplasies to be assessed. Distance-based methods, by contrast, aggregate differences across all sites. Common data include aligned DNA sequences, in which each position is a character with four possible states (A, C, G, T), and morphological matrices with multistate codings.

The predominant technique within this framework is maximum parsimony (MP). MP identifies the tree that minimizes the total number of character-state changes (evolutionary steps) needed to account for the observed data across all characters. Under Fitch parsimony, unordered characters assign equal cost to any state transition, whereas ordered (Wagner) characters impose a stepwise cost reflecting gradual transformation. The Fitch algorithm efficiently computes the parsimony score for unordered characters by propagating candidate ancestral state sets through intersections and unions from the tips toward the root.

Tree search involves evaluating candidate topologies. It often starts with stepwise addition, in which taxa are added sequentially to a growing tree, followed by branch-swapping rearrangements such as nearest-neighbor interchange (NNI) and subtree pruning and regrafting (SPR) to escape local optima. Exact methods guarantee optimality but scale poorly, because the problem is NP-hard and the number of unrooted binary trees grows as (2n-5)!! for n leaves. They include exhaustive enumeration for small datasets (feasible up to roughly 10 taxa) and branch-and-bound pruning of suboptimal partial trees.
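The Fitch pass can be sketched in a few lines of Python. This minimal illustration handles unordered characters on a rooted binary tree given as nested tuples, a hypothetical representation. Because the score does not depend on where the root is placed, the same code also serves for unrooted trees rooted arbitrarily. Tree search itself (stepwise addition, NNI, SPR) is not shown.

```python
def fitch(tree, states):
    """Fitch (1971) small-parsimony pass for one unordered character.

    tree: rooted binary tree as nested 2-tuples of taxon names, e.g. (("A", "B"), ("C", "D")).
    states: dict taxon -> observed state.
    Returns (candidate state set at this node, minimum number of changes below it)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    set_l, cost_l = fitch(left, states)
    set_r, cost_r = fitch(right, states)
    common = set_l & set_r
    if common:                       # children can agree: no change needed here
        return common, cost_l + cost_r
    return set_l | set_r, cost_l + cost_r + 1  # disagreement: charge one change


def parsimony_length(tree, alignment):
    """Tree length = sum of Fitch scores over all columns (all characters weighted equally)."""
    taxa = list(alignment)
    n_cols = len(alignment[taxa[0]])
    return sum(fitch(tree, {t: alignment[t][c] for t in taxa})[1] for c in range(n_cols))


aln = {"A": "AAG", "B": "AAG", "C": "GTA", "D": "GTA"}
print(parsimony_length((("A", "B"), ("C", "D")), aln))  # 3
print(parsimony_length((("A", "C"), ("B", "D")), aln))  # 6
```

The first tree groups taxa that share states at every column and needs one change per site. The second tree splits them and needs two changes per site.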
An alternative, less frequently applied approach is the compatibility method (or clique analysis), which seeks the largest subset of characters that can all be explained without homoplasy on a single tree. For binary characters, this amounts to finding a maximum clique in a compatibility graph. The graph's vertices are characters, and its edges join pairwise-compatible characters (characters whose splits do not conflict). Because pairwise compatibility of binary characters implies joint compatibility, the clique defines a perfect phylogeny for those characters. Successive approximations can iteratively refine searches to handle dataset heterogeneity. One example is reweighting characters by their consistency index: the minimum possible number of steps divided by the observed number, which equals 1 minus the homoplasy index.

These methods excel at retaining trait detail for taxonomic or fossil-inclusive analyses but are sensitive to long-branch attraction, in which rapidly evolving lineages are artifactually grouped together. This can lead to inconsistent recovery under high rates or heterogeneous evolutionary models. Empirical studies indicate that MP performs reliably for low-divergence datasets. It may underperform model-based alternatives when homoplasy is extensive, however, because it lacks an explicit probabilistic calibration of change frequencies.
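For binary characters, pairwise compatibility reduces to the four-gamete test: two characters conflict only if all four state combinations 00, 01, 10 and 11 occur among the taxa. The Python sketch below applies that test and finds the largest compatible set by brute force. This is only feasible for a handful of characters. The function names and the toy characters are illustrative.

```python
from itertools import combinations


def compatible(char1, char2):
    """Four-gamete test for two binary characters (strings of 0/1, same taxon order)."""
    return len(set(zip(char1, char2))) < 4


def largest_compatible_set(chars):
    """Brute-force maximum clique in the compatibility graph (small inputs only).

    Returns the indices of a largest set of pairwise-compatible characters."""
    for size in range(len(chars), 0, -1):
        for subset in combinations(range(len(chars)), size):
            if all(compatible(chars[i], chars[j]) for i, j in combinations(subset, 2)):
                return list(subset)
    return []


# columns are taxa A, B, C, D; each string is one binary character
chars = ["1100", "1010", "1100", "0011"]
print(largest_compatible_set(chars))  # [0, 2, 3]
```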

Probabilistic methods (likelihood and Bayesian)

Probabilistic methods for phylogenetic tree reconstruction employ explicit stochastic models of sequence evolution, typically Markov processes along branches. These models are used to evaluate tree topologies and branch lengths against observed data such as aligned molecular sequences. Unlike distance and parsimony methods, these approaches integrate evolutionary parameters such as substitution rates and among-site heterogeneity directly into the inference. This enables statistical assessment of model fit and hypothesis testing. The core computation is the likelihood function, the probability of the observed data given a hypothesized tree and model parameters. It is calculated efficiently with Felsenstein's pruning algorithm: conditional probabilities are computed recursively from the leaves to the root and then summed over root states.

Maximum likelihood (ML) estimation identifies the tree topology, branch lengths, and model parameters that maximize the likelihood of the observed data under the evolutionary model, providing a point estimate of the phylogeny. ML was first proposed for gene-frequency data by Cavalli-Sforza and Edwards in 1967 and extended to DNA sequences by Felsenstein in 1981. It offers statistical consistency (convergence to the true tree as data accumulate, provided the model is correct) and robustness to moderate model misspecification. Inference typically relies on heuristic searches such as hill-climbing or genetic algorithms to navigate the vast tree space. Branch support is assessed by nonparametric bootstrapping: alignment columns are resampled with replacement to create pseudoreplicates, which are reanalyzed to measure how often each clade recurs. ML handles complex models well, such as those incorporating among-site rate variation via gamma distributions or invariant sites. It does, however, require careful model selection (e.g., with the Akaike or Bayesian information criterion) to avoid bias from oversimplification or overparameterization.
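The pruning recursion can be written directly for the Jukes-Cantor (JC69) model, whose transition probabilities are P_ii(t) = 1/4 + 3/4 e^{-4t/3} and P_ij(t) = 1/4 - 1/4 e^{-4t/3}. The Python sketch below is a minimal, unoptimized illustration. The tree representation (nested tuples of (child, branch_length) pairs) and the function names are assumptions, and non-ACGT symbols are treated as missing data. Real programs also add scaling to avoid underflow and reuse partial likelihoods across sites.

```python
import math

BASES = "ACGT"


def jc_prob(i, j, t):
    """JC69 probability that base i becomes base j along a branch of length t
    (t in expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def partials(node, column):
    """Conditional likelihoods L_node(x), x in ACGT, via Felsenstein's pruning recursion.

    node: a taxon name (leaf) or a tuple of (child, branch_length) pairs.
    column: dict taxon -> observed base at one alignment site."""
    if isinstance(node, str):
        if column[node] not in BASES:  # gap or ambiguity: every state is compatible
            return [1.0] * 4
        return [1.0 if b == column[node] else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, t in node:
        child_l = partials(child, column)
        for x in range(4):
            # probability of everything below this child, given state x at the parent
            result[x] *= sum(jc_prob(x, y, t) * child_l[y] for y in range(4))
    return result


def log_likelihood(tree, alignment):
    """Sum over sites of log(sum_x pi_x L_root(x)), with JC69's equal base frequencies."""
    taxa = list(alignment)
    total = 0.0
    for c in range(len(alignment[taxa[0]])):
        root = partials(tree, {t: alignment[t][c] for t in taxa})
        total += math.log(sum(0.25 * value for value in root))
    return total


tree = (((("A", 0.1), ("B", 0.1)), 0.05), ((("C", 0.2), ("D", 0.1)), 0.05))
aln = {"A": "ACGT", "B": "ACGA", "C": "ATGA", "D": "ATGA"}
print(log_likelihood(tree, aln))
```

A quick correctness check: for two taxa, the result must equal π_x P_xy(t1 + t2). This holds because JC69 is reversible (Felsenstein's "pulley principle"), so the position of the root does not affect the likelihood.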
Bayesian inference extends likelihood-based evaluation by incorporating prior probabilities on trees and parameters. Using Bayes' theorem, it computes a posterior distribution proportional to the likelihood times the prior. This naturally quantifies uncertainty through credible intervals and posterior clade probabilities. Bayesian methods were popularized in phylogenetics by programs such as MrBayes, introduced by Huelsenbeck and Ronquist in 2001. They use Markov chain Monte Carlo (MCMC) sampling to explore the posterior, running multiple chains to approximate the distribution and diagnosing convergence with metrics such as effective sample size and trace plots.

Priors, such as a uniform prior on topologies or a Dirichlet prior on substitution rates, have little influence when data are informative but can regularize inference on sparse datasets. However, simulations have shown that posterior probabilities can be overconfident when genes are concatenated without accounting for linkage or incomplete lineage sorting. Relative to ML, Bayesian approaches integrate heterogeneous data partitions more naturally. They also allow marginal-likelihood estimation for model comparison via thermodynamic integration or stepping-stone sampling. The cost is greater computational demand for adequate chain mixing and burn-in assessment. Recent advances include scalable MCMC variants for large phylogenomic datasets, extending applicability to thousands of loci while addressing reticulation through admixture models.
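To make the MCMC idea concrete, the Python sketch below uses a Metropolis algorithm to sample the posterior of a single JC69 distance between two sequences. Everything here is an illustrative assumption: the exponential prior with mean 0.1, the proposal window, the starting value and the 10% burn-in. Real programs such as MrBayes sample topologies and many parameters jointly, using a variety of proposal moves.

```python
import math
import random


def log_posterior(t, n_sites, n_diff, prior_mean=0.1):
    """Unnormalised log posterior of a JC69 distance t between two sequences.

    Likelihood: JC69 with n_diff differing sites out of n_sites (constants dropped).
    Prior: exponential with mean prior_mean (an illustrative choice)."""
    if t <= 1e-12:  # treat t at or near 0 as impossible (also avoids log(0))
        return -math.inf
    e = math.exp(-4.0 * t / 3.0)
    log_lik = (n_sites - n_diff) * math.log(0.25 + 0.75 * e) + n_diff * math.log(0.25 - 0.25 * e)
    return log_lik - t / prior_mean


def mcmc_branch_length(n_sites, n_diff, steps=20000, width=0.05, seed=1):
    """Metropolis sampler; reflecting the sliding-window move at 0 keeps it symmetric."""
    rng = random.Random(seed)
    t = 0.1
    current = log_posterior(t, n_sites, n_diff)
    samples = []
    for _ in range(steps):
        proposal = abs(t + rng.uniform(-width, width))
        candidate = log_posterior(proposal, n_sites, n_diff)
        # accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            t, current = proposal, candidate
        samples.append(t)
    return samples[steps // 10:]  # discard the first 10% as burn-in


samples = mcmc_branch_length(n_sites=100, n_diff=10)
print(sum(samples) / len(samples))  # posterior mean of t
```

Reflecting a proposed negative value back to a positive one keeps the proposal symmetric, so no Hastings correction is needed. In practice, one would also run several independent chains and check the effective sample size, as described above.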

Algorithms and computational tools

Distance-based algorithms construct phylogenetic trees from pairwise distance matrices derived from sequence comparisons, either assuming additivity or applying corrections for multiple substitutions. The unweighted pair group method with arithmetic mean (UPGMA) clusters taxa hierarchically under a strict molecular clock assumption. It produces ultrametric trees suitable for rate-constant data but is sensitive to rate heterogeneity. Neighbor-joining (NJ), introduced by Saitou and Nei in 1987, relaxes the clock assumption by iteratively joining the pair of taxa that minimizes the estimated total branch length, yielding additive trees. It remains computationally efficient (O(n^3) time) and widely applied for initial explorations, despite potential inconsistencies under heterogeneous rates.

Discrete character-based methods, such as maximum parsimony (MP), seek trees requiring the fewest evolutionary changes across aligned sites, treating gaps and substitutions as equally weighted steps unless specified otherwise. Exact solutions scale poorly because there are (2n-5)!! unrooted binary trees for n taxa. Exact branch-and-bound search is therefore limited to modest datasets, and larger ones require heuristics such as stepwise addition with branch swapping or genetic algorithms.

Probabilistic approaches dominate modern inference. Maximum likelihood (ML) evaluates tree topologies and parameters by maximizing the probability of the observed data under explicit substitution models (e.g., GTR). It uses Felsenstein's 1981 pruning algorithm, a dynamic-programming recursion, to compute likelihoods across sites and branches in O(n k s^2) time per evaluation, with n taxa, k sites, and s states. Bayesian methods extend ML by using Markov chain Monte Carlo (MCMC) to sample from posterior distributions that incorporate priors on topologies, branch lengths, and rates, enabling uncertainty quantification. They handle complex models such as relaxed clocks but require convergence diagnostics because successive MCMC samples are autocorrelated.

Key computational tools implement these algorithms with optimizations for large datasets. PHYLIP (Phylogeny Inference Package), developed by Felsenstein since 1980, supports diverse methods including NJ, parsimony, and distance corrections across multiple formats.
PAUP* excels at heuristic and exact (branch-and-bound) tree searches, particularly under parsimony, though its commercial licensing limits accessibility. RAxML, optimized for maximum likelihood inference on thousands of sequences, employs randomized hill-climbing and rapid bootstrap analysis. Its successor, RAxML-NG (released in 2018), improves speed through vectorized AVX instructions. IQ-TREE, an efficient ML framework available since 2014, integrates automatic model selection (ModelFinder), partition schemes, and efficient likelihood computations. Benchmarks have reported that it finds higher-likelihood trees than RAxML, often faster, in phylogenomic analyses.

For Bayesian analysis, MrBayes performs MCMC on multi-gene datasets with mixed models. BEAST (version 1.0 in 2007, BEAST 2 in 2014) specializes in time-calibrated trees. It combines molecular clock models with coalescent or birth-death tree priors and accommodates fossil calibrations and heterogeneous rates. These tools often interoperate via the Newick or NEXUS formats, and recent projects such as Phylo-rs (2025) emphasize scalable implementations for massive alignments. Heuristic searches predominate for two reasons. First, tree space grows combinatorially, reaching about 2 million unrooted (or 34.5 million rooted) binary trees for just 10 taxa. Second, finding the optimal tree under parsimony or likelihood is NP-hard.
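The growth of tree space is easy to verify with the standard counts: (2n-5)!! unrooted and (2n-3)!! rooted binary trees for n labelled leaves. The short Python sketch below computes both. The function names are illustrative.

```python
def double_factorial(k):
    """k!! = k * (k - 2) * ... down to 1 or 2 (defined as 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def n_unrooted(n):
    """Number of unrooted binary trees on n >= 3 labelled leaves."""
    return double_factorial(2 * n - 5)


def n_rooted(n):
    """Number of rooted binary trees on n >= 2 labelled leaves."""
    return double_factorial(2 * n - 3)


for n in (4, 10):
    print(n, n_unrooted(n), n_rooted(n))
# 4 3 15
# 10 2027025 34459425
```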

File formats and interoperability

Phylogenetic trees are stored and exchanged using several standardized file formats. These formats encode tree topology, branch lengths, node labels, and sometimes associated data such as character matrices or annotations.

The Newick format, developed for the PHYLIP software package, represents trees in a compact parenthetical notation. Nested parentheses denote clades, commas separate siblings, colons precede branch lengths, and a semicolon terminates the tree, as in (A:0.1,B:0.2):0.1;. This format supports both rooted and unrooted trees but is limited to basic structural elements. It has no native provision for storing multiple trees together with data, evolutionary models, or extensive metadata. Its simplicity enables broad compatibility across tools such as RAxML and IQ-TREE. Differences in parsing, however, such as the handling of internal node labels or of the terminating semicolon, can cause interoperability problems between implementations.

The NEXUS format builds on Newick by organizing content into modular blocks (e.g., DATA for character matrices, TREES for topologies) in a file beginning with #NEXUS. This allows sequence data, assumptions, and multiple trees to be combined in one file. NEXUS was introduced in 1997 for systematic data. It supports commands for phylogenetic analysis software such as PAUP* and MrBayes, including weighted characters and partition schemes. Its free-form syntax, however, permits non-standard extensions that reduce portability across programs.

For better interoperability, XML-based standards such as phyloXML and NeXML address the limitations of text formats. They provide schema-validated structures for trees, sequences, and annotations such as geographic data or accession numbers. PhyloXML, defined in 2009, uses nested <clade> elements to describe phylogenies, with extensible properties for annotations, and is supported for import and export by libraries such as Biopython. NeXML, an XML evolution of NEXUS, represents trees as explicit node and edge lists that can also describe complex phylogenies, including networks. It facilitates programmatic validation to minimize errors in data exchange.
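To show how the parenthetical notation maps onto a tree, here is a minimal recursive-descent Newick reader in Python. It is a sketch under simplifying assumptions: all whitespace is discarded (so labels cannot contain spaces), and quoted labels, [comments] and extended annotations are not supported. For real work, an established library is the safer choice, such as Biopython's Bio.Phylo module, which reads Newick, NEXUS, phyloXML and NeXML. The function names below are illustrative.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts {name, length, children}."""
    text = "".join(text.split())  # drop all whitespace (simplifying assumption)
    if not text.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    pos = 0

    def read_token():
        nonlocal pos
        start = pos
        while text[pos] not in "(),:;":
            pos += 1
        return text[start:pos]

    def parse_node():
        nonlocal pos
        node = {"name": "", "length": None, "children": []}
        if text[pos] == "(":  # internal node: comma-separated children
            pos += 1
            node["children"].append(parse_node())
            while text[pos] == ",":
                pos += 1
                node["children"].append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        node["name"] = read_token()  # optional (internal) label
        if text[pos] == ":":         # optional branch length
            pos += 1
            node["length"] = float(read_token())
        return node

    tree = parse_node()
    if text[pos] != ";":
        raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
    return tree


def leaves(node):
    """Leaf names in left-to-right order."""
    if not node["children"]:
        return [node["name"]]
    return [name for child in node["children"] for name in leaves(child)]


tree = parse_newick("(A:0.1,B:0.2):0.1;")
print(leaves(tree), tree["length"])  # ['A', 'B'] 0.1
```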
These formats promote data reuse in large-scale analyses, as shown by journal archiving policies that, since 2012, have required trees to be deposited together with their underlying data. Despite widespread adoption, interoperability challenges persist because legacy software supports these formats incompletely and conversion tools are often needed. This underscores ongoing efforts toward unified standards in phylogenomics.

Applications and Interpretations

Systematic classification and taxonomy

Phylogenetic systematics, or cladistics, classifies organisms according to evolutionary relationships inferred from shared derived traits, known as synapomorphies. Synapomorphies define monophyletic clades, each comprising a common ancestor and all its descendants. The method prioritizes shared derived characters over overall similarity, using parsimony or probabilistic models to reconstruct the branching pattern that best explains the data. Clades identified in phylogenetic trees form the basis for taxonomic hierarchies, so that classifications reflect actual descent rather than superficial similarity.

Traditional Linnaean taxonomy, with its fixed ranks such as kingdom, phylum, and species, often recognized paraphyletic groups that exclude some descendants, creating inconsistencies with evolutionary history. Phylogenetic approaches revise these by naming only monophyletic assemblages and subordinating ranks to clade structure. For instance, reptiles excluding birds form a paraphyletic assemblage. Sauropsida, by contrast, encompasses reptiles and birds and is a monophyletic clade supported by molecular and morphological phylogenies. Under the PhyloCode, names are tied explicitly to clades through node-, branch-, or apomorphy-based definitions, and practice under rank-based codes such as the International Code of Zoological Nomenclature increasingly aligns with phylogenetic trees.

In practice, phylogenetic trees drive ongoing taxonomic revisions. For example, molecular data placed whales within Artiodactyla, forming the monophyletic clade Cetartiodactyla and overturning earlier separations based on morphology alone. Such classifications enhance predictive power in biology, because closely related taxa share more traits through common ancestry, informing fields from conservation to medicine. However, uncertainty in trees arising from incomplete data or conflicting signals makes robust statistical support necessary. Bootstrap values above 70% are often used as a rough threshold for clade credibility.
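Whether a named group is monophyletic can be checked mechanically: the group must coincide exactly with the leaf set of one node of the tree. The Python sketch below applies this check to a deliberately simplified, hypothetical amniote tree (turtles and many other lineages omitted) to reproduce the reptile/Sauropsida contrast described above. The function names are illustrative.

```python
def leaf_set(tree):
    """Leaves below a node of a rooted tree given as nested tuples of names."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*map(leaf_set, tree))


def clades(tree):
    """Every clade of the tree, each represented by its frozenset of leaves."""
    if isinstance(tree, str):
        return {frozenset([tree])}
    found = {leaf_set(tree)}
    for child in tree:
        found |= clades(child)
    return found


def is_monophyletic(tree, group):
    """True if `group` is exactly one ancestor's full set of descendant leaves."""
    return frozenset(group) in clades(tree)


# simplified, illustrative topology: (Mammals, (Lizards, (Crocodiles, Birds)))
amniotes = ("Mammals", ("Lizards", ("Crocodiles", "Birds")))
print(is_monophyletic(amniotes, {"Lizards", "Crocodiles"}))           # False: reptiles without birds
print(is_monophyletic(amniotes, {"Lizards", "Crocodiles", "Birds"}))  # True: Sauropsida
```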

Evolutionary inference and comparative biology

Phylogenetic trees serve as frameworks for inferring evolutionary histories by mapping phenotypic traits, genetic sequences, or ecological data onto branching topologies. This enables estimation of divergence times, rates of trait evolution, and ancestral states. These inferences rely on models of gradual change or punctuated shifts along branches. Methods such as maximum parsimony or likelihood-based approaches reconstruct internal node states to hypothesize transitions, such as the gain or loss of traits in particular lineages. For instance, parsimony minimizes the number of evolutionary changes required to explain the observed tip data. Stochastic character mapping under continuous-time Markov models, by contrast, incorporates branch lengths to quantify uncertainty in reconstructions.

In comparative biology, phylogenetic trees address the non-independence of species data that arises from shared ancestry, preventing inflated Type I error rates in statistical tests of trait correlations or adaptations. Phylogenetically independent contrasts, introduced by Felsenstein in 1985, transform trait values into differences between sister clades standardized by branch lengths. The resulting independent data points can be used in regression analyses of correlated evolution, such as body size and metabolic rate across mammals. The method assumes Brownian-motion evolution, in which traits diffuse randomly along branches with variance proportional to time. It has been generalized by phylogenetic generalized least squares (PGLS) to handle continuous covariates.

Such tools facilitate hypothesis testing in macroevolution. Examples include detecting adaptive radiations through shifts in diversification rates or trait disparity on specific branches, and evaluating convergent evolution, in which distantly related lineages evolve similar forms under analogous selective pressures. For example, comparative analyses have quantified the evolution of beak morphology in birds, linking variation to ecological niches while controlling for shared ancestry. These analyses revealed bursts of adaptive change during environmental perturbations.
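A minimal Python sketch of Felsenstein's independent contrasts on a rooted binary tree follows. At each internal node, it computes the standardized contrast (x_1 - x_2)/sqrt(v_1 + v_2) and assigns the node a branch-length-weighted average of its two children. It then lengthens the node's own branch by v_1 v_2/(v_1 + v_2) to account for the uncertainty of that average. The tree encoding, function name and trait values are illustrative assumptions.

```python
import math


def independent_contrasts(node, traits):
    """Felsenstein's (1985) standardized independent contrasts.

    node: (content, branch_length), where content is a taxon name or a pair of nodes.
    traits: dict taxon -> continuous trait value.
    Returns (node value, adjusted branch length, list of standardized contrasts)."""
    content, v = node
    if isinstance(content, str):
        return traits[content], v, []
    left, right = content
    x1, v1, c1 = independent_contrasts(left, traits)
    x2, v2, c2 = independent_contrasts(right, traits)
    contrast = (x1 - x2) / math.sqrt(v1 + v2)   # expected variance v1 + v2 under Brownian motion
    x = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)  # weighted average of the two children
    v_adj = v + v1 * v2 / (v1 + v2)              # extra length reflects error in x
    return x, v_adj, c1 + c2 + [contrast]


left = ((("A", 1.0), ("B", 1.0)), 1.0)
right = ((("C", 1.0), ("D", 1.0)), 1.0)
tree = ((left, right), 0.0)  # root has no branch of its own
traits = {"A": 1.0, "B": 3.0, "C": 5.0, "D": 9.0}
root_value, _, contrasts = independent_contrasts(tree, traits)
print(root_value, contrasts)
```

A tree with n tips yields n - 1 contrasts. Because the sign of each contrast depends on an arbitrary ordering of the two children, regressions of one trait's contrasts on another's are conventionally forced through the origin.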
These approaches incorporate fossil calibrations to produce timed trees, allowing causal inferences about environmental or geological drivers of evolutionary change. They do, however, demand robust phylogenies to avoid propagating estimation errors into downstream analyses.

Phylogenomics and large-scale analyses

Phylogenomics integrates genomic data with phylogenetic inference to reconstruct evolutionary relationships at finer resolution than traditional single-gene phylogenetics. It uses complete or near-complete genome sequences to identify orthologous genes and infer species trees. The approach emerged prominently in the early 21st century with the advent of high-throughput sequencing. Sequencing made it possible to generate datasets of thousands of loci, shifting the field from morphological or limited molecular markers to genome-wide signals. By 2010, phylogenomic studies routinely analyzed alignments of more than 100 orthologs across many taxa, revealing patterns of gene duplication, loss, and divergence that inform macroevolutionary processes.

Large-scale phylogenomic datasets scale to hundreds of taxa and tens of thousands of genes. They are processed in one of two ways. In supermatrix concatenation, orthologous sequences are aligned and combined into a single matrix for maximum likelihood or Bayesian inference. Summary methods instead combine gene trees to account for incomplete lineage sorting. For instance, pipelines such as AMPHORA automate the extraction of 31 conserved single-copy marker genes from bacterial genomes to build robust trees, improving accuracy over single-marker approaches in microbial phylogenies. Recent tools such as ROADIES infer species trees fully automatically from raw genome assemblies. They handle datasets with high evolutionary rates and incomplete sampling by integrating homology detection with gene-tree-based reconciliation. Computational demands escalate with scale: reconstructing trees for 1,000 taxa under probabilistic models requires heuristics to search the vast tree space. Divide-and-conquer strategies have reduced runtimes for such analyses from years to days on high-performance clusters.
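Supermatrix assembly is mostly bookkeeping. Per-gene alignments are joined column-wise, and taxa missing from a gene are padded with a missing-data symbol. The coordinates of each gene are recorded so that partitions can be modelled separately. The Python sketch below illustrates this. The function name and input layout are assumptions. The syntax for passing partitions to RAxML-NG or IQ-TREE is program-specific, so it is not reproduced here.

```python
def concatenate(gene_alignments, missing="?"):
    """Build a supermatrix from per-gene alignments.

    gene_alignments: dict gene -> dict taxon -> aligned sequence.
    Taxa absent from a gene are filled with `missing`.
    Returns (supermatrix dict, list of (gene, start, end) 1-based inclusive partitions)."""
    taxa = sorted({t for aln in gene_alignments.values() for t in aln})
    matrix = {t: [] for t in taxa}
    partitions, start = [], 1
    for gene, aln in gene_alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        length = lengths.pop()
        for t in taxa:
            matrix[t].append(aln.get(t, missing * length))  # pad taxa lacking this gene
        partitions.append((gene, start, start + length - 1))
        start += length
    return {t: "".join(parts) for t, parts in matrix.items()}, partitions


genes = {"geneA": {"X": "ACGT", "Y": "ACGA"},
         "geneB": {"X": "GG", "Z": "GC"}}
supermatrix, parts = concatenate(genes)
print(supermatrix)  # {'X': 'ACGTGG', 'Y': 'ACGA??', 'Z': '????GC'}
print(parts)        # [('geneA', 1, 4), ('geneB', 5, 6)]
```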
Challenges persist in resolving ancient divergences. There, erosion of signal through saturation and heterotachy (shifts in site-specific rates across lineages) can bias concatenated analyses toward artifactual groupings. In placental mammal phylogenies, for example, long-branch attraction confounded relationships until genome-wide data mitigated it. Multispecies coalescent methods, implemented in software such as ASTRAL, address gene-tree discordance by using quartet frequencies. They need dense sampling, however, to distinguish incomplete lineage sorting from introgression, and undersampling has inflated branch-length variance by up to 50% in simulations.

Reproducibility remains a hurdle, since proprietary pipelines and unarchived alignments hinder verification. Studies from 2019 report that only 20-30% of published phylogenomic trees are accompanied by data and code, impeding meta-analyses of evolutionary rates across clades. Advances in read-based inference, such as Read2Tree, avoid assembly errors by assigning raw sequencing reads directly to gene families for tree building. Read2Tree achieves concordance within 5% of reference trees for datasets exceeding 100 genomes. These methods underscore phylogenomics' power to resolve polytomies in rapid radiations such as the diversification of angiosperms. In that case, integrating more than 400 loci yielded dated trees with 95% bootstrap support for key nodes that were previously unresolved.

Limitations and Empirical Challenges

Violations of tree model assumptions

The phylogenetic tree model assumes that evolutionary relationships form a strictly hierarchical, bifurcating structure produced by vertical descent, with no gene flow between lineages after divergence. This idealization posits a single shared ancestral history captured by a species tree. In that species tree, genetic similarities reflect common ancestry, without reticulation or conflicting signals from non-tree processes. Violations occur when biological processes introduce network-like elements, such as horizontal gene transfer (HGT) or hybridization. These processes give descendant lineages multiple parental contributions and make a pure tree topology inadequate.

HGT is a major violation, particularly in prokaryotes, where genes can move laterally between distant taxa. This decouples individual gene histories from the organismal phylogeny. In bacteria and archaea, horizontally acquired genes can exceed 10-20% of the gene content of some genomes. The result is mosaic evolutionary patterns that confound tree reconstruction through phylogenetic incongruence. Analyses of prokaryotic genomes show that HGT can affect even nearly universal markers such as ribosomal genes, and up to 90% of microbial gene families show evidence of transfer at some point in their history. HGT can aid adaptation, for example to extreme environments. It also systematically biases distance-based and likelihood methods toward incorrect branching, because transferred genes embed foreign branches within recipient clades.

In eukaryotes, hybridization and introgression similarly breach tree assumptions by enabling gene flow between diverged species, often through fertile hybrids and backcrossing. Plant phylogenies, for example, frequently show reticulate signals from introgression and allopolyploidy. Over 15% of angiosperm speciation events involve hybridization, creating chimeric genomes that yield conflicting gene trees.
Animal examples further show how gene flow propagates adaptive alleles across species boundaries, violating the no-reticulation premise. They include archaic admixture in humans (1-4% Neanderthal DNA in non-African genomes) and introgression in other animal groups. Multispecies coalescent models can accommodate incomplete lineage sorting (ILS), in which ancestral polymorphisms persist through rapid radiations and generate 20-50% gene-tree discordance in some mammalian clades. True reticulation from gene flow, however, cannot be represented within strict tree frameworks and requires network approaches.

Parametric assumptions, such as independence among sites and homogeneous substitution processes across the tree, are also routinely violated, compounding these structural problems. One example is compositional heterogeneity, in which base frequencies vary systematically among lineages (e.g., differing base-composition biases in mitochondrial and nuclear genes). Such heterogeneity can induce long-branch attraction artifacts that misplace fast-evolving lineages. Tests on empirical and simulated datasets indicate that model misspecification can affect up to 30% of branches. This underscores the need to detect violations through posterior predictive checks or local model assessments. These breaches show that trees approximate macroevolutionary patterns, but pervasive reticulation and stochastic variation demand cautious interpretation. Networks are often used instead for clades where reticulation is common, such as microbes and plants.
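One simple, widely used screen for compositional heterogeneity is a chi-square test of homogeneity on the taxa-by-bases table of counts. The Python sketch below computes the statistic and its degrees of freedom. The function name is an assumption. The p-value could then be taken from a chi-square distribution, for example with `scipy.stats.chi2.sf`. Because the test treats taxa as independent samples and ignores their shared ancestry, it is only a coarse check. It is not a substitute for the model-adequacy tests mentioned above.

```python
def composition_chi2(alignment, alphabet="ACGT"):
    """Chi-square statistic for homogeneity of base composition across taxa.

    Builds a taxa x bases table of counts (gaps and ambiguity codes ignored) and
    compares it with the counts expected if all taxa shared one composition.
    Returns (statistic, degrees of freedom)."""
    counts = {t: [seq.count(b) for b in alphabet] for t, seq in alignment.items()}
    row_tot = {t: sum(c) for t, c in counts.items()}
    col_tot = [sum(c[k] for c in counts.values()) for k in range(len(alphabet))]
    grand = sum(col_tot)
    stat = 0.0
    for t, c in counts.items():
        for k, observed in enumerate(c):
            expected = row_tot[t] * col_tot[k] / grand
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    df = (len(counts) - 1) * (sum(1 for n in col_tot if n > 0) - 1)
    return stat, df


print(composition_chi2({"A": "GGGGCCCC", "B": "AAAATTTT"}))  # (16.0, 3)
```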

Sources of systematic error and incongruence

Systematic errors in phylogenetic inference occur when methodological or model-based biases consistently favor incorrect topologies over the true evolutionary history. Unlike random error, they do not diminish as datasets grow. These errors often stem from violated assumptions. One example is shifts in site-specific evolutionary rates across lineages (heterotachy), which distort branch-length estimates and mislead distance-based or parsimony methods. Compositional heterogeneity, in which nucleotide or amino acid frequencies vary systematically among taxa, makes this worse by inflating the apparent similarity of unrelated fast-evolving lineages. This effect has been documented in analyses of microbial and eukaryotic datasets. Site-specific rate variation, if inadequately modeled, allows unrecognized multiple substitutions (saturation) to accumulate and obscure phylogenetic signal, particularly in ancient divergences.

A prominent example is long-branch attraction (LBA), first formalized by Felsenstein in 1978. In LBA, rapidly evolving taxa on long branches cluster artifactually because convergent changes are misinterpreted as synapomorphies. LBA is prevalent under parsimony and under maximum-likelihood implementations that lack corrections for rate variation across sites. Simulations show it can persist even with accurate models when long branches are unevenly distributed in the tree. Orthology inference errors, arising from paralog contamination or incomplete sampling, introduce systematic bias correlated with phylogenetic distance. More distantly related sequences are more prone to misassignment, which inflates support for erroneous clades in concatenated analyses.

Phylogenetic incongruence, meaning topological discordance among gene trees or datasets, can arise from both methodological artifacts and genuine biological processes. Incongruence caused by model misspecification grows with dataset size. Unmodeled heterogeneities (e.g., compositional bias) propagate errors genome-wide and yield high-confidence but false inferences.
Biological sources of incongruence include incomplete lineage sorting (ILS), in which ancestral polymorphisms fail to coalesce before speciation. As a result, gene-tree topologies can deviate from the species tree in up to 30% of loci during rapid radiations, as observed in several empirical studies. Horizontal gene transfer (HGT) in prokaryotes and hybridization in eukaryotes introduce reticulate signals that cause localized incongruence. For instance, HGT rates exceed 10% in bacterial core genomes, confounding assumptions of vertical inheritance.

Distinguishing these sources requires quartet-based tests or multispecies coalescent models that quantify the contributions of ILS versus introgression. Such analyses show that apparent discordance in characters often reflects hemiplasy rather than true homoplasy alone. Hemiplasy refers to character patterns that look homoplastic on the species tree but arise from single changes on discordant gene trees. Plesiomorphic states, retained ancestral traits mistaken for synapomorphies, systematically bias analyses toward basal placements of conserved lineages. Excluding symplesiomorphic sites can correct this, but the bias persists in uncorrected datasets. Overall, stochastic error averages out at phylogenomic scale, whereas systematic biases demand rigorous model testing and awareness of the anomaly zone to avoid overconfident resolution of hard polytomies.
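For three species, the multispecies coalescent gives a closed form that shows how ILS produces discordance. Suppose the internal branch of the species tree ((A,B),C) lasts T coalescent units (generations divided by 2N for diploids). The A and B lineages then fail to coalesce on that branch with probability e^{-T}, after which each of the three rooted gene-tree topologies is equally likely. The matching topology therefore has probability 1 - (2/3)e^{-T}, and each discordant one has probability (1/3)e^{-T}. The Python sketch below evaluates this. The function name is an assumption.

```python
import math


def rooted_triplet_probs(internal_branch):
    """Gene-tree topology probabilities for species tree ((A,B),C) under the multispecies coalescent.

    internal_branch: length T of the branch ancestral to A and B, in coalescent units.
    Returns (P(((A,B),C)), P(((A,C),B)), P(((B,C),A)))."""
    no_coalescence = math.exp(-internal_branch)  # A and B fail to coalesce on that branch
    discordant = no_coalescence / 3              # then all three pairings are equally likely
    return 1 - 2 * discordant, discordant, discordant


for t in (0.1, 0.5, 1.0, 2.0):
    p_match, p_ac, _ = rooted_triplet_probs(t)
    print(f"T={t}: concordant={p_match:.3f}, each discordant={p_ac:.3f}")
```

With three taxa, the species-tree topology always remains the most probable gene tree, however short T is. Anomalous gene trees, and hence the anomaly zone, require at least four taxa.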

Interpretational pitfalls and overconfidence risks

One common interpretational pitfall is to treat high statistical support values as definitive evidence of true evolutionary relationships. Examples are bootstrap percentages above 95% or Bayesian posterior probabilities above 0.95. Such values can be misleading when model assumptions, such as stationarity or homogeneity of evolutionary rates, are violated. Systematic biases can produce topologies that appear well resolved and robust but reflect methodological artifacts rather than biological history. One such bias is long-branch attraction, in which rapidly evolving lineages cluster artifactually because of shared homoplasy rather than common ancestry. Similarly, compositional heterogeneity in sequence data, in which base or amino acid frequencies vary across lineages, often goes undetected. It inflates confidence in incorrect clades, because standard stationary models neither detect nor correct for it.

Overconfidence risks escalate in Bayesian phylogenetic inference under model misspecification. When the competing models are equally inadequate for the data, posterior probabilities can become polarized. They may then assign near-certainty (e.g., 0.9999) to a favored but erroneous topology, even in datasets of only hundreds of sites. Simulations demonstrate that this "type-3" volatile behavior occurs systematically, with the method effectively rejecting alternatives before sufficient evidence accumulates. Researchers therefore risk overstating reliability unless they verify model fit or compare against alternatives such as non-Bayesian tests or bootstrap resampling. In phylogenomics, the problem is exacerbated when vast genomic datasets are aggregated without resolving incongruence from processes such as incomplete lineage sorting or introgression. More data then reinforce systematic errors rather than mitigating them, resulting in overconfident inferences of deep divergences.
Interpreters must also avoid conflating tree topology with causal evolutionary narratives. Two common mistakes are assuming that branch lengths uniformly proxy absolute time and assuming that strict bifurcations rule out reticulate events. Either can lead to erroneous reconstructions of trait evolution or biogeographic history. Empirical cases, such as persistent debates over some deep relationships despite genome-scale data, show how unaddressed systematic errors sustain controversy. They argue for cross-method validation and sensitivity analyses to temper overreliance on any single tree. Recommendations include using priors that allow multifurcations and assessing model adequacy through posterior predictive checks to reveal hidden uncertainty, thereby aligning interpretations more closely with empirical reality.

Alternatives to Strict Tree Models

Phylogenetic networks for reticulate evolution

Phylogenetic networks represent evolutionary histories that include reticulate events, such as hybridization and horizontal gene transfer (HGT). In these events, genetic material is exchanged between divergent lineages rather than passed on solely through tree-like divergence. Phylogenetic trees assume bifurcating descent without merging branches. Networks, by contrast, are directed acyclic graphs with reticulation nodes that model these non-tree processes by allowing a lineage to have multiple parents. Reticulate evolution is prevalent in prokaryotes through HGT, which can move genes between distant taxa. In eukaryotes such as plants, it occurs through hybrid speciation, in which fertile hybrids form new species. In animals, it appears as introgression, as in archaic human admixture with Neanderthals and Denisovans.

Methods for constructing phylogenetic networks fall into two main categories:

- Unrooted split networks visualize conflicting phylogenetic signals in distance matrices or character data without explicit ancestry.
- Rooted reticulation networks infer explicit hybridization or transfer events from gene-tree discordance or under multispecies coalescent models.

Split networks, implemented in software such as SplitsTree, decompose the data into compatible and incompatible splits and display reticulation zones as parallelograms or boxes. They are useful for exploratory detection of recombination or incomplete lineage sorting. Rooted networks are inferred with criteria such as maximum parsimony, which minimizes the number of reticulation events. They can also be inferred by likelihood and Bayesian methods in tools such as PhyloNet, which integrate multiple gene trees to estimate hybridization probabilities under the network multispecies coalescent. Quartet-based methods, for instance, decompose networks into four-taxon subnetworks to infer local reticulations efficiently, and they scale to dozens of taxa. Applications of phylogenetic networks have revealed reticulate patterns in diverse systems. Examples include bacterial pangenomes shaped by frequent HGT and plant radiations in which hybridization drives bursts of speciation.
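Split networks are built from splits (bipartitions of the taxa) and the pairwise compatibility between them. Two splits A|A' and B|B' can appear on the same tree only if at least one of the four intersections A∩B, A∩B', A'∩B, A'∩B' is empty. The Python sketch below applies that test; incompatible pairs are what a split network draws as boxes. The function names are assumptions. Methods such as split decomposition or NeighborNet in SplitsTree are considerably more involved, because they also decide which splits to keep and how to weight them. Those steps are not shown.

```python
from itertools import combinations


def splits_compatible(split1, split2, taxa):
    """Splits A|A' and B|B' are compatible iff one of A&B, A&B', A'&B, A'&B' is empty.

    Each split is given by one side (a set of taxa); `taxa` is the full taxon set."""
    a, b = frozenset(split1), frozenset(split2)
    a_c, b_c = taxa - a, taxa - b
    return not (a & b and a & b_c and a_c & b and a_c & b_c)


def incompatible_pairs(splits, taxa):
    """Pairs of splits that cannot coexist on one tree."""
    return [(s, t) for s, t in combinations(splits, 2) if not splits_compatible(s, t, taxa)]


taxa = frozenset("ABCD")
splits = [frozenset("AB"), frozenset("AC"), frozenset("ABC")]
print(incompatible_pairs(splits, taxa))  # only AB|CD versus AC|BD conflict
```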
In mosquitoes of the Anopheles gambiae complex, networks combining coalescent models with gene trees uncovered extensive introgression, with implications for vector-control strategies. Network inference nonetheless faces two major problems. The first is identifiability: certain reticulate topologies produce identical gene-tree distributions. The second is cost: computational demands grow exponentially with the number of reticulations, limiting analyses to small-to-moderate taxon sets unless approximations are used. Recent advances, such as algebraic invariants for detecting hybridization cycles from four-taxon subsets, enable ultrafast inference from genomic data and improve detection in phylogenomic datasets. Despite these tools, distinguishing reticulation from tree-like processes such as incomplete lineage sorting requires sampling multiple loci and statistical validation to avoid overparameterization.

Supertrees and consensus methods

Supertree methods synthesize a comprehensive tree from multiple source trees whose taxon sets overlap only partially, enabling the integration of heterogeneous datasets such as those from different genes or morphological studies. This approach addresses a limitation of strict tree models by accommodating incomplete taxonomic sampling across studies, producing a supertree that contains every taxon in the input set while resolving relationships where possible. Common algorithms include BUILD, which constructs supertrees from rooted triplets by recursively partitioning the taxa and checking compatibility among overlapping clades. Other techniques, such as Robinson-Foulds supertrees, minimize the total Robinson-Foulds distance between the source trees and a candidate supertree to preserve topological information, though they may require heuristics to remain tractable on large inputs. Supertree construction can nonetheless introduce artifacts when source trees contain errors or conflicts, because the algorithm prioritizes compatibility over the accuracy of individual trees, potentially yielding resolutions unsupported by any single dataset. Consensus methods, in contrast, summarize a collection of trees defined on identical taxon sets, typically derived from bootstrap replicates, Bayesian posterior samples, or multiple inferences, to represent the phylogenetic signal they share. Strict consensus trees retain only clades present in all input trees, ensuring maximal agreement but often producing unresolved polytomies when conflicts arise. Majority-rule consensus trees include clades present in more than 50% of the trees; they provide greater resolution and are widely used to summarize posterior distributions in Bayesian analyses, with branch support values giving clade frequencies. Advanced variants, such as rooted triple consensus, build on rooted triplet frequencies to achieve statistical consistency under multispecies coalescent models, outperforming simpler methods in simulations with incomplete lineage sorting.
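The difference between strict and majority-rule consensus can be sketched in a few lines. This assumes rooted trees written as nested Python tuples (a toy representation chosen here for brevity, not a standard exchange format such as Newick) and reports each retained clade with its frequency, which corresponds to the support values described above.

```python
from collections import Counter


def clusters(tree):
    """Set of clades (leaf sets) in a rooted tree written as nested tuples,
    e.g. ((("A", "B"), "C"), "D"). Leaves are strings."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def consensus_clades(trees, rule="majority"):
    """Map each retained clade to the fraction of input trees containing it.
    rule="strict": the clade must occur in every tree.
    rule="majority": the clade must occur in more than half of the trees."""
    counts = Counter(c for t in trees for c in clusters(t))
    n = len(trees)
    if rule == "strict":
        keep = [c for c, k in counts.items() if k == n]
    elif rule == "majority":
        keep = [c for c, k in counts.items() if 2 * k > n]
    else:
        raise ValueError(f"unknown rule: {rule}")
    return {c: counts[c] / n for c in keep}


TREES = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "D"), "C"),
    ((("A", "C"), "B"), "D"),
]
for rule in ("strict", "majority"):
    result = consensus_clades(TREES, rule)
    print(rule, sorted(("".join(sorted(c)), round(f, 2)) for c, f in result.items()))
# strict [('ABCD', 1.0)]
# majority [('AB', 0.67), ('ABC', 0.67), ('ABCD', 1.0)]
```

Because any two clades that each occur in more than half of the trees must co-occur in at least one tree, majority-rule clades are always mutually compatible and assemble into a single tree, here (((A,B),C),D), while the strict consensus keeps only the root.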
However, consensus approaches risk over-resolving weakly supported structures or masking systematic incongruence, because they average over topologies without addressing underlying causes such as gene tree discordance. Both supertrees and consensus methods serve as pragmatic alternatives to enforcing a single strict tree when datasets generate conflicting signals, facilitating large-scale syntheses in phylogenomics; for instance, supertrees have assembled phylogenies from more than 100 source trees spanning thousands of taxa. Their outputs nonetheless demand caution, as neither guarantees optimality under complex evolutionary processes such as reticulation, and empirical evaluations show that supertrees can deviate from reference trees by up to 20% in branch lengths or support when source trees are noisy. Recent pipelines, such as those using dynamic programming for supertree correction, aim to mitigate these problems by iteratively refining supertrees against genomic data, but scalability remains limited for datasets exceeding millions of leaves without approximations.
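The BUILD algorithm mentioned earlier in this section can be sketched as follows, assuming rooted triplets encoded as ((a, b), c) for ab|c and a plain union-find in place of an explicit graph. At each level it merges every pair of taxa that some triplet requires to be grouped; if all taxa collapse into one component, no tree can display every triplet and the sketch returns None. This all-or-nothing behavior is what makes BUILD favor compatibility: practical supertree methods must additionally decide which conflicting triplets to discard.

```python
def build(taxa, triplets):
    """BUILD on rooted triplets, each encoded as ((a, b), c) meaning ab|c.
    Returns a rooted tree as nested tuples, or None if the triplets are
    incompatible (no single tree displays them all)."""
    taxa = list(taxa)
    if len(taxa) == 1:
        return taxa[0]
    taxon_set = set(taxa)
    # Keep only triplets whose three taxa all lie in the current set.
    relevant = [t for t in triplets if {t[0][0], t[0][1], t[1]} <= taxon_set]

    # Union-find: a and b must end up in the same subtree for every ab|c.
    parent = {x: x for x in taxa}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), _c in relevant:
        parent[find(a)] = find(b)

    components = {}
    for x in taxa:
        components.setdefault(find(x), []).append(x)
    if len(components) == 1:
        return None  # a single connected component: the triplets conflict

    subtrees = []
    for members in components.values():
        sub = build(members, relevant)
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(subtrees)


# Triplets from two source trees with partly overlapping taxa:
# ((A,B),C) gives AB|C and ((C,D),B) gives CD|B.
print(build("ABCD", [(("A", "B"), "C"), (("C", "D"), "B")]))
# (('A', 'B'), ('C', 'D'))
print(build("ABC", [(("A", "B"), "C"), (("A", "C"), "B")]))
# None
```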

Integration with webs and non-tree representations

Phylogenetic networks extend tree models by incorporating reticulate events such as horizontal gene transfer and hybridization, represented as directed acyclic graphs with additional edges beyond bifurcating branches. These webs integrate with trees through methods that embed gene trees into overarching networks, resolving conflicts via parsimony or likelihood optimization to infer reticulation points. For instance, protocols combining maximum likelihood tree inference with network analysis have identified hybridization events that trees alone overlook. Non-tree representations, such as the "coral of life" model, depict phylogeny as anastomosing structures in which ancestral lineages persist and fuse, contrasting with strict trees by accommodating incomplete lineage sorting without assuming that ancestral lineages go extinct at every branching event. Proposed by Darwin in his early notebooks and later formalized mathematically, this approach visualizes living organisms as a surface supported by "dead" basal branches, better capturing microbial and plant reticulation than dichotomous trees. Integration also occurs through hybrid visualizations, such as phylogenetic outlines that planarize tree incompatibilities into network-like diagrams, facilitating comparison of conflicting datasets. In phylogenomics, tools intertwine trees and networks by aligning shared edges and quantifying reticulation support, as in SplitsTree extensions that render explicit networks over tree scaffolds. Recent analyses use integrated phylogenomic pipelines to uncover reticulate evolution in Himalayan lineages, combining tree clades with network reticulations to model their diversification dynamics. These approaches mitigate the limitations of tree models by explicitly modeling empirical incongruence, though they demand computational validation to distinguish signal from noise in reticulation inference.

Recent advances

Big data and phylogenomic pipelines

The proliferation of high-throughput sequencing technologies has produced enormous phylogenomic datasets, often comprising thousands of orthologous genes across hundreds to thousands of taxa, fundamentally transforming phylogenetic inference. By 2023, public repositories like NCBI hosted over 1 million bacterial genomes alone, enabling comprehensive analyses but demanding scalable computational frameworks to process alignments, filter noisy loci, and infer trees while accounting for incomplete lineage sorting and systematic errors. Key challenges in handling such data include the exponential growth in alignment sizes, which can exceed terabytes and make runtimes prohibitive for traditional maximum likelihood methods, as well as locus-specific biases and sequencing artifacts that exacerbate gene-tree incongruence. Solutions have centered on automated pipelines that integrate ortholog detection, alignment, trimming, and coalescent-based species-tree estimation, often relying on approximations such as single-precision arithmetic to remain feasible. For example, divide-and-conquer approaches partition datasets into manageable subsets for quartet-based inference before aggregating the results with summary methods, yielding accurate large-scale trees at lower computational cost than full supermatrix analyses. Recent pipelines exemplify these advances: EukPhylo v.1.0 (2025) offers a modular pipeline for eukaryotes that automates locus selection and phylogeny-informed filtering, with built-in contamination removal, facilitating replication across diverse datasets. OrthoPhyl (2024), tailored to bacterial genomes, streamlines core ortholog identification and tree building, outperforming ad-hoc scripts in consistency. EasyCGTree (2023), a cross-platform tool for prokaryotes, identifies 120 universal core genes, builds alignments, and infers trees using maximum likelihood, processing dozens of genomes in hours on standard hardware.
VeryFastTree (2024) extends this trend, inferring trees with up to 1 million leaves on a single server by optimizing neighbor-joining heuristics and demonstrating near-linear speedups over prior versions. These tools prioritize reproducibility through standardized workflows and outputs, mitigating pitfalls such as software version drift noted in earlier big-data studies.
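The quartet-based summary step can be illustrated with a small sketch. It assumes each unrooted gene tree is given as a list of its internal splits (one side of each bipartition, as a frozenset of taxon names), a hypothetical encoding chosen for brevity. A candidate species tree is scored by how many gene-tree quartets agree with it, the objective that quartet-based summary methods such as ASTRAL maximize; the exhaustive loop over all four-taxon subsets grows as n^4, the kind of cost that real tools avoid through constrained search and dynamic programming.

```python
from collections import Counter
from itertools import combinations


def quartet_topology(splits, quartet):
    """Unrooted topology a tree induces on four taxa, as ((a, b), (c, d)),
    or None if no split separates them. `splits` holds one side of each
    internal bipartition as a frozenset of taxon names."""
    a, b, c, d = sorted(quartet)
    q = set(quartet)
    for pair, other in (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))):
        for side in splits:
            inside = side & q
            if inside == set(pair) or inside == set(other):
                return (pair, other)
    return None


def quartet_support(gene_trees, quartet):
    """How many gene trees support each resolved topology of one quartet."""
    counts = Counter()
    for splits in gene_trees:
        topology = quartet_topology(splits, quartet)
        if topology is not None:
            counts[topology] += 1
    return counts


def quartet_score(species_splits, gene_trees, taxa):
    """Number of (quartet, gene tree) pairs in which the gene tree agrees
    with the candidate species tree."""
    score = 0
    for quartet in combinations(sorted(taxa), 4):
        target = quartet_topology(species_splits, quartet)
        if target is None:
            continue
        score += sum(quartet_topology(g, quartet) == target for g in gene_trees)
    return score


# Toy unrooted gene trees on A-E: ((A,B),C,(D,E)) twice, ((A,C),B,(D,E)) once.
G1 = [frozenset("AB"), frozenset("DE")]
G2 = [frozenset("AC"), frozenset("DE")]
genes = [G1, G1, G2]
print(quartet_support(genes, "ABCD"))
print(quartet_score(G1, genes, "ABCDE"), quartet_score(G2, genes, "ABCDE"))
# 13 11
```

With two gene trees supporting AB|CD and one supporting AC|BD, the first candidate agrees with 13 gene-tree quartets against 11 for the second, so a quartet-based summary would prefer it.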

Incorporation of structural and phenotypic data

In recent phylogenomic analyses, phenotypic data, encompassing discrete morphological characters and continuous morphometric measurements, have been combined with molecular sequences into total-evidence datasets, aiming to resolve conflicts arising from incomplete molecular signal, especially in fossil-calibrated trees or in groups with rapid radiations. However, a 2024 systematic review of 12 studies found that incorporating continuous morphometric data, such as landmark-based geometric morphometrics, does not significantly improve phylogenetic resolution or congruence with molecular benchmarks compared with discrete morphological characters alone; combined datasets occasionally outperform continuous data but show no overall improvement in node support or tree accuracy. This limited utility stems from difficulties in assessing character homology and from the discrete nature of evolutionary innovations, although advances in imaging and machine learning, including deep learning extraction of traits from specimen photographs, enable scalable phenotyping of insects and other taxa and may aid total-evidence approaches in under-sequenced lineages. Structural data, particularly three-dimensional protein folds, offer a complementary signal conserved 3 to 10 times longer than primary sequences, facilitating inference of deep evolutionary relationships obscured by sequence saturation. Methods include distance-based metrics (e.g., TM-score for structural similarity) and model-based approaches that incorporate structural alignments from tools such as TM-align, or Foldseek's 3Di alphabet, which discretizes folds into analyzable sequences. The 2021 release of AlphaFold2 enabled accurate structure prediction for millions of proteins, reducing reliance on experimentally determined structures and mitigating taxonomic biases in structural databases, thus allowing hybrid phylogenies that weight structural conservation alongside genomic data.
Hybrid techniques further leverage structural information to refine tree support; for instance, the 2025 multistrap method computes intra-molecular distance matrices from protein structures to generate bootstrap replicates, which are averaged with sequence-based maximum likelihood and minimum evolution estimates, improving the accuracy of branch support (e.g., AUC rising from 0.843 to 0.880) and requiring fewer alignment columns for robust recovery in simulated and empirical datasets spanning 508 alignments. These integrations highlight the role of structural data in addressing molecular incongruence, though challenges persist in aligning divergent folds and in validating AI-predicted structures.
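TM-score, cited above as a structural similarity metric, has a compact definition: for a target of length L, each aligned residue pair at distance d_i after superposition contributes 1/(1 + (d_i/d0)^2), the contributions are summed and divided by L, and d0 = 1.24*(L - 15)^(1/3) - 1.8 Å. The sketch below evaluates that sum for a single, already computed superposition; it omits the alignment and the search over superpositions that TM-align performs, and converting the score to a distance as 1 - TM-score is one simple choice assumed here for feeding structural similarities into distance-based tree building.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (angstroms) from the TM-score definition."""
    return 1.24 * (target_length - 15) ** (1 / 3) - 1.8


def tm_score(aligned_distances, target_length):
    """TM-score for one fixed superposition.

    aligned_distances: distances (angstroms) between aligned residue pairs after
    superposition. The full TM-score is the maximum of this value over
    superpositions; that search (done by tools such as TM-align) is omitted."""
    if target_length <= 21:
        raise ValueError("this sketch assumes target_length > 21")
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in aligned_distances)
    return total / target_length


def structural_distance(aligned_distances, target_length):
    """1 - TM-score, an assumed conversion to a dissimilarity for tree building."""
    return 1.0 - tm_score(aligned_distances, target_length)


print(tm_score([0.0] * 100, 100))             # 1.0: every residue superposed exactly
print(tm_score([tm_d0(100)] * 100, 100))      # 0.5: every aligned pair sits at d0
print(structural_distance([0.0] * 50, 100))   # 0.5: only half the target aligned
```

Lengths of 21 residues or fewer are rejected only to keep this sketch simple, since d0 becomes very small or negative there; reference programs apply their own conventions to short chains.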

Machine learning and algorithmic innovations

Machine learning has emerged as a powerful tool for phylogenetic tree inference, particularly for large-scale genomic data where traditional methods such as maximum likelihood struggle with computational demands. Supervised approaches, trained on simulated data, predict tree topologies, branch lengths, and substitution models by learning patterns in evolutionary signals, in some cases outperforming heuristic tree searches in accuracy on specific datasets. Deep neural networks enable rapid inference by encoding sequences into latent representations that capture phylogenetic structure without explicit alignment. Generative adversarial networks (GANs), such as phyloGAN (2023), infer species relationships directly from concatenated alignments or sets of gene alignments, generating plausible tree distributions that approximate posterior probabilities under complex models; adversarial training refines tree predictions against simulated evolutionary scenarios, reducing reliance on Markov chain Monte Carlo sampling. Convolutional neural network-based frameworks introduced in 2023 extend quartet-based inference to multi-species trees, processing unaligned sequences through convolutional layers to output bifurcating topologies with branch supports. End-to-end models, which pair sequence encoders with tree decoders, reconstruct phylogenies from raw data, as demonstrated in 2025 prototypes, bypassing intermediate steps such as distance computation. Reinforcement learning formulations treat tree building as an optimization game in which agents iteratively refine topologies and are rewarded for congruence with the input data, achieving competitive results on empirical datasets in studies published in 2024. These approaches extend to phylogenomic scales, with tools such as PhyloTune accelerating incremental tree updates via learned approximations of likelihood surfaces.
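Supervised methods depend on their simulated training data. A minimal sketch of generating labeled examples for a four-taxon topology classifier follows; it assumes the Jukes-Cantor (JC69) substitution model and uniformly drawn branch lengths purely for illustration, and it omits the neural network itself. Published tools use their own, typically richer, simulation settings.

```python
import math
import random

BASES = "ACGT"
# The three unrooted topologies on taxa 0-3: (pair attached to node u, pair attached to node v).
PAIRINGS = (((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2)))


def evolve(seq, branch_length, rng):
    """Evolve a sequence along one branch under JC69 (an assumption of this
    sketch); branch_length is in expected substitutions per site."""
    # JC69 probability that a site differs from its ancestor after this branch.
    p_change = 0.75 * (1.0 - math.exp(-4.0 * branch_length / 3.0))
    return "".join(
        rng.choice([b for b in BASES if b != base]) if rng.random() < p_change else base
        for base in seq
    )


def simulate_example(n_sites, rng, max_branch=0.5):
    """Return (four aligned sequences, index of the true topology in PAIRINGS).
    Branch lengths are drawn uniformly, purely for illustration."""
    label = rng.randrange(len(PAIRINGS))
    (a, b), (c, d) = PAIRINGS[label]
    u = "".join(rng.choice(BASES) for _ in range(n_sites))  # sequence at internal node u
    v = evolve(u, rng.uniform(0.01, max_branch), rng)       # internal branch u-v
    seqs = [None] * 4
    for taxon, node_seq in ((a, u), (b, u), (c, v), (d, v)):
        seqs[taxon] = evolve(node_seq, rng.uniform(0.01, max_branch), rng)
    return seqs, label


rng = random.Random(42)
training_set = [simulate_example(200, rng) for _ in range(1000)]
seqs, label = training_set[0]
print(label, [s[:10] for s in seqs])
```

Each example pairs four aligned sequences with the index of the true unrooted topology, the kind of input-label pair a quartet classifier would be trained on; raising max_branch makes the task harder as sites approach saturation.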
Neural networks also facilitate substitution model selection and parameter estimation from observed trees, providing alternatives to maximum likelihood when analytical solutions are intractable; in published evaluations they matched or exceeded traditional estimators on simulated phylogenies. However, critical assessments highlight limitations, including sensitivity to biases in training data and reduced generalizability beyond simulated regimes, underscoring the need for methods that combine machine learning with probabilistic foundations. Alignment-free predictors such as Phyloformer use transformer architectures to estimate evolutionary distances as inputs to neighbor-joining, improving efficiency for unaligned sequences in 2025 benchmarks.
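Because predictors such as Phyloformer hand their distance estimates to neighbor-joining, it helps to see what that final step does. The following is a minimal sketch of the classic neighbor-joining algorithm of Saitou and Nei (1987), using the Q-criterion to choose which pair to join; the input is a small hand-made additive matrix rather than output from any learned model, and the internal node names n1, n2, ... are arbitrary labels introduced here.

```python
from itertools import combinations


def neighbor_joining(dist, names):
    """Neighbor-joining on a symmetric distance matrix (list of lists).

    Returns the unrooted tree as a list of (node, node, branch_length) edges.
    Internal nodes get the arbitrary labels "n1", "n2", ..., so leaf names
    must not reuse them."""
    dm = {(a, b): dist[i][j] for i, a in enumerate(names) for j, b in enumerate(names)}
    nodes, edges = list(names), []
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(dm[a, b] for b in nodes if b != a) for a in nodes}
        # Q-criterion: join the pair minimizing (n - 2) * d(a, b) - r(a) - r(b).
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * dm[p] - r[p[0]] - r[p[1]])
        branch_a = dm[a, b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        new = f"n{len(edges) // 2 + 1}"
        edges += [(new, a, branch_a), (new, b, dm[a, b] - branch_a)]
        # Distances from the new internal node to every remaining node.
        for c in nodes:
            if c not in (a, b):
                dm[new, c] = dm[c, new] = (dm[a, c] + dm[b, c] - dm[a, b]) / 2
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    x, y = nodes
    edges.append((x, y, dm[x, y]))
    return edges


D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
for edge in neighbor_joining(D, ["a", "b", "c", "d", "e"]):
    print(edge)
```

Because this matrix is additive, the recovered tree (terminal branches a = 2, b = 3, c = 4, d = 2, e = 1) reproduces every input distance as a path length; for example, the path from a to d is 2 + 3 + 2 + 2 = 9.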

  94. [94]
    Combinatorial and Computational Investigations of Neighbor ...
    The Neighbor-Joining algorithm is a popular distance-based phylogenetic method that computes a tree metric from a dissimilarity map arising from biological data ...Abstract · Introduction · The Neighbor-Joining Algorithm · Estimated Volumes
  95. [95]
    PhyloM: A Computer Program for Phylogenetic Inference from ... - NIH
    May 11, 2022 · Least squares (LS) and Minimum Evolution (ME) are two distance methods of phylogenetic inference. The neighbor-joining (NJ) method, developed by ...
  96. [96]
    26.3: Distance Based Methods - Biology LibreTexts
    Mar 17, 2021 · The distance based models sequester the sequence data into pairwise distances. This step loses some information, but sets up the platform for direct tree ...
  97. [97]
    Character-based methods | Bioinformatics Class Notes - Fiveable
    These approaches analyze discrete traits or features of organisms, such as DNA sequences or morphological characteristics, to reconstruct phylogenetic trees ...
  98. [98]
    [PDF] Model based phylogenetics - The University of Texas at Dallas
    Character based methods. • Explicitly model how the characters change. – easier to predict ancestral states. – model can be used to score candidate trees.<|separator|>
  99. [99]
    Techniques for Phylogenetic Tree Construction | by Monika Mate
    Mar 22, 2024 · Character-based methods, also known as parsimony methods, analyze discrete characters or traits shared among species to infer evolutionary ...
  100. [100]
    Maximum Parsimony on Phylogenetic networks
    May 2, 2012 · Maximum Parsimony is a character-based approach that infers a phylogenetic tree by minimizing the total number of evolutionary steps required to explain a ...Missing: review | Show results with:review
  101. [101]
    A comparison of two methods to study correlated discrete characters ...
    We use a simulation approach to study two methods proposed for the analysis of correlated discrete characters on cladograms.
  102. [102]
    Parsimony, Likelihood, and the Role of Models in Molecular ...
    Maximum parsimony (MP) is a popular technique for phylogeny reconstruction. However, MP is often criticized as being a statistically unsound method and one that ...
  103. [103]
    The effect of natural selection on the performance of maximum ...
    Maximum parsimony, as well as most other phylogeny reconstruction methods, may perform significantly better on actual biological data than is currently ...
  104. [104]
    Phylogenetic Inference: Maximum Likelihood Methods
    Oct 31, 2023 · The idea of using a maximum likelihood (ML) method for phylogenetic inference was first presented by Cavalli-Sforza and Edwards (1967) for gene frequency data.
  105. [105]
    Felsenstein Phylogenetic Likelihood - PMC - PubMed Central
    Jan 13, 2021 · The main goal of Felsenstein's 1981 JME article was to show how to efficiently calculate the probability of a set of aligned nucleotide ...
  106. [106]
    [PDF] Maximum Likelihood Methods for Phylogenetic Inference
    Dec 29, 2015 · In this article, we provide an overview of maximum likelihood methods for phylogenetic inference. A brief introduction to general maximum ...
  107. [107]
    Evolutionary trees from DNA sequences: A maximum likelihood ...
    The application of maximum likelihood techniques to the estimation of evolutionary trees from nucleic acid sequence data is discussed.
  108. [108]
    6 - Phylogenetic inference using maximum likelihood methods
    In the phylogenetic framework, one part of the model is that sequences actually evolve according to a tree. The possible hypotheses include the different tree ...
  109. [109]
    A biologist's guide to Bayesian phylogenetic analysis - PMC - NIH
    The most common type of data used in phylogenetic analyses is DNA and ... molecular data to estimate divergence times for extant and fossil species–.
  110. [110]
    MRBAYES: Bayesian inference of phylogenetic trees | Bioinformatics
    The program MRBAYES performs Bayesian inference of phylogeny using a variant of Markov chain Monte Carlo.
  111. [111]
    Overcredibility of molecular phylogenies obtained by Bayesian ...
    We show by computer simulation that posterior probabilities in Bayesian analysis can be excessively liberal when concatenated gene sequences are used.Abstract · Sign Up For Pnas Alerts · Results
  112. [112]
    Scalable Bayesian phylogenetics - Journals
    Aug 22, 2022 · In this review, we present recent developments in Bayesian phylogenetics that begin to answer this call for scalable phylodynamic methods. In ...
  113. [113]
    Neighbor-Joining Revealed | Molecular Biology and Evolution
    The method has become the most widely used method for building phylogenetic trees from distances, and the original paper has been cited about 13,000 times ( ...
  114. [114]
    IQ-TREE: Efficient phylogenomic software by maximum likelihood
    A fast search algorithm (Nguyen et al., 2015) to infer phylogenetic trees by maximum likelihood. IQ-TREE compares favorably to RAxML and PhyML.
  115. [115]
    BEAST Software - Bayesian Evolutionary Analysis Sampling Trees ...
    BEAST is a cross-platform program for Bayesian analysis of molecular sequences using MCMC. It is entirely orientated towards rooted, time-measured phylogenies.Installing BEAST on Windows · Tutorials · Molecular Clocks · Installing BEAST
  116. [116]
    Phylo-rs: an extensible phylogenetic analysis library in rust
    Jul 29, 2025 · Phylo-rs focuses on the efficient and convenient deployment of software aimed at large-scale phylogenetic analysis and inference. Scalability ...
  117. [117]
    Harnessing machine learning to guide phylogenetic-tree search ...
    Mar 31, 2021 · Reviews & Analysis ... Thus, all current algorithms for phylogenetic tree reconstruction use various heuristics to make tree inference feasible.
  118. [118]
    The Newick tree format - GitHub Pages
    The Newick Standard for representing trees in computer-readable form makes use of the correspondence between trees and nested parentheses.
  119. [119]
    Tree Formats - Evolution and Genomics
    Phylogenetic trees are commonly saved in two formats: Newick and NEXUS. Newick is used by many programs, while NEXUS includes other commands.Missing: specification | Show results with:specification
  120. [120]
    The NEXUS file format - Paul O. Lewis Lab Home
    NEXUS data files always begin with the characters #NEXUS but are otherwise organized into major units known as 'blocks'.
  121. [121]
    NEXUS: AN EXTENSIBLE FILE FORMAT FOR SYSTEMATIC ...
    —The file format ideally en- compasses all information a systematist or phylogenetic biologist might wish to use, including character and distance data, ...
  122. [122]
    phyloXML: XML for evolutionary biology and comparative genomics
    Oct 27, 2009 · Here we describe phyloXML, a new standardized format for phylogenetic documents that is based on the formal language of XML [11] and which is ...
  123. [123]
    NeXML: Rich, Extensible, and Verifiable Representation of ...
    The phylogenetics community has recognized XML as a potential data syntax: for example, in Inferring Phylogenies, Felsenstein makes some XML syntax suggestions ...
  124. [124]
    NeXML — ETE Toolkit - analysis and visualization of trees
    NeXML(http://nexml.org) is an exchange standard for representing phyloinformatic data inspired by the commonly used NEXUS format, but more robust and easier to ...
  125. [125]
    Sharing and re-use of phylogenetic trees (and associated data) to ...
    Oct 22, 2012 · Recently, various evolution-related journals adopted policies to encourage or require archiving of phylogenetic trees and associated data.
  126. [126]
    Best Practices for Data Sharing in Phylogenetic Research - PMC
    Jun 19, 2014 · In this paper, we provide ten “simple rules” that we view as best practices for data sharing in phylogenetic research.
  127. [127]
    Reconstructing trees: Cladistics - Understanding Evolution
    in other words, a method of reconstructing evolutionary trees.A step by step method · Parsimony · A simple example
  128. [128]
    23.3: Systematics and Classification - Biology LibreTexts
    Dec 3, 2021 · A phylogenetic tree is a diagram used to reflect evolutionary relationships among organisms or groups of organisms.<|separator|>
  129. [129]
    Phylogenetic Systematics
    While classification is primarily the creation of names for groups, systematics goes beyond this to elucidate new theories of the mechanisms of evolution.Missing: development | Show results with:development
  130. [130]
    Using trees for classification - Understanding Evolution - UC Berkeley
    Phylogenetic classification uses trees to name clades based on evolutionary history, unlike Linnaean which ranks groups, and does not attempt to rank organisms.
  131. [131]
    2.4 Phylogenetic Trees and Classification - Digital Atlas of Ancient Life
    On a phylogenetic tree, a monophyletic group includes a node and all of the descendants of that node, represented by both nodes and terminal taxa. Thus, a ...
  132. [132]
    Phylogeny, Taxonomy, and Nomenclature - a Primer - AmphibiaWeb
    Phylogenetic trees show the hypothesized relationships among species as a branching pattern of ancestors and descendants. In these trees, lineages (branches) ...
  133. [133]
    7.7: Phylogeny and Cladistics - Biology LibreTexts
    Sep 24, 2022 · Phylogenetic trees are diagrams used to reflect evolutionary relationships among organisms or groups of organisms.
  134. [134]
    Phylogenomic comparative methods: Accurate evolutionary ... - PNAS
    Phylogenetic comparative methods allow for the study of trait evolution between species by accounting for their shared evolutionary history.
  135. [135]
    Ancestral Reconstruction - PMC - PubMed Central
    Ancestral reconstruction is the extrapolation back in time from measured characteristics of individuals (or populations) to their common ancestors.
  136. [136]
    Reconstructing ancestral character states: a critical reappraisal
    Using parsimony to reconstruct ancestral character states on a phylogenetic tree has become a popular method for testing ecological and evolutionary hypotheses.
  137. [137]
    Felsenstein's "Phylogenies and the Comparative Method" - PubMed
    Apr 23, 2019 · Independent contrasts enabled comparative biologists to avoid the statistical dilemma of nonindependence of species values, arising from shared ...
  138. [138]
    Phylogenetically independent contrasts - Mike's Biostatistics Book
    Felsenstein (1985, 1988), largely credited for making the argument that Type I error likely if phylogeny ignored, and, importantly, provided an algorithm, ...
  139. [139]
    4.2: Estimating Rates using Independent Contrasts
    Feb 19, 2022 · Independent contrasts summarize the amount of character change across each node in the tree, and can be used to estimate the rate of character change across a ...
  140. [140]
  141. [141]
    Phylogenetic comparative methods improve the selection of ... - Nature
    Oct 22, 2019 · Phylogenetic comparative methods help to evaluate the convergent evolution of a given morphological character, thus enabling the discovery of ...Results · Character Evolution · Discussion<|separator|>
  142. [142]
    Phylogenetic comparative methods - ScienceDirect
    May 8, 2017 · Phylogenetic comparative methods (PCMs) enable us to study the history of organismal evolution and diversification.
  143. [143]
    Phylogenomics - PubMed
    Phylogenomics aims at reconstructing the evolutionary histories of organisms taking into account whole genomes or large fractions of genomes.
  144. [144]
    Next-generation phylogenomics | Biology Direct | Full Text
    Jan 22, 2013 · Phylogenomics – the study of evolutionary relationships based on comparative analysis of genome-scale data – has so far been developed as ...<|separator|>
  145. [145]
    Large-scale reconstruction and phylogenetic analysis of metabolic ...
    Phylogenetic analysis of the seed sets reveals the complex dynamics governing gain and loss of seeds across the phylogenetic tree and the process of transition ...
  146. [146]
    Phylogenomics — principles, opportunities and pitfalls of big‐data ...
    Dec 16, 2019 · Phylogenetics is the science of reconstructing the evolutionary history of life on Earth. Traditionally, phylogenies were constructed using morphological data ...<|control11|><|separator|>
  147. [147]
    A simple, fast, and accurate method of phylogenomic inference - PMC
    In this paper, we introduce AMPHORA (a pipeline for AutoMated PHylogenOmic infeRence) and demonstrate two significant applications: building a genome tree from ...
  148. [148]
    Accurate, scalable, and fully automated inference of species trees ...
    we developed ROADIES—an automated, scalable, and user-friendly tool that infers species trees directly from genome assemblies ...
  149. [149]
    Recent progress on methods for estimating and updating large ...
    Aug 22, 2022 · Large-scale phylogeny estimation presents substantial computational and statistical challenges: the most accurate methods are often ...Introduction · Recent advances in species... · Recent advances in updating...
  150. [150]
    The challenge of constructing large phylogenetic trees - ScienceDirect
    A final challenge is posed by the difficulty of visualizing and making inferences from trees that might soon routinely contain thousands of species.
  151. [151]
    Recent progress on methods for estimating and updating large ...
    Multiple sequence alignment (a precursor to phylogeny estimation) is also challenging, especially on large datasets that have high rates of evolution.
  152. [152]
    Inference of phylogenetic trees directly from raw sequencing reads ...
    Apr 20, 2023 · We present Read2Tree, which directly processes raw sequencing reads into groups of corresponding genes and bypasses traditional steps in phylogeny inference.
  153. [153]
    A Practical Guide to Design and Assess a Phylogenomic Study
    A practical step-by-step guide that can be easily followed by nonexperts and phylogenomic novices in order to assess the technical robustness of phylogenomic ...
  154. [154]
    Reconstructing patterns of reticulate evolution in plants - PMC - NIH
    Phylogenetic trees are the main tool for representing evolutionary relationships among biological entities at the level of species and above. Biologists, ...
  155. [155]
    Impact of Reticulate Evolution on Genome Phylogeny
    Abstract. Genome phylogenies are used to build tree-like representations of evolutionary relationships among genomes. However, in condensing the phylogenet.
  156. [156]
    Inferring Horizontal Gene Transfer - PMC - NIH
    (2) Phylogenetic approaches rely on the differences between genes and species tree evolution that result from HGT. Explicit phylogenetic methods reconstruct ...
  157. [157]
    The impact of long-distance horizontal gene transfer on prokaryotic ...
    Horizontal gene transfer (HGT) is one of the most dominant forces molding prokaryotic gene repertoires. These repertoires can be as small as ≈200 genes in ...<|separator|>
  158. [158]
    impact of HGT on phylogenomic reconstruction methods
    Aug 18, 2012 · We test the accuracy of supertree and supermatrix approaches in recovering the true organismal phylogeny under increased amounts of horizontally transferred ...
  159. [159]
    Reticulate evolution: Detection and utility in the phylogenomics era
    We present a brief overview of a phylogenomic workflow for inferring organismal histories and compare methods for distinguishing modes of reticulate evolution.Review · 2. Overview Of A... · 3. Reticulate Evolution...
  160. [160]
    Pervasive incomplete lineage sorting illuminates speciation and ...
    Jun 2, 2023 · Incomplete lineage sorting generates gene trees that are incongruent with the species tree. Incomplete lineage sorting has been described in ...
  161. [161]
    The Prevalence and Impact of Model Violations in Phylogenetic ...
    These results suggest that the extent and effects of model violation in phylogenetics may be substantial. They highlight the importance of testing for model ...
  162. [162]
    When phylogenetic assumptions are violated: base compositional ...
    Jun 7, 2010 · When an analysis of large molecular datasets results in unexpected relationships, it often reflects violation of phylogenetic assumptions, ...
  163. [163]
    How Well Does Your Phylogenetic Model Fit Your Data?
    However, within the phylogenetic context most of these assumptions are often violated. We know that independence of sites is rarely given considering the ...
  164. [164]
    Confronting Sources of Systematic Error to Resolve Historically ...
    Systematic errors result from an inadequate modeling of methodological factors (e.g., incorrect model selection, poor orthology inference, biased taxon or gene ...
  165. [165]
    Long Branch Attraction Biases in Phylogenetics - Oxford Academic
    Feb 2, 2021 · Long branch attraction (LBA) is a prevalent form of bias in phylogenetic estimation but the reasons for it are only partially understood.Abstract · Small Sample Biases of Tree... · Partitioning Results and Bias...
  166. [166]
    Article Systematic errors in orthology inference and their effects on ...
    Feb 19, 2021 · Our results show that the errors in orthology are far from random but are strongly correlated with phylogenetic relationships of the species in ...
  167. [167]
    Variation Across Mitochondrial Gene Trees Provides Evidence for ...
    Increasing data set size may reduce stochastic error, but it can also exacerbate systematic error and lead to high confidence in erroneous phylogenies ( ...
  168. [168]
    Extensive Genome-Wide Phylogenetic Discordance Is Due to ...
    It is well known that identifying the genomic footprints of gene flow is difficult in the presence of incomplete lineage sorting (ILS) among the diversifying ...
  169. [169]
    Incomplete lineage sorting and hybridization underlie tree ...
    We inferred that gene tree discordance within genera is linked to hybridization events along with high levels of ILS due to their rapid diversification.Incomplete Lineage Sorting... · 3. Results · 4. DiscussionMissing: phylogenetic | Show results with:phylogenetic
  170. [170]
    Plesiomorphic character states cause systematic errors in molecular ...
    Jul 20, 2015 · The systematic errors disappear for the most part when the sites with symplesiomorphies supporting false clades are deleted from the data set.<|separator|>
  171. [171]
    [PDF] The Sources of Phylogenetic Conflicts - HAL
    Apr 10, 2020 · In order to avoid strongly supported but incorrect inferences driven by systematic error, use of appropriate phylogenetic methods accounting for ...
  172. [172]
    Bayesian selection of misspecified models is overconfident and may ...
    Feb 5, 2018 · Bayesian selection of misspecified models is overconfident and may cause spurious posterior probabilities for phylogenetic trees | PNAS.
  173. [173]
    [PDF] Phylogenetic Networks: Modeling, Reconstructibility, and Accuracy
    Abstract—Phylogenetic networks model the evolutionary history of sets of organisms when events such as hybrid speciation and horizontal gene transfer occur.
  174. [174]
    Reconstructible Phylogenetic Networks: Do Not Distinguish the ...
    Apr 7, 2015 · Phylogenetic networks represent the evolution of organisms that have undergone reticulate events, such as recombination, hybrid speciation ...
  175. [175]
    “Normal” phylogenetic networks may be emerging as the leading class
    Aug 14, 2025 · 3. Phylogenetic network representations of a horizontal gene transfer (HGT) on the left, and a hybridization, on the right.
  176. [176]
    A Phylogenetic Networks perspective on reticulate human evolution
    Apr 23, 2021 · We present a methodological phylogenetic reconstruction approach combining Maximum Parsimony and Phylogenetic Networks methods for the study of human evolution.
  177. [177]
    Application of Phylogenetic Networks in Evolutionary Studies
    This article reviews the terminology used for phylogenetic networks and covers both split networks and reticulate networks, how they are defined, and how they ...Abstract · Introduction · Terminology · Background
  178. [178]
    [PDF] Reconstructing Reticulate Evolution in Species – Theory and Practice
    Our first method is a polynomial time algorithm for constructing phylogenetic networks from two gene trees contained inside the network. We allow the network to ...
  179. [179]
    How Much Information is Needed to Infer Reticulate Evolutionary ...
    Abstract. Phylogenetic networks are a generalization of evolutionary trees and are an important tool for analyzing reticulate evolutionary histories.Abstract · Phylogenetic Trees And... · Main Results
  180. [180]
    Phylogenetic networks empower biodiversity research - PNAS
    Jul 28, 2025 · Reticulate evolution has long been recognized as a key mechanism that contributes to genetic and trait diversity.
  181. [181]
    Reticulate evolutionary history and extensive introgression in ... - NIH
    In this work, we reanalyse the Anopheles data using a recently devised framework that combines the multispecies coalescent with phylogenetic networks.
  182. [182]
    Ultrafast learning of four-node hybridization cycles in phylogenetic ...
    Here, we introduce a novel method to reconstruct phylogenetic networks based on algebraic invariants.Abstract · Introduction · Materials and methods · Results
  183. [183]
    Phylogenetic supertrees: Assembling the trees of life - ScienceDirect
    Supertrees are estimates of phylogeny assembled from sets of smaller estimates (source trees) sharing some but not necessarily all their taxa in common.
  184. [184]
    Polynomial Supertree Methods Revisited - PMC
    Supertree methods allow to reconstruct large phylogenetic trees by combining smaller trees with overlapping leaf sets into one, more comprehensive supertree ...
  185. [185]
    Speeding up iterative applications of the BUILD supertree algorithm
    The Build algorithm is a recursive algorithm that determines if a set of rooted triplets of the form x, y|•z are jointly compatible (Aho et al., 1981). Here, we ...
  186. [186]
    Robinson-Foulds Supertrees - PMC - PubMed Central
    Supertree methods based on the well established Robinson-Foulds (RF) distance have the potential to build supertrees that retain much information from the input ...
  187. [187]
    Supertrees join the mainstream of phylogenetics - ScienceDirect
    Supertree methods are fairly widely used to build comprehensive phylogenies for particular groups, but concerns remain over the adequacy of existing approaches.
  188. [188]
    Properties of Consensus Methods for Inferring Species Trees ... - NIH
    Consensus trees are used to summarize a set of trees defined on the same set of taxa. A consensus algorithm takes the trees as input, so that the method of ...
  189. [189]
    [PDF] Consensus trees and tree support - UMD Geology
    A common practice is to combine trees through consensus methods of different taxa inhabiting the same areas of the world to check for congruence and infer ...
  190. [190]
    Visualizing incompatibilities in phylogenetic trees using consensus ...
    May 31, 2023 · A consensus tree is often used to summarize what the trees have in common. Consensus networks were introduced to also allow the visualization of ...
  191. [191]
    Do we still need supertrees? - BMC Biology
    Feb 27, 2012 · In the supertree approach, phylogenetic trees are reconstructed for each of the five genes. The resulting source trees are recoded using the ...
  192. [192]
    Performance of flip supertree construction with a heuristic algorithm
    Supertree methods are used to assemble separate phylogenetic trees with shared taxa into larger trees (supertrees) in an effort to construct more ...
  193. [193]
  194. [194]
    A supertree pipeline for summarizing phylogenetic and taxonomic ...
    Mar 1, 2017 · We present a new supertree method that enables rapid estimation of a summary tree on the scale of millions of leaves.
  195. [195]
    Embedding gene trees into phylogenetic networks by conflict ...
    May 19, 2022 · Phylogenetic networks are mathematical models of evolutionary processes involving reticulate events such as hybridization, recombination, ...
  196. [196]
    (PDF) The Coral of Life - ResearchGate
    This paper is supplemented with a figure, The Coral of Life (CoL), which is, to the author's knowledge, the first attempt to combine all of the above features ...
  197. [197]
    Branching Silhouettes—Corals, Cacti, and the Oaks - Oxford Academic
    Mar 11, 2017 · The tree of life should perhaps be called the coral of life, base of branches dead; so that passages cannot be seen. Darwin (1837–1838): ...
  198. [198]
    Intertwining phylogenetic trees and networks - Schliep - 2017
    Mar 7, 2017 · Visually compare trees and networks by identifying shared or exclusive branches or edges between trees and networks constructed from the same dataset.
  199. [199]
    Integrated phylogenomic analyses unveil reticulate evolution in ...
    Oct 28, 2022 · Integrated phylogenomic analyses unveil reticulate evolution in Parthenocissus (Vitaceae), highlighting speciation dynamics in the Himalayan–Hengduan Mountains.
  200. [200]
    EasyCGTree: a pipeline for prokaryotic phylogenomic analysis ...
    Oct 14, 2023 · In this study, we introduced EasyCGTree, which is a user-friendly and cross-platform Perl-language (https://www.perl.org/) tool, for ...<|separator|>
  201. [201]
    Efficient phylogenetic tree inference for massive taxonomic datasets
    Aug 8, 2024 · The Nexus format allows for storing sequences and the initial tree within a single file. ... phylogenetic tree imposes significant ...<|separator|>
  202. [202]
    Rethinking large-scale phylogenomics with EukPhylo v.1.0, a ...
    Aug 27, 2025 · EukPhylo v.1.0 is a flexible and modular pipeline that enables efficient phylogenomic analysis of eukaryotes and includes phylogeny-informed ...Missing: big | Show results with:big
  203. [203]
    Four new pipelines to streamline and improve genomic analyses
    Sep 17, 2024 · A new turn-key pipeline called OrthoPhyl has answered the call to improve the phylogenetic analysis of bacterial genomes. Developed by ...
  204. [204]
    Integrating morphology and phylogenomics supports a terrestrial ...
    Jan 29, 2019 · Tree topology integrates over a number of historical phylogenies, based on morphological data and/or Sanger-sequenced loci. (Right) Phylogenomic ...
  205. [205]
    Do morphometric data improve phylogenetic reconstruction? A ...
    Oct 18, 2024 · The inclusion of continuous morphological data did not improve phylogeny reconstruction in terms of the number of resolved nodes and their ...
  206. [206]
    Integrating Deep Learning Derived Morphological Traits and ...
    In this paper, we explore combining molecular data with deep learning derived morphological traits from images of pinned insects to generate total-evidence ...Our Contributions · Molecular Data · Results<|separator|>
  207. [207]
    Protein Structural Phylogenetics | Genome Biology and Evolution
    Aug 21, 2025 · Protein Structural Phylogenetics is a branch of molecular evolution that examines evolutionary relationships using the 3D structure of proteins.Missing: phenotypic | Show results with:phenotypic
  208. [208]
  209. [209]
    multistrap: boosting phylogenetic analyses with structural information
    Jan 15, 2025 · Phylogenetic tree topologies are only considered conclusive when complemented with branch support values—commonly estimated by Felsenstein ...
  210. [210]
    [PDF] Applications of Machine Learning in Phylogenetics - EcoEvoRxiv
    Supervised machine learning approaches that rely on simulated training data have been used to infer tree topologies and branch lengths, to select substitution ...<|separator|>
  211. [211]
    Applications of machine learning in phylogenetics - ScienceDirect.com
    Machine learning approaches have been applied to substitution model selection and inferences of discordance, introgression, and diversification rates.
  212. [212]
    Phylogenetic inference using generative adversarial networks - PMC
    We developed phyloGAN, a GAN that infers phylogenetic relationships among species. phyloGAN takes as input a concatenated alignment, or a set of gene alignments ...
  213. [213]
    Fusang: a framework for phylogenetic tree inference via deep learning
    Oct 11, 2023 · Recently, deep learning (DL) has been successfully applied to quartet phylogenetic tree inference and tentatively extended into more sequences ...
  214. [214]
    Accurate and efficient phylogenetic inference through end ... - bioRxiv
    Oct 2, 2025 · Tree decoder: Tree decoder aims to decode a phylogenetic tree from the representations of species calculated by sequence encoder. Tree decoder ...
  215. [215]
    Phylogenetic Reconstruction Using Reinforcement Learning
    This study illustrates the potential of reinforcement learning in addressing the challenges of phylogenetic tree reconstruction.Introduction · Results · Discussion · Materials and Methods
  216. [216]
    PhyloTune: An efficient method to accelerate phylogenetic updates ...
    Jul 26, 2025 · In this study, we introduce a new solution to accelerate the integration of novel taxa into an existing phylogenetic tree using a pretrained DNA ...
  217. [217]
    Parameter Estimation from Phylogenetic Trees Using Neural ...
    Sep 3, 2025 · In cases where MLE is unavailable, our neural network method provides a promising alternative for estimating phylogenetic tree parameters. If ...
  218. [218]
    [PDF] A critical evaluation of deep-learning based phylogenetic inference ...
    Jan 15, 2025 · Recent efforts have aimed to apply deep neural networks. (DNNs) to phylogenetics, with a growing number of applications in tree reconstruction ( ...
  219. [219]
    [PDF] An Accurate Deep Learning Framework for Phylogenetic Tree ...
    Apr 8, 2025 · These predicted distances are then used in NJ to reconstruct phylogenetic trees. Phyloformer inte- grates information from the entire MSA ...