
Phylogenetics

Phylogenetics is the scientific discipline dedicated to reconstructing the evolutionary history and interrelationships among biological taxa, such as species, populations, or genes, by analyzing shared derived characteristics, with a primary focus on genetic and morphological data to infer patterns of descent from common ancestors. The field employs computational methods to generate phylogenetic trees, branching diagrams that hypothesize the sequence and timing of evolutionary divergences, enabling insights into evolutionary processes.

Key methodologies include distance-based approaches, which compute evolutionary distances from sequence similarities; maximum parsimony, which seeks the tree requiring the fewest evolutionary changes; maximum likelihood, which evaluates trees under probabilistic models of sequence evolution; and Bayesian inference, which combines prior probabilities with the likelihood to estimate posterior distributions of trees. These techniques have evolved significantly since the mid-20th century, shifting from morphological comparisons to molecular data following the elucidation of DNA structure in 1953 and the advent of sequencing technologies. A landmark result was Carl Woese's 1977 proposal, based on ribosomal RNA analyses, of the division of life that later became the three-domain system (Bacteria, Archaea, Eukarya).

Phylogenetics underpins comparative biology by providing a framework for identifying homologous traits, predicting functional similarities, and tracing pathogen evolution, with applications in conservation genetics, drug development, and infectious disease tracking. Despite methodological advances, challenges persist, including incomplete lineage sorting and long-branch attraction artifacts that can mislead tree topologies, so robust inference depends on careful model validation.

Fundamentals

Definition and Principles

Phylogenetics is the branch of biology concerned with reconstructing the evolutionary history and relationships among organisms, populations, or genes based on shared heritable characteristics, such as molecular sequences or morphological traits. These relationships are inferred from patterns of similarity and divergence, on the assumption that closer relatives share more recent common ancestors, and are commonly visualized as phylogenetic trees: diagrams depicting branching sequences of descent. The field emphasizes empirical data over speculative narratives, prioritizing observable traits that reflect shared history rather than functional convergence alone.

Central to phylogenetics are the principles of common descent and branching evolution (cladogenesis), which posit that lineages diverge through speciation events, forming hierarchical clusters of monophyletic groups (clades) defined by shared derived traits (synapomorphies) inherited from a last common ancestor. Homology, similarity due to shared ancestry, is distinguished from homoplasy (for example, convergent or parallel evolution), and inference methods seek trees that minimize ad hoc assumptions of homoplasy when explaining the observed data. Outgroups, taxa presumed to lie outside the ingroup of interest, help polarize characters by identifying ancestral states, enabling rooted phylogenies whose branches are oriented in time.

These principles underpin hypothesis testing in phylogenetics, where multiple trees are evaluated against data using criteria like parsimony (favoring the tree requiring the fewest evolutionary changes) or likelihood (maximizing the probability of observing the data under a specified model of character evolution). While early approaches relied on morphological evidence, modern phylogenetics increasingly incorporates molecular data, such as DNA or protein sequences, to resolve deep divergences, though both require rigorous homology assessment (alignment, for sequences) and correction for rate heterogeneity to avoid artifacts like long-branch attraction.

Validity rests on falsifiability: a phylogenetic hypothesis makes testable predictions, such as the presence of shared traits in undiscovered relatives or congruence across independent datasets.
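The parsimony criterion described above can be made concrete with a small sketch. The code below counts the minimum number of state changes each candidate tree needs, using Fitch's algorithm for unordered characters on a fixed rooted binary tree. The function names, the nested-tuple tree encoding, and the four-taxon alignment are illustrative assumptions, not part of any particular software package.

```python
def fitch_changes(tree, states):
    """Minimum number of state changes for one character on a rooted binary tree.

    tree   -- nested 2-tuples whose leaves are taxon names (hypothetical encoding)
    states -- dict mapping each taxon name to its character state
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):          # leaf: its state set is the observed state
            return {states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:                         # children agree on at least one state
            return common
        changes += 1                       # disagreement costs one change
        return left | right

    visit(tree)
    return changes


def parsimony_length(tree, alignment):
    """Sum Fitch change counts over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_changes(tree, {taxon: seq[i] for taxon, seq in alignment.items()})
        for i in range(n_sites)
    )


# Toy data: taxa A and B share states that C and D lack at the first two sites.
alignment = {"A": "AAGT", "B": "AAGC", "C": "GTGC", "D": "GTAC"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

for label, tree in (("tree_1", tree_1), ("tree_2", tree_2)):
    print(label, parsimony_length(tree, alignment))
```

On this toy alignment, tree_1 needs 4 changes and tree_2 needs 6, so parsimony prefers tree_1. A real analysis would also search over tree topologies rather than compare two hand-picked candidates.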

Relation to Systematics and Taxonomy

Phylogenetics provides the methodological foundation for reconstructing evolutionary histories, which directly informs systematics, the broader study of organismal diversity, including patterns of descent and differentiation among taxa. In systematics, phylogenetic trees serve as hypotheses of common ancestry, enabling the identification of monophyletic groups (clades) defined by shared derived characteristics (synapomorphies) rather than overall similarity. This approach contrasts with earlier phenetic methods that prioritized phenotypic resemblance without explicit reference to ancestry, and it has shifted systematics toward evidence-based evolutionary inference.

Taxonomy, the practice of naming, describing, and classifying organisms into hierarchical categories, increasingly relies on phylogenetic data to ensure that classifications reflect evolutionary relationships and minimize paraphyletic or polyphyletic groupings. For instance, molecular phylogenetics has prompted revisions in taxonomic ranks, such as reclassifying protists and fungi based on genomic evidence of deep divergences, so that Linnaean categories align with branching patterns in the tree of life. Willi Hennig's foundational work in Phylogenetic Systematics (original German 1950; English 1966) established monophyly as the standard, arguing that only groups comprising a common ancestor and all of its descendants warrant taxonomic recognition, a principle now integral to codes like the PhyloCode.

Despite this integration, challenges persist: incomplete taxon sampling or conflicting data (e.g., from morphology versus molecules) can lead to unstable classifications, underscoring the probabilistic nature of phylogenetic inference. Integrative taxonomy, combining phylogenetic with ecological and morphological data, addresses these problems by prioritizing robust clades over rigid ranks, as seen in recent fungal phylogenies resolving major phyla like Ascomycota and Basidiomycota. This reflects phylogenetics' causal emphasis on descent with modification, privileging empirical trees over pre-Darwinian typological schemes.

Methods of Inference

Data Sources and Preparation

Phylogenetic analyses primarily rely on two categories of data: morphological and molecular. Morphological data consist of discrete or continuous traits derived from organismal structures, such as anatomical features, fossil imprints, or developmental patterns, which are coded into character states for analysis. These data have historically underpinned systematics but are prone to subjective homology assessments and limited by preservation biases in fossils. Molecular data, encompassing nucleotide sequences from DNA (e.g., mitochondrial or nuclear genes), amino acid sequences from proteins, and large-scale genomic datasets like whole-genome assemblies or transcriptomes, dominate contemporary phylogenetics because of their abundance, reproducibility, and capacity to resolve deep divergences through phylogenomics, which integrates thousands of loci. For instance, 16S rRNA genes serve as standard markers for microbial phylogenies, while multi-locus datasets enable species-tree inference that avoids the biases of simple concatenation. Hybrid approaches combining both data types can improve congruence and help diagnose conflicts arising from incomplete lineage sorting or convergent evolution.

Data preparation begins with collection and curation to ensure data quality. Molecular sequences are typically retrieved from public repositories like GenBank or Ensembl, or generated via high-throughput sequencing, followed by orthology identification using tools such as OrthoMCL or reciprocal BLAST searches to select paralog-free loci across taxa. Repetitive elements and low-complexity regions are masked (e.g., via RepeatMasker), and genomes are annotated for gene content to facilitate ortholog extraction. Morphological data preparation involves selecting informative characters, scoring them as binary (presence/absence) or multistate, and mitigating ascertainment bias, for example by recording whether autapomorphies and invariant characters were excluded so that the analysis can account for it.

A critical step for molecular data is multiple sequence alignment (MSA), which posits positional homology by arranging residues in columns so as to minimize gaps and mismatches. Progressive alignment algorithms, such as those in MAFFT (whose FFT-NS-2 strategy favors speed, with iterative strategies such as L-INS-i favoring accuracy) or MUSCLE (with iterative refinement), are standard and can align thousands of sequences in hours on modern hardware. For divergent sequences, profile-based methods that incorporate secondary structure predictions can improve accuracy. Post-alignment preparation includes trimming ambiguous ends (e.g., via trimAl), removing poorly aligned or highly variable sites to reduce noise (gap thresholds are often set at 20-50%), and filtering recombinant or saturated sites using tests like PhiPack or pairwise distances. In phylogenomics, data are partitioned by gene or codon position to account for rate heterogeneity, and some robust methods tolerate up to roughly 50% missing data. Morphological matrices undergo similar scrutiny, with missing or inapplicable states scored explicitly to avoid spurious signal. These steps mitigate artifacts like alignment-induced long-branch attraction, helping datasets support reliable tree inference.
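As a rough illustration of the gap-threshold filtering described above, the sketch below drops alignment columns whose gap fraction exceeds a cutoff. It is a simplified stand-in for what dedicated trimming tools do, not a reimplementation of any of them; the function name, the `max_gap_fraction` parameter, and the toy alignment are assumptions made for this example.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.5, gap_chars="-?"):
    """Return a copy of the alignment without columns that are mostly gaps.

    alignment        -- dict mapping taxon name to an aligned sequence string
    max_gap_fraction -- columns with a higher gap fraction are removed
                        (hypothetical default, within the 20-50% range in the text)
    gap_chars        -- characters counted as gaps or missing data
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all sequences in an alignment must have equal length")

    keep = [
        i for i in range(length)
        if sum(seq[i] in gap_chars for seq in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in zip(names, seqs)}


toy = {"t1": "ACG-T", "t2": "A-G-T", "t3": "ACGAT", "t4": "AC--T"}
print(trim_gappy_columns(toy, max_gap_fraction=0.5))
```

With the 0.5 cutoff only the fourth column (three gaps out of four sequences) is removed; a stricter cutoff such as 0.2 would also remove the second and third columns. Which threshold is appropriate depends on the dataset and downstream method.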

Tree Construction Algorithms

Phylogenetic tree construction algorithms infer evolutionary relationships among taxa by optimizing specific criteria based on molecular or morphological data. These methods fall broadly into distance-based approaches, which summarize pairwise dissimilarities into a distance matrix before clustering, and character-based approaches, which directly evaluate evolutionary changes at individual sites. Distance-based methods are computationally efficient and suit large datasets, but they assume additive distances and can be sensitive to rate heterogeneity; character-based methods, particularly the model-based ones, offer greater statistical rigor at a higher computational cost.

Distance-based algorithms begin by converting aligned sequences into a pairwise distance matrix using metrics such as the Jukes-Cantor model for nucleotide substitutions, which corrects for multiple hits. The Unweighted Pair Group Method with Arithmetic Mean (UPGMA), developed in 1958, builds ultrametric trees by iteratively merging the pair of taxa or clusters with the smallest average distance, assuming a constant evolutionary rate (a molecular clock) across lineages; this assumption often causes inaccuracies when rates vary, because it forces all tips to be equidistant from the root. In contrast, Neighbor-Joining (NJ), introduced by Saitou and Nei in 1987, relaxes the clock assumption by joining, at each step, the pair of taxa with the smallest corrected distance Q_{ij} = (n-2)D_{ij} - \sum_k D_{ik} - \sum_k D_{jk}, where n is the number of taxa and D_{ij} is the distance between taxa i and j, a criterion that approximately minimizes total branch length. NJ produces additive trees and performs well under unequal rates but remains sensitive to distance estimation errors. Both methods require the full O(n^2) distance matrix, and the standard NJ algorithm runs in O(n^3) time, but they remain fast enough for rapid exploratory analyses.

Character-based methods evaluate trees directly from aligned character states, such as nucleotides or amino acids, without intermediate distance summarization. Maximum Parsimony (MP) seeks the tree requiring the fewest evolutionary changes (steps) to explain the data, formalized by Cavalli-Sforza and Edwards in 1967 and popularized by Farris in the 1970s; it employs branch-and-bound or heuristic searches like stepwise addition to navigate the vast tree space, but long-branch attraction can bias results toward artifactually grouping fast-evolving taxa, because MP lacks explicit models of substitution rates or multiple hits. Maximum Likelihood (ML), rooted in Neyman's 1971 framework and advanced by Felsenstein in 1981, maximizes the probability of observing the data under a specified stochastic model (e.g., GTR + Γ for rate heterogeneity); the likelihood of each candidate tree is computed with Felsenstein's pruning algorithm, and tree space is explored with hill-climbing heuristics. ML accounts for among-site rate variation and branch lengths, yielding more robust inferences but requiring intensive computation, often mitigated by parallelization in software like RAxML, which has made analyses of more than 1,000 taxa practical. Simulations generally show ML outperforming parsimony under complex models, though both can converge on incorrect topologies if taxon sampling misses the intermediates needed to break long branches.

Bayesian methods extend ML by incorporating prior probabilities on topologies, branch lengths, and model parameters, sampling the posterior distribution P(\theta | D) \propto P(D | \theta) P(\theta) via Markov chain Monte Carlo (MCMC), as implemented in MrBayes since 2001. This yields credible sets of trees with posterior probabilities for clades, avoiding single-point estimates, but chains must often run for millions of generations to achieve convergence, assessed by metrics like an average standard deviation of split frequencies below 0.01. BEAST, released in 2002, integrates relaxed clock models for time-calibrated trees, enabling divergence time estimates from fossil-calibrated priors. These probabilistic approaches mitigate overfitting in sparse data but risk bias from subjective priors, with empirical studies favoring them for resolving polytomies in ancient divergences.

Overall, algorithm choice depends on dataset scale and model fit, with hybrid approaches like quartet-based methods emerging for divide-and-conquer efficiency in supermatrices exceeding 10,000 taxa.
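The distance-based steps described in this section can be sketched in a few lines: a Jukes-Cantor correction for a pair of aligned sequences, and the Neighbor-Joining Q criterion applied to a distance matrix. This is a minimal illustration under stated assumptions, not a full NJ implementation (it only picks the first pair to join); the function names and the five-taxon matrix are chosen for the example.

```python
import math


def jc69_distance(seq_a, seq_b):
    """Jukes-Cantor distance: d = -(3/4) * ln(1 - (4/3) * p), with p the
    proportion of differing sites among positions where both are A, C, G or T."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a in "ACGT" and b in "ACGT"]
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        raise ValueError("sequences are saturated; JC69 distance is undefined")
    return -0.75 * math.log(1 - 4 * p / 3)


def nj_q_matrix(D):
    """Q_ij = (n - 2) * D_ij - sum_k D_ik - sum_k D_jk for a symmetric matrix D."""
    n = len(D)
    row_sums = [sum(row) for row in D]
    return [
        [(n - 2) * D[i][j] - row_sums[i] - row_sums[j] if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def first_nj_pair(D):
    """Indices of the pair NJ would join first (smallest Q value)."""
    Q = nj_q_matrix(D)
    n = len(D)
    candidates = ((i, j) for i in range(n) for j in range(i + 1, n))
    return min(candidates, key=lambda ij: Q[ij[0]][ij[1]])


# Illustrative additive distance matrix for five taxa a, b, c, d, e.
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(first_nj_pair(D))
print(jc69_distance("ACGTACGTAC", "ACGTACGTAT"))
```

In this matrix the smallest raw distance is between d and e (3), which is the pair UPGMA would merge first, yet the Q criterion selects a and b (Q = -50 versus -48 for d and e). That difference is the rate correction at work: Q penalizes pairs that are close to each other only because both sit near everything else.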

Model Selection and Evaluation

In phylogenetics, model selection entails identifying the substitution model that most adequately describes the evolutionary process underlying the data, as this directly influences the accuracy of tree inference under likelihood-based methods such as maximum likelihood (ML) and Bayesian approaches. Common models range from simple ones like Jukes-Cantor (JC69), which assumes equal substitution rates, to complex ones like the general time-reversible (GTR) model with gamma-distributed rate variation (+G) and invariant sites (+I). Selection is typically guided by information-theoretic criteria that balance model fit and complexity. The Akaike information criterion (AIC) is computed as AIC = -2 ln L + 2k, where ln L is the maximized log-likelihood and k is the number of free parameters; its comparatively light penalty can favor better-fitting models even when they are more heavily parameterized. The Bayesian information criterion (BIC) uses BIC = -2 ln L + k ln n, imposing a stronger penalty on parameters as alignment length n grows and thus often selecting simpler models. Hierarchical likelihood ratio tests (hLRT) compare nested models via likelihood ratios, while tools like ModelFinder automate the comparison efficiently. Empirical studies show that AIC tends toward overparameterization compared to BIC, but both outperform arbitrary model choice, though recent analyses question the necessity of exhaustive selection when starting from highly parameterized models like GTR+G+I.

Model adequacy is assessed after selection using frequentist tests or Bayesian posterior predictive checks to detect misspecification, such as unmodeled rate heterogeneity, which can bias branch lengths and topologies. For protein sequences, models incorporate empirical exchange matrices (e.g., WAG, LG) alongside site-specific rate profiles. Software implementations like IQ-TREE and ModelTest-NG automate selection, with comparisons revealing minor differences in chosen models across programs but consistent impacts under misspecification.
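The two criteria are straightforward to compute once each candidate model has been fitted. The sketch below applies the AIC and BIC formulas given above to three models; the log-likelihood values and the alignment length are invented for illustration, and the parameter counts cover substitution-model parameters only (branch lengths are omitted on the assumption that every model is fitted on the same tree, so they would add the same constant to each score).

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion: -2 ln L + 2k."""
    return -2 * log_likelihood + 2 * k


def bic(log_likelihood, k, n):
    """Bayesian information criterion: -2 ln L + k ln n."""
    return -2 * log_likelihood + k * math.log(n)


# Hypothetical fits: model name -> (maximized ln L, free substitution parameters).
fits = {
    "JC69": (-5230.0, 0),    # equal rates, equal base frequencies
    "HKY85": (-5105.0, 4),   # transition/transversion ratio + 3 base frequencies
    "GTR+G": (-5098.0, 9),   # 5 exchange rates + 3 base frequencies + gamma shape
}
n_sites = 1000               # assumed alignment length

best_aic = min(fits, key=lambda m: aic(*fits[m]))
best_bic = min(fits, key=lambda m: bic(*fits[m], n_sites))
print("AIC picks", best_aic)
print("BIC picks", best_bic)
```

With these made-up numbers AIC selects GTR+G (10214 versus 10218 for HKY85), while BIC selects HKY85 because the ln n penalty outweighs the small likelihood gain from the five extra parameters. This mirrors the tendency described above for BIC to prefer simpler models.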
Tree evaluation quantifies support and topological robustness, primarily via non-parametric bootstrapping, which resamples alignment columns with replacement to generate pseudoreplicates, then reports the proportion (0-100%) of replicate trees supporting each clade in the original tree. Values above 70-95% indicate strong support, depending on context, as the bootstrap reflects the sampling variability of the data. Bayesian posterior probabilities (PP), derived from Markov chain Monte Carlo (MCMC) sampling of trees after burn-in, represent the probability of a clade given the data, model, and priors; values exceeding 0.95 are often deemed robust but tend to inflate confidence relative to bootstraps, especially on short branches. Simulation studies suggest that bootstraps provide conservative error estimates, while PP can give overconfident support to incorrect clades under model violation. Additional tools, such as the approximately unbiased (AU) test for topology comparison and alternative branch support scores, extend evaluation beyond any single metric.
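The resampling step of the non-parametric bootstrap can be sketched as follows. Because a complete example would need a tree-inference routine, the code accepts any function that maps an alignment to a set of clades; `closest_pair_clade` is a deliberately crude toy stand-in (it simply groups the two most similar sequences) and is not a real phylogenetic method. All names here are assumptions for the example.

```python
import random
from collections import Counter


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def bootstrap_support(alignment, infer_clades, n_reps=100, seed=0):
    """Percentage of pseudoreplicates recovering each clade of the original estimate.

    infer_clades -- callable returning a set of frozensets of taxon names
                    (placeholder for a real tree-inference method)
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_reps):
        counts.update(infer_clades(bootstrap_alignment(alignment, rng)))
    return {clade: 100.0 * counts[clade] / n_reps for clade in infer_clades(alignment)}


def closest_pair_clade(alignment):
    """Toy stand-in for inference: the pair with the fewest mismatches forms a clade."""
    names = sorted(alignment)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    best = min(pairs, key=lambda p: sum(x != y for x, y in zip(alignment[p[0]], alignment[p[1]])))
    return {frozenset(best)}


toy = {"A": "ACGTACGTAC", "B": "ACGTACGTAT", "C": "ACGAACTTAT", "D": "TCGAACTTGT"}
print(bootstrap_support(toy, closest_pair_clade, n_reps=200, seed=1))
```

In practice `infer_clades` would wrap a full ML, parsimony, or distance analysis, and the same inference settings would be used for the original data and for every pseudoreplicate.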

Effects of Taxon Sampling and Long Branch Attraction

Taxon sampling, encompassing both the number and selection of operational taxonomic units (OTUs) in a phylogenetic analysis, profoundly influences the reliability of inferred evolutionary relationships. Sparse sampling risks omitting key intermediate lineages, which can distort branch lengths and foster topological inaccuracies by failing to capture fine-scale evolutionary divergence. Simulations and empirical datasets, such as those using rbcL gene sequences for seed plants, reveal that increasing taxon density through strategic addition of representatives across clades typically boosts topological accuracy by interrupting long branches and diluting the effects of homoplasy from substitution saturation. However, gains plateau when sequence length remains fixed: the number of possible topologies grows superexponentially with taxon count, so tree searches can become unreliable without a commensurate expansion of the data.

Long branch attraction (LBA), first articulated in the context of parsimony-based inference, manifests as an artifactual clustering of distantly related lineages that exhibit elevated substitution rates. The bias stems from multiple hits at saturated sites, where convergent changes inflate apparent similarities between fast-evolving taxa and override true synapomorphies; under simple models assuming constant rates, such lineages appear erroneously close, because true distances are underestimated proportionally more for longer branches. LBA predominates in parsimony and unpartitioned analyses but persists subtly in model-based approaches lacking rate variation parameters, with simulations showing that the risk of inconsistency escalates when heterogeneous evolutionary rates coincide with sparse ingroup-outgroup contrasts.

The interplay between taxon sampling and LBA is causal: undersampled datasets amplify LBA because branches remain long wherever intermediates are extinct or unsampled, concentrating homoplasy and eroding phylogenetic signal, as seen in analyses of metazoan and fungal phylogenies where poor sampling density obscured relationships such as the position of Porifera. Conversely, targeted dense sampling that prioritizes rate-heterogeneous subclades subdivides problematic branches, restores signal-to-noise ratios, and brings inferences closer to reference trees, though adding taxa without refining the model may not shorten problematic branches enough. Mitigation demands multifaceted strategies: incorporating site-specific rate heterogeneity (e.g., via Γ-distributions or removal of fast-evolving sites), employing likelihood or Bayesian frameworks that are more resilient to rate asymmetry, and validating results via taxon jackknifing or spectral signal analysis to detect attraction-prone quartets. These approaches, validated across multiple datasets including mtDNA, underscore that the prevalence of LBA reflects modeling inadequacies rather than irreducible noise, with dense, representative sampling serving as a foundational corrective.

Historical Development

Early Conceptual Foundations

The conceptual foundations of phylogenetics trace back to the mid-19th century, when naturalists began representing organismal relationships as branching structures indicative of shared ancestry rather than static hierarchies. Charles Darwin's 1837 private notebook contained the first known sketch of a branching evolutionary tree, illustrating divergence from common ancestors through descent with modification. This idea was formalized in his 1859 book On the Origin of Species, where an abstract tree depicted how natural selection could produce the diversity of life from ancestral forms, emphasizing that the affinities among organisms are due to their descent from common progenitors. Before Darwin, figures like Edward Hitchcock proposed tree-like charts in 1840 to organize fossil strata and life forms, but these were non-evolutionary and did not depict descent from common ancestors. In contrast, Darwin's framework introduced causal mechanisms (variation, inheritance, and selection), grounding the tree in empirical observations such as geographical distribution.

German paleontologist Heinrich Georg Bronn, who had worked with pre-Darwinian ideas of progressive development, produced the 1860 German translation of Darwin's work, which influenced subsequent thinkers. Ernst Haeckel advanced these concepts decisively in 1866 with Generelle Morphologie der Organismen, coining the term "phylogeny" (from Greek phylon, meaning tribe or race, and genesis, meaning origin) to denote the evolutionary history and genealogical tree of organisms. Haeckel constructed the first explicitly Darwinian phylogenetic trees, branching from a single root and incorporating embryological and morphological data to reconstruct ancestral lineages, though his reconstructions often blended empirical evidence with speculative scala naturae progressions. These early trees laid the groundwork for viewing classification as a reflection of historical genealogy rather than ideal types, despite the limitations of data and methods that predated genetics.

Rise of Cladistics and Molecular Phylogenetics

Cladistics, formalized by German entomologist Willi Hennig, emphasized reconstructing evolutionary relationships through monophyletic clades defined by shared derived characters (synapomorphies), distinguishing it from earlier evolutionary and phenetic approaches that prioritized overall similarity or ancestral traits. Hennig outlined these principles in his 1950 book Grundzüge einer Theorie der phylogenetischen Systematik, which argued for minimizing ad hoc assumptions about character evolution when building trees. The English translation, Phylogenetic Systematics, published in 1966, facilitated wider adoption amid debates with phenetics, a numerical taxonomy method prominent in the 1950s and 1960s that clustered taxa by overall similarity without inferring ancestry. By the 1970s, cladistics gained prominence through proponents like Lars Brundin, who applied it to insect and biogeographic studies, and through computational tools enabling parsimony analysis of morphological data. This shift challenged evolutionary taxonomy's acceptance of paraphyletic groups, prioritizing testable hypotheses of common descent over subjective weighting of characters. Institutions such as the Willi Hennig Society, founded in 1980, further institutionalized cladistic methods, fostering rigorous debate and standardization in systematics.

Parallel to the ascent of cladistics, molecular phylogenetics emerged in the 1960s with protein sequence comparisons, as Émile Zuckerkandl and Linus Pauling analyzed proteins such as hemoglobin to infer divergence times via a "molecular clock" assuming roughly constant rates. Their 1965 paper posited molecules as "documents of evolutionary history," providing heritable, quantifiable characters independent of morphology. Early applications, like Emanuel Margoliash's 1963 cytochrome c trees, demonstrated phylogenetic signal in amino acid differences, though they were limited by manual sequencing.

The 1980s marked explosive growth in molecular phylogenetics, driven by Frederick Sanger's dideoxy chain-termination method (1977), which scaled up DNA sequencing, and Kary Mullis's polymerase chain reaction (PCR, developed in the 1980s), which amplifies specific loci for analysis. These tools generated datasets of ribosomal RNA (rRNA) and other gene sequences, enabling distance-based (e.g., neighbor-joining) and maximum-likelihood methods to construct trees, often corroborating or refuting cladistic hypotheses derived from morphology. By the late 1980s, the abundance of molecular data had eased the reliance of cladistics on scarce morphological traits, though debates arose over alignment ambiguities and rate heterogeneity that violated clock assumptions. This synergy propelled phylogenetics toward data-driven inference, with software like PAUP (1980s) integrating cladistic parsimony with molecular models.

Computational and Bayesian Revolutions

The computational revolution in phylogenetics emerged in the 1970s and 1980s as digital computers enabled algorithmic inference of evolutionary trees from molecular sequences, overcoming the limitations of manual cladistic methods that were constrained to small datasets. Early software packages, such as Joseph Felsenstein's PHYLIP suite released in 1980, implemented distance-matrix methods like UPGMA and parsimony-based tree searches, allowing systematic evaluation of multiple topologies. Subsequent algorithms, including neighbor-joining (1987) for rapid distance-based reconstruction and maximum likelihood estimation formalized by Felsenstein (1981), incorporated probabilistic models of nucleotide substitution to infer trees under explicit evolutionary processes. These tools, along with programs like PAUP, facilitated the integration of growing DNA sequence data, shifting phylogenetics toward statistically grounded hypotheses testable against empirical alignments.

Despite these advances, frequentist methods like maximum likelihood faced computational intractability for large phylogenies: exhaustive searches over tree space (with (2n-3)!! possible rooted binary topologies for n taxa) became infeasible beyond a few dozen taxa, so analyses relied on heuristic approximations prone to local optima. This spurred refinements in optimization techniques, such as branch-and-bound algorithms, but uncertainty quantification remained challenging without resampling procedures like bootstrapping, which Felsenstein introduced in 1985 to assess node support via pseudoreplicate distributions.

The Bayesian revolution, beginning in the mid-1990s, addressed these constraints by framing phylogenetic inference as posterior sampling over tree topologies, branch lengths, and substitution parameters via Bayes' theorem, integrating prior distributions with likelihoods to yield probabilistic statements about evolutionary relationships. Markov chain Monte Carlo (MCMC) algorithms, adapted from physics simulations, enabled exploration of vast parameter spaces without exhaustive enumeration, as pioneered in applications by Mau (1996) and Rannala and Yang (1997). This approach excelled at handling model complexity, such as partitioned genomic datasets and relaxed molecular clocks, providing credible intervals for divergence times and direct posterior probabilities for clades.

The 2001 release of MrBayes by Huelsenbeck and Ronquist marked a pivotal step in the adoption of Bayesian methods, offering a user-friendly MCMC implementation for multi-gene analyses and model averaging, and Bayesian analysis soon became widely used alongside maximum likelihood in empirical studies. Subsequent extensions, including BEAST for time-calibrated inference (2002 onward), integrated heterogeneous substitution rates, coalescent models, and fossil-calibrated priors for dated phylogenies, broadening the use of time-scaled trees across fields. These developments, underpinned by increasing computational power, made phylogenetics a probabilistic science capable of quantifying the uncertainty inherent in finite sequence data.
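To give a sense of why exhaustive search became infeasible, the short sketch below evaluates the (2n-3)!! count of rooted binary topologies mentioned in this section. The function name is an assumption for the example; the formula itself is the one quoted in the text.

```python
def rooted_binary_topologies(n):
    """Number of rooted binary tree topologies for n labelled taxa: (2n-3)!!,
    the product of the odd numbers 1, 3, 5, ..., 2n-3."""
    if n < 2:
        raise ValueError("need at least two taxa")
    count = 1
    for odd in range(3, 2 * n - 2, 2):
        count *= odd
    return count


for n in (4, 10, 20, 50):
    print(n, rooted_binary_topologies(n))
```

Four taxa allow 15 rooted topologies and ten taxa already allow 34,459,425. The count of unrooted topologies for n taxa, (2n-5)!!, equals the rooted count for n-1 taxa, so it grows just as quickly.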

Timeline of Pivotal Events

Applications

Evolutionary and Biodiversity Studies

Phylogenetic analyses reconstruct evolutionary relationships among taxa, enabling inferences about speciation rates, divergence timings, and adaptive radiations through tree topologies and branch lengths calibrated via molecular clocks or fossils. In adaptive radiations, such as those observed in cichlid fishes of African lakes, phylogenomics integrates genomic data to resolve rapid speciation events and identify genomic signatures of adaptation to diverse ecological niches. Similarly, phylogenetic trees have elucidated the diversification of Darwin's finches, linking morphological variation in beak size and shape to ecological specialization following colonization of the Galápagos Islands approximately 1-2 million years ago.

In biodiversity studies, phylogenetic diversity (PD) metrics extend traditional species richness measures by quantifying the evolutionary history spanned by an assemblage, computed as the sum of the branch lengths connecting its taxa on a phylogeny. This approach prioritizes conservation of evolutionarily distinct lineages, as in the EDGE protocol, which combines evolutionary distinctiveness with extinction risk to target species for protection on the basis of their isolated positions on the tree of life. PD analyses suggest that human impacts disproportionately erode deep phylogenetic branches, potentially diminishing future evolutionary potential more than species counts alone indicate.

Phylogenetic methods for species delimitation, including coalescent-based models like the Generalized Mixed Yule Coalescent (GMYC), distinguish evolutionarily independent lineages from intraspecific variation, refining diversity estimates in hyperdiverse groups such as insects. Application of these techniques in notothenioid fishes reduced putative species counts from dozens to fewer distinct entities, clarifying cryptic diversity while cautioning against over-delimitation from morphological data alone. Such delimitations inform conservation design and threat assessments, ensuring that resources target genuine units of evolution. Spatial phylogenetics further maps PD gradients to identify hotspots, for example in regions where topographic extremes correlate with elevated phylogenetic diversity.
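The PD calculation described in this section, summing the branch lengths that connect a set of taxa, can be sketched as below. The example follows the convention of including the path back to the root, which is one common definition; the tree encoding (each node mapped to its parent and branch length), the function name, and the branch lengths are assumptions made for illustration.

```python
def phylogenetic_diversity(parent, taxa):
    """Sum of branch lengths on the paths from each taxon up to the root,
    counting every branch once.

    parent -- dict mapping node -> (parent node, length of the branch above node);
              the root has no entry (hypothetical encoding)
    taxa   -- iterable of tip names in the assemblage
    """
    counted = set()
    total = 0.0
    for taxon in taxa:
        node = taxon
        # Walk towards the root, stopping once we reach an already counted branch.
        while node in parent and node not in counted:
            counted.add(node)
            node_parent, length = parent[node]
            total += length
            node = node_parent
    return total


# Toy rooted tree: ((A:2,(B:1,C:1)Y:1)X:1, D:4)root
parent = {
    "A": ("X", 2.0),
    "B": ("Y", 1.0),
    "C": ("Y", 1.0),
    "Y": ("X", 1.0),
    "X": ("root", 1.0),
    "D": ("root", 4.0),
}
print(phylogenetic_diversity(parent, ["B", "C"]))
print(phylogenetic_diversity(parent, ["B", "D"]))
```

Both assemblages contain two species, but the sister pair B and C spans 4.0 units of branch length while B and D span 7.0, which illustrates why PD can rank sites differently from species richness.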

Pharmacology and Drug Development

Phylogenetic analysis enhances drug discovery by identifying evolutionary clusters of species likely to yield bioactive compounds, thereby prioritizing screening efforts in biodiverse lineages. In natural product research, related plants used traditionally for similar therapeutic purposes exhibit phylogenetic clustering, indicating conserved biochemical pathways that help predict bioactivity. A 2012 study of medicinal plants across three regional floras found that "hot nodes" in genus-level phylogenies (clusters with elevated medicinal use) contained 60% more traditionally used plants than random samples (P < 0.001) and were enriched for bioactive species (P = 0.001), improving hit rates for drug-like compounds by focusing on genera shared across regions. This approach leverages molecular trees, such as those built from rbcL sequences, to forecast pharmacological potential, as demonstrated in predictions for cardiovascular drugs where families with multiple species sharing mechanisms were flagged for development.

In antimicrobial resistance research, phylogenetics tracks the evolutionary trajectories of resistance genes, distinguishing vertical inheritance from horizontal transfer events to inform resistance-breaking strategies. For instance, phylogenetic reconstruction of bacterial and viral genomes reveals "highways" of resistance propagation, such as lineages in which urban and agricultural pressures drive resistant variants, guiding the design of next-generation antibiotics aimed at conserved targets. In pathogens more broadly, time-scaled phylogenies quantify the transmission dynamics of drug-resistant strains, enabling models that predict outbreak risks and help optimize treatment regimens, as shown in analyses of community-associated methicillin-resistant Staphylococcus aureus where source-sink inferences supported targeted interventions.

Phylogenetic methods also help validate drug targets by assessing their conservation across taxa, reconstructing receptor and enzyme phylogenies to hypothesize functional analogs for lead optimization. For orphan receptors like GPR18, sequence-based trees built from genetic databases generate experimental leads by inferring ligand-binding similarities, a workflow accessible via standard software for non-specialists. In predictive modeling, comparative phylogenetics integrates genomic data to anticipate resistance trajectories, as in models trained on bacterial phylogenies that rank variants by susceptibility. These applications underscore the role of phylogenetics in drug development, prioritizing targets that are resilient to evolutionary pressures.

Infectious Disease Tracking

Phylogenetic analysis of pathogen genomes enables the reconstruction of transmission histories during infectious disease outbreaks by inferring evolutionary relationships that mirror epidemiological networks. This approach uses mutations accumulated within and between hosts to build time-scaled trees, distinguishing point-source introductions from ongoing community spread. Phylodynamic models further integrate these trees with demographic data to estimate key parameters such as effective reproduction numbers (R_e), invasion times, and spatial diffusion patterns.

In HIV epidemiology, phylogenetics has mapped transmission clusters and the emergence of drug-resistant strains, exploiting the virus's high mutation rate and short generation time (approximately 1-2 days) to trace networks among thousands of sequences. For instance, routine surveillance in high-prevalence regions uses partial genome phylogenies to identify dense clusters indicative of acute transmission, guiding targeted interventions like partner notification. Similarly, during the 2014-2016 West African Ebola outbreak, which caused over 28,000 cases, phylogenetic reconstruction of more than 1,000 viral genomes traced the epidemic to a zoonotic introduction followed by sustained human-to-human transmission chains, including superspreading events that accelerated the epidemic.

For SARS-CoV-2, real-time phylogenetic platforms like Nextstrain have drawn on the more than 10 million genomes sequenced since December 2019, resolving variant introductions, such as the global dispersal of the B.1.1.7 lineage from the United Kingdom in late 2020, and estimating growth advantages of variants like Delta (B.1.617.2), which showed an estimated 40-60% higher transmissibility than earlier variants. These analyses, combining maximum-likelihood trees with Bayesian phylodynamics, informed public health responses by pinpointing cryptic transmission and vaccine-escape risks. In resource-limited settings, such as the 2018-2020 Ebola outbreak in the Democratic Republic of the Congo (over 3,400 cases), phylogenetics confirmed viral persistence in survivors as a source of new transmission chains, with sequences clustering by geography to guide contact tracing.

Challenges include sampling biases, which can distort tree shapes and bias the resulting estimates, and the need for high-resolution genomes to resolve fine-scale transmission. Nonetheless, integrating phylogenetics with epidemiological data enhances outbreak control, as demonstrated by reduced R_e estimates in modeled scenarios incorporating tree-informed priors. Ongoing advancements in phylogeographic inference continue to refine these applications for emerging pathogens.

Non-Biological Disciplines


Phylogenetic methods are applied in historical linguistics to reconstruct evolutionary relationships among languages, treating data such as cognate lexicons, phonological features, or grammatical traits as analogs of biological characters. These techniques support the inference of language family trees and divergence times through statistical frameworks like Bayesian phylogenetics, which model substitution processes in linguistic data and incorporate rate variation.
Software adaptations, such as BEAST for linguistic analysis, enable estimation of evolutionary rates and detection of contact-induced horizontal transfer, which introduces reticulation akin to hybridization in biology. For example, analyses of Austronesian or Indo-European languages have dated proto-language origins using calibrated molecular clock analogs based on cognate retention rates and archaeological priors.
In cultural evolution, phylogenetic comparative methods assess trait co-evolution across societies, often proxying relatedness via language trees to isolate adaptive signals from shared ancestry. Applications include tracing medicinal plant uses or technological lineages, though cultural systems demand adjustments for elevated horizontal transmission via networks rather than bifurcating trees.
Material cultural phylogenies reconstruct artifact histories, such as stringed instruments or brasswinds, revealing innovation hotspots and descent patterns through distance-based or parsimony methods. In textual stemmatics, manuscript variants serve as character states to infer filiation trees, as demonstrated in reconstructions of Chaucer's Canterbury Tales. These non-biological uses highlight phylogenetics' versatility but underscore challenges like sparse data and reticulate dynamics diverging from biological vertical inheritance.

Limitations and Criticisms

Inherent Assumptions and Biases

Phylogenetic reconstruction fundamentally assumes that organisms share a common ancestry and that evolutionary divergence produces hierarchical patterns of shared derived traits, interpreted as homologous rather than convergent or analogous. This assumption underpins the inference of branching topologies from morphological or molecular characters, positing that similarities reflect inherited ancestry rather than independent origins. The canonical tree model further assumes strictly bifurcating splits with no fusion of lineages, instantaneous speciation events, and no gene flow between branches, such as via horizontal gene transfer (HGT). These premises idealize evolution as a strictly vertical, reticulation-free process. Empirical data indicate that this idealization is frequently violated, particularly in prokaryotes, where HGT can affect more than 10-20% of gene content in some lineages.
In molecular phylogenetics, inference methods impose additional assumptions: treelikeness (all alignment sites share the same underlying topology), stationarity (constant nucleotide or amino acid frequencies across the tree), homogeneity (the same substitution process on every branch), and reversibility (the substitution process is statistically identical forward and backward in time). Violations can be detected with tests such as likelihood mapping for treelikeness or matched-pairs symmetry tests for stationarity (with p-values below 0.05 conventionally indicating rejection), and they systematically bias topology estimates and branch lengths. For instance, non-stationary compositional shifts, where base frequencies vary along branches, distort distance-based and likelihood-based reconstructions by inflating apparent similarities between compositionally similar but unrelated lineages.
Prominent methodological biases exacerbate these issues. Long-branch attraction (LBA) causes parsimony and certain maximum likelihood analyses to erroneously cluster rapidly evolving (long-branch) taxa, because shared homoplasies mimic synapomorphies under rate heterogeneity. This artifact persists even in model-based methods when site-rate variation is underspecified, with simulations showing LBA probabilities rising to 50% or higher in heterogeneous datasets. Model misspecification, such as neglecting among-site rate heterogeneity or epistatic interactions, introduces systematic errors that favor suboptimal topologies, reducing accuracy by up to 20-30% in complex simulated scenarios. Sampling biases, including underrepresentation of short branches or geographic undersampling, further confound phylogeographic inferences, amplifying the misreconstruction of migration events in datasets with uneven coverage.
Incomplete lineage sorting (ILS) and reticulate processes like hybridization violate the concordance of gene and species trees, generating hemiplasic signals where up to 30% of loci may support alternative topologies in rapid radiations. While multi-gene phylogenomics mitigates some biases through concatenation or coalescent models, persistent incongruence (observed in 20-50% of empirical quartets) highlights the approximation inherent in tree-based paradigms, prompting shifts toward network models in domains with high reticulation. Peer-reviewed assessments emphasize testing these assumptions explicitly, for example through simulation and sensitivity analyses, as unaddressed violations propagate errors across downstream applications like divergence dating.
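One commonly used matched-pairs symmetry test is Bowker's test, applied to the table of site patterns between two aligned sequences. The sketch below is only an illustration under simple assumptions: the function name `bowker_symmetry` is hypothetical, sites with gaps or ambiguity codes are skipped, and the p-value step is left to the reader (the statistic is compared against a chi-square distribution with the returned degrees of freedom).

```python
from itertools import combinations


def bowker_symmetry(seq_a: str, seq_b: str, alphabet: str = "ACGT"):
    """Return (statistic, degrees_of_freedom) for Bowker's test of symmetry.

    Under a stationary, reversible, homogeneous process the divergence
    matrix between two sequences is expected to be symmetric, so a large
    statistic is evidence against those assumptions.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")

    index = {ch: i for i, ch in enumerate(alphabet)}
    k = len(alphabet)

    # counts[i][j] = number of sites with state i in seq_a and state j in seq_b
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in index and b in index:  # skip gaps and ambiguity codes
            counts[index[a]][index[b]] += 1

    stat, df = 0.0, 0
    for i, j in combinations(range(k), 2):
        total = counts[i][j] + counts[j][i]
        if total > 0:  # only informative off-diagonal pairs contribute
            stat += (counts[i][j] - counts[j][i]) ** 2 / total
            df += 1
    return stat, df


if __name__ == "__main__":
    # Toy data: ten A->C differences but only two C->A differences.
    print(bowker_symmetry("A" * 10 + "C" * 2, "C" * 10 + "A" * 2))
```

In practice such a test would be run over all sequence pairs in an alignment, with a correction for multiple comparisons.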

Sources of Incongruence and Error

In phylogenetics, incongruence refers to discrepancies between inferred evolutionary relationships, such as conflicts among gene trees, between gene trees and species trees, or deviations from the true underlying history due to biological processes or methodological artifacts. These sources can lead to topological differences, branch length distortions, or unsupported clades, complicating accurate reconstruction of evolutionary history.
Biological processes often generate genuine incongruence by violating the assumption of strictly bifurcating, vertical inheritance. Incomplete lineage sorting (ILS) arises when ancestral genetic polymorphisms fail to coalesce before successive speciation events, particularly in rapid radiations, so that gene trees can differ from the species tree. For three species, the multispecies coalescent predicts that the two discordant gene tree topologies are equally probable, each with probability (1/3)e^{-T}, where T is the internal branch length in coalescent units, while the topology matching the species tree remains the most probable. In closely related species with short internodes, ILS can affect up to 30-50% of loci in empirical datasets, including those from mammals. Horizontal gene transfer (HGT), prevalent in prokaryotes but also observed in eukaryotes like fungi and plants, introduces laterally acquired genes that incongruently link distantly related lineages, as documented in bacterial genomes where 1-10% of genes show HGT signatures. Hybridization and introgression further contribute by allowing gene flow across species boundaries, creating reticulate patterns; examples include cases of adaptive introgression in which about 40% of the genome shows mosaic ancestry from hybrid zones. Gene duplications followed by misidentification of paralogs as orthologs exacerbate this, because paralogs evolve independently after duplication, leading to artifactual groupings in up to 20% of eukaryotic gene families. Recombination within loci generates chimeric histories, with analyses in prokaryotes showing site-specific clustering of conflicting signals beyond neutral expectations.
Methodological errors introduce systematic biases that mimic or amplify biological incongruence. Long-branch attraction (LBA) occurs when rapidly evolving lineages are erroneously grouped due to shared convergent substitutions or underestimated distances, an artifact prominent in parsimony analyses but persisting in likelihood methods under rate heterogeneity; simulations demonstrate LBA inflating support for incorrect clades by 10-20% in datasets with branch length disparities exceeding 0.5 substitutions per site. Model misspecification, such as ignoring compositional heterogeneity or site-specific rates, causes systematic over- or underestimation of evolutionary distances, as evidenced in metazoan phylogenies where compositional biases drive attraction between compositionally similar but unrelated taxa and affect the inferred placement of Porifera. Sampling limitations, including sparse taxon coverage or missing data, amplify stochastic variance, with studies showing that fewer than 100 loci can yield 15-25% topological discordance in phylogenomic datasets due to incomplete sampling of gene histories. Alignment inaccuracies from divergent sequences further propagate errors, particularly in non-coding regions, where automated aligners misplace indels in 5-10% of positions across protein families.
Distinguishing biological from artifactual incongruence requires coalescent-based methods like ASTRAL or SVDquartets for ILS accommodation, network approaches for reticulation, and posterior predictive checks for model adequacy, though no single method fully resolves all conflicts without dense genomic sampling. Evidence from phylogenomics indicates that combining thousands of loci reduces stochastic error but leaves persistent systematic biases in deep divergences, so high-support clades should not automatically be interpreted as definitive.
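To make the effect of short internodes on ILS concrete, the following minimal sketch evaluates the standard three-taxon result of the multispecies coalescent: with an internal branch of length T coalescent units, the gene tree matches the species tree with probability 1 - (2/3)e^{-T}, and each of the two discordant topologies has probability (1/3)e^{-T}. The function name is hypothetical, and the branch length is assumed to be expressed in coalescent units (generations divided by 2N_e for diploids).

```python
import math


def three_taxon_gene_tree_probs(t: float) -> dict:
    """Gene tree topology probabilities for a three-species tree.

    t is the internal branch length in coalescent units. Discordance can
    only arise if the two sister lineages fail to coalesce along that
    branch, which happens with probability exp(-t).
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    p_no_coalescence = math.exp(-t)
    discordant = p_no_coalescence / 3.0
    return {
        "concordant": 1.0 - 2.0 * discordant,
        "discordant_1": discordant,
        "discordant_2": discordant,
    }


if __name__ == "__main__":
    for t in (0.1, 0.5, 1.0, 3.0):
        probs = three_taxon_gene_tree_probs(t)
        print(t, {name: round(p, 3) for name, p in probs.items()})
```

As T approaches zero the three topologies become equally likely, which is why rapid radiations produce high levels of gene tree discordance even without any methodological error.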

Debates on Tree vs. Network Models

Phylogenetic tree models assume a bifurcating history driven by vertical inheritance and speciation events, providing a parsimonious framework for reconstructing evolutionary relationships in lineages with minimal reticulation, such as many multicellular eukaryotes. However, these models can mislead when reticulate processes like horizontal gene transfer (HGT) or hybridization introduce conflicting signals, as evidenced by gene tree incongruence in microbial datasets where HGT affects 5-15% of bacterial genes and up to 20-39% in some groups. Network models address this by incorporating reticulations (non-vertical events), yielding representations that better capture complex evolutionary dynamics, particularly in prokaryotes where pervasive gene exchange challenges the strict tree-of-life paradigm.
Critics of widespread network adoption argue that the role of HGT is exaggerated, with vertical signals dominating concatenated genomic analyses and serving effectively as a baseline for testing deviations. For instance, empirical studies show that even under moderate HGT, supertree and supermatrix methods recover accurate organismal phylogenies, suggesting that networks should complement rather than supplant trees. In contrast, advocates for networks emphasize their necessity in scenarios of high incongruence, such as eukaryotic hybridization, where trees alone obscure true relationships. This has been demonstrated in Ursidae (bears), where conflicts between mitochondrial and nuclear data are resolved via consensus networks that reveal reticulation.
Debates intensify over applicability: networks excel in biodiversity contexts involving hybrid speciation or polyploidy, like Fragaria strawberries or Xiphophorus fishes, but face scalability issues with large taxon sets and interpretive challenges in distinguishing introgression from incomplete lineage sorting. Methodological inconsistencies across tools like PhyloNet further complicate inference under multiple reticulations. Hybrid strategies mitigate these problems by mapping tree-derived support (e.g., bootstrap values) onto networks, enabling comprehensive signal exploration without abandoning the simplicity of trees in exploratory phases. Ultimately, the choice hinges on data patterns (trees for compatible splits, networks for incompatible ones), prioritizing empirical fit over dogmatic adherence to either model.
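The distinction between compatible and incompatible splits can be checked mechanically. Two splits A|A' and B|B' of the same taxon set are compatible, meaning they can appear on one tree, exactly when at least one of the four intersections A∩B, A∩B', A'∩B, A'∩B' is empty. The sketch below is a minimal illustration; the function names and the toy taxa are hypothetical, and each split is represented by one of its two sides.

```python
from itertools import combinations


def compatible(split_a, split_b, taxa) -> bool:
    """Return True if two splits can be displayed on a single tree."""
    taxa = frozenset(taxa)
    a, b = frozenset(split_a), frozenset(split_b)
    a_comp, b_comp = taxa - a, taxa - b
    # Compatible if and only if some pairwise intersection of sides is empty.
    return any(not (x & y) for x in (a, a_comp) for y in (b, b_comp))


def incompatible_pairs(splits, taxa):
    """List every pair of splits that cannot coexist on one tree."""
    return [
        (s, t) for s, t in combinations(splits, 2)
        if not compatible(s, t, taxa)
    ]


if __name__ == "__main__":
    taxa = {"A", "B", "C", "D", "E"}
    splits = [("A", "B"), ("A", "B", "C"), ("A", "C")]
    # ("A","B") and ("A","C") conflict; a network would be needed to show both.
    print(incompatible_pairs(splits, taxa))
```

A collection of splits with no incompatible pairs corresponds to a tree, whereas any incompatible pair signals conflict that only a network (or a choice between the splits) can represent.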

Recent Developments

Phylogenomics and Genomic Integration

Phylogenomics applies genome-scale data to infer evolutionary relationships, in contrast with traditional phylogenetics, which relies on single genes or morphological traits. This approach leverages thousands of orthologous loci extracted from whole genomes or transcriptomes, reducing stochastic error and improving resolution for deep divergences. Methods include supermatrix concatenation, where aligned orthologs are combined into a single alignment for tree inference, and coalescent-based models that account for incomplete lineage sorting (ILS) by summarizing gene trees into species trees.
Genomic integration in phylogenomics involves processing vast datasets through ortholog identification, multiple sequence alignment, and handling of genomic heterogeneity such as recombination and horizontal gene transfer (HGT). Tools like OrthoFinder detect orthogroups across genomes, while pipelines such as PhyloPhlAn 3.0 automate retrieval and analysis of thousands of universal markers from public databases, enabling scalable inferences. Recent advancements incorporate whole-genome alignments and variant calling to model site-specific evolutionary rates, enhancing accuracy in detecting rapid radiations. For instance, in 2024, a phylogenomic study of angiosperms using over 1.6 million loci resolved polytomies in the early diversification of flowering plants, dating the crown group to approximately 146 million years ago.
Challenges in genomic integration include systematic biases from long-branch attraction in concatenated analyses and the computational demands of multispecies coalescent methods on large datasets. However, progress in approximation algorithms, such as ASTRAL for gene tree summarization, has enabled phylogeny estimation from datasets exceeding 1,000 taxa and genes. In eukaryotic systems, frameworks like EukPhylo v1.0, released in 2025, integrate curated ortholog sets with contamination filtering, yielding robust trees for downstream analyses. These developments underscore the role of phylogenomics in the study of evolutionary processes, prioritizing empirical genomic evidence over prior assumptions.
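The supermatrix step described above amounts to joining per-gene alignments end to end while padding taxa that lack a gene. The following minimal sketch shows the idea under simple assumptions: each gene alignment is a dictionary from taxon name to an already aligned sequence, missing genes are filled with gap characters, and the function name `concatenate_alignments` is hypothetical rather than taken from any particular pipeline.

```python
def concatenate_alignments(gene_alignments, missing="-"):
    """Build a supermatrix from per-gene alignments.

    gene_alignments: list of dicts mapping taxon -> aligned sequence.
    Returns (supermatrix, partitions), where partitions holds the 1-based
    inclusive (start, end) columns of each gene in the supermatrix.
    """
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("each gene alignment needs sequences of equal length")
        length = lengths.pop()
        for taxon in taxa:
            # Taxa without this gene are padded with the missing-data symbol.
            pieces[taxon].append(aln.get(taxon, missing * length))
        partitions.append((start, start + length - 1))
        start += length
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


if __name__ == "__main__":
    genes = [
        {"sp1": "ACGT", "sp2": "ACGA"},
        {"sp1": "GG", "sp3": "GC"},
    ]
    matrix, parts = concatenate_alignments(genes)
    for taxon, seq in matrix.items():
        print(taxon, seq)
    print(parts)
```

Keeping the partition boundaries makes it possible to assign a separate substitution model to each gene later, which is one common way to reduce model misspecification in concatenated analyses.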

AI-Driven and Structural Approaches

Artificial intelligence-driven methods in phylogenetics leverage machine learning and deep learning to enhance tasks such as sequence alignment, model selection, tree inference, and detection of evolutionary processes like introgression and gene tree discordance. These approaches process large genomic datasets more efficiently than traditional statistical methods, particularly for phylogenomics involving thousands of loci. For instance, end-to-end neural networks like NeuralNJ integrate neighbor-joining heuristics with learned components to iteratively build tree topologies, achieving accuracy comparable to maximum likelihood on simulated and empirical data while reducing computational demands. Other frameworks infer trees directly from alignments, bypassing exhaustive searches and demonstrating robustness on datasets of up to 1,000 taxa. Recent tools like PhyloInfer aim to reconstruct trees from raw sequencing reads using AI, minimizing errors introduced during preprocessing.
Deep learning models have also accelerated phylogenetic updates in dynamic datasets, as in PhyloTune, which uses neural networks to predict branch length adjustments upon adding new sequences, outperforming conventional methods by up to 50% in speed on empirical alignments. Applications extend to detecting discordance, where convolutional neural networks classify branch length heterogeneity to infer incomplete lineage sorting or hybridization. However, these methods require extensive training data and can overfit to simulation biases, necessitating validation against gold-standard likelihood-based inferences.
Structural approaches in phylogenetics use three-dimensional protein structures to infer evolutionary relationships, particularly for distantly related proteins whose sequence similarity has eroded. Protein structures evolve more slowly than primary sequences, preserving signals of ancestry. Methods align structures using metrics like the TM-score or superimpose residues to construct distance matrices for tree building, often integrated with sequence data in supermatrix analyses. A 2025 study reconstructed phylogenies for thousands of protein families using structural alignments, resolving divergences beyond sequence-based limits and identifying novel superfamilies in enzymes. AI-predicted structures from tools like AlphaFold have revolutionized this field by providing high-accuracy models for orphan proteins, enabling structural phylogenetics where experimental data is scarce; in some deep-time inferences these predictions reportedly outperform empirical structures because of reduced noise.
Integration of AI-driven structure prediction with phylogenetic pipelines has yielded hybrid methods, such as structure-informed multiple sequence alignments that boost bootstrap support by 10-20% in problematic clades. Challenges include sensitivity to prediction errors in flexible regions and the need for standardized definitions, but empirical validations confirm superior resolution of ancient splits, as in fungal or bacterial superfamilies. These advances complement genomic phylogenomics by incorporating biophysical constraints, potentially refining the evolutionary history inferred for protein domains.
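As a rough illustration of how structural similarity scores can feed a distance-based tree, the sketch below converts pairwise TM-scores into distances and clusters them with UPGMA. Several choices here are assumptions made for illustration only: the transform d = 1 - TM is just one simple option, the TM-scores are invented toy values rather than output from any alignment tool, UPGMA stands in for whichever tree-building method a real study would use, and the function names are hypothetical.

```python
def tm_to_distance(tm_scores):
    """Map {(x, y): TM-score} to {frozenset({x, y}): 1 - TM-score}."""
    return {frozenset(pair): 1.0 - score for pair, score in tm_scores.items()}


def upgma(dist):
    """Cluster with UPGMA and return the topology as a Newick string.

    dist maps frozenset({x, y}) to a distance and must cover every pair.
    """
    size = {name: 1 for pair in dist for name in pair}
    d = dict(dist)
    while len(size) > 1:
        pair = min(d, key=d.get)  # closest pair of current clusters
        x, y = sorted(pair)
        merged = f"({x},{y})"
        new_size = size[x] + size[y]
        for z in size:
            if z in (x, y):
                continue
            dxz = d.pop(frozenset((x, z)))
            dyz = d.pop(frozenset((y, z)))
            # Size-weighted average keeps distances as means over all leaves.
            d[frozenset((merged, z))] = (size[x] * dxz + size[y] * dyz) / new_size
        del d[pair]
        del size[x], size[y]
        size[merged] = new_size
    return next(iter(size)) + ";"


if __name__ == "__main__":
    # Invented TM-scores for four hypothetical proteins.
    tm = {
        ("A", "B"): 0.9, ("A", "C"): 0.5, ("B", "C"): 0.5,
        ("A", "D"): 0.3, ("B", "D"): 0.3, ("C", "D"): 0.3,
    }
    print(upgma(tm_to_distance(tm)))
```

UPGMA assumes clock-like change, which structural divergence need not satisfy, so a method without that assumption (for example neighbor joining) may be preferable on real data.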

References

  1. [1]
    Phylogenetic Inference - Stanford Encyclopedia of Philosophy
    Dec 8, 2021 · Phylogenetics is the study of the evolutionary history and relationships among individuals, groups of organisms (e.g., populations, species, ...
  2. [2]
    Understanding phylogenies - Understanding Evolution - UC Berkeley
    Phylogenies trace patterns of shared ancestry between lineages. Each lineage has a part of its history that is unique to it alone and parts that are shared ...
  3. [3]
    [PDF] Introduction to Phylogeny - Statistics & Data Science
    Phylogenetics is the branch of systematics that involves understanding and reconstructing the evolutionary history of organisms. The basic assumptions of a ...
  4. [4]
    Molecular Phylogenetics - Genomes - NCBI Bookshelf - NIH
    When we carry out a phylogenetic analysis our primary objective is to infer the pattern of the evolutionary relationships between the DNA sequences that are ...
  5. [5]
    Common Methods for Phylogenetic Tree Construction and Their ...
    May 11, 2024 · In this review, we summarize common methods for constructing phylogenetic trees, including distance methods, maximum parsimony, maximum likelihood, Bayesian ...
  6. [6]
    Scientific, historical, and conceptual significance of the first tree of life
    Jan 20, 2012 · The 1977 paper (1) and its aftermath transformed microbiology by introducing a phylogenetic framework for exploring life's diversity and ...
  7. [7]
    The Role of Phylogenetics in Comparative Genetics - PMC
    Many biologists agree that a phylogenetic tree of relationships should be the central underpinning of research in many areas of biology.
  8. [8]
    A Practical Guide to Design and Assess a Phylogenomic Study
    Phylogenetic signal: A measure of how much of the similarity between genetic sequences reflects common ancestry. A related concept is “phylogenetic noise”, ...
  9. [9]
    Phylogenetic Trees | Biological Principles
    A phylogenetic tree is a visual representation of the relationship between different organisms, showing the path through evolutionary time.
  10. [10]
    Phylogenetics
    Definition: A series of relationships between species; inferred based on their shared and unique characteristics; also called a tree. Describes a series of ...
  11. [11]
    Understanding Phylogenetics - Geneious
    Phylogenetic tree of life. Phylogenetics has a long history dating back to before scientists knew about the structure of DNA or how to sequence DNA. In the ...
  12. [12]
    Systematics
    The pattern of relatedness is called phylogeny and systematics is the field of biology that studies and seeks to determine phylogenies.
  13. [13]
    Phylogenetic systematics - Understanding Evolution - UC Berkeley
    Phylogenetic systematics is the formal name for the field within biology that reconstructs evolutionary history and studies the patterns of relationships among ...
  14. [14]
    Traditional Taxonomy and Modern Phylogenetics - Faculty Web Pages
    Much more recently, the advent of DNA sequencing has revolutionized the field of systematics and how phylogenetic trees are constructed.
  15. [15]
    Introduction to Phylogeny
    Phylogenetics, the science of phylogeny, is one part of the larger field of systematics, which also includes taxonomy. Taxonomy is the science of naming and ...
  16. [16]
    [PDF] Phylogenetics and the Trees of Life Taxonomy or Systematic
    A phylogenetic classification is based upon evolutionary relationships i.e. upon common ancestry (cladistic classification). • Sequence information is for ...
  17. [17]
    Willi Hennig | Phylogenetic Systematics - University of Illinois Press
    Phylogenetic Systematics, first published in 1966, marks a turning point in the history of systematic biology. Willi Hennig's influential synthetic work ...
  18. [18]
    Phylogenetic Systematics - Annual Reviews
    Phylogenetic Systematics. Willi Hennig; Vol. 10:97-116 (Volume publication date January 1965) https://doi.org/10.1146/annurev.en.10.010165.000525.
  19. [19]
    Facilitating taxonomy and phylogenetics: An informative and cost ...
    Phylogenetic inference has become a standard technique in integrative taxonomy and systematics, as well as in biogeography and ecology.
  20. [20]
    Systematics, Taxonomy, and Phylogeny - Palaeos
    Systematics is concerned both with Taxonomy, the naming and classification of life, and Phylogeny, the science and study of understanding the family tree of ...
  21. [21]
    Phylogenetic congruence, conflict and consilience between ...
    Jul 5, 2023 · Phylogenetic trees are integral to understanding evolution, yet the true tree is often unknown and must be estimated using phylogenetic data.
  22. [22]
    Do morphometric data improve phylogenetic reconstruction? A ...
    Oct 18, 2024 · Discrete morphological data have been the primary focus of traditional systematic methods for phylogenetic reconstruction. Even following the ...
  23. [23]
    Constructing phylogenetic trees for microbiome data analysis: A mini ...
    We present a comprehensive review of phylogenetic tree construction techniques for microbiome data (16S rRNA or whole-genome shotgun sequencing).
  24. [24]
    Preparing genomic data for phylogeny reconstruction
    Oct 21, 2022 · To prepare genomic data, mask repetitive sequences, annotate genomes, find common proteins (orthologs), and align them for phylogeny ...
  25. [25]
    Phylogenomics — principles, opportunities and pitfalls of big‐data ...
    Dec 16, 2019 · Phylogenetics is the science of reconstructing the evolutionary history of life on Earth. Traditionally, phylogenies were constructed using morphological data ...
  26. [26]
    Phylogenetics Algorithms and Applications - PMC - NIH
    Phylogenetics is a powerful approach in finding evolution of current day species. By studying phylogenetic trees, scientists gain a better understanding of ...
  27. [27]
    [PDF] distance based methods in phylogentic tree construction
    The algorithms of cluster-bsed include unweighted pair group method using arithmetic average (UPGMA) and neighbor joining (NJ).
  28. [28]
    Distance-based methods | Bioinformatics Class Notes - Fiveable
    Neighbor-joining (NJ) serves as a popular distance-based method for constructing phylogenetic trees · Developed by Saitou and Nei in 1987 as an efficient ...
  29. [29]
    26.4: Character-Based Methods - Biology LibreTexts
    Mar 17, 2021 · In character-based methods, the goal is to first create a valid algorithm for scoring the probability that a given tree would produce th observed sequences at ...
  30. [30]
    A Review on Phylogenetic Analysis: A Journey through Modern Era
    Phylogenetic analysis may be considered to be a highly reliable and important bioinformatics tool. The importance of phylogenetic analysis lies in its simple ...
  31. [31]
    A biologist's guide to Bayesian phylogenetic analysis - PMC - NIH
    Here, we summarize the major features of Bayesian phylogenetic inference and discuss Bayesian computation using Markov chain Monte Carlo (MCMC).
  32. [32]
    BEAST Software - Bayesian Evolutionary Analysis Sampling Trees ...
    BEAST is a cross-platform program for Bayesian analysis of molecular sequences using MCMC. It is entirely orientated towards rooted, time-measured phylogenies.
  33. [33]
    Practical guidelines for Bayesian phylogenetic inference using ...
    Estimating a phylogenetic tree involves evaluating many possible solutions and possible evolutionary histories that could explain a set of observed data, ...
  34. [34]
    Performance of criteria for selecting evolutionary models in ...
    Aug 9, 2010 · Of the four widely-used model-selection criteria in phylogenetics - the hLRT, AIC, BIC, and DT - the hLRT was once argued to be reasonably ...
  35. [35]
    On the Use of Information Criteria for Model Selection in Phylogenetics
    Nov 5, 2019 · The information criteria Akaike information criterion (AIC), AICc, and Bayesian information criterion (BIC) are widely used for model selection ...
  36. [36]
    ModelFinder: fast model selection for accurate phylogenetic estimates
    May 8, 2017 · We present ModelFinder, a fast model-selection method that greatly improves the accuracy of phylogenetic estimates by incorporating a model of rate ...
  37. [37]
    Model selection may not be a mandatory step for phylogeny ... - Nature
    Feb 25, 2019 · Model selection is considered as a fundamental step in the process of phylogeny reconstruction and has penetrated into the broad phylogenetic ...
  38. [38]
    Assessment of Substitution Model Adequacy Using Frequentist ... - NIH
    In this study, we use empirical and simulated data to evaluate the adequacy of common substitution models using both frequentist and Bayesian methods.
  39. [39]
    Trends in substitution models of protein evolution for phylogenetic ...
    Sep 27, 2025 · Substitution models of protein evolution are essential for probabilistic approaches to phylogenetic inference. We overview their fundaments ...
  40. [40]
    The impact of software and criteria on the selection of best-fit ...
    Mar 26, 2025 · Our analysis of model selection accuracy across three popular phylogenetic programs (jModelTest2, ModelTest-NG, and IQ-TREE) revealed that the ...
  41. [41]
    Comparing Bootstrap and Posterior Probability Values in the Four ...
    The study explores the relationship between bootstrap and posterior probability values, finding a complex association with significant differences in some ...
  42. [42]
    Overcredibility of molecular phylogenies obtained by Bayesian ...
    These results indicate that bootstrap probabilities are more suitable for assessing the reliability of phylogenetic trees than posterior probabilities and that ...
  43. [43]
    Frequentist Properties of Bayesian Posterior Probabilities of ...
    The posterior probability of a phylogenetic tree is the probability that the tree is correct, assuming that the model is correct. On the other hand, all ...
  44. [44]
    Taxon sampling and seed plant phylogeny - PubMed
    We investigated the effects of taxon sampling on phylogenetic inference by exchanging terminals in two sizes of rbcL matrices for seed plants, applying ...
  45. [45]
    Impacts of Taxon-Sampling Schemes on Bayesian Tip Dating Under ...
    Our simulation study has shown that the time estimates obtained were variably influenced by increasing taxon-sampling density. In addition to its negligible ...
  46. [46]
    Long-Branch Attraction Bias and Inconsistency in Bayesian ...
    BI's long branch attraction bias is relatively weak when the true model is simple but becomes pronounced when sequence sites evolve heterogeneously.
  47. [47]
    Multiple measures could alleviate long-branch attraction in ... - Nature
    Jan 25, 2017 · Major factors contributing to LBA include faster substitution rate in nonadjacent phylogenetic lineages, poor taxon sampling due to extinction ...
  48. [48]
    Compositionally Constrained Sites Drive Long-Branch Attraction
    We find evidence that compositionally constrained sites are driving long-branch attraction in two metazoan datasets and recover evidence for Porifera as the ...
  49. [49]
    Long-branch attraction and the phylogeny of true water bugs ...
    May 7, 2014 · The most common problem of LBA is that distantly related outgroups have a biased attraction to long branches within the ingroup [3, 4, 38]. For ...
  50. [50]
    A review of long-branch attraction - PubMed
    The history of long-branch attraction, and in particular methods suggested to detect and avoid the artifact to date, is reviewed.
  51. [51]
    Incomplete lineage sorting and long-branch attraction confound ...
    Jan 29, 2024 · This observation suggests that non-phylogenetic signals from LBA mask the true phylogenetic signals, resulting in reduced confidence in the ...
  52. [52]
    Trees and networks before and after Darwin - Biology Direct
    Nov 16, 2009 · Haeckel (following Bronn) did not accept every detail of Darwin's argument, but agreed that the natural system is necessarily genealogical, and ...
  53. [53]
    Edward Hitchcock's pre-Darwinian (1840) "tree of life" - PubMed
    Whereas Lamarck, Chambers, Bronn, Darwin, and Haeckel saw some form of transmutation as the mechanism that created their "trees of life," Hitchcock, like ...
  54. [54]
    Sander Gliboff. H. G. Bronn, Ernst Haeckel, and the Origins of ...
    H. G. Bronn provided the German translation of Darwin's Origin of Species on which Ernst Haeckel built his understanding of the theory of evolution by ...
  55. [55]
    The Roots of Phylogeny: How Did Haeckel Build His Trees?
    Haeckel's trees were genealogical, based on a linear morphological phylogeny, not phylogenetic trees. He used a single common root and three main branches.
  56. [56]
    The evolution of Willi Hennig's phylogenetic considerations (Chapter ...
    Jul 5, 2016 · Hennig's first book on the theory of phylogenetic systematics, which finally appeared in 1950, dealt with the distinction between confused ideas ...
  57. [57]
    [PDF] The impact of W. Hennig's - European Journal of Entomology
    Feb 2, 2001 · This important progress was in part initiated by the. German entomologist Willi Hennig (1913-1976; biogra phy, see Anonymous, 1978; Schlee, 1981 ...
  58. [58]
    Willi Hennig and the Rise of Cladistics - ResearchGate
    There is no comprehensive history of cladistics, the theory of systematics that revolutionized comparative biology in the early 1960s. There may be no ...
  59. [59]
    When phylogenetics met biogeography: Willi Hennig, Lars Brundin ...
    Oct 19, 2022 · Cladistic or vicariance biogeography (Nelson, 1974; Nelson and Platnick, 1981; Rosen, 1976, 1978) is usually assumed to represent a different ...
  60. [60]
    Willi Hennig Society
    Willi Hennig was born April 20, 1913, in the village of Dürrhennersdorf, southern Upper Lusatia (east of Dresden), Germany, and he died November 5, 1976, ...
  61. [61]
    Molecules as documents of evolutionary history - PubMed
    Molecules as documents of evolutionary history. J Theor Biol. 1965 Mar;8(2):357-66. doi: 10.1016/0022-5193(65)90083-4. Authors. E Zuckerkandl, L Pauling. PMID ...
  62. [62]
    Molecules as documents of evolutionary history - ScienceDirect
    Different types of molecules are discussed in relation to their fitness for providing the basis for a molecular phylogeny. ... Zuckerkandl and Pauling, 1962. E.
  63. [63]
    Narrative - 37. Molecular Evolutionary Clock
    After producing patterns for many species, Pauling and Zuckerkandl compared the fingerprints and concluded which species were closely or distantly related. Thus ...
  64. [64]
    The History of PCR | Thermo Fisher Scientific - US
    This article explores the history of modern PCR, from the isolation of the first DNA polymerase to the development of next-generation polymerases.
  65. [65]
    Current Advances in Molecular Phylogenetics - PMC
    Since its inception some 50 years ago, phylogenetics has permeated nearly every branch of biology. Initially developed to classify objects based on a set of ...
  66. [66]
    [PDF] Introduction to Computational Phylogenetics
    This document includes the basic material needed to understand computational methods for estimating phylogenetic trees in biology and linguistics, and to read ...
  67. [67]
    Computational approaches to species phylogeny inference and ...
    In 1997, Maddison surveyed phylogenetic incongruence, and described parsimony and likelihood criteria for various reconciliation and inference problems [11].
  68. [68]
    Computational Phylogenetics | Annual Reviews
    In this review, I explore some of the advantages and disadvantages of using computational tools for historical linguistics. I describe the theory that underlies ...
  69. [69]
    Bayesian Inference of Phylogeny and Its Impact on Evolutionary ...
    Bayesian inference of phylogeny uses posterior probability, based on Bayes's theorem, to analyze large phylogenetic trees and complex evolutionary models.
  70. [70]
    Scientific, historical, and conceptual significance of the first tree of life
    In describing the phylogenetic relationships, the results also charted the first scientific view of deep evolutionary history. Both these fundamental aspects of ...
  71. [71]
    a new method for reconstructing phylogenetic trees - PubMed
    A new method called the neighbor-joining method is proposed for reconstructing phylogenetic trees from evolutionary distance data.
  72. [72]
    MRBAYES: Bayesian inference of phylogenetic trees | Bioinformatics
    The program MRBAYES performs Bayesian inference of phylogeny using a variant of Markov chain Monte Carlo.
  73. [73]
    Population Genomics of Adaptive Radiation - Wiley Online Library
    Dec 24, 2024 · Here we review how population genomic data have facilitated our knowledge of adaptive radiation in five key areas: (1) phylogenetics, (2) ...
  74. [74]
    Why is phylogenetics important? - EMBL-EBI
    Applications of phylogenetics include classification, identifying pathogens, answering biological questions, forensics and bioinformatics.
  75. [75]
    Phylogenetic diversity in conservation: A brief history, critical ...
    A better understanding of how phylogenetic error and uncertainty affect conservation decision-making involving PD is one of the most pressing current issues ...
  76. [76]
    Beyond Species Richness: The Importance of Phylogenetic ...
    Jun 12, 2025 · We thus argue that phylogenetic diversity provides a more robust framework for understanding biodiversity–disease relationships than species ...
  77. [77]
    Species Delimitation, Phylogenetic Relationships, and Temporal ...
    Apr 20, 2018 · In this work we applied the Generalized Mixed Yule Coalescent (GMYC) method, which determines a divergence threshold to delimit species in a phylogenetic tree.
  78. [78]
    Phylogenomic Species Delimitation Dramatically Reduces Species ...
    Jul 10, 2021 · Application of genetic data to species delimitation often builds confidence in delimitations previously hypothesized using morphological, ...
  79. [79]
    [PDF] Phylogenomics and species delimitation for effective conservation of ...
    Sep 25, 2020 · Phylogenetic inference and species delimitation are therefore of fundamental importance to biodiversity conservation in assigning strategies ...
  80. [80]
    Spatial phylogenetics of two topographic extremes of the Hengduan ...
    We compared spatial patterns of diversity, endemism, and threatened species in these ecosystems based on both traditional measurements and recent phylogenetic ...
  81. [81]
    Phylogenies reveal predictive power of traditional medicine in ... - NIH
    Sep 25, 2012 · We conclude that phylogenetic cross-cultural comparisons can focus screening efforts on a subset of traditionally used plants that are richer in bioactive ...
  82. [82]
    The predictive utility of the plant phylogeny in identifying sources of ...
    Families with four or more species demonstrating the same mechanism of action are considered pharmacologically important for CV drug development.
  83. [83]
    Profile and evolution of antimicrobial resistance genes in ... - Nature
    Aug 26, 2025 · The evolution of S. aureus, especially the antibiotic-resistant variants, is probably driven by urbanisation, agricultural practices, and global ...
  84. [84]
    Time-Scaled Evolutionary Analysis of the Transmission and ...
    We used quantitative phylogenetic methods to identify sink and source populations for CC398 and formally investigated the evolutionary histories of antibiotic ...
  85. [85]
    Role of Phylogenetics as a Tool to Predict the Spread of Resistance
    Oct 6, 2017 · In this review we discuss phylogenetic methods used to study the emergence of drug resistance and the spread of resistant viruses.
  86. [86]
    Phylogenetic methods in drug discovery - PubMed
    Phylogenetic analysis can help generate hypotheses and leads for experimentation. Reconstruction of molecular phylogenies for the nonspecialist is described ...
  87. [87]
    Machine learning and phylogenetic analysis allow for predicting ...
    Dec 20, 2023 · In this study, we present a novel phylogeny-based method for ranking genetic variants followed by training ML models for predicting antibiotic ...
  88. [88]
    Leveraging Comparative Phylogenetics for Evolutionary Medicine
    Feb 14, 2025 · Comparative phylogenetics provides a wealth of computational tools to understand evolutionary processes and their outcomes.
  89. [89]
    Relating Phylogenetic Trees to Transmission Trees of Infectious ...
    We show how the phylogenetic tree of sampled pathogens is related to the transmission tree of an outbreak of an infectious disease, by the within-host dynamics ...
  90. [90]
    Phylogenetic and epidemic modeling of rapidly evolving infectious ...
    Molecular phylogenetics has had a profound impact on the study of infectious diseases, particularly rapidly evolving infectious agents such as RNA viruses.
  91. [91]
    Phylogenetic and phylodynamic approaches to understanding and ...
    Apr 22, 2022 · Here, we review and synthesize studies that illustrate how phylogenetic and phylodynamic techniques were applied during the first year of the pandemic.
  92. [92]
    The Role of Phylogenetics as a Tool to Predict the Spread of ...
    Oct 6, 2017 · PHYLOGENETICS AND DRUG RESISTANCE. HIV viruses rapidly accumulate genetic variation because of short generation times and high mutation rates.
  93. [93]
    Phylodynamic applications in 21st century global infectious disease ...
    May 8, 2017 · By integrating phylogenetic methods with traditional epidemiological methods, researchers are able to infer relationships between surveillance ...
  94. [94]
    Tracking virus outbreaks in the twenty-first century - PubMed Central
    Dec 13, 2018 · This Review Article describes how recent advances in viral genome sequencing and phylogenetics have enabled key issues associated with outbreak epidemiology to ...
  95. [95]
    Nextstrain
    By reconstructing a phylogeny we can learn about important epidemiological phenomena such as spatial spread, introduction timings and epidemic growth rate.
  96. [96]
    Phylogenetic Analyses of Severe Acute Respiratory Syndrome ...
    This study used state-of-the-art phylodynamic methods to ascertain that the rapid rise of B.1.1.7 “Variant of Concern” most likely occurred by global dispersal.
  97. [97]
    Phylogenetic tree shapes resolve disease transmission patterns - PMC
    The shapes of phylogenies of pathogens can reveal patterns in how an outbreak spreads. We used simple features to summarise the shapes of pathogen ...
  98. [98]
    Epidemiological inference from pathogen genomes: A review of ...
    Phylodynamics combines evolutionary biology and epidemiology to generate evidence about the spread and source of pathogens. It does this by exploiting the ...
  99. [99]
    Phylogenetic Concepts and Tools Applied to Epidemiologic ...
    A critical application of phylogenetic models to molecular epidemiology is in infectious disease surveillance systems. Historically, epidemiologists used ...
  100. [100]
    Bayesian phylogenetic analysis of linguistic data using BEAST
    Sep 23, 2021 · Bayesian phylogenetic methods provide a set of tools to efficiently evaluate large linguistic datasets by reconstructing phylogenies—family ...
  101. [101]
    Phylogenetics in lingustics - Why and how to?
    Phylogenetic analyses play a key role in comparative linguistics. They provide not only information about the relationship of different languages, but also ...
  102. [102]
    Detecting contact in language trees: a Bayesian phylogenetic model ...
    Jun 17, 2022 · Phylogenetic trees are used to represent the evolutionary history of a language family from its descent from a common ancestor at the stem to ...
  103. [103]
    Phylogenetics beyond biology - PMC - NIH
    Evolutionary processes have been described not only in biology but also for a wide range of human cultural activities including languages and law.
  104. [104]
    Phylogenetic comparative methods - ScienceDirect
    May 8, 2017 · Phylogenetic comparative methods (PCMs) enable us to study the history of organismal evolution and diversification.
  105. [105]
    Comparative phylogenetic methods and the cultural evolution of ...
    Sep 10, 2018 · Here, we propose a cultural macroevolutionary framework to study the use of plants in ethnomedicine. Anthropologists have used phylogenetic ...
  106. [106]
    Phylogenetics and Material Cultural Evolution | Current Anthropology
    Phylogenetic methods are used to study material culture, but cultural evolution differs from biological evolution, requiring caution in applying biological ...
  107. [107]
    Assessing Phylogenetic Assumptions - IQ-TREE
    Mar 15, 2021 · Some common assumptions include treelikeness (all sites in the alignment have evolved under the same tree), stationarity (nucleotide/amino-acid ...
  108. [108]
    Compositional bias may affect both DNA-based and ... - PubMed
    It is now well-established that compositional bias in DNA sequences can adversely affect phylogenetic analysis based on those sequences.
  109. [109]
    Long Branch Attraction Biases in Phylogenetics - Oxford Academic
    Feb 2, 2021 · Long branch attraction (LBA) is a prevalent form of bias in phylogenetic estimation but the reasons for it are only partially understood.
  110. [110]
    Robustness of Phylogenetic Inference to Model Misspecification ...
    May 27, 2021 · Many other forms of model misspecification exist, which can make phylogenetic inference difficult (see, e.g., Philippe et al. 2011).
  111. [111]
    The impact of sampling bias on viral phylogeographic reconstruction
    Sampling bias impacts whether and how key migration events are reconstructed. This also depends on the migration rate. Overall, we find that biased sampling can ...
  112. [112]
    Approaches for Assessing Phylogenetic Accuracy
    Four principal methods have been used for assessing phylogenetic accuracy: simulation, known phylogenies, statistical analyses, and congruence studies.
  113. [113]
    Causes, consequences and solutions of phylogenetic incongruence
    May 28, 2014 · The bias causing systematic errors creates an erroneous signal that could dominate the true phylogenetic ... phylogenetic reconstruction methods.
  114. [114]
    Phylogenetic networks empower biodiversity research - PNAS
    Jul 28, 2025 · Hybridization. Under NMSC, two biological sources of gene tree incongruence are ILS and hybridization (SI Appendix, Fig. S1). These two ...
  115. [115]
    Dealing with incongruence in phylogenomic analyses - PMC
    Incomplete lineage sorting occurs when an ancestral species undergoes several speciation events in a short period of time. If, for a given gene, the ancestral ...
  116. [116]
    Dealing with incongruence in phylogenomic analyses - Journals
    Oct 7, 2008 · Incomplete lineage sorting occurs when an ancestral species undergoes several speciation events in a short period of time. If, for a given gene, ...
  117. [117]
    Phylogenetic incongruence arising from fragmented speciation in ...
    Jun 7, 2010 · The source of incongruence was inferred to be recombination, because individual genes support conflicting topology more robustly than expected ...
  118. [118]
    Topology-dependent asymmetry in systematic errors affects ...
    Dec 11, 2020 · LBA is a systematic error that falsely groups long branches (10), such as those leading to the outgroups and to both the Ctenophora and ...
  119. [119]
    Multiple historical processes obscure phylogenetic relationships in a ...
    Jun 20, 2019 · Incongruence in phylogeny reconstruction can be caused by stochastic and systematic errors. Stochastic error is exacerbated by gene length. The ...
  120. [120]
    The genetic code can cause systematic bias in simple phylogenetic ...
    This study examines a range of substitution models to see whether they are capable of recovering accurate estimates of branch lengths and tree topologies from ...
  121. [121]
    Complexity of avian evolution revealed by family-level genomes
    Apr 1, 2024 · Discrepancies have been attributed to diversity of species sampled, phylogenetic method and the choice of genomic regions. Here we address these ...
  122. [122]
    Should Networks Supplant Tree Building? - PMC - PubMed Central
    Aug 3, 2020 · It is correct that HGT disrupts “true” phylogenetic signal and when it occurs one might be able to infer a network of gene relationships, but ...
  123. [123]
    A New Phylogenomic Approach For Quantifying Horizontal Gene ...
    Jul 24, 2020 · HGT tangles the conventional universal Tree of Life, turning it into a network of Evolution. HGT is pervasive and some estimates of the genes ...
  124. [124]
    impact of HGT on phylogenomic reconstruction methods
    Aug 18, 2012 · We test the accuracy of supertree and supermatrix approaches in recovering the true organismal phylogeny under increased amounts of horizontally transferred ...
  125. [125]
    Intertwining phylogenetic trees and networks - Schliep - 2017
    Mar 7, 2017 · A combination of both, trees and networks, usually provides a better means to understand the underlying phylogenetic signal. The importance of ...
  126. [126]
    Precise phylogenetic analysis of microbial isolates and genomes ...
    May 19, 2020 · PhyloPhlAn 3.0 can, as needed, retrieve and integrate hundreds of thousands of genomes from public resources, while also incorporating ...
  127. [127]
    Phylogenomics and the rise of the angiosperms - Nature
    Apr 24, 2024 · ... advanced phylogenomic methods shows the deep history and full complexity in the evolution of a megadiverse clade. Phylogenomic analysis of ...
  128. [128]
    Recent progress on methods for estimating and updating large ...
    Aug 22, 2022 · One of the reasons for its popularity is that ML tree estimation has been proven to be a statistically consistent estimator of the phylogeny ...
  129. [129]
    Rethinking large-scale phylogenomics with EukPhylo v.1.0, a ...
    Aug 27, 2025 · For the subsequent estimation of species trees, recent phylogenomic approaches include methods that use gene trees as inputs in inferring ...
  130. [130]
    Applications of machine learning in phylogenetics - ScienceDirect.com
    Machine learning approaches have been applied to substitution model selection and inferences of discordance, introgression, and diversification rates.
  131. [131]
    Accurate and efficient phylogenetic inference through end ... - bioRxiv
    Oct 2, 2025 · The inference accuracy is further enhanced through incorporating reinforcement learning-based tree search. Using both simulated and empirical ...
  132. [132]
    Fusang: a framework for phylogenetic tree inference via deep learning
    Oct 11, 2023 · Statistical inference is currently the main approach for reconstructing phylogenetic trees. Many statistical algorithms have been proposed for ...
  133. [133]
    PhyloInfer - AI-Driven Phylogenetic Tree Reconstruction from Raw ...
    Mar 6, 2025 · The aim of the project is the development of an AI-driven tool that accurately infers phylogenetic trees directly from raw sequencing data of ...
  134. [134]
    PhyloTune: An efficient method to accelerate phylogenetic updates ...
    Jul 26, 2025 · Recent advances in deep learning offer promising opportunities for phylogenetic inference, which can be broadly categorized into classification ...
  135. [135]
    Phylogenetic Methods Meet Deep Learning - Oxford Academic
    We primarily focus on how deep learning (DL), a subset of AI methods that particularly benefits from big data, can enhance the reconstruction and analysis of ...
  136. [136]
    A Review of Artificial Intelligence based Biological-Tree Construction
    Oct 7, 2024 · We review current deep learning-based tree generation methods, summarizing recent advancements and existing challenges, offering a holistic ...
  137. [137]
    Protein Structural Phylogenetics | Genome Biology and Evolution
    Aug 21, 2025 · Most gene phylogenies are based on nucleic or amino acid sequences, as extensively reviewed by Yang and Rannala (2012) and Kapli et al. (2020).
  138. [138]
    Structural phylogenetics unravels the evolutionary diversification of ...
    Oct 10, 2025 · Here, we report the large-scale comprehensive evaluation of phylogenetic trees reconstructed from the structures of thousands of protein ...
  139. [139]
    Phylogenetics from AI-predicted Protein Structures: it works!!
    Sep 24, 2023 · Exciting new research directions. High-accuracy structural phylogenetics has the potential to uncover deeper evolutionary relationships, ...
  140. [140]
    3D protein shapes can resolve ancient evolutionary connections in ...
    Jan 15, 2025 · The three-dimensional shape of a protein can be used to resolve deep, ancient evolutionary relationships in the tree of life, according to a study in Nature ...
  141. [141]
    Structural phylogenetics unravels the evolutionary diversification of ...
    Sep 23, 2023 · Here, we demonstrate that the use of structure-based phylogenies can outperform sequence-based ones not only for distantly related proteins but ...
  142. [142]
    Protein Structural Phylogenetics - PubMed
    Jul 30, 2025 · Protein structural phylogenetics uses 3D structural data to trace evolutionary histories and explore protein structure diversity and ancestral ...
  143. [143]
    (PDF) Protein Structural Phylogenetics - ResearchGate
    Sep 16, 2025 · This article reviews the current state of protein structural phylogenetics, outlines methods for extracting evolutionary insights from ...