
Molecular evolution

Molecular evolution is the study of evolutionary changes at the molecular level, encompassing alterations in the sequences and structures of DNA, RNA, and proteins over time within and across species. This field integrates principles from population genetics, phylogenetics, and comparative genomics to explain how genetic variation arises and is maintained, driven primarily by mechanisms such as mutation, genetic drift, natural selection, gene duplication, and horizontal gene transfer. As a core subdiscipline of evolutionary biology, it emerged in the early 20th century but gained prominence in the 1960s with advances in sequencing technologies that enabled direct observation of molecular sequences, such as the amino acid sequences of proteins like hemoglobin. Key processes in molecular evolution include point mutations, which introduce substitutions at rates typically around 10^{-8} to 10^{-9} per site per generation in eukaryotes, and larger-scale events like insertions, deletions, and chromosomal rearrangements that reshape genomes. Neutral mutations, which do not affect fitness, accumulate via genetic drift and form the basis of the neutral theory proposed by Motoo Kimura in 1968, which holds that most molecular changes are selectively neutral rather than adaptive. In contrast, positive selection accelerates the fixation of beneficial variants, as seen in cases of adaptive protein evolution, while purifying selection removes deleterious changes to preserve functional constraints. Gene duplication provides raw material for innovation by allowing one copy to evolve new functions without disrupting the original, a mechanism central to the expansion of gene families across evolutionary history. Applications of molecular evolution extend to reconstructing phylogenetic relationships through sequence divergence, estimating divergence times via molecular clocks (which assume relatively constant rates of change), and understanding evolutionary phenomena at the genetic level.
The advent of high-throughput sequencing in the genomic era has revolutionized the field, enabling whole-genome comparisons that reveal patterns of selection, neutral evolution, and horizontal transfer, particularly in prokaryotes, where horizontal transfer dominates gene flux. Today, molecular evolution informs diverse areas, from tracking the emergence of pathogens to elucidating the genetic basis of adaptation in changing environments, underscoring its role in bridging microevolutionary processes with macroevolutionary patterns.

Historical Development

Early Foundations

The foundations of molecular evolution were laid in the early 20th century through pioneering biochemical studies that began to elucidate the structure and function of biological molecules, particularly proteins, which were then considered the likely carriers of genetic information. In 1901, Emil Fischer proposed the hypothesis that proteins are composed of polypeptides linked by peptide bonds, based on his synthesis of dipeptides like glycyl-glycine and analyses of protein hydrolysates, which demonstrated the polymeric nature of these macromolecules. This insight shifted understanding from proteins as amorphous colloids to structured chains of amino acids, providing an early framework for investigating molecular changes over time. Fischer received the 1902 Nobel Prize in Chemistry, primarily for related advances in sugar and purine synthesis, but his peptide work profoundly influenced subsequent protein chemistry. Building on this, the mid-20th century saw the first complete sequencing of a protein, marking a critical step toward analyzing molecular variation at the sequence level. Frederick Sanger developed methods using chromatography and chemical end-group labeling to determine the amino acid sequence of insulin, culminating in the full elucidation of its 51-amino-acid structure by 1955, including the disulfide bridges linking its A and B chains. Published in a series of papers in the Biochemical Journal, Sanger's achievement demonstrated that proteins have precise, genetically determined sequences, challenging earlier views of them as heterogeneous mixtures and enabling comparisons that hinted at evolutionary divergence. This work, for which Sanger received the 1958 Nobel Prize in Chemistry, established protein sequencing as a tool for probing hereditary traits. A pivotal realization came with the identification of DNA as the molecule responsible for heredity, shifting focus from proteins to nucleic acids in evolutionary studies.
The 1944 Avery-MacLeod-McCarty experiment demonstrated that purified DNA from virulent pneumococcus could transform non-virulent strains into virulent ones, establishing DNA, rather than proteins or other components, as the "transforming principle" and genetic material. This was confirmed in 1952 by the Hershey-Chase experiment, which used radioactively labeled bacteriophages to show that DNA, not protein, enters bacterial cells to direct viral replication: phosphorus-32-labeled DNA was recovered inside infected cells, while sulfur-35-labeled protein coats remained outside. These experiments provided empirical evidence that molecular evolution operates primarily through changes in DNA, laying the groundwork for later genetic analyses. Early observations of molecular variation emerged in the first half of the 20th century through serological studies of blood group antigens and serum proteins, revealing heritable differences at the molecular level that could inform evolutionary relationships. Karl Landsteiner's discovery of the ABO blood group system in 1900 identified antigenic variations on red blood cell surfaces, and later additions such as the Rh factor, discovered by Landsteiner and Alexander Wiener in 1940, highlighted polymorphic proteins and glycoproteins as markers of genetic variation across populations, useful for tracing migrations and relatedness. Concurrently, electrophoretic analyses of serum proteins in the 1930s, pioneered with Arne Tiselius's moving-boundary method (Nobel Prize in Chemistry, 1948), revealed individual variations in albumin and globulin fractions, suggesting underlying genetic polymorphisms well before direct sequencing was possible. These findings demonstrated that molecular traits vary systematically, offering initial empirical data for evolutionary inference without nucleotide-level detail.
The conceptual synthesis of these biochemical advances occurred in 1965, when Émile Zuckerkandl and Linus Pauling proposed the idea of "molecular paleontology," arguing that amino acid sequences in proteins serve as records of evolutionary history, allowing reconstruction of phylogenetic trees through sequence comparisons. In their seminal paper, they illustrated this by comparing hemoglobin sequences across species, showing that differences accumulate over time and reflect divergence from common ancestors, thus bridging biochemistry with evolutionary biology. This proposal emphasized proteins like hemoglobin and cytochrome c as "semantides" (molecules directly reflecting genetic information) for inferring deep-time events, setting the stage for the field's expansion with emerging sequencing technologies.

Theoretical Advances

In the early 1960s, Émile Zuckerkandl and Linus Pauling developed the molecular clock hypothesis, proposing that amino acid substitutions in proteins accumulate at a roughly constant rate over time across lineages, enabling the use of molecular differences to estimate divergence times without relying solely on fossil records. This framework assumed that evolutionary changes at the molecular level accumulate steadily, akin to the ticking of a clock, and provided a theoretical basis for reconstructing phylogenetic histories from protein sequences. Building on this, Motoo Kimura introduced the neutral theory of molecular evolution in 1968, positing that the majority of genetic variations at the molecular level are selectively neutral and become fixed in populations primarily through random genetic drift rather than natural selection. Under this theory, the rate of molecular evolution is determined by the neutral mutation rate and is independent of population size: neutral mutations are fixed at a rate equal to their input, leading to a predictable substitution rate independent of adaptive pressures. To quantify nucleotide substitution rates, Thomas H. Jukes and Charles R. Cantor proposed a one-parameter model in 1969, assuming equal probabilities of substitution among the four nucleotides and equal base frequencies. The model corrects for multiple substitutions at the same site using the formula for the expected number of substitutions per site, d = -\frac{3}{4} \ln\left(1 - \frac{4}{3} p\right), where p represents the observed proportion of differing sites between two sequences. Kimura's neutral theory ignited a longstanding debate between neutralists, who argued that most molecular changes are non-adaptive and driven by drift, and selectionists, who contended that adaptive selection plays a dominant role in shaping molecular diversity. Independently, Jack L. King and Thomas H. Jukes reinforced the neutralist perspective in 1969 by analyzing protein sequence data and concluding that many amino acid replacements are neutral and fixed via drift, challenging the primacy of Darwinian selection at the molecular level.
This debate highlighted tensions between stochastic and deterministic forces in evolution, influencing subsequent empirical tests of molecular rate constancy.
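The Jukes-Cantor correction described above can be illustrated with a short Python sketch. The function name and the two toy sequences are assumptions made for this example; real analyses would start from a proper alignment and would usually rely on an established phylogenetics package.

```python
import math


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Return the Jukes-Cantor distance d for two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only sites where both sequences carry an unambiguous base.
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    # p is the observed proportion of differing sites.
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        # The logarithm is undefined once p reaches 3/4 (saturation).
        raise ValueError("p >= 3/4: distance is undefined under this model")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# Hypothetical toy alignment: 2 differences over 20 sites, so p = 0.1.
seq_1 = "ACGTACGTACGTACGTACGT"
seq_2 = "ACGTACGTTCGTACGAACGT"
d = jukes_cantor_distance(seq_1, seq_2)
print(f"p = 0.100, d = {d:.4f}")
```

The corrected distance (about 0.107) is slightly larger than the observed proportion of differences (0.1), because the model accounts for sites that may have changed more than once.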

Field Establishment

The invention of DNA sequencing methods in 1977 marked a pivotal technological advancement for molecular evolution, allowing researchers to obtain nucleotide sequences from biological samples on an unprecedented scale. Frederick Sanger and his colleagues developed the chain-termination method, which relies on the incorporation of dideoxynucleotides to halt DNA synthesis at specific bases, facilitating the reading of sequences up to several hundred bases long. Independently, Allan Maxam and Walter Gilbert introduced a chemical cleavage approach that breaks DNA at specific nucleotides using dimethyl sulfate and other reagents, enabling the analysis of labeled DNA fragments via gel electrophoresis. These techniques shifted evolutionary studies from protein-based comparisons to direct genomic data, providing empirical foundations for tracing genetic changes over time. Although originating in 1967, the work of Allan Wilson and Vincent Sarich on the molecular clock profoundly influenced the field's consolidation in the 1970s by demonstrating that immunological differences in blood proteins, such as albumin, accumulate at a steady rate among primates. Their analysis of serum albumins from humans, apes, and Old World monkeys suggested divergence times that challenged fossil-based phylogenies, establishing molecular data as a reliable tool for reconstructing evolutionary timelines and inspiring subsequent protein and DNA-based clock models. The institutional framework of molecular evolution solidified in the early 1980s with the founding of the Society for Molecular Biology and Evolution (SMBE) in 1982, prompted by a symposium on the evolution of genes and proteins at Stony Brook University. The society launched its flagship journal, Molecular Biology and Evolution, with its first issue in December 1983, which rapidly became a premier venue for publishing research on genetic mechanisms, phylogenetics, and evolutionary genomics.
Technological innovations further entrenched the discipline, including the polymerase chain reaction (PCR), invented by Kary Mullis in 1983, which by the late 1980s enabled the amplification of minute DNA quantities, including from ancient specimens, thus bridging molecular evolution with paleogenomics. The Human Genome Project, launched in 1990 as an international effort and completed in 2003, generated the first reference human genome sequence and spurred comparative analyses across species, accelerating the integration of genomic data into evolutionary biology and establishing molecular evolution as a core interdisciplinary field.

Basic Mechanisms

Mutation

Mutations are the ultimate source of genetic variation in molecular evolution, introducing changes to the DNA or RNA sequences that serve as the raw material for subsequent evolutionary processes. The primary types of mutations include point substitutions, where a single nucleotide is replaced by another, and insertions or deletions (indels), which add or remove nucleotides from the sequence. Point substitutions are further classified as transitions, involving purine-to-purine (A↔G) or pyrimidine-to-pyrimidine (C↔T) changes, and transversions, which swap a purine for a pyrimidine or vice versa; transitions occur more frequently than transversions, often at a ratio of about 2:1 in many organisms due to biochemical biases in replication and repair. A notable example of this transition bias is the elevated mutation rate at CpG dinucleotides, where deamination of 5-methylcytosine to thymine (or of cytosine to uracil) preferentially generates C→T transitions, leading to rapid sequence divergence at these sites. Indels, while less common than substitutions in coding regions, can disrupt reading frames and have profound effects on protein function; in non-coding regions they may alter regulatory elements. Mutation rates quantify the frequency of these changes and vary widely across organisms and contexts, and the occurrence of mutations is typically modeled as a Poisson process of random, independent events over time. In eukaryotes, the per-site per-generation mutation rate for base substitutions is generally on the order of 10^{-9} to 10^{-8}, as estimated from genome sequencing and pedigree analyses; for instance, in humans, it is approximately 1.2 × 10^{-8} substitutions per nucleotide per generation. The probability of observing k mutations at a site over time t is given by the Poisson distribution: P(k) = \frac{(\mu t)^k e^{-\mu t}}{k!}, where μ is the mutation rate per site per unit time and t is the elapsed time, assuming mutations occur as rare, independent events.
In contrast, RNA viruses exhibit dramatically higher rates, ranging from 10^{-6} to 10^{-4} substitutions per site per replication cycle, driven by the error-prone nature of RNA-dependent RNA polymerases lacking proofreading activity, which enables rapid viral evolution but limits genome complexity. Germline mutations, which are heritable and passed to offspring, occur at lower rates than somatic mutations within non-reproductive tissues; for example, somatic rates in human cells can be up to two orders of magnitude higher due to accumulated divisions and reduced repair fidelity in differentiated cells. Several factors influence these mutation rates, primarily errors during DNA replication, exposure to environmental mutagens, and the efficacy of cellular repair mechanisms. Replication errors arise from the inherent infidelity of DNA polymerases, which misincorporate nucleotides at a baseline rate of about 10^{-5} to 10^{-7} per site before post-replicative correction; mismatch repair then corrects most of these errors, reducing the net rate to the observed levels. Environmental mutagens, such as ultraviolet radiation, ionizing radiation, or chemicals like alkylating agents, induce DNA lesions that, if unrepaired, lead to mutations; for instance, UV light promotes the formation of cyclobutane pyrimidine dimers, often resulting in C→T transitions at dipyrimidine sites. DNA repair pathways, including base excision repair for deaminated bases, nucleotide excision repair for bulky adducts, and mismatch repair for replication errors, actively suppress mutation accumulation; defects in these systems, as seen in hereditary conditions like Lynch syndrome (mismatch repair deficiency), can elevate rates by 100- to 1000-fold, underscoring their role in maintaining genomic stability.
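As a rough illustration of the Poisson model of mutation given in this section, the following Python sketch evaluates P(k) for a single site. The mutation rate is the human per-generation estimate quoted above; the elapsed time of one million generations and the function name are assumptions chosen only for the example.

```python
import math


def poisson_pmf(k: int, mu: float, t: float) -> float:
    """Probability of exactly k mutations at one site, with rate mu over time t."""
    lam = mu * t  # expected number of mutations at the site
    return (lam ** k) * math.exp(-lam) / math.factorial(k)


mu = 1.2e-8      # substitutions per site per generation (estimate quoted in the text)
t = 1_000_000    # generations elapsed (illustrative assumption)

p0 = poisson_pmf(0, mu, t)   # site unchanged
p1 = poisson_pmf(1, mu, t)   # exactly one mutation
p_at_least_one = 1.0 - p0    # one or more mutations

print(f"expected mutations per site: {mu * t:.3f}")
print(f"P(0) = {p0:.6f}, P(1) = {p1:.6f}, P(>=1) = {p_at_least_one:.6f}")
```

With an expected 0.012 mutations per site, almost all sites are unchanged and multiple hits at the same site are very rare, which is why corrections for multiple substitutions only matter over much longer timescales or at faster-evolving sites.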

Selection

Natural selection operates on molecular variants, such as nucleotide substitutions in DNA sequences, to favor those that enhance organismal fitness, thereby driving adaptive evolution at the genetic level. In molecular evolution, selection manifests through differential survival and reproduction of genotypes, influencing the fixation or maintenance of variants in populations. This process contrasts with neutral mechanisms by systematically altering allele frequencies based on their functional consequences, often detectable through patterns of polymorphism and divergence across species. The primary types of selection at the molecular level include purifying selection, positive selection, and balancing selection. Purifying selection removes deleterious mutations, maintaining functional constraints on proteins, and is characterized by a nonsynonymous substitution rate (dN) lower than the synonymous rate (dS), yielding a dN/dS ratio (ω) less than 1. Positive selection, conversely, promotes advantageous mutations, resulting in ω > 1, indicating adaptive changes in protein function. Balancing selection preserves genetic diversity by favoring multiple alleles, often through mechanisms like heterozygote advantage, without a straightforward dN/dS signature but evident in elevated polymorphism levels. The dN/dS ratio is calculated using codon substitution models that account for the genetic code, transition/transversion biases, and codon usage frequencies; these models estimate ω by comparing the probability of nonsynonymous versus synonymous changes along phylogenetic branches via maximum likelihood methods. To detect selection from polymorphism and divergence data, the McDonald-Kreitman (MK) test compares the ratio of nonsynonymous to synonymous polymorphisms within a species against the ratio of nonsynonymous to synonymous fixed differences between species; a significant excess of nonsynonymous fixed differences relative to polymorphisms indicates positive selection acting on adaptive substitutions. Developed in 1991, this test has been widely applied to identify departures from neutral expectations in protein-coding genes.
Positive selection can lead to selective sweeps, where a beneficial allele rapidly increases in frequency, reducing genetic diversity at linked sites through a process known as genetic hitchhiking. In this effect, alleles in genomic proximity to the selected site are carried to fixation or near-fixation, creating regions of low polymorphism and high linkage disequilibrium around the sweep. This phenomenon, first modeled in 1974, explains localized reductions in variation observed in bacterial and eukaryotic genomes under strong selection. A classic example of balancing selection is the sickle-cell allele (HbS) in humans, where heterozygotes (AS genotype) exhibit resistance to malaria caused by Plasmodium falciparum, conferring a fitness advantage in endemic regions even though the homozygous genotype (SS) causes sickle-cell anemia and is deleterious. Molecular evidence shows elevated polymorphism at the HBB locus in African populations, maintained by this heterozygote advantage. In bacteria, positive selection drives the evolution of antibiotic resistance; for instance, in extended-spectrum β-lactamase genes like CTX-M-1, dN/dS analyses reveal signatures of adaptive substitutions enhancing enzymatic activity against β-lactam antibiotics, facilitating rapid spread in clinical settings.
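The logic of the McDonald-Kreitman comparison described in this section can be sketched in a few lines of Python. The counts below are hypothetical values chosen for illustration, and the two summary statistics used here (the neutrality index and the estimated fraction of adaptive substitutions, often written α) are common companions to the test rather than something defined in this article; a full analysis would also assess significance, for example with a Fisher exact test on the 2×2 table.

```python
def mk_summary(dn: int, ds: int, pn: int, ps: int) -> tuple[float, float]:
    """Summarize a McDonald-Kreitman 2x2 table.

    dn, ds: nonsynonymous and synonymous fixed differences between species
    pn, ps: nonsynonymous and synonymous polymorphisms within a species
    Returns (neutrality_index, alpha).
    """
    if min(dn, ds, pn, ps) <= 0:
        raise ValueError("all four counts must be positive for these ratios")
    # Neutrality index: below 1 means an excess of nonsynonymous divergence.
    neutrality_index = (pn / ps) / (dn / ds)
    # alpha: estimated fraction of nonsynonymous substitutions fixed by selection.
    alpha = 1.0 - neutrality_index
    return neutrality_index, alpha


# Hypothetical counts (assumed for illustration only).
ni, alpha = mk_summary(dn=7, ds=17, pn=2, ps=42)
print(f"neutrality index = {ni:.3f}, alpha = {alpha:.3f}")
```

Here the ratio of nonsynonymous to synonymous changes is much higher among fixed differences (7/17) than among polymorphisms (2/42), giving a neutrality index well below 1, the pattern expected when positive selection has driven some amino acid substitutions to fixation.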

Genetic Drift

Genetic drift refers to the random fluctuations in allele frequencies within a population due to stochastic sampling of gametes, independent of natural selection. In the context of molecular evolution, it plays a central role in fixing or eliminating neutral variants at the DNA sequence level, particularly in finite populations where chance events can dominate evolutionary change. This process is especially pronounced in small populations, where random loss or fixation of alleles occurs more rapidly, leading to reduced genetic diversity over time. Under the neutral theory, most substitutions that become fixed in populations are neutral with respect to fitness, and the rate of molecular evolution equals the neutral mutation rate μ. This theory posits that the majority of fixed changes arise through genetic drift rather than adaptive selection, explaining the approximate constancy of evolutionary rates observed across lineages. The effective population size, denoted N_e, quantifies the strength of drift; smaller N_e amplifies its effects by increasing variance in allele frequencies. For a neutral mutation arising as a single copy in a diploid population, the probability of eventual fixation is its initial frequency, approximately 1/(2N_e), reflecting the inverse relationship between population size and the chance of random fixation. Additionally, the average time to fixation for such a mutation, conditional on it fixing, is roughly 4N_e generations, highlighting how drift operates over extended timescales in larger populations. Population bottlenecks exemplify how severe reductions in N_e accelerate drift, drastically lowering genetic diversity. In cheetahs (Acinonyx jubatus), a historical bottleneck approximately 10,000–12,000 years ago reduced the effective population size to near-extinction levels, resulting in extremely low heterozygosity across nuclear and mitochondrial loci, increased homozygosity, and heightened vulnerability to diseases and reproductive issues.
Coalescent theory provides a mathematical framework to model drift backward in time, tracing the genealogy of sampled alleles as lineages randomly coalesce into common ancestors; Kingman's 1982 formulation describes this process as a continuous-time Markov process in the limit of large populations, enabling inference of historical demographic events from modern genetic data. In small populations, elevated drift facilitates the fixation of slightly deleterious mutations, often leading to pseudogenization, the inactivation and eventual loss of functional genes through accumulated disabling changes. This phenomenon is evident in species with persistently low N_e, such as certain island endemics or fragmented populations, where purifying selection is less effective against mildly harmful variants, resulting in their genome-wide accumulation. Genetic drift thus contributes to constructive neutral evolution by allowing non-adaptive structural changes that may later become essential.
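The fixation probability and fixation time quoted in this section can be checked informally with a small Wright-Fisher simulation. The sketch below is written in Python under simplifying assumptions (an idealized population of constant size with N_e equal to the number of diploid individuals, non-overlapping generations, no selection); the function names, population size, replicate count, and seed are all choices made for this illustration.

```python
import random


def simulate_neutral_mutation(n_diploid: int, rng: random.Random) -> tuple[bool, int]:
    """Follow one new neutral mutation until it is lost or fixed.

    Returns (fixed, generations_elapsed).
    """
    total_copies = 2 * n_diploid
    count = 1  # the mutation starts as a single copy
    generations = 0
    while 0 < count < total_copies:
        freq = count / total_copies
        # Binomial sampling of the next generation's gene copies.
        count = sum(1 for _ in range(total_copies) if rng.random() < freq)
        generations += 1
    return count == total_copies, generations


def estimate_fixation(n_diploid: int, replicates: int, seed: int = 1) -> tuple[float, float]:
    """Estimate fixation probability and mean conditional fixation time."""
    rng = random.Random(seed)
    fixation_times = []
    for _ in range(replicates):
        fixed, generations = simulate_neutral_mutation(n_diploid, rng)
        if fixed:
            fixation_times.append(generations)
    p_fix = len(fixation_times) / replicates
    mean_time = (
        sum(fixation_times) / len(fixation_times) if fixation_times else float("nan")
    )
    return p_fix, mean_time


N = 20
p_fix, mean_time = estimate_fixation(n_diploid=N, replicates=4000)
print(f"estimated P(fix) = {p_fix:.4f}  (theory: 1/(2N) = {1 / (2 * N):.4f})")
print(f"mean fixation time = {mean_time:.1f} generations  (theory: about 4N = {4 * N})")
```

Because the estimates come from a finite number of stochastic replicates, they should scatter around the theoretical values rather than match them exactly; increasing the replicate count tightens the agreement.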

Gene Conversion

Gene conversion is a form of non-reciprocal recombination that homogenizes DNA sequences between paralogous regions, effectively transferring genetic information from a donor sequence to an acceptor without reciprocal exchange. This process typically arises during the repair of double-strand breaks (DSBs) in DNA, where the broken strand invades a homologous sequence and uses it as a template for synthesis, leading to the replacement of mismatched segments in the acceptor with the donor's sequence. In the context of molecular evolution, gene conversion plays a key role in maintaining sequence identity among duplicated genes, counteracting the accumulation of mutations that would otherwise promote divergence. The mechanism is particularly prominent during meiosis, where DSBs induced by the Spo11 protein initiate recombination, and pathways such as synthesis-dependent strand annealing can result in conversion tracts of 100–2000 base pairs. Biased conversion can favor GC over AT alleles due to mismatch repair preferences, influencing base composition over time. Rates of gene conversion vary across eukaryotes but are generally estimated at 10^{-6} to 10^{-4} per site per generation, often comparable to or exceeding mutation rates in some lineages, thereby exerting a significant homogenizing pressure on paralogous sequences. This reduces sequence divergence between paralogs, preserving functional similarities within gene families despite independent mutational histories. A classic example of gene conversion's impact is seen in the concerted evolution of ribosomal RNA (rRNA) genes, where multiple copies across chromosomes maintain near-identity through ongoing conversion events, as proposed in the molecular drive model. In the human genome, rRNA gene clusters exhibit this pattern, with conversion ensuring uniform sequences essential for ribosome function despite high copy numbers.
Similarly, in the globin gene families, gene conversion events have transferred pseudogene sequence into functional genes; for instance, in β-globin clusters, conversions between functional genes and pseudogenes like η-globin have altered expression patterns and potentially contributed to adaptive variants. Detection of gene conversion in molecular datasets often relies on signatures such as accelerated decay of linkage disequilibrium (LD) between markers in paralogous regions, indicating non-reciprocal exchanges that break down expected associations faster than crossing over alone. Phylogenetic analyses may also reveal incongruence, where converted sequences cluster with donors rather than expected orthologs, disrupting tree topologies and highlighting historical transfer events. These methods underscore gene conversion's role as a pervasive force in shaping sequence evolution, distinct from other mutational processes in its targeted homogenization.

Genome Architecture

Genome Size

Genome size, measured as the total amount of DNA in a haploid genome (C-value), varies enormously across organisms, spanning several orders of magnitude from approximately 5 × 10^{-5} pg in bacteriophage lambda (a virus) to over 160 pg in the fern Tmesipteris oblanceolata (as of 2024), with Paris japonica at about 152 pg representing one of the largest among flowering plants. This variation highlights the dynamic nature of molecular evolution, where genome size is not fixed but shaped by mutational processes, selection pressures, and neutral drift over evolutionary time. A central puzzle in molecular evolution is the C-value paradox, which describes the lack of correlation between genome size and organismal complexity or gene number. For instance, the onion (Allium cepa) has a haploid genome of approximately 16 pg, roughly five times larger than the human genome at about 3.3 pg, despite humans possessing greater phenotypic complexity; humans have roughly 20,000 protein-coding genes compared to the onion's estimated 40,000, so the difference in gene number is far smaller than the difference in DNA content. Similarly, bread wheat (Triticum aestivum), a hexaploid, has a genome of around 17 pg and over 100,000 protein-coding genes. This paradox arises because much of the DNA increase stems from non-genic elements rather than additional genes; in humans, for example, transposable elements (TEs) comprise about 45% of the genome, often amplifying through selfish replication without contributing to complexity. Polyploidy, the multiplication of entire chromosome sets, is another major driver, particularly in plants, where it can rapidly double or quadruple genome size and foster evolutionary innovation through gene redundancy. While genome size does not scale with gene number or complexity, it correlates strongly with cell size across eukaryotes, a relationship known as the genome size-cell size rule.
Larger genomes necessitate bigger nuclei to accommodate the DNA, which in turn influences cytoplasmic volume and overall cell dimensions; for example, angiosperms with larger C-values exhibit proportionally larger stomata and pollen grains. However, this expansion incurs evolutionary trade-offs: replicating a larger genome requires more time and energy, potentially slowing cell division rates, and increases the risk of replication errors due to the higher number of DNA synthesis events. In small populations or under resource-limited conditions, these costs can elevate extinction risk by amplifying stochastic mutations. Thus, genome size evolution balances informational storage against physiological constraints, contributing to diverse life histories in molecular evolution.
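To relate the picogram C-values quoted in this section to the base-pair figures more familiar from sequencing, a simple conversion can be sketched in Python. The conversion factor of roughly 0.978 × 10^9 base pairs per picogram of double-stranded DNA is a widely used approximation that is not stated in this article, and the dictionary simply restates the approximate C-values given above.

```python
# Assumed conversion factor: about 0.978e9 base pairs per picogram of DNA.
BP_PER_PG = 0.978e9


def pg_to_mbp(c_value_pg: float) -> float:
    """Convert a C-value in picograms to megabase pairs (Mbp)."""
    return c_value_pg * BP_PER_PG / 1e6


# Approximate C-values (pg) as quoted in the text.
c_values_pg = {
    "bacteriophage lambda": 5e-5,
    "human": 3.3,
    "onion": 16.0,
    "bread wheat": 17.0,
    "Paris japonica": 152.0,
}

sizes_mbp = {name: pg_to_mbp(pg) for name, pg in c_values_pg.items()}
onion_to_human = c_values_pg["onion"] / c_values_pg["human"]

for name, mbp in sizes_mbp.items():
    print(f"{name:22s} {mbp:12.3f} Mbp")
print(f"onion / human genome size ratio: {onion_to_human:.2f}")
```

Under this approximation the lambda value corresponds to about 49 kilobases and the human value to about 3,200 Mbp, while the onion genome comes out a little under five times the size of the human genome.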

Chromosome Organization

Chromosome organization in eukaryotes varies widely, influencing the patterns and rates of molecular evolution through structural changes that affect linkage and recombination. The number of chromosomes can differ dramatically across species due to fusions and fissions, which alter karyotypes without necessarily changing gene content. For instance, in the Indian muntjac deer (Muntiacus muntjak vaginalis), the diploid number is as low as 2n=6 in females and 2n=7 in males, resulting from extensive chromosome fusions that reduced the ancestral count from around 2n=46, as seen in the related Reeves' muntjac (Muntiacus reevesi). Similarly, the ant Myrmecia pilosula exhibits extreme intraspecific variation, with diploid numbers ranging from 2n=2 to 2n=4 or higher due to fusions and shifts, representing one of the lowest counts in animals. These variations highlight how chromosomal rearrangements can occur rapidly, with the muntjac lineage showing one of the fastest rates of karyotype evolution among vertebrates. Karyotype evolution often involves inversions and translocations that rearrange chromosome segments while preserving synteny in many cases, thereby maintaining functional genomic architecture during molecular evolution. Paracentric and pericentric inversions reverse segments of chromosomes, suppressing recombination in heterozygous individuals and potentially fixing adaptive allele combinations. Translocations, including reciprocal and nonreciprocal exchanges, can relocate large blocks of genes between chromosomes, contributing to evolutionary novelty without disrupting overall gene function if breakpoints avoid essential regions. These mechanisms are evident in mammalian lineages, where such rearrangements have accompanied species divergence while broader genomic stability has been maintained. Chromosomes also differ in centromere organization, with monocentric types featuring a single localized centromere and holocentric types distributing centromeric function along their entire length, impacting evolutionary flexibility.
Monocentric chromosomes, common in vertebrates and most plants, rely on a discrete centromere for spindle attachment, making them prone to instability during fusions or fissions. In contrast, holocentric chromosomes, found in nematodes such as Caenorhabditis elegans, some insects, and some plants, allow spindle attachments anywhere along the chromosome, facilitating tolerance to breakage and promoting higher rates of structural evolution. This distributed centromere activity has evolved convergently multiple times, enabling rapid karyotype changes without loss of viability. Chromosome number evolves at comparable rates in both systems, but holocentrics may accelerate diversification in fragmented habitats. Sex chromosome organization evolves through similar rearrangements, often leading to degeneration of the Y chromosome in XY systems due to suppressed recombination. In many mammals and Drosophila, the Y chromosome accumulates deleterious mutations, transposable elements, and gene loss after evolving from an autosome, as the lack of recombination with the X prevents the purging of harmful variants. This degeneration reduces Y gene content to essential functions like male fertility, with neo-Y chromosomes in young systems showing early signs of insertions and frameshifts. Such changes can drive sex-specific adaptations but also contribute to evolutionary instability. Overall, chromosomal rearrangements serve as key drivers of speciation by reducing recombination rates in hybrid zones, thereby preserving co-adapted gene complexes and amplifying reproductive isolation. Inversions and fusions create underdominance or suppress crossing over in heterozygotes, lowering gene flow and facilitating divergence even under gene exchange. This role is particularly pronounced in rapidly evolving lineages like muntjacs, where rearrangements correlate with species radiation.

Organelle Genomes

Organelle genomes, encompassing mitochondrial DNA (mtDNA) and chloroplast DNA (cpDNA), represent distinct evolutionary lineages derived from ancient endosymbiotic events, where free-living bacteria were incorporated into eukaryotic host cells. Mitochondria originated from an alphaproteobacterial ancestor, while chloroplasts arose from a cyanobacterial ancestor, leading to the transfer of many genes from these organelles to the nuclear genome over evolutionary time. This endosymbiotic gene transfer has resulted in highly reduced organelle genomes that retain only a subset of essential genes, primarily those involved in core bioenergetic functions like oxidative phosphorylation in mitochondria and photosynthesis in chloroplasts. In humans, for instance, the mitochondrial genome encodes 13 proteins, 22 transfer RNAs (tRNAs), and 2 ribosomal RNAs (rRNAs), totaling 37 genes. Mitochondrial genomes are typically circular, double-stranded DNA molecules, measuring approximately 16.6 kilobases (kb) in humans. In animals, they exhibit a mutation rate 10-20 times higher than that of nuclear DNA, attributed to limited DNA repair mechanisms and proximity to reactive oxygen species generated during respiration. This elevated mutation rate contributes to rapid sequence evolution; plant mtDNA, by contrast, evolves much more slowly in sequence, reflecting differences in replication fidelity and selection pressures. Over evolutionary history, extensive gene transfer to the nucleus has reduced the coding capacity of mitochondrial genomes, with many ancestral genes relocated and now expressed under nuclear control. Mitochondrial DNA inheritance is predominantly uniparental and maternal in most eukaryotes, minimizing recombination and facilitating the accumulation of mutations, though rare paternal leakage can occur. Heteroplasmy, the coexistence of multiple mtDNA variants within a cell or individual, arises from new mutations or paternal leakage and can influence disease susceptibility, as variant frequencies shift through bottleneck effects during oogenesis.
Chloroplast genomes, found in photosynthetic eukaryotes, are larger circular molecules, typically ranging from 120 to 160 kb and encoding around 100-120 genes, including those for photosynthetic proteins, rRNAs, and tRNAs. Like mtDNA, cpDNA has undergone significant gene transfer to the nucleus following its cyanobacterial endosymbiosis, leaving a compact genome focused on photosynthesis and gene expression. Chloroplasts also display uniparental, usually maternal, inheritance in angiosperms, which helps maintain genome stability but can lead to cytonuclear coordination challenges. Evolutionary rates in cpDNA are generally slower than in animal mtDNA but faster than in plant mtDNA, reflecting moderate mutation rates influenced by efficient repair systems and exposure to light-induced damage. In plants, structural rearrangements of cpDNA, such as inversions and expansions, occur less frequently than in mitochondrial genomes. Codon usage biases in chloroplast genes, often shaped by mutational pressures and translational efficiency, show preferences for AT-rich codons, contributing to adaptive evolution in photosynthetic lineages. These genomes highlight unique evolutionary dynamics, including reduced recombination and high copy numbers per cell, which alter the impact of drift relative to nuclear genomes while preserving endosymbiotic legacies.

Gene Evolution

Gene Family Dynamics

Gene families evolve through a dynamic balance of expansion and contraction, primarily driven by gene duplication and loss events that shape their size and functional diversity over evolutionary time. Duplication mechanisms include tandem duplications, where genes are copied adjacently on the same chromosome; segmental duplications, involving larger genomic regions; and whole-genome duplications (WGD), which arise from polyploidization events and affect the entire genome. These processes generate redundancy, but most duplicates are short-lived, with retention rates typically ranging from 10% to 20% following duplication, as the majority are lost due to lack of selective advantage or dosage imbalances. Retained duplicates often contribute to adaptive innovation by partitioning ancestral functions or acquiring novel ones, thereby expanding the family's repertoire.
The fate of duplicated genes is explained by models such as neofunctionalization, where one copy retains the original function while the other evolves a new role, as proposed by Ohno in his seminal work on evolution by gene duplication. In contrast, subfunctionalization posits that the two copies lose complementary subsets of the ancestral function through degenerative mutations, preserving the pair through division of labor without requiring novel adaptations, a mechanism formalized by Force et al. through analysis of regulatory mutations. A classic example is the Hox gene clusters in vertebrates, which expanded via two rounds of WGD early in vertebrate evolution; subsequent subfunctionalization partitioned spatial and temporal expression patterns among paralogous clusters (e.g., HoxA, HoxB, HoxC, HoxD), while neofunctionalization enabled innovations like fin-to-limb transitions. These models highlight how duplication fosters diversification, with outcomes depending on selective pressures and genetic context. Illustrative cases underscore these dynamics within specific families.
In the ribonucleotide reductase (RNR) family, essential for DNA synthesis, phylogenetic analyses reveal ancient duplications leading to three classes (I, II, III) that diverged through structural innovations and adaptations to oxygen levels, with the oxygen-dependent class I predominating in aerobes. Similarly, in the globin family, paralogs such as myoglobin adapted for oxygen storage in muscle; site-specific changes, such as substitutions in the heme pocket, fine-tuned oxygen-binding affinity in diving mammals like whales, balancing storage with release under hypoxia without dramatically altering overall family size.
Family contraction counterbalances expansion through pseudogenization, where duplicates accumulate disabling mutations (e.g., frameshifts, premature stops) and become nonfunctional pseudogenes, or through direct deletion of genomic segments. These losses are often neutral, governed by genetic drift in non-essential copies, though selection may accelerate pseudogenization in dosage-sensitive genes to restore dosage balance after duplication. Over time, this "birth-and-death" process keeps family size roughly in balance, with drift-driven losses offsetting duplications and preventing unchecked proliferation.
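The birth-and-death dynamics described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a fitted model: the function name `simulate_family_size`, the per-copy duplication and loss probabilities, and the discrete time steps are all hypothetical choices made for illustration.

```python
import random

def simulate_family_size(n0, birth, death, steps, rng):
    """Discrete-time birth-death walk for the number of copies in a gene family.

    In each step, every existing copy independently duplicates with
    probability `birth` or is lost (pseudogenized or deleted) with
    probability `death`. Both values are illustrative, not empirical.
    """
    size = n0
    history = [size]
    for _ in range(steps):
        gains = losses = 0
        for _ in range(size):
            u = rng.random()
            if u < birth:
                gains += 1
            elif u < birth + death:
                losses += 1
        size += gains - losses
        history.append(size)
        if size == 0:  # the family has gone extinct; nothing left to duplicate
            break
    return history

if __name__ == "__main__":
    # Equal duplication and loss rates: size fluctuates without a net trend.
    trajectory = simulate_family_size(10, 0.01, 0.01, 500, random.Random(42))
    print("final family size:", trajectory[-1])
```

When the two probabilities are equal, the expected family size stays constant while individual trajectories wander, which is one way to picture how matched duplication and loss rates limit proliferation.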

Origins of New Genes

New genes in molecular evolution can emerge through mechanisms that generate novel genetic material beyond the simple duplication of existing genes, such as de novo origination from non-coding sequences, retrotransposition, horizontal gene transfer, and exon shuffling via intronic recombination. These processes contribute to genetic innovation by creating sequences without clear homology to ancestral genes, often leading to rapid functional diversification. While gene duplication serves as a precursor for many evolutionary novelties, the following mechanisms emphasize the genesis of entirely new coding potential.
De novo origination refers to the evolution of protein-coding genes from previously non-genic DNA, including intergenic regions, introns, or untranscribed sequences that acquire transcriptional and translational competence through mutations. In Drosophila melanogaster, population genomic studies have identified 106 fixed and 142 segregating de novo genes, predominantly expressed in testis tissues, highlighting their role in reproductive adaptation. These young genes typically exhibit rapid sequence evolution, with elevated nonsynonymous substitution rates that facilitate quick acquisition of beneficial functions under sexual selection. For instance, systematic analyses across the Drosophilinae subfamily have uncovered 589 de novo candidates, underscoring their prevalence as a source of lineage-specific innovation.
Retrotransposition generates retrogenes by reverse-transcribing mature mRNA into intronless cDNA, which integrates into new genomic locations, often acquiring novel promoters and regulatory elements to gain function. Unlike standard duplicates, retrogenes start as processed copies and can evolve independently, with many becoming functional in mammals through testis-biased expression.
In humans, some retrocopies initially classified as processed pseudogenes have acquired exonic sequences or upstream promoters, enabling them to produce functional proteins distinct from their parental genes. This mechanism has contributed to the expansion of gene families, with retrogenes often showing accelerated evolution compared with their autosomal parental genes.
Horizontal gene transfer (HGT) serves as a primary origin of new genes in bacteria and archaea, allowing the acquisition of functional DNA from distantly related organisms via conjugation, transformation, or transduction. This process introduces pre-evolved genes that confer immediate adaptive advantages, such as metabolic pathways or virulence factors, bypassing gradual mutation. In prokaryotes, HGT accounts for a substantial portion of pangenome diversity, with examples including the transfer of antibiotic resistance cassettes that rapidly disseminate across bacterial populations. Although less common in eukaryotes, HGT contributes to novel genes in lineages like bdelloid rotifers and fungi, enhancing evolutionary flexibility.
Exon shuffling via intronic recombination enables the modular assembly of new genes by fusing exons, often encoding protein domains, from unrelated ancestral genes, typically through non-homologous or illegitimate recombination within introns. This has been instrumental in the evolution of complex multidomain proteins, such as those in signaling pathways in eukaryotes. For example, the gene's structure reveals ancient exon shuffling events that combined repeated domains for enhanced ligand binding. Comparative genomic analyses indicate that exon shuffling hotspots correlate with repetitive elements in introns, promoting domain fusions that drive functional novelty without whole-gene duplication.
Human-specific genes like ARHGAP11B illustrate how partial duplication can intersect with these mechanisms: the gene arose from a truncated copy of ARHGAP11A approximately 3 million years ago and evolved a novel C-terminal extension via a subsequent point mutation, producing a protein that promotes neocortical expansion.

Constructive Neutral Evolution

Constructive neutral evolution (CNE) posits that molecular complexity can arise through non-adaptive processes, where genetic drift fixes neutral or slightly deleterious mutations that increase interdependence among genetic elements, thereby constructing obligatory interactions without invoking positive selection. In this framework, redundant pathways or components initially provide functional backup, but drift can eliminate alternatives, rendering the remaining elements essential and thus enhancing system complexity. Michael Lynch elaborated on this theory, arguing that in populations of sufficiently small effective size, drift facilitates the fixation of such dependencies, leading to the emergence of intricate genetic networks that appear irreducibly complex but originate neutrally. This contrasts with adaptive explanations, which attribute complexity to direct selective benefits, whereas CNE emphasizes how neutral processes can "construct" elaborate structures by progressively locking in interdependencies.
A key mechanism in CNE involves the fixation of slightly deleterious variants that create dependencies, as drift in finite populations allows such mutations to spread despite their minor fitness costs. For instance, when a protein acquires a mutation that impairs its folding but is compensated by an existing chaperone, the chaperone becomes obligatory if the original folding pathway is lost through drift, increasing reliance on auxiliary machinery. Similarly, the evolution of the spliceosome illustrates CNE: group II self-splicing introns fragmented into smaller components that required protein factors for reassembly, with drift fixing these dependencies as the autonomous splicing capability eroded, transforming a simple ribozyme into a complex ribonucleoprotein machine. These examples highlight how CNE builds from existing elements via neutral loss of redundancy, rather than novel adaptive innovations.
The mathematical foundation for CNE relies on population genetics models of fixation probabilities, particularly for slightly deleterious variants that impose dependencies. The probability of fixation for such a variant with selection coefficient $s$ (where $s < 0$ but small in magnitude) is approximated by $\pi \approx \frac{2s}{1 - e^{-4N_e s}}$, where $N_e$ is the effective population size; in small populations, this probability approaches the neutral case of $1/(2N_e)$, allowing deleterious dependencies to accumulate and become fixed by drift. This extends to neutral complexes, where the stepwise fixation of interdependent mutations occurs without selective pressure, as the overall fitness effect remains near zero until redundancies are lost. Such dynamics underscore how CNE operates alongside genetic drift to foster molecular interdependence, providing a neutral pathway for evolutionary elaboration.
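To make the population-size dependence concrete, the approximation above can be evaluated numerically. The following is a minimal sketch: the function name `fixation_probability` and the example values of the selection coefficient and effective population size are illustrative assumptions, not figures from any study.

```python
import math

def fixation_probability(s, n_e):
    """Approximate fixation probability pi = 2s / (1 - exp(-4 * N_e * s)).

    Returns the neutral limit 1 / (2 * N_e) when s is exactly zero.
    """
    if s == 0.0:
        return 1.0 / (2.0 * n_e)
    exponent = -4.0 * n_e * s
    if exponent > 700.0:  # exp() would overflow; the probability is effectively zero
        return 0.0
    # -expm1(x) equals 1 - exp(x) and stays accurate when x is close to zero
    return 2.0 * s / -math.expm1(exponent)

if __name__ == "__main__":
    s = -1e-4  # slightly deleterious variant (illustrative value)
    for n_e in (100, 1_000, 10_000, 100_000):
        ratio = fixation_probability(s, n_e) * 2 * n_e
        print(f"N_e={n_e:>7}: fixation probability relative to neutral = {ratio:.3g}")
```

The printed ratio compares each case with the neutral expectation $1/(2N_e)$: it stays close to 1 while $|4N_e s|$ is small and collapses toward 0 as the population grows, which is the sense in which drift lets slightly deleterious dependencies fix in small populations.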

Phylogenetic Inference

Methods in Molecular Phylogenetics

Molecular phylogenetics employs a variety of methods to infer evolutionary relationships from molecular data, such as DNA, RNA, or protein sequences, by constructing phylogenetic trees that represent hypothesized ancestor-descendant relationships. These methods can be broadly categorized into distance-based approaches, which use pairwise evolutionary distances between sequences, and character-based approaches, which directly analyze sequence site patterns.
Distance methods, like neighbor-joining, are computationally efficient for large datasets; they assume an additive distance metric and build trees iteratively by joining the pair of taxa that minimizes a rate-corrected distance criterion. The neighbor-joining algorithm, introduced by Saitou and Nei in 1987, approximates the tree with the minimum total branch length and has been widely adopted for its speed and ability to handle moderate amounts of rate variation across lineages.
Character-based methods include maximum parsimony, which seeks the tree requiring the fewest evolutionary changes (steps) to explain the observed data, and maximum likelihood, which evaluates trees based on their probability under a specified model of sequence evolution. Maximum parsimony, formalized for molecular data by Fitch in 1971, prioritizes simplicity but can be inconsistent under certain conditions, such as when long branches converge artifactually. Maximum likelihood, as developed by Felsenstein in 1981, optimizes the likelihood of observing the data given a tree and evolutionary model, providing a statistical framework that accounts for substitution probabilities and branch lengths. Bayesian inference extends this by incorporating prior probabilities and using Markov chain Monte Carlo (MCMC) sampling to estimate posterior distributions of trees, as implemented in software like MrBayes by Huelsenbeck and Ronquist in 2001.
Central to likelihood-based methods are substitution models that describe the process of nucleotide or amino acid changes over time, incorporating parameters like transition/transversion ratios and base frequencies.
The HKY85 model, proposed by Hasegawa, Kishino, and Yano in 1985, extends earlier models by allowing unequal base frequencies and a distinct rate for transitions versus transversions, improving fit for diverse molecular data. To accommodate heterogeneity across sites or genomic regions, datasets are often partitioned, such as by codon positions or genomic regions (e.g., exons versus introns), allowing independent model parameters for each partition to better capture evolutionary dynamics.
Branch support in phylogenetic trees is commonly assessed using bootstrap resampling, a nonparametric technique introduced by Felsenstein in 1985, which generates pseudoreplicate datasets by resampling alignment columns with replacement and recalculates trees to estimate the proportion of replicates supporting each clade. Thresholds between 70% and 95% are commonly taken to indicate robust support, though interpretation depends on the method used.
A key challenge addressed in these methods is long-branch attraction, an artifact where rapidly evolving lineages are erroneously grouped together due to convergent substitutions, first demonstrated by Felsenstein in 1978 for parsimony and later shown to affect distance and likelihood methods without proper modeling. Techniques like using complex substitution models or slow-evolving genes mitigate this issue. Among-site rate variation is accounted for in these approaches through models like the gamma distribution, though detailed handling of rate variation across sites and lineages is addressed in the following subsection.
Widely used software packages facilitate these analyses: PHYLIP, developed by Felsenstein since 1980, supports parsimony, distance, and likelihood methods across multiple data types; RAxML, originating from Stamatakis in 2006, excels in rapid maximum likelihood inference for large alignments; and MrBayes enables Bayesian MCMC sampling for comprehensive posterior exploration. These tools have enabled phylogenomic studies by balancing accuracy, speed, and scalability in tree reconstruction.
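The structure of the HKY85 model can be made concrete by writing out its instantaneous rate matrix. The sketch below is illustrative only: the function names, the example base frequencies, and the value of the transition/transversion ratio `kappa` are assumptions chosen for demonstration, and the scaling to one expected substitution per unit time is a common convention rather than something stated in the text.

```python
BASES = "ACGT"
PURINES = {"A", "G"}

def is_transition(x, y):
    """A<->G and C<->T are transitions; every other change is a transversion."""
    return (x in PURINES) == (y in PURINES)

def hky85_rate_matrix(freqs, kappa):
    """Return the HKY85 rate matrix Q as a dict of dicts.

    freqs: equilibrium base frequencies keyed by base (should sum to 1).
    kappa: transition/transversion rate ratio.
    Off-diagonal q[x][y] is kappa * freqs[y] for transitions and freqs[y]
    for transversions; each diagonal entry makes its row sum to zero.
    """
    q = {x: {y: 0.0 for y in BASES} for x in BASES}
    for x in BASES:
        for y in BASES:
            if x != y:
                q[x][y] = (kappa if is_transition(x, y) else 1.0) * freqs[y]
        q[x][x] = -sum(q[x][y] for y in BASES if y != x)
    # Scale so the expected substitution rate at equilibrium equals 1
    mean_rate = -sum(freqs[x] * q[x][x] for x in BASES)
    for x in BASES:
        for y in BASES:
            q[x][y] /= mean_rate
    return q

if __name__ == "__main__":
    example = hky85_rate_matrix({"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}, kappa=2.0)
    for x in BASES:
        print(x, [round(example[x][y], 3) for y in BASES])
```

Setting `kappa` to 1 with equal base frequencies reduces the matrix to the Jukes-Cantor form, in which every off-diagonal rate is identical.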

Evolutionary Rate Variation

Evolutionary rate variation refers to the differences in the pace of molecular substitutions across different positions in a sequence, among genes, or along phylogenetic lineages, which complicates the assumption of a strict molecular clock in phylogenetic inference. This heterogeneity arises due to varying selective pressures, functional constraints, and mutational biases, leading to some sites or lineages evolving rapidly while others remain nearly invariant. Accounting for such variation is essential for accurate estimation of evolutionary distances and divergence times in molecular phylogenetics.
A primary source of rate variation occurs at the site level, where substitution rates differ substantially across nucleotide or amino acid positions within a gene. To model this site heterogeneity, the gamma distribution is commonly used, assuming that rates follow a continuous probability distribution that captures both conserved and hypervariable sites. In the +Γ model, site-specific rates are drawn from a gamma distribution with shape parameter α and scale parameter β (usually constrained so that the mean rate is 1), discretized into categories for computational efficiency during likelihood calculations. This approach, introduced by Yang in 1994, significantly improves phylogenetic estimates by accommodating the overdispersion of rates observed in empirical data.
Rate variation also manifests across phylogenetic lineages, where evolutionary tempos drift over time due to changes in generation time, population size, or environmental pressures. Relaxed clock models address this by allowing branch-specific rates while assuming some correlation or shared distribution among them; for instance, the uncorrelated lognormal relaxed clock treats rates on each branch as independent draws from a lognormal distribution, enabling rate heterogeneity without enforcing a global clock. Such models, developed by Drummond et al. in 2006, permit more realistic divergence time estimates in Bayesian frameworks.
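The discretization of gamma-distributed site rates can be sketched in a few lines. Note the swap: the standard approach computes category rates from gamma quantiles, whereas this dependency-free sketch approximates the category means by Monte Carlo sampling. The function name `discrete_gamma_rates`, the number of categories, the sample size, and the seed are all illustrative assumptions.

```python
import random

def discrete_gamma_rates(alpha, categories=4, samples=100_000, seed=0):
    """Approximate mean rates of equal-probability gamma rate categories.

    Draws rates from a gamma distribution with shape `alpha` and scale
    1 / alpha (so the mean rate is 1), sorts them, splits them into
    `categories` equal-sized bins, and returns the mean of each bin,
    rescaled so the category rates average exactly 1. Any draws left over
    when `samples` is not divisible by `categories` are ignored.
    """
    rng = random.Random(seed)
    draws = sorted(rng.gammavariate(alpha, 1.0 / alpha) for _ in range(samples))
    size = samples // categories
    means = [sum(draws[i * size:(i + 1) * size]) / size for i in range(categories)]
    overall = sum(means) / categories
    return [m / overall for m in means]

if __name__ == "__main__":
    for alpha in (0.5, 1.0, 50.0):
        rates = discrete_gamma_rates(alpha)
        print(f"alpha={alpha}: " + ", ".join(f"{r:.3f}" for r in rates))
```

Small values of α produce strongly unequal categories (many nearly invariant sites and a few fast ones), while large values push all categories toward 1, approaching a model with no among-site rate variation.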
Empirical observations highlight systematic rate differences, such as the generally faster substitution rates in mitochondrial DNA compared to nuclear DNA in animals, often by a factor of 5–10, attributed to higher mutation rates and reduced effective population sizes in the mitochondrial genome. The covarion model further refines this by positing that the evolutionary rate at a site can change over time: sites may switch between variable (fast-evolving) and conserved (slow-evolving) states along a phylogeny, capturing temporal shifts in selective constraints that static gamma models overlook. This model, formalized by Tuffley and Steel in 1998, is particularly useful for analyzing ancient divergences where site roles evolve.
Representative examples illustrate these patterns: ribosomal RNA (rRNA) genes often exhibit relatively clock-like evolution due to strong structural constraints maintaining conserved secondary structures, making them suitable for deep phylogenetic reconstructions. In contrast, immune system genes, such as those in the major histocompatibility complex (MHC), display highly variable rates driven by pathogen-mediated positive selection, resulting in accelerated evolution at antigen-binding sites to enhance diversity.

Modern Approaches

Sequencing Technologies

Next-generation sequencing (NGS) technologies revolutionized molecular evolutionary studies by enabling high-throughput, cost-effective analysis of genomes across populations and species. Introduced commercially in 2005 with the 454/Roche platform based on pyrosequencing, NGS shifted from Sanger sequencing's low-throughput approach to methods that generate millions of short reads (typically 50-300 base pairs) per run. Illumina's sequencing-by-synthesis technology, launched as the Genome Analyzer in 2006, dominated the field due to its accuracy and scalability, facilitating applications like population genomics to infer evolutionary histories from allele frequency data.
Long-read sequencing technologies, emerging in the 2010s, addressed limitations of short-read NGS by producing reads exceeding 10,000 base pairs, crucial for resolving repetitive regions, structural variants, and complex assemblies in evolutionary contexts. Pacific Biosciences (PacBio) introduced single-molecule real-time sequencing in 2010, offering circular consensus reads with high fidelity for de novo assembly of eukaryotic genomes to trace divergence events. Oxford Nanopore Technologies (ONT), commercialized around 2014, provided portable, real-time nanopore-based sequencing that detects base modifications directly, aiding studies of epigenetic variation and rapid microbial evolution.
Single-cell sequencing methods, such as single-cell RNA sequencing (scRNA-seq), have advanced the resolution of evolutionary processes at the cellular level by capturing transcriptomic heterogeneity in diverse lineages. Developed in the early 2010s, scRNA-seq enables reconstruction of evolutionary trajectories in cell populations, revealing developmental and adaptive dynamics without averaging bulk signals, as applied to immune cell evolution and tumor heterogeneity.
Ancient DNA (aDNA) sequencing techniques, refined for degraded samples, have illuminated human evolution; the 2010 Neanderthal genome project used NGS to sequence the genome at ~1.3-fold coverage from Neanderthal fossils, providing evidence of interbreeding with modern humans via shared variants.
Metagenomics leverages NGS to sequence all genetic material from environmental samples, uncovering microbial evolutionary diversity without cultivation, for example by tracking gene transfer in microbiomes. These advancements have driven sequencing costs down dramatically, from approximately $100 million per human genome in 2001 during the Human Genome Project era to under $1,000 by the early 2020s and around $200 as of 2025, democratizing access for evolutionary research.

Computational Tools

Computational tools play a central role in molecular evolution by enabling the analysis of vast genomic datasets to infer evolutionary histories, detect adaptive changes, and model complex processes such as selection. These tools encompass traditional phylogenetic software, advanced algorithms, and emerging frameworks that process sequence data to reconstruct evolutionary relationships and predict future trajectories. By applying statistical models at scale, they address challenges like incomplete lineage sorting and reticulate evolution, providing insights unattainable through manual methods alone.
In phylogenetics, software packages like BEAST facilitate Bayesian inference of evolutionary trees, incorporating molecular clocks to estimate divergence times from sequence alignments. BEAST uses Markov chain Monte Carlo (MCMC) sampling to integrate substitution models, tree topologies, and demographic parameters, making it particularly useful for dated phylogenies, for example in viral studies. Similarly, IQ-TREE employs maximum likelihood methods to efficiently reconstruct phylogenetic trees from large datasets, with reported advantages over alternatives like RAxML in speed and accuracy for phylogenomic analyses involving thousands of genes. Its stochastic hill-climbing algorithm optimizes tree searches while supporting model selection via ModelFinder, enabling robust inference of evolutionary rates across taxa.
Artificial intelligence has revolutionized molecular evolutionary analysis, with deep learning models enhancing tasks like sequence alignment and protein structure prediction to trace evolutionary changes. For instance, AlphaFold2, published in 2021, uses neural networks trained on evolutionary multiple sequence alignments to predict protein tertiary structures with near-experimental accuracy, revealing how mutations alter folding and function over evolutionary time.
Machine learning also aids species delimitation by clustering genomic variants without predefined boundaries; unsupervised approaches, such as those based on convolutional neural networks, integrate multilocus data to identify cryptic species boundaries more reliably than traditional methods.
Genome-wide association studies (GWAS) adapted for selection scans detect signatures of adaptation by correlating allele frequencies with environmental variables across populations, identifying loci under positive or balancing selection. These scans, often implemented in tools like PLINK or custom pipelines, analyze millions of SNPs to pinpoint adaptive variants, as demonstrated in studies where GWAS revealed polygenic responses to selective pressures. For reticulate evolution involving hybridization, network phylogeny software such as PhyloNet reconstructs non-tree-like histories by inferring reticulation events from gene trees, accounting for incomplete lineage sorting and gene flow in species complexes such as those found in fungi.
Recent advancements in the 2020s incorporate transformer models, attention-based architectures originally from natural language processing, to predict evolutionary trajectories directly from sequence data. These models learn long-range dependencies in mutational paths to forecast lineage frequencies and adaptive shifts, reportedly achieving higher accuracy than recurrent neural networks in simulations. By processing sequence alignments as "sentences," transformers enable scalable predictions of protein evolution, bridging sequence data with forward evolutionary modeling.

Experimental Methods

Experimental methods in molecular evolution enable direct observation and manipulation of genetic changes in laboratory settings, providing empirical insights into evolutionary processes that complement computational and observational approaches. These techniques involve controlled interventions, such as mutagenesis and selection, to mimic or test specific hypotheses about molecular mechanisms. By accelerating evolutionary timescales, they reveal how mutations arise, fix, and confer fitness advantages in real time, often using microbial or cellular systems for their rapid generation times.
Directed evolution stands as a cornerstone technique, pioneered in the 1990s, in which random mutagenesis and iterative selection optimize protein function. Frances Arnold's seminal work demonstrated this by using error-prone PCR to introduce random mutations into the gene encoding subtilisin E, followed by screening variants for activity in the organic solvent dimethylformamide (DMF), yielding enzymes with up to 100-fold improved stability and function in non-aqueous environments. This method, involving cycles of mutagenesis (e.g., via error-prone PCR with biased nucleotide incorporation) and high-throughput selection, has since been applied to engineer enzymes for industrial biocatalysis, such as improving substrate specificity in lipases and oxidoreductases. Arnold's approach highlighted how laboratory evolution parallels natural processes, with recombination steps like DNA shuffling further enhancing diversity and efficiency.
Long-term evolution experiments provide a window into sustained molecular change over thousands of generations. Richard Lenski's long-term evolution experiment (LTEE), initiated in 1988, tracks 12 initially identical asexual populations of Escherichia coli propagated daily in a glucose-limited medium, exceeding 80,000 generations as of 2024, with daily transfers continuing into 2025 to yield ongoing data on adaptation dynamics.
Key observations include parallel mutations in core metabolic genes across populations, as well as the evolution of citrate utilization in one population after about 31,500 generations via a tandem duplication enabling aerobic citrate uptake, a novel trait absent in the ancestor. These experiments quantify fitness increases (e.g., up to 1.5-fold over 50,000 generations), revealing contingency and repeatability in molecular trajectories under controlled selection.
CRISPR-Cas9 has revolutionized experimental testing of evolutionary hypotheses since its adaptation for genome editing in 2012, allowing precise simulation of mutations to assess their impacts. In stickleback fish, CRISPR-Cas9 targeted edits at the Ectodysplasin (Eda) locus, a major evolutionary site for armor plate reduction, and confirmed its role in parallel adaptation to freshwater environments by altering phenotypic traits like plate coverage. Similarly, in Pierid butterflies, knockouts of the nitrile-specifier protein (NSP) gene disrupted glucosinolate detoxification, testing coevolutionary arms races with host plants and revealing how single mutations drive ecological specialization. These applications enable direct tests of causality, such as linking specific alleles to traits under selection, without relying on natural variation.
Organoid models extend experimental evolution to multicellular contexts, culturing three-dimensional tissue-like structures from stem cells to study intercellular dynamics. These self-organizing systems recapitulate tissue architecture, allowing observation of evolutionary processes like somatic evolution in cancer organoids, where sequential mutations lead to heterogeneous populations mimicking tumor progression over weeks. For instance, intestinal organoids derived from patient cells enable tracking of driver mutations, illustrating how multicellular constraints shape evolutionary paths differently from unicellular models. Recent advances incorporate environmental stressors to simulate selection, providing insights into developmental evolution.
Synthetic biology techniques, including ancestral sequence reconstruction (ASR), resurrect ancient genes to probe molecular evolution directly. By inferring and synthesizing ancestral DNA sequences from phylogenetic alignments, researchers express and test proteins from extinct lineages, such as resurrecting a 450-million-year-old luciferase enzyme that illuminated early bioluminescent transitions in copepods. In a 2020s example, ASR revived ancient antibiotic resistance proteins from soil bacteria, revealing how promiscuous ancestral enzymes evolved specificity against modern pathogens, informing drug design. These methods confirm evolutionary predictions, like increased stability in ancient steroid receptors, and bridge paleogenomics with functional assays.
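The inference step of ancestral sequence reconstruction can be illustrated for a single alignment column. Practical ASR generally relies on likelihood-based or Bayesian inference; the sketch below swaps in the simpler Fitch parsimony procedure to show the idea of propagating observed states up a tree and then assigning states to internal nodes. The tree shape, the leaf states, and the function names are hypothetical.

```python
def fitch_up(node):
    """Bottom-up pass: return (annotated tree, candidate state set, change count).

    A leaf is a one-character string; an internal node is a (left, right) tuple.
    """
    if isinstance(node, str):
        return node, {node}, 0
    left, right = node
    l_tree, l_set, l_cost = fitch_up(left)
    r_tree, r_set, r_cost = fitch_up(right)
    common = l_set & r_set
    if common:  # children agree on at least one state: no change needed here
        return (l_tree, r_tree, common), common, l_cost + r_cost
    union = l_set | r_set  # children disagree: one substitution is required
    return (l_tree, r_tree, union), union, l_cost + r_cost + 1

def fitch_down(annotated, parent_state=None):
    """Top-down pass: pick one state per internal node, listed in preorder."""
    if isinstance(annotated, str):
        return []
    left, right, states = annotated
    # Keep the parent's state when allowed; otherwise break ties alphabetically
    state = parent_state if parent_state in states else min(states)
    return [state] + fitch_down(left, state) + fitch_down(right, state)

def reconstruct_site(tree):
    """Return (ancestral states in preorder, minimum number of changes)."""
    annotated, _, changes = fitch_up(tree)
    return fitch_down(annotated), changes

if __name__ == "__main__":
    # Hypothetical six-taxon tree with one nucleotide observed per leaf
    tree = ((("A", "A"), ("A", "G")), ("G", "G"))
    states, changes = reconstruct_site(tree)
    print("ancestral states (preorder):", states, "changes:", changes)
```

In a full reconstruction this would be repeated for every column of the alignment, and the inferred root or internal-node sequence would then be synthesized and assayed. Parsimony ignores branch lengths and does not quantify uncertainty, which is one reason likelihood-based methods are usually preferred.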

Academic Resources

Key Journals

Several key peer-reviewed journals serve as primary outlets for research in molecular evolution, publishing theoretical, empirical, and genomic studies that advance understanding of evolutionary processes at the molecular level. These journals emphasize rigorous peer review and high-impact contributions, often integrating computational, experimental, and phylogenetic approaches.
Molecular Biology and Evolution (MBE), founded in 1983 by the Society for Molecular Biology and Evolution (SMBE), is a leading venue for theoretical and empirical studies on molecular evolutionary patterns, processes, and predictions across taxonomic, functional, genomic, and phenotypic levels. It transitioned to a fully open-access model in 2021 to broaden accessibility, reflecting broader trends in the field. The journal's 2024 impact factor is 5.3 (Clarivate Analytics), underscoring its influence in evolutionary biology.
Genome Biology and Evolution (GBE), established in 2009 as an open-access sister journal to MBE under SMBE, specializes in genomic approaches to evolutionary questions, including genome structure, function, and adaptation. It prioritizes data-intensive research, aligning with the post-2010 shift toward open-access formats and large-scale genomic datasets in molecular evolution studies. GBE's 2024 impact factor is 2.8 (Clarivate Analytics), positioning it as a key resource for interdisciplinary genomic-evolutionary work.
Evolution, launched in 1947 by the Society for the Study of Evolution, covers a broad spectrum of evolutionary biology but remains central to molecular evolution through its publications on genetic mechanisms. Its scope includes empirical molecular studies that bridge micro- and macroevolutionary scales. The journal's 2024 impact factor is 2.6.
Systematic Biology, originating in 1952 as Systematic Zoology and renamed in 1992, focuses on phylogenetic inference and evolutionary systematics, with significant contributions to molecular phylogenetics and rate variation analyses.
It publishes methodologically innovative papers that integrate molecular data for reconstructing evolutionary histories. The journal's 2024 impact factor is 5.7 (Clarivate Analytics), with a 5-year impact factor of 6.9. Post-2010, molecular evolution journals have increasingly adopted open-access models and emphasized data-heavy publications, driven by advances in sequencing technologies and the need for reproducible, large-scale analyses. This trend, exemplified by GBE's launch and MBE's 2021 transition, has facilitated wider dissemination of genomic datasets and computational tools central to the field.

Professional Societies

The Society for Molecular Biology and Evolution (SMBE), established in 1982, serves as a primary organization dedicated to advancing research in molecular evolution by facilitating communication among researchers worldwide. It hosts annual meetings that convene researchers to present and discuss advancements in areas such as phylogenomics, often featuring symposia on emerging topics. SMBE recognizes exceptional contributions through several awards, including the Early-Career Excellence Award for independent researchers within 3-7 years post-PhD, the Mid-Career Excellence Award, the Lifetime Research Achievement Award, and the Service to the SMBE Community Award, each providing cash prizes and travel support to recipients. Following 2020, SMBE launched the Inclusion, Diversity, Equity, and Access (IDEA) initiative to enhance participation from underrepresented groups in molecular biology and evolution research.
The European Society for Evolutionary Biology (ESEB), founded in 1987, promotes evolutionary biology across Europe and globally, with dedicated sections on molecular evolution that integrate molecular and population-level analyses. Its biennial congresses, attracting over 1,500 participants, include symposia on molecular evolution and related fields, fostering interdisciplinary collaboration. ESEB supports equal opportunities through its Equal Opportunities Board, which funds workshops, seminars, and travel grants for under-represented early-career researchers, alongside inclusivity measures like customized badges and social mixers at events implemented post-2020. These initiatives aim to broaden representation in evolutionary studies, including molecular aspects.
Professional societies in molecular evolution also support specialized subgroups and activities, such as those addressing plant molecular evolution within broader frameworks like SMBE's plant-focused sessions or ESEB's symposia on non-seed plants.
To tackle research gaps, these organizations promote workshops on non-model organisms, enabling studies of evolutionary processes in understudied taxa, as exemplified by collaborative events emphasizing practical genomic tools for diverse organisms. Many such societies affiliate with journals for dissemination, though detailed outlets are covered elsewhere.

  12. [12]
    Adaptive protein evolution at the Adh locus in Drosophila - Nature
    Jun 20, 1991 · Here we propose a simple statistical test of the neutral protein evolution hypothesis based on a comparison of the number of amino-acid replacement ...
  13. [13]
    Evolutionary Trajectories of Beta-Lactamase CTX-M-1 Cluster ...
    Antibiotic resistance presently constitutes one of the major factors influencing pathogenesis and the outcome of infections in antibiotic-exposed patients.
  14. [14]
    Evolutionary Rate at the Molecular Level - Nature
    Article; Published: 17 February 1968. Evolutionary Rate at the Molecular Level. MOTOO KIMURA. Nature volume 217, pages 624–626 (1968)Cite this article. 16k ...Missing: URL | Show results with:URL
  15. [15]
    The Cheetah Is Depauperate in Genetic Variation - Science
    Abstract. A sample of 55 South African cheetahs (Acinonyx jubatus jubatus) from two geographically isolated populations in South Africa were found to be ...Missing: URL | Show results with:URL
  16. [16]
    The coalescent - ScienceDirect.com
    September 1982, Pages 235-248. Stochastic Processes and their Applications. The coalescent. Author links open overlay panel J.F.C. Kingman. Show more. Add to ...
  17. [17]
    Gene conversion: mechanisms, evolution and human disease
    Gene conversion, one of the two mechanisms of homologous recombination, involves the unidirectional transfer of genetic material from a 'donor' sequence to a ...
  18. [18]
    Gene conversion: a non-Mendelian process integral to meiotic ...
    Apr 7, 2022 · The initiating events of meiotic recombination are DNA double-strand breaks (DSBs) which need to be repaired in a certain way to enable the ...
  19. [19]
    Gene Conversion and Evolution of Gene Families: An Overview
    Gene conversion, which is non-reciprocal transfer of genetic material, is one mechanism for such dynamics, and generates genetic variation of various kinds.Missing: seminal | Show results with:seminal
  20. [20]
    Frequent nonallelic gene conversion on the human lineage ... - PNAS
    Gene conversion is the copying of a genetic sequence from a “donor” region to an “acceptor.” In nonallelic gene conversion (NAGC), the donor and the acceptor ...Missing: seminal | Show results with:seminal
  21. [21]
    Genome size - Bacteriophage Lambda - BNID 105770
    Genome size ; 48502 bp · Bacteriophage Lambda · Sanger F, Coulson AR, Hong GF, Hill DF, Petersen GB. Nucleotide sequence of bacteriophage lambda DNA. J Mol Biol.Missing: pg | Show results with:pg
  22. [22]
    Insights from the first genome assembly of Onion (Allium cepa) - PMC
    With an estimated genome size of 16,400 Gb/1C, we managed to assemble ∼91% of the onion genome. As our assembly is mostly based on short read Illumina ...
  23. [23]
    On the length, weight and GC content of the human genome
    Feb 27, 2019 · The male nuclear diploid genome extends for 6.27 Gigabase pairs (Gbp), is 205.00 cm (cm) long and weighs 6.41 picograms (pg).
  24. [24]
    You're more complex than a worm, and researchers now believe ...
    Feb 20, 2025 · The C-value of humans is over 3 billion base pairs. The C-value of Triticum aestivum, bread wheat, is much bigger—17 billion base pairs. If C- ...
  25. [25]
    The Transposable Element Environment of Human Genes Differs ...
    May 1, 2021 · Transposable elements (TEs) are major components of eukaryotic genomes and represent approximately 45% of the human genome. TEs can be important ...The Transposable Element... · Results · Literature Cited
  26. [26]
    Polyploidy as a Fundamental Phenomenon in Evolution ...
    Mar 24, 2022 · Polyploidy-related increase in biological plasticity, adaptation, and stress resistance manifests in evolution, development, regeneration, aging, oncogenesis, ...
  27. [27]
    Genome size is a strong predictor of cell size and stomatal density in ...
    Aug 6, 2008 · Across eukaryotes phenotypic correlations with genome size are thought to scale from genome size effects on cell size.
  28. [28]
    Replication rate-information storage trade-off shapes genome ...
    Aug 10, 2025 · In this study, we theoretically identify the evolutionary pressures that may have driven this divergence in genome size. We use a parameter-free ...Missing: risk | Show results with:risk
  29. [29]
    [PDF] Genome Size and the Extinction of Small Populations - The Adami Lab
    Additionally, we show that genotypes with large genomes have an elevated proba- bility of stochastic replication errors during reproduction (i.e., stochastic ...
  30. [30]
    Molecular mechanisms and topological consequences of drastic ...
    Nov 25, 2021 · In mammals, the chromosome number ranges from 2n = 6 in the female Indian muntjac (M. muntjak vaginalis) to 2n = 102 in the viscacha rat ...
  31. [31]
    Rapid and Parallel Chromosomal Number Reductions in Muntjac ...
    Analyses of sequence divergence reveal that the rate of change in chromosome number in muntjac deer is one of the fastest in vertebrates. Within the muntjac ...Missing: fissions | Show results with:fissions
  32. [32]
    Myrmecia pilosula, an Ant with Only One Pair of Chromosomes
    A new sibling species of the primitive Australian ant Myrmecia pilosula has a chromosome number of n = 1. C-banding techniques confirm that the two chromosomes ...
  33. [33]
    Chromosomal polymorphisms involving telomere fusion ...
    Oct 3, 1989 · The ant *Myrmecia pilosula* has chromosome numbers of 2n=2, 3, and 4, with six polymorphic chromosomes. Telomere fusion and centromere shift ...
  34. [34]
    Chromosomal evolution and speciation: a recombination‐based ...
    Nov 14, 2003 · These studies reveal that reciprocal and nonreciprocal translocations and paracentric and pericentric inversions are the gross structural ...
  35. [35]
    Frequency, Origins, and Evolutionary Role of Chromosomal ...
    Mar 17, 2020 · Chromosomal inversions have the potential to play an important role in evolution by reducing recombination between favorable combinations of alleles.
  36. [36]
    Chromosome number evolves at equal rates in holocentric and ...
    Oct 13, 2020 · Monocentric chromosomes possess a single region that function as the centromere while in holocentric chromosomes centromere activity is spread ...
  37. [37]
    Evolution of holocentric chromosomes: Drivers, diversity, and ...
    Across the eukaryotic tree, holocentric organisms show a sporadic distribution as compared to monocentric ones which are widespread. Holocentricity is so ...
  38. [38]
    Genetic degeneration of old and young Y chromosomes in ... - PNAS
    May 13, 2014 · Heteromorphic sex chromosomes have originated independently in many species, and a common feature of their evolution is the degeneration of the ...
  39. [39]
    Molecular aspects of Y-chromosome degeneration in Drosophila - NIH
    Genes on the nonrecombining neo-Y chromosome showed various signs of degeneration, including TE insertions, frameshift mutations, and a higher rate of amino ...
  40. [40]
    How chromosomal rearrangements shape adaptation and speciation
    Heterozygosity for chromosomal inversions severely reduces the rate of recombination through multiple mechanisms reviewed below, thereby preventing multiple ...
  41. [41]
    Chromosomal rearrangements and speciation - PubMed
    I argue that rearrangements reduce gene flow more by suppressing recombination and extending the effects of linked isolation genes than by reducing fitness.Missing: drivers | Show results with:drivers
  42. [42]
    Retention of duplicated genes in evolution - PMC - PubMed Central
    Gene duplication is a prevalent phenomenon across the tree of life. The processes that lead to the retention of duplicated genes are not well understood.
  43. [43]
    Rapid genome reshaping by multiple-gene loss after whole ... - PNAS
    We found that rapid gene loss did occur in the first 60 My, with a loss of more than 70–80% of duplicated genes, and produced similar genomic gene arrangements ...
  44. [44]
    Evolution by Gene Duplication | SpringerLink
    In stockAug 23, 2014 · Book Title: Evolution by Gene Duplication · Authors: Susumu Ohno · Publisher: Springer Berlin, Heidelberg · eBook Packages: Springer Book Archive.
  45. [45]
    Diversification and Functional Evolution of HOX Proteins - PMC
    In this review, we will provide a general overview of gene duplication and functional divergence and then focus on the functional evolution of HOX proteins.
  46. [46]
    Comprehensive phylogenetic analysis of the ribonucleotide ... - eLife
    Sep 1, 2022 · To study the molecular evolution of the RNR family, we performed comprehensive phylogenetic inference on the catalytic α subunits (Figure 2).
  47. [47]
    Common and unique strategies of myoglobin evolution for deep-sea ...
    Aug 20, 2021 · One of the best examples is myoglobin (Mb), which functions in O2 storage in muscle tissues for aerobic exercise. Mb is highly concentrated in ...
  48. [48]
    The life and death of gene families - Demuth - Wiley Online Library
    Jan 20, 2009 · This “accordian” of gene family expansion and contraction suggests that selection can fine tune gene dosage by adjusting gene copy number in ...
  49. [49]
    Gene loss through pseudogenization contributes to the ecological ...
    Sep 30, 2020 · Pseudogenization is a major mechanism underlying gene loss, and pseudogenes are best characterized by comparing closely related genomes because of their short ...
  50. [50]
    Origin and Spread of de Novo Genes in Drosophila melanogaster ...
    Jan 23, 2014 · In Drosophila, de novo genes tend to be specifically expressed in tissues associated with male reproduction (2, 10), which suggests that sexual ...
  51. [51]
    The origin and structural evolution of de novo genes in Drosophila
    Jan 27, 2024 · Recent studies reveal that de novo gene origination from previously non-genic sequences is a common mechanism for gene innovation.
  52. [52]
    Evolutionary Origin and Functions of Retrogene Introns
    Jun 24, 2009 · Retroposed genes (retrogenes) originate via the reverse transcription of mature messenger RNAs from parental source genes and are therefore ...
  53. [53]
  54. [54]
    a new method for reconstructing phylogenetic trees - PubMed
    A new method called the neighbor-joining method is proposed for reconstructing phylogenetic trees from evolutionary distance data.
  55. [55]
    Evolutionary trees from DNA sequences: A maximum likelihood ...
    The application of maximum likelihood techniques to the estimation of evolutionary trees from nucleic acid sequence data is discussed.
  56. [56]
    MRBAYES: Bayesian inference of phylogenetic trees | Bioinformatics
    Cite. John P. Huelsenbeck, Fredrik Ronquist, MRBAYES: Bayesian inference of phylogenetic trees , Bioinformatics, Volume 17, Issue 8, August 2001, Pages 754 ...
  57. [57]
    Empirical evaluation of partitioning schemes for phylogenetic ...
    We found that the most useful categories for partitioning were codon position, RNA secondary structure pairing, and the coding/noncoding distinction.
  58. [58]
    PHYLIP Home Page
    PHYLIP is a free package of programs for inferring phylogenies. It is distributed as source code, documentation files, and a number of different types of ...
  59. [59]
    RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses ...
    RAxML-VI-HPC is a program for large phylogenetic inference using maximum likelihood, with optimizations for speed and parallel processing, and mixed models.
  60. [60]
    Maximum likelihood phylogenetic estimation from DNA sequences ...
    Two approximate methods are proposed for maximum likelihood phylogenetic estimation, which allow variable rates of substitution across nucleotide sites.
  61. [61]
    Relaxed Phylogenetics and Dating with Confidence | PLOS Biology
    Here we introduce a new approach to performing relaxed phylogenetic analysis. We describe how it can be used to estimate phylogenies and divergence times.
  62. [62]
    Modeling the covarion hypothesis of nucleotide substitution
    A “covarion” model for nucleotide substitution that allows sites to turn “on” and “off” with time was proposed in 1970 by Fitch and Markowitz.
  63. [63]
    Generation Time Effect on the Rate of Molecular Evolution in ...
    We found evidence that invertebrate eumetazoan species with shorter GTs have faster rates of molecular evolution in ribosomal RNA genes in mitochondrial and ...
  64. [64]
    Quantifying Adaptive Evolution in the Drosophila Immune System
    Immune genes show more variation in rates of adaptive evolution than other genes. The high rate of adaptive evolution that we found in immunity genes could be ...
  65. [65]
    The evolution of next-generation sequencing technologies - PMC
    NGS technology started with the development of pyrosequencing [8] and was first commercially available in 2005 as the 454/Roche platform [9].
  66. [66]
    Historical Perspective, Development and Applications of Next ...
    In 2005, Solexa released the Genome Analyzer (GA). Its sequencing technology is based on sequencing by synthesis (SBS) using reversible dye-terminators ...<|separator|>
  67. [67]
    Advancements in long-read genome sequencing technologies and ...
    The recent advent of long read sequencing technologies, such as Pacific Biosciences (PacBio) and Oxford Nanopore technology (ONT), have led to substantial ...
  68. [68]
    Method of the year: long-read sequencing - Nature
    Jan 12, 2023 · With long reads generated with nanopore sequencing on ONT instruments, they can “resolve those complex genomic aberrations in cancer that are ...
  69. [69]
    Single‐cell RNA sequencing technologies and applications: A brief ...
    Mar 29, 2022 · One important application of the scRNA‐seq technology is to build a better and high‐resolution catalogue of cells in all living organism, ...
  70. [70]
    Investigating bacterial evolution in nature with metagenomics
    Metagenomic sequencing allows the investigation of bacterial evolution. •. Mutation, selection, drift, and gene flow occur in microbiomes in nature. •. These ...
  71. [71]
    DNA Sequencing Costs: Data
    May 16, 2023 · Data used to estimate the cost of sequencing the human genome over time since the Human Genome Project.
  72. [72]
    BEAST 2: A Software Platform for Bayesian Evolutionary Analysis
    A BEAST 2 package is a collection of BEASTObject extensions that builds on that platform. Packages make it easier to describe the separate pieces of academic ...
  73. [73]
    BEAST: Bayesian evolutionary analysis by sampling trees
    Nov 8, 2007 · BEAST is a powerful and flexible evolutionary analysis package for molecular sequence variation. It also provides a resource for the further development of new ...
  74. [74]
    IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating ...
    However, the range of obtaining higher likelihoods with IQ-TREE improves to 73.3–97.1%. IQ-TREE is freely available at http://www.cibiv.at/software/iqtree.Results · Fig. 2 · Fig. 3
  75. [75]
    Highly accurate protein structure prediction with AlphaFold - Nature
    Jul 15, 2021 · AlphaFold greatly improves the accuracy of structure prediction by incorporating novel neural network architectures and training procedures ...
  76. [76]
    A demonstration of unsupervised machine learning in species ...
    Jul 16, 2019 · We argue that machine learning methods are ideally suited for species delimitation and may perform well in many natural systems and across taxa ...
  77. [77]
    What can genome‐wide association studies tell us about the ...
    Feb 17, 2017 · Studies of molecular traits, like gene expression, could be more likely to detect the effects of negative selection than GWAS on physical traits ...
  78. [78]
    PhyloNet: a software package for analyzing and reconstructing ...
    Jul 28, 2008 · In this paper, we report on the PhyloNet software package, which is a suite of tools for analyzing reticulate evolutionary relationships, or evolutionary ...
  79. [79]
  80. [80]
    A transformer model for SARS-CoV-2 lineage frequency forecasting
    Transformer models accurately forecast lineage trends in the USA and the UK, and more generally. The results above showed that the transformer models perform ...
  81. [81]
    Experimental evolution and the dynamics of adaptation and genome ...
    May 16, 2017 · The long-term evolution experiment, or LTEE, is simple both conceptually and practically. Twelve populations were started the same ancestral ...
  82. [82]
    [PDF] DIRECTED EVOLUTION OF ENZYMES AND BINDING PROTEINS
    Oct 3, 2018 · In the seminal paper (4), Arnold had mastered the whole work flow for directed evolution of enzymes, a methodology relying on several parts: 1) ...
  83. [83]
    Introduction to the Long-Term Evolution Experiment (LTEE)
    The LTEE is a pretty simple experiment, both conceptually and methodologically. The core idea is to observe and quantify the process of evolution in action.
  84. [84]
    Efficient CRISPR-Cas9 editing of major evolutionary loci in ... - NIH
    Our studies show that CRISPR-Cas9 is a powerful tool to induce mutations at defined loci in sticklebacks and to study the biology of major evolutionary loci.
  85. [85]
    Testing hypotheses of a coevolutionary key innovation ... - PNAS
    Dec 12, 2022 · Here, we use CRISPR-Cas9 gene knockouts to remove a Pierine butterfly's ability to detoxify mustard defensive chemistry.
  86. [86]
    3D multicellular systems in disease modelling: From organoids to ...
    Feb 1, 2023 · Hence, human organoids represent a powerful 3D multicellular system for modelling human-specific aspects of development and disease, bridging ...
  87. [87]
    Reconstructing Ancient Proteins to Understand the Causes of ...
    Here we review recent studies employing ancestral protein reconstruction and show how they have produced new knowledge not only of molecular evolutionary ...
  88. [88]
    Molecular de-extinction opens possibilities for new antibiotics - CAS
    Oct 15, 2025 · Molecular de-extinction revives ancient proteins and genes to discover new antibiotics and fight drug-resistant pathogens.
  89. [89]
    Molecular Biology and Evolution - Scimago
    Molecular Biology and Evolution open access ; SJR 2024. 4.085 Q1 ; H-Index. 256 ; Publication type. Journals ; ISSN. 07374038, 15371719 ; Coverage. 1983-2025 ...Missing: founding | Show results with:founding
  90. [90]
    MBE Transitions to the Open Access Publication Model in 2021
    Dec 10, 2020 · In 2010, SMBE introduced Genome Biology and Evolution (GBE), an online-only journal published under an OA model. Yet the OA movement has not ...
  91. [91]
    Molecular Biology and Evolution | Oxford Academic
    GBE is the sister journal of MBE. It is fully open access and publishes leading original research at the interface between evolutionary biology and genomics.Advance articles · Author guidelines · Editorial Board · About the Journal
  92. [92]
    Genome Biology and Evolution | Oxford Academic
    Impact Factor: 3.3 (20 out of 52 in Evolutionary Biology). 5 Year Impact Factor: 3.5. Google h5-index: 48 (9th in Evolutionary Biology). The Society for ...About the journal · Author Guidelines · Advance articles · Editorial Board
  93. [93]
    History - Society for the Study of Evolution
    Over 500 members joined the Society during the first year of its existence, and on the occasion of the First Annual Meeting in Boston, December 28-31, 1946, ...
  94. [94]
  95. [95]
    Systematic Biology - Scimago
    Systematic Biology ; Publisher. Oxford University Press ; SJR 2024. 2.945 Q1 ; H-Index. 208 ; Publication type. Journals ; ISSN. 10635157, 1076836X ...Missing: founding | Show results with:founding
  96. [96]
    Systematic Biology | Oxford Academic
    2024 Journal Impact Factor (Clarivate). 6.9. 2024 5 Year Impact Factor (Clarivate). 5/53. Evolutionary Biology (Clarivate). 13.1. 2024 CiteScore (Scopus). All ...About · Advance articles · Author Guidelines · Volume 74 Issue 3 May 2025Missing: founding | Show results with:founding
  97. [97]
    Society for Molecular Biology and Evolution: Home
    The Society for Molecular Biology and Evolution fosters communication among molecular evolutionists and advances the field. It publishes two peer-reviewed ...MeetingsMembership ApplicationMembershipCouncilMolecular Biology & Evolution
  98. [98]
    SMBE 2024
    This symposium will bring together researchers studying various aspects of evolutionary processes and population history in humans at the level of the genome.
  99. [99]
    Awards - Society for Molecular Biology and Evolution
    SMBE instituted four new awards for: Early-Career, Mid-Career, and Lifetime Research Achievements, and Service to the SMBE Community.Missing: Hannan | Show results with:Hannan
  100. [100]
    Society for Molecular Biology and Evolution Journals
    SMBE IDEA. The SMBE IDEA task force aims to increase the participation of scientists from diverse backgrounds in the fields of molecular biology and evolution.
  101. [101]
    European Society for Evolutionary Biology: ESEB
    The European Society for Evolutionary Biology (ESEB) is an academic society that brings together more than 1500 evolutionary biologists from Europe and the rest ...
  102. [102]
    List of Symposia – ESEB 2025. Congress of the European Society ...
    It will cover key areas of molecular evolution, evolutionary and population genomics, including genetic load and demographic history, as well as micro- and ...
  103. [103]
    Funds and awards to promote Equal Opportunities - ESEB
    Sponsor activities (e.g., workshops, seminars, symposia) through the Equal Opportunities Initiative Fund and the Under-represented Early Career Research ...
  104. [104]
    Workshop on Molecular Evolution | Marine Biological Laboratory
    The workshop serves graduate students, postdocs, and established faculty from around the world seeking to apply the principles of molecular evolution.