
C-value

The C-value, also known as the haploid genome size, refers to the total amount of DNA contained within the unreplicated nucleus of a gamete or a haploid somatic cell of an organism. It is conventionally measured in picograms (pg) of DNA or in base pairs (bp), with values ranging from about 0.01 pg in certain yeasts to over 150 pg in some plants. The concept is closely linked to the C-value paradox (or enigma), which highlights the puzzling lack of correlation between an organism's genome size and its biological complexity, gene number, or phenotypic sophistication: some organisms usually regarded as simpler, such as certain amphibians, possess far larger genomes than humans. This paradox, first formally articulated by Charles A. Thomas Jr. in 1971, arises because eukaryotic genomes are dominated by non-coding DNA, which can comprise 90% or more of the total sequence in many species. Key contributors to variation include repetitive DNA elements (such as transposable elements), expanded introns, pseudogenes, and genome duplication events, which amplify DNA content without proportionally increasing the functional gene repertoire. For instance, the human genome totals about 3.2 × 10⁹ bp (or ~3.3 pg) and encodes roughly 20,000–25,000 protein-coding genes, while the genome of the marbled lungfish (Protopterus aethiopicus) exceeds 130 pg despite similar or lower complexity. In plants, angiosperms exhibit over 1,000-fold variation in C-value, often driven by transposable element proliferation, as seen in maize (Zea mays), where such elements account for much of the genome. Animals show comparable disparities, with pufferfish maintaining compact genomes of around 0.4 pg through efficient DNA deletion mechanisms, in contrast to the bloated genomes of salamanders, which reach 120 pg.
While early explanations invoked "selfish" or "junk" DNA proliferating without selective pressure, contemporary research emphasizes the functional roles of much non-coding DNA in gene regulation and chromatin structure, rendering the paradox less enigmatic but still a driver of genomic studies. Genome size also influences cellular processes such as division rates and metabolic costs, with larger C-values often correlating with bigger cells and slower development in certain taxa. Databases such as the Plant DNA C-values Database and the Animal Genome Size Database continue to catalog these variations, aiding evolutionary and ecological analyses.

Definition and Fundamentals

Definition

The C-value refers to the amount of DNA, measured in picograms (pg), contained within the haploid nucleus of an organism, typically assessed during the G1 phase of the cell cycle, when the DNA content is at its unreplicated baseline. This metric provides a quantitative measure of the total DNA mass in one complete set of chromosomes, independent of the organism's overall ploidy. Importantly, the C-value is distinct from ploidy levels: it specifically denotes the haploid DNA content (1C), whereas diploid cells contain twice that amount (2C) and polyploid cells contain higher multiples, but the C-value remains the reference for the haploid genome. For example, in diploid organisms, somatic cells at G1 have a 2C DNA content, but the C-value is defined as the 1C amount. This distinction emphasizes that the C-value concerns DNA quantity per haploid set rather than the number of chromosome sets present in a cell. In biological contexts, the C-value is often used synonymously with haploid genome size, although it strictly expresses the physical mass of DNA rather than the length of the nucleotide sequence. To relate C-value to sequence-based measurements, genome size in base pairs (bp) can be approximated using a conversion factor derived from the average mass of a DNA base pair:

$$\text{Genome size (bp)} \approx \text{C-value (pg)} \times 0.978 \times 10^{9}\ \text{bp/pg}$$

This factor follows from the mean mass of a nucleotide pair in double-stranded DNA (approximately 616 Da; the often-quoted rounded figure of about 660 Da per base pair would instead give roughly 0.91 × 10⁹ bp/pg), and it allows practical interconversion between mass and sequence units.

Units of Measurement

The C-value, representing the amount of DNA in a haploid genome, is primarily measured in picograms (pg) of DNA mass, providing a direct quantification of nuclear DNA content. This unit reflects the total mass of double-stranded DNA molecules within the unreplicated haploid chromosome set. Alternatively, the C-value is expressed in base pairs (bp) or megabase pairs (Mbp), which denote the length of the DNA sequence in nucleotide pairs and facilitate comparisons with sequenced genomes. Historically, early estimates of C-values relied on arbitrary units derived from optical absorbance measurements in Feulgen microdensitometry, where DNA content was inferred from staining intensity without absolute calibration, leading to inconsistencies across studies. After the 1970s, advances in standardization, including the adoption of reference standards such as chicken erythrocytes, enabled reporting in absolute picograms, and base pair equivalents became prevalent alongside the rise of molecular sequencing techniques. To convert between mass and length units, the established factor is 1 pg of DNA ≈ 978 Mbp, derived from the mean mass of a nucleotide pair in double-stranded DNA (approximately 616 Da per bp). The conversion formula is:

$$\text{genome size (bp)} = \text{C-value (pg)} \times 978 \times 10^{6}$$

This approximation assumes a typical nucleotide composition and is widely used for eukaryotic genomes. For enhanced accuracy, the conversion factor can account for variations in GC content, which influences the average mass per base pair because G-C pairs are slightly heavier than A-T pairs (a difference of roughly 0.2%, so the correction is small). For instance, at around 40% GC content, common in many plant and animal genomes, the factor is 977.97 Mbp per pg, which illustrates how compositional data can refine precise inter-species comparisons.
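The conversion above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names are hypothetical, and the per-pair masses for A-T and G-C nucleotide pairs are assumed values (about 615.4 Da and 616.4 Da) chosen to be consistent with the 978 Mbp/pg factor quoted in the text.

```python
# Sketch: convert between DNA mass (pg) and sequence length (bp).
# The nucleotide-pair masses below are ASSUMED values, consistent with
# the ~978 Mbp/pg factor quoted in the text.

AVOGADRO = 6.02214076e23   # particles per mole
AT_PAIR_DA = 615.3830      # assumed mean mass of an A-T pair (Da)
GC_PAIR_DA = 616.3711      # assumed mean mass of a G-C pair (Da)


def bp_per_pg(gc_fraction=0.5):
    """Return the number of base pairs in 1 pg of DNA for a given GC fraction."""
    if not 0.0 <= gc_fraction <= 1.0:
        raise ValueError("gc_fraction must be between 0 and 1")
    mean_pair_da = (1.0 - gc_fraction) * AT_PAIR_DA + gc_fraction * GC_PAIR_DA
    grams_per_bp = mean_pair_da / AVOGADRO   # 1 Da = 1 g/mol
    return 1e-12 / grams_per_bp              # 1 pg = 1e-12 g


def pg_to_bp(c_value_pg, gc_fraction=0.5):
    """Convert a C-value in picograms to an approximate length in base pairs."""
    return c_value_pg * bp_per_pg(gc_fraction)


def bp_to_pg(genome_size_bp, gc_fraction=0.5):
    """Convert a genome length in base pairs to an approximate mass in picograms."""
    return genome_size_bp / bp_per_pg(gc_fraction)


if __name__ == "__main__":
    print(f"Mbp per pg at 40% GC: {bp_per_pg(0.40) / 1e6:.2f}")
    print(f"3.2 pg in Gbp: {pg_to_bp(3.2) / 1e9:.2f}")
```

Because the GC correction changes the factor by well under 1%, the fixed value of 978 Mbp/pg is adequate for most comparisons; the `gc_fraction` parameter only matters when mass and sequence estimates are compared at high precision.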

Historical Context

Origin of the Term

The term "C-value" was coined by American cytologist Hewson Swift in 1950 to refer to the amount of DNA contained within the haploid nucleus of an organism, with the "C" commonly read as "constant" in line with the prevailing hypothesis that DNA content remained fixed across somatic cells of a given species. Swift introduced the terminology in his seminal paper examining DNA quantities in various animal tissues, where he used designations such as "1C value" and "2C value" to classify nuclear DNA amounts relative to the haploid baseline, assuming intraspecific constancy as a foundational principle. Swift's work built on early photometric techniques for quantifying deoxyribonucleic acid (DNA), was influenced by the Vendrelys' 1948 proposal of DNA constancy as a measure of genetic content, and aimed to defend this idea against emerging doubts by compiling data from diverse taxa such as amphibians and insects. Although Swift did not explicitly define "C" as "constant" in his original publication (he later clarified via personal communication that it represented the DNA characteristic of a specific genotype), the term quickly gained traction for standardizing comparisons of nuclear DNA across cell types and species. By the mid-1960s, accumulating evidence from cytophotometric studies revealed substantial deviations from the assumed constancy, prompting a reevaluation of the term's implications and shifting its usage from an indicator of fixed genetic material to a descriptor of variable genome sizes that often departed from expectations based on organismal complexity. This evolution culminated in the formal recognition of the "C-value paradox" in 1971, which highlighted the disconnect between DNA content and perceived biological sophistication, though Swift's foundational terminology endured as the standard in genome size research.

Early Observations of Variation

In the late 1940s, researchers including André Boivin, Colette Vendrely, and Roger Vendrely conducted pioneering measurements of DNA content in animal cells, initially hypothesizing a constant amount per somatic cell nucleus based on chemical extractions and early cytochemical assays. Their work suggested that DNA quantity remained stable across cell types within an organism, reflecting a presumed fixed genetic material load, but comparative analyses across species began revealing substantial discrepancies. By the early 1950s, these differences were quantified, showing DNA contents varying by factors of 10 to 100 or more between species, far exceeding expectations of uniformity tied to chromosome number or organismal complexity. The advent of microspectrophotometry in the 1950s, particularly Feulgen staining combined with ultraviolet absorption techniques, enabled precise quantification of DNA in individual nuclei, transforming these observations from qualitative to quantitative. Hewson Swift's 1950 study on animal nuclei demonstrated a wide range of DNA amounts, from less than 1 pg in some invertebrates to over 10 pg in vertebrates, highlighting interspecific variation independent of ploidy in many cases. Similarly, Alfred E. Mirsky and Hans Ris's 1951 analysis of diverse animal tissues confirmed constancy within species but reported up to 30-fold differences across taxa, such as between insects and mammals, challenging the notion that DNA content scaled directly with evolutionary advancement. These findings extended to plants, where Swift noted even broader ranges, underscoring the method's role in uncovering unexpected diversity. In the 1960s, studies on amphibians further illustrated dramatic genome size jumps without corresponding increases in morphological or genetic complexity, amplifying the puzzle. 
For instance, measurements in salamanders and frogs revealed DNA contents spanning 10- to 100-fold variation among closely related species, often linked to non-polyploid mechanisms like repetitive sequence proliferation rather than gene number expansion. Initial explanations attributed such disparities to polyploidy or technical artifacts in staining and measurement, as polyploid events were well-documented in plants and some amphibians; however, subsequent refinements in techniques and broader sampling disproved these for many animal lineages, establishing the variations as genuine biological phenomena.

Patterns of Genome Size Variation

Across Species and Kingdoms

The C-value, representing the haploid DNA content, exhibits extraordinary variation across species and kingdoms, spanning nearly five orders of magnitude in eukaryotes alone. The smallest recorded eukaryotic C-values are found in parasitic microsporidia such as Encephalitozoon intestinalis at approximately 0.0023 pg (2.3 Mbp), while among free-living multicellular eukaryotes, the carnivorous plant Genlisea aurea has one of the most compact genomes at about 0.065 pg (63 Mbp). At the opposite extreme, certain plants and protists have been reported to harbor massive genomes of 150 pg or more; for instance, the Japanese canopy plant Paris japonica possesses a C-value of about 152 pg (149 Gbp), while the fork fern Tmesipteris oblanceolata holds the current record of approximately 164 pg (160.45 Gbp) as of 2024. Historical claims for protists like Amoeba dubia (up to 686 pg) and Polychaos dubium (>100 pg) are based on outdated measurements and remain disputed, with modern reassessments indicating much smaller sizes. Bacterial genomes, while not strictly classified under eukaryotic C-values, provide a baseline for minimal DNA content, typically ranging from 0.5 to 10 Mbp (0.0005–0.01 pg), with minimalistic endosymbionts like Carsonella ruddii falling below even this range.

Kingdom-specific patterns reveal distinct trends in C-value distribution. In bacteria and archaea (prokaryotes), genomes remain compact, averaging 2–5 Mbp, constrained by rapid replication needs and limited non-coding DNA. Protists display high variability, from tiny parasitic forms like the microsporidian Encephalitozoon cuniculi at 2.9 Mbp (0.003 pg) to enormous ones in certain free-living amoeboid forms, though verified extremes are lower than previously thought. Plants frequently exhibit expanded genomes, often due to polyploidy and transposon proliferation; for example, bread wheat (Triticum aestivum) has a C-value of about 17 pg (16 Gbp), reflecting its hexaploid nature, while the overall plant range spans 0.06 pg in Genlisea to over 150 pg in lilies and ferns.
Animals, by contrast, show more constrained variation, with mammals typically between 1.5 and 6 pg (humans at about 3.3 pg, or 3.2 Gbp), though exceptions occur in other vertebrates such as the marbled lungfish (Protopterus aethiopicus) at around 133 pg (130 Gbp). Phylogenetic analyses indicate that C-values often correlate with evolutionary divergence in specific lineages, showing gradual increases over time rather than uniform scaling with complexity. In amphibians, for instance, salamanders (order Urodela) display a pronounced expansion, with some species reaching up to ~120 pg, linked to transposable element activity along the lineage. Similar trends appear in plants within the Liliaceae family and certain other groups, where genome size escalates with phylogenetic distance from compact ancestors. Comprehensive databases underpin these observations: the Animal Genome Size Database catalogs C-values for 6,534 animal species as of 2025, while the Plant DNA C-values Database covers 12,273 species, enabling cross-kingdom comparisons.
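As a rough check on the span of eukaryotic C-values, the short sketch below computes the fold difference, and the corresponding number of orders of magnitude, between the smallest and largest values quoted in this section. The names `c_values_pg` and `fold_range` are illustrative, and the figures are simply those cited above rather than an authoritative dataset.

```python
import math

# C-values in picograms, as quoted in this section (illustrative subset).
c_values_pg = {
    "Encephalitozoon intestinalis": 0.0023,
    "Genlisea aurea": 0.065,
    "Triticum aestivum": 17.0,
    "Protopterus aethiopicus": 133.0,
    "Tmesipteris oblanceolata": 164.0,
}


def fold_range(values):
    """Return (fold difference, orders of magnitude) between min and max."""
    values = list(values)
    if not values or min(values) <= 0:
        raise ValueError("need at least one positive value")
    fold = max(values) / min(values)
    return fold, math.log10(fold)


fold, orders = fold_range(c_values_pg.values())
print(f"{fold:,.0f}-fold range, about {orders:.2f} orders of magnitude")
```

Adding or removing species from the dictionary changes the result, so the computed span describes only the listed examples and not eukaryotes as a whole.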

Within Species and Populations

Intraspecific variation in C-value, or genome size within a single species, is generally modest compared to interspecific differences, often ranging from 5% to 20% across populations and individuals in both plants and animals. This variation arises primarily from differences in repetitive DNA content, such as transposable elements, and structural changes like duplications or deletions, rather than changes in gene number. In animals, such variation is typically constrained, reflecting stronger selection pressures against large genomes due to metabolic costs, while plants exhibit greater flexibility, allowing for wider ranges influenced by polyploidy or accessory chromosomes. Detecting this intraspecific variability requires analyzing multiple individuals from diverse populations, as single-sample measurements can overlook subtle differences, and flow cytometry or sequencing-based estimates from recent studies emphasize the need for standardized protocols to account for tissue-specific or environmental artifacts. Geographic and ecological factors contribute to intraspecific C-value variation, with notable examples in animals. For instance, in the snapping shrimp Synalpheus idios, genome size varies by up to 35% across geographic regions, attributed to differential expansion of transposable elements in isolated populations, highlighting how dispersal barriers can drive localized genome expansion. In plants, extremes can exceed 30%, as seen in maize (Zea mays), where inter-populational differences reach 36%, often linked to the presence of B chromosomes, which are supernumerary elements that add repetitive DNA without essential genes and vary in number among individuals. B chromosomes are a major driver of such variation in numerous plant species, correlating positively with overall genome size and enabling rapid intraspecific diversification. In Arabidopsis thaliana, by contrast, measurements show only about 2–5% variation among diploid accessions (2C-value ~0.30–0.32 pg), primarily due to chromosome polymorphisms rather than B chromosomes.
Influencing factors include environmental stress, which correlates with higher intraspecific variation in certain taxa. In insects like seed beetles (Callosobruchus maculatus), populations with larger genomes exhibit improved buffering against environmental stressors, suggesting that genome size may enhance resilience in variable habitats, with variation of up to 20% observed within species. Conversely, vertebrates show minimal intraspecific C-value differences, typically under 5%, due to conserved genome architectures and strong purifying selection. Recent studies from the 2020s, such as analyses of over 1,000 bird species, reveal weak but positive associations between genome size and geographic range extent, implying subtle clinal patterns tied to environmental gradients, though direct intraspecific clines remain rare and require broader sampling to confirm. These patterns show that genome size is also modulated at the population level, distinct from the broader interspecific trends across kingdoms.

Conceptual Challenges

C-value Paradox

The C-value paradox refers to the observed lack of correlation between the DNA content of an organism's genome (C-value) and its perceived phenotypic complexity or number of genes. This discrepancy was formally named by Charles A. Thomas Jr. in 1971, who highlighted how genome sizes vary dramatically across species without corresponding increases in organismal sophistication or functional genetic elements. For instance, the human haploid genome measures approximately 3.2 pg and supports around 20,000 protein-coding genes, while the amoeba Amoeba proteus has a reported C-value of about 30 pg, nearly 10 times larger, yet has a far simpler cellular organization. The paradox emerged from accumulating evidence in the mid-20th century, particularly data from the 1950s and 1960s that revealed substantial genome size variation among vertebrates and other eukaryotes, with no apparent scaling to morphological or physiological complexity. Early measurements using cytophotometry and chemical extraction techniques showed, for example, that amphibian genomes could span a 100-fold range within related taxa, challenging the assumption that DNA content directly reflected gene number. This built on the 1968 discovery by Roy J. Britten and David E. Kohne of highly repetitive DNA sequences in eukaryotic genomes, which accounted for much of the excess DNA but did not explain why such expansions occurred without functional benefits tied to complexity. The paradox was most pronounced in eukaryotes, whereas prokaryotes exhibit tighter correlations between genome size and gene count due to minimal non-coding regions. A prominent example illustrating the paradox is the onion (Allium cepa), with a haploid genome of approximately 16 pg, five times larger than the human genome, but encoding around 40,000 protein-coding genes despite the plant's relatively simple multicellular structure compared to vertebrates.
Initial attempts to resolve this involved Susumu Ohno's 1972 proposal of "junk DNA," suggesting that much of the extra DNA was non-functional and accumulated neutrally without selective pressure, thus decoupling genome size from complexity. This hypothesis gained traction as repetitive elements and pseudogenes were identified as major contributors to genome bloat. While the recognition of extensive non-coding DNA (comprising over 90% of many eukaryotic genomes, including that of humans) provided a partial explanation by attributing excess size to selfish genetic elements and replication errors, it did not fully account for the persistence or scale of these expansions across lineages. The concept shifted focus away from strictly gene-centric views but left open questions about regulatory roles and evolutionary constraints, marking the paradox as a foundational puzzle in genome biology.

C-value Enigma

The C-value enigma refers to the broader set of unresolved questions surrounding the vast variation in eukaryotic genome sizes, particularly why such diversity persists despite neutral evolutionary models predicting relative stability over time. Coined by T. Ryan Gregory in 2001, the term encompasses not only the empirical observation of genome size discrepancies (known as the C-value paradox) but also the underlying mechanistic and evolutionary processes driving this variation. Unlike the paradox, which focuses on the lack of correlation between genome size and organismal complexity, the enigma emphasizes the dynamic interplay of mutational biases, genetic drift, and selection in shaping genome architecture across lineages. Central to the enigma are debates over the relative contributions of macroevolutionary events like whole-genome duplications, which can rapidly expand genomes, versus finer-scale processes such as insertions and deletions that incrementally alter size. A key puzzle is why some lineages exhibit strong purifying selection against genome expansion, leading to streamlined genomes such as that of the pufferfish (Tetraodon nigroviridis), with a minimal C-value of approximately 0.4 pg, while others tolerate extensive "bloat" without apparent costs. This variation raises questions about the efficacy of selection in constraining DNA accumulation, particularly in species with large effective population sizes, where drift is limited. Recent advances have highlighted transposable elements (TEs) as major drivers of expansion, often comprising the bulk of the genome; for instance, TEs account for over 85% of the maize (Zea mays) genome, proliferating through replicative transposition while deletions counteract this in some cases. Complementing this, the drift-barrier hypothesis proposes that small genome sizes in microbes and other organisms with large effective population sizes (Ne) result from enhanced efficiency of selection against deleterious insertions, as genetic drift weakens in high-Ne populations, imposing a "barrier" to expansion.
Formulated by Michael Lynch in 2012, this model integrates population genetics to explain why genome streamlining predominates in lineages with robust drift resistance, such as prokaryotes, while eukaryotes with smaller Ne often harbor larger genomes. Despite these insights, the enigma remains unresolved, as genomic sequencing data reveal no consistent correlation between C-value and physiological traits such as metabolic rate across broad taxa, challenging expectations of adaptive constraints. Recent long-read sequencing as of 2025 has further clarified TE dynamics and non-coding architecture in diverse eukaryotes, aiding resolution of proliferation mechanisms. For example, while some studies suggest a weak negative association between genome size and such traits in mammals, this effect explains only a fraction of the variation and fails to hold universally, underscoring the need for further integration of ecological and genomic factors. These persistent gaps highlight the enigma's complexity, demanding multidisciplinary approaches to disentangle neutral and selective forces in genome size evolution.

Measurement Methods

Classical Techniques

The primary classical technique for estimating C-value, or haploid nuclear DNA content, emerged in the 1950s through the combination of Feulgen staining and microspectrophotometry. Feulgen staining, developed in the 1920s but adapted for quantitative DNA measurement in the mid-20th century, employs a DNA-specific dye that reacts with depurinated DNA to produce a magenta color proportional to DNA amount. The process begins with fixation of cells or tissues, followed by acid hydrolysis (typically in 5 N HCl for 60-120 minutes at room temperature) to depurinate the DNA, exposing aldehyde groups that bind Schiff's reagent (leucofuchsin). This staining is highly specific to DNA and stoichiometric, allowing for visual and photometric quantification. Microspectrophotometry then measures the absorbance of stained nuclei at wavelengths of 550-570 nm, where the integrated optical density (IOD) serves as a proxy for DNA content, following the principle that absorbance is directly proportional to DNA amount via Beer's law, $\text{Absorbance} \propto \text{DNA content}$. More precisely, IOD is computed as

$$\mathrm{IOD} = \sum_i \log_{10}\left(\frac{1}{T_i}\right)$$

with $T_i$ denoting the transmittance at each measured point across the nucleus. This method's accuracy was typically ±10-20%, influenced by factors such as staining variability, slide age (newer slides yielding up to 35% lower IOD than aged ones), and the number of nuclei per field (optimal at 10-20 to minimize errors exceeding 10%). The acid hydrolysis step was critical for stain specificity but introduced potential artifacts if hydrolysis times varied, affecting uniformity. Early applications focused on animal tissues, notably in Hewson Swift's pioneering studies, which used Feulgen microspectrophotometry to demonstrate relative constancy of DNA content across plant and animal nuclei, coining the "C-value" term for the haploid DNA class.
Swift's work, building on earlier qualitative Feulgen observations, quantified DNA in diverse species like lilies and amphibians, revealing unexpected variations that laid the groundwork for the C-value paradox. Despite its innovations, the technique was labor-intensive, requiring manual dissociation to isolate intact nuclei, often via grinding or enzymatic digestion, followed by embedding for sectioning and individual scanning under a microscope equipped with a scanning densitometer. Nuclei isolation posed challenges, including loss of fragile cells and incomplete dissociation leading to clumped material, which could bias measurements toward more robust cell types. These limitations restricted throughput to dozens of nuclei per sample, making large-scale studies time-consuming. The historical impact of these methods was nonetheless profound, enabling the first systematic catalogs of C-values for approximately 100 species, primarily through compilations of microspectrophotometric data from plants and animals. These early datasets, amassed via Feulgen-based surveys, documented variation across taxa and spurred foundational research into genome size and evolutionary patterns, though they were constrained by the method's manual nature and error margins.
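The integrated optical density calculation described in this section can be illustrated with a short Python sketch. It assumes the transmittance readings are fractions between 0 and 1 taken at evenly spaced points across a single stained nucleus; the function names and the calibration helper (which scales by a reference nucleus of known DNA content) are hypothetical and not drawn from any particular instrument's software.

```python
import math


def integrated_optical_density(transmittances):
    """Sum log10(1/T_i) over all measured points of one nucleus.

    Each T_i is a transmittance fraction in the interval (0, 1].
    """
    total = 0.0
    for t in transmittances:
        if not 0.0 < t <= 1.0:
            raise ValueError(f"transmittance out of range: {t}")
        total += math.log10(1.0 / t)
    return total


def dna_content_from_iod(sample_iod, reference_iod, reference_pg):
    """Scale a sample IOD by a reference nucleus of known DNA content (pg).

    Assumes both nuclei were stained and scanned under identical conditions,
    so that IOD is proportional to DNA amount.
    """
    if reference_iod <= 0:
        raise ValueError("reference IOD must be positive")
    return sample_iod / reference_iod * reference_pg


if __name__ == "__main__":
    # Made-up transmittance readings, for illustration only.
    sample = integrated_optical_density([0.50, 0.25, 0.25, 0.50])
    reference = integrated_optical_density([0.50, 0.50])
    print(dna_content_from_iod(sample, reference, reference_pg=2.5))
```

Points with a transmittance of 1.0 (no absorbing material) contribute nothing to the sum, which is why background regions around the nucleus do not inflate the IOD.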

Modern Approaches

Modern approaches to determining C-value, the amount of DNA in a haploid genome, have evolved significantly in recent decades, emphasizing automation, high throughput, and precision to facilitate large-scale studies across diverse organisms. These methods offer substantial improvements over earlier techniques in terms of speed, with analyses completable in hours rather than days, and accuracy, often achieving a coefficient of variation (CV) below 5% for measurements. Flow cytometry stands as the predominant modern technique for C-value estimation, particularly in plants and animals, quantifying DNA content through fluorescent staining of isolated nuclei. The process involves preparing a nuclear suspension from fresh or fixed tissue, staining with a DNA-specific fluorochrome such as propidium iodide, which intercalates with double-stranded DNA, and then passing the nuclei through a flow cytometer that excites the dye with a laser and measures the emitted fluorescence, which is proportional to DNA amount. This method achieves high resolution, typically with a CV under 5% for the G1 peak, enabling detection of ploidy levels and genome sizes with minimal sample preparation. A standard protocol incorporates an internal reference standard, such as chicken red blood cells (RBCs) with a known C-value of approximately 1.25 pg (2C ≈ 2.5 pg), co-stained with the sample to account for instrument variability. The C-value is then estimated using the formula:

$$C = \left( \frac{F_s}{F_{std}} \right) \times C_{std}$$

where $F_s$ is the mean fluorescence of the sample peak, $F_{std}$ is that of the standard, and $C_{std}$ is the standard's C-value; the two peaks must represent the same replication state (for example, both 2C G1 peaks) for the ratio to hold. This approach has enabled the compilation of extensive databases, such as the Plant DNA C-values Database, with 12,273 entries (as of 2025). Sequencing-based methods provide direct base-pair (bp) counts for C-value determination through de novo whole-genome assembly, bypassing the need for physical standards and offering nucleotide-level precision for genomes under 10 Gbp.
Short-read platforms like Illumina generate millions of reads for assembly into contigs, while long-read technologies such as PacBio or Oxford Nanopore resolve repetitive regions, yielding haploid assemblies where the total assembled length approximates the C-value in bp (converted to pg assuming 1 pg ≈ 978 Mbp). These methods became feasible for routine use post-2010 due to cost reductions and algorithmic advances, achieving near-complete assemblies for species like Arabidopsis thaliana (≈135 Mbp) with error rates below 1%. Recent integrations of long-read sequencing with chromosome conformation capture (Hi-C) have further improved assembly accuracy for large genomes exceeding 50 Gbp. For unassembled data, k-mer frequency analysis from raw reads estimates genome size by modeling coverage peaks, providing rapid proxies validated against flow cytometry results. For exceptionally large genomes exceeding 10 Gbp, such as those in amphibians or lilies, pulsed-field gel electrophoresis (PFGE) serves as an additional tool to separate intact chromosomes or large fragments, allowing size summation to infer total C-value. Developed in the mid-1980s, PFGE uses alternating electric fields to migrate megabase-scale DNA through agarose gels, resolving molecules up to 10 Mb, which is particularly useful for microbial or other small genomes but adaptable to eukaryotic chromosomes via embedding in agarose plugs. Integration with bioinformatics pipelines, including error correction via multiple assemblies or hybrid short-long read approaches, further refines estimates, reducing gaps in repetitive sequences and enhancing overall accuracy to within 1-2% for validated cases.
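Two of the estimates described in this section lend themselves to a compact sketch: the flow cytometry ratio against an internal standard, and a simple k-mer depth estimate from raw reads. The Python below is a minimal illustration under stated assumptions. The function names are hypothetical, the k-mer histogram is assumed to map sequencing depth to the number of distinct k-mers seen at that depth, and the estimator (total k-mers divided by the main peak depth) is a deliberately simple stand-in for the model-fitting tools used in practice.

```python
def c_value_from_flow(sample_peak, standard_peak, standard_c_pg):
    """Estimate a C-value (pg) from mean peak fluorescences.

    Assumes the sample and standard peaks represent the same replication
    state (e.g. both G1 peaks of diploid tissue), so the ratio of
    fluorescences equals the ratio of DNA contents.
    """
    if standard_peak <= 0:
        raise ValueError("standard peak fluorescence must be positive")
    return sample_peak / standard_peak * standard_c_pg


def genome_size_from_kmers(histogram, error_cutoff=3):
    """Estimate genome size (bp) from a k-mer depth histogram.

    histogram: dict mapping depth -> number of distinct k-mers at that depth.
    Depths below error_cutoff are ignored as likely sequencing errors.
    The estimate is (total k-mer occurrences) / (depth of the main peak).
    """
    usable = {d: n for d, n in histogram.items() if d >= error_cutoff}
    if not usable:
        raise ValueError("no k-mers at or above the error cutoff")
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth


if __name__ == "__main__":
    # Chicken RBC standard with C ~ 1.25 pg, as quoted in the text;
    # the fluorescence values are made up for illustration.
    print(c_value_from_flow(sample_peak=320.0, standard_peak=100.0,
                            standard_c_pg=1.25))
    # Toy histogram, not real sequencing data.
    toy = {1: 1000, 2: 100, 19: 50, 20: 100, 21: 50}
    print(genome_size_from_kmers(toy))
```

The k-mer estimate is sensitive to the error cutoff and to heterozygosity or repeat content, which can produce additional peaks, so in practice it is treated as a quick proxy to be checked against flow cytometry.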

Specific Examples

Human Genome Size

The human haploid genome size is currently estimated at approximately 3.2 picograms (pg) of DNA, equivalent to about 3.1 gigabase pairs (Gbp), based on the GRCh38.p14 reference assembly released in 2022, which includes 3,137,300,923 non-N bases across the 24 chromosomes. This value represents the total DNA content in a single unreplicated set of chromosomes and has been refined through iterative improvements in sequencing technologies, closing previous gaps in repetitive and centromeric regions. Early estimates, derived from Feulgen microspectrophotometry, placed the haploid genome at around 2.5 pg, reflecting initial biochemical and cytophotometric measurements of DNA content in human cells. By the 1970s, flow cytometry provided more precise quantification, refining the estimate to approximately 3.5 pg per haploid genome through analysis of stained nuclei and improved calibration with standards. The Human Genome Project, completed in 2003, confirmed a total haploid size of about 3.0 Gbp, encompassing both coding and non-coding sequences, with subsequent assemblies like GRCh38 incorporating additional data to reach the modern consensus. Intraspecific variation in genome size is minimal, typically less than 1% across populations, as evidenced by sequence-based estimates averaging 3.039 Gbp for haploid genomes with standard deviations under 0.5%. Sex differences are negligible: male genomes are slightly smaller because the compact Y chromosome (about 59 million base pairs) takes the place of the second X chromosome found in females, contributing less than 2% overall variation. These attributes underscore the human genome's relative constancy, where approximately 98% of the sequence is non-coding, including regulatory elements and repeats. Deviations from this baseline, such as aneuploidy involving whole-chromosome imbalances, are observed in over 90% of solid tumors and contribute to cancer progression by disrupting cellular fitness.

Extreme Cases in Other Organisms

The most striking examples of genomic gigantism in eukaryotes are found in plants, particularly in certain angiosperms and pteridophytes. The Japanese canopy plant Paris japonica long held the record for the largest known plant genome, with a haploid DNA content (C-value) of 152.23 pg, equivalent to approximately 149 Gbp, as measured by flow cytometry in 2010. This value exceeds the size of the human genome by about 50-fold, highlighting the vast expansions typical of some plant lineages. More recently, the fork fern Tmesipteris oblanceolata has been identified as possessing the largest eukaryotic genome documented to date, with a C-value of approximately 164 pg (160.45 Gbp), determined through flow cytometry analysis in 2024; this fern's compact fronds belie its enormous nuclear DNA content, which is roughly 7% larger than that of P. japonica. In contrast, prokaryotes and certain streamlined eukaryotes have remarkably small genomes. The bacterial endosymbiont Candidatus Carsonella ruddii, which resides in psyllid insects, has one of the smallest known bacterial genomes at 0.16 Mbp, corresponding to a C-value of about 0.00016 pg, reflecting extreme gene loss made possible by reliance on host resources. Among eukaryotes, the budding yeast Saccharomyces cerevisiae has a very small genome, with a C-value of 0.012 pg (12 Mbp); this compact organization supports rapid reproduction and has made it a model organism for genomic studies. Similarly, the Japanese pufferfish Fugu rubripes (also classified as Takifugu rubripes) has a streamlined vertebrate genome of 0.4 pg (400 Mbp), with short intergenic regions and little repetitive DNA, which makes it useful for comparative genomics with larger vertebrate genomes such as the human genome.
Protists have historically been cited as having enormous genomes, including a claimed 670 pg for the amoebozoan Amoeba dubia (synonym Polychaos dubium), based on mid-20th-century cytophotometric measurements. Recent re-evaluations using modern techniques have shown these figures to be significant overestimates caused by methodological artifacts, with a related species having a revised C-value of approximately 39 pg; the exact size for A. dubia remains unconfirmed by sequencing but is considerably smaller than the original claims. Surveys of eukaryotic genome sizes in the 2020s, drawing on expanded databases such as the Plant DNA C-values Database and animal genome catalogs, confirm that no verified C-value exceeds about 164 pg (roughly 160 Gbp), with plants dominating the upper extremes. Ecologically, such large genomes often occur in sessile plants such as geophytes and canopy species, where slower metabolic rates and reduced selection pressure against DNA accumulation may favor polyploidy and transposon proliferation, in contrast to the compact genomes of mobile or resource-limited organisms.
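The comparisons in this section can be checked with simple arithmetic. The following sketch tabulates the genome sizes quoted above, converts them to picograms, and computes two of the stated ratios. It assumes the conversion 1 pg ≈ 0.978 × 10⁹ bp and a rounded human genome size of 3.1 Gbp; the dictionary and helper names are hypothetical and used only for illustration.

```python
# Assumed conversion factor: 1 pg of DNA ≈ 0.978 × 10^9 bp.
BP_PER_PG = 0.978e9

# Haploid genome sizes in base pairs, as quoted in the text (rounded values).
GENOME_BP = {
    "Candidatus Carsonella ruddii": 0.16e6,
    "Saccharomyces cerevisiae": 12e6,
    "Fugu rubripes": 400e6,
    "Homo sapiens": 3.1e9,
    "Paris japonica": 149e9,
    "Tmesipteris oblanceolata": 160.45e9,
}


def c_value_pg(base_pairs: float) -> float:
    """Approximate C-value in picograms for a genome length in base pairs."""
    return base_pairs / BP_PER_PG


def fold_difference(larger: str, smaller: str) -> float:
    """Ratio of two genome sizes taken from GENOME_BP."""
    return GENOME_BP[larger] / GENOME_BP[smaller]


# Print the organisms from smallest to largest genome.
for species, bp in sorted(GENOME_BP.items(), key=lambda item: item[1]):
    print(f"{species:30s} {bp / 1e6:12.2f} Mbp {c_value_pg(bp):12.5f} pg")

paris_vs_human = fold_difference("Paris japonica", "Homo sapiens")
fern_vs_paris = fold_difference("Tmesipteris oblanceolata", "Paris japonica")

print(f"P. japonica vs. human: {paris_vs_human:.0f}-fold")
print(f"T. oblanceolata vs. P. japonica: {(fern_vs_paris - 1) * 100:.1f}% larger")
```

With these inputs the P. japonica genome comes out at about 48 times the human genome and the T. oblanceolata genome at nearly 8% larger than that of P. japonica, consistent with the rounded "about 50-fold" and "roughly 7%" figures in the text. Small discrepancies in the picogram column reflect rounding in the quoted base-pair values.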
