
Haplotype

A haplotype is a set of DNA variants, such as single nucleotide polymorphisms (SNPs), that are located close together on a single chromosome and tend to be inherited as a unit because recombination between them is rare. The particular combination of variants on a haplotype reflects the ancestry of the chromosome copy inherited from one parent, and a haplotype can span part of a single gene or extend across multiple genes. The term derives from "haploid genotype," emphasizing inheritance from a single chromosome copy. By capturing patterns of linked genetic variation, haplotypes help researchers trace evolutionary relationships and population migrations. In population genetics, they reveal how genetic diversity has accumulated over time: common haplotypes shared across groups indicate shared ancestry, while rare ones highlight unique mutations. Haplotype diversity, for instance, measures the breadth of allelic combinations within a population, providing insight into its genetic health and adaptability.
In medical and research contexts, haplotypes are essential for identifying genetic factors in complex diseases such as cancer, by associating specific haplotype blocks with disease risk through genome-wide association studies (GWAS). The International HapMap Project, carried out in phases from 2002 to 2010, mapped common haplotypes across diverse human populations and identified tag SNPs that represent larger blocks of variants, making genotyping more efficient and accelerating the discovery of genes involved in disease and drug response. This work has enabled precise tracking of disease-associated variants and improved understanding of how genetic background influences phenotypic traits.

Fundamentals

Definition

A haplotype is a combination of alleles at multiple linked loci on a single chromosome that are typically inherited together from one parent. This co-inheritance occurs because the loci are physically close on the chromosome, so recombination between them during meiosis is rare and the allelic combination is transmitted as a unit. The term "haplotype" derives from "haploid genotype," referring to the genetic configuration transmitted via a single sperm or egg. Haplotypes represent blocks of DNA variants, including single nucleotide polymorphisms (SNPs), insertions/deletions (indels), and other polymorphisms, that are co-transmitted because of their proximity. Such blocks form because recombination in the human genome is concentrated in relatively narrow hotspots, leaving the intervening regions with little recombination, so variants there persist together across generations. Unlike a genotype, which reports the pair of alleles at each locus without specifying their chromosomal arrangement, a haplotype provides phase information by indicating which alleles reside on the same chromosome. For instance, in the human leukocyte antigen (HLA) system on chromosome 6, haplotypes consist of tightly linked HLA genes that are inherited as cohesive units and play a critical role in immune recognition and transplant compatibility. This phased view is essential for understanding genetic associations beyond what unphased genotype data can reveal.

Inheritance and Characteristics

Haplotypes are transmitted from parents to offspring through inheritance patterns that depend on the genomic region involved. In non-recombining regions such as mitochondrial DNA (mtDNA) and the male-specific region of the Y chromosome, haplotypes are passed down intact. Mitochondrial DNA haplotypes are maternally inherited: human mtDNA is transmitted through the egg, and paternal mtDNA is typically eliminated during fertilization or early embryogenesis. Y-chromosome haplotypes follow strict paternal inheritance, because the Y chromosome passes from father to son and its male-specific region does not undergo homologous recombination with the X chromosome. Autosomal haplotypes, by contrast, are subject to recombination during meiosis, which shuffles alleles and breaks up haplotype structures across generations, producing greater diversity in these regions.
A key property of haplotypes is their association with linkage disequilibrium (LD), the non-random co-occurrence of alleles at different loci. LD arises when alleles are inherited together more often than expected by chance, typically because of physical proximity on the chromosome and limited recombination between them. The coefficient of linkage disequilibrium, D, measures this association for a pair of alleles A and B at two loci: D = p_{AB} - p_A \cdot p_B, where p_{AB} is the frequency of haplotypes carrying both A and B, and p_A and p_B are the population frequencies of A and B individually. Positive values of D indicate that A and B occur together on the same haplotype more often than expected under independence, negative values indicate that they occur together less often (repulsion), and D = 0 indicates linkage equilibrium; the metric thus helps identify regions where haplotypes persist as cohesive units. Haplotype blocks are segments of the genome characterized by high LD and low haplotype diversity, in which a small number of common haplotypes account for most of the variation.
These blocks function as evolutionary units because recombination is rare within them, allowing alleles to be inherited together over many generations and preserving ancestral combinations. In the human genome, such blocks are prevalent, with studies showing that they cover substantial portions of chromosomes, facilitating the tracking of genetic history and adaptation.
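As a minimal sketch, assuming phased two-locus haplotypes written as two-letter strings (for example "AB" or "aB", with uppercase A and B as the focal alleles), D can be computed directly from haplotype counts. The sketch also reports two widely used normalizations of D that are not defined in the text above: D', which divides D by its maximum possible magnitude given the allele frequencies, and r², the squared correlation between the two loci. The function name and the sample are hypothetical.

```python
from collections import Counter


def ld_stats(haplotypes):
    """Return (D, D', r^2) for two biallelic loci.

    Each haplotype is a two-character string: the first character is the
    allele at locus 1 ("A" or "a"), the second the allele at locus 2
    ("B" or "b"). "A" and "B" are the focal alleles.
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p_ab = counts["AB"] / n                                     # haplotype frequency p_AB
    p_a = sum(c for h, c in counts.items() if h[0] == "A") / n  # allele frequency p_A
    p_b = sum(c for h, c in counts.items() if h[1] == "B") / n  # allele frequency p_B

    d = p_ab - p_a * p_b

    # D' divides D by the largest |D| the allele frequencies allow (sign kept).
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the allelic states at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical sample of 100 phased haplotypes.
sample = ["AB"] * 50 + ["Ab"] * 10 + ["aB"] * 10 + ["ab"] * 30
d, d_prime, r2 = ld_stats(sample)
print(f"D = {d:.4f}, D' = {d_prime:.4f}, r^2 = {r2:.4f}")
```

For this sample, p_A = p_B = 0.6 and p_AB = 0.5, so D = 0.14, D' ≈ 0.58, and r² ≈ 0.34: the two focal alleles are positively associated, but the loci are far from complete LD.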

Resolution Methods

Haplotype Resolution Techniques

In diploid organisms, standard genotyping technologies typically produce unphased data: the two alleles at each heterozygous locus are known, but their assignment to specific parental chromosomes (the gametic phase) is unknown. This phase ambiguity arises because standard short-read sequencing and SNP-array genotyping cannot distinguish which allele resides on which homolog, leading to multiple possible haplotype configurations for a given genotype. Resolving it is essential for applications such as identifying compound heterozygosity for disease variants, fine-mapping causal loci, and understanding recombination patterns.
Experimental methods for haplotype resolution observe phase directly through physical separation of chromosomes or long-range linkage. Family-based approaches use pedigree information, particularly in parent-offspring trios, where Mendelian inheritance rules allow the child's genotypes to be phased by comparison with the parents: for instance, if the mother is homozygous for one allele and the father homozygous for the other, a heterozygous child's alleles can be assigned unambiguously to the maternal and paternal chromosomes, and a biallelic site remains ambiguous only when the child and both parents are all heterozygous. Molecular techniques include clone-based methods such as fosmid sequencing, in which large DNA fragments (up to about 40 kb) derived from a single chromosome copy are isolated and sequenced, and the phased fragments are assembled into haplotypes; this approach has been used to generate haplotype maps covering megabases of the genome. More recently, long-read sequencing technologies such as Pacific Biosciences (PacBio) HiFi and Oxford Nanopore Technologies (ONT) produce reads spanning tens to hundreds of kilobases, enabling direct observation of phased variants without reliance on statistical inference.
Computational methods infer phase statistically from population data, typically by exploiting LD patterns. A seminal approach is the coalescent-based Bayesian model implemented in the PHASE software, which uses Markov chain Monte Carlo (MCMC) sampling to draw haplotype configurations in proportion to their posterior probability under a model that incorporates mutation and recombination. Other widely adopted tools, such as SHAPEIT, build on similar principles but are optimized for speed and scalability using hidden Markov models (HMMs) or graph-based representations of haplotype reference panels.
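The trio rule described above can be sketched as follows, assuming sites written as unordered allele pairs for the child, mother, and father. The function name phase_child is hypothetical. It assigns each heterozygous child allele to the parent who could have transmitted it, leaves a site unresolved (None) when both assignments are consistent with Mendelian inheritance, and raises an error when neither is. Homozygous child sites are trivially phased and, for brevity, are not checked for consistency.

```python
def phase_child(child, mother, father):
    """Phase a child's genotypes using Mendelian transmission from a trio.

    Each argument is a list of (allele, allele) tuples, one per site.
    Returns (maternal_haplotype, paternal_haplotype); None marks sites whose
    phase cannot be resolved from the trio alone.
    """
    maternal, paternal = [], []
    for (a, b), mom, dad in zip(child, mother, father):
        if a == b:
            # Homozygous child site: both haplotypes carry the same allele.
            maternal.append(a)
            paternal.append(b)
            continue
        a_from_mom = a in mom and b in dad  # a inherited maternally, b paternally
        b_from_mom = b in mom and a in dad  # the reverse assignment
        if a_from_mom and not b_from_mom:
            maternal.append(a)
            paternal.append(b)
        elif b_from_mom and not a_from_mom:
            maternal.append(b)
            paternal.append(a)
        elif a_from_mom and b_from_mom:
            # Both assignments fit, e.g. child and both parents heterozygous.
            maternal.append(None)
            paternal.append(None)
        else:
            raise ValueError("genotypes are inconsistent with Mendelian inheritance")
    return maternal, paternal


child = [("A", "G"), ("C", "T"), ("G", "G"), ("A", "T")]
mother = [("A", "A"), ("C", "T"), ("G", "T"), ("A", "T")]
father = [("G", "G"), ("T", "T"), ("G", "G"), ("A", "T")]
print(phase_child(child, mother, father))
```

In this hypothetical trio, the first two sites are resolved because one parent is homozygous, while the last site, where child and both parents are heterozygous, remains unphased and would need another source of information such as statistical phasing or long reads.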
Experimental methods offer high accuracy, often achieving near-perfect phasing over targeted regions, but they are resource-intensive, requiring family samples or specialized library preparation and sequencing, which limits their scalability to population-level studies. Computational methods, in contrast, are faster and can be applied to large cohorts without additional data, processing millions of variants in hours, although they yield probabilistic estimates with switch error rates typically around 1-5% in well-powered datasets, depending on marker density and population diversity. A common large-scale workflow involves genotyping with SNP arrays that capture hundreds of thousands of common variants, followed by computational phasing to resolve phase ambiguity and by imputation against a haplotype reference panel to infer ungenotyped sites, enabling cost-effective haplotype reconstruction across large biobanks.
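Switch error rate, the accuracy measure quoted above, can be computed by comparing an inferred phasing with a known (for example, trio-derived) phasing at the same heterozygous sites. In this minimal sketch, each phasing is encoded as a list of 0/1 values giving which allele of each heterozygous site sits on the first haplotype; a switch is counted whenever the agreement between the two phasings flips between consecutive sites, and the rate is the number of switches divided by the number of adjacent site pairs. The encoding and function name are illustrative assumptions.

```python
def switch_error_rate(true_phase, inferred_phase):
    """Fraction of adjacent heterozygous-site pairs with a phase switch.

    Both arguments list, for each heterozygous site in order, which allele
    (0 or 1) lies on haplotype 1. Swapping the haplotype labels globally does
    not count as an error, because only changes in agreement are counted.
    """
    if len(true_phase) != len(inferred_phase):
        raise ValueError("phasings must cover the same heterozygous sites")
    if len(true_phase) < 2:
        return 0.0
    agree = [t == s for t, s in zip(true_phase, inferred_phase)]
    switches = sum(1 for i in range(1, len(agree)) if agree[i] != agree[i - 1])
    return switches / (len(agree) - 1)


truth = [0, 1, 1, 0, 1, 0]
inferred = [1, 0, 0, 0, 1, 0]  # haplotypes swapped for the first three sites
print(switch_error_rate(truth, inferred))  # one switch among five site pairs
```

Counting changes in agreement rather than per-site mismatches reflects the fact that a phasing error typically flips the relative phase of everything downstream of a single wrong junction.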

Gametic Phase Determination

Gametic phase, also known as haplotype phase, refers to the physical arrangement of alleles on the homologous chromosomes of a diploid individual. In the cis (coupling) configuration, two particular alleles are located on the same chromosome, whereas in the trans (repulsion) configuration they reside on opposite homologs. This distinction is fundamental to genetic analysis because it determines how alleles interact and influences phenotypic outcomes, particularly in compound heterozygosity, where an individual carries two different mutant alleles of a gene, one on each homolog (trans). For a recessive loss-of-function disorder, two mutations in trans leave no normal copy of the gene, whereas the same two mutations in cis leave the other homolog intact, so phase can determine whether disease occurs.
The biological basis of gametic phase lies in meiotic recombination, in which homologous chromosomes exchange genetic material during gamete formation, reshuffling allele combinations and generating new haplotype phases each generation. Recombination maintains genetic diversity but also means that the phase inherited by an offspring reflects a unique history of crossovers in the parental germline. Standard genotyping typically yields only unphased genotypes, revealing which alleles are present without specifying their chromosomal arrangement, so additional information, such as family data or linked markers, is needed to resolve phase accurately.
Early recognition of gametic phase in humans came from studies of blood group inheritance, where family observations revealed that closely linked alleles are transmitted together in specific configurations. For instance, the Rh blood group antigens (C/c, D, E/e) were found in the 1940s to be inherited as sets, such as CDe or cde, on a single chromosome, making cis and trans arrangements central to interpreting Rh inheritance.
The concept had already been described in plant genetics: Bateson and Punnett described coupling (cis) and repulsion (trans) in 1905 through experiments on sweet peas in which flower color and pollen shape were not inherited independently, establishing the conceptual framework for phase in linked genes. A prominent example of the functional importance of gametic phase is sickle cell anemia, where the β-globin gene cluster haplotype on which the HbS mutation sits, together with its linked cis-regulatory variants, modulates disease severity. Individuals whose HbS allele lies on the Arab-Indian haplotype exhibit elevated fetal hemoglobin (HbF) levels attributed to linked regulatory elements, and milder symptoms, compared with those whose HbS lies on the Benin or Central African Republic (Bantu) haplotypes, which are associated with lower HbF and more severe vaso-occlusive crises. This illustrates how variants in cis can alter gene regulation and clinical outcomes, including in compound heterozygous states involving HbS and other β-globin variants.
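The cis/trans ambiguity generalizes: an unphased genotype with k heterozygous sites is compatible with 2^(k-1) distinct pairs of haplotypes. The following sketch (function name hypothetical) enumerates those pairs by trying every assignment of each site's two alleles to the two chromosomes and discarding duplicates. For two heterozygous sites it returns exactly the cis (AB/ab) and trans (Ab/aB) arrangements. Because it is exhaustive, it is only practical for a handful of sites.

```python
from itertools import product


def phase_configurations(genotype):
    """Enumerate the distinct haplotype pairs compatible with an unphased genotype.

    genotype: list of (allele1, allele2) tuples, one per site.
    Returns a set of frozensets, each holding the two haplotypes (as tuples).
    """
    configs = set()
    for flips in product((False, True), repeat=len(genotype)):
        # For each site, decide which allele goes on haplotype 1 vs haplotype 2.
        hap1 = tuple(y if flip else x for (x, y), flip in zip(genotype, flips))
        hap2 = tuple(x if flip else y for (x, y), flip in zip(genotype, flips))
        configs.add(frozenset((hap1, hap2)))  # haplotype order is irrelevant
    return configs


two_het = [("A", "a"), ("B", "b")]
for config in phase_configurations(two_het):
    print(sorted(config))
```

Adding a third heterozygous site doubles the number of configurations to four, while homozygous sites add none, which is why phasing difficulty grows with the number of heterozygous sites rather than with the total number of markers.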

Types of Haplotypes

Mitochondrial DNA Haplotypes

Mitochondrial DNA (mtDNA) haplotypes are specific combinations of genetic variants within the mitochondrial genome, which is maternally inherited and transmitted uniparentally without recombination. Human mtDNA is a circular, double-stranded molecule of approximately 16,569 base pairs (bp) that encodes 37 genes, including 13 proteins essential for oxidative phosphorylation. Compared with nuclear DNA, mtDNA has a high mutation rate, estimated at around 20 times that of the nuclear genome, which facilitates the accumulation of the polymorphisms that define haplotypes. Because mtDNA does not recombine, haplotypes are inherited as intact blocks, preserving maternal lineages across generations and enabling precise tracing of ancestry. mtDNA haplogroups are major phylogenetic clades of these haplotypes, defined primarily by stable single nucleotide polymorphisms (SNPs) in the control region or coding sequences. In African populations, the oldest haplogroups are L0 through L3, which form the root of the human mtDNA tree and reflect Africa's role as the origin of modern humans. For instance, L0 and L1 are basal lineages found predominantly in Africa, while L3 gave rise to the non-African clades. European populations, in contrast, are dominated by haplogroup H, which accounts for about 40-50% of mtDNA lineages and is characterized by defining SNPs such as those at positions 2706 and 7028. These haplogroups provide a framework for classifying maternal lineages according to shared mutational histories.
A key application of mtDNA haplotypes is reconstructing maternal ancestry and patterns of human migration, particularly under the Out-of-Africa model. This model posits that modern humans originated in Africa around 150,000-200,000 years ago, with a major dispersal approximately 60,000-70,000 years ago involving the L3-derived lineages M and N, which populated Eurasia and beyond. Haplotypes within these clades exhibit star-like phylogenies indicative of rapid expansion from source populations.
In forensics, mtDNA haplotypes are valuable for identifying maternal relatives in cases lacking usable nuclear DNA, such as degraded remains, because of mtDNA's stability and uniparental inheritance. mtDNA haplotype diversity is notably higher in Africa than on other continents, underscoring the region's status as the cradle of modern humans. Sub-Saharan African populations display extensive haplotype richness within the L haplogroups, with nucleotide diversity values often two to three times those of non-African populations, reflecting longer evolutionary histories and larger ancestral effective population sizes. This elevated diversity, as in some West African groups where L1-L3 lineages make up over 98% of the total, allows forensic discrimination of maternal origins at high resolution (e.g., a random match probability of around 1.3%).
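A random match probability of this kind is commonly summarized as the probability that two haplotypes drawn at random from a reference population are identical, estimated as the sum of squared haplotype frequencies (equivalently, one minus the uncorrected haplotype diversity). The sketch below assumes frequencies are estimated by simple counting in a reference database; the function name, haplotype labels, and counts are hypothetical, and forensic practice may apply sample-size corrections not shown here.

```python
from collections import Counter


def random_match_probability(database):
    """Probability that two haplotypes drawn at random (with replacement)
    from the database are identical: the sum of squared haplotype frequencies."""
    n = len(database)
    counts = Counter(database)
    return sum((c / n) ** 2 for c in counts.values())


# Hypothetical mtDNA reference database of 10 haplotypes.
database = ["H1"] * 5 + ["H2"] * 3 + ["H3"] * 2
rmp = random_match_probability(database)
print(f"random match probability = {rmp:.2f}")  # 0.25 + 0.09 + 0.04
```

The more haplotypes a population carries, and the more evenly their frequencies are spread, the smaller this sum becomes, which is why high mtDNA diversity translates into greater forensic discrimination.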

Y-Chromosome Haplotypes

The human Y chromosome spans approximately 62.46 Mb and contains two pseudoautosomal regions (PAR1 and PAR2) that recombine with the X chromosome and together make up about 5% of its length; the remaining 95% constitutes the non-recombining region of the Y (NRY), also called the male-specific region (MSY). Because this region does not recombine, Y-chromosome haplotypes pass intact from father to son, enabling paternal lineages to be tracked over millennia. Y-chromosome haplotypes are primarily delineated by single nucleotide polymorphisms (Y-SNPs), which define major haplogroups representing ancient branches of the human paternal tree. For instance, the subclade of haplogroup R1b marked by the M412 SNP dominates parts of Western Europe, with frequencies often surpassing 70%. These SNPs, also known as unique event polymorphisms (UEPs), provide stable markers for deep ancestry because of their low mutation rates. Short tandem repeats (STRs) serve as complementary markers, offering high-resolution haplotypes within specific haplogroups by capturing more recent mutations. Panels of 12 to 30 loci, including DYS19 and DYS389, allow differentiation of closely related paternal lines that share the same SNP-defined haplogroup. In genealogical DNA testing, Y-STR haplotypes are used in surname projects to explore recent patrilineal connections, often revealing matches among individuals who share a common paternal ancestor within the last few centuries, whereas SNP testing elucidates broader migratory histories and ancient origins. Because the entire NRY is inherited as a single linkage block, variants accumulate gradually along lineages, which preserves phylogenetic signal but can also reduce haplotype diversity over time. The male-specific inheritance of these haplotypes makes them particularly useful in forensics, where they enable male-specific identification in mixed DNA samples, such as those from sexual assault cases, by targeting lineage-specific markers without interference from female contributors.
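Comparing two Y-STR haplotypes locus by locus is the basic operation behind surname-project matching and forensic exclusion. The minimal sketch below assumes each haplotype is a mapping from locus name to repeat count, covering single-copy loci only (multi-copy loci such as DYS385 would need separate handling), and reports the number of loci compared, the number that mismatch, and the total number of repeat-unit differences. The function name and repeat counts are hypothetical.

```python
def compare_ystr(hap1, hap2):
    """Compare two Y-STR haplotypes over the loci they share.

    hap1, hap2: dicts mapping locus name -> repeat count.
    Returns (loci_compared, mismatching_loci, total_step_difference).
    """
    shared = sorted(set(hap1) & set(hap2))
    mismatches = [locus for locus in shared if hap1[locus] != hap2[locus]]
    steps = sum(abs(hap1[locus] - hap2[locus]) for locus in shared)
    return len(shared), len(mismatches), steps


# Hypothetical repeat counts for two men.
man_a = {"DYS19": 14, "DYS389I": 13, "DYS390": 24, "DYS391": 11}
man_b = {"DYS19": 14, "DYS389I": 13, "DYS390": 23, "DYS391": 10}
compared, mismatched, steps = compare_ystr(man_a, man_b)
print(f"{mismatched} of {compared} loci differ ({steps} repeat steps in total)")
```

Whether a given number of mismatches points to a recent shared paternal ancestor depends on how many loci were compared and on their mutation rates, which this sketch does not model.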

Autosomal Haplotypes

Humans have 22 pairs of autosomes, which undergo meiotic recombination that breaks ancestral chromosomal segments into shorter haplotype blocks, typically spanning tens to hundreds of kilobases. Recombination generates diversity in haplotype structure across autosomal regions, and block boundaries often coincide with recombination hotspots, where linkage disequilibrium decays rapidly. Autosomal haplotypes are commonly identified through genome-wide association studies (GWAS), which genotype SNPs that tag haplotype variation and test them for association with traits or diseases. Statistical imputation further increases resolution by predicting untyped SNPs and phasing genotypes into haplotypes using reference panels such as that of the 1000 Genomes Project, which catalogs haplotype diversity from over 2,500 individuals across global populations. These methods rely on shared haplotype segments to reconstruct autosomal phase accurately, particularly for common variants. Haplotype length also varies along the genome: regions with low recombination rates, such as those near centromeres, preserve longer haplotype blocks that can extend over megabases, facilitating the co-inheritance of multiple variants. Positive selection can likewise leave unusually long haplotypes. At the lactase gene (LCT), for example, the European-associated lactase persistence allele (-13910C>T) resides on a conserved haplotype exceeding 1 Mb in length, reflecting recent positive selection, whereas distinct haplotypes carry different lactase persistence alleles in East African pastoralist groups such as the Maasai.
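The idea behind reference-panel imputation, filling untyped sites by borrowing from reference haplotypes that match the typed sites, can be illustrated with a deliberately simplified nearest-haplotype sketch. Real imputation software uses probabilistic copying models (typically hidden Markov models) that allow a target to be a mosaic of several reference haplotypes. This toy version, with a hypothetical function name and panel, simply picks the reference haplotypes with the fewest mismatches at typed sites and takes a majority vote at each untyped site.

```python
from collections import Counter


def impute_nearest(target, panel):
    """Fill untyped sites (None) in a target haplotype from a reference panel.

    target: list of alleles with None at untyped sites.
    panel: list of fully observed reference haplotypes of the same length.
    """
    typed = [i for i, allele in enumerate(target) if allele is not None]

    def mismatches(ref):
        return sum(ref[i] != target[i] for i in typed)

    best = min(mismatches(ref) for ref in panel)
    closest = [ref for ref in panel if mismatches(ref) == best]

    imputed = list(target)
    for i, allele in enumerate(target):
        if allele is None:
            # Majority vote among the best-matching reference haplotypes
            # (ties go to the allele seen first).
            imputed[i] = Counter(ref[i] for ref in closest).most_common(1)[0][0]
    return imputed


panel = [
    ["A", "C", "G", "T", "A"],
    ["A", "C", "G", "T", "A"],
    ["A", "T", "G", "C", "A"],
    ["G", "T", "C", "C", "C"],
]
target = ["A", None, "G", None, "A"]
print(impute_nearest(target, panel))
```

Here three reference haplotypes match the typed sites perfectly, and the majority of them carry C and T at the untyped positions. The example also shows why panel size and diversity matter: a target whose haplotype is absent from the panel has no close match to copy from.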

Applications in Genetics

Genealogical and Forensic Uses

Haplotypes play a central role in genealogical testing through commercial direct-to-consumer kits, which analyze Y-chromosome short tandem repeats (Y-STRs) and mitochondrial DNA (mtDNA) to reconstruct paternal and maternal lineages, respectively, and aid in building family trees. Some services, for instance, provide maternal haplogroup reports based on mtDNA variants inherited solely from the mother, tracing ancestry back thousands of years along the direct female line. Similarly, Y-STR testing characterizes paternal lineages, allowing users to connect with distant relatives who share the same male line. Autosomal haplotypes, derived from recombining chromosomes, are used in these kits to estimate ethnicity percentages by comparison with reference populations, although such estimates describe broad geographic origins rather than precise family histories. In forensic applications, Y-chromosome haplotypes are valuable for tracing male lineages in criminal investigations, particularly in sexual assault cases or mixed samples where autosomal DNA profiles may be incomplete. By generating a Y-STR haplotype from evidence, investigators can exclude non-matching male suspects or search databases for paternal relatives, since the haplotype is passed from father to son largely unchanged apart from occasional STR mutations. mtDNA haplotypes, because of their high copy number per cell and maternal inheritance, are especially useful for degraded or low-quantity samples, such as burned remains or old bones, enabling comparison with maternal relatives when nuclear DNA analysis fails. These methods complement standard autosomal short tandem repeat (STR) profiling, but because a lineage marker is shared by relatives along the same line, a haplotype is not unique to an individual, and its rarity must be evaluated statistically against population databases. Despite their utility, haplotype-based analyses in genealogy and forensics face significant limitations, including privacy risks from uploading data to public or commercial databases, which can enable re-identification of individuals or their relatives through shared genetic segments.
Database biases arise when reference populations underrepresent certain ancestries, leading to inaccurate ethnicity estimates or haplotype frequency assessments that skew probabilistic interpretations. Accuracy in both fields therefore depends heavily on the diversity and size of reference populations; underrepresented groups, for example, may receive mismatched assignments or inflated match probabilities, potentially compromising identifications. A notable case study is the identification of victims of the September 11, 2001, attacks, where mtDNA haplotype analysis proved essential for matching fragmented, degraded remains to maternal relatives when autosomal profiles could not be obtained because of extreme heat and exposure. In this mass fatality incident, nearly 20,000 remains were processed, and mtDNA sequencing served as a supplementary tool, contributing to identifications where nuclear DNA was insufficient by confirming maternal lineage matches against family reference samples. More generally, in mass fatality incidents Y-haplotype analysis can help associate remains with paternal lines, making kinship testing more efficient. The 9/11 effort highlighted the robustness of haplotype methods in disaster victim identification while underscoring challenges such as sample contamination and the need for extensive family reference sampling. As of 2025, identifications continue using advanced DNA methods, with over 1,650 victims identified and approximately 1,100 still unidentified.
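The statistical evaluation of haplotype rarity mentioned in this section is often based on the counting method: the frequency of a Y-STR or mtDNA haplotype is estimated as the number of times it appears in a reference database divided by the database size, together with an upper confidence bound that accounts for sampling error. The bound matters especially for haplotypes never observed in the database. The sketch below uses a one-sided Clopper-Pearson (exact binomial) upper bound found by bisection. This is one common convention, assumed here for illustration, and laboratories' exact procedures vary. Function names and counts are hypothetical.

```python
import math


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), summed in log space for stability."""
    total = 0.0
    for k in range(x + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * math.log(p) + (n - k) * math.log1p(-p))
        total += math.exp(log_term)
    return total


def counting_method(x, n, alpha=0.05):
    """Point estimate x/n and one-sided (1 - alpha) upper bound for a haplotype
    observed x times in a database of n haplotypes (Clopper-Pearson)."""
    point = x / n
    if x >= n:
        return point, 1.0
    lo, hi = point, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        # The binomial CDF decreases as p grows; find p where it equals alpha.
        if binom_cdf(x, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return point, hi


# A haplotype never seen in a hypothetical database of 1,000 profiles:
print(counting_method(0, 1000))  # upper bound ~ 1 - 0.05 ** (1 / 1000), about 0.003
```

For an unobserved haplotype the point estimate is zero, which would overstate rarity; the upper bound of roughly 0.003 for a 1,000-profile database shows why database size limits how rare a haplotype can be shown to be.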

Population and Medical Genetics

In population genetics, haplotypes serve as powerful markers for inferring historical migration and admixture patterns across human populations. For instance, Native American-derived haplotypes in contemporary Latin American genomes reflect gene flow from Indigenous American groups into admixed populations, and haplotype-based methods allow researchers to quantify ancestral contributions more precisely than analyses of individual single nucleotide polymorphisms (SNPs). Such analyses often employ haplotype-based extensions of FST-like statistics, which measure genetic differentiation between populations while accounting for linkage disequilibrium (LD) blocks, revealing subtle population substructure that reflects demographic events such as bottlenecks or expansions. In medical genetics, haplotypes are instrumental in identifying disease susceptibility and guiding treatment. The human leukocyte antigen (HLA) region on chromosome 6 contains highly polymorphic haplotypes strongly associated with autoimmune disorders; specific HLA haplotypes, for example, increase the risk of particular autoimmune diseases by influencing the regulation of immune responses. Similarly, in pharmacogenomics, haplotypes (star alleles) of the CYP2D6 gene on chromosome 22 predict variable drug metabolism rates; poor-metabolizer diplotypes such as *4/*4 raise toxicity risks for certain medications, informing dosing adjustments that improve therapeutic outcomes. Integrating haplotypes into genome-wide association studies (GWAS) improves the identification of causal variants beyond individual SNPs, because haplotype blocks capture LD patterns that refine fine-mapping. This approach can reduce false positives, narrow candidate regions for functional follow-up, and improve statistical power in diverse populations. A notable example is the APOE gene on chromosome 19, where the ε4 haplotype confers elevated risk of late-onset Alzheimer's disease by modulating amyloid-beta clearance, with carriers showing up to a fourfold increase in risk compared with non-carriers.
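The APOE example also shows why phase matters in practice: the ε2, ε3, and ε4 alleles are haplotypes defined by two SNPs, rs429358 and rs7412, so a person heterozygous at both SNPs could carry either ε1/ε3 or ε2/ε4. The sketch below assumes alleles are reported on the forward strand in the usual way (rs429358 T/C, rs7412 C/T) and lists every APOE diplotype consistent with an unphased genotype. The function and table names are illustrative.

```python
# APOE allele carried by one chromosome, keyed by its (rs429358, rs7412) alleles.
APOE_HAPLOTYPES = {
    ("T", "T"): "e2",
    ("T", "C"): "e3",
    ("C", "C"): "e4",
    ("C", "T"): "e1",  # rare
}


def apoe_diplotypes(rs429358, rs7412):
    """Return every APOE diplotype consistent with unphased genotypes.

    Each argument is the pair of alleles observed at that SNP, e.g. ("T", "C").
    """
    a1, a2 = rs429358
    b1, b2 = rs7412
    results = set()
    # Two ways to pair the rs7412 alleles with the rs429358 alleles (cis vs trans).
    for first, second in ((b1, b2), (b2, b1)):
        hap1 = APOE_HAPLOTYPES[(a1, first)]
        hap2 = APOE_HAPLOTYPES[(a2, second)]
        results.add(tuple(sorted((hap1, hap2))))
    return results


print(apoe_diplotypes(("T", "C"), ("C", "T")))  # ambiguous without phase
print(apoe_diplotypes(("T", "C"), ("C", "C")))  # unambiguous: e3/e4
```

Because ε1 is very rare, the double heterozygote is usually interpreted as ε2/ε4, but that choice relies on population haplotype frequencies rather than on the genotype itself, which is exactly the kind of inference that statistical phasing formalizes.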

Diversity and Evolutionary Aspects

Haplotype Diversity Measures

Haplotype diversity measures quantify variation among sequences inherited together on a single chromosome, providing insight into the genetic structure and evolutionary dynamics of populations. These metrics are essential for assessing the extent of polymorphism within sets of haplotypes derived from DNA sequence data. A key index is nucleotide diversity (π), the average number of nucleotide differences per site between two sequences drawn from the sample, calculated by averaging the number of differences over all pairs of sequences and dividing by the number of sites examined. This measure captures overall sequence variation at the nucleotide level and is particularly useful for comparing diversity across genomic regions. Another fundamental metric is haplotype diversity (h), the probability that two randomly selected haplotypes differ, given by h = 1 - \sum_{i=1}^{k} p_i^2, where p_i is the frequency of the i-th haplotype and k is the number of distinct haplotypes in the sample. This index, analogous to expected heterozygosity, emphasizes the evenness of haplotype frequencies rather than the number of sequence differences between them, making it suitable for analyzing discrete haplotypes in non-recombining regions such as mitochondrial or Y-chromosomal DNA. Haplotype diversity ranges from 0 (complete monomorphism) to 1 (maximum diversity, with all haplotypes unique), and its unbiased estimator for a sample of n haplotypes, h = \frac{n}{n-1}\left(1 - \sum_{i=1}^{k} p_i^2\right), corrects for sample size to avoid underestimation in small datasets. To test whether observed diversity patterns are consistent with neutrality, Tajima's D statistic is commonly applied to haplotype data. This test compares the number of segregating sites with the average number of pairwise differences (π), with values near zero consistent with neutral evolution, negative values suggesting population expansion or purifying selection, and positive values suggesting balancing selection or population contraction.
When applied to haplotype-resolved sequences, Tajima's D helps detect deviations from the diversity expected under the standard neutral model, and values computed separately for different populations can be compared to contrast their demographic histories. Several evolutionary forces influence these diversity measures: recombination breaks down linkage to generate novel haplotypes and raise haplotype diversity; natural selection can reduce variation by favoring specific alleles (e.g., through selective sweeps); and genetic drift erodes diversity in small populations through the random fixation of alleles. Ancestral populations generally exhibit higher haplotype diversity than derived populations because they have had longer to accumulate mutations and have experienced fewer bottlenecks. Software tools such as DnaSP (DNA Sequence Polymorphism) compute these metrics from aligned sequence data and include options for haplotype phasing, neutrality tests, and population subdivision analyses.
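The three statistics described in this section can be computed from a set of aligned, haplotype-resolved sequences of equal length. The sketch below is a minimal illustration that assumes gap-free sequences with no missing data: π is the mean number of pairwise differences divided by sequence length, h uses the unbiased n/(n-1) correction, and Tajima's D follows Tajima's standard normalization of the difference between the mean pairwise differences and S/a_1, where S is the number of segregating sites and a_1 is the sum of 1/i for i from 1 to n-1. Function names are hypothetical; tools such as DnaSP handle gaps, missing data, and other complications not covered here.

```python
import math
from collections import Counter
from itertools import combinations


def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(x != y for x, y in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def nucleotide_diversity(seqs):
    """pi: mean pairwise differences per site."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


def haplotype_diversity(seqs):
    """Unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(seqs)
    freqs = [c / n for c in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def tajimas_d(seqs):
    """Tajima's D; returns None when there are no segregating sites."""
    n = len(seqs)
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        return None
    k = mean_pairwise_differences(seqs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


sample = ["ACGT", "ACGT", "ACTT", "TCTT"]
print(f"pi = {nucleotide_diversity(sample):.4f}")
print(f"h  = {haplotype_diversity(sample):.4f}")
print(f"D  = {tajimas_d(sample):.4f}")
```

For this four-sequence example, π = 7/24 ≈ 0.29 and h = 5/6 ≈ 0.83. With so few sequences and only two segregating sites, Tajima's D is uninformative in practice; the example only demonstrates the arithmetic.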

Evolutionary Origins and History

Haplotypes serve as genetic signatures of ancient mutations that arose in ancestral populations, preserving blocks of linked alleles inherited together from common ancestors. Coalescent theory provides a framework for tracing these haplotypes back through time: it models the coalescence of genetic lineages into common ancestors and infers the time to the most recent common ancestor (TMRCA) from how mutations accumulate along the resulting genealogical trees. This approach reconstructs haplotype trees that reveal branching patterns of descent, showing how neutral mutations record historical events independently of selective pressure. In humans, mitochondrial DNA (mtDNA) haplotypes trace matrilineal ancestry to "Mitochondrial Eve," an ancestral woman estimated to have lived approximately 150,000–200,000 years ago in Africa, based on the root of the global mtDNA phylogeny. Similarly, Y-chromosome haplotypes define patrilineal lines leading to "Y-chromosomal Adam," estimated to have lived between 200,000 and 300,000 years ago, also in Africa, based on analyses of non-recombining Y-SNP trees. Recent studies using revised mutation rates have refined these estimates, with some analyses suggesting TMRCAs exceeding 300,000 years, although most estimates fall within the 150,000–300,000-year range. These haplotype-based estimates support an African origin for anatomically modern humans around 200,000–300,000 years ago, consistent with fossil evidence, although the uniparental TMRCAs reflect coalescence within surviving lineages rather than the origin of the population itself. Human migration out of Africa involved population bottlenecks that sharply reduced haplotype diversity, as small founding groups carried only subsets of ancestral variation. Serial founder effects during stepwise expansions, such as from Africa to Eurasia and beyond, further eroded diversity, with each migration sampling fewer haplotypes and amplifying drift in peripheral populations such as Native Americans and Oceanians. These processes produced star-like haplotype expansions in non-African haplogroups, reflecting rapid demographic growth after bottlenecks around 50,000–70,000 years ago.
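The coalescent reasoning behind TMRCA estimates can be illustrated under the simplest model, the Kingman coalescent for a non-recombining locus such as mtDNA in a population of constant size. Assume N haploid lineage-carriers per generation (for mtDNA, roughly the female effective population size). Each pair of lineages then merges with probability about 1/N per generation, so with k lineages the waiting time to the next merger is approximately exponential with rate k(k-1)/(2N). Summing these waits gives an expected TMRCA of 2N(1 - 1/n) generations for a sample of n haplotypes. The sketch below checks that formula by simulation. The population size, sample size, and function names are hypothetical, and real analyses must additionally model mutation, changing population size, and uncertainty in mutation rates.

```python
import random


def expected_tmrca(n, pop_size):
    """Expected TMRCA in generations: sum over k of 2N / (k (k - 1)) = 2N (1 - 1/n)."""
    return 2 * pop_size * (1 - 1 / n)


def simulate_tmrca(n, pop_size, rng):
    """Draw one TMRCA (generations) under the haploid Kingman coalescent."""
    t = 0.0
    for k in range(n, 1, -1):
        rate = k * (k - 1) / (2 * pop_size)  # any of k(k-1)/2 pairs may merge
        t += rng.expovariate(rate)           # exponential waiting time
    return t


rng = random.Random(1)
n, pop_size = 10, 1000
draws = [simulate_tmrca(n, pop_size, rng) for _ in range(20000)]
print(f"simulated mean TMRCA: {sum(draws) / len(draws):.0f} generations")
print(f"expected TMRCA:       {expected_tmrca(n, pop_size):.0f} generations")
```

Nearly half of the expected TMRCA is spent waiting for the final two lineages to merge, which is one reason uniparental TMRCA estimates carry wide uncertainty. Converting generations to years also requires an assumed generation time.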
Admixture, through recent gene flow between continental populations and, earlier, archaic introgression, has reshaped haplotype structure in modern populations by introducing long segments of distinct ancestry that disrupt older linkage patterns. For instance, Holocene-era admixture in Eurasian groups has obscured ancient selective sweeps, creating mosaic haplotypes that combine African, Neanderthal, and Denisovan ancestries. This gene flow increases diversity in admixed populations, such as African Americans and Latinos, while complicating inferences about deep evolutionary history.

Conceptual Development

Historical Milestones

The concept of haplotypes as sets of linked genetic variants emerged from early 20th-century studies of genetic linkage. In 1910, Thomas Hunt Morgan discovered a white-eyed mutant of the fruit fly Drosophila melanogaster; his subsequent crosses showed that certain traits are inherited together because their genes lie on the same chromosome, establishing the principle of linkage that underpins haplotype formation. This work, building on Mendelian inheritance, demonstrated that alleles at nearby loci tend to be transmitted as units, shaping later understanding of haplotype inheritance. Advances in immunogenetics during the 1960s and 1970s highlighted the role of haplotypes in disease susceptibility through human leukocyte antigen (HLA) typing. In the course of these studies, the Italian geneticist Ruggero Ceppellini coined the term "haplotype" in 1967 to describe linked alleles at the HLA complex. Serological and family-based HLA studies identified specific haplotypes associated with autoimmune conditions, marking a shift toward recognizing haplotypes as functional units in immunity. In 1973, independent research groups reported a strong association between HLA-B27 and ankylosing spondylitis, with over 90% of affected individuals carrying this variant, establishing HLA variation as a key factor in disease mapping. The 1980s saw the first direct analyses of uniparental markers in human populations. Mitochondrial DNA (mtDNA) studies surveyed variation across populations to trace maternal lineages, effectively treating the mtDNA molecule as a single non-recombining haplotype, and Y-chromosome studies began haplotype-based tracking of paternal ancestry using similar approaches. These developments laid the groundwork for phylogeographic applications of haplotypes. In the pre-genomics era, pedigree-based methods became central to resolving haplotypes for genetic mapping.
Large family panels, such as the collection assembled by the Centre d'Etude du Polymorphisme Humain (CEPH), established in Paris in 1984, facilitated multipoint linkage analysis by inferring haplotype phase from multigenerational genotypes, enabling the construction of human genetic linkage maps and the localization of Mendelian disease loci without complete genome sequences. This approach relied on recombination events observed within pedigrees to define haplotypes and estimate genetic distances between loci.

Modern Advances

The International HapMap Project, launched in 2002, published its Phase I results in 2005: a haplotype map of the human genome based on genotyping over 1.1 million single nucleotide polymorphisms (SNPs) in 269 individuals from four populations, cataloging common haplotype structures and patterns of linkage disequilibrium across global ancestries. By identifying haplotype blocks that capture most common genetic variation, the project facilitated genome-wide association studies, and a subsequent phase expanded coverage to over 3.1 million SNPs by 2007. Advances in sequencing technology during the 2010s transformed haplotype resolution through long-read platforms such as Pacific Biosciences (PacBio), whose reads span tens of kilobases, allowing variants to be phased directly without relying on statistical inference and improving accuracy in complex genomic regions such as those containing structural variants. Concurrently, CRISPR-Cas9 emerged as a tool for haplotype-specific modification, enabling targeted correction of disease-associated alleles on a particular chromosomal copy, as demonstrated in therapeutic models of autosomal dominant disorders in which allele-specific editing disrupts the mutant allele while sparing the wild-type copy. The integration of biobank resources, such as whole-genome sequence data for roughly 500,000 UK Biobank participants, has enabled the discovery of haplotypes through phasing algorithms that scale to population-sized datasets, revealing low-frequency variants undetected in smaller cohorts. Deep learning, particularly models such as convolutional autoencoders and diffusion-based approaches, has been applied to haplotype imputation, with reported gains in precision over traditional methods for predicting ungenotyped variants, especially in low-coverage data, by learning complex patterns in haplotype reference panels.
More recently, efforts to phase haplotypes in diverse ancestries have intensified to counter the Eurocentric bias of genetic databases. Tools such as SHAPEIT5 have been reported to achieve near-perfect phasing accuracy across global populations by leveraging identity-by-descent segments, improving equity in genomic analyses. These advances also extend to polygenic risk scores. Haplotype-resolved data can improve prediction accuracy for complex traits by accounting for long-range dependencies, and cross-ancestry models built on such data have increased heritability estimates and improved transferability between populations.
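
Identity-by-descent (IBD) segments are stretches of chromosome that two individuals inherited from a recent common ancestor. Between phased haplotypes, they appear as unusually long runs of matching alleles. The sketch below is a minimal identity-by-state scan and is not SHAPEIT5's method. It assumes error-free phased haplotypes and genetic map positions in centimorgans, sorted along the chromosome. The function name is hypothetical, and the 2 cM default threshold is an arbitrary illustrative choice. Production IBD detectors tolerate genotyping and phasing errors and account for short runs that match by chance.

```python
from typing import List, Sequence, Tuple


def shared_segments(hap_a: Sequence[int], hap_b: Sequence[int],
                    positions_cm: Sequence[float],
                    min_length_cm: float = 2.0) -> List[Tuple[int, int]]:
    """Return (first, last) site indices of maximal runs where two phased haplotypes
    carry identical alleles and the run spans at least min_length_cm centimorgans."""
    segments: List[Tuple[int, int]] = []
    run_start = None
    n = len(hap_a)
    # Loop one step past the end so a run reaching the last site is closed too.
    for i in range(n + 1):
        identical = i < n and hap_a[i] == hap_b[i]
        if identical and run_start is None:
            run_start = i  # a new identical run begins
        elif not identical and run_start is not None:
            run_end = i - 1
            if positions_cm[run_end] - positions_cm[run_start] >= min_length_cm:
                segments.append((run_start, run_end))
            run_start = None
    return segments


positions = [0.0, 0.5, 1.0, 2.0, 3.5, 4.0, 4.5, 6.0]  # sorted genetic map positions (cM)
hap_a = [0, 1, 1, 0, 1, 0, 0, 1]
hap_b = [0, 1, 1, 0, 1, 1, 0, 1]
print(shared_segments(hap_a, hap_b, positions))  # [(0, 4)]
```

Only the first run (3.5 cM) passes the threshold. The second shared run spans just 1.5 cM, which is the kind of short match that could arise by chance rather than from recent common ancestry. Segment length is measured between the first and last matching sites, so it slightly underestimates the true extent of a shared segment.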
