
Genotype frequency

Genotype frequency refers to the proportion of individuals in a population that possess a specific genotype at a given genetic locus, typically expressed as a proportion or percentage of the total population size. This measure is distinct from allele frequency, which quantifies the prevalence of individual gene variants (alleles) rather than combinations of them. In population genetics, genotype frequencies provide a snapshot of genetic variation and serve as a baseline for detecting evolutionary change.

A central concept linking genotype and allele frequencies is the Hardy-Weinberg equilibrium (HWE), formulated in 1908, which predicts that in an idealized population (large, randomly mating, with no mutation, migration, selection, or genetic drift) genotype frequencies stabilize after one generation of random mating. Under HWE, for a biallelic locus with alleles A (frequency p) and a (frequency q = 1 - p), the expected frequencies are p² for homozygous AA, 2pq for heterozygous Aa, and q² for homozygous aa. Observed frequencies are calculated by genotyping individuals and dividing the count of each genotype by the total sample size; they are often compared against HWE expectations using statistical tests such as the chi-squared test.

Deviations from expected genotype frequencies can signal the action of evolutionary forces, such as natural selection favoring certain genotypes, genetic drift in small populations, or non-random mating like inbreeding, which alters heterozygosity levels. For instance, an excess of homozygotes may indicate inbreeding or population substructure, while an excess of heterozygotes could reflect balancing selection. Genotype frequencies are applied in evolutionary biology to model adaptation, in conservation genetics to evaluate inbreeding and diversity, and in medical genetics to estimate disease risk in populations assuming HWE.

Basic Concepts

Definition

In population genetics, a genotype refers to the specific combination of alleles present at a given genetic locus in an individual's genome, representing the inherited genetic makeup for that locus. Genotypes are classified as homozygous when the two alleles are identical (e.g., AA or aa) or heterozygous when they differ (e.g., Aa). Genotype frequency denotes the proportion of individuals within a population that carry a particular genotype at a specified locus, calculated as the number of individuals with that genotype divided by the total number of individuals in the population; it is conventionally expressed as a value between 0 and 1 or as a percentage. This frequency describes the genetic composition of the population at the level of allele combinations rather than single variants.

The study of genotype frequencies is essential for tracking levels of genetic diversity and delineating population structure, as these frequencies reveal how genetic variation is maintained or altered over time. For instance, shifts in genotype frequencies can indicate underlying demographic or selective processes acting on the population.

The concept of genotype frequency originated in early 20th-century population genetics, pioneered by G. H. Hardy and Wilhelm Weinberg, whose independent 1908 publications laid the groundwork for understanding stable genetic distributions in non-evolving populations. Genotype frequencies are intrinsically linked to allele frequencies, the proportions of individual alleles in the population, and together they provide a foundation for modeling genetic dynamics.

Measurement and Calculation

Genotype frequencies are typically measured by direct counting in a sampled population: the frequency of a specific genotype, such as the homozygous AA genotype, is the number of individuals with that genotype divided by the total number of individuals sampled. Expressed mathematically, p_{AA} = \frac{n_{AA}}{N}, with n_{AA} denoting the count of AA individuals and N the total sample size. This ratio is an unbiased estimate of the population frequency when the sample is representative of the population. Direct counting assumes that every sampled individual's genotype is determined correctly, and it is exact when a small population is genotyped in full. In large populations, frequencies are instead estimated from a random sample in which each individual has an equal probability of selection, which minimizes sampling bias.

Accurate measurement relies on genotyping techniques that identify specific alleles, including polymerase chain reaction (PCR), which amplifies target DNA regions before allele-specific detection, and next-generation sequencing (NGS), which determines sequences at polymorphic sites at high throughput. PCR-based assays offer cost-effective resolution for single nucleotide polymorphisms (SNPs), while sequencing provides comprehensive coverage for complex loci.

Sample-based estimates carry confidence intervals derived from the binomial distribution. For a genotype frequency \hat{p} = n/N, based on n individuals with the genotype among N sampled, the standard error is \sqrt{\frac{\hat{p}(1 - \hat{p})}{N}}, and intervals are constructed using methods such as the Wilson score interval, which is more accurate at low frequencies. These intervals quantify uncertainty: they are wider, relative to the estimate, for rare genotypes or small samples, and so indicate how reliable population-level inferences are. Genotype frequencies under random mating can also be predicted from allele frequencies, but direct measurement relies on empirical counts rather than indirect derivation.
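The counting estimate and the Wilson score interval described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `genotype_frequencies` and `wilson_interval`, the default z = 1.96 (roughly a 95% interval), and the sample counts are all assumptions made for the example.

```python
import math

def genotype_frequencies(counts):
    """Return {genotype: count / N} for a dict of genotype counts."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample is empty")
    return {genotype: n / total for genotype, n in counts.items()}

def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximately 95% interval (an assumed default).
    """
    if total == 0:
        raise ValueError("sample is empty")
    p_hat = successes / total
    denom = 1 + z**2 / total
    centre = (p_hat + z**2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)) / denom
    )
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)

# Hypothetical sample of 100 genotyped individuals.
counts = {"AA": 49, "Aa": 42, "aa": 9}
freqs = genotype_frequencies(counts)
low, high = wilson_interval(counts["aa"], sum(counts.values()))
print(freqs)
print("interval for aa:", (low, high))
```

The Wilson interval is used here rather than the simpler estimate plus or minus z standard errors because the simpler form can extend below zero for rare genotypes.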

Relation to Allele Frequencies

Allele Frequency Basics

Allele frequency refers to the proportion of a particular allele among all alleles at a given genetic locus within a population's gene pool. It quantifies how common a specific variant of a gene is, serving as a fundamental measure in population genetics to assess genetic diversity. Unlike genotype frequency, which describes the proportion of individuals carrying a specific combination of alleles (such as homozygous or heterozygous), allele frequency focuses on the individual variants themselves and their relative abundance across the population. For any locus, the frequencies of all alleles sum to 1, reflecting the complete set of genetic variants at that position.

Allele frequencies are typically estimated from observed genotype counts by gene counting, in which each individual's contribution to the gene pool is tallied. For a biallelic locus with alleles A and a in a diploid population of size N, the frequency p of A is calculated as p = \frac{2n_{AA} + n_{Aa}}{2N}, where n_{AA} is the number of homozygous AA individuals and n_{Aa} is the number of heterozygous Aa individuals; the total number of alleles is 2N since each individual contributes two copies. For loci with more than two alleles, the method extends by counting copies of each allele across all genotypes and dividing by the total number of alleles (2N), so that the allele frequencies again sum to 1.

As a prerequisite for analyzing genotype frequencies, allele frequencies provide the baseline proportions that, under the assumption of random mating, predict the expected distribution of genotypes in a population.
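As a brief sketch of the gene-counting rule, the Python below computes allele frequencies for the biallelic case and for a locus with any number of alleles. The function names and the representation of genotypes as two-element tuples are assumptions chosen for the example.

```python
from collections import Counter

def allele_frequencies(n_AA, n_Aa, n_aa):
    """Biallelic gene counting: p = (2*n_AA + n_Aa) / (2N), q = 1 - p."""
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("sample is empty")
    p = (2 * n_AA + n_Aa) / (2 * n)
    return p, 1 - p

def allele_frequencies_multi(genotype_counts):
    """Gene counting for any number of alleles.

    genotype_counts maps (allele1, allele2) tuples to numbers of diploid
    individuals, e.g. {("A", "B"): 12}. Each individual adds two allele copies.
    """
    allele_counts = Counter()
    for (first, second), n in genotype_counts.items():
        allele_counts[first] += n
        allele_counts[second] += n
    total = sum(allele_counts.values())  # equals 2N
    if total == 0:
        raise ValueError("sample is empty")
    return {allele: c / total for allele, c in allele_counts.items()}

# Hypothetical counts: 49 AA, 42 Aa, 9 aa.
print(allele_frequencies(49, 42, 9))
print(allele_frequencies_multi({("A", "A"): 49, ("A", "a"): 42, ("a", "a"): 9}))
```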

Hardy-Weinberg Equilibrium

The Hardy-Weinberg principle states that, in a large population undergoing random mating with no evolutionary forces acting upon it, the frequencies of alleles and genotypes will remain constant from generation to generation. This equilibrium serves as a foundational null model in population genetics, allowing researchers to detect deviations that indicate evolutionary processes at work.

For a locus with two alleles, denoted A (with frequency p) and a (with frequency q = 1 - p), the principle predicts that the genotype frequencies at equilibrium will be p^2 for the homozygous genotype AA, 2pq for the heterozygous genotype Aa, and q^2 for the homozygous genotype aa. These frequencies derive from the random union of gametes, where the probability of forming each genotype is the product of the allele frequencies involved, and they sum to 1 because (p + q)^2 = p^2 + 2pq + q^2 = 1.

The principle rests on five key assumptions: an infinitely large population size, to eliminate random genetic drift; random mating with no assortative preferences; absence of selection, so that all genotypes have equal fitness; no mutation to alter allele frequencies; and no gene flow through migration. Each is critical because violations introduce changes in allele or genotype frequencies, disrupting the predicted stability.

To demonstrate stability, consider the gamete frequencies in the parental generation: p for A and q for a. The offspring genotypes form by random combination, yielding frequencies p^2, 2pq, and q^2 in the first generation, as shown by the binomial expansion of (p + q)^2. The gametes produced by this generation again carry A and a with frequencies p and q, confirming that the genotype proportions persist unchanged in subsequent generations under the assumptions. Alternatively, a Punnett square for the two alleles illustrates this for a single locus, with rows and columns representing gamete contributions and cells showing the resulting genotype probabilities, which match the equilibrium values.
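The stability argument can be written out as a short derivation. The symbols p' and q' (allele frequencies among the gametes produced by the offspring generation) and f(·) (a genotype frequency) are notation introduced here for illustration.

```latex
\begin{align*}
p' &= f(AA) + \tfrac{1}{2} f(Aa) = p^2 + \tfrac{1}{2}(2pq) = p(p + q) = p, \\
q' &= f(aa) + \tfrac{1}{2} f(Aa) = q^2 + \tfrac{1}{2}(2pq) = q(p + q) = q.
\end{align*}
```

Because p + q = 1, the allele frequencies are unchanged, so the next round of random mating again yields p^2, 2pq, and q^2.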

Examples and Applications

Numerical Example

Consider a hypothetical population of 100 diploid individuals at a single locus with two alleles, A and a. Suppose the observed genotype counts are 49 AA, 42 Aa, and 9 aa. To calculate allele frequencies, count the total number of A and a alleles across all individuals. There are 200 alleles in total (2 per individual). The number of A alleles is (2 × 49) + 42 = 140, so the frequency of A (p) is 140/200 = 0.7. The frequency of a (q) is 1 - p = 0.3. The observed genotype frequencies are then 49/100 = 0.49 for AA, 42/100 = 0.42 for Aa, and 9/100 = 0.09 for aa.

Under Hardy-Weinberg equilibrium (HWE), the expected genotype frequencies are p² for AA, 2pq for Aa, and q² for aa, assuming random mating and the other standard conditions. Substituting the values gives expected frequencies of (0.7)² = 0.49 for AA, 2 × 0.7 × 0.3 = 0.42 for Aa, and (0.3)² = 0.09 for aa. The expected counts, based on the population size of 100, are thus 49 AA, 42 Aa, and 9 aa.

To assess whether the observed counts fit the HWE expectations, apply a chi-squared goodness-of-fit test: χ² = Σ [(O - E)² / E], where O is the observed count and E is the expected count for each genotype. The degrees of freedom equal the number of genotypes - 1 - the number of estimated parameters = 3 - 1 - 1 = 1, because one allele frequency was estimated from the data. Here, since O = E for all genotypes, χ² = 0. A χ² value of 0 indicates a perfect fit (P-value = 1), so the data are fully consistent with HWE.
Genotype | Observed count (O) | Expected count (E) | (O - E)² / E
AA       | 49                 | 49                 | 0
Aa       | 42                 | 42                 | 0
aa       | 9                  | 9                  | 0
Total    | 100                | 100                | χ² = 0
This equilibrium implies genetic stability in the population, with frequencies unlikely to change in the absence of evolutionary forces.
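The goodness-of-fit calculation in this example can be sketched in Python using only the standard library. The function name `hwe_chi_squared` is an assumption; the P-value uses the identity that, for 1 degree of freedom, the chi-squared upper-tail probability equals erfc(sqrt(χ²/2)).

```python
import math

def hwe_chi_squared(n_AA, n_Aa, n_aa):
    """Chi-squared test of HWE for a biallelic locus (1 degree of freedom).

    Returns (chi2, p_value). Allele frequencies are estimated from the
    same counts, which is why one degree of freedom is lost.
    """
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("sample is empty")
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1 - p
    observed = (n_AA, n_Aa, n_aa)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    # Skip classes with zero expectation (monomorphic locus).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper-tail probability of a chi-squared variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

print(hwe_chi_squared(49, 42, 9))    # counts from the worked example
print(hwe_chi_squared(60, 20, 20))   # hypothetical sample with a heterozygote deficit
```

With small expected counts the chi-squared approximation becomes unreliable, and an exact test is usually preferred in that situation.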

Real-World Estimation

In practice, genotype frequencies are estimated using large-scale genomic databases such as the 1000 Genomes Project, which sequenced diverse human populations to provide reference data on variants across many loci; multi-locus analyses use tools that compute frequencies from genotype calls in phased haplotypes. Field studies complement these resources by collecting samples from specific populations for targeted multi-locus genotyping, often using high-throughput sequencing to assess frequencies in non-model organisms or isolated groups where database coverage is limited.

A notable case study involves the ABO blood group system, where genotype frequencies vary significantly across ethnic groups due to historical selection and drift. For instance, Native American populations exhibit notably high frequencies of the OO genotype (corresponding to type O blood), often exceeding 90% in some indigenous groups, compared to lower rates (around 40-50%) in many other populations. This variation highlights how founder effects can shape genotype distributions, with the O allele dominating in the Americas likely because of ancient bottlenecks during the migrations that first peopled the continents.

Statistical estimation of frequencies commonly employs maximum likelihood methods, which choose the frequencies that maximize the probability of the observed data and are particularly useful for multi-allelic loci. These approaches can incorporate expectation-maximization (EM) algorithms to handle missing or ambiguous data by iteratively imputing genotypes from the current frequency estimates. For small samples, bias-corrected maximum likelihood estimators adjust for finite sample sizes, reducing overestimation of rare genotypes.

Computational tools facilitate these analyses. The R package HardyWeinberg provides functions to estimate genotype frequencies from count data and test for equilibrium, including tools for data with missing genotypes. Similarly, the adegenet package supports multi-locus computations by converting genetic data into frequency matrices for diverse marker types, enabling exploratory analysis of population structure.
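To make the expectation-maximization idea concrete, the sketch below applies it to the ABO system, where blood types A and B each hide two possible genotypes (AA or AO, BB or BO). It is an illustrative implementation written for this example, not the method of any package named above; the function name `em_abo`, the starting values, the tolerance, and the phenotype counts are all assumptions, and Hardy-Weinberg proportions are assumed throughout.

```python
def em_abo(n_A, n_B, n_AB, n_O, tol=1e-12, max_iter=1000):
    """EM (gene-counting) estimate of ABO allele frequencies (p, q, r)
    for alleles A, B, O from phenotype counts, assuming HWE."""
    n = n_A + n_B + n_AB + n_O
    if n == 0:
        raise ValueError("sample is empty")
    p = q = r = 1 / 3  # arbitrary starting point
    for _ in range(max_iter):
        # E-step: split ambiguous phenotypes into expected genotype counts.
        # P(AA | type A) = p^2 / (p^2 + 2pr) = p / (p + 2r), likewise for B.
        frac_AA = p / (p + 2 * r) if (p + 2 * r) > 0 else 0.0
        frac_BB = q / (q + 2 * r) if (q + 2 * r) > 0 else 0.0
        n_AA, n_AO = n_A * frac_AA, n_A * (1 - frac_AA)
        n_BB, n_BO = n_B * frac_BB, n_B * (1 - frac_BB)
        # M-step: count allele copies in the completed data.
        p_new = (2 * n_AA + n_AO + n_AB) / (2 * n)
        q_new = (2 * n_BB + n_BO + n_AB) / (2 * n)
        r_new = (n_AO + n_BO + 2 * n_O) / (2 * n)
        change = max(abs(p_new - p), abs(q_new - q), abs(r_new - r))
        p, q, r = p_new, q_new, r_new
        if change < tol:
            break
    return p, q, r

# Hypothetical phenotype counts for 10,000 individuals.
p, q, r = em_abo(n_A=4500, n_B=1300, n_AB=600, n_O=3600)
print(p, q, r)
print("estimated OO genotype frequency:", r * r)
```

The estimated genotype frequencies then follow from the allele frequencies, for example r² for OO, which is only as reliable as the equilibrium assumption itself.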

Deviations and Dynamics

Factors Causing Deviations

Genotype frequencies in populations are expected to conform to Hardy-Weinberg equilibrium (HWE) under ideal conditions of random mating, no selection, infinite population size, no migration, and no mutation. Violating any of these conditions can shift genotype frequencies away from the expected proportions.

Non-random mating alters the proportions of homozygous and heterozygous genotypes. Inbreeding, where individuals mate with close relatives, increases the frequency of homozygotes relative to expectations, as related parents are more likely to share alleles identical by descent. Population substructure produces a similar pattern, known as the Wahlund effect: when a sample pools subpopulations with differing allele frequencies, it shows a deficit of heterozygotes and an excess of homozygotes even if each subpopulation mates randomly. Positive assortative mating, in which individuals preferentially mate with others of similar phenotype or genotype, likewise reduces heterozygote production and raises homozygote frequencies at the loci underlying the traits involved.

Natural selection acts on genotype frequencies by favoring individuals with specific genotypes that confer survival or reproductive advantages, thereby shifting frequencies away from HWE predictions. For instance, in regions where malaria is endemic, the sickle-cell allele (HbS) is maintained at higher frequencies due to heterozygote advantage: carriers (HbA/HbS) are protected against severe malaria without suffering sickle-cell disease, while homozygotes (HbS/HbS) face high mortality from the disease. This selective pressure results in elevated frequencies of the heterozygous HbA/HbS genotype in affected populations, deviating from equilibrium proportions.

Migration, or gene flow, introduces alleles from external populations, altering local genotype frequencies and preventing equilibrium if the incoming alleles differ in frequency from those of the resident population. Genetic drift, the random fluctuation of allele frequencies, is particularly pronounced in small populations, where chance events can lead to unpredictable changes in genotype proportions, often resulting in loss of rare genotypes and increased homozygosity over time. Mutation introduces new alleles into the gene pool, albeit at low rates, gradually changing genotype frequencies by creating novel variants that were not present in the original population.

Deviations from expected frequencies under HWE can be quantified using F-statistics, such as F_{IS}, which measures the inbreeding coefficient within subpopulations by comparing observed heterozygosity to that expected under random mating. A positive F_{IS} indicates a deficit of heterozygotes due to factors like inbreeding or Wahlund effects, while negative values indicate an excess of heterozygotes, such as from heterozygote advantage.
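For a single biallelic locus, the comparison of observed and expected heterozygosity can be written as F_IS = 1 - H_obs / H_exp with H_exp = 2pq. The Python below is a minimal sketch of that formula; the function name `f_is` and the example counts are assumptions, and no small-sample correction is applied.

```python
def f_is(n_AA, n_Aa, n_aa):
    """F_IS = 1 - H_obs / H_exp for one biallelic locus.

    H_obs is the observed heterozygote proportion and H_exp = 2pq is the
    proportion expected under random mating (no small-sample correction).
    """
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("sample is empty")
    p = (2 * n_AA + n_Aa) / (2 * n)
    h_exp = 2 * p * (1 - p)
    if h_exp == 0:
        raise ValueError("locus is monomorphic; F_IS is undefined")
    h_obs = n_Aa / n
    return 1 - h_obs / h_exp

# Hypothetical samples, all with allele frequency p = 0.7.
print(f_is(49, 42, 9))    # heterozygotes as expected
print(f_is(60, 20, 20))   # heterozygote deficit, positive F_IS
print(f_is(40, 60, 0))    # heterozygote excess, negative F_IS
```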

Evolutionary Implications

Changes in genotype frequencies within populations are a fundamental driver of microevolution, enabling adaptation to selective pressures such as environmental changes or human-induced stressors. Under natural or artificial selection, advantageous genotypes increase in frequency, altering the genetic composition of the population and facilitating survival in novel conditions. For instance, in insects exposed to pesticides, resistance alleles can rise rapidly in frequency through strong selective sweeps, as observed in Anopheles gambiae, where kdr mutations conferring pyrethroid resistance spread quickly, reducing genetic diversity around the locus and adapting populations to insecticide use.

Divergent shifts in genotype frequencies between populations play a critical role in speciation by promoting reproductive isolation. When populations experience different selective environments, gene flow is reduced at divergently selected loci, creating "genomic islands" of elevated differentiation. A hitchhiking effect links selected sites to nearby genomic regions, strengthening barriers to interbreeding and contributing to incompatibilities or mate preference traits that solidify species boundaries.

In conservation genetics, reduced heterozygosity, reflected in genotype frequencies skewed toward homozygotes, signals inbreeding depression, in which deleterious recessive alleles are expressed more often and impair fitness components such as viability and fertility. Empirical studies in wild populations, such as the Glanville fritillary metapopulation, indicate that low heterozygosity elevates extinction risk by over 25% in declining or large subpopulations, exacerbating demographic declines and pushing populations toward an extinction vortex.

Long-term evolutionary dynamics of genotype frequencies are modeled by the Wright-Fisher framework, which simulates genetic drift in finite populations through binomial sampling of alleles from the previous generation. In this neutral model, random fluctuations cause allele frequencies to drift over generations, leading to stochastic changes in genotype frequencies that can end in fixation or loss of alleles. The variance in allele frequency change per generation is \frac{p(1-p)}{2N}, where p is the allele frequency and N is the effective population size. This process shows how drift erodes genetic variation in small populations, influencing long-term evolutionary trajectories independently of selection.

In quantitative genetics, genotype frequency data inform heritability estimates by partitioning phenotypic variance into genetic and environmental components, where allele frequencies determine the average effects of genes contributing to trait variation. Broad-sense heritability H^2 = \frac{V_G}{V_P} (with V_G as the total genetic variance and V_P as the total phenotypic variance) can be estimated from observed genotype distributions and the phenotypic similarity of relatives. The additive part of V_G gives narrow-sense heritability, which is the quantity used to predict the response to selection in breeding programs or natural populations.
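The Wright-Fisher drift model described above can be sketched as a short simulation: each generation, 2N allele copies are drawn at random according to the current allele frequency. This is a minimal neutral, single-locus illustration; the function name `wright_fisher`, its parameters, and the chosen population size and seed are assumptions for the example.

```python
import random

def wright_fisher(p0, n_individuals, generations, rng=None):
    """Simulate neutral drift at one biallelic locus.

    Each generation draws 2N allele copies by binomial sampling from the
    current allele frequency. Returns the allele-frequency trajectory,
    including the starting value.
    """
    if rng is None:
        rng = random.Random()
    n_copies = 2 * n_individuals
    p = p0
    trajectory = [p]
    for _generation in range(generations):
        # Binomial(2N, p) draw built from independent Bernoulli trials.
        copies = sum(1 for _copy in range(n_copies) if rng.random() < p)
        p = copies / n_copies
        trajectory.append(p)
    return trajectory

rng = random.Random(42)  # fixed seed so the run is repeatable
trajectory = wright_fisher(p0=0.5, n_individuals=50, generations=100, rng=rng)
p_final = trajectory[-1]
# Expected genotype frequencies under random mating in the final generation.
print(p_final, (p_final**2, 2 * p_final * (1 - p_final), (1 - p_final) ** 2))
```

Repeating the simulation many times from the same starting frequency gives a spread of one-generation changes whose variance approaches p(1-p)/(2N), and trajectories that reach 0 or 1 stay there, which corresponds to loss or fixation.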
