Genotype frequency
Genotype frequency refers to the proportion of individuals in a population that possess a specific genotype at a given genetic locus, typically expressed as a fraction or percentage of the total population size.[1] This measure is distinct from allele frequency, which quantifies the prevalence of individual gene variants (alleles) rather than combinations.[1] In population genetics, genotype frequencies provide a snapshot of genetic variation and serve as a baseline for detecting evolutionary changes.[2]
A central concept linking genotype and allele frequencies is the Hardy-Weinberg equilibrium (HWE), formulated in 1908, which predicts that in an idealized population (large, randomly mating, with no mutation, migration, selection, or genetic drift) genotype frequencies stabilize after one generation of random mating.[3] Under HWE, for a biallelic locus with alleles A (frequency p) and a (frequency q = 1 - p), the expected genotype frequencies are p² for homozygous AA, 2pq for heterozygous Aa, and q² for homozygous aa.[3] These frequencies can be empirically calculated by genotyping individuals and dividing the count of each genotype by the total sample size, often verified against HWE expectations using statistical tests like chi-squared to assess deviations.[3]
Deviations from expected genotype frequencies signal the action of evolutionary forces, such as natural selection favoring certain genotypes, genetic drift in small populations, or non-random mating like inbreeding, which alters heterozygosity levels.[1] For instance, excess homozygotes may indicate population structure or assortative mating, while heterozygote excess could reflect balancing selection.[2]
Genotype frequencies are integral to applications in evolutionary biology, where they help model adaptation; in conservation genetics, to evaluate inbreeding depression and diversity; and in medical genetics, to estimate disease risk in populations assuming HWE.[4]
Basic Concepts
Definition
In population genetics, a genotype refers to the specific combination of alleles present at a given genetic locus in an individual's genome, representing the inherited genetic makeup for that locus. Genotypes are classified as homozygous when the two alleles are identical (e.g., AA or aa) or heterozygous when they differ (e.g., Aa).[5][6]
Genotype frequency denotes the proportion of individuals within a population that carry a particular genotype at a specified locus, calculated as the number of individuals with that genotype divided by the total number of individuals in the population; it is conventionally expressed as a decimal value between 0 and 1 or as a percentage.[7] This frequency quantifies the distribution of genetic variants across the population and serves as a key metric for describing the genetic composition at the level of allele combinations rather than single variants.[8]
The study of genotype frequencies is essential for tracking levels of genetic variation, elucidating patterns of inheritance, and delineating population structure, as these frequencies reveal how genetic diversity is maintained or altered over time.[8][9] For instance, shifts in genotype frequencies can indicate underlying demographic or selective processes influencing the gene pool.[7]
The concept of genotype frequency originated in early 20th-century population genetics, pioneered by G. H. Hardy and Wilhelm Weinberg, whose independent 1908 publications laid the groundwork for understanding stable genetic distributions in non-evolving populations.[10] Genotype frequencies are intrinsically linked to allele frequencies, the proportions of individual alleles in the population, providing a foundation for modeling genetic dynamics.[8]
Measurement and Calculation
Genotype frequencies are typically measured through direct counting in a sampled population, where the frequency of a specific genotype, such as the homozygous AA, is calculated as the ratio of the number of individuals exhibiting that genotype to the total number of individuals sampled.[11] This method, expressed mathematically as p_{AA} = \frac{n_{AA}}{N}, with n_{AA} denoting the count of AA individuals and N the total sample size, provides an unbiased estimate when the sample is representative of the population.[12] Direct counting assumes complete ascertainment of genotypes. It is exact when every individual in a small population is genotyped; in larger populations the same ratio is computed from a sample and serves as an estimate of the population frequency.[13]
Accurate measurement relies on genotyping techniques that identify specific alleles, including polymerase chain reaction (PCR) for amplifying target DNA regions followed by allele-specific detection, and next-generation sequencing (NGS) for high-throughput determination of nucleotide sequences at polymorphic sites.[14] PCR-based methods, such as TaqMan assays, offer cost-effective resolution for single nucleotide polymorphisms (SNPs), while sequencing provides comprehensive coverage for complex loci.[15]
In large populations, direct counting often involves random sampling to estimate frequencies, ensuring each individual has an equal probability of selection to minimize bias.[13] Sample-based estimates incorporate confidence intervals derived from the binomial distribution: for a genotype frequency \hat{p} = n/N based on n individuals with the genotype among N sampled, the standard error is \sqrt{\frac{\hat{p}(1 - \hat{p})}{N}}, and intervals are constructed using methods like the Wilson score for improved accuracy at low frequencies. These intervals quantify uncertainty, with wider bounds for small samples and less reliable simple approximations for rare genotypes, and so indicate how far population-level inferences can be trusted.
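The counting and interval steps described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `genotype_frequencies` and `wilson_interval`, the default z = 1.96 (roughly a 95% interval), and the sample of 49 AA, 42 Aa, and 9 aa individuals are all assumptions chosen for the example.

```python
from collections import Counter
from math import sqrt


def genotype_frequencies(genotypes):
    """Direct counting: return {genotype: count / N} for a list of labels."""
    counts = Counter(genotypes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one genotyped individual is required")
    return {genotype: n / total for genotype, n in counts.items()}


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion.

    successes: number of individuals carrying the genotype of interest
    total:     number of individuals sampled (N)
    z:         normal quantile; 1.96 is assumed here for ~95% coverage
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = successes / total
    denom = 1 + z**2 / total
    centre = (p_hat + z**2 / (2 * total)) / denom
    half_width = (
        z * sqrt(p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)) / denom
    )
    return centre - half_width, centre + half_width


# Hypothetical sample of 100 genotyped individuals.
sample = ["AA"] * 49 + ["Aa"] * 42 + ["aa"] * 9
freqs = genotype_frequencies(sample)

# Interval for the rarest genotype (aa), where the Wilson method is most useful.
low, high = wilson_interval(9, 100)
print(freqs)
print(f"aa frequency interval: ({low:.3f}, {high:.3f})")
```

The Wilson interval is not symmetric around the observed frequency; for rare genotypes it is shifted toward 0.5, which is why it tends to behave better than the simple standard-error interval at low frequencies.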
Genotype frequencies under random mating can be related to allele frequencies, but direct measurement prioritizes empirical counts over indirect derivations.[16]
Relation to Allele Frequencies
Allele Frequency Basics
Allele frequency refers to the proportion of a particular allele among all alleles at a given genetic locus within a population's gene pool.[17] It quantifies how common a specific variant of a gene is, serving as a fundamental measure in population genetics to assess genetic diversity.[18] Unlike genotype frequency, which describes the proportion of individuals carrying a specific combination of alleles (such as homozygous or heterozygous), allele frequency focuses on the individual variants themselves and their relative abundance across the population.[19] For any locus, the frequencies of all alleles sum to 1, reflecting the complete set of genetic variants at that position.[17]
Allele frequencies are typically estimated from observed genotype counts using the allele counting rule, where each individual's contribution to the allele pool is tallied. For a biallelic locus with alleles A and a in a diploid population of size N, the frequency p of allele A is calculated as:
p = \frac{2n_{AA} + n_{Aa}}{2N}
where n_{AA} is the number of homozygous AA individuals and n_{Aa} is the number of heterozygous Aa individuals; the total number of alleles is 2N since each individual contributes two copies.[19] For loci with multiple alleles (more than two variants), the method extends by counting copies of each allele from all genotypes and dividing by the total number of alleles (2N), ensuring the sum of all allele frequencies equals 1.[19]
As a prerequisite for analyzing genotype frequencies, allele frequencies provide the baseline proportions that, under assumptions of random mating, predict the expected distribution of genotypes in a population.[19]
Hardy-Weinberg Equilibrium
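Because the equilibrium expectations are built from allele frequencies, it may help to make the allele counting rule concrete first. The sketch below assumes genotype counts are supplied as a dictionary keyed by allele pairs; the function name `allele_frequencies`, the allele labels, and both sets of counts are hypothetical. It handles the biallelic case and the multi-allele extension with the same loop.

```python
from collections import Counter


def allele_frequencies(genotype_counts):
    """Allele counting for a diploid locus.

    genotype_counts maps a pair of alleles, e.g. ("A", "a"), to the number
    of individuals with that genotype. Each individual contributes two
    allele copies, so a homozygote adds two copies of the same allele.
    """
    allele_counts = Counter()
    for (first, second), n in genotype_counts.items():
        allele_counts[first] += n
        allele_counts[second] += n
    total_alleles = sum(allele_counts.values())  # equals 2N
    if total_alleles == 0:
        raise ValueError("no individuals were supplied")
    return {allele: c / total_alleles for allele, c in allele_counts.items()}


# Biallelic case: p = (2 * n_AA + n_Aa) / (2N)
biallelic = allele_frequencies({("A", "A"): 49, ("A", "a"): 42, ("a", "a"): 9})

# Hypothetical three-allele locus; the same counting applies.
triallelic = allele_frequencies(
    {("A1", "A1"): 10, ("A1", "A2"): 20, ("A2", "A3"): 30, ("A3", "A3"): 40}
)
print(biallelic)
print(triallelic)
```

Keying the input by allele pairs rather than by genotype strings avoids having to parse labels such as "Aa" and lets the function work for any number of alleles.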
The Hardy-Weinberg principle states that, in a large population undergoing random mating with no evolutionary forces acting upon it, the frequencies of alleles and genotypes will remain constant from generation to generation.[20] This equilibrium serves as a foundational null model in population genetics, allowing researchers to detect deviations that indicate evolutionary processes at work.[10]
For a locus with two alleles, denoted as A (with frequency p) and a (with frequency q = 1 - p), the principle predicts that the genotype frequencies at equilibrium will be p^2 for the homozygous dominant genotype AA, 2pq for the heterozygous genotype Aa, and q^2 for the homozygous recessive genotype aa.[20] These frequencies derive from the random union of gametes, where the probability of forming each genotype follows the product of allele frequencies, summing to 1 as (p + q)^2 = p^2 + 2pq + q^2 = 1.[21]
The principle rests on five key assumptions: an infinitely large population size to eliminate random genetic drift; random mating with no assortative preferences; absence of natural selection, ensuring all genotypes have equal fitness; no mutation to alter allele frequencies; and no gene flow through migration.[22] Each assumption is critical because violations introduce changes in allele or genotype frequencies, disrupting the predicted stability.[10]
To demonstrate stability, consider the gamete frequencies in the parental generation: p for A and q for a. The offspring genotypes form via random combination, yielding frequencies p^2, 2pq, and q^2 in the first generation, as shown by the binomial expansion of (p + q)^2.[20] Extracting alleles from this equilibrium generation produces the same gamete frequencies p and q, confirming that the genotype proportions persist unchanged in subsequent generations under the assumptions.[21] Alternatively, a Punnett square for the two alleles illustrates this for a single locus, with rows and columns representing gamete contributions and cells showing the resulting genotype probabilities matching the equilibrium values.[22]
Examples and Applications
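As a first illustration, the stability argument for Hardy-Weinberg proportions can be checked numerically. The sketch below assumes an idealized population that satisfies all five assumptions, so that one round of random mating is simply the random union of gametes. The starting genotype frequencies (0.6, 0.2, 0.2) are hypothetical and deliberately not in Hardy-Weinberg proportions; the function names are likewise illustrative.

```python
def allele_freq_A(f_AA, f_Aa):
    """Frequency of allele A: all of the AA gametes plus half of the Aa gametes."""
    return f_AA + 0.5 * f_Aa


def random_mating(f_AA, f_Aa, f_aa):
    """Genotype frequencies after one generation of random union of gametes.

    Assumes no selection, mutation, migration, or drift, so gamete
    frequencies equal the parental allele frequencies p and q.
    """
    if abs(f_AA + f_Aa + f_aa - 1.0) > 1e-9:
        raise ValueError("genotype frequencies must sum to 1")
    p = allele_freq_A(f_AA, f_Aa)
    q = 1.0 - p
    return p * p, 2 * p * q, q * q


# Hypothetical starting point that is NOT in Hardy-Weinberg proportions.
start = (0.6, 0.2, 0.2)
gen1 = random_mating(*start)   # reaches p^2, 2pq, q^2 in one generation
gen2 = random_mating(*gen1)    # stays there in the next generation

print("start:", start, "p =", allele_freq_A(start[0], start[1]))
print("gen 1:", gen1, "p =", allele_freq_A(gen1[0], gen1[1]))
print("gen 2:", gen2, "p =", allele_freq_A(gen2[0], gen2[1]))
```

Under these assumptions the allele frequency stays the same across all three generations, while the genotype frequencies move to the equilibrium proportions after the first round of mating and do not change afterwards, up to floating-point rounding.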
Numerical Example
Consider a hypothetical population of 100 diploid individuals at a single locus with two alleles, A and a. Suppose the observed genotype counts are 49 AA, 42 Aa, and 9 aa.
To calculate allele frequencies, count the total number of A and a alleles across all individuals. There are 200 alleles in total (2 per individual). The number of A alleles is (2 × 49) + 42 = 140, so the frequency of A (p) is 140/200 = 0.7. The frequency of a (q) is 1 - p = 0.3. The observed genotype frequencies are then 49/100 = 0.49 for AA, 42/100 = 0.42 for Aa, and 9/100 = 0.09 for aa.
Under Hardy-Weinberg equilibrium (HWE), the expected genotype frequencies are p² for AA, 2pq for Aa, and q² for aa, assuming random mating and other standard conditions. Substituting the values gives expected frequencies of (0.7)² = 0.49 for AA, 2 × 0.7 × 0.3 = 0.42 for Aa, and (0.3)² = 0.09 for aa. The expected counts, based on the population size of 100, are thus 49 AA, 42 Aa, and 9 aa.
To assess whether the observed counts fit the HWE expectations, apply a chi-square goodness-of-fit test: χ² = Σ [(O - E)² / E], where O is the observed count and E is the expected count for each genotype. Degrees of freedom = number of genotypes - 1 - number of estimated parameters = 3 - 1 - 1 = 1, since one allele frequency was estimated from the data. Here, since O = E for all genotypes, χ² = 0. A χ² value of 0 indicates a perfect fit (P-value = 1), so the observed counts are fully consistent with HWE.[23]

| Genotype | Observed Count (O) | Expected Count (E) | (O - E)² / E |
|---|---|---|---|
| AA | 49 | 49 | 0 |
| Aa | 42 | 42 | 0 |
| aa | 9 | 9 | 0 |
| Total | 100 | 100 | χ² = 0 |
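
The worked example above can be reproduced with a short script. This is a sketch under stated assumptions rather than a standard routine: the function name `hwe_chi_square` is hypothetical, the test has 1 degree of freedom as derived in the text, and the P-value uses the identity that the upper tail of a chi-square distribution with 1 degree of freedom equals erfc(sqrt(χ²/2)), which avoids any dependency outside the Python standard library. The second set of counts (60, 20, 20) is invented to show a clear departure from HWE.

```python
from math import erfc, sqrt


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square goodness-of-fit test against Hardy-Weinberg expectations.

    Returns (chi2, p_value) for a biallelic locus, using 1 degree of freedom
    (3 genotypes - 1 - 1 estimated allele frequency).
    """
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no individuals were supplied")
    p = (2 * n_AA + n_Aa) / (2 * n)   # allele counting rule
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    if min(expected) == 0:
        raise ValueError("locus is monomorphic; the test is undefined")
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Counts from the table: observed equals expected, so chi2 is ~0 and P is ~1.
chi2_fit, p_fit = hwe_chi_square(49, 42, 9)

# Hypothetical counts with the same allele frequency but too few heterozygotes.
chi2_dev, p_dev = hwe_chi_square(60, 20, 20)

print(f"table example:  chi2 = {chi2_fit:.4f}, P = {p_fit:.4f}")
print(f"deficit of Aa:  chi2 = {chi2_dev:.4f}, P = {p_dev:.2e}")
```

Both data sets have p = 0.7, so they share the same expected counts of 49, 42, and 9; only the second departs from them, which illustrates that the test responds to how alleles are distributed among genotypes rather than to the allele frequencies themselves.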