
Allele frequency

Allele frequency is the relative proportion of a specific allele at a given genetic locus within a population, that is, the incidence of that gene variant among all copies of the gene in the population. It is a fundamental measure in population genetics for quantifying genetic variation and monitoring evolutionary change at the molecular level.

In diploid organisms, such as humans, allele frequencies are calculated by first determining the total number of alleles at a locus, which equals twice the number of individuals in the population since each individual carries two copies of the gene. The frequency of a specific allele, conventionally denoted p for the dominant allele or q for the recessive allele, is then the number of copies of that allele divided by the total number of alleles. For example, if N_AA is the number of homozygous dominant individuals, N_Aa is the number of heterozygotes, and N is the total number of individuals, then p = (2N_AA + N_Aa) / (2N).

The concept of allele frequency gained prominence through the Hardy-Weinberg principle, independently formulated by mathematician G. H. Hardy and physician Wilhelm Weinberg in 1908, which predicts that allele frequencies in a large, randomly mating population remain constant across generations in the absence of evolutionary influences such as selection, mutation, migration, or genetic drift. Under Hardy-Weinberg equilibrium, genotype frequencies can be derived from allele frequencies using the equation p² + 2pq + q² = 1, where p² and q² represent the frequencies of the homozygous genotypes and 2pq the frequency of the heterozygous genotype.

Allele frequencies are essential for studying population structure, because deviations from the frequencies expected under Hardy-Weinberg conditions signal the action of forces driving evolution. They are widely applied in medical genetics to evaluate disease allele prevalence and in forensics to interpret DNA evidence from population databases.

Core Concepts

What is an Allele?

An allele is one of two or more variant forms of a gene that occupy the same position, or locus, on a chromosome. These variants differ in their DNA sequence, often due to changes at one or more positions, and they represent alternative versions of the genetic information encoded by that gene. In a population, multiple alleles may exist for a given gene, contributing to genetic variation among individuals.

Alleles typically arise through mutations, which are alterations in the DNA sequence that create new variants from an existing form. Mutations may result from errors during DNA replication, exposure to mutagens, or other genetic processes. The resulting alleles can be classified as dominant or recessive based on their expression patterns: a dominant allele expresses its trait even when paired with a recessive one, while a recessive allele requires two copies to manifest. Over time, such variants accumulate and persist in populations, influencing traits like eye color or disease susceptibility.

The combination of alleles at a specific locus constitutes an individual's genotype, which, together with environmental factors, determines the observable characteristics, or phenotype. In diploid organisms, such as humans, each individual inherits two alleles per locus (one from each parent), resulting in either a homozygous (identical alleles) or a heterozygous (different alleles) configuration. In contrast, haploid organisms carry only one allele per locus, which simplifies inheritance patterns but still allows for allelic variation across populations.

The term "allele" is a shortening of "allelomorph," a word coined by British geneticist William Bateson and his colleague Edith Saunders in 1902 to describe these alternative forms observed in their studies of Mendelian heredity. This terminology became foundational in early 20th-century genetics, facilitating precise discussions of hereditary variation.

Defining Allele Frequency

Allele frequency is a fundamental metric in population genetics that quantifies the prevalence of a specific allele at a given genetic locus within a population. It is defined as the proportion of that allele among all alleles present at the locus across the entire population. Mathematically, for a specific allele A, its frequency p is calculated as p = \frac{\text{number of } A \text{ alleles}}{\text{total number of alleles at the locus}}, where the total number of alleles equals twice the number of individuals in a diploid population or matches the number of individuals in a haploid one. This measure captures the relative abundance of genetic variants, providing insight into the genetic composition of the population.

The concept of allele frequency is intrinsically linked to the gene pool, which represents the collective set of all alleles carried by the individuals in a population at a particular time. The gene pool encompasses the total genetic variation available for inheritance and serves as the reservoir from which future generations draw their genetic material, thereby underpinning the population's evolutionary potential. Allele frequencies within this pool reflect the underlying genetic diversity and can indicate the health, adaptability, and historical dynamics of the population.

Allele frequency operates at the level of individual variants and is distinct from genotype frequency, which describes the proportion of individuals possessing specific combinations of alleles (such as homozygous or heterozygous states). While genotype frequencies pertain to the allele combinations carried by individuals, allele frequencies focus on the raw counts of alleles in the gene pool, independent of how they are paired. In biallelic loci (those with only two possible alleles, say A and a), standard notation assigns p to the frequency of A and q to the frequency of a, with the relationship q = 1 - p ensuring the frequencies sum to 1. Many loci have more than two alleles (multi-allelic), in which case the frequencies of all alleles at the locus sum to 1.

In population genetics, allele frequency is essential for assessing genetic diversity, tracking evolutionary processes, and predicting inheritance patterns across generations. It enables researchers to model how alleles may spread or diminish, informing studies on topics such as disease susceptibility. By measuring the distribution of alleles, this metric provides a baseline for understanding genetic variation within populations.
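The distinction between genotype frequency and allele frequency can be made concrete with a small sketch. The sample of five diploid individuals below is hypothetical, and the function names are illustrative rather than taken from any library; the code simply counts allele pairs for genotype frequencies and flattens them into a gene pool for allele frequencies.

```python
from collections import Counter


def genotype_frequencies(individuals):
    """Proportion of individuals carrying each unordered allele pair."""
    n = len(individuals)
    # Sort each pair so that ("a", "A") and ("A", "a") count as the same genotype.
    counts = Counter("".join(sorted(genotype)) for genotype in individuals)
    return {genotype: count / n for genotype, count in counts.items()}


def allele_frequencies(individuals):
    """Proportion of each allele among all allele copies (the gene pool)."""
    gene_pool = [allele for genotype in individuals for allele in genotype]
    counts = Counter(gene_pool)
    return {allele: count / len(gene_pool) for allele, count in counts.items()}


# Hypothetical sample of five diploid individuals at one biallelic locus.
sample = [("A", "A"), ("A", "A"), ("A", "a"), ("a", "A"), ("a", "a")]

genotypes = genotype_frequencies(sample)  # AA: 0.4, Aa: 0.4, aa: 0.2
alleles = allele_frequencies(sample)      # A: 0.6, a: 0.4
p, q = alleles["A"], alleles["a"]
```

Here the same sample gives three genotype frequencies but only two allele frequencies, and q equals 1 - p as expected for a biallelic locus.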

Computing Allele Frequencies

In Haploid Populations

In haploid populations, such as those found in bacteria, many fungi, and gametes, each individual carries only one copy of each gene at a given locus, which simplifies the estimation of allele frequencies compared to diploid or polyploid organisms. This single-allele-per-individual structure allows direct counting without the need to account for multiple copies within genotypes. The frequency of a specific allele A, denoted as p_A, is calculated as the proportion of individuals in the population that possess A:
p_A = \frac{n_A}{N}
where n_A is the number of individuals with allele A, and N is the total number of individuals sampled. This formula arises from the fact that the total number of alleles at the locus equals the total number of individuals (N), making the allele frequency a direct proportion of the count of that allele to the total. To derive it step by step, first identify the alleles present by genotyping or phenotyping the sampled individuals; then tally the counts for each allele type, ensuring the counts sum to N; finally, divide the count for the target allele by N to obtain its frequency, with the frequencies of all alleles summing to 1.

This calculation assumes a random sample from the population, where each individual is independently genotyped without bias toward specific genotypes, and focuses on a single locus without considering interactions among multiple loci. In practice, for clonal or asexually reproducing haploids, clone correction may be applied to avoid overrepresenting repeated genotypes in the sample. For instance, in a bacterial population exposed to antibiotics, the frequency of a resistance allele can be estimated by counting the proportion of resistant colonies (carrying the allele) relative to the total colonies cultured from the sample, providing insight into the prevalence of resistance under selective pressure.
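The counting procedure for haploids, with and without clone correction, might be sketched as follows. The isolates, locus layout, and function names are hypothetical; clone correction is implemented here in its simplest form, keeping one isolate per distinct multilocus genotype, which is an assumption about how a given study would define clones.

```python
from collections import Counter


def haploid_allele_frequencies(isolates, locus):
    """Return p = n_allele / N for every allele observed at one locus."""
    if not isolates:
        raise ValueError("no isolates sampled")
    n = len(isolates)
    counts = Counter(isolate[locus] for isolate in isolates)
    return {allele: count / n for allele, count in counts.items()}


def clone_correct(isolates):
    """Keep one isolate per distinct multilocus genotype (first occurrence)."""
    return list(dict.fromkeys(isolates))


# Hypothetical haploid isolates typed at two loci; locus 0 carries a
# resistance allele "R" or a sensitive allele "S".
sample = [("R", "x"), ("R", "x"), ("R", "x"), ("S", "x"), ("S", "y"), ("R", "y")]

raw = haploid_allele_frequencies(sample, locus=0)
corrected = haploid_allele_frequencies(clone_correct(sample), locus=0)
```

In this example the raw frequency of "R" is 4/6, but after collapsing the three identical ("R", "x") isolates it falls to 2/4, showing how repeated clones can inflate an estimate.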

In Diploid Populations

In diploid organisms, which include most eukaryotes such as humans, each individual carries two homologous copies of each chromosome and therefore two alleles at each genetic locus. To compute allele frequencies at a biallelic locus (with alleles A and a) from observed genotype counts in a diploid population of size N, the frequency p of allele A is given by
p = \frac{2 \times (\text{number of AA homozygotes}) + (\text{number of Aa heterozygotes})}{2N}
The frequency q of allele a is then q = 1 - p. This formula arises from counting the total number of alleles in the population, which equals 2N since each of the N diploid individuals contributes two alleles. The AA homozygotes contribute two A alleles each, the Aa heterozygotes contribute one A allele (and one a allele) each, and the aa homozygotes contribute zero A alleles (two a alleles each). Thus, the total number of A alleles is 2 × (number of AA) + (number of Aa), and dividing by the total 2N yields p.

For a locus with multiple alleles A_1, A_2, ..., A_k (k > 2), the frequency p_i of allele A_i generalizes to
p_i = \frac{2 \times (\text{number of } A_i A_i \text{ homozygotes}) + \sum_{j \neq i} (\text{number of } A_i A_j \text{ heterozygotes})}{2N}
where the summation accounts for the single copy of A_i contributed by each heterozygote involving a different allele A_j, and the frequencies sum to 1 across all i. This follows the same allele-counting principle as the biallelic case, extended over all genotype classes.

Genotype counts are typically obtained through direct genotyping or through phenotyping (observing traits linked to particular genotypes). While the empirical calculation itself requires no further assumptions, inferences about underlying allele frequencies from phenotypic data often assume random mating in the population to relate observed phenotypes to expected genotype proportions.
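The allele-counting rule for any number of alleles can be sketched in a few lines. The genotype counts and allele labels below are hypothetical, and the function name is illustrative; each genotype is keyed by its unordered pair of alleles, so a homozygote adds two copies of one allele and a heterozygote adds one copy of each.

```python
from collections import defaultdict


def allele_frequencies_from_genotype_counts(genotype_counts):
    """Map {(allele, allele): number of individuals} to {allele: frequency}."""
    n_individuals = sum(genotype_counts.values())
    if n_individuals == 0:
        raise ValueError("no individuals sampled")
    allele_copies = defaultdict(int)
    for (first, second), count in genotype_counts.items():
        # A homozygote hits the same key twice, contributing 2 * count copies.
        allele_copies[first] += count
        allele_copies[second] += count
    total_alleles = 2 * n_individuals  # each diploid individual carries two alleles
    return {allele: copies / total_alleles for allele, copies in allele_copies.items()}


# Hypothetical counts for a locus with three alleles in N = 100 individuals.
counts = {
    ("A1", "A1"): 10, ("A1", "A2"): 20, ("A1", "A3"): 10,
    ("A2", "A2"): 30, ("A2", "A3"): 20, ("A3", "A3"): 10,
}
frequencies = allele_frequencies_from_genotype_counts(counts)
```

For these counts, A1 has 2 × 10 + 20 + 10 = 50 copies out of 200, giving p_1 = 0.25; A2 and A3 come out to 0.5 and 0.25, and the three frequencies sum to 1.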

Illustrative Example

To illustrate the calculation of allele frequencies in a diploid population, consider a hypothetical sample of 100 individuals from a plant population exhibiting variation at a single locus controlling flower color, with two alleles: A (dominant, red flowers) and a (recessive, white flowers). The observed genotype counts are 25 AA (homozygous dominant), 50 Aa (heterozygous), and 25 aa (homozygous recessive). The following table summarizes the genotype counts and their contributions to the total allele pool:
| Genotype | Count | Contribution to A alleles | Contribution to a alleles |
| --- | --- | --- | --- |
| AA | 25 | 50 (2 × 25) | 0 |
| Aa | 50 | 50 (1 × 50) | 50 (1 × 50) |
| aa | 25 | 0 | 50 (2 × 25) |
| Total | 100 | 100 | 100 |
In a diploid population, the total number of alleles at this locus is twice the number of individuals, or 200. The frequency of the A allele (p) is the number of A alleles divided by the total number of alleles: p = 100 / 200 = 0.5. Similarly, the frequency of the a allele (q) is 100 / 200 = 0.5. These equal frequencies indicate a balanced level of genetic variation at the locus, where neither allele predominates in the sample. If this represents a sample rather than the entire population, confidence intervals can be estimated using the binomial distribution to account for sampling variability; for p = 0.5 with n = 200 alleles, the 95% confidence interval is approximately 0.43 to 0.57. This type of calculation mirrors approaches used in studies of model organisms like Drosophila melanogaster, where allele frequencies are estimated from genotyped cohorts to assess genetic variation across loci.
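The arithmetic of this example, including the confidence interval, can be reproduced with a short sketch. It assumes the normal (Wald) approximation to the binomial distribution and treats the 200 allele copies as independent draws, which is a simplification; the function name and the default z = 1.96 for a 95% interval are illustrative choices.

```python
import math


def allele_frequency_with_ci(n_AA, n_Aa, n_aa, z=1.96):
    """Return p for allele A and an approximate confidence interval.

    Assumes the normal approximation to the binomial and independent
    sampling of the 2N allele copies.
    """
    n_alleles = 2 * (n_AA + n_Aa + n_aa)
    p = (2 * n_AA + n_Aa) / n_alleles
    standard_error = math.sqrt(p * (1 - p) / n_alleles)
    lower = max(0.0, p - z * standard_error)
    upper = min(1.0, p + z * standard_error)
    return p, (lower, upper)


# Genotype counts from the flower-color example: 25 AA, 50 Aa, 25 aa.
p, (lower, upper) = allele_frequency_with_ci(25, 50, 25)
print(f"p = {p:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
# prints: p = 0.50, 95% CI = (0.43, 0.57)
```

For frequencies close to 0 or 1, or for small samples, an exact binomial or Wilson interval would usually be a better choice than this approximation.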

Changes in Allele Frequencies

Hardy-Weinberg Equilibrium

The Hardy-Weinberg equilibrium, also known as the Hardy-Weinberg principle, describes a theoretical state in population genetics where the frequencies of alleles and genotypes in a population remain constant from generation to generation in the absence of evolutionary influences such as mutation, selection, migration, genetic drift, or non-random mating. In this equilibrium, the genotype frequencies can be predicted as the products of the underlying allele frequencies, assuming random mating and random union of gametes.

This principle was independently formulated in 1908 by British mathematician G. H. Hardy in a letter to Science titled "Mendelian Proportions in a Mixed Population," where he addressed misconceptions about the inevitable increase of dominant alleles under random mating, and by German physician Wilhelm Weinberg in a paper presented to the Natural Science Society of Württemberg, "Über den Nachweis der Vererbung beim Menschen." Hardy's work, prompted by a query from geneticist R. C. Punnett, emphasized the stability of allele ratios in large, randomly mating populations, while Weinberg derived the general equilibrium for a single locus with multiple alleles. Their contributions laid the foundation for modern population genetics.

For a biallelic locus with alleles A (frequency p) and a (frequency q, where p + q = 1), the equilibrium genotype frequencies are given by:
p^2 \quad (\text{AA}), \quad 2pq \quad (\text{Aa}), \quad q^2 \quad (\text{aa})
These satisfy the equation p^2 + 2pq + q^2 = 1, and the allele frequencies remain stable such that p_{t+1} = p_t and q_{t+1} = q_t across generations. The principle holds under five key conditions: infinitely large population size (to avoid genetic drift), random mating with no assortative preferences, absence of mutation, no natural selection affecting survival or reproduction, and no migration or gene flow from other populations.

The derivation arises from the random union of gametes in a diploid population: the probability of forming an AA homozygote is p × p = p^2, an aa homozygote is q × q = q^2, and a heterozygote Aa is 2 × (p × q) = 2pq, reflecting the equal likelihood of A from one parent and a from the other, or vice versa. Counting alleles among these offspring recovers the parental frequencies, since the frequency of A is p^2 + pq = p(p + q) = p, which confirms stability in the absence of external forces. To test for Hardy-Weinberg equilibrium in empirical data, researchers calculate expected genotype counts from the observed allele frequencies and compare them to the observed counts using a chi-square goodness-of-fit test, where significant deviations indicate violation of the assumptions.
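The goodness-of-fit comparison can be sketched as follows for a biallelic locus. The function name and the two sets of genotype counts are hypothetical. With three genotype classes and one allele frequency estimated from the data, the test has one degree of freedom, for which the 5% critical value of the chi-square distribution is 3.841.

```python
def hardy_weinberg_chi_square(n_AA, n_Aa, n_aa):
    """Return (p, expected genotype counts, chi-square statistic)."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # observed frequency of allele A
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_AA, n_Aa, n_aa)
    # Skip classes with zero expectation to avoid division by zero.
    chi_square = sum(
        (obs - exp) ** 2 / exp for obs, exp in zip(observed, expected) if exp > 0
    )
    return p, expected, chi_square


CRITICAL_VALUE_DF1 = 3.841  # chi-square, 1 degree of freedom, alpha = 0.05

# Counts that match Hardy-Weinberg proportions exactly.
_, _, chi_fit = hardy_weinberg_chi_square(25, 50, 25)

# Hypothetical counts with a deficit of heterozygotes at the same p = 0.5.
_, expected_deficit, chi_deficit = hardy_weinberg_chi_square(40, 20, 40)

rejects_equilibrium = chi_deficit > CRITICAL_VALUE_DF1
```

The first sample yields a statistic of 0, while the second, with expected counts of 25, 50, and 25 but observed counts of 40, 20, and 40, yields 9 + 18 + 9 = 36 and would be judged inconsistent with equilibrium. Both samples have the same allele frequencies, which illustrates that the test concerns how alleles are paired into genotypes.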

Evolutionary Forces

In population genetics, evolutionary forces are the mechanisms that disrupt Hardy-Weinberg equilibrium by causing deviations in allele frequencies across generations. These forces include mutation, natural selection, genetic drift, gene flow, and non-random mating, each contributing to the dynamics of genetic variation in populations. Under ideal equilibrium conditions, allele frequencies remain stable, but the presence of these forces introduces change, driving evolutionary processes as outlined in the modern synthesis.

Mutation introduces new genetic variation by creating novel alleles or altering existing ones, typically at a low rate that leads to gradual shifts in allele frequencies. The mutation rate per locus per generation (μ) is often on the order of 10^{-6}, meaning that while mutations are rare events, they serve as the ultimate source of genetic diversity over long timescales. Deleterious mutations tend to be removed by selection, while beneficial ones can increase slowly in frequency if not lost to other forces. This process is fundamental, because without mutation, populations would lack the raw material for adaptation.

Natural selection acts by favoring individuals with higher fitness, leading to predictable changes in allele frequencies based on the relative advantages of genotypes. For a beneficial allele with selection coefficient s (the proportional fitness advantage per copy of the allele), the approximate change in its frequency p per generation in a large population is Δp ≈ s p q, where q = 1 - p is the frequency of the alternative allele; this approximation holds for weak selection and additive effects. Beneficial alleles thus increase in frequency, while deleterious ones decline. This directed process underlies adaptive evolution, in contrast with random forces.

Genetic drift causes random fluctuations in allele frequencies due to sampling error in finite populations, with effects amplified in small groups. In the Wright-Fisher model, the variance in the per-generation change of allele frequency Δp is p q / (2N), where N is the effective population size; this stochastic variation can lead to fixation or loss of alleles regardless of their effect on fitness. For example, in populations with N < 100, neutral alleles may drift to fixation rapidly, reducing genetic diversity. Drift is non-adaptive and dominates in isolated or bottlenecked populations.

Gene flow, or migration, homogenizes allele frequencies by exchanging genetic material between populations, counteracting divergence. If a proportion m of migrants arrives from a source population with allele frequency p_m, the change in the recipient population's frequency p is approximately Δp = m (p_m - p); high migration rates (m > 0.1) can prevent local adaptation by swamping differences. This force is evident in hybrid zones where interbreeding blends gene pools, promoting connectivity across landscapes.

Non-random mating, such as inbreeding or assortative mating, primarily alters genotype frequencies by deviating from random mating but does not directly change allele frequencies in the absence of other forces. For instance, positive assortative mating increases homozygosity for certain alleles without shifting their overall proportions, though it can indirectly amplify the effects of selection or drift on genotypes. This mechanism influences short-term genetic structure but requires interaction with other forces for long-term evolutionary impact.

These forces often act together, with their relative strengths determining net evolutionary trajectories; for example, in small populations, genetic drift can override weak selection (s < 1/(2N)), causing even advantageous alleles to be lost by chance. Mutation and gene flow introduce variation that selection or drift then shapes, while non-random mating modulates how these forces interact at the genotypic level. Such interactions highlight the complexity of evolution in real populations.

In evolutionary biology, changes in allele frequencies form the basis of the modern synthesis, which integrated Darwin's natural selection with Mendelian genetics through models by Fisher, Haldane, and Wright. This framework explains how polygenic traits evolve via shifts in underlying allele frequencies, providing a genetic foundation for phenotypic adaptation across generations.
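The three quantitative expressions given above (Δp ≈ s p q for selection, Δp = m (p_m - p) for migration, and binomial sampling of 2N allele copies for Wright-Fisher drift) can be explored with a minimal sketch. The function names, parameter values, and seed are illustrative assumptions, and the selection formula is only the weak-selection approximation quoted in the text, not an exact recursion.

```python
import random


def delta_p_selection(p, s):
    """Approximate per-generation change under weak additive selection."""
    return s * p * (1 - p)


def delta_p_migration(p, m, p_m):
    """Change when a fraction m of the population is replaced by migrants."""
    return m * (p_m - p)


def wright_fisher_generation(p, n_diploid, rng):
    """Sample 2N allele copies binomially from the current frequency p."""
    n_copies = 2 * n_diploid
    copies_of_A = sum(1 for _ in range(n_copies) if rng.random() < p)
    return copies_of_A / n_copies


def simulate_drift(p0, n_diploid, generations, seed=0):
    """Return the allele-frequency trajectory of one neutral locus."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    p = p0
    trajectory = [p]
    for _ in range(generations):
        p = wright_fisher_generation(p, n_diploid, rng)
        trajectory.append(p)
        if p in (0.0, 1.0):  # allele lost or fixed; drift alone cannot undo this
            break
    return trajectory


selection_step = delta_p_selection(p=0.5, s=0.01)           # 0.01 * 0.5 * 0.5
migration_step = delta_p_migration(p=0.2, m=0.1, p_m=0.8)   # 0.1 * (0.8 - 0.2)
trajectory = simulate_drift(p0=0.5, n_diploid=50, generations=200)
```

The deterministic steps are small and directional, whereas repeated runs of `simulate_drift` with different seeds wander in either direction, and the spread of a single generation's change should be close to the p q / (2N) variance stated above.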
