Allele frequency
Allele frequency is the proportion of a specific allele among all copies of a gene at a given genetic locus within a population.[1] It serves as a fundamental measure in population genetics for quantifying genetic variation and monitoring evolutionary change at the molecular level.[2]

In diploid organisms, such as humans, the total number of alleles at a locus equals twice the number of individuals in the population, since each individual carries two copies of the gene.[3] The frequency of a specific allele, conventionally denoted p for one allele (often the dominant one) and q for the other (often the recessive one), is the number of copies of that allele divided by the total number of alleles.[3] For example, if N_AA is the number of homozygous dominant individuals, N_Aa is the number of heterozygotes, and N is the total population size, then p = (2N_AA + N_Aa) / (2N).[3]

The concept of allele frequency gained prominence through the Hardy-Weinberg principle, formulated independently by the mathematician G. H. Hardy and the physician Wilhelm Weinberg in 1908, which predicts that allele frequencies in a large, randomly mating population remain constant across generations in the absence of evolutionary influences such as selection, mutation, migration, or genetic drift.[4] Under Hardy-Weinberg equilibrium, genotype frequencies can be derived from allele frequencies using the equation p² + 2pq + q² = 1, where p² and q² are the frequencies of the two homozygous genotypes and 2pq is the frequency of the heterozygous genotype.[5]

Allele frequencies are essential for studying population structure, genetic diversity, and evolutionary processes, because deviations from the frequencies expected under Hardy-Weinberg conditions signal the action of forces driving evolution.[5] They are widely applied in conservation biology to assess inbreeding or gene flow, in medical genetics to evaluate disease allele prevalence, and in forensics to interpret DNA evidence from population databases.[2]

Core Concepts
What is an Allele?
An allele is one of two or more variant forms of a gene that occupy the same position, or locus, on a chromosome.[6] These variants differ in their DNA sequence, often due to changes at one or more nucleotide positions, and they represent alternative versions of the genetic information encoded by that gene.[7] In a population, multiple alleles may exist for a given gene, contributing to genetic diversity among individuals.[8] Alleles typically arise through mutations, which are alterations in the DNA sequence that can create new variants from an existing form.[9] These mutations may result from errors during DNA replication, exposure to mutagens, or other genetic processes, leading to alleles that can be classified as dominant or recessive based on their expression patterns.[6] For instance, a dominant allele expresses its trait even when paired with a recessive one, while a recessive allele requires two copies to manifest.[10] Over time, such variants accumulate and persist in populations, influencing traits like eye color or disease susceptibility. 
The combination of alleles at a specific locus constitutes an individual's genotype, which, together with environmental factors, determines the observable characteristics, or phenotype.[7] In diploid organisms, such as humans, each individual inherits two alleles per locus, one from each parent, resulting in either a homozygous (identical alleles) or heterozygous (different alleles) configuration.[8] In contrast, haploid organisms carry only one allele per locus, which simplifies inheritance patterns but still allows allelic variation across populations.[6]

The term "allele," a shortening of "allelomorph," was coined by British geneticist William Bateson and his colleague Edith Saunders in 1902 to describe these alternative gene forms observed in Mendelian inheritance studies.[11] This terminology became foundational in early 20th-century genetics, facilitating precise discussions of hereditary variation.[9]

Defining Allele Frequency
Allele frequency is a fundamental metric in population genetics that quantifies the prevalence of a specific allele at a given genetic locus within a population. It is defined as the proportion of that allele among all alleles present at the locus across the entire population.[12] Mathematically, for a specific allele A, its frequency p is calculated as p = \frac{\text{number of } A \text{ alleles}}{\text{total number of alleles at the locus}}, where the total number of alleles equals twice the number of individuals in a diploid population or matches the number of individuals in a haploid one.[13] This measure captures the relative abundance of genetic variants, providing insight into the genetic composition of the population.[2] The concept of allele frequency is intrinsically linked to the gene pool, which represents the collective set of all alleles carried by the individuals in a population at a particular time. The gene pool encompasses the total genetic diversity available for inheritance and serves as the reservoir from which future generations draw their genetic material, thereby underpinning the population's evolutionary potential.[12] Allele frequencies within this gene pool reflect the underlying genetic variation and can indicate the health, adaptability, and historical dynamics of the population.[14] Allele frequency operates at the level of individual gene variants, distinct from genotype frequency, which describes the proportion of individuals possessing specific combinations of alleles (such as homozygous or heterozygous states). While genotype frequencies pertain to the observable traits or combinations in individuals, allele frequencies focus on the raw counts of alleles in the gene pool, independent of how they are paired.[15] In biallelic loci—those with only two possible alleles, say A and a—standard notation assigns p to the frequency of A and q to the frequency of a, with the relationship q = 1 - p ensuring the frequencies sum to unity. 
Many loci have more than two alleles (multi-allelic), in which case the frequencies of all alleles at the locus sum to 1.[16][17]

In population genetics, allele frequency is essential for assessing genetic variation, tracking evolutionary processes, and predicting inheritance patterns across generations. It enables researchers to model how alleles may spread or diminish, informing studies on disease susceptibility, conservation, and adaptation.[18] Because it captures the distribution of alleles, the metric provides a baseline for understanding genetic diversity and its implications for population resilience.

Computing Allele Frequencies
In Haploid Populations
In haploid populations, such as those of bacteria and many fungi, and likewise in pools of gametes, each individual carries only one copy of each gene at a given locus, which simplifies the estimation of allele frequencies compared with diploid or polyploid organisms.[19][20] Because each individual carries a single allele, alleles can be counted directly without accounting for multiple copies within genotypes.[19] The frequency of a specific allele A, denoted p_A, is the proportion of individuals in the population that possess allele A:
p_A = \frac{n_A}{N}
where n_A is the number of individuals with allele A, and N is the total number of individuals sampled.[19][21] This formula arises from the fact that the total number of alleles at the locus equals the total number of individuals (N), making the allele frequency a direct proportion of the count of that allele to the population size.[21] To derive it step-by-step, first identify the alleles present by genotyping or phenotyping the sampled individuals; then sum the counts for each allele type, ensuring the sum across all alleles equals N; finally, divide the count for the target allele by N to obtain its frequency, with the frequencies of all alleles summing to 1.[19] This calculation assumes a random sample from the population, where each individual is independently genotyped without bias toward specific genotypes, and focuses on a single locus without considering interactions from multiple loci.[19][21] In practice, for clonal or asexually reproducing haploids like bacteria, clone correction may be applied to avoid overrepresenting repeated genotypes in the sample.[19] For instance, in a bacterial population exposed to antibiotics, the frequency of a resistance allele can be estimated by counting the proportion of resistant colonies (carrying the allele) relative to the total colonies cultured from the sample, providing insight into the prevalence of resistance under selective pressure.[19][22]
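The counting procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `haploid_allele_frequencies` and the sample of 30 resistant and 70 sensitive colonies are hypothetical, and the sketch assumes a random, already clone-corrected sample with one allele recorded per individual.

```python
from collections import Counter


def haploid_allele_frequencies(alleles):
    """Return {allele: frequency} for a sample with one allele per individual.

    Implements p_A = n_A / N, where n_A is the count of allele A and
    N is the number of sampled individuals.
    """
    counts = Counter(alleles)      # n_A for every allele observed
    n = sum(counts.values())       # N, the total number of individuals
    if n == 0:
        raise ValueError("sample is empty")
    return {allele: count / n for allele, count in counts.items()}


# Hypothetical sample: 30 resistant ("R") and 70 sensitive ("S") colonies.
sample = ["R"] * 30 + ["S"] * 70
freqs = haploid_allele_frequencies(sample)
print(freqs)
```

Because every individual contributes exactly one allele, the returned frequencies sum to 1 across all alleles found in the sample.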
In Diploid Populations
In diploid organisms, which include humans, most other animals, and many plants, each individual carries two homologous copies of each chromosome and therefore two alleles at each genetic locus. To compute allele frequencies at a biallelic locus (with alleles A and a) from observed genotype counts in a diploid population of size N, the frequency p of allele A is given by
p = \frac{2 \times (\text{number of } AA \text{ homozygotes}) + (\text{number of } Aa \text{ heterozygotes})}{2N}
The frequency q of allele a is then q = 1 - p.[23][24]

This formula arises from counting the total number of alleles in the population, which equals 2N since each of the N diploid individuals contributes two alleles. The AA homozygotes contribute two A alleles each, the Aa heterozygotes contribute one A allele (and one a allele) each, and the aa homozygotes contribute zero A alleles (two a alleles each). Thus, the total number of A alleles is 2 × (number of AA) + (number of Aa), and dividing by the total 2N yields p.[23]

For a locus with multiple alleles A_1, A_2, ..., A_k (k > 2), the frequency p_i of allele A_i generalizes to
p_i = \frac{2 \times (\text{number of } A_i A_i \text{ homozygotes}) + \sum_{j \neq i} (\text{number of } A_i A_j \text{ heterozygotes})}{2N}
where the summation accounts for the single copy of A_i contributed by each heterozygote carrying a different allele A_j, and the frequencies sum to 1 across all i. This follows the same allele-counting principle as the biallelic case, extended over all genotype classes.[24]

Genotype counts are typically obtained through direct genotyping (e.g., via DNA sequencing) or phenotyping (e.g., observing traits linked to genotypes). The allele-counting calculation itself requires no further assumptions. When only phenotypes are observed, however, inferring allele frequencies usually requires assuming random mating, so that observed phenotype proportions can be related to expected genotype proportions.[23]

Illustrative Example
To illustrate the calculation of allele frequencies in a diploid population, consider a hypothetical sample of 100 individuals from a plant population exhibiting variation at a single locus controlling flower color, with two alleles: A (dominant, red flowers) and a (recessive, white flowers). The observed genotypes are 25 AA (homozygous dominant), 50 Aa (heterozygous), and 25 aa (homozygous recessive).[25] The following table summarizes the genotype counts and their contributions to the total allele pool:

| Genotype | Count | Contribution to A alleles | Contribution to a alleles |
|---|---|---|---|
| AA | 25 | 50 (2 × 25) | 0 |
| Aa | 50 | 50 (1 × 50) | 50 (1 × 50) |
| aa | 25 | 0 | 50 (2 × 25) |
| Total | 100 | 100 | 100 |
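
The allele counting summarized in the table can be reproduced with a short Python sketch. The function name `diploid_allele_frequencies` and the dictionary layout (a pair of alleles mapped to a genotype count) are assumptions made for illustration. The three-allele counts at the end are invented solely to show that the same counting rule extends to multi-allelic loci.

```python
from collections import Counter


def diploid_allele_frequencies(genotype_counts):
    """Return {allele: frequency} from diploid genotype counts.

    genotype_counts maps a pair of alleles, e.g. ("A", "a"), to the
    number of individuals with that genotype. Each individual adds one
    copy of each allele in its pair, so a homozygote adds two copies
    of the same allele and the total number of alleles is 2N.
    """
    allele_counts = Counter()
    for (first, second), n in genotype_counts.items():
        allele_counts[first] += n
        allele_counts[second] += n
    total = sum(allele_counts.values())  # equals 2N
    if total == 0:
        raise ValueError("no individuals in sample")
    return {allele: count / total for allele, count in allele_counts.items()}


# Genotype counts from the flower-color table above.
flower_counts = {("A", "A"): 25, ("A", "a"): 50, ("a", "a"): 25}
flower_freqs = diploid_allele_frequencies(flower_counts)
print(flower_freqs)  # {'A': 0.5, 'a': 0.5}

# Hypothetical three-allele locus (made-up counts, N = 100).
three_allele_counts = {
    ("A1", "A1"): 10, ("A1", "A2"): 20, ("A1", "A3"): 10,
    ("A2", "A2"): 30, ("A2", "A3"): 20, ("A3", "A3"): 10,
}
three_allele_freqs = diploid_allele_frequencies(three_allele_counts)
print(three_allele_freqs)  # {'A1': 0.25, 'A2': 0.5, 'A3': 0.25}
```

For the table's data, the sketch counts 100 copies of A and 100 copies of a out of 200 alleles, giving p = q = 0.5.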