An allele is one of two or more versions of a DNA sequence, such as a single base or a segment of bases, at a specific genomic location on a chromosome.[1] Individuals inherit two alleles for each gene, one from each parent; these variants occupy the same locus on homologous chromosomes and can influence the production of the same gene product in different ways.[2] Alleles arise from mutations and contribute to genetic variation, enabling differences in inherited traits such as eye color, blood type, or susceptibility to certain diseases.[3]

In diploid organisms like humans, the combination of two alleles at a locus determines the genotype, which may result in a homozygous state (both alleles identical) or a heterozygous state (alleles differ).[1] Many alleles exhibit dominance relationships, where a dominant allele expresses its phenotype even if paired with a recessive one, while a recessive allele requires two copies to manifest.[4] For instance, in recessive conditions such as cystic fibrosis, both inherited alleles must be the variant form for the trait to appear.[5] This inheritance pattern follows Mendelian principles, where the two alleles at a locus segregate into separate gametes during gamete formation, contributing to the diversity observed in populations.[3]

The concept of alleles emerged in early 20th-century genetics, with the term "allelomorph" (shortened to allele) coined by William Bateson and Edith Saunders in 1902 to describe alternative forms of genes in Mendelian inheritance.[6] Today, alleles play a central role in fields like population genetics and evolutionary biology, where their frequencies in populations are analyzed to understand mechanisms such as natural selection, genetic drift, and adaptation.[7] Advances in genomics have revealed millions of common and rare alleles across species, underpinning personalized medicine by linking specific variants to disease risk or drug response.[1]
Definition and History
Definition
An allele is one of two or more versions of a DNA sequence at a specific genomic location, representing alternative forms of a gene that arise through mutation and occupy the same position on a chromosome.[1] These variants determine differences in traits or disease susceptibility among individuals.[8]

In genetics, a gene refers to a sequence of nucleotides in DNA or RNA that serves as the basic unit of heredity, encoding functional products such as proteins. The locus is the precise physical site on a chromosome where a particular gene or DNA sequence is located. Homologous chromosomes, which are paired chromosomes (one inherited from each parent), carry alleles at corresponding loci, allowing for potential variation between the maternal and paternal copies.[9]

While an allele is a specific variant of a gene, it differs from a haplotype, which is a set of DNA variants (including multiple alleles) inherited together on the same chromosome due to their close proximity.[10] Thus, an allele describes variation at a single locus, whereas a haplotype describes a linked combination of variants across a genomic region.[11]

A classic example is the ABO blood group system, where the alleles A, B, and O are variants at the same locus on chromosome 9, leading to different blood types (A, B, AB, or O) based on their combinations.[12] These alleles can result in dominant or recessive phenotypic expressions, such as the O allele being recessive to A and B.[12]
Etymology and Historical Development
The concept of the allele emerged from early efforts to formalize the discrete units of heredity described in Gregor Mendel's experiments with pea plants, published in 1866, where he implicitly identified alternative forms of heritable factors that segregated independently during reproduction, though he did not name them as such.[13] Mendel's work, published in Versuche über Pflanzen-Hybriden, laid the groundwork for understanding these units as paired alternatives influencing traits like flower color and seed shape, but the ideas remained obscure until their rediscovery in 1900 by Hugo de Vries, Carl Correns, and Erich von Tschermak, who recognized the patterns in their own hybridization studies.[14] This revival integrated Mendel's principles into experimental biology, prompting the need for precise terminology to describe variant forms of these heritable units.

The term "allelomorph," meaning alternative forms of a Mendelian factor, was coined in 1902 by British geneticists William Bateson and Edith Rebecca Saunders in their report on poultry plumage inheritance, derived from the Greek roots allel- (meaning "mutual" or "reciprocal") and morphē (form), to denote contrasting versions at the same locus that could pair in heterozygotes.[15] Bateson, a key advocate of Mendelism, used the term to explain dominance and recessiveness in traits like pea comb shape in chickens, solidifying the allelic framework within post-rediscovery genetics. In 1909, Danish botanist Wilhelm Ludwig Johannsen further advanced the conceptual landscape by introducing the terms "genotype" (the hereditary constitution), "phenotype" (the observable expression), and "gene" (the elemental unit within the genotype), distinguishing the underlying allelic makeup from environmental influences in his bean selection experiments.[16]

The abbreviation "allele" for "allelomorph" was proposed in 1909 by American geneticist George Harrison Shull in his studies on maize inheritance, simplifying the nomenclature while retaining the original meaning of variant gene forms.[17] Around the same time, Thomas Hunt Morgan's 1910 discovery of a white-eyed mutation in fruit flies (Drosophila melanogaster) provided empirical evidence for allelic variants linked to sex chromosomes, demonstrating how alleles could explain non-Mendelian patterns like sex-linked inheritance and advancing the chromosomal theory of heredity.[18] This work in the 1910s, building on Bateson's allelic ideas, established alleles as discrete, position-specific factors on chromosomes, shifting the understanding from abstract units to mappable entities. The modern molecular view of alleles as sequence variants in DNA solidified after James Watson and Francis Crick's 1953 model of the double helix, revealing how mutations at nucleotide loci create allelic diversity.
Types of Alleles in Classical Genetics
Dominant and Recessive Alleles
In classical Mendelian genetics, alleles at a single locus can exhibit dominance relationships that determine the observable phenotype. A dominant allele is one that determines the phenotype of a trait in a heterozygous individual, effectively masking the expression of its paired allele.[19] In contrast, a recessive allele only produces its associated phenotype when present in the homozygous state, as its effect is overridden by a dominant allele in heterozygotes.[5] These patterns form the basis of simple inheritance for many traits.

The phenotypic outcomes depend on the genotype formed by the combination of alleles. In a homozygous dominant genotype (AA), both alleles are dominant, resulting in the full expression of the dominant trait. A heterozygous genotype (Aa) also displays the dominant phenotype due to complete masking of the recessive allele. Only in the homozygous recessive genotype (aa) is the recessive trait fully expressed.[20]

Dominance can occur through complete dominance, where the dominant allele fully suppresses the recessive one, leading to no intermediate phenotype in heterozygotes. However, incomplete dominance arises when neither allele fully masks the other, producing a blended or intermediate phenotype in heterozygotes. For instance, in four o'clock plants (Mirabilis jalapa), the allele for red flower color (R) and white flower color (r) interact via incomplete dominance: homozygous RR plants have red flowers, rr plants have white flowers, and heterozygous Rr plants display pink flowers.[21]

Representative examples illustrate these concepts in both plants and humans. In Gregor Mendel's pea plant experiments, the allele for tall height (T) is dominant over the allele for short height (t); thus, TT and Tt plants grow tall, while only tt plants are short.[13] In humans, Huntington's disease exemplifies a dominant allele's effect, where a single copy of the HTT gene carrying an expanded CAG repeat causes the neurodegenerative disorder in heterozygous individuals, with homozygous cases being rare and typically lethal earlier in life.[22]

The inheritance of dominant and recessive alleles in monohybrid crosses can be visualized using Punnett squares, which predict genotypic and phenotypic ratios among offspring. For a cross between two heterozygous parents (Tt × Tt), the Punnett square shows:
|   | T  | t  |
|---|----|----|
| T | TT | Tt |
| t | Tt | tt |
This yields a genotypic ratio of 1:2:1 (1 TT : 2 Tt : 1 tt) and, under complete dominance, a phenotypic ratio of 3:1 (3 tall : 1 short).[23] Such ratios were key to Mendel's formulation of inheritance laws, demonstrating how the dominant phenotype predominates among the offspring of heterozygous matings.[24]
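The Tt × Tt cross can also be enumerated programmatically. The following is a minimal sketch, not a standard library routine: the helper names `cross`, `phenotype_counts`, and `height` are illustrative assumptions, and the second example applies the same enumeration to the incomplete-dominance case of four o'clock plants (Rr × Rr).

```python
from collections import Counter
from itertools import product


def cross(parent1, parent2):
    """Count offspring genotypes from every pairing of the parents' gametes."""
    # Each parent is a two-letter string such as "Tt"; each letter is one gamete type.
    # Sorting puts the uppercase (dominant) letter first, so "tT" is counted as "Tt".
    return Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))


def phenotype_counts(genotype_counts, to_phenotype):
    """Collapse genotype counts into phenotype counts using a mapping function."""
    counts = Counter()
    for genotype, n in genotype_counts.items():
        counts[to_phenotype(genotype)] += n
    return counts


def height(genotype):
    # Complete dominance: a single T allele is enough for the tall phenotype.
    return "tall" if "T" in genotype else "short"


# Incomplete dominance in four o'clock plants: the heterozygote is intermediate.
FLOWER_COLOR = {"RR": "red", "Rr": "pink", "rr": "white"}

pea_genotypes = cross("Tt", "Tt")
pea_phenotypes = phenotype_counts(pea_genotypes, height)
flower_phenotypes = phenotype_counts(cross("Rr", "Rr"), FLOWER_COLOR.get)

print(dict(pea_genotypes))      # genotype counts for Tt x Tt
print(dict(pea_phenotypes))     # phenotype counts under complete dominance
print(dict(flower_phenotypes))  # phenotype counts under incomplete dominance
```

Under complete dominance the three genotype classes collapse into two phenotype classes, while under incomplete dominance the phenotypic ratio is the same as the genotypic ratio.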
Multiple Alleles
Multiple alleles occur when three or more different forms of a gene exist at a single locus within a population, allowing for greater genetic variation than the two-allele systems typical in basic Mendelian inheritance. These alleles can interact in ways that extend beyond simple dominance, including codominance, where both alleles in a heterozygote are fully expressed, or a dominance series, where one allele masks another in a hierarchical manner. This phenomenon arises from successive mutations at the locus, introducing new variants that can persist if they provide selective advantages, and is particularly common in immune-related genes such as those encoding blood group antigens, where diversity enhances pathogen resistance through balancing selection.[25][26]

A prominent example of codominance among multiple alleles is the human ABO blood group system, controlled by alleles at the ABO locus on chromosome 9: I^A (encoding A antigen), I^B (encoding B antigen), and i (encoding neither, resulting in O blood type). The I^A and I^B alleles are codominant, so individuals with genotype I^A I^B express both A and B antigens on red blood cells, producing the AB phenotype, while i is recessive to both. In contrast, simple dominance is evident in genotypes like I^A i (A phenotype) or I^B i (B phenotype), where the dominant allele determines the single antigen expressed.[12]

Another illustration is rabbit coat color at the C locus (encoding the tyrosinase enzyme), which features four alleles: C (full color), c^{ch} (chinchilla, gray fur), c^h (Himalayan, white with dark extremities), and c (albino, all white). These follow a dominance hierarchy of C > c^{ch} > c^h > c, such that a rabbit with genotype C c^{ch} displays full color, overriding the chinchilla effect, while c^{ch} c^h results in chinchilla fur. This hierarchy produces a spectrum of phenotypes depending on allele combinations, highlighting how multiple alleles can create nuanced trait variations.[27][28]

Inheritance patterns with multiple alleles require more bookkeeping than a standard two-allele monohybrid cross. Each individual still carries only two alleles, so any single cross remains a 2×2 Punnett square, but the population harbors more. For a triallelic system, a 3×3 grid of gamete types gives nine allele pairings (six distinct genotypes), and the phenotypic ratios depend on which alleles the parents carry rather than defaulting to the 3:1 ratio of two-allele dominance. For instance, crossing an I^A I^B parent (AB blood) with an I^A i parent (A blood) produces offspring in a 2:1:1 phenotypic ratio of A : AB : B, with no O types. The two-allele case of dominant and recessive interactions is thus a special case of this more general system.
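The ABO cross and the rabbit dominance hierarchy can be checked by enumeration. This is an illustrative sketch only: the allele labels `"A"`, `"B"`, `"O"` (standing for I^A, I^B, i) and `"cch"`, `"ch"` (standing for c^{ch}, c^h), and all function names, are assumptions made for the example. The rabbit mapping follows the strict hierarchy stated in the text.

```python
from collections import Counter
from itertools import product


def abo_phenotype(genotype):
    """Map a pair of ABO alleles ('A', 'B', 'O') to a blood type."""
    antigens = set(genotype) - {"O"}  # the i allele ('O') adds no antigen
    if antigens == {"A", "B"}:
        return "AB"                   # codominance: both antigens expressed
    return antigens.pop() if antigens else "O"


# Dominance series at the rabbit C locus, most dominant allele first.
C_LOCUS_RANK = ["C", "cch", "ch", "c"]
COAT = {"C": "full color", "cch": "chinchilla", "ch": "Himalayan", "c": "albino"}


def rabbit_phenotype(genotype):
    """The most dominant allele present determines the coat phenotype."""
    top_allele = min(genotype, key=C_LOCUS_RANK.index)
    return COAT[top_allele]


def cross_phenotypes(parent1, parent2, to_phenotype):
    """Count phenotypes over the four gamete pairings of two diploid parents."""
    return Counter(to_phenotype(pair) for pair in product(parent1, parent2))


abo_offspring = cross_phenotypes(("A", "B"), ("A", "O"), abo_phenotype)
print(dict(abo_offspring))                # counts of A, AB, and B offspring
print(rabbit_phenotype(("C", "cch")))     # full color
print(rabbit_phenotype(("cch", "ch")))    # chinchilla
```

Representing the hierarchy as an ordered list keeps the rule in one place: adding a fifth allele would only require inserting it at the right rank.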
Population and Frequency Aspects
Allele and Genotype Frequencies
In population genetics, the allele frequency refers to the proportion of a specific allele among all alleles at a given gene locus within a population. For a locus with two alleles, A and B, in a diploid organism, the frequency of allele A is denoted as $p$, and the frequency of allele B is $q = 1 - p$. This measure quantifies the relative abundance of each allele in the gene pool, which consists of all alleles for that locus across the population.[29][30]

Genotype frequency, in contrast, is the proportion of individuals in the population that possess a particular combination of alleles at the locus, such as homozygous AA, heterozygous AB, or homozygous BB. These frequencies reflect the observable genetic makeup of individuals and sum to 1 for all genotypes at the locus. Under assumptions of random mating and no evolutionary forces, the expected genotype frequencies are $p^2$ for AA, $2pq$ for AB, and $q^2$ for BB, providing a baseline for estimating allele frequencies from genotypic data. Allele frequencies are inherently "invisible" as they pertain to the gametic level and require inference from population samples, while genotype frequencies are directly observable through phenotyping or genotyping individuals.[30][7][31]

Allele and genotype frequencies are typically calculated by direct counting from sampled individuals. In a population of $N$ diploids, there are $2N$ alleles at each locus; the count of a specific allele divided by $2N$ yields its frequency. When direct genotyping is unavailable, frequencies can be estimated from phenotypes, particularly for loci with dominant and recessive alleles: if the population is assumed to be in Hardy-Weinberg proportions, the frequency of the recessive homozygote gives $q^2$, allowing $q$ to be derived as its square root. Direct counting, by contrast, makes no equilibrium assumption and enables researchers to assess genetic variation without assuming long-term stability.[32][33]

A notable example is the sickle cell allele (HbS) in human populations from malaria-endemic regions of sub-Saharan Africa, where its frequency reaches 0.10 to 0.20 due to heterozygote advantage: individuals with one HbS allele (AS genotype) have increased resistance to severe malaria without severe sickle cell disease. This maintains the allele at higher frequencies than expected otherwise, illustrating how environmental pressures influence genetic composition. Factors such as mutation, which introduces new allelic variants at low rates, migration or gene flow, which redistributes alleles between populations, and natural selection, which favors alleles enhancing survival or reproduction, can alter these frequencies over generations.[34][35][7]
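The two estimation routes described above (direct allele counting and the square-root estimate from recessive homozygotes) can be sketched as follows. The genotype counts are hypothetical numbers chosen for illustration, and the function names are assumptions. The square-root route is only valid if the population is in Hardy-Weinberg proportions.

```python
from math import sqrt


def allele_frequencies(n_AA, n_AB, n_BB):
    """Direct counting: each diploid individual contributes two alleles."""
    total_alleles = 2 * (n_AA + n_AB + n_BB)   # 2N alleles in N individuals
    p = (2 * n_AA + n_AB) / total_alleles      # frequency of allele A
    return p, 1 - p                            # (p, q)


def recessive_allele_frequency(n_recessive, n_total):
    """Estimate q from the recessive phenotype alone.

    Assumes Hardy-Weinberg proportions, so that the frequency of
    recessive homozygotes equals q squared.
    """
    return sqrt(n_recessive / n_total)


# Hypothetical sample of 1,000 genotyped individuals.
p, q = allele_frequencies(n_AA=360, n_AB=480, n_BB=160)
q_from_phenotype = recessive_allele_frequency(n_recessive=160, n_total=1000)

print(f"p = {p:.3f}, q = {q:.3f}")
print(f"q estimated from recessive homozygotes = {q_from_phenotype:.3f}")
```

In this hypothetical sample the two estimates of q agree because the counts were chosen to match Hardy-Weinberg proportions; in real data a gap between them is itself a hint that the assumption does not hold.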
Hardy-Weinberg Equilibrium
The Hardy-Weinberg principle, independently formulated by G. H. Hardy and Wilhelm Weinberg in 1908, describes the expected genotype frequencies in a population that is not evolving, serving as a foundational null model in population genetics.[36][37] Under this model, in a large population undergoing random mating with no evolutionary forces acting, allele frequencies remain constant across generations, and genotype frequencies stabilize at predictable proportions after one generation of random mating.[38] For a gene with two alleles, denoted as A (frequency $p$) and a (frequency $q$, where $p + q = 1$), the expected genotype frequencies are homozygous AA at $p^2$, heterozygous Aa at $2pq$, and homozygous aa at $q^2$, summing to unity:

$$p^2 + 2pq + q^2 = 1$$

This binomial expansion arises from the random union of gametes, where the probability of combining two A alleles is $p \times p$, two a alleles is $q \times q$, and one of each is $2 \times (p \times q)$.[38]

The principle relies on five key assumptions: infinitely large population size (to eliminate genetic drift), random mating (no assortative mating or inbreeding), no migration (no gene flow), no mutation (no new alleles introduced), and no natural selection (equal fitness among genotypes).[39] Deviations from these conditions indicate the presence of evolutionary forces. To test whether observed genotype frequencies conform to Hardy-Weinberg expectations, a chi-square goodness-of-fit test is commonly applied, comparing observed counts to expected values under the model; the test statistic is calculated as $\chi^2 = \sum \frac{(O - E)^2}{E}$, where $O$ is observed and $E$ is expected, with degrees of freedom equal to the number of genotypes minus the number of alleles (e.g., 1 for two alleles).[40]

The model extends to multiple alleles at a locus, say $k$ alleles with frequencies $p_1, p_2, \dots, p_k$ (summing to 1), where expected homozygote frequencies are $p_i^2$ and heterozygote frequencies are $2 p_i p_j$ for $i \neq j$, following the multinomial expansion $(p_1 + p_2 + \dots + p_k)^2 = 1$.[41] In practice, the principle is used to detect evolutionary processes by assessing deviations via statistical tests, such as in the MN blood group system (alleles M and N, codominant phenotypes M, MN, N). For instance, in a sample of 5,000 individuals with 1,460 M, 2,550 MN, and 990 N, allele frequencies are $p_M \approx 0.547$ and $p_N \approx 0.453$, yielding expected counts of approximately 1,496 M, 2,478 MN, and 1,026 N under equilibrium, which can be tested for fit.[33]
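The MN example can be carried through to the chi-square statistic. The sketch below uses the counts given in the text; the function names are illustrative assumptions, and the critical value 3.841 is the standard 5% threshold for a chi-square distribution with one degree of freedom. A second helper shows the multi-allele expectation for any number of alleles.

```python
from itertools import combinations_with_replacement


def hwe_chi_square(n_MM, n_MN, n_NN):
    """Return (p_M, expected counts, chi-square) for a two-allele codominant locus."""
    n = n_MM + n_MN + n_NN
    p = (2 * n_MM + n_MN) / (2 * n)            # allele frequency by direct counting
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_MM, n_MN, n_NN)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p, expected, chi2


def expected_genotype_frequencies(allele_freqs):
    """Hardy-Weinberg expectations for k alleles: p_i^2 and 2 * p_i * p_j."""
    result = {}
    for (a, pa), (b, pb) in combinations_with_replacement(allele_freqs.items(), 2):
        result[(a, b)] = pa * pb if a == b else 2 * pa * pb
    return result


CRITICAL_5PCT_DF1 = 3.841  # chi-square critical value, alpha = 0.05, df = 1

p_M, expected, chi2 = hwe_chi_square(1460, 2550, 990)
print(f"p_M = {p_M:.3f}")
print("expected counts:", [round(e) for e in expected])
print(f"chi-square = {chi2:.2f}, exceeds 5% threshold: {chi2 > CRITICAL_5PCT_DF1}")

# Hypothetical frequencies for a three-allele locus, for illustration only.
three_allele = expected_genotype_frequencies({"A": 0.3, "B": 0.1, "O": 0.6})
print(three_allele)
```

For these counts the statistic comes out at roughly 4.2, slightly above the 5% threshold, so the sample shows a modest excess of heterozygotes relative to the equilibrium expectation.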
Molecular and Epigenetic Variants
Molecular Basis of Alleles
At the molecular level, alleles are defined as alternative forms of a gene arising from variations in the DNA sequence at a specific locus on a chromosome.[42] These variants include single nucleotide polymorphisms (SNPs), where a single base pair differs between individuals, as well as insertions and deletions (indels) that add or remove nucleotides.[43] SNPs represent the most common type of genetic variation in humans, occurring approximately once every 300-1,000 base pairs.[43]

Alleles originate primarily from mutations that alter the DNA sequence during replication, repair, or exposure to mutagens. Point mutations, or substitutions, replace one nucleotide with another and can lead to SNPs; these may be transitions (purine to purine or pyrimidine to pyrimidine) or transversions (purine to pyrimidine or vice versa).[44] Insertions and deletions in coding regions cause frameshift mutations when their length is not a multiple of three, shifting the reading frame and often resulting in truncated or nonfunctional proteins; indels whose length is a multiple of three add or remove whole codons instead.[45] Copy number variations (CNVs), involving duplications or deletions of larger DNA segments (typically 1 kb to several Mb), also generate allelic diversity by altering gene dosage.[46]

Detection of allelic variants relies on molecular techniques that interrogate DNA sequences. Sanger sequencing provides high-accuracy reads for targeted loci, serving as a gold standard for confirming variants like SNPs and small indels.[47] Next-generation sequencing (NGS) enables high-throughput analysis of entire genomes or exomes, identifying a broad spectrum of variants including SNPs, indels, and CNVs with greater sensitivity than traditional methods.[47] PCR-based genotyping, such as allele-specific PCR, amplifies and distinguishes specific alleles by designing primers that bind uniquely to variant sequences, offering a cost-effective approach for known mutations.[48]

The functional consequences of alleles depend on their location and nature within the gene. Synonymous mutations alter the DNA sequence but do not change the encoded amino acid due to the degeneracy of the genetic code, typically having minimal impact on protein structure.[49] In contrast, nonsynonymous mutations substitute one amino acid for another, potentially disrupting protein folding, stability, or function.[50] Regulatory alleles, located in noncoding regions such as promoters or enhancers, influence gene expression levels without altering the protein sequence, often by affecting transcription factor binding or chromatin accessibility.[51]

A prominent example is the alleles of the CFTR gene associated with cystic fibrosis. The ΔF508 allele, a common mutant variant, results from a three-base-pair deletion that removes phenylalanine at position 508, leading to a misfolded protein that fails to traffic properly to the cell membrane; this deletion accounts for approximately 70% of cystic fibrosis alleles worldwide.[52]
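The variant classes described in this section can be expressed as simple rules. The sketch below is illustrative, with assumed function names: it builds the standard genetic code from its conventional TCAG-ordered string, then classifies a substitution as transition or transversion, a codon change as synonymous or nonsynonymous, and an indel as in-frame or frameshift. The GAG to GTG example is the well-known sickle cell substitution in the beta-globin gene (glutamate to valine).

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
PURINES = {"A", "G"}


def substitution_class(ref_base, alt_base):
    """Transition keeps the base class (purine or pyrimidine); transversion swaps it."""
    same_class = (ref_base in PURINES) == (alt_base in PURINES)
    return "transition" if same_class else "transversion"


def coding_effect(ref_codon, alt_codon):
    """Compare the amino acids encoded before and after a codon change."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"  # a nonsynonymous change that creates a stop codon
    return "nonsynonymous"


def indel_effect(length_change):
    """Indels preserve the reading frame only if their length is a multiple of 3."""
    return "in-frame" if length_change % 3 == 0 else "frameshift"


print(substitution_class("A", "T"), coding_effect("GAG", "GTG"))  # sickle cell change
print(substitution_class("A", "G"), coding_effect("GAA", "GAG"))  # silent change
print(indel_effect(-3))  # three-base deletion, as in CFTR ΔF508
print(indel_effect(-1))  # single-base deletion
```

Note that this treats one codon in isolation; real annotation tools also account for transcript structure, strand, and splice sites, which are outside the scope of this sketch.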
Epialleles
Epialleles represent heritable variations in gene expression that arise from differences in epigenetic modifications, such as DNA methylation or histone modifications, without alterations to the underlying DNA sequence. These variants function as allele-like entities because they can be stably transmitted through cell divisions (mitotically heritable) or across generations (meiotically heritable), influencing phenotypic outcomes in a manner analogous to genetic alleles. Unlike transient epigenetic changes, epialleles maintain their states over multiple generations, distinguishing them from short-term environmental responses.[53]

Key mechanisms underlying epialleles include paramutation, where one allele induces a heritable epigenetic change in a homologous allele, often through RNA-mediated silencing or chromatin remodeling. Another prominent mechanism is genomic imprinting, which involves parent-of-origin-specific epigenetic marks that silence one parental allele, leading to monoallelic expression. These processes enable epialleles to modulate gene activity in a tissue- or developmental stage-specific way, contributing to complex traits and diseases.[54]

A classic example is the agouti viable yellow (A^vy) epiallele in mice, where variable DNA methylation at a retrotransposon promoter regulates the ectopic expression of the agouti gene, resulting in coat color variation from yellow (hypomethylated, obese) to pseudoagouti (hypermethylated, lean). In humans, epialleles play a critical role in imprinting disorders such as Prader-Willi syndrome (paternal deletion or maternal imprinting defect at 15q11-13) and Angelman syndrome (maternal deletion or paternal imprinting defect at the same locus), where loss of specific epiallelic marks leads to neurodevelopmental phenotypes.[55]

Epialleles exhibit varying degrees of stability, often persisting through meiosis but remaining sensitive to environmental factors like diet or stress, which can alter methylation patterns and propagate changes transgenerationally. This environmental responsiveness differentiates stable epialleles from purely genetic variants, while their heritability sets them apart from reversible epigenetic modifications.[56]

Recent research highlights epialleles' roles in development and cancer, with studies in the 2020s demonstrating their involvement in stem cell differentiation and tumor progression through dynamic histone modifications. Advances in transgenerational epigenetics have revealed enhanced stability of certain epialleles in response to stressors, as shown in rodent models where paternal exposure to endocrine disruptors induced heritable metabolic changes via sperm methylation. These findings underscore epialleles' potential as mediators of environmental adaptation and disease susceptibility.[56][57]
Alleles in Disease and Special Contexts
Dominance in Genetic Disorders
In genetic disorders, dominance plays a pivotal role in determining the phenotypic expression of mutant alleles. Autosomal dominant disorders typically arise from a single mutant allele in a heterozygous individual, where the mutant allele exerts a sufficient effect to cause disease, often through mechanisms that disrupt normal cellular function. In contrast, autosomal recessive disorders require two mutant alleles (homozygous state) for manifestation, as the wild-type allele in heterozygotes can compensate for the loss. X-linked recessive disorders, such as hemophilia, primarily affect males who inherit a single mutant allele on the X chromosome, while females are usually asymptomatic carriers due to the presence of a second X chromosome.[58][59][60]

The molecular mechanisms underlying dominance in these disorders often involve gain-of-function mutations for dominant effects, where the mutant protein acquires enhanced or novel activity that interferes with normal processes, or loss-of-function mutations that are recessive unless compounded by haploinsufficiency, the scenario where one functional allele produces insufficient protein for normal function. For instance, in Marfan syndrome, an autosomal dominant connective tissue disorder, mutations in the FBN1 gene lead to haploinsufficiency or dominant-negative effects on fibrillin-1 protein assembly, resulting in aortic aneurysms and skeletal abnormalities even in heterozygotes. Gain-of-function mutations, such as those in certain ion channels or signaling proteins, can drive dominant disorders like long QT syndrome by prolonging cardiac repolarization. Recessive disorders like cystic fibrosis typically stem from loss-of-function alleles in the CFTR gene, requiring biallelic impairment for chloride transport defects to manifest.[61][62][63]

Incomplete penetrance complicates dominance patterns, as seen in BRCA1 mutations associated with hereditary breast and ovarian cancer, where not all carriers develop disease due to modifier effects, with lifetime breast cancer risk estimated at 55-72% rather than 100%. Clinically, pedigree analysis is essential for tracing inheritance patterns, identifying at-risk relatives, and guiding genetic counseling, which informs reproductive decisions and risk management strategies like prophylactic surgeries. Population screening for carriers of recessive alleles, such as in Tay-Sachs disease or sickle cell anemia, enables preconception testing to assess couple risks and prevent affected offspring.[64][65][66]

Modern insights reveal that polygenic factors and environmental influences can modulate dominance and penetrance; for example, common genetic variants may alter the expressivity of a dominant allele in complex disorders like schizophrenia, shifting from strict Mendelian patterns toward polygenic risk models. Advances in CRISPR-Cas9 editing offer therapeutic potential by precisely targeting disease alleles, such as correcting FBN1 mutations in Marfan syndrome models or silencing gain-of-function alleles in autoinflammatory diseases. Preclinical studies have demonstrated allele-specific editing efficiencies of 20–40% in models of genetic disorders like Huntington's disease and retinitis pigmentosa, while clinical trials for CRISPR-based therapies in conditions such as sickle cell disease and transthyretin amyloidosis have shown clinical benefits, including up to 96% target reduction in some cases as of 2025. Recent 2025 advances include allele-selective editing breakthroughs for Huntington's disease and collagen disorders. These approaches underscore the transition from descriptive genetics to targeted interventions for dominance-related pathologies.[67][68][69][70][71]
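The couple-risk reasoning used in carrier screening and the effect of incomplete penetrance can be illustrated with a short calculation. This is a simplified sketch under stated assumptions: Hardy-Weinberg proportions in the screened population, a fully recessive disorder, and a single penetrance value for the dominant case. The incidence and penetrance figures are hypothetical and chosen only to make the arithmetic easy to follow; the function names are likewise assumptions.

```python
from math import sqrt


def carrier_frequency(incidence):
    """Carrier (heterozygote) frequency 2pq for a recessive disorder.

    Assumes Hardy-Weinberg proportions, so incidence = q squared.
    """
    q = sqrt(incidence)
    return 2 * (1 - q) * q


def recessive_offspring_risk(p_carrier_1, p_carrier_2):
    """Risk of an affected child: both parents carriers, then a 1-in-4 chance."""
    return p_carrier_1 * p_carrier_2 * 0.25


def dominant_offspring_risk(penetrance=1.0):
    """Risk of an affected child from one heterozygous parent, scaled by penetrance."""
    return 0.5 * penetrance


# Hypothetical recessive disorder affecting 1 in 2,500 births.
carriers = carrier_frequency(1 / 2500)
print(f"carrier frequency = {carriers:.4f}")
print(f"risk, untested couple = {recessive_offspring_risk(carriers, carriers):.6f}")
print(f"risk, both known carriers = {recessive_offspring_risk(1.0, 1.0):.2f}")

# Hypothetical dominant allele with 80% penetrance.
print(f"risk, dominant allele = {dominant_offspring_risk(0.8):.2f}")
```

The contrast between the untested couple and the two known carriers shows why carrier testing changes the estimate so sharply: the 1-in-4 Mendelian term is fixed, and almost all of the uncertainty lies in whether both parents carry the allele.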
Idiomorphs
Idiomorphs represent a specialized form of allelic variation observed at mating-type loci in certain fungi, where the sequences are non-homologous yet occupy the same genomic position and fulfill equivalent biological roles in regulating sexual compatibility. Unlike conventional alleles, which share significant sequence homology due to common descent, idiomorphs exhibit substantial dissimilarity in nucleotide composition and structure while maintaining functional parity in determining mating identity. This terminology was introduced to describe such sequences in the ascomycete fungus Neurospora crassa, highlighting their divergence from traditional allelic definitions.

In the context of fungal biology, idiomorphs are predominantly found in ascomycete and basidiomycete species, where they control key aspects of sexual reproduction, including mate recognition, pheromone signaling, and the initiation of developmental pathways leading to spore formation. These loci ensure that mating occurs only between compatible partners, thereby enforcing outcrossing in heterothallic species and preventing self-fertilization. The presence of idiomorphs at a single locus (unifactorial system) or multiple unlinked loci (bifactorial system) varies across fungal lineages, but their core function remains the promotion of genetic diversity through regulated sexual interactions.[72]

A prominent example is provided by Neurospora crassa, a model ascomycete, where the mating-type locus contains two idiomorphs: mat a and mat A, each spanning approximately 5 kb with no detectable sequence homology between them. The mat a idiomorph encodes a single gene (mat a-1) that specifies "a" mating identity, while mat A harbors three genes (mat A-1, mat A-2, and mat A-3) that collectively define "A" identity and suppress vegetative incompatibility in heterokaryons. These idiomorphs orchestrate the sexual cycle by activating downstream genes only upon fusion of opposite mating types. Similarly, in the yeast Saccharomyces cerevisiae, the MAT locus features MATa and MATα idiomorphs, which are nonhomologous and differ in size and gene content; MATa encodes the MATa1 transcription factor, promoting "a"-specific functions like a-factor production, whereas MATα encodes MATα1 and MATα2 regulators that drive α-specific traits, including α-factor secretion and cell cycle arrest in response to pheromones.[73]

The evolutionary origins of idiomorphs trace back to ancient genome rearrangements in fungal ancestors, including inversions, translocations, and gene duplications that created divergent sequences while suppressing recombination across the locus boundaries. Such mechanisms likely evolved to stabilize distinct mating types, thereby sustaining heterothallism and enhancing outcrossing rates in diverse environments. Evidence from comparative genomics indicates that these rearrangements predated the divergence of ascomycetes and basidiomycetes, with small regions of shared identity (e.g., flanking sequences) suggesting a common ancestral locus that underwent rapid evolution to evade homologous pairing.[74][72]

Although idiomorphs do not qualify as true alleles under strict homology criteria, their designation as such stems from their allelic position on the chromosome and interchangeable functional outcomes in mating, offering profound insights into fungal genetics, including the molecular basis of reproductive isolation and the potential for engineering mating systems in biotechnology. This distinction underscores the adaptive flexibility of fungal genomes in sexual regulation, contrasting with more conserved allelic systems in other organisms.[72]