
Allele

An allele is one of two or more versions of a DNA sequence, such as a single base or a segment of bases, at a specific genomic location on a chromosome. Individuals inherit two alleles for each gene, one from each parent; these variants occupy the same locus on homologous chromosomes and can influence the production of the same gene product in different ways. Alleles arise from mutation and contribute to genetic variation, enabling differences in inherited traits and in susceptibility to certain diseases. In diploid organisms like humans, the combination of two alleles at a locus determines the genotype, which may be homozygous (both alleles identical) or heterozygous (alleles differ).

Many alleles exhibit dominance relationships: a dominant allele expresses its phenotype even when paired with a recessive one, while a recessive allele requires two copies to manifest. In recessive traits, both inherited alleles must be the variant form for the trait to appear. This inheritance pattern follows Mendelian principles, in which the two alleles at a locus segregate into different gametes during gamete formation, contributing to the diversity observed in populations.

The concept of alleles emerged in early 20th-century genetics, with the term "allelomorph" (later shortened to allele) coined by William Bateson and Edith Saunders in 1902 to describe alternative forms of genes. Today, alleles play a central role in fields such as population genetics, where their frequencies in populations are analyzed to understand evolutionary mechanisms such as mutation, gene flow, and natural selection. Advances in genome sequencing have revealed millions of common and rare alleles across human populations, underpinning personalized medicine by linking specific variants to disease risk or drug response.

Definition and History

Definition

An allele is one of two or more versions of a DNA sequence at a specific genomic location, representing alternative forms of a gene that arise through mutation and occupy the same position on a chromosome. These variants determine differences in traits or disease susceptibility among individuals.

In genetics, a gene is a sequence of nucleotides in DNA or RNA that serves as the basic unit of heredity, encoding functional products such as proteins. The locus is the precise physical site on a chromosome where a particular gene or DNA sequence is located. Homologous chromosomes, which are paired chromosomes with one inherited from each parent, carry alleles at corresponding loci, allowing for variation between the maternal and paternal copies.

An allele is a specific variant at one locus. It differs from a haplotype, which is a set of DNA variants (including multiple alleles) inherited together on the same chromosome because of their close proximity. Alleles thus describe variation at a single site or gene, whereas haplotypes describe linked combinations across a genomic region.

A classic example is the ABO blood group system, where the alleles A, B, and O are variants at the same locus on chromosome 9, leading to different blood types (A, B, AB, or O) depending on their combination. These alleles can show dominant or recessive phenotypic expression; the O allele, for example, is recessive to both A and B.

Etymology and Historical Development

The concept of the allele emerged from early efforts to formalize the discrete units of heredity described in Gregor Mendel's experiments with pea plants. Mendel implicitly identified alternative forms of heritable factors that segregated during reproduction, though he did not name them as such. His work, published in 1866 as Versuche über Pflanzen-Hybriden, laid the groundwork for understanding these units as paired alternatives influencing traits like flower color and seed shape, but the ideas remained obscure until their rediscovery in 1900 by Hugo de Vries, Carl Correns, and Erich von Tschermak, who recognized the same patterns in their own hybridization studies. This revival integrated Mendel's principles into experimental biology and prompted the need for precise terminology to describe variant forms of these heritable units.

The term "allelomorph," meaning alternative forms of a Mendelian factor, was coined in 1902 by British geneticists William Bateson and Edith Rebecca Saunders in their report on poultry plumage inheritance. It derives from the Greek roots allel- (meaning "mutual" or "reciprocal") and morphē (form), and denotes contrasting versions at the same locus that can pair in heterozygotes. Bateson, a key advocate of Mendelism, used the term to explain dominance and recessiveness in traits like pea comb shape in chickens, solidifying the allelic framework within post-rediscovery genetics.

In 1909, Danish botanist Wilhelm Ludwig Johannsen further advanced the conceptual landscape by introducing the terms "genotype" (the hereditary constitution), "phenotype" (the observable expression), and "gene" (the elemental unit within the genotype), distinguishing the underlying allelic makeup from environmental influences in his bean selection experiments. The abbreviation "allele" for "allelomorph" was proposed in 1909 by American geneticist George Harrison Shull in his studies on maize inheritance, simplifying the nomenclature while retaining the original meaning of variant gene forms.

Around the same time, Thomas Hunt Morgan's 1910 discovery of a white-eyed mutation in fruit flies (Drosophila melanogaster) provided empirical evidence for allelic variants linked to sex chromosomes, demonstrating how alleles could explain apparently non-Mendelian patterns like sex-linked inheritance and advancing the chromosomal theory of heredity. This work in the 1910s, building on Bateson's allelic ideas, established alleles as discrete, position-specific factors on chromosomes, shifting the understanding from abstract units to mappable entities. The modern molecular view of alleles as sequence variants in DNA solidified after James Watson and Francis Crick's 1953 model of the double helix, which made it possible to understand how nucleotide-level mutations create allelic diversity.

Types of Alleles in Classical Genetics

Dominant and Recessive Alleles

In classical Mendelian genetics, alleles at a single locus can exhibit dominance relationships that determine the observable phenotype. A dominant allele is one that determines the phenotype of a trait in a heterozygous individual, effectively masking the expression of its paired allele. In contrast, a recessive allele produces its associated phenotype only when present in the homozygous state, because its effect is overridden by a dominant allele in heterozygotes. These patterns form the basis of simple inheritance for many traits.

The phenotypic outcomes depend on the genotype formed by the combination of alleles. In a homozygous dominant genotype (AA), both alleles are dominant, resulting in full expression of the dominant phenotype. A heterozygous genotype (Aa) also displays the dominant phenotype because the recessive allele is completely masked. Only in the homozygous recessive genotype (aa) is the recessive phenotype expressed.

Under complete dominance, the dominant allele fully masks the recessive one, so heterozygotes show no intermediate phenotype. Under incomplete dominance, neither allele fully masks the other, and heterozygotes show a blended or intermediate phenotype. For instance, in four o'clock plants (Mirabilis jalapa), the alleles for red flower color (R) and white flower color (r) interact via incomplete dominance: homozygous RR plants have red flowers, rr plants have white flowers, and heterozygous Rr plants have pink flowers.

Representative examples illustrate these concepts in both plants and humans. In Gregor Mendel's pea plant experiments, the allele for tall height (T) is dominant over the allele for short height (t); thus, TT and Tt plants grow tall, while only tt plants are short. In humans, Huntington's disease exemplifies a dominant allele's effect: a single copy of the mutated HTT gene (carrying an expanded CAG repeat) causes the neurodegenerative disorder in heterozygous individuals, and homozygous cases are rare.
The inheritance of dominant and recessive alleles in monohybrid crosses can be visualized using Punnett squares, which predict genotypic and phenotypic ratios among offspring. For a cross between two heterozygous parents (Tt × Tt), the Punnett square lists one parent's gametes across the top and the other's down the side:

        T     t
  T     TT    Tt
  t     Tt    tt

This yields a genotypic ratio of 1:2:1 (1 TT : 2 Tt : 1 tt) and, under complete dominance, a phenotypic ratio of 3:1 (3 tall : 1 short). Such ratios were key to Mendel's formulation of his laws of inheritance, and they show why the dominant phenotype predominates among the offspring of heterozygous matings.
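The Tt × Tt cross above can be reproduced with a short enumeration. This is a minimal sketch rather than a standard tool: the function names `punnett` and `phenotype_counts` are illustrative, and it assumes one locus, two-letter genotype strings, and complete dominance of the uppercase allele.

```python
from collections import Counter
from itertools import product


def punnett(parent1, parent2):
    """Count offspring genotypes for a single-locus cross.

    Each parent is a two-character genotype string such as "Tt".
    Every gamete from one parent is paired with every gamete from the other.
    """
    offspring = Counter()
    for gamete1, gamete2 in product(parent1, parent2):
        # Sort so that "tT" and "Tt" count as the same genotype
        # (uppercase letters sort before lowercase ones).
        genotype = "".join(sorted(gamete1 + gamete2))
        offspring[genotype] += 1
    return offspring


def phenotype_counts(genotypes, dominant="T"):
    """Collapse genotype counts into phenotype counts under complete dominance."""
    counts = Counter()
    for genotype, n in genotypes.items():
        label = "dominant" if dominant in genotype else "recessive"
        counts[label] += n
    return counts


cross = punnett("Tt", "Tt")
print(dict(cross))                    # {'TT': 1, 'Tt': 2, 'tt': 1}
print(dict(phenotype_counts(cross)))  # {'dominant': 3, 'recessive': 1}
```

The four cells of the square are equally likely because each parent contributes either allele with probability 1/2, so the counts translate directly into the 1:2:1 and 3:1 ratios.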

Multiple Alleles

Multiple alleles occur when three or more different forms of a gene exist at a single locus within a population, allowing for greater genetic diversity than the two-allele systems typical of basic Mendelian genetics. These alleles can interact in ways that extend beyond simple dominance, including codominance, where both alleles in a heterozygote are fully expressed, and a dominance series, in which alleles mask one another hierarchically. Multiple alleles arise from successive mutations at the locus, and new variants can persist if they provide selective advantages. The phenomenon is particularly common in immune-related genes such as those encoding blood group antigens, where allelic diversity can enhance resistance to pathogens through balancing selection.

A prominent example of codominance among multiple alleles is the human ABO blood group system, controlled by three alleles at the ABO locus on chromosome 9: I^A (encoding the A antigen), I^B (encoding the B antigen), and i (encoding neither, resulting in blood type O). The I^A and I^B alleles are codominant, so individuals with genotype I^A I^B express both A and B antigens on red blood cells, producing blood type AB, while i is recessive to both. Simple dominance is evident in genotypes such as I^A i (type A) or I^B i (type B), where the dominant allele determines the single antigen expressed.

Another illustration is rabbit coat color at the C locus (encoding the tyrosinase enzyme), which features four alleles: C (full color), c^{ch} (chinchilla, gray fur), c^h (Himalayan, white with dark extremities), and c (albino, all white). These follow a dominance hierarchy of C > c^{ch} > c^h > c, such that a rabbit with genotype C c^{ch} displays full color, overriding the chinchilla effect, while c^{ch} c^h results in chinchilla fur. This hierarchy produces a spectrum of phenotypes depending on the allele combination.

Inheritance patterns with multiple alleles require more careful analysis than standard monohybrid crosses, because each individual carries only two alleles while the population harbors more. For a triallelic system, a population-level 3×3 grid of gametes yields nine allele pairings (six distinct genotypes), with more varied outcomes than the simple 3:1 phenotypic ratio of two-allele dominance. For instance, crossing an I^A I^B parent (AB blood) with an I^A i parent (A blood) produces offspring in a 2:1:1 phenotypic ratio of A : AB : B, with no O types. The alleles still segregate in Mendelian fashion; only the phenotypic ratios differ from the familiar 3:1. The two-allele dominant and recessive case described earlier is therefore a special case of this more general framework.
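The AB × A cross can be checked by enumerating gamete pairings and applying the ABO dominance rules. This is a sketch under stated assumptions: the letter "O" stands in for the i allele, and the helper names `abo_phenotype` and `cross` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def abo_phenotype(genotype):
    """Map a pair of ABO alleles to a blood type.

    "A" and "B" are codominant; "O" (the i allele) is recessive to both.
    """
    alleles = set(genotype)
    if alleles == {"A", "B"}:
        return "AB"
    if "A" in alleles:
        return "A"
    if "B" in alleles:
        return "B"
    return "O"


def cross(parent1, parent2):
    """Return expected phenotype fractions among offspring of two parents.

    Each parent is a tuple of two alleles, e.g. ("A", "O").
    """
    outcomes = Counter(abo_phenotype(pair) for pair in product(parent1, parent2))
    total = sum(outcomes.values())
    return {phenotype: Fraction(n, total) for phenotype, n in outcomes.items()}


ratios = cross(("A", "B"), ("A", "O"))
for blood_type, fraction in sorted(ratios.items()):
    print(blood_type, fraction)
```

For this cross the enumeration gives one half type A and one quarter each of AB and B, matching the 2:1:1 ratio, and type O never appears because the AB parent cannot contribute an i allele.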

Population and Frequency Aspects

Allele and Genotype Frequencies

In population genetics, allele frequency refers to the proportion of a specific allele among all alleles at a given locus within a population. For a diploid population with two alleles, A and B, at a locus, the frequency of allele A is denoted $p$, and the frequency of allele B is $q = 1 - p$. This measure quantifies the relative abundance of each allele in the gene pool, which consists of all alleles for that locus across the population.

Genotype frequency, in contrast, is the proportion of individuals in the population that carry a particular combination of alleles at the locus, such as homozygous AA, heterozygous AB, or homozygous BB. These frequencies describe the genetic makeup of individuals and sum to 1 across all genotypes at the locus. Under the assumptions of random mating and no evolutionary forces, the expected genotype frequencies are $p^2$ for AA, $2pq$ for AB, and $q^2$ for BB, providing a baseline for relating allele frequencies to genotypic data. Allele frequencies are not observed directly, since they pertain to the gametic level and must be estimated from samples, whereas genotype frequencies can be observed by phenotyping or genotyping individuals.

Allele and genotype frequencies are typically calculated by direct counting in sampled individuals. In a sample of $N$ diploids, there are $2N$ alleles at each locus; the count of a specific allele divided by $2N$ yields its frequency. When direct genotyping is unavailable, frequencies can be estimated from phenotypes, particularly for loci with dominant and recessive alleles: the frequency of the recessive homozygote is taken as $q^2$, so $q$ is estimated as its square root. Direct counting makes no equilibrium assumption, whereas the square-root estimate assumes that genotypes occur in the proportions $p^2$, $2pq$, and $q^2$.

A notable example is the sickle cell allele (HbS) in human populations from malaria-endemic regions of sub-Saharan Africa, where its frequency reaches 0.10 to 0.20 because of heterozygote advantage: individuals with one HbS allele (AS genotype) have increased resistance to severe malaria without severe anemia. This maintains the allele at higher frequencies than would otherwise be expected, illustrating how environmental pressures influence genetic composition.

Several factors can alter these frequencies over generations: mutation, which introduces new allelic variants at low rates; migration or gene flow, which redistributes alleles between populations; and natural selection, which favors alleles that enhance survival or reproduction.
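The two estimation routes described in this section, direct allele counting and the square-root estimate from recessive homozygotes, can be written in a few lines. This is an illustrative sketch: the function names and the sample counts are hypothetical, not data from any study.

```python
import math


def allele_frequencies(n_AA, n_AB, n_BB):
    """Estimate p (allele A) and q (allele B) by direct counting.

    Each AA individual carries two A alleles and each AB individual carries one;
    the denominator is 2N, the total number of alleles in N diploids.
    """
    total_alleles = 2 * (n_AA + n_AB + n_BB)
    p = (2 * n_AA + n_AB) / total_alleles
    return p, 1 - p


def recessive_allele_frequency(n_recessive, n_total):
    """Estimate q from the fraction of recessive homozygotes.

    Assumes genotypes occur in Hardy-Weinberg proportions, so that the
    recessive homozygote frequency equals q squared.
    """
    return math.sqrt(n_recessive / n_total)


# Hypothetical sample: 50 AA, 40 AB, 10 BB individuals.
p, q = allele_frequencies(50, 40, 10)
print(f"p = {p:.2f}, q = {q:.2f}")

# Hypothetical sample: 9 recessive-phenotype individuals out of 100.
print(f"q estimate = {recessive_allele_frequency(9, 100):.2f}")
```

The first function uses only observed genotype counts; the second is usable with phenotype data alone but inherits the equilibrium assumption, so the two can disagree when that assumption fails.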

Hardy-Weinberg Equilibrium

The Hardy-Weinberg principle, independently formulated by G. H. Hardy and Wilhelm Weinberg in 1908, describes the expected genotype frequencies in a population that is not evolving, serving as a foundational null model in population genetics. Under this model, in a large population undergoing random mating with no evolutionary forces acting, allele frequencies remain constant across generations, and genotype frequencies reach predictable proportions after one generation of random mating. For a locus with two alleles, denoted A (frequency $p$) and a (frequency $q$, where $p + q = 1$), the expected genotype frequencies are $p^2$ for homozygous AA, $2pq$ for heterozygous Aa, and $q^2$ for homozygous aa, summing to unity:

$$p^2 + 2pq + q^2 = 1$$

This binomial expansion arises from the random union of gametes: the probability of combining two A alleles is $p \times p$, two a alleles is $q \times q$, and one of each is $2 \times (p \times q)$.

The principle relies on five key assumptions: infinitely large population size (to eliminate genetic drift), random mating (no assortative mating or inbreeding), no migration (no gene flow), no mutation (no new alleles introduced), and no natural selection (equal fitness among genotypes). Deviations from these conditions indicate the presence of evolutionary forces.

To test whether observed genotype frequencies conform to Hardy-Weinberg expectations, a chi-square goodness-of-fit test is commonly applied, comparing observed counts to the counts expected under the model. The test statistic is $\chi^2 = \sum \frac{(O - E)^2}{E}$, where $O$ is the observed count and $E$ the expected count for each genotype, with degrees of freedom equal to the number of genotypes minus the number of alleles (for example, $3 - 2 = 1$ for two alleles).

The model extends to multiple alleles at a locus. For $k$ alleles with frequencies $p_1, p_2, \dots, p_k$ (summing to 1), the expected homozygote frequencies are $p_i^2$ and the expected heterozygote frequencies are $2 p_i p_j$ for $i \neq j$, following the expansion of $(p_1 + p_2 + \dots + p_k)^2 = 1$.

In practice, the principle is used to detect evolutionary processes by testing for deviations, as in the MN blood group system (alleles M and N, with codominant phenotypes M, MN, and N). For instance, in a sample of 5,000 individuals with 1,460 M, 2,550 MN, and 990 N, the allele frequencies are $p_M = (2 \times 1460 + 2550) / 10000 \approx 0.547$ and $p_N \approx 0.453$, yielding expected counts of approximately 1,496 M, 2,478 MN, and 1,026 N under equilibrium, which can then be tested for fit.
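The MN example can be carried through to the chi-square statistic with a few lines of plain Python. This is a sketch, not a full statistical analysis: the function name is illustrative, and the critical value 3.841 (5% significance level, 1 degree of freedom) is taken from standard chi-square tables instead of being computed.

```python
def hardy_weinberg_chi_square(n_MM, n_MN, n_NN):
    """Return (p, expected_counts, chi2) for a two-allele codominant locus.

    p is the frequency of the first allele, estimated by allele counting.
    expected_counts are the Hardy-Weinberg expectations p^2, 2pq, q^2 times N.
    """
    n = n_MM + n_MN + n_NN
    p = (2 * n_MM + n_MN) / (2 * n)
    q = 1 - p
    observed = (n_MM, n_MN, n_NN)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p, expected, chi2


CRITICAL_VALUE_DF1_5PCT = 3.841  # from standard chi-square tables

p, expected, chi2 = hardy_weinberg_chi_square(1460, 2550, 990)
print(f"p_M = {p:.3f}")
print("expected counts:", [round(e) for e in expected])
print(f"chi-square = {chi2:.2f}")
print("deviates at 5% level:", chi2 > CRITICAL_VALUE_DF1_5PCT)
```

For these counts the statistic works out to roughly 4.2, slightly above the tabulated threshold, so this sample would be flagged as a modest departure from Hardy-Weinberg proportions (an excess of heterozygotes) at the 5% level.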

Molecular and Epigenetic Variants

Molecular Basis of Alleles

At the molecular level, alleles are defined as alternative forms of a gene arising from variations in the DNA sequence at a specific locus on a chromosome. These variants include single nucleotide polymorphisms (SNPs), where a single base pair differs between individuals, as well as insertions and deletions (indels) that add or remove nucleotides. SNPs represent the most common type of genetic variation in humans, occurring approximately once every 300-1,000 base pairs.

Alleles originate primarily from mutations that alter the DNA sequence during replication, repair, or exposure to mutagens. Point mutations, or substitutions, replace one nucleotide with another and can lead to SNPs; these may be transitions (purine to purine or pyrimidine to pyrimidine) or transversions (purine to pyrimidine or vice versa). Insertions and deletions in coding regions cause frameshift mutations when their length is not a multiple of three, shifting the reading frame and often resulting in truncated or nonfunctional proteins; indels whose length is a multiple of three add or remove whole codons and leave the frame intact. Copy number variations (CNVs), involving duplications or deletions of larger DNA segments (typically 1 kb to several Mb), also generate allelic diversity by altering gene dosage.

Detection of allelic variants relies on molecular techniques that interrogate DNA sequences. Sanger sequencing provides high-accuracy reads for targeted loci, serving as a gold standard for confirming variants like SNPs and small indels. Next-generation sequencing (NGS) enables high-throughput analysis of entire genomes or exomes, identifying a broad spectrum of variants including SNPs, indels, and CNVs with greater sensitivity than traditional methods. PCR-based genotyping, such as allele-specific PCR, amplifies and distinguishes specific alleles by designing primers that bind uniquely to variant sequences, offering a cost-effective approach for known mutations.

The functional consequences of alleles depend on their location and nature within the gene. Synonymous mutations alter the DNA sequence but do not change the encoded amino acid because of the degeneracy of the genetic code, and typically have minimal impact on protein structure. In contrast, nonsynonymous mutations substitute one amino acid for another, potentially disrupting protein folding, stability, or function. Regulatory alleles, located in noncoding regions such as promoters or enhancers, influence gene expression levels without altering the protein sequence, often by affecting transcription factor binding or chromatin accessibility.

A prominent example is provided by the alleles of the CFTR gene associated with cystic fibrosis. The ΔF508 allele, a common mutant variant, results from a three-base-pair deletion that removes the phenylalanine at position 508, leading to a misfolded protein that fails to traffic properly to the cell membrane; this deletion accounts for approximately 70% of cystic fibrosis alleles worldwide.
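The substitution and indel categories described in this section reduce to simple rules, sketched below. The function names are hypothetical, and the indel rule covers only reading-frame arithmetic for an indel that lies entirely within coding sequence; it says nothing about splicing or protein-level effects.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError("ref and alt must be two different bases from A, C, G, T")
    # Transition: both bases are purines, or both are pyrimidines.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def classify_coding_indel(length_bp):
    """Classify a coding-region indel by its effect on the reading frame."""
    if length_bp <= 0:
        raise ValueError("indel length must be a positive number of base pairs")
    # Codons are three bases long, so only multiples of three keep the frame.
    return "in-frame" if length_bp % 3 == 0 else "frameshift"


print(classify_substitution("A", "G"))  # transition
print(classify_substitution("A", "C"))  # transversion
print(classify_coding_indel(3))         # in-frame
print(classify_coding_indel(1))         # frameshift
```

A three-base-pair deletion such as ΔF508 falls in the in-frame category: it removes one amino acid without shifting the downstream reading frame, which is why the resulting protein is full-length but misfolded.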

Epialleles

Epialleles represent heritable variations in gene expression that arise from differences in epigenetic modifications, such as DNA methylation or histone modifications, without alterations to the underlying DNA sequence. These variants function as allele-like entities because they can be stably transmitted through cell divisions (mitotically heritable) or across generations (meiotically heritable), influencing phenotypic outcomes in a manner analogous to genetic alleles. Unlike transient epigenetic changes, epialleles can maintain their states over multiple generations, which distinguishes them from short-term environmental responses.

Key mechanisms underlying epialleles include paramutation, where one allele induces a heritable epigenetic change in a homologous allele, often through RNA-mediated silencing. Another prominent mechanism is genomic imprinting, which involves parent-of-origin-specific epigenetic marks that silence one parental allele, leading to monoallelic expression. These processes enable epialleles to modulate gene activity in a tissue- or developmental stage-specific way, with consequences for disease.

A classic example is the viable yellow (A^vy) epiallele in mice, where variable DNA methylation at a promoter regulates ectopic expression of the agouti gene, resulting in coat color variation from yellow (hypomethylated, obese) to pseudoagouti (hypermethylated, lean). In humans, epialleles play a critical role in imprinting disorders at 15q11-13: Prader-Willi syndrome (loss of the paternal contribution, for example through a paternal deletion or an imprinting defect) and Angelman syndrome (loss of the maternal contribution at the same region, for example through a maternal deletion or an imprinting defect), where loss of specific epiallelic marks leads to neurodevelopmental phenotypes.

Epialleles exhibit varying degrees of stability, often persisting across generations but remaining sensitive to environmental factors such as diet, which can alter methylation patterns and propagate changes transgenerationally. This environmental responsiveness differentiates epialleles from purely genetic variants, while their heritability sets them apart from reversible epigenetic modifications.

Recent research highlights roles for epialleles in differentiation and cancer, with studies in the 2020s reporting their involvement in tumor progression through dynamic histone modifications. Work on transgenerational epigenetics has reported enhanced stability of certain epialleles in response to stressors, as in models where paternal exposure to endocrine disruptors induced heritable metabolic changes. These findings point to epialleles as potential mediators between environmental exposure and disease susceptibility.

Alleles in Disease and Special Contexts

Dominance in Genetic Disorders

In genetic disorders, dominance plays a pivotal role in determining the phenotypic expression of mutant alleles. Autosomal dominant disorders typically arise from a single mutant allele in a heterozygous individual, where the mutant allele has a sufficient effect to cause disease, often through mechanisms that disrupt normal cellular function. In contrast, autosomal recessive disorders require two mutant alleles (homozygous state) for manifestation, because the wild-type allele in heterozygotes can compensate for the loss. X-linked recessive disorders, such as hemophilia, primarily affect males who inherit a single mutant allele on the X chromosome, while females are usually asymptomatic carriers because they have a second X chromosome.

The molecular mechanisms underlying dominance in these disorders often involve gain-of-function mutations, where the mutant protein acquires enhanced or novel activity that interferes with normal processes. Loss-of-function mutations are usually recessive, except in cases of haploinsufficiency (the scenario where one functional allele produces insufficient protein for normal function). For instance, in Marfan syndrome, an autosomal dominant disorder, mutations in the FBN1 gene lead to haploinsufficiency or dominant-negative effects on fibrillin-1 protein assembly, resulting in aortic aneurysms and skeletal abnormalities even in heterozygotes. Gain-of-function mutations, such as those in certain ion channels or signaling proteins, can drive dominant disorders like long QT syndrome by prolonging the cardiac action potential. Recessive disorders like cystic fibrosis typically stem from loss-of-function alleles in the CFTR gene, requiring biallelic impairment for chloride transport defects to manifest. Incomplete penetrance complicates dominance patterns, as seen in BRCA1 mutations associated with hereditary breast and ovarian cancer: not all carriers develop disease because of modifier effects, and lifetime breast cancer risk is estimated at 55-72% rather than 100%.

Clinically, pedigree analysis is essential for tracing inheritance patterns, identifying at-risk relatives, and guiding genetic counseling, which informs reproductive decisions and risk management strategies like prophylactic surgeries. Population screening for carriers of recessive alleles, such as those for Tay-Sachs disease or sickle cell anemia, enables preconception testing to assess couple risks and reduce the chance of affected offspring.

Modern insights reveal that polygenic factors and environmental influences can modulate dominance and penetrance; for example, common genetic variants may alter the expressivity of a dominant allele in complex disorders, shifting from strict Mendelian patterns toward polygenic risk models. Advances in CRISPR-Cas9 editing offer therapeutic potential by precisely targeting disease alleles, such as correcting FBN1 mutations in disease models or silencing gain-of-function alleles. Preclinical studies have demonstrated allele-specific editing efficiencies of 20–40% in models of genetic disorders, while clinical trials of CRISPR-based therapies in conditions such as transthyretin amyloidosis have shown clinical benefits, including up to 96% target reduction in some cases as of 2025. Reports from 2025 also describe advances in allele-selective editing. These approaches mark a transition from descriptive genetics to targeted interventions for dominance-related pathologies.

Idiomorphs

Idiomorphs represent a specialized form of allelic variation observed at mating-type loci in certain fungi, where the sequences are non-homologous yet occupy the same genomic position and fulfill equivalent biological roles in regulating sexual compatibility. Unlike conventional alleles, which share significant sequence homology because of common ancestry, idiomorphs are substantially dissimilar in nucleotide composition and structure while maintaining functional parity in determining mating identity. The term was introduced to describe such sequences in the ascomycete fungus Neurospora crassa, highlighting their divergence from traditional allelic definitions.

In fungal biology, idiomorphs are found predominantly in ascomycete and basidiomycete species, where they control key aspects of sexual reproduction, including mate recognition, pheromone signaling, and the initiation of the developmental pathways of the sexual cycle. These loci ensure that mating occurs only between compatible partners, thereby enforcing outcrossing in heterothallic species and preventing self-fertilization. The presence of idiomorphs at a single locus (unifactorial system) or at multiple unlinked loci (bifactorial system) varies across fungal lineages, but their core function remains the promotion of genetic diversity through regulated sexual interactions.

A prominent example is provided by Neurospora crassa, a model ascomycete, where the mating-type locus contains one of two idiomorphs, mat a (about 3.2 kb) or mat A (about 5.3 kb), with no detectable sequence similarity between them. The mat a idiomorph encodes a single gene (mat a-1) that specifies "a" identity, while mat A harbors three genes (mat A-1, mat A-2, and mat A-3) that collectively define "A" identity; the locus is also involved in vegetative incompatibility in heterokaryons. These idiomorphs orchestrate the sexual cycle by activating downstream genes only upon fusion of cells of opposite mating types.

Similarly, in the yeast Saccharomyces cerevisiae, the MAT locus features MATa and MATα idiomorphs, which are nonhomologous and differ in size and content. MATa encodes the MATa1 regulator, associated with "a"-specific functions like a-factor production, whereas MATα encodes the MATα1 and MATα2 regulators that drive α-specific traits, including α-factor secretion and cell-cycle arrest in response to pheromones.

The evolutionary origins of idiomorphs trace back to ancient genome rearrangements in fungal ancestors, including inversions, translocations, and duplications that created divergent sequences while suppressing recombination across the locus boundaries. Such mechanisms likely evolved to stabilize distinct mating types and thereby sustain outcrossing in diverse environments. Evidence from comparative genomics indicates that these rearrangements predated the divergence of ascomycetes and basidiomycetes, with small regions of shared identity (for example, flanking sequences) suggesting a common ancestral locus that underwent rapid divergence to evade homologous pairing.

Although idiomorphs do not qualify as true alleles under strict homology criteria, they are treated as such because they occupy an allelic position on the chromosome and have interchangeable functional outcomes in mating. They offer insight into fungal biology, including the molecular basis of mating-type determination. This distinction underscores the adaptive flexibility of fungal genomes in sexual regulation, in contrast to the more conserved allelic systems of other organisms.