Coefficient of inbreeding
The coefficient of inbreeding, denoted F, is a key parameter in population genetics that quantifies the probability that the two alleles at any given locus in an individual are identical by descent, meaning they are copies of the same allele inherited from a common ancestor rather than arising independently.[1] This measure reflects the extent of consanguineous mating in an individual's pedigree and the resulting increase in homozygosity compared to random mating.[1]

Developed by geneticist Sewall Wright in his 1922 paper "Coefficients of Inbreeding and Relationship," the coefficient is typically calculated from pedigree data with the formula

F = \sum \left( \frac{1}{2} \right)^{n_1 + n_2 + 1} (1 + F_A)

where the summation is over all common ancestors A, n_1 and n_2 are the numbers of generations from each parent to A, and F_A is the inbreeding coefficient of the common ancestor A.[1] Values of F range from 0 (no inbreeding, as in a randomly mating population) to 1 (complete inbreeding, where all loci are homozygous by descent), with examples including F = 0.25 for offspring of full siblings and F = 0.125 for offspring of half-sibling or grandparent-grandchild matings.[1] Modern genomic methods can also estimate F directly from DNA sequence data by assessing runs of homozygosity, providing a more precise alternative to pedigree-based calculations in species with incomplete records.[2]

In animal and plant breeding, the coefficient guides efforts to fix desirable traits through controlled inbreeding while minimizing risks, because elevated F values correlate with inbreeding depression, that is, reduced biological fitness such as lower growth rates, smaller litter sizes, and higher mortality in farm animals.[3] For instance, in swine, each 10% increase in F can decrease litter size by 0.20 to 0.44 pigs.[3] In conservation biology, monitoring F helps identify populations at risk from genetic erosion, where high inbreeding accelerates extinction by amplifying deleterious recessive alleles and diminishing adaptive potential in threatened species.[4] Overall, the coefficient remains a foundational tool for managing genetic health across domesticated, wild, and captive populations.[5]

Definition and Fundamentals
Definition
The coefficient of inbreeding, denoted F, for an individual is the probability that the two alleles at any autosomal locus are identical by descent (IBD) from a common ancestor.[6] This measure quantifies the extent of inbreeding in an individual's pedigree by assessing the likelihood that both alleles inherited from the parents trace back to the same ancestral allele, rather than arising independently.

Identical by descent (IBD) differs from identical by state (IBS): IBS refers simply to alleles having the same nucleotide sequence, regardless of origin, whereas IBD specifically requires that the alleles are copies of the same ancestral allele passed down through the pedigree.[7] IBD thus emphasizes genealogical tracing and shared ancestry, whereas IBS can also arise through chance similarity or convergence in unrelated lineages; conversely, alleles that are IBD are also IBS unless a rare mutation has altered one of the copies.[8] The distinction is crucial in population genetics, as IBD directly informs inbreeding effects on homozygosity.

Commonly notated as F_I for an individual I, the coefficient ranges from 0, indicating no inbreeding with alleles drawn independently from the population, to 1, signifying complete homozygosity by descent where both alleles are IBD copies of a single ancestral allele.[9] Originating in Mendelian genetics, the concept was formalized by Sewall Wright in the early 20th century to model the genetic consequences of mating patterns in diploid organisms, primarily applying to autosomal loci.[1]

Probability Interpretation
The inbreeding coefficient F is fundamentally a probabilistic measure in population genetics, representing the probability that the two alleles at any given locus in an individual are identical by descent (IBD), meaning they are copies of the same ancestral allele inherited through both parents from a common ancestor.[10] This interpretation stems from Sewall Wright's foundational work, where F also quantifies the correlation between the uniting gametes (egg and sperm) that form the zygote, reflecting the degree of genetic similarity due to relatedness between parents. In essence, F captures the likelihood of locus-specific homozygosity arising from pedigree structure rather than random chance.

This probabilistic framework directly links F to changes in genetic diversity, particularly heterozygosity. Assuming a base population in Hardy-Weinberg equilibrium with random mating, no selection, no mutation, and no migration altering allele frequencies, the expected heterozygosity H at a biallelic locus with allele frequencies p and q = 1 - p is reduced by inbreeding according to the formula:[11][12]

H = 2pq(1 - F)

Here, 2pq represents the heterozygosity under random mating (F = 0), and the term (1 - F) scales it downward, demonstrating how inbreeding systematically erodes heterozygosity and thus overall genetic variation within the population.[10]

Correspondingly, F governs the increase in homozygosity. The probability of homozygosity by descent at a locus is precisely F, but the total homozygosity, encompassing both identical-by-descent and identical-by-state (random matching) genotypes, is given by:[10][12]

p^2 + q^2 + 2pqF

or equivalently,

F + (1 - F)(p^2 + q^2)

This expression shows that inbreeding elevates homozygosity beyond the baseline random-mating level of p^2 + q^2, with the excess proportional to F and the product of allele frequencies 2pq, under the same assumptions of equilibrium and absence of evolutionary forces.[11]

Relation to Related Concepts
The coefficient of inbreeding for an individual, denoted F_I, is equal to the coefficient of kinship between its parents, which quantifies the probability that two alleles, one drawn at random from each parent at a given locus, are identical by descent (IBD). The kinship coefficient, often symbolized as \theta or \phi, thus serves as a pairwise measure of genetic relatedness between any two individuals, whereas the inbreeding coefficient applies specifically to the offspring of such a pair, capturing the elevated risk of homozygosity due to shared ancestry in the parents. This relationship underscores how individual-level inbreeding emerges directly from parental relatedness, without requiring separate computation for the offspring beyond the parental kinship value.[13]

Coancestry is a term synonymous with the kinship coefficient in many genetic contexts, referring to the same probability of IBD for alleles sampled from different individuals, and the two are used interchangeably in pedigree and population analyses.[13] For instance, in breeding programs, coancestry matrices are constructed to monitor relatedness across populations, directly informing inbreeding risks for potential matings.[14] This equivalence highlights the interconnectedness of these measures in tracking genetic similarity, though coancestry emphasizes the ancestral contribution to relatedness.

The inbreeding coefficient also relates to broader identity coefficients, which describe the various states of allelic identity within or between individuals under models like Jacquard's nine condensed identity states. Specifically, F represents a special case of gametic identity, equivalent to the coefficient \Delta_{AA}, which is the probability that the two alleles at a locus in a single individual are IBD from a common ancestor.[5] In this framework, F focuses on the autozygosity within the individual, distinguishing it from other identity states that describe sharing between the alleles of two individuals, and it approximates the recent coalescence probability for the pair of alleles relative to a baseline.[5]

In contrast to these individual-focused measures, the fixation index F_{ST} operates at the subpopulation level, quantifying the proportion of total genetic variance attributable to differences among subpopulations rather than within them, as a measure of population structure and differentiation.[15] While F assesses inbreeding within a single individual, F_{ST} (typically ranging from 0 to 1) reflects broader patterns of isolation or gene flow across groups; it has no direct equivalence to individual inbreeding but shares conceptual roots in Wright's F-statistics for homozygosity excess.[15]

Calculation Methods
Path Coefficient Method
The path coefficient method, developed by Sewall Wright, provides a foundational approach for calculating the inbreeding coefficient F_I of an individual I by tracing pedigree paths to common ancestors.[1] This method quantifies the probability that two alleles at a locus are identical by descent from a shared ancestor, using path coefficients that represent the contribution of genetic transmission along each lineage segment, typically 1/2 per generation due to Mendelian segregation.[1] The core formula is

F_I = \sum_A \left( \frac{1}{2} \right)^{n_1 + n_2 + 1} (1 + F_A),

where the sum is over all common ancestors A (and, where an ancestor can be reached by several distinct paths, over each such path), n_1 is the number of generations from one parent of I to A, n_2 is the number from the other parent to A, and F_A is the inbreeding coefficient of ancestor A (set to 0 if unknown or unrelated).[1] The exponent n_1 + n_2 + 1 accounts for the path lengths between parents via A plus the additional factor for the two uniting gametes forming I, with the term (1 + F_A) adjusting for any prior inbreeding in A that increases the correlation of alleles at A.[1]

To apply the method, first construct a pedigree diagram with arrows indicating generational descent, ensuring that no path passes through the same individual more than once. Identify all common ancestors connecting the parents of I, then for each such A, determine the paths from each parent to A that share no individual other than A, compute the contribution of each using the formula, and sum across all relevant A. If an ancestor's F_A is needed, calculate it recursively starting from the earliest generations.
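As a minimal sketch of the formula above, the following Python function sums the contributions of paths that have already been enumerated by hand. The function name `inbreeding_from_paths` and the tuple layout `(n1, n2, F_A)` are illustrative assumptions, not part of Wright's notation, and the path enumeration itself is not automated here.

```python
from fractions import Fraction


def inbreeding_from_paths(paths):
    """Sum Wright's path contributions for one individual.

    `paths` is an iterable of (n1, n2, F_A) tuples, one per distinct path
    through a common ancestor A: n1 and n2 are the numbers of generations
    from each parent to A, and F_A is A's own inbreeding coefficient
    (use 0 when it is unknown).
    """
    total = Fraction(0)
    for n1, n2, f_a in paths:
        total += Fraction(1, 2) ** (n1 + n2 + 1) * (1 + Fraction(f_a))
    return total


# Parents are full siblings: two common ancestors, each one generation back.
full_sib_offspring = inbreeding_from_paths([(1, 1, 0), (1, 1, 0)])

# Hypothetical case: parents are half siblings whose shared parent is
# itself inbred with F_A = 1/4.
half_sib_inbred_ancestor = inbreeding_from_paths([(1, 1, Fraction(1, 4))])

print(full_sib_offspring, half_sib_inbred_ancestor)
```

Exact fractions are used so that the powers of 1/2 can be compared directly with hand calculations; floating-point values would work equally well for practical pedigrees.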
Arrow diagrams help visualize and prevent double-counting by directing arrows from ancestors to descendants, ensuring each path is unique.[1] In pedigrees with complex loops, such as repeated matings, arrow conventions resolve ambiguities by specifying directionality, allowing systematic enumeration of paths without overcounting contributions from the same ancestral alleles.[1]

For a simple derivation in full-sibling mating, where the parents of I are full siblings with unrelated grandparents, the common ancestors are the two grandparents. For each grandparent A, n_1 = 1 (sire to A) and n_2 = 1 (dam to A), with F_A = 0, yielding

\left( \frac{1}{2} \right)^{1+1+1} (1 + 0) = \frac{1}{8}

per ancestor. Summing over the two grandparents gives F_I = 2 \times \frac{1}{8} = \frac{1}{4}.[1]

The method assumes complete and accurate pedigree information and equal transmission probabilities across generations, and it treats founders as unrelated, so that alleles are counted as identical only through descent from traced ancestors. Limitations include challenges with incomplete pedigrees, where setting unknown F_A values to 0 may lead to underestimates of F_I, and computational intensity for deep or branched pedigrees requiring manual path tracing.[1]

Tabular and Computational Methods
The tabular method provides an alternative to path-based approaches for computing inbreeding coefficients by constructing a symmetric matrix of coancestry coefficients (also known as kinship coefficients) among all individuals in the pedigree. To apply this method, one first identifies all relevant ancestors and arranges them in chronological order in a table, then fills the matrix recursively. For base (founder) animals with no known parents, the coancestry between distinct individuals is set to 0 (assuming they are unrelated). The off-diagonal entry for two individuals X and Y, where Y is the younger, is the average of the coancestries of X with each of Y's parents. The self-coancestry (diagonal element) of an individual A is (1 + F_A)/2, which equals 0.5 if F_A = 0. The inbreeding coefficient F_I for an individual I is then the coancestry between its two parents, extracted directly from the corresponding off-diagonal element of the matrix.[16][17]

In quantitative genetics, matrix methods extend this framework using the additive genetic relationship matrix \mathbf{A}, where the diagonal elements satisfy a_{ii} = 1 + F_i, allowing F_i = a_{ii} - 1 once the matrix is constructed. For large pedigrees, \mathbf{A} is computed via recursive algorithms that avoid full matrix inversion by processing individuals sequentially, enabling efficient handling of thousands of entries; direct inversion is reserved for smaller subsets when needed for downstream analyses like BLUP evaluations. These methods scale well for complex structures by incorporating unknown parent groups or phantom parents to approximate base population inbreeding.[18][19]

Several software tools implement these tabular and matrix approaches for practical computation.
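Independently of any particular package, the tabular recursion itself can be sketched in a few lines of Python. The sketch below is illustrative only: the function names `coancestry_table` and `inbreeding`, the `(id, sire, dam)` tuple layout, and the example pedigree are assumptions made for this example, and it stores the full table, so it does not reflect the memory-efficient algorithms used by production software.

```python
def coancestry_table(pedigree):
    """Build the coancestry (kinship) table by the tabular recursion.

    `pedigree` is a list of (id, sire, dam) tuples ordered so that parents
    appear before their offspring; None marks an unknown (founder) parent.
    Founders are assumed to be unrelated and non-inbred.
    """
    table = {}  # (x, y) -> coancestry between x and y

    def lookup(x, y):
        # Coancestry involving an unknown parent is taken as 0.
        if x is None or y is None:
            return 0.0
        return table[(x, y)]

    older = []  # individuals already entered in the table
    for ind, sire, dam in pedigree:
        # Diagonal: (1 + F_ind) / 2, with F_ind the coancestry of the parents.
        table[(ind, ind)] = 0.5 * (1.0 + lookup(sire, dam))
        # Off-diagonal: average of the older individual's coancestry
        # with each parent of the younger individual.
        for other in older:
            value = 0.5 * (lookup(other, sire) + lookup(other, dam))
            table[(ind, other)] = table[(other, ind)] = value
        older.append(ind)
    return table


def inbreeding(pedigree):
    """Return {id: F}, recovering F from the diagonal as 2 * f(i, i) - 1."""
    table = coancestry_table(pedigree)
    return {ind: 2.0 * table[(ind, ind)] - 1.0 for ind, _, _ in pedigree}


# Hypothetical pedigree: founders A and B, full siblings C and D, offspring E.
pedigree = [
    ("A", None, None),
    ("B", None, None),
    ("C", "A", "B"),
    ("D", "A", "B"),
    ("E", "C", "D"),
]
result = inbreeding(pedigree)
print(result["E"])
```

For this pedigree the coancestry between C and D is the entry that determines the inbreeding of E, which matches the statement above that F_I equals the coancestry of the two parents.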
PEDIG, developed for large-scale pedigree analysis, uses recursive algorithms such as those by Meuwissen and Luo (1992) and VanRaden (1992), derived from tabular methods, to calculate inbreeding coefficients and is optimized for populations exceeding 100,000 individuals.[20] CFC employs a tabular algorithm to compute coancestries and inbreeding, with features for ancestral contributions and effective population size estimation, making it suitable for monitoring genetic diversity in livestock.[21] In R, the nadiv package generates the inverse additive relationship matrix \mathbf{A}^{-1} directly, incorporating user-specified base inbreeding and supporting non-additive extensions for efficient processing of pedigrees up to millions of records.[22] These tools enhance scalability for real-world applications compared to manual path tracing.[17][16]

Genomic estimation offers a data-driven proxy for the pedigree inbreeding coefficient by leveraging single nucleotide polymorphism (SNP) arrays to measure realized identity-by-descent (IBD) segments. Tools like PLINK's --ibc option compute three estimators (Fhat1, Fhat2, Fhat3) from genotype homozygosity and allele frequencies, providing robust estimates even with incomplete pedigrees by directly observing genomic sharing rather than relying on ancestral paths. This approach is particularly valuable for wild or conserved populations where pedigree records are sparse.[23]

Tabular and matrix methods, along with their software implementations, offer advantages over foundational path coefficient techniques by systematically handling incomplete or expansive pedigrees without requiring exhaustive path enumeration, thus reducing computational overhead and errors in complex datasets.[17][16]

Examples and Common Values
Pedigree-Based Examples
One common pedigree-based example involves the offspring of full siblings, a case often encountered in selective breeding programs for livestock or plants. Consider a pedigree where two full siblings, designated as sire C and dam D, share common parents (grandparents A and B of the offspring). The offspring E inherits one allele from C and one from D. Using the path coefficient method, the inbreeding coefficient F_E is calculated by identifying paths connecting C and D through their common ancestors, where n counts the individuals in each path (n = n_1 + n_2 + 1). There are two such paths: one via A (C → A → D, n = 3) and one via B (C → B → D, n = 3). Assuming the grandparents are non-inbred (F_A = F_B = 0), each path contributes (1/2)^3 = 0.125, yielding F_E = 0.125 + 0.125 = 0.25.[24] A pedigree diagram for this scenario typically illustrates A and B at the top, connected to C and D below, with E at the bottom linked to C and D, highlighting the looping paths through A and B for step-by-step visualization.

Another illustrative case is the offspring of first cousins, relevant in both human genealogy and animal husbandry. In this pedigree, grandparents A and B produce two full-sibling offspring, P1 and P2. First cousins C (offspring of P1 and an unrelated mate) and D (offspring of P2 and an unrelated mate) then mate to produce E. The path method identifies two paths connecting C and D through their common grandparents: one via A (C → P1 → A → P2 → D, n = 5) and one via B (similarly, n = 5). With non-inbred ancestors, each contributes (1/2)^5 = 0.03125, so F_E = 0.03125 + 0.03125 = 0.0625.[24] The diagram would depict A and B at the top, connected to P1 and P2 below, then branching to C and D, and finally to E, with arrows marking the five-individual paths for clarity in path summation.

In plants capable of self-fertilization, such as many crop species, the first generation of selfed progeny provides a straightforward example of high inbreeding. Here, a non-inbred parent plant (F = 0) produces offspring via self-pollination, where both gametes originate from the same individual. The probability that the two alleles in the progeny are identical by descent is 1/2, because the two gametes carry copies of the same parental allele with probability 1/2, yielding F = 0.5 for the first selfed generation.[12] Subsequent generations of continued selfing increase F according to the recurrence F_t = (1 + F_{t-1})/2; for instance, the second generation has F = 0.75 and the third F = 0.875, approaching 1 asymptotically as homozygosity becomes complete. A pedigree diagram might represent the parent as a single node self-looping to the progeny, with generational lines showing the accumulating inbreeding.

A more complex scenario arises with the offspring of double first cousins, where multiple common ancestors amplify relatedness, as seen in some isolated populations or breeding lines. Double first cousins occur when two full siblings from one family marry two full siblings from another unrelated family; their children share all four grandparents. If these double first cousins mate, their offspring E has paths through two independent pairs of common ancestors (the great-grandparents of E). Each pair contributes like a first-cousin mating (n = 5 per path, 2 × (1/2)^5 = 0.0625 per pair), and with two such pairs, F_E = 2 × 0.0625 = 0.125, equivalent to half the 0.25 relationship coefficient between the parents.[3] The pedigree diagram would show the two founding couples at the top, their children (the two sibling pairs) in the next generation, the two double first cousins below them, and finally E, with looping paths through each of the four founders to emphasize the doubled contributions in path counting.

Table of Standard Coefficients
The coefficient of inbreeding (F) quantifies the probability that two alleles at a locus in an individual are identical by descent from a common ancestor, assuming a non-inbred base population. The following table presents standard F values for common pedigree relationships in animals and humans, derived using path coefficient methods. These values assume no prior inbreeding in the ancestors and are applicable to diploid organisms without self-fertilization. For plants capable of selfing, distinct values apply due to reproductive modes.

| Relationship of Parents | F Value | Brief Path Explanation |
|---|---|---|
| Unrelated | 0 | No common ancestors; random mating baseline. |
| Half-siblings | 0.125 | One shared parent; single path through that parent (exponent 3: (1/2)^3). |
| Full siblings | 0.25 | Two shared parents; two paths, each through one parent (exponent 3 per path: 2 × (1/2)^3). |
| Parent-offspring | 0.25 | The parent is itself the common ancestor (n_1 = 0, n_2 = 1; exponent 2: (1/2)^2). |
| Grandparent-grandchild | 0.125 | The grandparent is itself the common ancestor (n_1 = 0, n_2 = 2; exponent 3: (1/2)^3). |
| Uncle-niece or aunt-nephew | 0.125 | The uncle's or aunt's parents are the niece's or nephew's grandparents; two paths (each exponent 4: 2 × (1/2)^4). |
| First cousins | 0.0625 | Shared grandparents; two paths (each exponent 5: 2 × (1/2)^5). |
| Double first cousins | 0.125 | All four grandparents shared (e.g., sibling pairs marrying); four paths (each exponent 5: 4 × (1/2)^5). |
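
The tabulated values can be reproduced from the path exponents listed in the last column. The short Python sketch below does this under the same assumption as the table, namely non-inbred common ancestors; the names `PATH_EXPONENTS` and `f_from_exponents` are illustrative and not taken from any cited source.

```python
from fractions import Fraction

# Each entry lists the exponent (n1 + n2 + 1) of every path joining the
# two parents through a common ancestor, as given in the table above.
PATH_EXPONENTS = {
    "Unrelated": [],
    "Half-siblings": [3],
    "Full siblings": [3, 3],
    "Parent-offspring": [2],
    "Grandparent-grandchild": [3],
    "Uncle-niece or aunt-nephew": [4, 4],
    "First cousins": [5, 5],
    "Double first cousins": [5, 5, 5, 5],
}


def f_from_exponents(exponents):
    """Sum (1/2)^k over all paths, assuming non-inbred common ancestors."""
    return sum((Fraction(1, 2) ** k for k in exponents), Fraction(0))


standard_f = {rel: f_from_exponents(ks) for rel, ks in PATH_EXPONENTS.items()}

for rel, value in standard_f.items():
    print(f"{rel}: {float(value)}")
```

If a common ancestor were itself inbred, each of its terms would need the additional factor (1 + F_A), which this sketch deliberately omits.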