Polygenic score
A polygenic score (PGS), also called a polygenic risk score (PRS), is a genomic estimate of an individual's genetic predisposition to a complex trait or disease, computed by summing the effects of many single-nucleotide polymorphisms (SNPs), each weighted by its association with the trait as estimated in genome-wide association studies (GWAS).[1] The summation assumes an additive genetic model, under which the score approximates the narrow-sense heritability captured by common variants and yields probabilistic predictions of phenotype from genotype alone.[2]

PGS have achieved substantial predictive power for some quantitative traits. For adult height, scores derived from large-scale GWAS explain up to 38-40% of interindividual variance in European-ancestry samples, far exceeding earlier monogenic approaches and permitting direct validation against observed phenotypes.[3] For polygenic diseases such as coronary artery disease and type 2 diabetes, PRS add genetic liability information to conventional risk models, with demonstrated improvements in net reclassification indices and area-under-the-curve metrics in validation cohorts.[4]

These successes notwithstanding, PGS face inherent constraints: for most endpoints they explain only a modest share of liability (frequently under 10%, height being an outlier), and their accuracy varies markedly across ancestries owing to differences in linkage disequilibrium structure and allele frequency spectra between populations.[5] Such transferability deficits, documented in diverse biobanks, underscore the need for ancestry-inclusive GWAS to reduce predictive bias. Applications to behavioral phenotypes such as educational attainment, where scores explain 12-16% of variance, highlight genetic contributions amid pervasive gene-environment interplay, but have also drawn scrutiny over interpretive overreach and societal ramifications.[6][7][8]

Definition and Fundamentals
Core Definition and Principles
A polygenic score (PGS), often termed a polygenic risk score (PRS) when applied to disease outcomes, quantifies an individual's genetic predisposition to a complex trait or condition by summing the effects of numerous genetic variants, typically single nucleotide polymorphisms (SNPs), across the genome.[9] These variants are selected and weighted based on their statistical associations with the trait, as identified in large-scale genome-wide association studies (GWAS).[10] Unlike monogenic tests focused on rare, high-impact mutations, PGS capture the cumulative influence of common variants with small individual effects, reflecting the polygenic architecture underlying most heritable human phenotypes.[11]

The core construction principle involves deriving effect size estimates (\hat{\beta}_j) for each variant j from GWAS regression analyses, which model the trait as a function of genotype dosages while adjusting for population structure and other confounders.[12] For an individual, the PGS (\hat{S}) is then computed as \hat{S} = \sum_{j=1}^{m} X_j \hat{\beta}_j, where X_j represents the genotype dosage (0, 1, or 2 under an additive model of allele count) and m is the number of included variants, often pruned for linkage disequilibrium to avoid redundancy.[12] This linear aggregation assumes additive genetic effects without epistasis or dominance, so the score predicts relative risk against a reference population rather than absolute incidence, as environmental factors and gene-environment interactions remain unaccounted for.[13]

Fundamental principles hinge on heritability: traits with higher heritability yield more predictive PGS. Height (twin-study heritability of roughly 80%) is the canonical example, whereas for disorders such as schizophrenia common variants capture only about 20-30% of liability despite similarly high twin-based heritability; the fraction of phenotypic variance a score explains grows with GWAS sample size and variant discovery power.[14] Transferability across ancestries is limited by allele frequency differences and linkage disequilibrium patterns, 
necessitating ancestry-matched reference data for accurate application.[15] PGS thus serve as probabilistic genomic summaries, with predictive utility scaling with discovery cohort size (in simulations, doubling the effective sample size can increase explained variance by up to 50%), but they do not imply determinism, as scores typically account for less than 20% of trait variance even in optimized cases.[9]

Historical Origins and Key Milestones
The concept of polygenic scores originates in the foundations of quantitative genetics, articulated by Ronald A. Fisher in 1918, who proposed the infinitesimal model whereby continuous traits arise from the additive effects of innumerable genetic loci, each contributing a small increment to phenotypic variation. This framework shifted understanding from Mendelian single-gene inheritance to polygenic inheritance as the norm for complex traits, enabling estimates of heritability from family and twin studies without molecular data.

The methodological precursor to modern polygenic scores emerged in agricultural genetics through genomic selection. In 2001, Theo H.E. Meuwissen, Ben J. Hayes, and Mike E. Goddard demonstrated the feasibility of predicting total genetic merit in livestock from dense genome-wide markers, aggregating effects across thousands of single nucleotide polymorphisms (SNPs) via statistical models rather than identifying causal variants individually. This approach, termed genomic estimated breeding values, leveraged linkage disequilibrium and population-level data to achieve practical predictive accuracy, setting the stage for applications beyond breeding.

The transition to human genetics coincided with the rise of genome-wide association studies (GWAS) in the mid-2000s. A pivotal 2007 paper by Naomi R. Wray, Mike E. Goddard, and Peter M. Visscher explicitly proposed polygenic risk prediction for human diseases, arguing that GWAS summary statistics could be used to sum weighted effects across many tested SNPs, including those below conventional significance thresholds, to capture a cumulative polygenic liability far exceeding single-locus risks. This marked the formal inception of polygenic scores (or risk scores) in human contexts, emphasizing their utility for stratifying individual risk in polygenic disorders.
Key empirical validation followed swiftly: the International Schizophrenia Consortium's 2009 study constructed the first genome-wide polygenic risk scores for schizophrenia, using discovery GWAS data to predict case-control status in independent cohorts with modest but significant accuracy (explaining ~3% of liability), confirming the polygenic architecture and cross-sample transferability. Concurrently, early applications appeared for traits like height and prostate cancer, though limited by nascent GWAS sample sizes; by 2010, genome-wide analyses indicated that common SNPs jointly account for roughly 40% of the heritable variation in height among Europeans, even though scores built from genome-wide significant hits alone predicted far less. These milestones underscored the shift from hypothesis-driven candidate genes to data-driven, genome-wide aggregation, fueling rapid methodological refinements amid expanding genomic datasets.

Methodological Foundations
Construction via Genome-Wide Association Studies
Polygenic scores, also known as polygenic risk scores (PRS), are constructed primarily from summary statistics obtained through genome-wide association studies (GWAS), which scan millions of single nucleotide polymorphisms (SNPs) across the genome to identify those statistically associated with a quantitative trait or disease risk. In a GWAS, linear or logistic regression models, assuming additive genetic effects, estimate the per-allele effect size \hat{\beta}_j for each SNP j, along with standard errors and p-values, using data from a large discovery cohort. These effect sizes represent the change in phenotype per additional copy of the effect allele.[16][17]

The core computation of a PRS for a target individual aggregates the weighted contributions of selected SNPs from the GWAS summary statistics. The standard formula is \hat{S} = \sum_{j=1}^{m} X_j \hat{\beta}_j, where X_j is the genotype dosage (0, 1, or 2, indicating the number of effect alleles at SNP j) and \hat{\beta}_j is the GWAS-derived effect size; for binary traits, \hat{\beta}_j approximates the log-odds ratio. Genotype dosages for target individuals are typically obtained from imputed or genotyped data aligned to the same reference genome build as the GWAS. This summation assumes independence among SNPs or accounts for linkage disequilibrium (LD) through preprocessing.[16][17]

SNP selection is critical to balance signal inclusion and noise reduction. The most common approach, clumping and thresholding (C+T), first prunes SNPs in high LD, using reference panels such as 1000 Genomes to retain the most significant SNP per LD block (e.g., r^2 < 0.1 within 250 kb windows), then applies p-value thresholds (e.g., P < 5 \times 10^{-8}, or a spectrum from P < 0.05 down to genome-wide significance) to select the included variants. This reduces multicollinearity and overfitting, though it may discard weak signals.
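The clumping-and-thresholding computation described above can be sketched in a few lines. The following is a minimal illustration with made-up summary statistics and a toy r^2 matrix standing in for an LD reference panel; real pipelines operate on millions of SNPs with proper LD references, and all numbers here are invented for demonstration.

```python
import numpy as np

# Toy GWAS summary statistics for five SNPs (illustrative values only).
beta = np.array([0.10, -0.05, 0.08, 0.02, 0.12])   # per-allele effect sizes
pval = np.array([1e-4, 0.01, 0.02, 0.30, 1e-6])    # association p-values

# Pairwise r^2, standing in for an LD reference: SNPs 0 and 2 are correlated.
r2 = np.eye(5)
r2[0, 2] = r2[2, 0] = 0.6

def clump_and_threshold(beta, pval, r2, p_cut=0.05, r2_cut=0.1):
    """Greedy C+T: visit SNPs from most to least significant; keep a SNP
    only if it passes the p-value cut and is not in high LD with any
    SNP already kept."""
    kept = []
    for j in np.argsort(pval):
        if pval[j] >= p_cut:
            break  # all remaining SNPs are even less significant
        if all(r2[j, k] < r2_cut for k in kept):
            kept.append(int(j))
    return sorted(kept)

kept = clump_and_threshold(beta, pval, r2)  # SNP 2 clumped away; SNP 3 fails the p cut

# Genotype dosages (0/1/2 effect-allele counts) for one target individual.
x = np.array([2, 1, 0, 2, 1])

# PRS: weighted sum of dosages over the retained SNPs.
score = sum(x[j] * beta[j] for j in kept)
print(kept, round(score, 2))   # [0, 1, 4] 0.27
```

Real analyses additionally harmonize effect alleles between the GWAS and target data and draw r^2 values from an external LD reference panel rather than a hand-specified matrix.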
Software tools such as PRSice-2 automate C+T, processing summary statistics and target genotypes to output PRS values, often evaluating multiple thresholds for optimal predictive performance.[16]

Advanced methods enhance construction by modeling LD and sparsity. Bayesian approaches like LDpred assume a fraction of SNPs (e.g., 1-10%) are causal and shrink effect sizes toward zero using a point-normal prior, integrating MCMC sampling over LD patterns from reference data; parameters such as the causal proportion \pi are tuned via cross-validation or summary-statistic-based proxies. These methods outperform C+T in simulations and real data when discovery sample sizes exceed 100,000, capturing polygenic architecture more accurately.

Preconstruction quality control ensures allele harmonization (matching effect alleles), exclusion of ambiguous palindromic SNPs, and removal of population stratification artifacts, with discovery-target sample overlap minimized to avoid overfitting. Larger, more diverse GWAS (e.g., effective N > 1 million) yield robust weights, as demonstrated in traits like height, where PRS explain up to 20-40% of variance.[17][16]

Statistical Weighting and Modeling Techniques
Polygenic scores aggregate the effects of multiple genetic variants by assigning weights corresponding to their estimated associations with the target trait, typically beta coefficients (\hat{\beta}_j) from regression models in genome-wide association studies (GWAS).[16] These weights reflect the per-allele change in the trait, and scores are often standardized (accounting for allele frequency) to unit variance for comparability.[5] The resulting score for an individual is computed as \hat{S} = \sum_{j=1}^{m} X_j \hat{\beta}_j, where X_j denotes the genotype dosage (0, 1, or 2 copies of the effect allele) across m selected variants, assuming additive effects under a linear model.[16]

To address overfitting from finite GWAS sample sizes and linkage disequilibrium (LD) among variants, which can inflate weights for correlated SNPs, basic construction often employs pruning and thresholding (P+T).[16] In P+T, SNPs are first pruned to retain those with the strongest associations while removing others within a defined LD window (e.g., r^2 < 0.1 within 250 kb), after which only SNPs surpassing a p-value threshold (e.g., p < 0.05 or p < 5 \times 10^{-8}) are included.[16] This heuristic reduces noise but discards potentially useful signals from non-significant variants, limiting predictive accuracy in polygenic architectures where many effect sizes are small.[18]

Advanced Bayesian methods improve the weights by placing prior distributions on effect sizes that shrink estimates toward zero, accounting for polygenicity and for LD patterns estimated from a reference panel.[19] LDpred, introduced in 2015 and refined as LDpred2 in 2020, models SNP effects either under an infinitesimal prior (all variants causal with small effects) or as a point-normal mixture of null and causal effects, using MCMC or variational inference to compute posterior mean effect sizes as weights; empirical evaluations show it outperforming P+T by 10-50% in explained variance for traits like height and schizophrenia in European-ancestry cohorts.[20] Similarly, PRS-CS (2019) places continuous shrinkage priors on effect sizes and fits them blockwise to genome-wide summary statistics with an efficient Gibbs sampler, enabling millions of variants to be handled and yielding better transferability across ancestries than LDpred in simulations and real data for diseases like type 2 diabetes.[21]

Other modeling approaches include sparse Bayesian regression variants such as SBayesR, which partitions effect sizes into a mixture of normal distributions for multi-component shrinkage, and emerging machine learning techniques such as deep learning-based PRS that aim to capture non-linear interactions, though these remain computationally intensive and less validated for out-of-sample prediction as of 2024.[19][22] Parameter tuning in these methods, such as the proportion of causal variants in LDpred or the shrinkage intensity in PRS-CS, critically influences performance; recent frameworks like PRStuning (2024) optimize these parameters from GWAS summary statistics alone, estimating out-of-sample R² without individual-level target cohort data.[17] Despite these advances, all techniques assume the GWAS betas are unbiased estimators, yet winner's curse and population stratification can bias weights upward, necessitating large discovery samples (e.g., >100,000 individuals) for robust scores.[23]

Ancestry-Specific Adjustments and Transferability Issues
Polygenic scores (PGS) derived predominantly from genome-wide association studies (GWAS) in European-ancestry populations exhibit reduced predictive accuracy when applied to individuals of non-European ancestries, primarily due to differences in linkage disequilibrium (LD) patterns, allele frequencies, and underlying genetic architectures across populations.[15][24] For instance, LD and minor allele frequency (MAF) disparities between European and African ancestries account for 70-80% of the observed loss in PGS relative accuracy for various traits.[24] This transferability gap manifests as a systematic decline in explained variance, with PGS accuracy decreasing by an average of 14% across 84 traits when evaluated along a genetic ancestry continuum away from the training data.[15]

Environmental and demographic factors further complicate transferability, as gene-environment interactions and population stratification can introduce biases not captured by ancestry principal components alone.[25] Studies of diseases such as coronary artery disease and Alzheimer disease demonstrate that European-derived PGS retain only partial efficacy in East Asian, South Asian, or African cohorts, often explaining less than half the variance achieved in within-ancestry applications.[26][27] Multi-ancestry evaluations for 14 common conditions confirm this pattern, showing superior performance of ancestry-matched PGS over transferred models in African, European, East Asian, and South Asian groups.[28]

To mitigate these issues, ancestry-specific adjustments involve constructing PGS from GWAS summary statistics tailored to the target population's genetic background, for example through ancestry-stratified analyses or multi-ancestry meta-GWAS that harmonize effect sizes while accounting for heterogeneity.[29] Methods such as explicit ancestry modeling via principal component regression, or Bayesian approaches like PRS-CS, enhance portability by incorporating local LD reference panels and adjusting weights for admixed populations, yielding up to 20-30% improvements in non-European prediction accuracy for traits like body mass index.[30][31] Leveraging diverse global GWAS resources, such as combining European and non-European data, further boosts PGS performance in underrepresented groups by aligning effect estimates more closely with the causal variants segregating in target ancestries.[32] However, persistent challenges include the scarcity of large-scale non-European GWAS (African-ancestry samples, for example, remain underrepresented by orders of magnitude), necessitating ongoing data diversification to achieve equitable PGS utility.[33][34]

Empirical Predictive Power
Performance in Disease Risk Assessment
Polygenic risk scores (PRSs) exhibit modest predictive performance for common diseases, typically yielding area under the receiver operating characteristic curve (AUC) values of 0.55 to 0.70 in independent validation cohorts, reflecting their capture of a limited portion of genetic liability amid multifactorial etiology.[35] When integrated with established clinical risk models, PRSs provide incremental improvements, such as a 10% average increase in AUC across diseases including breast cancer, type 2 diabetes, and coronary artery disease (CAD).[35] However, standalone PRSs often underperform in population-level screening, with median detection rates of 11% at a 5% false positive rate, meaning 89% of cases are missed, and post-test odds that confer only marginal elevation over baseline risk (e.g., 1:8 versus 1:19 for CAD at age 50).[36]

For CAD, a 2023 multi-ancestry PRS demonstrated a hazard ratio of 1.73 (95% CI 1.70-1.76) per standard deviation increment in UK Biobank data, with individuals in the top PRS percentile facing 11.7% incident risk compared to 1.1% in the bottom percentile; adding the score to the Pooled Cohort Equations raised the C-index from 0.739 to 0.763 and yielded a net reclassification improvement of 7.0%.[37] Performance holds across ancestries, with odds ratios per standard deviation of 1.25 in African, 1.72 in East Asian, 1.74 in European, and 1.62 in South Asian groups, though absolute gains remain tempered by environmental confounders.[37]

In type 2 diabetes, PRSs achieve AUCs from 0.576 to 0.795, raising clinical model AUC from 0.699 to 0.74 when BMI is excluded, while explaining 10-15% of variance. Breast cancer PRSs range from 0.596 to 0.68 in AUC, with similar additive effects in models like BOADICEA (0.691 to 0.704); prostate cancer PRSs yield AUCs of 0.640-0.68 and odds ratios up to 2.9 in high-risk strata; Parkinson's disease PRSs span 0.616-0.692 in AUC but explain only 5-10% of variance.[35] These metrics underscore the role of PRSs in refining risk within intermediate categories rather than supplanting traditional factors, with efficacy scaling alongside GWAS sample sizes but constrained by incomplete heritability capture and ancestry-specific portability.[35][36] Consistent with this, larger discovery samples yield more accurate PRSs across traits, including diseases, as evidenced by converging prediction metrics.[38]
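The link between liability variance explained and the AUC values quoted in this section can be illustrated with a small simulation under the liability threshold model. The sketch below uses assumed parameters (a PRS explaining 10% of liability variance, 5% disease prevalence) rather than reproducing any cited analysis; under these assumptions the AUC lands near the top of the 0.55-0.70 range typical of standalone PRSs.

```python
import numpy as np

# Liability threshold model: disease occurs when latent liability,
# the sum of a PRS component and a residual, exceeds a threshold set
# by prevalence. All parameters below are assumptions for illustration.
rng = np.random.default_rng(42)
n, r2, prevalence = 200_000, 0.10, 0.05  # cohort size, PRS R^2 on liability, prevalence

prs = rng.normal(0.0, np.sqrt(r2), n)        # PRS explains r2 of liability variance
env = rng.normal(0.0, np.sqrt(1 - r2), n)    # residual (environment + unmeasured genetics)
liability = prs + env                         # total liability variance = 1
case = liability > np.quantile(liability, 1 - prevalence)

def auc(scores, labels):
    """AUC via the Mann-Whitney U statistic: the probability that a
    randomly chosen case has a higher score than a random control."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_case = int(labels.sum())
    n_ctrl = len(labels) - n_case
    u = ranks[labels].sum() - n_case * (n_case + 1) / 2
    return u / (n_case * n_ctrl)

a = auc(prs, case)
print(round(a, 3))   # roughly 0.69 for these parameters
```

Increasing r2 in this simulation raises the AUC only gradually, which is one reason large gains in GWAS power translate into incremental rather than dramatic improvements in clinical discrimination.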