World Inequality Database
The World Inequality Database (WID) is an open-access repository providing historical data on the global distribution of income and wealth, covering over 100 countries and regions from the early 19th century to the present, with estimates of metrics such as top income and wealth shares derived from integrated sources including national accounts, household surveys, fiscal records, and wealth rankings.[1] Launched as the successor to the World Top Incomes Database in 2017, it is maintained by the World Inequality Lab, an international consortium of over 100 researchers coordinated by economists Facundo Alvaredo (income data), Gabriel Zucman (wealth data), Thomas Piketty, Emmanuel Saez, and others, with the aim of enabling transparent, long-run analysis of economic inequality to inform policy debates.[2][3] The database employs the Distributional National Accounts (DINA) framework, which distributes aggregate national income and wealth across population deciles and percentiles, addressing limitations in traditional surveys—such as underreporting of top incomes—by prioritizing fiscal data for high-end estimates while anchoring to comprehensive national totals.[4] This approach has facilitated influential publications, including the World Inequality Report 2022, which synthesized WID data to argue for progressive taxation amid rising global wealth concentration, influencing discussions on redistribution in academic and public spheres.[5] However, the methodology has drawn criticism for relying on sparse or inconsistent historical tax records, potentially leading to unreliable extrapolations of top shares, as noted in peer-reviewed analyses questioning the superiority of fiscal data over survey-based alternatives in certain contexts.[6][7] Despite such debates, WID remains a primary resource for inequality researchers due to its breadth and code transparency via GitHub, though users are cautioned to verify country-specific assumptions given varying data quality across 
regions.[4]
Overview
Purpose and Scope
The World Inequality Database (WID) functions as an open-access repository aggregating extensive historical data on the distribution of income and wealth, both within countries and at global scales, to support empirical analysis of economic disparities. Its core purpose is to deliver transparent, harmonized series that reveal long-term trends, such as rising top income shares in many nations since the 1980s, by combining national accounts totals with distributional evidence from fiscal records, surveys, and other sources to overcome limitations in isolated datasets.[3][4]
In scope, WID covers pre-tax and post-tax income distributions, including thresholds, averages, and shares for percentiles from the bottom 50% to the top 1% and 0.1%, alongside wealth metrics like total holdings, top shares, and wealth-to-income ratios. It encompasses over 100 countries and regions (spanning Europe, North America, Asia, Latin America, and Africa, plus sub-national units such as U.S. states or urban-rural divides in China), with time series often extending from the early 20th century or earlier (e.g., global top 10% income shares tracked from 1820 to 2020) up to recent years like 2023.[3][8][9]
The database originally focused on top incomes and wealth; its 2018 rebranding from the World Wealth and Income Database reflected ambitions to broaden beyond these to dimensions like gender and environmental inequalities, though income and wealth series predominate and form the basis for initiatives such as Distributional National Accounts. This expansion prioritizes methodological consistency for cross-country comparability, with explicit documentation enabling user verification and contributions amid ongoing refinements to data for regions with sparse records.[10][11]
Organizational Context
The World Inequality Database (WID) is developed and maintained by the World Inequality Lab (WIL), an academic consortium dedicated to research on economic inequality. The Lab is primarily hosted at the Paris School of Economics (PSE) in France, with key affiliations at the University of California, Berkeley, reflecting a transatlantic academic structure.[12][13][14]
WIL's executive committee comprises five co-directors: Facundo Alvaredo (PSE, University of Oxford, and Instituto Interdisciplinario de Economía Política de Buenos Aires), Thomas Piketty (PSE), Emmanuel Saez (UC Berkeley), Gabriel Zucman (UC Berkeley), and Lucas Chancel (PSE). These economists, recognized for pioneering work on historical income and wealth distributions using fiscal and national accounts data, oversee the project's coordination and methodological standards.[2]
The core team includes about 40 members, encompassing co-directors, project coordinators, research fellows, assistants, and administrative staff, supported by over 200 WID Fellows affiliated with global institutions. This network extends to more than 90 researchers across nearly 70 countries, facilitating data contributions and validation through collaborative efforts grounded in the Distributional National Accounts (DINA) guidelines.[13][2][14]
As an open-access initiative, WIL operates without formal corporate or governmental funding dependencies, relying instead on academic grants and institutional support from entities like PSE and CNRS. Its directors' advocacy for progressive fiscal policies (such as global wealth taxes) has nonetheless drawn scrutiny for potential interpretive biases in inequality narratives amid academia's documented left-leaning orientations.[12][15]
Historical Development
Predecessors and Foundations
The methodological foundations of the World Inequality Database trace back to pioneering research in the late 1990s and early 2000s that revived the use of historical administrative tax records to estimate income and wealth concentration, addressing limitations in household surveys that often underreport top earners. Thomas Piketty's analysis of French income inequality from 1901 to 1998, published in 2001, demonstrated how fiscal data tabulations could reconstruct long-term trends in top income shares, revealing patterns of rising inequality post-World War II followed by compression and resurgence. Similar approaches were applied to the United States by Piketty and Emmanuel Saez in 2003, estimating top income shares using IRS data back to 1913, which highlighted the U-shaped trajectory of inequality over the 20th century. Anthony B. Atkinson's contemporaneous work on the United Kingdom and other European countries further established this fiscal data paradigm, building on Simon Kuznets' earlier 1950s framework but correcting for postwar data gaps and biases in national accounts. These country-specific studies laid the groundwork for systematic international comparison, as researchers recognized the need to aggregate comparable series across nations to analyze global inequality dynamics. 
By the mid-2000s, collaborative efforts had produced harmonized top income estimates for over a dozen advanced economies, emphasizing pre-tax income distributions and the role of progressive taxation in shaping inequality.[16] This accumulation of evidence challenged reliance on Gini coefficients from surveys, which academic critiques noted systematically underestimate top shares due to non-response and underreporting among the wealthy.[11]
The direct predecessor to the World Inequality Database was the World Top Incomes Database (WTID), launched in January 2011 to centralize and freely disseminate these top income series from approximately 30 countries, covering periods from the early 20th century onward.[3] The WTID facilitated user-friendly access to raw and processed data, enabling cross-country analyses that revealed convergent declines in top income shares during mid-20th-century shocks like wars and policy reforms, followed by divergences in the neoliberal era.[3] Its creation stemmed from the collaborative input of over a hundred researchers, prioritizing transparency in data sources (primarily tax returns and national accounts) over opaque survey aggregates. This foundation emphasized causal links between institutional changes, such as tax policy shifts, and distributional outcomes, providing a benchmark for subsequent expansions into wealth and bottom-end distributions.[16]
Launch and Key Milestones
The World Top Incomes Database (WTID), the direct predecessor to the World Inequality Database, was launched in January 2011 by economists Facundo Alvaredo, Anthony B. Atkinson, Thomas Piketty, and Emmanuel Saez to provide open access to historical data on top income shares across multiple countries, building on earlier national-level studies of income concentration.[3][2] This initiative aggregated fiscal, survey, and national accounts data, initially covering around 30 countries with series extending back to the early 20th century in some cases, such as France and the United States.[4] In 2015, the database expanded to include aggregate national wealth series, incorporating balance sheet data alongside income metrics to track the joint evolution of income and wealth inequality, with coverage growing to dozens of countries.[4] This extension addressed prior limitations in wealth measurement, drawing from household surveys, inheritance records, and financial asset valuations, though it highlighted challenges in underreporting of offshore assets and non-financial holdings in developing economies.[4] WID.world, the relaunched and broadened platform, officially debuted on January 9, 2017, succeeding the WTID with an interactive website featuring data on income and wealth distributions for over 70 countries, including developing nations like China, India, and South Africa, and extending analysis beyond top earners to the full income spectrum.[17] The launch, coordinated by the same core team plus Gabriel Zucman, coincided with presentations at the American Economic Association conference and followed the heightened interest spurred by Piketty's 2014 analysis of rising top income shares.[17][2] Between 2016 and 2019, key methodological advancements included the integration of Distributional National Accounts (DINA) guidelines, enabling estimates of pre- and post-tax income and wealth for the entire population in over 100 countries or regions.[4] Subsequent milestones 
encompass annual macro updates, such as the 2024 revision extending series to 2023 with new government expenditure breakdowns, and the September 2025 release of a global wealth accumulation database spanning 1800 to 2025, revealing wealth-income ratios rising from 390% of net domestic product in 1980 to over 625% by 2025.[18][19]
Methodology
Data Sources and Integration
The World Inequality Database (WID.world) primarily draws from national accounts aligned with System of National Accounts (SNA) 2008 standards, fiscal data including income tax tabulations and micro-files, household survey microdata and tabulations, and wealth data such as rich lists and inheritance records.[11][20] Supplementary sources include datasets from the International Monetary Fund (IMF), the Organisation for Economic Co-operation and Development (OECD), the United Nations National Accounts Main Aggregates and Detailed Tables (UN MADT), and Eurostat, alongside country-specific research contributions.[11] These inputs enable coverage of pre-tax national income concepts, such as factor incomes, replacement incomes, and capital incomes, while addressing gaps in individual sources like survey underreporting of top incomes.[21]
Integration occurs through the Distributional National Accounts (DINA) framework, which harmonizes disparate sources by rescaling fiscal and survey data to match aggregate national accounts totals, ensuring consistency between micro-level distributions and macro-level benchmarks like gross domestic product (GDP) adjusted for depreciation and net foreign income, that is, net national income.[11][20] This process involves two steps. First, overlapping series are spliced: the average discrepancy between them is calculated as a fraction of GDP over the shared period, and the lower-priority series is corrected by that amount. Second, missing elements are imputed, such as consumption of fixed capital via log-log regressions on GDP per capita, or social contributions proportionally.[11] Standardization applies uniform units, such as equal-split adults for household equivalization, and deflates series using GDP deflators for real-term comparability.[11] Fiscal data informs top income and wealth estimates, surveys provide middle and bottom distributions, and national accounts anchor totals, with accounting identities enforced via quadratic programming tools like the Stata enforce command to minimize residuals.[11]
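The splicing of overlapping series described above can be illustrated with a minimal sketch. It assumes two hypothetical series stored as year-to-value dictionaries and a matching GDP series; the function name `splice` and all figures are invented for illustration, and the actual WID code may weight or smooth the correction differently.

```python
def splice(primary, secondary, gdp):
    """Extend `primary` with years found only in `secondary`.

    The lower-priority `secondary` series is corrected by the average gap
    between the two series, measured as a fraction of GDP over shared years.
    """
    overlap = sorted(set(primary) & set(secondary))
    if not overlap:
        raise ValueError("series must share at least one year")

    # Average discrepancy over the shared period, as a fraction of GDP.
    gap = sum((primary[y] - secondary[y]) / gdp[y] for y in overlap) / len(overlap)

    spliced = dict(primary)
    for year, value in secondary.items():
        if year not in spliced:
            # Shift the lower-priority value by the GDP-scaled average gap.
            spliced[year] = value + gap * gdp[year]
    return dict(sorted(spliced.items()))


# Hypothetical figures: the secondary source sits 2% of GDP below the primary.
gdp = {2000: 100.0, 2001: 110.0, 2002: 120.0, 2003: 130.0}
primary = {2002: 24.0, 2003: 26.0}
secondary = {2000: 18.0, 2001: 19.8, 2002: 21.6, 2003: 23.4}
result = splice(primary, secondary, gdp)
```

Expressing the gap as a fraction of GDP, rather than in currency units, keeps the correction proportionate when the economy was much smaller in the years being backfilled.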
For top shares, where direct data is sparse, integration employs interpolation techniques like generalized Pareto curves and copula-based methods to extend tax tabulations, alongside extrapolation using synthetic micro-files that normalize distributions to a mean of unity for cross-country alignment.[11][20] Wealth distributions are derived via income capitalization, assuming asset-specific returns to convert capital income flows into stock estimates, supplemented by survey data for non-taxable assets like owner-occupied housing and adjustments for offshore holdings based on external studies.[20] Undistributed profits are allocated to shareholders proportionally, taxes to factor income bearers, and in-kind transfers via lump-sum or proportional rules, with full processes documented in open-source code repositories for reproducibility.[11] This multi-source approach yields g-percentile series (e.g., 127 brackets) but introduces estimation uncertainties, particularly for historical or non-reporting periods, mitigated through sensitivity analyses and methodological updates.[11][20]
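As a rough illustration of Pareto-based interpolation, the sketch below fits a single Pareto coefficient to two tabulated top shares and uses it to estimate a share in between. This is the classic constant-coefficient Pareto law, a deliberate simplification of the generalized Pareto curves that WID uses (which let the coefficient vary along the distribution). The function names and the input shares are hypothetical.

```python
import math


def pareto_alpha(q_a, s_a, q_b, s_b):
    """Pareto coefficient implied by two (top fraction, income share) points.

    Under a Pareto law the share of the top fraction q is proportional to
    q ** (1 - 1/alpha), so the exponent is the slope in log-log space.
    """
    exponent = math.log(s_b / s_a) / math.log(q_b / q_a)
    return 1.0 / (1.0 - exponent)


def interpolate_share(q, q_ref, s_ref, alpha):
    """Share of the top fraction q, anchored on a known point (q_ref, s_ref)."""
    return s_ref * (q / q_ref) ** (1.0 - 1.0 / alpha)


# Hypothetical tabulation: top 10% holds 40% of income, top 1% holds 15%.
alpha = pareto_alpha(0.10, 0.40, 0.01, 0.15)
inverted_b = alpha / (alpha - 1.0)  # mean income above a threshold / threshold
top5_share = interpolate_share(0.05, 0.10, 0.40, alpha)
```

A lower `alpha` (equivalently a higher inverted coefficient `inverted_b`) means a fatter upper tail, so the same tabulated points imply more income concentrated at the very top.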
Estimation Methods for Top Incomes and Wealth
The World Inequality Database (WID) employs the Distributional National Accounts (DINA) framework to estimate top income shares, integrating fiscal data from income tax tabulations with national accounts totals to ensure macroeconomic consistency.[4] Income tax data, which often capture detailed upper-tail information, are adjusted for underreporting—such as evasion or non-filing—through scaling factors derived from audits or cross-country comparisons, and extrapolated using generalized Pareto curves to model the distribution beyond observed thresholds.[16] These curves assume a power-law tail behavior, parameterized by shape and threshold estimates from available fiscal records, allowing interpolation between percentiles (e.g., from the top 1% to 0.1%) and anchoring to pre-tax national income aggregates like GDP minus consumption of fixed capital plus net foreign income.[11] Retained corporate earnings and other missing components are proportionally allocated to top earners based on their observed shares in fiscal data or reduced-form models.[11] For countries with limited fiscal coverage, WID splices survey data with tax tabulations using overlapping periods, applying functions like survey-to-fiscal corrections (e.g., c_2(p) = 1 + \sigma p^{1/\gamma}) to address survey underrepresentation of top incomes, which typically underestimate top 1% shares by factors of 2–3 compared to tax records.[11] The gpinter tool facilitates this by generating synthetic distributions from grouped data, enforcing consistency via quadratic programming to minimize deviations from national totals.[11] Historical series, extending back to the early 20th century in many cases, build on tabulation-based approaches pioneered in works like those of Kuznets but updated with microdata where available.[4] Top wealth shares in WID are estimated using a sparser set of sources, including inheritance tax records via the estate multiplier method, which reconstructs wealth distributions from decedent 
estates adjusted by age-specific mortality rates to approximate living populations.[16] This method, originally formalized by Atkinson and Harrison, weights each observed estate by the inverse of the mortality rate for the decedent's age group (a multiplier of 1/mortality rate) and interpolates the upper tail with Pareto assumptions, though it requires corrections for incomplete reporting and valuation biases in historical data.[16] Wealth surveys, such as the U.S. Survey of Consumer Finances, provide benchmarks but are linked to administrative data for top-end accuracy, while billionaire rankings from sources like Forbes calibrate the extreme tail (top 0.001%), assuming Pareto extrapolation for shares between survey cutoffs and ranking thresholds.[4] National balance sheets from accounts supply aggregate wealth stocks, distributed via these micro-level inputs, with imputations for data gaps using weighted averages from peer countries or growth extrapolations.[22]
Wealth estimates incorporate flow reconciliations, such as capital gains imputed from asset price indices and savings rates, to align stock changes with income flows under DINA principles.[11] Limitations include greater reliance on assumptions for wealth than for income, because direct wealth taxation is rarer (only a few countries maintain ongoing wealth taxes). This leads to higher uncertainty in emerging economies, where surveys dominate but undercapture assets like real estate held offshore.[4] Recent advancements, as of 2025, include estate-based refinements and global imputations, but critics note potential overestimation if Pareto tails prove too steep or underreporting adjustments insufficient for hidden wealth in tax havens.[23][16]
Assumptions, Adjustments, and Known Limitations
The World Inequality Database (WID) employs the Distributional National Accounts (DINA) framework, which assumes net national income—adjusted for depreciation and net foreign income—as the primary aggregate for distribution, rather than gross domestic product, to better reflect household resources.[4] It further assumes an "equal-split adults" unit of analysis, where income and wealth within couples are divided equally among adults aged 20 and older, though alternatives like individualistic attribution exist for sensitivity checks.[11] Proportional allocation is assumed for undistributed elements such as missing capital incomes or indirect taxes, unless country-specific data indicates otherwise, and fixed rates of return are applied in wealth capitalization absent better evidence.[16] Adjustments to raw data emphasize comparability across sources: fiscal records are corrected for underreporting by rescaling to national accounts totals and imputing offshore wealth using profit-shifting estimates, while surveys are reweighted and expanded at the top to align with macro aggregates via methods like generalized Pareto interpolation.[4] For top income shares, fiscal tabulations are stitched with survey microdata, and wealth estimates incorporate adjustments for tax-exempt assets and foreign holdings, often drawing on estate tax multipliers or income-wealth correlations.[11] These harmonizations use open-source codes for transparency, such as those enforcing consistency between micro- and macro-data.[4] Estimation of top incomes relies heavily on fiscal data, which captures high earners missed by self-reported surveys, combined with national accounts for full coverage; top wealth shares are derived via estate multipliers from inheritance records, capitalization of reported incomes, and rich lists for the uppermost fractiles.[16] In data-scarce contexts, regional benchmarks or synthetic controls are imputed, assuming stable inequality structures within peer groups.[4] 
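The estate multiplier approach mentioned above can be sketched as follows. The example assumes a tiny hypothetical sample of estates, each paired with the mortality rate of the decedent's age group; real applications use age- and sex-specific mortality tables and further corrections that are omitted here. All names and figures are illustrative.

```python
def estate_multiplier(estates):
    """Turn (estate value, mortality rate) records into (value, weight) pairs.

    Each decedent stands in for 1 / mortality_rate living people of the
    same age group, which is the multiplier applied to the record.
    """
    return [(value, 1.0 / rate) for value, rate in estates]


def top_share(weighted, fraction):
    """Wealth share of the richest `fraction` of the weighted population."""
    total_weight = sum(w for _, w in weighted)
    total_wealth = sum(v * w for v, w in weighted)
    remaining = fraction * total_weight  # people still to be counted in the top group
    top_wealth = 0.0
    for value, weight in sorted(weighted, reverse=True):
        take = min(weight, remaining)  # a record may straddle the cutoff
        top_wealth += value * take
        remaining -= take
        if remaining <= 0:
            break
    return top_wealth / total_wealth


# Hypothetical decedents: older groups have higher mortality, hence lower weights.
estates = [(1_000_000, 0.05), (200_000, 0.02), (50_000, 0.01), (10_000, 0.005)]
living = estate_multiplier(estates)
top10_share = top_share(living, 0.10)
```

In this toy sample the four estates represent 370 living adults, and the richest tenth is made up of the 20 people represented by the largest estate plus part of the next group.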
Known limitations include sparse fiscal coverage in developing countries, where informal economies and limited tax enforcement lead to gaps affecting up to 85% of country-years for comprehensive data, prompting provisional imputations prone to revision.[11] Wealth series face greater uncertainty due to inconsistent sources, with estate-based methods potentially underestimating inequality if evasion distorts records, and survey-fiscal hybrids risking over- or under-correction for top underreporting.[16] External critiques highlight tax data's unreliability from policy-induced discontinuities (such as the U.S. 1986 reforms inflating reported shares) or sparsity in regions such as sub-Saharan Africa, potentially overstating global top shares when extrapolated from high-data nations like France and the UK.[6] WID acknowledges that these estimates are imperfect and urges users to consult country-specific notes and codes for robustness checks.[4]
Data Content
Inequality Metrics and Indicators
The World Inequality Database (WID) primarily measures income inequality through percentile-based shares of pre-tax national income, such as the top 1% share (p99p100), which captures the proportion of total income accruing to the richest 1% of adults after imputing missing top incomes from tax records and surveys.[4] These shares are derived using Distributional National Accounts (DINA) guidelines, which integrate national accounts totals with micro-level distributions to ensure consistency with macroeconomic aggregates like GDP.[11] For instance, global top 10% income shares have historically ranged from 50% to 60% between 1820 and 2020, while bottom 50% shares hovered at 5-15%.[9] Wealth inequality indicators in WID focus on net personal wealth shares, including the top 10% wealth share (p90p100), calculated by combining household balance sheets, inheritance flows, and capital income imputations to address survey undercoverage of high-wealth individuals.[4] Thresholds and averages provide additional granularity, such as the minimum wealth required to enter the top 1% or the average wealth of the top 0.1%, expressed in 2011 purchasing power parity dollars for cross-country comparability.[24] Post-tax metrics adjust pre-tax distributions for fiscal interventions, revealing redistribution effects; for example, in many countries, taxes and transfers reduce top income shares by 20-30 percentage points.[25] Synthetic indices like the Gini coefficient are available for both income and wealth, scaled from 0 (perfect equality) to 1 (perfect inequality), though WID emphasizes share metrics for their transparency in highlighting top-end concentrations, as Gini can mask extreme disparities due to its quadratic sensitivity to the middle of the distribution.[26] Multipliers, such as the top 10% to bottom 50% income ratio, quantify relative gaps, often exceeding 10:1 globally in recent decades.[5] All indicators cover over 100 countries from the 19th century to projections 
through 2025, with global aggregates treating the world as a single unit under uniform adult population weights.[24]
Global and National Coverage
The World Inequality Database (WID) aggregates national-level data to produce global estimates of income and wealth distribution, covering the majority of the world's population through harmonized series derived from national accounts, fiscal records, and surveys. As of the 2024 update, these global series incorporate data from 216 countries, representing comprehensive tracking of trends such as the share of income accruing to the top 1% worldwide, with historical depth extending to the early 20th century for key metrics in aggregated form. Global wealth inequality estimates, in particular, rely on imputations and methods like estate multipliers for periods where direct data is sparse, enabling analysis from the mid-20th century onward, though pre-1980 coverage remains partial due to inconsistencies in source reporting across nations.[27][22] At the national level, WID provides detailed inequality indicators for over 200 countries, with varying temporal scopes based on data availability: long-run series for Western European countries and the United States often begin in the 1800s or early 1900s, utilizing historical tax records and Pareto interpolation for top income shares, while coverage for many Latin American, African, and Asian nations starts post-1950 or later, supplemented by household surveys and adjusted national accounts. For instance, France's income data extends back to 1900, enabling examination of interwar inequality dynamics, whereas recent additions for smaller economies like those in sub-Saharan Africa incorporate post-2000 fiscal data to address gaps in survey underreporting of top incomes. 
The database's national coverage emphasizes Distributional National Accounts (DINA) harmonization, which reconciles fiscal and survey sources to mitigate biases such as top-end undercoverage in self-reported data, though limitations persist in regions with opaque wealth registries or political instability, resulting in shorter or interpolated series.[27][4][11]

| Region/Example Countries | Typical Historical Start | Key Data Types |
|---|---|---|
| Western Europe (e.g., France, UK) | 1800s–1910s | Income/wealth shares from tax tabulations, national accounts |
| North America (e.g., USA) | 1913 onward | Top income fractiles via IRS data, wealth from estate records |
| Latin America (e.g., Brazil, Argentina) | 1930s–1960s | Survey-adjusted incomes, recent fiscal leaks for high-end |
| Africa/Asia (e.g., South Africa, India) | 1960s–1980s | Post-colonial surveys, imputations for top wealth |
| Global Aggregates | Early 1900s (partial); 1980s (fuller) | Harmonized DINA for income; imputations for wealth |
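
To make the indicators described in this section concrete, the sketch below computes a percentile share (in the spirit of codes such as p90p100), the ratio of average top 10% to average bottom 50% income, and a Gini coefficient from a small hypothetical list of adult incomes. It uses unweighted micro-data and is not how WID itself produces its series, which rest on the DINA harmonization described earlier; function names and numbers are assumptions.

```python
def share(incomes, lo, hi):
    """Share of total income held by adults between quantiles lo and hi."""
    ordered = sorted(incomes)
    n = len(ordered)
    start, end = round(lo * n), round(hi * n)
    return sum(ordered[start:end]) / sum(ordered)


def gini(incomes):
    """Gini coefficient: 0 is perfect equality, values near 1 extreme inequality."""
    ordered = sorted(incomes)
    n = len(ordered)
    rank_weighted = sum((i + 1) * x for i, x in enumerate(ordered))
    return 2 * rank_weighted / (n * sum(ordered)) - (n + 1) / n


# Hypothetical population of ten adults.
incomes = [10, 10, 10, 10, 10, 20, 20, 20, 20, 70]

top10 = share(incomes, 0.90, 1.00)    # analogous to p90p100
bottom50 = share(incomes, 0.00, 0.50)  # analogous to p0p50
# Ratio of average incomes: divide each share by its population fraction.
t10_b50_ratio = (top10 / 0.10) / (bottom50 / 0.50)
g = gini(incomes)
```

Here the top 10% share and the Gini coefficient both come out at 0.35, yet only the share metric states directly how much income the richest adult holds, which is part of why share-based indicators are emphasized.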