g-index
The g-index is a bibliometric measure designed to quantify the productivity and citation impact of a researcher's body of work, proposed by Leo Egghe in 2006 as an extension of the h-index.[1] It is defined as the largest integer g such that, when publications are ranked in descending order of citations, the top g most-cited publications have collectively received at least $g^2$ citations.[1] This formulation emphasizes the total citation performance of an author's most influential papers, providing a single numeric value that balances quantity of output with qualitative impact.[2] Unlike the h-index, which requires each of the h highest-cited papers to individually have at least h citations, the g-index aggregates citations across the top g papers, thereby assigning greater weight to exceptionally highly cited works and yielding a value at least as high as, and often higher than, the h-index for the same publication set.[3]

Egghe introduced the g-index in his paper "Theory and practise of the g-index," published in Scientometrics, to address limitations in the h-index by better capturing "global citation performance" for researchers with skewed citation distributions.[1] The index is particularly sensitive to highly productive authors or those with a few standout publications, making it useful for distinguishing nuanced differences in impact among elite scholars.[4]

In practice, the g-index is computed using citation databases such as Google Scholar, Scopus, or Web of Science, where publications are sorted by citation count and cumulative totals are checked against the $g^2$ threshold.[5] It has been integrated into academic evaluation frameworks, tenure reviews, and ranking systems as a complementary metric to the h-index, though critics note its tendency to produce inflated values compared to the h-index, which can complicate direct comparisons across disciplines.[6] Despite these considerations, the g-index remains a key tool in scientometrics for assessing long-term scholarly influence, with ongoing refinements and variants explored in subsequent research.[7]

Overview
Definition
The g-index is a bibliometric indicator that measures the citation impact of a researcher's body of work by identifying the largest number g such that the top g most-cited publications collectively account for at least $g^2$ citations.[8] This metric ranks publications in decreasing order of their citation counts, emphasizing the overall productivity and influence captured by highly cited works within the core set.[8] Here, g plays two roles: it is the number of top publications considered, and its square is the minimum total number of citations those publications must have received, providing a balanced view of quantity and quality in scholarly output.[8]

Formally, given a researcher's publications ordered by decreasing citation counts $c_1 \geq c_2 \geq \dots \geq c_n$, the g-index is the maximum integer g satisfying $\sum_{i=1}^{g} c_i \geq g^2$.[8] This formulation extends concepts like the h-index, which it always meets or exceeds ($g \geq h$), by incorporating the full citation strength of the top papers rather than a uniform threshold.[8]

Purpose and motivation
The g-index was developed by Leo Egghe as an enhancement to the h-index, aiming to provide a more comprehensive measure of a researcher's scientific output by incorporating the overall citation performance of their most cited works.[1] Introduced in 2006, it addresses the need for an index that better reflects the global impact of an author's publications, particularly in fields where citation counts vary widely.[1]

A key shortcoming of the h-index is its equal treatment of all papers within the h-core, regardless of significant disparities in their citation counts, which can undervalue the influence of a few standout, highly cited publications.[1] The g-index mitigates this by emphasizing the total citations received by the top g articles, thereby giving greater weight to outliers and providing a fuller picture of an author's productivity and scholarly visibility.[1] This adjustment ensures that researchers with skewed citation distributions, which are common in many disciplines, are not penalized for having a concentration of high-impact works alongside more modestly cited ones.[1]

Conceptually, the g-index seeks to strike a more effective balance between the quantity of publications (productivity) and their qualitative impact (citations), offering a refined tool for scientometric evaluations that distinguishes scientists more accurately based on their true influence.[1] By inheriting the strengths of the h-index while extending its scope to account for citation totals in the leading set of papers, it promotes a nuanced assessment of research excellence in diverse academic contexts.[1]

History and development
Introduction by Leo Egghe
Leo Egghe, a Belgian scientometrician affiliated with Hasselt University, has made significant contributions to the field of informetrics through his extensive research on citation analysis and scholarly impact measures.[9] His work often focuses on mathematical models for evaluating scientific productivity and influence, building on foundational concepts in bibliometrics.

In 2006, Egghe introduced the g-index as a refinement to existing author-level metrics, particularly in response to the rapid adoption of the h-index proposed by Jorge E. Hirsch in 2005.[1] This development occurred amid increasing interest in quantitative tools for assessing researchers' citation performance beyond traditional metrics like total citations.[10] The g-index was formally presented in Egghe's paper "Theory and practise of the g-index," published in Scientometrics (volume 69, issue 1, pages 131–152), where he outlined its theoretical foundations and practical applications for measuring the global citation impact of a researcher's body of work.[11] This publication marked a key advancement in informetrics, emphasizing the need for indices that balance productivity and citation distribution.[12]

Evolution and related proposals
Following its introduction, the g-index saw early extensions aimed at addressing its integer-based limitations for more precise evaluations. In 2009, Raf Guns and Ronald Rousseau reviewed and formalized real and rational variants of the g-index, extending prior work on similar adaptations for the h-index. The real g-index (gr) generalizes the metric to continuous values by interpolating based on the citation threshold, while the rational g-index (grat) achieves non-integer results through a fractional adjustment tied to the citations of the (g+1)th paper. These variants enhance granularity by allowing values between integers, reducing abrupt changes in scores for minor citation shifts.

Related indices emerged contemporaneously, building on the g-index's emphasis on highly cited works. The hg-index, proposed by Sergio Alonso, Francisco José Cabrerizo, Enrique Herrera-Viedma, and Francisco Herrera in 2009 (with a preprint in 2008), combines the h-index and g-index via the geometric mean √(h × g) to balance broad productivity with citation impact from top papers. Similarly, the contemporary h-index (hc), introduced by Antonis Sidiropoulos, Dimitrios Katsaros, and Yannis Manolopoulos in 2007, incorporates temporal weighting to favor recent citations, aligning with the g-index's goal of mitigating underemphasis on influential outliers in standard metrics.

The g-index gained adoption in the late 2000s as bibliometric tools proliferated, reflecting its utility in complementing the h-index. By 2007, it was integrated into the initial release of Publish or Perish, a widely used software by Anne-Wil Harzing for retrieving and analyzing Google Scholar citations, enabling easy computation alongside other metrics. This incorporation facilitated broader application in academic evaluations during the period.[13]

Criticisms of the g-index have centered on its saturation effect: the metric is capped at the total number of publications (g ≤ P), so once g reaches P, additional citations to existing papers cannot raise it without new outputs. This limitation, noted in analyses of citation rank distributions, sparked debates on the index's responsiveness for prolific researchers and prompted adjusted formulas like the real and rational variants to mitigate discreteness and related stagnation issues, alongside entirely new indices to better capture ongoing impact.[14]

Calculation
Step-by-step method
To compute the g-index for an author or researcher, begin by compiling a complete list of their publications along with the number of citations each has received. This dataset forms the basis for the calculation and is typically drawn from bibliometric databases such as Scopus or Web of Science.[15]

Next, sort the publications in descending order of citation counts, denoted $c_1 \geq c_2 \geq \dots \geq c_n$, where n is the total number of publications. This ordering ensures that the most highly cited works are considered first.[15]

Then, for each possible value of g from 1 to n, calculate the cumulative sum of citations for the top g publications:

$$S_g = \sum_{i=1}^{g} c_i.$$

This step aggregates the citation impact of the leading publications incrementally.[15]

Finally, identify the largest integer g such that $S_g \geq g^2$. This threshold condition captures the point where the collective citations of the top g papers meet or exceed the square of g, defining the g-index value.[15]

For edge cases, if an author has zero publications, the g-index is defined as 0. Uncited publications (where $c_i = 0$) sort to the end of the list and add nothing to $S_g$; they can still be counted among the top g papers when the more highly cited papers carry enough surplus citations to meet the $g^2$ threshold.[15]

Computationally, the process can be implemented iteratively: initialize the cumulative sum at 0 and increment g from 1, updating $S_g$ by adding the next $c_g$ at each step, while checking the condition $S_g \geq g^2$. Stop at the first g where the inequality fails; the g-index is the last value of g that satisfied it. Stopping early is safe because the citation sequence is non-increasing: if $S_g < g^2$, then $c_{g+1} \leq S_g / g < g$, so $S_{g+1} < g^2 + g < (g+1)^2$, and no later value of g can satisfy the condition. This approach avoids exhaustive checks for all g up to n, which helps for large publication lists.[15]
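The iterative procedure described above can be sketched in Python. This is a minimal illustration rather than a reference implementation: the function name `g_index` is chosen here for the example, and the sketch assumes g is capped at the number of publications n, as in the steps above.

```python
from typing import Iterable


def g_index(citations: Iterable[int]) -> int:
    """Return the largest g whose top-g citation total is at least g**2.

    Assumption for this sketch: g cannot exceed the number of publications.
    """
    ranked = sorted(citations, reverse=True)  # c_1 >= c_2 >= ... >= c_n
    cumulative = 0  # running value of S_g
    g = 0
    for rank, count in enumerate(ranked, start=1):
        cumulative += count
        if cumulative < rank * rank:
            break  # first failure; no later rank can satisfy the condition
        g = rank
    return g


print(g_index([10, 5, 3, 1, 0]))  # prints 4
print(g_index([]))                # prints 0
```

In the first call the cumulative sums are 10, 15, 18, 19, 19 against thresholds 1, 4, 9, 16, 25, so the condition holds through rank 4 and fails at rank 5. Sorting dominates the cost, so the whole computation runs in O(n log n) time.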
Illustrative example
Consider a hypothetical researcher with 10 publications, receiving the following citation counts in descending order: 25, 12, 10, 8, 7, 5, 4, 3, 2, 1. To compute the g-index, calculate the cumulative sum $S_g$ of the top g most-cited publications and find the largest g such that $S_g \geq g^2$. The list is already sorted, so no reordering is needed. For g = 1, $S_1 = 25 \geq 1$; for g = 2, $S_2 = 37 \geq 4$; for g = 3, $S_3 = 47 \geq 9$; for g = 4, $S_4 = 55 \geq 16$; for g = 5, $S_5 = 62 \geq 25$; for g = 6, $S_6 = 67 \geq 36$; for g = 7, $S_7 = 71 \geq 49$; for g = 8, $S_8 = 74 \geq 64$; but for g = 9, $S_9 = 76 < 81$. Thus, the g-index is 8.

This result indicates that the researcher's top 8 publications collectively account for at least 64 citations, highlighting a subset of high-impact work while accounting for uneven citation distributions. For comparison, the h-index of the same list is 5 (the fifth paper has 7 citations, but the sixth has only 5), so the g-index credits the surplus citations of the most-cited papers. The following table summarizes the computation:

| g | Cumulative citations $S_g$ | $g^2$ | Threshold met? |
|---|---|---|---|
| 1 | 25 | 1 | Yes |
| 2 | 37 | 4 | Yes |
| 3 | 47 | 9 | Yes |
| 4 | 55 | 16 | Yes |
| 5 | 62 | 25 | Yes |
| 6 | 67 | 36 | Yes |
| 7 | 71 | 49 | Yes |
| 8 | 74 | 64 | Yes |
| 9 | 76 | 81 | No |
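The threshold check for this example can also be reproduced programmatically. The short Python sketch below is illustrative only: it takes the ten citation counts listed above, prints one row per candidate g until the condition first fails, and also computes the h-index of the same list for comparison. The variable names are chosen for this example.

```python
from itertools import accumulate

# Citation counts from the example, already in descending order.
citations = [25, 12, 10, 8, 7, 5, 4, 3, 2, 1]
ranked = sorted(citations, reverse=True)

g = 0
for rank, total in enumerate(accumulate(ranked), start=1):
    met = total >= rank ** 2  # is S_g >= g^2 ?
    print(f"g={rank:2d}  S_g={total:3d}  g^2={rank ** 2:3d}  met={met}")
    if not met:
        break  # later ranks cannot satisfy the condition
    g = rank

# h-index for comparison: count the ranks whose paper has at least `rank`
# citations (valid because the list is sorted in descending order).
h = sum(1 for rank, count in enumerate(ranked, start=1) if count >= rank)

print(f"g-index = {g}, h-index = {h}")
```

Printing both indices side by side makes the relationship $g \geq h$ concrete for a single citation record.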