Betweenness centrality
Betweenness centrality is a fundamental measure in graph theory and network analysis that quantifies the extent to which a given vertex serves as an intermediary on the shortest paths between other pairs of vertices in a graph.[1] Introduced by sociologist Linton C. Freeman in 1977 as part of a family of centrality indices inspired by early work on communication in groups, it emphasizes a vertex's potential for control over information or resource flows by occupying bridging positions in the network structure.[2] This metric is particularly useful for identifying nodes that act as brokers or gatekeepers, distinguishing it from other centrality measures like degree or closeness that focus more on local connectivity or proximity.[3]
The mathematical formulation of betweenness centrality for a vertex v in an undirected graph G = (V, E) with n = |V| vertices is given by:
C_B(v) = \sum_{\substack{s, t \in V \\ s < t \\ s \neq v \\ t \neq v}} \frac{\sigma_{st}(v)}{\sigma_{st}},
where \sigma_{st} denotes the total number of shortest paths (geodesics) from vertex s to vertex t, and \sigma_{st}(v) is the number of those shortest paths that pass through v.[4] To normalize the values for comparison across networks, it is often scaled by dividing by the maximum possible value, \frac{(n-1)(n-2)}{2}, yielding a proportion between 0 and 1.[4] Computationally, exact calculation must account for the shortest paths between all pairs of vertices; Brandes' single-source approach accomplishes this in O(n m) time without explicitly enumerating paths, making it feasible for moderate-sized networks but challenging for very large ones.[5]
Betweenness centrality finds wide applications across disciplines, including social network analysis where it highlights influential individuals who mediate interactions and wield power through their positional advantages, such as in communication or collaboration networks.[3] In transportation and infrastructure systems, it identifies critical junctions or links whose removal would most disrupt connectivity, aiding in vulnerability assessments and optimization.[6] Biological networks, like protein interaction graphs, use it to pinpoint essential hubs in signaling pathways, while in web graphs and communication systems, it helps detect bottlenecks or key routers for traffic management.[7] Its emphasis on global structure makes it a key tool for understanding network robustness, community detection, and strategic node targeting in complex systems.[5]
Fundamentals
In graph theory, betweenness centrality quantifies the extent to which a node lies on the shortest paths between other nodes in an unweighted, undirected graph, serving as a measure of the node's potential to control or mediate communication flows.[2] Introduced by Linton C. Freeman in 1977, it formalizes the idea that nodes positioned on many shortest paths exert greater influence over information dissemination in a network.[2]
For a graph G = (V, E) with node set V of size n = |V| and edge set E, the betweenness centrality C_B(v) of a node v \in V is defined as
C_B(v) = \sum_{\substack{i, j \in V \\ i < j \\ i \neq v, j \neq v}} \frac{g_{ij}(v)}{g_{ij}},
where g_{ij} denotes the total number of shortest paths (geodesics) from node i to node j, and g_{ij}(v) is the number of those shortest paths that pass through v as an intermediate node.[2] This formulation sums the fractions of shortest paths between all unordered pairs of distinct nodes excluding v itself, capturing the proportion of communication routes dependent on v.[8]
To facilitate comparisons across networks of varying sizes, betweenness centrality is often normalized to the range [0, 1]. The normalized variant is
C_B'(v) = \frac{C_B(v)}{\frac{1}{2}(n-1)(n-2)},
where the denominator represents the maximum possible betweenness value for a single node in an undirected graph of n nodes (achieved, for instance, by the central node in a star graph).[8] This scaling removes the dependency on network size, allowing direct assessment of relative centrality.[8]
Consider a simple path graph with 4 nodes labeled 1--2--3--4, where edges connect consecutive nodes. To compute C_B(2), identify all unordered pairs excluding node 2: (1,3), (1,4), and (3,4). For pair (1,3), there is 1 shortest path (1-2-3), all passing through 2, so the fraction is 1. For (1,4), there is 1 shortest path (1-2-3-4), all passing through 2, so the fraction is 1. For (3,4), there is 1 shortest path (3-4), none through 2, so the fraction is 0. Thus, C_B(2) = 1 + 1 + 0 = 2. By symmetry, C_B(3) = 2, while C_B(1) = C_B(4) = 0. Normalizing by \frac{1}{2}(4-1)(4-2) = 3 yields C_B'(2) = C_B'(3) = \frac{2}{3}.[2]
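The worked example above can be verified in a few lines; the following is a minimal sketch assuming the NetworkX library is available.

```python
# Minimal check of the 4-node path example, assuming NetworkX is installed.
import networkx as nx

G = nx.path_graph([1, 2, 3, 4])  # edges 1-2, 2-3, 3-4

raw = nx.betweenness_centrality(G, normalized=False)
norm = nx.betweenness_centrality(G, normalized=True)

print(raw)   # {1: 0.0, 2: 2.0, 3: 2.0, 4: 0.0}
print(norm)  # {1: 0.0, 2: 0.667, 3: 0.667, 4: 0.0}
```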
Intuition and Interpretation
Betweenness centrality was introduced by Linton Freeman in 1977 to quantify a node's potential to control communication or influence in a network by measuring how often it lies on the shortest paths between other nodes, extending Alex Bavelas's 1950 studies of communication patterns in task-oriented groups.[2][9] Freeman's formulation captures bridging positions in social networks: certain actors connect otherwise separate parts of the network, facilitating or constraining interactions across the structure.[2]
At its core, betweenness centrality captures the intuition of a node acting as a bridge or bottleneck, exerting control over the flow of information, resources, or influence between otherwise separate parts of the network.[2] Nodes with high betweenness are essential intermediaries, as many shortest paths (representing efficient routes for transmission) must pass through them, highlighting their role in maintaining overall network cohesion.[3] This perspective also emphasizes dependency: elevated betweenness marks a point of vulnerability, since the node's failure or removal could sever connections and fragment the network into isolated components.[10]
Illustrative examples underscore this concept. In a star graph, the central node achieves maximum betweenness by lying on every shortest path between peripheral nodes, making it indispensable, while leaves have zero betweenness.[3] Conversely, in a ring graph (or cycle), all nodes exhibit equal and low betweenness, as redundant paths allow communication to bypass any single node without disruption.[3]
Qualitatively, betweenness centrality excels at pinpointing key influencers or failure-prone elements in undirected networks, offering insights into structural leverage without presupposing directional flows or hierarchies.[2]
Extensions
Weighted and Directed Variants
In weighted networks, betweenness centrality extends the unweighted definition by replacing standard shortest paths with minimum-weight paths, where edge weights represent costs, distances, capacities, or other attributes that affect path optimality. Shortest paths are computed using algorithms like Dijkstra's, which account for cumulative weights along edges. The core formula adapts accordingly:
C_B(v) = \sum_{s \neq v} \sum_{t \neq v} \frac{\sigma_{st}(v)}{\sigma_{st}}
Here, \sigma_{st} denotes the total number of minimum-weight paths from s to t, and \sigma_{st}(v) is the number of such paths passing through vertex v, with the ratio \frac{\sigma_{st}(v)}{\sigma_{st}} capturing the fraction of minimum-weight paths routed via v. This formulation, introduced as an efficient computational extension, enables the measure to reflect realistic flow dynamics, such as traffic loads in road networks where nodes with high betweenness indicate potential bottlenecks based on travel distances rather than hop counts.[11]
The key difference from unweighted variants lies in how weights model heterogeneous edge properties, allowing betweenness to prioritize paths that minimize total cost over sheer length; for instance, in a transportation graph, a central interchange might accrue high centrality if it lies on fewer but critically shorter routes. This adaptation preserves the interpretive intuition of brokerage while accommodating applications where uniform edge assumptions fail, such as in communication networks with varying bandwidths.[11]
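As an illustration of this difference, the sketch below (assuming NetworkX; the graph and its weights are invented for the example) shows a node that lies on no hop-count geodesic but on every minimum-weight route.

```python
# Hypothetical example: node "b" gains betweenness once edge weights
# (interpreted as distances) are respected; assumes NetworkX.
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    ("a", "b", 1.0), ("b", "c", 1.0),  # cheap two-hop route via b
    ("a", "c", 5.0),                   # expensive direct link
    ("c", "d", 1.0),
])

hops = nx.betweenness_centrality(G, normalized=False)                   # hop-count geodesics
dist = nx.betweenness_centrality(G, weight="weight", normalized=False)  # minimum-weight paths

print(hops["b"], dist["b"])  # 0.0 versus 2.0: pairs (a,c) and (a,d) now route via b
```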
For directed networks, betweenness centrality is computed over directed shortest paths, respecting edge orientations and introducing inherent asymmetry: a vertex's role as an intermediary depends on the directionality of flows, so a node may mediate many paths in one direction but few in the other. The summation runs over ordered pairs (s, t), with the formula mirroring the weighted case but applied to arc-respecting geodesics:
C_B(v) = \sum_{s \neq v} \sum_{t \neq v} \frac{g_{st}(v)}{g_{st}}
where g_{st} is the number of directed shortest paths from s to t, and g_{st}(v) the number of those traversing v. For normalization, the sum is divided by (n-1)(n-2), the number of ordered pairs excluding v, which standardizes scores across graph sizes and enables cross-network comparisons despite the asymmetry.[12] This generalization requires only minor algorithmic tweaks, such as orienting dependency accumulations along directed predecessors.[11]
In practice, directed betweenness highlights brokerage in oriented systems; for example, in a citation network, a highly central paper or author brokers knowledge dissemination by lying on many directed paths from foundational works to emerging research, quantifying influence asymmetry where influence flows outward from sources but converges on hubs. Such adaptations, building on early undirected formulations, were refined for directed graphs to address interpretability in asymmetric contexts.[12]
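A small sketch of the directed case (assuming NetworkX; the toy graph is invented) illustrates both the ordered-pair summation and the (n-1)(n-2) normalization:

```python
# Hypothetical directed example: "v" brokers every source-to-target geodesic.
import networkx as nx

D = nx.DiGraph([("s1", "v"), ("s2", "v"), ("v", "t1"), ("v", "t2")])

raw = nx.betweenness_centrality(D, normalized=False)
norm = nx.betweenness_centrality(D, normalized=True)

# v is interior to the four ordered pairs (s_i, t_j), so its raw score is 4.0;
# normalized by (n-1)(n-2) = 12 this gives 1/3.
print(raw["v"], norm["v"])
```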
Edge and Pairwise Betweenness
Edge betweenness centrality extends the concept of node betweenness to individual edges, measuring the extent to which an edge lies on the shortest paths between all pairs of nodes in the network. Formally, for an edge e, the edge betweenness centrality C_B(e) is given by
C_B(e) = \sum_{s \neq t} \frac{\sigma_{st}(e)}{\sigma_{st}},
where \sigma_{st} denotes the total number of shortest paths from node s to node t, and \sigma_{st}(e) is the number of those shortest paths that pass through edge e. If no shortest path from s to t traverses e, then \sigma_{st}(e) = 0. Because each shortest path is weighted equally, every pair contributes a fraction between 0 and 1 to the total.
The edge variant was introduced by Girvan and Newman in 2002 specifically to identify bridges between network communities, where high-betweenness edges are iteratively removed to reveal modular structures.[13] In practice, edges with elevated betweenness scores highlight potential bottlenecks, making this measure valuable for vulnerability analysis in infrastructure networks, such as pinpointing failure-prone links that could disrupt overall connectivity.[14] It also aids in link prediction by signaling edges that, if added or reinforced, could optimize path diversity or infer missing connections in evolving graphs.[15]
For instance, in communication networks, computing edge betweenness helps prioritize critical links for redundancy planning, ensuring resilience by duplicating high-betweenness edges susceptible to outages.
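The following sketch (assuming NetworkX) reproduces the canonical Girvan-Newman setting of two dense groups joined by one bridge, whose edge betweenness dominates:

```python
# Two triangles joined by a single bridge: the bridge carries every
# inter-group geodesic and therefore has maximal edge betweenness.
import networkx as nx

G = nx.Graph([(1, 2), (2, 3), (1, 3),   # group A
              (4, 5), (5, 6), (4, 6),   # group B
              (3, 4)])                  # bridge

eb = nx.edge_betweenness_centrality(G, normalized=False)
bridge = max(eb, key=eb.get)
print(bridge, eb[bridge])  # (3, 4) with score 9.0: one unit per cross-group pair
```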
Pairwise betweenness centrality, often termed pair-dependency, quantifies the dependence of a specific source-target pair (s, t) on an intermediate node v, focusing on the fraction of shortest paths between s and t that route through v. It is defined as
\delta_{st}(v) = \frac{\sigma_{st}(v)}{\sigma_{st}},
where \sigma_{st}(v) counts the shortest paths from s to t passing through v, and the value is zero if v lies on no such path. This pairwise measure captures the localized control exerted by v over communication between exactly two nodes, serving as a building block for broader centrality assessments.
The full node betweenness centrality aggregates these pairwise dependencies across all distinct pairs, yielding C_B(v) = \sum_{s \neq v \neq t} \delta_{st}(v), often normalized by the number of pairs for comparability. Introduced as a key intermediate in Brandes' 2001 algorithm for efficient betweenness computation, pairwise betweenness enables single-source traversals to accumulate dependencies recursively, avoiding exhaustive all-pairs calculations. This approach underpins scalable implementations in large networks, emphasizing v's role in specific pairwise flows without recomputing global paths.
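Because the pair-dependency is simply a ratio of path counts, it can be computed by brute force on small graphs; the sketch below (assuming NetworkX, with the helper function invented for illustration) evaluates it on a 4-cycle, where opposite corners are joined by two geodesics.

```python
# Brute-force pair-dependency delta_st(v): fraction of s-t geodesics
# on which v is an interior vertex. Illustrative helper, not a library API.
import networkx as nx

def pair_dependency(G, s, t, v):
    paths = list(nx.all_shortest_paths(G, s, t))
    through_v = sum(1 for p in paths if v in p[1:-1])  # interior occurrences only
    return through_v / len(paths)

G = nx.cycle_graph(4)  # square 0-1-2-3-0
print(pair_dependency(G, 0, 2, 1))  # 0.5: node 1 carries one of the two geodesics
```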
Computation
Exact Algorithms
Exact algorithms for betweenness centrality compute the measure precisely by systematically accounting for all shortest paths between pairs of vertices. These methods, while computationally intensive for large graphs, provide the full accuracy required for applications demanding exact values. The seminal approach is Brandes' algorithm, introduced in 2001, which improves upon earlier pairwise computation strategies by leveraging single-source shortest path calculations and a dependency accumulation technique to avoid redundant path enumerations.[11]
Brandes' algorithm proceeds in two main phases for each vertex s treated as a source: first, it computes the shortest-path distances and predecessor trees from s to all other vertices using breadth-first search (BFS) for unweighted graphs or Dijkstra's algorithm for weighted graphs; second, it accumulates the pairwise dependencies \delta_s(v) in a backward pass, starting from vertices farthest from s. The dependency for a vertex v is defined recursively as
\delta_s(v) = \sum_{w: v \in P_s(w)} \frac{\sigma_{s v}}{\sigma_{s w}} \left(1 + \delta_s(w)\right),
where P_s(w) is the set of predecessors of w on shortest paths from s, \sigma_{s t} denotes the number of shortest paths from s to t, and the summation is over all w such that v precedes w in the shortest-path tree. The betweenness centrality of v is then the sum of these dependencies over all sources s \neq v, normalized if desired by dividing by (n-1)(n-2) for undirected graphs with n vertices. This accumulation exploits the additivity of dependencies, allowing efficient computation without explicitly counting all paths.[11]
The algorithm handles directed graphs through the same framework, as the shortest-path computations naturally respect edge directions. For weighted graphs, Dijkstra's algorithm incorporates non-negative edge weights via priority queues. In the special case of directed acyclic graphs (DAGs), shortest paths can be computed in linear time using a topological sort instead of BFS or Dijkstra, followed by the same dependency accumulation in reverse topological order, yielding an overall time complexity of O(n + m) per source.[11][16]
A high-level pseudocode outline for the unweighted case is as follows:
Initialize centrality C(v) = 0 for all v
For each source s in V:
    Compute distances d, path counts σ, and predecessor lists P via BFS from s
    Initialize dependency δ(v) = 0 for all v
    Push vertices onto a stack S in order of non-decreasing distance from s
    While S is not empty:
        w = S.pop()                      # vertices come off in decreasing distance
        For each predecessor v in P(w):
            δ(v) += (σ(s,v) / σ(s,w)) * (1 + δ(w))
        If w ≠ s:
            C(w) += δ(w)
Return C
(Adapted from the original pseudocode in Brandes (2001), with σ values computed during BFS.)[11]
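A direct, runnable translation of this outline into Python (standard library only; written here as an illustrative sketch rather than a reference implementation) might look as follows. Note that summing over all sources counts each unordered pair twice in an undirected graph, so the returned values are double those of the Fundamentals example.

```python
# Brandes-style betweenness for unweighted graphs, following the outline above.
from collections import deque

def brandes_betweenness(adj):
    """adj: dict mapping each vertex to a list of neighbours."""
    C = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Phase 1: BFS from s, recording path counts sigma and predecessors P.
        order = []                              # vertices in non-decreasing distance
        P = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0.0); sigma[s] = 1.0
        dist = dict.fromkeys(adj, -1); dist[s] = 0
        Q = deque([s])
        while Q:
            v = Q.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:                 # w discovered for the first time
                    dist[w] = dist[v] + 1
                    Q.append(w)
                if dist[w] == dist[v] + 1:      # edge (v, w) lies on a geodesic
                    sigma[w] += sigma[v]
                    P[w].append(v)
        # Phase 2: accumulate dependencies, farthest vertices first.
        delta = dict.fromkeys(adj, 0.0)
        while order:
            w = order.pop()
            for v in P[w]:
                delta[v] += (sigma[v] / sigma[w]) * (1.0 + delta[w])
            if w != s:
                C[w] += delta[w]
    return C

# The 4-node path from the Fundamentals section: nodes 2 and 3 score 4.0 here,
# i.e. 2.0 per unordered pair after halving.
print(brandes_betweenness({1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}))
```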
For sparse graphs with n = |V| vertices and m = |E| edges, the time complexity is O(nm) for unweighted graphs and O(nm + n^2 \log n) for weighted graphs, using Fibonacci heaps for Dijkstra; space usage is O(n + m). On a real-world collaboration network with 4,259 vertices and 61,693 edges, the algorithm completed in approximately 448 seconds on a 1999-era workstation, demonstrating feasibility for moderately sized graphs at the time.[11]
Approximation Techniques
Exact methods for computing betweenness centrality, such as Brandes' algorithm, become impractical for large graphs due to their O(nm) time complexity, where n is the number of vertices and m the number of edges.[11] Approximation techniques mitigate this by estimating centrality values with controlled error, enabling analysis of networks with millions of nodes.
Random sampling methods, often based on Monte Carlo estimation, select k random source-target pairs and compute the fraction of shortest paths between them that pass through each node to approximate centrality.[17] These approaches provide probabilistic guarantees on accuracy; for instance, using Hoeffding's inequality, O(1/ε²) samples suffice to achieve an additive error of ε(n-1) with high probability, where the bound depends on the graph's size but not its structure.[17][18]
Landmark-based approximations select a small subset of nodes as landmarks (or pivots) and compute exact shortest-path dependencies from these, using the results to form unbiased estimates of full centrality via extrapolation or weighting.[17] For example, Riondato and Kornaropoulos (2014) developed a sampling framework that draws paths uniformly from shortest paths between random pairs, yielding additive ε-approximations for all nodes or multiplicative ε-approximations for the top-k nodes, with sample sizes logarithmic in the graph diameter and independent of n and m, along with provable (1-δ) success probability.[19]
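In practice, pivot sampling of this kind is exposed directly by common libraries; the sketch below (assuming NetworkX, whose k parameter samples source vertices and extrapolates) compares exact and sampled scores on a synthetic scale-free graph.

```python
# Source-sampling approximation: k sampled pivots instead of all n sources.
import networkx as nx

G = nx.barabasi_albert_graph(n=2000, m=3, seed=42)

exact = nx.betweenness_centrality(G)                  # all sources, O(nm)
approx = nx.betweenness_centrality(G, k=100, seed=7)  # 100 sampled sources

top = max(exact, key=exact.get)
print(exact[top], approx[top])  # the estimates typically agree closely for hub nodes
```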
Parallel and distributed variants extend these approximations for big data environments; for instance, MapReduce-based systems distribute random pair sampling and shortest-path computations across clusters, achieving scalable estimation on layered distributed architectures.[20] Implementations leveraging GraphBLAS primitives further accelerate matrix-based sampling and aggregation in shared-memory parallel settings.
Approximations trade precision for efficiency: exact computations are feasible for graphs with n < 10^4, but for n > 10^6, sampling methods deliver estimates with relative errors under 10% while providing speedups of 2-4 orders of magnitude on scale-free networks, such as social or web graphs.[21]
Post-2010 advances emphasize adaptive sampling to prioritize high-centrality nodes; KADABRA (2016), for example, dynamically adjusts sample sizes based on interim variance estimates using bidirectional BFS for path sampling, focusing efforts on promising regions and achieving up to 100x speedups over prior samplers on networks with millions of edges.[22] Recent developments (as of 2025) have extended these techniques to dynamic and temporal graphs, with algorithms for efficient updates in evolving networks and computations over heterogeneous information networks, enabling applications in time-varying systems like social media and transportation.[23][24]
Properties
Mathematical Characteristics
Betweenness centrality exhibits several key mathematical properties that underpin its utility as a network measure. In undirected graphs, it demonstrates symmetry, as the centrality score of a node depends solely on the undirected shortest paths passing through it, yielding identical values regardless of path directionality.[8] This symmetry ensures consistent interpretation across bidirectional networks. Additionally, betweenness centrality is additive in its formulation, computed as the sum of its contributions over all distinct pairs of nodes, where each pair's dependency fraction (the proportion of shortest paths through the node) contributes independently to the total score.[8]
The measure satisfies fundamental axioms for centrality, including invariance under graph isomorphism, meaning that isomorphic graphs produce identical centrality vectors, preserving structural equivalence.[25] It also aligns with Freeman's conceptual conditions for centrality measures, such as assigning higher values to nodes in more central positions (e.g., those bridging otherwise disconnected components) and adhering to boundary conditions where isolated nodes receive zero centrality.[8] Regarding monotonicity, betweenness centrality is not strictly score-monotone, since adding an edge can decrease the score of some nodes by introducing shortcuts, but it is rank semi-monotone in connected undirected graphs: upon adding an edge between nodes u and v, the rank of at least one of u or v does not worsen relative to other nodes, as the new edge can only enhance or maintain the relative positioning of its endpoints in terms of path mediation. A proof sketch rests on the fact that adding the edge increases or preserves the number of shortest paths routed through at least one endpoint without disproportionately benefiting non-endpoints in the ranking.[26]
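The failure of score monotonicity is easy to exhibit concretely; in the sketch below (assuming NetworkX), closing a triangle removes all of the middle node's betweenness.

```python
# Adding an edge (a shortcut) strictly decreases node 1's betweenness.
import networkx as nx

G = nx.path_graph(3)  # 0-1-2: node 1 mediates the single (0,2) geodesic
print(nx.betweenness_centrality(G, normalized=False)[1])  # 1.0

G.add_edge(0, 2)      # the shortcut bypasses node 1 entirely
print(nx.betweenness_centrality(G, normalized=False)[1])  # 0.0
```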
Sensitivity analysis reveals that betweenness centrality is highly responsive to graph perturbations, such as edge or node removal, often more so than degree or closeness measures. Removing a high-betweenness edge can significantly alter shortest path distributions, leading to cascading decreases in centrality for nodes dependent on those paths; this property makes it valuable for assessing network robustness, where changes in centrality scores quantify vulnerability to failures.[27] For instance, in theoretical studies, the edge sensitivity of betweenness is shown to be O(n^2) in the worst case for unweighted networks, highlighting its structural dependence.[27]
Normalization plays a crucial role in interpreting betweenness values. The unnormalized score provides an absolute measure of path mediation, suitable for analyzing control within a fixed network, while the normalized variant—divided by (n-1)(n-2)/2 for undirected graphs with n nodes—scales scores to [0,1], facilitating comparisons and rankings across networks of varying sizes.[8] This normalization preserves the relative ordering of nodes but adjusts for the maximum possible centrality in a star graph, where the center achieves the highest value.[8]
Limitations and Criticisms
One major limitation of betweenness centrality is its high computational complexity, with exact algorithms requiring O(n m) time in sparse unweighted graphs, where n is the number of nodes and m is the number of edges, rendering it infeasible for large-scale networks with n exceeding 10^5 nodes. This scalability issue has prompted the widespread use of approximation methods, which, while faster, introduce bias and variance that can distort centrality rankings, particularly in heterogeneous networks.
The measure's core assumption—that influence or flow occurs exclusively along shortest paths—has been widely criticized for overlooking alternative routes, such as longer paths or non-geodesic flows prevalent in processes like random walks or information diffusion. Empirical analyses of real-world data, including human navigation and transportation flows, reveal significant deviations from this assumption, with actual trajectories often bypassing shortest paths and leading to mismatched centrality rankings.[28] For instance, in air transportation networks, hubs like Anchorage exhibit inflated betweenness scores under the shortest-path model but diminished importance when accounting for observed passenger journeys.[28]
Interpretively, betweenness centrality overemphasizes nodes acting as global brokers while neglecting local structures such as clustering or dense subgroups: nodes whose contacts are densely interconnected often receive near-zero scores, making them indistinguishable from peripheral or isolated nodes.[29] It also proves insensitive to network density variations, failing to capture node importance in highly connected clusters where shortest-path assumptions break down and alternative flow dynamics dominate.[30]
Empirical studies from the 2010s highlight betweenness centrality's poor correlation with real-world influence metrics, especially in dense graphs, where it underperforms compared to degree or eigenvector centrality in predicting outcomes like causal impact or firm performance. For example, in sampled or error-prone networks, its rankings show instability and weak alignment with full data, limiting its reliability for practical applications.[29]
To address these drawbacks, researchers have developed hybrid measures combining betweenness with local metrics like clustering coefficients, as well as approximation techniques providing probabilistic error bounds to mitigate bias in large networks.[29]
Applications
Social and Communication Networks
In social networks, betweenness centrality serves as a key metric for identifying structural holes—gaps in connections between groups that provide brokerage opportunities—and highlighting individuals who act as gatekeepers controlling information flow between otherwise disconnected clusters.[31] Ronald Burt's seminal work demonstrated that actors spanning these structural holes gain competitive advantages in organizations by facilitating non-redundant information exchange, positioning high-betweenness nodes as influential brokers in professional and community settings.[32] This aligns with Linton Freeman's 1977 application of betweenness centrality to sociograms, where it quantified the centrality of actors in facilitating communication paths within small social groups, revealing gatekeepers who bridge subgroups in interpersonal networks.[2]
In communication networks, such as email or phone call graphs, betweenness centrality measures the extent to which certain nodes control the dissemination of information, identifying key intermediaries in the spread of messages or rumors. Analyses of the Enron email corpus, for instance, have shown that executives with high betweenness centrality occupied pivotal roles in coordinating corporate communications, underscoring their influence over information pathways during organizational crises.[33] Similarly, in online platforms like Twitter, studies from the 2010s have applied betweenness centrality to detect influencers who bridge diverse user communities during events, such as political discussions, where high-betweenness accounts amplify message propagation across ideological divides.[34]
The removal of high-betweenness nodes significantly impacts network resilience, as these bridges are critical for maintaining connectivity; targeted sabotage models demonstrate that eliminating such nodes fragments communities more effectively than random removals, leading to rapid disintegration of information flow.[35] This vulnerability has been evidenced in simulations of social networks, where betweenness-based attacks outperform degree-based ones in isolating clusters, highlighting the strategic importance of these nodes in preserving overall network cohesion.[36]
A classic case study is Freeman's 1977 analysis of sociograms from interpersonal advice networks, where betweenness centrality identified consultants as central brokers enhancing group efficiency.[2] In a modern context, during the COVID-19 pandemic, betweenness centrality has been employed in contact tracing networks to pinpoint superspreaders—individuals bridging infection clusters—enabling targeted interventions that disconnected transmission paths and reduced outbreak sizes in real-world epidemiological data from 2020.[37]
Biological and Physical Systems
In biological systems, betweenness centrality has been instrumental in analyzing protein-protein interaction (PPI) networks, where nodes with high betweenness values often represent essential proteins acting as critical hubs that mediate information flow between otherwise disconnected modules. For instance, in the yeast PPI network, proteins exhibiting elevated betweenness centrality are significantly more likely to be essential for cellular viability, as they lie on numerous shortest paths facilitating signal transduction and metabolic coordination. This property has enabled the identification of potential drug targets, as disrupting high-betweenness proteins can selectively impair disease-related pathways while minimizing off-target effects in cancer and infectious disease modeling.
In neuroscience, betweenness centrality applied to brain connectivity graphs derived from functional magnetic resonance imaging (fMRI) data reveals regions that serve as integration zones for information processing. High-betweenness nodes, such as those in the default mode network or prefrontal cortex, indicate hubs where neural signals converge and are rerouted across distant brain areas, supporting cognitive functions like attention and memory. Studies from the 2010s using resting-state fMRI have shown that alterations in betweenness centrality correlate with neurological disorders, including Alzheimer's disease, where reduced betweenness in temporal regions disrupts global integration.
Physical systems leverage betweenness centrality to model flow dynamics in natural and engineered structures. In river networks, it quantifies sediment transport pathways by identifying junctions where the majority of shortest paths—representing efficient water and particle routing—converge, aiding in the prediction of erosion hotspots and flood resilience. For example, analyses of basin-scale river graphs highlight high-betweenness confluences as key control points for sediment flux. Similarly, in transportation grids, betweenness centrality pinpoints traffic bottlenecks by measuring the fraction of shortest routes passing through road segments or intersections, informing urban planning to alleviate congestion in high-load corridors.
In environmental applications, betweenness centrality supports failure prediction in critical infrastructure like urban water distribution systems and power grids. In water networks, edges with high betweenness indicate pipes that carry disproportionate flow volumes, making them vulnerable to bursts under pressure variations and guiding targeted maintenance to prevent widespread outages. For power grids, it identifies transmission lines prone to cascading failures, as high-betweenness components channel electricity across the system; vulnerability assessments using this metric have predicted blackout risks with high accuracy in real-world grids. Post-2015 applications extend to climate network analysis, where betweenness centrality maps teleconnections in atmospheric circulation patterns, revealing influential regions like the Pacific Ocean that mediate extreme weather propagation.
Comparisons with Other Centralities
Betweenness centrality differs from degree centrality, which simply counts the number of direct connections a node has, by emphasizing a node's role in facilitating communication across the entire network rather than local popularity. In a star graph, the central node exhibits high values in both measures due to its numerous direct links and control over all paths between peripheral nodes. However, in a network consisting of two densely connected cliques linked by a single bridge node, the bridge node has relatively low degree but exceptionally high betweenness, as it lies on nearly all shortest paths between the cliques, highlighting betweenness's focus on global intermediation. Empirical studies across diverse networks, such as social and collaboration graphs, show a moderate positive correlation between the two measures (average Pearson's r ≈ 0.70), indicating that high-degree nodes often serve as intermediaries but not always.[38]
In contrast to closeness centrality, which quantifies a node's average shortest-path distance to all others and thus measures overall accessibility, betweenness centrality specifically captures the extent to which a node mediates between pairs of other nodes. For instance, in a linear path graph, the central node scores highly on both, benefiting from proximity and path control, but in elongated or hierarchical networks, the measures diverge: nodes near the center may have high closeness for quick access but lower betweenness if paths bypass them. Correlations between betweenness and closeness are generally weaker (average r ≈ 0.37), particularly in sparse or asymmetric directed networks where proximity does not imply brokerage.[38]
Unlike eigenvector centrality (or PageRank variants), which assesses a node's influence based on connections to other influential nodes and thus emphasizes prestige or recursive importance, betweenness centrality disregards the "quality" of endpoints and focuses solely on path dependency. This makes betweenness particularly suited for identifying brokers who control information flow, while eigenvector centrality excels in detecting hubs of influence, such as in citation networks where authority propagates through high-prestige links. For example, in a prestige-driven network like a star with a high eigenvector center, betweenness may undervalue peripheral nodes' indirect roles, and empirical correlations are moderate (average r ≈ 0.64), with betweenness outperforming in brokerage tasks and eigenvector in influence diffusion.[38]
| Measure | Key Assumption | Computational Complexity | Primary Use Cases |
|---|---|---|---|
| Degree Centrality | Local direct connections suffice for influence | O(n + m) | Assessing immediate popularity or infection risk in local clusters[39] |
| Closeness Centrality | Shortest paths determine proximity and efficiency | O(n m) | Evaluating communication speed or reachability in elongated structures[39] |
| Betweenness Centrality | Nodes control flow if on many shortest paths between others | O(n m) | Identifying brokers or bottlenecks in global information routing[39] |
| Eigenvector Centrality | Influence accrues from connections to influential nodes | O(n^3) or iterative O(m log n) | Measuring prestige or authority in recursive influence networks[40] |
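The clique-and-bridge contrast described above can be checked directly; the sketch below (assuming NetworkX) uses a barbell graph, where the connecting node has near-minimal degree but maximal betweenness.

```python
# Two 5-cliques joined through a single path node: low degree, high betweenness.
import networkx as nx

G = nx.barbell_graph(5, 1)

deg = nx.degree_centrality(G)
btw = nx.betweenness_centrality(G)

bridge = max(btw, key=btw.get)
print(bridge, deg[bridge], btw[bridge])  # the path node tops betweenness, not degree
```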
Generalizations and Alternatives
Current-flow betweenness centrality extends the traditional measure by modeling information spread as electrical current or random walks across all possible paths, rather than restricting to shortest paths (geodesics).[41] In this formulation, the centrality of a vertex v, denoted C_B(v), is computed as the sum over all pairs of vertices s and t of the fraction of current flowing through v when a unit current is injected at s and extracted at t, analogous to voltage differences in a resistor network.[41] This approach is particularly suitable for undirected networks where flows do not follow strict shortest paths, such as in communication or transportation systems with multiple routes.[42]
Random walk betweenness centrality provides another generalization, focusing on the expected number of times a vertex is visited during random walks between all pairs of other vertices, thereby accounting for diffusion processes over all paths.[43] Introduced by Newman, this measure C_B(v) sums, for each pair s, t, the probability that a random walk from s to t passes through v, normalized appropriately.[43] It proves useful in modeling stochastic processes like disease spread or information diffusion, where traversals are probabilistic rather than deterministic.[43]
Percolation centrality further generalizes by incorporating network reliability, weighting paths based on the probability that edges survive random failures, thus emphasizing robust connectivity during percolation transitions.[44] The centrality of a node is quantified by its contribution to the percolation process, integrating topological position with edge survival probabilities up to a percolation threshold.[44] This variant is applied in assessing node importance in resilient infrastructures, such as power grids or biological networks under stress.[44]
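The contrast between geodesic and flow-based variants shows up even on tiny graphs; the sketch below (assuming NetworkX with SciPy installed, and an invented toy graph) compares the two on a pair of parallel routes of unequal length.

```python
# Shortest-path vs. current-flow betweenness on two parallel s-t routes.
import networkx as nx

G = nx.Graph([("s", "a"), ("a", "t"),              # short route
              ("s", "b"), ("b", "c"), ("c", "t")]) # longer parallel route

sp = nx.betweenness_centrality(G)
cf = nx.current_flow_betweenness_centrality(G)

# Geodesic betweenness routes the whole (s, t) pair via "a"; the current-flow
# variant also credits "b" and "c" for the current they carry on the long route.
print(sp["b"], cf["b"])
```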
Bounded-distance variants offer approximations by limiting consideration to shortest paths within a maximum distance k, reducing computational demands while capturing local flow influences.[45] These are employed in large-scale networks for efficient analysis of approximate centrality in scenarios like delay-tolerant communications.[46] Overall, generalizations such as current-flow and percolation betweenness are preferred when traditional betweenness overlooks path diversity or reliability, enhancing applicability in dynamic or uncertain environments.[41][44]