Wiener index
The Wiener index, also known as the path number, is a topological invariant of a graph defined as the sum of the shortest path distances between all unordered pairs of vertices.[1] Introduced by chemist Harry Wiener in 1947, it was originally developed to quantify the branching of alkane molecular structures and predict their boiling points by modeling carbon atoms as vertices and bonds as edges in a graph.[1][2]
Wiener's work, spanning five papers from 1947 to 1948, marked the beginning of chemical graph theory, where the index served as the first molecular descriptor to correlate structural features with physicochemical properties such as boiling points, heat of formation, and viscosity of paraffin hydrocarbons.[2] Over the subsequent decades, the Wiener index has become the oldest and most studied distance-based topological index, inspiring hundreds of similar invariants and extensive research in both chemistry and pure mathematics.[3] In chemistry, it remains widely used for quantitative structure-property relationship (QSPR) modeling, drug design, and predicting molecular stability, with strong correlations observed for properties like octanol-water partition coefficients and toxicity in organic compounds.[3]
In graph theory, the Wiener index quantifies connectivity and compactness, with explicit formulas available for specific graph classes such as trees, paths, and stars; for an n-vertex path graph, it equals \binom{n+1}{3}, representing the maximum value among trees, while the star graph yields the minimum (n-1)^2.[3] Its computation and bounds have been explored for various operations like joins, coronas, and line graphs, revealing applications in network analysis, optimization problems, and enumerative combinatorics.[4] Despite its simplicity, the index's sensitivity to graph diameter and branching has led to ongoing studies on its extremal properties and generalizations to hypergraphs, directed graphs, and fuzzy environments.[3]
Fundamentals
Definition
In graph theory, the Wiener index is a distance-based invariant originally motivated by the analysis of molecular structures in chemistry.
An undirected graph G consists of a finite nonempty vertex set V(G) and an edge set E(G) whose elements are unordered pairs of distinct vertices from V(G), representing connections without direction. A graph G is connected if, for every pair of distinct vertices u, v \in V(G), there exists at least one path—a sequence of distinct vertices and edges alternately connecting them—from u to v. The shortest path distance d_G(u, v) between two vertices u, v \in V(G) in a connected graph G is the minimum number of edges in any path connecting u and v.
For a connected undirected graph G with vertex set V, the Wiener index W(G) is defined as the sum of the shortest path distances over all unordered pairs of distinct vertices:
W(G) = \sum_{\{u, v\} \subseteq V, \, u \neq v} d_G(u, v).
This is equivalently expressed as
W(G) = \frac{1}{2} \sum_{u \in V} \sum_{\substack{v \in V \\ v \neq u}} d_G(u, v).
The notation W(G) and d_G(u, v) are conventional in this context.
Examples
To illustrate the Wiener index, consider simple graph families where distances can be enumerated explicitly.
For the path graph P_n with n vertices labeled sequentially as v_1, v_2, \dots, v_n connected by edges v_i - v_{i+1}, the Wiener index is given by the closed-form expression W(P_n) = \frac{n(n^2 - 1)}{6}. This formula arises from summing distances d(v_i, v_j) = |i - j| over all unordered pairs \{i, j\} with i < j, which simplifies to \sum_{k=1}^{n-1} k(n - k). To compute explicitly for n=4 (vertices v_1 - v_2 - v_3 - v_4), the pairwise distances are: d(v_1,v_2)=1, d(v_1,v_3)=2, d(v_1,v_4)=3, d(v_2,v_3)=1, d(v_2,v_4)=2, d(v_3,v_4)=1. Summing these yields W(P_4) = 1 + 2 + 3 + 1 + 2 + 1 = 10, matching the formula \frac{4(16 - 1)}{6} = 10.
For the complete graph K_n where every pair of n vertices is adjacent, all distances are 1, so the Wiener index is simply the number of edges, W(K_n) = \binom{n}{2} = \frac{n(n-1)}{2}. This reflects the minimal possible sum for connected graphs on n vertices, as no path exceeds length 1.
For the cycle graph C_n with n vertices forming a closed loop, the Wiener index for even n is W(C_n) = \frac{n^3}{8}. This derives from symmetry: for even n = 2m, the sum of distances from each vertex to the other n-1 vertices is $2 \sum_{k=1}^{m-1} k + m = m(m-1) + m = m^2, and dividing by 2 (to avoid double-counting pairs) gives the total \frac{n}{2} \cdot m^2 = m^3 = \frac{n^3}{8}. For n=4, this yields W(C_4) = \frac{64}{8} = 8, consistent with four adjacent pairs at distance 1 and two opposite pairs at distance 2.
The following table compares these values for n=4:
| Graph | Wiener Index |
|---|
| P_4 | 10 |
| C_4 | 8 |
| K_4 | 6 |
Historical Development
Origins in Chemistry
The Wiener index, originally termed the "path number," was introduced by chemist Harry Wiener in 1947 as a structural descriptor for predicting the boiling points of paraffin hydrocarbons.[1] Wiener's work on this topic spanned five papers published between 1947 and 1948. In his seminal paper, Wiener sought to quantify the topological features of molecular structures, emphasizing how the arrangement of atoms influences physical properties such as boiling points.[1] This approach arose amid post-World War II advancements in chemical engineering, where there was growing demand for computational tools to model and design molecules efficiently without extensive laboratory testing.
Wiener's primary motivation was to capture the effects of molecular branching and overall size in hydrocarbon graphs, representing carbon skeletons as graphs where vertices denote carbon atoms and edges represent bonds.[1] The index, defined as the sum of the shortest path distances between all pairs of vertices in the graph, provided a measure of molecular compactness: lower values indicated more branched structures, which correlated with lower boiling points due to reduced surface area for intermolecular forces.[1] By focusing on acyclic paraffin chains and branches, Wiener addressed the limitations of simpler descriptors like chain length alone, which failed to account for isomer-specific variations.[1]
In early applications, Wiener applied the index to a dataset of 94 paraffin hydrocarbons, including linear alkanes from butane to dodecane and branched isomers up to decanes.[1] He developed linear regression models combining the Wiener index with other topological parameters—such as the number of bonds (I), the number of branches (B), and a complexity measure (K)—to achieve accurate predictions of boiling points, with correlation coefficients exceeding 0.98 for the combined model.[1] These models demonstrated the index's utility in empirical structure-property relationships, laying the groundwork for quantitative structure-activity analysis in organic chemistry.[1]
Mathematical Adoption
During the 1970s, the Wiener index began transitioning from its origins as a chemical descriptor to a fundamental graph invariant in pure mathematics, particularly within combinatorial graph theory. Mathematicians such as Frank Harary integrated it into studies of distance-regular graphs and spectral graph theory, recognizing its utility in quantifying global connectivity and structural properties beyond chemical contexts.
A pivotal contribution came in 1976 with the work of Roger Entringer, David Jackson, and David Snyder, who examined the Wiener index specifically for trees, proving that among all trees on n vertices, the path graph P_n achieves the maximum value while the star graph K_{1,n-1} achieves the minimum.[5] This paper marked an early mathematical formalization, sparking interest in extremal problems for the index in discrete structures. In the 1980s, Ivan Gutman further advanced its mathematical standing through contributions linking the Wiener index to graph eigenvalues, including bounds and correlations with spectral radius and Laplacian spectra in various graph classes.[6]
The index was increasingly formalized as a topological invariant in combinatorics, emphasizing its role in extremal graph theory, where researchers sought graphs maximizing or minimizing it under constraints like degree sequences or diameter.[7] Key milestones included its comprehensive treatment in the 1990 monograph Distance in Graphs by Frank Buckley and Frank Harary, which positioned the Wiener index alongside other distance-based measures in graph theory monographs. By the 1990s, it had become a standard tool in combinatorial problems, appearing in mathematical olympiads and competitions focused on graph extremals.[8]
Properties
Basic Properties
The Wiener index of a graph is invariant under isomorphism, as it is defined in terms of the shortest-path distances between vertices, which are preserved by isomorphisms.[7]
For disconnected graphs, the Wiener index is additive over the connected components; specifically, if G is the disjoint union of connected graphs G_1, G_2, \dots, G_p, then W(G) = \sum_{i=1}^p W(G_i). Thus, for two connected graphs G and H, W(G \cup H) = W(G) + W(H).[7]
Subdividing an edge in a connected graph by inserting a new vertex strictly increases the Wiener index, since this operation lengthens the distances for all pairs of vertices whose shortest paths traverse that edge. This monotonicity property underscores how structural modifications that elongate paths affect the overall sum of distances.[7]
The Wiener index is highly sensitive to the connectivity of the graph: adding edges generally decreases W(G) by shortening some distances, while removing edges increases it. Among all connected graphs on n vertices, the minimum value is W(G) \geq \binom{n}{2}, achieved uniquely by the complete graph K_n. For trees on n vertices, the minimum is (n-1)^2, attained by the star graph S_n.[7]
Bounds and Extremal Graphs
The Wiener index W(G) of a connected graph G on n vertices satisfies \frac{n(n-1)}{2} \leq W(G) \leq \frac{n(n^2 - 1)}{6}, with the lower bound achieved by the complete graph K_n where all pairwise distances are 1, and the upper bound achieved by the path graph P_n where distances are maximized under connectivity.[9][10] These extremal graphs represent the fundamental optimization problems for the Wiener index among all connected graphs, as adding edges reduces distances and thus W(G), while trees maximize it relative to denser graphs.[10]
Among trees on n vertices, the star graph S_n (or K_{1,n-1}) achieves the minimum Wiener index, with W(S_n) = (n-1)^2, as distances are either 1 (from center to leaves) or 2 (between leaves). This result, often referred to in the context of Wiener's early work on molecular graphs and later formalized in extremal graph theory, shows that highly centralized structures minimize the sum of distances.[11] The maximum Wiener index for trees coincides with that of P_n, \frac{n(n^2 - 1)}{6}, as the path maximizes diameters and pairwise distances without cycles.[12]
Extremal problems for the Wiener index often incorporate degree constraints to model real-world networks. For trees with maximum degree \Delta \geq 2, the graphs minimizing W(G) are balanced \Delta-ary trees, which minimize the diameter and thus the sum of distances; for example, when \Delta = 3, the balanced ternary tree achieves the minimum among cubic trees.[13] Conversely, maximizing W(G) under fixed maximum degree \Delta yields path-like trees with degrees bounded by \Delta, approaching the unconstrained path as \Delta increases.[12] These results highlight how degree restrictions shift extremal structures toward balanced branching for minimization and linear extension for maximization.
Recent advancements (post-2000) have refined bounds incorporating structural parameters like minimum and maximum degrees. For instance, upper bounds on the Wiener index in terms of order n, minimum degree \delta, and maximum degree \Delta have been derived, with extremal graphs including Moore-like trees or cages for small girth.[14] Extensions to hypergraphs, relevant for higher-dimensional networks, include sharp upper bounds on the Wiener index for uniform hypergraphs, achieved by linear (path-like) structures that maximize distances across edges.[15]
Computation
General Algorithms
The Wiener index of a graph G = (V, E) with n = |V| vertices can be computed by first determining all pairwise shortest path distances and then summing them over all unordered pairs of vertices. One standard approach for dense graphs utilizes the Floyd-Warshall algorithm to calculate the all-pairs shortest paths matrix in O(n^3) time, after which the distances are summed, yielding the index directly.[16] This method is particularly suitable when the graph's adjacency matrix is readily available and the number of edges m is on the order of n^2, as it avoids repeated traversals.
For sparser graphs, a more efficient technique involves running breadth-first search (BFS) from each vertex to compute its distances to all others, followed by summing these distances and dividing by 2 to account for undirected edges. This BFS-based approach has a total time complexity of O(n(n + m)), making it preferable when m \ll n^2.[16] The algorithm leverages the graph's adjacency list representation for BFS traversals, ensuring scalability for moderate-sized networks.
While direct computation of the Wiener index for general graphs is polynomial-time solvable via these methods, certain restricted variants—such as finding an orientation of the graph that maximizes the Wiener index—are NP-complete.[17] These hardness results highlight computational challenges in optimization contexts but do not affect the feasibility of exact calculation for unoriented, undirected graphs.
In practice, libraries like NetworkX implement the Wiener index computation using efficient shortest-path routines, typically invoking all-pairs BFS for unweighted graphs to sum the distances.[18] For illustration, the following pseudocode outlines the BFS-based summation:
function wiener_index(G):
n = number of vertices in G
total = 0
for each vertex v in G:
distances = BFS_distances_from(v, G) // O(n + m) time
total += sum(distances)
return total / 2
function wiener_index(G):
n = number of vertices in G
total = 0
for each vertex v in G:
distances = BFS_distances_from(v, G) // O(n + m) time
total += sum(distances)
return total / 2
This implementation assumes an unweighted, connected graph and can be adapted for weighted cases using Dijkstra's algorithm in place of BFS.
Special Graph Classes
For trees, the Wiener index admits a closed-form expression based on the sizes of subtrees induced by edge removals. Specifically, for a tree T with edge set E(T), the Wiener index is given by
W(T) = \sum_{e=(u,v) \in E(T)} n_u(e) \cdot n_v(e),
where n_u(e) (respectively, n_v(e)) denotes the number of vertices in the connected component of T - e that contains u (respectively, v).[19] This formula arises because each edge e contributes to the total distance sum exactly n_u(e) \cdot n_v(e) times, as paths crossing e connect vertices in the two components separated by e. Computing this requires a single traversal to determine subtree sizes, yielding an O(|V|) time complexity.[19]
For cycle graphs C_n with n \geq 3 vertices, the distances between vertices are determined by the minimum arc length along the cycle, leading to exact closed-form expressions for the Wiener index. When n is even,
W(C_n) = \frac{n^3}{8},
and when n is odd,
W(C_n) = \frac{n(n^2 - 1)}{8}.
[20] These formulas are derived by summing the pairwise distances, where for each vertex, the distances to others are $1, 2, \dots, \lfloor n/2 \rfloor, each appearing twice except possibly the maximum in odd cases, and then multiplying by n/2 to account for unordered pairs.[20]
In complete bipartite graphs K_{m,n}, all edges connect the two partite sets of sizes m and n, so distances are 1 between different sets and 2 within the same set. The Wiener index is thus
W(K_{m,n}) = m^2 + mn + n^2 - m - n,
which simplifies from mn pairs at distance 1 and \binom{m}{2} + \binom{n}{2} pairs at distance 2.[21] This expression can be computed in constant time given m and n.[21]
For grid graphs, specifically the Cartesian product P_m \square P_n of two paths with m and n vertices, the Wiener index leverages the additive distance structure in the product. It is given by
W(P_m \square P_n) = n^2 \cdot \frac{m(m^2 - 1)}{6} + m^2 \cdot \frac{n(n^2 - 1)}{6},
where \frac{k(k^2 - 1)}{6} is the Wiener index of the path P_k.[22] This follows from the general formula for Cartesian products W(G \square H) = |H|^2 W(G) + |G|^2 W(H), with distances being the sum of component distances.[22] For hypercube graphs Q_n, which are n-fold Cartesian products of P_2, the Wiener index has the closed form
W(Q_n) = n \cdot 4^{n-1}.
[23] This reflects the binomial distribution of Hamming distances in the hypercube, where the number of pairs at distance k is \binom{n}{k} 2^{n-k}, summed as k \binom{n}{k} 2^{n-k} over k=1 to n, yielding the formula after simplification.[23]
Applications
Molecular Topology
The Wiener index functions as a fundamental topological descriptor in quantitative structure-activity relationship (QSAR) analyses of molecular graphs, where it quantifies the overall shape and extent of a molecule by summing the shortest path distances between all pairs of atoms.[1] This index correlates inversely with molecular compactness: linear alkane chains exhibit high Wiener index values due to extended path lengths, while branched structures yield lower values as atoms are more closely interconnected, providing a measure of molecular "spread" or linearity.[24]
In QSAR modeling, the Wiener index has been applied in linear regression to predict key physico-chemical properties such as boiling points and solubility, leveraging its sensitivity to molecular topology. Wiener's seminal 1947 study demonstrated its efficacy by correlating the index with boiling points of paraffin hydrocarbons, achieving an r² of approximately 0.98 across the dataset.[1]
In contemporary drug design during the 2020s, the Wiener index contributes to predicting bioavailability within absorption, distribution, metabolism, and excretion (ADME) frameworks, often alongside other topological indices in machine learning regressions on PubChem datasets of potential therapeutics. For instance, studies on antiviral candidates like those for COVID-19 have used the index in QSPR modeling of properties such as polar surface area—a proxy for oral bioavailability.[25]
Network Theory
In network theory, the Wiener index quantifies the overall efficiency of communication, social, and transportation networks by summing the shortest path distances between all pairs of vertices, providing insight into the compactness or dispersion of connections. A high Wiener index signifies spread-out networks, such as sparse social graphs where interactions are infrequent and paths are long, leading to reduced information flow or coordination. Conversely, a low Wiener index characterizes hub-dominated structures, like star-like topologies in peer-to-peer systems, where central nodes facilitate shorter paths and higher efficiency.[26]
In social networks, the Wiener index is applied in sociometry to evaluate the structural dispersion of relationships, helping researchers assess how isolated or interconnected communities are within larger groups. For instance, in sparse affiliation networks, elevated values highlight potential bottlenecks in influence propagation.[20]
Practical examples include urban road networks, where the Wiener index aids traffic optimization by modeling total travel distances; removing a critical link, such as a major artery, increases the index, signaling longer detours and higher congestion costs for commuters. Similarly, in internet topology, constructing minimum Wiener index spanning trees minimizes aggregate latency in wireless sensor and routing infrastructures, ensuring efficient data transmission across distributed nodes.[26][27]
Recent applications from the 2010s to 2025 have extended the Wiener index to epidemiology, where it models disease spread on contact networks; lower values correlate with faster outbreak speeds due to shorter transmission paths, as seen in simulations of pathogen dynamics on real-world graphs.[28][29]
Compared to the average distance, the Wiener index is directly proportional—specifically, it equals \binom{n}{2} times the average, where n is the number of vertices—offering a global scale for efficiency while the average provides a per-pair perspective. It is also integrated with clustering coefficients in community detection, where low Wiener values alongside high clustering reveal dense, modular subgroups in social or infrastructural networks.[7]
Extensions
Variants and Generalizations
The weighted Wiener index generalizes the standard Wiener index by incorporating a weight function on the vertices to reflect additional properties such as atom types or masses in molecular structures. For a connected graph G = (V, E) with vertex weight function w: V \to \mathbb{R}_+, the weighted Wiener index is defined as
W_w(G) = \sum_{u < v} w(u) w(v) \, d_G(u,v),
where d_G(u,v) is the shortest path distance between vertices u and v. This variant enhances the original index's ability to correlate graph topology with physico-chemical properties, such as boiling points in alkanes, by weighting distances according to vertex characteristics. The definition was formalized in chemical graph theory to address limitations of unweighted indices in quantitative structure-property relationship (QSPR) models.[30]
The partial Wiener index restricts the summation to pairs of vertices from specified subsets, allowing analysis of distances within or between particular groups, such as different atom types in a molecule. For subsets A, B \subseteq V(G), it is given by
W(A, B; G) = \sum_{u \in A} \sum_{v \in B} d_G(u,v).
When A = B, it captures intra-subset distances; otherwise, it measures inter-subset interactions. This adaptation is valuable in organic chemistry for evaluating structural features like branching or heteroatom placements, improving predictions of molecular stability and reactivity without computing the full index. The concept emerged in studies of tree-like molecular graphs to isolate contributions from specific vertex classes.
Higher-order analogs extend the index by raising distances to powers, emphasizing longer paths in the distance distribution. The k-th power Wiener index for integer k \geq 1 is
W^{(k)}(G) = \sum_{u < v} d_G(u,v)^k.
For k=1, it recovers the standard Wiener index; higher k values amplify the role of remote vertex pairs, yielding better regression fits in QSAR for properties like heat of formation in hydrocarbons. These indices provide a hierarchy of topological descriptors, with seminal applications in modeling alkane isomers.[31]
Variants for directed graphs and multigraphs adapt the index to handle directionality or multiple/weighted edges. In a strongly connected digraph D, the directed Wiener index is
W(D) = \sum_{u \neq v} d_D(u,v),
summing directed shortest path lengths over ordered pairs, suitable for asymmetric networks like traffic flows. For multigraphs or edge-weighted graphs, distances use the minimum path weight sum, \delta_G(u,v) = \min \{ \sum_{e \in P} w(e) \mid P \text{ path from } u \text{ to } v \}, yielding W(G) = \sum_{u < v} \delta_G(u,v); this accommodates edge capacities in applications like transportation modeling. These extensions preserve the core summation while incorporating edge properties, with extremal properties studied for orientations and weighted trees.
Reconstruction Problems
The reconstruction problem for the Wiener index involves determining whether a given integer w can be realized as the Wiener index of a graph (or a specific class like trees) and, if so, identifying the possible graphs. A key challenge is the non-uniqueness of such realizations, as multiple non-isomorphic trees can share the same Wiener index w. For sufficiently large w, there exist at least $2^{\sqrt{w}/4} non-isomorphic trees with Wiener index w, while the upper bound is at most $2^{O(\sqrt{w})}. This ambiguity arises from structural variations, such as different branchings in trees with comparable path lengths, limiting the Wiener index's ability to uniquely identify a graph.
For trees, reconstruction often combines the Wiener index with other invariants like the degree sequence to narrow possibilities. Algorithms exploit the Wiener matrix, whose entries are distances between vertices, to reconstruct the tree by identifying adjacencies from maximum row or column entries, assuming no pendant vertices and sufficient girth conditions. When paired with the degree sequence, the Wiener index enables identification of extremal trees, such as the unique ball or festoon minimizing or maximizing W via direct or alternating labelings based on W - d(v), where d(v) is the distance sum from v. These methods provide certificates for bounded-degree trees and refine majorization orders on degree sequences.
The complexity of realizing a given w as the Wiener index varies by graph class. For trees, pseudo-polynomial dynamic programming algorithms construct a tree with exact Wiener index w, running in time polynomial in the input size and w. For caterpillar trees, which have a path backbone with pendant leaves, the structural simplicity allows polynomial-time reconstruction variants, particularly when combined with constant distinct weights or path constraints. In contrast, for general graphs, related inverse problems like maximizing the Wiener index under vertex weights or degrees are NP-complete, highlighting the computational hardness of precise realizations.
Computational studies up to 1206 confirmed 49 specific values that are not realizable as Wiener indices of trees, a result proved for all larger integers in 2006. These bounds aid molecular design by quantifying structural diversity for given w, though full disambiguation typically requires additional invariants.[32]