
Link prediction

Link prediction is a core task in network analysis that aims to infer the likelihood of missing or future edges between nodes in a network, given a partial snapshot of its structure. This problem models real-world systems as graphs in which vertices represent entities and edges denote relationships, enabling predictions based on observed topology, node attributes, and relational patterns. Link prediction was formalized in the early 2000s and gained prominence through foundational work examining collaboration networks and web structures. Early methods focused on similarity-based heuristics, such as the common neighbors index, which scores potential links by the number of shared connections, or global measures like the Katz index, which propagates influence across paths of varying lengths. These approaches leverage topological features to estimate proximity, with local methods prioritizing efficiency and global ones capturing long-range dependencies. Subsequent developments introduced probabilistic and maximum likelihood models, including stochastic block models that assume community structures and hierarchical models accounting for network organization.

In the deep learning era, graph neural networks (GNNs) have revolutionized the field by learning low-dimensional embeddings of nodes and edges, enabling scalable predictions through message-passing mechanisms in frameworks like GraphSAGE. These GNN-based techniques excel at handling heterogeneous and dynamic graphs, outperforming traditional methods on benchmarks such as citation and protein interaction networks.

Link prediction finds diverse applications across domains, including recommender systems, where it suggests user-item connections; biological network reconstruction, where it helps uncover protein interactions (e.g., estimating the roughly 80% of links believed missing in yeast PPI networks); and knowledge graph completion for semantic web tasks. In social networks it anticipates friendships or collaborations, while in cybersecurity it flags anomalous links. Evaluation typically uses metrics such as area under the ROC curve (AUC) and precision at top-k, with datasets from repositories such as SNAP and KONECT serving as standards. Ongoing challenges include scalability for massive graphs, handling temporal dynamics, and addressing biases from highly imbalanced link distributions.

Problem Definition

Mathematical Formulation

Link prediction is the task of inferring the existence of potential missing or future edges in an undirected graph G = (V, E), where V is the set of nodes representing entities and E is the set of observed edges representing interactions between them. This formulation assumes simple graphs without self-loops or multiple edges between the same pair of nodes. Formally, given a partial observation of the graph, the goal is to predict edges in the unobserved portion U \setminus E, where U is the universal set of all possible edges, with |U| = |V| \cdot (|V| - 1)/2. In temporal settings, this involves dividing the edges into a training set E_T from an earlier time interval and a probe set E_P from a later interval, such that E = E_T \cup E_P and E_T \cap E_P = \emptyset, then predicting links in E_P based on E_T. The problem is commonly framed as a binary classification task over pairs of nodes, treating observed edges as positive samples and unobserved pairs as negative samples, with the objective of estimating the probability p(u, v) that an edge exists between nodes u and v. The graph structure is often represented using the adjacency matrix A \in \{0, 1\}^{|V| \times |V|}, where A_{uv} = 1 if (u, v) \in E and A_{uv} = 0 otherwise (with A symmetric for undirected graphs); link prediction then seeks to estimate values of A_{uv} for unobserved entries. For large graphs, the computational challenge arises from the O(|V|^2) number of possible node pairs, which makes exhaustive evaluation prohibitive without efficient approximations. Static formulations (random edge splits) contrast with snapshot-based approaches (temporal divisions); the latter better capture evolving structure but require handling time-stamped edges.
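The following minimal Python sketch illustrates this formulation on a hypothetical toy graph: observed edges are split into a training set and a probe set, and the candidate pairs a predictor must score are the complement of the training edges in the universal pair set. The graph, split ratio, and variable names are illustrative assumptions rather than a standard benchmark.

```python
import itertools
import random

# Toy undirected graph: nodes and observed edges (no self-loops, no multi-edges).
V = [0, 1, 2, 3, 4]
E = {(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)}

# Universal set U of all |V|*(|V|-1)/2 possible node pairs.
U = set(itertools.combinations(V, 2))

# Static formulation: hold out a random 20% of observed edges as the probe set E_P
# and train on the remainder E_T; the task is to rank pairs in U \ E_T so that
# probe edges score above true non-edges.
random.seed(0)
E_list = sorted(E)
random.shuffle(E_list)
split = int(0.8 * len(E_list))
E_T, E_P = set(E_list[:split]), set(E_list[split:])
non_edges = U - E          # true negatives (pairs never observed)
candidates = U - E_T       # everything a predictor must score: E_P plus non-edges

print(f"{len(E_T)} training edges, {len(E_P)} probe edges, {len(non_edges)} non-edges")
```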

Problem Variants

Link prediction problems extend beyond the standard static, undirected, binary-edge setting to accommodate real-world complexities in network structure. One key variant distinguishes between static and dynamic scenarios. In static link prediction, the graph is treated as fixed, and the task is to infer missing or potential links from the currently observed topology. In contrast, dynamic link prediction addresses evolving graphs where edges appear, disappear, or change over time, often using temporal information to forecast future connections. This variant is crucial for applications like social network evolution, where historical interaction patterns inform predictions of emerging ties. Surveys highlight that dynamic approaches must capture temporal dependencies, for instance via time-decay functions or sequential models that weigh recent interactions more heavily than older ones; recent surveys as of 2025 document continuing advances in temporal link prediction.

Another important distinction involves directed versus undirected graphs. Undirected link prediction assumes symmetric relationships, where a link between nodes u and v implies mutual connectivity, as in friendship networks. Directed variants model asymmetric relations, such as citations or web hyperlinks, where an edge from u to v does not imply the reverse. In directed graphs, prediction requires distinguishing incoming and outgoing links, often using asymmetric similarity measures such as directed common neighbors to account for directionality. For instance, in citation networks, predicting a directed link from paper A to paper B involves analyzing A's references to B's topics without assuming reciprocity. Research emphasizes that naive adaptation of undirected methods to directed graphs can lead to suboptimal performance because directionality is ignored.

Heterogeneous graphs introduce further complexity by featuring multiple node and edge types, as seen in knowledge graphs with entities like persons and organizations and relations such as "employs" or "located_in." Unlike in homogeneous graphs, link prediction here must respect type constraints, predicting only plausible connections (e.g., links between persons and organizations). Methods often leverage meta-paths, i.e., sequences of typed edges, to capture semantic relationships, enabling type-aware embeddings that improve prediction accuracy on benchmarks like academic collaboration networks. This variant is prevalent in recommendation systems, where diverse data types (users, items, reviews) require tailored proximity measures. Seminal work on relational graph convolutional networks established foundations for handling such multiplicity.

N-ary link prediction generalizes beyond binary edges to multi-relational facts in knowledge graphs that combine a head entity, relation, tail entity, and additional attributes; for example, the fact that Einstein developed general relativity in 1915 involves more than two interacting elements. Traditional methods falter here, as they cannot model roles or additional qualifiers; n-ary approaches instead decompose facts into structured representations, such as hyper-relations or box embeddings, to predict missing components. Surveys note that this variant addresses limitations in knowledge graphs, achieving improved Hits@1 scores on datasets like WikiPeople by incorporating role-specific embeddings, and it is essential for domains in which facts involve multiple interacting elements. Recent 2025 surveys review progress in n-ary methods.

Signed and weighted link prediction incorporate edge attributes for richer semantics. Signed variants predict positive (e.g., trust) or negative (e.g., distrust) links in signed networks, drawing on structural balance principles under which cycles containing an odd number of negative edges indicate instability. Weighted prediction extends this by forecasting edge strengths, such as interaction frequencies, often via probabilistic models that normalize similarities by edge magnitudes. In signed networks, methods exploiting longer cycles have demonstrated higher accuracy than local heuristics; for weights, reliable-route approaches predict both existence and intensity, which is vital for networks in which interaction strength matters. These variants enrich standard link prediction by quantifying relation polarity and strength.

Finally, hyperlink prediction in hypergraphs targets higher-order connections where a single hyperedge links multiple nodes, as in co-authorship groups or protein complexes. Unlike pairwise links, this setting predicts entire hyperedges, using incidence matrices or embeddings to infer group formations. Neural approaches have demonstrated improved F1 scores on datasets like DBLP by modeling the joint affinities of node subsets. This variant is key for capturing collective interactions absent in simple graphs.

Historical Development

Early Foundations

The origins of link prediction can be traced to foundational concepts in social network analysis and graph theory during the 1970s, which emphasized graph distance, connectivity, and node similarity for modeling relational structures. A key contribution came from Lorrain and White, who in 1971 defined structural equivalence in social networks as occurring when two individuals (nodes) share identical connections to others, using similarity measures computed on adjacency vectors to quantify similarity and infer potential relational patterns. This approach highlighted how topological proximity could reveal underlying social structures, providing an early theoretical basis for predicting absent links from existing connectivity.

In the 1990s, these ideas extended to bibliographic and social networks, where similarity measures were applied to analyze co-authorship patterns and anticipate collaborations. Researchers drew on metrics such as common neighbors and co-citation (building on earlier work such as Kessler's 1963 bibliographic coupling, which treated papers sharing references as similar) to explore network evolution in scientific communities. Wasserman and Faust's 1994 comprehensive treatment formalized such measures for social network analysis, including structural equivalence variants like Hamming distance, enabling the identification of potential ties in co-authorship graphs through observed relational overlaps. These efforts reflected early predictive intuitions, such as forecasting researcher collaborations from shared connections.

Parallel influences emerged from database management in the 1980s and 1990s, where record linkage and entity resolution techniques addressed the challenge of merging disparate sources by estimating the likelihood of matches via similarity scores. The probabilistic framework established by Fellegi and Sunter in 1969, which compared record pairs using weighted agreement patterns, became a cornerstone, with practical implementations proliferating in statistical agencies during this era to resolve entity identities across datasets. This work prefigured link prediction by treating potential matches as inferred connections in an implicit graph of records.

The formalization of link prediction as a distinct problem occurred in the early 2000s, with Liben-Nowell and Kleinberg's 2003 paper articulating it as the task of inferring likely future edges from a network snapshot using proximity metrics derived from graph topology. Initial motivations focused on practical applications, such as predicting hyperlinks in web graphs and new collaborations in co-authorship networks, where simple node-similarity measures demonstrated predictive power without requiring external features.

Key Advancements

In the 2000s, link prediction research gained prominence in complex network analysis, with similarity-based methods emerging as foundational approaches that leverage topological features to infer potential connections. These methods, such as common neighbors and Adamic-Adar, were extended through integration with global measures like PageRank and HITS, which adapted link-analysis principles to score node pairs based on their authority and hub scores within the network. This period marked a shift toward empirical applications in large-scale social graphs, where such techniques demonstrated improved predictive accuracy over purely local heuristics by incorporating network-wide propagation effects.

The 2010s saw the rise of matrix factorization and embedding techniques, enabling scalable representations for link prediction in sparse, high-dimensional networks. Early models treated link prediction as a supervised task, learning latent factors from observed edges to reconstruct missing ones, which proved effective on recommendation-like graphs. A pivotal advancement was the LINE model, which introduced efficient first- and second-order proximity objectives to embed nodes in low-dimensional spaces while preserving local and global structure, achieving up to a 5x speedup over methods like DeepWalk on networks with millions of nodes.

From 2015 to 2020, the advent of graph neural networks (GNNs) transformed link prediction by enabling end-to-end learning of node representations through message passing. GraphSAGE (2017) advanced inductive learning by sampling and aggregating neighborhood features, allowing generalization to unseen nodes and demonstrating superior performance on inductive node classification tasks over evolving citation networks. Similarly, the Graph Attention Network (GAT, 2018) incorporated attention mechanisms to weigh neighbor contributions dynamically, outperforming prior GNNs on node classification benchmarks.

Between 2020 and 2025, developments addressed dynamic and heterogeneous settings. Temporal GNNs such as TGAT (2020) introduced time-aware attention to model evolving interactions, capturing temporal dependencies in event streams for improved future-link forecasting in social and biological networks. In knowledge graphs, n-ary models extended prediction to multi-entity facts, with surveys highlighting tensor-based methods and embeddings that improve completeness in complex domains. The impact of big data drove scalability enhancements through sampling and approximation, such as neighborhood subsampling in GraphSAGE to handle billion-edge graphs and randomized low-rank approximations in matrix methods, reducing computational complexity from O(n^2) toward near-linear while maintaining predictive quality. A 2024 survey on feature-learning techniques underscored the transition to deep embedding methods, reporting that GCNs achieve higher AUC scores than traditional metrics like common neighbors on standard benchmarks. In late 2025, studies reconsidering graph autoencoder (GAE) performance for link prediction highlighted the impact of improved training techniques on GNN efficacy.

Background Concepts

Graphs and Networks

A graph G is formally defined as an ordered pair G = (V, E), where V is a set of vertices (also called nodes) and E is a set of edges, with each edge being an unordered pair of distinct vertices from V in the case of simple undirected graphs. Simple undirected graphs prohibit self-loops (edges connecting a vertex to itself) and multiple edges between the same pair of vertices, making them suitable for modeling symmetric relationships without redundancy. The adjacency matrix A of such a graph is a square matrix whose entry A_{ij} = 1 if there is an edge between vertices i and j, and 0 otherwise; since the graph is undirected, A is symmetric.

Networks, often used interchangeably with graphs in this context, can vary in structure. Unweighted networks assign no numerical values to edges, focusing solely on their presence or absence, whereas weighted networks include edge weights to represent strengths, capacities, or similarities. Simple graphs contrast with multigraphs, which allow multiple edges between the same pair of vertices and may include self-loops, enabling the modeling of parallel connections or reflexive relations. Real-world networks, such as social or biological systems, frequently exhibit distinctive properties: scale-free degree distributions, where the probability P(k) of a node having degree k follows a power law P(k) \sim k^{-\gamma} with 2 < \gamma < 3, as modeled by the preferential attachment mechanism in the Barabási–Albert model; the small-world phenomenon, characterized by short average path lengths despite high local clustering; and non-zero clustering coefficients, quantifying the tendency for neighbors of a node to be interconnected. The local clustering coefficient for a node u with degree k_u is C_u = \frac{2 \times \text{number of edges between neighbors of } u}{k_u (k_u - 1)}, while the global coefficient averages this quantity over all nodes.

For efficient storage and traversal, especially in large-scale applications, graph representations differ based on density. Adjacency matrices require O(|V|^2) space, which is inefficient for sparse graphs where |E| \ll |V|^2, whereas adjacency lists, arrays of lists in which each index corresponds to a vertex and stores its adjacent vertices, use O(|V| + |E|) space, making them preferable for the sparse structures common in complex networks. Basic operations include computing the degree d(u) = |\{ v \in V : \{u, v\} \in E \}|, identifying the open neighborhood N(u) = \{ v \in V : \{u, v\} \in E \}, and measuring path lengths as the number of edges in the shortest sequence of adjacent vertices between two nodes. In the context of link prediction, these graph structures matter because the observed edge set E typically represents an incomplete view of the underlying network.
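As a concrete illustration, the short Python sketch below stores a hypothetical toy graph as an adjacency list and computes the degree, open neighborhood, and local clustering coefficient defined above; the graph and helper names are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations

# Sparse adjacency-list representation (O(|V| + |E|) space) of a simple undirected graph.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

def degree(u):
    return len(adj[u])

def neighborhood(u):
    return adj[u]

def local_clustering(u):
    """C_u = 2 * (# edges among neighbors of u) / (k_u * (k_u - 1))."""
    k = degree(u)
    if k < 2:
        return 0.0
    links = sum(1 for x, y in combinations(adj[u], 2) if y in adj[x])
    return 2.0 * links / (k * (k - 1))

print(degree(2), sorted(neighborhood(2)), local_clustering(2))
# Node 2 has neighbors {0, 1, 3}; only (0, 1) is connected, so C_2 = 2*1/(3*2) = 1/3.
```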

Similarity and Proximity Measures

Similarity and proximity measures quantify the likelihood of a potential link between nodes in a graph by assessing how closely connected or structurally alike the nodes are, serving as foundational components in link prediction tasks across various network types. These measures operate on the principle that nodes exhibiting higher similarity or proximity are more prone to forming edges, drawing on topological properties of the graph. In link prediction, such measures rank non-existent edges by assigning scores that reflect relational closeness, often derived from neighborhood overlaps or path structures without relying on node attributes.

Local similarity measures focus on the immediate neighborhoods of nodes, capturing direct connections and common neighbors within short distances, which makes them computationally efficient for large graphs but limited in scope. In contrast, global similarity measures incorporate information from the entire network structure, such as paths spanning multiple hops, to provide a more comprehensive view of node relationships, though they are often more expensive to compute. This distinction allows local measures to excel in dense, localized clusters while global ones better handle broader connectivity patterns in complex networks.

Path-based proximity emphasizes the role of connecting paths between nodes, where shorter paths or weighted ensembles of all paths indicate stronger potential links; for instance, weights can decay exponentially with path length to prioritize nearby connections. These approaches model proximity as an aggregation over paths, reflecting how information or influence might propagate through the graph, and have been shown to improve prediction accuracy in networks with hierarchical structure. By considering multiple path lengths, path-based methods bridge local and global perspectives, enhancing robustness across diverse graph topologies.

Structural equivalence assesses similarity based on nodes having comparable connection patterns to other nodes in the graph, regardless of direct proximity, identifying nodes that occupy similar roles or positions within the network. Nodes deemed structurally equivalent are expected to link to similar sets of other nodes, making this measure useful for predicting links in role-based systems like organizational or biological networks. The concept extends beyond mere neighbor overlap to capture isomorphic substructures, providing insight into functional similarity.

The role of randomness in similarity measures is exemplified by stochastic block models (SBMs), which incorporate probabilistic community structure to define node similarity via latent group memberships, where nodes within the same block have higher link probabilities. SBMs introduce randomness to account for unobserved community effects, enabling similarity scores that reflect block-wise affinities rather than deterministic patterns, and they have been adapted for link prediction in evolving networks. This probabilistic framework helps model assortative mixing within communities, improving predictions in modular graphs.

Despite their utility, similarity and proximity measures exhibit limitations, particularly sensitivity to network density and sparsity: sparse graphs yield unreliable scores when there are too few paths or neighbors, resulting in poor generalization.
In low-density settings, local measures may underperform due to noise from isolated nodes, while global measures suffer from computational overhead without proportional gains in accuracy. These challenges highlight the need for hybrid or adaptive approaches to mitigate biases in underrepresented regions. These measures fundamentally inform scoring functions in link prediction by transforming structural insights into probabilistic edge likelihoods, guiding algorithms toward pairs with high predicted connectivity and paving the way for more advanced topological methods.
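To make the structural-equivalence idea concrete, the following Python sketch scores a hypothetical node pair by the cosine similarity of their adjacency rows, so that two nodes connecting to the same third parties score highly even though they share no direct edge. The toy graph and function names are assumptions for illustration only.

```python
import math
from collections import defaultdict

# Structural equivalence: nodes are similar when their adjacency vectors (rows of A)
# point to similar sets of other nodes, even if they are not directly connected.
edges = [(0, 2), (0, 3), (1, 2), (1, 3), (2, 4)]
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

def cosine_equivalence(u, v, nodes):
    # Cosine similarity between the binary adjacency rows of u and v,
    # excluding the pair itself so direct links do not dominate the score.
    others = [w for w in nodes if w not in (u, v)]
    dot = sum((w in adj[u]) and (w in adj[v]) for w in others)
    nu = math.sqrt(sum(w in adj[u] for w in others))
    nv = math.sqrt(sum(w in adj[v] for w in others))
    return dot / (nu * nv) if nu and nv else 0.0

# Nodes 0 and 1 never touch each other but connect to the same nodes {2, 3},
# so they are strongly structurally equivalent and a plausible future link.
print(cosine_equivalence(0, 1, range(5)))
```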

Approaches

Topology-Based Methods

Topology-based methods for link prediction rely exclusively on the structural properties of the graph, such as connectivity patterns and path counts between nodes, to estimate the likelihood of a potential edge forming between two non-adjacent nodes u and v. These approaches assume that the topology encodes latent relational tendencies, where nodes with similar neighborhood structures or short paths between them are more prone to connect. Seminal work formalized many of these heuristics in the context of social networks, evaluating their predictive power across diverse datasets.

One of the simplest topology-based measures is the common neighbors index, which scores a pair (u, v) by the size of the intersection of their neighbor sets: s(u,v) = |\Gamma(u) \cap \Gamma(v)|, where \Gamma(x) denotes the neighbors of node x. This metric posits that shared connections indicate a higher probability of future linkage, as common neighbors may facilitate introductions or influence. It performs well in dense, clustered networks like collaboration graphs but can favor high-degree nodes without normalization.

To address degree biases in common neighbors, the Jaccard coefficient normalizes the overlap by the union of the neighborhoods: s(u,v) = \frac{|\Gamma(u) \cap \Gamma(v)|}{|\Gamma(u) \cup \Gamma(v)|}. Originally a set-similarity measure, it was adapted for link prediction to penalize pairs where one or both nodes have many connections, emphasizing relative overlap over absolute count. This makes it particularly effective in heterogeneous networks with widely varying node degrees.

The Adamic-Adar index refines common neighbors by weighting shared intermediaries inversely by their degree: s(u,v) = \sum_{w \in \Gamma(u) \cap \Gamma(v)} \frac{1}{\log |\Gamma(w)|}. Introduced to capture the intuition that connections through low-degree (rarer) nodes are more predictive than those through hubs, it boosts the signal from distinctive common neighbors. Empirical studies on web and citation networks showed it outperforming unweighted counts in precision.

The Katz index extends beyond immediate neighbors by aggregating all walks of varying lengths between u and v, with exponential decay for longer paths: s(u,v) = \sum_{\ell=1}^{\infty} \beta^\ell \cdot \#(\text{walks of length } \ell \text{ from } u \text{ to } v), where 0 < \beta < 1 controls the attenuation. This can be computed in closed form as the (u,v)-entry of the matrix (I - \beta A)^{-1} - I, with A the adjacency matrix, capturing global topological proximity akin to a decaying random walk. Derived from sociometric status measures, it excels at capturing transitive closure but requires tuning \beta for sparsity.

Preferential attachment draws from scale-free network growth models, scoring pairs as the product of their degrees: s(u,v) = |\Gamma(u)| \cdot |\Gamma(v)|. It embodies the "rich-get-richer" principle, whereby high-degree nodes are more likely to attract new links due to visibility or resources. Rooted in empirical observations of power-law degree distributions, this heuristic predicts well in growing networks like the World Wide Web but underperforms in static or assortative structures.

For large-scale graphs, computing these scores naively is prohibitive due to the O(n^2) number of pairs and costly high-degree intersections. Efficient approximations leverage sparse matrix operations: common neighbors can be read off the off-diagonal entries of A^2, the Katz index can be approximated via Neumann-series truncation or low-rank inversion, and preferential attachment follows directly from the degree sequence. These techniques scale to millions of nodes on standard hardware, with preprocessing such as neighbor-list indexing accelerating intersections.
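The sketch below computes these heuristic scores on a hypothetical toy graph in Python, including the closed-form Katz matrix (valid when \beta is smaller than the reciprocal of the largest eigenvalue of A); the graph, the value of \beta, and the candidate pair are illustrative assumptions.

```python
import math
import numpy as np

# Toy undirected graph as an adjacency-set dict and a dense adjacency matrix.
n = 5
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
nbrs = {i: set() for i in range(n)}
A = np.zeros((n, n))
for u, v in edges:
    nbrs[u].add(v); nbrs[v].add(u)
    A[u, v] = A[v, u] = 1.0

def common_neighbors(u, v):
    return len(nbrs[u] & nbrs[v])

def jaccard(u, v):
    union = nbrs[u] | nbrs[v]
    return len(nbrs[u] & nbrs[v]) / len(union) if union else 0.0

def adamic_adar(u, v):
    return sum(1.0 / math.log(len(nbrs[w])) for w in nbrs[u] & nbrs[v] if len(nbrs[w]) > 1)

def preferential_attachment(u, v):
    return len(nbrs[u]) * len(nbrs[v])

# Katz index in closed form: (I - beta*A)^(-1) - I, valid for beta below the
# reciprocal of the largest eigenvalue of A.
beta = 0.1
katz = np.linalg.inv(np.eye(n) - beta * A) - np.eye(n)

u, v = 0, 3  # a non-adjacent candidate pair
print(common_neighbors(u, v), jaccard(u, v), adamic_adar(u, v),
      preferential_attachment(u, v), katz[u, v])
```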

Node and Edge Feature-Based Methods

Node and edge feature-based methods for link prediction rely on explicit attributes associated with nodes or potential edges, rather than solely on graph topology, to infer the likelihood of connections between pairs of nodes. These attributes can include vector representations \mathbf{x}_u for each node u, such as user profiles in social networks (e.g., age, interests, or location) or protein properties in biological graphs. By computing similarity scores directly from these features, methods estimate link probabilities; for instance, nodes with highly similar attributes are more likely to form edges in domains like collaboration networks.

A common approach uses distance-based similarity measures on node attribute vectors. The Euclidean distance between two nodes u and v is calculated as s(u,v) = \|\mathbf{x}_u - \mathbf{x}_v\|_2, where lower distances indicate higher link probability, making it suitable for numerical attributes like coordinates or quantitative profiles. For directional or high-dimensional data, cosine similarity is preferred: s(u,v) = \frac{\mathbf{x}_u \cdot \mathbf{x}_v}{\|\mathbf{x}_u\| \|\mathbf{x}_v\|}, which captures angular alignment and is robust to vector magnitudes, as seen in text-based attributes like document embeddings or keyword vectors in academic coauthorship prediction. These measures provide interpretable scores that can be thresholded or ranked to predict links.

Edge features extend this paradigm by incorporating attributes on potential connections, such as spatial distances in geographic networks where proximity (e.g., Euclidean distance between locations) serves as a direct predictor of link formation. In supervised settings, these node and edge features are concatenated into input vectors for classification models to predict edge existence. Logistic regression models the probability of a link as P(u,v) = \sigma(\mathbf{w}^T \mathbf{f}_{uv} + b), where \mathbf{f}_{uv} combines node similarities and edge attributes, while support vector machines (SVMs) with RBF kernels have demonstrated superior performance, achieving up to 90% accuracy on coauthorship datasets by classifying feature vectors from historical links.

Handling mixed attribute types is essential for real-world graphs. Categorical features, such as user categories or research topics, are often encoded via match counts (e.g., overlapping keywords) or one-hot encoding to enable similarity computation, while numerical features undergo normalization (e.g., min-max scaling to [0,1]) to prevent scale dominance in distance metrics. This preprocessing ensures robust feature integration, as validated in supervised frameworks where encoded attributes improve prediction precision and recall over unprocessed inputs.
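A minimal Python sketch of this idea, assuming hypothetical three-dimensional attribute vectors and hand-picked (not learned) logistic-regression weights, is shown below: it scores candidate pairs with Euclidean and cosine similarity and converts a small pairwise feature vector into a link probability.

```python
import numpy as np

# Hypothetical node attribute vectors (e.g., normalized profile features).
x = {
    "u": np.array([0.9, 0.1, 0.4]),
    "v": np.array([0.8, 0.2, 0.5]),
    "w": np.array([0.1, 0.9, 0.0]),
}

def euclidean(a, b):
    return np.linalg.norm(x[a] - x[b])          # smaller => more likely to link

def cosine(a, b):
    return x[a] @ x[b] / (np.linalg.norm(x[a]) * np.linalg.norm(x[b]))

def logistic_link_prob(a, b, w_vec, bias):
    # P(edge) = sigmoid(w^T f_ab + b) on a simple pairwise feature vector f_ab;
    # the weights here are illustrative, not learned from data.
    f_ab = np.array([cosine(a, b), -euclidean(a, b)])
    return 1.0 / (1.0 + np.exp(-(w_vec @ f_ab + bias)))

w_vec, bias = np.array([2.0, 1.0]), -1.0
print(cosine("u", "v"), cosine("u", "w"))            # u,v aligned; u,w not
print(logistic_link_prob("u", "v", w_vec, bias))     # higher probability
print(logistic_link_prob("u", "w", w_vec, bias))     # lower probability
```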

Embedding and Representation Learning Methods

Embedding and representation learning methods for link prediction derive low-dimensional vector representations (embeddings) of nodes from graph structure, node attributes, or both, which are then used to compute similarity scores for potential edges. These approaches enable scalable predictions by capturing latent patterns in the data, often outperforming heuristic measures on large, sparse networks. Early techniques drew on matrix factorization and word-embedding models, while modern variants leverage deep neural architectures to handle complex dependencies.

Matrix factorization represents a foundational embedding method, decomposing the adjacency matrix A of the graph into two low-rank matrices U and V such that A \approx U V^T, where the rows of U and V serve as embeddings for source and target nodes, respectively. The likelihood of a link between nodes i and j is scored via the dot product u_i \cdot v_j, which approximates the entry A_{ij}. This approach, inspired by recommender systems, effectively learns latent factors from observed links while regularizing for sparsity, achieving strong performance on bipartite graphs like user-item networks.

Random walk-based methods generate embeddings by simulating walks on the graph and applying word-embedding techniques such as Skip-Gram to predict context nodes. Node2Vec, introduced in 2016, extends this paradigm with biased random walks that balance breadth-first (homophily-focused) and depth-first (structural equivalence-focused) sampling via parameters p and q, producing embeddings in which similar nodes co-occur in walk contexts. Link scores are computed as the dot product of node embeddings, enabling effective prediction in diverse networks such as social and citation graphs, with demonstrated improvements over earlier walk-based models like DeepWalk.

Graph neural networks (GNNs) advance embedding learning through iterative message passing, where node representations are updated by aggregating features from neighbors. GraphSAGE, proposed in 2017, introduced an inductive framework that samples and aggregates neighbor features using mean, LSTM, or pooling operations, generating embeddings that scale to unseen nodes without full-graph retraining. For link prediction, the embeddings of candidate nodes are concatenated and passed through a multilayer perceptron (MLP) to output edge probabilities, excelling in inductive settings like large-scale web graphs with textual node features.

Extensions to heterogeneous and dynamic graphs address multi-relational and temporal aspects. Relational Graph Convolutional Networks (R-GCNs), from 2018, handle knowledge graphs with multiple edge types by applying relation-specific transformations during message passing, learning type-aware embeddings while sharing parameters for efficiency; link scores follow standard GNN decoding, improving prediction on benchmarks derived from Freebase. For temporal graphs, Temporal Graph Attention Networks (TGAT), introduced in 2020, incorporate time encoding via learnable embeddings and attention mechanisms to weigh historical interactions, producing time-aware node representations for dynamic link prediction in evolving networks like social interactions.

Recent advances integrate transformer architectures into GNNs for enhanced scalability and expressiveness in heterogeneous link prediction. Transformer-based GNNs, such as those leveraging multi-head attention over graph paths or subgraphs from 2023 onward, capture long-range dependencies in knowledge graphs, outperforming convolutional variants on large-scale datasets by fusing local structure with global context; for instance, enhanced graph-transformer models achieve state-of-the-art results on tasks involving multimodal features. Training these embedding models often employs negative sampling to approximate the full objective efficiently, generating non-existent edges as negatives to contrast against observed positives, as in Node2Vec's Skip-Gram optimization. Contrastive losses further enhance representations by maximizing similarity for positive (linked) pairs while minimizing it for negatives, promoting discriminative embeddings; this is particularly important for scalability in GNNs, since it avoids dense pairwise computations.
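The following simplified Python sketch (NumPy only) trains dot-product node embeddings with one uniformly sampled negative per observed edge, a bare-bones version of the negative-sampling objective described above; the toy graph, dimensionality, learning rate, and epoch count are illustrative assumptions, not a reference implementation of Node2Vec or GraphSAGE.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy graph: observed (positive) edges on 6 nodes.
n, dim = 6, 8
pos_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5)]

# One embedding vector per node; the link score is the dot product z_u . z_v,
# squashed through a sigmoid to give an edge probability.
Z = 0.1 * rng.standard_normal((n, dim))

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

lr = 0.05
for epoch in range(200):
    for u, v in pos_edges:
        # One uniformly sampled negative (non-edge) per positive edge.
        while True:
            a, b = rng.integers(n), rng.integers(n)
            if a != b and (a, b) not in pos_edges and (b, a) not in pos_edges:
                break
        for (i, j), label in (((u, v), 1.0), ((a, b), 0.0)):
            p = sigmoid(Z[i] @ Z[j])
            grad = p - label                      # gradient of binary cross-entropy
            gi, gj = grad * Z[j], grad * Z[i]
            Z[i] -= lr * gi
            Z[j] -= lr * gj

# After training, linked pairs should score higher than unlinked ones.
print(sigmoid(Z[0] @ Z[1]), sigmoid(Z[0] @ Z[5]))
```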

Probabilistic and Logical Methods

Probabilistic and logical methods in link prediction model the uncertainty inherent in graph edges by leveraging probabilistic frameworks and logical rules to infer potential links, emphasizing interpretability and the incorporation of prior knowledge over purely data-driven approaches. These methods treat link existence as a probabilistic event, often estimating the probability P(\text{edge} \mid u, v) based on relational structure and domain-specific rules, which allows them to handle incomplete or noisy data. By combining statistical inference with logical reasoning, they provide not only predictions but also explanations grounded in human-readable rules, distinguishing them from embedding-based techniques that prioritize latent representations.

Bayesian approaches formulate link prediction as a probabilistic inference problem, where priors are placed on graph paths or relational patterns to estimate the likelihood of an edge between nodes u and v. For instance, Bayesian personalized ranking adapts matrix factorization with Bayesian priors to rank potential links in recommendation graphs, achieving strong performance on sparse datasets by treating user-item interactions as probabilistic evidence. More advanced variants, such as deep graph convolutional Gaussian processes, integrate Bayesian nonparametrics with graph convolutions to model edge probabilities while quantifying epistemic uncertainty through variational inference, demonstrating improved accuracy on citation networks compared to deterministic baselines. These methods excel when labeled data are limited, as the priors regularize predictions against overfitting.

Markov logic networks (MLNs) extend first-order logic with probabilistic weights attached to rules, enabling the modeling of relational dependencies for link prediction. Each logical formula, such as "friends of friends are likely to be friends" (e.g., \forall x, y, z \, \text{Friends}(x,y) \land \text{Friends}(y,z) \Rightarrow \text{Friends}(x,z)), is assigned a weight reflecting its confidence, and the network's probability distribution over possible worlds is defined via a Markov random field. Inference in MLNs for link prediction typically employs Markov chain Monte Carlo (MCMC) sampling to compute marginal probabilities of missing edges, as demonstrated in applications to social networks where relational rules capture transitivity and homophily. This framework has been shown to outperform independent link predictors by reasoning collectively over the entire graph structure.

Probabilistic soft logic (PSL) relaxes the binary truth values of classical logic using Łukasiewicz semantics, where truth values range continuously from 0 to 1, allowing soft constraints in relational domains. In link prediction, PSL defines an objective that maximizes the posterior probability over these soft truth values by minimizing hinge-loss violations of weighted rules, such as those encoding path-based similarities. The optimization is solved via convex programming, making it scalable to large graphs, and it has been applied effectively to tasks like entity resolution and knowledge base completion, where it integrates noisy evidence from multiple relations. PSL's relaxation enables the handling of conflicting rules, yielding more robust predictions than crisp logical systems.

Related relational learning models use node-embedding techniques within neural architectures to predict link weights by extracting relational information from known connections, facilitating scalable inference over weighted, multi-relational graphs while balancing predictive power and flexibility. Handling noise in these methods involves uncertainty quantification, such as computing confidence intervals or posterior distributions over predicted edges to account for observational errors or incomplete information. Bayesian frameworks naturally provide this through posterior sampling, offering epistemic uncertainty estimates that flag low-confidence predictions in noisy environments like biological networks. Similarly, MLNs and PSL incorporate noise via weighted rules and soft truths, enabling robust inference that propagates uncertainty across relations.

Recent neurosymbolic integrations, particularly from 2024-2025, hybridize graph neural networks (GNNs) with logical and probabilistic components for explainable link prediction. These approaches use GNNs to learn embeddings that are then constrained by symbolic rules, improving interpretability in complex domains; for example, neural-symbolic reasoning frameworks infer missing links via rule-grounded attention mechanisms, outperforming pure GNNs on knowledge graph benchmarks by providing traceable explanations as of 2025. Such hybrids have shown promise in applications requiring accountability, like supply chain networks, where they quantify prediction uncertainty while adhering to domain logic.
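As a toy illustration of the soft-logic relaxation, the Python sketch below grounds a single weighted transitivity rule under Łukasiewicz semantics and finds the candidate link's truth value that minimizes the rule's hinge-loss distance to satisfaction. The rule weight, observed truth values, and grid search are assumptions for exposition, not the PSL inference engine.

```python
# Soft-logic sketch: score a candidate friendship link with a weighted transitivity
# rule, Friends(x,y) AND Friends(y,z) => Friends(x,z), under Lukasiewicz semantics.
# Truth values live in [0, 1]; a rule's "distance to satisfaction" is a hinge loss.

def lukasiewicz_and(a, b):
    return max(0.0, a + b - 1.0)

def rule_violation(body_truth, head_truth):
    # A rule body => head is violated by max(0, body - head) under soft logic.
    return max(0.0, body_truth - head_truth)

# Observed soft truths (e.g., from interaction frequencies), plus one candidate link.
friends = {("alice", "bob"): 0.9, ("bob", "carol"): 0.8}
rule_weight = 2.0

body = lukasiewicz_and(friends[("alice", "bob")], friends[("bob", "carol")])

# Pick the candidate truth value for Friends(alice, carol) that minimizes the
# weighted hinge loss: any value >= body satisfies the rule, so inference pushes
# the link's truth up to 0.7 here.
best = min((rule_weight * rule_violation(body, t), t) for t in [i / 100 for i in range(101)])
print(body, best)
```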

Evaluation

Performance Metrics

Link prediction performance is typically evaluated using metrics that assess the ability of models to rank potential edges correctly, given the inherent sparsity and class imbalance of graphs where non-edges vastly outnumber observed edges. These metrics emphasize ranking quality over absolute classification accuracy, as the task often involves prioritizing likely links from a large pool of candidates.

The area under the receiver operating characteristic curve (AUC-ROC) measures the model's capacity to distinguish observed edges from non-edges, computed as the probability that a randomly chosen observed edge scores higher than a randomly chosen non-edge. Formally, for a scoring function s(u,v) assigning scores to node pairs (u,v), \text{AUC} = \frac{1}{|\mathcal{E}^+||\mathcal{E}^-|} \sum_{(u,v) \in \mathcal{E}^+} \sum_{(x,y) \in \mathcal{E}^-} \mathbb{I}[s(u,v) > s(x,y)], where \mathcal{E}^+ and \mathcal{E}^- are the sets of observed and non-observed edges, respectively, and \mathbb{I}[\cdot] is the indicator function. AUC-ROC is widely adopted because it is threshold-independent and interpretable as ranking performance, though it can be insensitive in highly sparse networks.

Precision at k (P@k) evaluates the fraction of true edges among the top-k highest-scored candidate pairs, focusing on the quality of the most promising predictions. It is defined as \text{P@k} = \frac{|\{ (u,v) \in \text{top-}k : (u,v) \in \mathcal{E}^+ \}|}{k}, prioritizing practical utility in applications like recommendation systems where only a small number of top predictions are acted upon.

Mean average precision (MAP) extends precision by averaging precision values at each rank where a true edge appears, providing a robust measure for imbalanced settings through its emphasis on ranking order. For a set of queries (e.g., per-node predictions), MAP is the mean of the average precisions across all queries, where the average precision for a query is \text{AP} = \frac{1}{m} \sum_{r=1}^{n} \text{P@r} \cdot \mathbb{I}[r \in \text{relevant ranks}], with m the number of true edges for the query and n the length of the ranked list; this metric is particularly effective for capturing cumulative ranking quality in sparse graphs.

Hit ratio at k (HR@k), also known as recall at k, assesses whether at least one true edge appears in the top-k predictions, offering a binary indicator of basic retrieval success. Computed as \text{HR@k} = \frac{|\{ q : \text{top-}k_q \cap \mathcal{E}^+_q \neq \emptyset \}|}{|\mathcal{Q}|}, where \mathcal{Q} is the set of queries and \text{top-}k_q is the top-k list for query q, it is useful for checking whether models recover any relevant links but overlooks ranking depth.

Beyond accuracy-focused metrics, novelty and diversity evaluate the non-triviality and variety of predictions to avoid recommending obvious or redundant links. Novelty metrics, such as long-tail novelty, quantify how unexpected predictions are by measuring deviation from users' existing connections, for example via the average complement of indegrees of predicted contacts. Diversity metrics, like intra-list dissimilarity, compute average pairwise distances between recommended nodes to ensure varied suggestions, promoting broader exploration in networks. These measures are adapted from recommender systems to penalize predictions of highly connected or similar nodes, enhancing practical value.

Because non-edges dominate in link prediction, standard AUC can overestimate performance; alternatives such as the F1-score, the harmonic mean of precision and recall, or the area under the precision-recall curve (AUPR) are preferred for a more balanced assessment. The F1-score is \text{F1} = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}, applied after thresholding scores, while AUPR integrates precision across recall levels, better reflecting minority-class (edge) performance in sparse graphs.
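The short Python sketch below computes AUC (pairwise form, with ties counted as one half) and precision at k from hypothetical scores assigned to held-out positive edges and sampled non-edges; the score values are made up for illustration.

```python
# Hypothetical scores for held-out positive edges and sampled non-edges.
pos_scores = [0.9, 0.8, 0.75, 0.4]
neg_scores = [0.7, 0.5, 0.3, 0.2, 0.1]

# AUC as the probability that a random positive outranks a random negative
# (ties counted as 0.5), matching the pairwise definition in the text.
def auc(pos, neg):
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Precision@k over the merged ranking of candidate pairs.
def precision_at_k(pos, neg, k):
    ranked = sorted([(s, True) for s in pos] + [(s, False) for s in neg], reverse=True)
    return sum(is_pos for _, is_pos in ranked[:k]) / k

print(auc(pos_scores, neg_scores))                 # 18/20 = 0.9
print(precision_at_k(pos_scores, neg_scores, 3))   # top-3 are all positives -> 1.0
```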

Datasets and Benchmarks

Link prediction research relies on a variety of datasets that capture different structures, domains, and scales, supporting evaluation across static, temporal, heterogeneous, and domain-specific scenarios. These datasets provide standardized resources for comparing algorithms: static benchmarks often serve as foundational tests for topology- and embedding-based methods, while dynamic and heterogeneous ones address real-world complexities like evolving networks and multi-type relations.

Static benchmarks include the Cora citation network, which consists of 2,708 nodes representing papers and 5,429 edges denoting citations between them, commonly used to assess link prediction in academic graphs. Another prominent example is the Protein-Protein Interaction (PPI) dataset, featuring 3,890 proteins as nodes and 76,584 interactions as edges in a network subgraph, enabling evaluation of biological link formation.

For dynamic link prediction, datasets such as Wikipedia edit networks model temporal user interactions through edit histories, with snapshots capturing evolving collaborations over time. The email-Eu-core dataset, derived from anonymized email communications at a research institution, includes 1,005 nodes and 25,571 edges, supporting prediction of interactions in communication networks.

Heterogeneous datasets extend evaluation to multi-relational graphs; subsets of Freebase, such as FB15k with 14,951 entities and 1,345 relation types, test prediction of diverse link types. The OGB-Arxiv dataset from the Open Graph Benchmark comprises 169,343 arXiv papers as nodes, 1,166,243 citation edges, and paper features with category labels, facilitating link prediction in citation networks with node attributes. In biomedical contexts, the BioSNAP collection provides human protein interaction datasets with thousands of proteins and over 100,000 interactions for predicting protein associations. STRING specifically offers high-confidence physical and functional links; its human network in version 11 contains over 19,000 proteins and millions of interactions, used to benchmark domain-specific predictors.

Standard evaluation protocols involve time-based splitting for dynamic datasets, typically allocating 80% of edges to training and 20% to testing by timestamp to simulate future prediction. Negative sampling is employed to generate non-edges for training and ranking, with ratios ranging from 1:1 for balanced sets to 1:50 for challenging imbalanced scenarios, ensuring robust assessment of model discrimination. Recent benchmarks from 2023-2025 emphasize n-ary facts in knowledge graphs, extending beyond binary links; for instance, WikiKG90Mv2 from the Open Graph Benchmark contains 91 million entities and 601 million triples extracted from Wikidata, supporting large-scale link imputation. These additions address complex, multi-entity relations in evolving knowledge graphs, as surveyed in works on n-ary prediction frameworks.
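The Python sketch below illustrates the evaluation protocol just described on a hypothetical time-stamped edge list: an 80/20 temporal split followed by sampling a fixed ratio of non-edges per test edge; the edge list, node count, and 1:3 ratio are illustrative assumptions.

```python
import random

random.seed(42)

# Hypothetical time-stamped edge list: (u, v, timestamp).
edges = [(0, 1, 1), (1, 2, 2), (0, 2, 3), (2, 3, 4), (3, 4, 5),
         (1, 3, 6), (4, 5, 7), (2, 5, 8), (0, 4, 9), (3, 5, 10)]
n_nodes = 6

# Time-based split: earliest 80% of edges for training, latest 20% for testing,
# mimicking the "predict the future from the past" protocol.
edges.sort(key=lambda e: e[2])
cut = int(0.8 * len(edges))
train, test = edges[:cut], edges[cut:]
observed = {(u, v) for u, v, _ in edges} | {(v, u) for u, v, _ in edges}

def sample_negatives(num_positives, ratio):
    # Draw `ratio` non-edges per positive test edge (e.g., 1:1 up to 1:50).
    negs = set()
    while len(negs) < num_positives * ratio:
        u, v = random.randrange(n_nodes), random.randrange(n_nodes)
        if u != v and (u, v) not in observed:
            negs.add((u, v))
    return negs

negatives = sample_negatives(len(test), ratio=3)
print(len(train), len(test), len(negatives))
```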

Applications

Social and Information Networks

Link prediction plays a central role in friend recommendation systems on social platforms such as Facebook and Twitter, where topological measures such as the Adamic-Adar index are employed to identify potential new connections based on common neighbors and their degrees. Introduced in 2003, the Adamic-Adar measure weights common friends by the inverse logarithm of their degree, giving higher importance to less-connected intermediaries, and was implemented in production systems during the 2010s to suggest friends by predicting edges in user friendship graphs. On Facebook, for instance, supervised random walks incorporating temporal dynamics have improved prediction accuracy for future friendships, achieving significant gains over baseline topological methods in large-scale evaluations. Similarly, on Twitter, link prediction using retweet histories has proven more effective than traditional friendship-based approaches for inferring user relationships, leveraging interaction patterns to recommend follows.

In web networks, link prediction aids in inferring missing hyperlinks within evolving sites, which supports efficient web crawling by anticipating new pages and connections. Hyperlink prediction extends traditional link prediction to directed web graphs, using features such as content similarity and topological scores to forecast outlinks, as surveyed in studies of web structure dynamics. A practical application involves focused crawling, where models predict new outlinks from existing pages to prioritize discovery of relevant content, reducing crawl overhead in resource-constrained environments; for example, classifiers trained on historical link formations have demonstrated up to 20% improvement in discovering unindexed pages.

Information diffusion in social and news networks often relies on link prediction to forecast sharing behaviors, such as retweets, by modeling the propagation of content as evolving edges in interaction graphs. Learning-based models that integrate user features, content semantics, and network structure have been developed to predict diffusion cascades on Twitter, capturing how initial shares lead to broader retweet links with accuracies exceeding 80% in benchmark tests. These approaches highlight the role of temporal and probabilistic factors in anticipating information spread, aiding platforms in moderating viral content.

Ethical considerations in link prediction for social recommendations include privacy risks from inferring sensitive relationships and the potential to reinforce echo chambers by prioritizing homophilous connections. Algorithms that suggest connections based on inferred links can inadvertently expose private associations, raising concerns under regulations like the GDPR. Moreover, link recommendation systems have been shown to amplify polarization by creating denser intra-group ties, as simulations of opinion dynamics reveal increased echo-chamber formation when recommendations favor similar users. To mitigate this, ethical frameworks advocate diversity-aware predictions that balance accuracy with exposure to differing viewpoints, preventing societal fragmentation.

A notable case study is the Netflix Prize competition, concluded in 2009, which framed movie recommendation as a form of link prediction in bipartite user-item networks: the challenge was to predict missing ratings (edges) using matrix factorization and neighborhood methods, with the winning ensemble achieving a 10% RMSE improvement over the baseline and establishing collaborative filtering via matrix factorization as a cornerstone of recommendation systems. This work demonstrated how link prediction principles extend to user-item rating graphs, inspiring friend and content suggestion features on later social platforms.

Biological and Biomedical Networks

Link prediction in biological and biomedical networks plays a crucial role in uncovering hidden relationships within complex systems, such as protein interactions, drug mechanisms, and disease associations, thereby accelerating discovery in areas such as drug development and disease biology. These networks typically model entities such as proteins, genes, drugs, and diseases as nodes, with edges representing interactions derived from experimental data or curated databases. By applying link prediction techniques, researchers can infer potential connections that generate hypotheses for experimental validation and therapeutic development.

In protein-protein interaction (PPI) networks, link prediction helps identify novel interactions that elucidate cellular processes and disease pathways. The STRING database, a comprehensive resource for PPI data, has been widely used in recent studies employing graph neural networks (GNNs) to predict edges in these networks. For instance, multiscale GNN models like MGPPI integrate hierarchical structural information to enhance prediction accuracy on STRING-derived datasets, improving over traditional methods by capturing both local and global network patterns. Similarly, hierarchical graph learning approaches have demonstrated superior performance in PPI prediction by leveraging multi-level representations, outperforming baseline topology-based methods on benchmark PPI datasets. These 2024 advancements highlight the efficacy of GNNs in handling the scale and heterogeneity of PPI networks for tasks like functional annotation.

Drug-target prediction utilizes link prediction to infer binding affinities in pharmacology graphs, where nodes represent drugs and proteins and edges denote potential interactions. Graph-based models such as GraphDTA represent drugs as molecular graphs and employ GNNs to predict affinities, providing a foundational approach that has influenced subsequent work. More recent heterogeneous GNN frameworks, like GHCDTI, address limitations of homogeneous representations by modeling multi-type relations in drug-target networks, leading to improved prediction of drug-target interactions for repurposing applications. These methods prioritize structural and semantic features to forecast novel interactions, aiding the identification of therapeutic candidates.

For disease association, link prediction in gene-disease networks facilitates comorbidity forecasting by revealing shared genetic or pathway links between conditions. Approaches using variational graph auto-encoders on disease-disease interaction networks have shown promise in predicting comorbidities, with embeddings capturing latent associations to prioritize high-risk pairs. Genetically enriched homogeneous networks further improve accuracy by integrating semantic and genomic features, enabling the scoring of comorbid disease pairs based on learned node representations. Such predictions support epidemiological modeling and early intervention strategies for multifactorial diseases.

Biomedical link prediction faces significant challenges, including noisy data from high-throughput experiments, which introduce false positives and affect model robustness. Interpretability remains a key hurdle, as complex GNNs often act as black boxes, complicating validation in clinical contexts where causal insight is essential. Recent reviews emphasize addressing these issues through techniques like attention mechanisms and rule-based explanations that balance predictive accuracy with interpretability in biological networks.

In 2025, heterogeneous graph neural networks (HGNNs) have advanced biomedical link prediction for drug discovery by modeling diverse node types and relations in multi-omics networks. Frameworks applied to drug-target and protein interaction datasets demonstrate HGNNs' ability to predict interactions for lead optimization, outperforming homogeneous GNNs in heterogeneous biomedical settings. These applications underscore HGNNs' role in integrating multi-modal data for efficient candidate identification.

Recommendation and Knowledge Graphs

Link prediction plays a pivotal role in recommendation systems, particularly through collaborative filtering on user-item bipartite graphs. In these graphs, nodes represent users and items (such as movies or books), with edges indicating interactions like ratings or purchases. Matrix factorization techniques decompose the interaction matrix into low-dimensional latent factor matrices for users and items, enabling the prediction of missing links by reconstructing potential interactions. This approach has been shown to outperform topology-based metrics like Adamic-Adar on recommendation tasks, achieving higher ranking scores on datasets with millions of edges.

In knowledge graph completion, link prediction focuses on inferring missing triples (head, relation, tail) in structured RDF graphs such as DBpedia to enhance downstream applications. The TransE model embeds entities and relations into a continuous vector space, where a relation is modeled as a translation such that the head entity embedding plus the relation embedding approximates the tail entity embedding, formalized as \mathbf{h} + \mathbf{r} \approx \mathbf{t}. This energy-based scoring function allows efficient prediction of plausible links, demonstrating strong performance on benchmarks like WN18 and FB15k.

In e-commerce, link prediction on product co-purchase networks, exemplified by Amazon datasets of over 500,000 products, identifies complementary items for recommendation. These networks treat products as nodes and co-purchases as edges, using graph neural networks such as GraphSAGE to embed node features (e.g., titles and categories) and predict new edges for newly listed items. A modified GraphSAGE variant with one-hop sampling has achieved improvements in top-5 accuracy over baselines, enabling scalable suggestions in dynamic catalogs.

For supply chain analysis, neurosymbolic methods integrate neural networks with symbolic reasoning to predict supplier links in knowledge graphs, addressing the opacity of purely neural models. Work from 2024 applying the Neural Bellman-Ford Network (NBFNet), which combines graph neural propagation with logic-like path reasoning, produced explainable predictions on heterogeneous graphs built from sources such as Marklines and Achilles, matching black-box GNN performance while providing transparency on datasets covering tens of thousands of companies. This approach supports supply-chain visibility by uncovering hidden dependencies across global networks.

Recent advancements in n-ary extensions model multi-attribute relations beyond triples, which is crucial for handling complex facts such as contracts with multiple roles and qualifiers. Models surveyed in 2025, including those using convolutions and attention mechanisms, represent n-ary facts as star or box structures to capture qualifiers (e.g., time or role), improving completion accuracy on datasets like WikiPeople by 10-20% over baselines. These methods enable richer representations in industrial settings such as enterprise ontologies.

Scalability remains essential for billion-scale graphs in recommendation and knowledge graphs, where full-graph processing is infeasible. Sampling techniques, such as two-stage degree-based subgraph extraction, select representative subgraphs to approximate link predictions, reducing computational load while preserving performance; for example, on graphs with over 2 billion edges such as OAG, such approaches combined with LLMs have reported roughly 30% gains in hit rate over GNN baselines, enabling deployment on industrial platforms.
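As an illustration of the TransE scoring function, the Python sketch below scores hypothetical triples by the distance \|\mathbf{h} + \mathbf{r} - \mathbf{t}\|; the entities, relation, and embeddings are fabricated for the example (one triple is made artificially consistent), not a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16

# Hypothetical entity and relation embeddings; in a trained TransE model these
# would be learned so that h + r is close to t for true triples.
entities = {name: rng.standard_normal(dim) for name in ["Paris", "France", "Berlin"]}
relations = {"capital_of": rng.standard_normal(dim)}

# Force one triple to be (nearly) consistent, purely for illustration.
entities["France"] = entities["Paris"] + relations["capital_of"] + 0.01 * rng.standard_normal(dim)

def transe_score(h, r, t):
    # Lower ||h + r - t|| means a more plausible triple.
    return np.linalg.norm(entities[h] + relations[r] - entities[t])

print(transe_score("Paris", "capital_of", "France"))   # small distance -> plausible
print(transe_score("Berlin", "capital_of", "France"))  # large distance -> implausible
```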

References

  1. [1]
    [PDF] The Link Prediction Problem for Social Networks - CS@Cornell
    Jan 8, 2004 · The Link Prediction Problem for Social Networks. ∗. David Liben-Nowell†. Laboratory for Computer Science. Massachusetts Institute of Technology.
  2. [2]
    [PDF] Link prediction in complex networks: A survey - Carlo Piccardi
    Dec 2, 2010 · Link prediction in complex networks has attracted increasing attention from both physical and computer science communities.
  3. [3]
    [PDF] Link Prediction Based on Graph Neural Networks
    Link prediction is a key problem for network-structured data. Link prediction heuristics use some score functions, such as common neighbors and Katz index,.
  4. [4]
    [PDF] Evaluating Graph Neural Networks for Link Prediction - OpenReview
    The task of link prediction is to determine the existence of an edge between two unconnected nodes in a graph. Existing link prediction algorithms attempt to ...
  5. [5]
    [PDF] A Survey of Link Prediction in Temporal Networks - arXiv
    Feb 28, 2025 · Abstract. Temporal networks have gained significant prominence in the past decade for modelling dynamic interactions within complex systems.
  6. [6]
    [1010.0725] Link Prediction in Complex Networks: A Survey - arXiv
    Oct 4, 2010 · This article summaries recent progress about link prediction algorithms, emphasizing on the contributions from physical perspectives and approaches.
  7. [7]
    [PDF] Representation Learning for Dynamic Graphs: A Survey
    In this survey, we mainly study three general problems for dynamic graphs: node classification, edge prediction, and graph classification. Node classification ...
  8. [8]
    [2502.05724] Rethinking Link Prediction for Directed Graphs - arXiv
    Feb 8, 2025 · In this paper, we propose a unified framework to assess the expressiveness of existing methods, highlighting the impact of dual embeddings and decoder design.
  9. [9]
    Link Prediction for Directed Graphs - SpringerLink
    The goal of link prediction is to predict whether a link between two users will be established or if a link in a partially observed network is missing. The ...
  10. [10]
    Link prediction in heterogeneous information networks: An improved ...
    This study develops an improved spatial graph convolution network to learn predictive vertex embeddings with minimal information loss based on local community ...
  11. [11]
    [2302.10432] Link Prediction on Latent Heterogeneous Graphs - arXiv
    Feb 21, 2023 · In this paper, we study the challenging and unexplored problem of link prediction on an LHG. As existing approaches depend heavily on type-based information, ...
  12. [12]
    [2506.08970] A Survey of Link Prediction in N-ary Knowledge Graphs
    Jun 10, 2025 · In this paper, we present the first comprehensive survey of link prediction in NKGs, providing an overview of the field, systematically categorizing existing ...
  13. [13]
    Link Prediction on N-ary Relational Data - ACM Digital Library
    May 13, 2019 · A method to conduct Link Prediction on N-ary relational data, thus called NaLP, which explicitly models the relatedness of all the role-value pairs in the same ...
  14. [14]
    [PDF] Exploiting Longer Cycles for Link Prediction in Signed Networks
    An undirected graph has an associated adjacency matrix A ∈ {−1, 0, +1}|V |×|V |. For undirected graphs, A is symmetric, i.e. A = AT , while for directed graphs ...
  15. [15]
    Prediction of Links and Weights in Networks by Reliable Routes
    Jul 22, 2015 · Link prediction aims to uncover missing links or predict the emergence of future relationships from the current network structure.
  16. [16]
    [2207.02911] A Survey on Hyperlink Prediction - arXiv
    Jul 6, 2022 · As a natural extension of link prediction on graphs, hyperlink prediction aims for the inference of missing hyperlinks in hypergraphs, where a ...
  17. [17]
    [PDF] NHP: Neural Hypergraph Link Prediction - MALL Lab @ IISc
    We have introduced NHP, a novel approach for hyperlink prediction for both undirected and the first method on directed hypergraphs. The novel scoring ...
  18. [18]
    Structural equivalence of individuals in social networks
    The aim of this paper is to understand the interrelations among relations within concrete social groups. Social structure is sought, not ideal types.
  19. [19]
    Bibliographic coupling between scientific papers - Wiley Online Library
    This report describes the results of automatic processing of a large number of scientific papers according to a rigorously defined criterion of coupling.
  20. [20]
    The link prediction problem for social networks - ACM Digital Library
    The link prediction problem for social networks. Authors: David Liben-Nowell. David Liben-Nowell. Massachusetts Institute of Technology. View Profile. , Jon ...
  21. [21]
    [PDF] Link Prediction via Matrix Factorization - UCSD CSE
If the graph is undirected, then we can absorb Λ into the U matrix. For directed graphs, we can let Λ be an arbitrary asymmetric matrix, following [38,23].
  22. [22]
    [1503.03578] LINE: Large-scale Information Network Embedding
    Mar 12, 2015 · This paper studies the problem of embedding very large information networks into low-dimensional vector spaces, which is useful in many tasks.
  23. [23]
    Inductive Representation Learning on Large Graphs - arXiv
    Jun 7, 2017 · Here we present GraphSAGE, a general, inductive framework that leverages node feature information (eg, text attributes) to efficiently generate node embeddings ...
  24. [24]
    [1710.10903] Graph Attention Networks - arXiv
    Oct 30, 2017 · We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers.
  25. [25]
    [2002.07962] Inductive Representation Learning on Temporal Graphs
    Feb 19, 2020 · We propose the temporal graph attention (TGAT) layer to efficiently aggregate temporal-topological neighborhood features as well as to learn the time-feature ...
  26. [26]
    A survey on feature extraction and learning techniques for link ...
    Oct 28, 2024 · This survey investigates several strategies related to link prediction, ranging from feature extraction based to feature learning based techniques.
  27. [27]
    Graph Theory
    Reinhard Diestel. Graph Theory. GTM 173, Sixth edition 2025. Springer-Verlag, Heidelberg Graduate Texts in Mathematics, Volume 173
  28. [28]
    Collective dynamics of 'small-world' networks - Nature
    Jun 4, 1998 · Strogatz, S. H. & Stewart, I. Coupled oscillators and biological synchronization. Sci. Am. 269(6), 102–109 (1993). Article Google Scholar.
  29. [29]
    [PDF] Friends and neighbors on the Web - Semantic Scholar
    Friends and neighbors on the Web · Lada A. Adamic, Eytan Adar · Published in Soc. Networks 1 July 2003 · Computer Science, Sociology · Soc. Networks.
  30. [30]
    [PDF] Albert-László Barabási, Emergence of Scaling in Random Networks
Oct 19, 2007 · 1. Because of the preferential attachment, a vertex that acquires more connections than another one will increase its connectivity at a higher ...
  31. [31]
    [PDF] Link Prediction using Supervised Learning ∗ - Computer Science
Link prediction is predicting the likelihood of future associations between nodes, using supervised learning and identifying key features.
  32. [32]
    [PDF] Link Prediction in Networks with Nodes Attributes by Similarity ...
    Similarity of vectors Ti and Tj also can be computed based on their distance such as Euclidean distance or cosine distance. Denote the attributes similarity ...
  33. [33]
    [PDF] Adversarial Link Prediction in Spatial Networks
    May 29, 2023 · A link prediction problem in such spatial networks then amounts to deter- mining whether the pair of nodes are sufficiently close according to ...
  34. [34]
    node2vec: Scalable Feature Learning for Networks - arXiv
    Jul 3, 2016 · We demonstrate the efficacy of node2vec over existing state-of-the-art techniques on multi-label classification and link prediction in several ...
  35. [35]
    Graph representation learning via enhanced GNNs and transformers
    Aug 6, 2025 · The combination of GNNs and Transformers leverages the sensitivity of GNNs to local information and the ability of Transformers to handle global ...
  36. [36]
    [PDF] Understanding Negative Sampling in Graph Representation Learning
    Negative sampling in graph learning is crucial, as it is as important as positive sampling in determining the optimization objective and variance.
  37. [37]
    Comparing discriminating abilities of evaluation metrics in link ...
    Jan 8, 2024 · The results indicate that the discriminating abilities of the three metrics, AUC, AUPR, and NDCG, are significantly higher than those of other metrics.
  38. [38]
    Discriminating abilities of threshold-free evaluation metrics in link ...
    The area under the receiver operating curve (AUC) [35], [36] is the most frequently used threshold-free metric in link prediction, probably because of its high ...
  39. [39]
    Inconsistency among evaluation metrics in link prediction
    As many observed networks are incomplete or dynamically changing, link prediction can find direct applications in inferring missing or upcoming links, such ...
  40. [40]
    A link prediction-based recommendation system using transactional ...
    Apr 27, 2023 · Mean average precision at K (MAP@K) metric treats the recommendation system as a ranking task since recommendation systems offer a ranked list ...
  41. [41]
    Neural networks for link prediction in realistic biomedical graphs
    May 21, 2018 · Mean average precision (MAP):. Given a ranked list of predicted links relevant to a particular node, we calculate the precision after each true ...
  42. [42]
    Link Prediction on Complex Networks: An Experimental Survey - PMC
    Link prediction plays an important role in complex network analysis in that it can find missing links or predict the links which will arise in the future.
  43. [43]
    Structural Novelty and Diversity in Link Prediction - ACM Digital Library
    We discuss the adaptation, for this purpose, of specific network, novelty and diversity metrics from social network analysis, recommender systems, and ...
  44. [44]
    Network link prediction via deep learning method: A comparative ...
    Selected link prediction scores. Link prediction plays a crucial role in ... evaluation metrics, including accuracy, precision, and F1 Score. However ...
  45. [45]
    [PDF] Open Graph Benchmark: Datasets for Machine Learning on Graphs
    Feb 25, 2021 · We present the OPEN GRAPH BENCHMARK (OGB), a diverse set of challenging and realistic benchmark datasets to facilitate scalable, robust, ...
  46. [46]
    [PDF] Revisiting Link Prediction: A data perspective - arXiv
    Nov 7, 2024 · Link prediction, which aims to find missing links within a graph, is a fundamental task in the graph domain.
  47. [47]
    Graph Convolutional Prediction of Protein Interactions in Yeast
    We formulate this prediction task as a link prediction problem on unweighted and undirected networks and use a graph convolutional neural network to solve the ...
  48. [48]
    Network datasets: email-Eu-core network - SNAP: Stanford
    The network was generated using email data from a large European research institution. We have anonymized information about all incoming and outgoing email.
  49. [49]
    Stanford Biomedical Network Dataset Collection
The collection includes datasets with relationships between entities like cells, drugs, genes, and diseases, and also datasets with information about entities.
  50. [50]
    [PDF] Towards Better Evaluation for Dynamic Link Prediction
    To sample more challenging negative edges, we introduce two novel negative sampling strategies that improve robustness and better match real-world applications.
  51. [51]
    Exploring the Performance of Continuous-Time Dynamic Link ...
    Apr 22, 2024 · (2) We describe an exhaustive taxonomy of negative sampling methods that can be used at evaluation time.
  52. [52]
    WikiKG90Mv2 - Open Graph Benchmark
WikiKG90Mv2 is a Knowledge Graph (KG) extracted from the entire Wikidata knowledge base. The task is to automatically impute missing triples that are not yet ...
  53. [53]
    [PDF] Predicting and Recommending Links in Social Networks - CS Stanford
    To address these challenges we develop a method for both link prediction and link recommendation. We develop a concept of Supervised Random. Walks that ...
  54. [54]
    Retweets as a Predictor of Relationships among Users on Social ...
    The results of this study indicate that in social media, link prediction based on retweet history is more effective than conventional prediction based on ...
  55. [55]
    (PDF) Prediction of new outlinks for focused crawling - ResearchGate
    Nov 10, 2021 · Discovering new hyperlinks enables Web crawlers to find new pages that have not yet been indexed. This is especially important for focused ...
  56. [56]
  57. [57]
    Social Networking and Ethics - Stanford Encyclopedia of Philosophy
Aug 3, 2012 · Fundamental practices of concern for direct ethical impacts on privacy include: the transfer of users' data to third parties for intrusive ...
  58. [58]
    Link recommendation algorithms and dynamics of polarization in ...
Our study sheds light on the impacts of social-network algorithms in opinion dynamics and unveils avenues to steer polarization in online social networks.
  59. [59]
    The Effect of People Recommenders on Echo Chambers and ...
    May 31, 2022 · Our thorough experimentation shows that people recommenders can in fact lead to a significant increase in echo chambers.
  60. [60]
    [PDF] The Netflix Prize - Computer Science
If no personalized prediction is available, the average rating based on all ratings for the film is used. These predictions are displayed on the website as red- ...
  61. [61]
    [PDF] Link prediction approach to recommender systems
link prediction measures as recommendations to the users. Our work ... Lessons from the netflix prize challenge. Acm Sigkdd Explorations Newsletter ...
  62. [62]
    Graph Neural Networks for Protein-Protein Interactions - arXiv
    This paper aims to review the latest research developments of graph neural networks in forecasting protein-protein interactions, compare the architectures of ...
  63. [63]
    MGPPI: multiscale graph neural networks for explainable protein ...
    Jul 15, 2024 · In this paper, we propose MGPPI, which is a Multiscale graph convolutional neural network model for PPI prediction.
  64. [64]
    Hierarchical graph learning for protein–protein interaction - Nature
    Feb 25, 2023 · Link prediction methods based on common neighbor (CN) assign high probabilities of PPI to protein pairs that are known to share common PPI ...
  65. [65]
    Prediction of protein–protein interaction based on ... - BMC Biology
    Aug 4, 2025 · The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest.
  66. [66]
    Predicting drug–target binding affinity with graph neural networks
    Jan 22, 2020 · Methods We propose a new model called GraphDTA that represents drugs as graphs and uses graph neural networks to predict drug–target affinity.
  67. [67]
    Heterogeneous network drug-target interaction prediction model ...
    Aug 19, 2025 · Here, we present GHCDTI, a heterogeneous graph neural framework designed to overcome these challenges through three synergistic innovations.
  68. [68]
    Link Prediction in Disease-Disease Interactions Network Using a ...
Nov 1, 2025 · The learned embeddings are leveraged by the variational graph auto-encoder to predict disease comorbidity in the disease-disease interactions ...
  69. [69]
    Genetically and semantically aware homogeneous network for ...
Scoring and predicting comorbidity disease pairs using a genetically and semantically enriched homogeneous network. Learnt rich disease node embedding using ...
  70. [70]
    Deep learning with noisy labels in medical prediction problems - NIH
    Medical research faces substantial challenges from noisy labels attributed to factors like inter-expert variability and machine-extracted labels.
  71. [71]
    A Review of Link Prediction Applications in Network Biology
    Apr 3, 2025 · We examine the current applications of LP metrics for predicting links between diseases, genes, proteins, RNA, microbiomes, drugs, and neurons.
  72. [72]
    Interpretable Graph Convolutional Neural Networks for Inference on ...
    In this work, we provide a new formulation for Graph Convolutional Neural Networks (GCNNs) for link prediction on graph data that addresses common challenges ...
  73. [73]
    Heterogeneous graph neural networks for link prediction in ...
    In this study, we posit the utility of readily available generic HGNNs in addressing the link prediction tasks in biomedical settings. Thus, we conduct a ...
  74. [74]
    Graph neural networks driven acceleration in drug discovery
    Oct 18, 2025 · EKGDR enabled the prediction of 20 potential drug-repositioning candidates for treating Alzheimer's disease and Parkinson's disease (Table 2).
  75. [75]
  76. [76]
    Towards trustworthy AI for link prediction in supply chain knowledge ...
    Oct 3, 2024 · In this paper, we design a trustworthy ML approach based on recent theoretical development in neurosymbolic AI methods that enables a more ...
  77. [77]
    [PDF] A Survey of Link Prediction in N-ary Knowledge Graphs
    Nov 4, 2025 · Methods for link pre- diction in NKGs fall into three categories: spa- tial mapping-based (Wen et al., 2016), tensor decomposition-based, and ...