
Graph kernel

A graph kernel is a symmetric, positive semi-definite function defined on pairs of graphs that measures their structural similarity, enabling the application of kernel methods, such as support vector machines, to tasks on graph-structured data. These kernels typically operate by decomposing graphs into substructures (e.g., walks, subgraphs, or vertices) and computing similarities based on counts or alignments of these components, ensuring computational tractability for the non-Euclidean data representations common in fields like bioinformatics and cheminformatics. Graph kernels emerged in the early 2000s in response to the need for scalable similarity measures in structured data learning, with foundational works including the random walk kernel proposed independently by Gärtner et al. and Kashima et al. in 2003. Early developments focused on labeled graphs and discrete structures, drawing from the convolution kernel framework introduced by Haussler in 1999, but subsequent innovations addressed continuous node attributes and unlabeled graphs starting around 2012. Over the past two decades, dozens of graph kernels have been developed, establishing them as a cornerstone of learning on graph-structured data alongside more recent approaches such as graph neural networks.

Key types of graph kernels include neighborhood aggregation methods, such as the Weisfeiler-Lehman kernel, which iteratively refines node labels to capture subtree patterns and has shown strong performance on benchmark datasets for graph classification. Other prominent variants encompass random walk-based kernels, which count matching walks between graphs but can be computationally intensive, and subgraph-matching kernels like the graphlet kernel, which enumerate small induced subgraphs for similarity computation. Assignment-based kernels, such as the optimal assignment kernel, further extend this by optimizing bipartite matchings between graph elements to handle permutations and alignments.

Graph kernels have found wide applications in cheminformatics for molecular similarity prediction, in bioinformatics for protein function prediction, and in network analysis for tasks such as community detection, often outperforming traditional feature-vector approaches on irregular data. Despite their versatility, challenges remain in scalability to large graphs and in expressiveness compared to graph neural networks, driving ongoing research into hybrid and approximation techniques.

Overview

Definition and Motivation

A graph kernel is defined as a positive semi-definite kernel k: \mathcal{G} \times \mathcal{G} \to \mathbb{R}, where \mathcal{G} denotes the space of graphs, that corresponds to an inner product in a reproducing kernel Hilbert space (RKHS) of feature mappings over graphs. This formulation allows the computation of pairwise similarities between graphs G and H implicitly in a high-dimensional feature space, bypassing the need to explicitly extract or represent graph features as vectors. Graph kernels are motivated by the need to extend kernel-based algorithms, such as support vector machines (SVMs), to non-Euclidean data structures like graphs, which arise naturally in domains including cheminformatics (e.g., molecular graphs) and network analysis (e.g., social or biological networks).

Traditional kernel methods operate effectively on vector spaces but fail on irregular graph topologies, where nodes and edges lack fixed dimensionality or alignment, hindering direct application of algorithms requiring inner products. By providing a valid kernel on graphs, these functions enable the use of powerful kernel-based techniques in SVMs and related methods for tasks like graph classification and clustering without resorting to heuristic embeddings. Intuitively, a graph kernel measures overall similarity by summing or integrating pairwise similarities between aligned substructures, such as walks, paths, or subgraphs, across the two input graphs, effectively decomposing complex relational data into comparable components. This approach originated with Haussler's introduction of R-convolution kernels in 1999, which generalized kernel methods to structures including graphs by convolving kernels over their decomposable parts.

Historical Development

The development of graph kernels traces its roots to the late 1990s, when the need arose to extend kernel methods from vector spaces to structured data such as graphs. In 1999, David Haussler introduced the convolution kernel framework, which provided a general method for constructing positive semi-definite kernels on discrete structures like strings, trees, and graphs by decomposing them into simpler components and applying product kernels recursively. This foundational work laid the groundwork for handling graph-structured data within kernel-based algorithms, enabling similarity measures that respect the relational nature of graphs.

Building on Haussler's ideas, early formalizations in the early 2000s focused on specific kernel constructions for graphs. In 2002, Risi Kondor and John Lafferty proposed diffusion kernels, which model graph similarity through heat diffusion processes on graphs, capturing global structural properties via matrix exponentials of graph Laplacians. The following year, Thomas Gärtner, Peter Flach, and Stefan Wrobel developed the random walk kernel, which counts matching walks of varying lengths between graphs, offering an efficient alternative while establishing theoretical hardness results for fully expressive graph kernels; this approach was independently proposed by Hisashi Kashima, Koji Tsuda, and Akihiro Inokuchi. These contributions formalized graph kernels as viable tools for tasks like classification on graph datasets.

The mid-2000s saw the emergence of diverse kernel families, including subgraph-based and spectrum-based approaches, expanding the representational power of graph kernels. Subgraph kernels, such as the cyclic pattern kernel introduced by Tamás Horváth, Thomas Gärtner, and Stefan Wrobel in 2004, explicitly enumerate and match frequent subgraph patterns to measure similarity, proving effective for predictive graph mining on molecular graphs. Concurrently, spectrum-based kernels, exemplified by Alexander J. Smola and Risi Kondor's 2003 work on regularization operators, leveraged eigenvalues of graph adjacency or Laplacian matrices to encode spectral properties, providing a compact representation of graph topology.

Key advancements in the late 2000s and early 2010s addressed computational efficiency and expressiveness. S.V.N. Vishwanathan, Nicol N. Schraudolph, Risi Kondor, and Karsten M. Borgwardt's 2010 survey unified various graph kernels under a common framework, emphasizing fast approximation techniques for large graphs using linear algebra in reproducing kernel Hilbert spaces. In 2011, Nino Shervashidze and colleagues proposed the Weisfeiler-Lehman graph kernel, inspired by the Weisfeiler-Lehman graph isomorphism test, which iteratively refines node labels to capture subtree structures efficiently and outperformed prior kernels on benchmark datasets. By the end of the 2010s, these developments had established graph kernels as a mature subfield, with applications spanning bioinformatics and cheminformatics, though computational challenges persisted for very large graphs.

Mathematical Foundations

Kernel Functions on Graphs

Kernel functions provide a principled way to measure similarity between data points by implicitly mapping them into a high-dimensional reproducing kernel Hilbert space (RKHS), where the kernel value corresponds to the inner product of the mapped representations. Formally, for data points x, y from some input space \mathcal{X}, a function k: \mathcal{X} \times \mathcal{X} \to \mathbb{R} is defined as k(x, y) = \langle \phi(x), \phi(y) \rangle_{\mathcal{H}}, where \phi: \mathcal{X} \to \mathcal{H} is a feature map into an RKHS \mathcal{H}, and \langle \cdot, \cdot \rangle_{\mathcal{H}} denotes the inner product in \mathcal{H}. This construction allows kernel methods, such as support vector machines, to operate in the feature space without explicitly computing \phi, leveraging the kernel trick for computational efficiency.

In the context of graphs, kernel functions are adapted to compare pairs of graphs G = (V_G, E_G) and H = (V_H, E_H), where V denotes vertices and E denotes edges, by defining similarity through implicit feature maps that capture graph substructures such as vertices and edges. A graph kernel k: \mathbb{G} \times \mathbb{G} \to \mathbb{R}, with \mathbb{G} being the space of graphs, takes the form k(G, H) = \langle \phi(G), \phi(H) \rangle_{\mathcal{H}}, where the feature map \phi embeds graphs into the RKHS based on counts or similarities of these substructures, ensuring permutation invariance with respect to vertex labeling. This adaptation enables the comparison of non-Euclidean graph data while preserving structural information.

A basic instantiation of this idea is the vertex kernel, which measures pairwise vertex similarities and aggregates them to obtain a graph-level similarity: k(G, H) = \sum_{v_i \in V_G} \sum_{u_j \in V_H} k_v(v_i, u_j), where k_v is a base kernel on vertex attributes or labels. This can be extended to incorporate edge information by defining an edge kernel k_E(G, H) = \sum_{e \in E_G} \sum_{e' \in E_H} k_e(e, e'), where k_e operates on edge attributes, or by combining both through product kernels defined on edges together with their endpoint vertices. Such decompositions align with the R-convolution paradigm for structured data, allowing modular construction from base kernels on substructures. The validity of these graph kernels relies on Mercer's theorem, which guarantees that a continuous, symmetric, positive semi-definite kernel corresponds to an inner product in some feature space, ensuring that the implicit embeddings \phi(G) form a valid RKHS for graph representations. This theorem underpins the theoretical soundness of graph kernels, allowing them to be used in kernel-based algorithms without risking non-convex optimization issues arising from invalid kernels.
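To make the vertex-based instantiation concrete, the following minimal sketch (plain Python, hypothetical helper names) assumes that k_v is a hard delta kernel on discrete vertex labels; in that special case the double sum over vertex pairs collapses to a dot product of the two graphs' label histograms.

```python
from collections import Counter

def vertex_label_kernel(labels_G, labels_H):
    """Vertex kernel with a delta base kernel: sum of label matches over all vertex pairs.

    Because the delta kernel only matches identical labels, this equals the dot
    product of the two graphs' label histograms, avoiding the explicit double sum.
    """
    hist_G, hist_H = Counter(labels_G), Counter(labels_H)
    return sum(count * hist_H.get(label, 0) for label, count in hist_G.items())

# Toy molecular-style graphs described only by their vertex labels.
G_labels = ["C", "C", "O", "N"]
H_labels = ["C", "O", "O"]
print(vertex_label_kernel(G_labels, H_labels))  # 2*1 (C) + 1*2 (O) = 4
```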

Positive Definiteness and Mercer Condition

In the context of graph kernels, positive semi-definiteness (PSD) is a fundamental property that ensures the kernel function k can be employed in kernel-based algorithms. Specifically, for any finite set of graphs \{G_1, \dots, G_n\}, the Gram matrix K with entries K_{ij} = k(G_i, G_j) must be positive semi-definite, meaning all its eigenvalues are non-negative. This condition guarantees that the kernel corresponds to an inner product in some feature space, allowing for stable, convex optimization in methods like support vector machines (SVMs).

The Mercer condition provides a theoretical foundation for PSD kernels on graphs, stating that a symmetric kernel k(G, H) is valid if there exists a feature map \phi into a reproducing kernel Hilbert space such that k(G, H) = \langle \phi(G), \phi(H) \rangle. For graph kernels, this often manifests as a countable sum over graph substructures (e.g., walks or subgraphs) rather than an integral, but the general form k(G, H) = \int \phi(G, t) \phi(H, t) \, dt underscores the existence of such a map, ensuring the kernel's applicability in the representer theorem for learning in high-dimensional spaces. This property is crucial for graph kernels derived from convolution frameworks, where PSD is preserved through composition of base kernels on discrete parts of the graphs.

The implications of positive semi-definiteness and the Mercer condition extend to practical workflows, particularly enabling convex training in SVMs by avoiding indefinite Gram matrices that could lead to optimization instability or non-unique solutions. Non-PSD graph kernels, such as certain assignment-based similarities, may require regularization or modification to be usable, highlighting the importance of these properties in kernel design. To verify positive semi-definiteness empirically, one computes the eigenvalues of the Gram matrix for a sample of graphs and checks that all are non-negative; alternatively, one attempts a Cholesky decomposition, which succeeds only if the matrix is positive (semi-)definite. These checks are essential during kernel development to confirm theoretical guarantees hold on real datasets.
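The empirical checks described above can be expressed in a few lines. The sketch below (NumPy, illustrative function names) tests a candidate Gram matrix via its eigenvalue spectrum and, alternatively, via a Cholesky factorization; it is a generic numerical check, not specific to any particular graph kernel.

```python
import numpy as np

def is_positive_semidefinite(K, tol=1e-10):
    """Check the Mercer/PSD condition for a candidate Gram matrix K.

    Symmetrize first to absorb floating-point asymmetry, then verify that the
    smallest eigenvalue is non-negative up to a small tolerance.
    """
    K = 0.5 * (K + K.T)
    return np.linalg.eigvalsh(K).min() >= -tol

def is_positive_definite_cholesky(K):
    """Alternative check: Cholesky succeeds iff K is (strictly) positive definite."""
    try:
        np.linalg.cholesky(0.5 * (K + K.T))
        return True
    except np.linalg.LinAlgError:
        return False

# A valid 3x3 Gram matrix (standing in for kernel values on three graphs)
# versus a 2x2 indefinite matrix with eigenvalues 3 and -1.
K_valid = np.array([[2.0, 1.0, 0.5], [1.0, 2.0, 1.0], [0.5, 1.0, 2.0]])
K_indefinite = np.array([[1.0, 2.0], [2.0, 1.0]])
print(is_positive_semidefinite(K_valid))       # True
print(is_positive_semidefinite(K_indefinite))  # False
```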

Construction Methods

R-Convolution Framework

The R-convolution framework, introduced by Haussler in 1999, provides a foundational method for constructing kernel functions on discrete structures such as graphs by leveraging their relational composition. In this approach, a relational structure is defined as a set of objects connected by relations, and the kernel between two such structures x and y decomposes into a sum over their constituent parts: k(x, y) = \sum_{R \in \mathcal{R}} \sum_{s \in S_R(x), t \in S_R(y)} k_R(s, t), where \mathcal{R} denotes the set of relations, S_R(x) and S_R(y) are the substructures corresponding to relation R in x and y, and k_R is a base kernel on those substructures. This decomposition ensures that the overall kernel measures similarity by aggregating pairwise similarities of aligned components, provided the base kernels k_R are positive semi-definite to guarantee the Mercer condition for the composite kernel.

When applied to graphs, the R-convolution framework specializes to convolutions over vertices and edges, treating the graph as a relational structure with vertex and edge relations. For two graphs G = (V_G, E_G) and H = (V_H, E_H), the kernel becomes k(G, H) = \sum_{u \in V_G, v \in V_H} k_{\text{node}}(u, v) + \sum_{e \in E_G, f \in E_H} k_{\text{edge}}(e, f), where k_{\text{node}} compares vertex features (e.g., labels or degrees) and k_{\text{edge}} compares edge features (e.g., the labels of the connected vertex pairs). This formulation allows the kernel to capture both local structural similarities and global alignments without requiring an explicit vertex correspondence between the graphs.

The modular design of the R-convolution framework is a key advantage, as it enables the combination of simple base kernels on atomic elements, like label matches or geometric features, to build expressive graph kernels for complex tasks such as graph classification. However, this decomposition can lead to high computational cost, scaling with the product of the number of substructures in the input graphs, necessitating pruning or approximation techniques to mitigate inefficiency in large-scale applications.
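A minimal, schematic rendering of the R-convolution template might look as follows; the decomposition function, the graph representation, and all names here are hypothetical, and the base kernels are simple delta kernels for illustration.

```python
def r_convolution_kernel(x, y, decompose, base_kernels):
    """Generic R-convolution: sum base-kernel values over all pairs of parts.

    decompose(x) returns a dict mapping each relation name R to the list of
    substructures S_R(x); base_kernels maps R to a PSD kernel k_R on those parts.
    """
    parts_x, parts_y = decompose(x), decompose(y)
    total = 0.0
    for relation, k_R in base_kernels.items():
        for s in parts_x.get(relation, []):
            for t in parts_y.get(relation, []):
                total += k_R(s, t)
    return total

# Hypothetical graph representation: vertex labels plus edge labels.
def decompose_graph(graph):
    return {"vertex": graph["vertex_labels"], "edge": graph["edge_labels"]}

delta = lambda a, b: 1.0 if a == b else 0.0  # hard-match base kernel (PSD)

G = {"vertex_labels": ["C", "C", "O"], "edge_labels": ["single", "double"]}
H = {"vertex_labels": ["C", "O"],      "edge_labels": ["single"]}

print(r_convolution_kernel(G, H, decompose_graph, {"vertex": delta, "edge": delta}))
# vertex term: 2*1 + 1*1 = 3; edge term: 1; total = 4.0
```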

Walk and Path-Based Approaches

Walk kernels construct graph similarities by aggregating matches between sequences of adjacent vertices, often incorporating transition probabilities to model traversals across the graphs. These methods, building on the R-convolution framework, decompose graphs into walks (ordered sequences of vertices connected by edges) and compute kernel values as weighted sums of similarities between corresponding walks in two graphs G and H. Transition probabilities can be derived from adjacency matrices or random-walk dynamics, emphasizing structural patterns like connectivity and labeling along the sequences. This approach captures relational information but requires careful handling of infinite walk lengths, typically addressed through decay factors. A general form for walk and path kernels is given by
k(G, H) = \sum_{p \in \mathcal{P}_G} \sum_{q \in \mathcal{P}_H} \mu(|p|, |q|) \delta(\ell(p), \ell(q)),
where \mathcal{P}_G and \mathcal{P}_H denote the sets of walks or paths in G and H, respectively, \mu is a decay function penalizing longer structures (e.g., \mu(l_1, l_2) = \lambda^{l_1 + l_2} for \lambda < 1), and \delta measures label sequence compatibility, such as exact matching or a kernel on vertex and edge labels. For path-based approaches, the focus narrows to shortest or all simple paths between vertex pairs, leveraging graph products or powers of the adjacency matrix to enumerate and compare them efficiently. These kernels highlight shortest-path distances and direct connections, making them suitable for tasks where path efficiency matters, such as in molecular similarity. Computation often involves adjacency-matrix powers for all-paths variants or the Floyd-Warshall algorithm for shortest paths, achieving polynomial time in graph size.
To mitigate computational overhead, pruning techniques eliminate redundant or unproductive structures, such as tottering walks that oscillate back and forth between adjacent vertices or walks that exceed a maximum length. Second-order Markov models, for instance, condition walk continuation on the previous two vertices, pruning cycles and back-and-forth oscillations that inflate counts without adding discriminative power. For paths, restricting to shortest paths or sampling high-probability paths reduces the complexity from exponential to cubic in the vertex count. These optimizations preserve expressiveness while enabling scalability to larger graphs, as demonstrated in early implementations achieving O(n^4) complexity for labeled shortest-path comparisons.
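As a concrete, simplified instance of a path-based kernel, the sketch below computes an unlabeled shortest-path kernel: all-pairs shortest-path lengths are obtained with Floyd-Warshall and compared with a delta kernel on path lengths. It is a toy illustration under those assumptions, not a reference implementation.

```python
import numpy as np
from collections import Counter

def shortest_path_lengths(adjacency):
    """All-pairs shortest-path lengths via Floyd-Warshall on an unweighted graph."""
    n = adjacency.shape[0]
    dist = np.where(adjacency > 0, 1.0, np.inf)
    np.fill_diagonal(dist, 0.0)
    for k in range(n):
        dist = np.minimum(dist, dist[:, k, None] + dist[None, k, :])
    return dist

def shortest_path_kernel(adj_G, adj_H):
    """Unlabeled shortest-path kernel: count pairs of vertex pairs whose
    shortest-path distances coincide (a delta kernel on path lengths)."""
    def histogram(adj):
        dist = shortest_path_lengths(adj)
        upper = dist[np.triu_indices_from(dist, k=1)]
        return Counter(upper[np.isfinite(upper)].astype(int).tolist())
    h_G, h_H = histogram(adj_G), histogram(adj_H)
    return sum(count * h_H.get(length, 0) for length, count in h_G.items())

# Path graph on 3 vertices vs. a triangle.
path3 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
triangle = np.ones((3, 3)) - np.eye(3)
print(shortest_path_kernel(path3, triangle))  # distances {1:2, 2:1} vs {1:3} -> 6
```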

Notable Graph Kernels

Random Walk Kernel

The random walk kernel measures the similarity between two graphs by counting the number of matching walks between them, weighted by a decay factor to emphasize shorter walks. Formally, for two graphs G and H with vertex labels, the kernel is defined as k(G, H) = \sum_{\substack{p, q \\ |p|=|q|}} \lambda^{|p|} \prod_{i=1}^{|p|} \delta(l(p_i), l(q_i)), where the sum is over all pairs of walks p in G and q in H of equal length, \lambda \in (0, 1) is a decay factor controlling the contribution of longer walks, l(\cdot) denotes the label of a vertex, and \delta is the indicator function that is 1 if the labels match and 0 otherwise. This formulation arises from the R-convolution framework, where walks serve as structural features, and the kernel computes their inner product in a high-dimensional feature space.

Computation of the kernel exploits the structure of the direct product graph G \times H, whose vertices are pairs (u, v) with u \in G and v \in H, and whose edges connect pairs if there are corresponding edges in G and H with matching labels. The number of matching walks of length t starting from all possible vertex pairs is given by the sum of entries in the t-th power of the adjacency matrix A of G \times H, i.e., \sum_{i,j} (A^t)_{ij}, weighted by \lambda^t. The full kernel is then the sum over all walk lengths: k(G, H) = \sum_{t=0}^\infty \lambda^t \sum_{i,j} (A^t)_{ij}. This can be computed efficiently using dynamic programming, where the number of walks of length up to T is tracked iteratively in O(|V_G| |V_H| T) time, or via fast matrix multiplication for fixed T in O((|V_G| |V_H|)^\omega) time with \omega \approx 2.37 the matrix multiplication exponent; for the infinite sum, closed-form solutions using matrix inversion yield O((|V_G| |V_H|)^3) time when \lambda is small enough for the geometric series to converge.

Variants of the random walk kernel adapt it for specific graph properties and efficiency. The labeled random walk kernel explicitly incorporates vertex and edge labels by requiring exact matches in the \delta function, extending the basic unlabeled version to discrete-labeled graphs while preserving positive semi-definiteness under the Mercer condition. For efficiency on large graphs, interrupted random walks introduce a probability \gamma of halting the walk at each step, preventing artifacts such as tottering (immediate back-and-forth on edges) and yielding a finite-sum kernel computable via a second-order Markov model or absorbing Markov chains, with complexity reduced to O(|V|^3) per graph pair. In terms of expressiveness, the random walk kernel effectively captures local connectivity patterns through sequences of neighboring vertices but often fails to distinguish graphs with identical local structures yet differing global topologies, such as cycles of different lengths, limiting its discriminative power compared to subgraph-based methods.
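One compact way to realize the closed-form computation above is to build the label-filtered direct product adjacency matrix with a Kronecker product and solve the resulting linear system. The NumPy sketch below assumes vertex labels only (no edge labels) and a decay \lambda chosen small enough for the geometric series to converge; all function names are illustrative.

```python
import numpy as np

def direct_product_adjacency(adj_G, labels_G, adj_H, labels_H):
    """Adjacency matrix of the label-filtered direct product graph: vertex pair (u, v)
    is admissible when the labels of u and v match, and (u, v)-(u', v') is an edge
    when both u-u' and v-v' are edges in their respective graphs."""
    compatible = np.array([[1.0 if a == b else 0.0 for b in labels_H] for a in labels_G])
    mask = compatible.reshape(-1)  # pair (u, v) maps to flat index u * |V_H| + v
    A = np.kron(np.asarray(adj_G, dtype=float), np.asarray(adj_H, dtype=float))
    return A * np.outer(mask, mask)

def random_walk_kernel(adj_G, labels_G, adj_H, labels_H, lam=0.05):
    """Geometric random walk kernel: sum_t lam^t 1^T A_x^t 1 = 1^T (I - lam A_x)^{-1} 1,
    which converges when lam < 1 / spectral_radius(A_x)."""
    A_x = direct_product_adjacency(adj_G, labels_G, adj_H, labels_H)
    n = A_x.shape[0]
    return float(np.ones(n) @ np.linalg.solve(np.eye(n) - lam * A_x, np.ones(n)))

# Two small labeled graphs: the path C-O-C and the edge C-O.
adj_G = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]; labels_G = ["C", "O", "C"]
adj_H = [[0, 1], [1, 0]];                  labels_H = ["C", "O"]
print(random_walk_kernel(adj_G, labels_G, adj_H, labels_H))
```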

Weisfeiler-Lehman Kernel

The Weisfeiler-Lehman kernel is a graph kernel that extends the classical Weisfeiler-Lehman stabilization method, originally developed for testing graph isomorphism, to provide an approximate measure of structural similarity between graphs. This approach leverages iterative label refinement to capture hierarchical substructures, making it suitable for comparing graphs in tasks where exact isomorphism checking is computationally infeasible. It operates within the R-convolution framework by decomposing graphs into rooted subtrees based on node neighborhoods.

The algorithm begins with initial node labels, typically the node degrees for unlabeled graphs or predefined discrete labels if available. In each of h iterations, the label of a vertex v is updated by collecting the multiset of the labels of its neighbors, sorting this multiset, and hashing the result together with the node's current label to produce a new unique label. This process, akin to a 1-dimensional Weisfeiler-Lehman test, propagates structural information outward from each vertex, with the hashing step ensuring efficient compression of neighborhood descriptions. At each iteration c, a label histogram \phi_c(G) is computed for graph G, representing the counts of each distinct label across all nodes, which serves as a feature vector encoding the graph's refined structure.

The kernel between two graphs G and H is defined as the sum over iterations of a base kernel applied to these histogram vectors: k_{\text{WL}}^{(h)}(G, H) = \sum_{c=0}^{h} k_{\text{base}}(\phi_c(G), \phi_c(H)), where k_{\text{base}} is typically the linear (dot-product) kernel on the histogram vectors. This summation aggregates similarities at increasing levels of structural detail, with the number of iterations h controlling the depth of refinement.

In terms of expressiveness, the 1-Weisfeiler-Lehman kernel (corresponding to a single iteration) is equivalent to a degree-refinement kernel that distinguishes nodes only by their degrees and immediate neighborhoods. Increasing h allows the kernel to capture more complex substructures, such as paths and cycles up to length roughly 2h, approaching the full power of the 1-dimensional Weisfeiler-Lehman isomorphism test for distinguishing non-isomorphic graphs in most practical cases. However, like the underlying test, it may fail to separate certain strongly regular graphs that are non-isomorphic.
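The refinement and histogram comparison can be sketched directly. The toy implementation below substitutes explicit (label, sorted neighbor-label multiset) tuples for the hashing/compression step, which is functionally equivalent for small graphs; function names are illustrative.

```python
from collections import Counter

def wl_relabel(labels, adjacency_lists):
    """One Weisfeiler-Lehman refinement step: a vertex's new label is its old label
    paired with the sorted multiset of its neighbors' labels (the tuple itself
    stands in for the hash/compression step)."""
    return [
        (labels[v], tuple(sorted(labels[u] for u in adjacency_lists[v])))
        for v in range(len(labels))
    ]

def wl_kernel(labels_G, adj_G, labels_H, adj_H, h=2):
    """WL subtree kernel: sum over refinement levels c = 0..h of the dot product
    between the two graphs' label histograms at level c."""
    value = 0
    for _ in range(h + 1):
        hist_G, hist_H = Counter(labels_G), Counter(labels_H)
        value += sum(count * hist_H.get(label, 0) for label, count in hist_G.items())
        labels_G, labels_H = wl_relabel(labels_G, adj_G), wl_relabel(labels_H, adj_H)
    return value

# Compare the labeled path C-O-C with itself: identical histograms at every level.
adj = [[1], [0, 2], [1]]   # adjacency lists of a 3-vertex path
labels = ["C", "O", "C"]
print(wl_kernel(labels, adj, labels, adj, h=2))  # 5 + 5 + 5 = 15
```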

Applications

Bioinformatics and Chemoinformatics

In bioinformatics, graph kernels facilitate protein function prediction by measuring structural similarities in protein-protein interaction (PPI) networks, where proteins are represented as nodes and interactions as edges. Graphlet kernels, which quantify the frequency of small induced subgraphs (graphlets) as motifs, capture local topological patterns to infer functional annotations, and are particularly useful for identifying disease-related genes through subnetwork comparisons. For instance, early studies employed motif-based graphlet similarity to prioritize candidate genes in PPI networks by aligning their neighborhood structures with known disease modules. A seminal application involved random walk graph kernels on PPI networks for classifying enzyme functions, where the kernel computes the similarity of random walks between protein graphs, enabling support vector machine (SVM) classifiers to achieve competitive performance in multi-class prediction tasks. Extending this to disease contexts, composite random walk kernels integrated PPI data with gene expression profiles to predict cancer outcomes, yielding accuracies of 63.33% on leukemia datasets and 61.54% on breast cancer datasets by accounting for both present and absent interactions.

In chemoinformatics, graph kernels model molecules as graphs with atoms as labeled nodes and bonds as edges, supporting tasks like molecular classification and property prediction without predefined features. Tanimoto-like kernels extend the classical Tanimoto coefficient to graph substructures, normalizing the intersection of path-based features for robust molecular similarity assessment. Mahé et al. introduced the Tanimoto kernel, which counts labeled paths up to a fixed depth, achieving 91.5% accuracy on the MUTAG dataset for predicting mutagenicity of aromatic compounds. Marginalized graph kernels, which average similarities over all possible walks between graph pairs, have been applied to regression tasks such as atomization energy prediction. Tang and de Jong combined this kernel with Gaussian process regression on molecular graphs derived from atomic positions, attaining an error of 0.62 kcal/mol on the QM7 dataset of small molecules with up to 7 heavy atoms.

A representative workflow involves SVM classifiers with the Weisfeiler-Lehman (WL) kernel for graph classification, relevant to applications like toxicity screening and drug-target interaction modeling. On the MUTAG dataset, the WL subtree kernel, which iteratively refines node labels to capture neighborhood hierarchies, delivers 82.05% ± 0.36% accuracy in distinguishing mutagenic from non-mutagenic compounds. This kernel's extension to bipartite drug-target graphs supports interaction prediction by embedding drug and protein structures into a shared kernel space, often outperforming alignment-based methods in accuracy on benchmark interaction datasets. Graph kernels offer significant benefits in these domains by accommodating variable-sized graphs, ranging from small molecules to large PPI networks, without requiring costly alignment or canonical labeling, thus enabling efficient similarity computation directly from raw topological data.
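In practice, any of these kernels is plugged into an SVM as a precomputed Gram matrix. The scikit-learn sketch below shows only the train/test bookkeeping for a precomputed kernel; the Gram matrix and labels are synthetic stand-ins rather than a real WL kernel evaluated on MUTAG.

```python
import numpy as np
from sklearn.svm import SVC

# Suppose K is an n x n Gram matrix K[i, j] = k(G_i, G_j) produced by a graph kernel
# (e.g., a WL kernel on molecular graphs) and y holds binary class labels such as
# mutagenic vs. non-mutagenic. Random stand-ins are used here for illustration.
rng = np.random.default_rng(0)
embedding = rng.normal(size=(60, 8))
K = embedding @ embedding.T            # a valid PSD Gram matrix standing in for a graph kernel
y = (embedding[:, 0] > 0).astype(int)  # synthetic binary labels

train, test = np.arange(0, 45), np.arange(45, 60)
clf = SVC(kernel="precomputed", C=1.0)
clf.fit(K[np.ix_(train, train)], y[train])              # train-vs-train kernel block
accuracy = clf.score(K[np.ix_(test, train)], y[test])   # test-vs-train kernel block
print(accuracy)
```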

Social and Biological Network Analysis

Graph kernels, particularly random walk-based variants, play a key role in social network analysis by enabling the measurement of structural similarities that support community detection and link prediction. In community detection, these kernels compare ego-networks (the subgraphs induced by individual nodes and their neighbors) to identify cohesive groups, as demonstrated in hierarchical clustering approaches that leverage random walk counts to quantify local structural motifs across social graphs. For link prediction, random walk kernels compute node similarities by aggregating probabilities of walks between pairs of nodes in ego-networks, outperforming traditional metrics like common neighbors on ego-network datasets. Such similarities directly inform user recommendation systems, where kernel-derived proximity ranks potential connections to enhance network growth predictions.

In biological network analysis, path-based graph kernels facilitate the comparison of gene regulatory networks (GRNs) by capturing regulatory pathway similarities through counts of shortest paths or walks between nodes. For example, the shortest path kernel decomposes GRNs into path features weighted by regulatory motifs, enabling classification of pathways on benchmark datasets of transcription factor interactions. This approach aids in identifying conserved regulatory mechanisms across species or conditions. In protein-protein interaction (PPI) networks, random walk kernels detect disease-associated modules by measuring similarities centered on known disease genes, as shown in predictions of disease outcomes where kernel methods on PPI graphs achieve competitive performance, highlighting disrupted interaction clusters.

A notable application involves vertex nomination in attributed graphs, where hybrid kernels combining structural and attribute similarities nominate corresponding vertices across networks. Subgraph matching kernels, which align attributed substructures via optimal bijections, have been benchmarked on web graphs like citation networks in the 2010s, integrating node labels such as page categories with walk-based features in semi-supervised settings. To address scalability in large networks, subgraph sampling techniques adapt graph kernels by approximating full-graph computations on induced subgraphs, preserving kernel positive-definiteness while reducing computational demands for sparse graphs. This has been applied to large social networks, enabling kernel-based clustering while maintaining performance for community detection tasks. As of 2025, hybrid approaches integrating graph kernels with graph neural networks have extended applications to privacy-preserving community detection and, in bioinformatics, to enhanced protein-ligand binding prediction in drug discovery.

Challenges and Advances

Computational Limitations

Computing graph kernels often faces significant efficiency challenges, particularly due to their high time complexities. For instance, the direct computation of random walk kernels on graphs with n nodes requires O(n^6) time, arising from the need to construct and traverse the product graph between pairs of input graphs. In contrast, the Weisfeiler-Lehman (WL) kernel involves h iterations of neighborhood aggregation, with a per-iteration complexity of O(n^2 \log n) for dense graphs, leading to an overall cost of O(h n^2 \log n) when evaluating similarities. These kernels map graphs into high-dimensional feature spaces, exacerbating the curse of dimensionality, where the exponential growth in feature dimensions leads to sparse representations and increased sensitivity to noise.

Additionally, performance is highly sensitive to hyperparameters, such as the decay parameter \lambda in random walk kernels, which controls the weighting of walks and requires careful tuning via cross-validation to avoid under- or over-emphasizing certain structural motifs. For the WL kernel, the number of iterations h similarly influences expressiveness, with excessive values amplifying computational overhead without proportional gains in discriminative power.

To address these issues, approximation techniques such as sampling random walks limit the computation to a subset of walks, reducing running time to near-linear in practice while preserving approximation quality. For WL kernels, optimized implementations leverage sorting of neighbor labels instead of hashing to minimize collision risks and enable deterministic computations, achieving sub-quadratic complexity on sparse graphs. Empirical evaluations in 2020 surveys highlight poor scaling of classical graph kernels on datasets with graphs exceeding 1000 nodes, where computation times become prohibitive compared to neural alternatives, often limiting applicability to smaller molecular or social network benchmarks.
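The walk-sampling approximation mentioned above can be illustrated as a Monte Carlo estimate of the probability that the label sequences of independently sampled walks coincide. The sketch below uses hypothetical helper names and a simple uniform walk model; it is not an optimized implementation of any published estimator.

```python
import random
from collections import Counter

def sample_walk_labels(adjacency_lists, labels, length, rng):
    """Sample one random walk of the given length and return its label sequence."""
    v = rng.randrange(len(labels))
    sequence = [labels[v]]
    for _ in range(length):
        neighbors = adjacency_lists[v]
        if not neighbors:
            break
        v = rng.choice(neighbors)
        sequence.append(labels[v])
    return tuple(sequence)

def approx_walk_kernel(graph_G, graph_H, length=3, n_samples=2000, seed=0):
    """Monte Carlo walk-kernel approximation: estimate the probability that walks
    sampled independently from the two graphs carry identical label sequences."""
    rng = random.Random(seed)
    walks_G = Counter(sample_walk_labels(*graph_G, length, rng) for _ in range(n_samples))
    walks_H = Counter(sample_walk_labels(*graph_H, length, rng) for _ in range(n_samples))
    matches = sum(count * walks_H.get(walk, 0) for walk, count in walks_G.items())
    return matches / (n_samples * n_samples)

# Two small labeled graphs given as (adjacency lists, vertex labels).
G = ([[1], [0, 2], [1]], ["C", "O", "C"])
H = ([[1], [0]], ["C", "O"])
print(approx_walk_kernel(G, H))
```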

Integration with Deep Learning

The integration of graph kernels with deep learning has advanced significantly since 2020, particularly through hybrid models that combine the structural inductive biases of kernels with the representational power of neural networks. One prominent approach involves graph kernel-inspired graph neural networks (GNNs), where kernels serve as approximations or inspirations for neural architectures. For instance, neural tangent kernels (NTKs) have been adapted to graph settings to analyze and enhance GNNs, enabling kernel regression techniques that interpret infinitely wide GNNs as kernel methods for tasks like node classification and graph regression. These hybrid models, such as the Graph Neural Tangent Kernel (GNTK), bridge classical kernel methods with deep learning by deriving closed-form kernels from GNN training dynamics, achieving comparable performance to deep GNNs while offering theoretical insights into over-smoothing and expressivity limitations.

Recent surveys underscore the growing synergy between graph kernels and deep learning, emphasizing expressiveness and scalability. A 2021 survey in the Journal of Artificial Intelligence Research provides a comprehensive treatment of graph kernel expressiveness, highlighting how kernels like the Weisfeiler-Lehman (WL) kernel inform the design of GNNs by capturing subgraph isomorphism tests that align with higher-order GNN variants. Complementing this, a 2024 survey reviews GNN architectures grounded in graph kernels, such as WL-based models, evaluating their expressiveness, performance, and applicability in domains requiring structural comparisons, while noting that kernel-inspired GNNs often outperform traditional kernels in scalability for large graphs.

Key advances in this integration include efficiency-focused simplifications and novel dynamical systems. The Simplified Graph Neural Tangent Kernel (SGTK), introduced in 2025, reduces the reliance on deep layers by performing fixed-step message aggregation, yielding a computationally lighter kernel that approximates NTK behavior with linear scaling in graph size and improved efficiency over full GNTK computations on benchmarks like molecular datasets. Additionally, Graph-Coupled Oscillator Networks (GraphCON), presented at ICML 2022, model graph propagation through coupled second-order differential equations discretized as neural layers, integrating kernel-like structural encodings for dynamic graphs and demonstrating superior expressivity in tasks involving temporal evolution, such as traffic forecasting, by approximating Fourier bases for irregular structures. Further advancements in 2025 include quantum evolution kernels, which leverage quantum computing for graph machine learning on complex structures, and graph kernel neural networks that fuse kernel methods directly into neural architectures for improved representation learning.

Looking ahead, future directions emphasize scalable kernel designs leveraging attention mechanisms to handle large-scale graphs, addressing gaps in deep integration by enabling linear-time computations and adaptive substructure weighting. Attention-augmented kernels, such as those in multiple kernel ensemble GNNs, facilitate this by dynamically prioritizing relevant graph walks or subgraphs, promising enhanced performance on massive networks while maintaining the interpretability of kernel methods.
