Induced subgraph
In graph theory, an induced subgraph of a graph G is formed by selecting a nonempty subset S of the vertices of G and including all edges from G that connect pairs of vertices in S, resulting in a graph whose vertex set is S and whose edge set consists precisely of the edges of G induced by S.[1] This construction ensures that the induced subgraph preserves the local connectivity of the original graph among the chosen vertices without adding or omitting any edges between them.[2]
Unlike a general subgraph, which allows the deletion of both vertices and arbitrary edges, an induced subgraph arises solely from vertex deletion, retaining every possible edge between the remaining vertices to maintain the original structure intact.[3] Formally, if H is an induced subgraph of G, then V(H) \subseteq V(G) and E(H) includes every edge of G with both endpoints in V(H), making every induced subgraph a subgraph but not conversely.[4] This property distinguishes induced subgraphs as maximal with respect to their vertex sets, often denoted as G[S] or G - (V(G) \setminus S).[3]
Induced subgraphs are fundamental in structural graph theory, enabling the characterization of various graph classes through forbidden induced subgraphs, such as chordal graphs, which are precisely the graphs containing no induced cycle of length at least four.[5] They also feature prominently in extremal graph theory, as exemplified by Erdős's theorem stating that every graph with average degree at least $2k (for natural number k) contains an induced subgraph with minimum degree at least k+1.[3] These concepts underpin applications in algorithm design, network analysis, and optimization problems where preserving edge relations is essential.[6]
Definitions and Basics
In graph theory, an induced subgraph of an undirected graph G = (V, E) is formed by selecting a nonempty subset S \subseteq V of vertices and including all edges from E that connect pairs of vertices in S. Formally, the induced subgraph G[S] has vertex set S and edge set E(S) = \{ \{u, v\} \in E \mid u, v \in S \}.[7]
This construction preserves the adjacency relations among the selected vertices exactly as they appear in the original graph G, ensuring that no edges are added or omitted between vertices in S. The notation G[S] or "the subgraph of G induced by S" is standard, emphasizing the complete retention of edges based on the original graph's structure.[7]
For directed graphs G = (V, A), where A is the set of directed arcs, the induced subgraph G[S] for S \subseteq V consists of vertex set S and arc set A(S) = \{ (u, v) \in A \mid u, v \in S \}, thereby preserving all directed connections between vertices in S.[8]
In the case of multigraphs, which allow multiple edges between the same pair of vertices, the induced subgraph G[S] includes all parallel edges from the original multigraph that lie between vertices in S, maintaining the multiplicity of connections.[7]
Comparison with Other Subgraphs
An induced subgraph of a graph G on a vertex subset S \subseteq V(G) includes all edges of G that connect vertices in S, ensuring the subgraph G[S] preserves the complete edge structure between those vertices.[9] In contrast, an arbitrary subgraph selects both a subset of vertices and a subset of edges from G, potentially omitting some edges between the chosen vertices, which allows for greater flexibility but loses the edge-completeness property.[10] This distinction makes induced subgraphs particularly useful for studying hereditary properties, where the presence of certain structures in G implies their existence in every induced subgraph.[11]
A spanning subgraph, on the other hand, retains all vertices of G but includes only a subset of the edges, focusing on edge reductions without vertex deletions.[3] Unlike induced subgraphs, which fix the edge set based on a vertex selection and may exclude vertices, spanning subgraphs emphasize connectivity or sparsity across the entire vertex set, such as in spanning trees.[10] The full subgraph, which is simply G itself, includes all vertices and all edges, serving as the trivial case that contrasts with both induced and spanning variants by imposing no reductions.[12]
Minors and contractions differ fundamentally from these subgraph notions, as they involve not just deletions but also edge contractions that merge vertices, resulting in embeddings that are not isomorphic to any subgraph of the original graph.[13]
| Subgraph Type | Vertex Selection | Edge Selection |
|---|
| Arbitrary | Subset of V(G) | Subset of E(G), possibly omitting edges between selected vertices |
| Induced | Subset of V(G) | All edges of G between selected vertices |
| Spanning | All of V(G) | Subset of E(G) |
| Full | All of V(G) | All of E(G) |
Examples
Introductory Examples
To illustrate the concept of an induced subgraph, consider simple graphs where the preservation of adjacencies is evident through vertex selection. An induced subgraph arises by choosing a subset of vertices from a graph and retaining exactly those edges between them that appear in the original graph, ensuring no additional or omitted edges.[14]
A basic example is the path graph P_4, consisting of vertices {1, 2, 3, 4} connected sequentially by edges 1--2, 2--3, and 3--4. Selecting the subset S = \{1, 3\} yields an induced subgraph with no edges, as vertices 1 and 3 are non-adjacent in P_4; this results in two isolated vertices, emphasizing how non-edges are preserved.[14] Visually, P_4 can be drawn as a straight line: 1—2—3—4, and the induced subgraph on {1, 3} appears as disconnected points at positions 1 and 3, with no connecting line.
Another example involves the cycle graph C_4, with vertices {1, 2, 3, 4} and edges 1--2, 2--3, 3--4, 4--1. The subset S = \{1, 2, 3\} of three consecutive vertices induces a path graph P_3, including edges 1--2 and 2--3 but omitting any direct edge between 1 and 3 (which does not exist in C_4).[14] This can be visualized as a square cycle 1—2—3—4—1, where removing vertex 4 leaves the chain 1—2—3.
In contrast, for the complete graph K_4 on vertices {1, 2, 3, 4} with all six possible edges present, any subset S of three vertices, such as {1, 2, 3}, induces a complete graph K_3 (a triangle), as every pair in S is adjacent.[9] Drawing K_4 as a tetrahedron highlights this: selecting three vertices retains the full triangle among them, demonstrating preservation of all original edges.
Examples in Specific Graph Classes
In hypercube graphs Q_n, the longest induced path is a central object of study in the snake-in-the-box problem, which seeks to maximize the length of such paths for applications in coding theory and network design. For instance, in Q_4, the maximum induced path length is 7, while exact values remain open for larger n, with known lower bounds improving asymptotically.
In bipartite graphs, an induced matching is a matching where the subgraph induced by its endpoints consists solely of those isolated edges, with no additional edges between the matched vertices.[15] This structure ensures the matching is vertex-disjoint and that the induced subgraph consists exactly of those isolated edges, with no additional edges between the matched vertices, and in bipartite graphs, the size of the maximum induced matching relates to structural parameters like the matching number, though it can be strictly smaller.[15]
Any induced subgraph of a tree is a forest, as trees are acyclic and the induced subgraph operation preserves the absence of cycles by retaining all edges between selected vertices without introducing new ones.
Chordal graphs are defined by the absence of induced cycles of length 4 or greater, meaning every cycle in such a graph either has length 3 or includes chords that prevent it from being induced. This property distinguishes chordal graphs from more general classes, ensuring that minimal separators are cliques, though the focus here is solely on the induced cycle characterization.
Properties
Fundamental Properties
One fundamental property of induced subgraphs is their role in defining hereditary graph classes. A graph class is hereditary if it is closed under the operation of taking induced subgraphs, meaning that for any graph G in the class and any subset S of vertices of G, the induced subgraph G[S] also belongs to the class.[16] This closure property ensures that hereditary classes can often be characterized by a finite or infinite set of forbidden induced subgraphs. For example, the class of bipartite graphs is hereditary, as the induced subgraph of a bipartite graph remains bipartite, preserving the absence of odd cycles within the vertex subset.[16][11]
Induced subgraphs preserve the local degree structure relative to the selected vertex subset, capturing the density of connections within that subset without alteration. Specifically, for a vertex v \in S, the degree of v in G[S] is exactly the number of neighbors of v that lie in S, reflecting the original adjacency relations restricted to S. This preservation of relative degrees highlights how induced subgraphs maintain the intrinsic connectivity patterns of the original graph locally, without introducing or omitting edges between vertices in S.[9]
By construction, an induced subgraph G[S] includes no new edges or vertices beyond those induced by S from G, ensuring that isomorphism between G[S] and another graph H directly implies a corresponding structural embedding in G. That is, if G[S] \cong H, then the adjacency and non-adjacency relations among vertices in S mirror exactly those in H, preserving the graph's structural properties without extraneous connections.[17] A key illustration of this is that every induced subgraph of a complete graph K_n is itself complete, as all possible edges between the selected vertices are present in K_n.[18]
Connections to Other Graph Structures
Induced subgraphs provide a foundational link to cliques in graph theory, where a clique of size r in a graph G is defined as a subset of r vertices that induces a complete subgraph K_r, meaning every pair of vertices in the subset is adjacent. This induced structure ensures that the clique captures all possible edges within the vertex set. Maximal cliques, in particular, are induced complete subgraphs that are not properly contained in any larger clique, serving as key components in decompositions and recognition algorithms for various graph classes.
In a complementary fashion, independent sets connect to induced subgraphs through the absence of edges: an independent set in G is a vertex subset that induces the empty graph (with no edges), ensuring no two vertices in the set are adjacent. This property makes independent sets the "dual" of cliques in the complement graph, where an independent set in G corresponds to a clique in the complement \overline{G}. The maximum independent set, or independence number, thus relies on identifying the largest such induced empty subgraph.
Forbidden induced subgraphs offer a powerful characterization for hereditary graph classes, where a class is closed under taking induced subgraphs if excluding certain patterns defines membership. Perfect graphs, for instance, are precisely those graphs with no induced odd cycle of length at least 5 (an odd hole) or the complement of such a cycle (an odd antihole), as established by the Strong Perfect Graph Theorem; this equates the chromatic number and clique number for every induced subgraph. Cographs, another prominent class, are defined as graphs forbidding the induced path P_4 on four vertices, rendering them complement-reducible and perfect with efficient recognition via cotree representations. These forbidden structures enable structural theorems and algorithmic tractability in extremal graph theory.[19]
Induced cycles further bridge induced subgraphs to structural definitions, notably in chordal graphs, which are exactly the graphs without induced cycles of length 4 or greater; every cycle of length at least 4 must contain a chord (an edge connecting non-consecutive vertices). This absence of long induced cycles implies perfect elimination orderings, facilitating linear-time algorithms for optimization problems on these graphs. In planar graphs, induced cycles play a role in embedding properties, where the shortest induced cycle can relate to the girth—the length of the shortest cycle overall—particularly in girth-constrained planar graphs, as high girth ensures no short induced cycles, influencing embeddability and minor exclusions.
Computational Aspects
Basic Algorithms
The standard algorithm to construct an induced subgraph G[S] of a graph G = (V, E) given a vertex subset S \subseteq V selects all vertices in S and includes exactly those edges from E whose both endpoints lie in S. This process directly implements the definition of an induced subgraph as a vertex-induced subgraph containing all original edges between the selected vertices.[20]
In an adjacency list representation of G, the construction iterates over all edges by traversing the adjacency lists and collects edges with both endpoints in S, achieving O(|E|) time complexity.[21] The following pseudocode illustrates this for an undirected graph, assuming S is provided as a set for O(1) membership checks:
function InducedSubgraph_AdjList(G, S):
V_new = S
E_new = [empty set](/page/Empty_set)
for v in V(G):
if v in S:
for u in adj[G][v]:
if u in S and u > v: // Avoid duplicate edges in undirected graph
E_new.add({v, u})
return [graph](/page/Graph)(V_new, E_new)
function InducedSubgraph_AdjList(G, S):
V_new = S
E_new = [empty set](/page/Empty_set)
for v in V(G):
if v in S:
for u in adj[G][v]:
if u in S and u > v: // Avoid duplicate edges in undirected graph
E_new.add({v, u})
return [graph](/page/Graph)(V_new, E_new)
For an adjacency matrix representation, where G is encoded as an |V| \times |V| boolean matrix A, the induced subgraph corresponds to the |S| \times |S| submatrix formed by the rows and columns indexed by S. This submatrix can be extracted in O(|S|^2) time by copying the relevant entries.[21]
To verify whether a given subgraph H = (S, F) with S \subseteq V(G) and F \subseteq E(G) is the induced subgraph G[S], confirm that F exactly equals the set of all edges in E(G) with both endpoints in S. Assuming F \subseteq E(G) \cap \binom{S}{2}, this reduces to checking for no missing edges from G between vertices in S, which can be done in O(|E|) time using adjacency lists (by iterating over all edges in G and verifying inclusion if endpoints are in S) or O(|S|^2) time using adjacency matrices (by comparing entries).[20]
Trivial cases of induced subgraphs include the singleton set S = \{v\} for any vertex v \in V(G), which yields an isolated vertex with no edges, and the full set S = V(G), which yields G itself.[21]
Hardness Results
The induced subgraph isomorphism problem, which asks whether a graph G contains a given graph H as an induced subgraph, is NP-complete. This follows from a polynomial-time reduction from the clique problem: determining if G has a clique of size k is equivalent to checking for an induced subgraph isomorphic to the complete graph K_k, as the induced subgraph on a clique induces all required edges without extras.[22] It is also NP-complete via reduction from the independent set problem, where finding an independent set of size k corresponds to an induced subgraph isomorphic to the empty graph on k vertices.[22]
Special cases of induced subgraph detection exhibit varying complexity. Detecting an induced path of length at least k in a general graph is NP-complete, even in bipartite graphs.[23] Similarly, finding an induced cycle of length at least k is NP-complete in general graphs. However, these problems are solvable in polynomial time in certain graph classes, such as trees (where induced paths coincide with paths) or chordal graphs using structural decompositions.[23]
In parameterized complexity, induced subgraph isomorphism is fixed-parameter tractable when parameterized by the size of H, |V(H)|. Algorithms achieve this via techniques like color-coding, which randomly colors vertices of G with |V(H)| colors and searches for a colorful set inducing a subgraph isomorphic to H, derandomizable in FPT time, or through branching on potential mappings.[22] The running time is typically O(2^{O(|V(H)|)} n^{|V(H)|}) or improved variants for specific H.[22]
Optimization variants face inapproximability barriers. The maximum induced matching problem, seeking the largest induced subgraph that is a matching (no two edges share a vertex or are adjacent), is NP-hard even on bipartite graphs of maximum degree 4, and APX-hard, meaning it cannot be approximated within a constant factor unless P=NP.[24] More strongly, it is inapproximable within a factor of n^{1/3 - \epsilon} for any \epsilon > 0 in polynomial time, assuming P \neq NP.[24]
Applications
In Graph Theory and Combinatorics
In Ramsey theory, induced subgraphs arise naturally in the study of edge colorings, where monochromatic cliques serve as induced subgraphs in the corresponding color class. Classical Ramsey's theorem guarantees the existence of monochromatic cliques in sufficiently large edge-colored complete graphs, and these cliques are inherently induced subgraphs since they include all possible edges within the vertex set in one color.[25] More generally, induced Ramsey-type theorems extend this idea by ensuring that for any fixed graph H, there exists a number such that any 2-coloring of the edges of a large enough complete graph contains a monochromatic induced copy of H. This result, established by Rödl and others, highlights the role of induced subgraphs in avoiding certain structures while forcing others in colored environments.[26]
In extremal graph theory, extensions of Turán's theorem address graphs that forbid induced copies of specific subgraphs, leading to the concept of induced Turán numbers. Turán's original theorem bounds the maximum edges in a graph without a complete subgraph K_{r+1}, but the induced variant considers the extremal function \mathrm{ex}(n, H; \mathrm{ind}), which is the maximum edges in an n-vertex graph without an induced copy of H. For certain forbidden induced subgraphs like paths or cycles, these numbers provide tighter bounds than classical Turán numbers, revealing structural constraints beyond mere subgraph avoidance. Seminal work in this area, including results on induced-forbidden complete bipartite graphs, demonstrates that such extremal graphs often exhibit balanced incomplete structures similar to Turán graphs but with additional independence conditions.[27]
Characterization theorems frequently rely on forbidden induced subgraphs to define graph classes. A graph is bipartite if and only if it contains no induced odd cycle, as any odd cycle in a non-bipartite graph implies the existence of a chordless induced odd cycle via girth considerations. This forbidden induced subgraph characterization extends to variants of the Hadwiger conjecture, where excluding certain induced subgraphs like specific complements or wheels allows partial resolutions or strengthenings. For instance, graphs free of induced \overline{K_3} and additional forbidden structures satisfy Hadwiger's condition for chromatic number bounds, providing evidence toward the conjecture in restricted classes.[28]
Combinatorial enumeration of induced subgraphs of a given size k employs generating functions to capture the distribution over vertex subsets. The induced subgraph counting polynomial, which sums over all k-vertex induced subgraphs, can be expressed using exponential generating functions that account for vertex selections and edge preservations, facilitating asymptotic analysis in random graph models. This approach, rooted in algebraic combinatorics, avoids exhaustive computation by leveraging symmetries and inclusion principles, though exact closed forms remain challenging for general graphs.[29]
In Computer Science and Beyond
In graph algorithms, induced subgraphs play a central role in optimization problems such as the maximum independent set (MIS), where the goal is to find the largest induced subgraph with no edges between its vertices. Branch-and-bound algorithms address this NP-hard problem by systematically exploring the search space, branching on vertices to include or exclude them while pruning branches using upper bounds on the independent set size. A key technique involves reduction rules, such as removing degree-0 or degree-1 vertices and applying domination or folding operations, to simplify the graph before branching. For graphs with average degree at most 3, these methods achieve a worst-case time complexity of O^*(1.0821^n). In general graphs, the best known algorithms run in O^*(1.1996^n) time, enabling practical solutions for moderately sized instances in scheduling and resource allocation.[30][31]
In network analysis, detecting communities in social networks often relies on identifying dense induced subgraphs, where vertices represent users and edges denote interactions, capturing tightly knit groups with high internal connectivity. Algorithms for dense subgraph extraction adapt sparse matrix techniques, such as column similarity measures, to perform partial clustering without presupposing the number of communities, allowing some nodes to remain unassigned. This method efficiently identifies overlapping communities in large-scale social graphs, as demonstrated on real-world datasets where it reveals structural patterns like interest-based groups. By focusing on induced density, these approaches provide interpretable insights into information flow and influence propagation, outperforming global clustering in handling sparse, heterogeneous networks.[32]
In bioinformatics, induced subgraphs serve as network motifs in protein-protein interaction (PPI) graphs, representing recurring, functionally significant patterns such as triangles or feed-forward loops that are over-represented compared to random expectations. For instance, the human interactome exhibits approximately 194 times more triangles than in yeast PPI networks and about 3 times more than predicted by null models, highlighting evolutionary conservation in signaling complexes. Methods for counting these motifs account for noise and incompleteness in PPI data through unbiased estimators based on uniform node sampling, enabling accurate quantification in large-scale human networks derived from high-throughput experiments. Conserved induced paths, as specific linear motifs, aid in identifying regulatory cascades, such as those in transcription factor networks where feed-forward loops outnumber feedback loops by a ratio of ~10:1, informing drug target discovery and pathway analysis.[33]
In machine learning, induced subgraph embeddings enhance graph neural networks (GNNs) by extracting topological features from local induced subgraphs, improving tasks like node classification and link prediction on complex graphs. Subgraph Neural Networks (SubGNNs) introduce a routing mechanism that propagates messages across subgraph components via position, neighborhood, and structure channels, disentangling representations to capture intricate patterns even in disconnected or multi-component subgraphs. This inductive approach allows generalization to unseen regions, achieving up to 125.2% improvement over baselines on real-world biomedical datasets for rare disease diagnosis. By embedding induced subgraphs into low-dimensional spaces, these models facilitate scalable feature extraction in applications like social profiling and molecular property prediction.[34]