Fact-checked by Grok 2 weeks ago

Pumping lemma for regular languages

The is a fundamental in theory that characterizes an essential property of , providing a necessary condition for membership in this class. It asserts that for any L, there exists a positive p (known as the pumping length) such that every s \in L with |s| \geq p can be decomposed as s = xyz, where |xy| \leq p, |y| \geq 1, and for all i \geq 0, the xy^i z \in L. Introduced by and in their 1959 paper on finite automata, the lemma emerged as part of early work establishing the equivalence between regular languages and those recognized by finite automata. The proof relies on the : given a deterministic finite automaton (DFA) with p states recognizing L, any input string of length at least p must cause a state repetition during processing, creating a repeatable substring y (the "pumpable" part) corresponding to a in the automaton's state graph. This structure ensures that inserting or removing copies of y keeps the string accepted by the DFA, as the computation loops through the same states. The pumping lemma's primary application is in proving that a given is not regular via : assume regularity, select a sufficiently long string in the language, and demonstrate that no satisfies the pumping without producing strings outside the language. Common examples include showing that languages like \{ a^n b^n \mid n \geq 0 \} or \{ 0^n 1^n \mid n \geq 0 \} fail the , as pumping disrupts the required balance between symbols. While it does not provide a sufficient for regularity (some non-regular languages accidentally satisfy it), the remains a for distinguishing regular from non-regular languages in .

The Lemma

Formal Statement

Let L be a regular language. Then there exists a positive p \geq 1 (called the pumping length) such that, for every w \in L with |w| \geq p, w can be divided as w = xyz satisfying the following conditions: |xy| \leq p, |y| \geq 1, and for all k \geq 0, xy^k z \in L. In this formulation, x, y, and z are over the of L such that x and z may be empty, but y is nonempty to ensure the "pumping" operation alters the length. The condition |xy| \leq p guarantees that the decomposable portion begins within the first p symbols of w, reflecting the bounded memory of finite automata recognizing regular languages. This lemma was first proved by and in their seminal 1959 paper on finite automata, establishing a key property distinguishing regular languages from non-regular ones. A similar result for regular languages was independently obtained by Yehoshua Bar-Hillel, Micha A. Perles, and Eli Shamir in 1961, though their work primarily advanced pumping lemmas for context-free languages. The pumping lemma arises from the finite state structure of regular languages: since a (DFA) accepting L has finitely many s (at most p), any sufficiently long input string must revisit a within the first p symbols, allowing the loop (corresponding to y) to be repeated arbitrarily.

General Version

The general version of the pumping lemma for regular languages offers a strengthened formulation that allows for the decomposition of sufficiently long strings into five parts, enabling simultaneous pumping of two potentially non-contiguous substrings while preserving membership in the language. Specifically, for every regular language L, there exists a positive integer p (the pumping length) such that for every string w \in L with |w| \geq p, w can be expressed as w = u v x y z where |v x y| \leq p, |v y| \geq 1, and for all integers k \geq 0, the string u v^k x y^k z \in L. This variant differs from the basic version, which decomposes w = x y z with |x y| \leq p and |y| \geq 1, allowing pumping only of the single substring y to yield x y^k z \in L for k \geq 0. In the general version, the non-empty pumped portion is distributed across v and y (with at least one non-empty), separated by x, providing greater flexibility in identifying repeatable segments within the bounded region of length at most p. This dual-pumping structure imposes a stricter condition on the language but remains satisfied by all regular languages due to their underlying finite structure. The term "general" arises because this form generalizes the basic pumping by permitting the repeatable content to be split into two parts, which can simplify certain analyses or proofs by accommodating decompositions where the cycle in a DFA might correspond to disjoint segments. It is particularly useful in scenarios requiring more nuanced string manipulations while still capturing the essential repeatable property of regular languages. Although stricter, the general version implies the basic one, as every satisfies the general condition. Like the basic version, however, it provides only a necessary condition for a to be , not a sufficient one.

Applications

Proving Non-Regularity

The provides a powerful tool for proving that a given is not through . To apply it, assume that the L is . Then, there exists a pumping p such that for any string w \in L with |w| \geq p, w can be decomposed as w = xyz where |xy| \leq p, |y| \geq 1, and for all integers k \geq 0, xy^k z \in L. To derive a , select a specific w \in L with |w| \geq p that forces the pumped strings xy^k z to exit L for some k, regardless of how the decomposition is chosen (as the guarantees the existence of at least one such decomposition if L were , but the proof must cover all possible valid decompositions to show none satisfy the condition). This method exploits the finite nature of languages, where long strings must repeat patterns in a way that preserves membership under pumping, but certain languages resist this repetition without violating their defining properties. A classic example is the language L = \{ a^n b^n \mid n \geq 0 \}. Assume L is regular, so it has a pumping length p. Choose w = a^p b^p \in L, with |w| = 2p \geq p. Any valid decomposition w = xyz must satisfy |xy| \leq p, so x and y consist solely of a's (since the first p symbols are a's), and |y| \geq 1. Thus, y = a^{|y|} for some $1 \leq |y| \leq p, z = a^{p - |x| - |y|} b^p. Pumping with k=2 yields xy^2 z = a^{p + |y|} b^p. Since |y| \geq 1, this string has more a's than b's, so xy^2 z \notin L. This holds for any such decomposition, contradicting the pumping lemma. Therefore, L is not regular. Another illustrative example is the language L = \{ a^n b^m \mid m \geq n \}. Assume L is regular with pumping length p. Select w = a^p b^p \in L, so |w| = 2p \geq p. In any decomposition w = xyz with |xy| \leq p and |y| \geq 1, y must be a non-empty substring of the a's, hence y = a^{|y|}. Pumping up with k=2 gives xy^2 z = a^{p + |y|} b^p, where the number of a's (p + |y|) now exceeds the number of b's (p), violating m \geq n. Thus, xy^2 z \notin L for any valid decomposition, yielding a contradiction and proving L is not regular. When applying this technique, care must be taken to choose w strategically to constrain the position of y within a "critical" region of the string, ensuring that pumping disrupts the language's structure no matter the exact split (as the choice of is adversarial and not under the prover's control). A frequent error is failing to consider all possible or incorrectly assuming y spans multiple types when |xy| \leq p limits it to the ; another is selecting a w where some might keep pumped strings in L, invalidating the . The can only establish non-regularity—it cannot prove a is , as non- languages may accidentally satisfy the condition.

Failure of the Converse

The converse of the pumping lemma for asserts that if a satisfies the pumping condition, then it is . This assertion is false, as there exist that satisfy the condition but are not . A is the L = \{ uu^R v \mid u, v \in \{a, b\}^+ \}, where u^R is the reverse of u. This is not , as can be shown using the Myhill-Nerode theorem, which characterizes by the finiteness of equivalence classes in the right-invariant relation on strings; here, there are infinitely many distinguishable equivalence classes (e.g., strings of the form ab^{2n+1}a and ab^{2m+1}a for n \neq m are pairwise inequivalent). Nevertheless, L satisfies the pumping condition of the lemma for regular languages. For pumping length p = 4, any string w \in L with |w| \geq 4 admits a decomposition w = xyz with |xy| \leq 4, |y| > 0, such that xy^i z \in L for all i \geq 0. If |u| = 1, set x = uu^R, y as the first letter of v, z as the rest of v; then |xy| = 3 \leq 4, and pumping y keeps the form uu^R v' with v' extended by repeats of the first letter, remaining in L. If |u| > 1, set x = \epsilon, y as the first letter of u, z as the rest; pumping up repeats the first letter in u, preserving the palindrome property followed by v, and pumping down to i=0 still yields a valid string in L. This demonstrates that the pumping lemma provides only a one-way implication for regularity: satisfaction is necessary but not sufficient, so passing the test cannot prove a regular. The pumping lemma is thus weaker than the Myhill-Nerode theorem, which gives a necessary and sufficient condition for regularity via equivalence classes, allowing full characterization where the pumping lemma offers merely a partial test.

Proof

Proof of the Formal Statement

Assume that the language L is regular. Then there exists a deterministic finite automaton (DFA) M = (Q, \Sigma, \delta, q_0, F) that accepts L, where |Q| = q is the number of states in M. Let the pumping length p be equal to the number of states q in M. Consider any string w \in L such that |w| \geq p. Since M accepts w, there is a sequence of states s_0, s_1, \dots, s_{|w|} where s_0 = q_0 is the start state, s_{|w|} \in F is an accepting state, and s_{i} = \delta(s_{i-1}, w_i) for $1 \leq i \leq |w|, with w = w_1 w_2 \dots w_{|w|}. Now consider the prefix of w consisting of the first p symbols, which induces a sequence of p+1 states: s_0, s_1, \dots, s_p. Since there are p+1 states in this sequence and only p possible states in Q, by the , there must exist two indices i and j such that $0 \leq i < j \leq p and s_i = s_j. Decompose w as w = x y z, where x is the prefix of w of length i (possibly empty), y is the substring of w from position i+1 to j (so |y| = j - i \geq 1), and z is the suffix of w after position j. Thus, |x y| = j \leq p and |y| \geq 1. To verify that x y^k z \in L for all integers k \geq 0, consider the computation of M on x y^k z. After reading x, M reaches state s_i. Then, reading the first copy of y takes M from s_i to s_j = s_i. For each additional copy of y (in k \geq 2), M traverses the same transition loop from s_i back to s_i. After reading all k copies of y, M is again in state s_i = s_j. Finally, reading z from s_j leads to the accepting state s_{|w|}, as in the original computation on w. For k=0, reading x z reaches s_i after x, and then z from there (equivalent to skipping y but starting the suffix from s_i = s_j) also leads to the accepting state. Thus, all such pumped strings are accepted by M.

Proof of the General Version

The proof of the general version of the pumping lemma for regular languages proceeds by adapting the argument used for the formal statement, but with a decomposition that allows for potentially two pumpable substrings v and y while ensuring the conditions |vy| \geq 1 and |vxy| \leq p. Let L be a regular language accepted by a DFA M with p states. For any string w = a_1 a_2 \dots a_n \in L where n \geq p, consider the unique computation path in M starting from the initial state q_0, yielding states q_i = \delta(q_{i-1}, a_i) for $1 \leq i \leq n, with q_n an accepting state. Focus on the prefix of w of length p, producing the state sequence q_0, q_1, \dots, q_p. This sequence has p+1 states but only p possible states in M, so by the , there exist indices $0 \leq i < j \leq p such that q_i = q_j. Decompose w as w = u v x y z, where u = a_1 \dots a_i, v = a_{i+1} \dots a_j, x = a_{j+1} \dots a_p, y = \epsilon (the empty string), and z = a_{p+1} \dots a_n. This ensures |vxy| = |v x| = p - i \leq p and |vy| = |v| = j - i \geq 1 (since j > i). To verify the pumping condition, consider any integer k \geq 0. The pumped string is u v^k x y^k z = u v^k x z (since y = \epsilon). The computation reaches state q_i after reading u. Reading v^k then performs the transition loop from q_i to q_j = q_i exactly k times, returning to q_i. From q_i, reading x follows the original path to q_p, and reading z reaches the accepting state q_n. Thus, u v^k x y^k z \in L for all k \geq 0. This construction handles the case where y = \epsilon but v \neq \epsilon, satisfying |vy| \geq 1. Symmetrically, if a repeat is chosen such that the loop aligns later in the prefix (e.g., by selecting i and j to maximize j), one could set v = \epsilon and y \neq \epsilon with x preceding the loop, ensuring the conditions hold without both being empty. The general version is equivalent to the basic version (with decomposition w = xyz, |xy| \leq p, |y| \geq 1, and x y^k z \in L for k \geq 0). The basic implies the general by applying the basic decomposition w = x' y' z' and mapping it to the general form via u = x', v = y', x = \epsilon, y = \epsilon, z = z', preserving |vxy| = |y'| \leq p and |vy| = |y'| \geq 1, with pumping matching exactly. Conversely, the general implies the basic: given a general decomposition, the construction always admits a choice where one of v or y is empty. For the case v \neq \epsilon and y = \epsilon, set the basic x = u, y = v, z = x z; then |x y| = |u v| = j \leq p, and x y^k z = u v^k x z, matching the general pumping. Symmetrically, for v = \epsilon and y \neq \epsilon, set the basic x = u x, y = y, z = z; then |x y| \leq p by the construction, and pumping matches u x y^k z. This equivalence confirms the general version as a strengthened yet equivalent characterization for regular languages.

References

  1. [1]
    None
    ### Formal Statement of the Pumping Lemma for Regular Languages
  2. [2]
    [PDF] The Pumping Lemma for Regular Languages
    Mar 6, 2013 · Sipser Ch 1: p77–82. A regular language can be “pumped,” i.e., any long enough string can be made even longer or have some part chopped off, ...
  3. [3]
    [PDF] The Pumping Lemma - Theorem of the Day
    This lemma was published in 1959 by Michael Rabin and Dana Scott, and, independently in a generalisation to context-free languages, by Yehoshua Bar-Hillel ...
  4. [4]
    [PDF] Formal Statement of the Pumping Lemma - Washington
    Pumping Lemma in Plain English. ✦ Let L be a regular language and let p = “pumping length” = no. of states of a DFA accepting L. ✦ Then, any string s in L ...
  5. [5]
    [PDF] Pumping Lemma Handout Contents 1 Regular Languages
    The formal statement of the pumping lemma follows. Lemma 1 (Pumping Lemma) For any regular language L, there exists a positive integer p, (which is called a ...
  6. [6]
    [PDF] Pumping Lemma
    Pumping Lemma: Regular Languages. If A is a regular language, then there is a pumping length p st if s ∈ A with |s| ≥ p then we can write s = xyz so that.
  7. [7]
    [PDF] Lecture 9: Proving non-regularity
    Feb 17, 2009 · We will see two methods for showing that a language is not regular. The “pumping lemma” shows that certain key “seed” languages are not regular.
  8. [8]
    Non-Regular Languages
    Example 4: As an example to illustrate how Pumping Lemma might be used to prove that a language is nonregular, let us prove that the language L = akbk is ...
  9. [9]
    [PDF] IT 328 Theory of Computation Exercise: Pumping Lemma for ...
    Nov 1, 2022 · (d) The language L = {anbl \. \ n ≤ l} is not regular. By contradiction, assume L is regular. • Let m be sufficiently large, and consider ambm.
  10. [10]
    [PDF] Comments on the Pumping Lemma for Regular Languages
    To prove that a language is not regular, we have to show that no spanning set exists, which means that we have to show that there are infinitely many ...
  11. [11]
    AMS :: Proceedings of the American Mathematical Society
    Linear automaton transformations. HTML articles powered by AMS MathViewer ; by A. Nerode ; Proc. Amer. Math. Soc. 9 ; PDF | Request permission.
  12. [12]
    [PDF] Finite Automata and Their Decision Proble'ms#
    Abstract: Finite automata are considered in this paper as instruments for classifying finite tapes. Each one- tape automaton defines a set of tapes, ...