
Regular language

In theoretical computer science and formal language theory, a regular language is a formal language consisting of a set of strings over a finite alphabet that can be recognized by a finite automaton, equivalently expressed using regular expressions, or generated by a regular grammar. Regular languages were introduced by mathematician Stephen Cole Kleene in his 1951 technical report, published in 1956 as "Representation of Events in Nerve Nets and Finite Automata," where he defined the algebraic structure of regular events as a model for neural networks and sequential machines. The class of regular languages occupies the lowest level (Type-3) in the Chomsky hierarchy of formal grammars, which classifies languages based on the restrictions on production rules; regular languages are strictly contained within context-free languages and higher classes. They possess several closure properties, meaning that if two or more regular languages are subjected to operations like union, concatenation, Kleene star (repetition), complement, intersection, reversal, or homomorphism, the resulting language remains regular. A fundamental characterization is provided by the pumping lemma, which states that for any regular language L, there exists a pumping length p such that every string s in L with length at least p can be divided as s = xyz where |xy| \leq p, |y| > 0, and xy^iz is in L for all nonnegative integers i; this lemma is often used to prove that certain languages are non-regular. Regular languages and their equivalent formalisms, such as deterministic finite automata (DFAs), nondeterministic finite automata (NFAs), and regular expressions, are foundational in computer science, enabling efficient algorithms for recognition and manipulation since finite automata have finite memory and can process inputs in linear time. In practice, regular expressions power applications in text processing, lexical analysis in compilers (e.g., tokenizing source code), pattern matching in search tools, and the built-in string-manipulation features of many programming languages.

Fundamentals

Formal Definition

In formal language theory, an alphabet \Sigma is a finite nonempty set of symbols, such as \Sigma = \{0, 1\}. A string over \Sigma is a finite sequence of symbols from \Sigma, including the empty string \epsilon, which has length zero and contains no symbols. The set of all finite strings over \Sigma, denoted \Sigma^*, is known as the Kleene closure of \Sigma and forms the universal set for languages over \Sigma. A regular language over an alphabet \Sigma is a language L \subseteq \Sigma^* consisting of all strings generated by a regular expression over \Sigma, or equivalently, all strings accepted by a finite automaton over \Sigma. This definition captures the simplest class of formal languages in the Chomsky hierarchy. The formal syntax of regular expressions over \Sigma is defined inductively as follows:
  • The empty set \emptyset is a regular expression denoting the empty language.
  • Any symbol a \in \Sigma is a regular expression denoting the singleton set \{a\}.
  • The empty string \epsilon is a regular expression denoting \{\epsilon\}.
  • If R_1 and R_2 are regular expressions denoting languages L_1 and L_2, then:
    • (R_1 + R_2) (or (R_1 \mid R_2)) is a regular expression denoting L_1 \cup L_2 (union).
    • (R_1 \cdot R_2) is a regular expression denoting L_1 L_2 = \{xy \mid x \in L_1, y \in L_2\} (concatenation).
    • R_1^* is a regular expression denoting the Kleene star L_1^* = \bigcup_{n=0}^\infty L_1^n, where L_1^0 = \{\epsilon\} and L_1^{n+1} = L_1^n L_1 for n \geq 0.
      Parentheses are used to indicate precedence, with star having highest precedence, followed by concatenation, then union.
The concept of regular languages was introduced by Stephen C. Kleene in 1956 as "regular events," defined inductively using union, concatenation, and star operations on sequences of input symbols, with equivalence to recognition by finite automata (nerve nets).
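The inductive definition above can be made concrete by a small interpreter that computes a finite truncation of the language L(r) of a regular expression, namely the strings of length at most n. This is a minimal sketch: the tuple-based expression encoding and the names `lang`, `EMPTY`, and `EPS` are illustrative, not any standard library.

```python
# Denotational interpreter for the inductive definition of regular
# expressions: lang(r, n) returns the strings of L(r) of length <= n.
# Expressions are nested tuples; constructor names are illustrative.

EMPTY = ("empty",)   # denotes the empty language {}
EPS = ("eps",)       # denotes {epsilon}

def lang(r, n):
    """Strings of L(r) with length at most n."""
    kind = r[0]
    if kind == "empty":
        return set()
    if kind == "eps":
        return {""}
    if kind == "sym":                          # single symbol a: denotes {a}
        return {r[1]} if len(r[1]) <= n else set()
    if kind == "union":                        # L1 ∪ L2
        return lang(r[1], n) | lang(r[2], n)
    if kind == "concat":                       # {xy | x in L1, y in L2}
        return {x + y
                for x in lang(r[1], n)
                for y in lang(r[2], n)
                if len(x + y) <= n}
    if kind == "star":                         # L^0 ∪ L^1 ∪ ... up to length n
        base = lang(r[1], n)
        result = {""}
        frontier = {""}
        while frontier:                        # stop once no new strings fit
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    raise ValueError(kind)

# a*b : zero or more a's followed by a single b
r = ("concat", ("star", ("sym", "a")), ("sym", "b"))
print(sorted(lang(r, 3), key=len))   # ['b', 'ab', 'aab']
```

Because every operator only ever combines shorter strings into longer ones, the length bound n makes each case terminate, including the Kleene star.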

Examples

Regular languages can be illustrated through simple sets of strings that can be recognized by finite automata, which maintain a finite amount of memory to track patterns. A classic example is the language L_1 = \{ w \in \{0,1\}^* \mid w ends with 1\}, consisting of all binary strings terminating in the symbol 1, such as 01, 101, and 111. This language is regular because a deterministic finite automaton (DFA) with two states suffices: a start state q_0 (non-accepting, representing the last symbol was 0 or start) and an accepting state q_1 (last symbol was 1), with transitions q_0 \xrightarrow{0} q_0, q_0 \xrightarrow{1} q_1, q_1 \xrightarrow{0} q_0, and q_1 \xrightarrow{1} q_1. Another basic regular language is L_2, the set of all even-length binary strings over \{0,1\}, including the empty string \epsilon (length 0), 00, 01, 10, 11 (length 2), and so on. Recognition requires tracking the parity of the length with a DFA featuring two states: q_e (even length so far, accepting) as the start state and q_o (odd length, non-accepting), with transitions swapping states on each input symbol regardless of 0 or 1. Trivial cases include the empty language \emptyset, which contains no strings and is recognized by a DFA with no accepting states, and the full language \Sigma^* (for alphabet \Sigma = \{0,1\}), encompassing all possible binary strings, accepted by a single-state DFA where the sole state is accepting and loops on all inputs. In contrast, the language L_3 = \{ 0^n 1^n \mid n \geq 0 \}, containing strings such as \epsilon, 01, 0011, and 000111, is not regular because any purported DFA would require unboundedly many states to match the equal number of 0s and 1s, violating the finite memory constraint; this is shown via the pumping lemma for regular languages. These examples can be described using regular expressions, such as (0 + 1)^*1 for L_1 and (00 + 01 + 10 + 11)^* for the even-length strings.
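The two-state DFA for L_1 described above can be written down directly as a transition table; simulation is one table lookup per input symbol. This is a minimal sketch with illustrative state names.

```python
# The two-state DFA for L1 = binary strings ending in 1:
# q0 = last symbol was 0 (or start), q1 = last symbol was 1 (accepting).
DELTA = {("q0", "0"): "q0", ("q0", "1"): "q1",
         ("q1", "0"): "q0", ("q1", "1"): "q1"}
ACCEPTING = {"q1"}

def accepts(w):
    state = "q0"
    for ch in w:
        state = DELTA[(state, ch)]   # one lookup per symbol: linear time
    return state in ACCEPTING

print([w for w in ["01", "101", "111", "10", ""] if accepts(w)])
# ['01', '101', '111']
```

The finite transition table is the automaton's entire "memory": no matter how long the input, only the current state is retained.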
In practice, regular languages underpin pattern matching in text processing, such as validating email addresses with regex patterns that check the local-part@domain structure, ensuring finite-state recognition of structural rules without unbounded memory.

Equivalent Formalisms

Regular Expressions

Regular expressions provide a symbolic notation for specifying regular languages, originally introduced by Stephen Kleene in his 1956 work on the representation of events in nerve nets and finite automata. They consist of atomic expressions combined using operations that correspond to union, concatenation, and Kleene star on languages. The syntax of regular expressions over an alphabet \Sigma is defined recursively as follows: the atomic expressions are the empty set \emptyset, the empty string \varepsilon, and single symbols a \in \Sigma; if r and s are regular expressions, then so are (r + s) (union), (r s) (concatenation), and r^* (Kleene star). Parentheses are used for grouping, and precedence rules apply: Kleene star has highest precedence, followed by concatenation, then union, allowing omission of some parentheses (e.g., a b^* + c means (a (b^*)) + c). The semantics of a regular expression r, denoted L(r), map it to the language it represents, defined recursively: L(\emptyset) = \{\}, L(\varepsilon) = \{\varepsilon\}, L(a) = \{a\} for a \in \Sigma; L(r s) = L(r) L(s) (concatenation of languages); L(r + s) = L(r) \cup L(s); and L(r^*) = \bigcup_{k=0}^\infty L(r)^k, where L(r)^0 = \{\varepsilon\} and L(r)^{k+1} = L(r)^k L(r). For example, the expression a^* b denotes the language of all strings consisting of zero or more a's followed by a single b. Every regular expression can be converted to an equivalent nondeterministic finite automaton (NFA) using Thompson's construction, an algorithm that builds the NFA recursively from subexpressions. The process starts with basic NFAs for atoms and combines them for operations, ensuring each partial NFA has a unique start state i and accept state f:
  • For \emptyset: States i and f with no transitions between them.
  • For \varepsilon: States i and f connected by an \varepsilon-transition.
  • For literal a \in \Sigma: States i and f connected by a transition labeled a.
For union r + s: Create new states i' and f'; add \varepsilon-transitions from i' to the starts of the NFAs for r and s, and from their accepts to f'. For r s: Use the start of r's NFA as overall start and s's accept as overall accept; add an \varepsilon-transition from r's accept to s's start. For star r^*: Create new states i' and f'; add \varepsilon-transitions from i' to r's start and to f', and from r's accept to r's start and to f'. This construction yields an \varepsilon-NFA with O(n) states and transitions, where n is the length of the expression. For example, consider a b^*: First, build the NFA for a (states 0--a-->1); for b^*, build the inner b (2--b-->3), then star it with new states 4--ε-->2, 4--ε-->5, 3--ε-->2, 3--ε-->5; concatenate with an ε-transition from 1 to 2, giving overall start 0 and accept 5. In practice, extensions such as POSIX extended regular expressions (ERE) are used for pattern matching in Unix tools, adding features such as the quantifiers + (one or more) and ? (zero or one), bounded repetition {m,n}, and character classes like [abc] or [[:digit:]]. Some practical extensions go beyond theoretical regular expressions, as they can describe non-regular languages (e.g., backreferences in some implementations), and rely on implementation-specific behaviors such as greedy or leftmost-longest matching.
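Thompson's construction as described above can be sketched directly in code: each combinator returns a fragment with a unique start and accept state, and ε-transitions glue fragments together. The class and function names below are illustrative, and acceptance is checked by the standard ε-closure simulation.

```python
# A sketch of Thompson's construction. Each combinator returns a fragment
# (start, accept); transitions map (state, symbol-or-None) to state sets,
# where None marks an epsilon-transition. Names are illustrative.
from collections import defaultdict

class NFA:
    def __init__(self):
        self.trans = defaultdict(set)   # (state, symbol-or-None) -> set of states
        self.n = 0
    def new_state(self):
        self.n += 1
        return self.n - 1

def literal(nfa, a):                    # atom a: i --a--> f
    i, f = nfa.new_state(), nfa.new_state()
    nfa.trans[(i, a)].add(f)
    return i, f

def concat(nfa, f1, f2):                # epsilon from accept of r to start of s
    nfa.trans[(f1[1], None)].add(f2[0])
    return f1[0], f2[1]

def union(nfa, f1, f2):                 # new start/accept around both fragments
    i, f = nfa.new_state(), nfa.new_state()
    for frag in (f1, f2):
        nfa.trans[(i, None)].add(frag[0])
        nfa.trans[(frag[1], None)].add(f)
    return i, f

def star(nfa, frag):                    # loop back, plus a bypass for epsilon
    i, f = nfa.new_state(), nfa.new_state()
    nfa.trans[(i, None)].update({frag[0], f})
    nfa.trans[(frag[1], None)].update({frag[0], f})
    return i, f

def eps_closure(nfa, states):
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for p in nfa.trans[(q, None)] - seen:
            seen.add(p)
            stack.append(p)
    return seen

def accepts(nfa, frag, w):
    current = eps_closure(nfa, {frag[0]})
    for ch in w:
        moved = set().union(*(nfa.trans[(q, ch)] for q in current)) if current else set()
        current = eps_closure(nfa, moved)
    return frag[1] in current

nfa = NFA()
ab_star = concat(nfa, literal(nfa, "a"), star(nfa, literal(nfa, "b")))  # a b*
print([w for w in ["a", "ab", "abbb", "b", "aab"] if accepts(nfa, ab_star, w)])
# ['a', 'ab', 'abbb']
```

Each combinator adds a constant number of states and transitions, which is why the overall construction is linear in the length of the expression.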

Finite Automata

Finite automata serve as fundamental computational models for recognizing regular languages, processing input strings sequentially through a finite number of states based on transitions determined by the symbols. These machines operate without memory beyond their current state, making them suitable for recognizing strings in linear time relative to the input length. The two primary variants are deterministic finite automata (DFAs), which have unique transitions for each input symbol from any state, and nondeterministic finite automata (NFAs), which allow multiple possible transitions or even spontaneous moves without consuming input. A DFA is formally defined as a 5-tuple (Q, \Sigma, \delta, q_0, F), where Q is a finite set of states, \Sigma is a finite input alphabet, \delta: Q \times \Sigma \to Q is the transition function that specifies a unique next state for each state-symbol pair, q_0 \in Q is the initial state, and F \subseteq Q is the set of accepting (or final) states. To accept a string w \in \Sigma^*, the DFA begins in q_0 and applies \delta successively for each symbol in w; the string is accepted if the resulting state is in F, otherwise rejected. This deterministic behavior ensures exactly one computation path per input, facilitating efficient simulation. An NFA extends the DFA model to a 5-tuple (Q, \Sigma, \delta, q_0, F), where the transition function is \delta: Q \times (\Sigma \cup \{\epsilon\}) \to 2^Q, allowing a set of possible next states (including the empty set) for each state and input symbol or the empty string \epsilon. Acceptance occurs if there exists at least one computation path from q_0 to a state in F that consumes the entire input string, accounting for \epsilon-transitions that enable moves without reading symbols.
Despite nondeterminism, NFAs recognize exactly the same class of languages as DFAs, as any NFA can be converted to an equivalent DFA via the subset construction, which treats subsets of Q as new states and simulates all possible NFA paths in parallel. In this construction, the DFA's states are elements of 2^Q, the initial state is \{q_0\} (or its \epsilon-closure), accepting states are those subsets intersecting F, and transitions are defined by applying \delta to all states in a subset and taking the union. To optimize DFAs by reducing the number of states while preserving the recognized language, minimization algorithms merge states into equivalence classes based on distinguishability by future inputs. Hopcroft's algorithm achieves this in O(|Q| \log |Q|) time (for a fixed alphabet) by iteratively refining a partition of the state set, improving upon earlier O(|Q|^2) methods. This makes it practical for large automata in applications such as compiler design and text processing. As an illustrative example, consider an NFA recognizing binary strings over \Sigma = \{0, 1\} with an even number of 1's. The NFA has states Q = \{q_e, q_o\}, where q_e (initial and accepting) tracks even parity and q_o tracks odd parity, with transitions: \delta(q_e, 0) = \{q_e\}, \delta(q_e, 1) = \{q_o\}, \delta(q_o, 0) = \{q_o\}, \delta(q_o, 1) = \{q_e\}, and no \epsilon-transitions. For input "101", the computation visits q_e \to q_o \to q_o \to q_e and ends in the accepting state q_e, since "101" contains two 1's; an input with an odd number of 1's, such as "10", ends in q_o and is rejected. Applying the subset construction, the candidate DFA states are \{q_e\}, \{q_o\}, and \emptyset, but \emptyset is unreachable; transitions mirror the NFA since each step yields a singleton, resulting in an identical two-state DFA whose states are interpreted as even and odd parity. NFAs can also be constructed systematically from regular expressions using methods like Thompson's construction.
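The subset construction described above can be sketched as a worklist algorithm that only materializes reachable subsets. Applied to the parity example (which has no ε-transitions, so the ε-closure step is omitted), it confirms that only the two singleton subsets are reachable. Names are illustrative.

```python
# Subset construction for the even-parity NFA from the example.
from itertools import chain

NFA_DELTA = {("qe", "0"): {"qe"}, ("qe", "1"): {"qo"},
             ("qo", "0"): {"qo"}, ("qo", "1"): {"qe"}}
NFA_START, NFA_ACCEPT = "qe", {"qe"}
SIGMA = ["0", "1"]

def subset_construction():
    start = frozenset({NFA_START})
    dfa_delta, accepting = {}, set()
    worklist, seen = [start], {start}
    while worklist:
        S = worklist.pop()
        if S & NFA_ACCEPT:                  # subsets intersecting F accept
            accepting.add(S)
        for a in SIGMA:
            T = frozenset(chain.from_iterable(
                NFA_DELTA.get((q, a), ()) for q in S))
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                worklist.append(T)
    return start, dfa_delta, accepting

start, delta, accepting = subset_construction()
states = {s for (s, _) in delta}
print(len(states))   # 2 reachable subset states: {qe} and {qo}
```

The empty subset never appears, matching the observation in the text that \emptyset is unreachable for this automaton.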

Regular Grammars

A regular grammar, also known as a type-3 grammar in the Chomsky hierarchy, is a formal grammar that generates precisely the regular languages through restrictions on its production rules that enforce linearity in one direction. These grammars are classified as either right-linear or left-linear. In a right-linear grammar, all productions are of the form A \to aB or A \to a, where A and B are nonterminal symbols, a is a terminal symbol from the alphabet \Sigma, and there is a distinguished start symbol S. Left-linear grammars mirror this structure with productions A \to Ba or A \to a. A grammar is regular if it is either right-linear or left-linear but not a mixture of both forms. The language L(G) generated by a regular grammar G consists of all finite strings of terminals that can be obtained from the start symbol S via a derivation, which is a finite sequence of substitutions applying the production rules. In each derivation step, a nonterminal is replaced by the right-hand side of a matching production, proceeding until only terminals remain; for regular grammars, derivations are inherently linear due to the single nonterminal allowed on the right-hand side. This generative process enumerates the language's strings without ambiguity in structure, though multiple derivations may yield the same string in nondeterministic cases. Regular grammars are equivalent to nondeterministic finite automata (NFAs), with constructive algorithms establishing bidirectional conversions. To convert a right-linear regular grammar to an NFA, map each nonterminal to a state, designate S as the start state, add transitions A \xrightarrow{a} B for each production A \to aB, introduce transitions A \xrightarrow{a} F to a new accepting state F for each production A \to a, and mark a state A as accepting if there is a production A \to \epsilon. The reverse conversion from an NFA to a regular grammar assigns nonterminals to states, with the start state as S, and generates productions A \to aB from transitions A \xrightarrow{a} B and A \to a from transitions into accepting states.
These algorithms preserve the language exactly, confirming the equivalence. As an illustrative example, consider the regular language L = \{ a^n b^m \mid n, m \geq 0 \}, which matches the regular expression a^* b^*. A corresponding right-linear grammar is G = (\{S, B\}, \{a, b\}, P, S), with productions S \to aS \mid B, B \to bB \mid \epsilon. A derivation for the string a^2 b^3 (i.e., "aabbb") proceeds as follows:
S \Rightarrow aS \Rightarrow aaS \Rightarrow aaB \Rightarrow aabB \Rightarrow aabbB \Rightarrow aabbbB \Rightarrow aabbb.
The derivation tree forms a right-skewed spine: the root S branches to a and another S (twice for the a's), then to B, which branches to b and B (thrice), terminating in \epsilon. This tree visually captures the sequential generation, with leaves yielding the terminal string.
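The grammar-to-automaton correspondence above can be sketched by simulating the grammar G directly as an NFA whose states are the nonterminals. This sketch also allows the unit production S \to B and the empty production B \to \epsilon used in the example, treating them as \epsilon-transitions and accepting states respectively; the encoding and names are illustrative.

```python
# Simulate the right-linear grammar G: S -> aS | B ; B -> bB | epsilon
# as an NFA over nonterminals. (symbol, nonterminal) encodes A -> aB,
# (None, B) encodes the unit production A -> B, and (None, None) encodes
# A -> epsilon. Names are illustrative.

PRODUCTIONS = {
    "S": [("a", "S"), (None, "B")],
    "B": [("b", "B"), (None, None)],
}

def grammar_accepts(w, start="S"):
    def eps_close(states):
        seen, stack = set(states), list(states)
        while stack:
            A = stack.pop()
            for sym, B in PRODUCTIONS[A]:
                if sym is None and B is not None and B not in seen:
                    seen.add(B)
                    stack.append(B)
        return seen
    current = eps_close({start})
    for ch in w:                       # a derivation step consumes one terminal
        nxt = {B for A in current for sym, B in PRODUCTIONS[A]
               if sym == ch and B is not None}
        current = eps_close(nxt)
    # accept if some reachable nonterminal can derive epsilon
    return any((None, None) in PRODUCTIONS[A] for A in current)

print([w for w in ["aabbb", "ba", "", "aaa", "bb"] if grammar_accepts(w)])
# ['aabbb', '', 'aaa', 'bb']
```

Each step of the simulation mirrors one derivation step: the set of live nonterminals plays the role of the NFA's current state set.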

Properties

Closure Properties

Regular languages possess notable closure properties under several fundamental operations on formal languages. These properties underscore the robustness of the class, as the result of applying such an operation to one or more regular languages is always regular. Proofs of closure typically rely on constructive methods using finite automata, leveraging the equivalence between regular languages and the languages accepted by nondeterministic finite automata (NFAs) or deterministic finite automata (DFAs). The class of regular languages is closed under union. Given regular languages L_1 and L_2 accepted by NFAs M_1 = (Q_1, \Sigma, \delta_1, s_1, F_1) and M_2 = (Q_2, \Sigma, \delta_2, s_2, F_2), construct an NFA M = (Q_1 \cup Q_2 \cup \{s\}, \Sigma, \delta, s, F_1 \cup F_2) with a new start state s and \varepsilon-transitions from s to s_1 and s_2; the transition function \delta extends \delta_1 and \delta_2. This M accepts exactly L_1 \cup L_2, as paths from s reach accepting states via either submachine. Closure under intersection holds via the product automaton construction. For DFAs M_1 and M_2 as above, form the product DFA M = (Q_1 \times Q_2, \Sigma, \delta, (s_1, s_2), F_1 \times F_2), where \delta((p, q), a) = (\delta_1(p, a), \delta_2(q, a)) for states p \in Q_1, q \in Q_2, and symbol a \in \Sigma. A string is accepted by M if and only if it is accepted by both M_1 and M_2, so M accepts L_1 \cap L_2. Since the state space is finite (|Q_1| \times |Q_2|), M is a DFA. Regular languages are closed under complementation. If L is accepted by a DFA M = (Q, \Sigma, \delta, s, F), then the complement \overline{L} = \Sigma^* \setminus L is accepted by the DFA M' = (Q, \Sigma, \delta, s, Q \setminus F), which swaps accepting and non-accepting states. Every string drives M' to a state in Q \setminus F precisely when M reaches a state not in F, ensuring acceptance of \overline{L}. The class is also closed under concatenation.
For regular languages L_1 and L_2 accepted by NFAs M_1 and M_2, construct M by adding \varepsilon-transitions from each state in F_1 to s_2, using states Q_1 \cup Q_2, start s_1, and accept states F_2. A string w = uv with u \in L_1, v \in L_2 reaches an accepting state via the path for u in M_1 followed by \varepsilon to M_2 for v, and no other strings are accepted. Closure under the Kleene star operation L^* = \bigcup_{i=0}^\infty L^i (with L^0 = \{\varepsilon\}) is established using NFAs. For L accepted by NFA M = (Q, \Sigma, \delta, s, F), create M' = (Q \cup \{s', f\}, \Sigma, \delta', s', \{f\}) with new states s' (start) and f (sole accept), \varepsilon-transitions from s' to s and from s' to f, and \varepsilon-transitions from each state in F to both s and f. This allows zero or more repetitions of paths accepting strings in L. Regular languages are closed under reversal, where L^R = \{w^R \mid w \in L\} and w^R reverses w. Given an NFA M for L, construct NFA M^R by reversing all transitions (if \delta(p, a) includes q, add a transition from q to p on a), making the accept states F the start states (introducing a new start state with \varepsilon-transitions to them if a single start is required), and making s the sole accept state. A path labeled w from s to some f \in F in M corresponds to a path labeled w^R from f to s in M^R, hence from start to accept in M^R. The class is closed under homomorphism. A homomorphism h: \Sigma^* \to \Gamma^* maps each symbol to a string in \Gamma^*. If L \subseteq \Sigma^* is regular, accepted by DFA M = (Q, \Sigma, \delta, s, F), then h(L) is accepted by an NFA M_h over \Gamma with the same start and accept states, obtained by replacing each transition \delta(p, a) = q in M with a chain of intermediate states and transitions spelling h(a) from p to q (an \varepsilon-transition when h(a) = \varepsilon). Since |h(a)| is fixed per symbol a and \Sigma is finite, only finitely many states are added, and M_h accepts exactly the images h(w) of strings w \in L.
Closure under inverse homomorphism also holds: for a homomorphism h: \Sigma^* \to \Gamma^* and regular L \subseteq \Gamma^* accepted by DFA N = (P, \Gamma, \gamma, t, G), the inverse image h^{-1}(L) = \{w \in \Sigma^* \mid h(w) \in L\} is accepted by the DFA N_i = (P, \Sigma, \delta_i, t, G) with \delta_i(p, a) = \hat{\gamma}(p, h(a)), i.e., on reading a \in \Sigma the automaton applies \gamma to the symbols of h(a) in sequence; since the state set P is unchanged, the result is a finite automaton, and w is accepted exactly when h(w) drives N to a state in G. Regular languages are closed under the (right) quotient operation when the divisor is regular: if L_1, L_2 are regular, then L_1 / L_2 = \{u \in \Sigma^* \mid \exists v \in L_2, \, uv \in L_1\} is regular. The construction modifies the DFA for L_1 by redefining the accepting states to be those from which some string in L_2 reaches an original accepting state, using the finite state space. Regularity of the dividend is essential, however: the quotient of a non-regular language by a regular one need not be regular. For a counterexample, let K = \{a^n b^n c \mid n \geq 0\} (non-regular) and R = \{c\} (regular); then K / R = \{a^n b^n \mid n \geq 0\}, which is non-regular. The following table summarizes the closure operations and corresponding automaton constructions:
Operation and construction method:
  • Union: new NFA start state with \varepsilon-transitions to the starts of the input NFAs; union of accept sets.
  • Intersection: product DFA with synchronized transitions on symbols; start state is the pair of starts, accept set F_1 \times F_2.
  • Complement: same DFA, but swap accepting and non-accepting states.
  • Concatenation: NFA combining the input NFAs with \varepsilon-transitions from the accept states of the first to the start of the second.
  • Kleene star: augmented NFA with a new start/accept state and \varepsilon-transitions from the accept states back to the start.
  • Reversal: reverse all transitions in the NFA; original accept states become starts (via \varepsilon-transitions if needed), the original start becomes the accept state.
  • Homomorphism: NFA with chains of states replacing each transition, spelling the image h(a).
  • Inverse homomorphism: automaton simulating transitions on the full string h(a) for each input symbol.
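The product construction for intersection is the easiest of these to sketch in code: the two DFAs run in lockstep, and a string is accepted only if both component states are accepting. Below, L_1 is "ends in 1" and L_2 is "even length"; state and function names are illustrative.

```python
# Product-DFA construction for intersection: run two DFAs in lockstep.
D1 = {("p0", "0"): "p0", ("p0", "1"): "p1",
      ("p1", "0"): "p0", ("p1", "1"): "p1"}   # accept p1: ends in 1
D2 = {("e", "0"): "o", ("e", "1"): "o",
      ("o", "0"): "e", ("o", "1"): "e"}       # accept e: even length

def product_accepts(w):
    s1, s2 = "p0", "e"                        # pair of start states
    for ch in w:
        s1, s2 = D1[(s1, ch)], D2[(s2, ch)]   # synchronized transitions
    return s1 == "p1" and s2 == "e"           # accept set F1 x F2

print([w for w in ["01", "1", "0011", "11", "10"] if product_accepts(w)])
# ['01', '0011', '11']  -- even-length strings ending in 1
```

The pair (s1, s2) is exactly a state of the product automaton Q_1 \times Q_2; materializing all pairs up front would give the explicit product DFA.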

Decidability Properties

Regular languages enjoy strong decidability properties, allowing algorithms to resolve key questions about them efficiently. Unlike higher classes in the Chomsky hierarchy, all standard decision problems for regular languages—when presented via finite automata, regular expressions, or regular grammars—are decidable, and for DFA representations they run in time polynomial in the input size. The membership problem, determining whether a given string w \in \Sigma^* belongs to a regular language L described by a deterministic finite automaton (DFA) M = (Q, \Sigma, \delta, q_0, F), is decidable by simulating \delta on w starting from q_0. If the final state is in F, then w \in L. This simulation examines one transition per symbol in w, yielding a running time of O(|w|). Emptiness, checking if L(M) = \emptyset for a DFA M, is decidable via graph reachability: model the DFA as a directed graph with states as nodes and transitions as edges, then use breadth-first search (BFS) from q_0 to detect whether any state in F is reachable. Since the graph has |Q| nodes and at most |Q| \cdot |\Sigma| edges, BFS runs in O(|Q| \cdot |\Sigma|) time. For nondeterministic finite automata (NFAs), emptiness can be decided directly by the same reachability check, without an exponential conversion to a DFA. Finiteness, determining if |L(M)| < \infty, is decidable by analyzing the DFA's structure for cycles that contribute to infinite languages. Specifically, compute the reachable states from q_0, then check for cycles within the subgraph of states from which an accepting state is reachable; the language is infinite if and only if such a cycle exists. This involves graph traversal algorithms like depth-first search for cycle detection, again in O(|Q| \cdot |\Sigma|) time. Equivalence between two regular languages given by DFAs M_1 and M_2 is decidable by minimizing both automata and verifying isomorphism of the results, or by constructing DFAs for L(M_1) \triangle L(M_2) (the symmetric difference) and testing emptiness.
Minimization partitions states into equivalence classes using a table-filling or partition refinement algorithm, with Hopcroft's algorithm achieving O(|Q| \log |Q| \cdot |\Sigma|) time, where |Q| is the total number of states across both DFAs. The Myhill-Nerode theorem underpins these results by characterizing regular languages via the right-congruence relation x \equiv_L y if xz \in L iff yz \in L for all z \in \Sigma^*; L is regular if and only if the number of equivalence classes (the index) is finite, equaling the size of the minimal DFA. This equivalence relation directly informs DFA minimization and equivalence testing by identifying indistinguishable states. In contrast to regular languages, non-regular classes like context-sensitive languages exhibit undecidability for problems such as equivalence; for instance, while membership in \{a^n b^n c^n \mid n \geq 0\} is decidable, deciding equivalence among context-sensitive languages in general is not.
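The emptiness and finiteness checks described above can be sketched with plain graph traversal over a DFA given as a transition dictionary. This is an illustrative sketch optimized for clarity, not speed (it rescans the edge list on each step).

```python
# Emptiness via reachability, finiteness via cycle detection on the
# "useful" subgraph (states on some start-to-accept path). Illustrative.
from collections import deque

def reachable(delta, start):
    seen, queue = {start}, deque([start])
    while queue:
        q = queue.popleft()
        for (p, _a), r in delta.items():
            if p == q and r not in seen:
                seen.add(r)
                queue.append(r)
    return seen

def is_empty(delta, start, accepting):
    # L(M) is empty iff no accepting state is reachable from the start
    return not (reachable(delta, start) & accepting)

def is_finite(delta, start, accepting):
    # useful states: reachable from start AND co-reachable to an accept state
    useful = {q for q in reachable(delta, start)
              if reachable(delta, q) & accepting}
    def has_cycle(q, path):
        for (p, _a), r in delta.items():
            if p == q and r in useful:
                if r in path or has_cycle(r, path | {r}):
                    return True
        return False
    # the language is infinite iff the useful subgraph contains a cycle
    return not any(has_cycle(q, {q}) for q in useful)

# DFA for binary strings ending in 1: nonempty and infinite
delta = {("q0", "0"): "q0", ("q0", "1"): "q1",
         ("q1", "0"): "q0", ("q1", "1"): "q1"}
print(is_empty(delta, "q0", {"q1"}), is_finite(delta, "q0", {"q1"}))
# False False
```

Restricting the cycle search to useful states is what makes the test correct: a cycle among unreachable states, or states that cannot reach an accept state, contributes nothing to the language.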

Complexity Results

The conversion from a nondeterministic finite automaton (NFA) with |Q| states to an equivalent deterministic finite automaton (DFA) via the subset construction algorithm can result in an exponential blowup in the number of states, reaching up to 2^{|Q|} in the worst case. This worst-case bound is tight, as there exist NFAs requiring a DFA with exactly 2^{|Q|} states for equivalence. Minimization of a DFA with n states and alphabet size k can be performed in O(n k \log n) time using Hopcroft's algorithm, which refines an initial partition of states based on distinguishability; no asymptotically faster minimization algorithm is known for general alphabets. The construction of an NFA from a regular expression of length m using Thompson's construction proceeds in linear time, O(m), by inductively building sub-NFAs for basic symbols and combining them via epsilon transitions for union, concatenation, and Kleene star operations. This efficiency stems from the epsilon-transition rules, which add a constant number of states and transitions per operator without requiring determinization. The emptiness problem for DFAs—determining whether the language is empty—is solvable in linear time, O(n), by checking reachability from the initial state to any accepting state via graph traversal. Equivalence of two DFAs with m and n states can be decided in polynomial time, specifically O(mn), by constructing the product automaton and testing emptiness of the symmetric difference. For NFAs, emptiness is NL-complete, reducible to graph reachability, while universality (whether the language equals \Sigma^*) is PSPACE-complete. State complexity measures the minimal number of states required in a DFA recognizing the result of an operation on input DFAs. For union of languages recognized by an m-state DFA and an n-state DFA, up to mn states may be necessary and sufficient in the worst case, achieved via the product construction with accepting set (F_1 \times Q_2) \cup (Q_1 \times F_2).
Lower bounds demonstrate that this bound is tight for certain languages over binary alphabets. Similarly, intersection requires up to mn states, using the standard product automaton. Reversal of an n-state DFA language has state complexity up to 2^n, as the reverse may require determinization of an equivalent NFA; tight examples exist over suitable alphabets. The following table summarizes the time and state complexities for key operations on regular languages represented by DFAs or NFAs:
Operation, time complexity, and worst-case state complexity:
  • NFA to DFA (subset construction): time O(2^{|Q|}); up to 2^{|Q|} states.
  • DFA minimization (Hopcroft): time O(n \log n \cdot |\Sigma|); n states (already minimal).
  • RegExp to NFA (Thompson): time O(m); O(m) states.
  • DFA emptiness: time O(n); state complexity not applicable.
  • DFA equivalence: time O(mn \cdot |\Sigma|); not applicable.
  • NFA emptiness: NL-complete; not applicable.
  • Union (DFAs with m and n states): time O(mn \cdot |\Sigma|); up to mn states.
  • Intersection (DFAs): time O(mn \cdot |\Sigma|); up to mn states.
  • Reversal (DFA to minimal DFA): time O(2^n \cdot |\Sigma|); up to 2^n states.
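The exponential blowup bound above can be demonstrated on the classic witness family: the NFA that guesses whether the k-th symbol from the end is 1 has k+1 states, yet the subset construction reaches 2^k subset states, one for each possible suffix of the last k symbols. The code is an illustrative sketch.

```python
# Witness for the 2^n subset-construction blowup: the NFA below accepts
# binary strings whose k-th symbol from the end is 1, using k+1 states.
from itertools import chain

def nfa_kth_from_end(k):
    # state 0 loops on both symbols and nondeterministically guesses that
    # the current 1 begins the final k-symbol suffix; states 1..k count down
    delta = {(0, "0"): {0}, (0, "1"): {0, 1}}
    for i in range(1, k):
        delta[(i, "0")] = {i + 1}
        delta[(i, "1")] = {i + 1}
    return delta, 0, {k}          # (transitions, start, accepting)

def reachable_subsets(delta, start):
    start_set = frozenset({start})
    seen, work = {start_set}, [start_set]
    while work:
        S = work.pop()
        for a in "01":
            T = frozenset(chain.from_iterable(
                delta.get((q, a), ()) for q in S))
            if T not in seen:
                seen.add(T)
                work.append(T)
    return seen

for k in (2, 3, 4):
    delta, start, _ = nfa_kth_from_end(k)
    print(k, len(reachable_subsets(delta, start)))
# 2 4
# 3 8
# 4 16
```

Each reachable subset contains state 0 plus a record of which of the last k symbols were 1, so there are exactly 2^k of them, and the minimal DFA is just as large.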

Theoretical Position

Chomsky Hierarchy

The Chomsky hierarchy classifies formal grammars according to their generative power and the corresponding language classes they define, forming a nested sequence where each level includes all languages from lower levels but adds more expressive capabilities. Proposed by Noam Chomsky, the hierarchy consists of four types: Type 0 encompasses unrestricted grammars generating recursively enumerable languages; Type 1 includes context-sensitive grammars producing context-sensitive languages; Type 2 covers context-free grammars yielding context-free languages; and Type 3 comprises regular grammars that generate regular languages. This structure highlights a progression from simple, finite-memory models to those capable of handling arbitrary computation. Regular languages represent the base level, Type 3, in this hierarchy. They are generated by regular grammars, which are linear grammars restricted to right-linear or left-linear productions of the form A \to aB or A \to a (where A and B are nonterminals and a is a terminal). Every regular language is also context-free, as regular grammars form a proper subclass of context-free grammars, ensuring the inclusion \text{regular} \subseteq \text{context-free}. Regular grammars thus serve as the generative mechanism for Type 3. The hierarchy's levels are strictly increasing in expressive power, meaning there are context-free languages that are not regular. A canonical example is the language L = \{ a^n b^n \mid n \geq 0 \}, which is context-free—generated by the grammar with productions S \to a S b \mid \epsilon—but not regular, as no finite automaton can accept it.
The intuition arises from the pumping lemma for regular languages: any sufficiently long string in a regular language can be decomposed as z = uvw where |uv| \leq p (for some pumping length p), |v| > 0, and uv^k w \in L for all k \geq 0; however, applying this to strings in L (e.g., a^n b^n with n > p) forces the pumpable v to either imbalance the counts of a's and b's or introduce mismatched symbols, violating membership in L. This demonstrates the limitations of finite-state recognition for matching nested structures. In terms of automata, regular languages are recognized by finite automata, which operate with finite memory and correspond to Type 3. This places them below context-free languages, accepted by pushdown automata that incorporate an unbounded stack for additional memory, and far below the full recursively enumerable languages recognized by Turing machines at Type 0. The hierarchy thus positions regular languages as the simplest class with decidable membership via bounded resources.

Characterization Theorems

The pumping lemma for regular languages provides a necessary condition for a language to be regular. It states that if L is a regular language, then there exists a positive integer p (the pumping length) such that for every w \in L with |w| \geq p, w can be divided as w = xyz where |xy| \leq p, |y| \geq 1, and for all i \geq 0, xy^i z \in L. This lemma arises from the finite-state nature of recognizers for regular languages; a proof sketch relies on a deterministic finite automaton (DFA) accepting L. For a string w of length at least the number of DFA states, the path through the DFA must revisit a state by the pigeonhole principle, creating a loop that corresponds to the pumpable substring y. A key application of the pumping lemma is to prove non-regularity by contradiction. Consider the language L = \{0^n 1^n \mid n \geq 0\}. Assume L is regular with pumping length p. Take w = 0^p 1^p \in L, so |w| = 2p \geq p. Then w = xyz with |xy| \leq p and |y| \geq 1. Since |xy| \leq p, y consists only of 0s. Pumping with i = 2 yields xy^2 z = 0^{p + |y|} 1^p, which has more 0s than 1s and thus \notin L, contradicting the lemma. Hence, L is not regular. The Myhill-Nerode theorem offers a precise characterization of regular languages in terms of equivalence relations on strings. Define the relation \sim_L on \Sigma^* by x \sim_L y if and only if for all z \in \Sigma^*, xz \in L \leftrightarrow yz \in L. The theorem states that L is regular if and only if \sim_L has finitely many equivalence classes, and the number of such classes (the index of \sim_L) equals the number of states in a minimal DFA recognizing L. This equivalence captures the "distinguishability" of prefixes based on their future extensions, directly linking the language's structure to finite-state minimality. Kleene's theorem establishes the foundational equivalence among formalisms for regular languages: a language is regular if and only if it can be recognized by a finite automaton, generated by a regular grammar, or described by a regular expression. Detailed proofs of these equivalences appear in the section on equivalent formalisms.
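The Myhill-Nerode argument for L = \{0^n 1^n\} can be checked by brute force: the prefixes 0^1, 0^2, \ldots are pairwise distinguishable, since the extension z = 1^i separates 0^i from 0^j, so \sim_L has infinitely many classes. This is an illustrative sketch with a bounded search over extensions.

```python
# Brute-force distinguishability check for the Myhill-Nerode relation
# on L = {0^n 1^n}. Illustrative names; the search bound is a heuristic.
from itertools import product

def in_L(w):
    n = len(w) // 2
    return w == "0" * n + "1" * n

def distinguishable(x, y, max_len=12):
    # search for an extension z with (xz in L) xor (yz in L)
    for k in range(max_len + 1):
        for z in map("".join, product("01", repeat=k)):
            if in_L(x + z) != in_L(y + z):
                return True
    return False

prefixes = ["0" * i for i in range(1, 5)]
pairs = [(x, y) for x in prefixes for y in prefixes if x < y]
print(all(distinguishable(x, y) for x, y in pairs))  # True
```

Every pair of distinct prefixes 0^i and 0^j lands in different \sim_L classes, so no finite-state machine can keep them apart with finitely many states; a regular language like "ends in 1", by contrast, has only two classes.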

Enumeration

Number of Regular Languages

Over a finite alphabet \Sigma with |\Sigma| = k \geq 1, the set of all regular languages is countably infinite. Each regular language is accepted by a deterministic finite automaton (DFA) with finitely many states, and for each fixed number of states n, there are only finitely many such DFAs (specifically, at most k^{n^2} \cdot 2^n); the countable union over all n thus yields countably many distinct languages. The regular languages can be systematically enumerated by ordering them according to the number of states in their minimal DFA. Let g_k(n) denote the number of distinct regular languages over a k-letter alphabet that are accepted by some DFA with at most n states; equivalently, g_k(n) counts those whose unique minimal DFA (up to isomorphism) has at most n states. Since every DFA with n states accepts a language whose minimal DFA has at most n states, g_k(n) is finite for each n, and g_k(n) = \sum_{m=1}^n f_k(m), where f_k(n) is the number of pairwise non-isomorphic minimal DFAs with exactly n states. The value f_k(n) represents the number of regular languages whose minimal DFA requires precisely n states. For the binary alphabet (k=2), these values form the sequence beginning f_2(1) = 2, f_2(2) = 24, f_2(3) = 1028, f_2(4) = 56014, f_2(5) = 3705306, with subsequent terms growing rapidly (e.g., f_2(6) = 286717796). These counts arise from enumerating the possible transition structures and accepting state sets that yield minimal and accessible automata, canonicalized under state relabeling to eliminate isomorphisms. All regular languages can be generated algorithmically by enumerating minimal DFAs up to isomorphism. One approach involves generating all accessible complete labeled DFAs and applying state minimization, followed by isomorphism testing via canonical labeling or graph invariants; optimizations exploit the symmetry structure of the automata for efficiency up to moderate n (e.g., n \leq 10). Such methods have been implemented to compute the sequences above. For fixed k \geq 2, f_k(n) exhibits super-exponential growth.
Domaratzki et al. (2002) give a lower bound f_k(n) \geq f_1(n) \cdot n \cdot (k-1)^n, where f_1(n) is the unary case (asymptotically on the order of 2^{n-1} n); this bound is loose for small k. Tighter analysis shows that the proportion of minimal automata among accessible labeled complete DFAs approaches a positive constant (approximately 0.853 for k=2), implying that f_k(n) is asymptotically the number of accessible labeled DFAs divided by the number of state relabelings (accounting for isomorphisms). Since a complete DFA with n states over a k-letter alphabet is determined by one of n^{kn} transition functions together with one of 2^n accepting-state sets, f_k(n) \leq n^{kn} \cdot 2^n, so \log f_k(n) = O(kn \log n); the computed values of f_2(n) above are consistent with growth of this order.
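The finiteness of g_k(n) can be checked directly for small cases by brute-force enumeration. The sketch below is a toy version (not the optimized canonical-labeling enumeration described above): it enumerates every complete 2-state DFA over a binary alphabet with start state fixed at 0 and counts distinct accepted languages by their membership signature on short strings. Two DFAs with at most 2 states each are equivalent if and only if they agree on all strings of length less than the sum of their state counts, so a signature up to length 3 suffices.

```python
from itertools import product

def count_languages(n_states, k, max_len):
    """Count distinct languages accepted by complete DFAs with exactly
    n_states states over a k-letter alphabet (start state fixed at 0,
    harmless up to isomorphism).  Languages are compared via their
    membership signature on all strings of length <= max_len."""
    sigs = set()
    # all transition functions delta: states x alphabet -> states
    for delta in product(range(n_states), repeat=n_states * k):
        for accept in range(1 << n_states):  # bitmask of accepting states
            sig = []
            for length in range(max_len + 1):
                for word in product(range(k), repeat=length):
                    q = 0
                    for c in word:
                        q = delta[q * k + c]
                    sig.append((accept >> q) & 1)
            sigs.add(tuple(sig))
    return len(sigs)

# g_2(2) = f_2(1) + f_2(2) = 2 + 24 = 26 distinct languages
print(count_languages(2, 2, 3))  # 26
```

The count 26 matches g_2(2) from the text: the two languages with a 1-state minimal DFA (the empty language and \Sigma^*) plus the 24 languages whose minimal DFA has exactly 2 states.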

Length of Words in Regular Languages

In regular languages, the lengths of words are quantified by the sequence a_n = |\{ w \in L : |w| = n \}|, which counts the number of accepted strings of exact length n. The ordinary generating function associated with this sequence is the formal power series S_L(x) = \sum_{n=0}^\infty a_n x^n. For any regular language L, S_L(x) is a rational function, reflecting the finite-state nature of the recognizing automaton. This rationality arises directly from the automaton's structure via the transfer-matrix method. Given a deterministic finite automaton for L with state set Q, initial state vector \mathbf{u} (a row vector with 1 at the start state and 0 elsewhere), accepting state vector \mathbf{v} (a column vector with 1 at accepting states and 0 elsewhere), and transition matrix T (the adjacency matrix where T_{i,j} is the number of symbols transitioning from state i to state j), the generating function is S_L(x) = \mathbf{u} (I - x T)^{-1} \mathbf{v}. The inverse (I - x T)^{-1} expands as a Neumann series summing paths of length n weighted by x^n, yielding the coefficients a_n = \mathbf{u} T^n \mathbf{v}. The sequence \{a_n\} satisfies a linear homogeneous recurrence whose characteristic polynomial is the reciprocal of the denominator of S_L(x) in lowest terms, with order at most the number of states. Asymptotically, a_n grows like c \rho^n for some constant c > 0 and growth rate \rho \geq 1 (possibly up to periodic fluctuations), where \rho is the largest (Perron-Frobenius) eigenvalue of T; more precisely, partial fraction expansion gives a_n = \sum_k p_k(n) \alpha_k^{-n}, where the \alpha_k are the poles of S_L(x), the p_k are polynomials, and the dominant term comes from the pole closest to the origin (at distance 1/\rho). For example, the regular language L = (ab)^* over alphabet \{a, b\} has automaton states q_0 (initial and accepting) and q_1, with transitions q_0 \xrightarrow{a} q_1 and q_1 \xrightarrow{b} q_0. Here a_{2n} = 1 (solely the word (ab)^n) and a_{2n+1} = 0 for n \geq 0, so S_L(x) = \sum_{n=0}^\infty x^{2n} = \frac{1}{1 - x^2}. The growth rate is \rho = 1, with periodically zero coefficients at odd lengths.
To compute individual a_n, matrix exponentiation evaluates T^n via repeated squaring in O(s^3 \log n) arithmetic operations, where s = |Q| is the number of states, followed by the vector-matrix-vector product. This yields a_n efficiently for large n, consistent with polynomial-time complexity bounds for finite-automata computations.
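The transfer-matrix computation above can be sketched directly. The example below uses plain integer matrices and binary exponentiation by repeated squaring to compute a_n for L = (ab)^* from the two-state automaton given in the text.

```python
def mat_mul(A, B):
    """Multiply two square integer matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(T, n):
    """T^n by repeated squaring: O(s^3 log n) arithmetic operations."""
    s = len(T)
    R = [[int(i == j) for j in range(s)] for i in range(s)]  # identity
    while n:
        if n & 1:
            R = mat_mul(R, T)
        T = mat_mul(T, T)
        n >>= 1
    return R

def a_n(T, start, accepting, n):
    """a_n = u T^n v: the number of accepted words of length n."""
    Tn = mat_pow(T, n)
    return sum(Tn[start][q] for q in accepting)

# L = (ab)^*: q0 --a--> q1, q1 --b--> q0; q0 is initial and accepting.
T = [[0, 1], [1, 0]]
print([a_n(T, 0, [0], k) for k in range(6)])  # [1, 0, 1, 0, 1, 0]
print(a_n(T, 0, [0], 10**9))                  # 1, via ~30 squarings
```

The printed sequence reproduces the a_{2n} = 1, a_{2n+1} = 0 pattern derived above, and the last call shows that even astronomically large indices are cheap because only \log_2 n matrix products are needed.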

Extensions

Generalizations

Star-free languages form a proper subclass of the regular languages, defined as those that can be constructed from the empty set, singleton languages, and the full language \Sigma^* using the operations of union, complement, and concatenation, but without the Kleene star operation. This restriction captures languages whose syntactic monoids are aperiodic, meaning they contain no nontrivial groups, as established by Schützenberger's theorem. Equivalently, star-free languages are those definable in first-order logic over words with the order predicate (FO[<]), providing a logical characterization of their expressive power. Piecewise testable languages, introduced by Simon, are regular languages where membership depends solely on the presence or absence of scattered subwords of length up to some fixed k, formalized as Boolean combinations of languages of the form A^* a_1 A^* a_2 \cdots A^* a_k A^* for symbols a_1, \ldots, a_k from the alphabet A. Simon's theorem characterizes these languages algebraically: a regular language is piecewise testable if and only if its syntactic monoid is \mathcal{J}-trivial, meaning that elements generating the same two-sided principal ideal are equal. This class forms a hierarchy indexed by k, with decidable membership and a strict inclusion within the star-free languages. Two-way finite automata extend the standard one-way model by allowing the read head to move bidirectionally over the input tape, with endmarkers to bound the computation. Despite this added flexibility, Rabin and Scott, and independently Shepherdson, proved that two-way deterministic finite automata recognize exactly the regular languages, equivalent in expressive power to one-way automata. Nondeterministic variants similarly capture only the regular languages, though simulations may increase state complexity exponentially. For infinite words, ω-regular languages generalize regular languages to infinite sequences over a finite alphabet, recognized by Büchi automata, which accept if the computation visits some accepting state infinitely often.
Introduced by Büchi to solve decision problems in weak second-order arithmetic, these automata extend finite automata with acceptance conditions suited to ω-words, maintaining closure under boolean operations and projection.
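Membership in the building-block languages A^* a_1 A^* a_2 \cdots A^* a_k A^* is simply a scattered-subword (subsequence) test, which a small sketch makes concrete (the function name is illustrative):

```python
def has_scattered_subword(word, pattern):
    """True iff pattern occurs in word as a scattered subword, i.e.
    word lies in A* p1 A* p2 ... A* pk A* for pattern = p1 p2 ... pk."""
    it = iter(word)
    # 'c in it' consumes the iterator up to and including the first match,
    # so successive checks enforce left-to-right order of the pattern.
    return all(c in it for c in pattern)

# A piecewise testable language is a Boolean combination of such tests:
# e.g. words containing "ab" but not "ba" as scattered subwords.
w = "aab"
print(has_scattered_subword(w, "ab") and not has_scattered_subword(w, "ba"))
# True
```

Each individual test runs in linear time, so membership in any fixed piecewise testable language (a Boolean combination of finitely many such tests) is decidable in linear time as well.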

Modern Variants

Modern variants of regular languages incorporate probabilistic, quantum, and weighted elements to extend classical finite automata, addressing limitations in handling uncertainty, superposition, and quantitative measures. Probabilistic finite automata (PFA) generalize deterministic finite automata by associating probabilities with transitions, where each input symbol maps to a stochastic matrix ensuring the sum of outgoing probabilities from any state is 1. A 1-way PFA accepts a string if the probability of reaching an accepting state exceeds 1/2 after processing the input. With bounded error (acceptance probability >1/2 for strings in the language and <1/2 otherwise), 1-way PFAs recognize exactly the regular languages, matching the expressive power of classical models while introducing randomness for efficient approximation in noisy environments. Quantum finite automata (QFA) extend PFAs by using quantum states and unitary transitions, where the machine's configuration is a superposition of basis states represented as a unit vector in a Hilbert space. Transitions are governed by unitary matrices parameterized by input symbols, preserving the norm of the state vector, and acceptance is determined by the squared amplitude (probability) of measuring the state in an accepting subspace. While bounded-error 1-way QFAs recognize only regular languages, exact-acceptance QFAs (requiring probability 1 for acceptance and 0 for rejection) exhibit advantages for certain promise problems, where inputs are promised to belong to specific subsets, enabling separations from classical automata. Weighted automata generalize regular languages by assigning weights from a semiring to transitions and evaluating inputs by the semiring sum over all accepting paths, rather than mere existence of one. Over arbitrary semirings, they capture quantitative behaviors beyond acceptance; for instance, the tropical semiring (\mathbb{R} \cup \{\infty\}, \min, +, \infty, 0) computes the minimum weight of an accepting path, with applications in optimization problems like shortest-distance computation in weighted graphs.
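As an illustration of the tropical semiring, the sketch below (a hypothetical three-state weighted automaton with arbitrarily chosen weights) replaces the usual sum-of-products matrix multiplication with (min, +): entries of the k-fold min-plus power of the weight matrix give the minimum weight over all length-k runs, exactly the shortest-distance reading described above.

```python
import math

INF = math.inf  # absence of a transition (the semiring zero)

def min_plus(A, B):
    """Matrix product over the tropical semiring (min, +)."""
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Transition weights of a toy weighted automaton over a one-letter alphabet.
W = [[INF, 1.0, 4.0],
     [INF, INF, 2.0],
     [INF, INF, INF]]

# Minimum weight over all length-2 runs from state 0 to state 2:
W2 = min_plus(W, W)
print(W2[0][2])  # 3.0  (run 0 -> 1 -> 2, weight 1 + 2)
```

Replacing (min, +) with the Boolean semiring (or, and) recovers ordinary reachability in an NFA, which is what makes weighted automata a clean generalization of the classical model.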
Recent developments in the field highlight separations between quantum and classical models, particularly for promise problems. For example, extensions of 2-way quantum-classical automata demonstrate advantages in recognizing word problems for semigroups, where quantum components enable efficient reversibility not achievable classically. These results underscore QFAs' potential in hybrid computing paradigms. Such variants also connect classical theory to machine learning; PFAs, for instance, model stochastic sequences for prediction tasks, where learned transition probabilities enable forecasting in applications like natural language processing, outperforming deterministic models on probabilistic data.

Learning

Algorithms for Learning

Learning regular languages from data typically involves algorithms that infer a finite automaton, such as a deterministic finite automaton (DFA) or nondeterministic finite automaton (NFA), using either queries to a teacher (oracle) or samples of positive and negative examples. These methods operate under models like exact learning with queries or probably approximately correct (PAC) learning, aiming to identify the target automaton efficiently. Seminal approaches focus on symbolic representations for exact or near-exact identification, with query-based methods providing guarantees under suitable conditions. The L* algorithm, introduced by Dana Angluin in 1987, is a foundational query-learning method for identifying the minimal DFA recognizing an unknown regular language. It uses membership queries, which ask whether a specific string belongs to the language, and equivalence queries, which check whether a hypothesized DFA accepts exactly the target language (with counterexamples provided if not). The algorithm maintains an observation table to track string behaviors and iteratively refines a DFA until equivalence is confirmed, achieving polynomial-time convergence relative to the number of states n in the minimal DFA. It makes at most n equivalence queries and O(n^3) membership queries in the worst case (assuming counterexamples of length O(n)). For learning from positive and negative examples without queries, evidence-driven state merging (EDSM) provides an effective heuristic in the PAC setting. Developed by Kevin Lang, Barak Pearlmutter, and Rodney Price following the 1997 Abbadingo One competition, EDSM starts with a prefix tree acceptor built from the sample traces and iteratively merges states based on an evidence score that measures compatibility, prioritizing merges supported by the most consistent examples to avoid errors. This approach excels on sparse or noisy data by deferring risky merges, often yielding compact DFAs close to the target.
Despite their strengths, these algorithms face limitations in handling inconsistent or noisy data; L* assumes a perfect teacher and may diverge with erroneous responses, while EDSM relies on sample quality and can fail to converge to the minimal DFA on highly sparse or contradictory traces. Extensions address nondeterminism, such as the NL* algorithm, which adapts the L* framework to learn canonical residual NFAs using similar membership and equivalence queries, though with higher computational overhead. While exact symbolic methods like L* and EDSM remain central, recent work explores neural approximations, such as recurrent neural networks trained to mimic finite automata, offering scalable but less interpretable alternatives for approximate learning. Recent advances as of 2024 include methods for automata learning with advice and for extracting automata from large language models (LLMs), enhancing interpretability for complex systems. However, these fall short of exact guarantees, underscoring the enduring value of query- and evidence-based symbolic techniques.

Applications in Practice

Regular languages underpin numerous practical tools in software engineering and system design, particularly through their implementation via regular expressions and finite state machines. In lexical analysis, tools like Flex generate scanners for compilers by converting regular expressions into efficient finite automata that tokenize source code, enabling the identification of keywords, identifiers, and operators in programming languages such as C or C++. This approach ensures fast, deterministic processing of input streams, forming the first phase of compilation. Pattern matching applications leverage regular expressions for searching and manipulating text in everyday tasks. The GNU grep utility employs extended regular expressions to filter lines in files based on patterns, supporting operations like finding email addresses or log entries across large datasets. Similarly, text editors such as Vim integrate regex-based search and replace functions, allowing users to perform global substitutions, such as reformatting code or cleaning data, with support for anchors, quantifiers, and character classes. In web contexts, regular expressions are applied to simple parsing of HTML and XML for tasks like extracting attributes or validating tags in well-formed documents, though they are limited to non-nested structures due to the context-free nature of full markup languages. Finite state machines derived from regular languages model state transitions in network protocols, ensuring reliable communication. The Transmission Control Protocol (TCP), as specified in RFC 793, uses a finite state machine to manage connection establishment, data transfer, and teardown through states like LISTEN, SYN-SENT, and ESTABLISHED, handling events such as segment arrivals and timeouts. This automaton-based design detects errors and maintains protocol integrity at the transport layer. In natural language processing and speech recognition, regular languages facilitate preprocessing and probabilistic modeling.
Regular expressions preprocess text by tokenizing sentences, normalizing case, or removing stopwords, as seen in NLP pipelines where patterns match URLs or other structured tokens. Probabilistic finite automata (PFAs), extensions of regular languages, underpin hidden Markov models (HMMs) in speech recognition systems, modeling acoustic sequences as state transitions with emission probabilities to decode phonemes and words. PFAs relate to HMMs by representing observable strings generated from hidden states, enabling efficient inference via dynamic programming algorithms like Viterbi decoding. Recent applications extend regular languages to cybersecurity and formal verification. In intrusion detection systems, automata based on regular languages analyze network traffic for protocol anomalies, such as deviations in packet sequences, using specification mining to model normal behavior in networked control environments and flag attacks like buffer overflows. In formal verification, regular model checking employs automata to verify infinite-state systems, such as parameterized protocols, by representing sets of reachable states as regular languages and checking temporal properties against linear-time logic specifications. This technique has verified safety properties in concurrent systems and protocols by accelerating state-space exploration through automata operations such as composition and equivalence testing.
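The lexical-analysis use case can be sketched with Python's re module, which compiles an alternation of token patterns into a single matcher; the token names and the toy grammar below are illustrative assumptions:

```python
import re

# A minimal lexer sketch: token classes for a toy language, tried in order.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]
# One master pattern: each class becomes a named group in the alternation.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(src):
    """Yield (token_class, lexeme) pairs, discarding whitespace."""
    for m in MASTER.finditer(src):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

print(list(tokenize("x = 42 + y")))
# [('IDENT', 'x'), ('OP', '='), ('NUMBER', '42'), ('OP', '+'), ('IDENT', 'y')]
```

This mirrors what Flex does in compiled form: the alternation of regular expressions is turned into one recognizer that scans the input in a single left-to-right pass.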

References

  1. Formal Language Definitions - UMBC CSEE
  2. What Are Regular Languages? - Baeldung on Computer Science
  3. Representation of Events in Nerve Nets and Finite Automata - RAND
  4. The Chomsky Hierarchy
  5. Closure Properties (lecture notes)
  6. Pumping Lemma
  7. Equivalence of Finite Automata and Regular Expressions
  8. Chapter 5: Regular Languages and Regular Expressions - CIS UPenn
  9. Regular Languages (lecture notes)
  10. COMPSCI 501: Formal Language Theory - Regular Expressions
  11. A Formal Definition of Regular Languages
  12. 22C:111 String Notations
  13. Regular Languages - Jeff Erickson
  14. CISC 4090: Theory of Computation - Chapter 1: Regular Languages
  15. Theory of Computation: Non-regular Languages
  16. Week 7: Regular Expressions - CS50's Introduction to Programming
  17. Kleene Algebra with Equations - CS@Cornell
  18. Definitions of Regular Language and Regular Expression
  19. Regular Expressions and Finite Automata - SI342
  20. Regular Expressions (POSIX extended regular expression notation)
  21. Finite Automata and Their Decision Problems
  22. An n log n Algorithm for Minimizing States in a Finite Automaton - Stanford University
  23. On Certain Formal Properties of Grammars
  24. CSci 311, Models of Computation, Chapter 3: Regular Languages
  25. Grammars - UNLV Computer Science
  26. Closure Properties of Regular Languages - Stanford InfoLab
  27. Closure Properties of Regular Languages
  28. Use of NFAs for Closure Properties of Regular Languages
  29. Properties of Regular Languages
  30. Closure Properties for Regular Languages
  31. CSci 311, Models of Computation, Chapter 4: Properties of Regular Languages
  32. Decision Properties of Regular Languages - Stanford InfoLab
  33. Decidability and Undecidability
  34. Regular Languages: Closure Properties and Decidability
  35. Lecture 4: Summary of Regular Languages
  36. Handout 3: The Myhill-Nerode Theorem and Implications
  37. On the State Complexity of Reversals of Regular Languages
  38. State Complexity of Union and Intersection Combined with Star and Reversal - arXiv
  39. On the Complexity of Hopcroft's State Minimization Algorithm
  40. Implementing Regular Expressions
  41. Decidability (lecture notes)
  42. Testing the Equivalence of Regular Languages - arXiv
  43. Finite Automata with Restricted Nondeterminism and Universality
  44. Technical Report No. 2007-534: State Complexity of Basic Operations on Suffix-Free Regular Languages
  45. Pushdown Automata - ResearchGate
  46. Chapter Eleven: Non-Regular Languages
  47. A. Nerode, Linear Automaton Transformations, Proc. Amer. Math. Soc. 9
  48. CS103 Guide to the Myhill-Nerode Theorem
  49. Is the Number of Regular Languages Countably or Uncountably Infinite? - Math Stack Exchange
  50. A129622 - OEIS (On the number of distinct languages accepted by finite automata with n states, Journal of Automata, Languages and Combinatorics 7 (2002))
  51. Analytic Combinatorics
  52. M. P. Schützenberger, On Finite Monoids Having Only Trivial Subgroups
  53. How to Prove that a Language Is Regular or Star-Free? - HAL
  54. Piecewise Testable Events - SpringerLink
  55. A Proof of Simon's Theorem on Piecewise Testable Languages
  56. Finite Automata and Their Decision Problems - Otterbein University
  57. J. R. Büchi, Weak Second-Order Arithmetic and Finite Automata (1960)
  58. Bounded Error Probabilistic Finite State Automata (book chapter)
  59. Characterizations of 1-Way Quantum Finite Automata
  60. Two-Way Quantum Finite Automata and the Word Problem - DROPS
  62. Learning Regular Sets from Queries and Counterexamples
  63. Angluin-Style Learning of NFA - IJCAI
  64. Understanding Robust Generalization in Learning Regular Languages
  65. Regular Expressions (GNU Grep 3.12)
  66. RegEx Match Open Tags Except XHTML Self-Contained Tags - Stack Overflow
  67. RFC 9293: Transmission Control Protocol (TCP) - IETF Datatracker
  69. Links Between Probabilistic Automata and Hidden Markov Models
  70. Specification Mining for Intrusion Detection in Networked Control Systems
  71. Regular Model Checking - ACM Digital Library
  72. Regular Model Checking with Regular Relations - ACM Digital Library