Regular language
In theoretical computer science and formal language theory, a regular language is a formal language consisting of a set of strings over a finite alphabet that can be recognized by a finite automaton, equivalently expressed using regular expressions, or generated by a regular grammar.[1][2] Regular languages were introduced by mathematician Stephen Cole Kleene in his 1951 technical report, published in 1956 as "Representation of Events in Nerve Nets and Finite Automata," where he introduced the algebraic structure of regular events as a model for neural networks and sequential machines.[3] The class of regular languages occupies the lowest level (Type-3) in the Chomsky hierarchy of formal grammars, which classifies languages based on the restrictions on production rules; regular languages are strictly contained within context-free languages and higher classes.[4] They possess several closure properties, meaning that if two or more regular languages are subjected to operations like union, concatenation, Kleene star (repetition), complement, intersection, reversal, or homomorphism, the resulting language remains regular.[5] A fundamental characterization is provided by the pumping lemma for regular languages, which states that for any regular language L, there exists a pumping length p such that every string s in L with length at least p can be divided as s = xyz where |xy| \leq p, |y| > 0, and xy^iz is in L for all nonnegative integers i; this lemma is often used to prove that certain languages are non-regular.[6] Regular languages and their equivalents, such as deterministic finite automata (DFAs), nondeterministic finite automata (NFAs), and regular expressions, are foundational in automata theory, enabling efficient algorithms for recognition and manipulation since finite automata have finite memory and can process inputs in linear time.[7] In practice, regular expressions power applications in text processing, lexical analysis in compilers (e.g., tokenizing 
source code), pattern matching in search tools, and built-in string-manipulation features of programming languages such as Java, Python, and Perl.[8]
Fundamentals
Formal Definition
In formal language theory, an alphabet \Sigma is a finite nonempty set of symbols, such as \Sigma = \{0, 1\}.[9] A string over \Sigma is a finite sequence of symbols from \Sigma, including the empty string \epsilon, which has length zero and contains no symbols.[9] The set of all finite strings over \Sigma, denoted \Sigma^*, is known as the Kleene star of \Sigma and forms the universal set for languages over \Sigma.[9] A regular language over an alphabet \Sigma is a subset L \subseteq \Sigma^* consisting of all strings generated by a regular expression over \Sigma, or equivalently, all strings accepted by a finite automaton over \Sigma.[10] This definition captures the simplest class of formal languages in the Chomsky hierarchy.[11] The formal syntax of regular expressions over \Sigma is defined inductively as follows:
- The empty set \emptyset is a regular expression denoting the empty language.
- Any symbol a \in \Sigma is a regular expression denoting the singleton set \{a\}.
- The empty string \epsilon is a regular expression denoting \{\epsilon\}.
- If R_1 and R_2 are regular expressions denoting languages L_1 and L_2, then:
- (R_1 + R_2) (or (R_1 \mid R_2)) is a regular expression denoting L_1 \cup L_2 (union).
- (R_1 \cdot R_2) is a regular expression denoting L_1 L_2 = \{xy \mid x \in L_1, y \in L_2\} (concatenation).
- R_1^* is a regular expression denoting the Kleene star L_1^* = \bigcup_{n=0}^\infty L_1^n, where L_1^0 = \{\epsilon\} and L_1^{n+1} = L_1^n L_1 for n \geq 0.
Parentheses are used to indicate precedence, with star having highest precedence, followed by concatenation, then union.[10]
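The inductive semantics above can be executed directly. The following Python sketch (combinator names empty, epsilon, sym, union, concat, and star are illustrative, chosen to mirror the definition, not a standard API) enumerates the strings of L(r) up to a length bound:

```python
# Each combinator returns a function mapping a length bound to the set of
# strings of the denoted language having at most that length.

def empty(max_len):                 # L(emptyset) = {}
    return set()

def epsilon(max_len):               # L(epsilon) = {""}
    return {""}

def sym(a):                         # L(a) = {a}
    return lambda max_len: {a} if max_len >= 1 else set()

def union(r, s):                    # L(r + s) = L(r) | L(s)
    return lambda max_len: r(max_len) | s(max_len)

def concat(r, s):                   # L(rs) = {xy | x in L(r), y in L(s)}
    def lang(max_len):
        return {x + y for x in r(max_len) for y in s(max_len)
                if len(x + y) <= max_len}
    return lang

def star(r):                        # L(r*) = union over k of L(r)^k
    def lang(max_len):
        result, frontier = {""}, {""}
        while frontier:             # extend until no new bounded strings appear
            frontier = {x + y for x in frontier for y in r(max_len)
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    return lang

# a*b: zero or more a's followed by a single b
a_star_b = concat(star(sym("a")), sym("b"))
print(sorted(a_star_b(4), key=len))  # ['b', 'ab', 'aab', 'aaab']
```

Because every regular language is determined by its finite prefixes of each length, this bounded enumeration is a faithful, if inefficient, rendering of the recursive definition of L(r).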
Examples
Regular languages can be illustrated through simple sets of strings that can be recognized by finite automata, which maintain a finite amount of memory to track patterns. A classic example is the language L_1 = \{ w \in \{0,1\}^* \mid w ends with 1\} , consisting of all binary strings terminating in the symbol 1, such as 01, 101, and 111. This language is regular because a deterministic finite automaton (DFA) with two states suffices: a start state q_0 (non-accepting, representing the last symbol was 0 or start) and an accepting state q_1 (last symbol was 1), with transitions q_0 \xrightarrow{0} q_0, q_0 \xrightarrow{1} q_1, q_1 \xrightarrow{0} q_0, and q_1 \xrightarrow{1} q_1.[12] Another basic regular language is L_2, the set of all even-length binary strings over \{0,1\}, including the empty string \epsilon (length 0), 00, 01, 10, 11 (length 2), and so on. Recognition requires tracking parity with a DFA featuring two states: q_e (even length so far, accepting) as the start state and q_o (odd length, non-accepting), with transitions swapping states on each input symbol regardless of 0 or 1.[13] Trivial cases include the empty language \emptyset, which contains no strings and is recognized by a DFA with no accepting states, and the full language \Sigma^* (for alphabet \Sigma = \{0,1\}), encompassing all possible binary strings, accepted by a single-state DFA where the sole state is accepting and loops on all inputs.[14] In contrast, the language L_3 = \{ 0^n 1^n \mid n \geq 0 \}, such as \epsilon, 01, 0011, and 000111, is not regular because any purported DFA would require unbounded states to match the equal number of 0s and 1s, violating the finite memory constraint; this is shown via the pumping lemma for regular languages.[15] These examples can be described using regular expressions, such as (0 + 1)^*1 for L_1 and (00 + 01 + 10 + 11)^* for even-length strings.[16] In practice, regular languages underpin pattern matching in text processing, like validating 
email addresses with regex patterns that check formats such as local-part@domain (e.g., user@example.com), ensuring finite-state recognition of structural rules without unbounded memory.[17]
Equivalent Formalisms
Regular Expressions
Regular expressions provide a symbolic notation for specifying regular languages, originally introduced by Stephen Kleene in his 1956 work on the representation of events in finite automata.[18] They consist of atomic expressions combined using operations that correspond to union, concatenation, and Kleene star on languages. The syntax of regular expressions over an alphabet \Sigma is defined recursively as follows: the atomic expressions are the empty set \emptyset, the empty string \varepsilon, and single symbols a \in \Sigma; if r and s are regular expressions, then so are (r + s) (union), (r s) (concatenation), and r^* (Kleene star).[19] Parentheses are used for grouping, and precedence rules apply: Kleene star has highest precedence, followed by concatenation, then union, allowing omission of some parentheses (e.g., a b^* + c means (a (b^*)) + c).[20] The semantics of a regular expression r, denoted L(r), map it to the language it represents, defined recursively: L(\emptyset) = \{\}, L(\varepsilon) = \{\varepsilon\}, and L(a) = \{a\} for a \in \Sigma; L(r s) = L(r) L(s) (concatenation of languages); L(r + s) = L(r) \cup L(s); and L(r^*) = \bigcup_{k=0}^\infty L(r)^k, where L(r)^0 = \{\varepsilon\} and L(r)^{k+1} = L(r)^k L(r).[19] For example, the expression a^* b denotes the language of all strings consisting of zero or more a's followed by a single b. Every regular expression can be converted to an equivalent nondeterministic finite automaton (NFA) using Thompson's construction, an algorithm that builds the NFA recursively from subexpressions. The process starts with basic NFAs for the atoms and combines them for each operation, ensuring each partial NFA has a unique start state i and accept state f:
- For \emptyset: States i and f with no transitions between them.
- For \varepsilon: States i and f connected by an \varepsilon-transition.
- For literal a \in \Sigma: States i and f connected by a transition labeled a.
- For union (r + s): New states i and f, with \varepsilon-transitions from i to the start states of the NFAs for r and s, and from their accept states to f.
- For concatenation (r s): The NFA for r followed by the NFA for s, with an \varepsilon-transition from the accept state of r to the start state of s; i is the start state of r and f the accept state of s.
- For Kleene star r^*: New states i and f, with \varepsilon-transitions from i to f (zero repetitions), from i to the start state of the NFA for r, from the accept state of r to f, and from the accept state of r back to its start state (further repetitions).
Each rule adds a constant number of states and transitions, so the resulting NFA is linear in the size of the expression. Practical regular expression implementations extend this theoretical core with convenience operators such as + (one or more) and ? (zero or one), bounded repetition {m,n}, and character classes such as [abc] or [:digit:].[21] These go beyond theoretical regular expressions, as some extensions can describe non-regular languages (e.g., backreferences in some implementations), and they rely on implementation-specific behaviors like greedy matching.[21]
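Thompson's construction can be sketched compactly; the fragment representation (start, accept, transition dict) and the helper names below are choices of this sketch, not a standard API:

```python
# NFA fragments are (start, accept, delta), where delta maps (state, label)
# to a set of states and the label None stands for an epsilon-transition.

_count = 0
def _new():
    global _count
    _count += 1
    return _count

def _merge(*dicts):
    out = {}
    for d in dicts:
        for key, targets in d.items():
            out.setdefault(key, set()).update(targets)
    return out

def lit(a):                    # literal symbol: i --a--> f
    i, f = _new(), _new()
    return (i, f, {(i, a): {f}})

def cat(x, y):                 # concatenation: accept of x --eps--> start of y
    return (x[0], y[1], _merge(x[2], y[2], {(x[1], None): {y[0]}}))

def alt(x, y):                 # union: new i/f with eps-transitions around x and y
    i, f = _new(), _new()
    return (i, f, _merge(x[2], y[2], {(i, None): {x[0], y[0]},
                                      (x[1], None): {f}, (y[1], None): {f}}))

def star(x):                   # Kleene star: eps-transitions to skip and to loop
    i, f = _new(), _new()
    return (i, f, _merge(x[2], {(i, None): {x[0], f},
                                (x[1], None): {x[0], f}}))

def _closure(states, delta):   # epsilon-closure by depth-first search
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, None), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def matches(nfa, w):           # simulate all computation paths in parallel
    start, accept, delta = nfa
    current = _closure({start}, delta)
    for c in w:
        moved = set().union(*[delta.get((s, c), set()) for s in current])
        current = _closure(moved, delta)
    return accept in current

nfa = cat(star(lit("a")), lit("b"))              # a*b
print(matches(nfa, "aaab"), matches(nfa, "ba"))  # True False
```

Each constructor adds at most two states and a handful of transitions, which is the source of the linear-size guarantee mentioned above.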
Finite Automata
Finite automata serve as fundamental computational models for recognizing regular languages, processing input strings sequentially through a finite number of states, with transitions determined by the alphabet symbols. These machines retain no memory beyond their current state, making them suitable for pattern matching in linear time relative to the input length. The two primary variants are deterministic finite automata (DFAs), which have a unique transition for each input symbol from any state, and nondeterministic finite automata (NFAs), which allow multiple possible transitions or even spontaneous moves without consuming input.[22] A DFA is formally defined as a 5-tuple (Q, \Sigma, \delta, q_0, F), where Q is a finite set of states, \Sigma is a finite input alphabet, \delta: Q \times \Sigma \to Q is the transition function that specifies a unique next state for each state-symbol pair, q_0 \in Q is the initial state, and F \subseteq Q is the set of accepting (or final) states.[22] To accept a string w \in \Sigma^*, the DFA begins in q_0 and applies \delta successively for each symbol of w; the string is accepted if the resulting state is in F, and rejected otherwise.[22] This deterministic behavior ensures exactly one computation path per input, facilitating efficient simulation.
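The acceptance procedure just described is a simple fold over the input. A minimal sketch, encoding the two-state "ends with 1" DFA from the Examples section as a transition table (the dictionary encoding is a choice of this sketch):

```python
# Transition table for the "ends with 1" DFA: states q0, q1; q1 accepting.
delta = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q0", ("q1", "1"): "q1",
}

def dfa_accepts(delta, start, accepting, w):
    state = start
    for symbol in w:              # one table lookup per symbol: O(|w|) overall
        state = delta[(state, symbol)]
    return state in accepting     # accept iff the final state is in F

print(dfa_accepts(delta, "q0", {"q1"}, "0101"))  # True: ends with 1
print(dfa_accepts(delta, "q0", {"q1"}, "110"))   # False
```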
An NFA extends the DFA model to a 5-tuple (Q, \Sigma, \delta, q_0, F), where the transition function is \delta: Q \times (\Sigma \cup \{\epsilon\}) \to 2^Q, allowing a set of possible next states (including the empty set) for each state and input symbol or the empty string \epsilon.[22] Acceptance occurs if there exists at least one computation path from q_0 to a state in F that consumes the entire input string, accounting for \epsilon-transitions that enable moves without reading symbols.[22] Despite nondeterminism, NFAs recognize exactly the same class of languages as DFAs, as any NFA can be converted to an equivalent DFA via the powerset construction, which treats subsets of Q as new states and simulates all possible NFA paths in parallel.[22] In this construction, the DFA's states are elements of 2^Q, the initial state is \{q_0\}, the accepting states are those subsets intersecting F, and transitions are defined by applying \delta to all states in a subset and taking the union.[22] To optimize DFAs by reducing the number of states while preserving the recognized language, minimization algorithms partition states into equivalence classes based on distinguishability by future inputs. Hopcroft's algorithm achieves this in O(|Q| \log |Q|) time by iteratively refining partitions, always splitting against the smaller half of each split block, improving upon earlier O(|Q|^2) methods.[23] This complexity makes it practical for large automata in applications like compiler design and lexical analysis. As an illustrative example, consider an NFA recognizing binary strings over \Sigma = \{0, 1\} with an even number of 1's. The NFA has states Q = \{q_e, q_o\}, where q_e (initial and accepting) tracks even parity and q_o tracks odd parity, with transitions \delta(q_e, 0) = \{q_e\}, \delta(q_e, 1) = \{q_o\}, \delta(q_o, 0) = \{q_o\}, \delta(q_o, 1) = \{q_e\}, and no \epsilon-transitions.
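The powerset construction described above can be sketched directly; this version generates only the subsets reachable from \{q_0\}, and the dictionary encoding and state names follow the parity example by assumption:

```python
from itertools import chain

# Even-parity NFA from the example (no epsilon-transitions).
nfa_delta = {
    ("qe", "0"): {"qe"}, ("qe", "1"): {"qo"},
    ("qo", "0"): {"qo"}, ("qo", "1"): {"qe"},
}

def powerset_construction(nfa_delta, start, accepting, alphabet):
    """Build a DFA whose states are frozensets of NFA states (reachable only)."""
    dfa_start = frozenset({start})
    dfa_delta, dfa_accepting = {}, set()
    worklist, seen = [dfa_start], {dfa_start}
    while worklist:
        subset = worklist.pop()
        if subset & accepting:               # accepting iff it intersects F
            dfa_accepting.add(subset)
        for a in alphabet:
            target = frozenset(chain.from_iterable(
                nfa_delta.get((q, a), set()) for q in subset))
            dfa_delta[(subset, a)] = target
            if target not in seen:
                seen.add(target)
                worklist.append(target)
    return dfa_delta, dfa_start, dfa_accepting

dfa_delta, dfa_start, dfa_accepting = powerset_construction(
    nfa_delta, "qe", {"qe"}, {"0", "1"})
print(len({s for (s, _) in dfa_delta}))  # 2: the empty subset is never reached
```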
For input "101", the unique computation path is q_e \to q_o \to q_o \to q_e, ending in the accepting state q_e since "101" contains two 1's; by contrast, "10" ends in q_o and is rejected, as it contains a single 1. Applying the powerset construction, the candidate DFA states are \{q_e\}, \{q_o\}, and \emptyset, but \emptyset is unreachable; transitions mirror the NFA since each transition yields a singleton, resulting in an identical two-state DFA whose states correspond to even and odd parity. NFAs can also be constructed systematically from regular expressions using methods like Thompson's construction.[22]
Regular Grammars
A regular grammar, also known as a type-3 grammar in the Chomsky hierarchy, is a formal grammar that generates precisely the regular languages through restrictions on its production rules to ensure linearity in one direction. These grammars are classified as either right-linear or left-linear. In a right-linear grammar, all productions are of the form A \to aB, A \to a, or A \to \epsilon, where A and B are nonterminal symbols, a is a terminal symbol from the alphabet \Sigma, and there is a distinguished start symbol S. Left-linear grammars mirror this structure with productions of the form A \to Ba, A \to a, or A \to \epsilon. A grammar is regular if it is either right-linear or left-linear, but not a mixture of both forms.[24] The language L(G) generated by a regular grammar G consists of all finite strings of terminals that can be obtained from the start symbol S via a derivation, which is a sequence of substitutions applying the production rules. In each derivation step, a nonterminal is replaced by the right-hand side of a matching production, proceeding until only terminals remain; for regular grammars, derivations are inherently linear because at most one nonterminal appears on any right-hand side. This generative process enumerates the language's strings, though multiple derivations may yield the same string in nondeterministic cases.[24] Regular grammars are equivalent to nondeterministic finite automata (NFAs), with constructive algorithms establishing bidirectional conversions. To convert a right-linear regular grammar to an NFA, map each nonterminal to a state, designate S as the start state, add transitions A \xrightarrow{a} B for each production A \to aB, introduce transitions A \xrightarrow{a} F to a new accepting state F for each production A \to a, and mark a state A as accepting if there is a production A \to \epsilon.
The reverse conversion from an NFA to a regular grammar assigns nonterminals to states, with the start state as S, and generates productions A \to aB from transitions A \xrightarrow{a} B, together with A \to \epsilon for each accepting state A. These algorithms preserve the language exactly, confirming the equivalence.[24][25] As an illustrative example, consider the regular language L = \{ a^n b^m \mid n, m \geq 0 \}, which matches the regular expression a^* b^*. A corresponding right-linear grammar is G = (\{S, B\}, \{a, b\}, P, S), with productions S \to aS \mid B and B \to bB \mid \epsilon (using the common extension that permits unit productions A \to B, which preserves regularity). A derivation for the string a^2 b^3 (i.e., "aabbb") proceeds as follows: S \Rightarrow aS \Rightarrow aaS \Rightarrow aaB \Rightarrow aabB \Rightarrow aabbB \Rightarrow aabbbB \Rightarrow aabbb. The derivation tree forms a right-skewed spine: the root S branches to a and another S (twice for the a's), then to B, which branches to b and B (three times), terminating in \epsilon. This tree visually captures the sequential generation, with the leaves yielding the terminal string.[26]
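The grammar-to-NFA conversion described above can be sketched for right-linear grammars. The production encoding (strings such as "aS", with lowercase terminals and uppercase nonterminals) and the fresh state name _F are assumptions of this sketch:

```python
def grammar_to_nfa(productions, start="S"):
    # Bodies are "aB" (A -> aB), "a" (A -> a), "B" (unit production A -> B,
    # handled with an epsilon-move), or "" (A -> epsilon).
    delta, accepting, final = {}, set(), "_F"
    def add(src, label, dst):                # label None = epsilon
        delta.setdefault((src, label), set()).add(dst)
    for head, body in productions:
        if body == "":
            accepting.add(head)              # A -> epsilon: A is accepting
        elif len(body) == 1 and body.islower():
            add(head, body, final)           # A -> a: move to fresh final state
        elif len(body) == 1:
            add(head, None, body)            # A -> B: epsilon-transition
        else:
            add(head, body[0], body[1])      # A -> aB
    accepting.add(final)
    return delta, start, accepting

def closure(states, delta):                  # epsilon-closure
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, None), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def accepts(nfa, w):
    delta, start, accepting = nfa
    current = closure({start}, delta)
    for c in w:
        current = closure(
            set().union(*[delta.get((s, c), set()) for s in current]), delta)
    return bool(current & accepting)

# G: S -> aS | B,  B -> bB | epsilon, generating a^n b^m
nfa = grammar_to_nfa([("S", "aS"), ("S", "B"), ("B", "bB"), ("B", "")])
print(accepts(nfa, "aabbb"), accepts(nfa, "ba"))  # True False
```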
Properties
Closure Properties
Regular languages possess notable closure properties under several fundamental operations on formal languages. These properties underscore the robustness of the class, as the result of applying such an operation to one or more regular languages is always regular. Proofs of closure typically rely on constructive methods using finite automata, leveraging the equivalence between regular languages and the languages accepted by nondeterministic finite automata (NFAs) or deterministic finite automata (DFAs).[27] The class of regular languages is closed under union. Given regular languages L_1 and L_2 accepted by NFAs M_1 = (Q_1, \Sigma, \delta_1, s_1, F_1) and M_2 = (Q_2, \Sigma, \delta_2, s_2, F_2), construct an NFA M = (Q_1 \cup Q_2 \cup \{s\}, \Sigma, \delta, s, F_1 \cup F_2) with a new start state s and \varepsilon-transitions from s to s_1 and s_2; the transition function \delta extends \delta_1 and \delta_2. This M accepts exactly L_1 \cup L_2, as paths from s reach accepting states via either submachine.[27] Closure under intersection holds via the product automaton construction. For DFAs M_1 and M_2 as above, form the product DFA M = (Q_1 \times Q_2, \Sigma, \delta, (s_1, s_2), F_1 \times F_2), where \delta((p, q), a) = (\delta_1(p, a), \delta_2(q, a)) for states p \in Q_1, q \in Q_2, and symbol a \in \Sigma. A string is accepted by M if and only if it is accepted by both M_1 and M_2, so M accepts L_1 \cap L_2. Since the state space is finite (|Q_1| \times |Q_2|), M is a DFA.[28] Regular languages are closed under complementation. If L is accepted by a DFA M = (Q, \Sigma, \delta, s, F), then the complement \overline{L} = \Sigma^* \setminus L is accepted by the DFA M' = (Q, \Sigma, \delta, s, Q \setminus F), which swaps accepting and non-accepting states. Every string drives M' to a state in Q \setminus F precisely when M reaches a state not in F, ensuring acceptance of \overline{L}.[5] The class is closed under concatenation. 
For regular languages L_1 and L_2 accepted by NFAs M_1 and M_2, construct M by adding \varepsilon-transitions from each state in F_1 to s_2, using states Q_1 \cup Q_2, start s_1, and accept states F_2. A string w = uv with u \in L_1, v \in L_2 reaches an accepting state via the path for u in M_1 followed by \varepsilon to M_2 for v, and no other strings are accepted.[29] Closure under the Kleene star operation L^* = \bigcup_{i=0}^\infty L^i (with L^0 = \{\varepsilon\}) is established using NFAs. For L accepted by NFA M = (Q, \Sigma, \delta, s, F), create M' = (Q \cup \{s', f\}, \Sigma, \delta', s', \{f\}) with new states s' (start) and f (sole accept), \varepsilon-transitions from s' to s and from s' to f, and from each state in F via \varepsilon to both s and f. This allows zero or more repetitions of paths accepting strings in L.[29] Regular languages are closed under reversal, where L^R = \{w^R \mid w \in L\} and w^R reverses w. Given NFA M for L, construct NFA M^R by reversing all transitions (if \delta(p, a) includes q, add \delta(q, a) including p), setting start states to F (with \varepsilon-transitions if multiple to a new start if needed, but typically direct), and accept state to s. A path labeled w from s to some f \in F in M corresponds to a path labeled w^R from f to s in M^R, hence from start to accept in M^R.[30] The class is closed under homomorphism. A homomorphism h: \Sigma^* \to \Gamma^* maps each symbol to a string in \Gamma^*. If L \subseteq \Sigma^* is regular, accepted by DFA M = (Q, \Sigma, \delta, s, F), then h(L) is accepted by NFA M_h = (Q_h, \Gamma, \delta_h, s, F), where for each transition \delta(p, a) = q in M, add a chain of states and transitions spelling h(a) from p to q (introducing intermediate states as needed for the fixed-length h(a)); \varepsilon-transitions connect chains. 
Since |h(a)| is fixed for each a and |\Sigma| is finite, only finitely many states are added, and processing w = a_1 \cdots a_k simulates M on w while spelling out h(w), so M_h accepts exactly h(L).[31] Closure under inverse homomorphism also holds: for a homomorphism h: \Sigma^* \to \Gamma^* and a regular L \subseteq \Gamma^* accepted by DFA N = (P, \Gamma, \gamma, t, G), the inverse image h^{-1}(L) = \{w \in \Sigma^* \mid h(w) \in L\} is accepted by the DFA with the same state set P, start state t, accepting set G, and transition function \delta_i(p, a) = \hat{\gamma}(p, h(a)), where \hat{\gamma} is the extension of \gamma to strings; on reading a, the automaton simulates N on the entire fixed string h(a), so no additional states are required. Regular languages are also closed under the (right) quotient by a regular language: if L_1 and L_2 are regular, then L_1 / L_2 = \{u \in \Sigma^* \mid \exists v \in L_2, \, uv \in L_1\} is regular. The construction modifies the DFA for L_1 by redefining the accepting states to be those from which some string in L_2 leads to an original accepting state, which is well defined thanks to the finite state space.[32] However, the quotient of a non-regular dividend by a regular divisor need not be regular: for example, with K = \{a^n b^n c \mid n \geq 0\} (non-regular) and R = \{c\} (regular), K / R = \{a^n b^n \mid n \geq 0\}, which is non-regular.[32] The following table summarizes the closure operations and corresponding automaton constructions:
| Operation | Construction Method |
|---|---|
| Union | New NFA start state with \varepsilon-transitions to starts of input NFAs; union of accept sets |
| Intersection | Product DFA with synchronized transitions on symbols; product of start and accept sets |
| Complement | Same DFA, but swap accepting and non-accepting states |
| Concatenation | NFA combining input NFAs with \varepsilon-transitions from accept of first to start of second |
| Kleene star | Augmented NFA with new start/accept state and \varepsilon-loops from accepts to start |
| Reversal | Reverse transitions in NFA; original accepts become starts (via \varepsilon if needed), original start becomes accept |
| Homomorphism | NFA with \varepsilon-chains replacing transitions for strings h(a) |
| Inverse homomorphism | NFA simulating transitions on full strings h(a) via chained states per symbol |
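The product construction for intersection can be sketched as follows; the dictionary encoding of DFAs and the example state names are assumptions of this sketch:

```python
from itertools import product

def product_dfa(d1, s1, f1, d2, s2, f2, alphabet):
    """DFA for L1 ∩ L2: states are pairs; accepting iff both components accept."""
    states1 = {s for (s, _) in d1} | set(d1.values())
    states2 = {s for (s, _) in d2} | set(d2.values())
    delta = {((p, q), a): (d1[(p, a)], d2[(q, a)])
             for p, q in product(states1, states2) for a in alphabet}
    return delta, (s1, s2), {(p, q) for p in f1 for q in f2}

def run(delta, start, accepting, w):
    state = start
    for a in w:                      # both components advance in lockstep
        state = delta[(state, a)]
    return state in accepting

# L1: binary strings ending in 1; L2: binary strings of even length
d1 = {("q0", "0"): "q0", ("q0", "1"): "q1",
      ("q1", "0"): "q0", ("q1", "1"): "q1"}
d2 = {("e", "0"): "o", ("e", "1"): "o",
      ("o", "0"): "e", ("o", "1"): "e"}
delta, start, acc = product_dfa(d1, "q0", {"q1"}, d2, "e", {"e"}, {"0", "1"})
print(run(delta, start, acc, "01"), run(delta, start, acc, "1"))  # True False
```

Changing the accepting set to pairs where either component accepts yields the union instead, which is why the same product state space serves both operations.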
Decidability Properties
Regular languages enjoy strong decidability properties, allowing algorithms to resolve key questions about them effectively. Unlike higher classes in the Chomsky hierarchy, the standard decision problems for regular languages (membership, emptiness, finiteness, and equivalence) are all decidable, and when the language is presented as a deterministic finite automaton they are solvable in polynomial time in the size of the automaton. The membership problem, determining whether a given string w \in \Sigma^* belongs to a regular language L described by a deterministic finite automaton (DFA) M = (Q, \Sigma, \delta, q_0, F), is decidable by simulating \delta on w starting from q_0. If the final state is in F, then w \in L. This simulation examines one transition per symbol of w, each a constant-time table lookup, yielding a time complexity of O(|w|).[33] Emptiness, checking whether L(M) = \emptyset for a DFA M, is decidable via graph reachability: model the DFA as a directed graph with states as nodes and transitions as edges, then use breadth-first search (BFS) from q_0 to determine whether any state in F is reachable. Since the graph has |Q| nodes and at most |Q| \cdot |\Sigma| edges, BFS runs in O(|Q| \cdot |\Sigma|) time. For nondeterministic finite automata (NFAs), reachability can be tested directly on the NFA's transition graph, avoiding the exponential subset construction.[34] Finiteness, determining whether |L(M)| < \infty, is decidable by analyzing the DFA's structure for cycles that contribute to infinite languages. Specifically, compute the states reachable from q_0, then check for cycles within the subgraph of states from which an accepting state is reachable; the language is infinite if and only if such a cycle exists.
This cycle detection uses graph traversal such as depth-first search, again in O(|Q| \cdot |\Sigma|) time.[35] Equivalence between two regular languages given by DFAs M_1 and M_2 is decidable by minimizing both automata and verifying isomorphism of the results, or by constructing a DFA for the symmetric difference L(M_1) \triangle L(M_2) and testing it for emptiness. Minimization partitions states into equivalence classes using a table-filling or partition-refinement algorithm, with Hopcroft's algorithm achieving O(|Q| \log |Q| \cdot |\Sigma|) time, where |Q| is the number of states.[36] The Myhill-Nerode theorem underpins these results by characterizing regular languages via the right congruence x \equiv_L y if and only if xz \in L \iff yz \in L for all z \in \Sigma^*; L is regular if and only if the number of equivalence classes (the index) is finite, in which case the index equals the size of the minimal DFA. This equivalence relation directly informs DFA minimization and equivalence testing by identifying indistinguishable states.[37] In contrast, richer classes exhibit undecidability for such problems; for instance, while membership in \{a^n b^n c^n \mid n \geq 0\} is decidable, equivalence of context-sensitive languages is undecidable in general.[34]
Complexity Results
The conversion from a nondeterministic finite automaton (NFA) with |Q| states to an equivalent deterministic finite automaton (DFA) via the subset construction algorithm can result in an exponential blowup in the number of states, reaching up to 2^{|Q|} in the worst case.[38] This worst-case bound is tight, as there exist NFAs requiring a DFA with exactly 2^{|Q|} states for equivalence.[39] Minimization of a DFA with n states and alphabet size k can be performed in O(n k \log n) time using Hopcroft's algorithm, which refines an initial partition of states based on distinguishability. This bound is asymptotically optimal among known partition-refinement methods, as the partitioning process necessitates handling logarithmic factors in the worst case.[40] The construction of an NFA from a regular expression of length m using Thompson's method proceeds in linear time, O(m), by inductively building sub-NFAs for basic symbols and combining them via epsilon transitions for union, concatenation, and Kleene star operations. This efficiency stems from the epsilon-move rules, which add a constant number of states and transitions per operator without requiring determinization.[41] The emptiness problem for DFAs (determining whether the language is empty) is solvable in linear time, O(n), by checking reachability from the initial state to any accepting state via graph traversal.[42] Equivalence of two DFAs with m and n states can be decided in polynomial time, specifically O(mn), by constructing the product automaton and testing emptiness of the symmetric difference.[43] For NFAs, emptiness is NL-complete, reducible to graph reachability, while universality (whether the language equals \Sigma^*) is PSPACE-complete.[44] State complexity measures the minimal number of states required in a DFA recognizing the result of an operation on input DFAs.
For the union of languages recognized by an m-state DFA and an n-state DFA, up to mn states may be necessary and sufficient in the worst case, achieved via the product construction with accepting set (F_1 \times Q_2) \cup (Q_1 \times F_2).[39] Lower bounds demonstrate that this bound is tight for certain languages over binary alphabets.[45] Similarly, intersection requires up to mn states, using the standard product automaton. Reversal of an n-state DFA language has state complexity up to 2^n, as the reverse may require determinization of an equivalent NFA; tight examples exist over suitable alphabets.[38] The following table summarizes the time and state complexities for key operations on regular languages represented by DFAs or NFAs:
| Operation | Time Complexity | State Complexity (Worst Case) |
|---|---|---|
| NFA to DFA (subset construction) | O(2^{\vert Q \vert}) | 2^{\vert Q \vert} states |
| DFA minimization (Hopcroft) | O(n \log n \cdot \vert \Sigma \vert) | n states (already minimal) |
| RegExp to NFA (Thompson) | O(m) | O(m) states |
| DFA emptiness | O(n) | N/A |
| DFA equivalence | O(mn \cdot \vert \Sigma \vert) | N/A |
| NFA emptiness | NL-complete | N/A |
| Union (DFAs, m and n states) | O(mn \cdot \vert \Sigma \vert) | mn states |
| Intersection (DFAs) | O(mn \cdot \vert \Sigma \vert) | mn states |
| Reversal (DFA to minimal DFA) | O(2^n \cdot \vert \Sigma \vert) | 2^n states |
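The exponential blowup of the subset construction can be observed concretely with the classic family "the k-th symbol from the end is 1", whose NFA has k+1 states while determinization yields 2^k reachable DFA states; the encodings below are choices of this sketch:

```python
from itertools import chain

def last_kth_is_one_nfa(k):
    # States 0..k; state 0 loops on both symbols and nondeterministically
    # guesses that the current 1 is the k-th symbol from the end.
    delta = {(0, "0"): {0}, (0, "1"): {0, 1}}
    for i in range(1, k):
        delta[(i, "0")] = {i + 1}
        delta[(i, "1")] = {i + 1}
    return delta, 0, {k}

def reachable_subsets(delta, start, accepting, alphabet):
    # Explore only the subsets reachable from {start}, as in determinization.
    start_set = frozenset({start})
    seen, worklist = {start_set}, [start_set]
    while worklist:
        subset = worklist.pop()
        for a in alphabet:
            target = frozenset(chain.from_iterable(
                delta.get((q, a), set()) for q in subset))
            if target not in seen:
                seen.add(target)
                worklist.append(target)
    return seen

for k in (2, 3, 4):
    print(k, len(reachable_subsets(*last_kth_is_one_nfa(k), {"0", "1"})))
    # prints k alongside 2**k: the determinized automaton grows exponentially
```

Intuitively, the DFA must remember the last k symbols read, and no smaller amount of state suffices, which is exactly the lower-bound argument behind the tightness claims above.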