
Linear grammar

A linear grammar is a type of context-free grammar in formal language theory in which each production rule contains at most one nonterminal symbol on the right-hand side. This restriction distinguishes linear grammars from general context-free grammars, which allow multiple nonterminals per production, and positions them as a subclass within the Chomsky hierarchy. Linear grammars generate linear languages, which form a proper subset of the context-free languages; while all linear languages are context-free, not all context-free languages are linear. Linear languages encompass all regular languages, as right-linear and left-linear grammars, subclasses of linear grammars, precisely characterize the regular languages via their equivalence to finite automata. Notable examples of linear languages include the set of all palindromes over the alphabet \{a, b\}, generated by productions such as S \to aSa \mid bSb \mid a \mid b \mid \epsilon, and the language \{a^n b^n \mid n \geq 0\}, which can be derived using S \to aSb \mid \epsilon. However, languages requiring multiple independent recursive structures, such as the Dyck language of balanced parenthesis strings, cannot be generated by linear grammars and thus serve as canonical examples of context-free but non-linear languages. Regarding closure properties, linear languages are closed under union, reversal, and homomorphism, allowing constructions similar to those for context-free languages but adapted to maintain the single-nonterminal constraint. They are also closed under intersection with regular languages, reflecting their position between the regular and context-free classes. Unlike context-free languages, linear languages are not closed under concatenation or Kleene star, and, like context-free languages, they lack closure under intersection and complementation in general, highlighting their intermediate expressive power. Linear languages can be parsed efficiently, in quadratic time, using one-turn pushdown automata or dynamic programming techniques tailored to their single-nonterminal derivations.

Definition and Formalism

Formal Definition

A linear grammar is a quadruple G = (V, \Sigma, P, S), where V is the finite set of nonterminal symbols (variables), \Sigma is the finite set of terminal symbols (with V \cap \Sigma = \emptyset), P is a finite set of production rules, and S \in V is the start symbol. The productions in P are restricted such that each rule is of the form A \to u B v or A \to w, where A, B \in V are nonterminals, u, v, w \in \Sigma^* are (possibly empty) strings of terminals, and at most one nonterminal appears on the right-hand side. This structure ensures that derivations proceed by expanding a single nonterminal at each step, flanked by terminal strings. Linear grammars form a proper subclass of context-free grammars, which allow multiple nonterminals on the right-hand side without such restrictions. Special cases of linear grammars include left-linear and right-linear grammars, which further restrict the position of the nonterminal in productions whose right-hand side contains one. A left-linear grammar has productions A \to B u or A \to u, where the nonterminal (if present) appears at the left end of the right-hand side. A right-linear grammar has productions A \to u B or A \to u, where the nonterminal (if present) appears at the right end. Epsilon-productions (A \to \epsilon) are permitted in linear grammars following the same conventions as in context-free grammars, including for the start symbol if the empty string is in the language.
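The definition above can be made concrete as plain data. This is a minimal sketch, assuming a hand-rolled encoding (the names V, SIGMA, P, START, and is_linear are illustrative, not from any library), using the palindrome grammar from the lead section:

```python
# Representing a linear grammar G = (V, Sigma, P, S) as plain Python data.
V = {"S"}                  # nonterminals
SIGMA = {"a", "b"}         # terminals
START = "S"
# Each production maps a nonterminal to its right-hand sides; a right-hand
# side is a tuple of symbols, and () encodes an epsilon-production.
P = {
    "S": [("a", "S", "a"), ("b", "S", "b"), ("a",), ("b",), ()],
}

def is_linear(productions, nonterminals):
    """True iff every right-hand side contains at most one nonterminal."""
    return all(
        sum(1 for sym in rhs if sym in nonterminals) <= 1
        for rhss in productions.values()
        for rhs in rhss
    )

print(is_linear(P, V))                        # True
print(is_linear({"S": [("S", "S")]}, {"S"}))  # False: two nonterminals in one RHS
```

The check mirrors the formal restriction directly: a grammar is linear exactly when no right-hand side contains more than one nonterminal.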

Production Rules

In a linear grammar G = (V, \Sigma, P, S), the production rules in P are restricted such that each rule is of the form A \to \alpha, where A \in V is a non-terminal and \alpha \in (\Sigma \cup V)^* contains at most one non-terminal symbol from V; rules with no non-terminals on the right-hand side, such as A \to w with w \in \Sigma^*, are also permitted. The key constraint is that a non-terminal appears at most once in the right-hand side of any production, ensuring that derivations maintain a linear structure by avoiding branching through multiple non-terminals in a single step. When the right-hand side does contain a non-terminal, it may be positioned at the left end (left-linear), at the right end (right-linear), or in the interior (general linear). Derivations in linear grammars follow a step-by-step replacement process, denoted \alpha \Rightarrow \beta when a production A \to \gamma replaces an occurrence of A in \alpha to yield \beta; a sequence of such steps is written \alpha \Rightarrow^* \delta. This process is termed a linear derivation because each intermediate sentential form contains at most one non-terminal, starting from the single non-terminal S and preserving this property through substitutions that introduce at most one new non-terminal per step. Productions violating linearity include those with multiple non-terminals on the right-hand side, such as A \to B C u (two non-terminals followed by terminals) or A \to u B C v (terminals sandwiching two non-terminals), as these introduce branching beyond the single-non-terminal limit. Similarly, A \to B u C mixes terminals with more than one non-terminal, breaking the constraint.
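The left-linear/right-linear/general-linear distinction can be decided per production by locating the nonterminal's position. A small sketch, assuming a right-hand side given as a tuple of symbols (classify_rhs is an illustrative name):

```python
def classify_rhs(rhs, nonterminals):
    """Classify one production's right-hand side by nonterminal position."""
    positions = [i for i, sym in enumerate(rhs) if sym in nonterminals]
    if len(positions) == 0:
        return "terminal"            # A -> w
    if len(positions) > 1:
        return "not linear"          # e.g. A -> B C u
    i = positions[0]
    if i == 0:
        return "left-linear"         # A -> B u (also covers the bare A -> B)
    if i == len(rhs) - 1:
        return "right-linear"        # A -> u B
    return "linear"                  # A -> u B v, nonterminal in the interior

V = {"A", "B", "C"}
print(classify_rhs(("a", "B", "b"), V))   # linear
print(classify_rhs(("a", "B"), V))        # right-linear
print(classify_rhs(("B", "C", "a"), V))   # not linear
```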

Examples and Illustrations

Basic Example

A simple example of a right-linear grammar generates the language L = \{ a^n \mid n \geq 0 \}, which consists of all finite strings composed of zero or more instances of the terminal symbol a, including the empty string \epsilon. This grammar G has a single nonterminal S (the start symbol), terminal alphabet \{a\}, and the production rules S \to aS \mid \epsilon. These rules build strings by optionally prefixing an a to a previously derived string, terminating with the empty production when no more a's are added. To illustrate, consider the derivation of the string aaa (i.e., a^3) from the start symbol S, replacing the single nonterminal at each step: S \Rightarrow aS \Rightarrow aaS \Rightarrow aaaS \Rightarrow aaa. This derivation highlights the linear constraint: each production introduces at most one nonterminal on the right-hand side, so every sentential form contains at most one nonterminal until termination. The corresponding parse tree is a right-branching chain: S expands to a followed by another S, repeated three times, ending in \epsilon. This grammar exemplifies a linear grammar, as every production has at most one nonterminal on the right-hand side. It is also regular, since it qualifies as right-linear: every nonterminal that appears on a right-hand side is at its right end.
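The derivation above can be replayed mechanically. This sketch hard-codes three applications of S \to aS followed by the epsilon rule, representing sentential forms as tuples of symbols:

```python
# Replaying S => aS => aaS => aaaS => aaa for the grammar S -> aS | epsilon.
form = ("S",)
steps = [form]
for _ in range(3):                       # apply S -> aS three times
    i = form.index("S")
    form = form[:i] + ("a", "S") + form[i + 1:]
    steps.append(form)
i = form.index("S")
form = form[:i] + form[i + 1:]           # apply S -> epsilon
steps.append(form)
print(" => ".join("".join(f) or "ε" for f in steps))
# S => aS => aaS => aaaS => aaa
```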

Non-Regular Linear Language Example

A classic example of a non-regular language generated by a linear grammar is L = \{ a^n b^n \mid n \geq 1 \}, which consists of strings with an equal number of a's followed by the same number of b's. This language cannot be generated by a regular grammar, as demonstrated by the pumping lemma for regular languages, which fails for strings like a^p b^p where p is the pumping length. A strictly linear grammar for this language is G = (\{S\}, \{a, b\}, P, S), where the rules are S \to a S b \mid ab. Each recursive production contains exactly one non-terminal on the right-hand side, ensuring linearity by restricting each derivation step to a single embedded variable. The base production S \to ab generates the string for n=1, while the recursive rule S \to a S b wraps matching terminals around the inner string. To illustrate, consider the derivation for n=2, yielding aabb: \begin{align*} S &\Rightarrow a S b \\ &\Rightarrow a (ab) b \\ &= aabb \end{align*} This process builds the string outward from the center, with the non-terminal S always positioned between the accumulating a's and b's, enforcing the count equality through linear expansion without requiring multiple non-terminals or context sensitivity. The single non-terminal's central placement in derivations prevents mismatches in a and b counts, as each recursion adds one a on the left and one b on the right, mirroring the stack operations of a pushdown automaton that accepts this deterministic context-free language. Thus, while equivalent to pushdown automaton acceptance, the linear grammar highlights generation via bounded non-terminal usage.
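The derivation for a general n can be sketched as a simple loop (derive_anbn is an illustrative name): the recursive rule S \to aSb is applied n-1 times, then the base rule S \to ab finishes the string.

```python
def derive_anbn(n):
    """Build a^n b^n via the linear derivation S => aSb => ... => a^n b^n."""
    assert n >= 1
    s = "S"
    while "S" in s:
        if len(s) < 2 * n - 1:        # still room: apply recursive rule S -> aSb
            s = s.replace("S", "aSb")
        else:                         # final step: apply base rule S -> ab
            s = s.replace("S", "ab")
    return s

print(derive_anbn(2))  # aabb
print(derive_anbn(3))  # aaabbb
```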

Relations to Other Grammars

Relation to Regular Grammars

Linear grammars encompass a class of context-free grammars where each production rule contains at most one nonterminal symbol on the right-hand side. Within this class, right-linear grammars, those where the nonterminal, if present, appears at the right end of the right-hand side (e.g., A \to \alpha B or A \to \alpha, with \alpha a string of terminals), generate exactly the regular languages. Similarly, left-linear grammars, where the nonterminal appears at the left end (e.g., A \to B \alpha or A \to \alpha), also generate precisely the regular languages. This equivalence positions right-linear and left-linear grammars at Type 3 in the Chomsky hierarchy, corresponding to finite-state automata. The conversion from a right-linear grammar to a regular expression or nondeterministic finite automaton (NFA) follows a straightforward inductive process: treat nonterminals as states, map each production A \to a B to a transition from state A to B on symbol a, and designate the start symbol as the initial state, with acceptance arranged for productions that yield only terminals. Conversely, constructing a right-linear grammar from an NFA involves creating a nonterminal for each state, with productions mirroring the transitions (e.g., a transition from state q_i to q_j on a yields Q_i \to a Q_j) and accepting states handled by epsilon-productions. This bidirectional correspondence ensures that every regular language admits a right-linear (or left-linear) grammar, and vice versa, without loss of expressive power. However, not all linear grammars are regular, as placing the nonterminal in the middle of the right-hand side (e.g., A \to \alpha B \beta with both \alpha and \beta nonempty) can generate non-regular languages, such as the canonical example \{ a^n b^n \mid n \geq 0 \}. This language, produced by the linear grammar S \to a S b \mid \epsilon, requires unbounded matching beyond finite-state capabilities, as proven by the pumping lemma for regular languages.
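The grammar-to-NFA direction can be sketched in a few lines, under the simplifying assumption that productions have the form A \to aB, A \to a, or A \to \epsilon (the function names and the fresh accept state "F" are illustrative):

```python
from collections import defaultdict

def rl_grammar_to_nfa(productions, start):
    """Nonterminals become states; 'F' is a fresh accepting state."""
    delta = defaultdict(set)           # (state, symbol) -> set of next states
    accepting = {"F"}
    for A, rhss in productions.items():
        for rhs in rhss:
            if rhs == ():              # A -> epsilon: state A itself accepts
                accepting.add(A)
            elif len(rhs) == 1:        # A -> a: transition into the accept state
                delta[(A, rhs[0])].add("F")
            else:                      # A -> a B: transition from A to B on a
                a, B = rhs
                delta[(A, a)].add(B)
    return delta, start, accepting

def nfa_accepts(nfa, w):
    delta, start, accepting = nfa
    states = {start}
    for ch in w:
        states = {q2 for q in states for q2 in delta.get((q, ch), ())}
    return bool(states & accepting)

# Grammar for (ab)*: S -> aB | epsilon, B -> bS
G = {"S": [("a", "B"), ()], "B": [("b", "S")]}
nfa = rl_grammar_to_nfa(G, "S")
print(nfa_accepts(nfa, "abab"), nfa_accepts(nfa, "aba"))  # True False
```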
Linear grammars were introduced in the 1950s by Noam Chomsky as part of his foundational work on formal language hierarchies, serving as an intermediate structure between finite-state (regular) models and more powerful phrase-structure systems to analyze linguistic complexity. This positioning highlighted their role in bridging simple sequential structure with hierarchical syntax in early formal language theory.

Relation to Context-Free Grammars

Linear grammars form a proper subclass of context-free grammars, as every linear grammar is context-free by definition, but the class of linear languages is strictly smaller than the class of context-free languages. For instance, the language L = \{ w \in \{a,b\}^* \mid \#_a(w) = \#_b(w) \} (strings with equal numbers of a's and b's) is context-free but not linear. It can be generated by the context-free grammar with productions S \to a S b S \mid b S a S \mid \epsilon, which includes rules with multiple nonterminals on the right-hand side to handle arbitrary interleaving. The key structural difference arises from the form of production rules. Context-free grammars permit productions of the form A \to \alpha, where \alpha is any string over the alphabet of terminals and nonterminals, potentially including multiple nonterminals (e.g., A \to BC). Linear grammars, however, restrict each production to A \to u B v or A \to w, where u and v are (possibly empty) strings of terminals, B is a single nonterminal, and w is a string of terminals, ensuring at most one nonterminal appears on the right-hand side. Determining whether a given context-free grammar is linear involves a simple algorithmic check: examine each rule and verify that its right-hand side contains no more than one nonterminal. This inspection runs in time linear in the total size of the grammar's production set. The linearity restriction also yields parsing advantages: linear languages can be recognized in O(n^2) time by dynamic programming over the position of the single nonterminal, whereas general context-free languages may necessitate nondeterministic parsing with up to cubic running time in the input length. Note, however, that not every linear language is deterministic context-free; the palindromes, for example, form a linear language with no deterministic pushdown recognizer.
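The quadratic-time recognition claim can be illustrated with a memoized recognizer: since a sentential form has one nonterminal, a derivation of s[i:j] is determined by the pair (i, j), giving O(n^2) states. A minimal sketch, assuming productions encoded as (u, B, v) triples or single terminal strings (RULES and accepts are illustrative names):

```python
from functools import lru_cache

# Linear grammar for {a^n b^n : n >= 0}: S -> a S b | epsilon
RULES = {"S": [("a", "S", "b"), ("",)]}   # ("",) encodes the terminal rule S -> ""

def accepts(s, start="S"):
    @lru_cache(maxsize=None)
    def derives(A, i, j):                 # can nonterminal A derive s[i:j]?
        for rhs in RULES[A]:
            if len(rhs) == 1:             # terminal rule A -> w
                if s[i:j] == rhs[0]:
                    return True
            else:                         # linear rule A -> u B v
                u, B, v = rhs
                if (j - i >= len(u) + len(v)
                        and s[i:i + len(u)] == u
                        and s[j - len(v):j] == v
                        and derives(B, i + len(u), j - len(v))):
                    return True
        return False
    return derives(start, 0, len(s))

print(accepts("aabb"), accepts("aab"))  # True False
```

Each of the O(n^2) (i, j) pairs is computed once per nonterminal, so the total work is quadratic for a fixed grammar.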

Expressive Power

Languages Generated

Linear languages are the class of formal languages generated by linear grammars, denoted L(G) where G is a linear grammar consisting of context-free productions with at most one nonterminal symbol on the right-hand side of each rule. In the Chomsky hierarchy, linear languages form a proper subclass that strictly contains all regular languages (Type-3) and is strictly contained within the context-free languages (Type-2), which themselves are a proper subclass of the recursively enumerable languages (Type-0). For instance, the language \{a^n b^n \mid n \geq 0\} is linear, while the language \{a^n b^n c^m d^m \mid n, m \geq 0\} is context-free but not linear. A characterizing property of linear languages is their acceptance by one-turn pushdown automata, which are nondeterministic pushdown automata that perform at most one change of direction on the stack (pushing up to some point and from then on only popping). This equivalence highlights the bounded "linear" structure in derivations, distinguishing them from general context-free languages that may require multiple stack turns. While all regular languages are linear and accepted by deterministic pushdown automata, not every linear language is deterministic context-free. For example, the even-length palindrome language \{ww^R \mid w \in \{a,b\}^*\} is linear but not accepted by any deterministic pushdown automaton.
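The one-turn behavior is easy to see for \{a^n b^n \mid n \geq 1\}: the stack is pushed on a's and, after the single turn, only popped on b's. A minimal sketch of such a recognizer (one_turn_pda_accepts is an illustrative name; real one-turn PDAs are nondeterministic in general, but this language needs no nondeterminism):

```python
def one_turn_pda_accepts(w):
    """Accept a^n b^n (n >= 1) with a stack that makes exactly one turn."""
    stack = []
    turned = False          # becomes True at the single push-to-pop turn
    for ch in w:
        if ch == "a":
            if turned:      # pushing after the turn would be a second turn
                return False
            stack.append("A")
        elif ch == "b":
            turned = True
            if not stack:   # more b's than a's
                return False
            stack.pop()
        else:
            return False
    return turned and not stack

print([w for w in ("ab", "aabb", "aab", "ba") if one_turn_pda_accepts(w)])
# ['ab', 'aabb']
```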

Limitations Compared to Context-Free

Linear grammars, while capable of generating a wide range of context-free languages, suffer from significant expressive limitations when compared to the full class of context-free grammars. Specifically, they cannot generate languages that require multiple independent recursive structures, such as the language L = \{ a^n b^n c^m d^m \mid n, m \geq 0 \}, which consists of strings where the number of a's matches the number of b's independently of the matching between c's and d's. This language is context-free, as it can be generated by a context-free grammar with productions that handle the two pairs separately (e.g., S \to A C, A \to a A b \mid \epsilon, C \to c C d \mid \epsilon), but no linear grammar exists for it. The fundamental reason for this limitation lies in the structure of linear derivations. In a linear grammar, every production is of the form A \to u B v (with u, v terminal strings and B a nonterminal) or A \to w (a terminal string), ensuring that any sentential form during a derivation contains at most one nonterminal at any time. This restricts the grammar to a single "path" of recursion, akin to a single excursion of the stack in a one-turn pushdown automaton, which suffices for dependencies like a^n b^n but fails for independent dependencies that demand branching or multiple concurrent counts. Consequently, linear grammars lack the ability to "remember" and coordinate multiple unbounded pieces of information simultaneously, unlike general context-free grammars, whose productions with multiple nonterminals (e.g., A \to B C) enable nested or side-by-side recursions. These limitations are formally established through an adapted pumping lemma for linear languages, which demonstrates that linear languages form a proper subset of the context-free languages. The lemma states that for any linear language L, there exists a constant p such that every string w \in L with |w| \geq p can be decomposed as w = u v x y z where |u v y z| \leq p, |v y| \geq 1, and u v^k x y^k z \in L for all k \geq 0; crucially, the bound |u v y z| \leq p forces the pumped substrings v and y to lie near the two ends of w.
Applying this to L = \{ a^n b^n c^m d^m \mid n, m \geq 0 \} with a long string like a^p b^p c^p d^p leads to a contradiction: because v and y must lie near the two ends of the string, v consists only of a's and y only of d's, so pumping changes the number of a's or d's without compensating among the b's or c's, producing strings outside L. This contrasts with the standard context-free pumping lemma, whose decomposition w = u v x y z only requires |v x y| \leq p, a flexibility that suffices for this language. The strictly smaller generative power of linear languages compared to context-free ones is thus quantifiable via such lemmas, highlighting their intermediate position in the Chomsky hierarchy between the regular and full context-free languages. Due to these constraints, linear grammars are well-suited for modeling "linear" or single-recursion phenomena, such as simple nested structures in certain natural language constructs or program analyses, but they fall short for multiply dependent patterns that require the broader power of context-free grammars.
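The pumping argument can be checked by brute force for a small constant. This sketch (assuming p = 3 for tractability) enumerates a superset of the decompositions the linear lemma allows, with v in the first p symbols and y in the last p, and verifies that none of them pumps within L:

```python
import re

def in_L(s):
    """Membership in L = {a^n b^n c^m d^m}."""
    m = re.fullmatch(r"(a*)(b*)(c*)(d*)", s)
    return bool(m) and len(m[1]) == len(m[2]) and len(m[3]) == len(m[4])

p = 3
w = "a" * p + "b" * p + "c" * p + "d" * p
n = len(w)
found = False
for i in range(p + 1):
    for j in range(i, p + 1):                  # v = w[i:j], within the first p
        for k2 in range(n - p, n + 1):
            for l in range(k2, n + 1):         # y = w[k2:l], within the last p
                u, v, x, y, z = w[:i], w[i:j], w[j:k2], w[k2:l], w[l:]
                if not v and not y:
                    continue                   # lemma requires |vy| >= 1
                if all(in_L(u + v * k + x + y * k + z) for k in (0, 1, 2)):
                    found = True
print(found)  # False: no decomposition pumps within L, so L is not linear
```

Failing even under this relaxed enumeration (which includes every decomposition the lemma permits) confirms that no valid decomposition exists for this string.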

Closure Properties

Positive Closure Properties

Linear languages demonstrate robustness through several positive closure properties, where applying specific operations to linear languages yields another linear language. These properties are established via constructive proofs that preserve the linear structure of grammars, in which productions contain at most one nonterminal on the right-hand side. The class of linear languages is closed under union. Given two linear grammars G_1 with start symbol S_1 and G_2 with start symbol S_2, construct a new grammar G with a fresh start symbol S and productions S \to S_1 and S \to S_2, along with all productions from G_1 and G_2 (renaming nonterminals in one grammar if necessary to avoid conflicts). This G is linear, as the added productions are of the form S \to S_i (with empty terminal strings around the nonterminal), and it generates L(G_1) \cup L(G_2). The class of linear languages is closed under reversal. The reversal of a linear language L, denoted L^R = \{w^R \mid w \in L\} where w^R is the reverse of string w, is also linear. To construct a linear grammar for L^R from a linear grammar G = (V, \Sigma, P, S) for L, form a new grammar G^R with the same nonterminals and start symbol, but replace each production A \to uBv (where u, v \in \Sigma^* and B \in V) with A \to v^R B u^R, and each terminal production A \to w with A \to w^R. This preserves the linear form, as each right-hand side still has at most one nonterminal flanked by terminal strings, and L(G^R) = L(G)^R. Linear languages are also closed under homomorphism and inverse homomorphism. For a homomorphism h: \Sigma^* \to \Delta^* and linear language L \subseteq \Sigma^*, h(L) is linear; the grammar is obtained by replacing each terminal string in the productions of a grammar for L with its image under h, yielding a linear grammar over \Delta.
Similarly, for inverse homomorphism, if L \subseteq \Delta^* is linear, then h^{-1}(L) = \{w \in \Sigma^* \mid h(w) \in L\} is linear, via a construction that simulates the homomorphism in the grammar derivations while maintaining linearity. These properties imply that linear languages form a full trio. Additionally, linear languages are closed under intersection with regular languages, which can be shown using a product construction between a linear grammar and a finite automaton, resulting in a linear grammar for the intersection.
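The union and reversal constructions act directly on productions. A sketch using the (u, B, v) / (w,) encoding from the definition section (union_grammar, reverse_grammar, and the fresh start symbol "S_new" are illustrative names):

```python
def union_grammar(P1, S1, P2, S2):
    """Fresh start symbol with S_new -> S1 | S2 (assumes disjoint nonterminals)."""
    P = {"S_new": [("", S1, ""), ("", S2, "")]}  # empty terminal strings around S_i
    P.update(P1)
    P.update(P2)
    return P, "S_new"

def reverse_grammar(P):
    """Replace A -> u B v with A -> v^R B u^R, and A -> w with A -> w^R."""
    return {
        A: [(rhs[0][::-1],) if len(rhs) == 1
            else (rhs[2][::-1], rhs[1], rhs[0][::-1])
            for rhs in rhss]
        for A, rhss in P.items()
    }

# Reversal of {a^n b^{2n}}: S -> a S bb | "" becomes S -> bb S a | ""
P = {"S": [("a", "S", "bb"), ("",)]}
print(reverse_grammar(P))  # {'S': [('bb', 'S', 'a'), ('',)]}
```

Both outputs are again linear by inspection: every right-hand side still contains at most one nonterminal flanked by terminal strings.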

Negative Closure Properties

Linear languages are not closed under intersection. To illustrate this, consider the languages L_1 = \{a^j b^j c^k \mid j, k \geq 0\} and L_2 = \{a^k b^j c^j \mid j, k \geq 0\}, both of which are generated by linear grammars. Their intersection is L_1 \cap L_2 = \{a^n b^n c^n \mid n \geq 0\}, which is not context-free and hence not linear, as proven by the pumping lemma for context-free languages applied to strings of the form a^p b^p c^p where p is the pumping length. This counterexample demonstrates that the intersection of two linear languages may require context-sensitive power beyond linear grammars. Although the intersection of two linear languages can happen to be linear in special cases (for example, \{a^n b^n \mid n \geq 0\} \cap \{a^m b^{2m} \mid m \geq 0\} = \{\epsilon\}, which is trivially linear), the general failure under intersection highlights the boundaries of the class. In contrast to regular languages, which are closed under intersection, linear languages exhibit this limitation due to their restricted derivation structure allowing only one nonterminal at a time. Linear languages are also not closed under complement. Since the family is closed under union but not under intersection, closure under complement would imply closure under intersection via De Morgan's laws (L_1 \cap L_2 = \overline{\overline{L_1} \cup \overline{L_2}}), leading to a contradiction. An explicit construction involves the marked-copy language M = \{w c w \mid w \in \{a, b\}^*\} over the alphabet \{a, b, c\}, which is not context-free (and thus not linear) by the pumping lemma, as pumping a string a^p b^p c a^p b^p violates the equal-prefix-suffix property. However, its complement \overline{M} is linear, generated by a linear grammar that enforces a mismatch between the prefix and suffix around c; if linear languages were closed under complement, then the complement of \overline{M} (i.e., M) would also be linear, which it is not. Linear languages are not closed under concatenation or Kleene star. This, along with the lack of closure under intersection and complement, highlights their intermediate expressive power between the regular and context-free languages.
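The intersection counterexample can be sanity-checked by brute force on short strings: every string that lies in both L_1 and L_2 has the form a^n b^n c^n (in_L1 and in_L2 are illustrative membership tests):

```python
import re

def in_L1(s):
    """Membership in L1 = {a^j b^j c^k}."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return bool(m) and len(m[1]) == len(m[2])

def in_L2(s):
    """Membership in L2 = {a^k b^j c^j}."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return bool(m) and len(m[2]) == len(m[3])

both = set()
for i in range(7):
    for j in range(7):
        for k in range(7):
            s = "a" * i + "b" * j + "c" * k
            if in_L1(s) and in_L2(s):
                both.add(s)

print(both == {"a" * n + "b" * n + "c" * n for n in range(7)})  # True
```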

References

  1. [1]
    4.1. Context-Free Languages - OpenDSA
    Key point: A grammar is context-free if the LHS of every rule is a single variable. ... Definition: A linear grammar has at most one variable on the right hand ...
  2. [2]
    [PDF] CSci 311, Models of Computation Chapter 8 Properties of Context ...
    Dec 29, 2015 · Definition (Linear Grammar): A linear grammar is a grammar in which at ... Let L1 and L2 be context-free languages with the corresponding context- ...
  3. [3]
    [PDF] Pumping lemmas for linear and nonlinear context-free languages
    Nov 30, 2010 · Thus LH is not linear. At this point we can apply Lemma 8, and the ... Example 18 The DYCK language (the language of correct bracket expres-.
  4. [4]
    [PDF] TCB - Lecture 5: Context-Free GRAMMARS and LANGUAGES
    That is, a regular (right-linear) grammar is a context-free grammar such that the right-hand side of every rule contains at most one nonterminal, which if ...
  5. [5]
    [PDF] Introduction to Languages and Grammars
    The definition of linear grammar is a restriction on the definition of context free. • The definitions of left linear and right linear are restrictions on the ...
  6. [6]
    [PDF] Learning of Context-Free Languages: A Survey of the Literature
    any alphabet, there is a universal even linear grammar G0 that generates any ... G is even linear if all productions are of the form A → uBv, where. |u| = |v|.
  7. [7]
    [PDF] Right- and Left-Linear Grammars
    The first production rule creates an edge labeled a between. Vo and V₁. For the second rule, we need to introduce an additional vertex so that there is a ...
  8. [8]
    [PDF] automata theory
    Hopcroft, Ullman “ Theory of Computation & Formal Languages”, TMH. 2. FORMAL ... A linear language is a language generated by some linear grammar. Example. A ...
  9. [9]
    [PDF] Automata: Formal Language: Grammar:
    A regular grammar or right-linear grammar. G is a quadruple (V, Σ, R, S) ... nonterminal. Legal: S → a, S → ε, and T → aS. Not legal: S → aSa and aSa ...
  10. [10]
    [PDF] An Introduction to Formal Languages and Automata - Peter Linz
    Right- anrl Left-Linear Grammars 89. Right-Linear Grammars Generate Regular ... To define a detenninistic linear bounded autornaton, we carl use Definition.
  11. [11]
    [PDF] TIIKEE MODELS FOR TIE DESCRIPTION OF LANGUAGE
    We study the formal properties of a set of grammatical trans- formations that carry sentences with phra.se structure into new sentences with derived phrase.
  12. [12]
    CSci 311, Models of Computation Chapter 5 Context-Free Languages
    Dec 29, 2015 · The family of regular languages is a subset of the family of context-free languages! with no dependencies on other symbols in the sentential ...
  13. [13]
    [PDF] 1 Determinism and Parsing
    It turns out that any deterministic context-free language can be parsed in linear time, though this is not easy to prove, because a deterministic push- down ...
  14. [14]
    [PDF] Context-Free Languages and Pushdown Automata
    contains the family of languages recognized by deterministic one-turn pda (or equivalently, the family of languages simultaneously deterministic and linear).
  15. [15]
    [PDF] CS481F01 Prelim 2 Solutions - CS@Cornell
    Nov 7, 2001 · This is a right-linear grammar; it generates a regular set. There are many non-regular linear languages – the palindromes, for example. (b) Are ...
  16. [16]
    3.3. Closure Properties
    Theorem 14. The class of linear languages is not closed under concatenation and Kleene-star. Instead of a formal proof we offer a suggestion:.
  17. [17]
    [PDF] On describing the regular closure of the linear languages with graph ...
    The language class LIN is neither closed under concatenation nor under Kleene closure. This motivates us to consider classes of formal languages built from ...
  18. [18]
  19. [19]
    Closure properties of linear context-free languages
    May 12, 2015 · Linear languages are closed under union, construction as for context-free grammars S→S1,S→S2. Likewise they are closed under intersection with ...
  20. [20]
    Linear grammar - Wikipedia
    In computer science, a linear grammar is a context-free grammar that has at most one nonterminal in the right-hand side of each of its productions.Example · Relationship with regular... · Expressive power · Closure properties
  21. [21]