Fact-checked by Grok 2 weeks ago

Greibach normal form

Greibach normal form (GNF) is a representation for context-free grammars (CFGs) in formal language theory, where each production rule has the form A \to a \alpha, with A a nonterminal symbol, a a symbol, and \alpha a (possibly of nonterminal symbols. This structure ensures that every right-hand side begins with a , preventing rules that start solely with nonterminals or consist only of the (except possibly for the start symbol). Introduced by Sheila A. Greibach in her seminal paper, the form addresses limitations in other normalizations like by enabling direct correspondence between derivations and automaton computations. The central theorem establishes that for any L not containing the empty word, there exists an equivalent CFG in GNF generating exactly L; if the empty word is included, it can be handled by an additional start-symbol production. Proofs of this theorem typically involve transforming a through substitutions and eliminations to achieve the required production shapes, with the resulting size bounded polynomially in the original. GNF holds significant importance in , particularly for constructing pushdown automata (PDAs) from CFGs, as its left-to-right terminal initiation mirrors the input-reading process in PDA simulations of leftmost derivations. This facilitates proofs of language equivalence between CFGs and PDAs, analysis of algorithms in compilers, and extensions to weighted or infinite-word contexts in advanced studies.

Definition

Production Rule Structure

In Greibach normal form, every production rule adheres to the strict syntactic structure A \to a \alpha, where A is a single nonterminal symbol, a is a single symbol, and \alpha is a string of zero or more nonterminal symbols (which may be empty). This form is defined for context-free grammars, ensuring that derivations begin with terminal symbols to facilitate leftmost . A key restriction of this structure is the prohibition against : no production rule may begin with a nonterminal, meaning forms like A \to B \beta (where B is a nonterminal and \beta is any string) are not permitted. This eliminates cycles that could lead to infinite left derivations without progress in consuming input terminals. Regarding the empty string \epsilon, it is allowed exclusively as a production for a designated start symbol, such as S \to \epsilon, but only if the generated language includes \epsilon; no other nonterminal may have a direct empty production in this form, and the start symbol S must not appear on the right-hand side of any rule. Representative examples of valid production rules in Greibach normal form include:
  • S \to [aA](/page/Terminal), where a is a and A is a nonterminal.
  • A \to b, where b is a (here, \alpha is empty).
  • B \to [cBB](/page/String), where c is a and BB is a of nonterminals.

Formal Grammar Requirements

A context-free grammar G = (V, \Sigma, P, S) is in Greibach normal form if all productions in P adhere to the specified production rule structure, the start symbol S does not appear on the right-hand side of any production, and the grammar generates the same language as its original form prior to conversion. This ensures that derivations proceed without left recursion and that the start symbol initiates all valid sentential forms without cyclic dependencies involving itself. The nonterminals in V must satisfy additional constraints to maintain the form's validity: no ε-productions are allowed except possibly the single rule S \to \epsilon if \epsilon \in L(G), and all nonterminals are reachable from the start symbol S and productive (i.e., each derives at least one terminal string), eliminating useless symbols that contribute nothing to the language. These conditions promote a clean grammar structure, where every nonterminal is both reachable from S and productive in deriving terminal strings, avoiding superfluous elements that could complicate analysis or parsing. The generated by a in Greibach normal form remains equivalent to that of the original , preserving L(G) throughout the normalization process. For instance, the grammar with productions S \to aS \mid b is already in Greibach normal form, generating the \{ a^n b \mid n \geq 0 \}; in contrast, a grammar such as S \to Sa \mid a requires conversion to eliminate and achieve the form.

Conversion Process

Eliminating ε-Productions

ε-productions in a are rules of the form A \to \epsilon, where A is a nonterminal and \epsilon is the . These productions allow nonterminals to derive the and must be eliminated (except possibly for the start symbol if the includes \epsilon) to achieve Greibach normal form, as GNF productions cannot be empty except in specific cases. Eliminating ε-productions preserves the generated while preparing the grammar for further normalization steps. The standard algorithm to remove ε-productions proceeds as follows: First, identify the set of nullable nonterminals—those that can derive (including those with direct A \to \epsilon or through chains). This can be computed by starting with nonterminals having \to [\epsilon](/page/Epsilon) and iteratively adding those whose all productions consist of nullables. Then, for every A \to X_1 X_2 \dots X_k (where X_i are symbols), generate all possible productions by omitting subsets of the nullable X_i's, adding A \to Y_1 Y_2 \dots Y_m for each non-empty subset where nullables are skipped. Finally, remove all original ε-productions. If the original start symbol is nullable and the includes \epsilon, introduce a new start symbol S_0 \to S \mid \epsilon; otherwise, if S nullable but \epsilon not in language, ensure no S \to \epsilon. This process may introduce unit productions, which are handled in a subsequent step, and the resulting grammar generates the same language. For example, consider productions S \to A B \mid \epsilon, A \to a \mid \epsilon, B \to b \mid \epsilon. Nullable are S, A, B. For S \to A B, add S \to A, S \to B, S \to \epsilon (but already have), and keep S \to A B. Remove ε-prods, but since \epsilon in , keep S \to \epsilon or use new start. The grammar becomes S \to A B \mid A \mid B \mid \epsilon, A \to a, B \to b. This step ensures the grammar is ε-free (except possibly start), setting the stage for eliminating unit productions and left recursion.

Substituting Unit Productions

Unit productions in a context-free grammar are rules of the form A \to B, where both A and B are nonterminal symbols. These productions do not contribute terminals to derivations and can form chains or cycles that complicate normalization. In the process of converting a grammar to Greibach normal form (GNF), where every production must begin with a terminal symbol, unit productions must be eliminated to ensure all right-hand sides start appropriately. The elimination of unit productions involves substituting them with non-unit productions from reachable nonterminals, preserving the generated . The standard algorithm assumes the grammar is ε-free and proceeds as follows: Construct a where nodes are nonterminals and edges represent unit productions. For each nonterminal A, compute the set N_A of all nonterminals reachable from A via paths in this (including A itself). Then, for every B \in N_A and every non-unit production B \to \gamma (where \gamma is not a single nonterminal), add the production A \to \gamma to the new set of rules. Finally, remove all original unit productions. This process handles chains like A \to B \to C by effectively substituting C's non-unit rules directly into A, and it resolves cycles (such as A \to B \to \cdots \to A) by including all reachable non-unit productions without introducing loops. The resulting generates the same , as any using unit chains can be shortened by replacing them with direct non-unit rules, and vice versa. If a nonterminal B reachable from A via units is nullable (i.e., B \Rightarrow^* \epsilon), the algorithm may introduce A \to \epsilon only if the original grammar allows it; however, since ε-productions are typically removed prior to this step in GNF conversion (except possibly for the start symbol if the language includes the empty string), such cases are avoided to maintain ε-freedom. Consider the grammar with productions A \to B, B \to a \mid C, C \to b, where a and b are terminals. The unit graph has edges A \to B and B \to C, so N_A = \{A, B, C\}. Non-unit productions are B \to a and C \to b. Thus, add A \to a and A \to b, then remove the units, yielding A \to a \mid b. This eliminates the chain while preserving derivations. This substitution prepares the grammar for subsequent steps, such as eliminating left recursion, by ensuring no unproductive unit chains remain.

Eliminating Left Recursion

Left recursion in a context-free grammar occurs when a non-terminal can derive a string beginning with itself, either directly (immediate left recursion, where a production is of the form A \to A \alpha) or indirectly (through a cycle involving other non-terminals). Such recursion prevents the construction of Greibach normal form (GNF), as GNF requires all productions to begin with a terminal symbol, ensuring no leftmost non-terminal derivations. Eliminating left recursion is thus a crucial step in the conversion process, preserving the language generated by the grammar while enabling subsequent substitutions to achieve GNF. For immediate left recursion, consider productions for a non-terminal A partitioned into left-recursive forms A \to A \alpha_1 | \cdots | A \alpha_r and non-left-recursive forms A \to \beta_1 | \cdots | \beta_s, where none of the \beta_i begin with A. The standard transformation introduces a new non-terminal A' and replaces the productions as follows: \begin{align*} A &\to \beta_1 A' \mid \cdots \mid \beta_s A' \\ A' &\to \alpha_1 A' \mid \cdots \mid \alpha_r A' \mid \epsilon \end{align*} This converts the left recursion into right recursion, allowing derivations to progress without infinite loops in top-down parsing contexts. As a simple example, consider the grammar fragment S \to S \alpha \mid a, where a is a terminal and \alpha is a string of grammar symbols. Applying the transformation yields S \to a S' and S' \to \alpha S' \mid \epsilon, eliminating the recursion while generating the same language. Indirect left recursion arises when a non-terminal A_i derives another non-terminal A_j (with j < i) that eventually leads back to A_i through a chain. To handle this, non-terminals are ordered arbitrarily as A_1, \dots, A_n, and substitutions are performed iteratively to convert indirect recursion into immediate recursion, which is then eliminated using the above method. The full procedure assumes the grammar has no cycles among non-terminals and no \epsilon-productions (with nullable sets computed beforehand to handle potential empties). Unit productions should be substituted first as a prerequisite to simplify the grammar. The complete algorithm for eliminating left recursion proceeds as follows:
  1. Arrange the non-terminals in some order A_1, \dots, A_n.
  2. For each i from 1 to n:
    • For each j from 1 to i-1, and for each production A_i \to A_j \gamma, replace it by adding productions A_i \to \delta \gamma for each existing production A_j \to \delta, and remove the original A_i \to A_j \gamma.
    • Eliminate immediate left recursion in the updated productions for A_i using the direct transformation.
This pseudocode ensures all indirect recursions are resolved by substitution before direct elimination:
for i = 1 to n do
    for j = 1 to i-1 do
        for each production A_i → A_j γ do
            remove A_i → A_j γ
            for each production A_j → δ do
                add A_i → δ γ
    eliminate immediate left recursion on A_i
The resulting grammar is free of left recursion and equivalent to the original. For instance, in a grammar with productions A \to B \gamma \mid \beta and B \to A \delta, substitution first replaces A \to B \gamma with A \to A \delta \gamma \mid \beta, yielding immediate recursion A \to A (\delta \gamma) \mid \beta, which is then transformed to A \to \beta A' and A' \to \delta \gamma A' \mid \epsilon.

Substitution to Achieve Greibach Form

After eliminating ε-productions, unit productions, and left recursion, the grammar is free of these issues but may still have productions starting with nonterminals. The final step to achieve Greibach normal form involves substituting the productions of the leftmost nonterminal in each rule to ensure every production starts with a terminal. This step preserves the language and results in the desired form A \to a \alpha. Assuming the grammar is ε-free, unit-free, and left-recursion-free, order the nonterminals A_1, A_2, \dots, A_n in a topological order such that if a production has leftmost nonterminal A_j in a rule for A_i, then j > i (possible due to no ). Process the nonterminals from A_n to A_1: For each production A_i \to A_j \gamma (where j > i, \gamma is a string of nonterminals), replace it with productions A_i \to \beta \gamma for every production A_j \to \beta of A_j. Since A_j (with higher index) has already been processed, all its productions A_j \to \beta start with a (i.e., in GNF). Productions already starting with a remain unchanged. Remove any duplicates after substitution. If the resulting grammar introduces ε-productions due to nullable symbols, they should have been handled earlier, but the process ensures none appear. For example, suppose after prior steps, we have A \to B C \mid a, B \to b D \mid b, ordered with B after A. Processing B first (already starts with b). Then for A → B C, substitute: A → b D C | b C, plus A → a. Now all start with terminals: A → b D C | b C | a, B → b D | b. This achieves GNF while preserving derivations. If the language contains \epsilon, add a new start symbol S_0 \to S \mid \epsilon after the process, ensuring S_0 has no other productions. The resulting grammar is in Greibach normal form and generates the same language.

Properties

Uniqueness and Equivalence

Every context-free language possesses at least one equivalent grammar in Greibach normal form (GNF), as established by a constructive algorithm that transforms any context-free grammar into this form, handling the empty string via a start symbol production if necessary. This existence theorem relies on systematic elimination of certain production types while preserving the generated language. The equivalence of GNF grammars to their original counterparts is guaranteed by the conversion , which involves substituting unit productions, removing , and ensuring each production begins with a symbol followed by zero or more non-terminals; this yields a generating precisely the same L(G). Specifically, the resulting GNF grammar is strongly equivalent to the input grammar, meaning their languages are identical. Although GNF provides a structure, it is not unique: multiple GNF grammars can generate the identical , varying in non-terminal selections or production sequencing due to the non-canonical nature of the transformation algorithms. In relation to (CNF), GNF holds equivalent expressive power, as both forms can describe any , but GNF emphasizes leftmost terminal derivations without the binary non-terminal branching characteristic of CNF.

Implications for Parsing

The structure of Greibach normal form (GNF), where every production rule begins with a terminal symbol, eliminates left recursion inherent in general context-free grammars, thereby facilitating top-down parsing techniques such as recursive descent without the risk of infinite loops during derivation. In a recursive descent parser, the terminal-first rule allows immediate lookahead to select the appropriate production, ensuring that the parser processes the input string in a single pass from left to right, with each recursive call consuming exactly one terminal symbol before expanding nonterminals. This property guarantees that, for a string of length n in the language generated by a GNF grammar, any top-down parser halts after at most depth n, directly mirroring the leftmost derivation process where terminals are generated sequentially. Furthermore, GNF supports predictive in LL(1) frameworks by providing unique prefixes for the alternatives of each nonterminal's productions, enabling unambiguous selection with a single token of lookahead. If the initial for a nonterminal's rules are distinct, the becomes LL(1) compatible, as the parser can deterministically choose the expansion without or , which is particularly advantageous for constructing efficient table-driven parsers. However, the conversion process to GNF can significantly impact parser construction due to potential increases in , which in worst-case scenarios grow exponentially with the number of nonterminals, complicating the generation of tables or . For example, a with chained indirect may require substituting and expanding productions, leading to an explosion in the number of rules—up to $2^{n^2} for n nonterminals in pathological cases—thus raising the time and of building and executing the parser. Although improved algorithms achieve bounds, such as O(|G|^4) where |G| is the original , the practical overhead can still affect for large grammars in design.

Applications

In Formal Language Theory

Greibach normal form (GNF) was introduced by Sheila A. Greibach in 1965 as a for context-free , where every begins with a terminal symbol, ensuring that derivations proceed in a left-to-right manner without , which aligns closely with leftmost derivations. In formal language theory, GNF is useful for analyzing context-free languages (CFLs), as its structure ensures a direct correspondence to leftmost derivations. Since every CFL has an equivalent grammar in GNF, properties of CFLs can be studied using this form . GNF also aids in proofs of non-context-freeness by assuming a grammar in this form and deriving contradictions through analysis of terminal prefixes in derivations; for instance, pumping may alter expected prefix lengths or structures in languages requiring strict balancing, like those involving multiple matched counts. This technique leverages the sequential production of terminals to expose mismatches that violate the language's constraints.

Relation to Pushdown Automata

Greibach normal form (GNF) provides a direct correspondence to (PDAs) by facilitating the construction of a PDA that accepts the same language as the , without epsilon transitions on the input. In a GNF , each is of the form A \to a \alpha, where a is a and \alpha is a (possibly of nonterminals; this structure allows the PDA to simulate the grammar's leftmost derivations by reading one terminal at a time while managing the for nonterminals. The PDA operates with two states: an initial state p and a working state q. From p, it reads the first terminal a and pushes the corresponding \alpha from the start symbol productions. In q, for each nonterminal A on top of the , upon reading a, it pops A and pushes \alpha if A \to a \alpha applies. occurs by empty stack when the input is consumed, ensuring the PDA halts after exactly n steps for an input string of length n. For example, consider the A \to a B in a GNF grammar. This maps to a PDA transition: on input a with A on top of the , pop A and push B, moving to state q. If multiple productions exist for A, such as A \to a B and A \to a C, the PDA becomes nondeterministic, choosing which string to push. This construction preserves the language generated by the grammar, excluding the , and demonstrates how GNF eliminates the need for epsilon moves during input consumption. In the context of deterministic context-free languages (DCFLs), a restricted form of GNF—where all productions for a given nonterminal start with the same terminal—enables the construction of a (DPDA) without nondeterminism. This "deterministic GNF" ensures unique choices for operations upon reading a terminal, aligning with the deterministic nature of the language. Such grammars arise from DCFLs and support DPDA recognition, where the processes one input symbol per step. The conversion follows the standard GNF-to-PDA algorithm but resolves choices deterministically due to the unique terminal prefix per nonterminal. The overall conversion algorithm from a GNF grammar to a PDA simulates leftmost derivations by treating the stack as the current sentential form's nonterminal suffix. Starting with the start symbol on the stack, each input terminal triggers a matching production, replacing the top nonterminal with the production's nonterminal string while consuming the terminal. This process mirrors the grammar's generative steps in reverse for recognition, proving equivalence between GNF grammars and PDAs for context-free languages.

References

  1. [1]
    [PDF] Lecture 7: Greibach Normal Form
    Jan 27, 2015 · We are now introducing an important normal form for CGL, which is like regular expression for regular languages. 1. Page 2. Definition 2.1. A ...
  2. [2]
    [PDF] The Greibach Normal Form Theorem
    The Greibach Normal Form Theorem If L is a context-free language not containing the empty word then L is generated by a context-free grammar in Greibach Normal ...
  3. [3]
    [PDF] Part III: Context-Free Languages and Pushdown Automata
    A pushdown automaton, or PDA, is a finite state machine that ... Proof: The proof is by a construction that begins by converting G to Greibach normal form.
  4. [4]
    A New Normal-Form Theorem for Context-Free Phrase Structure ...
    Analogical Conception of Chomsky Normal Form and Greibach Normal Form for Linear, Monadic Context-Free Tree Grammars. This paper presents the analogical ...
  5. [5]
    [PDF] formal languages and their relation to automata - saved paradigms
    (Greibach Normal Form.) Every context-free language L can be generated by a grammar for which every production is of the form A ~ ace, where A is a variable ...
  6. [6]
    [PDF] Every CFG G can also be converted to an equivalent grammar in ...
    Every CFG G can also be converted to an equivalent grammar in Greibach Normal Form (for short, GNF). A context-free grammar G = (V,Σ, P, S) is in Greibach.
  7. [7]
    [PDF] Removing Left Recursion from Context-Free Grammars* - Microsoft
    We present a new method for removing left recur- sion from CFGs that is both theoretically superior to the standard algorithm, and produces very compact non- ...<|control11|><|separator|>
  8. [8]
    [PDF] 1 Normal Forms for CFG - CS 373: Theory of Computation
    Remove all unit production rules. Let G0 be the grammar obtained from G using this algorithm. Then L(G0) = L(G). Correctness Proof.
  9. [9]
    [PDF] Normal Forms
    Every derivation of a string s contains |s| rule applications. ○ Greibach normal form grammars can easily be converted to pushdown automata with no ε-.<|control11|><|separator|>
  10. [10]
    [PDF] Formal Languages and Logics Lecture Notes
    ... productions, no unit productions and no useless symbols. Proof: Carry out the steps for eliminating ε-productions, unit productions, and useless symbols, in ...
  11. [11]
    Building Greibach Normal Form Grammars Using Genetic Algorithms
    GNF is a left-recursion free grammar which means that any top-down parser will halt at maximum depth n. Left-recursion-free means that one non-terminal ...
  12. [12]
    [PDF] Greibach Normal Form - Archive of Formal Proofs
    Sep 12, 2025 · This theory formalizes Hopcroft and Ullman's algorithm [3] to trans- form a set of productions into Greibach Normal Form (GNF) [2]. We.
  13. [13]
    [PDF] Encoding complex languages with simple grammars
    The method used by SLG relies on a grammar in Greibach normal form, in which each production must begin with a terminal. With the grammar in this form, it is ...<|control11|><|separator|>
  14. [14]
    [PDF] A Comparative Study - Semantic Scholar
    This study aims to identify the impact of the induction of Greibach normal form on conventional LL(1) gram- mar of arithmetic expression. The significance of ...
  15. [15]
    (PDF) LL (1) Parser versus GNF inducted LL (1) Parser on Arithmetic ...
    Jun 12, 2025 · The prime objective of the proposed study is to determine the induction of Greibach Normal Form (GNF) in Arithmetic Expression Grammars to ...
  16. [16]
    [PDF] Greibach Normal Form Transformation, Revisited
    The size of the constructed grammar is O(jGj4) instead of O(jGj6), which we would obtain if we transform G into Chomsky normal form and then into Greibach ...
  17. [17]
    Greibach Normal Form Transformation Revisited - ScienceDirect.com
    We develop a new method for placing a given context-free grammar into Greibach normal form with only polynomial increase of its size.
  18. [18]
    [PDF] Chapter 3 Context-Free Grammars, Context-Free Languages, Parse ...
    ... normal form for context-free grammars, the Greibach Normal. Form. ... version of the pumping lemma for the context-free languages known as Ogden's lemma.
  19. [19]
    [PDF] Context-Free Languages and Pushdown Automata
    The construction of the desired grammar is decomposed into four steps. The two rst ones will lead to an equivalent grammar in quadratic Greibach normal form.Missing: citation | Show results with:citation