One-pass compiler
A one-pass compiler is a compiler that processes the source code of a program in a single sequential traversal, integrating lexical analysis, syntax analysis, semantic analysis, and code generation without revisiting any part of the input.[1] This approach contrasts with multi-pass compilers, which make multiple traversals over the code to perform different phases separately.[2] One-pass compilers are valued for their efficiency, as they require less memory by avoiding the storage of intermediate representations like abstract syntax trees and enable faster compilation times due to the streamlined process.[1][2] However, they impose limitations on language design, such as requiring declarations before use to resolve symbols during the initial scan and complicating the handling of forward references, which often necessitates techniques like backpatching.[2] These constraints make them less suitable for complex optimizations or languages with extensive forward dependencies, restricting their use to simpler constructs and grammars, typically LL(1) parsable.[1][2]

Historically, one-pass compilers emerged as a practical choice for resource-constrained environments in the mid-20th century, influencing the design of early programming languages to facilitate single-pass implementation.[2] A notable example is the original Pascal language, developed in the late 1960s by Niklaus Wirth, which enforced strict declaration-before-use rules and avoided features like separate compilation to enable efficient one-pass compilation.[3] Such designs prioritized simplicity and speed, paving the way for educational and prototyping tools, though modern compilers largely favor multi-pass architectures for greater flexibility and optimization potential.[1]

Overview
Definition
A one-pass compiler is a type of compiler that processes the source code of a compilation unit in a single sequential traversal, immediately translating each encountered statement or declaration into target machine code without requiring subsequent revisits to earlier parts of the input. This approach contrasts with multi-pass compilers, which typically involve multiple traversals over the source or intermediate representations.[4] In this context, a compilation unit refers to a self-contained portion of the program, such as a single source file or module, that can be compiled independently.[5] A compiler pass, more broadly, denotes a complete processing phase that transforms input into output, often involving analysis and generation steps.[4]

Key characteristics of a one-pass compiler include the absence of intermediate storage for a full abstract syntax tree (AST), as the entire structure is not retained; instead, semantic and translation decisions are made on the fly using only the current token stream and symbol information accumulated from prior context.[6] This on-the-fly processing enables direct code generation without buffering the complete parse tree, relying on syntax-directed translation techniques where actions trigger immediate output during parsing.[7]

The basic workflow of a one-pass compiler entails reading the input source code once from start to end, interleaving lexical analysis (tokenization), syntactic parsing (structure recognition), semantic analysis (type and scope checks), and code generation (target code emission) within the same traversal.[8] This concurrent execution of phases ensures that each element of the source is handled progressively, with symbol tables updated incrementally to support forward references where possible.[9]
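To make the interleaving concrete, the following minimal sketch (hypothetical, in Python) compiles arithmetic expressions for a stack machine: the recursive-descent parser consumes tokens left to right and emits each instruction the moment a construct is recognized, so no AST is ever built.

    import re

    def tokenize(src):
        # Lexical analysis: numbers and operators, with "$" marking end of input.
        return re.findall(r"\d+|[-+*/()]", src) + ["$"]

    def compile_expr(src):
        toks, pos, code = tokenize(src), 0, []

        def peek():
            return toks[pos]

        def advance():
            nonlocal pos
            pos += 1

        def expr():                      # expr -> term (('+'|'-') term)*
            term()
            while peek() in ("+", "-"):
                op = peek()
                advance()
                term()
                code.append("ADD" if op == "+" else "SUB")  # emit immediately

        def term():                      # term -> factor (('*'|'/') factor)*
            factor()
            while peek() in ("*", "/"):
                op = peek()
                advance()
                factor()
                code.append("MUL" if op == "*" else "DIV")

        def factor():                    # factor -> NUMBER | '(' expr ')'
            if peek() == "(":
                advance()
                expr()
                advance()                # consume ')'
            else:
                code.append(f"PUSH {peek()}")
                advance()

        expr()                           # single left-to-right traversal
        return code

    print(compile_expr("1 + 2 * (3 - 4)"))
    # ['PUSH 1', 'PUSH 2', 'PUSH 3', 'PUSH 4', 'SUB', 'MUL', 'ADD']

Note that the token stream is read exactly once and each instruction is appended as soon as the corresponding subexpression has been parsed; nothing about earlier input is retained beyond the code already emitted.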
Comparison to Multi-Pass Compilers
Multi-pass compilers process the source code through multiple traversals, typically separating the front-end phases—such as lexical analysis, syntax analysis, and semantic analysis—from the back-end phases of optimization and code generation, often using intermediate representations like abstract syntax trees (ASTs) to facilitate deferred processing.[10][11] In contrast, one-pass compilers integrate all these phases into a single, linear traversal of the source code, where the syntactic analyzer invokes contextual analysis and code generation on the fly as constructs are encountered, without constructing a complete intermediate representation.[10]

A key structural difference lies in symbol handling: one-pass compilers resolve declarations and references immediately during the traversal, so any symbol that is referenced before its definition must be introduced by a forward declaration to avoid errors, whereas multi-pass compilers defer such resolution to later phases by storing symbol information in the AST or symbol tables for global linkage.[10][12] This design choice significantly influences language features; one-pass compilers are best suited to languages enforcing strict declaration-before-use rules, as seen in Pascal, where identifiers must be defined prior to their use to enable real-time resolution, thereby restricting constructs like mutual recursion unless headers or forward declarations are employed.[13][10] Multi-pass approaches, by allowing analysis across the entire program, support more flexible scoping and reference patterns in languages like Java.[10]

At a high level, these differences yield efficiency trade-offs: one-pass compilers consume less memory by processing incrementally without retaining full program structures, enabling faster compilation for straightforward languages, but they limit opportunities for the comprehensive global optimizations that multi-pass compilers achieve through repeated traversals of intermediate forms.[10][11]
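The structural difference in symbol handling can be illustrated with a small sketch (hypothetical, in Python): given the same sequence of declaration and use events in source order, a one-pass checker rejects a forward reference at the moment it is read, while a two-pass checker that first collects all declarations accepts it.

    # Each event is ("decl", name) or ("use", name), in source order.
    events = [("use", "f"), ("decl", "f")]        # "f" is used before its declaration

    def check_one_pass(events):
        symbols = set()
        for kind, name in events:                 # single traversal, immediate resolution
            if kind == "decl":
                symbols.add(name)
            elif name not in symbols:             # forward reference: error right away
                return f"error: {name} used before declaration"
        return "ok"

    def check_two_pass(events):
        symbols = {n for k, n in events if k == "decl"}   # pass 1: collect declarations
        for kind, name in events:                         # pass 2: resolve all uses
            if kind == "use" and name not in symbols:
                return f"error: {name} undeclared"
        return "ok"

    print(check_one_pass(events))   # error: f used before declaration
    print(check_two_pass(events))   # ok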
Implementation
Parsing and Semantic Analysis
In one-pass compilers, lexical analysis and syntactic parsing are tightly integrated to process the source code in a single forward traversal, without buffering the entire input or generating intermediate representations like parse trees for later phases. The lexical analyzer, or scanner, produces a stream of tokens—such as identifiers, keywords, operators, and literals—that are immediately consumed by the parser as they are generated, enabling continuous processing without revisiting prior input.[14] This direct feed supports top-down parsers, particularly LL(1) or simple recursive descent parsers, which predict the next production based on a single token of lookahead and construct the parse incrementally from left to right.[15] Such parsers are well-suited to one-pass constraints because they facilitate straightforward integration of semantic actions without backtracking. Bottom-up parsers, such as LR parsers, can also be employed in one-pass compilation with appropriate attribute evaluation.[16]

Semantic analysis is interwoven with parsing through embedded action routines in the grammar productions, allowing checks and annotations to occur as syntactic constructs are recognized, rather than in a separate phase. For instance, upon encountering a declaration like int a;, the parser triggers an action to insert the identifier "a" and its type into the symbol table, building it incrementally as scopes open and close during the parse.[15] Type checking is performed immediately when variables are used; for example, in an expression like a + b, the parser verifies that both operands are numeric types by querying the symbol table entries created earlier in the pass.[17] Error detection, such as undeclared variable usage or type mismatches, is reported on the fly, preventing propagation of invalid constructs to later stages.[14] This approach relies on attribute grammars, where synthesized attributes (computed bottom-up from children) and inherited attributes (passed top-down from parents) are evaluated during the parse to manage semantic properties without delaying resolution.[16]
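The following sketch (hypothetical, in Python) shows such action routines in isolation: a declaration action that enters the identifier into the symbol table, and an expression action that type-checks a + b against the entries accumulated earlier in the pass.

    symtab = {}                              # built incrementally during the parse

    def declare(name, typ):                  # action fired for a declaration like "int a;"
        if name in symtab:
            raise SyntaxError(f"{name} already declared")
        symtab[name] = typ

    def check_binop(lhs, rhs):               # action fired for an expression like "a + b"
        for name in (lhs, rhs):
            if name not in symtab:           # undeclared use reported on the fly
                raise SyntaxError(f"{name} undeclared")
        if symtab[lhs] != symtab[rhs]:       # immediate type check via the table
            raise TypeError(f"type mismatch: {symtab[lhs]} vs {symtab[rhs]}")
        return symtab[lhs]                   # synthesized attribute: the result type

    declare("a", "int")
    declare("b", "int")
    print(check_binop("a", "b"))             # int

In a real one-pass compiler these routines would be invoked from within the parsing functions themselves, at the grammar positions where the corresponding constructs are recognized.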
To handle dependencies where a symbol is used before its full declaration, one-pass compilers enforce top-down processing and often require forward declarations or predefined scoping rules to ensure symbols are resolvable upon first encounter. In languages supporting this, a preliminary declaration (e.g., a prototype or header) allows the parser to enter partial symbol information into the table, deferring complete details until the full definition is parsed later in the input stream.[14] Symbol table implementations, such as scope stacks or global tables with scope tags, facilitate this by pushing new scopes on entry to a block and popping them on exit, enabling nested lookups without re-scanning.[15] This mechanism avoids backtracking by assuming a linear order where definitions precede uses, or by using temporary placeholders for forward references that are resolved as the pass progresses.[17]
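A minimal scope-stack sketch of this mechanism (hypothetical, in Python): blocks push and pop scopes, a forward declaration enters a partial entry that is usable immediately, and the later full definition completes it without any rescan of earlier input.

    scopes = [{}]                                   # scope stack; global scope at index 0

    def enter_block():
        scopes.append({})                           # push a new scope on block entry

    def exit_block():
        scopes.pop()                                # pop it on block exit

    def declare(name, info, forward=False):
        scopes[-1][name] = {"info": info, "forward": forward}

    def lookup(name):
        for scope in reversed(scopes):              # innermost-out search, no re-scanning
            if name in scope:
                return scope[name]
        raise NameError(f"{name} unresolved")

    declare("g", "procedure", forward=True)         # prototype: partial entry as placeholder
    print(lookup("g"))                              # resolvable before the full definition
    declare("g", "procedure")                       # full definition completes the entry
    enter_block()
    declare("x", "int")
    print(lookup("x"), lookup("g"))                 # nested lookup reaches the outer scope
    exit_block()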
The single-pass constraint imposes challenges on analysis, primarily due to limited lookahead, which restricts the grammar to deterministic forms without left recursion or ambiguity that would require multiple tokens of preview. LL(1) parsers, for example, demand grammars in which the FIRST sets of competing alternatives are disjoint under a single lookahead token, necessitating careful design to eliminate conflicts during semantic action placement.[16] Context-sensitive dependencies, such as those spanning distant parts of the program, further complicate matters, as the parser cannot defer resolution without violating the one-pass rule, often limiting supported language features to those amenable to local, on-the-fly evaluation.[14] These restrictions ensure feasibility but demand L-attributed grammars, in which attribute dependencies flow left to right without cycles, to maintain efficient, single-sweep processing.[17]
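The disjointness requirement can be stated as a small check (hypothetical, in Python): for each nonterminal, the FIRST sets of its alternatives must not overlap, since otherwise one token of lookahead cannot select a production.

    def is_ll1(alternative_first_sets):
        """True if the FIRST sets of the alternatives are pairwise disjoint."""
        seen = set()
        for first in alternative_first_sets:
            if seen & first:              # two alternatives can begin with the same
                return False              # token, so one-token lookahead cannot choose
            seen |= first
        return True

    # stmt -> "if" ... | "while" ... | IDENT "=" ...      (disjoint: LL(1))
    print(is_ll1([{"if"}, {"while"}, {"IDENT"}]))         # True
    # stmt -> IDENT "=" ... | IDENT "(" ...               (conflict on IDENT)
    print(is_ll1([{"IDENT"}, {"IDENT"}]))                 # False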
Code Generation and Optimization
In one-pass compilers, code generation proceeds immediately upon recognition of syntactic constructs during parsing, emitting target machine code in a sequential manner without buffering the full abstract syntax tree or other program structures.[18] This approach integrates code emission directly into the parsing process, allowing for rapid translation of source elements like expressions, assignments, and control structures into instructions for a target architecture, such as a reduced instruction set computer (RISC).[18]

Symbol resolution during code generation relies on the symbol table constructed incrementally as declarations are processed in the single pass. As identifiers are encountered, their attributes—such as types, scopes, and memory addresses—are entered into the table, enabling immediate lookup and insertion of resolved values like variable offsets or procedure entry points into the generated code.[18] For instance, when a variable declaration is parsed, its address is allocated (often with alignment to optimize access), and subsequent references use this information on the fly without backtracking.[18] This forward-only resolution imposes constraints, such as requiring declarations before uses in most cases to avoid unresolved references.[1]

Due to the linear processing nature, optimizations in one-pass compilers are confined to local techniques that can be applied immediately within the current context, such as constant folding, which evaluates and replaces constant expressions at compile time (e.g., simplifying x := 5 + 3 to x := 8).[19][1] Algebraic simplifications, like eliminating identities such as x + 0 = x, may also occur locally during expression evaluation.[19] Global optimizations, including dead code elimination or interprocedural analysis across the entire program, are not feasible because the compiler lacks a complete view of the code structure.[19]
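A sketch of local constant folding in this setting (hypothetical, in Python): each operand is tagged as a compile-time constant or a runtime value, and an addition of two constants is evaluated immediately instead of emitting instructions, so x := 5 + 3 compiles as if it were x := 8.

    code = []

    def operand(tok):
        # A literal stays a compile-time constant; an identifier becomes a
        # runtime value loaded onto the stack as soon as it is parsed.
        if tok.isdigit():
            return ("const", int(tok))
        code.append(f"LOAD {tok}")
        return ("runtime", None)

    def add(lhs, rhs):
        if lhs[0] == "const" and rhs[0] == "const":
            return ("const", lhs[1] + rhs[1])    # folded: nothing emitted
        for op in (lhs, rhs):                    # materialize a remaining constant
            if op[0] == "const":
                code.append(f"PUSH {op[1]}")     # (order is safe: ADD is commutative)
        code.append("ADD")
        return ("runtime", None)

    print(add(operand("5"), operand("3")), code)   # ('const', 8) []  -- no code emitted
    print(add(operand("x"), operand("3")), code)   # runtime: LOAD x / PUSH 3 / ADD

Because the folding decision needs only the two operands just parsed, it fits entirely within the current context and never requires revisiting earlier output, which is exactly the locality constraint described above.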
The output of code generation in one-pass compilers typically consists of direct assembly code or relocatable object code, tailored for an abstract or physical machine without necessitating additional passes for assembly or basic linking in straightforward scenarios.[18] For example, instructions for stack-based operations or register loads are emitted sequentially, with symbol table data ensuring proper addressing, though complex linking for external references may still require post-processing.[18]
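For instance, a statement such as x := y + 1 might leave the compiler as the following stack-machine assembly, with operand addresses taken from the symbol table built during the same pass (a hypothetical sketch in Python; the offsets and mnemonics are illustrative):

    symtab = {"x": {"offset": 0}, "y": {"offset": 4}}   # recorded when the declarations were parsed

    asm = [
        f"LOAD  fp+{symtab['y']['offset']}",   # push y, address resolved via the symbol table
        "PUSH  1",
        "ADD",
        f"STORE fp+{symtab['x']['offset']}",   # pop the result into x's allocated slot
    ]
    print("\n".join(asm))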