Controlled natural language
Controlled natural language (CNL) is a restricted form of a natural language, such as English, that applies deliberate limitations on vocabulary, grammar, and semantics to enhance clarity, reduce ambiguity, and enable both human readability and computer processability.[1] These engineered languages bridge the gap between unrestricted natural languages and formal logics, preserving intuitive expressiveness while supporting applications like automated reasoning and machine translation.[2] CNLs serve three primary purposes: improving human comprehensibility, especially for non-native speakers or in technical contexts; facilitating translation through machine-aided or automated systems; and providing a natural interface for formal knowledge representation and inference.[1] For comprehensibility, CNLs restrict lexicon and syntax to simplify texts, as seen in early examples like Ogden's Basic English (1930), which limits vocabulary to 850 words for global communication.[1] Translation-oriented CNLs, such as the Simplified Technical English standard used in aerospace documentation, enforce rules to minimize syntactic variation and idiomatic expressions, aiding consistent multilingual output.[1] In formal representation, CNLs like Attempto Controlled English map directly to first-order logic, allowing reliable automated processing for semantic web applications and expert systems.[1] The development of CNLs dates back to the early 20th century, with over 100 English-based variants documented by 2014, evolving from linguistic simplification efforts to sophisticated tools integrated with artificial intelligence.[1] Classification schemes, such as the PENS framework, evaluate CNLs along dimensions of precision (from vague to semantically fixed), expressiveness (from basic to complex concepts), naturalness (resembling everyday language), and simplicity (ease of description and use).[1] Ongoing research, supported by groups like the Special Interest Group on CNL, focuses on 
enhancing CNLs for various applications, ensuring they remain adaptable to emerging computational needs.[2] Recent research (as of 2025) explores integrating CNLs with large language models to enhance robustness in human-AI interactions and automated reasoning.[3]
Overview
Definition
Controlled natural language (CNL) is a constructed subset of a natural language, such as English or Japanese, that imposes deliberate restrictions on its lexicon, syntax, and semantics to minimize or eliminate ambiguity and complexity inherent in unrestricted natural languages, while retaining sufficient naturalness for human readability and comprehension.[1] This engineering approach ensures that CNL expressions can be precisely interpreted, often facilitating direct mapping to formal representations for computational processing. Key attributes of CNL include a controlled lexicon that avoids synonyms, homonyms, and polysemous terms to enforce unique meanings, alongside unambiguous syntactic rules that limit structural variations, such as prohibiting complex nesting or optional elements.[1] These features render CNL machine-processable, enabling reliable parsing and inference without the interpretive challenges of full natural language, yet the language remains intuitive for non-expert users by mimicking everyday phrasing. Unlike formal languages, which rely on artificial symbols, operators, and rigid notations (as in mathematical logic or programming paradigms), CNL eschews such constructs in favor of verbal forms drawn from natural language, prioritizing accessibility over absolute precision in every context.[1] In contrast to full natural languages, which permit free-form variation, idiomatic expressions, and contextual inferences leading to ambiguity, CNL systematically curtails these freedoms to achieve determinism.
Purposes and Benefits
Controlled natural languages (CNLs) are primarily designed to facilitate unambiguous communication between humans and machines, ensuring that instructions or specifications can be processed with high precision without misinterpretation. By restricting grammar and vocabulary, CNLs enable automated reasoning and validation, allowing systems to parse and interpret text as formal logic while retaining a natural language appearance. This makes them particularly useful in domains requiring reliability, such as software requirements engineering and knowledge representation.[1][4] A key benefit of CNLs is their enhancement of readability for non-experts, as the simplified structure reduces cognitive load compared to unrestricted natural language, promoting clearer technical writing and documentation. In multilingual contexts, CNLs minimize translation errors by standardizing expressions, leading to more consistent international communication. Additionally, they bridge gaps in natural language processing by providing a middle ground between fully natural text and formal languages, supporting tasks like semantic analysis and information extraction without the need for complete formalization. Recent research has explored integrating CNLs with large language models to enhance semantic parsing for knowledge graph question answering.[1][4][5] Quantitative studies demonstrate significant advantages in efficiency and accuracy. For instance, the use of CNLs in translation workflows has shown reductions in post-editing time by up to 20%, with some variants achieving 3-4 times faster processing overall. Surveys indicate that CNLs can significantly reduce ambiguity in complex texts relative to unrestricted English, improving comprehension and downstream automation. These benefits also translate to cost savings in software development, where fewer misunderstandings lead to reduced rework and faster validation cycles.[1]
History
Origins
The roots of controlled natural languages (CNLs) trace back to the mid-20th century, particularly the 1950s and 1960s, when efforts in machine translation (MT) and early artificial intelligence (AI) encountered profound challenges due to the inherent ambiguity of unrestricted natural languages.[6] Pioneering MT projects, such as the 1954 Georgetown-IBM experiment, demonstrated limited success with small-scale translations but highlighted issues like polysemy—where words like "pen" could mean a writing instrument or an enclosure—requiring extensive contextual knowledge that computers lacked.[6] Researchers like Yehoshua Bar-Hillel argued in 1960 that resolving such ambiguities demanded either massive encyclopedic databases or restricted input languages to make computational understanding feasible, laying the conceptual groundwork for CNLs as a means to mitigate these barriers in AI systems.[6] A significant non-computational precursor influencing later CNL designs was Charles K. Ogden's Basic English, proposed in 1930 as an international auxiliary language to promote global communication in politics, commerce, and science.[1] This system restricted vocabulary to 850 root words, primarily nouns and adjectives with only 18 verbs ("operators"), while radically simplifying the grammar, aiming to make English accessible to non-native speakers without full linguistic complexity.[1] Although developed decades before widespread computing and not intended for machine processing, Basic English demonstrated the efficacy of vocabulary and syntactic controls for clarity, inspiring subsequent efforts in technical and computational domains.[1] The 1970s marked the advent of formal CNLs tailored for computational applications, with one of the earliest being REL English, part of the Rapidly Extensible Language (REL) system developed by F. B.
Thompson and colleagues at the California Institute of Technology starting in the late 1960s and refined through the 1970s.[7] REL English imposed strict grammatical rules on English subsets to enable unambiguous parsing for database queries and requirements specification, allowing users to define new concepts via paraphrases while supporting arithmetic and relational operations.[7] This language was applied in aerospace contexts for software requirements and data analysis, emphasizing controlled syntax to ensure precision in high-stakes technical specifications.[8] By the 1980s, CNLs gained practical traction in industry, exemplified by the origins of AECMA Simplified English, initiated in 1979 by the European Association of Aerospace Industries (AECMA) in response to ambiguities in aviation maintenance manuals that contributed to errors and costly translations.[9] The project, formalized as the AECMA Simplified English Guide in 1986, restricted vocabulary to about 1,100 approved words and enforced writing rules to enhance readability for non-native English speakers and support machine-assisted processing.[9] This effort built on earlier influences like Basic English and REL, prioritizing syntactic simplicity and semantic consistency to reduce misinterpretation in technical documentation for aircraft operations.[1]
Key Developments
The 1990s marked a significant surge in controlled natural language (CNL) research, particularly through the Attempto project at the University of Zurich, which developed Attempto Controlled English (ACE) as a precisely defined subset of English for unambiguous knowledge representation.[10] Launched in the mid-1990s, ACE was designed to bridge natural language and formal logics, enabling domain specialists to author specifications that could be automatically translated into executable forms.[11] A key advancement was ACE's integration with description logics, allowing translations to formal ontologies for reasoning tasks such as verification and querying, which enhanced its applicability in knowledge engineering.[12] In the 2000s, the CNL community gained momentum with the inaugural Workshop on Controlled Natural Language (CNL 2009) held in Marettimo Island, Italy, which brought together researchers to discuss similarities, differences, and future directions for CNLs, thereby fostering collaborative development. Concurrently, CNLs saw increased integration with the Semantic Web, where languages like ACE and others served as interfaces for authoring OWL ontologies, enabling non-experts to express complex semantic structures in restricted natural language that mapped directly to RDF and OWL constructs.[13] Standardization efforts culminated in the ISO 24620 series on language resource management for CNLs, with the first part (ISO/TS 24620-1) published in 2015, establishing basic concepts, principles, and normalizing guidelines for CNL design and use across domains.[14] Subsequent parts expanded this framework, including ISO 24620-3 (2021) for quality assessment methodologies and metrics, ISO 24620-4 (2023) for stylistic guidelines in English-based CNLs, and ISO 24620-5 (2024) for evaluating completeness and compliance, providing a comprehensive international benchmark for CNL development and evaluation. 
In the 2020s, CNLs have increasingly integrated with artificial intelligence, particularly large language models (LLMs), to enhance output control and semantic parsing; for instance, LLMs pretrained on vast text corpora have been adapted as CNL parsers for knowledge graph question answering, improving precision in translating restricted inputs to formal representations.[15] This trend is supported by ongoing workshops, such as the International Workshop on Controlled Natural Language series, which explore AI-driven applications like bridging LLMs with knowledge graphs via CNL intermediaries for more reliable reasoning.[16][17]
Characteristics
Grammatical Restrictions
Grammatical restrictions in controlled natural languages (CNLs) form the syntactic backbone that ensures texts are unambiguous, parsable, and translatable into formal representations, distinguishing CNLs from unrestricted natural languages. These restrictions limit the complexity of sentence structures to prevent syntactic ambiguities, such as those arising from scope, attachment, or coordination, thereby facilitating deterministic parsing where each valid sentence maps to a unique logical form.[1] By enforcing predefined rules, CNLs achieve machine readability without sacrificing the natural language facade, as seen in their design to support applications like knowledge representation and automated reasoning.[18] Core restrictions typically include fixed sentence structures, often adhering to a subject-verb-object (SVO) pattern or limited templates, such as single binary relations in some CNLs (e.g., "A person drives a vehicle") to avoid multi-clause complexities. Prohibitions on elements like passive voice, questions, or relative clauses are common to eliminate ambiguities; for instance, many CNLs mandate active voice and declarative statements only, disallowing passives that could obscure agent-patient roles or questions that introduce interrogative scope issues. Additionally, conjunctions causing scope ambiguity, such as those in coordinated noun phrases or verbs, are restricted, and complex noun clusters are capped (e.g., no more than three nouns) to prevent parsing uncertainties.[1][19][18] To enable deterministic parsing, rules often eliminate optional elements, homographs, and pronouns in favor of variables or explicit references, ensuring unique parse trees without backtracking or multiple interpretations. Examples include mandatory articles before nouns to resolve definiteness ambiguities and the enforcement of singular nouns only, alongside present tense verbs, which standardize temporal and number agreements. 
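Restrictions of this kind can be enforced mechanically. The following minimal Python sketch checks sentences against an invented five-entry lexicon and a single Article-Noun-Verb-Article-Noun template; the lexicon, template, and rules are hypothetical illustrations, not those of any published CNL:

```python
# Approved lexicon: each word has exactly one part of speech,
# with no synonyms or homographs (illustrative only).
LEXICON = {
    "a": "article", "the": "article", "every": "article",
    "person": "noun", "vehicle": "noun", "dog": "noun",
    "drives": "verb", "owns": "verb", "sees": "verb",
}

# Fixed sentence template: Article Noun Verb Article Noun.
TEMPLATE = ["article", "noun", "verb", "article", "noun"]

def check(sentence: str) -> list[str]:
    """Return a list of violations; an empty list means the sentence complies."""
    words = sentence.rstrip(".").lower().split()
    errors = []
    for w in words:
        if w not in LEXICON:
            errors.append(f"unknown word: {w!r} (not in the approved lexicon)")
    if len(words) != len(TEMPLATE):
        errors.append(f"expected {len(TEMPLATE)} words "
                      f"(Article Noun Verb Article Noun), got {len(words)}")
    else:
        for w, expected in zip(words, TEMPLATE):
            actual = LEXICON.get(w)
            if actual is not None and actual != expected:
                errors.append(f"{w!r} is a {actual}, expected a {expected} here")
    return errors

print(check("A person drives a vehicle."))  # [] -- compliant
print(check("Vehicles were driven."))       # several violations
```

A compliant sentence yields no violations, while free-form text (here, a passive construction with unapproved word forms) is rejected with specific diagnostics rather than silently misparsed.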
These measures guarantee that syntactic analysis yields a single, unambiguous output, critical for downstream semantic processing.[1][18] Restriction levels in CNLs vary from mildly controlled, such as simplified variants with basic grammar tweaks for readability (e.g., suggesting active voice without strict enforcement), to fully formal ones resembling logic syntax with rigid templates and no tolerance for natural language variability. The PENS framework classifies CNLs by precision (P1: imprecise to P5: fixed semantics) and simplicity (S1: complex to S5: very concise), spanning 25 categories across over 100 English-based CNLs, where higher precision correlates with stricter grammatical controls for formal translatability.[1]
Vocabulary and Semantic Controls
Controlled natural languages (CNLs) impose strict vocabulary restrictions to minimize lexical ambiguity and ensure precise communication. These typically involve predefined glossaries limited to 800–2,000 words, where each term is assigned a single, fixed meaning without synonyms to prevent multiple interpretations.[1] Domain-specific terms must be explicitly defined, often through mandatory glosses or ontology-based specifications, allowing extensibility while maintaining semantic consistency.[20] For instance, technical vocabulary is drawn from approved dictionaries that enforce literal usage, excluding idiomatic or figurative expressions.[1] Semantic controls in CNLs further disambiguate meaning by prohibiting metaphors, homonyms, and polysemous words, with predefined part-of-speech assignments for each lexical item to eliminate syntactic-semantic conflicts.[20] Quantifiers are handled through strict scoping rules, such as restricting them to explicit logical forms (e.g., "every" or "at least three") that map directly to formal semantics without nested ambiguities.[20] These controls often integrate with grammatical restrictions to reinforce unambiguous parsing, ensuring that semantic intent aligns with syntactic structure.[1] Key techniques for managing lexicon and semantics include concept hierarchies, which organize terms into inheritance-based structures (e.g., "apple is-a fruit") to avoid redundancy and promote reuse across definitions.[20] Mandatory definitions for all non-primitive terms are required, typically provided as controlled sentences or axiomatic statements, enabling systematic extension without introducing vagueness.[20] Such hierarchies and definitions facilitate formal verification, reducing the risk of inconsistent interpretations in knowledge representation tasks.[1] Evaluation of these controls emphasizes semantic coverage tests, which assess whether the vocabulary and rules prevent unintended meanings through metrics like writability 
(ease of expressing concepts) and understandability (accuracy in comprehension tasks).[20] For example, graph-based experiments compare CNL statements against visual scenarios to measure truth-value alignment, while paraphrase tests quantify the absence of alternative interpretations.[20]
Types and Examples
Classification Frameworks
Controlled natural languages (CNLs) are categorized using various taxonomic frameworks that organize them based on design principles, intended use, and linguistic properties. A seminal survey by Kuhn provides a primary classification scheme, dividing CNLs into three main types: bridge CNLs, which facilitate translation between natural language and formal representations or improve human-machine communication; human-oriented CNLs, which prioritize readability and comprehension for human users; and machine-oriented CNLs, which emphasize unambiguous parsing and formal semantics for computational processing.[1] This classification highlights the spectrum of CNLs as intermediaries between unrestricted natural languages and purely formal logics, with bridge CNLs often serving dual purposes.[1] CNLs also vary in levels of control, ranging from mild restrictions—such as style guides that suggest vocabulary limitations and grammatical preferences without enforcement—to strict controls that impose rigid syntax and semantics equivalent to formal languages.[1] Kuhn's PENS scheme quantifies this variation across four dimensions: Precision (unambiguity in interpretation), Expressiveness (range of representable concepts), Naturalness (closeness to everyday language), and Simplicity (ease of learning and use), each rated on a scale from 1 to 5.[1] Mild CNLs, like those used in technical writing guides, score higher on naturalness and simplicity but lower on precision, while strict ones achieve high precision at the cost of naturalness.[1] Classification dimensions further refine these categories. 
CNLs can be distinguished by purpose, such as specification (describing systems or knowledge) versus querying (retrieving information from databases or knowledge bases).[1] By base language, most documented CNLs derive from English, though variants exist in Japanese (e.g., for ontology engineering) and other languages to accommodate linguistic diversity.[1] Output forms represent another axis, with some CNLs producing restricted text for human consumption and others generating formal outputs like first-order logic or database queries.[1] The ISO 24620 standard, first published as ISO/TS 24620-1:2015, establishes a complementary framework, defining CNLs as subsets of natural languages with controlled grammar and lexicon to minimize ambiguity. It classifies CNLs based on restriction levels—linguistic (e.g., syntax and vocabulary) and extra-linguistic (e.g., domain-specific rules)—and purposes such as enhancing human readability or supporting computational processing. The standard has since been expanded, with ISO 24620-4:2023 providing assessment measures for CNL syntax description and ISO 24620-5:2024 addressing recognition of personal data in free text across languages. These updates guide CNL development across applications, reflecting ongoing standardization efforts as of 2024.[14][21][22] A key trade-off in CNL design is between restriction degree and expressiveness, where stricter controls enhance machine interpretability but limit the concepts that can be naturally conveyed. The following conceptual table illustrates this balance:
| Restriction Degree | Expressiveness | Example Characteristics | Typical Use |
|---|---|---|---|
| Mild | High | Flexible vocabulary, advisory grammar rules | Human communication aids, like style guides for documentation[1] |
| Moderate | Medium | Defined lexicon, partial syntax enforcement | Bridge languages for translation to formal systems[1] |
| Strict | Low | Rigid syntax, formal semantics mapping | Machine-oriented specification and inference[1] |
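The PENS dimensions described earlier lend themselves to a simple typed encoding for tooling and comparison purposes. The sketch below is illustrative; the scores shown are hypothetical placeholders, not published classifications of any actual CNL:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PENS:
    """One point in Kuhn's PENS space; each dimension ranges from 1 to 5."""
    precision: int       # P1 (imprecise) .. P5 (fixed semantics)
    expressiveness: int  # E1 (low) .. E5 (complex concepts)
    naturalness: int     # N1 (formal-looking) .. N5 (like everyday language)
    simplicity: int      # S1 (complex to describe) .. S5 (very concise)

    def __post_init__(self):
        for name, value in vars(self).items():
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be in 1..5, got {value}")

    def label(self) -> str:
        return (f"P{self.precision} E{self.expressiveness} "
                f"N{self.naturalness} S{self.simplicity}")

# Hypothetical score for a mild, style-guide-like CNL: low precision,
# but high naturalness and simplicity (placeholder values).
style_guide = PENS(precision=1, expressiveness=5, naturalness=5, simplicity=4)
print(style_guide.label())  # P1 E5 N5 S4
```

Encodings like this make the mild-versus-strict trade-off in the table above explicit: moving toward higher precision typically lowers the naturalness and simplicity coordinates.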
Notable Controlled Languages
Attempto Controlled English (ACE) is a controlled subset of English developed in the 1990s at the University of Zurich for specifying requirements in software engineering and knowledge representation.[24] It supports the formulation of assertions, queries, and narratives that map deterministically to first-order logic representations, including Prolog output, enabling unambiguous parsing and reasoning for ontology engineering.[25] Key features include restrictions on complex noun phrases, anaphoric references, and definite descriptions to ensure referential clarity, while allowing modality and subordinated clauses for expressive yet precise descriptions.[26] ACE has been applied in ontology editors like Attempto Reasoning Language (RACE) and Semantic Web tools, facilitating domain experts in creating formal knowledge bases without programming expertise.[27] The Process Specification Language (PSL), developed by the National Institute of Standards and Technology (NIST) starting in the mid-1990s, provides a standardized ontology and controlled English interface for describing manufacturing and business processes.[28] Its core theory defines basic process concepts like activities, occurrences, and ordering, with an English-like syntax restricted to declarative sentences for formal interchange among software applications.[29] PSL integrates with XML for serialization and has been formalized as ISO 18629, supporting automated reasoning over process models in design, production, and supply chain domains.[30] This language emphasizes neutrality to bridge disparate manufacturing systems, enabling precise specification of temporal and causal relations without ambiguity.[31] Rabbit is a controlled natural language designed for ontology authoring, particularly translating simple English sentences into OWL descriptions to bridge domain experts and knowledge engineers.[32] Developed around 2008 by the Ordnance Survey, it features a limited set of sentence patterns for 
declarations, axioms, and imports, ensuring high precision in formal representations while remaining readable for non-technical users.[27] In legal contexts, Rabbit has been adapted for semantic wikis and automated analysis tools, such as games simulating legal reasoning from controlled text inputs.[33] Its use cases include creating domain-specific ontologies in fields like geography and law, where it supports iterative refinement of knowledge structures.[34] Examples of multilingual controlled natural languages include Controlled Polish, which extends controlled English principles to Polish syntax within the Grammatical Framework (GF) resource grammar library for parallel multilingual processing.[35] Developed as part of broader CNL efforts in the 2010s, it supports semantic rules representation and machine processing for business and information systems modeling, ensuring cross-lingual consistency in ontology and rule specification.[36] This approach facilitates applications in international knowledge representation, where texts in Polish are parsed equivalently to their English counterparts for formal reasoning.[35]
Processing
Parsing Techniques
Parsing techniques for controlled natural languages (CNLs) exploit the language's grammatical restrictions to enable deterministic syntactic analysis, ensuring that each valid input yields a unique parse without ambiguity or backtracking. This contrasts with unrestricted natural language processing, where parsers must navigate multiple possible interpretations, often requiring nondeterministic methods. By design, CNLs support efficient algorithms such as top-down predictive parsers (e.g., LL(k)) or optimized chart parsers, which predict and construct parse trees incrementally based on limited lookahead. These restrictions, including fixed syntactic structures and avoidance of garden-path sentences, facilitate parsability by eliminating the need for exhaustive exploration of parse forests.[27] A key advantage of CNL parsing is its deterministic nature, where the grammar rules guarantee a single valid derivation path for compliant inputs. Top-down parsers begin from the start symbol and descend through the grammar, matching tokens sequentially with minimal revisions, while chart parsers maintain a dynamic table of partial parses to reuse substructures efficiently. In CNLs, the absence of left-recursion and ambiguity allows these methods to operate without backtracking, achieving linear time complexity O(n) relative to input length n, compared to the cubic O(n³) complexity of general context-free parsers like the Cocke–Younger–Kasami (CYK) algorithm for ambiguous grammars. This efficiency stems from restricting the CNL to a subset of the context-free languages suitable for deterministic recognition, often aligning with LR(1) or LL(1) classes.[27] The Attempto Parsing Engine (APE), a seminal tool for Attempto Controlled English (ACE), exemplifies these techniques through its implementation in SWI-Prolog using Definite Clause Grammars (DCGs).
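The backtracking-free, single-pass behaviour described above can be sketched with a tiny recursive-descent (LL(1)) parser in Python; the two-rule grammar and word lists are invented for illustration and are far simpler than ACE's actual DCG grammar:

```python
# Toy grammar (illustrative only):
#   Sentence -> NP Verb NP '.'
#   NP       -> Article Noun
ARTICLES = {"a", "the", "every"}
NOUNS = {"customer", "clerk", "dog"}
VERBS = {"greets", "sees", "owns"}

class ParseError(Exception):
    pass

def parse_sentence(tokens: list[str]) -> dict:
    """Parse tokens deterministically; each step is decided by one token."""
    pos = 0

    def expect(category: set, label: str) -> str:
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] not in category:
            found = tokens[pos] if pos < len(tokens) else "end of input"
            raise ParseError(f"expected {label}, found {found!r}")
        word = tokens[pos]
        pos += 1
        return word

    def noun_phrase() -> dict:
        return {"article": expect(ARTICLES, "an article"),
                "noun": expect(NOUNS, "a noun")}

    subject = noun_phrase()          # no alternative rules to try, so the
    verb = expect(VERBS, "a verb")   # parser never backtracks and runs in
    obj = noun_phrase()              # time linear in the input length
    expect({"."}, "a full stop")
    if pos != len(tokens):
        raise ParseError("unexpected trailing tokens")
    return {"subject": subject, "verb": verb, "object": obj}

tree = parse_sentence("a customer greets the clerk .".split())
print(tree["verb"])  # greets
```

Because every position admits exactly one grammar rule, a compliant sentence yields exactly one parse tree, and a violation is reported immediately at the offending token.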
DCGs encode the ACE grammar as Prolog clauses augmented with difference lists for token consumption, enabling both syntactic parsing and initial feature-based semantic attachment in a unified framework. APE processes ACE texts to produce a discourse representation structure, handling the language's constraints like mandatory articles and restricted quantification to ensure unambiguous results. Its open-source availability under the GNU Lesser General Public License has facilitated integration into various knowledge representation systems.[37][38][39] Error handling in CNL parsers emphasizes user-friendly feedback to enforce compliance, often through diagnostic messages that pinpoint violations. APE, for example, generates detailed warnings and errors via a message container system, logging issues like unknown words, syntactic mismatches, or semantic inconsistencies to standard error output while suggesting resolutions such as word replacements or rephrasing. This approach includes highlighting problematic phrases and providing context-specific guidance, reducing the cognitive load for authors iterating on CNL texts. Such mechanisms are integral to the parsability enabled by CNL restrictions, promoting iterative refinement without derailing the overall process.[37][40]
Encoding and Semantic Representation
In controlled natural languages (CNLs), the encoding process transforms parsed syntactic structures into formal semantic representations, enabling computational reasoning and interoperability with knowledge bases. A primary method involves mapping CNL sentences to first-order logic (FOL), where declarative statements are converted into logical formulas that capture quantifiers, predicates, and relations unambiguously. For instance, in Attempto Controlled English (ACE), the sentence "Every dog barks" is mapped to the FOL formula \forall x (Dog(x) \rightarrow Barks(x)), ensuring precise quantification over individuals.[39] Similarly, mappings to description logics (DL) support subclass hierarchies and property restrictions, as seen in systems like PENG-D, where "If X is a labrador then X is a dog" translates to the DL axiom Labrador \sqsubseteq Dog.[41] Integration with Semantic Web standards further standardizes these representations, allowing CNL outputs to be serialized in formats like RDF and OWL for ontology engineering. In ACE, parsed texts are translated into OWL DL axioms, facilitating bidirectional exchange with RDF triples; for example, "Nic is a human" becomes the RDF assertion nic rdf:type Human or the OWL class assertion Nic : Human.[39] PENG-D extends this by generating RDF Schema (RDFS) and OWL Lite structures from CNL, ensuring decidable reasoning within the DL-safe rules paradigm, such as encoding domain constraints like "If X has Y as dog then X is a human" as the domain axiom \exists hasDog.\top \sqsubseteq Human in OWL.[41] These encodings support XML-based serialization for web-scale knowledge graphs, promoting reuse in tools like ontology editors.
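As a toy illustration of such mappings, the sketch below translates one invented pattern, "Every N V.", into an FOL-style string; it is a deliberately minimal stand-in for a real CNL translation pipeline such as APE, not the actual ACE mapping:

```python
import re

def every_to_fol(sentence: str) -> str:
    """Map 'Every <noun> <verb>.' to a universally quantified FOL string."""
    m = re.fullmatch(r"Every (\w+) (\w+)\.", sentence)
    if m is None:
        raise ValueError("sentence does not match the 'Every N V.' pattern")
    noun, verb = m.group(1).capitalize(), m.group(2).capitalize()
    return f"forall x ({noun}(x) -> {verb}(x))"

print(every_to_fol("Every dog barks."))
# forall x (Dog(x) -> Barks(x))
```

Because the CNL admits only fixed patterns like this one, the mapping from sentence to formula is total and deterministic over compliant input, which is what makes the downstream reasoning reliable.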
Discourse representation structures (DRS) play a crucial role in handling discourse-level semantics, particularly anaphora resolution across sentences in CNL texts. In ACE, the Attempto Parsing Engine produces DRS as reified FOL variants, using discourse referents to link entities; for example, the text "A customer X greets a clerk. The clerk is happy. X is glad" yields a DRS with predicates like predicate(greet, customer, clerk) and predicate(be, customer, glad), where X resolves the anaphoric reference to the customer without explicit variable binding in the output.[42] This structure extends standard FOL to accommodate plurals, generalized quantifiers, and context, enabling robust semantic integration for multi-sentence CNL inputs.
Bidirectionality enhances verification by generating CNL from formal representations, allowing users to check fidelity between natural-language inputs and logical outputs. In ACE, tools like AceView and the OWL-ACE mapping support round-trip translation, where an OWL DL axiom such as Person \sqsubseteq \exists hasChild.Child is rendered as "Every person has a child," aiding validation in ontology development.[43] PENG Light employs bidirectional grammars for similar purposes, parsing CNL to logic and inversely generating text from Horn clauses, which supports iterative refinement in knowledge representation tasks.[44]
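The round-trip idea can be sketched with one invented verbalization pattern: a subclass axiom with an existential restriction is rendered as an "Every ... has a ..." sentence and parsed back for comparison. Both the sentence pattern and the property-naming convention (properties named hasNoun) are hypothetical, not the actual OWL-ACE mapping:

```python
import re

def verbalize(sub: str, prop: str) -> str:
    """Render the axiom Sub ⊑ ∃prop.Filler as a CNL-style sentence."""
    noun = prop.removeprefix("has").lower()   # hasChild -> "child"
    return f"Every {sub.lower()} has a {noun}."

def parse(sentence: str) -> tuple[str, str]:
    """Parse the sentence back into (Sub, prop) for round-trip checking."""
    m = re.fullmatch(r"Every (\w+) has a (\w+)\.", sentence)
    if m is None:
        raise ValueError("not an 'Every ... has a ...' sentence")
    return m.group(1).capitalize(), "has" + m.group(2).capitalize()

sentence = verbalize("Person", "hasChild")
print(sentence)         # Every person has a child.
print(parse(sentence))  # ('Person', 'hasChild')
```

If the parsed axiom differs from the original, the verbalization (or a user's edit to the sentence) has lost information, which is exactly what such round-trip checks are designed to surface.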