Natural language understanding

Natural language understanding (NLU) is a subfield of natural language processing (NLP) within artificial intelligence that enables machines to interpret and derive meaning from human language. It encompasses semantic analysis, which grasps intent, context, and relationships, as well as pragmatic analysis, which accounts for how language is used in real-world scenarios. This process addresses the inherent ambiguities of language, such as multiple word senses or contextual dependencies, allowing systems to perform tasks like extracting structured information from unstructured text. NLU originated in early AI efforts, including Alan Turing's 1950 conceptualization of machine intelligence through language interaction and systems like ELIZA in the 1960s, and has since evolved to incorporate lexical resources such as WordNet for sense disambiguation.

Key components of NLU include word sense disambiguation (WSD), which resolves polysemous words using contextual clues and achieves over 70% accuracy with long short-term memory (LSTM) models, and semantic role labeling (SRL), which identifies roles like agent or patient in sentences via frameworks such as PropBank or FrameNet. Other core tasks include semantic parsing, which maps natural language to formal representations like Abstract Meaning Representation (AMR) for reasoning, and recognizing textual entailment (RTE), which determines whether one text implies another. Challenges persist in handling incomplete information, which requires integrating background knowledge (e.g., commonsense facts) and contextual data, and in scaling to multilingual or low-resource settings.

Advancements in large language models (LLMs), such as those based on the transformer architecture, have transformed NLU by shifting paradigms toward generative approaches, in which tasks such as dialogue state tracking are handled via zero-shot or few-shot prompting without task-specific fine-tuning. For instance, LLMs perform well on multi-intent detection by leveraging pre-trained knowledge, outperforming traditional supervised methods in generalization across domains.
However, limitations remain in numerical reasoning and factual accuracy, prompting ongoing research into multimodal integration and efficient reasoning mechanisms.

Fundamentals and Scope

Definition and Objectives

Natural language understanding (NLU) is a subfield of natural language processing (NLP) within artificial intelligence focused on enabling computers to interpret and comprehend the meaning, intent, and context behind human language inputs, going beyond surface-level syntactic processing to grasp semantic and pragmatic nuances. Unlike basic NLP tasks such as tokenization, NLU emphasizes deriving actionable insights from unstructured text or speech, for example by resolving ambiguities and inferring implied relationships.

The core objectives of NLU include intent recognition, which identifies the user's goal or purpose in a communication; entity extraction, which locates and classifies key elements like names, dates, or locations within the input; relation inference, which determines connections between entities (e.g., "causes" or "part of"); and discourse understanding, which analyzes how sentences cohere into larger narratives or conversations. These objectives aim to bridge the gap between human expressive flexibility and machine rigidity, allowing systems to respond appropriately in real-world scenarios.

Representative NLU tasks illustrate these objectives in practice: question answering, where systems retrieve and synthesize relevant information to respond to queries; sentiment analysis, which discerns emotional tone or opinion from text; and dialogue management, which maintains context across multi-turn interactions in chatbots or virtual assistants. For instance, in sentiment analysis, NLU might classify a product review as "positive" by integrating word meanings with contextual sarcasm detection. Early systems like Terry Winograd's SHRDLU (1970) demonstrated the ability to interpret English commands for block manipulation in a virtual world. NLU overlaps with the broader field of natural language processing as a specialized component dedicated to meaning interpretation rather than general text manipulation.

Relation to Natural Language Processing

Natural language processing (NLP) is the overarching discipline within computer science and artificial intelligence that enables computers to process and analyze human language, encompassing a wide range of tasks from syntactic analysis to semantic interpretation and pragmatic inference. Within this framework, natural language understanding (NLU) is a specialized subfield focused on deeper comprehension of meaning, intent, and context, going beyond structural manipulation to infer human-like understanding. This positioning highlights NLU's role in turning raw data into actionable insights, such as recognizing user intent in conversational systems.

A primary distinction lies in the scope and depth of processing. NLP incorporates foundational, surface-level operations like tokenization, part-of-speech tagging, and named entity recognition, which prepare text for further analysis without necessarily extracting underlying semantics. NLU builds on these by emphasizing semantic and pragmatic layers to derive meaning, resolving ambiguities and contextual nuances that surface tasks overlook. For instance, while NLP might segment a sentence into words and tags, NLU interprets the implied intent, such as distinguishing a request for information from a command.

Despite these differences, NLU and NLP are closely coupled: NLU depends heavily on NLP pipelines for preprocessing, which lets NLU systems use standardized tools for initial data structuring before applying comprehension models. The relationship has evolved significantly with advances in deep learning, where end-to-end architectures like transformers fold preprocessing directly into NLU tasks, reducing reliance on modular pipelines and improving performance on benchmarks such as GLUE. This shift has brought the fields closer together, allowing modern NLU systems to process raw text holistically for better accuracy in applications like dialogue systems.

Historical Development

Early Foundations (1950s–1980s)

The origins of natural language understanding (NLU) in the 1950s were closely tied to the emergence of artificial intelligence (AI), with foundational ideas exploring whether machines could simulate human-like comprehension of language. Alan Turing's 1950 paper introduced the imitation game, later known as the Turing test, as a criterion for machine intelligence, positing that a computer could be deemed to "think" if it could engage in a text-based conversation indistinguishable from a human's, thereby highlighting the need for systems to process and generate natural language convincingly. Concurrently, early efforts in machine translation laid practical groundwork for NLU. The 1954 Georgetown-IBM experiment gave the first public demonstration of automatic translation, rendering 60 Russian sentences into English on an IBM 701 computer. It successfully translated simple chemical and mathematical phrases in a controlled domain, though it underscored the limitations of direct word-for-word mapping without deeper semantic grasp.

In the 1960s and 1970s, symbolic AI approaches advanced NLU through structured representations, emphasizing rule-based systems within constrained environments. Joseph Weizenbaum's ELIZA, developed in 1966 at MIT, was an early chatbot that used pattern matching and scripted responses to simulate conversation in the style of a Rogerian psychotherapist; it demonstrated basic conversational interaction but relied on keyword recognition without true understanding. Terry Winograd's SHRDLU system, developed between 1968 and 1970 at MIT, enabled a computer to understand and execute English commands in a simulated "blocks world," where it parsed sentences, maintained a model of the scene, and inferred actions like "pick up a big red block," achieving robust performance in this limited domain through procedural semantics and grammatical rules. This work demonstrated how explicit programming of syntax and semantics could resolve referential ambiguities, such as identifying objects based on descriptions, but it relied heavily on the system's predefined world model to avoid broader linguistic variability.
By the 1980s, knowledge representation techniques further refined NLU by incorporating structured schemas to handle inference and context. Marvin Minsky's 1974 framework of "frames," expanded in subsequent research during the decade, proposed organizing knowledge into hierarchical data structures with default values and slots for situational understanding, so that default expectations about a familiar scenario can be used to interpret incomplete descriptions in language input. Similarly, Roger Schank's conceptual dependency theory, introduced in 1972 and extended through scripts in the late 1970s, modeled understanding as parsing text into primitive acts (e.g., ATRANS for transferring possession) linked in causal chains, with scripts as stereotypical sequences used to predict and fill gaps in narratives, like inferring steps in a "restaurant script" from partial user queries.

These early systems addressed key NLU challenges, such as lexical ambiguity, through hand-crafted logical rules and semantics confined to narrow domains. This enabled precise resolution in toy worlds but revealed scalability issues for open-ended language. The symbolic paradigm paved the way for later shifts toward statistical methods in the 1990s.

Key Milestones (1990s–Present)

The 1990s heralded the rise of statistical natural language processing (NLP), marking a pivotal shift from rigid rule-based systems to probabilistic, data-driven methods that enhanced natural language understanding (NLU) by modeling linguistic uncertainty more effectively. Hidden Markov models (HMMs) emerged as a cornerstone for foundational NLU tasks like part-of-speech (POS) tagging, where they probabilistically assigned tags to words based on sequential dependencies in annotated corpora, achieving accuracies exceeding 95% on standard datasets such as the Penn Treebank. This statistical paradigm influenced broader NLU components, including syntactic analysis, by enabling scalable training on large text corpora. A landmark contribution was IBM's statistical machine translation framework, which introduced alignment models (IBM Models 1–5) to estimate translation probabilities from bilingual data, laying the groundwork for probabilistic semantic alignment in NLU tasks like cross-lingual inference.

In the 2000s, corpus-based lexical resources propelled semantic interpretation in NLU, providing structured knowledge bases for disambiguating word meanings and roles. WordNet, originally conceptualized in the late 1980s, underwent major expansions during this decade, with version 3.0 in 2006 containing over 117,000 synsets that link nouns, verbs, adjectives, and adverbs through semantic relations like hypernymy and meronymy, facilitating applications in semantic parsing and word sense disambiguation. Complementing this, FrameNet, initiated in 1997 at UC Berkeley, advanced semantic role labeling (SRL) by annotating sentences with frame-semantic structures that evoke event scenarios, enabling systems to identify who did what to whom. A key methodological breakthrough came with supervised approaches to automatic SRL, which attained an F1 score of around 63% for the full task on FrameNet corpora.

The 2010s saw the deep learning revolution transform NLU through neural architectures that learned hierarchical representations from vast unlabeled data.
Word2vec, introduced in 2013, popularized distributed word embeddings by training shallow neural networks on skip-gram or continuous bag-of-words objectives, capturing semantic analogies (e.g., king - man + woman ≈ queen) in low-dimensional vectors and boosting downstream NLU performance. Sequence-to-sequence (seq2seq) models, adapted from machine translation to dialogue systems around 2015, used encoder-decoder LSTMs to generate coherent responses from input contexts, pioneering end-to-end trainable conversational agents with applications in chatbots.

From the late 2010s into the 2020s, massive pre-trained language models amplified these advances, emphasizing contextual and generative NLU at unprecedented scales. BERT, launched in 2018, pioneered bidirectional transformer-based pre-training on masked language modeling and next-sentence prediction, yielding superior contextual understanding and topping the GLUE benchmark with an average score of 80.5%. The GPT series progressed rapidly: GPT-1 in 2018 established autoregressive pre-training, GPT-3 in 2020 scaled to 175 billion parameters for few-shot NLU across reasoning and generation, and GPT-4 in 2023 introduced multimodal capabilities with improved accuracy on complex tasks like visual question answering. SuperGLUE, released in 2019 as a more rigorous extension of GLUE, incorporated harder tasks like coreference resolution and causal reasoning, driving model improvements and revealing persistent gaps in robust NLU, with top scores reaching around 90% by 2021.

Core Components

Syntactic Parsing

Syntactic parsing, a core component of natural language understanding, is the computational analysis of a sentence to uncover its grammatical structure, typically represented as a hierarchical parse tree or a dependency graph that shows the relationships between words and phrases. This process breaks the linear sequence of words into constituents or dependencies, identifying elements such as noun phrases, verb phrases, and their syntactic roles without regard to meaning. By resolving structural ambiguities, such as the attachment decision in a sentence like "I saw the man with the telescope," syntactic parsing establishes the foundational framework for higher-level language processing.

Key techniques in syntactic parsing include context-free grammars (CFGs), which formalize sentence structures using production rules in which non-terminals expand into sequences of terminals and non-terminals, as originally proposed by Noam Chomsky in his theory of generative grammar. Probabilistic context-free grammars (PCFGs) extend CFGs by attaching a probability to each rule, allowing parsers to select the most likely structure among several candidates based on training data from corpora. Dependency parsing, an alternative approach, models syntactic structure as directed edges between words, emphasizing head-dependent relations rather than phrase boundaries, and has gained prominence for its efficiency in handling free-word-order languages.

Prominent algorithms for syntactic parsing include the Earley parser, introduced by Jay Earley in 1970, which uses dynamic programming to handle any CFG in O(n^3) worst-case time and is particularly effective for ambiguous grammars. Shift-reduce parsing, a bottom-up method, incrementally builds the parse tree by shifting words onto a stack and reducing them according to grammar rules; it runs in linear time in deterministic variants and forms the basis for many practical systems. Parser performance is commonly evaluated using the PARSEVAL metrics, developed by Black et al. in 1991, which compute precision and recall over bracketed constituents, together with a count of crossing brackets, to assess structural accuracy without penalizing minor labeling discrepancies.

In natural language understanding, syntactic parsing is essential because it resolves ambiguities in phrase structure, providing a clear hierarchical or relational scaffold to which semantic roles and meanings are attached in subsequent processing stages.
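The attachment ambiguity discussed in this section can be made concrete with a small chart parser. The sketch below uses the CKY algorithm (a different dynamic-programming method from the Earley parser named above, chosen here only because it is short) to count the distinct parses a toy grammar assigns to a sentence. The grammar, lexicon, and function name are illustrative assumptions, not taken from any treebank or library.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (hypothetical, for illustration only).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
LEXICON = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}


def count_parses(words, start="S"):
    """Count distinct derivations of `words` from `start` with CKY."""
    n = len(words)
    # chart[(i, j)][A] = number of ways category A derives words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for tag in LEXICON.get(word, []):
            chart[(i, i + 1)][tag] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # split point
                for parent, left, right in BINARY_RULES:
                    left_count = chart[(i, k)].get(left, 0)
                    right_count = chart[(k, j)].get(right, 0)
                    if left_count and right_count:
                        chart[(i, j)][parent] += left_count * right_count
    return chart[(0, n)].get(start, 0)


if __name__ == "__main__":
    sentence = "I saw the man with the telescope".split()
    print(count_parses(sentence))  # one parse per attachment site
```

Under this toy grammar the sentence receives two parses, one for each attachment of the prepositional phrase; a PCFG would additionally weight the two rules so that a parser could prefer one reading.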

Semantic Interpretation

Semantic interpretation in natural language understanding (NLU) is the process of deriving the literal meaning of a sentence from its syntactic structure, focusing on how words and phrases contribute to overall propositional content. This stage transforms parsed syntactic representations into semantic ones, capturing relationships between entities, events, and attributes without considering extralinguistic context or speaker intent. It takes syntactic inputs, such as dependency trees or constituency parses, as prerequisites for identifying meaningful units and their connections.

Key processes in semantic interpretation include word sense disambiguation (WSD), named entity recognition (NER), and semantic role labeling (SRL). WSD resolves ambiguity in polysemous words by selecting the most appropriate sense based on contextual evidence, as pioneered by the Lesk algorithm, which measures the overlap between a word's dictionary definitions and its surrounding context. For example, in the sentence "The bank was flooded after the storm," WSD distinguishes the financial institution sense from the riverbank sense using co-occurring terms like "flooded" and "storm." NER identifies and classifies proper nouns or noun phrases into predefined categories such as persons, organizations, or locations, enabling the recognition of referential expressions; early maximum entropy models for NER integrated diverse features like part-of-speech tags and capitalization to label entities in text. SRL assigns thematic roles (e.g., agent, patient, theme) to constituents relative to a predicate verb, delineating who does what to whom; the PropBank framework annotates corpora with such roles to support machine learning systems that predict argument structures from parses.

Semantic representations often employ formalisms like lambda calculus for compositional meaning assembly, as in Montague grammar, where expressions are translated into logical forms using lambda abstractions to handle quantification and predicate modification systematically.
For instance, the phrase "every dog chases a cat" is composed by applying lambda terms to build higher-order functions that denote generalized quantifiers over predicates. Alternatively, first-order logic represents predicate-argument structures through atomic formulas, where predicates denote relations and arguments fill roles, facilitating logical inference over events; in SRL, this appears as formulas like chase(dog, cat), capturing the core propositional content derived from syntactic dependencies.

A prominent technique in semantic interpretation is distributional semantics, which models word meanings in vector spaces based on the hypothesis that linguistic items appearing in similar contexts share semantic properties. Vectors are constructed from co-occurrence statistics in corpora, allowing relatedness to be estimated computationally. For WSD, cosine similarity measures the angle between the context vector and the vector for each candidate sense, \cos(\theta) = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\| \, \|\mathbf{B}\|}, and the sense whose vector aligns most closely with the sentence context is selected.

Challenges in semantic interpretation arise from phenomena like polysemy, where a word has multiple related senses, and hyponymy, which involves hierarchical subtype relations that complicate generalization. Resources such as WordNet address these by organizing senses into synsets linked by hypernymy (e.g., "dog" as a hyponym of "animal"), providing lexical knowledge for disambiguation and role assignment while modeling semantic hierarchies to mitigate ambiguity in interpretation.
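The cosine-based sense selection described in this section can be sketched in a few lines. This is a minimal illustration under strong assumptions: the sense profiles are hand-written bags of words standing in for corpus-derived vectors, and the sense labels and the `disambiguate` helper are hypothetical names.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sense profiles; a real system would estimate these
# from co-occurrence statistics in a sense-annotated corpus.
SENSE_PROFILES = {
    "bank/finance": Counter("money loan deposit account teller branch".split()),
    "bank/river": Counter("river water shore flooded storm mud".split()),
}


def disambiguate(context_words, profiles=SENSE_PROFILES):
    """Return the best-scoring sense and the score of every sense."""
    context = Counter(context_words)
    scores = {sense: cosine(context, vec) for sense, vec in profiles.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    sense, scores = disambiguate("the bank was flooded after the storm".split())
    print(sense, scores)
```

With these toy profiles the context words "flooded" and "storm" overlap only with the river profile, so that sense scores higher; the outcome depends entirely on the assumed profiles.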

Pragmatic and Contextual Analysis

Pragmatics in natural language understanding (NLU) extends beyond literal semantic meaning to incorporate the intended implications, assumptions, and social functions of utterances in context. Central to this is the concept of implicature, where speakers convey meaning indirectly by adhering to or flouting conversational principles, as outlined in Grice's cooperative principle and its four maxims: quantity (provide sufficient but not excessive information), quality (be truthful), relation (be relevant), and manner (be clear and orderly). These maxims enable NLU systems to infer unspoken intentions by modeling how rational communicators balance informativeness and efficiency in dialogue. Presupposition, another key pragmatic element, involves background assumptions that utterances take for granted, like the existence of a unique king in "The king of France is bald," which NLU must detect to avoid misinterpretation. Speech acts, formalized by Searle, classify utterances by their performative force (such as asserting, questioning, or directing), allowing NLU to discern intent from form, for instance by interpreting "Can you pass the salt?" as a request rather than a query about ability.

Key techniques for pragmatic analysis include coreference resolution, which links pronouns or noun phrases to their antecedents across sentences, enabling coherent discourse interpretation. Discourse analysis builds on this by examining relations between utterances, such as elaboration or contrast, to construct a global structure that reveals argumentative flow or narrative progression in text. Belief modeling further refines pragmatics by representing the speaker's or hearer's mental states, including desires and knowledge, to predict inferences like irony or deception based on inconsistent beliefs.

In conversational AI, contextual models like dialogue state tracking maintain a dynamic representation of user goals, slot values (e.g., date or location), and prior turns to handle multi-turn interactions effectively.
For example, anaphora resolution often employs salience models, such as those in centering theory, which rank entities by attentional focus (prioritizing subjects or recent mentions) to resolve pronouns like "it" in "John entered the room. It was dark," linking "it" to "the room" based on discourse prominence. These models draw on semantic foundations for entity identification but emphasize evolving context over static meanings.

Integration with world knowledge enhances pragmatic inference through structured ontologies, exemplified by the Cyc project, which encodes millions of commonsense axioms to support reasoning about everyday scenarios, such as inferring that "John stopped running" presupposes he was running previously. Cyc's hierarchical ontology facilitates abduction-like inferences, connecting utterance context to broader real-world plausibility in NLU applications.
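The salience-based pronoun resolution described above can be illustrated with a toy ranking heuristic. This is a rough sketch and not an implementation of centering theory: the role weights, the recency discount, the animacy filter, and all names are assumptions chosen only to reproduce the "John entered the room. It was dark" example.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    text: str
    role: str        # "subject", "object", or "other"
    sentence: int    # index of the sentence containing the mention
    animate: bool


# Hypothetical salience weights: subjects outrank objects, which outrank the rest.
ROLE_WEIGHT = {"subject": 3.0, "object": 2.0, "other": 1.0}
# Crude agreement constraint: which pronouns require an animate antecedent.
PRONOUN_ANIMACY = {"he": True, "she": True, "it": False}


def resolve(pronoun, pronoun_sentence, mentions):
    """Pick the most salient earlier mention that agrees with the pronoun."""
    wanted = PRONOUN_ANIMACY[pronoun.lower()]
    best, best_score = None, float("-inf")
    for mention in mentions:
        if mention.animate != wanted or mention.sentence > pronoun_sentence:
            continue  # fails agreement or appears after the pronoun
        recency = 1.0 / (1 + pronoun_sentence - mention.sentence)
        score = ROLE_WEIGHT[mention.role] * recency
        if score > best_score:
            best, best_score = mention, score
    return best


if __name__ == "__main__":
    mentions = [
        Mention("John", "subject", 0, True),
        Mention("the room", "object", 0, False),
    ]
    print(resolve("It", 1, mentions).text)
```

Here the agreement filter removes "John" as a candidate for "it", and the salience score then ranks whatever candidates remain; real systems combine many more features and learn the weights from annotated data.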

Architectures and Techniques

Rule-Based Approaches

Rule-based approaches to natural language understanding (NLU) rely on hand-crafted grammars, ontologies, and inference engines to enable deterministic processing of language inputs. These systems employ explicit linguistic rules derived from formal grammars to parse syntax and map it to semantic representations, often using symbolic structures like semantic networks or frames stored in ontologies to represent domain knowledge. Inference engines then apply logical rules to derive meaning, ensuring predictable outputs based on predefined procedures rather than learned patterns. This methodology, prominent in early AI research, emphasizes transparency and control, allowing developers to encode expert linguistic and domain-specific knowledge directly into the system.

A seminal historical example is the LUNAR system, developed in the 1970s, which supported question answering over a database of lunar rock samples. LUNAR used an Augmented Transition Network (ATN) parser augmented with procedural semantics to interpret natural language queries, such as "How much aluminum oxide is there in the high titanium basalts?", and generate database retrievals with high accuracy in its restricted geology domain. Similarly, formalisms like Definite Clause Grammars (DCGs), introduced in the late 1970s, extended context-free grammars within logic programming frameworks to handle both syntactic parsing and semantic interpretation through declarative rules. DCGs, implemented in Prolog, allowed efficient top-down parsing with built-in support for context-dependent computations, making them a cornerstone for rule-based NLU prototypes.

These approaches excel at providing high precision within narrow, well-defined domains, where exhaustive rule coverage ensures reliable interpretation without the need for large training data. For instance, LUNAR performed very well on controlled queries because its grammar and semantic rules were tailored to the domain, enabling deep semantic analysis in specialized expert systems.
However, their limitations include poor scalability to broader language use, as expanding rule sets becomes labor-intensive and error-prone when handling the vast variability of natural language. They are also brittle in resolving ambiguity, such as polysemous words or elliptical constructions, because deterministic rules lack mechanisms for probabilistic reasoning.

The decline of pure rule-based NLU stemmed from these inherent constraints, particularly the inability to generalize beyond hand-engineered domains without incorporating probabilistic elements to manage linguistic ambiguity and real-world variability. By the late 1980s, the maintenance burden of ever-growing rule bases and failures in open-ended scenarios led to a paradigm shift, though rule-based components persisted in hybrid systems for targeted tasks.
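Both the precision and the brittleness of hand-written rules show up even in a very small example. The sketch below maps queries to semantic frames using two regular-expression rules; it is not an ATN or a DCG, and the rule set, frame keys, and `interpret` function are hypothetical names introduced for illustration.

```python
import re

# Hypothetical hand-written rules: each pairs a pattern with a frame builder.
RULES = [
    (
        re.compile(
            r"^how much (?P<substance>[\w ]+?) is there in (?P<sample>[\w ]+?)\??$",
            re.IGNORECASE,
        ),
        lambda m: {
            "intent": "query_amount",
            "substance": m["substance"],
            "sample": m["sample"],
        },
    ),
    (
        re.compile(
            r"^list (?:all )?samples containing (?P<substance>[\w ]+?)\.?$",
            re.IGNORECASE,
        ),
        lambda m: {"intent": "list_samples", "substance": m["substance"]},
    ),
]


def interpret(utterance):
    """Return a semantic frame for the first matching rule, or None."""
    text = utterance.strip()
    for pattern, build_frame in RULES:
        match = pattern.match(text)
        if match:
            return build_frame(match)
    return None  # no rule covers this phrasing


if __name__ == "__main__":
    print(interpret("How much aluminum oxide is there in the high titanium basalts?"))
    # A paraphrase outside the rule set is simply not understood:
    print(interpret("What is the aluminum oxide content of the basalts?"))
```

The first query is mapped deterministically to a frame, while the paraphrase returns `None`, which illustrates why covering open-ended language with rules alone becomes labor-intensive.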

Statistical and Probabilistic Methods

Statistical and probabilistic methods in natural language understanding (NLU) rely on data-driven approaches to model the inherent uncertainties and ambiguities in natural language, estimating probabilities from large corpora to predict structures and meanings. These methods emerged as alternatives to rule-based systems, leveraging machine learning to handle variability in syntax, semantics, and pragmatics. By treating language as a probabilistic process, they enable robust inference over possible interpretations, for example when resolving ambiguities or predicting sequences.

Foundational to these techniques is Bayesian inference, which addresses ambiguity resolution by computing posterior probabilities over possible interpretations given observed data. For instance, in word sense disambiguation, Bayesian methods calculate the likelihood of a sense given contextual evidence, updating priors based on corpus statistics to select the most probable resolution. This approach underpins tasks like part-of-speech tagging, where ambiguities in word categories are resolved through conditional probabilities derived from training data.

N-gram models support sequence probability estimation, approximating the likelihood of word sequences under the Markov assumption that the probability of a word depends only on a limited preceding context. Developed from early work in information theory, n-grams use maximum likelihood estimates from corpora, such as the bigram estimate P(w_n | w_{n-1}) = \frac{C(w_{n-1} w_n)}{C(w_{n-1})}, where C(·) is a corpus count, to model fluency and predict continuations, with smoothing techniques mitigating sparse data issues.

Key methods include conditional random fields (CRFs), which excel in structured prediction tasks like named entity recognition (NER) by modeling conditional probabilities over label sequences given input observations. Introduced as an extension of probabilistic graphical models, CRFs avoid the independence assumptions of hidden Markov models and incorporate diverse features such as word shapes and surrounding contexts to achieve high accuracy; for example, on the CoNLL-2003 English NER dataset, CRFs outperformed generative alternatives.
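The bigram maximum likelihood estimate given earlier in this section, P(w_n | w_{n-1}) = C(w_{n-1} w_n) / C(w_{n-1}), can be computed directly from counts. The sketch below is a minimal illustration on a three-sentence toy corpus; the function names, the sentence-boundary markers, and the optional add-k smoothing parameters are assumptions introduced here.

```python
from collections import Counter


def train_bigram(sentences):
    """Collect context counts C(w_{n-1}) and bigram counts C(w_{n-1} w_n)."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent pairs
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams, k=0.0, vocab_size=0):
    """P(word | prev). k=0 gives the MLE; k>0 applies add-k smoothing."""
    numerator = bigrams[(prev, word)] + k
    denominator = unigrams[prev] + k * vocab_size
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    corpus = ["the dog barks", "the dog sleeps", "the cat sleeps"]
    uni, bi = train_bigram(corpus)
    print(bigram_prob("the", "dog", uni, bi))   # MLE: C(the dog) / C(the)
    print(bigram_prob("dog", "cat", uni, bi))   # unseen bigram under MLE
    # Add-one smoothing over the 6 predictable word types in this toy corpus.
    print(bigram_prob("dog", "cat", uni, bi, k=1.0, vocab_size=6))
```

The unseen bigram gets probability zero under the plain estimate, which is the sparse-data problem that smoothing is meant to address.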
Maximum entropy models, another cornerstone, estimate probabilities by maximizing entropy subject to observed constraints, enabling flexible feature integration for tasks like part-of-speech tagging and translation disambiguation. Seminal work demonstrated their efficacy in bilingual sense disambiguation, improving accuracy over simpler models by weighing contextual features via iterative scaling.

A representative application is the naive Bayes classifier for intent detection in dialogue systems, which computes the probability of an intent class c given an input utterance d as P(c|d) = \frac{P(d|c) P(c)}{P(d)}. Here, P(d|c) is the likelihood of the input features under the class, often computed under a feature independence assumption, while P(c) is the class prior; this simplifies computation and performs well on sparse data.

Advancements in these methods trace from 1990s applications in speech recognition, where n-gram models enhanced error correction, to statistical machine translation in the 2000s via the noisy channel model. This model treats translation as recovering a source sentence S from a noisy observed target T, maximizing P(S|T) ∝ P(T|S) P(S) using alignment and fertility parameters estimated via expectation-maximization from parallel corpora. It yielded pioneering results on held-out data and established statistical methods as a dominant paradigm.
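The naive Bayes decision rule above can be sketched as a small multinomial classifier. The version below works in log space and drops P(d), which is the same for every class and so does not affect the argmax. The class name, the add-one smoothing, the handling of unseen words, and the four training utterances are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesIntent:
    """Multinomial naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, examples):
        """examples: iterable of (utterance, intent) pairs."""
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, intent in examples:
            tokens = text.lower().split()
            self.class_counts[intent] += 1
            self.word_counts[intent].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, text):
        """Return log P(c) + sum of log P(token | c) for each intent c."""
        total = sum(self.class_counts.values())
        scores = {}
        for intent, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[intent].values()) + len(self.vocab)
            for token in text.lower().split():
                if token in self.vocab:      # words never seen in training are skipped
                    score += math.log((self.word_counts[intent][token] + 1) / denom)
            scores[intent] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


if __name__ == "__main__":
    training = [
        ("play some jazz music", "play_music"),
        ("play the next song", "play_music"),
        ("what is the weather today", "get_weather"),
        ("will it rain tomorrow", "get_weather"),
    ]
    model = NaiveBayesIntent().fit(training)
    print(model.predict("play a song"))
    print(model.predict("will it rain today"))
```

The independence assumption means each token contributes its own factor to the likelihood, which keeps training to simple counting and explains why the method remains usable with very little data.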

Neural Network-Based Systems

Neural network-based systems represent a significant shift in natural language understanding (NLU), emphasizing connectionist models that process language through layered, distributed representations rather than explicit rules or probabilities. Recurrent neural networks (RNNs) were adapted for sequential data like text, enabling models to capture dependencies in language modeling tasks. A seminal advancement came with Tomas Mikolov's 2010 work on RNN-based language models, which demonstrated substantial reductions in perplexity (up to 50% compared to n-gram models) on benchmarks by learning continuous representations of words and contexts. These models built on statistical methods by leveraging large corpora for training but introduced dynamic hidden states to handle variable-length inputs, paving the way for deeper integration in NLU.

To mitigate the limitations of standard RNNs, long short-term memory (LSTM) units were introduced in 1997 by Sepp Hochreiter and Jürgen Schmidhuber, incorporating gating mechanisms to regulate information flow and preserve long-term dependencies in sequences. LSTMs became foundational in 2000s–2010s NLU applications, where they outperformed earlier architectures by maintaining context over longer spans, as evidenced by improved accuracy on datasets like ATIS for intent classification. Word embeddings further enhanced these systems; the GloVe model, developed in 2014 by Jeffrey Pennington and colleagues at Stanford, generated static vector representations from global word co-occurrence statistics, capturing semantic relationships like analogies (e.g., "king" - "man" + "woman" ≈ "queen") with lower dimensionality than prior methods.

Attention mechanisms, introduced in 2014 by Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio for neural machine translation, augmented RNN and LSTM architectures by allowing models to weigh relevant parts of input sequences dynamically.
This basic form computes alignment weights as \alpha_t = \operatorname{softmax}(e_t), where e_t holds the compatibility scores between input positions and the output position at step t, enabling better handling of variable alignments without a fixed-length encoding bottleneck. In practice, end-to-end NLU pipelines using these components powered early neural chatbots, such as Microsoft's XiaoIce, launched in 2014, which employed LSTM-based models for empathetic responses, achieving over 40% engagement rates in user interactions by directly mapping user queries to replies.

Despite these advances, neural network-based systems faced inherent challenges, including the vanishing gradient problem identified by Hochreiter in 1991, in which gradients diminish exponentially during backpropagation through time, hindering the learning of long-range dependencies in extended sequences. LSTMs alleviated this partially through forget and input gates but struggled with very long contexts, limiting scalability in complex NLU tasks like multi-turn dialogue until further innovations arrived.
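The attention weighting \alpha_t = softmax(e_t) described above can be sketched without any deep learning library. The example takes the compatibility scores as given (in the original formulation they are produced by a small learned network, which is omitted here) and shows how the weights form a context vector as a weighted sum of encoder states. The function names and the numbers are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(scores, encoder_states):
    """Turn compatibility scores into weights and a context vector.

    scores: one score per input position for the current output step.
    encoder_states: one hidden-state vector (list of floats) per input position.
    """
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * state[d] for w, state in zip(weights, encoder_states))
        for d in range(dim)
    ]
    return weights, context


if __name__ == "__main__":
    # Three input positions with made-up scores and 2-dimensional states.
    weights, context = attend(
        [2.0, 0.0, -1.0],
        [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    )
    print(weights, context)
```

Because the context vector is recomputed for every output step, the decoder is not forced to rely on a single fixed-length summary of the input.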

Modern Advances and Applications

Transformer Models and Large Language Models

The transformer architecture, introduced in 2017, revolutionized natural language understanding by replacing recurrent neural networks with a mechanism centered on self-attention, enabling parallel processing of sequences and capturing long-range dependencies more effectively. This model relies on the self-attention operation, defined as \text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right) V, where Q, K, and V are the query, key, and value matrices derived from input embeddings, and d_k is the dimension of the keys, allowing the model to weigh the importance of different parts of the input dynamically. Transformers consist of encoder-decoder stacks with multi-head attention and feed-forward layers, achieving state-of-the-art results on machine translation tasks while being computationally efficient to train on large datasets.

Key developments building on this foundation include BERT, which introduced bidirectional pre-training for contextual embeddings in 2018, allowing the model to understand words by considering both preceding and following context simultaneously through masked language modeling. In contrast, the GPT series, starting with GPT-1 in 2018, employs autoregressive generation via unidirectional pre-training on vast text corpora, enabling coherent text production and downstream adaptation for tasks like summarization. Multimodal extensions, such as CLIP in 2021, extend transformer principles to align visual and textual representations through contrastive learning on image-text pairs, facilitating zero-shot transfer to vision-language tasks without task-specific training.

These models have profoundly impacted NLU by supporting zero-shot learning, where capabilities emerge from pre-training without explicit examples, and fine-tuning for specialized tasks like entity recognition, often yielding accuracies exceeding 90% on benchmarks such as GLUE.
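The scaled dot-product attention formula given earlier in this section, softmax(QK^T / sqrt(d_k)) V, can be written out with plain Python lists to make each step visible. This is a single-head sketch with no learned projections, masking, or batching, and the helper names are assumptions; production code would use an array library instead.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def attention(Q, K, V):
    """Return (output, weights) for softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    scores = matmul(Q, transpose(K))                 # one row per query
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V), weights


if __name__ == "__main__":
    Q = [[1.0, 0.0], [0.0, 1.0]]
    K = [[1.0, 0.0], [0.0, 1.0]]
    V = [[1.0, 2.0], [3.0, 4.0]]
    output, weights = attention(Q, K, V)
    print(weights)
    print(output)
```

Each output row is a weighted average of the value vectors, with more weight on values whose keys align with the query; dividing by sqrt(d_k) keeps the scores from growing with the key dimension.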
Scaling laws, exemplified by the Chinchilla hypothesis in 2022, indicate that optimal performance arises from balancing model size against training data volume: a 70-billion-parameter model trained on 1.4 trillion tokens outperforms larger counterparts under equivalent compute budgets, achieving 67.5% on the MMLU benchmark. By 2024, advancements like OpenAI's o1 model integrated internal reasoning chains during inference that simulate step-by-step thought processes, enhancing complex problem-solving in NLU applications such as logical inference and multi-hop reasoning. As of 2025, ongoing trends include improved multilingual NLU with better cultural context awareness and advancements in language agents for interactive systems.
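The figures quoted above (70 billion parameters, 1.4 trillion tokens) can be turned into a rough compute estimate. The sketch below assumes the widely used approximation that training cost is about 6 * N * D floating-point operations for N parameters and D tokens; that approximation is not stated in this article, and holding the tokens-per-parameter ratio fixed when splitting a budget is a further simplifying assumption. The function names are hypothetical.

```python
import math


def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute using the C ~ 6 * N * D rule of thumb (an assumption)."""
    return 6.0 * params * tokens


def compute_optimal_split(flops: float, tokens_per_param: float = 20.0) -> tuple[float, float]:
    """Split a compute budget into (params, tokens) at a fixed tokens-per-parameter ratio.

    With C = 6 * N * D and D = r * N, solving for N gives N = sqrt(C / (6 * r)).
    """
    params = math.sqrt(flops / (6.0 * tokens_per_param))
    return params, tokens_per_param * params


params = 70e9      # 70 billion parameters, as quoted in the text
tokens = 1.4e12    # 1.4 trillion training tokens, as quoted in the text

tokens_per_param = tokens / params
budget = training_flops(params, tokens)

print(f"tokens per parameter: {tokens_per_param:.1f}")
print(f"approximate training compute: {budget:.2e} FLOPs")
print("split of the same budget at that ratio:", compute_optimal_split(budget))
```

Under these assumptions the quoted configuration corresponds to about 20 training tokens per parameter, which gives a quick way to judge whether a proposed model is over- or under-sized for its data.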

Real-World Implementations

Natural language understanding (NLU) powers virtual assistants like Amazon Alexa, which has used intent recognition since its launch in 2014 to interpret user queries and generate appropriate responses, such as controlling smart home devices or providing information. Similarly, Google Assistant, introduced in 2016, employs NLU for intent-based interactions, enabling features like contextual follow-up questions in conversations across devices. These systems process spoken or typed inputs to discern user intentions, such as booking appointments or playing music, with ongoing improvements in handling ambiguous requests through domain-specific training.

In search and recommendation systems, NLU enhances semantic understanding to deliver more relevant results. Google's integration of BERT in 2019 improved its search engine's comprehension of query context, affecting approximately 10% of English search queries in the United States and enabling better handling of nuanced searches, such as distinguishing "jaguar" the animal from the car brand. Netflix applies NLU to analyze user-generated text, such as reviews and search queries, alongside content metadata to refine personalized recommendations, interpreting preferences like "thrilling sci-fi" to match titles.

NLU finds critical applications in healthcare through clinical named entity recognition (NER) in electronic health records (EHRs), where it identifies and extracts entities like medications, symptoms, and diagnoses from unstructured clinical notes to support decision-making and reduce errors. For instance, systems using NLU-based NER achieve F1-scores above 0.85 in extracting problems and treatments from EHRs, aiding automated summarization and patient monitoring.

In finance, NLU detects fraud by analyzing anomalies in transaction dialogues and customer communications, such as flagging unusual patterns in email or chat descriptions of transfers to prevent money laundering.
This approach models user spending profiles via natural language cues, identifying deviations that indicate potential fraud with precision rates exceeding 90% in controlled evaluations.

Evaluation of NLU implementations relies on task-specific metrics adapted to domain needs. For NER tasks in healthcare or finance, the F1-score balances precision and recall to assess entity extraction accuracy and is often domain-tuned to handle specialized terminology like medical codes. In translation-influenced NLU components, such as multilingual intent detection, BLEU measures output fluency and adequacy against references, though it is supplemented by human judgments for contextual fidelity. Transformer models have enabled scalable NLU in these deployments by providing robust contextual embeddings.
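The F1-score used for entity extraction above is the harmonic mean of precision and recall. The sketch below shows one common way to compute it at the entity level, assuming exact-match scoring in which a predicted entity counts only if its span and label both match a gold annotation; the function name `entity_prf` and the example annotations are hypothetical.

```python
def entity_prf(gold: set, predicted: set) -> tuple[float, float, float]:
    """Entity-level precision, recall, and F1 under exact span-and-label matching."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations for one clinical note: (start_token, end_token, label).
gold = {(0, 2, "PROBLEM"), (5, 6, "TREATMENT"), (9, 11, "TEST"), (14, 15, "TREATMENT")}
predicted = {(0, 2, "PROBLEM"), (5, 6, "TREATMENT"), (9, 10, "TEST")}

precision, recall, f1 = entity_prf(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy case the third prediction has the right label but the wrong span boundary, so it counts as both a false positive and a missed entity. Exact-match scoring is strict in this way, and evaluations sometimes also report a relaxed, overlap-based variant.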

Challenges and Future Directions

Persistent Limitations

Despite significant advances, natural language understanding (NLU) systems face persistent challenges in resolving linguistic ambiguity and avoiding hallucinations, where models produce outputs that appear coherent but are factually unsupported. In large language models (LLMs) like GPT-3.5, this manifests as the generation of false inferences, such as fabricating details or relationships not present in the input or training data. A 2023 study evaluating ChatGPT's reference generation found that GPT-3.5 hallucinated non-existent citations in over 30% of responses across various prompts, highlighting the model's propensity to confabulate authoritative-sounding but erroneous information. These issues undermine the reliability of NLU in applications requiring factual accuracy.

Bias amplification in NLU arises when demographic imbalances in training data are intensified during model optimization, leading to discriminatory outputs. For example, LLMs trained on internet-scale text often perpetuate gender or racial stereotypes, with the model's inference process exacerbating subtle biases present in the source material. Fairness metrics like demographic parity, which quantifies the absolute difference in positive prediction rates between protected groups (e.g., male vs. female), have shown disparities in sentiment analysis tasks across demographic subsets. A survey of bias in LLMs notes that this amplification stems not only from data but also from architectural choices, such as attention mechanisms that prioritize majority-group patterns, resulting in lower performance equity for underrepresented demographics in tasks like coreference resolution.

Robustness gaps further limit NLU systems, particularly their susceptibility to adversarial attacks that subtly alter inputs to provoke incorrect interpretations.
White-box attacks that exploit model gradients can achieve high success rates in causing misclassifications on intent recognition tasks, demonstrating how minor synonym substitutions or paraphrases evade defenses. Black-box scenarios, where attackers have no access to model internals, still yield high attack success rates against commercial NLU APIs through query-based perturbations. Compounding this, low-resource language handling remains inadequate: NLU models perform poorly on most of the world's roughly 7,000 languages, which lack sufficient annotated data, owing to data scarcity and the limits of cross-lingual transfer.

Ethical concerns in NLU are pronounced in privacy risks tied to dialogue data, where conversational corpora used for training often include personal identifiers or sensitive exchanges without adequate anonymization. Human-sourced dialogue datasets for NLU, such as those from virtual assistants, raise issues of inadvertent data leakage, as models can memorize and regurgitate private details from training interactions. This vulnerability is heightened in home-based dialog systems, where utterances may contain health or location information, potentially violating user privacy regulations like GDPR. Additionally, explainability deficits in black-box NLU models, typically neural networks with billions of parameters, obscure the reasoning behind outputs, making it difficult for users or regulators to trace errors or biases. Studies emphasize that post-hoc attribution methods often fail to capture true causal mechanisms, leading to misleading interpretations and reduced trust in high-stakes deployments like legal or medical NLU.

Recent advancements in natural language understanding (NLU) increasingly emphasize multimodal integration, where vision-language models (VLMs) combine textual and visual inputs to achieve grounded comprehension of language in context.
Models like Flamingo, introduced in 2022, exemplify this trend by enabling few-shot learning across interleaved image-text tasks, such as visual question answering and captioning, which enhance NLU by anchoring linguistic meaning to perceptual data. Building on this, 2024 surveys highlight the evolution of VLMs toward more robust architectures, including those that use cross-modal attention mechanisms to improve semantic alignment between vision and language, thereby addressing limitations of purely textual NLU systems. For instance, extensions like Flamingo-CXR demonstrate practical applications in domain-specific NLU, such as generating interpretive reports from chest radiographs by fusing visual diagnostics with natural language descriptions.

Another prominent trend involves neuro-symbolic hybrids, which merge neural networks with symbolic logic to enable verifiable reasoning in NLU tasks that require structured inference, such as coreference resolution and entailment. These approaches leverage differentiable logic programming to make symbolic rules trainable with gradient-based optimization, allowing neural components to learn probabilistic patterns while symbolic layers enforce logical consistency. A 2024 systematic review of neuro-symbolic AI underscores significant progress in integrating these hybrids for language tasks, noting improved explainability and robustness over purely neural methods, particularly in handling compositional semantics. For example, the NEUMANN framework (2024) applies differentiable forward reasoning via graph-based message passing to abstract visual-linguistic reasoning, achieving verifiable outcomes in NLU benchmarks by combining neural embeddings with logical reasoning.

Efficiency and sustainability have emerged as critical foci in NLU, driven by the computational demands of large models, with techniques like model compression and federated learning enabling deployment on edge devices.
Model compression methods, including pruning and quantization, reduce the size of LLMs while preserving NLU performance on tasks like intent recognition, as detailed in a 2024 survey that evaluates their impact on inference speed and memory use. Complementing this, federated learning facilitates collaborative training across distributed edge devices without centralizing sensitive data, promoting sustainability by minimizing the carbon footprint associated with cloud-based training; 2024 work reports such reductions for NLU in mobile environments. These innovations, such as hierarchical federated frameworks with integrated compression, support real-time NLU applications on resource-constrained devices like smartphones.

Efforts toward global inclusivity in NLU center on low-resource languages, using transfer learning from high-resource models and synthetic data generation to bridge data scarcity gaps. Transfer learning approaches, such as those in the Kardeş-NLU benchmark (2024), adapt multilingual models to low-resource targets via intermediate pivot languages such as Turkish, yielding accuracy gains of up to 17% in tasks like natural language inference for under-resourced dialects. Synthetic data generation further amplifies this by employing fine-tuned LLMs to create diverse training corpora; for instance, the MURI method (2025) produces high-quality instruction-tuning datasets for low-resource languages, improving zero-shot NLU performance by simulating varied linguistic patterns without native annotations. Combined, these techniques foster more equitable NLU access, as evidenced by advances in grammatical error correction for lower-resource languages using tagged corruption models for synthetic augmentation.
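As a concrete illustration of quantization, mentioned above among the techniques for shrinking models for edge deployment, the sketch below applies symmetric per-tensor 8-bit quantization to a handful of weights. It is a simplified stand-in under stated assumptions (one shared scale, round-to-nearest, no calibration data), not the scheme of any particular survey or library; the function names and the weight values are hypothetical.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0.0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the integer codes."""
    return [q * scale for q in quantized]


# Hypothetical slice of a weight tensor.
weights = [0.12, -0.5, 0.33, 0.0, 0.49]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("integer codes:", codes)
print(f"scale: {scale:.6f}, worst-case reconstruction error: {max_error:.6f}")
```

Storing each weight in 8 bits instead of 32 cuts memory roughly fourfold, at the cost of a rounding error bounded by half the scale per weight. Whether that error is acceptable for a given NLU task has to be checked empirically.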
