Ontology learning

Ontology learning is the semi-automatic process of extracting conceptual knowledge—such as concepts, relations, and axioms—from unstructured or semi-structured data sources, particularly text, to construct or extend formal ontologies that represent domain-specific knowledge in a machine-readable format. An ontology itself is defined as "a formal, explicit specification of a shared conceptualization," enabling knowledge sharing and reuse across applications in fields like artificial intelligence and the Semantic Web. The field emerged in the late 1990s and early 2000s as a response to the "knowledge acquisition bottleneck" in knowledge engineering, driven by the vision of the Semantic Web and the need for scalable methods to build large-scale knowledge bases from vast amounts of textual data. Early efforts focused on natural language processing (NLP) techniques to address the labor-intensive manual construction of ontologies, with foundational workshops like the OLT 2002 workshop on Machine Learning and Natural Language Processing for Ontology Engineering highlighting its growing importance. By the mid-2000s, over 50 ontology learning systems had been developed, combining insights from natural language processing, machine learning, and knowledge representation to automate the process.

Key processes in ontology learning typically follow a layered model, starting from term extraction (identifying relevant linguistic units like nouns and verbs) and proceeding through synonym detection (grouping semantic variants), concept identification (defining entities with their intensions and extensions), taxonomy construction (building hierarchies via subsumption or clustering), non-hierarchical relation extraction (e.g., "part-of" or "causes"), and finally axiom derivation for logical rules and inferences. Techniques span statistical methods (e.g., co-occurrence analysis and distributional similarity based on Harris's hypothesis that words in similar contexts share meanings), symbolic approaches (e.g., lexico-syntactic patterns like Hearst's hypernymy rules), and hybrid systems that integrate both for improved accuracy. Bootstrapping methods, such as those using seed terms to iteratively expand lexicons, have achieved high precision, for instance, 80% accuracy in extending noun lexicons with 6,000 terms.

Applications of ontology learning are diverse: it supports knowledge acquisition for the Semantic Web, enhances information retrieval and text mining by enabling better document clustering and semantic search, and facilitates information extraction tasks like entity recognition and relation filling in domains such as biomedicine and e-learning. Evaluation often relies on precision- and recall-style metrics against gold-standard ontologies or on task-specific performance, though challenges persist in handling ambiguity, scaling to large corpora, and integrating with emerging technologies like large language models (LLMs) for end-to-end learning. Since 2023, LLMs have been increasingly applied to ontology learning tasks, with dedicated challenges such as LLMs4OL in 2024 and 2025 evaluating their effectiveness.

Introduction

Definition and Scope

Ontology learning refers to the semi-automatic or automatic acquisition of ontology components, including concepts, relations, and axioms, from unstructured, semi-structured, or structured data sources such as text corpora, databases, and web content. This process integrates techniques from natural language processing, machine learning, and knowledge representation to construct or extend domain-specific ontologies. The primary objectives of ontology learning are to extract structured, domain-specific knowledge that supports applications in the Semantic Web, automated reasoning, and knowledge-based systems. By automating the identification of key elements like terms and their interconnections, it aims to facilitate the creation of machine-readable knowledge bases that enhance data interoperability and semantic understanding across diverse domains.

In scope, ontology learning distinguishes itself from manual ontology engineering by employing inductive methods that transform raw data into formal ontology structures, often represented in standards like OWL (Web Ontology Language). It typically begins with sources like free text or legacy databases and progresses through extraction, pruning, and refinement to yield reusable ontologies, thereby addressing the scalability challenges of hand-crafted approaches. For instance, ontology learning can derive concepts such as "disease" and relations like "causes" from medical abstracts in PubMed, using datasets like OHSUMED to build biomedical ontologies.

Historical Context

Ontology learning traces its roots to the late 1980s and 1990s, emerging as a response to the challenges of knowledge acquisition in artificial intelligence, particularly the labor-intensive manual encoding of knowledge bases. Douglas Lenat's Cyc project, launched in 1984, exemplified these early efforts by manually constructing a comprehensive ontology of common-sense knowledge comprising millions of assertions to enable reasoning, underscoring the need for automated alternatives to overcome acquisition bottlenecks. By the 1990s, advancements in natural language processing and machine learning began to facilitate the semi-automated extraction of concepts and relations from unstructured text, laying foundational techniques for deriving ontological structures from data sources.

The field gained formal structure in the early 2000s amid the rise of the Semantic Web, with Alexander Maedche and Steffen Staab coining the term "ontology learning" in their 2001 paper, which proposed a framework integrating knowledge discovery methods—such as association rule mining and clustering—for semi-automatic ontology construction from web resources. This work marked a pivotal shift by emphasizing iterative processes of import, extraction, pruning, refinement, and evaluation to support machine-understandable semantics. Post-2005, deeper integration with natural language processing tools advanced the discipline, notably through the Text2Onto framework, which introduced probabilistic methods for learning ontologies from text corpora, handling uncertainty in linguistic evidence like term frequencies and co-occurrence patterns.

Ontology learning has since incorporated machine learning paradigms, including statistical techniques such as clustering and probabilistic graphical models, for more robust relation detection and induction from heterogeneous data sources. The growth of large-scale corpora has further advanced the field, with deep learning methods enabling scalable processing of vast text volumes; for instance, neural embeddings and recurrent networks have improved taxonomic and non-taxonomic relation learning by capturing semantic nuances. More recently, as of 2025, large language models (LLMs) have transformed ontology learning by supporting end-to-end extraction and generation of ontological structures from text, as demonstrated in challenges like the Large Language Models for Ontology Learning (LLMs4OL) challenge at ISWC 2025. Influential workshops at conferences like the European Knowledge Acquisition Workshop (EKAW) from the 1990s and the International Semantic Web Conference (ISWC) starting in 2002—including dedicated ontology learning sessions at EKAW 2004 and ISWC 2005—played a key role in disseminating ideas and fostering interdisciplinary collaboration.

Core Concepts

Ontologies in Knowledge Engineering

In knowledge engineering, ontologies serve as explicit specifications of shared conceptualizations, providing a structured framework for representing domain knowledge in a manner that is both human-interpretable and machine-processable. This conceptualization bridges the gap between human cognitive models and computational systems, enabling the formalization of knowledge to support tasks such as reasoning and decision-making in intelligent applications. By defining a common vocabulary and semantics, ontologies facilitate knowledge reuse and interoperability across diverse systems, reducing ambiguity and enhancing consistency in information processing.

The core components of an ontology include concepts (often termed classes), which represent abstract categories or types within a domain; instances, which are specific entities belonging to those classes; and relations that connect them. Taxonomic relations, such as "is-a" hierarchies, establish subclass-superclass structures to organize concepts hierarchically, while non-taxonomic relations, like "part-of" or "causes," capture associative or functional dependencies between concepts or instances. Additionally, axioms provide logical constraints or definitions that ensure the integrity of the knowledge base, and rules articulate inferential patterns, such as "if A is a subclass of B, then every instance of A is an instance of B," to derive new knowledge from existing facts.

Ontologies are formally represented using standardized languages like RDF (Resource Description Framework), which provides a graph-based model for expressing data as triples (subject-predicate-object), and OWL (Web Ontology Language), which extends RDF with richer constructs for defining classes, properties, and axioms to support advanced reasoning. These representations enable semantic interoperability by allowing heterogeneous data sources to be linked and queried uniformly, while facilitating automated inference through description logic-based reasoners that detect inconsistencies or compute implicit knowledge.

In the context of ontology learning, the development of domain-specific, reusable ontologies is essential as a prerequisite for applications like information retrieval, where they enhance query precision by disambiguating terms and retrieving contextually relevant results, and expert systems, where they underpin rule-based inference to simulate domain expertise. These structures form the foundational targets for learning procedures that automatically extract and populate components from raw data.
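The triple model and the subclass rule above can be made concrete with a minimal sketch using the Python rdflib library; the namespace, class names, and individual are illustrative, not drawn from any published ontology.

```python
# Minimal sketch of RDF triples plus RDFS subclass inference with rdflib.
# The http://example.org namespace and Dog/Mammal/Fido names are invented.
from rdflib import Graph, Namespace, RDF, RDFS

EX = Namespace("http://example.org/onto#")
g = Graph()

# Taxonomic ("is-a") axioms expressed as rdfs:subClassOf triples.
g.add((EX.Dog, RDFS.subClassOf, EX.Mammal))
g.add((EX.Mammal, RDFS.subClassOf, EX.Animal))

# An instance assertion: the individual Fido is typed by the class Dog.
g.add((EX.Fido, RDF.type, EX.Dog))

# Apply the rule "if A is a subclass of B, every instance of A is an
# instance of B" by walking the transitive closure of rdfs:subClassOf.
for cls in list(g.objects(EX.Fido, RDF.type)):
    for ancestor in list(g.transitive_objects(cls, RDFS.subClassOf)):
        g.add((EX.Fido, RDF.type, ancestor))

print(g.serialize(format="turtle"))  # Fido now also typed Mammal and Animal
```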

Sources and Data Types for Learning

Ontology learning relies on diverse primary sources categorized by their level of structure. Unstructured text, such as documents and web pages, serves as a fundamental input, providing raw content from which concepts and relations can be extracted. Semi-structured data, including formats like XML, JSON, and HTML, offers partial organization through tags and schemas, facilitating more targeted extraction. Structured sources, such as relational databases and existing thesauri like WordNet, supply predefined schemas and explicit relationships, enabling direct mapping to ontological elements.

Data types vary in linguistic scope and domain focus, influencing the applicability of learning methods. Monolingual corpora, typically in a single language like English, simplify initial extraction but limit cross-lingual reuse, while multilingual corpora support broader ontology development by aligning terms across languages, often using parallel texts or dictionaries. Domain-specific data, such as biomedical texts from PubMed, captures specialized terminology and hierarchies in fields like medicine, whereas general-purpose sources like Wikipedia dumps or newswire corpora provide foundational concepts applicable across domains.

These sources present inherent challenges that impact learning efficacy. Noise in unstructured and web-based data, including irrelevant or erroneous relations, can propagate inaccuracies into the ontology. Ambiguity arises from polysemous terms and contextual variations, complicating concept disambiguation without additional resources like lexicons. Scalability issues emerge with large-scale corpora, as processing vast volumes demands efficient algorithms to avoid computational bottlenecks. Preprocessing steps, such as tokenization, part-of-speech tagging, and named entity recognition, are essential to mitigate these issues by cleaning and normalizing input data for reliable extraction.

Representative examples illustrate practical applications. DBpedia, derived from Wikipedia's structured infoboxes, is commonly used for entity extraction and relation inference in general-domain ontologies. In contrast, PubMed abstracts enable domain-specific learning in biomedicine, supporting the construction of ontologies like those extending the Gene Ontology with novel terms and associations.
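The preprocessing steps named above can be illustrated with a short sketch using the spaCy NLP library; the example sentence and the choice of the en_core_web_sm model are assumptions made for the demonstration.

```python
# Minimal preprocessing sketch with spaCy: tokenization, POS tagging,
# stop-word filtering, and named entity recognition on one sentence.
import spacy

nlp = spacy.load("en_core_web_sm")  # requires: python -m spacy download en_core_web_sm
doc = nlp("Aspirin is widely used to treat coronary heart disease.")

# Keep noun-like, non-stop-word lemmas as candidate term material.
candidates = [t.lemma_ for t in doc
              if t.pos_ in {"NOUN", "PROPN"} and not t.is_stop]
print(candidates)  # e.g. ['aspirin', 'heart', 'disease']

# Named entities, useful later for ontology population.
print([(ent.text, ent.label_) for ent in doc.ents])
```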

Learning Procedures

Terminology Extraction

Terminology extraction serves as the foundational phase in ontology learning, where domain-specific terminology is identified from unstructured or semi-structured sources such as text corpora to establish the basic vocabulary for subsequent ontology construction. This process emphasizes domain relevance by prioritizing terms that capture specialized concepts while excluding common words, often through preprocessing steps like stop-word removal and frequency thresholding. The output typically consists of ranked candidate term lists, which provide raw material for higher-level ontology elements without implying conceptual clustering or relational structuring.

Statistical methods dominate terminology extraction due to their simplicity and scalability, relying on quantitative measures to gauge term significance within a corpus. Term Frequency-Inverse Document Frequency (TF-IDF) is a seminal approach, calculating a term's score as the product of its frequency in a document (TF) and the inverse of its frequency across the entire collection (IDF), thereby highlighting terms rare in general texts but frequent in domain-specific ones. In ontology learning, TF-IDF effectively filters general vocabulary to isolate domain terms. Association measures like pointwise mutual information (PMI) further support this by quantifying collocation strength between words, defined as PMI(x,y) = log₂ [P(x,y) / (P(x) × P(y))], where P denotes probability, to detect multi-word units that co-occur more often than expected by chance.

Linguistic methods leverage syntactic and morphological analysis to ensure extracted terms align with natural language patterns typical of domain nomenclature. Part-of-speech (POS) tagging assigns grammatical categories to words, enabling the selection of noun-heavy sequences as candidate terms, while noun phrase extraction employs rules or parsers to isolate phrasal structures such as adjective-noun or noun-preposition-noun patterns. For instance, in biomedical ontology learning, POS tagging might flag a multi-word noun phrase in scientific abstracts as a candidate term while filtering out verbs and adverbs, maintaining the focus on conceptual descriptors.

Hybrid approaches integrate statistical and linguistic techniques to mitigate the limitations of each, such as statistical methods' oversight of syntax or linguistic rules' rigidity in varied corpora. The C/NC-value algorithm exemplifies this, first applying linguistic filters (e.g., POS patterns) to generate candidates, then ranking them statistically while accounting for nesting. The C-value for a candidate term string a is computed as:

C-value(a) = log₂|a| × f(a), if a is not nested in other terms;
C-value(a) = log₂|a| × (f(a) − (1/P(T_a)) Σ_{b∈T_a} f(b)), if a is nested,

where |a| is the term length, f(a) its corpus frequency, P(T_a) the number of longer candidate terms containing a, and the sum over their frequencies adjusts for subsumption. The NC-value refines this by incorporating contextual associations, yielding NC-value(a) = 0.8 × C-value(a) + 0.2 × contextual score, where the latter weights co-occurring domain words. Applied to medical texts, this method yields precision gains of 6-8% over frequency alone for nested terms. These extracted terms can then inform concept identification in later stages.
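As a concrete illustration, the following self-contained Python sketch scores a handful of POS-filtered candidates with the C-value formula above (the original method restricts scoring to multi-word terms); the candidate strings and frequencies are invented for the example.

```python
# Sketch of C-value scoring over a dict of candidate terms -> frequencies.
import math

def c_value(freq):
    scores = {}
    for a, f_a in freq.items():
        nesting = [f for b, f in freq.items()           # frequencies of longer
                   if b != a and f" {a} " in f" {b} "]  # terms containing `a`
        length = len(a.split())
        if not nesting:  # `a` is not nested in any other candidate
            scores[a] = math.log2(length) * f_a
        else:            # discount by the mean frequency of containing terms
            scores[a] = math.log2(length) * (f_a - sum(nesting) / len(nesting))
    return scores

freq = {"contact lens": 40, "soft contact lens": 12,
        "disposable soft contact lens": 5}
for term, score in sorted(c_value(freq).items(), key=lambda kv: -kv[1]):
    print(f"{term:30s} {score:6.2f}")
# contact lens 31.50, soft contact lens 11.09, disposable soft contact lens 10.00
```

Note how "contact lens" keeps a high score despite being nested, because its own frequency far exceeds the average frequency of the longer terms that contain it.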

Concept Identification

Concept identification in ontology learning transforms extracted terms into coherent, abstract concepts by grouping semantically related terms and resolving ambiguities inherent in natural language. This process builds upon terminology extraction by aggregating terms that represent the same underlying idea, ensuring the ontology captures distinct meanings without redundancy.

Key techniques for grouping terms include clustering algorithms applied to vector representations of terms derived from text corpora. For instance, clustering over distributional representations partitions terms based on their distributional semantics, so that terms with similar co-occurrence patterns in documents are grouped into clusters representing distinct concepts. Similarity measures such as cosine similarity on term vectors or WordNet-based semantic distances, which leverage lexical relations like hypernymy and synonymy, guide the clustering to account for both syntactic and semantic proximity. Systems like TEXT-TO-ONTO employ such clustering alongside association rule mining to form initial concept candidates from unstructured text.

Disambiguation addresses polysemy, where a single term carries multiple meanings, by analyzing contextual cues from surrounding terms or documents. Context-based methods, such as those in SVETLAN', classify terms by their distributional contexts to select the appropriate sense, preventing unrelated meanings from merging into a single concept. For example, the term "bank" in a financial context (e.g., surrounded by words like "loan" and "deposit") is disambiguated to the financial-institution sense, distinct from its geographical meaning in environmental texts (e.g., near "river" and "shore"). This ensures precise concept formation without conflating unrelated interpretations.

Validation of identified concepts often involves threshold-based acceptance, where clusters below a similarity threshold are rejected or refined, or interactive approaches for cooperative refinement. Tools like ASIUM support user intervention to split or merge clusters, producing final outputs as concept sets enriched with synonyms and near-synonyms. In a medical domain example, terms such as "myocardial infarction" and "heart attack" are clustered into a unified "heart attack" concept, capturing their shared semantic essence while noting synonymous variants.
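A minimal clustering sketch in Python, assuming each term is represented by a small pseudo-document of context words; the terms, contexts, and cluster count are invented, and on such a toy input the medical and financial terms typically separate into the two clusters.

```python
# Distributional clustering of terms with TF-IDF vectors and k-means.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

contexts = {
    "myocardial infarction": "heart attack cardiac muscle blood supply",
    "heart attack":          "myocardial infarction cardiac emergency chest pain",
    "interest rate":         "bank loan deposit percentage finance",
    "bank":                  "loan deposit finance account interest",
}
terms = list(contexts)
X = TfidfVectorizer().fit_transform(contexts.values())

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for cluster in sorted(set(labels)):
    print(cluster, [t for t, l in zip(terms, labels) if l == cluster])
```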

Taxonomic Relation Derivation

Taxonomic relation derivation in ontology learning focuses on inferring hierarchical "is-a" (hyponym-hypernym or subsumption) relationships among identified concepts to form the backbone of an ontology's taxonomy. These relations establish vertical hierarchies where a more specific concept (hyponym) is subsumed under a more general one (hypernym), enabling inheritance and subsumption reasoning. Building upon concepts extracted from textual sources, this process typically involves pattern-based extraction, distributional analysis, or structural methods to discover and organize such relations automatically.

One prominent approach utilizes lexico-syntactic patterns to identify hyponymy directly from texts. Pioneered by Hearst patterns, these are predefined linguistic templates that capture common expressions indicating subsumption, such as "such as X and Y" or "X, Ys, and other Zs," where X and Y are hyponyms of Z. For instance, the phrase "mammals such as dogs, cats, and horses" yields the relations "dog is-a mammal," "cat is-a mammal," and "horse is-a mammal." This method is effective for large corpora due to its simplicity and high precision, though it may miss relations not fitting the patterns (see the sketch after this section).

Another key approach involves subsumption discovery through inclusion metrics, which leverages distributional semantics to infer hierarchies based on contextual overlap. Concepts are represented as term distributions over document contexts, and subsumption is detected when the context of a hyponym is largely included in that of the hypernym, often measured via metrics like asymmetric Kullback-Leibler divergence or statistical inclusion tests. For example, if the contexts of "dog" are largely contained within those of "mammal" (e.g., shared attributes like "fur" or "vertebrate"), then "dog is-a mammal" is inferred. This distributional method complements pattern-based techniques by handling implicit relations but requires careful thresholding to avoid noise.

Among algorithms for deriving taxonomic structures, Formal Concept Analysis (FCA) constructs lattice-based hierarchies from binary incidence matrices of concepts and their attributes derived from text. The process involves: (1) extracting terms and contexts via linguistic analysis to form a formal context, (2) applying FCA to generate formal concepts as pairs of extents (objects) and intents (attributes), and (3) deriving a partial order from the concept lattice to form the taxonomy. FCA excels in producing interpretable, non-redundant hierarchies and naturally handles inconsistencies by resolving overlaps in the lattice structure, though it can be computationally intensive for sparse data. Weighting schemes, such as term frequency-inverse document frequency, enhance its performance on text corpora.

Clustering algorithms support bottom-up taxonomy building by grouping similar concepts and iteratively merging clusters to form trees. In metric-based frameworks, terms are clustered using multi-criteria optimization, incorporating features like co-occurrence and syntactic dependencies, with distances minimized to ensure hierarchical coherence. For example, agglomerative clustering starts with individual concepts and merges based on semantic similarity until a tree emerges, guided by metrics like minimum evolution to preserve structure. This approach is flexible for unlabeled data but may introduce cycles, which are resolved through post-processing like path maximization.

Validation of derived taxonomies emphasizes coverage and cohesion metrics to assess completeness and structural quality. Coverage measures the proportion of input concepts and relations captured in the taxonomy, computed as the ratio of matched elements to the total (e.g., Cov(O) = Concept_Cov(O) + Rel_Cov(O), where relations focus on "is-a" links). Cohesion evaluates the density of relations within the ontology, such as the average number of subsumption links per concept (Coh(O) = Σ I(c_i, c_j) / (n(n−1)), with I indicating the presence of a relation), promoting modular, tightly connected structures. High coverage ensures broad representation, while strong cohesion indicates logical coherence; inconsistencies like cycles are detected via graph analysis and pruned to maintain acyclicity. These metrics guide refinement, with empirical studies showing improved F1-scores (e.g., 0.82 on benchmarks) when balanced.
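The pattern-based approach can be sketched in a few lines of Python; the two regular expressions below cover only the "Z such as X, Y and W" and "X, Y and other Z" templates, and real systems use many more patterns plus noun-phrase chunking.

```python
# Minimal Hearst-pattern sketch for deriving "is-a" relations from text.
import re

PATTERNS = [
    # hypernym first: "mammals such as dogs, cats and horses"
    (re.compile(r"(\w+) such as ((?:\w+, )*\w+(?: and \w+)?)"), False),
    # hypernym last: "dogs, cats and other mammals"
    (re.compile(r"((?:\w+, )*\w+) and other (\w+)"), True),
]

def hearst(sentence):
    relations = []
    for pattern, swap in PATTERNS:
        for m in pattern.finditer(sentence):
            hyper, hypos = (m.group(2), m.group(1)) if swap else (m.group(1), m.group(2))
            for hypo in re.split(r", | and ", hypos):
                relations.append((hypo, "is-a", hyper))
    return relations

print(hearst("mammals such as dogs, cats and horses"))
# [('dogs', 'is-a', 'mammals'), ('cats', 'is-a', 'mammals'), ('horses', 'is-a', 'mammals')]
```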

Non-Taxonomic Relation Learning

Non-taxonomic relation learning in ontology learning focuses on identifying associative links between concepts that extend beyond hierarchical (is-a) structures, enriching ontologies with relational semantics such as part-whole compositions or causal dependencies. These relations are crucial for modeling complex real-world interactions, enabling more expressive knowledge representations in domains like biomedicine or engineering. Common types include meronymy (part-of relations, e.g., "wheel" as part of "car"), causality (cause-effect links), and spatial or temporal associations (e.g., "located in" or "precedes"). Domain-specific variants, such as "treats" in medical ontologies linking diseases to therapies, further tailor these relations to specialized applications.

Methods for extracting non-taxonomic relations primarily leverage natural language processing techniques applied to unstructured text sources. Pattern-based approaches use predefined lexico-syntactic templates to match relational expressions, such as "X causes Y" for causality or "X is a part of Y" for meronymy, achieving high precision in controlled corpora (up to 75.55% accuracy). Dependency parsing analyzes syntactic dependencies in sentence structures to uncover implicit relations, for instance, by tracing paths between noun phrases in parse trees, as demonstrated in bioinformatics texts with 83.3% accuracy. Co-occurrence analysis, meanwhile, statistically measures term proximity or frequencies to infer relations, yielding 67.35% precision on cancer-related datasets.

Algorithms for non-taxonomic relation extraction often integrate these methods into scalable frameworks, with Open Information Extraction (OpenIE) systems exemplifying tuple-based extraction of arbitrary relations from text without predefined schemas. For example, OpenIE can derive triples like (smoking, causes, lung cancer) from text corpora, supporting ontology augmentation in specialized domains. Confidence scoring typically employs probabilistic models, such as association rule mining or neural classifiers, to assign reliability scores to extracted relations, filtering low-confidence links during integration. These techniques complement taxonomic structures by populating associative edges, though they require validation against domain expertise to mitigate noise from ambiguous text.
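A minimal dependency-parsing sketch using spaCy, reading subject-verb-object paths off the parse tree as candidate triples; the sentence is illustrative, and a real system would also map verbs to ontology properties and handle passives, prepositional objects, and negation.

```python
# Extract subject-verb-object triples from dependency parses.
import spacy

nlp = spacy.load("en_core_web_sm")

def svo_triples(text):
    triples = []
    for token in nlp(text):
        if token.pos_ == "VERB":
            subjects = [c for c in token.children if c.dep_ == "nsubj"]
            objects = [c for c in token.children if c.dep_ == "dobj"]
            for s in subjects:
                for o in objects:
                    triples.append((s.lemma_, token.lemma_, o.lemma_))
    return triples

print(svo_triples("Smoking causes lung cancer."))  # e.g. [('smoking', 'cause', 'cancer')]
```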

Rule and Axiom Discovery

Rule and axiom discovery in ontology learning focuses on automatically deriving logical rules and constraints that define the inferential semantics of concepts and relations within an ontology, enabling richer reasoning capabilities beyond mere taxonomic structures. This process typically builds upon extracted relations to formalize them into declarative axioms, such as implications or restrictions, which can be integrated into description logic-based ontologies. Seminal approaches emphasize inductive methods to ensure the rules are grounded in data patterns while maintaining logical consistency.

One prominent technique is inductive logic programming (ILP), which induces general rules from specific examples by combining background knowledge, such as existing ontology fragments, with positive and negative training instances. In ILP for ontology learning, algorithms like FOIL or Progol search for clauses that cover observed data while adhering to constraints like coverage thresholds and rule simplicity. For instance, ILP has been applied to learn onto-relational rules, where the background theory includes ontological axioms, allowing the discovery of implications like subclass relationships or property inheritances from relational datasets.

Another key method involves mining association rules, adapted from techniques such as the Apriori algorithm, to identify frequent co-occurrences in textual or structured data that suggest ontological constraints. These rules, often expressed in the form "if antecedent then consequent" with support and confidence metrics, are filtered and transformed into formal axioms; for example, adaptations of Apriori have been used to enrich ontologies by deriving rules from database transactions aligned with domain concepts. Association rule mining excels in handling large-scale data but requires post-processing to prune spurious rules and map them to ontology languages like OWL.

Axioms discovered through these techniques include constraints such as cardinality restrictions (e.g., a class requiring exactly n related instances) and disjointness declarations (e.g., two classes having no overlapping instances), which enhance the ontology's expressive power. Rules akin to those in the Semantic Web Rule Language (SWRL) are commonly produced, such as "if X is-a Bird and hasProperty(Flys), then X is-a FlyingAnimal," allowing automated inference. The process generally starts from patterns in non-taxonomic relations, formalizing them into axioms via rule induction, followed by validation through consistency checking with reasoners like Pellet, which detects incoherencies in OWL-DL ontologies augmented with SWRL rules.

A representative example is the discovery of the "all squares are rectangles" axiom from geometric texts, where ILP or association rule mining identifies recurring patterns linking square properties (e.g., equal sides) to rectangle definitions, formalizing it as a subclass axiom after validation to ensure no contradictions with existing geometric constraints. This approach has been demonstrated in domain-specific ontology enrichment, highlighting how rule discovery bridges empirical data to axiomatic knowledge.
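A toy association-rule sketch in the Apriori style, restricted to single-item antecedents and consequents: transactions are sets of concept labels, and rules above support and confidence thresholds become candidate axioms for expert validation. The data and thresholds are invented, and the emitted "if square then rectangle" rule mirrors the subclass example above.

```python
# Mine pairwise "if A then B" rules with support/confidence filtering.
from itertools import combinations
from collections import Counter

transactions = [
    {"square", "rectangle", "polygon"},
    {"square", "rectangle"},
    {"rectangle", "polygon"},
    {"square", "rectangle", "quadrilateral"},
]

def rules(transactions, min_support=0.5, min_confidence=0.9):
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        for item in t:
            counts[frozenset([item])] += 1
        for pair in combinations(sorted(t), 2):
            counts[frozenset(pair)] += 1
    out = []
    for itemset, c in counts.items():
        if len(itemset) == 2 and c / n >= min_support:
            a, b = sorted(itemset)
            for ante, cons in [(a, b), (b, a)]:
                confidence = c / counts[frozenset([ante])]
                if confidence >= min_confidence:
                    out.append((ante, cons, c / n, confidence))
    return out

for ante, cons, sup, conf in rules(transactions):
    print(f"if {ante} then {cons}  (support={sup:.2f}, confidence={conf:.2f})")
# if square then rectangle  (support=0.75, confidence=1.00)
# if polygon then rectangle  (support=0.50, confidence=1.00)
```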

Ontology Population and Extension

Ontology population refers to the process of instantiating concepts within an existing ontology by identifying and extracting specific entities from data sources, such as text corpora, and associating them with appropriate classes. This step transforms abstract ontological structures into populated knowledge bases capable of supporting applications like semantic search and question answering. Named entity recognition (NER) plays a central role in this process, where machine-learned classifiers or rule-based systems detect entities like persons, locations, or organizations in unstructured text. For instance, techniques leveraging background knowledge from resources like Wikipedia enhance NER accuracy by disambiguating entities and linking them to ontological concepts.

Instance extraction methods often combine gazetteers—precompiled lists of known entities—with classifiers to scale the process efficiently. Gazetteers provide a lookup mechanism for rapid identification of common instances, while supervised classifiers, trained on annotated corpora, handle novel or context-dependent entities, enabling the population of ontologies with thousands of instances from large-scale sources. In a practical example, populating a geography ontology with Wikipedia data involves extracting place names from articles using NER tools, then classifying them under concepts like "City" or "AdministrativeRegion" based on attributes such as population and administrative level, resulting in ontologies covering over 100,000 global locations. This method ensures scalability for domain-specific extensions, such as adding urban areas to geospatial ontologies.

Ontology extension, on the other hand, involves incrementally growing an existing ontology by integrating new concepts, relations, or instances, often through merging with complementary ontologies or handling evolutionary changes. Merging typically aligns a domain-specific ontology with upper-level ontologies like the Suggested Upper Merged Ontology (SUMO), which provides broad foundational categories such as "Process" or "Physical," facilitating interoperability across domains. Alignment techniques, including string matching and structural similarity measures, map concepts between ontologies, with manual or semi-automated processes ensuring consistency, as seen in alignments of conference ontologies to SUMO that resolve over 90% of mappings accurately.

To manage extension over time, ontology evolution incorporates versioning mechanisms that track changes like additions or deletions, preserving historical states while allowing updates to reflect domain shifts. Tools for versioning maintain multiple ontology versions through diff mechanisms and change logs, supporting collaborative development and rollback capabilities, which is crucial for long-term maintenance in dynamic fields like biomedicine. This ensures that extensions, such as introducing a new subclass under an existing concept once the extracted evidence passes acceptance thresholds, do not disrupt existing inferences or applications.

Recent advances as of 2025 have integrated large language models (LLMs) into ontology learning procedures, enhancing tasks across the pipeline. For example, LLMs facilitate end-to-end ontology construction by improving term extraction through contextual embeddings, automating relation discovery with zero-shot prompting, and supporting axiom generation via benchmarked inference capabilities. These methods, such as OLLM frameworks, achieve scalable taxonomy building from scratch, with applications in domains like biomedicine showing improved precision in instance population.
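A minimal population sketch combining a gazetteer lookup with spaCy NER; the gazetteer entries, class names, and fallback class are illustrative stand-ins for a real geospatial schema.

```python
# Populate ontology classes by merging gazetteer lookups with NER output.
import spacy

GAZETTEER = {"Berlin": "City", "Bavaria": "AdministrativeRegion"}

nlp = spacy.load("en_core_web_sm")
doc = nlp("Berlin is the capital of Germany, while Munich lies in Bavaria.")

instances = []
for ent in doc.ents:
    if ent.label_ == "GPE":  # geopolitical entities found by the NER model
        # The gazetteer assigns a precise class; other entities fall back
        # to a generic "Place" class pending later classification.
        instances.append((ent.text, GAZETTEER.get(ent.text, "Place")))

print(instances)
# e.g. [('Berlin', 'City'), ('Germany', 'Place'), ('Munich', 'Place'),
#       ('Bavaria', 'AdministrativeRegion')]
```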

Techniques and Methods

Natural Language Processing Approaches

Natural language processing (NLP) approaches form the cornerstone of ontology learning from text, leveraging linguistic structures to extract concepts, relations, and hierarchies automatically. These methods process unstructured textual data through syntactic and semantic analysis, enabling the identification of domain-specific knowledge without relying solely on manual curation. By parsing sentences into grammatical components and inferring meanings, NLP techniques bridge the gap between raw language and formal ontological representations, often serving as preprocessing steps in broader learning procedures.

Core NLP methods include dependency parsing, which analyzes sentence structures to uncover relations between terms, such as hypernymy or meronymy, by traversing parse trees to identify head-dependent links indicative of ontological connections. Semantic role labeling (SRL) complements this by delineating event structures in text, assigning roles like agent, patient, or instrument to predicates, thereby facilitating the derivation of non-taxonomic relations in ontologies. Topic modeling via latent Dirichlet allocation (LDA) aids in domain identification by discovering latent themes in corpora, grouping co-occurring terms into topical clusters that inform concept hierarchies.

Advanced techniques incorporate word embeddings, such as word2vec for capturing lexical semantics through vector similarities that reveal synonymy and relatedness for concept clustering, and BERT for contextual embeddings that enhance relation detection by considering bidirectional context. Transformer models enable end-to-end ontology learning by fine-tuning on tasks like concept extraction and relation extraction, processing entire sequences to predict ontological triples directly from text. Integration occurs through pipelines that chain part-of-speech (POS) tagging with pattern matching, where tagged nouns and verbs are matched against linguistic templates to extract terms and relations, while cross-lingual embeddings like multilingual BERT support ontology learning across languages by aligning vector spaces.
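A minimal word-embedding sketch with gensim's Word2Vec, showing how distributional similarity surfaces candidate synonyms for concept clustering; the toy corpus is far too small for meaningful vectors and is for illustration only.

```python
# Train tiny word2vec vectors and query nearest neighbours.
from gensim.models import Word2Vec

corpus = [
    ["myocardial", "infarction", "damages", "heart", "muscle"],
    ["heart", "attack", "damages", "heart", "muscle"],
    ["aspirin", "treats", "heart", "attack"],
    ["aspirin", "treats", "myocardial", "infarction"],
]

model = Word2Vec(corpus, vector_size=50, window=3, min_count=1,
                 epochs=200, seed=1, workers=1)

# Nearest neighbours in vector space suggest related or synonymous terms.
print(model.wv.most_similar("infarction", topn=3))
```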

Machine Learning Integration

Machine learning algorithms play a pivotal role in automating ontology learning by leveraging predictive modeling to extract, classify, and refine ontological structures from unstructured or semi-structured data. These techniques address the limitations of purely rule-based or linguistic methods by learning patterns from data, thereby improving scalability and adaptability across domains. Supervised, unsupervised, and deep learning paradigms, often combined in hybrid approaches, enable more robust handling of complex relations and concepts.

Supervised learning methods, such as support vector machines (SVMs), are widely applied for relation labeling tasks, where features like lexical patterns or dependency parses are used to classify taxonomic and non-taxonomic relationships between terms. For example, models trained on dependency paths have demonstrated effectiveness in extracting hypernym-hyponym relations from text, achieving substantial improvements over baseline methods. Active learning further enhances efficiency in these supervised setups by iteratively selecting uncertain instances for human annotation, minimizing labeling efforts while optimizing model performance in ontology population and extension. This approach has been particularly useful in ontology matching, where it queries users on high-uncertainty mappings to refine alignments with reduced expert involvement.

Unsupervised learning techniques, including clustering algorithms, facilitate concept identification by grouping similar terms based on distributional similarity or co-occurrence statistics, aiding the discovery of hierarchical structures without prior annotations. Agglomerative clustering, for instance, has been employed to derive taxonomies from heterogeneous text sources, outperforming random partitioning in forming coherent clusters. Anomaly detection methods complement this by validating learned axioms through outlier identification, flagging inconsistencies such as illogical inheritance relations in the emerging ontology to ensure structural integrity.

Deep learning has advanced ontology learning through neural architectures suited for sequence labeling, such as bidirectional long short-term memory (Bi-LSTM) networks and transformer-based models like BERT, which capture contextual embeddings for precise entity recognition and relation classification. These models excel in processing sequential data from texts, enabling automated derivation of non-taxonomic relations with higher semantic fidelity compared to traditional statistical methods. More recent large language models (LLMs), such as those based on the transformer architecture, support end-to-end learning by generating taxonomic structures and relations directly from textual inputs, as demonstrated in methods that model ontology components holistically. Reinforcement learning contributes to iterative refinement by framing ontology construction as a sequential decision process, where agents learn optimal actions for aligning or extending ontological elements through reward-based feedback, particularly in dynamic matching scenarios.

Hybrid models integrate statistical learning with rule-based reasoning to balance empirical learning with formal constraints, enhancing robustness in axiom discovery and ontology extension. Seminal frameworks, such as those combining statistical clustering with ontological heuristics, have set the foundation for these integrations, allowing learned patterns to be constrained by logic for consistent outputs. Overall, these ML integrations are assessed using standard measures such as precision, recall, and F-measure, underscoring their impact on ontology quality.
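A minimal supervised relation-labeling sketch with scikit-learn: context strings between a term pair (X, Y) are vectorized as word n-grams and classified with a linear SVM. The six training snippets and two labels are invented for illustration; real systems train on annotated corpora with richer features such as dependency paths.

```python
# Classify lexico-syntactic contexts between term pairs into relation types.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

contexts = [
    "X is a kind of Y", "X is a type of Y", "X and other Y",
    "X is a part of Y", "Y consists of X", "X belongs to Y as a component",
]
labels = ["is-a", "is-a", "is-a", "part-of", "part-of", "part-of"]

clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(contexts, labels)

print(clf.predict(["X is one kind of Y", "Y consists of many X"]))
# e.g. ['is-a' 'part-of']
```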

Tools and Implementations

Open-Source Frameworks

Open-source frameworks play a crucial role in ontology learning by providing accessible, extensible platforms for researchers and developers to implement extraction, derivation, and population tasks without proprietary dependencies. These tools often emphasize modularity to allow integration of various natural language processing (NLP) and machine learning techniques, facilitating the construction of ontologies from unstructured text sources.

One prominent example is Text2Onto, a Java-based framework designed for ontology learning from textual resources through semi-automatic processes. It features a probabilistic model that supports the extraction of terms, concepts, and relations using techniques such as pattern matching and association rule mining, while allowing dynamic updates to existing ontologies. Text2Onto integrates with OWL formats via standard APIs, enabling seamless export and reasoning over learned structures.

The General Architecture for Text Engineering (GATE) is a versatile open-source framework commonly used for preprocessing in ontology learning workflows, including semantic tagging and annotation. GATE provides modular pipelines for tasks like tokenization, entity recognition, and ontology population, with built-in support for linking annotations to ontologies through its ontology editor plugin. For instance, GATE can preprocess corpora for input into relation extraction tools, enabling the identification of non-taxonomic relations via pattern-based annotation.

These frameworks typically offer modular pipelines that span from term extraction to ontology population and extension, often integrating with OWL APIs such as the OWL API library for standardized representation and inference. Many implement NLP approaches like part-of-speech tagging and statistical methods for relation discovery, as seen in their configurable processing chains. Post-2015 developments have sustained activity in these areas, with ongoing maintenance and extensions available via repositories; for example, GATE's core has seen regular updates for enhanced plugin support. Recent advancements include LLM-based frameworks, such as those evaluated in the LLMs4OL challenge (2025), which automate ontology learning tasks using large language models for term extraction, concept typing, and taxonomy induction across diverse domains.

Notable Systems and Case Studies

One prominent system in ontology learning is the DBpedia extraction pipeline, which automatically extracts structured knowledge from Wikipedia articles to populate a large-scale, multilingual knowledge base. The pipeline processes Wikipedia infoboxes, templates, and other semi-structured elements using mapping-based extractors to generate RDF triples aligned with a shared ontology comprising 768 classes and over 3,000 properties. This approach has enabled the creation of a knowledge graph with over 1.3 billion facts in English alone, as of the 2025-06 release, demonstrating scalability in deriving taxonomic and relational structures from unstructured text sources.

YAGO represents another key system, extending traditional knowledge bases by incorporating temporal facts extracted from Wikipedia, GeoNames, and WordNet. It anchors entities, events, and relations to specific times and locations, using rule-based and statistical methods to infer over 80 million facts from nearly 10 million entities in its early versions. By the 2020s, YAGO 4 has scaled to more than 50 million entities and 2 billion facts, highlighting its effectiveness in handling temporal dimensions for dynamic ontology population and adaptation across domains.

For non-taxonomic relations, semantic orientation analyzers like those in ontology-supported polarity mining systems identify directional relations such as positive or negative associations between concepts. These tools compute semantic scores based on co-occurrence patterns in text corpora, enabling the learning of affective or evaluative relations within ontologies. Such methods have been applied to enhance sentiment extraction by quantifying semantic-orientation differences, improving the reliability of inferred links in domain-specific knowledge bases.

In biomedical applications, the NCBO Annotator serves as a notable tool from the National Center for Biomedical Ontology, leveraging BioPortal's repository of over 1,500 ontologies to annotate clinical texts and support ontology learning. It identifies and maps biomedical terms in unstructured text, such as electronic health records, to ontology classes, facilitating the extension of existing biomedical ontologies with new instances and relations. This system has processed millions of annotations, aiding in the discovery of hierarchical and associative structures in biomedical data while addressing domain-specific challenges like synonymy.

A practical case study in e-commerce involves ontology learning from product reviews to construct product hierarchies and feature relations. By applying text mining techniques to customer feedback on items like digital cameras, researchers extracted design attributes (e.g., battery life, image quality) and populated ontologies with user-derived relations, using 296 reviews across models from several major manufacturers. This approach bridged customer insights with product design, revealing challenges in handling noisy, opinionated text for commercial ontology extension.

These systems underscore key outcomes in ontology learning, such as YAGO's handling of billions of facts for temporal scalability, and lessons in domain adaptation from biomedical and e-commerce contexts, where careful preprocessing improves robustness to varied sources. By the 2020s, ontology learning has evolved toward systems integrating large language models (LLMs), as seen in ontology-guided prompt learning frameworks that combine KG ontologies with LLM-generated queries for enhanced generalization.

Evaluation and Challenges

Assessment Metrics

The evaluation of learned ontologies in ontology learning relies on a combination of quantitative and qualitative metrics to assess the accuracy, quality, and utility of the extracted knowledge structures. For tasks involving the extraction of concepts, relations, or instances from data sources, standard information retrieval metrics such as precision, recall, and F-measure are commonly applied. Precision measures the proportion of extracted elements that are correct relative to the total extracted, recall evaluates the proportion of relevant elements successfully identified from the entire set of true elements, and the F-measure provides a balanced harmonic mean of the two, often weighted (F1-score) to emphasize their trade-off. These metrics are particularly useful in comparing learned outputs against gold standard ontologies, where true positives (TP), false positives (FP), and false negatives (FN) are determined by alignment techniques.

Beyond extraction-specific measures, overall quality is gauged through metrics like cohesion and coverage. Cohesion, often interpreted as semantic coherence, assesses the internal relatedness and meaningful interconnectedness of elements, such as the density of relations among concepts, to ensure logical consistency without contradictions. Coverage, synonymous with completeness in this context, evaluates how comprehensively the ontology represents the target domain, typically by measuring the proportion of entities or relations captured relative to an expected reference. These are derived from structural analyses, where low semantic cohesion might indicate sparse or unrelated concepts, while incomplete coverage highlights gaps in representation.

Gold standards serve as benchmarks for rigorous comparison, often involving manually curated ontologies like the Gene Ontology (GO) in biomedical domains, where learned hierarchies are aligned and scored for depth, breadth, and fidelity to expert annotations. Task-specific metrics, such as hierarchy depth (average levels of subsumption) or branching factor, further quantify structural adequacy against these references. Evaluation methods are broadly categorized as intrinsic or extrinsic: intrinsic methods focus on internal properties like consistency and completeness through automated checks (e.g., reasoner-based contradiction detection), while extrinsic methods assess performance in downstream applications, such as improved accuracy in question-answering systems or information retrieval tasks.

A prominent example of standardized assessment is the Ontology Alignment Evaluation Initiative (OAEI), which evaluates aspects of learned ontologies using precision, recall, and F-measure against reference mappings in tracks like anatomy or other biomedical datasets, providing comparative benchmarks across systems. These campaigns highlight the importance of robust standards to quantify quality, often revealing trade-offs in scalability versus accuracy for ontology learning approaches.
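A minimal scoring sketch for the precision/recall/F1 computation described above, comparing extracted relations to a gold standard as exact triples; the two sets are invented for the example.

```python
# Score an extracted relation set against a gold-standard set of triples.
def prf(extracted, gold):
    tp = len(extracted & gold)                       # true positives
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # harmonic mean
    return precision, recall, f1

gold = {("dog", "is-a", "mammal"), ("cat", "is-a", "mammal"),
        ("wheel", "part-of", "car")}
extracted = {("dog", "is-a", "mammal"), ("wheel", "part-of", "car"),
             ("dog", "is-a", "pet")}

p, r, f = prf(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
# precision=0.67 recall=0.67 F1=0.67
```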

Limitations and Future Directions

Ontology learning faces significant scalability challenges when processing massive datasets, as traditional shallow learning methods often struggle to handle large-scale, diverse data volumes efficiently, leading to computational bottlenecks and reduced performance. Additionally, these approaches exhibit limitations in managing ambiguity and contextual nuances in textual inputs, where inter-ontology uncertainties can introduce inconsistencies during alignment and integration. Domain portability remains a persistent issue, with models trained on specific corpora performing poorly when applied to new domains due to the lack of generalizable feature representations across varied textual sources.

Machine learning integration in ontology learning introduces biases inherent to training data, which can propagate stereotypes or skewed representations into the resulting ontologies, amplifying undesired societal prejudices in downstream applications. For instance, ML models may favor dominant linguistic patterns in monolingual datasets, exacerbating underrepresentation of minority concepts or relations. Notable gaps include underdeveloped multilingual support, where current techniques predominantly rely on English-centric resources, hindering effective ontology construction from non-English texts and limiting global applicability. Integration with dynamic knowledge graphs poses further challenges, as static learning pipelines fail to accommodate real-time updates and evolving relational structures, resulting in outdated or incomplete representations. Ethical concerns, particularly in ontology population, arise from the extraction of sensitive entity relations from personal or proprietary texts, risking unintended disclosure without robust anonymization mechanisms.

Looking ahead, future directions emphasize leveraging large language models, such as GPT variants developed post-2020, for zero-shot ontology construction, enabling more flexible and context-aware learning from unstructured text without extensive retraining. Recent benchmarks, such as the LLMs4OL 2025 challenge, demonstrate that hybrid approaches combining commercial LLMs with domain-tuned embeddings and fine-tuning achieve high performance in tasks like taxonomy discovery and relation extraction. Neuro-symbolic hybrids represent a promising avenue, combining neural learning with symbolic reasoning to improve interpretability and accuracy in ontology construction by bridging statistical patterns with logical inference. By 2025 and beyond, there is a clear shift toward incremental paradigms for evolving ontologies, incorporating continual learning techniques to adapt class hierarchies and relations dynamically as new data emerges, thus supporting persistent knowledge maintenance.

References

  1. [1]
    [PDF] Ontology Learning from Text: A Survey of Methods
    An early paper on semantic clustering is Hindle (1990), which aims at finding seman- tically similar nouns by comparing their behavior with respect to ...
  2. [2]
    [PDF] A Survey of Ontology Learning from Text - UPV
    This area of research is usually referred to as ontology learning [1]-[3]. We present in this paper a survey of ontology learning from text. Section 2 ...
  3. [3]
    None
    ### Summary of Ontology Learning from the Document
  4. [4]
    [PDF] Ontology Learning for the Semantic Web - UMBC CSEE
    the Semantic Web. Alexander Maedche and Steffen Staab, University of Karlsruhe. The Semantic Web relies heavily on formal ontologies to structure data for com-.
  5. [5]
    A survey of ontology learning techniques and applications - PMC - NIH
    Oct 5, 2018 · Ontology learning is a multidisciplinary task that extracts important terms, concepts, attributes and relations from unstructured text by ...
  6. [6]
    Ontology learning for the Semantic Web - IEEE Xplore
    The authors present an ontology learning framework that extends typical ontology engineering environments by using semiautomatic ontology construction tools.
  7. [7]
    [PDF] Perspectives on Ontology Learning - Jens Lehmann
    Nov 2, 2005 · The field of ontology learning, a term coined by Alexander Mädche and Steffen Staab in 2001 [7], is concerned with the development of methods ...
  8. [8]
    (PDF) Text2onto - ResearchGate
    Aug 7, 2025 · In this paper we present Text2Onto, a framework for ontology learning from textual resources. ... ontology learning tools should be indepen-. dent ...
  9. [9]
    survey of ontology learning techniques and applications | Database
    Oct 5, 2018 · This paper describes the process of ontology learning and further classification of ontology learning techniques into three classes (linguistics, statistical ...
  10. [10]
    [PDF] A translation approach to portable ontology specifications
    This paper describes a mechanism for defining ontologies that are portable over representation systems. Definitions written in a standard format for predicate ...
  11. [11]
    [PDF] Ontology Development 101: A Guide to Creating Your First ... - protégé
    1 Why develop an ontology? In recent years the development of ontologies—explicit formal specifications of the terms in the domain and relations among them ...
  12. [12]
    OWL Web Ontology Language Reference - W3C
    Feb 10, 2004 · This document contains a structured informal description of the full set of OWL language constructs and is meant to serve as a reference for OWL users.
  13. [13]
    The Role of Ontologies in Knowledge Engineering - ResearchGate
    Jun 10, 2016 · An ontology, in the Knowledge Engineering and Artificial Intelligence sense, is a framework for the domain knowledge of an intelligent system.
  14. [14]
    Ontology learning towards expressiveness: A survey - ScienceDirect
    Ontology learning covers all the techniques which help knowledge engineers to build or update an ontology of a given domain by automatizing its whole ...
  15. [15]
  16. [16]
    Natural Language Processing Methods and Systems for Biomedical ...
    While the biomedical informatics community widely acknowledges the utility of domain ontologies, there remain many barriers to their effective use.
  17. [17]
  18. [18]
    Ontology learning: State of the art and open issues - ResearchGate
    Among these challenges are achieving fully automatic construction, handling noise terms to improve pre-processing, enhancing the discovery of relations between ...
  19. [19]
    [PDF] Unresolved Issues in Ontology Learning - Position Paper -
    The issue of text understanding refers to the ambiguity and complexity of natural lan- guage and raises the question of the availability of NLP tools able to ...
  20. [20]
    [PDF] Domain Ontology Learning Enhanced by Optimized Relation ...
    In this paper, we propose a method using DBpedia in a different manner. We utilize relation instances in DBpedia to supervise the ontology learning procedure ...
  21. [21]
    Survey on terminology extraction from texts - Journal of Big Data
    Feb 6, 2025 · Hybrid terminology extraction methods combine linguistic analysis, statistical analysis, and other techniques, leveraging the strengths of ...
  22. [22]
  23. [23]
  24. [24]
    Identifying Terms to Represent Concepts of a Work Process Ontology
    The findings of the comparison analysis suggest that the TFIDF term weighting scheme exhibits better results compared to the TF and MI weighting schemes.
  25. [25]
  26. [26]
    [PDF] C-value/NC-value Method - The University of Manchester
    The method we present and evaluate in this paper extracts multi-word terms from English corpora com- bining linguistic and statistical information. It is ...Missing: original | Show results with:original
  27. [27]
    [PDF] Learning Domain Ontologies from Document Warehouses and ...
    Semantic interpretation is the process of determining the right concept (sense) for each component of a complex term (this is known as sense disambiguation) and ...
  28. [28]
    Automatic Acquisition of Hyponyms from Large Text Corpora
    Marti A. Hearst. 1992. Automatic Acquisition of Hyponyms from Large Text Corpora. In COLING 1992 Volume 2: The 14th International Conference on Computational ...
  29. [29]
    [PDF] Learning subsumption hierarchies of ontology concepts from texts
    The method identifies concepts in documents and organizes them into a subsumption hierarchy, without presupposing the existence of a seed ontology. The method ...
  30. [30]
  31. [31]
    [PDF] A Metric-based Framework for Automatic Taxonomy Induction
    This paper presents a novel metric-based framework for the task of automatic taxonomy induction. The framework incrementally clus-.
  32. [32]
    A method of ontology evaluation based on coverage, cohesion and ...
    Oct 22, 2021 · The metrics are coverage, cohesion, coupling. Then we improved the method of measuring coverage by. comparing the number of concepts and ...
  33. [33]
    Evaluating techniques for learning non-taxonomic relationships of ...
    Sep 1, 2014 · Learning non-taxonomic relationships is a sub-field of Ontology Learning that aims at automating the extraction of these relationships from ...Missing: methods | Show results with:methods
  34. [34]
    A Process for Extracting Non-Taxonomic Relationships of ...
    Ontology learning looks for identifying ontology elements like non-taxonomic relationships from information sources. These relationships correspond to slots in ...
  35. [35]
    [PDF] Learning Onto-Relational Rules with Inductive Logic Programming
    Oct 29, 2012 · In this chapter we take a critical look at ILP proposals for learning relational rules while having an ontology as the background theory. These ...Missing: papers | Show results with:papers
  36. [36]
    [PDF] Using Association Rules for Ontology Enrichment - CEUR-WS.org
    In this article, we propose a new approach for enriching an existing ontology by the use of association rules using the Apriori algorithm applied to a database.
  37. [37]
    Named Entity Recognition for Ontology Population using ... - IGI Global
    Named Entity Recognition for Ontology Population using Background Knowledge from Wikipedia. Ziqi Zhang (University of Sheffield, UK) and Fabio Ciravegna ...
  38. [38]
    (PDF) Named Entity Recognition for Ontology Population using ...
    Mar 29, 2018 · (Nothman et al., 2008). Named Entity Recognition and Ontology Population. NER plays a critical role in many complex knowledge discovery and ...
  39. [39]
    A domain-independent process for automatic ontology population ...
    Dec 1, 2014 · This is a new approach for automatic ontology population that uses an ontology to automatically generate rules to extract instances from text and classify them ...
  40. [40]
    [PDF] ENRICHMENT AND POPULATION OF A GEOSPATIAL ONTOLOGY ...
    concepts or instances for enriching or populating the ontology. (e.g., urban area, geography education, natural environment). Among the identified noun ...
  41. [41]
    Building a geographical ontology by using Wikipedia - ResearchGate
    This paper introduces an approach to build a geographical ontology of countries at the global scale. Our approach is based on the free online encyclopedia ...
  42. [42]
    The Suggested Upper Merged Ontology (SUMO) - Ontology Portal
    Aug 10, 2025 · The Suggested Upper Merged Ontology (SUMO) and its domain ontologies form the largest formal public ontology in existence today.
  43. [43]
    [PDF] Aligning Conference Ontologies with SUMO: A Report on Manual ...
    We present the process of establishing a consensual alignment between the Con- ference ontologies and SUMO (Suggested Upper Merged Ontology) [19,21]. Matching.
  44. [44]
    [PDF] Ontology Evolution and Versioning
    This report gives an overview of the state of the art of research done so far in the fields of Ontology Evolution and Versioning. i. Page 4. Contents. 1 ...
  45. [45]
    Integrating Tools of Evolution and Versioning in Ontology
    Aug 8, 2012 · This paper presents a new way to manage the lifecycle of an ontology incorporating both versioning tools and evolution process.
  46. [46]
    [PDF] Unsupervised Ontology Induction from Text - ACL Anthology
    We present OntoUSP, a system that induces and populates a probabilistic on- tology using only dependency-parsed text as input. OntoUSP builds on the USP ...
  47. [47]
    A semantic role labelling-based framework for learning ontologies ...
    This paper proposes a new ontology learning methodology based on semantic role labeling from digital Spanish documents.Missing: labeling | Show results with:labeling
  48. [48]
    Modular Ontology Learning with Topic Modelling over Core Ontology
    In this paper, we focus on modular taxonomy learning from text, where each module collects terms with the same topic insights.
  49. [49]
    Ontology Embedding: A Survey of Methods, Applications and ... - arXiv
    Jun 16, 2024 · In early 2010s, word embedding algorithms like Word2Vec were proposed to represent natural language words as low dimensional vectors with their ...
  50. [50]
    Ontology Vocabulary Extraction from Texts Using Transformers
    Three transformer models including BERT, RoBERTA, and DistilBERT are used for concept extraction, whereas, LUKE and MLUKE models are used for relation ...
  51. [51]
    [PDF] OLAF: An Ontology Learning Applied Framework - HAL
    Dec 13, 2023 · Possible approaches for this task include pattern matching from Part-of-Speech (POS) tagging techniques and score-based methods (occurrences, ...
  52. [52]
    BertSRC: transformer-based semantic relation classification
    Sep 6, 2022 · We built a relation classification model based on Bidirectional Encoder Representations from Transformers (BERT) trained on our dataset, applying our newly ...
  53. [53]
    Actively Learning Ontology Matching via User Interaction
    To address these questions, we propose an active learning framework for ontology matching, which tries to find the most informative candidate matches to query ...
  54. [54]
  55. [55]
    Deep reinforcement learning approach for ontology matching problem
    Jul 6, 2023 · This work presents a novel approach to ontology matching using a deep reinforcement learning (DRL) model. The model transforms the problem into ...
  56. [56]
    A Review of State of the Art Deep Learning Models for Ontology ...
    Aug 6, 2025 · A Review of State of the Art Deep Learning Models for Ontology Construction. IEEE Access, January 2024, PP(99):1-1. DOI ...
  57. [57]
    A flexible framework to experiment with ontology learning techniques
    Machine learning and automated language-processing techniques have been used to extract concepts and relationships from structured and unstructured data, such ...
  58. [58]
    Text2Onto - SpringerLink
    In this paper we present Text2Onto, a framework for ontology learning from textual resources. Three main features distinguish Text2Onto from our earlier ...
  59. [59]
    GateNLP/gateplugin-Ontology: Ontology support for GATE - GitHub
    Ontology support for GATE.
  60. [60]
    (PDF) DBpedia - A Large-scale, Multilingual Knowledge Base ...
    The DBpedia project maps Wikipedia infoboxes from 27 different language editions to a single shared ontology consisting of 320 classes and 1,650 properties. The ...
  61. [61]
    YAGO2: exploring and querying world knowledge in time, space ...
    We present YAGO2, an extension of the YAGO knowledge base with focus on temporal and spatial knowledge. It is automatically built from Wikipedia, GeoNames, and ...
  62. [62]
    Downloads/yago 4 | Yago Project
    YAGO is thus a simplified, cleaned, and “reasonable” version of Wikidata. It contains more than 50 million entities and 2 billion facts.
  63. [63]
    Ontology‐supported polarity mining - Wiley Online Library
    Oct 10, 2007 · The semantic orientation of a word is measured by the relative difference in semantic scores between a target word and the average of all ...
  64. [64]
    BioPortal: enhanced functionality via new Web services from ... - NIH
    BioPortal is a Web portal that provides access to a library of biomedical ontologies and terminologies developed in Web Ontology Language (OWL).
  65. [65]
    Ontology-based approach to extract product's design features from ...
    In this paper, online customer reviews are analyzed to identify key product attributes to be used in the conceptual design phase of a product ...
  66. [66]
    Ontology-Guided, Hybrid Prompt Learning for Generalization in ...
    Feb 6, 2025 · We present an ontology-guided, hybrid prompt learning strategy that integrates KG ontology into the learning process of hybrid prompts.
  67. [67]
    [PDF] A Survey on Ontology Evaluation Methods - HAL
    This paper addresses the issue of finding an efficient ontology evaluation method by presenting the existing ontology evaluation techniques, while discussing ...
  68. [68]
    [PDF] Approaches, methods, metrics, measures, and subjectivity in ...
    An ontology would have high cohesion if its classes are strongly related; therefore, high cohesion is a desirable property. Completeness measures if the ...
  69. [69]
    [PDF] A Conceptual Model for Ontology Quality Assessment
    This has been discussed under the aspects of ontology evaluation space in [32,72], namely the extrinsic and intrinsic aspects of ontologies. Thus, Figure 4 presents ...
  70. [70]
    [PDF] OnE: An Ontology Evaluation Framework - IMR Press
    Apr 29, 2020 · Clarity, consistency or coherence, conciseness, completeness, coverage, expandability/extendibility, correctness and minimal ontological ...
  71. [71]
    [PDF] Results of the Ontology Alignment Evaluation Initiative 2024
    The Ontology Alignment Evaluation Initiative (OAEI) aims at comparing ontology matching systems on precisely defined test cases. These test cases can be based ...
  72. [72]
  73. [73]
    A Short Review for Ontology Learning: Stride to Large Language ...
    Jun 17, 2024 · This paper gives a review of the approaches and challenges of ontology learning. It analyzes the methodologies and limitations of shallow-learning-based and deep- ...
  74. [74]
    [PDF] USING DOMAIN ONTOLOGY UNCERTAINTY OR ONTOLOGY ...
    Dec 31, 2023 · Usage of external ontologies introduces inter-ontology uncertainties, and can also lead to additional inter-ontology ambiguities if they are ...
  75. [75]
    A Survey of Ontology Learning Approaches - ResearchGate
    Aug 7, 2025 · In this paper, we present a survey of the different approaches in ontology learning from semi-structured and unstructured data.
  76. [76]
    Leveraging Ontologies to Document Bias in Data - arXiv
    Jun 29, 2024 · Machine Learning (ML) systems are capable of reproducing and often amplifying undesired biases. This puts emphasis on the importance of ...
  77. [77]
    [PDF] Towards an Ontology-Driven Approach to Document Bias
    The description and modeling of machine learning pipelines and measured biases with ontologies have the capacity to improve the interpretation of bias in data, ...
  78. [78]
    [PDF] Multilingual Knowledge Graphs: Challenges and Opportunities
    To construct a multilingual knowledge graph, there are several processes for integrating different languages, such as defining the purpose and scope, select ...
  79. [79]
    Using dynamic knowledge graphs to detect emerging communities ...
    Jun 21, 2024 · We propose a method for generating knowledge relationships over unconnected components of a knowledge graph, allowing for a targeted exploration of emerging ...
  80. [80]
    Privacy-Preserving Synthetically Augmented Knowledge Graphs ...
    Oct 16, 2024 · We introduce a novel privacy measure for KGs, which considers derived knowledge and a new utility metric that captures the business semantics we want to ...
  81. [81]
  82. [82]
    Enhancing Large Language Models through Neuro-Symbolic ... - arXiv
    Apr 10, 2025 · We propose a neuro-symbolic approach integrating symbolic ontological reasoning and machine learning methods to enhance the consistency and reliability of LLM ...
  83. [83]
    [2210.04993] Learning with an Evolving Class Ontology - arXiv
    Oct 10, 2022 · Lifelong learners must recognize concept vocabularies that evolve over time. A common yet underexplored scenario is learning with class labels ...