
BabelNet

BabelNet is a large-scale multilingual encyclopedic dictionary and semantic network that connects concepts and named entities across languages through semantic relations, providing wide lexicographic and encyclopedic coverage of terms. Developed at the Sapienza University of Rome's Natural Language Processing Group, it automatically integrates resources such as WordNet and Wikipedia to create a unified knowledge resource structured around multilingual synsets—groups of synonymous terms representing a single meaning in multiple languages. Conceived by Roberto Navigli and initially presented in 2010, BabelNet has evolved into a comprehensive encyclopedic dictionary and semantic network, with version 5.3 (released December 2023) featuring over 22 million synsets, more than 1.7 billion senses across 600 languages, and nearly 1.9 billion semantic relations. Each synset includes lexicalizations, definitions, images, and links to external resources, enabling applications in tasks such as word sense disambiguation, semantic relatedness computation, and multilingual natural language processing. The resource is maintained and extended by Babelscape, a company founded to commercialize its technology, and has received support from the European Research Council. BabelNet's construction relies on an automatic algorithm that aligns monolingual lexicons and encyclopedias, ensuring broad coverage of both general vocabulary and specialized domains while handling named entities through integration with sources like YAGO and DBpedia. It offers programmatic access via APIs in Java and Python, a SPARQL endpoint for querying, and a Linked Data interface for RDF exports, facilitating its use in research and industry. Notable extensions include tools like Babelfy for word sense disambiguation and entity linking and VerbAtlas for verbal semantics, underscoring its role as a foundational resource in multilingual semantics.

Overview

Definition and Purpose

BabelNet is a multilingual lexical-semantic resource, encyclopedic dictionary, and semantic network that merges synonyms, translations, and definitions across languages into unified concepts called synsets, where each synset represents a single meaning with its lexicalizations in multiple languages. This structure allows for a cohesive representation of lexical and encyclopedic knowledge, bridging the gap between dictionary-like entries and broader conceptual interconnections in a semantic network. The primary purpose of BabelNet is to facilitate cross-lingual semantic understanding by offering a unified representation of word meanings across 600 languages as of version 5.3 (December 2023), thereby overcoming limitations in monolingual resources such as WordNet that lack extensive multilingual coverage. It addresses challenges in tasks such as word sense disambiguation and cross-lingual entity linking by providing language-independent access to concepts, enabling semantic equivalence across linguistic boundaries without relying on pairwise translations. Conceptually, BabelNet is founded on the vision of a universal multilingual dictionary that links words directly to underlying concepts, supporting applications in multilingual information retrieval, question answering, and knowledge base population by minimizing language-specific barriers. It was created by the Natural Language Processing group at Sapienza University of Rome, led by Roberto Navigli.

Key Features

BabelNet employs a synset-based organization, where each synset groups synonymous word senses from multiple languages that represent the same underlying concept, extending the WordNet model to a multilingual framework. This structure allows for the inclusion of named entities alongside common nouns, with each synset enriched with definitions derived from integrated lexical and encyclopedic sources, associated images for visual representation, and domain labels that categorize concepts into specific fields such as arts, science, or sports. The resource provides extensive multilingual coverage across 600 languages as of version 5.3 (December 2023), achieved through automatic alignment techniques that leverage inter-language links from Wikipedia and machine translation for less-resourced languages. This integration combines encyclopedic depth—offering detailed, Wikipedia-like entries for broader contextual knowledge—with the lexical precision of WordNet, enabling fine-grained sense distinctions while facilitating cross-lingual applications. As a bridge for cross-lingual tasks, it supports applications requiring consistent concept mapping across diverse linguistic contexts. Semantic relations in BabelNet include hypernymy (is-a), meronymy (part-of), and other WordNet-style pointers, systematically extended to multilingual synsets for hierarchical and compositional reasoning. Additionally, it incorporates Wikipedia-derived edges, such as related-to links, to capture broader semantic relatedness beyond strict taxonomies. Among its unique aspects, BabelNet supports automatic disambiguation through associated tools like Babelfy, which performs joint word sense disambiguation and entity linking across hundreds of languages using graph-based algorithms. It also establishes bidirectional links to external resources, including WordNet for lexical senses and Wikidata for structured knowledge and properties, enhancing reasoning and interoperability with broader knowledge graphs.
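
The synset-centric organization described above can be sketched as a simple data structure. The class and field names below are illustrative assumptions for exposition, not BabelNet's actual data model or API, and the identifiers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Illustrative sketch of the synset-centric model described above.
# Class names, field names, and IDs are assumptions, not BabelNet's API.

@dataclass
class Sense:
    lemma: str       # lexicalization, e.g. "dog"
    language: str    # language code, e.g. "EN"
    source: str      # contributing resource, e.g. "WN" or "WIKI"

@dataclass
class Synset:
    synset_id: str                                            # hypothetical ID
    senses: List[Sense] = field(default_factory=list)
    glosses: Dict[str, str] = field(default_factory=dict)     # language -> definition
    images: List[str] = field(default_factory=list)           # image URLs
    domains: List[str] = field(default_factory=list)          # domain labels
    relations: Dict[str, List[str]] = field(default_factory=dict)  # relation type -> target IDs

# A toy multilingual synset for the "dog" concept (all values illustrative).
dog = Synset(
    synset_id="bn:00015267n",
    senses=[Sense("dog", "EN", "WN"), Sense("chien", "FR", "WIKI"), Sense("cane", "IT", "WIKI")],
    glosses={"EN": "A domesticated carnivorous mammal."},
    domains=["ANIMALS"],
    relations={"hypernym": ["bn:00015388n"]},  # hypothetical target synset
)
print(dog.senses[1].lemma)  # chien
```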

History and Development

Origins and Creation

BabelNet originated from the need to overcome the limitations of monolingual lexical-semantic resources, such as WordNet, which were primarily English-centric and lacked broad multilingual coverage, thereby hindering applications in global natural language processing tasks. Researchers recognized that existing lexical knowledge bases suffered from high manual maintenance costs and insufficient support for multiple languages, motivating the creation of an automated, wide-coverage multilingual alternative that could leverage vast encyclopedic knowledge. This vision was driven by the potential to enable cross-lingual semantic understanding, inspired by the semantic network model of WordNet but extended to handle diverse languages through integration with collaborative resources like Wikipedia. The project was first presented in the 2010 paper "BabelNet: Building a Very Large Multilingual Semantic Network" by Roberto Navigli and Simone Paolo Ponzetto, introduced at the 48th Annual Meeting of the Association for Computational Linguistics in Uppsala, Sweden. In this seminal work, the authors outlined the initial construction of BabelNet as an automatic process that mapped WordNet's English synsets to Wikipedia articles across languages, using context-based disambiguation and machine translation to generate multilingual lexicalizations. This knowledge-based approach allowed for the rapid assembly of a large-scale semantic network without extensive manual annotation, establishing BabelNet as a foundational resource for multilingual natural language processing. Initial development was led by Roberto Navigli at the Sapienza Natural Language Processing (NLP) Group within the Department of Computer Science at Sapienza University of Rome, with contributions from Simone Paolo Ponzetto during his affiliation with Heidelberg University. The effort received funding through the European Research Council's MultiJEDI Starting Grant (grant number 259234), a five-year project running from 2011 to 2016, which supported the expansion and refinement of BabelNet as part of broader research on multilingual joint word sense disambiguation and entity linking. Engineering aspects were later handled in collaboration with Babelscape, a Sapienza University spin-off founded in 2016 by Navigli and focused on multilingual NLP technologies.

Evolution of Versions

BabelNet's development has progressed through iterative releases since its inception, with each version expanding multilingual coverage, integrating new resources, and refining alignment techniques to enhance semantic connectivity. The initial version emerged from foundational research integrating WordNet and Wikipedia, evolving into a vast multilingual knowledge base through systematic additions of lexical and encyclopedic sources. Subsequent updates focused on broadening language support, improving mapping accuracy, and addressing scalability.

Version 1.0, introduced in 2010, marked the project's launch as an automatic merger of WordNet's English synsets with Wikipedia's multilingual entries, creating an initial semantic network with basic cross-lingual links. By version 1.1 in January 2013, coverage extended to six languages and four sources, including DBpedia, laying the groundwork for wider encyclopedic integration. Version 2.0, released in March 2014, scaled to 50 languages and added OmegaWiki, while version 2.5 in November 2014 incorporated further sources, enriching relational structures and multilingual senses. Further advancements in version 3.0 (December 2014) dramatically increased language coverage to 271 and enhanced resource mappings, followed by version 3.5 (September 2015), which introduced BabelDomains for semantic labeling and additional wordnets. Version 4.0 in 2018 integrated additional resources such as YAGO, boosting synset validation with manual precision checks exceeding 90%. Key evolutions included the gradual incorporation of new collaborative sources from 2014, images from 2015, and domain labels in 2015, alongside machine learning refinements for alignment accuracy, such as BERT-based methods in later iterations. Version 5.0, released in February 2021, reached 500 languages and 51 sources, notably integrating VerbAtlas for verbal semantics and achieving over 99.5% precision through extensive manual validation. The most recent major update, version 5.3 in December 2023, expanded coverage to 600 languages and 53 sources, adding 80 new languages and updating core resources such as Wikipedia, Wiktionary, and Wikidata. Scalability challenges in early versions, such as handling massive alignments, were addressed through improved processing infrastructure, enabling efficient handling of billions of senses. No significant updates have been announced since 5.3 as of 2025.

Milestones include the 2015 META Prize awarded to the BabelNet team for its contributions to multilingual language technology, and a 2022 workshop celebrating the project's tenth anniversary, following the 2021 IJCAI survey paper reviewing a decade of progress. These developments underscore BabelNet's shift from an early multilingual prototype to a comprehensive, machine-refined global semantic network.

Architecture and Model

Semantic Network Structure

BabelNet is formally modeled as a labeled directed graph G = (V, E), where the set of vertices V represents concepts and entities, and the set of edges E encodes semantic relations between them, with each edge labeled according to its relation type. This structure extends the lexical-semantic paradigm of WordNet to a multilingual scale, enabling the representation of both fine-grained lexical meanings and broad encyclopedic knowledge. The nodes in BabelNet primarily consist of synsets, which serve as concept nodes that group synonymous word senses across multiple languages into a single meaning unit; for example, a synset might include "dog" in English, "chien" in French, and "cane" in Italian. Separate nodes are designated for named entities, such as persons, locations, or organizations, to distinguish them from general concepts and support entity-specific linkages. Each synset is enriched with attached glosses—textual definitions derived from integrated sources—and images, typically sourced from Wikipedia entries, to provide descriptions of the represented meaning. Edges in the graph are categorized into semantic relations and relatedness links. Semantic relations include structured pointers such as hypernymy, representing "is-a" hierarchies (e.g., "dog" is-a "animal"), with over 364,000 such relations initially drawn from WordNet 3.0 and extended through alignments with other resources. Relatedness edges, which capture looser associations, are derived from co-occurrences in Wikipedia articles, yielding over 1.9 billion undirected links that connect concepts based on contextual proximity rather than strict taxonomy. The graph exhibits directed acyclic properties in its taxonomic components, ensuring hierarchical consistency without loops in relations like hypernymy, while the relatedness edges permit cycles to reflect real-world semantic interconnections. Overall, the network comprises 1,911,610,725 relations across its 22,892,310 synsets as of version 5.3 (December 2023). BabelNet's formal representation is compatible with RDF and Linked Data standards, facilitating its use as an ontology in Semantic Web applications, with each synset assigned a unique identifier of the form "bn:" followed by an 8-digit number and a part-of-speech tag, such as bn:00000001n for the concept "animal".
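
A minimal sketch of this labeled graph model, using plain Python structures and toy data (the synset IDs and edges below are hypothetical), illustrates how vertices, typed edges, and the "bn:" identifier format fit together.

```python
import re

# Minimal sketch of the labeled graph model G = (V, E) described above.
# V holds synset IDs; E holds (source, relation_label, target) triples.
# All IDs and edges are hypothetical toy data.
V = {"bn:00000001n", "bn:00015267n", "bn:00020000n"}
E = [
    ("bn:00015267n", "hypernym", "bn:00000001n"),  # taxonomic ("is-a") edge
    ("bn:00015267n", "related",  "bn:00020000n"),  # Wikipedia-derived relatedness edge
]

# BabelNet IDs follow the pattern "bn:" + 8 digits + a part-of-speech tag
# (n = noun, v = verb, a = adjective, r = adverb).
ID_PATTERN = re.compile(r"^bn:\d{8}[nvar]$")

def neighbors(synset_id, relation=None):
    """Return outgoing neighbors, optionally filtered by relation label."""
    return [t for s, r, t in E if s == synset_id and (relation is None or r == relation)]

assert all(ID_PATTERN.match(v) for v in V)
print(neighbors("bn:00015267n", relation="hypernym"))  # ['bn:00000001n']
```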

Integration Methodology

BabelNet's integration methodology centers on a knowledge-based mapping process that aligns WordNet synsets with Wikipedia pages to form unified "babel synsets." This core algorithm employs string similarity measures on glosses and definitions, combined with exact or fuzzy matching of page titles, to link English-centric WordNet senses to Wikipedia's encyclopedic entries. For instance, the disambiguation relies on context overlap scoring, where the intersection of lexical contexts—such as synonyms, hypernyms, and category labels from both resources—determines the best alignment, achieving an F1 score of approximately 79% in early implementations. To extend this to non-English languages, the methodology incorporates machine translation, applied to SemCor-annotated sentences and Wikipedia excerpts, thereby generating multilingual lexicalizations and enriching babel synsets with translations from inter-language links. Alignment techniques further refine this mapping through bilingual dictionaries for sense induction across languages and graph propagation algorithms to infer semantic relations. Bilingual resources, including those derived from Wikipedia's inter-language links, enable the induction of senses in languages like Italian or French by propagating alignments from English pivots, covering up to 86% of word senses in aligned wordnets. Relation inference uses graph-based propagation, leveraging structural similarities in WordNet's hypernymy chains and Wikipedia's category hierarchies, weighted by metrics like the Dice coefficient to extend edges beyond direct mappings—resulting in millions of inferred relations. Ambiguities are handled via overlap-based scoring, prioritizing alignments with the highest contextual intersection (e.g., |Ctx(s) ∩ Ctx(w)| + 1), which resolves polysemy by favoring disambiguated Wikipedia pages over redirects. These techniques ensure a cohesive multilingual graph while preserving the distinctiveness of input resources.

The methodology has evolved from rule-based heuristics in initial versions to incorporating machine learning for enhanced precision. Early releases (v1, 2010–2013) relied on deterministic rules and bag-of-words matching for mapping, but subsequent iterations integrated graph-based algorithms with deeper propagation (up to depth 2) to improve recall. By v3 and later, machine learning models were adopted for entity linking and sense disambiguation, notably through the integration of Babelfy—a tool that uses personalized PageRank on the BabelNet graph combined with surface-level features to achieve state-of-the-art word sense disambiguation. This shift addressed limitations in handling noisy alignments, boosting overall accuracy. Quality control involves iterative manual validation on subsets of mappings, with error rates progressively reduced through refinement. In v1, manual evaluation of 3,000 synsets revealed an error rate of about 15%, primarily from incomplete multilingual coverage. By v5 (2021), over 90% of core Wikipedia–WordNet mappings underwent manual curation, yielding error rates under 5% and precision exceeding 99.5% on validated subsets. This process ensures the reliability of the unified semantic network.
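
The context-overlap scoring at the heart of the original mapping heuristic can be illustrated with a simplified sketch. The contexts below are toy data; the actual construction pipeline builds much richer contexts (synonyms, hypernyms, gloss words, Wikipedia categories) and applies further refinements.

```python
# Simplified sketch of the context-overlap mapping heuristic described above.
# The contexts and candidate pages are toy data, not actual resource content.

def context_overlap_score(wordnet_ctx: set, wikipedia_ctx: set) -> int:
    # The "+ 1" keeps the score nonzero so a fallback mapping remains possible.
    return len(wordnet_ctx & wikipedia_ctx) + 1

def map_sense_to_page(sense_ctx: set, candidate_pages: dict) -> str:
    """Pick the Wikipedia page whose context overlaps most with the WordNet sense context."""
    return max(candidate_pages, key=lambda page: context_overlap_score(sense_ctx, candidate_pages[page]))

# Toy contexts for the financial sense of "bank" (illustrative only).
sense_ctx = {"money", "deposit", "financial", "institution", "loan"}
candidates = {
    "Bank":             {"financial", "institution", "money", "credit", "loan"},
    "Bank (geography)": {"river", "slope", "water", "land"},
}
print(map_sense_to_page(sense_ctx, candidates))  # -> "Bank"
```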

Content and Resources

Integrated Sources

BabelNet integrates a wide array of external linguistic and knowledge resources to form its multilingual semantic network, with primary sources providing the foundational lexical and encyclopedic elements. The core is seeded by WordNet, particularly Princeton WordNet 3.0 and the Open English WordNet, which supply lexical relations and serve as the English-language base for synsets and semantic connections. Wikipedia contributes encyclopedic definitions and multilingual pages, forming the bulk of the resource's descriptive content across hundreds of languages, while Wiktionary adds translations and lexical information for additional languages, enhancing cross-lingual coverage without structured sense distinctions. Secondary sources further enrich the network with specialized and collaborative data. Wikidata provides structured entities and properties, linking millions of named entities to BabelNet's concepts. OmegaWiki offers a collaborative, multilingual lexicon, contributing synset-like structures for diverse terms. VerbAtlas supplies verbal relations, including semantic roles for predicates, which are transferred multilingually to expand relational depth. Over 50 additional resources, including more than 30 regional wordnets, provide language-specific lexical data to broaden global representation. In terms of contributions, Wikipedia offers detailed explanations and interconnections that differentiate BabelNet from purely lexical resources, WordNet establishes the semantic core through its foundational synsets and relations, and the encyclopedic sources contribute around 15 million named entities, enabling robust entity linking and knowledge grounding. These integrations emphasize open-source materials, with a total of 53 resources fused in version 5.3. The sources are refreshed annually to maintain currency, with BabelNet 5.3 incorporating the November 2023 dumps of Wikipedia, Wiktionary, and Wikidata, alongside the October 2023 Open English WordNet update. This periodic synchronization ensures evolving coverage without disrupting the resource's structural integrity.

Scale and Coverage Statistics

BabelNet version 5.3, released in December 2023, represents a vast multilingual semantic resource, encompassing 600 languages and totaling 1.7 billion word senses across its entries. This scale is evidenced by 22.9 million synsets, which serve as the core units grouping synonymous terms and concepts, alongside 7.3 million distinct concepts and 15.6 million named entities. The network further includes 159.7 million definitions and 61.4 million associated images, providing rich encyclopedic and visual context for its entries. The relational structure underscores BabelNet's depth, with 1.9 billion total relations connecting its elements, including approximately 1.9 billion Wikipedia-derived relatedness edges that capture broad semantic associations across languages. Additionally, domain-labeled synsets categorize content into specialized fields such as art, science, and technology, while the integration of WordNet contributes labeled relations like hypernymy and meronymy. These metrics highlight BabelNet's role as a comprehensive multilingual knowledge resource, particularly strong in European languages where coverage is extensive—for instance, English alone accounts for over 14 million synsets—while offering emerging support for low-resource languages, including examples such as Kavalan and Hadza. As of November 2025, no significant updates beyond version 5.3 have been released, suggesting that while the resource remains robust, its expansion may lag behind real-time linguistic developments in underrepresented areas.
Metric | Quantity (Version 5.3)
Languages | 600
Synsets | 22,892,310
Word Senses | 1,706,278,218
Concepts | 7,327,078
Named Entities | 15,565,232
Definitions | 159,683,527
Images | 61,431,991
Total Relations | 1,911,610,725
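
Simple arithmetic on the published totals gives a rough sense of density per synset; the snippet below only restates the figures from the table above.

```python
# Derived per-synset averages from the version 5.3 statistics above
# (plain arithmetic on the published totals, rounded to one decimal).

stats = {
    "synsets": 22_892_310,
    "word_senses": 1_706_278_218,
    "definitions": 159_683_527,
    "relations": 1_911_610_725,
}

for name in ("word_senses", "definitions", "relations"):
    print(f"{name} per synset: {stats[name] / stats['synsets']:.1f}")
# word_senses per synset: ~74.5, definitions per synset: ~7.0, relations per synset: ~83.5
```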

Applications and Impact

Natural Language Processing Uses

BabelNet serves as a foundational resource in word sense disambiguation (WSD), functioning both as a sense inventory for multilingual datasets and as a feature source in graph-based algorithms. In SemEval-2013 Task 12, systems leveraging BabelNet senses achieved state-of-the-art F1 scores exceeding 70% across English, French, German, Italian, and Spanish, with top performances reaching 71.0 F1 for Spanish nouns. As of 2021, more recent integrations had pushed accuracies beyond 80% in supervised WSD pipelines, as highlighted in comprehensive surveys of the field. By 2025, evaluations of large language models (LLMs) on extended benchmarks like XL-WSD, built using BabelNet, demonstrate its continued relevance in zero-shot multilingual WSD across 18 languages. For entity linking and named entity recognition, BabelNet enables the disambiguation of textual mentions to multilingual synsets, with the associated Babelfy tool providing a unified graph-based framework that jointly performs WSD and entity linking across numerous languages. This approach matches or surpasses monolingual baselines in accuracy, particularly for cross-lingual scenarios where named entities are mapped to BabelNet's integrated Wikipedia-derived concepts. In 2024, it has been applied in domain-specific tasks, such as biomedical entity linking using dual gloss encoders built on BabelNet 5.0. BabelNet's semantic network supports graph-based measures of similarity and relatedness through its weighted edges, derived from relations like hypernymy and meronymy in integrated resources. These measures enhance tasks such as paraphrase detection, where relatedness paths between synsets help identify semantic equivalents, and question answering, by ranking candidate answers based on contextual proximity in the network. For instance, approaches like NASARI utilize BabelNet's structure to generate interpretable embeddings that improve relatedness scoring in downstream applications. In multilingual applications, BabelNet facilitates cross-lingual transfer by aligning concepts across languages without parallel corpora, as seen in alignment benchmarks like MuCoW, which exploit BabelNet mappings for 10 language pairs. A 2021 survey underscores its role in enabling zero-shot multilingual word sense disambiguation, such as in the XL-WSD framework covering 18 languages, by providing a shared synset inventory for transfer from high-resource to low-resource settings.
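
As a generic illustration of the path-based relatedness idea mentioned above (not NASARI or any specific BabelNet measure), the following sketch computes a shortest-path relatedness score over a toy hypernym graph with hypothetical node labels.

```python
from collections import deque

# Generic path-based relatedness sketch over a toy hypernym graph.
# Node labels and edges are hypothetical; BabelNet-based measures use far
# richer information than shortest paths.

edges = {
    "bn:dog_n":    ["bn:canine_n"],
    "bn:canine_n": ["bn:mammal_n"],
    "bn:cat_n":    ["bn:feline_n"],
    "bn:feline_n": ["bn:mammal_n"],
    "bn:mammal_n": ["bn:animal_n"],
}

def shortest_path_length(a: str, b: str) -> float:
    """BFS over the undirected version of the toy graph; inf if unreachable."""
    undirected = {}
    for src, targets in edges.items():
        for tgt in targets:
            undirected.setdefault(src, set()).add(tgt)
            undirected.setdefault(tgt, set()).add(src)
    queue, seen = deque([(a, 0)]), {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in undirected.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return float("inf")

def relatedness(a: str, b: str) -> float:
    # Shorter taxonomic paths yield higher relatedness.
    return 1.0 / (1.0 + shortest_path_length(a, b))

print(relatedness("bn:dog_n", "bn:cat_n"))  # path length 4 -> 0.2
```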

Broader Applications

BabelNet enhances multilingual information retrieval by enabling semantic indexing that bridges language barriers in search engines. For instance, it supports cross-lingual terminology retrieval through its integration with terminological databases like IATE, the European Union's inter-institutional terminology resource, facilitating domain-specific searches across 24 official EU languages via tools such as Babelfy for entity linking and word sense disambiguation. This approach has been applied in EU-funded initiatives, including pilots for multilingual access to document collections, where query translation and semantic matching improve retrieval of diverse content. Additionally, BabelNet's synset-based representations aid in clustering related concepts from multilingual keywords, thereby refining search relevance. As of 2024, it supports massively multilingual vision-language evaluation in benchmarks like Babel-ImageNet, translating labels into 100 languages for zero-shot image classification in models like CLIP.

In knowledge graph completion, BabelNet integrates with resources such as DBpedia and YAGO to populate ontologies with multilingual semantic relations, enriching sparse graphs with encyclopedic and lexical data. This integration supports applications in recommendation systems, where semantic product matching leverages BabelNet's concept alignments to suggest items based on cross-lingual similarities, as seen in e-commerce scenarios involving diverse linguistic inventories. By combining BabelNet's wide-coverage network with DBpedia's structured extractions from Wikipedia and YAGO's temporal facts, these systems achieve more comprehensive entity resolution and relation inference without relying solely on monolingual sources.

BabelNet contributes to lexicography and the digital humanities by serving as a foundation for multilingual lexicographic tools, where its aligned synsets enable the creation of interlinked dictionaries that support linguistic research across hundreds of languages. In language learning applications, it functions as an interactive encyclopedic dictionary, providing users with concept explanations and translations to facilitate vocabulary acquisition and cultural understanding. Furthermore, within the digital humanities, BabelNet aids historical text analysis by mapping evolving terminologies to stable semantic networks, as demonstrated in studies of lexical change that trace conceptual shifts in archival documents. Its role in the ELEXIS project underscores this, promoting standardized interlinking of European lexicographic resources for scholarly research in low-resource languages. In 2025, it has been used to generate multilingual benchmarks for cross-lingual knowledge editing, such as BABELEDITS, which tests model performance on edits across languages.

Commercially, BabelNet is licensed through Babelscape, the company commercializing the technology, for deployment in services that require robust multilingual understanding. These licenses enable applications in chatbots, where BabelNet's semantic network powers context-aware responses across languages, and in content moderation systems that use synset clustering to detect semantically related violations in user-generated text. In e-commerce, it supports semantic product matching and personalized recommendations by integrating with enterprise knowledge bases, as evidenced by its use in document representation for fraud detection and inventory alignment in global marketplaces. Babelscape's solutions, adopted by large organizations, highlight BabelNet's scalability for semantic processing in production environments.

Associated Tools and Access

Core Tools

BabelNet is supported by several core tools that extend its functionality for practical applications in natural language processing and beyond. These tools leverage BabelNet's multilingual synsets to provide specialized functionalities—such as word sense disambiguation and entity linking, verbal semantics, and collocational analysis—and are generally accessible via open downloads or web APIs.

Babelfy serves as a key multilingual service for word sense disambiguation and entity linking, processing input text to identify and map ambiguous mentions—such as words or named entities—to the most appropriate BabelNet synsets. It employs a graph-based approach that first associates candidate meanings with BabelNet vertices using semantic signatures, then extracts linkable fragments from the text, and finally applies a densest subgraph heuristic to select coherent interpretations, outputting annotations with confidence scores that reflect the strength of the disambiguation. This enables handling of texts in over 270 languages, including a language-agnostic mode for mixed-language content, making it suitable for tasks like semantic annotation of documents. Babelfy is available through a web interface and a RESTful API, with its underlying model integrated with BabelNet 3.0.

VerbAtlas functions as a resource focused on verbal semantics, organizing BabelNet's verbal synsets into semantically coherent frames equipped with prototypical argument structures. It links verbs to corresponding BabelNet synsets while specifying semantic roles for arguments (e.g., Agent, Patient) and incorporating selectional preferences, such as restrictions on argument types, along with details on implicit, shadow, or default arguments to capture nuanced verbal behaviors. Covering 11,529 verb synsets from WordNet 3.0 fully integrated with BabelNet, this tool aids in semantic role labeling and verb understanding across languages. VerbAtlas is hand-crafted for accuracy and is downloadable under a non-commercial license, with a web interface for exploration.

SyntagNet provides a collocational knowledge base that captures lexical-semantic combinations, such as noun-verb or noun-noun pairs, by extracting co-occurrences from large corpora like Wikipedia and the British National Corpus, then manually disambiguating and aligning them to BabelNet synsets. It includes approximately 88,000 such combinations across more than 20,000 synsets, emphasizing syntagmatic relations that distinguish word senses based on contextual patterns, thereby complementing BabelNet's paradigmatic structure with practical usage data. This resource supports applications in word sense disambiguation and lexical semantics in multiple languages through tools like SyntagRank. SyntagNet is accessible via a web interface, an API, and downloads.

Additionally, alignments between ImageNet and BabelNet facilitate visual-semantic tasks by mapping ImageNet's image categories—derived from WordNet synsets—to BabelNet's multilingual concepts, enabling cross-modal applications like multilingual image captioning or vision-language model evaluation. This linkage, exemplified in resources like Babel-ImageNet, provides translations of over 1,000 ImageNet labels into more than 100 languages without relying on machine translation, ensuring high-quality semantic correspondence. Such alignments are integrated into BabelNet's ecosystem and support open-access benchmarks for research.
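
As an example of how the Babelfy service described above is typically queried over HTTP, the following sketch sends a disambiguation request. The endpoint, parameter names, and response fields reflect the public API as commonly documented but should be verified against the current Babelfy API guide; "YOUR_KEY" is a placeholder for a registered API key.

```python
import requests

# Hedged sketch of a Babelfy REST call; endpoint, parameters, and response
# fields are assumptions to be checked against the official API guide.
BABELFY_URL = "https://babelfy.io/v1/disambiguate"

params = {
    "text": "BabelNet is both a multilingual encyclopedic dictionary and a semantic network.",
    "lang": "EN",
    "key": "YOUR_KEY",  # placeholder: obtain a key by registering at babelnet.org
}

response = requests.get(BABELFY_URL, params=params, timeout=30)
response.raise_for_status()

for annotation in response.json():
    # Each annotation links a text fragment to a BabelNet synset with a score.
    fragment = annotation.get("charFragment", {})
    print(fragment.get("start"), fragment.get("end"),
          annotation.get("babelSynsetID"), annotation.get("score"))
```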

APIs and Interfaces

BabelNet provides programmatic access through dedicated client libraries in Java and Python, enabling developers to query its semantic network for synsets, senses, and relational paths. The Java API, version 5.3 released in December 2023, offers a comprehensive set of classes and methods for interacting with the resource either online via HTTP requests or offline using local indices; it supports operations such as retrieving synset details, word senses, and edge relations between concepts, which facilitate computations like semantic similarity based on path lengths or shared hypernyms. Similarly, the Python API, available via PyPI since October 2022, mirrors these functionalities with methods for synset ID retrieval, sense extraction, and graph edge queries, allowing seamless integration into pipelines for tasks involving multilingual semantic analysis. Both libraries require configuration with an API key for online access and are designed for high-performance querying, with the Java version compatible with JRE 1.8 and the Python version requiring Python 3.8 or later.

For Semantic Web interactions, BabelNet exposes a SPARQL endpoint at https://babelnet.org/sparql/, which allows users to perform complex graph queries on its structure, such as retrieving hypernym chains or semantic relations between entities across languages. This integrates with the Linguistic Linked Open Data (LLOD) Cloud, providing dereferenceable URIs via https://babelnet.org/rdf/ for direct access to RDF representations of synsets and relations. RDF dumps, primarily available for earlier versions such as 2.0, enable bulk downloads and local querying for advanced users, supporting standards-compliant exploration without real-time dependencies.

Web services are accessible through a RESTful HTTP API hosted at babelnet.io, which returns JSON responses for endpoints like getSynset, getSenses, and getEdges, requiring an API key obtained via registration. Academic and non-commercial use is free under the BabelNet Non-Commercial License, restricted to research institutions and mandating attribution to the official source, while commercial applications are handled through Babelscape's enterprise services. This API enforces rate limits to ensure fair usage, with compression recommended for efficient data transfer.

Download options include the full BabelNet 5.3 indices, approximately 45 GB, available upon request for non-commercial use to enable offline querying and integrations. Smaller subsets or bundles, such as a 22 MB sample package, allow quick starts without the full dataset, and the resource's RDF format facilitates loading into ontology editors for visualization and extension.
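
A hedged sketch of calling the RESTful HTTP API mentioned above follows. The version segment of the URL, the parameter names, and the response fields are assumptions that should be checked against the official API guide; "YOUR_KEY" is a placeholder for a registered key.

```python
import requests

# Hedged sketch of the babelnet.io HTTP API described above.
# The version path, parameter names, and response structure are assumptions
# and may differ across releases; consult the official API guide.
BASE = "https://babelnet.io/v9"  # assumption: current version segment

# 1) Look up synset IDs for a lemma in a given language.
ids = requests.get(
    f"{BASE}/getSynsetIds",
    params={"lemma": "BabelNet", "searchLang": "EN", "key": "YOUR_KEY"},
    timeout=30,
).json()
print(ids)  # expected: a list of objects carrying synset IDs

# 2) Retrieve the full synset (senses, glosses, images) for the first result.
if ids:
    synset = requests.get(
        f"{BASE}/getSynset",
        params={"id": ids[0]["id"], "targetLang": "EN", "key": "YOUR_KEY"},
        timeout=30,
    ).json()
    print(synset.get("glosses", [])[:1])  # first gloss, if any
```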

Recognition and Influence

Awards and Prizes

BabelNet and its principal developer, Roberto Navigli, have received several notable awards recognizing their contributions to multilingual semantic resources and natural language processing. In 2015, Navigli was awarded the META Prize for excellent research by the Multilingual Europe Technology Alliance (META), honoring BabelNet's pioneering role in advancing multilingual language technology through its integration of diverse lexical and encyclopedic knowledge sources. In 2023, Navigli received a fellowship recognizing outstanding contributions to natural language processing, including advancements in multilingual resources like BabelNet. The foundational 2012 paper on BabelNet's automatic construction, evaluation, and application, published in the Artificial Intelligence Journal, earned the AIJ Prominent Paper Award in 2017, highlighting its enduring impact on the field of lexical resource development and semantic integration. BabelNet has also garnered media recognition for its innovative approach to bridging linguistic barriers, featuring prominently in Time magazine's May 2016 article "Redefining the Modern Dictionary," which praised it as a groundbreaking tool for enhancing global language understanding and reliability through crowdsourced validation. Additionally, the project's development was supported by the European Research Council's MultiJEDI Starting Grant (2011–2016), a prestigious funding award that provided foundational resources for creating large-scale multilingual lexical assets and enabling advanced text understanding applications.

Academic and Industry Impact

BabelNet has accumulated over 5,000 citations in academic literature since its 2010 introduction, reflecting its substantial influence as measured by citation metrics as of 2025. This body of work underscores its role as a foundational resource in multilingual natural language processing (NLP), where it has been leveraged in extensions like BabelBERT to enhance models such as multilingual BERT (mBERT) and XLM-RoBERTa (XLM-R) by providing aligned lexical-semantic knowledge across languages, improving cross-lingual performance on lexical semantic tasks. In industry, BabelNet powers commercial offerings from Babelscape, its parent company, including tools for multilingual semantic parsing, entity linking, and knowledge-enhanced search. It has also been integrated into European Union Horizon 2020 projects, such as ELEXIS and TraMOOC, supporting advancements in cross-lingual lexicography and machine translation for diverse applications. These adoptions demonstrate BabelNet's practical utility in bridging linguistic barriers for enterprise-level solutions. BabelNet's broader influence extends to enabling progress in low-resource languages by automatically aligning lexical resources like wordnets and Wikipedia editions, facilitating transfer learning and zero-shot capabilities in underrepresented linguistic contexts. A 2021 survey emphasizes its contributions to multilingual semantic networks while identifying gaps in coverage for non-Indo-European languages as a key area for future enhancement, particularly in expanding synset density and cultural specificity. Despite these strengths, BabelNet exhibits signs of potential stagnation, with no major version updates released since version 5.3 in December 2023. In comparison, evolving knowledge graphs like YAGO 4.5, released in 2024, incorporate dynamic integrations and refined taxonomies, highlighting the need for BabelNet to pursue more frequent, automated enrichment to maintain competitiveness in rapidly advancing knowledge graph ecosystems.

    In this paper, we extend YAGO 4 with a large part of the Wikidata taxonomy - while respecting logical constraints and the distinction between classes and ...Missing: graph | Show results with:graph<|control11|><|separator|>