Wikidata
Wikidata is a free and open collaborative knowledge base hosted and maintained by the Wikimedia Foundation, operating as a multilingual central storage repository for structured data that is editable by both humans and machines.[1][2] Launched on October 29, 2012, it supports Wikimedia sister projects such as Wikipedia, Wikimedia Commons, Wikivoyage, Wiktionary, and Wikisource by providing shared factual data, managing interlanguage links, and enabling automated updates for elements like infoboxes and lists.[3][2] As of November 2025, Wikidata contains over 119 million data items, making it the largest Wikimedia project by content volume.[1]
The project's data model represents knowledge through items (entities like people, places, or concepts), properties (relations or attributes, with over 13,000 defined, many as external identifiers), and statements structured as subject-predicate-object triples, such as "Tim Berners-Lee (Q80) instance of (P31) human (Q5)."[4] Statements can include qualifiers for additional context (e.g., specifying a time period) and references to sources for verifiability, with ranks (preferred, normal, or deprecated) to indicate reliability or preference.[4] This flexible, extensible structure supports 12 core data types (e.g., strings, quantities, URLs) and integrates with extensions like WikibaseLexeme for linguistic data, allowing complex, computable representations of real-world knowledge.[4][5]
Initially funded by the Allen Institute for AI, the Gordon and Betty Moore Foundation, and Google, Inc., Wikidata has evolved into a key resource for open data initiatives, interlinking with external datasets and enabling tools like the Wikidata Query Service for SPARQL-based queries.[2][6] All content is released under the Creative Commons Zero (CC0) dedication, permitting unrestricted reuse and modification.[1] Its growth reflects community-driven contributions, fostering applications in research, cultural heritage, and beyond while addressing challenges like data quality and multilingual coverage.[2][4]
History
Inception and Early Development
In 2011, Wikimedia Deutschland proposed the creation of Wikidata as a central repository to address key challenges in Wikipedia maintenance, particularly the decentralized management of interlanguage links and the repetitive updating of infobox content across multiple language editions.[7] This initiative aimed to centralize structured data, reducing duplication and errors that arose from editors manually synchronizing links and facts in over 280 language versions of Wikipedia at the time.[7] The proposal outlined a phased approach, starting with interlanguage links to streamline navigation between articles on the same topic in different languages, thereby easing the burden on volunteer editors, especially in smaller Wikipedias.[7]
Development of Wikidata officially began on April 1, 2012, in Berlin, under the leadership of Denny Vrandečić and Markus Krötzsch, who had earlier explored semantic enhancements for Wikipedia.[8] The project was initiated by Wikimedia Deutschland, with initial funding secured from Google, the Allen Institute for Artificial Intelligence, the Gordon and Betty Moore Foundation, and support from the Wikimedia Foundation, totaling approximately €1.3 million for the early stages.[9] During these initial months, the team focused on designing the core data model, which relied on simple property-value pairs to represent entities, such as linking a city to its population or coordinates, allowing for flexible, machine-readable storage without rigid schemas.[10]
The beta version of Wikidata launched on October 29, 2012, initially restricting editing to the creation of items and their connections via interlanguage links to Wikipedia articles, marking the project's first operational phase.[11] This limited scope enabled early testing of the centralized linking system while laying the groundwork for broader structured data integration in subsequent phases.[10]
Key Milestones and Rollouts
Wikidata's development was structured around three primary phases, each building on the previous to expand its functionality and integration with Wikimedia projects.
Phase 1, from 2012 to 2013, established the foundational infrastructure for centralizing interlanguage links, replacing the fragmented system where each Wikipedia maintained its own links to other language versions. The project launched in beta on October 29, 2012, initially permitting users to create items (unique identifiers for concepts) and add sitelinks connecting them to corresponding articles across Wikimedia sites.[10] Pilot testing began in January 2013 with the Hungarian, English, and French Wikipedias, and by March 6, 2013, interlanguage links were enabled across all Wikipedias, streamlining maintenance and improving multilingual navigation.[10][12]
Phase 2 in 2013 introduced the core data model, enabling Wikidata to store structured facts beyond mere linking. Statements, consisting of a property-value pair with optional qualifiers and references, were first added on February 4, 2013, initially supporting limited data types such as items and Wikimedia Commons media files. Properties, which define the types of relationships (e.g., "instance of" or "country"), and sitelinks were integrated as essential components, allowing items to represent real-world entities with verifiable claims. Editing of statements was opened to the public on February 4, 2013, marking the transition to full community-driven content creation and significantly increasing contributions.[10]
Phase 3, beginning in 2014, focused on practical applications and broader interoperability, including the integration of Wikidata data into Wikipedia infoboxes and the central coordination of external identifiers. In July 2014, the English Wikipedia began widespread adoption of Lua modules to pull data from Wikidata into infobox templates, automating the display of structured information like birth dates or occupations while reducing redundancy across articles. This integration relied on properties dedicated to external identifiers (e.g., ISBN or GeoNames ID), positioning Wikidata as a hub for linking to external databases and enhancing data reuse beyond Wikimedia.[10]
Subsequent milestones extended Wikidata's scope to specialized data types. In May 2018, lexemes were introduced to support linguistic data, allowing the storage of words, forms, senses, and etymologies in multiple languages, thereby complementing Wiktionary and enabling queries on lexical relationships.[13] In 2019, entity schemas were launched using the Shape Expressions (ShEx) language to define and validate data models, helping enforce constraints on item structures and improve data quality through community-defined templates.[14] These advancements solidified Wikidata's role as a versatile, multilingual knowledge graph through 2023.
Recent Advancements (2024–2025)
A comprehensive community survey conducted in July 2024, with results released on October 13, 2025, revealed key insights into contributors' backgrounds and involvement patterns. The report highlighted that editing data remains the most common activity, while priorities such as research with Wikidata and building applications are growing, informing future developments including enhanced data quality frameworks to support more reliable and reusable structured information.[15]
Technical enhancements continued with the launch of the "Search by entity type" feature on June 19, 2025, which introduced typeahead compatibility in the Wikidata search box and allowed users to filter results specifically to items, properties, or lexemes. This update significantly improved navigation for users seeking particular data classes, streamlining access to the database's diverse entity types.[1] Accessibility efforts advanced through the introduction of a mobile editing prototype on June 12, 2025, enabling statement editing on items directly from mobile devices, a long-requested capability. Community feedback was actively solicited via video demonstrations and discussions, aiming to refine the tool for broader usability and inclusivity among editors on the go.[1]
The WikidataCon 2025 conference, held online from October 31 to November 2, 2025, and organized by Wikimedia Deutschland, gathered developers, editors, and organizations to explore advancements in linked open data, with a strong emphasis on AI integrations and collaborative tools for enhanced data connectivity.[16]
Wikimedia Deutschland's 2025 plan, outlined in February 2025 and aligned with strategic goals through 2030, prioritized scalability for Wikidata as part of broader linked open data infrastructure, targeting a doubling of technology contributors by 2030 to handle expanding data volumes and global participation. The plan also supported machine editing initiatives by improving editing experiences and productivity tools, facilitating automated contributions while maintaining community oversight.[17]
On October 29, 2025, Wikidata received official recognition as a digital public good from the Digital Public Goods Alliance, affirming its role as an openly licensed, collaborative knowledge base with over 1.6 billion facts that promotes equitable access to information worldwide, the second Wikimedia project after Wikipedia to earn this distinction.[18]
Core Concepts
Items and Identifiers
Items are the primary entities in Wikidata, representing real-world topics, concepts, or objects such as people, places, events, or abstract ideas.[19] Each item serves as a unique container for structured data about its subject, enabling the centralized storage and reuse of information across Wikimedia projects without redundancy.[20] For instance, the item for Douglas Adams is identified as Q42, which encapsulates all relevant data about the author in one place.[20]
Every item is assigned a unique identifier known as a Q-ID, consisting of the letter "Q" followed by a sequential numeric code, such as Q42 or Q7186 for Marie Curie.[19] These Q-IDs ensure global uniqueness within Wikidata, preventing duplication by providing a single reference point for each entity regardless of language or project.[20] As of November 2025, Wikidata contains over 119 million items, forming the foundational scale of its knowledge base.[1]
The structure of an item includes labels, which provide the primary name in each supported language (e.g., "Douglas Adams" in English for Q42); descriptions, offering a brief disambiguating summary (e.g., "English writer and humorist (1952–2001)"); and aliases, listing alternative names or variants (e.g., "DNA" as an alias for Q42).[19] Additionally, sitelinks connect the item to corresponding pages on other Wikimedia sites, such as linking Q42 to the English Wikipedia article on Douglas Adams, facilitating seamless cross-project navigation and data synchronization.[20] This structure lets a single Q-ID act as a stable anchor for linking across projects; for example, one item such as Q8470 for the 1988 Summer Olympics can be referenced uniformly in multiple Wikipedias, avoiding the need for separate, inconsistent entries.[19]
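As a concrete illustration of this structure, the following minimal sketch (assuming the third-party Python requests package) retrieves Q42's label, description, aliases, and sitelinks through the wbgetentities module of the MediaWiki Action API, which is covered further under Query Services and Data Access; field names follow the standard Wikibase JSON output.

    # Minimal sketch: read the label, description, aliases, and sitelinks of an
    # item (Q42, Douglas Adams) from the public MediaWiki Action API.
    import requests

    API = "https://www.wikidata.org/w/api.php"
    params = {
        "action": "wbgetentities",
        "ids": "Q42",                                      # the item's Q-ID
        "props": "labels|descriptions|aliases|sitelinks",  # parts of the item to return
        "languages": "en",
        "format": "json",
    }
    entity = requests.get(API, params=params, timeout=30).json()["entities"]["Q42"]

    print(entity["labels"]["en"]["value"])        # primary English name
    print(entity["descriptions"]["en"]["value"])  # short disambiguating description
    print([a["value"] for a in entity.get("aliases", {}).get("en", [])])  # alternative names
    print(entity["sitelinks"]["enwiki"]["title"]) # linked English Wikipedia article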
Properties
Properties in Wikidata are reusable attributes that function as unique descriptors to define relationships and values for items, forming the predicates in the knowledge graph's triples. Each property has a dedicated page on Wikidata and is identified by a unique alphanumeric label consisting of the prefix "P" followed by a sequential number, referred to as a P-ID; for example, P31 denotes "instance of," which classifies an item as belonging to a specific class or category.[21]
The creation of properties follows a structured, community-governed process to maintain relevance and avoid redundancy. Proposals for new properties are submitted to the dedicated Property proposal forum, where editors discuss their necessity, proposed datatype, and constraints; approval requires consensus or sufficient support from the community. Upon approval, the property is created by users with appropriate permissions, such as property creators or administrators, and assigned the next available sequential numeric ID starting from P1. This process ensures that properties are introduced only when they address a clear need in describing entities.[21]
Properties are categorized by their datatype, which dictates the structure and validation of values they accept, enabling diverse representations of information. Common datatypes include those for geographical coordinates (e.g., P625, coordinate location), dates and times (e.g., P569, date of birth), and external identifiers that link to external databases (e.g., P345 for IMDb ID or P214 for VIAF ID). These types support interoperability with other linked data systems. The most extensively used property is "cites work" (P2860), used on over 312 million item pages as of November 2025, primarily for bibliographic citations in scholarly and creative works.[21][22]
To promote data integrity, properties incorporate constraints that enforce validation rules on associated statements. Examples include format constraints to ensure values match expected patterns (e.g., ISO 8601 for dates), uniqueness constraints limiting a property to a single value per item (e.g., for identifiers like ISBN), and type constraints verifying that values align with specified classes or formats. These are defined on the property's page and checked automatically during editing, aiding in error detection and quality control.
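To show how a property's datatype is exposed programmatically, the sketch below (a non-authoritative example using Python's requests package) asks wbgetentities for the labels and datatypes of a few of the properties mentioned above; it assumes the module's documented datatype prop.

    # Sketch: look up the English label and datatype of several properties.
    # Property pages are ordinary entities addressed by their P-IDs.
    import requests

    API = "https://www.wikidata.org/w/api.php"
    params = {
        "action": "wbgetentities",
        "ids": "P31|P569|P625|P214",   # instance of, date of birth, coordinate location, VIAF ID
        "props": "labels|datatype",
        "languages": "en",
        "format": "json",
    }
    entities = requests.get(API, params=params, timeout=30).json()["entities"]
    for pid, entity in sorted(entities.items()):
        # Each property declares exactly one datatype, e.g. "wikibase-item",
        # "time", "globe-coordinate", or "external-id".
        print(pid, entity["labels"]["en"]["value"], entity["datatype"])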
Statements
In Wikidata, statements form the fundamental units of structured data, representing assertions about entities through a subject-predicate-object triple structure. The subject is an item, such as a person, place, or concept identified by a unique Q-number (e.g., Q42 for Douglas Adams); the predicate is a property (e.g., P31 for "instance of"); and the object is the value, which can be another item (e.g., Q5 for "human"), a string, a quantity, a date, or other supported data types.[23] This triple-based model aligns with linked data principles, enabling interconnections across the knowledge base.[4] A representative example is the statement for Douglas Adams (Q42): instance of (P31) human (Q5), which establishes the item's classification as a person.[23]
An item often carries multiple statements for the same property to accommodate real-world complexity, such as attributes that vary over time or by context, with each statement assigned a rank to indicate its status: preferred for the most reliable or current information, normal as the default, or deprecated for outdated or incorrect data.[23][24] Ranks help editors and consumers prioritize information without deleting historical details. Statements can also be enhanced with qualifiers for additional context, such as specifying a time period or location, though core assertions remain self-contained.[23]
As of April 2025, Wikidata encompasses approximately 1.65 billion statements, supporting complex queries and integrations across Wikimedia projects and beyond; this scale underscores Wikidata's role as a vast, collaborative repository of verifiable facts.[25]
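The sketch below (illustrative only, using Python and the public Special:EntityData endpoint) shows how a statement appears in the Wikibase JSON output: each claim bundles a main property-value pair with its rank and any qualifiers and references.

    # Sketch: inspect the "instance of" (P31) statements of Q42 as served by
    # Special:EntityData, printing each statement's value, rank, qualifier
    # properties, and reference count.
    import requests

    url = "https://www.wikidata.org/wiki/Special:EntityData/Q42.json"
    entity = requests.get(url, timeout=30).json()["entities"]["Q42"]

    for claim in entity["claims"]["P31"]:
        snak = claim["mainsnak"]
        value = snak["datavalue"]["value"]["id"]        # e.g. "Q5" (human)
        print(snak["property"], value, claim["rank"])   # rank: preferred, normal, or deprecated
        print("qualifiers:", sorted(claim.get("qualifiers", {})))
        print("references:", len(claim.get("references", [])))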
Lexemes
Lexemes were introduced to Wikidata in May 2018 to extend its data model beyond encyclopedic concepts, enabling the structured storage of linguistic and lexical information for words, phrases, and their variations across languages.[26] Unlike general items (Q-IDs), lexemes are specialized entities identified by unique L-IDs, such as L7 for the English noun "cat," allowing for the representation of language-specific lexical units.[27] This addition supports the integration of dictionary-like data, complementing projects like Wiktionary through shared identifiers and tools for cross-referencing.[26]
The core structure of a lexeme centers on its lemma, the canonical base form of the word (e.g., "cat" for L7), associated with a specific language item (such as Q1860 for English) and a lexical category denoting its grammatical role, like noun or verb.[28] Senses capture the distinct meanings of the lemma, each with a gloss and optional statements linking to related concepts; for instance, L7 includes senses for the domesticated animal and the musical instrument.[27] Forms represent inflected or derived variants, such as "cats" or "cat's," including textual representations and grammatical features like number (plural) or case, drawn from ontologies in the Linguistic Linked Open Data community.[28] These components allow lexemes to link to broader Wikidata items via statements, facilitating connections between lexical and encyclopedic knowledge.[29]
As of 2025, Wikidata's lexeme collection includes over 1.3 million entries across hundreds of languages, reflecting rapid community contributions.[30] This growth underscores lexemes' role in supporting Wiktionary integration, where data can be imported or exported to enrich dictionary entries.[26]
Lexemes enable detailed linguistic annotations, such as etymological links tracing word origins; for example, the Afrikaans lexeme for "hond" (dog, L208466) connects through derivations to Dutch, Middle Dutch, Old Dutch, Proto-Germanic, and ultimately Proto-Indo-European roots.[31] Pronunciation data, including International Phonetic Alphabet (IPA) transcriptions, is attached to forms and qualified by senses to specify contexts like regional accents.[28] These features promote applications in natural language processing and multilingual research by providing verifiable, interconnected lexical data.[31]
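As an illustration, the following sketch (Python, using the same entity-data endpoint; L7 is the lexeme cited above) walks the main parts of a lexeme record, lemma, language, lexical category, forms, and senses, as they appear in the Wikibase lexeme JSON.

    # Sketch: print the structure of a lexeme from Special:EntityData.
    import requests

    url = "https://www.wikidata.org/wiki/Special:EntityData/L7.json"
    lexeme = requests.get(url, timeout=30).json()["entities"]["L7"]

    print(lexeme["lemmas"])                                # canonical form(s), keyed by language code
    print(lexeme["language"], lexeme["lexicalCategory"])   # Q-IDs of the language and lexical category
    for form in lexeme["forms"]:                           # inflected variants such as a plural
        print(form["id"], form["representations"], form["grammaticalFeatures"])
    for sense in lexeme["senses"]:                         # distinct meanings with glosses
        print(sense["id"], sense["glosses"].get("en", {}).get("value"))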
Entity Schemas
Entity schemas in Wikidata are declarative models that define the expected structure and constraints for classes of entities, enabling validation to ensure data consistency and quality. Launched on May 28, 2019, they utilize the Shape Expressions (ShEx) language, expressed in ShExC syntax, and are stored as a dedicated entity type in the EntitySchema namespace, identifiable by the prefix "E". This infrastructure allows users to specify required properties, their cardinalities, allowed values, and relationships for specific item classes, such as mandating a birth date property for items of the class "person" (Q215627). Unlike property constraints, which focus on individual properties, entity schemas provide holistic shape definitions for entire entity sets, including qualifiers and references.[14][32]
The primary purpose of entity schemas is to model and validate RDF-based data structures within Wikidata, facilitating the detection of inconsistencies or errors during editing. Community members propose and develop schemas through the WikiProject Schemas, with versioning supported via Wikidata's page history mechanism, allowing revisions and tracking of changes over time. Integration with editing tools enhances usability; for instance, ShExStatements enables schema generation from CSV files and validation against Wikidata items, while tools like Entityshape and WikiShape provide visual interfaces for creation and testing. These features promote collaborative maintenance, where schemas can be proposed via requests for comment (RfC) to standardize data structures for particular subjects.[32][33][34]
In practice, entity schemas support domain-specific applications, particularly in biomedicine, where they ensure consistent representation of entities like genes, proteins, and virus strains. For example, schemas for molecular biology entities define mandatory properties such as sequence data or taxonomic classifications, aiding in the integration of biomedical ontologies and reducing variability in knowledge graph subsets. Research initiatives have proposed expanding these schemas to cover clinical entities, enhancing Wikidata's utility in health-related data modeling and validation.[35][36][37]
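For orientation, the fragment below embeds a hypothetical ShExC shape of the kind stored in the EntitySchema namespace as a Python string; the shape name, properties, and cardinalities are illustrative assumptions rather than a copy of any published schema.

    # Illustrative ShExC shape for items describing humans, kept as plain text.
    HUMAN_SHAPE = """
    PREFIX wd:  <http://www.wikidata.org/entity/>
    PREFIX wdt: <http://www.wikidata.org/prop/direct/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

    <#human> {
      wdt:P31  [ wd:Q5 ] ;        # must be declared an instance of human (Q5)
      wdt:P569 xsd:dateTime ? ;   # at most one date of birth
      wdt:P19  IRI *              # any number of place-of-birth items
    }
    """
    # A shape like this can be pasted into an EntitySchema page or checked with
    # an external ShEx validator against the RDF of individual items.
    print(HUMAN_SHAPE)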
Data Structure and Management
Qualifiers, References, and Constraints
In Wikidata, qualifiers, references, and constraints serve as essential mechanisms to add context, verifiability, and validation to statements, which are the core property-value pairs representing knowledge about entities.[23] Qualifiers provide additional details to refine a statement's meaning, references link statements to supporting sources for credibility, and constraints enforce rules to maintain data consistency and prevent errors. Together, these features enhance the reliability and usability of Wikidata's knowledge graph by allowing nuanced, sourced, and structured information without altering the primary statement structure.
Qualifiers are property-value pairs attached to a statement to expand, annotate, or contextualize its main value, offering further description or refinement without creating separate statements.[38] For instance, a statement about the population of France (66,600,000) might include qualifiers such as "excluding Adélie Land" to specify territorial scope, or for Berlin's population (3,500,000), qualifiers like "point in time: 2005" and "method of estimation" clarify temporal and methodological aspects.[38] Similarly, a statement designating Louis XIV as King of France could be qualified with "start time: 14 May 1643" and "end time: 1 September 1715" to denote the duration of his reign.[38] These qualifiers modify the statement's interpretation, such as constraining its validity to a specific period or context, while avoiding ambiguity by not altering other qualifiers on the same statement. By enabling such precision, qualifiers help resolve multiple possible values for a property and support community consensus on disputed facts through ranking mechanisms.[38]
References in Wikidata consist of property-value pairs that cite sources to back up a statement, ensuring its verifiability and traceability to reliable origins.[39] They typically employ properties like "stated in (P248)" to reference publications or items (e.g., books or journals) and "reference URL (P854)" for online sources, often supplemented with details such as author, publication date, or retrieval date.[39] For example, a statement about a scientific fact might reference the CRC Handbook of Chemistry and Physics via its Wikidata item, or an online claim could cite a specific webpage URL with the access date to account for potential changes.[39] References are required for most statements, except those involving common knowledge or self-evident data, and can be shared across multiple statements to promote efficiency. This sourcing practice upholds Wikidata's commitment to reliability, allowing users to verify claims against primary or authoritative materials like academic journals or official databases.[39]
Constraints are predefined rules applied to properties via the "property constraint (P2302)" property, functioning as editorial guidelines to ensure appropriate usage and detect inconsistencies in data entry.[40] Implemented through the Wikibase Quality Constraints extension, these rules, of which there are over 30 types, categorize into datatype-independent (e.g., single-value, which limits a property like place of birth to one value per entity) and datatype-specific (e.g., format, which validates identifiers against patterns like ISBN or email syntax).[40] For instance, a single-value constraint prevents duplicate entries for unique attributes, while a format constraint ensures telephone numbers adhere to expected structures. Violations are reported to logged-in editors via tools like the constraint report, though exceptions can be explicitly noted using qualifiers like "exception to constraint (P2303)" for edge cases, such as a fictional entity defying real-world rules. By providing these checks, constraints proactively prevent errors, promote data quality, and guide contributors toward consistent modeling, ultimately bolstering the graph's integrity without imposing rigid enforcement.[40]
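To show how qualifiers and references surface in queries, the sketch below (Python with the requests package; an illustrative, non-authoritative example) uses the full statement model of the Wikidata Query Service, where the p:, ps:, pq:, and pr: prefixes expose statement nodes, main values, qualifiers, and reference details for Douglas Adams's "educated at" (P69) statements.

    # Sketch: list Q42's "educated at" statements with their start-time
    # qualifier (P580) and any reference URL (P854), where present.
    import requests

    SPARQL = """
    SELECT ?school ?start ?refURL WHERE {
      wd:Q42 p:P69 ?statement .                  # the statement node itself
      ?statement ps:P69 ?school .                # main value of the statement
      OPTIONAL { ?statement pq:P580 ?start . }   # qualifier: start time
      OPTIONAL {
        ?statement prov:wasDerivedFrom ?ref .    # attached reference node
        ?ref pr:P854 ?refURL .                   # reference URL (P854)
      }
    }
    """
    resp = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": SPARQL, "format": "json"},
        headers={"User-Agent": "wikidata-article-example/0.1 (illustrative)"},
        timeout=60,
    )
    for row in resp.json()["results"]["bindings"]:
        print(row["school"]["value"],
              row.get("start", {}).get("value"),
              row.get("refURL", {}).get("value"))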
Editing Processes and Tools
Human editing on Wikidata primarily occurs through a web-based interface accessible via the project's main site, where users can search for existing items using titles or identifiers and create new ones if none exist.[41] To create an item, editors enter a label (the primary name in a chosen language) and a brief description to disambiguate it, followed by adding aliases for alternative names and interwiki links to corresponding Wikipedia articles in various languages.[41] Once created, statements, structured triples consisting of an item, property, and value, can be added directly in the interface, with options to include qualifiers and references for precision.[42]
For larger-scale human contributions, tools like QuickStatements enable batch uploads by allowing editors to input simple text commands or CSV files to add or modify labels, descriptions, aliases, statements, qualifiers, and sources across multiple items.[43] This tool, developed by Magnus Manske, processes commands sequentially via an import interface or external scripts, making it suitable for importing data from spreadsheets without needing programming knowledge, though users must supply existing item identifiers (QIDs) for accurate targeting.[43] Similarly, OpenRefine supports reconciliation of external datasets with Wikidata by matching values in tabular data (e.g., names or identifiers) to existing items through a dedicated service, flagging ambiguities for manual review and enabling bulk additions of new statements or links.[44] OpenRefine's process involves selecting a reconciliation endpoint (such as the Wikidata-specific API), restricting matches by entity types or languages, and using property paths to pull in details like labels or sitelinks for augmentation.[44]
Machine editing on Wikidata is governed by strict guidelines to ensure quality and prevent disruption, with bots, automated or semi-automated scripts, requiring separate accounts flagged as "bot" and operator contact information.[45] Bot flags are granted through community requests for permissions, where proposals detail the bot's purpose, such as importing identifiers from external databases (e.g., ISBNs or GeoNames IDs), and undergo review for compliance with edit frequency limits and error-handling mechanisms; global bots approved on Meta-Wiki may receive automatic authorization for specific tasks like interwiki maintenance.[45] Once approved, bots operate under reduced visibility in recent changes to avoid overwhelming human editors, but they must pause or be blocked if malfunctions occur, with flags revocable after discussion or prolonged inactivity.[45]
Collaboration and maintenance rely on version history, which tracks all edits to an item with timestamps, user attributions, and diffs for comparison, allowing reversion to prior states via the "history" tab.[42] Talk pages associated with each item facilitate discussions on proposed changes, disputes, or improvements, mirroring Wikimedia's broader discussion norms.[42] Reversion tools integrated into the interface enable quick undoing of errors or vandalism, often used in tandem with watchlists to monitor items.[42] Wikidata's community upholds norms emphasizing notability for items, requiring that they support Wikimedia projects, link to reliable sources, or fill structural roles, while promoting neutrality through unbiased descriptions and balanced statements.[42] All claims must be sourced to verifiable references, such as published works or databases, with unsourced statements discouraged and subject to removal; editors are encouraged to join WikiProjects for coordinated adherence to these standards.[42]
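As a rough illustration of the QuickStatements batch format described above, the sketch below assembles version 1 style commands (tab-separated columns) in Python; the items, properties, and source column are examples only, and the exact syntax accepted by the tool should be checked against its help page.

    # Sketch: assemble a small QuickStatements (V1) batch as tab-separated text.
    rows = [
        ("CREATE",),                                   # create a new item
        ("LAST", "Len", '"Example writer"'),           # English label for the new item
        ("LAST", "Den", '"hypothetical person used for illustration"'),
        ("LAST", "P31", "Q5"),                         # instance of (P31) human (Q5)
        ("Q4115189", "P31", "Q5", "S854",
         '"https://example.org/source"'),              # statement on the sandbox item with a reference URL
    ]
    batch = "\n".join("\t".join(row) for row in rows)
    print(batch)   # paste into the QuickStatements import box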
Content Scope and Quality Control
Wikidata's content scope encompasses structured data across diverse domains, including over 10 million biographies of humans marked as instances of the "human" class (Q5), detailed geographic entities such as locations and administrative divisions, medical concepts like diseases and treatments, and scholarly metadata through initiatives like WikiCite for citations and references. This breadth supports interoperability with Wikimedia projects and external applications while adhering to strict verifiability standards, ensuring all entries draw from reliable, published sources rather than primary data collection. Notably, Wikidata explicitly excludes original research, personal opinions, or unpublished material, positioning it as a secondary knowledge base that aggregates and links to authoritative references such as academic publications, official databases, and news outlets.[46][47][48]
To facilitate coordinated development within these thematic areas, Wikidata relies on community-driven WikiProjects that focus on specific domains, providing guidelines, property standards, and collaborative tasks. For instance, WikiProject Music standardizes properties like performer (P175), instrument (P1303), and release identifiers (e.g., Discogs master ID, P1954) to enhance coverage of compositions, artists, albums, and genres, while enabling cross-project data mapping from Wikipedia and Commons. These projects promote thematic consistency by organizing SPARQL queries for gap analysis, encouraging contributor participation through chat channels and task lists, and ensuring alignment with broader Wikidata schemas without imposing rigid notability criteria.[49][50]
Quality control mechanisms emphasize proactive detection and community oversight to uphold data integrity. Database reports, such as those tracking constraint violations, systematically scan for non-compliance with predefined rules, like mandatory qualifiers or format constraints, listing affected items and statements for editors to review and resolve, thereby preventing structural degradation. Community-voted deletions further support maintenance, allowing proposals for removing redundant or erroneous properties and items through dedicated request pages, where consensus guides administrative action. These tools integrate with editing interfaces to flag issues in real-time, drawing on templates like the Constraint template for automated validation.[51][52][53]
Despite these safeguards, challenges persist in maintaining accuracy, particularly vandalism detection and multilingual consistency. Vandalism, often involving disruptive edits like false statements or mass deletions, is mitigated through machine learning classifiers that analyze revision features, such as edit patterns and abuse filter tags, to identify 89% of cases while reducing patroller workload by 98%, as demonstrated in research prototypes adaptable to Wikidata's abuse filter. Multilingual consistency presents another hurdle, with studies revealing issues like duplicate entities, missing triples, and taxonomic inconsistencies across language versions, exacerbated by varying editorial priorities and source availability, though constraint checks and cross-lingual queries help address them.[54][55]
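The gap-analysis queries mentioned above can be as simple as the following sketch, which looks for people recorded as astronauts but lacking a date of birth; the occupation item used here (Q11631, astronaut) and the overall shape of the query are illustrative assumptions, not a query used by any particular WikiProject.

    # Sketch of a WikiProject-style gap-analysis query for the Wikidata Query
    # Service: humans with occupation astronaut but no date of birth (P569).
    GAP_QUERY = """
    SELECT ?person ?personLabel WHERE {
      ?person wdt:P31 wd:Q5 ;                          # instance of human
              wdt:P106 wd:Q11631 .                     # occupation: astronaut
      FILTER NOT EXISTS { ?person wdt:P569 ?dob . }    # no date of birth recorded
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }
    LIMIT 100
    """
    # Run it in the web query editor or send it over HTTP as in the earlier sketches.
    print(GAP_QUERY)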
Technical Infrastructure
Software Foundation
Wikidata's software foundation is built upon the MediaWiki platform, which provides the core wiki engine for collaborative editing and version control.[56] The Wikibase extension suite transforms MediaWiki into a structured data repository, enabling the creation, management, and querying of entities such as items and properties in a versioned, multilingual format.[57] This integration allows Wikidata to leverage MediaWiki's established infrastructure while adding specialized capabilities for knowledge graph operations.[58]
Data storage in Wikidata relies on a dual approach to handle both relational and graph-based needs. Items and properties are primarily stored in a MySQL database, which supports the revision history, entity metadata, and structured attributes through Wikibase's schema.[59] For RDF representations, the system uses Blazegraph as a triplestore to manage billions of RDF triples derived from Wikidata entities, facilitating efficient SPARQL queries via the Wikidata Query Service.[6] This separation ensures robust handling of both editable content and semantic linkages.
To address the scale of Wikidata's growing dataset, the infrastructure incorporates scalability features such as sharding and caching. Horizontal sharding partitions data across multiple Blazegraph nodes to distribute query loads and manage edit propagation, with ongoing efforts to optimize entity-based splitting.[60] Caching mechanisms, including in-memory stores and diff-based updates, reduce latency by minimizing redundant computations during data synchronization.[60] The entire system is hosted on servers managed by the Wikimedia Foundation in data centers across multiple locations, ensuring high availability and global access.[61]
Wikibase and its components are released under the GNU General Public License version 2.0 or later (GPL-2.0-or-later), promoting open-source development and allowing independent installations of Wikibase repositories beyond Wikidata.[62] This licensing aligns with MediaWiki's copyleft model, fostering community contributions and reuse in diverse structured data projects.
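The dual storage described above is visible from the outside: the same entity can be fetched as canonical Wikibase JSON (the form kept in the wiki database) and as RDF Turtle (the form loaded into the triplestore). The sketch below, assuming the Python requests package, retrieves both renderings from the public Special:EntityData endpoint.

    # Sketch: fetch one entity in the two serializations discussed above.
    import requests

    base = "https://www.wikidata.org/wiki/Special:EntityData/Q42"
    as_json = requests.get(base + ".json", timeout=30).json()  # canonical Wikibase JSON
    as_ttl = requests.get(base + ".ttl", timeout=30).text      # RDF in Turtle syntax

    print(list(as_json["entities"]["Q42"].keys()))  # labels, descriptions, claims, sitelinks, ...
    print(as_ttl[:300])                             # first prefixes and triples of the RDF view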
Query Services and Data Access
Wikidata provides several mechanisms for retrieving and manipulating its structured data, enabling users and applications to access the knowledge graph efficiently. The primary query service is the Wikidata Query Service (WDQS), consisting of SPARQL endpoints launched in September 2015 that support complex, federated queries across Wikidata's RDF triples and external linked data sources.
In May 2025, to enhance scalability, the WDQS backend was updated to split the dataset into a main graph (accessible via query-main.wikidata.org or the redirected query.wikidata.org) and a scholarly graph (query-scholarly.wikidata.org), with a legacy full-graph endpoint (query-legacy-full.wikidata.org) available until December 2025. Queries spanning both graphs now require SPARQL federation. The Wikimedia Foundation is also searching for a replacement for Blazegraph, the current triplestore backend, due to its lack of updates since 2018.[63][64][65][6]
This service allows for sophisticated pattern matching and filtering, such as retrieving all instances of cities with a population exceeding 1 million, by leveraging predicates like wdt:P1082 for population and wdt:P31 for instance-of relations.[66]
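A minimal sketch of that example, sent to the public endpoint with Python's requests package, is shown below; it assumes Q515 (city) as the target class and uses the label service for readable names.

    # Sketch: cities with a population over one million, via the WDQS endpoint.
    import requests

    QUERY = """
    SELECT ?city ?cityLabel ?population WHERE {
      ?city wdt:P31/wdt:P279* wd:Q515 .     # instance of city, or of a subclass of city
      ?city wdt:P1082 ?population .         # population (P1082)
      FILTER(?population > 1000000)
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }
    ORDER BY DESC(?population)
    LIMIT 20
    """
    resp = requests.get(
        "https://query.wikidata.org/sparql",   # redirects to the main-graph endpoint
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "wikidata-article-example/0.1 (illustrative)"},
        timeout=60,
    )
    for row in resp.json()["results"]["bindings"]:
        print(row["cityLabel"]["value"], row["population"]["value"])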
In addition to SPARQL, Wikidata offers programmatic access through APIs tailored for different operations. The MediaWiki Action API facilitates both read and write interactions with entities, supporting actions like fetching entity data via wbgetentities or editing statements through wbeditentity.[67] Complementing this, the Wikibase REST API provides a modern, stateless interface primarily for entity retrieval, such as obtaining JSON representations of items or properties without the overhead of session-based authentication.[68] These APIs adhere to standard HTTP practices, with endpoints like https://www.wikidata.org/w/api.php for the Action API and https://www.wikidata.org/w/rest.php for REST operations, ensuring compatibility with a wide range of client libraries and tools.[30]
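A brief sketch of both access paths follows; the Action API call (wbsearchentities) is standard, while the Wikibase REST API route shown here, including its version segment, is an assumption based on current documentation and may need adjusting.

    # Sketch: resolve a label to a Q-ID with the Action API, then fetch the item
    # through the Wikibase REST API.
    import requests

    search = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={"action": "wbsearchentities", "search": "Douglas Adams",
                "language": "en", "format": "json"},
        timeout=30,
    ).json()
    qid = search["search"][0]["id"]   # e.g. "Q42"

    # REST route assumed from the documented /w/rest.php/wikibase/... pattern.
    item = requests.get(
        f"https://www.wikidata.org/w/rest.php/wikibase/v1/entities/items/{qid}",
        headers={"User-Agent": "wikidata-article-example/0.1 (illustrative)"},
        timeout=30,
    ).json()
    print(qid, item.get("labels", {}).get("en"))   # English label from the REST response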
To illustrate basic querying, SPARQL SELECT patterns form the foundation of WDQS interactions. A simple example retrieves all humans born in the 20th century:
    SELECT ?human ?humanLabel ?birthDate WHERE {
      ?human wdt:P31 wd:Q5 .          # instance of human (Q5)
      ?human wdt:P569 ?birthDate .    # date of birth
      FILTER(YEAR(?birthDate) >= 1900 && YEAR(?birthDate) < 2000)
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }
    LIMIT 10

This pattern binds variables to subjects, predicates, and objects while applying filters for precision, drawing on Wikidata's statement structure where properties link items to values.[69] More advanced queries can federate with external endpoints using the SERVICE keyword, expanding results beyond Wikidata's core dataset.[6]
The query services incorporate safeguards to maintain performance and reliability. WDQS enforces timeouts, typically set to 60 seconds for public queries, to prevent resource exhaustion from computationally intensive operations, alongside result limits such as a maximum of 10,000 rows per response to balance load.[70] Ongoing improvements include query optimization techniques, like index utilization in the underlying Blazegraph engine, and integration with user-friendly interfaces such as the Wikidata Query Service's built-in editor, which offers syntax highlighting, prefix autocompletion, and visualization of results as tables or graphs.[6] These enhancements, combined with tools like Query Helper for visual query building, lower the barrier for non-experts while supporting advanced federated explorations.[71]