Query language
A query language is a specialized computer programming language designed to make requests (queries) against databases and information systems in order to retrieve, manipulate, and manage data.[1] These languages enable users to interact with structured or unstructured data stores by specifying selection criteria, often in a declarative manner that describes what data is needed rather than how to retrieve it.[2]
The development of query languages traces back to the 1970s, emerging from foundational work in relational database theory. In 1970, IBM researcher Edgar F. Codd published a seminal paper introducing the relational model, which laid the groundwork for systematic data querying.[3] SQL (Structured Query Language), the most widely adopted query language, was initially developed at IBM in the mid-1970s as SEQUEL (Structured English QUEry Language), serving as the query interface for the System R prototype relational database.[4] By 1979, Oracle (then Relational Software, Inc.) released the first commercial SQL-based relational database management system, establishing SQL as the de facto language for data operations.[4] Over the decades, SQL evolved through ANSI and ISO standards (e.g., SQL-86, SQL-92), incorporating features for data definition, manipulation, and control, while alternatives like QUEL competed during the 1970s and 1980s but were eventually overshadowed by SQL's dominance.[5]
Query languages encompass various types tailored to different data models and use cases, broadly categorized as declarative (specifying desired results) or imperative (detailing retrieval steps).[6] The primary subtypes include Data Query Language (DQL) for retrieving data, Data Manipulation Language (DML) for modifying it, and extensions like Data Definition Language (DDL) for schema management, all integral to SQL.[7] Beyond relational systems, notable examples include NoSQL query languages for unstructured data (e.g., MongoDB Query Language), GraphQL for API-driven flexible queries, SPARQL for RDF semantic web data, and domain-specific ones like SPL for machine data analysis.[2] Today, query languages are essential in big data, cloud computing, and AI applications, powering everything from business intelligence to real-time analytics.[2]
Definition and Purpose
Core Definition
A query language is a specialized computer language used to retrieve, manipulate, and manage data stored in databases or information systems, abstracting away the precise algorithmic steps required for execution.[8] This formalism enables users to define queries as functions that input a database or set of facts and output a relevant subset or derived facts, focusing on the logical specification of data needs rather than implementation details.[9]
Central to query languages is their declarative nature, which allows users to specify what data is desired—such as particular records meeting certain criteria—while the underlying system determines how to efficiently compute and deliver it.[10] This paradigm contrasts with procedural approaches, promoting higher-level abstractions that enhance usability and enable optimization by the database engine.
Query languages typically encompass both retrieval and manipulation operations; for example, in SQL, the Data Query Language (DQL) subset handles read-centric activities like extraction and analysis via SELECT statements, while the Data Manipulation Language (DML) subset supports modifications such as insertions and updates via INSERT, UPDATE, and DELETE.[10][11] This integrated focus facilitates efficient data exploration and management in large-scale systems.
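This DQL/DML split can be sketched in a few lines of Python using the standard library's sqlite3 module; the employees table and its contents are purely illustrative:

```python
import sqlite3

# In-memory database with a hypothetical employees table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")

# DML: INSERT adds rows, UPDATE modifies them, DELETE removes them.
conn.execute("INSERT INTO employees (name, dept) VALUES ('Ada', 'Sales')")
conn.execute("INSERT INTO employees (name, dept) VALUES ('Ben', 'IT')")
conn.execute("UPDATE employees SET dept = 'Engineering' WHERE name = 'Ben'")

# DQL: SELECT reads data without modifying it.
rows = conn.execute("SELECT name FROM employees WHERE dept = 'Sales'").fetchall()
print(rows)  # [('Ada',)]

conn.execute("DELETE FROM employees WHERE name = 'Ada'")
print(conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0])  # 1
```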
At their core, query languages comprise query expressions that articulate the intended output, operators for tasks like selection (filtering records) and projection (specifying attributes), and result sets that encapsulate the processed data in a structured format.[12] These elements collectively form a syntax and semantics tailored for precise data interaction.[13]
Applications in Data Systems
Query languages serve as the foundational interface for interacting with data in relational database management systems (RDBMS), where languages like SQL enable users to retrieve, manipulate, and manage structured data stored in tables.[14] In NoSQL databases, query languages such as Cypher for graph databases or MongoDB's query API support flexible data models, including document, key-value, and column-family stores, facilitating operations on unstructured or semi-structured data.[15] Search engines employ query languages based on keyword, Boolean, and natural language constructs to perform information retrieval from vast textual corpora, powering ranked result delivery in systems like web search platforms.[16] Knowledge graphs utilize specialized query languages like SPARQL for RDF-based structures or Cypher for property graphs, allowing traversal and pattern matching across interconnected entities to support semantic querying.[17]
In business intelligence tools, query languages play a pivotal role in data retrieval for analytics, reporting, and decision-making by extracting insights from operational databases and data warehouses.[18] For instance, SQL-based queries integrate with platforms like Tableau or Power BI to aggregate metrics, generate dashboards, and enable predictive analytics that inform strategic choices in organizations.[19] This capability streamlines the transformation of raw data into actionable reports, enhancing efficiency in sectors such as finance and healthcare.
Query languages integrate seamlessly with APIs for web services, allowing SQL extensions to mash up data from multiple relational sources and external endpoints in a unified query environment.[20] In big data platforms, they extend to distributed systems like Hadoop via HiveQL for SQL-like querying on HDFS-stored data, and cloud services such as AWS Athena, which uses standard SQL to analyze petabyte-scale datasets in S3 without infrastructure management.[21][22]
These languages offer benefits including high efficiency in processing large datasets through optimized execution plans and declarative paradigms that abstract low-level details, focusing instead on what data to retrieve.[23] Additionally, they support ad-hoc querying, enabling on-the-fly analysis without predefined schemas, which is essential for exploratory data science and rapid prototyping in dynamic environments.[24]
Historical Development
Origins in Relational Databases
The origins of query languages are deeply rooted in the relational model of data, proposed by Edgar F. Codd in his seminal 1970 paper, which formalized databases as collections of relations (tables) composed of tuples (rows) and attributes (columns), emphasizing data independence and logical structure over physical storage.[25] This model laid the theoretical groundwork for querying by introducing relational algebra as a procedural foundation for data manipulation, but it was the non-procedural relational calculi—specifically tuple relational calculus (focusing on selecting tuples satisfying predicates) and domain relational calculus (emphasizing domain variables and conditions)—developed in Codd's 1972 work on relational completeness, that served as key precursors to declarative query languages.[26] These calculi provided a formal, logic-based means to express queries without specifying retrieval steps, enabling completeness in expressing any relational algebra operation and influencing the design of practical sublanguages for database interaction.[26]
Building on this foundation, early practical query languages emerged within IBM's research efforts to implement the relational model. Donald D. Chamberlin and Raymond F. Boyce first developed SQUARE (Specifying Queries as Relational Expressions), described in a paper published in 1975, as a data sublanguage designed for ad hoc querying in relational databases; it directly translated relational algebra operations into a textual form but relied heavily on mathematical notation, subscripts, and complex expressions that proved cumbersome for non-experts.[27] To address these usability challenges, the same researchers reworked SQUARE into SEQUEL (Structured English Query Language), presented in 1974, adopting a more readable, English-like syntax while retaining declarative semantics inspired by the relational calculi, and integrating it as the query interface for IBM's System R prototype—a pioneering relational database management system developed to demonstrate Codd's concepts in a working environment.[28][29]
By the late 1970s, SEQUEL transitioned to SQL (Structured Query Language) due to a trademark conflict with the existing SEQUEL name held by an unrelated company, prompting IBM to shorten it while preserving its core features.[30] This evolution marked the shift from research prototypes to commercial viability, with Relational Software, Inc. (later Oracle Corporation) releasing the first production implementation of SQL in Oracle Version 2 in 1979, enabling structured queries on relational data in a multi-user setting and setting the stage for widespread adoption.[31]
Evolution and Standardization
The evolution of query languages, building on early relational concepts, accelerated in the 1980s with the formal standardization of SQL as a core query mechanism for relational databases. In 1986, the American National Standards Institute (ANSI) approved the first SQL standard, designated ANSI X3.135-1986, which defined essential syntax for data definition, manipulation, and control operations, including SELECT, INSERT, UPDATE, and DELETE statements.[32] This standard was adopted internationally by the International Organization for Standardization (ISO) in 1987 as ISO/IEC 9075:1987, promoting portability and consistency across database systems.
The 1990s marked significant expansions to the SQL standard, enhancing its expressiveness and applicability. The SQL-92 standard (ISO/IEC 9075:1992), also known as SQL2, introduced features such as outer joins for handling unmatched rows in queries, improved support for views and schemas, and new data types like DATE, TIME, and TIMESTAMP, while defining conformance levels (Entry, Intermediate, Full) to guide implementations.[33] Building on this, SQL:1999 (ISO/IEC 9075:1999), or SQL3, incorporated object-relational extensions including user-defined types, inheritance, and recursive queries via common table expressions (CTEs), allowing complex hierarchical data retrieval without procedural code.[34]
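Recursive CTEs of the kind standardized in SQL:1999 can be exercised in SQLite, which supports WITH RECURSIVE; the org-chart table below is a hypothetical example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (employee TEXT, manager TEXT)")
conn.executemany("INSERT INTO reports VALUES (?, ?)",
                 [("Ben", "Ada"), ("Cara", "Ben"), ("Dan", "Cara")])

# The base case selects Ada's direct report(s); the recursive step
# repeatedly joins back to the CTE to walk the chain one level down,
# with no procedural looping in user code.
chain = conn.execute("""
    WITH RECURSIVE chain(employee) AS (
        SELECT employee FROM reports WHERE manager = 'Ada'
        UNION ALL
        SELECT r.employee FROM reports r JOIN chain c ON r.manager = c.employee
    )
    SELECT employee FROM chain
""").fetchall()
print(chain)  # [('Ben',), ('Cara',), ('Dan',)]
```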
Subsequent revisions continued to evolve SQL for modern data needs. SQL:2003 added support for XML data querying and manipulation. Later versions, including SQL:2008 and SQL:2011, enhanced analytical processing with improved window functions and temporal data handling. SQL:2016 introduced functions for storing and querying JSON held in character-string columns. The most recent revision, SQL:2023 (ISO/IEC 9075:2023), added a native JSON data type with further JSON capabilities and introduced property graph queries (SQL/PGQ) for pattern matching over graph-structured data.[35]
As query languages matured, domain-specific extensions emerged to address limitations in handling non-relational data and procedural logic, alongside alternatives to SQL. For instance, QUEL (Query Language), developed in the late 1970s for the Ingres database system at UC Berkeley and based on relational calculus, offered a more mathematical syntax and was used commercially in the 1980s but was eventually supplanted by SQL's growing dominance and English-like readability. For XML data, the W3C standardized XQuery 1.0 in 2007 as a functional query language for retrieving and transforming XML documents, complementing SQL by supporting path expressions and FLWOR (For-Let-Where-Order-Return) constructs.[36] Concurrently, integration with procedural elements gained traction; for instance, Oracle introduced PL/SQL in 1992 with Oracle7, extending SQL with blocks, variables, loops, and exception handling for server-side programming.[37]
Database vendors further influenced standardization through proprietary evolutions that extended core SQL while aiming for partial compliance. Microsoft's Transact-SQL (T-SQL), originating from the 1989 Sybase-Microsoft partnership for SQL Server and fully developed by Microsoft after 1993, added procedural constructs like cursors and error handling, alongside extensions for analytics such as window functions in later versions.[38] Similarly, Oracle's PL/SQL evolved as a robust procedural layer, enabling stored procedures and triggers that influenced subsequent ISO standards on persistent stored modules.[37] These developments balanced innovation with interoperability, shaping query languages into versatile tools for enterprise data management.
Recent Advancements
The 2010s marked a significant shift in query languages with the rise of graph databases, addressing the limitations of relational models in handling interconnected data. Cypher, developed by Neo4j engineers in 2011, emerged as a declarative query language specifically designed for property graph databases, enabling pattern matching and traversal operations that are intuitive for graph structures.[39] This innovation laid the groundwork for broader adoption of graph querying, culminating in the standardization of GQL (Graph Query Language) as ISO/IEC 39075 in April 2024, which defines operations for creating, querying, and maintaining property graphs in a vendor-neutral manner.[40] GQL draws heavily from Cypher's syntax while incorporating elements from other graph languages, promoting interoperability across graph database systems.[41]
Parallel to graph advancements, NoSQL databases prompted adaptations in query paradigms to support flexible, schema-less data models. The MongoDB Query Language (MQL), integral to MongoDB since its initial release in August 2009, uses JSON-like documents for querying, allowing operations like aggregation pipelines and full-text search without rigid schemas.[42] Similarly, the Cassandra Query Language (CQL), introduced in 2011 for Apache Cassandra, mimics SQL syntax to query wide-column stores, facilitating distributed data manipulation across clusters with commands for keyspace management and conditional updates.[43] These adaptations enabled scalable querying in non-relational environments, influencing hybrid systems that blend NoSQL flexibility with familiar SQL-like interfaces.
API-centric query languages further evolved data access in web and microservices architectures. GraphQL, open-sourced by Facebook in 2015, introduced a flexible querying mechanism where clients specify exact data requirements via a single endpoint, reducing over-fetching and under-fetching common in REST APIs.[44] This approach, now widely adopted by platforms like GitHub and Shopify, supports introspection and type safety through schema definitions, streamlining client-server interactions in distributed applications.
Integrations with artificial intelligence have transformed query generation by bridging natural language and structured queries. From 2023 onward, large language model (LLM)-based tools have enabled natural language processing for automatic SQL or query generation, with examples like Uber's QueryGPT (2024) using LLMs and vector search to convert English questions into executable database queries, improving accessibility for non-experts.[45] Complementary innovations include PRQL, a pipelined relational query language developed in the early 2020s, which compiles to SQL and emphasizes readable, chainable expressions over nested subqueries to enhance maintainability in analytical workflows.[46]
Cloud-native systems have advanced distributed query capabilities through SQL extensions tailored for massive scalability. Snowflake, a cloud data platform launched in 2014, has iteratively extended SQL in the 2020s with features like dynamic table functions and vector search support, optimizing queries across distributed warehouses for real-time analytics on petabyte-scale data without traditional indexing overhead.[47] These enhancements facilitate seamless federated querying over hybrid cloud environments, underscoring the trend toward unified, elastic data processing.
Key Characteristics
Declarative vs. Procedural Paradigms
Query languages predominantly adopt the declarative paradigm, where users specify the desired results—what data to retrieve or manipulate—without dictating the method of execution. The underlying database management system (DBMS) optimizer then determines the optimal execution plan, including choices like join orders, index usage, and parallelization, based on system statistics and constraints. This paradigm is exemplified by set-based operations inspired by relational algebra, such as selections, projections, and unions, which treat data as mathematical sets rather than sequential records, enabling concise expressions of complex queries.[25]
In contrast, the procedural paradigm requires explicit step-by-step instructions for accessing and processing data, akin to imperative programming where control flow and operations are fully prescribed by the user. Although less prevalent in pure query languages due to their complexity and reduced flexibility, procedural elements persist in extensions like SQL cursors, which facilitate iterative, row-by-row traversal of result sets for tasks requiring ordered processing or dynamic decision-making. These mechanisms allow fine-grained control but often lead to less efficient, harder-to-optimize code compared to set-based alternatives.[48]
The dominance of the declarative paradigm stems from its key advantages: enhanced portability, as queries remain valid across diverse DBMS implementations without modification for underlying storage or hardware differences; superior performance optimization, where the engine automatically generates efficient plans that outperform manually tuned procedural equivalents in most scenarios; and clear separation of concerns, isolating logical query intent from physical execution details to improve maintainability and reduce developer burden.[6][49]
Theoretically, declarative query languages are grounded in relational calculus, a non-procedural formalism that defines queries through logical predicates on relations, offering equivalent expressive power to the procedural relational algebra without specifying operational sequences. Relational algebra, introduced by E.F. Codd, serves as the procedural foundation with its explicit operators for data manipulation, mirroring the step-wise control of imperative loops in general programming languages like C or Java. This duality, formalized in Codd's work on relational completeness, underscores why declarative approaches prevail in modern database systems for their balance of power and abstraction.[26][25]
Syntax and Semantic Elements
Query languages are constructed using a formal syntax that includes predefined keywords, operators, and clauses to articulate data selection, filtering, and manipulation instructions. Keywords such as SELECT and FROM delineate the projection of desired attributes and the specification of data sources, respectively, forming the foundational structure of most queries.[50] Logical operators like AND and OR enable the combination of conditions, while comparison operators including = and > facilitate precise filtering based on relational predicates. Clauses such as WHERE for conditional filtering and GROUP BY for aggregation organize the query logic, ensuring systematic processing of input data.[51]
Semantically, query languages define mappings from underlying data models—such as relations or graphs—to output result sets, where the interpretation of a query determines the exact transformation applied. In the relational model, these semantics embody closure properties, whereby algebraic operations on relations yield relations, thereby preserving the model's structure throughout computation.[52] Expressiveness is a key semantic attribute, exemplified by the completeness of relational calculus, which equivalently captures all queries formulable in relational algebra, ensuring no loss of representational power.[53]
Common patterns in query languages include pattern matching for identifying structural similarities in data retrieval, joins for integrating information across multiple relations or entities, and aggregation functions such as COUNT and SUM for condensing datasets into summary metrics. Pattern matching employs symbolic representations, often using wildcards or regular expressions, to locate conforming elements within records or nodes.[54] Joins, typically categorized as inner, outer, or equi-joins, merge datasets based on shared attributes, enabling relational composition without data duplication.[55] Aggregation functions apply over grouped data to compute scalar values, supporting analytical operations like totals or averages in result sets.[56]
Challenges in query language design encompass ambiguity in natural language interfaces, where polysemous terms or contextual nuances can yield multiple valid interpretations, thus hindering precise query translation.[57] In structured queries, type safety poses another hurdle, as mismatches between operand types may lead to runtime failures unless enforced by static checks or schema-aware compilation.[58]
Classification by Type
Database Query Languages
Database query languages enable the retrieval, manipulation, and management of structured data within database systems, primarily focusing on relational models where data is organized into tables with predefined schemas. The cornerstone of these languages is SQL (Structured Query Language), a standardized domain-specific language developed for relational databases to perform create, read, update, and delete (CRUD) operations, with Data Query Language (DQL) components emphasizing efficient read operations such as selecting and filtering data from tables. SQL and its variants, including those in systems like Oracle Database, Microsoft SQL Server, and PostgreSQL, adhere to ANSI/ISO standards, allowing developers to express queries declaratively for consistent data interaction across RDBMS platforms.
In non-relational or NoSQL environments, query languages adapt to diverse data models while retaining core principles of structured retrieval. Key-value stores, exemplified by Redis, utilize command-based queries like GET, SET, and MGET to access data stored as simple pairs, prioritizing speed for caching and session management. Document-oriented databases, such as MongoDB, employ a JavaScript Object Notation (JSON)-like query syntax to match and aggregate semi-structured documents, supporting operations akin to CRUD through methods like find() and update(). Column-family stores like Apache Cassandra use Cassandra Query Language (CQL), a SQL-inspired syntax tailored for distributed wide-column data, enabling inserts, selects, and updates across partitioned tables.
Essential features of these query languages include support for ACID (Atomicity, Consistency, Isolation, Durability) compliance to guarantee transaction reliability, particularly in relational systems where SQL enforces data integrity during multi-statement operations. Indexing structures, such as B-tree or hash indexes in SQL and secondary indexes in NoSQL variants, accelerate query execution by facilitating rapid lookups and reducing full-table scans. Transactional capabilities allow queries to bundle operations atomically, with rollback mechanisms in SQL and multi-document transactions in MongoDB ensuring consistency in concurrent environments.
These languages power enterprise data management by underpinning online transaction processing (OLTP) for real-time, high-throughput tasks like order processing and inventory updates, while also supporting online analytical processing (OLAP) for aggregating and analyzing large datasets in business intelligence applications.[59][60]
Information Retrieval Query Languages
Information retrieval (IR) query languages are designed to search and rank documents in large collections of unstructured or semi-structured text, emphasizing probabilistic relevance over exact matches. These languages enable users to express information needs through terms, operators, and modifiers that facilitate retrieval from corpora such as web pages, digital libraries, or enterprise archives. Unlike precise data extraction in structured databases, IR queries prioritize ranking documents by estimated relevance, often using statistical models to handle ambiguity and scale to billions of items.
Boolean queries form the foundational logic in early IR systems, employing operators like AND, OR, and NOT to combine terms for exact set-based retrieval. For instance, a query such as "cat AND dog NOT bird" retrieves documents containing both "cat" and "dog" but excluding "bird," processed efficiently via inverted indexes that map terms to document lists. This model, prominent in systems like the SMART retrieval system from the 1960s, provides binary yes/no results without inherent ranking, making it suitable for precise filtering in controlled vocabularies but limited for vague user intents in full-text scenarios.[61][62]
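The inverted-index evaluation of such a query can be sketched in Python with plain set operations; the four-document corpus is made up:

```python
# A tiny inverted index: term -> set of document ids, used to evaluate
# the Boolean query "cat AND dog NOT bird" with set operations.
docs = {
    1: "cat dog",
    2: "cat bird",
    3: "cat dog bird",
    4: "dog",
}

index = {}
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)

# AND -> intersection, NOT -> set difference; results are unranked.
result = (index["cat"] & index["dog"]) - index["bird"]
print(sorted(result))  # [1]
```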
Full-text and ranked retrieval extend Boolean capabilities by incorporating term weighting and proximity operators to score document relevance. In term-based approaches, queries use free-text keywords weighted by models like TF-IDF (Term Frequency-Inverse Document Frequency), where term frequency measures local importance within a document, and inverse document frequency downweights common terms across the corpus, enabling ranked lists ordered by cosine similarity or similar metrics. Proximity operators, such as "cat NEAR/5 dog," refine searches by requiring terms within a specified distance, improving precision in phrase-like queries. These elements, central to vector space models, power modern search engines by addressing vocabulary mismatches and supporting relevance feedback.[63]
Structured elements in IR query languages allow field-specific searches to target metadata or document sections, enhancing precision in semi-structured collections. For example, queries like "title:quantum physics" restrict matching to titles, while "author:Einstein date:>1900" combines fields for temporal filtering, common in tools like web search engines or digital libraries. This approach leverages document schemas without full relational structure, bridging free-text and metadata-driven retrieval.[16][64]
The evolution of IR query languages has incorporated faceted search and query expansion to better capture user intent and support exploratory navigation. Faceted search presents results with navigable categories (facets) like genre or date, allowing progressive refinement of queries through selections that intersect with initial terms, originating from library classification systems and advanced in tools like the Flamenco interface. Query expansion automatically augments user queries with related terms—via thesauri, co-occurrence analysis, or relevance feedback—to mitigate issues like synonymy or polysemy, as demonstrated in techniques from Rocchio's 1971 method and later surveys showing 7-14% recall improvements in benchmark tests. These advancements shift IR from rigid logic to interactive, intent-aware paradigms.[65][66]
Emerging and Specialized Languages
In recent years, query languages for graph data have advanced to handle complex relational structures beyond traditional tabular models. Property graph query languages, such as Cypher and Gremlin, enable traversals that navigate nodes and relationships to uncover patterns in interconnected data, supporting applications like social network analysis and recommendation systems.[67][68] For semantic web applications, RDF-based languages like SPARQL facilitate querying distributed knowledge graphs by matching triples (subject-predicate-object) across heterogeneous sources, with the SPARQL 1.2 Working Draft (as of November 2025) enhancing federation and update capabilities for large-scale RDF datasets.[69][70]
The integration of large language models (LLMs) has given rise to natural language-driven query interfaces, allowing users to pose conversational questions that are automatically translated into executable code. Tools like Uber's QueryGPT, launched in 2024, leverage generative AI to convert natural language prompts into SQL queries, improving accessibility for non-technical users in data analysis workflows.[45] Recent advancements in text-to-SQL, as surveyed in 2025, demonstrate LLMs achieving up to 80% accuracy on benchmark datasets like Spider by incorporating retrieval-augmented generation (RAG) to refine schema understanding and query synthesis.[71][72]
Domain-specific query languages address niche data paradigms, optimizing for performance in specialized environments. PromQL, the query language for Prometheus, supports real-time aggregation of time-series metrics using functions like rate() and histogram_quantile() to monitor infrastructure and applications at scale.[73] For AI embeddings in vector databases, query mechanisms often extend SQL with similarity operators (e.g., cosine distance in pgvector) or use dedicated syntax in systems like Milvus for approximate nearest neighbor searches over high-dimensional data.[74] The Graph Query Language (GQL), standardized by ISO/IEC 39075 in 2024, provides a unified declarative syntax for property graphs, enabling path traversals and pattern matching in knowledge graphs while promoting interoperability across vendors.[75][41]
Emerging trends emphasize hybrid query languages that blend paradigms for polyglot persistence, where systems manage diverse data types within a single query interface. For instance, extensions like PostgreSQL's SQL/PGQ integrate graph traversals with relational joins, allowing unified queries over SQL tables and property graphs to support complex analytics in mixed workloads.[76] This approach reduces data silos, as seen in 2025 hybrid models that combine vector embeddings with graph structures for enhanced retrieval-augmented generation in AI applications.[77]
Notable Examples
Structured Query Language (SQL)
Structured Query Language (SQL) is a standardized domain-specific language designed for managing and querying data held in relational database management systems (RDBMS). Originally developed by IBM in the 1970s, it became an ANSI standard in 1986 and an international ISO standard in 1987, enabling declarative expressions for data retrieval, manipulation, and control. SQL's widespread adoption stems from its simplicity and power in handling structured data through relational models, where data is organized into tables with rows and columns related via keys. As the de facto standard for relational databases, SQL underpins systems like Oracle, MySQL, PostgreSQL, and SQL Server, facilitating operations from simple lookups to complex analytical queries.[78][79]
At its core, SQL syntax revolves around the SELECT-FROM-WHERE structure for querying data. The SELECT clause specifies the columns or expressions to retrieve, the FROM clause identifies the source tables, and the WHERE clause applies filtering conditions to rows. For example, to retrieve employee names from a department, one might use:
SELECT name FROM employees WHERE department = 'Sales';
This basic form supports data aggregation with GROUP BY and HAVING for conditional summaries. SQL also includes Data Manipulation Language (DML) statements like INSERT, UPDATE, and DELETE for modifying data, and Data Definition Language (DDL) commands like CREATE TABLE for schema management.
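A short sqlite3 session (with invented sample rows) illustrates GROUP BY with a HAVING filter applied to the aggregated groups:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, department TEXT, salary REAL);
    INSERT INTO employees VALUES
        ('Ada', 'Sales', 50000), ('Ben', 'Sales', 60000), ('Cara', 'IT', 70000);
""")

# GROUP BY forms one row per department; HAVING filters whole groups
# (here, departments whose average salary exceeds 52,000).
rows = conn.execute("""
    SELECT department, AVG(salary)
    FROM employees
    GROUP BY department
    HAVING AVG(salary) > 52000
    ORDER BY department
""").fetchall()
print(rows)  # [('IT', 70000.0), ('Sales', 55000.0)]
```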
To combine data from multiple tables, SQL employs JOIN operations, which link rows based on related columns. Common types include INNER JOIN, which returns only matching rows from both tables, and LEFT JOIN, which includes all rows from the left table and matching rows from the right, with NULLs for non-matches. An example INNER JOIN on customers and orders:
SELECT customers.name, orders.date
FROM customers
INNER JOIN orders ON customers.id = orders.customer_id;
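The contrast between the two join types can be observed directly with an in-memory SQLite database via Python's sqlite3 module; the sample rows are invented, and Bo deliberately has no orders:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (customer_id INTEGER, date TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Bo")])
conn.execute("INSERT INTO orders VALUES (1, '2024-01-05')")  # only Ana has an order

# INNER JOIN keeps only rows with a match in both tables.
inner = conn.execute(
    "SELECT customers.name, orders.date FROM customers "
    "INNER JOIN orders ON customers.id = orders.customer_id"
).fetchall()
print(inner)  # [('Ana', '2024-01-05')]

# LEFT JOIN keeps every customer; non-matches get NULL (None) for order columns.
left = conn.execute(
    "SELECT customers.name, orders.date FROM customers "
    "LEFT JOIN orders ON customers.id = orders.customer_id"
).fetchall()
print(left)  # [('Ana', '2024-01-05'), ('Bo', None)]
```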
Subqueries enhance expressiveness by nesting one query within another, often in the WHERE clause for comparisons or in FROM for derived tables. For instance, a subquery might filter employees earning above the departmental average. Window functions, first standardized in the SQL/OLAP amendment to SQL:1999 (2001) and incorporated into SQL:2003, perform calculations across row sets without collapsing them into groups, using an OVER clause to define the window. The ROW_NUMBER() function assigns sequential numbers to rows within a partition, useful for ranking:
SELECT name, salary,
       ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;
These features allow SQL to handle analytical tasks efficiently in relational contexts.[80]
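Both ideas can be illustrated with Python's sqlite3 module: the correlated subquery below mirrors the above-average-salary example, and since bundled SQLite builds may predate window-function support, the ROW_NUMBER() partitioning is emulated in plain Python (all sample data is invented):

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ada", "Sales", 70000), ("Ben", "Sales", 50000),
     ("Cleo", "HR", 65000), ("Dan", "HR", 55000)],
)

# Correlated subquery: employees paid above their own department's average.
above_avg = conn.execute(
    "SELECT name FROM employees AS e WHERE salary > "
    "(SELECT AVG(salary) FROM employees WHERE department = e.department)"
).fetchall()
print(above_avg)  # [('Ada',), ('Cleo',)]

# Emulating ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC):
# sort by partition key (salary descending within it), then number each group.
rows = conn.execute("SELECT name, department, salary FROM employees").fetchall()
ranked = []
for dept, grp in groupby(sorted(rows, key=lambda r: (r[1], -r[2])),
                         key=lambda r: r[1]):
    for n, (name, _, salary) in enumerate(grp, start=1):
        ranked.append((name, dept, n))
print(ranked)  # [('Cleo', 'HR', 1), ('Dan', 'HR', 2), ('Ada', 'Sales', 1), ('Ben', 'Sales', 2)]
```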
SQL's evolution is tracked through successive ISO/IEC 9075 revisions, balancing core stability with new capabilities. The progression began with ANSI X3.135-1986 (SQL-86), focusing on basic relational operations, followed by integrity-constraint enhancements in SQL-89 and a much fuller syntax in SQL-92, including outer joins. Later versions added object-relational features: SQL:1999 introduced recursive queries, triggers, and structured user-defined types; SQL:2003 added XML support and standardized window functions; SQL:2006 extended XML querying with XQuery integration; SQL:2008 added the TRUNCATE statement and refined MERGE; SQL:2011 added temporal tables. SQL:2016 (ISO/IEC 9075:2016) notably incorporated JSON support through functions like JSON_VALUE for extracting values from JSON documents stored in columns, enabling hybrid relational-NoSQL workloads. The latest, SQL:2023 (ISO/IEC 9075:2023), introduces property graph queries via clauses like MATCH for traversing graph structures directly in SQL, extending its reach to graph data without abandoning relational foundations.[78][81][82]
Database vendors extend the SQL standard to address domain-specific needs, often through proprietary functions while maintaining core compliance. PostgreSQL, for instance, provides robust full-text search via the tsvector and tsquery data types, integrated into SQL queries using operators like @@ for matching parsed text against search terms. This allows efficient indexing and ranking of textual content, as in:
SELECT title FROM articles WHERE to_tsvector('english', content) @@ to_tsquery('english', 'database & query');
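PostgreSQL's full-text machinery cannot be reproduced outside the server, but its core idea — normalize the document into a token set, then evaluate a boolean term expression against it — can be sketched in a few lines (the tokenizer below is deliberately naive; to_tsvector additionally applies stemming and stop-word removal):

```python
import re

def to_tokens(text):
    # Crude stand-in for to_tsvector: lowercase alphanumeric words only.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def matches_all(doc_tokens, terms):
    # Stand-in for the @@ operator with an AND-only tsquery
    # such as 'database & query'.
    return all(t in doc_tokens for t in terms)

doc = "A database query language retrieves rows from a database."
print(matches_all(to_tokens(doc), ["database", "query"]))  # True
print(matches_all(to_tokens(doc), ["database", "graph"]))  # False
```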
Such extensions leverage PostgreSQL's GIN indexes for performance on large corpora. MySQL offers spatial query extensions compliant with Open Geospatial Consortium (OGC) standards, supporting geometry types like POINT, LINESTRING, and POLYGON for storing and querying geospatial data. Functions such as ST_Distance compute metrics between features, enabling location-based queries like finding nearby points:
SELECT name FROM locations WHERE ST_Distance_Sphere(geom, POINT(-74.0060, 40.7128)) < 10000;
These build on MySQL's spatial indexes for efficient analysis in GIS applications.[83]
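The computation behind spherical-distance functions such as ST_Distance_Sphere is the haversine great-circle formula, which can be sketched directly (the earth radius and coordinates below are illustrative; MySQL's default radius differs slightly):

```python
from math import asin, cos, radians, sin, sqrt

def distance_sphere_m(lon1, lat1, lon2, lat2, radius=6371000.0):
    """Haversine great-circle distance in meters between two
    (longitude, latitude) points on a sphere of the given radius."""
    dlon = radians(lon2 - lon1)
    dlat = radians(lat2 - lat1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * radius * asin(sqrt(a))

# Lower Manhattan to Midtown: a few kilometers, well under a 10 km threshold.
d = distance_sphere_m(-74.0060, 40.7128, -73.9857, 40.7484)
print(round(d) < 10000)  # True
```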
Despite its strengths, traditional SQL implementations in monolithic RDBMS face scalability limitations when handling big data volumes, such as petabyte-scale datasets or high-velocity streams, due to challenges in distributed processing, locking, and index maintenance that can lead to performance bottlenecks. These issues are mitigated in modern dialects like Google BigQuery's SQL, which leverages a serverless, columnar storage architecture with automatic sharding and massively parallel processing to query terabytes in seconds without managing infrastructure. BigQuery's extensions, such as scripting and machine learning integrations, further adapt SQL for cloud-scale analytics while preserving standard syntax.[84]
Graph and NoSQL Query Languages
Graph query languages are designed to operate on graph data models, which represent entities as nodes and relationships as edges, enabling efficient traversal and pattern matching for interconnected data. Unlike relational approaches, these languages emphasize declarative specifications of graph patterns and traversals, facilitating queries over complex networks such as social graphs or recommendation systems.[85] NoSQL query languages extend this paradigm to non-relational stores, supporting diverse data models like documents, key-value pairs, and semantic webs, while providing schema flexibility for big data environments.[86]
Cypher is a declarative query language developed for Neo4j, a leading property graph database, allowing users to express graph patterns and traversals in a readable, ASCII-art-inspired syntax. It focuses on pattern matching to retrieve connected data, such as identifying relationships between nodes, and is optimized for real-time queries in graph databases. For instance, the query MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a, b finds all pairs of people connected by a "KNOWS" relationship, enabling efficient traversals without explicit joins. Cypher's design draws from SQL-like readability but prioritizes graph semantics, making it suitable for applications requiring deep relationship analysis.[87][88]
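The MATCH pattern above amounts to filtering an edge set, which a short sketch over an invented in-memory edge list makes concrete; this illustrates the semantics only, not Neo4j's index-backed execution:

```python
# Edges as (source, label, target) triples — a toy property-graph sketch.
edges = [
    ("Alice", "KNOWS", "Bob"),
    ("Bob", "KNOWS", "Carol"),
    ("Alice", "WORKS_AT", "Acme"),
]

# Analogue of: MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a, b
knows_pairs = [(a, b) for (a, rel, b) in edges if rel == "KNOWS"]
print(knows_pairs)  # [('Alice', 'Bob'), ('Bob', 'Carol')]
```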
Gremlin serves as the graph traversal language for the Apache TinkerPop framework, supporting a wide range of graph databases through a functional, data-flow approach composed of sequential steps. It enables both imperative traversals for procedural control and declarative patterns for high-level queries, with operations like addV('person').property('name', 'Alice') to create vertices and outE('knows') to follow outgoing edges labeled "knows." This step-based model allows for complex path computations, such as shortest paths or community detection, and is embeddable in languages like Java or Python for versatile graph processing. Gremlin's Turing-complete nature supports both online transaction processing (OLTP) and analytics (OLAP) workloads across TinkerPop-compatible systems.[85]
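Gremlin's step model — each step consumes and emits traversers — can be mimicked with chained Python generators; this is a loose analogy over invented graph data, not TinkerPop's actual API:

```python
# A toy graph: vertex -> list of (edge_label, target) pairs.
graph = {
    "Alice": [("knows", "Bob"), ("knows", "Carol")],
    "Bob": [("knows", "Carol")],
    "Carol": [],
}

def V(g):
    """Start step: emit every vertex, like g.V()."""
    yield from g

def out(g, label):
    """Step factory: follow outgoing edges carrying the given label."""
    def step(vertices):
        for v in vertices:
            for lbl, target in g.get(v, []):
                if lbl == label:
                    yield target
    return step

# Analogue of g.V().out('knows').out('knows'): friends-of-friends.
hop1 = out(graph, "knows")(V(graph))
result = list(out(graph, "knows")(hop1))
print(result)  # ['Carol']
```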
The Graph Query Language (GQL), standardized as ISO/IEC 39075:2024, is a declarative language for querying property graph databases, serving as the international standard analogous to SQL for relational data. Inspired by Cypher, it uses pattern-matching syntax for traversals, such as MATCH (n:Person)-[r:KNOWS]->(m:Person) RETURN n.name, m.name to retrieve connected persons, supporting efficient querying of complex relationships in graph stores. GQL enables vendor-neutral graph operations, including path finding and subgraph extraction, and is implemented in databases like Neo4j and Amazon Neptune as of 2025.[76][41]
In the NoSQL domain, languages like AQL (ArangoDB Query Language) provide unified querying for multi-model databases that combine graphs, documents, and key-value stores. AQL is declarative and SQL-inspired, supporting operations across heterogeneous data with features like traversals and aggregations in a single query, such as FOR v IN 1..3 INBOUND startVertex GRAPH 'social' OPTIONS {bfs: true} RETURN v.name for graph navigation. Similarly, SPARQL is the W3C-standardized query language for RDF (Resource Description Framework) data, treating it as directed labeled graphs for semantic web applications. It uses triple patterns for matching, as in SELECT ?subject WHERE { ?subject rdf:type :Resource }, to retrieve resources of a specific type, with support for federated queries, filters, and constructs to build new RDF graphs. These languages enable flexible, scalable data access in distributed NoSQL environments.[89][90]
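SPARQL's triple-pattern matching reduces to filtering a set of (subject, predicate, object) triples, as a small sketch with invented data shows; real engines add join optimization, filters, and federation on top of this:

```python
# RDF data as (subject, predicate, object) triples.
triples = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:acme", "rdf:type", "ex:Company"),
    ("ex:bob", "rdf:type", "ex:Person"),
]

def match(pattern, data):
    """Match one triple pattern; None in a position acts as a variable."""
    return [t for t in data
            if all(p is None or p == v for p, v in zip(pattern, t))]

# Analogue of: SELECT ?subject WHERE { ?subject rdf:type ex:Person }
people = [s for (s, _, _) in match((None, "rdf:type", "ex:Person"), triples)]
print(people)  # ['ex:alice', 'ex:bob']
```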
Graph and NoSQL query languages offer distinct advantages over rigid relational systems, particularly in handling complex relationships through native traversals that avoid costly multi-table joins, achieving up to orders-of-magnitude performance gains in interconnected datasets. For example, graph databases like Neo4j demonstrate superior efficiency in relationship-heavy queries compared to MySQL, as joins in SQL scale poorly with degree of connectivity. Additionally, their schema-less or flexible designs accommodate evolving data structures without migrations, supporting agile development in big data scenarios where relational schemas impose constraints. This flexibility is crucial for applications like fraud detection or knowledge graphs, where ad-hoc patterns and semi-structured data prevail.[86][91]