Power Query
Power Query is a data connectivity and data preparation technology developed by Microsoft that enables end users to discover, connect to, combine, and refine data from a wide range of sources into cohesive datasets for analysis and reporting.[1] It functions as an extract, transform, and load (ETL) engine, featuring a graphical user interface called the Power Query Editor for intuitive data shaping and the M formula language for advanced scripting and automation of transformations.[2]
Originally introduced as a free add-in for Excel 2010 and 2013 on Windows, Power Query allowed users to import and transform external data without complex coding.[3] It was later integrated natively into Excel 2016 and subsequent versions as the "Get & Transform" feature on the Data tab, becoming the primary tool for data import and cleaning in Microsoft 365.[3] Support expanded to Excel for Mac starting in 2019, with capabilities for refreshing data from files like TXT, CSV, and JSON, as well as databases such as SQL Server.[3]
Key features include connectivity to diverse sources—such as files (Excel, CSV, XML), databases (SQL Server, Oracle), web services, Azure services, and cloud platforms—and a step-by-step transformation process that records operations like filtering rows, removing columns, changing data types, merging tables, and pivoting data for repeatability and refreshability.[3] The tool supports loading transformed data directly into worksheets, the Excel Data Model, or other destinations, helping business users who typically spend up to 80% of their time on data preparation.[2]
Power Query is embedded across the Microsoft ecosystem, including Power BI for business intelligence visualizations, Analysis Services for multidimensional modeling, Dataverse for data storage in Power Apps, and Azure Synapse Analytics for cloud-based analytics, ensuring consistent data handling workflows.[1] It also powers cloud experiences like dataflows in Power BI service and Microsoft Fabric, enabling scalable, shared data preparation in collaborative environments.[2] This integration makes Power Query essential for self-service business intelligence, allowing non-technical users to handle large volumes of structured and semi-structured data efficiently.[1]
Introduction
Definition and Purpose
Power Query is a data connectivity and preparation technology developed by Microsoft, functioning as an extract, transform, and load (ETL) engine that allows users to connect to a wide range of data sources, clean and shape the data, and load it into analysis tools such as Excel and Power BI.[2] This graphical interface-based tool supports over 350 transformations and connects to hundreds of data sources, enabling seamless data import and reshaping without requiring deep technical expertise.[2]
The core purpose of Power Query is to empower business users and analysts to perform complex data transformations efficiently, helping to reduce the time spent on data preparation tasks, which can account for up to 80% of a business user's time.[2] Originally introduced as a free add-in for earlier versions of Excel, it has evolved into a native feature within Microsoft 365, enhancing accessibility across platforms like Excel for Windows, Mac, and Power BI Desktop and Service.[3] At its foundation, Power Query relies on the M formula language for defining transformations, which advanced users can edit directly while the primary workflow remains visual and intuitive.[4]
Common use cases include merging datasets from multiple sources, such as combining sales data from a database with customer information from a spreadsheet, and handling messy data by removing duplicates, filling missing values, or filtering irrelevant rows and columns to prepare it for reporting or visualization.[2] These capabilities make Power Query essential for creating repeatable and refreshable queries that update automatically as source data changes, streamlining workflows in data-driven decision-making.[2]
Key Features and Benefits
Power Query offers a graphical user interface that enables users to perform data transformations intuitively without writing code, supporting over 350 transformation types for cleaning, reshaping, and combining data from various sources.[2] This interface automatically detects and suggests data types during import, streamlining the initial data shaping process and reducing errors in subsequent analyses.[3] Additionally, it provides connectivity to hundreds of data sources through built-in connectors, allowing seamless integration with databases, files, web services, and cloud platforms like Azure and Google BigQuery.[2] A core capability is non-destructive editing, where all transformations are recorded as sequential "Applied Steps" that can be reviewed, modified, or reverted at any time, preserving the original data integrity.[5]
The primary benefits of Power Query include significantly reducing the time required for data preparation by automating repetitive tasks such as filtering, merging, and pivoting, enabling analysts to focus more on insights rather than manual manipulation.[6] This leads to reproducible workflows, as the Applied Steps ensure consistent transformations across datasets and users, facilitating reliable reporting in tools like Excel and Power BI.[5] In enterprise settings, it supports collaboration by allowing shared queries and dataflows in the Power BI service, where teams can build upon each other's preparations without duplicating efforts.[7]
Power Query handles large datasets scalably through features like query folding, which pushes transformations back to the data source to minimize processing overhead and improve performance.[8] It also includes data profiling views for column quality, distribution, and statistics, helping users identify issues like missing values or outliers early in the workflow.[9] For data privacy, on-premises deployment options via gateways ensure sensitive information remains secure without cloud exposure.[10]
Compared to traditional ETL tools like SQL Server Integration Services (SSIS), Power Query adopts a low-code approach with its visual editor, making it more accessible for business users and faster for ad-hoc transformations, whereas SSIS relies on scripted packages better suited for complex, scheduled enterprise pipelines.[11] The M language underpins these features, allowing advanced customizations when needed beyond the graphical interface.[12]
History
Origins and Early Development
Power Query originated as a project within Microsoft SQL Azure Labs, initially codenamed "Data Explorer," with its first public announcement occurring at the SQL Pass Summit in October 2011.[13] This early incarnation was designed as a cloud-based service to facilitate data exploration and mashup capabilities, allowing users to connect to and transform data from diverse sources such as SQL Server, Excel files, and web pages directly in a browser environment.[13] The tool's development drew from Microsoft's internal data integration efforts, emphasizing mashup technologies to combine heterogeneous data without requiring extensive coding.[14]
A core objective of Data Explorer was to democratize data access and analysis, empowering non-technical "power users" to perform self-service extract, transform, and publish (ETP) operations on varied data types.[13] This vision was inspired by functional programming paradigms, which informed the creation of the underlying M language—a declarative, functional query language that prioritized composable transformations and reproducibility.[14] By focusing on intuitive data shaping rather than procedural scripting, the project aimed to bridge the gap between IT-managed data pipelines and end-user needs, particularly for creating custom OData feeds from blended sources.[13]
Initial prototypes emerged in late 2011 as web-centric tools for exploratory data work, with early previews released in January 2012 via SQL Azure Labs, including both cloud-hosted and desktop client options for testing.[13] Beta testing throughout 2012 emphasized web data extraction and integration, enabling users to pull structured content from online sources and apply transformations iteratively through a visual interface.[14] These phases refined the tool's ability to handle real-world data mashups, setting the foundation for its evolution into a desktop add-in while retaining the M language as its scripting backbone. In 2013, it was renamed Power Query to align with Microsoft's Power family of tools.[14]
Major Milestones and Product Integrations
Power Query was initially released in July 2013 as a free add-in for Excel 2010 and 2013, having been renamed from its preview codename "Data Explorer" to emphasize its focus on self-service business intelligence and data discovery.[15][16] This add-in simplified data import, transformation, and mashup capabilities, making advanced ETL processes accessible to non-technical users without requiring programming expertise.[3]
In September 2015, Power Query achieved native integration into Excel 2016 as "Get & Transform," embedding its engine directly into the application and expanding its reach to a broader user base within the Microsoft Office suite.[3] By 2018, Microsoft unified the branding across its ecosystem, reverting to "Power Query" as the official technology name while retaining "Get & Transform" for the Excel interface, which facilitated consistent development and updates.[3] That same year, Power Query was integrated into the Common Data Service (now Microsoft Dataverse), enabling seamless data ingestion, preparation, and loading into cloud-based entity stores for Power Apps and Dynamics 365.[17]
A key enterprise milestone occurred in 2017 with the addition of Power Query to SQL Server Analysis Services (SSAS) tabular models at compatibility level 1400, introducing the modern Get Data experience and support for M language expressions to handle complex data transformations directly within SSAS projects.[18] This integration bridged self-service analytics with on-premises data warehousing, allowing SSAS developers to leverage Power Query's query builder for enhanced data sourcing and folding.
Recent advancements from 2024 to 2025 have further solidified Power Query's role in Microsoft's AI-driven ecosystem. In June 2024, Power Query templates entered preview in Power BI and Microsoft Fabric, providing reusable scripts with metadata for streamlined dataflow creation and project portability across environments.[19] Enhancements to Copilot integration enabled natural language-guided transformations in Power BI, such as automated query generation and data shaping; in September 2025, Copilot in Dataflow Gen2 reached general availability, supporting natural language for generating and explaining transformations, with iterative improvements boosting accuracy for semantic models and reports.[20] Simultaneously, Power Query expanded within Microsoft Fabric, powering unified data pipelines in data engineering workloads like lakehouses and warehouses, which support hybrid cloud-on-premises scenarios through seamless connectivity to OneLake and external sources.[21]
These milestones mark Power Query's evolution from a standalone Excel add-in to a core ETL component of the Power Platform, fostering scalable, low-code data workflows that integrate disparate sources while maintaining consistency via the M language across versions.[1] This shift has enabled organizations to build hybrid architectures, combining on-premises SSAS models with cloud services like Dataverse and Fabric for end-to-end analytics.[18]
User Interface and Workflow
Power Query Editor Interface
The Power Query Editor provides a graphical user interface for data preparation, allowing users to connect to data sources, preview content, and apply transformations visually.[22] It is accessible in Microsoft Excel through the Data tab by selecting Get Data, which launches the editor after source selection, or in Power BI Desktop via the Transform Data button on the Home tab, initiating the editor directly from the report view.[22] Upon entry, users encounter an initial data preview via the Navigator dialog, where they select tables or sheets from the source for loading into the editor.[23]
The editor's layout centers on a ribbon at the top, featuring Home, Transform, and View tabs that organize commands for common actions such as connecting to new sources, applying filters, or toggling display modes.[22] Below the ribbon lies the Queries pane on the left, which lists all loaded queries and enables management tasks like renaming, duplicating, or grouping them for organization.[22] The central area displays the Data preview grid, a tabular view of the current query's data, allowing real-time inspection as changes are made.[22]
Additional panels enhance interaction: the Formula bar, visible when enabled via the View tab, shows M expressions for the selected step, facilitating quick edits to the underlying code.[22] The Properties pane, accessible by right-clicking a query in the Queries pane, displays settings such as query name and description, while the status bar at the bottom reports diagnostics such as row counts, execution time, and issues encountered during refresh.[22] For advanced customization, the Advanced Editor dialog, opened from the View tab or Home ribbon, provides a full view of the query's M code for manual scripting.[22]
Navigation within the editor includes the Diagram view, introduced in the November 2020 updates to Power Query Online, which visualizes query dependencies as a flowchart to illustrate dataflow relationships and transformations, aiding comprehension of complex query structures without delving into code.[24] The Schema view, available in Power Query Online, offers a focused interface for schema-level operations such as managing columns and data types.[25] The Applied Steps pane on the right records the sequential changes applied to the data, serving as a visual audit trail.[22] When visual tools are insufficient, users can edit the underlying M code directly through the Advanced Editor for custom modifications.[22]
In Power Query, the Applied Steps list serves as a sequential record of all transformations applied to a dataset, enabling users to track, review, and manage the data shaping process. This list appears in the Query Settings pane and includes default steps such as "Source," which connects to the data origin, "Changed Type," which adjusts column data types, and "Removed Columns," which eliminates unnecessary fields. Each step represents a specific operation, and selecting one in the list previews the dataset at that point in the workflow, facilitating step-by-step verification.[5][22]
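Each entry in the Applied Steps list corresponds to one named step in the query's underlying M code. A minimal sketch of such generated code, assuming a hypothetical workbook table named SalesTable with Amount and Notes columns, might look like:

let
    Source = Excel.CurrentWorkbook(){[Name = "SalesTable"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source, {{"Amount", type number}}),
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type", {"Notes"})
in
    #"Removed Columns"

Selecting "Changed Type" in the pane would preview the table as it exists after the type conversion but before the column removal.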
Users can manage these steps dynamically to refine transformations. For instance, a step can be deleted from its right-click menu or with the X icon, which removes it along with any dependent subsequent steps where necessary. Steps can be reordered via drag-and-drop or the right-click menu to move them before or after one another, preserving a logical sequence. Where applicable, a step's settings, such as a filter criterion or source URL, can be edited through the right-click menu, and renaming steps adds clarity to complex queries. Additionally, inserting a new step after a selected one or extracting previous steps into a new query supports modular development. This management allows iterative adjustments without restarting the entire process.[5]
Power Query offers a wide range of built-in transformation options, over 350 in total, categorized under tabs like Home, Transform, and Add Column in the editor ribbon. Common operations include filtering rows to retain only relevant records based on conditions like date ranges, splitting columns to separate delimited values into multiple fields, and pivoting or unpivoting data to switch between columnar and row-based formats for analysis. Merging queries combines datasets from multiple sources using join types such as inner or left outer, while appending stacks tables vertically for unified views. These transformations are applied sequentially, with each addition automatically generating an entry in the Applied Steps list.[22][2]
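A hedged sketch of several of these operations in M, using small inline tables rather than real sources, could combine appending, merging, and unpivoting in sequence:

let
    Sales = #table({"CustomerID", "Amount"}, {{1, 100}, {2, 50}}),
    MoreSales = #table({"CustomerID", "Amount"}, {{3, 75}}),
    Customers = #table({"CustomerID", "Name"}, {{1, "Ada"}, {2, "Alan"}, {3, "Grace"}}),
    // Append stacks tables with matching columns vertically
    Appended = Table.Combine({Sales, MoreSales}),
    // Merge performs a left outer join on CustomerID; the joined table column is then expanded
    Merged = Table.NestedJoin(Appended, {"CustomerID"}, Customers, {"CustomerID"}, "Customer", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "Customer", {"Name"}),
    // Unpivot turns the remaining columns into attribute/value pairs
    Unpivoted = Table.UnpivotOtherColumns(Expanded, {"CustomerID", "Name"}, "Attribute", "Value")
in
    Unpivoted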
The workflow emphasizes iterative development, where users preview data changes in real-time after each step and apply modifications directly in the Power Query Editor interface. This supports branching through query references, where a new query takes an existing query's output (or a set of steps extracted into their own query) as its starting point, enabling parameterized variations like dynamic date filters without duplicating base transformations. Upon completion, steps are committed via the Close & Apply command, loading the shaped data into the host application. Each applied step corresponds to underlying M language code, though edits are primarily graphical.[22][5]
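For example, a branch built by referencing another query might look like the following minimal sketch, where BaseSales is a hypothetical existing query and the date filter is the only added step:

let
    Source = BaseSales,
    RecentOnly = Table.SelectRows(Source, each [OrderDate] >= #date(2024, 1, 1))
in
    RecentOnly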
Error resolution in the Applied Steps focuses on diagnosing and addressing step failures to maintain query integrity. When a step encounters an issue, such as a missing column or inaccessible source, a yellow error pane displays the reason (e.g., Expression.Error), message, and details, preventing full query loading until fixed. Users resolve these by reviewing steps sequentially, adjusting parameters like file paths for DataSource.NotFound errors, or removing problematic steps. Diagnostics tools, accessible via the editor, provide deeper insights into evaluation errors, while cell-level issues like data type mismatches can be handled by replacing errors with nulls or defaults directly in the preview. Refresh mechanisms update the dataset upon source changes, re-evaluating steps from the beginning, but persistent errors require targeted fixes to avoid propagation.[26][5]
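As a sketch of cell-level error handling in M, using hypothetical inline data, conversion failures can be replaced with a default value instead of failing the load:

let
    Source = Table.FromRecords({[Value = "10"], [Value = "abc"]}),
    // "abc" cannot be converted to a whole number, leaving an error value in that cell
    Converted = Table.TransformColumnTypes(Source, {{"Value", Int64.Type}}),
    // Replace cell-level errors with a default rather than letting them propagate
    Cleaned = Table.ReplaceErrorValues(Converted, {{"Value", 0}})
in
    Cleaned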
The M Language
Syntax and Core Concepts
The Power Query M language is a functional, case-sensitive programming language similar to F#, specifically designed for creating data mashups by transforming and combining data from various sources, and it employs lazy evaluation to optimize performance by deferring computations until necessary.[27][28] As a higher-order functional language, M treats functions as mappings from input values to output values, enabling composable operations that align with data transformation workflows.[29]
At its core, M revolves around expressions that evaluate to values, where every syntactic construct produces a computable result, fostering a declarative style suited to data tasks. Primary value types in M include lists for ordered collections, records for named fields, and tables as structured data containers, which serve as the foundational building blocks for mashup operations. Functions are first-class citizens, meaning they can be passed as arguments, returned from other functions, or stored in variables, allowing for reusable and modular code in data processing.[27][29]
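A brief sketch illustrates these core kinds of values; the names and data are arbitrary examples:

let
    Colors = {"Red", "Green", "Blue"},                                  // list: ordered collection
    Person = [Name = "Ada", Age = 36],                                  // record: named fields
    People = #table({"Name", "Age"}, {{"Ada", 36}, {"Alan", 41}}),      // table: structured container
    Double = (x as number) => x * 2,                                    // function stored as a value
    Doubled = List.Transform({1, 2, 3}, Double)                         // function passed as an argument
in
    Doubled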
In the Power Query formula bar, each step's expression is preceded by an equals sign, such as = Table.SelectRows(previousStep, each [Column] > 5), which applies a transformation to the result of a prior step. Scoping is managed through let and in constructs, where let defines intermediate variables or steps and in specifies the final output, promoting readable, step-by-step query building. The language supports error-tolerant operations, such as handling nulls or failures gracefully without halting execution, which enhances robustness in data pipelines. Let expressions form a key construct for this scoping mechanism.[27][30]
M's design prioritizes readability for non-programmers engaged in data preparation, using intuitive syntax that mirrors natural data manipulation steps, while offering extensibility through built-in libraries like Table.FromRecords for converting structures into tables. This balance supports seamless integration with the Power Query user interface, where UI actions generate underlying M code automatically.[27][28]
Let Expressions and Functions
The let expression serves as the foundational construct for defining Power Query M queries, enabling the modular construction of data transformations by assigning intermediate results to named variables before yielding a final output.[31] This structure promotes readability and reusability, particularly in data pipelines where multiple steps are chained together.[30]
The syntax of a let expression follows the form let Variable1 = expression1, Variable2 = expression2, ... in finalExpression, where each variable binds the result of an expression, and the in clause specifies the output, typically the last variable or a derived value.[31] For instance, a simple let expression to load and filter a CSV file might appear as:
let
    Source = Csv.Document("OrderID,Price#(lf)1,100#(lf)2,50#(lf)3,75", [Delimiter = ",", Columns = 2, QuoteStyle = QuoteStyle.None]),
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    Typed = Table.TransformColumnTypes(Promoted, {{"OrderID", Int64.Type}, {"Price", Int64.Type}}),
    Filtered = Table.SelectRows(Typed, each [Price] > 60)
in
    Filtered
This loads the CSV text into a table via Csv.Document, promotes the first row to column headers, converts the columns to numbers, and applies row filtering using Table.SelectRows, returning only the rows where Price exceeds 60.[32][33] For more complex pipelines, nested let expressions can encapsulate sub-queries within variables, allowing hierarchical organization of transformations, as in the sketch that follows.[29]
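A minimal sketch of such nesting, using a hypothetical inline table instead of an external source, keeps the sub-query's intermediate steps out of the outer scope:

let
    Orders =
        let
            Raw = #table({"OrderID", "Price"}, {{1, 100}, {2, 50}, {3, 75}}),
            Typed = Table.TransformColumnTypes(Raw, {{"Price", type number}})
        in
            Typed,
    Filtered = Table.SelectRows(Orders, each [Price] > 60)
in
    Filtered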
M includes a rich library of built-in functions that integrate seamlessly within let expressions for common operations. The Table.Group function performs aggregation by grouping rows on specified keys and applying aggregators to columns, such as summing values; for example:
let
    Source = Table.FromRecords({
        [CustomerID = 1, Price = 20],
        [CustomerID = 1, Price = 10],
        [CustomerID = 2, Price = 30]
    }),
    Aggregated = Table.Group(Source, {"CustomerID"}, {{"TotalPrice", each List.Sum([Price]), type number}})
in
    Aggregated
This groups by CustomerID and computes the sum of Price for each group.[34] For string manipulations, Text.Combine joins a list of text values with an optional separator, useful in data cleaning tasks like concatenating fields.[35] Temporal data handling relies on DateTime functions, such as DateTime.From to construct datetime values from date and time components, or DateTime.ToText to format them as strings for reporting.[36]
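A short sketch combining these functions, with arbitrary example values, might concatenate date parts and format the result:

let
    Parts = {"2025", "10", "01"},
    Joined = Text.Combine(Parts, "-"),                                    // "2025-10-01"
    AsDateTime = DateTime.From(Date.FromText(Joined)),                    // midnight on that date
    Formatted = DateTime.ToText(AsDateTime, [Format = "yyyy-MM-dd HH:mm"])
in
    Formatted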
Users can define custom functions in M using the lambda syntax (parameters) => expression, which creates invocable mappings akin to built-in ones and can be assigned to variables in let expressions for reuse.[37] A basic example defines a function to add a fixed value to a number:
let
    AddFive = (x as number) => x + 5,
    Result = AddFive(10)
in
    Result
This yields 15 and demonstrates how custom functions enhance modularity in queries.[37]
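Custom functions also compose with built-in table functions. In the following minimal sketch, where the inline table is hypothetical, the same AddFive function is applied to every cell of a column:

let
    AddFive = (x as number) => x + 5,
    Source = #table({"Value"}, {{10}, {20}, {30}}),
    // Apply the custom function to each cell of the Value column
    Adjusted = Table.TransformColumns(Source, {{"Value", AddFive, type number}})
in
    Adjusted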
Data Handling in M
Data Types and Assertions
Power Query's M language employs a type system that classifies values to ensure data integrity and facilitate transformations. Primitive types form the foundational building blocks, including type text for Unicode strings, type number for numeric values (such as decimals or integers), type datetime for date and time combinations, and type logical for boolean values (true or false).[38][39] These types support nullable variants, denoted as type nullable text or similar, which accommodate the null value alongside the base type, allowing for optional or missing data without disrupting queries.[38] The type null primitive specifically classifies the absence of a value.[38]
Structured types in M build upon primitives to represent complex data arrangements. Records are defined with field-specific types, such as type [Name = text, Age = number], enabling named key-value pairs that can be open (allowing extra fields) or closed.[38] Lists specify an item type, for example type list or more precisely type {number} for collections of numbers, maintaining order while permitting heterogeneous elements unless constrained.[38] Tables, akin to structured datasets, declare column schemas like type table [Column1 = text, Column2 = number], supporting row-based operations and optional key definitions for uniqueness.[38][39] These structured types enhance data modeling by enforcing schemas during transformations.
Assertions in M enforce type compatibility, preventing errors from mismatched data. The as operator performs type ascription, such as 42 as number, which returns the value if compatible or raises an Expression.Error if not, as in "abc" as number.[40] For non-enforcing checks, the is operator tests compatibility, returning a logical value: 42 is number yields true, while "abc" is number yields false.[40] Functions like Value.Type(value) retrieve a value's type, aiding in dynamic assertions, and constructs such as try expression otherwise alternative handle potential type-related errors gracefully by providing fallbacks.[38][41] Null values are compatible with any, null, or nullable types under these checks.[40]
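A brief sketch of these checks in practice, with arbitrary values:

let
    Ascribed = 42 as number,                          // returns 42; "abc" as number would raise Expression.Error
    Compatible = "abc" is number,                     // false, returned as a logical without raising an error
    TheType = Value.Type(Ascribed),                   // retrieves the value's type for dynamic checks
    Fallback = try "abc" as number otherwise 0        // failed assertion handled with a default
in
    [Ascribed = Ascribed, Compatible = Compatible, TheType = TheType, Fallback = Fallback]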
Type propagation in Power Query combines automatic inference with manual controls. In the user interface, types are inferred from source schemas for structured data or sampled from the first 200 rows for unstructured sources, adjustable via options like "Detect data type."[39] Manually, M code applies types using functions such as Table.TransformColumnTypes(sourceTable, {{"Column", type text}}) to enforce schemas across columns, or Value.ReplaceType(value, type number) for individual values with minimal validation.[39][41] Locale settings influence inference, such as interpreting date formats differently between regions.[39] This propagation keeps types consistent as data flows through a query, and explicit typing steps within let expressions also serve to document the expected schema.[41]
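A minimal sketch of manual type propagation with an explicit locale, using a hypothetical inline table:

let
    Source = #table({"OrderID", "Amount"}, {{"1", "10.5"}, {"2", "20"}}),
    // The optional culture argument controls how numeric and date text is interpreted
    Typed = Table.TransformColumnTypes(Source, {{"OrderID", Int64.Type}, {"Amount", type number}}, "en-US")
in
    Typed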
In the Power Query M language, comments serve as documentation tools that are ignored by the evaluator during query execution, allowing developers to annotate code without affecting runtime behavior. Single-line comments begin with two forward slashes (//) and extend to the end of the line, while multi-line comments are enclosed between /* and */ and can span multiple lines. These comment types are particularly useful for explaining complex transformation steps, such as custom function logic or data merging operations, to improve code maintainability in collaborative environments.[42][43]
Errors in M represent failures during expression evaluation, typically manifested as records that halt processing unless explicitly handled. Common error types include Expression.Error, which signals a general failure in evaluating an expression (e.g., invalid operations like division by zero), and Conversion.Failed, which occurs during type conversions such as Number.ToText when the input cannot be properly transformed. These errors are structured as records containing fields like Reason (specifying the error type, e.g., "Expression.Error"), Message (a descriptive string, e.g., "We cannot convert the value null to type Text"), and Detail (additional diagnostic information, often null or an error detail record).[44]
Error handling in M primarily relies on the try expression, which wraps a protected-expression and optionally includes an error handler to manage failures gracefully. The basic syntax is try protected-expression otherwise default-expression, where successful evaluation returns the result of protected-expression, but an error triggers the return of default-expression (e.g., try Number.FromText("invalid") otherwise null returns null instead of propagating the Conversion.Failed error). For more advanced control, the catch clause can capture the error as a function parameter: try protected-expression catch (errorValue) => customLogic, enabling inspection of the error record before deciding on a response. Custom errors can be raised using Error.Record to construct records with specified Reason, Message, and Detail fields, such as Error.Record("CustomError", "Invalid input detected", [Input = invalidValue]). Unhandled errors propagate upward through the query evaluation, potentially causing the entire step or query to fail, which underscores the importance of wrapping risky operations like data source accesses or type assertions.[45][46]
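A short sketch of these constructs together; the names and values are illustrative only:

let
    Fallback = try Number.FromText("invalid") otherwise null,            // returns null instead of an error
    Caught = try Number.FromText("invalid") catch (e) => e[Message],     // inspects the error record's Message field
    Validate = (value) =>
        if value = null then
            error Error.Record("CustomError", "Invalid input detected", [Input = value])
        else
            value
in
    [Fallback = Fallback, CaughtMessage = Caught, Checked = Validate(5)]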
Best practices for error management in M include using comments to document anticipated error scenarios in complex let expressions, thereby aiding debugging and team reviews. For runtime logging, the Diagnostics.Trace function outputs trace messages at specified levels (e.g., Information, Warning, Error) when tracing is enabled, returning the input value unchanged to avoid disrupting query flow; for example, Diagnostics.Trace(TraceLevel.Information, "Processing row: " & Text.From(rowIndex), value). Additionally, Query Diagnostics provides a built-in tracing tool in Power Query Editor or Power BI Desktop, capturing detailed evaluation logs during authoring or refresh to identify error sources, such as failed transformations in applied steps. Type assertions can also be applied proactively to validate inputs early, complementing reactive try handling.[47][48]
Data Connectivity
Built-in Connectors
Power Query provides a wide array of built-in connectors that enable users to import data from diverse sources directly within applications like Power BI, Excel, and Power Apps. These connectors facilitate seamless connectivity to files, databases, web services, and cloud platforms without requiring custom coding, supporting a variety of data formats and protocols. As of October 2025, Power Query includes over 150 such connectors, categorized to cover common data ingestion scenarios.[49]
The connectors are grouped into primary categories, each tailored to specific data environments. For file-based sources, Power Query supports connectors for formats such as CSV, Excel workbooks, JSON files, PDF documents, and XML, allowing users to load structured or semi-structured data from local or network files. Database connectors target relational systems, including SQL Server, Oracle Database, PostgreSQL, MySQL, and Azure SQL Database, enabling queries against tables, views, or stored procedures. Web connectors handle online data feeds, such as OData services, REST APIs, and platforms like Google Analytics or Salesforce, often via URL-based access. Cloud connectors integrate with services like Azure Blob Storage, SharePoint lists, Amazon S3, and Google BigQuery, supporting scalable storage and collaboration tools.[50]
Authentication for these connectors varies by category and source but commonly includes options like Basic authentication (username/password), OAuth 2.0 for secure token-based access, Windows integrated authentication for on-premises networks, and organizational accounts via Microsoft Entra ID. Users configure these during connection setup to ensure secure data retrieval, with some connectors supporting anonymous access for public web sources.[51]
To use a built-in connector, users access the "Get Data" menu in the Power Query Editor or host application, select the desired connector from the list, and provide necessary parameters such as server addresses, database names, file paths, URLs, or API endpoints, followed by credential entry. This process generates an initial query that can be refined with transformations. Certain connectors, particularly those for on-premises databases or files, require an on-premises data gateway for cloud-based services like Power BI Service to enable refresh capabilities.[50]
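A hedged sketch of connector invocations in M, with hypothetical file paths and URLs standing in for real connection details and credentials:

let
    // File connector: read a local CSV and promote its header row
    FromFile = Csv.Document(File.Contents("C:\Data\orders.csv"), [Delimiter = ","]),
    Promoted = Table.PromoteHeaders(FromFile, [PromoteAllScalars = true]),
    // Web connector: call a REST endpoint and parse the JSON response
    FromApi = Json.Document(Web.Contents("https://example.com/api/orders"))
in
    Promoted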
While most built-in connectors are available in standard editions, some—such as the Exact Online Premium connector or advanced cloud integrations—necessitate a premium license for full functionality. Additionally, compatibility with query folding (which pushes transformations back to the source for efficiency) depends on the connector, though not all support live connections like DirectQuery. Gateway requirements apply to non-cloud sources in shared environments, potentially adding setup overhead.[50]
Query Folding Mechanism
Query folding is an optimization mechanism in Power Query that enables the translation of data transformation steps written in the M language into native queries executed directly at the data source, thereby minimizing data transfer and leveraging the source's computational capabilities.[8] This process occurs automatically when Power Query detects compatible steps, offloading evaluation to the source where possible, such as converting a filter operation into a SQL WHERE clause for relational databases.[8]
The mechanism analyzes the sequence of applied steps in a query, identifying those that can be folded back to the source, and combines them into a single native query before retrieving results.[8] For instance, operations like selecting specific columns, sorting rows, or limiting row counts—such as keeping the top 10 rows from a large table—can be fully folded, resulting in the source returning only the processed subset of data.[52] Similarly, filtering rows based on conditions, grouping data for aggregation, and performing joins between tables are often supported on foldable connectors, such as relational databases, where these translate to efficient SQL equivalents like SELECT with GROUP BY or INNER JOIN.[8] Partial folding may occur if some steps are compatible while others are not, allowing initial transformations to execute at the source before subsequent ones in Power Query.[52]
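As a sketch of foldable steps against a relational source, where the server, database, and table names are hypothetical, the following chain would typically be combined into a single native SELECT statement with WHERE, ORDER BY, and TOP clauses:

let
    Source = Sql.Database("sql.contoso.com", "SalesDb"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    Selected = Table.SelectColumns(Orders, {"OrderID", "Amount", "OrderDate"}),
    Filtered = Table.SelectRows(Selected, each [Amount] > 100),
    Sorted = Table.Sort(Filtered, {{"OrderDate", Order.Descending}}),
    Top10 = Table.FirstN(Sorted, 10)
in
    Top10

A step the source cannot translate, such as a custom M function, would stop folding from that point onward.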
Folding breaks when encountering non-foldable operations, such as custom M functions, complex calculations not native to the source, or transformations like certain text manipulations that require full data loading.[8] Connectors that support folding, including those for SQL Server and Oracle, enable these optimizations, though support varies by data source capabilities.[8] In the Power Query Editor interface, folding status is indicated by icons in the Applied Steps pane—accessible via the Query Settings view—such as a green icon for full folding up to a step, red for no folding, and yellow for runtime-determined folding.[53]
By reducing the volume of data transferred over the network and utilizing optimized source-side processing, query folding significantly enhances performance, particularly for large datasets where full loading could otherwise take minutes or longer; for example, a fully folded query on millions of rows might complete in seconds compared to hundreds of seconds without it.[52] This efficiency is crucial for scalable data preparation in tools like Power BI, serving as a key alternative to modes like DirectQuery for certain workloads.[8]
Advanced Features
DirectQuery Support
DirectQuery is a connectivity mode in Power BI that enables real-time querying of data sources without importing data into the model, contrasting with Import mode, which loads a snapshot of the data into Power BI for faster local processing. In DirectQuery, queries generated by reports or visuals are sent directly to the underlying data source each time the report is refreshed or interacted with, ensuring access to the most current data while keeping the dataset size minimal. This mode is particularly suited for scenarios where data volumes are large or change frequently, as it avoids the need for scheduled imports.[54]
To set up DirectQuery, users connect to a supported data source via the Get Data option in Power BI Desktop, selecting the DirectQuery connectivity mode during the connection process rather than Import. Once connected, the storage mode for tables is set to DirectQuery in the model's properties, and transformations in Power Query must be limited to those that support query folding—where steps are translated into native queries sent to the source—to prevent errors or performance degradation. Supported sources include relational databases such as SQL Server, Azure SQL Database, PostgreSQL, and Snowflake, as well as some multidimensional sources like SAP BW, though compatibility varies by connector. For on-premises sources, an on-premises data gateway is required to facilitate secure connections.[55][50]
The advantages of DirectQuery include near real-time data accuracy, the ability to work with large underlying datasets without importing them into the model, and adherence to row-level security defined at the source; individual queries and intermediate operations are, however, limited to returning up to 1,000,000 rows in non-Premium capacities (with higher limits configurable in Premium). The mode also introduces dependencies on the source system's performance and availability, potentially leading to higher latency during report interactions, and restricts advanced modeling features like automatic date hierarchies or complex DAX measures that require imported data. Additionally, not all Power Query transformations are supported, as they must fold back to the source to avoid downloading entire datasets.[54][56]
As of 2025, enhancements to DirectQuery include performance improvements in certain visuals (for example, Zebra BI Tables) when handling hierarchies, including ragged hierarchies, and support for composite models, which allow mixing DirectQuery tables with Import or Direct Lake tables within the same semantic model to blend real-time and cached data sources. Composite models enable relationships across different storage modes, such as linking live SQL Server data via DirectQuery to imported Excel files, providing flexibility for hybrid scenarios while maintaining query folding where possible. These updates, introduced in releases such as the May 2025 update, reduce load times for complex reports without altering the core live-querying behavior.[57][58]
Common use cases for DirectQuery involve building dashboards that require up-to-the-minute insights, such as sales metrics from operational databases or financial reports from ERP systems, where data freshness outweighs the need for sub-second visual responsiveness.[54]
Custom Functions and Connectors
Power Query allows users to extend its functionality through custom functions written in the M formula language, enabling reusable logic for data transformations. These functions are defined using a let expression structure, where the function is assigned within the let block and invoked via the in clause, facilitating modular query building. For instance, a simple custom function to convert text to uppercase can be created as follows:
let
    UpperText = (input as text) => Text.Upper(input)
in
    UpperText
This function takes a text parameter and applies the built-in Text.Upper operation, promoting reusability across multiple queries.[59][37]
To develop custom functions, users access the Advanced Editor in the Power Query Editor interface, where they manually enter M code to define the function's parameters and body. Parameterization enhances flexibility; for example, a function processing a list of values might accept both the list and a delimiter as inputs, allowing dynamic invocation based on query context. Once defined, the function appears in the Queries pane and can be invoked in subsequent steps, such as adding a custom column via the "Invoke Custom Function" option in the Add Column tab, passing column values as arguments to apply the logic row-wise. This approach supports complex scenarios like data validation or aggregation without duplicating code.[59][31][60]
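A minimal sketch of such row-wise invocation, where the validation function, column names, and data are hypothetical:

let
    IsValidEmail = (address as text) as logical => Text.Contains(address, "@"),
    Source = #table({"Email"}, {{"ada@example.com"}, {"not-an-email"}}),
    // Equivalent to using "Invoke Custom Function" on the Add Column tab
    WithFlag = Table.AddColumn(Source, "IsValid", each IsValidEmail([Email]), type logical)
in
    WithFlag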
Custom connectors further extend Power Query by integrating with non-native data sources, such as proprietary APIs, using the Power Query SDK. These connectors are written in the M formula language, which defines their schemas and retrieval logic; optional .NET components can supply advanced features such as custom authentication, and Web API support enables RESTful interactions. Each connector encapsulates the authentication, navigation, and data retrieval mechanisms tailored to a specific service. The SDK provides templates for common patterns, including OAuth 2.0 support and schema definition via M extension files (.pq), enabling seamless integration with sources like custom OData endpoints or internal databases.[61][62]
Development occurs in Visual Studio Code with the official Power Query SDK extension, which offers syntax highlighting, validation, and a test runner for simulating queries like ConnectorName.Contents(). Completed connectors are packaged as .mez files—a zipped archive containing the .pq definition, resources, and icons—for distribution. For proprietary APIs, developers define navigation tables and entity schemas in M, ensuring query folding where possible to optimize performance.[61][63]
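A heavily simplified sketch of a .pq definition file, following the SDK's template structure; MyConnector and its endpoint are hypothetical, and real connectors typically add navigation tables, schemas, and richer authentication:

section MyConnector;

[DataSource.Kind = "MyConnector", Publish = "MyConnector.Publish"]
shared MyConnector.Contents = (url as text) =>
    let
        response = Web.Contents(url),
        parsed = Json.Document(response)
    in
        parsed;

// Data Source Kind record: declares the supported authentication methods
MyConnector = [
    Authentication = [
        Anonymous = []
    ],
    Label = "My Connector"
];

// Publish record: controls how the connector appears in the Get Data dialog
MyConnector.Publish = [
    Beta = true,
    Category = "Other",
    ButtonText = { "My Connector", "Connect to a hypothetical REST API" }
];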
Deployment involves placing the .mez file in the user's Custom Connectors directory (e.g., Documents\Microsoft Power BI Desktop\Custom Connectors) for local use in Power BI Desktop or Excel, with security settings adjusted to allow uncertified connectors. For enterprise scenarios, connectors are uploaded to the on-premises data gateway, enabling refresh in the Power BI service and Microsoft Fabric environments; this supports shared access across workspaces without per-user installation. Certification by Microsoft, involving signing with a trusted certificate, allows broader distribution via the built-in Get Data dialog.[62][64][61]