OpenXR
OpenXR is a royalty-free, open standard API developed by the Khronos Group that provides high-performance, cross-platform access to augmented reality (AR) and virtual reality (VR) platforms and devices, collectively known as extended reality (XR).[1] It enables developers to create portable XR applications that run across diverse hardware, including head-mounted displays, controllers, hand trackers, eye trackers, and haptic devices, without needing to rewrite code for each vendor's ecosystem.[1] The development of OpenXR began in late 2016, when an exploratory group formed during a Khronos face-to-face meeting in Korea to address the fragmentation in VR hardware ecosystems.[2] In early 2017, the OpenXR Working Group was officially established under the Khronos Group, comprising major industry players such as Microsoft, Oculus (now Meta), Sony Interactive Entertainment, Epic Games, and Qualcomm, to define a unified API specification.[2] After iterative feedback phases, provisional specification 0.90 was released for public review in March 2019 at the Game Developers Conference (GDC), followed by the full OpenXR 1.0 specification on July 29, 2019, at SIGGRAPH, establishing a foundational standard for XR interoperability.[3] On April 15, 2024, Khronos released OpenXR 1.1, which integrated several extensions into the core specification to further reduce developer overhead and enhance support for advanced XR features such as foveated rendering and scene anchors.[4] OpenXR's architecture separates applications from underlying runtimes via a loader and extension system, with core APIs covering essential functions such as spatial tracking, rendering, and input handling, while extensions expose vendor-specific capabilities such as hand tracking or passthrough video.[1] It has been integrated into major game engines, including Unreal Engine since version 4.24 and Unity since version 2020 LTS, facilitating seamless XR development.[1] As of June 2025, 39 products and runtimes have achieved OpenXR conformance, including Meta Quest 2/3/Pro, HTC VIVE Cosmos and XR Elite, Varjo XR-3/4, Microsoft HoloLens 2, Valve SteamVR, and Collabora's open-source Monado implementation, spanning platforms like Windows, Android, and Linux.[5] This widespread adoption underscores OpenXR's role in streamlining XR ecosystems and promoting innovation across industries such as gaming, enterprise training, and medical simulation.[5]
Introduction
Definition and Scope
OpenXR is a royalty-free, open standard API developed by the Khronos Group, designed to provide high-performance access to augmented reality (AR), virtual reality (VR), and mixed reality (MR) devices and platforms, collectively known as extended reality (XR).[1][6] This standard establishes a common set of application programming interfaces (APIs) that enable developers to create XR applications compatible with a diverse array of hardware from multiple vendors, without requiring platform-specific implementations.[1] The core purpose of OpenXR is to deliver a unified API that facilitates cross-platform XR development, allowing applications to operate seamlessly across varied hardware ecosystems while minimizing the fragmentation caused by proprietary SDKs and runtime environments.[7] By abstracting device-specific details, OpenXR separates application logic—such as scene rendering and user interactions—from hardware-dependent operations like input processing and output rendering, thereby streamlining development and enhancing portability.[8] Key concepts include support for spatial computing through 3D positional tracking and interaction models, as well as optional extensions for advanced features like hand tracking (via XR_EXT_hand_tracking), eye tracking (via XR_EXT_eye_gaze_interaction), and passthrough video feeds (via XR_FB_passthrough), which integrate real-world visuals into virtual environments.[6] Initially focused on desktop and mobile XR platforms, OpenXR has expanded to include support for standalone headsets, broadening its applicability to untethered devices.[1] From version 1.0 onward, the standard guarantees full backward compatibility, ensuring that applications built against earlier specifications continue to function with subsequent updates without modification.[9] This commitment to stability, combined with its layered architecture that interfaces with runtime components for hardware abstraction, positions OpenXR as a foundational element for the XR ecosystem.[8]
History and Milestones
The development of OpenXR originated as an initiative by the Khronos Group to address the growing fragmentation in extended reality (XR) platforms, where proprietary SDKs such as the Oculus SDK and OpenVR created silos that hindered cross-device application portability.[2] In December 2016, the Khronos Group announced the formation of the OpenXR working group, bringing together leading XR companies including Valve, Oculus, Sony, and Microsoft to collaborate on a unified standard.[10] The working group was officially established in early 2017, marking the start of collaborative specification development.[11]

Key milestones in OpenXR's evolution include the release of its provisional specification in March 2019, which provided an initial framework for high-performance XR access across AR and VR devices. This was followed by the finalization of OpenXR 1.0 on July 29, 2019, establishing a royalty-free, open standard API that enabled developers to build applications compatible with multiple hardware platforms while ensuring backward compatibility for future updates. The 1.0 specification laid the groundwork for ecosystem-wide adoption by defining core runtime interactions and loader mechanisms. In April 2024, OpenXR 1.1 was released, promoting widely used extensions—such as Local Floor (from XR_EXT_local_floor) and Grip Surface (from XR_EXT_palm_pose)—into the core specification to reduce developer overhead and streamline advanced XR feature implementation, while committing the Khronos Group to an annual release cycle for ongoing enhancements.[4] Blender also announced alignment of its XR roadmap with OpenXR standards during a September developer meeting, focusing on expanded support for immersive 3D workflows in virtual and mixed reality environments.[12] Complementing this, Unity verified full production-ready support for Android XR via OpenXR in September 2025, allowing developers to target emerging Android-based headsets with unified tooling.[13]

Pivotal adoption events underscored OpenXR's growing industry traction. In 2020, Meta integrated OpenXR runtime support starting with Oculus software version 19, enabling developers to submit OpenXR-compliant applications to the Oculus Store and facilitating cross-platform VR development on Quest devices.[14] Microsoft incorporated OpenXR into its Mixed Reality Toolkit for HoloLens 2, allowing holographic applications to leverage the standard for immersive interactions across Windows-based XR ecosystems.[15] At SIGGRAPH 2025, the Khronos Group presented on OpenXR's core specification updates and platform interoperability advancements, highlighting demonstrations of seamless XR application portability across diverse hardware vendors.[16]
Technical Foundation
Architecture Overview
OpenXR employs a layered architecture designed to abstract hardware-specific details and promote cross-platform compatibility in extended reality (XR) development. At the highest level, the Application Layer consists of developer code that interacts directly with the OpenXR API to create immersive experiences. This layer communicates with the Runtime Layer, which serves as a vendor-neutral intermediary that translates API calls into hardware-appropriate instructions. The Runtime Layer, in turn, interfaces with the Platform/Device Layer, encompassing device drivers and platform-specific implementations that handle low-level operations for diverse XR hardware such as head-mounted displays and controllers. This stratified approach ensures that applications remain portable across varying ecosystems without requiring modifications for specific vendors.[17][6]

Central to OpenXR's design principles is platform neutrality, achieved through a modular framework that decouples application logic from underlying hardware dependencies, allowing a single codebase to target multiple devices. Extensibility is facilitated by the OpenXR Loader, a core component that dynamically discovers and loads runtimes, enabling seamless integration of extensions and support for multiple runtimes coexisting on the same system—such as Windows Mixed Reality alongside vendor-specific solutions. The loader employs runtime discovery mechanisms, like environment variables or system configurations, to select an active runtime, ensuring only one is operational at a time while permitting fallback options for robustness. This architecture also accommodates both symmetric XR experiences, like stereoscopic virtual reality, and asymmetric ones, such as monocular augmented reality views, by configuring viewports and rendering parameters accordingly.[1][6][17]

Key concepts in OpenXR's framework include instance creation, which initializes an XrInstance object to manage sessions and establish connections to the runtime, supporting multiple instances per application if permitted. Swapchains, created via the XrSession, provide buffers for efficient rendering and presentation of views to the user, with customizable formats and usage flags to optimize for different XR modes. For augmented reality persistence, spatial anchors—represented as XrSpace objects—enable tracking and anchoring of virtual elements to real-world coordinates, facilitating stable overlays in dynamic environments. Error handling is enhanced through validation layers, which intercept API calls to perform compliance checks and debugging, toggleable for development but removable in production to maintain performance.[6][17]
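As a minimal sketch of the first step in that flow, the C snippet below creates an XrInstance through the loader, which locates the active runtime and forwards the call to it; the application name and the error-handling strategy are illustrative placeholders, not part of the specification.

```c
#include <openxr/openxr.h>
#include <string.h>

// Create an XrInstance; the loader discovers the active runtime and forwards
// the call to it. Application name and error handling are placeholders.
XrInstance create_instance(void) {
    XrInstanceCreateInfo info = { XR_TYPE_INSTANCE_CREATE_INFO };
    strncpy(info.applicationInfo.applicationName, "SampleXRApp",
            XR_MAX_APPLICATION_NAME_SIZE - 1);
    info.applicationInfo.applicationVersion = 1;
    info.applicationInfo.apiVersion = XR_CURRENT_API_VERSION;

    XrInstance instance = XR_NULL_HANDLE;
    if (XR_FAILED(xrCreateInstance(&info, &instance)))
        return XR_NULL_HANDLE;  // no runtime found, or creation rejected
    return instance;
}
```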
API Layers and Components
OpenXR employs a layered API architecture that enables applications to interact with runtime implementations through a series of functional interfaces, where optional API layers can intercept calls for purposes such as validation or debugging.[6] These layers are enumerated using thexrEnumerateApiLayerProperties function and activated during instance creation with xrCreateInstance, allowing developers to insert custom code between the application and the runtime without altering core behavior. The action system forms a key layer for input handling, where xrCreateActionSet creates an action set to group related actions, such as poses or button inputs, which are then suggested for binding to interaction profiles via paths like /user/hand/left/input/[aim](/page/AIM)/pose or /user/hand/right/input/select/[click](/page/Click). Actions are synchronized with the session using xrSyncActions, enabling flexible, device-agnostic input mapping that abstracts hardware differences.[18][19]
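A sketch of that action-based flow is given below, assuming an XrInstance and XrSession already exist; the "gameplay" action set and aim-pose action names are illustrative, and error checking is omitted for brevity.

```c
#include <openxr/openxr.h>
#include <string.h>

// Create an action set with one pose action, suggest a binding against the
// Khronos simple controller profile, attach the set to the session, and sync.
void setup_input(XrInstance instance, XrSession session) {
    // Group related inputs in an action set.
    XrActionSetCreateInfo setInfo = { XR_TYPE_ACTION_SET_CREATE_INFO };
    strcpy(setInfo.actionSetName, "gameplay");
    strcpy(setInfo.localizedActionSetName, "Gameplay");
    XrActionSet actionSet;
    xrCreateActionSet(instance, &setInfo, &actionSet);

    // A device-agnostic pose action for the left-hand aim pose.
    XrActionCreateInfo actInfo = { XR_TYPE_ACTION_CREATE_INFO };
    strcpy(actInfo.actionName, "aim_pose");
    strcpy(actInfo.localizedActionName, "Aim Pose");
    actInfo.actionType = XR_ACTION_TYPE_POSE_INPUT;
    XrAction aimPose;
    xrCreateAction(actionSet, &actInfo, &aimPose);

    // Suggest a binding; the runtime remaps it to whatever hardware is present.
    XrPath profile, aimPath;
    xrStringToPath(instance, "/interaction_profiles/khr/simple_controller", &profile);
    xrStringToPath(instance, "/user/hand/left/input/aim/pose", &aimPath);
    XrActionSuggestedBinding binding = { aimPose, aimPath };
    XrInteractionProfileSuggestedBinding suggested =
        { XR_TYPE_INTERACTION_PROFILE_SUGGESTED_BINDING };
    suggested.interactionProfile = profile;
    suggested.countSuggestedBindings = 1;
    suggested.suggestedBindings = &binding;
    xrSuggestInteractionProfileBindings(instance, &suggested);

    // Attach once per session, then call xrSyncActions every frame.
    XrSessionActionSetsAttachInfo attach = { XR_TYPE_SESSION_ACTION_SETS_ATTACH_INFO };
    attach.countActionSets = 1;
    attach.actionSets = &actionSet;
    xrAttachSessionActionSets(session, &attach);

    XrActiveActionSet active = { actionSet, XR_NULL_PATH };
    XrActionsSyncInfo sync = { XR_TYPE_ACTIONS_SYNC_INFO };
    sync.countActiveActionSets = 1;
    sync.activeActionSets = &active;
    xrSyncActions(session, &sync);
}
```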
Core components include sessions, events, and interaction mechanisms that drive the XR experience. Sessions are initiated with xrBeginSession, transitioning the session into a running state for VR or AR modes after enumeration of view configurations, ensuring the application can begin rendering frames in a stable loop involving xrWaitFrame, xrBeginFrame, and xrEndFrame. Events are polled asynchronously via xrPollEvent to capture state changes, such as session focus or input notifications, stored in an XrEventDataBuffer for processing. Interaction handling extends to hand and controller tracking through action-based APIs, where xrGetActionStatePose reports whether a bound pose action is currently active, with the pose itself retrieved by locating the corresponding action space, supporting natural user inputs across devices.[18]
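Below is a minimal sketch of such an event-polling loop, assuming the instance, session, and chosen view configuration type are already available; only the session-state events discussed above are handled.

```c
#include <openxr/openxr.h>

// Drain the event queue once per frame; the buffer is reused for each poll.
// Only session-state events are handled here; other events are ignored.
void poll_events(XrInstance instance, XrSession session,
                 XrViewConfigurationType viewConfigType) {
    XrEventDataBuffer event = { XR_TYPE_EVENT_DATA_BUFFER };
    while (xrPollEvent(instance, &event) == XR_SUCCESS) {
        if (event.type == XR_TYPE_EVENT_DATA_SESSION_STATE_CHANGED) {
            const XrEventDataSessionStateChanged* changed =
                (const XrEventDataSessionStateChanged*)&event;
            if (changed->state == XR_SESSION_STATE_READY) {
                // Runtime is ready: begin the session for the chosen view configuration.
                XrSessionBeginInfo begin = { XR_TYPE_SESSION_BEGIN_INFO };
                begin.primaryViewConfigurationType = viewConfigType;
                xrBeginSession(session, &begin);
            } else if (changed->state == XR_SESSION_STATE_STOPPING) {
                xrEndSession(session);
            }
        }
        event.type = XR_TYPE_EVENT_DATA_BUFFER;  // reset before the next poll
    }
}
```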
Spatial tracking is facilitated through data structures like XrPosef, which encapsulates 6DoF (six degrees of freedom) information with a position vector (XrVector3f in meters) and orientation quaternion (XrQuaternionf), used in functions such as xrLocateSpace to compute real-time poses and velocities relative to reference spaces. The rendering pipeline integrates projection views, where XrViewConfigurationType enums define display setups like XR_VIEW_CONFIGURATION_TYPE_PRIMARY_STEREO for binocular rendering (left eye at index 0, right at 1) or monoscopic modes, passed during session setup to match device capabilities. Composition layers, submitted via xrEndFrame, overlay application content using structures like XrCompositionLayerProjection, which specify swapchain sub-images and layer flags for depth or quad overlays.[20]
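A short sketch of locating such a pose follows; it assumes two already-created spaces (for example, an action space for a controller and a LOCAL reference space) and a display time obtained from the frame loop, and the helper name locate_pose is illustrative.

```c
#include <openxr/openxr.h>

// Query the 6DoF pose of one space (e.g. an action space for a controller)
// relative to a base space at a given time; returns 1 only when the runtime
// marks both position and orientation as valid.
int locate_pose(XrSpace space, XrSpace baseSpace, XrTime displayTime,
                XrPosef* outPose) {
    XrSpaceLocation location = { XR_TYPE_SPACE_LOCATION };
    if (XR_FAILED(xrLocateSpace(space, baseSpace, displayTime, &location)))
        return 0;
    XrSpaceLocationFlags required = XR_SPACE_LOCATION_POSITION_VALID_BIT |
                                    XR_SPACE_LOCATION_ORIENTATION_VALID_BIT;
    if ((location.locationFlags & required) != required)
        return 0;
    *outPose = location.pose;  // position in meters plus orientation quaternion
    return 1;
}
```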
Specific concepts enhance AR integration, such as environment blend modes defined by XrEnvironmentBlendMode, which control how virtual elements composite with the real world—options include opaque rendering (XR_ENVIRONMENT_BLEND_MODE_OPAQUE), additive blending (XR_ENVIRONMENT_BLEND_MODE_ADDITIVE) for see-through displays, or alpha blending (XR_ENVIRONMENT_BLEND_MODE_ALPHA_BLEND) for compositing over passthrough video, applied per view configuration to support mixed reality scenarios. Action binding to paths ensures portability, as developers suggest bindings like associating a pose action to /user/hand/left for left-hand tracking, allowing the runtime to map these to available hardware inputs dynamically.[21]
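As a sketch, the snippet below enumerates the blend modes a system reports for a given view configuration, assuming an instance and system ID are already available; the fixed-size buffer and the choice of the first reported mode are simplifications.

```c
#include <openxr/openxr.h>
#include <stddef.h>

// Enumerate the blend modes a system supports for one view configuration and
// return the runtime's preferred (first-listed) mode; opaque is the fallback.
XrEnvironmentBlendMode choose_blend_mode(XrInstance instance, XrSystemId systemId,
                                         XrViewConfigurationType viewConfigType) {
    uint32_t count = 0;
    xrEnumerateEnvironmentBlendModes(instance, systemId, viewConfigType,
                                     0, &count, NULL);
    XrEnvironmentBlendMode modes[8];
    if (count == 0 || count > 8)
        return XR_ENVIRONMENT_BLEND_MODE_OPAQUE;  // fallback for this sketch
    xrEnumerateEnvironmentBlendModes(instance, systemId, viewConfigType,
                                     count, &count, modes);
    return modes[0];
}
```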
Specifications and Features
Core API Elements
The core API elements of OpenXR provide the foundational mechanisms for applications to initialize, manage sessions, handle input and output, and render XR content across diverse hardware platforms. These elements are designed to abstract hardware-specific details, enabling portability while allowing runtimes to optimize performance. The API operates through a handle-based system where objects like instances and sessions are created and destroyed explicitly, ensuring resource management and state tracking.[8]

Initialization begins with xrCreateInstance, which establishes a connection to an XR runtime by providing application information and optional extensions or layers via an XrInstanceCreateInfo structure; this returns an XrInstance handle on success, representing the entry point for all subsequent API interactions.[22] Once initialized, xrGetSystem queries the runtime for an XrSystemId based on a specified form factor, such as head-mounted displays (XR_FORM_FACTOR_HEAD_MOUNTED_DISPLAY), allowing applications to select appropriate hardware capabilities like view configurations and environment blend modes.[23] To start an XR experience, xrCreateSession creates an XrSession handle using the instance, system ID, and a graphics binding structure (e.g., XrGraphicsBindingOpenGLWin32KHR for OpenGL on Windows), which links the application's graphics context; the core specification does not mandate a particular graphics API, so specific APIs like Vulkan, OpenGL, or DirectX are integrated via these binding structures.[24] Finally, xrDestroyInstance terminates the instance and releases all associated resources, including any child objects like sessions, ensuring clean shutdown.[25]

Input handling in the core API is action-based, decoupling application logic from device-specific inputs to promote cross-device compatibility. Applications define actions (e.g., "grab" or "aim") grouped into action sets, which are attached to a session using xrAttachSessionActionSets; states such as boolean triggers or float values are queried per frame via functions like xrGetActionStateFloat, while pose actions are located at a specified XrTime through their action spaces, covering current, slightly historical, or predicted data.[18] To optimize bindings, xrSuggestInteractionProfileBindings allows applications to propose mappings between their actions and standard interaction profiles (e.g., /interaction_profiles/khr/simple_controller for basic controllers), enabling the runtime to evaluate and apply compatible suggestions for better user experience without mandating them.[26] For output and rendering, the API supplies view and projection data through structures like XrView, which includes a pose (position and orientation via XrPosef) and a field of view (XrFovf) for each eye or view; the XrFovf angles are used to construct per-view projection matrices, with near and far clip planes chosen by the application.[27][28]
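A compact sketch of the system query and session creation steps described above appears below; it assumes an XrInstance already exists and that a platform-appropriate graphics binding structure has been prepared elsewhere and is passed through the next chain, with error handling abbreviated.

```c
#include <openxr/openxr.h>

// Query the HMD system and create a session. The graphics binding structure
// (e.g. XrGraphicsBindingOpenGLWin32KHR) is platform/API specific and is
// chained through the next pointer; it is passed in here as an opaque pointer.
XrSession create_session(XrInstance instance, const void* graphicsBinding) {
    XrSystemGetInfo sysInfo = { XR_TYPE_SYSTEM_GET_INFO };
    sysInfo.formFactor = XR_FORM_FACTOR_HEAD_MOUNTED_DISPLAY;
    XrSystemId systemId = XR_NULL_SYSTEM_ID;
    if (XR_FAILED(xrGetSystem(instance, &sysInfo, &systemId)))
        return XR_NULL_HANDLE;  // no HMD-form-factor system available right now

    XrSessionCreateInfo sessionInfo = { XR_TYPE_SESSION_CREATE_INFO };
    sessionInfo.next = graphicsBinding;  // links the application's graphics context
    sessionInfo.systemId = systemId;
    XrSession session = XR_NULL_HANDLE;
    xrCreateSession(instance, &sessionInfo, &session);
    return session;
}
```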
Session management revolves around a structured frame loop to ensure low-latency rendering synchronized with hardware display timing. The loop starts with xrWaitFrame, which blocks until the runtime is ready for the next frame and returns a predicted display time (XrTime) indicating when the frame will be presented, allowing applications to compute poses and inputs accordingly for minimal motion-to-photon latency.[29] This is followed by xrBeginFrame, which marks the start of rendering for the frame; the application then calls xrLocateViews with the predicted display time to update XrView poses based on head tracking, renders to swapchains, and ends the frame with xrEndFrame, submitting composed images.[30] The predicted display time supports accurate synchronization, with runtimes guaranteeing at least 50 ms of historical data for past predictions.[31]

The core API also incorporates support for advanced rendering optimizations, such as foveated rendering hints, where applications can suggest variable resolution based on gaze direction through view configuration properties, though full implementation often relies on runtime capabilities.[32] For augmented reality scenarios, depth submission enables occlusion by allowing applications to provide depth buffers alongside projection views during frame submission, integrating virtual content with real-world geometry for realistic compositing.[33] These elements collectively form a robust, extensible foundation for XR development, emphasizing performance and interoperability without tying to specific graphics pipelines in the baseline API.[34]
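The frame loop described above can be sketched roughly as follows for a stereo configuration; the swapchain rendering and the filling of each projection view's subImage are left as placeholders, and the function and parameter names are illustrative.

```c
#include <openxr/openxr.h>

// One iteration of the frame loop for a stereo view configuration.
// Swapchain rendering is elided: each projViews[i] is assumed to already have
// type XR_TYPE_COMPOSITION_LAYER_PROJECTION_VIEW and a valid subImage.
void render_frame(XrSession session, XrSpace appSpace,
                  XrEnvironmentBlendMode blendMode,
                  XrCompositionLayerProjectionView projViews[2]) {
    XrFrameWaitInfo waitInfo = { XR_TYPE_FRAME_WAIT_INFO };
    XrFrameState frameState = { XR_TYPE_FRAME_STATE };
    xrWaitFrame(session, &waitInfo, &frameState);  // blocks; yields predicted display time

    XrFrameBeginInfo beginInfo = { XR_TYPE_FRAME_BEGIN_INFO };
    xrBeginFrame(session, &beginInfo);

    // Locate the per-eye views at the predicted display time.
    XrView views[2] = { { XR_TYPE_VIEW }, { XR_TYPE_VIEW } };
    XrViewState viewState = { XR_TYPE_VIEW_STATE };
    XrViewLocateInfo locateInfo = { XR_TYPE_VIEW_LOCATE_INFO };
    locateInfo.viewConfigurationType = XR_VIEW_CONFIGURATION_TYPE_PRIMARY_STEREO;
    locateInfo.displayTime = frameState.predictedDisplayTime;
    locateInfo.space = appSpace;
    uint32_t viewCount = 0;
    xrLocateViews(session, &locateInfo, &viewState, 2, &viewCount, views);

    // ... render into the swapchain images using views[i].pose and views[i].fov ...
    for (uint32_t i = 0; i < viewCount; ++i) {
        projViews[i].pose = views[i].pose;
        projViews[i].fov = views[i].fov;
    }

    // Submit a single projection layer composed of both eye views.
    XrCompositionLayerProjection layer = { XR_TYPE_COMPOSITION_LAYER_PROJECTION };
    layer.space = appSpace;
    layer.viewCount = viewCount;
    layer.views = projViews;
    const XrCompositionLayerBaseHeader* layers[] =
        { (const XrCompositionLayerBaseHeader*)&layer };

    XrFrameEndInfo endInfo = { XR_TYPE_FRAME_END_INFO };
    endInfo.displayTime = frameState.predictedDisplayTime;
    endInfo.environmentBlendMode = blendMode;
    endInfo.layerCount = frameState.shouldRender ? 1 : 0;
    endInfo.layers = layers;
    xrEndFrame(session, &endInfo);
}
```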
Versions and Extensions
The OpenXR specification began with version 1.0, released on July 29, 2019, which established the foundational API for cross-platform XR development, including core elements such as instance creation, session management, and basic spatial tracking to enable consistent access to VR, AR, and MR hardware.[3] This initial release focused on baseline capabilities like action-based input and composition layers, providing a stable foundation while allowing experimentation through optional extensions.[1] Version 1.1, ratified and released on April 15, 2024, advanced the specification by promoting several widely adopted extensions into the core API to reduce developer fragmentation and streamline implementation of advanced features.[4] Notable integrations include the local floor reference space (from XR_EXT_local_floor), which provides a gravity-aligned, world-locked origin for standing-scale XR content with estimated floor height detection, eliminating the need for explicit extension enablement in applications.[35] This update also commits the OpenXR Working Group to an annual release cadence for incorporating mature extensions and addressing ecosystem needs.[4]

OpenXR employs an extension mechanism to extend core functionality without disrupting backward compatibility, allowing runtimes, API layers, or loaders to opt in to new features dynamically.[6] Extensions are categorized as provisional or final (ratified); provisional ones, marked with a KHX prefix to denote experimental status, enable early testing and feedback before promotion to final KHR or EXT status.[36] For instance, the XR_EXT_eye_gaze_interaction extension supports eye gaze tracking for applications like foveated rendering, where gaze direction informs variable resolution rendering to optimize performance.[37] The OpenXR loader facilitates dynamic loading of extensions by querying runtime support via functions like xrGetInstanceProcAddr, ensuring applications can access optional features at runtime without recompilation.[38]

In 2025, extensions have increasingly targeted mobile and spatial computing platforms, enhancing Android XR compatibility through vendor-specific additions like those for plane detection and spatial anchors.[39] These include the OpenXR Spatial Entities extensions, released on June 10, 2025, which standardize environmental feature tracking (e.g., planes and markers) and persistent spatial anchors across sessions, promoting mobile-specific capabilities toward potential core integration in future releases.[39] Examples encompass XR_EXT_hand_tracking for gesture recognition via joint pose data, enabling natural hand interactions without controllers.[40]

Conformance to the OpenXR specification is validated through the OpenXR Conformance Test Suite (CTS), an open-source toolset that verifies runtime implementations against core and extension requirements, ensuring consistent behavior across devices.[41] The CTS includes tests for API layers, extensions, and interactivity, with results submitted for official Adopter status to maintain ecosystem reliability.[42] OpenXR's deprecation policy preserves compatibility by retaining deprecated extensions in the API registry with clear markings until at least the next major version release, allowing gradual migration while avoiding abrupt breaks in existing applications.[43] This approach, applied to prerelease or superseded features, ensures long-term stability as the specification evolves.[44]
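As a sketch of that opt-in extension query, the snippet below asks the active runtime which instance extensions it exposes and checks for one by name; the helper name is illustrative and error checking is omitted.

```c
#include <openxr/openxr.h>
#include <stdlib.h>
#include <string.h>

// Ask the active runtime which instance extensions it exposes, so optional
// features such as hand tracking are enabled only when actually available.
int runtime_supports_extension(const char* extensionName) {
    uint32_t count = 0;
    xrEnumerateInstanceExtensionProperties(NULL, 0, &count, NULL);

    XrExtensionProperties* props = calloc(count, sizeof(*props));
    for (uint32_t i = 0; i < count; ++i)
        props[i].type = XR_TYPE_EXTENSION_PROPERTIES;
    xrEnumerateInstanceExtensionProperties(NULL, count, &count, props);

    int found = 0;
    for (uint32_t i = 0; i < count; ++i)
        if (strcmp(props[i].extensionName, extensionName) == 0)
            found = 1;
    free(props);
    return found;
}

// Example: runtime_supports_extension(XR_EXT_HAND_TRACKING_EXTENSION_NAME)
```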
Implementations
Runtime Implementations
OpenXR runtimes serve as the intermediary software layer that connects applications to XR hardware and platforms, abstracting device-specific details through the standardized API. These implementations must pass the Khronos Group's Conformance Test Suite (CTS) to ensure compatibility and reliability, allowing developers to use the official OpenXR logo on verified products.[5][45]

Among primary runtimes, the Oculus Runtime from Meta provides full support for OpenXR 1.0, with conformance achieved for devices like the Quest 2, Quest 3, and Quest Pro in 2023, enabling native VR applications across Meta's ecosystem.[5][46] The SteamVR Runtime by Valve, which includes open-source components, supports OpenXR 1.0 with conformance since 2021, facilitating integration with a wide array of PC-based VR headsets through its extensible architecture.[5][47] Monado, an open-source runtime developed by Collabora, targets Linux environments primarily but extends to Windows and Android, achieving OpenXR 1.0 conformance in 2021 for simulated devices and emphasizing cross-platform accessibility without proprietary dependencies.[5][48][49]

Vendor-specific implementations further expand OpenXR's reach. The Microsoft Mixed Reality Runtime integrates OpenXR 1.0 support for HoloLens 2 and Windows Mixed Reality headsets, with conformance dating back to 2020, allowing seamless access to mixed reality features on Windows 10 and later.[5][15] Varjo's runtime, geared toward enterprise XR applications, supports OpenXR 1.0 with updates for high-end headsets like the XR-4 in 2024, including specialized extensions for professional visualization.[5][50] Google's Android XR platform, which integrates ARCore for augmented reality features, provides OpenXR 1.0 and 1.1 runtime support on hardware such as the Snapdragon XR2 Gen 2, with runtime conformance achieved in December 2024, enabling AR experiences on Android-based XR devices.[5][51] Recent additions include the NVIDIA CloudXR runtime, which achieved conformance for OpenXR 1.0 and 1.1 on Linux in June 2025, and Sony's ELF-SR1 and ELF-SR2 runtimes on Windows 11 in March 2025.[5]

Runtime selection occurs dynamically via the OpenXR loader, which on Windows uses the registry key HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\OpenXR\1\ActiveRuntime to locate the active implementation, while JSON manifest files specify the path of the runtime library for loader configuration.[52] Multi-runtime support on Windows allows seamless switching between options like SteamVR and Oculus without rebooting, with alternatives registered under AvailableRuntimes for system-wide discovery.[53] Verified runtimes earn Khronos conformance logos, signaling adherence to the standard and enabling IP protection for adopters.[1]
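For illustration, the kind of JSON runtime manifest that the ActiveRuntime registry value points the loader to looks roughly like the sketch below; the file path and version value are placeholders rather than those of any particular vendor.

```json
{
    "file_format_version": "1.0.0",
    "runtime": {
        "library_path": "C:\\Example\\openxr_runtime.dll"
    }
}
```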
Challenges in runtime implementations include performance overhead during switching, as transitioning between runtimes like SteamVR and Varjo can introduce latency or micro-stutters due to reinitialization of graphics contexts.[54] Debugging is aided by OpenXR Validation Layers, part of the Khronos SDK, which intercept API calls to detect errors and validate conformance in real-time.
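A small sketch of listing the API layers the loader can see is given below; the helper name is illustrative, error checking is omitted, and enabling a specific layer such as the core validation layer shipped with the Khronos OpenXR SDK would then be done by passing its name through XrInstanceCreateInfo::enabledApiLayerNames at instance creation.

```c
#include <openxr/openxr.h>
#include <stdio.h>
#include <stdlib.h>

// Enumerate the API layers visible to the loader and print their names;
// a discovered layer can later be enabled via enabledApiLayerNames.
void print_available_layers(void) {
    uint32_t count = 0;
    xrEnumerateApiLayerProperties(0, &count, NULL);

    XrApiLayerProperties* props = calloc(count, sizeof(*props));
    for (uint32_t i = 0; i < count; ++i)
        props[i].type = XR_TYPE_API_LAYER_PROPERTIES;
    xrEnumerateApiLayerProperties(count, &count, props);

    for (uint32_t i = 0; i < count; ++i)
        printf("%s (spec %llu)\n", props[i].layerName,
               (unsigned long long)props[i].specVersion);
    free(props);
}
```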