Open Neural Network Exchange
The Open Neural Network Exchange (ONNX) is an open-source format and ecosystem designed to represent machine learning models in a standardized way, enabling seamless interoperability across diverse frameworks, tools, runtimes, and hardware platforms.[1][2] Developed initially in 2017 as a collaborative effort between Microsoft and Facebook (now Meta) to address the challenges of model portability in AI development, ONNX defines an extensible computation graph model with built-in operators, standard data types, and a protobuf-based serialization format that supports both deep learning and traditional machine learning workflows.[3][4] In December 2017, version 1.0 was released with additional support from AWS and other partners, marking its transition to a production-ready standard.[5]

Now governed as a graduate project under the LF AI & Data Foundation, ONNX fosters a community-driven ecosystem that includes special interest groups for areas like quantization and runtime optimization, allowing developers to train models in one framework (such as PyTorch or TensorFlow) and deploy them in another without proprietary lock-in.[6] Key benefits include enhanced hardware acceleration through compatible runtimes like ONNX Runtime, support for over 200 operators in its latest versions, and broad adoption by industry leaders including Intel, AMD, NVIDIA, and Qualcomm for efficient inference on edge devices and cloud environments.[1][2] This standardization reduces development friction, promotes innovation by decoupling model representation from specific tools, and continues to evolve through ongoing contributions, with recent focuses on advanced features like generative AI model support.[6]

Overview
Definition and Purpose
The Open Neural Network Exchange (ONNX) is an open-source format designed to represent machine learning models in a framework-agnostic manner, capturing both their structural topology and parameter values to ensure portability across diverse ecosystems.[1][2] Developed as a standardized intermediate representation, ONNX supports a wide range of models, including those from deep learning and traditional machine learning paradigms, by defining an extensible computation graph that encapsulates the model's logic independently of the originating framework.[7] The primary purpose of ONNX is to enable seamless model exchange between training frameworks, such as PyTorch and TensorFlow, and inference engines, allowing developers to transfer models without loss of information or fidelity.[6][2] This interoperability addresses a key challenge in AI development by standardizing how models are serialized and shared, facilitating deployment on various hardware accelerators and runtimes while preserving the model's intended behavior.[1]

ONNX achieves this through Protocol Buffers (protobuf)-based serialization, which provides a compact and extensible binary format for encoding models. At its core, an ONNX model is represented as a directed acyclic graph (DAG), consisting of nodes that denote operations, along with inputs, outputs, and attributes that define tensor shapes, data types, and other metadata essential for execution.[7] This graph structure ensures that the model's computational flow and parameters are fully described in a vendor-neutral way, supporting efficient parsing and optimization by downstream tools.[2]

By decoupling model creation from deployment, ONNX promotes innovation in the AI ecosystem, empowering researchers to experiment with preferred training environments while enabling production teams to optimize for specific inference platforms without proprietary constraints.[6][1] This separation fosters broader adoption of machine learning models across industries, as evidenced by its role in streamlining workflows from research to real-world applications.[2]

Key Benefits
The Open Neural Network Exchange (ONNX) provides significant interoperability benefits by allowing machine learning models trained in one framework, such as PyTorch or TensorFlow, to be seamlessly exported and deployed in another without requiring retraining or reimplementation, thereby reducing vendor lock-in and facilitating hybrid workflows across development and production environments.[1][8] This standardization enables data scientists and engineers to collaborate more effectively, as models can be shared and utilized regardless of the originating toolset.[9]

ONNX also offers optimization advantages through its support for shared runtime libraries and hardware accelerators from multiple vendors, which can lead to faster inference times and reduced resource consumption compared to framework-specific implementations.[1][10] For instance, ONNX Runtime, a widely used inference engine, applies cross-platform optimizations that enhance performance on diverse hardware like GPUs and CPUs.[11] Additionally, its portability ensures models can transition smoothly between cloud-based services, edge devices, and on-device applications, maintaining consistency in behavior and efficiency across deployment scenarios.[12][8]

The adoption of ONNX has fostered ecosystem growth by promoting collaboration among AI developers, tool providers, and hardware manufacturers under an open governance model, resulting in a rich collection of tools, pre-trained models, and extensions that accelerate innovation.[1] This collaborative environment has contributed to quantifiable impacts, such as reduced costs and time-to-market in cross-framework projects by eliminating the need for redundant model adaptations.[8][13] Overall, these benefits have made ONNX a cornerstone for scalable AI deployment, enhancing flexibility and efficiency in the broader machine learning landscape.[14]

History
Origins
The Open Neural Network Exchange (ONNX) was founded in 2017 as a joint initiative by Facebook (now Meta) and Microsoft to establish an open standard for representing deep learning models. Announced on September 7, 2017, the project aimed to create a shared format that would facilitate seamless model interchange across different AI frameworks, addressing the challenges posed by the rapid proliferation of diverse tools in the machine learning ecosystem.[4][3]

The primary motivations for ONNX stemmed from the growing fragmentation in deep learning frameworks, such as Caffe2, PyTorch, and Microsoft's Cognitive Toolkit (CNTK), which made it difficult for developers to move models between training environments and production runtimes. By providing a unified export format, ONNX sought to streamline AI pipelines, enabling researchers and engineers to select the best framework for each project phase without compatibility barriers, ultimately accelerating innovation and deployment in artificial intelligence applications.[4][3]

ONNX 1.0 was released in December 2017 as the initial production-ready version, concentrating on a core set of neural network operators primarily for vision-based models while laying the groundwork for broader applicability. Early involvement from hardware and software partners bolstered the specification's development; for instance, AMD announced support for ONNX in October 2017, joining other contributors to promote interoperability across ecosystems.[15][16]

Milestones and Governance
The Open Neural Network Exchange (ONNX) project marked a significant milestone in November 2019 when it was accepted as a graduate project under the Linux Foundation AI & Data Foundation, transitioning to a vendor-neutral governance model that fosters broader community participation.[17][18] This move built on its initial launch in 2017 and emphasized open collaboration among industry leaders. Earlier, the release of ONNX version 1.2 in September 2018 introduced essential control flow operators, such as Loop and If, enabling support for more complex model structures beyond static computations.[19]

ONNX employs semantic versioning for its releases, with the Intermediate Representation (IR) version and operator sets (opsets) evolving independently to maintain backward compatibility; for instance, opset 18, released in December 2022 as part of ONNX 1.13, enhanced quantization capabilities through improved support for low-precision data types like INT8.[20] In March 2024, ONNX 1.16.0 was released, introducing enhanced machine learning-specific operators, including support for UINT4 and INT4 data types to facilitate efficient quantization, alongside refinements to function and node prototypes for better overload handling.[21] This version also bolstered compatibility with diverse hardware, including improved execution on ARM architectures via associated runtimes.[21] More recently, as of October 2025, ONNX 1.19.0 was released, adding support for advanced features such as new operators for generative AI models and further quantization improvements.[22]

Since 2019, governance of ONNX has been overseen by the Technical Steering Committee (TSC) within the LF AI & Data Foundation, comprising elected representatives from contributing organizations who guide technical direction, release planning, and community standards.[18][23] The TSC includes members from over 20 companies, such as NVIDIA, Intel, IBM, Amazon Web Services (AWS), and Microsoft, ensuring diverse input on specifications and extensions.[23] Elections for TSC seats occur annually, with the current term (September 2025–May 2026) featuring experts like Alexandre Eichenberger from IBM and Mayank Kaushik from NVIDIA.[23][16]

ONNX has long included native support for traditional machine learning models via ONNX-ML extensions, allowing representation of non-deep learning algorithms like decision trees alongside neural networks.[2] ONNX Runtime provides extensions for on-device training, including support for federated learning scenarios that enable privacy-preserving model training across distributed edge devices without central data aggregation.[24] These developments underscore ONNX's evolution toward comprehensive AI interoperability.[1]

Technical Specifications
Model Representation
The Open Neural Network Exchange (ONNX) represents machine learning models as a directed acyclic graph (DAG), which captures the computational topology of the model through a sequence of operations without cycles. This graph-based structure allows for a portable and framework-agnostic description of model inference, where each node in the DAG corresponds to a specific operation, and edges represent data flow between them. The entire model is serialized into a compact binary file with the .onnx extension using Protocol Buffers, enabling efficient storage, transmission, and parsing across diverse environments.[25][7]

At the core of an ONNX model is the ModelProto structure, which encapsulates the graph along with essential metadata. The graph itself consists of nodes that define operations, input and output tensors that specify data shapes and types, initializers for constant tensors (such as weights and biases), and attributes that provide fixed parameters to operations. Metadata elements include the producer name and version (indicating the tool or framework that generated the model), the domain (e.g., 'ai.onnx' for the standard namespace), model version for tracking iterations, and a doc_string for human-readable descriptions. Tensors are multidimensional arrays supporting various data types, such as FLOAT and INT32, though detailed type specifications are covered elsewhere.[25][7]
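As an illustration of this structure, the following sketch loads a model and prints the main ModelProto and GraphProto fields described above; it assumes the onnx Python package is installed and that an arbitrary serialized file named model.onnx exists (the file name is illustrative).

```python
import onnx

# Load a serialized .onnx file into an in-memory ModelProto (path is illustrative).
model = onnx.load("model.onnx")

# Top-level metadata stored on the ModelProto.
print("IR version:   ", model.ir_version)
print("Producer:     ", model.producer_name, model.producer_version)
print("Model version:", model.model_version)
print("Opset imports:", [(o.domain or "ai.onnx", o.version) for o in model.opset_import])

# The embedded GraphProto: nodes, graph inputs/outputs, and constant initializers.
graph = model.graph
print("Nodes:        ", [n.op_type for n in graph.node])
print("Inputs:       ", [i.name for i in graph.input])
print("Outputs:      ", [o.name for o in graph.output])
print("Initializers: ", [t.name for t in graph.initializer])
```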
ONNX supports modular extensions through subgraphs and functions, enhancing reusability and complexity handling. Subgraphs appear within certain control-flow nodes, allowing nested computations for conditional or iterative logic, while functions define reusable compositions of operations that can be invoked like built-in nodes, promoting efficiency in large-scale models. To maintain interoperability, every ONNX model specifies an Intermediate Representation (IR) version—IR version 11 as of ONNX 1.17.0—which dictates the supported features and ensures compatibility across tools and runtimes.[25][7][26]
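For a concrete picture of how these pieces fit together, the following sketch builds a small two-node model programmatically with the onnx.helper utilities; the layer sizes, tensor names, output file name, and opset version are illustrative choices rather than part of the specification.

```python
import numpy as np
import onnx
from onnx import helper, numpy_helper, TensorProto

# Graph input and output: a 1x4 float tensor passed through MatMul followed by Relu.
X = helper.make_tensor_value_info("X", TensorProto.FLOAT, [1, 4])
Y = helper.make_tensor_value_info("Y", TensorProto.FLOAT, [1, 2])

# Constant weights stored as an initializer on the graph.
W = numpy_helper.from_array(np.random.rand(4, 2).astype(np.float32), name="W")

# Nodes are listed in topological order; edges are implied by matching tensor names.
matmul = helper.make_node("MatMul", ["X", "W"], ["H"])
relu = helper.make_node("Relu", ["H"], ["Y"])

graph = helper.make_graph([matmul, relu], "tiny_mlp", [X], [Y], initializer=[W])

# Wrap the graph in a ModelProto and pin the operator set version it relies on.
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 17)])
onnx.checker.check_model(model)
onnx.save(model, "tiny_mlp.onnx")
```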
Graph execution in ONNX can be conceptually expressed as:
\text{Output tensors} = f(\text{Input tensors}, \text{Weights})
where f represents the composition of node operations applied in topological order, transforming the input tensors and constant initializers (weights) into the output tensors.[25]
Operators and Data Types
ONNX defines a comprehensive set of standardized operators that form the building blocks for representing machine learning models as computational graphs. These operators are organized into versioned operator sets, known as opsets, which ensure interoperability across frameworks and runtimes. Each opset represents a collection of immutable operator specifications within a specific domain, with the default domain being "ai.onnx" for core operators. As of 2025, the latest opset version for the ai.onnx domain is 25, encompassing over 170 operators, including foundational ones introduced in earlier versions and new additions like DeformConv in opset 19.[27][20]

Operators are categorized based on their functionality to support diverse machine learning tasks. Core machine learning operators handle fundamental computations, such as MatMul for matrix multiplication, Conv for convolution, and Relu for rectified linear unit activation. Vision-specific operators include Resize for image scaling and MaxPool for pooling operations. Sequence processing operators support recurrent structures, exemplified by GRU for gated recurrent units and LSTM for long short-term memory cells. These categories enable the expression of complex models, from convolutional neural networks to recurrent architectures, while maintaining a unified vocabulary.[27]

Data types in ONNX are primarily tensor-based, allowing models to specify the shape and element types of inputs, outputs, and intermediate values within the graph. Supported tensor element types include float (32-bit), float16, int8, int16, int32, int64, uint8, uint16, uint32, uint64, bool, and string. Recent versions have added support for bfloat16 to optimize for hardware acceleration in deep learning, complex types for advanced signal processing applications, and 8-bit floating-point formats such as FLOAT8E4M3FN and FLOAT8E5M2 for quantization. Tensors can also incorporate sparse representations and optional types that may hold null values, enhancing flexibility for optional inputs or dynamic graphs.[28]

To accommodate specialized models beyond core deep learning, ONNX includes domain-specific operators, such as those in the "ai.onnx.ml" domain for traditional machine learning algorithms from libraries like scikit-learn. Examples include LinearRegressor for linear regression and TreeEnsemble for tree-based ensembles. Backward compatibility is maintained through opset versioning: models specify the required opset version in their metadata, allowing runtimes to select appropriate operator implementations without breaking existing models; non-breaking changes (e.g., documentation updates) are handled within the same version, while breaking changes increment the version number.[27][20]

A representative example is the Add operator, which performs element-wise addition of two input tensors, supporting broadcasting to align shapes. For inputs A and B (tensors of compatible shapes), the output C is computed as:

C_{i,j} = A_{i,j} + B_{i,j}

where broadcasting rules apply if dimensions differ (e.g., a scalar added to a tensor expands the scalar across all elements). The operator has evolved across versions, with opset 14 extending the set of supported numeric types, ensuring robust arithmetic operations in models.
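The broadcasting behavior of Add can be seen end to end in the following sketch, which builds a one-node model with onnx.helper and executes it with ONNX Runtime (assumed installed as onnxruntime); the tensor shapes, opset version, and graph name are illustrative.

```python
import numpy as np
import onnx
from onnx import helper, numpy_helper, TensorProto
import onnxruntime as ort

# Single Add node; B is a scalar initializer broadcast across every element of A.
A = helper.make_tensor_value_info("A", TensorProto.FLOAT, [2, 3])
C = helper.make_tensor_value_info("C", TensorProto.FLOAT, [2, 3])
B = numpy_helper.from_array(np.array(1.5, dtype=np.float32), name="B")

add = helper.make_node("Add", ["A", "B"], ["C"])
graph = helper.make_graph([add], "broadcast_add", [A], [C], initializer=[B])
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 14)])
onnx.checker.check_model(model)

# Execute with ONNX Runtime: the scalar B is expanded across all elements of A.
sess = ort.InferenceSession(model.SerializeToString(), providers=["CPUExecutionProvider"])
a = np.arange(6, dtype=np.float32).reshape(2, 3)
(c,) = sess.run(None, {"A": a})
print(c)  # equals a + 1.5
```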
Interoperability
Framework Support
ONNX provides native export capabilities in several major machine learning frameworks, enabling seamless conversion of trained models to the ONNX format for interoperability. PyTorch includes built-in support through the torch.onnx.export function, which has been available since version 1.2, allowing users to export computational graphs directly from torch.nn.Module instances. TensorFlow relies on the tf2onnx tool for exporting models, including those built with Keras or TensorFlow Lite, supporting opsets up to 18 for compatibility with various inference engines.[29] Similarly, scikit-learn models can be exported using the skl2onnx library, which converts pipelines and estimators into ONNX graphs while preserving sklearn's feature engineering components.[30]
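As an example of such an exporter, the following sketch converts a scikit-learn estimator with skl2onnx; the logistic-regression model, the iris dataset, and the output file name are illustrative assumptions, not requirements of the library.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# Train a small scikit-learn estimator (illustrative model and data).
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=200).fit(X, y)

# Declare the input signature: a float tensor with a dynamic batch dimension.
initial_types = [("float_input", FloatTensorType([None, 4]))]
onnx_model = convert_sklearn(clf, initial_types=initial_types)

# Serialize the resulting ModelProto to disk.
with open("logreg_iris.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
```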
Import support for ONNX models is also widespread, facilitating the loading and execution of ONNX files within diverse frameworks. Keras integration is handled through tf2onnx (the earlier keras-onnx converter is deprecated), and Keras-derived models are most commonly loaded and executed through ONNX Runtime.[29] Apache MXNet, retired as an Apache project in 2023, previously provided native import APIs in its contrib.onnx module for converting ONNX models to MXNet symbols and parameters.[31] PaddlePaddle supports ONNX import through its high-performance inference plugins, allowing deployment of ONNX models alongside native Paddle formats in production pipelines.[32]
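Once a model is in ONNX form, loading and executing it typically looks like the following sketch using ONNX Runtime; the file name and input shape carry over from the previous example and are assumptions, not fixed conventions.

```python
import numpy as np
import onnxruntime as ort

# Load an exported model and run inference on CPU (file name is illustrative).
session = ort.InferenceSession("logreg_iris.onnx", providers=["CPUExecutionProvider"])

input_name = session.get_inputs()[0].name
sample = np.array([[5.1, 3.5, 1.4, 0.2]], dtype=np.float32)

# For skl2onnx classifiers, the first output returned by run() is the predicted label.
outputs = session.run(None, {input_name: sample})
print(outputs[0])
```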
As of 2025, over 15 frameworks offer official ONNX export functionality, reflecting the format's broad adoption for cross-ecosystem workflows; notable examples include Hugging Face Transformers via the Optimum library, which streamlines export of NLP models like BERT for optimized inference.[33] This extensive compatibility supports bidirectional operations, such as initial training in PyTorch followed by fine-tuning in TensorFlow through ONNX round-trip conversion, minimizing data loss and enabling hybrid development pipelines.[34] Conversion tools, detailed separately, underpin these integrations by handling graph translations between frameworks.[35]
Conversion Mechanisms
Conversion to the Open Neural Network Exchange (ONNX) format involves framework-specific exporters that capture a model's computation graph and serialize it into a standardized Protocol Buffers (protobuf) representation.[35] This process ensures the model can be imported into various runtimes while maintaining compatibility with specified operator sets (opsets). For instance, in PyTorch, the torch.onnx.export function converts a torch.nn.Module to ONNX by providing example inputs and specifying parameters like input/output names and the target opset version.
Exporters handle variable input shapes through mechanisms such as dynamic axes in PyTorch, where a dictionary maps input names to dynamic dimensions (e.g., dynamic_axes={"input": {0: "batch_size"}}), allowing flexibility for batch sizes or sequence lengths without fixed dimensions. Similar exporters exist for other frameworks like TensorFlow via tensorflow-onnx, which rewrites model components using ONNX operators during conversion.[29] Once exported, the model is serialized to a .onnx file in protobuf format, with opset compatibility verified to match the target runtime's supported versions.[35]
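A minimal sketch of this export path in PyTorch, with named inputs and outputs, a target opset, and a dynamic batch dimension, is shown below; the toy module, file name, and opset version are illustrative.

```python
import torch
import torch.nn as nn

# A toy module standing in for a trained model.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return torch.relu(self.fc(x))

model = TinyNet().eval()
example_input = torch.randn(1, 4)

# Export with named I/O, a target opset, and a dynamic batch dimension.
torch.onnx.export(
    model,
    example_input,
    "tiny_net.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=17,
    dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}},
)
```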
Post-conversion, importers load the ONNX protobuf into memory for further processing or execution, often accompanied by validation to ensure integrity. The ONNX checker tool, via onnx.checker.check_model, verifies the model's legality, checking for issues like duplicate metadata or opset imports, and can optionally perform full checks including shape inference.[36] This validation is crucial after import to detect inconsistencies arising from framework differences, such as type mismatches (e.g., float32 vs. float64).[35]
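A typical post-export validation step, assuming the file produced in the previous sketch, might look like this:

```python
import onnx

# Load the exported file and validate it; full_check also runs shape inference.
model = onnx.load("tiny_net.onnx")
onnx.checker.check_model(model, full_check=True)
print("Model is well formed; IR version:", model.ir_version)
```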
Discrepancies in dynamic dimensions are addressed using shape inference, implemented through onnx.shape_inference.infer_shapes(), which propagates known shapes across the graph and adds inferred dimensions to the model's value_info field.[37] This resolves partial or symbolic shapes post-export, ensuring the model is executable without runtime errors from undefined tensors.[37] For custom operators not in the standard ONNX domain, extensions map them via domain registration, where a custom domain (e.g., "com.example") is defined in the model's opset_import to isolate proprietary ops from core ones.[38]
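The following sketch applies shape inference and registers a custom operator domain on a loaded model; the file name and the "com.example" domain are hypothetical placeholders.

```python
import onnx
from onnx import helper, shape_inference

model = onnx.load("tiny_net.onnx")

# Propagate known shapes through the graph; inferred shapes land in graph.value_info.
inferred = shape_inference.infer_shapes(model)
for vi in inferred.graph.value_info:
    dims = [d.dim_param or d.dim_value for d in vi.type.tensor_type.shape.dim]
    print(vi.name, dims)

# Declare a custom operator domain alongside the default one (illustrative name).
custom_opset = helper.make_opsetid("com.example", 1)
inferred.opset_import.append(custom_opset)
```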
In PyTorch, custom operators can be handled during export by supplying a custom_translation_table dictionary that maps them onto supported ONNX primitives. Recent ONNX releases, including v1.17.0, further improve this path with support for bfloat16 data types in additional operators and other improvements for handling complex graphs.[26]
The overall conversion pipeline thus proceeds as follows: start with the source model in a supported framework, export it via the appropriate API to generate the ONNX graph, serialize to protobuf while specifying the opset, and validate using the checker with shape inference to confirm fidelity.[35][36] This structured approach minimizes fidelity loss, though challenges like unsupported custom ops may require manual decomposition or domain extensions.