Open Neural Network Exchange
The Open Neural Network Exchange (ONNX) is an open-source format and ecosystem designed to represent machine learning models in a standardized way, enabling seamless interoperability across diverse frameworks, tools, runtimes, and hardware platforms.[1][2] Developed initially in 2017 as a collaborative effort between Microsoft and Facebook (now Meta) to address the challenges of model portability in AI development, ONNX defines an extensible computation graph model with built-in operators, standard data types, and a protobuf-based serialization format that supports both deep learning and traditional machine learning workflows.[3][4] In December 2017, version 1.0 was released with additional support from AWS and other partners, marking its transition to a production-ready standard.[5]

Now governed as a graduate project under the LF AI & Data Foundation, ONNX fosters a community-driven ecosystem that includes special interest groups for areas like quantization and runtime optimization, allowing developers to train models in one framework (such as PyTorch or TensorFlow) and deploy them in another without proprietary lock-in.[6] Key benefits include enhanced hardware acceleration through compatible runtimes like ONNX Runtime, support for over 200 operators in its latest versions, and broad adoption by industry leaders including Intel, AMD, NVIDIA, and Qualcomm for efficient inference on edge devices and cloud environments.[1][2] This standardization reduces development friction, promotes innovation by decoupling model representation from specific tools, and continues to evolve through ongoing contributions, with recent focuses on advanced features like generative AI model support.[6]

Overview
Definition and Purpose
The Open Neural Network Exchange (ONNX) is an open-source format designed to represent machine learning models in a framework-agnostic manner, capturing both their structural topology and parameter values to ensure portability across diverse ecosystems.[1][2] Developed as a standardized intermediate representation, ONNX supports a wide range of models, including those from deep learning and traditional machine learning paradigms, by defining an extensible computation graph that encapsulates the model's logic independently of the originating framework.[7] The primary purpose of ONNX is to enable seamless model exchange between training frameworks, such as PyTorch and TensorFlow, and inference engines, allowing developers to transfer models without loss of information or fidelity.[6][2] This interoperability addresses a key challenge in AI development by standardizing how models are serialized and shared, facilitating deployment on various hardware accelerators and runtimes while preserving the model's intended behavior.[1]

ONNX achieves this through Protocol Buffers (protobuf)-based serialization, which provides a compact and extensible binary format for encoding models. At its core, an ONNX model is represented as a directed acyclic graph (DAG), consisting of nodes that denote operations, along with inputs, outputs, and attributes that define tensor shapes, data types, and other metadata essential for execution.[7] This graph structure ensures that the model's computational flow and parameters are fully described in a vendor-neutral way, supporting efficient parsing and optimization by downstream tools.[2]

By decoupling model creation from deployment, ONNX promotes innovation in the AI ecosystem, empowering researchers to experiment with preferred training environments while enabling production teams to optimize for specific inference platforms without proprietary constraints.[6][1] This separation fosters broader adoption of machine learning models across industries, as evidenced by its role in streamlining workflows from research to real-world applications.[2]

Key Benefits
The Open Neural Network Exchange (ONNX) provides significant interoperability benefits by allowing machine learning models trained in one framework, such as PyTorch or TensorFlow, to be seamlessly exported and deployed in another without requiring retraining or reimplementation, thereby reducing vendor lock-in and facilitating hybrid workflows across development and production environments.[1][8] This standardization enables data scientists and engineers to collaborate more effectively, as models can be shared and utilized regardless of the originating toolset.[9]

ONNX also offers optimization advantages through its support for shared runtime libraries and hardware accelerators from multiple vendors, which can lead to faster inference times and reduced resource consumption compared to framework-specific implementations.[1][10] For instance, ONNX Runtime, a widely used inference engine, applies cross-platform optimizations that enhance performance on diverse hardware like GPUs and CPUs.[11] Additionally, its portability ensures models can transition smoothly between cloud-based services, edge devices, and on-device applications, maintaining consistency in behavior and efficiency across deployment scenarios.[12][8]

The adoption of ONNX has fostered ecosystem growth by promoting collaboration among AI developers, tool providers, and hardware manufacturers under an open governance model, resulting in a rich collection of tools, pre-trained models, and extensions that accelerate innovation.[1] This collaborative environment has contributed to quantifiable impacts, such as reduced costs and time-to-market in cross-framework projects by eliminating the need for redundant model adaptations.[8][13] Overall, these benefits have made ONNX a cornerstone for scalable AI deployment, enhancing flexibility and efficiency in the broader machine learning landscape.[14]

History
Origins
The Open Neural Network Exchange (ONNX) was founded in 2017 as a joint initiative by Facebook (now Meta) and Microsoft to establish an open standard for representing deep learning models. Announced on September 7, 2017, the project aimed to create a shared format that would facilitate seamless model interchange across different AI frameworks, addressing the challenges posed by the rapid proliferation of diverse tools in the machine learning ecosystem.[4][3]

The primary motivations for ONNX stemmed from the growing fragmentation in deep learning frameworks, such as Caffe2, PyTorch, and Microsoft's Cognitive Toolkit (CNTK), which made it difficult for developers to move models between training environments and production runtimes. By providing a unified export format, ONNX sought to streamline AI pipelines, enabling researchers and engineers to select the best framework for each project phase without compatibility barriers, ultimately accelerating innovation and deployment in artificial intelligence applications.[4][3]

ONNX 1.0 was released in December 2017 as the initial production-ready version, concentrating on a core set of neural network operators primarily for vision-based models while laying the groundwork for broader applicability. Early involvement from hardware and software partners bolstered the specification's development; for instance, AMD announced support for ONNX in October 2017, joining other contributors to promote interoperability across ecosystems.[15][16]

Milestones and Governance
The Open Neural Network Exchange (ONNX) project marked a significant milestone in November 2019 when it was accepted as a graduate project under the Linux Foundation AI & Data Foundation, transitioning to a vendor-neutral governance model that fosters broader community participation.[17][18] This move built on its initial launch in 2017 and emphasized open collaboration among industry leaders. Earlier, the release of ONNX version 1.2 in September 2018 introduced essential control flow operators, such as Loop and If, enabling support for more complex model structures beyond static computations.[19]

ONNX employs semantic versioning for its releases, with the Intermediate Representation (IR) version and operator sets (opsets) evolving independently to maintain backward compatibility; for instance, opset 18, released in December 2022 as part of ONNX 1.13, enhanced quantization capabilities through improved support for low-precision data types like INT8.[20] In March 2024, ONNX 1.16.0 was released, introducing enhanced machine learning-specific operators, including support for UINT4 and INT4 data types to facilitate efficient quantization, alongside refinements to function and node prototypes for better overload handling.[21] This version also bolstered compatibility with diverse hardware, including improved execution on ARM architectures via associated runtimes.[21] More recently, as of October 2025, ONNX 1.19.0 was released, adding support for advanced features such as new operators for generative AI models and further quantization improvements.[22]

Since 2019, governance of ONNX has been overseen by the Technical Steering Committee (TSC) within the LF AI & Data Foundation, comprising elected representatives from contributing organizations who guide technical direction, release planning, and community standards.[18][23] The TSC includes members from over 20 companies, such as NVIDIA, Intel, IBM, Amazon Web Services (AWS), and Microsoft, ensuring diverse input on specifications and extensions.[23] Elections for TSC seats occur annually, with the current term (September 2025–May 2026) featuring experts like Alexandre Eichenberger from IBM and Mayank Kaushik from NVIDIA.[23][16]

ONNX has long included native support for traditional machine learning models via ONNX-ML extensions, allowing representation of non-deep learning algorithms like decision trees alongside neural networks.[2] ONNX Runtime provides extensions for on-device training, including support for federated learning scenarios that enable privacy-preserving model training across distributed edge devices without central data aggregation.[24] These developments underscore ONNX's evolution toward comprehensive AI interoperability.[1]

Technical Specifications
Model Representation
The Open Neural Network Exchange (ONNX) represents machine learning models as a directed acyclic graph (DAG), which captures the computational topology of the model through a sequence of operations without cycles. This graph-based structure allows for a portable and framework-agnostic description of model inference, where each node in the DAG corresponds to a specific operation, and edges represent data flow between them. The entire model is serialized into a compact binary file with the .onnx extension using Protocol Buffers, enabling efficient storage, transmission, and parsing across diverse environments.[25][7]

At the core of an ONNX model is the ModelProto structure, which encapsulates the graph along with essential metadata. The graph itself consists of nodes that define operations, input and output tensors that specify data shapes and types, initializers for constant tensors (such as weights and biases), and attributes that provide fixed parameters to operations. Metadata elements include the producer name and version (indicating the tool or framework that generated the model), the domain (e.g., 'ai.onnx' for the standard namespace), model version for tracking iterations, and a doc_string for human-readable descriptions. Tensors are multidimensional arrays supporting various data types, such as FLOAT and INT32, though detailed type specifications are covered elsewhere.[25][7]
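As an illustration of this structure, the following sketch loads a model and prints the main ModelProto and GraphProto fields described above; it assumes the onnx Python package is installed and that an arbitrary serialized file named model.onnx exists (the file name is illustrative).

```python
import onnx

# Load a serialized .onnx file into an in-memory ModelProto (path is illustrative).
model = onnx.load("model.onnx")

# Top-level metadata stored on the ModelProto.
print("IR version:   ", model.ir_version)
print("Producer:     ", model.producer_name, model.producer_version)
print("Model version:", model.model_version)
print("Opset imports:", [(o.domain or "ai.onnx", o.version) for o in model.opset_import])

# The embedded GraphProto: nodes, graph inputs/outputs, and constant initializers.
graph = model.graph
print("Nodes:        ", [n.op_type for n in graph.node])
print("Inputs:       ", [i.name for i in graph.input])
print("Outputs:      ", [o.name for o in graph.output])
print("Initializers: ", [t.name for t in graph.initializer])
```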
ONNX supports modular extensions through subgraphs and functions, enhancing reusability and complexity handling. Subgraphs appear within certain control-flow nodes, allowing nested computations for conditional or iterative logic, while functions define reusable compositions of operations that can be invoked like built-in nodes, promoting efficiency in large-scale models. To maintain interoperability, every ONNX model specifies an Intermediate Representation (IR) version—IR version 11 as of ONNX 1.17.0—which dictates the supported features and ensures compatibility across tools and runtimes.[25][7][26]
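For a concrete picture of how these pieces fit together, the following sketch builds a small two-node model programmatically with the onnx.helper utilities; the layer sizes, tensor names, output file name, and opset version are illustrative choices rather than part of the specification.

```python
import numpy as np
import onnx
from onnx import helper, numpy_helper, TensorProto

# Graph input and output: a 1x4 float tensor passed through MatMul followed by Relu.
X = helper.make_tensor_value_info("X", TensorProto.FLOAT, [1, 4])
Y = helper.make_tensor_value_info("Y", TensorProto.FLOAT, [1, 2])

# Constant weights stored as an initializer on the graph.
W = numpy_helper.from_array(np.random.rand(4, 2).astype(np.float32), name="W")

# Nodes are listed in topological order; edges are implied by matching tensor names.
matmul = helper.make_node("MatMul", ["X", "W"], ["H"])
relu = helper.make_node("Relu", ["H"], ["Y"])

graph = helper.make_graph([matmul, relu], "tiny_mlp", [X], [Y], initializer=[W])

# Wrap the graph in a ModelProto and pin the operator set version it relies on.
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 17)])
onnx.checker.check_model(model)
onnx.save(model, "tiny_mlp.onnx")
```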
Graph execution in ONNX can be conceptually expressed as:
\text{Output tensors} = f(\text{Input tensors}, \text{Weights})
where f represents the composition of node operations applied in topological order, transforming the input tensors and constant initializers (weights) into the output tensors.[25]
Operators and Data Types
ONNX defines a comprehensive set of standardized operators that form the building blocks for representing machine learning models as computational graphs. These operators are organized into versioned operator sets, known as opsets, which ensure interoperability across frameworks and runtimes. Each opset represents a collection of immutable operator specifications within a specific domain, with the default domain being "ai.onnx" for core operators. As of 2025, the latest opset version for the ai.onnx domain is 25, encompassing over 170 operators, including foundational ones introduced in earlier versions and new additions like DeformConv in opset 19.[27][20]

Operators are categorized based on their functionality to support diverse machine learning tasks. Core machine learning operators handle fundamental computations, such as MatMul for matrix multiplication, Conv for convolution, and Relu for rectified linear unit activation. Vision-specific operators include Resize for image scaling and MaxPool for pooling operations. Sequence processing operators support recurrent structures, exemplified by GRU for gated recurrent units and LSTM for long short-term memory cells. These categories enable the expression of complex models, from convolutional neural networks to recurrent architectures, while maintaining a unified vocabulary.[27]

Data types in ONNX are primarily tensor-based, allowing models to specify the shape and element types of inputs, outputs, and intermediate values within the graph. Supported tensor element types include float (32-bit), float16, int8, int16, int32, int64, uint8, uint16, uint32, uint64, bool, and string. Recent versions have added support for bfloat16 to optimize for hardware acceleration in deep learning, complex types for advanced signal processing applications, and 8-bit floating-point formats such as FLOAT8E4M3FN and FLOAT8E5M2 for quantization. Tensors can also incorporate sparse representations and optional types that may hold null values, enhancing flexibility for optional inputs or dynamic graphs.[28]

To accommodate specialized models beyond core deep learning, ONNX includes domain-specific operators, such as those in the "ai.onnx.ml" domain for traditional machine learning algorithms from libraries like scikit-learn. Examples include LinearRegressor for linear regression and TreeEnsemble for tree-based ensembles. Backward compatibility is maintained through opset versioning: models specify the required opset version in their metadata, allowing runtimes to select appropriate operator implementations without breaking existing models; non-breaking changes (e.g., documentation updates) are handled within the same version, while breaking changes increment the version number.[27][20]

A representative example is the Add operator, which performs element-wise addition of two input tensors, supporting broadcasting to align shapes. For inputs A and B (tensors of compatible shapes), the output C is computed as:

C_{i,j} = A_{i,j} + B_{i,j}

where broadcasting rules apply if dimensions differ (e.g., a scalar added to a tensor expands the scalar across all elements). The operator has evolved across versions, with opset 14 extending the set of supported numeric types, ensuring robust arithmetic operations in models.
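The broadcasting behavior of Add can be seen end to end in the following sketch, which builds a one-node model with onnx.helper and executes it with ONNX Runtime (assumed installed as onnxruntime); the tensor shapes, opset version, and graph name are illustrative.

```python
import numpy as np
import onnx
from onnx import helper, numpy_helper, TensorProto
import onnxruntime as ort

# Single Add node; B is a scalar initializer broadcast across every element of A.
A = helper.make_tensor_value_info("A", TensorProto.FLOAT, [2, 3])
C = helper.make_tensor_value_info("C", TensorProto.FLOAT, [2, 3])
B = numpy_helper.from_array(np.array(1.5, dtype=np.float32), name="B")

add = helper.make_node("Add", ["A", "B"], ["C"])
graph = helper.make_graph([add], "broadcast_add", [A], [C], initializer=[B])
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 14)])
onnx.checker.check_model(model)

# Execute with ONNX Runtime: the scalar B is expanded across all elements of A.
sess = ort.InferenceSession(model.SerializeToString(), providers=["CPUExecutionProvider"])
a = np.arange(6, dtype=np.float32).reshape(2, 3)
(c,) = sess.run(None, {"A": a})
print(c)  # equals a + 1.5
```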
Interoperability
Framework Support
ONNX provides native export capabilities in several major machine learning frameworks, enabling seamless conversion of trained models to the ONNX format for interoperability. PyTorch includes built-in support through the torch.onnx.export function, which has been available since version 1.2, allowing users to export computational graphs directly from torch.nn.Module instances. TensorFlow relies on the tf2onnx tool for exporting models, including those built with Keras or TensorFlow Lite, supporting opsets up to 18 for compatibility with various inference engines.[29] Similarly, scikit-learn models can be exported using the skl2onnx library, which converts pipelines and estimators into ONNX graphs while preserving sklearn's feature engineering components.[30]
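As an example of such an exporter, the following sketch converts a scikit-learn estimator with skl2onnx; the logistic-regression model, the iris dataset, and the output file name are illustrative assumptions, not requirements of the library.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# Train a small scikit-learn estimator (illustrative model and data).
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=200).fit(X, y)

# Declare the input signature: a float tensor with a dynamic batch dimension.
initial_types = [("float_input", FloatTensorType([None, 4]))]
onnx_model = convert_sklearn(clf, initial_types=initial_types)

# Serialize the resulting ModelProto to disk.
with open("logreg_iris.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
```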
Import support for ONNX models is also widespread, facilitating the loading and execution of ONNX files within diverse frameworks. Keras integration is handled through tf2onnx (the earlier keras-onnx converter is deprecated), and Keras-derived models are most commonly loaded and executed through ONNX Runtime.[29] Apache MXNet, retired as an Apache project in 2023, previously provided native import APIs in its contrib.onnx module for converting ONNX models to MXNet symbols and parameters.[31] PaddlePaddle supports ONNX import through its high-performance inference plugins, allowing deployment of ONNX models alongside native Paddle formats in production pipelines.[32]
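Once a model is in ONNX form, loading and executing it typically looks like the following sketch using ONNX Runtime; the file name and input shape carry over from the previous example and are assumptions, not fixed conventions.

```python
import numpy as np
import onnxruntime as ort

# Load an exported model and run inference on CPU (file name is illustrative).
session = ort.InferenceSession("logreg_iris.onnx", providers=["CPUExecutionProvider"])

input_name = session.get_inputs()[0].name
sample = np.array([[5.1, 3.5, 1.4, 0.2]], dtype=np.float32)

# For skl2onnx classifiers, the first output returned by run() is the predicted label.
outputs = session.run(None, {input_name: sample})
print(outputs[0])
```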
As of 2025, over 15 frameworks offer official ONNX export functionality, reflecting the format's broad adoption for cross-ecosystem workflows; notable examples include Hugging Face Transformers via the Optimum library, which streamlines export of NLP models like BERT for optimized inference.[33] This extensive compatibility supports bidirectional operations, such as initial training in PyTorch followed by fine-tuning in TensorFlow through ONNX round-trip conversion, minimizing data loss and enabling hybrid development pipelines.[34] Conversion tools, detailed separately, underpin these integrations by handling graph translations between frameworks.[35]
Conversion Mechanisms
Conversion to the Open Neural Network Exchange (ONNX) format involves framework-specific exporters that capture a model's computation graph and serialize it into a standardized Protocol Buffers (protobuf) representation.[35] This process ensures the model can be imported into various runtimes while maintaining compatibility with specified operator sets (opsets). For instance, in PyTorch, the torch.onnx.export function converts a torch.nn.Module to ONNX by providing example inputs and specifying parameters like input/output names and the target opset version.
Exporters handle variable input shapes through mechanisms such as dynamic axes in PyTorch, where a dictionary maps input names to dynamic dimensions (e.g., dynamic_axes={"input": {0: "batch_size"}}), allowing flexibility for batch sizes or sequence lengths without fixed dimensions. Similar exporters exist for other frameworks like TensorFlow via tensorflow-onnx, which rewrites model components using ONNX operators during conversion.[29] Once exported, the model is serialized to a .onnx file in protobuf format, with opset compatibility verified to match the target runtime's supported versions.[35]
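A minimal sketch of this export path in PyTorch, with named inputs and outputs, a target opset, and a dynamic batch dimension, is shown below; the toy module, file name, and opset version are illustrative.

```python
import torch
import torch.nn as nn

# A toy module standing in for a trained model.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return torch.relu(self.fc(x))

model = TinyNet().eval()
example_input = torch.randn(1, 4)

# Export with named I/O, a target opset, and a dynamic batch dimension.
torch.onnx.export(
    model,
    example_input,
    "tiny_net.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=17,
    dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}},
)
```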
Post-conversion, importers load the ONNX protobuf into memory for further processing or execution, often accompanied by validation to ensure integrity. The ONNX checker tool, via onnx.checker.check_model, verifies the model's legality, checking for issues like duplicate metadata or opset imports, and can optionally perform full checks including shape inference.[36] This validation is crucial after import to detect inconsistencies arising from framework differences, such as type mismatches (e.g., float32 vs. float64).[35]
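A typical post-export validation step, assuming the file produced in the previous sketch, might look like this:

```python
import onnx

# Load the exported file and validate it; full_check also runs shape inference.
model = onnx.load("tiny_net.onnx")
onnx.checker.check_model(model, full_check=True)
print("Model is well formed; IR version:", model.ir_version)
```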
Discrepancies in dynamic dimensions are addressed using shape inference, implemented through onnx.shape_inference.infer_shapes(), which propagates known shapes across the graph and adds inferred dimensions to the model's value_info field.[37] This resolves partial or symbolic shapes post-export, ensuring the model is executable without runtime errors from undefined tensors.[37] For custom operators not in the standard ONNX domain, extensions map them via domain registration, where a custom domain (e.g., "com.example") is defined in the model's opset_import to isolate proprietary ops from core ones.[38]
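The following sketch applies shape inference and registers a custom operator domain on a loaded model; the file name and the "com.example" domain are hypothetical placeholders.

```python
import onnx
from onnx import helper, shape_inference

model = onnx.load("tiny_net.onnx")

# Propagate known shapes through the graph; inferred shapes land in graph.value_info.
inferred = shape_inference.infer_shapes(model)
for vi in inferred.graph.value_info:
    dims = [d.dim_param or d.dim_value for d in vi.type.tensor_type.shape.dim]
    print(vi.name, dims)

# Declare a custom operator domain alongside the default one (illustrative name).
custom_opset = helper.make_opsetid("com.example", 1)
inferred.opset_import.append(custom_opset)
```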
In PyTorch, custom operators can be handled during export by supplying a custom_translation_table dictionary that maps them onto supported ONNX primitives. Recent ONNX releases, including v1.17.0, further improve this path with support for bfloat16 data types in additional operators and other improvements for handling complex graphs.[26]
The overall conversion pipeline thus proceeds as follows: start with the source model in a supported framework, export it via the appropriate API to generate the ONNX graph, serialize to protobuf while specifying the opset, and validate using the checker with shape inference to confirm fidelity.[35][36] This structured approach minimizes fidelity loss, though challenges like unsupported custom ops may require manual decomposition or domain extensions.