
OpenVINO

OpenVINO is an open-source toolkit developed by Intel for optimizing and deploying deep learning models to enable high-performance AI inference across diverse environments, including edge devices, on-premises servers, and cloud infrastructure. Originally released in 2018, OpenVINO—short for Open Visual Inference and Neural Network Optimization—builds on Intel's earlier computer vision technologies to accelerate AI applications in domains such as computer vision, natural language processing (NLP), and automatic speech recognition. It allows developers to convert, optimize, and run models from popular frameworks like PyTorch, TensorFlow, ONNX, and PaddlePaddle on a variety of hardware targets, prioritizing Intel processors but extending to ARM/ARM64 and other architectures. Key components include the OpenVINO Runtime, a core inference library that supports Python, C++, and C APIs for cross-platform deployment on Linux, Windows, and macOS; model optimization tools for quantization, pruning, and weight compression to reduce latency and resource usage; and integration with complementary libraries and frameworks for enhanced workflows. The toolkit emphasizes performance on x86 and ARM CPUs (optimized for Intel processors), Intel integrated and discrete GPUs, and neural processing units (NPUs), such as those in Intel Core Ultra processors, delivering significant performance gains in real-time inference scenarios. As of the 2025.3 release in September 2025, OpenVINO continues to expand support for generative AI, large language models, and emerging Intel hardware, while maintaining broad framework compatibility and providing pre-trained models and performance data via the OpenVINO Model Hub. This evolution positions it as a versatile solution for scalable AI deployment, fostering innovation in edge AI, AI PCs, and hybrid cloud-edge systems.

Overview

Introduction

OpenVINO is an open-source toolkit developed by Intel for optimizing and deploying deep learning models, with a particular emphasis on accelerating AI inference on Intel hardware platforms. The toolkit enables developers to convert, optimize, and run models from various frameworks such as PyTorch, TensorFlow, and ONNX, facilitating high-performance inference across diverse environments. The acronym OpenVINO stands for Open Visual Inference and Neural Network Optimization. The primary goals of OpenVINO include reducing latency, increasing throughput, and preserving model accuracy, making it suitable for deployments ranging from edge devices to cloud servers. Initially focused on computer vision applications that emulate human vision capabilities, the toolkit has since expanded to support broader AI tasks, including support for generative models on platforms like Intel CPUs, GPUs, and NPUs. OpenVINO is released under the Apache 2.0 license as an open-source project, hosted on GitHub, allowing for community contributions and widespread adoption.

History

OpenVINO was initially released by Intel on May 16, 2018, as an open-source toolkit primarily designed for optimizing and deploying inference applications on Intel hardware. During its early versions from 2018 to 2020, OpenVINO emphasized the Model Optimizer and Inference Engine components, which facilitated the conversion of deep learning models from frameworks such as TensorFlow and Caffe into an Intermediate Representation (IR) format suitable for efficient inference on Intel CPUs, GPUs, and VPUs. In 2021, OpenVINO integrated with Intel's oneAPI ecosystem to expand hardware support and streamline development workflows, with version 2021.2, released in December 2020, introducing enhancements for broader framework coverage and performance optimizations. The 2023.0 release, launched on May 30, 2023, coincided with OpenVINO's five-year anniversary and brought significant improvements, including an enhanced API for easier model handling and better support for ONNX models through direct loading capabilities without mandatory offline conversion. In 2025, OpenVINO continued its evolution with version 2025.1, released on April 10, 2025, which added support for vision-language models such as Jina CLIP v1 and introduced NPU acceleration for text generation to enable efficient deployment on AI PCs. Version 2025.3, released on September 3, 2025, further advanced generative capabilities with broader model support, new framework integrations requiring minimal code changes, and GPU performance optimizations. A key shift in 2025 involved the removal of legacy tools such as the Model Optimizer, with all functionality unified under the core OpenVINO API to simplify the workflow and reduce dependencies. Since its open-sourcing under the Apache 2.0 license, OpenVINO has benefited from community contributions through its GitHub repository, where developers have submitted enhancements, bug fixes, and extensions for diverse applications.

Technical Architecture

Core Components

OpenVINO's core components form the foundational software elements enabling efficient inference deployment. At the heart is the OpenVINO Runtime, a C++-based core library designed for executing inference across diverse hardware platforms. It provides device-agnostic model loading and execution capabilities, allowing developers to deploy models without hardware-specific modifications. The runtime includes bindings for Python, C++, and C APIs, supporting operating systems such as Linux, Windows, and macOS, which facilitates flexible integration into various application environments. Complementing the runtime are specialized tools for performance evaluation and validation. The Benchmark App is a utility for measuring inference performance of models on target hardware, supporting both synchronous and asynchronous execution modes to estimate throughput and latency under realistic conditions. It processes user-provided inputs to generate metrics like frames per second, aiding in hardware selection and optimization planning. The Post-Training Optimization Tool (POT) historically enabled quantization and other compression techniques without model retraining, focusing on integer quantization to reduce model size and inference time while preserving accuracy. However, POT has been discontinued since OpenVINO 2024.0, with its functionality superseded by the Neural Network Compression Framework (NNCF) for advanced post-training optimizations, including accuracy-aware quantization. OpenVINO integrates with Intel's oneAPI ecosystem through libraries like oneDNN (Intel oneAPI Deep Neural Network Library), which optimizes primitives for CPU and GPU execution, ensuring unified programming across heterogeneous Intel hardware. This integration promotes portability and performance consistency within the broader oneAPI framework for AI development. Among deprecated components, the Model Optimizer—previously used for converting models to Intermediate Representation (IR)—has been fully phased out in 2025 releases. It is replaced by direct IR conversion support via the OpenVINO model conversion API (openvino.convert_model and the ovc command-line tool), streamlining model preparation without legacy dependencies.
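
A minimal sketch of the runtime's device-agnostic Python API illustrates this design; the file name model.xml is a placeholder, and the devices printed depend entirely on the host system:
python
import openvino as ov

# Create the runtime core, which discovers the available device plugins
core = ov.Core()
print(core.available_devices)  # e.g. ['CPU', 'GPU', 'NPU'], depending on the system

# Read a model in IR (or ONNX, etc.) format; the graph itself is device-agnostic
model = core.read_model("model.xml")

# Compile the same model for each available target without modifying it
for device in core.available_devices:
    compiled = core.compile_model(model, device)
    print(device, "inputs:", [inp.get_any_name() for inp in compiled.inputs])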

Model Representation

OpenVINO employs the Intermediate Representation (IR) as its core model format, designed specifically for efficient inference on Intel hardware. The IR comprises two files: an XML file (.xml) that encodes the model's topology, including layers, inputs, outputs, and operations; and a binary file (.bin) that stores the trained weights and biases. This structure optimizes the model for deployment by abstracting away framework-specific details from earlier formats like those of Caffe or TensorFlow, facilitating portability across diverse environments. OpenVINO supports direct import of models from several popular frameworks, including ONNX, TensorFlow, TensorFlow Lite, PyTorch (through direct conversion or export to ONNX), and PaddlePaddle, in addition to its native IR format. Models in these input formats can be loaded into OpenVINO without manual preprocessing in many cases, as the runtime handles the ingestion process. For PyTorch models, conversion to IR has been supported natively since the 2023.0 release, streamlining the transition from training to inference. In OpenVINO versions from 2025 onward, model conversion and initial optimization occur automatically during runtime loading for supported formats such as TensorFlow, ONNX, TensorFlow Lite, and PaddlePaddle, obviating the need for a distinct offline optimization step. This on-the-fly process converts the input model to IR internally, applying basic optimizations such as constant folding and operation fusion to prepare it for execution. Users can explicitly convert models to IR using the openvino.convert_model function if desired, for scenarios requiring custom optimizations or repeated use. To accommodate custom layers and operations not natively supported, OpenVINO offers an extensibility mechanism through its Extension API, which allows developers to register and implement custom operations at runtime. This enables the integration of framework-specific or proprietary ops by providing C++ or Python implementations that plug into the inference pipeline, ensuring model compatibility without altering the core toolkit. The IR format evolves with OpenVINO releases to enhance compatibility and features; for instance, IR version 11, introduced alongside API 2.0, improved support for ONNX operations and dynamic shapes, aligning more closely with modern model requirements. Subsequent versions continue this progression, incorporating updates for new operation sets and inference optimizations.
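
A minimal conversion sketch, assuming an ONNX file named model.onnx is available locally, shows the explicit offline path from a framework model to the two-file IR:
python
import openvino as ov

# Convert a framework model (ONNX here; PyTorch, TensorFlow, etc. are also accepted)
ov_model = ov.convert_model("model.onnx")

# Serialize to IR: produces model.xml (topology) and model.bin (weights,
# compressed to FP16 by default)
ov.save_model(ov_model, "model.xml")

# The saved IR can later be loaded directly by the runtime
core = ov.Core()
loaded = core.read_model("model.xml")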

Development and Optimization

Workflow

The OpenVINO workflow enables developers to efficiently prepare, optimize, and deploy inference pipelines by providing a streamlined API for model handling and execution across diverse hardware targets. This end-to-end process begins with importing pre-trained models and culminates in post-processing outputs, allowing seamless integration into applications without extensive framework-specific modifications. The first step involves model import, where pre-trained models from popular frameworks such as PyTorch, TensorFlow, or ONNX are loaded directly into the OpenVINO runtime using the ov.Core().read_model() method, which supports formats like OpenVINO IR, ONNX, TensorFlow, and PaddlePaddle without requiring manual conversion in many cases. This import creates an ov.Model object representing the computational graph, ready for further processing.

Next, configuration occurs, where developers specify the target device (e.g., CPU, GPU, or NPU), input shapes, and precision levels, such as converting from FP32 to INT8 to balance performance and accuracy. Input shapes can be fixed or dynamic to accommodate variable data sizes, using methods like reshape() to adapt the model dynamically. Precision configuration is set via runtime properties or NNCF parameters to enable quantization-aware adjustments. Compilation follows, where the runtime compiles the model for the specified hardware using core.compile_model(), applying built-in optimizations tailored to the device for improved latency and throughput. This step generates a CompiledModel object optimized for execution, incorporating techniques like those detailed in the optimization methods section.

Inference execution then runs predictions on the compiled model, supporting both synchronous and asynchronous modes via CompiledModel.infer_new_request() or create_infer_request(). In synchronous mode, infer() blocks until results are available, while asynchronous mode uses start_async() and wait() for non-blocking operation, enabling overlap with data preparation. Post-processing handles the raw outputs from inference, such as applying non-maximum suppression (NMS) to filter bounding boxes in object detection tasks or decoding logits into class labels. For detection models like YOLO, this involves extracting bounding-box coordinates and confidence scores and drawing visualizations on input images. A basic inference loop exemplifies this workflow:
python
import openvino as ov
import numpy as np

# Step 1: Model import
core = ov.Core()
model = core.read_model("model.xml")  # Or path to ONNX, etc.

# Step 2: Configuration (e.g., set dynamic shape if needed)
# model.reshape({0: [1, 3, 224, 224]})  # Example for fixed shape

# Step 3: Compilation
compiled_model = core.compile_model(model, "CPU")  # Specify device

# Step 4: Inference execution (synchronous example)
input_data = np.random.uniform(-1, 1, (1, 3, 224, 224)).astype(np.float32)
result = compiled_model([input_data])[compiled_model.output(0)]

# Step 5: Post-processing (task-specific, e.g., argmax for classification)
predictions = np.argmax(result, axis=1)
print(predictions)
Best practices include using asynchronous execution to maximize throughput by pipelining inference with input preprocessing and output handling, particularly in real-time applications. Additionally, leveraging dynamic shapes supports variable input sizes, such as varying batch dimensions or image resolutions, without recompilation overhead.
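
A short asynchronous variant of the loop above, sketched with the same hypothetical model.xml file, overlaps request submission with host-side work:
python
import numpy as np
import openvino as ov

core = ov.Core()
compiled_model = core.compile_model(core.read_model("model.xml"), "CPU")

# AsyncInferQueue manages a pool of inference requests and invokes a callback per result
queue = ov.AsyncInferQueue(compiled_model, 4)
results = []
queue.set_callback(lambda request, userdata: results.append(
    (userdata, request.get_output_tensor(0).data.copy())))

for i in range(8):
    frame = np.random.rand(1, 3, 224, 224).astype(np.float32)  # stand-in for a preprocessed input
    queue.start_async({0: frame}, userdata=i)  # returns immediately; next frame can be prepared meanwhile

queue.wait_all()  # block until every submitted request has completed
print(len(results), "results collected")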

Optimization Methods

OpenVINO employs a range of algorithmic techniques to enhance model efficiency, primarily through the Neural Network Compression Framework (NNCF), which integrates compression algorithms like quantization and pruning to reduce model size and accelerate inference while preserving accuracy. These methods target both model-level and runtime-level improvements, enabling deployment on resource-constrained devices. Quantization in OpenVINO reduces precision from floating-point to lower-precision integer representations, notably supporting post-training quantization (PTQ) to INT8, which converts weights and activations without retraining, typically shrinking model size by approximately 4x and yielding speedups of 2-4x on CPU with minimal accuracy degradation. For scenarios requiring higher fidelity, quantization-aware training (QAT) simulates low-precision operations during training to mitigate accuracy loss, often restoring performance close to the original floating-point model. NNCF facilitates custom quantization by allowing users to provide representative calibration datasets, ensuring robust parameter estimation for activations and weights. As of the 2025.3 release, NNCF supports advanced low-bit techniques, including INT4 data-aware weight compression and NF4 and FP8 data types for ONNX models, further reducing the footprint of generative workloads. Pruning techniques eliminate redundant parameters, with NNCF offering filter pruning that removes unimportant convolutional filters, reducing computational complexity and model footprint while maintaining quality through magnitude-based or gradient-driven criteria. Sparsity induction further sparsifies weights, particularly in Transformer models, where structured sparsity patterns leverage hardware vectorization for up to 2x throughput gains on 4th Gen Intel Xeon Scalable processors without significant accuracy penalties.

Graph optimizations occur during Intermediate Representation (IR) compilation, applying transformations such as constant folding to precompute and replace constant subgraphs with their evaluated values, thereby simplifying the computation graph and reducing runtime overhead. Dead-code elimination removes unused nodes and operations, streamlining the model for faster execution, while layer fusion merges compatible operations—like convolutions with activations—into single kernels to minimize memory access and boost efficiency on supported hardware. To tune latency and throughput, OpenVINO supports dynamic batching, which aggregates variable-sized inputs into batches at runtime to maximize device utilization, potentially increasing throughput by saturating compute resources. Pipeline parallelism divides model layers across execution stages, enabling concurrent processing of sequential operations and reducing end-to-end latency, especially beneficial in high-throughput scenarios. Success is measured via metrics like frames per second (FPS) for throughput and milliseconds (ms) for latency, with quantized models often demonstrating 2-4x improvements on CPU compared to full-precision baselines.

For generative AI workloads, OpenVINO includes specialized optimizations such as token eviction in the KV cache, which limits cache size to manage memory for long-context large language models (LLMs) by selectively discarding older tokens, preventing out-of-memory issues during extended sequences. KV cache optimizations further compress and reuse key-value pairs across generations, enhancing throughput for autoregressive decoding in LLMs by reducing redundant computations. In the 2025.3 release, Sage Attention support was added for CPU inference, providing performance boosts for first-token latency in LLMs via the ENABLE_SAGE_ATTN property.
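
A minimal post-training quantization sketch with NNCF, assuming an already-trained FP32 model saved as model.xml and a small calibration set of NumPy arrays, illustrates the INT8 flow described above:
python
import numpy as np
import openvino as ov
import nncf

core = ov.Core()
model = core.read_model("model.xml")  # placeholder FP32 model

# A few hundred representative samples are typically enough for calibration
calibration_items = [np.random.rand(1, 3, 224, 224).astype(np.float32) for _ in range(300)]
calibration_dataset = nncf.Dataset(calibration_items)

# Post-training quantization to INT8; weights and activations are quantized without retraining
quantized_model = nncf.quantize(model, calibration_dataset)

# Save the compressed model as IR for deployment
ov.save_model(quantized_model, "model_int8.xml")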

Platform Compatibility

Operating Systems

OpenVINO provides primary support for several major operating systems, enabling deployment on diverse computing environments. On Linux, it officially supports Ubuntu 22.04 LTS and later (including full support for Ubuntu 24.04 LTS), as well as Red Hat Enterprise Linux 8 and later, all in 64-bit architectures. Windows 10 and 11 (64-bit x86_64) are fully supported, while macOS 12 and later is supported on both Intel and Apple Silicon processors. Installation options vary by platform to facilitate ease of setup. Python users can install OpenVINO via pip across Linux, Windows, and macOS, providing access to the runtime and development tools. For Linux, Ubuntu and Debian users have access to APT repositories for package management, while Windows installations can use downloadable archives or installers for straightforward deployment. Cross-compilation capabilities extend support to Arm-based distributions, including embedded systems like the Raspberry Pi, allowing inference on resource-constrained devices through custom builds. Version-specific compatibility requirements are documented for each release to ensure reliable operation. Docker containers are available for all supported platforms, offering a consistent, isolated environment that simplifies dependency management and reproducibility across development and production setups. Recent 2025 updates have improved macOS support, including better performance on Apple Silicon. Despite broad desktop and server coverage, OpenVINO lacks native support for mobile operating systems such as Android or iOS due to hardware and ecosystem constraints. For mobile inference, developers can export models to ONNX format and utilize ONNX Runtime, which incorporates an OpenVINO execution provider to run optimized inference on compatible backends.

Hardware Acceleration

OpenVINO optimizes inference across a range of hardware targets, leveraging specialized accelerators to enhance performance while maintaining compatibility with standard processing units. The toolkit integrates with Intel's oneAPI ecosystem to exploit architecture-specific features, enabling efficient execution on diverse devices from data centers to edge systems. For central processing units (CPUs), OpenVINO supports Intel x86 architectures, including Core and Xeon processors, where it utilizes multi-threading for parallel execution and SIMD instruction sets such as AVX-512 for accelerated computations on supported hardware. Additionally, through integration with the oneAPI Deep Neural Network Library (oneDNN), OpenVINO extends compatibility to ARM 64-bit architectures, incorporating optimized kernels for broader model support on Arm-based systems. Graphics processing units (GPUs) in OpenVINO encompass both integrated and discrete variants, such as Intel Iris Xe integrated graphics and the Arc series discrete GPUs. These are accelerated via OpenCL and the oneAPI Level Zero interface, allowing for high-throughput execution of deep learning layers on both integrated and discrete parts. Neural processing units (NPUs) provide dedicated acceleration in OpenVINO, targeting low-power inference scenarios on Intel Core Ultra processors starting with Meteor Lake and extending to subsequent generations. These units offload compute-intensive tasks from the CPU and GPU, optimizing for power efficiency in always-on applications like real-time inference on laptops and edge devices. In 2025, OpenVINO added full support for Lunar Lake NPUs, enhancing capabilities for generative workloads through updated drivers and runtime optimizations. Vision processing units (VPUs), such as the Movidius Myriad X, were historically supported in earlier OpenVINO versions for edge devices like smart cameras, connected via USB or PCIe interfaces to enable compact, low-latency inference at the network periphery. However, as of OpenVINO 2023.0 and later releases, including 2025, dedicated VPU support has been discontinued, with models redirected to CPU or GPU execution. Field-programmable gate arrays (FPGAs), particularly Intel Agilex devices, are targeted through the OpenVINO FPGA Runtime Plugin within the FPGA AI Suite, allowing customizable hardware acceleration for high-performance inference in data center and embedded environments. This plugin facilitates model deployment on reconfigurable logic, optimizing for specific topologies via compiled bitstreams. Performance benefits vary by device and model architecture, with NPUs delivering significant speedups over CPUs for INT8 quantized models—typically 3 to 5 times faster in common tasks—due to their specialized matrix-multiplication units and reduced power consumption. For example, OpenVINO benchmarks on NPUs show improved throughput for quantized models compared to CPU-only execution. Device selection is managed programmatically, with the target passed to ov.Core().compile_model() and device properties configured via ov.Core().set_property(), supporting multi-device execution modes like AUTO for automatic optimal assignment or HETERO for explicit heterogeneous configurations across CPU, GPU, and NPU.
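
A brief sketch of programmatic device selection, assuming an IR file model.xml and using string-based properties, shows the AUTO mode and an explicit throughput-oriented hint:
python
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")

# Let OpenVINO pick the best available device (e.g. NPU or GPU if present, CPU otherwise)
auto_compiled = core.compile_model(model, "AUTO")

# Or target a specific device and request a throughput-oriented configuration
core.set_property("CPU", {"PERFORMANCE_HINT": "THROUGHPUT"})
cpu_compiled = core.compile_model(model, "CPU")

print("Devices visible to the runtime:", core.available_devices)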

Applications

Computer Vision

OpenVINO facilitates a range of traditional computer vision tasks, including object detection, pose estimation, and semantic segmentation, by optimizing models for efficient inference on Intel hardware. For object detection, it supports models such as YOLO variants (e.g., YOLOv8, YOLOv11) and SSD-based architectures like MobileNet-SSD and SSD300, which enable real-time identification and localization of objects in images or video streams. Pose estimation is handled by specialized models like human-pose-estimation-0001 and human-pose-estimation-3d-0001, which detect human keypoints for applications requiring body posture analysis. Semantic segmentation models, such as road-segmentation-adas-0001 and variants like unet-camvid-onnx-0001, assign class labels to every pixel in an image, supporting tasks like scene understanding in urban environments. The OpenVINO Model Zoo provides a repository of pre-trained models specifically optimized for Intel processors, GPUs, and VPUs, allowing developers to deploy these without extensive reconfiguration. These models are converted to the OpenVINO Intermediate Representation (IR) format and quantized (e.g., to INT8) to reduce latency and memory usage while preserving accuracy, particularly on Intel CPUs and integrated graphics. For instance, ResNet-50, a foundational classification model often used in vision pipelines, achieves over 30 FPS on standard Intel CPUs in latency mode, demonstrating the toolkit's efficiency for real-time applications. In edge deployments, OpenVINO excels in video analytics on resource-constrained devices like smart cameras equipped with Movidius Myriad X VPUs, enabling low-power inference for continuous processing. These VPUs handle tasks such as object tracking in surveillance feeds at the edge, minimizing data transmission to the cloud. Practical examples include facial recognition systems, where models like face-detection-adas-0001 process video to identify individuals securely, and autonomous driving pipelines that integrate detection and segmentation for obstacle avoidance and navigation. OpenVINO integrates seamlessly with OpenCV for preprocessing (e.g., image resizing, normalization) and postprocessing (e.g., non-maximum suppression for bounding boxes), streamlining end-to-end workflows. This combination allows developers to leverage OpenCV's robust image handling alongside OpenVINO's optimized inference, as seen in demos for multi-model pipelines.
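
A simple classification sketch combining OpenCV preprocessing with OpenVINO inference; the file names image.jpg and model.xml are placeholders, and the 224x224 RGB NCHW layout is an assumption about the model:
python
import cv2
import numpy as np
import openvino as ov

core = ov.Core()
compiled = core.compile_model(core.read_model("model.xml"), "CPU")

# OpenCV handles decoding and resizing; the array is converted to NCHW for the network
image = cv2.imread("image.jpg")                      # BGR, HWC
image = cv2.resize(image, (224, 224))
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
blob = image.transpose(2, 0, 1)[np.newaxis].astype(np.float32) / 255.0

# Inference followed by a minimal post-processing step (argmax over class logits)
logits = compiled([blob])[compiled.output(0)]
print("Predicted class index:", int(np.argmax(logits)))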

Generative AI

OpenVINO supports a range of generative AI models, enabling efficient inference for tasks such as text generation and image synthesis on Intel hardware. Key supported models include Stable Diffusion for high-quality image generation from text prompts and large language models (LLMs) such as Llama, integrated seamlessly through the Hugging Face ecosystem via the Optimum Intel library. To accelerate generative workflows, OpenVINO incorporates optimizations like key-value (KV) cache management, which stores intermediate computations to reduce redundant calculations during autoregressive generation, and speculative decoding, which uses a smaller draft model to predict tokens ahead of verification by the main model, thereby speeding up token generation. These techniques can achieve up to 2.5x improvements in token throughput for models like Llama-2-7B on Intel processors. In 2025 releases, OpenVINO introduced enhanced support for vision-language models (VLMs), enabling multimodal tasks that combine visual and textual inputs; image-to-image pipelines for editing and style transfer using diffusion models; and token eviction mechanisms in the KV cache to handle long sequences by dynamically managing memory for extended contexts. Deployments of generative AI with OpenVINO emphasize on-device execution on AI PCs equipped with Neural Processing Units (NPUs), such as Intel Core Ultra laptops, minimizing reliance on cloud resources for privacy-sensitive applications. For instance, text-to-image generation with Stable Diffusion runs efficiently on Intel Core Ultra hardware, while LLM-based chatbots benefit from NPU acceleration compared to CPU-only inference. These features address key challenges in deploying large generative models on edge hardware, particularly memory efficiency, by compressing weights, optimizing cache usage, and leveraging hardware-specific accelerations to fit billion-parameter models within limited resources without sacrificing output quality.
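
A minimal text-generation sketch with the OpenVINO GenAI package, assuming an LLM already exported to OpenVINO format in a local directory (the path llama_ov is a placeholder):
python
import openvino_genai as ov_genai

# The directory is expected to contain an OpenVINO IR model plus tokenizer files,
# for example produced beforehand with the Optimum Intel export tooling
pipe = ov_genai.LLMPipeline("llama_ov", "CPU")  # "GPU" or "NPU" can be used where available

# Autoregressive generation; KV caching is handled internally by the pipeline
print(pipe.generate("What is OpenVINO?", max_new_tokens=64))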

Automatic Speech Recognition

OpenVINO supports automatic speech recognition (ASR) tasks by optimizing models for efficient inference on Intel hardware, enabling real-time transcription and voice processing applications. Key models include Whisper and Distil-Whisper from Hugging Face, which perform speech-to-text conversion with high accuracy across multiple languages. These models are converted to OpenVINO IR format and quantized for reduced latency on CPUs, GPUs, and NPUs. In practical deployments, OpenVINO-powered ASR is used in edge devices for applications like voice assistants, live captioning, and meeting transcription, where low-latency processing is critical. For example, Whisper-large-v3, quantized to INT4, achieves efficient performance on Intel Core Ultra processors, supporting long audio sequences with minimal resource usage. Integration with audio preprocessing libraries enhances end-to-end workflows for continuous speech recognition.
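
A short transcription sketch using the OpenVINO GenAI Whisper pipeline, assuming a Whisper model exported to OpenVINO format in a local directory named whisper_ov and a 16 kHz mono recording:
python
import librosa
import openvino_genai as ov_genai

# Whisper expects 16 kHz mono audio as a sequence of float samples
raw_speech, _ = librosa.load("speech.wav", sr=16000)

pipe = ov_genai.WhisperPipeline("whisper_ov", "CPU")
result = pipe.generate(raw_speech.tolist())
print(result)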
