OpenVINO
OpenVINO is an open-source toolkit developed by Intel for optimizing and deploying deep learning models to enable high-performance AI inference across diverse environments, including edge devices, on-premises servers, and cloud infrastructure.[1][2]
Originally released in 2018, OpenVINO—short for Open Visual Inference and Neural Network Optimization—builds on Intel's earlier computer vision technologies to accelerate AI applications in domains such as computer vision, natural language processing (NLP), and automatic speech recognition.[3][1] It allows developers to convert, optimize, and run models from popular frameworks like PyTorch, TensorFlow, ONNX, and PaddlePaddle on a variety of hardware targets, prioritizing Intel processors but extending to ARM/ARM64 and other architectures.[4][1]
Key components include the OpenVINO Runtime, a lightweight inference engine that supports C++, Python, and C APIs for cross-platform deployment on Linux, Windows, and macOS; model optimization tools for quantization, pruning, and compression to reduce latency and resource usage; and integration with libraries like OpenCV for enhanced computer vision workflows.[2][1][5] The toolkit emphasizes hardware acceleration on x86 and ARM CPUs (optimized for Intel processors), Intel integrated and discrete GPUs, and neural processing units (NPUs), such as those in Intel Core Ultra processors, delivering significant performance gains in real-time inference scenarios.[4][6][5]
As of the 2025.3 release in September 2025, OpenVINO continues to expand support for generative AI, large language models, and emerging hardware like Intel Arc GPUs, while maintaining backward compatibility and providing pre-trained models via the OpenVINO Model Hub for rapid prototyping.[7][8] This evolution positions it as a versatile solution for scalable AI deployment, fostering innovation in edge AI, AI PCs, and hybrid cloud-edge systems.[9][1]
Overview
Introduction
OpenVINO is an open-source toolkit developed by Intel for optimizing and deploying deep learning models, with a particular emphasis on accelerating AI inference on Intel hardware platforms.[1] The toolkit enables developers to convert, optimize, and run models from various frameworks such as PyTorch, TensorFlow, and ONNX, facilitating high-performance inference across diverse environments.[4] The acronym OpenVINO stands for Open Visual Inference and Neural Network Optimization.[10]
The primary goals of OpenVINO include reducing inference latency, increasing throughput, and preserving model accuracy, making it suitable for deployments ranging from edge devices to cloud infrastructure.[11] Initially focused on computer vision applications that emulate human vision capabilities, the toolkit has since expanded to support broader AI inference tasks, including support for generative AI models on platforms like CPUs, GPUs, and NPUs.[12]
OpenVINO is released under the Apache 2.0 license as an open-source project, hosted on GitHub, allowing for community contributions and widespread adoption.[4]
History
OpenVINO was initially released by Intel on May 16, 2018, as an open-source toolkit primarily designed for optimizing and deploying computer vision inference applications on Intel hardware.[13]
During its early versions from 2018 to 2020, OpenVINO emphasized the Model Optimizer and Inference Engine components, which facilitated the conversion of deep learning models from frameworks such as TensorFlow and Caffe into an Intermediate Representation (IR) format suitable for efficient inference on Intel CPUs, GPUs, and VPUs.[14]
In 2021, OpenVINO was integrated into Intel's oneAPI ecosystem to expand hardware support and streamline development workflows; version 2021.2, released in December 2020, introduced enhancements for broader compatibility and performance optimizations.[15]
The 2023.0 release, launched on May 30, 2023, coincided with OpenVINO's five-year anniversary and brought significant improvements, including an enhanced Python API for easier model handling and better support for ONNX models through direct loading capabilities without mandatory offline conversion.[16]
In 2025, OpenVINO continued its evolution with version 2025.1 released on April 10, 2025, which added support for vision-language models such as Jina CLIP v1 and introduced NPU acceleration for text generation to enable efficient deployment on AI PCs.[17] Version 2025.3, released on September 3, 2025, further advanced generative AI capabilities with broader LLM model support, new framework integrations for minimal code changes, and GPU performance optimizations.[18]
A key shift in 2025 involved the deprecation of legacy tools such as the Model Optimizer, with all functionality unified under the OpenVINO Runtime to simplify the inference pipeline and reduce dependencies.[19]
Since its open-sourcing under the Apache 2.0 license, OpenVINO has benefited from community contributions through its GitHub repository, where developers have submitted enhancements, bug fixes, and extensions for diverse AI applications.[4]
Technical Architecture
Core Components
OpenVINO's core components form the foundational software elements enabling efficient AI inference deployment. At the heart is the OpenVINO Runtime, a C++-based core library designed for executing deep learning inference across diverse hardware platforms. It provides device-agnostic model loading and execution capabilities, allowing developers to deploy models without hardware-specific modifications. The runtime includes bindings for Python, C++, and C APIs, supporting operating systems such as Linux, Windows, and macOS, which facilitates flexible integration into various application environments.
Complementing the runtime are specialized tools for performance evaluation and validation. The Benchmark App is a utility for measuring inference performance of models on target hardware, supporting both synchronous and asynchronous execution modes to estimate throughput and latency under realistic conditions. It processes user-provided inputs to generate metrics like frames per second, aiding in hardware selection and optimization planning.
The Post-Training Optimization Tool (POT) historically enabled quantization and other compression techniques without model retraining, focusing on integer quantization to reduce model size and inference time while preserving accuracy. However, POT has been discontinued since OpenVINO 2024.0, with its functionality superseded by the Neural Network Compression Framework (NNCF) for advanced post-training optimizations, including accuracy-aware quantization.[20][21][22]
OpenVINO integrates with Intel's oneAPI ecosystem through libraries like oneDNN (Intel oneAPI Deep Neural Network Library), which optimizes deep learning primitives for CPU and GPU execution, ensuring unified programming across heterogeneous Intel hardware. This integration promotes portability and performance consistency within the broader oneAPI framework for AI development.[23]
Among deprecated components, the Model Optimizer—previously used for converting models to Intermediate Representation (IR)—has been fully phased out in 2025 releases. It is replaced by direct IR conversion support via the OpenVINO Converter API, streamlining model preparation without legacy dependencies.[19][21]
Model Representation
OpenVINO employs the Intermediate Representation (IR) as its core model format, designed specifically for efficient inference on Intel hardware. The IR comprises two files: an XML file (.xml) that encodes the model's topology, including layers, inputs, outputs, and operations; and a binary file (.bin) that stores the trained weights and biases. This structure optimizes the model for deployment by abstracting away framework-specific details of source formats such as Caffe or TensorFlow, facilitating portability across diverse inference environments.[24][25]
OpenVINO supports direct import of models from several popular frameworks, including ONNX, TensorFlow, TensorFlow Lite, PyTorch (through direct conversion or export to ONNX), and PaddlePaddle, in addition to its native IR format. Models in these input formats can be loaded into OpenVINO without manual preprocessing in many cases, as the runtime handles the ingestion process. For PyTorch models, conversion to IR has been supported natively since the 2023 release, streamlining the transition from training to inference.[26][27][28]
In OpenVINO versions from 2025 onward, model conversion and initial optimization occur automatically during runtime loading for supported formats such as TensorFlow, ONNX, TensorFlow Lite, and PaddlePaddle, obviating the need for a distinct offline optimization step. This on-the-fly process converts the input model to IR internally, applying basic optimizations like constant folding and dead code elimination to prepare it for execution. Users can explicitly convert models to IR using the openvino.convert_model API if desired, for scenarios requiring custom optimizations or repeated use.[26][29]
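As a minimal sketch of the explicit conversion path described above (file names such as model.onnx and model.xml are placeholders for illustration):

```python
import openvino as ov

# Convert a source framework model (here, a hypothetical ONNX file) to an ov.Model in memory
ov_model = ov.convert_model("model.onnx")

# Optionally serialize the converted model to IR (.xml + .bin) for reuse without reconversion
ov.save_model(ov_model, "model.xml")

# The converted model can then be compiled and executed by the OpenVINO Runtime
core = ov.Core()
compiled = core.compile_model(ov_model, "CPU")
```

Saving to IR is useful when the same model is deployed repeatedly, since the conversion step then only runs once.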
To accommodate custom layers and operations not natively supported, OpenVINO offers an extensibility mechanism through its Extension API, which allows developers to register and implement custom operations at runtime. This API enables the integration of framework-specific or proprietary ops by providing C++ or Python implementations that plug into the IR pipeline, ensuring model compatibility without altering the core toolkit.[30][31]
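A brief sketch of registering a prebuilt extension library with the runtime follows; the library path and model file name are hypothetical, and the extension itself would typically be implemented in C++ and compiled separately:

```python
import openvino as ov

core = ov.Core()

# Register a prebuilt custom-operation extension library (path is hypothetical);
# once loaded, models containing the custom op can be read and compiled as usual
core.add_extension("/opt/custom_ops/libcustom_relu_extension.so")

model = core.read_model("model_with_custom_op.xml")
compiled = core.compile_model(model, "CPU")
```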
The IR format evolves with OpenVINO releases to enhance compatibility and features; for instance, IR version 11, introduced in 2023 alongside API 2.0, improved support for ONNX operations and dynamic shapes, aligning more closely with modern model requirements. Subsequent versions continue this progression, incorporating updates for new operation sets and inference optimizations.[32][33]
Development and Optimization
Workflow
The OpenVINO workflow enables developers to efficiently prepare, optimize, and deploy AI inference pipelines by providing a streamlined API for model handling and execution across diverse hardware. This end-to-end process begins with importing pre-trained models and culminates in post-processing outputs, allowing seamless integration into applications without extensive framework-specific modifications.[34]
The first step involves model import, where pre-trained models from popular frameworks such as TensorFlow, PyTorch, or ONNX are loaded directly into the OpenVINO runtime using the ov.Core().read_model() method, which supports formats like IR, ONNX, and PaddlePaddle without requiring manual conversion in many cases.[26] This import creates an ov.Model object representing the computational graph, ready for further processing.
Next, configuration occurs, where developers specify the target device (e.g., CPU, GPU, or NPU), input shapes, and precision levels, such as converting from FP32 to INT8 to balance performance and accuracy.[35] Input shapes can be fixed or dynamic to accommodate variable data sizes, using methods like reshape() to adapt the model dynamically. Precision configuration is set via compilation parameters to enable quantization-aware adjustments.[36]
Compilation follows, where the runtime compiles the model for the specified hardware using core.compile_model(), applying built-in optimizations tailored to the device for improved latency and throughput. This step generates a CompiledModel object optimized for execution, incorporating techniques like those detailed in the optimization methods section.[36]
Inference execution then runs predictions on the compiled model, supporting both synchronous and asynchronous modes via CompiledModel.infer_new_request() or create_infer_request().[37] In synchronous mode, infer() blocks until results are available, while asynchronous mode uses start_async() and wait() for non-blocking operation, enabling overlap with data preparation.[37]
Post-processing handles the raw outputs from inference, such as applying non-maximum suppression (NMS) to filter bounding boxes in computer vision tasks or decoding logits into classifications. For object detection models like YOLO, this involves extracting bounding-box coordinates and confidence scores and drawing visualizations on input images.
A basic Python inference loop exemplifies this workflow:
```python
import openvino as ov
import numpy as np

# Step 1: Model import
core = ov.Core()
model = core.read_model("model.xml")  # Or path to ONNX, etc.

# Step 2: Configuration (e.g., set dynamic shape if needed)
# model.reshape({0: [1, 3, 224, 224]})  # Example for fixed shape

# Step 3: Compilation
compiled_model = core.compile_model(model, "CPU")  # Specify device

# Step 4: Inference execution (synchronous example)
input_data = np.random.uniform(-1, 1, (1, 3, 224, 224)).astype(np.float32)
result = compiled_model([input_data])[compiled_model.output(0)]

# Step 5: Post-processing (task-specific, e.g., argmax for classification)
predictions = np.argmax(result, axis=1)
print(predictions)
```
Best practices include using asynchronous execution to maximize throughput by pipelining inference with input preprocessing and output handling, particularly in real-time applications.[38] Additionally, leveraging dynamic shapes supports variable input sizes, such as batching or resizing images, to avoid recompilation overhead.
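A minimal asynchronous-inference sketch along these lines is shown below; the model path and input shape are placeholders, and a real pipeline would prepare the next input while the current request is in flight:

```python
import numpy as np
import openvino as ov

core = ov.Core()
compiled = core.compile_model(core.read_model("model.xml"), "CPU")
infer_request = compiled.create_infer_request()

frame = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder input

# Launch inference without blocking, so preprocessing of the next frame can overlap
infer_request.start_async({0: frame})
# ... prepare the next frame here ...
infer_request.wait()  # block only when the result is actually needed

output = infer_request.get_output_tensor(0).data
```

For higher request counts, the runtime also provides an AsyncInferQueue helper that manages a pool of such requests.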
Optimization Methods
OpenVINO employs a range of algorithmic techniques to enhance model efficiency, primarily through the Neural Network Compression Framework (NNCF), which integrates compression algorithms like quantization and pruning to reduce model size and accelerate inference while preserving accuracy.[36] These methods target both model-level and runtime-level improvements, enabling deployment on resource-constrained devices.
Quantization in OpenVINO reduces precision from floating-point to integer representations, notably supporting post-training quantization (PTQ) to INT8, which converts weights and activations without retraining, typically shrinking model size by approximately 4x and yielding speedups of 2-4x on CPU inference with minimal accuracy degradation.[22][39] For scenarios requiring higher fidelity, quantization-aware training (QAT) simulates low-precision operations during fine-tuning to mitigate accuracy loss, often restoring performance close to the original floating-point model.[40] NNCF facilitates custom quantization by allowing users to provide representative calibration datasets, ensuring robust parameter estimation for activations and weights. As of the 2025.3 release, NNCF supports advanced low-bit techniques including INT4 data-aware weights compression and NF4-FP8 for ONNX models, further reducing footprint for generative AI workloads.[18][41]
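A minimal post-training quantization sketch with NNCF is shown below; the model path is a placeholder, and the random calibration tensors are used only for illustration, whereas real workflows should draw a few hundred representative samples from the validation data:

```python
import numpy as np
import nncf
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")  # placeholder path to an FP32 IR model

# Calibration data: random tensors here only for illustration;
# in practice, use a few hundred representative real samples
calibration_samples = [np.random.rand(1, 3, 224, 224).astype(np.float32) for _ in range(300)]

def transform_fn(sample):
    # Map one dataset item to the model's input format (identity mapping here)
    return sample

calibration_dataset = nncf.Dataset(calibration_samples, transform_fn)

# Post-training INT8 quantization without retraining
quantized_model = nncf.quantize(model, calibration_dataset)
ov.save_model(quantized_model, "model_int8.xml")
```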
Pruning techniques eliminate redundant parameters, with NNCF offering filter pruning that removes unimportant convolutional filters, reducing computational complexity and model footprint while maintaining inference quality through magnitude-based or gradient-driven criteria.[42] Sparsity methods further zero out individual weights and are particularly effective for transformer models, where structured sparsity patterns leverage hardware vectorization for up to 2x throughput gains on Intel Xeon processors without significant accuracy penalties.[36]
Graph optimizations occur during Intermediate Representation (IR) compilation, applying transformations such as constant folding to precompute and replace constant subgraphs with their evaluated values, thereby simplifying the computation graph and reducing runtime overhead.[43] Dead code elimination removes unused nodes and operations, streamlining the model for faster execution, while layer fusion merges compatible operations—like convolutions with activations—into single kernels to minimize memory access and boost efficiency on supported hardware.[44]
To tune latency and throughput, OpenVINO supports dynamic batching, which aggregates variable-sized inputs into batches at runtime to maximize device utilization, potentially increasing throughput by saturating compute resources.[45] Pipeline parallelism divides model layers across execution stages, enabling concurrent processing of sequential operations and reducing end-to-end latency, especially beneficial for deep networks in high-throughput scenarios.[46] These techniques measure success via metrics like frames per second (FPS) for throughput and milliseconds (ms) for latency, with quantized models often demonstrating 2-4x improvements in FPS on CPU compared to full-precision baselines.[39]
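A related, commonly used mechanism for latency/throughput tuning in the Python API is the performance hint passed at compile time, which lets the runtime choose stream counts and batching on its own; a sketch (model path is a placeholder, property names as used in recent OpenVINO releases):

```python
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")

# Let the runtime pick stream counts and batching for maximum aggregate throughput
throughput_compiled = core.compile_model(model, "CPU", {"PERFORMANCE_HINT": "THROUGHPUT"})

# Alternatively, minimize the latency of individual requests
latency_compiled = core.compile_model(model, "CPU", {"PERFORMANCE_HINT": "LATENCY"})
```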
For generative AI workloads, OpenVINO includes specialized optimizations such as token eviction in the KV cache, which limits cache size to manage memory for long-context large language models (LLMs) by selectively discarding older tokens, preventing out-of-memory issues during extended sequences.[47] KV cache optimizations further compress and reuse key-value pairs across generations, enhancing throughput for autoregressive decoding in LLMs by reducing redundant computations. In the 2025.3 release, Sage Attention support was added for CPU inference, providing performance boosts for first-token latency in LLMs via the ENABLE_SAGE_ATTN property.[19][18]
Operating Systems
OpenVINO provides primary support for several major operating systems, enabling deployment on diverse computing environments. On Linux, it officially supports Ubuntu 22.04 LTS and later (including full support for Ubuntu 24.04 LTS), as well as Red Hat Enterprise Linux 8 and later, all in 64-bit architectures. Windows 10 and 11 (64-bit x86_64) are fully supported, and macOS 12 and later is supported, including on Apple Silicon processors, with inference running on the CPU device.[48][16]
Installation options vary by platform to facilitate ease of setup. Python users can install OpenVINO via pip across Linux, Windows, and macOS, providing access to the runtime and development tools. For Linux, Debian/Ubuntu users can install from APT repositories, while Windows users can rely on downloadable installer packages or archives for straightforward deployment. Cross-compilation capabilities extend support to Arm-based Linux distributions, including embedded systems like Raspberry Pi, allowing inference on resource-constrained devices through custom builds.[49][50]
Supported operating system versions are documented per release to ensure reliable operation. Docker containers are available for all supported platforms, offering a consistent, isolated environment that simplifies dependency management and reproducibility across development and production setups. Recent 2025 updates have continued to refine support for Apple Silicon-based macOS systems.[7]
Despite broad desktop and server coverage, OpenVINO lacks native support for mobile operating systems such as Android or iOS due to hardware and ecosystem constraints. For mobile inference, developers can export models to ONNX format and utilize ONNX Runtime, which incorporates an OpenVINO execution provider to run optimized inference on compatible backends.[51][52]
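A brief sketch of this ONNX Runtime path is shown below; it assumes the onnxruntime package with the OpenVINO execution provider is installed and that model.onnx is an exported model (both are placeholders):

```python
import numpy as np
import onnxruntime as ort

# Request the OpenVINO execution provider, falling back to the default CPU provider
session = ort.InferenceSession(
    "model.onnx",
    providers=["OpenVINOExecutionProvider", "CPUExecutionProvider"],
)

input_name = session.get_inputs()[0].name
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder input
outputs = session.run(None, {input_name: dummy_input})
```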
Hardware Acceleration
OpenVINO optimizes deep learning inference across a range of Intel hardware targets, leveraging specialized accelerators to enhance performance while maintaining compatibility with standard computing units. The toolkit integrates with Intel's ecosystem to exploit architecture-specific features, enabling efficient execution on diverse devices from data centers to edge systems.[53]
For central processing units (CPUs), OpenVINO supports Intel x86 architectures, including Core and Xeon processors, where it utilizes multi-threading for parallel execution and vector instruction sets such as AVX-512 for accelerated computations on supported hardware. Additionally, through integration with the oneAPI Deep Neural Network Library (oneDNN), OpenVINO extends compatibility to Arm 64-bit architectures, incorporating optimized kernels for broader model support on Arm-based systems.[54][55]
Graphics processing units (GPUs) in OpenVINO encompass both integrated and discrete variants, such as Intel Iris Xe integrated graphics and the Arc series discrete GPUs. These are accelerated via the oneAPI Level Zero interface, with backend support for OpenCL on integrated GPUs and SYCL for more advanced programmability on discrete models, allowing for high-throughput parallel processing of neural network layers.[56]
Neural processing units (NPUs) provide dedicated AI acceleration in OpenVINO, targeting low-power inference scenarios on Intel Core Ultra processors starting from Meteor Lake and extending to subsequent generations. These units offload compute-intensive tasks from the CPU and GPU, optimizing for energy efficiency in always-on applications like real-time AI on laptops and mobile devices. In 2025, OpenVINO added full support for Lunar Lake NPUs, enhancing capabilities for generative AI workloads through updated drivers and runtime optimizations.[57][58][59]
Vision processing units (VPUs), such as the Intel Movidius Myriad X, were historically supported in earlier OpenVINO versions for edge devices like cameras, connected via USB or PCIe interfaces to enable compact, low-latency inference at the network periphery. However, as of OpenVINO 2023.0 and later releases including 2025, dedicated VPU support has been discontinued, with models redirected to CPU or GPU execution.[60][61][62]
Field-programmable gate arrays (FPGAs), particularly Intel Agilex devices, are targeted through the OpenVINO FPGA Plugin within the FPGA AI Suite, allowing customizable hardware acceleration for high-performance inference in data center and embedded environments. This plugin facilitates model deployment on reconfigurable logic, optimizing for specific topologies via compiled bitstreams.[63][64]
Performance benefits vary by hardware and model precision, with NPUs delivering significant speedups over CPUs for INT8 quantized models (typically 3 to 5 times faster in common inference tasks) due to their specialized matrix multiply units and reduced power consumption. For example, OpenVINO benchmarks on Core Ultra NPUs show improved throughput for vision models compared to CPU-only execution. Device selection in OpenVINO is managed programmatically, typically by passing the target device name to core.compile_model() and tuning behavior via ov.Core().set_property(); multi-device execution modes like AUTO provide automatic optimal assignment, and explicit configurations support heterogeneous setups across CPU, GPU, and NPU.[65][39][66]
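A short device-selection sketch follows; the model path is a placeholder, and the NPU example assumes such a device is actually present on the machine:

```python
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")

# List devices visible to the runtime (e.g., CPU, GPU, NPU)
print(core.available_devices)

# Explicit device selection (requires an NPU on the system)
npu_compiled = core.compile_model(model, "NPU")

# AUTO picks the most suitable available device, with an optional priority order
auto_compiled = core.compile_model(model, "AUTO:NPU,GPU,CPU")
```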
Applications
Computer Vision
OpenVINO facilitates a range of traditional computer vision tasks, including object detection, pose estimation, and semantic segmentation, by optimizing deep learning models for efficient inference on Intel hardware. For object detection, it supports models such as YOLO variants (e.g., YOLOv8, YOLOv11) and SSD-based architectures like MobileNet-SSD and SSD300, which enable real-time identification and localization of objects in images or video streams. Pose estimation is handled by specialized models like human-pose-estimation-0001 and human-pose-estimation-3d-0001, which detect human keypoints for applications requiring body posture analysis. Semantic segmentation models, such as road-segmentation-adas-0001 and U-Net variants like unet-camvid-onnx-0001, assign class labels to every pixel in an image, supporting tasks like scene understanding in urban environments.[67][68][69][70]
The OpenVINO Model Zoo provides a repository of pre-trained models specifically optimized for Intel processors, GPUs, and VPUs, allowing developers to deploy these without extensive reconfiguration. These models are converted to the OpenVINO Intermediate Representation (IR) format and quantized (e.g., to INT8) to reduce latency and memory usage while preserving accuracy, particularly on Intel CPUs and integrated graphics. For instance, ResNet-50, a foundational classification model often used in vision pipelines, achieves over 30 FPS on standard Intel CPUs in latency mode, demonstrating the toolkit's efficiency for real-time applications.[71]
In edge deployments, OpenVINO excels in real-time video analytics on resource-constrained devices such as smart cameras, which in earlier releases could be paired with Intel Movidius Myriad X VPUs for low-power, continuous inference. Such devices handle tasks like object tracking in surveillance feeds at the edge, minimizing data transmission to the cloud. Practical examples include facial recognition systems, where models like face-detection-adas-0001 process video to identify individuals securely, and autonomous driving perception pipelines that integrate detection and segmentation for obstacle avoidance and lane analysis.[72][73]
OpenVINO integrates seamlessly with OpenCV for preprocessing (e.g., image resizing, normalization) and postprocessing (e.g., non-maximum suppression for bounding boxes), streamlining end-to-end computer vision workflows. This combination allows developers to leverage OpenCV's robust image handling alongside OpenVINO's optimized inference engine, as seen in demos for multi-model pipelines.[74][75]
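A minimal sketch of such a combined pipeline for image classification is shown below; the image path, model path, input size, and normalization are placeholders that would vary by model (some models also expect RGB rather than OpenCV's default BGR channel order):

```python
import cv2
import numpy as np
import openvino as ov

core = ov.Core()
compiled = core.compile_model(core.read_model("model.xml"), "CPU")

# OpenCV handles decoding and resizing; file name and input size are placeholders
image = cv2.imread("input.jpg")
resized = cv2.resize(image, (224, 224))
blob = resized.transpose(2, 0, 1)[np.newaxis].astype(np.float32) / 255.0  # HWC -> NCHW, scale to [0, 1]

result = compiled([blob])[compiled.output(0)]
top_class = int(np.argmax(result))
print(top_class)
```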
Generative AI
OpenVINO supports a range of generative AI models, enabling efficient inference for tasks such as text generation and image synthesis on Intel hardware. Key supported models include Stable Diffusion for high-quality image generation from text prompts and large language models (LLMs) like Llama, integrated seamlessly through the Hugging Face ecosystem via the Optimum Intel library.[76][77][78]
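A brief sketch of this Optimum Intel path is shown below; the checkpoint name is just an example of a small chat model, and export=True converts the Hugging Face checkpoint to OpenVINO IR at load time:

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer, pipeline

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # example checkpoint

# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("OpenVINO is", max_new_tokens=32)[0]["generated_text"])
```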
To accelerate generative workflows, OpenVINO incorporates optimizations like key-value (KV) cache management, which stores intermediate computations to reduce redundant calculations during autoregressive generation, and speculative decoding, which uses a smaller draft model to predict tokens ahead of verification by the main model, thereby speeding up token generation.[79][80] These techniques can achieve up to 2.5x improvements in token throughput for models like Llama-2-7B on Intel processors.[81]
In 2025 releases, OpenVINO introduced enhanced support for vision-language models (VLMs), such as Phi-3-Vision, enabling multimodal tasks that combine visual and textual inputs; image-to-image pipelines for editing and style transfer using diffusion models; and token eviction mechanisms in the KV cache to handle long sequences by dynamically managing memory for extended contexts.[17][82][83]
Deployments of generative AI with OpenVINO emphasize on-device execution on AI PCs equipped with Neural Processing Units (NPUs), such as Intel Core Ultra laptops, minimizing reliance on cloud resources for privacy-sensitive applications. For instance, text-to-image generation with Stable Diffusion runs efficiently on Core Ultra hardware, while LLM-based chatbots like those using Llama benefit from NPU acceleration compared to CPU-only inference.[6][59][84]
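For on-device generation, the separate OpenVINO GenAI package (openvino-genai) offers a compact pipeline API; the sketch below assumes model_dir contains an LLM already exported to OpenVINO IR (for example via Optimum Intel), and the device string could equally be "GPU" or "CPU" depending on the machine:

```python
import openvino_genai

# model_dir must contain an LLM exported to OpenVINO IR; "NPU" assumes an AI-PC-class device
pipe = openvino_genai.LLMPipeline("model_dir", "NPU")

print(pipe.generate("Explain edge AI in one sentence.", max_new_tokens=64))
```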
These features address key challenges in deploying large generative models on edge hardware, particularly memory efficiency, by compressing weights, optimizing cache usage, and leveraging hardware-specific accelerations to fit billion-parameter models within limited resources without sacrificing output quality.[80][85]
Automatic Speech Recognition
OpenVINO supports automatic speech recognition (ASR) tasks by optimizing models for efficient inference on Intel hardware, enabling real-time transcription and voice processing applications. Key models include Whisper and Distil-Whisper from Hugging Face, which perform speech-to-text conversion with high accuracy across multiple languages. These models are converted to OpenVINO IR format and quantized for reduced latency on CPUs, GPUs, and NPUs.[86][87]
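A hedged sketch of running Whisper through Optimum Intel is shown below; the checkpoint name and audio file path are placeholders, and export=True converts the checkpoint to OpenVINO IR at load time:

```python
from optimum.intel import OVModelForSpeechSeq2Seq
from transformers import AutoProcessor, pipeline

model_id = "openai/whisper-tiny"  # example checkpoint; larger Whisper variants follow the same pattern

# export=True converts the Hugging Face checkpoint to OpenVINO IR at load time
model = OVModelForSpeechSeq2Seq.from_pretrained(model_id, export=True)
processor = AutoProcessor.from_pretrained(model_id)

asr = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
)
print(asr("speech_sample.wav")["text"])  # audio file path is a placeholder
```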
In practical deployments, OpenVINO-powered ASR is used in edge devices for applications like voice assistants, live captioning, and meeting transcription, where low-latency processing is critical. For example, Whisper-large-v3, quantized to INT4, achieves efficient performance on Intel Core Ultra processors, supporting long audio sequences with minimal resource usage. Integration with audio preprocessing libraries enhances end-to-end workflows for continuous speech recognition.[88][89]