Vector Packet Processing
Vector Packet Processing (VPP) is an open-source, extensible framework that provides high-performance switch and router functionality by processing packets in user space on commodity CPUs. It takes a vector-based approach, handling multiple packets simultaneously rather than one at a time as in traditional scalar processing.[1] This method reduces instruction cache thrashing and read latency, and the per-packet processing cost decreases as vector sizes increase.[1] Originally developed at Cisco, where it has been production technology since 2002, and now maintained under the FD.io project, VPP has been deployed in commercial products generating over $1 billion in revenue and supports a wide range of networking protocols, including IPv4, IPv6, MPLS, VLAN, and IPsec.[2][1][3]

At its core, VPP operates through a modular graph architecture composed of pluggable nodes, where each node processes a vector of packet indices, enabling efficient data plane operations across layers 2 through 4 of the OSI model.[1][4] This design allows plugins to extend functionality, such as hardware acceleration or custom graph rearrangements, without requiring kernel modifications, and VPP runs on multiple architectures, including x86, ARM, and PowerPC, on bare metal, in virtual machines, or in containers.[2][3] Independent benchmarks demonstrate high throughput: over 14 million packets per second (Mpps) on a single core for IPv4/IPv6 forwarding and more than 100 Gbps of full-duplex traffic at line rate, often outperforming kernel-based networking stacks by two orders of magnitude.[1][2]

VPP's versatility makes it suitable for diverse applications, including virtual switches, routers, gateways, firewalls, and load balancers, with native support for integration into cloud-native ecosystems such as OpenStack and Kubernetes.[4] Its emphasis on scalability, low latency, and stability positions it as a foundational component for high-speed networking in data centers, edge computing, and service provider environments.[2]
Overview
Definition and Purpose
Vector Packet Processing (VPP) is an extensible, open-source framework that provides layer 2-4 network stack functionality, enabling the development of high-performance switches, routers, and virtualized network elements on commodity hardware.[4] It operates in user space, bypassing the traditional kernel networking stack to deliver scalable packet processing for diverse applications, including virtual switches, routers, gateways, firewalls, and load balancers.[4] As part of the FD.io project, VPP supports multi-platform deployment across architectures such as x86, ARM, and PowerPC, making it suitable for modern networking environments.[4]

The primary purpose of VPP is to facilitate high-performance, scalable packet processing by avoiding the overhead of kernel-based networking stacks, which often limit throughput due to context switching and interrupt handling.[5] This user-space approach is particularly valuable for Network Function Virtualization (NFV) and Software-Defined Networking (SDN) workloads, where rapid packet forwarding and low-latency operations are essential to support virtualized infrastructures and programmable networks.[6] By running on commercial off-the-shelf processors, VPP achieves up to 100 times greater packet processing throughput than traditional kernel networking, enabling line-rate performance on high-speed interfaces.[5]

At its core, VPP employs a vector processing model that handles packets in batches, known as vectors, rather than processing them individually as in scalar approaches.[7] These vectors, which can contain up to 256 packets, are collected from network device receive rings and routed through a directed graph of processing nodes, allowing multiple packets to share computational work efficiently.[7] This batching improves CPU cache efficiency: the first packet in a vector warms the instruction cache (I-cache), so subsequent packets benefit from cache hits, reducing per-packet overhead and cutting I-cache miss stalls by up to two orders of magnitude.[7]
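The cache behavior behind this claim can be sketched in a few lines of C. The following is an illustrative simplification rather than VPP source code: the packet type and stage functions are hypothetical stand-ins, and real VPP nodes operate on vectors of buffer indices rather than packet structs.

```c
#include <stdio.h>
#include <stddef.h>

#define VEC_MAX 256                      /* VPP caps a vector at 256 packets */

typedef struct { int ttl; } pkt_t;       /* hypothetical packet metadata */

static void parse (pkt_t *p)   { (void) p; }  /* stage 1: header parse (stub) */
static void rewrite (pkt_t *p) { p->ttl--; }  /* stage 2: header rewrite (stub) */

/* Scalar style: the entire processing path runs once per packet, so its
 * instructions are re-fetched for every packet. */
static void
scalar_dispatch (pkt_t *pkts, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      parse (&pkts[i]);
      rewrite (&pkts[i]);
    }
}

/* Vector style: each stage sweeps the whole batch before the next stage
 * starts; the first packet warms the I-cache and the rest hit it. */
static void
vector_dispatch (pkt_t *pkts, size_t n)
{
  for (size_t i = 0; i < n; i++)
    parse (&pkts[i]);
  for (size_t i = 0; i < n; i++)
    rewrite (&pkts[i]);
}

int
main (void)
{
  pkt_t vec[VEC_MAX] = { [0] = { .ttl = 64 } };
  scalar_dispatch (vec, VEC_MAX);
  vector_dispatch (vec, VEC_MAX);
  printf ("ttl of first packet after both passes: %d\n", vec[0].ttl);
  return 0;
}
```

Both loops do the same work; the vectorized form simply reorders it so that each stage's instructions stay hot in the I-cache for the whole batch.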
Key Characteristics
Vector Packet Processing (VPP) is engineered as a highly modular framework that enables the construction of custom packet processing graphs through a plugin-based architecture. This design treats plugins as first-class components, allowing developers to extend functionality by integrating new network nodes while reusing existing ones for rapid prototyping of bespoke forwarding behaviors.[8] The core consists of a directed graph of forwarding nodes supported by an extensible infrastructure, which separates packet processing logic from the underlying hardware and provides flexibility in deploying virtual switches, routers, and network function virtualization (NFV) elements.[8][4]

VPP demonstrates strong scalability across diverse hardware environments, supporting multiple processor architectures such as x86, ARM, and PowerPC, which ensures portability in both commodity servers and specialized networking appliances.[4] It efficiently exploits multi-core systems, achieving linear throughput scaling with additional cores; for instance, it delivered up to 948 Gbps of aggregate performance on an Intel Xeon Platinum 8168 processor with 512-byte packets in 2017 demonstrations, distributing packet processing workloads across threads without significant contention.[8] This multi-platform compatibility, combined with integration capabilities such as DPDK plugins, positions VPP for deployment in cloud-native and edge computing scenarios requiring high aggregate bandwidth.[4]

A defining trait of VPP is its deterministic performance profile, achieved through execution in Linux user space and the use of poll-mode drivers that bypass kernel interrupts for direct hardware access. This approach minimizes latency variation, ensuring predictable packet handling even under high load, with per-core forwarding rates exceeding 50 Gbps for Internet mix (IMIX) traffic on Intel Xeon E5-2667 v4 processors (as of 2017 benchmarks).[8] By avoiding the context switches and interrupt-driven I/O common in kernel-based stacks, VPP maintains consistent low-jitter processing, which is critical for real-time applications such as 5G user plane functions.[8]

VPP employs an event-driven, non-blocking I/O model that sustains continuous packet flows by actively polling receive (RX) queues and processing packets in vector batches, eliminating the delays associated with traditional interrupt-based mechanisms. This polling strategy, integrated with the vector processing paradigm, optimizes CPU cache utilization and exploits SIMD instructions for efficient bulk operations, contributing to high throughput without blocking on asynchronous events.[8]

As an open-source project governed by the FD.io collaboration, VPP benefits from contributions across multiple vendors, including Cisco, Intel, and Ericsson, fostering a robust ecosystem of shared innovations and interoperability testing.[8] This community-driven development model, hosted under the Linux Foundation, ensures ongoing enhancements while maintaining compatibility with standards efforts such as ONAP (Open Network Automation Platform) and ETSI NFV. As of November 2025, VPP remains under active development, with the v26.02 release candidate incorporating enhancements to session layer features and further performance optimizations.[4]
History
Origins at Cisco
Vector Packet Processing (VPP) originated within Cisco Systems in the early 2000s, initiated in 2002 as a high-performance software-based approach to packet forwarding, with foundational patent work beginning around 2004. The technology was developed to enable efficient processing of network traffic on commodity hardware, addressing the limitations of scalar packet processing by handling multiple packets simultaneously in vectors. This innovation stemmed from Cisco's need for scalable data plane capabilities in its networking products, evolving from earlier generations of proprietary packet processing engines that integrated hardware and software stacks for optimized throughput.[9][10][11]

Central to VPP's development was Cisco Fellow David Barach, recognized as the primary inventor of the vector packet processing framework. Barach's contributions built on his expertise in high-speed networking data planes and led to the filing of US Patent 7,961,636 in 2004, which describes vectorized software packet forwarding techniques for concurrent processing of packet vectors through a directed graph of nodes. The patent, assigned to Cisco Technology, Inc., and issued in 2011, outlined methods to minimize cache misses by loading instructions once per vector and adaptively controlling vector sizes to meet low-latency targets, such as 50 microseconds. Over more than two decades, the technology has evolved continuously within Cisco, powering the data planes of various products and contributing to over $1 billion in shipped revenue.[9][1]

Initially deployed proprietarily in Cisco's high-end routers and switches, VPP enabled line-rate performance for Ethernet and IP/Multiprotocol Label Switching (MPLS) services, sustaining up to 14.88 million packets per second on 10 Gbps links in software. Its principles were integrated into the core forwarding engines of Cisco's carrier-grade routers, such as the ASR series, to achieve wire-speed processing without dedicated hardware acceleration. This proprietary implementation focused on modularity and extensibility, allowing seamless integration with Cisco's broader ecosystem before the technology was open-sourced in 2016.[10]
Open-Sourcing and FD.io
In 2016, Cisco announced the open-sourcing of its proprietary Vector Packet Processing (VPP) technology by donating the core codebase to the Linux Foundation's newly launched Fast Data Input/Output (FD.io) project on February 11, with the aim of accelerating high-performance networking software development.[12][13] This transition marked VPP's shift from a closed-source Cisco asset to a collaborative open-source platform, enabling broader industry adoption for scalable packet processing in virtualized environments. Under FD.io's governance within the Linux Foundation, VPP has benefited from multi-vendor contributions, with key supporters including Cisco, Intel, Red Hat, Ericsson, 6WIND, Huawei, AT&T, Comcast, Cavium Networks, ZTE, and Inocybe, fostering a diverse ecosystem for ongoing enhancements.[8] The project's structure promotes modular development, allowing participants to contribute plugins, drivers, and optimizations while maintaining VPP as the central data plane component.[2]

Key milestones include the initial open-source release, VPP 16.06, in June 2016, which established the foundational vector processing stack.[14] By 2018, VPP had achieved significant integrations, such as with OpenStack Neutron for virtual networking and with Kubernetes for containerized deployments, demonstrated at events like the FD.io Mini-Summit at KubeCon Europe.[15][16] The project continues with regular releases following a year.month naming convention, including VPP 25.06 in June 2025, which incorporated advances in multi-architecture support and security features.[17] VPP's growth under FD.io has attracted numerous contributors, driving its adoption in telecommunications and cloud infrastructures for high-throughput applications such as edge computing and service function chaining.[18]

FD.io has played a pivotal role in establishing VPP as a universal data plane for Network Functions Virtualization (NFV), providing a performant, hardware-agnostic foundation that decouples control and data planes across diverse NFV environments.[19][20]
Architecture
Vector Processing Model
Vector Packet Processing (VPP) employs a batching mechanism in which packets are grouped into vectors, typically comprising up to 256 packets, and processed as a single unit to minimize per-packet overhead such as function calls and context switches.[8] This approach contrasts with scalar processing, where each packet is handled individually, leading to inefficiencies such as repeated instruction fetches and deeper call stacks. By assembling these vectors from receive (RX) rings on network interfaces, VPP enables bulk operations that amortize fixed costs across many packets, improving overall throughput.[21]

In the processing pipeline, incoming vectors are classified based on packet attributes and dispatched en masse to the appropriate handler nodes, allowing operations to execute in parallel across the batch. VPP leverages Single Instruction, Multiple Data (SIMD) instructions, such as Intel SSE and AVX, to perform computations across packet fields simultaneously, further optimizing parallel workloads like checksum calculation or header parsing. This bulk dispatching reduces context switches between packets and improves CPU cache utilization by keeping related data local, since the same code paths are executed repeatedly over the vector rather than scattering accesses.[8] Compared to scalar methods, vector processing can achieve significantly lower cycle counts per packet, often under 200 cycles for basic forwarding, due to these amortizations.[21]

The efficiency of this model can be illustrated by a simplified throughput equation, where the processing rate (in packets per second) is approximately

\text{processing rate} \approx \frac{\text{vector size} \times \text{CPU frequency}}{\text{cycles per vector}}

This formulation highlights the benefit of batching: larger vector sizes scale throughput by spreading the cycles required for vector-level operations across more packets, assuming constant latency per vector.[8] In practice, VPP dynamically adjusts vector sizes based on input rates to balance latency and utilization.

For exceptional cases, such as packets requiring special handling (e.g., errors or unsupported features), individual packets are diverted from the vector using the VLIB punt infrastructure. These packets are tagged with a reason code during node processing and routed to dedicated sink nodes or the control plane, while the remaining vector continues uninterrupted to maintain bulk efficiency.[22] This selective diversion ensures that anomalies do not degrade the performance of the majority of traffic.
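Plugging representative numbers into the formula makes the scaling concrete; the figures below are illustrative, not published benchmarks:

```latex
\text{processing rate} \approx
  \frac{256 \times (2.5 \times 10^{9}\ \text{cycles/s})}{20\,000\ \text{cycles/vector}}
  = 3.2 \times 10^{7}\ \text{packets/s} = 32\ \text{Mpps}
```

Here a 256-packet vector costing 20,000 cycles works out to roughly 78 cycles per packet; under the constant-cost-per-vector assumption, halving the vector size at the same per-vector cost would halve the rate.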
Node Graph and Plugins
The core of Vector Packet Processing (VPP) lies in its modular data plane, structured as a directed acyclic graph (DAG) of nodes through which packets are processed in vectors by a series of specialized functions. Each node in the graph represents a discrete operation, such as classification, header rewriting, or forwarding, and packets traverse the structure based on runtime decisions encoded in "next" indices that route vectors to subsequent nodes. This graph-based approach enables efficient, high-throughput processing by dispatching vectors of packets (typically 128 to 256 packets) through the nodes, with the dispatcher subdividing vectors as needed to maintain stable frame sizes and ensure complete processing before recursion.[23][24]

VPP defines several node types to control dispatch behavior and integration within the graph. Input nodes (VLIB_NODE_TYPE_INPUT) handle hardware-specific ingress from network interfaces, generating the initial work vectors, while pre-input nodes (VLIB_NODE_TYPE_PRE_INPUT) execute preliminary tasks before other processing. Internal nodes (VLIB_NODE_TYPE_INTERNAL) perform the core packet manipulations and are invoked only when pending frames are scheduled, facilitating conditional routing via dispatch arcs. Process nodes (VLIB_NODE_TYPE_PROCESS) support cooperative multitasking for control-plane-like operations that suspend after brief execution, keeping the graph focused on data-plane efficiency. Output nodes mirror input nodes for egress, completing the traversal. Within nodes, vector batching allows simultaneous processing of multiple packets to exploit SIMD instructions, as detailed in the vector processing model.[24][23]

The plugin architecture enhances VPP's extensibility by allowing shared libraries to be loaded dynamically at runtime, without recompiling the core engine. Plugins register new graph nodes via a vlib_plugin_registration structure, which VPP discovers by scanning a designated directory for matching libraries, using dlopen and dlsym for verification. This enables features such as access control lists (ACLs) or encryption modules to be added as first-class citizens integrated seamlessly into the graph. Plugins interact with the graph through the Binary API (VPP API), a shared-memory message-passing interface that supports request-reply semantics for runtime configuration, table programming, and graph modification by external control planes.[25][26][27]

Graph configurations are serialized for reproducibility, with the data plane node graph and its arcs captured via dedicated API messages that can be uploaded and stored in structured formats. VPP's API definitions are compiled into JSON representations, facilitating the loading and application of configurations to reconstruct the graph state across restarts or deployments. This serialization supports programmatic management, ensuring consistent packet processing paths in diverse environments.[28][29]
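A minimal node registration, following the pattern used throughout the VPP source tree, looks roughly as follows. The node name "sample" and its dispatch function are placeholder examples; "ip4-lookup" and "error-drop" are real core node names, and the macro fields shown are the standard ones, though details can shift between releases.

```c
#include <vlib/vlib.h>

/* Dispatch function: consumes one frame (a vector of u32 buffer indices)
 * and returns the number of packets processed. A real node would classify
 * each packet and assign it a "next" index; the body is elided here. */
static uword
sample_node_fn (vlib_main_t *vm, vlib_node_runtime_t *node,
                vlib_frame_t *frame)
{
  return frame->n_vectors;
}

/* Register an internal node and its dispatch arcs. The next_nodes names
 * are resolved to node indices when the graph is built; each packet's
 * per-node "next" value selects which arc it follows. */
VLIB_REGISTER_NODE (sample_node) = {
  .function = sample_node_fn,
  .name = "sample",
  .vector_size = sizeof (u32),       /* frames carry u32 buffer indices */
  .type = VLIB_NODE_TYPE_INTERNAL,   /* runs only when frames are pending */
  .n_next_nodes = 2,
  .next_nodes = {
    [0] = "ip4-lookup",              /* continue normal IPv4 forwarding */
    [1] = "error-drop",              /* divert failed packets */
  },
};
```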
Implementation
Integration with DPDK
Vector Packet Processing (VPP) integrates with the Data Plane Development Kit (DPDK) primarily through its poll-mode drivers (PMDs), which provide direct user-space access to network interface controllers (NICs) and bypass the kernel networking stack to enable zero-copy input/output. This approach minimizes the overhead of context switches and system calls, allowing VPP to achieve line-rate packet processing on commodity hardware. DPDK's PMDs, such as those for Intel i40e and ixgbe devices, are loaded as plugins within VPP and handle low-level device initialization and queue management.[30]

At the core of this integration, VPP's input nodes use DPDK libraries to poll NIC hardware queues and retrieve batches of packets directly into vector structures for processing. These nodes operate in a continuous polling loop, invoking the DPDK rte_eth_rx_burst function to assemble packet vectors from multiple descriptors in a single call and feeding them into VPP's node graph dispatcher for subsequent operations. This mechanism ties directly into VPP's vector processing model by ensuring that incoming traffic is handled in bulk, optimizing CPU cache utilization and reducing per-packet overhead.[23]

Configuration of VPP with DPDK emphasizes system tuning for performance, including the allocation of hugepages to support efficient memory mapping for packet buffers and mbuf pools. For example, hugepages are typically set via kernel boot parameters such as hugepagesz=1GB hugepages=64 in GRUB, while disabling transparent hugepages prevents fragmentation. NUMA affinity is achieved by pinning VPP worker threads to specific cores and nodes using tools like libvirt or numactl, ensuring local memory access and avoiding cross-node latency. Multi-queue NICs are configured through DPDK device parameters, such as num-rx-queues and num-tx-queues in VPP's startup configuration file, to enable receive side scaling (RSS) and distribute traffic across multiple cores.[31]
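Outside of VPP, the underlying DPDK burst-polling pattern described above can be sketched as follows. EAL initialization and device setup are omitted, the port and queue numbers are placeholders, and a real input node would hand the batch to the graph dispatcher rather than freeing it:

```c
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 256   /* aligned with VPP's maximum vector size */

/* Poll a single RX queue forever, pulling up to BURST_SIZE descriptors
 * per call. rte_eth_rx_burst() never blocks: it returns immediately
 * with between 0 and BURST_SIZE packets. */
static void
poll_rx_queue (uint16_t port_id, uint16_t queue_id)
{
  struct rte_mbuf *burst[BURST_SIZE];

  for (;;)
    {
      uint16_t n = rte_eth_rx_burst (port_id, queue_id, burst, BURST_SIZE);

      for (uint16_t i = 0; i < n; i++)
        rte_pktmbuf_free (burst[i]);   /* stand-in for graph dispatch */
    }
}
```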
The foundational integration began with VPP's initial open-source release, version 16.06 in 2016, which was built on DPDK 16.04 and included a custom patchset for compatibility and enhancements. For handling multiple NICs in virtualized environments, VPP supports Single Root I/O Virtualization (SR-IOV) via DPDK's rte_eth_dev API, treating virtual functions (VFs) as independent Ethernet ports. This allows VPP to manage VFs with dedicated queues—for instance, configuring 2 RX and 2 TX queues per VF on Intel 82599-based devices—enabling direct assignment to virtual machines while maintaining high throughput on the physical function.[32][33]
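Tying the tuning parameters above together, a VPP startup.conf along the following lines is typical; the PCI address, core list, and queue counts are placeholders to adapt to the target system:

```
cpu {
  main-core 1                # control-plane/main thread
  corelist-workers 2-5       # pin forwarding workers to dedicated cores
}

dpdk {
  uio-driver vfio-pci        # kernel module used to claim the NIC
  dev 0000:02:00.0 {
    num-rx-queues 4          # spread RSS across the worker cores
    num-tx-queues 4
  }
}
```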
Supported Platforms and Deployment
Vector Packet Processing (VPP) primarily supports x86-64 architectures on Intel and AMD processors, enabling high-performance packet processing on standard server hardware. It also provides full support for ARM64 platforms, including the Ampere Altra family, which features up to 128 cores and is optimized for edge computing applications. VPP additionally has historical support for Power architectures, though recent packaging focuses on x86-64 and ARM64. To achieve optimal performance, deployments typically require multi-core CPUs (at least 8 cores are recommended for production) and high-speed network interface cards (NICs) of 10 Gbps or greater, such as the Intel X520 or Mellanox ConnectX series, often driven via DPDK for direct I/O access.[34][4][35]

VPP operates primarily in Linux user space, with official packages available for recent Long Term Support (LTS) releases of Debian and Ubuntu. In 2024, VPP gained an official FreeBSD port as part of the 24.10 release, allowing integration with FreeBSD's networking stack for BSD-based environments. Experimental support for Windows exists through community efforts, but it remains unofficial and limited to basic functionality.[34][36][37]

VPP is designed for flexible deployment across various environments, including bare-metal servers for maximum performance, virtual machines (for example on KVM or VMware) for isolated workloads, and containerized setups using Docker for lightweight orchestration. For cloud-native applications, VPP integrates with Kubernetes through plugins such as Calico's VPP dataplane, enabling pod-to-pod networking in clustered deployments.[2][38][39]

VPP can be installed from pre-built packages in the FD.io repositories, available through APT for Debian/Ubuntu, for straightforward setup on supported OS versions. Alternatively, users can build from source by cloning the official Git repository and compiling with Make and CMake, allowing customization for specific hardware or features. Binary packages are also available for FreeBSD via the ports system.[40][41][37]
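On a supported Ubuntu release, installation from the FD.io package repository typically reduces to a few commands; the repository setup script and package split shown here follow the FD.io packagecloud instructions and may change between releases:

```
# Add the FD.io release repository (Debian/Ubuntu)
curl -s https://packagecloud.io/install/repositories/fdio/release/script.deb.sh | sudo bash

# Install the core engine and the common plugin sets
sudo apt-get update
sudo apt-get install vpp vpp-plugin-core vpp-plugin-dpdk
```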
Features
Packet Processing Capabilities
Vector Packet Processing (VPP) provides a comprehensive set of built-in functions for handling packets at OSI layers 2 through 4, enabling efficient forwarding and manipulation in high-performance networking environments. These capabilities are implemented in the modular graph of processing nodes, so packets traverse only the functions their configuration requires.[42]

At layer 2, VPP supports Ethernet bridging via configurable bridge domains that forward packets based on destination MAC addresses. It includes MAC learning, which dynamically populates the L2 forwarding information base (FIB) with learned MAC addresses, along with configurable aging timers to remove stale entries. VLAN tagging is handled through tag rewrite operations, supporting both single VLAN tags and stacked Q-in-Q configurations for sub-interface isolation and traffic segmentation.[43][44][45]

Layer 3 capabilities encompass IP routing for both IPv4 and IPv6, using fast lookup tables in the FIB for efficient unicast forwarding. ARP resolution is integrated to map IP addresses to MAC addresses, with support for static and dynamic entries. ICMP handling covers error messaging and diagnostics for IPv4 (ICMP) and IPv6 (ICMPv6), including echo requests and replies. Multicast support includes route configuration for group-based distribution, enabling efficient delivery to multiple recipients via IP multicast FIB entries.[42][46][47]

For layer 4, VPP offers TCP and UDP load balancing through the NAT plugin, distributing traffic across multiple backends using static mappings and session affinity based on client IP. Network Address Translation (NAT) is provided in NAT44 and NAT64 variants, supporting endpoint-independent mapping for address conservation and IPv4-IPv6 interoperability. Access Control Lists (ACLs) enable firewalling by applying policies at the IP and MAC levels, including n-tuple classification to permit or deny traffic based on source/destination addresses, ports, and protocols. Stateful processing for these layer 4 features relies on connection tracking, which maintains session state to handle bidirectional flows, timeouts, and SYN proxying for TCP connections. The VPP 25.02 release of February 2025 extended these capabilities with new session layer features and asynchronous processing support for TLS.[48][49][50][51]

Advanced capabilities extend these functions with support for MPLS label imposition and disposition, allowing VPP to act as an MPLS edge or core router for traffic engineering. VXLAN encapsulation and decapsulation enable overlay networking, interconnecting bridge domains across underlay networks in virtualized environments. Quality of Service (QoS) marking applies prioritization through traffic classification and marking of Differentiated Services Code Point (DSCP) fields, ensuring bandwidth allocation and low-latency handling for critical traffic.[48][52][53]
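As a concrete illustration, the CLI session below creates a two-port bridge domain and enables NAT44 between an inside and an outside interface. The interface names are examples, and the NAT commands follow recent releases (older releases used slightly different syntax):

```
vpp# create bridge-domain 10 learn 1 forward 1 flood 1
vpp# set interface l2 bridge GigabitEthernet0/8/0 10
vpp# set interface l2 bridge GigabitEthernet0/9/0 10

vpp# nat44 plugin enable
vpp# nat44 add address 203.0.113.1
vpp# set interface nat44 in GigabitEthernet0/a/0 out GigabitEthernet0/b/0
```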
Extensibility and APIs
Vector Packet Processing (VPP) offers extensibility primarily through its plugin architecture, allowing developers to add custom functionality without modifying the core codebase. Plugins are developed in C, starting from a skeleton generated by the VPP plugin generator script, which creates the essential files: the main plugin source, a node implementation, and API definitions. These plugins are compiled as shared object libraries (.so files) and loaded dynamically at runtime, integrating into VPP's directed graph of processing nodes to handle specific packet processing tasks.[54]
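The entry point that makes a shared library recognizable as a plugin is a single registration stanza; the description string below is a placeholder, and VPP declines to load a plugin whose recorded version does not match the running engine:

```c
#include <vlib/vlib.h>
#include <vnet/plugin/plugin.h>
#include <vpp/app/version.h>

/* At startup VPP scans its plugin directory, dlopen()s each .so, and
 * uses dlsym() to locate this registration before activating the plugin. */
VLIB_PLUGIN_REGISTER () = {
  .version = VPP_BUILD_VER,     /* checked against the running VPP build */
  .description = "Sample plugin (illustrative skeleton)",
};
```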
The Binary API provides a high-performance interface for control plane applications to interact with VPP, utilizing a shared memory mechanism to enable low-latency communication between external clients and the VPP data plane. This API supports both blocking and non-blocking modes, with generated high-level bindings in languages like C and C++ ensuring type safety and efficient message handling, such as automatic byte-order conversion. It facilitates operations like configuration updates and statistics queries over the shared memory ring, minimizing overhead compared to socket-based alternatives.[55]
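A small C client using the generated VAPI bindings illustrates the request-reply flow. This is a sketch based on the VAPI headers (vapi.h and the generated vpe.api.vapi.h); exact signatures and payload layouts should be checked against the installed release:

```c
#include <stdio.h>
#include <vapi/vapi.h>
#include <vapi/vpe.api.vapi.h>

/* The generated message IDs must be defined in exactly one translation unit. */
DEFINE_VAPI_MSG_IDS_VPE_API_JSON;

/* Reply callback, invoked when the show_version reply arrives on the
 * shared-memory response queue. */
static vapi_error_e
show_version_cb (vapi_ctx_t ctx, void *caller_ctx, vapi_error_e rv,
                 bool is_last, vapi_payload_show_version_reply *reply)
{
  if (reply)
    printf ("connected to VPP %s\n", reply->version);
  return VAPI_OK;
}

int
main (void)
{
  vapi_ctx_t ctx;
  vapi_ctx_alloc (&ctx);
  /* Blocking mode: the request call returns after the reply callback runs. */
  vapi_connect (ctx, "vapi-example", NULL, 32, 32, VAPI_MODE_BLOCKING, true);

  vapi_msg_show_version *msg = vapi_alloc_show_version (ctx);
  vapi_show_version (ctx, msg, show_version_cb, NULL);

  vapi_disconnect (ctx);
  vapi_ctx_free (ctx);
  return 0;
}
```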
VPP includes a built-in Command-Line Interface (CLI) for direct configuration and management, accessible interactively or via scripts, covering tasks from interface setup to feature enabling. The Binary API and CLI enable integration with automation tools for orchestration.[56]
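For illustration, a short interactive session might look as follows (output omitted); show runtime in particular reports per-node vector and clock statistics, making it a common first stop when tuning:

```
$ sudo vppctl
vpp# show version
vpp# show interface
vpp# set interface state GigabitEthernet0/8/0 up
vpp# set interface ip address GigabitEthernet0/8/0 192.0.2.1/24
vpp# show runtime
```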
The FD.io VPP repository hosts numerous plugins, including those for advanced protocols like Border Gateway Protocol (BGP) and Segment Routing over IPv6 (SRv6), demonstrating the framework's modular extensibility for diverse networking features.[57]