Threading Building Blocks

Intel oneAPI Threading Building Blocks (oneTBB), formerly known as Threading Building Blocks (TBB), is a C++ template library developed by Intel for task-based parallel programming on multi-core processors. It provides a runtime-based model that enables developers to break computations into parallel tasks, abstracting low-level threading details and simplifying the addition of parallelism to complex applications without requiring expertise in thread management. First released commercially in 2006 and open-sourced in 2007, TBB was designed to address the challenges of multicore programming by focusing on logical parallelism rather than explicit thread creation. Over the subsequent decade, it evolved to support emerging hardware complexities, integrate with libraries such as the Intel Math Kernel Library (MKL) and the Intel Data Analytics Acceleration Library (DAAL), and adapt to new programming paradigms. In 2020, Intel rebranded the library as oneTBB as part of the oneAPI initiative; by then it was distributed under the permissive Apache License 2.0. In 2023, oneTBB was placed under the governance of the Unified Acceleration (UXL) Foundation to foster community-driven development. Key features of oneTBB include high-level parallel algorithms (such as parallel_for and parallel_reduce), concurrent data structures like queues and hash maps, and support for nested parallelism with automatic load balancing across cores. These components promote data-parallel programming patterns, ensuring efficient resource utilization and avoiding system oversubscription in shared-memory environments. As part of the broader oneAPI toolkit, oneTBB facilitates cross-architecture portability and is compatible with other threading models, making it suitable for scientific simulations and data-intensive workloads.

Introduction

Overview

Intel oneAPI Threading Building Blocks (oneTBB) is a C++ template library designed for task-based parallelism, enabling developers to implement scalable multi-threading on multi-core systems without managing low-level details such as thread creation, synchronization, or load balancing. By abstracting these complexities, oneTBB allows programmers to express parallelism through high-level constructs, focusing instead on algorithmic logic and data decomposition. The core goals of oneTBB include abstracting hardware-specific details to produce portable code that performs efficiently across diverse multi-core architectures, enhancing developer productivity by reducing the need for explicit thread programming, and delivering high performance through automatic optimization of task execution on available cores. This approach contrasts with traditional threading models by emphasizing composable patterns, such as divide-and-conquer, which facilitate the construction of complex parallel workflows from simpler, reusable components. Technically, oneTBB supports C++ standards from C++11 onward, including features like lambda expressions and auto type deduction to streamline parallel algorithm implementation. Developed by Intel and first released as open-source software in 2007, it has evolved into an open-source project under the oneTBB branding, maintained under the UXL Foundation for broader adoption and contributions. As part of the oneAPI initiative, oneTBB integrates with other tools to support heterogeneous computing ecosystems.

Naming and Evolution

Threading Building Blocks (TBB) was originally introduced by Intel in 2006 as a C++ library for simplifying parallel programming on multi-core processors. Released as open-source software under the GNU General Public License version 2 (GPLv2) with runtime exceptions in 2007 to encourage broader adoption and community contributions, it quickly gained traction for its high-level abstractions that hid low-level threading details. In 2020, Intel rebranded TBB as oneAPI Threading Building Blocks (oneTBB) as part of the oneAPI Base Toolkit, reflecting its expanded role within the broader oneAPI ecosystem for unified programming across architectures, including CPUs, GPUs, and other accelerators. This shift broadened its scope from CPU-focused parallelism to supporting cross-platform, standards-based development, while maintaining source compatibility with prior TBB versions where possible. By the time of the oneTBB rebranding, the library had adopted the more permissive Apache License 2.0, facilitating integration into diverse projects without copyleft restrictions. In 2023, governance of oneTBB transferred to the UXL Foundation, an open industry consortium hosted by the Linux Foundation, to promote community-driven development and multi-vendor collaboration on accelerated computing standards. As of November 2025, oneTBB remains actively maintained under UXL Foundation oversight, with the latest release being version 2022.3 (October 2025), incorporating enhancements such as simplified combined use of task_arena and task_group, restored custom assertion handler support, and compatibility with Python 3.13. These updates underscore its ongoing adaptation to modern heterogeneous systems while preserving its core task-based parallelism model.

History and Development

Origins at Intel

Threading Building Blocks (TBB) was initiated around 2005 by Intel's Software Solutions Group as a response to the emerging challenges of programming multi-core processors, following the transition from single-core designs of the Pentium 4 era to multi-core architectures such as the Core Duo introduced in 2006. This development effort aimed to provide C++ developers with higher-level abstractions for parallelism, addressing the increasing complexity of manual threading approaches prevalent at the time. The project was motivated by limitations in existing tools, including the low-level intricacies of native threading APIs (such as POSIX threads and Windows threads) and the constraints of OpenMP in handling nested parallelism and scalability on multi-core systems. By focusing on task-based models rather than explicit thread control, TBB sought to enable more intuitive and efficient parallel code that could adapt to varying core counts without platform-specific optimizations. The initial development was led by engineers including James Reinders, Intel's Chief Software Evangelist at the time, with key contributions from Arch Robison and others in Intel's Performance Analysis and Threading Lab. Reinders, drawing on his extensive experience in parallel computing, guided the effort to create a library that avoided low-level details while promoting portability across operating systems. Influences included established parallel programming patterns from research, such as the work-stealing scheduler from MIT's Cilk project for load balancing, alongside data-parallel concepts from languages like NESL, ensuring TBB could support generic algorithms and concurrent data structures without tying developers to specific hardware. This approach emphasized composability, allowing building blocks like parallel loops and pipelines to be combined modularly, in contrast to the rigid structures often required by OpenMP or raw threading APIs.
TBB's first public release came in mid-2006, initially supporting Windows and Linux platforms to demonstrate its cross-platform viability amid the rapid adoption of multi-core CPUs. This early version focused on core features like scalable task scheduling and loop-parallelization templates, providing developers with tools to exploit multi-core performance without deep expertise in concurrency primitives. The release marked a pivotal step in Intel's strategy to democratize parallel programming, building on internal threading-runtime prototypes explored around 2005.

Key Releases and Milestones

Threading Building Blocks (TBB) version 1.0, released in 2006, marked the initial stable release of the library, providing foundational support for task-based parallelism and introducing the work-stealing task scheduler to enable efficient loop-level parallelism on multi-core systems. In 2007, version 2.0 was released, which open-sourced the library under a dual GPLv2 (with runtime exception) and commercial license. Version 4.0, launched in 2011, introduced the flow graph interface for dependency- and pipeline-based parallelism over streaming data, alongside continued scalability work for non-uniform memory access (NUMA) architectures. The transition to oneTBB occurred in 2020, with the library distributed under the Apache License 2.0 as part of Intel's oneAPI initiative; governance later passed to the Unified Acceleration (UXL) Foundation, emphasizing cross-architecture portability. From 2021 to 2025, key milestones included version 2021.1, which restructured the API around modern C++ and added concurrent ordered containers to enhance thread-safe data structures for parallel access; the October 2025 release (version 2022.3) further extended task_arena capabilities, introduced a preview of dynamic task dependencies in task_group, and added support for Python 3.13.

Design Principles

Abstraction Models

Threading Building Blocks (TBB), now part of oneAPI as oneTBB, employs abstraction models that enable developers to express parallelism at a high level, shielding them from manual thread management and low-level scheduling details. The core task-based model represents programs as directed acyclic graphs (DAGs) of tasks, where each task is a fine-grained unit of independent work that can be executed concurrently. This model allows for dynamic scheduling, where the runtime automatically maps tasks to available threads, adapting to varying workloads and hardware configurations. A key mechanism in the task-based model is work-stealing, in which idle threads proactively "steal" tasks from the local queues of busy threads to maintain load balance and minimize idle time. This approach shifts scheduling overhead to underutilized threads, ensuring efficient resource use without requiring programmers to predict execution patterns. Tasks can include dependencies and continuations, managed through interfaces like the task_group class, which supports structured parallelism while preserving fork-join semantics. By treating tasks as the fundamental building blocks, the model excels in handling irregular or dynamic parallelism, such as in graph traversals or recursive algorithms. Building on the task-based foundation, pattern-based abstractions provide reusable templates for common parallel idioms, further simplifying code by encapsulating divide-and-conquer strategies. For example, the parallel_for template decomposes loop iterations into independent tasks, while parallel_reduce handles associative reductions with techniques like privatization to avoid race conditions. These patterns reduce the need for explicit thread creation and joining, allowing developers to focus on algorithmic logic rather than concurrency primitives. oneTBB includes a set of such optimized generic algorithms, designed to support nested parallelism and integrate seamlessly with the task scheduler.
Effective use of these abstractions hinges on the grain size concept, which balances task granularity to optimize performance by minimizing scheduling overhead relative to computation time. Ideally, tasks should execute for roughly 10,000 to 100,000 CPU cycles (more than about 1 microsecond on typical hardware) so that the benefits of parallelism outweigh the costs of task creation and stealing. Developers can control grain size through partitioners in algorithms like parallel_for, which adjust chunk sizes (e.g., 100 iterations) based on workload characteristics, preventing excessive fine-grained tasks that could degrade scalability. To support scalable execution across these models, oneTBB promotes non-blocking synchronization mechanisms, prioritizing atomic operations and memory fences over locks to reduce contention and the serialization it induces. Standard C++ atomics (std::atomic) enable lock-free updates for shared variables, such as counters or flags, while avoiding issues like convoying (where threads queue behind a lock holder) and deadlocks. For data structures prone to false sharing, tools like tbb::cache_aligned_allocator ensure proper alignment, and fences provide ordering guarantees without full barriers. This emphasis on non-blocking techniques aligns with the library's goal of high-throughput parallelism on multicore systems.

Scalability and Portability

Threading Building Blocks (TBB), now known as oneTBB, employs a work-stealing task scheduler to achieve scalability across multi-core systems. This scheduler dynamically balances workloads by allowing idle threads to steal tasks from busy threads' local queues, ensuring efficient resource utilization without centralized coordination. The approach adapts automatically to varying numbers of CPU cores, from single-core setups to large-scale servers exposing hundreds of hardware threads. It also handles heterogeneous workloads effectively, where tasks vary in computational intensity, by distributing them based on thread availability. Portability is a core design goal of oneTBB, facilitated by its reliance on standard C++ templates and a lightweight runtime. The library provides header files for most functionality, with a minimal binary runtime library (libtbb) that links dynamically on supported platforms, reducing deployment overhead. It supports major operating systems including Windows 10/11 and Server editions, various Linux distributions (such as Ubuntu, Red Hat Enterprise Linux, SUSE Linux Enterprise Server, and Amazon Linux), and macOS. Additionally, the open-source distribution enables compilation on ARM architectures, including Apple M-series processors, through source builds that leverage standard C++ compilers. To address challenges in large-scale systems, oneTBB incorporates NUMA awareness through placement policies that bind tasks to specific cores or NUMA domains, minimizing remote memory access latency by respecting the system's memory topology, which is detected via the hwloc library when available. Users can configure placement through task_arena constraints, such as restricting an arena to a particular NUMA domain, which enhances data locality on multi-socket servers. Recent releases have improved NUMA support, ensuring better distribution of work across nodes.
Performance evaluations demonstrate oneTBB's potential for near-linear speedup on multi-core processors for balanced workloads, as seen in applications achieving high gigaflop rates with minimal deviation from ideal scaling. The scheduler's design keeps overhead low by reducing contention and synchronization costs, though fine-grained tasks may introduce scheduling overhead that can be mitigated through chunking. In practice, this enables efficient parallelism on systems with up to dozens of cores, with overhead typically dominated by application-specific factors rather than the runtime itself.

Core Components

Task Scheduler

The task scheduler in oneAPI Threading Building Blocks (oneTBB) serves as the runtime engine responsible for managing and executing parallel tasks across multiple threads, enabling efficient load balancing and scalability on multicore systems. It operates by creating a pool of worker threads that process tasks non-preemptively, meaning once a thread begins executing a task, it completes it before moving to another. This design favors compute-intensive work over I/O-bound operations; blocking calls should be avoided within tasks because they stall a worker thread for the duration of the call. Central to the scheduler's architecture is the arena model, which logically groups worker threads into isolated arenas to manage concurrency and prevent interference between concurrent workloads. Each arena, represented by the task_arena class, maintains its own concurrency settings, allowing users to specify parameters such as the maximum number of threads, affinity to specific cores or NUMA nodes, and concurrency limits for fine-grained control. For instance, a custom task_arena can restrict execution to a subset of threads, ensuring resource isolation in multi-application environments or on heterogeneous hardware. This model supports composability by enabling nested use of arenas, where work submitted inside one arena can execute in another with its own settings, facilitating modular code. Tasks in the scheduler follow a defined lifecycle, beginning with creation as lightweight objects managed via higher-level constructs such as task_group. A task is enqueued into an arena using methods like enqueue, which schedules it for execution without blocking the calling thread. Upon dequeuing, a worker thread invokes the task's body, running the associated computation to completion. During execution, a task may spawn child tasks, which are added to the worker's local deque for potential parallel processing; the parent task then waits for children to finish before concluding, using reference counting to track dependencies.
This lifecycle ensures tasks remain small and efficient, with the scheduler handling allocation and deallocation to minimize overhead. Load balancing is achieved through a work-stealing algorithm, where each worker maintains a local double-ended queue (deque) of tasks. Newly spawned tasks are pushed onto the front of the owner's deque in a last-in, first-out (LIFO) manner, enabling fast, mostly lock-free local access. When a worker becomes idle, it attempts to steal tasks from the tail (back) of another worker's deque, selected randomly, approximating a first-in, first-out (FIFO) order that tends to take larger, older tasks and so amortizes stealing costs. This decentralized approach reduces contention and adapts dynamically to workload imbalances, with stolen tasks often being older and less cache-hot, minimizing disruption of the victim's data locality. The algorithm's largely lock-free implementation ensures high throughput, though it may introduce temporary imbalances during steals. The scheduler also provides non-preemptive priority support through task_group_context, which associates priority levels with groups of related tasks for deadline-sensitive applications. Priorities (low, normal, and high) are set via set_priority on a task_group_context object and propagate through nested contexts in a hierarchy. When multiple tasks are ready, the scheduler dequeues higher-priority ones first, but does not preempt executing tasks, relying instead on task completion points for rebalancing. This is useful for prioritizing urgent computations without full preemption overhead, though it requires careful grouping to avoid priority inversion.

Parallel Algorithms

Threading Building Blocks (TBB), now known as oneAPI Threading Building Blocks (oneTBB), provides a set of high-level parallel algorithms that abstract common parallel patterns, enabling developers to express parallelism without managing low-level thread details. These algorithms leverage the underlying task scheduler to distribute work across available threads, ensuring scalability on multi-core systems. The core parallel algorithms include parallel_for for independent iterations, parallel_reduce for aggregations, parallel_scan for prefix computations, and parallel_pipeline for staged processing, each designed for specific workload characteristics such as data parallelism or streaming. The parallel_for algorithm executes a loop body over a range of indices in parallel, automatically partitioning the range into subranges for concurrent execution by multiple threads. It uses a range concept, such as blocked_range<T>, to define the iteration space, and a user-provided functor as the body that operates on each subrange. For example, the syntax is parallel_for(const blocked_range<T>& range, Body body);, where the body implements operator()(const Range& r) const to process elements from r.begin() to r.end(). This approach supports nested parallelism and load balancing through recursive subdivision, making it suitable for compute-bound loops.
```cpp
#include <oneapi/tbb.h>

using namespace oneapi::tbb;

class ParallelBody {
public:
    void operator()(const blocked_range<int>& r) const {
        for (int i = r.begin(); i != r.end(); ++i) {
            // Process element i
        }
    }
};

int main() {
    parallel_for(blocked_range<int>(0, 100), ParallelBody());
    return 0;
}
```
In this example, the range 0 to 99 is partitioned and processed concurrently. The parallel_reduce algorithm combines iteration with reduction, applying a functor over a range while aggregating results using an associative operation, such as addition. It splits the range recursively, executes the body on subranges in parallel, and merges partial results via a join method in the body object. The imperative syntax is void parallel_reduce(const Range& range, Body& body);, where the body must support splitting, execution, and joining so that partial results combine correctly. This is ideal for operations like computing sums, where join applies an operator like +. For instance, to sum an array:
```cpp
#include <oneapi/tbb.h>

using namespace oneapi::tbb;

class SumBody {
    float* const my_array;
public:
    float value;
    SumBody(float* arr) : my_array(arr), value(0.0f) {}
    SumBody(SumBody& other, split) : my_array(other.my_array), value(0.0f) {}
    void operator()(const blocked_range<size_t>& r) {
        for (size_t i = r.begin(); i != r.end(); ++i) {
            value += my_array[i];
        }
    }
    void join(const SumBody& other) { value += other.value; }
};

float sum_array(float* array, size_t n) {
    SumBody body(array);
    parallel_reduce(blocked_range<size_t>(0, n), body);
    return body.value;
}
```
The splitting constructor enables parallel execution, and join combines results associatively. The parallel_scan algorithm computes a parallel prefix (scan) over a range, applying an associative operation cumulatively while preserving order, useful for tasks with data dependencies such as running sums. It employs a two-phase approach: an upward pre-scan to compute partial sums and a downward final scan to distribute them, potentially invoking the operation up to twice as many times as a serial version in exchange for parallelism. Syntax includes an imperative form, void parallel_scan(const Range& range, Body& body);, and a functional form, Value parallel_scan(const Range& range, const Value& identity, const Scan& scan, const Combine& combine);, with the body or functors defining the scan logic. This supports both inclusive and exclusive scans efficiently on multi-core processors. In the imperative form, the body class implements methods for pre-scan, final-scan, and reverse-join to handle the up-down sweeps. The parallel_pipeline algorithm processes a data stream through a series of stages (filters) in parallel, optimizing for I/O-bound or pipelined workloads by overlapping execution across threads. It takes a maximum number of in-flight tokens (a buffer limit) and a chain of filter objects, each specifying an execution mode such as filter_mode::serial_in_order for ordered serial processing or filter_mode::parallel for independent items. The syntax is void parallel_pipeline(size_t max_number_of_live_tokens, const filter<void, void>& filter_chain);, where filters created with make_filter are composed with operator& and each filter transforms input items into output items. This enables high throughput for tasks like image processing pipelines, with automatic scheduling to minimize stalls. An example for text processing:
```cpp
#include <oneapi/tbb.h>
#include <string>

using namespace oneapi::tbb;

void process_text(const char* input_file, const char* output_file) {
    parallel_pipeline(4,
        make_filter<void, std::string>(filter_mode::serial_in_order,
            [input_file](flow_control& fc) -> std::string {
                // Reader: Read lines from input_file
                // Return the next line, or call fc.stop() at EOF
                return read_line(input_file, fc);  // Placeholder for actual read
            }) &
        make_filter<std::string, std::string>(filter_mode::parallel,
            [](const std::string& line) -> std::string {
                // Transform: Process each line in parallel
                return transform_line(line);  // Placeholder for actual transform
            }) &
        make_filter<std::string, void>(filter_mode::serial_in_order,
            [output_file](const std::string& line) {
                // Writer: Write processed line to output_file
                write_line(output_file, line);  // Placeholder for actual write
            })
    );
}
```
Here, the reader and writer stages handle I/O serially in order, while the transform stage processes tokens in parallel.

Advanced Features

Data Parallelism Tools

Data parallelism tools in Intel oneAPI Threading Building Blocks (oneTBB) provide thread-safe data structures and mechanisms to enable efficient processing of large datasets across multiple cores, minimizing overhead for scalable performance. These tools focus on concurrent access patterns common in data-intensive applications, such as simulations and image processing workloads. By leveraging lock-free or fine-grained locking techniques, they reduce contention and allow developers to express data-parallel operations without explicit locking. Concurrent containers form the core of oneTBB's data parallelism support, offering resizable, thread-safe alternatives to standard C++ containers. The concurrent_vector is a dynamic array that supports concurrent push_back operations from multiple threads, featuring segmented growth so existing elements never move during resizing. It provides operations like push_back, grow_by, size, and random access via operator[], making it ideal for scenarios where elements are appended in parallel, such as building result sets in numerical computations. The concurrent_queue implements a first-in-first-out (FIFO) structure suitable for producer-consumer patterns, offering non-blocking push and try_pop operations (with blocking variants provided by concurrent_bounded_queue); it scales from single-producer/single-consumer to multi-producer/multi-consumer use. Key methods include push, try_pop, and empty, enabling reliable data exchange in streaming applications. Similarly, the concurrent_hash_map is a hash-based associative container that supports scalable concurrent insertions, erasures, and lookups, with operations like insert, find, erase, and count optimized for high-concurrency environments such as caching or indexing large datasets. To address memory allocation bottlenecks in parallel code, oneTBB includes scalable allocators that reduce lock contention during frequent allocations.
The scalable_malloc function provides a drop-in replacement for standard malloc, distributing allocation requests across threads using per-thread caches and pooled management to minimize global heap contention. It pairs with scalable_free, and allocator behavior can be tuned through APIs such as scalable_allocation_mode, improving throughput in memory-bound parallel workloads, often by several times compared to system allocators in multi-threaded scenarios. SIMD integration in oneTBB enhances performance by combining task-based execution with vectorized computation. The parallel_for algorithm processes independent iterations over ranges, and when used with loops amenable to auto-vectorization (e.g., via Intel oneAPI DPC++/C++ Compiler pragmas like #pragma ivdep), it allows SIMD instructions to accelerate inner-loop operations on CPUs supporting AVX or similar extensions. For GPU offload, oneTBB integrates with oneAPI's SYCL programming model, enabling data-parallel kernels to be dispatched to accelerators while using oneTBB for host-side task orchestration.

Flow Graphs and Pipelines

The flow graph interface in oneAPI Threading Building Blocks (oneTBB) enables developers to model complex dependencies and asynchronous data flows using a graph-based parallelism model. Nodes represent computational units, connected via directed edges that define message-passing paths, allowing for dynamic execution where tasks activate only upon receiving inputs. The graph is constructed as a composable structure using the tbb::flow::graph class, which manages all associated tasks and ensures thread-safe operations. For instance, a basic graph can be built by declaring graph g;, adding nodes such as an input node input_node<message_type> src(g, generator); and a function node function_node<message_type, output_type> func(g, unlimited, user_function);, then connecting them with make_edge(src, func);, activating the source with src.activate();, and finally calling g.wait_for_all(); to synchronize completion. Nodes in the flow graph fall into three primary categories: function nodes, source nodes, and sink nodes. A function node processes incoming messages by invoking a user-defined functor, supporting configurable concurrency limits (e.g., unlimited for maximum parallelism or a fixed number for controlled execution). Source nodes generate and emit messages into the graph, often in a loop until exhaustion, while sink nodes receive and consume messages without producing outputs, suitable for final aggregation or I/O operations. Messages, which can be any copyable type, flow asynchronously between connected nodes, with the underlying task scheduler handling parallelism based on availability of inputs and threads. Pipelines extend the flow graph model for multi-stage, linear workflows, emphasizing sequential yet parallelizable stages with token-based progression. In oneTBB, pipelines are implemented via tbb::parallel_pipeline, where stages are defined as filter objects that pass tokens (data items) downstream.
Each serial filter processes one token at a time, with overall concurrency bounded by the number of tokens allowed in flight (e.g., parallel_pipeline(4, input_filter & serial_process_filter & output_filter);), preventing overload in memory-intensive stages. Examples include an input filter for loading data from external sources and a serial processing filter for non-parallelizable computations, ensuring tokens advance only after prior stages complete. This design supports assembly-line efficiency, where multiple in-flight tokens enable overlapping execution across stages. Specialized nodes like join and split facilitate stream manipulation within graphs. A join_node merges inputs from multiple predecessor nodes into a single std::tuple, buffering messages according to policies such as reserving, queueing, or key-matching until all required elements arrive, then broadcasting the tuple to successors. Conversely, a split_node receives a tuple and broadcasts each element to a dedicated output port, enabling fan-out to parallel branches without altering the data. These nodes support irregular topologies by handling variable-rate inputs and outputs. Flow graphs and pipelines are best suited for irregular, dependency-driven workloads, such as streaming analytics or event-driven simulations, where execution order depends on data availability rather than uniform partitioning. They are less optimal for pure data parallelism scenarios, like uniform loops, due to the overhead of dynamic scheduling and message buffering.

Usage and Integration

Programming Interfaces

To integrate Intel oneAPI Threading Building Blocks (oneTBB) into C++ projects, developers include the primary header <oneapi/tbb.h>, which provides access to core parallelism constructs such as algorithms, containers, and task scheduling interfaces. This header encapsulates the library's template-based API, enabling thread-safe parallel execution without direct thread management. Post-2019 releases under the oneAPI umbrella standardize this inclusion, replacing earlier TBB-specific headers for broader ecosystem compatibility. Compilation requires a C++ compiler supporting at least the C++11 standard, though C++17 is recommended for advanced features like the parallel STL extensions. Typical flags include -std=c++17 and linking against the TBB runtime via -ltbb on Unix-like systems, resulting in linkage to libtbb.so or the platform's equivalent dynamic library. For build systems like CMake, integration is facilitated by find_package(TBB REQUIRED) followed by target_link_libraries(your_target TBB::tbb), which automatically handles include paths and dependencies. Environment setup involves configuring execution contexts for controlled parallelism. The task_arena class allows explicit thread limits, such as task_arena(4) to cap concurrency at four threads, isolating workloads and preventing oversubscription on multicore systems. Exception handling is managed through task_group, which supports try-catch propagation across parallel tasks, ensuring that exceptions thrown in one task unwind the group execution safely. oneTBB maintains full compatibility with the C++ Standard Template Library (STL), allowing seamless use of standard containers like std::vector within parallel algorithms without requiring custom allocators in most cases. This design supports incremental parallelism, where developers can parallelize existing serial code by wrapping loops or reductions in oneTBB primitives, avoiding modifications to the underlying sequential logic.

Basic Code Examples

Threading Building Blocks (TBB), now known as oneTBB, provides high-level parallel algorithms that simplify concurrent programming in C++. The parallel_for algorithm is a core component that divides a range into subranges and processes them concurrently across multiple threads, enabling efficient parallel iteration over data structures like arrays and vectors. A basic example of parallel_for computes the squares of indices and stores them in a vector. This demonstrates how lambda functions can be used to define the body of the loop, where each thread handles a portion of the range without explicit synchronization.
cpp
#include <oneapi/tbb.h>
#include <vector>

int main() {
    const size_t N = 1000;
    std::vector<int> vec(N);
    oneapi::tbb::parallel_for(size_t(0), N, [&](size_t i) {
        vec[i] = i * i;
    });
    return 0;
}
This code initializes a vector of size N and fills it with squared values in parallel, leveraging the task scheduler to balance workload across available cores. The parallel_reduce algorithm performs a reduction, such as a summation, over a range by recursively splitting the range, computing partial results in parallel, and combining them using a specified binary operation. It is particularly useful for associative operations like summing elements. For instance, to sum the elements of an array, the following uses parallel_reduce with a lambda for the partial sums and std::plus for combining results:
cpp
#include <oneapi/tbb.h>
#include <functional>  // std::plus

float ParallelSum(float array[], size_t n) {
    return oneapi::tbb::parallel_reduce(
        oneapi::tbb::blocked_range<size_t>(0, n),
        0.0f,
        [&](const oneapi::tbb::blocked_range<size_t>& r, float running_sum) {
            for (size_t i = r.begin(); i != r.end(); ++i) {
                running_sum += array[i];
            }
            return running_sum;
        },
        std::plus<float>()
    );
}
This approach ensures thread-safe accumulation without manual locking, as the reduction handles splitting and merging internally. oneTBB's concurrent_vector is a thread-safe dynamic array that supports concurrent modifications, such as push_back, from multiple threads without data races, making it suitable for collecting results in parallel. Unlike standard vectors, it uses internal segmented storage to allow safe growth during parallel operations. An example integrates concurrent_vector with parallel_for to append values concurrently:
cpp
#include <oneapi/tbb.h>

int main() {
    oneapi::tbb::concurrent_vector<int> cv;
    oneapi::tbb::parallel_for(0, 100, [&](int i) {
        cv.push_back(i * 2);  // Safe concurrent push_back
    });
    // cv now contains even numbers from 0 to 198
    return 0;
}
This code populates the vector with doubled indices in parallel, relying on the container's operations for consistency. Error handling in oneTBB often involves the task_group class, which allows grouping tasks and propagating exceptions via try-catch blocks, while cancellation provides a mechanism to terminate ongoing tasks gracefully. Exceptions thrown within tasks are captured and rethrown after all tasks complete, ensuring cleanup. A simple demonstration uses task_group with try-catch for exception handling and cancellation:
cpp
#include <oneapi/tbb.h>
#include <stdexcept>
#include <iostream>

int main() {
    oneapi::tbb::task_group g;
    try {
        g.run([&] {
            // Simulate work that may fail
            if (true) {  // Condition for error
                throw std::runtime_error("Task failed");
            }
        });
        g.wait();
    } catch (const std::exception& e) {
        std::cout << "Caught: " << e.what() << std::endl;
        // The group's context is cancelled automatically when an exception
        // propagates; cancel() can also be called explicitly while tasks run.
        g.cancel();
    }
    return 0;
}
This structure catches the exception rethrown by g.wait(), prints the error, and illustrates explicit cancellation, which stops tasks that are still running, promoting robust parallel execution.

Performance and Optimization

Scalability Analysis

The scalability of oneAPI Threading Building Blocks (oneTBB) is fundamentally influenced by Amdahl's law, which posits that the maximum speedup achievable in parallel execution is limited by the fraction of the program that remains serial, expressed as speedup = 1 / [(1 - P) + P/N], where P is the parallelizable portion and N is the number of processors. In oneTBB, this limitation is mitigated through the use of fine-grained tasks managed by its work-stealing scheduler, which decomposes workloads into small, independent units to maximize P by minimizing serial overhead from task creation and synchronization. In well-decomposed workloads, oneTBB can achieve a parallel fraction approaching 99%, enabling near-linear scaling on multi-core systems by dynamically balancing load across threads. Empirical benchmarks demonstrate oneTBB's strong scalability for balanced workloads on multi-core processors. In tasks using fine-grained decomposition, oneTBB delivers speedups of up to 28.7 times on 32-core systems compared to static thread assignments, with parallel efficiency reaching 70-90% for compute-bound operations due to effective load balancing. Similar results appear in benchmarks like BlackScholes, where oneTBB attains a 19-fold speedup on 16 cores, though efficiency can drop if task granularity is not optimized, highlighting the importance of workload balance for sustained performance. Key bottlenecks in oneTBB's scalability arise from hardware constraints on large-scale systems. Memory bandwidth limitations become prominent in data-intensive tasks, as multiple cores compete for shared bandwidth, leading to reduced throughput beyond 16-32 cores in bandwidth-bound scenarios. False sharing exacerbates this when threads inadvertently modify data in the same cache line, causing unnecessary cache invalidations and coherence traffic; oneTBB's scalable memory allocator helps mitigate this by aligning allocations to avoid such overlaps.
On NUMA architectures, remote memory accesses across nodes introduce latency penalties, potentially halving effective bandwidth if threads access non-local allocations, necessitating affinity-aware task pinning for optimal performance. As of 2025, oneTBB has incorporated enhancements to further improve scalability, particularly in synchronization primitives. The adaptive mutex implementation spins briefly before blocking on contended locks, reducing context-switch overhead and lock contention in high-concurrency scenarios compared to traditional blocking mutexes. Recent updates in version 2022.3, maintained through 2025, optimize mutexes such as queuing_mutex with test-and-test-and-set operations, enhancing overall scalability on multi-core and NUMA systems by lowering synchronization costs in parallel algorithms.

Best Practices

Effective use of Intel oneAPI Threading Building Blocks (oneTBB) requires careful attention to task granularity to balance parallelism overhead against load imbalance. For parallel algorithms like parallel_for, tuning the grain size ensures that subranges processed by individual tasks execute in approximately 100,000 clock cycles, which typically corresponds to 30-100 microseconds on modern processors, avoiding excessive scheduling costs while maintaining scalability. This can be achieved by specifying a grainsize parameter in the range object; for instance, if each iteration takes about 100 cycles, a grainsize of 1000 yields suitable chunks. For irregular workloads where iteration times vary significantly, the simple_partitioner is recommended to enforce fixed chunk sizes based on the specified grainsize, preventing over-partitioning and promoting predictable load distribution, though experimentation may be needed to optimize performance. Oversubscription, where the number of active threads exceeds available cores, can degrade performance by increasing context-switching overhead; oneTBB's scheduler mitigates this by default by sizing its worker pool to match the available hardware threads. To explicitly control concurrency and avoid oversubscription in custom scenarios, construct a task_arena with max_concurrency set to the number of physical cores, ensuring worker threads do not exceed hardware limits. Timing measurements using tbb::tick_count facilitate monitoring of task durations and overall execution, allowing developers to verify that grain sizes align with target latencies and detect bottlenecks from excessive thread creation. In hybrid applications combining oneTBB with other parallelism models like OpenMP, leverage oneTBB for fine-grained, dynamic tasking in irregular sections while reserving OpenMP for coarse-grained, static loop parallelism to minimize runtime conflicts and optimize resource utilization.
Proper nesting and affinity coordination are essential, as both libraries can share the same cores when configured compatibly, enhancing throughput without introducing oversubscription. For debugging, enable oneTBB's debug features by defining TBB_USE_DEBUG during compilation, which activates additional assertions and checks for issues like invalid task dependencies that may lead to deadlocks. The CMake option TBB_STRICT=ON, used when building the library itself, treats compiler warnings as errors, enforcing stricter compliance. For performance profiling, integrate Intel VTune Profiler, which supports oneTBB-specific analyses such as scheduling overhead and thread utilization, providing insights into concurrency efficiency and hotspots. These practices, when applied judiciously, help achieve robust, high-performance parallel code with oneTBB.

Licensing and Availability

Open Source Transition

Threading Building Blocks (TBB) was released by Intel in 2007 as open-source software under the GPL v2 license with the runtime exception, allowing free access to the source code and limited redistribution, while a commercially licensed version shipped in Intel's developer tool suites. In 2017, TBB adopted the Apache License 2.0 to facilitate broader compatibility. In 2020, it was rebranded as oneAPI Threading Building Blocks (oneTBB) as part of the oneAPI initiative and hosted on GitHub under Intel's oneapi-src organization. In 2023, governance of oneTBB shifted to the UXL Foundation, a Linux Foundation-hosted organization dedicated to unified acceleration standards, with the repository migrating to uxlfoundation/oneTBB to promote multi-vendor collaboration and neutral stewardship. This move aligned oneTBB with the oneAPI specification under open governance, ensuring long-term sustainability beyond Intel's sole control. The open-source model has fostered significant community involvement, with 124 contributors participating via GitHub as of November 2025, submitting pull requests that undergo code review before integration. Periodic releases, the latest being version 2022.3.0 as of October 2025, incorporate these contributions, including bug fixes and performance improvements, while maintaining backward compatibility where possible. This evolution has driven broader adoption among developers for parallel programming tasks, as evidenced by increased usage in open-source projects and integrations with modern C++ ecosystems. Community efforts have enabled extensions like enhanced support for C++ coroutines, allowing seamless integration with asynchronous programming patterns in complex applications.

Supported Platforms and Ecosystems

Threading Building Blocks, now known as oneTBB, supports a range of modern operating systems to enable parallel programming across diverse environments. On Windows, it is compatible with recent client and server releases, including Windows Server 2025. Linux distributions with full support include Amazon Linux 2023; Red Hat Enterprise Linux 8, 9, and 10; SUSE Linux Enterprise Server 15 SP4 through SP7; Ubuntu 22.04, 24.04, and 25.04; Debian 11 and 12; and Fedora 41 and 42; additionally, Windows Subsystem for Linux 2 (WSL 2) supports Ubuntu and SLES configurations. For macOS, compatibility extends to versions 13.x, 14.x, and 15.x. Compiler support ensures broad integration into C++ development workflows. oneTBB works with the Intel oneAPI DPC++/C++ Compiler, Microsoft Visual C++ 14.2 (from Visual Studio 2019) and 14.3 (from Visual Studio 2022) on Windows, the GNU Compiler Collection (GCC) versions 8.x through 15.x paired with glibc 2.28 through 2.41 on Linux, and Clang versions 7.x through 20.x across supported platforms. These compilers allow developers to leverage oneTBB's task-based parallelism without platform-specific modifications. Hardware compatibility targets Intel processor families such as Core and Xeon, while also supporting non-Intel processors that adhere to compatible x86 architectures. As of 2025, oneTBB provides full support for advanced instruction set extensions such as AVX-512 on capable hardware, enabling optimized vectorized operations in parallel tasks. Containerized builds are facilitated through Docker images, allowing seamless deployment in cloud and CI environments. Within ecosystems, oneTBB integrates as a core component of the Intel oneAPI toolkit, complementing SYCL and Data Parallel C++ (DPC++) by providing task parallelism on CPUs. It is utilized in high-performance computing (HPC) applications, such as configurations with the PETSc library for scalable scientific simulations. This compatibility extends to standard C++ libraries, promoting its use in broader software stacks for parallel algorithm implementation.

    Oct 9, 2023 · [petsc-users] Configuration of PETSc with Intel OneAPI and Intel MPI fails. Richter, Roland Roland.Richter at empa.ch