Threading Building Blocks (TBB)
Intel Threading Building Blocks (TBB), now commonly branded as oneTBB, is a C++ library for task-based parallel programming developed by Intel. It helps developers use multi-core processors efficiently without managing threads by hand. The library consists of data structures and algorithms that let a programmer avoid some of the complications of native threading packages such as POSIX threads, Windows threads, or the portable Boost Threads, in which individual threads of execution are created, synchronized, and terminated manually. Instead, the library abstracts access to the multiple processors by treating operations as "tasks", which the library's run-time engine allocates to individual cores dynamically, and by automating efficient use of the CPU cache. A TBB program creates, synchronizes, and destroys graphs of dependent tasks according to algorithms, i.e. high-level parallel programming paradigms (a.k.a. algorithmic skeletons). Tasks are then executed in an order that respects the graph dependencies. This approach places TBB in a family of parallel programming solutions that aim to decouple the program from the particulars of the underlying machine.
Why do we use TBB?
TBB is used to:
• Simplify parallel programming
• Improve performance on multi-core CPUs
• Avoid manual thread creation and synchronization
• Build scalable applications
When should you use TBB?
Use TBB when:
• You are writing performance-critical C++ applications
• Your workload can be broken into independent tasks
• You need better scalability than simple threading
Typical use cases:
• Game engines
• Scientific computing
• Image/video processing
• Financial modeling
• High-performance computing (HPC)
Key Features
1. Task-Based Parallelism
Instead of creating and managing threads yourself, you describe the work as tasks; TBB maps those tasks onto worker threads and balances the load dynamically.
2. Parallel Algorithms
Built-in parallel versions of common patterns:
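#include <tbb/parallel_for.h>

const int N = 1000;  // number of iterations (example value)
tbb::parallel_for(0, N, [](int i) {
    // work on index i; different indices may run in parallel
});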
3. Work Stealing Scheduler
Idle threads “steal” tasks from busy ones
Improves CPU utilization and scalability
4. Scalable Memory Allocator
Optimized memory management for multithreaded programs
5. Concurrent Containers
Thread-safe data structures:
• concurrent_vector
• concurrent_hash_map
Key Components
• parallel_for, parallel_reduce
• Task scheduler
• Task groups
• Concurrent containers
• Memory allocator
Advantages
• High performance and scalability
• Automatic load balancing (work stealing)
• Portable across platforms
• No need to manage threads manually
• Often more flexible than OpenMP for nested or irregular parallelism
Disadvantages
• Learning curve for task-based thinking
• More complex than OpenMP for simple cases
• Requires external library setup
• Debugging parallel tasks can be tricky
More details about TBB
Intel® Threading Building Blocks (Intel® TBB) lets you write parallel C++ programs that take advantage of multicore performance, that are portable and composable, and that keep scaling as core counts grow. Intel® TBB 4.2 is a widely used C++ library (its scalable memory allocator also offers a C interface) for creating high-performance, scalable parallel applications.
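TBB is a collection of components for parallel programming. Some of the names below come from older releases: oneTBB has dropped parallel_while and parallel_do (use parallel_for_each) and the tbb::atomic operations (use std::atomic).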
• Basic algorithms: parallel_for, parallel_reduce, parallel_scan
• Advanced algorithms: parallel_while, parallel_do, parallel_pipeline, parallel_sort
• Containers: concurrent_queue, concurrent_priority_queue, concurrent_vector, concurrent_hash_map
• Scalable memory allocation: scalable_malloc, scalable_free, scalable_realloc, scalable_calloc, scalable_allocator, cache_aligned_allocator
• Mutual exclusion: mutex, spin_mutex, queuing_mutex, spin_rw_mutex, queuing_rw_mutex, recursive_mutex
• Atomic operations: fetch_and_add, fetch_and_increment, fetch_and_decrement, compare_and_swap, fetch_and_store
• Timing: portable, fine-grained global timestamp (tick_count)
• Task Scheduler: direct access to control the creation and activation of tasks
TBB implements task stealing to balance a parallel workload across the available processing cores, which increases core utilization and therefore scaling. The TBB task stealing model is similar to the work stealing model applied in Cilk. Initially, the workload is divided among the available processor cores. If one core finishes its work while other cores still have a significant amount of work queued, the idle core takes ("steals") some of that queued work from one of the busy cores. This dynamic capability decouples the programmer from the machine: applications written with the library can scale to use the available processing cores with no changes to the source code or the executable program file.