ROCm
ROCm (Radeon Open Compute) is an open-source software stack developed by Advanced Micro Devices (AMD) that enables GPU-accelerated computing for high-performance computing (HPC), artificial intelligence (AI), and heterogeneous workloads on AMD Graphics Processing Units (GPUs).[1] It provides a comprehensive ecosystem including drivers, runtime libraries, development tools, and APIs, allowing developers to program GPUs from low-level kernels to high-level applications while supporting multiple programming models such as HIP (Heterogeneous-compute Interface for Portability), OpenCL, and OpenMP.[1] Designed primarily for Linux, with growing support for Microsoft Windows, ROCm optimizes performance on AMD Instinct accelerators for data center use and extends support to AMD Radeon GPUs and Ryzen APUs for consumer and workstation applications.[1][2]
Originally released in 2016 with version 1.0, ROCm has evolved over nearly a decade to address the growing demands of AI and HPC, with leading enterprises and research institutions adopting it for scalable GPU computing.[3] Key components include specialized libraries such as MIOpen for machine learning, rocBLAS for linear algebra, and RCCL for collective communications, alongside tools like the ROCm Compute Profiler for performance analysis and HIPIFY for porting CUDA code to HIP.[1] Compilers like HIPCC and ROCm LLVM, combined with runtimes such as ROCR-Runtime, form the core architecture that ensures portability and compatibility with industry-standard frameworks.[1]
As of November 2025, the latest stable release is ROCm 7.1.0, which introduces enhancements in hardware monitoring via the AMD System Management Interface (AMD SMI), improved resiliency for AMD Instinct MI300X GPUs, and broader support for AI workloads through integrations with popular deep learning frameworks.[4][5] This version builds on prior releases like ROCm 7.0 from September 2025, emphasizing developer productivity, enterprise scalability, and open innovation in GPU programming.[6] ROCm's open-source nature, hosted on GitHub, fosters community contributions and customization, positioning it as a competitive alternative to proprietary platforms in the GPU computing landscape.[7]
Overview
Definition and Purpose
ROCm (Radeon Open Compute) is an open-source software platform developed by AMD for GPU-accelerated computing, comprising a comprehensive stack that includes drivers, runtimes, application programming interfaces (APIs), and libraries to enable heterogeneous computing on AMD GPUs.[3] Heterogeneous computing in this context refers to the integration of central processing units (CPUs) and graphics processing units (GPUs) to perform parallel processing tasks, allowing applications to offload compute-intensive operations from the host CPU to the GPU device for improved efficiency in data-parallel workloads.[8] This stack supports programming from low-level kernels to high-level end-user applications, fostering an ecosystem for developers to leverage AMD hardware in diverse computational scenarios.[9]
The primary purpose of ROCm is to offer an open-source alternative to proprietary GPU computing platforms, such as NVIDIA's CUDA, by providing portability and compatibility across AMD GPUs for high-performance computing (HPC), artificial intelligence (AI), machine learning, and graphics workloads.[3] By emphasizing open-source development, ROCm enables community contributions and reduces vendor lock-in, allowing developers to migrate code more easily between AMD and other ecosystems through tools like the Heterogeneous-compute Interface for Portability (HIP).[10] Its design prioritizes extracting optimal performance from HPC and AI applications, including large-scale model training and inference, while maintaining compatibility with standard deep learning frameworks.[11]
Key features of ROCm include its modular architecture, which allows independent development and integration of components, and its predominantly open-source nature under permissive licenses such as MIT for most repositories, promoting widespread adoption and customization.[12] The platform primarily targets Linux operating systems like Ubuntu for full functionality, with growing support for Windows, including ROCm components and AI framework integrations as of 2025.[13][14] Furthermore, ROCm integrates seamlessly with popular frameworks such as PyTorch and TensorFlow, enabling mixed-precision training and scalable AI workflows through optimized libraries like MIOpen and RCCL.[15][16]
History and Versions
ROCm originated in 2016 as an open-source software platform developed by AMD to enable GPU-accelerated computing on its Radeon GPUs, initially targeting high-performance computing (HPC) workloads on Polaris architecture hardware, such as the Radeon RX 480.[17] The platform was first released on November 14, 2016, providing foundational support for OpenCL and introducing the Heterogeneous-compute Interface for Portability (HIP) to facilitate code portability from NVIDIA's CUDA ecosystem.[3] Early releases emphasized integration with the Heterogeneous System Architecture (HSA) standard for unified CPU-GPU programming.[17]
Subsequent milestones included the open-sourcing of additional components, such as the OpenCL runtime in May 2017, broadening community contributions and ecosystem development.[18] In December 2020, ROCm 4.0 introduced support for the CDNA architecture on Instinct MI100 GPUs and enhanced HIP features like cooperative groups, improving CUDA compatibility and expanding to more diverse workloads. This version also marked initial steps toward broader Radeon GPU integration, though primarily focused on professional hardware.
Version progression continued with ROCm 5.0 in February 2022, which delivered improved stability through bug fixes and better driver integration, alongside preliminary support for RDNA 2 consumer GPUs like the Radeon RX 6000 series for machine learning tasks.[19] ROCm 6.0, released in December 2023, enhanced AI capabilities with optimizations for FP8 data types in PyTorch, full support for Instinct MI300 GPUs, and expanded library compatibility for deep learning frameworks.[20] These updates reflected growing emphasis on AI alongside HPC, with performance gains in transformer models and broader OS support including Windows previews.
In September 2025, ROCm 7.0 represented a pivotal shift toward an AI-HPC hybrid ecosystem, delivering up to 3.8x performance uplifts in inference for large language models like DeepSeek compared to ROCm 6.0, full enablement of Instinct MI350 GPUs based on the CDNA 4 architecture, integration of Retrieval-Augmented Generation (RAG) tools for AI pipelines, and advanced enterprise features such as distributed inference and improved multi-GPU scaling.[21][22] This release underscored AMD's commitment to open innovation, with enhanced developer tools and ecosystem partnerships to compete in AI deployments while maintaining HPC roots.[23] ROCm 7.1.0, released on October 30, 2025, introduced enhancements in hardware monitoring via the AMD System Management Interface (AMD SMI), improved resiliency for AMD Instinct MI300X GPUs, and broader support for AI workloads through integrations with popular deep learning frameworks.[5]
Foundations
Heterogeneous System Architecture
Heterogeneous System Architecture (HSA) is an open industry standard developed to enable seamless integration of CPUs, GPUs, and other compute devices as peer processors within a unified computing environment.[24] It defines a programming model where heterogeneous components share a single coherent memory space, allowing applications to treat diverse hardware as a cohesive system without the traditional barriers of separate address spaces.[24] This architecture addresses key challenges in heterogeneous computing by promoting interoperability across devices from different vendors, thereby simplifying software development and enhancing overall system efficiency.[25]
Central to HSA are several key concepts that facilitate efficient resource utilization. Unified virtual addressing provides a consistent memory view across all agents, enabling pointers to reference data regardless of the hosting device and eliminating the need for explicit data transfers between CPU and GPU memory.[24] Fine-grained memory management allows for precise control over memory allocation and access permissions at the page level, supporting features like coherent regions with atomic operations and synchronization barriers to maintain data consistency during concurrent execution.[24] The agent-based programming model treats each compute unit—such as a CPU core or GPU compute unit—as an independent agent capable of initiating and managing workloads, which promotes scalable parallelism by dispatching tasks to the most suitable hardware with minimal overhead.[24]
In ROCm, HSA serves as the foundational layer for device interaction and kernel execution. The HSA standard defines the HSA Intermediate Language (HSAIL), a portable intermediate representation that allows compute kernels written in higher-level languages to be compiled into device-agnostic bytecode before finalization for specific hardware targets, although ROCm's compilers typically generate native GPU code objects directly.[26] The HSA runtime, implemented in ROCm through the ROCr library, manages device enumeration, queue creation, and signal handling, providing low-level APIs for applications to dispatch kernels and synchronize operations across agents.[26] This integration ensures that ROCm applications can interact with AMD GPUs as HSA-compliant agents, inheriting the standard's queuing and signaling protocols for robust heterogeneous execution.[26]
The adoption of HSA in ROCm yields significant benefits for heterogeneous workloads, particularly in enabling seamless collaboration between CPU and GPU without requiring explicit memory copies.[27] By utilizing unified memory spaces, developers can allocate data accessible by both processors, reducing latency and overhead associated with traditional data movement, which is especially advantageous for data-intensive applications like machine learning and scientific simulations.[27] Furthermore, HSA's support for scalable parallelism allows ROCm to efficiently distribute computations across multiple agents, improving throughput and power efficiency in diverse computing scenarios.[24]
Programming Paradigms
ROCm supports the Single Instruction Multiple Threads (SIMT) execution model, which enables efficient parallel processing on GPU architectures by executing the same instruction across multiple threads simultaneously, allowing data-parallel algorithms to map onto massively parallel hardware.[28] In this paradigm, developers launch kernels—functions that run on the GPU—as parallel tasks organized in a hierarchical structure: individual threads execute computations, grouped into thread blocks (or workgroups) that share resources, and multiple blocks form a grid for large-scale parallelism.[28] This model draws from established GPU computing concepts but is optimized for AMD hardware, where the co-scheduled groups of threads are called wavefronts and typically contain 64 threads on GCN and CDNA architectures (RDNA GPUs natively use 32-wide wavefronts), in contrast to the 32-thread warps of other ecosystems.[28]
A key aspect of ROCm's heterogeneous focus is its support for asynchronous execution, which allows non-blocking operations between the host CPU and GPU devices, enabling overlap of computation, data transfer, and synchronization to maximize throughput in diverse computing environments.[29] Stream-based parallelism further enhances this by organizing tasks into independent streams, where multiple kernels or memory operations can execute concurrently across devices without interference, facilitating efficient multi-device setups.[30] Error handling in such configurations involves runtime checks and events to detect and recover from issues like out-of-memory conditions or device failures, ensuring robust operation in heterogeneous systems that integrate CPUs, GPUs, and other accelerators.[31]
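A common way to implement these runtime checks in HIP code is to wrap API calls in an error-checking macro; the sketch below is illustrative (the HIP_CHECK macro name is an arbitrary convention, not part of the API) and uses only the public hipError_t interface.
#include <hip/hip_runtime.h>
#include <cstdio>
#include <cstdlib>

// Illustrative error-checking macro: aborts with file/line context when a HIP call fails.
#define HIP_CHECK(call)                                                        \
    do {                                                                       \
        hipError_t err_ = (call);                                              \
        if (err_ != hipSuccess) {                                              \
            std::fprintf(stderr, "HIP error %s at %s:%d\n",                    \
                         hipGetErrorString(err_), __FILE__, __LINE__);         \
            std::exit(EXIT_FAILURE);                                           \
        }                                                                      \
    } while (0)

int main() {
    int deviceCount = 0;
    HIP_CHECK(hipGetDeviceCount(&deviceCount));   // fails cleanly if no ROCm device is present
    std::printf("Visible HIP devices: %d\n", deviceCount);
    return 0;
}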
The evolution of ROCm's programming paradigms has progressed from low-level, assembly-like interfaces that provided fine-grained control over GPU resources to higher-level abstractions that prioritize developer productivity and code portability across hardware vendors.[32] This shift emphasizes avoiding vendor lock-in through standards-based models, such as those built on the Heterogeneous System Architecture (HSA), which unify memory and execution across CPU and GPU without explicit data copies.[33] Early ROCm versions focused on direct hardware access for performance tuning, while recent developments introduce portable layers that abstract hardware differences, enabling seamless migration of code between AMD and compatible platforms.[32]
Users approaching ROCm programming require familiarity with foundational parallel computing concepts, including thread blocks for local collaboration and warps for efficient instruction dispatch, adapted to AMD's optimizations like larger wavefronts for better utilization of compute units.[28] Understanding memory hierarchies is also essential: global memory offers high-capacity but higher-latency access shared across all threads, local (or group) memory provides faster shared access within thread blocks for reducing global traffic, and private memory per thread ensures isolation for scalar variables.[34] These elements form the prerequisites for leveraging ROCm's paradigms effectively, promoting scalable and efficient GPU-accelerated applications.[28]
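The kernel sketch below illustrates how these memory spaces appear in HIP source: a minimal block-wise sum in which inputs and outputs reside in global memory, a per-block scratch array uses shared (local) memory, and per-thread scalars occupy private registers. The 256-thread block size and buffer lengths are assumptions of the example, not requirements of ROCm.
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

// 'in' and 'out' live in global memory, 'partial' in per-block shared (LDS) memory,
// and 'x' in per-thread private registers. Assumes blockDim.x == 256.
__global__ void blockSum(const float *in, float *out, int n) {
    __shared__ float partial[256];                 // shared by one thread block
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    float x = (i < n) ? in[i] : 0.0f;              // private, per-thread value
    partial[tid] = x;
    __syncthreads();
    // Tree reduction within the block using shared memory.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) partial[tid] += partial[tid + stride];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = partial[0];    // one global write per block
}

int main() {
    const int n = 4096, threads = 256, blocks = n / threads;
    std::vector<float> h(n, 1.0f);
    float *dIn, *dOut;
    hipMalloc(&dIn, n * sizeof(float));
    hipMalloc(&dOut, blocks * sizeof(float));
    hipMemcpy(dIn, h.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipLaunchKernelGGL(blockSum, dim3(blocks), dim3(threads), 0, 0, dIn, dOut, n);
    std::vector<float> partials(blocks);
    hipMemcpy(partials.data(), dOut, blocks * sizeof(float), hipMemcpyDeviceToHost);
    std::printf("first block sum = %f\n", partials[0]);   // expected: 256.0
    hipFree(dIn);
    hipFree(dOut);
    return 0;
}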
Hardware Support
Professional GPUs
ROCm provides comprehensive support for AMD's Instinct MI series GPUs, which are designed for datacenter and high-performance computing (HPC) environments, particularly in artificial intelligence (AI) and large-scale simulations.[13] The supported families include the MI300 series, such as the MI300X and MI325X based on the CDNA 3 architecture, and the MI350 series, including models like the MI350X and MI355X utilizing the advanced CDNA 4 architecture.[35] These GPUs are optimized for high-bandwidth memory (HBM) configurations, with the MI350 series featuring up to 288 GB of HBM3E memory to handle massive datasets in AI training and inference workloads.[36] Additionally, they incorporate specialized matrix cores for accelerated tensor operations, enabling efficient processing of deep learning models and scientific computations.[23]
Key features of ROCm on these professional GPUs include full integration of the software stack, supporting high-precision floating-point operations such as FP64 for demanding HPC applications like climate modeling and molecular dynamics.[37] Multi-GPU scaling is facilitated through AMD Infinity Fabric technology, which provides high-speed, low-latency interconnects between GPUs, allowing seamless data sharing and load balancing across multiple accelerators in a single node or cluster.[38] This enables configurations like eight-GPU systems with coherent memory access, enhancing scalability for distributed AI training.[38]
In 2025, ROCm 7.0 introduced full enablement for the MI350 series, marking a significant advancement in AI infrastructure support.[35] Released in September 2025, this version delivers up to 3.5x faster inference performance compared to ROCm 6.0 on models like Llama 3.1 and DeepSeek R1, achieved through optimizations in inference engines such as vLLM and SGLang.[23][39] ROCm 7.1, released in October 2025, builds on these advancements with improved resiliency for AMD Instinct MI300X GPUs and enhancements in hardware monitoring.[5]
ROCm deployment on Instinct GPUs is limited to enterprise Linux distributions, including Ubuntu 24.04, Red Hat Enterprise Linux 9, and SUSE Linux Enterprise Server 15, to ensure stability in production environments.[40] It does not support interoperability with consumer graphics cards, focusing exclusively on compute-oriented datacenter hardware.[13]
Consumer GPUs
ROCm provides experimental and preview-level support for AMD's consumer Radeon GPUs based on the RDNA architectures, enabling compute workloads on desktop systems at a lower cost compared to professional Instinct series hardware. Supported architectures include RDNA 2 (gfx1030, such as the Radeon RX 6000 series), RDNA 3 (gfx1100 and gfx1101, such as the Radeon RX 7000 series), and partial support for RDNA 4 (gfx1200 and gfx1201, such as select Radeon RX 9000 series models starting with ROCm 6.4.1 and expanded in ROCm 7.0).[13][41] This support focuses on compute-only operations, excluding graphics or display rendering during execution, which limits configurations where the GPU is attached to a display for simultaneous visual output.[13]
Key features on these consumer GPUs include basic HIP (Heterogeneous-compute Interface for Portability) for porting CUDA code and OpenCL for parallel computing, allowing developers to run applications without full enterprise-level optimization. However, precision support is reduced: double-precision floating-point (FP64) operations are available but run at a small fraction of FP32 throughput (approximately 1/32 on RDNA architectures), making them poorly suited to high-precision scientific simulations that rely on the much higher FP64 rates of professional GPUs. Multi-GPU configurations are in preview status with limited validation, supporting up to two simultaneous compute workloads but prone to errors like GPU resets or out-of-memory issues in demanding scenarios, contrasting with the robust scalability of Instinct accelerators.[2][42][43]
Primary use cases for ROCm on consumer Radeon GPUs involve entry-level AI and machine learning tasks on desktops, such as local inference for large language models (e.g., via PyTorch or TensorFlow integrations) and lightweight training for personal development workflows. These enable accessible experimentation with generative AI, such as running Hugging Face models for content creation or basic scientific computing, though performance caveats include intermittent crashes during extended runs and the lack of backward-pass support for ML training on Windows.[44][42] In 2025, developments like ROCm 7.0 expanded RDNA 4 compatibility and added Windows preview support for Radeon GPUs, broadening accessibility for AI enthusiasts while remaining secondary to the more mature Instinct ecosystem for production-scale deployments. ROCm 7.1 further introduces initial support for select Ryzen APUs.[45][46][2]
System Requirements
ROCm primarily supports Linux operating systems, with official compatibility for distributions including Ubuntu 24.04.3 and 22.04.5, Red Hat Enterprise Linux (RHEL) 10.0, 9.6, 9.4, and 8.10, SUSE Linux Enterprise Server (SLES) 15 SP7, Debian 13 and 12, Rocky Linux 9, Azure Linux 3.0, and Oracle Linux 10, 9, and 8.[47][13] Limited support is available on Windows through the Windows Subsystem for Linux (WSL2), enabling ROCm development on compatible Radeon GPUs and Ryzen APUs, though it is not as comprehensive as native Linux support.[48] ROCm does not support macOS.[13]
The software requires the open-source amdgpu kernel driver, version 5.15 or later, along with ROCm-specific kernel modules such as kfd and amdgpu for GPU management and heterogeneous computing.[40] These drivers handle device initialization, memory management, and PCIe communication, ensuring compatibility with supported AMD GPUs. Supported kernel versions vary by distribution; for example, Ubuntu 24.04.3 uses kernel 6.8 or higher, while RHEL 8.10 supports kernel 4.18.[40]
Beyond GPUs, ROCm runs on x86_64 architectures with CPUs that support PCIe atomics, such as AMD Zen-based processors (first generation and later) or Intel Haswell and subsequent generations.[47] Limited ARM64 support is available in experimental configurations for select Instinct accelerators.[49] For AI and machine learning workloads, a minimum of 16 GB system RAM is recommended to handle data loading and model training efficiently, while AMD Instinct GPUs require PCIe 4.0 or higher interfaces for optimal bandwidth and performance in datacenter environments.[50][51]
As of November 2025, ROCm 7.1 offers enhanced container support through compatibility with Docker and Podman for streamlined cloud and edge deployments, including advanced features like improved multi-GPU scaling.[46][52]
Programming Model
HIP Interface
HIP (Heterogeneous-compute Interface for Portability) is a C++ runtime API and kernel language developed by AMD as part of the ROCm platform, enabling developers to create portable applications that run on both AMD GPUs via ROCm and NVIDIA GPUs via CUDA from a single source codebase.[53] This interface targets heterogeneous computing systems, supporting CPU and GPU execution while minimizing performance overhead compared to native CUDA or ROCm coding.[53] HIP's design emphasizes familiarity for CUDA programmers, with API calls and kernel syntax that closely mirror CUDA, allowing straightforward porting of applications without major rewrites.[53]
Central to HIP are its kernel definition, memory management, and execution mechanisms. Kernels are defined with the __global__ qualifier, as in CUDA, and launched either with the familiar triple-chevron syntax kernel<<<blocks, threads>>>(args) or the explicit hipLaunchKernelGGL macro, which offers greater portability and handles templated kernel names via HIP_KERNEL_NAME. Memory operations include hipMalloc for device memory allocation, hipMemcpy for host-device data transfers (with hipMemcpyAsync providing the asynchronous variant), and hipFree for deallocation, providing direct analogs to CUDA's memory API.[54] Execution control is handled through hipLaunchKernelGGL(kernel, dim3 grid, dim3 block, size_t sharedMem, hipStream_t stream, args...), which specifies grid and block dimensions, shared memory size, and an optional stream for concurrency.[54]
HIP ensures portability by compiling code to either AMD's ROCm backend using the HIP-Clang compiler or NVIDIA's CUDA backend using NVCC, orchestrated by the hipcc driver utility that automatically sets include paths, libraries, and target-specific options.[55] It supports asynchronous operations via streams, created with hipStreamCreate and synchronized using hipStreamSynchronize or hipStreamWaitEvent, allowing overlapping computation and data transfers for improved throughput.[56] Events, managed through hipEventCreate, hipEventRecord, and hipEventSynchronize, provide fine-grained timing and synchronization points within streams.[57]
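As a hedged illustration of these stream and event APIs, the following sketch enqueues a host-to-device copy, a kernel, and a device-to-host copy on one stream and times the pipeline with events; the scale kernel and buffer sizes are arbitrary stand-ins, and error checking is omitted for brevity.
#include <hip/hip_runtime.h>
#include <cstdio>

// Stand-in workload: multiply every element by a constant factor.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float *h = nullptr, *d = nullptr;
    hipHostMalloc((void**)&h, n * sizeof(float), hipHostMallocDefault);  // pinned memory enables async copies
    hipMalloc(&d, n * sizeof(float));

    hipStream_t stream;
    hipStreamCreate(&stream);
    hipEvent_t start, stop;
    hipEventCreate(&start);
    hipEventCreate(&stop);

    hipEventRecord(start, stream);
    hipMemcpyAsync(d, h, n * sizeof(float), hipMemcpyHostToDevice, stream);
    hipLaunchKernelGGL(scale, dim3((n + 255) / 256), dim3(256), 0, stream, d, 2.0f, n);
    hipMemcpyAsync(h, d, n * sizeof(float), hipMemcpyDeviceToHost, stream);
    hipEventRecord(stop, stream);

    hipEventSynchronize(stop);                     // host waits only on this event
    float ms = 0.0f;
    hipEventElapsedTime(&ms, start, stop);
    std::printf("Pipeline time: %.3f ms\n", ms);

    hipEventDestroy(start);
    hipEventDestroy(stop);
    hipStreamDestroy(stream);
    hipFree(d);
    hipHostFree(h);
    return 0;
}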
Advanced features include unified memory support via hipMallocManaged, which allocates memory accessible from both host and device without explicit copies, leveraging Heterogeneous System Architecture (HSA) for unified addressing as detailed in the Foundations section.[58] For multi-GPU environments, HIP enables device enumeration with hipGetDeviceCount to query available GPUs and hipSetDevice to select a target, facilitating distributed computing across multiple accelerators.[59]
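A minimal sketch of managed memory and device selection, under the assumption that at least one ROCm-visible GPU with unified memory support is present, might look as follows; the increment kernel is illustrative only.
#include <hip/hip_runtime.h>
#include <cstdio>

__global__ void increment(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    int deviceCount = 0;
    hipGetDeviceCount(&deviceCount);               // enumerate visible GPUs
    std::printf("Devices: %d\n", deviceCount);
    hipSetDevice(0);                               // select the first GPU

    const int n = 1024;
    int *data = nullptr;
    hipMallocManaged((void**)&data, n * sizeof(int));   // accessible from host and device
    for (int i = 0; i < n; ++i) data[i] = i;            // host writes directly, no hipMemcpy

    hipLaunchKernelGGL(increment, dim3((n + 255) / 256), dim3(256), 0, 0, data, n);
    hipDeviceSynchronize();                        // make device writes visible to the host

    std::printf("data[0] = %d\n", data[0]);        // expected: 1
    hipFree(data);
    return 0;
}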
The following code snippet illustrates a basic HIP kernel launch and memory management:
#include <hip/hip_runtime.h>
#include <cstdlib>

__global__ void vectorAdd(const float *A, const float *B, float *C, int N) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N) C[i] = A[i] + B[i];
}

int main() {
    int N = 1000;
    size_t size = N * sizeof(float);
    float *h_A = (float*)malloc(size);
    float *h_B = (float*)malloc(size);
    float *h_C = (float*)malloc(size);
    float *d_A, *d_B, *d_C;
    hipMalloc(&d_A, size);
    hipMalloc(&d_B, size);
    hipMalloc(&d_C, size);
    // Initialize host arrays (omitted for brevity)
    hipMemcpy(d_A, h_A, size, hipMemcpyHostToDevice);
    hipMemcpy(d_B, h_B, size, hipMemcpyHostToDevice);
    // Launch enough 256-thread blocks to cover all N elements.
    hipLaunchKernelGGL(vectorAdd, dim3((N + 255) / 256), dim3(256, 1, 1), 0, 0, d_A, d_B, d_C, N);
    hipMemcpy(h_C, d_C, size, hipMemcpyDeviceToHost);
    hipFree(d_A);
    hipFree(d_B);
    hipFree(d_C);
    free(h_A);
    free(h_B);
    free(h_C);
    return 0;
}
This example demonstrates allocation, data transfer, kernel execution, and cleanup, highlighting HIP's CUDA-like workflow.[60]
OpenCL and OpenMP Support
ROCm provides support for OpenCL, enabling developers to write portable parallel computing kernels that can execute on AMD GPUs as well as other hardware platforms. The implementation is handled through the ROCm Compute Language Runtime (ROCclr), which serves as a virtual device interface within the broader AMD Compute Language Runtimes (CLR) framework, facilitating the execution of OpenCL programs on AMD hardware.[61] ROCclr integrates with the OpenCL runtime to manage device interactions, memory allocation, and kernel dispatching, allowing the standard OpenCL C kernel language to define compute-intensive tasks such as vector operations or image processing.[62] Kernels are compiled using Clang with support for OpenCL C versions up to 2.0, where the -cl-std=CL2.0 flag enables full conformance, though later versions such as OpenCL C 3.0 remain experimental and are not yet fully supported as of ROCm 7.1.[63] Execution occurs via core OpenCL APIs, including clEnqueueNDRangeKernel for launching multi-dimensional work-groups on the GPU, ensuring efficient parallel task distribution across compute units.[8]
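For illustration, a minimal OpenCL host program targeting ROCm's runtime might be structured as follows; error checking is omitted and the embedded scale kernel is an arbitrary example, so this is a sketch rather than a canonical recipe.
#define CL_TARGET_OPENCL_VERSION 200
#include <CL/cl.h>
#include <cstdio>
#include <vector>

// Embedded OpenCL C kernel: doubles each element of a buffer.
static const char *kSource =
    "__kernel void scale(__global float *data) {\n"
    "    size_t i = get_global_id(0);\n"
    "    data[i] *= 2.0f;\n"
    "}\n";

int main() {
    cl_platform_id platform;
    clGetPlatformIDs(1, &platform, nullptr);
    cl_device_id device;
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, nullptr);

    cl_context context = clCreateContext(nullptr, 1, &device, nullptr, nullptr, nullptr);
    cl_command_queue queue = clCreateCommandQueueWithProperties(context, device, nullptr, nullptr);

    cl_program program = clCreateProgramWithSource(context, 1, &kSource, nullptr, nullptr);
    clBuildProgram(program, 1, &device, "-cl-std=CL2.0", nullptr, nullptr);
    cl_kernel kernel = clCreateKernel(program, "scale", nullptr);

    const size_t n = 1024;
    std::vector<float> host(n, 1.0f);
    cl_mem buf = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                n * sizeof(float), host.data(), nullptr);

    clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);
    size_t global = n;
    clEnqueueNDRangeKernel(queue, kernel, 1, nullptr, &global, nullptr, 0, nullptr, nullptr);
    clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, n * sizeof(float), host.data(), 0, nullptr, nullptr);

    std::printf("host[0] = %f\n", host[0]);   // expected: 2.0

    clReleaseMemObject(buf);
    clReleaseKernel(kernel);
    clReleaseProgram(program);
    clReleaseCommandQueue(queue);
    clReleaseContext(context);
    return 0;
}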
This OpenCL support is particularly suited for legacy applications or vendor-agnostic codebases requiring cross-platform compatibility, though it may incur overhead when mixed with ROCm's HIP interface due to separate runtime layers.[8] Unlike HIP, which offers AMD-specific optimizations, OpenCL prioritizes standardization but lacks some performance enhancements tailored to ROCm's architecture, such as direct integration with AMD's memory hierarchy.[1]
ROCm also incorporates OpenMP support for directive-based heterogeneous programming, allowing incremental offloading of CPU code to AMD GPUs without full rewrites. The implementation relies on an LLVM-based toolchain, including Clang, which fully adheres to the OpenMP 4.5 standard and partially supports features from OpenMP 5.0, 5.1, and 5.2, such as device constructs for data mapping and task dependencies.[64] As of ROCm 7.1, support for OpenMP in Fortran applications has been added, including integration with compilers and runtime libraries.[64] Key directives include #pragma omp target for marking regions to offload from host to device, enabling automatic code movement and execution on the GPU, along with associated clauses like map for data transfer and teams for controlling parallelism granularity.[65] This offloading model leverages the ROCm runtime to handle synchronization and resource allocation, making it accessible for scientific computing workloads like simulations or linear algebra routines.
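A hedged sketch of such an offload region is shown below: the loop is marked with #pragma omp target teams distribute parallel for, and map clauses govern data movement; the array sizes are arbitrary.
#include <cstdio>

// The annotated loop is offloaded to the GPU; 'map' clauses control
// host-device data movement for the array sections.
int main() {
    const int n = 1 << 20;
    float *a = new float[n];
    float *b = new float[n];
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    #pragma omp target teams distribute parallel for map(to: a[0:n]) map(tofrom: b[0:n])
    for (int i = 0; i < n; ++i) {
        b[i] += a[i];
    }

    std::printf("b[0] = %f\n", b[0]);   // expected: 3.0
    delete[] a;
    delete[] b;
    return 0;
}
With ROCm's LLVM-based compilers, such code is typically built with flags along the lines of -fopenmp --offload-arch=<gfx target>, though the exact invocation depends on the toolchain version.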
While effective for straightforward offloads, OpenMP in ROCm remains experimental for more complex scenarios, such as dynamic task graphs involving irregular dependencies or nested parallelism, where full feature parity with CPU-only execution is not yet achieved due to ongoing LLVM developments.[66] Interoperability with other ROCm components, like HIP, is possible but limited by directive overhead, positioning OpenMP as a bridge for standards-compliant portability rather than peak performance tuning.[1]
Core Software Stack
Runtimes and Drivers
The ROCm software stack relies on low-level kernel drivers and runtimes to interface directly with AMD GPU hardware, enabling efficient execution of compute workloads. The primary kernel driver is ROCk, an amdgpu-based component that manages GPU initialization, interrupt handling, and power management for discrete AMD GPUs. ROCk integrates with the Linux kernel's AMDGPU module and Kernel Fusion Driver (KFD) to provide the foundational hardware abstraction necessary for heterogeneous computing. This driver ensures stable operation by handling device discovery, resource allocation at the kernel level, and coordination between CPU and GPU for tasks like memory mapping and event processing.[67][68]
At the runtime layer, ROCr serves as AMD's implementation of the Heterogeneous System Architecture (HSA) runtime, acting as a thin user-mode API that bridges applications to the underlying hardware. ROCr facilitates queue management through HSA's architected queuing model, allowing asynchronous dispatch of compute packets to GPU queues with low latency. It also handles signal-based synchronization, where HSA signals enable fine-grained coordination between host and device operations, such as waiting for kernel completion or barrier dependencies. Complementing ROCr is ROCt, the HSA thunk interface, which provides a lightweight user-space bridge to the ROCk kernel driver, managing ioctl communications for direct hardware access without heavy overhead.[26][68][69]
Core functionalities of these components include command queue submission via HSA's Architected Queuing Language (AQL) packets, which encapsulate kernel dispatches, barriers, and memory operations for execution on AMD GPUs. Memory allocation is exposed through HSA APIs like hsa_memory_allocate, supporting fine-grained and coarse-grained regions with immediate visibility for coherent data sharing across agents. Synchronization mechanisms, such as barrier packets (HSA_PACKET_TYPE_BARRIER_AND and HSA_PACKET_TYPE_BARRIER_OR) and fence scopes (HSA_FENCE_SCOPE_SYSTEM), ensure ordered execution and data consistency without busy-waiting on the host. These elements collectively support scalable, low-level control over GPU resources, forming the execution backbone for higher-level ROCm components.[70][68][71]
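As a small illustration of this low-level interface, the sketch below uses the ROCr HSA API to initialize the runtime and enumerate GPU agents; it is a minimal example rather than a full dispatch path (queue creation and AQL packet submission are omitted).
#include <hsa/hsa.h>
#include <cstdio>

// Each CPU and GPU appears as an HSA agent; the callback filters for GPUs.
static hsa_status_t printGpuAgent(hsa_agent_t agent, void *) {
    hsa_device_type_t type;
    hsa_agent_get_info(agent, HSA_AGENT_INFO_DEVICE, &type);
    if (type == HSA_DEVICE_TYPE_GPU) {
        char name[64] = {0};
        hsa_agent_get_info(agent, HSA_AGENT_INFO_NAME, name);
        std::printf("GPU agent: %s\n", name);
    }
    return HSA_STATUS_SUCCESS;
}

int main() {
    hsa_init();                                  // bring up the ROCr runtime
    hsa_iterate_agents(printGpuAgent, nullptr);  // visit every HSA agent
    hsa_shut_down();
    return 0;
}
Such a program typically links against the hsa-runtime64 library shipped with ROCm.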
In 2025, ROCm 7.0 introduced significant enhancements to runtimes and drivers, particularly for scalability and reliability on advanced hardware. ROCr was updated to version 1.18.0, adding support for AMD Instinct MI350 Series GPUs (based on CDNA 4 architecture) with optimized P2P memory copies utilizing all available SDMA engines for improved multi-GPU throughput. The AMDGPU driver (version 30.10) was modularized for independent updates, enhancing compatibility and error resilience through better reporting via hipGetLastError and new event notifications in AMD SMI for migration and thermal events. These changes enable production-grade scalability for MI350 deployments, achieving up to 3.8x performance uplifts in key workloads compared to ROCm 6.0 while bolstering fault tolerance in large-scale systems.[46][49][72]
ROCm 7.1.0, released on October 30, 2025, further improved the runtime layer with enhancements to HIP runtime compatibility with NVIDIA CUDA, including new APIs for memory management (e.g., hipExtMallocAsync, hipExtMemPool*), cooperative groups, and nested tile partitioning. These updates enhance cross-platform portability and efficiency for heterogeneous workloads, building on the HSA foundation provided by ROCr.[5]
Compilers and Tools
ROCm's compilation infrastructure relies on LLVM-based tools optimized for heterogeneous computing on AMD GPUs. The primary compiler is ROCmCC, a Clang/LLVM-based frontend designed for high-performance computing across AMD GPUs and CPUs, supporting models like HIP, OpenMP, and OpenCL.[73] It integrates with the AMDGPU backend in LLVM to lower GPU kernels to code objects containing the target architecture's instruction set.[74] ROCm-CompilerSupport provides the necessary extensions and libraries within the LLVM project, including the AMD Code Object Manager (comgr) for handling GPU code objects, ensuring seamless integration for ROCm applications.[75]
HIPCC serves as the compiler driver for HIP code, acting as a wrapper around Clang (specifically amdclang++) to automate the compilation process. It handles HIP source files by invoking the underlying LLVM pipeline to produce executable binaries, setting default include paths and linking against ROCm libraries. For offloading computations to AMD GPUs, developers use Clang with flags such as --offload-arch=<target-id> (e.g., --offload-arch=gfx908) to specify the GPU architecture like GFX9 or GFX11, or -mcpu=<target-id> to target specific processors, enabling single-source C++ code to run on both CPU and GPU.[74]
Key tools facilitate development and porting. HIPIFY automates the migration of CUDA applications to HIP by translating source code, replacing CUDA APIs with HIP equivalents, and adjusting kernel syntax—using either the Clang-based hipify-clang for comprehensive parsing or the Perl-based hipify-perl for simpler substitutions.[76] It supports common CUDA runtime calls, device qualifiers like __global__, and standard libraries but requires manual review for unsupported features or third-party dependencies.[76] Similarly, GPUFORT is a source-to-source translator for Fortran codes, converting CUDA Fortran or OpenACC directives to Fortran+HIP or Fortran+OpenMP 4.5+, aiding legacy HPC applications in adopting ROCm without full rewrites.[77]
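To illustrate the kind of mechanical translation hipify performs, the sketch below pairs a simplified CUDA fragment (shown as comments) with a hypothetical HIP equivalent of the sort the tool emits; the myKernel and runOnDevice names are placeholders, and depending on the tool and version the triple-chevron launch may be preserved rather than rewritten.
#include <hip/hip_runtime.h>

// Original CUDA version (shown as comments for comparison):
//   cudaMalloc(&d_buf, bytes);
//   cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
//   myKernel<<<grid, block>>>(d_buf);
//   cudaDeviceSynchronize();
//   cudaFree(d_buf);

__global__ void myKernel(float *buf) { /* device code is typically left unchanged */ }

void runOnDevice(float *h_buf, size_t bytes, dim3 grid, dim3 block) {
    float *d_buf = nullptr;
    hipMalloc(&d_buf, bytes);                               // cudaMalloc -> hipMalloc
    hipMemcpy(d_buf, h_buf, bytes, hipMemcpyHostToDevice);  // cudaMemcpy -> hipMemcpy
    hipLaunchKernelGGL(myKernel, grid, block, 0, 0, d_buf); // launch syntax kept or rewritten
    hipDeviceSynchronize();                                 // cudaDeviceSynchronize -> hipDeviceSynchronize
    hipFree(d_buf);                                         // cudaFree -> hipFree
}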
At the mid-level, ROCclr (now integrated into the AMD Compute Language Runtimes, or CLR) acts as a common runtime layer for dispatching HIP and OpenCL kernels, providing a unified interface for heterogeneous execution while abstracting hardware specifics.[61] It includes implementations for HIP (hipamd) and OpenCL (opencl) subcomponents, built atop HIP-Clang for runtime APIs like streams and memory management.[61]
Debugging workflows leverage ROCgdb, the ROCm source-level debugger based on GDB, which supports heterogeneous debugging of HIP applications across x86 hosts and AMD GPUs. It enables setting breakpoints in GPU kernels, single-stepping through device code, and inspecting memory or variables, though it currently focuses on source-line accuracy without full symbolic support for variables.[78]
Libraries
Basic Linear Algebra
rocBLAS serves as the primary Basic Linear Algebra Subprograms (BLAS) library within the ROCm ecosystem, providing implementations for levels 1, 2, and 3 operations optimized for AMD GPUs.[79] It is implemented in HIP C++ and leverages the ROCm runtime to execute vector, matrix-vector, and matrix-matrix computations on the GPU.[79] hipBLAS, a companion library, offers CUDA compatibility by porting the cuBLAS API to HIP, enabling developers to adapt NVIDIA-focused code to ROCm with minimal changes while maintaining access to rocBLAS's underlying functionality.
A cornerstone of rocBLAS is its support for the General Matrix Multiply (GEMM) operation, defined as C = \alpha A B + \beta C, where A and B are input matrices, C is the output matrix, and \alpha and \beta are scalar parameters.[79] This routine, along with other level-3 BLAS functions, incorporates optimizations tailored to AMD's matrix core instructions, such as the Matrix Fused Multiply-Add (MFMA) operations available on Instinct MI100 and MI200 series GPUs.[79] These enhancements exploit hardware-specific capabilities such as matrix cores for accelerated dense linear algebra, ensuring efficient handling of large-scale computations in high-performance computing workloads.[79]
Key features of rocBLAS include support for half-precision floating-point arithmetic (FP16), which reduces memory bandwidth and boosts throughput for compatible operations, and batched variants of routines like GEMM for processing multiple independent problems simultaneously.[79] Integration with the HIP programming model allows seamless kernel fusion through libraries like hipBLASLt, where multiple operations can be combined into a single GPU kernel to minimize data transfers and improve overall efficiency.[79] The library is particularly tuned for AMD Instinct accelerators, delivering high-performance implementations that scale with GPU architecture advancements in ROCm 7.0 and later releases, including ROCm 7.1.0 (October 2025) which adds support for gfx1150/gfx1151 architectures and an OpenMP threads sample.[79][5]
In practice, developers invoke rocBLAS functions via a host-side API initialized with a rocblas_handle. For example, the single-precision GEMM can be performed using rocblas_sgemm, which computes C = \alpha A B + \beta C on the GPU by passing matrix dimensions, pointers to device memory, and scalars to the function. Asynchronous execution is supported through HIP streams, allowing overlapping computation with data movement for further performance gains.[79]
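A minimal sketch of this workflow is shown below, computing C = \alpha A B + \beta C in single precision for square column-major matrices; error checking is omitted, and the rocblas/rocblas.h header path reflects recent ROCm layouts and may differ in older releases.
#include <hip/hip_runtime.h>
#include <rocblas/rocblas.h>   // header path in recent ROCm releases
#include <vector>
#include <cstdio>

// Single-precision GEMM on square n x n matrices stored column-major, as BLAS expects.
int main() {
    const int n = 256;
    const float alpha = 1.0f, beta = 0.0f;
    std::vector<float> hA(n * n, 1.0f), hB(n * n, 1.0f), hC(n * n, 0.0f);

    float *dA, *dB, *dC;
    hipMalloc(&dA, n * n * sizeof(float));
    hipMalloc(&dB, n * n * sizeof(float));
    hipMalloc(&dC, n * n * sizeof(float));
    hipMemcpy(dA, hA.data(), n * n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(dB, hB.data(), n * n * sizeof(float), hipMemcpyHostToDevice);

    rocblas_handle handle;
    rocblas_create_handle(&handle);
    rocblas_sgemm(handle, rocblas_operation_none, rocblas_operation_none,
                  n, n, n, &alpha, dA, n, dB, n, &beta, dC, n);

    hipMemcpy(hC.data(), dC, n * n * sizeof(float), hipMemcpyDeviceToHost);
    std::printf("C[0] = %f\n", hC[0]);   // expected: 256.0 for all-ones inputs

    rocblas_destroy_handle(handle);
    hipFree(dA);
    hipFree(dB);
    hipFree(dC);
    return 0;
}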
Advanced Solvers and FFT
The ROCm platform provides advanced linear algebra solvers through rocSOLVER and its HIP-portable counterpart hipSOLVER, which implement a subset of LAPACK routines optimized for AMD GPUs. rocSOLVER supports key decompositions such as LU factorization via rocsolver_getrf and QR factorization via rocsolver_geqrf, enabling efficient solution of linear systems and least-squares problems in scientific computing workflows.[80] Additionally, it includes eigenvalue solvers like rocsolver_syev for symmetric matrices and rocsolver_heev for Hermitian matrices, as well as singular value decomposition (SVD) through rocsolver_gesvd, which computes the decomposition A = U \Sigma V^H for general matrices A.[81] hipSOLVER acts as a marshalling layer, supporting rocSOLVER as a backend alongside NVIDIA's cuSOLVER, and exposes an API closely aligned with cuSOLVER's dense linear algebra interface, such as hipsolverDnCreate for handle management and hipsolverDnGesvd for SVD, ensuring portability across GPU vendors without code changes.[82]
For frequency-domain computations, rocFFT and hipFFT deliver high-performance discrete Fourier transforms (DFTs) tailored to GPU architectures. rocFFT supports 1D, 2D, and 3D FFT plans created via rocfft_plan_create, accommodating real-to-complex, complex-to-real, and complex-to-complex transforms across data types like single- and double-precision floating-point.[83] Batched operations are handled efficiently by specifying the number_of_transforms parameter in plan creation, allowing simultaneous execution of multiple independent FFTs to exploit GPU parallelism for large-scale signal processing tasks. hipFFT provides a cuFFT-compatible API, including functions like hipfftExecC2C for executing complex-to-complex transforms on plans, which maps seamlessly to rocFFT on AMD hardware while supporting cuFFT backends on NVIDIA GPUs.[84]
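The sketch below illustrates this API style with a batched, in-place 1D complex-to-complex transform through hipFFT; the transform length and batch count are arbitrary, and the hipfft/hipfft.h header path reflects recent ROCm packaging.
#include <hip/hip_runtime.h>
#include <hipfft/hipfft.h>     // header path in recent ROCm releases
#include <vector>
#include <cstdio>

// Batched 1D complex-to-complex FFT through the cuFFT-style hipFFT API.
int main() {
    const int nx = 1024;       // transform length
    const int batch = 8;       // number of independent transforms

    std::vector<hipfftComplex> host(nx * batch);
    for (auto &v : host) { v.x = 1.0f; v.y = 0.0f; }   // constant input signal

    hipfftComplex *data;
    hipMalloc(&data, host.size() * sizeof(hipfftComplex));
    hipMemcpy(data, host.data(), host.size() * sizeof(hipfftComplex), hipMemcpyHostToDevice);

    hipfftHandle plan;
    hipfftPlan1d(&plan, nx, HIPFFT_C2C, batch);        // maps to rocFFT on AMD GPUs
    hipfftExecC2C(plan, data, data, HIPFFT_FORWARD);   // in-place forward transform
    hipDeviceSynchronize();

    hipMemcpy(host.data(), data, host.size() * sizeof(hipfftComplex), hipMemcpyDeviceToHost);
    std::printf("DC bin of first batch: %f\n", host[0].x);   // expected: 1024.0

    hipfftDestroy(plan);
    hipFree(data);
    return 0;
}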
These libraries incorporate optimizations to enhance throughput and resource utilization, particularly for compute-intensive applications. In rocSOLVER, internal implementations bypass rocBLAS calls for small- and medium-sized matrices when optimizations are enabled, reducing overhead and improving performance for decompositions and solvers.[80] rocFFT leverages batched execution and user-managed work buffers to minimize memory transfers, enabling memory-efficient processing of large datasets by auto-allocating temporary storage only when needed during rocfft_execute. Building on basic linear algebra operations from rocBLAS, these solvers and FFT routines facilitate advanced numerical methods in high-performance computing (HPC).[85]
ROCm 7.0 (September 2025) introduced significant enhancements, including hybrid CPU-GPU execution modes in rocSOLVER, SVD using Cuppen's algorithm for better numerical stability, performance gains in routines like rocsolver_bdsqr for bidiagonal SVD, rocsolver_syev/rocsolver_heev for eigenvalues, and rocsolver_geqr2/rocsolver_geqrf for QR factorization, as well as reduced memory footprint for eigensolvers such as rocsolver_stedc and generalized variants. hipSOLVER improved compatibility for sparse matrix workflows under CUDA backends. For FFT, rocFFT gained new single-precision kernels and optimized execution plans for large 1D transforms, boosting throughput in simulation-heavy workloads like computational fluid dynamics. These updates collectively enhanced efficiency on AMD Instinct MI350 GPUs. ROCm 7.1.0 (October 2025) further optimized rocSOLVER performance for LARF, LARFT, GEQR2, GEQRF, STEDC, and eigensolvers, and improved rocFFT with single-kernel plans for certain 2D sizes and better performance for specific 3D FFTs and MPI pencil decompositions, supporting larger-scale HPC applications with improved precision and reduced resource demands.[23][5]
Machine Learning Libraries
ROCm provides a suite of specialized libraries optimized for machine learning workloads on AMD GPUs, focusing on deep learning primitives, tensor operations, and sparse computations essential for AI models. These libraries leverage the HIP programming model to ensure portability and compatibility with CUDA-based code, allowing developers to adapt existing machine learning applications with minimal changes.[86]
Central to ROCm's machine learning capabilities is MIOpen, AMD's open-source deep learning primitives library. MIOpen delivers high-performance implementations of key operations for convolutional neural networks (CNNs), including convolutions, activations, and pooling layers, with optimizations such as kernel fusion to reduce memory bandwidth usage and GPU launch overheads. It supports advanced data types like bfloat16 for efficient training of large models, making it a foundational component for accelerating AI workloads on AMD Instinct and Radeon GPUs.[87][88]
Complementing MIOpen, hipTensor is a high-performance HIP C++ library designed for tensor primitives, particularly tensor contractions critical for transformer-based architectures and other deep learning models. It exploits specialized matrix cores in modern AMD GPUs, such as those in the CDNA architecture, to achieve efficient computation of multi-dimensional tensor operations, enabling scalable performance in machine learning pipelines.[89][90]
For sparse matrix operations prevalent in machine learning, such as those in recommendation systems and sparse neural networks, rocSPARSE provides optimized routines for sparse linear algebra subprograms using the HIP language. This library handles sparse matrix-vector multiplications and other sparse formats, supporting efficient processing of data-sparse models on ROCm-enabled hardware.[91]
ROCm integrates with ONNX Runtime through a dedicated execution provider, enabling accelerated inference and training of ONNX models on AMD GPUs. This support facilitates deployment of diverse machine learning models, including transformers, with optimizations for low-precision formats like INT8 and INT4 to enhance efficiency.[92][2]
In ROCm 7.0 (2025), enhancements included support for retrieval-augmented generation (RAG) pipelines, demonstrated through tutorials integrating tools like LlamaIndex and Ollama for building AI applications on AMD GPUs. Additionally, optimized kernels for transformer models delivered up to 3x speedup in training performance compared to ROCm 6.0, as shown in benchmarks on AMD Instinct MI300X platforms, boosting productivity for large-scale AI development. ROCm 7.1.0 (October 2025) added further improvements, such as MIOpen's trust verify find mode and HIP kernel for backward layer normalization, along with bfloat16/half float mixed precision support in rocSPARSE for multiple routines.[46][93][94][5]
Ecosystem
Third-Party Integrations
ROCm integrates seamlessly with major machine learning frameworks, enabling GPU acceleration on AMD hardware. PyTorch offers native support through ROCm-specific wheels, allowing developers to run deep learning workloads directly on AMD Instinct accelerators and Radeon GPUs without code modifications.[95] TensorFlow utilizes an AMD-maintained plugin for ROCm compatibility, facilitating the execution of neural network training and inference tasks.[96] Similarly, JAX provides built-in ROCm backend support, optimizing just-in-time compilation and autodifferentiation for high-performance computing in scientific simulations and AI research.[95] In ROCm 7.0, these integrations achieve comparable performance to NVIDIA CUDA in many AI workloads, particularly in memory-bound inference scenarios with large language models, demonstrating near parity through optimized libraries like MIOpen and hipRTC.[97]
In high-performance computing, ROCm enables GPU acceleration for several key scientific applications. OpenFOAM, a popular open-source toolbox for computational fluid dynamics, leverages ROCm via OpenMP target offloading and HIP ports to accelerate simulations such as heat transfer and fluid flow on AMD GPUs, achieving significant speedups in solver performance.[98] GROMACS, used for molecular dynamics simulations in biochemistry, supports ROCm through its HIP backend, allowing efficient GPU offloading for protein folding and drug discovery workloads on platforms like the Frontier exascale supercomputer.[99][100] ABINIT, an electronic structure package for materials science, incorporates ROCm-compatible GPU acceleration via OpenMP offload directives, enabling faster ground-state calculations and density functional theory computations on AMD hardware.[101]
ROCm facilitates interoperability with graphics APIs and provides language bindings for broader adoption. Through HIP, ROCm supports resource sharing between compute kernels and Vulkan graphics pipelines, enabling hybrid applications in rendering and visualization by mapping buffers and textures across APIs.[102] For Python developers, hip-python offers low-level bindings to the HIP runtime and ROCm libraries like rocBLAS and RCCL, simplifying GPU programming in AI and data science scripts.[103] Fortran users benefit from hipfort, which exposes HIP APIs and accelerated math libraries, allowing legacy HPC codes to offload computations to AMD GPUs without extensive rewrites.[104]
In 2025, ROCm expanded its ecosystem with enhanced support for retrieval-augmented generation (RAG) in AI applications, providing tools and workflows to build end-to-end pipelines on AMD GPUs for improved generative AI accuracy using external knowledge bases.[105] Additionally, Oracle announced an expanded partnership with AMD to integrate Instinct GPUs and ROCm into its cloud infrastructure, enabling large-scale AI and HPC workloads through superclusters powered by up to 50,000 AMD Instinct MI450 Series GPUs, planned for availability starting in Q3 2026.[106]
Distribution and Installation
ROCm is distributed primarily through official AMD repositories, providing binary packages for supported Linux distributions such as Ubuntu and Red Hat Enterprise Linux (RHEL).[52] For Ubuntu 22.04 (Jammy) and 24.04 (Noble), users add the AMD repository by downloading the GPG key and creating a sources list file, followed by updating the package index with apt update.[107] Installation then proceeds via apt install rocm, which pulls in the core runtime, or specialized metapackages like rocm-dev for the full development stack including compilers, libraries, and tools.[107] On RHEL 8.10 and 9.4, a similar process uses dnf after enabling the repository, installing packages like rocm for runtime components.[108] Binary packages are available for ROCm 7.0 and later versions, ensuring compatibility with AMD Instinct accelerators and Radeon GPUs meeting system requirements.[52]
Docker containers offer a containerized alternative for isolated environments, with official ROCm images hosted on Docker Hub under the rocm namespace, such as rocm/pytorch for machine learning workflows.[109] These images include pre-built ROCm stacks and can be run with GPU access by mounting the host's device files using options like --device /dev/kfd --device /dev/dri.[109] For custom builds, source compilation is supported via TheRock, AMD's open-source build system introduced in ROCm 7.9 preview, which uses CMake to assemble the ROCm core SDK from GitHub repositories, bundling dependencies for platforms like Ubuntu 24.04.[110][111]
Third-party distributions extend accessibility for specific use cases. Conda-forge provides ROCm packages tailored for Python and machine learning environments, such as rocm-device-libs and rocm-smi, installable via conda install -c conda-forge rocm-device-libs, allowing integration without full system package management.[112] Spack, a package manager popular in high-performance computing (HPC) clusters, supports ROCm installation and source builds through its ROCm-specific recipes, enabling variant configurations for multi-version deployments across supercomputers.[113][114] Cloud providers offer pre-configured images; for instance, Microsoft Azure provides AMD GPU instances with ROCm-enabled virtual machines for AI and HPC workloads, while AWS supports ROCm on AMD-powered EC2 instances via standard installation methods.[115]
The installation process typically involves adding the repository, installing the base rocm package, and verifying functionality with the rocminfo tool, which queries GPU details and ROCm version.[116] Common troubleshooting includes resolving driver conflicts by ensuring the latest AMDGPU kernel driver is installed and blacklisting conflicting modules like Nouveau, as well as checking compatibility matrices for user-space and kernel versions.[117] Users should reboot after installation and add their account to the render and video groups for proper GPU access.[116]
Learning and Community Resources
The official documentation for ROCm is hosted at rocm.docs.amd.com, providing comprehensive guides for installation, programming, and optimization on AMD GPUs.[118] This resource includes the HIP programming guide, which details the C++ runtime API and kernel language for creating portable applications across AMD and NVIDIA hardware, emphasizing heterogeneous computing environments.[119] Additionally, the AMD ROCm AI Developer Hub offers tutorials in Jupyter Notebook format, covering inference, fine-tuning, pretraining, and GPU development, such as deploying models with vLLM and fine-tuning with Hugging Face Transformers.[120] These materials support hands-on learning for HIP basics through example repositories and AI porting workflows from CUDA using tools like HIPIFY.[121]
ROCm's GitHub organization, under ROCm/ROCm, maintains over 350 open-source repositories as of 2025, serving as a central hub for developers to explore code examples and contribute to the ecosystem.[122] Key learning resources include the rocm-examples repository, which provides introductory and advanced samples for HIP programming, and the HIP-Examples depot for kernel-level demonstrations.[102] Contributions occur via pull requests and issue discussions on these repositories, fostering collaborative improvements to ROCm components like libraries and tools.[123] For 2025 updates, official ROCm blogs highlight optimizations for the AMD Instinct MI350 series GPUs, including enhanced performance in distributed inference and enterprise AI workloads.[124]
Community support for ROCm is facilitated through the AMD Developer Hub, which includes forums, webinars, and best practices for troubleshooting and sharing experiences.[94] Developers can engage in discussions on GitHub and participate in AMD-hosted events like the Advancing AI conference series, where ROCm advancements are showcased annually.[125] Recent guides address emerging needs, such as building Retrieval-Augmented Generation (RAG) pipelines for enterprise AI using vLLM, LangChain, and Chroma on ROCm, enabling scalable, fact-grounded applications.[126] These resources bridge installation with practical application, supporting users in high-performance computing and AI development.
Comparisons
With NVIDIA CUDA
ROCm and NVIDIA CUDA share several architectural similarities that facilitate developer transition and code portability. The Heterogeneous-compute Interface for Portability (HIP) in ROCm is designed to closely mirror CUDA's syntax and API, allowing developers to port CUDA applications to ROCm with minimal changes, often through automated tools like hipify. Both platforms support Single Instruction, Multiple Threads (SIMT) execution models for parallel processing on GPUs and stream-based asynchronous operations for overlapping computation and data transfer, enabling efficient workload management. This HIP-CUDA alignment promotes dual-vendor portability, where a single codebase can target both AMD and NVIDIA hardware without extensive rewrites.[127]
Key differences lie in their foundational approaches and openness. ROCm is an open-source platform built on the Heterogeneous System Architecture (HSA), which provides a unified memory model that allows seamless sharing of memory between CPU and GPU without explicit data transfers in many scenarios, simplifying programming for heterogeneous systems. In contrast, CUDA is a proprietary ecosystem requiring more explicit memory management, such as manual allocations and copies via cudaMalloc and cudaMemcpy, though it supports optional unified memory since CUDA 6.0. CUDA's closed nature limits customization, while ROCm's open-source model fosters community contributions and integration with Linux distributions. Regarding ecosystem scale, CUDA benefits from a larger, more mature library of third-party tools and frameworks optimized over nearly two decades, whereas ROCm's ecosystem, while smaller, is rapidly expanding in AI and high-performance computing (HPC) domains through partnerships like PyTorch and TensorFlow support.
In terms of performance, ROCm 7.1 achieves competitive results relative to CUDA on AMD hardware, particularly for machine learning workloads. In the MLPerf Inference v5.1 benchmarks from September 2025, AMD Instinct MI325X GPUs with ROCm demonstrated near parity or outperformance against NVIDIA H200 systems with CUDA; for instance, Mixtral-8x7B offline throughput improved 23% over prior submissions and exceeded H200 averages, while Llama2-70B and SD-XL scenarios showed results competitive with H200 in offline, server, and interactive modes.[128] Overall, ROCm delivers 80-95% of CUDA's performance in optimized ML tasks on equivalent hardware, though it may require additional tuning and lags in some mature tools due to CUDA's longer development history.[129]
Adoption patterns highlight CUDA's dominance in academic research and commercial AI, driven by its extensive tooling and NVIDIA's market leadership, with over 4 million developers using it as of 2025. ROCm is gaining traction in open-source HPC environments, powering systems like the Frontier exascale supercomputer at Oak Ridge National Laboratory, which leverages ROCm for its AMD Instinct MI250X GPUs to achieve world-leading performance in scientific simulations. This growth positions ROCm as a viable alternative for cost-sensitive, open ecosystems, especially as AMD invests in AI optimizations.[130]
With Intel oneAPI
ROCm and Intel's oneAPI share several foundational similarities as open-source platforms designed for heterogeneous computing. Both emphasize portability across accelerators, leveraging standards such as SYCL for single-source C++ programming models that enable code to target diverse hardware without vendor-specific rewrites.[131] They also support OpenMP offload directives for GPU acceleration, allowing developers to use familiar parallel programming constructs for compute-intensive tasks.[132][133] Additionally, both incorporate OpenCL interoperability, facilitating legacy code migration and cross-platform execution through intermediate representations like SPIR-V.[134]
Key differences arise in their scope and programming paradigms. ROCm is tailored specifically for AMD GPUs, utilizing the Heterogeneous-compute Interface for Portability (HIP) as its core language, which mirrors CUDA syntax for easier porting from NVIDIA ecosystems while optimizing for AMD's architecture. In contrast, oneAPI targets a multi-vendor landscape encompassing CPUs, GPUs, and FPGAs from Intel, AMD, NVIDIA, and others, primarily through Data Parallel C++ (DPC++), an extension of SYCL that promotes unified codebases across architectures. This broader ambition is advanced by the Unified Acceleration (UXL) Foundation, an open consortium evolving oneAPI standards to foster industry-wide interoperability.[135]
Performance characteristics reflect these hardware focuses. On AMD Instinct accelerators, ROCm delivers significant uplifts for AI workloads, such as up to 3.5 times faster inference in ROCm 7.0 compared with the prior release, leveraging deep hardware-specific optimizations for training and inference.[136] Conversely, oneAPI achieves superior efficiency on Intel Xe GPUs, with tailored libraries like oneDNN providing up to 2x throughput gains in deep learning operations due to integrated SYCL compilation and vector extensions. Interoperability via SPIR-V enables hybrid deployments, allowing SYCL/DPC++ code to execute on AMD hardware through ROCm's runtime.[137]
In terms of ecosystem, oneAPI offers expansive hardware coverage and tooling, including comprehensive libraries for AI, HPC, and analytics that span Intel's full portfolio, making it ideal for diverse deployments. ROCm, however, provides deeper, AMD-centric optimizations, such as specialized kernels for Instinct series in high-performance computing. Both platforms integrate with PyTorch—ROCm via native HIP backends for AMD GPUs and oneAPI through the Intel Extension for PyTorch (IPEX) using SYCL—but differ in development tools, with ROCm emphasizing ROCprof for profiling and oneAPI focusing on the DPC++ compiler suite for cross-vendor debugging.[138][139]