
cpio

cpio is a command-line utility in Unix-like operating systems used to create, extract, and manage archives in the cpio format, as well as to copy files between directory trees without creating an intermediate archive.^[1] It operates in three primary modes: copy-out for generating archives from a list of files read from standard input, copy-in for extracting or listing the contents of an archive read from standard input, and pass-through for copying files directly to a destination directory.^[2] The utility supports multiple archive formats, including binary, old and new ASCII, CRC, HP-UX variants, and the tar formats old tar and POSIX.1-1988 ustar, enabling interoperability with tools like tar.^[3] Originally developed by Dick Haight at AT&T's Unix Support Group, cpio first appeared in 1977 as part of PWB/UNIX 1.0 and was publicly released in 1981 with System III Unix, predating the widespread adoption of tar.^[4] The utility was specified as a legacy tool in the Single UNIX Specification Version 2 (1997) and was dropped from POSIX.1-2001 in favor of pax, which offers better portability and more features; pax can still read and write the cpio archive format.^[1] GNU cpio, maintained by the Free Software Foundation since 1995, with version 2.15 released in January 2024, extends the original with support for remote archives, sparse files, and automatic format detection, including binary archives written with a different byte order.^[3] Key features include handling multi-volume archives, such as those spanning magnetic tapes, and options for preserving file attributes like modification times and access permissions.^[5] Common usage pairs cpio with find to generate file lists, as in find . -print | cpio -o > archive.cpio for creation or cpio -i < archive.cpio for extraction, which makes it useful for backups, software distribution, and Linux initramfs images.^[2] Despite its niche status compared to tar or zip, cpio's efficiency in pipe-based workflows and its support for files up to 8 GiB in the portable odc format (4 GiB in newc) keep it relevant in system administration.^[1]

Overview

Purpose and Functionality

cpio is a command-line utility available in Unix-like operating systems for creating and extracting archives in the cpio format, listing archive contents, and copying files between directories.^[6] It operates in three primary modes: copy-out for generating archives from a list of files, copy-in for extracting files from an archive, and copy-pass for transferring files directly between locations without an intermediate archive.^[7] The tool supports both binary and ASCII-based archive formats, including old binary cpio, old portable ASCII cpio, new portable ASCII cpio and its variant with CRC checksums, and the tar formats old tar and POSIX.1 ustar.^[6] It handles a variety of file types, including regular files, directories, symbolic links, and special files like device nodes.^[7] Unlike tar, which by default recursively archives the directory trees named on its command line, cpio reads the names of the files to process from standard input, often generated by commands like find, which allows controlled and selective archiving.^[6] This approach lets users specify exactly which files to include and avoids pulling in unwanted data, though full directory backups require an extra step to generate the list.^[8] Common applications include software distribution: the cpio format serves as the payload format of RPM packages, which bundle files for installation across systems.^[9] cpio is also widely used in backup systems to create and restore file archives on tapes, disks, or pipes.^[10] In the Linux kernel, cpio is used to package initial RAM filesystems (initramfs): a directory structure is stored as a (usually compressed) cpio archive that the kernel unpacks during boot to populate its initial root filesystem.^[11]

Basic Syntax and Options

The cpio command is invoked from the command line with a general syntax that varies by operation mode. For archive creation (copy-out mode), the syntax is cpio -o [options] < name-list > archive, where the list of files to archive is read from standard input and the resulting archive is written to standard output, often redirected to a file.^[12] For extraction or listing (copy-in mode), it is cpio -i [options] [patterns] < archive, reading the archive from standard input. In pass-through mode, which copies files between directories, the syntax is cpio -p [options] destination-directory < name-list.^[12] Short options can be grouped after a single hyphen, so cpio -idv is equivalent to cpio -i -d -v.^[12] The mode flag determines cpio's core behavior. The -o or --create flag selects copy-out mode, which creates an archive from a list of files provided on standard input.^[12] The -i or --extract flag selects copy-in mode, used for extracting files from an archive or listing its contents.^[12] The -p or --pass-through flag selects copy-pass mode, which copies files directly between directory trees without an intermediate archive.^[12] One of these modes must be selected.^[12] Several key options modify the behavior across modes.
The -v or --verbose option lists the files as they are processed and, combined with -t, produces a listing similar to ls -l.^[12] The -t or --list option, valid only in copy-in mode, displays a table of contents for the archive without extracting files; it implies -i if no mode is specified.^[12] In copy-in and pass-through modes, -d or --make-directories creates leading directories as needed.^[12] The -u or --unconditional flag replaces existing files even when they are newer than the archived copy, which cpio otherwise declines to do.^[12] The archive format is selected with -H format or --format=format; common values include newc (new portable ASCII) and odc (old portable ASCII). GNU cpio writes bin (binary) by default in copy-out mode and detects the format automatically in copy-in mode.^[12] Because file lists are read from standard input, they can come from pipelines with commands like ls or echo.^[12] For recursive inclusion of directories, find is commonly used to generate the list, as in find . -print | cpio -o > archive.cpio; to handle file names containing newlines, the -0 or --null option reads NUL-terminated names, paired with find ... -print0.^[12] cpio returns an exit status of 0 upon successful completion and a nonzero status (2 in some implementations) if errors occurred.^[7]
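The difference between the default one-name-per-line input and -0/--null can be sketched in Python. The helper names encode_list and decode_list are hypothetical; they only model how a name list is framed and split by a reader and do not invoke cpio.

```python
import os

def encode_list(paths: list[str], null: bool) -> bytes:
    """Frame path names the way cpio expects its input list: one name per
    line by default, or NUL-terminated when -0/--null is given (the framing
    produced by `find ... -print0`)."""
    sep = b"\0" if null else b"\n"
    return b"".join(os.fsencode(p) + sep for p in paths)

def decode_list(data: bytes, null: bool) -> list[bytes]:
    """Split a name list back into names as a reader using the same
    delimiter would see them (empty fields are ignored)."""
    sep = b"\0" if null else b"\n"
    return [name for name in data.split(sep) if name]

names = ["notes.txt", "odd\nname.txt"]  # the second name contains a newline
# newline framing splits the second name into two bogus names
print(decode_list(encode_list(names, null=False), null=False))
# NUL framing keeps it intact
print(decode_list(encode_list(names, null=True), null=True))
```

With real files, the NUL-framed bytes correspond to what find . -print0 | cpio --null -o -H newc > archive.cpio passes between the two commands.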

History

Origins in Unix

The cpio utility was originally developed by Dick Haight while working in AT&T's Unix Support Group at Bell Labs. It first appeared in 1977 as part of PWB/UNIX 1.0, the Programmer's Work Bench edition of Unix, which was based on Version 6 Unix and targeted at programming environments. This initial implementation addressed the need to copy files into and out of archives, particularly for backups on tape devices, a different job from that of utilities like pack, which handled compression rather than structured archiving.^[13]^[14] The design of cpio emphasized simplicity and portability, processing files and archives sequentially in a way that suited tapes and early Unix file systems. Its core functionality, creating archives from a list of names on standard input (often piped from find) and extracting or listing their contents, made it a foundational system administration tool in environments like PWB/UNIX. The original format, now called the "old binary" format, used a fixed 26-byte header of 16-bit integers to record metadata such as permissions, ownership, file size, and the length of the file name that follows, a compact structure suited to the era's hardware; the portable ASCII variant of the same layout (odc) uses a 76-byte header.^[14]^[15] cpio was first released outside AT&T as part of UNIX System III in 1981, marking its broader availability in commercial Unix distributions. Its portable ASCII archive format was later standardized in POSIX.1-1988, which helped make cpio archives portable across Unix variants, although the utility itself was eventually dropped from POSIX in favor of pax. Subsequent evolutions built on this foundation but retained the original's emphasis on tape-oriented operations.^[1]

Key Milestones and Evolutions

In the 1980s, cpio saw significant adoption within the UNIX ecosystem through its inclusion in AT&T's System V Release 1 in 1983, a key step in standardizing file archiving tools across commercial UNIX variants.^[16] This integration facilitated broader use in backup and distribution tasks. A major evolution came with the "newc" (new ASCII) format in System V Release 4 around 1988, which uses 8-digit hexadecimal ASCII fields, widening the inode field to 32 bits and splitting device numbers into major and minor parts, while keeping the byte-order independence of an ASCII header so that archives move between architectures without conversion.^[17] Alongside it came the "crc" variant, which uses the same header but fills its checksum field with a simple sum of the file's bytes; despite the name, this is not a true cyclic redundancy check.^[18] The 1990s brought formal standardization: POSIX.1-1990 (IEEE Std 1003.1-1990) retained the portable odc format as an interchange format that preserves permissions, ownership, and timestamps.^[19]
Meanwhile, the GNU cpio project was initiated in 1995 under the Free Software Foundation, introducing enhancements such as access to remote archives, sparse-file handling, and automatic format detection.^[3] During the 2000s, cpio's role expanded in embedded and kernel environments, notably through Linux's initramfs, adopted with the 2.6 kernel series, in which compressed cpio archives serve as initial RAM filesystems for early boot, simplifying module loading and root mounting.^[20] Some cpio-compatible tools, particularly those aligned with pax, began supporting extensions for extended attributes and longer paths while remaining compatible with the core formats.^[21] Since 2010, development has focused on security refinements with minimal changes to the core formats. GNU cpio releases have hardened extraction of untrusted archives against path traversal, complementing the long-standing --no-absolute-filenames option, which strips leading slashes so that members with absolute paths cannot be written outside the current directory. GNU cpio 2.15, released in January 2024, continued this focus with further bug fixes.^[22]^[23] These updates underscore cpio's enduring stability, with ongoing maintenance emphasizing robustness over radical redesign.^[3]

Archive Format

Overall Structure

A cpio archive is fundamentally a sequential stream formed by concatenating file entries, each representing a single file, directory, or other filesystem object from the original system. This design lets the archive be processed as a continuous byte stream, so it can be created and extracted without random access to specific positions within the file.^[24]^[25] Each entry begins with a fixed-size header that encodes essential metadata, followed by the file name (NUL-terminated, with its length, including the NUL, given in the header), then the file data, if any (directories and empty files have none). Depending on the format, the name and the data may each be followed by a few padding bytes. Tools may pad their output to a block size (e.g., 512 bytes) for media like tapes, but the format itself requires only small alignments: 2 bytes in the binary format, 4 bytes in newc and crc, and none in odc.^[24]^[25] Entries appear in the order in which they were added. The archive ends with a trailer entry named "TRAILER!!!": a header with a file size of 0, followed by the 11-byte name (ten characters plus the terminating NUL) and no data. The trailer uses the same header format, and therefore the same magic number, as the other entries.^[24]^[25] cpio supports several format variants to balance portability, efficiency, and feature richness: the binary format (bin) uses compact binary integers, which is fast but non-portable across architectures because of machine-dependent byte order; the old ASCII format (odc) uses octal digit strings, improving portability; the new ASCII format (newc) uses hexadecimal strings and wider fields; and the crc variant of newc adds a checksum. Implementations such as GNU cpio can also read and write the tar formats old tar and POSIX.1-1988 ustar, allowing interoperability with tools like tar and pax. Across these variants, cpio preserves critical file attributes, including permissions (mode), user and group ownership (UID/GID), modification times (mtime), and inode numbers, which are used to recognize hard links.^[24]^[25]^[3] Because of its stream-oriented design, a cpio archive can be read or written incrementally in a single pass: each header provides everything needed to parse the data that follows, without an index or seeking, which is particularly useful for tape backups and Unix pipelines. The format does not mandate a block size, although tools often default to 512-byte blocks for compatibility with magnetic tape media.^[24]^[25]
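To make the entry layout concrete, here is a minimal Python sketch that writes an archive in the newc variant: a 110-byte header, the NUL-terminated name, the data, 4-byte padding, and a final TRAILER!!! entry. It assumes in-memory files with fixed, illustrative metadata (uid and gid 0, mtime 0, mode 0100644, made-up inode numbers); the function names newc_entry and newc_archive are hypothetical, and this is not how any particular cpio implementation is written.

```python
def _pad4(length: int) -> bytes:
    """NUL bytes needed to round `length` up to a multiple of 4."""
    return b"\0" * (-length % 4)

def newc_entry(name: str, data: bytes = b"", mode: int = 0o100644,
               ino: int = 0, nlink: int = 1, mtime: int = 0) -> bytes:
    """Encode one newc entry: header, NUL-terminated name, data, padding."""
    name_bytes = name.encode() + b"\0"          # namesize counts the NUL
    fields = [ino, mode, 0, 0, nlink, mtime,    # uid and gid fixed at 0 here
              len(data),                        # filesize
              0, 0, 0, 0,                       # dev and rdev major/minor
              len(name_bytes), 0]               # namesize, check (0 in newc)
    header = b"070701" + b"".join(b"%08X" % f for f in fields)
    entry = header + name_bytes
    entry += _pad4(len(entry))          # header + name end on a 4-byte boundary
    entry += data + _pad4(len(data))    # data is padded to 4 bytes as well
    return entry

def newc_archive(files: dict[str, bytes], block: int = 512) -> bytes:
    """Concatenate entries, append the trailer, then pad to `block` bytes."""
    out = b""
    for ino, (name, data) in enumerate(files.items(), start=1):
        out += newc_entry(name, data, ino=ino)
    out += newc_entry("TRAILER!!!", mode=0)   # end marker: size 0, no data
    out += b"\0" * (-len(out) % block)        # tool-level padding, not the format
    return out

archive = newc_archive({"hello.txt": b"hello\n"})
```

Because every entry produced this way is a multiple of 4 bytes long, each header starts on a 4-byte boundary, and the final padding to 512 bytes mirrors the tool-level blocking described above rather than a requirement of the format.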

Header and File Data Details

The cpio archive format uses distinct header structures across its variants. The old binary (bin) format has a 26-byte header and the old portable ASCII (odc) format a 76-byte header; both are identified by the magic number 070707 (octal), stored as a 16-bit integer in bin and as the six ASCII characters "070707" in odc. In the old binary format, the header consists of fixed binary fields in machine-dependent byte order: magic (2 bytes), device ID (2 bytes), inode number (2 bytes), mode (2 bytes, including the S_IFMT file-type bits, such as 0040000 octal for directories), user ID (2 bytes), group ID (2 bytes), link count (2 bytes), raw device ID (2 bytes), modification time (4 bytes), name size (2 bytes), and file size (4 bytes); each 4-byte value is stored as two 16-bit words, most significant word first. The odc format has the same fields in the same order but encodes them as fixed-width, zero-padded octal strings: 6 characters each for magic, device, inode, mode, UID, GID, link count, raw device, and name size, and 11 characters each for modification time and file size, for a total of 76 bytes. The field widths bound the values: odc allows inode numbers up to 2^18 - 1 (six octal digits) and file sizes just under 8 GiB, while the binary format limits inode numbers to 16 bits.^[18] The newer ASCII format, newc, identified by the magic string 070701, extends the header to 110 bytes to accommodate larger values and separate major and minor device numbers. The 6-character magic is followed by thirteen 8-character, zero-padded hexadecimal fields: inode, mode (with S_IFMT bits), UID, GID, link count, modification time, file size, device major, device minor, raw device major, raw device minor, name size, and a checksum field (always 0 in newc). This allows file sizes up to 4 GiB and 32-bit inode numbers, and the separate major/minor fields represent special files precisely.
The crc variant uses the identical 110-byte header with magic 070702 and stores in the checksum field a 32-bit unsigned sum of the file data bytes for integrity verification.^[18] The file name follows the header as a NUL-terminated string whose length, including the NUL, is given by the name size field. In odc no padding follows the name; in the binary format the header and name are padded with a NUL byte to an even length, and in newc and crc with NUL bytes so that header plus name occupy a multiple of 4 bytes. The file data, if any, follows as raw bytes, padded to a 2-byte boundary in the binary format, to a 4-byte boundary in newc and crc, and not at all in odc. Hard links are recorded as entries with a link count above 1 that share the same inode and device numbers; in newc and crc, only one of these entries carries the file data, and the others have a file size of 0 to avoid duplication. A directory is an entry whose mode has the S_IFDIR bits (0040000 octal) set and whose file size is 0; the files it contains are stored as separate entries with their own path names.^[18]^[3] Checksums are not part of the binary or odc formats; the crc variant stores in its 8-character hexadecimal field the sum of all file data bytes modulo 2^32, which provides basic error detection without changing the newc layout. The ASCII-based formats (odc, newc, crc) favor cross-platform portability through textual numeric fields that avoid endianness issues, at the cost of parsing fixed-width strings, whereas the compact binary format is less portable because of its host-dependent byte order.^[18]
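The two older headers can be compared with a short Python sketch that packs the same fields both ways. It assumes placeholder values (device, inode, uid, gid, and raw device set to 0, link count 1), and the helper names bin_header and odc_header are hypothetical. The byteorder parameter stands in for the host byte order that the binary format inherits.

```python
import struct

def bin_header(namesize: int, filesize: int, mode: int, mtime: int,
               byteorder: str = "<") -> bytes:
    """Old binary header: thirteen 16-bit words (26 bytes) in host byte order.
    32-bit values are split into two words, most significant word first."""
    words = [0o070707,                     # magic
             0, 0, mode, 0, 0, 1, 0,       # dev, ino, mode, uid, gid, nlink, rdev
             mtime >> 16, mtime & 0xFFFF,  # mtime as two halves
             namesize,
             filesize >> 16, filesize & 0xFFFF]
    return struct.pack(byteorder + "13H", *words)

def odc_header(namesize: int, filesize: int, mode: int, mtime: int) -> bytes:
    """Old portable ASCII header: zero-padded octal strings (76 bytes)."""
    six = [0o070707, 0, 0, mode, 0, 0, 1, 0]  # magic, dev, ino, mode, uid, gid, nlink, rdev
    return (b"".join(b"%06o" % v for v in six)
            + b"%011o" % mtime + b"%06o" % namesize + b"%011o" % filesize)
```

Printing odc_header(9, 6, 0o100644, 0) shows every field as readable octal digits, while the first two bytes of bin_header differ between the "<" and ">" byte orders, which is the portability problem the ASCII formats avoid.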

Core Operations

Archive Creation

To create a cpio archive, invoke the utility in copy-out mode with -o or --create; it reads a list of file names from standard input and writes the archive to standard output.^[1]^[26] The basic syntax is cpio -o [-H format] < filelist > archive.cpio, where the input list gives the path names of the files and directories to include, one per line by default.^[26]^[27] The list is typically generated with commands such as ls, echo, or find. For example, ls | cpio -o > directory.cpio archives the entries of the current directory without descending into subdirectories, while find . -print | cpio -o > tree.cpio archives the whole tree, including subdirectories and their contents.^[28]^[26] If file names may contain newlines, the list must be NUL-terminated, as in find ... -print0 | cpio --null -o, so that names are not split or misread.^[26] The output is a sequence of headers and file data in the format selected with -H (e.g., -H newc for the portable ASCII format expected by modern systems).^[1]^[26] Entries can be appended to an existing archive with -A or --append; in GNU cpio the archive must then be a file named with -F or -O rather than standard output.^[26]^[27] Compression is handled externally by piping the output, as in find . | cpio -o | gzip > archive.cpio.gz, which works equally with bzip2 or other compressors.^[28]^[27] cpio records file attributes during creation, including permissions, timestamps, and ownership; some implementations can also store ACLs when the filesystem supports them (for example, Solaris cpio with -P).^[1]^[27] Device nodes are stored as header-only entries whose device numbers go into the raw-device fields, and a symbolic link is stored with its target path as its data; -L dereferences symbolic links so that the files they point to are archived instead.
Some implementations, including GNU and Solaris cpio, accept -O file to write the archive to the named file instead of standard output.^[26]^[27] Common pitfalls include omitting NUL termination for unusual file names, which can produce incomplete archives, and listing absolute paths, which makes extraction recreate files at those absolute locations on the target system; generating relative paths (e.g., via find .) avoids this.^[28]^[26]
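What a writer stores for each kind of file can be sketched in Python. The hypothetical helper entry_payload below returns the mode and the bytes that would follow the header: a regular file's contents, a symbolic link's target path, or nothing for directories and device nodes; the dereference flag mimics the effect of -L. It is a simplification that ignores hard links and does not build headers.

```python
import os
import stat

def entry_payload(path: str, dereference: bool = False) -> tuple[int, bytes]:
    """Return (st_mode, data) that a cpio-style writer would store for `path`."""
    # lstat describes a symlink itself; stat follows it, like cpio -L
    st = os.stat(path) if dereference else os.lstat(path)
    if stat.S_ISREG(st.st_mode):
        with open(path, "rb") as f:
            return st.st_mode, f.read()          # regular file: its contents
    if stat.S_ISLNK(st.st_mode):
        return st.st_mode, os.fsencode(os.readlink(path))  # link target as data
    # directories, FIFOs, and device nodes carry no data; a device's numbers
    # would go into the header's raw-device fields instead
    return st.st_mode, b""
```

For a link created with ln -s target.txt link, entry_payload("link") would return the bytes b"target.txt" as its data, while entry_payload("link", dereference=True) would return the contents of target.txt.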

Archive Extraction and Listing

The cpio utility extracts archives in copy-in mode, invoked with -i, which reads an archive from standard input and recreates the archived files relative to the current working directory. Permissions, modification times (with -m), and ownership (user ID and group ID) are restored; setting ownership to a user other than the one running cpio typically requires root privileges.^[1]^[12] Key suboptions control extraction: -d or --make-directories creates any missing parent directories, preventing failures due to missing paths.^[1] The -u or --unconditional flag overwrites existing files regardless of their modification times, while -m or --preserve-modification-time keeps the original timestamps from the archive (though this may not apply to directories).^[1] A basic extraction command is cpio -i < archive.cpio, which processes all files, as if the pattern * had been given.^[1] For listing archive contents without extraction, combine -i with -t or --list to print a table of contents of file names; adding -v or --verbose gives a detailed, ls -l-style listing with sizes, dates, permissions, and ownership.^[1] Selective extraction or listing uses shell wildcard patterns (e.g., cpio -i '*.txt' < archive.cpio extracts only text files); GNU cpio can also read patterns from a file with -E or --pattern-file.^[1]^[12] Listing with -t also serves as a verification step, since it reads the whole archive and lets users inspect its contents before extracting.^[12] To extract into a specific directory, change to it with cd before running cpio; GNU cpio's --no-absolute-filenames option additionally strips leading slashes, so members with absolute paths are created relative to the current directory instead of overwriting files elsewhere in the filesystem.^[12] Compressed archives can be extracted through a pipe, as in zcat archive.cpio.gz | cpio -i -d, without writing a decompressed copy to disk.^[12]
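Pattern-based selection and the effect of --no-absolute-filenames can be approximated in Python. This sketch assumes that patterns are matched against whole member names with shell-style globbing in which * also matches /; the helper names select_members and strip_absolute are hypothetical, and real implementations may differ in edge cases such as names made only of slashes or names containing .. components.

```python
from fnmatch import fnmatchcase

def select_members(names, patterns=("*",), nonmatching=False):
    """Keep member names matching any glob pattern (like `cpio -i PATTERN`);
    with nonmatching=True keep the others instead (like -f)."""
    # fnmatchcase is case-sensitive and lets "*" match "/" as well
    return [n for n in names
            if any(fnmatchcase(n, p) for p in patterns) != nonmatching]

def strip_absolute(name: str) -> str:
    """Drop leading slashes so '/etc/motd' is created as 'etc/motd' under
    the current directory (the idea behind --no-absolute-filenames)."""
    return name.lstrip("/") or "."

members = ["/etc/motd", "docs/a.txt", "b.txt", "src/main.c"]
print(select_members(members, ["*.txt"]))
print([strip_absolute(n) for n in members])
```

Passing nonmatching=True corresponds to the exclusion behavior of -f, so select_members(members, ["*.txt"], nonmatching=True) keeps only /etc/motd and src/main.c.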

Advanced Features

File Passing and Copying

cpio's copy-pass mode copies files and directories from one location to another without producing an intermediate archive file, effectively combining copy-out and copy-in in a single pass. It reads a list of path names from standard input, typically generated by find, and copies the files into a destination directory, preserving attributes such as permissions, timestamps, and ownership where possible. It is invoked with -p or --pass-through followed by the destination directory, as in find source -print | cpio -p destination.^[12]^[7] cpio does not recurse on its own; it copies exactly the paths in the list, so when the list names a directory's contents (as find's output does), the tree is recreated at the destination, with -d or --make-directories creating parent directories as needed. The -l or --link option creates hard links to the source files instead of copies whenever possible, which saves disk space and time but works only when source and destination are on the same filesystem. The -L or --dereference option follows symbolic links and copies their targets, whereas by default the links themselves are copied. The -u or --unconditional option replaces existing files regardless of timestamps, overwriting even newer destination files. Copies may cross filesystem boundaries, keeping modification times (with -m) and access control lists (with -P, where supported).^[12]^[5]^[7] The main advantages of copy-pass mode are that it preserves file metadata across filesystems, which suits system migrations and mirroring directory trees, and that it needs no temporary archive, saving disk space and I/O.
It is particularly efficient for local copies where hard links can be used. However, it depends on a pre-generated input list, usually from find, and it cannot write to remote destinations by itself; for remote copies, copy-out and copy-in can be joined with ssh, as in find . -print | cpio -o | ssh user@host 'cd /remote/dir && cpio -idm'.^[28]^[5]^[7] A common workflow similar to cp -a, for recursive copying with attribute preservation, is find . -depth -print | cpio -pdum /destination. Here -depth makes find list each directory after its contents, so a directory's own permissions and timestamp are applied only after it has been populated; -p selects pass-through mode, -d creates directories, -u overwrites unconditionally, and -m retains modification times. This command copies the tree under the current directory to /destination, illustrating cpio's use for attribute-aware duplication of file trees.^[28]^[5]
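The ordering that find -depth provides can be reproduced with a short Python walk. The function name depth_first is hypothetical; it lists each directory after its contents and, like find without -L, does not follow symbolic links. Sorting by name is an added assumption for repeatable output, since find itself does not sort.

```python
import os

def depth_first(top: str) -> list[str]:
    """Post-order listing, like `find top -depth -print`."""
    paths = []
    with os.scandir(top) as it:
        entries = sorted(it, key=lambda e: e.name)  # sorted only for stable output
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            paths.extend(depth_first(entry.path))   # contents first...
        else:
            paths.append(entry.path)
    paths.append(top)                               # ...then the directory itself
    return paths
```

Joining the result with newlines and feeding it to cpio -pdum /destination would correspond to the shell pipeline described above.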

Error Handling and Options

cpio reports several common errors. A "premature end of archive" error occurs when it reads an incomplete or truncated archive, for example after a transmission or media failure.^[3] "File not found" errors occur when the input list names nonexistent paths during archive creation; cpio reports those entries, skips them, and continues. Permission errors often occur during extraction, for instance when restoring ownership without sufficient privileges or when the target directory is not writable.^[5] The -v or --verbose option, which lists each file as it is processed, helps locate where such problems occur.^[3] For selective handling, -f or --nonmatching in copy-in mode extracts only files that do not match the given shell glob patterns, which is useful for excluding items such as temporary files. The -R or --owner=[user][:.][group] option sets the ownership of extracted or copied files, overriding the archive's metadata; assigning other users' UIDs or GIDs requires superuser privileges.^[3] The --quiet option suppresses the final report of the number of blocks copied, which helps in scripts.^[3] In copy-in mode, command-line arguments act as glob patterns, and only matching files are processed, which gives precise control over what is extracted. Among the security-related options, --no-preserve-owner leaves extracted files owned by the extracting user instead of the owner recorded in the archive. It is the default for non-root users, so that they do not inadvertently give files away on systems that allow unprivileged chown, and root can use it to avoid creating files owned by arbitrary users when extracting untrusted archives.^[3]
Support for long path names varies: some older implementations warn or fail on long names because of internal limits, even though the newc name-size field is 32 bits wide; GNU cpio handles longer names, but using a format that all target implementations accept avoids truncation.^[3] For debugging, the --verbose option provides a trace of file operations and is the main diagnostic tool, since standard implementations have no dedicated --debug flag. When interrupted by a signal such as SIGINT (Ctrl+C), cpio stops the operation; a partially written archive or partially extracted tree may remain and need manual cleanup.^[3] The -W or --warning option controls which warnings are shown; for example, -W truncate warns when a value, such as an inode number, is too large for a field of the chosen format and has to be truncated.^[3] Performance can be improved with -B, which sets the I/O block size to 5120 bytes instead of the default 512, speeding up reads and writes on tape devices and slow media by reducing per-block overhead. This matters most for large archives, because it reduces the number of system calls during data transfer.^[5]
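The truncation that -W truncate warns about follows from the field widths of each header format, and the block padding behind -B is simple arithmetic. The Python sketch below derives the limits from those widths; the table covers only three fields, and the names FIELD_LIMITS, oversized_fields, and padded_length are hypothetical.

```python
# Largest value each field can hold, derived from its width in the header:
# bin uses 16-bit words (32 bits for filesize), odc uses 6 or 11 octal digits,
# newc uses 8 hexadecimal digits.
FIELD_LIMITS = {
    "bin":  {"ino": 2**16 - 1, "namesize": 2**16 - 1, "filesize": 2**32 - 1},
    "odc":  {"ino": 8**6 - 1,  "namesize": 8**6 - 1,  "filesize": 8**11 - 1},
    "newc": {"ino": 16**8 - 1, "namesize": 16**8 - 1, "filesize": 16**8 - 1},
}

def oversized_fields(fmt: str, **values: int) -> list[str]:
    """Names of fields whose values would not fit (and would be truncated)."""
    limits = FIELD_LIMITS[fmt]
    return [field for field, value in values.items() if value > limits[field]]

def padded_length(nbytes: int, block: int = 512) -> int:
    """Archive length after padding to a whole number of I/O blocks."""
    return nbytes + (-nbytes % block)

print(oversized_fields("odc", ino=300_000, filesize=5 * 2**30))   # ['ino']
print(oversized_fields("newc", ino=300_000, filesize=5 * 2**30))  # ['filesize']
print(padded_length(1000), padded_length(1000, block=5120))       # 1024 5120
```

The two calls show the trade-off between the ASCII formats: odc has room for a 5 GiB file but not for a six-digit-plus inode number, while newc is the reverse.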

Standardization and Compatibility

POSIX Specifications

IEEE Std 1003.1-1996 (ISO/IEC 9945-1:1996) defined the portable cpio interchange format, while the cpio utility itself was specified in the Single UNIX Specification Version 2 (1997) as a legacy utility. That specification requires three operational modes: copy-out mode (-o) for creating archives from file lists, copy-in mode (-i) for extracting or listing archive contents, and copy-pass mode (-p) for copying files between directory trees without an intermediate archive, with -c selecting the portable ASCII (odc) header format. Required options include -d to create necessary parent directories during extraction or passing, -m to retain original modification times, -u to overwrite existing files unconditionally, -v for verbose output, and -t (in copy-in mode) to produce a table of contents without extraction.^[1] The specified behaviors emphasize preservation of file attributes: cpio must retain file types (regular files, directories, symbolic links, special files), permissions, and ownership where possible (typically requiring appropriate privileges), and it identifies hard links through the inode numbers in the archive header; in copy-pass mode, -l creates links instead of copies. Input is read from standard input (a list of path names for copy-out and copy-pass, or an archive for copy-in); copy-out writes the archive to standard output, while copy-in and copy-pass create files in the file system. Archives are written in 512-byte blocks for tape compatibility, and because no built-in recursion is provided, tools like find are needed to process directory hierarchies.
Exit codes are 0 for successful completion and greater than 0 if an error occurred.^[1] The synopsis in that specification is cpio -o[options] for copy-out, cpio -i[options] [patterns] for copy-in, and cpio -p[options] directory for copy-pass, which provides a common baseline across conforming systems. Compliance is assessed through certification test suites from The Open Group; implementations may add extensions, such as additional format options, if they document them. The cpio utility was removed from the Shell and Utilities volume starting with IEEE Std 1003.1-2001 (POSIX.1-2001), which merged POSIX with the Single UNIX Specification and retained pax as the archiving utility; the cpio archive format constants remain defined in the <cpio.h> header in later standards, including POSIX.1-2008.^[1]^[29]

Vendor and Implementation Variations

Implementations of the cpio utility vary across Unix-like systems, adding extensions and behaviors that go beyond or deviate from the POSIX baseline to meet specific platform needs or to preserve historical behavior. In Solaris and its derivative illumos, cpio supports the bar, crc, odc, tar, and ustar formats. Solaris and illumos cpio also let the -I and -O options name the input and output archive files directly, instead of relying on standard input and output, which is convenient in scripted or automated workflows.^[5]^[30] On AIX and HP-UX, cpio supports the CRC format through the -H crc option for checksummed archives, in addition to the portable ASCII format selected with -c.^[2]^[31] Compatibility problems arise mainly with binary archive formats, whose header fields are stored in the byte order of the machine that wrote them. For instance, an archive created on a big-endian system such as older HP-UX may fail to extract correctly on a little-endian platform unless the reading tool detects or swaps the byte order.^[32] Support for long filenames also varies: some implementations limit names to 255 characters, while GNU cpio uses non-standard extensions to accommodate longer paths, which can cause truncation or errors when archives move between implementations.^[33] For better interoperability, the ASCII-based newc format is recommended, because its header fields are written as text and are therefore unaffected by byte order.^[12] Tools like pax bridge the remaining gaps by reading and writing cpio archives alongside tar formats, easing migration between the two families.^[21] Although the cpio utility was removed from POSIX.1-2001 in favor of pax, owing to its limitations with large files and modern features, it remains integral in embedded and boot contexts. In the Linux initramfs, for example, the simple newc-format cpio archive lets the kernel efficiently unpack its initial filesystem.^[34]^[20]
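The byte-order problem, and the reason newc avoids it, shows up in the first bytes of an archive. The Python sketch below classifies an archive by its magic number. The function name detect_cpio_format is hypothetical, and the sketch only covers the classic cpio variants: it does not recognize tar or the HP-UX variants.

```python
# The old binary ("bin") header starts with the 16-bit value 0o070707
# (0x71C7), stored in the writing machine's native byte order. The ASCII
# variants spell their magic as six characters, so byte order cannot matter.
BIN_MAGIC = 0o070707


def detect_cpio_format(header: bytes) -> str:
    """Classify a cpio archive from its first six bytes."""
    if len(header) < 6:
        raise ValueError("need at least 6 header bytes")
    magic = header[:6]
    if magic == b"070701":
        return "newc"                # SVR4 portable ASCII, hex fields
    if magic == b"070702":
        return "crc"                 # newc layout with a checksum field
    if magic == b"070707":
        return "odc"                 # old portable ASCII (POSIX.1), octal fields
    if int.from_bytes(header[:2], "little") == BIN_MAGIC:
        return "bin, little-endian"  # e.g. written on x86
    if int.from_bytes(header[:2], "big") == BIN_MAGIC:
        return "bin, big-endian"     # e.g. written on older HP-UX or SPARC
    return "unknown"
```

A reader that finds "bin" with the opposite byte order from its own must swap every 16-bit header field before using it. This is the kind of adjustment that GNU cpio's automatic byte-order detection performs and that other tools may lack.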

Implementations

GNU cpio

GNU cpio is the GNU Project's implementation of the cpio utility, developed under the Free Software Foundation to provide a robust tool for archiving and file manipulation on Unix-like systems.^[35] It is currently maintained by Sergey Poznyakoff, and bug reports and contributions go through the bug-cpio mailing list.^[35] The latest stable release is version 2.15, issued on January 14, 2024.^[36] A key strength of GNU cpio is its set of extensions beyond the POSIX baseline. Users can select the archive format explicitly with the --format (-H) option, whose values include bin (old binary), odc (old portable ASCII, as in POSIX.1), newc (SVR4 portable ASCII), crc (newc with a per-file checksum), tar, and ustar (POSIX.1 tar).^[12] Ownership handling is configurable. --owner (-R) sets the user and group of files created during extraction or copying. --no-preserve-owner leaves extracted files owned by the user running cpio, which is already the default for non-root users.^[12] Additional features include --sparse, which writes files containing large zero-filled blocks as sparse files on filesystems that support them, and -a (--reset-access-time), which restores the original access times of files after reading them in copy-out or copy-pass mode.^[12] In Linux distributions, GNU cpio plays a central role in system initialization through integration with dracut, the tool that generates initramfs images in cpio format to preload kernel modules and essential files during boot, mainly on Fedora and other Red Hat-derived distributions.^[37] It is also convenient in scripts: --make-directories (-d) creates leading directories as needed, which is valuable for automated archive extraction in deployment or backup scripts.
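As a rough illustration of the newc layout that cpio -o -H newc writes, and that initramfs images use, the Python sketch below builds a tiny archive in memory. It is not GNU cpio's code. The helper names are hypothetical, and the sketch assumes regular files only, with owner, group, and device numbers fixed at zero.

```python
import io
import stat

NEWC_MAGIC = b"070701"


def _newc_header(ino, mode, nlink, mtime, filesize, namesize):
    # 13 fields, each 8 uppercase hex digits: ino, mode, uid, gid, nlink,
    # mtime, filesize, devmajor, devminor, rdevmajor, rdevminor, namesize, check.
    # uid/gid/device numbers are fixed at 0 in this sketch; check is 0 for newc.
    fields = [ino, mode, 0, 0, nlink, mtime, filesize, 0, 0, 0, 0, namesize, 0]
    return NEWC_MAGIC + b"".join(b"%08X" % value for value in fields)


def _pad_to_4(out):
    # newc aligns both the header+name and the file data to 4-byte boundaries.
    out.write(b"\0" * (-out.tell() % 4))


def write_newc(files, mtime=0):
    """Return a newc archive containing the given (name, data) regular files."""
    out = io.BytesIO()
    for ino, (name, data) in enumerate(files, start=1):
        encoded = name.encode() + b"\0"          # namesize counts the NUL
        mode = stat.S_IFREG | 0o644
        out.write(_newc_header(ino, mode, 1, mtime, len(data), len(encoded)))
        out.write(encoded)
        _pad_to_4(out)
        out.write(data)
        _pad_to_4(out)
    # The archive ends with a special entry named TRAILER!!!.
    trailer = b"TRAILER!!!\0"
    out.write(_newc_header(0, 0, 1, 0, 0, len(trailer)))
    out.write(trailer)
    _pad_to_4(out)
    return out.getvalue()
```

Writing write_newc([("hello.txt", b"hi\n")]) to a file and running cpio -it on it should list hello.txt. Real archivers also record directories, ownership, and device numbers, which this sketch leaves out.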
The full option set, including verbose output, pattern matching, and device handling, is documented in the GNU cpio manual, the primary reference for advanced usage.^[12] GNU cpio is distributed under the GNU General Public License version 3 or later, which permits free redistribution and modification while requiring source code availability. When compiled from source, it can be configured with large file support (LFS), which enables transparent handling of files larger than 2 GiB on 32-bit systems through autoconf macros such as AC_SYS_LARGEFILE.^[38] This keeps it compatible with modern filesystems and large-scale data operations. For large archives, GNU cpio relies on blocked I/O that reduces the number of system calls. The -B option raises the I/O block size from the 512-byte default to 5120 bytes, which can noticeably speed up bulk transfers to tapes and pipes.^[12] These characteristics make it suitable for high-volume tasks in Linux environments.
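The effect of the -B block size can be modeled with a small buffered writer that only issues full-size writes. This is a sketch under stated assumptions, not GNU cpio's I/O code. The BlockedWriter class and the count_writes helper are hypothetical, and zero-padding the final block is a simplification chosen here.

```python
import io


class BlockedWriter:
    """Collect small writes and pass them to `sink` in fixed-size blocks."""

    def __init__(self, sink, block_size=512):
        self.sink = sink
        self.block_size = block_size
        self.buffer = bytearray()
        self.write_calls = 0  # stands in for the number of write() system calls

    def write(self, data: bytes) -> None:
        self.buffer += data
        while len(self.buffer) >= self.block_size:
            self._emit(bytes(self.buffer[:self.block_size]))
            del self.buffer[:self.block_size]

    def close(self) -> None:
        # This sketch zero-pads the last partial block to a full block.
        if self.buffer:
            self.buffer += b"\0" * (self.block_size - len(self.buffer))
            self._emit(bytes(self.buffer))
            self.buffer.clear()

    def _emit(self, block: bytes) -> None:
        self.sink.write(block)
        self.write_calls += 1


def count_writes(payload: bytes, block_size: int) -> int:
    """Feed payload in 100-byte pieces and report how many blocks were written."""
    writer = BlockedWriter(io.BytesIO(), block_size)
    for i in range(0, len(payload), 100):
        writer.write(payload[i:i + 100])
    writer.close()
    return writer.write_calls
```

Feeding 12,000 bytes through the sketch takes 24 writes with 512-byte blocks but only 3 with 5120-byte blocks. That is the kind of system-call reduction the larger block size is meant to achieve.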

BSD and Other Unix-like Variants

In BSD-derived systems, the cpio utility is typically integrated into the base system and built on the libarchive library, which gives it broad format support and portability beyond the traditional cpio formats.^[39]^[40] Compared with the GNU implementation, it emphasizes a compact footprint and multi-format interoperability, such as extracting from tar or pax archives without a separate tool.^[41] FreeBSD and NetBSD ship cpio as a core component. Built on libarchive, it reads tar, pax, cpio, zip, jar, ar, and ISO 9660 images and writes tar, pax, cpio, ar, or shar formats.^[39]^[40] In FreeBSD 14, released in 2023, this implementation provides tar interoperability directly through libarchive, so users can process mixed archive types with a single tool. Key options include -L, which follows symbolic links in copy-out and pass-through modes so that the files' contents are archived rather than the links themselves.^[39] NetBSD's version likewise follows POSIX and adds compression options such as -z (gzip) and -J (xz), making it suitable for system administration in resource-constrained environments.^[40] OpenBSD's cpio takes a minimalist approach. It focuses on copying files to and from cpio archives, while also supporting the bcpio, sv4cpio, pax, tar, and ustar formats.^[27] Its error handling is explicitly specified. When cpio cannot create a file or link, it writes a diagnostic to standard error, returns a non-zero exit status, and continues processing. When a hard link cannot be created, it does not create a duplicate copy of the file instead.^[27] It provides -z for gzip and -j for bzip2 compression and decompression, but omits some options found in other variants, such as the -s and -S byte-swapping options.^[27] This reduced option set is in keeping with OpenBSD's general emphasis on minimalism and secure defaults.
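OpenBSD's documented rule for failed hard links can be sketched as a loop. Each failure is reported on standard error and recorded as a non-zero exit status, and processing moves on without writing a substitute copy. The Python below is a hypothetical model of that rule only: create_links and demo are illustrative names, and OpenBSD's actual cpio is written in C.

```python
import os
import sys
import tempfile


def create_links(link_requests, dest_dir):
    """Create hard links listed in an archive, reporting failures and continuing.

    link_requests: iterable of (existing_name, new_name) pairs relative to dest_dir.
    Returns 0 if every link was made and 1 otherwise, matching cpio's documented
    exit values. A failed link is only reported; no fallback copy is written.
    """
    status = 0
    for target, name in link_requests:
        try:
            os.link(os.path.join(dest_dir, target), os.path.join(dest_dir, name))
        except OSError as err:
            print(f"cpio: {name}: cannot link to {target}: {err.strerror}",
                  file=sys.stderr)
            status = 1  # remember the failure but keep processing
    return status


def demo():
    """Hypothetical scenario: one link target is missing from the destination."""
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "a"), "w") as f:
            f.write("data")
        status = create_links([("a", "b"), ("missing", "c"), ("a", "d")], d)
        return status, sorted(os.listdir(d))
```

Calling demo() returns (1, ['a', 'b', 'd']). The link to the missing target is reported, yet the link after it is still created, and no file named c appears.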
Other Unix-like variants extend cpio's reach into specialized environments. macOS, being BSD-derived, provides cpio through libarchive's bsdcpio, which mirrors FreeBSD's multi-format support and is available out of the box.^[41] On Android, toybox supplies a lightweight cpio as part of its multi-tool binary. It is optimized for mobile systems, offering the basic extraction (-i), creation (-o), and pass-through (-p) modes while omitting most other options to keep the binary small.^[42] For embedded systems, BusyBox offers a stripped-down cpio with a reduced option set, such as limited pass-through support and no advanced compression, to save space on resource-limited devices like routers and IoT hardware.^[43] Compared to GNU cpio, the BSD variants can have a smaller memory and disk footprint, but they do not accept every GNU-specific option, so scripts written for GNU cpio may need adjustment. In practice, BSD users often pair cpio with pax for modern archiving needs, relying on pax's extended POSIX features for better handling of long filenames and permissions in heterogeneous environments.^[44]
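The pass-through mode shared by toybox, BusyBox, and the fuller implementations reads a list of paths, typically produced by find, and copies each one into a destination directory. The Python sketch below models that behavior, plus a flag standing in for GNU cpio's -d option. copy_pass, its parameters, and demo are hypothetical, and ownership preservation, hard links, and special files are left out.

```python
import os
import shutil
import sys
import tempfile


def copy_pass(listing, src_root, dest_dir, make_directories=False):
    """Copy each newline-terminated relative path from src_root into dest_dir.

    Regular files keep their timestamps and permission bits; symbolic links
    are recreated as links (cpio's default without -L). make_directories
    plays the role of -d by creating missing parent directories.
    Returns the number of entries copied.
    """
    copied = 0
    for line in listing:
        rel = line.rstrip("\n")
        if not rel:
            continue
        src = os.path.join(src_root, rel)
        dst = os.path.join(dest_dir, rel)
        parent = os.path.dirname(dst)
        if not os.path.isdir(parent):
            if not make_directories:
                print(f"cpio: {rel}: parent directory missing", file=sys.stderr)
                continue
            os.makedirs(parent, exist_ok=True)
        if os.path.isdir(src) and not os.path.islink(src):
            os.makedirs(dst, exist_ok=True)  # listed directory: create it
        else:
            shutil.copy2(src, dst, follow_symlinks=False)
        copied += 1
    return copied


def demo():
    """Hypothetical tree with top.txt and dir/sub/file.txt, listed like find output."""
    with tempfile.TemporaryDirectory() as src, \
         tempfile.TemporaryDirectory() as with_d, \
         tempfile.TemporaryDirectory() as without_d:
        nested_rel = os.path.join("dir", "sub", "file.txt")
        os.makedirs(os.path.join(src, "dir", "sub"))
        for rel in ("top.txt", nested_rel):
            with open(os.path.join(src, rel), "w") as f:
                f.write(rel)
        listing = ["top.txt\n", nested_rel + "\n"]
        n_with = copy_pass(listing, src, with_d, make_directories=True)
        n_without = copy_pass(listing, src, without_d, make_directories=False)
        nested_ok = os.path.isfile(os.path.join(with_d, nested_rel))
        return n_with, n_without, nested_ok
```

demo() returns (2, 1, True). Without the make_directories flag, the nested file is skipped because its parent directory does not exist in the destination, which mirrors why -d is often paired with pass-through copies.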

References

[1]
cpio
The cpio utility copies files to an archive, extracts files from an archive, or copies files between directory trees.
[2]
cpio(1): copy files to/from archives - Linux man page
GNU cpio is a tool for creating and extracting archives, or copying files from one place to another. It handles a number of cpio formats as well as reading and ...
[3]
cpio
[4]
https://manpages.ubuntu.com/manpages/bionic/man1/bsdcpio.1.html
[5]
cpio - man pages section 1: User Commands - Oracle Help Center
Jul 27, 2022 · The cpio command copies files into and out of a cpio archive, which can span multiple volumes. It uses -i, -o, and -p options.
[6]
cpio: - GNU.org
Dec 20, 2004 · GNU cpio is a tool for creating and extracting archives, or copying files from one place to another. It handles a number of cpio formats as well as reading and ...
[7]
cpio(1) - Arch manual pages
DESCRIPTION. GNU cpio copies files between archives and directories. It supports the following archive formats: old binary cpio, old portable cpio, SVR4 cpio ...
[8]
Comparing tar and cpio - CCSF
cpio is useful for archiving bits and pieces of data. In order to tell it to archive every file in a directory, it must be given the path of each file in the ...
[9]
Packaging and distributing software | Red Hat Enterprise Linux | 8
The payload is a cpio archive that contains files to install to the system. There are two types of RPM packages. Both types share the file format and tooling, ...
[10]
Tutorial: Use Linux cpio to back up and restore files - TechTarget
Dec 15, 2023 · The cpio utility is an archive-based tool, much like tar. It copies files into an archive to back them up and extracts them from an archive to restore them.
[11]
ramfs-rootfs-initramfs.txt - The Linux Kernel Archives
All 2.6 Linux kernels contain a gzipped "cpio" format archive, which is extracted into rootfs when the kernel boots up. After extracting, the kernel checks to ...
[12]
cpio: 3.4 Options - GNU.org
This section summarizes all available command line options. References in square brackets after each option indicate cpio modes in which this option is valid.
[13]
Dick Haight - Unix Heritage Wiki
May 13, 2022 · Haight contributed find, cpio, and expr, all in v7. By personal communication with Dick Haight we know he also contributed popen (in v7) and ...
[14]
8.5 Comparison of tar and cpio - GNU.org
cpio first showed up in PWB/UNIX 1.0; no generally-available version of UNIX had tar at the time. I don't know whether any version that was generally available ...
[15]
cpio — format of cpio archive files - Ubuntu Manpage
PWB format The PWB binary cpio format is the original format, when cpio ... It appeared in 1977 as part of PWB/UNIX 1.0, the “Programmer's Work Bench ...
[16]
[PDF] UNIX System V manual
It provides the UNIX programmer or operating system user with an overview of this implementation and details of commands, subroutines, and other facilities.
[17]
cpio.1 - The Heirloom Project
The -c format was introduced with System V Release 4. Except for the file size, it imposes no practical limitations on files archived. The original SVR4 ...
[18]
cpio(5) - Arch manual pages
The cpio utility is no longer a part of POSIX or the Single Unix Standard. It last appeared in Version 2 of the Single UNIX Specification (“SUSv2”). It has been ...
[19]
[PDF] IEEE standard portable operating system interface for computer ...
The other POSIX standards are described in Appendix A. Organization of the Standard. The standard is divided into four parts: (1) Statement of scope (Chapter 1).
[20]
Ramfs, rootfs and initramfs - The Linux Kernel documentation
Oct 17, 2005 · What is initramfs?¶ ... All 2.6 Linux kernels contain a gzipped “cpio” format archive, which is extracted into rootfs when the kernel boots up.
[21]
pax - The Open Group Publications Catalog
The pax utility shall support the following formats: cpio: The cpio interchange format; see the EXTENDED DESCRIPTION section. The default blocksize for this ...
[22]
RHBA-2012:0444 - Bug Fix Advisory - Red Hat Customer Portal
Jun 20, 2012 · This update fixes the following bug: Prior to this update,the options --to-stdout and --no-absolute-filenames were. not listed in the cpio ( ...
[23]
Format of cpio archives - IBM
The cpio command reads and writes either a compact binary format header or an ASCII format header. The tar command reads and writes headers in either the ...
[24]
format of cpio archive files - FreeBSD
DESCRIPTION The cpio archive format collects any number of files, directories, and other file system objects (symbolic links, device nodes, etc.) into a single ...
[25]
https://people.freebsd.org/~kientzle/libarchive/man/cpio.5.txt
[26]
cpio: 3.1 Copy-out mode
[27]
cpio(1) - OpenBSD manual pages
EXIT STATUS. The cpio utility exits with one of the following values: 0: All files were processed successfully. 1: An error occurred. DIAGNOSTICS. Whenever cpio ...
[28]
cpio: 2 Tutorial - GNU.org
You can instruct cpio to remove leading slashes using the '--no-absolute-filenames' option. ... GNU find, combined with the '--null' option of cpio.
[29]
<cpio.h>
The SEE ALSO is updated to refer to pax, since the cpio utility is not included in the Shell and Utilities volume of IEEE Std 1003.1-2001. End of informative ...
[30]
manual page: bsdcpio.1 - Tribblix
CPIO(1) User Commands CPIO(1) ... newc The SVR4 portable cpio format. odc The old ... for the "odc" variant, which can support files up to 8 gigabytes. illumos ...
[31]
cpio Command - IBM
The cpio command copies files into and out of archive storage and directories. It can copy files to standard output or into a directory.
[32]
cpio Man Page - Linux - SS64.com
The following archive formats are supported: binary, old ASCII, new ASCII, crc, HPUX binary, HPUX old ASCII, old tar, and POSIX.1 tar. The tar format is ...
[33]
How would you use the cpio command?
Jan 31, 2014 · Cpio was originally designed to store backup file archives on a tape device in a sequential, contiguous manner.
[34]
cpio - man pages section 1: User Commands - Oracle Help Center
The cpio command copies files into and out of a cpio archive. The cpio archive can span multiple volumes. The –i, –o, and –p options select the action to be ...
[35]
Cpio - GNU Project - Free Software Foundation
GNU cpio copies files into or out of a cpio or tar archive. The archive can be another file on the disk, a magnetic tape, or a pipe.
[36]
Index of /gnu/cpio
Current stable version: GNU cpio 2.15
[37]
How to build an initramfs using Dracut on Linux - LinuxConfig
Sep 22, 2025 · Dracut is a tool used to build initramfs cpio archives. It originated, and is mainly used on Fedora and the other distributions that are part of the Red Hat ...
[38]
https://www.gnu.org/software/cpio/manual/html_node/Installation.html
[39]
cpio
[40]
cpio(1) - NetBSD Manual Pages
They first appeared in 1977 in PWB/UNIX 1.0, the ``Programmer's Work Bench'' system developed for use within AT&T. They were first released outside of AT&T as ...
[41]
libarchive - C library and command-line tools for reading and writing ...
The source distribution includes the libarchive library, the bsdtar and bsdcpio command-line programs, full test suite, and documentation.
[42]
What is toybox?
[43]
The Swiss Army Knife of Embedded Linux - BusyBox
BusyBox combines tiny versions of common UNIX utilities into a single small, modular executable, acting as a multi-call binary.
[44]
pax(1) - OpenBSD manual pages
pax will read, write, and list the members of an archive file and will copy directory hierarchies. pax operation is independent of the specific archive format.