
rsync

Rsync is a free and open-source utility for Unix-like systems that synchronizes files and directories between two locations, either locally or across networked computers, by transferring only the differences between source and destination files using a delta-transfer algorithm. Widely used for backups, mirroring, and as an enhanced file-copying command, rsync supports local transfers, remote operations via secure shells like SSH, and connections to an rsync daemon for server-based transfers. It preserves file metadata such as permissions, timestamps, ownership, and symbolic links when used in archive mode, and can also preserve hard links, ACLs, and extended attributes when the corresponding options (-H, -A, -X) are added.

Developed by Australian computer scientist Andrew Tridgell as part of his research on efficient synchronization algorithms at the Australian National University, rsync was first publicly released on June 19, 1996, with co-contributions from Paul Mackerras. The tool's core rsync algorithm, which enables minimal data transfer by computing and sending rolling checksums to identify unchanged blocks, was detailed in Tridgell's 1999 thesis titled Efficient Algorithms for Sorting and Synchronization. Maintained under the GNU General Public License (GPL), rsync has evolved through community contributions, with Wayne Davison serving as a key maintainer until recent years; the latest stable version, 3.4.1, was released on January 16, 2025, fixing regressions in the security-focused 3.4.0 release. Tridgell continues to contribute sporadically, including recent work on vulnerability patches.

Key features of rsync include its delta-transfer algorithm, which significantly reduces bandwidth usage for modified files compared to full copies, and options for compression during transfer to further optimize network efficiency. It handles recursive directory traversal, deletion of extraneous destination files with the --delete option, and bandwidth limiting to prevent overwhelming connections. For remote use, rsync runs over SSH for encrypted, authenticated transfers or can operate in daemon mode for anonymous access, making it suitable for public mirrors and automated backups. Despite its name suggesting two-way sync, rsync performs unidirectional transfers, so bidirectional synchronization requires a separate invocation in each direction, and it cannot copy directly between two remote hosts without an intermediary. Its versatility has made it a standard tool in system administration, shipped with many operating system distributions, and its algorithm is reused in projects such as Duplicity for encrypted backups.

Overview

Definition and Purpose

Rsync is an open-source utility designed for fast and efficient file synchronization and transfer, both locally and across networks. It operates by comparing source and destination files or directories and transferring only the differences (delta transfer) rather than entire files, which significantly reduces bandwidth usage and transfer times, especially for large datasets or incremental updates.

The primary purpose of rsync is to facilitate reliable backups, data mirroring, and synchronization tasks, making it a versatile tool for system administrators, developers, and users who need to maintain consistent file sets between locations. It supports several transfer modes, including local copying, remote transfers via secure shells like SSH, and direct connections to an rsync daemon over TCP, while preserving file metadata such as permissions, timestamps, and symbolic links. This capability extends its utility beyond simple copying to scenarios like automated backups and distributed file management.

By minimizing the amount of data transferred, rsync addresses common challenges in file operations over networks, such as slow connections or high-latency environments, which has made it a standard tool on Unix-like systems for tasks requiring precision and speed.

Core Features

Rsync is a versatile utility for efficient file synchronization and transfer, both locally and across networks. At its core, it employs a delta-transfer algorithm that minimizes data transmission by identifying and sending only the differences between source and destination files, rather than transferring entire files. This algorithm, developed by Andrew Tridgell and Paul Mackerras, divides files into blocks and uses rolling checksums to match unchanged portions, achieving significant bandwidth savings, especially for large files with minor modifications.

A key capability is its support for remote operations, allowing synchronization between local and remote hosts via remote shells such as SSH (secure) or rsh (insecure), or through an rsync daemon for anonymous or authenticated access. This enables mirroring of directories across networked systems without requiring superuser privileges on either end. Rsync also preserves essential file metadata during transfers, including symbolic links, device files, ownership, group memberships, permissions, and timestamps, so that the destination retains the structure and attributes of the source.

Additional features enhance its utility for backups and selective synchronization. Users can apply include and exclude patterns to filter files, similar to those in tar, allowing precise control over what is transferred. Options for compression during transit reduce bandwidth further, while bandwidth limiting prevents network overload. Rsync supports dry-run modes to preview changes without executing them and can handle sparse files, hard links, and incremental backups via mechanisms like hard-linking to previous versions. These elements collectively make rsync a robust tool for maintaining consistent file sets across diverse environments.

History

Origins and Initial Development

Rsync was originally developed by Andrew Tridgell and Paul Mackerras during Tridgell's PhD research at the Australian National University. The project emerged in the mid-1990s as a response to the inefficiencies of existing tools like FTP and rcp, which transmitted entire files even when only minor updates were needed, a particular problem over low-bandwidth, high-latency connections such as the dial-up links used for distributing software packages. Tridgell, who was also involved in the Samba project for cross-platform file sharing, sought a method to minimize data transfer volumes while ensuring reliable synchronization.

The initial implementation of rsync was detailed in a June 1996 technical report co-authored by Tridgell and Mackerras, titled "The rsync algorithm," which outlined a novel approach for remote file updates. In this algorithm, the destination machine divides its copy of the file into non-overlapping blocks of fixed size (typically 700 bytes) and computes a weak rolling checksum (32-bit, inspired by Adler-32) and a strong checksum (128-bit MD4) for each block. The source machine then scans its file using the rolling checksum to find matching blocks at arbitrary offsets and transmits only the differences (literal data or references to matching blocks) along with instructions for reconstruction. This design required just one round trip between machines, reducing the impact of latency, and included optional compression via zlib to further optimize bandwidth usage. The report emphasized the algorithm's efficiency for similar files, with low computational overhead and a negligible risk of undetected collisions thanks to strong checksum verification: false matches of the weak checksum occurred less than once per 1000 true matches.

Rsync's first public release occurred in June 1996, shortly after the technical report, and was made available under the GNU General Public License. The tool provided a command-line interface similar to rcp for ease of adoption, initially supporting local and remote synchronization via SSH or rsh. Early testing focused on real-world scenarios and demonstrated significant bandwidth savings; for instance, updating a 1 MB file with 10% changes required sending only about 100 KB plus block signatures. Development was hosted within the Samba project infrastructure, reflecting Tridgell's overlapping work, and the initial version laid the foundation for rsync's core delta-transfer mechanism, which remains central to its operation.
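The "plus block signatures" part of that example can be estimated from the figures quoted above. The following is a minimal sketch, assuming 700-byte blocks and one 4-byte weak plus one 16-byte strong checksum per block, as in the description of the technical report; the function names are hypothetical, and the estimate ignores copy-token overhead, compression, and the fact that real edits rarely align with block boundaries.

```python
import math

WEAK_BYTES = 4     # 32-bit rolling checksum per block
STRONG_BYTES = 16  # 128-bit strong checksum per block


def signature_bytes(file_size: int, block_size: int = 700) -> int:
    """Bytes of per-block checksums the destination sends for one file."""
    blocks = math.ceil(file_size / block_size)
    return blocks * (WEAK_BYTES + STRONG_BYTES)


def estimated_transfer(file_size: int, changed_fraction: float,
                       block_size: int = 700) -> int:
    """Rough estimate: changed bytes sent literally plus the signatures."""
    literal = round(file_size * changed_fraction)
    return literal + signature_bytes(file_size, block_size)


if __name__ == "__main__":
    size = 1_000_000  # a 1 MB file, taken here as 10**6 bytes
    print("signatures:", signature_bytes(size), "bytes")
    print("total estimate:", estimated_transfer(size, 0.10), "bytes")
```

Under these assumptions the signatures add under 30 KB to the roughly 100 KB of changed data, which is why the overhead is usually small relative to resending the whole file.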

Major Releases and Recent Updates

Rsync's major releases have evolved significantly since its inception, with key advancements in performance, security, and feature support. The initial stable release, version 1.0, was announced on June 19, 1996, by Andrew Tridgell, introducing the core delta-transfer algorithm for efficient file synchronization. Subsequent early versions, such as 2.0.2 released on May 15, 1998, added foundational capabilities like improved remote transfer handling. Version 2.5.0, released on November 30, 2001, introduced protocol version 25, enhancing compatibility and efficiency in networked transfers. The 2.6 series culminated in version 2.6.9 on November 6, 2006, which included minor feature additions like better handling of hard links and numerous bug fixes, becoming a long-standing stable release used in many distributions.

The transition to the 3.x series marked a major milestone with version 3.0.0, released on March 1, 2008. This update introduced protocol version 30, preservation of access control lists (ACLs) and extended attributes (xattrs), and an incremental recursive scanning algorithm that reduced memory usage and allowed file transfers to begin before a large directory scan had finished. It also switched the license to GPLv3. Version 3.1.0, released on September 28, 2013, brought protocol version 31, performance improvements for large transfers, finer-grained output control through the --info and --debug options, and enhanced error handling for interrupted transfers.

More recent major releases have focused on modern system compatibility and security hardening. Version 3.2.0, released on June 19, 2020, added support for additional metadata such as birth (creation) times, the --open-noatime option to avoid updating access times, improved compilation options including the ability to use an unmodified zlib library, and various bug fixes for edge cases in symlink handling. Version 3.3.0, released on April 6, 2024, introduced enhancements to the rrsync wrapper script, such as new options for link munging and locking control, alongside optimizations for xattr hashing to prevent collisions and better integration with contemporary build systems.

The most recent updates address critical security concerns. Version 3.4.0, released on January 15, 2025, is a security-focused release that patches multiple high-severity vulnerabilities, including remote code execution (CVE-2024-12084, CVSS 9.8), arbitrary file reads (CVE-2024-12086), unsafe symlink creation (CVE-2024-12087), and path traversal issues (CVE-2024-12088), affecting both daemons and clients that interact with malicious servers. It introduces no major new features. Version 3.4.1, released one day later on January 16, 2025, fixes regressions from 3.4.0, such as a use-after-free error and collisions in flist flags, and removes the bundled popt's dependency on alloca for better portability. As of November 2025, 3.4.1 remains the latest stable release, with distributions incorporating packaging updates for ongoing stability.

| Version | Release Date | Key Changes |
|---|---|---|
| 2.6.9 | November 6, 2006 | Bug fixes, improved hard-link handling; long-term stable release. |
| 3.0.0 | March 1, 2008 | Protocol 30, ACL/xattr support, memory-efficient recursion, GPLv3. |
| 3.1.0 | September 28, 2013 | Protocol 31, performance boosts, --info/--debug output options. |
| 3.2.0 | June 19, 2020 | Protocol 31, birth time support, zlib flexibility, symlink fixes. |
| 3.3.0 | April 6, 2024 | rrsync enhancements, xattr optimizations. |
| 3.4.0 | January 15, 2025 | Security fixes for RCE and data leaks (CVE-2024-12084 et al.). |
| 3.4.1 | January 16, 2025 | Regression fixes, improved stability. |

Usage

Basic Syntax and Commands

The basic syntax of the rsync command follows the form rsync [OPTION...] SRC... [DEST], where OPTION specifies modifiers to control the transfer behavior, SRC denotes one or more source files or directories, and DEST indicates the destination file or directory. This structure allows rsync to synchronize files either locally or between remote systems, with the command interpreting paths as local by default unless prefixed with a remote host specification. If no destination is provided, rsync lists the contents of the source instead of transferring files.

Options in rsync are primarily short flags (e.g., -a for archive mode, which preserves symbolic links, permissions, timestamps, and other attributes) or their long equivalents (e.g., --archive), and they can be combined for customized operations. Sources and destinations support local paths (e.g., /path/to/file) or remote notations: for shell-based remote transfers, [USER@]HOST:PATH (e.g., user@example.com:/home/user/docs); for rsync daemon mode, HOST::MODULE/PATH or rsync://[USER@]HOST[:PORT]/PATH. A trailing slash on a source directory affects behavior: src/ copies the contents of the directory into the destination, while src copies the directory itself.

For local synchronization, rsync operates directly on the filesystem without network involvement, making it suitable for backing up or mirroring directories on the same machine. A basic local command might be rsync -a /source/directory/ /destination/directory/, which recursively copies all files from the source to the destination while preserving file attributes. Adding -v (verbose) lists each file as it is transferred, as in rsync -av /home/user/docs/ /backup/docs/, so users can monitor the transfer and confirm that unchanged items are skipped.

Remote synchronization extends local usage to network transfers, typically over SSH for security. To push files from a local source to a remote destination, the command is rsync -a /local/source/ user@example.com:/remote/destination/, which authenticates via SSH and updates only modified files. Conversely, pulling from remote to local uses rsync -a user@example.com:/remote/source/ /local/destination/, ideal for fetching updates from a server. For compressed transfers over slower links, -z can be added, e.g., rsync -avz user@example.com:/remote/source/ /local/destination/. These commands use rsync's delta-transfer algorithm to minimize the data sent.
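The trailing-slash rule is a frequent source of surprises, so it may help to see it spelled out. The sketch below is a simplified model of that one rule, not rsync's own path handling; the helper name landing_dir is hypothetical, and it only considers a single directory source.

```python
import posixpath


def landing_dir(src: str, dest: str) -> str:
    """Directory on the destination that receives the files inside `src`.

    Models only the trailing-slash rule for one directory source.
    """
    if src.endswith("/"):
        # "src/" means "the contents of src": files land directly in dest.
        return dest.rstrip("/") + "/"
    # "src" means "the directory itself": dest gains a subdirectory
    # named after the last path component of src.
    return posixpath.join(dest, posixpath.basename(src)) + "/"


if __name__ == "__main__":
    print(landing_dir("/home/user/docs/", "/backup/docs/"))
    print(landing_dir("/home/user/docs", "/backup/"))
```

Both calls in the demo resolve to the same directory, which is the usual way to check that a command written with or without the slash will put files where intended.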

Common Options and Flags

Rsync provides a rich set of command-line options and flags to tailor its behavior, enabling control over file selection, data compression, attribute preservation, and output verbosity. These options, written with a single hyphen in short form (e.g., -a) and a double hyphen in long form (e.g., --archive), can be combined flexibly to suit various use cases, from backups to remote mirroring. Many of them trade off bandwidth, speed, and safety during transfers.

One of the most essential options is -a or --archive, which activates archive mode to recursively copy directories (-r) while preserving symbolic links (-l), permissions (-p), modification times (-t), group ownership (-g), owner (-o), and device and special files (-D). It does not include hard links (-H), ACLs (-A), or extended attributes (-X), which must be requested separately. This mode ensures that the destination closely mirrors the source's structure and metadata, making it ideal for backups that must not alter file attributes. For example, rsync -a source/ dest/ synchronizes an entire directory tree. Archive mode is frequently the starting point for rsync commands because of its comprehensive behavior.

To monitor a transfer, the -v or --verbose option increases output detail, displaying each file as it is processed and summarizing the bytes sent, bytes received, and speed at the end. This is particularly useful for debugging or verifying large synchronizations, though it can generate substantial output on large datasets. Combining -v with -a is a common default.

For bandwidth-constrained networks, -z or --compress enables on-the-fly compression of data during transfer, reducing the volume of data sent at the cost of some CPU time. It brings little benefit for already compressed files such as JPEGs or archives but is effective for text-based or otherwise uncompressed data. An example usage is rsync -avz source/ user@host:dest/, which adds compression to a verbose archive transfer.

The -P shorthand combines --partial and --progress, allowing interrupted transfers to resume by keeping partial files and showing progress metrics such as bytes transferred and estimated time remaining. This is invaluable for unreliable connections, as it prevents restarting large files from scratch. --partial alone retains incomplete files for later resumption, while --progress provides per-file status updates.

Safety features include -n or --dry-run, which simulates the synchronization without making changes, allowing users to preview actions and avoid unintended deletions or overwrites. Similarly, -u or --update skips files that are newer on the destination, so that more recent destination copies are not overwritten by older source versions. For cleanup, --delete removes files in the destination that no longer exist in the source, maintaining an exact mirror but requiring caution to prevent data loss.

Remote operations often use -e or --rsh=COMMAND to specify the remote shell, which defaults to ssh (e.g., rsync -av -e ssh source/ user@host:dest/). Additionally, -m or --prune-empty-dirs omits empty directory chains from the file list, so directories that would otherwise arrive empty (for example, after filtering) are not created on the destination; it does not remove directories that already exist there. These options collectively enable rsync's versatility across local, shell-based, and daemon modes.
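The two shorthand options described above can be written out explicitly. The sketch below is illustrative only: the SHORTHAND table and expand_flags helper are hypothetical names, the expansion of -a follows the list given in the text (-rlptgoD), and clustered flags such as -avz are deliberately not handled.

```python
# Shorthand options and the individual options they stand for.
SHORTHAND = {
    "-a": ["-r", "-l", "-p", "-t", "-g", "-o", "-D"],
    "-P": ["--partial", "--progress"],
}


def expand_flags(args):
    """Replace each shorthand flag with its constituent options.

    Flags that are not in SHORTHAND pass through unchanged.
    """
    expanded = []
    for arg in args:
        expanded.extend(SHORTHAND.get(arg, [arg]))
    return expanded


if __name__ == "__main__":
    print(expand_flags(["-a", "-v", "-P"]))
```

Seeing the expansion makes it clearer why -a alone does not preserve hard links, ACLs, or extended attributes: the corresponding -H, -A, and -X flags are simply not part of the set.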

Examples

Local File Synchronization

Rsync enables efficient synchronization of files and directories between paths on the same machine, functioning as an advanced alternative to the cp command. It skips unchanged files and, by default, copies modified files in their entirety; the delta-transfer algorithm can optionally be enabled to reduce the data written for large, mostly unchanged files. In local mode, rsync compares source and destination files based on attributes such as size and modification time, skipping unchanged files unless the --checksum option forces a deeper verification using checksums. Only files that appear changed under this quick check are copied, which reduces unnecessary I/O even on a single machine.

The basic syntax for local synchronization is rsync [options] source destination, where both source and destination are local paths. Local transfers behave as if --whole-file had been given, bypassing the delta algorithm unless --no-whole-file is specified. Key options include -a (archive mode), which recursively copies directories while preserving permissions, timestamps, symbolic links, and ownership; -v for verbose output detailing the changes; and --delete to remove files in the destination that no longer exist in the source, ensuring a true mirror. Additional flags like --progress display transfer progress, and --dry-run simulates the operation without making changes, aiding in verification.

For file comparison in local mode, rsync generates a list of files from the source and scans the destination, using quick checks (size and modification time) to identify candidates for transfer. When the delta algorithm is enabled with --no-whole-file, the receiving side computes weak and strong checksums for blocks of the existing destination file, and the sender then scans the source file for matching blocks and transmits only the differences, from which the updated file is reconstructed on the destination side. Rsync also handles sparse files efficiently with the -S option, avoiding allocation of unused space, and supports hard link preservation via -H to maintain filesystem structures.

A common example synchronizes the contents of a source directory to a destination while preserving attributes:
rsync -av /home/user/documents/ /backup/documents/
This command recursively copies files from /home/user/documents/ into /backup/documents/, showing verbose output of actions taken. To include deletion of extraneous files and show progress:
rsync -av --progress --delete /home/user/documents/ /backup/documents/
This ensures the backup directory exactly mirrors the source, removing any files unique to the destination. For selective synchronization, such as backing up only specific file types:
rsync -av --include='*.txt' --exclude='*' /home/user/ /backup/
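Here, only .txt files are included, with all others excluded, demonstrating rsync's filtering capabilities for targeted local backups. Because the --exclude='*' rule also matches directories, this command only picks up .txt files at the top level of the source; adding --include='*/' before the other rules lets rsync descend into subdirectories as well. These features make rsync well suited to tasks like creating incremental local backups or synchronizing project directories during development.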
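The effect of rule order in the selective example can be illustrated with a small model. This is a sketch under simplifying assumptions, not rsync's filter engine: rules are checked in order and the first match decides, names are matched with Python's fnmatch (whose `*` also matches `/`, unlike rsync's), directories are represented with a trailing slash, and the helper name is_included is hypothetical.

```python
from fnmatch import fnmatchcase


def is_included(name, rules):
    """Return True if `name` survives an ordered list of filter rules.

    `rules` is a list of (action, pattern) pairs where action is "+"
    (include) or "-" (exclude). The first matching rule wins, and a
    name that matches no rule is included.
    """
    for action, pattern in rules:
        if fnmatchcase(name, pattern):
            return action == "+"
    return True


if __name__ == "__main__":
    txt_only = [("+", "*.txt"), ("-", "*")]
    txt_recursive = [("+", "*/"), ("+", "*.txt"), ("-", "*")]
    for name in ["notes.txt", "photo.jpg", "sub/"]:
        print(name, is_included(name, txt_only), is_included(name, txt_recursive))
```

With the first rule set the directory entry `sub/` is excluded, so nothing beneath it would ever be examined; the second rule set keeps directories in play while still excluding non-.txt files.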

Remote File Synchronization

Rsync enables efficient synchronization of files between a local machine and a remote host, transferring only the differences in files to minimize bandwidth usage. This is achieved through its delta-transfer algorithm, which compares file contents using checksums and sends only the modified portions. Remote operations support two primary connection methods: via a remote shell like SSH for secure, authenticated transfers, or via an rsync daemon for direct TCP connections on port 873.

In remote shell mode, rsync invokes the shell (defaulting to SSH since version 2.6.0) to execute the rsync process on the remote host, allowing integration with existing secure connections. The basic syntax for pushing files from local to remote is rsync [options] source user@host:destination, where the single colon denotes remote shell usage. For example, to copy a directory /local/dir/ to a remote server with compression while preserving permissions and timestamps, the command rsync -avz -e ssh /local/dir/ user@example.com:/remote/dir/ transfers only changes, bringing the remote directory up to date with the local one. Similarly, pulling files from remote to local uses rsync -avz user@example.com:/remote/dir/ /local/dir/, which is useful for backing up remote data to a local machine.

For rsync daemon mode, the remote host must run an rsync server configured with modules defining accessible directories, enabling anonymous or authenticated access without a shell. The syntax employs a double colon, such as rsync [options] host::module/source destination for pulling from a module named "module", or the equivalent rsync:// URL form. An example pull command is rsync -av rsync://backup.example.com/public/ /local/backup/, which fetches the contents of the "public" module in archive mode with verbose output for monitoring. Pushing to a daemon uses rsync -av /local/files/ backup.example.com::module/, which requires a writable module and, unless the module permits anonymous access, a password supplied interactively or through a password file. This mode suits scenarios like mirroring public repositories but demands careful configuration for security.

Advanced examples incorporate options for specific remote needs. Bandwidth limiting avoids saturating a link: --bwlimit=1M in rsync -avz --bwlimit=1M user@host:/remote/dir/ /local/dir/ caps the transfer rate at about 1 MiB per second. For excluding patterns during remote sync, --exclude='*.tmp' in rsync -avz --exclude='*.tmp' /local/dir/ user@host:/remote/dir/ skips temporary files, ensuring cleaner synchronization. These examples highlight rsync's flexibility for remote backups, deployments, and mirroring across networks.
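When commands like these are generated from a script, building the argument list explicitly avoids shell-quoting mistakes with patterns such as `*.tmp`. The sketch below is one possible approach, assuming Python is used for the automation; build_rsync_argv is a hypothetical helper, and only options mentioned in this article are emitted.

```python
import shlex


def build_rsync_argv(src, dest, *, compress=False, bwlimit=None,
                     excludes=(), dry_run=False):
    """Assemble an rsync argument list for an archive-mode transfer."""
    argv = ["rsync", "-a", "-v"]
    if compress:
        argv.append("-z")
    if dry_run:
        argv.append("--dry-run")
    if bwlimit is not None:
        argv.append(f"--bwlimit={bwlimit}")
    for pattern in excludes:
        # Each pattern becomes its own argument, so no shell quoting is needed.
        argv.append(f"--exclude={pattern}")
    argv += [src, dest]
    return argv


if __name__ == "__main__":
    argv = build_rsync_argv("/local/dir/", "user@host:/remote/dir/",
                            compress=True, bwlimit="1M",
                            excludes=["*.tmp"], dry_run=True)
    # Printed in shell-quoted form for inspection; passing `argv` to
    # subprocess.run(argv, check=True) would execute it where rsync exists.
    print(shlex.join(argv))
```

Defaulting a wrapper like this to a dry run first, and only then repeating the call without it, is a cautious pattern when options such as --delete are later added.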

Connection Methods

Local Mode

Local mode in rsync operates entirely on the local filesystem of a single host, synchronizing files between source and destination paths without any remote network involvement. This mode is used when neither the source nor the destination path contains a colon (:), distinguishing it from remote modes that use shell-based or daemon connections. For example, the basic command rsync -av /source/dir/ /dest/dir/ copies files recursively while preserving permissions, timestamps, and symbolic links.

In local mode, rsync behaves like an enhanced cp command, offering synchronization features such as incremental updates based on file size and modification time checks, which allow it to skip unchanged files in repeated runs. By default, it enables the --whole-file option, disabling the delta-transfer algorithm and copying entire files outright, as this avoids unnecessary computational overhead on local storage where bandwidth is not a concern. Users can override this with --no-whole-file to enable delta encoding if minimizing disk writes is a priority, for example in combination with the --inplace option. This mode supports all standard rsync options, including --archive (-a) for comprehensive attribute preservation and --delete for removing files in the destination that no longer exist in the source.

Compared to basic cp, rsync's local mode provides advantages in scenarios requiring ongoing synchronization, such as local backups or mirrored directories, because it can apply updates without recopying everything. It includes utilities like --dry-run for simulating operations, --progress for monitoring, and --exclude patterns for selective transfers, making it more versatile for complex local file management tasks. However, for one-time full copies of large trees, cp -a may perform slightly faster because of rsync's additional checks, though rsync is easier to resume if interrupted.

Remote Shell and Daemon Modes

Rsync operates in remote shell mode when synchronizing files between a local system and a remote host using a remote shell program, such as SSH or rsh, as the transport mechanism. This mode requires rsync to be installed on both the local and remote systems, and it initiates a connection when the remote host is specified in the source or destination path with a single colon, for example, user@remotehost:/path/to/source. The remote shell handles authentication and executes rsync on the remote side, allowing data transfer through the established shell connection without needing a dedicated rsync server process. By default, rsync uses SSH as the remote shell, but this can be customized with the -e or --rsh option, such as -e 'ssh -p 2222' to specify a non-standard port.

In contrast, daemon mode enables rsync to function as a standalone server listening for incoming connections over TCP, typically on port 873, without relying on a remote shell. To use this mode, the rsync daemon must be started on the remote host with the --daemon option, usually configured via an rsyncd.conf file that defines modules: named entry points that specify paths, access controls, and other settings. Connections are initiated using a double colon in the path, like remotehost::module/path, or the rsync:// URL scheme, such as rsync://remotehost/module/path. Authentication in daemon mode is configured per module, typically with a server-side secrets file, while clients can supply the password non-interactively with the --password-file option. The daemon also supports options like --port to change the listening port or --address to bind to a specific IP.

The primary differences between these modes lie in their transport and setup requirements: remote shell mode leverages existing shell access for secure, on-demand transfers but carries the shell's overhead, while daemon mode provides direct, potentially faster socket-based communication at the cost of maintaining a persistent process. Both modes support rsync's core features, such as delta transfers, but daemon mode is often preferred in scenarios requiring anonymous access or where firewalls block SSH connections, though it demands careful configuration for security. For hybrid use, rsync can run daemon features over a remote shell connection by combining the --rsh option with the double-colon or rsync:// path syntax, bridging the two approaches.
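The colon conventions that select each mode can be summarized in a few lines. The following is a simplified sketch, not rsync's actual argument parser: transfer_mode is a hypothetical helper, and the rule that a slash before the first colon marks a local path (as in ./odd:name) is an assumption about how such names are disambiguated. It also ignores the hybrid case where --rsh is combined with daemon syntax.

```python
def transfer_mode(path: str) -> str:
    """Classify an rsync source or destination by its path syntax."""
    if path.startswith("rsync://"):
        return "daemon"
    colon = path.find(":")
    slash = path.find("/")
    # No colon, or a slash appearing before the first colon: treat as local.
    if colon == -1 or (slash != -1 and slash < colon):
        return "local"
    # HOST::MODULE uses a double colon; HOST:PATH uses a single one.
    if path.startswith("::", colon):
        return "daemon"
    return "remote-shell"


if __name__ == "__main__":
    for example in ["/source/dir/",
                    "user@remotehost:/path/to/source",
                    "remotehost::module/path",
                    "rsync://remotehost/module/path"]:
        print(f"{example!r:40} -> {transfer_mode(example)}")
```

A check like this can be useful in wrapper scripts that need to decide, before invoking rsync, whether SSH credentials or a daemon password file will be required.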

Algorithm

File Selection and Comparison

Rsync selects files for synchronization by recursively scanning the source tree and applying user-defined filter rules to determine inclusion or exclusion based on patterns, paths, sizes, and other attributes. The process begins with the sender generating a file list that includes pathnames, sizes, modification times, permissions, ownership, and modes for all candidate files and directories. This list is transmitted to the receiver, where it is sorted lexicographically by path to facilitate efficient lookup. Filter rules, specified via options such as --include, --exclude, --filter, or files like --include-from and --exclude-from, allow precise control over selection; for instance, patterns using wildcards (e.g., *.pdf to match PDF files) are checked in the order provided, with the first applicable rule determining whether a file is included or excluded. By default, rsync includes all files unless explicitly excluded, and recursion is enabled with the -r or --recursive option to traverse subdirectories. Additional constraints, such as --max-size or --min-size, limit selection by file size, while --one-file-system prevents crossing filesystem boundaries.

Once the file list is established, rsync compares source and destination files to identify those requiring transfer, using a "quick check" algorithm by default that examines file size and modification time (mtime). A file is considered unchanged, and thus skipped, if both attributes match, minimizing unnecessary data transfer. This comparison is efficient for most scenarios but can miss changes if clocks are not synchronized or if files are modified without altering either attribute. Directories, symbolic links, and special files like device nodes are handled separately: directories are created if missing, and symlinks are transferred based on their target paths without content comparison. The --modify-window option adjusts the tolerance for mtime mismatches (the default of 0 requires an exact match, and a negative value enables nanosecond-precision comparison), accommodating minor clock drift or filesystems with coarse timestamps.

For more accurate detection of changes, the --checksum (or -c) option overrides the quick check by computing and comparing 128-bit checksums (or another algorithm selected via --checksum-choice) of entire file contents, so that files of equal size are transferred only if their contents differ. This increases CPU and I/O overhead but is essential for environments with unreliable timestamps, such as distributed systems. Conversely, --size-only restricts comparison to file size alone, ignoring mtimes entirely, which is useful when timestamps cannot be preserved. The --ignore-times (or -I) option disables the quick check altogether, so every file is processed by the transfer algorithm even when its size and timestamp match. In all cases, the comparison phase precedes the delta-transfer mechanism, with selected files queued for efficient partial updates using rolling checksums on blocks. The underlying algorithm, developed by Tridgell and Paul Mackerras, relies on these metadata-driven decisions to optimize transfers over networks with high latency or limited bandwidth.
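The default quick check is simple enough to sketch directly. The function below is an illustration of the size-and-mtime comparison described above, assuming whole-second timestamps and a non-negative window; quick_check_unchanged is a hypothetical name, and the nanosecond mode selected by a negative --modify-window is not modeled.

```python
import os


def quick_check_unchanged(src: str, dst: str, modify_window: int = 0) -> bool:
    """Return True if dst looks identical to src by size and mtime.

    A missing destination never counts as unchanged.
    """
    try:
        s = os.stat(src)
        d = os.stat(dst)
    except FileNotFoundError:
        return False
    same_size = s.st_size == d.st_size
    # Compare whole seconds, allowing a tolerance of `modify_window` seconds.
    same_time = abs(int(s.st_mtime) - int(d.st_mtime)) <= modify_window
    return same_size and same_time
```

A file that passes this test is skipped without its contents being read, which is what makes repeated runs over a large, mostly unchanged tree fast, and also why edits that preserve both size and timestamp go unnoticed unless --checksum is used.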

Delta Transfer Mechanism

The delta transfer mechanism in rsync minimizes data transmission by sending only the differences between source and destination files, rather than entire files, making it particularly effective for updates over low-bandwidth or high-latency connections. This is accomplished using a delta-encoding algorithm that divides the basis file (the destination's existing version) into fixed-size blocks and identifies matching substrings in the target file (the source's new version) through checksum comparisons. Unchanged portions are referenced by block, while new or modified segments are transmitted literally.

The process begins with the receiver (holding the basis file) partitioning it into blocks of a fixed size, commonly around 700 bytes, though this can be adjusted via the --block-size option. For each block, the receiver computes two checksums: a weak rolling checksum for rapid approximate matching and a strong checksum for verification. The weak checksum is a 32-bit value based on an Adler-32-inspired rolling sum, allowing efficient computation as the window slides byte by byte. For the bytes $X_k, \dots, X_l$ of a window it is defined as

$$s(k, l) = a(k, l) + 2^{16} \, b(k, l),$$

where

$$a(k, l) = \Big(\sum_{i=k}^{l} X_i\Big) \bmod M, \qquad b(k, l) = \Big(\sum_{i=k}^{l} (l - i + 1)\, X_i\Big) \bmod M,$$

with $M = 2^{16}$. The rolling property enables updates via the recurrence relations

$$a(k+1, l+1) = \big(a(k, l) - X_k + X_{l+1}\big) \bmod M,$$

$$b(k+1, l+1) = \big(b(k, l) - (l - k + 1)\, X_k + a(k+1, l+1)\big) \bmod M.$$

The strong checksum, originally a 128-bit MD4 hash, confirms exact matches and is sent alongside the weak checksums for all blocks, amounting to roughly 1% of the file size. These checksums are transmitted from the receiver to the sender.

Upon receiving the checksums, the sender (holding the target file) scans it by computing the rolling weak checksum at every byte offset. To accelerate lookups, the low 16 bits of each weak checksum index a hash table containing the candidate checksums from the receiver's blocks.
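The byte-by-byte scan is only practical because each new window's checksum follows from the previous one in constant time. The sketch below implements the definitions and recurrences given above and can be used to confirm that the rolled values agree with a direct recomputation; the function names are hypothetical, and this is an illustration of the formulas rather than rsync's own code.

```python
M = 1 << 16  # modulus 2**16 used by both sums


def weak_parts(block: bytes):
    """Compute (a, b) for a whole block directly from the definitions."""
    n = len(block)
    a = sum(block) % M
    # The byte at position j (0-based) has weight n - j, i.e. (l - i + 1).
    b = sum((n - j) * x for j, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, block_len: int):
    """Slide the window one byte: drop out_byte, append in_byte."""
    a_new = (a - out_byte + in_byte) % M
    b_new = (b - block_len * out_byte + a_new) % M
    return a_new, b_new


def combine(a: int, b: int) -> int:
    """Pack the two 16-bit sums into the 32-bit weak checksum s = a + 2^16 b."""
    return a + (b << 16)


if __name__ == "__main__":
    data = bytes((i * 37 + 11) % 256 for i in range(500))
    n = 64
    a, b = weak_parts(data[:n])
    for k in range(len(data) - n):
        a, b = roll(a, b, data[k], data[k + n], n)
        assert (a, b) == weak_parts(data[k + 1:k + 1 + n])
    print("rolled checksums match direct computation at every offset")
```

Because `a` occupies the low 16 bits of the combined value, masking the checksum with 0xFFFF yields the hash-table index mentioned above without any extra computation.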
Potential matches are verified first by the full 32-bit weak checksum (via a sorted list scan) and then by the strong checksum; a confirmed match produces a copy instruction referencing the corresponding block in the basis file. Non-matching regions are sent as literal bytes. The sender transmits these instructions (copy tokens and literals) to the receiver in a compact, ordered format that the receiver applies directly to reconstruct the target file. The exchange needs only a single round trip, which keeps latency costs low. The algorithm's outline is as follows:
Receiver (basis file B):
for each block in B:
    compute weak_checksum(block)
    compute strong_checksum(block)
    send (weak_checksum, strong_checksum, block_index)  # to sender

Sender (target file A):
build hash_table[2^16] of (weak, strong, block_index) entries, keyed by a 16-bit hash of weak
offset = 0
while offset + block_size <= length(A):
    weak = weak_checksum(A[offset..offset+block_size-1])  # rolled forward when the previous step advanced by 1
    match = none
    for each candidate in hash_table[hash16(weak)] (sorted by full weak checksum):
        if candidate.weak == weak and strong_checksum(A[offset..offset+block_size-1]) == candidate.strong:
            match = candidate
            break
    if match is not none:
        send token: COPY match.block_index  # to receiver
        advance offset by block_size
    else:  # no candidate matched at this offset
        send literal: A[offset]  # to receiver
        advance offset by 1
send any remaining bytes of A as literals  # tail shorter than one block

Receiver:
receives instructions and applies: copy blocks from B or insert literals to reconstruct A
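
The outline above can be turned into a small runnable sketch. It is a simplified illustration under stated assumptions, not rsync's implementation: a Python dict keyed by the full 32-bit weak checksum replaces the 16-bit hash table and sorted list, BLAKE2b-128 stands in for the strong checksum, a trailing partial block of the basis is ignored, and all function names are hypothetical.

```python
import hashlib

M = 1 << 16  # modulus from the weak checksum definition


def weak_parts(block):
    """Return (a, b) for a block, following the sums defined in the text."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window one position using the recurrences."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def strong(block):
    """Strong checksum; BLAKE2b-128 is a stand-in chosen for this sketch."""
    return hashlib.blake2b(block, digest_size=16).digest()


def signature(basis, n):
    """Receiver side: map weak checksum -> [(strong, block_index), ...]."""
    table = {}
    for index in range(len(basis) // n):
        block = basis[index * n:(index + 1) * n]
        a, b = weak_parts(block)
        table.setdefault(a + M * b, []).append((strong(block), index))
    return table


def delta(target, table, n):
    """Sender side: emit ('copy', block_index) and ('data', bytes) ops."""
    ops, literal, pos = [], bytearray(), 0
    parts = weak_parts(target[:n]) if len(target) >= n else None
    while pos + n <= len(target):
        a, b = parts
        hit = None
        for digest, index in table.get(a + M * b, ()):
            if digest == strong(target[pos:pos + n]):
                hit = index
                break
        if hit is not None:
            if literal:                       # flush pending literal bytes
                ops.append(("data", bytes(literal)))
                literal.clear()
            ops.append(("copy", hit))
            pos += n
            if pos + n <= len(target):        # restart the window after a jump
                parts = weak_parts(target[pos:pos + n])
        else:
            literal.append(target[pos])
            if pos + n < len(target):         # roll the window by one byte
                parts = roll(a, b, target[pos], target[pos + n], n)
            pos += 1
    literal.extend(target[pos:])              # tail shorter than one block
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(basis, ops, n):
    """Receiver side: rebuild the target from the basis and the ops."""
    out = bytearray()
    for kind, value in ops:
        out += basis[value * n:(value + 1) * n] if kind == "copy" else value
    return bytes(out)
```

Calling `patch(basis, delta(target, signature(basis, n), n), n)` should reproduce `target` exactly; when the two inputs share most of their content, the `data` operations carry only the inserted or modified bytes plus any short tail.
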
This mechanism achieves high efficiency, transferring only about 5% of the data for files with minor changes when block sizes exceed 300 bytes; for example, updating a 24 MB source requires transmitting roughly 1 MB. Collision risks from the weak checksum (an effective strength of about 46 bits when combined with the strong checksum) are mitigated by the strong checksum and by a whole-file checksum verified after reconstruction, with an undetected failure estimated to occur about once in 10^11 years (assuming 1 million 1 MB transfers per second). In rsync 3.2.0 and later, the strong checksum can be selected (e.g., xxh128) via --checksum-choice, and the checksum seed can be fixed via --checksum-seed for reproducibility, without altering the core block-matching logic. Delta transfer is the default for remote transfers; for purely local copies rsync defaults to whole-file mode, and --whole-file (-W) forces that mode elsewhere when checksum overhead outweighs the benefit.

Advanced Topics

Performance Considerations

Rsync's performance is primarily driven by its delta-transfer algorithm, which minimizes data transmission by identifying and sending only the differences between source and destination files. This approach is particularly efficient over networks with limited bandwidth or high latency, as it requires just one round trip to compute and transfer deltas, using rolling checksums to match blocks without exhaustive comparisons. The algorithm divides files into blocks of 500–1000 bytes, computes weak 32-bit rolling checksums for quick hashing, and verifies matches with strong 128-bit checksums, enabling rapid detection of unchanged portions. For similar files, this yields substantial bandwidth savings; in benchmarks updating a 24 MB tarball between versions, rsync with 500-byte blocks transferred roughly 1 MB rather than resending the full file.

Computationally, the algorithm's efficiency stems from its use of hash tables and sorted lists for block matching, with rolling checksums computed via simple recurrences to avoid redundant calculations, keeping CPU overhead low even for large files. It performs best when source and destination files are similar, but remains reasonably efficient for dissimilar files, for which it simply sends more literal data. However, performance can degrade for local copies because of the overhead of running two processes, inter-process communication, and select system calls, making rsync slower than direct tools like cp when whole files must be copied; in one reported test, copying 100 files totaling 200 GB with cp was faster than rsync's whole-file mode, though rsync still skipped unchanged files and transferred only 8 GB of new data.

Several options allow tuning for specific scenarios. The --whole-file (or -W) flag disables delta-transfer in favor of full file copies, which can speed up transfers when network bandwidth exceeds disk bandwidth or when changed files share little with their previous versions, as it avoids checksum computations.
Conversely, --compress (or -z) reduces transmitted data via zlib (or another algorithm selected with --compress-choice in newer versions), trading CPU cycles for lower network usage; this helps on slow links but can be counterproductive on fast, CPU-constrained systems. Block size can be adjusted with --block-size=SIZE to suit file characteristics: larger blocks suit big files with sparse changes, while smaller ones give finer granularity for frequent small edits, though the default, chosen automatically from the file size, balances most cases.

Other factors include I/O patterns and checksum choices. Enabling --checksum (or -c) for comparisons independent of size and time increases CPU load and disk reads, since every file is read in full to compute its 128-bit checksum, and significantly slows transfers; it is worthwhile only when accuracy is paramount. For large files with block-level modifications, --inplace updates destination files directly to cut I/O, though it can reduce delta-transfer efficiency because blocks that have already been overwritten are no longer available for matching. Bandwidth limiting via --bwlimit=RATE prevents link saturation, and pipelining across multiple files keeps the link utilized. In practice, rsync's single-threaded nature limits parallelism, making it slower than multi-threaded alternatives for massive datasets, but its low overhead for incremental syncs (in the kernel tarball example it finished well ahead of GNU diff, which took about four minutes) makes it a high-impact tool for routine synchronization.
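
As an illustration of how these tuning options combine, the following sketch builds rsync argument lists for two contrasting scenarios. The helper name `rsync_command`, the host name, and the paths are hypothetical; only the command-line flags themselves come from the text, and the sketch builds the command without running it.

```python
import shlex


def rsync_command(src, dst, *, whole_file=False, compress=False,
                  block_size=None, bwlimit=None, inplace=False):
    """Build an rsync argument list from the tuning options discussed above."""
    argv = ["rsync", "--archive"]
    if whole_file:
        argv.append("--whole-file")          # skip delta-transfer entirely
    if compress:
        argv.append("--compress")            # trade CPU for fewer bytes sent
    if block_size is not None:
        argv.append(f"--block-size={block_size}")
    if bwlimit is not None:
        argv.append(f"--bwlimit={bwlimit}")  # units of 1024 bytes/s without a suffix
    if inplace:
        argv.append("--inplace")             # write into the destination file directly
    return argv + [src, dst]


# Slow WAN link: compress and cap the transfer rate.
wan = rsync_command("/srv/data/", "backup.example.org:/srv/data/",
                    compress=True, bwlimit=5000)
# Fast local network: full copies avoid the checksum work.
lan = rsync_command("/srv/data/", "/mnt/mirror/data/", whole_file=True)

print(shlex.join(wan))
print(shlex.join(lan))
```

Returning a list rather than a shell string means the result can be passed directly to `subprocess.run` without quoting concerns; which combination is actually faster depends on the link, CPU, and disks involved and is best measured.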

Security Implications

Rsync's security profile varies significantly by usage mode. In local mode or when invoked over SSH, transfers benefit from the underlying system's protections or from SSH's encryption and authentication, mitigating many risks of interception or unauthorized access. Daemon mode, however, listens on a network port (873 by default) and transmits data in cleartext by default, exposing it to eavesdropping, man-in-the-middle attacks, and unauthorized access if the server is internet-facing or insufficiently firewalled.

Exposed rsync daemons pose substantial data leakage risks because of common misconfigurations, such as missing authentication or access controls. A 2018 scan identified approximately 250,000 public IPv4 addresses running rsync daemons, with over 14,000 exposing listable modules containing sensitive files, including configuration files, user databases, and terabytes of media, often with no credentials required. Assessments in 2025 reported over 660,000 exposed rsync servers as of January, and an August scan identified approximately 550,000 instances, many potentially vulnerable to recent flaws. This exposure can lead to unauthorized read or write access, enabling attackers to exfiltrate backups, overwrite files, or pivot to deeper system compromise, particularly on devices such as network-attached storage (NAS) systems where rsync is enabled by default and exposed via UPnP.

Rsync has faced multiple vulnerabilities over its history, primarily affecting daemon mode or interactions with untrusted peers. Historical issues include buffer overflows in extended attribute (xattr) handling in versions prior to 3.1.3, allowing remote code execution or denial of service when processing malformed data from untrusted servers. More recently, in January 2025, six vulnerabilities (CVE-2024-12084 through CVE-2024-12088 and CVE-2024-12747), the most severe rated critical, were disclosed. They stem from flaws in input validation and path handling and were fixed in rsync 3.4.0; the heap buffer overflow, CVE-2024-12084, affects versions from 3.2.7 up to but not including 3.4.0.
These enable remote code execution on servers via heap buffer overflows, arbitrary file reads or overwrites on clients connecting to malicious servers, and symlink attacks that bypass safety checks, with the impact amplified by the backward compatibility that rsync's evolving protocol requires.

To mitigate these risks, users should prefer SSH for remote transfers to ensure encryption and strong authentication, and avoid daemon mode over public networks. In daemon configurations, enforce authentication via auth users and secrets file in rsyncd.conf, restrict access with hosts allow directives, disable module listing with list = false, and enable chroot jails (use chroot = yes) or symlink munging to contain potential exploits. Refuse risky options such as --xattrs or --links with refuse options in the config, and keep rsync updated to the latest version (3.4.1 as of January 2025) to address known CVEs, since older versions remain vulnerable.
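
The daemon hardening directives listed above can be collected into a single module section. The sketch below renders such a section as text; the helper name `render_module`, the module name, the paths, the user name, and the address range are hypothetical placeholders, and the result is a starting point to review against the rsyncd.conf manual rather than a vetted configuration.

```python
def render_module(name, path, *, users, allowed_hosts,
                  secrets_file="/etc/rsyncd.secrets"):
    """Render one rsyncd.conf module section using the directives named above."""
    settings = {
        "path": path,
        "list": "false",                      # hide the module from listings
        "use chroot": "yes",                  # confine transfers to the module path
        "auth users": ", ".join(users),       # require authentication
        "secrets file": secrets_file,         # hypothetical location of the password file
        "hosts allow": " ".join(allowed_hosts),
        "refuse options": "xattrs links",     # options the daemon will refuse
    }
    lines = [f"[{name}]"]
    lines += [f"    {key} = {value}" for key, value in settings.items()]
    return "\n".join(lines) + "\n"


print(render_module("backups", "/srv/backups",
                    users=["alice"], allowed_hosts=["192.0.2.0/24"]))
```

Generating the section from code makes it easy to apply the same restrictions to every module; the secrets file itself must additionally be unreadable by other users for the daemon to accept it under its default strict-mode setting.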

Applications

Common Use Cases

Rsync is extensively employed for creating backups of files and directories, leveraging its delta-transfer algorithm to transmit only changes rather than entire files, which is particularly efficient for incremental backups over networks with limited bandwidth. For instance, system administrators often schedule rsync via cron jobs to back up home directories to remote hosts, using -a (archive mode) to preserve permissions and timestamps and --link-dest to hard-link files that are unchanged relative to a previous backup, thereby saving space. This approach is common in enterprise environments where large datasets undergo frequent minor updates, such as log files or databases, keeping backups current without redundant transfers.

Another prevalent application is mirroring directories and servers, where rsync synchronizes source and destination by updating, adding, or deleting files to maintain identical copies. In web hosting, it is routinely used to mirror staging servers holding complete directory trees to production web servers in a demilitarized zone (DMZ), minimizing downtime during deployments by transferring only modified content such as HTML, CSS, or image files. The --delete option ensures that files removed from the source are also removed from the destination, providing a true mirror without accumulating obsolete data, which is essential for consistency in distributed systems.

Beyond backups and mirroring, rsync serves as an enhanced file-copying utility for everyday synchronization tasks, both locally and remotely via SSH. On repeated runs it can outperform traditional tools like cp because it skips files whose size and modification time are unchanged (or whose checksums match when -c is given). It is particularly valuable in DevOps pipelines for deploying updates across multiple identical systems, for example by using --write-batch to record an update once and --read-batch to apply it to each server in a fleet.
Additionally, in containerized environments like OpenShift, rsync facilitates copying files to or from pods for tasks such as database archiving or configuration updates, supporting secure, efficient data movement without exposing network shares.
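
The backup and mirroring patterns described in this section can be sketched as two small command builders. The function names, directory layout (one dated directory per snapshot), host name, and paths are assumptions made for illustration; the commands are only constructed, not executed.

```python
from datetime import date
from pathlib import PurePosixPath


def snapshot_command(source, backup_root, today, previous=None):
    """Dated snapshot; unchanged files are hard-linked to the previous one."""
    root = PurePosixPath(backup_root)
    argv = ["rsync", "-a"]
    if previous is not None:
        # Files identical to those in the previous snapshot take no extra space.
        argv.append(f"--link-dest={root / previous.isoformat()}")
    return argv + [source, f"{root / today.isoformat()}/"]


def mirror_command(source, destination):
    """True mirror: files missing from the source are deleted at the destination."""
    return ["rsync", "-a", "--delete", source, destination]


nightly = snapshot_command("/home/", "/backups/home",
                           date(2025, 1, 16), previous=date(2025, 1, 15))
deploy = mirror_command("site/", "web.example.org:/var/www/site/")
print(" ".join(nightly))
print(" ".join(deploy))
```

The trailing slash on each source path tells rsync to copy the directory's contents rather than the directory itself. Because --delete removes files at the destination, a trial run with --dry-run is a sensible precaution before the first real mirror.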

Integrations and Variants

Rsync has been integrated into numerous software tools and frameworks to support backup, deployment, and synchronization workflows. In Ansible, an open-source automation platform, the synchronize module serves as a wrapper around rsync, enabling efficient file transfers between hosts during playbook execution by leveraging rsync's delta-transfer algorithm over SSH or other transports. This integration simplifies common tasks like deploying configurations or mirroring directories across distributed systems, with options for recursive copying, deletion, and verification. Similarly, BackupPC, a high-performance enterprise-grade backup system, incorporates a customized variant of rsync known as rsync-bpc, which includes a shim layer for direct access to pooled data, remaining compatible with rsync's protocol while optimizing for disk-based deduplication and compression. Duplicity, a command-line tool for encrypted incremental archives, uses the librsync library, which is derived from rsync's core algorithm, to compute and transmit only file differences, supporting remote storage backends like SSH, FTP, and cloud services for secure, bandwidth-efficient backups.

Other integrations leverage rsync for specialized applications. For instance, lsyncd (Live Syncing Daemon) monitors local directories using inotify or fsevents and automatically invokes rsync processes to propagate changes to remote targets in near real time, reducing overhead compared to polling-based synchronization. This makes it suitable for live mirroring scenarios, such as development environments or content distribution networks, with support for SSH-secured transfers and Lua-based configuration for custom filters. In continuous integration and deployment pipelines, rsync is often embedded via scripts or wrappers to synchronize artifacts between build servers and production environments, as seen with tools like Jenkins and other CI systems, where its efficiency minimizes transfer times for large codebases.
Variants of rsync extend its functionality to new platforms and use cases while preserving the core delta-transfer mechanism. CwRsync ports rsync to Windows environments using Cygwin, providing executables for remote backup and synchronization over SSH or daemon mode, with accommodations for Windows file paths and permissions. Grsync offers a graphical user interface (GUI) frontend built on GTK, allowing users to configure rsync options visually for tasks like folder mirroring or backups without requiring command-line expertise, and supports both local and remote operations across Linux, Windows, and macOS. Rclone, dubbed "rsync for cloud storage," adapts rsync's syntax and features to over 70 cloud providers (e.g., S3), including multi-threaded transfers and mounting remote storage as filesystems via FUSE, enabling synchronization between local systems and remote object stores. Rdiff-backup, another derivative, combines rsync-like mirroring with reverse differencing for versioned backups, storing incremental changes in a dedicated directory to allow efficient restores to any point in time, often over SSH for remote operations. These variants maintain compatibility with standard rsync where possible, broadening its applicability beyond Unix-like systems.
