QPDF

QPDF is a free and open-source command-line program and C++ library designed for structural, content-preserving transformations on PDF files. It enables users to inspect and manipulate the internal structure of PDF documents, supporting operations such as linearization for optimization, encryption and decryption with passwords, merging or splitting files, and handling object streams, all while preserving the original content's visual appearance. Unlike tools focused on rendering or content extraction, QPDF emphasizes low-level PDF syntax handling without high-level abstractions for page content.

Key features of QPDF include the ability to copy objects between PDF files, create new PDFs from user-supplied content, and perform transformations like converting between compressed and uncompressed object streams. It has few external dependencies, making it suitable for developers integrating PDF manipulation into applications or scripts. The tool does not support viewing PDFs, extracting text, or converting to other formats, positioning it as a specialized utility for structural inspection and modification.

Developed primarily by Jay Berkenbilt, who created it in 2001 and has maintained it as an open-source project since 2005, QPDF has evolved into a mature project hosted on GitHub and SourceForge, with releases distributed through both. Licensed under the Apache License, Version 2.0, it is included in most major Linux distributions and various software repositories, reflecting its reliability and adoption in open-source ecosystems. Comprehensive documentation, including manuals and examples, is available to guide both command-line usage and library integration.

Overview

Description

QPDF is a free, open-source command-line program and C++ library designed for performing structural, content-preserving transformations on PDF files. It enables low-level manipulation of PDF syntax, objects, and streams, such as normalizing structures or copying elements between files, while ensuring the visual appearance and content integrity remain unchanged. The tool fully supports PDF versions up to 1.7 and provides partial compatibility with PDF 2.0, particularly for features like encryption schemes defined in ISO 32000-2.

Unlike PDF viewers or editors, QPDF does not render documents, extract text, or convert files to other formats; instead, it concentrates on backend operations like object inspection, cross-reference table management, and syntax validation without interpreting higher-level content. This makes it suitable for tasks requiring precise control over PDF internals, such as optimization for web delivery or integration into automated workflows, where users supply the content directly.

QPDF is licensed under the Apache License 2.0, which has been the primary license since version 7.0; the Artistic License 2.0 used by earlier releases remains available as an option for compatibility. It is multiplatform, supporting Linux, Windows, and macOS, and is available as native packages in most major Linux distributions and other software repositories.

Development

Jay Berkenbilt, a software engineer, is the lead developer and primary maintainer of QPDF. He created the project in 2001 as a tool for PDF manipulation needs while employed at Apex CoVantage, and modified it periodically until 2005. Upon leaving Apex CoVantage in 2005, Berkenbilt took ownership of the software with the company's permission and continued it as an open-source project, with the first public release following in 2008. The project has been hosted on SourceForge since 2008, with an active development mirror on GitHub established in 2014 to facilitate version control and collaboration.

Maintenance relies on primary maintainers Jay Berkenbilt and Manfred Holger (since 2022), supplemented by community contributions submitted via GitHub issues and pull requests, following guidelines outlined in the project's contributing documentation. QPDF has been packaged and included in major Linux distributions, such as Debian and Ubuntu, since 2008, enabling widespread adoption without formal organizational backing. Each release includes a comprehensive manual for users and developers, while support is provided through the qpdf-announce mailing list on SourceForge for updates and announcements. The latest release as of November 2025 is version 12.3.0 (October 22, 2025). Although it lacks a dedicated organization, QPDF is integrated into external tools, notably the R package qpdf, which uses its C++ library for PDF splitting, combining, and compression tasks.

Features

Command-line transformations

The qpdf command-line tool enables a range of structural transformations on PDF files while preserving their visual content and logical structure. These operations include reorganizing objects for optimized delivery, encrypting or decrypting files, modifying page layouts, and reducing file size through compression techniques. The output PDF remains semantically equivalent to the input unless a transformation explicitly alters it.

Linearization optimizes PDFs for web viewing by rearranging objects to enable progressive rendering, where the first page loads quickly even on slow connections. The --linearize option performs this in two passes: the first identifies key offsets, and the second writes the file with a linearization parameter dictionary and cross-reference information adjusted accordingly. For example, the command qpdf input.pdf --linearize output.pdf generates a web-optimized version. This feature is particularly useful for large documents intended for online distribution, as it reduces initial load times without altering content.

Encryption and decryption provide security controls using owner and user passwords, with support for key lengths of 40, 128, or 256 bits. For 256-bit keys, qpdf employs AES; shorter keys use RC4 by default (128-bit keys can use AES via --use-aes=y), and the RC4-based modes call for the --allow-weak-crypto flag because of their known vulnerabilities. The --encrypt option takes the user password, the owner password, and the key length, followed by optional restrictions such as limits on printing or editing, and is terminated by --: qpdf --encrypt user-password owner-password key-length [restrictions] -- input.pdf output.pdf. For instance, qpdf --encrypt userpass ownerpass 256 -- input.pdf output.pdf applies 256-bit encryption with separate user and owner passwords. Decryption uses --decrypt, as in qpdf --password=userpass --decrypt input.pdf output.pdf, with the password supplied via --password. Unicode passwords are handled by transcoding to PDFDocEncoding where older encryption formats require it.

Page operations allow basic manipulations like rotation, splitting, and merging without affecting the content streams.
Rotation is achieved with --rotate=[+|-]angle[:page-range], where angle is 0, 90, 180, or 270 degrees; for example, qpdf input.pdf --rotate=+90:1-5 output.pdf rotates the first five pages 90 degrees clockwise. Splitting uses --split-pages[=n], writing a separate file for every group of n pages; for a four-page input, qpdf input.pdf --split-pages=2 output-%d.pdf produces output-1-2.pdf and output-3-4.pdf. Merging and selective page extraction employ --pages, starting from an empty file if needed: qpdf --empty --pages file1.pdf 1-3 file2.pdf 7-5 -- output.pdf combines pages 1-3 from file1.pdf with pages 7, 6, and 5 of file2.pdf (a descending range selects pages in reverse order). Direct scaling or n-up printing (arranging multiple pages per sheet) is not natively supported in the CLI and requires external tools or library integration.

Compression and optimization focus on reducing file size by filtering streams and consolidating objects, primarily using the deflate (FlateDecode) algorithm. The --compress-streams=y flag ensures all eligible streams are compressed, as in qpdf input.pdf --compress-streams=y output.pdf. The --object-streams=generate option packs eligible indirect objects into compressed object streams for efficiency: qpdf input.pdf --object-streams=generate output.pdf. Unreferenced objects are dropped whenever qpdf rewrites a file, because only objects reachable from the trailer are written; the --check option, by contrast, validates the file's structure without producing an output file: qpdf --check input.pdf. Streams using filters like LZWDecode can be decoded with --decode-level=generalized before recompression, and --recompress-flate recompresses streams that are already Flate-compressed. Image optimization with --optimize-images converts non-JPEG images to DCT (JPEG) if smaller, using a default quality of 75, adjustable via --jpeg-quality. These options support other standard PDF filters but prioritize content preservation over aggressive alteration.
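The file-naming rule for --split-pages can be sketched in a few lines of C++. This is a minimal illustration of the documented behavior (the %d placeholder is replaced by a zero-padded page number, or by a first-last range when groups hold more than one page), not qpdf's own code; the helper names pad and split_output_names are hypothetical, and the sketch only handles patterns that contain an explicit %d.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Zero-pad a page number to a fixed width.
static std::string pad(int n, std::size_t width)
{
    std::string s = std::to_string(n);
    if (s.size() < width) {
        s.insert(0, width - s.size(), '0');
    }
    return s;
}

// Hypothetical helper: predict the output names when `page_count` pages are
// split into groups of `group_size`, given a pattern that contains "%d".
std::vector<std::string>
split_output_names(std::string const& pattern, int page_count, int group_size)
{
    assert(page_count > 0 && group_size > 0);
    std::size_t const pos = pattern.find("%d");
    assert(pos != std::string::npos); // this sketch requires an explicit %d

    // All numbers are padded to the width of the highest page number.
    std::size_t const width = std::to_string(page_count).size();

    std::vector<std::string> names;
    for (int first = 1; first <= page_count; first += group_size) {
        int last = first + group_size - 1;
        if (last > page_count) {
            last = page_count; // the final group may be shorter
        }
        std::string range = pad(first, width);
        if (last > first) {
            range += "-" + pad(last, width); // multi-page groups use first-last
        }
        std::string name = pattern;
        name.replace(pos, 2, range);
        names.push_back(name);
    }
    return names;
}
```

Calling split_output_names("output-%d.pdf", 4, 2) yields output-1-2.pdf and output-3-4.pdf, matching the four-page example above. A script can use such a prediction to check that every expected file was written.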
PDF creation via the CLI assembles minimal files from existing pages or an empty template; custom content requires user-supplied graphics operators, as qpdf does not generate visual content itself. Starting with --empty produces a PDF with no pages: qpdf --empty output.pdf, to which pages can be added using --pages. For instance, qpdf --empty --pages input.pdf 1 -- output.pdf creates a single-page PDF from the first page of input.pdf. This approach is suited for assembling basic documents but relies on input files for the actual content.

Despite these capabilities, qpdf's CLI transformations have limitations: it preserves page content streams faithfully, but document-level structures or annotations may be altered or dropped when pages are rearranged, and unreferenced objects are discarded unless retained with --preserve-unreferenced. There is no support for editing interactive forms, executing JavaScript, or advanced features like applying digital signatures, as the tool focuses on structural changes rather than semantic interpretation.
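The page selections used with --pages and --rotate in this section follow a compact range syntax. The sketch below expands a simplified subset of that syntax into a list of page numbers: comma-separated items, first-last ranges (descending ranges reverse the order), z for the last page, and rN for the Nth page from the end. It is an illustration under those assumptions rather than qpdf's parser, it omits other qualifiers qpdf accepts (such as :even and :odd), and the function names are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Parse a strictly positive integer, rejecting trailing garbage.
static int parse_positive(std::string const& text)
{
    std::size_t used = 0;
    int value = std::stoi(text, &used); // throws std::invalid_argument if non-numeric
    if (used != text.size() || value < 1) {
        throw std::invalid_argument("bad page number: " + text);
    }
    return value;
}

// Resolve one endpoint: "7" -> 7, "z" -> last page, "r2" -> second-to-last page.
static int resolve_page(std::string const& token, int page_count)
{
    int page = 0;
    if (token == "z") {
        page = page_count;
    } else if (!token.empty() && token[0] == 'r') {
        page = page_count + 1 - parse_positive(token.substr(1));
    } else {
        page = parse_positive(token);
    }
    if (page < 1 || page > page_count) {
        throw std::out_of_range("page out of range: " + token);
    }
    return page;
}

// Hypothetical helper: expand a range such as "1-3,7-5,z" into page numbers.
std::vector<int> expand_page_range(std::string const& spec, int page_count)
{
    assert(page_count > 0);
    std::vector<int> pages;
    std::size_t start = 0;
    while (start <= spec.size()) {
        std::size_t comma = spec.find(',', start);
        if (comma == std::string::npos) {
            comma = spec.size();
        }
        std::string const item = spec.substr(start, comma - start);
        std::size_t const dash = item.find('-');
        if (dash == std::string::npos) {
            pages.push_back(resolve_page(item, page_count));
        } else {
            int const first = resolve_page(item.substr(0, dash), page_count);
            int const last = resolve_page(item.substr(dash + 1), page_count);
            int const step = (first <= last) ? 1 : -1; // descending range reverses order
            for (int p = first; p != last + step; p += step) {
                pages.push_back(p);
            }
        }
        start = comma + 1;
    }
    return pages;
}
```

For a ten-page document, expand_page_range("7-5", 10) gives pages 7, 6, 5, and expand_page_range("r3-z", 10) gives pages 8, 9, 10. Out-of-range or malformed items raise an exception, so a caller can validate a selection before invoking qpdf.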

Library capabilities

The libqpdf C++ library provides the core programmatic interface for QPDF, enabling developers to embed PDF manipulation capabilities directly into applications. The primary class, QPDF, handles high-level document operations, including loading and parsing PDF files via methods like processFile, processMemoryFile, and processInputSource, which accept input from files, memory buffers, or custom input sources while automatically resolving indirect objects through an internal cache mechanism. Low-level access to PDF elements is facilitated by the QPDFObjectHandle class, which represents objects such as dictionaries, arrays, and streams, allowing for inspection, modification, and creation of objects using factory methods like newDictionary or parse; QPDF::makeIndirectObject turns a direct object into an indirect one. For stream handling, the abstract base class Pipeline and its subclasses (e.g., Pl_Buffer, Pl_StdioFile) enable reading and writing of stream data, supporting filtering such as compression and decompression during read and write operations.

Key API operations include reading and writing PDFs through QPDF in conjunction with QPDFWriter, which outputs files in standard PDF format or in linearized form for optimization, and supports QDF mode, a normalized form of the document intended for inspection and editing in a text editor. A JSON representation of the document structure was introduced in version 11.0.0, with the option of keeping stream data in external files. Advanced features encompass decryption and re-encryption of protected files using user or owner passwords, management of cross-reference tables via QPDF's internal table for tracking object offsets and generations, and creation of new indirect objects with reserved numbers to avoid conflicts during merges. The library offers partial support for PDF 2.0 syntax, including recognition of explicitly encoded UTF-8 strings as specified in the standard. Integration with libqpdf involves including headers from the include/qpdf directory, though the full library requires compilation and linking; since version 11, building uses CMake with the qpdf::libqpdf target for simplified dependency management across platforms.

A C-language API, exposed via qpdf-c.h, allows direct use in C programs or in languages that interface with shared libraries. Separate QPDF instances can be used concurrently in different threads, but a single QPDF object must not be shared between threads without external synchronization. This makes the library suitable as a backend for R's qpdf package for document transformations in statistical workflows, or for custom processors that require in-process batch operations without invoking external shell commands. Performance-wise, libqpdf is designed for efficiency with large files: objects are resolved lazily, and stream data is read from the input as needed rather than being loaded into memory all at once. It employs stream-based I/O to minimize overhead during object resolution and content stream operations on substantial documents.
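The chained-pipeline idea behind the library's stream handling can be illustrated with a small model. The classes below (Stage, Collect, HexEncode) are hypothetical stand-ins written for this sketch and are not part of libqpdf; they only mirror the general shape of a pipeline stage, where each stage receives bytes through write(), optionally transforms them, and forwards the result to the next stage, and finish() signals the end of the data.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a pipeline stage (not a libqpdf class).
class Stage
{
  public:
    explicit Stage(Stage* next = nullptr) : next_(next) {}
    virtual ~Stage() = default;
    virtual void write(unsigned char const* data, std::size_t len) = 0;
    // By default, pass the end-of-data signal down the chain.
    virtual void finish()
    {
        if (next_) {
            next_->finish();
        }
    }

  protected:
    Stage* next_;
};

// Terminal stage: accumulates everything it receives in memory.
class Collect : public Stage
{
  public:
    void write(unsigned char const* data, std::size_t len) override
    {
        out.append(reinterpret_cast<char const*>(data), len);
    }
    void finish() override { finished = true; }

    std::string out;
    bool finished = false;
};

// Filtering stage: writes each input byte as two lowercase hex digits,
// loosely in the spirit of PDF's ASCIIHex filter.
class HexEncode : public Stage
{
  public:
    explicit HexEncode(Stage* next) : Stage(next) { assert(next != nullptr); }
    void write(unsigned char const* data, std::size_t len) override
    {
        static char const digits[] = "0123456789abcdef";
        std::string encoded;
        for (std::size_t i = 0; i < len; ++i) {
            encoded += digits[data[i] >> 4];
            encoded += digits[data[i] & 0x0f];
        }
        next_->write(
            reinterpret_cast<unsigned char const*>(encoded.data()), encoded.size());
    }
};
```

Constructing a Collect sink, wrapping it in a HexEncode stage, and writing the three bytes P, D, F through the chain leaves the string 504446 in the sink. In the real library the same pattern lets a compression or decoding filter sit in front of a buffer or file sink without either stage knowing about the other.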

Usage

Command-line examples

The qpdf command-line tool provides a versatile interface for performing content-preserving transformations on PDF files. Below are practical examples of common operations, demonstrating syntax for basic tasks. These examples assume the tool is installed and accessible via the command line; input and output files should be specified with full paths as needed. For basic conversion and normalizing PDF syntax while preserving original data, use the following command, which reads the input file, rewrites its structure in normalized form, and writes the result to an output file without altering stream filters:
qpdf --stream-data=preserve input.pdf output.pdf
This is useful for repairing or standardizing PDFs without recompressing or decoding streams, maintaining the file's original structure where possible. To merge multiple PDF files into a single output, create an empty base PDF and append pages from source files:
qpdf --empty --pages file1.pdf file2.pdf -- output.pdf
This command generates output.pdf containing all pages from file1.pdf followed by those from file2.pdf. Because the base is empty, document-level information such as metadata and outlines is not carried over from the inputs; additional files can be listed similarly for multi-file merges. Splitting a PDF into individual page files is achieved with the --split-pages option, which outputs numbered files:
qpdf --split-pages input.pdf output-%d.pdf
This produces one file per page of input.pdf, named output-1.pdf, output-2.pdf, and so on, with page numbers zero-padded to a common width (output-01.pdf through output-12.pdf for a 12-page input). The %d placeholder marks where the page number goes; if it is omitted, the number is inserted before the .pdf extension. Decrypting a password-protected PDF requires specifying the password and the --decrypt flag:
qpdf --password=pass --decrypt input.pdf output.pdf
Here, pass is replaced with the actual password; the command removes encryption from input.pdf and saves the unencrypted version as output.pdf, preserving all other content. For files encrypted with an empty user password, --decrypt alone suffices. Inspecting the internal structure, such as the trailer dictionary, aids in debugging or analysis:
qpdf --show-object=trailer input.pdf
This dumps the trailer dictionary to stdout, revealing entries like the document catalog reference (/Root), /Size, and /Info without modifying the file. Other objects can be shown with --show-object=obj,gen. For advanced optimization like linearizing a PDF for web viewing while enabling compression, combine flags as follows:
qpdf --linearize --compress-streams=y input.pdf output.pdf
Linearization reorganizes the file for byte-serving, allowing partial downloads, and --compress-streams=y reduces size by applying Flate compression to uncompressed streams. Error handling and validation can be performed using flags like --check, which verifies the PDF's structure and, where present, its encryption and linearization:
qpdf --check input.pdf
This exits with status 0 if no problems are found, 2 if errors are found, and 3 if there are only warnings, reporting any issues it encounters. To write output to stdout instead of a file, give - as the output file name in any transformation command, such as qpdf --stream-data=preserve input.pdf - > output.pdf. Warnings can be suppressed with --no-warn if needed during processing.
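When qpdf is driven from another program, its exit status is the simplest signal to act on. The sketch below maps the documented statuses (0 for success, 2 for errors, 3 for warnings only) onto an enumeration; the names QpdfOutcome, classify_qpdf_exit, and describe are hypothetical and chosen for this example only.

```cpp
#include <cassert>
#include <string>

// Hypothetical classification of qpdf's exit status for use in wrappers.
enum class QpdfOutcome { Success, Errors, Warnings, Unknown };

QpdfOutcome classify_qpdf_exit(int status)
{
    assert(status >= 0); // process exit statuses are non-negative
    switch (status) {
    case 0:
        return QpdfOutcome::Success; // no errors or warnings
    case 2:
        return QpdfOutcome::Errors; // errors were found
    case 3:
        return QpdfOutcome::Warnings; // warnings only, output may still be usable
    default:
        return QpdfOutcome::Unknown; // e.g. a status produced by the shell
    }
}

// Short human-readable label for logging.
std::string describe(QpdfOutcome outcome)
{
    switch (outcome) {
    case QpdfOutcome::Success:
        return "ok";
    case QpdfOutcome::Errors:
        return "errors";
    case QpdfOutcome::Warnings:
        return "warnings";
    default:
        return "unknown";
    }
}
```

Note that the value returned by std::system is implementation-defined; on POSIX systems it has to be decoded (for example with WEXITSTATUS) before being passed to a function like classify_qpdf_exit. Treating warnings separately from errors lets a batch job keep output that qpdf was able to repair while still flagging the input for review.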

Library integration

To integrate the QPDF library into C++ applications, begin by including the necessary header file and linking against the library during compilation. The primary header for core functionality is <qpdf/QPDF.hh>, which provides access to the QPDF class for loading and manipulating PDF files. Headers are installed in the include/qpdf directory, and applications should include them directly without modifying include paths. A basic setup involves creating a QPDF object and processing an input file, which parses the PDF structure while preserving content. For example:
cpp
#include <qpdf/QPDF.hh>
#include <iostream>

int main() {
    QPDF doc;
    try {
        doc.processFile("input.pdf");
        std::cout << "PDF loaded successfully." << std::endl;
    } catch (std::exception const& e) {  // qpdf errors such as QPDFExc derive from std::exception
        std::cerr << "Error: " << e.what() << std::endl;
        return 1;
    }
    return 0;
}
This uses processFile to load the file, optionally providing a password for encrypted PDFs. The method throws an exception derived from std::exception (typically QPDFExc) if the file is invalid or inaccessible. Common operations include writing the modified PDF to output and applying encryption. Output is handled via the QPDFWriter class, included from <qpdf/QPDFWriter.hh>. QPDFWriter itself does not flatten annotations; to fold annotation appearances into the page content before writing, call QPDFPageDocumentHelper::flattenAnnotations on the document and then instantiate QPDFWriter with the QPDF object:
cpp
#include <qpdf/QPDFPageDocumentHelper.hh>
#include <qpdf/QPDFWriter.hh>

// Fold annotation appearance streams into the page content, if needed
QPDFPageDocumentHelper(doc).flattenAnnotations();

QPDFWriter writer(doc, "output.pdf");
writer.write();
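For encryption, configure parameters before writing, such as using 256-bit encryption with user and owner passwords:
cpp
QPDFWriter writer(doc, "encrypted.pdf");
// Flags, in order: accessibility, extract, assemble, annotate/form, form filling, other modifications
writer.setR6EncryptionParameters(
    "userpass", "ownerpass", true, true, true, true, true, true,
    qpdf_r3p_full, true);  // full-resolution printing allowed; metadata encrypted
writer.write();
This applies Revision 6 (256-bit AES) encryption. The user password opens the document subject to the configured permissions, while the owner password grants full access. Permissions can restrict printing, extraction, or modification as needed. Object manipulation leverages QPDFObjectHandle, a smart pointer-based class for accessing and modifying PDF elements without manual memory management. To access pages, retrieve the root and its /Pages dictionary: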
cpp
QPDFObjectHandle root = doc.getRoot();
QPDFObjectHandle pages = root.getKey("/Pages");
int num_pages = pages.getKey("/Count").getIntValueAsInt();  // getIntValue() returns long long
Adding a page appends it to the document using addPage, typically with a QPDFObjectHandle from another PDF or a newly created page object:
cpp
QPDF other_doc;
other_doc.processFile("other.pdf");
QPDFObjectHandle new_page = other_doc.getAllPages().at(0);  // First page of other.pdf
doc.addPage(new_page, false);  // Append (false for end)
This copies the page while resolving dependencies. Smart pointers in QPDFObjectHandle ensure automatic reference counting and deallocation. Error handling relies on exceptions such as QPDFExc, thrown for parsing errors, invalid objects, or encryption issues. Wrap operations in try-catch blocks and use doc.anyWarnings() to check for non-fatal issues post-processing. For validation, processFile checks basic file structure as it loads, and unrecoverable problems trigger exceptions. Memory management is handled via RAII with smart pointers, avoiding explicit deletes. For build integration, use CMake to find and link the library, assuming QPDF is installed via its build system or a package manager:
cmake
cmake_minimum_required(VERSION 3.16)
project(MyApp LANGUAGES CXX)
find_package(qpdf REQUIRED)
add_executable(myapp main.cc)
target_link_libraries(myapp qpdf::libqpdf)
This locates headers and libraries automatically. For cross-language use, bindings like pikepdf expose the library to Python, while other languages can interface through the C API or similar wrappers. Best practices include always loading inputs with processFile inside a try block before manipulation to catch malformed PDFs early. When generating output, consider QPDFWriter::setObjectStreamMode(qpdf_o_generate), which produces object streams together with cross-reference streams, a compact format compatible with PDF 1.5+. Avoid direct editing of compressed streams, as it risks corrupting content; instead, use high-level methods like addPage or replaceObject to maintain structural integrity.

History

Origins and early development

QPDF was originally created in 2001 by Jay Berkenbilt during his employment at Apex CoVantage, initially to support structural analysis of PDF files for internal tools. The software addressed a need for low-level PDF handling in environments where rendering was not required, such as decryption and object inspection. From 2001 to 2005, Berkenbilt made periodic modifications to QPDF for personal and work-related purposes, enhancing its utility for non-rendering tasks like examining PDF internals without proprietary dependencies. A key motivation was the scarcity of free tools for such manipulations, particularly for processing encrypted academic PDFs that were otherwise inaccessible without proprietary software. In 2005, after leaving Apex CoVantage with the company's permission to retain ownership, Berkenbilt continued developing QPDF. The first public release occurred on April 29, 2008, as version 2.0 on SourceForge under the Artistic License 2.0, with basic command-line interface and library support geared toward Unix-like systems; Windows compatibility was absent until subsequent versions.

Major releases and evolution

QPDF's development has seen several major releases since its initial public availability, each introducing significant enhancements to functionality, compatibility, and build processes. Version 2.0, released in April 2008, marked the first public release; support for Windows platforms and the ability to build as a dynamic-link library (DLL) followed in subsequent versions, broadening its accessibility beyond Unix-like systems. Version 7.0, released on September 15, 2017, shifted the project's licensing to the Apache 2.0 license from the previous Artistic License 2.0, facilitating greater adoption in open-source ecosystems while improving encryption handling to better conform to PDF 1.7 specifications, including enhanced support for security parameters and key derivation. Version 10.0, released in 2020, introduced performance enhancements, support for external encryption libraries like OpenSSL, and expansions to the C API for better integration. It also provided initial handling for aspects of PDF 2.0.

Version 11.0, released on September 10, 2022, represented a pivotal evolution by introducing JSON output version 2 for structured manipulation of PDF internals, a comprehensive expansion of the C API, partial compatibility with PDF 2.0 (ISO 32000-2), and a transition to the CMake build system for more modern and portable compilation. It also incorporated security patches for vulnerabilities including those affecting earlier versions, like CVE-2021-25786.

Subsequent releases focused on refinement. Version 12.0 in 2025 included further changes and optimizations. Version 12.2.0, released on May 4, 2025, emphasized robustness with enhancements to object handling for complex or damaged PDFs and improvements tailored for processing large files, reducing memory usage and processing time in high-volume scenarios.
Over time, QPDF has evolved from a primarily command-line interface (CLI) tool for basic PDF transformations to a robust C++ library emphasizing structural preservation and extensibility, driven by community contributions through its GitHub repository, where issues and pull requests facilitate ongoing maintenance. This shift is evident in its integration into broader ecosystems, such as the qpdf R package available on CRAN, which uses the library for PDF manipulation within statistical computing workflows.
