PDFtk
PDFtk (PDF Toolkit) is a cross-platform command-line utility for manipulating PDF documents. It can merge multiple PDFs into one, split a PDF into separate pages, rotate pages, fill forms, apply backgrounds or stamps, encrypt or decrypt files, and unpack attachments, all while preserving the original content and structure.[1][2] Developed by Sid Steward and first released on March 7, 2004, as version 0.93, PDFtk originated as a free, open-source tool, written in C++ around the Java-based iText library, to address common PDF editing needs through scripting and automation, particularly for server-side processing and client-side workflows.[3][4] PDFtk quickly gained popularity among developers and users for its simplicity and reliability, with key milestones including the addition of form-filling capabilities in version 1.00 (August 14, 2004), support for AES decryption and bookmark merging in version 2.00 (May 22, 2013), and metadata handling enhancements in version 2.02 (July 24, 2013), which remains the current stable release.[3]
Operated under PDF Labs by Sid Steward, the author of the O'Reilly book PDF Hacks, PDFtk has evolved into a family of tools: the original PDFtk Server (a free command-line version available for Linux, macOS, and Windows), PDFtk Free (a no-cost graphical interface for basic merging and splitting on Windows 10/11), and PDFtk Pro (a paid graphical application at $3.99 with advanced features like watermarking, securing, and batch processing on Windows 10/11).[4][5][6] These versions maintain backward compatibility and are widely used in software publishing, web applications, and document automation, though the tool requires a Java Runtime Environment (JRE) on some platforms.[1][2]
Introduction
Overview
PDFtk is a cross-platform command-line toolkit designed for straightforward manipulation of PDF documents, enabling common operations such as merging multiple files, splitting them into individual pages, and rotating pages.[1] Developed by Sid Steward, it was first released in 2004 as an open-source solution under the GNU General Public License version 2, prioritizing ease of use for scripting and automation without requiring complex PDF expertise.[7] From its origins as a basic command-line utility, PDFtk has evolved to include the graphical editions PDFtk Free and the proprietary PDFtk Pro, both available only for Windows 10 and 11, expanding its accessibility for both technical and non-technical users.[5] This progression reflects its foundational philosophy of simplicity, allowing users to handle routine PDF tasks efficiently on desktop and server environments alike.[1] The command-line PDFtk Server supports Linux, Windows, and macOS, facilitating broad adoption in diverse computing setups.[1]
Platforms and Licensing
PDFtk supports multiple operating systems. The command-line PDFtk Server edition runs on Windows, macOS, and Linux distributions.[1] On Windows, it is distributed as native binaries via an installer executable, allowing straightforward setup without additional dependencies.[1] On Linux, installation typically occurs through package managers such as apt on Debian-based systems, or by compiling from source code for custom builds.[1] On macOS, users can install binaries via a provided package installer or use Homebrew, often through the pdftk-java variant for compatibility.[1][8]
Licensing for PDFtk varies by edition to accommodate different user needs. PDFtk Server is released as open-source software under the GNU General Public License (GPL) version 2, which permits free use, modification, and distribution provided its terms are followed. However, redistribution of binaries within proprietary commercial products requires a separate commercial license costing $995.[7] PDFtk Free, a graphical interface built on the Server edition, operates as freeware with no source code access provided, making it suitable for indefinite personal use without cost.[5] PDFtk Pro, an enhanced proprietary version, requires a paid license priced at $3.99 as of 2025 and includes an end-user license agreement that governs its use, incorporating third-party libraries under their respective terms.[6]
All editions are primarily distributed through the official PDF Labs website, where users can download installers and documentation.[9] The PDFtk Server edition is available via open-source repositories in some Linux distributions, while others, such as Debian and Ubuntu, provide the pdftk-java Java port for compatibility, facilitating easy integration into Linux workflows.[1][10] For end users, the free editions (Server and Free) support personal and non-commercial applications without licensing fees, while PDFtk Pro targets commercial scenarios with added support and features, ensuring compliance through its proprietary model.[6][7] This tiered approach balances accessibility for hobbyists with professional-grade options for businesses.[9]
History
Development Origins
PDFtk was developed by Sid Steward, a PDF expert and author of the 2004 O'Reilly book PDF Hacks, who founded PDF Labs to simplify PDF workflows. Motivated by the need for straightforward, command-line PDF manipulation tools that avoided the complexity and licensing costs of Adobe Acrobat, Steward created PDFtk as an accessible alternative for users handling document merging, splitting, and form filling.[11][1]
The project emerged in 2004, a period of rapid PDF adoption for web distribution and professional printing. The format had become a standard for cross-platform document exchange following its 1993 introduction by Adobe, yet open-source options for practical manipulation remained scarce. PDFtk addressed this gap by offering a lightweight, scriptable utility that did not require proprietary software like Acrobat.[12][13]
Technically, PDFtk's initial foundation leveraged the open-source iText Java library (first released in 2000) for handling PDF structures and operations, while incorporating C++ code for efficient core processing such as page extraction and compression. The first version, 0.93, was released on March 7, 2004. Version 1.00, released on August 14, 2004, added key features such as form-filling capabilities.[3][14]
PDF Labs, operated by Steward since its inception, has served as the primary maintainer of PDFtk, providing source code, documentation, and updates to ensure its reliability across platforms.[4]
Key Releases and Milestones
PDFtk's development reached a significant milestone with the release of version 2.02 on July 24, 2013, which introduced options like drop_xmp for removing XMP metadata and dump_data for extracting document information, alongside enhanced bookmark merging during PDF operations and fixes for issues including password handling and decryption errors.[3] This version built on version 2.00's addition of AES decryption support, enabling compatibility with PDF 1.7 features such as 256-bit AES encryption (Extension Level 3), and improved overall error handling for greater stability.[3][15] The tool has seen no official core updates from PDF Labs since 2013, reflecting its mature and reliable state for everyday PDF tasks.[3]
Following the stabilization of the command-line version, PDF Labs introduced graphical editions to broaden accessibility, launching PDFtk Free and PDFtk Pro in the mid-2010s as user-friendly interfaces built atop the core functionality.[5] PDFtk Free offers basic merging and splitting without cost, while PDFtk Pro provides advanced features like watermarking and page rotation for a one-time fee, both emphasizing intuitive GUIs over command-line usage.[6]
A key community-driven milestone occurred on December 30, 2017, when Marc Vinyals initiated the pdftk-java project, a GPL-licensed Java port of the original tool designed to resolve compatibility challenges stemming from the deprecation of the GNU Compiler for Java (GCJ) in distributions like Debian and issues with Oracle's JDK licensing.[16] This port maintains functional parity with the original while leveraging standard Java runtime environments, facilitating continued adoption in open-source ecosystems.[16]
As of 2025, PDFtk's core remains unchanged, with ongoing maintenance occurring through third-party graphical frontends such as PDFTK Builder Enhanced version 4.1.9, released on November 15, 2025, which wraps the version 2.02 server tool in an enhanced GUI for Windows users.[17]
Features
Core Manipulation Functions
PDFtk provides essential tools for manipulating the structure of PDF documents through command-line operations, enabling users to combine, divide, reorient, and restore files without requiring graphical interfaces or proprietary software. These functions operate on the document's page-level architecture, allowing precise control over content arrangement and integrity.
Merging and collation are achieved primarily via the cat operation, which concatenates pages from one or more input PDFs into a single output file, supporting selective page ranges and reordering for efficient document assembly. Each input file is first assigned a handle (e.g., A=A.pdf), and page ranges are then written against that handle. For instance, to combine the first three pages of file A with pages 1 through 2 of file B, the command pdftk A=A.pdf B=B.pdf cat A1-3 B1-2 output C.pdf produces a new PDF, C.pdf, with the specified sequence. Page ranges also accept qualifiers such as even or odd, making the operation suitable for collating scanned documents or reports from disparate sources.[2]
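The handle-and-range syntax of cat lends itself to being generated from a script. The following Python sketch is illustrative only: the function name build_cat_command and its argument layout are assumptions introduced here, not part of PDFtk. It only assembles the argument list and does not invoke pdftk.

```python
from string import ascii_uppercase


def build_cat_command(inputs, ranges, output):
    """Return an argv list for `pdftk ... cat ... output ...`.

    inputs: list of PDF paths; each gets a handle A, B, C, ... in order.
    ranges: list of (input_index, page_range) pairs, e.g. (0, "1-3").
    This helper is hypothetical; only the pdftk syntax comes from the text.
    """
    if len(inputs) > len(ascii_uppercase):
        raise ValueError("this sketch only assigns single-letter handles")
    handles = ascii_uppercase[:len(inputs)]
    argv = ["pdftk"]
    # Handle assignments, e.g. A=A.pdf
    argv += [f"{handle}={path}" for handle, path in zip(handles, inputs)]
    argv.append("cat")
    # Page ranges written against the handles, e.g. A1-3
    argv += [f"{handles[index]}{page_range}" for index, page_range in ranges]
    argv += ["output", output]
    return argv


command = build_cat_command(["A.pdf", "B.pdf"], [(0, "1-3"), (1, "1-2")], "C.pdf")
print(" ".join(command))
# prints: pdftk A=A.pdf B=B.pdf cat A1-3 B1-2 output C.pdf
```

Returning a list rather than a single string means the result can be handed to subprocess.run without shell quoting concerns.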
Splitting and bursting functions facilitate the extraction of individual pages or ranges from a PDF, breaking down large files into manageable components for archiving or further processing. The burst operation decomposes an input PDF into separate single-page files, naming them sequentially (pg_0001.pdf, pg_0002.pdf, and so on), while also generating a doc_data.txt file summarizing the original structure, including bookmarks and metadata. Alternatively, users can extract specific ranges by running cat with a page range and a single output file, though burst is optimized for full disassembly.[2]
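A script that calls burst usually needs to know which files to expect afterwards. The sketch below assumes the default naming pattern is the printf-style pg_%04d.pdf implied by the pg_0001.pdf example, and that a custom pattern can be supplied after output; the helper names are hypothetical.

```python
def build_burst_command(input_pdf, pattern=None):
    """Return argv for bursting input_pdf into single-page files.

    pattern: optional printf-style name such as "page_%02d.pdf"
    (assumed to be accepted after the `output` keyword).
    """
    argv = ["pdftk", input_pdf, "burst"]
    if pattern is not None:
        argv += ["output", pattern]
    return argv


def burst_page_names(page_count, pattern="pg_%04d.pdf"):
    """List the file names a burst of page_count pages should produce."""
    return [pattern % number for number in range(1, page_count + 1)]


print(build_burst_command("report.pdf"))
print(burst_page_names(3))
# prints: ['pg_0001.pdf', 'pg_0002.pdf', 'pg_0003.pdf']
```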
Page rotation allows targeted adjustments to page orientations within a document, correcting issues from scanning or layout errors without affecting content. Rotations are specified with compass keywords appended to a page range: north (0°), east (90° clockwise), south (180°), or west (270° clockwise). They can be applied to entire documents or to page subsets. A command like pdftk input.pdf rotate 1-endeast output rotated.pdf applies a 90° clockwise rotation to all pages, while pdftk A=A.pdf cat A1-5east A6-10west output adjusted.pdf outputs pages 1-5 rotated eastward and pages 6-10 rotated westward.[2]
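Because rotations are given as compass keywords rather than numbers, a script working in degrees needs a small translation step. This Python sketch shows one way to do it; the names ROTATION_KEYWORDS, rotation_keyword, and build_rotate_command are illustrative assumptions, and the command is only assembled, not executed.

```python
# Clockwise degrees mapped to the compass keywords described above.
ROTATION_KEYWORDS = {0: "north", 90: "east", 180: "south", 270: "west"}


def rotation_keyword(degrees):
    """Translate clockwise degrees (any multiple of 90) to a keyword."""
    normalized = degrees % 360  # e.g. -90 becomes 270
    if normalized not in ROTATION_KEYWORDS:
        raise ValueError("rotation must be a multiple of 90 degrees")
    return ROTATION_KEYWORDS[normalized]


def build_rotate_command(input_pdf, page_range, degrees, output_pdf):
    """Return argv for rotating page_range of input_pdf via `rotate`."""
    token = f"{page_range}{rotation_keyword(degrees)}"
    return ["pdftk", input_pdf, "rotate", token, "output", output_pdf]


print(" ".join(build_rotate_command("input.pdf", "1-end", 90, "rotated.pdf")))
# prints: pdftk input.pdf rotate 1-endeast output rotated.pdf
```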
Repair and decompression operations address file integrity and editability by fixing structural damages and unpacking compressed streams for manual inspection or modification. To repair a corrupted PDF, PDFtk attempts to reconstruct damaged cross-reference (XREF) tables and stream lengths, as in pdftk broken.pdf output fixed.pdf, which salvages readable content where possible without external dependencies. Decompression via uncompress expands page streams for text editing, producing human-readable PostScript-like code, followed by compress to restore the original format; for example, pdftk in.pdf uncompress output editable.pdf enables direct alterations before recompression. These features are particularly valuable for recovering partially damaged files from unreliable sources.[2][18][1]
Forms and Overlay Operations
PDFtk supports form-related operations for generating and filling PDF forms, as well as applying overlays like backgrounds and stamps to documents. These features enable automation of form processing and visual enhancements without altering the underlying content structure. Form handling begins with generate_fdf, which extracts form field data from a PDF form into an FDF (Forms Data Format) file for editing or reuse; for example, pdftk form.pdf generate_fdf output form.fdf creates an FDF file containing field values. To fill a form, the fill_form operation applies data from an FDF file to a blank PDF form: pdftk blank.pdf fill_form form.fdf output filled.pdf. The flatten option, given after the output file name like other output options, merges the form data permanently into the page content and prevents further edits, as in pdftk blank.pdf fill_form form.fdf output final.pdf flatten. These capabilities are useful for batch form population in workflows.[2]
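For batch form population, the fill_form invocation can be generated once per data file. The following is a minimal sketch under the assumption that flatten is passed as a trailing output option; build_fill_form_command is a hypothetical helper, and no FDF content is produced here.

```python
def build_fill_form_command(form_pdf, fdf_file, output_pdf, flatten=False):
    """Return argv for filling form_pdf with the data in fdf_file.

    flatten=True appends the `flatten` output option so the filled
    fields are merged into the page content.
    """
    argv = ["pdftk", form_pdf, "fill_form", fdf_file, "output", output_pdf]
    if flatten:
        argv.append("flatten")
    return argv


# One command per FDF file; the file names are placeholders.
for index, fdf in enumerate(["alice.fdf", "bob.fdf"], start=1):
    print(build_fill_form_command("blank.pdf", fdf, f"filled_{index}.pdf", flatten=True))
```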
Overlay functions include applying backgrounds and stamps to pages. The background operation overlays a PDF page as a background on corresponding pages of an input document: pdftk front.pdf background back.pdf output under.pdf places the first page of back.pdf behind each page of front.pdf. For multiple backgrounds, multibackground attaches different background pages to specific input pages. Similarly, stamp places a PDF page on top of input pages: pdftk front.pdf stamp stamp.pdf output over.pdf, with multistamp for varied stamps per page. These operations support page ranges and are ideal for adding watermarks, headers, or footers across documents.[2]
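The four overlay operations differ along two axes: whether the overlay goes behind or on top of the page, and whether one overlay page is reused or each page gets its own. A small Python sketch can make that choice explicit; the function and parameter names are assumptions made for illustration.

```python
def build_overlay_command(input_pdf, overlay_pdf, output_pdf,
                          on_top=False, per_page=False):
    """Return argv for a background or stamp operation.

    on_top:   False places the overlay behind the pages (background),
              True places it over them (stamp).
    per_page: True selects the multi* variant, which pairs each input
              page with the corresponding overlay page.
    """
    operation = "stamp" if on_top else "background"
    if per_page:
        operation = "multi" + operation
    return ["pdftk", input_pdf, operation, overlay_pdf, "output", output_pdf]


print(" ".join(build_overlay_command("front.pdf", "back.pdf", "under.pdf")))
# prints: pdftk front.pdf background back.pdf output under.pdf
```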
Metadata and Security Operations
PDFtk provides tools for viewing and modifying PDF metadata, enabling users to extract and update document information such as titles, authors, subjects, and custom fields stored in the PDF Info dictionary. The dump_data operation reads a single input PDF and outputs its metadata, bookmarks (outlines), page metrics (including media box dimensions, rotation, and labels), and other structural data to a text file or standard output. For instance, the command pdftk input.pdf dump_data output metadata.txt generates a report containing key-value pairs like "InfoKey: Title" and "InfoValue: Document Title", along with bookmark hierarchies and page labels.[2] An optional UTF-8 variant, dump_data_utf8, ensures proper encoding for international characters in the output.[2]
To update metadata, PDFtk uses the update_info operation, which applies changes from an input data file to the PDF's Info dictionary, supporting additions or modifications to standard fields (e.g., author, creator) and custom metadata in UTF-8 format. Users prepare an info file matching the dump_data format, edit it—such as adding "InfoKey: CustomField" with "InfoValue: Example Value"—and then run pdftk input.pdf update_info info.txt output updated.pdf. This operation preserves existing bookmarks unless explicitly overridden and is useful for batch standardization of document properties. The UTF-8 counterpart, update_info_utf8, handles accented or non-ASCII characters reliably.[2]
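The InfoKey/InfoValue pairs in a dump_data report are straightforward to read back into a script. The sketch below parses them into a dictionary. The sample text is invented for illustration; it assumes that each pair may be preceded by an InfoBegin marker line and that other lines such as NumberOfPages appear in the report, all of which the parser simply skips.

```python
def parse_info(dump_text):
    """Collect InfoKey/InfoValue pairs from dump_data output into a dict."""
    info = {}
    pending_key = None
    for line in dump_text.splitlines():
        # Split on the first colon only, so values may contain colons.
        name, separator, value = line.partition(":")
        if not separator:
            continue  # marker lines without a colon, e.g. InfoBegin
        value = value.strip()
        if name == "InfoKey":
            pending_key = value
        elif name == "InfoValue" and pending_key is not None:
            info[pending_key] = value
            pending_key = None
    return info


# Illustrative sample, not captured from a real PDF.
SAMPLE = """InfoBegin
InfoKey: Title
InfoValue: Annual Report: 2013
InfoBegin
InfoKey: Author
InfoValue: Jane Doe
NumberOfPages: 3
"""

print(parse_info(SAMPLE))
# prints: {'Title': 'Annual Report: 2013', 'Author': 'Jane Doe'}
```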
For security, PDFtk supports encryption and decryption of PDFs using owner and user passwords to control access and permissions. Decryption requires the input password via input_pw <password>, allowing processing of secured files; for example, pdftk secured.pdf input_pw mypassword output unsecured.pdf removes protection if the password is correct. Encryption applies during output, setting an owner password for full control (e.g., pdftk input.pdf output secured.pdf owner_pw mypassword) or a user password for restricted access, with options to allow specific permissions like printing or content modification using allow <permissions> (e.g., allow Printing ModifyContents). The tool uses 128-bit RC4 encryption by default for output, with a 40-bit RC4 option via encrypt_40bit for compatibility with older viewers, though it can decrypt AES-encrypted inputs. PDFtk does not support 256-bit AES encryption in its original implementation.[2][18]
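Encryption settings are easy to get wrong by hand, so a script may want to validate them before calling pdftk. In this sketch, Printing and ModifyContents come from the text above, while the remaining permission names are listed as recalled from the pdftk manual and should be checked against the installed version. The helper build_encrypt_command is hypothetical.

```python
# Printing and ModifyContents appear in the text; the others are
# recalled from the pdftk manual and should be verified locally.
PERMISSIONS = {
    "Printing", "DegradedPrinting", "ModifyContents", "Assembly",
    "CopyContents", "ScreenReaders", "ModifyAnnotations", "FillIn",
    "AllFeatures",
}


def build_encrypt_command(input_pdf, output_pdf, owner_pw=None,
                          user_pw=None, allow=(), forty_bit=False):
    """Return argv for writing an encrypted copy of input_pdf."""
    if owner_pw is None and user_pw is None:
        raise ValueError("set an owner password, a user password, or both")
    unknown = [name for name in allow if name not in PERMISSIONS]
    if unknown:
        raise ValueError(f"unknown permissions: {unknown}")
    argv = ["pdftk", input_pdf, "output", output_pdf]
    if forty_bit:
        argv.append("encrypt_40bit")  # 128-bit RC4 is the default otherwise
    if allow:
        argv += ["allow", *allow]
    if owner_pw is not None:
        argv += ["owner_pw", owner_pw]
    if user_pw is not None:
        argv += ["user_pw", user_pw]
    return argv


print(build_encrypt_command("input.pdf", "secured.pdf",
                            owner_pw="mypassword", allow=["Printing"]))
```

Passwords placed on the command line can be visible to other users in process listings; the PROMPT keyword accepted by pdftk in place of a password is one way to avoid that in interactive use.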
PDFtk also handles file attachments and stream compression for optimization and embedding. The attach_files operation embeds external files (e.g., images, documents) into a PDF, optionally associating them with a specific page via to_page <n>; for example, pdftk input.pdf attach_files attachment.txt to_page 1 output embedded.pdf adds the file to the first page's attachments. Conversely, unpack_files extracts all embedded files to a directory, as in pdftk input.pdf unpack_files output ./attachments/. For compression, the compress option restores or applies Flate compression to page streams during output (pdftk input.pdf compress output optimized.pdf), reducing file size, while uncompress removes it for easier text extraction or editing (pdftk input.pdf uncompress output editable.pdf). These features aid in optimizing PDFs without altering core content.[2][18]
Despite these capabilities, PDFtk has limitations in advanced security features; it does not support applying or verifying digital signatures, which require certificate-based authentication beyond password protection, nor does it implement digital rights management (DRM) mechanisms like expiration dates or device binding. These omissions stem from its reliance on older PDF libraries, focusing instead on basic password-based encryption and metadata handling.[2][3]
Implementations
Original Command-Line Version
The original command-line version of PDFtk, known as PDFtk Server, is implemented primarily in C++ and interfaces with a GCJ-compiled version of the Java-based iText library for PDF manipulation tasks, integrating the native-compiled Java code into the C++ executable.[19][20] This hybrid approach allows the tool to leverage iText's robust PDF handling while maintaining a native executable footprint. The source code is released under the GNU General Public License (GPL) version 2 or later, making it freely available for modification and redistribution.[7][21]
Installation of PDFtk Server is straightforward across major platforms, with pre-built binaries provided for Windows via an executable installer that places the tool in the system path for command-line access.[1] On Linux distributions such as Debian and Ubuntu, it can be installed directly from official repositories using package managers like apt.[22] For macOS and other Unix-like systems including FreeBSD, Solaris, and HP-UX, users typically compile from source: download the source archive, adjust the Makefile for the target compiler (e.g., GCC version), and run platform-specific build commands like make -f Makefile.Debian. However, due to the deprecation of GCJ in 2016, compiling on modern systems requires obtaining and using an older GCC version with GCJ support, which may not be readily available.[1]
The command-line interface follows a modular syntax designed for flexibility in scripting environments: pdftk [<input PDF files | - | PROMPT>] [<operation> <operation arguments>] [output <output filename | - | PROMPT>] [options].[2] Input files can be specified directly (e.g., A=input1.pdf B=input2.pdf) or via stdin with -, and operations such as cat for merging or burst for splitting include arguments like page ranges (e.g., cat A1-5 B3-end).[2] Output is directed to a file, stdout (-), or interactive prompt (PROMPT), with global options like user_pw for encryption or compress for optimization; the --help flag provides a concise summary of all available options and syntax.[2]
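The four-part syntax (inputs, operation, output, options) maps naturally onto a small wrapper. The Python sketch below assembles a command in that order and runs it only if a pdftk executable is found on the PATH. The function names assemble and run are assumptions for illustration, and nothing is executed at import time.

```python
import shutil
import subprocess


def assemble(inputs, operation=None, op_args=(), output=None, options=()):
    """Build argv in the order: inputs, operation, output, options."""
    argv = ["pdftk", *inputs]
    if operation is not None:
        argv += [operation, *op_args]
    if output is not None:
        argv += ["output", output]
    argv += list(options)
    return argv


def run(argv):
    """Run argv if its executable is installed; return None otherwise."""
    if shutil.which(argv[0]) is None:
        return None
    # check=False: the caller inspects returncode and stderr itself.
    return subprocess.run(argv, capture_output=True, text=True, check=False)


argv = assemble(["A=input1.pdf", "B=input2.pdf"], "cat", ["A1-5", "B3-end"],
                output="merged.pdf", options=["compress"])
print(" ".join(argv))
# prints: pdftk A=input1.pdf B=input2.pdf cat A1-5 B3-end output merged.pdf compress
```

Keeping assembly separate from execution lets the command be logged or tested without a PDFtk installation.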
PDFtk Server excels in scriptable automation for batch PDF processing, such as merging multiple documents in shell scripts on servers or client machines, due to its integration with tools like Bash or cron jobs.[1] Its lightweight design, with binaries typically under 1 MB, ensures minimal resource overhead, making it ideal for embedding in automated workflows without the bloat of full graphical suites.[1]
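As an example of the batch use described above, the following Python sketch collects every PDF in a directory and builds a single merge command for it. It assumes that cat with no page ranges concatenates all inputs in the order given; the helper names are hypothetical, and the demonstration runs on empty placeholder files without calling pdftk.

```python
import tempfile
from pathlib import Path


def find_pdfs(directory, exclude=()):
    """Return the PDFs directly inside directory, sorted by file name."""
    skipped = {Path(name).name for name in exclude}
    candidates = Path(directory).glob("*.pdf")
    return sorted((p for p in candidates if p.name not in skipped),
                  key=lambda p: p.name)


def build_merge_command(pdf_paths, output):
    """Return argv for `pdftk <inputs...> cat output <output>`."""
    paths = [str(path) for path in pdf_paths]
    if not paths:
        raise ValueError("nothing to merge")
    return ["pdftk", *paths, "cat", "output", str(output)]


# Demonstration on throwaway empty files; pdftk itself is never invoked.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.pdf", "a.pdf", "notes.txt", "combined.pdf"):
        (Path(tmp) / name).touch()
    found = find_pdfs(tmp, exclude=["combined.pdf"])
    demo_argv = build_merge_command([p.name for p in found], "combined.pdf")

print(demo_argv)
# prints: ['pdftk', 'a.pdf', 'b.pdf', 'cat', 'output', 'combined.pdf']
```

Excluding the output file from the inputs keeps a rerun of the same job from merging its previous result into the new one.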