Format
In computing and information technology, a format refers to the standardized structure and encoding method used to organize, store, or transmit data, enabling compatibility and interoperability across systems and devices.[1] This encompasses file formats for documents, images, audio, and video, as well as data stream formats for network transmission, ensuring that information can be reliably interpreted by software and hardware.[2] Formats play a critical role in digital preservation, where selecting sustainable ones prevents obsolescence and maintains long-term accessibility of cultural and scientific records.[3] Beyond computing, the term format broadly describes the arrangement or presentation of content in various media. In publishing and printing, it denotes the physical layout, size, and design of materials such as books or periodicals, influencing readability and aesthetic appeal.[4] In broadcasting and entertainment, a format outlines the organizational plan of programs, including segments, styles, and interactive elements, as seen in radio shows or television series structured around talk, music, or news.[4] Historically, the concept traces back to the 19th century in printing contexts, evolving with technological advancements—from mechanical typesetting to digital encoding—to support efficient production and distribution.[4] Key challenges in formats include proprietary versus open standards, where closed formats like early Microsoft Office files limited interoperability, prompting the development of open alternatives such as PDF/A for archiving and ODF for office documents to promote vendor-neutral exchange.[2][5][6] International bodies like the International Organization for Standardization (ISO) define many formats to ensure global consistency, as in ISO/IEC 14496 for multimedia compression.[7] In data science and Earth observation, formats like NetCDF facilitate complex, multidimensional datasets for scientific analysis, highlighting their role in 
enabling reproducible research and collaboration.[8]
Overview
Definition
A format is a predefined structure, pattern, or standard way of organizing information, media, or processes to ensure consistency, compatibility, or presentation.[4][9] This encompasses the overall plan of arrangement, including elements such as shape, size, layout, and general makeup, which dictate how content is assembled and displayed.[4][10] Key attributes of a format include specific encoding rules, protocols, and organizational standards that govern the assembly and interpretation of elements within a given medium.[11] Such attributes promote interoperability across systems while maintaining a uniform approach to handling diverse types of data or materials.[12]
The term "format" entered English in the 19th century from French, derived from Latin liber formatus meaning "a book formed," and initially referred to the physical shape, size, and binding of printed works.[13] By the mid-20th century, particularly from the 1960s onward, its usage expanded from physical print contexts to digital domains, where it described the standardized arrangement of data in computing and electronic media.[13][14] This evolution reflected broader technological shifts, adapting the concept to ensure reliable storage, transmission, and rendering in non-physical environments.[15]
Etymology and Historical Development
The word "format" originates from the Latin formatus, the past participle of formare, meaning "to shape" or "to form." It entered the English language in the early 19th century via the French format and German Format, initially denoting the physical shape or size of a book, as in the phrase liber formatus ("a book formed" or "shaped book").[13][4] The term first gained prominence in 1840 within the printing industry, where it described standardized book dimensions to facilitate production and distribution.[4] This usage coincided with advancements in printing technology during the early Industrial Revolution, as steam-powered presses, such as Friedrich Koenig's 1814 model, enabled mass production of uniform texts, necessitating consistent formats for efficiency and scalability.[10][16]
In the post-1950s computing era, the concept evolved into digital data standards, with early systems like magnetic tape and core memory employing structured formats for information storage and retrieval, as seen in generalized file maintenance programs developed by user groups such as SHARE.[17][18] The Industrial Revolution's emphasis on standardization profoundly influenced the term's development, as mass production in printing and related fields demanded interchangeable parts and uniform specifications to achieve economies of scale, setting precedents for later applications in media and technology.[19]
Publishing and Visual Media
Print and Document Formats
Print and document formats encompass standardized dimensions and layouts used in traditional publishing and document production, ensuring consistency, efficiency, and compatibility across various media. These formats have evolved from early bookbinding techniques to modern international standards, facilitating the production of books, newspapers, and official documents. The foundational developments trace back to the mid-15th century with Johannes Gutenberg's invention of the movable-type printing press around 1450, which enabled mass production of texts in large formats like the folio used for the Gutenberg Bible, a two-volume work measuring approximately 11.5 by 15.5 inches per page.[20] This innovation marked the shift from handwritten manuscripts to printed books, standardizing page sizes based on folding large sheets of paper. Over centuries, printing technology progressed through hand-press methods to steam-powered rotary presses in the 19th century and eventually to offset lithography in the early 20th century, which allowed for high-volume reproduction on standardized paper sizes while maintaining the folding logic of earlier formats.[21] In bookbinding history, traditional formats such as folio, quarto, and octavo defined book sizes by the number of folds applied to a printed sheet. A folio involved a single fold, yielding two leaves (four pages) and resulting in books taller than 13 inches, often used for grand works like Bibles or atlases due to their imposing scale.[22] A quarto, folded twice to produce four leaves (eight pages), measured about 10 to 13 inches tall and became common for literature and pamphlets after Gutenberg's era, balancing readability with portability.[23] The octavo, folded three times for eight leaves (16 pages) and typically 8 to 10 inches tall, dominated later print runs for novels and reference books, optimizing paper use in an age when sheets were handmade and costly. 
These formats persisted into the industrial era, influencing modern printing until superseded by metric standards.[22]
The contemporary global benchmark for print formats is the ISO 216 standard, established in 1975 and revised in 2007, which defines trimmed sizes for writing paper and printed matter in administrative, commercial, and technical applications.[24] It features three primary series (A, B, and C), all sharing an aspect ratio of $1 : \sqrt{2}$ (approximately 1:1.414), derived from the geometric principle that halving a sheet with a fold parallel to its shorter side preserves the ratio, enabling seamless scaling without distortion.[25] The A-series, the most widely adopted, starts with A0 at 841 × 1189 mm (area of 1 square meter), and each subsequent size has half the area of the one before it; for instance, A4 measures 210 × 297 mm and serves as the default for everyday documents worldwide.[26] The B-series provides intermediate sizes, with B0 at 1000 × 1414 mm, ideal for posters and books requiring proportions between A sizes, as the dimensions of each B size are the geometric mean of those of the same-numbered A size and the next larger A size.[27] The C-series, used primarily for envelopes, fits A-series sheets; C4, for example, accommodates an unfolded A4 at 229 × 324 mm.[25] This folding logic ensures that an A0 sheet yields two A1s, four A2s, and so on, promoting efficient paper utilization in printing workflows.[26]
These standards find broad applications in publishing and documentation, enhancing practicality and uniformity. 
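The halving rule behind the A-series can be sketched in a few lines of Python. This is a minimal illustration, assuming the A0 dimensions quoted above and rounding down to whole millimetres at each step; the function name `a_series` is hypothetical and not part of any standard library.

```python
import math


def a_series(max_n: int = 10) -> dict[str, tuple[int, int]]:
    """Return A-series sizes (width, height) in mm by repeatedly halving A0."""
    sizes = {}
    width, height = 841, 1189  # A0, as quoted in the text
    for n in range(max_n + 1):
        sizes[f"A{n}"] = (width, height)
        # Halve the longer side (rounding down); the old width becomes the new height.
        width, height = height // 2, width
    return sizes


if __name__ == "__main__":
    for name, (w, h) in a_series(6).items():
        # The ratio stays close to sqrt(2) at every size.
        print(f"{name}: {w} x {h} mm, ratio {h / w:.4f} (sqrt(2) = {math.sqrt(2):.4f})")
```

Because each step only swaps the sides and halves one of them, the ratio of long side to short side stays near $\sqrt{2}$ throughout, which is why enlarging or reducing between A sizes does not distort a page.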
In book production, A-series sizes like A5 for paperbacks and B-series for illustrated volumes allow precise trimming and binding while minimizing waste.[28] Newspapers often employ larger formats such as A1 or B1 sheets for broadsheets, folded to A2 or A3 for readability, supporting high-circulation runs via offset printing.[29] Legal documents favor A4 for contracts and filings due to its compatibility with international filing systems and photocopying, where the $\sqrt{2}$ ratio prevents proportion loss during enlargement or reduction.[30] Overall, ISO 216's advantages include superior paper efficiency compared to non-scaling formats and wide global interoperability, adopted by most countries worldwide, though with notable exceptions such as the United States and Canada.[29][24]
Photographic and Visual Formats
Photographic formats encompass the physical and digital standards used for capturing, processing, and presenting still images, influencing everything from artistic expression to commercial applications. The Daguerreotype, introduced in 1839 by Louis Daguerre, marked the first commercially viable photographic process, producing unique positive images on silver-plated copper plates that revolutionized portraiture and early visual documentation.[31] This format's fine detail and permanence laid the groundwork for photography's integration into art, where it enabled precise rendering of subjects, and into advertising, fostering the visual promotion of products through lifelike representations. By the mid-20th century, the Polaroid instant format, launched in 1948 by Edwin Land, democratized image production by allowing immediate development, which expedited creative workflows in both artistic experimentation and rapid advertising campaigns. In analog photography, film formats vary by gauge and frame size, directly affecting aspect ratios and resolution. 
The 35mm format, standardized with a 24×36 mm frame, yields a 3:2 aspect ratio that became ubiquitous for its portability and balance of detail, suitable for journalistic and artistic photography while supporting enlargements up to moderate sizes without excessive grain.[32] Medium format, often using a 6×6 cm square frame on 120 film, provides a 1:1 aspect ratio and approximately four times the area of 35mm, resulting in superior resolution and tonal gradation that enhances depth in landscape and portrait art, as well as sharper reproductions in advertising layouts.[33] Large format cameras, employing sheets like 4×5 inches with a typical 4:5 aspect ratio, offer even greater film area—about 15 times that of 35mm—enabling exceptional resolution and control over perspective, which has been instrumental in fine art prints and high-end commercial visuals requiring intricate detail.[34] These larger formats' resolution advantages stem from the increased negative size, allowing finer grain structure and reduced enlargement needs, thereby preserving visual fidelity in outputs for artistic exhibitions and promotional materials.[35] Digital visual standards have extended these principles into computational realms, focusing on compression and display for maintaining image quality. The JPEG format, defined by ISO/IEC 10918, utilizes discrete cosine transform-based lossy compression to approximate human visual perception, minimizing artifacts in photographs while enabling efficient storage and transmission for web-based art portfolios and advertising imagery. In contrast, the TIFF format supports lossless compression and multi-layer capabilities, preserving exact pixel data essential for professional retouching in artistic works and color-accurate ad proofs where subtle tonal variations are critical. 
For display, 4K resolution at 3840×2160 pixels represents a contemporary benchmark, quadrupling the pixel count of 1080p to deliver immersive detail on screens, enhancing the viewing of photographic art in galleries and dynamic advertising on digital billboards.[36]
Throughout history, evolving formats from Daguerreotype to digital have empowered photography's dual role in art (evoking emotion through manipulated visuals, as in pictorialism) and in advertising, where precise formats drove commercial innovation by the 1930s.[37] These standards often align with print sizes for seamless output to physical media.
Computing and Technology
File and Data Formats
File and data formats encompass standardized methods for encoding, storing, and exchanging digital information in computing environments, ensuring compatibility across systems and applications. At their core, these formats rely on binary encoding, where data is represented as sequences of 0s and 1s to facilitate machine-readable storage and transmission. Metadata is typically embedded within the format to describe attributes such as file creation date, author, version, or structural elements, enabling efficient retrieval and processing without altering the primary content. Compression techniques are integral to many formats, with lossless methods like DEFLATE preserving all original data for exact reconstruction, while lossy approaches, such as those used in multimedia, discard perceptually insignificant details to achieve smaller file sizes at the cost of minor quality reduction. Representative examples illustrate these principles: the Portable Document Format (PDF), standardized as ISO 32000-1:2008, encodes documents in a binary structure that includes text, images, and vector graphics, along with metadata for security and accessibility features, often incorporating lossless compression for fidelity in professional printing and archiving.[38] Similarly, the MP4 format, defined by ISO/IEC 14496-14:2020, serves as a container for multimedia data, using binary encoding to bundle audio, video, and subtitles, with support for both lossless and lossy compression to optimize streaming and storage.[39] Standards bodies play a pivotal role in defining these formats to promote consistency and adoption. The Internet Engineering Task Force (IETF) focuses on network-oriented data formats, producing specifications like RFC 8259 for JavaScript Object Notation (JSON), which standardizes a lightweight, text-based format for structured data exchange over the web. 
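As a rough illustration of two ideas from this section, a text-based structured format (JSON) and lossless compression (DEFLATE), the following Python sketch serializes a small record, compresses it, and recovers it exactly. The record contents and the function names `encode_record` and `decode_record` are hypothetical; the sketch uses Python's standard `json` and `zlib` modules, and `zlib` wraps the DEFLATE stream in a small header and checksum.

```python
import json
import zlib


def encode_record(record: dict) -> bytes:
    """Serialize a record as UTF-8 JSON, then compress it losslessly."""
    text = json.dumps(record, sort_keys=True)
    return zlib.compress(text.encode("utf-8"), 9)  # 9 = strongest compression level


def decode_record(blob: bytes) -> dict:
    """Reverse encode_record: decompress, then parse the JSON text."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical record mixing content with descriptive metadata.
    record = {
        "metadata": {"author": "A. Writer", "version": 1},
        "body": ["the same line of text"] * 100,
    }
    blob = encode_record(record)
    plain_size = len(json.dumps(record, sort_keys=True).encode("utf-8"))
    print(f"uncompressed: {plain_size} bytes, compressed: {len(blob)} bytes")
    # Lossless: the decoded record equals the original.
    print("round trip exact:", decode_record(blob) == record)
```

A lossy codec, by contrast, would not pass the round-trip equality check, since it deliberately discards detail to save space.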
The International Organization for Standardization (ISO), often in collaboration with the International Electrotechnical Commission (IEC), develops broader multimedia and document standards, such as ISO/IEC 26300 for the Open Document Format (ODF). Interoperability challenges persist even among standardized formats; for instance, both ODF (ISO/IEC 26300) and Office Open XML (ISO/IEC 29500, used in DOCX) are standardized, but implementations may introduce differences in feature support and formatting across applications.
The evolution of file and data formats reflects advancements in computing hardware and software needs. In the 1890s, punched cards emerged as one of the earliest data formats, developed by Herman Hollerith for the U.S. Census Bureau to encode tabular information via holes punched in stiff paper cards, building on earlier punched card concepts.[40] By the late 20th century, structured text-based formats gained prominence; Extensible Markup Language (XML), recommended by the World Wide Web Consortium (W3C) in 1998 as a subset of ISO 8879 SGML, introduced hierarchical tagging for self-describing documents and was widely adopted for configuration files and web services.[41] JSON, developed by Douglas Crockford in the early 2000s as a simpler alternative to XML for JavaScript-driven applications, evolved into an IETF standard via RFC 8259 in 2017, favoring key-value pairs for efficient API data interchange. Recent trends include WebAssembly (Wasm), which reached minimum viable product status in 2017 under W3C auspices and has continued to evolve, with version 3.0 released in September 2025. It provides a binary instruction format for high-performance web applications compiled from languages like C++ and Rust, bridging the gap between native code and browser environments.[42][43]
Storage and Disk Formatting
Storage and disk formatting refers to the processes used to prepare physical and virtual storage media for data storage in computing systems. These procedures initialize the media's structure, enabling the operating system to manage files effectively. Historically, formatting concepts emerged with magnetic tape drives in the 1950s, such as the IBM 726 system introduced in 1952, which required initialization of tracks for sequential data recording on oxide-coated tapes.[18] The transition to hard disk drives began in 1956 with the IBM RAMAC 305, the first commercial HDD, where low-level formatting defined magnetic sectors on rotating platters to allow random access storage.[18] By the 1960s, removable disk packs like the IBM 1311 necessitated partitioning during formatting to organize multiple logical volumes on a single physical device.[18] The rise of personal computing in the 1980s made high-level formatting accessible via operating systems, setting the stage for standardized file systems on consumer media.[44] Formatting types are broadly categorized as low-level and high-level. Low-level formatting, often performed at the factory, physically divides the disk into tracks, sectors, and cylinders while initializing servo information for precise head positioning on HDDs.[45] This step establishes the basic geometry of the media, such as 512-byte sectors, preparing it for higher-level operations. 
High-level formatting, conducted by the operating system, builds the logical structure atop this foundation, including partition tables (e.g., Master Boot Record or GUID Partition Table) and file systems like FAT32 or NTFS.[46] For FAT32, this involves writing the boot sector, initializing the file allocation table (FAT) with empty cluster entries to track file locations, and creating the root directory.[47] NTFS formatting similarly initializes the master file table (MFT) and security descriptors.[46] Within high-level formatting, a quick format rapidly erases the existing file allocation table and root directory without scanning for defects, while a full format additionally verifies sectors by writing and reading test patterns to identify and mark bad areas.[48]
The formatting process has significant implications for data destruction. Quick formatting merely removes references in the file allocation table, leaving actual data intact and recoverable with forensic tools.[48] Full formatting overwrites sectors with a pattern (e.g., zeros), providing moderate protection by scanning and reinitializing the entire surface, but it does not guarantee irrecoverability due to potential remapping of bad sectors.[49] For secure erasure, overwriting methods, such as a single pass of fixed data for HDDs, are recommended over standard formatting, as they replace user data in addressable areas, though multiple passes may be used for higher assurance.[49] On SSDs, introduced commercially in the late 1990s and popularized in the 2000s, secure erase commands (e.g., ATA Secure Erase) are preferred, as they invoke the drive's internal mechanisms to purge all cells, including those affected by wear leveling and over-provisioning, bypassing the limitations of overwriting.[49] In DOS and Windows environments, the FORMAT command facilitates these high-level operations, allowing users to specify the file system (e.g., FORMAT D: /FS:NTFS) and quick mode (/Q) for efficiency on verified media.[48]
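The difference between quick and full formatting can be illustrated with a toy in-memory model. This is only a sketch under simplifying assumptions: the `ToyDisk` class, its allocation table, and its method names are hypothetical and do not reflect the real on-disk layout of FAT32 or NTFS.

```python
SECTOR_SIZE = 512  # bytes per sector, matching the common size mentioned above


class ToyDisk:
    """In-memory stand-in for a disk: an allocation table plus raw sectors."""

    def __init__(self, sector_count: int) -> None:
        self.sectors = [bytes(SECTOR_SIZE) for _ in range(sector_count)]
        self.table: dict[str, list[int]] = {}  # file name -> sector indices

    def write_file(self, name: str, payload: bytes) -> None:
        used = {i for indices in self.table.values() for i in indices}
        free = [i for i in range(len(self.sectors)) if i not in used]
        chunks = [payload[i:i + SECTOR_SIZE] for i in range(0, len(payload), SECTOR_SIZE)]
        if len(chunks) > len(free):
            raise OSError("toy disk is full")
        indices = free[:len(chunks)]
        for index, chunk in zip(indices, chunks):
            self.sectors[index] = chunk.ljust(SECTOR_SIZE, b"\x00")
        self.table[name] = indices

    def quick_format(self) -> None:
        """Reset only the allocation table; sector contents are left untouched."""
        self.table = {}

    def full_format(self) -> None:
        """Reset the table and overwrite every sector with zeros."""
        self.table = {}
        self.sectors = [bytes(SECTOR_SIZE) for _ in self.sectors]

    def raw_scan(self, needle: bytes) -> bool:
        """Mimic a forensic tool that ignores the table and reads raw sectors."""
        return any(needle in sector for sector in self.sectors)


if __name__ == "__main__":
    disk = ToyDisk(sector_count=8)
    disk.write_file("note.txt", b"confidential report")
    disk.quick_format()
    print("after quick format:", disk.table, disk.raw_scan(b"confidential"))
    disk.full_format()
    print("after full format:", disk.table, disk.raw_scan(b"confidential"))
```

In this model the quick format empties the table while the payload remains findable by a raw scan, whereas the full format also zeroes the sectors. Real drives add complications the model ignores, such as remapped bad sectors and SSD wear leveling.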
Modern challenges in formatting arose with solid-state drives (SSDs) in the 2000s, where flash memory's erase-block architecture differs from magnetic media. Traditional full formatting can accelerate wear on NAND cells without effectively reclaiming space, leading to the adoption of TRIM in 2009 as part of the ATA specification.[50] TRIM enables the operating system to inform the SSD of deleted blocks during or after formatting, allowing immediate garbage collection to erase invalid data and maintain write performance.[50] This preparation ensures the media's compatibility with file formats by optimizing block allocation for efficient data placement.[50]