
Filename extension

A filename extension, commonly called a file extension, is a suffix added to the end of a computer file's name, typically following a period (.), that indicates the file's format, type, or the application intended to handle it.^[1] These extensions are often three or four characters long (e.g., .txt for plain text or .jpg for JPEG images). They let operating systems associate files with specific programs, choose which icons to display, and process the data appropriately.^[2]

The use of filename extensions became widespread with early microcomputer operating systems, notably CP/M (Control Program for Microcomputers), developed in 1974. There, three-letter suffixes such as .BAS for BASIC source code helped categorize files within the limited 8.3 naming convention (eight characters for the name, three for the extension).^[3] This approach was carried over into MS-DOS (1981) and later Windows versions, where extensions became integral to file type identification and a core feature of the FAT file system.^[3] In contrast, Unix (introduced in the 1970s) and Unix-like systems such as Linux and macOS do not enforce or rely on extensions at the kernel level. Instead, file handling depends on content inspection (e.g., magic numbers or shebang lines) or user-defined associations, although extensions remain a widespread convention for readability and application compatibility.^[4]

Filename extensions play a critical role in cross-platform interoperability, software development, and user workflows. They also pose security risks: phishing attacks, for example, use misleading double extensions (e.g., .txt.exe) that exploit interfaces that hide known extensions by default.^[5] Common extensions have evolved with technology, from legacy formats like .doc (Microsoft Word before 2007) to modern ones like .docx (ZIP-compressed XML), reflecting shifts toward open standards and compression.^[2] Although extensions are optional in many systems, renaming a file's extension does not change its underlying format and may prevent it from opening properly without manual intervention.^[2]

Fundamentals

Definition and Purpose

A filename extension, also known as a file extension, is a suffix appended to the end of a filename. It typically consists of a period followed by a short string of characters, usually one to four letters or digits, such as ".txt" in the filename "document.txt".^[6]^[7] This suffix is a conventional indicator of the file's type or format that helps both users and software recognize and handle the file appropriately.^[2] Filename extensions serve three main purposes:

- They help operating systems and applications identify a file's format and choose the software for opening, editing, or processing it.
- They make it easier to organize and sort files by type within directories.
- They signal a file's intended use to users at a glance through standardized conventions.^[2]^[8]

Extensions also promote interoperability across computing environments, because the type information travels with the name even when other metadata is lost during transfer.^[8] Common examples illustrate these roles:

- ".jpg" denotes Joint Photographic Experts Group (JPEG) files, compressed raster images suited to photographs and handled by image viewers and editors.
- ".exe" identifies executable program files that Windows can run directly.
- ".pdf" signifies Portable Document Format files, which preserve layout, fonts, and images consistently across platforms and devices.^[2]^[9]

The part of the filename before the final period and extension is called the stem or base name, and it distinguishes the file from others of the same type.^[6]
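Python's standard pathlib module offers one concrete way to see the stem/extension split described above. The sketch below is illustrative only. describe_name is a hypothetical helper, and other languages and tools may draw the boundaries slightly differently. pathlib, for instance, treats a leading-dot name such as .bashrc as having no extension at all.

```python
from pathlib import PurePosixPath


def describe_name(filename: str) -> dict:
    """Split a filename into its stem and extension(s) (illustrative helper)."""
    p = PurePosixPath(filename)
    return {
        "stem": p.stem,          # everything before the final extension
        "suffix": p.suffix,      # final extension, including the dot ("" if none)
        "suffixes": p.suffixes,  # every dot-separated extension, left to right
    }


for name in ["document.txt", "archive.tar.gz", "README", ".bashrc"]:
    print(name, describe_name(name))
```

For archive.tar.gz, the stem is archive.tar and the final suffix is .gz, which mirrors the "last dot wins" convention. The suffixes list exposes both layers.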

Historical Development

The concept of filename extensions traces its roots to the time-sharing systems of the 1960s. In MIT's Compatible Time-Sharing System (CTSS), first demonstrated in 1961 on an IBM 709, each user file had two names. The primary name was up to six characters long, and a secondary name of similar length described the file's type or processing requirements, such as "FAP" for assembly language source or "DATA" for data files.^[10] These secondary names were precursors to modern extensions and helped the system decide how to handle files, although they were stored as a separate field rather than written after a dot. By the mid-1960s, Digital Equipment Corporation (DEC) had taken the idea further. Its PDP-6 multiprogramming monitor, released in 1964, explicitly used "filename extensions" separated by a dot (e.g., filename.ext) to denote file types and directly influenced later designs.^[11] The convention carried over to DEC's PDP-8 and other minicomputers, where extensions distinguished executables, source code, and data.

Control Program for Microcomputers (CP/M) was developed by Gary Kildall in 1974, with version 1.4 following in 1978. It formalized the 8.3 filename format (eight characters for the base name and three for the extension), drawing on DEC conventions to fit hardware constraints such as limited directory space on 8-inch floppy disks.^[13] In this structure, extensions like .COM for executables or .ASM for assembly code indicated file types, and the format became a de facto standard for microcomputers. As personal computing grew in the late 1970s, applications built on these conventions. The word processor WordStar (1978), for example, adopted extensions such as .WS for its proprietary format.^[12] The spread of personal systems made extensions more important still, because they let users quickly identify a file's purpose as the range of available software grew.
Microsoft Disk Operating System (MS-DOS), launched in 1981 as an adaptation of 86-DOS for the IBM PC, directly adopted CP/M's 8.3 format and embedded it deeply in personal computing.^[14] The 8.3 limit persisted in mainstream Windows until 1995, when Windows 95 introduced long filenames through VFAT. VFAT supports names of up to 255 characters while keeping 8.3 short names for backward compatibility.^[15] Unix-like systems of the 1970s, such as Version 7 Unix (1979), took a different approach and did not enforce extensions. They treated filenames as arbitrary strings of up to 14 characters with no inherent type semantics. File types were determined instead by content inspection with the file command or, for executables, by the shebang mechanism (#!), which Dennis Ritchie introduced around 1979-1980 to name an interpreter such as /bin/sh.^[16] This approach carried into Linux (1991) and macOS (2001, based on NeXTSTEP), where extensions remain optional conventions rather than OS requirements, though applications often rely on them for usability.

By the 1990s, computing was shifting toward networking and multimedia, and extensions helped with cross-platform compatibility. Media types (MIME types), defined in RFCs such as RFC 2046 (1996) and registered with the Internet Assigned Numbers Authority (IANA), came to be informally associated with extensions, which helped web browsers handle content. In the 2000s, embedded metadata gained prominence as a richer means of identification, partly displacing extensions. The Exchangeable Image File Format (Exif) was first released in 1995 (version 1.0) by the Japan Electronic Industries Development Association (JEIDA, a predecessor of JEITA) and was widely adopted in digital cameras by 2002. Exif embeds camera settings, timestamps, and GPS data directly in JPEG and TIFF files, reducing reliance on extensions alone for image processing.^[17] The same trend extended to other formats, favoring details embedded in the content over filename suffixes for more robust identification in professional and archival contexts.

Technical Implementation

File System and OS Support

All major file systems store extensions as part of the filename itself, as a suffix after a period (.) that denotes the file type, though length limits and enforcement vary.

In the File Allocation Table (FAT) file system, commonly used for removable media and legacy Windows installations, classic short names follow the 8.3 convention. The base name can be up to 8 characters and the extension up to 3, for 11 characters in total plus the implied dot.^[6] This format preserves backward compatibility with MS-DOS. The VFAT extension adds long filenames, stored separately in Unicode alongside a short 8.3 alias.^[6] The New Technology File System (NTFS), the Windows default, supports names of up to 255 characters in total (including the extension), stored in Unicode without a rigid 8.3 constraint, so an extension can be any length within the overall limit.^[6]^[18]

Linux's ext4 file system treats filenames, including extensions, as arbitrary byte strings (conventionally UTF-8, excluding only "/" and the null byte) stored in directory entries, with a maximum of 255 bytes for the entire name.^[19] These entries, laid out as struct ext4_dir_entry_2, hold the full filename in a name field. The extension follows the conventional dot separator but is not parsed separately by the file system.^[19] Apple File System (APFS), the default for macOS and iOS, accommodates filenames of up to 255 UTF-8 characters and likewise treats the extension as part of the name with no separate length restriction.^[20] Its predecessor, Hierarchical File System Plus (HFS+), supports up to 255 UTF-16 code units per filename and also handles extensions as ordinary parts of the name.^[20]^[21]

Operating systems also differ in how much they rely on filename extensions, which shapes their role in file handling.
Windows integrates extensions deeply into its ecosystem. It uses them to determine default application associations, and the registry maps extensions like .docx to programs such as Microsoft Word.^[22] Because of this reliance, extensions are essential to user interaction in File Explorer, where they trigger type-specific behaviors.^[2] macOS derives Uniform Type Identifiers (UTIs) primarily from filename extensions, and Launch Services uses those UTIs to choose an application. Finder nonetheless hides extensions by default to simplify the interface, and users can show them globally or per file.^[23]^[24] On Linux, the kernel treats extensions as mere conventions. Utilities such as the file command identify types from magic numbers (unique byte sequences at the start of a file), so identification works even without an extension, and desktop environments combine extension patterns with these content checks.^[25]^[26]

Cross-platform file transfers introduce challenges because conventions around dots, case, and visibility differ. In Unix-like systems (including Linux and macOS), filenames starting with a dot (e.g., .bashrc) are hidden by default in directory listings. Windows tools may instead treat such a name as an extension with an empty base name, which can confuse users and lead to unintended modifications or access issues.^[27] Most Unix file systems are case-sensitive (File.txt and file.txt are different files), whereas NTFS is case-insensitive by default, so files can be overwritten or go missing when data moves between systems.^[28] Differing path separators (/ on Unix, \ on Windows) also complicate scripting. Extensions themselves remain portable as dot-suffixed strings as long as names fit each system's constraints.^[28] Mobile ecosystems show these dynamics as well.
On Android, removable storage such as SD cards and USB drives is often formatted as FAT32 (with VFAT long filenames) or exFAT. Long names and extensions are therefore supported, but FAT's case-insensitivity and reserved characters still apply. Internal storage (using ext4 or F2FS) follows Linux conventions.^[6] In macOS and iOS, Finder and the Files app hide many extensions by default to reduce clutter. On macOS, users can show or hide an individual file's extension through Get Info or show all extensions in Finder's settings, and the system still uses the extension for type resolution either way.^[23]^[24] These behaviors make cross-platform containers such as ZIP archives valuable for transfers, because they store full filenames, extensions included.^[28]
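One practical check before moving files from a case-sensitive file system such as ext4 to a case-insensitive one is to look for names that differ only by case. NTFS and APFS in its default configuration are both case-insensitive. The following Python sketch is a hypothetical helper, not part of any particular tool. It uses str.casefold() as an approximation of the case mapping a real file system applies.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Group names that differ only by letter case.

    Such names can coexist on a case-sensitive file system but would refer
    to the same file on a case-insensitive one. casefold() only approximates
    the case-mapping tables real file systems use.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


files = ["File.txt", "file.txt", "report.PDF", "notes.md", "Report.pdf"]
print(find_case_collisions(files))
```

Here File.txt/file.txt and Report.pdf/report.PDF are reported, because each pair would collapse into a single file on NTFS.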

Syntax and Conventions

Filename extensions are conventionally placed immediately after the last period (dot) in a filename. The suffix appended to the base name denotes the file type, as in "document.txt", where "txt" is the extension.^[6] This structure is a general convention across most file systems, though the interpretation of the extension varies. In practice, the extension follows the base name with no spaces or separators other than the dot.^[6]

Case sensitivity for extensions depends on the underlying operating system and file system. Most Linux file systems are case-sensitive, so "file.TXT" and "file.txt" are distinct files, and a strong convention favors lowercase extensions for consistency and portability.^[29] Windows file systems like NTFS, by contrast, are case-preserving but case-insensitive by default. "file.txt" and "FILE.TXT" therefore refer to the same file, and mixed case is common in practice.^[6]

Extensions typically consist of 1 to 5 alphanumeric characters, though modern file systems impose no separate limit on their length beyond the overall filename limit of around 255 characters.^[6] Extensions usually contain only letters (a-z, A-Z) and digits (0-9), with occasional symbols in specific contexts. For portability, names should avoid the characters Windows reserves: forward slash (/), backslash (\), colon (:), asterisk (*), question mark (?), double quote ("), less than (<), greater than (>), and pipe (|). Unix-like systems forbid only the forward slash and the null character, but names using the other characters cause errors when copied to Windows.^[30] Compound extensions, like ".tar.gz" for gzip-compressed tar archives, arise when a name contains multiple dots. The portion after the final dot is treated as the primary extension, and the earlier parts become part of the base name.^[31]

The three-letter extension standard was popularized by the 8.3 filename format of CP/M and MS-DOS, which limited base names to 8 characters and extensions to 3. Legacy formats such as ".doc" for documents and ".xls" for spreadsheets reflect this limit.^[32] Modern conventions have moved beyond the DOS restrictions to include longer or multi-part extensions, such as ".docx" for Word documents or ".tar.gz" for compressed archives, accommodating more complex file types.^[6]

International variations in filename extensions depend on character encoding support. Modern Unicode-aware file systems (UTF-8 byte strings on typical Linux systems, UTF-16 on NTFS) allow Unicode characters in extensions, including non-Latin scripts such as Cyrillic or Chinese (Hanzi), for global compatibility.^[33] Legacy systems from the early Unix and DOS eras, however, assumed 7-bit ASCII or a single-byte code page. Non-Latin characters could therefore be garbled or rejected in cross-platform transfers.^[34]
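The naming rules above can be turned into simple checks. This Python sketch is a simplified, hypothetical validator. It assumes a conservative character set for 8.3 short names. Real FAT accepts a few more punctuation characters and has further rules, such as reserved device names like CON and NUL, which are not checked here. The sketch treats Windows' reserved characters plus control characters as non-portable.

```python
import re

# Characters Windows forbids in names; Unix-like systems forbid only "/" and NUL.
WINDOWS_RESERVED = set('<>:"/\\|?*')

# Simplified 8.3 rule: 1-8 base-name characters, optionally a dot and 1-3
# extension characters. Only A-Z, 0-9, "_" and "-" are accepted here,
# which is stricter than real FAT.
SHORT_NAME = re.compile(r"[A-Z0-9_-]{1,8}(?:\.[A-Z0-9_-]{1,3})?")


def fits_8_3(name: str) -> bool:
    """Return True if name fits the simplified 8.3 short-name pattern."""
    return SHORT_NAME.fullmatch(name.upper()) is not None


def is_portable(name: str) -> bool:
    """Return True if name avoids Windows-reserved and control characters."""
    return not any(ch in WINDOWS_RESERVED or ord(ch) < 32 for ch in name)


for name in ["REPORT.DOC", "report.docx", "archive.tar.gz", "what?.txt"]:
    print(name, fits_8_3(name), is_portable(name))
```

report.docx fails the 8.3 check only because its extension has four characters, and archive.tar.gz fails because 8.3 allows a single dot. Both names are still portable.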

File Identification

Role in Determining Content Type

Filename extensions play a crucial role in helping operating systems and applications identify what a file contains and select the appropriate software to handle it. When a user or program interacts with a file, the extension serves as a quick indicator that triggers a lookup of associated parsers, viewers, or default applications in system configuration. A file ending in .mp3, for instance, is typically mapped to an audio player, so the system launches media software automatically when the file is double-clicked.^[35]^[36]

In Microsoft Windows, this mapping lives primarily in the Windows Registry. The HKEY_CLASSES_ROOT key stores associations between extensions and programmatic identifiers (ProgIDs). Each extension, such as .txt, is linked to a ProgID (e.g., txtfile) that defines the file type, default actions such as opening with Notepad, and an equivalent MIME type for interoperability. This registry view merges user-specific settings from HKEY_CURRENT_USER\Software\Classes with system-wide ones from HKEY_LOCAL_MACHINE\Software\Classes, and per-user entries take precedence. Applications register their supported extensions during installation to establish these links.^[35]^[37]

On Linux and other Unix-like systems, extensions are mapped to MIME types through configuration such as /etc/mime.types, where each line lists a media type followed by its extensions. An entry such as "audio/mpeg mp3", for example, tells programs that read the file to treat .mp3 files as MPEG audio. Desktop entry files in /usr/share/applications declare which MIME types each application can open.
Desktop environments such as GNOME and KDE rely on the freedesktop.org shared-mime-info database to determine default handlers. The database combines filename patterns like *.mp3 with content checks.^[36]^[38]

Despite their utility, filename extensions have inherent limitations as the sole basis for determining content type. They are user-assigned and easily changed, so the extension and the actual contents can disagree. Renaming a malicious executable from .exe to .txt, for example, could let it slip past a filter that examines only the extension. Systems and applications therefore often supplement extensions with internal file signatures, known as "magic numbers". These are byte patterns in the file's header that identify formats regardless of the name. The Unix file command, for example, relies on such magic tests rather than on extensions.^[39]^[40]

Practical examples illustrate this role and its caveats. Image viewers, such as those built into Windows or GIMP on Linux, may use a .png extension to choose a PNG parser but can fall back to signature checks when the content does not match, avoiding errors with corrupted or disguised files. Similarly, web browsers opening local files, which carry no HTTP Content-Type header, infer the type from the extension, for example treating .js files as JavaScript.^[35]
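A minimal Python sketch can illustrate checking an extension against a file signature. Python's standard mimetypes.guess_type() maps the name's extension to a MIME type, using built-in tables plus system files such as /etc/mime.types where present. A hypothetical sniff() helper then compares the first bytes against a small, hand-picked set of well-known signatures. The MIME labels for the two executable formats are illustrative choices. Real tools such as file use far larger signature databases, and very short signatures like the two-byte MZ can produce false positives.

```python
import mimetypes

# A small, illustrative table of file signatures ("magic numbers").
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
    (b"\x7fELF", "application/x-executable"),  # ELF binary (illustrative label)
    (b"MZ", "application/x-msdownload"),       # DOS/Windows executable (illustrative label)
]


def sniff(data: bytes):
    """Return the MIME type suggested by the file's leading bytes, or None."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return None


def check_file(name: str, data: bytes) -> str:
    """Compare the type implied by the extension with the type implied by content."""
    claimed, _encoding = mimetypes.guess_type(name)
    actual = sniff(data)
    if actual is None:
        return "unknown content"
    if claimed == actual:
        return "match"
    return f"mismatch: name says {claimed}, content looks like {actual}"


print(check_file("photo.png", b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))
print(check_file("notes.txt", b"MZ\x90\x00\x03\x00"))
```

The second call shows the renamed-executable case from the text. The name suggests plain text, but the MZ header reveals a DOS/Windows executable.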

Comparison to MIME Types

MIME types, formally known as media types, are standardized identifiers that specify the nature and format of a file or data stream in internet protocols such as email and the web. They consist of a top-level type and a subtype separated by a slash, such as text/plain for plain text or image/jpeg for JPEG images. The Internet Engineering Task Force (IETF) defined them in RFC 2045 and RFC 2046, published in 1996.^[41] They can include parameters, such as charset=utf-8 for character encoding, which allow precise handling of content across diverse systems.^[41]

Filename extensions and MIME types both identify file content for appropriate processing, but they differ in scope and reliability. Extensions operate at the filesystem level as informal, human-readable suffixes (e.g., .txt conventionally maps to text/plain). They have no central authority and rely on operating system or application conventions.^[42] MIME types, by contrast, are protocol-oriented standards with a registry maintained by IANA, designed for network transmission. Their type/subtype structure and parameters give explicit, machine-readable information about the content and how to handle it.^[41] MIME types are therefore more robust for interoperability in distributed environments, while extensions are simpler but prone to ambiguity because of their ad hoc nature. In practice, the two systems interact through mappings that bridge filesystem and protocol contexts.
Web servers like Apache HTTP Server use modules such as mod_mime to derive MIME types from filename extensions when serving content. These modules consult configuration files that associate suffixes like .html with text/html.^[43] Web clients and browsers likewise infer MIME types from extensions when handling downloads, falling back to operating system mappings if the server does not send a Content-Type header. This keeps file associations consistent but can propagate errors when the extension is misleading.^[42]

Filename extensions offer simplicity for local file management. They are also error-prone, because they can easily be altered or omitted, which leads to incorrect interpretation unless the content is inspected. MIME types are more precise and standardized and give consistent behavior across protocols, but they depend on correct server configuration and can still be misapplied. XHTML documents saved as .html, for example, are usually served as text/html rather than application/xhtml+xml. Browsers then parse them with the lenient HTML parser instead of as XML, and XML-specific features may not work as intended.^[44] Overall, MIME types prioritize accuracy in networked scenarios, whereas extensions suffice for basic, informal identification but risk mismatches without additional validation.^[45]
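Apache's mod_mime reads its extension-to-type table from a file in the traditional mime.types format, set with the TypesConfig directive. Each line of that file names a media type followed by its extensions. The Python sketch below parses that format from a small hand-written sample, which is not a copy of any real system file, and picks a Content-Type from the final extension. It is a simplification. Real mod_mime considers every extension in a name, not just the last, and supports further directives such as AddType.

```python
# Hand-written sample in mime.types format: media type, then extensions.
SAMPLE_MIME_TYPES = """
# MIME type          extensions
text/html            html htm
text/plain           txt
image/jpeg           jpeg jpg jpe
application/gzip     gz
application/x-tar    tar
"""


def parse_mime_types(text: str) -> dict:
    """Build an extension -> MIME type map from mime.types-style lines."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        mime, *extensions = line.split()
        for ext in extensions:
            table[ext.lower()] = mime
    return table


def content_type_for(filename: str, table: dict,
                     default: str = "application/octet-stream") -> str:
    """Choose a Content-Type from the final extension only (a simplification)."""
    _base, dot, ext = filename.rpartition(".")
    if not dot:
        return default  # no extension at all
    return table.get(ext.lower(), default)


table = parse_mime_types(SAMPLE_MIME_TYPES)
for name in ["index.html", "PHOTO.JPG", "backup.tar.gz", "README"]:
    print(name, "->", content_type_for(name, table))
```

Because only the last extension is consulted, backup.tar.gz is labeled application/gzip, while a name with no extension falls back to the generic application/octet-stream.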

Applications and Special Uses

Executable Files

Filename extensions play a crucial role in identifying executable files, which are programs designed to be run directly by an operating system or interpreter. On Windows, common executable extensions include .exe for compiled binaries in Portable Executable (PE) format, .bat and .cmd for batch scripts, and .com for legacy command files.^[2]^[46] Unix-like systems such as Linux do not require any particular extension for executables and rely on file permissions instead. Conventions nonetheless include .sh for shell scripts, .py for Python scripts, .bin for binary images, and .run for self-extracting installers.^[46]^[47]

Execution mechanics vary by platform, but the extension often serves as the cue for choosing a loader or interpreter. On Windows, when a user launches a file by double-clicking it, the shell looks up the extension's association to choose a handler. For .exe files, Windows parses the PE headers and maps the image into memory, a task of the kernel's memory manager in ntoskrnl.exe. The user-mode loader in ntdll.dll then resolves imported libraries before execution begins. Batch files such as .bat are interpreted line by line by the command processor, cmd.exe. Unix-like systems rely on the execute permission bit (set with chmod +x) rather than on extensions. When a file is invoked, the kernel checks whether it begins with a shebang line (e.g., #!/bin/sh for .sh files or #!/usr/bin/env python3 for .py scripts) and, if so, runs the named interpreter on the file.^[47] On Unix, extensions therefore serve mainly human readability and editor or IDE associations, and the loader does not enforce them.^[48] Cross-platform execution introduces additional layers, often requiring compatibility layers, emulation, or virtual environments.
Wine, for instance, is a compatibility layer for POSIX-compliant systems like Linux that translates Windows API calls into native equivalents. It loads PE files with its own loader (the wine program) rather than Windows' loader, so .exe files can run without a Windows installation.^[49] Similarly, Java's .jar extension denotes an archive that can serve as a platform-independent executable. If the JAR manifest specifies a Main-Class, the command java -jar launches it in the Java Virtual Machine (JVM), abstracting hardware and OS differences across Windows, Linux, and macOS.^[50] These approaches work around extension-specific incompatibilities but may incur performance overhead from translation or interpretation.^[49]

On personal computers, the use of extensions for executables traces back to MS-DOS 1.0, released in 1981 for the IBM PC. It supported .com files (a format inherited from CP/M), which are flat binaries confined to a single 64 KB segment, and .exe files, which are relocatable executables able to use more memory.^[51] This convention carried over into Windows. Unix-like systems adopted the Executable and Linkable Format (ELF) around 1992-1995, replacing the simpler a.out format. ELF executables typically have no extension. The kernel recognizes them by the ELF signature at the start of the file and runs them when the execute permission is set.^[52]^[53]
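The following simplified Python sketch makes content-based dispatch concrete by performing the kind of header check a loader does. The function classify_executable is hypothetical and inspects only bytes. It recognizes a #! shebang line, the ELF signature, and the DOS MZ header, following the e_lfanew field at offset 0x3C to look for the PE signature. A real kernel additionally checks execute permissions, and Windows chooses handlers for scripts such as .bat by extension rather than by content.

```python
def classify_executable(data: bytes) -> str:
    """Guess how a file would be executed, based only on its leading bytes."""
    if data.startswith(b"#!"):
        # Shebang: the rest of the first line names the interpreter.
        interpreter = data.split(b"\n", 1)[0][2:].decode("utf-8", "replace").strip()
        return f"script for interpreter: {interpreter}"
    if data.startswith(b"\x7fELF"):
        return "ELF binary"
    if data.startswith(b"MZ"):
        # e_lfanew (4 bytes, little-endian, at offset 0x3C) points to "PE\0\0".
        if len(data) >= 0x40:
            pe_offset = int.from_bytes(data[0x3C:0x40], "little")
            if data[pe_offset:pe_offset + 4] == b"PE\x00\x00":
                return "PE executable"
        return "DOS MZ executable"
    return "not recognized as executable"


# Build a minimal fake PE header for demonstration.
fake_pe = bytearray(0x48)
fake_pe[0:2] = b"MZ"
fake_pe[0x3C:0x40] = (0x40).to_bytes(4, "little")
fake_pe[0x40:0x44] = b"PE\x00\x00"

for sample in [b"#!/bin/sh\necho hello\n", b"\x7fELF\x02\x01\x01", bytes(fake_pe), b"hello"]:
    print(classify_executable(sample))
```

None of these checks looks at a filename. This mirrors why an ELF binary or a shebang script needs no extension on Linux.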

Multiple or Hidden Extensions

Filename extensions can be compounded to indicate layered file processing, such as archiving followed by compression. A .tar.gz file, for instance, is a tar archive that has been compressed with gzip. The .tar extension denotes the tape archive format for bundling multiple files, and .gz indicates the gzip compression applied afterward.^[54] Similarly, .js.map files are source maps associated with JavaScript bundles, which aid in debugging minified code. Software typically reads such names from right to left. The rightmost extension reflects the outermost, most recently applied layer, so a program decompresses .gz before unpacking .tar, although custom handling may be required for accurate identification.^[55]

In Windows, File Explorer hides the extensions of known file types by default to simplify the interface. The setting is controlled by the "File name extensions" checkbox on the View tab. When the box is unchecked, a file like resume.docx appears as resume. The feature is intended for usability, but it poses risks because it can mask malicious files, such as executables disguised with benign-looking names, and may lead users to run malware unintentionally.^[2]

Files without extensions are common in Unix-like systems, particularly binaries and executables, because these operating systems do not rely on extensions for type identification. Instead, the file command uses magic numbers (unique byte sequences at the beginning of a file) to determine content types. It recognizes ELF binaries, for example, by their 0x7F 'E' 'L' 'F' header signature, defined as ELFMAG in the <elf.h> header. This approach lets executables such as Unix binaries work without any suffix, emphasizing content over naming conventions.^[56]

Double extensions, such as .txt.exe, are another variation often used for deception. A benign-looking decoy extension precedes the real, dangerous one to disguise the true file type.
Adversaries exploit this by appending executable extensions such as .exe after innocuous ones like .txt or .pdf, relying on hidden-extension settings to trick users into opening malware. The MITRE ATT&CK framework classifies this as a masquerading technique (T1036.007, Double File Extension). Its examples include PreviewReport.DOC.exe, used by the Bazar malware for initial access via phishing.^[5]

Platform-specific conventions further illustrate non-standard extension use. On macOS, applications are distributed as bundles, which are directories with the .app extension (such as Chess.app). The Finder presents each bundle as a single file and hides the suffix by default. This bundling groups executables, resources, and metadata without altering core extension semantics. Linux shell scripts, by contrast, often omit extensions. Google's Shell Style Guide, for example, prefers executables with no extension, so that they can be invoked from the PATH without users needing to know the implementation language. It reserves .sh for libraries, which should not be executable.^[57]^[58]
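A simple heuristic can detect the double-extension pattern that ATT&CK describes. It flags names whose final extension is executable and whose preceding extension looks like a document or image. The Python below is a hypothetical illustration. Both extension lists are deliberately short and incomplete, and a real control would also inspect file content rather than trust names alone.

```python
from pathlib import PurePath

# Illustrative, incomplete lists; real policies would be broader.
EXECUTABLE_EXTS = {".exe", ".scr", ".com", ".bat", ".cmd", ".vbs", ".js", ".ps1"}
DECOY_EXTS = {".txt", ".doc", ".docx", ".pdf", ".jpg", ".png", ".xls"}


def looks_masqueraded(filename: str) -> bool:
    """Flag names such as 'PreviewReport.DOC.exe': a decoy extension, then an executable one."""
    suffixes = [s.lower() for s in PurePath(filename).suffixes]
    if len(suffixes) < 2:
        return False
    return suffixes[-1] in EXECUTABLE_EXTS and suffixes[-2] in DECOY_EXTS


for name in ["PreviewReport.DOC.exe", "LOVE-LETTER-FOR-YOU.TXT.vbs",
             "archive.tar.gz", "setup.exe", "resume.docx"]:
    print(name, looks_masqueraded(name))
```

Comparing extensions case-insensitively matters here, because attackers often capitalize the decoy part (.DOC, .TXT) to make it stand out.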

Security and Risks

Associated Vulnerabilities

Filename extension spoofing means giving malicious files innocuous-looking names to deceive users and bypass security filters. An attacker might, for example, rename an executable from malware.exe to photo.jpg so that it appears to be an image. The tactic exploits users' trust in extensions as a quick identifier, and users or programs may then handle harmful content in unsafe ways. A notable example is the ILOVEYOU worm of 2000, which spread via email attachments named LOVE-LETTER-FOR-YOU.TXT.vbs. Because Windows hid known file extensions by default, the file appeared to be a harmless .TXT document. Users who opened it ran the Visual Basic script, which infected systems worldwide.^[59]^[60]

Double extension exploits take advantage of systems and users that attend to the wrong extension. A benign-looking extension is placed before the real, malicious one, as in document.doc.exe, which looks like a document but runs as a program. This masquerading technique delivers malware disguised as safe documents and can evade naive extension checks in applications or antivirus software. In web applications, for instance, an upload named image.jpg.php can pass a filter that merely looks for ".jpg" in the name, and a server configured to run PHP files may then execute it as a script.^[5]^[61]

Auto-execution risks arise from legacy behaviors in email clients and browsers that launched associated applications or scripts for certain extensions with little or no user confirmation, potentially running malicious code directly. In some older versions of Microsoft Outlook and Internet Explorer, flaws or permissive defaults allowed attachments with extensions such as .exe, .bat, or .vbs to run with minimal user interaction when messages were previewed or files downloaded. This amplified the impact of spoofed files.
The weakness historically enabled rapid worm propagation, as in early-2000s email attacks where opening a disguised executable compromised the system without further warnings.^[62]

Case sensitivity attacks exploit discrepancies between case-insensitive file systems, such as NTFS on Windows, and case-sensitive ones, such as ext4 on Linux. The resulting name collisions can overwrite files, alter permissions, or grant unauthorized access. An attacker could, for example, create files named script.py and Script.PY, with the latter pointing to a sensitive location. On a case-insensitive system the two names resolve to the same file, which can execute unintended code or expose data during cross-platform operations. A real-world instance is CVE-2021-21300 in Git. A crafted repository contained colliding paths, such as a symlink named a pointing to .git/hooks/ and a malicious script at A/post-checkout. Cloning it onto a case-insensitive file system that supports symbolic links could lead to remote code execution when delayed-checkout filters such as Git LFS were in use.^[63]

Mitigation Strategies

Users can reduce risk by configuring their operating systems to always display file extensions, so that potentially malicious suffixes are not concealed. In Windows, this is done in File Explorer by enabling "File name extensions" on the View tab. Microsoft documentation recommends this setting to make true file types visible.^[64] Users should also check the extension and source of any file before opening it, since relying on visible names alone can lead to running harmful content. Security experts advise scanning attachments with antivirus software and avoiding downloads from untrusted sources as standard precautions.^[65]

Software defenses shift the focus from extensions to intrinsic file properties. Antivirus programs use file signature scanning, which examines magic bytes (the characteristic byte sequences at the beginning of files) to identify content types regardless of misleading extensions. MITRE ATT&CK describes this approach for detecting masqueraded files.^[66] Tools such as ClamAV, for example, use signatures to distinguish legitimate from malicious files during scans.^[67] Sandboxing complements signature scanning by isolating executables in a virtual environment, where they can run without affecting the host system. Microsoft Windows Sandbox, for example, provides a disposable desktop for testing unknown applications, limiting potential damage from disguised threats.^[68]

System-level policies further strengthen protection by controlling how files are handled at the network and browser layers.
Browsers such as Google Chrome use Safe Browsing to warn about or block downloads of known dangerous files, and users can enable Enhanced protection in Chrome's security settings for stronger checks.^[69] On web servers, the X-Content-Type-Options: nosniff header disables browser MIME sniffing. Content is then handled only according to its declared type, which reduces exploitation risk, as security guidance from vendors such as Indusface and Jetpack recommends.^[70]^[71]

Developers processing uploads should use multiple layers of validation rather than relying on filename extensions. The OWASP Input Validation Cheat Sheet recommends combining extension checks with content verification, such as inspecting magic bytes and enforcing size limits, and using allowlists of permitted types to block unauthorized formats.^[72] Server-side scanning adds a further layer, and the client-supplied Content-Type header should be treated as untrusted input. Storing uploads outside the web root and renaming them with random identifiers, for example, prevents direct access and extension-based attacks, in line with OWASP guidance on secure file handling.^[73]
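The following Python sketch illustrates the layered upload checks described above. Everything in it is an assumption made for illustration. The allowlist, the signature table, the 5 MiB size limit, and the accept_upload helper are all hypothetical. The sketch covers only some of the checks named in the text: a final-extension allowlist, a size limit, a magic-byte match, and a random storage name. It does not perform scanning or choose a storage location. Magic-byte checks alone also cannot rule out polyglot files that are valid in two formats at once.

```python
import secrets
from pathlib import PurePath

# Hypothetical allowlist: permitted extension -> accepted leading bytes.
ALLOWED = {
    ".png": [b"\x89PNG\r\n\x1a\n"],
    ".jpg": [b"\xff\xd8\xff"],
    ".jpeg": [b"\xff\xd8\xff"],
    ".pdf": [b"%PDF-"],
}
MAX_BYTES = 5 * 1024 * 1024  # example limit (5 MiB), chosen arbitrarily


def accept_upload(original_name: str, data: bytes) -> str:
    """Validate an upload; return a random storage name or raise ValueError."""
    ext = PurePath(original_name).suffix.lower()  # only the final extension counts
    if ext not in ALLOWED:
        raise ValueError(f"extension {ext!r} is not allowed")
    if len(data) > MAX_BYTES:
        raise ValueError("file is too large")
    if not any(data.startswith(sig) for sig in ALLOWED[ext]):
        raise ValueError("content does not match the claimed extension")
    # Never reuse the client-supplied name; store under a random identifier.
    return secrets.token_hex(16) + ext


print(accept_upload("photo.png", b"\x89PNG\r\n\x1a\n" + b"\x00" * 16))
for bad_name, bad_data in [("shell.jpg.php", b"<?php"), ("fake.png", b"MZ\x90\x00")]:
    try:
        accept_upload(bad_name, bad_data)
    except ValueError as err:
        print(bad_name, "rejected:", err)
```

Checking only the final extension defeats the image.jpg.php trick, because its real extension, .php, is not on the allowlist. The signature check catches an executable renamed to .png.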

References

[1]
What is a file extension (file format)? | Definition from TechTarget
May 15, 2023 · In computing, a file extension is a suffix added to the name of a file to indicate the file's layout, in terms of how the data within the file ...
[2]
Common file name extensions in Windows - Microsoft Support
Windows file names have two parts separated by a period: first, the file name, and second, a three- or four-character extension that defines the file type.
[3]
File Extension - an overview | ScienceDirect Topics
A file extension is defined as a suffix added to a filename that indicates the file type and associates it with a specific application for deployment, often ...
[4]
File Naming Conventions in Linux - The Linux Information Project
Jul 21, 2005 · File names were limited to 14 bytes (equivalent to 14 characters) in early UNIX systems. However, modern Unix-like systems support long file ...
[5]
Naming Files, Paths, and Namespaces - Win32 apps - Microsoft Learn
Aug 28, 2024 · All file systems follow the same general naming conventions for an individual file: a base file name and an optional extension, separated by a period.
[6]
Filename Extension Definition - The Linux Information Project
Jun 27, 2005 · A filename extension, also commonly referred to as just an extension, is usually defined as a short string (i.e., sequence of characters) ...
[7]
Filename Extensions - Apple Developer
May 25, 2011 · Conceptual information and guidelines describing the structure and usage of the Mac OS X file system.
[8]
File format reference for Word, Excel, and PowerPoint - Office
Apr 25, 2025 · Supported file formats and their extensions are listed in the following tables for Word, Excel, and PowerPoint.
[9]
[PDF] The Compatible Time-Sharing System - People | MIT CSAIL
... file (of second name 'TSSDC.' for system commands, 'SAVED' for user commands), in the system file directory or the user's own files (see AB.10.04 concerning ...
[10]
https://people.csail.mit.edu/saltzer/Multics/CTSS-Documents/CTSS_ProgrammersGuide_1966.pdf
[11]
“Wow, it's WordStar!” Exploring a Beloved Early Word Processor and ...
Jul 21, 2022 · docx” for Microsoft Word documents, “.mp3” for MP3 files, or “.csv” for Comma Separated Values. Some WordStar files may follow similar standards ...
[12]
Early Digital Research CP/M Source Code - CHM
Oct 1, 2014 · CP/M was originally written in PL/M and compiled with Intel's FORTRAN-based cross-development tools running on a timeshared mainframe computer. ...
[13]
Microsoft MS-DOS early source code - Computer History Museum
Mar 25, 2014 · File names were limited to 8 characters, plus a 3-character extension indicating the file type. There were commands like “dir” to list the files ...
[14]
Why does MS-DOS use 8.3 filenames instead of, say, 11.2 or 16.16?
Jun 10, 2009 · ... MS-DOS worked hard at being compatible with CP/M. And CP/M used 8.3 filenames. Why did CP/M use 8.3 filenames? I don't know. There's nothing ...
[15]
[PDF] The Evolution of the Unix Time-sharing System*
This paper presents a brief history of the early development of the Unix operating system. It concentrates on the evolution of the file system, the process- ...
[16]
Exchangeable Image File Format (Exif) Family
Nov 6, 2023 · Exif metadata tags include descriptive metadata, copyright details, camera settings, technical image data, date and time information, geographic ...
[17]
NTFS overview - Microsoft Learn
Jun 18, 2025 · Maximum file name and path NTFS supports long file names and extended-length paths, with the following maximum values:
[18]
4.3. Directory Entries — The Linux Kernel documentation
In an ext4 filesystem, a directory is more or less a flat file that maps an arbitrary byte string (usually ASCII) to an inode number on the filesystem. There ...
[19]
File System Details - Apple Developer
Apr 9, 2018 · File System Details. This appendix includes information about the file systems supported by macOS, iOS, watchOS, and tvOS.
[20]
Technical Note TN1150: HFS Plus Volume Format - Apple Developer
HFS Plus uses up to 255 Unicode characters to store file names. Allowing up to 255 characters makes it easier to have very descriptive names. Long names are ...
[21]
Best Practices for File Associations - Win32 apps - Microsoft Learn
Jan 26, 2022 · The following list are recommended best practices you should use when working with file associations.Missing: OS | Show results with:OS
[22]
Show or hide filename extensions on Mac - Apple Support
Filename extensions are usually hidden in macOS, but if you find them useful, you can show them. If extensions are hidden, macOS still opens files with the ...
[23]
File System Basics - Apple Developer
Apr 9, 2018 · If your app defines custom file formats, you should register those formats and any associated filename extensions in your app's Info.plist file.
[24]
file(1) - Linux manual page - man7.org
The concept of a “magic number” has been applied by extension to data files. ... This is usually used in conjunction with the -m option to debug a new magic file ...
[25]
Do file extensions have any purpose in Linux? - Ask Ubuntu
Jul 27, 2016 · On Windows, there is a strong tradition of using the file extension as the primary means of identifying a file; most visibly, the graphical file ...
[26]
Why are filenames that start with a dot hidden? Can I hide files ...
Aug 30, 2013 · Traditional Unix filesystems don't have a "hide" attribute for files. A filesystem driver can hide any files it wants, by simply omitting their ...
[27]
file path portability - windows - Stack Overflow
Nov 10, 2008 · I have a program that I need to run under *nix and windows. because the program takes file paths from files the issue is what to do about the \ vs / issue.
[28]
Filesystems and case-insensitivity - LWN.net
Nov 28, 2018 · Supporting case-insensitive file names requires the encoding-awareness changes in order to define what case folding means for a given character.
[29]
What characters are forbidden in Windows and Linux directory ...
Dec 29, 2009 · The forbidden printable ASCII characters are: Linux/Unix: / (forward slash) Windows: < (less than) > (greater than) : (colon - sometimes works, but is actually ...
[30]
How to create tar.gz file in Linux using command line - nixCraft
Aug 22, 2025 · We can use '.tgz' extension instead of '.tar.gz' to save long file names # create tgz backup file tar zcf /backups/docs.tgz $HOME/Documents/
[31]
8.3 Filename - MS-FSCC - Microsoft Learn
Apr 7, 2025 · An 8.3 filename (also referred to as a DOS name, a short name, or an 8.3-compliant filename) is a filename that conforms to the following restrictions.
[32]
The Unicode HOWTO: Locale setup
You can now already use any Unicode characters in file names. No kernel or file utilities need modifications. This is because file names in the kernel can be ...
[33]
UTF 8 filenames? - Unix & Linux Stack Exchange
May 7, 2012 · On Unix/Linux, a filename is a sequence of any bytes except for a slash or a NUL. A slash separates path components, and a NUL terminates a path name.
[34]
File Types - Win32 apps - Microsoft Learn
Nov 19, 2021 · This topic explains how to create new file types and how to associate your app with your file type and other well-defined file types.
[35]
MIME/etc/mime.types - Debian Wiki
Jan 17, 2021 · MIME /etc/mime.types is the file in which system-wide rules are defined for mapping filename suffices to media types (MIME types).
[36]
HKEY_CLASSES_ROOT Key - Win32 apps - Microsoft Learn
Jan 7, 2021 · The HKEY_CLASSES_ROOT (HKCR) key contains file name extension associations and COM class registration information such as ProgIDs, CLSIDs, and IIDs.
[37]
https://learn.microsoft.com/en-us/windows/win32/sysinfo/hkey-classes-root-key
[38]
CWE-646: Reliance on File Name or Extension of Externally ...
The product allows a file to be uploaded, but it relies on the file name or extension of the file to determine the appropriate behaviors.
[39]
Unrestricted File Upload - OWASP Foundation
The file types allowed to be uploaded should be restricted to only those that are necessary for business functionality. Never accept a filename and its ...
[40]
RFC 2045 - Multipurpose Internet Mail Extensions (MIME) Part One
This set of documents, collectively called the Multipurpose Internet Mail Extensions, or MIME, redefines the format of messages.
[41]
Media types (MIME types) - HTTP - MDN Web Docs
Aug 19, 2025 · A MIME type indicates the nature and format of a document, file, or bytes. It consists of a type and a subtype, like type/subtype.
[42]
mod_mime - Apache HTTP Server Version 2.4
While mod_mime associates metadata with filename extensions, the core server provides directives that are used to associate all the files in a given container ( ...
[43]
What MIME type should XHTML be served with? - W3C
The recommended MIME type for XHTML is application/xhtml+xml. For XHTML 1.0, text/html is allowed for backward compatibility.
[44]
Why are MIME types needed if we can identify file ... - Server Fault
Aug 11, 2010 · Mime type clearly specifies the intended use of the file. File extensions only hint at the content. Both can be wrong.
[45]
Executable File Formats - FileInfo.com
Common executable file extensions include .EXE, .APP, .VB, and .SCR ... Linux Executable File, 4.2 .CMD, Windows Command File, 4.2 .XBE, Xbox Executable ...
[46]
Executing Python Scripts With a Shebang
Jan 25, 2025 · In this tutorial, you'll learn when and how to use the shebang line in your Python scripts to execute them from a Unix-like shell.
[47]
File extensions for unix shell scripts [closed]
Feb 15, 2012 · As you said it, the Unix file extensions are purely information. You just need your script to have a correct shebang and being executable. You ...
[48]
WineHQ - Run Windows applications on Linux, BSD, Solaris and ...
A compatibility layer capable of running Windows applications on several POSIX-compliant operating systems, such as Linux, macOS, & BSD.
[49]
Guide to Creating Jar Executables and Windows ... - Baeldung
Jan 8, 2024 · In this tutorial, we'll start by learning how to package a Java program into an executable Java ARchive (JAR) file. Then, we'll see how to ...
[50]
DOS 1.0 and 1.1 | OS/2 Museum
In August 1981, IBM released its Personal Computer (better known as the PC) and DOS 1.0. It was widely expected that Digital Research would release CP/M-86 for ...
[51]
Evolution of the ELF object file format - MaskRay
May 26, 2024 · In the 1990s, many Unix and Unix-like operating systems, including Solaris, IRIX, HP-UX, Linux, and FreeBSD, switched to ELF. The 86open ...
[52]
The ELF Object File Format: Introduction - Linux Journal
Apr 1, 1995 · Let us start at the beginning. Users will generally encounter three types of ELF files—.o files, regular executables, and shared libraries.
[53]
What is TAR file format, TGZ TBZ TXZ extensions - PeaZip
Compressed tar files can be found named with single extension, e.g. TGZ, TBZ, TXZ, TZST, or with double file extension, e.g. TAR.GZ, TAR.BR, TAR.BZ2, TAR.XZ, ...
[54]
3 Best Ways to Get a File Extension in Python - Index.dev
Feb 25, 2025 · Discover multiple ways to extract file extensions in Python using os.path, pathlib, and advanced methods for handling complex suffixes.
[55]
file(1) - Linux manual page
[56]
Masquerading: Double File Extension, Sub-technique T1036.007
Adversaries may abuse a double extension in the filename as a means of masquerading the true file type. A file name may include a secondary file type extension ...
[57]
About Bundles - Apple Developer
Mar 27, 2017 · About Bundles. Bundles are a convenient way to deliver software in macOS and iOS. Bundles provide a simplified interface for end users and ...
[58]
styleguide | Style guides for Google-originated open-source projects
Bash is the only shell scripting language permitted for executables. Executables must start with #!/bin/bash and minimal flags.
[59]
[PDF] The ILOVEYOU Worm - nob.cs.ucdavis.edu!
The second also looks like a text file (.TXT) because Windows would hide the extension. If you viewed it in another way, the second extension would be visible ...
[60]
E-mail Security in the Wake of Recent Malicious Code Incidents
A common technique used to disguise malicious code is to make an executable appear as ... superfluous file extension such as: ILOVEYOU.TXT ... the ILOVEYOU worm).
[61]
File Upload Vulnerabilities and Security Best Practices - Vaadata
Apr 29, 2025 · Extension validation: restrict the file types accepted (for example, to image formats only) by setting up a whitelist of authorised extensions.
[62]
An overview of unsafe file types in Microsoft products
This article provides an overview of unsafe file types and of the safeguards that Microsoft has created to help protect customers from unsafe file types.
[63]
[PDF] Unsafe at Any Copy: Name Collisions from Mixing Case Sensitivities
Feb 21, 2023 · Historically, UNIX file systems are case sensitive, whereas Windows file systems are case insensitive.
[64]
How can I get the extension to display along with the name of the file?
Apr 20, 2022 · Open the "View" menu;; Tick "File name extensions" in the "Show/hide" section. I hope this helps. Feel free to ask back any questions ...
[65]
File Upload Protection – 10 Best Practices for Preventing Cyber ...
2. Verify file types. In addition to restricting the file types, it is important to ensure that no files are 'masking' as allowed file types. For ...
[66]
Masquerade File Type, Sub-technique T1036.008 - MITRE ATT&CK®
Mar 8, 2023 · For example, a file's signature (also known as header or magic bytes) is the beginning bytes of a file and is often used to identify the file's ...
[67]
Signatures - ClamAV Documentation
ClamAV signatures are text-based, used to differentiate clean and malicious files. They are distributed in CVD files, and include body-based and hash-based ...
[68]
Windows Sandbox | Microsoft Learn
Jan 24, 2025 · Windows Sandbox (WSB) offers a lightweight, isolated desktop environment for safely running applications. It's ideal for testing, debugging, exploring unknown ...
[69]
Manage Enhanced Safe Browsing for your account - Google Help
Go to your Google Account. · Tap Security & sign-in. · Scroll to “Enhanced Safe Browsing for your Account.” · Select Manage Enhanced Safe Browsing. · Turn Enhanced ...
[70]
X-Content-Type-Options: Examples and Benefits - Indusface
Sep 4, 2025 · The X-Content-Type-Options header is an HTTP response header used to instruct browsers on how to handle the MIME types of the resources they receive.
[71]
What is MIME Sniffing? Definition and How to Prevent Attacks
Feb 27, 2025 · 1. Configure your server to send correct MIME types · 2. Implement 'X-Content-Type-Options: nosniff' · 3. Use content security policy (CSP) ...
[72]
Input Validation - OWASP Cheat Sheet Series
Use input validation to ensure the uploaded filename uses an expected extension type. · Ensure the uploaded file is not larger than a defined maximum file size.
[73]
Achieve OWASP File Upload Standards with MetaDefender Core
The OWASP File Upload Cheat Sheet provides a proven foundation for securing file uploads, from validation to malware scanning to sanitization and safe storage.