COM file
A COM file is a binary executable file format originally developed for the CP/M operating system and later adopted by MS-DOS, consisting of a flat, unstructured memory image of machine code, data, and stack without any header, metadata, or relocation information.[1] It represents the simplest form of DOS-compatible program, limited to a maximum size of approximately 64 KB (precisely 65,280 bytes or 0xFF00) due to the single-segment memory model it employs.[2][3] Introduced in the late 1970s with CP/M and carried over to MS-DOS in 1981, the format enabled quick loading and execution of small utility programs on resource-constrained 8086/8088-based systems, serving as the precursor to more complex formats like EXE.[1]
In MS-DOS, COM files were prioritized over EXE files bearing the same name during command execution, a legacy behavior that persisted into early Windows versions such as 95, 98, and Me, where the COMMAND.COM shell itself was a COM file.[2] Upon execution, the DOS loader allocates a memory segment, places the Program Segment Prefix (PSP) at offset 0x00, loads the entire COM file contents starting at offset 0x100, initializes the stack pointer near the top of available memory, and transfers control to the program's entry point at 0x100.[1][3]
This format's lack of structure imposed significant constraints: programs could not exceed the 64 KB limit, required manual management of code, data, and stack within a single 64 KB segment, and relied on direct BIOS or DOS interrupts (e.g., INT 21h) for system services without built-in support for dynamic linking or overlays.[1][3] While the format was ideal for compact commands like DEBUG.COM or FORMAT.COM, larger applications necessitated the MZ/EXE format introduced in MS-DOS 1.0 to accommodate relocatable segments and headers.[1][4]
In modern Windows, COM files are largely obsolete but can still execute under the NTVDM subsystem on 32-bit systems or via DOS emulators on 64-bit versions, though they pose security risks due to their simplicity and historical use in malware.[2]
Overview and History
Definition and Purpose
A COM file is a flat binary executable format utilized in MS-DOS, consisting of pure machine code and data without any headers or metadata structures.[5] This simplicity allows the entire file contents to be treated as a single contiguous block of code and initialized data, designed specifically for direct loading into memory at offset 0x100 in the program's segment.[5] Unlike more elaborate formats, COM files require no parsing of headers or relocation of addresses, making them ideal for environments with limited resources.[6]
The primary purpose of COM files is to enable the rapid execution of small utility programs in resource-constrained systems like MS-DOS, where the emphasis on straightforward operation takes precedence over advanced features such as relocatable code or dynamic linking.[5] They offer key advantages including minimal runtime overhead, instantaneous loading without the need for format interpretation, and compatibility with memory models that support fixed-address execution, rendering them suitable for bootloaders and memory-resident utilities.[5]
Historically, the COM format originated as the native executable type in CP/M and was directly inherited by early versions of MS-DOS, serving as the default for executables before the introduction of the more versatile EXE format for handling larger applications.[6] This inheritance ensured continuity in the DOS ecosystem, allowing simple binaries to remain viable even as the operating system evolved.[6]
Development in MS-DOS
The 8086 version of the COM file format first appeared in 86-DOS, an operating system developed by Tim Paterson at Seattle Computer Products starting in April 1980 as a CP/M-compatible environment for Intel 8086-based systems, providing a simple mechanism for executing assembly-language programs directly as binary images.[7][8] Early versions of 86-DOS, such as 0.33 released in December 1980, included utilities like ASM.COM and HEX2BIN.COM, demonstrating the format's use for compact, load-and-run executables in resource-constrained environments.[7][9]
Microsoft first licensed 86-DOS in December 1980 and acquired full rights in July 1981, adapting it into MS-DOS 1.0, released alongside the IBM PC in August 1981, where the COM format became the primary executable type for small programs due to its simplicity and direct compatibility with the system's 64 KB memory limit.[10][11] This adoption integrated COM files into the IBM PC ecosystem, enabling key command-line utilities such as DEBUG.COM for program debugging and FORMAT.COM for disk preparation, which exemplified the format's role in essential system operations.[11][12]
The COM format remained largely unchanged through subsequent MS-DOS releases, persisting as a core component up to version 6.22 in 1994, while the more flexible EXE format, also introduced in MS-DOS 1.0, supported overlays and programs exceeding 64 KB, shifting preference toward EXE for larger applications.[13][11] Despite this evolution, COM files continued to underpin lightweight utilities in the command-line environment throughout the MS-DOS era.[14]
By the mid-1990s, the rise of graphical interfaces marked the decline of COM files as a primary format; Windows 95, released in August 1995, phased out direct reliance on DOS executables in favor of native Windows applications, though COM support was retained in the MS-DOS compatibility mode for legacy software.[10][13]
Technical Format
Binary Structure
The COM file format consists of a flat binary image containing solely the program's machine code and data, without any file header, segments, or metadata structures. This simplicity stems from its origins in early operating systems like CP/M, where the entire file (limited to a maximum size of 64 KB) is treated as a directly loadable memory image. Upon execution in MS-DOS, the operating system allocates a single 64 KB memory segment and loads the COM file starting at offset 0x0100 within that segment, setting the code segment (CS) and instruction pointer (IP) to point to this location (CS:IP = segment:0x0100), while the data segment (DS) and extra segment (ES) registers are also initialized to the segment base.[5][1]
Preceding the loaded code at offsets 0x0000 to 0x00FF within the same segment, MS-DOS constructs a Program Segment Prefix (PSP), a 256-byte data structure that provides essential runtime information such as the program's termination vector, memory allocation details, and command-line arguments; the PSP is not part of the COM file itself. The COM file's contents thus occupy a contiguous linear block from 0x0100 onward, holding the program's code and initialized data, while the stack and any heap share the remainder of the same segment without relocation or segmentation support. This unified layout requires programmers to use absolute addressing relative to the 0x0100 origin, as there are no mechanisms for dynamic relocation during loading.[15][16]
In contrast to the more complex EXE format, which begins with an MZ header containing details like the program's entry point, relocation table, and segment information to enable loading into non-contiguous memory and support for larger programs, the COM format lacks all such elements, enforcing a simpler but more restrictive model suitable only for small, self-contained applications.
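As a rough illustration of the headerless layout and the 0x0100 origin described above, the following Python sketch assembles a tiny COM image by hand. The helper name build_hello_com and the message text are illustrative assumptions rather than anything taken from the cited sources; the byte values are the standard 8086 encodings of MOV AH, MOV DX, and INT. The address of the string is computed relative to the load offset because nothing in the file allows the loader to adjust it.

```python
ORIGIN = 0x0100  # offset at which DOS places the first byte of a COM file


def build_hello_com(message: bytes = b"Hello, world!$") -> bytes:
    """Return a flat COM image that prints `message` and terminates.

    Hypothetical helper for illustration only. `message` must end with '$',
    the terminator expected by INT 21h function 09h.
    """
    if not message.endswith(b"$"):
        raise ValueError("message must end with '$'")

    code_len = 9                  # total size of the four instructions below
    msg_addr = ORIGIN + code_len  # absolute offset of the string once loaded

    code = bytes([
        0xB4, 0x09,                            # mov ah, 09h   (print string)
        0xBA, msg_addr & 0xFF, msg_addr >> 8,  # mov dx, msg_addr (little-endian)
        0xCD, 0x21,                            # int 21h
        0xCD, 0x20,                            # int 20h       (terminate)
    ])
    assert len(code) == code_len
    return code + message  # no header: the file is just code followed by data


if __name__ == "__main__":
    image = build_hello_com()
    print(len(image), image.hex(" "))
```

Writing the returned bytes to a file with a .COM extension should, in principle, give a program that a DOS system or emulator can load, although the sketch has not been checked against any particular environment.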
COM files are typically created using assembly tools configured for flat binary output, such as the Microsoft Macro Assembler (MASM), where directives like .MODEL TINY and ORG 100h ensure the output is a pure binary without object file overhead or linking artifacts. The standard file extension is .COM, adhering to the 8.3 naming convention of the MS-DOS file system, which reserves the first eight characters for the name and the last three for the extension.[1][17]
Memory Loading Process
The MS-DOS command interpreter, COMMAND.COM, initiates the loading of a .COM file by invoking DOS Interrupt 21h with AH=4Bh (the EXEC function), passing the program's filename and an execution parameter block that specifies details such as the command tail and file control blocks (FCBs).[18] The DOS loader allocates a contiguous block of conventional memory for the program, creating a 256-byte Program Segment Prefix (PSP) at the base of this block to manage the program's environment, including interrupt vectors and default FCBs. The entire contents of the .COM file—treated as raw machine code without any header or relocation information—are then read into memory starting at offset 0x0100 within the allocated segment, immediately following the PSP, using DOS file services like INT 21h AH=3Fh for reading.[18] This process assumes the file size does not exceed 64 KB (minus the 256 bytes for the PSP), as .COM files operate within a single 64 KB segment.[19] Upon successful loading, the DOS loader configures the CPU registers to prepare for execution: the code segment (CS), data segment (DS), extra segment (ES), and stack segment (SS) registers are all set to the segment address of the PSP, ensuring the program runs in a flat memory model with unified addressing; the instruction pointer (IP) is set to 0x0100 to begin execution at the start of the loaded code; and the stack pointer (SP) is initialized to 0xFFFE, pointing to the last available word in the 64 KB segment to provide maximum stack space.[18] No relocation or segment binding occurs, as the .COM format lacks relocation tables, allowing the program to run directly in this single-segment environment without further adjustment by the loader.[20] The loader then transfers control to the program at the effective address formed by the CS:IP pair. 
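The loading steps above can be sketched in Python as a small simulation. This is a minimal model under stated assumptions, not a description of the actual DOS loader: the names load_com and physical_address, the example segment value 0x1000, and the use of a bytearray as the allocated segment are all hypothetical. The size check covers only the 64 KB-minus-PSP bound mentioned in the text, and the segment-times-16 address formula is standard 8086 real-mode behaviour rather than something stated in this section.

```python
SEGMENT_SIZE = 0x10000               # one 8086 segment: 64 KB
PSP_SIZE = 0x0100                    # PSP occupies offsets 0x0000-0x00FF
MAX_IMAGE = SEGMENT_SIZE - PSP_SIZE  # 0xFF00 = 65,280 bytes


def load_com(image: bytes, segment: int = 0x1000):
    """Simulate loading a COM image (illustrative only).

    Returns (memory, registers): a 64 KB bytearray standing in for the
    allocated segment, and a dict of initial register values.
    `segment` is an arbitrary example paragraph address.
    """
    if len(image) > MAX_IMAGE:
        raise ValueError("image does not fit in one segment after the PSP")

    memory = bytearray(SEGMENT_SIZE)
    memory[0:2] = b"\xCD\x20"                       # PSP begins with INT 20h
    memory[PSP_SIZE:PSP_SIZE + len(image)] = image  # raw copy, no relocation

    registers = {
        "CS": segment, "DS": segment, "ES": segment, "SS": segment,
        "IP": PSP_SIZE,  # execution begins at offset 0x0100
        "SP": 0xFFFE,    # last word of the segment
    }
    return memory, registers


def physical_address(segment: int, offset: int) -> int:
    """Real-mode address translation: segment * 16 + offset, 20 bits wide."""
    return ((segment << 4) + offset) & 0xFFFFF


if __name__ == "__main__":
    memory, regs = load_com(b"\xC3")  # a one-byte image containing RET
    entry = physical_address(regs["CS"], regs["IP"])
    print(hex(entry), hex(memory[regs["IP"]]))
```

Because all four segment registers hold the same value, every offset the program uses, whether for code, data, or stack, refers to the same 64 KB window.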
The program executes within the allocated memory until it terminates, typically by issuing INT 20h (a direct terminate call that releases all memory and returns control to DOS via the PSP's interrupt 22h vector) or INT 21h with AH=4Ch (terminate with return code, which flushes file buffers, closes handles, and releases memory before returning to the caller with an exit code in AL).[18] If the program instead ends with a near RET, the zero word that DOS leaves on the stack sends execution to offset 0x0000 of the PSP, whose first two bytes (the INT 20h opcode, CD 20h) serve as a safety net that invokes termination automatically.[21]
Error conditions during loading, such as insufficient memory (error code 08h) or a file larger than 64 KB, result in the carry flag being set upon return from the EXEC call, with the specific error code in AX, prompting COMMAND.COM to display an error message and return to the DOS prompt without executing the program.[18]
In certain MS-DOS configurations, particularly from version 5.0 onward with extended memory managers like HIMEM.SYS and EMM386 loaded via CONFIG.SYS directives such as DOS=HIGH,UMB, the available conventional memory is maximized by relocating core DOS components to the high memory area (HMA) or upper memory blocks (UMBs), indirectly allowing .COM files to utilize more of the lower 640 KB for loading without fragmentation issues.[22] For terminate-and-stay-resident (TSR) .COM programs, the LH (load high) command in AUTOEXEC.BAT, enabled by UMB support in CONFIG.SYS, can explicitly place them into UMBs above 640 KB, though transient programs are still loaded into conventional memory by default.[23]
Limitations and Workarounds
Size Restrictions
The COM file format imposes a strict maximum size of 65,278 bytes (0xFEFE in hexadecimal), stemming from its reliance on single-segment loading within the 64 KB address space of the 8086 processor's segment, excluding the 256-byte Program Segment Prefix (PSP) allocated by MS-DOS for essential system data and an additional 2 bytes reserved on the stack for the return address.[6][24][25] This limitation means COM files lack support for multiple memory segments or dynamic allocation mechanisms beyond the contiguous RAM available in that single segment, requiring all code, data, and stack to reside linearly within the allocated space starting at offset 0x0100 immediately after the PSP.[24][26] Consequently, the format's constraints influenced program design by promoting highly compact coding practices, such as prioritizing CPU registers over memory-based variables to minimize space usage and generally avoiding inclusion of external libraries that would inflate the binary size.[27]
If a COM file exceeds 64 KB, MS-DOS typically rejects it during loading, resulting in errors like "Program too big to fit in memory" or immediate crashes due to incomplete or corrupted execution, as the system cannot allocate sufficient contiguous memory.[28][29] Developers could assess a COM file's size using the DIR command in MS-DOS, which displays the exact byte count of the file on disk, though the actual loadable portion accounts for overhead like the PSP and any unaddressable bytes at the segment's end.[30][6]
Techniques for Larger Programs
To overcome the 64 KB size restriction inherent to COM files, developers employed overlay techniques, loading a compact core program as a COM file and dynamically fetching additional code or data from disk files during execution. This was achieved using MS-DOS interrupt 21h functions, such as AH=3Dh to open a file and AH=3Fh to read its contents into allocated memory, allowing the program to incorporate larger modules on demand.[31] Alternatively, interrupt 21h with AH=4Bh and AL=03h provided a dedicated "load overlay" capability, transferring code from a specified file into a target memory location without immediate execution, enabling segmented program structures despite the flat memory model of COM files.[32] Self-modifying code offered another workaround, where the running program altered its own instructions in memory to emulate segmentation or adapt behavior, leveraging the fact that COM files treat code and data within the same writable segment. This technique reduced the need for static inclusion of all logic within the initial 64 KB load, though it required careful management to avoid corruption. For instance, a program could overwrite portions of its code to branch to newly loaded routines, simulating a multi-segment EXE-like architecture. Tools for COM-to-EXE conversion, such as com0exe, facilitated creating hybrid setups by wrapping a small COM stub around larger EXE overlays, effectively reverse-engineering the process of tools like EXE2BIN to produce COM-compatible entry points for extended functionality. 
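The overlay idea described above, a resident core that reads additional modules from disk into a reusable memory region, can be modelled with a short Python sketch. Everything here is a hypothetical stand-in: the OverlayManager class, the dictionary that plays the role of overlay files on disk, and the example file names and region addresses are assumptions for illustration, and the dictionary lookup merely mimics what a real program would do with the INT 21h open and read calls.

```python
SEGMENT_SIZE = 0x10000  # the program's single 64 KB segment


class OverlayManager:
    """Toy model of a resident core that swaps overlays into one fixed region."""

    def __init__(self, disk: dict, region_start: int, region_size: int):
        if region_start + region_size > SEGMENT_SIZE:
            raise ValueError("overlay region must lie inside the segment")
        self.disk = disk                      # stand-in for overlay files on disk
        self.memory = bytearray(SEGMENT_SIZE)
        self.region_start = region_start
        self.region_size = region_size
        self.loaded = None                    # name of the overlay now resident

    def load(self, name: str) -> int:
        """Copy overlay `name` into the shared region; return its start offset."""
        if self.loaded == name:
            return self.region_start          # already resident, no disk access
        data = self.disk[name]                # analogous to open (3Dh) + read (3Fh)
        if len(data) > self.region_size:
            raise ValueError(f"overlay {name!r} exceeds the overlay region")
        end = self.region_start + self.region_size
        # Overwrite the whole region so no bytes of the previous overlay remain.
        self.memory[self.region_start:end] = data.ljust(self.region_size, b"\x00")
        self.loaded = name
        return self.region_start


if __name__ == "__main__":
    disk = {"MENU.OVL": b"\x90" * 100, "PRINT.OVL": b"\xC3" * 300}
    mgr = OverlayManager(disk, region_start=0x8000, region_size=0x1000)
    mgr.load("MENU.OVL")
    mgr.load("PRINT.OVL")  # replaces the menu code in the same region
    print(mgr.loaded)
```

The trade-off this models is the one the text points to: only one overlay is resident at a time, so total program size can exceed the segment at the cost of disk reads and careful bookkeeping.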
In TSR mode, small COM-based stubs remained in memory after initial loading, hooking interrupts to chain-load or invoke larger modules as needed; the Microsoft Mouse driver (MOUSE.COM) exemplifies this, installing a minimal resident handler that extended input capabilities without exceeding COM limits.[33] Early games adopted similar extensions, starting with compact COM loaders that dynamically incorporated graphics or level data to fit within memory constraints.
These methods, while innovative, introduced significant limitations: they heightened development complexity due to manual memory management, risked instability from improper loading or overwrites, and exhibited incompatibility with certain DOS versions or hardware configurations lacking sufficient free memory above the COM segment.[34]
Platform Compatibility
Support in DOS and Early Windows
COM files enjoyed full native support in MS-DOS versions 1.0 through 7.0, from their introduction in 1981 to the late 1990s, as simple binary executables loaded directly by the command interpreter COMMAND.COM. This interpreter, residing in memory as both a resident and transient portion, handled execution by searching for the file in the current directory or along the PATH environment variable and loading its contents into memory starting at offset 0x100, preserving the DOS environment for the program.[14] Key operational features in MS-DOS emphasized COM files' efficiency and priority. The system searched for executables by prioritizing the .COM extension over .EXE and .BAT in the current directory and PATH directories, enabling quick access without specifying extensions. Additionally, COM files could be automatically executed during system startup via the AUTOEXEC.BAT batch file, which ran commands sequentially after CONFIG.SYS processing, allowing utilities or drivers to load seamlessly at boot.[35][36] In early Windows versions 1.0 to 3.1 (1985–1992), COM files executed within a DOS box, a virtualized DOS environment that inherited the native MS-DOS loader behavior for compatibility with the underlying DOS host. This setup allowed DOS-based programs, including COM files, to run windowed or full-screen under Windows' graphical shell, with the DOS box providing emulation for graphics modes and hardware access.[37] From Windows NT in 1993 onward, the NT Virtual DOS Machine (NTVDM) provided emulated support for COM files on 32-bit x86 systems, replicating the DOS loading process while enforcing the format's inherent 64 KB size limit through memory segmentation. 
NTVDM isolated 16-bit DOS applications in a virtualized subsystem, enabling execution without interfering with the 32-bit kernel.[38]
Support for COM files was gradually deprecated as legacy technology starting with Windows 95, though retained via virtual DOS mechanisms such as NTVDM in the NT family for backward compatibility. Microsoft placed NTVDM in maintenance mode due to its age and security vulnerabilities, recommending migration to modern 32-bit or 64-bit applications. This support persisted as an optional feature in 32-bit editions of Windows 10 until its end-of-life in October 2025. Windows 11, released in 2021 as a 64-bit-only OS, does not include NTVDM and drops all 16-bit DOS execution, in line with contemporary hardware and security standards.[38][39]
Implementation on Other Systems
The COM file format for 8086 processors in MS-DOS drew significant influence from the executable formats used in CP/M-86, Digital Research's operating system for Intel 8086 systems introduced in the late 1970s. While CP/M-86 primarily employed the .CMD extension for relocatable memory image files that supported direct loading into memory without relocation, its design emphasized simple binary loading mechanisms akin to the flat, non-relocatable structure of MS-DOS .COM files for 8086 binaries. This precursor approach facilitated efficient execution in resource-constrained environments by treating executables as raw memory images starting at offset 0x100, a convention that MS-DOS adopted to ensure compatibility with early x86 hardware.[40][41]
DR-DOS, released by Digital Research in 1988 as a compatible alternative to MS-DOS, retained the core COM file format while introducing variations such as extended file attributes and additional interrupt 21h functions for enhanced system calls. These modifications allowed DR-DOS to support the same direct loading process for .COM files, mapping the binary directly into memory at offset 0x0100 of the program's segment, without altering the fundamental binary structure, ensuring seamless execution of MS-DOS-compatible programs. However, certain system files like COMMAND.COM in DR-DOS 6.0 deviated by using the more advanced DOS executable (EXE) format for larger code requirements, though standard application .COM files remained unchanged in format.[42][43]
FreeDOS, an open-source DOS-compatible operating system initiated in 1994, provides full support for .COM files through its kernel loader, which emulates the MS-DOS loading behavior by reading the file as a raw binary image and executing it in real mode at the conventional memory offset.
The FreeDOS kernel (KERNEL.SYS) handles .COM execution identically to MS-DOS, loading the entire file into memory below 640 KB and transferring control to the entry point, thereby maintaining compatibility for legacy DOS software on modern hardware. This design choice ensures that .COM programs run without modification, leveraging the kernel's CONFIG.SYS and FDCONFIG.SYS directives for environment setup.[44][45] Emulators like DOSBox, first released in 2002, enable .COM file execution by simulating an IBM PC-compatible environment, including the DOS command interpreter and memory management necessary for loading and running these flat binaries. DOSBox mounts host directories as virtual drives and invokes .COM files via the emulated command line, replicating the original loading process with cycle-accurate timing for authentic behavior in games and utilities. Similarly, PCem (and its successor 86Box) supports .COM execution through full hardware emulation of x86 systems from the 1980s and 1990s, allowing users to boot DOS variants and run .COM programs as on genuine period hardware, complete with accurate BIOS interactions and peripheral simulation.[46][47] On Unix-like systems, .COM files can be executed using DOSemu, a Linux-based DOS emulation layer that provides a user-space environment for running DOS applications, including direct loading of .COM binaries via an emulated MS-DOS kernel. DOSemu integrates with the host filesystem, allowing seamless access to .COM files while handling real-mode execution through dynamic recompilation or interpretation. Wine does not support DOS .COM files natively. For executing DOS programs on Unix-like systems, dedicated emulators like DOSBox or DOSemu are recommended.[48] In embedded systems, .COM files find use in certain BIOS and UEFI-compatible tools for x86 architectures, particularly in legacy real-mode utilities embedded within firmware for diagnostic or boot-time operations that require DOS compatibility. 
These tools leverage the simple loading mechanism of .COM files to execute in the pre-OS environment, ensuring portability across x86-based embedded platforms without relying on complex loaders.[49]
Modern Applications
Compatibility in Contemporary OS
In contemporary 64-bit Windows 11 editions, COM files cannot be executed natively due to the lack of the NTVDM (NT Virtual DOS Machine) subsystem, which was limited to 32-bit Windows versions and placed in maintenance mode without further development.[38] The WOW64 subsystem supports 32-bit applications but does not handle 16-bit DOS executables like COM files, requiring third-party emulators such as DOSBox-X or NTVDMx64 to provide compatibility through simulated DOS environments.[38][50][51] Following the launch of the 64-bit-only Windows 11 in 2021 and updates including version 25H2 released in September 2025, these emulators have become essential for any DOS legacy support, as no built-in mechanisms exist for direct loading.[52][53]
Linux and macOS offer no native execution for COM files, as these systems do not include DOS-compatible loaders, instead relying on user-space emulators like DOSBox-X for lightweight simulation or QEMU for full-system virtualization paired with a DOS kernel.[50][54] This emulation approach ensures isolation but demands manual configuration to mount file systems and replicate hardware interfaces.[55]
Support for COM files persists in modern operating systems primarily for three groups: enterprises running legacy business software, where outdated DOS applications continue to operate critical workflows; retro computing communities preserving historical programs; and cybersecurity professionals analyzing malware samples that exploit the format to evade detection.[45][56][57] Contemporary development tools, such as the Netwide Assembler (NASM), enable the generation of COM-compatible flat binary outputs using the -f bin format, allowing developers to assemble and test DOS code across platforms like Windows, Linux, and macOS without platform-specific dependencies.[58]
Running COM files on 64-bit systems presents challenges, including the complete absence of direct execution paths, which blocks legacy loaders and requires virtualization layers like QEMU or VirtualBox to achieve hardware-accurate emulation and prevent compatibility gaps in timing or interrupts.[59][54]
As of 2025, COM file compatibility has become increasingly niche, with viability maintained through open-source initiatives like FreeDOS 1.4, released in April 2025, which provides an updated DOS-compatible kernel for running and developing such executables in emulated or bare-metal environments.[45][60]