Executable
In computing, an executable is a file that contains a program, consisting of machine-readable instructions and data that can be loaded into memory and directly executed by a computer's operating system or processor. These files instruct the system to perform specific tasks, such as running applications or system processes, and are typically inert until invoked. Executables take the form of compiled binary files, which hold machine code optimized for a particular architecture.
Executable files are essential components of software distribution and execution across operating systems, with standardized formats ensuring compatibility and efficient loading. For Microsoft Windows, the Portable Executable (PE) format structures these files to include headers, code sections, and resources, allowing the system to map them into process address space.[1] On Unix-like systems such as Linux, the Executable and Linkable Format (ELF) serves a similar role, organizing object code, symbols, and relocation data for dynamic linking and execution.[2] Apple's macOS uses the Mach-O format, which supports both executables and shared libraries with provisions for fat binaries that run on multiple architectures.[3] The PE format evolved from the earlier Common Object File Format (COFF), while ELF and Mach-O have distinct historical developments.[4]
Beyond technical structure, executables play a critical role in software security and portability, as they must be verified for integrity before execution to prevent malware infection. Operating systems employ mechanisms like digital signatures and code signing to authenticate executables, reducing risks from unauthorized or tampered files.[1] As computing has advanced, executables have adapted to support virtualization, containerization, and cross-platform execution, enabling software to run seamlessly across diverse hardware and environments.[5]
Definition and Fundamentals
Core Concept
An executable is a file or program segment containing machine code or bytecode that a central processing unit (CPU) or virtual machine can directly execute to perform specified tasks, in contrast to source code, which must be processed further, or scripts, which require an interpreter at runtime.[6][7] This form encodes instructions in a binary format native to the hardware or a managed runtime environment, allowing the computer to carry out operations without additional translation steps during execution.[6] Executables differ from non-executable files, such as source code or data files, by being pre-processed into a ready-to-run state that includes structural elements like headers for metadata, entry points, and dependency information, enabling direct loading into memory for execution.[7] Unlike human-readable source code, which is written in high-level languages and requires compilation or interpretation, or plain-text scripts that are executed line-by-line by an interpreter, executables represent a compiled or assembled output optimized for efficient hardware-level processing.[6]
In the software lifecycle, an executable serves as the final output of the build process, transforming developer-written code into a standalone artifact that can be distributed and run independently on compatible systems.[6] This role enables programs to operate without needing the original source or development tools present, facilitating deployment across environments. For example, a basic "Hello World" program assembled from low-level instructions produces a compact binary executable that outputs the message upon running, whereas an equivalent Python script remains as interpreted text requiring a runtime environment like the Python interpreter to execute.[8][6]
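To make the contrast concrete, the following minimal C program is a sketch (GCC on a Unix-like system is assumed for the build commands in the comment); it compiles to a self-contained native binary, whereas an equivalent Python script would remain interpreted text that needs a separate runtime:

    /* hello.c -- a minimal program compiled into a native executable.
     * Build and run (assuming GCC on a Unix-like system):
     *   gcc -o hello hello.c
     *   ./hello
     */
    #include <stdio.h>

    int main(void) {
        printf("Hello, World\n");   /* this logic is carried as machine code in the binary */
        return 0;                   /* exit status handed back to the operating system */
    }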
Key Characteristics
Executables feature a modular internal structure designed to facilitate loading and execution by the operating system. At the core is a header that provides essential metadata, including a magic number to identify the file format—such as 0x7F 'E' 'L' 'F' for ELF files or the "PE\0\0" signature for Portable Executable (PE) files—along with details on the file's architecture, entry point, and layout of subsequent sections.[9][1] Following the header, the file is divided into sections, each serving a distinct purpose: the .text section contains the machine code instructions, marked as read-only to prevent modification; the .data section holds initialized global and static variables; the .bss section reserves space for uninitialized variables, which are zeroed at runtime; and a symbol table section stores references to functions and variables for linking and debugging.[9][1] This segmented organization allows tools like linkers and loaders to efficiently parse and map the file into memory.[10]
Portability of executables is inherently limited by dependencies on the target CPU architecture and operating system. For instance, binaries compiled for x86 architectures use a different instruction set than those for ARM, rendering them incompatible without recompilation or emulation.[11] Additionally, platform variations introduce challenges such as endianness—x86 systems employ little-endian byte ordering while some other architectures use big-endian—and differing calling conventions that dictate how function parameters are passed between caller and callee.[12] These factors necessitate architecture-specific and OS-specific builds to ensure correct execution, as mismatches can lead to crashes or undefined behavior.[13]
Key attributes of executables include memory protection mechanisms that enhance security and stability during runtime. The code segment (.text) is configured with read-only and executable permissions, preventing accidental or malicious writes to instructions while allowing the CPU to fetch and execute them.[14] In contrast, data segments (.data and .bss) are granted read-write permissions for variable modifications but are non-executable to mitigate code injection risks.[14] Runtime memory is further segregated into stack and heap regions: the stack, used for local variables and function calls, operates on a last-in-first-out basis with automatic allocation and deallocation; the heap, used for dynamic allocations via functions like malloc, grows as needed and requires explicit management to avoid leaks or overflows.[15] This separation ensures efficient resource use and isolation of execution contexts.[16]
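A short example, assuming GCC on a Unix-like system, illustrates both points at once: where typical program objects land among the sections, and the stack/heap split at runtime. The file and variable names are illustrative, and the section sizes of the resulting binary can be checked with tools such as size or readelf -S.

    /* layout.c -- typical placement of program objects (ELF conventions assumed).
     * Inspect after building:  gcc -o layout layout.c && size layout
     */
    #include <stdio.h>
    #include <stdlib.h>

    int counter = 42;              /* initialized global        -> .data */
    int total;                     /* uninitialized global      -> .bss, zero-filled at load */
    const char greeting[] = "hi";  /* read-only constant        -> .rodata */

    int main(void) {               /* machine code              -> .text, read-only and executable */
        int local = 7;                     /* local variable     -> stack, reclaimed on return */
        int *dyn = malloc(sizeof *dyn);    /* dynamic allocation -> heap, freed explicitly */
        if (!dyn)
            return 1;
        *dyn = counter + total + local;
        printf("%s %d\n", greeting, *dyn);
        free(dyn);
        return 0;
    }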
The size of an executable binary is influenced by optimization techniques applied during compilation and linking, which balance performance, functionality, and efficiency. Dead code elimination, a common optimization, removes unused functions, variables, and instructions that are never reached, directly reducing the final file size and improving load times; for example, interprocedural analysis can significantly reduce code size in large programs by identifying unreferenced sections.[17] Other factors include the inclusion of debug symbols (which can be stripped post-build), alignment padding for hardware requirements, and the embedding of runtime libraries, all of which contribute to variability in binary footprint across builds.[18] These optimizations prioritize minimalism without sacrificing correctness, making executables more suitable for distribution and deployment.[17]
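The following sketch shows how unused code and symbol information can be trimmed from a build; the flags are standard GCC and GNU binutils options, and the file name is illustrative:

    /* deadcode.c -- never_called() is unreachable and can be discarded at link time.
     * Place each function in its own section, let the linker drop unreferenced ones,
     * then strip symbol and debug information (GCC and binutils assumed):
     *   gcc -Os -ffunction-sections -fdata-sections -Wl,--gc-sections -o demo deadcode.c
     *   strip demo
     */
    #include <stdio.h>

    void never_called(void) { puts("dead code"); }

    int main(void) {
        puts("live code");
        return 0;
    }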
Creation Process
Compilation and Linking
The compilation phase of creating an executable begins with translating high-level source code, such as C or C++, into machine-readable object files using a compiler like the GNU Compiler Collection (GCC).[19] This process involves multiple sub-phases in the compiler's frontend: lexical analysis, where the source code is scanned to identify tokens such as keywords, identifiers, and operators while ignoring whitespace and comments; syntax analysis or parsing, which checks the token sequence against the language's grammar to build a parse tree representing the program's structure; and semantic analysis, which verifies type compatibility, scope rules, and other meaning-related aspects to ensure the code is valid beyond syntax.[20] Following these, the compiler generates intermediate code, applies optimizations to improve efficiency (such as dead code elimination or loop unrolling), and produces target-specific assembly code through the backend's code generation phase.[21]
The assembly step converts the generated assembly code into relocatable object files, typically using the GNU Assembler (as), which translates low-level instructions into binary object code while preserving relocation information for unresolved addresses and symbols.[22] These object files contain the program's machine code segments, data, and symbol tables but are not yet executable, as external references (like function calls to libraries) remain unresolved.[22]
In the linking phase, a linker such as GNU ld combines multiple object files and libraries into a single executable image by resolving symbols—mapping references to their definitions—and assigning final memory addresses. Static linking embeds the entire contents of required libraries directly into the executable, resulting in a self-contained file that includes all necessary code at build time, which increases file size but eliminates runtime dependencies.[23] In contrast, dynamic linking incorporates only stubs or references to external libraries, deferring full resolution to runtime via a dynamic linker, which allows shared libraries to be loaded once and reused across programs but requires the libraries to be present on the target system.[23] The linker also handles relocation, adjusting addresses in the object code to fit the final layout, and produces formats like ELF for Unix-like systems.
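A minimal two-file sketch of symbol resolution and of the static versus dynamic choice follows; the file names and commands are illustrative, and the -static variant additionally requires a static C library to be installed:

    /* add.c -- defines the symbol that the other translation unit references. */
    int add(int a, int b) { return a + b; }

    /* main.c -- declares add() but leaves it unresolved until link time. */
    #include <stdio.h>
    int add(int a, int b);

    int main(void) {
        printf("%d\n", add(2, 3));
        return 0;
    }

    /* Build steps (GCC assumed):
     *   gcc -c add.c main.c              # relocatable objects; add() still unresolved in main.o
     *   gcc -o prog add.o main.o         # link: symbol resolved, addresses assigned, libc referenced dynamically
     *   gcc -static -o prog add.o main.o # static variant: library code copied into the executable
     */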
Source to Executable Conversion
The transformation from high-level source code, such as C or C++ files, to a runnable executable follows a structured pipeline that ensures the code is processed into machine-readable instructions compatible with the target system. This end-to-end workflow begins with preprocessing and progresses through compilation, assembly, and linking, automating the conversion while resolving dependencies and optimizing for execution.[19]
Preprocessing is the initial stage, where the compiler's preprocessor expands macros, resolves include directives to incorporate header files, and handles conditional compilation based on directives like #ifdef. This step modifies the source code to produce an intermediate form ready for further processing, often expanding files like .c or .cpp without altering the core logic.[19] The output is then fed into compilation, where the compiler translates the preprocessed code into assembly language, generating human-readable instructions specific to the target architecture. Assembly follows immediately, converting this assembly code into object files (typically .o or .obj) that contain relocatable machine code segments.[19] Finally, linking combines these object files with required libraries, resolving external references to form a cohesive executable file, such as a.out on Unix-like systems or an .exe on Windows.[19]
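Each stage can also be run individually to inspect its output; the file name, macro, and GCC invocations below are illustrative assumptions:

    /* stages.c -- exercises preprocessing, compilation, assembly, and linking one step at a time.
     *   gcc -E stages.c -o stages.i    # preprocess: expand #include and evaluate #ifdef
     *   gcc -S stages.i -o stages.s    # compile: emit target-specific assembly
     *   gcc -c stages.s -o stages.o    # assemble: produce a relocatable object file
     *   gcc stages.o -o stages         # link: resolve libc references into an executable
     */
    #include <stdio.h>

    int main(void) {
    #ifdef DEBUG
        puts("debug build");            /* kept only when compiled with -DDEBUG */
    #endif
        puts("hello from the pipeline");
        return 0;
    }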
To automate and scale this pipeline across complex projects involving multiple source files, build systems play a crucial role in managing dependencies, incremental builds, and platform variations. Makefiles, processed by the GNU Make tool, define rules specifying targets (e.g., the executable), prerequisites (e.g., object files), and shell commands (recipes) to execute the stages, using file timestamps to recompile only modified components.[24] CMake, a cross-platform meta-build system, generates native build files (e.g., Makefiles or Visual Studio projects) from a high-level CMakeLists.txt script, using commands like add_executable() to define the output and target_link_libraries() to handle linking dependencies.[25] Integrated development environments (IDEs), such as Visual Studio or Eclipse, often integrate these tools or provide built-in builders to streamline the workflow within a graphical interface.[25]
Cross-compilation extends this pipeline to produce executables for architectures different from the host machine, enabling development on powerful desktops for embedded or remote targets. For instance, with GCC, developers invoke a toolchain prefixed with the target triple (e.g., arm-linux-gnueabi-gcc) so that preprocessing, compilation, and assembly generate code for the desired platform, enabling, for example, Windows executables to be built on a Linux host.[26] This requires matching libraries and headers for the target, often managed by build systems like CMake through toolchain files that override default settings.[26]
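In practice only the compiler driver changes; the toolchain name below is illustrative and assumes the corresponding cross toolchain is installed on the host:

    /* cross.c -- identical source; the toolchain selects the target architecture.
     *   gcc -o cross-host cross.c                   # native build for the host CPU
     *   arm-linux-gnueabi-gcc -o cross-arm cross.c  # cross build for 32-bit ARM Linux
     *   file cross-arm                              # should report an ARM binary rather than x86-64
     */
    int main(void) { return 0; }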
Throughout the conversion, error handling is essential to identify issues early and maintain code integrity. During compilation, type mismatches—such as incompatible pointer assignments or implicit conversions that alter values—trigger warnings or errors, configurable via flags like -Wconversion or -Wincompatible-pointer-types to enforce strict type checking.[27] In the linking phase, unresolved symbols occur when references to functions or variables lack corresponding definitions in the object files or libraries, leading to linker errors that halt the build unless suppressed with options like --unresolved-symbols=ignore-all.[28] These issues, often stemming from missing includes, incorrect library paths, or mismatched declarations across files, demand iterative debugging to ensure a successful executable output.[28]
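Both failure classes can be reproduced with a short snippet (illustrative only; whether the pointer mismatch is reported as a warning or an error depends on the compiler and its version):

    /* errors.c -- provokes a type-mismatch diagnostic and an unresolved symbol.
     *   gcc -Wincompatible-pointer-types -c errors.c   # compile-time diagnostic
     *   gcc errors.o -o errors                         # link-time "undefined reference to `missing'"
     */
    int missing(void);                 /* declared here but defined nowhere */

    int main(void) {
        long value = 1;
        int *p = &value;               /* incompatible pointer assignment: long * assigned to int * */
        (void)p;
        return missing();              /* the linker cannot resolve this call */
    }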
Types and Formats
Native vs. Managed Executables
Native executables are programs compiled directly into machine code tailored to a specific CPU architecture, allowing the operating system to execute them without additional interpretation or translation layers.[29] This direct compilation, often from languages like C or C++, results in binaries such as ELF files on Linux or PE files on Windows, with no runtime overhead during execution beyond the OS loader.[30] On the other hand, they require recompilation for different platforms, limiting portability, and place the burden of memory management and error handling on the developer, which can lead to issues like buffer overflows if not implemented carefully.[31]
Managed executables, by contrast, are compiled into an intermediate representation, such as Common Intermediate Language (CIL) in .NET or bytecode in Java, which is not directly executable by the hardware.[29] These executables rely on a virtual machine, such as the Common Language Runtime (CLR) for .NET or the Java Virtual Machine (JVM), to perform just-in-time (JIT) compilation at runtime, converting the intermediate code to native machine instructions as needed.[32] Examples include .NET assemblies (.dll or .exe files containing CIL, structured in the PE format on Windows) and Java class files (.class files containing bytecode, typically packaged in JAR archives based on the ZIP format).[30][33]
The primary advantages of native executables lie in their performance and efficiency: they execute at full hardware speed with minimal startup latency and no ongoing runtime costs, making them ideal for resource-constrained or high-performance applications like system software.[31] However, their platform specificity reduces cross-architecture portability, requiring separate builds for each target environment, such as x86 versus ARM. Managed executables offer enhanced portability, as the same intermediate code can run on any platform with the appropriate virtual machine, facilitating "write once, run anywhere" development.[32] They also provide built-in security features, such as automatic memory management via garbage collection and type safety enforced by the runtime, reducing common vulnerabilities like memory leaks.[29] Drawbacks include dependency on the runtime environment, which adds installation requirements and potential performance overhead from JIT compilation, though optimizations mitigate this in modern implementations.[34]
Hybrid approaches bridge these paradigms by applying ahead-of-time (AOT) compilation to managed code, producing native executables from intermediate representations without JIT at runtime. In .NET, Native AOT compiles CIL directly to machine code during the build process, yielding self-contained binaries with faster startup times and smaller memory footprints compared to traditional JIT-managed executables, while retaining managed benefits like garbage collection.[34] This method enhances deployment scenarios, such as cloud-native applications or mobile apps, by reducing runtime dependencies, though it may limit dynamic features like reflection.[35]
Common File Formats
Executable file formats standardize the structure of binaries across operating systems, enabling loaders to map code, data, and metadata into memory for execution. Major formats include the Portable Executable (PE) for Windows, the Executable and Linkable Format (ELF) for Unix-like systems, and Mach-O for Apple platforms, each defining headers, sections, and linking information tailored to their ecosystems. Additional formats like the legacy Common Object File Format (COFF) and the WebAssembly binary format (WASM) address specialized or emerging use cases, such as object files and web-native execution.[1][9][3]
The Portable Executable (PE) format serves as the standard for executable files on Microsoft Windows and Win32/Win64 systems, encompassing applications (.exe files) and dynamic-link libraries (.dll files). It begins with a DOS header for compatibility with MS-DOS, followed by a PE signature, COFF file header, optional header with subsystem information and data directories (such as imports and exports), and an array of section headers that define the layout of segments like .text for executable code, .data for initialized data, .rdata for read-only data, and .bss for uninitialized data. This structure allows the Windows loader to relocate the image, resolve imports, and initialize the process environment, supporting features like address space layout randomization (ASLR) for security. PE files are extensible, accommodating debug information, resources, and certificates in dedicated sections.[1]
The Executable and Linkable Format (ELF) is the predominant format for executables, object files, shared libraries, and core dumps on Unix-like operating systems, including Linux and Solaris. Defined by the Tool Interface Standard, an ELF file starts with an ELF header specifying the file class (32-bit or 64-bit), endianness, ABI version, and entry point, followed by optional program header tables that describe loadable segments (e.g., PT_LOAD for code and data) and section header tables that organize content into sections like .text for code, .data for initialized variables, .rodata for constants, and .symtab for symbols. Program headers guide the dynamic loader in mapping segments into virtual memory, while sections facilitate linking and debugging; shared objects (.so files) use ELF to enable dynamic linking at runtime. ELF's flexibility supports multiple architectures and processor-specific features, such as note sections for auxiliary information.[9]
Mach-O, short for Mach Object, is the executable format used in macOS, iOS, watchOS, and tvOS, organizing binaries into a header, load commands, and segments containing sections for efficient loading by the dyld dynamic linker. The header identifies the CPU type, file type (e.g., MH_EXECUTE for executables or MH_DYLIB for libraries), and number of load commands, which specify details like segment permissions, symbol tables, and dynamic library paths. Segments such as __TEXT (for code and read-only data) and __DATA (for writable data) group related sections, with __LINKEDIT holding linking information; Mach-O supports "fat" binaries that embed multiple architectures (e.g., x86_64 and arm64) in one file, allowing universal execution across devices like Intel-based Macs and Apple Silicon. This format integrates with Apple's code-signing system, embedding entitlements and signatures directly in the binary.[3]
Other notable formats include the Common Object File Format (COFF), a legacy predecessor to PE used primarily for object files (.obj) in Windows toolchains and older Unix systems, featuring a file header with machine type and section count, followed by optional headers, section tables, and raw section data for relocatable code and symbols. COFF lacks the full executable portability of PE but remains relevant in build processes for its simplicity in handling intermediate compilation outputs. In contrast, WebAssembly (WASM) provides a platform-independent binary format for high-performance execution in web browsers and standalone runtimes, encoding modules as a sequence of typed instructions in a compact, linear bytecode structure with sections for code, data, types, functions, and imports/exports, compiled from languages like C++ or Rust to run sandboxed at near-native speeds without traditional OS dependencies.[1][36]
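Because each format announces itself with distinctive magic bytes, a small utility can make a rough classification from the first few bytes of a file; this sketch checks only the leading bytes and does not validate the rest of the structure:

    /* whatformat.c -- classify a binary by the magic bytes at its start.
     * ELF files begin with 0x7F 'E' 'L' 'F'; PE images begin with the "MZ" DOS stub,
     * with the "PE\0\0" signature at the offset stored at byte 0x3C; Mach-O files use
     * the 0xFEEDFACE/0xFEEDFACF magic (byte order depends on the target) and
     * 0xCAFEBABE for fat binaries.
     */
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char *argv[]) {
        if (argc < 2) {
            fprintf(stderr, "usage: %s FILE\n", argv[0]);
            return 1;
        }
        unsigned char magic[4] = {0};
        FILE *f = fopen(argv[1], "rb");
        if (!f || fread(magic, 1, sizeof magic, f) != sizeof magic) {
            perror(argv[1]);
            return 1;
        }
        fclose(f);
        if (memcmp(magic, "\x7F" "ELF", 4) == 0)
            puts("ELF executable or shared object");
        else if (magic[0] == 'M' && magic[1] == 'Z')
            puts("PE image (MZ stub present)");
        else
            puts("not ELF or PE (possibly Mach-O or another format)");
        return 0;
    }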
Execution Mechanism
Loading and Running
The loading of an executable into memory begins when the operating system kernel receives a request to execute a program file, typically through system calls that initiate process creation. The kernel first reads the executable's header to verify its format and extract metadata about memory layout, such as segment sizes and permissions. For instance, in Linux systems using the ELF format, the kernel's load_elf_binary() function parses the ELF header and program header table to identify loadable segments like code, data, and BSS.[37] Similarly, in Windows with the PE format, the loader examines the DOS header, NT headers, and optional header to determine the image base and section alignments.[1]
Once headers are parsed, the kernel maps the executable's segments into the process's virtual address space, allocating memory pages as needed without immediately loading all physical pages to support demand paging. Read-only segments like code are mapped with execute permissions, while data segments receive read-write access; the BSS segment, representing uninitialized data, is zero-filled by allocating fresh pages. The kernel also establishes the stack and heap regions: the stack grows downward from a high virtual address, often with address space layout randomization (ASLR) for security, while the heap starts just after the data segment and expands via system calls like brk() or mmap(). In Linux, setup_arg_pages() configures the initial stack size and adjusts memory accounting for argument pages.[37] Windows performs analogous mappings through the Ntdll.dll loader, reserving virtual memory for sections and committing pages on demand.[1]
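On Linux, the resulting layout can be observed from inside a running process by reading /proc/self/maps; the sketch below is Linux-specific:

    /* maps.c -- print this process's own memory map (Linux only).
     * Each line of /proc/self/maps shows a mapped region, its permissions (r/w/x/p),
     * and its origin: the executable's code and data segments, the heap, mapped
     * shared libraries, and the stack.
     */
    #include <stdio.h>

    int main(void) {
        FILE *maps = fopen("/proc/self/maps", "r");
        if (!maps) {
            perror("/proc/self/maps");
            return 1;
        }
        int c;
        while ((c = fgetc(maps)) != EOF)
            putchar(c);
        fclose(maps);
        return 0;
    }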
Process creation integrates loading in operating system-specific models. In Unix-like systems such as Linux, the common approach uses the fork-exec paradigm: the fork() system call duplicates the parent process to create a child, sharing the address space initially via copy-on-write, after which the child invokes execve() to replace its image with the new executable.[38][39] The execve() call triggers the kernel to load the binary, clear the old address space via flush_old_exec(), and set up the new one, returning control to the child only on success. In contrast, Windows employs the CreateProcess() API, which atomically creates a new process object, allocates its virtual address space, loads the specified executable, and starts its primary thread in a single operation, inheriting the parent's environment unless overridden.[40]
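A minimal sketch of the fork-exec model on a POSIX system follows; the program executed by the child, /bin/echo, is an arbitrary choice:

    /* spawn.c -- fork-exec pattern: duplicate the process, then replace the child's image.
     * POSIX-specific; on Windows a single CreateProcess() call covers both steps.
     */
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        pid_t pid = fork();                       /* duplicate the calling process */
        if (pid < 0) {
            perror("fork");
            return 1;
        }
        if (pid == 0) {                           /* child: replace its image with /bin/echo */
            execl("/bin/echo", "echo", "hello from the child", (char *)NULL);
            perror("execl");                      /* reached only if the exec failed */
            _exit(127);
        }
        int status = 0;
        waitpid(pid, &status, 0);                 /* parent: reap the child and read its status */
        if (WIFEXITED(status))
            printf("child exit code: %d\n", WEXITSTATUS(status));
        return 0;
    }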
After loading, execution begins at the designated entry point, with the kernel performing final initializations. In Linux ELF executables, the kernel jumps to the entry address from the ELF header (or the dynamic linker's if present) via start_thread(), having populated the stack with the argument count argc, an array of argument pointers argv (with argv[0] typically the program name), environment pointers envp, and an auxiliary vector containing metadata like the entry point and page size.[37][39] The actual entry symbol _start, provided by the C runtime (e.g., in glibc's crt1.o), receives these via the stack or registers, initializes the runtime environment (such as constructors and global variables), and invokes __libc_start_main() to call the user's main(int argc, char *argv[]) function.[41] For Windows PE executables, the loader computes the entry point by adding the AddressOfEntryPoint RVA from the optional header to the image base, then starts the primary thread there; the C runtime entry (e.g., mainCRTStartup) similarly sets up argc and argv from the command line before calling main.[1]
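The arguments and environment handed over by the loader are visible directly from main, as in the sketch below; the three-parameter form of main is a common Unix extension rather than standard C:

    /* args.c -- what the C runtime passes to main after _start has set up the process. */
    #include <stdio.h>

    int main(int argc, char *argv[], char *envp[]) {
        printf("argc = %d\n", argc);
        for (int i = 0; i < argc; i++)            /* argv[0] is conventionally the program name */
            printf("argv[%d] = %s\n", i, argv[i]);
        if (envp && envp[0])
            printf("first environment entry: %s\n", envp[0]);
        return 0;                                 /* becomes the process exit status */
    }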
Process termination occurs when the program calls an exit function, such as exit() in C, which sets an exit code and triggers cleanup. The exit code, an integer typically 0 for success and non-zero for failure, is returned to the parent process; in Linux, the least significant byte of the status is passed via wait() or waitpid(), while the kernel reaps the process, freeing its memory mappings, closing file descriptors, and releasing other resources to prevent leaks.[42] If the parent ignores SIGCHLD or has set SA_NOCLDWAIT, the child is immediately reaped without becoming a zombie. In Windows, ExitProcess() sets the exit code (queryable via GetExitCodeProcess()) and notifies loaded DLLs, terminates all threads, unmaps the image from memory, and closes kernel handles, though persistent objects like files may remain if referenced elsewhere.[43] Forced termination via signals (e.g., SIGKILL in Unix) or TerminateProcess() in Windows bypasses runtime cleanup but still reclaims system resources.
Dynamic Linking and Libraries
Dynamic linking allows executables to reference external shared libraries at runtime rather than embedding all code during compilation, enabling modular program design where libraries like .dll files on Windows or .so files on Unix-like systems are loaded on demand. This process relies on symbol tables within the executable and library files, which contain unresolved references to functions and variables; the runtime system resolves these symbols by searching for matching exports in loaded libraries, often using a dynamic symbol table for efficient lookups. Lazy loading defers the actual loading of a library until the first reference to one of its symbols is encountered, optimizing memory usage by avoiding unnecessary loads for unused components.
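The same on-demand behaviour can be requested explicitly through the POSIX dlopen interface; in the sketch below the library (libm.so.6) and symbol (cos) are merely convenient choices available on glibc-based Linux systems:

    /* plugin.c -- resolve a symbol from a shared library at runtime.
     * Build (GCC on Linux assumed):  gcc -o plugin plugin.c -ldl
     * RTLD_LAZY defers binding of the library's function symbols until first use.
     */
    #include <dlfcn.h>
    #include <stdio.h>

    int main(void) {
        void *handle = dlopen("libm.so.6", RTLD_LAZY);    /* load the math library on demand */
        if (!handle) {
            fprintf(stderr, "%s\n", dlerror());
            return 1;
        }
        double (*cosine)(double) = (double (*)(double))dlsym(handle, "cos");
        if (!cosine) {
            fprintf(stderr, "%s\n", dlerror());
            dlclose(handle);
            return 1;
        }
        printf("cos(0) = %f\n", cosine(0.0));
        dlclose(handle);
        return 0;
    }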
The runtime loader, such as dyld on macOS or ld.so on Linux, manages this linking process by handling symbol resolution, applying relocations to adjust addresses based on the library's load position, and enforcing versioning to ensure compatibility between executable and library versions. For instance, ld.so on Linux uses a dependency tree to load prerequisite libraries recursively and performs global symbol resolution to bind imports across modules. Versioning mechanisms, like sonames in ELF files, prevent conflicts by specifying minimum required library versions, allowing multiple variants to coexist on the system.
One key advantage of dynamic linking is the reduction in executable file size, as shared code is stored once in libraries and reused across multiple programs, which also facilitates easier updates to libraries without recompiling dependent executables. However, it introduces challenges such as dependency conflicts, colloquially known as "DLL hell" on Windows, where mismatched library versions can cause runtime failures if the system loads an incompatible variant.
To support dynamic linking effectively, executables and shared libraries often employ position-independent code (PIC), which compiles instructions to be relocatable without fixed addresses, using techniques like relative addressing and indirection through the global offset table (GOT) and procedure linkage table (PLT) to defer address resolution until runtime. This enables libraries to be loaded at arbitrary memory locations and shared among processes, enhancing system efficiency, though it may incur a slight performance overhead due to indirect jumps. In contrast to static linking, where all library code is incorporated at build time, dynamic linking promotes resource sharing but requires careful management of dependencies.
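A sketch of how such a position-independent shared library is built and consumed on a Unix-like system with GCC; the file names, paths, and commands are illustrative:

    /* greet.c -- compiled as position-independent code into a shared object.
     *   gcc -fPIC -shared -o libgreet.so greet.c   # relocatable at any load address
     *   gcc -o app app.c -L. -lgreet               # app records a reference, not a copy
     *   LD_LIBRARY_PATH=. ./app                    # the dynamic loader finds libgreet.so at run time
     */
    #include <stdio.h>

    void greet(const char *name) {
        printf("hello, %s\n", name);   /* the call into libc is routed through the PLT/GOT */
    }

    /* app.c -- uses the library through its exported symbol. */
    void greet(const char *name);

    int main(void) {
        greet("world");
        return 0;
    }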
Security Considerations
Vulnerabilities and Protections
Executables are susceptible to buffer overflow vulnerabilities, where programs write more data to a fixed-length buffer than it can hold, potentially overwriting adjacent memory regions such as return addresses on the stack.[44] This occurs due to the intermixing of data storage areas and control data in memory, allowing malformed inputs to alter program control flow and enable arbitrary code execution.[44] Stack smashing attacks exemplify this risk, exploiting stack-based buffer overflows in C programs by using functions like strcpy() to copy excessive data, overwriting the return address to redirect execution to injected shellcode.[45]
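The pattern looks like the fragment below, which is deliberately unsafe and shown for illustration only; replacing the unchecked copy with a bounded one removes the overflow:

    /* overflow.c -- stack buffer overflow through an unchecked copy (do not reuse as-is). */
    #include <stdio.h>
    #include <string.h>

    void vulnerable(const char *input) {
        char buf[16];
        strcpy(buf, input);            /* no length check: input longer than 15 bytes overwrites
                                          adjacent stack memory, including the saved return address */
        printf("%s\n", buf);
    }

    void safer(const char *input) {
        char buf[16];
        snprintf(buf, sizeof buf, "%s", input);   /* bounded copy: truncates instead of overflowing */
        printf("%s\n", buf);
    }

    int main(int argc, char *argv[]) {
        if (argc > 1)
            safer(argv[1]);            /* the vulnerable variant is shown only for contrast */
        return 0;
    }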
Code injection vulnerabilities further compound these threats, arising when executables fail to neutralize special elements in externally influenced inputs, permitting attackers to insert and execute malicious code.[46] For instance, unvalidated user inputs can be interpreted as executable commands in languages like PHP or Python, leading to unauthorized actions such as system calls.[46]
To mitigate these exploits, operating systems implement protections like Address Space Layout Randomization (ASLR), which randomly relocates key areas of a process's virtual address space—including stacks, heaps, and loaded modules—at runtime to thwart address prediction by attackers.[47] Complementing ASLR, Data Execution Prevention (DEP) uses the processor's NX (No eXecute) bit to mark certain memory pages as non-executable, preventing buffer overflow payloads from running code in data regions like the stack or heap.[48] If execution is attempted on non-executable memory, DEP triggers an access violation, terminating the process to block exploitation.[48]
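The effect of ASLR can be observed by printing a few addresses and running the same binary twice; under ASLR the printed values differ between runs. The sketch assumes GCC on a Unix-like system and builds a position-independent executable so that the code segment is randomized as well:

    /* aslr.c -- print addresses from different regions, then compare two runs.
     *   gcc -fPIE -pie -o aslr aslr.c && ./aslr && ./aslr
     */
    #include <stdio.h>
    #include <stdlib.h>

    int global_var;

    int main(void) {
        int stack_var;
        void *heap_ptr = malloc(16);
        printf("code : %p\n", (void *)main);         /* .text, randomized for PIE binaries */
        printf("data : %p\n", (void *)&global_var);  /* .data/.bss segment */
        printf("stack: %p\n", (void *)&stack_var);   /* stack region */
        printf("heap : %p\n", heap_ptr);             /* heap region */
        free(heap_ptr);
        return 0;
    }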
Executables also serve as primary vectors for malware, including viruses that attach to legitimate files and activate upon execution, spreading via shared disks or networks.[49] Trojans similarly masquerade as benign executables, such as email attachments or downloads, tricking users into running them to grant attackers backdoor access or data exfiltration capabilities.[50] Malware detection often relies on heuristic methods, which analyze runtime behaviors in simulated environments to identify suspicious actions like self-replication, even for unknown variants without matching signatures.[51]
Best practices for securing executables emphasize input validation, where data is checked early against allowlists for format, length, and semantics to block malformed inputs that could trigger overflows or injections.[52] Additionally, least privilege execution restricts processes to minimal necessary permissions, confining potential damage from compromised executables by elevating privileges only when required and dropping them immediately afterward.[53]
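A brief sketch combining both practices on a POSIX system; the 32-byte buffer, the alphanumeric allowlist, and the privilege-dropping logic are illustrative choices rather than fixed recommendations:

    /* harden.c -- bounded input, allowlist validation, and an early privilege drop. */
    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void) {
        if (geteuid() == 0) {                         /* running with elevated rights? */
            if (setgid(getgid()) != 0 || setuid(getuid()) != 0) {
                perror("drop privileges");            /* refuse to continue if the drop fails */
                return 1;
            }
        }

        char name[32];
        if (!fgets(name, sizeof name, stdin))         /* bounded read: cannot overflow the buffer */
            return 1;
        name[strcspn(name, "\n")] = '\0';             /* strip the trailing newline, if any */

        for (const char *p = name; *p; p++)           /* allowlist: letters and digits only */
            if (!isalnum((unsigned char)*p)) {
                fprintf(stderr, "rejected input\n");
                return 1;
            }
        printf("hello, %s\n", name);
        return 0;
    }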