In computer graphics, 2.5D (two-and-a-half-dimensional) refers to techniques that bridge two-dimensional (2D) and three-dimensional (3D) representations by incorporating limited depth or spatial elements to simulate 3D effects while retaining the simplicity and efficiency of 2D processing. This approach typically involves adding depth maps to 2D images or videos, using height fields for terrain rendering, or employing billboarding and ray casting to mimic 3D scenes from a fixed viewpoint without full volumetric modeling.[1] Common in early video games like Doom (1993), which projected 2D sprites into a pseudo-3D environment rendered from an essentially flat map, 2.5D enables enhanced visual depth at lower computational cost than true 3D polygon rendering.

In video games, 2.5D commonly describes titles with 2D gameplay mechanics, such as side-scrolling or top-down movement restricted to a plane, that are rendered using 3D geometry for environments and characters to provide visual depth and parallax effects.[2] Examples include isometric games like Diablo (1996) and modern titles such as Octopath Traveler (2018), where orthographic cameras maintain a fixed perspective while 3D models create layered, three-quarter views.[3] This hybrid style balances the accessibility of 2D controls with the aesthetic richness of 3D assets, often using tilemaps or sprite sheets extruded into limited z-depth.

In animation and non-photorealistic rendering (NPR), 2.5D techniques involve constructing models from 2D hand-drawn shapes that support out-of-plane rotations and stylizations, preserving the flat aesthetic of traditional cartoons while allowing 3D-like motion.[4] For instance, 2.5D cartoon models interpolate between key 2D drawings to generate smooth animations with implicit 3D structure, as seen in tools for vector-based art that avoid full 3D rigging.[5] Similarly, 2.5D video processing applies non-photorealistic stylization to footage augmented with depth data, enabling consistent edge detection and shading across frames for artistic effects.[6]
History
Pre-Digital Origins
The conceptual foundations of 2.5D representations trace back to ancient civilizations, where artists and draftsmen employed axonometric and oblique projections to depict three-dimensional forms on two-dimensional surfaces without the distortions of linear perspective. In ancient China, during the Han Dynasty (202 BCE–220 CE), technical drawings and painted scrolls utilized oblique parallel perspective, a method that maintained parallel projection lines and equal object sizes regardless of depth, allowing for accurate spatial representation in architectural plans and artistic scenes. This approach, evident in early handscrolls, reflected a philosophical emphasis on objective depiction of nature, avoiding the subjective convergence of lines toward vanishing points seen in Western traditions.[7]

In ancient Greece, from the sixth century BCE onward, early geometers such as Thales of Miletus and Hippocrates of Chios developed foundational geometric principles that contributed to methods of spatial projection, influencing later works like Euclid's Elements (c. 300 BCE) for accurate 2D representations of 3D forms without perspective foreshortening. These practices enabled the clear visualization of complex structures, such as architectural and mechanical sketches, setting precedents for non-distorted 3D-to-2D mappings.[8] By the Renaissance, Albrecht Dürer advanced these ideas through his seminal work Underweysung der Messung mit dem Zirckel und Richtscheyt (1525), where he systematically explored descriptive geometry to construct precise representations of volumes and shadows, laying groundwork for 2.5D-like techniques in art and engineering. Dürer's methods, including geometric constructions for polyhedra and conic sections, emphasized mathematical rigor in simulating depth on flat planes.[9][10]

In the realm of animation, the 1930s marked a pivotal analog innovation with Walt Disney Studios' development of the multiplane camera, first showcased in the short The Old Mill (1937) and used extensively in Snow White and the Seven Dwarfs (1937). This device layered multiple sheets of hand-drawn animation cels at varying distances from the camera lens, allowing elements to move at different speeds to simulate parallax and depth in otherwise flat 2D sequences. By capturing footage frame by frame, the multiplane camera created a sense of volumetric space, enhancing narrative immersion without relying on full 3D modeling.[11]

Early 20th-century graphic design and technical illustration further embraced isometric views to convey volume in blueprints and diagrams, particularly in engineering and architecture. Isometric projection, where the three principal axes are equally foreshortened at 120-degree angles, provided a distortion-free alternative to perspective for illustrating machinery, buildings, and products, enabling viewers to grasp spatial relationships at a glance. This technique gained prominence in industrial catalogs and instructional materials, bridging artistic expression with practical visualization.[12]
Digital Developments
The concept of 2.5D emerged in digital computing during the 1960s with early computer-aided design (CAD) systems, particularly Ivan Sutherland's Sketchpad program developed in 1963 at MIT. Sketchpad introduced interactive graphical manipulation on a display, allowing users to create and modify line drawings using a light pen, which laid the groundwork for representing three-dimensional models through two-dimensional interfaces on the TX-2 computer. This system enabled real-time constraint-based editing of geometric figures, marking a pivotal shift toward hybrid 2D-3D visualization in engineering and design applications.[13][14]

By the 1980s, 2.5D techniques gained commercial traction in arcade video games through sprite-scaling methods that simulated depth without full 3D rendering. Sega's 1985 release of Space Harrier utilized the company's Super Scaler hardware to dynamically scale and position 2D sprites against scrolling backgrounds, creating a pseudo-3D flying shooter experience with high frame rates and immersive perspective effects. This approach, processing up to 32,000 pixels per frame, represented an early hardware-accelerated form of 2.5D that influenced subsequent arcade titles and console ports.[15][16]

The early 1990s saw further hardware-specific advancements in console gaming, exemplified by Nintendo's Mode 7 feature on the Super Nintendo Entertainment System (SNES), introduced with the console's 1990 launch. Mode 7 allowed a single background layer to be rotated, scaled, and affine-transformed in real time, producing pseudo-3D effects like curving roads and aerial views; Super Mario Kart (1992) prominently employed this for its kart-racing tracks, blending 2D sprites with dynamic 3D-like environments. This technique, optimized for the SNES's 16-bit architecture, enabled smooth perspective simulation within 2D constraints.[17][18]

The mid-1990s marked the rise of 2.5D in personal computer games through engines that rendered 3D-like interiors from 2D maps. id Software's Wolfenstein 3D (1992) pioneered this with a ray-casting algorithm that projected textured walls from a 2D grid, simulating navigable corridors in a first-person view while restricting vertical movement. Building on this, the Doom engine (1993) advanced the approach by adding variable-height floors and ceilings, non-orthogonal walls, and varying sector light levels, enhancing spatial illusion in titles like Doom; the engine was also licensed widely for hybrid 2D/3D experiences. The term "2.5D" emerged in early-1990s gaming communities to describe these pseudo-3D engines that bridged 2D efficiency with 3D aesthetics, and was notably applied to id Software's Doom (1993), even though its renderer relied on binary space partitioning rather than pure ray casting.[19][20][21]
Computer Graphics Techniques
Projection Methods
Projection methods in 2.5D computer graphics involve static parallel projections that map three-dimensional scenes onto two-dimensional planes, preserving parallel lines and object sizes without perspective distortion or full 3D computations. These techniques, rooted in pre-digital engineering drawings, enable efficient representation of depth through axis scaling and rotation.[22][23]

Axonometric projection maps the three principal axes onto the image plane at fixed angles with controlled foreshortening, creating a distortion-free view of depth suitable for visualizing multiple object faces simultaneously. In the isometric variant, the axes are projected at 120-degree angles with equal scaling, facilitating precise measurements and tiling in graphical applications. This approach was employed in the 1989 Macintosh version of SimCity, where axonometric views depicted urban layouts in black-and-white isometric style for city-building simulation.[22][24][25]

Oblique projection, a parallel method where projection lines are not perpendicular to the image plane, displays one face (typically the front) at true size while receding lines are drawn at an angle, often 45 degrees, to suggest depth. It includes cavalier projection, which uses full z-axis scaling for receding lines, and cabinet projection, which halves the z-scale to reduce visual distortion and improve readability. These variants are widely used in computer-aided design (CAD) software for technical illustrations, as they simplify the depiction of mechanical parts without complex rotations.[23][26]

The core mathematical transformation for oblique projection converts 3D coordinates (x, y, z) to 2D coordinates (x', y') as follows:

\begin{align*}
x' &= x + z \cos \theta \\
y' &= y + z \sin \theta
\end{align*}

where \theta is the projection angle, typically 45 degrees for receding lines. This formula applies a uniform depth offset without z-division, enabling straightforward implementation in graphics pipelines.[27]

These projection methods offer performance advantages in tile-based games by avoiding costly 3D transformations, allowing constant object sizes and efficient screen-space rendering for large maps. For instance, The Legend of Zelda: A Link to the Past (1991) utilized overhead oblique views to maintain smooth navigation across tiled overworlds on limited hardware.[28][22]
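The oblique formula above is simple enough to implement directly. The following Python sketch is illustrative only (the function and parameter names are our own); it applies the cavalier and cabinet variants to the corners of a unit cube:

```python
import math

def oblique_project(x, y, z, theta_deg=45.0, depth_scale=1.0):
    """Map a 3D point to 2D with an oblique parallel projection.

    depth_scale = 1.0 gives cavalier projection (full-length receding lines);
    depth_scale = 0.5 gives cabinet projection (halved receding lines).
    """
    theta = math.radians(theta_deg)
    return (x + depth_scale * z * math.cos(theta),
            y + depth_scale * z * math.sin(theta))

# Project the eight corners of a unit cube: the front face (z = 0) keeps
# its true size, while the z = 1 corners are offset along the angle theta.
for corner in [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]:
    print(corner, "->", oblique_project(*corner, depth_scale=0.5))
```

Because there is no division by depth, every tile of a map projects to the same screen size, which is the property the surrounding text credits for efficient tile-based rendering.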
Pseudo-3D Rendering
Pseudo-3D rendering encompasses a set of real-time techniques that simulate three-dimensional navigation and depth using primarily two-dimensional assets and simplified computations, enabling interactive experiences on hardware with limited processing power. These methods, prevalent in early video games, rely on mathematical projections and scaling to create illusions of spatial movement without full 3D polygon rendering. By mapping 2D elements onto screen space based on viewer position, pseudo-3D achieves efficient performance while approximating volumetric environments.[29]

One foundational technique is the ray casting algorithm, which projects rays from the viewer's position through a 2D grid-based map to determine wall intersections, rendering them as vertical strips on the screen to form corridor-like scenes. Developed for games like Wolfenstein 3D (1992), this approach uses a digital differential analyzer (DDA) to traverse the grid efficiently, stepping from cell to cell along the ray's path until hitting a wall. The algorithm begins by calculating the ray direction for each screen column, incorporating the camera plane to simulate field of view, then advances the ray using step directions (±1 in x or y) and side distances updated by delta distances derived from the ray direction components (deltaDistX = |1 / rayDirX|, deltaDistY = |1 / rayDirY|). Upon detecting a wall hit, the perpendicular distance is computed to avoid fisheye distortion, typically as perpWallDist = (side == 0 ? sideDistX - deltaDistX : sideDistY - deltaDistY), where side indicates the hit face (0 for x, 1 for y). This distance determines the wall strip's height on screen: lineHeight = screenHeight / perpWallDist, with the strip drawn from drawStart = -lineHeight / 2 + screenHeight / 2 to drawEnd = lineHeight / 2 + screenHeight / 2. Equivalently, the perpendicular distance for an x-side hit can be computed directly as perpWallDist = (mapX - posX + (1 - stepX) / 2) / rayDirX, so that adjacent vertical strips align into a seamless pseudo-3D view (see the sketch at the end of this subsection).[29][30]

Sprite scaling along the z-axis provides another key illusion of depth, particularly in side-scrolling or overhead perspectives, by dynamically resizing 2D sprites inversely proportional to their calculated distance from the viewer. In early games such as After Burner II (1987), enemies and objects are rendered as flat sprites whose vertical and horizontal dimensions are adjusted based on z-depth to simulate perspective foreshortening, with formulas like scaled_height = original_height / z_depth and scaled_width = original_width / z_depth applied per frame. This technique sorts sprites by distance for correct layering, drawing closer ones over farther ones to mimic occlusion, though without true geometric intersection testing. Such scaling enhances interactivity in 2D environments by adding a navigational z-component, allowing players to perceive relative depths during movement.[31]

The Mode 7 graphics mode on the Super Nintendo Entertainment System (SNES) exemplifies affine transformations for pseudo-3D terrain rendering, applying rotation, scaling, and shearing to a single background layer to simulate scrolling landscapes or racing tracks. This hardware-accelerated feature transforms a 128x128 tilemap via coefficients in fixed-point arithmetic, where the rotation for angle \alpha is defined as:

\begin{align*}
x' &= x \cos \alpha - y \sin \alpha, \\
y' &= x \sin \alpha + y \cos \alpha,
\end{align*}

with subsequent uniform scaling by a zoom factor (e.g., via coefficients a = d = zoom, b = c = 0 for pure scaling) to project the 2D bitmap onto a pseudo-perspective plane. The full affine equation incorporates origin offsets, x' = ax + by + x_0 and y' = cx + dy + y_0, enabling real-time updates per frame for rotational navigation effects in games like F-Zero (1990). This mode's efficiency stems from direct memory access to the tile data during scanline rendering, producing smooth 3D-like curvature through per-scanline adjustments.[32]

Despite these innovations, pseudo-3D rendering techniques exhibit inherent limitations, such as the absence of true occlusion culling beyond simple sprite sorting and no support for vertical movement or looking up and down, restricting environments to flat, corridor-based layouts. Games like Rise of the Triad (1994), built on a modified Wolfenstein 3D engine, exemplify the "2.5D" moniker due to these constraints: sectors connect only orthogonally without sloped floors or multi-level verticality, leading to gameplay confined to horizontal planes where height variations are simulated via scaled elements rather than geometric depth. These shortcomings, while enabling high frame rates on 1990s hardware, highlight the transitional nature of pseudo-3D toward full 3D engines.[33]
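A minimal Python sketch of the DDA loop described above follows; the map, screen dimensions, and camera vectors are illustrative assumptions rather than code from any shipped engine:

```python
import math

# '1' cells are walls, '.' cells are empty floor.
MAP = [
    "111111",
    "1....1",
    "1..1.1",
    "1....1",
    "111111",
]
SCREEN_W, SCREEN_H = 80, 24

def cast_ray(pos_x, pos_y, ray_dir_x, ray_dir_y):
    """Return the perpendicular distance from the viewer to the first wall."""
    map_x, map_y = int(pos_x), int(pos_y)
    delta_x = abs(1.0 / ray_dir_x) if ray_dir_x else math.inf
    delta_y = abs(1.0 / ray_dir_y) if ray_dir_y else math.inf
    step_x = -1 if ray_dir_x < 0 else 1
    step_y = -1 if ray_dir_y < 0 else 1
    side_x = (pos_x - map_x) * delta_x if ray_dir_x < 0 else (map_x + 1 - pos_x) * delta_x
    side_y = (pos_y - map_y) * delta_y if ray_dir_y < 0 else (map_y + 1 - pos_y) * delta_y
    while True:  # DDA: advance one grid cell at a time until a wall is hit
        if side_x < side_y:
            side_x += delta_x
            map_x += step_x
            side = 0  # hit an x-facing wall face
        else:
            side_y += delta_y
            map_y += step_y
            side = 1  # hit a y-facing wall face
        if MAP[map_y][map_x] == "1":
            # Perpendicular (not radial) distance avoids fisheye distortion.
            return side_x - delta_x if side == 0 else side_y - delta_y

# One ray per screen column; the wall strip's height is inversely
# proportional to the perpendicular distance.
pos_x, pos_y = 2.5, 2.5        # viewer position inside the map
dir_x, dir_y = 1.0, 0.0        # view direction
plane_x, plane_y = 0.0, 0.66   # camera plane; its length sets the FOV
for col in range(SCREEN_W):
    camera = 2.0 * col / SCREEN_W - 1.0  # -1 .. 1 across the screen
    dist = cast_ray(pos_x, pos_y,
                    dir_x + plane_x * camera, dir_y + plane_y * camera)
    line_height = int(SCREEN_H / max(dist, 1e-6))
    draw_start = max(0, SCREEN_H // 2 - line_height // 2)
    draw_end = min(SCREEN_H - 1, SCREEN_H // 2 + line_height // 2)
```

Note how the entire "3D" view falls out of a flat character grid and one division per column, which is why this family of renderers ran at interactive rates on early-1990s hardware.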
Depth Simulation Techniques
Depth simulation techniques in 2.5D graphics enhance the illusion of three-dimensional space within primarily two-dimensional environments by employing environmental and surface effects that suggest depth without requiring complete 3D modeling or full scene reconstruction. These methods leverage layered compositions, motion disparities, and oriented sprites to create perceptual depth cues, drawing on principles of human vision such as motion parallax and relative scaling. Commonly used in early video games constrained by hardware limitations, these techniques allowed developers to achieve immersive atmospheres efficiently on 2D engines. Modern implementations in engines like Unity extend these with tools for advanced parallax and sprite management as of 2023.[34][35]

Billboarding is a core depth simulation method where flat 2D sprites are dynamically oriented to always face the camera, simulating volumetric objects like particles or distant foliage without the computational cost of true 3D geometry. This approach applies a texture containing the object's image to a rectangular primitive, which is then rotated to align with the view direction, providing a 3D-like appearance from most angles. The rotation angle to face the viewer is calculated as angle = atan2(viewerY - spriteY, viewerX - spriteX), where the atan2 function determines the direction from the sprite's position to the viewer's position in screen coordinates. In pseudo-3D engines like that of Doom (1993), billboarding was employed for enemy and item sprites to add interactive depth to corridor environments.[36][37]

Skyboxes and skydomes further contribute to perceived depth by rendering static 2D images as distant environmental enclosures around the scene, simulating infinite horizons or atmospheric backdrops without modeling far-off geometry. A skybox typically wraps six textured faces around an invisible cube encompassing the playable area, while a skydome uses a hemispherical projection for curved skies; both remain fixed relative to the camera to maintain the illusion of remoteness. This technique, originating in early 3D engines, was adapted for 2.5D applications to provide layered depth in overhead or side-scrolling views, as seen in adaptations of Quake (1996) engine principles for hybrid 2D/3D titles. By placing these elements at maximal depth, they establish spatial context and immersion cost-effectively.[34]

Parallax scrolling simulates depth through multi-layered backgrounds that move at varying speeds proportional to their assigned depth, mimicking the relative motion observed in real-world vision when objects at different distances shift at different rates. Foreground layers advance faster than midground and background layers, creating a sense of progression and spatial separation; the speed for each layer can be computed as layer_speed = base_speed * (1 - depth_factor), where depth_factor ranges from 0 for the nearest layer to 1 for the farthest (a short sketch of this computation appears at the end of this subsection). In Grand Theft Auto (1997), this was applied to overhead cityscapes, with slower-moving distant buildings and clouds enhancing the urban depth in its top-down 2D perspective.[38]

These techniques integrate seamlessly into 2D engines to produce faux-3D effects, often combining billboarding for interactive elements, skyboxes for atmospheric bounds, and parallax for dynamic backgrounds.
Platformers like Castlevania: Symphony of the Night (1997) exemplified this by layering rotational sprite effects with parallax-shifted environments, such as castle corridors where foreground architecture moves against slower receding walls, fostering a labyrinthine depth that rewards exploration. Such integrations prioritized performance on the era's hardware, such as the PlayStation's sprite capabilities, while delivering visual sophistication.
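As a concrete illustration of the billboarding and parallax formulas above, the following Python sketch (function names and numeric values are illustrative assumptions) computes a billboard's facing angle and per-layer scroll offsets:

```python
import math

def billboard_angle(sprite_x, sprite_y, viewer_x, viewer_y):
    """Rotation (radians) that turns a flat sprite to face the viewer."""
    return math.atan2(viewer_y - sprite_y, viewer_x - sprite_x)

def layer_offset(camera_x, depth_factor, base_speed=1.0):
    """Scroll offset for one background layer.

    depth_factor runs from 0.0 for the nearest layer (scrolls with the
    camera) to 1.0 for the farthest layer (barely moves at all).
    """
    return camera_x * base_speed * (1.0 - depth_factor)

camera_x = 240.0
for name, depth in [("foreground", 0.1), ("midground", 0.5), ("skyline", 0.9)]:
    print(f"{name:10s} offset = {layer_offset(camera_x, depth):6.1f}")
print("billboard angle:", billboard_angle(3.0, 4.0, 0.0, 0.0))
```

Drawing the layers back-to-front with these offsets reproduces the classic parallax effect: the skyline creeps while the foreground races past.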
Film and Animation Techniques
Traditional Multiplane Methods
The multiplane camera, a pioneering device in traditional animation, was developed by Walt Disney Studios and first operational in 1937. This apparatus consisted of multiple horizontal glass planes stacked vertically at varying distances from a vertical camera lens, with each plane holding hand-drawn cels or painted artwork representing different scene elements such as foregrounds, midgrounds, and backgrounds. By moving these planes independently at controlled speeds during filming, animators could simulate camera movements and create a parallax effect, where closer layers appeared to shift more rapidly than distant ones, enhancing the illusion of three-dimensional depth in two-dimensional animation.[39]

A notable predecessor to Disney's multiplane camera was invented by animator Ub Iwerks in 1933, after he left Disney to establish his own studio. Iwerks' version employed a similar layering principle with multiple planes of artwork positioned before a horizontal camera, allowing for basic depth simulation in his ComiColor and Willie Whopper cartoon series throughout the mid-1930s. Although less sophisticated than Disney's later iteration, Iwerks' design demonstrated early feasibility of multiplane techniques for independent animation production.[40]

Parallel developments included Max Fleischer's "setback" camera, introduced in the early 1930s at Fleischer Studios, which used a horizontal setup with rotatable turntables and layered artwork or models to achieve depth and parallax effects in series like Popeye the Sailor.[41]

Disney's multiplane camera debuted in the 1937 Silly Symphony short film The Old Mill, which utilized the device to depict atmospheric scenes of birds in an abandoned windmill, earning the Academy Award for Best Animated Short Film that year. The technique's impact was further showcased in feature-length productions like Bambi (1942), where it contributed to immersive forest sequences by layering detailed natural elements, such as foliage, undergrowth, and distant trees, to convey spatial depth and atmospheric perspective.[39][42]

To achieve realistic parallax, the apparent motion for each plane was adjusted proportionally, ensuring that closer layers shifted more than distant ones to mimic natural depth cues when the camera panned or dollied. This mechanical computation allowed precise frame-by-frame control, with operators adjusting plane velocities to align with the intended scene dynamics.[43]

In hand-drawn animation, the multiplane camera offered significant advantages by enabling natural separation of foreground and background elements on distinct planes, which simplified compositing without requiring three-dimensional modeling or perspective redraws for every frame. This approach preserved the fluidity of traditional cel animation while adding volumetric realism, making complex environmental interactions more achievable in pre-digital workflows.[40]
Digital Layering Approaches
Digital layering approaches in modern film and animation emulate the depth effects of traditional multiplane cameras through software-based z-depth sorting and parallax simulation. In tools like Adobe After Effects, 2D layers are converted to 3D by enabling the 3D switch, adding a z-axis to properties such as Position and Scale, which allows for depth-based rendering.[44] A 3D camera can then be applied to these layers, where foreground elements at lower z-values shift more relative to the background during movement, creating parallax offset proportional to the z-distance (e.g., offset ≈ z × camera focal length adjustment).[44] This technique sorts layers by depth to prevent overlaps and simulates multiplane effects without full 3D modeling, enabling efficient compositing for animated sequences.

Cut-out animation represents another key digital layering method, where 2D character elements are treated as modular puppets with hierarchical rigging for pseudo-3D rotation and movement. In productions like South Park, digital tools facilitate this by photographing or scanning cut-out parts (e.g., limbs, heads) and assembling them in software, with parenting hierarchies linking child elements to parents for coordinated animation.[45] Beginning with season 5 in 2001, episodes transitioned to Autodesk Maya, enhancing the cut-out workflow with layered spacing between parts to mimic shadows and subtle depth, allowing rotations that reveal front and back views without true 3D geometry.[46] This approach maintains the flat, stylized aesthetic while adding dynamic posing through offset layering and simple z-planes.

Hybrid techniques in CGI further advance 2.5D by integrating layered 2D styles with depth cues in compositing pipelines. In Spider-Man: Into the Spider-Verse (2018), animators built 3D models but animated them with hand-drawn techniques, then applied 2D overlays like ink lines, halftone dots, and comic-book graphics in post-production to layer stylized elements over depth-rendered scenes.[47] Compositors used custom tools to blend these layers, ensuring depth perception through parallax shifts and selective focus while preserving a 2D comic aesthetic, resulting in a visually distinctive pseudo-3D effect.[48]

The evolution of these approaches traces from 1990s Flash animations, which employed pseudo-3D through tweened scaling and rotation of layered sprites to simulate depth in web-based shorts, to contemporary tools like Unity that support VR-ready 2.5D workflows.[49] In Unity, 2D elements can be rigged with skeletal animation and positioned in z-space for parallax under VR cameras, enabling immersive animated shorts that extend filmic layering into interactive environments.[50]
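The core idea behind such z-depth layer parallax can be sketched in a few lines of Python. This is a toy model of a pinhole-style shift inversely proportional to each layer's z, not any compositor's actual API; the Layer class, focal value, and scene contents are all illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    z: float        # distance from the virtual camera
    x: float = 0.0  # accumulated horizontal screen offset

def pan_camera(layers, camera_dx, focal=50.0):
    """Pan the virtual camera: nearer layers (smaller z) shift more on screen."""
    # Painter's order: process far-to-near so near layers are drawn last.
    for layer in sorted(layers, key=lambda l: l.z, reverse=True):
        layer.x += camera_dx * focal / layer.z
    return layers

scene = [Layer("background", 400.0), Layer("midground", 120.0),
         Layer("character", 30.0)]
for layer in pan_camera(scene, camera_dx=12.0):
    print(f"{layer.name:10s} shifted {layer.x:6.2f} px")
```

The depth sort doubles as the stacking order, which is the software analogue of the multiplane camera's physical plane arrangement.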
Graphic Design Applications
Isometric and Oblique Views
In graphic design, isometric projections provide a foundational tool for simulating three-dimensional depth on two-dimensional surfaces, particularly through the use of isometric grids in software like Adobe Illustrator. These grids typically feature lines oriented at 30-degree angles relative to the horizontal axis, forming equilateral triangles that enable precise alignment of elements without converging perspective lines.[51] Designers leverage these grids to create UI mockups and packaging visuals, where the uniform scaling across all axes conveys spatial relationships clearly and engagingly, enhancing the perceived solidity of flat illustrations.[52]

Oblique views complement isometric techniques in graphic design by offering a more flexible alternative for illustrating complex assemblies, especially in exploded diagrams that separate components to reveal internal structures. In oblique projection, the depth axis is often scaled to half its true length, a method known as cabinet oblique, to mitigate visual distortion while maintaining true proportions on the principal faces.[53] This approach has been prevalent in instruction manuals and technical illustrations since the 1980s, allowing designers to clarify assembly sequences in print media without the need for full three-dimensional modeling.[53]

Representative examples highlight the practical application of these 2.5D projections in consumer-facing design. IKEA catalogs and assembly guides employ isometric and oblique layouts to depict furniture arrangements and disassembly steps, providing intuitive spatial guidance that aids user comprehension in flat, printed formats.[54]

Tools like Adobe Photoshop further facilitate 2.5D effects in print media through features such as Perspective Warp, which allows designers to distort layers along custom grids to simulate isometric or oblique angles quickly. This enables the transformation of standard 2D assets into pseudo-three-dimensional compositions, ideal for packaging prototypes or editorial illustrations, without requiring specialized 3D software.[55]
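The 30-degree grid described above corresponds to a simple coordinate mapping. A Python sketch under that assumption (names and the grid unit are illustrative):

```python
import math

COS30, SIN30 = math.cos(math.radians(30)), math.sin(math.radians(30))

def iso_project(x, y, z, unit=24.0):
    """Map 3D grid coordinates to a 30-degree isometric screen position.

    x and y run along the two receding grid axes; z is vertical height;
    unit is the pixel length of one grid edge.
    """
    screen_x = (x - y) * COS30 * unit
    screen_y = (x + y) * SIN30 * unit - z * unit
    return screen_x, screen_y

# A unit box on the grid: the four base corners plus one raised top corner.
for point in [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (1, 1, 1)]:
    print(point, "->", iso_project(*point))
```

Because the mapping is linear, artwork snapped to the grid stays aligned under translation, which is what makes isometric mockups easy to assemble from reusable pieces.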
Informational Visualization
In informational visualization, 2.5D techniques leverage simulated depth through layering, z-offsets, and perspective cues to represent spatial and multivariate data, bridging the gap between flat 2D graphics and computationally intensive 3D models. This approach enhances perceptual understanding by adding a third visual dimension without requiring full geometric rendering, allowing designers to encode additional variables like volume or hierarchy in a scalable manner.[56]

A prominent example is 2.5D charts, such as layered bar graphs or treemaps that employ z-offsets to depict data volume or temporal changes. In these visualizations, elements are stacked with varying heights to represent magnitude, enabling users to discern patterns in hierarchical or multivariate datasets interactively.[57]

Infographics often utilize 2.5D through parallax-like stacking in web design, where CSS properties such as z-index and 3D transforms create layered depth for dynamic narratives. This method simulates movement along a z-axis as users scroll, improving engagement in interactive timelines or storytelling pieces. The New York Times has pioneered such applications in visualizations like "Snow Fall," employing parallax effects to layer multimedia elements and guide readers through complex data stories since 2012.[58][59]

These 2.5D methods offer key benefits, including improved readability and reduced cognitive load compared to purely 2D displays, as they introduce subtle depth cues that highlight relationships without overwhelming users with 3D occlusion issues. This aligns with Edward Tufte's foundational principles of maximizing data-ink ratio and minimizing non-essential elements, adapted to digital contexts where layered depth enhances clarity in multivariate presentations.[60]
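As a toy illustration of the z-offset idea behind layered charts (entirely illustrative and not tied to any charting library), each data series can be shifted along a simulated depth axis before drawing in painter's order:

```python
def layered_bar_offsets(num_series, dx=6, dy=4):
    """Per-series screen offset along a simulated z-axis for a 2.5D bar chart.

    Series index 0 is drawn farthest back with the largest offset; each
    nearer series is shifted (dx, dy) pixels less and drawn later, so it
    partially overlaps the layers behind it (painter's order).
    """
    return [((num_series - 1 - i) * dx, (num_series - 1 - i) * dy)
            for i in range(num_series)]

# Three data series: offsets step down toward the front-most layer.
print(layered_bar_offsets(3))  # [(12, 8), (6, 4), (0, 0)]
```

Keeping dx and dy small preserves readability: the depth cue separates the series without introducing the occlusion problems of a fully 3D chart.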
Technical Foundations
Mathematical Principles
The mathematical principles of 2.5D techniques rely on affine transformations and approximations of perspective to simulate depth in a 2D plane, often treating the third dimension as a parameter for scaling or shifting without full volumetric computation.[61]

Coordinate transformations form a foundational element, enabling oblique or isometric views that shear 2D coordinates based on a depth value to mimic 3D projection. A general 2.5D projection matrix for oblique views can be expressed as an affine shear transformation:

P = \begin{pmatrix}
1 & 0 & \tan \theta \\
0 & 1 & 0 \\
0 & 0 & 1
\end{pmatrix}

where \theta is the shear angle determining the obliqueness; applied to a point (x, y, z), the matrix shifts the x-coordinate by \tan \theta \cdot z while preserving y and z. This matrix, derived from parallel projection combined with shearing, projects points without foreshortening in the depth direction, suitable for isometric rendering where true perspective convergence is avoided.[61][62]

Parallax effects in 2.5D layering arise from differential motion of planes at varying depths, quantified by the parallax equation for displacement under camera translation. The displacement of a layer is given by d = b \cdot \frac{D}{z + D}, where b is the baseline (camera movement), z is the layer's depth behind the reference plane, and D is the distance from the camera to the reference plane; the shift falls off inversely with effective distance, creating a depth illusion through slower motion of distant layers relative to nearer ones. In multiplane setups, this ensures foreground elements shift more than backgrounds during panning, with the ratio of displacements between layers directly tied to their depth differences.

Ray casting in 2.5D environments simulates perspective scaling for wall heights by casting rays from the viewer and adjusting vertical extent based on distance. The wall height on screen is calculated as h = \frac{H}{z \cos \phi}, where H is the screen height, z is the radial distance along the ray to the wall, and \phi is the ray's angle relative to the view direction; the \cos \phi term corrects for fisheye distortion by converting the radial distance into a perpendicular one. This derivation stems from similar triangles in perspective projection, where closer walls appear taller, enabling efficient 2D map traversal into pseudo-3D views without full polygon rasterization.[29]

Bump and normal mapping perturb surface normals to simulate geometric detail without altering vertex positions, relying on a height field to derive illusions of relief. The perturbed normal N' is computed as N' = N - k \cdot \nabla h, where N is the original surface normal, k is the bump height scaling factor, and \nabla h is the gradient of the height map h(u,v) in tangent space, approximated as (\partial h / \partial u, \partial h / \partial v, 0). To derive this, consider a surface parameterized by u and v with height z = h(u,v); the tangent vectors are \mathbf{T_u} \approx (1, 0, \partial h / \partial u) and \mathbf{T_v} \approx (0, 1, \partial h / \partial v), yielding the unnormalized normal \mathbf{T_u} \times \mathbf{T_v} = (-\partial h / \partial u, -\partial h / \partial v, 1); for small perturbations on a flat base normal N = (0, 0, 1), this is exactly the subtractive gradient form, which is then normalized, N' \leftarrow N' / \|N'\|, for lighting computations. This approach, avoiding geometric displacement, enhances shading fidelity at low computational cost.[63][64]
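Two of these formulas translate directly into a compact numerical sketch using NumPy; the array shapes, height field, and parameter values below are illustrative assumptions:

```python
import numpy as np

def parallax_displacement(b, z, D):
    """Layer shift d = b * D / (z + D): deeper layers (larger z) move less."""
    return b * D / (z + D)

def perturbed_normals(height, k=1.0):
    """Bump mapping: per-texel normals (-k*dh/du, -k*dh/dv, 1), normalized."""
    dh_dv, dh_du = np.gradient(height.astype(float))  # rows ~ v, columns ~ u
    n = np.stack([-k * dh_du, -k * dh_dv,
                  np.ones_like(height, dtype=float)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

# A 10-unit camera pan against layers at three depths (reference plane D = 100):
for z in (0.0, 100.0, 1000.0):
    print(f"z = {z:6.0f} -> d = {parallax_displacement(10.0, z, 100.0):5.2f}")

# A 4x4 height field with a raised centre; normals tilt away from the bump.
h = np.zeros((4, 4))
h[1:3, 1:3] = 1.0
print(perturbed_normals(h)[1, 1])  # normal near the bump's edge
```

The printed displacements fall from the full baseline at the reference plane toward zero for distant layers, matching the multiplane behaviour described above, while the normal field reshapes lighting without moving a single vertex.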
Modern Extensions and Generalizations
In virtual reality (VR) and augmented reality (AR) applications, 2.5D techniques employing layered sprites have been integrated into game engines such as Unity to blend with full 3D elements, optimizing performance by reducing the computational demands of complex 3D rendering while maintaining immersive depth effects.[65] This hybrid approach enables efficient handling of dynamic environments in resource-constrained VR/AR hardware, as demonstrated in pseudo-2.5D content synthesis for mixed reality headsets.[65]

AI-assisted 2.5D has advanced through neural networks that generate depth maps from single 2D images, enabling automated conversion of flat artwork into layered pseudo-3D structures. The MiDaS model, introduced in 2019 and refined in subsequent versions, employs a robust training strategy mixing diverse datasets to predict relative inverse depth, achieving zero-shot generalization across scenes for applications like retro game remakes.[66] This facilitates adding depth to pixel art assets, as seen in tools converting 2D sprites to parallax-layered environments, revitalizing classic games with modern 2.5D effects while preserving original aesthetics.[66]

Despite these advancements, 2.5D techniques face criticisms for limitations in occlusion handling, particularly in complex scenes where partial overlaps, such as concave shapes or wrapped elements, cannot be accurately resolved without full 3D support. Future trends include hybrids like HD-2D, as in Octopath Traveler (2018), which fuses 2D sprites with 3D environments and lighting for more dynamic interactions while maintaining 2.5D efficiency.
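For illustration, single-image depth estimation of the kind described above can be run through the publicly documented MiDaS torch.hub entry points. The sketch below follows that published usage (the input file name is a placeholder); the predicted depth map is then quantized into bands that could drive parallax layer separation:

```python
import cv2
import numpy as np
import torch

# Load a small MiDaS model and its matching input transform via torch.hub,
# following the repository's published usage.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

# Read a flat 2D artwork (placeholder file name) and predict relative
# inverse depth for every pixel.
img = cv2.cvtColor(cv2.imread("sprite_art.png"), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    prediction = midas(transform(img))
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1), size=img.shape[:2],
        mode="bicubic", align_corners=False,
    ).squeeze().cpu().numpy()

# Quantize the continuous depth map into three bands, one per parallax
# layer (0 = far background, 2 = near foreground).
bands = np.digitize(depth, np.quantile(depth, [0.33, 0.66]))
```

Splitting the artwork by these bands and scrolling each band at a depth-dependent speed yields the automated 2.5D conversion workflow the surrounding text describes.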