Shadow mapping
Shadow mapping is a real-time computer graphics technique for rendering shadows in three-dimensional scenes, introduced by Lance Williams in 1978 as a method to cast curved shadows onto curved surfaces using depth buffering.[1] The approach involves two primary rendering passes: first, a depth map (or shadow map) is generated from the perspective of the light source by rendering the scene and storing the depth of the nearest visible surface at each pixel; second, during the main render from the viewer's perspective, each fragment's coordinates are transformed into light space and its depth is compared against the stored value to determine whether it is occluded and therefore shadowed.[2][3]

This image-based method offers significant advantages for interactive applications such as video games and simulations: its computational cost grows roughly linearly with scene complexity (approximately twice that of standard rendering), and it maps well onto hardware acceleration through graphics APIs such as OpenGL and Direct3D.[1][3] It supports dynamic shadows for both static and moving objects without requiring additional geometric primitives, making it suitable for large-scale environments.[2]

However, shadow mapping is prone to artifacts, including aliasing from the finite resolution of the depth map, self-shadowing "acne" caused by limited depth precision and sampling, and perspective aliasing, in which shadow resolution is distributed unevenly across the scene. These issues are commonly mitigated through techniques such as depth bias, percentage-closer filtering, and polygon offset.[3][2]

Over time, variants have addressed these limitations to improve quality and performance. Cascaded shadow maps divide the view frustum into multiple depth ranges, allocating higher resolution to nearer cascades for improved detail in foreground shadows.[2] Variance shadow maps store statistical moments of depth, allowing the map to be filtered like an ordinary texture and producing softer, less aliased edges without many samples per fragment.[4] Other improvements include adaptive resolution adjustment and integration with modern GPU features for handling complex lighting, such as cube-mapped shadow maps for point lights.[3] These developments have made shadow mapping a foundational technique in real-time rendering pipelines, including those of engines such as Unreal Engine.[2]

Fundamentals
Definition and History
Shadow mapping is a rasterization-based computer graphics technique used to approximate hard shadows in rendered scenes by generating a depth map, known as the shadow map, from the viewpoint of a light source and then comparing depths during the primary scene render to determine shadowed regions.[5] This image-space method leverages depth buffering to handle occlusion efficiently without explicit ray tracing, making it suitable for both static and dynamic scenes.[6]

The technique was invented by Lance Williams in 1978 and detailed in his seminal SIGGRAPH paper "Casting Curved Shadows on Curved Surfaces," which introduced the core idea of projecting depth information from a light's perspective to cast shadows onto arbitrary surfaces, including curved ones.[5] Initially, shadow mapping found application in offline rendering for pre-computed animations and visual effects, particularly in the 1980s as growing computational power allowed more complex scene illumination in film production. For instance, Pixar researchers extended the method in 1987 to render antialiased shadows from depth maps, including approximations of area light sources, enabling higher-quality shadows in early computer-animated film production.[7]

Shadow mapping transitioned to real-time rendering in the late 1990s and early 2000s, driven by advances in graphics hardware that supported programmable shaders and depth textures. The NVIDIA GeForce 3 GPU, released in 2001, provided hardware acceleration for shadow maps through DirectX 8 and OpenGL extensions, allowing efficient implementation in interactive applications.[6] This milestone facilitated adoption in video games, which saw some of the earliest uses of real-time shadow mapping for dynamic shadows. By the mid-2000s, integration into the standard OpenGL and DirectX rendering pipelines enabled widespread use with multiple dynamic lights in real time, completing the technique's evolution from its offline origins into a cornerstone of modern graphics engines.[6]

Principles of Shadows and Shadow Maps
Shadows in optical physics arise from the occlusion of light by intervening geometry, which prevents direct illumination from reaching certain surfaces. When an opaque object blocks rays from a light source to a receiver, it casts a shadow consisting of two distinct regions: the umbra, where the light source is completely obstructed and no direct light reaches the surface, and the penumbra, where partial occlusion allows some rays to graze the edges of the occluder and creates a transitional zone of reduced intensity.[8] This formation depends on the relative positions of the light, occluder, and receiver, with the umbra forming the darkest core and the penumbra providing a softer boundary.[9]

Whether a shadow is hard or soft stems from the size and distance of the light source relative to the occluder. A point light source, idealized as having zero extent, produces sharp, hard shadows with no penumbra, because every ray is either fully blocked or fully transmitted, resulting in binary occlusion.[8] In contrast, extended light sources, such as area lights whose size is not negligible compared with the occluder distance, generate soft shadows with prominent penumbrae, as varying portions of the source remain visible around the occluder's edges, blending the transition from full shadow to full illumination.[9] Larger source sizes or closer occluder distances widen the penumbra, enhancing realism but increasing the cost of simulation.[8]

In computer graphics, shadow maps represent these occlusion relationships digitally as a 2D texture that captures the minimum depth from the light source to the visible surfaces within its view frustum, serving as a proxy for determining shadowed regions during rendering.[10] For each texel (pixel in texture space), this depth map encodes the closest distance along rays emanating from the light, so that shadowed regions can be delimited by comparing scene depths against the stored values.[1] The technique relies on rasterization pipelines that use projective geometry to transform world coordinates into the light's view space via view-projection matrices, which define the frustum as a perspective or orthographic volume bounding the illuminated scene.[11] Depth buffering, a core prerequisite of this rasterization process, maintains a per-pixel buffer holding the minimum depth value encountered during scene traversal, resolving visibility by discarding fragments farther from the viewpoint (or from the light, during shadow map generation).[11] Projective geometry ensures accurate mapping by applying homogeneous transformations, combining a view matrix that positions the light as the camera with a projection matrix (perspective or orthographic), to clip and normalize coordinates within the frustum, so that the shadow map aligns with the light's optical projection.[10] This foundation allows shadow maps to efficiently approximate real-world occlusion without explicitly tracing every light path.[1]
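The dependence of penumbra size on the light's extent can be made concrete with a simple similar-triangles estimate. Assuming an idealized arrangement, not taken from the cited sources, in which a light of width w_{\text{light}}, an occluder edge at distance d_{\text{occluder}} from the light, and a receiver plane at distance d_{\text{receiver}} are parallel to one another, the penumbra on the receiver has approximate width

w_{\text{penumbra}} \approx w_{\text{light}} \cdot \frac{d_{\text{receiver}} - d_{\text{occluder}}}{d_{\text{occluder}}}.

For example, a light 0.5 m across, with the occluder 2 m away and the receiver 4 m away, yields a penumbra roughly 0.5 m wide; shrinking the light toward a point source drives this width toward zero and recovers a hard shadow.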
Core Algorithm

Generating the Shadow Map
The generation of the shadow map constitutes the first pass of the shadow mapping algorithm, in which the scene is rendered solely from the perspective of the light source to capture depth information about occluding geometry. The light's viewpoint determines which surfaces are visible to it, and their distances are stored in a depth texture that serves as the shadow map. Introduced by Williams in 1978, this depth-only rendering leverages Z-buffer techniques to efficiently compute the nearest surface depth for each pixel in the light's view frustum.[12]

To begin, the view matrix for the light is established by positioning a virtual camera at the light source and orienting it along the light's direction, transforming world-space coordinates into light-view space. The projection matrix is then configured according to the light type: an orthographic projection for directional lights, modeling parallel rays arriving from an effectively infinite distance, and a perspective projection for spot lights, matching the conical volume illuminated by the source through its field of view. For point lights, which emit in all directions, a perspective projection is applied to each face of a cube map to cover the full surroundings, though basic implementations often restrict themselves to the simpler cases.

The scene geometry is then rendered with these matrices, using a fragment shader or render state that discards color output and writes only depth values to the attached depth buffer. These depths are stored in a 2D texture, typically at a resolution such as 1024×1024 pixels, which balances shadow detail against rendering overhead.[13][14] During this rendering, each fragment's light-space position is obtained by transforming the world-space position of the underlying geometry,

\mathbf{pos}_{\text{light}} = \text{projection}_{\text{light}} \cdot \text{view}_{\text{light}} \cdot \mathbf{pos}_{\text{world}},

and the depth value z_{\text{light}} is taken from the z component of this position (after the perspective divide, for perspective projections), normalized and clamped to the [0, 1] range suitable for texture storage; it represents the relative distance from the light to the surface. Depth testing keeps only the minimum depth (the closest occluder) per texel, ensuring the shadow map encodes the frontmost geometry visible to the light.[13]

For scenes with multiple light sources, shadow maps are generated sequentially for each active light, producing distinct depth textures that can later be sampled independently during scene rendering. This per-light approach accommodates varying projection types and positions, but the computational cost scales with the number of shadow-casting lights, often necessitating optimizations such as limiting shadows to key sources in real-time applications.[6]
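The following sketch illustrates the mathematics of this first pass for a single directional light. It is a minimal CPU-side illustration using NumPy, not a GPU implementation: it builds the light's view and orthographic projection matrices, transforms sample points into light clip space, and keeps the minimum normalized depth per texel. The resolution, function names, and sample points are illustrative assumptions rather than part of any particular API.

```python
# Minimal sketch of the shadow-map generation pass for a directional light,
# using NumPy on the CPU purely to illustrate the math; real implementations
# rasterize triangles on the GPU and write to a depth texture.
import numpy as np

SHADOW_RES = 1024  # texels per side of the square shadow map (illustrative)


def look_at(eye, target, up):
    """Light view matrix: transforms world space into light view space."""
    f = (target - eye) / np.linalg.norm(target - eye)  # forward axis
    s = np.cross(f, up)
    s /= np.linalg.norm(s)                             # right axis
    u = np.cross(s, f)                                 # true up axis
    m = np.eye(4)
    m[0, :3], m[1, :3], m[2, :3] = s, u, -f
    m[:3, 3] = -m[:3, :3] @ eye                        # translate eye to origin
    return m


def orthographic(left, right, bottom, top, near, far):
    """Orthographic projection for a directional light (parallel rays)."""
    m = np.eye(4)
    m[0, 0] = 2.0 / (right - left)
    m[1, 1] = 2.0 / (top - bottom)
    m[2, 2] = -2.0 / (far - near)
    m[0, 3] = -(right + left) / (right - left)
    m[1, 3] = -(top + bottom) / (top - bottom)
    m[2, 3] = -(far + near) / (far - near)
    return m


def render_shadow_map(points_world, view, proj, res=SHADOW_RES):
    """Depth-only 'render': keep the minimum light depth seen at each texel."""
    shadow_map = np.full((res, res), 1.0)              # initialize to the far plane
    for p in points_world:
        clip = proj @ view @ np.append(p, 1.0)         # light clip space
        ndc = clip[:3] / clip[3]                       # normalized device coordinates
        if np.any(np.abs(ndc) > 1.0):                  # outside the light frustum
            continue
        u, v = ((ndc[:2] * 0.5 + 0.5) * (res - 1)).astype(int)   # texel coordinates
        depth = np.clip(ndc[2] * 0.5 + 0.5, 0.0, 1.0)            # depth in [0, 1]
        shadow_map[v, u] = min(shadow_map[v, u], depth)          # depth test (keep nearest)
    return shadow_map


# Example: an overhead directional light looking down at the origin.
light_view = look_at(np.array([0.0, 10.0, 0.0]), np.zeros(3), np.array([0.0, 0.0, -1.0]))
light_proj = orthographic(-5.0, 5.0, -5.0, 5.0, 0.1, 20.0)
scene_points = np.array([[0.0, 2.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
shadow_map = render_shadow_map(scene_points, light_view, light_proj)
```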
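Once the shadow map exists, the later sampling mentioned above reduces to transforming a world-space point with the same light matrices and comparing its depth against the stored value, typically with a small depth bias to suppress self-shadowing acne. The sketch below continues from the previous one (reusing its shadow_map, light_view, and light_proj) and likewise only illustrates the arithmetic, not any particular shading-language API.

```python
import numpy as np


def in_shadow(p_world, shadow_map, view, proj, bias=0.005):
    """Second-pass test: is the world-space point occluded from the light?"""
    res = shadow_map.shape[0]
    clip = proj @ view @ np.append(p_world, 1.0)       # same light transform as pass one
    ndc = clip[:3] / clip[3]
    if np.any(np.abs(ndc) > 1.0):
        return False                                   # outside the light frustum: treat as lit
    u, v = ((ndc[:2] * 0.5 + 0.5) * (res - 1)).astype(int)
    fragment_depth = np.clip(ndc[2] * 0.5 + 0.5, 0.0, 1.0)
    # Shadowed if some surface closer to the light was recorded at this texel;
    # the bias keeps a surface from incorrectly shadowing itself ("acne").
    return fragment_depth - bias > shadow_map[v, u]


# A point on the ground directly below the occluder at (0, 2, 0) is shadowed.
print(in_shadow(np.array([0.0, 0.0, 0.0]), shadow_map, light_view, light_proj))  # True
```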