Heightmap
A heightmap, also known as a heightfield, is a two-dimensional raster image in computer graphics that stores elevation values for a surface, where the intensity of each pixel, typically in grayscale, encodes the height at the corresponding (x, y) coordinates and enables the reconstruction of three-dimensional geometry.[1] This representation assumes a single height value per horizontal position, making it suitable for modeling landscapes without overhangs or caves.[2]
Heightmaps are extensively used in terrain rendering for video games, simulations, and geographic information systems (GIS), where they define the vertical displacement of vertices on a base mesh to create realistic landscapes such as mountains, valleys, and plains.[2] In game engines like Unity, heightmaps are imported as grayscale RAW files (often 16-bit for precision) to sculpt terrain procedurally or from real-world digital elevation model (DEM) data, allowing tools to raise, lower, or smooth elevations efficiently.[2] This approach is memory-efficient for large-scale environments, as it requires only a fraction of the storage needed for full polygonal meshes, while supporting level-of-detail techniques for performance optimization during rendering.[3]
Beyond terrain, heightmaps serve as displacement maps in shader-based rendering to add fine-scale surface detail without increasing polygon counts, such as simulating cracks on rocks or fabric weaves.[4] In techniques like parallax occlusion mapping, the height data perturbs texture coordinates to create an illusion of depth on flat surfaces, with brighter pixels indicating protrusions that occlude adjacent areas for enhanced realism.[4] Common formats include PNG or TIFF for grayscale storage, where white pixels denote maximum height and black minimum, often normalized to a 0-1 range for GPU processing.[4]
Heightmaps can be generated procedurally using noise functions such as Perlin noise for synthetic terrains, or derived from satellite imagery and LiDAR scans in geospatial applications, ensuring compatibility with standards such as those from the USGS for accurate elevation modeling.[5] Their simplicity facilitates integration into real-time graphics pipelines, though limitations in representing complex topologies have led to hybrid methods that combine heightmaps with voxel or mesh-based extensions in advanced scenarios.[6]
Fundamentals
Definition and Purpose
A heightmap, also referred to as a heightfield, is a two-dimensional raster array or image that encodes the elevation of points on a three-dimensional surface, where each pixel or grid cell stores a scalar value representing the height at that location. This representation typically uses a grayscale format, with pixel intensities mapping to elevation levels; for instance, in an 8-bit image, values range from 0 (black, indicating minimum elevation) to 255 (white, indicating maximum elevation), allowing 256 discrete height levels.[6] The structure defines a function h: \mathbb{R}^2 \to \mathbb{R}, with elevations interpolated between grid points to form a continuous surface.[6]
The core purpose of a heightmap is to facilitate the compact storage, manipulation, and rendering of complex 3D terrain data by applying vertical displacements to a base mesh, such as a grid of vertices, based on the encoded height values, thereby avoiding explicit storage of full volumetric or polygonal geometry. This approach significantly reduces computational overhead in real-time applications, as the underlying mesh remains planar in the horizontal dimensions while elevation variations are handled separately.[6] Heightmaps thus serve as a foundational tool for simulating realistic landscapes in fields like computer graphics, enabling efficient visualization of large-scale environments without prohibitive memory or processing demands.[6]
The term and concept originated in computer graphics during the late 1970s and early 1980s, amid efforts to model natural phenomena like terrain through stochastic processes, with foundational work by Fournier, Fussell, and Carpenter in 1982 introducing methods for rendering fractal-based surfaces suitable for heightfield representations.[7] Early adoption occurred in terrain simulations for flight simulators, where heightmaps provided a practical means of depicting undulating landscapes over vast areas, marking a shift from vector-based to raster-driven graphics techniques.[8]
Key advantages of heightmaps include their small file sizes, owing to 2D storage and quantization, which support scalability across expansive regions such as planetary surfaces. They also integrate seamlessly with complementary techniques such as texture mapping for surface coloration and normal mapping for lighting detail, enhancing visual fidelity without increasing geometric complexity.[6]
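To make the mapping from pixel values to the continuous function h concrete, the following Python sketch converts 8-bit intensities to elevations and bilinearly interpolates between grid points. The tiny array, the 0-100 m scale, and the sample coordinates are illustrative assumptions, not part of any particular format:

```python
import numpy as np

def sample_height(heightmap: np.ndarray, x: float, y: float) -> float:
    """Bilinearly interpolate a discrete heightmap at a continuous (x, y) position.

    `heightmap` is indexed as [row, col] = [y, x]; coordinates outside the grid
    are clamped to the border.
    """
    rows, cols = heightmap.shape
    x = min(max(x, 0.0), cols - 1.0)
    y = min(max(y, 0.0), rows - 1.0)
    x0, y0 = int(x), int(y)                                # lower grid corner of the cell
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    fx, fy = x - x0, y - y0                                # fractional position inside the cell
    bottom = (1 - fx) * heightmap[y0, x0] + fx * heightmap[y0, x1]
    top    = (1 - fx) * heightmap[y1, x0] + fx * heightmap[y1, x1]
    return float((1 - fy) * bottom + fy * top)

# Example: a tiny 8-bit grayscale heightmap mapped to metres (0..255 -> 0..100 m).
pixels = np.array([[0, 64], [128, 255]], dtype=np.uint8)
elevation = pixels.astype(np.float32) * (100.0 / 255.0)
print(sample_height(elevation, 0.5, 0.5))                  # height at the centre of the cell
```

Bilinear interpolation is the simplest common reconstruction; higher-order filters such as bicubic sampling produce smoother surfaces at extra cost.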
Data Representation
Heightmaps encode terrain elevation data through various schemes to facilitate storage and 3D reconstruction. The most common approach uses grayscale images, where pixel intensity corresponds directly to height, with darker shades representing lower elevations and brighter shades higher ones; 8-bit or 16-bit depth provides sufficient precision to capture subtle variations. Additionally, normalized floating-point values, typically in the range 0 to 1 for unsigned heights or -1 to 1 for signed displacements relative to a base plane, are employed in binary or image formats to handle relative elevations efficiently during processing.[3]
File formats for heightmaps vary by application, balancing compression, metadata support, and compatibility. RAW binary files store uncompressed arrays of height values as simple numeric grids, often in 16-bit integer format for high fidelity, with no headers or encoding overhead. Lossless image formats such as PNG and TIFF preserve grayscale or multichannel data with embedded metadata, such as georeferencing, making them suitable for both graphics and geospatial workflows.[9] Specialized geospatial formats include HGT, a binary grid used for Shuttle Radar Topography Mission (SRTM) data with fixed 1 arc-second resolution tiles named by latitude and longitude (e.g., N37W122.hgt), and Digital Elevation Model (DEM) variants such as GeoTIFF or BIL, which store elevation matrices with accompanying headers for projection and units.[10][11]
The resolution of a heightmap defines the spatial detail of the terrain, with each pixel mapping to a specific horizontal interval; for instance, a 1024×1024 grid at 1 m per pixel covers roughly a 1 km × 1 km area, allowing fine-grained features like ridges or valleys to be captured.[12] Vertical scaling applies an exaggeration factor to height values, stretching the z-axis relative to the x-y plane (e.g., a factor of 2 doubles the apparent relief) to enhance the visibility of subtle topographic differences in visualizations where natural proportions appear flat to human perception.[13]
Despite their efficiency, heightmaps have inherent limitations due to their 2.5D structure, which assigns only a single height value to each (x, y) coordinate, precluding the modeling of overhangs, caves, or other features that cannot be described by one height per horizontal position and instead require full 3D volumetric data.[14] Low-resolution heightmaps exacerbate aliasing artifacts, manifesting as jagged edges or terracing in reconstructed surfaces, particularly when bilinear interpolation fails to smooth transitions between discrete pixel values.[15]
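As an illustration of the headerless RAW representation described above, this Python sketch reads a 16-bit heightmap into a grid, normalizes it to the 0-1 range, and applies a vertical exaggeration factor. The file name, grid dimensions, byte order, and scale constants are assumptions chosen for the example and must match whatever tool exported the file:

```python
import numpy as np

# Illustrative parameters: the actual file name, grid size, byte order and integer
# format depend entirely on the tool that exported the RAW heightmap.
WIDTH, HEIGHT = 1025, 1025
MAX_ELEVATION_M = 800.0        # real-world elevation represented by the value 65535
VERTICAL_SCALE = 2.0           # exaggeration factor applied to the z-axis

raw = np.fromfile("terrain.raw", dtype="<u2")          # little-endian unsigned 16-bit
heights = raw.reshape(HEIGHT, WIDTH).astype(np.float32)

normalized = heights / 65535.0                         # 0..1 range for GPU-friendly use
elevation_m = normalized * MAX_ELEVATION_M             # metres above the base plane
exaggerated = elevation_m * VERTICAL_SCALE             # stretched relief for visualization
```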
Generation Methods
Manual Creation
Manual creation of heightmaps involves artists and designers using digital tools to paint or draw elevation data directly, providing precise control over terrain features in two-dimensional grayscale images where pixel intensity represents height. Digital painting software such as Adobe Photoshop or Substance 3D Painter enables this through specialized brushes that simulate sculpting, allowing users to build up or erode areas by varying brush opacity and pressure to define hills, valleys, and ridges.[16] In Substance 3D Painter, the Height channel supports HDR color formats for painting positive and negative displacements without brightness limits, facilitating detailed manual adjustment.[16] Vector-based drawing tools such as Adobe Illustrator can also be used to create initial outlines of terrain contours, which are then rasterized into grayscale heightmaps for further editing.[17] Additionally, spline-based methods involve drawing smooth contour lines in software such as AutoCAD or gCADPlus, which are subsequently extruded or interpolated to generate height values across the map.[18]
The typical workflow begins with rough sketches of key terrain features, such as hills and valleys, using basic drawing tools to outline major elevations on a grayscale canvas. Artists then iteratively refine these sketches by layering detail with brushes or paths and applying manual filters, such as Gaussian blur for smoothing transitions or custom erosion effects, to mimic natural wear without relying on automation. Finally, the resulting image is exported in a standard heightmap format, such as 16-bit PNG, to ensure compatibility with rendering engines. The process emphasizes iterative editing, with designers testing the heightmap in a 3D preview to adjust inconsistencies in real time.[19]
Manual creation offers high artistic flexibility, enabling bespoke landscapes tailored to narrative or aesthetic needs, such as fantastical worlds with exaggerated peaks or hidden chasms that procedural methods might overlook. However, it is time-intensive, often requiring weeks or months for complex terrains, and demands significant skill to avoid inconsistencies such as unnatural slopes or artifacts that disrupt visual coherence.[20] Without supplementary aids, maintaining scale and geological realism can be challenging for non-expert users.[20]
In game level design, manual heightmap creation is particularly valuable for adjusting elevations to integrate gameplay elements, such as carving paths for player navigation or raising obstacles to influence combat dynamics, as seen in environments built for Unreal Engine projects.[19] This approach lets designers prioritize player experience over generic randomness, ensuring terrains align with specific level objectives.[20]
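A minimal sketch of the smoothing-and-export step in this workflow is shown below, assuming a hand-painted grayscale image and using OpenCV; the file names and blur strength are illustrative, and artists would normally tune the blur interactively rather than in a script:

```python
import cv2
import numpy as np

# Load a hand-painted grayscale sketch at its native bit depth (assumed single-channel).
sketch = cv2.imread("painted_terrain.png", cv2.IMREAD_UNCHANGED).astype(np.float32)

# Smooth harsh brush transitions; sigma is a taste parameter an artist would tune
# while previewing the terrain in 3D.
smoothed = cv2.GaussianBlur(sketch, ksize=(0, 0), sigmaX=4.0)

# Re-normalize and export as a 16-bit PNG so the engine keeps fine height steps.
span = float(smoothed.max() - smoothed.min()) or 1.0
out = ((smoothed - smoothed.min()) / span * 65535.0).astype(np.uint16)
cv2.imwrite("painted_terrain_16bit.png", out)
```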
Procedural Generation
Procedural generation of heightmaps relies on algorithmic methods to create synthetic terrain data automatically, enabling scalable and varied landscapes without manual intervention. These approaches draw from mathematical models of natural processes, producing height values across a grid from pseudo-random functions that mimic geological features such as hills, valleys, and plateaus. Core to this process are noise functions, which generate coherent, continuous variation rather than pure randomness, ensuring realistic undulations in the terrain.[21]
Among the foundational algorithms is Perlin noise, introduced in 1985 as a gradient noise function that layers coherent noise to simulate natural textures. It addresses the shortcomings of earlier random methods by providing smooth, organic transitions, making it well suited to heightmap undulations. To achieve multi-scale detail, Perlin noise is often combined with fractal Brownian motion (fBm), which sums multiple octaves of noise with decreasing amplitude to model roughness across scales, as formalized in early procedural terrain models. The fBm height value at a point can be computed as
h(x, y) = \sum_{i=0}^{n-1} \left( \frac{1}{2} \right)^i \cdot \text{noise}(2^i x, 2^i y)
where n is the number of octaves, the lacunarity is 2 (doubling the frequency each octave), and the persistence is 0.5 (halving the amplitude each octave), so that successive noise terms contribute progressively finer detail.[22][23][21]
An advancement over Perlin noise is Simplex noise, developed in 2001 to improve computational efficiency, particularly in higher dimensions, by using a simpler simplicial grid that reduces directional artifacts and multiplications compared with Perlin's hypercubic lattice. This makes it suitable for generating noise fields for heightmaps with less overhead while maintaining the natural continuity essential for terrain, and fBm can incorporate Simplex noise in place of Perlin noise for better performance in multi-scale roughness simulation.[24]
Key techniques extend these noise bases to introduce complexity. Domain warping distorts the input coordinates of a noise function using another, auxiliary noise layer, creating intricate features such as winding rivers or eroded canyons by folding the terrain domain non-uniformly. Erosion simulation further refines heightmaps by applying physics-inspired models, such as hydraulic erosion, in which simulated water flow transports sediment from high to low areas and carves realistic valleys through iterative particle- or grid-based computation; thermal erosion variants model rock breakdown and slumping for additional geological fidelity. Seed-based randomness initializes these algorithms deterministically, allowing reproducible generation by fixing the pseudo-random sequence from a user-provided seed.[21][25][26]
Tunable parameters control the character of the output, as illustrated in the sketch below. Octave count determines the layers of detail, with higher values adding finer texture at greater computational cost. Persistence governs the amplitude falloff per octave, typically between 0.5 and 1, where lower values yield smoother terrain by diminishing high-frequency contributions. Lacunarity scales the frequency per octave, often set to 2 so that detail resolution doubles each octave, promoting fractal-like scaling. Boundary conditions, such as periodic wrapping or mirroring, prevent visible seams in tiled heightmaps for seamless worlds.[23][21]
As of 2025, modern approaches integrate machine learning, particularly generative adversarial networks (GANs) trained on real digital elevation models (DEMs), to produce plausible heightmaps more rapidly than traditional noise methods alone. These models learn distributional patterns from data, enable conditional generation from sketches or parameters, and outperform rule-based approaches in capturing rare geological motifs while reducing manual tuning.[27]
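The following Python sketch implements the fBm summation above with adjustable octaves, lacunarity, persistence, and seed. To stay self-contained it substitutes a simple hash-based value noise for Perlin or Simplex noise, so the constants and function names are illustrative rather than a reference implementation:

```python
import numpy as np

def _hash01(ix, iy, seed):
    """Cheap deterministic pseudo-random value in [0, 1) per integer lattice point."""
    v = np.sin(ix * 127.1 + iy * 311.7 + seed * 74.7) * 43758.5453
    return v - np.floor(v)

def value_noise(x, y, seed):
    """Smoothly interpolated lattice ('value') noise standing in for Perlin/Simplex."""
    x0, y0 = np.floor(x), np.floor(y)
    fx, fy = x - x0, y - y0
    # Smoothstep fade has zero slope at cell borders, avoiding visible creases.
    ux, uy = fx * fx * (3 - 2 * fx), fy * fy * (3 - 2 * fy)
    n00 = _hash01(x0, y0, seed)
    n10 = _hash01(x0 + 1, y0, seed)
    n01 = _hash01(x0, y0 + 1, seed)
    n11 = _hash01(x0 + 1, y0 + 1, seed)
    nx0 = n00 * (1 - ux) + n10 * ux
    nx1 = n01 * (1 - ux) + n11 * ux
    return nx0 * (1 - uy) + nx1 * uy

def fbm_heightmap(size=256, octaves=6, lacunarity=2.0, persistence=0.5,
                  base_frequency=4.0, seed=42):
    """Sum `octaves` noise layers with rising frequency and falling amplitude."""
    ys, xs = np.mgrid[0:size, 0:size] / float(size)
    height = np.zeros((size, size))
    frequency, amplitude = base_frequency, 1.0
    for i in range(octaves):
        height += amplitude * value_noise(xs * frequency, ys * frequency, seed + i)
        frequency *= lacunarity      # finer detail each octave
        amplitude *= persistence     # weaker contribution each octave
    return (height - height.min()) / (height.max() - height.min())  # normalize to 0..1

terrain = fbm_heightmap()            # 256x256 array of heights in [0, 1]
```

Swapping the value-noise routine for a Perlin or Simplex implementation changes the look of individual octaves but leaves the fBm layering unchanged.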
Real-World Data Acquisition
Real-world heightmaps, also known as digital elevation models (DEMs), are derived from physical measurements using various remote sensing technologies to capture terrain elevations with high fidelity to actual landscapes. Satellite altimetry, such as radar interferometry, provides broad-scale data; for instance, NASA's Shuttle Radar Topography Mission (SRTM) in 2000 generated a near-global dataset covering nearly 80% of Earth's land surface at 30-meter resolution, using synthetic aperture radar to measure surface heights.[28] Similarly, the ASTER Global Digital Elevation Model (GDEM), released in 2009 with major versions in 2011 (v2) and 2019 (v3), uses optical stereo imaging from the Advanced Spaceborne Thermal Emission and Reflection Radiometer to produce 30-meter gridded elevations worldwide.[29]
Airborne LiDAR (Light Detection and Ranging) offers centimeter-level precision for detailed surveys by emitting laser pulses and measuring their return times; the height h is calculated as h = \frac{c \cdot t}{2} - \text{ground offset}, where c is the speed of light and t is the round-trip travel time of the pulse.[30] This time-of-flight method enables vertical accuracies better than 10 cm in optimal conditions, making LiDAR ideal for high-resolution mapping over targeted areas. Photogrammetry complements these techniques by inferring elevations from stereo imagery, often captured by drones, where height is estimated from parallax, the relative displacement of features between overlapping images taken from different viewpoints.[31] This technique generates DEMs by mathematically reconstructing 3D geometry from 2D image pairs, and is suitable for both aerial and UAV platforms.[32]
Acquired data typically requires processing to form usable heightmaps, beginning with interpolation of sparse point clouds onto a regular grid; inverse distance weighting (IDW), for example, estimates the elevation at each grid node by weighting nearby measurements inversely proportional to their distance, preserving local variation (see the sketch at the end of this section).[33] Datum adjustments are essential to align data with reference systems, such as converting from the WGS84 ellipsoid to mean sea level (MSL) for hydrological applications, ensuring consistent vertical referencing across datasets.[34] Error correction addresses artifacts such as vegetation penetration in LiDAR or radar shadowing, using techniques such as ground filtering to remove non-terrain returns and statistical adjustments based on auxiliary data like ICESat-2 altimetry for vegetated or urban areas.[35][36]
Global datasets like SRTM and ASTER GDEM offer 30-meter resolution with near-complete land coverage but suffer from data voids, particularly in polar regions, due to low solar illumination or persistent cloud cover, and in oceanic areas, where radar signals reflect off the water surface without penetrating to the bathymetry.[37] In contrast, local surveys using USGS LiDAR achieve 1-meter resolution over the contiguous United States, providing dense point clouds for urban or forested terrain but limited to specific project areas.[38] These voids necessitate infilling through multi-source merging, as seen in efforts to create gapless models of Antarctica.[39]
Access to these datasets varies, with public-domain releases available through platforms like USGS EarthExplorer, which distributes SRTM, ASTER, and LiDAR-derived DEMs free of charge for non-commercial use.[40] High-resolution proprietary data, such as commercial LiDAR surveys, may require purchase for specialized applications.
As of 2025, the Copernicus DEM has seen improvements derived from Sentinel-1 and Sentinel-2 satellites, including better infilling over regions such as Norway and the Spanish Pyrenees at 30-meter resolution, enhancing global coverage and accuracy.[41]
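As a concrete illustration of the IDW gridding step mentioned earlier in this section, the following Python sketch interpolates a handful of scattered elevation samples onto a regular grid; the sample coordinates, elevations, and grid extent are invented for the example:

```python
import numpy as np

def idw_grid(points_xy, values, grid_x, grid_y, power=2.0, eps=1e-12):
    """Inverse distance weighting: each grid node is a weighted mean of all sample
    elevations, with weights proportional to 1 / distance**power."""
    gx, gy = np.meshgrid(grid_x, grid_y)                  # grid node coordinates
    dx = gx[..., None] - points_xy[:, 0]                  # offsets to every sample point
    dy = gy[..., None] - points_xy[:, 1]
    dist = np.sqrt(dx * dx + dy * dy)
    weights = 1.0 / np.maximum(dist, eps) ** power        # eps avoids division by zero
    return (weights * values).sum(axis=-1) / weights.sum(axis=-1)

# Example: a few scattered ground returns (x, y, elevation in metres) gridded at 1 m spacing.
samples = np.array([[2.0, 3.0, 105.2], [8.0, 1.0, 98.7],
                    [5.0, 7.0, 110.4], [9.0, 9.0, 101.3]])
dem = idw_grid(samples[:, :2], samples[:, 2],
               grid_x=np.linspace(0.0, 10.0, 11), grid_y=np.linspace(0.0, 10.0, 11))
```

Practical implementations usually restrict the sum to the k nearest samples or a fixed search radius, both for speed and so that distant measurements do not flatten local relief.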
Applications
Computer Graphics and Rendering
In computer graphics, heightmaps serve as a fundamental tool for terrain visualization in real-time rendering pipelines, enabling efficient representation of three-dimensional landscapes on two-dimensional grids. By encoding elevation data in grayscale textures, heightmaps allow graphics engines to displace base meshes, creating the illusion of complex geometry without the computational overhead of full polygonal models. This integration is particularly prominent in game development and virtual reality applications, where interactive performance is paramount.
Rendering techniques for heightmaps primarily revolve around displacement mapping, in which vertex positions are adjusted based on sampled height values to generate undulating surfaces. In vertex shaders, this is achieved by sampling the heightmap texture at a vertex's texture coordinates and offsetting the position accordingly; for example, in GLSL the displaced position can be computed as
vec3 pos = basePos + vec3(0.0, texture(heightTex, texCoord).r * scale, 0.0);
This method, known as vertex displacement, efficiently deforms low-resolution grids into detailed terrain during the geometry stage of the pipeline. To enhance detail adaptively, tessellation subdivides patches dynamically on the GPU, applying displacement to newly generated vertices for smoother transitions and higher fidelity in close-up views, as demonstrated in hardware-accelerated approaches that leverage DirectX 11 or Vulkan tessellation units. Additionally, parallax occlusion mapping (POM) provides a pixel-level illusion of depth on flat-shaded surfaces by iteratively offsetting texture coordinates along the view ray using heightmap samples, simulating self-occlusion and adding apparent surface relief without increasing geometric complexity; the technique traces multiple depth samples per fragment to approximate ray marching, improving visual realism for bumpy terrain at minimal cost.[42][43][44]
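As a CPU-side counterpart to the vertex-shader offset above, the following Python sketch builds vertex positions and triangle indices for a grid whose vertical coordinate is taken from a heightmap; the cell size and height scale are illustrative parameters, and real engines generate or tessellate such meshes on the GPU:

```python
import numpy as np

def displaced_grid_mesh(heightmap, cell_size=1.0, height_scale=50.0):
    """Build vertices and triangle indices for a grid whose vertical (y) coordinate
    is the sampled height -- the CPU-side analogue of the shader offset above."""
    rows, cols = heightmap.shape
    zs, xs = np.mgrid[0:rows, 0:cols].astype(np.float32)
    ys = heightmap.astype(np.float32) * height_scale            # vertical displacement
    vertices = np.stack([xs * cell_size, ys, zs * cell_size], axis=-1).reshape(-1, 3)

    # Two triangles per cell; vertex index layout is row-major (index = row*cols + col).
    # Winding-order conventions vary by engine.
    r, c = np.mgrid[0:rows - 1, 0:cols - 1]
    tl = (r * cols + c).ravel()        # first corner of each cell
    tr = tl + 1                        # next column
    bl = tl + cols                     # next row
    br = bl + 1
    triangles = np.concatenate([np.stack([tl, bl, tr], axis=-1),
                                np.stack([tr, bl, br], axis=-1)])
    return vertices, triangles

# Example with a random 65x65 heightmap in [0, 1].
verts, tris = displaced_grid_mesh(np.random.default_rng(0).random((65, 65)))
```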
Heightmaps are often combined with complementary texture maps to achieve photorealistic lighting and material effects. Normal maps, which encode per-pixel surface normals used for specular highlights and shading, are commonly derived from heightmaps by computing gradients with a Sobel filter: the partial derivatives ∂h/∂x and ∂h/∂y approximate the local slope and are combined into a vector of the form (−∂h/∂x, −∂h/∂y, 1), which is then normalized to give the surface normal. These normals integrate seamlessly with physically based rendering (PBR) workflows, in which albedo textures provide base colors modulated by the displaced geometry. In lit terrain, for instance, the combined maps enable accurate diffuse and specular responses under dynamic lights, enhancing the perceptual depth of procedural landscapes.
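A short sketch of this heightmap-to-normal-map conversion is given below, using SciPy's Sobel filter; the sign conventions and the strength parameter are assumptions that vary between engines and texture-authoring tools:

```python
import numpy as np
from scipy.ndimage import sobel

def heightmap_to_normalmap(heightmap, strength=1.0):
    """Derive a tangent-space normal map from a heightmap using Sobel gradients."""
    h = heightmap.astype(np.float32)
    dhdx = sobel(h, axis=1)                      # approximate dh/dx (along columns)
    dhdy = sobel(h, axis=0)                      # approximate dh/dy (along rows)
    # Assumed convention: n proportional to (-dh/dx, -dh/dy, 1/strength); engines differ
    # in which axis is flipped and how strength is applied.
    nx, ny = -dhdx, -dhdy
    nz = np.full_like(nx, 1.0 / strength)
    length = np.sqrt(nx * nx + ny * ny + nz * nz)
    normals = np.stack([nx, ny, nz], axis=-1) / length[..., None]
    # Pack the [-1, 1] components into the conventional [0, 255] RGB encoding.
    return ((normals * 0.5 + 0.5) * 255.0).astype(np.uint8)
```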
Performance optimizations are crucial for real-time applications, as high-resolution heightmaps can strain GPU resources. Level-of-detail (LOD) systems switch between progressively coarser heightmap resolutions based on viewer distance, reducing vertex processing and draw calls; in Unity, terrain components automatically manage LOD groups for height-displaced meshes, blending seams to avoid popping artifacts. Mipmapping applies filtered downsamplings to height textures, mitigating aliasing on distant surfaces by selecting appropriate levels during sampling, which cuts texture fetch bandwidth in large scenes. Modern engines like Unreal leverage GPU acceleration through compute shaders for asynchronous heightmap updates and displacement, supporting massive worlds without frame drops.[45]
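The downsampling behind both mipmapping and coarse LOD levels can be sketched as a simple 2×2 box filter, as in the following Python example; engines typically build such chains on the GPU and add stitching between neighbouring LOD tiles to hide seams:

```python
import numpy as np

def heightmap_lod_chain(heightmap, levels=4):
    """Return [full-res, half-res, quarter-res, ...] copies built by 2x2 averaging,
    analogous to a mip chain or a distance-based LOD ladder."""
    chain = [heightmap.astype(np.float32)]
    for _ in range(levels):
        h = chain[-1]
        if min(h.shape) < 2:                                        # nothing left to halve
            break
        rows, cols = (h.shape[0] // 2) * 2, (h.shape[1] // 2) * 2   # drop any odd edge row/column
        blocks = h[:rows, :cols].reshape(rows // 2, 2, cols // 2, 2)
        chain.append(blocks.mean(axis=(1, 3)))                      # average each 2x2 block
    return chain
```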
The use of heightmaps in graphics has evolved significantly since early pseudo-3D engines. In DOOM (1993), sector-based elevation simulated basic height variations without true heightmaps, laying groundwork for 2.5D rendering. By 2016, No Man's Sky employed procedural heightfields for infinite planetary terrains, displacing voxel-based meshes in real-time. As of 2025, trends emphasize real-time global illumination on height-displaced surfaces, with ray-traced denoising techniques in engines like Unreal Engine 5 applying indirect lighting to dynamically tessellated terrains, achieving cinematic quality at interactive rates through hardware-accelerated RT cores.