Video game graphics
Video game graphics encompass the visual representations and rendering techniques employed in video games to depict characters, environments, objects, and effects, evolving from rudimentary vector and pixel-based displays to sophisticated real-time 3D simulations that enhance player immersion.[1][2]

The history of video game graphics traces back to the late 1950s, when early experiments utilized oscilloscope displays for simple vector graphics, as seen in Tennis for Two (1958), which rendered basic line-based simulations of moving balls and paddles.[2] By the 1960s and 1970s, advancements at institutions like the University of Utah pioneered foundational techniques, including Gouraud shading (1971) for smooth surface interpolation and texture mapping (1974) by Edwin Catmull, which allowed images to be applied to 3D surfaces for greater realism.[3] Vector graphics dominated arcade games in the late 1970s and early 1980s, enabling scalable wireframe visuals in titles like Asteroids (1979) and Battlezone (1980), where electron beams directly drew lines on CRT screens for precise rotations and high-resolution lines without pixelation.[2] The shift to raster scan CRT displays in the 1970s introduced pixel-based bitmapped graphics, supporting colorful sprites and backgrounds in games such as Space Invaders (1978) and Pac-Man (1980), though limited by fixed grids that complicated scaling and rotation.[2]

The 1990s marked the transition to 3D graphics, driven by hardware like the PlayStation console and APIs such as DirectX, with Final Fantasy VII (1997) exemplifying early polygonal models rendered via triangle rasterization for real-time gameplay.[1] Core techniques include rasterization, which converts 3D triangles into 2D pixels through stages like vertex shading and pixel shading, often using the Phong reflection model for lighting effects.[1] In contemporary video games, graphics leverage advanced real-time rendering pipelines to achieve near-photorealism, incorporating ray tracing for accurate light simulation and global illumination, as demonstrated in engines like Unreal Engine 5.[1] This evolution prioritizes performance for interactive frame rates (typically 30-60 FPS), contrasting with offline rendering in animated films, while continuing to build on decades of innovations in shading, texturing, and display technologies.[1][3]

Early Graphics Techniques
Text-based Graphics
Text-based graphics in early video games relied on ASCII characters and descriptive prose to visualize environments, objects, and interactions, serving as the primary visual medium on text-only terminals and early computers lacking dedicated graphics hardware. This approach emerged in the 1970s amid the limitations of mainframe systems like the PDP-10, where games used typed commands and output to simulate immersive worlds without visual rendering.[4] The genre, often called interactive fiction or text adventures, prioritized narrative depth and player agency over visual fidelity, drawing on literary traditions to engage users through imagination.[5]

The foundational example is Colossal Cave Adventure, developed by Will Crowther around 1975-76 and expanded by Don Woods in 1977, which depicted a sprawling cave network through vivid textual descriptions such as "You are standing at the end of a road before a small brick building" and simple two-word commands like "go north."[4] Techniques evolved to include ASCII art for rudimentary maps and symbols, as seen in the roguelike genre's originator, Rogue, created in 1980 by Michael Toy, Glenn Wichman, and Ken Arnold for Unix systems. Rogue employed procedural generation to create randomized dungeon levels displayed via ASCII characters—dashes and pipes for walls, letters for monsters, and symbols for items—allowing dynamic, replayable explorations without static visuals (an illustrative screen fragment appears at the end of this subsection).[6] These methods enabled complex gameplay on resource-constrained hardware, with text serving both as interface and "graphic" element to represent spatial layouts and events.[7]

Despite their innovations, text-based graphics faced inherent limitations, including the absence of color, animation, and intuitive visuals, which placed heavy reliance on players' mental imagery to fill in details and sustain engagement.[8] Parser sophistication advanced in the late 1970s with the Zork series, begun in 1977 by Tim Anderson, Marc Blank, Bruce Daniels, and Dave Lebling at MIT, which handled natural-language commands like "take all but rug" yet remained purely textual; its commercial release by Infocom in the early 1980s marked a transitional peak before graphical interfaces dominated.[8]

Key examples from the 1980s include Multi-User Dungeons (MUDs), pioneered in 1978 by Roy Trubshaw and Richard Bartle at the University of Essex, written initially in MACRO-10 assembly for a DECsystem-10. These evolved into networked, multi-player text adventures accessible to outside users by 1980 and later via commercial services like CompuNet, fostering social interactions through shared textual worlds that later influenced online gaming on personal computers.[9] MUDs like MUD1 and its 1985 successor MUD2 emphasized collaborative exploration and role-playing in procedurally described realms, extending the single-player text adventure model to communal experiences.[10]
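The Rogue-style display described above can be pictured with a small fragment. The layout and legend below are a modern mock-up in the game's general style, not a capture of an actual Rogue session:

```
--------------------
|...@.....$........|      @  player     $  gold
|........D.........+###   D  monster    +  door
|..!...............|      !  potion     ### corridor
--------------------
```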
Vector Graphics
Vector graphics in video games refer to a rendering technique that uses mathematical equations to draw lines, curves, and polygons directly on cathode-ray tube (CRT) displays or oscilloscopes, producing wireframe visuals without relying on a pixel grid for inherently smooth and scalable imagery.[11] This approach leverages electron beam deflection to trace luminous paths on the phosphor-coated screen, creating high-contrast, glowing lines that persist briefly due to phosphor afterglow.[12] Unlike raster systems, which scan pixels row by row, vector methods enable precise, real-time plotting of geometric primitives, marking an evolution from text-based displays toward more dynamic visual representations in early gaming.[13]

The technique emerged in arcade games during the mid-1970s, with Space Wars (1977) by Cinematronics serving as the first mass-produced vector-based title, designed by Larry Rosenthal as an adaptation of the 1962 mainframe game Spacewar!.[14] This two-player space combat game utilized a custom vector monitor with digital-to-analog converters to generate sharp, black-and-white wireframe ships and obstacles, controlled via discrete hardware components without a microprocessor.[13] Atari advanced the format in 1979 with Lunar Lander and Asteroids, both employing the company's Digital Vector Generator (DVG)—a specialized circuit built from TTL integrated circuits that sequences vectors stored in ROM and RAM to drive deflection coils on monochrome CRTs (a simplified software sketch of this vector-list approach appears at the end of this subsection).[15] Asteroids, in particular, depicted asteroid fields and spacecraft as interconnected line segments, achieving real-time updates at 60 Hz for fluid motion.[16] By 1980, vector graphics enabled rudimentary 3D simulations, as seen in Atari's Battlezone, which rendered wireframe tanks and terrain from a first-person perspective using the DVG augmented by a "math box" of bit-slice processors to compute matrix transformations for rotation, scaling, and projection.[11] This hardware-specific approach offered advantages like superior brightness and alias-free lines, ideal for dimly lit arcades, and supported rapid drawing speeds that minimized flicker in fast-paced action.[12]

However, vector systems declined in the early 1980s as raster displays became more affordable and versatile, supporting filled polygons, textures, and full-color palettes while requiring less specialized, failure-prone hardware like high-voltage deflection circuits.[12] Cinematronics shifted to laserdisc technology by 1983, and Atari's Tempest (1981), an early color vector title, highlighted the format's niche appeal; Atari's later vector releases, such as Star Wars (1983), could not reverse the trend as raster dominance in titles like Pac-Man solidified the transition.[12] Battlezone's innovations, meanwhile, extended to military flight simulators, underscoring vector graphics' lasting influence on immersive 3D training applications.[11]
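The vector-list idea can be illustrated in software. This sketch rotates an Asteroids-style wireframe ship and emits its line segments; the ship coordinates and the printed "LINE" output are illustrative inventions, not Atari's actual DVG instruction encoding:

```c
#include <math.h>
#include <stdio.h>

/* A wireframe "ship" stored as vertices in model space, in the spirit of
   Asteroids-era vector lists.  These coordinates are illustrative. */
static const float ship[4][2] = { {0, 12}, {-8, -8}, {0, -4}, {8, -8} };

/* Rotate each vertex by angle (radians), translate to the screen position,
   and emit line segments.  Vector hardware received endpoint pairs like
   these and swept the electron beam between them; here we just print. */
static void draw_ship(float x, float y, float angle) {
    float c = cosf(angle), s = sinf(angle);
    for (int i = 0; i < 4; i++) {
        int j = (i + 1) % 4;                 /* next vertex, closing the loop */
        float x0 = x + ship[i][0] * c - ship[i][1] * s;
        float y0 = y + ship[i][0] * s + ship[i][1] * c;
        float x1 = x + ship[j][0] * c - ship[j][1] * s;
        float y1 = y + ship[j][0] * s + ship[j][1] * c;
        printf("LINE (%.1f, %.1f) -> (%.1f, %.1f)\n", x0, y0, x1, y1);
    }
}

int main(void) {
    draw_ship(512.0f, 384.0f, 0.3f);         /* ship at screen center, rotated */
    return 0;
}
```

Because the shape is stored as endpoints rather than pixels, rotation and scaling are exact, which is why vector titles achieved smooth rotations that contemporary raster hardware could not.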
2D Graphics
Sprite and Tile-based Rendering
Sprite and tile-based rendering refers to a foundational technique in 2D video game graphics where the screen is composed by combining small, reusable bitmap images known as tiles for static or scrolling backgrounds and sprites for dynamic, movable elements overlaid on those backgrounds. Tiles are typically square bitmaps, such as 8x8 pixels, arranged in a grid called a tilemap to construct larger scenes efficiently, minimizing memory usage by reusing patterns for elements like floors, walls, or terrain. Sprites, in contrast, are independent bitmaps—often the same size as tiles but configurable for characters, projectiles, or effects—that can be positioned, scaled, or layered arbitrarily to create interactive visuals. This approach dominated early console and arcade hardware due to limited processing power and memory, enabling complex scenes without rendering every pixel from scratch.[17]

The technique emerged in the late 1970s with arcade systems, where custom hardware first supported tile-based backgrounds and overlaid sprites for animation. Namco's Pac-Man (1980) exemplified this, using an 8x8 tile grid for the maze layout stored in video RAM and dedicated sprite hardware to position and animate the titular character and ghosts as 16x16 pixel overlays, allowing smooth movement across the tilemap. This innovation built on earlier arcade chipsets like those in Namco's Galaxian (1979), which introduced programmable tile graphics and hardware sprites, with Pac-Man refining their use for character animation in a major commercial title.[18][19] By the early 1980s, home consoles adopted similar designs; the Nintendo Entertainment System (NES), released in Japan as the Famicom in 1983, featured a Picture Processing Unit (PPU) that rendered backgrounds via two 32x30 tilemaps (each tile 8x8 pixels) and supported up to 64 sprites per frame, drawn from pattern tables in video RAM.[20]

Core techniques include sprite multiplexing, where hardware or software prioritizes and layers multiple sprites per scanline to composite the final image, and tilemap scrolling, which shifts the background grid horizontally or vertically by adjusting tile indices without redrawing pixels. In the NES PPU, for instance, sprite evaluation during each scanline fetches up to eight sprites from Object Attribute Memory (OAM), copying their tile indices, positions, and attributes (like horizontal/vertical flipping or priority) to secondary OAM for rendering, while background tiles are fetched in parallel from nametables. Palette limitations were common to conserve memory; the NES offered a master palette of roughly 52 colors but restricted each sprite to one of four palettes (each holding three colors plus transparency) and background tiles to one of four similar sub-palettes sharing a common backdrop color, often leading to visual constraints like color clashes in overlapping areas. Scrolling tilemaps in games used modular updates, where only changed tile positions were reloaded during vertical blanking intervals to maintain 60 Hz refresh rates.[21][20]

Animation in sprite-based systems relies on frame-by-frame substitution, where a sequence of pre-drawn bitmap frames is cycled through by updating the sprite's tile index in OAM at timed intervals, often synchronized to the game's frame rate. For example, character walking cycles might flip between 4-8 frames stored in the sprite pattern table, with attributes like horizontal flipping used to mirror sprites for left/right movement without duplicating assets.
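A minimal sketch of this frame-substitution scheme in C, against a hypothetical OAM-style structure; the field names and tile indices are illustrative, with only the 4-byte layout and the flip bit mirroring the NES convention:

```c
#include <stdint.h>

/* Hypothetical OAM-style sprite entry, modeled loosely on the NES PPU's
   4-byte Object Attribute Memory records (names are illustrative). */
typedef struct {
    uint8_t y;          /* screen Y position */
    uint8_t tile;       /* index into the pattern table */
    uint8_t attr;       /* bit 6: horizontal flip (as on the NES) */
    uint8_t x;          /* screen X position */
} Sprite;

#define FLIP_H 0x40

/* A 4-frame walk cycle stored as consecutive tile indices. */
static const uint8_t walk_frames[4] = { 0x10, 0x11, 0x12, 0x11 };

/* Called once per 60 Hz frame: advance the animation every 8 frames, and
   mirror the sprite when walking left instead of storing extra tiles. */
void animate_walk(Sprite *s, uint32_t frame_count, int facing_left) {
    s->tile = walk_frames[(frame_count / 8) % 4];
    if (facing_left)
        s->attr |= FLIP_H;
    else
        s->attr &= (uint8_t)~FLIP_H;
}
```

Cycling a single byte per sprite is what made period animation affordable: the pixel data never moves, only the index into it.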
Collision detection typically employs bounding boxes—rectangular approximations of sprite shapes defined by their pixel coordinates—to check overlaps efficiently, comparing x/y extents between sprites or against tilemap positions rather than pixel-perfect analysis, which was computationally expensive on period hardware. This method enabled responsive interactions, such as player-enemy contacts, by flagging collisions when boxes intersected during update loops (a minimal overlap test is sketched at the end of this subsection).[22]

A seminal example is Super Mario Bros. (1985) on the NES, which constructed levels using 8x8 background tiles for platforms and scenery, while protagonists and enemies utilized 8x16 sprites (combining two 8x8 tiles vertically) for taller forms like the 16x32 big Mario. The game's side-scrolling levels applied these in a layered tilemap for parallax effects, with sprites animated via frame flipping for actions like jumping. Hardware limits, such as the PPU's cap of 64 total sprites and eight per scanline, caused flicker in dense scenes—e.g., during enemy swarms—where excess sprites were dropped or rotated in OAM order across frames to distribute visibility pseudo-randomly, preventing permanent disappearance of key elements. These constraints influenced design, prioritizing sparse on-screen action to avoid visual artifacts while maximizing the system's 256 sprite tile capacity in video RAM.[23][24]
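The bounding-box test described above reduces to four comparisons. A minimal C sketch, with an illustrative box structure:

```c
#include <stdbool.h>

/* Axis-aligned bounding box in screen pixels; the fields are illustrative. */
typedef struct { int x, y, w, h; } Box;

/* Two boxes overlap exactly when they overlap on both axes.  Period
   hardware favored this four-comparison test over pixel-perfect checks. */
bool boxes_overlap(Box a, Box b) {
    return a.x < b.x + b.w && b.x < a.x + a.w &&
           a.y < b.y + b.h && b.y < a.y + a.h;
}

/* Usage: flag a player-enemy contact during the update loop, e.g.
   if (boxes_overlap(player_box, enemy_box)) handle_hit();          */
```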
2D Perspectives and Views
In 2D video games, perspectives and views refer to the camera angles and spatial layouts that guide player navigation and interaction, often simulating depth within a flat plane to enhance immersion and gameplay flow. These techniques prioritize simplicity and direct control, allowing developers to focus on mechanics like exploration and precision timing without the computational demands of three-dimensional rendering. Common arrangements include top-down and side-scrolling views, each suited to different genres and historical eras of game design.[25]

The top-down or overhead view presents the game world from above, typically using orthogonal projection for a flat, map-like representation or isometric projection to add subtle depth cues through angled visuals. This perspective excels in strategy and adventure games, where grid-based movement enables clear pathfinding and tactical planning, as seen in The Legend of Zelda (1986), which employed a top-down layout to facilitate open-world exploration across Hyrule's interconnected screens.[25] Orthogonal top-down views maintain consistent scale for all elements, promoting precise navigation on structured grids, while isometric variants, though less common in early titles, offer a pseudo-elevated feel for multi-level environments without full 3D processing. Sprites populate these views efficiently, layering characters and objects to create dynamic scenes.[26]

Side-scrolling views, in contrast, unfold horizontally as players progress left to right or vice versa, emphasizing linear advancement through levels filled with obstacles and enemies. This arrangement simulates forward momentum and environmental traversal, with parallax scrolling—a technique where background layers move at varying speeds relative to the foreground—creating an illusion of depth by mimicking real-world visual separation. In Sonic the Hedgehog (1991), parallax scrolling enhanced the high-speed chase through zones like Green Hill, where distant hills shifted slower than nearby foliage, reinforcing the sense of velocity and expansive worlds on the Sega Genesis hardware.[27] Such views suit action-oriented gameplay, allowing seamless horizontal expansion beyond single-screen limits.

Platformer games, a subset often using side-scrolling, incorporate specific physics simulations to handle verticality and interaction, particularly through jump arcs governed by gravity. These arcs follow parabolic trajectories, where initial upward velocity diminishes under constant downward acceleration, enabling players to clear gaps or reach platforms with tunable height based on input duration. Developers simulate 2D gravity as a fixed per-frame acceleration (conceptually 9.8 m/s² scaled for gameplay feel), integrating velocity into position each frame to produce responsive, intuitive leaps that feel natural yet controllable. Multi-layer backgrounds further enrich platformers by separating environmental elements—foreground platforms, midground hazards, and distant scenery—fostering storytelling through visual narrative, such as evolving landscapes that hint at lore or progression without explicit text.[28][29]
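Both ideas fit in a few lines of per-frame code. This C sketch integrates a parabolic jump and derives parallax layer offsets; all constants are illustrative tuning values, not figures from any particular game:

```c
/* Per-frame integration of a parabolic jump arc. */
typedef struct { float x, y, vy; int on_ground; } Player;

#define GRAVITY    0.4f    /* downward acceleration (pixels/frame^2) */
#define JUMP_SPEED -7.0f   /* initial upward velocity (negative = up) */
#define GROUND_Y   200.0f  /* illustrative ground line */

void update_player(Player *p, int jump_pressed) {
    if (jump_pressed && p->on_ground) {
        p->vy = JUMP_SPEED;           /* launch with a fixed initial velocity */
        p->on_ground = 0;
    }
    p->vy += GRAVITY;                 /* gravity bends the path into a parabola */
    p->y  += p->vy;                   /* integrate velocity into position */
    if (p->y >= GROUND_Y) {           /* landed: clamp and reset */
        p->y = GROUND_Y;
        p->vy = 0.0f;
        p->on_ground = 1;
    }
}

/* Parallax: each background layer scrolls at a fraction of camera speed,
   so distant layers appear to move more slowly than the foreground. */
float layer_offset(float camera_x, float depth_factor) {
    return camera_x * depth_factor;   /* e.g. 0.25 for far hills, 1.0 for foreground */
}
```

Variable-height jumps of the kind the text describes are typically achieved by cutting `vy` short when the button is released early, a small extension of the same loop.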
Historically, 2D perspectives evolved from static, fixed-screen designs to fluid scrolling, reflecting hardware advancements and design ambitions. Early arcade titles like Donkey Kong (1981) confined action to single screens, requiring players to navigate vertically and horizontally within bounded views to climb structures and avoid hazards, which emphasized puzzle-like timing over exploration. By the late 1980s, console capabilities enabled smooth scrolling, as in Mega Man (1987), where continuous horizontal movement across expansive stages allowed for rhythmic combat and level progression, marking a shift toward more immersive, world-spanning layouts. This transition expanded gameplay scope while retaining 2D's core efficiency.[30][31]

The advantages of 2D perspectives lie in their computational simplicity and emphasis on precise controls, making them ideal for accessible, responsive experiences. Rendering flat planes and layered sprites demands far less processing power than 3D polygons, reducing development time and costs—often by a factor of two to five—while enabling tight, pixel-perfect input mapping for actions like jumps or aiming. This focus on core mechanics fosters genres reliant on skill mastery, such as platformers, without the complexity of spatial navigation or lighting calculations.[26][29]

Pseudo-3D Techniques
Pseudo-3D techniques encompass a range of 2D rendering methods employed in video games during the 1980s and 1990s to simulate three-dimensional depth and perspective without full 3D polygon processing, relying instead on scaling, rotation, layering, and projection tricks to create illusions of spatiality.[32] These approaches bridged the gap between flat 2D sprite-based graphics and emerging true 3D systems, leveraging the limited hardware capabilities of arcade machines, early consoles, and personal computers to achieve dynamic visuals like curving roads or labyrinthine corridors. By manipulating 2D elements such as backgrounds and sprites—building on basic 2D sprite rendering—they produced engaging pseudo-depth effects that enhanced gameplay immersion without the computational overhead of volumetric modeling.[33]

One foundational technique involved sprite scaling to simulate distance, where objects farther from the viewer were rendered smaller and layered behind closer ones, often combined with vertical positioning to mimic elevation. In arcade racing games like Out Run (1986) by Sega, this was applied to road segments: pre-rendered 2D strips of the track were scaled and shifted frame-by-frame to create the illusion of forward motion and turns, with dedicated hardware chips automating basic drawing while the CPU handled positional calculations.[33] Similarly, isometric or 3/4 views used angled 2D tiles and sprites to convey height and multi-level structures, as seen in Populous (1989) by Bullfrog Productions, where isometric projections of terrain and buildings provided a pseudo-3D overview of world-shaping layouts, allowing players to perceive depth in a top-down plane without z-depth buffering.[34] These methods emerged prominently in the mid-1980s amid arcade hardware advancements, evolving from simpler vector displays to sprite-driven simulations that prioritized speed and visual flair over geometric accuracy.

A pivotal advancement came with affine transformations on consoles like the Super Nintendo Entertainment System (SNES), introduced in 1990, which enabled hardware-accelerated rotation, scaling, shearing, and translation of entire background layers to generate pseudo-3D environments. The SNES's Mode 7 specifically rendered a single 8-bit-per-pixel layer as a texture-mapped plane, applying a rotation matrix computed via sine and cosine functions during horizontal blanks, with HDMA (Horizontal Direct Memory Access) allowing per-scanline adjustments for perspective distortion.[32] This technique shone in racing titles such as F-Zero (1990) by Nintendo, where Mode 7 scaled and rotated a checkered track texture to simulate winding, multi-elevation circuits, achieving smooth 60 FPS visuals that conveyed velocity and depth through continuous affine warping of the 2D plane.[32]
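The per-scanline trick can be captured in a short software sketch: every row below the horizon samples the flat texture through its own affine mapping, which is the role the HDMA updates played in hardware. The buffer names, 320x200 output size, and constants below are illustrative, not SNES register layouts:

```c
#include <stdint.h>
#include <math.h>

/* Mode 7-style flat ground plane, rendered in software. */
static uint8_t ground_tex[256][256];    /* assumed 8-bit texture, filled elsewhere */
static uint8_t framebuffer[200][320];   /* illustrative 320x200 output */

void render_mode7_floor(float cam_x, float cam_y, float angle) {
    const float horizon = 40.0f, focal = 160.0f, cam_height = 32.0f;
    float c = cosf(angle), s = sinf(angle);
    for (int sy = 0; sy < 200; sy++) {
        float dy = sy - horizon;
        if (dy <= 0.0f) continue;               /* above the horizon: sky */
        float z = cam_height * focal / dy;      /* row's distance along the ground */
        for (int sx = 0; sx < 320; sx++) {
            float x = (sx - 160) * z / focal;   /* lateral spread grows with distance */
            float tx = cam_x + x * c - z * s;   /* rotate into world space */
            float ty = cam_y + x * s + z * c;
            framebuffer[sy][sx] = ground_tex[(int)ty & 255][(int)tx & 255];
        }
    }
}
```

Note that for a fixed row, `tx` and `ty` are linear in `sx`: each scanline really is a plain affine view of the texture, only the coefficients change row to row, exactly the workload Mode 7 plus HDMA handled.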
Another key method, ray casting, projected 3D-like corridors from a 2D map by casting virtual rays from the player's viewpoint into a grid-based world, determining wall distances and heights to draw vertical strips as textured columns. Pioneered in Wolfenstein 3D (1992) by id Software, this algorithm transformed a simplified 2D floor plan into a first-person perspective by calculating ray intersections with walls, scaling wall slices proportionally to their distance for a faux-3D maze effect, all rendered in real-time on 286 PCs without floating-point operations (a simplified column-casting sketch closes this subsection).[35] Building on this, Doom (1993) by id Software refined visibility handling through binary space partitioning (BSP) trees, which pre-divided static level geometry into a hierarchical structure offline, enabling efficient front-to-back rendering and occlusion culling to avoid drawing hidden surfaces.[36] This innovation, adapted from 1980s computer graphics research, allowed complex, multi-room environments to render at playable speeds on era hardware, marking a high-water mark for pseudo-3D before polygonal engines dominated.

Despite their ingenuity, pseudo-3D techniques faced inherent limitations due to their 2D foundations and hardware constraints, lacking true occlusion for overlapping objects beyond simple layering, dynamic lighting, or sloped surfaces. In ray-casting engines like Wolfenstein 3D, walls were confined to a uniform grid with fixed heights, preventing variable elevations or non-orthogonal architecture, while visibility computations relied on ray traces per screen column, capping performance at resolutions like 320x200.[37] BSP in Doom mitigated some visibility issues for static sectors but struggled with dynamic elements like enemies, requiring separate clipping and rendering passes, and prohibited features such as multi-level floors or arched doorways to maintain efficiency.[36] These constraints—rooted in the absence of depth buffers or vector math support—confined pseudo-3D to stylized, corridor-like or planar simulations, paving the way for full 3D transitions by the mid-1990s as processing power grew.[32]
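As promised above, a simplified per-column ray cast in C. Real engines stepped exactly from cell boundary to cell boundary (a DDA) in fixed-point math; this sketch marches in small float increments for clarity, and the map, screen width, and constants are illustrative. The solid border of wall cells guarantees every ray terminates inside the array:

```c
#include <math.h>

#define MAP_W 8
#define MAP_H 8
static const int map[MAP_H][MAP_W] = {   /* 1 = wall, 0 = open floor */
    {1,1,1,1,1,1,1,1},
    {1,0,0,0,0,0,0,1},
    {1,0,0,1,1,0,0,1},
    {1,0,0,0,0,0,0,1},
    {1,0,1,0,0,1,0,1},
    {1,0,0,0,0,0,0,1},
    {1,0,0,0,0,0,0,1},
    {1,1,1,1,1,1,1,1},
};

#define SCREEN_W 320
#define FOV 1.0f                       /* ~57 degrees, in radians */

/* Returns the wall-slice height in pixels for one screen column, given a
   player position (px, py) inside the map and a view direction dir. */
int cast_column(float px, float py, float dir, int column) {
    float ray = dir + FOV * ((float)column / SCREEN_W - 0.5f);
    float dx = cosf(ray), dy = sinf(ray);
    float dist = 0.0f;
    while (dist < 64.0f) {             /* march until the ray enters a wall cell */
        dist += 0.01f;
        int cx = (int)(px + dx * dist);
        int cy = (int)(py + dy * dist);
        if (map[cy][cx] != 0) {
            /* project onto the view direction to correct fisheye distortion */
            float perp = dist * cosf(ray - dir);
            return (int)(200.0f / perp);   /* nearer walls draw taller slices */
        }
    }
    return 0;                          /* no hit within range */
}
```

Calling `cast_column` once per screen column and drawing each result as a vertical strip reproduces the corridor effect; texturing picks a texture column from where along the wall the ray landed.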
3D Graphics
3D Modeling and Basic Rendering
In 3D modeling for video games, objects are represented using polygonal meshes, which consist of vertices defining points in 3D space, edges connecting those vertices, and faces—typically triangles or quadrilaterals—forming the surfaces of the model.[38] These meshes approximate complex shapes through a collection of flat polygons, allowing for efficient manipulation and rendering in real-time environments. To display these 3D models on a 2D screen, rasterization is employed, a process that projects the 3D geometry onto the screen and fills the resulting pixels with color data, converting vector-based polygons into a raster image suitable for output.[39]

The transition to consumer-accessible 3D graphics accelerated in the 1990s, driven by hardware advancements that shifted games from 2D sprites to fully polygonal environments. The Sony PlayStation, released in Japan in December 1994, featured dedicated 3D polygon processing capabilities, enabling home consoles to handle real-time 3D rendering for titles like Ridge Racer.[40] On the PC side, the 3dfx Voodoo graphics card, launched in 1996, provided affordable 3D acceleration, revolutionizing gameplay with smoother frame rates and effects in games such as Quake.[41] This era marked a pivotal shift, as developers moved from experimental arcade systems to widespread adoption in home gaming.

Early polygonal rendering techniques emphasized flat shading, filling entire faces with solid colors to create basic 3D structures, as seen in Sega's Virtua Fighter (1993), whose low-polygon character models—on the order of a thousand flat-shaded polygons per fighter—still achieved fluid animations on arcade hardware.[42] Texture mapping enhanced these models by applying 2D images onto polygonal surfaces using UV coordinates, which map each vertex to a specific point (u,v) on a texture image, allowing simple details like clothing patterns without increasing polygon count.[43] Depth sorting was managed via z-buffering, a technique that maintains a depth value for each screen pixel and discards fragments farther from the viewer during rasterization, ensuring correct occlusion without manual polygon ordering (a minimal depth-test sketch appears below).[44]

Developers faced significant challenges with limited computational resources, resulting in tight polygon budgets—a few hundred polygons per character model in id Software's Quake (1996)—to maintain playable frame rates on contemporary hardware.[45] Rendering relied on fixed-function pipelines in early GPUs, where hardware performed predefined operations like transformation and lighting without programmable flexibility, constraining effects to basic transformations and texturing.[46] These constraints prioritized optimization, often leading to stylized, blocky aesthetics that defined the era's visual identity.
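The z-buffer logic itself is very small. This C sketch keeps the nearest depth seen at each pixel and discards any fragment behind it, so triangles occlude correctly regardless of draw order; buffer sizes and names are illustrative:

```c
#include <float.h>
#include <stdint.h>

#define W 320
#define H 240

static float    zbuf[H][W];     /* nearest depth recorded per pixel */
static uint32_t color[H][W];    /* final image */

/* Reset depths to "infinitely far" before rendering a frame. */
void clear_buffers(void) {
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++) {
            zbuf[y][x] = FLT_MAX;
            color[y][x] = 0;
        }
}

/* Called for every fragment a rasterized triangle covers. */
void shade_fragment(int x, int y, float depth, uint32_t rgba) {
    if (depth < zbuf[y][x]) {   /* nearer than anything drawn so far? */
        zbuf[y][x] = depth;     /* record the new nearest depth */
        color[y][x] = rgba;     /* overwrite the pixel */
    }                           /* else: fragment is occluded, discard it */
}
```

The memory cost (a full extra screen-sized buffer) is why some early hardware instead sorted polygons by depth, trading correctness at intersections for RAM.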
3D Perspectives and Camera Views
In three-dimensional video game graphics, perspectives and camera views determine how players perceive and interact with virtual environments, fundamentally shaping immersion and gameplay dynamics. The first-person perspective places the player directly in the role of the protagonist, eliminating an on-screen avatar to enhance embodiment and spatial presence. This approach was advanced in full 3D polygonal games like Quake (1996), which rendered complex environments and enemies using textured polygons from the player's viewpoint, enabling fast-paced action and intense immersion through direct control and vulnerability.[45]

In contrast, the third-person perspective maintains a visible player character, allowing observation of actions and surroundings from an external vantage, often via over-the-shoulder or chase cameras. Tomb Raider (1996) exemplified this with its dynamic third-person camera that automatically adjusted to Lara Croft's movements—such as running, jumping, or climbing—while providing contextual views of the environment to aid puzzle-solving and exploration in fully navigable 3D levels.[47] This setup enables dynamic switching between fixed and free cameras, balancing player agency with narrative visibility, as seen in later titles that toggle views for combat or traversal.

Core techniques for implementing these views include projection matrices to map 3D coordinates onto 2D screens and clipping planes to optimize rendering. Perspective projection, common in immersive games, uses a field-of-view (FOV) parameter—typically 45–90 degrees for realism in first-person shooters—to simulate human vision, where distant objects appear smaller, achieved via functions like gluPerspective() with parameters for FOV angle, aspect ratio, near-plane distance, and far-plane distance.[48] Orthographic projection, conversely, renders without depth scaling, maintaining uniform object sizes for isometric or strategic views, as in glOrtho(). Clipping planes define the view frustum's boundaries: the near plane (e.g., 0.1 units) culls geometry too close to the camera to prevent distortion, while the far plane (e.g., 1000 units) eliminates distant objects beyond visibility, reducing computational load by discarding off-screen or out-of-range polygons before rasterization.[48][49]
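The matrix behind those parameters can be built by hand. This C sketch constructs a column-major 4x4 perspective matrix from the same inputs gluPerspective() takes, following the standard OpenGL-style formulation; it is a sketch, not a library routine:

```c
#include <math.h>

/* Build a column-major perspective matrix from vertical FOV (degrees),
   aspect ratio, and near/far clipping plane distances. */
void perspective(float m[16], float fov_deg, float aspect,
                 float near_z, float far_z) {
    /* cot(fov/2): taller FOV -> smaller f -> more of the scene visible */
    float f = 1.0f / tanf(fov_deg * 3.14159265f / 360.0f);
    for (int i = 0; i < 16; i++) m[i] = 0.0f;
    m[0]  = f / aspect;                              /* x scale */
    m[5]  = f;                                       /* y scale */
    m[10] = (far_z + near_z) / (near_z - far_z);     /* depth remapping */
    m[14] = (2.0f * far_z * near_z) / (near_z - far_z);
    m[11] = -1.0f;       /* copies -z_eye into w: the perspective divide */
}

/* Usage: a 60-degree FOV with near/far planes at 0.1 and 1000 units,
   matching the ranges discussed above:
   float proj[16]; perspective(proj, 60.0f, 16.0f/9.0f, 0.1f, 1000.0f);  */
```

Geometry whose transformed depth falls outside the near/far range is clipped before rasterization, which is exactly the frustum-culling behavior the text describes.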
The evolution of 3D perspectives progressed from constrained, fixed views to expansive free-roaming cameras, reflecting hardware advances. Early polygonal titles often locked the camera to fixed angles or scripted paths, relying on simple clipping to discard off-screen geometry. By 2001, Grand Theft Auto III introduced seamless third-person free-roaming in an open-world city, with rotatable cameras during driving and on-foot exploration, enabling 360-degree navigation and enhancing spatial awareness across vast urban environments.[50]
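A rotatable chase camera of this kind typically orbits the player at a fixed distance and pitch, then eases toward its target each frame. A minimal C sketch; the names, offsets, and smoothing scheme are illustrative, not any particular engine's implementation:

```c
#include <math.h>

typedef struct { float x, y, z; } Vec3;

/* Position the camera at a given distance behind the player, raised by
   the pitch angle; the view is then aimed from cam toward the player. */
Vec3 chase_camera(Vec3 player, float yaw, float pitch, float dist) {
    Vec3 cam;
    cam.x = player.x - dist * cosf(pitch) * sinf(yaw);  /* behind the player */
    cam.z = player.z - dist * cosf(pitch) * cosf(yaw);
    cam.y = player.y + dist * sinf(pitch);              /* raised above */
    return cam;
}

/* Blend each camera coordinate toward its target every frame (simple
   exponential smoothing) to avoid jarring cuts when the player turns. */
float approach(float current, float target, float rate) {
    return current + (target - current) * rate;         /* rate in (0, 1] */
}
```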
These perspectives offer advantages like heightened spatial awareness—first-person views excel in tactical precision for shooters, while third-person aids environmental interaction—but also pose challenges, such as motion sickness in first-person games due to sensory conflicts between visual motion and physical stillness. Studies indicate that narrow FOVs (below 90 degrees) exacerbate nausea and disorientation in FPS titles, prompting developers to recommend wider settings (e.g., 100+ degrees) and stable camera mechanics to mitigate symptoms like dizziness, which can affect up to 80% of susceptible players during prolonged sessions.[51][52]