Paraxial approximation
The paraxial approximation, also known as Gaussian optics, is a simplifying assumption in geometrical optics that applies to light rays propagating close to the optical axis and making small angles with it. It linearizes the ray equations, which greatly simplifies the analysis of imaging systems.[1][2] Calculations are confined to the region near the axis, where ray heights and angles remain small; the approximation is typically considered valid for angles below about 10 degrees, where the errors are around 1%.[2][3] The approximation was formalized by Carl Friedrich Gauss in 1841 in his work Dioptrische Untersuchungen, where he used small-angle approximations to simplify the analysis of optical systems, laying the foundation for first-order optics.[4]
At its core, the paraxial approximation relies on the small-angle substitutions \sin \theta \approx \tan \theta \approx \theta (with \theta in radians), which transform nonlinear trigonometric relations into linear ones.[1][2] For refraction at interfaces, this yields the paraxial form of Snell's law: n_1 \theta_1 = n_2 \theta_2, where n denotes refractive index and \theta the angle between the ray and the surface normal.[1] Reflection at mirrors is linearized in the same way.[2] These approximations also simplify surface sagitta calculations, assuming the sag (deviation from flatness) is negligible compared to the radius of curvature.[2]
In practice, the paraxial approximation facilitates the use of ray transfer matrices to model entire optical systems, such as combinations of lenses and free-space propagation.[1] For a thin lens, the matrix is \begin{pmatrix} 1 & 0 \\ -1/f & 1 \end{pmatrix}, where f is the focal length, while propagation over distance L uses \begin{pmatrix} 1 & L \\ 0 & 1 \end{pmatrix}.[1] The overall system matrix, obtained by multiplying the individual matrices, predicts image location and magnification, consistent with the Gaussian imaging equation \frac{n}{s} + \frac{n'}{s'} = \frac{1}{f}, where s and s' are the object and image distances, n and n' are the refractive indices of object and image space, and 1/f is the power of the system. For a thin lens in air (n = n' = 1) this reduces to the thin lens equation \frac{1}{s} + \frac{1}{s'} = \frac{1}{f}.[1][2]
This framework is essential for first-order optical design, including determining cardinal points (foci, principal planes) and Gaussian properties of systems like telescopes and microscopes.[2] It underpins applications in focusing with parabolic or ellipsoidal surfaces for aberration-free on-axis imaging and serves as the foundation for higher-order aberration corrections in complex lenses, such as those used in lithography.[2][3] While powerful for preliminary analysis, it breaks down for wide-field or high-numerical-aperture systems, necessitating exact ray tracing or wavefront methods.[1][3]
Introduction
Definition
The paraxial approximation is a fundamental simplification in optics that assumes light rays propagate close to the optical axis and make small angles with it, typically less than 10 degrees, allowing for linear approximations in ray tracing and wave propagation analysis.[5] Rays are treated as paraxial, meaning their transverse distances from the axis and their inclination angles remain small enough throughout the system that higher-order effects can be neglected, which makes the computation of optical behavior efficient.[6] In essence, it models light as bundles of rays or waves confined near the axis, which is particularly valid for well-collimated beams like those in laser systems or standard imaging setups.[7]
Paraxial optics is the first-order, small-angle subset of geometric optics: the full nonlinear equations of ray propagation are linearized by ignoring terms beyond the first order in ray height and angle.[8] Unlike complete geometric optics, which handles arbitrary ray paths and angles without simplification, the paraxial framework restricts analysis to near-axis regions, reducing Snell's law and the associated refraction and reflection calculations to manageable algebraic forms.[9] Paraxial methods therefore give accurate predictions only within their validity range, beyond which aberrations and nonlinear effects dominate.
The approximation simplifies calculations for imaging systems, such as lenses and mirrors, by allowing focal points, image positions, and magnifications to be determined quickly without tracing every possible ray path.[10] Central to this are three small-angle approximations, applied with \theta in radians: \sin \theta \approx \theta, \tan \theta \approx \theta, and \cos \theta \approx 1, which linearize the geometry of ray bending at surfaces.[11] These enable straightforward applications, including ray transfer matrix analysis for sequential optical elements.[8]
Historical Context
The paraxial approximation traces its origins to the 17th century, rooted in René Descartes' foundational work on the laws of refraction outlined in his 1637 treatise La Dioptrique. Descartes derived the relationship between the angles of incidence and refraction—now known as Snell's law—using a mechanical analogy of light as particles. This approach laid the groundwork for approximating ray paths near the optical axis, though without explicit formulation of the paraxial limit.[12] In the early 18th century, Isaac Newton advanced these concepts in his seminal Opticks (1704), where he applied geometric optics principles to telescope design, emphasizing rays close to the axis to analyze focusing and aberrations in reflecting systems. Newton's analysis of spherical mirrors and the limitations of refracting telescopes due to surface curvature effectively utilized paraxial-like assumptions to predict image formation, without naming the approximation, thereby bridging theoretical refraction with practical instrumentation.[13] In the early 19th century, Joseph von Fraunhofer advanced empirical work in lens design and manufacturing, producing high-precision achromatic objectives for telescopes that minimized aberrations and achieved superior image quality.[14] The formalization of the paraxial approximation came later through Carl Friedrich Gauss's Dioptrische Untersuchungen (1841), which systematically described thin lens behavior and optical systems under small-angle conditions, establishing the framework for Gaussian optics and ray transfer matrices.[15] In the 20th century, Dennis Gabor extended paraxial ideas into wave optics with his 1948 invention of holography, where approximations for near-axis propagation enabled the recording and reconstruction of complex wavefronts using coherent light. 
Following the laser's development in the 1960s, the approximation gained renewed prominence in modeling beam propagation, as detailed in Kogelnik and Li's analysis of paraxial rays in resonators and transmission lines. This evolution from geometric optics, focused on ray paths, to physical optics, incorporating wave phenomena, underscores the paraxial approximation's enduring utility, including its integration into modern computational ray tracing software for initial optical system design.[16][17][18]
Mathematical Foundations
Small-Angle Approximations
The Taylor series expansions of the trigonometric functions around \theta = 0 form the mathematical basis for the small-angle approximations central to the paraxial regime. For the sine function, the expansion is \sin \theta = \theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \frac{\theta^7}{7!} + \cdots, where the terms decrease rapidly for small \theta.[19] Similarly, the cosine expansion is \cos \theta = 1 - \frac{\theta^2}{2!} + \frac{\theta^4}{4!} - \frac{\theta^6}{6!} + \cdots, and the tangent expansion is \tan \theta = \theta + \frac{\theta^3}{3} + \frac{2\theta^5}{15} + \cdots. The sine and cosine series converge for all real \theta (in radians), while the tangent series converges only for |\theta| < \pi/2; in each case the series can be truncated for small angles to obtain relations that are linear in \theta.[20]
In the first-order approximation, higher-order terms (\theta^3 and beyond) are neglected when \theta is small, yielding \sin \theta \approx \theta, \tan \theta \approx \theta, and \cos \theta \approx 1. This simplification justifies treating ray propagation as linear: deviations from the optical axis are proportional to the angle, with no quadratic or higher-order terms. All expansions and approximations require \theta in radians; for reference, 1 radian \approx 57.3^\circ, so angles in degrees must be converted by multiplying by \pi/180.[19]
For enhanced accuracy, a second-order term can be included for cosine: \cos \theta \approx 1 - \theta^2/2. This retains the quadratic correction while still neglecting higher powers. The relative error of the first-order sine approximation \sin \theta \approx \theta is approximately \theta^2/6; it is about 0.5% at \theta = 10^\circ (0.175 radians) and reaches about 1% at 14^\circ.
A three-term expansion for sine, \sin \theta \approx \theta - \theta^3/6 + \theta^5/120, further reduces the error to less than 0.5% even up to \theta \leq \pi/2.[19][21]
To illustrate the accuracy, the following table compares exact values of \sin \theta with the first-order approximation for selected small angles. The relative error is (\theta - \sin \theta)/\sin \theta, computed from unrounded values.
| Angle (degrees) | \theta (rad) | Exact \sin \theta | Approximation \theta | Relative error (%) |
|---|---|---|---|---|
| 0° | 0 | 0 | 0 | 0 |
| 5° | 0.0873 | 0.0872 | 0.0873 | 0.13 |
| 10° | 0.1745 | 0.1736 | 0.1745 | 0.51 |
| 15° | 0.2618 | 0.2588 | 0.2618 | 1.15 |
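The comparison in the table can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name small_angle_error is an illustrative choice, and the relative error is taken as (\theta - \sin \theta)/\sin \theta, which is an assumption about how the last column is defined.

```python
import math

def small_angle_error(angle_deg):
    """Return (theta_rad, sin_theta, relative_error_percent) for sin(theta) ~ theta."""
    theta = math.radians(angle_deg)          # the approximation needs radians
    exact = math.sin(theta)
    if exact == 0.0:
        return theta, exact, 0.0             # no error at theta = 0
    rel_err = (theta - exact) / exact * 100  # assumed definition of relative error
    return theta, exact, rel_err

if __name__ == "__main__":
    for angle in (0, 5, 10, 15):
        theta, exact, err = small_angle_error(angle)
        lead = theta ** 2 / 6 * 100          # leading-order estimate theta^2/6
        print(f"{angle:>3} deg  theta={theta:.4f}  sin={exact:.4f}  "
              f"error={err:.2f}%  theta^2/6={lead:.2f}%")
```

Comparing the last two printed columns shows how closely the leading-order estimate \theta^2/6 tracks the actual error at these angles.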
Derivation from Geometric Optics
The paraxial approximation in geometric optics begins with Snell's law for refraction at an interface between two media with refractive indices n_1 and n_2: n_1 \sin \theta_1 = n_2 \sin \theta_2, where \theta_1 and \theta_2 are the angles of incidence and refraction relative to the surface normal. For paraxial rays, which make small angles with the optical axis and therefore with the normals of surfaces centered on it, the small-angle approximation \sin \theta \approx \theta (in radians) applies, yielding n_1 \theta_1 \approx n_2 \theta_2. This linear relation between the angles is what makes the imaging formulas below linear in ray height and slope.[22]
For refraction at a single spherical surface separating media of indices n_1 and n_2, with radius of curvature R (positive if the center lies to the right of the vertex for light traveling left to right), consider a ray that leaves an axial object point and strikes the surface at height h above the axis, close to the vertex. Distances are measured from the vertex in the Cartesian sign convention: the object distance u is negative for an object to the left of the surface, and the image distance u' is positive for an image to the right. The surface normal at the point of incidence makes an angle \gamma \approx h / R with the axis. With the signed slope angles \alpha \approx h / u and \beta \approx h / u', the angle of incidence is \theta_1 \approx \gamma - \alpha and the angle of refraction is \theta_2 \approx \gamma - \beta. Applying the paraxial Snell's law n_1 (\gamma - \alpha) \approx n_2 (\gamma - \beta), substituting the angle approximations, and dividing by h leads to the paraxial refraction formula: \frac{n_2}{u'} - \frac{n_1}{u} = \frac{n_2 - n_1}{R}.
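As a numerical illustration, the paraxial refraction formula can be solved for the image distance. The sketch below assumes the Cartesian sign convention that the formula implies (distances measured from the vertex, negative to the left and positive to the right); the function name and the sample values (an air-to-glass surface with n_2 = 1.5 and R = 50 mm) are hypothetical.

```python
import math

def image_distance(n1, n2, R, u):
    """Solve n2/u' - n1/u = (n2 - n1)/R for the image distance u'.

    Assumed Cartesian convention: u < 0 for an object to the left of the
    vertex, R > 0 when the center of curvature lies to the right.
    """
    vergence_out = n1 / u + (n2 - n1) / R   # this quantity equals n2 / u'
    if vergence_out == 0.0:
        return math.inf                     # refracted rays leave parallel to the axis
    return n2 / vergence_out

# Object 200 mm to the left of the surface (hypothetical values, in mm).
print(image_distance(1.0, 1.5, 50.0, -200.0))
# Object at infinity: the image falls at the second focal point n2*R/(n2 - n1).
print(image_distance(1.0, 1.5, 50.0, -math.inf))
```

For these sample values the two calls give image distances of roughly 300 mm and 150 mm, both to the right of the vertex.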
In the single-surface refraction formula the ray height h cancels, so every paraxial ray from the object point crosses the axis at the same image point, and the surface forms a sharp image to first order.[23]
In reflection from a spherical mirror, the paraxial approximation simplifies the law of reflection (\theta_i = \theta_r) using small angles, for which \theta \approx \tan \theta. Consider a concave mirror with radius of curvature R (taken as positive for a mirror concave toward the incident light), vertex V, and center of curvature C at distance R from V along the optical axis. A ray traveling parallel to the axis at height h strikes the mirror near V; the normal there lies along the radius through C, so the angle of incidence is \theta \approx h / R. The reflected ray is deviated by 2\theta and therefore has slope -2\theta \approx -2h / R relative to the axis, crossing it at a distance h / (2\theta) from the vertex, which gives the focal length f = R/2. For a general ray from an axial object point at distance u (positive for a real object), let \theta_o \approx h / u and \theta_i \approx h / v be the magnitudes of the incident and reflected slopes, where v is the image distance (positive for real images). Equal angles of incidence and reflection about the normal, which is inclined at h / R to the axis, require \theta_o + \theta_i \approx 2 (h / R), which simplifies to the mirror equation \frac{1}{v} + \frac{1}{u} = \frac{2}{R} = \frac{1}{f}. As with refraction, h drops out, so all paraxial rays from the object point meet at the same image point.[24]
A thin lens can be treated as two closely spaced spherical surfaces with negligible thickness, so its paraxial formula combines the refractions at the two surfaces. Assume the lens has index n and is surrounded by air, and use the Cartesian sign convention throughout: distances are measured from the lens, negative to the left and positive to the right, and the radii R_1 and R_2 of the first and second surfaces are each positive if the center of curvature lies to the right of the surface. Applying the single-surface formula to the first interface (air to lens) gives an intermediate image at u_1': \frac{n}{u_1'} - \frac{1}{u} = \frac{n - 1}{R_1}. This intermediate image is the object for the second interface (lens to air); because the lens is thin, both distances are measured from essentially the same vertex, so u_2 \approx u_1' and \frac{1}{u'} - \frac{n}{u_1'} = \frac{1 - n}{R_2}. Adding the two equations eliminates u_1' and yields the thin lens equation \frac{1}{u'} - \frac{1}{u} = \frac{1}{f}, where the focal length f satisfies the lensmaker's formula \frac{1}{f} = (n-1) \left( \frac{1}{R_1} - \frac{1}{R_2} \right). In the real-is-positive convention, where the object distance is s = -u and the image distance is s' = u', the same relation reads \frac{1}{s} + \frac{1}{s'} = \frac{1}{f}. The lens power is thus the difference of the two surface curvatures scaled by the index contrast n - 1.[25]
Applications
Ray Transfer Matrix Analysis
In ray transfer matrix analysis, a paraxial ray is characterized by a two-component vector consisting of its transverse position r (height from the optical axis) and its angle \theta (slope relative to the axis) at a given plane perpendicular to the axis.[26] This representation assumes small angles, consistent with the paraxial approximation, allowing linear transformations to describe ray evolution.[27] The propagation of such a ray through an optical element or system is modeled by a 2×2 ray transfer matrix, often denoted as the ABCD matrix, which linearly relates the input ray vector to the output: \begin{pmatrix} r_{\text{out}} \\ \theta_{\text{out}} \end{pmatrix} = \begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} r_{\text{in}} \\ \theta_{\text{in}} \end{pmatrix}. This matrix formalism enables the composition of complex systems by matrix multiplication, where the overall matrix is the product of individual element matrices in reverse order of traversal.[26][27] For systems conserving the étendue (optical invariant), the determinant satisfies AD - BC = 1 when input and output media have the same refractive index; in general, it equals n_{\text{in}} / n_{\text{out}}. Specific optical elements have well-defined ABCD matrices under the paraxial approximation. 
For free-space propagation over a physical distance d in a medium of refractive index n, the matrix is \begin{pmatrix} 1 & d \\ 0 & 1 \end{pmatrix}: the angle is unchanged and the height increases linearly with d.[26] For a thin lens of focal length f (assuming a surrounding medium with n = 1), the matrix is \begin{pmatrix} 1 & 0 \\ -1/f & 1 \end{pmatrix}, which preserves the ray height but changes the angle in proportion to the lens power 1/f.[27] For refraction at a single spherical surface separating media of indices n (incident) and n' (transmitted), with radius of curvature R (positive if the center lies to the right of the surface for light traveling left to right), the matrix is \begin{pmatrix} 1 & 0 \\ (n - n')/(n' R) & n/n' \end{pmatrix}. The height is unchanged, and the angle follows from the paraxial Snell's law applied about the local surface normal, so a convex surface entering a denser medium (R > 0, n' > n) bends rays toward the axis.[26]
To analyze a multi-element system, such as a simple astronomical telescope consisting of an objective lens of focal length f_1 followed by free-space propagation over a distance d and an eyepiece lens of focal length f_2 (with d = f_1 + f_2 for the afocal configuration), the overall ABCD matrix is the product M = M_{\text{eyepiece}} \cdot M_{\text{prop}} \cdot M_{\text{objective}} = \begin{pmatrix} 1 & 0 \\ -1/f_2 & 1 \end{pmatrix} \begin{pmatrix} 1 & d \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -1/f_1 & 1 \end{pmatrix}. Multiplying out with d = f_1 + f_2 gives A = -f_2/f_1, B = f_1 + f_2, C = 0, and D = -f_1/f_2. The vanishing C element is the signature of an afocal system: the output angle \theta_{\text{out}} = D \theta_{\text{in}} does not depend on the input height, and the angular magnification is D = -f_1/f_2, with magnitude f_1/f_2 and a negative sign indicating an inverted image.[26] The elements of the system ABCD matrix also determine the cardinal points, which locate the effective focal points and principal planes.
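Before the cardinal points are discussed, the matrix product for the afocal telescope can be checked numerically. The following is a minimal pure-Python sketch; the helper names (matmul, free_space, thin_lens, system) and the focal lengths are illustrative assumptions, not part of any particular optics library.

```python
from functools import reduce

def matmul(m1, m2):
    """Product m1 * m2 of two 2x2 matrices given as nested tuples."""
    (a, b), (c, d) = m1
    (e, f), (g, h) = m2
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))

def free_space(d):
    """ABCD matrix for propagation over a distance d."""
    return ((1.0, d), (0.0, 1.0))

def thin_lens(f):
    """ABCD matrix for a thin lens of focal length f in air."""
    return ((1.0, 0.0), (-1.0 / f, 1.0))

def system(*elements):
    """Compose elements listed in the order the light meets them.

    Each later element multiplies from the left, so the first element
    ends up rightmost in the product.
    """
    return reduce(lambda acc, m: matmul(m, acc), elements)

f1, f2 = 500.0, 25.0   # hypothetical objective and eyepiece focal lengths (mm)
telescope = system(thin_lens(f1), free_space(f1 + f2), thin_lens(f2))
(A, B), (C, D) = telescope

print(f"A = {A:.4f}, B = {B:.1f}, C = {C:.2e}, D = {D:.1f}")
print(f"det = {A * D - B * C:.6f}")
```

With d = f_1 + f_2 the C element vanishes to rounding error, D evaluates to -f_1/f_2, and the determinant AD - BC stays at 1, as expected for a system that begins and ends in the same medium.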
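For a system in air (n = 1), the effective focal length is given by f = -1/C, where C is the (2,1) element, representing the system's overall focusing power.[27] With distances measured positive in the direction of propagation, the first principal plane lies at h_1 = (D - 1)/C from the input plane and the second principal plane at h_2 = (1 - A)/C from the output plane; references that measure these distances in the opposite direction quote the same expressions with reversed signs. For example, a thin lens followed by a gap d has A = 1 - d/f and C = -1/f, so h_2 = -d: the second principal plane sits at the lens, a distance d before the output plane. These relations reduce the system to an equivalent thin lens located at the principal planes.[26] They facilitate the design and characterization of optical instruments like microscopes, where matrix analysis simplifies tracing rays through successive lenses and spaces.
Gaussian Beam Optics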
In wave optics, the paraxial approximation facilitates the analysis of light propagation for beams with small divergence angles, which is particularly relevant for laser beams. The starting point is the scalar Helmholtz equation for a monochromatic field E in free space: \nabla^2 E + k^2 E = 0, where k = 2\pi / \lambda is the wavenumber and \lambda is the wavelength.[28] To model waves propagating forward along the z-direction, write E(x, y, z) = u(x, y, z) e^{i k z}, where the envelope u varies slowly with z on the scale of a wavelength. Substituting this form into the Helmholtz equation yields \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + 2 i k \frac{\partial u}{\partial z} + \frac{\partial^2 u}{\partial z^2} = 0. The paraxial approximation neglects the second longitudinal derivative under the slowly varying envelope condition |\partial^2 u / \partial z^2| \ll 2k |\partial u / \partial z|, resulting in the paraxial wave equation: \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + 2 i k \frac{\partial u}{\partial z} = 0. This equation describes how diffraction reshapes the envelope u of a beam confined to small angles relative to the propagation axis.[28]
An important exact solution of the paraxial wave equation, written in cylindrical coordinates (r = \sqrt{x^2 + y^2}), is the Gaussian beam, which represents the lowest-order transverse electromagnetic mode (TEM_{00}) of a laser. The complex envelope is given by u(r, z) = u_0 \frac{w_0}{w(z)} \exp\left[-\frac{r^2}{w(z)^2}\right] \exp\left[i \left( \frac{k r^2}{2 R(z)} - \phi(z) \right)\right], where u_0 is the amplitude at the beam waist, and the full field includes the rapid phase e^{i k z}. The beam width w(z) varies with propagation distance z as w(z) = w_0 \sqrt{1 + \left(\frac{z}{z_R}\right)^2}, where w_0 is the minimum waist radius at z = 0 and z_R = \pi w_0^2 / \lambda is the Rayleigh range, the distance from the waist over which the beam's cross-sectional area doubles.
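The beam-width relation is simple to evaluate numerically. The sketch below assumes a helium-neon wavelength of 632.8 nm and a 1 mm waist radius purely for illustration; the function names are hypothetical.

```python
import math

def rayleigh_range(w0, wavelength):
    """z_R = pi * w0^2 / lambda (all lengths in metres)."""
    return math.pi * w0 ** 2 / wavelength

def beam_radius(z, w0, wavelength):
    """w(z) = w0 * sqrt(1 + (z / z_R)^2), with z measured from the waist."""
    z_r = rayleigh_range(w0, wavelength)
    return w0 * math.sqrt(1.0 + (z / z_r) ** 2)

w0 = 1.0e-3            # assumed waist radius: 1 mm
wavelength = 632.8e-9  # assumed wavelength: He-Ne laser line
z_r = rayleigh_range(w0, wavelength)

print(f"Rayleigh range: {z_r:.3f} m")
for z in (0.0, z_r, 10 * z_r):
    w = beam_radius(z, w0, wavelength)
    print(f"z = {z:7.2f} m  w = {w * 1e3:.3f} mm  area ratio = {(w / w0) ** 2:.1f}")
```

At z = z_R the radius has grown by a factor of \sqrt{2}, so the area ratio is 2; far beyond z_R the radius grows almost linearly with z.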
The radius of curvature of the wavefront R(z) is R(z) = z \left[1 + \left(\frac{z_R}{z}\right)^2\right], which is infinite at the waist and positive for z > 0. Additionally, the Gouy phase shift \phi(z) = \arctan(z / z_R) is the extra axial phase of the beam relative to a plane wave; it amounts to \pi/4 at one Rayleigh range from the waist and to a total of \pi as the beam passes through the focus from z \to -\infty to z \to +\infty. These parameters characterize the beam's spatial extent, phase front, and overall propagation, with the intensity profile I(r, z) \propto |u(r, z)|^2 remaining Gaussian at every z.[28][29]
To describe Gaussian beam propagation through paraxial optical elements such as lenses, mirrors, or free space, the complex beam parameter q(z) is employed, defined as q(z) = z + i z_R with z measured from the waist, so that q = i z_R at the waist itself. Its reciprocal is 1/q(z) = 1/R(z) - i \lambda / (\pi w(z)^2), encapsulating both curvature and width information. These signs follow the common convention in which the spatial phase is written e^{-i k z}; for the e^{i k z} envelope used above, q is replaced by its complex conjugate, and the transformation law is unchanged. Under the paraxial approximation, q transforms through a system described by the ray transfer matrix (ABCD matrix) from geometric optics as q_{\text{out}} = \frac{A q_{\text{in}} + B}{C q_{\text{in}} + D}, or equivalently \frac{1}{q_{\text{out}}} = \frac{C + D / q_{\text{in}}}{A + B / q_{\text{in}}}, where the elements A, B, C, D satisfy AD - BC = 1 when the input and output media have the same refractive index. This formulation bridges wave and ray optics, allowing straightforward computation of beam parameters after traversal of optical components without solving the wave equation anew. For instance, a thin lens of focal length f has matrix elements A = D = 1, B = 0, C = -1/f, so 1/q_{\text{out}} = 1/q_{\text{in}} - 1/f: the lens leaves the beam width unchanged and changes the wavefront curvature by -1/f, from which the focused waist size and location follow.[28][29]
Limitations and Extensions
Aberrations and Accuracy Limits
The paraxial approximation neglects higher-order terms in the Taylor expansion of trigonometric functions, such as the cubic term in \sin \theta \approx \theta - \frac{\theta^3}{6}, and therefore cannot account for aberrations: rays at larger angles deviate from the predicted paraxial paths. The resulting primary aberrations include spherical aberration, where marginal rays focus closer to a positive lens than paraxial rays, producing longitudinal and transverse focal shifts; coma, which produces asymmetric blurring of off-axis points; astigmatism, manifesting as different focal lengths in the meridional and sagittal planes; field curvature, where the image surface bends away from a flat plane; and distortion, where magnification varies across the field. These effects arise because the approximation assumes all rays follow linear paths near the axis, ignoring the nonlinear contributions that shift the focus for non-paraxial rays.[30][2]
The accuracy of the paraxial approximation diminishes with increasing ray angles, as the neglected terms become significant. For instance, the fractional error in quantities such as the focal distance grows as the square of the angle, on the order of \frac{\theta^2}{2}, the first term dropped when \cos \theta is replaced by 1. This leads to noticeable deviations in systems with larger apertures or fields of view. To quantify this, the relative error of the small-angle approximation, expressed as 1 - \frac{\sin \theta}{\theta} (a key factor in ray refraction), is listed below:
| Angle \theta (degrees) | Relative error 1 - \frac{\sin \theta}{\theta} (%) |
|---|---|
| 10 | 0.5 |
| 18 | 1.6 |
| 30 | 4.5 |
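Spherical aberration can be made concrete by tracing rays exactly through a single refracting sphere and comparing the result with the paraxial prediction. The sketch below is an illustration under stated assumptions: incoming rays parallel to the axis, a convex air-to-glass surface with hypothetical values n_1 = 1, n_2 = 1.5, and R = 100 mm, and function names chosen for this example.

```python
import math

def paraxial_focus(n1, n2, R):
    """Paraxial image distance for an object at infinity: n2 * R / (n2 - n1)."""
    return n2 * R / (n2 - n1)

def exact_focus(n1, n2, R, h):
    """Axis crossing, measured from the vertex, of a ray arriving parallel
    to the axis at height h (0 < h < R), using the exact Snell's law."""
    theta1 = math.asin(h / R)                       # incidence angle at the surface
    theta2 = math.asin(n1 * math.sin(theta1) / n2)  # exact refraction angle
    slope_angle = theta1 - theta2                   # refracted ray's angle to the axis
    sag = R - math.sqrt(R ** 2 - h ** 2)            # axial position of the hit point
    return sag + h / math.tan(slope_angle)

n1, n2, R = 1.0, 1.5, 100.0   # hypothetical air-to-glass surface, lengths in mm
print(f"paraxial focus: {paraxial_focus(n1, n2, R):.2f} mm")
for h in (1.0, 10.0, 30.0):
    print(f"h = {h:5.1f} mm  exact focus = {exact_focus(n1, n2, R, h):.2f} mm")
```

The crossing point moves toward the surface as h grows, which is the behavior of marginal rays described for spherical aberration, and for small h it approaches the paraxial value.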