Camera matrix
The camera matrix, also known as the projection matrix, is a 3×4 matrix used in computer vision to map homogeneous three-dimensional world coordinates to homogeneous two-dimensional image coordinates under the pinhole camera model. It encapsulates both the camera's intrinsic properties and its extrinsic pose.[1][2] The mapping is written \mathbf{x} = P \mathbf{X}, where \mathbf{X} is a 4×1 homogeneous world point and \mathbf{x} is a 3×1 homogeneous image point. It is linear in homogeneous coordinates; the perspective effect appears when \mathbf{x} is divided by its third component to obtain pixel coordinates.[3] Because P is defined only up to an overall scale, its 12 entries carry 11 degrees of freedom. It is a fundamental tool for tasks like camera calibration and 3D reconstruction.[2]

The camera matrix decomposes into an intrinsic matrix K (3×3) and an extrinsic matrix [R \mid \mathbf{t}] (3×4), such that P = K [R \mid \mathbf{t}].[1][3] The intrinsic matrix K captures the internal camera parameters: the focal lengths f_x and f_y (in pixels), the principal point (c_x, c_y), which usually lies near the image center, and a skew coefficient s that models non-orthogonal pixel axes and is often assumed to be zero:

K = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}

[2][3] These five parameters (f_x, f_y, s, c_x, c_y) define how 3D rays in the camera coordinate system convert to 2D pixel coordinates.[2]

The extrinsic parameters consist of a rotation matrix R (3×3, orthogonal with determinant +1) and a translation vector \mathbf{t} (3×1), which together describe the rigid transformation from world coordinates to camera coordinates. They have six degrees of freedom (three for rotation and three for translation).[1][3] Together with the five intrinsic parameters, they account for the 11 degrees of freedom of P. The decomposition allows the camera's internals to be estimated separately from its external pose, often by first estimating P from known 3D-2D correspondences with methods such as the direct linear transformation (DLT).[2]

In practice, the camera matrix is used in augmented reality, robotics, and photogrammetry, where it supports accurate 3D-to-2D projection and inverse problems such as pose estimation.[3] It assumes an ideal pinhole model and ignores lens distortions such as radial or tangential effects, which extended models handle with additional calibration parameters.[1]
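The composition P = K [R \mid \mathbf{t}] and the projection \mathbf{x} = P \mathbf{X} can be illustrated with a minimal Python sketch. The numeric values for K, R and \mathbf{t} below are assumptions chosen only for illustration (zero skew, an 800-pixel focal length, a 90° rotation about the z-axis), and the helper names `matmul`, `camera_matrix` and `project` are hypothetical, not part of any library. Plain lists are used so that the sketch needs only the standard library.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def camera_matrix(K, R, t):
    """Return the 3x4 camera matrix P = K [R | t]."""
    Rt = [R[i] + [t[i]] for i in range(3)]  # 3x4 extrinsic matrix [R | t]
    return matmul(K, Rt)


def project(P, X_world):
    """Project a 3D world point to pixel coordinates (u, v)."""
    # Homogeneous 4x1 world point X = (X, Y, Z, 1)^T.
    X_h = [[X_world[0]], [X_world[1]], [X_world[2]], [1.0]]
    x, y, w = (row[0] for row in matmul(P, X_h))
    if w == 0:
        raise ValueError("point has zero depth and cannot be projected")
    # Dividing by the third component produces the perspective effect.
    return x / w, y / w


# Assumed intrinsics: f_x = f_y = 800 px, principal point (320, 240), s = 0.
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]

# Assumed extrinsics: 90 degree rotation about z, translation of 4 units along z.
R = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 4.0]

P = camera_matrix(K, R, t)
u, v = project(P, [1.0, 0.5, 0.0])
print((u, v))  # (220.0, 440.0)

# Scaling P by any non-zero factor leaves the projected pixel unchanged,
# which is why P has 11 rather than 12 degrees of freedom.
P_scaled = [[2.0 * p for p in row] for row in P]
print(project(P_scaled, [1.0, 0.5, 0.0]))  # (220.0, 440.0)
```

In this example the world point (1, 0.5, 0) maps to camera coordinates (-0.5, 1, 4), so u = 800 · (-0.5/4) + 320 = 220 and v = 800 · (1/4) + 240 = 440. Lens distortion is not modeled, in keeping with the ideal pinhole assumption.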