The linear–quadratic regulator (LQR) is a foundational method in optimal control theory for designing state-feedback controllers that stabilize linear time-invariant dynamical systems while minimizing a quadratic cost function, which balances the magnitude of state deviations from a desired trajectory against the energy expended by control inputs.[1] This approach assumes full knowledge of the system state and relies on solving the algebraic Riccati equation to compute optimal feedback gains, ensuring closed-loop stability under controllability and observability conditions.[2] The LQR problem is typically formulated for infinite-horizon operation, where the cost is the integral of a quadratic form involving the state vector x(t) and control vector u(t), expressed as \int_0^\infty (x^T Q x + u^T R u) \, dt, with positive semi-definite Q and positive definite R weighting matrices.[3] The resulting control law takes the linear form u(t) = -K x(t), where K = R^{-1} B^T P and P solves the Riccati equation A^T P + P A - P B R^{-1} B^T P + Q = 0 for the system \dot{x} = A x + B u.[4]

Introduced by Rudolf E. Kalman in 1960 as part of his seminal contributions to optimal control, the LQR built on earlier dynamic programming ideas from Richard Bellman and addressed the need for systematic feedback design in linear systems.[5] Kalman's work formalized the problem using Hamilton-Jacobi-Bellman theory, deriving the Riccati differential equation as the key computational tool and introducing controllability as a necessary condition for solvability.[4] By the 1960s, extensions incorporated finite-horizon variants and observer-based implementations for partial state observation, evolving into the linear-quadratic-Gaussian framework when combined with Kalman filtering for noisy environments.[3] These developments established LQR as a benchmark for robust control, with inherent guarantees of asymptotic stability and performance bounds derived from the eigenvalues of the closed-loop system.[2]

In practice, LQR finds widespread application in engineering domains requiring precise regulation, such as aerospace for attitude control of spacecraft, robotics for trajectory stabilization, and structural engineering for vibration suppression in buildings and bridges.[1] Its quadratic cost structure allows tunable trade-offs between tracking accuracy and control effort via the choice of Q and R, often refined through loop-shaping or numerical optimization to meet specific robustness requirements against model uncertainties.[5] Despite assumptions of linearity and perfect state feedback, modern variants extend LQR to nonlinear, constrained, and stochastic settings, underscoring its enduring influence in fields like autonomous vehicles and power systems.[3]
Introduction
Definition and Purpose
The linear–quadratic regulator (LQR) is a classical optimal control technique that computes a state-feedback control law to minimize a quadratic cost function for linear dynamical systems.[6] This approach assumes the system dynamics are linear and time-invariant, with the cost reflecting a trade-off between state deviations and control efforts.[7]

The primary purpose of the LQR is to stabilize unstable or marginally stable systems by driving the states toward a desired equilibrium, such as the origin, while also enabling reference tracking and disturbance rejection.[6] By balancing the penalties on state errors (via a positive semi-definite weighting matrix) and control inputs (via a positive definite weighting matrix), it achieves efficient regulation without excessive energy expenditure.[7]

Under the assumptions of linearity and quadratic costs, the LQR guarantees global optimality and asymptotic stability for controllable systems, with computational tractability achieved through solutions to Riccati equations.[6] These properties make it a foundational method in control theory, offering a robust starting point for more complex nonlinear or uncertain systems.[7] For instance, in aerospace applications like spacecraft attitude control, the LQR designs minimum-energy feedback laws to align vehicle orientation with thrust commands, ensuring precise maneuvering.[8]
Historical Development
The linear–quadratic regulator (LQR) emerged in the mid-20th century amid the development of optimal control theory during the 1950s and 1960s, marking a transition from frequency-domain methods to state-space approaches for system analysis and design. This period was influenced by the space race and advances in computing, which necessitated efficient control strategies for complex dynamic systems. Key figures such as Rudolf E. Kalman and Richard S. Bucy played pivotal roles; Kalman's work laid the groundwork by framing control problems in terms of minimizing quadratic cost functions for linear systems.[9]

A seminal milestone occurred in 1960 when Kalman published his paper "Contributions to the Theory of Optimal Control," introducing the solution to the linear quadratic problem via the Riccati equation and establishing the core framework for LQR design. This formulation provided a systematic method to compute optimal feedback gains, influencing subsequent research in multivariable control.[9]

The evolution of LQR continued in the 1960s through its integration with state estimation via the Kalman-Bucy filter, leading to the linear-quadratic-Gaussian (LQG) framework that handled stochastic disturbances in real-world applications. Computational advances in the 1970s and 1980s, driven by the proliferation of digital computers and microprocessors, enabled the real-time solution of Riccati equations and implementation of LQR controllers in embedded systems, broadening its practicality beyond theoretical analysis.[9][10]

Into the 2020s, LQR remains a foundational tool in robotics and autonomous vehicles, valued for its robustness and optimality in path tracking and stabilization tasks, with ongoing refinements but no fundamental paradigm shifts since the late 20th century.[11][12]
Mathematical Prerequisites
Linear State-Space Models
The linear state-space model provides a fundamental framework for representing the dynamics of linear time-invariant (LTI) systems in control theory. In this representation, the system's behavior is described by a set of first-order differential (or difference) equations that capture the evolution of the internal state over time in response to inputs. For continuous-time systems, the state-space form is given by

\dot{\mathbf{x}}(t) = A \mathbf{x}(t) + B \mathbf{u}(t),

where \mathbf{x}(t) \in \mathbb{R}^n is the state vector, \mathbf{u}(t) \in \mathbb{R}^m is the input vector, A \in \mathbb{R}^{n \times n} is the system matrix describing the intrinsic dynamics, and B \in \mathbb{R}^{n \times m} is the input matrix that maps inputs to state changes. For discrete-time systems, the analogous form is

\mathbf{x}_{k+1} = A \mathbf{x}_k + B \mathbf{u}_k,

with \mathbf{x}_k and \mathbf{u}_k denoting states and inputs at time step k. This notation uses boldface to distinguish vectors and matrices, and time dependence is explicit where relevant to highlight the evolution. The state-space approach, introduced as a unified method for analyzing linear dynamical systems, shifts focus from input-output relations to internal state trajectories, enabling comprehensive system analysis.[13]

Key assumptions underlying the linear state-space model include linearity, meaning the equations are affine in states and inputs with no higher-order terms, and time-invariance, where the matrices A and B remain constant over time. These properties ensure that superposition holds, allowing solutions to be scaled and added linearly. Additionally, controllability is a critical assumption, defined as the ability to drive the state vector from any initial condition to the origin in finite time using admissible inputs; it depends on the rank of the controllability matrix [B, AB, \dots, A^{n-1}B] being full (equal to n). In the context of regulator problems, controllability guarantees that stabilizing feedback can be designed. The input matrix B plays a pivotal role here, as its columns span the directions in state space directly influenced by inputs, determining the extent to which external controls can affect the system's evolution.

Properties of the state-space model facilitate stability analysis, primarily through the eigenvalues of the system matrix A. For continuous-time LTI systems, asymptotic stability requires all eigenvalues of A to have negative real parts, ensuring that the unforced state \mathbf{x}(t) = e^{At} \mathbf{x}(0) converges to zero as t \to \infty. In discrete time, stability demands that all eigenvalues lie inside the unit circle in the complex plane. The matrix B does not directly affect open-loop stability but modulates how inputs can counteract or enhance the natural dynamics dictated by A. These characteristics make the state-space form indispensable for understanding system behavior before applying control strategies.[14][15]
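As a brief illustration of these checks, the following sketch uses numpy on a hypothetical double-integrator model (the matrices are placeholders, not taken from the text) to form the controllability matrix and inspect the open-loop eigenvalues of A.

```python
import numpy as np

# Hypothetical double-integrator example: state x = [position, velocity],
# input u = acceleration. A and B are illustrative, not from the text.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
n = A.shape[0]

# Controllability matrix [B, AB, ..., A^(n-1) B]; full rank means controllable.
ctrb = np.hstack([np.linalg.matrix_power(A, i) @ B for i in range(n)])
print("controllability matrix rank:", np.linalg.matrix_rank(ctrb))

# Open-loop stability check (continuous time: all eigenvalues need Re < 0).
print("eigenvalues of A:", np.linalg.eigvals(A))
```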
Quadratic Cost Functions
In optimal control, particularly for the linear-quadratic regulator (LQR), the performance of a control strategy is evaluated using a quadratic cost function, which quantifies the trade-off between state regulation and control effort. For the infinite-horizon continuous-time case, the cost is typically defined as the integral

J = \int_0^\infty \left( \mathbf{x}(t)^T Q \mathbf{x}(t) + \mathbf{u}(t)^T R \mathbf{u}(t) \right) dt,

where Q \in \mathbb{R}^{n \times n} is a positive semi-definite symmetric matrix weighting the state deviations (often chosen to penalize specific states like position or velocity errors), and R \in \mathbb{R}^{m \times m} is a positive definite symmetric matrix weighting the control inputs (reflecting energy or actuator limits). The quadratic form ensures the cost is convex, facilitating analytical solutions via dynamic programming or calculus of variations. For finite-horizon problems, the integral is taken from 0 to T, potentially with a terminal cost \mathbf{x}(T)^T P_f \mathbf{x}(T) to account for endpoint conditions. In discrete time, the sum replaces the integral: J = \sum_{k=0}^\infty \left( \mathbf{x}_k^T Q \mathbf{x}_k + \mathbf{u}_k^T R \mathbf{u}_k \right).

The choice of Q and R is application-specific: larger Q elements emphasize tighter state tracking, while larger R elements reduce aggressive control to conserve resources. Observability of the pair (A, \sqrt{Q}) (where \sqrt{Q} is a matrix square root) ensures the cost penalizes all controllable modes, complementing the controllability assumption. This framework, rooted in quadratic programming, provides a measurable objective for deriving optimal feedback laws that minimize J subject to the system dynamics.[16]
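As a small illustration of how these weights enter the objective, the following sketch (assuming numpy and purely illustrative weight values) evaluates the discrete-time cost for given state and input trajectories.

```python
import numpy as np

def quadratic_cost(xs, us, Q, R):
    """Evaluate the discrete-time cost sum_k x_k^T Q x_k + u_k^T R u_k.

    xs: array of shape (N, n) of states; us: array of shape (N, m) of inputs.
    """
    return sum(x @ Q @ x for x in xs) + sum(u @ R @ u for u in us)

# Illustrative weights: penalize the first state ten times more than the second,
# with a modest penalty on control effort (values are arbitrary).
Q = np.diag([10.0, 1.0])
R = np.array([[0.1]])
```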
Problem Formulation
General Setup
The linear–quadratic regulator (LQR) addresses the optimal control of linear dynamical systems by minimizing a quadratic performance index that penalizes deviations in state from a desired trajectory and excessive control effort.[4] This formulation assumes a continuous-time, time-invariant system with perfect state measurement available, distinguishing it from extensions like the linear–quadratic-Gaussian (LQG) controller, which incorporates process and measurement noise.[17] The state-space representation and quadratic cost structure draw from general linear systems theory, as outlined in the mathematical prerequisites.[6]

The core LQR problem is to determine a control input u(t) that minimizes the cost functional

J = \int_0^T \left( x(t)^\top Q x(t) + u(t)^\top R u(t) \right) \, dt + x(T)^\top P_f x(T),

subject to the linear dynamics

\dot{x}(t) = A x(t) + B u(t),

with given initial state x(0) = x_0, where Q \succeq 0 and R \succ 0 are symmetric weighting matrices for states and inputs, respectively, and P_f \succeq 0 is an optional terminal cost matrix.[17] Terminal constraints may be incorporated via P_f, but the problem often assumes none for simplicity.[18] The system matrices A \in \mathbb{R}^{n \times n} and B \in \mathbb{R}^{n \times m} describe the state evolution, with the pair (A, B) assumed controllable to ensure solvability.[4]

Under these assumptions, the optimal control takes the state-feedback form u(t) = -K(t) x(t), where K(t) is a time-varying gain matrix designed to balance the cost terms.[17] This feedback law renders the closed-loop system \dot{x} = (A - B K) x asymptotically stable, driving the state to the origin while respecting the quadratic penalties.[6] The resulting K is computed offline via dynamic programming, yielding a certainty-equivalence solution without online optimization.[18]
Finite-Horizon Variant
In the finite-horizon variant of the linear-quadratic regulator (LQR), the control problem is formulated over a fixed time interval [0, T] with T < \infty, aiming to minimize a quadratic cost that balances state deviations, control effort, and a terminal penalty. The cost functional is given by

J = x(T)^T P_T x(T) + \int_0^T \left( x(t)^T Q(t) x(t) + u(t)^T R(t) u(t) \right) dt,

where P_T \geq 0 is the terminal weighting matrix penalizing the final state x(T), and Q(t) \geq 0, R(t) > 0 are symmetric time-varying matrices weighting the state and control penalties, respectively. This setup assumes linear dynamics \dot{x}(t) = A(t) x(t) + B(t) u(t) with initial condition x(0) = x_0, allowing for optimization of transient behavior without assuming perpetual operation.[19][20]

The optimal control policy in this finite-horizon LQR takes the form of a time-varying state feedback u(t) = -K(t) x(t), where the gain matrix K(t) is determined by solving equations backward in time from the terminal condition at t = T to the initial time t = 0. This backward integration ensures the gain adapts to the approaching horizon, providing precise control adjustments that diminish in influence as the terminal time nears, particularly when P_T emphasizes final accuracy. Unlike steady-state approaches, this time-varying nature captures the evolving priorities over the finite interval.[19]

Finite-horizon LQR finds applications in short-term tasks requiring precise trajectory optimization, such as missile guidance systems where the goal is to minimize intercept error within a bounded engagement time. In these scenarios, the method enables integrated guidance and control by shaping trajectories to meet terminal constraints like impact angle and miss distance. Compared to the infinite-horizon variant, it lacks an asymptotic steady-state gain, instead demanding repeated backward computations for each horizon length, which elevates the computational load especially for extended finite intervals.[21][19]
Infinite-Horizon Variant
The infinite-horizon variant of the linear-quadratic regulator (LQR) addresses optimal control over an unbounded time interval, where the horizon T approaches infinity and no terminal cost term is included in the objective function. This formulation minimizes the accumulated quadratic cost indefinitely, emphasizing steady-state performance and long-term stability rather than a fixed endpoint. It is particularly suited for systems requiring persistent regulation without predefined termination, such as continuous industrial processes.[19]

For the infinite-horizon cost to remain finite and the optimal solution to converge, the system must satisfy specific structural conditions: the pair (A, B) must be stabilizable to ensure controllability of unstable modes, and the pair (A, C) must be detectable, where Q = C^\top C, to guarantee observability of those modes. These assumptions prevent divergence of the cost functional and ensure the existence of a stabilizing feedback law. Without detectability, unobservable unstable modes could lead to unbounded costs, rendering the problem ill-posed.[22][23]

Under these conditions, the optimal control policy simplifies to a time-invariant linear feedback gain K, applied constantly after any initial transient phase, yielding u(t) = -K x(t). This constant gain promotes asymptotic stability, driving the state to the origin over time while minimizing the ongoing quadratic penalties on state deviations and control effort. The infinite-horizon LQR thus provides a stationary policy ideal for perpetual operation.[24]

As the finite-horizon length T increases, the time-varying solution to the associated Riccati equation converges to a steady-state form known as the algebraic Riccati solution, establishing equivalence between the limiting finite-horizon and infinite-horizon problems. This asymptotic behavior underscores the practicality of the infinite-horizon approach for applications like chemical process control or power system stabilization, where indefinite operation demands robust, unchanging regulation strategies.[22][25]
Optimal Control Derivation
Hamilton-Jacobi-Bellman Approach
The Hamilton-Jacobi-Bellman (HJB) approach to the linear-quadratic regulator (LQR) derives the optimal control policy using principles of dynamic programming, where the value function represents the minimum cost-to-go from any state at a given time.[26] This method frames the LQR problem as a continuous-time optimal control task with linear dynamics \dot{x} = Ax + Bu and quadratic stage cost x^T Q x + u^T R u, assuming Q \succeq 0 and R \succ 0.[27] The value function V(x, t) is defined as the infimum over admissible controls of the total cost from time t to the horizon T, including the integral of the stage cost plus a terminal quadratic penalty x(T)^T Q_f x(T).[18]

Applying the dynamic programming principle leads to the HJB equation, which enforces optimality at every state and time:

\min_u \left[ \frac{\partial V}{\partial t} + \frac{\partial V}{\partial x} (Ax + Bu) + x^T Q x + u^T R u \right] = 0,

with the terminal condition V(x, T) = x^T Q_f x.[26] This partial differential equation characterizes the value function as the solution to a boundary-value problem, connecting the LQR to broader optimal control theory through the Hamiltonian formulation.[27]

Given the quadratic structure of the cost and the linearity of the dynamics, the value function is assumed to take the quadratic form V(x, t) = x^T P(t) x, where P(t) is a symmetric positive semidefinite matrix that evolves backward in time.[18] Substituting this ansatz into the HJB equation yields the optimal control by minimizing the expression with respect to u. The minimization step involves taking the derivative with respect to u and setting it to zero, resulting in the state-feedback law u^* = -R^{-1} B^T P x.[26] This linear feedback gain directly ties the HJB solution to the LQR's closed-loop structure, highlighting its role as a cornerstone of optimal feedback control.[27]
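Writing out this minimization explicitly, with \frac{\partial V}{\partial x} = 2 x^T P for the quadratic ansatz, the u-dependent part of the bracketed expression is 2 x^T P B u + u^T R u. Setting its gradient with respect to u to zero gives

2 B^T P x + 2 R u = 0 \quad \Longrightarrow \quad u^* = -R^{-1} B^T P x,

and this stationary point is a minimum because R \succ 0 makes the expression strictly convex in u. Substituting u^* back into the HJB equation and matching quadratic forms in x produces the matrix differential equation for P(t) developed in the next subsection.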
Riccati Equation Solution
In the Hamilton-Jacobi-Bellman (HJB) framework for the linear-quadratic regulator (LQR) problem, the optimal value function is assumed to take a quadratic form V(x, t) = x^T P(t) x, where P(t) is a symmetric positive definite matrix that evolves with time.[7] This assumption simplifies the nonlinear HJB partial differential equation into an ordinary matrix differential equation for P(t).[28]

Substituting the quadratic value function into the HJB equation and minimizing the Hamiltonian with respect to the control input u yields the continuous-time differential Riccati equation

\dot{P}(t) + A^T P(t) + P(t) A - P(t) B R^{-1} B^T P(t) + Q = 0,

where A and B are the state and input matrices of the linear system, Q is the state weighting matrix, and R is the input weighting matrix, assumed positive definite.[29] This equation, first derived in the context of optimal control for linear systems, describes the backward evolution of P(t) from the terminal time to the initial time.[4]

For the infinite-horizon LQR problem, where the time horizon extends to infinity and the system is asymptotically stable under optimal control, the solution seeks a constant positive definite matrix P. Setting \dot{P} = 0 in the differential Riccati equation results in the algebraic Riccati equation

A^T P + P A - P B R^{-1} B^T P + Q = 0.

A unique positive definite solution P exists under the assumptions of controllability of the pair (A, B) and positive definiteness of Q and R.[29][7]

The boundary condition for the finite-horizon case is P(T) = P_T, typically P_T = Q_f to account for terminal state costs, ensuring the solution remains positive definite for all t \in [0, T].[28] For the infinite-horizon case, the positive definiteness of P guarantees the stability of the closed-loop system.[4]

Once P is obtained by solving the appropriate Riccati equation, the optimal feedback gain matrix is computed as K(t) = R^{-1} B^T P(t) for the time-varying case or K = R^{-1} B^T P for the steady-state case, yielding the linear control law u = -K x.[29][7]
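As a minimal numerical sketch, the steady-state (algebraic) Riccati equation and the corresponding gain can be computed with SciPy's solver; the A, B, Q, R values below are illustrative placeholders, not from the text.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative system and weights (placeholder values)
A = np.array([[0.0, 1.0],
              [0.0, -0.5]])
B = np.array([[0.0],
              [1.0]])
Q = np.diag([1.0, 0.1])   # state weighting, positive semi-definite
R = np.array([[0.01]])    # input weighting, positive definite

# Solve A^T P + P A - P B R^{-1} B^T P + Q = 0 for the stabilizing P
P = solve_continuous_are(A, B, Q, R)

# Steady-state gain K = R^{-1} B^T P, then check closed-loop eigenvalues
K = np.linalg.solve(R, B.T @ P)
print("K =", K)
print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ K))
```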
Specific Implementations
Continuous-Time Finite-Horizon
The continuous-time finite-horizon linear-quadratic regulator (LQR) provides an optimal feedback control law for linear dynamical systems over a specified time interval [0, T], minimizing the quadratic cost \int_0^T (x(t)^T Q x(t) + u(t)^T R u(t)) \, dt, where Q \succeq 0 and R \succ 0 weight state and input penalties, respectively, assuming no terminal cost for simplicity.[4][28] The solution relies on the time-varying positive semidefinite matrix P(t) that satisfies the differential Riccati equation, derived from the Hamilton-Jacobi-Bellman framework.[4]

The Riccati differential equation is given by

\dot{P}(t) = -A^T P(t) - P(t) A + P(t) B R^{-1} B^T P(t) - Q,

with the terminal boundary condition P(T) = 0.[28][23] This ordinary differential equation (ODE) is integrated backward in time from t = T to t = 0 to obtain P(t) for all t \in [0, T], ensuring the cost-to-go function V(t, x) = x^T P(t) x captures the optimal value from time t onward.[4][28] If a nonzero terminal cost matrix Q_f \succeq 0 is included, the boundary becomes P(T) = Q_f, but the equation form remains unchanged.[28]

The time-varying optimal feedback gain is then computed as

K(t) = R^{-1} B^T P(t),

yielding the control law u(t) = -K(t) x(t).[4][23] This gain adjusts dynamically over the horizon, typically becoming more aggressive early in the interval to drive the state toward zero and less so near t = T under the zero-terminal-cost assumption.[28] Substituting into the system dynamics \dot{x}(t) = A x(t) + B u(t) produces the closed-loop form

\dot{x}(t) = (A - B K(t)) x(t),

which is asymptotically stable over [0, T] for controllable and observable systems, steering the state to the origin by the horizon end.[4][28]

Computationally, the Riccati equation is solved via numerical backward integration methods, such as Runge-Kutta solvers, starting from P(T) and propagating to initial time; this "backward recursion" avoids forward simulation errors accumulating over long horizons.[28][23] The solution exhibits sensitivity to the horizon length T: shorter T results in conservative gains that prioritize feasibility over performance, while longer T allows more optimal but computationally intensive trajectories, with numerical stability improving for moderate T under well-conditioned A and B.[28][23]
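A minimal numerical sketch of this backward integration, using SciPy's solve_ivp on the time-reversed Riccati ODE with illustrative A, B, Q, R, and horizon values (placeholders, not from the text):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative system, weights, and horizon (placeholder values)
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
T = 5.0
n = A.shape[0]

def riccati_rhs(tau, p_flat):
    # Integrate in reversed time tau = T - t, so that
    # dP/dtau = A^T P + P A - P B R^{-1} B^T P + Q, with P(tau=0) = P(T) = 0.
    P = p_flat.reshape(n, n)
    dP = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q
    return dP.ravel()

sol = solve_ivp(riccati_rhs, (0.0, T), np.zeros(n * n), dense_output=True)

def gain(t):
    # Time-varying gain K(t) = R^{-1} B^T P(t), recovering t from tau = T - t.
    P = sol.sol(T - t).reshape(n, n)
    return np.linalg.solve(R, B.T @ P)

print("K(0) =", gain(0.0))   # strongest early in the horizon
print("K(T) =", gain(T))     # approaches zero under the zero terminal cost
```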
Continuous-Time Infinite-Horizon
In the continuous-time infinite-horizon linear-quadratic regulator (LQR) problem, the goal is to design a feedback control law for the linear time-invariant system \dot{x}(t) = A x(t) + B u(t) that minimizes the quadratic cost functional J = \int_0^\infty \left( x(t)^T Q x(t) + u(t)^T R u(t) \right) dt, where Q \succeq 0 and R \succ 0 are symmetric weighting matrices.[4] This formulation assumes no terminal cost and requires the integral to converge, which holds under stabilizability conditions on the system. The optimal control takes the form of a constant linear state feedback u(t) = -K x(t), where the gain matrix K is time-invariant, contrasting with the time-varying gains in finite-horizon variants.[4]

The optimal gain K is derived by solving the Hamilton-Jacobi-Bellman equation in steady state, yielding the continuous algebraic Riccati equation (ARE)

A^T P + P A - P B R^{-1} B^T P + Q = 0,

where P \succeq 0 is the unique positive semidefinite stabilizing solution.[4] If the pair (A, B) is controllable and the pair (Q^{1/2}, A) is observable, then P is positive definite, ensuring the closed-loop system matrix A - B K has all eigenvalues with negative real parts, thus guaranteeing asymptotic stability of the origin for any initial state.[4] The feedback gain is then computed as K = R^{-1} B^T P.[4]

Several numerical methods exist for solving the ARE. A direct approach involves forming the Hamiltonian matrix

H = \begin{pmatrix} A & -B R^{-1} B^T \\ -Q & -A^T \end{pmatrix}

and computing its Schur decomposition to identify the stable invariant subspace, from which P is obtained via the corresponding deflating subspace; this method is robust and widely implemented in software libraries.[30] Alternatively, iterative solvers such as the Kleinman (Newton) algorithm start from an initial stabilizing gain K_0 and, at each step, solve the Lyapunov equation (A - B K_k)^T P_{k+1} + P_{k+1} (A - B K_k) + Q + K_k^T R K_k = 0 and update K_{k+1} = R^{-1} B^T P_{k+1}; the sequence P_k converges monotonically to the stabilizing solution under the aforementioned assumptions.[31] These techniques ensure computational efficiency for systems of moderate dimension while preserving the theoretical guarantees of uniqueness and stability.[30]
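A minimal sketch of the Kleinman iteration under these assumptions, using SciPy's Lyapunov solver and an illustrative open-loop-stable system so that K_0 = 0 is a valid starting gain (an unstable A would require an initial stabilizing gain, e.g. from pole placement):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def kleinman_lqr(A, B, Q, R, K0, iters=50, tol=1e-10):
    """Newton/Kleinman iteration for the continuous algebraic Riccati equation.

    K0 must stabilize A - B K0; each step solves the Lyapunov equation
    (A - B K)^T P + P (A - B K) = -(Q + K^T R K) and updates K = R^{-1} B^T P.
    """
    K, P_prev = K0, None
    for _ in range(iters):
        Acl = A - B @ K
        P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
        K = np.linalg.solve(R, B.T @ P)
        if P_prev is not None and np.max(np.abs(P - P_prev)) < tol:
            break
        P_prev = P
    return P, K

# Illustrative open-loop-stable system (placeholder values)
A = np.array([[-1.0, 1.0], [0.0, -2.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])
P, K = kleinman_lqr(A, B, Q, R, K0=np.zeros((1, 2)))
print("K =", K)
```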
Discrete-Time Finite-Horizon
The discrete-time finite-horizon linear quadratic regulator (LQR) provides an optimal feedback control strategy for linear systems over a predetermined number of time steps, commonly applied in sampled-data contexts where continuous processes are discretized for computational implementation.[32] This formulation arises from dynamic programming principles, minimizing a quadratic performance index subject to linear dynamics, and yields time-varying feedback gains computed offline via recursive matrix equations. Unlike its continuous-time counterpart, which relies on differential equations, the discrete version employs matrix algebra suitable for digital processors, enabling precise control in applications like robotics and process automation.[33]

The underlying system is modeled by the linear discrete-time dynamics

x_{k+1} = A x_k + B u_k,

where x_k \in \mathbb{R}^n denotes the state vector at time step k, u_k \in \mathbb{R}^m is the control input, and A, B are the state transition and input matrices, respectively, assumed constant over the horizon.[32] The control objective is to minimize the finite-horizon quadratic cost functional

J = \sum_{k=0}^{N-1} \left( x_k^T Q x_k + u_k^T R u_k \right),

with symmetric positive semidefinite state weighting matrix Q \geq 0 and positive definite input weighting matrix R > 0, penalizing deviations from the origin without a terminal cost term. The optimal policy takes the linear state-feedback form u_k = -K_k x_k, where the time-varying gain K_k balances state regulation against control effort.[32]

To derive the gains, the solution involves backward recursion through the discrete-time Riccati difference equation, starting from the horizon end with P_N = 0:

P_k = A^T P_{k+1} A + Q - A^T P_{k+1} B (R + B^T P_{k+1} B)^{-1} B^T P_{k+1} A, \quad k = N-1, \dots, 0.

Each P_k \geq 0 represents the value function matrix at step k, capturing the cumulative future cost-to-go. The corresponding optimal gain is then

K_k = (R + B^T P_{k+1} B)^{-1} B^T P_{k+1} A,

ensuring the feedback u_k = -K_k x_k achieves the minimum cost, assuming (A, B) is stabilizable and (A, Q^{1/2}) is detectable to guarantee convergence properties over the horizon.[32] This recursion is efficiently solved numerically, with computational complexity scaling as \mathcal{O}(N n^3) for horizon length N and state dimension n, making it feasible for moderate-sized systems. A compact code sketch of the recursion is given below.

In practice, the discrete-time finite-horizon LQR is integral to digital control systems, where analog plants are discretized via methods like zero-order hold, allowing implementation on microcontrollers for tasks such as trajectory tracking in unmanned vehicles or inventory management in operations research.[33] The approach's optimality holds under the linearity and quadratic cost assumptions, providing a benchmark for robust and adaptive extensions in uncertain environments.
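A compact sketch of the backward recursion under the stated assumptions (P_N = 0, with an illustrative discretized double-integrator model whose values are placeholders):

```python
import numpy as np

def finite_horizon_lqr(A, B, Q, R, N):
    """Backward Riccati recursion for the discrete-time finite-horizon LQR.

    Returns the gains K_0, ..., K_{N-1}, using the terminal condition P_N = 0.
    """
    n = A.shape[0]
    P = np.zeros((n, n))          # P_N = 0 (no terminal cost)
    gains = [None] * N
    for k in range(N - 1, -1, -1):
        S = R + B.T @ P @ B
        K = np.linalg.solve(S, B.T @ P @ A)       # K_k
        P = A.T @ P @ A + Q - A.T @ P @ B @ K     # P_k from P_{k+1}
        gains[k] = K
    return gains

# Illustrative discretized double integrator with sample time 0.1 (placeholder values)
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
gains = finite_horizon_lqr(A, B, Q=np.eye(2), R=np.array([[0.1]]), N=50)
print("K_0 =", gains[0])
print("K_{N-1} =", gains[-1])
```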
Discrete-Time Infinite-Horizon
In the discrete-time infinite-horizon linear-quadratic regulator (LQR) problem, the goal is to find a time-invariant feedback gain that minimizes the infinite sum of a quadratic cost function for a linear system x_{k+1} = A x_k + B u_k, where the cost is \sum_{k=0}^{\infty} (x_k^T Q x_k + u_k^T R u_k), with Q \geq 0 and R > 0 symmetric.[29] This formulation assumes the pair (A, B) is stabilizable and (A, Q^{1/2}) is detectable, ensuring the existence of a unique positive semidefinite solution to the associated algebraic equation.

The optimal cost-to-go matrix P satisfies the discrete algebraic Riccati equation (ARE)

P = A^T P A + Q - A^T P B (R + B^T P B)^{-1} B^T P A,

which represents the steady-state limit of the finite-horizon recursion as the number of steps approaches infinity.[29] The corresponding stationary feedback gain is then given by

K = (R + B^T P B)^{-1} B^T P A,

yielding the optimal control law u_k = -K x_k.[29]

To solve the discrete ARE numerically, iterative methods such as fixed-point iteration of the Riccati recursion can be employed, which converges to the stabilizing solution P under the stabilizability assumption by successively updating P_{i+1} = A^T P_i A + Q - A^T P_i B (R + B^T P_i B)^{-1} B^T P_i A, starting from P_0 = 0. Alternatively, Schur decomposition methods applied to the associated symplectic matrix pencil extract the stable invariant subspace, from which P is recovered, providing a direct and reliable computation for systems where iterative convergence may be slow.

The resulting closed-loop system dynamics are x_{k+1} = (A - B K) x_k, and under the given assumptions, the matrix A - B K is Schur stable, meaning all its eigenvalues lie strictly inside the unit disk, ensuring asymptotic stability and finite expected cost.[29]
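A minimal numerical sketch using SciPy's discrete ARE solver, with illustrative placeholder matrices and a Schur-stability check on the closed loop:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Illustrative discretized double integrator (placeholder values)
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q = np.eye(2)
R = np.array([[0.1]])

# Solve P = A^T P A + Q - A^T P B (R + B^T P B)^{-1} B^T P A
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Schur stability: closed-loop eigenvalues must lie strictly inside the unit disk
print("K =", K)
print("|eig(A - BK)| =", np.abs(np.linalg.eigvals(A - B @ K)))
```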
Constraints and Limitations
Unconstrained Assumptions
The standard linear–quadratic regulator (LQR) is formulated under the assumption that control inputs and state variables are unbounded, permitting the optimization process to select any real-valued inputs and allowing states to evolve without prescribed limits. This unconstrained nature simplifies the mathematical derivation, enabling closed-form solutions via dynamic programming or variational methods.[4]

Full state feedback is another core assumption, requiring that all components of the system state vector are directly measurable and accessible for real-time control computation, without reliance on estimation or observers.[34] Furthermore, the LQR framework assumes deterministic system dynamics, free from external disturbances, process noise, or measurement errors, ensuring that the model perfectly captures the system's behavior.[35]

These assumptions imply that the optimal feedback gain, derived as a linear state feedback law u = -Kx from the solution to the associated Riccati equation, may demand control efforts that saturate physical actuators in practical applications, such as exceeding voltage limits in electric motors or thrust bounds in aerospace systems.[23] Consequently, when inputs saturate, the closed-loop system can deviate from the predicted trajectory, potentially resulting in suboptimal performance or even instability if the unconstrained design pushes the system beyond safe operating regimes.[35]

A primary limitation of these unconstrained assumptions is their disconnection from real-world engineering constraints, such as actuator saturation or state bounds in safety-critical domains like robotics or automotive control, where violation can lead to catastrophic failure despite the design's theoretical soundness.[23] Nonetheless, under the stated assumptions and provided the linear system is controllable, the LQR guarantees global optimality by minimizing the infinite-horizon quadratic cost functional and ensures asymptotic stability of the closed-loop dynamics around the origin.[4] This stability holds uniformly for all initial states, leveraging the positive definiteness of the cost matrices and the linearity of the dynamics.[34]
Handling State and Input Constraints
A common practical approach to handle input constraints in the linear-quadratic regulator (LQR) involves clipping the unconstrained optimal control input to respect actuator limits, yielding the saturated control law u = \text{sat}(-K x), where K is the LQR gain matrix and \text{sat}(\cdot) denotes the saturation function.[36] This method is widely applied in semi-active control systems, such as vehicle suspensions, where the clipped input approximates the ideal force while adhering to hardware bounds. However, simple clipping can lead to performance degradation or instability for large state deviations, as the saturated feedback no longer minimizes the quadratic cost exactly (a minimal code sketch of this clipping heuristic appears below).[36]

To ensure stability under input saturation, Lyapunov-based analysis employs sector bounds to model the nonlinearity introduced by the saturation function, treating the difference between saturated and unsaturated controls as a bounded perturbation. For instance, Fu (2001) derives an optimal sector bound \rho that characterizes this mismatch and constructs a Lyapunov function V(x) = x^T P_\rho x, where P_\rho solves a modified Riccati equation, guaranteeing asymptotic stability within nested invariant ellipsoidal sets.[37] This approach enlarges the region of attraction compared to naive clipping while preserving near-optimal performance for small states.[38]

State constraints, such as bounds on position or velocity, require reformulating the LQR problem as a quadratic program (QP) solved at each time step, minimizing the quadratic cost subject to linear inequalities on states and inputs. Scokaert and Rawlings (1998) show that for the infinite-horizon discrete-time case, this constrained LQR yields a stabilizing feedback policy by iteratively solving finite-horizon QPs, with the solution converging to a unique steady-state gain that respects constraints without tuning parameters like control horizons.[39] This QP-based method ensures feasibility and optimality within the constrained set but shifts from the closed-form Riccati solution of unconstrained LQR.

For mild constraints, alternative strategies include gain scheduling, where the LQR gain K is adapted dynamically based on the state magnitude to preempt saturation, or explicit multiparametric quadratic programming that precomputes a piecewise affine control law over polyhedral regions. Komaee (2024) proposes dynamic gain adaptation in LQR to balance high performance with saturation avoidance, using a state-dependent scaling that maintains stability via Lyapunov analysis.[40] Similarly, Bemporad et al. (2002) derive an explicit solution for constrained LQR as a continuous piecewise quadratic function, enabling fast online evaluation akin to model predictive control (MPC) predictions over short horizons.[41] These methods suit systems with known bound structures, such as robotic manipulators.

Incorporating constraints into LQR often results in a loss of global optimality, as the saturated or QP-adjusted inputs deviate from the unconstrained minimum-cost trajectory, potentially shrinking the basin of attraction or requiring conservative designs. Computationally, QP solving or explicit partitioning increases online demands compared to the matrix multiplications of standard LQR, with complexity scaling cubically in horizon length for dense formulations.
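A minimal sketch of the clipped-LQR heuristic referenced above, with an illustrative discrete-time system and actuator bound (placeholder values); as noted, simple saturation does not by itself guarantee stability for large deviations.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def clipped_lqr_step(x, A, B, K, u_min, u_max):
    """One simulation step of the clipped (saturated) LQR law u = sat(-K x)."""
    u = np.clip(-K @ x, u_min, u_max)
    return A @ x + B @ u, u

# Illustrative system, weights, and actuator bound (placeholder values)
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
R = np.array([[0.1]])
P = solve_discrete_are(A, B, np.eye(2), R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

x = np.array([5.0, 0.0])          # large initial deviation to trigger saturation
for _ in range(100):
    x, u = clipped_lqr_step(x, A, B, K, u_min=-1.0, u_max=1.0)
print("final state:", x)          # clipping is a heuristic, not a stability proof
```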
In robotics applications, such as trajectory optimization for legged platforms, constrained iterative LQR variants address joint limits and contact forces but rely on approximations like active-set methods, leading to suboptimality and numerical challenges with high-degree constraints.[42]
Extensions and Related Controllers
Linear-Quadratic-Gaussian Regulator
The linear-quadratic-Gaussian (LQG) regulator addresses the optimal control of linear systems in the presence of stochastic disturbances, integrating the linear-quadratic regulator (LQR) for deterministic control with a Kalman filter for state estimation under Gaussian noise assumptions. This framework applies to discrete-time systems described by the state evolution x_{k+1} = A x_k + B u_k + w_k and noisy measurements y_k = C x_k + v_k, where w_k and v_k are zero-mean Gaussian process and measurement noises with covariances Q_n and R_n, respectively. The LQG approach minimizes the expected value of a quadratic cost functional similar to the LQR, but accounting for estimation uncertainty.[43]

Central to the LQG regulator is the separation principle, which establishes that the optimal control and estimation problems can be solved independently, with the overall controller applying the LQR feedback gain to the state estimate from the Kalman filter. Specifically, the control law takes the form u_k = -K \hat{x}_{k|k}, where K is the optimal LQR gain matrix and \hat{x}_{k|k} is the filtered state estimate. This independence simplifies design: the LQR gain K is computed based on the system matrices A, B, and the cost weighting matrices Q and R as in the deterministic case, while the estimator parameters depend solely on the noise covariances Q_n and R_n. The separation principle holds under the Gaussian noise assumption, ensuring the combined controller achieves the minimal expected cost.[43][4]

The Kalman filter provides the state estimate \hat{x}_{k|k} through a two-step prediction-correction process. In the prediction step, the a priori estimate is propagated forward using the system dynamics: \hat{x}_{k|k-1} = A \hat{x}_{k-1|k-1} + B u_{k-1}, with the associated error covariance updated via the Riccati equation incorporating Q_n. The correction step then incorporates the new measurement y_k to yield the a posteriori estimate \hat{x}_{k|k} = \hat{x}_{k|k-1} + L_k (y_k - C \hat{x}_{k|k-1}), where the Kalman gain L_k is derived from the predicted covariance, measurement matrix C, and noise covariances Q_n and R_n to minimize estimation error variance. For time-invariant systems, steady-state gains K and L are obtained by solving algebraic Riccati equations.

The LQG framework embodies the certainty equivalence principle, whereby the optimal control policy treats the state estimate \hat{x}_{k|k} as the true state, ignoring the uncertainty in the estimate for the purpose of feedback computation. This equivalence arises directly from the separation principle and the quadratic-Gaussian structure, ensuring that the certainty-equivalent controller coincides with the fully optimal stochastic controller. While this yields the minimal expected quadratic cost, it does not explicitly hedge against estimation errors, a limitation addressed in robust extensions.[43]
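A compact sketch of a steady-state LQG design under these assumptions, with illustrative placeholder matrices; by the separation principle, the controller and estimator reduce to two independent Riccati solves.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Illustrative system, cost weights, and noise covariances (placeholder values)
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
C = np.array([[1.0, 0.0]])                    # only position is measured
Q, R = np.eye(2), np.array([[0.1]])           # control weighting matrices
Qn, Rn = 0.01 * np.eye(2), np.array([[0.1]])  # process / measurement noise covariances

# Controller: LQR gain from the control ARE
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Estimator: steady-state Kalman gain from the dual (filtering) ARE
S = solve_discrete_are(A.T, C.T, Qn, Rn)              # predicted error covariance
L = S @ C.T @ np.linalg.inv(C @ S @ C.T + Rn)         # steady-state Kalman gain

def lqg_step(x_hat, y, u_prev):
    """Predict with the model, correct with the measurement, apply u = -K x_hat."""
    x_pred = A @ x_hat + B @ u_prev
    x_hat = x_pred + L @ (y - C @ x_pred)
    return x_hat, -K @ x_hat
```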
Model Predictive Control
Model Predictive Control (MPC) is an optimal control technique that employs a dynamic model to forecast the system's future states over a finite prediction horizon. At each sampling instant, an optimization problem is solved to compute a sequence of future control inputs that minimize a cost function, subject to the predicted system dynamics and any operational constraints. Only the initial control action from this sequence is implemented, after which the horizon recedes forward, and the optimization is repeated using updated measurements, enabling closed-loop feedback and adaptation to disturbances or model mismatches. This receding-horizon strategy originated in process industries in the late 1970s and has since become a standard for multivariable constrained systems due to its ability to handle complex interactions explicitly.[44]

In its linear form, MPC bears a direct relation to the linear-quadratic regulator (LQR), extending it to incorporate constraints and finite horizons. For unconstrained linear systems with quadratic costs, the MPC problem reduces to the finite-horizon LQR optimization, where the solution yields a time-varying state feedback law; as the prediction horizon approaches infinity, this converges to the steady-state infinite-horizon LQR gain derived from the algebraic Riccati equation. When constraints are introduced, such as bounds on states x_k \in \mathcal{X} or inputs u_k \in \mathcal{U}, the problem transforms into a quadratic program (QP), solved online at each step to enforce feasibility while minimizing the cost. This constrained formulation results in a piecewise affine or nonlinear feedback law, contrasting with LQR's linear structure.[45][46]

The primary advantages of MPC over traditional LQR stem from its explicit treatment of constraints and predictive nature. LQR assumes unbounded states and inputs, potentially leading to infeasible or saturating controls in real systems, whereas MPC systematically incorporates inequality constraints via the QP, ensuring safe operation in applications like chemical processes or robotics where actuator limits and safety bounds are paramount. Furthermore, the online re-optimization in MPC provides superior disturbance rejection and tracking performance compared to LQR's static gain, as it anticipates future behavior and adjusts dynamically; however, this comes at the cost of higher computational demand, often mitigated by explicit solutions precomputing the feedback map offline. While linear MPC focuses on quadratic costs for convex optimization, it can extend to nonlinear variants for broader applicability, though stability guarantees require careful terminal cost and set design akin to LQR.[44][45]

The standard discrete-time linear MPC formulation for a system x_{k+1} = A x_k + B u_k is given by

\begin{align*}
\min_{u_{0|k}, \dots, u_{N-1|k}} &\quad \sum_{i=0}^{N-1} \left( x_{i|k}^\top Q x_{i|k} + u_{i|k}^\top R u_{i|k} \right) + x_{N|k}^\top P x_{N|k} \\
\text{subject to} &\quad x_{i+1|k} = A x_{i|k} + B u_{i|k}, \quad x_{0|k} = x_k, \\
&\quad x_{i|k} \in \mathcal{X}, \quad u_{i|k} \in \mathcal{U}, \quad i = 0, \dots, N-1, \\
&\quad x_{N|k} \in \mathcal{X}_f,
\end{align*}

where x_{i|k} and u_{i|k} denote predicted state and input at time k+i based on current state x_k, Q \succeq 0 and R \succ 0 are weighting matrices penalizing state deviations and control effort, P \succeq 0 is the terminal cost matrix (often the LQR solution), \mathcal{X} and \mathcal{U} are convex sets defining state and input constraints, and \mathcal{X}_f is a terminal set ensuring recursive feasibility and stability. The optimal input sequence U_k^* = \{u_{0|k}^*, \dots, u_{N-1|k}^*\} yields the applied control u_k = u_{0|k}^*, with the process repeating at k+1. This setup mirrors the finite-horizon LQR cost but adds the inequality constraints, enabling MPC's practical utility in constrained environments.[45][46]
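A minimal QP-based sketch of this formulation, written with the cvxpy modeling library and illustrative box constraints; the matrices, bounds, and the choice P = Q are placeholders, and a terminal set \mathcal{X}_f is omitted for brevity.

```python
import numpy as np
import cvxpy as cp

# Illustrative system, weights, horizon, and box constraints (placeholder values)
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
n, m, N = 2, 1, 20
Q, R = np.eye(n), 0.1 * np.eye(m)
P = Q                      # simple terminal weight; the LQR solution is a common choice
u_max, x_max = 1.0, 10.0

x0 = np.array([5.0, 0.0])
x = cp.Variable((n, N + 1))
u = cp.Variable((m, N))

cost = cp.quad_form(x[:, N], P)
constraints = [x[:, 0] == x0]
for i in range(N):
    cost += cp.quad_form(x[:, i], Q) + cp.quad_form(u[:, i], R)
    constraints += [x[:, i + 1] == A @ x[:, i] + B @ u[:, i],
                    cp.abs(u[:, i]) <= u_max,
                    cp.abs(x[:, i]) <= x_max]

prob = cp.Problem(cp.Minimize(cost), constraints)
prob.solve()
u_applied = u.value[:, 0]   # receding horizon: apply only the first input
print("u_0 =", u_applied)
```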
Other Variants
The quadratic-quadratic regulator variant of the linear-quadratic regulator incorporates penalties on additional quadratic input terms, including the rate of change of the control input, to promote smoother controller behavior while minimizing state deviations. The associated cost function takes the form

J = \int_0^\infty \left( x^\top Q x + u^\top R u + \dot{u}^\top S \dot{u} \right) \, dt,

where Q \geq 0 weights the state, and R > 0 and S \geq 0 weight the control and its derivative, respectively. This formulation addresses limitations of standard LQR in systems requiring bounded control rates, such as flight control, by augmenting the dynamics with control derivative states.[47]

The polynomial-quadratic regulator extends LQR to approximate nonlinear dynamics through higher-order polynomial terms in the cost function, enabling better handling of systems with mild nonlinearities. For instance, including cubic terms in the states allows the controller to capture asymmetries or higher-fidelity behaviors without full nonlinear optimization. Seminal work on approximating solutions for such problems uses structured methods to compute feedback gains, often yielding low-degree polynomial controllers for tractability in high-dimensional settings.[48] Recent advancements further explore polynomial-polynomial costs for broader nonlinear approximations, maintaining quadratic-like solvability via sum-of-squares techniques.

Risk-sensitive LQR addresses uncertainty in stochastic environments by replacing the expected quadratic cost with an exponential utility function, emphasizing worst-case scenarios for risk-averse decision-making. The criterion becomes

J = \frac{2}{\theta} \log \mathbb{E} \left[ \exp \left( \frac{\theta}{2} \int_0^\infty (x^\top Q x + u^\top R u) \, dt \right) \right],

where \theta > 0 tunes risk sensitivity; as \theta increases, the controller approaches the robustness characteristics of H-infinity control against disturbances. This variant, originally formulated for linear-quadratic-Gaussian systems, provides certainty-equivalent solutions via Riccati equations with adjusted parameters.[49]

In the 2020s, robust LQR variants have seen integration with adaptive algorithms and AI-driven reinforcement learning to enable online learning of system parameters in uncertain or evolving environments, such as autonomous systems and robotics. These approaches achieve sublinear regret bounds while ensuring stability, bridging classical optimal control with data-driven methods for real-world deployment.[50]