Lyapunov optimization
Lyapunov optimization is a mathematical framework for designing online algorithms that optimize long-term average performance in stochastic dynamical systems, such as queueing networks, while ensuring stability through the use of Lyapunov functions to bound system states like queue backlogs. It achieves this via the drift-plus-penalty method, which minimizes the expected change (drift) in a quadratic Lyapunov function plus a scaled penalty term representing the optimization objective, enabling decisions that balance stability and utility without requiring prior knowledge of underlying probability distributions.[1]
The method originates from classical Lyapunov stability theory, introduced by Aleksandr Lyapunov in 1892 for deterministic systems, but was adapted for stochastic optimization in the context of communication networks and queueing systems during the early 2000s. Key developments, particularly by Michael J. Neely, formalized the approach in works like the 2010 monograph Stochastic Network Optimization with Application to Communication and Queueing Systems, where it is shown to yield algorithms with explicit performance bounds: an O(1/V) gap from the optimal time-average utility and O(V) bounds on queue sizes, with the parameter V trading off between optimality and stability. This framework extends earlier backpressure algorithms for network scheduling, providing a general tool for handling time-varying constraints and random events in real-time.[1]
Lyapunov optimization has broad applications in networked systems, including wireless resource allocation, where it maximizes throughput subject to queue stability; cloud and edge computing for task offloading and energy minimization; and more recent extensions to machine learning tasks like federated learning client selection.[2] Unlike model-based approaches such as dynamic programming over Markov decision processes, it operates in a model-free manner, making it computationally efficient for large-scale, decentralized systems. Its probabilistic guarantees, including probability-1 convergence to optimal time averages under mild moment conditions, underscore its robustness for practical deployment.[1]
Introduction and History
Definition and Overview
Lyapunov optimization is a framework that utilizes Lyapunov drift techniques to achieve long-term time-average performance guarantees in stochastic dynamical systems, particularly by minimizing average costs while maintaining queue stability in environments that may otherwise be unstable. This method addresses the challenge of optimizing systems subject to random events, such as variable arrivals or channel conditions, without relying on predictive models of future states.[1]
The key principle of Lyapunov optimization is to make control decisions at each discrete time slot based exclusively on the observable current system state, thereby bounding the expected drift of a suitable Lyapunov function—often a measure of queue backlog—and ensuring both stability and optimality. This real-time, causal decision-making process trades off immediate costs against congestion control, yielding algorithms that converge to optimal solutions with probability 1.[1]
In its basic setup, Lyapunov optimization applies to discrete-time stochastic systems comprising queues that evolve according to random arrivals, controllable actions, and exogenous events; the goal is to select actions that minimize the long-term time-averaged cost (such as power or delay) while constraining queues to remain stable, meaning their expected lengths grow sublinearly over time. For illustration, consider a simple single-queue system where random packets arrive at each time slot, and the server must choose a transmission rate based on the current queue length to maximize throughput while minimizing power cost; the optimization selects the rate that minimizes a weighted combination of expected queue increase (drift) and immediate power expenditure, ensuring the queue remains bounded and average power approaches the optimum.
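The per-slot rule in this example reduces to a one-line comparison. Below is a minimal Python simulation sketch of the single-queue case, assuming Poisson arrivals, a small set of candidate rates, and a quadratic power cost; all names and parameter values are illustrative rather than taken from the literature:

import numpy as np

rng = np.random.default_rng(0)
V = 50.0                                 # trade-off knob: larger V favors low power over small backlog
rates = np.array([0.0, 1.0, 2.0, 3.0])   # candidate service rates (packets/slot)
power = rates ** 2                       # assumed convex power cost per rate
T, lam = 10_000, 1.5                     # horizon and mean arrival rate

Q, avg_power = 0.0, 0.0
for t in range(T):
    # drift-plus-penalty: minimize V*power - Q*rate over candidate rates,
    # weighing immediate power cost against expected queue decrease
    k = int(np.argmin(V * power - Q * rates))
    A = rng.poisson(lam)                 # random arrivals this slot
    Q = max(Q - rates[k], 0.0) + A       # Lindley-style queue update
    avg_power += power[k] / T

print(f"time-average power ~ {avg_power:.2f}, final backlog = {Q:.0f}")

Raising V pushes the time-average power toward its minimum while the typical backlog grows roughly linearly in V, mirroring the O(1/V) versus O(V) trade-off described above.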
Notably model-free, this framework requires no a priori knowledge of arrival statistics, channel probabilities, or other system parameters, making it robust for online implementation.[1] It is especially applicable to convex optimization problems, where constraints and objectives can be incorporated via auxiliary variables into the drift-based decision rule without sacrificing the no-future-knowledge property.
Historical Development
The foundations of Lyapunov optimization trace back to the work of Aleksandr Lyapunov, who in his 1892 doctoral dissertation introduced the concept of Lyapunov functions as a tool for analyzing the stability of solutions to differential equations describing continuous-time dynamical systems.[3] This seminal contribution, titled The General Problem of the Stability of Motion, established a direct method for proving stability without solving the equations explicitly, laying the groundwork for later extensions in optimization and control.[4]
In the mid-20th century, Lyapunov's stability theory was extended to discrete-time and stochastic systems, particularly through advancements in control theory. Pioneers such as Richard Bellman, with his development of dynamic programming in the 1950s, and Rudolf Kalman, who applied Lyapunov methods to linear systems and filtering in the 1960s, facilitated these adaptations by incorporating probabilistic elements and time-discretized models into stability analysis.[5][6]
The emergence of Lyapunov optimization as a distinct framework occurred in the 1980s and 1990s, driven by intersections between dynamic programming, approximate methods, and queueing theory. A key milestone was the 1992 introduction of the backpressure algorithm by Leandros Tassiulas and Anthony Ephremides, which used Lyapunov drift to achieve optimal throughput and stability in multihop wireless networks.[7] Dimitri Bertsekas played a key role through his contributions to neuro-dynamic programming, which integrated Lyapunov stability concepts with reinforcement learning and approximate dynamic programming to address large-scale stochastic control problems.[8]
A pivotal milestone came in the 2000s with Michael J. Neely's development of the drift-plus-penalty method, which applied Lyapunov drift techniques to optimize time-averaged performance in stochastic networks, particularly wireless systems subject to queue stability constraints.[9] This approach built on earlier backpressure scheduling ideas and culminated in Neely's 2010 book Stochastic Network Optimization with Application to Communication and Queueing Systems, which formalized Lyapunov optimization for dynamic networks.[10] Post-2010, the framework expanded to broader domains, including energy systems for renewable integration, as evidenced in works optimizing hybrid energy supplies in cellular networks,[11] and machine learning for safe reinforcement learning, such as stability-aware policy optimization.[12]
Mathematical Foundations
Lyapunov Functions
In the context of Lyapunov optimization, a Lyapunov function is defined as a non-negative scalar-valued function V(\mathbf{x}) that quantifies the "energy" or deviation of a system's state \mathbf{x} from an equilibrium point, typically satisfying positive definiteness and radial unboundedness to ensure it captures global system behavior.[13] Positive definiteness requires V(\mathbf{x}) \geq 0 for all \mathbf{x}, with V(\mathbf{0}) = 0 and V(\mathbf{x}) > 0 for \mathbf{x} \neq \mathbf{0}, while radial unboundedness means V(\mathbf{x}) \to \infty as \|\mathbf{x}\| \to \infty.[13]
For stability analysis in optimization problems, particularly in discrete-time systems, the Lyapunov function must exhibit a non-positive time derivative or one-step difference, such as \Delta V(\mathbf{x}(t)) = V(\mathbf{x}(t+1)) - V(\mathbf{x}(t)) \leq 0, indicating that the system's state does not increase in "energy" along trajectories, thereby promoting convergence to feasible or optimal regions.[14]
In practice, Lyapunov functions are often constructed as quadratic forms, such as V(\mathbf{Q}) = \frac{1}{2} \sum_i Q_i^2 for a vector of state variables like queue lengths \mathbf{Q} = (Q_1, Q_2, \dots), which is positive definite and radially unbounded for non-negative \mathbf{Q}.[15] This form leverages the Euclidean norm to measure squared deviations, simplifying analysis in vector spaces.[16]
Such functions play a central role in Lyapunov optimization by providing upper bounds on system excursions from equilibrium, enabling the design of control policies that minimize drift while optimizing objectives like utility maximization.[1]
While quadratic Lyapunov functions are prevalent for linear or quadratically constrained systems due to their computational tractability and alignment with L2 norms, they can be adapted to nonlinear cases through more general positive definite forms, such as weighted sums or composite functions that incorporate nonlinearity while preserving the required properties.[14]
Drift Analysis
In Lyapunov optimization, the one-step drift, denoted as \Delta V(t), is defined as the conditional expected change in the Lyapunov function V(\mathbf{Q}(t)) from time slot t to t+1, given the current state vector \mathbf{Q}(t):
\Delta V(t) = \mathbb{E}\left[ V(\mathbf{Q}(t+1)) - V(\mathbf{Q}(t)) \mid \mathbf{Q}(t) \right],
where \mathbf{Q}(t) represents the vector of queue lengths or system states at time t.[1] This measure quantifies the expected evolution of the Lyapunov function, which is typically chosen as a scalar-valued function that is non-decreasing in each state variable, such as a quadratic form V(\mathbf{Q}(t)) = \frac{1}{2} \sum_i Q_i(t)^2.[16]
To analyze system stability, the drift is upper-bounded using inequalities derived from the dynamics of the state evolution. For queueing systems where Q_i(t+1) = \left[ Q_i(t) - \mu_i(t) + A_i(t) \right]^+ with A_i(t) denoting random arrivals and \mu_i(t) the service process, the squared term expands to yield
\Delta V(t) \leq B + \sum_i Q_i(t) \left( \mathbb{E}[A_i(t) \mid \mathbf{Q}(t)] - \mu_i(t) \right),
where B is a finite constant capturing bounded second moments of arrivals and services, ensuring the bound holds regardless of the specific control actions.[1] This upper bound facilitates the derivation of stability conditions by isolating terms that depend on the state and control decisions.
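The constant B can be made explicit by expanding the quadratic Lyapunov function along the update rule above; a sketch of the standard argument, using ([x]^+)^2 \leq x^2:
Q_i(t+1)^2 \leq Q_i(t)^2 + \left( A_i(t) - \mu_i(t) \right)^2 + 2 Q_i(t) \left( A_i(t) - \mu_i(t) \right).
Summing over i, multiplying by \frac{1}{2}, and taking conditional expectations yields the stated bound with
B = \frac{1}{2} \sum_i \sup_t \mathbb{E}\left[ (A_i(t) - \mu_i(t))^2 \mid \mathbf{Q}(t) \right],
which is finite whenever arrivals and services have bounded second moments.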
The presence of a negative drift term is crucial for ensuring mean-square stability, where the expected value of V(\mathbf{Q}(t)) remains finite over time. If the upper bound satisfies \Delta V(t) \leq B - \epsilon V(\mathbf{Q}(t)) for some constants B \geq 0 and \epsilon > 0 whenever V(\mathbf{Q}(t)) is sufficiently large, then \limsup_{t \to \infty} \mathbb{E}[V(\mathbf{Q}(t))] < \infty, implying that the second moments of the state variables are bounded.[1] This condition, often verified under assumptions of finite fourth moments for arrivals and services, guarantees that the system queues do not grow unboundedly in expectation.[17]
Stability and Optimization Theorems
The stability of queueing systems under Lyapunov optimization is established through drift conditions on a Lyapunov function V(Q(t)), typically the sum of squared queue lengths, V(Q(t)) = \frac{1}{2} \sum_i Q_i(t)^2, where Q(t) denotes the queue state vector at time t. The basic Lyapunov stability theorem states that if the one-step drift satisfies \mathbb{E}[\Delta V(t) \mid Q(t)] \leq B - \epsilon \sum_i Q_i(t) for constants B \geq 0 and \epsilon > 0, then the queues are mean-rate stable, meaning \limsup_{t \to \infty} \frac{\mathbb{E}[\sum_i Q_i(t)]}{t} = 0, and under additional bounded moment conditions, the expected queue size is bounded as \mathbb{E}[\sum_i Q_i(t)] = O(1/\epsilon).[1]
This theorem extends the classical Foster-Lyapunov criteria for positive recurrence in Markov chains to queueing networks, where the drift condition \mathbb{E}[\Delta V(t) \mid Q(t)] \leq B (with B < \infty) ensures mean rate stability \limsup_{t \to \infty} \frac{\mathbb{E}[\sum_i |Q_i(t)|]}{t} = 0, and rate stability \limsup_{t \to \infty} \frac{|Q_i(t)|}{t} = 0 almost surely holds with bounded second moments on queue updates.[1] In optimization contexts, these criteria are adapted to verify stability while achieving performance bounds in stochastic systems without requiring knowledge of arrival or service statistics.[1]
For optimizing time-average objectives subject to stability, the drift-plus-penalty theorem provides guarantees on utility maximization or cost minimization. Specifically, if \mathbb{E}[\Delta V(t) + \beta \cdot g(Q(t), A(t)) \mid Q(t)] \leq B + \beta g^* - \epsilon \sum_i Q_i(t) for a penalty function g (such as negative utility), constants B, \epsilon > 0, parameter \beta > 0, and optimal value g^*, then the time-average expected penalty satisfies \limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}[g(Q(t), A(t))] \leq g^* + B/\beta, yielding an optimality gap of O(1/\beta), while queues remain stable with \mathbb{E}[\sum_i Q_i(t)] = O(\beta / \epsilon).[1]
Proofs of these theorems rely on telescoping sums over a time horizon T: summing the drift inequalities from t = 0 to T-1 gives \mathbb{E}[V(Q(T))] - \mathbb{E}[V(Q(0))] \leq \sum_{t=0}^{T-1} \left( B - \epsilon \sum_i \mathbb{E}[Q_i(t)] \right) plus the accumulated penalty terms; dividing by T and taking \limsup_{T \to \infty} bounds the long-term averages, with stability following from bounded moments and the law of large numbers.[1]
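For the stability claim alone, the telescoping step can be written out explicitly: since V(Q(T)) \geq 0, rearranging the summed inequality gives
\epsilon \sum_{t=0}^{T-1} \sum_i \mathbb{E}[Q_i(t)] \leq B T + \mathbb{E}[V(Q(0))],
so the time-average expected backlog is at most B/\epsilon + \mathbb{E}[V(Q(0))]/(\epsilon T), which tends to B/\epsilon as T \to \infty.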
A key result is that Lyapunov optimization algorithms derived from these theorems achieve time-average utility within \epsilon of the optimum while maintaining queue sizes of O(1/\epsilon), operating without prior knowledge of system statistics by relying solely on current queue states and causal information.[1]
Applications in Queueing Networks
Queueing System Models
Lyapunov optimization is commonly applied to stochastic queueing networks consisting of N queues, where the state of the i-th queue at time slot t is denoted by the backlog Q_i(t) \geq 0.[1] Random arrivals to the i-th queue are captured by the process A_i(t) \geq 0, which is typically independent and identically distributed (i.i.d.) over time slots with finite second moments.[1] The service rate \mu_i(t) for the i-th queue depends on the control action U(t) selected from a feasible action set \mathcal{U}, often influenced by a random network state \omega(t) that is also i.i.d. over slots.[10]
The queue dynamics evolve according to the Lindley equation:
Q_i(t+1) = \left[ Q_i(t) - \mu_i(t) \right]^+ + A_i(t),
where [x]^+ = \max\{x, 0\} denotes the positive part.[1] This update reflects the standard evolution in discrete-time queueing systems, where served packets are removed up to the current backlog, and new arrivals are added. Feasible action sets \mathcal{U} are assumed to be compact and convex, ensuring that service rates \mu_i(t) are bounded and lie within a convex hull determined by the network state.[10] Additionally, time-average constraints often govern the system, such as \limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}[\mu_i(t)] \leq c_i for some capacity c_i > 0, reflecting long-term resource limits.[1]
In multi-hop queueing networks, such as those modeling packet routing in communication systems, queues at network nodes interact through routing and scheduling decisions. Here, arrivals to downstream queues depend on services from upstream ones, and control actions U(t) include both transmission schedules and routing choices across links.[18] A key example is the backpressure policy, where routing decisions prioritize packets toward queues with the largest differential backlog—defined as the difference in queue lengths between neighboring nodes—to balance loads and maximize throughput under interference constraints.[18] These networks assume i.i.d. external arrivals and service completions, with feasible schedules forming matchings in the network graph to avoid conflicts.[18]
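As a concrete illustration of the weight computation, the following minimal Python sketch scores each link of a small four-node topology by its differential backlog and activates the single best link, a stand-in for the full max-weight matching used under interference constraints; the topology and backlog values are illustrative:

import numpy as np

Q = np.array([12.0, 7.0, 3.0, 0.0])       # backlogs at nodes 0..3 for one commodity
links = [(0, 1), (1, 2), (2, 3), (0, 2)]  # directed (sender, receiver) pairs

# backpressure weight of each link: positive part of the differential backlog
weights = {(i, j): max(Q[i] - Q[j], 0.0) for (i, j) in links}

# activate the link with the largest positive weight
best = max(weights, key=weights.get)
if weights[best] > 0:
    print(f"activate link {best}, differential backlog {weights[best]}")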
A representative example is the tandem queueing system, consisting of queues in series where output from one queue feeds the next. For two queues, the dynamics are Q_1(t+1) = [Q_1(t) - \mu_1(t)]^+ + A_1(t) and Q_2(t+1) = [Q_2(t) - \mu_2(t)]^+ + \mu_1(t), with \mu_1(t) and \mu_2(t) constrained by a shared server capacity.[1] Similarly, input-queued switches model N \times N crossbar architectures with virtual output queues at inputs, where arrivals are i.i.d. packets destined for outputs, and scheduling selects non-conflicting matches to serve queues, evolving under the standard backlog update while respecting the one-packet-per-port limit per slot.[10]
Drift-Plus-Penalty Framework
The drift-plus-penalty framework serves as the core methodology in Lyapunov optimization for addressing stochastic optimization problems in queueing networks, where the goal is to maximize time-average utility subject to queue stability and long-term constraints. At each discrete time slot t, the algorithm observes the current queue state \mathbf{Q}(t) and selects an action U(t) from a feasible set to minimize a bounded expression involving the expected one-step Lyapunov drift plus a scaled penalty term: \Delta V(t) + \beta \mathbb{E}[L(U(t)) \mid \mathbf{Q}(t)], where V(t) is a quadratic Lyapunov function (typically V(t) = \frac{1}{2} \sum_i Q_i(t)^2), \Delta V(t) = \mathbb{E}[V(t+1) - V(t) \mid \mathbf{Q}(t)] is the conditional drift measuring queue growth, L(U(t)) represents the instantaneous penalty or negative utility to be minimized, and \beta > 0 is a tunable parameter that trades off between achieving queue boundedness (smaller \beta) and nearing the optimal time-average penalty (larger \beta). This approach decouples the complex multi-slot optimization into tractable per-slot decisions, ensuring mean-rate stability of the queues while approaching the optimal static policy performance.
The derivation of the framework begins with the standard one-slot bound on the conditional Lyapunov drift for the actual queues \mathbf{Q}(t): \Delta V(t) \leq B + \sum_i Q_i(t) (\mathbb{E}[A_i(t) \mid \mathbf{Q}(t)] - \mathbb{E}[\mu_i(U(t)) \mid \mathbf{Q}(t)]), where B \geq 0 is a finite constant bounding second-moment terms, A_i(t) is the random arrival to queue i, and \mu_i(U(t)) is the service rate induced by action U(t). Adding the scaled penalty term \beta \mathbb{E}[L(U(t)) \mid \mathbf{Q}(t)] to this bound yields an upper bound on the drift-plus-penalty that the algorithm minimizes over feasible U(t), resulting in the policy U(t) = \arg\min_{U \in \mathcal{U}} \left\{ \beta L(U) - \sum_i Q_i(t) \mathbb{E}[\mu_i(U) \mid \mathbf{Q}(t)] \right\}, where \mathcal{U} is the action set and the expectation is often simplified under known system statistics. This minimization directly weights service opportunities by current queue backlogs (prioritizing longer queues) while penalizing high-cost actions, thereby fostering stability without requiring knowledge of arrival statistics.[19]
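The resulting policy is a single comparison over the action set at each slot; a minimal Python sketch, where the service-rate map mu and penalty L are problem-specific callables and all names are illustrative:

import numpy as np

def drift_plus_penalty_action(Q, actions, mu, L, beta):
    # choose U minimizing beta*L(U) - sum_i Q_i * mu_i(U): longer queues
    # receive more service weight, and costly actions are penalized
    best_U, best_val = None, float("inf")
    for U in actions:
        val = beta * L(U) - float(np.dot(Q, mu(U)))
        if val < best_val:
            best_U, best_val = U, val
    return best_U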
To handle time-average constraints of the form \lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}[g_j(U(t))] \leq 0 for j = 1, \dots, J (e.g., average power or resource limits), the framework incorporates virtual (auxiliary) queues Z_j(t), which track cumulative constraint violations. These evolve as Z_j(t+1) = \left[ Z_j(t) - s_j \right]^+ + v_j(U(t)), where [x]^+ = \max\{x, 0\}, s_j \geq 0 is a slack variable (often the constraint bound), and v_j(U(t)) = g_j(U(t)) is the violation under action U(t). The virtual queues are non-negative and mean-rate stable under the algorithm, implying constraint satisfaction in the long run, and are included in the augmented Lyapunov function V(t) = \frac{1}{2} \sum_i Q_i(t)^2 + \frac{1}{2} \sum_j Z_j(t)^2, extending the drift bound accordingly.[19]
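The virtual-queue bookkeeping itself is a single update per constraint; a minimal sketch under the same notation, with illustrative names:

def update_virtual_queue(Z, v, slack=0.0):
    # Z_j(t+1) = max(Z_j(t) - s_j, 0) + v_j(U(t)): the queue grows when the
    # constraint is violated and drains when the constraint has slack
    return max(Z - slack, 0.0) + v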
Regarding optimality, the framework guarantees that the time-average penalty satisfies \limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} L(U(t)) \leq L^* + \frac{B}{\beta}, where L^* is the optimal value under any stabilizing policy that satisfies the constraints, with all queues (actual and virtual) bounded by O(\beta) in steady state. The parameter \beta thus controls the optimality gap, allowing arbitrarily close performance to L^* at the expense of larger queue backlogs and potential delays. This near-optimality holds under mild conditions, such as the existence of a feasible policy with strict slack in constraints (Slater's condition), and applies to a broad class of convex or concave objectives in queueing networks.[19]
Algorithms and Implementation
Lyapunov optimization yields practical algorithms for controlling queueing systems in real-time, leveraging the drift-plus-penalty framework to make decentralized decisions based on current queue states and system conditions.[20] One seminal approach is the backpressure algorithm, originally developed for stabilizing multi-hop radio networks by selecting transmission schedules that maximize the weighted sum of differential queue lengths across links. In this method, for each potential link from node i to node j, the weight is computed as the maximum of zero and the difference in queue backlogs Q_i(t) - Q_j(t) for the commodity of interest, reflecting the "pressure" to route packets from longer to shorter queues.[21] The algorithm then solves a maximum weighted matching or scheduling problem over these link weights to determine the active transmissions in each time slot, ensuring throughput optimality under stability constraints. Extensions incorporate multi-receiver diversity by ranking potential receivers per transmitter and probabilistically allocating service based on success probabilities, further enhancing efficiency in broadcast-prone environments.[21]
For problems with separable convex objectives, an online gradient descent variant adapts decision variables iteratively using projected updates informed by queue states.[22] Specifically, at time t, the update takes the form U(t+1) = \mathrm{proj}_{\mathcal{X}} \left( U(t) - \alpha \nabla \left( Q(t)^\top f(U(t)) + \beta \, \mathrm{cost}(U(t)) \right) \right), where \mathrm{proj}_{\mathcal{X}} projects onto the feasible set \mathcal{X}, \alpha > 0 is a step size, Q(t) represents queue backlogs, f captures service rates, and \beta weights the penalty term for costs like power.[22] This method integrates stochastic constraint handling via virtual queues that track cumulative violations, enabling convergence to near-optimal time averages with O(\sqrt{T}) regret over T slots.[22]
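A minimal Python sketch of one such update, using a box feasible set as a stand-in for a general \mathcal{X}; the gradient callables and all names are illustrative:

import numpy as np

def projected_gradient_step(U, Q, jac_f, grad_cost, alpha, beta, lo, hi):
    # gradient of Q^T f(U) + beta*cost(U) with respect to U, where
    # jac_f(U) returns the Jacobian of the service-rate map f at U
    g = jac_f(U).T @ Q + beta * grad_cost(U)
    # clipping to the box [lo, hi] plays the role of proj_X
    return np.clip(U - alpha * g, lo, hi)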
When the per-slot optimization involves non-convex decision sets, Lyapunov methods employ relaxations to an enclosing convex set or subgradient-based approximations to navigate local optima.[23] Within each slot, the algorithm minimizes a perturbed objective over the relaxed set using subgradient descent on dual variables associated with equality constraints, followed by primal recovery via projection.[23] This yields bounded convergence times, such as O(1/\epsilon) iterations for \epsilon-optimality under local polyhedral assumptions, without requiring full convexity.[23]
The computational complexity of these algorithms is typically polynomial per slot, facilitating scalability; for instance, maximum weighted matching in backpressure scheduling under simple interference models can be solved in O(N^3) time via combinatorial algorithms, leading to overall complexity O(T \, \mathrm{poly}(N)) over T slots for N nodes.[20] In distributed settings, nodes exchange local queue information to approximate solutions, reducing central overhead while preserving near-optimality.[21]
A representative implementation is the energy-efficient control algorithm for a simple wireless downlink scheduler, which minimizes average power while stabilizing user queues and maximizing throughput.[20] The following pseudocode outlines the per-slot operations for a base station serving L users with queue backlogs U_l(t) and channel states S_l(t), using parameter V > 0 to trade off power and delay:
Algorithm: Energy-Efficient Downlink Scheduler (EECA)
Input: V > 0, feasible power set Π, current queues U(t) = [U_1(t), ..., U_L(t)], channels S(t) = [S_1(t), ..., S_L(t)]
For each time slot t:
1. Observe U(t) and S(t).
2. For each user l = 1 to L:
Compute quality metric Q_l(t) = max_{P_l ∈ Π_l} [2 U_l(t) μ_l(P_l, S_l(t)) - V P_l],
where μ_l(P_l, S_l(t)) is the achievable rate (e.g., log(1 + γ_{S_l(t)} P_l)).
3. Select user l*(t) = arg max_l Q_l(t).
4. If Q_{l*(t)} > 0:
Set transmission power P_{l*(t)}(t) to the arg max from step 2 for l*(t),
serve user l*(t) at rate μ_{l*(t)}(P_{l*(t)}(t), S_{l*(t)}(t)),
update queues: U_l(t+1) = [U_l(t) - μ_l(t)]^+ + A_l(t) for all l (A_l(t) arrivals).
5. Else: Idle (no transmission).
Output: Transmission schedule and powers for slot t.
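A direct Python transcription of the pseudocode may clarify the per-slot computation; the discrete power set, channel model, and logarithmic rate function are illustrative placeholders, while the factor 2 in the quality metric follows the pseudocode above:

import numpy as np

rng = np.random.default_rng(1)
n_users, V, T = 3, 20.0, 5000
powers = np.linspace(0.0, 2.0, 9)           # feasible power levels Π (illustrative)
lam = np.array([0.3, 0.5, 0.4])             # mean arrival rate per user

def rate(p, s):                             # achievable rate, e.g. log(1 + γ_s p)
    return np.log1p(s * p)

U = np.zeros(n_users)                       # queue backlogs U_l(t)
for t in range(T):
    S = rng.uniform(0.5, 2.0, n_users)      # step 1: observe channels
    # step 2: quality metric per user, maximized over the power set
    metric = 2 * U[:, None] * rate(powers[None, :], S[:, None]) - V * powers[None, :]
    qual = metric.max(axis=1)
    best_p = powers[metric.argmax(axis=1)]
    l_star = int(qual.argmax())             # step 3: select the best user
    mu = np.zeros(n_users)
    if qual[l_star] > 0:                    # step 4: transmit only if the metric is positive
        mu[l_star] = rate(best_p[l_star], S[l_star])
    U = np.maximum(U - mu, 0.0) + rng.poisson(lam)   # queue updates with new arrivals

print("final backlogs:", np.round(U, 1))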
This scheduler achieves average power within B L / V of the optimum, where B is a constant bounding the second-moment terms in the Lyapunov drift, and it keeps average queue sizes bounded by (B L + V L P_{\max}) / (2 \epsilon_{\max}), with \epsilon_{\max} the stability margin.[20]
Broader Applications and Extensions
Wireless Communication Networks
In wireless communication networks, Lyapunov optimization frameworks model packet transmission as queues representing data buffers at network nodes, with decision actions focused on power allocation and rate scheduling that respect signal-to-interference-plus-noise ratio (SINR) constraints to determine feasible transmission rates.[24] These models capture the stochastic nature of arrivals and service opportunities, enabling dynamic control policies that stabilize queues while optimizing network performance.[24]
A primary application lies in cross-layer optimization for ad-hoc and cellular networks, where the framework jointly addresses throughput maximization, delay minimization, and energy efficiency by integrating routing, scheduling, and resource allocation across protocol layers.[24] Building on general queueing system models, this approach adapts to wireless specifics like interference and mobility by using queue backlog information to guide decisions in real time.[24] For instance, in fading channels, opportunistic scheduling prioritizes transmissions to users with favorable channel conditions, weighted by their current queue lengths to balance load and exploit temporal variations without requiring predictive channel state information.[24]
The method's adaptations for time-varying channels emphasize online algorithms that rely solely on current queue states and observable channel realizations, avoiding the need for statistical predictions of channel state information and thus ensuring robustness in dynamic environments like mobile ad-hoc networks.[24] Performance analyses demonstrate that these policies achieve throughput arbitrarily close to the boundary of the network capacity region, with the throughput gap shrinking as O(1/V) while average backlogs, and hence delays, grow as O(V), where V is a tunable parameter controlling the trade-off between utility maximization (e.g., throughput) and queue congestion.[24] This trade-off highlights the framework's flexibility, allowing operators to choose a small V to prioritize low-latency operation in time-critical wireless scenarios at the cost of slightly reduced throughput efficiency.[24]
Energy and Resource Management Systems
Lyapunov optimization has been instrumental in managing energy harvesting systems, where devices must balance data processing tasks with intermittent renewable energy supplies. In such models, data tasks are represented by queues that accumulate arrivals and are served based on available energy, while the battery is modeled as an energy queue with stochastic recharge rates from sources like solar or wind. Actions at each time slot involve deciding how much energy to use for transmission or storage, ensuring that energy consumption does not exceed availability to prevent system instability. This framework allows for online decision-making that maximizes long-term utility, such as throughput or quality of service, while respecting battery dynamics.[25]
A key technique in these systems is the use of virtual queues to enforce energy causality constraints, which prevent overdraft by maintaining non-negative battery levels, and to handle long-term average power constraints without requiring future knowledge of energy arrivals. These virtual queues are incorporated into the Lyapunov drift-plus-penalty function, guiding minimization of a weighted drift term that penalizes queue growth while optimizing utility. For instance, in sensor networks with solar harvesting, the battery queue evolves as Q_b(t+1) = Q_b(t) - u(t) + h(t), where u(t) is the energy used for data upload and h(t) is the harvested energy, both stochastic and non-negative. Algorithms derived from this approach, such as energy-selective allocation, dynamically adjust transmission rates to optimize data upload while bounding battery overflow.[25]
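A minimal sketch of that battery bookkeeping with an explicit energy-causality check; the harvest distribution, per-slot demand cap, and battery capacity are illustrative placeholders:

import numpy as np

rng = np.random.default_rng(2)
B_max, T = 10.0, 1000
Q_b = 5.0                                   # battery level, modeled as an energy queue
for t in range(T):
    h = rng.uniform(0.0, 1.0)               # stochastic harvested energy h(t)
    u = min(0.8, Q_b)                       # energy causality: draw at most what is stored
    Q_b = min(Q_b - u + h, B_max)           # Q_b(t+1) = Q_b(t) - u(t) + h(t), capped at capacity
print(f"final battery level: {Q_b:.2f}")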
The primary benefits of applying Lyapunov optimization here include its ability to handle uncertainty in energy arrivals without relying on forecasting or statistical assumptions, achieving utility performance within a tunable gap O(1/V) of the optimum, where V is a control parameter, alongside bounded queue sizes that ensure finite overflow probabilities. In practice, this leads to robust operation in resource-constrained environments, stabilizing both data and energy queues under varying conditions. For a case study in smart grid load balancing, Lyapunov-based algorithms allocate renewable energy to delay-tolerant loads, minimizing costs from non-renewable purchases by smoothing intermittent supplies through storage and demand shifting. Evaluations on real-world data from the California ISO market and NREL wind profiles demonstrate cost reductions of approximately 50% compared to greedy methods, with average delays under 10 hours.[26]
Recent Advances and Variants
In recent years, Lyapunov optimization has evolved to address multi-scale dynamics in hierarchical systems, where components operate on disparate fast and slow timescales. Post-2015 developments introduce multi-time-scale models that extend the drift-plus-penalty framework to manage interactions between low-latency primary processes and higher-latency secondary ones, ensuring stability across scales in resource-constrained environments like wireless networks. These approaches leverage composite Lyapunov functions to bound drift in both short-term and long-term horizons, achieving near-optimal performance with reduced computational overhead compared to monolithic optimizations.[27]
Integration with machine learning has emerged as a key variant, particularly through reinforcement learning (RL) approximations of Lyapunov policies for tackling non-convex optimization problems. Deep RL formulations recast the Lyapunov drift minimization as a Markov decision process, enabling agents to learn stabilizing actions that minimize time-average penalties without requiring full model knowledge.[28] This synergy is especially prominent in safe RL, where control Lyapunov functions (CLFs) enforce stability and safety constraints during policy updates, preventing constraint violations in uncertain environments.[29] For instance, Lyapunov barrier policy optimization restricts RL updates to safe action sets using barrier functions, demonstrating improved sample efficiency and robustness in continuous control tasks.[30]
Decentralized variants of Lyapunov optimization facilitate distributed algorithms for large-scale networks, eliminating the need for a central coordinator by enabling local drift computations. In edge computing and microgrid scenarios, these methods use consensus-based Lyapunov functions to coordinate decisions across nodes, achieving asymptotic optimality while preserving privacy and reducing communication overhead.[31] Recent implementations, such as Lyapunov-assisted deep deterministic policy gradients, generate near-optimal offloading strategies in decentralized mobile edge systems with up to 20% lower latency than centralized baselines.[31] For nonconvex settings, relaxed smoothness assumptions allow decentralized first-order methods to converge via Lyapunov analyses tailored to heterogeneous objectives.[32]
Key recent works have extended Lyapunov optimization to continuous-time regimes and hybrid systems, incorporating fluid approximations for smoother dynamics modeling. Systematic Lyapunov analyses for continuous-time first-order methods provide sublinear convergence guarantees, bridging discrete-time queueing roots with fluid-based limits for large-scale simulations.[33] In hybrid systems, 2020s research on safe RL employs CLFs alongside control barrier functions to ensure reachability and collision avoidance in robotics, with data-driven learning of these functions yielding bounded error rates under model uncertainties.[34] A comprehensive review highlights how these CLF-based techniques guarantee forward invariance in safe RL, addressing both stability and constraint satisfaction across diverse applications.[35]
Ongoing challenges in Lyapunov optimization center on scalability for massive IoT deployments and robustness against adversarial inputs, where exponential state growth and uncertain perturbations can destabilize drift bounds. In IoT networks, handling millions of devices demands lightweight, distributed variants that maintain optimality amid resource constraints, with recent frameworks reducing communication costs through Lyapunov-driven federation.[36] Robust extensions, such as those for low-Earth orbit satellite routing, incorporate worst-case drift analyses to mitigate adversarial traffic, achieving higher throughput under attacks compared to nominal methods.[37] Future directions emphasize hybrid ML-Lyapunov tools for real-time adaptability in ultra-dense IoT, focusing on verifiable safety in non-stationary environments.[38]