
Optimal stopping

Optimal stopping is a branch of mathematics, primarily within probability theory and stochastic processes, that addresses the problem of determining the optimal time to cease observation of a sequential process and take an action, in order to maximize the expected reward or minimize the expected cost based on the information gathered up to that point. This involves formulating a stopping rule—a decision rule adapted to the evolving information—that balances the trade-off between continuing to observe for potentially better outcomes and the risk of missing an acceptable one due to costs like time or opportunity. The theory formalizes these decisions using concepts such as stopping times, which are random variables indicating when to halt, and relies on tools from martingale theory and dynamic programming to compute the value function representing the maximal expected gain.

The origins of optimal stopping trace back to early work in sequential analysis during the 1940s, notably Abraham Wald's development of the sequential probability ratio test for efficient hypothesis testing in quality control and military applications during World War II. Building on this, foundational contributions in the 1950s and 1960s by researchers like J. L. Snell, Y. S. Chow, and H. Robbins established the modern framework, emphasizing backward induction for finite-horizon problems and essential suprema for infinite ones. A seminal text, Great Expectations: The Theory of Optimal Stopping by Chow, Robbins, and Siegmund (1971), synthesized these advances and highlighted the role of the Snell envelope—the smallest supermartingale dominating the payoff process, which serves as the optimal value process. Earlier roots can be found in gambling problems from the 16th century, such as those analyzed by Gerolamo Cardano, though rigorous probabilistic treatment awaited the axiomatic foundations laid by Andrey Kolmogorov in 1933.

Classic examples illustrate the theory's core principles. In the secretary problem (or marriage problem), one must select the best candidate from a sequence of N applicants interviewed in random order, without recall; the optimal strategy is to observe and reject roughly the first N/e ≈ 37% of candidates (where e is the base of the natural logarithm), then hire the next applicant who exceeds all prior ones, yielding a success probability that approaches 1/e ≈ 37% as N grows large. Another foundational case is the house-selling problem, where sequential offers arrive randomly, and the decision to accept balances the current bid against the expected value of waiting, often solved via threshold rules derived from the value function. These discrete-time problems extend to continuous settings, such as stopping a diffusion to maximize expected gain, using free-boundary problems where continuation and stopping regions are delineated by the infinitesimal generator of the process.

Applications of optimal stopping span diverse fields, underscoring its practical significance. In finance, it underpins the pricing of American options, where early exercise decisions maximize payoff under the Black-Scholes framework, contributing to the 1997 Nobel Memorial Prize in Economic Sciences awarded to Robert Merton and Myron Scholes. Economic models of job search employ it to determine reservation wages, minimizing unemployment duration while maximizing lifetime earnings. In medicine and engineering, it aids change-point detection, such as monitoring patient vitals or equipment failures to trigger interventions at the optimal moment, often incorporating costs for delayed detection. More broadly, the theory informs sequential decision-making under uncertainty, including the one-armed bandit problem for balancing exploration against exploitation and the timing of bids in sequential auctions.

Fundamentals

Definition

Optimal stopping refers to the mathematical framework for determining the optimal time to terminate a stochastic process or sequence of observations in order to maximize an expected reward or minimize an expected cost. This decision-making paradigm arises in uncertain environments where actions must be based on evolving information, such as sequentially arriving random variables, and the goal is to select a moment that balances immediate gains against potential future improvements. Formally, the problem involves choosing a stopping time \tau for an observation process X to optimize an objective like \sup_{\tau} \mathbb{E}[g(X_\tau)], where g denotes the reward function and the supremum is taken over admissible stopping times.

Central assumptions include the sequential revelation of information about the process, the irrevocability of the stopping decision once made, and the absence of recall, meaning previously passed opportunities cannot be revisited. These constraints model real-world scenarios where decisions are final and information unfolds over time without the option to undo choices.

Key terminology in optimal stopping includes the stopping time, a random variable \tau such that the event \{\tau \leq t\} is measurable with respect to the filtration generated by the process up to time t, ensuring the decision to stop depends only on information available at that instant. The reservation value represents the critical threshold for the process value above which stopping becomes optimal, often characterizing threshold-based strategies. Additionally, the indifference curve (or boundary) delineates the region in the state space where the expected value of stopping equals that of continuing, marking the transition between continuation and stopping sets in the solution.

A non-mathematical example illustrates the concept: consider a shopper searching for the best price on an item, weighing the effort and time of further inquiries against the risk of settling for a suboptimal deal, ultimately stopping when the current offer sufficiently outweighs anticipated improvements. Such problems are analyzed in both discrete and continuous time formulations, as explored in subsequent sections.

Historical Context

The roots of optimal stopping theory trace back to early gambling problems in the 16th century, such as those analyzed by Gerolamo Cardano in Liber de Ludo Aleae (written around 1564), and further developed through 18th-century advancements in probability, with influences from the calculus of variations formulated by Lagrange in the late 1700s. In the 1940s, wartime applications spurred connections to search theory, originating with the U.S. Navy's 1942 operations research group that included Bernard Koopman, whose 1946 synthesis Search and Screening optimized detection efforts under uncertainty, effectively framing when to halt searches based on probabilistic success.

The 1950s marked significant development with Abraham Wald's sequential analysis of 1947, extended by Herbert Robbins' 1952 papers on adaptive experimental designs and optimal stopping rules in statistical decision problems, which generalized stopping to non-statistical contexts via dynamic programming influences from Richard Bellman. Thomas S. Ferguson's subsequent work further formalized pure stopping problems, as seen in his 1967 contributions linking optimal stopping to broader sequential decision frameworks.

The 1960s saw key advancements, including the formalization of the secretary problem—traced to discussions by Merrill Flood in 1950 and popularized by Martin Gardner in 1960—with rigorous solutions by Dennis V. Lindley (1961), E. B. Dynkin (1963), and others like Chow, Moriguti, Robbins, and Samuels (1964), establishing asymptotic strategies for rank-based stopping. This era's momentum carried into the 1970s-1980s transition to continuous-time theory, driven by Albert Shiryaev's 1978 Optimal Stopping Rules and later collaborative works with Goran Peskir, which integrated martingale methods and free-boundary problems for continuous-time formulations, influencing quickest detection and prediction.

Post-2000 growth integrated optimal stopping with mathematical finance, building on 1970s option-pricing theory like Black-Scholes (1973), treating American options as stopping problems, and extended to machine learning through Gaussian-process methods for time-series stopping decisions (2022). In the 2020s, recent extensions address predicted priors in learning-augmented contexts, as in the 2025 formulation balancing consistency and robustness against prior errors, bridging classical theory to algorithmic predictions.

Mathematical Formulations

Discrete Time Case

In the discrete time case, optimal stopping problems are formulated over a countable set of time steps, typically indexed by non-negative integers, where decisions are made sequentially based on observed data. Consider a stochastic process (X_t)_{t=0}^\infty with state space E \subseteq \mathbb{R}^d, often assumed to be a time-homogeneous Markov chain for tractability, starting from an initial state X_0 = x \in E. The filtration (\mathcal{F}_t)_{t \geq 0} is generated by the observations up to time t, with \mathcal{F}_t = \sigma(X_0, \dots, X_t). A stopping time \tau is a random variable taking values in \{0, 1, 2, \dots, n\} for finite horizon n or \{0, 1, 2, \dots\} \cup \{\infty\} for infinite horizon, such that \{\tau \leq t\} \in \mathcal{F}_t for all t. If \tau = 0, the payoff is g(X_0).

The objective is to choose \tau to maximize the expected total reward, given by \sup_{\tau} E_x\left[ \sum_{k=1}^\tau r_k(X_{k-1}) + g(X_\tau) \right], where the sum is empty (equal to 0) if \tau = 0, r_k: E \to \mathbb{R} denotes the running reward received at step k (possibly time-dependent), and g: E \to \mathbb{R} is the terminal payoff function upon stopping. Here, E_x denotes expectation under the probability measure with X_0 = x, and the supremum is taken over all stopping times \tau. This setup captures sequential decision problems where continuing accrues a running reward (or cost, if negative) until stopping yields the terminal payoff. In many Markovian settings, the running rewards are state-dependent, r_k(X_{k-1}), to reflect the process dynamics.

For the finite horizon case with n steps, the problem is solved via backward induction. Define the value function V_k(x) as the supremum expected reward starting from state x at step k, for k = 0, 1, \dots, n. At the final step k = n, V_n(x) = g(x), and recursively for k = n-1, \dots, 0, V_k(x) = \max\left( g(x), \, r_{k+1}(x) + E[V_{k+1}(X_{k+1}) \mid X_k = x] \right). The optimal decision at step k is to stop if g(x) \geq r_{k+1}(x) + E[V_{k+1}(X_{k+1}) \mid X_k = x], yielding the optimal stopping time as the first k at which this inequality holds. This dynamic programming recursion satisfies the principle of optimality and computes the solution in O(n) steps per state under standard assumptions like bounded rewards.

In the infinite horizon case, to ensure convergence and avoid unbounded rewards, a discount factor \beta \in (0,1) is typically introduced, weighting future rewards. The value function V(x) satisfies the Bellman equation V(x) = \max\left( g(x), \, r(x) + \beta E[V(X_1) \mid X_0 = x] \right), where the supremum is over all stopping times \tau < \infty almost surely. Under contractivity of the operator induced by \beta and boundedness of g and r, successive approximations converge to the unique fixed point V, and an optimal stopping time exists. Under monotonicity assumptions, such as g being non-decreasing and the process X_t having non-decreasing conditional expectations, the optimal policy takes a threshold form: the continuation region \{x : V(x) > g(x)\} lies below a threshold b^*, and the optimal rule is \tau^* = \inf\{t \geq 0 : X_t \geq b^*\}, where b^* solves the free-boundary problem associated with the Bellman equation. This structure simplifies computation and holds for one-dimensional diffusions discretized in time or i.i.d. observations with likelihood ratios.
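The finite-horizon recursion translates directly into code. The following is a minimal sketch, assuming a symmetric ±1 random walk on a truncated grid, terminal payoff g(x) = x, no running reward, and a per-step discount factor; all of these modeling choices are illustrative, not part of the general theory.

```python
import numpy as np

# Minimal sketch of finite-horizon backward induction (assumptions:
# symmetric +/-1 random walk on a truncated grid, terminal payoff
# g(x) = x, no running reward, discount factor beta per step).
n, beta = 50, 0.95
grid = np.arange(-60, 61)
g = grid.astype(float)

def expect_next(V):
    # E[V(X_{k+1}) | X_k = x] for the walk, reflecting at the truncation
    up, down = np.roll(V, -1), np.roll(V, 1)
    up[-1], down[0] = V[-1], V[0]
    return 0.5 * (up + down)

V = g.copy()                          # V_n = g
for _ in range(n):
    cont = beta * expect_next(V)      # discounted continuation value
    V = np.maximum(g, cont)           # Bellman recursion V_k = max(g, cont)

stop_from = grid[g >= beta * expect_next(V)].min()
print("V(0) =", V[grid == 0][0], "; stopping is optimal for x >=", stop_from)
```

With discounting, a threshold-type stopping region emerges: continuation is worthwhile in low states, while stopping is optimal once the state is high enough that the discounted continuation value falls below g.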

Continuous Time Case

In the continuous time case, optimal stopping problems are formulated for stochastic processes evolving over uncountable time intervals, typically Markov processes such as diffusions, observed continuously. The underlying process X_t is defined on the time interval [0, \infty), and the stopping decision is made via a stopping time \tau adapted to the filtration \{\mathcal{F}_t\}_{t \geq 0} generated by the process up to time t. This setup allows for decisions based on the complete history of observations, contrasting with discrete-time analogs that rely on finite-step decisions.

The goal is to select the stopping time \tau that maximizes the expected discounted reward: \sup_{\tau} \mathbb{E}\left[ g(X_\tau) e^{-r\tau} + \int_0^\tau f(X_s) e^{-rs} \, ds \right], where r > 0 is the constant discount rate, g is the terminal payoff function received upon stopping, and f is the running payoff function accumulated continuously until \tau. This objective balances the immediate reward from stopping against the ongoing accumulation of discounted gains from continuation.

The solution is provided by the Snell envelope Z_t, which represents the conditional value of optimal future stopping: Z_t = \operatorname{ess\,sup}_{\tau \geq t} \mathbb{E}\left[ g(X_\tau) e^{-r(\tau - t)} + \int_t^\tau f(X_s) e^{-r(s - t)} \, ds \,\Big|\, \mathcal{F}_t \right]. As the smallest supermartingale that dominates the associated payoff process (defined by the immediate stopping reward g(X_t) and future running payoffs), Z_t characterizes the maximal expected reward obtainable from time t onward and serves as the value process of the problem.

The optimal stopping region is the set \{t : Z_t = g(X_t)\}, where immediate stopping is optimal because the value equals the terminal payoff, forgoing further running rewards. In the complementary continuation region \{t : Z_t > g(X_t)\}, it is beneficial to wait, as the expected future discounted rewards exceed the immediate payoff. An optimal stopping time is given by \tau^* = \inf\{t \geq 0 : Z_t = g(X_t)\}.

For diffusion processes, a martingale characterization underpins the optimality: under the optimal stopping strategy, the stopped process Z_{t \wedge \tau^*} is a martingale, meaning \mathbb{E}[Z_{t \wedge \tau^*} \mid \mathcal{F}_s] = Z_{s \wedge \tau^*} for 0 \leq s < t. This reflects that, in the continuation region, the infinitesimal generator of the diffusion applied to Z balances the running payoff f, ensuring no gain from early or delayed stopping, while at the boundary of the stopping region, the process touches the payoff without crossing.
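In practice the Snell envelope is often computed on a discretized approximation of the diffusion. The sketch below is a minimal illustration on a binomial tree approximating geometric Brownian motion with an American-put payoff; the tree construction and all parameter values are assumptions made for this example.

```python
import numpy as np

# Snell envelope by backward induction on a binomial approximation of
# geometric Brownian motion -- illustrative parameters/assumptions.
S0, K, r, sigma, T, N = 100.0, 100.0, 0.05, 0.2, 1.0, 500
dt = T / N
u = np.exp(sigma * np.sqrt(dt)); d = 1.0 / u
p = (np.exp(r * dt) - d) / (u - d)        # risk-neutral up-probability
disc = np.exp(-r * dt)

# terminal payoff g(S) = max(K - S, 0)
S = S0 * u ** np.arange(N, -1, -1) * d ** np.arange(0, N + 1)
Z = np.maximum(K - S, 0.0)                # Z_N = g(X_N)

for k in range(N - 1, -1, -1):            # backward through the tree
    S = S0 * u ** np.arange(k, -1, -1) * d ** np.arange(0, k + 1)
    cont = disc * (p * Z[:-1] + (1 - p) * Z[1:])
    # smallest dominating value: max of payoff and continuation
    Z = np.maximum(np.maximum(K - S, 0.0), cont)

print("Snell envelope at t=0 (American put value):", Z[0])
```

The value of Z at the root approximates the continuous-time value, and the nodes where the envelope equals the payoff trace out the exercise boundary.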

Solution Methods

Dynamic Programming

Dynamic programming provides a foundational framework for solving optimal stopping problems by decomposing them into recursive subproblems that build toward the global optimum. The principle of optimality, first articulated by Richard Bellman, asserts that the optimal value for a problem from any state can be obtained by optimally solving the subproblems arising from subsequent states, ensuring that solutions to smaller problems combine to yield the overall solution. This principle underpins the recursive structure of value functions in optimal stopping, where the decision to stop or continue at each stage depends on the expected future value.

In the discrete-time finite-horizon case, the value function V_k(x) with k stages remaining and state x is defined recursively as V_k(x) = \max\left\{ g(x), \, \mathbb{E}\left[ V_{k-1}(X') \mid X = x \right] \right\}, where g(x) represents the immediate gain upon stopping at state x, and the expectation accounts for the transition from x to the next state X'. This backward recursion starts from the terminal case V_0(x) = g(x) and proceeds iteratively to compute the optimal value and associated stopping policy for each horizon length k.

To solve these recursions numerically, especially in infinite-horizon settings, policy iteration algorithms are employed. These begin with an initial candidate stopping policy, evaluate its value function through repeated application of the Bellman operator until convergence, then improve the policy by updating the stopping set to include states where the gain exceeds the continuation value, and repeat until the policy stabilizes at a fixed point. Such iterations converge to the optimal policy under standard assumptions on the transition kernel and discount factor.

Computational challenges arise in high-dimensional state spaces due to the curse of dimensionality, where the number of grid points required for approximation grows exponentially with dimension, rendering exact recursion infeasible. Mitigation strategies leverage monotonicity properties of the value function, such as non-decreasing behavior in certain state components, to restrict the search over possible stopping boundaries and reduce the effective dimensionality during grid-based approximations.

For the discounted infinite-horizon case, optimality is established via the contraction mapping theorem in the Banach space of bounded continuous functions equipped with the sup norm. The Bellman operator, which maps a value function to the maximum of the gain and the discounted expected continuation value, is a contraction with modulus equal to the discount factor \gamma < 1, guaranteeing a unique fixed point that corresponds to the optimal value function; successive approximations from any initial function converge geometrically to this fixed point.
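A minimal sketch of value iteration exploiting this contraction property, on a small randomly generated Markov chain; the chain, payoff, and discount factor are illustrative assumptions.

```python
import numpy as np

# Value iteration for a discounted stopping problem on a small Markov
# chain -- a sketch; the chain P and payoff g below are assumptions.
rng = np.random.default_rng(0)
n, gamma = 20, 0.9
P = rng.random((n, n)); P /= P.sum(axis=1, keepdims=True)  # transition kernel
g = np.sort(rng.random(n))                                 # stopping payoff

V = np.zeros(n)
for it in range(1000):
    V_new = np.maximum(g, gamma * P @ V)    # Bellman operator (a gamma-contraction)
    if np.max(np.abs(V_new - V)) < 1e-10:   # geometric convergence in sup norm
        break
    V = V_new

stop_set = np.where(g >= gamma * P @ V)[0]  # states where stopping is optimal
print(f"converged after {it} iterations; stop in states {stop_set}")
```

Because the operator is a \gamma-contraction, the sup-norm error shrinks by a factor \gamma per sweep, so the iteration count scales like \log(1/\text{tol}) / \log(1/\gamma).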

Threshold and Boundary Strategies

In optimal stopping problems, particularly those involving sequential observations of random offers, the optimal policy frequently simplifies to a threshold strategy known as the reservation price. The reservation price \eta represents the minimum value at which an agent should accept an offer; in the standard discounted infinite-horizon model with i.i.d. offers, it satisfies the equation \eta = \beta \mathbb{E}[\max(X, \eta)], where \beta < 1 is the discount factor and X denotes an offer drawn from a known distribution. This fixed threshold emerges when offers are independent and identically distributed, ensuring that stopping occurs the first time an observed value exceeds \eta, thereby maximizing the expected payoff.

In non-stationary settings, such as finite-horizon problems where the number of remaining offers decreases over time, threshold strategies often exhibit monotonicity. Under the single crossing condition—where the gain from continuing (rather than stopping) crosses zero only once as a function of the state variable—the optimal thresholds are monotone in time; in selling-type problems the reservation price falls as the horizon shortens, reflecting heightened urgency to accept offers later in the process. Dynamic programming methods can be used to compute these evolving thresholds, confirming their optimality within the broader framework.

For continuous-time formulations involving diffusion processes, optimal stopping boundaries take the form of free boundaries that solve a variational inequality. Specifically, the optimal boundary b(t) is determined by the equation \min\left\{ r u - \partial_t u - \frac{1}{2} \sigma^2 \partial_{xx} u, \; u - g \right\} = 0, where u(t,x) is the value function, g(x) is the gain function, and r is the discount rate. Complementing this, the smooth pasting condition requires that the derivatives of u and g match at the boundary, ensuring a continuously differentiable transition between continuation and stopping regions. These free boundaries delineate the continuation region, where it is optimal to wait, from the stopping region, where immediate action maximizes value.

Threshold optimality in these problems arises under specific structural conditions on the payoff and continuation value functions. Convexity of the payoff function guarantees that the marginal benefit of higher states favors stopping beyond a single threshold, avoiding multiple crossings. Additionally, submodularity of the continuation value—implying diminishing marginal returns to waiting as the state improves—ensures that the optimal policy remains a simple threshold rule rather than a more complex structure. Together, these conditions facilitate the reduction of general problems to tractable boundary strategies, enhancing both theoretical analysis and practical implementation.
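The stationary reservation-price equation can be solved by iterating the defining map, which is itself a contraction. The sketch below assumes offers uniform on [0, 1], for which \mathbb{E}[\max(X, \eta)] = (1 + \eta^2)/2 in closed form; the distribution and discount factor are illustrative.

```python
import numpy as np

# Fixed-point iteration for eta = beta * E[max(X, eta)] with
# X ~ Uniform(0, 1) -- an illustrative assumption.
beta = 0.95
eta = 0.0
for _ in range(200):                       # the map is a contraction on [0, 1]
    eta = beta * (1 + eta**2) / 2          # E[max(X, eta)] = (1 + eta^2)/2
print("reservation price eta ~", round(eta, 4))  # accept the first offer >= eta

# Monte Carlo check of the fixed-point property eta = beta*E[max(X, eta)]
X = np.random.default_rng(1).random(1_000_000)
print("beta*E[max(X, eta)] ~", round(beta * np.maximum(X, eta).mean(), 4))
```

For this uniform case the fixed point has the closed form \eta = (1 - \sqrt{1 - \beta^2})/\beta, which the iteration recovers numerically.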

Advanced Topics

Jump Diffusion Processes

Jump diffusion processes extend the standard continuous-time diffusion frameworks for optimal stopping by incorporating abrupt discontinuities driven by a Poisson point process, capturing phenomena like sudden market shocks or rare events that cannot be modeled by smooth paths alone. The underlying state process X_t evolves according to the stochastic differential equation dX_t = \mu \, dt + \sigma \, dW_t + \int_{\mathbb{R}} z \, \tilde{N}(dt, dz), where W_t is a standard Brownian motion, \tilde{N}(dt, dz) is a compensated Poisson random measure with intensity \lambda \, \nu(dz), \mu is the drift, \sigma > 0 is the volatility, \lambda > 0 is the jump intensity, and \nu is the Lévy measure governing jump sizes z. This model, originally proposed by Merton for asset pricing under discontinuous returns, allows jumps to reflect rare but significant changes in the state variable, thereby complicating the optimal stopping decision as the process can instantaneously cross stopping boundaries.

In the context of optimal stopping, the problem seeks to maximize the expected discounted reward over adapted stopping times \tau, formulated as \sup_{\tau} \mathbb{E}^x \left[ e^{-r \tau} g(X_\tau) + \int_0^\tau e^{-r s} f(X_s) \, ds \right], where r > 0 is the risk-neutral discount rate, g denotes the terminal payoff function, and f the running payoff. The presence of jumps alters payoff expectations by introducing non-local effects, as a jump can shift the state from the continuation region directly into the stopping region, potentially increasing the value of waiting or exercising immediately compared to pure diffusion settings. This formulation generalizes classic problems like American option pricing, where early exercise becomes optimal due to jump-induced asymmetries in risk.

The value function u(x) satisfies the nonlinear integro-differential variational inequality \min \left\{ r u(x) - \mu u'(x) - \frac{1}{2} \sigma^2 u''(x) - \lambda \int_{\mathbb{R}} \left[ u(x + z) - u(x) \right] \nu(dz) - f(x), \, u(x) - g(x) \right\} = 0, where the integral term accounts for the expected change in value due to jumps, rendering the equation nonlocal and requiring solutions that handle both diffusive and jump components. Solutions are obtained through variational inequalities, where the continuation region is characterized by equality in the first term (with the generator balancing the discount and payoffs), and the stopping region by u(x) = g(x), subject to jump conditions at the free boundary to ensure continuity and prevent arbitrage from instantaneous jumps across it. These boundaries often exhibit reduced smoothness compared to diffusion cases, with numerical methods like finite differences adapted for the integral term to approximate the solution.

In financial applications, jump-diffusion models are particularly relevant for pricing options on assets subject to sudden spikes or news events, where optimal early exercise boundaries adjust to the jump risk—higher intensity \lambda typically lowers the critical price for exercise in puts, enhancing the early exercise premium. For instance, in Merton's model applied to equity options, jumps simulate crashes or earnings surprises, leading to variational solutions that quantify the impact of discontinuities on stopping strategies.
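Short of solving the integro-differential inequality, the effect of jumps can be explored by simulation. The sketch below estimates the value of a fixed-threshold exercise rule for a put under Merton-style jump-diffusion dynamics; the dynamics, parameters, and assumed boundary level are all illustrative, and the rule is not the solution of the variational inequality.

```python
import numpy as np

# Monte Carlo value of a fixed-threshold put-exercise rule under a
# Merton-style jump diffusion. All parameters, and the boundary b,
# are illustrative assumptions.
rng = np.random.default_rng(42)
mu, sigma, lam = 0.03, 0.2, 0.5           # drift, volatility, jump intensity
jmu, jsd = -0.10, 0.15                    # normal jump sizes in log-price
r, T, dt = 0.05, 1.0, 1 / 252
K, S0, b = 100.0, 100.0, 85.0             # strike, spot, assumed exercise boundary

n_paths, n_steps = 20_000, int(T / dt)
logS = np.full(n_paths, np.log(S0))
payoff = np.zeros(n_paths)
alive = np.ones(n_paths, dtype=bool)

for k in range(1, n_steps + 1):
    dW = rng.normal(0.0, np.sqrt(dt), n_paths)
    # Bernoulli thinning: at most one jump per small step
    jump = np.where(rng.random(n_paths) < lam * dt,
                    rng.normal(jmu, jsd, n_paths), 0.0)
    logS += (mu - 0.5 * sigma**2) * dt + sigma * dW + jump
    S = np.exp(logS)
    hit = alive & (S <= b)                # jumps can overshoot the boundary
    payoff[hit] = np.exp(-r * k * dt) * (K - S[hit])
    alive &= ~hit

payoff[alive] = np.exp(-r * T) * np.maximum(K - np.exp(logS[alive]), 0.0)
print("value of the threshold rule:", payoff.mean())
```

Sweeping the assumed boundary b over a grid of candidate levels gives a crude picture of how jump intensity shifts the best threshold, mirroring the boundary adjustments discussed above.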

Recent Extensions in Stochastic Frameworks

Recent theoretical advances in optimal stopping have extended classical frameworks to handle model uncertainty and nonlinear expectations, particularly through the G-expectation introduced by Shige Peng in the mid-2000s. This sublinear expectation, denoted \hat{\mathbb{E}}[\cdot], addresses ambiguity in underlying probability measures by incorporating volatility uncertainty via G-Brownian motion, a nonlinear extension of standard Brownian motion. In this setting, the optimal stopping problem is formulated as \sup_{\tau} \hat{\mathbb{E}}[g(X_{\tau})], where X is the state process driven by G-Brownian motion, g is the reward function, and \tau is a stopping time. Dynamic programming principles adapt to this framework, yielding a nonlinear Hamilton-Jacobi-Bellman equation whose solution determines the value function and optimal strategy, enabling robust decision-making under epistemic uncertainty about drift or diffusion parameters.

Extensions to random horizon problems further generalize these stochastic settings by incorporating an exogenous random termination time \sigma, leading to stopping times of the form \tau \wedge \sigma. Here, the value function is derived using optional sampling theorems under changed probability measures, accounting for the joint filtration generated by the public information flow and private signals available to the decision-maker. This approach resolves the informational asymmetry in multi-agent or partially observable environments, ensuring the optimal strategy maximizes expected reward while respecting the random endpoint, as analyzed via progressive enlargement of filtrations. Such formulations prove particularly useful in applications where external events impose abrupt horizons.

A 2025 development introduces predicted prior models, which extend the secretary problem to scenarios with partial or erroneous information on value distributions. In this Bayesian framework, the decision-maker updates beliefs about the underlying distribution before observing candidates, using a "predicted prior" that may deviate from the true one, unlike classical no-information assumptions. The optimal policy is computed via dynamic programming over the posterior, balancing reliance on the predicted prior against observed data, and achieving higher success probabilities than naive priors in simulations with misspecified models. This bridges traditional optimal stopping with robust learning-augmented algorithms, addressing real-world scenarios like hiring under incomplete market intelligence.

Numerical methods for stochastic hybrid systems, combining continuous states with discrete jumps, have seen 2024 advancements through finite difference schemes tailored to these mixed dynamics. These schemes discretize the infinitesimal generator of the hybrid process—incorporating diffusion, jumps, and regime switches—into a Markov chain approximation, solving the associated variational inequality for the value function with high accuracy and convergence rates of order O(h) in spatial step h. Applied to quickest detection or control problems, they handle non-smooth boundaries efficiently via upwind differencing, offering computational speed advantages in multidimensional cases.

Connections to robust control emphasize worst-case optimal stopping under model ambiguity, where the decision-maker minimizes maximum regret over a set of plausible models. This formulation, often via Girsanov transformations or penalty methods, yields time-consistent strategies that hedge against adversarial drifts or jumps, extending linear expectations to inf-sup problems like \inf_{Q \in \mathcal{Q}} \sup_{\tau} \mathbb{E}^Q[g(X_{\tau})] over ambiguity sets \mathcal{Q}. Such robust approaches, building on precursors like jump-diffusion models, provide safeguards in uncertain environments such as portfolio liquidation.

Classic Examples

Secretary Problem

The secretary problem models a scenario where a decision maker must select the best option from a finite set of alternatives presented sequentially in random order, without the ability to recall previous options. There are n candidates for a single position, arriving one at a time, each with an unknown quality that is revealed only through relative comparisons to those interviewed so far. The decision maker observes the relative rank of the current candidate compared to all predecessors and must immediately decide whether to select them or proceed to the next, aiming to maximize the probability of hiring the overall best candidate. This setup, first popularized in the early 1960s, exemplifies the tension in optimal stopping between gathering information and committing to a choice.

The optimal strategy is to interview and reject the first r candidates, using them solely to establish a benchmark of the best among them, and then select the first subsequent candidate who outperforms this benchmark. The value of r is chosen to maximize the success probability, and for large n, r \approx n/e, where e \approx 2.718 is the base of the natural logarithm. Under this threshold strategy, the probability of selecting the best candidate is p(r,n) = \frac{r}{n} \sum_{k=r+1}^{n} \frac{1}{k-1}, which is simple to compute but requires evaluating over possible r to find the optimum. As n \to \infty, the maximal p(r,n) converges to 1/e \approx 0.367879, establishing an asymptotic performance bound for the no-information case where only relative ranks are available. This result, derived using Markov stopping time theory, highlights the strategy's efficiency despite the lack of absolute quality measures.

A key variation is the full-information case, where each candidate's absolute quality is drawn independently from a known continuous distribution, allowing the decision maker to observe exact values rather than just relative ranks. Here, the optimal strategy employs a sequence of decreasing thresholds: reject early candidates below a high initial threshold and accept the first later candidate exceeding the corresponding time-dependent threshold, calibrated to balance the risk of missing the best. Gilbert and Mosteller analyzed this setting, showing that the asymptotic success probability rises to approximately 0.5802, well above the no-information bound of 1/e, with the exact thresholds depending on the distribution (e.g., the uniform distribution on [0,1] yields specific reservation values solvable via dynamic programming). This extension provides insight into scenarios with quantifiable utilities, such as auction bidding or investment timing.

Multiple-choice extensions generalize the problem to selecting up to k > 1 candidates, often without recall and aiming to maximize the expected number of top-k selections or a similar objective. In the no-information version, the strategy involves multiple thresholds or a phased approach: skip an initial fraction for sampling, then select candidates outperforming evolving benchmarks until k choices are made. Gnedin established foundational results, demonstrating that for large n, the probability of securing all top-k candidates decays factorially but can be optimized with success rates scaling as (k!/e^k) \ln^k n / n^{k-1} in certain formulations; more recent work refines these for constraints or weighted selections. These variants apply to problems such as hiring teams.
The intuition behind the secretary problem's optimal rule lies in trading off exploration—skipping early candidates to learn the quality distribution—against exploitation—stopping at a promising later option informed by that sample. For finite n, exact computation via backward induction confirms the threshold's optimality, but the 1/e limit underscores a universal trade-off in sequential choice under uncertainty, independent of specific distributions in the asymptotic regime.
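A minimal sketch evaluating the success probability p(r, n) exactly and checking the threshold rule by simulation; the helper function names are illustrative.

```python
import numpy as np

def p_success(r, n):
    """Exact probability of picking the best when the first r are rejected."""
    return (r / n) * sum(1.0 / (k - 1) for k in range(r + 1, n + 1))

n = 100
best_r = max(range(1, n), key=lambda r: p_success(r, n))
print("optimal r =", best_r, "(n/e ~", round(n / np.e, 1), "), p =",
      round(p_success(best_r, n), 4))

rng = np.random.default_rng(0)            # simulation check of the rule
wins, trials = 0, 20_000
for _ in range(trials):
    vals = rng.permutation(n)             # candidate qualities in arrival order
    benchmark = vals[:best_r].max()       # best of the rejected sample
    rest = vals[best_r:]
    better = np.nonzero(rest > benchmark)[0]
    chosen = rest[better[0]] if better.size else rest[-1]  # forced last pick
    wins += chosen == n - 1               # n-1 is the best quality
print("simulated success rate:", wins / trials)
```

For n = 100 the optimal cutoff is r = 37 with success probability about 0.371, already close to the asymptotic 1/e.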

House Selling Problem

The house selling problem is a classic example of an optimal stopping problem in discrete time, where a seller receives a sequence of independent and identically distributed (i.i.d.) offers Y_i drawn from a known distribution F with density f, and must decide whether to accept the current offer or continue searching, without recall of previous offers. The process typically unfolds over a finite horizon of T periods, after which the seller must accept the final offer if not stopped earlier. The optimal strategy is a reservation-price rule: accept the first offer Y_t \geq \eta_t at time t, where the reservation prices \eta_t are decreasing in t (i.e., \eta_1 > \eta_2 > \cdots > \eta_T), reflecting the diminishing value of future search opportunities as the horizon approaches. This setup maximizes the expected payoff, and the thresholds are derived via backward induction using dynamic programming.

The reservation price \eta_t equals the expected payoff from continuing optimally, satisfying the backward recursion \eta_t = \mathbb{E}[\max(Y_{t+1}, \eta_{t+1})] = \int_{\eta_{t+1}}^\infty y \, dF(y) + \eta_{t+1} F(\eta_{t+1}), with terminal condition \eta_T = 0 (the final offer must be accepted), so that \eta_{T-1} = \mathbb{E}[Y_T]; the recursion expresses indifference at time t between accepting \eta_t and continuing to the next period. In the finite-horizon case without search costs, the decreasing thresholds capture the seller's willingness to lower standards over time to avoid settling for a suboptimal final offer.

For the infinite-horizon case without costs or discounting, the reservation price \eta would solve \eta = \mathbb{E}[\max(Y, \eta)] = \int_\eta^\infty y \, dF(y) + \eta F(\eta), which simplifies to \eta = \mathbb{E}[Y \mid Y \geq \eta]. This equation is degenerate for many distributions: for the uniform distribution on [0, 1], \eta = 1 (implying waiting indefinitely due to bounded support), and for the exponential distribution with rate \lambda > 0, the memoryless property gives \mathbb{E}[Y \mid Y \geq \eta] = \eta + 1/\lambda > \eta, so no finite threshold exists and costless search would continue indefinitely.

When search costs c > 0 per period are introduced to obtain the next offer, the payoff adjusts to account for cumulative costs, yielding a net expected return of \mathbb{E}[Y_N - c N], where N is the stopping time. In the infinite-horizon case with costs but no discounting, the optimal reservation price \eta satisfies \int_\eta^\infty (y - \eta) \, dF(y) = c, balancing the expected gain from a better future offer against the immediate cost of searching. For the uniform distribution on [0, 1], this yields \eta = 1 - \sqrt{2c} when c \leq 1/2 (for larger c it is optimal to accept the first offer); for the exponential distribution with rate \lambda, \eta = -\frac{1}{\lambda} \ln(\lambda c) when c < 1/\lambda. These costs lower the reservation price relative to the no-cost case, as the seller trades off potential gains against ongoing expenses.

The choice of reservation price is highly sensitive to the tail behavior of F: distributions with heavier tails (e.g., lognormal or Pareto) lead to higher \eta values, as the potential for exceptionally large offers justifies prolonged searching despite costs, whereas light-tailed distributions (e.g., uniform) result in lower thresholds and quicker stopping to mitigate risk. This sensitivity underscores the problem's reliance on threshold strategies, where the seller accepts offers exceeding a value that equates the immediate payoff to the expected future value net of costs. Seminal analyses, including those incorporating such distributional effects, emphasize that optimal thresholds preserve monotonicity and converge appropriately in infinite horizons.
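A minimal sketch computing the finite-horizon reservation prices for uniform offers via the recursion above, together with the infinite-horizon threshold under a search cost; the uniform distribution, horizon, and cost value are illustrative assumptions.

```python
import math

# Reservation prices for Y ~ Uniform(0, 1) (illustrative assumption):
# eta_t = E[max(Y, eta_{t+1})] = (1 + eta_{t+1}**2) / 2, with eta_T = 0.
T = 10
eta = [0.0] * (T + 1)                     # eta[T] = 0: last offer is forced
for t in range(T - 1, 0, -1):
    eta[t] = (1 + eta[t + 1] ** 2) / 2    # closed form of E[max(Y, eta)]
print("thresholds eta_1 ... eta_{T-1}:", [round(e, 3) for e in eta[1:T]])

c = 0.02                                  # per-period search cost, infinite horizon
print("cost-based threshold 1 - sqrt(2c) =", round(1 - math.sqrt(2 * c), 4))
```

The printed thresholds decrease toward \eta_{T-1} = \mathbb{E}[Y] = 0.5, matching the monotonicity described above, while the cost-based infinite-horizon threshold sits strictly below 1.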

Coin Tossing

The coin tossing problem provides a simple yet profound illustration of optimal stopping, involving repeated independent trials with a binary outcome and a payoff based on the observed sequence. In the classic formulation, a fair coin is tossed sequentially, and the decision maker can stop at any time n \geq 1, receiving a payoff equal to the proportion of heads S_n / n, where S_n denotes the number of heads in the first n tosses. The objective is to select a stopping time \tau that maximizes the expected payoff \mathbb{E}[S_\tau / \tau]. This setup highlights the tension between exploiting current information and the potential for future improvement, with the waiting time until stopping exhibiting geometric-like properties due to the memoryless nature of the coin tosses.

The optimal stopping rule involves thresholds on the current proportion of heads, stopping when S_n / n exceeds a decreasing boundary that accounts for the diminishing opportunities for large deviations as n grows, as suggested by the law of large numbers. While the precise boundary remains unsolved in closed form, Chow and Robbins, who introduced the problem, showed that practical strategies—stop if the number of heads surpasses the number of tails by an amount calibrated to the total tosses—achieve an expected payoff exceeding 0.79. For instance, one stops immediately on the first head (payoff 1), but continues after an initial tail to await a compensating head streak, balancing the risk of dilution against the chance of boosting the average.

A representative exact calculation considers the strategy of stopping on the first head, regardless of preceding tails. Here, \tau = k with probability (1/2)^k for k \geq 1 (a geometric waiting time with success probability 1/2), yielding S_k = 1 and payoff 1/k. The expected reward is thus \sum_{k=1}^\infty (1/2)^k \cdot (1/k) = -\ln(1 - 1/2) = \ln 2 \approx 0.693, surpassing the long-run average of 0.5. More generally, the path with exactly r initial tails followed by a head has probability (1/2)^{r+1} and payoff 1/(r+1), contributing (1/2)^{r+1}/(r+1) to the expectation; the full computation aggregates such contributions over all admissible paths.

Intuition often suggests stopping whenever the current proportion exceeds 0.5 to lock in gains, yet optimal strategies sometimes continue below this level to capitalize on potential head streaks, leading to a seeming paradox: how can one outperform the fair-game average? The resolution is that the proportion M_n = S_n / n is not itself a martingale; the centered count S_n - n/2 is. The optional stopping theorem applied to the centered count gives \mathbb{E}[S_\tau] = \mathbb{E}[\tau]/2 under conditions like bounded \tau or uniform integrability, but the ratio S_\tau / \tau is a nonlinear functional of the stopped path, and adaptive rules correlate early stopping with favorable proportions, allowing \mathbb{E}[S_\tau / \tau] > 0.5 without violating fairness—intuition overlooks that the fair-game constraint binds the count, not the proportion.

This problem generalizes to a biased coin with success probability p \neq 0.5, where the centered martingale becomes S_n - np, and optimal thresholds adjust accordingly: for p > 0.5, one can achieve expectations exceeding p by similar deviation-exploiting rules, while the core geometric structure persists in the waiting times between potential stopping points.
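A minimal sketch verifying the \ln 2 value of the stop-at-first-head rule and estimating a simple lead-based rule by simulation; the lead-of-one boundary is an assumed heuristic, not the optimal Chow-Robbins boundary.

```python
import math, random

# Exact value of the "stop at the first head" rule:
# sum over k of (1/2)^k * (1/k) = ln 2.
exact = sum(0.5 ** k / k for k in range(1, 200))
print("stop-at-first-head value:", round(exact, 6), "~ ln 2 =", round(math.log(2), 6))

# Simulated value of an assumed heuristic: stop once heads lead tails by 1.
random.seed(0)
total, trials, cap = 0.0, 20_000, 10_000   # cap truncates rare long paths
for _ in range(trials):
    heads = n = 0
    while n < cap:
        n += 1
        heads += random.random() < 0.5
        if 2 * heads - n >= 1:             # heads - tails >= 1
            break
    total += heads / n
print("lead-by-one rule value:", round(total / trials, 4))  # well above 0.5
```

The lead-by-one rule already beats both 0.5 and ln 2, illustrating how boundary rules calibrated to the lead, rather than the raw proportion, approach the Chow-Robbins value.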

Broader Applications

Search and Parking Problems

Search theory addresses decision-making under uncertainty in sequential observations, particularly in labor markets where individuals evaluate job offers while incurring search costs. In the classic job search model, an unemployed worker receives wage offers drawn from a known distribution and decides whether to accept or continue searching, balancing the immediate wage against the expected value of future opportunities. The optimal strategy involves a reservation wage, below which offers are rejected, ensuring the marginal benefit of continued search equals the cost. Seminal work by McCall established this framework, showing that the reservation wage rises with unemployment benefits and offer arrival rates but falls with search costs and impatience (discount rates).

In continuous-time formulations, job offers arrive as a Poisson process with rate \lambda, search incurs a cost c per unit time, and outcomes are discounted at rate r. The reservation wage \eta equates the value of employment at \eta to the value of unemployment, yielding \eta = b - c + \frac{\lambda}{r} \int_{\eta}^{\infty} (w - \eta) \, dF(w), where b is the baseline utility; higher search costs c lower \eta, reflecting the desire to accept offers sooner to avoid prolonged costs.

The parking problem, introduced in the mid-20th century and formalized as an optimal stopping task, models a driver seeking a spot near a destination along a one-dimensional path, such as a street, where potential spots appear randomly. In Tamaki's 1982 analysis, spots emerge via a Poisson process, and the destination has a prior distribution; the goal is to minimize expected walking distance, starting from an initial position. The driver observes spots sequentially and must decide irrevocably whether to stop, incurring a cost equal to the walking distance if parking succeeds, or a penalty if overshooting. The optimal strategy employs a threshold policy: at remaining distance x to the destination, park if the spot is sufficiently close, with the threshold determined by the prior on the destination and the arrival parameters. This arises from dynamic programming, where the value function represents the minimal expected cost and satisfies a recursion balancing immediate cost against expected future costs.

This setup connects to broader optimal stopping theory, particularly infinite-horizon variants of the secretary problem incorporating sampling costs, where sequential relative ranks are evaluated amid ongoing search expenses, akin to unbounded search with arrival costs c and discounting at rate r. In such extensions, the optimal threshold mirrors the reservation wage, trading off immediate acceptance against probabilistic future gains, with success probabilities converging to 1 - 1/e in costless limits but adjusted downward by costs.
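A minimal sketch solving the continuous-time reservation-wage equation above by bisection, assuming Uniform(0, 1) wage offers; all parameter values are illustrative assumptions.

```python
# Reservation wage for the continuous-time McCall model with
# Uniform(0, 1) offers (an illustrative assumption), where
# integral_eta^1 (w - eta) dw = (1 - eta)^2 / 2:
#   eta = b - c + (lam / r) * (1 - eta)**2 / 2
b, c, lam, r = 0.1, 0.05, 1.0, 0.1

def excess(eta):
    # positive when the continuation value exceeds eta
    return b - c + (lam / r) * (1 - eta) ** 2 / 2 - eta

lo, hi = 0.0, 1.0
for _ in range(60):                  # bisection: excess() is decreasing in eta
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if excess(mid) > 0 else (lo, mid)
print("reservation wage:", round((lo + hi) / 2, 4))
```

Raising c or r lowers the computed reservation wage, while raising \lambda or b raises it, matching the comparative statics of the McCall model.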

Financial and Trading Models

Optimal stopping problems are central to mathematical finance, particularly in derivative pricing and timing decisions under uncertainty. In this setting, the problems often involve deciding when to exercise an option or liquidate an asset to maximize expected discounted payoff, where the underlying asset price follows a stochastic process such as geometric Brownian motion. The framework balances the trade-off between waiting for potentially higher rewards and the costs of discounting or risk exposure. Seminal contributions, such as those linking option pricing to optimal stopping, have provided analytical and numerical tools for these decisions.

A prominent application is the pricing of American put options, which allow early exercise unlike their European counterparts. In the stationary (perpetual) case, the value function u(S) of an American put with strike K, risk-free rate r, and volatility \sigma satisfies the variational inequality \max\left\{ \frac{1}{2} \sigma^2 S^2 \frac{\partial^2 u}{\partial S^2} + r S \frac{\partial u}{\partial S} - r u, \; (K - S) - u \right\} = 0. The early exercise boundary determines the stock-price levels below which immediate exercise is optimal, solving the free-boundary problem with value-matching and smooth-pasting conditions. In the finite-maturity case the solution yields a decreasing boundary function of time to expiry, reflecting that exercise becomes more likely as the price falls. This formulation, established in early optimal stopping theory for diffusions, enables closed-form approximations for perpetual options and numerical solutions for finite maturities.

In trading timing models, optimal stopping addresses when to sell an asset whose price S_t evolves as a geometric Brownian motion dS_t = \mu S_t dt + \sigma S_t dW_t. The objective is to maximize the expected discounted sale price \sup_\tau \mathbb{E}[e^{-r\tau} S_\tau], where \tau is the stopping time. For infinite horizons, the optimal strategy involves selling upon hitting a constant upper boundary b^*, derived from solving the associated Hamilton-Jacobi-Bellman equation. This boundary exceeds the initial price if \mu > r, incentivizing waiting, but equals it otherwise, so that immediate sale avoids the unfavorable drift. Such models guide real-time trading decisions in volatile markets, emphasizing the role of drift, volatility, and discounting in threshold determination.

Extensions to jump-diffusion processes incorporate sudden price moves, which can alter exercise boundaries in option pricing. In models like Merton's jump-diffusion, where asset returns include Poisson-driven jumps, jump risk increases the value of waiting for American puts, shifting the early exercise boundary downward compared to pure diffusion cases. This effect arises because jumps introduce skewness and heavier tails, making continuation more attractive during high-volatility regimes to capture potential rebounds. Optimal stopping boundaries in these settings are solved via integro-differential equations or variational inequalities that account for the jump intensity and size distribution, highlighting how discontinuities amplify the timing decision.

For practical implementation, especially in high-dimensional problems with multiple underlyings, numerical methods like the least-squares Monte Carlo (LSMC) algorithm approximate American option prices. Developed by Longstaff and Schwartz, LSMC simulates paths of the asset process and uses cross-sectional least-squares regression to estimate continuation values at each time step, determining exercise decisions by comparing them to intrinsic payoffs. This approach excels in multi-asset scenarios where closed-form solutions are infeasible, achieving convergence with thousands of paths and polynomial basis functions for the regression. It has become a standard tool for American and path-dependent options, balancing computational cost with accuracy in dimensions beyond three.
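A compact sketch of the Longstaff-Schwartz procedure for a single-asset American put; the GBM parameters and the quadratic polynomial basis are illustrative assumptions.

```python
import numpy as np

# Least-squares Monte Carlo (Longstaff-Schwartz) for an American put
# under GBM -- parameters and basis are illustrative assumptions.
rng = np.random.default_rng(7)
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
n_paths, n_steps = 50_000, 50
dt = T / n_steps
disc = np.exp(-r * dt)

# simulate risk-neutral GBM paths (columns are time steps dt .. T)
Z = rng.standard_normal((n_paths, n_steps))
S = S0 * np.exp(np.cumsum((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z, axis=1))

payoff = np.maximum(K - S[:, -1], 0.0)         # cash flow if held to maturity
for t in range(n_steps - 2, -1, -1):
    payoff *= disc                             # discount one step back
    itm = K - S[:, t] > 0                      # regress only in-the-money paths
    x = S[itm, t]
    basis = np.column_stack([np.ones_like(x), x, x**2])  # quadratic basis
    coef, *_ = np.linalg.lstsq(basis, payoff[itm], rcond=None)
    cont = basis @ coef                        # estimated continuation value
    exercise = (K - x) > cont                  # exercise where intrinsic wins
    idx = np.where(itm)[0][exercise]
    payoff[idx] = K - S[idx, t]                # replace path cash flow

print("LSMC American put estimate:", disc * payoff.mean())
```

Restricting the regression to in-the-money paths, as in the original algorithm, improves the continuation-value fit exactly where the exercise decision matters.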
In real-world fund management, optimal stopping informs liquidation strategies amid fee structures and market frictions. Investors face decisions on when to withdraw capital from funds with first-loss or shared-loss fees, modeled as stopping times that maximize net proceeds under uncertain fund performance. For instance, in a diffusion-based fund value process, the optimal liquidation boundary depends on fee parameters, often leading to earlier exits under high loss-sharing to mitigate downside exposure. These models, analyzed via free-boundary problems, reveal that shared fees can delay liquidation by aligning interests, providing insights for fund design and withdrawal timing in illiquid markets.

Machine Learning Contexts

In neural architecture search (NAS), optimal stopping principles are applied to determine when to halt the evaluation of candidate architectures during search, thereby balancing computational efficiency with performance gains. Random search in NAS has been shown to be competitive with sophisticated gradient-based or evolutionary methods, particularly when paired with well-engineered search spaces that limit exploration to promising architectures. For instance, variants of the classical secretary problem inform these strategies by suggesting thresholds for terminating search after evaluating approximately 37% of the space, achieving acceptable architectures in benchmarks involving thousands of runs. Recent analyses highlight that such approaches reduce the need for exhaustive training, with incomplete evaluations enhancing overall efficiency in resource-constrained settings.

The optimal stopping framework further refines the choice of stopping epochs by quantifying the expected benefit of additional evaluations against stopping costs, often leading to "good enough" thresholds that cut exploration to 15-20% of the space while maintaining around 80% success rates in identifying high-performing models. This is particularly relevant amid escalating training costs for frontier models, which have risen at a rate of 2.4 times per year since 2016, projected to exceed $1 billion per model by 2027, underscoring the practical impact of stopping rules in mitigating exponential compute demands. Studies from 2024 demonstrate that these stopping-informed random searches not only match but sometimes outperform more elaborate methods in terms of architecture quality per compute unit, especially for large-scale image classification tasks.

In reinforcement learning (RL), optimal stopping manifests in Markov decision processes (MDPs) as decisions to terminate exploration phases, such as in \epsilon-greedy policies where agents balance gathering new information against exploiting known rewards. Deep reinforcement learning adapts optimal stopping by approximating the Q-function—representing expected future rewards—with neural networks, enabling scalable solutions to stopping problems traditionally solved via dynamic programming. Here, stopping actions yield immediate rewards (e.g., payoffs upon termination), while continuation defers them, with the \epsilon-greedy strategy facilitating exploration by selecting random actions with probability \epsilon, which decays over episodes to converge toward optimal thresholds. This formulation connects directly to classic stopping rewards in Q-learning updates, where the agent learns to stop when the Q-value of continuation falls below the stopping payoff, as seen in applications like option exercise modeled as episodic MDPs. In optimal stopping tasks like the secretary problem with machine-learned predictions, theoretical analyses show improved competitive ratios beyond the classical 1/e ≈ 0.368.

Computational challenges arise in high-dimensional settings, such as multi-armed bandits, where optimal stopping requires approximating value functions across vast state spaces plagued by the curse of dimensionality. Deep-learning-based dynamic programming addresses this by using neural networks to learn stopping boundaries, enabling solutions to problems with dozens of dimensions that traditional methods cannot handle due to exponential complexity. In bandit contexts, stopping times are defined with respect to the filtration of observed rewards, allowing agents to terminate arms when posterior reward estimates indicate no further value, with approximations reducing complexity in non-stationary environments.
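A minimal sketch of the Q-learning formulation described above, on a toy stopping MDP where the state is a bounded random walk and the "stop" action pays the current state; the environment, reward, and hyperparameters are all illustrative assumptions.

```python
import random

# Tabular Q-learning on a toy stopping MDP: states 0..N follow a random
# walk; "stop" pays the state value and ends the episode, "cont" pays
# nothing and moves the walk. Everything here is an assumed toy setup.
random.seed(0)
N, gamma, alpha = 20, 0.95, 0.1
Q = {(s, a): 0.0 for s in range(N + 1) for a in ("stop", "cont")}

for episode in range(20_000):
    s = N // 2
    eps = max(0.05, 1.0 - episode / 10_000)        # decaying exploration rate
    while True:
        a = random.choice(("stop", "cont")) if random.random() < eps \
            else max(("stop", "cont"), key=lambda act: Q[(s, act)])
        if a == "stop":
            Q[(s, a)] += alpha * (s - Q[(s, a)])           # terminal reward = s
            break
        s2 = min(N, max(0, s + random.choice((-1, 1))))    # walk one step
        target = gamma * max(Q[(s2, "stop")], Q[(s2, "cont")])
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2

stop_states = [s for s in range(N + 1) if Q[(s, "stop")] > Q[(s, "cont")]]
print("states where stopping looks optimal:", stop_states)
```

With discounting, the learned policy concentrates stopping in high states, recovering a threshold rule from experience rather than from an explicit model, which is the essence of the RL approach to stopping.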
Recent advancements incorporate predicted priors into optimal stopping for hiring models, extending the secretary framework to scenarios where machine-learned predictions of candidate distributions inform thresholds, achieving bi-criteria guarantees for both consistency (exploiting accurate priors) and robustness (handling prediction errors). A 2025 analysis proposes algorithms that balance these objectives in hiring simulations, improving selection probabilities over no-prior baselines while maintaining worst-case prophet-like bounds.
