
Optimal stopping

Optimal stopping is a branch of mathematics, primarily within probability theory and stochastic processes, that addresses the problem of determining the optimal time to cease observation of a sequential process and take an action, in order to maximize the expected reward or minimize the expected cost based on the information gathered up to that point. This involves formulating a stopping rule—a decision rule adapted to the evolving information—that balances the trade-off between continuing to observe for potentially better outcomes and the risk of missing an acceptable one due to costs like time or opportunity. The theory formalizes these decisions using concepts such as stopping times, which are random variables indicating when to halt, and relies on tools from martingale theory and dynamic programming to compute the value function representing the maximal expected gain.

The origins of optimal stopping trace back to early work in sequential analysis during the 1940s, notably Abraham Wald's development of the sequential probability ratio test for efficient hypothesis testing in quality control and military applications during World War II. Building on this, foundational contributions in the 1950s and 1960s by researchers like J. L. Snell, Y. S. Chow, and H. Robbins established the modern framework, emphasizing backward induction for finite-horizon problems and essential suprema for infinite ones. A seminal text, Great Expectations: The Theory of Optimal Stopping by Chow, Robbins, and Siegmund (1971), synthesized these advances and highlighted the role of the Snell envelope—the smallest supermartingale dominating the payoff process, which serves as the optimal value process. Earlier roots can be found in gambling problems from the 16th century, such as those analyzed by Gerolamo Cardano, though rigorous probabilistic treatment awaited the axiomatic foundations laid by Andrey Kolmogorov in 1933.

Classic examples illustrate the theory's core principles. In the secretary problem (or marriage problem), one must select the best candidate from a sequence of N applicants interviewed in random order, without recall; the optimal strategy is to observe and reject roughly the first N/e ≈ 37% of candidates (where e is the base of the natural logarithm), then hire the next applicant who exceeds all prior ones, yielding a success probability that approaches 1/e ≈ 37% as N grows large. Another foundational case is the house-selling problem, where sequential offers arrive randomly, and the decision to accept balances the current bid against the expected value of waiting, often solved via threshold rules derived from the value function. These discrete-time problems extend to continuous settings, such as stopping a diffusion to maximize expected gain, using free-boundary problems where continuation and stopping regions are delineated by the infinitesimal generator of the process.

Applications of optimal stopping span diverse fields, underscoring its practical significance. In finance, it underpins the pricing of American options, where early exercise decisions maximize payoff under the Black-Scholes framework, contributing to the 1997 Nobel Memorial Prize in Economic Sciences awarded to Robert Merton and Myron Scholes. Economic models of job search employ it to determine reservation wages, minimizing unemployment duration while maximizing lifetime earnings. In medicine and engineering, it aids change-point detection, such as monitoring patient vitals or equipment failures to trigger interventions at the optimal moment, often incorporating costs for delayed detection. More broadly, the theory informs sequential decision-making under uncertainty, including the one-armed bandit problem for balancing exploration against exploitation and the timing of bids in sequential auctions.

Fundamentals

Definition

Optimal stopping refers to the mathematical framework for determining the optimal time to terminate a stochastic process or sequence of observations in order to maximize an expected reward or minimize an expected cost. This decision-making paradigm arises in uncertain environments where actions must be based on evolving information, such as sequentially arriving random variables, and the goal is to select a moment that balances immediate gains against potential future improvements. Formally, the problem involves choosing a stopping time \tau for an observation process X to optimize an objective like \sup_{\tau} \mathbb{E}[g(X_\tau)], where g denotes the reward function and the supremum is taken over admissible stopping times.

Central assumptions include the sequential revelation of information about the process, the irrevocability of the stopping decision once made, and the absence of recall, meaning previously passed opportunities cannot be revisited. These constraints model real-world scenarios where decisions are final and information unfolds over time without the option to undo choices.

Key terminology in optimal stopping includes the stopping time, a random variable \tau such that the event \{\tau \leq t\} is measurable with respect to the filtration generated by the process up to time t, ensuring the decision to stop depends only on information available at that instant. The reservation value represents the critical threshold for the process value above which stopping becomes optimal, often characterizing threshold-based strategies. Additionally, the indifference curve (or boundary) delineates the region in the state space where the expected value of stopping equals that of continuing, marking the transition between continuation and stopping sets in the solution.

A non-mathematical example illustrates the concept: consider a shopper searching for the best price on an item, weighing the effort and time of further inquiries against the risk of settling for a suboptimal deal, ultimately stopping when the current offer sufficiently outweighs anticipated improvements. Such problems are analyzed in both discrete and continuous time formulations, as explored in subsequent sections.

Historical Context

The roots of optimal stopping theory trace back to early gambling problems in the 16th century, such as those analyzed by Gerolamo Cardano in Liber de Ludo Aleae (written around 1564), and further developed through 18th-century advancements in probability, with influences from the calculus of variations formulated by Lagrange in the late 1700s. In the 1940s, wartime applications spurred connections to search theory, originating with the U.S. Navy's 1942 operations research group that included Bernard Koopman, whose 1946 synthesis Search and Screening optimized detection efforts under uncertainty, effectively framing when to halt searches based on probabilistic success.

The 1950s marked significant development with Abraham Wald's sequential analysis of 1947, extended by Herbert Robbins' 1952 papers on adaptive experimental designs and optimal stopping rules in statistical decision problems, which generalized stopping to non-statistical contexts via dynamic programming influences from Richard Bellman. Thomas S. Ferguson's subsequent work further formalized pure stopping problems, as seen in his 1967 contributions linking optimal stopping to broader sequential decision frameworks.

The 1960s saw key advancements, including the formalization of the secretary problem—traced to discussions by Merrill Flood in 1950 and popularized by Martin Gardner in 1960—with rigorous solutions by Dennis V. Lindley (1961), E. B. Dynkin (1963), and others like Chow, Moriguti, Robbins, and Samuels (1964), establishing asymptotic strategies for rank-based stopping. This era's momentum carried into the 1970s-1980s transition to continuous-time theory, driven by Albert Shiryaev's 1978 Optimal Stopping Rules and later collaborative works with Goran Peskir, which integrated martingale methods and free-boundary problems for continuous-time formulations, influencing quickest detection and prediction.

Post-2000 growth integrated optimal stopping with mathematical finance, building on 1970s option-pricing theory like Black-Scholes (1973), treating American options as stopping problems, and extended to machine learning through Gaussian-process methods for time-series stopping decisions (2022). In the 2020s, recent extensions address predicted priors in learning-augmented contexts, as in the 2025 formulation balancing consistency and robustness against prior errors, bridging classical theory to algorithmic predictions.

Mathematical Formulations

Discrete Time Case

In the discrete time case, optimal stopping problems are formulated over a countable set of time steps, typically indexed by non-negative integers, where decisions are made sequentially based on observed data. Consider a stochastic process (X_t)_{t=0}^\infty with state space E \subseteq \mathbb{R}^d, often assumed to be a time-homogeneous Markov chain for tractability, starting from an initial state X_0 = x \in E. The filtration (\mathcal{F}_t)_{t \geq 0} is generated by the observations up to time t, with \mathcal{F}_t = \sigma(X_0, \dots, X_t). A stopping time \tau is a random variable taking values in \{0, 1, 2, \dots, n\} for finite horizon n or \{0, 1, 2, \dots\} \cup \{\infty\} for infinite horizon, such that \{\tau \leq t\} \in \mathcal{F}_t for all t. If \tau = 0, the payoff is g(X_0).

The objective is to choose \tau to maximize the expected total reward, given by \sup_{\tau} E_x\left[ \sum_{k=1}^\tau r_k(X_{k-1}) + g(X_\tau) \right], where the sum is empty (equal to 0) if \tau = 0, r_k: E \to \mathbb{R} denotes the running reward received at step k (possibly time-dependent), and g: E \to \mathbb{R} is the terminal payoff function upon stopping. Here, E_x denotes expectation under the probability measure with X_0 = x, and the supremum is taken over all stopping times \tau. This setup captures sequential decision problems where continuing accrues a running reward (or cost, if negative) until stopping yields the terminal payoff. In many Markovian settings, the running rewards are state-dependent, r_k(X_{k-1}), to reflect the process dynamics.

For the finite horizon case with n steps, the problem is solved via backward induction. Define the value function V_k(x) as the supremum expected reward starting from state x at step k, for k = 0, 1, \dots, n. At the final step k = n, V_n(x) = g(x), and recursively for k = n-1, \dots, 0, V_k(x) = \max\left( g(x), \, r_{k+1}(x) + E[V_{k+1}(X_{k+1}) \mid X_k = x] \right). The optimal decision at step k is to stop if g(x) \geq r_{k+1}(x) + E[V_{k+1}(X_{k+1}) \mid X_k = x], yielding the optimal stopping time as the first k at which this inequality holds. This dynamic programming recursion satisfies the principle of optimality and computes the solution in O(n) steps per state under standard assumptions like bounded rewards.

In the infinite horizon case, to ensure convergence and avoid unbounded rewards, a discount factor \beta \in (0,1) is typically introduced, weighting future rewards. The value function V(x) satisfies the Bellman equation V(x) = \max\left( g(x), \, r(x) + \beta E[V(X_1) \mid X_0 = x] \right), where the supremum is over all stopping times \tau < \infty almost surely. Under contractivity of the operator induced by \beta and boundedness of g and r, successive approximations converge to the unique fixed point V, and an optimal stopping time exists. Under monotonicity assumptions, such as g being non-decreasing and the process X_t having non-decreasing conditional expectations, the optimal policy takes a threshold form: the continuation region \{x : V(x) > g(x)\} lies below a threshold b^*, and the optimal rule is \tau^* = \inf\{t \geq 0 : X_t \geq b^*\}, where b^* solves the free-boundary problem associated with the Bellman equation. This structure simplifies computation and holds for one-dimensional diffusions discretized in time or i.i.d. observations with likelihood ratios.
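The finite-horizon recursion translates directly into code. The following is a minimal sketch, assuming a symmetric ±1 random walk on a truncated grid, terminal payoff g(x) = x, no running reward, and a per-step discount factor; all of these modeling choices are illustrative, not part of the general theory.

```python
import numpy as np

# Minimal sketch of finite-horizon backward induction (assumptions:
# symmetric +/-1 random walk on a truncated grid, terminal payoff
# g(x) = x, no running reward, discount factor beta per step).
n, beta = 50, 0.95
grid = np.arange(-60, 61)
g = grid.astype(float)

def expect_next(V):
    # E[V(X_{k+1}) | X_k = x] for the walk, reflecting at the truncation
    up, down = np.roll(V, -1), np.roll(V, 1)
    up[-1], down[0] = V[-1], V[0]
    return 0.5 * (up + down)

V = g.copy()                          # V_n = g
for _ in range(n):
    cont = beta * expect_next(V)      # discounted continuation value
    V = np.maximum(g, cont)           # Bellman recursion V_k = max(g, cont)

stop_from = grid[g >= beta * expect_next(V)].min()
print("V(0) =", V[grid == 0][0], "; stopping is optimal for x >=", stop_from)
```

With discounting, a threshold-type stopping region emerges: continuation is worthwhile in low states, while stopping is optimal once the state is high enough that the discounted continuation value falls below g.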

Continuous Time Case

In the continuous time case, optimal stopping problems are formulated for stochastic processes evolving over uncountable time intervals, typically Markov processes such as diffusions, observed continuously. The underlying process X_t is defined on the time interval [0, \infty), and the stopping decision is made via a stopping time \tau adapted to the filtration \{\mathcal{F}_t\}_{t \geq 0} generated by the process up to time t. This setup allows for decisions based on the complete history of observations, contrasting with discrete-time analogs that rely on finite-step decisions.

The goal is to select the stopping time \tau that maximizes the expected discounted reward: \sup_{\tau} \mathbb{E}\left[ g(X_\tau) e^{-r\tau} + \int_0^\tau f(X_s) e^{-rs} \, ds \right], where r > 0 is the constant discount rate, g is the terminal payoff function received upon stopping, and f is the running payoff function accumulated continuously until \tau. This objective balances the immediate reward from stopping against the ongoing accumulation of discounted gains from continuation.

The solution is provided by the Snell envelope Z_t, which represents the conditional value of optimal future stopping: Z_t = \operatorname{ess\,sup}_{\tau \geq t} \mathbb{E}\left[ g(X_\tau) e^{-r(\tau - t)} + \int_t^\tau f(X_s) e^{-r(s - t)} \, ds \,\Big|\, \mathcal{F}_t \right]. As the smallest supermartingale that dominates the associated payoff process (defined by the immediate stopping reward g(X_t) and future running payoffs), Z_t characterizes the maximal expected reward obtainable from time t onward and serves as the value process of the problem.

The optimal stopping region is the set \{t : Z_t = g(X_t)\}, where immediate stopping is optimal because the value equals the terminal payoff, forgoing further running rewards. In the complementary continuation region \{t : Z_t > g(X_t)\}, it is beneficial to wait, as the expected future discounted rewards exceed the immediate payoff. An optimal stopping time is given by \tau^* = \inf\{t \geq 0 : Z_t = g(X_t)\}.

For diffusion processes, a martingale characterization underpins the optimality: under the optimal stopping strategy, the stopped process Z_{t \wedge \tau^*} is a martingale, meaning \mathbb{E}[Z_{t \wedge \tau^*} \mid \mathcal{F}_s] = Z_{s \wedge \tau^*} for 0 \leq s < t. This reflects that, in the continuation region, the infinitesimal generator of the diffusion applied to Z balances the running payoff f, ensuring no gain from early or delayed stopping, while at the boundary of the stopping region, the process touches the payoff without crossing.
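In practice the Snell envelope is often computed on a discretized approximation of the diffusion. The sketch below is a minimal illustration on a binomial tree approximating geometric Brownian motion with an American-put payoff; the tree construction and all parameter values are assumptions made for this example.

```python
import numpy as np

# Snell envelope by backward induction on a binomial approximation of
# geometric Brownian motion -- illustrative parameters/assumptions.
S0, K, r, sigma, T, N = 100.0, 100.0, 0.05, 0.2, 1.0, 500
dt = T / N
u = np.exp(sigma * np.sqrt(dt)); d = 1.0 / u
p = (np.exp(r * dt) - d) / (u - d)        # risk-neutral up-probability
disc = np.exp(-r * dt)

# terminal payoff g(S) = max(K - S, 0)
S = S0 * u ** np.arange(N, -1, -1) * d ** np.arange(0, N + 1)
Z = np.maximum(K - S, 0.0)                # Z_N = g(X_N)

for k in range(N - 1, -1, -1):            # backward through the tree
    S = S0 * u ** np.arange(k, -1, -1) * d ** np.arange(0, k + 1)
    cont = disc * (p * Z[:-1] + (1 - p) * Z[1:])
    # smallest dominating value: max of payoff and continuation
    Z = np.maximum(np.maximum(K - S, 0.0), cont)

print("Snell envelope at t=0 (American put value):", Z[0])
```

The value of Z at the root approximates the continuous-time value, and the nodes where the envelope equals the payoff trace out the exercise boundary.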

Solution Methods

Dynamic Programming

Dynamic programming provides a foundational framework for solving optimal stopping problems by decomposing them into recursive subproblems that build toward the global optimum. The principle of optimality, first articulated by Richard Bellman, asserts that the optimal value for a problem from any state can be obtained by optimally solving the subproblems arising from subsequent states, ensuring that solutions to smaller problems combine to yield the overall solution. This principle underpins the recursive structure of value functions in optimal stopping, where the decision to stop or continue at each stage depends on the expected future value.

In the discrete-time finite-horizon case, the value function V_k(x) with k stages remaining and state x is defined recursively as V_k(x) = \max\left\{ g(x), \, \mathbb{E}\left[ V_{k-1}(X') \mid X = x \right] \right\}, where g(x) represents the immediate gain upon stopping at state x, and the expectation accounts for the transition from x to the next state X'. This backward recursion starts from the terminal case V_0(x) = g(x) and proceeds iteratively to compute the optimal value and associated stopping policy for each horizon length k.

To solve these recursions numerically, especially in infinite-horizon settings, policy iteration algorithms are employed. These begin with an initial candidate stopping policy, evaluate its value function through repeated application of the Bellman operator until convergence, then improve the policy by updating the stopping set to include states where the gain exceeds the continuation value, and repeat until the policy stabilizes at a fixed point. Such iterations converge to the optimal policy under standard assumptions on the transition kernel and discount factor.

Computational challenges arise in high-dimensional state spaces due to the curse of dimensionality, where the number of grid points required for approximation grows exponentially with dimension, rendering exact recursion infeasible. Mitigation strategies leverage monotonicity properties of the value function, such as non-decreasing behavior in certain state components, to restrict the search over possible stopping boundaries and reduce the effective dimensionality during grid-based approximations.

For the discounted infinite-horizon case, optimality is established via the contraction mapping theorem in the Banach space of bounded continuous functions equipped with the sup norm. The Bellman operator, which maps a value function to the maximum of the gain and the discounted expected continuation value, is a contraction with modulus equal to the discount factor \gamma < 1, guaranteeing a unique fixed point that corresponds to the optimal value function; successive approximations from any initial function converge geometrically to this fixed point.
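A minimal sketch of value iteration exploiting this contraction property, on a small randomly generated Markov chain; the chain, payoff, and discount factor are illustrative assumptions.

```python
import numpy as np

# Value iteration for a discounted stopping problem on a small Markov
# chain -- a sketch; the chain P and payoff g below are assumptions.
rng = np.random.default_rng(0)
n, gamma = 20, 0.9
P = rng.random((n, n)); P /= P.sum(axis=1, keepdims=True)  # transition kernel
g = np.sort(rng.random(n))                                 # stopping payoff

V = np.zeros(n)
for it in range(1000):
    V_new = np.maximum(g, gamma * P @ V)    # Bellman operator (a gamma-contraction)
    if np.max(np.abs(V_new - V)) < 1e-10:   # geometric convergence in sup norm
        break
    V = V_new

stop_set = np.where(g >= gamma * P @ V)[0]  # states where stopping is optimal
print(f"converged after {it} iterations; stop in states {stop_set}")
```

Because the operator is a \gamma-contraction, the sup-norm error shrinks by a factor \gamma per sweep, so the iteration count scales like \log(1/\text{tol}) / \log(1/\gamma).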

Threshold and Boundary Strategies

In optimal stopping problems, particularly those involving sequential observations of random offers, the optimal policy frequently simplifies to a threshold strategy known as the reservation price. The reservation price \eta represents the minimum value at which an agent should accept an offer; in the standard discounted infinite-horizon model with i.i.d. offers, it satisfies the equation \eta = \beta \mathbb{E}[\max(X, \eta)], where \beta < 1 is the discount factor and X denotes an offer drawn from a known distribution. This fixed threshold emerges when offers are independent and identically distributed, ensuring that stopping occurs the first time an observed value exceeds \eta, thereby maximizing the expected payoff.

In non-stationary settings, such as finite-horizon problems where the number of remaining offers decreases over time, threshold strategies often exhibit monotonicity. Under the single crossing condition—where the gain from continuing (rather than stopping) crosses zero only once as a function of the state variable—the optimal thresholds are monotone in time; in selling-type problems the reservation price falls as the horizon shortens, reflecting heightened urgency to accept offers later in the process. Dynamic programming methods can be used to compute these evolving thresholds, confirming their optimality within the broader framework.

For continuous-time formulations involving diffusion processes, optimal stopping boundaries take the form of free boundaries that solve a variational inequality. Specifically, the optimal boundary b(t) is determined by the equation \min\left\{ r u - \partial_t u - \frac{1}{2} \sigma^2 \partial_{xx} u, \; u - g \right\} = 0, where u(t,x) is the value function, g(x) is the gain function, and r is the discount rate. Complementing this, the smooth pasting condition requires that the derivatives of u and g match at the boundary, ensuring a continuously differentiable transition between continuation and stopping regions. These free boundaries delineate the continuation region, where it is optimal to wait, from the stopping region, where immediate action maximizes value.

Threshold optimality in these problems arises under specific structural conditions on the payoff and continuation value functions. Convexity of the payoff function guarantees that the marginal benefit of higher states favors stopping beyond a single threshold, avoiding multiple crossings. Additionally, submodularity of the continuation value—implying diminishing marginal returns to waiting as the state improves—ensures that the optimal policy remains a simple threshold rule rather than a more complex structure. Together, these conditions facilitate the reduction of general problems to tractable boundary strategies, enhancing both theoretical analysis and practical implementation.
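The stationary reservation-price equation can be solved by iterating the defining map, which is itself a contraction. The sketch below assumes offers uniform on [0, 1], for which \mathbb{E}[\max(X, \eta)] = (1 + \eta^2)/2 in closed form; the distribution and discount factor are illustrative.

```python
import numpy as np

# Fixed-point iteration for eta = beta * E[max(X, eta)] with
# X ~ Uniform(0, 1) -- an illustrative assumption.
beta = 0.95
eta = 0.0
for _ in range(200):                       # the map is a contraction on [0, 1]
    eta = beta * (1 + eta**2) / 2          # E[max(X, eta)] = (1 + eta^2)/2
print("reservation price eta ~", round(eta, 4))  # accept the first offer >= eta

# Monte Carlo check of the fixed-point property eta = beta*E[max(X, eta)]
X = np.random.default_rng(1).random(1_000_000)
print("beta*E[max(X, eta)] ~", round(beta * np.maximum(X, eta).mean(), 4))
```

For this uniform case the fixed point has the closed form \eta = (1 - \sqrt{1 - \beta^2})/\beta, which the iteration recovers numerically.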

Advanced Topics

Jump Diffusion Processes

Jump diffusion processes extend the standard continuous-time diffusion frameworks for optimal stopping by incorporating abrupt discontinuities driven by a Poisson point process, capturing phenomena like sudden market shocks or rare events that cannot be modeled by smooth paths alone. The underlying state process X_t evolves according to the stochastic differential equation dX_t = \mu \, dt + \sigma \, dW_t + \int_{\mathbb{R}} z \, \tilde{N}(dt, dz), where W_t is a standard Brownian motion, \tilde{N}(dt, dz) is a compensated Poisson random measure with intensity \lambda \, \nu(dz), \mu is the drift, \sigma > 0 is the volatility, \lambda > 0 is the jump intensity, and \nu is the Lévy measure governing jump sizes z. This model, originally proposed by Merton for asset pricing under discontinuous returns, allows jumps to reflect rare but significant changes in the state variable, thereby complicating the optimal stopping decision as the process can instantaneously cross stopping boundaries.

In the context of optimal stopping, the problem seeks to maximize the expected discounted reward over adapted stopping times \tau, formulated as \sup_{\tau} \mathbb{E}^x \left[ e^{-r \tau} g(X_\tau) + \int_0^\tau e^{-r s} f(X_s) \, ds \right], where r > 0 is the risk-neutral discount rate, g denotes the terminal payoff function, and f the running payoff. The presence of jumps alters payoff expectations by introducing non-local effects, as a jump can shift the state from the continuation region directly into the stopping region, potentially increasing the value of waiting or exercising immediately compared to pure diffusion settings. This formulation generalizes classic problems like American option pricing, where early exercise becomes optimal due to jump-induced asymmetries in risk.

The value function u(x) satisfies the nonlinear integro-differential variational inequality \min \left\{ r u(x) - \mu u'(x) - \frac{1}{2} \sigma^2 u''(x) - \lambda \int_{\mathbb{R}} \left[ u(x + z) - u(x) \right] \nu(dz) - f(x), \, u(x) - g(x) \right\} = 0, where the integral term accounts for the expected change in value due to jumps, rendering the equation nonlocal and requiring solutions that handle both diffusive and jump components. Solutions are obtained through variational inequalities, where the continuation region is characterized by equality in the first term (with the generator balancing the discount and payoffs), and the stopping region by u(x) = g(x), subject to jump conditions at the free boundary to ensure continuity and prevent arbitrage from instantaneous jumps across it. These boundaries often exhibit reduced smoothness compared to diffusion cases, with numerical methods like finite differences adapted for the integral term to approximate the solution.

In financial applications, jump-diffusion models are particularly relevant for pricing options on assets subject to sudden spikes or news events, where optimal early exercise boundaries adjust to the jump risk—higher intensity \lambda typically lowers the critical price for exercise in puts, enhancing the early exercise premium. For instance, in Merton's model applied to equity options, jumps simulate crashes or earnings surprises, leading to variational solutions that quantify the impact of discontinuities on stopping strategies.
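Short of solving the integro-differential inequality, the effect of jumps can be explored by simulation. The sketch below estimates the value of a fixed-threshold exercise rule for a put under Merton-style jump-diffusion dynamics; the dynamics, parameters, and assumed boundary level are all illustrative, and the rule is not the solution of the variational inequality.

```python
import numpy as np

# Monte Carlo value of a fixed-threshold put-exercise rule under a
# Merton-style jump diffusion. All parameters, and the boundary b,
# are illustrative assumptions.
rng = np.random.default_rng(42)
mu, sigma, lam = 0.03, 0.2, 0.5           # drift, volatility, jump intensity
jmu, jsd = -0.10, 0.15                    # normal jump sizes in log-price
r, T, dt = 0.05, 1.0, 1 / 252
K, S0, b = 100.0, 100.0, 85.0             # strike, spot, assumed exercise boundary

n_paths, n_steps = 20_000, int(T / dt)
logS = np.full(n_paths, np.log(S0))
payoff = np.zeros(n_paths)
alive = np.ones(n_paths, dtype=bool)

for k in range(1, n_steps + 1):
    dW = rng.normal(0.0, np.sqrt(dt), n_paths)
    # Bernoulli thinning: at most one jump per small step
    jump = np.where(rng.random(n_paths) < lam * dt,
                    rng.normal(jmu, jsd, n_paths), 0.0)
    logS += (mu - 0.5 * sigma**2) * dt + sigma * dW + jump
    S = np.exp(logS)
    hit = alive & (S <= b)                # jumps can overshoot the boundary
    payoff[hit] = np.exp(-r * k * dt) * (K - S[hit])
    alive &= ~hit

payoff[alive] = np.exp(-r * T) * np.maximum(K - np.exp(logS[alive]), 0.0)
print("value of the threshold rule:", payoff.mean())
```

Sweeping the assumed boundary b over a grid of candidate levels gives a crude picture of how jump intensity shifts the best threshold, mirroring the boundary adjustments discussed above.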

Recent Extensions in Stochastic Frameworks

Recent theoretical advances in optimal stopping have extended classical frameworks to handle model uncertainty and nonlinear expectations, particularly through the G-expectation introduced by Shige Peng in the mid-2000s. This sublinear expectation, denoted \hat{\mathbb{E}}[\cdot], addresses ambiguity in underlying probability measures by incorporating volatility uncertainty via G-Brownian motion, a nonlinear extension of standard Brownian motion. In this setting, the optimal stopping problem is formulated as \sup_{\tau} \hat{\mathbb{E}}[g(X_{\tau})], where X is the state process driven by G-Brownian motion, g is the reward function, and \tau is a stopping time. Dynamic programming principles adapt to this framework, yielding a nonlinear Hamilton-Jacobi-Bellman equation whose solution determines the value function and optimal strategy, enabling robust decision-making under epistemic uncertainty about drift or diffusion parameters.

Extensions to random horizon problems further generalize these stochastic settings by incorporating an exogenous random termination time \sigma, leading to stopping times of the form \tau \wedge \sigma. Here, the value function is derived using optional sampling theorems under changed probability measures, accounting for the joint filtration generated by the public information flow and private signals available to the decision-maker. This approach resolves the informational asymmetry in multi-agent or partially observable environments, ensuring the optimal strategy maximizes expected reward while respecting the random endpoint, as analyzed via progressive enlargement of filtrations. Such formulations prove particularly useful in applications where external events impose abrupt horizons.

A 2025 development introduces predicted prior models, which extend the secretary problem to scenarios with partial or erroneous information on value distributions. In this Bayesian framework, the decision-maker updates beliefs about the underlying distribution before observing candidates, using a "predicted prior" that may deviate from the true one, unlike classical no-information assumptions. The optimal policy is computed via dynamic programming over the posterior, balancing reliance on the predicted prior against observed data, and achieving higher success probabilities than naive priors in simulations with misspecified models. This bridges traditional optimal stopping with robust learning-augmented algorithms, addressing real-world scenarios like hiring under incomplete market intelligence.

Numerical methods for stochastic hybrid systems, combining continuous states with discrete jumps, have seen 2024 advancements through finite difference schemes tailored to these mixed dynamics. These schemes discretize the infinitesimal generator of the hybrid process—incorporating diffusion, jumps, and regime switches—into a Markov chain approximation, solving the associated variational inequality for the value function with high accuracy and convergence rates of order O(h) in spatial step h. Applied to quickest detection or control problems, they handle non-smooth boundaries efficiently via upwind differencing, offering computational speed advantages in multidimensional cases.

Connections to robust control emphasize worst-case optimal stopping under model ambiguity, where the decision-maker minimizes maximum regret over a set of plausible models. This formulation, often via Girsanov transformations or penalty methods, yields time-consistent strategies that hedge against adversarial drifts or jumps, extending linear expectations to inf-sup problems like \inf_{Q \in \mathcal{Q}} \sup_{\tau} \mathbb{E}^Q[g(X_{\tau})] over ambiguity sets \mathcal{Q}. Such robust approaches, building on precursors like jump-diffusion models, provide safeguards in uncertain environments such as portfolio liquidation.

Classic Examples

Secretary Problem

The secretary problem models a scenario where a decision maker must select the best option from a finite set of alternatives presented sequentially in random order, without the ability to recall previous options. There are n candidates for a single position, arriving one at a time, each with an unknown quality that is revealed only through relative comparisons to those interviewed so far. The decision maker observes the relative rank of the current candidate compared to all predecessors and must immediately decide whether to select them or proceed to the next, aiming to maximize the probability of hiring the overall best candidate. This setup, first popularized in the early 1960s, exemplifies the tension in optimal stopping between gathering information and committing to a choice.

The optimal strategy is to interview and reject the first r candidates, using them solely to establish a benchmark of the best among them, and then select the first subsequent candidate who outperforms this benchmark. The value of r is chosen to maximize the success probability, and for large n, r \approx n/e, where e \approx 2.718 is the base of the natural logarithm. Under this threshold strategy, the probability of selecting the best candidate is p(r,n) = \frac{r}{n} \sum_{k=r+1}^{n} \frac{1}{k-1}, which is simple to compute but requires evaluating over possible r to find the optimum. As n \to \infty, the maximal p(r,n) converges to 1/e \approx 0.367879, establishing an asymptotic performance bound for the no-information case where only relative ranks are available. This result, derived using Markov stopping time theory, highlights the strategy's efficiency despite the lack of absolute quality measures.

A key variation is the full-information case, where each candidate's absolute quality is drawn independently from a known continuous distribution, allowing the decision maker to observe exact values rather than just relative ranks. Here, the optimal strategy employs a sequence of decreasing thresholds: reject early candidates below a high initial threshold and accept the first later candidate exceeding the corresponding time-dependent threshold, calibrated to balance the risk of missing the best. Gilbert and Mosteller analyzed this setting, showing that the asymptotic success probability rises to approximately 0.5802, well above the no-information bound of 1/e, with the exact thresholds depending on the distribution (e.g., the uniform distribution on [0,1] yields specific reservation values solvable via dynamic programming). This extension provides insight into scenarios with quantifiable utilities, such as auction bidding or investment timing.

Multiple-choice extensions generalize the problem to selecting up to k > 1 candidates, often without recall and aiming to maximize the expected number of top-k selections or a similar objective. In the no-information version, the strategy involves multiple thresholds or a phased approach: skip an initial fraction for sampling, then select candidates outperforming evolving benchmarks until k choices are made. Gnedin established foundational results, demonstrating that for large n, the probability of securing all top-k candidates decays factorially but can be optimized with success rates scaling as (k!/e^k) \ln^k n / n^{k-1} in certain formulations; more recent work refines these for constraints or weighted selections. These variants apply to problems such as hiring teams.
The intuition behind the secretary problem's optimal rule lies in trading off exploration—skipping early candidates to learn the quality distribution—against exploitation—stopping at a promising later option informed by that sample. For finite n, exact computation via backward induction confirms the threshold's optimality, but the 1/e limit underscores a universal trade-off in sequential choice under uncertainty, independent of specific distributions in the asymptotic regime.
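A minimal sketch evaluating the success probability p(r, n) exactly and checking the threshold rule by simulation; the helper function names are illustrative.

```python
import numpy as np

def p_success(r, n):
    """Exact probability of picking the best when the first r are rejected."""
    return (r / n) * sum(1.0 / (k - 1) for k in range(r + 1, n + 1))

n = 100
best_r = max(range(1, n), key=lambda r: p_success(r, n))
print("optimal r =", best_r, "(n/e ~", round(n / np.e, 1), "), p =",
      round(p_success(best_r, n), 4))

rng = np.random.default_rng(0)            # simulation check of the rule
wins, trials = 0, 20_000
for _ in range(trials):
    vals = rng.permutation(n)             # candidate qualities in arrival order
    benchmark = vals[:best_r].max()       # best of the rejected sample
    rest = vals[best_r:]
    better = np.nonzero(rest > benchmark)[0]
    chosen = rest[better[0]] if better.size else rest[-1]  # forced last pick
    wins += chosen == n - 1               # n-1 is the best quality
print("simulated success rate:", wins / trials)
```

For n = 100 the optimal cutoff is r = 37 with success probability about 0.371, already close to the asymptotic 1/e.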

House Selling Problem

The house selling problem is a classic example of an optimal stopping problem in discrete time, where a seller receives a sequence of independent and identically distributed (i.i.d.) offers Y_i drawn from a known distribution F with density f, and must decide whether to accept the current offer or continue searching, without recall of previous offers. The process typically unfolds over a finite horizon of T periods, after which the seller must accept the final offer if not stopped earlier. The optimal strategy is a reservation-price rule: accept the first offer Y_t \geq \eta_t at time t, where the reservation prices \eta_t are decreasing in t (i.e., \eta_1 > \eta_2 > \cdots > \eta_T), reflecting the diminishing value of future search opportunities as the horizon approaches. This setup maximizes the expected payoff, and the thresholds are derived via backward induction using dynamic programming.

The reservation price \eta_t equals the expected payoff from continuing optimally, satisfying the backward recursion \eta_t = \mathbb{E}[\max(Y_{t+1}, \eta_{t+1})] = \int_{\eta_{t+1}}^\infty y \, dF(y) + \eta_{t+1} F(\eta_{t+1}), with terminal condition \eta_T = 0 (the final offer must be accepted), so that \eta_{T-1} = \mathbb{E}[Y_T]; the recursion expresses indifference at time t between accepting \eta_t and continuing to the next period. In the finite-horizon case without search costs, the decreasing thresholds capture the seller's willingness to lower standards over time to avoid settling for a suboptimal final offer.

For the infinite-horizon case without costs or discounting, the reservation price \eta would solve \eta = \mathbb{E}[\max(Y, \eta)] = \int_\eta^\infty y \, dF(y) + \eta F(\eta), which simplifies to \eta = \mathbb{E}[Y \mid Y \geq \eta]. This equation is degenerate for many distributions: for the uniform distribution on [0, 1], \eta = 1 (implying waiting indefinitely due to bounded support), and for the exponential distribution with rate \lambda > 0, the memoryless property gives \mathbb{E}[Y \mid Y \geq \eta] = \eta + 1/\lambda > \eta, so no finite threshold exists and costless search would continue indefinitely.

When search costs c > 0 per period are introduced to obtain the next offer, the payoff adjusts to account for cumulative costs, yielding a net expected return of \mathbb{E}[Y_N - c N], where N is the stopping time. In the infinite-horizon case with costs but no discounting, the optimal reservation price \eta satisfies \int_\eta^\infty (y - \eta) \, dF(y) = c, balancing the expected gain from a better future offer against the immediate cost of searching. For the uniform distribution on [0, 1], this yields \eta = 1 - \sqrt{2c} when c \leq 1/2 (for larger c it is optimal to accept the first offer); for the exponential distribution with rate \lambda, \eta = -\frac{1}{\lambda} \ln(\lambda c) when c < 1/\lambda. These costs lower the reservation price relative to the no-cost case, as the seller trades off potential gains against ongoing expenses.

The choice of reservation price is highly sensitive to the tail behavior of F: distributions with heavier tails (e.g., lognormal or Pareto) lead to higher \eta values, as the potential for exceptionally large offers justifies prolonged searching despite costs, whereas light-tailed distributions (e.g., uniform) result in lower thresholds and quicker stopping to mitigate risk. This sensitivity underscores the problem's reliance on threshold strategies, where the seller accepts offers exceeding a value that equates the immediate payoff to the expected future value net of costs. Seminal analyses, including those incorporating such distributional effects, emphasize that optimal thresholds preserve monotonicity and converge appropriately in infinite horizons.
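A minimal sketch computing the finite-horizon reservation prices for uniform offers via the recursion above, together with the infinite-horizon threshold under a search cost; the uniform distribution, horizon, and cost value are illustrative assumptions.

```python
import math

# Reservation prices for Y ~ Uniform(0, 1) (illustrative assumption):
# eta_t = E[max(Y, eta_{t+1})] = (1 + eta_{t+1}**2) / 2, with eta_T = 0.
T = 10
eta = [0.0] * (T + 1)                     # eta[T] = 0: last offer is forced
for t in range(T - 1, 0, -1):
    eta[t] = (1 + eta[t + 1] ** 2) / 2    # closed form of E[max(Y, eta)]
print("thresholds eta_1 ... eta_{T-1}:", [round(e, 3) for e in eta[1:T]])

c = 0.02                                  # per-period search cost, infinite horizon
print("cost-based threshold 1 - sqrt(2c) =", round(1 - math.sqrt(2 * c), 4))
```

The printed thresholds decrease toward \eta_{T-1} = \mathbb{E}[Y] = 0.5, matching the monotonicity described above, while the cost-based infinite-horizon threshold sits strictly below 1.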

Coin Tossing

The coin tossing problem provides a simple yet profound illustration of optimal stopping, involving repeated independent trials with a binary outcome and a payoff based on the observed sequence. In the classic formulation, a fair coin is tossed sequentially, and the decision maker can stop at any time n \geq 1, receiving a payoff equal to the proportion of heads S_n / n, where S_n denotes the number of heads in the first n tosses. The objective is to select a stopping time \tau that maximizes the expected payoff \mathbb{E}[S_\tau / \tau]. This setup highlights the tension between exploiting current information and the potential for future improvement, with the waiting time until stopping exhibiting geometric-like properties due to the memoryless nature of the coin tosses.

The optimal stopping rule involves thresholds on the current proportion of heads, stopping when S_n / n exceeds a decreasing boundary that accounts for the diminishing opportunities for large deviations as n grows, as suggested by the law of large numbers. While the precise boundary remains unsolved in closed form, Chow and Robbins, who introduced the problem, showed that practical strategies—stop if the number of heads surpasses the number of tails by an amount calibrated to the total tosses—achieve an expected payoff exceeding 0.79. For instance, one stops immediately on the first head (payoff 1), but continues after an initial tail to await a compensating head streak, balancing the risk of dilution against the chance of boosting the average.

A representative exact calculation considers the strategy of stopping on the first head, regardless of preceding tails. Here, \tau = k with probability (1/2)^k for k \geq 1 (a geometric waiting time with success probability 1/2), yielding S_k = 1 and payoff 1/k. The expected reward is thus \sum_{k=1}^\infty (1/2)^k \cdot (1/k) = -\ln(1 - 1/2) = \ln 2 \approx 0.693, surpassing the long-run average of 0.5. More generally, the path with exactly r initial tails followed by a head has probability (1/2)^{r+1} and payoff 1/(r+1), contributing (1/2)^{r+1}/(r+1) to the expectation; the full computation aggregates such contributions over all admissible paths.

Intuition often suggests stopping whenever the current proportion exceeds 0.5 to lock in gains, yet optimal strategies sometimes continue below this level to capitalize on potential head streaks, leading to a seeming paradox: how can one outperform the fair-game average? The resolution is that the proportion M_n = S_n / n is not itself a martingale; the centered count S_n - n/2 is. The optional stopping theorem applied to the centered count gives \mathbb{E}[S_\tau] = \mathbb{E}[\tau]/2 under conditions like bounded \tau or uniform integrability, but the ratio S_\tau / \tau is a nonlinear functional of the stopped path, and adaptive rules correlate early stopping with favorable proportions, allowing \mathbb{E}[S_\tau / \tau] > 0.5 without violating fairness—intuition overlooks that the fair-game constraint binds the count, not the proportion.

This problem generalizes to a biased coin with success probability p \neq 0.5, where the centered martingale becomes S_n - np, and optimal thresholds adjust accordingly: for p > 0.5, one can achieve expectations exceeding p by similar deviation-exploiting rules, while the core geometric structure persists in the waiting times between potential stopping points.
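A minimal sketch verifying the \ln 2 value of the stop-at-first-head rule and estimating a simple lead-based rule by simulation; the lead-of-one boundary is an assumed heuristic, not the optimal Chow-Robbins boundary.

```python
import math, random

# Exact value of the "stop at the first head" rule:
# sum over k of (1/2)^k * (1/k) = ln 2.
exact = sum(0.5 ** k / k for k in range(1, 200))
print("stop-at-first-head value:", round(exact, 6), "~ ln 2 =", round(math.log(2), 6))

# Simulated value of an assumed heuristic: stop once heads lead tails by 1.
random.seed(0)
total, trials, cap = 0.0, 20_000, 10_000   # cap truncates rare long paths
for _ in range(trials):
    heads = n = 0
    while n < cap:
        n += 1
        heads += random.random() < 0.5
        if 2 * heads - n >= 1:             # heads - tails >= 1
            break
    total += heads / n
print("lead-by-one rule value:", round(total / trials, 4))  # well above 0.5
```

The lead-by-one rule already beats both 0.5 and ln 2, illustrating how boundary rules calibrated to the lead, rather than the raw proportion, approach the Chow-Robbins value.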

Broader Applications

Search and Parking Problems

Search theory addresses decision-making under uncertainty in sequential observations, particularly in labor markets where individuals evaluate job offers while incurring search costs. In the classic job search model, an unemployed worker receives wage offers drawn from a known distribution and decides whether to accept or continue searching, balancing the immediate wage against the expected value of future opportunities. The optimal strategy involves a reservation wage, below which offers are rejected, ensuring the marginal benefit of continued search equals the cost. Seminal work by McCall established this framework, showing that the reservation wage rises with unemployment benefits and offer arrival rates but falls with search costs and impatience (discount rates).

In continuous-time formulations, job offers arrive as a Poisson process with rate \lambda, search incurs a cost c per unit time, and outcomes are discounted at rate r. The reservation wage \eta equates the value of employment at \eta to the value of unemployment, yielding \eta = b - c + \frac{\lambda}{r} \int_{\eta}^{\infty} (w - \eta) \, dF(w), where b is the baseline utility; higher search costs c lower \eta, reflecting the desire to accept offers sooner to avoid prolonged costs.

The parking problem, introduced in the mid-20th century and formalized as an optimal stopping task, models a driver seeking a spot near a destination along a one-dimensional path, such as a street, where potential spots appear randomly. In Tamaki's 1982 analysis, spots emerge via a Poisson process, and the destination has a prior distribution; the goal is to minimize expected walking distance, starting from an initial position. The driver observes spots sequentially and must decide irrevocably whether to stop, incurring a cost equal to the walking distance if parking succeeds, or a penalty if overshooting. The optimal strategy employs a threshold policy: at remaining distance x to the destination, park if the spot is sufficiently close, with the threshold determined by the prior on the destination and the arrival parameters. This arises from dynamic programming, where the value function represents the minimal expected cost and satisfies a recursion balancing immediate cost against expected future costs.

This setup connects to broader optimal stopping theory, particularly infinite-horizon variants of the secretary problem incorporating sampling costs, where sequential relative ranks are evaluated amid ongoing search expenses, akin to unbounded search with arrival costs c and discounting at rate r. In such extensions, the optimal threshold mirrors the reservation wage, trading off immediate acceptance against probabilistic future gains, with success probabilities converging to 1 - 1/e in costless limits but adjusted downward by costs.
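A minimal sketch solving the continuous-time reservation-wage equation above by bisection, assuming Uniform(0, 1) wage offers; all parameter values are illustrative assumptions.

```python
# Reservation wage for the continuous-time McCall model with
# Uniform(0, 1) offers (an illustrative assumption), where
# integral_eta^1 (w - eta) dw = (1 - eta)^2 / 2:
#   eta = b - c + (lam / r) * (1 - eta)**2 / 2
b, c, lam, r = 0.1, 0.05, 1.0, 0.1

def excess(eta):
    # positive when the continuation value exceeds eta
    return b - c + (lam / r) * (1 - eta) ** 2 / 2 - eta

lo, hi = 0.0, 1.0
for _ in range(60):                  # bisection: excess() is decreasing in eta
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if excess(mid) > 0 else (lo, mid)
print("reservation wage:", round((lo + hi) / 2, 4))
```

Raising c or r lowers the computed reservation wage, while raising \lambda or b raises it, matching the comparative statics of the McCall model.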

Financial and Trading Models

Optimal stopping problems are central to mathematical finance, particularly in derivative pricing and timing decisions under uncertainty. In this setting, the problems often involve deciding when to exercise an option or liquidate an asset to maximize expected discounted payoff, where the underlying asset price follows a stochastic process such as geometric Brownian motion. The framework balances the trade-off between waiting for potentially higher rewards and the costs of discounting or risk exposure. Seminal contributions, such as those linking option pricing to optimal stopping, have provided analytical and numerical tools for these decisions.

A prominent application is the pricing of American put options, which allow early exercise unlike their European counterparts. In the stationary (perpetual) case, the value function u(S) of an American put with strike K, risk-free rate r, and volatility \sigma satisfies the variational inequality \max\left\{ \frac{1}{2} \sigma^2 S^2 \frac{\partial^2 u}{\partial S^2} + r S \frac{\partial u}{\partial S} - r u, \; (K - S) - u \right\} = 0. The early exercise boundary determines the stock-price levels below which immediate exercise is optimal, solving the free-boundary problem with value-matching and smooth-pasting conditions. In the finite-maturity case the solution yields a decreasing boundary function of time to expiry, reflecting that exercise becomes more likely as the price falls. This formulation, established in early optimal stopping theory for diffusions, enables closed-form approximations for perpetual options and numerical solutions for finite maturities.

In trading timing models, optimal stopping addresses when to sell an asset whose price S_t evolves as a geometric Brownian motion dS_t = \mu S_t dt + \sigma S_t dW_t. The objective is to maximize the expected discounted sale price \sup_\tau \mathbb{E}[e^{-r\tau} S_\tau], where \tau is the stopping time. For infinite horizons, the optimal strategy involves selling upon hitting a constant upper boundary b^*, derived from solving the associated Hamilton-Jacobi-Bellman equation. This boundary exceeds the initial price if \mu > r, incentivizing waiting, but equals it otherwise, so that immediate sale avoids the unfavorable drift. Such models guide real-time trading decisions in volatile markets, emphasizing the role of drift, volatility, and discounting in threshold determination.

Extensions to jump-diffusion processes incorporate sudden price moves, which can alter exercise boundaries in option pricing. In models like Merton's jump-diffusion, where asset returns include Poisson-driven jumps, jump risk increases the value of waiting for American puts, shifting the early exercise boundary downward compared to pure diffusion cases. This effect arises because jumps introduce skewness and heavier tails, making continuation more attractive during high-volatility regimes to capture potential rebounds. Optimal stopping boundaries in these settings are solved via integro-differential equations or variational inequalities that account for the jump intensity and size distribution, highlighting how discontinuities amplify the timing decision.

For practical implementation, especially in high-dimensional problems with multiple underlyings, numerical methods like the least-squares Monte Carlo (LSMC) algorithm approximate American option prices. Developed by Longstaff and Schwartz, LSMC simulates paths of the asset process and uses cross-sectional least-squares regression to estimate continuation values at each time step, determining exercise decisions by comparing them to intrinsic payoffs. This approach excels in multi-asset scenarios where closed-form solutions are infeasible, achieving convergence with thousands of paths and polynomial basis functions for the regression. It has become a standard tool for American and path-dependent options, balancing computational cost with accuracy in dimensions beyond three.
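A compact sketch of the Longstaff-Schwartz procedure for a single-asset American put; the GBM parameters and the quadratic polynomial basis are illustrative assumptions.

```python
import numpy as np

# Least-squares Monte Carlo (Longstaff-Schwartz) for an American put
# under GBM -- parameters and basis are illustrative assumptions.
rng = np.random.default_rng(7)
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
n_paths, n_steps = 50_000, 50
dt = T / n_steps
disc = np.exp(-r * dt)

# simulate risk-neutral GBM paths (columns are time steps dt .. T)
Z = rng.standard_normal((n_paths, n_steps))
S = S0 * np.exp(np.cumsum((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z, axis=1))

payoff = np.maximum(K - S[:, -1], 0.0)         # cash flow if held to maturity
for t in range(n_steps - 2, -1, -1):
    payoff *= disc                             # discount one step back
    itm = K - S[:, t] > 0                      # regress only in-the-money paths
    x = S[itm, t]
    basis = np.column_stack([np.ones_like(x), x, x**2])  # quadratic basis
    coef, *_ = np.linalg.lstsq(basis, payoff[itm], rcond=None)
    cont = basis @ coef                        # estimated continuation value
    exercise = (K - x) > cont                  # exercise where intrinsic wins
    idx = np.where(itm)[0][exercise]
    payoff[idx] = K - S[idx, t]                # replace path cash flow

print("LSMC American put estimate:", disc * payoff.mean())
```

Restricting the regression to in-the-money paths, as in the original algorithm, improves the continuation-value fit exactly where the exercise decision matters.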
In real-world fund management, optimal stopping informs liquidation strategies amid fee structures and market frictions. Investors face decisions on when to withdraw capital from funds with first-loss or shared-loss fees, modeled as stopping times that maximize net proceeds under uncertain fund performance. For instance, in a diffusion-based fund value process, the optimal liquidation boundary depends on fee parameters, often leading to earlier exits under high loss-sharing to mitigate downside exposure. These models, analyzed via free-boundary problems, reveal that shared fees can delay liquidation by aligning interests, providing insights for fund design and withdrawal timing in illiquid markets.

Machine Learning Contexts

In neural architecture search (NAS), optimal stopping principles are applied to determine when to halt the evaluation of candidate architectures during search, thereby balancing computational efficiency with performance gains. Random search in NAS has been shown to be competitive with sophisticated gradient-based or evolutionary methods, particularly when paired with well-engineered search spaces that limit exploration to promising architectures. For instance, variants of the classical secretary problem inform these strategies by suggesting thresholds for terminating search after evaluating approximately 37% of the space, achieving acceptable architectures in benchmarks involving thousands of runs. Recent analyses highlight that such approaches reduce the need for exhaustive training, with incomplete evaluations enhancing overall efficiency in resource-constrained settings.

The optimal stopping framework further refines the choice of stopping epochs by quantifying the expected benefit of additional evaluations against stopping costs, often leading to "good enough" thresholds that cut exploration to 15-20% of the space while maintaining around 80% success rates in identifying high-performing models. This is particularly relevant amid escalating training costs for frontier models, which have risen at a rate of 2.4 times per year since 2016, projected to exceed $1 billion per model by 2027, underscoring the practical impact of stopping rules in mitigating exponential compute demands. Studies from 2024 demonstrate that these stopping-informed random searches not only match but sometimes outperform more elaborate methods in terms of architecture quality per compute unit, especially for large-scale image classification tasks.

In reinforcement learning (RL), optimal stopping manifests in Markov decision processes (MDPs) as decisions to terminate exploration phases, such as in \epsilon-greedy policies where agents balance gathering new information against exploiting known rewards. Deep reinforcement learning adapts optimal stopping by approximating the Q-function—representing expected future rewards—with neural networks, enabling scalable solutions to stopping problems traditionally solved via dynamic programming. Here, stopping actions yield immediate rewards (e.g., payoffs upon termination), while continuation defers them, with the \epsilon-greedy strategy facilitating exploration by selecting random actions with probability \epsilon, which decays over episodes to converge toward optimal thresholds. This formulation connects directly to classic stopping rewards in Q-learning updates, where the agent learns to stop when the Q-value of continuation falls below the stopping payoff, as seen in applications like option exercise modeled as episodic MDPs. In optimal stopping tasks like the secretary problem with machine-learned predictions, theoretical analyses show improved competitive ratios beyond the classical 1/e ≈ 0.368.

Computational challenges arise in high-dimensional settings, such as multi-armed bandits, where optimal stopping requires approximating value functions across vast state spaces plagued by the curse of dimensionality. Deep-learning-based dynamic programming addresses this by using neural networks to learn stopping boundaries, enabling solutions to problems with dozens of dimensions that traditional methods cannot handle due to exponential complexity. In bandit contexts, stopping times are defined with respect to the filtration of observed rewards, allowing agents to terminate arms when posterior reward estimates indicate no further value, with approximations reducing complexity in non-stationary environments.
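A minimal sketch of the Q-learning formulation described above, on a toy stopping MDP where the state is a bounded random walk and the "stop" action pays the current state; the environment, reward, and hyperparameters are all illustrative assumptions.

```python
import random

# Tabular Q-learning on a toy stopping MDP: states 0..N follow a random
# walk; "stop" pays the state value and ends the episode, "cont" pays
# nothing and moves the walk. Everything here is an assumed toy setup.
random.seed(0)
N, gamma, alpha = 20, 0.95, 0.1
Q = {(s, a): 0.0 for s in range(N + 1) for a in ("stop", "cont")}

for episode in range(20_000):
    s = N // 2
    eps = max(0.05, 1.0 - episode / 10_000)        # decaying exploration rate
    while True:
        a = random.choice(("stop", "cont")) if random.random() < eps \
            else max(("stop", "cont"), key=lambda act: Q[(s, act)])
        if a == "stop":
            Q[(s, a)] += alpha * (s - Q[(s, a)])           # terminal reward = s
            break
        s2 = min(N, max(0, s + random.choice((-1, 1))))    # walk one step
        target = gamma * max(Q[(s2, "stop")], Q[(s2, "cont")])
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2

stop_states = [s for s in range(N + 1) if Q[(s, "stop")] > Q[(s, "cont")]]
print("states where stopping looks optimal:", stop_states)
```

With discounting, the learned policy concentrates stopping in high states, recovering a threshold rule from experience rather than from an explicit model, which is the essence of the RL approach to stopping.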
Recent advancements incorporate predicted priors into optimal stopping for hiring models, extending the secretary framework to scenarios where machine-learned predictions of candidate distributions inform thresholds, achieving bi-criteria guarantees for both consistency (exploiting accurate priors) and robustness (handling prediction errors). A 2025 analysis proposes algorithms that balance these objectives in hiring simulations, improving selection probabilities over no-prior baselines while maintaining worst-case prophet-like bounds.
