Prediction and Risk Optimization Under Uncertainty: A Cross-Domain Meta-Review

Q: What four primitives underlie every mature decision system?

Across finance, operations, insurance, energy, healthcare, causal inference, and e-commerce, every mature decision system composes (i) a calibrated probabilistic model of the state variable, (ii) a coherent risk-aware objective functional, (iii) an explicit operational constraint set, and (iv) a principled exploration mechanism. The remainder of the meta-review is a taxonomy of how these four primitives are instantiated.

Q: How is the DataGlass marketplace ad-budget system positioned in the meta-review?

DataGlass is the connecting case (Section 12.11). It inherits Markowitz's portfolio framing, Almgren–Chriss's cost-of-thrashing, the newsvendor's capacity-constrained service-level structure, Cramér–Lundberg's tail-risk machinery, contextual bandits' calibrated exploration, RTB's Lagrangian shadow-price interpretation, the M5 competition's emphasis on calibrated uncertainty, and double machine learning's causal-identification machinery. The novelty is composition and the explicit treatment of the platform as an opaque, constraint-imposing intermediary rather than a transparent auction.

Q: What empirical lift does DataGlass report?

DataGlass reports 21.3% offline and 21.6% online portfolio-profit lift over the manual baseline, with reallocation frequency reduced 43.8% [1, Section X].

Q: Which open problems does the meta-review identify?

Nine open problems: (P1) time-consistent risk-averse Bellman for marketplace allocation; (P2) Wasserstein DRO with bandit regret guarantees; (P3) joint optimisation of budget and target ROAS; (P4) causal identification under attribution mixing; (P5) cross-marketplace transfer learning; (P6) conformal-prediction integration; (P7) algorithmic-fairness constraints; (P8) mechanism-design counter-strategies under first-price auctions; (P9) foundation-model exploitation of unstructured signals.

Q: Why does rolling-mean reported ROAS fail as a budget heuristic?

Rolling-mean estimators target average rather than marginal return, and on a saturating Hill response curve the ratio of average to marginal return increases monotonically with spend. They therefore over-allocate to campaigns already in the saturated regime. They also confuse reported with true profit-adjusted ROAS and ignore attribution latency, inventory dilution, and the platform-side cost of bid churn. The analytical mechanism is detailed in the companion paper [2, Section 3].
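The average-versus-marginal point can be illustrated with a minimal sketch. The Hill parameters (`v_max`, `half_sat`, `shape`) and the spend levels below are hypothetical values chosen for illustration; they are not taken from the DataGlass papers.

```python
def hill_revenue(spend, v_max=1000.0, half_sat=200.0, shape=1.0):
    """Saturating Hill response: attributed revenue as a function of spend."""
    return v_max * spend**shape / (half_sat**shape + spend**shape)


def average_roas(spend, **kw):
    """Revenue per unit of spend: the quantity a rolling-mean ROAS tracks."""
    return hill_revenue(spend, **kw) / spend


def marginal_roas(spend, eps=1e-4, **kw):
    """Central finite difference for d(revenue)/d(spend)."""
    up = hill_revenue(spend + eps, **kw)
    down = hill_revenue(spend - eps, **kw)
    return (up - down) / (2 * eps)


for spend in (50.0, 200.0, 800.0):
    avg, marg = average_roas(spend), marginal_roas(spend)
    print(f"spend={spend:6.0f}  average={avg:5.2f}  marginal={marg:5.2f}  "
          f"ratio={avg / marg:4.2f}")
```

With these illustrative parameters, a campaign at a spend of 800 still shows an average ROAS of 1.0 while the next unit of budget returns only 0.2, which is the regime in which an average-based heuristic keeps adding budget.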

DataGlass Labs Research

Working paper, May 2026. Companion to the DataGlass technical paper [1] and the seller-pain meta-paper [2]. Cite as: DataGlass Labs Research, "Prediction and Risk Optimization Under Uncertainty: A Cross-Domain Meta-Review of Methods in Finance, Operations, Causal Inference, and E-Commerce Decision Intelligence," DataGlass Labs Research working paper, May 2026.

Keywords. Decision under uncertainty, risk-aware optimization, distributionally robust optimization, stochastic dynamic programming, mean-variance portfolio theory, conditional value-at-risk, multi-armed bandits, online convex optimization, model predictive control, causal inference, double machine learning, demand forecasting, real-time bidding, e-commerce decision intelligence, marketplace advertising, meta-review, systematic synthesis.

Abstract

Background and objectives. The state of the art in prediction and risk-aware optimization under uncertainty has been developed independently in finance, operations research, insurance, energy, healthcare, causal inference, and — most recently — e-commerce decision intelligence. The literatures share a mathematical core but cite each other only weakly. We map the landscape, identify the small set of formal primitives that underlies every mature production system, and bridge from canonical finance and operations results to the decision-intelligence problems of marketplace e-commerce — with the DataGlass system [1] as the connecting case.

Methods and results. Structured narrative meta-review across Scopus, Web of Science, Google Scholar, arXiv, and SSRN, 1952–2026. The corpus is 213 primary works and 41 textbook/handbook references, screened against an AMSTAR-2-inspired rubric (formal rigor, empirical validation, reproducibility, deployment evidence). Every mature decision system in the surveyed domains is a composition of four primitives: (i) a calibrated probabilistic model, (ii) a coherent risk-aware objective, (iii) an explicit operational constraint set, and (iv) a principled exploration mechanism. Eleven worked cases — Markowitz mean-variance, Rockafellar–Uryasev CVaR, Almgren–Chriss execution, the data-driven newsvendor, Cramér–Lundberg ruin theory, contextual-bandit news recommendation, real-time bidding as constrained MDP, the M5 forecasting competition, double machine learning, Wasserstein DRO, and DataGlass marketplace ad budget allocation — instantiate the framework. Cross-domain transfer is strong in the formal core (shadow prices, Bellman recursion, no-regret guarantees) and weaker in the calibration layer (risk-measure choice, ambiguity-set design, exploration schedule).

Conclusions and plain-language summary. Cross-domain reading is structurally under-weighted in current practice; the next wave of e-commerce decision-intelligence systems will be built by engineers fluent in the prior generation's finance and operations literatures. We close with nine open problems. The non-technical version: the same math runs stock portfolios, inventory orders, insurance pricing, and Shopee or Amazon ad budgets — once you see the shared structure, many "novel" e-commerce algorithms turn out to be re-derivations of finance and operations classics. The paper provides a citable map for researchers and a checklist for practitioners.

Notation

Symbol	Meaning
$θ$	Unknown parameter; element of parameter space $Θ$
$X$	Random outcome / state
$a, π$	Action; policy mapping states to actions
$A$	Action / decision set
$L (a, X)$	Loss functional
$ρ (\cdot)$	Risk measure (variance, VaR, CVaR, generic coherent / convex)
$\mathcal{P}, \mathcal{Q}$	Ambiguity sets over distributions
$V^{⋆}$	Value function in dynamic programming
$T$	Horizon length (or sample size where context indicates)
$K, N$	Number of arms / campaigns / assets
$B_{t}$	Budget at time $t$
$μ^{⋆}, λ^{⋆}$	Lagrange multiplier / shadow price on a budget or resource constraint
$R_{T}$	Cumulative regret over horizon $T$
$V_{T}$	Path variation in non-stationary settings
$VaR_{α}, CVaR_{α}$	Value-at-Risk and Conditional Value-at-Risk at confidence $α$
$η, κ$	Auxiliary scalars (context-dependent: temporary impact, decay rate, Lagrangian)
$W_{p} (P, Q)$	$p$ -Wasserstein distance between distributions
$F_{t}$	Feasible / filtration set at time $t$
$b_{i, t}$	Daily budget for campaign $i$ on day $t$ (e-commerce notation, Section 11)
$π_{i} (b)$	Expected profit of campaign $i$ at budget $b$
AOV, CVR, ROAS	Average order value; conversion rate; return on ad spend
$m_{g}, r, f, ρ$	Gross margin, return rate, platform-fee rate, fulfilment overhead (Section 11.1)

Symbols recurring across the meta-review. Domain-specific specialisations are introduced where used.

1. Introduction

Decision-making under uncertainty is the unifying problem of quantitative finance, operations research, insurance, energy economics, healthcare operations, causal inference for program evaluation, and — increasingly — e-commerce decision intelligence. Across these domains, the working stack is structurally similar: a probabilistic model is fit to historical data, a loss or utility functional is specified, an optimization problem is solved subject to operational constraints, and the resulting policy is deployed in an environment that will eventually deviate from the model's assumptions. The methodological vocabulary is shared. Markowitz [3], Bellman [4], Robbins [5], and Knight [6] still set the agenda. What differs across domains is the calibration of the primitives — what is observable, what is controllable, what is the appropriate notion of "risk," and what is the cost of being wrong.

This paper is a meta-review of that shared landscape. The motivating gap is twofold. First, the academic literature on prediction and risk-aware optimization is fragmented along domain lines. A paper on conditional value-at-risk in portfolio theory rarely cites the closely related work on robust newsvendor inventory, and almost never cites the production e-commerce literature on real-time bidding under budget constraints, despite the three problems being structurally isomorphic. The same Lagrange multiplier — call it a shadow price in operations, a budget multiplier in advertising, a tangency-portfolio scalar in finance — recurs verbatim across domains, but the citation graphs barely intersect. Second, the e-commerce decision-intelligence stack — pricing, demand forecasting, ad budget allocation, recommendation, promotion optimization — is rapidly adopting techniques developed for finance and operations decades earlier, often without an explicit articulation of which translation steps are required and which are not. The result is a class of "novel" e-commerce algorithms that, on inspection, are unwitting re-derivations of finance and operations classics — and a class of finance and operations results that fail to land in the e-commerce literature simply because no one has written the bridge. A consolidated meta-review serves both audiences.

The contribution of this paper is not a new methodological result. It is (i) a unified taxonomy under which finance, operations, causal inference, and e-commerce optimization can be read as instances of the same problem class; (ii) a critical, citable synthesis of the canonical results in each domain; (iii) eleven detailed case studies that ground the abstract framework in concrete formulations; (iv) an explicit bridge from the finance and operations literature to the decision-intelligence problems of marketplace e-commerce, with the DataGlass system [1] as one worked example; and (v) the most complete cross-domain bibliography we are aware of in this area, with 254 references organized thematically.

The paper is structured as follows. Section 2 documents the meta-review methodology with explicit search strings, screening protocol, and quality-assessment rubric. Section 3 develops the unified theoretical foundations. Sections 4–11 are the domain reviews. Section 12 contains eleven detailed case studies. Section 13 is the cross-domain quantitative synthesis with comparison tables. Section 14 documents heterogeneity, risk of bias, and publication-bias considerations. Section 15 lists nine open problems. Sections 16–18 contain limitations, reproducibility, and conflict-of-interest statements. Three appendices provide a glossary, the search strings, and additional summary tables. Citations follow IEEE numerical style.

↳ The gap, and why this paper exists

The gap. Mature decision systems in quantitative finance, operations research, insurance, energy economics, healthcare, causal inference, and e-commerce solve structurally identical problems, yet the literatures cite each other only weakly. A CVaR-portfolio paper rarely cites the closely related distributionally-robust newsvendor work, and almost never cites the marketplace ad-budget literature, even though the three are isomorphic up to renaming. The seller-side e-commerce stack, in particular, is rapidly re-deriving results that finance and operations settled decades earlier.

The motivation. A citable cross-domain map. We argue that every production decision system in this space composes the same four primitives (a calibrated probabilistic model, a coherent risk-aware objective, an operational constraint set, and a principled exploration mechanism), and we trace those primitives through eleven worked cases, from Markowitz mean-variance to the DataGlass marketplace ad-budget allocation system, so engineers in newer fields can inherit the prior generation's results instead of reinventing them.

2. Methodology of the Meta-Review

We follow the recommendations of Petticrew and Roberts [7] for systematic reviews in the social sciences and operations research, adapted to the structural-review form because the unit of analysis is method rather than clinical effect size. The review is not PRISMA-compliant in the strict sense [8] because pooled effect sizes are not meaningfully comparable across domains; we therefore adopt a structured narrative synthesis with explicit inclusion criteria, quality assessment, and heterogeneity discussion.

2.1 Research questions

The review addresses four research questions.

RQ1. What is the formal common structure, if any, of mature decision systems in finance, operations, insurance, energy, healthcare, causal inference, and e-commerce?

RQ2. Which mathematical primitives recur across domains and which are domain-specific?

RQ3. Which results from finance and operations transfer cleanly to e-commerce ad budget allocation and which do not?

RQ4. What are the binding open research problems at the intersection?

2.2 Search strategy

Searches were conducted between January and April 2026 in the following databases: Scopus, Web of Science Core Collection, Google Scholar, the ACM Digital Library, IEEE Xplore, INFORMS PubsOnLine, JSTOR, arXiv (cs.LG, stat.ML, math.OC, q-fin.PM, q-fin.TR, q-fin.RM), SSRN, and RePEc. Search strings combined a methodological term ("risk-aware optimization", "conditional value-at-risk", "distributionally robust", "multi-armed bandit", "Bayesian decision", "constrained Markov decision process", "online convex optimization", "model predictive control") with a domain term ("portfolio", "execution", "inventory", "newsvendor", "insurance", "ruin", "advertising", "real-time bidding", "demand forecasting", "ad budget", "marketplace"). The full search-string inventory is given in Appendix B.

2.3 Inclusion and exclusion criteria

Inclusion. A method (or method family) qualifies if all four of the following hold: (i) it produces a decision under non-trivial uncertainty about a state variable; (ii) it falls within the seven specified domains or in a clearly cross-cutting methodology stratum (causal inference, online learning); (iii) it is either canonically cited in subsequent literature or (post-2018) demonstrates production-scale deployment with a quantified empirical lift; (iv) the paper or chapter specifies its loss functional, constraint set, and uncertainty model with enough precision for reproduction or critique.

Exclusion. Pure prediction work without an explicit decision objective; qualitative case studies without analytical content; vendor whitepapers without independent validation; methods superseded by demonstrably better successors that are themselves included; papers behind paywalls without accessible preprints.

2.4 Screening and selection

Initial searches returned approximately 6,400 candidate records. After deduplication and abstract-level screening against Section 2.3 criteria, 487 records were retained for full-text screening. Of these, 213 primary research papers and 41 textbook/handbook chapters were retained for the final corpus. Records were screened against a written checklist; in cases of ambiguity, two reviewers independently rated the record on the Section 2.6 quality rubric and disagreements (12 records) were resolved by discussion.

2.5 Data extraction

For each retained record we extracted: (i) domain; (ii) problem class (single-period vs. sequential, finite vs. infinite horizon, full information vs. bandit); (iii) probabilistic model class; (iv) loss/risk functional; (v) constraint set; (vi) exploration mechanism if any; (vii) theoretical result(s); (viii) empirical validation if any; (ix) deployment evidence if any. Extraction was performed into a structured database; summary tables in Section 13 are derived from this database.

2.6 Quality assessment

Each record was scored on a four-dimensional rubric inspired by the AMSTAR-2 framework [9], adapted for methodological review:

1. Formal rigor — does the paper state assumptions and prove its claims, or is the result heuristic?

2. Empirical validation — is there at least one numerical study with held-out evaluation?

3. Reproducibility — is the algorithm specified to a level that an independent implementation could reproduce headline results?

4. Deployment evidence — has the method been deployed at production scale, with an effect size and confidence interval reported?

Scores are 0 / 1 / 2 on each dimension (0 = absent, 1 = partial, 2 = strong). The summary tables in Section 13 indicate the median rubric score for each method family.

2.7 Heterogeneity assessment

Quantitative heterogeneity (in the meta-analytic sense of the $I^{2}$ statistic) is not meaningful across domains. We instead document structural heterogeneity along the four-primitive taxonomy of Section 3: probabilistic model, risk objective, constraint set, exploration mechanism. Cross-domain transfer is graded strong / partial / weak based on whether the formal object, the calibration, and the empirical performance characteristics carry over.

2.8 Synthesis approach

We adopt a vote-counting synthesis at the level of methodological primitives — not at the level of effect sizes — supplemented by detailed worked cases that exhibit the primitives concretely. This approach is appropriate when the units of synthesis are methods rather than studies and when effect sizes are not commensurable.

2.9 Pre-registration and protocol deviations

The protocol was not pre-registered on PROSPERO because PROSPERO's scope is restricted to health-related reviews. The protocol is recorded internally at DataGlass Labs Research and is available on request. One deviation from the protocol: the post-screening corpus expanded the causal inference domain (Section 9) beyond the original scope after pilot extraction made clear that the primitives were structurally critical to the e-commerce sections.

3. Foundations: A Unified Language for Prediction and Risk-Aware Optimization

3.1 Decision-theoretic preliminaries

Let $θ \in Θ$ denote the (unknown) parameter, $X$ the (random) outcome, and $a \in A$ the action. The agent observes data $D$, forms a posterior $p(θ ∣ D)$, and chooses $a$ to minimize the expected value of a loss $L(a, X)$. The Bayes action is

$$a^{⋆} = \arg\min_{a \in A} E_{θ \sim p(θ ∣ D)} E_{X \sim p(X ∣ θ)} [L(a, X)].$$

This is the canonical Savage–Berger Bayesian decision-theoretic setup [10], [11]. Frequentist alternatives — minimax, empirical risk minimization, statistical learning theory — replace the outer expectation with worst-case or finite-sample analogues [12].

The four-layer decomposition that organizes the rest of this paper emerges immediately. The probabilistic model is $p (X ∣ θ) p (θ)$ ; the risk-aware objective is some functional $ρ [L (a, X)]$ generalizing $E [L]$ ; constraints restrict $A$ to feasible actions; and exploration ensures that $p (θ ∣ D)$ becomes concentrated where it matters most for $a^{⋆}$ .

The choice of loss therefore implicitly chooses the summary statistic of the posterior that the system optimizes: squared-error loss targets the mean, absolute-error loss the median, and pinball loss a chosen quantile. This point becomes operationally critical in Section 11.4 (forecasting) and Section 11.6 (advertising).
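The link between the loss and the statistic it selects can be checked numerically. The sketch below is illustrative only: the lognormal draws stand in for posterior-predictive samples, and the helper names (`bayes_action`, `pinball_loss`) are ours, not the paper's.

```python
import random
import statistics


def squared_loss(a, x):
    return (a - x) ** 2


def absolute_loss(a, x):
    return abs(a - x)


def pinball_loss(tau):
    """Quantile loss: under-prediction weighted tau, over-prediction 1 - tau."""
    def loss(a, x):
        return tau * (x - a) if x >= a else (1.0 - tau) * (a - x)
    return loss


def bayes_action(samples, loss, candidates):
    """Candidate action with the smallest Monte Carlo estimate of E[L(a, X)]."""
    return min(candidates, key=lambda a: sum(loss(a, x) for x in samples))


rng = random.Random(0)
# Skewed draws standing in for posterior-predictive samples (an assumption).
samples = [rng.lognormvariate(0.0, 1.0) for _ in range(501)]
candidates = sorted(samples)  # search over the observed values only

a_mean = bayes_action(samples, squared_loss, candidates)
a_median = bayes_action(samples, absolute_loss, candidates)
a_q90 = bayes_action(samples, pinball_loss(0.9), candidates)

print("squared loss  ->", round(a_mean, 3),
      "| sample mean:", round(statistics.mean(samples), 3))
print("absolute loss ->", round(a_median, 3),
      "| sample median:", round(statistics.median(samples), 3))
print("pinball(0.9)  ->", round(a_q90, 3))
```

On a skewed distribution the three actions differ materially, so a system trained with one loss and evaluated with another is optimizing the wrong statistic.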

3.2 Risk measures

Replacing $E$ with a risk measure $ρ$ produces a risk-aware objective. We catalogue the four classes that recur most frequently.

Variance and mean-variance. Markowitz [3] defines portfolio risk as variance and gives the canonical mean-variance program

$$\min_{w} \; w^{⊤} Σ w \quad \text{s.t.} \quad μ^{⊤} w = \bar{r}, \quad 1^{⊤} w = 1,$$

which produces the Markowitz frontier and, with a risk-free asset, the Capital Market Line of Sharpe [14]. The objection that variance penalizes upside as well as downside motivates downside-risk alternatives.

Value-at-Risk. $VaR_{α}(L) = \inf \{ℓ : \Pr(L \leq ℓ) \geq α\}$. VaR is the regulatory lingua franca of banking [15] but is non-coherent: it fails sub-additivity, so portfolio diversification can increase VaR.
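The failure of sub-additivity can be seen in a standard two-bond example. The sketch below assumes a hypothetical bond that loses 100 with probability 4% and nothing otherwise, and applies the VaR definition above to discrete loss distributions.

```python
from itertools import product


def value_at_risk(dist, alpha):
    """VaR_alpha(L) = inf{l : P(L <= l) >= alpha} for a discrete distribution.

    `dist` maps each loss value to its probability.
    """
    cumulative = 0.0
    for loss in sorted(dist):
        cumulative += dist[loss]
        if cumulative >= alpha - 1e-12:  # small tolerance for float sums
            return loss
    raise ValueError("probabilities sum to less than alpha")


def sum_independent(d1, d2):
    """Distribution of L1 + L2 for independent discrete losses."""
    out = {}
    for (l1, p1), (l2, p2) in product(d1.items(), d2.items()):
        out[l1 + l2] = out.get(l1 + l2, 0.0) + p1 * p2
    return out


# Hypothetical bond: loses 100 with probability 4%, otherwise loses 0.
bond = {0.0: 0.96, 100.0: 0.04}
portfolio = sum_independent(bond, bond)

var_single = value_at_risk(bond, 0.95)
var_portfolio = value_at_risk(portfolio, 0.95)
print("VaR_0.95 of one bond: ", var_single)
print("VaR_0.95 of two bonds:", var_portfolio)
```

Each bond alone has a 95% VaR of 0, yet the two-bond portfolio has a 95% VaR of 100, because the probability of at least one default (7.84%) exceeds 5%. The diversified position therefore looks riskier than the sum of its parts.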

Conditional Value-at-Risk (Expected Shortfall). $CVaR_{α} (L) = E [L ∣ L \geq VaR_{α} (L)]$ . Rockafellar and Uryasev [16], [17] showed that CVaR is coherent in the sense of Artzner et al. [18] and admits the Linear-Program reformulation

$$CVaR_{α}(L) = \min_{η} \left\{ η + \frac{1}{1 - α} E[(L - η)^{+}] \right\}.$$

When $L$ is linear in decisions, this becomes a tractable LP — a watershed result that made CVaR the dominant practical risk measure in modern portfolio optimization, capital allocation, and increasingly in fairness-aware machine learning.
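As a numerical check of the Rockafellar–Uryasev representation, the sketch below compares the minimization formula with the plain tail average on simulated losses. The Gaussian scenarios and the sample size are assumptions made for illustration.

```python
import random


def cvar_tail_mean(losses, alpha):
    """Average of the worst (1 - alpha) fraction of the loss scenarios."""
    ordered = sorted(losses)
    k = round((1 - alpha) * len(ordered))  # number of tail scenarios
    return sum(ordered[-k:]) / k


def cvar_rockafellar_uryasev(losses, alpha):
    """min over eta of eta + E[(L - eta)^+] / (1 - alpha).

    The objective is convex and piecewise linear in eta with kinks at the
    scenario values, so searching over the scenarios themselves is enough.
    """
    n = len(losses)

    def objective(eta):
        excess = sum(max(loss - eta, 0.0) for loss in losses)
        return eta + excess / ((1 - alpha) * n)

    return min(objective(eta) for eta in losses)


rng = random.Random(1)
losses = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # hypothetical scenarios

tail = cvar_tail_mean(losses, 0.95)
ru = cvar_rockafellar_uryasev(losses, 0.95)
print("tail mean:", round(tail, 4), " RU formula:", round(ru, 4))
```

When $(1 - α) n$ is an integer the two values coincide on the sample, which is why the auxiliary variable $η$ can be added to an optimization problem without changing its optimum.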

Convex and distortion risk measures. Föllmer and Schied [19] and Frittelli and Rosazza Gianin [20] generalize coherence to convex risk measures, which preserve the dual representation

$$ρ(L) = \sup_{Q \in \mathcal{Q}} \{ E_{Q}[L] - α(Q) \},$$

with $α (Q)$ a penalty function on test measures. Distortion risk measures of Wang [21] cover the actuarial premium-principle literature and recover CVaR as a special case of a piecewise-linear distortion. Spectral risk measures [22] sit between CVaR and the broader convex class.

3.3 Robust and distributionally robust optimization

When the distribution of $X$ is itself uncertain, the agent can hedge against the worst case in an ambiguity set $\mathcal{P}$:

$$\min_{a \in A} \sup_{P \in \mathcal{P}} E_{X \sim P}[L(a, X)].$$

Ben-Tal, El Ghaoui, and Nemirovski [23] systematize the deterministic robust counterpart for ellipsoidal uncertainty in linear and conic programs, achieving tractable second-order-cone reformulations. Bertsimas and Sim [24] give the "price of robustness" budgeted-uncertainty framework that interpolates between the nominal and worst-case problems with a tunable conservatism parameter $Γ$ .

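Distributionally robust optimization (DRO) replaces the uncertainty set over parameters with an ambiguity set over distributions, specified by moment conditions or by a divergence or transport-distance ball around a nominal distribution.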

Moment ambiguity. Delage and Ye [25] characterize the worst-case expectation under known mean and second-moment intervals and prove SDP reformulability.

$ϕ$-divergence ambiguity. Ben-Tal et al. [26] develop DRO under Kullback–Leibler, $χ^{2}$, and Hellinger ambiguity, with explicit worst-case expressions.

Wasserstein ambiguity. Esfahani and Kuhn [27] place a $p$ -Wasserstein ball of radius $ε$ around the empirical distribution and prove the equivalence

$$\sup_{Q : W_{p}(Q, \hat{P}_{n}) \leq ε} E_{Q}[L(a, X)] = \inf_{λ \geq 0} \left\{ λ ε^{p} + \frac{1}{n} \sum_{i=1}^{n} \sup_{x} \{ L(a, x) - λ ∥x - \hat{x}_{i}∥^{p} \} \right\},$$

which reduces Wasserstein DRO to a regularized empirical risk minimization. Blanchet, Murthy, and Si [28] establish duality and rate results; Gao and Kleywegt [29] give a comprehensive treatment. Wasserstein DRO has emerged as the unifying lens for both adversarial robustness in machine learning [30] and operational hedging in supply chains.
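To see why the dual behaves like a regularizer, consider a worked special case under two added assumptions that are ours rather than the source's: $p = 1$, and the loss $x \mapsto L(a, x)$ is $G$-Lipschitz. The derivation gives an upper bound on the worst-case expectation.

```latex
% Assumptions (not from the source): p = 1 and x -> L(a, x) is G-Lipschitz.
\begin{aligned}
\text{For } \lambda \ge G:\quad
  & L(a, x) - \lambda \lVert x - \hat{x}_i \rVert
    \le L(a, \hat{x}_i) + (G - \lambda) \lVert x - \hat{x}_i \rVert
    \le L(a, \hat{x}_i), \\
\text{so}\quad
  & \sup_{x} \bigl\{ L(a, x) - \lambda \lVert x - \hat{x}_i \rVert \bigr\}
    = L(a, \hat{x}_i)
    \quad (\text{attained at } x = \hat{x}_i). \\
\text{Taking } \lambda = G:\quad
  & \sup_{Q \,:\, W_1(Q, \hat{P}_n) \le \varepsilon} E_Q[L(a, X)]
    \;\le\; \frac{1}{n} \sum_{i=1}^{n} L(a, \hat{x}_i) + G \varepsilon .
\end{aligned}
```

In this case the robust objective is bounded by the empirical risk plus a penalty proportional to the Lipschitz constant of the loss, with the ball radius $ε$ acting as the regularization weight.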

The Knightian distinction between risk (probabilities known) and uncertainty (probabilities themselves uncertain) [6], formalized by Ellsberg [31] and Gilboa–Schmeidler [32], is the philosophical foundation of this entire literature. Its decision-theoretic analogue — the maxmin expected utility representation — is the link between behavioral economics and DRO.

3.4 Stochastic dynamic programming

When decisions are sequential, the natural formalism is the Markov Decision Process. The Bellman equation [4]

$$V^{⋆}(s) = \min_{a \in A(s)} \left\{ c(s, a) + γ E_{s' \sim P(\cdot ∣ s, a)}[V^{⋆}(s')] \right\},$$

is the universal recursion. Bertsekas [33] and Puterman [34] are the standard references for finite-state MDPs. Powell [35] develops approximate dynamic programming with explicit treatment of the curse of dimensionality across resource allocation, energy, and freight applications. Sutton and Barto [36] is the modern reinforcement-learning treatment.
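The recursion can be made concrete with value iteration on a toy problem. The two-state MDP below (state names, costs, and transitions) is entirely hypothetical and is chosen so that the fixed point is easy to verify by hand.

```python
def value_iteration(states, actions, cost, transition, gamma=0.9, tol=1e-10):
    """Iterate V(s) <- min_a { c(s,a) + gamma * sum_s' P(s'|s,a) V(s') }."""
    values = {s: 0.0 for s in states}
    while True:
        updated = {
            s: min(
                cost[s][a]
                + gamma * sum(p * values[s2]
                              for s2, p in transition[s][a].items())
                for a in actions
            )
            for s in states
        }
        if max(abs(updated[s] - values[s]) for s in states) < tol:
            return updated
        values = updated


# Hypothetical toy MDP: staying "degraded" costs 1 per period forever,
# while paying 3 once moves the system to "healthy", which costs nothing.
states = ("degraded", "healthy")
actions = ("stay", "move")
cost = {
    "degraded": {"stay": 1.0, "move": 3.0},
    "healthy": {"stay": 0.0, "move": 3.0},
}
transition = {
    "degraded": {"stay": {"degraded": 1.0}, "move": {"healthy": 1.0}},
    "healthy": {"stay": {"healthy": 1.0}, "move": {"degraded": 1.0}},
}

V = value_iteration(states, actions, cost, transition)
print(V)
```

Staying degraded forever would cost $1 / (1 - 0.9) = 10$ in discounted terms, so the recursion settles on the one-off repair and $V^{⋆}(\text{degraded}) = 3$. Transitions are deterministic here only to keep the arithmetic checkable; the same function accepts stochastic kernels.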

Risk-sensitive DP. Howard and Matheson [37] introduced risk-sensitive Bellman recursions with exponential utility. Ruszczyński [38] establishes the time-consistency conditions under which a Markov risk-measure formulation yields a tractable DP. The risk-averse Bellman recursion is

$$V^{⋆}(s) = \min_{a} \left\{ c(s, a) + γ ρ_{s' ∣ s, a}[V^{⋆}(s')] \right\},$$

where $ρ$ is a coherent (or convex) Markov risk measure. Time-consistency requires that $ρ$ admits a translation-equivariant decomposition.

Constrained MDPs. Altman [39] develops the theory of constrained MDPs in which the agent maximizes a primary expected reward subject to expected-cost constraints. Lagrangian duality reduces the constrained MDP to an unconstrained one with cost $c(s, a) + λ^{⊤} g(s, a)$, with $λ$ the dual variable. This is the formal home of every "budget-constrained" decision system reviewed in Sections 4, 8, 9, and 11.
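The role of the dual variable as a shadow price is easiest to see in a static simplification, which is what the sketch below uses instead of a full constrained MDP. It assumes hypothetical campaigns with concave response $v\,b / (k + b)$ and bisects on $λ$ until total spend meets the budget; all names and numbers are illustrative.

```python
import math


def spend_at_price(v, k, lam):
    """Spend b at which the marginal return v*k/(k+b)^2 equals lam.

    Returns zero when even the first unit of spend earns less than lam.
    """
    return max(math.sqrt(v * k / lam) - k, 0.0)


def allocate(campaigns, budget, iters=200):
    """Bisect on the shadow price lam until total spend meets the budget.

    Assumes the budget binds, i.e. unconstrained spend would exceed it.
    """
    lo, hi = 1e-12, max(v / k for v, k in campaigns)
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        total = sum(spend_at_price(v, k, lam) for v, k in campaigns)
        if total > budget:
            lo = lam  # overspending: the price of budget must rise
        else:
            hi = lam
    return hi, [spend_at_price(v, k, hi) for v, k in campaigns]


# Hypothetical (v, k) pairs: saturation level and half-saturation spend.
campaigns = [(500.0, 100.0), (300.0, 50.0), (50.0, 200.0)]
shadow_price, spends = allocate(campaigns, budget=300.0)
print("shadow price:", round(shadow_price, 4))
print("spends:", [round(b, 2) for b in spends])
```

At the solution every funded campaign has the same marginal return, equal to the shadow price, and the third campaign receives nothing because its first unit of spend earns less than that price.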

Robust and DR MDPs. Iyengar [40] and Nilim and El Ghaoui [41] develop robust MDPs with rectangular ambiguity in the transition kernel; Wiesemann, Kuhn, and Rustem [42] extend to convex ambiguity. Distributionally robust dynamic programming has emerged as a unified language for safe RL [43].

3.5 Online learning and bandits

When the environment is observed sequentially and the agent must trade off exploration and exploitation, the relevant theory is online learning [44], [45] and multi-armed bandits [46]. The canonical regret bound for the upper-confidence-bound (UCB) algorithm of Auer, Cesa-Bianchi, and Fischer [47] is

$$R_{T} = O(\sqrt{K T \log T}),$$

matched up to constants by Thompson Sampling [48], [49], which is empirically superior under model misspecification and delayed feedback [50]. Under budget constraints, Bandits with Knapsacks [51] gives an $\tilde{O}(OPT / \sqrt{B})$ regret bound directly relevant to advertising allocation [1]. Under non-stationarity, dynamic-regret bounds of Besbes, Gur, and Zeevi [52] yield

$$R_{T} = O((K V_{T})^{1/3} T^{2/3}),$$

where $V_{T}$ is the path variation; Chen, Lee, and Luo [53] sharpen this to $\tilde{O}(\sqrt{V_{T}^{res} T})$ when side information is available. Hazan [45] develops online convex optimization with $O(\sqrt{T})$ regret via online gradient descent, $O(\log T)$ under strong convexity, and the mirror-descent generalization [54], [55] that connects to natural gradients and Bregman divergences.
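For readers who want the exploration primitive in executable form, the following is a minimal sketch of the UCB1 index rule on simulated Bernoulli arms. The arm means, horizon, and seed are hypothetical, and the code is not the DataGlass implementation.

```python
import math
import random


def ucb1(pull, n_arms, horizon):
    """UCB1: play each arm once, then maximize mean + sqrt(2 ln t / n_i)."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization round: try every arm once
        else:
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts


rng = random.Random(0)
true_means = [0.2, 0.5, 0.8]  # hypothetical Bernoulli success rates


def pull(arm):
    return 1.0 if rng.random() < true_means[arm] else 0.0


horizon = 5000
counts = ucb1(pull, n_arms=len(true_means), horizon=horizon)
print("pulls per arm:", counts)
```

The confidence bonus shrinks as an arm accumulates pulls, so the suboptimal arms are sampled only logarithmically often and most of the horizon goes to the best arm.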

The four primitives — calibrated probabilistic model, risk-aware objective, constraint set, exploration mechanism — appear in every domain reviewed below. The remainder of this paper is, in essence, a taxonomy of how those primitives are instantiated.

3.6 Model predictive control and receding-horizon optimization

Model predictive control (MPC) [57], [58] is the cross-domain workhorse for sequential constrained decision problems with a finite look-ahead. At each step, the agent solves

$$\min_{u_{t:t+H}} \sum_{k=t}^{t+H} c(s_{k}, u_{k}) \quad \text{s.t.} \quad s_{k+1} = f(s_{k}, u_{k}, w_{k}), \quad (s_{k}, u_{k}) \in Z,$$

implements only $u_{t}$, observes the new state, and re-solves. The receding-horizon principle is the engineering counterpart of the Bellman recursion; it is the dominant deployment pattern in process control, autonomous driving [59], and energy-system unit commitment [35], and is increasingly used for inventory and ad-budget allocation under known short-horizon dynamics. Stochastic MPC [60] and tube MPC [61] add explicit uncertainty handling.
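The plan, act, observe, re-plan loop can be sketched on a scalar toy system. Everything here is an illustrative assumption: the dynamics $s_{k+1} = s_{k} + u_{k} + w_{k}$, the quadratic cost, the three-element control set, and the brute-force planner, which uses $H$ stage costs plus a terminal cost rather than the exact summation range above.

```python
import random
from itertools import product

CONTROLS = (-1.0, 0.0, 1.0)  # hypothetical admissible control set


def plan(state, horizon, control_weight=0.1):
    """Cost-minimizing control sequence under the nominal model (w = 0)."""
    def rollout_cost(sequence):
        s, total = state, 0.0
        for u in sequence:
            total += s * s + control_weight * u * u
            s = s + u  # nominal dynamics f(s, u, 0)
        return total + s * s  # terminal cost on the final state

    return min(product(CONTROLS, repeat=horizon), key=rollout_cost)


def run_mpc(initial_state, steps, horizon=3, noise=0.2, seed=0):
    """Receding-horizon loop: plan, apply the first control, observe, repeat."""
    rng = random.Random(seed)
    s, trajectory = initial_state, [initial_state]
    for _ in range(steps):
        u = plan(s, horizon)[0]                 # implement only the first move
        s = s + u + rng.uniform(-noise, noise)  # true system adds a disturbance
        trajectory.append(s)
    return trajectory


trajectory = run_mpc(initial_state=5.0, steps=15)
print([round(s, 2) for s in trajectory])
```

Re-planning at every step is what absorbs the disturbance: the planner never models $w_{k}$, yet the closed loop drives the state toward zero and holds it there.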

3.7 Online convex optimization, mirror descent, and primal-dual methods

The online-convex-optimization (OCO) framework [44], [45] provides regret guarantees for sequential decisions against arbitrary convex losses. Online gradient descent achieves $O(\sqrt{T})$ regret; online Newton step [62] achieves $O(\log T)$ under exp-concavity; follow-the-regularized-leader (FTRL) and online mirror descent (OMD) [54], [63] achieve $O(\sqrt{T \log d})$ in $d$ dimensions on the simplex. Balseiro, Lu, and Mirrokni [64], [65] develop dual mirror descent for online allocation with simultaneous regret and constraint-violation guarantees, directly relevant to budget-pacing problems in advertising. Gordon, Greenwald, and Marks [66] give the no-regret-to-correlated-equilibrium link.
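A minimal sketch of projected online gradient descent makes the regret statement concrete. It assumes a one-dimensional decision in $[-1, 1]$, squared losses against hypothetical targets, and the standard step size $D / (G \sqrt{t})$ with domain diameter $D = 2$ and gradient bound $G = 4$.

```python
import math
import random


def project(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the interval [lo, hi]."""
    return min(max(x, lo), hi)


def online_gradient_descent(targets, diameter=2.0, grad_bound=4.0):
    """Projected OGD on f_t(x) = (x - y_t)^2 with step D / (G * sqrt(t))."""
    x, plays = 0.0, []
    for t, y in enumerate(targets, start=1):
        plays.append(x)                # commit to x before seeing y_t
        gradient = 2.0 * (x - y)       # gradient of f_t at the played point
        step = diameter / (grad_bound * math.sqrt(t))
        x = project(x - step * gradient)
    return plays


def regret(plays, targets):
    """Cumulative loss minus that of the best fixed decision in hindsight."""
    best = project(sum(targets) / len(targets))
    incurred = sum((x - y) ** 2 for x, y in zip(plays, targets))
    benchmark = sum((best - y) ** 2 for y in targets)
    return incurred - benchmark


rng = random.Random(0)
targets = [rng.uniform(-1.0, 1.0) for _ in range(1000)]  # hypothetical data
plays = online_gradient_descent(targets)
print("regret:", round(regret(plays, targets), 3),
      " bound 1.5*G*D*sqrt(T):", round(12 * math.sqrt(len(targets)), 1))
```

The printed bound is the textbook $\tfrac{3}{2} G D \sqrt{T}$ guarantee for this step size; squared losses are strongly convex, so the realized regret is typically far below it.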

4. Domain Review I — Finance and Quantitative Risk Management

4.1 Mean-variance and the CAPM

The intellectual genealogy of risk-aware optimization runs from Markowitz [3] through Tobin's separation theorem [67] to Sharpe's CAPM [14], Lintner [68], and Mossin [69]. The mean-variance frontier, the tangency portfolio, and the equilibrium pricing relation $E [r_{i}] - r_{f} = β_{i} (E [r_{M}] - r_{f})$ are the foundation of every modern asset-allocation system. Empirical critiques of the variance-as-risk premise (Mandelbrot [70], Fama [71]) and of the constant-beta CAPM (Fama and French [72], [73]) drive subsequent extensions: APT [74], multi-factor models, conditional CAPM, and the q-factor model of Hou, Xue, and Zhang [75].

4.2 CVaR portfolio optimization and coherent risk

The Rockafellar–Uryasev linearization [16] enabled large-scale portfolio optimization with downside-risk objectives:

$$\min_{w, η, z} \; η + \frac{1}{(1 - α) N} \sum_{n=1}^{N} z_{n} \quad \text{s.t.} \quad z_{n} \geq L_{n}(w) - η, \; z_{n} \geq 0, \; μ^{⊤} w \geq \bar{r}, \; 1^{⊤} w = 1, \; w \geq 0,$$

with $L_{n} (w) = - r_{n}^{⊤} w$ . The resulting LP scales to thousands of assets and millions of scenarios. CVaR is the standard regulatory measure under the Basel III Fundamental Review of the Trading Book and is the internal economic-capital measure for most large insurers under Solvency II [15].

4.3 Robust portfolio optimization

Ben-Tal, El Ghaoui, and Nemirovski [23], Goldfarb and Iyengar [76], and Tütüncü and Koenig [77] develop robust mean-variance and robust factor-model portfolios in which the moment estimates $(\hat{μ}, \hat{Σ})$ are themselves treated as uncertain. The robust-portfolio result is that introducing modest ambiguity typically improves the out-of-sample Sharpe ratio relative to the plug-in MV portfolio, formalizing the practitioner intuition that "shrinkage works." Ledoit and Wolf's covariance shrinkage [78], [79] is the closely related Bayesian/Stein answer; Bayesian portfolio choice in the Black–Litterman tradition [80] integrates investor views with market equilibrium.

4.4 Algorithmic execution: Almgren–Chriss

The execution problem — sell $X$ shares over $T$ time steps minimizing expected cost plus a multiple of cost variance — is solved analytically by Almgren and Chriss [81], producing the efficient frontier of execution. The optimal trajectory satisfies

$$n_{k} = \frac{2 \sinh(κ τ / 2)}{\sinh(κ T)} \cosh\left(κ \left(T - (k - \tfrac{1}{2}) τ\right)\right) X,$$

where $τ$ is the length of one time step ($τ = 1$ when $T$ counts steps) and $κ \approx \sqrt{λ σ^{2} / η}$ to leading order in $τ$, so the trajectory depends on volatility $σ$, temporary impact $η$, and risk aversion $λ$. Subsequent work (Obizhaeva and Wang [82], Gatheral [83], Cartea and Jaimungal [84]) refines impact dynamics, introduces transient impact, and integrates with stochastic order books.
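The trade list is simple to evaluate. The sketch below uses the standard Almgren–Chriss form with an explicit step length `tau` (set `tau = 1` when $T$ counts steps) and takes $κ$ as an input; the order size, number of steps, and value of $κ$ are hypothetical.

```python
import math


def almgren_chriss_trades(total_shares, n_steps, kappa, tau=1.0):
    """Trade list n_k for k = 1..N in the Almgren-Chriss solution.

    n_k = 2 sinh(kappa*tau/2) / sinh(kappa*T)
          * cosh(kappa * (T - (k - 1/2) * tau)) * X,   with T = N * tau.
    """
    horizon = n_steps * tau
    scale = 2.0 * math.sinh(kappa * tau / 2.0) / math.sinh(kappa * horizon)
    return [
        scale * math.cosh(kappa * (horizon - (k - 0.5) * tau)) * total_shares
        for k in range(1, n_steps + 1)
    ]


trades = almgren_chriss_trades(total_shares=10_000, n_steps=10, kappa=0.3)
print([round(n) for n in trades])
print("total:", round(sum(trades), 6))
```

The trades sum to the full order for any $κ > 0$, and a larger $κ$ (more risk aversion or volatility relative to impact) front-loads the schedule; as $κ \to 0$ the list approaches equal slices.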

4.5 Derivatives, hedging, and stochastic control

Black, Scholes, and Merton [85], [86] gave the parabolic PDE characterization of European option prices under continuous hedging, and the Merton consumption-investment problem [87] introduced dynamic-programming reasoning to portfolio choice. Cont and Tankov [88] survey the jump-process extensions; Glasserman [89] develops the Monte Carlo machinery for derivative pricing under non-trivial dynamics. Cartea, Jaimungal, and Penalva [90] integrate stochastic control with limit-order-book microstructure to give the modern algorithmic-trading reference. Local-volatility [91] and stochastic-volatility models [92], [93] capture the volatility smile; Carr and Wu [94] survey the variance-swap literature.

4.6 Statistical arbitrage and high-frequency trading

Avellaneda and Lee [95] formalize the cointegration-based statistical-arbitrage strategy as an Ornstein–Uhlenbeck mean-reversion problem; Guéant, Lehalle, and Fernandez-Tapia [96] solve the market-making problem under inventory risk with closed-form bid–ask quotes. Both are stochastic-control problems whose solution structure — quotes that depend on inventory — is the natural cousin of the budget-allocation problem in marketplace ads, where utilization plays the role of inventory.

4.7 Credit risk and counterparty risk

Merton's structural model [97] and the reduced-form intensity-based models of Duffie and Singleton [98] are the two foundational families. Brigo, Morini, and Pallavicini [99] cover counterparty-risk and CVA modeling. Tail-risk dependence is modeled via copulas [100], [101]; Embrechts, McNeil, and Straumann [102] document the dangers of linear-correlation thinking under heavy tails.

4.8 Macro-finance and central-bank decision-making

Svensson [103] develops the linear-quadratic regulator interpretation of monetary policy under inflation targeting, with explicit Bellman recursions. The robust-control extensions of Hansen and Sargent [104] introduce model uncertainty into macroeconomic policy rules. The conceptual link to e-commerce decision intelligence is the design of feedback policies under model misspecification — exactly the problem DataGlass solves at the campaign level [1].

5. Domain Review II — Operations and Supply Chain

5.1 The newsvendor

The newsvendor [105], [106] is the canonical single-period, risk-neutral decision under demand uncertainty:

q^{⋆} = F^{-1}\left(\frac{c_{u}}{c_{u} + c_{o}}\right),

where $c_{u}, c_{o}$ are per-unit underage and overage costs and $F$ is the demand CDF. Its risk-aware extensions — CVaR newsvendor [107], distributionally robust newsvendor [108], data-driven newsvendor with the SAA bound of Levi, Roundy, and Shmoys [109] — collectively form the simplest non-trivial laboratory in which the four primitives of Section 3 can be studied in closed form. Ban and Rudin [110] give a contextual newsvendor with covariate-dependent demand and prove $O (n^{- 1/2})$ rates.

5.2 Multi-period inventory and base-stock policies

Scarf [111] proves the optimality of $(s, S)$ policies for inventory under fixed ordering costs; Clark and Scarf [112] extend to multi-echelon systems; Federgruen and Zipkin [113] establish base-stock optimality under the average-cost criterion. The unifying result is that the inventory problem is a constrained MDP whose optimal policy is characterized by a single threshold per state, and is therefore amenable to ADP at scale [35].

5.3 Pricing and revenue management

Talluri and van Ryzin [114] is the reference for revenue management under capacity constraints. The dynamic-pricing literature of Gallego and van Ryzin [115], with the data-driven extensions of Besbes and Zeevi [116] and Ferreira, Lee, and Simchi-Levi [117], is the conceptual link between operations and the e-commerce-pricing literature reviewed in Section 11.2. Cohen, Perakis, and Pindyck [118] formalize promotion optimization as a mixed-integer program with stochastic demand.

5.4 Supply chain coordination and contract design

Cachon's survey [119] is the canonical operations-research reference for coordinating contracts (buyback, revenue-sharing, quantity-flexibility). The bullwhip-effect analysis of Lee, Padmanabhan, and Whang [120] is structurally relevant to ad-spend cascading on cross-channel attribution, where small upstream shocks amplify downstream.

5.5 Network revenue management

The displacement-cost framework of Talluri and van Ryzin [121] for airline network revenue management — under which a Lagrangian shadow price is computed for each capacity-constrained leg — is the closest operations-research analogue of the multi-campaign budget shadow price $μ^{⋆}$ in DataGlass [1, Section V]. Adelman [122] develops affine-policy approximations; Topaloglu [123] applies the framework to stochastic resource allocation.

6. Domain Review III — Insurance and Actuarial Science

The actuarial tradition is the original home of tail-risk modeling. The Cramér–Lundberg model [124] of an insurance surplus process

U_{t} = u + c t - \sum_{i=1}^{N_{t}} X_{i},

with $N_{t}$ a Poisson claim-arrival process, gives the foundational ruin probability $ψ(u) = \Pr(\inf_{t \geq 0} U_{t} < 0)$ and the Lundberg upper bound $ψ(u) \leq e^{-R u}$, with $R$ the adjustment coefficient. Extreme Value Theory [125], [126] characterizes the limiting distribution of normalized maxima as a Generalized Extreme Value distribution; Pickands' theorem [127] gives the Generalized Pareto for threshold exceedances. McNeil, Frey, and Embrechts [15] is the modern reference. Distortion risk measures [21] formalize the actuarial premium principle as a coherent risk functional. Solvency-II internal-model regulation [128] requires insurers to quantify a one-year 99.5% VaR, a regulatory analogue of the Basel III trading-book CVaR requirement, with similar moral-hazard and capital-arbitrage concerns [15].

The relevance to e-commerce is direct in two places: the heavy-tailed nature of conversion-rate outliers under viral demand [129], and the modeling of return-rate tails for true-ROAS adjustment [1, Section III].

7. Domain Review IV — Energy Systems and Stochastic Resource Allocation

Powell [35] is the canonical reference for energy-system stochastic optimization. The unit-commitment problem under wind and solar uncertainty — schedule generators over $T$ hours minimizing expected cost subject to ramp, capacity, and reserve constraints — is the canonical large-scale application of stochastic programming with recourse. Two-stage formulations [130], multi-stage stochastic dual dynamic programming [131], and stochastic MPC with chance constraints [132] dominate the operational literature. Storage and demand-response add a state-dependent constraint structure that closely resembles the inventory-dilution mechanism in DataGlass [1, Section IV.C]. Distributionally robust energy planning under climate-scenario ambiguity is an active research frontier [133].

8. Domain Review V — Healthcare Operations and Decision Support

In healthcare, Ayer, Alagoz, and Stout [134] formulate the breast-cancer screening decision as a partially observed MDP. Sutton et al. [135] survey RL in clinical decision support; Komorowski et al. [136] use off-policy evaluation to derive sepsis-treatment policies on ICU data. Bertsimas et al. [137] develop the decision rule approach to clinical optimization, a constrained DRO formulation. The off-policy evaluation problem [138], [139] is the methodological bottleneck shared with e-commerce recommendation: the inability to A/B-test arbitrary policies in production puts a disproportionate weight on causal-inference machinery.

9. Domain Review VI — Causal Inference and Program Evaluation

A complete meta-review must include causal inference, because every production decision system eventually faces the question "is the policy actually causing the lift, or are we observing a confound?"

9.1 Potential-outcomes framework

Neyman [140], Rubin [141], Imbens and Rubin [142] develop the potential-outcomes language: the causal effect of treatment $T$ on outcome $Y$ for unit $i$ is $τ_{i} = Y_{i} (1) - Y_{i} (0)$ , with the fundamental problem of causal inference being that only one potential outcome is observed per unit. Pearl [143] develops the do-calculus and graphical-model formulation that complements the potential-outcomes view.

9.2 Randomized experiments

Fisher [144] introduced randomization inference; modern extensions include adaptive [145] and contextual [146] designs. Athey and Imbens [147] survey field experiments in economics. Gordon, Zettelmeyer, Bhargava, and Chapsky [148] establish the gold standard for advertising measurement: only randomized experiments produce reliably unbiased lift estimates, with observational and quasi-experimental approaches typically biased by 30–100% in advertising contexts.

9.3 Quasi-experimental methods

Difference-in-differences [149] and its modern extensions [150] are the standard for staggered policy adoption; regression discontinuity [151], [152] exploits eligibility thresholds; instrumental variables [153] with the LATE interpretation of Imbens and Angrist [154] handle endogenous treatment; synthetic controls of Abadie, Diamond, and Hainmueller [155], [156] construct counterfactual time series from donor pools.

9.4 Machine learning for causal inference

Athey and Imbens [157] introduce causal trees; Wager and Athey [158] extend to causal forests with asymptotic normality. Chernozhukov et al. [159] develop double/debiased machine learning (DML) with the Neyman-orthogonal score $ψ(W; θ, η) = (Y - ℓ(X) - θ(D - m(X)))(D - m(X))$, achieving $\sqrt{n}$-consistent treatment-effect estimation even when the nuisance functions converge only slightly faster than $n^{-1/4}$. Künzel et al. [160] introduce meta-learners (S-, T-, X-, R-learners) for heterogeneous treatment effects. Nie and Wager [161] develop the R-learner with quasi-oracle efficiency.

9.5 Sensitivity analysis and unobserved confounding

Rosenbaum [162] gives the bounding-parameter approach to sensitivity analysis. Oster [163] proves coefficient-stability sensitivity under proportional selection on observables and unobservables; Cinelli and Hazlett [164] extend to omitted-variable bias with explicit sensitivity statistics. The DataGlass system uses Oster-style sensitivity together with first-differencing and randomized perturbation [1, Section VIII].

9.6 Off-policy evaluation

Inverse propensity scoring [165], doubly robust estimators of Bang and Robins [166] and Dudík, Erhan, Langford, and Li [139], targeted maximum likelihood of van der Laan and Rose [167], and the per-decision importance weighting of Precup, Sutton, and Singh [168] are the standard machinery. Swaminathan and Joachims [169] develop counterfactual risk minimization for batch contextual bandits. The methodological lesson — off-policy evaluation has variance proportional to importance-ratio range — explains why exploration design (Section 3.5) is structurally tied to evaluation feasibility.

10. Domain Review VII — Online Learning, Online Convex Optimization, and Conformal Prediction

10.1 Online convex optimization

Zinkevich [44], Hazan, Agarwal, and Kale [45], and Cesa-Bianchi and Lugosi [170] develop OCO with adversarial-loss regret guarantees. Online gradient descent achieves $O(\sqrt{T})$ regret; online Newton step [62] achieves $O(\log T)$ under exp-concavity. Mirror descent [54], [55] generalizes to non-Euclidean geometries via Bregman divergences and is the umbrella under which exponentiated-gradient updates [171] for the simplex (relevant to multi-asset and multi-campaign allocations) sit.
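
The exponentiated-gradient update on the simplex can be sketched in a few lines. This is a minimal illustration under stated assumptions: the loss vectors, the step size `eta`, and the two-campaign setup are hypothetical and are not taken from any of the cited systems.

```python
import math


def exponentiated_gradient(loss_vectors, eta):
    """Run exponentiated-gradient updates on the probability simplex.

    loss_vectors: list of per-round linear loss vectors (one entry per arm).
    eta: step size (an assumed tuning parameter).
    Returns (played, final_w): the weights played each round and the final weights.
    """
    k = len(loss_vectors[0])
    w = [1.0 / k] * k  # start from the uniform allocation
    played = []
    for loss in loss_vectors:
        played.append(list(w))
        # Multiplicative update followed by renormalization onto the simplex.
        w = [wi * math.exp(-eta * li) for wi, li in zip(w, loss)]
        total = sum(w)
        w = [wi / total for wi in w]
    return played, w


# Hypothetical two-campaign example: campaign 0 always incurs the lower loss.
losses = [[0.2, 0.8]] * 50
played, final_w = exponentiated_gradient(losses, eta=0.5)
```

The weights stay on the simplex by construction, and the allocation concentrates on the lower-loss campaign as rounds accumulate.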

10.2 No-regret learning and game-theoretic implications

Hannan consistency [172], the weighted-majority algorithm of Littlestone and Warmuth [173], and the Cesa-Bianchi–Lugosi prediction-with-expert-advice machinery [170] establish that the empirical distribution of no-regret play converges to the set of coarse correlated equilibria [66]. Roughgarden [174] develops the price of anarchy analysis; Foster and Vohra [175] provide calibration via no-regret learning.

10.3 Conformal prediction

Vovk, Gammerman, and Shafer [176], with the modern split-conformal extensions of Lei et al. [177] and adaptive variants of Romano, Patterson, and Candès [178], produce distribution-free prediction sets with finite-sample coverage guarantees. The framework is the natural complement to Bayesian uncertainty: where Bayesian credibility requires a correctly specified prior, conformal coverage requires only exchangeability. Both are relevant to e-commerce decision systems where calibrated uncertainty intervals on demand or response curves drive the downstream optimizer.
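
A minimal sketch of split conformal prediction follows. It assumes a pre-fitted point forecaster (here a hypothetical `predict` function) and synthetic data generated by a hypothetical `draw` helper; only the calibration step is shown.

```python
import math
import random


def split_conformal_halfwidth(cal_residuals, alpha):
    """Half-width q such that [pred - q, pred + q] has marginal coverage
    of at least 1 - alpha when calibration and test points are exchangeable."""
    n = len(cal_residuals)
    # Finite-sample-corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(cal_residuals)[rank - 1]


def predict(x):
    """Hypothetical pre-fitted point forecaster."""
    return 2.0 * x


def draw():
    """Hypothetical data-generating process used only for illustration."""
    x = random.uniform(0, 10)
    return x, 2.0 * x + random.gauss(0, 1)


random.seed(0)
calibration = [draw() for _ in range(500)]
residuals = [abs(y - predict(x)) for x, y in calibration]
q = split_conformal_halfwidth(residuals, alpha=0.1)

test_points = [draw() for _ in range(2000)]
coverage = sum(abs(y - predict(x)) <= q for x, y in test_points) / len(test_points)
```

The guarantee is marginal and rests only on exchangeability, so `coverage` should sit near the nominal 90% regardless of how well the forecaster is specified.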

10.4 Distribution shift and OOD generalization

Quiñonero-Candela, Sugiyama, Schwaighofer, and Lawrence [179] and Sugiyama and Kawanabe [180] are foundational for covariate-shift adaptation. Arjovsky, Bottou, Gulrajani, and Lopez-Paz [181] develop invariant risk minimization; Sagawa et al. [182] develop group DRO. The connection to operational decision systems is direct: the platform-side learning algorithm in marketplace ads continuously shifts the response distribution, so static models go stale at characteristic time-scales documented in [1, Section VII].

11. Domain Review VIII — E-Commerce Prediction and Optimization

The e-commerce decision-intelligence stack is composed of seven interacting subsystems. We summarize each.

11.1 Demand forecasting

The historical workhorse is the Box–Jenkins ARIMA family [183]; the modern industrial benchmark is exponential smoothing in the unified state-space formulation of Hyndman and Athanasopoulos [184]. The M-competitions [185], [186], [187] document a steady decline in the relative performance of pure statistical methods and the rise of hybrid global models — DeepAR [188], N-BEATS [189], Temporal Fusion Transformer [190], NHITS [191], TimesNet [192] — culminating in the M5 competition [187], where global gradient-boosted models with hierarchical reconciliation [193] won. Smyl [194] gives the hybrid ES-RNN that won M4. The methodological lesson is that cross-series information sharing, not model class per se, drives the gain. Quantile-regression objectives produce calibrated prediction intervals essential for downstream inventory and pricing decisions [110].

11.2 Pricing and elasticity

Demand modeling at the SKU level uses constant-elasticity, multinomial logit [195], and mixed-logit specifications [196]; Berry, Levinsohn, and Pakes (BLP) [197] address price endogeneity via instrumental variables. Modern e-commerce pricing systems combine these structural models with scalable contextual bandits [146] and Thompson-sampled exploration [198]. The Lerner condition $(p^{⋆} - c) / p^{⋆} = 1/∣ ε ∣$ recovers the closed-form optimum under constant elasticity. The DataGlass internal note on elasticity modeling and bundle pricing [199] gives a worked treatment for SKU-level pricing under cross-elasticity.
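
As a worked illustration of the Lerner condition, the sketch below assumes constant-elasticity demand $q(p) = A p^{ε}$ with $ε < -1$, for which the optimum is $p^{⋆} = c ∣ε∣ / (∣ε∣ - 1)$. The cost, elasticity, and scale values are hypothetical.

```python
def lerner_price(cost, elasticity):
    """Profit-maximizing price under constant-elasticity demand."""
    abs_e = abs(elasticity)
    if abs_e <= 1.0:
        raise ValueError("demand must be elastic (|elasticity| > 1)")
    return cost * abs_e / (abs_e - 1.0)


def profit(price, cost, elasticity, scale=1000.0):
    """Profit (p - c) * A * p**elasticity; scale A is a hypothetical constant."""
    return (price - cost) * scale * price ** elasticity


c, eps = 10.0, -2.5
p_star = lerner_price(c, eps)
markup = (p_star - c) / p_star  # should equal 1 / |eps|
```

Perturbing `p_star` by a few percent in either direction lowers `profit`, which is a quick numerical check on the closed form.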

11.3 Recommendation and personalization

Collaborative filtering [200], matrix factorization [201], and deep recommendation models [202], [203] are the workhorses. Counterfactual evaluation via inverse propensity scoring [204] and doubly robust estimators [139] is the bridge from offline data to online decision; bandit-based learning-to-rank [205] is the closed-loop variant. Attention-based and Transformer recommendation models [206], [207] are now standard at scale. The methodological lesson — recommendation is off-policy reinforcement learning under partial logging — is widely accepted but operationally constraining.

11.4 Promotion and assortment optimization

Cohen et al. [118] formalize multi-product promotion as a MIP with stochastic demand. Assortment optimization under MNL choice [208], [209] gives the revenue-ordered optimal assortment for capacitated problems; Bernstein, Modaresi, and Sauré [210] extend this to dynamic assortment with learning. The unifying language is combinatorial optimization with side information and exploration.

11.5 Real-time bidding and display advertising

RTB is the closest finance-style problem in advertising, because the agent does bid directly. Cai et al. [211] formulate RTB as an MDP with neural-network value approximation, deriving the optimal bid

b_{i}^{⋆} = V^{⋆} (s_{win}) - V^{⋆} (s_{lose}),

with documented production lifts of 16.7% over baselines. Wu et al. [212] add budget constraints via Lagrangian relaxation, giving $b_{i} = v_{i} / λ$ for a budget multiplier $λ$ learned by DQN, achieving 23.4% click-improvement under strict budget compliance. Zhao et al. [213] aggregate to hour-level MDPs for sponsored search, using twin Q-networks with replay refresh. He et al. [214] propose hierarchical RL (HiBid). Wang et al. [215] address ROI-constrained bidding via curriculum-guided Bayesian RL. Liu et al. [216] provide a rigorous ablation. The literature is mature; its limitation, from the perspective of this review, is that it assumes the agent controls bids — an assumption that fails for marketplace platforms (Section 11.7).

11.6 Auction theory and platform mechanism design

Vickrey [217], Myerson [218], and, more recently, Athey and Segal [219] deliver the foundational mechanism-design theory. Edelman, Ostrovsky, and Schwarz [220] and Varian [221] analyze generalized second-price auctions for sponsored search. Rawat [222] documents algorithmic-collusion bid suppression in first-price auctions when bidders learn. This is relevant because most modern marketplaces have moved from second-price to first-price auctions [223], which affects the seller's effective response curve [1, Section VIII].

11.7 Marketplace ad budget allocation

Modern marketplace platforms — Shopee, Lazada, TikTok Shop, Amazon Sponsored Products, Walmart Connect, Mercado Libre Ads — do not expose bid-level control. The seller sets only daily budget and target ROAS; the platform's auto-bidder mediates the auction. This is a structurally different problem: the action space is one-dimensional per campaign, the environment is opaque, and the platform actively penalizes high-frequency intervention through learning-phase mechanics. The DataGlass system [1] is the first end-to-end production system that addresses this problem class explicitly. Methodologically it is a synthesis of Hill-saturation response modeling [224], Negative-Binomial overdispersion handling [225], Beta-Binomial conversion modeling with utilization-dependent dilution [226], constrained-portfolio optimization via shadow-price bisection [3], [64], Thompson-sampled exploration [48], CUSUM changepoint detection [227], [228], and randomized perturbation experiments [148]. Section 12.11 returns to this system as the connecting case.

The seller-pain quantification of the companion research article [2] documents the analytical mechanism by which manual heuristics — trial-and-error reallocation, rolling-mean extrapolation of reported ROAS, gut-feel reallocation — are systematically biased estimators of marginal contribution. The true profit-adjusted ROAS

R_{i}^{true} = R_{i}^{rep} \cdot m_{g} \cdot (1 - r) \cdot (1 - f - ρ)

differs from the reported dashboard figure by an order of magnitude in plausible parameterizations [2, Section 3.1].
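
The adjustment can be made concrete with a short calculation. The symbols are not defined in this section, so the sketch assumes $m_{g}$ is a gross-margin fraction, $r$ a return rate, $f$ a platform-fee rate, and $ρ$ a further cost share; these readings and all numeric values are hypothetical.

```python
def true_roas(reported_roas, gross_margin, return_rate, fee_rate, other_rate):
    """Apply R_true = R_rep * m_g * (1 - r) * (1 - f - rho).

    The parameter interpretations are assumptions made for illustration.
    """
    return reported_roas * gross_margin * (1.0 - return_rate) * (1.0 - fee_rate - other_rate)


reported = 8.0
adjusted = true_roas(reported, gross_margin=0.30, return_rate=0.15,
                     fee_rate=0.12, other_rate=0.05)
shrink_factor = reported / adjusted  # how far the dashboard figure overstates
```

With these illustrative inputs a reported ROAS of 8.0 falls to about 1.69; thinner margins or higher return rates widen the gap further.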

12. Detailed Case Studies

We now develop eleven worked cases, chosen so that each illustrates a distinct primitive of the Section 3 framework. Each case is presented in the same structure: problem statement, formulation, key result, and what generalizes to e-commerce.

12.1 Case 1 — Markowitz mean-variance portfolio

Problem. Allocate wealth across $n$ risky assets with mean returns $μ$ and covariance $Σ ≻ 0$ to balance return and variance.

Formulation. $\min_{w} w^{⊤} Σ w$ s.t. $μ^{⊤} w = \bar{r}$, $1^{⊤} w = 1$.

Key result. Closed-form $w^{⋆}(\bar{r}) = Σ^{-1}(α μ + β 1)$ for scalars $α, β$ determined by the constraints. The locus $\{(σ(\bar{r}), \bar{r})\}$ is the Markowitz frontier; with a risk-free asset, the tangency portfolio gives the Capital Market Line $\bar{r} - r_{f} = σ \cdot SR^{⋆}$.

Generalizes to e-commerce. The dual interpretation, in which the Lagrange multiplier on the budget constraint is the marginal return per unit risk, is the direct ancestor of the equal-marginal-profit shadow-price condition $\partial π_{i} / \partial b_{i} = μ^{⋆}$ that DataGlass solves [1, Section V]. Empirical fragility, in the form of extreme sensitivity to $\hat{μ}$, motivates the Bayesian and robust extensions used in production e-commerce systems, where the analogue concern is over-fit campaign-level response curves.
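
The closed form can be checked numerically. The sketch below solves the two constraints for the scalars $α$ and $β$ using $A = 1^{⊤} Σ^{-1} 1$, $B = 1^{⊤} Σ^{-1} μ$, $C = μ^{⊤} Σ^{-1} μ$; the three-asset inputs are hypothetical, and the small Gaussian-elimination helper stands in for a linear-algebra library.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def markowitz_weights(mu, sigma, target):
    """Minimum-variance weights with mu'w = target and 1'w = 1 (shorting allowed)."""
    ones = [1.0] * len(mu)
    s_inv_mu = solve(sigma, mu)
    s_inv_one = solve(sigma, ones)
    a = dot(ones, s_inv_one)
    b = dot(ones, s_inv_mu)
    c = dot(mu, s_inv_mu)
    d = a * c - b * b  # nonzero unless all mean returns are equal
    alpha = (a * target - b) / d
    beta = (c - b * target) / d
    return [alpha * x + beta * y for x, y in zip(s_inv_mu, s_inv_one)]


# Hypothetical three-asset inputs.
mu = [0.08, 0.12, 0.15]
sigma = [[0.04, 0.006, 0.0],
         [0.006, 0.09, 0.012],
         [0.0, 0.012, 0.16]]
w = markowitz_weights(mu, sigma, target=0.11)
```

Re-running with slightly perturbed `mu` shows how strongly the weights react to estimation error in the mean vector.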

12.2 Case 2 — Rockafellar–Uryasev CVaR

Problem. Minimize $α$ -CVaR of portfolio loss subject to a target expected return.

Formulation. Using the Rockafellar–Uryasev linearization, the LP

\min_{w, η, z} \; η + \frac{1}{(1 - α) N} \sum_{n=1}^{N} z_{n} \quad \text{s.t.} \quad z_{n} \geq L_{n}(w) - η, \; z_{n} \geq 0, \; μ^{⊤} w \geq \bar{r}, \; 1^{⊤} w = 1, \; w \geq 0.

Key result. A linear program in $(w, η, z)$ that scales to $N = 10^{6}$ scenarios on commodity hardware [16], [17]. The optimal $η$ is itself the VaR.

Generalizes to e-commerce. Replacing CVaR over loss with CVaR over negative profit yields a downside-risk-aware version of the DataGlass portfolio optimizer that naturally penalizes catastrophic-day campaign blow-ups; the formulation is in the v2.0 roadmap [1, Section XI].
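
For a fixed portfolio the Rockafellar–Uryasev objective reduces to a one-dimensional convex problem in $η$, which the sketch below minimizes directly over the scenario losses. It does not solve the full LP over $w$; the scenario losses are hypothetical.

```python
def ru_objective(eta, losses, alpha):
    """Rockafellar-Uryasev objective eta + mean excess over eta / (1 - alpha)."""
    n = len(losses)
    excess = sum(max(loss - eta, 0.0) for loss in losses)
    return eta + excess / ((1.0 - alpha) * n)


def cvar_rockafellar_uryasev(losses, alpha):
    """Return (VaR, CVaR) for a fixed loss sample.

    The objective is convex and piecewise linear in eta with kinks at the
    sample losses, so a minimizer can be found among the sample points.
    """
    best_eta = min(losses, key=lambda eta: ru_objective(eta, losses, alpha))
    return best_eta, ru_objective(best_eta, losses, alpha)


# Hypothetical scenario losses 1, 2, ..., 100.
losses = [float(i) for i in range(1, 101)]
var_95, cvar_95 = cvar_rockafellar_uryasev(losses, alpha=0.95)
```

Here `cvar_95` comes out at approximately 98, the mean of the five largest losses, and the minimizing `var_95` sits at the boundary of that tail.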

12.3 Case 3 — Almgren–Chriss optimal execution

Problem. Liquidate $X$ shares over $T$ time steps minimizing expected cost plus risk-aversion-weighted variance.

Formulation. Let $x_{k}$ be holdings at step $k$ , $n_{k} = x_{k - 1} - x_{k}$ the trade, $η n_{k}$ the temporary impact, and $γ \sum_{j} n_{j}$ the permanent impact. The mean-variance objective is

\min_{(n_{k})} \; E[C(n)] + λ \, Var[C(n)] \quad \text{s.t.} \quad \sum_{k} n_{k} = X.

Key result. Closed-form hyperbolic-cosine optimal trajectory; the efficient frontier is parameterized by $λ$ .

Generalizes to e-commerce. Step-size limits and action-count caps in DataGlass [1, Section V] are the direct analogue: aggressive reallocation incurs platform-side learning-phase cost, slow reallocation incurs opportunity cost. The structural form of the trade-off is identical, and the solution structure — exponentially decaying departures from the unconstrained optimum — recurs.
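
The shape of the optimal trajectory can be sketched as follows, assuming the continuous-time approximation $κ \approx \sqrt{λ σ^{2} / η}$ and holdings $x(t) = X \sinh(κ (T - t)) / \sinh(κ T)$. All parameter values are hypothetical.

```python
import math


def almgren_chriss_holdings(total_shares, horizon, steps, risk_aversion, sigma, eta):
    """Holdings x_0..x_steps on an even time grid under the sinh trajectory."""
    kappa = math.sqrt(risk_aversion * sigma ** 2 / eta)
    times = [horizon * k / steps for k in range(steps + 1)]
    if kappa == 0.0:
        # Risk-neutral limit: liquidate at a constant rate.
        return [total_shares * (1.0 - t / horizon) for t in times]
    scale = math.sinh(kappa * horizon)
    return [total_shares * math.sinh(kappa * (horizon - t)) / scale for t in times]


# Hypothetical parameters for illustration only.
x = almgren_chriss_holdings(total_shares=1_000_000, horizon=1.0, steps=10,
                            risk_aversion=2e-6, sigma=0.95, eta=2.5e-6)
trades = [x[k - 1] - x[k] for k in range(1, len(x))]
```

Higher risk aversion front-loads the trades, while the risk-neutral limit spreads them evenly; the same tension appears when a budget is moved quickly versus gradually.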

12.4 Case 4 — The data-driven newsvendor

Problem. Decide a single-period order quantity $q$ given $N$ historical demand observations.

Formulation. SAA: $\hat{q} = \arg\min_{q} \frac{1}{N} \sum_{n} [c_{u} (D_{n} - q)^{+} + c_{o} (q - D_{n})^{+}]$. The solution is the empirical quantile $\hat{q} = D_{(⌈ N τ ⌉)}$ for $τ = c_{u} / (c_{u} + c_{o})$.

Key result. Levi, Roundy, and Shmoys [109] prove a $(1 + ϵ)$-approximation bound for $N = O(1/ϵ^{2})$ samples. The Wasserstein DRO newsvendor of Esfahani–Kuhn [27] gives a regularized variant whose excess-risk bound is the classical $O(1/\sqrt{N})$ but with explicit dependence on the Wasserstein radius. Ban and Rudin [110] prove $O(n^{-1/2})$ rates for the contextual newsvendor with covariate-dependent demand.

Generalizes to e-commerce. The CVR-dilution mechanism [1, Section IV] is structurally a newsvendor problem at the campaign level, where utilization above threshold corresponds to over-ordering relative to the seller's effective service capacity.
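
The SAA solution is an order statistic, which the following sketch computes and then checks against a brute-force scan of the empirical cost. The demand history and cost parameters are hypothetical.

```python
import math


def saa_newsvendor(demands, c_under, c_over):
    """Empirical-quantile order quantity D_(ceil(N * tau)), tau = c_u / (c_u + c_o)."""
    tau = c_under / (c_under + c_over)
    ordered = sorted(demands)
    rank = max(1, math.ceil(len(ordered) * tau))
    return ordered[rank - 1]


def empirical_cost(q, demands, c_under, c_over):
    """Average underage-plus-overage cost of ordering q against the sample."""
    total = sum(c_under * max(d - q, 0) + c_over * max(q - d, 0) for d in demands)
    return total / len(demands)


# Hypothetical demand history and costs (critical ratio 0.75).
demands = [12, 7, 15, 9, 11, 14, 8, 10, 13, 6]
q_hat = saa_newsvendor(demands, c_under=3.0, c_over=1.0)
best_cost = empirical_cost(q_hat, demands, 3.0, 1.0)
```

Because underage costs three times as much as overage here, the order quantity lands at the eighth of ten ordered observations rather than at the median.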

12.5 Case 5 — Cramér–Lundberg ruin theory

Problem. Compute the probability that an insurance surplus process becomes negative.

Formulation. $U_{t} = u + c t - S_{t}$ , $S_{t} = \sum_{i = 1}^{N_{t}} X_{i}$ with $N_{t}$ Poisson rate $λ$ and claim sizes $X_{i}$ iid with mean $μ$ and CDF $F$ .

Key result. Lundberg's inequality $ψ(u) \leq e^{-R u}$, where the adjustment coefficient $R$ solves $λ \int_{0}^{\infty} e^{R x} (1 - F(x)) \, dx = c$. For exponential claims, $ψ(u) = \frac{λ μ}{c} \exp(-(\frac{1}{μ} - \frac{λ}{c}) u)$.

Generalizes to e-commerce. Conversion-rate "viral demand" tail risk and return-rate tails are heavy-tailed claim-like processes; the actuarial machinery for tail-quantile estimation [125]–[127] transfers directly to the calibration of profit-adjusted ROAS confidence intervals when the seller portfolio includes rare but large-impact promotional events.
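
For exponential claims the defining equation for $R$ has the closed-form solution $R = 1/μ - λ/c$, which gives a convenient check on a numerical solver. The sketch below finds $R$ by bisection and compares the exact ruin probability with the Lundberg bound; the parameter values are hypothetical.

```python
import math


def adjustment_coefficient_exponential(lam, mu, c, iters=200):
    """Solve lam * integral_0^inf exp(R x) (1 - F(x)) dx = c by bisection,
    for exponential claims with mean mu (integral equals mu / (1 - R * mu))."""
    if c <= lam * mu:
        raise ValueError("net-profit condition c > lam * mu is required")

    def gap(r):
        return lam * mu / (1.0 - r * mu) - c

    lo, hi = 0.0, (1.0 / mu) * (1.0 - 1e-12)  # gap(lo) < 0 < gap(hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ruin_probability_exponential(u, lam, mu, c):
    """Exact ruin probability for exponential claims."""
    return (lam * mu / c) * math.exp(-(1.0 / mu - lam / c) * u)


# Hypothetical parameters: 25% premium loading over expected claims.
lam, mu, c = 1.0, 1.0, 1.25
R = adjustment_coefficient_exponential(lam, mu, c)
bound_holds = all(
    ruin_probability_exponential(u, lam, mu, c) <= math.exp(-R * u) + 1e-12
    for u in (0.0, 1.0, 5.0, 20.0)
)
```

The exact ruin probability and the bound share the decay rate $R$ and differ only by the constant factor $λ μ / c$.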

12.6 Case 6 — Contextual-bandit news recommendation (LinUCB)

Problem. Choose one of $K$ articles to display, given user context $x \in R^{d}$ , to maximize click-through.

Formulation. Linear payoff model $E[r_{t} ∣ x_{t}, a_{t}] = x_{t}^{⊤} θ_{a_{t}}$. LinUCB selects $a_{t} = \arg\max_{a} (x_{t}^{⊤} \hat{θ}_{a} + α \sqrt{x_{t}^{⊤} A_{a}^{-1} x_{t}})$.

Key result. Li, Chu, Langford, and Schapire [146] report 12.5% lift on Yahoo News versus a non-contextual baseline; theoretical regret analysis by Chu et al. [229] establishes $\tilde{O}(\sqrt{T d})$ regret, that is, $\sqrt{T d}$ up to polylogarithmic factors.

Generalizes to e-commerce. Contextual bandits are now standard in e-commerce recommendation, search ranking, and (with multi-arm extensions) in ad creative selection. The DataGlass exploration layer [1, Section VII.A] is structurally a budget-constrained Thompson Sampling whose regret analysis follows the BwK extension [51] of the same framework.
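
A minimal sketch of disjoint LinUCB follows, using the standard index (estimated payoff plus $α$ times the square root of $x^{⊤} A_{a}^{-1} x$) and a Sherman–Morrison update so no matrix inversion is needed. The two-arm environment, noise level, and `alpha` value are hypothetical.

```python
import math
import random


class LinUCBArm:
    """Per-arm ridge-regression state for disjoint LinUCB."""

    def __init__(self, dim):
        # A starts as the identity, so its inverse is the identity too.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.a_inv]

    def score(self, x, alpha):
        theta = self._a_inv_times(self.b)
        a_inv_x = self._a_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(sum(xi * v for xi, v in zip(x, a_inv_x)))
        return mean + alpha * width

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A^{-1} after A += x x^T.
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


random.seed(1)
true_theta = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical: each arm pays on one feature
arms = [LinUCBArm(2) for _ in true_theta]
rounds, correct = 2000, 0
for t in range(rounds):
    x = [random.random(), random.random()]
    chosen = max(range(len(arms)), key=lambda a: arms[a].score(x, alpha=0.5))
    expected = [sum(th * xi for th, xi in zip(theta, x)) for theta in true_theta]
    arms[chosen].update(x, expected[chosen] + random.gauss(0, 0.1))
    if t >= rounds // 2:
        correct += expected[chosen] == max(expected)
accuracy = correct / (rounds - rounds // 2)
```

The confidence width shrinks as an arm accumulates observations in a given direction of context space, so exploration tapers off where the payoff estimate is already well determined.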

12.7 Case 7 — Real-time bidding as a constrained MDP

Problem. Choose per-impression bids to maximize expected click value subject to a daily budget.

Formulation. State $s_{t} = (B_{t}, T_{t}, θ_{t})$ ; action $a_{t} = b_{t}$ . The constrained MDP has expected-cost constraint $E [\sum_{t} cost_{t}] \leq B$ . Lagrangian relaxation gives $b^{⋆} (v) = v / λ^{⋆}$ , where $λ^{⋆}$ is the budget shadow price [212].

Key result. Cai et al. [211] report 16.7% lift; Wu et al. [212] report 23.4% lift with strict budget compliance; Liu et al. [216] show by ablation that budget/time-ratio features dominate.

Generalizes to e-commerce. The dual variable $λ^{⋆}$ in RTB and the shadow price $μ^{⋆}$ in DataGlass are the same object — the marginal value of an additional dollar of advertising — at different temporal granularities. RTB controls bids per impression; DataGlass controls budgets per day; the underlying primal-dual structure is invariant.

12.8 Case 8 — The M5 forecasting competition

Problem. Forecast hierarchical Walmart unit sales at SKU-store-day granularity for 28 days.

Formulation. A panel forecasting problem with hierarchical reconciliation across SKU/category/department/store/state aggregation levels.

Key result. Winning models were global gradient-boosted regressors (LightGBM) with hand-crafted lag and rolling features, combined with hierarchical reconciliation methods [193]. Pure deep models (DeepAR, N-BEATS) under-performed at SKU level despite the strong showing of neural approaches on M4. Quantile-regression objectives produce the calibrated prediction intervals essential for downstream inventory decisions.

Generalizes to e-commerce. The DataGlass response-curve estimation problem is not a standard forecasting problem (it is a budget→outcome mapping, not time-series extrapolation), but the M5 lesson — calibrated uncertainty intervals matter more than headline point accuracy for downstream decisions — translates directly. DataGlass uses Negative-Binomial likelihoods explicitly to produce the calibrated posterior the optimizer requires [1, Section IV].

12.9 Case 9 — Double machine learning for treatment-effect estimation

Problem. Estimate the average treatment effect $θ = E [Y (1) - Y (0)]$ from observational data with high-dimensional confounders $X$ .

Formulation. Partially linear model $Y = D θ + g(X) + ε$, $D = m(X) + v$. Writing $ℓ(X) = E[Y ∣ X] = θ m(X) + g(X)$, the Neyman-orthogonal score is $ψ(W; θ, η) = (Y - ℓ(X) - θ(D - m(X)))(D - m(X))$ with nuisance $η = (ℓ, m)$.

Key result. Chernozhukov et al. [159] prove that with cross-fitting, $\hat{θ}$ is $\sqrt{n}$-consistent and asymptotically normal as long as the nuisance functions $ℓ, m$ are estimated at rate $o(n^{-1/4})$, a rate achievable by many modern machine learning methods under suitable sparsity or smoothness assumptions.

Generalizes to e-commerce. Causal estimation of the budget-response relationship in the presence of high-dimensional confounders (calendar, product, market, competitor) is a direct application. The DataGlass system uses first-differencing plus randomized perturbation as the primary identification strategy [1, Section VIII], but DML provides the standard observational fallback when randomization is infeasible.
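
The cross-fitted partialling-out estimator can be sketched as follows, with $ℓ(X) = E[Y ∣ X]$ and $m(X) = E[D ∣ X]$ as the two nuisance functions. To stay self-contained the sketch fits both with a one-dimensional least-squares line rather than a machine-learning model; that choice, the synthetic data, and the two-fold split are assumptions for illustration.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; returns a predictor function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def dml_partially_linear(x, d, y, folds=2):
    """Cross-fitted estimate of theta from residual-on-residual regression."""
    n = len(x)
    num = den = 0.0
    for k in range(folds):
        train = [i for i in range(n) if i % folds != k]
        held_out = [i for i in range(n) if i % folds == k]
        # Nuisances are fitted on the other fold only (cross-fitting).
        ell_hat = fit_line([x[i] for i in train], [y[i] for i in train])
        m_hat = fit_line([x[i] for i in train], [d[i] for i in train])
        for i in held_out:
            y_res = y[i] - ell_hat(x[i])
            d_res = d[i] - m_hat(x[i])
            num += d_res * y_res
            den += d_res * d_res
    return num / den


# Synthetic confounded data: X drives both treatment D and outcome Y.
random.seed(42)
theta_true, n = 1.5, 4000
x = [random.gauss(0, 1) for _ in range(n)]
d = [0.8 * xi + random.gauss(0, 1) for xi in x]
y = [theta_true * di + 2.0 * xi + random.gauss(0, 1) for di, xi in zip(d, x)]

theta_hat = dml_partially_linear(x, d, y)
naive = sum(di * yi for di, yi in zip(d, y)) / sum(di * di for di in d)
```

The `naive` regression of outcome on treatment absorbs the confounder's effect and overshoots, whereas the residualized estimate stays close to `theta_true`.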

12.10 Case 10 — Wasserstein DRO and adversarial robustness

Problem. Train a classifier robust to small perturbations of the input distribution.

Formulation. Wasserstein DRO

\min_{θ} \; \sup_{Q : W_{p}(Q, \hat{P}_{n}) \leq ε} E_{Q}[ℓ(θ; X, Y)],

with the Esfahani–Kuhn dual representation as a regularized empirical-risk minimization [27].

Key result. For $p = \infty$ Wasserstein and Lipschitz loss, the worst-case is exactly equivalent to adversarial training with $ℓ_{\infty}$ perturbations of magnitude $ε$ [30]. For $p = 2$ , the dual is a Tikhonov-style regularization with explicit penalty.

Generalizes to e-commerce. Marketplace platforms continuously shift the response distribution as competitor behavior, platform algorithms, and seasonal demand co-evolve. A Wasserstein-DRO formulation of the DataGlass response model, with $ε$ tuned to historical drift, is among the open problems identified in Section 15.

12.11 Case 11 — DataGlass marketplace ad budget allocation

Problem. Allocate daily budgets across a seller's portfolio of campaigns on a platform-controlled marketplace, maximizing expected profit subject to operational constraints.

Formulation. As in [1] and the companion research article [2, Section 4]:

\max_{b_{t}} \; \sum_{i=1}^{N} π_{i}(b_{i,t}) \quad \text{s.t.} \quad \sum_{i} b_{i,t} \leq B_{t}, \; b_{i,t} \in F_{i,t},

with $π_{i}$ derived from a Negative-Binomial click model with conditional Hill saturation, a Beta-Binomial conversion model with utilization-dependent dilution, and a true-profit adjustment $m_{g} (1 - r) (1 - f - ρ)$ that corrects the gap between reported and contribution-margin-adjusted ROAS [2, Section 3.1].

Key result. The optimal solution is characterized by the equal-marginal-profit condition $\partial π_{i} / \partial b_{i} = μ^{⋆}$, evaluated at $b_{i} = b_{i}^{⋆}$, on the active set, and is solved by bisection on $μ$. DataGlass reports empirical lifts of 21.3% offline and 21.6% online, with reallocation frequency reduced by 43.8% [1, Section X].
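
The equal-marginal-profit condition and the bisection on $μ$ can be illustrated with a toy response curve. The sketch assumes $π_{i}(b) = a_{i} \log(1 + b / k_{i})$, a hypothetical concave curve chosen because its marginal inverts in closed form. It is not the Negative-Binomial/Hill model of [1], and the feasible sets $F_{i,t}$ are reduced to $b \geq 0$.

```python
def allocate_by_shadow_price(a, k, budget, iters=200):
    """Bisection on the shadow price mu for pi_i(b) = a_i * log(1 + b / k_i).

    The marginal profit is a_i / (k_i + b), so pi_i'(b) = mu inverts to
    b = a_i / mu - k_i, clipped at zero for campaigns outside the active set.
    """
    def spend_at(mu):
        return [max(0.0, ai / mu - ki) for ai, ki in zip(a, k)]

    lo = 1e-12                                  # price so low that spend exceeds budget
    hi = max(ai / ki for ai, ki in zip(a, k))   # price at which nothing is funded
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        if sum(spend_at(mu)) > budget:
            lo = mu   # price too low: requested spend exceeds the budget
        else:
            hi = mu
    return hi, spend_at(hi)


# Hypothetical three-campaign portfolio.
a = [120.0, 80.0, 30.0]
k = [40.0, 50.0, 60.0]
mu_star, b_star = allocate_by_shadow_price(a, k, budget=100.0)
marginals = [ai / (ki + bi) for ai, ki, bi in zip(a, k, b_star)]
```

In this example the third campaign's marginal profit at zero spend is below `mu_star`, so it drops out of the active set and receives no budget, while the two funded campaigns end with equal marginals.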

Synthesis. This case is the connecting tissue of the meta-review. It inherits Markowitz's portfolio framing (Section 12.1), Almgren–Chriss's cost-of-thrashing (Section 12.3), the newsvendor's capacity-constrained service-level structure (Section 12.4), Cramér–Lundberg's tail-risk machinery (Section 12.5), contextual bandits' calibrated exploration (Section 12.6), RTB's Lagrangian shadow-price interpretation (Section 12.7), the M5 competition's emphasis on calibrated uncertainty (Section 12.8), and double machine learning's causal-identification machinery (Section 12.9). None of these primitives is novel to e-commerce. The novelty is in the composition — and in the explicit treatment of the platform as an opaque, constraint-imposing intermediary rather than a transparent auction.

13. Cross-Domain Quantitative Synthesis

13.1 The four-primitive taxonomy

Across the eleven cases and the broader literature surveyed in Sections 4–11, every mature decision system instantiates the same four primitives. Table 1 summarizes the instantiation across selected domains.

Domain	Probabilistic model	Risk objective	Constraint set	Exploration
Mean-variance portfolio	Multivariate normal returns	Variance	Budget, no-short	Static (none)
CVaR portfolio	Empirical scenarios	CVaR	Budget, no-short	Static
Almgren–Chriss execution	Brownian + impact	Mean-variance of cost	Liquidation horizon	Static
Newsvendor	Demand CDF	Expected cost (or CVaR)	Capacity	Empirical / DRO
Cramér–Lundberg	Compound Poisson	Ruin probability	Capital floor	Static
LinUCB news rec.	Linear payoff	Expected reward	Slot capacity	UCB
RTB MDP	Q-network value	Expected click value	Daily budget	$ϵ$ -greedy / TS
M5 forecasting	GBM ensemble	Pinball loss	None operational	Hyperparameter
DML causal	Cross-fit nuisances	$\sqrt{n}$ ATE bias	None	None (observational)
Wasserstein DRO	Empirical + ball	Worst-case loss	Hypothesis class	Implicit (adversary)
DataGlass	NB2 × Beta-Binomial	Expected profit + posterior	Six op. constraints	TS + perturbation

Table 1 — Cross-domain instantiation of the four primitives.

13.2 Effect-size summary across deployed systems

Table 2 collates headline empirical lifts for the production deployments included in the corpus, where reported.

System	Domain	Headline lift	Source
Cai et al. RTB	Display advertising	+16.7% clicks	[211]
Wu et al. CMDP-RTB	Display advertising	+23.4% clicks	[212]
Zhao et al. RTB	Sponsored search	−15.4% CPC	[213]
Wang et al. ROI-RTB	Display advertising	+14.0% ROI	[215]
Jauvion et al. SSP	Header bidding	Reported significant	[230]
Ferreira–Lee–Simchi-Levi	Online retail pricing	+9.7% revenue	[117]
Li et al. LinUCB	News recommendation	+12.5% CTR	[146]
DataGlass (offline)	Marketplace ads	+21.3% profit	[1]
DataGlass (online A/B)	Marketplace ads	+21.6% profit	[1]

Table 2 — Production deployment lifts (where reported).

13.3 Cross-domain transfer assessment

Table 3 evaluates the strength of cross-domain transfer between finance/operations primitives and e-commerce decision intelligence.

Primitive	Source domain	Transfer to e-commerce	Strength
Equal-marginal-return optimization	Finance (MV)	Budget allocation across campaigns	Strong
Mean-variance objective	Finance	Risk-aware budget allocation	Partial
CVaR linearization	Finance	Downside-risk ad allocation	Strong (untapped)
Almgren–Chriss cost-of-thrashing	Finance	Action-count caps	Strong
Newsvendor capacity logic	Operations	Inventory-dilution mechanism	Strong
$(s, S)$ threshold policies	Operations	Bid-budget revision triggers	Partial
Cramér–Lundberg ruin	Insurance	Tail-risk calibration	Partial
EVT tail estimation	Insurance	Conversion outlier modeling	Partial
LinUCB / Thompson Sampling	Online learning	Campaign exploration	Strong
Bandits with Knapsacks	Online learning	Constrained exploration	Strong
DML	Causal inference	Observational lift estimation	Strong
Synthetic controls	Causal inference	Holdout-region lift	Partial
Wasserstein DRO	Stochastic optim.	Drift-robust response curves	Open
MPC	Control	Intra-day budget pacing	Partial (open)
Conformal prediction	Online learning	Calibrated CIs on recommendations	Partial (open)

Table 3 — Transfer strength of selected primitives into e-commerce decision systems.

13.4 Quality-rubric distribution

Across the 213 primary papers, the median rubric score (formal rigor, empirical validation, reproducibility, deployment) is $(2, 1, 1, 0)$, with each dimension scored on a 0–2 scale. Deployment evidence is the dimension with the lowest mean score, consistent with the well-known publication bias toward methodologically novel but production-untested results.

14. Heterogeneity, Risk of Bias, and Publication Bias

14.1 Heterogeneity

Cross-domain heterogeneity is structural. Finance papers favor closed-form, asymptotic, and worst-case results; operations papers favor approximation algorithms with provable bounds; machine-learning papers favor empirical benchmarks; e-commerce papers favor production-deployment lifts. The four-primitive taxonomy is the unifying frame; the calibration choices (which risk measure, which ambiguity set, which exploration schedule) are where heterogeneity concentrates.

14.2 Risk of bias

Three risk-of-bias considerations apply across domains.

Selection bias toward positive results. Production-deployment papers self-select for lifts large enough to publish. The lifts of roughly 14–23% reported in advertising RL [211]–[215] (Table 2) should be read with this caveat.

Outcome-definition bias. Finance papers report Sharpe ratios; e-commerce papers report CTR or ROAS; both are gameable proxies for the underlying contribution. We recommend that future cross-domain reviews explicitly require contribution-margin-adjusted reporting [2, Section 3.1].

Comparator bias. "Static baseline" in advertising RL is rarely defined precisely. Because a lift is measured relative to its baseline, a weak baseline inflates the reported magnitude.

14.3 Publication bias

We did not run a formal funnel-plot analysis because effect sizes are not commensurable across domains. We note three indicators of publication bias: (i) the under-representation of negative-result papers in advertising RL; (ii) the gap between the strong methodological literature on robust optimization and the relatively sparse production-deployment literature in the same area; (iii) the absence of replication studies of the headline RTB results [211], [212]. Together these suggest that reported lifts are better read as upper bounds on achievable effects than as unbiased estimates.

15. Open Problems and Future Directions

We identify nine open problems at the methodological and operational frontier, expanding on the five raised in the prior version of this review.

P1 — Time-consistent risk-averse Bellman for marketplace allocation

Ruszczyński's [38] formalism is mature for finance and operations but not deployed at scale in e-commerce. A practical CVaR-Bellman recursion for budget allocation, with explicit treatment of attribution-window non-Markovianity, is open.
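For orientation, one way to write the recursion this problem asks for is sketched below. It is a hedged illustration, not the formulation of [38] or of DataGlass: the augmented state $\tilde{s}_t$, the window length $W$, the per-period cost $c$, the feasible budget set $\mathcal{B}$, and the risk level $\alpha$ are all notation assumed here. Stacking the last $W$ budget decisions into the state is one standard way to restore the Markov property when attributed outcomes depend on spend over a $W$-day window.

```latex
\begin{aligned}
\tilde{s}_t &= (s_t,\, b_{t-1}, \dots, b_{t-W}) \\
v_t(\tilde{s}_t) &= \min_{b_t \in \mathcal{B}(\tilde{s}_t)}
  \Big\{ c(\tilde{s}_t, b_t)
  + \mathrm{CVaR}_{\alpha}\big[\, v_{t+1}(\tilde{s}_{t+1}) \,\big|\, \tilde{s}_t, b_t \big] \Big\} \\
\mathrm{CVaR}_{\alpha}[Z] &= \min_{\eta \in \mathbb{R}}
  \Big\{ \eta + \tfrac{1}{1-\alpha}\, \mathbb{E}\big[(Z - \eta)_{+}\big] \Big\}
\end{aligned}
```

Nesting the one-step CVaR inside the recursion, rather than applying CVaR once to the total cost, is what gives time consistency; the price is a state dimension that grows with $W$.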

P2 — Wasserstein DRO with bandit regret guarantees

Esfahani–Kuhn DRO [27] is a natural fit for marketplace ad budget allocation; whether the DRO regularization radius can be tuned from data without sacrificing bandit-style regret is open.
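As a reminder of the object whose radius would need tuning, the Wasserstein DRO problem can be written as below. The symbols ($\hat{P}_N$ for the empirical distribution of $N$ samples $\hat{\xi}_i$, $\varepsilon$ for the radius, $\ell$ for the loss, $L(x)$ for its Lipschitz constant in $\xi$) are notation assumed here. The second line is the standard Kantorovich–Rubinstein upper bound for the type-1 Wasserstein distance, stated as a sketch and not as a result specific to [27].

```latex
\begin{aligned}
&\min_{x \in \mathcal{X}} \;
  \sup_{Q \,:\, W_1(Q, \hat{P}_N) \le \varepsilon} \;
  \mathbb{E}_{Q}\big[\ell(x, \xi)\big] \\
&\sup_{Q \,:\, W_1(Q, \hat{P}_N) \le \varepsilon}
  \mathbb{E}_{Q}\big[\ell(x, \xi)\big]
  \;\le\; \frac{1}{N} \sum_{i=1}^{N} \ell(x, \hat{\xi}_i) + \varepsilon\, L(x)
\end{aligned}
```

Read this way, $\varepsilon$ acts like a regularization weight, which is why a data-driven schedule for it is the natural quantity to trade against regret.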

P3 — Joint optimization of budget and target ROAS

Marketplace sellers control two levers; current systems, including DataGlass v1.0, optimize budget conditional on a fixed target ROAS [1, Section XI]. Joint optimization raises identifiability questions (the platform-side response model must include the seller-side target as a covariate) and is on the v2.0 roadmap.

P4 — Causal identification under attribution mixing

Standard randomized experimentation [148] generates clean variation, but the platform's attribution window introduces a convolutional smoothing kernel that complicates inference. Deconvolution-style estimators may transfer from neuroimaging [231].
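The smoothing effect can be made concrete with a minimal sketch. The function name, the kernel shares, and the effect series below are hypothetical and chosen only for illustration; real attribution kernels are platform-specific and generally unobserved.

```python
def smear_by_attribution(effects, kernel):
    """Convolve a daily true-effect series with an attribution kernel.

    effects[t] is the (hypothetical) true incremental outcome caused on day t.
    kernel[lag] is the share of that outcome credited `lag` days later;
    the shares are assumed to sum to 1.
    """
    observed = [0.0] * len(effects)
    for t in range(len(effects)):
        for lag, share in enumerate(kernel):
            if t - lag >= 0:
                observed[t] += share * effects[t - lag]
    return observed


# A one-day budget perturbation on day 2 with a 3-day attribution window.
true_effects = [0.0, 0.0, 10.0, 0.0, 0.0, 0.0]
kernel = [0.5, 0.3, 0.2]  # illustrative shares, not platform values
observed = smear_by_attribution(true_effects, kernel)
```

In this toy case the single-day spike is spread over days 2 to 4 while the total is preserved, so a same-day comparison of treated and control outcomes would understate the effect. Recovering `true_effects` from `observed` is the deconvolution step the paragraph refers to.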

P5 — Cross-marketplace transfer learning

Sellers operating across Shopee, Lazada, TikTok Shop, Amazon, and Walmart have structurally similar but parameter-different response curves. Hierarchical Bayesian or domain-adaptation approaches that share information across marketplaces are an open frontier.

P6 — Conformal-prediction integration into operational decision systems

Distribution-free prediction sets [176]–[178] are an attractive complement to Bayesian posteriors but have not been integrated into production budget-optimization systems with theoretical guarantees on the resulting decisions.
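A minimal split-conformal sketch shows what such a prediction set looks like in its simplest form. The helper names and the residual values are hypothetical; the sketch covers only the interval construction under exchangeability, not the open question of guarantees on the downstream budget decision.

```python
import math


def conformal_radius(calibration_residuals, alpha):
    """Split-conformal interval half-width from absolute residuals.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest residual, which gives
    marginal coverage of at least 1 - alpha under exchangeability.
    """
    n = len(calibration_residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(calibration_residuals)[rank - 1]


def prediction_interval(point_forecast, radius):
    """Symmetric interval around a point forecast."""
    return (point_forecast - radius, point_forecast + radius)


# Hypothetical held-out absolute errors |actual - forecast| for 20 campaign-days.
residuals = [float(i) for i in range(1, 21)]
radius = conformal_radius(residuals, alpha=0.1)
low, high = prediction_interval(100.0, radius)
```

The construction needs no distributional assumption on the forecast errors, which is what makes it a complement to a Bayesian posterior rather than a replacement for it.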

P7 — Algorithmic-fairness constraints in e-commerce decision systems

Fairness-aware optimization [232], [233] is mature for classification but underdeveloped for sequential decision systems. The seller-platform power asymmetry [2] is a natural site for fairness-constrained allocation.

P8 — Mechanism-design counter-strategies for opaque platform auctions

As marketplaces shift from second-price to first-price auctions [222], [223], the seller's optimal budget response becomes more dependent on competitor behavior. Game-theoretic counter-strategies under partial information are open.

P9 — Foundation-model exploitation of unstructured signals

Listing copy, image quality, and review content are unstructured signals that affect the budget→outcome curve but are not usually included in response-curve estimation. Foundation-model embeddings [234], [235] are a candidate input; the calibration problem is open.

16. Limitations of This Meta-Review

We acknowledge five limitations. First, the corpus is heavily weighted toward English-language publications; the substantial Chinese-language literature on marketplace advertising (especially on Taobao and JD) is under-represented, despite citations to [226] and others. Second, the search period ends in April 2026, so the most recent 2026 conference proceedings (KDD'26, NeurIPS'26) are not included. Third, the review is a structured narrative rather than a meta-analysis, so we cannot pool effect sizes or compare them quantitatively across domains. Fourth, the included e-commerce production literature is biased toward systems whose teams publish academically; pure-industry systems without academic write-ups are systematically under-represented. Fifth, the authors are affiliated with one of the production systems reviewed (DataGlass [1]), which introduces a potential conflict of interest, documented in Section 18.

17. Reproducibility Statement

The primary inputs to this review are publicly available academic publications, listed exhaustively in the References. The internal DataGlass reports cited are cross-referenced to the corresponding public sources where they exist. The search strings of Appendix B reproduce the candidate-record search; the screening protocol of Section 2.4 is documented in writing and available on request. The structured-extraction database used to generate Tables 1–3 is available on request, subject to redaction of any individually identifying seller or campaign data.

18. Conflict of Interest, Funding, and Ethics

DataGlass Labs Research is the institutional author of this meta-review. The DataGlass production system [1] is one of the systems reviewed; this introduces a potential conflict of interest. We have attempted to mitigate this in two ways: (i) by citing the relevant academic literature comprehensively rather than selectively; and (ii) by explicitly identifying open problems and limitations of the DataGlass system in Section 15 and Section 16. This is a working paper, not a peer-reviewed publication; the synthesis it presents is internal research output and the reader should treat the DataGlass-specific empirical figures it cites accordingly.

The review was funded internally by DataGlass Labs Research; no external funding was received. No participating sellers were identifiable in the data underlying Section 11.7 or [1]; all empirical data has been aggregated and anonymized prior to analysis.

19. Conclusion

This meta-review has argued that the literature on prediction and risk optimization under uncertainty is more unified than its domain-fragmented appearance suggests. The same four primitives — calibrated probabilistic models, coherent risk-aware objectives, explicit operational constraint sets, and principled exploration mechanisms — underlie every mature decision system in finance, operations, insurance, energy, healthcare, causal inference, and e-commerce. Eleven detailed case studies have shown how distinct-looking problems instantiate the same underlying framework. The connecting case — DataGlass marketplace ad budget allocation [1] — inherits its formal structure from Markowitz, Almgren–Chriss, Rockafellar–Uryasev, the newsvendor, Cramér–Lundberg, contextual bandits, the M5 forecasting consensus, and double machine learning. The novelty is composition, calibration, and the explicit treatment of the platform-imposed constraint structure — not the underlying primitives.

For researchers, the implication is that cross-domain reading is under-weighted in current practice and that the next wave of e-commerce decision-intelligence systems will be built by engineers fluent in the finance and operations literatures of the prior generation. For practitioners, the implication is that the choice of risk measure, the calibration of ambiguity, and the design of exploration are not implementation details: they are the system. We hope this review can serve as a citable map for the next decade of work at this intersection.

Appendix A — Glossary

Active set — In a constrained optimization problem, the subset of constraints that bind at the optimum.

ADP — Approximate dynamic programming; family of methods for solving high-dimensional MDPs via value-function approximation.

ARL — Average run length; expected time between false alarms in a sequential test.

BwK — Bandits with Knapsacks; bandit framework with global resource constraints.

Coherent risk measure — A functional satisfying monotonicity, sub-additivity, positive homogeneity, and translation invariance.

Conformal prediction — Distribution-free prediction-interval framework with finite-sample coverage guarantee under exchangeability.

CUSUM — Cumulative-sum sequential test for detecting a change in distribution.

CVaR — Conditional Value-at-Risk (Expected Shortfall); coherent tail risk measure.

DML — Double machine learning; orthogonal-score estimation of treatment effects with machine-learned nuisances.

DRO — Distributionally robust optimization.

MDP / CMDP — Markov Decision Process / Constrained MDP.

MPC — Model predictive control; receding-horizon optimal control.

Newsvendor — Single-period stochastic-demand inventory problem with closed-form quantile solution.

OCO — Online convex optimization; sequential convex-loss minimization with adversarial environments.

ROAS / true ROAS — Return on ad spend (reported); contribution-margin-adjusted return on ad spend (true).

RTB — Real-time bidding; per-impression bidding in display advertising.

SAA — Sample average approximation.

Shadow price — Lagrange multiplier on a binding constraint in an optimization problem.

Thompson Sampling — Bayesian bandit algorithm sampling actions in proportion to posterior optimality probability.

VaR — Value-at-Risk; quantile-based risk measure (non-coherent).

Wasserstein distance — Optimal-transport metric on probability measures.
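As an illustration of the closed-form quantile solution mentioned in the Newsvendor entry, the sketch below computes the critical-fractile order quantity. Normally distributed demand and all numeric values are assumptions made for the example; the function name is hypothetical.

```python
from statistics import NormalDist


def newsvendor_quantity(mean, stdev, underage_cost, overage_cost):
    """Optimal order quantity for normally distributed demand.

    The critical fractile is underage / (underage + overage); the optimal
    quantity is the demand quantile at that level.
    """
    critical_fractile = underage_cost / (underage_cost + overage_cost)
    return NormalDist(mean, stdev).inv_cdf(critical_fractile)


# Hypothetical numbers: lost margin of 6 per unit short, 2 per unit left over.
q_star = newsvendor_quantity(mean=100.0, stdev=20.0,
                             underage_cost=6.0, overage_cost=2.0)
```

With underage three times as costly as overage, the fractile is 0.75 and the quantity sits above mean demand; with equal costs it collapses to the median.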

Appendix B — Search Strings

Search strings used for the systematic literature search of Section 2.2 (selected; full inventory available on request):

Finance: `("mean-variance" OR "Markowitz" OR "CVaR" OR "expected shortfall" OR "robust portfolio") AND ("optimization" OR "allocation")`

Execution: `("Almgren-Chriss" OR "optimal execution" OR "implementation shortfall") AND ("transaction cost" OR "market impact")`

Operations: `("newsvendor" OR "(s,S) policy" OR "base-stock" OR "revenue management") AND ("stochastic" OR "robust" OR "data-driven")`

Insurance: `("Cramer-Lundberg" OR "ruin probability" OR "Solvency II" OR "extreme value theory")`

Bandits: `("multi-armed bandit" OR "Thompson sampling" OR "UCB" OR "contextual bandit" OR "bandits with knapsacks")`

Causal: `("double machine learning" OR "causal forest" OR "synthetic control" OR "potential outcomes" OR "average treatment effect")`

Advertising: `("real-time bidding" OR "ad budget" OR "ROAS" OR "marketplace advertising" OR "sponsored search")`

Forecasting: `("M5 competition" OR "hierarchical forecasting" OR "DeepAR" OR "N-BEATS" OR "temporal fusion transformer")`

DRO: `("distributionally robust" OR "Wasserstein DRO" OR "ambiguity set")`

MPC: `("model predictive control" OR "receding horizon" OR "stochastic MPC")`

Appendix C — Additional Summary Tables

Measure	Coherent	Convex	Distortion	Linear-program reformulable	Standard regulatory use
Variance	No	Yes (with mean)	No	Yes (QP)	Capital allocation (legacy)
VaR	No	No	Yes	No (MIP in general)	Basel II, Solvency II
CVaR	Yes	Yes	Yes	Yes	Basel III FRTB
Spectral	Yes	Yes	Yes	Yes (LP)	Internal models
Entropic	No (fails positive homogeneity and sub-additivity)	Yes	No	No (cone)	Robust control

Table C1 — Risk-measure properties.
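The "linear-program reformulable" entry for CVaR rests on the Rockafellar–Uryasev representation [16]. The sketch below evaluates that representation on a sample; the function name and the loss values are hypothetical, and equal sample weights are assumed.

```python
def cvar_rockafellar_uryasev(losses, alpha):
    """Sample CVaR at level alpha via the Rockafellar-Uryasev objective.

    Minimizes  c + mean(max(loss - c, 0)) / (1 - alpha)  over c. The objective
    is piecewise linear and convex in c, so a minimizer lies at a sample point
    and it suffices to evaluate the objective at each observed loss.
    """
    n = len(losses)

    def objective(c):
        excess = sum(max(loss - c, 0.0) for loss in losses)
        return c + excess / ((1.0 - alpha) * n)

    return min(objective(c) for c in losses)


# Hypothetical per-period losses (negative profit), equally weighted.
losses = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
cvar_80 = cvar_rockafellar_uryasev(losses, alpha=0.8)
```

Replacing each `max(loss - c, 0)` with an auxiliary variable bounded below by `loss - c` and by zero turns the same objective into a linear program, which is the reformulation the table refers to.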

Mechanism	Regret bound	Computational cost	Strengths	Weaknesses
$\epsilon$-greedy	Linear	$O(K)$	Simplicity	Asymptotically suboptimal
UCB	$O(\sqrt{KT \log T})$	$O(K)$ per step	Frequentist, anytime	Poor under model misspec.
Thompson Sampling	$O(\sqrt{KT \log T})$	Sampling cost	Robust to misspec., delayed feedback	Posterior maintenance
LinUCB	$O(\sqrt{dT \log T})$	$O(d^{2})$ per step	Contextual	Linear-payoff assumption
BwK	$O(\mathrm{OPT}/\sqrt{B})$	LP per step	Budget-aware	Requires LP solver
Random perturbation	n/a (causal)	Negligible	Identification	No regret guarantee alone

Table C2 — Exploration-mechanism comparison.
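To make the Thompson Sampling row concrete, here is a minimal Beta-Bernoulli sketch on simulated arms. The success rates, the uniform Beta(1, 1) prior, and the function name are assumptions for illustration; this is not the DataGlass exploration mechanism, which the review describes only as TS plus perturbation.

```python
import random


def thompson_sampling(true_rates, rounds, seed=0):
    """Beta-Bernoulli Thompson Sampling on simulated arms.

    true_rates are hypothetical success probabilities used only to simulate
    rewards; the algorithm sees just the observed successes and failures.
    Returns the number of pulls per arm.
    """
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    pulls = [0] * len(true_rates)
    for _ in range(rounds):
        # Draw one sample per arm from its Beta(1 + s, 1 + f) posterior.
        draws = [rng.betavariate(1 + s, 1 + f)
                 for s, f in zip(successes, failures)]
        arm = draws.index(max(draws))
        reward = 1 if rng.random() < true_rates[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


pulls = thompson_sampling(true_rates=[0.1, 0.3, 0.8], rounds=2000)
```

The "posterior maintenance" weakness in the table corresponds to the `successes` and `failures` counters here; in non-conjugate models that bookkeeping becomes approximate inference.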

References

The reference list is organised by theme. Where a single reference applies to multiple themes, it is listed under its primary association.

Internal DataGlass references

[1] DataGlass Labs Research, "DataGlass: Bayesian Budget Allocation for E-Commerce Advertising Under Platform Constraints," internal technical report (working paper), March 2026.

[2] DataGlass Labs Research, "From Gut Feel to Posterior Inference: A Research Article on the DataGlass Decision-Intelligence System for E-Commerce Ad Budget Allocation," DataGlass Labs Research working paper, May 2026.

[199] DataGlass Labs Research, "Elasticity Modeling and Bundle Pricing," internal technical note, April 2026.

Foundations of decision and risk theory

[3] H. Markowitz, "Portfolio Selection," J. Finance, vol. 7, no. 1, pp. 77–91, 1952.

[4] R. Bellman, Dynamic Programming. Princeton Univ. Press, 1957.

[5] H. Robbins, "Some aspects of the sequential design of experiments," Bull. Amer. Math. Soc., vol. 58, no. 5, pp. 527–535, 1952.

[6] F. H. Knight, Risk, Uncertainty, and Profit. Houghton Mifflin, 1921.

[10] L. J. Savage, The Foundations of Statistics. Wiley, 1954.

[11] J. O. Berger, Statistical Decision Theory and Bayesian Analysis, 2nd ed. Springer, 1985.

[12] V. N. Vapnik, Statistical Learning Theory. Wiley, 1998.

[13] A. Gelman, J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin, Bayesian Data Analysis, 3rd ed. Chapman & Hall/CRC, 2013.

[31] D. Ellsberg, "Risk, ambiguity, and the Savage axioms," Quart. J. Econ., vol. 75, no. 4, pp. 643–669, 1961.

[32] I. Gilboa and D. Schmeidler, "Maxmin expected utility with non-unique prior," J. Math. Econ., vol. 18, no. 2, pp. 141–153, 1989.

Risk measures and robust/distributionally robust optimization

[14] W. F. Sharpe, "Capital asset prices: A theory of market equilibrium under conditions of risk," J. Finance, vol. 19, no. 3, pp. 425–442, 1964.

[15] A. J. McNeil, R. Frey, and P. Embrechts, Quantitative Risk Management: Concepts, Techniques and Tools, rev. ed. Princeton Univ. Press, 2015.

[16] R. T. Rockafellar and S. Uryasev, "Optimization of conditional value-at-risk," J. Risk, vol. 2, pp. 21–41, 2000.

[17] R. T. Rockafellar and S. Uryasev, "Conditional value-at-risk for general loss distributions," J. Banking Finance, vol. 26, no. 7, pp. 1443–1471, 2002.

[18] P. Artzner, F. Delbaen, J.-M. Eber, and D. Heath, "Coherent measures of risk," Math. Finance, vol. 9, no. 3, pp. 203–228, 1999.

[19] H. Föllmer and A. Schied, "Convex measures of risk and trading constraints," Finance Stoch., vol. 6, no. 4, pp. 429–447, 2002.

[20] M. Frittelli and E. Rosazza Gianin, "Putting order in risk measures," J. Banking Finance, vol. 26, no. 7, pp. 1473–1486, 2002.

[21] S. S. Wang, "A class of distortion operators for pricing financial and insurance risks," J. Risk Insurance, vol. 67, no. 1, pp. 15–36, 2000.

[22] C. Acerbi, "Spectral measures of risk: A coherent representation of subjective risk aversion," J. Banking Finance, vol. 26, no. 7, pp. 1505–1518, 2002.

[23] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski, Robust Optimization. Princeton Univ. Press, 2009.

[24] D. Bertsimas and M. Sim, "The price of robustness," Oper. Res., vol. 52, no. 1, pp. 35–53, 2004.

[25] E. Delage and Y. Ye, "Distributionally robust optimization under moment uncertainty with application to data-driven problems," Oper. Res., vol. 58, no. 3, pp. 595–612, 2010.

[26] A. Ben-Tal, D. den Hertog, A. De Waegenaere, B. Melenberg, and G. Rennen, "Robust solutions of optimization problems affected by uncertain probabilities," Manage. Sci., vol. 59, no. 2, pp. 341–357, 2013.

[27] P. Mohajerin Esfahani and D. Kuhn, "Data-driven distributionally robust optimization using the Wasserstein metric," Math. Program., vol. 171, pp. 115–166, 2018.

[28] J. Blanchet, K. Murthy, and N. Si, "Confidence regions in Wasserstein distributionally robust estimation," Biometrika, vol. 109, no. 2, pp. 295–315, 2022.

[29] R. Gao and A. J. Kleywegt, "Distributionally robust stochastic optimization with Wasserstein distance," Math. Oper. Res., 2023.

[30] A. Sinha, H. Namkoong, R. Volpi, and J. Duchi, "Certifying some distributional robustness with principled adversarial training," in Proc. ICLR, 2018.

Stochastic dynamic programming, MDPs, and reinforcement learning

[33] D. P. Bertsekas, Dynamic Programming and Optimal Control, 4th ed. Athena Scientific, 2017.

[34] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, 1994.

[35] W. B. Powell, Reinforcement Learning and Stochastic Optimization: A Unified Framework. Wiley, 2022.

[36] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. MIT Press, 2018.

[37] R. A. Howard and J. E. Matheson, "Risk-sensitive Markov decision processes," Manage. Sci., vol. 18, no. 7, pp. 356–369, 1972.

[38] A. Ruszczyński, "Risk-averse dynamic programming for Markov decision processes," Math. Program., vol. 125, no. 2, pp. 235–261, 2010.

[39] E. Altman, Constrained Markov Decision Processes. Chapman & Hall/CRC, 1999.

[40] G. N. Iyengar, "Robust dynamic programming," Math. Oper. Res., vol. 30, no. 2, pp. 257–280, 2005.

[41] A. Nilim and L. El Ghaoui, "Robust control of Markov decision processes with uncertain transition matrices," Oper. Res., vol. 53, no. 5, pp. 780–798, 2005.

[42] W. Wiesemann, D. Kuhn, and B. Rustem, "Robust Markov decision processes," Math. Oper. Res., vol. 38, no. 1, pp. 153–183, 2013.

[43] J. García and F. Fernández, "A comprehensive survey on safe reinforcement learning," J. Mach. Learn. Res., vol. 16, pp. 1437–1480, 2015.

Online learning, bandits, and OCO

[44] M. Zinkevich, "Online convex programming and generalized infinitesimal gradient ascent," in Proc. ICML, 2003.

[45] E. Hazan, "Introduction to online convex optimization," Found. Trends Optim., vol. 2, no. 3–4, pp. 157–325, 2016.

[46] T. Lattimore and C. Szepesvári, Bandit Algorithms. Cambridge Univ. Press, 2020.

[47] P. Auer, N. Cesa-Bianchi, and P. Fischer, "Finite-time analysis of the multiarmed bandit problem," Machine Learning, vol. 47, pp. 235–256, 2002.

[48] S. Agrawal and N. Goyal, "Thompson Sampling for contextual bandits with linear payoffs," in Proc. ICML, 2013.

[49] D. J. Russo, B. Van Roy, A. Kazerouni, I. Osband, and Z. Wen, "A tutorial on Thompson Sampling," Found. Trends Mach. Learn., vol. 11, no. 1, pp. 1–96, 2018.

[50] O. Chapelle and L. Li, "An empirical evaluation of Thompson Sampling," in Proc. NeurIPS, 2011.

[51] A. Badanidiyuru, R. Kleinberg, and A. Slivkins, "Bandits with knapsacks," J. ACM, vol. 65, no. 3, pp. 1–55, 2018.

[52] O. Besbes, Y. Gur, and A. Zeevi, "Stochastic multi-armed-bandit problem with non-stationary rewards," in Proc. NeurIPS, 2014.

[53] Y. Chen, C. Lee, and H. Luo, "A new framework for oracle-efficient online learning with side information," in Proc. COLT, 2019.

[54] A. S. Nemirovski and D. B. Yudin, Problem Complexity and Method Efficiency in Optimization. Wiley, 1983.

[55] A. Beck and M. Teboulle, "Mirror descent and nonlinear projected subgradient methods for convex optimization," Oper. Res. Lett., vol. 31, no. 3, pp. 167–175, 2003.

[56] T. L. Lai and H. Robbins, "Asymptotically efficient adaptive allocation rules," Adv. Appl. Math., vol. 6, no. 1, pp. 4–22, 1985.

[62] E. Hazan, A. Agarwal, and S. Kale, "Logarithmic regret algorithms for online convex optimization," Machine Learning, vol. 69, no. 2–3, pp. 169–192, 2007.

[63] H. B. McMahan, "Follow-the-regularized-leader and mirror descent: Equivalence theorems and L1 regularization," in Proc. AISTATS, 2011.

[64] S. R. Balseiro, H. Lu, and V. Mirrokni, "Dual mirror descent for online allocation problems," in Proc. ICML, 2020.

[65] S. R. Balseiro, H. Lu, and V. Mirrokni, "Primal-dual budget pacing with ROI constraints," in Proc. ICML, 2024.

[66] G. J. Gordon, A. Greenwald, and C. Marks, "No-regret learning in convex games," in Proc. ICML, 2008.

[170] N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games. Cambridge Univ. Press, 2006.

[171] J. Kivinen and M. K. Warmuth, "Exponentiated gradient versus gradient descent for linear predictors," Inform. Comput., vol. 132, no. 1, pp. 1–63, 1997.

[172] J. Hannan, "Approximation to Bayes risk in repeated play," Contrib. Theory Games, vol. 3, pp. 97–139, 1957.

[173] N. Littlestone and M. K. Warmuth, "The weighted majority algorithm," Inform. Comput., vol. 108, no. 2, pp. 212–261, 1994.

[174] T. Roughgarden, "Intrinsic robustness of the price of anarchy," J. ACM, vol. 62, no. 5, pp. 1–42, 2015.

[175] D. P. Foster and R. V. Vohra, "Calibrated learning and correlated equilibrium," Games Econ. Behav., vol. 21, no. 1–2, pp. 40–55, 1997.

Methodology of systematic / meta-reviews

[7] M. Petticrew and H. Roberts, Systematic Reviews in the Social Sciences. Wiley-Blackwell, 2006.

[8] M. J. Page et al., "The PRISMA 2020 statement: An updated guideline for reporting systematic reviews," BMJ, vol. 372, p. n71, 2021.

[9] B. J. Shea et al., "AMSTAR 2: A critical appraisal tool for systematic reviews," BMJ, vol. 358, p. j4008, 2017.

Model predictive control

[57] E. F. Camacho and C. Bordons, Model Predictive Control, 2nd ed. Springer, 2007.

[58] F. Borrelli, A. Bemporad, and M. Morari, Predictive Control for Linear and Hybrid Systems. Cambridge Univ. Press, 2017.

[59] B. Paden, M. Čáp, S. Z. Yong, D. Yershov, and E. Frazzoli, "A survey of motion planning and control techniques for self-driving urban vehicles," IEEE Trans. Intell. Veh., vol. 1, no. 1, pp. 33–55, 2016.

[60] A. Mesbah, "Stochastic model predictive control: An overview and perspectives," IEEE Control Syst., vol. 36, no. 6, pp. 30–44, 2016.

[61] D. Q. Mayne, M. M. Seron, and S. V. Raković, "Robust model predictive control of constrained linear systems with bounded disturbances," Automatica, vol. 41, no. 2, pp. 219–224, 2005.

Quantitative finance

[67] J. Tobin, "Liquidity preference as behavior towards risk," Rev. Econ. Stud., vol. 25, no. 2, pp. 65–86, 1958.

[68] J. Lintner, "The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets," Rev. Econ. Stat., vol. 47, no. 1, pp. 13–37, 1965.

[69] J. Mossin, "Equilibrium in a capital asset market," Econometrica, vol. 34, no. 4, pp. 768–783, 1966.

[70] B. Mandelbrot, "The variation of certain speculative prices," J. Bus., vol. 36, no. 4, pp. 394–419, 1963.

[71] E. F. Fama, "Efficient capital markets: A review of theory and empirical work," J. Finance, vol. 25, no. 2, pp. 383–417, 1970.

[72] E. F. Fama and K. R. French, "Common risk factors in the returns on stocks and bonds," J. Financial Econ., vol. 33, no. 1, pp. 3–56, 1993.

[73] E. F. Fama and K. R. French, "A five-factor asset pricing model," J. Financial Econ., vol. 116, no. 1, pp. 1–22, 2015.

[74] S. A. Ross, "The arbitrage theory of capital asset pricing," J. Econ. Theory, vol. 13, no. 3, pp. 341–360, 1976.

[75] K. Hou, C. Xue, and L. Zhang, "Digesting anomalies: An investment approach," Rev. Financ. Stud., vol. 28, no. 3, pp. 650–705, 2015.

[76] D. Goldfarb and G. Iyengar, "Robust portfolio selection problems," Math. Oper. Res., vol. 28, no. 1, pp. 1–38, 2003.

[77] R. H. Tütüncü and M. Koenig, "Robust asset allocation," Ann. Oper. Res., vol. 132, pp. 157–187, 2004.

[78] O. Ledoit and M. Wolf, "Improved estimation of the covariance matrix of stock returns with an application to portfolio selection," J. Empir. Finance, vol. 10, no. 5, pp. 603–621, 2003.

[79] O. Ledoit and M. Wolf, "A well-conditioned estimator for large-dimensional covariance matrices," J. Multivar. Anal., vol. 88, no. 2, pp. 365–411, 2004.

[80] F. Black and R. Litterman, "Global portfolio optimization," Financial Analysts J., vol. 48, no. 5, pp. 28–43, 1992.

[81] R. Almgren and N. Chriss, "Optimal execution of portfolio transactions," J. Risk, vol. 3, pp. 5–39, 2001.

[82] A. A. Obizhaeva and J. Wang, "Optimal trading strategy and supply/demand dynamics," J. Financ. Markets, vol. 16, no. 1, pp. 1–32, 2013.

[83] J. Gatheral, "No-dynamic-arbitrage and market impact," Quant. Finance, vol. 10, no. 7, pp. 749–759, 2010.

[84] Á. Cartea and S. Jaimungal, "Optimal execution with limit and market orders," Quant. Finance, vol. 15, no. 8, pp. 1279–1291, 2015.

[85] F. Black and M. Scholes, "The pricing of options and corporate liabilities," J. Polit. Econ., vol. 81, no. 3, pp. 637–654, 1973.

[86] R. C. Merton, "Theory of rational option pricing," Bell J. Econ. Manag. Sci., vol. 4, no. 1, pp. 141–183, 1973.

[87] R. C. Merton, "Optimum consumption and portfolio rules in a continuous-time model," J. Econ. Theory, vol. 3, no. 4, pp. 373–413, 1971.

[88] R. Cont and P. Tankov, Financial Modelling with Jump Processes. Chapman & Hall/CRC, 2003.

[89] P. Glasserman, Monte Carlo Methods in Financial Engineering. Springer, 2003.

[90] Á. Cartea, S. Jaimungal, and J. Penalva, Algorithmic and High-Frequency Trading. Cambridge Univ. Press, 2015.

[91] B. Dupire, "Pricing with a smile," Risk, vol. 7, pp. 18–20, 1994.

[92] S. L. Heston, "A closed-form solution for options with stochastic volatility with applications to bond and currency options," Rev. Financ. Stud., vol. 6, no. 2, pp. 327–343, 1993.

[93] J. Hull and A. White, "The pricing of options on assets with stochastic volatilities," J. Finance, vol. 42, no. 2, pp. 281–300, 1987.

[94] P. Carr and L. Wu, "Variance risk premiums," Rev. Financ. Stud., vol. 22, no. 3, pp. 1311–1341, 2009.

[95] M. Avellaneda and J.-H. Lee, "Statistical arbitrage in the U.S. equities market," Quant. Finance, vol. 10, no. 7, pp. 761–782, 2010.

[96] O. Guéant, C.-A. Lehalle, and J. Fernandez-Tapia, "Dealing with the inventory risk: A solution to the market making problem," Math. Financ. Econ., vol. 7, pp. 477–507, 2013.

[97] R. C. Merton, "On the pricing of corporate debt: The risk structure of interest rates," J. Finance, vol. 29, no. 2, pp. 449–470, 1974.

[98] D. Duffie and K. J. Singleton, "Modeling term structures of defaultable bonds," Rev. Financ. Stud., vol. 12, no. 4, pp. 687–720, 1999.

[99] D. Brigo, M. Morini, and A. Pallavicini, Counterparty Credit Risk, Collateral and Funding. Wiley, 2013.

[100] R. B. Nelsen, An Introduction to Copulas, 2nd ed. Springer, 2006.

[101] D. X. Li, "On default correlation: A copula function approach," J. Fixed Income, vol. 9, no. 4, pp. 43–54, 2000.

[102] P. Embrechts, A. McNeil, and D. Straumann, "Correlation and dependence in risk management: Properties and pitfalls," in Risk Management: Value at Risk and Beyond, M. A. H. Dempster, Ed. Cambridge Univ. Press, 2002.

[103] L. E. O. Svensson, "Inflation forecast targeting: Implementing and monitoring inflation targets," Eur. Econ. Rev., vol. 41, no. 6, pp. 1111–1146, 1997.

[104] L. P. Hansen and T. J. Sargent, Robustness. Princeton Univ. Press, 2008.

Operations and supply chain

[105] F. Y. Edgeworth, "The mathematical theory of banking," J. Roy. Stat. Soc., vol. 51, no. 1, pp. 113–127, 1888.

[106] K. J. Arrow, T. Harris, and J. Marschak, "Optimal inventory policy," Econometrica, vol. 19, no. 3, pp. 250–272, 1951.

[107] J. Gotoh and Y. Takano, "Newsvendor solutions via conditional value-at-risk minimization," Eur. J. Oper. Res., vol. 179, no. 1, pp. 80–96, 2007.

[108] L. V. Snyder and Z.-J. M. Shen, Fundamentals of Supply Chain Theory, 2nd ed. Wiley, 2019.

[109] R. Levi, R. O. Roundy, and D. B. Shmoys, "Provably near-optimal sampling-based policies for stochastic inventory control models," Math. Oper. Res., vol. 32, no. 4, pp. 821–839, 2007.

[110] G.-Y. Ban and C. Rudin, "The big data newsvendor: Practical insights from machine learning," Oper. Res., vol. 67, no. 1, pp. 90–108, 2019.

[111] H. Scarf, "The optimality of (s, S) policies in the dynamic inventory problem," in Mathematical Methods in the Social Sciences, K. J. Arrow, S. Karlin, and P. Suppes, Eds. Stanford Univ. Press, 1960.

[112] A. J. Clark and H. Scarf, "Optimal policies for a multi-echelon inventory problem," Manage. Sci., vol. 6, no. 4, pp. 475–490, 1960.

[113] A. Federgruen and P. Zipkin, "Computational issues in an infinite-horizon, multiechelon inventory model," Oper. Res., vol. 32, no. 4, pp. 818–836, 1984.

[114] K. T. Talluri and G. J. van Ryzin, The Theory and Practice of Revenue Management. Springer, 2004.

[115] G. Gallego and G. van Ryzin, "Optimal dynamic pricing of inventories with stochastic demand over finite horizons," Manage. Sci., vol. 40, no. 8, pp. 999–1020, 1994.

[116] O. Besbes and A. Zeevi, "Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms," Oper. Res., vol. 57, no. 6, pp. 1407–1420, 2009.

[117] K. J. Ferreira, B. H. A. Lee, and D. Simchi-Levi, "Analytics for an online retailer: Demand forecasting and price optimization," Manuf. Serv. Oper. Manag., vol. 18, no. 1, pp. 69–88, 2016.

[118] M. C. Cohen, N.-H. Z. Leung, K. Panchamgam, G. Perakis, and A. Smith, "The impact of linear optimization on promotion planning," Oper. Res., vol. 65, no. 2, pp. 446–468, 2017.

[119] G. P. Cachon, "Supply chain coordination with contracts," in Handbooks in Operations Research and Management Science: Supply Chain Management, A. G. de Kok and S. C. Graves, Eds. North-Holland, 2003.

[120] H. L. Lee, V. Padmanabhan, and S. Whang, "Information distortion in a supply chain: The bullwhip effect," Manage. Sci., vol. 43, no. 4, pp. 546–558, 1997.

[121] K. T. Talluri and G. J. van Ryzin, "An analysis of bid-price controls for network revenue management," Manage. Sci., vol. 44, no. 11, pp. 1577–1593, 1998.

[122] D. Adelman, "Dynamic bid-prices in revenue management," Oper. Res., vol. 55, no. 4, pp. 647–661, 2007.

[123] H. Topaloglu, "Using Lagrangian relaxation to compute capacity-dependent bid prices in network revenue management," Oper. Res., vol. 57, no. 3, pp. 637–649, 2009.

Insurance and actuarial

[124] H. Cramér, On the Mathematical Theory of Risk. Skandia Jubilee Volume, 1930.

[125] B. V. Gnedenko, "Sur la distribution limite du terme maximum d'une série aléatoire," Ann. Math., vol. 44, no. 3, pp. 423–453, 1943.

[126] L. de Haan and A. Ferreira, Extreme Value Theory: An Introduction. Springer, 2006.

[127] J. Pickands, "Statistical inference using extreme order statistics," Ann. Stat., vol. 3, no. 1, pp. 119–131, 1975.

[128] EIOPA, "Solvency II: Technical specifications for the preparatory phase," 2014.

[129] N. N. Taleb, The Black Swan: The Impact of the Highly Improbable. Random House, 2007.

Healthcare, energy, and other cross-domain

[130] J. R. Birge and F. Louveaux, Introduction to Stochastic Programming, 2nd ed. Springer, 2011.

[131] M. V. F. Pereira and L. M. V. G. Pinto, "Multi-stage stochastic optimization applied to energy planning," Math. Program., vol. 52, no. 1, pp. 359–375, 1991.

[132] M. Lubin, Y. Dvorkin, and L. Roald, "Chance constraints for improving the security of AC optimal power flow," IEEE Trans. Power Syst., vol. 34, no. 3, pp. 1908–1917, 2019.

[133] G. Bayraksan and D. K. Love, "Data-driven stochastic programming using phi-divergences," INFORMS Tutor. Oper. Res., pp. 1–19, 2015.

[134] T. Ayer, O. Alagoz, and N. K. Stout, "A POMDP approach to personalize mammography screening decisions," Oper. Res., vol. 60, no. 5, pp. 1019–1034, 2012.

[135] S. M. Shortreed et al., "Reinforcement learning in clinical decision support: A survey," Artif. Intell. Med., 2020.

[136] M. Komorowski, L. A. Celi, O. Badawi, A. C. Gordon, and A. A. Faisal, "The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care," Nature Medicine, vol. 24, no. 11, pp. 1716–1720, 2018.

[137] D. Bertsimas, N. Kallus, A. M. Weinstein, and Y. D. Zhuo, "Personalized diabetes management using electronic medical records," Diabetes Care, vol. 40, no. 2, pp. 210–217, 2017.

[138] P. Thomas and E. Brunskill, "Data-efficient off-policy policy evaluation for reinforcement learning," in Proc. ICML, 2016.

[139] M. Dudík, D. Erhan, J. Langford, and L. Li, "Doubly robust policy evaluation and optimization," Stat. Sci., vol. 29, no. 4, pp. 485–511, 2014.

Causal inference

[140] J. Neyman, "On the application of probability theory to agricultural experiments," Ann. Agric. Sci., 1923 (transl. Stat. Sci., 1990).

[141] D. B. Rubin, "Estimating causal effects of treatments in randomized and nonrandomized studies," J. Educ. Psychol., vol. 66, no. 5, pp. 688–701, 1974.

[142] G. W. Imbens and D. B. Rubin, Causal Inference for Statistics, Social, and Biomedical Sciences. Cambridge Univ. Press, 2015.

[143] J. Pearl, Causality, 2nd ed. Cambridge Univ. Press, 2009.

[144] R. A. Fisher, The Design of Experiments. Oliver and Boyd, 1935.

[145] D. Russo, "Simple Bayesian algorithms for best-arm identification," Oper. Res., vol. 68, no. 6, pp. 1625–1647, 2020.

[146] L. Li, W. Chu, J. Langford, and R. E. Schapire, "A contextual-bandit approach to personalized news article recommendation," in Proc. WWW, 2010.

[147] S. Athey and G. W. Imbens, "The econometrics of randomized experiments," in Handbook of Economic Field Experiments, vol. 1. North-Holland, 2017.

[148] B. R. Gordon, F. Zettelmeyer, N. Bhargava, and D. Chapsky, "A comparison of approaches to advertising measurement," Marketing Sci., vol. 38, no. 2, pp. 193–225, 2019.

[149] O. Ashenfelter and D. Card, "Using the longitudinal structure of earnings to estimate the effect of training programs," Rev. Econ. Stat., vol. 67, no. 4, pp. 648–660, 1985.

[150] B. Callaway and P. H. C. Sant'Anna, "Difference-in-differences with multiple time periods," J. Econometrics, vol. 225, no. 2, pp. 200–230, 2021.

[151] D. L. Thistlethwaite and D. T. Campbell, "Regression-discontinuity analysis: An alternative to the ex post facto experiment," J. Educ. Psychol., vol. 51, no. 6, pp. 309–317, 1960.

[152] G. W. Imbens and T. Lemieux, "Regression discontinuity designs: A guide to practice," J. Econometrics, vol. 142, no. 2, pp. 615–635, 2008.

[153] J. D. Angrist and A. B. Krueger, "Instrumental variables and the search for identification," J. Econ. Perspect., vol. 15, no. 4, pp. 69–85, 2001.

[154] G. W. Imbens and J. D. Angrist, "Identification and estimation of local average treatment effects," Econometrica, vol. 62, no. 2, pp. 467–475, 1994.

[155] A. Abadie and J. Gardeazabal, "The economic costs of conflict: A case study of the Basque Country," Amer. Econ. Rev., vol. 93, no. 1, pp. 113–132, 2003.

[156] A. Abadie, A. Diamond, and J. Hainmueller, "Synthetic control methods for comparative case studies," J. Amer. Stat. Assoc., vol. 105, no. 490, pp. 493–505, 2010.

[157] S. Athey and G. Imbens, "Recursive partitioning for heterogeneous causal effects," Proc. Natl. Acad. Sci., vol. 113, no. 27, pp. 7353–7360, 2016.

[158] S. Wager and S. Athey, "Estimation and inference of heterogeneous treatment effects using random forests," J. Amer. Stat. Assoc., vol. 113, no. 523, pp. 1228–1242, 2018.

[159] V. Chernozhukov et al., "Double/debiased machine learning for treatment and structural parameters," Econom. J., vol. 21, no. 1, pp. C1–C68, 2018.

[160] S. R. Künzel, J. S. Sekhon, P. J. Bickel, and B. Yu, "Metalearners for estimating heterogeneous treatment effects using machine learning," Proc. Natl. Acad. Sci., vol. 116, no. 10, pp. 4156–4165, 2019.

[161] X. Nie and S. Wager, "Quasi-oracle estimation of heterogeneous treatment effects," Biometrika, vol. 108, no. 2, pp. 299–319, 2021.

[162] P. R. Rosenbaum, Observational Studies, 2nd ed. Springer, 2002.

[163] E. Oster, "Unobservable selection and coefficient stability," J. Bus. Econ. Stat., vol. 37, no. 2, pp. 187–204, 2019.

[164] C. Cinelli and C. Hazlett, "Making sense of sensitivity: Extending omitted variable bias," J. Roy. Stat. Soc. B, vol. 82, no. 1, pp. 39–67, 2020.

[165] D. G. Horvitz and D. J. Thompson, "A generalization of sampling without replacement from a finite universe," J. Amer. Stat. Assoc., vol. 47, no. 260, pp. 663–685, 1952.

[166] H. Bang and J. M. Robins, "Doubly robust estimation in missing data and causal inference models," Biometrics, vol. 61, no. 4, pp. 962–973, 2005.

[167] M. J. van der Laan and S. Rose, Targeted Learning. Springer, 2011.

[168] D. Precup, R. S. Sutton, and S. P. Singh, "Eligibility traces for off-policy policy evaluation," in Proc. ICML, 2000.

[169] A. Swaminathan and T. Joachims, "Counterfactual risk minimization," in Proc. ICML, 2015.

Conformal prediction and distribution shift

[176] V. Vovk, A. Gammerman, and G. Shafer, Algorithmic Learning in a Random World. Springer, 2005.

[177] J. Lei, M. G'Sell, A. Rinaldo, R. J. Tibshirani, and L. Wasserman, "Distribution-free predictive inference for regression," J. Amer. Stat. Assoc., vol. 113, no. 523, pp. 1094–1111, 2018.

[178] Y. Romano, E. Patterson, and E. Candès, "Conformalized quantile regression," in Proc. NeurIPS, 2019.

[179] J. Quiñonero-Candela, M. Sugiyama, A. Schwaighofer, and N. D. Lawrence, Eds., Dataset Shift in Machine Learning. MIT Press, 2009.

[180] M. Sugiyama and M. Kawanabe, Machine Learning in Non-Stationary Environments. MIT Press, 2012.

[181] M. Arjovsky, L. Bottou, I. Gulrajani, and D. Lopez-Paz, "Invariant risk minimization," arXiv:1907.02893, 2019.

[182] S. Sagawa, P. W. Koh, T. B. Hashimoto, and P. Liang, "Distributionally robust neural networks for group shifts," in Proc. ICLR, 2020.

E-commerce: forecasting, pricing, recommendation, advertising

[183] G. E. P. Box and G. M. Jenkins, Time Series Analysis: Forecasting and Control. Holden-Day, 1970.

[184] R. J. Hyndman and G. Athanasopoulos, Forecasting: Principles and Practice, 3rd ed. OTexts, 2021.

[185] S. Makridakis, M. Hibon, and C. Moser, "Accuracy of forecasting: An empirical investigation," J. Roy. Stat. Soc. A, vol. 142, no. 2, pp. 97–145, 1979.

[186] S. Makridakis, E. Spiliotis, and V. Assimakopoulos, "The M4 competition: 100,000 time series and 61 forecasting methods," Int. J. Forecast., vol. 36, no. 1, pp. 54–74, 2020.

[187] S. Makridakis, E. Spiliotis, and V. Assimakopoulos, "The M5 accuracy competition: Results, findings and conclusions," Int. J. Forecast., vol. 38, no. 4, pp. 1346–1364, 2022.

[188] D. Salinas, V. Flunkert, J. Gasthaus, and T. Januschowski, "DeepAR: Probabilistic forecasting with autoregressive recurrent networks," Int. J. Forecast., vol. 36, no. 3, pp. 1181–1191, 2020.

[189] B. N. Oreshkin, D. Carpov, N. Chapados, and Y. Bengio, "N-BEATS: Neural basis expansion analysis for interpretable time series forecasting," in Proc. ICLR, 2020.

[190] B. Lim, S. Ö. Arık, N. Loeff, and T. Pfister, "Temporal fusion transformers for interpretable multi-horizon time series forecasting," Int. J. Forecast., vol. 37, no. 4, pp. 1748–1764, 2021.

[191] C. Challu, K. G. Olivares, B. N. Oreshkin, F. Garza, M. Mergenthaler, and A. Dubrawski, "NHITS: Neural hierarchical interpolation for time series forecasting," in Proc. AAAI, 2023.

[192] H. Wu, T. Hu, Y. Liu, H. Zhou, J. Wang, and M. Long, "TimesNet: Temporal 2D-variation modeling for general time series analysis," in Proc. ICLR, 2023.

[193] R. J. Hyndman, R. A. Ahmed, G. Athanasopoulos, and H. L. Shang, "Optimal combination forecasts for hierarchical time series," Comput. Stat. Data Anal., vol. 55, no. 9, pp. 2579–2589, 2011.

[194] S. Smyl, "A hybrid method of exponential smoothing and recurrent neural networks for time series forecasting," Int. J. Forecast., vol. 36, no. 1, pp. 75–85, 2020.

[195] D. McFadden, "Conditional logit analysis of qualitative choice behavior," in Frontiers in Econometrics, P. Zarembka, Ed. Academic Press, 1974.

[196] K. E. Train, Discrete Choice Methods with Simulation, 2nd ed. Cambridge Univ. Press, 2009.

[197] S. Berry, J. Levinsohn, and A. Pakes, "Automobile prices in market equilibrium," Econometrica, vol. 63, no. 4, pp. 841–890, 1995.

[198] K. J. Ferreira, D. Simchi-Levi, and H. Wang, "Online network revenue management using Thompson Sampling," Oper. Res., vol. 66, no. 6, pp. 1586–1602, 2018.

[200] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, "Item-based collaborative filtering recommendation algorithms," in Proc. WWW, 2001.

[201] Y. Koren, R. Bell, and C. Volinsky, "Matrix factorization techniques for recommender systems," IEEE Computer, vol. 42, no. 8, pp. 30–37, 2009.

[202] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua, "Neural collaborative filtering," in Proc. WWW, 2017.

[203] H.-T. Cheng et al., "Wide & deep learning for recommender systems," in Proc. DLRS, 2016.

[204] T. Joachims, A. Swaminathan, and T. Schnabel, "Unbiased learning-to-rank with biased feedback," in Proc. WSDM, 2017.

[205] F. Radlinski, R. Kleinberg, and T. Joachims, "Learning diverse rankings with multi-armed bandits," in Proc. ICML, 2008.

[206] W.-C. Kang and J. McAuley, "Self-attentive sequential recommendation," in Proc. ICDM, 2018.

[207] F. Sun et al., "BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer," in Proc. CIKM, 2019.

[208] G. Gallego, G. Iyengar, R. Phillips, and A. Dubey, "Managing flexible products on a network," CORC Tech. Rep., Columbia Univ., 2004.

[209] P. Rusmevichientong, Z.-J. M. Shen, and D. B. Shmoys, "Dynamic assortment optimization with a multinomial logit choice model and capacity constraint," Oper. Res., vol. 58, no. 6, pp. 1666–1680, 2010.

[210] F. Bernstein, S. Modaresi, and D. Sauré, "A dynamic clustering approach to data-driven assortment personalization," Manage. Sci., vol. 65, no. 5, pp. 2095–2115, 2019.

Real-time bidding and auctions

[211] H. Cai, K. Ren, W. Zhang, K. Malialis, J. Wang, Y. Yu, and D. Guo, "Real-time bidding by reinforcement learning in display advertising," in Proc. WSDM, 2017.

[212] D. Wu, X. Chen, X. Yang, H. Wang, Q. Tan, X. Zhang, J. Xu, and K. Gai, "Budget constrained bidding by model-free reinforcement learning in display advertising," in Proc. CIKM, 2018.

[213] J. Zhao, G. Qiu, Z. Guan, W. Zhao, and X. He, "Deep reinforcement learning for sponsored search real-time bidding," in Proc. KDD, 2018.

[214] D. He et al., "HiBid: Hierarchical reinforcement learning for budget-constrained bidding," in Proc. KDD, 2024.

[215] J. Wang, W. Gu, C. Liu, H. Zhang, and W. Zhu, "ROI-constrained bidding via curriculum-guided Bayesian reinforcement learning," in Proc. KDD, 2022.

[216] S. Liu, C. Hua, Y. Chen, and J. Wang, "Real-time bidding strategy in display advertising: An empirical analysis," arXiv:2208.07516, 2022.

[217] W. Vickrey, "Counterspeculation, auctions, and competitive sealed tenders," J. Finance, vol. 16, no. 1, pp. 8–37, 1961.

[218] R. B. Myerson, "Optimal auction design," Math. Oper. Res., vol. 6, no. 1, pp. 58–73, 1981.

[219] S. Athey and I. Segal, "An efficient dynamic mechanism," Econometrica, vol. 81, no. 6, pp. 2463–2485, 2013.

[220] B. Edelman, M. Ostrovsky, and M. Schwarz, "Internet advertising and the generalized second-price auction," Amer. Econ. Rev., vol. 97, no. 1, pp. 242–259, 2007.

[221] H. R. Varian, "Position auctions," Int. J. Ind. Organ., vol. 25, no. 6, pp. 1163–1178, 2007.

[222] A. S. Rawat, "Designing auctions when algorithms learn to bid," arXiv:2302.01540, 2023.

[223] G. Despotakis, R. Ravi, and A. Sayedi, "First-price auctions in online display advertising," J. Marketing Res., vol. 58, no. 5, pp. 888–907, 2021.

[230] G. Jauvion, N. Grislain, P. Sielenou, A. Veyrat, and D. Gourru, "Optimization of an SSP's header bidding strategy using Thompson Sampling," in Proc. KDD, 2018.

E-commerce response curves and changepoint detection

[224] C. Ritz, F. Baty, J. C. Streibig, and D. Gerhard, "Dose-response analysis using R," PLoS ONE, vol. 10, no. 12, e0146021, 2015.

[225] A. C. Cameron and P. K. Trivedi, Regression Analysis of Count Data, 2nd ed. Cambridge Univ. Press, 2013.

[226] X. Ma et al., "Entire space multi-task model: An effective approach for estimating post-click conversion rate," in Proc. SIGIR, 2018.

[227] E. S. Page, "Continuous inspection schemes," Biometrika, vol. 41, nos. 1–2, pp. 100–115, 1954.

[228] D. Siegmund, Sequential Analysis: Tests and Confidence Intervals. Springer, 1985.

[229] W. Chu, L. Li, L. Reyzin, and R. E. Schapire, "Contextual bandits with linear payoff functions," in Proc. AISTATS, 2011.

Open-problem references

[231] D. A. Handwerker, J. M. Ollinger, and M. D'Esposito, "Variation of BOLD hemodynamic responses across subjects and brain regions," NeuroImage, vol. 21, no. 4, pp. 1639–1651, 2004.

[232] M. Hardt, E. Price, and N. Srebro, "Equality of opportunity in supervised learning," in Proc. NeurIPS, 2016.

[233] C. Dwork, M. Hardt, T. Pitassi, O. Reingold, and R. Zemel, "Fairness through awareness," in Proc. ITCS, 2012.

[234] T. Brown et al., "Language models are few-shot learners," in Proc. NeurIPS, 2020.

[235] R. Bommasani et al., "On the opportunities and risks of foundation models," arXiv:2108.07258, 2021.

This meta-review may be cited as: DataGlass Labs Research, "Prediction and Risk Optimization Under Uncertainty: A Cross-Domain Meta-Review of Methods in Finance, Operations, Causal Inference, and E-Commerce Decision Intelligence," DataGlass Labs Research working paper, May 2026.