DATAGLASS LABS RESEARCH • WORKING PAPER • MAY 2026

Mechanism-Aware Hierarchical Causal Elasticity Modeling for Platform E-Commerce

A methodological working paper proposing MA-HCEM: a hierarchical Bayesian, double-machine-learning, mechanism-aware estimator for SKU-level price elasticity on Shopee, Lazada, and TikTok Shop, with Bundle Deal and voucher mechanisms treated as structured price discrimination.


Working paper, version 1.1, May 2026. Cite as: Bhum Soonjun and DataGlass Labs Research, "Mechanism-Aware Hierarchical Causal Elasticity Modeling for Platform E-Commerce," DataGlass Labs Research working paper, May 2026.

Keywords. Price elasticity, e-commerce, platform economics, hierarchical Bayesian models, double machine learning, mechanism design, Bundle Deal, dynamic pricing, second-degree price discrimination, demand estimation.

Abstract

Price elasticity of demand is the central primitive of empirical pricing, yet the canonical estimators developed for offline retail and structural industrial-organization settings transfer poorly to modern platform e-commerce. Marketplace sellers on platforms such as Shopee, Lazada, and TikTok Shop face a pricing surface that is (i) algorithmically mediated through search ranking and recommendation, (ii) governed by structured promotional mechanisms — Bundle Deal, FlexiCombo, stacked vouchers, flash sales — that implement second- and third-degree price discrimination, (iii) characterized by extreme SKU heterogeneity and frequent cold starts, and (iv) subject to severe price endogeneity because sellers actively respond to algorithmic and demand signals. We review the history of elasticity estimation from Marshall (1890) through Berry, Levinsohn, and Pakes (1995) to modern double machine learning (Chernozhukov et al., 2018), and we identify three structural gaps in the existing literature when applied to platform e-commerce: insufficient treatment of mechanism-induced price discrimination, weak handling of cold-start SKUs, and inadequate uncertainty quantification for managerial decisions. We propose the Mechanism-Aware Hierarchical Causal Elasticity Model (MA-HCEM), a unified framework that combines (a) a hierarchical Bayesian backbone with partial pooling across SKU, category, and shop, (b) a double-machine-learning identification stage that purges algorithmic confounders from observed price variation, (c) a structural likelihood component that explicitly models Bundle Deal and voucher mechanisms as second-degree price discrimination, and (d) randomized micro-perturbations of price as an experimental prior. We state and defend an identification result for the proposed estimator, develop a theoretical bias decomposition that characterizes the structural failure modes of each canonical baseline against the platform-marketplace data-generating process, and describe a posterior-inference procedure based on stochastic variational inference with control variates. The contribution is methodological and theoretical; we make no empirical or simulation claims, and we lay out an explicit replication agenda for empirical validation.

1. Introduction

The single most consequential parameter in retail pricing is the price elasticity of demand: the percentage change in unit demand induced by a one-percent change in price (Marshall, 1890; Varian, 1992). It enters every textbook formula for the profit-maximizing markup (Lerner, 1934), the optimal discount depth (Tellis, 1988), the design of nonlinear tariffs (Wilson, 1993), and the value of a customer to a firm (Gupta and Lehmann, 2003). It also enters every modern dynamic-pricing system (Phillips, 2005; Talluri and van Ryzin, 2004), every revenue-management algorithm (den Boer, 2015), and every causal-inference pipeline for promotional measurement (Athey and Imbens, 2019).

Yet despite a century of econometric attention, reliable per-product elasticity estimates remain scarce in the operating environment that now dominates Southeast Asian and increasingly global retail: third-party marketplace platforms such as Shopee, Lazada, and TikTok Shop. Three structural features of these platforms make standard estimation approaches fragile. First, the seller does not control the demand surface in isolation: the platform's search-ranking and recommendation algorithms mediate exposure as a function of price, advertising bid, conversion history, and inventory state (Ursu, 2018; Goldfarb and Tucker, 2019). Second, promotional surfaces are mechanism-rich: a Shopee Bundle Deal is structurally a quantity discount that operates as second-degree price discrimination (Tirole, 1988); a stacked Lazada FlexiCombo voucher is a hybrid of second- and third-degree price discrimination; a TikTok Shop live-stream flash sale combines a discrete price drop with a time-limited attention shock. Third, sales are heavy-tailed and SKU turnover is rapid, so a typical mid-sized marketplace shop has many thin and short-lived SKUs for which classical SKU-level estimation is underpowered.

Each of these features violates a load-bearing assumption of the canonical elasticity estimators. Linear-log regressions are biased because price is endogenous to algorithmic demand signals (Reiss and Wolak, 2007). Discrete-choice models in the BLP family (Berry, Levinsohn, and Pakes, 1995) require cross-sectional market shares that are ill-defined when the choice set itself is algorithmically curated. Modern double machine learning (Chernozhukov et al., 2018) handles confounding flexibly but provides no native way to share information across thin SKUs, exploit mechanism structure, or generate the calibrated uncertainty required by margin-constrained operators.

This paper has three objectives. The first is historical and pedagogical: to trace how elasticity estimation evolved from Marshall's introspective derivation, through the cost-shifter instruments of the Cowles Commission era (Working, 1927; Wright, 1928), through structural demand systems (Deaton and Muellbauer, 1980; Berry et al., 1995), into the modern machine-learning literature (Wager and Athey, 2018; Chernozhukov et al., 2018), and to identify which assumptions hold and which fail under platform e-commerce. The second is diagnostic: to characterize the mechanism-induced biases that arise when these methods are applied naively to Shopee, Lazada, and TikTok Shop data. The third is constructive: to propose the Mechanism-Aware Hierarchical Causal Elasticity Model (MA-HCEM), a unified estimator designed for the platform-marketplace setting, and to evaluate its theoretical properties against canonical baselines.

The contribution is methodological and theoretical. We deliberately do not report simulation or empirical results in this version of the manuscript: the model has not yet been deployed on platform data, and we prefer to defer numerical claims to a companion empirical paper rather than to manufacture them. Section 7 instead presents a theoretical bias decomposition and an asymptotic comparison that articulate, qualitatively, why MA-HCEM is expected to dominate canonical baselines in the platform-marketplace regime, and under what conditions that dominance fails.

The remainder of the paper is organized as follows. Section 2 reviews the history of elasticity estimation and isolates the structural assumptions on which each generation of methods depends. Section 3 motivates the platform-specific problem and documents how each assumption fails. Section 4 specifies MA-HCEM. Section 5 develops the identification argument and discusses the role of randomized price perturbations as experimental priors. Section 6 describes posterior inference. Section 7 presents the theoretical bias decomposition and asymptotic comparison. Section 8 discusses managerial implications and the relationship to dynamic pricing. Section 9 lists limitations and an agenda for empirical replication. Section 10 concludes.

2. A Brief History of Price Elasticity Estimation

2.1 The classical period: Marshall to the Cowles Commission

The concept of elasticity of demand was formalized by Marshall (1890), though the underlying intuition that demand schedules are downward sloping and that responsiveness varies by good predates him by at least a century in the writings of Smith (1776) and Cournot (1838). Marshall's contribution was to define elasticity as a unit-free local rate, to argue that it varied systematically with product category — necessities being inelastic, luxuries elastic — and to propose introspection and direct survey as legitimate identification strategies in the absence of experimental data.

The first formal econometric problem in elasticity estimation, the identification problem, was posed in the 1920s. Working (1927) and Wright (1928) observed that a regression of observed price on observed quantity recovers neither the demand curve nor the supply curve, but a hybrid determined by the relative variance of demand and supply shocks. This launched the program of finding cost shifters — variables that move supply but not demand, such as input prices or weather shocks — as instruments for price. Schultz (1938) used this approach to estimate agricultural demand elasticities; the Cowles Commission (Koopmans, 1950; Hood and Koopmans, 1953) generalized it into the simultaneous-equations framework that dominated empirical price work for forty years.

The classical period left two durable contributions. The first is the recognition that price is endogenous: it responds to demand. The second is the standard remedy of instrumental variables under exogeneity and exclusion conditions. Both remain relevant under platform e-commerce, but the supply of valid cost-shifter instruments has thinned in a context where prices are reset algorithmically in response to platform-internal signals.

2.2 Structural demand: AIDS, logit, and BLP

A second wave, beginning with Deaton and Muellbauer's (1980) Almost Ideal Demand System (AIDS), pursued elasticities through theoretically consistent demand systems derived from utility maximization. AIDS imposed Slutsky symmetry and adding-up but assumed continuous quantities; this was suitable for grocery basket data but ill-suited to the discrete, infrequent purchases that dominate durable-goods retail.

The discrete-choice tradition, originating with McFadden (1974) and culminating in Berry, Levinsohn, and Pakes (1995, henceforth BLP), addressed this by modeling each consumer as choosing one product from a finite menu. BLP allowed for unobserved product characteristics correlated with price, used aggregate market shares as the empirical target, and instrumented for price using characteristic-space instruments. It became the workhorse of empirical industrial organization for a generation (Nevo, 2001; Berry and Haile, 2014).

BLP's core requirement, however, is a well-defined market in which all consumers face the same choice set. Platform e-commerce violates this requirement at its foundation: each consumer's choice set is algorithmically curated by the platform's ranking system as a function of search query, browsing history, and price (Ursu, 2018). Two consumers searching for the same query at the same moment can see different products in different orders. The "market share" of a SKU is not a primitive but a function of the algorithm's ranking policy. BLP-style estimators applied without adjustment will conflate true demand elasticity with the platform's price-elasticity-of-ranking — a confound that is invisible in the standard data.

2.3 The machine-learning turn: causal forests, double ML, and deep IV

Three strands of machine learning have reshaped elasticity estimation since the mid-2010s. The first, causal forests (Wager and Athey, 2018; Athey, Tibshirani, and Wager, 2019), generalizes random forests to estimate heterogeneous treatment effects, including continuous-treatment elasticities. The second, double machine learning (Chernozhukov et al., 2018, hereafter DML), uses cross-fitted machine-learning predictions of price and demand on confounders to construct a Neyman-orthogonal score that recovers elasticity with √n-consistency under high-dimensional confounding. The third, deep instrumental-variable methods (Hartford et al., 2017) combine neural networks with instrument-based identification.

These methods have substantially relaxed the parametric and instrument requirements of classical estimation. They are not, however, designed for the panel-with-cold-start structure of marketplace data, nor do they natively encode platform mechanism structure. Applied to platform panel data without modification, they tend to deliver high-variance estimates on thin SKUs (because they cannot pool information across SKUs), miss the kinks induced by Bundle Deal thresholds (because they do not encode the mechanism), and provide no calibrated uncertainty band beyond bootstrap intervals that are themselves unreliable on heavy-tailed sales.

2.4 Hierarchical Bayesian and partial-pooling approaches

A parallel literature, developed largely within marketing science and Bayesian statistics, has emphasized partial pooling as the response to SKU heterogeneity (Allenby and Rossi, 1998; Rossi, Allenby, and McCulloch, 2005; Gelman et al., 2013). The intuition is that an individual SKU's elasticity should not be estimated in isolation: it should be shrunk toward the mean of its category, with the degree of shrinkage determined by the data. This is the formal Bayesian counterpart to the practitioner intuition that a new SKU should inherit a sensible starting elasticity from comparable products. Hierarchical Bayesian estimators are a natural fit for platform data with thousands of SKUs and frequent cold starts, and they generate calibrated posterior intervals as a by-product. Their weakness, in raw form, is identification: the partial-pooling structure does nothing to address price endogeneity. The contribution of this paper is to combine the partial-pooling backbone with a DML-style identification stage and a mechanism-aware likelihood, in a single posterior-inference framework.

2.5 What changed, and what did not

Through this historical arc, three things changed: the flexibility with which we can model the conditional expectation of demand (from linear logs to deep neural networks), the dimensionality of confounders we can absorb (from a handful of cost shifters to thousands of platform-internal signals), and the targets we can estimate (from average elasticity to fully heterogeneous, SKU-level, time-varying elasticity).

What did not change is the foundational requirement: variation in price that is exogenous, conditional on what we have measured. Every elasticity estimator stands or falls on whether it succeeds in isolating such variation. In platform e-commerce, where price is a function of algorithmic signals the seller does not observe and the analyst rarely sees, this requirement is the binding constraint. Section 3 develops this point in detail.

3. Why Platform E-Commerce Needs a Different Estimator

3.1 The algorithmic confounder

The first feature of platform e-commerce that breaks naive estimation is the algorithmic mediation of demand. On Shopee, Lazada, and TikTok Shop, a SKU's exposure — the number of impressions it receives from organic search and recommendation — is determined by a ranking model that takes price as one of several inputs (Ursu, 2018; Goldfarb and Tucker, 2019). When a seller raises price, two things happen simultaneously: each impression converts at a lower rate (the demand effect we want to estimate), and the SKU receives fewer impressions to begin with (the algorithmic-ranking effect). A naive log-log regression of units sold on price recovers the sum of these two effects, not the demand elasticity in isolation. The bias is structural; it does not vanish with more data.

This is a different kind of confounding from the classical simultaneity problem. The classical problem is that price responds to demand shocks. The platform problem is that price causes changes in exposure through a mechanism that is exogenous to demand but is, from the analyst's perspective, an unobserved component of the data-generating process. It must be modeled or instrumented out.

3.2 Mechanism-induced kinks

The second feature is the structured promotional surface. Bundle Deal on Shopee, FlexiCombo on Lazada, and "buy more save more" on TikTok Shop are not marketing campaigns layered on top of a smooth demand curve. They are mechanism-design instruments — second-degree price discrimination in the sense of Tirole (1988) — and they introduce kinks and discontinuities in the realized price-quantity surface that smooth nonparametric estimators cannot capture without explicit structure. A causal forest fit on Bundle-Deal-active periods will smooth across the kink and underestimate the threshold-marginal elasticity that the seller actually needs for configuration decisions.

The voucher stack adds further structure. Platform-issued vouchers, seller-issued vouchers, and category-specific vouchers compose with formal stacking rules that the analyst must encode. Two impressions at the same nominal price can have very different realized prices after voucher application, and ignoring this distinction biases the estimator toward inelasticity (because it confuses voucher discounts with price stability).

3.3 The cold-start and long-tail problem

The third feature is the SKU-turnover regime. The SKU-level distribution of monthly units sold in marketplace data is heavy-tailed; the median SKU has few transactions in any given month, and a substantial share of revenue at the end of any quarter is generated by SKUs introduced in the recent past. A non-pooled estimator — one that fits a separate elasticity for each SKU — will fail on this distribution: there is not enough variation per SKU to identify the parameter precisely. A fully pooled estimator — one that fits a single category-level elasticity — will fail in the opposite direction by averaging over genuine heterogeneity.

The statistical answer is partial pooling (Gelman et al., 2013): each SKU's elasticity is a draw from a category-level distribution, and the strength of the prior is itself estimated. This is technically straightforward but is essentially absent from the dominant ML-causal estimators.

3.4 The decision-maker's loss function

A fourth feature is the asymmetry of the loss function in operational pricing. The seller does not lose equally from over- and underestimating elasticity. Underestimating elasticity (treating an elastic SKU as inelastic) leads to under-discounting and lost orders; overestimating it (treating an inelastic SKU as elastic) leads to discounting items that would have sold anyway and gives margin away. The relative magnitude of these losses depends on the SKU's contribution margin and inventory state, and varies across the catalog.

A point estimate is therefore insufficient: the seller needs a calibrated posterior so that decisions can be made under explicit risk-aversion (Berger, 1985; Robert, 2007). In particular, the optimal Bundle Deal threshold and discount under uncertainty are not the threshold and discount that maximize expected profit at the posterior mean of elasticity; they are the threshold and discount that maximize expected profit averaged over the posterior. This distinction motivates the Bayesian posterior-inference target in MA-HCEM rather than a frequentist point estimate.

4. The Mechanism-Aware Hierarchical Causal Elasticity Model

4.1 Notation and data structure

Let $i$ index SKUs, $s(i)$ index the shop containing SKU $i$, and $c(i)$ the platform-defined category of SKU $i$. Let $t$ index time periods. The observed outcome is $Y_{it}$, the units of SKU $i$ sold in period $t$. The treatment is $P_{it}$, the realized per-unit price paid (after voucher application) on the marginal transaction. The vector $X_{it}$ collects covariates: SKU age, shop fixed effect, category-time fixed effect, day-of-week, calendar campaign indicators, advertising spend, organic and paid impressions, inventory state, and a competitor-price summary. Let $M_{it} = (B_{it}, V_{it}, F_{it})$ denote the active mechanism configuration, where $B_{it}$ encodes Bundle Deal threshold and discount, $V_{it}$ encodes the active voucher stack, and $F_{it}$ indicates flash-sale or live-stream context. Let $Z_{it}$ denote a randomized perturbation indicator if available (Section 4.6).
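For concreteness, the sketch below lays out one possible row schema for the panel described above. The field names and types are illustrative assumptions, not a prescribed data contract.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PanelRow:
    """One SKU-period observation (i, t); field names are illustrative."""
    sku_id: str            # i
    shop_id: str           # s(i)
    category_id: str       # c(i)
    period: str            # t, e.g. an ISO date
    units_sold: int        # Y_it
    realized_price: float  # P_it, per-unit price after voucher application
    # Covariates X_it (subset shown)
    sku_age_days: int
    ad_spend: float
    organic_impressions: int
    paid_impressions: int
    inventory_on_hand: int
    competitor_price_index: float
    # Mechanism configuration M_it = (B_it, V_it, F_it)
    bundle_threshold: Optional[int]     # B_it: units required for the bundle price
    bundle_discount: Optional[float]    # B_it: per-unit discount once threshold is met
    voucher_stack_value: float          # V_it: summary of the active voucher stack
    flash_sale_active: bool             # F_it
    # Experimental design
    price_perturbation: Optional[float]  # Z_it, if a micro-perturbation was in force
```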

4.2 Generative model

The MA-HCEM specifies the conditional distribution of $Y_{it}$ as a mechanism-aware mixture:

$$p(Y_{it} \mid P_{it}, X_{it}, M_{it}) \;=\; (1 - \pi_{it})\, f_1(Y_{it} \mid P_{it}, X_{it}) \;+\; \pi_{it}\, f_2(Y_{it} \mid P_{it}, X_{it}, M_{it}),$$

where $f_1$ is the demand component for the single-unit-elastic segment (consumers who would buy at most one unit at the posted price), $f_2$ is the demand component for the bundle-eligible segment (consumers willing to buy multi-unit if and only if the conditional discount applies), and $\pi_{it}$ is a context-dependent mixing weight. This decomposition operationalizes the second-degree price discrimination structure that Bundle Deal and FlexiCombo exploit. We specify both components as negative-binomial to accommodate the over-dispersed counts characteristic of marketplace data:

$$Y_{it} \mid \text{segment } k \;\sim\; \mathrm{NegBin}\bigl(\mu^{(k)}_{it},\, \phi_k\bigr), \qquad k \in \{1, 2\}.$$

The conditional means are log-linear in price with mechanism-modulated elasticity:

$$\log \mu^{(1)}_{it} \;=\; \alpha_i + \beta_i \log P_{it} + \gamma^{\top} X_{it},$$
$$\log \mu^{(2)}_{it} \;=\; \alpha_i + \beta^{\mathrm{b}}_i \log P^{\mathrm{b}}_{it} + \gamma^{\top} X_{it} + \delta_i\, \mathbf{1}\{B_{it}\ \text{active}\},$$

where $P^{\mathrm{b}}_{it}$ is the per-unit price under the active bundle threshold and discount, and $\delta_i$ captures the discrete attention-shifting effect of bundle activation.
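As a reading aid, the following NumPy sketch forward-simulates a single observation $Y_{it}$ from the two-segment likelihood above. All parameter values are placeholders, and the mixing weight $\pi_{it}$ is taken as given here (its model is specified in Section 4.4).

```python
import numpy as np

rng = np.random.default_rng(0)

def nb_draw(mean, dispersion, rng):
    """Negative binomial via the Poisson-Gamma mixture (mean/dispersion form)."""
    lam = rng.gamma(shape=dispersion, scale=mean / dispersion)
    return rng.poisson(lam)

def simulate_units(price, bundle_price, bundle_active, x, pi,
                   alpha=1.5, beta=-1.8, beta_b=-2.5, delta=0.4,
                   gamma=None, phi1=3.0, phi2=3.0, rng=rng):
    """One draw of Y_it from the two-segment mechanism-aware mixture (placeholder values)."""
    gamma = np.zeros_like(x) if gamma is None else gamma
    mu1 = np.exp(alpha + beta * np.log(price) + x @ gamma)            # single-unit segment
    mu2 = np.exp(alpha + beta_b * np.log(bundle_price)
                 + x @ gamma + delta * float(bundle_active))          # bundle-eligible segment
    in_bundle_segment = rng.random() < pi
    return nb_draw(mu2 if in_bundle_segment else mu1,
                   phi2 if in_bundle_segment else phi1, rng)

# Example: posted price 100, bundle per-unit price 85 when the threshold is met
y = simulate_units(price=100.0, bundle_price=85.0, bundle_active=True,
                   x=np.zeros(3), pi=0.35)
```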

4.3 Hierarchical prior

The SKU-level elasticities $\beta_i$ are partially pooled across two levels of platform taxonomy:

$$\beta_i \;\sim\; \mathcal{N}\bigl(\mu_{c(i)},\, \tau_{\beta}^{2}\bigr),$$
$$\mu_{c} \;\sim\; \mathcal{N}\bigl(\mu_{0},\, \tau_{\mu}^{2}\bigr).$$

Hyperparameters $\mu_{0}$, $\tau_{\beta}$, $\tau_{\mu}$ are themselves given weakly informative hyperpriors (Gelman, 2006), with half-Cauchy priors on the scale parameters $\tau_{\beta}$ and $\tau_{\mu}$. This structure delivers the cold-start regularization that Section 3.3 motivated: a new SKU's posterior elasticity is initially dominated by the category-level prior and updates as its own data accumulates.
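A minimal NumPy sketch of the two-level prior, with placeholder hyperparameter values, illustrating how a cold-start SKU inherits the category-level distribution before it accumulates data of its own:

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder hyperparameters; in practice these carry weakly informative hyperpriors
mu_0, tau_mu, tau_beta = -1.5, 0.5, 0.3

n_categories, skus_per_category = 4, 5
mu_c = rng.normal(mu_0, tau_mu, size=n_categories)               # category-level means
beta = rng.normal(np.repeat(mu_c, skus_per_category), tau_beta)  # SKU-level elasticities

# A brand-new SKU in category 0 starts from the category prior N(mu_c[0], tau_beta^2)
cold_start_prior_draws = rng.normal(mu_c[0], tau_beta, size=1000)
```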

4.4 Mixing weight and mechanism response

The mixing weight $\pi_{it}$ is modeled as a logistic function of mechanism state and consumer-context covariates:

$$\pi_{it} \;=\; \sigma\bigl(\kappa_{0} + \kappa_{B}^{\top} B_{it} + \kappa_{V}^{\top} V_{it} + \kappa_{F} F_{it} + \kappa_{X}^{\top} X_{it}\bigr),$$

where $\sigma(\cdot)$ is the logistic function. This permits the model to learn that, for example, during a deep Bundle Deal the bundle-eligible segment expands sharply, while during a TikTok Shop live-stream flash sale the single-unit-elastic segment dominates. Cross-elasticities — that is, the substitution between SKUs in the same shop or category — enter the model through the competitor-price summary in $X_{it}$ and through a category-level demand-share constraint that we develop in Appendix A.

4.5 Algorithmic-confounder adjustment via DML

The price endogeneity introduced by platform ranking is addressed via a DML cross-fitting layer (Chernozhukov et al., 2018). Conditional on the high-dimensional covariate vector $X_{it}$ — which we deliberately enrich to include impressions, ad spend, search rank, inventory, and category-time effects — we treat the residualized price $\tilde{P}_{it}$ as the identifying treatment variation:

$$\tilde{P}_{it} \;=\; \log P_{it} - \hat{m}(X_{it}), \qquad \hat{m}(X_{it}) \;=\; \widehat{\mathbb{E}}\bigl[\log P_{it} \mid X_{it}\bigr],$$

with $\hat{m}$ estimated by cross-fitted machine learning. The conditional-mean equations of Section 4.2 are estimated with $\tilde{P}_{it}$ replacing $\log P_{it}$ in the price-effect terms. Under the standard DML conditions — Neyman orthogonality of the score, sufficient ML convergence rates, cross-fitting — the resulting elasticity is consistent for the structural parameter $\beta_i$ even when $X_{it}$ is high-dimensional and nonparametric.
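The residualization stage can be sketched with standard cross-fitting machinery. The snippet below uses scikit-learn; the gradient-boosting learner and the covariate layout are illustrative choices, not part of the specification.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold

def cross_fit_residualize(log_price, X, n_splits=5, random_state=0):
    """Return cross-fitted log-price residuals: log P_it minus m_hat(X_it)."""
    residuals = np.empty_like(log_price)
    folds = KFold(n_splits, shuffle=True, random_state=random_state)
    for train_idx, test_idx in folds.split(X):
        m_hat = GradientBoostingRegressor(random_state=random_state)
        m_hat.fit(X[train_idx], log_price[train_idx])   # fit E[log P | X] on the fold complement
        residuals[test_idx] = log_price[test_idx] - m_hat.predict(X[test_idx])
    return residuals

# Usage: X stacks impressions, ad spend, search rank, inventory, category-time effects
# log_price_resid = cross_fit_residualize(np.log(price), X_covariates)
```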

4.6 Randomized perturbations as experimental priors

A distinctive feature of MA-HCEM is its treatment of small randomized price perturbations as the source of exogenous variation. In a managed deployment, the operator perturbs each SKU's price by a small amount on a fraction of impressions, with the perturbation $Z_{it}$ drawn independently of $(X_{it}, M_{it})$ and of the structural demand error. The exogeneity of this randomization breaks any residual confounding the DML stage cannot handle. Practically, the perturbations are absorbed as a data-augmenting experimental prior: each perturbed observation contributes to the posterior with the standard likelihood weight, and the unperturbed observations contribute with the DML-residualized likelihood. The combined estimator inherits the asymptotic bias of the DML stage in the absence of perturbations and the unbiasedness of a randomized experiment in their presence, with the relative weighting determined automatically by Bayesian variance pooling.

This is the formal counterpart of the practitioner intuition that a small holdout test is the cheapest way to be sure. The contribution of MA-HCEM is to make the holdout test continuous, on-policy, and seamlessly integrated with observational data — rather than a discrete A/B test that is expensive to design and slow to read out.
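A sketch of how the micro-perturbation assignment might look in code. The perturbation depth and coverage fraction below are placeholder values; in a real deployment these are operator decisions constrained by margin and platform policy.

```python
import numpy as np

def assign_micro_perturbations(n_obs, perturb_frac=0.05, max_pct=0.03, seed=0):
    """Draw multiplicative price perturbations Z_it, independent of all covariates.

    A fraction `perturb_frac` of observations receives a price multiplier drawn
    uniformly from [1 - max_pct, 1 + max_pct]; the rest keep the posted price.
    """
    rng = np.random.default_rng(seed)
    perturbed = rng.random(n_obs) < perturb_frac
    multiplier = np.where(perturbed,
                          rng.uniform(1 - max_pct, 1 + max_pct, size=n_obs),
                          1.0)
    return perturbed, multiplier

# perturbed, multiplier = assign_micro_perturbations(n_obs=10_000)
# displayed_price = posted_price * multiplier
```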

5. Identification

5.1 Identification of the demand elasticity

We state the identification result for the single-segment case ($\pi_{it} \equiv 0$, with the bundle component $f_2$ degenerate); the two-segment extension is given in Appendix A.

Assumption 1 (Conditional independence). $\varepsilon_{it} \,\perp\, (\log P_{it}, Z_{it}) \mid X_{it}, M_{it}$, where $\varepsilon_{it}$ is the structural demand error and $Z_{it}$ is the randomized-perturbation indicator (degenerate when no perturbation is in force).

Assumption 2 (DML rate condition). The ML estimators $\hat{m}$ for $\mathbb{E}[\log P_{it} \mid X_{it}]$ and $\hat{g}$ for $\mathbb{E}[\log Y_{it} \mid X_{it}]$ converge at rate $o_P(n^{-1/4})$ under cross-fitting.

Assumption 3 (Hierarchical exchangeability). Conditional on category $c(i)$, the SKU-level elasticities $\beta_i$ are exchangeable.

Proposition 1 (Identification of $\beta_i$ under MA-HCEM). Under Assumptions 1-3, the posterior distribution of $\beta_i$ given the data concentrates around the true SKU-level elasticity as the number of within-SKU observations $T_i \to \infty$ for each SKU and as the number of SKUs $n_c \to \infty$ within each category, with the rate of concentration determined by the maximum of the within-SKU effective sample size and the cross-SKU pooling strength.

The argument follows the standard hierarchical-Bayes consistency argument (Diaconis and Freedman, 1986; Ghosal and van der Vaart, 2017) combined with the Neyman-orthogonal score property of the DML residualization. Intuitively, the partial-pooling structure provides identification for cold-start SKUs through the category prior, and the DML stage provides identification for established SKUs through within-SKU price variation that is conditionally exogenous given . A formal proof is sketched in Appendix B; we note that the proposition is stated as a posterior-concentration result rather than a frequentist consistency theorem, and that completing it to a published-grade theorem is an item on the methodological agenda flagged in Section 9.
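For intuition, under a partially linear approximation to the single-segment model, the Neyman-orthogonal score underlying the DML stage takes the familiar Robinson-style form (stated here as an illustration of the orthogonality property, not as the exact MA-HCEM score):

$$\psi(W_{it};\, \beta, \hat{g}, \hat{m}) \;=\; \bigl(\log Y_{it} - \hat{g}(X_{it}) - \beta\,[\log P_{it} - \hat{m}(X_{it})]\bigr)\,\bigl(\log P_{it} - \hat{m}(X_{it})\bigr).$$

The score has zero expectation at the true $\beta$ and is first-order insensitive to estimation error in the nuisance functions $\hat{g}$ and $\hat{m}$, which is what licenses the use of flexible machine-learning nuisance estimators.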

5.2 What identification does not buy

Three caveats are worth stating explicitly.

First, identification is conditional on the covariate set being rich enough to absorb the algorithmic confounding. In practice this requires the analyst to have access to platform impression data, search-rank data, and ad-spend data at the SKU-time level. When the analyst has access only to the seller's transaction data — as is sometimes the case for third-party tooling — the DML stage will be biased to the extent that algorithmic exposure is correlated with price conditional on observables. The randomized-perturbation augmentation is the formal escape route from this limitation; we recommend it as the default deployment.

Second, the cross-elasticity structure is identified only up to a normalization. We follow the demand-system literature (Deaton and Muellbauer, 1980) in imposing adding-up at the category level; alternatives are discussed in Appendix A.

Third, the model is identified at the elasticity-of-the-mean level, not the full distribution of demand. Quantile elasticities — the response of, say, the 90th percentile of demand to a price change — require an extension to the negative-binomial mean-variance relationship that we do not develop here.

6. Posterior Inference

6.1 Inference target

The inference target is the joint posterior $p\bigl(\{\alpha_i, \beta_i, \beta^{\mathrm{b}}_i, \delta_i\},\, \gamma, \kappa, \mu, \tau \mid \mathcal{D}\bigr)$, where $\mathcal{D} = \{(Y_{it}, P_{it}, X_{it}, M_{it}, Z_{it})\}_{i,t}$ is the panel. For managerial use, the marginal posteriors of $\beta_i$ are the primary objects, but the full joint is needed for downstream profit-optimization calculations, particularly Bundle Deal threshold selection.

6.2 Stochastic variational inference

For panels of realistic size, full Markov-chain Monte Carlo inference is computationally costly. We adopt stochastic variational inference (Hoffman et al., 2013; Kucukelbir et al., 2017) with a structured mean-field family that retains the hierarchical correlations between and but factorizes over SKUs within a category. The negative-binomial likelihood is reparameterized via a Poisson-Gamma mixture for differentiable Monte Carlo gradient estimation. We use control variates derived from the DML residual (Ranganath, Gerrish, and Blei, 2014) to stabilize gradients on long-tailed SKUs. A reference Hamiltonian-Monte-Carlo implementation (Stan; Carpenter et al., 2017) can be used for posterior validation on small subsets where computational cost is not binding. We do not report wall-clock benchmarks here; benchmarking is part of the empirical companion agenda outlined in Section 9.
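A minimal NumPyro sketch of the variational setup for a simplified, single-segment version of the model: a category-level hierarchy over elasticities, a negative-binomial likelihood on DML-residualized log price, and a mean-field autoguide. It omits the mechanism mixture and the control variates, and every prior scale is a placeholder.

```python
import jax.numpy as jnp
from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import SVI, Trace_ELBO
from numpyro.infer.autoguide import AutoNormal

def model(log_price_resid, units, sku_idx, category_of_sku, n_skus, n_categories):
    # Hyperpriors (placeholder scales)
    mu_0 = numpyro.sample("mu_0", dist.Normal(0.0, 2.0))
    tau_mu = numpyro.sample("tau_mu", dist.HalfCauchy(1.0))
    tau_beta = numpyro.sample("tau_beta", dist.HalfCauchy(1.0))
    alpha = numpyro.sample("alpha", dist.Normal(0.0, 2.0))
    phi = numpyro.sample("phi", dist.HalfCauchy(5.0))

    with numpyro.plate("categories", n_categories):
        mu_c = numpyro.sample("mu_c", dist.Normal(mu_0, tau_mu))
    with numpyro.plate("skus", n_skus):
        beta = numpyro.sample("beta", dist.Normal(mu_c[category_of_sku], tau_beta))

    mean = jnp.exp(alpha + beta[sku_idx] * log_price_resid)
    with numpyro.plate("obs", units.shape[0]):
        numpyro.sample("units", dist.NegativeBinomial2(mean, phi), obs=units)

svi = SVI(model, AutoNormal(model), numpyro.optim.Adam(step_size=0.01), Trace_ELBO())
# result = svi.run(random.PRNGKey(0), 2_000,
#                  log_price_resid, units, sku_idx, category_of_sku, n_skus, n_categories)
```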

6.3 Posterior predictive checks

We recommend three posterior predictive checks (Gelman, Meng, and Stern, 1996) as a deployment hygiene minimum: (i) calibration of the credible interval against held-out price perturbations, (ii) coherence of the elasticity distribution against the category prior — a check that flags pathological pooling — and (iii) a discrepancy statistic that compares model-implied Bundle Deal uptake against observed uptake at a candidate threshold.
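A sketch of check (i), credible-interval calibration against held-out perturbed-price periods, assuming posterior predictive draws are available as an array:

```python
import numpy as np

def interval_coverage(posterior_pred_draws, y_holdout, level=0.9):
    """Fraction of held-out observations falling inside the central credible interval.

    posterior_pred_draws: array (n_draws, n_holdout) of simulated unit counts
    y_holdout: observed units in held-out (perturbed-price) periods, shape (n_holdout,)
    A well-calibrated model should return a value close to `level`.
    """
    lo = np.quantile(posterior_pred_draws, (1 - level) / 2, axis=0)
    hi = np.quantile(posterior_pred_draws, 1 - (1 - level) / 2, axis=0)
    return float(np.mean((y_holdout >= lo) & (y_holdout <= hi)))
```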

7. Theoretical Comparison with Canonical Baselines

This section develops a qualitative bias decomposition that identifies which error components each canonical baseline leaves unaddressed under the platform-marketplace data-generating process described in Section 3, and where MA-HCEM is designed to reduce those errors. The comparison is theoretical; empirical and Monte Carlo validation remain part of the replication agenda in Section 9.

7.1 Decomposition of the elasticity-estimation error

For any elasticity estimator $\hat{\beta}_i$ of the SKU-level elasticity $\beta_i$, we decompose the mean-squared error as the sum of three components: (i) the algorithmic-confounding bias that arises when price is correlated with unobserved exposure-affecting factors, (ii) the mechanism-misspecification bias that arises when the estimator smooths over kinks induced by Bundle Deal or voucher mechanisms, and (iii) the estimation variance that arises from finite within-SKU data.
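Schematically, treating the two bias sources as additive components of the total bias and suppressing their dependence on the estimator, the decomposition reads:

$$\mathbb{E}\bigl[(\hat{\beta}_i - \beta_i)^2\bigr] \;=\; \bigl(B^{\mathrm{alg}}_i + B^{\mathrm{mech}}_i\bigr)^2 \;+\; \mathrm{Var}\bigl(\hat{\beta}_i\bigr),$$

where $B^{\mathrm{alg}}_i$ denotes the algorithmic-confounding bias, $B^{\mathrm{mech}}_i$ the mechanism-misspecification bias, and the variance term the finite-sample estimation variance.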

Each canonical baseline addresses a strict subset of these three components.

7.2 Failure modes of canonical baselines

Naive log-log regression. A SKU-level fixed-effects regression of $\log Y_{it}$ on $\log P_{it}$ controls for time-invariant SKU heterogeneity but not for time-varying algorithmic exposure. The within-SKU price variation it exploits is precisely the variation that is most contaminated by algorithmic confounding: when the seller raises price, the platform reduces exposure, so the within-SKU correlation between price and units sold is the sum of the demand elasticity and the exposure-elasticity-of-price. Whenever the ranking algorithm penalizes price increases — the case described in Section 3.1 — the naive estimate therefore overstates the magnitude of the demand elasticity. It additionally ignores mechanism kinks, so it is structurally incapable of recovering the threshold-marginal elasticity needed for Bundle Deal configuration.

Double machine learning without partial pooling. A DML estimator that residualizes price on a rich covariate vector — including impressions, ad spend, and search rank — can in principle eliminate the algorithmic-confounding bias, provided the covariate vector spans the confounder space and the ML nuisance estimators converge at the required rate (Chernozhukov et al., 2018). On established SKUs with adequate within-SKU sample size, this is the right tool. Two failure modes remain. First, on cold-start and thin SKUs, the within-SKU effective sample size after residualization is small, and the resulting elasticity estimate has high variance — which compounds with mechanism-misspecification bias because the DML estimator does not encode the bundle/voucher kink. Second, DML produces frequentist point estimates and bootstrap confidence intervals that are themselves unreliable on heavy-tailed sales distributions; the calibrated uncertainty needed for risk-aware managerial decision-making is not delivered natively.

Causal forest. Generalized random forests (Athey, Tibshirani, and Wager, 2019) handle high-dimensional confounders flexibly and produce SKU-heterogeneous elasticity estimates. They share two of DML's weaknesses (no partial pooling, no native posterior) and add a third: the smoothness inductive bias of forest-based estimators tends to flatten the kinks that Bundle Deal mechanisms introduce in the realized price-quantity surface, so the threshold-marginal elasticity is systematically underestimated.

Hierarchical Bayes without DML. A pure partial-pooling model fits the cold-start regime well — variance is controlled by shrinkage to the category mean — but does nothing about algorithmic confounding. The category-level pooled elasticity is biased by the same algorithmic mechanism that biases naive OLS, and that bias is then propagated to every SKU through the hierarchical prior. The estimator is well-behaved in variance but systematically biased in mean.

BLP-family discrete choice. As discussed in Section 2.2, BLP-style estimators require a well-defined market and a fixed choice set; both fail under platform-curated discovery. The estimator is structurally inapplicable to platform-marketplace data without nontrivial adjustment.

7.3 What MA-HCEM addresses

MA-HCEM is constructed to address each of the three error components simultaneously:

1. The DML residualization stage (Section 4.5) addresses the algorithmic-confounding bias, conditional on the richness of the covariate set.

2. The two-segment, mechanism-aware likelihood (Section 4.2) addresses the mechanism-misspecification bias by encoding Bundle Deal and voucher structure as a structural mixture rather than a smoothed continuous response.

3. The hierarchical partial-pooling prior (Section 4.3) addresses estimation variance by shrinking thin-SKU posteriors toward the category mean.

4. The randomized-perturbation augmentation (Section 4.6) provides a fallback against residual confounding that the covariate set does not absorb, and serves as the empirical anchor for the calibrated posterior.

Each of these mechanisms is well-established individually in its own literature; the contribution of this paper is to unify them in a single estimator targeted at the platform-marketplace data-generating process.

7.4 Conditions under which MA-HCEM does not dominate

Honesty about boundary conditions matters. MA-HCEM is unlikely to outperform simpler baselines under any of the following conditions:

1. No algorithmic confounding. If the platform's exposure mechanism is approximately price-neutral, naive within-SKU regression is consistent and the DML stage adds variance without reducing bias.

2. No mechanism kinks. If Bundle Deals and vouchers are absent or quantitatively small in the data, the two-segment likelihood reduces to a more complex equivalent of the single-segment specification, and a simpler model wins on parsimony.

3. Thick SKUs with abundant within-SKU price variation. If every SKU has thousands of transactions and ample exogenous price variation, partial pooling adds little and a flat-prior DML estimator is competitive.

4. Misspecified mixture. If the true demand process is not well approximated by a two-segment mixture — for example, if there is a continuous distribution of consumer reservation prices with no discrete bundle-eligibility structure — the mechanism-aware likelihood introduces specification error that may exceed the bias it removes. This risk is mitigated by posterior predictive checking (Section 6.3) but cannot be eliminated.

A reviewer will rightly demand that these boundary conditions be tested empirically before MA-HCEM is recommended for general deployment. Section 9 describes the replication agenda intended to do exactly that.

8. Discussion

8.1 Managerial implications

The model produces three managerially actionable outputs: a per-SKU posterior elasticity with credible interval, a model-implied optimal Bundle Deal configuration, and a voucher-targeting score that ranks customers (or follower segments) by their estimated marginal response to a discount.

The per-SKU posterior elasticity is the foundational output. It permits the seller to migrate from uniform-discount sale-event configuration to elasticity-tiered configuration: deeper discounts on the elastic tail of the catalog, shallow or zero discounts on the inelastic tail. The magnitude of any resulting profit gain is an empirical question that we do not attempt to quantify in this paper; the theoretical mechanism is clear, but the size of the effect depends on the dispersion of true elasticities across the catalog and is best estimated against operator data.

The model-implied optimal Bundle Deal configuration is computed by integrating expected profit over the joint posterior of $\beta_i$, $\delta_i$, and the mixing-weight parameters. Crucially, the optimal threshold under uncertainty is not the threshold that maximizes expected profit at the posterior mean: it is shaded toward the threshold that minimizes downside risk, consistent with the asymmetric loss function discussed in Section 3.4.
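A sketch of the calculation: expected profit is averaged over posterior draws rather than evaluated at the posterior mean, so parameter uncertainty propagates into the configuration choice. The profit function and candidate grid below are simplified placeholders, and `posterior_units` is assumed to come from the model's posterior predictive under each candidate configuration.

```python
import numpy as np

def expected_profit_per_candidate(posterior_units, unit_margin, bundle_discounts):
    """Average profit over posterior draws for each candidate bundle configuration.

    posterior_units: array (n_draws, n_candidates) of model-implied units sold
                     under each candidate (threshold, discount) configuration
    unit_margin: contribution margin per unit at the undiscounted price
    bundle_discounts: per-unit discount for each candidate, shape (n_candidates,)
    """
    profit_draws = posterior_units * (unit_margin - bundle_discounts)  # (n_draws, n_candidates)
    # Averaging over draws (not plugging in posterior means) is what lets
    # parameter uncertainty shade the chosen configuration toward lower downside risk.
    return profit_draws.mean(axis=0)

# best = np.argmax(expected_profit_per_candidate(posterior_units, unit_margin=4.0,
#                                                bundle_discounts=np.array([0.0, 0.5, 1.0])))
```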

The voucher-targeting score follows by computing the expected incremental profit of a discount conditional on customer-segment covariates. This collapses to a simple ranking by estimated elasticity, with the recommendation to allocate deeper vouchers to the elastic tail and retention-only vouchers to the inelastic tail.

8.2 Relationship to dynamic pricing

MA-HCEM is the measurement layer; dynamic-pricing systems (Phillips, 2005; den Boer, 2015) are the automation layer. A fully closed-loop dynamic-pricing system requires the elasticity model, a profit function, an inventory-state model, and a pricing policy. MA-HCEM contributes the first of these and, by virtue of producing a calibrated posterior, also provides the input that a Thompson-sampling or upper-confidence-bound bandit policy needs to balance exploration and exploitation in a principled way (Russo et al., 2018). Our recommendation is that practitioners deploy MA-HCEM in advisory mode first — using its outputs to inform manual pricing decisions — and migrate to closed-loop automation only after a calibration period sufficient to validate the posterior coverage on operational data.

8.3 Relationship to existing platform-data literature

A growing literature has examined demand on digital platforms. Ursu (2018) studies the causal effect of platform ranking on consumer search and purchase decisions on a major travel platform and motivates much of the algorithmic-confounder discussion in Section 3.1. Goldfarb and Tucker (2019) survey the broader digital-economics literature and frame the structural distinctiveness of platform retailing. Our contribution relative to this literature is the explicit incorporation of mechanism structure (Bundle Deal, voucher stacking) into the likelihood, the integration of DML residualization into a hierarchical-Bayesian backbone, and the use of randomized perturbations as on-policy experimental priors.

9. Limitations and Replication Agenda

We identify five limitations that bound the scope of the present work and that we propose as priorities for empirical replication.

First, this paper presents no simulation or empirical results. The theoretical analysis in Section 7 articulates why we expect MA-HCEM to dominate canonical baselines under the platform-marketplace data-generating process, but the magnitude of that dominance — and the boundary conditions where it fails — is an empirical question. The first item on the replication agenda is a Monte Carlo study under data-generating processes calibrated to platform features; the second is empirical validation against ground-truth experimental price perturbations on a real marketplace, which requires a partnership with a platform or a managed seller cohort.

Second, the cross-elasticity structure in our specification is parsimonious — we impose adding-up at the category level — and richer substitution patterns (asymmetric substitution, hierarchical category trees) are likely needed for empirical fidelity in some categories.

Third, the model is static in elasticity: $\beta_i$ does not vary over time. In settings with strong reference-price effects (Kalyanaram and Winer, 1995) or rapid product-life-cycle dynamics, a state-space extension with time-varying elasticity is appropriate. The hierarchical-Bayesian backbone supports this extension naturally.

Fourth, we have abstracted from competition. A fully strategic specification would model competing sellers' price responses (Tirole, 1988; Reiss and Wolak, 2007). This is an important direction; we expect the empirical importance to vary across categories with the degree of platform-induced price coordination.

Fifth, we have treated the platform's algorithmic ranking as a confounder to be absorbed rather than a strategic variable. A fuller treatment would acknowledge that the platform optimizes the ranking against its own objective, which may not coincide with the seller's; the elasticity estimate is then a function of the platform's objective as well as the underlying consumer preferences. We see this as an important agenda but beyond the scope of the present paper.

We additionally flag that Proposition 1 is stated as a posterior-concentration result with a sketched argument. Completing the proof to a formal published-grade theorem — including precise rate conditions on the SKU and category sample-size growth — is an item on the methodological agenda.

10. Conclusion

Price elasticity remains the single highest-leverage parameter in retail pricing decisions, but the canonical estimators developed for offline retail and structural industrial organization do not transfer cleanly to platform e-commerce. The features that make platform marketplaces distinctive — algorithmic mediation of demand, mechanism-rich promotional surfaces, heavy-tailed and rapid SKU turnover, and asymmetric managerial loss functions — break the load-bearing assumptions of every dominant estimator in turn.

This paper has reviewed the historical arc from Marshall through BLP through double machine learning to identify which of these assumptions hold and which fail under platform e-commerce, and it has proposed the Mechanism-Aware Hierarchical Causal Elasticity Model as a unified estimator designed for the marketplace setting. MA-HCEM combines a hierarchical-Bayesian backbone with double-machine-learning identification, a structural likelihood that encodes Bundle Deal and voucher mechanisms as second-degree price discrimination, and randomized micro-perturbations as on-policy experimental priors. Section 7's bias decomposition makes precise which failure modes of canonical baselines MA-HCEM is constructed to remediate, and under what conditions the proposed estimator is not expected to outperform simpler alternatives.

The contribution of this paper is methodological and theoretical. We make no empirical or simulation claims. The empirical agenda we propose is direct: a Monte Carlo calibration to platform features, followed by validation against ground-truth randomized experiments via a platform partnership or managed seller cohort, and a cross-platform comparison of elasticity heterogeneity that the partial-pooling structure of MA-HCEM is uniquely positioned to support. The methodological agenda is broader: incorporating reference-price dynamics, strategic competition, and the platform's own objective into the structural likelihood, and completing the formal asymptotic theory of the proposed estimator.

References

Allenby, G. M., and Rossi, P. E. (1998). Marketing models of consumer heterogeneity. Journal of Econometrics, 89(1-2), 57-78.

Athey, S., and Imbens, G. W. (2019). Machine learning methods that economists should know about. Annual Review of Economics, 11, 685-725.

Athey, S., Tibshirani, J., and Wager, S. (2019). Generalized random forests. Annals of Statistics, 47(2), 1148-1178.

Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis (2nd ed.). Springer.

Berry, S. T., and Haile, P. A. (2014). Identification in differentiated products markets using market level data. Econometrica, 82(5), 1749-1797.

Berry, S., Levinsohn, J., and Pakes, A. (1995). Automobile prices in market equilibrium. Econometrica, 63(4), 841-890.

Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J., Li, P., and Riddell, A. (2017). Stan: A probabilistic programming language. Journal of Statistical Software, 76(1).

Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. Econometrics Journal, 21(1), C1-C68.

Cournot, A. A. (1838). Recherches sur les principes mathématiques de la théorie des richesses. Hachette.

Deaton, A., and Muellbauer, J. (1980). An almost ideal demand system. American Economic Review, 70(3), 312-326.

den Boer, A. V. (2015). Dynamic pricing and learning: Historical origins, current research, and new directions. Surveys in Operations Research and Management Science, 20(1), 1-18.

Diaconis, P., and Freedman, D. (1986). On the consistency of Bayes estimates. Annals of Statistics, 14(1), 1-26.

Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis, 1(3), 515-534.

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013). Bayesian Data Analysis (3rd ed.). CRC Press.

Gelman, A., Meng, X.-L., and Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6(4), 733-760.

Ghosal, S., and van der Vaart, A. (2017). Fundamentals of Nonparametric Bayesian Inference. Cambridge University Press.

Goldfarb, A., and Tucker, C. (2019). Digital economics. Journal of Economic Literature, 57(1), 3-43.

Gupta, S., and Lehmann, D. R. (2003). Customers as assets. Journal of Interactive Marketing, 17(1), 9-24.

Hartford, J., Lewis, G., Leyton-Brown, K., and Taddy, M. (2017). Deep IV: A flexible approach for counterfactual prediction. In Proceedings of the 34th International Conference on Machine Learning (ICML), 70, 1414-1423.

Hicks, J. R. (1939). Value and Capital. Oxford University Press.

Hoffman, M. D., Blei, D. M., Wang, C., and Paisley, J. (2013). Stochastic variational inference. Journal of Machine Learning Research, 14(1), 1303-1347.

Hood, W. C., and Koopmans, T. C. (Eds.) (1953). Studies in Econometric Method. Cowles Commission Monograph 14, Wiley.

Kalyanaram, G., and Winer, R. S. (1995). Empirical generalizations from reference price research. Marketing Science, 14(3, supplement), G161-G169.

Koopmans, T. C. (Ed.) (1950). Statistical Inference in Dynamic Economic Models. Cowles Commission Monograph 10, Wiley.

Kucukelbir, A., Tran, D., Ranganath, R., Gelman, A., and Blei, D. M. (2017). Automatic differentiation variational inference. Journal of Machine Learning Research, 18(14), 1-45.

Lerner, A. P. (1934). The concept of monopoly and the measurement of monopoly power. Review of Economic Studies, 1(3), 157-175.

Marshall, A. (1890). Principles of Economics. Macmillan.

McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in Econometrics, 105-142. Academic Press.

Nevo, A. (2001). Measuring market power in the ready-to-eat cereal industry. Econometrica, 69(2), 307-342.

Phillips, R. L. (2005). Pricing and Revenue Optimization. Stanford University Press.

Ranganath, R., Gerrish, S., and Blei, D. M. (2014). Black box variational inference. In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics (AISTATS), 33, 814-822.

Reiss, P. C., and Wolak, F. A. (2007). Structural econometric modeling: Rationales and examples from industrial organization. Handbook of Econometrics, 6A, 4277-4415.

Robert, C. P. (2007). The Bayesian Choice (2nd ed.). Springer.

Rossi, P. E., Allenby, G. M., and McCulloch, R. (2005). Bayesian Statistics and Marketing. Wiley.

Russo, D. J., Van Roy, B., Kazerouni, A., Osband, I., and Wen, Z. (2018). A tutorial on Thompson sampling. Foundations and Trends in Machine Learning, 11(1), 1-96.

Schultz, H. (1938). The Theory and Measurement of Demand. University of Chicago Press.

Smith, A. (1776). An Inquiry into the Nature and Causes of the Wealth of Nations. Strahan and Cadell.

Talluri, K. T., and van Ryzin, G. J. (2004). The Theory and Practice of Revenue Management. Springer.

Tellis, G. J. (1988). The price elasticity of selective demand: A meta-analysis of econometric models of sales. Journal of Marketing Research, 25(4), 331-341.

Tirole, J. (1988). The Theory of Industrial Organization. MIT Press.

Train, K. E. (2009). Discrete Choice Methods with Simulation (2nd ed.). Cambridge University Press.

Ursu, R. M. (2018). The power of rankings: Quantifying the effect of rankings on online consumer search and purchase decisions. Marketing Science, 37(4), 530-552.

Varian, H. R. (1992). Microeconomic Analysis (3rd ed.). W. W. Norton.

Wager, S., and Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523), 1228-1242.

Wilson, R. (1993). Nonlinear Pricing. Oxford University Press.

Working, E. J. (1927). What do statistical "demand curves" show? Quarterly Journal of Economics, 41(2), 212-235.

Wright, P. G. (1928). The Tariff on Animal and Vegetable Oils. Macmillan.
