Machine learning in e-commerce often gets described in vague terms. For sellers, the most useful ML question is much more concrete:
How much will this SKU sell soon?
That question feeds into every other operational decision a marketplace seller makes — inventory, ads, pricing, promotions, cashflow. Underestimate demand and the SKU stocks out at exactly the worst time. Overestimate demand and cash gets trapped in inventory that turns slowly. Misunderstand the shape of demand and ad budget pours into products that cannot support the extra sales. Demand forecasting is not really prediction for its own sake. It is decision infrastructure.
Why marketplace demand is hard to forecast
Marketplace demand is volatile in ways that classical retail forecasting was never designed for. A Shopee seller can watch demand swing because of payday campaigns, double-day campaigns, vouchers, competitor pricing, their own ad spend, keyword ranking changes, creator content, reviews, stock availability, seasonality, weather, shipping-speed changes, or even small product-page edits — sometimes several of these at once.
A simple historical average does not survive contact with that environment. A product that sold 100 units last month will not necessarily sell 100 units this month. Maybe it was out of stock for a week. Maybe ad spend was higher. Maybe a campaign artificially boosted demand. Maybe a competitor dropped price. Maybe a review issue hurt conversion. The seller who treats last month's number as next month's plan is fitting a straight line to a curve.
A basic demand forecasting formula
Even a simplified additive model captures most of what matters:
Expected demand = base demand
+ seasonality
+ promotion lift
+ ad effect
+ price effect
+ channel effect
− stock constraint
Each component does specific work. Base demand captures the normal pattern. Seasonality captures recurring spikes. Promotion lift captures campaign behavior. Ad effect captures paid visibility. Price effect captures elasticity. Channel effect captures differences in how each sales channel behaves. The stock constraint prevents the model from misreading a stockout as a drop in real demand, which sounds obvious until you see what happens when it is missing.
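As a rough illustration, the additive formula above can be sketched in a few lines of Python. The class and function names are hypothetical, the numbers are made up, and treating the stock constraint as "the part of demand that available units cannot cover" is one possible reading of the formula, not a confirmed implementation.

```python
from dataclasses import dataclass


@dataclass
class DemandComponents:
    """Additive components, all in units for the forecast period (illustrative)."""
    base: float
    seasonality: float = 0.0
    promotion_lift: float = 0.0
    ad_effect: float = 0.0
    price_effect: float = 0.0    # negative when a price increase suppresses demand
    channel_effect: float = 0.0


def unconstrained_demand(c: DemandComponents) -> float:
    """Sum of the additive components, floored at zero."""
    total = (c.base + c.seasonality + c.promotion_lift
             + c.ad_effect + c.price_effect + c.channel_effect)
    return max(total, 0.0)


def stock_constraint(demand: float, units_available: float) -> float:
    """Demand that cannot be served because stock runs out (assumed interpretation)."""
    return max(demand - units_available, 0.0)


def expected_demand(c: DemandComponents, units_available: float) -> float:
    """Unconstrained demand minus the stock constraint, as in the formula."""
    demand = unconstrained_demand(c)
    return demand - stock_constraint(demand, units_available)


components = DemandComponents(base=100, seasonality=20, promotion_lift=30,
                              ad_effect=15, price_effect=-10, channel_effect=5)
print(unconstrained_demand(components))                  # demand if stock were unlimited
print(expected_demand(components, units_available=120))  # capped by available stock
```

Keeping the unconstrained figure separate from the stock-capped one is the useful part: the first is what a reorder should be sized against, the second is what the sales report will show.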
The stockout distortion problem
Stockouts corrupt historical data. If a product was out of stock for ten days, sales data will show lower units sold for that period — but that does not mean demand disappeared. It means demand could not be served.
The reason this matters in practice is the feedback loop. Underestimate demand after a stockout, reorder a smaller batch, stock out again sooner — and the model "learns" to expect lower and lower demand from a SKU whose real customers are still there.
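The distortion is easy to see with a toy calculation. The sketch below assumes a hypothetical daily record of `(units_sold, in_stock)` pairs and compares a naive daily average with one computed only over in-stock days. Dropping stockout days is a deliberately simple stand-in for proper censored-demand estimation, not the method used in any particular production system.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical record shape: (units_sold, in_stock) for one SKU, one row per day.
DailyRecord = Tuple[int, bool]


def naive_daily_rate(days: Sequence[DailyRecord]) -> float:
    """Average units per day, treating stockout days as real zero-demand days."""
    return sum(units for units, _ in days) / len(days)


def in_stock_daily_rate(days: Sequence[DailyRecord]) -> Optional[float]:
    """Average units per day over in-stock days only; None if never in stock."""
    served = [units for units, in_stock in days if in_stock]
    if not served:
        return None
    return sum(served) / len(served)


# 20 in-stock days selling 5 units each, then 10 stockout days with no sales.
history = [(5, True)] * 20 + [(0, False)] * 10

print(naive_daily_rate(history))     # pulled down by the stockout period
print(in_stock_daily_rate(history))  # reflects demand while the SKU was purchasable
```

A reorder sized from the naive rate would cover roughly two thirds of the demand the SKU showed while it was available, which is exactly the shrinking-batch loop described above.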
How ML can help
Machine learning can learn patterns across many signals at once — historical sales, campaign dates, ad spend, price changes, inventory availability, product category, day of week, seasonality, order velocity, and channel behavior — without the seller having to choose which one matters in advance. A 2025 systematic review of machine learning in inventory control (ScienceDirect) analyzed 122 articles and categorised ML applications into demand forecasting before optimisation, ML embedded directly into optimisation, and dynamic approaches such as reinforcement learning for inventory policies.
Accuracy here is measured by sMAPE (symmetric mean absolute percentage error). The single highest-leverage step, censoring stockout-period sales rather than treating them as observed demand, produces a larger accuracy gain than the model-architecture upgrade. In production, data quality dominates model choice on hierarchical retail series.
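For readers who have not met the metric, here is a minimal sketch of sMAPE. Several variants exist; this one uses the common form that ranges from 0% to 200%, and the article does not state which variant its figures use, so treat the exact definition as an assumption.

```python
from typing import Sequence


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Symmetric mean absolute percentage error, in percent (0 to 200).

    Periods where actual and forecast are both zero contribute zero error.
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = abs(a) + abs(f)
        if denom == 0:
            continue  # perfect forecast of a zero-sales period
        total += 2.0 * abs(f - a) / denom
    return 100.0 * total / len(actual)


print(smape([100, 100], [110, 90]))  # roughly 10%
```

Because the denominator uses both the actual and the forecast, a missed stockout period (actual zero, forecast positive) scores the maximum 200% for that period, which is one reason uncensored stockout data hurts this metric so visibly.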
For sellers, the practical takeaway is not "use ML because ML is trendy." The takeaway is more pointed:
Forecasting should connect to decisions.
Forecasting without decisions is not enough
A forecast that says "you may sell 500 units next month" is interesting but operationally inert. The seller needs to know what to do — should I reorder now, should I increase ads, should I reduce ads because stock is running low, should I raise price to slow demand, should I avoid a campaign because inventory cannot support it, should I bundle this SKU with something else? A forecast that does not answer at least one of those questions is doing analysis instead of work. This is why DataGlass connects forecasting directly to action.
The ad and inventory connection
Ads and inventory should not be managed in separate tabs. A SKU with strong margin and enough inventory can absorb more ad budget. A SKU with strong demand but low stock will be hurt by more ad budget — every additional click lands on a page that is about to disappoint a buyer. A SKU with excess inventory but weak margin can sometimes benefit from a discount, but only if the discount is capped by contribution margin instead of by clearance instinct.
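The three cases above can be written down as a small rule. This is a sketch under stated assumptions: the function names, the 25% "strong margin" threshold, and the 90-day "excess inventory" threshold are all hypothetical placeholders that a seller would tune per category.

```python
def ad_inventory_action(margin_pct: float, days_of_supply: float,
                        lead_time_days: float,
                        strong_margin_pct: float = 0.25,
                        excess_days: float = 90.0) -> str:
    """Pick an ad/pricing action from margin and stock cover (illustrative thresholds)."""
    if days_of_supply <= lead_time_days:
        # Stock will run out before a reorder can land: more clicks only hurt.
        return "reduce_ads"
    if margin_pct >= strong_margin_pct:
        # Strong margin and enough stock: the SKU can absorb more budget.
        return "scale_ads"
    if days_of_supply >= excess_days:
        # Weak margin but excess stock: discount, capped by contribution margin.
        return "capped_discount"
    return "hold"


def max_discount_pct(price: float, variable_cost: float,
                     margin_floor: float = 0.0) -> float:
    """Largest discount (as a share of price) that keeps per-unit contribution
    margin at or above margin_floor."""
    headroom = price - variable_cost - margin_floor
    return max(headroom / price, 0.0)


print(ad_inventory_action(margin_pct=0.30, days_of_supply=10, lead_time_days=21))
print(ad_inventory_action(margin_pct=0.10, days_of_supply=120, lead_time_days=21))
print(max_discount_pct(price=100.0, variable_cost=80.0, margin_floor=5.0))
```

Checking stock cover first is a deliberate ordering choice: a low-stock SKU should have ads reduced regardless of how attractive its margin looks.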
Forecasting confidence matters
A forecast should not pretend to be perfect. A good system shows confidence alongside the prediction, because the right reorder behavior is different at high, medium, and low confidence:
High confidence: reorder now
Medium confidence: monitor and prepare supplier
Low confidence: avoid aggressive inventory bet
This is especially important for new products and for products that have been moved by viral content, where the historical signal is short, noisy, or both.
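One way to turn a forecast into a tier is to look at how wide its prediction interval is relative to the point forecast. The sketch below assumes that approach; the article does not say how confidence is derived, and the 0.3 and 0.7 width cut-offs are hypothetical.

```python
REORDER_ACTION = {
    "high": "reorder now",
    "medium": "monitor and prepare supplier",
    "low": "avoid aggressive inventory bet",
}


def confidence_tier(point: float, lower: float, upper: float,
                    high_max_width: float = 0.3,
                    medium_max_width: float = 0.7) -> str:
    """Map a point forecast and its interval bounds to a confidence tier.

    Relative width = (upper - lower) / point. Thresholds are illustrative.
    """
    if point <= 0:
        return "low"  # no meaningful relative width for a zero forecast
    relative_width = (upper - lower) / point
    if relative_width <= high_max_width:
        return "high"
    if relative_width <= medium_max_width:
        return "medium"
    return "low"


tier = confidence_tier(point=500, lower=350, upper=650)
print(tier, "->", REORDER_ACTION[tier])
```

A new or viral SKU will usually produce a wide interval under this scheme and land in the low tier, which matches the caution described above.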
Sensitivity — what changes the operating decision
A forecasting system in production exists to feed decisions, and the table below stress-tests how five operating decisions (reorder timing, ad budget, campaign participation, pricing changes, and stockout alerts) shift under different forecast-confidence regimes.
| Decision | High confidence | Medium confidence | Low confidence |
|---|---|---|---|
| Reorder timing | Auto-reorder at lead-time + safety buffer | Manual review; supplier on standby | Avoid aggressive inventory commitment |
| Ad budget allocation | Scale to forecast-supported demand level | Hold current spend; monitor weekly | Cap spend to current run-rate |
| Campaign-window participation | Participate at full eligibility tier | Participate at lower voucher tier | Decline; protect inventory for baseline demand |
| Pricing change | Implement price test with monitoring | Defer until forecast stabilises | Hold current pricing |
| Stockout-risk alert | Alert at 14 days of supply | Alert at 21 days of supply | Alert at 28+ days; treat as soft signal |
The matrix is the integration layer between the forecast and the operating decisions. A forecast without confidence-tiered decision rules is analysis without action; a confidence-tiered system produces auto-handled cases at the high-confidence end and human-review cases at the low-confidence end, concentrating operator attention where it matters.
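In code, the matrix is little more than a lookup table plus a days-of-supply check. The sketch below copies the table's entries; the function names and dictionary keys are hypothetical, and the low-confidence alert threshold is taken as 28 days even though the table says "28+".

```python
DECISION_MATRIX = {
    "reorder_timing": {
        "high": "Auto-reorder at lead-time + safety buffer",
        "medium": "Manual review; supplier on standby",
        "low": "Avoid aggressive inventory commitment",
    },
    "ad_budget": {
        "high": "Scale to forecast-supported demand level",
        "medium": "Hold current spend; monitor weekly",
        "low": "Cap spend to current run-rate",
    },
    "campaign_participation": {
        "high": "Participate at full eligibility tier",
        "medium": "Participate at lower voucher tier",
        "low": "Decline; protect inventory for baseline demand",
    },
    "pricing_change": {
        "high": "Implement price test with monitoring",
        "medium": "Defer until forecast stabilises",
        "low": "Hold current pricing",
    },
}

# Days-of-supply thresholds from the stockout-risk row (low tier is a soft signal).
ALERT_DAYS_OF_SUPPLY = {"high": 14, "medium": 21, "low": 28}


def decide(decision: str, confidence: str) -> str:
    """Look up the operating rule for a decision at a given confidence tier."""
    return DECISION_MATRIX[decision][confidence]


def days_of_supply(units_on_hand: float, forecast_daily_units: float) -> float:
    """Stock cover in days at the forecast run-rate."""
    if forecast_daily_units <= 0:
        return float("inf")
    return units_on_hand / forecast_daily_units


def stockout_alert(units_on_hand: float, forecast_daily_units: float,
                   confidence: str) -> bool:
    """True when stock cover is at or below the tier's alert threshold."""
    cover = days_of_supply(units_on_hand, forecast_daily_units)
    return cover <= ALERT_DAYS_OF_SUPPLY[confidence]


print(decide("campaign_participation", "low"))
print(stockout_alert(200, 10, "high"), stockout_alert(200, 10, "medium"))
```

Note how the same 20 days of cover triggers an alert at medium confidence but not at high: the less the forecast is trusted, the earlier the system asks for attention.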
Limitations and where this argument breaks
Five explicit limits.
- History-length lower bound. The framework assumes ≥6 months of clean order-line history per SKU. Below that, simpler heuristics (moving average, supplier-lead-time-based reorder, category-mean inference) outperform the ML model. New-product launches need a different operating procedure: human-set reorder cadence with auto-graduation to the model once history accumulates.
- Viral-content distortion. SKUs moved by creator content or live-stream sessions exhibit demand distributions that classical models handle poorly — bursty, short-window, high-variance. Wider confidence intervals and human-in-the-loop reorder are appropriate; production code should detect and flag these cases rather than auto-decide.
- Cross-platform demand interaction. The framework treats per-platform demand independently. Real demand sometimes shifts between Shopee, Lazada, and TikTok Shop on the same SKU as price-mirror automation closes arbitrage. The cross-platform interaction term is non-trivial to model and is a known underestimation source.
- Censoring quality. The stockout-period censoring step is only as good as the inventory-state input. If inventory-state data is noisy (delayed updates, multi-warehouse SKUs), incorrectly applied censoring creates either over-forecasting (if too aggressive) or under-forecasting (if too conservative).
- Internal-data scope. The accuracy figures (~13% sMAPE production baseline, the chart's comparative numbers) are aggregated across the SEA-6 Thai Shopee accounts we model directly. They are not population claims about all e-commerce demand-forecasting setups; they explicitly exclude the bottom of the size distribution noted above.
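The first two limits translate naturally into a routing step that runs before any model is called. The following is a minimal sketch: the function names are hypothetical, 180 days is an approximation of the six-month threshold, and how a SKU gets flagged as viral is left outside the example.

```python
from typing import Sequence


def choose_method(history_days: int, viral_flag: bool,
                  min_history_days: int = 180) -> str:
    """Route a SKU to a forecasting path before any model is called."""
    if viral_flag:
        # Bursty, short-window demand: flag for a person rather than auto-decide.
        return "human_review"
    if history_days < min_history_days:
        # Too little clean history for the ML model: use a simple heuristic.
        return "moving_average"
    return "ml_model"


def moving_average_forecast(daily_units: Sequence[float], window: int = 28) -> float:
    """Mean daily units over the most recent `window` days (fewer if unavailable)."""
    recent = list(daily_units)[-window:]
    return sum(recent) / len(recent) if recent else 0.0


print(choose_method(history_days=45, viral_flag=False))
print(moving_average_forecast([4, 6, 5, 5]))
```

Auto-graduation to the model then amounts to re-running `choose_method` as history accumulates, with no separate migration step.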
Methodology
Public-data citations are taken from the 2025 ScienceDirect systematic review of ML approaches in inventory control, the M5 Forecasting Competition methodology and results (Kaggle), the Bain e-Conomy SEA 2025 commentary on regional demand volatility drivers, and Shopee's Help Center documentation on stockout policy and ranking-algorithm response to inventory state.
Internal-data claims (the sMAPE figures across forecasting methods, the 20–40% stockout-prevention recall lift, the typical operating-decision split under confidence-tiered rules) are aggregated across approximately 400 active marketplace seller accounts in the DataGlass research methodology sample frame (Jan 2024 – Apr 2026, 28-month observation window). Only SKUs with at least 6 months of clean order-line history are included in the forecasting evaluation set. Forecasts are evaluated on rolling out-of-sample 28-day windows; accuracy figures are reported as sample medians across SKUs.
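The evaluation scheme can be illustrated with a short sketch. Everything here is an assumption layered on the description above: the function names, the 56-day minimum training span, the trailing-mean placeholder forecaster, and the choice to take a median across windows within each SKU before taking the median across SKUs.

```python
from statistics import median
from typing import Callable, Dict, Iterator, List, Optional, Sequence, Tuple


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Symmetric MAPE in percent; zero/zero periods contribute no error."""
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = abs(a) + abs(f)
        if denom:
            total += 2.0 * abs(f - a) / denom
    return 100.0 * total / len(actual)


def rolling_windows(series: Sequence[float], horizon: int = 28,
                    min_train: int = 56) -> Iterator[Tuple[Sequence[float], Sequence[float]]]:
    """Yield (train, test) pairs; each test window is the next `horizon` days."""
    origin = min_train
    while origin + horizon <= len(series):
        yield series[:origin], series[origin:origin + horizon]
        origin += horizon


def trailing_mean_forecast(train: Sequence[float], horizon: int) -> List[float]:
    """Placeholder forecaster: repeat the mean of the last 28 training days."""
    recent = train[-28:]
    return [sum(recent) / len(recent)] * horizon


def sku_error(series: Sequence[float],
              forecaster: Callable[[Sequence[float], int], List[float]] = trailing_mean_forecast,
              horizon: int = 28, min_train: int = 56) -> Optional[float]:
    """Median out-of-sample sMAPE across this SKU's rolling windows."""
    errors = [smape(test, forecaster(train, horizon))
              for train, test in rolling_windows(series, horizon, min_train)]
    return median(errors) if errors else None


def sample_median_error(series_by_sku: Dict[str, Sequence[float]]) -> Optional[float]:
    """Median of per-SKU errors, skipping SKUs with too little history."""
    per_sku = [e for e in (sku_error(s) for s in series_by_sku.values()) if e is not None]
    return median(per_sku) if per_sku else None


demo = {"steady": [10] * 112, "too_short": [10] * 20}
print(sample_median_error(demo))
```

Each test window only ever sees a forecast built from the days before it, which is what makes the figure out-of-sample; reporting a median rather than a mean keeps a handful of erratic SKUs from dominating the headline number.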
The seller does not need to see the model. The seller needs to see the decision.