DataGlass Research Methodology

A single canonical reference for the sample frame, time window, attribution rules, and exclusions that calibrate every "in our data" claim across the blog and research surfaces.

Last updated: 2026-05-04

Sample frame

The DataGlass internal sample is approximately 400 active marketplace seller accounts in the SEA-6 region, all in the THB 200K–50M monthly revenue range, observed across January 2024 through April 2026 (28 months). Sellers are independent operators on Shopee, Lazada, and/or TikTok Shop; the sample is heavily weighted toward Thailand but includes Malaysian, Singaporean, Vietnamese, Filipino, and Indonesian shops.

The sample is not a probability sample. It is the population of accounts DataGlass models in production, drawn from organic inbound, partner referrals, and accounts onboarded through trial. Aggregate figures should be read as descriptive of the operating-side seller population at this size, not as inference about the full Southeast Asian seller market.

Per-platform subsets

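Posts that calibrate platform-specific claims draw from the relevant platform subset. The subsets overlap: many sellers operate on more than one platform, so the per-platform counts sum to more than the ~400 accounts in the full sample.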

Shopee: ~280 active accounts. Used to calibrate Shopee-specific margin, ad-waste, and ROAS figures.
Lazada: ~180 active accounts. Used to calibrate Lazada-specific margin, Sponsored Search vs. Discovery, and LazMall conversion-lift figures.
TikTok Shop: ~150 active accounts. Used to calibrate TikTok-specific affiliate-commission, return-rate, and host-incentive figures.
Cross-platform / multi-shop: the full ~400-account sample. Used for forecasting, inventory, multi-shop, and ad-budget-allocation claims that span platforms.
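The subset logic above can be sketched in a few lines of Python. This is a minimal illustration, not the production pipeline: the `Account` class, the `platform_subset` helper, and the platform identifiers are hypothetical names chosen for the example. It shows why per-platform counts add up to more than the number of accounts when sellers run shops on several platforms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    """A seller account and the platforms it is active on (hypothetical schema)."""
    account_id: str
    platforms: frozenset


def platform_subset(accounts, platform):
    """Return the accounts that are active on the given platform."""
    return [a for a in accounts if platform in a.platforms]


# Four illustrative accounts, three of them active on more than one platform.
accounts = [
    Account("a1", frozenset({"shopee"})),
    Account("a2", frozenset({"shopee", "lazada"})),
    Account("a3", frozenset({"lazada", "tiktok_shop"})),
    Account("a4", frozenset({"shopee", "lazada", "tiktok_shop"})),
]

shopee = platform_subset(accounts, "shopee")
lazada = platform_subset(accounts, "lazada")
tiktok = platform_subset(accounts, "tiktok_shop")

# Multi-platform accounts are counted once per platform they appear on,
# so the subset sizes sum to more than the number of accounts.
subset_total = len(shopee) + len(lazada) + len(tiktok)
```

Here `subset_total` is 8 for 4 accounts, the same effect that makes ~280, ~180, and ~150 exceed ~400.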

Cost reconstruction

All margin, ROAS, and contribution-margin figures are reconstructed per order line, not estimated per category. The pipeline binds every order line to the canonical product catalog (one SKU identity across shops and platforms), then attributes:

Platform fees (transaction, commission, payment processing) from the order's settlement record;
Voucher mechanics broken into seller-funded vs. platform-funded portions, using the platform's own documentation as the authority;
Ad spend joined to attributed orders at campaign × SKU resolution where the ad surface exposes it;
COGS from the seller-supplied cost file;
Logistics from the carrier's billing record where available;
Returns and refunds applied retroactively to the originating order line.

Aggregate figures are calculated on rolling 90-day windows unless the post specifies otherwise (e.g., 28-day for forecasting accuracy, 14-day for campaign attribution).
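As a rough illustration of the per-order-line attribution described in this section, the sketch below computes a contribution margin for a single order line. The `OrderLine` schema, the field names, and the exact arithmetic are assumptions made for the example. In particular, it assumes that only the seller-funded voucher portion reduces the seller's revenue and that a refund is simply subtracted from revenue; the actual pipeline may treat these differently.

```python
from dataclasses import dataclass


@dataclass
class OrderLine:
    """One order line with its attributed costs, all in THB (hypothetical schema)."""
    sku: str
    gross_revenue: float          # item price times quantity
    platform_fees: float          # transaction + commission + payment processing
    seller_funded_voucher: float  # assumption: platform-funded portion is not a seller cost
    ad_spend: float               # joined at campaign x SKU resolution where available
    cogs: float                   # from the seller-supplied cost file
    logistics: float              # from the carrier's billing record where available
    refunded: float = 0.0         # applied retroactively to the originating line


def contribution_margin(line: OrderLine) -> float:
    """Revenue net of seller-funded vouchers and refunds, minus attributed costs."""
    net_revenue = line.gross_revenue - line.seller_funded_voucher - line.refunded
    costs = line.platform_fees + line.ad_spend + line.cogs + line.logistics
    return net_revenue - costs


line = OrderLine(
    sku="SKU-001",
    gross_revenue=1000.0,
    platform_fees=80.0,
    seller_funded_voucher=50.0,
    ad_spend=100.0,
    cogs=500.0,
    logistics=40.0,
)

margin = contribution_margin(line)          # 950 net revenue - 720 costs
margin_share = margin / line.gross_revenue  # margin as a share of gross revenue
```

For this illustrative line the margin is THB 230, or 23% of gross revenue. Summing per-line margins, rather than applying a category-level estimate, is what lets a refund or a voucher split change the figure for exactly the order line it belongs to.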

Exclusions

Two account classes are explicitly excluded from the sample frame and from all "in our data" figures:

Below ~THB 200K monthly revenue: at this size the operational overhead of running the per-SKU reconstruction architecture exceeds the margin it recovers, and simpler heuristics perform better.
Negotiated-rate enterprise tier: accounts on individually-negotiated commission, payment, or logistics terms are excluded because their cost structure is not generalisable to the operating-side seller population.
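The two exclusions, combined with the revenue range stated in the sample frame, amount to a simple membership test. The sketch below is illustrative only: the function name, the parameter names, and the choice of inclusive bounds are assumptions, and the lower threshold is approximate in the source ("~THB 200K").

```python
# Revenue band from the sample frame; the lower bound is approximate in the source.
MIN_MONTHLY_REVENUE_THB = 200_000
MAX_MONTHLY_REVENUE_THB = 50_000_000


def in_sample_frame(monthly_revenue_thb: float, negotiated_rates: bool) -> bool:
    """Return True if an account falls inside the sample frame (hypothetical helper)."""
    if negotiated_rates:
        # Individually negotiated commission, payment, or logistics terms are excluded.
        return False
    # Assumption: both ends of the revenue band are treated as inclusive.
    return MIN_MONTHLY_REVENUE_THB <= monthly_revenue_thb <= MAX_MONTHLY_REVENUE_THB


candidates = [
    ("small-shop", 120_000, False),      # below the revenue floor
    ("mid-shop", 1_500_000, False),      # inside the frame
    ("enterprise", 20_000_000, True),    # negotiated-rate tier
]

included = [name for name, revenue, negotiated in candidates
            if in_sample_frame(revenue, negotiated)]
```

Only `mid-shop` passes the filter in this example.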

What is and is not published

Aggregate figures, distributions, and modal patterns are published. Worked examples are illustrative composites — they reflect modal patterns in the sample but are not the financials of any specific seller. Per-account, per-SKU, and per-shop figures are not published in any post or research artifact. Where a post quotes a number ("4–7 percentage-point margin recovery", "20–30% loss-making ad spend share", "50% rank overlap"), that number is the sample median or a reported range; the underlying distribution is not surfaced.
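To make "sample median or a reported range" concrete, the sketch below summarises hypothetical per-account values over a rolling window, using the 90-day default mentioned under cost reconstruction. The `window_summary` function and the input data are invented for illustration, and the use of the first and third quartiles as the range is an assumption; the source does not state how a reported range is derived.

```python
from datetime import date, timedelta
from statistics import median, quantiles


def window_summary(observations, as_of, window_days=90):
    """Summarise (date, value) observations inside a rolling window ending at as_of."""
    start = as_of - timedelta(days=window_days)
    values = sorted(v for d, v in observations if start < d <= as_of)
    # Assumption: the range is the first-to-third quartile span.
    q1, _, q3 = quantiles(values, n=4)
    return {"median": median(values), "range": (q1, q3), "n": len(values)}


# Hypothetical margin-recovery values in percentage points.
observations = [
    (date(2025, 12, 1), 50.0),   # outside the 90-day window, ignored
    (date(2026, 2, 10), 3.0),
    (date(2026, 2, 20), 4.0),
    (date(2026, 3, 1), 5.0),
    (date(2026, 3, 15), 6.0),
    (date(2026, 4, 1), 7.0),
    (date(2026, 4, 15), 8.0),
    (date(2026, 4, 30), 9.0),
]

summary = window_summary(observations, as_of=date(2026, 4, 30))
```

With this input the summary has a median of 6.0 and a range of (4.0, 8.0) over 7 in-window observations. Only such aggregates would be published; the individual values would not.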

How to read "in our data" claims

Every blog post that cites internal data describes which subset and which figures are being calibrated, then links here for the full sample frame. Read those claims as descriptive of the operating-side SEA-6 seller population in the THB 200K–50M monthly revenue range over the 28-month observation window. They are not population claims about all marketplace sellers, they are not predictions, and they exclude the size and rate-tier classes named above.

When a figure conflicts with a seller's own data, the seller's own data wins. The published figures are reference points for cohort-level reasoning, not benchmarks any individual account is expected to match.