Product Forecasting & Inventory Optimization

01 The Right Problem Statement

The goal is not to minimize forecast error. The goal is to minimize the cost of being wrong.

This distinction, which Vandeput makes forcefully in the opening chapters of both books, is the intellectual foundation of everything that follows. Statistical accuracy — a lower RMSE, a tighter confidence interval — is instrumentally valuable only insofar as it reduces the cost of incorrect inventory decisions. An excellent forecast that informs a bad stocking policy is worse than useless. A mediocre forecast that is correctly translated into inventory decisions can be highly profitable.

For Grow Fragrance, this reframing is not merely academic. When the planning process runs on a founder's growth intuition baked into a rules-based engine, the question to ask is not "how accurate is the engine?" but "what is the cost distribution of its errors, and in which direction does it tend to fail?"

The Newsvendor Problem — The Executive Lens

The Newsvendor model is the single most important concept in this entire document. It is named for a newspaper seller who must decide each morning how many papers to order before knowing how many will sell. Order too few, and you lose potential profit. Order too many, and you're stuck with unsellable inventory.

The model is deceptively simple and extraordinarily powerful. It translates every forecasting and inventory decision into a single, intuitive question: given the cost of ordering too much vs. too little, how many units should I stock?

Newsvendor — Critical Ratio: CR = C_u / (C_u + C_o)

CR — Critical Ratio (target in-stock probability, i.e. the optimal cycle service level; not the same thing as fill rate)

C_u — Cost of Understocking (lost margin + lost customer LTV + brand damage)

C_o — Cost of Overstocking (holding cost + markdown risk + disposal cost)

The critical ratio tells you, directly, what percentile of the demand distribution you should stock to. If CR = 0.80, stock the 80th percentile of expected demand. If demand is normally distributed with mean 500 and std dev 80, stock approximately 567 units (500 + 0.84 × 80).
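The percentile rule above can be checked in a few lines. This is a minimal sketch, assuming normally distributed demand and using only Python's standard library; the function name `newsvendor_quantity` and the example costs are illustrative and are not taken from Grow's actuals.

```python
from statistics import NormalDist

def newsvendor_quantity(cu, co, mean, std_dev):
    """Return (critical_ratio, stocking_quantity) for normally distributed demand."""
    critical_ratio = cu / (cu + co)
    # Stock to the critical-ratio percentile of the demand distribution.
    quantity = NormalDist(mean, std_dev).inv_cdf(critical_ratio)
    return critical_ratio, quantity

# Illustrative costs chosen so that CR = 0.80, matching the example in the text.
cr, q = newsvendor_quantity(cu=4.0, co=1.0, mean=500, std_dev=80)
print(f"CR = {cr:.2f}, stock about {q:.0f} units")
```

With these inputs the sketch reproduces the roughly 567 units quoted above. For non-normal demand, the same rule applies to whatever quantile function describes the forecast distribution.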

Cost Asymmetry at Grow

Grow has confirmed that stockouts are materially more painful than overstock. The cost structure for a DTC fragrance brand is characteristically asymmetric:

⚠️

Cost of Understocking (C_u)

Lost revenue on the sale. Lost LTV of the customer (repeat buyers who hit a stockout may not return). Brand reputation damage. Lost social momentum — influencer spikes can't be recaptured. Lost wholesale placement if a retail order can't be fulfilled.

📦

Cost of Overstocking (C_o)

Holding/carrying cost (fragrance materials are relatively stable; not high spoilage). Working capital tied up in inventory. Potential markdown to clear. Fungibility of raw materials partially offsets this — unused fragrance components can often be redirected.

Key Insight

The raw material fungibility at Grow structurally lowers C_o. If a fragrance batch doesn't sell as expected, the components aren't lost — they can be reformulated or redirected. This asymmetry pushes the Critical Ratio well above 0.50, potentially toward 0.75–0.85 for hero SKUs. The exact number requires properly parameterizing C_u and C_o from actuals — a Phase 1 deliverable.

Interactive: Newsvendor Calculator

Adjust cost parameters to see how the optimal stocking quantity shifts. This is the executive conversation — not "our MASE improved."

C_u — Cost of Understocking (per unit) $45

C_o — Cost of Overstocking (per unit) $12

Mean Demand Forecast 500

Forecast Std Dev (σ) 80

0.79 Critical Ratio

564 Optimal Order Qty

79% Target Service Level

Common Mistake

Using a 50th percentile forecast (the median, which equals the mean when demand is symmetric) as the stocking quantity implicitly assumes C_u = C_o. For Grow, where stockouts are far more costly, this systematically under-stocks every SKU. The Newsvendor framework makes this error visible and correctable.

02 Understanding Your Demand Signal

Before choosing a model, understand what your data is actually telling you.

Vandeput dedicates significant attention to demand characterization in Data Science for Supply Chain Forecasting. The choice of forecasting model is downstream of the shape of demand, not upstream. A SARIMA model applied to intermittent, lumpy demand can easily perform worse than a naïve method.

Demand Decomposition

Any demand time series can be decomposed into four components:

📈

Trend

Long-run direction of demand. For Grow's hero SKUs, this should be positive and measurable with 12+ months of data. Bass diffusion applies here for new launches.

🌊

Seasonality

Recurring patterns within a period (weekly, monthly, annual). Fragrance sales have strong annual seasonality — the existing rules-based engine captures some of this implicitly.

⚡

Cyclicality

Longer-run patterns not tied to calendar (economic cycles, category waves). Less relevant for Grow at current scale, but relevant for wholesale channel planning.

〰️

Noise / Residual

The irreducible random component. Influencer spikes, viral moments, and one-time promotions land here. Crucially — you cannot and should not try to forecast noise.

Illustrative Demand Decomposition — Fragrance SKU (Annual Pattern)

Intermittent & Lumpy Demand

Vandeput distinguishes demand types using two axes: demand frequency (how often does a non-zero demand period occur?) and demand variability (when demand does occur, how variable is the quantity?).

Demand Type	Frequency	Variability	Relevant for Grow?	Recommended Approach
Smooth	High	Low	Hero SKUs	SES, Holt-Winters, ARIMA
Erratic	High	High	Promo SKUs	Holt-Winters + event features
Intermittent	Low	Low	Slow Movers	Croston's method
Lumpy	Low	High	New Formats	Bayesian Pooling, ADIDA

Grow Context

With 3–5 fragrances × 4–5 SKUs per fragrance, you likely have a mix of smooth (hero anchors), erratic (launch SKUs with promotion), and potentially lumpy (new formats). Classifying your SKU catalog into these buckets is a Day 1 task — it determines which model family applies to each SKU before any data fitting begins.
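The two-axis classification can be sketched in code. The source does not give numeric thresholds, so the cut-offs below (average demand interval of 1.32, squared coefficient of variation of 0.49) are an assumption borrowed from the commonly cited Syntetos-Boylan scheme; `classify_demand` is a hypothetical helper name.

```python
from statistics import mean, pvariance

# Cut-offs commonly attributed to Syntetos and Boylan; assumed here, not from the source.
ADI_CUTOFF = 1.32
CV2_CUTOFF = 0.49

def classify_demand(history):
    """Classify a demand history (one number per period) into a demand type."""
    nonzero = [d for d in history if d > 0]
    if not nonzero:
        return "no demand"
    # Average demand interval: periods per non-zero demand occurrence.
    adi = len(history) / len(nonzero)
    # Squared coefficient of variation of the non-zero demand sizes.
    cv2 = pvariance(nonzero) / mean(nonzero) ** 2
    frequent = adi < ADI_CUTOFF
    stable = cv2 < CV2_CUTOFF
    if frequent and stable:
        return "smooth"
    if frequent:
        return "erratic"
    if stable:
        return "intermittent"
    return "lumpy"

print(classify_demand([100, 110, 95, 105, 98, 102]))        # steady weekly seller
print(classify_demand([0, 0, 40, 0, 0, 0, 38, 0, 42, 0]))   # occasional, similar sizes
```

The thresholds are worth revisiting once Grow's own catalog has been run through the classifier; with only a few dozen SKUs, a manual sanity check of each label is cheap.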

Data Quality Requirements

Vandeput is explicit: no model can overcome bad input data. For Grow, the minimum viable data requirements are:

Sales history: Daily or weekly sell-through by SKU (not orders — actual sales). Starting from 2024 means roughly 1–2 years of data. Sufficient for seasonal models, limiting for multi-year trend estimation.
Promotions & events log: Date, SKU affected, type (influencer, paid ad, email blast, sale). Without this, promo demand looks like noise and inflates forecast error.
Stockout log: When was a SKU out of stock? Censored demand (periods where you would have sold more but couldn't) is one of the most common forecast biases.
Lead times: Supplier lead time by raw material. This feeds directly into safety stock calculation.

03 Forecast Error Metrics

Metrics are not goals — they are instruments. Know what each measures and what it cannot measure.

The Metric Catalog

Metric	Formula	Strengths	Weaknesses
MAE (Mean Absolute Error)	mean(|A - F|)	Interpretable in units. Easy to explain.	Scale-dependent. Can't compare across SKUs with different volumes.
RMSE (Root Mean Squared Error)	√mean((A - F)²)	Penalizes large errors more heavily. Useful when outlier errors are costly.	Less interpretable. Sensitive to a small number of large errors.
MASE (Mean Absolute Scaled Error)	MAE / MAE_naïve	Scale-free. Comparable across SKUs and time periods. Vandeput's recommended default.	Requires a naïve benchmark to exist. Can be unintuitive.
Bias (Mean Error)	mean(F - A)	Reveals systematic over/under-forecasting. A biased model with low RMSE is dangerous.	Errors cancel out, so it can be zero even with highly variable forecasts.
MAPE (Mean Absolute % Error)	mean(|A - F| / A)	Intuitive percentage. Widely used in business.	Undefined when A = 0. Biased toward under-forecasting. Vandeput recommends avoiding it.

Vandeput's Warning on MAPE

MAPE is the most commonly reported forecast metric in business and one of the least useful. It is undefined for zero-demand periods (common in fragrance SKUs), asymmetrically penalizes overforecasting, and systematically incentivizes under-forecasting. Replace it with MASE as your primary accuracy metric and track Bias separately.
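The metric definitions in the catalog translate directly into code. The sketch below is illustrative only: the function names and sample numbers are made up, and MASE is scaled by the in-sample MAE of the one-step naïve forecast, which is one common convention.

```python
from math import sqrt

def naive_in_sample_mae(history):
    """MAE of the one-step naive forecast F(t+1) = A(t) over a training history."""
    diffs = [abs(history[t] - history[t - 1]) for t in range(1, len(history))]
    return sum(diffs) / len(diffs)

def forecast_metrics(actuals, forecasts, naive_mae):
    """Return MAE, RMSE, bias and MASE for paired actuals and forecasts."""
    errors = [f - a for a, f in zip(actuals, forecasts)]   # F - A, as in the Bias row
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = sqrt(sum(e * e for e in errors) / n)
    bias = sum(errors) / n
    return {"mae": mae, "rmse": rmse, "bias": bias, "mase": mae / naive_mae}

history = [100, 120, 90, 110, 105]      # hypothetical training demand
actuals = [108, 95, 115]                # hypothetical holdout demand
forecasts = [104, 104, 104]             # a flat forecast, for illustration
scale = naive_in_sample_mae(history)
print(forecast_metrics(actuals, forecasts, scale))
```

Reporting bias next to MASE matters here: in this toy example the forecast is slightly low on average even though its scaled error is well under 1.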

Translating Forecast Error into Inventory Cost

This is the bridge between statistical modeling and business value — and it is the bridge most implementations never build. Vandeput argues that forecast error is only meaningful when expressed in dollars, not abstract statistical units.

Error Cost Translation: Expected Cost = C_u · E[max(D-Q, 0)] + C_o · E[max(Q-D, 0)]

D — Realized demand (random variable)

Q — Stocking quantity

E[max(D-Q, 0)] — Expected stockout quantity (lost sales)

E[max(Q-D, 0)] — Expected overstock quantity (excess inventory)

Cost Surface: Expected Cost as a Function of Stocking Quantity

The chart above shows the critical insight: the optimal stocking quantity is not at the mean of the forecast distribution. It shifts right (toward higher stocking) as C_u increases relative to C_o. For Grow, given the confirmed asymmetry, the optimal quantity will consistently sit above the forecast mean — meaning a 50th-percentile point forecast systematically under-stocks.
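To make the cost surface concrete, the expected cost can be evaluated in closed form when demand is assumed normal. This is a sketch under that assumption, using the standard normal loss function; `expected_cost` is a hypothetical helper and the parameter values are the illustrative ones from the Newsvendor calculator.

```python
from statistics import NormalDist

def expected_cost(q, mean, std_dev, cu, co):
    """Expected newsvendor cost at stocking quantity q, assuming normal demand."""
    std_normal = NormalDist()
    z = (q - mean) / std_dev
    # Expected units short: sigma * [pdf(z) - z * (1 - cdf(z))]  (standard normal loss).
    expected_short = std_dev * (std_normal.pdf(z) - z * (1 - std_normal.cdf(z)))
    # Expected units left over, from E[Q - D] = E[max(Q-D, 0)] - E[max(D-Q, 0)].
    expected_over = (q - mean) + expected_short
    return cu * expected_short + co * expected_over

mean, std_dev, cu, co = 500, 80, 45, 12   # illustrative values
q_star = NormalDist(mean, std_dev).inv_cdf(cu / (cu + co))
for q in (mean, q_star):
    print(f"Q = {q:.0f}: expected cost = ${expected_cost(q, mean, std_dev, cu, co):,.0f}")
```

Comparing the two printed lines puts a dollar figure on the gap between stocking at the mean and stocking at the critical-ratio quantile.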

04 Forecasting Models

Start simple. Add complexity only when simplicity provably fails.

Baseline Models — The Floor

Vandeput's most important methodological point on model selection: every model must beat a naïve benchmark to justify its complexity. The naïve forecast is simply: "next period's demand equals this period's demand." It requires no estimation, no parameters, and no data beyond the last observation.

Naïve / Random Walk

F(t+1) = A(t)

The MASE denominator. If any model you build has MASE > 1.0, it is performing worse than the naïve forecast. This is your absolute floor.

Seasonal Naïve

F(t+1) = A(t + 1 - m), with m the season length (e.g. 12 for monthly data)

Next period equals the same period last year. For fragrance with strong annual cycles, this is a surprisingly strong baseline. Often beats complex models with limited data.

Moving Average

F(t+1) = mean(A(t-n+1)...A(t))

Smooths noise but lags trend. Window length n is a tuning parameter. Simple to explain to non-technical stakeholders.
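The three baselines are short enough to write out in full. A minimal sketch with hypothetical monthly demand; the function names are illustrative.

```python
def naive_forecast(history):
    """F(t+1) = A(t)."""
    return history[-1]

def seasonal_naive_forecast(history, season_length):
    """Forecast for the next period = actual from one full season earlier."""
    return history[-season_length]

def moving_average_forecast(history, window):
    """F(t+1) = mean of the last `window` actuals."""
    recent = history[-window:]
    return sum(recent) / len(recent)

# Two years of hypothetical monthly demand with a year-end peak.
demand = [80, 75, 90, 95, 100, 90, 85, 88, 110, 130, 190, 240,
          92, 85, 100, 108, 115, 101, 97, 99, 125, 150, 215, 270]
print(naive_forecast(demand))               # repeats last December
print(seasonal_naive_forecast(demand, 12))  # repeats last January
print(moving_average_forecast(demand, 3))   # mean of the last three months
```

In this example the naïve and moving-average forecasts carry the December peak into January, while the seasonal naïve forecast does not, which is why it tends to be the tougher baseline for strongly seasonal SKUs.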

Exponential Smoothing — The Workhorse

Exponential smoothing models are Vandeput's recommended starting point for most supply chain applications. They are computationally lightweight, interpretable, and adapt to changing demand levels over time. The alpha (α) parameter controls how quickly the model responds to new information.

Simple Exponential Smoothing (SES): S(t) = α · A(t) + (1 - α) · S(t-1)

S(t) — Smoothed level at time t (also the one-step-ahead forecast)

α — Smoothing parameter ∈ (0,1). High α = reacts quickly to new data. Low α = stable, slow to respond.

A(t) — Actual demand at time t
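The SES recursion can be run by hand or in a few lines of code. In the sketch below, starting the level at the first actual is an assumption (other initializations are equally common), and `ses_forecast` is a hypothetical name.

```python
def ses_forecast(actuals, alpha, initial_level=None):
    """Run simple exponential smoothing and return the one-step-ahead forecast."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be in (0, 1)")
    # Initialising the level with the first actual is one common choice (assumed here).
    level = actuals[0] if initial_level is None else initial_level
    start = 1 if initial_level is None else 0
    for a in actuals[start:]:
        level = alpha * a + (1 - alpha) * level   # S(t) = alpha*A(t) + (1-alpha)*S(t-1)
    return level

demand = [100, 120, 90, 110]   # hypothetical weekly demand
print(ses_forecast(demand, alpha=0.5))
```

In practice α would be chosen by minimizing error on a holdout window rather than fixed at 0.5.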

Holt-Winters (Triple Exponential Smoothing), multiplicative form, for horizons h ≤ m: F(t+h) = (L(t) + h·T(t)) · S(t+h-m)

L(t) — Level component (updated with α)

T(t) — Trend component (updated with β)

S(t) — Seasonal component (updated with γ)

m — Seasonal period (12 for monthly data with annual seasonality)

h — Forecast horizon

Grow Recommendation

Holt-Winters with additive or multiplicative seasonality is the appropriate starting model for Grow's hero SKUs (anchor fragrances with established history). The seasonal component will partially replicate what the rules-based engine does implicitly — but with statistically fitted parameters rather than manual adjustments. This becomes the shadow mode comparator in Phase 1.

Holt-Winters vs. Naïve vs. Actual — Illustrative Fragrance SKU

ARIMA / SARIMA

ARIMA (AutoRegressive Integrated Moving Average) models demand as a function of its own past values and past forecast errors. SARIMA extends this with seasonal differencing. These are appropriate when the demand series shows autocorrelation — i.e., knowing yesterday's demand genuinely helps predict today's.

Vandeput is pragmatic about ARIMA: it is more complex to tune (p, d, q parameters plus seasonal equivalents), requires stationarity testing, and rarely outperforms Holt-Winters on typical supply chain data. However, for Grow's DTC channel where day-of-week effects, promotional echoes, and holiday patterns create structured autocorrelation, SARIMA can add value — particularly as data accumulates through 2025 and 2026.

SARIMA(p,d,q)(P,D,Q,m): ARIMA Order (p,d,q) × Seasonal Order (P,D,Q,m)

p / P — AR order: how many past values to include (non-seasonal / seasonal)

d / D — Differencing order: how many times to difference for stationarity

q / Q — MA order: how many past errors to include

m — Seasonal period (12 for monthly data)

ML-Based Forecasting

Machine learning models (gradient boosting, random forests, neural networks) can incorporate external features — promotions, social media signals, day-of-week, holidays, new product indicators — that classical time series models cannot. Vandeput discusses these in the context of feature engineering for supply chain, noting that their value is primarily in capturing causal demand drivers rather than in time series pattern recognition per se.

LightGBM / XGBoost

Gradient boosted trees. Best overall performer on tabular demand data when sufficient feature engineering is applied. Handles non-linear interactions (promo × season × launch age) naturally.

Feature Engineering

The model is only as good as its features. Essential features: lag variables (demand at t-1, t-7, t-28), rolling statistics, promotion flags, SKU age, launch cohort, Bass curve position.

Phase 3 Candidate

ML models require more data than Grow currently has. With 12–18 months of clean actuals, begin exploratory ML work. Deploy only when MASE meaningfully beats Holt-Winters on a holdout set.
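The lag and rolling-statistic features listed above can be assembled without any ML library, which is a reasonable first step before committing to LightGBM or XGBoost. The sketch below is illustrative: `build_lag_features`, the column names, and the sample data are assumptions.

```python
def build_lag_features(daily_demand, promo_flags, lags=(1, 7, 28), window=7):
    """Build one feature row per day that has a full set of lags available."""
    rows = []
    start = max(max(lags), window)
    for t in range(start, len(daily_demand)):
        row = {f"lag_{k}": daily_demand[t - k] for k in lags}
        # Rolling mean over the `window` days before t (no leakage from day t itself).
        row[f"rolling_mean_{window}"] = sum(daily_demand[t - window:t]) / window
        row["promo"] = promo_flags[t]
        row["target"] = daily_demand[t]
        rows.append(row)
    return rows

demand = list(range(1, 36))            # 35 days of hypothetical demand: 1, 2, ..., 35
promos = [0] * 35
promos[30] = 1                         # a single hypothetical promotion day
rows = build_lag_features(demand, promos)
print(len(rows), rows[0])
```

Every feature for day t uses only information available before day t; keeping that discipline is what makes a later holdout comparison against Holt-Winters meaningful.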

Model Selection Framework

The question is never "which model is best in theory?" It is always "which model produces the lowest expected inventory cost on this SKU given this much data?"

Model Selection Decision Tree

New SKU?
  Yes → Bayesian Pooling from similar cohort
  No → Intermittent Demand?
    Yes → Croston / ADIDA
    No → Seasonal Pattern?
      Yes → Holt-Winters
      No → SES / Holt's
      ↓ (with 18m+ data): SARIMA / LightGBM

05 Anchors & Faders — Product Lifecycle Modeling

Every SKU in a fragrance portfolio follows a lifecycle. Modeling the shape of that lifecycle is more powerful than fitting a time series in isolation.

The existing rules-based engine implicitly encodes a lifecycle assumption: new fragrances peak and decay, while a handful of "anchor" products maintain or grow. This intuition is statistically grounded — it maps directly to two well-studied models: the Bass Diffusion Model (for faders) and S-curve / logistic growth (for anchors). Making these curves explicit and fitting them to historical data turns founder intuition into testable, refineable parameters.

Bass Diffusion Model — For New Launch Faders

The Bass model (Frank Bass, 1969) describes the adoption of a new product as the interplay between two populations: innovators who adopt independently of others, and imitators who adopt because of social influence from existing adopters.

Bass Diffusion Model: f(t) = [p + q·F(t)] · [1 - F(t)]

f(t) — Fraction of market adopting at time t (instantaneous adoption rate)

F(t) — Cumulative fraction of market that has adopted by time t

p — Coefficient of innovation (external influence: ads, brand awareness). Typical range: 0.01–0.03

q — Coefficient of imitation (internal influence: word of mouth, social proof). Typical range: 0.3–0.5

m — Total potential market size (must be estimated or set via growth input)

For Grow, the critical parameter is m — this is precisely where the founder's growth input enters the model. The Bass curve shape (governed by p and q) can be fitted from historical launch data. Once fitted, all subsequent launches share the same curve shape; only m varies. This is the statistical validation layer for the growth assumption: "given p and q fitted from your previous launches, what value of m is implied by our historical peak demands, and how does that compare to the growth target?"
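A discrete-time version of the Bass equation shows how p, q, and m shape a launch curve. This is a sketch, not a fitting routine: it steps the adoption equation forward one period at a time, `bass_adoptions` is a hypothetical name, and the parameter values are the illustrative defaults from the interactive chart.

```python
def bass_adoptions(p, q, m, periods):
    """Per-period adoptions from a discrete-time Bass model."""
    cumulative = 0.0
    adoptions = []
    for _ in range(periods):
        fraction_adopted = cumulative / m                 # F(t)
        # f(t) = [p + q*F(t)] * [1 - F(t)], scaled by market size m.
        new = (p + q * fraction_adopted) * (m - cumulative)
        adoptions.append(new)
        cumulative += new
    return adoptions

curve = bass_adoptions(p=0.02, q=0.40, m=2000, periods=24)
peak_period = max(range(len(curve)), key=curve.__getitem__)
print(f"peak in period {peak_period}, about {curve[peak_period]:.0f} units")
```

Because m only scales the curve, doubling m doubles every period's volume without moving the peak. That is what lets the growth input be tested separately from the curve shape.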

Interactive: Bass Diffusion Curve

Adjust parameters to see how different fragrance launch profiles emerge. The peak timing and height are direct functions of p, q, and m.

p (Innovation) 0.02

q (Imitation) 0.40

m (Market Size) 2,000

S-Curve / Logistic Growth — For Anchor SKUs

Anchor SKUs — the 1–2 hero fragrances that grow in the early years before plateauing — follow a logistic growth curve. Unlike Bass, the logistic model has no decay phase: demand rises, inflects, and saturates at a ceiling.

Logistic Growth (S-Curve): D(t) = K / (1 + e^(-r(t - t₀)))

K — Carrying capacity / demand ceiling (maximum sustainable demand)

r — Growth rate (steepness of the S-curve)

t₀ — Inflection point (time of fastest growth)
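The logistic curve is a one-line function. The sketch below uses hypothetical parameter values for an anchor SKU to show the three regimes: early growth, inflection, and saturation.

```python
from math import exp

def logistic_demand(t, ceiling, rate, inflection):
    """D(t) = K / (1 + e^(-r(t - t0))) with K=ceiling, r=rate, t0=inflection."""
    return ceiling / (1 + exp(-rate * (t - inflection)))

# Hypothetical anchor SKU: ceiling of 1,500 units/month, inflection at month 18.
for month in (0, 18, 36):
    print(month, round(logistic_demand(month, ceiling=1500, rate=0.25, inflection=18)))
```

At the inflection point demand is exactly half the ceiling, which gives a quick sanity check on any fitted K once an anchor's growth has visibly started to slow.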

Anchor vs. Fader Demand Profiles — Illustrative Comparison

Bayesian Pooling — Borrowing Strength Across SKUs

When a new SKU launches, it has zero history. The naïve approach is to wait for data to accumulate before forecasting. The Bayesian approach is to use related SKUs as informative priors — shrinking the new SKU's forecast toward the cohort mean, and then updating as actuals arrive.

The Pooling Logic

A new fragrance in the Fader category is not launched into a vacuum. Grow has previously launched fragrances. Those launches inform a prior distribution over (p, q, m). The new launch starts there and updates toward its own data as weeks accumulate. After 4–6 weeks of sell-through, the posterior is meaningfully updated. After a full season, the SKU stands on its own data.

The mathematical machinery is hierarchical Bayesian modeling — at its simplest, a partial pooling model where individual SKU parameters are drawn from a common hyperprior. In practice, a reasonable approximation for Phase 2 is:

Approximate Bayesian Pooling: F_new = w · F_pooled + (1-w) · F_individual

w — Pooling weight, decreasing as n (individual observations) increases. w = k/(k+n) where k is a tunable shrinkage constant.

F_pooled — Forecast from cohort-level parameters (all Faders, all Anchors)

F_individual — Forecast from this SKU's own fitted parameters

This approach is directly applicable to Grow's new format problem: a new size of an existing fragrance can pool from both the cohort-level prior and the parent fragrance's demand history, with higher initial weight on the parent.
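The shrinkage weight is easy to tabulate. In this sketch the shrinkage constant k = 8 and the forecast values are hypothetical; in practice k would be tuned on past launches.

```python
def pooled_forecast(f_pooled, f_individual, n_obs, k=8):
    """Blend cohort and SKU-level forecasts with weight w = k / (k + n)."""
    w = k / (k + n_obs)
    return w * f_pooled + (1 - w) * f_individual

# Hypothetical numbers: cohort says 300 units/week, the SKU's own fit says 420.
for weeks in (0, 4, 8, 26):
    print(weeks, round(pooled_forecast(300, 420, n_obs=weeks)))
```

With k = 8 the blend sits halfway between the two forecasts after eight weeks of data; a larger k keeps new SKUs tied to the cohort for longer.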

06 Inventory Optimization

Forecasting tells you what demand might be. Inventory optimization tells you what to do about it.

Safety Stock

Safety stock is buffer inventory held to protect against demand variability and supply uncertainty during the replenishment lead time. Vandeput's key point: safety stock is not waste — it is the cost of uncertainty. Reducing safety stock without reducing uncertainty just increases the probability of stockout.

Safety Stock — Standard Formula: SS = z · σ_LT

z — Service level factor (z-score corresponding to target service level). z=1.28 for 90%, z=1.65 for 95%, z=2.05 for 98%

σ_LT — Standard deviation of demand during lead time

When both demand and lead time vary:

σ_LT = √(L̄ · σ²_d + d̄² · σ²_L)

L̄, σ_L — Mean and std dev of lead time

d̄, σ_d — Mean and std dev of demand per period

The z-score Connection to Newsvendor

The z-score used in safety stock calculation is the Newsvendor critical ratio, expressed as a normal quantile. When CR = 0.80, z = 0.84. This means the two frameworks — Newsvendor and safety stock — are the same model expressed differently. The Newsvendor gives the intuition; safety stock gives the operational implementation.
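The safety stock formula, including the lead-time variability term, can be sketched as follows. Demand and lead time are assumed independent and demand is assumed normal; the SKU numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def safety_stock(service_level, mean_demand, sd_demand, mean_lead_time, sd_lead_time=0.0):
    """SS = z * sigma_LT, with sigma_LT = sqrt(L_mean * sd_d^2 + d_mean^2 * sd_L^2)."""
    z = NormalDist().inv_cdf(service_level)
    sigma_lt = sqrt(mean_lead_time * sd_demand ** 2 + mean_demand ** 2 * sd_lead_time ** 2)
    return z * sigma_lt

# Hypothetical SKU: 100 +/- 30 units/week, lead time 4 +/- 1 weeks, 95% cycle service level.
print(round(safety_stock(0.95, mean_demand=100, sd_demand=30, mean_lead_time=4, sd_lead_time=1)))
print(round(safety_stock(0.95, mean_demand=100, sd_demand=30, mean_lead_time=4)))
```

Comparing the two printed values shows how much of the buffer is driven by supplier lead-time variability rather than by demand variability, which is why the lead-time log matters as much as the sales history.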

Reorder Point & Economic Order Quantity

Reorder Point (ROP)

ROP = d̄ · L̄ + SS

d̄ · L̄ — Expected demand during lead time

SS — Safety stock buffer

When inventory position falls to or below ROP, trigger a replenishment order. This is the operational trigger for day-to-day ops staff — the "reorder X now" signal.

Economic Order Quantity (EOQ)

EOQ = √(2 · D · K / h)

D — Annual demand

K — Fixed ordering cost per order

h — Holding cost per unit per year

EOQ minimizes total ordering + holding costs. For Grow, where raw material fungibility lowers h, EOQ will tend toward larger, less frequent orders — but must be balanced against cash flow constraints.
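Both formulas are one-liners; the sketch below uses hypothetical inputs to show the units involved (demand per period and lead time in periods for ROP, annual figures for EOQ).

```python
from math import sqrt

def reorder_point(mean_demand, mean_lead_time, safety_stock):
    """ROP = d_mean * L_mean + SS."""
    return mean_demand * mean_lead_time + safety_stock

def economic_order_quantity(annual_demand, order_cost, holding_cost):
    """EOQ = sqrt(2 * D * K / h)."""
    return sqrt(2 * annual_demand * order_cost / holding_cost)

# Hypothetical inputs: 100 units/week, 4-week lead time, 190 units of safety stock,
# $50 per order, $3 holding cost per unit per year.
print(reorder_point(100, 4, 190))
print(round(economic_order_quantity(annual_demand=5200, order_cost=50, holding_cost=3)))
```

Because h sits under the square root, halving the holding cost raises EOQ by about 41%, not 100%, so the effect of fungibility on order size is real but moderate.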

Newsvendor — Full Parameterization for Grow

Returning to the Newsvendor with full parameterization relevant to Grow's cost structure:

Cost Component	Category	Estimable From	Notes for Grow
Lost sale margin	C_u	P&L, price × margin %	Direct, measurable
Lost repeat customer LTV	C_u	Cohort analysis, repurchase rate	High for DTC — a stockout breaks the repurchase habit
Lost brand/social momentum	C_u	Attribution modeling	Hard to quantify; use conservative estimate + sensitivity analysis
Raw material holding cost	C_o	WACC × inventory value	Low for stable fragrance materials
Finished goods holding cost	C_o	Warehouse + WACC	Modest; warehouse space is the limiting constraint
Markdown / clearance cost	C_o	Historical promo discounts	Partially offset by fungibility

Service Levels — Cycle vs. Fill Rate

Vandeput distinguishes two service level definitions that are frequently confused:

Cycle Service Level (CSL)

Probability that no stockout occurs during a replenishment cycle. CSL = 95% means 95% of order cycles have zero stockouts. This is what the z-score in the safety stock formula directly controls.

Fill Rate (FR)

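Fraction of demand met from stock on hand. FR = 95% means 95% of units demanded are shipped without delay. For a given safety stock, fill rate is typically higher than CSL, because a cycle that ends in a stockout usually still fills most of its demand. For Grow's DTC context, fill rate is the more meaningful customer-facing metric.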

Do Not Confuse These

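A 95% CSL does not mean 95% of orders are fulfilled. It means 95% of cycles are stockout-free. The fill rate that goes with it depends on how the expected shortage per cycle compares with the order quantity: when order quantities are large relative to lead-time demand variability, a 95% CSL can correspond to a 99%+ fill rate; when replenishment is frequent and order quantities are small, the same CSL yields a noticeably lower fill rate. Always report both for executive conversations.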
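The relationship between the two service measures can be computed directly. This sketch assumes normally distributed lead-time demand and a fixed order quantity per cycle, and uses the standard approximation fill rate = 1 − (expected units short per cycle) / (order quantity); the numbers are hypothetical.

```python
from statistics import NormalDist

def fill_rate(cycle_service_level, sigma_lt, order_quantity):
    """Approximate fill rate for normal lead-time demand and a fixed order quantity."""
    n = NormalDist()
    z = n.inv_cdf(cycle_service_level)
    # Expected units short per replenishment cycle (standard normal loss function).
    expected_short = sigma_lt * (n.pdf(z) - z * (1 - n.cdf(z)))
    return 1 - expected_short / order_quantity

# Same 95% CSL and the same variability, two hypothetical order quantities.
for q in (50, 400):
    print(f"Q = {q}: fill rate = {fill_rate(0.95, sigma_lt=60, order_quantity=q):.3f}")
```

Running the same CSL through different order quantities is a quick way to produce the "report both" table for an executive audience.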

07 MRP Output → Layered Forecasting System

MRP is a constraint solver, not a demand forecaster. The two must be architecturally separated and connected deliberately.

Material Requirements Planning (MRP) is a production scheduling and materials planning tool. It takes a demand plan as input and calculates what to produce, buy, and when. The demand plan itself is not generated by MRP — it is fed into it. The current state at Grow is that the rules-based Demand Engine produces the demand plan that feeds MRP. The statistical forecasting layer will eventually sit between raw historical data and MRP, either replacing or augmenting the Demand Engine's output.

MRP Data Extraction & What's Usable

MRP Output Field	Usability	Notes
Historical planned orders	Low	Reflects the Demand Engine's intent, not actual demand. Do not use as demand history.
Historical actual receipts	Medium	Useful for lead time distribution fitting. Actual vs. planned receipt dates = lead time variability.
On-hand inventory snapshots	High	Essential for calculating implied demand from inventory movements (beginning + receipts - ending = sales).
Stockout / back-order flags	High	Censored demand identification. Any period with a stockout has understated demand.
Bill of Materials (BOM)	High	Links finished goods to raw materials. Critical for raw material demand planning and fungibility mapping.

Critical Data Distinction

Use sell-through (actual sales to end customers) as your demand signal — not production orders, not receipts, not planned demand. If you only have inventory movement data, back-calculate demand as: Demand(t) = Inventory(t-1) + Receipts(t) − Inventory(t), adjusting for any stockout periods.
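The back-calculation can be sketched as below. The list layout (an opening position followed by one closing figure per period) and the function name are assumptions; stockout periods are flagged rather than corrected, since the implied figure is only a lower bound on true demand in those periods.

```python
def implied_demand(inventory, receipts, stockout_flags):
    """Back-calculate per-period demand from inventory movements.

    inventory[t] is the closing stock of period t (inventory[0] is the opening
    position); receipts[t] and stockout_flags[t] refer to period t, t >= 1.
    Returns a list of (demand, censored) tuples for periods 1..n.
    """
    result = []
    for t in range(1, len(inventory)):
        demand = inventory[t - 1] + receipts[t] - inventory[t]
        # In a stockout period the figure is only a lower bound on true demand.
        result.append((demand, bool(stockout_flags[t])))
    return result

closing_stock = [120, 80, 150, 0]      # hypothetical weekly closing inventory
received = [0, 0, 100, 0]
stocked_out = [False, False, False, True]
print(implied_demand(closing_stock, received, stocked_out))
```

Censored periods should be excluded from model fitting or replaced with an estimate of unconstrained demand; treating them as true demand biases the forecast downward.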

System Architecture — Demand Signal Layer

Layer 1 — Raw Data Sources: Shopify / DTC Sales · Wholesale Orders · Inventory Snapshots · Promotions Log

↓ ETL / DataHub Pipeline ↓

Layer 2 — Clean Demand Signal: Adjusted Sell-Through by SKU/Week

↓ Parallel Paths ↓

Rules-Based Engine (existing) ⟷ Statistical Forecast Engine (new)

↓ Reconciliation Layer ↓

Layer 3 — Consensus Demand Plan: Validated Demand Plan + Uncertainty Intervals

↓

MRP Input · Inventory Policy Engine · Reorder Triggers

The Feedback Loop

A forecasting system without a feedback loop is not a system — it is a one-time calculation. The operational value comes from continuous updating: actuals flow back in, forecast errors are measured, model parameters are updated, and inventory policies are recalibrated. At Grow's current scale, this loop can run weekly.

# Conceptual weekly update loop.
# load_actuals, load_last_forecast, load_model, log_metrics, calc_mase and
# calc_bias are placeholders for Grow's own data-access and metrics code.
from statistics import NormalDist

def weekly_forecast_update(sku_id, new_actuals, cu, co):
    # 1. Score last week's forecast against the actuals that just arrived
    history = load_actuals(sku_id)                # demand before this week
    prior_forecast = load_last_forecast(sku_id)   # forecast made for this week
    log_metrics(sku_id,
                mase=calc_mase(new_actuals, prior_forecast, history),
                bias=calc_bias(new_actuals, prior_forecast))

    # 2. Update the model (SES/HW states update incrementally; Bass needs periodic refit)
    model = load_model(sku_id)
    model.update(new_actuals)

    # 3. Generate new forecast with uncertainty intervals
    forecast, lower, upper = model.predict(horizon=12, confidence=0.80)

    # 4. Recalculate safety stock; z is the normal quantile of the critical ratio
    z = NormalDist().inv_cdf(cu / (cu + co))
    ss = z * model.sigma_lt

    # 5. Update reorder point
    rop = model.mean_lt_demand + ss

    return {'forecast': forecast, 'lower': lower, 'upper': upper, 'rop': rop, 'ss': ss}

08 Growth Input Validation

The founder's growth input is a prior, not a forecast. The statistical system's job is to interrogate that prior with evidence.

The current Demand Engine uses a growth target as a primary driver — accounting for an estimated 60–70% of output. This is not inherently wrong: growth targets are legitimate inputs to planning. The problem is the absence of a validation mechanism. When the growth input is high, the entire production plan is large. If the growth doesn't materialize, that's working capital tied up in inventory. If it does materialize but wasn't planned for, that's stockouts.

The statistical framework provides three things the engine currently lacks:

📊

Evidence-Based Growth Estimate

Fit trend models to historical sell-through by SKU. Calculate the statistically supported growth rate with confidence intervals. This is the "what does the data say the company can grow at" number.

⚡

Gap Analysis

Compare the evidence-based growth estimate to the growth input. A large gap (target: 40%, data-supported: 22%) requires explanation — not suppression. What specific marketing actions, new products, or channel expansions justify the gap?

🎯

Conditional Production Plan

Present a distribution of outcomes, not a single number. "If we hit 40% growth: produce X. If we hit 22%: produce Y. The cost of Y being wrong in each direction is Z." This is the Monte Carlo vision, accelerated.

Growth Input Validation Framework: Gap = Growth_target − Growth_evidence

Growth_evidence — MoM or YoY trend from statistical model fitted to sell-through, with 80% CI

Growth_target — Founder's input to Demand Engine

Decision rules:

Gap < 10pp — Reasonable; no intervention needed. Proceed with blended estimate.

Gap 10–25pp — Elevated; require documented action plan (marketing spend, new product timeline, channel expansion).

Gap > 25pp — High risk; temper growth input to evidence + 15pp, or require board-level discussion of required investment to bridge gap.
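The decision rules above can be encoded as a small function so the tier is computed the same way every cycle. How the exact boundaries (10pp and 25pp) and negative gaps are handled is an assumption in this sketch, since the rules do not specify them.

```python
def classify_growth_gap(target_pct, evidence_pct):
    """Map the gap (in percentage points) to the decision tiers listed above."""
    gap = target_pct - evidence_pct
    # Assumption: a target at or below the evidence-based estimate counts as reasonable,
    # and the boundaries 10pp and 25pp fall in the "elevated" tier.
    if gap < 10:
        return gap, "reasonable"
    if gap <= 25:
        return gap, "elevated"
    return gap, "high risk"

print(classify_growth_gap(target_pct=40, evidence_pct=22))   # the example gap from the text
```

The 40% versus 22% example from the Gap Analysis card lands in the elevated tier, which calls for a documented action plan rather than an override.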

Shadow Mode Protocol

Shadow mode is the politically and operationally correct way to introduce statistical forecasting alongside a founder's existing system. The statistical model runs in parallel, produces outputs, and accumulates a performance track record — but does not yet influence decisions. This serves several purposes:

Trust building: The model's track record speaks for itself. When it outperforms the engine, the evidence is in the log — not an argument.
Gap documentation: Every week, the log shows: Engine said X, statistical model said Y, actuals were Z. Over a season, patterns emerge.
No political risk: The engine's authority is not challenged during shadow mode. The founder sees this as "we're also running this other thing" — not "we're replacing your system."
Parameter refinement: The model is being fitted and tuned in shadow mode. By the time it's ready for influence, it's already battle-hardened on real data.

# Shadow mode logging schema: one entry per SKU per week
def make_shadow_log_entry(week_id, sku_id, engine_output, stat_model_output,
                          lower_bound, upper_bound,
                          engine_growth_assumption, stat_growth_ci):
    return {
        'week': week_id,
        'sku': sku_id,
        'engine_forecast': engine_output,             # Rules-based engine
        'stat_forecast': stat_model_output,           # Statistical model
        'stat_lower_80': lower_bound,
        'stat_upper_80': upper_bound,
        'actual_demand': None,                        # Filled in retrospectively
        'engine_error': None,                         # Filled in retrospectively
        'stat_error': None,                           # Filled in retrospectively
        'growth_input': engine_growth_assumption,     # Log this explicitly
        'stat_growth_estimate': stat_growth_ci,       # Compare to growth_input
    }

09 MVP & Iterative Implementation

The company needs something in production now. Don't let the statistically correct be the enemy of the operationally useful.

The path from "no statistical models" to "full closed-loop forecasting system" is not a single jump. It is a sequence of phases, each of which delivers standalone business value and de-risks the next phase. The North Star is profitability — which means the first win needs to be visible within 30–60 days, not 12 months.

P1

Phase 1 — Foundation & Visibility

Days 1–30 · "Know where you stand"

Clean and catalog historical sell-through by SKU from 2024 data. Flag censored demand (stockout periods).
Build the Promotions & Events Log retrospectively — even an imperfect log is better than none.
Classify SKUs: Anchor vs. Fader vs. New Format. Approximately 3 buckets, each getting a different model family.
Fit Seasonal Naïve and Holt-Winters to Anchor SKUs. Compute MASE vs. Naïve baseline. Establish the performance floor.
Parameterize C_u and C_o from P&L and LTV data. Calculate Critical Ratio. This is a one-page deliverable for the executive — the most valuable document of Phase 1.
Begin Shadow Mode logging. Engine output vs. stat model output logged weekly, no decisions changed.
Go/No-Go Criterion: MASE ≤ 1.0 on at least 50% of Anchor SKUs.

P2

Phase 2 — Growth Input Validation & Safety Stock

Days 31–60 · "Quantify the cost of being wrong"

Fit Bass Diffusion to all historical Fader SKU launches. Extract p and q estimates. Document confidence intervals on m for the current season's new launches.
Build the Growth Input Validation dashboard: evidence-based growth estimate vs. founder's target, with Gap Analysis output.
Calculate statistically grounded Safety Stock for each SKU using fitted σ from Phase 1 models + Critical Ratio from C_u/C_o analysis.
Implement Reorder Point triggers for ops staff — this is the "reorder X now" deliverable. Simple threshold, automatically updated weekly.
Present Shadow Mode log first results to data team. Identify SKUs where stat model is outperforming the engine — begin internal advocacy.
Prototype Bayesian Pooling for 1–2 new format SKUs as a proof of concept.
Go/No-Go Criterion: Growth Input Validation report reviewed by data team. ROP triggers deployed and tested for at least 5 SKUs.

P3

Phase 3 — Consensus Plan & Monte Carlo

Days 61–120+ · "Distribution of possible futures"

After one full season of Shadow Mode, present the performance comparison formally. Let the data make the case for statistical influence on the demand plan.
Build the Monte Carlo simulation layer: for each SKU, simulate 10,000 demand scenarios by sampling from the fitted forecast distribution. Aggregate to total production plan scenarios.
Present the founder with a "production scenario distribution" — not "order X units" but "here is what our inventory position looks like across scenarios, and here is the cost of each tail outcome."
If ML data is available (18m+ clean history), begin exploratory LightGBM work on the top 5 SKUs by revenue. Only deploy if MASE improvement is statistically significant.
Introduce Bayesian Pooling formally for all new launches — the prior is now fitted from a full season of launches.
Go/No-Go Criterion: Statistical system has demonstrably lower total forecast cost (C_u·stockouts + C_o·overstock) than engine on the season holdout period.
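
The Phase 3 criterion compares the two systems on total forecast cost rather than accuracy. A minimal sketch of that comparison, assuming per-period stocked quantities and realized demand are available for the holdout period and that C_u and C_o are per-unit costs (the function name and signature are illustrative):

```python
from typing import Sequence


def total_forecast_cost(stocked: Sequence[float], demand: Sequence[float],
                        c_u: float, c_o: float) -> float:
    """Sum of C_u * units short + C_o * units over, across holdout periods."""
    if len(stocked) != len(demand):
        raise ValueError("stocked and demand must cover the same periods")
    cost = 0.0
    for units_stocked, units_demanded in zip(stocked, demand):
        shortfall = max(units_demanded - units_stocked, 0.0)  # stockout units
        excess = max(units_stocked - units_demanded, 0.0)     # overstock units
        cost += c_u * shortfall + c_o * excess
    return cost
```

Run it once with the engine's stocking quantities and once with the statistical system's, using the same demand series and cost parameters. The criterion is met when the statistical system's total is lower.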

The Political Strategy

The goal is never to replace the founder's engine — it is to make the founder choose to replace it, because the data is undeniable. Shadow mode builds the evidence. Growth Input Validation makes the risk visible in dollar terms. Monte Carlo gives the founder a tool that enhances their intuition rather than overriding it. By Phase 3, the statistical system should feel like a superpower, not a threat.

10Governance & Monitoring

A forecasting system without governance degrades. Metrics drift, models go stale, and nobody notices until it's expensive.

KPIs That Map to Dollars

KPI	Audience	Target	Action Threshold
Inventory Cost of Forecast Error: C_u·stockouts + C_o·overstock ($)	Executive	Decreasing QoQ	Increase >15% vs. prior period
Fill Rate by SKU Tier	Executive / Ops	≥95% (Anchors), ≥90% (Faders)	Below threshold for 2 consecutive weeks
MASE by SKU	Data Team	<1.0 (beat naïve)	MASE >1.2 triggers model review
Forecast Bias by SKU	Data Team	|Bias| < 5%	Systematic bias (same sign for 4+ weeks) triggers refit
Growth Gap (Target vs. Evidence-Based)	Executive / Data	<10 percentage points	Gap >25pp requires documented justification
Shadow Mode Accuracy Delta (Engine MASE − Stat model MASE)	Data Team	Stat model MASE < Engine MASE	Positive delta for 4+ consecutive weeks = escalate for influence

Model Retraining Cadence

Weekly (Automated)

SES/Holt-Winters parameter updates via online learning. Reorder Point recalculation. Shadow Mode log entry. Bias check — flag SKUs with 3+ consecutive same-sign errors.
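
The weekly bias check can be sketched as a run-length test on the most recent forecast errors. This is an illustration only: it assumes error is defined as forecast minus actual (so a positive run means persistent over-forecasting), and the function names are hypothetical.

```python
from typing import Mapping, Sequence


def same_sign_streak(errors: Sequence[float]) -> int:
    """Signed length of the run of same-sign errors ending at the latest period.

    Positive result: consecutive over-forecasts; negative: under-forecasts.
    A zero error breaks the run.
    """
    streak = 0
    run_sign = 0
    for error in reversed(errors):
        sign = (error > 0) - (error < 0)
        if sign == 0:
            break
        if run_sign == 0:
            run_sign = sign          # the latest error sets the direction
        if sign != run_sign:
            break
        streak += 1
    return streak * run_sign


def flag_biased_skus(errors_by_sku: Mapping[str, Sequence[float]],
                     min_streak: int = 3) -> list[str]:
    """SKUs whose latest errors share a sign for at least `min_streak` periods."""
    return [sku for sku, errors in errors_by_sku.items()
            if abs(same_sign_streak(errors)) >= min_streak]
```

The default of 3 matches the weekly flag described above; calling it with `min_streak=4` gives the refit trigger from the KPI table.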

Monthly (Triggered)

Full model refit if MASE exceeds the review threshold (1.2). Bayesian Pooling weight recalibration as new SKU data accumulates. Cost parameter review (C_u/C_o updates from P&L).

Seasonal (Pre-Planning)

Bass curve refit with new launch actuals. Growth Input Validation report generation. Full holdout evaluation of all models vs. engine. Monte Carlo scenario generation for upcoming season.
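
One simple way to refit a Bass curve once new launch actuals arrive is a grid search over p and q, with m solved in closed form for each pair. The sketch below assumes cumulative sales observed at periods 1, 2, 3, ... and an ordinary least-squares fit. It is a swapped-in illustrative technique, not necessarily the fitting method used elsewhere in this document, and the grid ranges in the usage note are assumptions.

```python
from math import exp
from typing import Sequence, Tuple


def bass_cdf(t: float, p: float, q: float) -> float:
    """Cumulative adoption fraction F(t) under the Bass model (p > 0)."""
    decay = exp(-(p + q) * t)
    return (1.0 - decay) / (1.0 + (q / p) * decay)


def fit_bass(cumulative_sales: Sequence[float],
             p_grid: Sequence[float],
             q_grid: Sequence[float]) -> Tuple[float, float, float]:
    """Return (p, q, m) minimizing squared error against cumulative sales.

    For a fixed (p, q) the model is m * F(t), so the least-squares m is
    sum(y * F) / sum(F * F); only p and q need to be searched.
    """
    periods = range(1, len(cumulative_sales) + 1)
    best = None
    for p in p_grid:
        for q in q_grid:
            curve = [bass_cdf(t, p, q) for t in periods]
            m = (sum(y * f for y, f in zip(cumulative_sales, curve))
                 / sum(f * f for f in curve))
            sse = sum((y - m * f) ** 2 for y, f in zip(cumulative_sales, curve))
            if best is None or sse < best[0]:
                best = (sse, p, q, m)
    if best is None:
        raise ValueError("p_grid and q_grid must be non-empty")
    return best[1], best[2], best[3]
```

A grid such as p from 0.005 to 0.100 in steps of 0.005 and q from 0.05 to 0.80 in steps of 0.01 is cheap to evaluate per SKU. A point estimate like this gives no confidence interval on m, so it would need to be paired with a resampling or Bayesian step to produce the intervals called for in Phase 2.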

The Long-Term Vision — Monte Carlo Production Planning

The endgame described by the founder — a Monte Carlo engine with inputs from historical and statistical results, creating a "distribution of possible production scenarios" — is achievable within 12–18 months of this implementation. The architecture is:

Long-Term Monte Carlo Architecture

Inputs: Fitted Bass / Logistic Parameters per SKU · C_u / C_o by SKU Tier · Lead Time Distribution

↓ Monte Carlo Simulation (N=10,000) ↓

Demand Scenario 1 · Demand Scenario 2 · ... · Scenario N

↓ Aggregate → Production Plan Distribution ↓

P10 Plan (Conservative) · P50 Plan (Base Case) · P85 Plan (Optimistic)

↓ Newsvendor Optimization → Recommended Plan ↓

Optimal Production Plan + Cost of Tail Risks

This architecture transforms the planning conversation from "how many units should we make?" to "here is the distribution of demand outcomes, here is the cost of each tail scenario, and here is the mathematically optimal production plan given your cost structure." The founder's growth input becomes one of several tunable parameters in the simulation — not a monolithic driver, but a lever the team can test and stress-examine.
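
A heavily simplified sketch of this pipeline is shown below. It is not the target architecture: it assumes each SKU's demand is drawn from a Normal distribution truncated at zero (standing in for the fitted Bass / logistic parameters), treats SKUs as independent, ignores the lead time distribution, and uses illustrative function names throughout.

```python
import math
import random
from typing import List, Sequence, Tuple


def simulate_demand(mean: float, sigma: float, n: int,
                    rng: random.Random) -> List[float]:
    """Draw n demand scenarios; Normal truncated at zero is an assumption."""
    return [max(0.0, rng.gauss(mean, sigma)) for _ in range(n)]


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile of an already sorted sample, 0 < q <= 1."""
    rank = math.ceil(q * len(sorted_values))
    index = min(len(sorted_values) - 1, max(0, rank - 1))
    return sorted_values[index]


def expected_cost(stock: float, scenarios: Sequence[float],
                  c_u: float, c_o: float) -> float:
    """Average of C_u * shortfall + C_o * excess over all scenarios."""
    total = sum(c_u * max(d - stock, 0.0) + c_o * max(stock - d, 0.0)
                for d in scenarios)
    return total / len(scenarios)


def newsvendor_plan(scenarios: Sequence[float],
                    c_u: float, c_o: float) -> Tuple[float, float]:
    """Stock at the critical-ratio quantile of simulated demand.

    Returns (recommended quantity, expected cost at that quantity).
    """
    cr = c_u / (c_u + c_o)
    quantity = percentile(sorted(scenarios), cr)
    return quantity, expected_cost(quantity, scenarios, c_u, c_o)


def plan_distribution(per_sku_scenarios: Sequence[Sequence[float]]
                      ) -> Tuple[float, float, float]:
    """P10 / P50 / P85 of total demand, summing SKUs scenario by scenario."""
    totals = sorted(sum(draws) for draws in zip(*per_sku_scenarios))
    return (percentile(totals, 0.10), percentile(totals, 0.50),
            percentile(totals, 0.85))
```

Summing SKUs scenario by scenario is only valid here because the draws are independent. Correlated demand (for example, a shared seasonal shock) would need to be sampled jointly, and the founder's growth input would enter as a shift in each SKU's mean.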

Final Note — Profitability as North Star

Every model, every metric, every governance process in this document exists to serve one goal: helping Grow Fragrance become profitable. Better forecasting reduces the cost of uncertainty. Reduced uncertainty frees working capital. Freed working capital funds growth without external financing. The path from "vibes-driven planning" to "data-disciplined profitability" is the 18-month arc this document describes — and every phase of it delivers standalone business value on the way.
