Grow Fragrance · Data Science

Product Forecasting & Inventory Optimization

A comprehensive implementation guide grounded in Vandeput's Data Science for Supply Chain Forecasting and Inventory Optimization — adapted for a founder-led, DTC-first fragrance business moving toward profitability.

Based on Vandeput (2021, 2023)
Audience: Data Scientists & Analysts
Version 1.0 · 2025
Part I
Foundations

01 The Right Problem Statement

The goal is not to minimize forecast error. The goal is to minimize the cost of being wrong.

This distinction, which Vandeput makes forcefully in the opening chapters of both books, is the intellectual foundation of everything that follows. Statistical accuracy — a lower RMSE, a tighter confidence interval — is instrumentally valuable only insofar as it reduces the cost of incorrect inventory decisions. An excellent forecast that informs a bad stocking policy is worse than useless. A mediocre forecast that is correctly translated into inventory decisions can be highly profitable.

For Grow Fragrance, this reframing is not merely academic. When the planning process runs on a founder's growth intuition baked into a rules-based engine, the question to ask is not "how accurate is the engine?" but "what is the cost distribution of its errors, and in which direction does it tend to fail?"

The Newsvendor Problem — The Executive Lens

The Newsvendor model is the single most important concept in this entire document. It is named for a newspaper seller who must decide each morning how many papers to order before knowing how many will sell. Order too few, and you lose potential profit. Order too many, and you're stuck with unsellable inventory.

The model is deceptively simple and extraordinarily powerful. It translates every forecasting and inventory decision into a single, intuitive question: given the cost of ordering too much vs. too little, how many units should I stock?

Newsvendor — Critical Ratio
CR = Cu / (Cu + Co)
CR — Critical Ratio (the target service level, i.e. the optimal probability of not stocking out)
Cu — Cost of Understocking (lost margin + lost customer LTV + brand damage)
Co — Cost of Overstocking (holding cost + markdown risk + disposal cost)

The critical ratio tells you, directly, what percentile of the demand distribution you should stock to. If CR = 0.80, stock the 80th percentile of expected demand. If demand is normally distributed with mean 500 and std dev 80, stock approximately 567 units (500 + 0.84 × 80).
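
The arithmetic above can be reproduced in a few lines. This is a minimal sketch that assumes normally distributed demand; the function names and the cost values (Cu = 4, Co = 1, chosen only so that CR = 0.80) are illustrative and not Grow actuals.

```python
from statistics import NormalDist


def critical_ratio(cu: float, co: float) -> float:
    """CR = Cu / (Cu + Co)."""
    return cu / (cu + co)


def newsvendor_quantity(mean: float, std: float, cu: float, co: float) -> float:
    """Stock to the CR-quantile of a normal demand distribution (assumption)."""
    cr = critical_ratio(cu, co)
    return NormalDist(mu=mean, sigma=std).inv_cdf(cr)


# Illustrative costs only: Cu = 4 and Co = 1 give CR = 0.80
q = newsvendor_quantity(mean=500, std=80, cu=4.0, co=1.0)
print(round(q))  # 567
```

Swapping in a different demand distribution only changes the quantile call; the critical ratio itself does not depend on the distribution.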

Cost Asymmetry at Grow

Grow has confirmed that stockouts are materially more painful than overstock. The cost structure for a DTC fragrance brand is characteristically asymmetric:

⚠️

Cost of Understocking (Cu)

Lost revenue on the sale. Lost LTV of the customer (repeat buyers who hit a stockout may not return). Brand reputation damage. Lost social momentum — influencer spikes can't be recaptured. Lost wholesale placement if a retail order can't be fulfilled.

📦

Cost of Overstocking (Co)

Holding/carrying cost (fragrance materials are relatively stable; not high spoilage). Working capital tied up in inventory. Potential markdown to clear. Fungibility of raw materials partially offsets this — unused fragrance components can often be redirected.

Key Insight

The raw material fungibility at Grow structurally lowers Co. If a fragrance batch doesn't sell as expected, the components aren't lost — they can be reformulated or redirected. This asymmetry pushes the Critical Ratio well above 0.50, potentially toward 0.75–0.85 for hero SKUs. The exact number requires properly parameterizing Cu and Co from actuals — a Phase 1 deliverable.

Interactive: Newsvendor Calculator

Adjust cost parameters to see how the optimal stocking quantity shifts. This is the executive conversation — not "our MASE improved."

Common Mistake

Using a 50th percentile forecast (the median, which equals the mean when demand is symmetric) as the stocking quantity implicitly assumes Cu = Co. For Grow, where stockouts are far more costly, this systematically under-stocks every SKU. The Newsvendor framework makes this error visible and correctable.

02 Understanding Your Demand Signal

Before choosing a model, understand what your data is actually telling you.

Vandeput dedicates significant attention to demand characterization in Data Science for Supply Chain Forecasting. The choice of forecasting model is downstream of the shape of demand, not upstream. A SARIMA model applied to intermittent, lumpy demand will often perform worse than a naïve method.

Demand Decomposition

Any demand time series can be decomposed into four components:

📈

Trend

Long-run direction of demand. For Grow's hero SKUs, this should be positive and measurable with 12+ months of data. Bass diffusion applies here for new launches.

🌊

Seasonality

Recurring patterns within a period (weekly, monthly, annual). Fragrance sales have strong annual seasonality — the existing rules-based engine captures some of this implicitly.

Cyclicality

Longer-run patterns not tied to calendar (economic cycles, category waves). Less relevant for Grow at current scale, but relevant for wholesale channel planning.

〰️

Noise / Residual

The irreducible random component. Influencer spikes, viral moments, and one-time promotions land here. Crucially — you cannot and should not try to forecast noise.

Illustrative Demand Decomposition — Fragrance SKU (Annual Pattern)

Intermittent & Lumpy Demand

Vandeput distinguishes demand types using two axes: demand frequency (how often does a non-zero demand period occur?) and demand variability (when demand does occur, how variable is the quantity?).

| Demand Type | Frequency | Variability | Relevant for Grow? | Recommended Approach |
|---|---|---|---|---|
| Smooth | High | Low | Hero SKUs | SES, Holt-Winters, ARIMA |
| Erratic | High | High | Promo SKUs | Holt-Winters + event features |
| Intermittent | Low | Low | Slow Movers | Croston's method |
| Lumpy | Low | High | New Formats | Bayesian Pooling, ADIDA |
Grow Context

With 3–5 fragrances × 4–5 SKUs per fragrance, you likely have a mix of smooth (hero anchors), erratic (launch SKUs with promotion), and potentially lumpy (new formats). Classifying your SKU catalog into these buckets is a Day 1 task — it determines which model family applies to each SKU before any data fitting begins.
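
One way to make the Day 1 classification mechanical is sketched below. It measures frequency as the average interval between non-zero demand periods (ADI) and variability as the squared coefficient of variation of non-zero demand sizes. The cutoffs 1.32 and 0.49 are the commonly cited Syntetos-Boylan values, used here as an assumption rather than something taken from this guide, and should be tuned to Grow's data.

```python
from statistics import mean, pstdev


def classify_demand(series, adi_cut=1.32, cv2_cut=0.49):
    """Label a demand history as smooth, erratic, intermittent, or lumpy."""
    nonzero = [x for x in series if x > 0]
    if not nonzero:
        return "no demand"
    adi = len(series) / len(nonzero)                 # average periods per non-zero demand
    cv2 = (pstdev(nonzero) / mean(nonzero)) ** 2     # variability of non-zero sizes
    frequent = adi < adi_cut
    stable = cv2 < cv2_cut
    if frequent:
        return "smooth" if stable else "erratic"
    return "intermittent" if stable else "lumpy"


print(classify_demand([10, 11, 9, 10, 12, 10]))        # smooth
print(classify_demand([0, 0, 5, 0, 0, 5, 0, 0, 5]))    # intermittent
```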

Data Quality Requirements

Vandeput is explicit: no model can overcome bad input data. For Grow, the minimum viable data requirements are:

03 Forecast Error Metrics

Metrics are not goals — they are instruments. Know what each measures and what it cannot measure.

The Metric Catalog

| Metric | Formula | Strengths | Weaknesses |
|---|---|---|---|
| MAE (Mean Absolute Error) | mean(abs(A - F)) | Interpretable in units. Easy to explain. | Scale-dependent. Can't compare across SKUs with different volumes. |
| RMSE (Root Mean Squared Error) | √mean((A - F)²) | Penalizes large errors more heavily. Useful when outlier errors are costly. | Less interpretable. Sensitive to a small number of large errors. |
| MASE (Mean Absolute Scaled Error) | MAE / MAE_naïve | Scale-free. Comparable across SKUs and time periods. Vandeput's recommended default. | Requires a naïve benchmark to exist. Can be unintuitive. |
| Bias (Mean Error) | mean(F - A) | Reveals systematic over/under-forecasting. A biased model with low RMSE is dangerous. | Errors cancel out, so it can be zero even with highly variable forecasts. |
| MAPE (Mean Abs % Error) | mean(abs(A - F) / A) | Intuitive percentage. Widely used in business. | Undefined when A = 0. Biased toward under-forecasting. Vandeput recommends avoiding it. |
Vandeput's Warning on MAPE

MAPE is the most commonly reported forecast metric in business and one of the least useful. It is undefined for zero-demand periods (common in fragrance SKUs), asymmetrically penalizes overforecasting, and systematically incentivizes under-forecasting. Replace it with MASE as your primary accuracy metric and track Bias separately.
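
The catalog metrics are short enough to implement directly. The sketch below follows the formulas in the table; as an assumption, MASE is scaled by the in-sample mean absolute error of the one-step naïve forecast on the training history, which is one common convention.

```python
from math import sqrt


def mae(actual, forecast):
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def bias(actual, forecast):
    """Mean error, F - A: positive means over-forecasting."""
    return sum(f - a for a, f in zip(actual, forecast)) / len(actual)


def mase(actual, forecast, history):
    """MAE scaled by the naive forecast's MAE on the training history."""
    naive_mae = sum(abs(history[i] - history[i - 1])
                    for i in range(1, len(history))) / (len(history) - 1)
    return mae(actual, forecast) / naive_mae


history = [80, 100, 90, 110]          # illustrative training actuals
actual = [100, 120, 90]               # illustrative holdout actuals
forecast = [110, 110, 100]
print(mae(actual, forecast), rmse(actual, forecast))    # 10.0 10.0
print(round(mase(actual, forecast, history), 2))        # 0.6
```

Note that bias here is about +3.3 even though every absolute error is 10: the signed errors partly cancel, which is why bias is tracked alongside MASE rather than instead of it.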

Translating Forecast Error into Inventory Cost

This is the bridge between statistical modeling and business value — and it is the bridge most implementations never build. Vandeput argues that forecast error is only meaningful when expressed in dollars, not abstract statistical units.

Error Cost Translation
Expected Cost = Cu · E[max(D-Q, 0)] + Co · E[max(Q-D, 0)]
D — Realized demand (random variable)
Q — Stocking quantity
E[max(D-Q, 0)] — Expected stockout quantity (lost sales)
E[max(Q-D, 0)] — Expected overstock quantity (excess inventory)
Cost Surface: Expected Cost as a Function of Stocking Quantity

The chart above shows the critical insight: the optimal stocking quantity is not at the mean of the forecast distribution. It shifts right (toward higher stocking) as Cu increases relative to Co. For Grow, given the confirmed asymmetry, the optimal quantity will consistently sit above the forecast mean — meaning a 50th-percentile point forecast systematically under-stocks.
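
The cost surface can be computed in closed form when demand is assumed normal. The sketch below uses the standard normal loss function for the expected shortage and the identity E[max(Q-D, 0)] = (Q - mean) + E[max(D-Q, 0)] for the expected overstock. The demand parameters and costs are the illustrative values used earlier (mean 500, std dev 80, Cu = 4, Co = 1), not Grow actuals.

```python
from statistics import NormalDist


def expected_cost(q, mean, std, cu, co):
    """Cu * E[max(D-Q, 0)] + Co * E[max(Q-D, 0)] for normal demand (assumption)."""
    nd = NormalDist()
    z = (q - mean) / std
    exp_short = std * (nd.pdf(z) - z * (1 - nd.cdf(z)))   # expected lost sales
    exp_over = exp_short + (q - mean)                      # expected excess units
    return cu * exp_short + co * exp_over


# Scan integer stocking quantities and pick the cheapest one
best_q = min(range(400, 701), key=lambda q: expected_cost(q, 500, 80, 4.0, 1.0))
print(best_q)  # 567
```

The minimum of the scan lands on the same quantity as the critical-ratio quantile, which is the point of the Newsvendor result: the percentile rule is the cost minimizer.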

Part II
Forecasting Models

04 Forecasting Models

Start simple. Add complexity only when simplicity provably fails.

Baseline Models — The Floor

Vandeput's most important methodological point on model selection: every model must beat a naïve benchmark to justify its complexity. The naïve forecast is simply: "next period's demand equals this period's demand." It requires no estimation, no parameters, and no data beyond the last observation.

Naïve / Random Walk

F(t+1) = A(t)

The MASE denominator. If any model you build has MASE > 1.0, it is less accurate than this naïve forecast and its added complexity is not justified. This is your absolute floor.

Seasonal Naïve

F(t+1) = A(t+1-m), where m is the season length in periods

Next period equals the same period last year. For fragrance with strong annual cycles, this is a surprisingly strong baseline. Often beats complex models with limited data.

Moving Average

F(t+1) = mean(A(t-n+1)...A(t))

Smooths noise but lags trend. Window length n is a tuning parameter. Simple to explain to non-technical stakeholders.
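
The three baselines fit in a few lines each. In this sketch, `history` is a list of actuals ordered oldest to newest and each function returns the one-step-ahead forecast; the seasonal naïve version returns the actual from the same period one season earlier. The function names and the sample data are illustrative.

```python
def naive_forecast(history):
    """Next period equals the latest actual."""
    return history[-1]


def seasonal_naive_forecast(history, m):
    """Next period equals the actual from the same period one season (m periods) earlier."""
    return history[-m]


def moving_average_forecast(history, n):
    """Next period equals the mean of the last n actuals."""
    window = history[-n:]
    return sum(window) / len(window)


monthly = list(range(1, 25))                    # 24 months of made-up demand
print(naive_forecast(monthly))                  # 24
print(seasonal_naive_forecast(monthly, m=12))   # 13
print(moving_average_forecast(monthly, n=3))    # 23.0
```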

Exponential Smoothing — The Workhorse

Exponential smoothing models are Vandeput's recommended starting point for most supply chain applications. They are computationally lightweight, interpretable, and adapt to changing demand levels over time. The alpha (α) parameter controls how quickly the model responds to new information.

Simple Exponential Smoothing (SES)
S(t) = α · A(t) + (1 - α) · S(t-1)
S(t) — Smoothed level at time t (also the one-step-ahead forecast)
α — Smoothing parameter ∈ (0,1). High α = reacts quickly to new data. Low α = stable, slow to respond.
A(t) — Actual demand at time t
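
The SES recursion is a single loop. This sketch initializes the level with the first actual, which is one common convention and an assumption here; other initializations (for example, the mean of the first few periods) are equally valid.

```python
def ses(actuals, alpha, initial_level=None):
    """Return the smoothed level S(t) after each actual A(t)."""
    level = actuals[0] if initial_level is None else initial_level
    levels = []
    for a in actuals:
        level = alpha * a + (1 - alpha) * level   # S(t) = alpha*A(t) + (1-alpha)*S(t-1)
        levels.append(level)
    return levels


levels = ses([100, 120, 80], alpha=0.5)
print(levels)        # [100.0, 110.0, 95.0]
print(levels[-1])    # one-step-ahead forecast: 95.0
```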
Holt-Winters (Triple Exponential Smoothing), multiplicative form
F(t+h) = (L(t) + h·T(t)) · S(t+h-m), for h ≤ m
The additive variant adds S(t+h-m) instead of multiplying by it.
L(t) — Level component (updated with α)
T(t) — Trend component (updated with β)
S(t) — Seasonal component (updated with γ)
m — Seasonal period (12 for monthly data with annual seasonality)
h — Forecast horizon
Grow Recommendation

Holt-Winters with additive or multiplicative seasonality is the appropriate starting model for Grow's hero SKUs (anchor fragrances with established history). The seasonal component will partially replicate what the rules-based engine does implicitly — but with statistically fitted parameters rather than manual adjustments. This becomes the shadow mode comparator in Phase 1.

Holt-Winters vs. Naïve vs. Actual — Illustrative Fragrance SKU

ARIMA / SARIMA

ARIMA (AutoRegressive Integrated Moving Average) models demand as a function of its own past values and past forecast errors. SARIMA extends this with seasonal autoregressive, differencing, and moving-average terms. These are appropriate when the demand series shows autocorrelation, i.e., knowing yesterday's demand genuinely helps predict today's.

Vandeput is pragmatic about ARIMA: it is more complex to tune (p, d, q parameters plus seasonal equivalents), requires stationarity testing, and rarely outperforms Holt-Winters on typical supply chain data. However, for Grow's DTC channel where day-of-week effects, promotional echoes, and holiday patterns create structured autocorrelation, SARIMA can add value — particularly as data accumulates through 2025 and 2026.

SARIMA(p,d,q)(P,D,Q,m)
ARIMA Order: (p,d,q) × Seasonal Order: (P,D,Q,m)
p / P — AR order: how many past values to include (non-seasonal / seasonal)
d / D — Differencing order: how many times to difference for stationarity
q / Q — MA order: how many past errors to include
m — Seasonal period (12 for monthly data)

ML-Based Forecasting

Machine learning models (gradient boosting, random forests, neural networks) can incorporate external features — promotions, social media signals, day-of-week, holidays, new product indicators — that classical time series models cannot. Vandeput discusses these in the context of feature engineering for supply chain, noting that their value is primarily in capturing causal demand drivers rather than in time series pattern recognition per se.

LightGBM / XGBoost

Gradient boosted trees. Best overall performer on tabular demand data when sufficient feature engineering is applied. Handles non-linear interactions (promo × season × launch age) naturally.

Feature Engineering

The model is only as good as its features. Essential features: lag variables (demand at t-1, t-7, t-28), rolling statistics, promotion flags, SKU age, launch cohort, Bass curve position.

Phase 3 Candidate

ML models require more data than Grow currently has. With 12–18 months of clean actuals, begin exploratory ML work. Deploy only when MASE meaningfully beats Holt-Winters on a holdout set.

Model Selection Framework

The question is never "which model is best in theory?" It is always "which model produces the lowest expected inventory cost on this SKU given this much data?"

Model Selection Decision Tree
New SKU?
  Yes → Bayesian Pooling from similar cohort
  No → Intermittent Demand?
    Yes → Croston / ADIDA
    No → Seasonal Pattern?
      Yes → Holt-Winters
      No → SES / Holt's
      ↓ (with 18m+ data)
      SARIMA / LightGBM
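
The tree can be encoded as a small routing function so that every SKU gets a model family before any fitting starts. This is a sketch: the flag names, the string labels, and the reading that SARIMA / LightGBM become challengers to either smoothing baseline once 18 months of history exist are assumptions.

```python
def select_model(is_new_sku, is_intermittent, is_seasonal, months_of_history):
    """Return (baseline model family, challenger models worth testing)."""
    if is_new_sku:
        return "bayesian_pooling", []
    if is_intermittent:
        return "croston_or_adida", []
    baseline = "holt_winters" if is_seasonal else "ses_or_holt"
    # Heavier models are only candidates once enough history exists
    challengers = ["sarima", "lightgbm"] if months_of_history >= 18 else []
    return baseline, challengers


print(select_model(False, False, True, months_of_history=12))
# ('holt_winters', [])
print(select_model(False, False, True, months_of_history=24))
# ('holt_winters', ['sarima', 'lightgbm'])
```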

05 Anchors & Faders — Product Lifecycle Modeling

Every SKU in a fragrance portfolio follows a lifecycle. Modeling the shape of that lifecycle is more powerful than fitting a time series in isolation.

The existing rules-based engine implicitly encodes a lifecycle assumption: new fragrances peak and decay, while a handful of "anchor" products maintain or grow. This intuition is statistically grounded. It maps directly to two well-studied models: the Bass Diffusion Model (for faders) and S-curve / logistic growth (for anchors). Making these curves explicit and fitting them to historical data turns founder intuition into testable, refinable parameters.

Bass Diffusion Model — For New Launch Faders

The Bass model (Frank Bass, 1969) describes the adoption of a new product as the interplay between two populations: innovators who adopt independently of others, and imitators who adopt because of social influence from existing adopters.

Bass Diffusion Model
f(t) = [p + q·F(t)] · [1 - F(t)]
f(t) — Fraction of market adopting at time t (instantaneous adoption rate)
F(t) — Cumulative fraction of market that has adopted by time t
p — Coefficient of innovation (external influence: ads, brand awareness). Typical range: 0.01–0.03
q — Coefficient of imitation (internal influence: word of mouth, social proof). Typical range: 0.3–0.5
m — Total potential market size (must be estimated or set via growth input). It scales the curve: demand at time t is m · f(t)

For Grow, the critical parameter is m, because this is precisely where the founder's growth input enters the model. The Bass curve shape (governed by p and q) can be fitted from historical launch data. Once fitted, subsequent launches are assumed to share the same curve shape; only m varies. This is the statistical validation layer for the growth assumption: "given p and q fitted from your previous launches, what value of m is implied by our historical peak demands, and how does that compare to the growth target?"
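
A discrete-time version of the Bass curve shows how p, q, and m shape a launch. The sketch below steps the adoption equation one period at a time and scales by m; the parameter values are illustrative picks from the typical ranges above, not fitted Grow values.

```python
def bass_demand(p, q, m, periods):
    """Per-period demand m * f(t) from a discrete-time Bass recursion."""
    cumulative = 0.0                                        # F(t): share adopted so far
    demand = []
    for _ in range(periods):
        adoption = (p + q * cumulative) * (1.0 - cumulative)   # f(t)
        demand.append(m * adoption)
        cumulative += adoption
    return demand


curve = bass_demand(p=0.02, q=0.4, m=10_000, periods=30)
peak_period = curve.index(max(curve))
print(round(curve[0]))       # first-period demand is m * p = 200
print(peak_period)           # index of the period where demand peaks
```

Holding p and q fixed and varying only m rescales the whole curve without moving the peak, which is why m is the natural entry point for the growth input.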

Interactive: Bass Diffusion Curve

Adjust parameters to see how different fragrance launch profiles emerge. The peak timing and height are direct functions of p, q, and m.

S-Curve / Logistic Growth — For Anchor SKUs

Anchor SKUs — the 1–2 hero fragrances that grow in the early years before plateauing — follow a logistic growth curve. Unlike Bass, the logistic model has no decay phase: demand rises, inflects, and saturates at a ceiling.

Logistic Growth (S-Curve)
D(t) = K / (1 + e^(-r(t - t₀)))
K — Carrying capacity / demand ceiling (maximum sustainable demand)
r — Growth rate (steepness of the S-curve)
t₀ — Inflection point (time of fastest growth)
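
The logistic curve is a one-line function. The values below (ceiling of 1,000 units per month, growth rate 0.3, inflection at month 12) are placeholders for illustration only.

```python
from math import exp


def logistic_demand(t, K, r, t0):
    """D(t) = K / (1 + e^(-r (t - t0)))."""
    return K / (1.0 + exp(-r * (t - t0)))


print(logistic_demand(12, K=1000, r=0.3, t0=12))          # 500.0, half the ceiling at t0
print(round(logistic_demand(36, K=1000, r=0.3, t0=12)))   # close to the ceiling K
```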
Anchor vs. Fader Demand Profiles — Illustrative Comparison

Bayesian Pooling — Borrowing Strength Across SKUs

When a new SKU launches, it has zero history. The naïve approach is to wait for data to accumulate before forecasting. The Bayesian approach is to use related SKUs as informative priors — shrinking the new SKU's forecast toward the cohort mean, and then updating as actuals arrive.

The Pooling Logic

A new fragrance in the Fader category is not launched into a vacuum. Grow has previously launched fragrances. Those launches inform a prior distribution over (p, q, m). The new launch starts there and updates toward its own data as weeks accumulate. After 4–6 weeks of sell-through, the posterior is meaningfully updated. After a full season, the SKU stands on its own data.

The mathematical machinery is hierarchical Bayesian modeling — at its simplest, a partial pooling model where individual SKU parameters are drawn from a common hyperprior. In practice, a reasonable approximation for Phase 2 is:

Approximate Bayesian Pooling
F_new = w · F_pooled + (1-w) · F_individual
w — Pooling weight, decreasing as n (individual observations) increases. w = k/(k+n) where k is a tunable shrinkage constant.
F_pooled — Forecast from cohort-level parameters (all Faders, all Anchors)
F_individual — Forecast from this SKU's own fitted parameters

This approach is directly applicable to Grow's new format problem: a new size of an existing fragrance can pool from both the cohort-level prior and the parent fragrance's demand history, with higher initial weight on the parent.
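
The approximate pooling rule is a weighted average whose weight decays with the number of observations. In this sketch the shrinkage constant k = 6 is an arbitrary illustration; in practice it would be tuned on past launches.

```python
def pooling_weight(n, k):
    """w = k / (k + n): full weight on the cohort at n = 0, decaying as data arrives."""
    return k / (k + n)


def pooled_forecast(f_pooled, f_individual, n, k=6.0):
    """F_new = w * F_pooled + (1 - w) * F_individual."""
    w = pooling_weight(n, k)
    return w * f_pooled + (1 - w) * f_individual


print(pooled_forecast(400, 250, n=0))    # 400.0, no own data so the cohort forecast is used
print(pooled_forecast(400, 250, n=6))    # 325.0, equal weight once n == k
```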

Part III
Inventory Optimization

06 Inventory Optimization

Forecasting tells you what demand might be. Inventory optimization tells you what to do about it.

Safety Stock

Safety stock is buffer inventory held to protect against demand variability and supply uncertainty during the replenishment lead time. Vandeput's key point: safety stock is not waste — it is the cost of uncertainty. Reducing safety stock without reducing uncertainty just increases the probability of stockout.

Safety Stock — Standard Formula
SS = z · σLT
z — Service level factor (z-score corresponding to target service level). z=1.28 for 90%, z=1.65 for 95%, z=2.05 for 98%
σLT — Standard deviation of demand during lead time
When both demand and lead time vary:
σLT = √(L̄ · σ²d + d̄² · σ²L)
L̄, σL — Mean and std dev of lead time
d̄, σd — Mean and std dev of demand per period
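
Both formulas translate directly into code. The sketch below assumes normally distributed lead-time demand and uses made-up inputs (lead time of 4 ± 1 weeks, weekly demand of 100 ± 30 units).

```python
from math import sqrt
from statistics import NormalDist


def sigma_lead_time(mean_lt, std_lt, mean_d, std_d):
    """Std dev of demand over the lead time when both demand and lead time vary."""
    return sqrt(mean_lt * std_d ** 2 + mean_d ** 2 * std_lt ** 2)


def safety_stock(service_level, sigma_lt):
    """SS = z * sigma_LT, with z the normal quantile of the target service level."""
    z = NormalDist().inv_cdf(service_level)
    return z * sigma_lt


s_lt = sigma_lead_time(mean_lt=4, std_lt=1, mean_d=100, std_d=30)
print(round(s_lt, 1))                        # 116.6
print(round(safety_stock(0.95, s_lt)))       # 192
```

In this example the lead-time variability term (d̄² · σ²L = 10,000) outweighs the demand variability term (L̄ · σ²d = 3,600), so tightening supplier lead times would reduce safety stock more than a better forecast would.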
The z-score Connection to Newsvendor

The z-score used in safety stock calculation is the Newsvendor critical ratio, expressed as a normal quantile. When CR = 0.80, z = 0.84. This means that, under normally distributed demand, the two frameworks (Newsvendor and safety stock) are the same model expressed differently. The Newsvendor gives the intuition; safety stock gives the operational implementation.

Reorder Point & Economic Order Quantity

Reorder Point (ROP)

ROP = d̄ · L̄ + SS
d̄ · L̄ — Expected demand during lead time
SS — Safety stock buffer

When inventory position falls to or below ROP, trigger a replenishment order. This is the operational trigger for day-to-day ops staff — the "reorder X now" signal.

Economic Order Quantity (EOQ)

EOQ = √(2 · D · K / h)
D — Annual demand
K — Fixed ordering cost per order
h — Holding cost per unit per year

EOQ minimizes total ordering + holding costs. For Grow, where raw material fungibility lowers h, EOQ will tend toward larger, less frequent orders — but must be balanced against cash flow constraints.
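
A minimal sketch of both calculations, with illustrative inputs (weekly demand of 100 units, a 4-week lead time, 192 units of safety stock, 5,200 units of annual demand, a $50 fixed cost per order, and $2 holding cost per unit per year):

```python
from math import sqrt


def reorder_point(mean_demand, mean_lead_time, safety_stock):
    """ROP = expected demand during lead time + safety stock."""
    return mean_demand * mean_lead_time + safety_stock


def eoq(annual_demand, order_cost, holding_cost):
    """EOQ = sqrt(2 * D * K / h)."""
    return sqrt(2 * annual_demand * order_cost / holding_cost)


print(reorder_point(100, 4, 192))                  # 592
print(round(eoq(5200, 50, 2)))                     # 510
print(round(eoq(5200, 50, 1)))                     # 721, lower h means larger orders
```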

Newsvendor — Full Parameterization for Grow

Returning to the Newsvendor with full parameterization relevant to Grow's cost structure:

| Cost Component | Category | Estimable From | Notes for Grow |
|---|---|---|---|
| Lost sale margin | Cu | P&L, price × margin % | Direct, measurable |
| Lost repeat customer LTV | Cu | Cohort analysis, repurchase rate | High for DTC: a stockout breaks the repurchase habit |
| Lost brand/social momentum | Cu | Attribution modeling | Hard to quantify; use conservative estimate + sensitivity analysis |
| Raw material holding cost | Co | WACC × inventory value | Low for stable fragrance materials |
| Finished goods holding cost | Co | Warehouse + WACC | Modest; limited space is the binding constraint |
| Markdown / clearance cost | Co | Historical promo discounts | Partially offset by fungibility |

Service Levels — Cycle vs. Fill Rate

Vandeput distinguishes two service level definitions that are frequently confused:

Cycle Service Level (CSL)

Probability that no stockout occurs during a replenishment cycle. CSL = 95% means 95% of order cycles have zero stockouts. This is what the z-score in the safety stock formula directly controls.

Fill Rate (FR)

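Fraction of demand met from stock on hand. FR = 95% means 95% of units demanded are shipped without delay. Fill rate is usually higher than CSL for the same policy, because a cycle that ends in a stockout still ships most of its demand, but this is not guaranteed. For Grow's DTC context, fill rate is the more meaningful customer-facing metric.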

Do Not Confuse These

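A 95% CSL does not mean 95% of orders are fulfilled. It means 95% of cycles are stockout-free. The resulting fill rate depends on how large each replenishment order is relative to lead-time demand variability. If order quantities are large (long cycles), the expected shortage in a cycle is a small fraction of the units sold in that cycle, so a 95% CSL can correspond to a 99%+ fill rate. If orders are small and frequent, the same CSL can yield a noticeably lower fill rate. Always report both for executive conversations.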
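
The relationship between the two service levels can be checked numerically. A standard approximation for a reorder-point policy with normal lead-time demand is fill rate ≈ 1 - σLT · L(z) / Q, where L(z) is the standard normal loss function and Q is the order quantity. The sketch below applies it with illustrative numbers; the function name and inputs are assumptions.

```python
from statistics import NormalDist


def fill_rate(csl, sigma_lt, order_qty):
    """Approximate fill rate implied by a cycle service level (normal demand assumed)."""
    nd = NormalDist()
    z = nd.inv_cdf(csl)
    loss = nd.pdf(z) - z * (1 - nd.cdf(z))      # standard normal loss function L(z)
    expected_short_per_cycle = sigma_lt * loss
    return 1 - expected_short_per_cycle / order_qty


# Same 95% CSL and demand variability, different order sizes
print(round(fill_rate(0.95, sigma_lt=100, order_qty=1000), 3))   # 0.998
print(round(fill_rate(0.95, sigma_lt=100, order_qty=20), 3))     # 0.896
```

With the cycle service level held at 95%, the large-order policy fills almost all demand while the small-order policy fills under 90%, which is why both numbers belong in the report.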

Part IV
Implementation

07 MRP Output → Layered Forecasting System

MRP is a constraint solver, not a demand forecaster. The two must be architecturally separated and connected deliberately.

Material Requirements Planning (MRP) is a production scheduling and materials planning tool. It takes a demand plan as input and calculates what to produce, buy, and when. The demand plan itself is not generated by MRP — it is fed into it. The current state at Grow is that the rules-based Demand Engine produces the demand plan that feeds MRP. The statistical forecasting layer will eventually sit between raw historical data and MRP, either replacing or augmenting the Demand Engine's output.

MRP Data Extraction & What's Usable

| MRP Output Field | Usability | Notes |
|---|---|---|
| Historical planned orders | Low | Reflects the Demand Engine's intent, not actual demand. Do not use as demand history. |
| Historical actual receipts | Medium | Useful for lead time distribution fitting. Actual vs. planned receipt dates = lead time variability. |
| On-hand inventory snapshots | High | Essential for calculating implied demand from inventory movements (beginning + receipts - ending = sales). |
| Stockout / back-order flags | High | Censored demand identification. Any period with a stockout has understated demand. |
| Bill of Materials (BOM) | High | Links finished goods to raw materials. Critical for raw material demand planning and fungibility mapping. |
Critical Data Distinction

Use sell-through (actual sales to end customers) as your demand signal — not production orders, not receipts, not planned demand. If you only have inventory movement data, back-calculate demand as: Demand(t) = Inventory(t-1) + Receipts(t) − Inventory(t), adjusting for any stockout periods.
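
The back-calculation can be sketched as follows. The input layout is an assumption: `inventory` holds the opening balance followed by one end-of-period snapshot per period, while `receipts` and `stockout_flags` hold one value per period. Flagged periods are marked as censored instead of being silently treated as true demand.

```python
def implied_demand(inventory, receipts, stockout_flags):
    """Demand(t) = Inventory(t-1) + Receipts(t) - Inventory(t), with censoring flags."""
    rows = []
    for t in range(1, len(inventory)):
        demand = inventory[t - 1] + receipts[t - 1] - inventory[t]
        rows.append({
            "period": t,
            "demand": demand,
            "censored": stockout_flags[t - 1],   # True: observed sales understate demand
        })
    return rows


rows = implied_demand(
    inventory=[100, 80, 120, 0],       # opening balance, then three period-end snapshots
    receipts=[0, 100, 0],
    stockout_flags=[False, False, True],
)
print([r["demand"] for r in rows])     # [20, 60, 120]
print(rows[-1]["censored"])            # True
```

How to adjust the censored periods (for example, by imputing from the forecast or from comparable weeks) is a separate modeling choice that this sketch leaves open.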

System Architecture — Demand Signal Layer

Layer 1 — Raw Data Sources
Shopify / DTC Sales
Wholesale Orders
Inventory Snapshots
Promotions Log
↓ ETL / DataHub Pipeline ↓
Layer 2 — Clean Demand Signal
Adjusted Sell-Through by SKU/Week
↓ Parallel Paths ↓
Rules-Based Engine (existing)
Statistical Forecast Engine (new)
↓ Reconciliation Layer ↓
Layer 3 — Consensus Demand Plan
Validated Demand Plan + Uncertainty Intervals
MRP Input
Inventory Policy Engine
Reorder Triggers

The Feedback Loop

A forecasting system without a feedback loop is not a system — it is a one-time calculation. The operational value comes from continuous updating: actuals flow back in, forecast errors are measured, model parameters are updated, and inventory policies are recalibrated. At Grow's current scale, this loop can run weekly.

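# Conceptual weekly update loop. Every helper called here (load_actuals, load_model,
# load_last_forecast, safety_stock, cr_to_z, calc_mase, calc_bias, log_metrics) is a
# placeholder for project code, not a library API.
def weekly_forecast_update(sku_id, new_actuals, cu, co):
    # 1. Load the history as it stood before this week's actuals arrived
    history = load_actuals(sku_id)

    # 2. Score the forecast issued last cycle against the actuals it targeted,
    #    before the model sees them, and log the metrics for governance
    prior_forecast = load_last_forecast(sku_id, periods=len(new_actuals))
    log_metrics(sku_id,
                mase=calc_mase(new_actuals, prior_forecast, history),
                bias=calc_bias(new_actuals, prior_forecast))

    # 3. Update model state (SES/HW levels update recursively; smoothing
    #    parameters and Bass curves need a periodic refit)
    model = load_model(sku_id)
    model.update(new_actuals)

    # 4. Generate new forecast with uncertainty intervals
    forecast, lower, upper = model.predict(horizon=12, confidence=0.80)

    # 5. Recalculate safety stock with updated σ and the Newsvendor z-score
    ss = safety_stock(sigma_lt=model.sigma_lt, z=cr_to_z(cu, co))

    # 6. Update reorder point
    rop = model.mean_lt_demand + ss

    return {'forecast': forecast, 'lower': lower, 'upper': upper, 'rop': rop, 'ss': ss}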

08 Growth Input Validation

The founder's growth input is a prior, not a forecast. The statistical system's job is to interrogate that prior with evidence.

The current Demand Engine uses a growth target as a primary driver — accounting for an estimated 60–70% of output. This is not inherently wrong: growth targets are legitimate inputs to planning. The problem is the absence of a validation mechanism. When the growth input is high, the entire production plan is large. If the growth doesn't materialize, that's working capital tied up in inventory. If it does materialize but wasn't planned for, that's stockouts.

The statistical framework provides three things the engine currently lacks:

📊

Evidence-Based Growth Estimate

Fit trend models to historical sell-through by SKU. Calculate the statistically supported growth rate with confidence intervals. This is the "what does the data say the company can grow at" number.

Gap Analysis

Compare the evidence-based growth estimate to the growth input. A large gap (target: 40%, data-supported: 22%) requires explanation — not suppression. What specific marketing actions, new products, or channel expansions justify the gap?

🎯

Conditional Production Plan

Present a distribution of outcomes, not a single number. "If we hit 40% growth: produce X. If we hit 22%: produce Y. The cost of Y being wrong in each direction is Z." This is the Monte Carlo vision, accelerated.

Growth Input Validation Framework
Gap = Growth_target − Growth_evidence
Growth_evidence — MoM or YoY trend from statistical model fitted to sell-through, with 80% CI
Growth_target — Founder's input to Demand Engine
Decision rules:
Gap < 10pp — Reasonable; no intervention needed. Proceed with blended estimate.
Gap 10–25pp — Elevated; require documented action plan (marketing spend, new product timeline, channel expansion).
Gap > 25pp — High risk; temper growth input to evidence + 15pp, or require board-level discussion of required investment to bridge gap.
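
The decision rules map onto a small function. In this sketch growth rates are expressed in percent, and the boundary handling (a gap of exactly 10pp or 25pp falls in the middle band) is an assumption, since the rules above do not specify it.

```python
def growth_gap_decision(target_pct, evidence_pct):
    """Apply the gap rules to a growth target and an evidence-based estimate (in %)."""
    gap = target_pct - evidence_pct
    if gap < 10:
        return {"gap": gap, "status": "reasonable", "planning_growth": None}
    if gap <= 25:
        return {"gap": gap, "status": "elevated: action plan required",
                "planning_growth": None}
    # High risk: temper the input to evidence + 15pp unless the gap is funded
    return {"gap": gap, "status": "high risk", "planning_growth": evidence_pct + 15}


print(growth_gap_decision(40, 22)["status"])            # elevated: action plan required
print(growth_gap_decision(60, 22)["planning_growth"])   # 37
```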

Shadow Mode Protocol

Shadow mode is the politically and operationally correct way to introduce statistical forecasting alongside a founder's existing system. The statistical model runs in parallel, produces outputs, and accumulates a performance track record — but does not yet influence decisions. This serves several purposes:

# Shadow mode logging schema (right-hand names are placeholders for pipeline values)
shadow_log = {
    'week': week_id,
    'sku': sku_id,
    'engine_forecast': engine_output,           # Rules-based engine
    'stat_forecast': stat_model_output,         # Statistical model
    'stat_lower_80': lower_bound,
    'stat_upper_80': upper_bound,
    'actual_demand': None,                      # Filled in retrospectively
    'engine_error': None,                       # Filled in retrospectively
    'stat_error': None,                         # Filled in retrospectively
    'growth_input': engine_growth_assumption,   # Log this explicitly
    'stat_growth_estimate': stat_growth_ci      # Compare to it
}

09 MVP & Iterative Implementation

The company needs something in production now. Don't let the statistically correct be the enemy of the operationally useful.

The path from "no statistical models" to "full closed-loop forecasting system" is not a single jump. It is a sequence of phases, each of which delivers standalone business value and de-risks the next phase. The North Star is profitability — which means the first win needs to be visible within 30–60 days, not 12 months.

P1

Phase 1 — Foundation & Visibility

Days 1–30 · "Know where you stand"
  • Clean and catalog historical sell-through by SKU from 2024 data. Flag censored demand (stockout periods).
  • Build the Promotions & Events Log retrospectively — even an imperfect log is better than none.
  • Classify SKUs: Anchor vs. Fader vs. New Format. Approximately 3 buckets, each getting a different model family.
  • Fit Seasonal Naïve and Holt-Winters to Anchor SKUs. Compute MASE vs. Naïve baseline. Establish the performance floor.
  • Parameterize Cu and Co from P&L and LTV data. Calculate Critical Ratio. This is a one-page deliverable for the executive — the most valuable document of Phase 1.
  • Begin Shadow Mode logging. Engine output vs. stat model output logged weekly, no decisions changed.
  • Go/No-Go Criterion: MASE ≤ 1.0 on at least 50% of Anchor SKUs.
P2

Phase 2 — Growth Input Validation & Safety Stock

Days 31–60 · "Quantify the cost of being wrong"
  • Fit Bass Diffusion to all historical Fader SKU launches. Extract p and q estimates. Document confidence intervals on m for the current season's new launches.
  • Build the Growth Input Validation dashboard: evidence-based growth estimate vs. founder's target, with Gap Analysis output.
  • Calculate statistically grounded Safety Stock for each SKU using fitted σ from Phase 1 models + Critical Ratio from Cu/Co analysis.
  • Implement Reorder Point triggers for ops staff — this is the "reorder X now" deliverable. Simple threshold, automatically updated weekly.
  • Present Shadow Mode log first results to data team. Identify SKUs where stat model is outperforming the engine — begin internal advocacy.
  • Prototype Bayesian Pooling for 1–2 new format SKUs as a proof of concept.
  • Go/No-Go Criterion: Growth Input Validation report reviewed by data team. ROP triggers deployed and tested for at least 5 SKUs.
P3

Phase 3 — Consensus Plan & Monte Carlo

Days 61–120+ · "Distribution of possible futures"
  • After one full season of Shadow Mode, present the performance comparison formally. Let the data make the case for statistical influence on the demand plan.
  • Build the Monte Carlo simulation layer: for each SKU, simulate 10,000 demand scenarios by sampling from the fitted forecast distribution. Aggregate to total production plan scenarios.
  • Present the founder with a "production scenario distribution" — not "order X units" but "here is what our inventory position looks like across scenarios, and here is the cost of each tail outcome."
  • If ML data is available (18m+ clean history), begin exploratory LightGBM work on the top 5 SKUs by revenue. Only deploy if MASE improvement is statistically significant.
  • Introduce Bayesian Pooling formally for all new launches — the prior is now fitted from a full season of launches.
  • Go/No-Go Criterion: Statistical system has demonstrably lower total forecast cost (Cu·stockouts + Co·overstock) than engine on the season holdout period.
The Political Strategy

The goal is never to replace the founder's engine — it is to make the founder choose to replace it, because the data is undeniable. Shadow mode builds the evidence. Growth Input Validation makes the risk visible in dollar terms. Monte Carlo gives the founder a tool that enhances their intuition rather than overriding it. By Phase 3, the statistical system should feel like a superpower, not a threat.

10 Governance & Monitoring

A forecasting system without governance degrades. Metrics drift, models go stale, and nobody notices until it's expensive.

KPIs That Map to Dollars

| KPI | Audience | Target | Action Threshold |
|---|---|---|---|
| Inventory Cost of Forecast Error: Cu·stockouts + Co·overstock ($) | Executive | Decreasing QoQ | Increase >15% vs prior period |
| Fill Rate by SKU Tier | Executive / Ops | ≥95% (Anchors), ≥90% (Faders) | Below threshold for 2 consecutive weeks |
| MASE by SKU | Data Team | <1.0 (beat naïve) | MASE >1.2 triggers model review |
| Forecast Bias by SKU | Data Team | Absolute bias < 5% | Systematic bias (same sign 4+ weeks) triggers refit |
| Growth Gap: Target vs. Evidence-Based | Executive / Data | <10 percentage points | Gap of 10–25pp requires a documented action plan; gap >25pp is escalated as high risk |
| Shadow Mode Accuracy Delta | Data Team | Stat model MASE < Engine MASE | Positive delta for 4+ consecutive weeks = escalate for influence |

Model Retraining Cadence

Weekly (Automated)

SES/Holt-Winters parameter updates via online learning. Reorder Point recalculation. Shadow Mode log entry. Bias check — flag SKUs with 3+ consecutive same-sign errors.

Monthly (Triggered)

Full model refit if MASE exceeds threshold. Bayesian Pooling weight recalibration as new SKU data accumulates. Cost parameter review (Cu/Co updates from P&L).

Seasonal (Pre-Planning)

Bass curve refit with new launch actuals. Growth Input Validation report generation. Full holdout evaluation of all models vs. engine. Monte Carlo scenario generation for upcoming season.
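
The weekly bias check reduces to counting the current run of same-sign forecast errors. A minimal sketch, where errors are F - A and the 3-week threshold mirrors the weekly cadence above (the function name is an assumption):

```python
def same_sign_run(errors):
    """Length of the most recent run of consecutive errors with the same sign."""
    run, last_sign = 0, 0
    for e in errors:
        sign = (e > 0) - (e < 0)
        if sign != 0 and sign == last_sign:
            run += 1
        else:
            run = 1 if sign != 0 else 0    # a zero error resets the run
        last_sign = sign
    return run


weekly_errors = [5, -2, 3, 4, 1]           # illustrative F - A values
print(same_sign_run(weekly_errors))        # 3
print(same_sign_run(weekly_errors) >= 3)   # True, so flag the SKU for review
```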

The Long-Term Vision — Monte Carlo Production Planning

The endgame described by the founder — a Monte Carlo engine with inputs from historical and statistical results, creating a "distribution of possible production scenarios" — is achievable within 12–18 months of this implementation. The architecture is:

Long-Term Monte Carlo Architecture
Fitted Bass / Logistic Parameters per SKU
Cu / Co by SKU Tier
Lead Time Distribution
↓ Monte Carlo Simulation (N=10,000) ↓
Demand Scenario 1
Demand Scenario 2
... Scenario N
↓ Aggregate → Production Plan Distribution ↓
P10 Plan (Conservative)
P50 Plan (Base Case)
P85 Plan (Optimistic)
↓ Newsvendor Optimization → Recommended Plan ↓
Optimal Production Plan + Cost of Tail Risks

This architecture transforms the planning conversation from "how many units should we make?" to "here is the distribution of demand outcomes, here is the cost of each tail scenario, and here is the mathematically optimal production plan given your cost structure." The founder's growth input becomes one of several tunable parameters in the simulation — not a monolithic driver, but a lever the team can test and stress-examine.
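
A stripped-down version of the simulation layer is sketched below. It draws each SKU's demand independently from a normal distribution truncated at zero and reports percentiles of the total. Both the independence and the normality are simplifying assumptions (real SKUs share promotions and seasonality), and the SKU parameters are made up.

```python
import random


def simulate_production_plan(skus, n_scenarios=10_000, seed=0):
    """Return P10 / P50 / P85 of total demand across simulated scenarios."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_scenarios):
        total = 0.0
        for sku in skus:
            total += max(0.0, rng.gauss(sku["mean"], sku["std"]))   # no negative demand
        totals.append(total)
    totals.sort()

    def percentile(p):
        return totals[min(len(totals) - 1, int(p * len(totals)))]

    return {"P10": percentile(0.10), "P50": percentile(0.50), "P85": percentile(0.85)}


plan = simulate_production_plan([
    {"sku": "anchor-candle", "mean": 500, "std": 80},    # hypothetical SKUs
    {"sku": "fader-spray", "mean": 300, "std": 60},
])
print({k: round(v) for k, v in plan.items()})
```

Feeding the simulated totals into the expected-cost calculation, rather than reading off a fixed percentile, is what turns this distribution into a recommended plan.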

Final Note — Profitability as North Star

Every model, every metric, every governance process in this document exists to serve one goal: helping Grow Fragrance become profitable. Better forecasting reduces the cost of uncertainty. Reduced uncertainty frees working capital. Freed working capital funds growth without external financing. The path from "vibes-driven planning" to "data-disciplined profitability" is the 18-month arc this document describes — and every phase of it delivers standalone business value on the way.

Grow Fragrance · Product Forecasting Implementation Guide v1.0
Primary References: Vandeput, N. (2021). Data Science for Supply Chain Forecasting. De Gruyter. · Vandeput, N. (2023). Inventory Optimization: Models and Simulations. De Gruyter.
Supporting Literature: Bass, F.M. (1969). A New Product Growth Model. · Fader & Hardie (2005). CLV Modeling. · Silver, Pyke & Thomas (2017). Inventory and Production Management.