Product Forecasting & Inventory Optimization
A comprehensive implementation guide grounded in Nicolas Vandeput's Data Science for Supply Chain Forecasting and Inventory Optimization: Models and Simulations, adapted for a founder-led, DTC-first fragrance business moving toward profitability.
01 The Right Problem Statement
The goal is not to minimize forecast error. The goal is to minimize the cost of being wrong.
This distinction, which Vandeput makes forcefully in the opening chapters of both books, is the intellectual foundation of everything that follows. Statistical accuracy — a lower RMSE, a tighter confidence interval — is instrumentally valuable only insofar as it reduces the cost of incorrect inventory decisions. An excellent forecast that informs a bad stocking policy is worse than useless. A mediocre forecast that is correctly translated into inventory decisions can be highly profitable.
For Grow Fragrance, this reframing is not merely academic. When the planning process runs on a founder's growth intuition baked into a rules-based engine, the question to ask is not "how accurate is the engine?" but "what is the cost distribution of its errors, and in which direction does it tend to fail?"
The Newsvendor Problem — The Executive Lens
The Newsvendor model is the single most important concept in this entire document. It is named for a newspaper seller who must decide each morning how many papers to order before knowing how many will sell. Order too few, and you lose potential profit. Order too many, and you're stuck with unsellable inventory.
The model is deceptively simple and extraordinarily powerful. It translates every forecasting and inventory decision into a single, intuitive question: given the cost of ordering too much vs. too little, how many units should I stock?
The critical ratio, CR = Cu / (Cu + Co), where Cu is the per-unit cost of understocking and Co the per-unit cost of overstocking, tells you directly what percentile of the demand distribution to stock to. If CR = 0.80, stock the 80th percentile of expected demand. If demand is normally distributed with mean 500 and standard deviation 80, that is approximately 567 units (500 + 0.84 × 80).
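A minimal sketch of that calculation, assuming normally distributed demand. The function names and the $20 / $5 unit costs are illustrative placeholders, not Grow actuals.

```python
from statistics import NormalDist

def critical_ratio(cu: float, co: float) -> float:
    """Newsvendor critical ratio: the demand percentile to stock to."""
    return cu / (cu + co)

def optimal_stock(mean: float, std: float, cu: float, co: float) -> float:
    """Optimal stocking quantity if demand is assumed Normal(mean, std)."""
    z = NormalDist().inv_cdf(critical_ratio(cu, co))
    return mean + z * std

# Hypothetical costs: a lost sale costs $20, a leftover unit costs $5.
cr = critical_ratio(cu=20.0, co=5.0)
q = optimal_stock(mean=500, std=80, cu=20.0, co=5.0)
print(f"CR = {cr:.2f}, stock about {q:.0f} units")
```

With these inputs the ratio is 0.80, which reproduces the 567-unit figure in the worked example.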
Cost Asymmetry at Grow
Grow has confirmed that stockouts are materially more painful than overstock. The cost structure for a DTC fragrance brand is characteristically asymmetric:
Cost of Understocking (Cu)
Lost revenue on the sale. Lost LTV of the customer (repeat buyers who hit a stockout may not return). Brand reputation damage. Lost social momentum — influencer spikes can't be recaptured. Lost wholesale placement if a retail order can't be fulfilled.
Cost of Overstocking (Co)
Holding/carrying cost (fragrance materials are relatively stable; not high spoilage). Working capital tied up in inventory. Potential markdown to clear. Fungibility of raw materials partially offsets this — unused fragrance components can often be redirected.
The raw material fungibility at Grow structurally lowers Co. If a fragrance batch doesn't sell as expected, the components aren't lost — they can be reformulated or redirected. This asymmetry pushes the Critical Ratio well above 0.50, potentially toward 0.75–0.85 for hero SKUs. The exact number requires properly parameterizing Cu and Co from actuals — a Phase 1 deliverable.
Interactive: Newsvendor Calculator
Adjust cost parameters to see how the optimal stocking quantity shifts. This is the executive conversation — not "our MASE improved."
Using a 50th percentile forecast (the mean) as the stocking quantity implicitly assumes Cu = Co. For Grow, where stockouts are far more costly, this systematically under-stocks every SKU. The Newsvendor framework makes this error visible and correctable.
02 Understanding Your Demand Signal
Before choosing a model, understand what your data is actually telling you.
Vandeput dedicates significant attention to demand characterization in Data Science for Supply Chain Forecasting. The choice of forecasting model is downstream of the shape of demand, not upstream. A SARIMA model applied to intermittent, lumpy demand will often perform worse than a naïve method.
Demand Decomposition
Any demand time series can be decomposed into four components:
Trend
Long-run direction of demand. For Grow's hero SKUs, this should be positive and measurable with 12+ months of data. Bass diffusion applies here for new launches.
Seasonality
Recurring patterns within a period (weekly, monthly, annual). Fragrance sales have strong annual seasonality — the existing rules-based engine captures some of this implicitly.
Cyclicality
Longer-run patterns not tied to calendar (economic cycles, category waves). Less relevant for Grow at current scale, but relevant for wholesale channel planning.
Noise / Residual
The irreducible random component. Influencer spikes, viral moments, and one-time promotions land here. Crucially — you cannot and should not try to forecast noise.
Intermittent & Lumpy Demand
Vandeput distinguishes demand types using two axes: demand frequency (how often does a non-zero demand period occur?) and demand variability (when demand does occur, how variable is the quantity?).
| Demand Type | Frequency | Variability | Relevant for Grow? | Recommended Approach |
|---|---|---|---|---|
| Smooth | High | Low | Hero SKUs | SES, Holt-Winters, ARIMA |
| Erratic | High | High | Promo SKUs | Holt-Winters + event features |
| Intermittent | Low | Low | Slow Movers | Croston's method |
| Lumpy | Low | High | New Formats | Bayesian Pooling, ADIDA |
With 3–5 fragrances × 4–5 SKUs per fragrance, you likely have a mix of smooth (hero anchors), erratic (launch SKUs with promotion), and potentially lumpy (new formats). Classifying your SKU catalog into these buckets is a Day 1 task — it determines which model family applies to each SKU before any data fitting begins.
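One way to make the Day 1 classification mechanical is sketched below. It measures frequency with the average demand interval (ADI) and variability with the squared coefficient of variation of non-zero demand (CV²). The cut-offs of 1.32 and 0.49 are the commonly cited Syntetos-Boylan values, used here as an assumption rather than as thresholds taken from this guide.

```python
from statistics import mean, pstdev

def classify_demand(series, adi_cut=1.32, cv2_cut=0.49):
    """Classify a per-period demand history into one of four demand types."""
    nonzero = [x for x in series if x > 0]
    if not nonzero:
        return "no demand"
    adi = len(series) / len(nonzero)               # periods per demand occurrence
    cv2 = (pstdev(nonzero) / mean(nonzero)) ** 2   # variability of non-zero sizes
    frequent = adi < adi_cut
    stable = cv2 < cv2_cut
    if frequent and stable:
        return "smooth"
    if frequent:
        return "erratic"
    if stable:
        return "intermittent"
    return "lumpy"

# Made-up weekly histories for illustration only.
print(classify_demand([10, 12, 11, 9, 10, 11]))
print(classify_demand([0, 0, 1, 0, 0, 40, 0, 0, 3, 0]))
```

The thresholds are parameters so they can be tuned once the real catalog has been inspected.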
Data Quality Requirements
Vandeput is explicit: no model can overcome bad input data. For Grow, the minimum viable data requirements are:
- Sales history: Daily or weekly sell-through by SKU (not orders — actual sales). Starting from 2024 means roughly 1–2 years of data. Sufficient for seasonal models, limiting for multi-year trend estimation.
- Promotions & events log: Date, SKU affected, type (influencer, paid ad, email blast, sale). Without this, promo demand looks like noise and inflates forecast error.
- Stockout log: When was a SKU out of stock? Censored demand (periods where you would have sold more but couldn't) is one of the most common forecast biases.
- Lead times: Supplier lead time by raw material. This feeds directly into safety stock calculation.
03 Forecast Error Metrics
Metrics are not goals — they are instruments. Know what each measures and what it cannot measure.
The Metric Catalog
| Metric | Formula | Strengths | Weaknesses |
|---|---|---|---|
| MAE (Mean Absolute Error) | mean(\|A − F\|) | Interpretable in units. Easy to explain. | Scale-dependent. Can't compare across SKUs with different volumes. |
| RMSE (Root Mean Squared Error) | √mean((A − F)²) | Penalizes large errors more heavily. Useful when outlier errors are costly. | Less interpretable. Sensitive to a small number of large errors. |
| MASE (Mean Absolute Scaled Error) | MAE / MAE_naïve | Scale-free. Comparable across SKUs and time periods. Vandeput's recommended default. | Requires a naïve benchmark to exist. Can be unintuitive. |
| Bias (Mean Error) | mean(F − A) | Reveals systematic over/under-forecasting. A biased model with low RMSE is dangerous. | Errors cancel out: can be zero even with highly variable forecasts. |
| MAPE (Mean Absolute % Error) | mean(\|A − F\| / A) | Intuitive percentage. Widely used in business. | Undefined when A = 0. Biased toward under-forecasting. Vandeput recommends avoiding it. |
MAPE is the most commonly reported forecast metric in business and one of the least useful. It is undefined for zero-demand periods (common in fragrance SKUs), asymmetrically penalizes overforecasting, and systematically incentivizes under-forecasting. Replace it with MASE as your primary accuracy metric and track Bias separately.
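The recommended metrics can be sketched in a few lines of plain Python. One assumption to flag: MASE is scaled here by the one-step naïve error over the SKU's history, which is one common convention. The numbers are made up.

```python
from math import sqrt

def mae(actual, forecast):
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def bias(actual, forecast):
    # Positive means over-forecasting, negative means under-forecasting.
    return sum(f - a for a, f in zip(actual, forecast)) / len(actual)

def mase(actual, forecast, history):
    """MAE scaled by the MAE of a one-step naive forecast over `history`."""
    steps = [abs(history[i] - history[i - 1]) for i in range(1, len(history))]
    naive_mae = sum(steps) / len(steps)
    return mae(actual, forecast) / naive_mae

history = [100, 110, 90, 120, 100]   # illustrative past demand
actual = [110, 95]                   # what actually sold
forecast = [100, 105]                # what the model predicted

print(mae(actual, forecast), rmse(actual, forecast))
print(bias(actual, forecast), mase(actual, forecast, history))
```

In this toy case bias is exactly zero even though every forecast is off by 10 units, which is why bias and MASE are tracked side by side.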
Translating Forecast Error into Inventory Cost
This is the bridge between statistical modeling and business value — and it is the bridge most implementations never build. Vandeput argues that forecast error is only meaningful when expressed in dollars, not abstract statistical units.
The chart above shows the critical insight: the optimal stocking quantity is not at the mean of the forecast distribution. It shifts right (toward higher stocking) as Cu increases relative to Co. For Grow, given the confirmed asymmetry, the optimal quantity will consistently sit above the forecast mean — meaning a 50th-percentile point forecast systematically under-stocks.
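To put a dollar figure on a stocking decision, one option is the expected mismatch cost under a normal demand assumption, sketched below. The function name and the cost inputs are hypothetical. The standard normal loss function gives the expected shortfall per cycle.

```python
from statistics import NormalDist

def expected_cost(q, mean, std, cu, co):
    """Expected per-cycle cost of stocking q when demand ~ Normal(mean, std)."""
    nd = NormalDist()
    z = (q - mean) / std
    loss = nd.pdf(z) - z * (1 - nd.cdf(z))       # standard normal loss function
    expected_short = std * loss                   # E[max(D - q, 0)]
    expected_over = (q - mean) + expected_short   # E[max(q - D, 0)]
    return cu * expected_short + co * expected_over

# Illustrative inputs: Cu = $20, Co = $5, demand ~ Normal(500, 80).
cu, co, mu, sigma = 20.0, 5.0, 500.0, 80.0
q_opt = mu + NormalDist().inv_cdf(cu / (cu + co)) * sigma

cost_at_mean = expected_cost(mu, mu, sigma, cu, co)
cost_at_opt = expected_cost(q_opt, mu, sigma, cu, co)
print(f"stock the mean: ${cost_at_mean:.0f}, stock the CR quantile: ${cost_at_opt:.0f}")
```

The gap between the two numbers is the expected cost of treating the point forecast as the stocking quantity.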
04 Forecasting Models
Start simple. Add complexity only when simplicity provably fails.
Baseline Models — The Floor
Vandeput's most important methodological point on model selection: every model must beat a naïve benchmark to justify its complexity. The naïve forecast is simply: "next period's demand equals this period's demand." It requires no estimation, no parameters, and no data beyond the last observation.
Naïve / Random Walk
F(t+1) = A(t)
The MASE denominator. If any model you build has MASE > 1.0, it is less accurate than this one-line forecast. This is your absolute floor.
Seasonal Naïve
F(t+1) = A(t + 1 − s), where s is the season length in periods
Next period equals the same period last year. For fragrance with strong annual cycles, this is a surprisingly strong baseline. Often beats complex models with limited data.
Moving Average
F(t+1) = mean(A(t-n+1)...A(t))
Smooths noise but lags trend. Window length n is a tuning parameter. Simple to explain to non-technical stakeholders.
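The three baselines fit in a few lines. This is a sketch with hypothetical function names, where `history` is a list of past demand with the most recent period last.

```python
def naive(history):
    """Next period equals the last observed period."""
    return history[-1]

def seasonal_naive(history, season_length):
    """Next period equals the same period one season ago."""
    return history[-season_length]

def moving_average(history, n):
    """Next period equals the mean of the last n observations."""
    window = history[-n:]
    return sum(window) / len(window)

# Two made-up "years" of quarterly demand.
history = [10, 20, 30, 40, 12, 22, 32, 42]
print(naive(history), seasonal_naive(history, 4), moving_average(history, 4))
```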
Exponential Smoothing — The Workhorse
Exponential smoothing models are Vandeput's recommended starting point for most supply chain applications. They are computationally lightweight, interpretable, and adapt to changing demand levels over time. The alpha (α) parameter controls how quickly the model responds to new information.
Holt-Winters with additive or multiplicative seasonality is the appropriate starting model for Grow's hero SKUs (anchor fragrances with established history). The seasonal component will partially replicate what the rules-based engine does implicitly — but with statistically fitted parameters rather than manual adjustments. This becomes the shadow mode comparator in Phase 1.
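For intuition on α, here is simple exponential smoothing written out by hand. It is a sketch only: the level is initialised at the first observation, which is one of several common choices, and a production Holt-Winters fit would normally come from a library rather than this loop.

```python
def ses_forecast(history, alpha):
    """One-step-ahead forecast from simple exponential smoothing."""
    level = history[0]
    for actual in history[1:]:
        # New level = alpha * latest actual + (1 - alpha) * previous level
        level = alpha * actual + (1 - alpha) * level
    return level

demand = [100, 120, 80]  # illustrative
print(ses_forecast(demand, alpha=0.5))
print(ses_forecast(demand, alpha=1.0))  # alpha = 1 collapses to the naive forecast
```

A low α smooths heavily and reacts slowly. An α near 1 chases the latest observation.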
ARIMA / SARIMA
ARIMA (AutoRegressive Integrated Moving Average) models demand as a function of its own past values and past forecast errors. SARIMA extends this with seasonal differencing. These are appropriate when the demand series shows autocorrelation — i.e., knowing yesterday's demand genuinely helps predict today's.
Vandeput is pragmatic about ARIMA: it is more complex to tune (p, d, q parameters plus seasonal equivalents), requires stationarity testing, and rarely outperforms Holt-Winters on typical supply chain data. However, for Grow's DTC channel where day-of-week effects, promotional echoes, and holiday patterns create structured autocorrelation, SARIMA can add value — particularly as data accumulates through 2025 and 2026.
ML-Based Forecasting
Machine learning models (gradient boosting, random forests, neural networks) can incorporate external features — promotions, social media signals, day-of-week, holidays, new product indicators — that classical time series models cannot. Vandeput discusses these in the context of feature engineering for supply chain, noting that their value is primarily in capturing causal demand drivers rather than in time series pattern recognition per se.
LightGBM / XGBoost
Gradient boosted trees. Best overall performer on tabular demand data when sufficient feature engineering is applied. Handles non-linear interactions (promo × season × launch age) naturally.
Feature Engineering
The model is only as good as its features. Essential features: lag variables (demand at t-1, t-7, t-28), rolling statistics, promotion flags, SKU age, launch cohort, Bass curve position.
Phase 3 Candidate
ML models require more data than Grow currently has. With 12–18 months of clean actuals, begin exploratory ML work. Deploy only when MASE meaningfully beats Holt-Winters on a holdout set.
Model Selection Framework
The question is never "which model is best in theory?" It is always "which model produces the lowest expected inventory cost on this SKU given this much data?"
05 Anchors & Faders — Product Lifecycle Modeling
Every SKU in a fragrance portfolio follows a lifecycle. Modeling the shape of that lifecycle is more powerful than fitting a time series in isolation.
The existing rules-based engine implicitly encodes a lifecycle assumption: new fragrances peak and decay, while a handful of "anchor" products maintain or grow. This intuition is statistically grounded: it maps directly to two well-studied models, the Bass Diffusion Model (for faders) and S-curve / logistic growth (for anchors). Making these curves explicit and fitting them to historical data turns founder intuition into testable, refinable parameters.
Bass Diffusion Model — For New Launch Faders
The Bass model (Frank Bass, 1969) describes the adoption of a new product as the interplay between two populations: innovators who adopt independently of others, and imitators who adopt because of social influence from existing adopters.
The model has three parameters: p (the coefficient of innovation), q (the coefficient of imitation), and m (the total market potential, i.e. the lifetime number of adopters or units for the launch). For Grow, the critical parameter is m: this is precisely where the founder's growth input enters the model. The Bass curve shape (governed by p and q) can be fitted from historical launch data. Once fitted, all subsequent launches are assumed to share the same curve shape; only m varies. This is the statistical validation layer for the growth assumption: "given p and q fitted from your previous launches, what value of m is implied by our historical peak demands, and how does that compare to the growth target?"
Interactive: Bass Diffusion Curve
Adjust parameters to see how different fragrance launch profiles emerge. The peak timing and height are direct functions of p, q, and m.
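A discrete-time version of the Bass curve can be generated with a short loop. This is a sketch: the parameter values below are arbitrary illustrations, not fitted Grow estimates.

```python
def bass_sales(p, q, m, periods):
    """Per-period adoptions from the discrete-time Bass diffusion model."""
    cumulative = 0.0
    sales = []
    for _ in range(periods):
        # Innovators adopt at rate p; imitators at a rate proportional to
        # the share of the market that has already adopted.
        new = (p + q * cumulative / m) * (m - cumulative)
        sales.append(new)
        cumulative += new
    return sales

curve = bass_sales(p=0.03, q=0.4, m=10_000, periods=52)
peak_week = curve.index(max(curve)) + 1
print(f"first week: {curve[0]:.0f} units, peak in week {peak_week}")
```

Scaling m up or down rescales the whole curve without moving the peak, which is why m is the natural place for the growth input.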
S-Curve / Logistic Growth — For Anchor SKUs
Anchor SKUs — the 1–2 hero fragrances that grow in the early years before plateauing — follow a logistic growth curve. Unlike Bass, the logistic model has no decay phase: demand rises, inflects, and saturates at a ceiling.
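The logistic curve itself is a one-liner. The parameter names here (`ceiling`, `growth_rate`, `midpoint`) are illustrative labels for the saturation level, the steepness, and the inflection time.

```python
from math import exp

def logistic_demand(t, ceiling, growth_rate, midpoint):
    """Demand at time t on a logistic (S-curve) growth path."""
    return ceiling / (1 + exp(-growth_rate * (t - midpoint)))

# Illustrative anchor SKU: saturates at 2,000 units/month, inflects at month 18.
path = [logistic_demand(t, ceiling=2000, growth_rate=0.25, midpoint=18) for t in range(48)]
print(round(path[0]), round(path[18]), round(path[47]))
```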
Bayesian Pooling — Borrowing Strength Across SKUs
When a new SKU launches, it has zero history. The naïve approach is to wait for data to accumulate before forecasting. The Bayesian approach is to use related SKUs as informative priors — shrinking the new SKU's forecast toward the cohort mean, and then updating as actuals arrive.
A new fragrance in the Fader category is not launched into a vacuum. Grow has previously launched fragrances. Those launches inform a prior distribution over (p, q, m). The new launch starts there and updates toward its own data as weeks accumulate. After 4–6 weeks of sell-through, the posterior is meaningfully updated. After a full season, the SKU stands on its own data.
The mathematical machinery is hierarchical Bayesian modeling — at its simplest, a partial pooling model where individual SKU parameters are drawn from a common hyperprior. In practice, a reasonable approximation for Phase 2 is:
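The original approximation is not reproduced here. One plausible form, offered as an assumption, is a shrinkage-weighted blend of the SKU's own average and the cohort average. The `prior_strength` parameter is hypothetical: it is the number of weeks of data at which the SKU's own history and the cohort prior get equal weight.

```python
def pooled_estimate(sku_mean, n_weeks, cohort_mean, prior_strength=6.0):
    """Blend a SKU's own average demand with its cohort average."""
    weight = n_weeks / (n_weeks + prior_strength)  # weight on the SKU's own data
    return weight * sku_mean + (1 - weight) * cohort_mean

# Illustrative: the cohort averages 200 units/week, the new SKU is running at 320.
for weeks in (0, 6, 26):
    print(weeks, round(pooled_estimate(320, weeks, 200)))
```

With no history the estimate equals the cohort mean, and it converges on the SKU's own average as weeks accumulate.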
This approach is directly applicable to Grow's new format problem: a new size of an existing fragrance can pool from both the cohort-level prior and the parent fragrance's demand history, with higher initial weight on the parent.
06 Inventory Optimization
Forecasting tells you what demand might be. Inventory optimization tells you what to do about it.
Safety Stock
Safety stock is buffer inventory held to protect against demand variability and supply uncertainty during the replenishment lead time. Vandeput's key point: safety stock is not waste — it is the cost of uncertainty. Reducing safety stock without reducing uncertainty just increases the probability of stockout.
The z-score used in safety stock calculation is the Newsvendor critical ratio, expressed as a normal quantile. When CR = 0.80, z = 0.84. This means the two frameworks — Newsvendor and safety stock — are the same model expressed differently. The Newsvendor gives the intuition; safety stock gives the operational implementation.
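A sketch of the link between the two frameworks, under simplifying assumptions that are mine rather than the guide's: normally distributed demand, independent periods, and a fixed lead time, so that lead-time standard deviation is per-period standard deviation times √L.

```python
from math import sqrt
from statistics import NormalDist

def safety_stock(sigma_per_period, lead_time_periods, cu, co):
    """Safety stock = z * sigma over the lead time, with z taken from the CR."""
    z = NormalDist().inv_cdf(cu / (cu + co))
    sigma_lt = sigma_per_period * sqrt(lead_time_periods)
    return z * sigma_lt

# Illustrative: weekly sigma of 40 units, 4-week lead time, Cu = $20, Co = $5.
ss = safety_stock(sigma_per_period=40, lead_time_periods=4, cu=20.0, co=5.0)
print(f"safety stock: {ss:.0f} units")
```

If supplier lead times vary materially, the lead-time sigma needs an extra term for that variability, which this sketch omits.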
Reorder Point & Economic Order Quantity
Reorder Point (ROP)
When inventory position falls to or below ROP, trigger a replenishment order. This is the operational trigger for day-to-day ops staff — the "reorder X now" signal.
Economic Order Quantity (EOQ)
EOQ minimizes total ordering + holding costs. For Grow, where raw material fungibility lowers the per-unit holding cost h, EOQ will tend toward larger, less frequent orders, but must be balanced against cash flow constraints.
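Both quantities follow the textbook formulas: ROP is expected lead-time demand plus safety stock, and EOQ is √(2DS / h), where D is annual demand, S the fixed cost per order, and h the per-unit annual holding cost. The sketch below uses invented inputs.

```python
from math import sqrt

def reorder_point(mean_demand_per_period, lead_time_periods, safety_stock):
    """Inventory position at which a replenishment order should be placed."""
    return mean_demand_per_period * lead_time_periods + safety_stock

def eoq(annual_demand, order_cost, holding_cost_per_unit_year):
    """Classic economic order quantity."""
    return sqrt(2 * annual_demand * order_cost / holding_cost_per_unit_year)

rop = reorder_point(mean_demand_per_period=125, lead_time_periods=4, safety_stock=67)
print(rop)
print(eoq(10_000, 50, 1.00), eoq(10_000, 50, 0.25))
```

Cutting h to a quarter doubles the order quantity, which is the "larger, less frequent orders" effect described above.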
Newsvendor — Full Parameterization for Grow
Returning to the Newsvendor with full parameterization relevant to Grow's cost structure:
| Cost Component | Category | Estimable From | Notes for Grow |
|---|---|---|---|
| Lost sale margin | Cu | P&L, price × margin % | Direct, measurable |
| Lost repeat customer LTV | Cu | Cohort analysis, repurchase rate | High for DTC — a stockout breaks the repurchase habit |
| Lost brand/social momentum | Cu | Attribution modeling | Hard to quantify; use conservative estimate + sensitivity analysis |
| Raw material holding cost | Co | WACC × inventory value | Low for stable fragrance materials |
| Finished goods holding cost | Co | Warehouse + WACC | Modest; warehouse space is the limiting constraint |
| Markdown / clearance cost | Co | Historical promo discounts | Partially offset by fungibility |
Service Levels — Cycle vs. Fill Rate
Vandeput distinguishes two service level definitions that are frequently confused:
Cycle Service Level (CSL)
Probability that no stockout occurs during a replenishment cycle. CSL = 95% means 95% of order cycles have zero stockouts. This is what the z-score in the safety stock formula directly controls.
Fill Rate (FR)
Fraction of demand met from stock on hand. FR = 95% means 95% of units demanded are shipped without delay. Fill rate is usually, though not always, ≥ CSL for the same safety stock. For Grow's DTC context, fill rate is the more meaningful customer-facing metric.
A 95% CSL does not mean 95% of orders are fulfilled. It means 95% of cycles are stockout-free. How that translates into fill rate depends on how large each replenishment order is relative to demand variability over the lead time: when orders are large relative to that variability, a 95% CSL can correspond to a 99%+ fill rate; when orders are small and frequent, the same CSL yields a lower fill rate. Always report both for executive conversations.
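The relationship can be checked with the standard approximation fill rate ≈ 1 − σ_LT · L(z) / Q, where L(z) is the standard normal loss function and Q the order quantity. This sketch assumes normally distributed lead-time demand, and the inputs are illustrative.

```python
from statistics import NormalDist

def fill_rate(csl, sigma_lt, order_qty):
    """Approximate fill rate implied by a cycle service level."""
    nd = NormalDist()
    z = nd.inv_cdf(csl)
    loss = nd.pdf(z) - z * (1 - nd.cdf(z))   # expected shortage in sigma units
    return 1 - sigma_lt * loss / order_qty

large_orders = fill_rate(csl=0.95, sigma_lt=80, order_qty=400)
small_orders = fill_rate(csl=0.95, sigma_lt=80, order_qty=20)
print(f"{large_orders:.3f} vs {small_orders:.3f}")
```

The same 95% CSL gives a fill rate above 99% with 400-unit orders and below 95% with 20-unit orders.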
07 MRP Output → Layered Forecasting System
MRP is a constraint solver, not a demand forecaster. The two must be architecturally separated and connected deliberately.
Material Requirements Planning (MRP) is a production scheduling and materials planning tool. It takes a demand plan as input and calculates what to produce, buy, and when. The demand plan itself is not generated by MRP — it is fed into it. The current state at Grow is that the rules-based Demand Engine produces the demand plan that feeds MRP. The statistical forecasting layer will eventually sit between raw historical data and MRP, either replacing or augmenting the Demand Engine's output.
MRP Data Extraction & What's Usable
| MRP Output Field | Usability | Notes |
|---|---|---|
| Historical planned orders | Low | Reflects the Demand Engine's intent, not actual demand. Do not use as demand history. |
| Historical actual receipts | Medium | Useful for lead time distribution fitting. Actual vs. planned receipt dates = lead time variability. |
| On-hand inventory snapshots | High | Essential for calculating implied demand from inventory movements (beginning + receipts - ending = sales). |
| Stockout / back-order flags | High | Censored demand identification. Any period with a stockout has understated demand. |
| Bill of Materials (BOM) | High | Links finished goods to raw materials. Critical for raw material demand planning and fungibility mapping. |
Use sell-through (actual sales to end customers) as your demand signal — not production orders, not receipts, not planned demand. If you only have inventory movement data, back-calculate demand as: Demand(t) = Inventory(t-1) + Receipts(t) − Inventory(t), adjusting for any stockout periods.
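The back-calculation might look like the sketch below. The input layout is an assumption: `inventory` holds the opening balance followed by one closing balance per period, while `receipts` and `stockout_flags` have one entry per period.

```python
def implied_demand(inventory, receipts, stockout_flags):
    """Back out per-period demand from inventory movements.

    Periods flagged as stockouts are marked censored: the computed figure is
    a lower bound on true demand, not an estimate of it.
    """
    rows = []
    for t in range(1, len(inventory)):
        demand = inventory[t - 1] + receipts[t - 1] - inventory[t]
        rows.append({"period": t, "demand": demand, "censored": stockout_flags[t - 1]})
    return rows

rows = implied_demand(
    inventory=[100, 60, 0, 80],        # opening balance, then three closing balances
    receipts=[0, 0, 120],
    stockout_flags=[False, True, False],
)
for row in rows:
    print(row)
```

Censored periods should be excluded from model fitting or replaced with an uncensored estimate rather than taken at face value.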
System Architecture — Demand Signal Layer
The Feedback Loop
A forecasting system without a feedback loop is not a system — it is a one-time calculation. The operational value comes from continuous updating: actuals flow back in, forecast errors are measured, model parameters are updated, and inventory policies are recalibrated. At Grow's current scale, this loop can run weekly.
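# Conceptual weekly update loop. load_actuals, load_last_forecast, load_model,
# safety_stock, calc_mase, calc_bias and log_metrics are placeholder helpers.
from statistics import NormalDist

def weekly_forecast_update(sku_id, new_actuals, cu, co):
    # 1. Load history and the forecast issued last week for this week
    history = load_actuals(sku_id)
    prior_forecast = load_last_forecast(sku_id)
    # 2. Score that forecast against the actuals that just arrived (governance log)
    log_metrics(sku_id,
                mase=calc_mase(new_actuals, prior_forecast, history),
                bias=calc_bias(new_actuals, prior_forecast))
    # 3. Update the model (SES/HW state updates incrementally; Bass needs periodic refit)
    model = load_model(sku_id)
    model.update(new_actuals)
    # 4. Generate new forecast with uncertainty intervals
    forecast, lower, upper = model.predict(horizon=12, confidence=0.80)
    # 5. Recalculate safety stock with updated σ; z is the normal quantile of the CR
    z = NormalDist().inv_cdf(cu / (cu + co))
    ss = safety_stock(sigma_lt=model.sigma_lt, z=z)
    # 6. Update reorder point
    rop = model.mean_lt_demand + ss
    return {'forecast': forecast, 'lower': lower, 'upper': upper, 'rop': rop, 'ss': ss}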
08 Growth Input Validation
The founder's growth input is a prior, not a forecast. The statistical system's job is to interrogate that prior with evidence.
The current Demand Engine uses a growth target as a primary driver — accounting for an estimated 60–70% of output. This is not inherently wrong: growth targets are legitimate inputs to planning. The problem is the absence of a validation mechanism. When the growth input is high, the entire production plan is large. If the growth doesn't materialize, that's working capital tied up in inventory. If it does materialize but wasn't planned for, that's stockouts.
The statistical framework provides three things the engine currently lacks:
Evidence-Based Growth Estimate
Fit trend models to historical sell-through by SKU. Calculate the statistically supported growth rate with confidence intervals. This is the "what does the data say the company can grow at" number.
Gap Analysis
Compare the evidence-based growth estimate to the growth input. A large gap (target: 40%, data-supported: 22%) requires explanation — not suppression. What specific marketing actions, new products, or channel expansions justify the gap?
Conditional Production Plan
Present a distribution of outcomes, not a single number. "If we hit 40% growth: produce X. If we hit 22%: produce Y. The cost of Y being wrong in each direction is Z." This is the Monte Carlo vision, accelerated.
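A rough sketch of the evidence-based estimate and gap calculation follows. It fits a log-linear trend to monthly sell-through by ordinary least squares and annualises the slope. Several things here are assumptions: the interval uses a normal approximation (±1.96 standard errors), seasonality is ignored, and the sales figures and 40% target are invented.

```python
from math import exp, log, sqrt

def annual_growth_estimate(monthly_sales):
    """Return (growth, low, high) as annualised rates from a log-linear trend."""
    n = len(monthly_sales)
    xs = list(range(n))
    ys = [log(s) for s in monthly_sales]          # requires strictly positive sales
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    se = sqrt(sum(r * r for r in residuals) / (n - 2) / sxx)

    def annualise(monthly_log_growth):
        return exp(12 * monthly_log_growth) - 1

    return annualise(slope), annualise(slope - 1.96 * se), annualise(slope + 1.96 * se)

sales = [100, 104, 101, 108, 112, 109, 118, 121, 119, 127, 131, 135]  # made-up data
growth, low, high = annual_growth_estimate(sales)
target = 0.40                                      # hypothetical founder growth input
gap_pp = (target - growth) * 100
print(f"data-supported growth: {growth:.0%} ({low:.0%} to {high:.0%}), gap: {gap_pp:.0f}pp")
```

A target that falls outside the interval is the case that calls for a documented justification.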
Shadow Mode Protocol
Shadow mode is the politically and operationally correct way to introduce statistical forecasting alongside a founder's existing system. The statistical model runs in parallel, produces outputs, and accumulates a performance track record — but does not yet influence decisions. This serves several purposes:
- Trust building: The model's track record speaks for itself. When it outperforms the engine, the evidence is in the log — not an argument.
- Gap documentation: Every week, the log shows: Engine said X, statistical model said Y, actuals were Z. Over a season, patterns emerge.
- No political risk: The engine's authority is not challenged during shadow mode. The founder sees this as "we're also running this other thing" — not "we're replacing your system."
- Parameter refinement: The model is being fitted and tuned in shadow mode. By the time it's ready for influence, it's already battle-hardened on real data.
# Shadow mode logging schema
shadow_log = {
    'week': week_id,
    'sku': sku_id,
    'engine_forecast': engine_output,            # Rules-based engine
    'stat_forecast': stat_model_output,          # Statistical model
    'stat_lower_80': lower_bound,
    'stat_upper_80': upper_bound,
    'actual_demand': None,                       # Filled in retrospectively
    'engine_error': None,                        # Filled in retrospectively
    'stat_error': None,                          # Filled in retrospectively
    'growth_input': engine_growth_assumption,    # Log this explicitly
    'stat_growth_estimate': stat_growth_ci       # Compare to it
}
09 MVP & Iterative Implementation
The company needs something in production now. Don't let the statistically correct be the enemy of the operationally useful.
The path from "no statistical models" to "full closed-loop forecasting system" is not a single jump. It is a sequence of phases, each of which delivers standalone business value and de-risks the next phase. The North Star is profitability — which means the first win needs to be visible within 30–60 days, not 12 months.
Phase 1 — Foundation & Visibility
- Clean and catalog historical sell-through by SKU from 2024 data. Flag censored demand (stockout periods).
- Build the Promotions & Events Log retrospectively — even an imperfect log is better than none.
- Classify SKUs: Anchor vs. Fader vs. New Format. Three buckets, each getting a different model family.
- Fit Seasonal Naïve and Holt-Winters to Anchor SKUs. Compute MASE vs. Naïve baseline. Establish the performance floor.
- Parameterize Cu and Co from P&L and LTV data. Calculate Critical Ratio. This is a one-page deliverable for the executive — the most valuable document of Phase 1.
- Begin Shadow Mode logging. Engine output vs. stat model output logged weekly, no decisions changed.
- Go/No-Go Criterion: MASE ≤ 1.0 on at least 50% of Anchor SKUs.
Phase 2 — Growth Input Validation & Safety Stock
- Fit Bass Diffusion to all historical Fader SKU launches. Extract p and q estimates. Document confidence intervals on m for the current season's new launches.
- Build the Growth Input Validation dashboard: evidence-based growth estimate vs. founder's target, with Gap Analysis output.
- Calculate statistically grounded Safety Stock for each SKU using fitted σ from Phase 1 models + Critical Ratio from Cu/Co analysis.
- Implement Reorder Point triggers for ops staff — this is the "reorder X now" deliverable. Simple threshold, automatically updated weekly.
- Present Shadow Mode log first results to data team. Identify SKUs where stat model is outperforming the engine — begin internal advocacy.
- Prototype Bayesian Pooling for 1–2 new format SKUs as a proof of concept.
- Go/No-Go Criterion: Growth Input Validation report reviewed by data team. ROP triggers deployed and tested for at least 5 SKUs.
Phase 3 — Consensus Plan & Monte Carlo
- After one full season of Shadow Mode, present the performance comparison formally. Let the data make the case for statistical influence on the demand plan.
- Build the Monte Carlo simulation layer: for each SKU, simulate 10,000 demand scenarios by sampling from the fitted forecast distribution. Aggregate to total production plan scenarios.
- Present the founder with a "production scenario distribution" — not "order X units" but "here is what our inventory position looks like across scenarios, and here is the cost of each tail outcome."
- If enough data for ML is available (18+ months of clean history), begin exploratory LightGBM work on the top 5 SKUs by revenue. Only deploy if MASE improvement is statistically significant.
- Introduce Bayesian Pooling formally for all new launches — the prior is now fitted from a full season of launches.
- Go/No-Go Criterion: Statistical system has demonstrably lower total forecast cost (Cu·stockouts + Co·overstock) than engine on the season holdout period.
The goal is never to replace the founder's engine — it is to make the founder choose to replace it, because the data is undeniable. Shadow mode builds the evidence. Growth Input Validation makes the risk visible in dollar terms. Monte Carlo gives the founder a tool that enhances their intuition rather than overriding it. By Phase 3, the statistical system should feel like a superpower, not a threat.
10 Governance & Monitoring
A forecasting system without governance degrades. Metrics drift, models go stale, and nobody notices until it's expensive.
KPIs That Map to Dollars
| KPI | Audience | Target | Action Threshold |
|---|---|---|---|
| Inventory Cost of Forecast Error: Cu·stockouts + Co·overstock ($) | Executive | Decreasing QoQ | Increase >15% vs prior period |
| Fill Rate by SKU Tier | Executive / Ops | ≥95% (Anchors), ≥90% (Faders) | Below threshold for 2 consecutive weeks |
| MASE by SKU | Data Team | <1.0 (beat naïve) | MASE >1.2 triggers model review |
| Forecast Bias by SKU | Data Team | \|Bias\| < 5% | Systematic bias (same sign 4+ weeks) triggers refit |
| Growth Gap: Target vs. Evidence-Based | Executive / Data | <10 percentage points | Gap >25pp requires documented justification |
| Shadow Mode Accuracy Delta | Data Team | Stat model MASE < Engine MASE | Positive delta for 4+ consecutive weeks = escalate for influence |
Model Retraining Cadence
Weekly (Automated)
SES/Holt-Winters parameter updates via online learning. Reorder Point recalculation. Shadow Mode log entry. Bias check — flag SKUs with 3+ consecutive same-sign errors.
Monthly (Triggered)
Full model refit if MASE exceeds threshold. Bayesian Pooling weight recalibration as new SKU data accumulates. Cost parameter review (Cu/Co updates from P&L).
Seasonal (Pre-Planning)
Bass curve refit with new launch actuals. Growth Input Validation report generation. Full holdout evaluation of all models vs. engine. Monte Carlo scenario generation for upcoming season.
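The weekly bias check reduces to counting how many of the most recent forecast errors share a sign. This is a sketch: the function name is hypothetical, and the thresholds mirror the 3-week flag and 4-week refit trigger mentioned in this section.

```python
def same_sign_run(errors):
    """Length of the trailing run of forecast errors that share one sign."""
    run = 0
    last_sign = 0
    for e in reversed(errors):
        sign = (e > 0) - (e < 0)
        if sign == 0 or (last_sign and sign != last_sign):
            break
        last_sign = sign
        run += 1
    return run

weekly_errors = [5, -2, 3, 4, 1]   # forecast minus actual, most recent last (made up)
run = same_sign_run(weekly_errors)
print(run, "flag" if run >= 3 else "ok", "refit" if run >= 4 else "no refit")
```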
The Long-Term Vision — Monte Carlo Production Planning
The endgame described by the founder — a Monte Carlo engine with inputs from historical and statistical results, creating a "distribution of possible production scenarios" — is achievable within 12–18 months of this implementation. The architecture is:
This architecture transforms the planning conversation from "how many units should we make?" to "here is the distribution of demand outcomes, here is the cost of each tail scenario, and here is the mathematically optimal production plan given your cost structure." The founder's growth input becomes one of several tunable parameters in the simulation — not a monolithic driver, but a lever the team can test and stress-examine.
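A toy version of the simulation layer is sketched below. It makes strong simplifying assumptions that are mine: demand for each SKU is drawn independently from a normal distribution, each SKU has a single planned quantity, and the SKU parameters are invented.

```python
import random

def simulate_plan(skus, n_scenarios=10_000, seed=42):
    """Simulate total mismatch cost of a production plan across demand scenarios."""
    rng = random.Random(seed)
    costs = []
    for _ in range(n_scenarios):
        total = 0.0
        for sku in skus:
            demand = max(0.0, rng.gauss(sku["mean"], sku["std"]))
            short = max(demand - sku["planned"], 0.0)   # units of lost sales
            over = max(sku["planned"] - demand, 0.0)    # units left over
            total += sku["cu"] * short + sku["co"] * over
        costs.append(total)
    costs.sort()
    return {
        "mean_cost": sum(costs) / len(costs),
        "p95_cost": costs[int(0.95 * len(costs))],
    }

base = {"mean": 500, "std": 80, "cu": 20.0, "co": 5.0}
plan_at_mean = simulate_plan([{**base, "planned": 500}])
plan_at_cr = simulate_plan([{**base, "planned": 567}])
print(plan_at_mean)
print(plan_at_cr)
```

Reporting the 95th-percentile cost alongside the mean is what turns "how many units?" into a conversation about tail outcomes.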
Every model, every metric, every governance process in this document exists to serve one goal: helping Grow Fragrance become profitable. Better forecasting reduces the cost of uncertainty. Reduced uncertainty frees working capital. Freed working capital funds growth without external financing. The path from "vibes-driven planning" to "data-disciplined profitability" is the 18-month arc this document describes — and every phase of it delivers standalone business value on the way.