Newsvendor Cost Weighting
DATA & ANALYTICS REFERENCE SERIES
When Asymmetric Error Costs Outperform Traditional Accuracy Metrics
March 2026 • v1.0 • David Delisio, Data Scientist
1. The Core Idea in Plain Language
Every inventory decision — how much to order, how large to make a production batch, when to restock — carries a silent assumption: that ordering too much and ordering too little are equally bad outcomes. This assumption is rarely stated out loud, but it is baked into most standard forecasting accuracy metrics by default.
A metric like MAPE (Mean Absolute Percentage Error) gives the exact same penalty score whether the forecast was 100 units too high or 100 units too low. From a pure accuracy standpoint, that symmetry makes sense. From a business standpoint, it rarely reflects reality.
If you order too much, the extra inventory sits in the warehouse until it sells — there is a carrying cost, but the product is not lost. If you order too little and stock out, you lose the sale entirely, you may lose the customer's future business, and depending on your sales channels, you can face additional penalties like reduced search visibility. Those two outcomes are not the same.
The newsvendor model is a well-established and widely used framework that formalises this observation. When the cost of running out differs from the cost of having too much, your inventory targets should reflect that difference — and the model provides a mathematically grounded way to do exactly that. This document explains how it works and where it fits alongside the accuracy metrics already in use.
| The One-Sentence Summary: MAPE tells you how accurate your forecast was. The newsvendor model tells you how much the direction of that error actually cost, and whether your stocking decisions were leaning the right way given your specific cost structure. |
|---|
2. What Standard Accuracy Metrics Do — and Don't — Capture
2.1 How MAPE, RMSE, and MAE Work
The three most common ways to measure forecast accuracy all share the same underlying logic: they measure the distance between what you predicted and what actually happened, and they do so symmetrically — a miss of 100 units in either direction gets the same score.
| Metric | What It Measures | How It Treats Direction | Best Used For |
|---|---|---|---|
| MAPE (Mean Absolute Percentage Error) | Average % gap between forecast and actual, across all periods | Symmetric: over and under errors score equally | Percentage-based accuracy; easy to communicate to non-technical stakeholders |
| RMSE (Root Mean Squared Error) | Square root of the average squared error, in original units; squaring first means large misses are penalised more | Symmetric | Comparing models; catching forecasts that are badly wrong occasionally |
| MAE (Mean Absolute Error) | Average gap in original units (e.g. candles) | Symmetric | Simple, unit-level baseline; less sensitive to occasional big errors than RMSE |
| Bias | Average of signed errors — shows if the model tends to forecast too high or too low overall | Directional, but not cost-weighted | Detecting a systematic lean in one direction over time |
These are genuinely useful metrics. Where they leave a gap is that they do not distinguish between the business impact of the two directions of error. A forecast that is off by 100 units on the high side and one that is off by 100 units on the low side receive identical scores — even though the financial consequences can be very different.
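The symmetry is easy to confirm numerically. The sketch below is a minimal illustration using hypothetical demand figures (500 units per period) and two hypothetical forecasts that miss by 100 units in opposite directions; the function names are illustrative, not part of any existing pipeline.

```python
import math

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root mean squared error, in original units."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mae(actual, forecast):
    """Mean absolute error, in original units."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def bias(actual, forecast):
    """Mean signed error; positive means the forecast ran high."""
    return sum(f - a for a, f in zip(actual, forecast)) / len(actual)

actual = [500, 500, 500]     # hypothetical realised demand
too_high = [600, 600, 600]   # every forecast 100 units over
too_low = [400, 400, 400]    # every forecast 100 units under

for name, metric in [("MAPE", mape), ("RMSE", rmse), ("MAE", mae), ("Bias", bias)]:
    print(name, metric(actual, too_high), metric(actual, too_low))
```

MAPE, RMSE, and MAE return the same score for both forecasts. Only Bias changes sign, and even Bias says nothing about which direction is more expensive.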
2.2 The Gap That Matters
The chart below shows the point visually. On the left: how a standard accuracy metric like MAPE sees two types of error. On the right: what those same errors actually cost the business.
This gap between how we score forecasts and how errors actually affect the business is what the newsvendor model is designed to close.
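To make the gap concrete, the sketch below prices the same two 100-unit misses using the illustrative per-unit costs that appear in the worked example later in this document (Cu = $23, Co = $5). The realised demand of 500 units and the function name `mismatch_cost` are assumptions for illustration.

```python
def mismatch_cost(order_qty, demand, cu, co):
    """Cost of one period: cu per unit short, co per unit left over."""
    short = max(demand - order_qty, 0)
    excess = max(order_qty - demand, 0)
    return cu * short + co * excess

CU, CO = 23.0, 5.0   # illustrative per-unit costs from the document's example
demand = 500         # hypothetical realised demand

cost_over = mismatch_cost(600, demand, CU, CO)    # 100 units too many
cost_under = mismatch_cost(400, demand, CU, CO)   # 100 units too few
print(cost_over, cost_under)  # 500.0 2300.0
```

An accuracy metric scores these two outcomes identically, while the cost function separates them by a factor of 4.6.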
3. The Newsvendor Model
3.1 Where It Comes From
The newsvendor model has been around since the 1950s in operations research. The original scenario: a newspaper vendor has to decide how many papers to stock each morning before knowing how many customers will show up. Unsold papers are thrown away (overstock cost). Customers who cannot get a paper go elsewhere (stockout cost). What is the right number to stock?
The insight is that the answer depends on the ratio of those two costs — and that ratio should directly drive the stocking decision. Despite its simple origins, this model is used today by retailers, manufacturers, and logistics companies worldwide because the underlying logic applies to almost any inventory problem.
3.2 The Two Costs — Industry-Standard Estimation Approaches
The model uses two numbers to describe the cost structure of any inventory decision. There is no single universally correct way to estimate either of them — the right approach depends on what data is available and how comprehensively you want to capture the true cost. The table below covers the most common industry methods, from simplest to most complete.
An important point before reading the table: each row is a standalone approach, not a list of components to add together. You pick one Cu method and one Co method for a given decision, calculate a single number for each, and feed them into the critical ratio formula. Section 6.4 discusses how to test whether your choice of method materially changes the answer.
Cu — Cost of Running Short (Underage)
| Approach | How to Calculate It | When to Use It | Typical Range |
|---|---|---|---|
| Lost gross margin (baseline) | Selling price minus COGS minus variable fulfilment cost (e.g. pick/pack/ship or marketplace fees). This is the profit you give up on a sale that never happens. | The default starting point for any product. Easiest to calculate and hardest to argue with, because it is purely the margin foregone. | Varies by product. For a $40 product at 60% gross margin: Cu ≈ $24 minus ~$4 fulfilment = ~$20 |
| Lost gross margin + customer churn | Gross margin per unit plus (estimated customer LTV × estimated probability the stockout causes the customer not to return). Accounts for the long-term cost of disappointing a repeat buyer. | Most appropriate for products with strong repeat purchase behaviour, subscription customers, or a brand where loyalty is a strategic asset. | Adds 20–80% to the baseline Cu depending on LTV and churn assumptions. Commonly used in DTC brands. |
| Lost margin + marketplace penalty | Gross margin per unit plus an estimated cost of suppressed organic visibility on a marketplace channel following an OOS event. The penalty is typically modelled as a % reduction in organic orders over the recovery period. | Use when a meaningful share of sales runs through a marketplace where OOS events are known to affect algorithmic ranking or Buy Box eligibility. | Marketplace penalty is typically modelled as 1–4 weeks of reduced organic sales at the pre-OOS run rate. Hard to quantify precisely, so express it as a range. |
| Fully-loaded opportunity cost | Gross margin + customer LTV impact + marketplace penalty + any incremental marketing cost required to recover lost demand (e.g. a promotional email or paid reactivation). Captures the full downstream cost of a stockout event. | Reserved for high-margin, high-loyalty, marketplace-dependent products where stockouts have proven to have lasting effects. Use as an upper bound in sensitivity analysis. | Can be 2–3× the baseline gross margin for products with strong customer economics and marketplace exposure. |
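The first two rows of the Cu table can be sketched as small functions. The $40 price, 60% margin, and $4 fulfilment cost come from the table; the customer LTV of $120 and the 5% churn probability are hypothetical placeholders chosen only to show the mechanics.

```python
def cu_lost_margin(price, gross_margin_rate, fulfilment_cost):
    """Baseline Cu: gross margin per unit minus variable fulfilment cost."""
    return price * gross_margin_rate - fulfilment_cost

def cu_with_churn(baseline_cu, customer_ltv, p_churn_after_stockout):
    """Baseline Cu plus the expected lifetime value lost if the customer does not return."""
    return baseline_cu + customer_ltv * p_churn_after_stockout

# Figures from the table: $40 product, 60% gross margin, about $4 fulfilment.
baseline = cu_lost_margin(price=40.0, gross_margin_rate=0.60, fulfilment_cost=4.0)

# Hypothetical LTV and churn probability, for illustration only.
with_churn = cu_with_churn(baseline, customer_ltv=120.0, p_churn_after_stockout=0.05)

print(round(baseline, 2), round(with_churn, 2))
```

Under these assumed inputs the churn adjustment adds 30% to the baseline, which sits inside the 20–80% range quoted in the table. Only one of the two values would be carried into the critical ratio for a given decision.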
Co — Cost of Having Too Much (Overage)
| Approach | How to Calculate It | When to Use It | Typical Range |
|---|---|---|---|
| Physical holding cost (baseline) | Storage cost per unit per month multiplied by the number of months expected to hold the excess. Includes warehouse space, handling, and insurance but not the cost of the tied-up cash itself. | The minimum floor for any Co estimate. Always included. Straightforward to calculate from warehouse cost data. | Consumer goods benchmarks: 1.5–3% of unit cost per month in warehouse storage costs. |
| Holding cost + capital charge | Physical holding cost plus a capital opportunity cost: the unit's value multiplied by the business's cost of capital (WACC) applied over the holding period. Reflects that cash tied up in inventory cannot be used elsewhere. | Standard practice in most inventory management frameworks. Particularly relevant for capital-constrained businesses where cash has clear alternative uses. | WACC typically 10–20% annually for growth-stage consumer brands. Adds 0.8–1.7% of unit value per month to the baseline holding cost. |
| Holding cost + markdown risk | Physical holding cost plus the expected markdown required to clear unsold inventory, weighted by the probability that units are not sold at full price. For seasonal or time-limited products, this can be the dominant component. | Most important for seasonal products, limited-edition items, or any SKU where unsold stock at end-of-season will need to be discounted or written off. The seasonal fragrance context is a clear example. | Markdown depth varies widely. A 20–30% end-of-season markdown on 10–20% of units adds roughly $0.50–$2.00 per unit to Co depending on unit value. |
| Fully-loaded overage cost | Holding cost + capital charge + markdown risk + disposal or write-off cost for any units that cannot be sold at any price (e.g. fragrance products approaching shelf-life limits). The upper bound on Co. | Use as an upper bound in sensitivity analysis. Appropriate when products have hard expiry or degradation constraints. | Can be 15–25% of unit value over a 3–6 month holding period for fragrance products with shelf-life considerations. |
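A similar sketch for Co, under assumed inputs: a unit cost of $16, a $40 selling price, storage at 2% of unit cost per month, a 3-month hold, a 15% annual WACC, and a 25% markdown on 15% of units. All of these are hypothetical values picked from inside the ranges in the table, and "unit value" is taken here to mean unit cost for the capital charge.

```python
def co_physical(unit_cost, monthly_storage_rate, months_held):
    """Physical holding cost per unit over the holding period."""
    return unit_cost * monthly_storage_rate * months_held

def co_capital_charge(unit_cost, annual_wacc, months_held):
    """Opportunity cost of the cash tied up in one unit."""
    return unit_cost * (annual_wacc / 12) * months_held

def co_markdown(unit_price, markdown_depth, share_marked_down):
    """Expected markdown per unit: depth times the share of units discounted."""
    return unit_price * markdown_depth * share_marked_down

# Hypothetical inputs, chosen from within the ranges in the table above.
holding = co_physical(unit_cost=16.0, monthly_storage_rate=0.02, months_held=3)
capital = co_capital_charge(unit_cost=16.0, annual_wacc=0.15, months_held=3)
markdown = co_markdown(unit_price=40.0, markdown_depth=0.25, share_marked_down=0.15)

co_holding_plus_capital = holding + capital     # row 2 of the table
co_holding_plus_markdown = holding + markdown   # row 3 of the table
print(round(co_holding_plus_capital, 2), round(co_holding_plus_markdown, 2))
```

The two printed values correspond to two different rows of the table. As with Cu, one approach is chosen per decision rather than summing every row.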
One practical consideration worth flagging: when a business sells through more than one channel — for example, both a direct website and a marketplace — the cost of a stockout can differ meaningfully between them, making Cu channel-specific. How to handle this in practice is covered in Section 6.3.
3.3 The Critical Ratio — One Simple Formula
Once you have Cu and Co, the model gives you a target service level — the probability of having enough stock that maximises expected profit:
| Critical Ratio = Cu / (Cu + Co). This is the service level (the probability of not stocking out) that maximises expected profit. Example: Cu = $23, Co = $5. Critical Ratio = 23 / (23 + 5) = 0.82. Target the 82nd percentile of expected demand. |
|---|
It is worth pausing on why 0.82 comes out of this example. It is not an arbitrary target: it is the point where the expected cost of adding one more unit of stock exactly equals the expected cost of the stockout it prevents. Below 0.82, stocking more is still worth it, because each extra unit avoids more expected stockout cost than it creates in expected overage cost. Above 0.82, the reverse is true. The formula finds the crossover point.
Put another way: if running out costs roughly four and a half times more than overstocking — as in this example — the model says you should be willing to accept a stockout only about one time in five. The 82% target is not a policy choice or a rule of thumb. It falls directly out of the cost ratio.
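The crossover argument can be checked directly. In the sketch below, `marginal_gain` is a hypothetical helper: at a service level F, one more unit is sold with probability 1 − F (saving Cu) and is left over with probability F (costing Co). The gain is positive below the critical ratio, zero at it, and negative above it.

```python
def critical_ratio(cu, co):
    """Target service level: cu / (cu + co)."""
    if cu < 0 or co < 0 or cu + co == 0:
        raise ValueError("cu and co must be non-negative and not both zero")
    return cu / (cu + co)

def marginal_gain(service_level, cu, co):
    """Expected net benefit of stocking one more unit at a given service level."""
    p_unit_is_sold = 1 - service_level
    return cu * p_unit_is_sold - co * service_level

cr = critical_ratio(23.0, 5.0)
print(round(cr, 2))                          # 0.82
print(marginal_gain(0.70, 23.0, 5.0) > 0)    # True: still worth adding stock
print(marginal_gain(0.90, 23.0, 5.0) < 0)    # True: past the crossover
```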
3.4 Where the Demand Distribution Comes From
The diagram below shows 82% on a demand distribution — a bell curve representing the range of possible demand outcomes. But it is worth asking: where does that distribution come from?
This is where a probabilistic forecast model earns its place in the pipeline. A simple point forecast gives you one number — 'we expect to sell 500 units.' That single number is enough to calculate MAPE, but it is not enough to apply the newsvendor model. To find the 82nd percentile of demand, you need to know not just the expected level of demand but also how much it varies — the width of the distribution.
A model like Prophet or SARIMA, or a similar time-series tool, produces exactly this: a central estimate plus a range of uncertainty around it. The wider that range (meaning demand is harder to predict), the more safety stock the newsvendor model will recommend whenever the critical ratio is above 0.5, as it is in this example, because there is more territory between the central estimate and the tail scenarios that would leave you short. Investing in a better forecast model therefore has a direct and quantifiable benefit: a tighter demand distribution means a more precise stocking target, which means less wasted inventory on one side and fewer stockouts on the other.
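As a rough illustration of how the width of the distribution drives the stocking target, the sketch below assumes demand is normally distributed. That is a simplifying assumption, not something the forecasting tools guarantee, and the mean of 500 and the three standard deviations are hypothetical.

```python
from statistics import NormalDist

def order_quantity(mean_demand, std_demand, service_level):
    """Demand quantile at the target service level, assuming normal demand."""
    return NormalDist(mean_demand, std_demand).inv_cdf(service_level)

cr = 23 / (23 + 5)   # critical ratio from the example above

# Same expected demand, three hypothetical levels of forecast uncertainty.
quantities = {std: order_quantity(500, std, cr) for std in (40, 75, 120)}
for std, q in quantities.items():
    print(f"std={std}: order about {q:.0f} units, safety stock about {q - 500:.0f}")
```

With the critical ratio held fixed, the recommended safety stock scales in proportion to the standard deviation, which is why a tighter forecast translates directly into less inventory for the same service level.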
4. When the Newsvendor Approach Adds the Most Value
4.1 Three Conditions That Make It Worth Applying
The newsvendor approach produces meaningfully better inventory outcomes when three things are true at the same time:
The costs of running out and having too much are different. If they happen to be roughly equal, standard accuracy metrics already point you toward the right answer. But for most consumer goods businesses — particularly those with strong margins and customer loyalty — stocking out is substantially more expensive.
Demand is genuinely uncertain. If you could predict sales exactly, the stocking target would not matter — you would just order exactly right. The value of this approach comes from making better decisions under uncertainty.
The forecast is directly connected to an ordering or production decision. If a human reviews and overrides the number before any commitment is made, some of the mathematical precision is lost in practice — though the underlying cost logic is still useful as a gut-check.
In most consumer goods inventory environments, all three conditions apply simultaneously — particularly for branded products with meaningful margins and some degree of repeat purchasing.
4.2 Where Standard Accuracy Metrics Still Belong
This is not an argument for replacing MAPE and RMSE. They remain the right tools in several important situations:
Choosing which forecast model to use: When comparing Prophet against SARIMA against a simpler model, accuracy metrics give you a clean, cost-neutral comparison. You want to pick the most accurate model before layering cost logic on top.
Monitoring forecasts over time: A weekly or monthly scorecard using MAPE and Bias is an efficient way to catch when a model has started drifting — before it causes a business problem.
When cost parameters are uncertain: The newsvendor approach requires estimates of Cu and Co. If those numbers are not yet agreed upon or are very hard to estimate, accuracy metrics are a reasonable starting point while the cost picture is being developed.
Communicating with a broad audience: MAPE is widely understood. '15% MAPE' is immediately interpretable. Newsvendor outputs require a bit more explanation, so accuracy metrics remain valuable for general reporting.
| How They Work Together: Think of it as two stages. First, use MAPE and RMSE to pick the best forecasting model and monitor its accuracy over time. Then, use the newsvendor critical ratio to translate that model's output into the right stocking quantity. Neither replaces the other: they solve different parts of the same problem. |
|---|
5. Application — Estimating Cu and Co in Practice
5.1 What Goes Into Cu and Co
The inputs to the newsvendor model are business judgments, not purely statistical ones. Cu and Co each boil down to a single number that goes into the critical ratio formula — but there are different ways to arrive at that number depending on how comprehensively you want to model the cost structure.
The most important thing to understand before reading the tables below: you do not add all of these rows together. Each table describes a different way to define Cu or Co. You pick the approach that fits the decision you are making, use it consistently, and arrive at one Cu and one Co.
| How to Read This Section: The first table below covers the inventory-focused baseline, the simplest and most direct way to define Cu and Co for a straightforward stocking decision. The second table describes alternative framings that bring in broader business considerations. These are not additional components to layer on top. They are different lenses for defining the same two numbers, suited to different contexts. |
|---|
The Inventory Baseline — Start Here
For most inventory stocking decisions, Cu and Co can be defined directly from unit economics. This is the recommended starting point because the inputs are concrete, observable, and easy to agree on.
| Cost | What It Is | How to Calculate It |
|---|---|---|
| Cu — the cost of running out (one unit) | The gross margin you lose on a sale that cannot happen because the shelf is empty | Selling price minus COGS minus the variable fulfilment cost for that channel (e.g. Amazon FBA fees or DTC shipping) |
| Co — the cost of having too much (one unit) | The cost of holding a unit of stock that does not sell within the planning period | Monthly storage cost per unit, multiplied by the number of months, plus an allowance for markdown risk on any units that eventually need to be discounted |
These two numbers are enough to run a newsvendor analysis. For many SKUs this straightforward approach will produce a reliable and defensible critical ratio.
Alternative Framings — for Specific Contexts
In some situations, the simple inventory baseline does not fully capture the real cost of the decision. The rows below are examples of how Cu or Co can be defined more broadly. Each one is a complete, standalone alternative framing — not an additional line item to add to the calculation above.
| Framing | When to Use It | How It Redefines Cu or Co |
|---|---|---|
| Customer lifetime value framing (Cu) | When the product in question is a high-repeat or subscription item where losing the sale likely means losing the customer | Cu becomes the estimated LTV (Lifetime Value) of a customer who does not come back, multiplied by the probability that a stockout causes them to churn. This produces a higher Cu than the margin-only baseline, which will push the critical ratio upward and recommend carrying more safety stock. |
| Amazon channel penalty framing (Cu) | When the SKU in question is primarily sold on Amazon and has experienced OOS events that coincided with ranking drops | Cu is extended to include an estimated cost of the algorithmic ranking impact — the lost organic visibility and future sales that result from an OOS event. This is harder to quantify precisely and is best expressed as a range rather than a point estimate. |
| Cash opportunity cost framing (Co) | When the business is cash-constrained and tying up capital in inventory has a meaningful opportunity cost | Co is extended beyond physical storage costs to include the cost of the capital itself — using the business's effective cost of capital (WACC) applied to the unit value. This raises Co and will push the critical ratio downward slightly, recommending a leaner stocking position. |
A useful sense-check: after choosing a framing and calculating a critical ratio, test it against a plausible range of inputs rather than a single point estimate. If the recommended stocking quantity changes materially when you use a conservative vs. generous estimate of Cu or Co, that is worth flagging — it means the decision is sensitive to assumptions that have not been pinned down precisely.
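One way to run this sense-check is a small grid sweep. The sketch below uses hypothetical low, central, and high estimates for Cu and Co, and assumes normally distributed demand with a mean of 400 and a standard deviation of 75 (both illustrative).

```python
from itertools import product
from statistics import NormalDist

cu_range = (18.0, 23.0, 30.0)   # hypothetical conservative / central / generous Cu
co_range = (3.0, 5.0, 8.0)      # hypothetical low / central / high Co
demand = NormalDist(400, 75)    # assumed demand distribution

results = []
for cu, co in product(cu_range, co_range):
    cr = cu / (cu + co)
    results.append((cu, co, cr, demand.inv_cdf(cr)))

for cu, co, cr, q in results:
    print(f"Cu={cu:>4} Co={co:>3} critical ratio={cr:.2f} order about {q:.0f}")

quantities = [q for _, _, _, q in results]
spread = max(quantities) - min(quantities)
print(f"Spread in recommended quantity: about {spread:.0f} units")
```

If the spread is small relative to typical order sizes, the decision is robust to the cost assumptions. If it is large, the Cu and Co estimates deserve more attention before the output is used.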
5.2 A Worked Example — Bamboo Candle
The Bamboo Candle is currently on a waitlist — demand is known to exceed supply. This makes it a particularly clear illustration of the newsvendor approach, because the cost of stocking out is unusually high.
| Illustrative Example (Numbers Are Approximate): Assume a gross margin of $18 per unit, an estimated $5 per unit of brand and ranking impact from a stockout, a carrying cost of $2 per unit, and markdown risk of $3 per unit on any unsold stock at season end. This gives Cu = $23 and Co = $5, and therefore a Critical Ratio of 23 / (23 + 5) = 0.82. A standard accuracy-focused approach would target mean expected demand of 400 units. The newsvendor model says: target the 82nd percentile of expected demand, which works out to approximately 469 units given the observed demand distribution. The expected cost of the two approaches over many cycles is meaningfully different. |
|---|
Again — these numbers are illustrative. The real exercise is to work through this with actual margin data and agree on the Cu and Co estimates with the team before using them in production decisions.
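The example can be reproduced in a few lines. The document gives the mean (400) and the resulting quantity (about 469) but not the spread of the demand distribution. The sketch below assumes normal demand with a standard deviation of 75, a value back-solved to match the quoted quantity rather than taken from the source. The expected-cost formula is the standard closed form for normal demand.

```python
from statistics import NormalDist

CU = 18.0 + 5.0           # lost margin plus brand and ranking impact
CO = 2.0 + 3.0            # carrying cost plus markdown risk
MEAN, STD = 400.0, 75.0   # STD is an assumption, back-solved from the quoted 469 units

demand = NormalDist(MEAN, STD)
cr = CU / (CU + CO)
q_newsvendor = demand.inv_cdf(cr)

def expected_cost(q, dist, cu, co):
    """Expected per-cycle mismatch cost for normally distributed demand."""
    z = (q - dist.mean) / dist.stdev
    std_normal = NormalDist()
    # Expected units short: E[max(D - q, 0)]
    exp_short = dist.stdev * (std_normal.pdf(z) - z * (1 - std_normal.cdf(z)))
    # Expected units left over: E[max(q - D, 0)] = E[short] + (q - mean)
    exp_excess = exp_short + (q - dist.mean)
    return cu * exp_short + co * exp_excess

cost_at_mean = expected_cost(MEAN, demand, CU, CO)
cost_at_newsvendor = expected_cost(q_newsvendor, demand, CU, CO)
print(round(q_newsvendor), round(cost_at_mean), round(cost_at_newsvendor))
```

Under these assumptions the newsvendor quantity carries a noticeably lower expected cost per cycle than ordering at the mean. The size of that gap depends on the assumed spread, so it should be recomputed with the actual forecast distribution.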
5.3 The Stockout Data Problem — Why This Cannot Be Skipped
There is one data quality issue that matters more than anything else when applying the newsvendor model: the difference between 'no sales' and 'no demand'.
Sales data records what was actually sold. It does not record what customers wanted to buy but couldn't because the product was unavailable. When a SKU is out of stock, the system logs zero or near-zero sales — but the real demand may have been close to normal. If you build a forecast on that data, the model will learn that demand during those periods was low. It was not. Supply was low. These are fundamentally different things.
| Why This Error Compounds: The problem does not just add noise to the forecast. It systematically biases it in the wrong direction. Underestimated demand shifts the whole forecast distribution downward, so the quantity at any target percentile comes out too low. Any component of Cu that is estimated from observed sales, such as a marketplace penalty based on run rate, is understated as well, which lowers the critical ratio and says to order even less. Each step reinforces the original error. The brand ends up stocking less than it should, experiencing more stockouts, and generating more censored data that further underestimates demand. A growing brand, especially one that has experienced supply constraints in recent history, is particularly exposed to this. |
|---|
Identifying and correcting for stockout periods in the historical data is therefore not an optional refinement — it is a necessary prerequisite for any of this analysis to be reliable. The practical steps are:
Step 1 — Identify OOS periods: Cross-reference sales records against inventory position data. Any period where inventory hit zero and sales dropped simultaneously is a candidate for demand censoring. This requires having inventory position data available — sales data alone is not sufficient to make this determination.
Step 2 — Separate 'no stock' from 'no demand': Not every zero-sales period is a stockout. Seasonal slow periods, product pauses, and pre-launch windows are genuine low-demand periods. Inventory logs, reorder histories, and product status records are needed to tell these apart correctly.
Step 3 — Estimate the missing demand: For confirmed OOS periods, the suppressed demand can be reconstructed. The simplest approach is to interpolate from the weeks just before and just after the stockout. A more rigorous approach uses a statistical model that accounts for seasonality and trend. For the Bamboo Candle, the existing waitlist is a real, live demand signal that can directly inform this estimate.
Step 4 — Check how much it changes the picture: Run the Cu and Critical Ratio calculation both with and without the correction. If the numbers shift substantially, the uncorrected data was pointing the business in a meaningfully wrong direction. If they barely move, censoring was not a significant factor for that SKU.
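Steps 1 to 3 can be sketched as follows, using the simple interpolation approach from Step 3. The record layout (weekly `sales` and end-of-week `inventory` fields) and all figures are hypothetical, and a production version would also need the product status checks described in Step 2.

```python
def flag_stockouts(records):
    """Step 1: mark periods where on-hand inventory was zero as censored."""
    return [r["inventory"] <= 0 for r in records]

def fill_censored(sales, censored):
    """Step 3: linearly interpolate censored runs from the nearest uncensored neighbours."""
    filled = [float(s) for s in sales]
    n = len(sales)
    i = 0
    while i < n:
        if not censored[i]:
            i += 1
            continue
        start = i
        while i < n and censored[i]:
            i += 1
        end = i  # first uncensored index after the run, or n
        left = filled[start - 1] if start > 0 else None
        right = filled[end] if end < n else None
        if left is None and right is None:
            continue  # everything is censored; nothing to anchor on
        left = right if left is None else left
        right = left if right is None else right
        run = end - start
        for k in range(run):
            frac = (k + 1) / (run + 1)
            estimate = left + (right - left) * frac
            # True demand was at least what actually sold.
            filled[start + k] = max(filled[start + k], estimate)
    return filled

# Hypothetical weekly data: weeks 3-4 are stockouts, week 6 is a genuine zero.
records = [
    {"sales": 50, "inventory": 200}, {"sales": 52, "inventory": 120},
    {"sales": 3, "inventory": 0}, {"sales": 0, "inventory": 0},
    {"sales": 48, "inventory": 300}, {"sales": 0, "inventory": 250},
]
censored = flag_stockouts(records)
filled = fill_censored([r["sales"] for r in records], censored)
print(censored)
print([round(x, 1) for x in filled])
```

The final week shows zero sales with stock on hand, so it is left untouched as genuine low demand. For Step 4, the forecast and order quantity would be recomputed on `filled` and compared against the uncorrected series.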
There is also a forward-looking infrastructure implication here: to catch stockout periods reliably going forward, the system needs to log daily inventory position alongside sales — not just transactions. Without inventory data to cross-reference against, the only way to identify a stockout period retrospectively is to look for suspiciously low sales stretches, which is imprecise.
| Infrastructure Note: Logging daily inventory position by SKU should be treated as a foundational data requirement, not a future enhancement. It is the key that makes stockout detection reliable and reproducible. Without it, any cost-weighted inventory optimisation will rest on assumptions that cannot be properly validated. |
|---|
6. Validating the Approach: Backtesting
6.1 Why Accuracy Backtests Are Not Enough
A standard backtest works like this: hold back the most recent data, run the model on the earlier data, compare its predictions to what actually happened, and calculate MAPE or RMSE. This is a useful and necessary check.
But it only tells you whether the forecast was accurate. It does not tell you whether the inventory decisions that came from that forecast were profitable. A forecast can have good MAPE but still drive costly stockouts — if its errors happen to fall consistently in the expensive direction.
6.2 Adding an Economic Layer
A more complete validation adds a second step: simulate the actual inventory outcome. For each period in the backtest, take the order quantity the model would have recommended, compare it to what demand actually turned out to be, and calculate the financial result using the Cu and Co estimates. This turns the backtest from a measurement of accuracy into a measurement of business impact.
Take the model's demand forecast for each period in the holdout window.
Apply the critical ratio to calculate what the newsvendor-optimal order quantity would have been.
Compare that quantity to actual demand. If it fell short, count the stockout cost (Cu per unit short). If it exceeded demand, count the overage cost (Co per unit left over).
Do the same thing for a baseline, for example ordering at the mean forecast demand, which is what an accuracy-only approach would target.
Compare total simulated profit across the backtest window for both approaches.
The output of this exercise is a dollar figure — the estimated difference in profit between the two approaches over the backtest period. That is a number any business stakeholder can engage with directly, without needing to understand confidence intervals or percentiles.
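The steps above can be sketched as a short simulation. The holdout forecasts (mean and standard deviation per period) and the realised demand values are hypothetical, and demand is assumed to be normally distributed around each forecast. The sketch totals mismatch cost rather than profit; under shared Cu and Co, the lower-cost policy is the higher-profit one.

```python
from statistics import NormalDist

def period_cost(order_qty, demand, cu, co):
    """Cu per unit short plus Co per unit left over for one period."""
    return cu * max(demand - order_qty, 0) + co * max(order_qty - demand, 0)

def backtest(forecasts, actuals, cu, co):
    """Total simulated cost for the newsvendor quantity vs ordering at the mean."""
    cr = cu / (cu + co)
    totals = {"newsvendor": 0.0, "mean_baseline": 0.0}
    for (mean, std), demand in zip(forecasts, actuals):
        q_newsvendor = NormalDist(mean, std).inv_cdf(cr)
        totals["newsvendor"] += period_cost(q_newsvendor, demand, cu, co)
        totals["mean_baseline"] += period_cost(mean, demand, cu, co)
    return totals

# Hypothetical holdout window: (forecast mean, forecast std) and realised demand.
forecasts = [(400, 75), (420, 75), (380, 70), (450, 80)]
actuals = [430, 390, 460, 455]

totals = backtest(forecasts, actuals, cu=23.0, co=5.0)
print({name: round(cost) for name, cost in totals.items()})
print("Estimated saving:", round(totals["mean_baseline"] - totals["newsvendor"]))
```

Four periods is far too few to draw conclusions from. The point is the shape of the calculation, which produces the dollar figure described above once it is run over a realistic backtest window.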
6.3 Applying the Model When You Sell Across Multiple Channels
Section 3.2 flagged that when a business sells through more than one channel, the cost of a stockout can differ between them — which means the same SKU may warrant a different critical ratio depending on where the marginal unit is being allocated. This section covers how to handle that in practice.
The core issue is that Cu is channel-specific. On a direct channel (such as a branded website), a stockout costs the gross margin on the lost sale plus whatever fraction of customers do not come back. On a marketplace channel, the same stockout also risks suppressing organic search visibility and Buy Box eligibility — an ongoing penalty that can last for weeks after inventory is restored. Because these penalties are hard to recover from quickly, the effective Cu on the marketplace is typically higher for the same product.
There are two practical approaches for businesses in this situation:
Channel-specific critical ratios: Calculate Cu and Co separately for each channel, apply the critical ratio independently, and stock accordingly. This is the most precise approach but requires a clear view of how inventory is allocated between channels at the point of the ordering decision.
Blended critical ratio with channel weighting: If channel allocation is not decided until fulfilment, a single blended Cu can be constructed by weighting each channel's Cu by its share of expected demand. This is a reasonable simplification when inventory is pooled and fulfilled across both channels from the same stock.
A useful starting point is to calculate the critical ratio under both the direct-channel Cu and the marketplace Cu for a given SKU, and note how far apart the implied order quantities are. If the gap is small, a blended approach is fine. If the gap is large, it is worth being explicit about channel allocation before committing to an order quantity.
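This comparison can be sketched as follows. The channel Cu values, demand shares, Co, and the normal demand distribution are all hypothetical inputs for illustration.

```python
from statistics import NormalDist

def critical_ratio(cu, co):
    return cu / (cu + co)

# Hypothetical per-channel stockout costs and shares of expected demand.
channels = {
    "direct": {"cu": 20.0, "share": 0.6},
    "marketplace": {"cu": 28.0, "share": 0.4},
}
CO = 5.0
demand = NormalDist(400, 75)   # assumed SKU-level demand distribution

# Implied order quantity if the whole SKU were stocked to each channel's ratio.
per_channel_q = {
    name: demand.inv_cdf(critical_ratio(c["cu"], CO)) for name, c in channels.items()
}

# Blended alternative: weight each channel's Cu by its share of expected demand.
blended_cu = sum(c["cu"] * c["share"] for c in channels.values())
blended_q = demand.inv_cdf(critical_ratio(blended_cu, CO))

gap = per_channel_q["marketplace"] - per_channel_q["direct"]
print({k: round(v) for k, v in per_channel_q.items()}, round(blended_q), round(gap))
```

The blended quantity always falls between the two channel-specific quantities. Whether the gap counts as small or large is a judgment relative to typical order sizes and minimum order quantities.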
6.4 Other Things to Watch For
Test across a range of cost assumptions, not just one set: If the newsvendor approach only outperforms under very specific Cu and Co values, that is a sign the conclusion may not hold in practice. A robust result should hold across a reasonable range.
Make sure OOS periods in the backtest data have been corrected: Running the economic backtest on censored data understates true demand during stockout periods, so it undercounts the stockout cost actually incurred and makes the results misleading.
Avoid look-ahead bias: The demand distribution used to calculate the order quantity should only use data that was available at the time of the simulated ordering decision — not data from after the fact.
Compare on a like-for-like seasonal basis: A model that performs well in Q4 but not Q1 may be capturing seasonality differences, not a genuine improvement from cost weighting.
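The look-ahead point in particular is easy to get wrong in code. A minimal walk-forward sketch, assuming a naive forecast (mean and standard deviation of all prior periods) purely for illustration, shows the pattern: each simulated order uses only the history that existed at that point. The demand history is hypothetical.

```python
from statistics import NormalDist, mean, stdev

def walk_forward_orders(history, cr, min_train=4):
    """Order quantity for each period t, using only periods strictly before t."""
    orders = []
    for t in range(min_train, len(history)):
        past = history[:t]  # nothing from period t or later is visible
        dist = NormalDist(mean(past), stdev(past))
        orders.append((t, dist.inv_cdf(cr)))
    return orders

history = [380, 410, 395, 430, 420, 405, 450, 440]   # hypothetical demand
cr = 23 / (23 + 5)

orders = walk_forward_orders(history, cr)
for t, q in orders:
    print(f"period {t}: order about {q:.0f}, actual demand {history[t]}")
```

A quick check for leakage is to alter the later values in `history` and confirm that the earlier order quantities do not change.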
7. A Note on Forecasting Models vs. Decision Engines
The document so far has mostly discussed applying the newsvendor approach to probabilistic demand forecasts — statistical models like Prophet or SARIMA that produce a range of demand estimates alongside a central prediction.
It is worth stepping back and distinguishing two different types of system that can feed inventory decisions, because the right scoring approach differs between them.
| Type of System | Examples | What It Produces | Natural Scoring Approach |
|---|---|---|---|
| Statistical forecast model | Prophet, SARIMA, regression models | A range of demand estimates with a central prediction and confidence bounds — it tells you 'we expect 400 units, but it could reasonably be anywhere from 300 to 520' | MAPE and RMSE measure how close the central prediction was to actual demand |
| Decision engine | A production batch sizing rule, a reorder point calculator, a purchase order generator based on historical averages | A specific action — 'order 500 units', 'start a production run', 'reorder now' — rather than a probabilistic prediction | Newsvendor cost scoring asks: did these decisions lead to good economic outcomes given what stockouts and overstock actually cost? |
The distinction matters because a decision engine does not necessarily produce a probabilistic forecast at all — it may be a rule or formula derived from experience. In those cases, MAPE can still be a useful proxy (measuring how close the implied 'target' was to actual demand), but it does not capture the full picture of whether the decisions were economically sound.
Applying a newsvendor lens to a decision engine does not mean rebuilding it. It simply means asking: looking back at the decisions this engine recommended, did they tend to lean in the right direction given the cost structure? Did stockouts happen more or less often than the cost asymmetry would justify? That question can be answered with historical data without changing anything about how the engine works.
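A minimal version of that look-back needs only the engine's past order quantities and the demand that followed. The figures below are hypothetical, and the demand series is assumed to have already been corrected for stockout censoring.

```python
def realised_service_level(orders, demands):
    """Share of periods in which the ordered quantity covered demand."""
    in_stock = sum(1 for q, d in zip(orders, demands) if q >= d)
    return in_stock / len(orders)

# Hypothetical history of engine decisions and uncensored demand.
orders = [500, 500, 450, 500, 450, 500, 500, 450, 500, 500]
demands = [480, 520, 470, 430, 455, 490, 510, 400, 530, 495]

target = 23 / (23 + 5)   # critical ratio from the illustrative Cu and Co
realised = realised_service_level(orders, demands)

print(f"Realised service level: {realised:.2f}, target: {target:.2f}")
if realised < target:
    print("The engine stocked out more often than the cost asymmetry justifies.")
else:
    print("The engine met or exceeded the cost-implied service level.")
```

In this made-up history the engine covered demand in half of the periods against a cost-implied target of about 82%, which would suggest its decisions lean too lean for this cost structure.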
| Worth Considering: For any system that directly drives inventory commitments, it is worth tracking both an accuracy metric (to measure how close the predictions were) and an economic outcome metric (to measure whether the decisions were profitable). The two metrics can tell quite different stories, and both stories are useful. |
|---|
| What You Are Evaluating | Question Being Asked | Useful Metric(s) | Notes |
|---|---|---|---|
| Statistical forecast model — model selection | Which model is most accurate? | MAPE / RMSE | Compare models consistently without bringing cost assumptions into it |
| Statistical forecast model — ongoing monitoring | Is the model still performing well over time? | MAPE + Bias | Bias is particularly useful — it shows whether the model is drifting high or low |
| Statistical forecast model — stocking decision | Given this forecast, how much should we order? | Apply the newsvendor critical ratio to the forecast distribution | This step translates accuracy into an action — it is not a scoring step |
| Decision engine of any type | Are the decisions this engine produces economically sound? | Newsvendor cost scoring (simulated profit impact) | A useful complement to any accuracy metrics already in use |
| Decision engine vs. alternative approach | Would a different approach produce better business outcomes? | Simulated profit comparison under shared cost assumptions | Requires agreed Cu and Co estimates before the comparison is meaningful |
| Any system — stakeholder reporting | How is the system performing in terms the business cares about? | Dollar-denominated P&L impact | The clearest way to connect analytical work to the outcomes that matter most |
9. Applying the Newsvendor Model to a Seasonal Forecast Engine
The sections above describe the newsvendor model in general terms. This section addresses a more specific question: how do you apply it when the underlying forecast is produced by a rule-based seasonal engine — one that generates prediction intervals from pooled historical decay curves rather than from a statistical time-series model?
This is a meaningful and practical question, because the two types of interval have different properties. Understanding those differences tells you where you can apply the model with confidence today, and where the estimates need to be treated with a bit more caution.
9.1 What the Engine Currently Produces
A seasonal forecast engine built around fragrance age decay curves produces, for each SKU, three outputs: a base case forecast, and a low and high estimate derived from the standard deviation of historical decay rates across all fragrances at the same age. For example, if the Y1→Y2 transition shows a median decay of -35% with a standard deviation of 30 percentage points across observed fragrances, the engine uses that spread to construct the interval.
This is a legitimate and useful source of uncertainty quantification — and it is substantially better than a point forecast alone. The interval genuinely reflects something real: how much fragrance demand at this lifecycle stage has historically varied. For established fragrances with several years of observed data, the engine's own observed rates provide a reliable anchor. For new launches, the pooled category data is the primary signal.
The key thing to understand is what this interval does and does not capture. It captures decay rate uncertainty — the variation in how fragrances at a given age have tended to perform across the category. It does not capture base demand uncertainty — uncertainty about whether the season's overall demand level will be higher or lower than expected, which is a separate source of risk. And it does not capture uncertainty from factors outside the engine's model, such as cannibalization from a concurrent new launch, a significant promotional shift, or a supply disruption. The interval width should therefore be thought of as a lower bound on true demand uncertainty rather than a complete picture of it.
| The Practical Implication: The engine's intervals are wide enough and grounded enough to support newsvendor cost weighting today, particularly for returning fragrances where the decay signal is well-established. For new launches, the intervals are wider and the base estimate itself carries more uncertainty, so the newsvendor output for those SKUs should be treated as directional guidance rather than a precision calculation until more launch cycles are observed. |
|---|
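To make the interval construction in this section concrete, the sketch below turns a pooled decay rate and its spread into low, base, and high forecasts. The exact rule the engine uses is not specified here, so the band of plus or minus one standard deviation (`n_sd=1.0`), the function name `decay_interval`, and the 10,000-unit prior-year figure are all assumptions made for illustration.

```python
def decay_interval(prior_sales, median_decay, decay_sd, n_sd=1.0):
    """Return (low, base, high) forecasts from a pooled decay rate and its spread.

    median_decay and decay_sd are fractions: -0.35 is a 35% decline and
    0.30 is 30 percentage points. n_sd is an assumed band width in SDs.
    """
    base = prior_sales * (1 + median_decay)
    low = prior_sales * (1 + median_decay - n_sd * decay_sd)
    high = prior_sales * (1 + median_decay + n_sd * decay_sd)
    return max(low, 0.0), base, high  # demand cannot fall below zero

# Y1 to Y2 example from the text: median decay of -35%, SD of 30 points.
low, base, high = decay_interval(10_000, -0.35, 0.30)
print(round(low), round(base), round(high))
```

The width of the band comes entirely from the decay-rate spread, which is why the interval is best read as a lower bound on total demand uncertainty.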
9.2 Applying the Critical Ratio to the Engine's Output
The mechanics are straightforward. For each SKU, the engine produces a base forecast and a prediction interval. To apply the newsvendor model:
Step 1 — Establish Cu and Co for the SKU, using one of the approaches from Section 3.2. For most seasonal inventory decisions, starting with lost gross margin for Cu and holding cost plus markdown risk for Co is appropriate.
Step 2 — Calculate the critical ratio: Cu / (Cu + Co).
Step 3 — Treat the engine's prediction interval as a demand distribution. The base case is the central estimate, and the low and high bounds define the spread. A simple assumption is that demand is approximately normally distributed, with the base case as the mean and the low and high bounds marking a central interval of known coverage (for example, 70%).
Step 4 — Find the order quantity corresponding to the critical ratio percentile of that distribution. If the base case is 10,000 units and the 70% interval spans 8,000 to 12,000 units, a normal approximation places the bounds about 1.04 standard deviations either side of the mean, giving a standard deviation of roughly 1,930 units (2,000 / 1.04). A critical ratio of 0.82 corresponds to a z-score of about 0.92, which points to approximately 10,000 + (0.92 × 1,930) ≈ 11,800 units.
Step 5 — Compare this to the engine's base case output. The difference is the implied safety stock the newsvendor model recommends given the cost structure. If this is materially different from the current production commitment, that gap is worth a conversation.
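The five steps above can be sketched with Python's standard library. This is an illustration under the normal approximation from Step 3, not a definitive implementation: the function name `newsvendor_quantity` and the Cu and Co values (chosen to give a critical ratio of 0.82) are hypothetical. The interval's coverage is passed explicitly because the implied standard deviation depends on it.

```python
from statistics import NormalDist

def newsvendor_quantity(base, low, high, coverage, cu, co):
    """Order quantity at the critical-ratio percentile of a normal demand fit."""
    critical_ratio = cu / (cu + co)                      # Step 2
    # A central interval with this coverage spans +/- z_half standard deviations.
    z_half = NormalDist().inv_cdf(0.5 + coverage / 2)
    sigma = (high - low) / (2 * z_half)                  # Step 3
    demand = NormalDist(mu=base, sigma=sigma)
    q_star = demand.inv_cdf(critical_ratio)              # Step 4
    return critical_ratio, sigma, q_star

# Hypothetical per-unit costs: Cu = 8.20 and Co = 1.80 give a critical ratio of 0.82.
cr, sigma, q_star = newsvendor_quantity(
    base=10_000, low=8_000, high=12_000, coverage=0.70, cu=8.2, co=1.8
)
safety_stock = q_star - 10_000                           # Step 5
print(f"CR={cr:.2f}  sigma={sigma:,.0f}  Q*={q_star:,.0f}  safety stock={safety_stock:,.0f}")
```

Re-running the same call with a different `coverage` value shows how strongly the recommendation depends on what the engine's low and high bounds are taken to mean.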
9.3 A Sensitivity Check Before Committing
Before using the newsvendor output to drive a production decision, it is worth running the calculation across a range of Cu and Co assumptions — which is what Section 6.4 describes as the first thing to check. This is especially important with an engine whose intervals may be underestimating true uncertainty.
A useful three-scenario structure for this check:
| Scenario | Cu Method | Co Method | What It Tests |
|---|---|---|---|
| Conservative (lean toward less) | Lost gross margin only (baseline) | Holding cost + capital charge + markdown risk (fully-loaded Co) | What does the model recommend if the cost of overstocking is taken seriously? This scenario will produce the lowest critical ratio and the least aggressive safety stock recommendation. |
| Base case | Lost gross margin + estimated customer churn impact | Holding cost + capital charge | The most defensible middle ground for a branded product with repeat buyers. Balances the cost of a lost customer against the cost of carrying unsold inventory. |
| Growth / availability-first | Lost gross margin + customer churn + channel visibility penalty | Physical holding cost only (baseline Co) | What does the model recommend if protecting availability and channel standing is the priority? This scenario will produce the highest critical ratio and the largest safety stock target. |
If all three scenarios point to a similar production quantity, the decision is robust and you can proceed with confidence. If the scenarios diverge significantly, that is a signal to sharpen the cost assumptions before committing — the answer is sensitive to inputs that have not been pinned down.
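A minimal sketch of the three-scenario check follows. Every dollar figure is a hypothetical placeholder rather than an estimate from this document, the demand distribution (mean 5,000, standard deviation 800) is an assumed example SKU, and the names `SCENARIOS` and `scenario_quantities` are illustrative.

```python
from statistics import NormalDist

# Hypothetical per-unit cost components in dollars; replace with agreed estimates.
MARGIN, CHURN, VISIBILITY = 12.0, 4.0, 2.0   # building blocks for Cu
HOLDING, CAPITAL, MARKDOWN = 1.5, 1.0, 3.5   # building blocks for Co

# Each scenario maps to a (Cu, Co) pair following the table above.
SCENARIOS = {
    "conservative": (MARGIN, HOLDING + CAPITAL + MARKDOWN),
    "base": (MARGIN + CHURN, HOLDING + CAPITAL),
    "availability_first": (MARGIN + CHURN + VISIBILITY, HOLDING),
}

def scenario_quantities(mean, sigma):
    """Critical ratio and order quantity per scenario under normal demand."""
    demand = NormalDist(mu=mean, sigma=sigma)
    out = {}
    for name, (cu, co) in SCENARIOS.items():
        cr = cu / (cu + co)
        out[name] = (cr, demand.inv_cdf(cr))
    return out

results = scenario_quantities(mean=5_000, sigma=800)
quantities = [q for _, q in results.values()]
spread = max(quantities) - min(quantities)
for name, (cr, q) in results.items():
    print(f"{name:>20}: CR={cr:.2f}  Q={q:,.0f}")
print(f"spread across scenarios: {spread:,.0f} units")
```

What counts as a "similar" quantity is a judgment call. Comparing the spread to the minimum order quantity or to a typical production batch is one practical yardstick.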
9.4 Where a Richer Demand Distribution Would Help
The engine's interval approach is a practical and honest solution given the data currently available. There are two directions that would improve it over time, worth keeping in mind as the data picture matures.
The first is simply more data. The pooled decay curve becomes more reliable as additional season cycles accumulate. With three or four observed seasons per fragrance age bucket, the standard deviation estimate begins to stabilise, so the interval width reflects genuine demand variation rather than small-sample noise, and the same methodology becomes more precise. This improvement is largely automatic as the catalogue and data history grow.
The second is a different type of model altogether. A probabilistic time-series model — such as Prophet with its built-in uncertainty quantification, or a Monte Carlo simulation engine — produces a demand distribution that accounts not just for decay rate variance but for all modelled sources of uncertainty simultaneously: seasonality, trend, promotional effects, and residual noise. The interval it produces is a fuller characterisation of what demand might actually be, rather than a projection of historical decay spread.
The practical benefit is not just wider or narrower intervals — it is that the intervals become better calibrated to the actual probability of demand falling outside them. A 70% interval from a well-fit time-series model means demand fell inside it roughly 70% of the time historically. That calibration is harder to guarantee from decay rate variance alone, particularly when demand is also affected by factors the decay model does not capture.
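Calibration in this sense can be checked directly from history. The sketch below computes the share of past periods in which realised demand fell inside the interval issued for that period. The function name `interval_coverage` and the sample figures are hypothetical, and the actuals are assumed to be demand corrected for stockouts.

```python
def interval_coverage(actuals, lows, highs):
    """Share of periods where realised demand landed inside its interval."""
    if not actuals or not (len(actuals) == len(lows) == len(highs)):
        raise ValueError("inputs must be non-empty and the same length")
    hits = sum(1 for a, lo, hi in zip(actuals, lows, highs) if lo <= a <= hi)
    return hits / len(actuals)

# Hypothetical history: five periods with the interval issued for each.
actuals = [950, 1200, 700, 1010, 1500]
lows = [800, 900, 750, 900, 1000]
highs = [1100, 1300, 1050, 1200, 1400]
coverage = interval_coverage(actuals, lows, highs)
print(f"empirical coverage: {coverage:.0%}")
```

If intervals labelled 70% cover noticeably less than 70% of outcomes over a reasonable number of periods, they are too narrow, and a newsvendor calculation built on them will understate the safety stock.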
| A Path Worth Considering: The seasonal engine and a statistical forecast model are not mutually exclusive. The engine is well-suited to the structured, config-driven planning workflow it supports. A probabilistic model could run alongside it, using the same DuckDB sales data, and provide a second interval estimate that can be compared to the engine's output. Where they agree, confidence is higher. Where they diverge meaningfully, that divergence is itself a useful signal that something in the cost, demand, or model assumptions deserves a closer look. |
|---|
10. Glossary
All abbreviations and technical terms used in this document, including the business and finance terms referenced in the cost estimation table in Section 5.
Forecast Accuracy Metrics
| Term / Abbreviation | Plain Language Definition |
|---|---|
| MAPE — Mean Absolute Percentage Error | The average percentage gap between what was forecast and what actually happened, with each gap measured relative to actual sales. If sales were forecast at 80 units and came in at 100 units, the error is 20%. MAPE is the average of those percentages across all forecast periods. Lower is better. It treats over-forecasting and under-forecasting errors equally. |
| RMSE — Root Mean Squared Error | Similar to MAE, but each error is squared before averaging and the square root of that average is then taken, so the result stays in the original units. Large individual misses are penalised more heavily than small ones. Useful for detecting models that are occasionally very wrong, even if usually reasonable. |
| MAE — Mean Absolute Error | The average gap between forecast and actual, measured in the original units (e.g. candles). Less sensitive to the occasional large error than RMSE. A straightforward, easy-to-interpret baseline. |
| Bias | The average of the signed errors — that is, actual minus forecast — across all periods. A positive bias means the model tends to forecast too low. A negative bias means it tends to forecast too high. Bias can be zero even if individual errors are large, so it should always be reported alongside MAPE or MAE. |
| Symmetric loss function | A mathematical rule that penalises errors of equal size equally, regardless of which direction they go. MAPE, RMSE, and MAE all use symmetric loss functions. The newsvendor model uses an asymmetric loss function instead. |
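The four definitions above can be written out as a short sketch. It assumes the common convention of dividing each error by the actual value for MAPE (so actuals must be non-zero) and the sign convention of actual minus forecast for bias. The function name `accuracy_metrics` and the sample figures are illustrative.

```python
from math import sqrt

def accuracy_metrics(actuals, forecasts):
    """MAPE (in percent), RMSE, MAE, and bias for paired actuals and forecasts."""
    errors = [a - f for a, f in zip(actuals, forecasts)]  # signed: actual minus forecast
    n = len(errors)
    return {
        "mape": sum(abs(e) / a for e, a in zip(errors, actuals)) / n * 100,
        "rmse": sqrt(sum(e * e for e in errors) / n),
        "mae": sum(abs(e) for e in errors) / n,
        "bias": sum(errors) / n,
    }

# One under-forecast, one over-forecast, one exact hit.
m = accuracy_metrics(actuals=[100, 200, 50], forecasts=[80, 220, 50])
print({name: round(value, 2) for name, value in m.items()})
```

In this sample the two 20-unit misses cancel, so the bias is zero even though MAE and RMSE are not, which is why bias is best read alongside one of the other metrics.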
Newsvendor and Inventory Optimisation
| Term / Abbreviation | Plain Language Definition |
|---|---|
| Newsvendor model | An inventory optimisation tool from the field of operations research. Given uncertain demand and different costs for stocking out vs. overstocking, it calculates the quantity that maximises expected profit. Also known as the newsboy problem or single-period inventory model. |
| Cu — Underage cost | The cost you incur per unit when you do not have enough stock to meet demand. At its core this is the lost profit margin, but it can also include the value of future purchases from a customer you disappoint, and channel-specific penalties such as reduced visibility in Amazon's search ranking. |
| Co — Overage cost | The cost you incur per unit when you end up with more stock than demand required. This includes warehouse and holding costs, the cash tied up in unsold inventory, and any markdown or disposal costs if the product does not sell. |
| Critical Ratio (CR) | The output of the newsvendor model: Cu divided by (Cu + Co). This number represents the optimal service level — the probability of having enough stock that maximises expected profit. It is the percentile of the demand distribution you should target when placing an order. |
| Q* — Optimal order quantity | The order quantity that maximises expected profit given the demand distribution and the two cost parameters. It is found by identifying the point on the demand distribution that matches the critical ratio. |
| Service level | The probability of not running out of stock in a given period. Under the newsvendor model, the service level equals the critical ratio. An 82% service level means you design the stocking decision to have enough inventory in 82 out of 100 comparable periods. |
| Censored demand | Demand that went unrecorded because there was no stock to sell. When a product is out of stock, the sales figure logged in your data is zero — but customers still wanted the product. That unmet demand is 'censored' from the data, and it needs to be estimated before using sales history to build forecasts. |
| OOS — Out of Stock | A period during which inventory reaches zero and orders cannot be fulfilled. OOS periods are the main source of censored demand in historical sales data. |
| Demand imputation | The process of estimating what true demand was during an OOS period, using surrounding data, comparable SKU patterns, waitlist information, or statistical modelling. A necessary step before applying newsvendor analysis to any SKU with a history of stockouts. |
| Economic backtest | A backtest that goes beyond measuring forecast accuracy and simulates the actual profit impact of the inventory decisions implied by the forecast. Produces a dollar figure rather than a percentage error. |
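The link between the critical ratio, Q*, and the service level can be checked numerically. The sketch below searches a small set of equally likely demand outcomes for the quantity with the lowest average mismatch cost. The demand values, the cost figures (Cu = 7.50, Co = 2.50, a critical ratio of 0.75), and the helper names are hypothetical.

```python
def expected_cost(q, demand_samples, cu, co):
    """Average mismatch cost of ordering q against equally likely demand outcomes."""
    total = sum(cu * max(d - q, 0) + co * max(q - d, 0) for d in demand_samples)
    return total / len(demand_samples)

def best_quantity(demand_samples, cu, co):
    """Brute-force Q*: evaluate each observed demand level as a candidate order."""
    candidates = sorted(set(demand_samples))
    return min(candidates, key=lambda q: expected_cost(q, demand_samples, cu, co))

samples = [60, 70, 80, 90, 100, 110, 120, 130, 140, 150]
q_star = best_quantity(samples, cu=7.5, co=2.5)
# Share of outcomes fully covered by ordering q_star.
service_level = sum(d <= q_star for d in samples) / len(samples)
print(q_star, service_level)
```

The cheapest quantity is the smallest one whose service level reaches the critical ratio, which is the percentile rule the glossary entries describe.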
Business and Finance Terms
| Term / Abbreviation | Plain Language Definition |
|---|---|
| COGS — Cost of Goods Sold | The direct costs involved in making the products sold. For a candle this would include the fragrance oil, wax, wick, vessel, and packaging, plus any direct labour and allocated manufacturing overhead. Does not include marketing, sales, or general admin expenses. The main ingredient in calculating gross margin. |
| DTC — Direct to Consumer | A sales model where the brand sells directly to end customers — typically through its own website or Shopify store — rather than through a retailer or marketplace middleman. DTC sales generally produce higher gross margins than Amazon or wholesale because there is no channel partner taking a share. |
| LTV — Lifetime Value (also CLV — Customer Lifetime Value) | An estimate of the total profit a single customer is expected to generate over their entire relationship with the brand. Used in Cu estimation — if a stockout causes a customer not to return, the cost is not just the lost sale today but the LTV they would have contributed over time. |
| Gross margin | Revenue minus COGS, expressed as a percentage of revenue or in dollar terms. The per-unit gross margin is typically the main component of Cu in a newsvendor calculation — it represents what you give up when a sale does not happen. |
| Contribution margin | Revenue minus all variable costs, including COGS, variable fulfilment fees, and variable marketing costs. A more complete profitability measure than gross margin, particularly for channels with significant variable costs like Amazon FBA. |
| WACC — Weighted Average Cost of Capital | A measure of the cost of financing a business — blending the cost of debt and the cost of equity, weighted by how much of each the company uses. Used in Co estimation as the opportunity cost of cash tied up in inventory. For a growth-stage brand, WACC is typically higher than for a mature business, making excess inventory relatively more expensive. |
| P&L — Profit and Loss statement | The financial statement that summarises revenue, costs, and profit over a period of time. The ultimate performance scorecard. The goal of newsvendor cost weighting is to connect inventory decisions directly to P&L outcomes by quantifying the margin impact of getting stocking decisions wrong in either direction. |
| SKU — Stock Keeping Unit | A unique code assigned to a specific product variant for tracking purposes. Each distinct combination of product, size, scent, and packaging has its own SKU. Newsvendor optimisation is typically applied at the individual SKU level. |
| MOQ — Minimum Order Quantity | The smallest quantity a supplier will accept in a single order. MOQ constraints can mean the optimal newsvendor order quantity has to be rounded up to a feasible level — or that a different order timing strategy is needed. |
| BOM — Bill of Materials | A detailed list of all the raw materials and components, and the quantities of each, needed to produce one finished unit. Used in production planning and COGS calculation. |
| WMS — Warehouse Management System | Software that manages the physical operations of a warehouse — receiving, storage locations, picking, packing, and shipping. The system most likely to hold the daily inventory position data needed for reliable stockout detection. |
| FBA — Fulfillment by Amazon | Amazon's service where the seller ships stock to Amazon's warehouses and Amazon handles all storage, picking, packing, and shipping. FBA inventory levels are a relevant input for Cu estimation on the Amazon channel. |
| FBM — Fulfillment by Merchant | An Amazon arrangement where the seller manages and ships inventory directly to customers, rather than using Amazon's fulfilment network. |
| NC-ROAS — New Customer Return on Ad Spend | A marketing efficiency metric measuring how much revenue from new customers is generated per dollar of ad spend. Relevant to Cu estimation — when it is expensive to acquire new customers, the long-term cost of losing one to a stockout (their LTV) is correspondingly higher. |
Data & Analytics Reference Series • v1.0 • March 2026 • For internal use