Happy to provide more details on the implementation:
Model Selection Frequency:
We re-run best-fit model selection quarterly, not monthly. Here’s our reasoning:
- Model Stability: Changing models too frequently creates forecast volatility. Supply planning needs consistent forecasting logic to make reliable decisions; monthly switching would cause whiplash effects.
- Computational Efficiency: Running best-fit across 340 SKUs with 8 candidate models is resource-intensive. A quarterly cadence balances accuracy improvement with system performance.
- Seasonal Alignment: Quarterly re-evaluation aligns with our seasonal business cycle. We re-run best-fit at the start of each season (Spring, Summer, Fall, Winter), when demand patterns shift.
- Exception-Based Switching: Between quarterly cycles, an alert flags SKUs whose forecast error exceeds 30% for two consecutive months. Those SKUs trigger an immediate best-fit re-evaluation rather than waiting for the quarterly cycle.
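The exception-based trigger is simple to express in code. A minimal sketch (the function name and the list-of-monthly-errors interface are illustrative, not our production system):

```python
def flag_for_reevaluation(monthly_errors, threshold=0.30, run_length=2):
    """Return True when forecast error exceeds `threshold` for
    `run_length` consecutive months, triggering an off-cycle
    best-fit re-evaluation for that SKU."""
    consecutive = 0
    for err in monthly_errors:
        consecutive = consecutive + 1 if err > threshold else 0
        if consecutive >= run_length:
            return True
    return False
```

Note the reset to zero on any in-tolerance month: two high-error months separated by a good month do not trigger a re-run.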
Model Selection by Product:
We allow different models for different products - that’s the whole point of best-fit. Our portfolio breaks down as:
- 45% use Holt-Winters (strong seasonality, trend)
- 30% use Seasonal Naive (seasonal but erratic)
- 15% use Exponential Smoothing (low seasonality)
- 10% use Moving Average or other models
We don’t enforce consistency within product families because even similar products can have different demand drivers. For example, picture frames and decorative pillows are both home décor, but frames have steady demand while pillows spike seasonally.
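The per-product selection logic itself is a straightforward tournament: fit each candidate on the training window, score it on the validation window, keep the winner. A hedged sketch (the callable-per-model interface is an assumption for illustration; real candidates would be Holt-Winters, seasonal naive, etc.):

```python
def select_best_fit(train, validate, candidates):
    """Pick the candidate model with the lowest MAPE on the validation
    window. `candidates` maps a model name to a function that takes the
    training series and returns one forecast value per validation period."""
    def mape(actual, forecast):
        return sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)

    scores = {name: mape(validate, fit(train)) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores
```

Each product keeps whichever model wins its own tournament, which is why model choice naturally varies within a product family.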
Training/Validation Split Rationale:
The 18/6 month split (75/25) was intentional for seasonal products:
- 18 months training ensures we capture at least 1.5 full seasonal cycles (our products have 12-month seasonality)
- 6 months validation lets us test model performance across half a seasonal cycle, including both peak and off-peak periods
- Standard 80/20 would give only 4.8 months validation, which might miss critical seasonal transitions
For non-seasonal products, 80/20 would work fine, but since 70% of our portfolio is seasonal, we optimized for that majority.
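Because the data is a time series, the 18/6 split must be chronological, never shuffled. A minimal sketch of that split, assuming 24+ months of monthly history:

```python
def chronological_split(history, train_months=18, valid_months=6):
    """Split a monthly demand series into training and validation
    windows, preserving time order (required for seasonal series)."""
    assert len(history) >= train_months + valid_months, "need 24+ months of history"
    cut = train_months
    return history[:cut], history[cut:cut + valid_months]
```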
IQR Cleansing Configuration Details:
```python
# IQR outlier detection and cleansing; applied to historical demand
# BEFORE model training, not to the forecast output.
import statistics

def iqr_cleanse(demand, monthly_medians, k=1.5):
    q1, _, q3 = statistics.quantiles(demand, n=4)  # 25th and 75th percentiles
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    cleansed = []
    for i, value in enumerate(demand):
        if value < lower or value > upper:
            # replace outlier with the median demand for that calendar
            # month (assumes the series starts at month index 0)
            cleansed.append(monthly_medians[i % 12])
        else:
            cleansed.append(value)  # keep original value
    return cleansed
```
The key is that cleansing happens on historical data BEFORE model training, not on the generated forecast. This prevents extreme historical outliers from skewing model parameters while preserving the forecast’s ability to predict high/low demand within normal ranges.
Inventory Optimization Integration:
Translating forecast accuracy into inventory reduction required several changes:
- Dynamic Safety Stock: We implemented a formula that adjusts safety stock based on forecast error:
- Safety Stock = Z-score × Forecast Error Standard Deviation × √(Lead Time)
- As forecast accuracy improved (lower error standard deviation), safety stock automatically decreased
- We monitor this monthly and saw a gradual reduction over 6 months as confidence in the forecasts grew
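The dynamic safety-stock calculation can be sketched as follows. Two assumptions worth flagging: lead time is scaled by its square root (the standard textbook form, with lead time expressed in the same periods the forecast error is measured in), and the Z-score is derived from a target cycle service level via the inverse normal CDF:

```python
import math
from statistics import NormalDist

def service_level_z(service_level):
    """Z-score for a target cycle service level, e.g. 0.95 -> ~1.645."""
    return NormalDist().inv_cdf(service_level)

def safety_stock(z, error_std, lead_time_periods):
    """Safety stock = Z x forecast-error std dev x sqrt(lead time).
    As error_std falls with better forecasts, safety stock falls too."""
    return z * error_std * math.sqrt(lead_time_periods)
```

This makes the accuracy-to-inventory link mechanical: halving the forecast-error standard deviation halves the safety stock at the same service level.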
- Service Level Differentiation: With better forecast accuracy, we could afford to lower safety stock on C-items (low-value products) while maintaining or increasing it on A-items (high-value, high-velocity). This optimized inventory investment.
- Replenishment Frequency: Improved forecast accuracy enabled more frequent, smaller replenishment orders. Instead of ordering monthly to buffer forecast uncertainty, we moved to bi-weekly orders for fast-moving items, reducing average inventory levels.
- Forecast Value-Add (FVA) Tracking: We implemented FVA metrics to measure whether manual forecast overrides improved or worsened the statistical forecast. This helped us identify where planners should intervene versus trust the model, further improving effective accuracy.
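A common way to compute FVA is the difference in error between the untouched statistical forecast and the final (overridden) forecast; a sketch using MAPE as the error metric (metric choice is an assumption, not stated above):

```python
def forecast_value_add(actual, stat_forecast, final_forecast):
    """FVA = statistical-forecast MAPE minus final-forecast MAPE.
    Positive means the planner override improved accuracy;
    negative means the override hurt."""
    def mape(forecast):
        return sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)

    return mape(stat_forecast) - mape(final_forecast)
```

Tracked per planner and per product family, this turns the "should we override?" debate into a measured answer.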
Best-Fit Model Accuracy Improvement Analysis:
To quantify best-fit’s impact, we compared three scenarios:
- Baseline: Single model (exponential smoothing) applied to all products = 58% accuracy
- Manual selection: Planners chose models based on product knowledge = 64% accuracy
- Best-fit automated: System selected optimal model per product = 76% accuracy
The 18-point improvement over baseline and 12-point improvement over manual selection validated the best-fit approach. The time savings (15 hours/week) came primarily from eliminating manual model selection and tuning.
Lessons Learned:
- Data Quality Is Critical: Best-fit only works with clean historical data. We spent 3 weeks cleaning demand history before implementation: removing duplicate orders, correcting data-entry errors, and aligning promotional events. Without this prep, best-fit would select models based on bad data.
- Change Management: Planners initially resisted automated model selection, feeling it reduced their control. We addressed this by:
- Showing accuracy improvement data
- Allowing manual overrides when planners had market intelligence
- Implementing FVA to prove when overrides helped vs. hurt
- Repositioning planners as “forecast analysts” who interpret and adjust, not just create forecasts
- Continuous Monitoring: Best-fit isn’t “set and forget.” We hold monthly reviews where we analyze:
- Which products have declining accuracy (need re-evaluation)
- Which models are most commonly selected (informs data patterns)
- Forecast bias by product family (systematic over/under forecasting)
- Outlier cleansing effectiveness (are we removing too much or too little?)
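The bias check in those reviews is worth making concrete, since bias (signed error) catches systematic over/under-forecasting that an absolute-error metric hides. A minimal sketch using mean percentage error (the metric choice is an assumption):

```python
def forecast_bias(actual, forecast):
    """Mean percentage error across periods. Positive values indicate
    systematic over-forecasting (forecast above actual); negative
    values indicate systematic under-forecasting."""
    return sum((f - a) / a for a, f in zip(actual, forecast)) / len(actual)
```

Aggregated by product family, a persistent non-zero bias is the signal to re-examine the model or the cleansing settings for that family.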
- Integration with S&OP: The accuracy improvement enabled better S&OP discussions. Instead of debating whether the forecast is accurate, we now focus on strategic decisions like capacity allocation, new product launches, and market expansion.
Overall, implementing best-fit with IQR cleansing transformed our demand planning from an art (manual, inconsistent) to a science (systematic, measurable) while preserving the planner’s role in applying business judgment where it adds value.