Implemented best-fit forecast models in demand planning for seasonal product portfolio

Sharing our implementation experience with best-fit statistical forecasting for a seasonal consumer goods portfolio - an approach that significantly improved our forecast accuracy.

Business Challenge: Our company produces seasonal home décor products with highly variable demand patterns. We had 340 SKUs with forecast accuracy averaging 58%, leading to frequent stockouts during peak seasons and excess inventory in off-seasons. Manual forecast model selection was time-consuming and inconsistent across product managers.

Implementation Approach: We implemented SAP IBP’s best-fit forecast functionality with IQR (Interquartile Range) outlier cleansing to automatically select optimal statistical models for each product based on historical demand patterns.

Key configuration elements:

  • Enabled best-fit model selection across 8 candidate models (moving average, exponential smoothing, Holt-Winters, seasonal naive, etc.)
  • Configured IQR cleansing with 1.5x multiplier to remove demand spikes from promotional events
  • Set 24-month history window for model training
  • Implemented weekly forecast generation with monthly accuracy tracking

Here’s the core best-fit configuration:


// Pseudocode - Best-fit model selection logic:
1. Load historical demand for each SKU (24 months)
2. Apply IQR cleansing to replace values beyond 1.5x IQR from the quartiles
3. Split data: 18 months training, 6 months validation
4. Test each candidate model against validation period
5. Calculate MAPE for each model's validation forecast
6. Select model with lowest MAPE as best-fit
7. Generate forecast using best-fit model parameters
// Models evaluated: MA, SES, Holt, Winters, Croston, etc.
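The selection loop above can be sketched in Python. This is a minimal illustration, not the actual SAP IBP internals: the two candidate models, the `fit_forecast`-style callables, and the demand series are all made up for the example, with only the 18/6 split and lowest-MAPE rule taken from the post.

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, skipping zero-demand periods."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    mask = actual != 0
    return np.mean(np.abs((actual[mask] - forecast[mask]) / actual[mask])) * 100

def moving_average(train, horizon, window=3):
    """Candidate 1: flat forecast at the mean of the last `window` points."""
    return np.full(horizon, np.mean(train[-window:]))

def seasonal_naive(train, horizon, season=12):
    """Candidate 2: repeat the value from one season ago."""
    return np.array([train[len(train) - season + (h % season)] for h in range(horizon)])

def best_fit(history, candidates, train_len=18, valid_len=6):
    """Fit each candidate on the training window, score on validation, pick lowest MAPE."""
    train = history[:train_len]
    valid = history[train_len:train_len + valid_len]
    scores = {name: mape(valid, model(train, valid_len))
              for name, model in candidates.items()}
    return min(scores, key=scores.get), scores

# 24 months of demand with clean 12-month seasonality (illustrative data)
demand = np.array([100, 110, 130, 160, 200, 240, 250, 230, 180, 140, 110, 100] * 2, float)
name, scores = best_fit(demand, {"moving_average": moving_average,
                                 "seasonal_naive": seasonal_naive})
print(name)  # seasonal_naive wins on strongly seasonal data
```

On a perfectly repeating series the seasonal model scores a validation MAPE of zero, which is why best-fit routes seasonal SKUs to Holt-Winters or seasonal naive rather than a flat smoother.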

Results After 6 Months:

  • Overall forecast accuracy improved from 58% to 76%
  • Seasonal products (70% of portfolio) saw accuracy increase from 52% to 81%
  • Inventory reduction of 23% while maintaining 96% service level
  • Planning team time saved: 15 hours/week previously spent on manual model tuning

The 18-month training / 6-month validation split is interesting. Most implementations I’ve seen use an 80/20 split, which for your 24-month window would be roughly 19 months training / 5 months validation. Did you find the 75/25 split gave better results? I’m wondering if the longer validation period helps with seasonal products, where you want to validate across multiple seasons.

How did you handle the model selection frequency? Did you re-run best-fit monthly, quarterly, or only when accuracy dropped below threshold? I’m curious about the balance between model stability (keeping same model for consistency) vs. adaptability (switching models as demand patterns evolve). Also, did you allow different models for different products, or enforce consistency within product families?

Great results! Quick question on your IQR cleansing - did you find that 1.5x multiplier was optimal, or did you test different values? We’re considering implementing this but concerned about removing legitimate demand spikes that should influence future forecasts. How did you distinguish between outliers to remove vs. genuine demand signals?

Happy to provide more details on the implementation:

Model Selection Frequency: We re-run best-fit model selection quarterly, not monthly. Here’s our reasoning:

  1. Model Stability: Changing models too frequently creates forecast volatility. Supply planning needs consistent forecasting logic to make reliable decisions. Monthly switching would cause whiplash effects.

  2. Computational Efficiency: Running best-fit across 340 SKUs with 8 candidate models is resource-intensive. Quarterly cadence balances accuracy improvement with system performance.

  3. Seasonal Alignment: Quarterly re-evaluation aligns with our seasonal business cycle. We re-run best-fit at the start of each season (Spring, Summer, Fall, Winter) when demand patterns shift.

  4. Exception-Based Switching: Between quarterly cycles, we have an alert that flags SKUs where forecast error exceeds 30% for two consecutive months. Those SKUs trigger an immediate best-fit re-evaluation rather than waiting for the quarterly cycle.
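The exception trigger in point 4 reduces to a small check over each SKU's recent error history. A minimal sketch, assuming errors are stored oldest-to-newest as fractions; the 30% threshold and two-month window are the values quoted above:

```python
def needs_reevaluation(monthly_errors, threshold=0.30, consecutive=2):
    """Flag a SKU whose forecast error exceeded the threshold for the last N months."""
    recent = monthly_errors[-consecutive:]
    return len(recent) == consecutive and all(e > threshold for e in recent)

# Two consecutive months above 30% error trigger an immediate best-fit re-run
print(needs_reevaluation([0.12, 0.28, 0.35, 0.41]))  # True
print(needs_reevaluation([0.35, 0.41, 0.18, 0.25]))  # False: the error spike has passed
```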

Model Selection by Product: We allow different models for different products - that’s the whole point of best-fit. Our portfolio breaks down as:

  • 45% use Holt-Winters (strong seasonality, trend)
  • 30% use Seasonal Naive (seasonal but erratic)
  • 15% use Exponential Smoothing (low seasonality)
  • 10% use Moving Average or other models

We don’t enforce consistency within product families because even similar products can have different demand drivers. For example, picture frames and decorative pillows are both home décor, but frames have steady demand while pillows spike seasonally.

Training/Validation Split Rationale: The 18/6 month split (75/25) was intentional for seasonal products:

  • 18 months training ensures we capture at least 1.5 full seasonal cycles (our products have 12-month seasonality)
  • 6 months validation lets us test model performance across half a seasonal cycle, including both peak and off-peak periods
  • Standard 80/20 would give only 4.8 months validation, which might miss critical seasonal transitions

For non-seasonal products, 80/20 would work fine, but since 70% of our portfolio is seasonal, we optimized for that majority.

IQR Cleansing Configuration Details:


// Pseudocode - IQR outlier detection and cleansing:
1. Calculate Q1 (25th percentile) and Q3 (75th percentile) of demand
2. IQR = Q3 - Q1
3. Lower_Bound = Q1 - (1.5 × IQR)
4. Upper_Bound = Q3 + (1.5 × IQR)
5. For each demand value:
   IF demand < Lower_Bound OR demand > Upper_Bound
      Replace with median demand for that month
   ELSE keep original value
// Applied before model training, not on forecast output

The key is that cleansing happens on historical data BEFORE model training, not on the generated forecast. This prevents extreme historical outliers from skewing model parameters while preserving the forecast’s ability to predict high/low demand within normal ranges.

Inventory Optimization Integration: Translating forecast accuracy into inventory reduction required several changes:

  1. Dynamic Safety Stock: We implemented a formula that adjusts safety stock based on forecast error:

    • Safety Stock = Z-score × Forecast Error Standard Deviation × √(Lead Time)
    • As forecast accuracy improved (lower error standard deviation), safety stock automatically decreased
    • We monitor this monthly and saw gradual reduction over 6 months as confidence in forecasts grew
  2. Service Level Differentiation: With better forecast accuracy, we could afford to lower safety stock on C-items (low-value products) while maintaining or increasing safety stock on A-items (high-value, high-velocity). This optimized inventory investment.

  3. Replenishment Frequency: Improved forecast accuracy enabled more frequent, smaller replenishment orders. Instead of ordering monthly to buffer forecast uncertainty, we moved to bi-weekly orders for fast-moving items, reducing average inventory levels.

  4. Forecast Value-Add (FVA) Tracking: We implemented FVA metrics to measure whether manual forecast overrides improved or worsened the statistical forecast. This helped us identify where planners should intervene vs. trust the model, further improving effective accuracy.
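The dynamic safety stock in point 1 can be sketched as follows. Note the √(lead time) scaling, which applies when the error standard deviation is measured per period; the z-scores are standard normal quantiles for common cycle service levels, and the before/after error figures are made up to show the mechanism, not taken from the post.

```python
import math

# Standard normal quantiles for common cycle service levels
Z_SCORES = {0.90: 1.2816, 0.95: 1.6449, 0.96: 1.7507, 0.99: 2.3263}

def safety_stock(service_level, error_std_per_period, lead_time_periods):
    """SS = z * sigma_error * sqrt(lead time), sigma_error per period."""
    return Z_SCORES[service_level] * error_std_per_period * math.sqrt(lead_time_periods)

# Illustrative: as forecast accuracy improves, sigma_error falls and
# the safety stock target shrinks automatically at the same service level
before = safety_stock(0.96, error_std_per_period=120.0, lead_time_periods=4)
after = safety_stock(0.96, error_std_per_period=70.0, lead_time_periods=4)
print(round(before), round(after), f"{1 - after / before:.0%} reduction")
```

This is the mechanism behind the "automatic" reduction: nothing in the formula is re-tuned by hand, the inventory target simply tracks the measured error.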

Best-Fit Model Accuracy Improvement Analysis: To quantify best-fit’s impact, we compared three scenarios:

  • Baseline: Single model (exponential smoothing) applied to all products = 58% accuracy
  • Manual selection: Planners chose models based on product knowledge = 64% accuracy
  • Best-fit automated: System selected optimal model per product = 76% accuracy

The 18-point improvement over baseline and 12-point improvement over manual selection validated the best-fit approach. The time savings (15 hours/week) came primarily from eliminating manual model selection and tuning.

Lessons Learned:

  1. Data Quality Critical: Best-fit only works with clean historical data. We spent 3 weeks cleaning demand history before implementation - removing duplicate orders, correcting data entry errors, aligning promotional events. Without this prep, best-fit would select models based on bad data.

  2. Change Management: Planners initially resisted automated model selection, feeling it reduced their control. We addressed this by:

    • Showing accuracy improvement data
    • Allowing manual overrides when planners had market intelligence
    • Implementing FVA to prove when overrides helped vs. hurt
    • Repositioning planners as “forecast analysts” who interpret and adjust, not just create forecasts
  3. Continuous Monitoring: Best-fit isn’t “set and forget.” We have monthly reviews where we analyze:

    • Which products have declining accuracy (need re-evaluation)
    • Which models are most commonly selected (informs data patterns)
    • Forecast bias by product family (systematic over/under forecasting)
    • Outlier cleansing effectiveness (are we removing too much or too little?)
  4. Integration with S&OP: The accuracy improvement enabled better S&OP discussions. Instead of debating whether the forecast is accurate, we now focus on strategic decisions like capacity allocation, new product launches, and market expansion.

Overall, implementing best-fit with IQR cleansing transformed our demand planning from an art (manual, inconsistent) to a science (systematic, measurable) while preserving the planner’s role in applying business judgment where it adds value.

Your inventory reduction while maintaining service level is impressive. Can you share more about how you translated improved forecast accuracy into inventory optimization decisions? Did you adjust safety stock calculations based on the new forecast accuracy levels? We’re seeing accuracy improvements but haven’t yet realized the inventory benefits.

Good question. We actually tested 1.5x, 2.0x, and 2.5x multipliers. The 1.5x worked best for our business because we have a separate promotional forecast stream. Promotional demand spikes are planned separately and added to the baseline statistical forecast. So IQR cleansing removes unplanned spikes (data errors, one-time bulk orders) but doesn’t affect planned promotional events. If you need to capture all demand signals in your statistical forecast, 2.0x or 2.5x might be better to avoid over-cleansing.
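For anyone wanting to run the same multiplier comparison on their own history, a quick sketch that counts how many points each multiplier would flag; the demand series is illustrative, with one moderate promo-like spike (130) and two extreme spikes:

```python
import numpy as np

def count_outliers(demand, multiplier):
    """Number of points falling outside Q1/Q3 +/- multiplier*IQR."""
    demand = np.asarray(demand, dtype=float)
    q1, q3 = np.percentile(demand, [25, 75])
    iqr = q3 - q1
    outside = (demand < q1 - multiplier * iqr) | (demand > q3 + multiplier * iqr)
    return int(outside.sum())

history = [100, 104, 98, 310, 102, 95, 130, 101, 99, 260, 103, 97]
for m in (1.5, 2.0, 2.5):
    print(m, count_outliers(history, m))
```

On this data 1.5x flags the moderate 130 spike along with the extremes, while 2.0x and 2.5x keep it - which is the trade-off described above: tighter multipliers suit a setup with a separate promotional stream, looser ones preserve borderline demand signals in the statistical baseline.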