How We Kept Demand Forecasts Alive During Supplier Shocks

We rolled out ML-based demand forecasting across our regional distribution network about eighteen months ago, and for the first few quarters everything looked solid—accuracy was holding in the mid-nineties, inventory turns improved, and the planning team was finally getting away from endless spreadsheet firefights. Then last spring a combination of tariff changes and two tier-2 supplier exits hit us inside the same month, and our forecast accuracy dropped nearly fifteen points in six weeks. Orders we thought were reliable suddenly weren’t, lead times that had been stable for years stretched out, and the model just kept serving up optimistic numbers that didn’t match reality.

What saved us was that we’d built continuous monitoring into the deployment from day one. We were tracking not just overall error but also error segmented by supplier, region, and product family, so when the drift started we caught it fast. We had automated retraining pipelines in place, but we still had to make judgment calls: do we retrain on the disrupted data and risk baking in temporary shocks, or do we wait and lose more ground? We ended up doing both: a short-cycle retrain with weighted recent data to capture the new normal, and then scenario modeling to stress-test the refreshed model against different recovery timelines.
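
To make the "weighted recent data" part concrete, here's a simplified sketch of what the short-cycle retrain looked like in spirit. This is illustrative rather than our production code: the column names (`order_date`, `demand`), the model choice, and the 28-day half-life are stand-ins.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

def recency_weights(dates: pd.Series, half_life_days: float = 28.0) -> np.ndarray:
    """Exponential decay weights: an observation half_life_days old
    counts half as much as one from today."""
    age_days = (dates.max() - dates).dt.days.to_numpy()
    return np.power(0.5, age_days / half_life_days)

def short_cycle_retrain(df: pd.DataFrame, feature_cols: list[str]) -> GradientBoostingRegressor:
    # Weight post-shock observations more heavily so the model adapts to the
    # new lead-time regime without throwing away pre-shock history entirely.
    weights = recency_weights(df["order_date"])
    model = GradientBoostingRegressor(n_estimators=300, max_depth=4)
    model.fit(df[feature_cols], df["demand"], sample_weight=weights)
    return model
```

The half-life is the lever here: shorter and you chase the shock, longer and you lag the recovery, which is exactly the judgment call we had to make.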

In the end we stabilized around 91% accuracy within eight weeks and avoided the inventory pileups that hit some of our competitors. The big lesson for us was that model drift isn’t just a technical problem—it’s an operational one. You need the infrastructure to detect it, the process to respond fast, and the cross-functional trust so that planners, procurement, and data teams can make calls together when things go sideways.

The continuous retraining setup is key. We’re still on quarterly manual retrains and it’s killing us—every time something shifts in the market we’re weeks behind. What kind of pipeline orchestration are you running? Are you doing fully automated retrains on a schedule, or event-driven based on drift signals? And how do you handle the tradeoff between retraining fast and validating that the retrained model isn’t worse?

For monitoring we segment by supplier tier, geography, and product velocity—high-velocity SKUs get tighter thresholds because errors propagate faster. We don’t have one global trigger; instead we weight the segments and use a composite score. On the retraining side, we run event-driven retrains when drift crosses segment thresholds, but every retrained model goes through a holdout validation and a business review before it goes live. That adds a day or two, but it’s worth it to catch cases where the new model fixes one problem and creates another.
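
Roughly, the composite trigger and the validation gate look like the sketch below; the segment weights, thresholds, and tolerance value are placeholders, not our tuned numbers.

```python
import pandas as pd

# Per-segment thresholds: tighter for high-velocity SKUs because forecast
# errors there compound into stockouts or pileups faster. (Placeholder values.)
SEGMENT_WEIGHTS = {"high_velocity": 0.5, "mid_velocity": 0.3, "low_velocity": 0.2}
SEGMENT_THRESHOLDS = {"high_velocity": 0.08, "mid_velocity": 0.12, "low_velocity": 0.20}
COMPOSITE_THRESHOLD = 0.10

def wape(actual: pd.Series, forecast: pd.Series) -> float:
    """Weighted absolute percentage error for one segment."""
    return float((actual - forecast).abs().sum() / actual.abs().sum())

def drift_check(errors_by_segment: dict[str, float]) -> bool:
    """Event-driven trigger: fire a retrain if any segment breaches its own
    threshold, or the weighted composite breaches the global one."""
    composite = sum(SEGMENT_WEIGHTS[s] * e for s, e in errors_by_segment.items())
    segment_breach = any(e > SEGMENT_THRESHOLDS[s] for s, e in errors_by_segment.items())
    return segment_breach or composite > COMPOSITE_THRESHOLD

def validation_gate(new_wape: float, live_wape: float, tolerance: float = 0.02) -> bool:
    """Holdout gate: the retrained model must not be meaningfully worse than
    the live model before it's eligible for business review."""
    return new_wape <= live_wape + tolerance
```

The gate deliberately scores the candidate and the live model on the same holdout window, which is how we catch the case where a retrain fixes one segment at the expense of another.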

We spent six months on data normalization before we even started model development—painful at the time, but it paid off. Supplier codes, lead time definitions, and cost hierarchies were all over the place. If you try to do ML on top of messy data, you’ll spend all your time chasing phantom drift signals that are really just data inconsistencies. Get the data foundation solid first, even if it delays the AI project.
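
To give a flavor of what that cleanup involved, here's a stripped-down version of the kind of normalization pass we ran; the alias table and column names are made up for illustration.

```python
import pandas as pd

# Canonical supplier-code mapping built during the cleanup. In practice this
# table came from procurement, not string heuristics. (Illustrative entries.)
SUPPLIER_ALIASES = {
    "ACME-01": "ACME", "ACME_INTL": "ACME",
    "GLOBEX LTD": "GLOBEX", "GLOBEX-EU": "GLOBEX",
}

def normalize_orders(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # One canonical code per supplier, so drift metrics aggregate correctly
    # instead of splitting one supplier's signal across three spellings.
    out["supplier"] = out["supplier"].str.strip().str.upper().map(
        lambda s: SUPPLIER_ALIASES.get(s, s)
    )
    # One lead-time definition everywhere: calendar days from PO issue to
    # dock receipt. Some of our source systems had logged ship date instead.
    out["lead_time_days"] = (out["receipt_date"] - out["po_date"]).dt.days
    return out
```

Three spellings of one supplier code look exactly like drift in segmented error metrics, which is why we insisted on doing this before turning monitoring on.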