How do you deal with false positives killing operator trust in AI alerts?

We rolled out an AI-based anomaly detection system in our MES a few months ago to catch equipment issues early and reduce unplanned downtime. The model performance looked solid in testing—around 92% accuracy—and leadership was excited. But now we’re seeing a real problem on the floor: operators are starting to ignore the alerts because we’re getting too many false positives.

A typical example: the system flags a vibration spike on a motor, maintenance goes to check it, and everything is fine. Or it triggers an alert during a normal product changeover because it thinks the temperature pattern is unusual. After a few weeks of this, the experienced crew just started dismissing warnings without checking. One supervisor told me bluntly that the system “cries wolf” and wastes their time. Now I’m worried we’re conditioning people to ignore alerts right when we need them to pay attention.

We’ve tried tuning thresholds and adding some process context like production schedules, but it hasn’t been enough. The alerts still don’t match what operators actually see as problems. How do other teams handle this? What kind of data or context integration actually moves the needle on false positives, and how do you rebuild trust once operators have already lost confidence in the system?

This is the number one issue we see in predictive maintenance rollouts. The biggest lesson we learned: vibration data alone is powerful but ambiguous. A spike can mean misalignment, lubrication issues, load changes, or just how an operator ran the equipment that shift. Without understanding operating conditions—speed, load, product type, recent maintenance—the system is basically guessing. We started integrating process data from our control systems and ERP (what product was running, what the setpoints were, ambient temp) and false positives dropped significantly. It’s not perfect, but operators now see alerts that make sense in context, and that’s when trust started coming back.
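The core idea above, baselining vibration per operating context instead of per machine, can be sketched in a few lines. This is a simplified stand-in (hypothetical readings, a plain z-score test) rather than our production model, but it shows why the same reading can be normal in one context and a genuine outlier in another:

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical readings: (product, speed_band, vibration in mm/s).
readings = [
    ("A", "high", 4.1), ("A", "high", 4.3), ("A", "high", 4.0),
    ("A", "high", 4.2), ("B", "low", 1.1), ("B", "low", 1.0),
    ("B", "low", 1.2), ("B", "low", 0.9),
]

# Build a baseline per operating context (product + speed band)
# instead of one global threshold for the whole machine.
baselines = defaultdict(list)
for product, speed, vib in readings:
    baselines[(product, speed)].append(vib)

def is_anomalous(product, speed, vib, z_limit=3.0):
    """Flag only readings that deviate from their own context's baseline."""
    history = baselines.get((product, speed))
    if not history or len(history) < 2:
        return False  # not enough context yet: defer rather than alarm
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return vib != mu
    return abs(vib - mu) / sigma > z_limit

# 4.2 mm/s is routine while running product A at high speed...
print(is_anomalous("A", "high", 4.2))  # False
# ...but the same 4.2 mm/s during product B at low speed is a real outlier.
print(is_anomalous("B", "low", 4.2))   # True
```

A global threshold would have to either flag both cases or neither; conditioning on what the line was actually doing is what made alerts start matching operator intuition.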

One thing that helped us was changing how we presented the alerts. Instead of just flashing a warning on the HMI, we added a short explanation—basically why the system thinks this is a problem and what data it’s looking at. It sounds small, but operators stopped feeling like the system was a black box. They could see the logic, and if it didn’t match what they knew about the equipment, they’d flag it as feedback. That feedback loop also helped us tune the model faster because we were learning from the people who actually know the machines.
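The "explained alert" pattern is mostly a data-structure change. A minimal sketch (the field names and example values below are hypothetical, not our actual HMI schema): each alert carries its evidence, the operating context, and a slot for operator feedback that flows back into tuning.

```python
from dataclasses import dataclass

# Hypothetical alert structure: instead of a bare warning on the HMI,
# each alert carries the reasoning and the data behind it.
@dataclass
class ExplainedAlert:
    equipment: str
    message: str
    evidence: dict               # signal -> (observed, typical range in context)
    context: dict                # what the line was doing at the time
    operator_feedback: str = ""  # filled in on the floor, feeds model tuning

def render(alert: ExplainedAlert) -> str:
    """Format the alert so operators can see the logic, not a black box."""
    lines = [f"ALERT: {alert.equipment} - {alert.message}"]
    for signal, (observed, typical) in alert.evidence.items():
        lines.append(f"  {signal}: {observed} (typical for this context: {typical})")
    for key, value in alert.context.items():
        lines.append(f"  context/{key}: {value}")
    return "\n".join(lines)

alert = ExplainedAlert(
    equipment="Motor 12",
    message="Vibration above baseline for current operating mode",
    evidence={"vibration_mm_s": (6.8, "3.9-4.4")},
    context={"product": "A", "speed": "high", "last_maintenance": "2024-03-02"},
)
print(render(alert))
```

When the rendered context doesn't match what the crew knows (say, the system missed a changeover), their correction lands in `operator_feedback`, which is exactly the labeled data the model retraining needs.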