RPA bots keep breaking in production—how are you handling drift and conformance violations?

We’ve been running RPA in finance ops for about 18 months now and the pattern is always the same: bots work great in testing, run fine for a few weeks in production, then start failing in unpredictable ways. Invoice processing halts when a vendor changes their line-item format slightly. Purchase order routing breaks when a new supplier requires different approvals. Financial close workflows just stop when somebody restructures accounts.

We’ve got process mining in place and it’s surfaced tons of conformance violations—turns out the way people actually execute these workflows is pretty different from what we assumed when we built the bots. But knowing about the drift and actually preventing the breakage are two different problems. We’re stuck in a loop where manual workarounds become the safety valve every time automation hits an edge case, which kind of defeats the ROI we were aiming for.

Curious how others are managing this in production environments. Are you catching drift early enough to fix bots before they fail? What does your governance structure look like for handling these conformance issues? And how do you balance deterministic automation with the reality that business processes are messy and constantly changing?

We hit this exact problem last year with EDI integrations. A missing hyphen in a part code would kill the whole batch. What helped was shifting from perfect-data assumptions to building resilience into the bots themselves—basic format validation and fallback logic before the bot tries to process each record. It doesn't solve every case, but it cuts down the silent failures dramatically.
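To make that concrete, here's a minimal sketch of the validate-then-fallback pattern. The part-code format, field names, and the missing-hyphen repair are all hypothetical stand-ins for whatever your EDI feed actually uses—the point is that unrecognizable records go to a review queue instead of killing the batch:

```python
import re
from typing import Optional

# Hypothetical expected format: two letters, hyphen, four digits (e.g. "AB-1234").
PART_CODE_RE = re.compile(r"^[A-Z]{2}-\d{4}$")

def normalize_part_code(raw: str) -> Optional[str]:
    """Try to coerce a raw part code into the expected format.

    Returns the normalized code, or None if it can't be repaired,
    in which case the caller routes the record to manual review
    rather than failing the whole batch.
    """
    code = raw.strip().upper()
    if PART_CODE_RE.match(code):
        return code
    # Known breakage mode from the post: the hyphen is simply missing ("AB1234").
    repaired = re.sub(r"^([A-Z]{2})(\d{4})$", r"\1-\2", code)
    if PART_CODE_RE.match(repaired):
        return repaired
    return None

def process_batch(records):
    """Split a batch into processable rows and manual-review rows."""
    ok, review = [], []
    for rec in records:
        code = normalize_part_code(rec["part_code"])
        if code is None:
            review.append(rec)  # fallback path: quarantine, don't crash
        else:
            ok.append({**rec, "part_code": code})
    return ok, review
```

The key design choice is that validation failure is a routing decision, not an exception—the bot keeps processing the rest of the batch while the edge cases land in a queue a human can clear.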