How do you actually keep RPA bots from breaking when processes drift?

We’ve got about 40 bots in production handling invoice processing, purchase requisitions, and parts of the financial close. Things ran smoothly for the first six months, but now we’re seeing constant breakage. A vendor changes an invoice format slightly—missing hyphen, extra space in a field—and the bot just stops. Someone updates an account structure in the ERP and three bots fail overnight. We’re spending more time fixing bots than we saved by deploying them.

We do have process mining running, and it shows us where execution is diverging from the baseline, but by the time we see it in the dashboard, the damage is done. We’ve tried building more error handling into the bots, but every edge case we address seems to create two new ones. Our CoE is constantly firefighting instead of scaling new use cases.

What’s worked for teams dealing with this? Do you rely on continuous monitoring with automated alerts, or is there a smarter way to design bots so they adapt instead of just failing? And how do you decide when a process is actually stable enough to automate in the first place?

We also struggled with this until we accepted that not everything should be fully automated. Some processes are inherently variable—vendor formats, approval routing when new suppliers come in, anything that touches external parties. For those, we shifted to human-in-the-loop: the bot handles the deterministic steps and flags exceptions for review. It’s slower than full automation, but it’s way more reliable and the team actually trusts it.
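To make the split concrete, here's roughly the shape of that deterministic-vs-exception routing as a Python sketch. All the names here (`Invoice`, `KNOWN_VENDORS`, `process`) are made up for illustration, not from any RPA platform's API:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Invoice:
    vendor: str
    total: float
    po_number: Optional[str]  # missing PO is a classic ambiguity signal

# Hypothetical allow-list of vendors whose formats we trust end-to-end.
KNOWN_VENDORS = {"ACME Corp", "Globex"}

def process(invoice: Invoice,
            review_queue: List[Invoice],
            post: Callable[[Invoice], None]) -> str:
    """Run the deterministic path; anything ambiguous goes to a human."""
    if invoice.vendor not in KNOWN_VENDORS or invoice.po_number is None:
        review_queue.append(invoice)  # flag for human review, don't fail
        return "queued"
    post(invoice)  # straight-through posting for the known-good path
    return "posted"
```

The point is that the bot never guesses: the only two outcomes are "posted automatically" or "queued for a person," so there's no third state where it half-processes something and dies.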

The invoice format issue is brutal. We’ve seen the same thing—one vendor adds a new line item structure and the bot can’t parse it. What helped us was building a pre-validation layer that checks incoming documents against expected schemas before the bot touches them. If something doesn’t match, it routes to a human queue instead of failing silently. It’s not perfect, but at least we catch issues before they cascade.
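For anyone wanting to try the pre-validation idea, a minimal version can be as simple as a per-field regex check run before the bot is invoked. The schema and field names below are hypothetical examples, not our actual production rules:

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical expected schema: field name -> regex the value must match.
INVOICE_SCHEMA = {
    "invoice_number": r"INV-\d{6}",
    "date": r"\d{4}-\d{2}-\d{2}",
    "total": r"\d+\.\d{2}",
}

def validate(doc: Dict[str, str], schema: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the bot may proceed."""
    problems = []
    for field, pattern in schema.items():
        value = doc.get(field)
        if value is None:
            problems.append(f"missing field: {field}")
        elif not re.fullmatch(pattern, value):
            # Catches the 'extra space / missing hyphen' class of drift.
            problems.append(f"unexpected format in {field}: {value!r}")
    return problems

def route(doc: Dict[str, str],
          schema: Dict[str, str],
          bot: Callable[[Dict[str, str]], None],
          human_queue: List[Tuple[Dict[str, str], List[str]]]) -> None:
    """Gatekeeper: only clean documents ever reach the bot."""
    problems = validate(doc, schema)
    if problems:
        human_queue.append((doc, problems))  # route to review with reasons
    else:
        bot(doc)
```

Attaching the list of problems to the queued item matters more than it looks: the reviewer sees exactly which field drifted, which is also how you notice a vendor change early enough to update the schema before ten more invoices pile up.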