We’ve got about 40 bots in production handling invoice processing, purchase requisitions, and parts of the financial close. Things ran smoothly for the first six months, but now we’re seeing constant breakage. A vendor changes an invoice format slightly (a missing hyphen, an extra space in a field) and the bot just stops. Someone updates an account structure in the ERP and three bots fail overnight. We’re spending more time fixing bots than we saved by deploying them.
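To make the invoice case concrete, here is a rough Python sketch of the kind of tolerance we are missing today. The `INV-2024-00123` format, the regex, and the function name `normalize_invoice_number` are all made up for illustration; our real vendor formats differ, and this is not how our bots are actually built.

```python
import re
from typing import Optional

# Hypothetical canonical shape "INV-2024-00123": 3 letters, 4-digit year, 5-digit sequence.
# This pattern is an assumption for the example, not a real vendor spec.
COMPACT_FORM = re.compile(r"^([A-Z]{3})(\d{4})(\d{5})$")


def normalize_invoice_number(raw: str) -> Optional[str]:
    """Return the canonical invoice number, or None if the value is not recognizable."""
    # Remove all whitespace and hyphens, then uppercase, so that cosmetic
    # differences (missing hyphen, stray space, lowercase) no longer matter.
    compact = re.sub(r"[\s\-]+", "", raw).upper()

    match = COMPACT_FORM.match(compact)
    if match is None:
        # Not a cosmetic variation: signal the caller instead of guessing.
        return None

    prefix, year, sequence = match.groups()
    return f"{prefix}-{year}-{sequence}"


if __name__ == "__main__":
    for value in ["INV-2024-00123", "INV 2024-00123 ", "inv202400123", "INV-24-123"]:
        print(repr(value), "->", normalize_invoice_number(value))
```

The first three inputs come out as the same canonical value, while the last one returns `None`, which a bot could treat as "send to a person" instead of stopping the whole run.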
We do have process mining running, and it shows us where execution is diverging from the baseline, but by the time we see it in the dashboard, the damage is done. We’ve tried building more error handling into the bots, but every edge case we address seems to create two new ones. Our CoE is constantly firefighting instead of scaling new use cases.
What’s worked for teams dealing with this? Do you rely on continuous monitoring with automated alerts, or is there a smarter way to design bots so they adapt instead of just failing? And how do you decide when a process is actually stable enough to automate in the first place?