We’re at a bit of a crossroads with our AI adoption roadmap. Leadership is pushing hard for generative AI capabilities in procurement – supplier recommendations, contract analysis, risk scoring – but every time we scope a pilot, we hit the same wall: our data is a mess. We’ve got duplicate supplier records across three legacy systems, product categories that don’t align between regions, and spend data that’s inconsistent enough that finance and procurement teams argue over the numbers monthly.
The frustrating part is that we know what we want AI to do. We’ve seen demos, we’ve budgeted for tools, we even hired a data scientist. But when she pulled the data for a demand forecasting pilot, she spent three weeks just trying to figure out which supplier names are actually the same entity and which product codes are still active. It feels like we’re trying to build on quicksand.
I’m curious how others have approached this. Do you tackle master data quality as a separate initiative before touching AI, or do you use AI projects as the forcing function to finally clean things up? And if you did go the master-data-first route, how did you scope it without it turning into a multi-year boil-the-ocean exercise? What actually moved the needle for you?
We made the mistake of trying to do master data cleanup in parallel with an ERP upgrade. Bad idea. The ERP migration timeline kept slipping, data standards kept changing, and we ended up doing remediation work twice. If I had to do it again, I’d stabilize the data first, validate it in the current systems, then migrate clean data to the new platform. Sequential, not parallel.
The key is governance before technology. We set up a data governance committee with clear domain ownership – someone from procurement owns supplier master, someone from product management owns the product catalog, etc. Each domain lead was responsible for defining standards and cleaning their area. It sounds bureaucratic but it’s the only way to make it stick. Without ownership, you clean the data once and six months later it’s a disaster again because people keep entering junk.
Have you looked at using AI itself to clean the data? We deployed an LLM-based matching tool that automatically detected duplicate supplier records by comparing names, addresses, and tax IDs even when the formatting was wildly inconsistent. It flagged thousands of likely matches and we had stewards review and confirm them. Cut our remediation time in half. You still need humans in the loop for edge cases, but the automation handles the bulk work.
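To make the matching idea concrete: here is a minimal rule-based sketch of the same flag-then-review workflow, using Python's standard library rather than an LLM. The field names, the shared-tax-ID rule, and the 0.85 similarity threshold are illustrative assumptions, not the actual tool described above.

```python
# Sketch of fuzzy supplier matching: flag likely duplicates for steward
# review. Rule-based stand-in for the LLM tool; fields/thresholds assumed.
from difflib import SequenceMatcher

def normalize(s):
    """Lowercase and drop punctuation so formatting noise doesn't block matches."""
    return "".join(ch for ch in s.lower() if ch.isalnum() or ch == " ").strip()

def similarity(a, b):
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def candidate_duplicates(records, name_threshold=0.85):
    """Return (id, id, reason) pairs for human review. A shared tax ID is
    treated as a near-certain match; otherwise fall back on name similarity."""
    flagged = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            a, b = records[i], records[j]
            if a.get("tax_id") and a.get("tax_id") == b.get("tax_id"):
                flagged.append((a["id"], b["id"], "tax_id match"))
            elif similarity(a["name"], b["name"]) >= name_threshold:
                flagged.append((a["id"], b["id"], "name similarity"))
    return flagged

records = [
    {"id": 1, "name": "Acme Industrial Supply, Inc.", "tax_id": "12-3456789"},
    {"id": 2, "name": "ACME INDUSTRIAL SUPPLY INC",   "tax_id": ""},
    {"id": 3, "name": "Acme Indust. Supply",          "tax_id": "12-3456789"},
    {"id": 4, "name": "Bolt & Nut Trading GmbH",      "tax_id": "98-7654321"},
]
for a, b, reason in candidate_duplicates(records):
    print(a, b, reason)
```

The point of returning reasons rather than auto-merging is exactly the human-in-the-loop step: stewards see *why* a pair was flagged and confirm or reject it.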
I’d also recommend staging your AI adoption around data readiness. Start with use cases that are more forgiving of messy data – maybe anomaly detection in financial transactions where the model can learn patterns even if some records are incomplete. Once you’ve got a win and credibility, use that momentum to fund the deeper master data work for more demanding applications like predictive maintenance or autonomous procurement. You need quick wins to keep executive support, but you also need to be transparent about what’s possible with current data quality versus what requires investment.
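As a sketch of why anomaly detection is forgiving of messy data: a robust statistic like the median absolute deviation still works when some records are incomplete, because missing values can simply be skipped. Field names and the deviation threshold below are illustrative assumptions.

```python
# Minimal anomaly-detection sketch over spend transactions that tolerates
# incomplete records (assumed field names; k is an illustrative threshold).
from statistics import median

def flag_anomalies(transactions, k=3.0):
    """Flag transactions whose amount deviates from the median by more than
    k times the median absolute deviation (MAD). Records with a missing
    amount are skipped rather than failing the whole analysis."""
    amounts = [t["amount"] for t in transactions if t.get("amount") is not None]
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts) or 1.0  # guard against zero MAD
    return [t for t in transactions
            if t.get("amount") is not None and abs(t["amount"] - med) / mad > k]

txns = [
    {"id": "T1", "amount": 1200.0},
    {"id": "T2", "amount": 1350.0},
    {"id": "T3", "amount": None},      # incomplete record: skipped, not fatal
    {"id": "T4", "amount": 1280.0},
    {"id": "T5", "amount": 98000.0},   # clear outlier
]
print([t["id"] for t in flag_anomalies(txns)])  # → ['T5']
```

Contrast this with a forecasting model, where the skipped record T3 would be a missing observation that biases the output; that asymmetry is why anomaly detection makes a better first pilot on imperfect data.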
Something we learned: don’t underestimate the change management piece. You can build the best data governance model in the world, but if the procurement team in APAC doesn’t understand why they need to follow new supplier creation rules or if sales keeps bypassing validation to close deals faster, the data degrades immediately. We ran training sessions, built quick reference guides, and most importantly, made it easy to do the right thing – validation rules that catch errors at entry, not six months later in a report.
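The "catch errors at entry" rule can be as simple as a validation function that blocks a supplier record from being created until it passes. A minimal sketch follows; the field names, country codes, and tax-ID formats are assumptions for illustration, not anyone's actual standard.

```python
# Sketch of entry-time validation for new supplier records: reject junk at
# creation instead of finding it in a report months later. Rules assumed.
import re

REQUIRED = ("name", "country", "tax_id")
TAX_ID_PATTERNS = {                      # illustrative per-country formats
    "US": re.compile(r"^\d{2}-\d{7}$"),
    "DE": re.compile(r"^DE\d{9}$"),
}

def validate_supplier(record):
    """Return a list of human-readable errors; an empty list means the
    record may be created."""
    errors = []
    for field in REQUIRED:
        if not record.get(field, "").strip():
            errors.append(f"missing required field: {field}")
    pattern = TAX_ID_PATTERNS.get(record.get("country", ""))
    if pattern and record.get("tax_id") and not pattern.match(record["tax_id"]):
        errors.append("tax_id does not match expected format for country")
    return errors

print(validate_supplier({"name": "Acme Inc", "country": "US", "tax_id": "12-3456789"}))  # → []
print(validate_supplier({"name": "", "country": "US", "tax_id": "123"}))
```

Returning readable error messages, rather than a bare pass/fail, is part of "making it easy to do the right thing": the person entering the record sees exactly what to fix.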
We were in almost the exact same spot last year. Tried to pilot an AI tool for supplier risk assessment and discovered we had the same supplier listed forty-seven times with different spellings and legal entities. What worked for us was picking one high-value use case – in our case, strategic sourcing analytics – and doing targeted remediation just for the data domains that mattered for that pilot. We didn’t try to fix everything. Just suppliers, categories, and twelve months of spend data. Took about eight weeks with a small team, but the pilot actually worked and that built momentum.