We’re piloting AI-driven test case generation from user stories and acceptance criteria, but we’re running into a foundational problem: the quality of our requirements is all over the place. Some stories are crystal clear with measurable acceptance criteria; others are vague wish lists. When we feed these into the AI pipeline, the outputs reflect that inconsistency: some generated tests are spot-on, others completely miss the point or test the wrong behavior.
We ran an audit on about 500 existing stories and found that roughly 30% had ambiguous language, missing subjects or outcomes, or acceptance criteria that were essentially untestable. We looked at tools that score requirements against standards like INCOSE or EARS notation, but we’re unclear on the best sequencing: should we clean up the backlog first and then turn on AI test generation, or can we run both in parallel with some kind of quality gate in between?
Has anyone dealt with this chicken-and-egg problem? What’s the minimum quality threshold you’d recommend before feeding requirements into AI tooling, and how do you maintain that quality as new stories get added every sprint?
What scoring framework did you settle on? We’ve been debating whether to enforce EARS notation strictly or just focus on clarity and completeness. Some of our teams push back on EARS because they find it too rigid for exploratory work, but without some standard it’s hard to automate quality checks consistently.
We use a hybrid model. Core functional requirements follow EARS patterns because they feed into safety and compliance validation downstream. For experimental or UX-focused stories, we relax the syntax rules but still enforce completeness checks—every story must have a defined persona, a measurable outcome, and at least one testable acceptance criterion. That way we get consistency where it matters without killing creativity in early discovery work. The NLP tool we use lets you configure different rule sets per project or epic, which has been helpful.
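To make the hybrid model concrete, here’s a minimal sketch of how per-project rule sets could be keyed by epic type. The names, fields, and defaults are all illustrative; our actual tool has its own config schema, and this is just the shape of the idea.

```python
from dataclasses import dataclass

# Hypothetical rule-set definition; field names are illustrative,
# not our NLP tool's real configuration schema.
@dataclass(frozen=True)
class RuleSet:
    enforce_ears_syntax: bool       # strict EARS patterns (When/While ... shall ...)
    require_persona: bool           # story must name a defined persona
    require_measurable_outcome: bool
    min_testable_criteria: int      # at least this many testable acceptance criteria

RULE_SETS = {
    # Core functional stories feed safety/compliance validation downstream.
    "core-functional": RuleSet(True, True, True, 1),
    # Discovery/UX stories relax syntax but keep completeness checks.
    "ux-discovery": RuleSet(False, True, True, 1),
}

def rules_for(epic_label: str) -> RuleSet:
    # Fall back to the strict set when an epic isn't explicitly mapped.
    return RULE_SETS.get(epic_label, RULE_SETS["core-functional"])
```

The useful property is the fallback: unmapped epics default to the strict rules, so nothing slips through just because someone forgot to classify it.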
Agree with the phased approach. We did a one-time backlog cleanup sprint where we bulk-analyzed about 800 stories, prioritized the 200 most critical ones for manual review, and auto-closed or archived anything older than six months with a quality score below 50%. It was painful but necessary. Once we had a clean baseline, maintaining quality with each new story became manageable. The ROI showed up fast—our AI-generated test accuracy jumped from about 60% usable to over 85% within two months.
We hit this exact issue about six months ago. Our approach was to establish a two-phase gate: first, run all new and modified stories through an NLP-based quality checker that flags ambiguity, missing elements, and non-INVEST compliance. Only stories scoring above 75% pass through to the AI test generation pipeline. Anything below that threshold gets bounced back to the product owner with specific feedback on what needs fixing. It added maybe 10 minutes per story initially, but within a few sprints authors started internalizing the patterns and our pass rate jumped to about 90%. The key was making the feedback actionable—not just ‘this is unclear,’ but ‘the outcome clause is missing’ or ‘replace vague quantifiers like “some” with a measurable condition.’
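The gate logic is simpler than it sounds. Below is a minimal sketch assuming a keyword-based checker; our real tool does proper NLP, but the bounce-back mechanics and the actionable-feedback format look roughly like this. The word lists, patterns, and the 75% threshold here are illustrative.

```python
import re

# Illustrative rule set: vague quantifiers to flag and the clauses
# every story must contain. A real NLP checker is far more thorough.
VAGUE_QUANTIFIERS = {"some", "several", "many", "most", "fast", "quickly"}
REQUIRED_PATTERNS = {
    "persona clause": re.compile(r"\bAs an?\b", re.IGNORECASE),
    "action clause": re.compile(r"\bI want\b", re.IGNORECASE),
    "outcome clause": re.compile(r"\bso that\b", re.IGNORECASE),
}
PASS_THRESHOLD = 0.75  # stories below this bounce back to the product owner

def score_story(text: str) -> tuple[float, list[str]]:
    """Return (quality score, actionable feedback) for a single story."""
    feedback = []
    checks = passed = 0

    for name, pattern in REQUIRED_PATTERNS.items():
        checks += 1
        if pattern.search(text):
            passed += 1
        else:
            feedback.append(f"the {name} is missing")

    checks += 1
    words = {w.strip(".,").lower() for w in text.split()}
    vague = words & VAGUE_QUANTIFIERS
    if vague:
        feedback.append(
            f"replace vague quantifier(s) {sorted(vague)} with a measurable condition"
        )
    else:
        passed += 1

    return passed / checks, feedback

story = "Checkout should be fast for most users."
score, notes = score_story(story)
if score < PASS_THRESHOLD:
    print(f"Bounced ({score:.0%}): " + "; ".join(notes))
```

The point is the feedback list: the product owner gets ‘the outcome clause is missing’ rather than a bare failing score, which is what drove our pass rate up over a few sprints.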
Just a word of caution from our experience: we tried running quality scoring and AI test generation in parallel without a hard gate, thinking we’d just flag low-quality outputs for review. What ended up happening was that teams got flooded with generated tests that looked technically correct but were testing the wrong behaviors because the underlying requirement was ambiguous. The rework cost was higher than if we’d just paused and cleaned the backlog first. If I were doing it again, I’d set a baseline cleanup sprint before turning on any downstream AI automation.
Worth noting that you’ll probably need to tune your quality thresholds by domain. We found that integration stories (crossing multiple systems) needed stricter scoring than UI polish stories. If the requirement touches finance or compliance logic, we enforce a 90% minimum. For internal tooling improvements, 70% is acceptable. It’s not one-size-fits-all, and trying to enforce uniform rules across wildly different story types just creates friction.
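In practice this tiering is just a lookup in front of the gate. A sketch, assuming the same 0-to-1 score the checker produces; the domain labels and the integration tier value are hypothetical (the thread above only pins down 90% for finance/compliance and 70% for internal tooling).

```python
# Minimum quality score per story domain. The 90%/70% tiers match what
# we use; "integration" at 0.80 is an illustrative middle tier.
DOMAIN_THRESHOLDS = {
    "finance": 0.90,
    "compliance": 0.90,
    "integration": 0.80,       # cross-system stories get stricter scoring
    "internal-tooling": 0.70,
}
DEFAULT_THRESHOLD = 0.75       # anything unlabeled gets the baseline gate

def passes_gate(score: float, domain: str) -> bool:
    """Decide whether a scored story may enter the AI test-generation pipeline."""
    return score >= DOMAIN_THRESHOLDS.get(domain, DEFAULT_THRESHOLD)
```

Keeping the thresholds in data rather than code also makes the inevitable per-team negotiations a config change instead of a process fight.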