We’re running daily builds across a pretty complex product suite—lots of microservices, multiple frontends, and integration points everywhere. Our full regression suite takes about 12 hours to run, which is killing our ability to ship faster. We’ve been looking at ML-based test prioritization to focus on high-risk areas instead of running everything on every commit.
We did a small pilot with impact analysis and risk prediction models trained on our historical defect data and code changes. The results were interesting—execution time dropped by maybe 50%, but we also missed a couple of issues that only showed up in tests we didn’t prioritize. Now leadership is nervous about skipping tests, and some engineers don’t trust the model’s recommendations. We’re still manually reviewing which tests to run, which defeats the purpose.
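For anyone curious what the core of an approach like this can look like: at its simplest, "impact analysis plus risk prediction" can be a co-failure index built from past CI runs, used to rank tests against the files in a commit. This is a minimal pure-Python sketch, not any team's actual pipeline, and all names here are illustrative:

```python
from collections import defaultdict

def build_cochange_index(history):
    """history: list of (changed_files, failed_tests) pairs from past CI runs.
    Counts how often each (file, test) pair co-occurred with a failure."""
    index = defaultdict(int)
    for changed_files, failed_tests in history:
        for f in changed_files:
            for t in failed_tests:
                index[(f, t)] += 1
    return index

def prioritize(changed_files, all_tests, index, budget):
    """Rank tests by historical co-failure with the changed files,
    returning only the top `budget` tests to run."""
    scores = {t: sum(index.get((f, t), 0) for f in changed_files)
              for t in all_tests}
    return sorted(all_tests, key=lambda t: scores[t], reverse=True)[:budget]
```

A real model would use richer features (change size, code ownership, test age), but even a frequency index like this makes the "we missed issues in unprioritized tests" failure mode concrete: any test with no co-failure history scores zero and never gets picked.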
How are other teams handling this? Do you run prioritized tests on every commit and full regression on a schedule, or have you found a way to build confidence in the model’s decisions? What metrics convinced your stakeholders that AI prioritization was safe?
What’s your governance process for updating the model? We found that the model degrades over time as the codebase evolves—tests that were high-priority six months ago aren’t anymore, and new high-risk areas emerge. We retrain quarterly using the latest data, and we have a monthly review where QA and dev leads can flag tests that should be manually prioritized regardless of what the model says. That manual override option was critical for getting buy-in.
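The manual-override mechanism described above is simple enough to sketch. Assuming the override list is just a set of test names that QA/dev leads maintain (hypothetical shape, for illustration only), the merge with the model's ranking could look like:

```python
def apply_overrides(model_ranking, always_run, budget):
    """Force manually flagged tests to the front of the selection,
    then fill the remaining budget from the model's ranking."""
    # Flagged tests the model already ranked, in model order
    forced = [t for t in model_ranking if t in always_run]
    # Flagged tests the model dropped entirely -- include them anyway
    forced += [t for t in always_run if t not in model_ranking]
    rest = [t for t in model_ranking if t not in always_run]
    return (forced + rest)[:budget]
```

The key property for buy-in is the second step: a flagged test gets run even when the model no longer ranks it at all, which is exactly the degradation case a quarterly retrain can miss.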
How are you handling flaky tests in the prioritization? We had issues where the model kept flagging certain tests as high-priority because they failed frequently, but they were actually just flaky—timeouts, race conditions, environment issues. We ended up building a separate flakiness detection layer that quarantines unreliable tests before they even reach the prioritization model. Otherwise the model just amplifies the noise.
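One common flakiness signal, sketched here as a minimal illustration (not necessarily what this poster built), is the flip rate: how often a test's result changes between consecutive runs. A genuinely broken test fails consistently; a flaky one alternates.

```python
def flip_rate(outcomes):
    """Fraction of consecutive runs where a test's result flipped.
    outcomes: list of booleans (True = pass), oldest to newest."""
    if len(outcomes) < 2:
        return 0.0
    flips = sum(1 for a, b in zip(outcomes, outcomes[1:]) if a != b)
    return flips / (len(outcomes) - 1)

def quarantine(test_outcomes, threshold=0.3):
    """Return the set of tests whose results flip too often to be
    trusted as a failure signal for the prioritization model."""
    return {t for t, runs in test_outcomes.items()
            if flip_rate(runs) > threshold}
```

Filtering these out before training is what prevents the amplification loop described above: a test that flips constantly racks up failures, which the model reads as "high risk" unless the flakiness layer removes it first.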
From the dev side, the biggest benefit was getting test results back in under an hour instead of waiting until the next morning. We can fix issues the same day we introduce them, which makes a massive difference in productivity. But I agree with the trust issue—early on, I didn’t understand why certain tests were prioritized, and I’d manually trigger full runs just to be safe. Once the QA team added explanations to the CI feedback (“these tests were prioritized because you changed X and historically that affects Y”), I stopped second-guessing it.
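Those "changed X, historically affects Y" explanations don't require anything fancy if the prioritization is history-based: the same co-failure counts that drive the ranking can drive the message. A rough sketch, assuming a plain dict mapping (file, test) pairs to co-failure counts (illustrative names throughout):

```python
def explain(test, changed_files, index, min_count=1):
    """Build a human-readable reason from the co-failure history
    used for ranking. index maps (file, test) pairs to counts."""
    culprits = [(f, index.get((f, test), 0)) for f in changed_files]
    culprits = [(f, n) for f, n in culprits if n >= min_count]
    if not culprits:
        return f"{test}: selected by model (no direct co-failure history)"
    top_file, count = max(culprits, key=lambda fn: fn[1])
    return (f"{test}: prioritized because you changed {top_file}, "
            f"which co-failed with this test {count} time(s) historically")
```

Surfacing the evidence rather than just the score is what turns "the model said so" into something an engineer can sanity-check in the CI output.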
We tried a hybrid approach: AI prioritization for PRs and feature branches, but we always run the full suite before merging to main and before any production deployment. The risk is just too high to skip comprehensive testing at those gates. The AI helps us iterate faster during development, but we don’t rely on it for final validation. It’s more about speed during the dev cycle than replacing full coverage.
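The gate policy above reduces to a small routing decision in CI. As a trivial sketch (event and branch names are placeholders, not tied to any particular CI system):

```python
def select_suite(event, branch, prioritized, full):
    """Full suite at merge/deploy gates; prioritized subset
    for PRs and feature-branch pushes."""
    if branch == "main" or event in {"merge", "deploy"}:
        return full
    return prioritized
```

Keeping the full suite at the gates is what caps the downside: a defect the model misses on a feature branch still surfaces before it reaches production, just later in the cycle.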