We’re experiencing significant test flakiness (30% failure rate) in our Harness CI pipelines pulling tests from ALM environment management in mf-25.3. The same tests pass consistently when run directly in ALM, but fail intermittently in Harness.
Investigation shows that environment configurations aren’t syncing properly before test execution starts. Tests begin running with stale config data, causing failures related to database connections, API endpoints, and feature flags. We’ve tried adding wait periods, but that’s unreliable and extends pipeline time significantly.
Looking for solutions around environment verification, config version pinning, or health check endpoints that could ensure environment readiness before test execution. The 30% flakiness is destroying team confidence in our CI pipeline.
Warm-up periods are underrated. Even after environment health checks pass, the first few test executions can fail due to cold caches, lazy-loaded connections, and service mesh routing updates. Add a warm-up step that runs a lightweight smoke test suite before the main test execution; this primes the environment and absorbs the initial instability. Running 3-5 basic API calls as a warm-up significantly improved reliability for us.
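A minimal sketch of that warm-up step, runnable as a pipeline step before the main test stage. The fetch callable and the paths are placeholders you'd replace with real calls against your environment; the retry loop is what absorbs cold-start failures:

```python
import time


def warm_up(fetch, paths, attempts=3, delay=2.0):
    """Prime the environment with lightweight smoke calls.

    fetch(path) performs one API call and returns True on success.
    Each path is retried so cold-cache / lazy-connection failures
    are absorbed here instead of surfacing in the main test suite.
    """
    for path in paths:
        for attempt in range(1, attempts + 1):
            if fetch(path):
                break  # this path is warm, move to the next one
            if attempt == attempts:
                raise RuntimeError(
                    f"warm-up call to {path} failed {attempts} times"
                )
            time.sleep(delay)


# Example wiring (hypothetical endpoints for your qa-env):
# warm_up(lambda p: do_http_get(p) == 200,
#         ["/api/ping", "/api/version", "/api/feature-flags"])
```

The key design choice is that a path failing once is expected and silent, while a path failing every attempt fails the step loudly, so genuinely broken environments are caught before tests run.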
For health check endpoints, ALM mf-25.3 added /api/env-health/{env-name}, which returns environment status including database connectivity, API availability, and config sync status. Poll this endpoint at a 5-second interval until it returns status: ready before executing tests, and set a 2-minute timeout to fail fast if the environment doesn’t stabilize. This cut our flakiness from 25% to under 5%.
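The polling loop described above can be sketched like this. It's a generic poll-with-timeout; fetch_status is a placeholder for whatever client call GETs /api/env-health/{env-name} and extracts the status field, since the exact response shape isn't shown here:

```python
import time


def wait_for_env_ready(fetch_status, timeout=120.0, interval=5.0):
    """Block until fetch_status() reports "ready", or fail fast.

    fetch_status() should GET the env-health endpoint and return
    the status string. Raises TimeoutError if the environment does
    not stabilize within `timeout` seconds (default 2 minutes).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if fetch_status() == "ready":
            return  # environment is ready, safe to start tests
        time.sleep(interval)
    raise TimeoutError(
        f"environment not ready after {timeout:.0f}s, aborting pipeline"
    )
```

Using time.monotonic for the deadline avoids wall-clock jumps, and raising instead of returning a flag means a stuck environment fails the pipeline step immediately rather than letting stale-config tests run.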
Thanks for the version pinning suggestion. I enabled config snapshots but I’m not seeing how to reference specific version IDs in the Harness YAML. Is there a parameter I should be passing to the alm-runner command? Also, how do I determine which config version corresponds to our qa-env environment?