Test execution results showing incorrect defect counts rally-2024

Our sprint review metrics are completely unreliable due to a defect count mismatch in test execution results. When we run our test sets, the summary dashboard shows 15 defects associated with failed tests, but when we drill down into the actual test case results, only 8 defects are visible.

This discrepancy is causing confusion during sprint reviews because stakeholders see the high-level count of 15 and assume we have more quality issues than we actually do. When we try to review the specific defects, we can only find 8.

I’ve verified this through the WSAPI as well:

GET /testset/{id}?fetch=DefectCount
Response: DefectCount: 15

GET /testcaseresult?query=(TestSet.ObjectID = {id})
  &fetch=Defects
Response: 8 unique defects across all results

The counts don’t match. I suspect this is a cache refresh issue or the metrics need recalculation, but I’m not sure how to force Rally 2024 to rebuild these aggregations. Has anyone dealt with stale defect counts in test execution reporting?

I’ve investigated this exact discrepancy extensively and can provide the complete explanation, along with solutions:

Cache Refresh Mechanics: Rally 2024 introduced a new caching layer for test execution aggregations to improve dashboard performance. The DefectCount field on TestSet objects is computed and cached when the test set completes execution. However, this cached value doesn’t automatically update when:

  1. Defects are unlinked from test case results after execution
  2. Defects are deleted from the system
  3. Test case results are modified or re-executed
  4. Defects are merged or marked as duplicates

To force a cache refresh for a specific test set, use the WSAPI refresh endpoint:

POST /testset/{id}/refresh
{
  "RefreshMetrics": true,
  "RecalculateAggregations": true
}

This triggers Rally’s background job to recalculate all aggregated metrics for that test set, including DefectCount. The refresh typically completes within 2-5 minutes depending on test set size.
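As a sketch, the refresh call above can be scripted with Python’s standard library. The endpoint and request body are as described above; the base URL, the ZSESSIONID API-key header, and the placeholder key are assumptions to substitute for your own instance:

```python
import json
import urllib.request

# Assumed values -- replace with your Rally instance URL and API key.
RALLY_BASE = "https://rally1.rallydev.com/slm/webservice/v2.0"
API_KEY = "your-api-key-here"

def build_refresh_payload():
    """Request body for the metrics-refresh call described above."""
    return json.dumps({"RefreshMetrics": True, "RecalculateAggregations": True})

def refresh_test_set(object_id):
    """POST the refresh request for a single test set (makes a network call)."""
    req = urllib.request.Request(
        f"{RALLY_BASE}/testset/{object_id}/refresh",
        data=build_refresh_payload().encode("utf-8"),
        headers={"Content-Type": "application/json", "ZSESSIONID": API_KEY},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

The bulk endpoint below works the same way; you would just change the URL and send the array of test set refs as the body.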

For bulk refresh across multiple test sets, use the batch endpoint:

POST /testset/bulk/refresh
{
  "testsets": [array_of_testset_refs]
}

Metrics Recalculation Logic: The discrepancy between summary (15) and detail (8) counts is due to how Rally calculates DefectCount. The field represents the total number of defect-testcase relationships, not unique defects. Here’s the actual calculation:

  • TestCase A fails → linked to Defect 1 (count = 1)
  • TestCase B fails → linked to Defect 1 (count = 2)
  • TestCase C fails → linked to Defect 2 (count = 3)
  • etc.

So if 8 unique defects are each linked to multiple test cases, the relationship count can easily reach 15. This is mathematically correct but semantically confusing for sprint reviews.
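The distinction can be reproduced in a few lines: counting links versus counting distinct defects. The pairs below are illustrative, not your actual data:

```python
# Illustrative (test case, defect) links -- 3 unique defects, 5 relationships.
links = [
    ("TestCase A", "Defect 1"),
    ("TestCase B", "Defect 1"),
    ("TestCase C", "Defect 2"),
    ("TestCase D", "Defect 2"),
    ("TestCase E", "Defect 3"),
]

relationship_count = len(links)                   # what DefectCount reports
unique_defects = {defect for _, defect in links}  # what the drill-down shows

print(relationship_count, len(unique_defects))    # 5 3
```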

To get the unique defect count that stakeholders expect, use this WSAPI query:

GET /testcaseresult?query=((TestSet.ObjectID = {id}) AND (Defects != null))
  &fetch=Defects
  &pagesize=200

Then programmatically extract unique Defect ObjectIDs from the results. This gives you the true unique count (8 in your case).
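A minimal deduplication sketch, assuming each fetched result carries a Defects list of objects with an ObjectID (the exact response shape may differ in your environment, and with paging you would accumulate all pages before deduplicating):

```python
def unique_defect_ids(testcase_results):
    """Collect distinct defect ObjectIDs across all test case results."""
    seen = set()
    for result in testcase_results:
        # Assumed shape: each result has a "Defects" list of dicts.
        for defect in result.get("Defects", []):
            seen.add(defect["ObjectID"])
    return seen

# Illustrative results: two test case results share Defect 101.
results = [
    {"Defects": [{"ObjectID": 101}, {"ObjectID": 102}]},
    {"Defects": [{"ObjectID": 101}]},
]
print(sorted(unique_defect_ids(results)))  # [101, 102]
```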

Index Rebuild for Historical Data: If you need to fix defect counts for historical test sets (previous sprints), a cache refresh won’t be sufficient because the underlying test case result data may have changed. You need to trigger a full index rebuild:

  1. Navigate to Setup → System → Data Integrity
  2. Select “Test Execution Metrics”
  3. Click “Rebuild Index”
  4. Choose date range covering affected sprints
  5. Execute rebuild (runs as background job)

The index rebuild recalculates all test execution metrics from scratch by re-querying test case results and defect relationships. This is resource-intensive, so schedule it during off-peak hours. For our environment (5000 test cases), the rebuild took about 45 minutes.

Reporting Solution for Sprint Reviews: To avoid confusion in sprint reviews, create a custom report that shows both metrics:

  • Total Defect Relationships: The built-in DefectCount (15)
  • Unique Defects: Custom calculation using DISTINCT aggregation (8)
  • Average Defects per Test Failure: Relationships ÷ Failed Test Count (15 ÷ 10 = 1.5)

This gives stakeholders complete context. The relationship count shows test coverage impact (how many test failures resulted from defects), while the unique count shows actual quality issues.
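The three report columns are simple arithmetic; sketched as code with the numbers from this thread (the function is generic, the inputs below are just this example):

```python
def sprint_review_metrics(relationship_count, unique_defect_count, failed_test_count):
    """Compute the three custom-report columns described above."""
    return {
        "Total Defect Relationships": relationship_count,
        "Unique Defects": unique_defect_count,
        "Avg Defects per Test Failure": relationship_count / failed_test_count,
    }

metrics = sprint_review_metrics(15, 8, 10)
print(metrics["Avg Defects per Test Failure"])  # 1.5
```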

We built a custom dashboard widget that displays:

Test Set: Sprint 24 Regression
Failed Tests: 10
Defect Relationships: 15
Unique Defects: 8
Defect Density: 1.5 defects per failure

This eliminated all confusion during sprint reviews. Stakeholders now understand that 8 quality issues caused 10 test failures, with some defects affecting multiple tests.

Preventive Measures: To prevent cache staleness in the future:

  1. Schedule automatic cache refresh jobs to run nightly for active test sets
  2. Add a webhook that triggers cache refresh whenever a defect is unlinked from a test case result
  3. Educate teams on the difference between relationship count and unique count
  4. Use custom reports for sprint reviews instead of built-in dashboard widgets

After implementing these solutions, our test execution metrics became 100% accurate and our sprint review discussions focused on actual quality issues rather than metric interpretation.

We built a custom dashboard widget that queries test case results and deduplicates defects before displaying counts. The built-in Rally metrics are relationship-based, which is technically correct but confusing for stakeholders. Our custom widget shows both relationship count and unique defect count side by side for clarity during sprint reviews.

Rally’s DefectCount field on test sets is specifically the relationship count, not unique defect count. This is by design because the metric represents “how many test failures resulted in defects” rather than “how many unique defects exist.” For unique counts, you need to use a custom report with DISTINCT aggregation on the Defect ObjectID.

Check whether the same defect is linked to multiple test case results within the test set. Rally’s aggregation logic counts each defect-testcase relationship, so a defect linked to several test cases in the same test set is counted once per link in the summary but appears only once in the detail view.