Implementing defect trend analysis from test automation reduced our escape rate by 68 percent

I want to share our successful implementation of automated defect trend analysis that significantly reduced our production defect escape rate. We integrated Azure Analytics with our test automation framework to identify patterns between test failures and production defects.

The core approach involved correlating test automation results with defect data using Analytics queries and REST API automation. When specific test patterns emerged (e.g., intermittent failures in particular modules), we automatically created risk assessment work items and adjusted our test coverage accordingly. Over 6 months, our production defect escape rate dropped from 22% to 7%.

Key implementation:

# Query Analytics for test failure patterns
import requests

# Auth omitted for brevity; a PAT via basic auth is typical,
# e.g. requests.get(..., auth=('', personal_access_token))
response = requests.get(
  f'{org_url}/_odata/v4.0-preview/TestResults',  # per-result Outcome lives on TestResults, not TestRuns
  params={'$filter': "Outcome eq 'Failed'"}      # OData string literals use single quotes
)

This use case demonstrates how risk-based testing driven by automated analytics can dramatically improve quality outcomes.

Impressive results, Patricia. Can you share more details about how you correlated test failures with production defects? What specific patterns did you look for, and how did you automate the risk assessment work item creation?

Sure, Kevin. We tracked several correlation patterns: test failures in specific modules that later had production defects, intermittent test failures that indicated environmental issues, and test execution time increases that suggested performance degradation. The automation used the Azure DevOps REST API to create risk assessment work items when these patterns exceeded thresholds. I can share more technical details about the Analytics queries and REST automation logic.

The sliding window approach is smart. How did you structure your Analytics queries to efficiently process the historical data? With millions of test results, query performance must have been a concern. Did you use pre-aggregated Analytics views or raw OData queries?

Here’s the complete technical implementation that achieved our 68% reduction in defect escape rate:

1. Defect Correlation Framework:

We built a correlation engine that analyzes relationships between test automation results and production defects:

# Analytics query for defect correlation: query individual TestResults
# (per-result Outcome is not exposed on TestRuns) and expand run/test info
analytics_query = f"""
{org_url}/_odata/v4.0-preview/TestResults?
$filter=CompletedDate ge {start_date} and Outcome eq 'Failed'
&$expand=TestRun($select=TestRunId,CompletedDate,Build),
         Test($select=TestName,Area)
&$select=TestRunId,Outcome,Duration
"""

# Query defects in same area/timeframe
defect_query = f"""
{org_url}/_odata/v4.0-preview/WorkItems?
$filter=WorkItemType eq 'Bug' and State eq 'Closed'
        and ClosedDate ge {start_date}
&$select=WorkItemId,Title,AreaPath,ClosedDate,Severity
"""
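Once both queries return, the correlation step reduces to a join on area path. A minimal sketch, assuming each query result has been parsed into a list of dicts with an `AreaPath` key (field names are illustrative):

```python
from collections import defaultdict

def correlate_failures_with_defects(failed_results, defects):
    """Group failed test results and closed defects by AreaPath and
    return only the areas where both signals are present."""
    failures_by_area = defaultdict(list)
    for result in failed_results:
        failures_by_area[result["AreaPath"]].append(result)

    defects_by_area = defaultdict(list)
    for defect in defects:
        defects_by_area[defect["AreaPath"]].append(defect)

    # An area is a correlation candidate when it has both recent
    # test failures and recent production defects.
    return {
        area: {"failures": failures_by_area[area],
               "defects": defects_by_area[area]}
        for area in failures_by_area
        if area in defects_by_area
    }
```

Areas with failures but no defect history drop out here, which feeds directly into the pattern detection below.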

2. Risk-Based Testing Pattern Detection:

We identified four key patterns that correlated with production defects:

Pattern 1: Module Failure Clustering

  • 3+ test failures in same area path within 5 test runs
  • At least 1 production defect in that area in past 90 days
  • Action: Increase test coverage in that module by 25%

Pattern 2: Intermittent Failure Escalation

  • Same test fails intermittently (passes, then fails, then passes)
  • Occurs in 40%+ of test runs over 2-week window
  • Action: Flag as environmental issue, create stability work item

Pattern 3: Performance Degradation

  • Test execution duration increases by 30%+ over baseline
  • Sustained over 7+ consecutive runs
  • Action: Create performance investigation work item

Pattern 4: Cross-Module Failure Propagation

  • Failures cascade across dependent modules
  • Correlation coefficient > 0.7 between module test failures
  • Action: Increase integration test coverage between modules
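As an illustration, Pattern 1 can be checked with a few lines once the correlated data is in hand. A sketch, assuming runs are ordered newest-first and each run is a list of `(area_path, outcome)` pairs (the data shapes are mine, not from the original implementation):

```python
def detect_module_failure_clustering(recent_runs, area, defect_areas_90d,
                                     min_failures=3, run_window=5):
    """Pattern 1: 3+ failures in `area` across the last `run_window`
    test runs, plus at least one production defect in that area in the
    past 90 days (`defect_areas_90d` is a set of such area paths)."""
    failures = sum(
        1
        for run in recent_runs[:run_window]   # newest runs first
        for area_path, outcome in run
        if area_path == area and outcome == "Failed"
    )
    return failures >= min_failures and area in defect_areas_90d
```

The other patterns follow the same shape: a counting or windowing function per pattern, each returning a boolean that gates work item creation.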

3. Analytics Queries Optimization:

To handle millions of test results efficiently:

# Use pre-aggregated Analytics views
# Note: a field must appear in the groupby for $orderby on it to be valid
view_query = f"""
{org_url}/{project}/_odata/v4.0-preview/TestResultsDaily?
$apply=filter(CompletedDate ge {start_date})
       /groupby((CompletedDate,AreaPath,Outcome),
                aggregate(ResultCount with sum as TotalTests))
&$orderby=CompletedDate desc
"""

Key optimizations:

  • Use TestResultsDaily view instead of raw TestResults (90% faster)
  • Apply server-side aggregation with $apply
  • Filter by date range first to limit dataset
  • Cache results for 1 hour to reduce API calls
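The 1-hour cache in the last bullet needs nothing fancy. A minimal TTL-cache sketch (my own illustration of the idea, not the original code) that wraps any fetch callable, such as the OData request:

```python
import time

class TimedCache:
    """Cache Analytics responses for a fixed TTL (1 hour by default) so
    repeated pattern-detection passes don't re-issue identical queries."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key, fetch, now=None):
        now = time.time() if now is None else now
        entry = self._store.get(key)
        if entry and now - entry[0] < self.ttl:
            return entry[1]            # still fresh: return cached value
        value = fetch()                # e.g. issue the OData request
        self._store[key] = (now, value)
        return value
```

Keying on the full query string keeps distinct filters from colliding; the optional `now` parameter just makes the expiry logic testable.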

4. REST API Automation for Risk Assessment:

Automatic work item creation when patterns detected:

# Create risk assessment work item (JSON Patch format)
risk_item = {
  "op": "add",
  "path": "/fields/System.Title",
  "value": f"Risk: High failure rate in {area_path}"
}

# The work item type goes in the URL; spaces must be URL-encoded.
# Auth omitted for brevity (a PAT via basic auth is typical).
response = requests.post(
  f'{org_url}/{project}/_apis/wit/workitems/$Risk%20Assessment?api-version=7.0',
  json=[risk_item],
  headers={'Content-Type': 'application/json-patch+json'}
)

Risk assessment work items include:

  • Affected area path and module
  • Pattern type detected
  • Historical defect correlation data
  • Recommended action (increase coverage, investigate stability, etc.)
  • Links to related test runs and defects
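The full patch document carrying those fields can be assembled in one helper. A sketch, assuming the standard `System.*` fields and hyperlink relations; the free-text layout and any custom fields would depend on your process template:

```python
def build_risk_assessment_patch(area_path, pattern_type, correlation_note,
                                recommended_action, related_urls):
    """Assemble the JSON Patch body for a risk assessment work item."""
    description = (
        f"Pattern detected: {pattern_type}<br>"
        f"Correlation data: {correlation_note}<br>"
        f"Recommended action: {recommended_action}"
    )
    patch = [
        {"op": "add", "path": "/fields/System.Title",
         "value": f"Risk: {pattern_type} in {area_path}"},
        {"op": "add", "path": "/fields/System.AreaPath", "value": area_path},
        {"op": "add", "path": "/fields/System.Description", "value": description},
    ]
    # Attach each related test run / defect URL as a hyperlink relation.
    for url in related_urls:
        patch.append({
            "op": "add", "path": "/relations/-",
            "value": {"rel": "Hyperlink", "url": url},
        })
    return patch
```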

5. Implementation Architecture:


# Pseudocode - Complete automation workflow:
1. Scheduled job runs every 4 hours
2. Query Analytics for test results from past 14 days
3. Query defects from past 90 days in same areas
4. Calculate correlation coefficients between test failures and defects
5. Apply pattern detection algorithms
6. For each detected pattern:
   - Check if risk assessment already exists (avoid duplicates)
   - Create risk assessment work item via REST API
   - Link to related test runs and defects
   - Assign to area owner for review
7. Update test coverage recommendations in backlog
8. Send daily summary report to quality team
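The duplicate check in step 6 can be as simple as matching on title and state against a query of open risk assessments. A sketch, with data shapes of my own choosing (`existing_items` would come from a WIQL or OData query):

```python
def find_existing_assessment(existing_items, area_path, pattern_type):
    """Step 6 dedupe: return the id of an open risk assessment that
    already covers this area + pattern combination, or None."""
    expected_title = f"Risk: {pattern_type} in {area_path}"
    for item in existing_items:
        if item["title"] == expected_title and item["state"] != "Closed":
            return item["id"]
    return None
```

Creating a new work item only proceeds when this returns None, so re-detection of the same pattern every 4 hours does not spam the backlog.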

6. False Positive Reduction:

Our approach to minimize noise:

  • Sliding Window Analysis: 2-week minimum observation period
  • Threshold Tuning: Started conservative (5 failures), tuned down to 3 after validation
  • Flaky Test Exclusion: Tests marked as flaky excluded from correlation analysis
  • Historical Validation: Required at least 1 production defect in area within past 90 days
  • Manual Review Loop: Quality team reviews risk assessments weekly, provides feedback to refine patterns
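Combining the sliding window with the Pattern 2 threshold looks roughly like this. A sketch under my own assumptions: per-test history is a chronological list of `(day_number, outcome)` pairs, and "intermittent" requires both passes and failures in the window:

```python
def is_intermittent(outcomes, failure_ratio=0.40, min_window_days=14):
    """Pattern 2 sketch: flag a test when its observation window spans
    at least 14 days, it has both passed and failed in that window, and
    failures make up 40%+ of its runs."""
    if not outcomes:
        return False
    if outcomes[-1][0] - outcomes[0][0] < min_window_days:
        return False                      # sliding-window minimum not met
    results = [outcome for _, outcome in outcomes]
    if "Passed" not in results or "Failed" not in results:
        return False                      # always-failing is not intermittent
    return results.count("Failed") / len(results) >= failure_ratio
```

Requiring both outcomes in the window is what separates an environmental/stability signal from a plain regression, which the module-clustering pattern already covers.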

7. Test Coverage Adjustment:

Based on risk assessments, we automatically adjusted test coverage:

# Calculate recommended coverage increase
if risk_level == 'High':
    coverage_increase = 0.25  # 25% more tests
elif risk_level == 'Medium':
    coverage_increase = 0.15  # 15% more tests
else:
    coverage_increase = 0.0   # no automatic increase for low risk

# Create test case work items
for i in range(int(current_tests * coverage_increase)):
    # REST API call to create a test case work item,
    # linked back to the originating risk assessment
    pass

8. Results and Metrics:

Over 6-month implementation period:

  • Production Defect Escape Rate: 22% → 7% (68% reduction)
  • Test Coverage: 67% → 84% (in high-risk modules)
  • Mean Time to Detect Defects: 12 days → 3 days
  • False Positive Rate: Started at 31%, tuned down to 8%
  • Risk Assessments Created: 147 total, 89% resulted in actionable improvements

9. Key Success Factors:

  1. Defect Correlation: Linking test failures to production defects provided clear signal
  2. Risk-Based Testing: Focused test expansion on high-risk areas, not blanket coverage increases
  3. Analytics Queries: Pre-aggregated views made analysis scalable to millions of test results
  4. REST Automation: Eliminated manual work in creating and tracking risk assessments
  5. Continuous Tuning: Weekly review and threshold adjustment reduced false positives significantly

10. Lessons Learned:

  • Start with conservative thresholds and tune based on feedback
  • Pre-aggregated Analytics views are essential for performance at scale
  • Exclude known flaky tests to reduce noise
  • Require historical defect data for validation (prevents acting on spurious patterns)
  • Automate the entire workflow - manual processes don’t scale

This implementation demonstrates how combining Azure Analytics queries, REST API automation, defect correlation analysis, and risk-based testing can dramatically improve quality outcomes. The key is systematic pattern detection backed by historical data, not just reactive test coverage increases.

This is exactly what we need. How did you handle false positives - tests that fail intermittently but don’t correlate with actual defects? And what was your threshold for triggering risk assessment work items? We’ve tried similar approaches but got overwhelmed with noise.