After analyzing defect resolution patterns across 50+ releases, here is what we've learned about turning severity weighting and priority scoring into a meaningful KPI formula:
Core Formula Structure: Use additive weighting for severity and priority, with age as a multiplicative factor. Multiplying severity by priority creates exponential scaling that distorts the metric: with the weights below, a Critical/P1 would score roughly 8x a Medium/P3 (and nearly 70x a Low/P4), versus about 3x under additive weighting, and that steeper spread doesn't match actual business impact.
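To make the distortion concrete, here is a quick comparison using the weights from the formula below (a sketch; the helper names are mine, not anything in the tool):

```python
def additive(sev, pri):
    # The production scheme: weighted sum with the 65/35 split
    return sev * 0.65 + pri * 0.35

def multiplicative(sev, pri):
    # The rejected scheme: severity times priority
    return sev * pri

critical_p1 = (100, 100)  # severity weight, priority weight
medium_p3 = (30, 40)

# Additive: 100 vs 33.5 -> roughly a 3x spread
print(additive(*critical_p1) / additive(*medium_p3))       # ~2.99

# Multiplicative: 10000 vs 1200 -> roughly an 8x spread,
# widening to ~67x against Low/P4 (10 * 15 = 150)
print(multiplicative(*critical_p1) / multiplicative(*medium_p3))  # ~8.33
```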
Our Production Formula:
Base_Score = (Severity_Weight * 0.65) + (Priority_Weight * 0.35)
Age_Multiplier = 1 + (Weeks_Open * 0.12)
Final_Score = Base_Score * Age_Multiplier * Status_Factor
Severity weights: Critical=100, High=60, Medium=30, Low=10
Priority weights: P1=100, P2=70, P3=40, P4=15
Status factors: New=1.0, In Progress=0.7, Blocked=1.3
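The full formula above translates directly into a scoring function. This is a sketch (the function and lookup-table names are mine); wire it to whatever field names your work item type actually uses:

```python
SEVERITY = {"Critical": 100, "High": 60, "Medium": 30, "Low": 10}
PRIORITY = {"P1": 100, "P2": 70, "P3": 40, "P4": 15}
STATUS = {"New": 1.0, "In Progress": 0.7, "Blocked": 1.3}

def risk_score(severity, priority, weeks_open, status):
    """Final_Score = Base_Score * Age_Multiplier * Status_Factor."""
    base = SEVERITY[severity] * 0.65 + PRIORITY[priority] * 0.35
    age_multiplier = 1 + weeks_open * 0.12
    return base * age_multiplier * STATUS[status]

# A blocked Critical/P1 open for 4 weeks:
# base = 100, age multiplier = 1.48, status factor = 1.3 -> ~192.4
print(risk_score("Critical", "P1", 4, "Blocked"))
```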
Why 65/35 Split: Our historical data showed severity is a better predictor of customer escalations and release blockers than priority. Priority captures business judgment but is more subjective and variable across teams.
Age Multiplier Calibration: 12% per week means a defect doubles in score after about 8 weeks, which aligns with when stakeholders typically start escalating. Adjust this based on your release cadence.
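The doubling point falls directly out of the multiplier: the score doubles when 1 + weeks * rate = 2, i.e. at weeks = 1/rate. A quick check (sketch):

```python
def weeks_to_double(weekly_rate):
    # 1 + weeks * rate == 2  =>  weeks == 1 / rate
    return 1 / weekly_rate

print(weeks_to_double(0.12))  # ~8.3 weeks at 12%/week
print(weeks_to_double(0.25))  # 4 weeks for a faster release cadence
```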
Handling Severity/Priority Conflicts: Build a separate “alignment score” that flags when severity and priority diverge by 2+ levels. Display this as a dashboard widget showing count of misaligned defects. Don’t try to hide the disagreement in a composite score - make it visible for triage discussions.
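The alignment check reduces to mapping both fields onto the same ordinal scale and flagging a gap of 2+ levels. A minimal sketch (level mappings and names are my own):

```python
SEVERITY_LEVEL = {"Critical": 4, "High": 3, "Medium": 2, "Low": 1}
PRIORITY_LEVEL = {"P1": 4, "P2": 3, "P3": 2, "P4": 1}

def is_misaligned(severity, priority, threshold=2):
    """Flag defects whose severity and priority diverge by 2+ levels."""
    return abs(SEVERITY_LEVEL[severity] - PRIORITY_LEVEL[priority]) >= threshold

defects = [
    {"id": 1, "severity": "Critical", "priority": "P4"},  # diverges by 3
    {"id": 2, "severity": "Medium", "priority": "P3"},    # aligned
]
misaligned = [d for d in defects if is_misaligned(d["severity"], d["priority"])]
print(len(misaligned))  # the count shown in the dashboard widget
```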
Custom Reports Implementation: Create the calculated attribute in your work item type definition, then reference it in Insight Reporting custom reports. This ensures consistency and lets you filter/sort by risk score. You can also use the REST API to bulk-calculate scores for existing defects if you’re changing the formula.
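The bulk-recalculation pass is a fetch/score/update loop. The sketch below stubs the I/O; the endpoint path and field names are placeholders for illustration, not the actual REST API, so swap in your real work item queries:

```python
SEVERITY = {"Critical": 100, "High": 60, "Medium": 30, "Low": 10}
PRIORITY = {"P1": 100, "P2": 70, "P3": 40, "P4": 15}
STATUS = {"New": 1.0, "In Progress": 0.7, "Blocked": 1.3}

def fetch_open_defects():
    # Placeholder: in a real run this would page through the REST API
    return [
        {"id": 101, "severity": "High", "priority": "P2",
         "weeks_open": 3, "status": "New"},
        {"id": 102, "severity": "Low", "priority": "P4",
         "weeks_open": 10, "status": "Blocked"},
    ]

def update_risk_score(defect_id, score):
    # Placeholder for the write-back call (hypothetical endpoint)
    print(f"PUT /workitems/{defect_id}: risk_score={score:.1f}")

def recalculate_all():
    for d in fetch_open_defects():
        base = SEVERITY[d["severity"]] * 0.65 + PRIORITY[d["priority"]] * 0.35
        score = base * (1 + d["weeks_open"] * 0.12) * STATUS[d["status"]]
        update_risk_score(d["id"], score)

recalculate_all()
```

Running the recalculation once per formula change, rather than on every dashboard load, keeps the score stable for filtering and sorting.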
Dashboard Design: Show three metrics side by side - Total Defect Count (unweighted), Total Risk Score (weighted), and Average Score Per Defect. This helps teams understand whether they have many low-risk issues or a few high-risk ones. We also trend the 4-week moving average of risk score added by new defects versus score removed by resolved ones, to show whether risk is accumulating or declining.
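The three side-by-side metrics, plus the moving-average trend, reduce to a few aggregations. A sketch over illustrative numbers (the data values here are made up):

```python
scored = [12.0, 250.0, 33.5, 8.0, 140.0]  # per-defect risk scores

total_count = len(scored)                  # Total Defect Count (unweighted)
total_risk = sum(scored)                   # Total Risk Score (weighted)
avg_per_defect = total_risk / total_count  # Average Score Per Defect
print(total_count, total_risk, round(avg_per_defect, 1))  # 5 443.5 88.7

# 4-week moving average of net risk: score added by new defects
# minus score removed by resolved ones, per week
added =    [120.0, 90.0, 200.0, 60.0, 75.0]
resolved = [80.0, 110.0, 95.0, 100.0, 130.0]
net = [a - r for a, r in zip(added, resolved)]
window = 4
moving_avg = [sum(net[i - window + 1 : i + 1]) / window
              for i in range(window - 1, len(net))]
print(moving_avg)  # positive -> risk accumulating, negative -> declining
```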
The biggest improvement in decision-making came from separating technical risk (severity-driven) from business impact (priority-driven) in our dashboards, as another commenter mentioned. Executives care about business impact, engineering cares about technical risk, and showing both prevents talking past each other in release readiness meetings.