Great discussion on a common challenge in decision API implementations. The 120ms vs 850ms gap you’re experiencing is typical when full explainability is enabled without optimization. Let me share a comprehensive approach that addresses API response verbosity, audit trail requirements, and performance tuning:
Understanding the Performance Impact:
Pega’s explainability engine in 8.6 captures every rule evaluation, property calculation, and strategy path taken. With 15+ rules and multiple scorecards, you’re generating thousands of audit points per decision. The 850ms latency breaks down roughly as:
- 120ms: Core decision execution
- 400ms: Explanation graph generation
- 250ms: Serialization to JSON
- 80ms: Network transfer of large payload
Optimization Strategy:
1. Implement Selective Explainability (Immediate Impact):
Configure your decision strategy to capture explanations only for specific outcomes. In your strategy rule, add a condition:
IF (DecisionOutcome = "Decline" OR
DecisionOutcome = "Manual Review" OR
ExplanationRequested = true)
THEN EnableExplainability()
Approved decisions (typically 70-80% of volume) skip explanation overhead entirely. At a 70/30 approval/decline split, this alone cuts your average latency to roughly 340ms, and lower still as the approval rate rises.
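A minimal sketch of that gate in Python. In practice the check lives inside the Pega strategy rule itself; the names here (`should_explain`, `EXPLAIN_OUTCOMES`, `explanation_requested`) are illustrative, not Pega API:

```python
# Outcomes that always get a full explanation (adverse actions and reviews).
EXPLAIN_OUTCOMES = {"Decline", "Manual Review"}

def should_explain(outcome: str, explanation_requested: bool = False) -> bool:
    """Enable full explainability only for adverse/review outcomes,
    or when the caller explicitly asks for it."""
    return outcome in EXPLAIN_OUTCOMES or explanation_requested
```

Approvals fall through to the fast path unless the caller opts in.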
2. Tiered Explanation Levels:
Modify your API response to support explanation depth as a parameter:
explainability=none: Decision outcome only (120ms)
explainability=summary: Top 3 contributing factors (180ms)
explainability=detailed: Full rule trace (850ms)
explainability=async: Outcome immediate, explanation generated in background
Your client application requests the appropriate level based on context: customer-facing UIs use ‘summary’, while compliance audits use ‘async’ with retrieval later.
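A sketch of how the server side might branch on that parameter. The payload shapes and factor strings are illustrative assumptions, not the actual Pega response schema:

```python
def build_response(outcome: str, factors: list[str], level: str = "none") -> dict:
    """Shape the API response according to the requested explanation depth."""
    if level == "none":
        return {"outcome": outcome}
    if level == "summary":
        # Only the top 3 contributing factors.
        return {"outcome": outcome, "topFactors": factors[:3]}
    if level == "detailed":
        # Full rule trace, the expensive path.
        return {"outcome": outcome, "ruleTrace": factors}
    if level == "async":
        # Outcome now, explanation retrievable later by token.
        return {"outcome": outcome, "explanationToken": "pending"}
    raise ValueError(f"unknown explainability level: {level}")
```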
3. Asynchronous Audit Trail Pattern:
Implement a two-phase approach:
// Pseudocode - Async explanation generation:
1. Execute decision strategy in fast mode
2. Return decision outcome + explanationToken to client
3. Queue background job: regenerate decision with full explainability
4. Store detailed explanation in audit database keyed by token
5. Client can retrieve via GET /decisions/explain/{token} when needed
This maintains compliance requirements (full audit trail exists) while keeping the decision API fast. The background regeneration typically completes within 2-3 seconds.
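The five steps above can be sketched as follows. A real implementation would put step 3 on an async queue and persist to an audit database; here an in-memory dict and an inline function call stand in for both, and all names are hypothetical:

```python
import uuid

AUDIT_STORE: dict[str, dict] = {}  # token -> detailed explanation

def generate_full_explanation(token: str, request: dict) -> None:
    """Phase 2: rerun the decision with full explainability and store it."""
    AUDIT_STORE[token] = {"ruleTrace": ["...full trace..."], "request": request}

def decide_fast(request: dict) -> dict:
    """Phase 1: fast-mode decision plus a retrieval token."""
    token = str(uuid.uuid4())
    # In production this would enqueue a background job; called inline here.
    generate_full_explanation(token, request)
    return {"outcome": "Decline", "explanationToken": token}

def get_explanation(token: str):
    """Equivalent of GET /decisions/explain/{token}."""
    return AUDIT_STORE.get(token)
```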
4. Decision Strategy Optimization:
Review your 15+ rules for consolidation opportunities:
- Decision Tables: Combine multiple if-then rules into decision tables (50% faster evaluation)
- Rule Ordering: Place high-impact rules first; enable early exit when possible
- Scorecard Efficiency: Use adaptive models that evaluate fewer features for obvious cases
- Champion-Challenger Sampling: Don’t run challengers on every request; use 10-20% sampling
We’ve seen decision execution drop from 120ms to 60ms with these optimizations.
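The champion-challenger sampling point can be illustrated with a few lines. The strategy functions are stand-ins; the only real logic is the sampling gate:

```python
import random

CHALLENGER_RATE = 0.10  # run the challenger on ~10% of traffic

def run_champion(request: dict) -> str:
    return "Approve"  # placeholder for the champion strategy

def run_challenger(request: dict) -> str:
    return "Approve"  # placeholder for the challenger strategy

def decide(request: dict, rng: random.Random) -> dict:
    result = {"champion": run_champion(request)}
    if rng.random() < CHALLENGER_RATE:
        # Challenger result is logged for offline comparison only.
        result["challenger"] = run_challenger(request)
    return result
```

The champion's outcome is always returned; the challenger only adds cost on the sampled fraction.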
5. API Response Compression:
Enable gzip compression on your API responses. Explanation payloads compress extremely well (typically 80% reduction) because of repetitive structure. This cuts network transfer time from 80ms to 15ms.
6. Caching Strategic Components:
While each decision is unique, components are often reusable:
- Scorecard Models: Cache loaded models in memory (avoid reload per request)
- Reference Data: Cache lookup tables used in rules (reduce database hits)
- Customer Context: If making multiple decisions for the same customer in a session, cache their profile
This typically saves 20-40ms per request.
Regulatory Compliance Considerations:
For credit decisioning specifically, regulations like FCRA and ECOA require:
- Ability to explain adverse actions (declines)
- Retention of decision factors for disputes
- Timely delivery of adverse action notices
They do NOT require:
- Real-time explanation generation for approvals
- Explanation data in the synchronous API response
- Detailed traces for every single decision
Your async approach fully satisfies compliance as long as explanations can be retrieved within a reasonable timeframe (typically 30 days for disputes).
Recommended Implementation Path:
Phase 1 (Week 1): Implement selective explainability for declines only. Expected result: 70% of requests at ~120ms, 30% at ~850ms, average ~340ms.
Phase 2 (Week 2): Add tiered explanation levels with ‘summary’ as default for declines. Expected result: 70% at ~120ms, 25% at ~180ms, 5% at ~850ms, average ~180ms.
Phase 3 (Week 3): Implement async pattern for detailed explanations. Expected result: 100% at ~120-180ms, full audit trail available within 3 seconds.
Phase 4 (Ongoing): Optimize decision strategy and enable caching. Target: Sub-100ms for 95th percentile.
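The phase averages above are simple traffic-weighted means, easy to sanity-check:

```python
def avg_latency(mix: dict[float, float]) -> float:
    """mix maps latency_ms -> traffic fraction; fractions must sum to 1."""
    assert abs(sum(mix.values()) - 1.0) < 1e-9
    return sum(latency * share for latency, share in mix.items())

phase1 = avg_latency({120: 0.70, 850: 0.30})             # ~339 ms
phase2 = avg_latency({120: 0.70, 180: 0.25, 850: 0.05})  # ~171.5 ms
```

Plug in your own approval mix to forecast each phase before committing to it.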
Monitoring Metrics:
Track these KPIs to validate your optimization:
- P50, P95, P99 latency by explanation level
- Explanation retrieval rate (how often async explanations are actually fetched)
- Decision strategy execution time vs explanation generation time
- Cache hit rates for scorecards and reference data
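If your APM tooling doesn't already surface percentiles per explanation level, they're straightforward to compute from raw latency samples with the standard library:

```python
import statistics

def latency_percentiles(samples_ms: list) -> dict:
    """P50/P95/P99 from raw latency samples (milliseconds)."""
    qs = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": qs[49], "p95": qs[94], "p99": qs[98]}
```

Tag each sample with its explanation level so the tiers can be compared independently.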
The combination of selective explainability, tiered levels, and async generation gives you the best of both worlds: fast API responses for users and complete audit trails for compliance. Most organizations find that less than 5% of decisions actually need detailed explanations retrieved, making the async pattern highly effective.