Balancing decision API explainability and performance in real-time scoring

We’re implementing real-time credit decisioning via the Pega Decision API and facing a tradeoff between explainability and response times. When we enable full explainability mode to capture audit trails for regulatory compliance, API latency increases from 120ms to 850ms, which is unacceptable for our customer-facing application.

The decision strategy involves 15+ rules, multiple scorecards, and champion-challenger testing. Explainability mode returns detailed reasoning for each rule evaluation, which is valuable for compliance but kills performance. We’ve tried caching strategies, but each decision is unique based on customer context.

Has anyone found a middle ground? Perhaps selective explainability for certain decision paths, or asynchronous audit trail generation? Curious how others balance regulatory requirements with user experience when API response verbosity becomes a performance bottleneck.

The hybrid approach sounds promising. Are you using Pega’s built-in async capabilities or a custom implementation? Also, how do you handle cases where the explanation is needed immediately - like when a decision is declined and you want to show the customer why in the same session?

Great discussion on a common challenge in decision API implementations. The 120ms vs 850ms gap you’re experiencing is typical when full explainability is enabled without optimization. Let me share a comprehensive approach that addresses API response verbosity, audit trail requirements, and performance tuning:

Understanding the Performance Impact:

Pega’s explainability engine in 8.6 captures every rule evaluation, property calculation, and strategy path taken. With 15+ rules and multiple scorecards, you’re generating thousands of audit points per decision. The 850ms latency breaks down roughly as:

  • 120ms: Core decision execution
  • 400ms: Explanation graph generation
  • 250ms: Serialization to JSON
  • 80ms: Network transfer of large payload

Optimization Strategy:

1. Implement Selective Explainability (Immediate Impact):

Configure your decision strategy to capture explanations only for specific outcomes. In your strategy rule, add a condition:


IF (DecisionOutcome = "Decline" OR
    DecisionOutcome = "Manual Review" OR
    ExplanationRequested = true)
THEN EnableExplainability()

Approved decisions (typically 70-80% of volume) skip explanation overhead entirely. With that approval rate, this alone brings average latency down to roughly 270-340ms.
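Outside the strategy rule itself, the same gate is easy to sketch in plain code. This is a minimal illustration of the pattern, not Pega API; all function names here (score_decision, build_explanation) are stand-ins:

```python
# Only pay the explanation cost for outcomes that need it.
# All names below are illustrative stand-ins, not Pega APIs.

EXPLAINABLE_OUTCOMES = {"Decline", "Manual Review"}

def decide(customer, explanation_requested=False):
    outcome = score_decision(customer)  # fast path, always runs
    explanation = None
    if outcome in EXPLAINABLE_OUTCOMES or explanation_requested:
        explanation = build_explanation(customer, outcome)  # expensive path
    return {"outcome": outcome, "explanation": explanation}

def score_decision(customer):
    # stand-in for the real strategy execution
    return "Approve" if customer.get("score", 0) >= 620 else "Decline"

def build_explanation(customer, outcome):
    # stand-in for the expensive explanation-graph build
    return {"primaryReason": "score below cutoff", "outcome": outcome}
```

The key point is that the expensive branch is never reached for the majority outcome.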

2. Tiered Explanation Levels:

Modify your API response to support explanation depth as a parameter:

  • explainability=none: Decision outcome only (120ms)
  • explainability=summary: Top 3 contributing factors (180ms)
  • explainability=detailed: Full rule trace (850ms)
  • explainability=async: Outcome immediate, explanation generated in background

Your client application requests the appropriate level based on context. Customer-facing UIs use ‘summary’, compliance audits use ‘async’ with retrieval later.
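A request handler for this parameter might look like the following sketch. The parameter name and function signatures are assumptions for illustration, not an existing Pega endpoint:

```python
# Map an `explainability` query parameter to the amount of explanation
# work performed. All names here are illustrative.

def handle_decision(request_params, decision, full_trace):
    level = request_params.get("explainability", "none")
    if level == "none":
        return {"outcome": decision}
    if level == "summary":
        # top 3 contributing factors by absolute weight
        top3 = sorted(full_trace, key=lambda f: -abs(f["weight"]))[:3]
        return {"outcome": decision, "factors": top3}
    if level == "detailed":
        return {"outcome": decision, "trace": full_trace}
    if level == "async":
        token = "expl-12345"  # would be generated and queued in a real system
        return {"outcome": decision, "explanationToken": token}
    raise ValueError(f"unknown explainability level: {level}")
```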

3. Asynchronous Audit Trail Pattern:

Implement a two-phase approach:


// Pseudocode - Async explanation generation:
1. Execute decision strategy in fast mode
2. Return decision outcome + explanationToken to client
3. Queue background job: regenerate decision with full explainability
4. Store detailed explanation in audit database keyed by token
5. Client can retrieve via GET /decisions/explain/{token} when needed

This maintains compliance requirements (full audit trail exists) while keeping the decision API fast. The background regeneration typically completes within 2-3 seconds.
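The two-phase flow can be demonstrated end to end with a background worker standing in for the job queue. This is a minimal sketch of the pattern, assuming an in-process queue; production systems would use a durable queue and database instead:

```python
# Two-phase pattern: return the outcome immediately, generate the
# full explanation off the hot path. Names are illustrative.
import queue
import threading
import uuid

explanations = {}      # stand-in for the audit database, keyed by token
jobs = queue.Queue()   # stand-in for a durable job queue

def decide_fast(customer):
    outcome = "Decline"            # stand-in for fast-mode strategy execution
    token = str(uuid.uuid4())
    jobs.put((token, customer, outcome))   # phase 2 happens off the hot path
    return {"outcome": outcome, "explanationToken": token}

def worker():
    while True:
        token, customer, outcome = jobs.get()
        # re-run the decision with full explainability and persist the trace
        explanations[token] = {"outcome": outcome, "trace": ["rule1", "rule2"]}
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

resp = decide_fast({"score": 500})
jobs.join()  # in a real system the client instead polls GET /decisions/explain/{token}
```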

4. Decision Strategy Optimization:

Review your 15+ rules for consolidation opportunities:

  • Decision Tables: Combine multiple if-then rules into decision tables (50% faster evaluation)
  • Rule Ordering: Place high-impact rules first; enable early exit when possible
  • Scorecard Efficiency: Use adaptive models that evaluate fewer features for obvious cases
  • Champion-Challenger Sampling: Don’t run challengers on every request; use 10-20% sampling

We’ve seen decision execution drop from 120ms to 60ms with these optimizations.
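The challenger-sampling point in particular is simple to express in code. A rough sketch, with illustrative strategy functions (the injectable `rng` is just there to make the branch deterministic for testing):

```python
# Run the challenger strategy on only a fraction of traffic
# instead of every request. All names are illustrative.
import random

CHALLENGER_SAMPLE_RATE = 0.15   # within the 10-20% range suggested above

def decide(customer, rng=random.random):
    outcome = champion_strategy(customer)
    if rng() < CHALLENGER_SAMPLE_RATE:
        challenger_outcome = challenger_strategy(customer)
        record_comparison(outcome, challenger_outcome)  # for offline analysis
    return outcome  # the champion's answer is always what the caller gets

def champion_strategy(c):
    return "Approve" if c.get("score", 0) >= 620 else "Decline"

def challenger_strategy(c):
    return "Approve" if c.get("score", 0) >= 600 else "Decline"

def record_comparison(champion, challenger):
    pass  # stand-in for logging to an analytics store
```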

5. API Response Compression:

Enable gzip compression on your API responses. Explanation payloads compress extremely well (typically 80% reduction) because of repetitive structure. This cuts network transfer time from 80ms to 15ms.
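You can verify the compression claim yourself on a synthetic payload; the exact ratio depends on your trace structure, but repetitive rule-by-rule JSON compresses dramatically:

```python
# Measure how well a repetitive explanation payload compresses.
import gzip
import json

# Synthetic trace with the repetitive key/value structure typical
# of rule-by-rule explanation output.
trace = [{"rule": f"CreditRule{i}", "result": "pass", "weight": 0.1}
         for i in range(500)]
raw = json.dumps(trace).encode()
compressed = gzip.compress(raw)
ratio = 1 - len(compressed) / len(raw)   # fraction of bytes saved
```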

6. Caching Strategic Components:

While each decision is unique, components are often reusable:

  • Scorecard Models: Cache loaded models in memory (avoid reload per request)
  • Reference Data: Cache lookup tables used in rules (reduce database hits)
  • Customer Context: If making multiple decisions for same customer in a session, cache their profile

This typically saves 20-40ms per request.
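For in-process model and reference-data caching, something as simple as `functools.lru_cache` captures the idea. A sketch with an illustrative loader (the counter only exists to show the load happens once):

```python
# Cache loaded scorecard models so each request skips the expensive load.
from functools import lru_cache

load_count = {"n": 0}   # only here to demonstrate the cache working

@lru_cache(maxsize=None)
def load_scorecard(model_id: str) -> dict:
    load_count["n"] += 1          # stand-in for an expensive disk/DB load
    return {"id": model_id, "cutoff": 620}

def score(customer: dict, model_id: str = "credit-v3") -> str:
    model = load_scorecard(model_id)   # cached after the first call
    return "Approve" if customer.get("score", 0) >= model["cutoff"] else "Decline"
```

The same approach works for reference-data lookups; for cross-node caching you would substitute a distributed cache.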

Regulatory Compliance Considerations:

For credit decisioning specifically, regulations like FCRA and ECOA require:

  • Ability to explain adverse actions (declines)
  • Retention of decision factors for disputes
  • Timely delivery of adverse action notices

They do NOT require:

  • Real-time explanation generation for approvals
  • Explanation data in the synchronous API response
  • Detailed traces for every single decision

An async approach should satisfy these requirements as long as explanations can be retrieved within the dispute timeframe (typically 30 days).

Recommended Implementation Path:

Phase 1 (Week 1): Implement selective explainability for declines only. Expected result: 70% of requests at ~120ms, 30% at ~850ms, average ~340ms.

Phase 2 (Week 2): Add tiered explanation levels with ‘summary’ as default for declines. Expected result: 70% at ~120ms, 25% at ~180ms, 5% at ~850ms, average ~180ms.

Phase 3 (Week 3): Implement async pattern for detailed explanations. Expected result: 100% at ~120-180ms, full audit trail available within 3 seconds.

Phase 4 (Ongoing): Optimize decision strategy and enable caching. Target: Sub-100ms for 95th percentile.
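The expected averages above are just weighted means of the per-tier latencies, which makes them easy to sanity-check against your own traffic mix:

```python
# Sanity-check the expected averages quoted for each phase
# as weighted means of (traffic fraction, latency in ms).

def avg(mix):
    return sum(fraction * latency for fraction, latency in mix)

phase1 = avg([(0.70, 120), (0.30, 850)])                # ≈ 339 ms
phase2 = avg([(0.70, 120), (0.25, 180), (0.05, 850)])   # ≈ 172 ms
```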

Monitoring Metrics:

Track these KPIs to validate your optimization:

  • P50, P95, P99 latency by explanation level
  • Explanation retrieval rate (how often async explanations are actually fetched)
  • Decision strategy execution time vs explanation generation time
  • Cache hit rates for scorecards and reference data

The combination of selective explainability, tiered levels, and async generation gives you the best of both worlds: fast API responses for users and complete audit trails for compliance. Most organizations find that less than 5% of decisions actually need detailed explanations retrieved, making the async pattern highly effective.

We use Pega’s decision audit framework with custom extensions. For immediate explanations on declines, we have a tiered system:

  • Level 1 (always returned, <50ms overhead): primary reason code
  • Level 2 (on-demand, ~200ms): detailed rule-by-rule analysis
  • Level 3 (async, no latency impact): full audit trail with scorecard breakdowns

Most customers only need Level 1; compliance audits use Level 3. This way you’re not paying the 850ms penalty on every call.