Environment-specific API behavior is a common challenge in Pega implementations, especially for automation testing and CI/CD pipelines. Let me break down the key factors affecting REST API behavior across test and production environments:
Configuration Factors Impacting API Behavior:
1. Authentication and Security Settings:
Test and production often have different authentication configurations:
- OAuth token expiration: Prod may have shorter token lifetimes (5 min vs 30 min in test)
- Certificate validation: Prod enforces strict SSL/TLS validation, test may skip it
- IP whitelisting: Prod may restrict API access to specific IP ranges
- Rate limiting per authentication context: Different limits for service accounts vs user accounts
These differences can cause authentication failures or slower handshake times in production that never appear in test.
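One way to keep automation resilient to differing token lifetimes is to cache the token and refresh it shortly before expiry, reading the lifetime from the token response rather than hard-coding it. A minimal sketch, assuming your framework supplies a `fetch_token` callable (a hypothetical hook, not a Pega API):

```python
import time

class TokenCache:
    """Caches an OAuth access token and refreshes it before expiry.

    Works unchanged whether the environment issues 5-minute or
    30-minute tokens, because the lifetime comes from the response.
    """

    def __init__(self, fetch_token, safety_margin=30):
        self._fetch_token = fetch_token      # callable returning (token, expires_in_seconds)
        self._safety_margin = safety_margin  # refresh this many seconds early
        self._token = None
        self._expires_at = 0.0

    def get(self):
        # Refresh when no token is cached or the cached one is near expiry
        if self._token is None or time.time() >= self._expires_at:
            token, expires_in = self._fetch_token()
            self._token = token
            self._expires_at = time.time() + expires_in - self._safety_margin
        return self._token
```

With this in place, a prod token that expires in 5 minutes simply triggers more frequent refreshes instead of mid-run 401 failures.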
2. Performance and Resource Allocation:
Production environments are tuned differently from test, often with stricter per-request limits. Typical prconfig.xml differences:
- http.connection.timeout: Test=30000ms, Prod=10000ms (stricter timeout)
- http.pool.max.connections: Test=200, Prod=500 (but shared across far more users)
- requestor.pool.size: different thread allocation affects concurrent API handling
- database.connection.pool.max: a smaller pool means API requests queue for database connections
Your 2s → 15s slowdown suggests production has lower resource allocation per request.
3. Data Source and Integration Differences:
Validation discrepancies often stem from:
- Data pages: Pointing to different databases (test DB vs prod DB with different data)
- External REST connectors: Different endpoint URLs with varying response times
- Decision tables: Environment-specific configurations with different validation logic
- Lookup tables: Prod may have stricter reference data causing validation failures
4. Rule Resolution and Caching:
Subtle differences in rule behavior:
- Ruleset versions: Even minor version differences (8.5.3.1 vs 8.5.3.2) can change validation
- Rule caching: Prod may have aggressive caching causing stale rule evaluation
- When conditions: Rules with environment-specific when conditions (checking server name, etc.)
- Access groups: Different access groups in prod may have different rule visibility
5. Network and Load Balancer Configuration:
Infrastructure differences causing timeouts:
- Load balancer timeouts: Prod LB may terminate connections after 10s
- API gateway throttling: Prod may have API gateway with request queuing
- Network latency: Prod in different datacenter = higher latency to external systems
- SSL offloading: Prod may handle SSL at LB level adding overhead
Diagnostic Approach:
Step 1: Enable Detailed API Logging
In both environments, enable tracer and PAL for API requests:
- Capture full request/response including headers
- Log rule execution times within API flow
- Track database query performance
- Monitor thread allocation and queuing
Compare logs side-by-side to identify where behavior diverges.
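To make that side-by-side comparison mechanical, each environment's calls can be reduced to a small timing record and the two records diffed programmatically. A sketch under the assumption that you hit the same endpoint in both environments (the record fields and the 2x slowdown threshold are illustrative choices, not Pega conventions):

```python
import time
import urllib.request

def timed_call(url, headers=None, timeout=30):
    """Call an endpoint and return a comparable timing record.

    Run the same call against test and prod, then compare the
    records to see where behavior diverges.
    """
    req = urllib.request.Request(url, headers=headers or {})
    start = time.monotonic()
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = resp.read()
    return {
        "url": url,
        "status": resp.status,
        "elapsed_s": round(time.monotonic() - start, 3),
        "body_bytes": len(body),
    }

def divergence(test_rec, prod_rec, slowdown_factor=2.0):
    """Flag where a prod record diverges from its test counterpart."""
    flags = []
    if prod_rec["status"] != test_rec["status"]:
        flags.append("status mismatch")
    if prod_rec["elapsed_s"] > test_rec["elapsed_s"] * slowdown_factor:
        flags.append("prod slowdown")
    return flags
```

A 2s test call against a 15s prod call would flag "prod slowdown" immediately, giving you a concrete list of calls to trace.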
Step 2: Isolate Configuration Differences
Create a configuration comparison checklist:
1. Export prconfig.xml from both environments
2. Diff authentication settings (OAuth, certificates)
3. Compare dynamic system settings (DSS) values
4. Check data source configurations
5. Verify ruleset and application versions
6. Document any environment-specific when conditions
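The export-and-diff portion of this checklist can be scripted with Python's standard difflib. A minimal sketch, assuming you have already exported the files from each environment (the file paths and labels are placeholders):

```python
import difflib
from pathlib import Path

def diff_configs(test_path, prod_path):
    """Produce a unified diff of two exported config files.

    Intended for prconfig.xml or DSS exports pulled from each
    environment. Legitimate differences (hostnames, security
    settings) should be reviewed and documented; anything else
    that differs is a candidate cause for divergent API behavior.
    """
    test_lines = Path(test_path).read_text().splitlines(keepends=True)
    prod_lines = Path(prod_path).read_text().splitlines(keepends=True)
    return list(difflib.unified_diff(
        test_lines, prod_lines,
        fromfile="test/prconfig.xml", tofile="prod/prconfig.xml",
    ))
```

Running this as part of the audit turns a manual comparison into an artifact you can attach to the investigation.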
Step 3: Performance Baseline Testing
Run controlled performance tests:
- Single API call to simple endpoint (no external dependencies)
- Measure: authentication time, rule execution time, response serialization time
- Compare test vs prod for identical call
- This isolates infrastructure issues from application issues
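The measurement itself can be a small harness that times the authentication and call phases separately over several runs. A sketch where `auth_fn` and `call_fn` are hooks your test framework supplies (hypothetical names, not Pega APIs):

```python
import statistics
import time

def baseline(auth_fn, call_fn, runs=10):
    """Time authentication and the API call separately over several runs.

    auth_fn obtains a token; call_fn(token) performs one simple API
    call with no external dependencies. Comparing the medians between
    test and prod shows whether the slowdown sits in the handshake or
    in rule execution.
    """
    auth_times, call_times = [], []
    for _ in range(runs):
        t0 = time.monotonic()
        token = auth_fn()
        auth_times.append(time.monotonic() - t0)
        t1 = time.monotonic()
        call_fn(token)
        call_times.append(time.monotonic() - t1)
    return {
        "auth_median_s": statistics.median(auth_times),
        "call_median_s": statistics.median(call_times),
    }
```

Medians rather than averages keep one slow outlier (a cold cache, a GC pause) from skewing the comparison.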
Step 4: Validation Rule Analysis
For validation failures:
- Extract the exact validation rule that’s failing in prod
- Check if rule references data pages or decision tables
- Verify data sources for those references
- Test rule execution directly (not via API) in both environments
Solutions and Best Practices:
1. Environment Parity Checklist:
Maintain a formal checklist of configs that must match:
- Core prconfig settings (timeouts, pools, threads)
- Authentication configuration (token lifetimes, certificate validation)
- Data source endpoints (ensure test data sources are prod-like)
- Rate limiting and throttling settings
- Load balancer and network timeouts
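A parity checklist is most useful when it distinguishes documented, intentional differences from undocumented ones. A minimal sketch, assuming you can flatten each environment's settings into a dict of key/value strings (the setting names below are illustrative):

```python
def check_parity(test_cfg, prod_cfg, allowed_differences=()):
    """Compare two settings dicts and report undocumented mismatches.

    allowed_differences lists keys that are expected to differ
    (e.g. security-driven settings); anything else that differs
    between environments is flagged for investigation.
    """
    mismatches = {}
    for key in sorted(set(test_cfg) | set(prod_cfg)):
        if key in allowed_differences:
            continue
        if test_cfg.get(key) != prod_cfg.get(key):
            mismatches[key] = (test_cfg.get(key), prod_cfg.get(key))
    return mismatches
```

Run in CI, a non-empty result blocks promotion until the difference is either fixed or added to the documented allow-list.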
2. Configuration as Code:
Store environment configs in version control:
- Template prconfig.xml with environment variables
- Automate config deployment with validation checks
- Maintain separate configs for test/staging/prod with documented differences
- Use Pega’s configuration management features to track changes
3. Synthetic Monitoring:
Implement continuous API monitoring in all environments:
- Run lightweight API health checks every 5 minutes
- Alert on response time degradation (>2x baseline)
- Alert on validation errors or authentication failures
- Track trends to catch gradual performance decay
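The alerting decision in such a monitor reduces to a small, testable function. A sketch where `call_fn` is a hook that performs one lightweight, read-only API call and raises on failure (a hypothetical hook; the 2x threshold mirrors the alert rule above):

```python
import time

def health_check(call_fn, baseline_s, threshold=2.0):
    """Run one synthetic check and decide whether to alert.

    Alerts fire when the call errors out or runs slower than
    threshold times the recorded baseline for that environment.
    """
    start = time.monotonic()
    try:
        call_fn()
    except Exception as exc:
        return {"alert": True, "reason": f"call failed: {exc}"}
    elapsed = time.monotonic() - start
    if elapsed > baseline_s * threshold:
        return {"alert": True,
                "reason": f"latency {elapsed:.2f}s exceeds {threshold}x baseline"}
    return {"alert": False, "reason": "ok"}
```

Scheduling this every 5 minutes per environment, and recording the results, gives you both the alerts and the trend data for catching gradual decay.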
4. Environment-Specific Testing:
Extend your CI/CD pipeline:
- Run smoke tests in production after deployment (read-only APIs)
- Compare test results against production baseline
- Flag any discrepancies for investigation before promoting
- Maintain separate test data sets that mirror production data patterns
5. Rate Limit and Error Handling:
Make your automation more resilient:
- Implement exponential backoff for timeout errors
- Detect rate limiting (429 responses) and adjust request rate
- Add environment-specific timeout configurations in test framework
- Log detailed error context for faster troubleshooting
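The backoff and 429 handling above can be sketched as two small pieces: a delay calculator that honors a server-supplied Retry-After value, and a retry loop around your framework's call. `call_fn` and `is_rate_limited` are hooks your test framework would supply; the names are illustrative, not a Pega or HTTP-library API:

```python
import time

def next_delay(attempt, retry_after=None, base_delay=1.0, cap=60.0):
    """Delay before the next retry: honor a Retry-After value when the
    server sends one, otherwise exponential backoff capped at `cap`."""
    if retry_after is not None:
        return min(float(retry_after), cap)
    return min(base_delay * (2 ** attempt), cap)

def call_with_backoff(call_fn, is_rate_limited, max_retries=5, base_delay=1.0):
    """Retry a throttled call with exponential backoff.

    call_fn performs one request and returns a response object;
    is_rate_limited(resp) reports whether it was throttled (e.g. an
    HTTP 429 status). Gives up after max_retries attempts.
    """
    for attempt in range(max_retries):
        resp = call_fn()
        if not is_rate_limited(resp):
            return resp
        time.sleep(next_delay(attempt,
                              retry_after=getattr(resp, "retry_after", None),
                              base_delay=base_delay))
    raise RuntimeError(f"still rate-limited after {max_retries} attempts")
```

The cap matters in practice: without it, a few consecutive 429s in prod can push the wait into minutes and stall the whole test run.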
For Your Specific Issues:
Timeout Problem (2s → 15s):
Most likely causes:
- Production database connection pool exhausted (check active connections)
- Load balancer or API gateway queuing requests (check queue depth metrics)
- External service calls slower in prod (check connector response times)
- Resource contention from other applications (check server CPU/memory)
Validation Problem (pass in test, fail in prod):
Most likely causes:
- Data pages pulling different reference data (check data source configs)
- Decision tables with environment-specific rules (check for when conditions)
- Access group differences affecting rule visibility (check requestor access group)
- Date/time sensitive validation with different server timezones
Recommended Action Plan:
Week 1: Enable detailed logging and capture 10 examples of failing API calls in prod vs successful in test. Analyze logs to identify exact divergence point.
Week 2: Run configuration audit comparing all prconfig, DSS, and data source settings. Document differences and assess which are legitimate (security) vs problematic (performance).
Week 3: Implement environment-specific timeout configs in your test framework to match production’s stricter limits. Add retry logic for transient failures.
Week 4: Set up synthetic monitoring in production to catch regressions early. Create runbook for common environment-specific issues.
The key insight is that test and production will never be 100% identical: security and scale requirements demand differences. Your automation framework needs to account for expected variations while alerting on unexpected ones. Focus on making your tests resilient to legitimate environment differences rather than trying to eliminate them all.