Here’s a comprehensive solution addressing all three aspects of your OAuth2 token management issue:
1. OAuth2 Refresh Token Flow Implementation
Implement proactive token refresh in your sync job. Before each batch operation, check token validity:
# Refresh proactively if the access token expires within the next 5 minutes
if current_time + 300 > token_expiry_time:
    new_access, new_refresh = refresh_access_token(stored_refresh_token)
    update_stored_tokens(new_access, new_refresh)
This refreshes preemptively five minutes before expiry (the 300-second buffer). Never wait for a 401 error to trigger the refresh; reactive refreshing mid-run is exactly what causes batch failures and data inconsistencies.
2. Long-Running Sync Job Handling
For multi-hour sync operations, implement a token refresh wrapper around your batch processing:
- Store initial token expiry timestamp (issued_at + expires_in)
- Before each batch API call, validate token freshness
- Use a thread-safe token store if running parallel sync jobs
- Log all token refresh operations with timestamps for debugging
Key pattern: Treat token management as a cross-cutting concern, not batch-specific logic. Create a dedicated TokenManager class that your sync service depends on.
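A minimal sketch of such a TokenManager, assuming hypothetical `refresh_fn` and `store_fn` callbacks that stand in for your OAuth2 token-endpoint call and encrypted token store (the names are illustrative, not a real SDK):

```python
import threading
import time


class TokenManager:
    """Cross-cutting token management shared by all sync batches."""

    REFRESH_BUFFER_SECONDS = 300  # refresh 5 minutes before expiry

    def __init__(self, access_token, refresh_token, expires_at, refresh_fn, store_fn):
        # refresh_fn(refresh_token) -> (access, refresh, expires_in) is an
        # assumption standing in for your OAuth2 token-endpoint call;
        # store_fn(access, refresh, expires_at) persists to secure storage.
        self._access = access_token
        self._refresh = refresh_token
        self._expires_at = expires_at
        self._refresh_fn = refresh_fn
        self._store_fn = store_fn
        self._lock = threading.Lock()  # thread-safe for parallel sync jobs

    def get_valid_token(self):
        with self._lock:
            if time.time() + self.REFRESH_BUFFER_SECONDS >= self._expires_at:
                access, refresh, expires_in = self._refresh_fn(self._refresh)
                # Refresh tokens are single-use: replace BOTH tokens.
                self._access, self._refresh = access, refresh
                self._expires_at = time.time() + expires_in
                self._store_fn(access, refresh, self._expires_at)
                print(f"token refreshed at {time.time():.0f}")  # log for debugging
            return self._access
```

Each batch then calls `get_valid_token()` immediately before its API request, so the sync loop never needs to know about expiry or rotation.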
3. API Error 401 Troubleshooting
When 401 errors occur despite refresh logic:
- Verify refresh token hasn’t expired (90-day default in Zendesk Sell)
- Check that you’re updating BOTH access and refresh tokens after each refresh (refresh tokens are single-use)
- Confirm your Integration Hub API credentials have correct scopes (contacts:write, contacts:read)
- Monitor token endpoint rate limits (max 10 refresh requests per minute)
- Implement exponential backoff for transient failures: 2s, 4s, 8s delays
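The backoff step above can be sketched as a small retry helper; `TransientAPIError` is an illustrative stand-in for whatever retryable exception your HTTP client raises:

```python
import time


class TransientAPIError(Exception):
    """Stand-in for your HTTP client's retryable error (e.g. 429/5xx)."""


def with_backoff(call, retries=3, base_delay=2.0, sleep=time.sleep):
    """Retry `call` on transient failure with exponential backoff: 2s, 4s, 8s."""
    for attempt in range(retries + 1):
        try:
            return call()
        except TransientAPIError:
            if attempt == retries:
                raise  # retries exhausted: surface the error to the caller
            sleep(base_delay * (2 ** attempt))  # 2s, then 4s, then 8s
```

Wrapping only the token-endpoint call in `with_backoff` keeps you inside the refresh rate limit while still absorbing transient failures.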
Common pitfall: Reusing the same refresh token multiple times. Each refresh operation returns a NEW refresh token that must replace the old one in your secure storage.
Implementation Best Practices:
- Store tokens encrypted in your database with expiry timestamps
- Implement a token refresh mutex for concurrent sync jobs
- Set up monitoring alerts for refresh failures (more than 3 consecutive failures warrants investigation)
- Test token refresh logic with artificially short token lifespans (5-minute tokens) in staging
- Document token lifecycle in your integration runbook
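The failure-alert threshold above can be tracked with a small counter; `alert_fn` is a hypothetical hook into your alerting system:

```python
class RefreshFailureMonitor:
    """Alert after more than 3 consecutive token refresh failures."""

    def __init__(self, threshold=3, alert_fn=print):
        self._threshold = threshold
        self._failures = 0
        self._alert_fn = alert_fn  # assumption: wire this to your alerting

    def record_success(self):
        self._failures = 0  # any successful refresh resets the streak

    def record_failure(self):
        self._failures += 1
        if self._failures > self._threshold:
            self._alert_fn(f"{self._failures} consecutive token refresh failures")
```

Counting consecutive rather than total failures avoids paging on isolated blips while still catching a genuinely broken refresh path.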
For your specific 50k+ record sync scenario, this approach will handle the 45+ minute runtime without interruption. The proactive refresh ensures the token is always valid when batches execute, eliminating the random 401 failures you’re experiencing between batches 85-95.
If you need to handle even longer sync jobs (4+ hours), consider implementing a job checkpoint system that can resume from the last successful batch if any catastrophic failure occurs. This pairs well with the token refresh logic and provides additional resilience for large-scale data migrations.
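A minimal sketch of such a checkpoint system, assuming a local JSON file for illustration (in production you would persist this in your database alongside the tokens):

```python
import json
import os

CHECKPOINT_FILE = "sync_checkpoint.json"  # assumption: swap for a DB row


def save_checkpoint(batch_index, path=CHECKPOINT_FILE):
    """Persist the index of the last successfully completed batch."""
    with open(path, "w") as f:
        json.dump({"last_completed_batch": batch_index}, f)


def load_checkpoint(path=CHECKPOINT_FILE):
    """Return the batch index to resume from (0 if no checkpoint exists)."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["last_completed_batch"] + 1


def run_sync(batches, process_batch, path=CHECKPOINT_FILE):
    """Resume from the last checkpoint and checkpoint after every batch."""
    for i in range(load_checkpoint(path), len(batches)):
        process_batch(batches[i])
        save_checkpoint(i, path)
```

Checkpointing after each batch means a crash at batch 90 of a 50k-record sync costs you one batch of rework, not 45 minutes.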