Your Lambda Invoke API rate limiting issue requires a multi-faceted approach addressing all three key areas:
Lambda API Rate Limits:
The synchronous Invoke API has a default quota of 10 requests per second per region (20 in some regions). At 500 records/minute, you're averaging about 8.3 invocations per second, which sits just under the limit; any burst or variance pushes you over. The solution isn't just handling the limit but reducing your API call frequency.
Immediate action: Request a service quota increase through AWS Service Quotas console. Select “Lambda” → “Synchronous invocation requests per second” and request an increase to 50-100 TPS. AWS typically approves these within 24-48 hours for legitimate use cases.
Batch Invocation Best Practices:
Refactor your pipeline to batch records before invoking Lambda:
// Current: 1 record = 1 API call
invoke(lambda, {record: single_record})
// Optimized: 20 records = 1 API call
invoke(lambda, {records: [r1, r2, ..., r20]})
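In Python with boto3, the optimized batch call above might look like the sketch below. The function name "my-processor" is a placeholder, and the `client` parameter is assumed to be a boto3 Lambda client (`boto3.client("lambda")`); the payload shape matches the Input/Output contract described further down.

```python
import json

def invoke_batch(client, records, function_name="my-processor"):
    """Send up to 20 records in a single synchronous Invoke call.

    `client` is a boto3 Lambda client; "my-processor" is a
    placeholder for your actual function name.
    """
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # synchronous invocation
        Payload=json.dumps({"records": records}),
    )
    # response["Payload"] is a streaming body; read and decode the JSON result
    return json.loads(response["Payload"].read())["results"]
```

Keeping the client as a parameter (rather than a module-level global) also makes the function easy to unit-test with a stub.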
Implement client-side batching logic:
// Pseudocode - Batch accumulator:
1. Accumulate SQS messages in buffer (max 20 or 5sec timeout)
2. When batch full or timeout: invoke Lambda with batch
3. Parse batch response for individual record results
4. Update database with per-record success/failure
5. Delete processed messages from SQS
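The buffering half of the accumulator (steps 1-2) can be sketched as a small class. The class name, defaults, and injectable `clock` are illustrative choices, not part of any AWS API; the flushed batch would then be passed to your Lambda invoke call, and steps 3-5 (parse results, update the database, delete SQS messages) happen on the caller's side.

```python
import time

class BatchAccumulator:
    """Buffers messages; signals a flush when the batch is full
    or the oldest buffered message has waited too long."""

    def __init__(self, max_size=20, max_wait_s=5.0, clock=time.monotonic):
        self.max_size = max_size
        self.max_wait_s = max_wait_s
        self.clock = clock          # injectable for testing
        self.buffer = []
        self.first_added = None     # timestamp of oldest buffered message

    def add(self, message):
        """Add a message; return True when the batch should be flushed."""
        if not self.buffer:
            self.first_added = self.clock()
        self.buffer.append(message)
        return self.ready()

    def ready(self):
        if not self.buffer:
            return False
        return (len(self.buffer) >= self.max_size
                or self.clock() - self.first_added >= self.max_wait_s)

    def flush(self):
        """Return the current batch and reset the buffer."""
        batch, self.buffer, self.first_added = self.buffer, [], None
        return batch
```

A polling loop would call `add()` for each received SQS message and also check `ready()` between polls, so the timeout fires even when no new messages arrive.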
This reduces your 500 invocations/minute to just 25 invocations/minute (20x improvement), well under any rate limit. Your Lambda needs to handle batch input and return structured results:
Input:  {"records": [{"id": "1", "data": "..."}, ...]}
Output: {"results": [{"id": "1", "status": "success"}, ...]}
Exponential Backoff with Jitter:
Even with batching, implement retry logic for resilience:
// Pseudocode - Retry with exponential backoff:
1. Set base_delay = 100ms, max_retries = 5
2. On TooManyRequestsException:
- Calculate: delay = min(base_delay * 2^attempt, 10000)
- Add jitter: delay += random(0, delay * 0.3)
- Sleep for delay milliseconds
- Retry invocation
3. If max_retries exceeded: send to DLQ for manual review
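The retry steps above translate to the Python sketch below. `Throttled` is a local stand-in for botocore's `TooManyRequestsException`, and `invoke_fn`, `sleep_ms`, and `rng` are injectable parameters introduced here for illustration (and to keep the logic testable); in production you would catch `lambda_client.exceptions.TooManyRequestsException` around the real invoke call.

```python
import random
import time

BASE_DELAY_MS = 100
MAX_DELAY_MS = 10_000
MAX_RETRIES = 5

def backoff_delay_ms(attempt, rng=random.random):
    """Exponential backoff capped at 10 s, plus up to 30% jitter."""
    delay = min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS)
    return delay + rng() * delay * 0.3   # jitter scales with the delay

class Throttled(Exception):
    """Stand-in for botocore's TooManyRequestsException in this sketch."""

def invoke_with_retry(invoke_fn, *, max_retries=MAX_RETRIES,
                      sleep_ms=lambda ms: time.sleep(ms / 1000),
                      rng=random.random):
    """Call invoke_fn, retrying on throttling with backoff + jitter."""
    for attempt in range(max_retries + 1):
        try:
            return invoke_fn()
        except Throttled:
            if attempt == max_retries:
                raise  # exhausted: caller routes the payload to the DLQ
            sleep_ms(backoff_delay_ms(attempt, rng=rng))
```

Note that the jitter term is proportional to the calculated delay (per the "Critical" point below), so late attempts spread over a wider window than early ones.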
The jitter (30% randomization) prevents synchronized retries across multiple pipeline workers, which would create a thundering herd. Critical: Use jitter based on the calculated delay, not a fixed range.
Additional Optimizations:
- Implement client-side rate limiting using a token bucket algorithm: Allow 8 invocations per second with burst capacity of 15. This prevents hitting AWS limits.
- Monitor the CloudWatch metric Throttles for your Lambda function. If it's non-zero, you're also hitting concurrent execution limits (separate from API rate limits).
- Consider Lambda’s native SQS event source mapping as an alternative. It automatically batches and handles retries, eliminating Invoke API calls entirely. Configure batch size to 20 and batch window to 5 seconds.
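The token bucket mentioned in the first bullet is a few lines of state. This is a generic single-threaded sketch (the class name and injectable `clock` are illustrative, not a library API); wrap calls in a lock if multiple worker threads share one bucket.

```python
import time

class TokenBucket:
    """Client-side rate limiter: refills at `rate` tokens/sec,
    holds at most `burst` tokens."""

    def __init__(self, rate=8.0, burst=15, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock          # injectable for testing
        self.tokens = float(burst)  # start full
        self.last = clock()

    def try_acquire(self):
        """Consume one token if available; return False to signal 'wait'."""
        now = self.clock()
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Before each Invoke call, loop on `try_acquire()` (sleeping briefly between attempts); with `rate=8` and `burst=15` this keeps steady-state traffic under the stated limit while absorbing short bursts.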
The combination of batching (20x reduction in API calls) plus backoff/jitter (graceful retry handling) plus the rate limit increase (safety margin) will completely eliminate your Rate Limit Exceeded errors while maintaining synchronous processing semantics.