Lambda function times out when performing batch write operations to DynamoDB with large payloads

Our Lambda function processes S3 event notifications and writes records to DynamoDB using batch_write_item. It works fine for small files but times out (3 minute limit) when processing larger datasets. The function receives a list of items from S3, transforms them, and attempts to write 500-800 records in batches of 25.

Current implementation:

import boto3

table = boto3.resource('dynamodb').Table('your-table-name')

with table.batch_writer() as batch:
    for item in items:
        batch.put_item(Item=transform_item(item))

We’ve tried increasing memory from 512MB to 3GB which improved performance slightly but still hitting timeouts. The payload size for each item is around 2-3KB. Should we be chunking the data differently or is there a better approach for handling large batch operations in Lambda?

The issue is likely unprocessed items accumulating. DynamoDB's batch_write_item has throughput limits and returns unprocessed items when you exceed provisioned capacity. The boto3 batch_writer retries those automatically, but the retries take time. Check your DynamoDB table's WCU (write capacity units) - your writes may be getting throttled. Also, are you processing everything in one Lambda invocation?
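To confirm throttling, you can read the table's WriteThrottleEvents metric from CloudWatch. A minimal sketch (the table name is a placeholder, and this assumes standard AWS credentials are configured):

```python
from datetime import datetime, timedelta, timezone

def throttle_query(table_name, minutes=60):
    # GetMetricStatistics parameters for the table's WriteThrottleEvents metric.
    now = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/DynamoDB",
        "MetricName": "WriteThrottleEvents",
        "Dimensions": [{"Name": "TableName", "Value": table_name}],
        "StartTime": now - timedelta(minutes=minutes),
        "EndTime": now,
        "Period": 60,  # one datapoint per minute
        "Statistics": ["Sum"],
    }

def write_throttle_events(table_name, minutes=60):
    # Sum throttled write events over the lookback window.
    import boto3  # deferred so throttle_query is usable without AWS access
    cloudwatch = boto3.client("cloudwatch")
    resp = cloudwatch.get_metric_statistics(**throttle_query(table_name, minutes))
    return sum(dp["Sum"] for dp in resp["Datapoints"])
```

A non-zero sum over the window your Lambda ran in confirms the throttling theory.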

For a complete solution, here's a proven pattern that addresses the three key issues: Lambda timeout constraints, DynamoDB batch write efficiency, and payload size management.

Architecture Pattern:

  1. Producer Lambda (S3 trigger) - Reads file, creates batches of 10-25 items, sends to SQS
  2. SQS Queue - Buffers work items; set the visibility timeout to at least the consumer Lambda's timeout (AWS recommends six times the function timeout for SQS event sources)
  3. Consumer Lambda (SQS trigger) - Processes batches and writes to DynamoDB

Producer Lambda optimization:

import boto3
import json

sqs = boto3.client('sqs')
QUEUE_URL = 'your-queue-url'
BATCH_SIZE = 25  # items per SQS message, not per DynamoDB write

# Chunk the transformed items and enqueue one message per chunk.
for i in range(0, len(items), BATCH_SIZE):
    batch = items[i:i + BATCH_SIZE]
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(batch))
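If producer throughput matters, sqs.send_message_batch can enqueue up to 10 messages per API call instead of one per call. A sketch along the same lines (the queue URL is a placeholder, as above):

```python
import json

def chunk(seq, size):
    # Yield successive fixed-size slices of a list.
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def send_batches(items, queue_url, batch_size=25):
    # Pack items into SQS messages, then send the messages ten at a time
    # (send_message_batch accepts at most 10 entries per call).
    import boto3  # deferred so the chunk helper has no AWS dependency
    sqs = boto3.client("sqs")
    messages = [json.dumps(c) for c in chunk(items, batch_size)]
    for group in chunk(messages, 10):
        entries = [{"Id": str(i), "MessageBody": body}
                   for i, body in enumerate(group)]
        sqs.send_message_batch(QueueUrl=queue_url, Entries=entries)
```

For 800 items this drops the producer from 32 send_message calls to 4 send_message_batch calls.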

Consumer Lambda configuration:

  • Set batch size to 10 (SQS trigger receives up to 10 messages)
  • Reserved concurrency: 50-100 (controls parallel executions)
  • Timeout: 60 seconds (processes smaller chunks quickly)

DynamoDB write optimization:

def write_batch(items):
    # overwrite_by_pkeys de-duplicates items that share a partition key
    # within one batch; batch_writer retries unprocessed items for us.
    with table.batch_writer(overwrite_by_pkeys=['id']) as batch:
        for item in items:
            batch.put_item(Item=item)
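Putting it together, the consumer handler might look like the sketch below. It parses the SQS event, writes each message's items with the write_batch helper above, and reports per-message failures so SQS redelivers only the failed messages (this assumes the event source mapping has ReportBatchItemFailures enabled):

```python
import json

def parse_records(event):
    # Each SQS record body is one JSON-encoded batch of items.
    return [(r["messageId"], json.loads(r["body"])) for r in event["Records"]]

def handler(event, context):
    failures = []
    for message_id, items in parse_records(event):
        try:
            write_batch(items)  # the batch_writer helper above
        except Exception:
            # Report only this message; the rest of the batch is not retried.
            failures.append({"itemIdentifier": message_id})
    return {"batchItemFailures": failures}
```

Without ReportBatchItemFailures, one bad message would force SQS to redeliver all 10 messages in the poll batch.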

Key improvements this provides:

  1. Lambda Timeout Resolution: Each consumer invocation receives at most 10 SQS messages; at 25 items per message that is at most 250 writes, completing in 10-30 seconds instead of approaching the 3-minute limit.

  2. Payload Size Management: SQS messages stay under 256KB limit. If individual items are 2-3KB, batches of 25 keep messages around 75KB. For larger items, reduce batch size to 10-15.

  3. DynamoDB Batch Write Efficiency: The consumer Lambda's batch_writer handles retries automatically, and if a whole batch fails, the SQS visibility timeout ensures the message is redelivered. Set the DynamoDB table to on-demand, or provision WCU ≈ items per second × item size rounded up to the nearest KB (one WCU covers one write of up to 1KB per second).
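The capacity arithmetic in point 3 can be sketched as a one-liner:

```python
import math

def estimate_wcu(item_size_kb, items_per_second):
    # Each write consumes ceil(item_size / 1KB) write capacity units.
    return math.ceil(item_size_kb) * items_per_second
```

For the 2-3KB items in the question at, say, 100 writes per second, estimate_wcu(3, 100) gives 300 WCU.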

Additional Optimizations:

  • Enable SQS dead-letter queue for failed messages after 3 retries
  • Use Lambda reserved concurrency to prevent overwhelming DynamoDB
  • Add CloudWatch alarms on SQS ApproximateAgeOfOldestMessage (alert if queue backs up)
  • Consider DynamoDB PartiQL batch statements if you need conditional writes
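The dead-letter queue in the first bullet is wired up with a RedrivePolicy on the source queue. A sketch, assuming the DLQ already exists (the queue URL and ARN are placeholders):

```python
import json

def redrive_policy(dlq_arn, max_receives=3):
    # After max_receives failed deliveries, SQS moves the message to the DLQ.
    return json.dumps({
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": str(max_receives),
    })

def attach_dlq(queue_url, dlq_arn):
    import boto3  # deferred so redrive_policy has no AWS dependency
    sqs = boto3.client("sqs")
    sqs.set_queue_attributes(
        QueueUrl=queue_url,
        Attributes={"RedrivePolicy": redrive_policy(dlq_arn)},
    )
```

With maxReceiveCount of 3, a batch that keeps failing lands in the DLQ for inspection instead of retrying forever.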

This pattern scales to millions of items while keeping individual Lambda executions fast and reliable. The SQS buffer absorbs traffic spikes and DynamoDB auto-scaling adapts to the sustained write rate.

On-demand mode doesn't eliminate throttling: a newly created on-demand table can serve up to roughly 4,000 writes per second, and traffic that more than doubles your previous peak can be throttled briefly while the table adapts. Even with auto-scaling there's an adaptation period. For your use case, I'd recommend using SQS as a buffer: have the first Lambda read S3 and send messages to SQS in batches, then use a second Lambda with an SQS trigger to process smaller chunks. This spreads the load and prevents timeouts since each invocation handles fewer items. You could also use Step Functions to orchestrate parallel Lambda executions if you need faster processing. The key is breaking the monolithic batch operation into manageable units that complete well under the timeout limit.

Yes, a single invocation processes the entire file. The table uses on-demand billing, so WCU shouldn't be the bottleneck, but I do see throttling metrics in CloudWatch. Would splitting into multiple Lambda invocations help? How would you recommend structuring that?