I’ve dealt with this exact scenario for SAP BW to Snowflake pipelines. Here’s a comprehensive solution addressing all the factors:
Snowpipe Configuration Optimization:
First, upgrade your architecture to use an SNS-to-SQS fan-out pattern instead of direct S3-to-SQS notifications. Create an SNS topic in the same region as your S3 bucket, configure S3 event notifications to publish to it, and reference the topic ARN when you create the pipe — Snowflake then subscribes its own SQS queue to the topic automatically. This decouples producers from consumers, adds delivery retries, lets other subscribers share the same events, and handles high-volume bursts from SAP extracts more gracefully.
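A minimal sketch of the pipe definition, with placeholder names and a placeholder topic ARN (table, stage, and format are assumptions — substitute your own):

```sql
-- Hypothetical names/ARN; AUTO_INGEST with AWS_SNS_TOPIC makes Snowflake
-- subscribe its managed SQS queue to your SNS topic automatically.
CREATE OR REPLACE PIPE sap_transactional_pipe
  AUTO_INGEST = TRUE
  AWS_SNS_TOPIC = 'arn:aws:sns:eu-west-1:123456789012:sap-extract-events'
AS
  COPY INTO sap_transactions
  FROM @sap_stage/sap/bw/transactional/
  FILE_FORMAT = (TYPE = CSV, SKIP_HEADER = 1);
```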
For S3 event notifications, ensure you’re using these settings:
- Event type: s3:ObjectCreated:*
- Prefix filter: Match your SAP data path exactly (e.g., sap/bw/transactional/)
- Suffix filter: Add .csv or .parquet to avoid triggering on metadata files
Verify the whole permission chain: the S3 bucket must be allowed to publish to the SNS topic, the topic policy must allow Snowflake's SQS queue to subscribe, and the IAM role behind your storage integration needs s3:GetObject, s3:GetObjectVersion, and s3:ListBucket on the stage path.
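For the Snowflake side of that chain, a storage integration sketch (role ARN and bucket are placeholders):

```sql
-- Placeholder role ARN and bucket. After creating the integration, run
-- DESC INTEGRATION and add the generated STORAGE_AWS_IAM_USER_ARN /
-- external ID to the AWS role's trust policy.
CREATE OR REPLACE STORAGE INTEGRATION sap_s3_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-sap-loader'
  STORAGE_ALLOWED_LOCATIONS = ('s3://my-sap-bucket/sap/bw/');

DESC INTEGRATION sap_s3_int;
```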
Warehouse Sizing Strategy:
A nuance that matters here: auto-ingest Snowpipe runs on Snowflake-managed serverless compute, so the pipe itself does not use your X-Small warehouse, and resizing it will not directly change ingestion latency. Warehouse size does matter for the downstream transformations and dashboard queries that consume the loaded data. Rough starting points based on typical SAP BW extract volumes:
- 100-500 files/hour: Small warehouse
- 500-2,000 files/hour: Medium warehouse
- Above 2,000 files/hour: Large, or split the load across multiple pipes
For the pipe itself, compare FILES_INSERTED in PIPE_USAGE_HISTORY against the pendingFileCount reported by SYSTEM$PIPE_STATUS. A backlog that keeps growing during business hours means files are arriving faster than they can be ingested — usually a file-sizing problem rather than a compute problem.
Set the downstream warehouse to auto-suspend after 60 seconds to control costs when file arrival is sporadic (common with scheduled SAP extracts).
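The auto-suspend setting is a one-liner (warehouse name is a placeholder):

```sql
-- Hypothetical warehouse name; suspend after 60 idle seconds,
-- wake automatically on the next query.
ALTER WAREHOUSE transform_wh SET
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE;
```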
Monitoring and Troubleshooting:
Set up comprehensive monitoring with these queries:
```sql
-- Check pipe lag and pending files
SELECT SYSTEM$PIPE_STATUS('SAP_TRANSACTIONAL_PIPE');

-- Monitor load performance over the last 24 hours
SELECT
    TO_DATE(LAST_LOAD_TIME)   AS load_date,
    COUNT(*)                  AS files_loaded,
    AVG(ROW_COUNT)            AS avg_rows,
    AVG(FILE_SIZE) / 1048576  AS avg_size_mb  -- FILE_SIZE is in bytes
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
    TABLE_NAME => 'SAP_TRANSACTIONS',
    START_TIME => DATEADD(hour, -24, CURRENT_TIMESTAMP())))
GROUP BY 1
ORDER BY 1 DESC;
```
Create a monitoring task that runs every 5 minutes:
- Check if pending file count > 100 (indicates backlog)
- Verify last successful load was within last 10 minutes
- Alert if error count increases
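A sketch of such a task, assuming a pipe_health_log table and a monitor_wh warehouse (both hypothetical names) already exist; alerting would then query this log:

```sql
-- Hypothetical names throughout. Records pending-file count and the most
-- recent successful load time every 5 minutes.
CREATE OR REPLACE TASK monitor_sap_pipe
  WAREHOUSE = monitor_wh
  SCHEDULE = '5 MINUTE'
AS
  INSERT INTO pipe_health_log (checked_at, pending_files, last_load_time)
  SELECT
    CURRENT_TIMESTAMP(),
    PARSE_JSON(SYSTEM$PIPE_STATUS('SAP_TRANSACTIONAL_PIPE')):pendingFileCount::INT,
    (SELECT MAX(LAST_LOAD_TIME)
       FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
         TABLE_NAME => 'SAP_TRANSACTIONS',
         START_TIME => DATEADD(hour, -1, CURRENT_TIMESTAMP()))));

ALTER TASK monitor_sap_pipe RESUME;  -- tasks are created suspended
```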
SAP-Specific Optimizations:
For SAP BW integration, coordinate with your ABAP team to:
- Schedule extracts during off-peak hours when possible
- Split large extracts into chunks of roughly 100-250 MB compressed per file (Snowflake's general loading recommendation); avoid both multi-gigabyte files and swarms of tiny files, since Snowpipe charges a per-file overhead
- Use incremental delta loads rather than full refreshes
- Implement file-level checksums in naming convention for validation
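If the extract filename carries a checksum token — say sales_20240101_<md5>.csv, a hypothetical convention — you can capture it at load time via the file metadata column and validate afterwards:

```sql
-- Hypothetical table/stage names and filename convention. METADATA$FILENAME
-- exposes the staged file path inside a COPY transformation.
COPY INTO sap_transactions (txn_data, source_file, file_checksum)
FROM (
  SELECT
    $1,
    METADATA$FILENAME,
    SPLIT_PART(SPLIT_PART(METADATA$FILENAME, '_', 3), '.', 1)  -- checksum token
  FROM @sap_stage/sap/bw/transactional/
)
FILE_FORMAT = (TYPE = CSV);
```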
Expected Results:
After implementing these changes, you should see:
- Ingestion latency reduced to 2-5 minutes for most files
- Consistent processing even during peak SAP extract windows
- Better visibility into pipeline health through monitoring
- More predictable dashboard refresh cycles
The combination of proper SNS/SQS configuration, right-sized extract files, and comprehensive monitoring should resolve your delays. Start with the event-notification fix and file sizing — those are the levers that act on Snowpipe latency directly; warehouse changes mainly speed up the downstream queries that feed your dashboards.