Snowpipe delays when loading SAP transactional data through S3

We’re experiencing significant delays with Snowpipe ingestion from our SAP BW system. The pipeline loads transactional data via S3, but we’re seeing 15-30 minute lags between file arrival and data availability in Snowflake.

Our setup uses S3 event notifications to trigger Snowpipe, but the SQS queue seems to accumulate messages during peak hours (8 AM - 11 AM). We’ve checked the Snowpipe status and it shows ‘RUNNING’, but the COPY_HISTORY reveals inconsistent load times.

The warehouse assigned to Snowpipe tasks is X-Small. Could this be causing the bottleneck? We need near real-time analytics for our SAP sales data, and these delays are impacting dashboard refresh rates. Any insights on optimizing Snowpipe configuration or monitoring would be appreciated.

I’ve dealt with this exact scenario for SAP BW to Snowflake pipelines. Here’s a comprehensive solution addressing all the factors:

Snowpipe Configuration Optimization:

First, upgrade your architecture to use SNS-to-SQS pattern instead of direct SQS. Create an SNS topic in the same region as your S3 bucket, configure S3 event notifications to publish to SNS, then subscribe your SQS queue to the topic. This adds retry logic and better handles high-volume bursts from SAP extracts.
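With the SNS topic in place, the pipe can reference it directly through the AWS_SNS_TOPIC parameter on CREATE PIPE. A minimal sketch; the database, schema, stage, table names, and topic ARN are placeholders you'd swap for your own:

```sql
-- Placeholder names throughout; the SNS topic ARN must match the topic
-- that receives your S3 ObjectCreated events.
CREATE OR REPLACE PIPE sap_db.ingest.sap_transactional_pipe
  AUTO_INGEST = TRUE
  AWS_SNS_TOPIC = 'arn:aws:sns:us-east-1:123456789012:sap-bw-extracts'
AS
COPY INTO sap_db.ingest.sap_transactions
FROM @sap_db.ingest.sap_bw_stage/sap/bw/transactional/
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
```

When AWS_SNS_TOPIC is set, Snowflake subscribes its own SQS queue to the topic for you, which is what makes this pattern work alongside other consumers of the same bucket's events.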

For S3 event notifications, ensure you’re using these settings:

  • Event type: s3:ObjectCreated:*
  • Prefix filter: Match your SAP data path exactly (e.g., sap/bw/transactional/)
  • Suffix filter: Add .csv or .parquet to avoid triggering on metadata files

Verify IAM permissions allow S3 to publish to SNS, SNS to deliver to your SQS queue, and Snowflake's storage integration role to read the staged files. That role's policy should include s3:GetObject, s3:GetObjectVersion, and s3:ListBucket.

Warehouse Sizing Strategy:

For SAP transactional data with 15-30 minute delays on X-Small, you need to scale up. Based on typical SAP BW extract volumes:

  • If processing 100-500 files/hour: Use Small warehouse
  • If processing 500-2000 files/hour: Use Medium warehouse
  • Above 2000 files/hour: Consider Large or split into multiple pipes

The key metric is FILES_INSERTED in PIPE_USAGE_HISTORY, compared against the pendingFileCount reported by SYSTEM$PIPE_STATUS. If a backlog consistently builds during business hours, you're undersized.
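One way to trend this is the PIPE_USAGE_HISTORY table function. A sketch, with the pipe name as an illustrative placeholder:

```sql
-- Hourly throughput and credit burn over the last day; a sustained drop in
-- FILES_INSERTED while files keep arriving in S3 points at undersized compute.
SELECT
  DATE_TRUNC('hour', START_TIME) AS hour,
  SUM(FILES_INSERTED)            AS files_inserted,
  SUM(CREDITS_USED)              AS credits_used
FROM TABLE(INFORMATION_SCHEMA.PIPE_USAGE_HISTORY(
  DATE_RANGE_START => DATEADD('hour', -24, CURRENT_TIMESTAMP()),
  PIPE_NAME        => 'SAP_TRANSACTIONAL_PIPE'))
GROUP BY 1
ORDER BY 1 DESC;
```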

Implement warehouse auto-suspend after 60 seconds to control costs when file arrival is sporadic (common with scheduled SAP extracts).
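Assuming a dedicated warehouse for the load side of the pipeline (the name below is a placeholder), sizing and suspend behavior can be set in one statement:

```sql
-- Suspend after 60 seconds idle, resume automatically on the next request.
ALTER WAREHOUSE sap_load_wh SET
  WAREHOUSE_SIZE = 'SMALL'
  AUTO_SUSPEND   = 60
  AUTO_RESUME    = TRUE;
```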

Monitoring and Troubleshooting:

Set up comprehensive monitoring with these queries:

-- Check pipe lag
SELECT SYSTEM$PIPE_STATUS('SAP_TRANSACTIONAL_PIPE');

-- Monitor load performance
SELECT
  TO_DATE(LAST_LOAD_TIME) as load_date,
  COUNT(*) as files_loaded,
  AVG(ROW_COUNT) as avg_rows,
  AVG(FILE_SIZE) / 1048576 as avg_size_mb
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
  TABLE_NAME=>'SAP_TRANSACTIONS',
  START_TIME=>DATEADD(hours, -24, CURRENT_TIMESTAMP())))
GROUP BY 1 ORDER BY 1 DESC;

Create a monitoring task that runs every 5 minutes:

  • Check if pending file count > 100 (indicates backlog)
  • Verify last successful load was within last 10 minutes
  • Alert if error count increases
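The checks above can be sketched as a serverless task. This is an outline, not a drop-in implementation: the pipe_health_log table and task name are assumptions, and SYSTEM$PIPE_STATUS returns a JSON string, so fields are pulled out with PARSE_JSON:

```sql
-- Hypothetical monitoring task; create pipe_health_log
-- (checked_at TIMESTAMP, pending_files INT, execution_state STRING) first.
CREATE OR REPLACE TASK sap_pipe_monitor
  SCHEDULE = '5 MINUTE'
  USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XSMALL'
AS
  INSERT INTO pipe_health_log (checked_at, pending_files, execution_state)
  SELECT
    CURRENT_TIMESTAMP(),
    PARSE_JSON(SYSTEM$PIPE_STATUS('SAP_TRANSACTIONAL_PIPE')):pendingFileCount::INT,
    PARSE_JSON(SYSTEM$PIPE_STATUS('SAP_TRANSACTIONAL_PIPE')):executionState::STRING;

ALTER TASK sap_pipe_monitor RESUME;
```

Your alerting layer then queries pipe_health_log for pending_files > 100 or a stale checked_at on successful loads.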

SAP-Specific Optimizations:

For SAP BW integration, coordinate with your ABAP team to:

  1. Schedule extracts during off-peak hours when possible
  2. Split large extracts into smaller chunks (max 100MB per file)
  3. Use incremental delta loads rather than full refreshes
  4. Implement file-level checksums in naming convention for validation
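If the ABAP team embeds a record count or checksum in the file name, loads can be validated after the fact from COPY_HISTORY. Purely illustrative: the rowsNNN suffix convention below is an assumption about the naming scheme, not an SAP or Snowflake standard:

```sql
-- Flag files whose loaded row count disagrees with the count embedded in the
-- name (e.g. sales_delta_20240115_rows128000.csv -- hypothetical convention).
SELECT
  FILE_NAME,
  ROW_COUNT,
  REGEXP_SUBSTR(FILE_NAME, 'rows([0-9]+)', 1, 1, 'e')::INT AS expected_rows
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
  TABLE_NAME => 'SAP_TRANSACTIONS',
  START_TIME => DATEADD('hours', -24, CURRENT_TIMESTAMP())))
WHERE REGEXP_SUBSTR(FILE_NAME, 'rows([0-9]+)', 1, 1, 'e')::INT
      IS DISTINCT FROM ROW_COUNT;
```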

Expected Results:

After implementing these changes, you should see:

  • Ingestion latency reduced to 2-5 minutes for most files
  • Consistent processing even during peak SAP extract windows
  • Better visibility into pipeline health through monitoring
  • More predictable dashboard refresh cycles

The combination of proper SNS/SQS configuration, an appropriately sized warehouse, and comprehensive monitoring should resolve your delays. Start with the warehouse upgrade to Small; in my experience that alone cuts latency substantially.

X-Small warehouse is definitely your problem for SAP transactional volumes. During peak hours, the compute capacity can’t keep up with the file ingestion rate. I’d recommend starting with Small or Medium warehouse, especially if you’re processing hundreds of files per hour.

You can monitor this by querying PIPE_USAGE_HISTORY - look at the CREDITS_USED and FILES_INSERTED columns. If you see credits maxing out during your delay windows, that’s your confirmation.

Thanks for the suggestions. I checked our setup - we’re using direct SQS without SNS. The S3 events are configured correctly with ObjectCreated:Put triggers.

Queried PIPE_USAGE_HISTORY and you’re right - credits spike to maximum during our delay periods. The FILES_INSERTED count drops significantly even though files are arriving in S3. Looks like warehouse sizing is the primary issue here.

First thing to check is your S3 event notification configuration. Are you using SNS topic with SQS subscription, or direct SQS? The SNS-to-SQS pattern provides better reliability for high-volume scenarios.

Also verify that your S3 bucket has the correct event configuration - it should trigger on ‘ObjectCreated’ events with the appropriate prefix filter matching your SAP data path. Permission issues between S3, SNS/SQS, and Snowpipe can cause silent failures that manifest as delays.
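You can cross-check this from the Snowflake side: SHOW PIPES exposes the notification_channel column, which is the SQS ARN Snowflake listens on, and it must match the destination configured in your S3 event notification (or SNS subscription). Pipe name below is a placeholder:

```sql
-- Pull the SQS ARN for the pipe and compare it to the S3/SNS configuration.
SHOW PIPES LIKE 'SAP_TRANSACTIONAL_PIPE';
SELECT "name", "notification_channel"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));
```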