We’re implementing AWS Glue Zero-ETL integration to replicate SAP OData entities into Redshift Serverless for our analytics warehouse. The integration creates tables automatically, but we’re seeing poor query performance on date-range filters.
The Zero-ETL Object Settings in Glue seem to auto-generate partition keys based on source schema, but they don’t align with our query patterns. Our SAP OData source has fields like CREATED_DATE, MODIFIED_DATE, and FISCAL_PERIOD, but Glue is partitioning by a generic row_id field instead.
-- Current auto-generated structure
CREATE TABLE sap_orders (
    row_id BIGINT SORTKEY,
    created_date TIMESTAMP,
    ...
);
Queries filtering by created_date are doing full table scans. We need to understand how to configure Zero-ETL Object Settings to respect our SAP source schema mapping and create proper partition strategies for time-series data. Is there a way to influence the automatic table creation to optimize for our query performance needs?
Thanks for the MV suggestion, but that defeats the purpose of real-time integration. We need the data available immediately after SAP changes. The Glue Data Catalog shows all the SAP OData fields correctly, including CREATED_DATE with proper timestamp type. The issue is specifically in how Zero-ETL Object Settings translates this to Redshift table definitions.
Look into the Redshift data sharing capabilities combined with transformation views. Here’s what works:
- Let Zero-ETL create the base tables with default settings - don’t fight the automation
- Enable automatic table optimization on your Redshift Serverless namespace
- Create a separate schema with transformation objects that implement proper sort keys - note that plain views can't carry sort keys, so in practice this means materialized views or sorted copies
- Use query rewriting or application-level routing to target the optimized views
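As a minimal sketch of the separate-schema approach (the schema and object names here are illustrative, and this assumes the Zero-ETL target table lands in `public`):

```sql
-- Hypothetical layout: leave the Zero-ETL table untouched, expose a sorted copy
CREATE SCHEMA IF NOT EXISTS sap_analytics;

-- A materialized view can carry its own sort key, unlike a plain view
CREATE MATERIALIZED VIEW sap_analytics.sap_orders_sorted
SORTKEY (created_date)
AS
SELECT * FROM public.sap_orders;
```

Your application or BI layer then targets sap_analytics while Zero-ETL keeps writing to the base table undisturbed.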
For SAP OData source schema mapping, you need to verify the OData metadata is correctly exposing temporal fields. Check the Glue connection’s schema inference settings - there’s an option to override automatic type detection.
Regarding partition strategy for Redshift Serverless, remember it doesn't use partition pruning the way S3-based tables do - Redshift relies on sort keys and zone maps instead. Focus on:
- SORTKEY on created_date for range queries
- DISTKEY on frequently joined columns
- Enable automatic table optimization to let Redshift learn patterns
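If you end up defining a target table yourself, the sort and distribution hints above look like this (column names other than created_date are illustrative, not from your SAP entity):

```sql
-- Sketch of a manually defined target table; adjust columns to your entity
CREATE TABLE sap_orders_manual (
    row_id        BIGINT,
    customer_id   BIGINT,       -- assumed frequent join column
    created_date  TIMESTAMP,
    fiscal_period VARCHAR(7)
)
DISTKEY (customer_id)           -- collocate joins on customer_id
SORTKEY (created_date);         -- enables zone-map pruning for date ranges
```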
For Zero-ETL Object Settings configuration, the key is in the Glue connection properties:
"TableSettings": {
  "SortKeyColumns": ["created_date"],
  "DistributionStyle": "AUTO"
}
Add this to your Zero-ETL integration configuration. It’s not well documented but the Glue API supports passing Redshift-specific hints through the integration settings.
Query performance optimization requires a multi-layer approach:
- Set proper sort keys at table creation (via settings above)
- Use materialized views for frequently accessed aggregations
- Implement workload management (WLM) queues for different query types
- Monitor query execution with Redshift Query Monitoring Rules
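To confirm whether a date-range query actually did a range-restricted scan rather than a full scan, you can check the system views after running it (shown here with SVL_QUERY_SUMMARY; Serverless exposes similar detail through the SYS_ monitoring views):

```sql
-- is_rrscan = 't' means the scan step was range-restricted by zone maps;
-- compare rows vs rows_pre_filter to see how much was pruned
SELECT query, seg, step, rows, rows_pre_filter, is_rrscan
FROM svl_query_summary
WHERE query = pg_last_query_id()
ORDER BY seg, step;
```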
The combination of correct Zero-ETL settings plus Redshift’s automatic optimization should resolve your scan issues. Test with a subset of tables first before applying to all SAP entities.
Another angle - check your Redshift Serverless workgroup configuration. The automatic table optimization feature might help here. Enable it in the workgroup settings and Redshift will analyze query patterns and automatically adjust sort keys and distribution styles over time. It won’t fix the initial schema but could improve performance as the system learns your access patterns.
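One caveat worth knowing: automatic table optimization only manages tables whose sort key and distribution style are set to AUTO, so if Zero-ETL created the table with a fixed key you can hand it over explicitly (table name from the question):

```sql
-- Hand sort key and distribution style over to automatic optimization
ALTER TABLE sap_orders ALTER SORTKEY AUTO;
ALTER TABLE sap_orders ALTER DISTSTYLE AUTO;
```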
I’ve seen this exact issue. The problem is that Zero-ETL creates tables based on source structure without understanding your query patterns. You might need to create materialized views on top of the Zero-ETL tables with proper sort keys. Something like:
CREATE MATERIALIZED VIEW sap_orders_by_date
SORTKEY (created_date)
AS
SELECT * FROM sap_orders;
This gives you the automation benefits of Zero-ETL while optimizing for your analytical queries. Refresh the MV periodically after Zero-ETL sync completes.
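The periodic refresh is a one-liner you can schedule (for example with Redshift's query scheduler) to run after each sync window:

```sql
-- Redshift refreshes incrementally where possible, full recompute otherwise
REFRESH MATERIALIZED VIEW sap_orders_by_date;
```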
Have you tried using Glue ETL jobs instead of Zero-ETL? I know it adds complexity, but you get full control over schema mapping and can apply transformation logic during ingestion. You could partition by date ranges and even pre-aggregate data for common query patterns. Zero-ETL is great for simple replication but falls short when you need optimization control.
Zero-ETL integrations are designed for simplicity but that means limited control over target schema. The automatic partition key selection uses source primary keys or surrogate keys by default. For SAP OData sources, you need to check if your entity metadata exposes the right key fields. Have you looked at the Glue Data Catalog to see what metadata is being extracted from the OData $metadata endpoint?