We’re implementing AWS Glue Zero-ETL integration to replicate SAP OData entities into Redshift Serverless for our analytics warehouse. The integration creates tables automatically, but we’re seeing poor query performance on date-range filters.
The Zero-ETL Object Settings in Glue seem to auto-generate partition keys based on source schema, but they don’t align with our query patterns. Our SAP OData source has fields like CREATED_DATE, MODIFIED_DATE, and FISCAL_PERIOD, but Glue is partitioning by a generic row_id field instead.
-- Current auto-generated structure
CREATE TABLE sap_orders (
    row_id BIGINT SORTKEY,
    created_date TIMESTAMP,
    ...
);
Queries filtering by created_date are doing full table scans. We need to understand how to configure Zero-ETL Object Settings to respect our SAP source schema mapping and create proper partition strategies for time-series data. Is there a way to influence the automatic table creation to optimize for our query performance needs?
Thanks for the MV suggestion, but that defeats the purpose of real-time integration. We need the data available immediately after SAP changes. The Glue Data Catalog shows all the SAP OData fields correctly, including CREATED_DATE with proper timestamp type. The issue is specifically in how Zero-ETL Object Settings translates this to Redshift table definitions.
Look into the Redshift data sharing capabilities combined with transformation views. Here’s what works:
- Let Zero-ETL create the base tables with default settings - don’t fight the automation
- Enable automatic table optimization on your Redshift Serverless namespace
- Create a separate schema with transformation objects that implement proper sort keys - note that plain views can't carry sort keys, so in practice this means materialized views or sorted copies
- Use query rewriting or application-level routing to target the optimized views
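As a minimal sketch of the separate-schema approach (the schema and object names here are illustrative, and this assumes the Zero-ETL target table lands in `public`):

```sql
-- Hypothetical layout: leave the Zero-ETL table untouched, expose a sorted copy
CREATE SCHEMA IF NOT EXISTS sap_analytics;

-- A materialized view can carry its own sort key, unlike a plain view
CREATE MATERIALIZED VIEW sap_analytics.sap_orders_sorted
SORTKEY (created_date)
AS
SELECT * FROM public.sap_orders;
```

Your application or BI layer then targets sap_analytics while Zero-ETL keeps writing to the base table undisturbed.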
For SAP OData source schema mapping, you need to verify the OData metadata is correctly exposing temporal fields. Check the Glue connection’s schema inference settings - there’s an option to override automatic type detection.
Regarding partition strategy for Redshift Serverless, remember it doesn't use partition pruning the way S3-based tables do - Redshift relies on sort keys and zone maps instead. Focus on:
- SORTKEY on created_date for range queries
- DISTKEY on frequently joined columns
- Enable automatic table optimization to let Redshift learn patterns
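If you end up defining a target table yourself, the sort and distribution hints above look like this (column names other than created_date are illustrative, not from your SAP entity):

```sql
-- Sketch of a manually defined target table; adjust columns to your entity
CREATE TABLE sap_orders_manual (
    row_id        BIGINT,
    customer_id   BIGINT,       -- assumed frequent join column
    created_date  TIMESTAMP,
    fiscal_period VARCHAR(7)
)
DISTKEY (customer_id)           -- collocate joins on customer_id
SORTKEY (created_date);         -- enables zone-map pruning for date ranges
```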
For Zero-ETL Object Settings configuration, the key is in the Glue connection properties:
"TableSettings": {
  "SortKeyColumns": ["created_date"],
  "DistributionStyle": "AUTO"
}
Add this to your Zero-ETL integration configuration. It’s not well documented but the Glue API supports passing Redshift-specific hints through the integration settings.
Query performance optimization requires a multi-layer approach:
- Set proper sort keys at table creation (via settings above)
- Use materialized views for frequently accessed aggregations
- Implement workload management (WLM) queues for different query types
- Monitor query execution with Redshift Query Monitoring Rules
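To confirm whether a date-range query actually did a range-restricted scan rather than a full scan, you can check the system views after running it (shown here with SVL_QUERY_SUMMARY; Serverless exposes similar detail through the SYS_ monitoring views):

```sql
-- is_rrscan = 't' means the scan step was range-restricted by zone maps;
-- compare rows vs rows_pre_filter to see how much was pruned
SELECT query, seg, step, rows, rows_pre_filter, is_rrscan
FROM svl_query_summary
WHERE query = pg_last_query_id()
ORDER BY seg, step;
```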
The combination of correct Zero-ETL settings plus Redshift’s automatic optimization should resolve your scan issues. Test with a subset of tables first before applying to all SAP entities.
Another angle - check your Redshift Serverless workgroup configuration. The automatic table optimization feature might help here. Enable it in the workgroup settings and Redshift will analyze query patterns and automatically adjust sort keys and distribution styles over time. It won’t fix the initial schema but could improve performance as the system learns your access patterns.
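One caveat worth knowing: automatic table optimization only manages tables whose sort key and distribution style are set to AUTO, so if Zero-ETL created the table with a fixed key you can hand it over explicitly (table name from the question):

```sql
-- Hand sort key and distribution style over to automatic optimization
ALTER TABLE sap_orders ALTER SORTKEY AUTO;
ALTER TABLE sap_orders ALTER DISTSTYLE AUTO;
```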
I’ve seen this exact issue. The problem is that Zero-ETL creates tables based on source structure without understanding your query patterns. You might need to create materialized views on top of the Zero-ETL tables with proper sort keys. Something like:
CREATE MATERIALIZED VIEW sap_orders_by_date
SORTKEY (created_date)
AS
SELECT * FROM sap_orders;
This gives you the automation benefits of Zero-ETL while optimizing for your analytical queries. Refresh the MV periodically after Zero-ETL sync completes.
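The periodic refresh is a one-liner you can schedule (for example with Redshift's query scheduler) to run after each sync window:

```sql
-- Redshift refreshes incrementally where possible, full recompute otherwise
REFRESH MATERIALIZED VIEW sap_orders_by_date;
```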
Have you tried using Glue ETL jobs instead of Zero-ETL? I know it adds complexity, but you get full control over schema mapping and can apply transformation logic during ingestion. You could partition by date ranges and even pre-aggregate data for common query patterns. Zero-ETL is great for simple replication but falls short when you need optimization control.
Zero-ETL integrations are designed for simplicity but that means limited control over target schema. The automatic partition key selection uses source primary keys or surrogate keys by default. For SAP OData sources, you need to check if your entity metadata exposes the right key fields. Have you looked at the Glue Data Catalog to see what metadata is being extracted from the OData $metadata endpoint?