Exactly right on the REST API approach. Here are the complete implementation details:
Automated ETL for Master Data Architecture:
1. SAP Data Services ETL Pipeline:
Job runs every 6 hours (02:00, 08:00, 14:00, 20:00) on weekdays, once daily (02:00) on weekends.
Extraction phase:
- Connect to SAP ERP using RFC connector
- Read change pointers from BDCP table for customer master changes
- Extract delta records from KNA1, KNVV, KNB1 tables
- Typical extraction: 2-5 minutes for 100-500 changed records
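The change-pointer step above can be sketched as follows. This is an illustrative Python sketch, not the actual Data Services job: the dict fields (`msgtype`, `key`, `processed`) are assumed stand-ins for the BDCP change-pointer columns, and `DEBMAS` is the standard message type for customer master changes.

```python
# Hypothetical sketch of the change-pointer filtering step: each BDCP-style row
# carries the message type, the changed object key (the customer number KUNNR),
# and a processed flag. Only unprocessed customer-master pointers feed the delta.
def select_delta_customers(change_pointers, message_type="DEBMAS"):
    """Return the distinct customer numbers with unprocessed change pointers."""
    return sorted({
        cp["key"]                      # KUNNR from the change pointer
        for cp in change_pointers
        if cp["msgtype"] == message_type and not cp["processed"]
    })

pointers = [
    {"msgtype": "DEBMAS", "key": "0000100001", "processed": False},
    {"msgtype": "DEBMAS", "key": "0000100002", "processed": True},   # already extracted
    {"msgtype": "MATMAS", "key": "0000900001", "processed": False},  # not a customer change
    {"msgtype": "DEBMAS", "key": "0000100001", "processed": False},  # duplicate pointer
]
print(select_delta_customers(pointers))  # ['0000100001']
```

The resulting KUNNR list then drives the delta reads against KNA1, KNVV, and KNB1, which is why extraction stays in the 2-5 minute range instead of scanning full tables.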
Transformation phase:
- Standardize customer names (proper case, trim whitespace)
- Format phone numbers and emails to consistent pattern
- Enrich with territory hierarchy from separate lookup table
- Calculate customer segment based on revenue and transaction count
- Transformation processing: 1-2 minutes
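The transformation rules above can be sketched like this. The real job runs inside SAP Data Services; the field names and segment thresholds here are illustrative assumptions, not the production values.

```python
import re

# Illustrative sketch of the transformation phase; segment thresholds
# and field names are assumptions for demonstration only.
def transform_customer(record):
    out = dict(record)
    # Standardize customer name: trim/collapse whitespace, proper case
    out["name"] = " ".join(record["name"].split()).title()
    # Normalize phone: strip everything but digits
    out["phone"] = re.sub(r"\D", "", record["phone"])
    # Lowercase email for consistent matching
    out["email"] = record["email"].strip().lower()
    # Segment by revenue and transaction count (thresholds are illustrative)
    if record["revenue"] >= 1_000_000 and record["txn_count"] >= 100:
        out["segment"] = "A"
    elif record["revenue"] >= 100_000:
        out["segment"] = "B"
    else:
        out["segment"] = "C"
    return out

rec = {"name": "  acme   industries ", "phone": "+1 (555) 010-0199",
       "email": " Sales@Acme.COM ", "revenue": 250_000, "txn_count": 40}
print(transform_customer(rec)["name"])     # Acme Industries
print(transform_customer(rec)["segment"])  # B
```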
2. Data Validation Rules:
Implemented in SAP Data Services validation transform:
Mandatory field validation:
- Customer ID (KUNNR) not null and 10 digits
- Customer name (NAME1) not null, length > 3 characters
- Country code (LAND1) not null and exists in country reference table
- Sales organization (VKORG) not null and active
Business rule validation:
- Credit limit (KLIMK) >= 0
- Payment terms (ZTERM) exists in terms master
- Email format: regex validation for @ and domain
- Phone format: digits only after stripping special characters
Referential integrity validation:
- Territory code exists in territory master table
- Sales rep ID exists in active user list
- Industry code exists in industry classification table
Validation outcomes:
- Pass: Record loads to staging database
- Fail: Record written to exception table with validation error details
- Email alert sent to data steward if validation failure rate exceeds 2%
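The three validation tiers can be sketched as a single check function. This is a minimal sketch, not the Data Services validation transform itself; the small reference sets stand in for the country, payment-terms, and territory master tables.

```python
import re

# Stand-ins for the reference tables used in validation (illustrative values)
COUNTRIES = {"US", "DE", "GB"}
PAYMENT_TERMS = {"NT30", "NT60"}
TERRITORIES = {"NA-WEST", "NA-EAST", "EMEA"}

def validate_customer(rec):
    """Return a list of validation errors; an empty list means the record passes."""
    errors = []
    # Mandatory field validation
    if not (rec.get("kunnr") and len(rec["kunnr"]) == 10 and rec["kunnr"].isdigit()):
        errors.append("KUNNR must be a 10-digit customer ID")
    if not rec.get("name1") or len(rec["name1"]) <= 3:
        errors.append("NAME1 missing or too short")
    if rec.get("land1") not in COUNTRIES:
        errors.append("LAND1 not in country reference table")
    # Business rule validation
    if rec.get("klimk", 0) < 0:
        errors.append("KLIMK (credit limit) must be >= 0")
    if rec.get("zterm") not in PAYMENT_TERMS:
        errors.append("ZTERM not in terms master")    # payment terms check
    if not re.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$", rec.get("email", "")):
        errors.append("invalid email format")
    # Referential integrity validation
    if rec.get("territory") not in TERRITORIES:
        errors.append("territory code not in territory master")
    return errors

good = {"kunnr": "0000100001", "name1": "Acme Industries", "land1": "US",
        "klimk": 50000, "zterm": "NT30", "email": "ap@acme.com", "territory": "NA-WEST"}
print(validate_customer(good))                      # [] -> loads to staging
print(validate_customer({**good, "kunnr": "123"}))  # error -> exception table
```

A passing record (empty error list) loads to staging; a non-empty list is what gets written to the exception table alongside the record.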
3. Daily Tableau Extract Refresh:
Triggered via the Tableau REST API when the ETL job completes:
SAP Data Services script component at end of job:
# Authenticate with Tableau Server (the auth token is returned in the signin response body)
curl -X POST https://tableau-server/api/3.19/auth/signin \
  -H "Content-Type: application/json" \
  -d '{"credentials":{"name":"etl_service","password":"***","site":{"contentUrl":"sales"}}}'
# Trigger data source refresh (site and data source are addressed by their LUIDs,
# not by name; placeholders shown here)
curl -X POST https://tableau-server/api/3.19/sites/{site-id}/datasources/{datasource-id}/refresh \
  -H "X-Tableau-Auth: {token}" \
  -H "Content-Type: application/json" \
  -d '{}'
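The `{token}` placeholder in the refresh call comes from the signin response. A sketch of extracting it (along with the site LUID) from the JSON body, per the Tableau REST API signin response shape:

```python
import json

# Parse the Tableau REST API signin response: the credentials element carries
# the auth token and the signed-in site's LUID, both needed for later calls.
def parse_signin_response(body):
    creds = json.loads(body)["credentials"]
    return creds["token"], creds["site"]["id"]

# Sample response body (IDs and token are made-up values for illustration)
sample = json.dumps({"credentials": {
    "token": "abc123token",
    "site": {"id": "9a8b7c6d", "contentUrl": "sales"},
    "user": {"id": "1f2e3d4c"},
}})
token, site_id = parse_signin_response(sample)
print(token, site_id)  # abc123token 9a8b7c6d
```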
Tableau Server configuration:
- Data source: “Customer Master Data”
- Connection: PostgreSQL staging database
- Extract refresh: API-triggered (not scheduled)
- Extract filters: Active customers only (deletion flag = false)
- Incremental refresh: Yes, based on last_modified_date field
- Typical extract refresh time: 8-12 minutes for 45,000 customer records
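The extract filter and incremental logic above amount to a simple predicate over the staging rows. A sketch, with assumed column names matching the configuration (`last_modified_date`, `deletion_flag`):

```python
from datetime import datetime

# Sketch of the incremental-refresh predicate: only rows modified since the
# last successful extract refresh, and not deletion-flagged, enter the extract.
def rows_for_incremental_refresh(rows, last_refresh):
    return [
        r for r in rows
        if r["last_modified_date"] > last_refresh and not r["deletion_flag"]
    ]

last_refresh = datetime(2024, 5, 1, 2, 0)
rows = [
    {"kunnr": "0000100001", "last_modified_date": datetime(2024, 5, 1, 9, 30), "deletion_flag": False},
    {"kunnr": "0000100002", "last_modified_date": datetime(2024, 4, 28, 11, 0), "deletion_flag": False},  # unchanged
    {"kunnr": "0000100003", "last_modified_date": datetime(2024, 5, 1, 10, 0), "deletion_flag": True},    # deleted
]
print([r["kunnr"] for r in rows_for_incremental_refresh(rows, last_refresh)])  # ['0000100001']
```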
4. Monitoring and Alerting:
ETL monitoring dashboard tracks:
- Records extracted per run
- Validation pass/fail counts
- ETL job duration and performance trends
- Tableau extract refresh status and timing
- Data freshness: time since last successful refresh
Alerts configured:
- Email alert if ETL job fails or validation failure rate > 2%
- Slack notification when Tableau extract refresh completes
- PagerDuty alert if data freshness exceeds 8 hours (indicates ETL failure)
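The alert thresholds above reduce to a small decision function. A sketch with the two thresholds from the configuration (8-hour freshness, 2% validation failure rate); the tier names are illustrative:

```python
from datetime import datetime, timedelta

# Sketch of the alerting decision: staleness over 8 hours pages on-call
# (likely ETL failure); validation failure rate over 2% emails the data steward.
def alert_level(last_success, now, failure_rate):
    if now - last_success > timedelta(hours=8):
        return "pagerduty"           # data freshness breach -> likely ETL failure
    if failure_rate > 0.02:
        return "email-data-steward"  # validation failures above 2% threshold
    return "ok"

now = datetime(2024, 5, 1, 12, 0)
print(alert_level(datetime(2024, 5, 1, 1, 0), now, 0.01))  # pagerduty (11h stale)
print(alert_level(datetime(2024, 5, 1, 9, 0), now, 0.05))  # email-data-steward
print(alert_level(datetime(2024, 5, 1, 9, 0), now, 0.01))  # ok
```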
Implementation Results:
Before automation:
- Manual SAP export: 3 hours weekly staff time
- Data latency: 5-7 days
- Dashboard accuracy: ~85% (frequent customer attribute mismatches)
- Incident rate: 8-12 data quality issues reported monthly
After automation:
- Manual effort: 0 hours (fully automated)
- Data latency: Maximum 6 hours (typically 2-4 hours)
- Dashboard accuracy: 98%+ (validated data only)
- Incident rate: 1-2 data quality issues monthly (primarily edge cases)
Improved Sales Reporting Outcomes:
Sales dashboards now show:
- Current customer segmentation for territory planning
- Real-time credit status for order processing decisions
- Accurate sales rep assignments for commission calculation
- Up-to-date customer contact information for outreach campaigns
Sales team feedback: “Finally trust the customer data in reports. No more checking SAP directly to verify dashboard information.”
Lessons Learned:
- Change data capture (change pointers) is essential for SAP performance - a full refresh would take 45+ minutes and impact ERP responsiveness
- Three-tier validation with exception handling beats an all-or-nothing approach - business users prefer partial data with alerts over no data due to strict validation failures
- API-triggered Tableau refresh ensures data consistency - fixed schedules caused issues when ETL jobs ran long or failed
- The staging database is worth the infrastructure cost - it provides a buffer between SAP and Tableau, enables data quality checks, and allows a Tableau-optimized schema design
This architecture has been running in production for 8 months with 99.2% uptime. The automated ETL approach eliminated manual data export tasks and significantly improved sales analytics accuracy.