Ad-hoc reporting: ETL-based extracts vs live connection in Tableau for ERP data

I’m evaluating architecture options for ad-hoc reporting in Tableau Server 2023.1 connected to our ERP system. We have about 200 business users who create custom reports on sales, inventory, and financial data.

Currently debating two approaches:

  1. ETL-based extracts: Nightly ETL jobs pull data from ERP into Tableau extracts, users work with cached data
  2. Live connections: Users connect directly to ERP database with query optimization

The extract approach gives predictable performance but users complain about data being up to 24 hours old. Live connections provide real-time data but ERP queries can be slow, especially for complex ad-hoc reports with multiple joins. What have others experienced with these tradeoffs? How do you balance dashboard performance against data latency for business user ad-hoc reporting needs?

The extract vs live connection decision for ad-hoc reporting in Tableau is fundamentally about balancing three competing priorities: performance, data freshness, and system load. Based on your 200-user base and ERP backend, here’s how to think through each tradeoff:

Extract vs Live Connection Tradeoffs:

EXTRACTS - Best for:

  • Historical analysis and trending (last 12 months sales, YoY comparisons)
  • Reports with complex calculations that would burden ERP database
  • Dashboards accessed by many users simultaneously (extracts cache results)
  • Scenarios where data latency of 4-24 hours is acceptable
  • When you need consistent data snapshots for compliance/audit purposes

Pros: Predictable, fast performance (often sub-second), minimal load on the ERP, supports complex calculations

Cons: Data staleness (4-24 hours old), storage overhead, refresh scheduling complexity

LIVE CONNECTIONS - Best for:

  • Operational dashboards requiring current-hour data (today’s shipments, open orders)
  • Transactional lookups (order status, customer account balance)
  • Data that changes frequently and business decisions depend on latest values
  • Exploratory ad-hoc analysis where users need to drill into latest details

Pros: Real-time data access, no storage overhead, always current

Cons: Variable query performance (2-10 seconds typical, up to minutes for complex queries), heavy ERP database load with 200 concurrent users, slower performance for complex calculations pushed down to the database

ERP Query Performance Considerations: With 200 users running ad-hoc queries against ERP via live connections, you’ll face:

  • Database connection pool exhaustion (ERP systems typically limit concurrent connections to 100-200)
  • Query contention during peak business hours (9-11 AM, 1-3 PM)
  • Slow performance for reports with multiple table joins (5+ tables)
  • Risk of poorly written user queries impacting ERP operational performance

Mitigation strategies:

  1. Implement a query governor on the ERP database to cap Tableau query runtime (e.g., kill queries exceeding 60 seconds)
  2. Stand up a read-replica database dedicated to Tableau reporting, isolating analytics load from operational ERP traffic
  3. Use Tableau’s query caching and performance recording to identify slow queries
  4. Educate users on efficient data source design (avoid Cartesian joins, limit row returns)

Data Latency Considerations: The key question: How old can data be before business value degrades?

  • Real-time required (< 5 minutes): Operational dashboards, inventory allocation, fraud detection → LIVE CONNECTION
  • Near-real-time acceptable (1-4 hours): Sales pipeline, customer service metrics, daily operations → INCREMENTAL EXTRACTS with 2-4 hour refresh
  • End-of-day acceptable (24 hours): Financial reporting, trend analysis, strategic planning → NIGHTLY EXTRACTS
  • Historical only (weekly/monthly): Compliance reports, annual planning, archived data → WEEKLY EXTRACTS
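The decision table above is mechanical enough to express as code. A small sketch, with thresholds taken from the tiers listed (the function name and exact cutoffs are illustrative assumptions):

```python
def connection_strategy(max_staleness_min: int) -> str:
    """Map the maximum acceptable data age (minutes) to a connection strategy."""
    if max_staleness_min < 5:
        return "live connection"          # real-time required
    if max_staleness_min <= 4 * 60:
        return "incremental extract"      # near-real-time, 2-4 hour refresh
    if max_staleness_min <= 24 * 60:
        return "nightly extract"          # end-of-day acceptable
    return "weekly extract"               # historical only
```

For example, a sales-pipeline report that tolerates two-hour-old data (`connection_strategy(120)`) lands in the incremental-extract tier rather than forcing a live connection.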

For your 200 users, segment by use case:

  • 60% likely need end-of-day data (nightly extracts)
  • 25% need near-real-time (4-hour incremental extracts)
  • 15% need real-time (live connections with query optimization)
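Turning those percentages into head counts for capacity planning (the shares are the rough estimates above, not measurements):

```python
USERS = 200  # from the question

# Assumed segmentation shares from the tiers above
segments = {
    "nightly extracts": 0.60,
    "incremental extracts": 0.25,
    "live connections": 0.15,
}
counts = {tier: round(USERS * share) for tier, share in segments.items()}
# -> 120 nightly, 50 incremental, 30 live
```

The practical takeaway: only about 30 of the 200 users would ever hit the ERP directly, which changes the connection-pool math considerably.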

Recommended Hybrid Architecture:

  1. Primary Data Layer (Extracts):

    • Core dimensional data (customers, products, GL accounts) - nightly refresh
    • Historical transactional data (orders, shipments, invoices older than 2 days) - nightly refresh
    • Aggregated metrics (daily sales by region, monthly inventory turns) - early morning refresh
    • Covers roughly 60% of ad-hoc reporting needs with fast, predictable performance
  2. Operational Data Layer (Live Connections):

    • Current-day transactions (today’s orders, shipments in the last 4 hours)
    • Real-time inventory positions
    • Customer account balances
    • Covers the ~15% of needs requiring current data
  3. Near-Real-Time Layer (Incremental Extracts):

    • Sales pipeline and opportunity data - refresh every 2 hours
    • Customer service tickets - refresh every 4 hours
    • Covers the remaining ~25% of needs for recent but not real-time data
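The incremental-extract layer hinges on watermark logic: each refresh pulls only rows modified since the last high-water mark, then advances the mark. A minimal sketch of that idea (the row shape and `modified_at` field are assumptions; in practice Tableau's incremental refresh does this against a date or ID column you designate):

```python
from datetime import datetime

def incremental_rows(rows, last_watermark):
    """rows: iterable of dicts with a 'modified_at' datetime.
    Return (rows changed since last_watermark, new watermark)."""
    new_rows = [r for r in rows if r["modified_at"] > last_watermark]
    new_watermark = max(
        (r["modified_at"] for r in new_rows),
        default=last_watermark,  # nothing changed: keep the old mark
    )
    return new_rows, new_watermark
```

One caveat worth noting: watermark-based refresh picks up inserts and updates but not deletes, which is one reason the design above still runs a full nightly rebuild for the historical layer.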

Implementation Guidance:

  • Publish extract-based data sources as certified data sources in Tableau Server (users see “Certified” badge)
  • Name live connection data sources clearly: “ERP Live - Use for Real-Time Only”
  • Create data source usage guidelines: “For reports on data older than yesterday, use certified extracts. For today’s transactions, use live sources.”
  • Monitor query performance via Tableau Server admin views - identify and optimize slow queries
  • Implement row-level security consistently across both extract and live data sources
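The naming convention for live sources is easy to enforce with a check at publish time. A sketch, assuming the "ERP Live -" prefix suggested above (the prefix and function are illustrative, not a Tableau API):

```python
LIVE_PREFIX = "ERP Live -"  # assumed convention from the guidance above

def validate_source_name(name: str, is_live: bool) -> bool:
    """Live sources must carry the prefix; extract sources must not,
    so users can tell the two apart at a glance."""
    has_prefix = name.startswith(LIVE_PREFIX)
    return has_prefix if is_live else not has_prefix
```

A check like this could run in whatever script or CI step publishes data sources, rejecting any live connection published without the warning prefix.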

Performance Benchmarks to Target:

  • Extract-based dashboards: < 3 seconds load time for 80% of views
  • Live connection dashboards: < 8 seconds for simple queries, < 30 seconds for complex ad-hoc reports
  • Extract refresh times: < 30 minutes for incremental, < 2 hours for full nightly refresh
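Targets like "< 3 seconds for 80% of views" are percentile checks, which you can run against load times pulled from the Tableau Server admin views. A small sketch (the sample numbers are illustrative):

```python
def meets_target(load_times_s, threshold_s=3.0, quantile=0.80):
    """True if at least `quantile` of the load times fall under threshold_s."""
    under = sum(1 for t in load_times_s if t < threshold_s)
    return under / len(load_times_s) >= quantile
```

Checking a fraction of loads against a threshold, rather than the average, keeps one pathological dashboard from masking (or triggering) an alert on its own.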

The hybrid approach gives you the best of both worlds: fast, reliable performance for most ad-hoc reporting via extracts, with live connections available for the subset of use cases that truly require real-time data. This balances user needs, system performance, and ERP database load effectively for a 200-user Tableau deployment.

The hybrid approach is interesting. How do you handle the complexity of managing both connection types? Do users need to understand which data sources are extracts vs live? Also concerned about ERP query performance - even with optimization, complex joins across large fact tables can overwhelm the database during business hours when 200 users are running ad-hoc queries simultaneously.

Don’t forget about data governance and security implications. Live connections expose ERP data structures directly to users, which can create security risks if row-level permissions aren’t properly configured. Extracts give you better control - you can filter sensitive data during ETL, apply transformations, and ensure consistent business logic. With 200 users creating ad-hoc reports, maintaining consistent definitions and calculations is critical. Extracts enforce this through a controlled ETL pipeline.