The extract vs live connection decision for ad-hoc reporting in Tableau is fundamentally about balancing three competing priorities: performance, data freshness, and system load. Based on your 200-user base and ERP backend, here’s how to think through each tradeoff:
Extract vs Live Connection Tradeoffs:
EXTRACTS - Best for:
- Historical analysis and trending (last 12 months sales, YoY comparisons)
- Reports with complex calculations that would burden ERP database
- Dashboards accessed by many users simultaneously (extracts cache results)
- Scenarios where data latency of 4-24 hours is acceptable
- When you need consistent data snapshots for compliance/audit purposes
Pros: Predictable fast performance (sub-second), minimal ERP load, supports complex calculations
Cons: Data staleness (4-24 hours old), storage overhead, refresh scheduling complexity
LIVE CONNECTIONS - Best for:
- Operational dashboards requiring current-hour data (today’s shipments, open orders)
- Transactional lookups (order status, customer account balance)
- Data that changes frequently and business decisions depend on latest values
- Exploratory ad-hoc analysis where users need to drill into latest details
Pros: Real-time data access, no storage overhead, always current
Cons: Query performance varies (2-10 seconds typical, up to minutes for complex queries), heavy ERP database load with 200 concurrent users, limited calculation performance
ERP Query Performance Considerations:
With 200 users running ad-hoc queries against ERP via live connections, you’ll face:
- Database connection pool exhaustion (ERP systems typically limit concurrent connections to 100-200)
- Query contention during peak business hours (9-11 AM, 1-3 PM)
- Slow performance for reports with multiple table joins (5+ tables)
- Risk of poorly written user queries impacting ERP operational performance
Mitigation strategies:
- Implement query governor in ERP to limit Tableau query runtime (kill queries exceeding 60 seconds)
- Create read-replica database specifically for Tableau reporting (isolates analytics load from operational ERP)
- Use Tableau’s query caching and performance recording to identify slow queries
- Educate users on efficient data source design (avoid Cartesian joins, limit row returns)
Data Latency Considerations:
The key question: How old can data be before business value degrades?
- Real-time required (< 5 minutes): Operational dashboards, inventory allocation, fraud detection → LIVE CONNECTION
- Near-real-time acceptable (1-4 hours): Sales pipeline, customer service metrics, daily operations → INCREMENTAL EXTRACTS with 2-4 hour refresh
- End-of-day acceptable (24 hours): Financial reporting, trend analysis, strategic planning → NIGHTLY EXTRACTS
- Historical only (weekly/monthly): Compliance reports, annual planning, archived data → WEEKLY EXTRACTS
For your 200 users, segment by use case:
- 60% likely need end-of-day data (nightly extracts)
- 25% need near-real-time (4-hour incremental extracts)
- 15% need real-time (live connections with query optimization)
Recommended Hybrid Architecture:
-
Primary Data Layer (Extracts):
- Core dimensional data (customers, products, GL accounts) - nightly refresh
- Historical transactional data (orders, shipments, invoices older than 2 days) - nightly refresh
- Aggregated metrics (daily sales by region, monthly inventory turns) - early morning refresh
- Covers 80% of ad-hoc reporting needs with fast performance
-
Operational Data Layer (Live Connections):
- Current day transactions (today’s orders, shipments in last 4 hours)
- Real-time inventory positions
- Customer account balances
- Covers 15% of needs requiring current data
-
Near-Real-Time Layer (Incremental Extracts):
- Sales pipeline and opportunity data - refresh every 2 hours
- Customer service tickets - refresh every 4 hours
- Covers 5% of needs for recent but not real-time data
Implementation Guidance:
- Publish extract-based data sources as certified data sources in Tableau Server (users see “Certified” badge)
- Name live connection data sources clearly: “ERP Live - Use for Real-Time Only”
- Create data source usage guidelines: “For reports on data older than yesterday, use certified extracts. For today’s transactions, use live sources.”
- Monitor query performance via Tableau Server admin views - identify and optimize slow queries
- Implement row-level security consistently across both extract and live data sources
Performance Benchmarks to Target:
- Extract-based dashboards: < 3 seconds load time for 80% of views
- Live connection dashboards: < 8 seconds for simple queries, < 30 seconds for complex ad-hoc reports
- Extract refresh times: < 30 minutes for incremental, < 2 hours for full nightly refresh
The hybrid approach gives you the best of both worlds: fast, reliable performance for most ad-hoc reporting via extracts, with live connections available for the subset of use cases that truly require real-time data. This balances user needs, system performance, and ERP database load effectively for a 200-user Tableau deployment.