Comparing data connectors vs direct query for large datasets - performance trade-offs

I’ve been benchmarking different approaches for connecting Tableau to our large datasets (5-50M rows) and wanted to share findings and get community input on best practices.

We tested three scenarios:

  1. Native data connectors (SQL Server, PostgreSQL) with live connections
  2. ODBC/JDBC direct queries with custom SQL
  3. Extract-based approach with scheduled refreshes

Our initial results show extract performance is 3-5x faster for dashboard rendering, but refresh windows are becoming problematic as data volumes grow. Direct queries offer real-time data but suffer from slow filter operations on large fact tables. Native connectors with query optimization show promise but require careful index management.

Curious about others’ experiences with connector vs direct query performance for enterprise-scale deployments. What refresh strategies work best when extracts take 2+ hours to refresh?

Another factor is network latency. If your Tableau Server is in a different data center or cloud region from your database, direct queries suffer from round-trip latency on every interaction. We measured 200-300ms additional latency per query when connecting from AWS us-east-1 Tableau Server to an on-premises SQL Server. Extracts eliminate this because data is local to the Tableau Server. For cloud deployments, co-locating your database and Tableau Server in the same region makes a huge difference for live connection performance.

Incremental refresh is definitely on our roadmap. How do you handle historical data corrections or backdated transactions? We occasionally need to update records from previous months, which would be missed by incremental refresh if we’re only looking at recent modifications.

From a database perspective, direct queries can absolutely perform well if you optimize for Tableau’s query patterns. Enable query logging on your database to see what Tableau generates, then create covering indexes for common filter combinations. We built materialized views specifically for our most-used Tableau dashboards - pre-aggregated data that updates hourly. This gives us near-real-time data with extract-like performance. The views handle the heavy lifting, and Tableau just queries pre-computed results.