We’re experiencing 30+ minute refresh cycles on our enterprise semantic model and it’s becoming a real bottleneck. The model uses a star schema with 4 fact tables connected to 12 dimension tables. Our refresh window is getting too tight for morning reports.
I’ve noticed that bi-directional relationships on two dimensions seem to be causing issues, and cardinality settings might not be optimal. We have about 15M rows across the fact tables. Query folding appears to work on most tables, but I’m not sure if our relationship configuration is preventing some optimizations.
Fact_Sales -> Dim_Product (many-to-one)
Fact_Sales -> Dim_Customer (many-to-one, bi-directional)
Fact_Inventory -> Dim_Location (many-to-one, bi-directional)
Fact_Orders -> Dim_Product (many-to-one)
Has anyone dealt with similar refresh performance issues in large star schemas? Should we be looking at aggregation tables or is there something fundamentally wrong with our relationship setup?
Check your cardinality settings first. If Power BI is detecting many-to-many where it should be many-to-one, that’s a huge performance killer. Also, verify that your dimension keys are actually unique - duplicate keys in dimension tables will force many-to-many relationships and destroy refresh performance. Run a quick COUNTROWS(DISTINCT(…)) check on each dimension key column. With 15M rows across facts, you should definitely consider aggregation tables for common report queries. Pre-aggregating at monthly or weekly grain can speed up both refresh and query performance significantly.
Bi-directional relationships are often the culprit in refresh performance issues. They force the engine to evaluate filters in both directions during refresh, which adds significant overhead. Unless you have a specific cross-filtering requirement, I’d recommend switching those to single-direction and using DAX measures for any reverse filtering needs. That alone could cut your refresh time substantially.
For aggregations, always materialize in the source if possible. Calculated tables in Power BI are computed during refresh, which adds to your refresh time problem. If you can create pre-aggregated tables in your data warehouse (like monthly sales summaries), Power BI can automatically route queries to them when appropriate. The aggregation awareness feature works really well when configured correctly.
Good catch on the cardinality - I found duplicates in Dim_Customer which was forcing a many-to-many relationship. That’s definitely part of the problem. The bi-directional relationships are used for some drill-through reports where users need to see product details from customer context. Would calculated tables for aggregations help, or should these be materialized in the source?