After reviewing all the perspectives here and doing additional performance testing, I want to share our final approach and key learnings:
Our Hybrid Model Decision:
We’re implementing a three-tier data architecture that addresses all three concerns raised in this thread - transactional integrity, reporting complexity, and data redundancy:
Tier 1 - Normalized Transactional Layer (3NF):
Core NC data remains fully normalized: separate tables for nc_master, nc_products, nc_suppliers, and nc_audit_links, with proper foreign key constraints. This ensures data integrity for all create/update/delete operations. No redundancy here - single source of truth.
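A minimal sketch of the Tier 1 layer using SQLite for illustration. The table names come from the post; the specific columns and types are assumptions for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires opting in to FK enforcement

conn.executescript("""
CREATE TABLE nc_suppliers (
    supplier_id   INTEGER PRIMARY KEY,
    supplier_name TEXT NOT NULL
);
CREATE TABLE nc_products (
    product_id       INTEGER PRIMARY KEY,
    product_category TEXT NOT NULL,
    supplier_id      INTEGER NOT NULL REFERENCES nc_suppliers(supplier_id)
);
CREATE TABLE nc_master (
    nc_id       INTEGER PRIMARY KEY,
    product_id  INTEGER NOT NULL REFERENCES nc_products(product_id),
    opened_date TEXT NOT NULL,
    status_code TEXT NOT NULL
);
CREATE TABLE nc_audit_links (
    nc_id    INTEGER NOT NULL REFERENCES nc_master(nc_id),
    audit_id INTEGER NOT NULL,
    PRIMARY KEY (nc_id, audit_id)
);
""")

# A row referencing a nonexistent product is rejected - the constraints,
# not application code, guarantee the single source of truth.
try:
    conn.execute("INSERT INTO nc_master VALUES (1, 999, '2024-01-01', 'OPEN')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The point is that integrity lives in the schema itself; every write path gets the same guarantees for free.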
Tier 2 - Selective Denormalization for Common Access:
Based on query analysis, we’re adding four denormalized fields to the nc_master table that let roughly 70% of common reports run without joins:
- supplier_name (from supplier dimension)
- product_category (from product dimension)
- current_status_label (from status lookup)
- days_open (calculated field)
These are maintained via database triggers on the underlying tables. The trigger overhead is minimal (< 50ms per update) and the reporting performance gain is substantial.
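Here’s a hedged sketch of the trigger mechanism, again in SQLite for portability - your actual trigger syntax will depend on your database engine, and the schema is trimmed to just the supplier_name field:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nc_suppliers (
    supplier_id   INTEGER PRIMARY KEY,
    supplier_name TEXT NOT NULL
);
CREATE TABLE nc_master (
    nc_id         INTEGER PRIMARY KEY,
    supplier_id   INTEGER NOT NULL,
    supplier_name TEXT            -- denormalized copy, maintained by triggers
);

-- Fill the copy when an NC row is created
CREATE TRIGGER nc_master_fill AFTER INSERT ON nc_master
BEGIN
    UPDATE nc_master
    SET supplier_name = (SELECT supplier_name FROM nc_suppliers
                         WHERE supplier_id = NEW.supplier_id)
    WHERE nc_id = NEW.nc_id;
END;

-- Propagate renames from the source-of-truth table
CREATE TRIGGER supplier_rename AFTER UPDATE OF supplier_name ON nc_suppliers
BEGIN
    UPDATE nc_master SET supplier_name = NEW.supplier_name
    WHERE supplier_id = NEW.supplier_id;
END;
""")

conn.execute("INSERT INTO nc_suppliers VALUES (1, 'Acme Corp')")
conn.execute("INSERT INTO nc_master (nc_id, supplier_id) VALUES (10, 1)")
# Rename the supplier; the trigger keeps the denormalized column in sync
conn.execute("UPDATE nc_suppliers SET supplier_name = 'Acme Ltd' WHERE supplier_id = 1")
print(conn.execute("SELECT supplier_name FROM nc_master WHERE nc_id = 10").fetchone()[0])
```

Because the sync happens inside the database, there is no application code path that can leave the copy stale.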
Tier 3 - Reporting Schema (Dimensional Model):
Separate reporting database with star schema design. Fact table: nc_facts. Dimensions: supplier_dim, product_dim, time_dim, status_dim. Refreshed every 2 hours via scheduled ETL job. This handles all complex analytics, trend analysis, and executive dashboards.
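The refresh job can be sketched roughly as below - a full rebuild from the normalized source into the star schema. The fact and dimension table names come from the post; the column lists, and the use of two SQLite connections to stand in for two databases, are assumptions (a real job would also load incrementally rather than truncate-and-reload):

```python
import sqlite3

# Transactional source (Tier 1), trimmed to what the example needs
src = sqlite3.connect(":memory:")
src.executescript("""
CREATE TABLE nc_suppliers (supplier_id INTEGER PRIMARY KEY, supplier_name TEXT);
CREATE TABLE nc_master (
    nc_id       INTEGER PRIMARY KEY,
    supplier_id INTEGER REFERENCES nc_suppliers(supplier_id),
    opened_date TEXT,
    status_code TEXT
);
""")
src.execute("INSERT INTO nc_suppliers VALUES (1, 'Acme Corp')")
src.execute("INSERT INTO nc_master VALUES (100, 1, '2024-01-05', 'OPEN')")

# Reporting database (Tier 3): star schema
rpt = sqlite3.connect(":memory:")
rpt.executescript("""
CREATE TABLE supplier_dim (supplier_key INTEGER PRIMARY KEY, supplier_name TEXT);
CREATE TABLE nc_facts (nc_id INTEGER, supplier_key INTEGER, opened_date TEXT, status_code TEXT);
""")

def refresh():
    """Rebuild dimensions then facts; scheduled every 2 hours in the real setup."""
    rpt.execute("DELETE FROM nc_facts")
    rpt.execute("DELETE FROM supplier_dim")
    for row in src.execute("SELECT supplier_id, supplier_name FROM nc_suppliers"):
        rpt.execute("INSERT INTO supplier_dim VALUES (?, ?)", row)
    for row in src.execute("SELECT nc_id, supplier_id, opened_date, status_code FROM nc_master"):
        rpt.execute("INSERT INTO nc_facts VALUES (?, ?, ?, ?)", row)

refresh()
# Analytics queries hit the star schema, never the transactional tables
print(rpt.execute("""
    SELECT d.supplier_name, COUNT(*) FROM nc_facts f
    JOIN supplier_dim d ON d.supplier_key = f.supplier_key
    GROUP BY d.supplier_name
""").fetchall())
```

The two-hour staleness is the deliberate trade: analytics load never touches the transactional database.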
Key Trade-off Analysis:
Normalized vs Denormalized Models: We’re not choosing one over the other - we’re using both where each excels. Normalized for transactional integrity, denormalized for reporting performance. The ETL layer bridges them.
Reporting Complexity: The selective denormalization in Tier 2 handles 70% of reports with zero joins. The remaining 30% of complex analytics use Tier 3’s dimensional model, which is optimized for multi-table queries. Users can now build most reports without database expertise.
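To make the "zero joins" claim concrete, here is the same report written both ways against an illustrative schema (the denormalized column mirrors the trigger-maintained fields described above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nc_suppliers (supplier_id INTEGER PRIMARY KEY, supplier_name TEXT);
CREATE TABLE nc_master (
    nc_id         INTEGER PRIMARY KEY,
    supplier_id   INTEGER REFERENCES nc_suppliers(supplier_id),
    supplier_name TEXT   -- trigger-maintained denormalized copy
);
INSERT INTO nc_suppliers VALUES (1, 'Acme Corp');
INSERT INTO nc_master VALUES (100, 1, 'Acme Corp');
""")

# Before: the join a report author had to know about
joined = conn.execute("""
    SELECT m.nc_id, s.supplier_name
    FROM nc_master m JOIN nc_suppliers s ON s.supplier_id = m.supplier_id
""").fetchall()

# After: the single-table query a self-service user can write
flat = conn.execute("SELECT nc_id, supplier_name FROM nc_master").fetchall()

assert joined == flat  # same answer, simpler query pattern
print(flat)
```

This is the pattern shift that drove self-service adoption: the common case is one table, one WHERE clause.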
Data Redundancy Trade-offs: Yes, we have redundancy, but it’s controlled and purposeful. The four denormalized fields in nc_master add < 5% storage overhead but eliminate massive join costs. The reporting schema is explicitly a duplicate for analytics - users understand it’s not real-time. Triggers maintain consistency automatically, so redundancy doesn’t create data quality issues.
Implementation Results:
Common NC reports that previously took 15-30 seconds now run in under 2 seconds. Complex trend analysis reports dropped from 2-3 minutes to 10-15 seconds. User adoption of self-service reporting increased significantly because the query patterns are much simpler.
Recommendation:
Don’t treat this as an either/or decision. Start with a normalized foundation for data integrity, then strategically denormalize based on measured performance needs. Use modern database features (materialized views, indexed views, ETL to separate reporting schemas) to get the benefits of both approaches without the full costs of either extreme.