Great that you got it working! Let me provide a comprehensive solution for anyone else hitting this issue.
The core problem is field resolution scope in SAQL when combining joins with XMD-defined calculated fields. Here’s how to address all three critical aspects:
1. SAQL Join Syntax and Field Aliasing
The key is understanding that stream aliases create new contexts. When you write q = load "Sales"; q = foreach q generate 'quantity', 'unit_price'; and then join, the calculated fields from your XMD don't automatically carry over into the joined stream. Use explicit stream aliases consistently:
s = load "Sales";
p = load "Products";
q = cogroup s by 'ProductId', p by 'Id';
2. Dataset XMD and Calculated Fields
Your XMD calculated fields are dataset-scoped, not query-scoped. Instead of relying on XMD calculations after joins, define them directly in SAQL:
q = foreach q generate
    s.'ProductId' as 'ProductId',
    sum(s.'quantity' * s.'unit_price') as 'total_revenue',
    first(p.'category') as 'product_category';
Note that after a cogroup, fields must be referenced through their stream alias (s.'quantity', p.'category'), and any field that isn't a grouping key must be wrapped in an aggregate such as sum() or first().
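Putting sections 1 and 2 together, a complete joined query looks like this (the dataset names and field API names follow the examples above; adjust them to your own schema):

```sql
s = load "Sales";
p = load "Products";
-- Join on the foreign key; each stream keeps its own alias
q = cogroup s by 'ProductId', p by 'Id';
-- Define the calculation in SAQL instead of relying on XMD
q = foreach q generate
    s.'ProductId' as 'ProductId',
    sum(s.'quantity' * s.'unit_price') as 'total_revenue',
    first(p.'category') as 'product_category';
q = order q by 'total_revenue' desc;
q = limit q 20;
```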
3. Field Resolution in Joined Queries
The best pattern is the pre-aggregation approach: Calculate and aggregate within the source dataset context before joining:
-- Aggregate Sales first, in its own context
sales_agg = group s by 'ProductId';
sales_agg = foreach sales_agg generate
    'ProductId' as 'ProductId',
    sum('quantity' * 'unit_price') as 'total_revenue';
-- Now join the pre-aggregated data
q = cogroup sales_agg by 'ProductId', p by 'Id';
This approach keeps calculations in their native context, improves performance, and avoids field resolution errors. For your cross-functional reporting dashboard, structure your SAQL with pre-aggregation steps for each dataset, then join the aggregated results. This pattern scales well for complex multi-dataset queries and makes the logic more maintainable.
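End to end, the pre-aggregation pattern looks like this (same assumed dataset and field names as above; note the final projection still references fields through their stream aliases):

```sql
s = load "Sales";
p = load "Products";

-- Step 1: aggregate Sales in its native context
sales_agg = group s by 'ProductId';
sales_agg = foreach sales_agg generate
    'ProductId' as 'ProductId',
    sum('quantity' * 'unit_price') as 'total_revenue';

-- Step 2: join the pre-aggregated rows to Products
q = cogroup sales_agg by 'ProductId', p by 'Id';
q = foreach q generate
    sales_agg.'ProductId' as 'ProductId',
    sum(sales_agg.'total_revenue') as 'total_revenue',
    first(p.'category') as 'product_category';
```

Because sales_agg already has one row per ProductId, the sum() in step 2 just passes the pre-computed value through; it's there to satisfy the rule that non-key fields after a cogroup must be aggregated.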
Alternatively, if you need the calculated field available across multiple queries, consider creating it as a derived field during the dataflow ETL process rather than in XMD. That way it’s a physical field in the dataset and survives all join operations naturally.
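The usual way to do that is a computeExpression transformation in the dataflow. As a rough sketch, the node in the dataflow JSON could look like the following; the node name, source node name, and field names here are illustrative, not from your org:

```json
"Compute_Revenue": {
  "action": "computeExpression",
  "parameters": {
    "source": "Extract_Sales",
    "mergeWithSource": true,
    "computedFields": [
      {
        "name": "total_revenue",
        "label": "Total Revenue",
        "type": "Numeric",
        "precision": 18,
        "scale": 2,
        "saqlExpression": "quantity * unit_price"
      }
    ]
  }
}
```

After the dataflow runs, total_revenue exists as a physical numeric field on the registered dataset, so any downstream SAQL join or lens sees it like any other column.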