For years we have always focused on well modeled fact tables, that only held measures, calculated measures, flags, and integer keys to a dimension table that would add all the context needed for each of those summed, averaged, or counted measures. This has always scaled extremely well to Terabyte sized fact tables and even high cardinality dimensions.
Recently we had a requirement to do a very large analysis in AWS Quicksight of Actual VS Forecast data from dozens of different fact tables. The “Forecast” part of this requirement meant we needed to look back at historical performance, apply some weighted measures to that historical data depending on how old that data was, and apply that result to a period in the future. This gave us pause, as well as a few cold beers to figure out the best way to do this.
How do you get all this measure data from multiple fact tables, look at it backwards and forwards easily, and then apply it to a future date? The answer is looking at data in the perspective of a “day”. Making a “day” the fact event, rather than a traditional fact table event, made all of this possible.
Once we took that perspective, things fell in line quite well. For this project we were using serverless Redshift on AWS, which is tuned quite well for traditional analytic queries, but it's definitely not made to do time series analysis. There are some databases built specifically for that (TimeScaleDB, Influx, TSDB), but we were able to accomplish this with Redshift.
At a high level, this is what we did:
Select day,sum(metric#1) from fact#1
UNION
Select day,sum(metric#2) from fact#2
UNION
Select day,sum(metric#3) from fact#3
We did this for about a dozen fact tables. This now gave us a dataset that is easily usable in a BI tool, and gives us all metrics for any given day without the possibility of losing data with an inner join, left join, or any of the other “gotchas” of using joins. It is now very easy to look back in time using simple ‘dateadd’ logic from the BI tool, or even ‘dateadd’ logic in SQL to look forward or backward by day, week, month, year, etc.
In summary, it’s always good to think about alternative ways to view your data. If you are doing forecasting or comparing actuals VS a forecast across different slices of time, using your date dimension as the fact table may make sense!
Let's talk today about this solution and many others to gain insights to all of your data!
Comments