1. With Data and Aggregations.
2. With Input Data Only.
3. Export, Restructure, Load.
For my test model I'm using ASOsamp.Sample with 550MB of input data, and when I aggregate I use the command below. This gives me aggregate views approximately 50% of the size of the input data. To trigger the restructure, I add a member to the Measures dimension.
Execute aggregate process on database ASOsamp.Sample stopping when total_size exceeds 1.5;
Now the results (all times in seconds):
1. With input data and aggregations.
Restructure Time: 51.85s
Total Elapsed Time: 51.85s
2. With input data only.
Clear Aggregate Data Time: ~1s
Restructure Time: 26.12s
Create Aggregate Views Time: 48.71s
Total Elapsed Time: 75.83s
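For reference, method 2 can be scripted in MaxL roughly as follows. This is a sketch assuming a logged-in MaxL session; the outline change that actually triggers the restructure is applied separately (for example, by saving the edited outline), not from MaxL.

alter database ASOsamp.Sample clear aggregates;
/* apply the outline change here to trigger the restructure */
execute aggregate process on database ASOsamp.Sample stopping when total_size exceeds 1.5;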
3. Export/Restructure/Load
Export Data Time: 43s
Clear Data Time: ~1s
Restructure Time: 1.35s
Load Data Time: 13.41s + 14.88s
Create Aggregate Views Time: 52.29s
Total Elapsed Time: 125.93s
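Method 3 can likewise be sketched in MaxL. The file name below is just an illustration, and as before the outline change itself is applied outside MaxL; the point is that the restructure happens against an empty database.

export database ASOsamp.Sample level0 data to data_file 'ASOsamp.txt';
alter database ASOsamp.Sample reset data;
/* apply the outline change; the restructure is near-instant with no data */
import database ASOsamp.Sample data from data_file 'ASOsamp.txt' on error abort;
execute aggregate process on database ASOsamp.Sample stopping when total_size exceeds 1.5;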
I do find it interesting that having aggregate views that are 50% the size of the input data nearly doubles the restructure time (26.12s vs. 51.85s). I think I need to do some more experimentation on this aspect of restructures.
In this example I wasn't able to gain better performance by restructuring with less data. I did run a few tests turning on parallel loads, calcs and exports. The results were slightly better (method 2 clocked in at about 66s) but the restructure with data and aggregations still won the day. I think there is reason to believe that with some larger input sets and a very powerful machine that method 2 could beat method 1.