Showing posts with label restructure. Show all posts
Showing posts with label restructure. Show all posts

Thursday, September 17, 2015

Measuring Performance with ASO Fragmentation

It's true that ASO fragmentation takes place when performing physical partial clears of data. For example, I loaded some data into ASOSamp and the ess00001.dat file for the database is 562,036,736 bytes. If I clear out Prev Year data then the ess00001.dat file increases to 847,249,408 bytes. A restructure of the outline shrinks it back down to 285,212,672 bytes. Note: Only certain restructures will clear up the fragmentation issue. For example, adding a member to a dynamic dimension will but adding a member to a stored dimension will not. Click here to see the various levels of ASO outline restructuring.

What I'd like to do today is measure the retrieval performance associated with fragmentation. I will use the data I mentioned above for four tests. The elapsed time is recorded next to each scenario in parenthesis.

Test 1: Run simple query with no aggregate views. Record elapsed time for each scenario.

Scenario 1: Load just Curr Year data. (0.68s)
Scenario 2: Load data in Curr Year and Prev Year but then clear out Prev Year. (0.68s)

Test 2: Run zoom-in query with no aggregate views. Record elapsed time for each scenario.

Scenario 1: Load just Curr Year data. (3.548s)
Scenario 2: Load data in Curr Year and Prev Year but then clear out Prev Year. (3.56s)

Test 3: Create 50% aggregate views. Record elapsed time for each scenario.

Scenario 1: Load just Curr Year data. (33.71s)
Scenario 2: Load data in Curr Year and Prev Year but then clear out Prev Year. (32.13s)

Test 4: Export data.

Scenario 1: Load just Curr Year data. (19.632s)
Scenario 2: Load data in Curr Year and Prev Year but then clear out Prev Year. (19.714s)

Summary

Well that's not very exciting. But it is good news -- fragmentation doesn't seem to impede performance. I feel like I need to run these tests again with much larger ASO cubes to see what happens. Perhaps with a 20 or 50 GB it might make a difference.

Epilogue

The most interesting result is that in Test 3, 50 views were created which increased the ess00001.dat file to 411,041,792 bytes in Scenario 1 but in Scenario 2 the ess00001.dat file does not increase past 847,249,408 bytes. It seems to be using the empty space in the file to store the aggregate views. I don't think it's any coincidence that the size of the file increases by exactly the size of the size of the Curr Year data when the Prev Year data is cleared. So I'm not sure I'd call ASO fragmentation a true fragmentation. Fragmentation implies that there are lots of many empty spaces all over the files and it seems like data is still contiguous in the ess*.dat file. It would be interesting to hear from an Oracle developer how the clear command impacts the files on disk.

Tuesday, September 8, 2015

ASO Restructure Performance Part 2

As I showed in Part 1, you don't have a lot of control in speeding up ASO restructures. As with BSO, the more data you have in the cube, the longer the restructure will take. One tip for faster ASO restructures is to clear aggregate data. Once the restructure is complete you'll need to aggregate the data again but you can do this while the cube is live if you need to. Keeping this in mind, I thought I'd compare the performance for a few restructure options.

1.  With Data and Aggregations.
2.  With Input Data Only.
3.  Export, Restructure, Load.


For my test model I'm using ASOsamp.Sample with 550MB of input data and when I aggregate I'm using the below command. This will give me aggregate views approximately 50% the size of the input data. To cause the restructure, I will add a member to the Measures dimension.

Execute aggregate process on database ASOsamp.Sample stopping when total_size exceeds 1.5;

Now the results (all times in seconds):

1. With input data and aggregations.

Restructure Time: 51.85s
Total Elapsed Time: 51.85s

2. With input data only.

Clear Aggregate Data Time: ~1s
Restructure Time: 26.12s
Create Aggregate Views Time: 48.71s
Total Elapsed Time: 75.83s

3. Export/Restructure/Load

Export Data Time: 43s
Clear Data Time: ~1s
Restructure Time: 1.35s
Load Data Time: 13.41s + 14.88s
Create Aggregate Views Time: 52.29s
Total Elapsed Time: 125.93s

I do find it interesting that having aggregated views that are 50% the size of the input data causes a nearly 100% increase in restructure time (26.12s vs. 51.85s). I think I need to do some more experimentation on this aspect of restructures.

In this example I wasn't able to gain better performance by restructuring with less data. I did run a few tests turning on parallel loads, calcs and exports. The results were slightly better (method 2 clocked in at about 66s) but the restructure with data and aggregations still won the day. I think there is reason to believe that with some larger input sets and a very powerful machine that method 2 could beat method 1.