POH Tuning (Part 6 - Varying the benchmark)

Varying the benchmark

In the last post, we showed that in a particular scenario, allocating pinned objects on the pinned object heap is a better choice from both the speed and the heap size perspectives. How about other scenarios?

In part 4, we already discussed the criteria for what is feasible to test under GCPerfSim, so we will simply generate all the possibilities here with this simple Python script:

# a = number of objects that survive per 1000 allocated.
# b = number of those survivors that are pinned.
for a in range(1, 100):
    if 1000 % a == 0:
        for b in range(1, a):
            if a % b == 0:
                # Only keep (a, b) pairs that the interval-based
                # GCPerfSim parameters can express exactly.
                if (1000 - b) % (a - b) == 0:
                    # Pinning scenario: every pin_sohsi-th SOH object
                    # survives, and every pin_sohpi-th survivor is pinned.
                    pin_sohsi = 1000 // a
                    pin_sohpi = a // b
                    # POH scenario: the b pinned objects are allocated on
                    # the POH (all of them survive, pohsi: 1); the other
                    # a - b survivors come from the remaining SOH objects.
                    poh_sohsi = (1000 - b) // (a - b)
                    poh_pohar = b
                    print("  2gb_pin_%s_%s:" % (a, b))
                    print("    arguments:")
                    print("      tc: 6")
                    print("      tagb: 100")
                    print("      tlgb: 2")
                    print("      lohar: 0")
                    print("      pohar: 0")
                    print("      sohsr: 100-4000")
                    print("      pohsr: 100-4000")
                    print("      sohsi: %s" % pin_sohsi)
                    print("      lohsi: 0")
                    print("      pohsi: 0")
                    print("      sohpi: %s" % pin_sohpi)
                    print("      lohpi: 0")
                    print("      sohfi: 0")
                    print("      lohfi: 0")
                    print("      pohfi: 0")
                    print("      allocType: reference")
                    print("      testKind: time")
                    print("  2gb_poh_%s_%s:" % (a, b))
                    print("    arguments:")
                    print("      tc: 6")
                    print("      tagb: 100")
                    print("      tlgb: 2")
                    print("      lohar: 0")
                    print("      pohar: %s" % poh_pohar)
                    print("      sohsr: 100-4000")
                    print("      pohsr: 100-4000")
                    print("      sohsi: %s" % poh_sohsi)
                    print("      lohsi: 0")
                    print("      pohsi: 1")
                    print("      sohpi: 0")
                    print("      lohpi: 0")
                    print("      sohfi: 0")
                    print("      lohfi: 0")
                    print("      pohfi: 0")
                    print("      allocType: reference")
                    print("      testKind: time")

Preprocessing the results

As usual, we can create a pandas data frame for all the runs. Note that we used the benchmark name to keep track of the scenario; now we can extract that information using these simple Python commands:

run_data_frame['p'] = run_data_frame.apply(lambda row: row['benchmark_name'].split('_')[1], axis=1)
run_data_frame['a'] = run_data_frame.apply(lambda row: int(row['benchmark_name'].split('_')[2]), axis=1)
run_data_frame['b'] = run_data_frame.apply(lambda row: int(row['benchmark_name'].split('_')[3]), axis=1)
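
As a side note, the same extraction can be done without a per-row apply; here is an equivalent, vectorized form using pandas' str accessor, which produces the same three columns:

# Vectorized equivalent of the per-row apply above.
parts = run_data_frame['benchmark_name'].str.split('_')
run_data_frame['p'] = parts.str[1]
run_data_frame['a'] = parts.str[2].astype(int)
run_data_frame['b'] = parts.str[3].astype(int)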

Now we can display the data in a nicely sorted order:

run_data_frame[["p","a","b","PctTimePausedInGC", "speed", "HeapSizeBeforeMB_Mean", "HeapSizeAfterMB_Mean"]].sort_values(["p","a","b"])
p a b PctTimePausedInGC speed HeapSizeBeforeMB_Mean HeapSizeAfterMB_Mean
pin 2 1 41.026208 13.866402 2909.199133 2909.070073
pin 4 1 40.996801 12.284033 3463.542373 3463.398389
pin 4 2 49.933891 17.241924 3589.816502 3589.782947
pin 8 4 57.689693 23.644679 3562.012412 3562.028666
pin 10 1 37.903009 9.974110 3581.997371 3581.919437
pin 10 5 59.205905 22.360968 3183.753481 3182.977681
pin 20 10 80.345651 64.567706 3902.958089 3902.755635
pin 40 8 85.275268 63.498679 4017.525417 4017.429858
pin 40 10 87.644163 75.736131 3991.189973 3991.090571
pin 40 20 91.544744 120.112824 4125.414157 4125.391582
pin 50 25 92.621281 136.764843 4152.331355 4152.325399
poh 2 1 51.951706 3.781857 2229.854581 2262.198264
poh 4 1 45.390104 3.555535 2335.833893 2362.571388
poh 4 2 47.362661 3.416029 2283.330958 2369.388038
poh 8 4 50.333994 4.037242 2383.460738 2578.549332
poh 10 1 53.589727 4.143718 2653.550769 2684.079599
poh 10 5 48.989765 3.985653 2392.374988 2635.762941
poh 20 10 55.155276 5.408592 2295.580731 2796.314600
poh 40 8 45.019697 14.362240 2780.002428 2975.107003
poh 40 10 45.041205 12.911266 2740.182679 3042.987936
poh 40 20 56.201348 9.650450 1889.063288 2694.295634
poh 50 25 52.596479 12.271813 1908.234109 2592.491218
run_data_frame[["p","a","b","TotalNumberGCs", "CountIsGen0", "CountIsGen1", "CountIsBackground", "CountIsBlockingGen2"]].sort_values(["p","a","b"])
p a b TotalNumberGCs CountIsGen0 CountIsGen1 CountIsBackground CountIsBlockingGen2
pin 2 1 293 270 21 0 2
pin 4 1 284 242 40 0 2
pin 4 2 283 240 41 0 2
pin 8 4 285 220 63 0 2
pin 10 1 330 247 81 0 2
pin 10 5 286 215 69 0 2
pin 20 10 458 366 89 0 3
pin 40 8 552 486 61 0 5
pin 40 10 565 500 60 0 5
pin 40 20 601 537 57 0 7
pin 50 25 633 567 57 0 9
poh 2 1 4902 4875 25 0 2
poh 4 1 4002 3952 48 0 2
poh 4 2 4549 4493 54 0 2
poh 8 4 4450 4349 99 0 2
poh 10 1 4417 4296 119 0 2
poh 10 5 4277 4155 119 0 3
poh 20 10 4176 3941 231 0 4
poh 40 8 488 338 146 0 4
poh 40 10 471 327 140 0 4
poh 40 20 2849 2411 429 0 9
poh 50 25 622 415 191 0 16

Observation

The obvious observation is that POH is superior to pinned handles in all the cases above, in both the speed and the heap size aspects. From the number-of-GCs perspective, the 4000+ GCs in some of the POH cases stand out, but even in those cases the speed is still superior overall. Note that those are the cases with a relatively low survival rate. (Remember a is the number of objects that survive per 1000 allocated, so a = 2 means a survival rate of just 0.2%.)
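
To quantify that comparison, one could pivot the frame so the pin and poh variants sit side by side per (a, b) scenario; a minimal sketch, assuming the run_data_frame built above:

# Put the pin and poh rows for each (a, b) scenario side by side.
pivot = run_data_frame.pivot_table(index=['a', 'b'], columns='p',
                                   values=['speed', 'HeapSizeAfterMB_Mean'])

# Smaller is better for both metrics here, so a ratio above 1 means the
# poh variant wins that scenario.
print(pivot[('speed', 'pin')] / pivot[('speed', 'poh')])
print(pivot[('HeapSizeAfterMB_Mean', 'pin')] / pivot[('HeapSizeAfterMB_Mean', 'poh')])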