Optimization tests with WRF 2.2   (updated 1/17/2007)

Results of several optimization options are shown below in tabular form. Plots appear at the end of this page.

Timings below are from the following:


NCSA Xeon cluster Tungsten: dual-processor nodes, 8-CPU (4-node) run, Intel Fortran compiler
Run  Options  min, avg, max, last, total (sec)
wrf00 -O2 20.07410, 24.10833, 46.18200, 20.82320, 192.86661
wrf01 -O3 20.07180, 24.27175, 47.35260, 20.77120, 194.17400
wrf02 -O3 -axW 12.26170, 16.45409, 42.55320, 12.26170, 131.63271
wrf03 -O3 -axN 12.25870, 16.22942, 40.91680, 12.25870, 129.83539
wrf04 -O3 -axP 21.80670, 26.89333, 56.08760, 22.59970, 215.14661
wrf05 -O3 -msse2 12.09950, 16.96726, 47.64030, 12.13050, 135.73810
rerun of wrf05 12.07030, 16.16044, 41.79880, 12.08000, 129.28349
wrf07 -O3 -unroll0 20.09910, 25.42625, 56.55470, 20.85940, 203.41000
wrf08 -O3 -auto 20.06130, 25.17671, 54.57120, 20.80960, 201.41370
wrf09 -O3 -axW -unroll0 11.89910, 16.45061, 44.96690, 11.90580, 131.60489
wrf10 -O3 -axN -unroll0 11.80140, 16.79606, 48.42630, 11.80140, 134.36850
wrf11 -O3 -axP -unroll0 22.00490, 28.51027, 68.66390, 22.62220, 228.08212
wrf12 -O3 -axW -auto 11.75010, 16.85568, 49.37550, 11.83780, 134.84541
wrf13 -O3 -axN -auto 11.68970, 16.68665, 48.41900, 11.68970, 133.49319
wrf14 -O3 -axP -auto 19.51240, 24.46545, 53.19480, 20.26340, 195.72360
wrf15 -O3 -axW -auto -unroll0 11.32720, 15.64510, 42.60220, 11.32720, 125.16081
wrf16 -O3 -axN -auto -unroll0 11.32350, 15.54421, 41.96100, 11.35380, 124.35369
wrf17 -O2 -axW -auto -unroll0 11.35570, 15.64499, 42.34090, 11.50190, 125.15990
wrf18 -O2 -axN -auto -unroll0 11.37240, 15.89558, 44.43870, 11.37240, 127.16460
wrf19 -O2 -axW 12.24770, 16.86405, 45.93590, 12.24770, 134.91240
wrf20 -O2 -axN 12.25140, 17.00084, 47.09330, 12.26250, 136.00670
rerun of wrf20 12.21550, 17.23564, 49.13140, 12.24130, 137.88510
wrf22 -O2 -msse2 11.99840, 17.08373, 49.59450, 12.03840, 136.66982
wrf23 -O2 -unroll0 20.11190, 24.79041, 51.49890, 20.80430, 198.32330
wrf24 -O2 -auto 20.11610, 25.25327, 55.35450, 20.80710, 202.02620
wrf25 -O2 -axW -unroll0 11.74780, 16.68930, 47.66590, 11.74780, 133.51440
wrf26 -O2 -axN -unroll0 11.76430, 16.34574, 45.16460, 11.76430, 130.76590
wrf27 -O2 -axW -auto 11.81110, 14.98951, 33.98280, 11.82750, 119.91609
wrf28 -O2 -axN -auto 11.64530, 14.95214, 35.04260, 11.65370, 119.61710
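The rows above can be compared against the wrf00 (-O2) baseline by taking the ratio of total times. The following is a minimal Python sketch, not part of the original tests; the names `ROWS` and `parse_row` are made up for illustration, and only four rows are copied in. It also divides total by avg, which for these rows suggests eight timed steps per run.

```python
# Rows copied verbatim from the Tungsten table above (a subset, for brevity).
ROWS = """\
wrf00 -O2 20.07410, 24.10833, 46.18200, 20.82320, 192.86661
wrf03 -O3 -axN 12.25870, 16.22942, 40.91680, 12.25870, 129.83539
wrf16 -O3 -axN -auto -unroll0 11.32350, 15.54421, 41.96100, 11.35380, 124.35369
wrf28 -O2 -axN -auto 11.64530, 14.95214, 35.04260, 11.65370, 119.61710
"""


def parse_row(line):
    """Split 'run options min, avg, max, last, total' into a dict."""
    head, avg, mx, last, total = [part.strip() for part in line.split(",")]
    tokens = head.split()  # e.g. ['wrf03', '-O3', '-axN', '12.25870']
    return {
        "run": tokens[0],
        "options": " ".join(tokens[1:-1]),
        "min": float(tokens[-1]),
        "avg": float(avg),
        "max": float(mx),
        "last": float(last),
        "total": float(total),
    }


runs = [parse_row(line) for line in ROWS.splitlines() if line.strip()]
baseline = next(r for r in runs if r["run"] == "wrf00")

for r in runs:
    # Speedup relative to the -O2 baseline, based on total time.
    r["speedup"] = baseline["total"] / r["total"]
    # total / avg recovers the number of timed steps behind each row.
    r["steps"] = round(r["total"] / r["avg"])
    print(f'{r["run"]:6s} {r["options"]:24s} total={r["total"]:8.3f} s  '
          f'speedup={r["speedup"]:.2f}x  timed steps={r["steps"]}')
```

The parser assumes each row starts with a run name followed by its options; the "rerun of ..." rows do not follow that pattern and would need separate handling.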
Follow-up test: the four fastest -O2 cases were rerun twice each, not consecutively
Run  Options  min, avg, max, last, total (sec)
wrf17a -O2 -axW -auto -unroll0 11.44960, 14.69052, 34.20800, 11.44960, 117.52419
wrf17b -O2 -axW -auto -unroll0 11.30900, 14.42174, 32.99420, 11.30900, 115.37390
wrf18a -O2 -axN -auto -unroll0 11.28980, 14.49504, 33.28170, 11.28980, 115.96030
wrf18b -O2 -axN -auto -unroll0 11.30820, 14.48814, 33.33010, 11.30820, 115.90510
wrf27a -O2 -axW -auto 11.70130, 14.79760, 33.33350, 11.70130, 118.38080
wrf27b -O2 -axW -auto 11.70330, 15.14882, 36.05420, 11.70330, 121.19060
wrf28a -O2 -axN -auto 11.69510, 14.80996, 33.56640, 11.69510, 118.47969
wrf28b -O2 -axN -auto 11.68810, 14.96658, 34.71730, 11.68810, 119.73260
Cases 06 (-msse3) and 21 (-fast) failed to compile.
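Because each follow-up case was run twice, the paired totals give a rough idea of run-to-run variability. The sketch below is an illustration only (the dictionary layout and variable names are assumptions, not the original tooling). It averages each pair and compares the spread within a pair to the gap between the fastest and slowest case.

```python
from statistics import mean

# Totals (sec) for the 'a' and 'b' reruns, copied from the follow-up table above.
reruns = {
    "wrf17 -O2 -axW -auto -unroll0": (117.52419, 115.37390),
    "wrf18 -O2 -axN -auto -unroll0": (115.96030, 115.90510),
    "wrf27 -O2 -axW -auto": (118.38080, 121.19060),
    "wrf28 -O2 -axN -auto": (118.47969, 119.73260),
}

summary = {}
for case, totals in reruns.items():
    summary[case] = {
        "mean": mean(totals),
        "spread": max(totals) - min(totals),  # difference between the two reruns
    }
    print(f'{case:32s} mean={summary[case]["mean"]:8.3f} s  '
          f'spread={summary[case]["spread"]:.3f} s')

fastest = min(summary, key=lambda c: summary[c]["mean"])
means = [s["mean"] for s in summary.values()]
gap = max(means) - min(means)                      # fastest vs slowest case
largest_spread = max(s["spread"] for s in summary.values())

print(f"fastest on average: {fastest}")
print(f"gap between cases: {gap:.3f} s, largest rerun spread: {largest_spread:.3f} s")
```

With only two samples per case, the spread is a crude indicator; it is of the same order as the gap between cases, so small differences in the ranking should be read with caution.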

From above, it is noted:

The configure.wrf file for run 00 on Tungsten is here, and that for run 18 (arguably the fastest) is available here.


NCSA SGI Altix Cobalt: distributed shared memory, 8 CPUs used, Intel Fortran compiler
Run  Options  min, avg, max, last, total (sec)
wrf00 -O2 6.59600, 9.25064, 26.66670, 6.59600, 74.00510
wrf01 -O3 7.46670, 9.92405, 26.14960, 7.46740, 79.39240
wrf02 -fast (no timings listed)
wrf03 -O2 -ip 6.59430, 9.13144, 25.72040, 6.59430, 73.05150
wrf04 -O2 -ip -ip-no-inlining 6.59080, 9.12325, 25.70990, 6.59080, 72.98599
wrf05 -O2 -nopad 6.59680, 9.13439, 25.75210, 6.59680, 73.07510
wrf06 -O2 -unroll0 6.60290, 9.16621, 25.85080, 6.60290, 73.32970
wrf07 -O2 -no-prefetch 8.43460, 10.89290, 27.08620, 8.43460, 87.14320
wrf08 -O2 -no-ansi-alias 6.61800, 9.15705, 25.76200, 6.61800, 73.25639
wrf09 -O2 -fno-alias 6.60370, 9.13179, 25.66340, 6.60370, 73.05430
wrf10 -O2 -fno-alias -fno-fnalias 6.60930, 9.13399, 25.65490, 6.60930, 73.07189
wrf11 -O2 -IPF-fp-relaxed 6.58820, 9.10940, 25.59710, 6.58820, 72.87521
wrf12 -O2 -Ob0 6.59850, 9.12385, 25.66600, 6.59850, 72.99081
wrf13 -O2 -nolib-inline 7.23490, 9.79181, 26.48230, 7.23490, 78.33450
wrf14 -O2 -no-IPF-fma 6.72070, 9.31193, 26.19450, 6.72070, 74.49541
wrf15 -O2 -auto 6.59690, 9.12623, 25.67610, 6.59690, 73.00980
wrf16 -O2 -save 6.60030, 9.15670, 25.89000, 6.60030, 73.25360
wrf17 -O2 -noalign 6.60190, 9.41070, 27.92720, 6.60190, 75.28560
wrf18 -O2 -align all 6.59600, 9.12370, 25.66400, 6.59600, 72.98960
wrf19 -O2 -ftz 6.59640, 9.12261, 25.66200, 6.59640, 72.98090
wrf20 -O2 -unroll0 -ip -ftz 6.59700, 9.15636, 25.82070, 6.59700, 73.25089
wrf21 -O2 -unroll0 -auto -ip -ftz 6.59870, 9.16765, 25.84380, 6.59870, 73.34120
wrf22 -O3 -ip 7.47070, 9.93419, 26.21120, 7.47260, 79.47350
wrf23 -O3 -nopad 7.46200, 9.91564, 26.11500, 7.46320, 79.32510
wrf24 -O3 -unroll0 6.36720, 9.04907, 26.74520, 6.36770, 72.39259
wrf25 -O3 -fno-alias -fno-fnalias 7.47100, 9.92163, 26.10330, 7.47260, 79.37300
wrf26 -O3 -IPF-fp-relaxed 7.46830, 9.92454, 26.14680, 7.46970, 79.39630
wrf27 -O3 -Ob0 7.41400, 9.88061, 26.09280, 7.41400, 79.04491
wrf28 -O3 -auto 7.45870, 9.90766, 26.07460, 7.46030, 79.26129
wrf29 -O3 -align all 7.46890, 9.91709, 26.07040, 7.47060, 79.33670
wrf30 -O3 -ftz 7.47170, 9.93579, 26.21170, 7.47220, 79.48629
wrf31 -O3 -unroll0 -ip -ftz 6.36440, 9.04236, 26.70980, 6.36440, 72.33890
wrf32 -O3 -unroll0 -auto -ip -ftz 6.35630, 9.03952, 26.69550, 6.35830, 72.31619
Rerun of selected and new options:
Run  Options  min  avg  max  last  total (sec)
wrf00 -O2 6.5169 9.03815 25.5248 6.5169 72.3052
wrf01 -O3 7.3607 9.80494 25.9418 7.3619 78.4395
wrf24 -O3 -unroll0 6.2918 8.95411 26.5197 6.2918 71.6329
wrf33 -O2 -ip -fno-alias -fno-fnalias -auto -align all -nopad -ftz 6.5108 9.04555 25.6196 6.5108 72.3644
wrf34 -O2 -ip -fno-alias -fno-fnalias -auto -align all -nopad -ftz -unroll0 6.5161 9.09024 25.8102 6.5161 72.7219
wrf35 -O3 -ip -fno-alias -fno-fnalias -auto -align all -nopad -ftz 7.3608 9.7992 25.8978 7.361 78.3936
wrf36 -O3 -ip -fno-alias -fno-fnalias -auto -align all -nopad -ftz -unroll0 6.2846 8.98905 26.7999 6.2846 71.9124
Final rerun of many of the above:
Run  Options  min  avg  max  last  total (sec)
wrf00 -O2 6.54370 9.06575 25.55420 6.54370 72.52599
wrf01 -O3 7.39080 9.83114 25.95460 7.39080 78.64910
wrf03 -O2 -ip 6.53600 9.07592 25.61380 6.53600 72.60740
wrf04 -O2 -ip -ip-no-inlining 6.53270 9.06450 25.62390 6.53270 72.51600
wrf05 -O2 -nopad 6.54640 9.08071 25.63410 6.54640 72.64571
wrf09 -O2 -fno-alias 6.54290 9.08566 25.70940 6.54290 72.68530
wrf10 -O2 -fno-alias -fno-fnalias 6.53550 9.06704 25.56940 6.53550 72.53630
wrf15 -O2 -auto 6.54550 9.07061 25.57140 6.54550 72.56490
wrf18 -O2 -align all 6.54300 9.06521 25.55280 6.54300 72.52170
wrf19 -O2 -ftz 6.53960 9.06395 25.55940 6.53960 72.51160
wrf24 -O3 -unroll0 6.30980 8.98309 26.61980 6.31070 71.86470
wrf31 -O3 -unroll0 -ip -ftz 6.31230 8.98910 26.64230 6.31420 71.91280
wrf32 -O3 -unroll0 -auto -ip -ftz 6.31120 9.00045 26.70130 6.31120 72.00360
wrf33 -O2 -ip -fno-alias -fno-fnalias -auto -align all -nopad -ftz 6.54740 9.07526 25.61530 6.54740 72.60210
wrf34 -O2 -ip -fno-alias -fno-fnalias -auto -align all -nopad -ftz -unroll0 6.54100 9.38326 27.94160 6.54100 75.06610
wrf35 -O3 -ip -fno-alias -fno-fnalias -auto -align all -nopad -ftz 7.40050 9.84974 26.02330 7.40230 78.79790
wrf36 -O3 -ip -fno-alias -fno-fnalias -auto -align all -nopad -ftz -unroll0 6.31220 9.00691 26.74770 6.31360 72.05530
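On the Altix, the effect of -O3 with and without -unroll0 is easiest to see as a percent change in total time relative to plain -O2. The following Python sketch uses totals from the final rerun; the helper name `pct_change` and the dictionary layout are illustrative assumptions.

```python
# Totals (sec) copied from the final Altix rerun table above.
totals = {
    "wrf00 -O2": 72.52599,
    "wrf01 -O3": 78.64910,
    "wrf24 -O3 -unroll0": 71.86470,
    "wrf31 -O3 -unroll0 -ip -ftz": 71.91280,
}


def pct_change(total, reference):
    """Percent change in total time relative to the reference (negative = faster)."""
    return 100.0 * (total - reference) / reference


reference = totals["wrf00 -O2"]
change = {run: pct_change(t, reference) for run, t in totals.items()}

for run, pct in change.items():
    print(f"{run:30s} {totals[run]:9.3f} s  {pct:+6.2f}% vs -O2")
```

For these rows, -O3 alone comes out slower than -O2, while adding -unroll0 brings -O3 slightly below the -O2 total.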


Plots comparing 48h run solutions

All are 1-grid runs made on Tungsten [Xeon Linux cluster].
Plots per run (not shown): Pressure/wind, Wind speed, 700mb RH

Run: wrf00
Optimization: -O2
Configuration: KF Cumulus, WSM6 microphys, 1/4 diffusion (Smag)
48h stats: 936 mb, 49.2 m/s, 1338.74 s wallclock

Run: wrf18
Optimization: -O2 -axN -auto -unroll0
Configuration: KF Cumulus, WSM6 microphys, 1/4 diffusion (Smag)
48h stats: 935 mb, 49.5 m/s, 876.58 s wallclock

Run: Grell
Optimization: -O2 -axN -auto -unroll0
Configuration: Grell Cumulus, WSM6 microphys, 1/4 diffusion (Smag)
48h stats: 970 mb, 35.1 m/s, 924.48 s wallclock
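The wrf00 and wrf18 runs share the same physics configuration, so their 48h stats can be compared directly: wallclock for the cost of the run, and pressure and wind for how much the optimized build changes the solution. The arithmetic is sketched below in Python; the key names (`pressure_mb`, `wind_ms`, `wallclock_s`) are labels chosen here for illustration.

```python
# 48h stats copied from the comparison above (KF Cumulus runs only).
runs = {
    "wrf00": {"options": "-O2",
              "pressure_mb": 936, "wind_ms": 49.2, "wallclock_s": 1338.74},
    "wrf18": {"options": "-O2 -axN -auto -unroll0",
              "pressure_mb": 935, "wind_ms": 49.5, "wallclock_s": 876.58},
}

ref, opt = runs["wrf00"], runs["wrf18"]

# Cost: how much faster the optimized build completes the 48h run.
speedup = ref["wallclock_s"] / opt["wallclock_s"]
saved_pct = 100.0 * (1.0 - opt["wallclock_s"] / ref["wallclock_s"])

# Solution: differences in the reported pressure and wind values.
dp = opt["pressure_mb"] - ref["pressure_mb"]
dw = round(opt["wind_ms"] - ref["wind_ms"], 1)

print(f"wallclock speedup: {speedup:.2f}x ({saved_pct:.1f}% less time)")
print(f"pressure difference: {dp:+d} mb, wind difference: {dw:+.1f} m/s")
```

The Grell run is left out of this comparison because it changes the cumulus scheme as well as the compiler options.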

Resources

Here are some of the tools used to gather the above.


Brian F. Jewett -- home page