CFP2000 Result: Hewlett-Packard Company AlphaServer GS1280 7/1300

Benchmark	Base Copies	Base Runtime	Base Ratio	Copies	Runtime	Ratio
168.wupwise	64	161	738	64	67.3	1766
171.swim	64	76.7	3002	64	76.7	3002
172.mgrid	64	225	594	64	149	896
173.applu	64	124	1260	64	114	1363
177.mesa	64	135	769	64	118	883
178.galgel	64	122	1762	64	121	1783
179.art	64	115	1682	64	72.1	2676
183.equake	64	219	441	64	73.0	1322
187.facerec	64	152	929	64	136	1037
188.ammp	64	264	618	64	229	713
189.lucas	64	117	1264	64	109	1359
191.fma3d	64	184	846	64	137	1138
200.sixtrack	64	225	363	64	210	389
301.apsi	64	192	1004	64	182	1062
SPECfp_rate_base2000	932
SPECfp_rate2000	1224

Benchmark

Base
Copies

Base
Runtime

Base
Ratio

Copies

Runtime

Ratio

168.wupwise

161

738

67.3

1766

171.swim

76.7

3002

76.7

3002

172.mgrid

225

594

149

896

173.applu

124

1260

114

1363

177.mesa

135

769

118

883

178.galgel

122

1762

121

1783

179.art

115

1682

72.1

2676

183.equake

219

441

73.0

1322

187.facerec

152

929

136

1037

188.ammp

264

618

229

713

189.lucas

117

1264

109

1359

191.fma3d

184

846

137

1138

200.sixtrack

225

363

210

389

301.apsi

192

1004

182

1062

SPECfp_rate_base2000

932

SPECfp_rate2000

1224

Hardware

Hardware Vendor:

Hewlett-Packard Company

Model Name:

AlphaServer GS1280 7/1300

CPU:

Alpha 21364

CPU MHz:

1300

FPU:

Integrated

CPU(s) enabled:

64 cores, 64 chips, 1 core/chip

CPU(s) orderable:

2 to 64

Parallel:

Primary Cache:

64KB(I)+64KB(D) on chip

Secondary Cache:

1.75MB on chip per CPU

L3 Cache:

None

Other Cache:

None

Memory:

2GB per CPU; 256MB RIMMs

Disk Subsystem:

18GB Ultra 320 15KRPM

Other Hardware:

None

Software

Operating System:

Tru64 UNIX V5.1B-1 + PK4

Compiler:

Compaq C V6.5-011-48C5K
Program Analysis Tools V2.0
Spike V5.2 (510 USG)
HP Fortran V5.5A-3548-48D88
HP Fortran 77 V5.5A-3548-48D88
KAP Fortran V4.3 000607
KAP Fortran 77 V4.1 980926
KAP C V4.1 000607

File System:

AdvFS, 14x18GB, RAID0

System State:

Multi-user

Notes / Tuning Information
Baseline C: cc -arch ev7 -fast -O4 ONESTEP Fortran: f90 -arch ev7 -fast -O5 ONESTEP Peak: All use -g3 -arch ev7 -non_shared ONESTEP except these (which use only the tunings shown below): 173.applu 188.ammp 191.fma3d Individual benchmark tuning: 168.wupwise: kf77 -call_shared -inline all -tune ev67 -unroll 12 -automatic -align commons -arch ev67 -fkapargs=' -aggressive=c -fuse -fuselevel=1 -so=2 -r=1 -o=1 -interleave -ur=6 -ur2=060 ' +PFB 171.swim: same as base 172.mgrid: kf90 -call_shared -arch generic -O5 -inline manual -nopipeline -transform_loops -unroll 9 -automatic -fkapargs='-aggressive=a -fuse -interleave -ur=2 -ur3=5 -cachesize=128,16000 ' +PFB 173.applu: kf90 -O5 -transform_loops -fkapargs=' -o=0 -nointerleave -ur=14 -ur2=260 -ur3=18' +PFB 177.mesa: kcc -fast -O4 +CFB +IFB 178.galgel: f90 -O5 -fast -unroll 5 -automatic 179.art: kcc -assume whole_program -ldensemalloc -call_shared -assume restricted_pointers -unroll 16 -inline none -ckapargs=' -fuse -fuselevel=1 -ur=3' +PFB 183.equake: cc -call_shared -arch generic -fast -O4 -ldensemalloc -assume restricted_pointers -inline speed -unroll 13 -xtaso_short +PFB 187.facerec: f90 -O4 -nopipeline -inline all -non_shared -speculate all -unroll 7 -automatic -assume accuracy_sensitive -math_library fast +IFB 188.ammp: cc -arch host -O4 -ifo -assume nomath_errno -assume trusted_short_alignment -fp_reorder -readonly_strings -ldensemalloc -xtaso_short -assume restricted_pointers -unroll 9 -inline speed +CFB +IFB +PFB 189.lucas: kf90 -O5 -fkapargs='-ur=1' +PFB 191.fma3d: kf90 -O4 -transform_loops -fkapargs='-cachesize=128,16000 ' +PFB 200.sixtrack: f90 -fast -O5 -assume accuracy_sensitive -notransform_loops +PFB 301.apsi: kf90 -O5 -inline none -call_shared -speculate all -align commons -fkapargs=' -aggressive=ab -tune=ev5 -fuse -ur=1 -ur2=60 -ur3=20 -cachesize=128,16000' Most benchmarks are built using one or more types of profile-driven feedback. The types used are designated by abbreviations in the notes: +CFB: Code generation is optimized by the compiler, using feedback from a training run. These commands are done before the first compile (in phase "fdo_pre0"): mkdir /tmp/pp rm -f /tmp/pp/${baseexe}* and these flags are added to the first and second compiles: PASS1_CFLAGS = -prof_gen_noopt -prof_dir /tmp/pp PASS2_CFLAGS = -prof_use_feedback -prof_dir /tmp/pp (Peak builds use /tmp/pp above; base builds use /tmp/pb.) +IFB: Icache usage is improved by the post-link-time optimizer Spike, using feedback from a training run. These commands are used (in phase "fdo_postN"): mv ${baseexe} oldexe spike oldexe -feedback oldexe -o ${baseexe} +PFB: Prefetches are improved by the post-link-time optimizer Spike, using feedback from a training run. These commands are used (in phase "fdo_post_makeN"): rm -f Counts mv ${baseexe} oldexe pixie -stats dstride oldexe 1>pixie.out 2>pixie.err mv oldexe.pixie ${baseexe} A training run is carried out (in phase "fdo_runN"), and then this command (in phase "fdo_postN"): spike oldexe -fb oldexe -stride_prefetch -o ${baseexe} When Spike is used for both Icache and Prefetch improvements, only one spike command is actually issued, with the Icache options followed by the Prefetch options. vm: vm_bigpg_enabled = 1 vm_bigpg_thresh = 6 vm_swap_eager = 0 ubc_maxpercent = 50 proc: max_per_proc_address_space = 34359738368 max_per_proc_data_size = 34359738368 max_per_proc_stack_size = 34359738368 max_proc_per_user = 2048 max_threads_per_user = 4096 maxusers = 2048 per_proc_address_space = 34359738368 per_proc_data_size = 34359738368 per_proc_stack_size = 34359738368 Portability: galgel: -fixed Information on UNIX V5.1B Patches can be found at http://ftp1.service.digital.com/public/unix/v5.1b/ Processes were bound to CPUs using "runon".

Notes / Tuning Information

 Baseline   C: cc  -arch ev7 -fast -O4 ONESTEP 
      Fortran: f90 -arch ev7 -fast -O5 ONESTEP 

 
 Peak:
   All use -g3 -arch ev7 -non_shared ONESTEP 
   except these (which use only the tunings shown below):
      173.applu 188.ammp 191.fma3d
   Individual benchmark tuning:
   168.wupwise: kf77 -call_shared -inline all -tune ev67 
                -unroll 12 -automatic -align commons -arch ev67
                -fkapargs=' -aggressive=c -fuse
                -fuselevel=1 -so=2 -r=1 -o=1 -interleave
                -ur=6 -ur2=060 ' +PFB
       171.swim: same as base
      172.mgrid: kf90 -call_shared -arch generic -O5 -inline
                 manual -nopipeline -transform_loops -unroll 9 -automatic
                 -fkapargs='-aggressive=a -fuse -interleave
                 -ur=2 -ur3=5 -cachesize=128,16000 ' +PFB
     173.applu: kf90  -O5 -transform_loops 
                -fkapargs=' -o=0 -nointerleave -ur=14
                -ur2=260 -ur3=18' +PFB
      177.mesa: kcc -fast -O4 +CFB +IFB 
    178.galgel: f90 -O5 -fast -unroll 5 -automatic
       179.art: kcc  -assume whole_program -ldensemalloc 
                -call_shared -assume restricted_pointers 
                -unroll 16 -inline none -ckapargs=' 
                -fuse -fuselevel=1 -ur=3' +PFB
    183.equake: cc -call_shared -arch generic -fast -O4
                -ldensemalloc -assume restricted_pointers
                -inline speed -unroll 13 -xtaso_short +PFB
   187.facerec: f90 -O4 -nopipeline -inline all 
                -non_shared -speculate all -unroll 7
                -automatic -assume accuracy_sensitive 
                -math_library fast +IFB 
      188.ammp: cc -arch host -O4 -ifo -assume nomath_errno 
                -assume trusted_short_alignment -fp_reorder 
                -readonly_strings -ldensemalloc -xtaso_short 
                -assume restricted_pointers -unroll 9 
                -inline speed +CFB +IFB +PFB
     189.lucas: kf90 -O5 -fkapargs='-ur=1' +PFB 
     191.fma3d: kf90 -O4 -transform_loops -fkapargs='-cachesize=128,16000 ' +PFB
  200.sixtrack: f90 -fast -O5 -assume accuracy_sensitive 
                -notransform_loops +PFB
      301.apsi: kf90 -O5 -inline none -call_shared -speculate all 
                -align commons -fkapargs=' -aggressive=ab 
                -tune=ev5 -fuse -ur=1 -ur2=60 -ur3=20 
                -cachesize=128,16000'

 Most benchmarks are built using one or more types of 
 profile-driven feedback.  The types used are designated
 by abbreviations in the notes:

 +CFB: Code generation is optimized by the compiler, using 
       feedback from a training run.  These commands are
       done before the first compile (in phase "fdo_pre0"):

            mkdir /tmp/pp
            rm -f /tmp/pp/${baseexe}*

       and these flags are added to the first and second compiles:

            PASS1_CFLAGS = -prof_gen_noopt    -prof_dir /tmp/pp
            PASS2_CFLAGS = -prof_use_feedback -prof_dir /tmp/pp
 
      (Peak builds use /tmp/pp above; base builds use /tmp/pb.)

 +IFB: Icache usage is improved by the post-link-time optimizer 
       Spike, using feedback from a training run.  These commands
       are used (in phase "fdo_postN"):  

            mv ${baseexe} oldexe
            spike oldexe -feedback oldexe -o ${baseexe}

 +PFB: Prefetches are improved by the post-link-time optimizer 
       Spike, using feedback from a training run.  These
       commands are used (in phase "fdo_post_makeN"):

            rm -f *Counts*
            mv ${baseexe} oldexe
            pixie -stats dstride oldexe 1>pixie.out 2>pixie.err
            mv oldexe.pixie ${baseexe}

       A training run is carried out (in phase "fdo_runN"), and 
       then this command (in phase "fdo_postN"):

            spike oldexe -fb oldexe -stride_prefetch -o ${baseexe}

 When Spike is used for both Icache and Prefetch improvements, 
 only one spike command is actually issued, with the Icache 
 options followed by the Prefetch options.

 vm:
         vm_bigpg_enabled = 1
         vm_bigpg_thresh = 6
         vm_swap_eager = 0
         ubc_maxpercent = 50
 
 proc:
         max_per_proc_address_space = 34359738368
         max_per_proc_data_size = 34359738368
         max_per_proc_stack_size = 34359738368
         max_proc_per_user = 2048
         max_threads_per_user = 4096
         maxusers = 2048
         per_proc_address_space = 34359738368
         per_proc_data_size = 34359738368
         per_proc_stack_size = 34359738368
 
 
 Portability: galgel: -fixed
  
 Information on UNIX V5.1B Patches can be found at
 http://ftp1.service.digital.com/public/unix/v5.1b/
  
 Processes were bound to CPUs using "runon".

First published at SPEC.org on 24-Aug-2004

Generated on Thu Aug 26 11:55:45 2004 by SPEC CPU2000 HTML formatter v1.01