SPEC® ACCEL™ ACC Result

Copyright 2015-2017 Standard Performance Evaluation Corporation

Supermicro (Test Sponsor: NVIDIA Corporation)

Tesla P100-PCIE-16GB

SuperServer 1028GR-TR

SPECaccel_acc_base = 8.02

SPECaccel_acc_peak = 8.02

ACCEL license: 019
Test date: May-2017
Test sponsor: NVIDIA Corporation
Hardware Availability: Oct-2015
Tested by: NVIDIA Corporation
Software Availability: May-2017
Hardware
CPU Name: Intel Xeon E5-2698 v3
CPU Characteristics:
CPU MHz: 2300
CPU MHz Maximum: 3600
FPU: Integrated
CPU(s) enabled: 32 cores, 2 chips, 16 cores/chip, 2 threads/core
CPU(s) orderable: 1,2 chips
Primary Cache: 32 KB I + 32 KB D on chip per core
Secondary Cache: 256 KB I+D on chip per core
L3 Cache: 40 MB I+D on chip per chip
Other Cache: None
Memory: 256 GB (16 x 16 GB 2Rx4 PC4-2133P-R)
Disk Subsystem: 500 GB Seagate ST9500620NS 7200 RPM SATA
Other Hardware: None
Accelerator
Accel Model Name: Tesla P100
Accel Vendor: NVIDIA Corporation
Accel Name: Tesla P100-PCIE-16GB
Type of Accel: GPU
Accel Connection: PCIe
Does Accel Use ECC: Yes
Accel Description: See Notes
Accel Driver: NVIDIA UNIX x86_64 Kernel Module 375.20
Software
Operating System: CentOS Linux release 7.2.1511 (Core), kernel 3.10.0-327.22.2.el7.x86_64
Compiler: PGI Professional Edition, Release 17.5
File System: xfs
System State: Run level 3 (multi-user)
Other Software: None

Results Table

Results appear in the order in which they were run. Bold underlined text indicates a median measurement.

              Base                                         Peak
Benchmark     Seconds Ratio  Seconds Ratio  Seconds Ratio  Seconds Ratio  Seconds Ratio  Seconds Ratio
303.ostencil     17.8  8.14     17.6  8.24     17.6  8.25     17.8  8.14     17.6  8.24     17.6  8.25
304.olbm         41.5  11.0     41.6  10.9     41.6  10.9     41.5  11.0     41.6  10.9     41.6  10.9
314.omriq         118  8.13      119  8.05      120  8.00      118  8.13      119  8.05      120  8.00
350.md           22.4  11.3     22.6  11.2     24.4  10.3     22.4  11.3     22.6  11.2     24.4  10.3
351.palm          145  2.54      143  2.58      143  2.59      145  2.54      143  2.58      143  2.59
352.ep           71.1  7.46     71.1  7.45     71.1  7.45     71.1  7.46     71.1  7.45     71.1  7.45
353.clvrleaf     52.1  8.55     52.1  8.54     52.1  8.54     52.1  8.55     52.1  8.54     52.1  8.54
354.cg           43.6  9.36     43.5  9.38     44.1  9.25     43.6  9.36     43.5  9.38     44.1  9.25
355.seismic      39.8  9.31     39.8  9.31     39.7  9.31     39.8  9.31     39.8  9.31     39.7  9.31
356.sp           34.4  8.02     34.3  8.04     34.4  8.02     34.4  8.02     34.3  8.04     34.4  8.02
357.csp          30.7  8.80     31.0  8.72     31.0  8.71     30.7  8.80     31.0  8.72     31.0  8.71
359.miniGhost    50.7  7.28     50.7  7.28     50.7  7.28     50.7  7.28     50.7  7.28     50.7  7.28
360.ilbdc        43.2  8.50     43.1  8.51     43.1  8.51     43.2  8.50     43.1  8.51     43.1  8.51
363.swim         56.8  4.05     56.7  4.06     56.5  4.07     56.8  4.05     56.7  4.06     56.5  4.07
370.bt           12.4  18.0     12.5  17.9     12.4  17.9     12.4  18.0     12.5  17.9     12.4  17.9
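
The per-benchmark ratios in the Results Table can be related to the overall SPECaccel_acc_base figure with a short calculation. The sketch below assumes the usual SPEC rule: take the median of the three runs for each benchmark, then take the geometric mean of those medians. The names `BASE_RATIOS` and `overall_metric` are illustrative and not part of any SPEC tool. The inputs are the rounded ratios printed in the table, so the result is only approximate.

```python
from math import prod
from statistics import median

# Base ratios for the three runs of each benchmark, copied from the Results Table.
BASE_RATIOS = {
    "303.ostencil": (8.14, 8.24, 8.25),
    "304.olbm": (11.0, 10.9, 10.9),
    "314.omriq": (8.13, 8.05, 8.00),
    "350.md": (11.3, 11.2, 10.3),
    "351.palm": (2.54, 2.58, 2.59),
    "352.ep": (7.46, 7.45, 7.45),
    "353.clvrleaf": (8.55, 8.54, 8.54),
    "354.cg": (9.36, 9.38, 9.25),
    "355.seismic": (9.31, 9.31, 9.31),
    "356.sp": (8.02, 8.04, 8.02),
    "357.csp": (8.80, 8.72, 8.71),
    "359.miniGhost": (7.28, 7.28, 7.28),
    "360.ilbdc": (8.50, 8.51, 8.51),
    "363.swim": (4.05, 4.06, 4.07),
    "370.bt": (18.0, 17.9, 17.9),
}


def overall_metric(ratios_by_benchmark):
    """Geometric mean of the per-benchmark median ratios (assumed SPEC rule)."""
    medians = [median(runs) for runs in ratios_by_benchmark.values()]
    return prod(medians) ** (1.0 / len(medians))


score = overall_metric(BASE_RATIOS)
print(f"Recomputed SPECaccel_acc_base: {score:.2f}")
```

Every benchmark in this result sets basepeak = yes, so the peak runs repeat the base numbers and the same calculation applies to SPECaccel_acc_peak.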

Operating System Notes

 Stacksize set to 'unlimited'

Platform Notes

 Sysinfo program /local/home/colgrove/SPECACCEL/Docs/sysinfo
 $Rev: 6965 $ $Date:: 2015-04-21 #$ c05a7f14b1b1765e3fe1df68447e8a35
 running on hsw8 Mon May  8 15:08:54 2017

 This section contains SUT (System Under Test) info as seen by
 some common utilities.  To remove or add to this section, see:
   http://www.spec.org/accel/Docs/config.html#sysinfo

 From /proc/cpuinfo
    model name : Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz
       2 "physical id"s (chips)
       64 "processors"
    cores, siblings (Caution: counting these is hw and system dependent.  The
    following excerpts from /proc/cpuinfo might not be reliable.  Use with
    caution.)
       cpu cores : 16
       siblings  : 32
       physical 0: cores 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
       physical 1: cores 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
    cache size : 40960 KB

 From /proc/meminfo
    MemTotal:       264038532 kB
    HugePages_Total:   32768
    Hugepagesize:       2048 kB

 /usr/bin/lsb_release -d
    CentOS Linux release 7.2.1511 (Core)

 From /etc/*release* /etc/*version*
    centos-release: CentOS Linux release 7.2.1511 (Core)
    centos-release-upstream: Derived from Red Hat Enterprise Linux 7.2 (Source)
    os-release:
       NAME="CentOS Linux"
       VERSION="7 (Core)"
       ID="centos"
       ID_LIKE="rhel fedora"
       VERSION_ID="7"
       PRETTY_NAME="CentOS Linux 7 (Core)"
       ANSI_COLOR="0;31"
       CPE_NAME="cpe:/o:centos:centos:7"
    redhat-release: CentOS Linux release 7.2.1511 (Core)
    system-release: CentOS Linux release 7.2.1511 (Core)
    system-release-cpe: cpe:/o:centos:centos:7

 uname -a:
    Linux hsw8 3.10.0-327.22.2.el7.x86_64 #1 SMP Thu Jun 23 17:05:11 UTC 2016
    x86_64 x86_64 x86_64 GNU/Linux

 run-level 3 May 8 13:53

 SPEC is set to: /local/home/colgrove/SPECACCEL
    Filesystem              Type  Size  Used Avail Use% Mounted on
    /dev/mapper/centos-root xfs   443G   28G  415G   7% /

 Cannot run dmidecode; consider saying 'chmod +s /usr/sbin/dmidecode'

 (End of data from sysinfo program)
  Information from pgaccelinfo

  CUDA Driver Version:           8000
  NVRM version:                  NVIDIA UNIX x86_64 Kernel Module  375.20

  Device Number:                 0
  Device Name:                   Tesla P100-PCIE-16GB
  Device Revision Number:        6.0
  Global Memory Size:            17100439552
  Number of Multiprocessors:     56
  Concurrent Copy and Execution: Yes
  Total Constant Memory:         65536
  Total Shared Memory per Block: 49152
  Registers per Block:           65536
  Warp Size:                     32
  Maximum Threads per Block:     1024
  Maximum Block Dimensions:      1024, 1024, 64
  Maximum Grid Dimensions:       2147483647 x 65535 x 65535
  Maximum Memory Pitch:          2147483647B
  Texture Alignment:             512B
  Clock Rate:                    1328 MHz
  Execution Timeout:             No
  Integrated Device:             No
  Can Map Host Memory:           Yes
  Compute Mode:                  default
  Concurrent Kernels:            Yes
  ECC Enabled:                   Yes
  Memory Clock Rate:             715 MHz
  Memory Bus Width:              4096 bits
  L2 Cache Size:                 4194304 bytes
  Max Threads Per SMP:           2048
  Async Engines:                 2
  Unified Addressing:            Yes
  Managed Memory:                Yes
  PGI Compiler Option:           -ta=tesla:cc60
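
A few derived figures follow from the pgaccelinfo values above. The sketch below is illustrative only. It assumes the device memory transfers data twice per memory clock (double data rate), which the report does not state, and the function and constant names are hypothetical.

```python
# Values copied from the pgaccelinfo output in the Platform Notes.
GLOBAL_MEMORY_BYTES = 17100439552
MEMORY_CLOCK_MHZ = 715
MEMORY_BUS_WIDTH_BITS = 4096
MULTIPROCESSORS = 56
MAX_THREADS_PER_SMP = 2048

# Assumption (not stated in the report): two transfers per memory clock.
TRANSFERS_PER_CLOCK = 2


def global_memory_gib(size_bytes):
    """Convert a byte count to binary gigabytes (GiB)."""
    return size_bytes / 2**30


def peak_bandwidth_gb_s(clock_mhz, bus_width_bits, transfers_per_clock):
    """Theoretical peak memory bandwidth in decimal GB/s."""
    bytes_per_transfer = bus_width_bits / 8
    return clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9


def max_resident_threads(multiprocessors, threads_per_smp):
    """Upper bound on threads resident on the device at once."""
    return multiprocessors * threads_per_smp


memory_gib = global_memory_gib(GLOBAL_MEMORY_BYTES)
bandwidth = peak_bandwidth_gb_s(
    MEMORY_CLOCK_MHZ, MEMORY_BUS_WIDTH_BITS, TRANSFERS_PER_CLOCK
)
threads = max_resident_threads(MULTIPROCESSORS, MAX_THREADS_PER_SMP)

print(f"Global memory: {memory_gib:.2f} GiB")
print(f"Peak memory bandwidth (assumed DDR): {bandwidth:.2f} GB/s")
print(f"Maximum resident threads: {threads}")
```

These are theoretical upper bounds computed from the reported device properties, not measurements from this run.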

Base Compiler Invocation

C benchmarks:

 pgcc 

Fortran benchmarks:

 pgfortran 

Benchmarks using both Fortran and C:

 pgcc   pgfortran 

Base Optimization Flags

C benchmarks:

 -fast   -Mfprelaxed   -acc   -ta=tesla:cc60   -ta=tesla:cuda8.0 

Fortran benchmarks:

 -fast   -Mfprelaxed   -acc   -ta=tesla:cc60   -ta=tesla:cuda8.0 

Benchmarks using both Fortran and C:

353.clvrleaf:  -fast   -Mfprelaxed   -acc   -ta=tesla:cc60   -ta=tesla:cuda8.0 
359.miniGhost:  -fast   -Mfprelaxed   -acc   -ta=tesla:cc60   -ta=tesla:cuda8.0   -Mnomain 

Peak Optimization Flags

C benchmarks:

303.ostencil:  basepeak = yes 
304.olbm:  basepeak = yes 
314.omriq:  basepeak = yes 
352.ep:  basepeak = yes 
354.cg:  basepeak = yes 
357.csp:  basepeak = yes 
370.bt:  basepeak = yes 

Fortran benchmarks:

350.md:  basepeak = yes 
351.palm:  basepeak = yes 
355.seismic:  basepeak = yes 
356.sp:  basepeak = yes 
360.ilbdc:  basepeak = yes 
363.swim:  basepeak = yes 

Benchmarks using both Fortran and C:

353.clvrleaf:  basepeak = yes 
359.miniGhost:  basepeak = yes 

The flags file that was used to format this result can be browsed at
https://www.spec.org/accel/flags/pgi2017_flags.20170621.00.html.

You can also download the XML flags source by saving the following link:
https://www.spec.org/accel/flags/pgi2017_flags.20170621.00.xml.