# Invocation command line:
# /dev/shm/cpu2017-1.1.5/bin/harness/runcpu --configfile amd_rate_aocc300_milan_A1.cfg --tune all --reportable --iterations 2 --nopower --runmode rate --tune base:peak --size test:train:refrate intrate
# output_root was not used for this run
############################################################################
################################################################################
# AMD AOCC 300 SPEC CPU2017 V1.1.0 Rate Configuration File for 64-bit Linux
#
# File name                : amd_rate_aocc300_milan_A1.cfg
# Creation Date            : January 15, 2021
# CPU2017 Version          : 1.1.5
# Supported benchmarks     : All Rate benchmarks (intrate, fprate)
# Compiler name/version    : AOCC 3.0.0
# Operating system version : OpenSUSE 15.2
# Supported OS's           : Ubuntu 20.04, RHEL 8.3, SLES 15 SP2
# Hardware                 : AMD Milan, Rome, Naples (AMD64)
# FP Base Pointer Size     : 64-bit
# FP Peak Pointer Size     : 64-bit
# INT Base Pointer Size    : 64-bit
# INT Peak Pointer Size    : 32/64-bit
# Auto Parallelization     : No
#
# Note: DO NOT EDIT THIS FILE. The only edits required to properly run these
# binaries are made in the ini Python file. Please consult Readme.amd_rate_aocc300_milan_A1.txt
# for a few uncommon exceptions which require edits to this file.
#
# Description:
#
# This binary package automates away many of the complexities necessary to set
# up and run SPEC CPU2017 under optimized conditions on AMD Milan/Rome/Naples-based
# server platforms within Linux (AMD64).
#
# The binary package was built specifically for AMD Milan/Rome/Naples microprocessors and
# is not intended to run on other products.
#
# Please install the binary package by following the instructions in
# "Readme.amd_rate_aocc300_milan_A1.txt" under the "How To Use the Binaries" section.
#
# The binary package is designed to work without alteration on two socket AMD
# Milan/Rome/Naples-based servers with 64 cores per socket, SMT enabled and 1 TiB of DDR4
# memory distributed evenly among all 16 channels using 32 GiB DIMMs.
#
# To run the binary package on other Milan/Rome/Naples configurations, please review
# "Readme.amd_rate_aocc300_milan_A1.txt". In general, Milan/Rome or Naples CPUs
# should be autodetected with no action required by the user.
#
# In most cases, it should be unnecessary to edit "amd_rate_aocc300_milan_A1.cfg" or any
# other file besides "ini_amd_rate_aocc300_milan_A1.py" where reporting fields
# and run conditions are set.
#
# The run script automatically sets the optimal number of rate copies and binds
# them appropriately.
#
# The run script and accompanying binary package are designed to work on Ubuntu
# 20.04, RHEL 8.2 and SLES 15 SP2.
#
# Important! If you write your own run script, please set the stack size to
# "unlimited" when executing this binary package. Failure to do so may cause
# some benchmarks to overflow the stack. For example, to set stack size within
# the bash shell, include the following line somewhere at the top of your run
# script before the runcpu invocation:
#
#     ulimit -s unlimited
#
# Modification of this config file should only be necessary if you intend to
# rebuild the binaries. General instructions for rebuilding the binaries are
# found in-line below.
#
################################################################################
# Modifiable macros:
################################################################################
# Change the following line to true if you intend to REBUILD the binaries (AMD
# does not support this). Valid values are "true" or "false" (no quotes).
%define allow_build false
# Only change these macros if you are rebuilding the binary package:
%define compiler_name aocc300
%define binary_package_name amd_rate_%{compiler_name}_milan_A
%define binary_package_revision 1
%define build_path /sppo/bin/cpu2017v115aocc3/
%define flags_file_name %{compiler_name}-flags-A1.xml
# To enable the platform file, be sure to uncomment the flagsurl02 header line
# below.
%define platform_file_name INVALID_platform_%{binary_package_name}.xml
# You should never have to change binary_package_full_name:
%define binary_package_full_name %{binary_package_name}%{binary_package_revision}
################################################################################
# Include file name
################################################################################
# The include file contains fields that are commonly changed. This file is
# auto-generated based upon INI file settings and should not need user
# modification for runs.
%define inc_file_name %{binary_package_full_name}.inc
################################################################################
# Binary label extension and "allow_build" switch
################################################################################
# Only modify the binary label extension if you plan to rebuild the binaries.
%define ext %{binary_package_name}
# If you plan to recompile these CPU2017 binaries, please choose a new extension
# name (ext above) to avoid confusion with the current binary set on your system
# under test, and to avoid confusion for SPEC submission reviewers. You will
# also need to set "allow_build" to true above. Finally, you must modify the
# Paths section below to point to your library locations if the paths are not
# already set up in your build environment.
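#
# As an illustration only (not taken from the AMD package), the macro edits that
# a rebuild calls for might look like the commented lines shown here. The label
# "my_rate_aocc300_rebuild" and the path "/path/to/cpu2017/" are hypothetical
# placeholders; substitute values that match your own build environment.
#
#     %define allow_build true
#     %define build_path  /path/to/cpu2017/
#     %define ext         my_rate_aocc300_rebuild
#
# The lines are left commented out so that this file still behaves as shipped.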
################################################################################
# Paths and Environment Variables
# ** MODIFY AS NEEDED (modification should not be necessary for runs) **
################################################################################
# Allow environment variables to be set before runs:
preenv = 1
# Necessary to avoid out-of-memory exceptions on certain SUTs:
preENV_MALLOC_CONF = retain:true
# Define the name of the directory that holds AMD library files:
%define lib_dir %{binary_package_name}_lib
# Set the shared object library path for runs and builds:
preENV_LD_LIBRARY_PATH = $[top]/%{lib_dir}/64;$[top]/%{lib_dir}/32:%{ENV_LD_LIBRARY_PATH}
# Define 32-bit library build paths:
# Do not use $[top] with the 32-bit libraries because doing so will cause an
# options checksum error triggering a xalanc recompile attempt on SUTs having
# different file paths:
JEMALLOC_LIB32_PATH = %{build_path}%{lib_dir}/32
%if '%{allow_build}' eq 'false'
# The include file is only needed for runs, but not for builds.
# include: %{inc_file_name}
# ----- Begin inclusion of 'amd_rate_aocc300_milan_A1.inc'
############################################################################
################################################################################
################################################################################
# File name: amd_rate_aocc300_milan_A1.inc
# File generation code date: November 13, 2020
# File generation date/time: March 17, 2021 / 14:44:52
#
# This file is automatically generated during a SPEC CPU2017 run.
#
# To modify inc file generation, please consult the readme file or the run
# script.
################################################################################
################################################################################
################################################################################
################################################################################
# The following macros are generated for use in the cfg file.
################################################################################
################################################################################
%define logical_core_count 48
%define physical_core_count 24
################################################################################
# The following inc blocks set the rate copy counts and affinity settings.
#
# intrate benchmarks: 500.perlbench_r,502.gcc_r,505.mcf_r,520.omnetpp_r,
# 523.xalancbmk_r,525.x264_r,531.deepsjeng_r,541.leela_r,548.exchange2_r,
# 557.xz_r
# fprate benchmarks: 503.bwaves_r,507.cactuBSSN_r,519.lbm_r,521.wrf_r,
# 527.cam4_r,538.imagick_r,544.nab_r,549.fotonik3d_r,554.roms_r
#
# Selected copy counts from 'milan24' section of CPU info
################################################################################
# default copy counts:
default:
copies = 48
# Bind commands for assigning affinity:
bind0 = numactl --localalloc --physcpubind=0
bind1 = numactl --localalloc --physcpubind=1
bind2 = numactl --localalloc --physcpubind=2
bind3 = numactl --localalloc --physcpubind=3
bind4 = numactl --localalloc --physcpubind=4
bind5 = numactl --localalloc --physcpubind=5
bind6 = numactl --localalloc --physcpubind=6
bind7 = numactl --localalloc --physcpubind=7
bind8 = numactl --localalloc --physcpubind=8
bind9 = numactl --localalloc --physcpubind=9
bind10 = numactl --localalloc --physcpubind=10
bind11 = numactl --localalloc --physcpubind=11
bind12 = numactl --localalloc --physcpubind=12
bind13 = numactl --localalloc --physcpubind=13
bind14 = numactl --localalloc --physcpubind=14
bind15 = numactl --localalloc --physcpubind=15
bind16 = numactl --localalloc --physcpubind=16
bind17 = numactl --localalloc --physcpubind=17
bind18 = numactl --localalloc --physcpubind=18
bind19 = numactl --localalloc --physcpubind=19
bind20 = numactl --localalloc --physcpubind=20
bind21 = numactl --localalloc --physcpubind=21
bind22 = numactl --localalloc --physcpubind=22
bind23 = numactl --localalloc --physcpubind=23
bind24 = numactl --localalloc --physcpubind=24
bind25 = numactl --localalloc --physcpubind=25
bind26 = numactl --localalloc --physcpubind=26
bind27 = numactl --localalloc --physcpubind=27
bind28 = numactl --localalloc --physcpubind=28
bind29 = numactl --localalloc --physcpubind=29
bind30 = numactl --localalloc --physcpubind=30
bind31 = numactl --localalloc --physcpubind=31
bind32 = numactl --localalloc --physcpubind=32
bind33 = numactl --localalloc --physcpubind=33
bind34 = numactl --localalloc --physcpubind=34
bind35 = numactl --localalloc --physcpubind=35
bind36 = numactl --localalloc --physcpubind=36
bind37 = numactl --localalloc --physcpubind=37
bind38 = numactl --localalloc --physcpubind=38
bind39 = numactl --localalloc --physcpubind=39
bind40 = numactl --localalloc --physcpubind=40
bind41 = numactl --localalloc --physcpubind=41
bind42 = numactl --localalloc --physcpubind=42
bind43 = numactl --localalloc --physcpubind=43
bind44 = numactl --localalloc --physcpubind=44
bind45 = numactl --localalloc --physcpubind=45
bind46 = numactl --localalloc --physcpubind=46
bind47 = numactl --localalloc --physcpubind=47
submit = echo "$command" > run.sh ; $BIND bash run.sh
################################################################################
################################################################################
# peak copy counts: 24
510.parest_r,521.wrf_r,549.fotonik3d_r,554.roms_r=peak:
copies = 24
# Bind commands for assigning affinity:
bind0 = numactl --localalloc --physcpubind=0
bind1 = numactl --localalloc --physcpubind=1
bind2 = numactl --localalloc --physcpubind=2
bind3 = numactl --localalloc --physcpubind=3
bind4 = numactl --localalloc --physcpubind=4
bind5 = numactl --localalloc --physcpubind=5
bind6 = numactl --localalloc --physcpubind=6
bind7 = numactl --localalloc --physcpubind=7
bind8 = numactl --localalloc --physcpubind=8
bind9 = numactl --localalloc --physcpubind=9
bind10 = numactl --localalloc --physcpubind=10
bind11 = numactl --localalloc --physcpubind=11
bind12 = numactl --localalloc --physcpubind=12
bind13 = numactl --localalloc --physcpubind=13
bind14 = numactl --localalloc --physcpubind=14
bind15 = numactl --localalloc --physcpubind=15
bind16 = numactl --localalloc --physcpubind=16
bind17 = numactl --localalloc --physcpubind=17
bind18 = numactl --localalloc --physcpubind=18
bind19 = numactl --localalloc --physcpubind=19
bind20 = numactl --localalloc --physcpubind=20
bind21 = numactl --localalloc --physcpubind=21
bind22 = numactl --localalloc --physcpubind=22
bind23 = numactl --localalloc --physcpubind=23
submit = echo "$command" > run.sh ; $BIND bash run.sh
################################################################################
################################################################################
# peak copy counts: 12
503.bwaves_r=peak:
copies = 12
# Bind commands for assigning affinity:
bind0 = numactl --localalloc --physcpubind=0
bind1 = numactl --localalloc --physcpubind=2
bind2 = numactl --localalloc --physcpubind=4
bind3 = numactl --localalloc --physcpubind=6
bind4 = numactl --localalloc --physcpubind=8
bind5 = numactl --localalloc --physcpubind=10
bind6 = numactl --localalloc --physcpubind=12
bind7 = numactl --localalloc --physcpubind=14
bind8 = numactl --localalloc --physcpubind=16
bind9 = numactl --localalloc --physcpubind=18
bind10 = numactl --localalloc --physcpubind=20
bind11 = numactl --localalloc --physcpubind=22
submit = echo "$command" > run.sh ; $BIND bash run.sh
################################################################################
# Switch back to the default block after the include file:
default:
################################################################################
################################################################################
################################################################################
################################################################################
# The remainder of this file defines CPU2017 report parameters.
################################################################################
################################################################################
################################################################################
# SPEC CPU 2017 report header
################################################################################
license_num =55
tester =Dell Inc.
test_sponsor =Dell Inc.
hw_vendor =Dell Inc.
hw_model =PowerEdge C6525 (AMD EPYC 7443P 24-Core Processor)
#--------- If you install new compilers, edit this section --------------------
sw_compiler =C/C++/Fortran: Version 3.0.0 of AOCC
################################################################################
################################################################################
# Hardware, firmware and software information
################################################################################
hw_avail =Mar-2021
sw_avail =Mar-2021
hw_cpu_name =AMD EPYC milan24
hw_cpu_nominal_mhz =2850
hw_cpu_max_mhz =4000
hw_ncores =24
hw_nthreadspercore =2
hw_ncpuorder =1 chip
hw_other =None # Other perf-relevant hw, or "None"
fw_bios =Version 2.2.0 released Jan-2021
sw_base_ptrsize =64-bit
hw_pcache =32 KB I + 32 KB D on chip per core
hw_scache =512 KB I+D on chip per core
hw_tcache000 =128 MB I+D on chip per chip, 32 MB shared / 6
hw_tcache001 = cores
hw_ocache =None
################################################################################
# Notes
################################################################################
# Enter notes_000 through notes_100 here.
notes_000 =Binaries were compiled on a system with 2x AMD EPYC 7742 CPU + 512GiB Memory using OpenSUSE 15.2
notes_005 =
notes_010 =
notes_submit_000 ='numactl' was used to bind copies to the cores.
notes_submit_005 =See the configuration file for details.
notes_os_000 ='ulimit -s unlimited' was used to set environment stack size limit
notes_os_005 ='ulimit -l 2097152' was used to set environment locked pages in memory limit
notes_os_010 =
notes_os_015 =runcpu command invoked through numactl i.e.:
notes_os_020 =numactl --interleave=all runcpu
notes_os_025 =
notes_os_030 ='echo 8 > /proc/sys/vm/dirty_ratio' run as root to limit dirty cache to 8% of
notes_os_035 =memory.
notes_os_040 ='echo 1 > /proc/sys/vm/swappiness' run as root to limit swap usage to minimum
notes_os_045 =necessary.
notes_os_050 ='echo 1 > /proc/sys/vm/zone_reclaim_mode' run as root to free node-local memory
notes_os_055 =and avoid remote memory usage.
notes_os_060 ='sync; echo 3 > /proc/sys/vm/drop_caches' run as root to reset filesystem caches.
notes_os_065 ='sysctl -w kernel.randomize_va_space=0' run as root to disable address space layout
notes_os_070 =randomization (ASLR) to reduce run-to-run variability.
notes_os_075 =
notes_os_080 ='echo always > /sys/kernel/mm/transparent_hugepage/enabled' and
notes_os_085 ='echo always > /sys/kernel/mm/transparent_hugepage/defrag' run as root for peak
notes_os_090 =integer runs and all FP runs to enable Transparent Hugepages (THP).
notes_os_095 ='echo madvise > /sys/kernel/mm/transparent_hugepage/enabled' run as root for base
notes_os_100 =integer runs to enable THP only on request.
notes_comp_000 =The AMD64 AOCC Compiler Suite is available at
notes_comp_005 =http://developer.amd.com/amd-aocc/
notes_comp_010 =
notes_jemalloc_000 =
notes_jemalloc_005 =jemalloc: configured and built with GCC v4.8.2 in RHEL 7.4 (No options specified)
notes_jemalloc_010 =jemalloc 5.1.0 is available here:
notes_jemalloc_015 =https://github.com/jemalloc/jemalloc/releases/download/5.1.0/jemalloc-5.1.0.tar.bz2
notes_jemalloc_020 =
sw_other =jemalloc: jemalloc memory allocator library v5.1.0
################################################################################
# The following note fields describe platform settings.
################################################################################
# example: (uncomment as necessary)
# notes_plat_000 =BIOS settings:
# notes_plat_002 = cTDP: 280
# notes_plat_004 = Determinism Slider set to Power
# notes_plat_006 = Package Power: 280
# notes_plat_008 = EDC: 300
# notes_plat_010 = NPS: 1
# notes_plat_011 = ACPI SRAT L3 Cache as NUMA Domain: enabled
# notes_plat_012 = Memory interleaving: Disabled
# notes_plat_014 = 4-link xGMI max speed: 16Gbps
# notes_plat_015 = Fan Speed: Maximum
################################################################################
# The following are custom fields:
################################################################################
# Use custom_fields to enter lines that are not listed here. For example:
# notes_plat_100 = Energy Bias set to Max Performance
# new_field = Ambient temperature set to 10C
################################################################################
# The following fields must be set here for only Int benchmarks.
################################################################################
intrate:
sw_peak_ptrsize =32/64-bit
################################################################################
# The following fields must be set here for FP benchmarks.
################################################################################
fprate:
sw_peak_ptrsize =64-bit
################################################################################
# The following fields must be set here or they will be overwritten by sysinfo.
################################################################################
intrate,fprate:
hw_disk =252 GB on tmpfs
hw_memory =512 GB (8 x 64 GB 2Rx4 PC4-3200AA-R)
hw_nchips =1
prepared_by =Dell Inc.
sw_file =tmpfs
sw_os000 =Red Hat Enterprise Linux 8.3 (Ootpa)
sw_os001 =4.18.0-240.el8.x86_64
sw_state =Run level 3 (multi-user)
################################################################################
# End of inc file
################################################################################
# Switch back to the default block after the include file:
default:
# ---- End inclusion of '/dev/shm/cpu2017-1.1.5/config/amd_rate_aocc300_milan_A1.inc'
# Switch back to default block after the include file:
default:
fail_build = 1
%elif '%{allow_build}' eq 'true'
# If you intend to rebuild, be sure to set the library paths either in the
# build script or here:
preENV_LIBRARY_PATH = $[top]/%{lib_dir}/64;$[top]/%{lib_dir}/32:%{ENV_LIBRARY_PATH}
% define build_ncpus 64 # controls number of simultaneous compiles
fail_build = 0
makeflags = --jobs=%{build_ncpus} --load-average=%{build_ncpus}
%else
% error The value of "allow_build" is %{allow_build}, but it can only be "true" or "false". This error was generated
%endif
################################################################################
# Enable automated data collection per benchmark
################################################################################
# Data collection is not enabled for reportable runs.
# teeout is necessary to get data collection stdout into the logs. Best
# practices for the individual data collection items would be to have
# them store important output in separate files. Filenames could be
# constructed from $SPEC (environment), $lognum (result number from runcpu),
# and benchmark name/number.
teeout = yes
# Run runcpu with '-v 35' (or greater) to log lists of variables which can
# be used in substitutions as below.
# For CPU2006, change $label to $ext
%define data-collection-parameters benchname='$name' benchnum='$num' benchmark='$benchmark' iteration=$iter size='$size' tune='$tune' label='$label' log='$log' lognum='$lognum' from_runcpu='$from_runcpu'
%define data-collection-start $[top]/data-collection/data-collection start %{data-collection-parameters}
%define data-collection-stop $[top]/data-collection/data-collection stop %{data-collection-parameters}
monitor_specrun_wrapper = %{data-collection-start} ; $command ; %{data-collection-stop}
################################################################################
# Header settings
################################################################################
backup_config = 0 # set to 0 if you do not want backup files
bench_post_setup = sync
# command_add_redirect: If set, the generated ${command} will include
# redirection operators (stdout, stderr), which are passed along to the shell
# that executes the command. If this variable is not set, specinvoke does the
# redirection.
command_add_redirect = yes
env_vars = yes
flagsurl000 = http://www.spec.org/cpu2017/flags/aocc300-flags-A1.xml
flagsurl001 = http://www.spec.org/cpu2017/flags/Dell-Platform-Flags-PowerEdge-AMD-Milan-rev2.1.xml
#flagsurl02 = $[top]/%{platform_file_name}
# label: User defined extension string that tags your binaries & directories:
label = %{ext}
line_width = 1020
log_line_width = 1020
mean_anyway = yes
output_format = all
reportable = yes
size = test,train,ref
teeout = yes
teerunout = yes
tune = base,peak
################################################################################
# Compilers
################################################################################
default:
CC = clang -m64
CXX = clang++ -m64 -std=c++98
FC = flang -m64
CLD = clang -m64
CXXLD = clang++ -m64
FLD = flang -m64
CC_VERSION_OPTION = --version
CXX_VERSION_OPTION = --version
FC_VERSION_OPTION = --version
################################################################################
# Portability Flags
################################################################################
default:# data model applies to all benchmarks
################################################################################
# Default Flags
################################################################################
EXTRA_PORTABILITY = -DSPEC_LP64
EXTRA_LIBS = -ljemalloc -lamdlibm -lm
MATHLIBOPT = #clearing this variable or else SPEC will set it to -lm
VECMATHLIB = -fveclib=AMDLIBM
OPT_ROOT = -march=znver3 $(VECMATHLIB)
OPT_ROOT_BASE = -O3 -ffast-math $(OPT_ROOT)
OPT_ROOT_PEAK = -Ofast $(OPT_ROOT) -flto #Ofast enables -ffast-math
CROSSPLAT_PORT_OPTS = -mno-adx -mno-sse4a
################################################################################
# Portability Flags
################################################################################
default:
# *** Benchmark-specific portability ***
# Anything other than the data model is only allowed where a need is proven.
# (ordered by last 2 digits of benchmark number)
500.perlbench_r: #lang='C'
PORTABILITY = -DSPEC_LINUX_X64
521.wrf_r: #lang='F,C'
CPORTABILITY = -DSPEC_CASE_FLAG
FPORTABILITY = -Mbyteswapio
523.xalancbmk_r: #lang='CXX'
PORTABILITY = -DSPEC_LINUX
526.blender_r: #lang='CXX,C'
CPORTABILITY = -funsigned-char
CXXPORTABILITY = -D__BOOL_DEFINED
527.cam4_r: #lang='F,C'
PORTABILITY = -DSPEC_CASE_FLAG
################################################################################
# Tuning Flags
################################################################################
#####################
# Base tuning flags #
#####################
default=base:
COPTIMIZE = $(OPT_ROOT_BASE) -flto -fstruct-layout=5 \
    -mllvm -unroll-threshold=50 \
    -mllvm -inline-threshold=1000 -fremap-arrays -mllvm \
    -function-specialize -flv-function-specialization \
    -mllvm -enable-gvn-hoist -mllvm \
    -global-vectorize-slp=true -mllvm -enable-licm-vrp \
    -mllvm -reduce-array-computations=3 \
    -Wno-unused-command-line-argument
CXXOPTIMIZE = $(OPT_ROOT_BASE) -flto -mllvm -enable-partial-unswitch \
    -mllvm -unroll-threshold=100 \
    -finline-aggressive -flv-function-specialization \
    -mllvm -loop-unswitch-threshold=200000 \
    -mllvm -reroll-loops -mllvm -aggressive-loop-unswitch \
    -mllvm -extra-vectorizer-passes -mllvm \
    -reduce-array-computations=3 -mllvm \
    -global-vectorize-slp=true \
    -Wno-unused-command-line-argument \
    -mllvm -convert-pow-exp-to-int=false
FOPTIMIZE = -Hz,1,0x1 $(OPT_ROOT_BASE) -Kieee -Mrecursive \
    -mllvm -fuse-tile-inner-loop -funroll-loops \
    -mllvm -extra-vectorizer-passes -mllvm \
    -lsr-in-nested-loop -mllvm -enable-licm-vrp -mllvm \
    -reduce-array-computations=3 -mllvm \
    -global-vectorize-slp=true \
    -Wno-unused-command-line-argument
LDCXXFLAGS = -Wl,-mllvm -Wl,-x86-use-vzeroupper=false
EXTRA_LDFLAGS = -flto -Wl,-mllvm -Wl,-region-vectorize \
    -Wl,-mllvm -Wl,-function-specialize \
    -Wl,-mllvm -Wl,-align-all-nofallthru-blocks=6 \
    -Wl,-mllvm -Wl,-reduce-array-computations=3
LDFFLAGS = -Wl,-mllvm -Wl,-enable-X86-prefetching \
    -Wl,-mllvm -Wl,-enable-licm-vrp
EXTRA_LIBS = -lamdlibm -lm -ljemalloc -lflang -lflangrti
EXTRA_FLIBS =
# Don't put the AMD and mvec math libraries in MATH_LIBS because it will trigger a reporting issue
# because GCC won't use them. Forcefeed all benchmarks the math libraries in EXTRA_LIBS and clear
# out MATH_LIBS.
MATH_LIBS =
# The following is necessary for 502/602 gcc:
LDOPTIMIZE = -z muldefs
########################
# intrate tuning flags #
########################
intrate:
FOPTIMIZE = $(OPT_ROOT_BASE) -flto
EXTRA_FFLAGS = -mllvm -unroll-aggressive \
    -mllvm -unroll-threshold=500
EXTRA_CXXFLAGS = -mllvm -do-block-reorder=aggressive \
    -fvirtual-function-elimination -fvisibility=hidden
LDCFLAGS = -Wl,-allow-multiple-definition -Wl,-mllvm \
    -Wl,-enable-licm-vrp
LDCXXFLAGS = -Wl,-mllvm -Wl,-do-block-reorder=aggressive
LDFFLAGS = -Wl,-mllvm -Wl,-inline-recursion=4 \
    -Wl,-mllvm -Wl,-lsr-in-nested-loop \
    -Wl,-mllvm -Wl,-enable-iv-split
########################
# fprate tuning flags #
########################
fprate:
CXX = clang++ -m64 -std=c++98 $[CROSSPLAT_PORT_OPTS]
#####################
# Peak tuning flags #
#####################
default=peak:
COPTIMIZE = $(OPT_ROOT_PEAK) -fstruct-layout=7 \
    -mllvm -unroll-threshold=50 -fremap-arrays \
    -flv-function-specialization -mllvm \
    -inline-threshold=1000 -mllvm -enable-gvn-hoist \
    -mllvm -global-vectorize-slp=true -mllvm \
    -function-specialize -mllvm -enable-licm-vrp \
    -mllvm -reduce-array-computations=3 \
    -Wno-unused-command-line-argument
CXXOPTIMIZE = $(OPT_ROOT_PEAK) -finline-aggressive \
    -mllvm -unroll-threshold=100 \
    -flv-function-specialization -mllvm -enable-licm-vrp \
    -mllvm -reroll-loops -mllvm \
    -aggressive-loop-unswitch -mllvm \
    -reduce-array-computations=3 -mllvm \
    -global-vectorize-slp=true \
    -Wno-unused-command-line-argument
FOPTIMIZE = $(OPT_ROOT_PEAK) -Kieee -Mrecursive \
    -mllvm -reduce-array-computations=3 \
    -mllvm -global-vectorize-slp=true \
    -mllvm -enable-licm-vrp \
    -Wno-unused-command-line-argument
EXTRA_LDFLAGS = -flto -Wl,-mllvm -Wl,-function-specialize \
    -Wl,-mllvm -Wl,-align-all-nofallthru-blocks=6 \
    -Wl,-mllvm -Wl,-reduce-array-computations=3
LDFFLAGS = -Wl,-mllvm -Wl,-enable-X86-prefetching \
    -Wl,-mllvm -Wl,-enable-licm-vrp
LDCXXFLAGS = -Wl,-mllvm -Wl,-x86-use-vzeroupper=false \
    -Wl,-mllvm -Wl,-enable-licm-vrp
feedback = 0
PASS1_CFLAGS = -fprofile-instr-generate
PASS2_CFLAGS = -fprofile-instr-use
PASS1_FFLAGS = -fprofile-generate
PASS2_FFLAGS = -fprofile-use
PASS1_CXXFLAGS = -fprofile-instr-generate
PASS2_CXXFLAGS = -fprofile-instr-use
PASS1_LDFLAGS = -fprofile-instr-generate
PASS2_LDFLAGS = -fprofile-instr-use
fdo_run1 = $command ; llvm-profdata merge --output=default.profdata *.profraw
#libraries:
EXTRA_LIBS = -lamdlibm -lm -ljemalloc
EXTRA_FLIBS = -lflang -lflangrti
# Benchmark specific peak tuning flags:
500.perlbench_r=peak: #lang='C'
COPTIMIZE = $(OPT_ROOT_PEAK) -fstruct-layout=7 \
    -mllvm -unroll-threshold=50 -fremap-arrays \
    -flv-function-specialization -mllvm \
    -inline-threshold=1000 -mllvm -enable-gvn-hoist \
    -mllvm -global-vectorize-slp=false \
    -mllvm -function-specialize -mllvm -enable-licm-vrp \
    -mllvm -reduce-array-computations=3 \
    -Wno-unused-command-line-argument
feedback = 1
502.gcc_r=peak: #lang='C'
EXTRA_PORTABILITY = -D_FILE_OFFSET_BITS=64
EXTRA_COPTIMIZE = -fgnu89-inline
CC = clang -m32
CLD = clang -m32 -L/usr/lib
EXTRA_LIBS = -L$[JEMALLOC_LIB32_PATH] -ljemalloc
MATHLIBOPT = -lm
EXTRA_LDFLAGS = -flto -Wl,-mllvm -Wl,-function-specialize
507.cactuBSSN_r=peak:
CXXOPTIMIZE = $(OPT_ROOT_PEAK) \
    -mllvm -unroll-threshold=100 \
    -flv-function-specialization \
    -mllvm -loop-unswitch-threshold=200000 \
    -finline-aggressive -mllvm -reroll-loops \
    -mllvm -aggressive-loop-unswitch \
    -mllvm -reduce-array-computations=3 \
    -mllvm -extra-vectorizer-passes \
    -mllvm -convert-pow-exp-to-int=false
EXTRA_LIBS += $(EXTRA_FLIBS) #adding flang libs to cxx linker
510.parest_r=peak:
EXTRA_LDFLAGS = -flto -Wl,-mllvm -Wl,-suppress-fmas \
    -Wl,-mllvm -Wl,-function-specialize
523.xalancbmk_r=peak: #lang='CXX'
EXTRA_PORTABILITY = -DSPEC_LP64
CXX = clang++ -m32
CXXLD = clang++ -m32 -L/usr/lib
EXTRA_CXXFLAGS = -mllvm -do-block-reorder=aggressive \
    -fvirtual-function-elimination -fvisibility=hidden
LDCXXFLAGS = -Wl,-mllvm -Wl,-do-block-reorder=aggressive
EXTRA_LIBS = -L$[JEMALLOC_LIB32_PATH] -ljemalloc
ENV_MALLOC_CONF = thp:never
525.x264_r=peak: #lang='C'
basepeak=yes
527.cam4_r=peak:
FOPTIMIZE = $(OPT_ROOT_BASE) -ffast-math -funroll-loops \
    -mllvm -extra-vectorizer-passes \
    -mllvm -lsr-in-nested-loop -Mrecursive
EXTRA_LDFLAGS = -flto -Wl,-mllvm -Wl,-function-specialize \
    -Wl,-mllvm -Wl,-force-vector-interleave=1
544.nab_r=peak:
EXTRA_LDFLAGS = -flto -Wl,-mllvm -Wl,-region-vectorize \
    -Wl,-mllvm -Wl,-function-specialize
554.roms_r=peak:
EXTRA_FFLAGS = -Hz,1,0x1 -mllvm -fuse-tile-inner-loop
# The following settings were obtained by running the sysinfo_program
# 'specperl $[top]/bin/sysinfo' (sysinfo:SHA:60a26e139a7df7ba5521c983304469c762a79f3394ac112dddae4bac7d1a4f55)
default:
notes_plat_sysinfo_000 =
notes_plat_sysinfo_005 = Sysinfo program /dev/shm/cpu2017-1.1.5/bin/sysinfo
notes_plat_sysinfo_010 = Rev: r6538 of 2020-09-24 e8664e66d2d7080afeaa89d4b38e2f1c
notes_plat_sysinfo_015 = running on rhel-8-3-amd Wed Mar 17 14:45:04 2021
notes_plat_sysinfo_020 =
notes_plat_sysinfo_025 = SUT (System Under Test) info as seen by some common utilities.
notes_plat_sysinfo_030 = For more information on this section, see
notes_plat_sysinfo_035 = https://www.spec.org/cpu2017/Docs/config.html#sysinfo
notes_plat_sysinfo_040 =
notes_plat_sysinfo_045 = From /proc/cpuinfo
notes_plat_sysinfo_050 = model name : AMD EPYC 7443P 24-Core Processor
notes_plat_sysinfo_055 = 1 "physical id"s (chips)
notes_plat_sysinfo_060 = 48 "processors"
notes_plat_sysinfo_065 = cores, siblings (Caution: counting these is hw and system dependent. The following
notes_plat_sysinfo_070 = excerpts from /proc/cpuinfo might not be reliable. Use with caution.)
notes_plat_sysinfo_075 = cpu cores : 24
notes_plat_sysinfo_080 = siblings : 48
notes_plat_sysinfo_085 = physical 0: cores 0 1 2 3 4 5 8 9 10 11 12 13 16 17 18 19 20 21 24 25 26 27 28 29
notes_plat_sysinfo_090 =
notes_plat_sysinfo_095 = From lscpu:
notes_plat_sysinfo_100 = Architecture: x86_64
notes_plat_sysinfo_105 = CPU op-mode(s): 32-bit, 64-bit
notes_plat_sysinfo_110 = Byte Order: Little Endian
notes_plat_sysinfo_115 = CPU(s): 48
notes_plat_sysinfo_120 = On-line CPU(s) list: 0-47
notes_plat_sysinfo_125 = Thread(s) per core: 2
notes_plat_sysinfo_130 = Core(s) per socket: 24
notes_plat_sysinfo_135 = Socket(s): 1
notes_plat_sysinfo_140 = NUMA node(s): 4
notes_plat_sysinfo_145 = Vendor ID: AuthenticAMD
notes_plat_sysinfo_150 = CPU family: 25
notes_plat_sysinfo_155 = Model: 1
notes_plat_sysinfo_160 = Model name: AMD EPYC 7443P 24-Core Processor
notes_plat_sysinfo_165 = Stepping: 1
notes_plat_sysinfo_170 = CPU MHz: 2702.763
notes_plat_sysinfo_175 = BogoMIPS: 5689.59
notes_plat_sysinfo_180 = Virtualization: AMD-V
notes_plat_sysinfo_185 = L1d cache: 32K
notes_plat_sysinfo_190 = L1i cache: 32K
notes_plat_sysinfo_195 = L2 cache: 512K
notes_plat_sysinfo_200 = L3 cache: 32768K
notes_plat_sysinfo_205 = NUMA node0 CPU(s): 0-5,24-29
notes_plat_sysinfo_210 = NUMA node1 CPU(s): 6-11,30-35
notes_plat_sysinfo_215 = NUMA node2 CPU(s): 12-17,36-41
notes_plat_sysinfo_220 = NUMA node3 CPU(s): 18-23,42-47
notes_plat_sysinfo_225 = Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
notes_plat_sysinfo_230 = pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm
notes_plat_sysinfo_235 = constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq
notes_plat_sysinfo_240 = monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c
notes_plat_sysinfo_245 = rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch
notes_plat_sysinfo_250 = osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb
notes_plat_sysinfo_255 = cat_l3 cdp_l3 invpcid_single hw_pstate sme ssbd mba sev ibrs ibpb stibp vmmcall
notes_plat_sysinfo_260 = fsgsbase bmi1 avx2 smep bmi2 invpcid cqm rdt_a rdseed adx smap clflushopt clwb
notes_plat_sysinfo_265 = sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total
notes_plat_sysinfo_270 = cqm_mbm_local clzero irperf xsaveerptr wbnoinvd amd_ppin arat npt lbrv svm_lock
notes_plat_sysinfo_275 = nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
notes_plat_sysinfo_280 = v_vmsave_vmload vgif umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca
notes_plat_sysinfo_285 =
notes_plat_sysinfo_290 = /proc/cpuinfo cache data
notes_plat_sysinfo_295 = cache size : 512 KB
notes_plat_sysinfo_300 =
notes_plat_sysinfo_305 = From numactl --hardware WARNING: a numactl 'node' might or might not correspond to a
notes_plat_sysinfo_310 = physical chip.
notes_plat_sysinfo_315 = available: 4 nodes (0-3)
notes_plat_sysinfo_320 = node 0 cpus: 0 1 2 3 4 5 24 25 26 27 28 29
notes_plat_sysinfo_325 = node 0 size: 128441 MB
notes_plat_sysinfo_330 = node 0 free: 128338 MB
notes_plat_sysinfo_335 = node 1 cpus: 6 7 8 9 10 11 30 31 32 33 34 35
notes_plat_sysinfo_340 = node 1 size: 128901 MB
notes_plat_sysinfo_345 = node 1 free: 128780 MB
notes_plat_sysinfo_350 = node 2 cpus: 12 13 14 15 16 17 36 37 38 39 40 41
notes_plat_sysinfo_355 = node 2 size: 128931 MB
notes_plat_sysinfo_360 = node 2 free: 122789 MB
notes_plat_sysinfo_365 = node 3 cpus: 18 19 20 21 22 23 42 43 44 45 46 47
notes_plat_sysinfo_370 = node 3 size: 128886 MB
notes_plat_sysinfo_375 = node 3 free: 128802 MB
notes_plat_sysinfo_380 = node distances:
notes_plat_sysinfo_385 = node 0 1 2 3
notes_plat_sysinfo_390 = 0: 10 12 12 12
notes_plat_sysinfo_395 = 1: 12 10 12 12
notes_plat_sysinfo_400 = 2: 12 12 10 12
notes_plat_sysinfo_405 = 3: 12 12 12 10
notes_plat_sysinfo_410 =
notes_plat_sysinfo_415 = From /proc/meminfo
notes_plat_sysinfo_420 = MemTotal: 527967864 kB
notes_plat_sysinfo_425 = HugePages_Total: 0
notes_plat_sysinfo_430 = Hugepagesize: 2048 kB
notes_plat_sysinfo_435 =
notes_plat_sysinfo_440 = /sbin/tuned-adm active
notes_plat_sysinfo_445 = Current active profile: throughput-performance
notes_plat_sysinfo_450 =
notes_plat_sysinfo_455 = From /etc/*release* /etc/*version*
notes_plat_sysinfo_460 = os-release:
notes_plat_sysinfo_465 = NAME="Red Hat Enterprise Linux"
notes_plat_sysinfo_470 = VERSION="8.3 (Ootpa)"
notes_plat_sysinfo_475 = ID="rhel"
notes_plat_sysinfo_480 = ID_LIKE="fedora"
notes_plat_sysinfo_485 = VERSION_ID="8.3"
notes_plat_sysinfo_490 = PLATFORM_ID="platform:el8"
notes_plat_sysinfo_495 = PRETTY_NAME="Red Hat Enterprise Linux 8.3 (Ootpa)"
notes_plat_sysinfo_500 = ANSI_COLOR="0;31"
notes_plat_sysinfo_505 = redhat-release: Red Hat Enterprise Linux release 8.3 (Ootpa)
notes_plat_sysinfo_510 = system-release: Red Hat Enterprise Linux release 8.3 (Ootpa)
notes_plat_sysinfo_515 = system-release-cpe: cpe:/o:redhat:enterprise_linux:8.3:ga
notes_plat_sysinfo_520 =
notes_plat_sysinfo_525 = uname -a:
notes_plat_sysinfo_530 = Linux rhel-8-3-amd 4.18.0-240.el8.x86_64 #1 SMP Wed Sep 23 05:13:10 EDT 2020 x86_64
notes_plat_sysinfo_535 = x86_64 x86_64 GNU/Linux
notes_plat_sysinfo_540 =
notes_plat_sysinfo_545 = Kernel self-reported vulnerability status:
notes_plat_sysinfo_550 =
notes_plat_sysinfo_555 = CVE-2018-12207 (iTLB Multihit): Not affected
notes_plat_sysinfo_560 = CVE-2018-3620 (L1 Terminal Fault): Not affected
notes_plat_sysinfo_565 = Microarchitectural Data Sampling: Not affected
notes_plat_sysinfo_570 = CVE-2017-5754 (Meltdown): Not affected
notes_plat_sysinfo_575 = CVE-2018-3639 (Speculative Store Bypass): Mitigation: Speculative Store
notes_plat_sysinfo_580 = Bypass disabled via prctl and
notes_plat_sysinfo_585 = seccomp
notes_plat_sysinfo_590 = CVE-2017-5753 (Spectre variant 1): Mitigation: usercopy/swapgs
notes_plat_sysinfo_595 = barriers and __user pointer
notes_plat_sysinfo_600 = sanitization
notes_plat_sysinfo_605 = CVE-2017-5715 (Spectre variant 2): Mitigation: Full AMD retpoline,
notes_plat_sysinfo_610 = IBPB: conditional, IBRS_FW, STIBP:
notes_plat_sysinfo_615 = always-on, RSB filling
notes_plat_sysinfo_620 = CVE-2020-0543 (Special Register Buffer Data Sampling): Not affected
notes_plat_sysinfo_625 = CVE-2019-11135 (TSX Asynchronous Abort): Not affected
notes_plat_sysinfo_630 =
notes_plat_sysinfo_635 = run-level 3 Nov 26 09:22
notes_plat_sysinfo_640 =
notes_plat_sysinfo_645 = SPEC is set to: /dev/shm/cpu2017-1.1.5
notes_plat_sysinfo_650 = Filesystem Type Size Used Avail Use% Mounted on
notes_plat_sysinfo_655 = tmpfs tmpfs 252G 5.7G 247G 3% /dev/shm
notes_plat_sysinfo_660 =
notes_plat_sysinfo_665 = From /sys/devices/virtual/dmi/id
notes_plat_sysinfo_670 = Vendor: Dell Inc.
notes_plat_sysinfo_675 = Product: PowerEdge C6525
notes_plat_sysinfo_680 = Product Family: PowerEdge
notes_plat_sysinfo_685 =
notes_plat_sysinfo_690 = Additional information from dmidecode follows. WARNING: Use caution when you interpret
notes_plat_sysinfo_695 = this section. The 'dmidecode' program reads system data which is "intended to allow
notes_plat_sysinfo_700 = hardware to be accurately determined", but the intent may not be met, as there are
notes_plat_sysinfo_705 = frequent changes to hardware, firmware, and the "DMTF SMBIOS" standard.
notes_plat_sysinfo_710 = Memory:
notes_plat_sysinfo_715 = 8x 80AD863280AD HMAA8GR7AJR4N-XN 64 GB 2 rank 3200
notes_plat_sysinfo_720 = 8x Not Specified Not Specified
notes_plat_sysinfo_725 =
notes_plat_sysinfo_730 = BIOS:
notes_plat_sysinfo_735 = BIOS Vendor: Dell Inc.
notes_plat_sysinfo_740 = BIOS Version: 2.2.0
notes_plat_sysinfo_745 = BIOS Date: 01/21/2021
notes_plat_sysinfo_750 = BIOS Revision: 2.2
notes_plat_sysinfo_755 =
notes_plat_sysinfo_760 = (End of data from sysinfo program)
hw_cpu_name = AMD EPYC 7443P
hw_disk = 252 GB add more disk info here
hw_memory001 = 503.509 GB fixme: If using DDR4, the format is:
hw_memory002 = 'N GB (N x N GB nRxn PC4-nnnnX-X)'
hw_nchips = 1
prepared_by = root (is never output, only tags rawfile)
sw_file = tmpfs
sw_os001 = Red Hat Enterprise Linux release 8.3 (Ootpa)
sw_state = Run level 3 (add definition here)
# End of settings added by sysinfo_program

557.xz_r:
# The following setting was inserted automatically as a result of
# post-run basepeak application.
basepeak = 1

541.leela_r:
# The following setting was inserted automatically as a result of
# post-run basepeak application.
basepeak = 1

505.mcf_r:
# The following setting was inserted automatically as a result of
# post-run basepeak application.
basepeak = 1

# The following section was added automatically, and contains settings that
# did not appear in the original configuration file, but were added to the
# raw file after the run.
default:
power_management000 = BIOS and OS set to prefer performance
power_management001 = at the cost of additional power usage.
notes_plat_000 = BIOS settings:
notes_plat_005 = NUMA Nodes per Socket : 4
notes_plat_010 = L3 Cache as NUMA Domain : Enabled
notes_plat_015 = Virtualization Technology : Disabled
notes_plat_020 = DRAM Refresh Delay : Performance
notes_plat_025 = System Profile : Custom
notes_plat_030 = CPU Power Management : Maximum Performance
notes_plat_035 = Memory Patrol Scrub : Disabled
notes_plat_040 = PCI ASPM L1 Link
notes_plat_045 = Power Management : Disabled
notes_tmpfs_000 =
notes_tmpfs_005 = Benchmark run from a 252 GB ramdisk created with the cmd: "mount -t tmpfs -o size=252G tmpfs /mnt/ramdisk"
notes_mitig_000 =
notes_mitig_005 = NA: The test sponsor attests, as of date of publication, that CVE-2017-5754 (Meltdown)
notes_mitig_010 = is mitigated in the system as tested and documented.
notes_mitig_015 = Yes: The test sponsor attests, as of date of publication, that CVE-2017-5753 (Spectre variant 1)
notes_mitig_020 = is mitigated in the system as tested and documented.
notes_mitig_025 = Yes: The test sponsor attests, as of date of publication, that CVE-2017-5715 (Spectre variant 2)
notes_mitig_030 = is mitigated in the system as tested and documented.