Compilers: | Sun Studio 12; GCC for SPARC Systems V4.2.0 (gccfss). Note: these compilers are described together because gccfss uses the same optimizing code generator as Studio 12. |
---|---|
Operating systems: | Solaris 10 |
Copyright: | The text for many of the descriptions below was excerpted from the Sun Studio Compiler Documentation, which is copyright © 2005 Sun Microsystems, Inc. The original documentation can be found at docs.sun.com. Some material below is quoted from the gccfss website, http://cooltools.sunsource.net/gcc/. Additional information about GCC options may be found at the GNU C documentation website. |
Last updated: | 13-Mar-2008 jh |
Invoke the Sun Studio C Compiler.
Invoke the Sun Studio C++ Compiler
Invoke the Sun Studio Fortran 90 Compiler
Includes symbols in the executable. If the optimization level is -xO3 or lower, some optimizations may be disabled when -g is present. At -xO4 or higher, full optimization is performed, even when -g is present.
A convenience option, this switch selects several other options that are described in this file.
Perform optimizations across all object files in the link step:
At -xipo=2, the compiler performs inter-procedural aliasing analysis as well as optimization of memory allocation and layout to improve cache performance.
Set the preferred page size for running the program.
Control the level of searching that the compiler does for prefetch opportunities by setting n to 1, 2, or 3, where higher numbers mean to do more searching. The default for Sun Studio C and Sun Studio C++ is 1. The default for Sun Studio Fortran and for gccfss is 2.
Allows the compiler to perform type-based alias analysis at the specified alias level:
Control the level of searching that the compiler does for prefetch opportunities by setting n to 1, 2, or 3, where higher numbers mean to do more searching. The default for Sun Studio C and Sun Studio C++ is 1. The default for Sun Studio Fortran and for gccfss is 2.
Generate indirect prefetches for data arrays accessed indirectly.
Links in a linker mapfile that enables the creation of a 'bss' segment, and aligns the segment at 4Mb. This effectively provides an appropriate alignment for large page mapping of the heap.
Includes symbols in the executable. If the optimization level is -xO3 or lower, some optimizations may be disabled when -g0 is present. At -xO4 or higher, full optimization is performed, even when -g0 is present.
Use STLport's Standard Library implementation instead of the default libCstd.
A convenience option, this switch selects several other options that are described in this file.
Perform optimizations across all object files in the link step:
At -xipo=2, the compiler performs inter-procedural aliasing analysis as well as optimization of memory allocation and layout to improve cache performance.
Set the preferred page size for running the program.
Control the level of searching that the compiler does for prefetch opportunities by setting n to 1, 2, or 3, where higher numbers mean to do more searching. The default for Sun Studio C and Sun Studio C++ is 1. The default for Sun Studio Fortran and for gccfss is 2.
Analyze loops for inter-iteration data dependencies, and do loop restructuring.
Allows the compiler to perform type-based alias analysis:
Links in a linker mapfile that enables the creation of a 'bss' segment, and aligns the segment at 4Mb. This effectively provides an appropriate alignment for large page mapping of the heap.
Includes symbols in the executable. If the optimization level is -xO3 or lower, some optimizations may be disabled when -g is present. At -xO4 or higher, full optimization is performed, even when -g is present.
A convenience option, this switch selects the following switches that are described in this file:
Perform optimizations across all object files in the link step:
At -xipo=2, the compiler performs inter-procedural aliasing analysis as well as optimization of memory allocation and layout to improve cache performance.
Set the preferred page size for running the program.
Control the level of searching that the compiler does for prefetch opportunities by setting n to 1, 2, or 3, where higher numbers mean to do more searching. The default for Sun Studio C and Sun Studio C++ is 1. The default for Sun Studio Fortran and for gccfss is 2.
Links in a linker mapfile that enables the creation of a 'bss' segment, and aligns the segment at 4Mb. This effectively provides an appropriate alignment for large page mapping of the heap.
Includes symbols in the executable. If the optimization level is -xO3 or lower, some optimizations may be disabled when -g is present. At -xO4 or higher, full optimization is performed, even when -g is present.
A convenience option, this switch selects several other options that are described in this file.
A convenience option, this switch selects the following switches that are described in this file:
Perform optimizations across all object files in the link step:
At -xipo=2, the compiler performs inter-procedural aliasing analysis as well as optimization of memory allocation and layout to improve cache performance.
Set the preferred page size for running the program.
Control the level of searching that the compiler does for prefetch opportunities by setting n to 1, 2, or 3, where higher numbers mean to do more searching. The default for Sun Studio C and Sun Studio C++ is 1. The default for Sun Studio Fortran and for gccfss is 2.
Allows the compiler to perform type-based alias analysis at the specified alias level:
Control the level of searching that the compiler does for prefetch opportunities by setting n to 1, 2, or 3, where higher numbers mean to do more searching. The default for Sun Studio C and Sun Studio C++ is 1. The default for Sun Studio Fortran and for gccfss is 2.
Generate indirect prefetches for data arrays accessed indirectly.
Links in a linker mapfile that enables the creation of a 'bss' segment, and aligns the segment at 4Mb. This effectively provides an appropriate alignment for large page mapping of the heap.
Specify the -xjobs option to set how many processes the compiler creates to complete its work. Currently, -xjobs works only with the -xipo option. When you specify -xjobs=n, the interprocedural optimizer uses n as the maximum number of code generator instances it can invoke to compile different files.
Directs the compiler to print the name and version ID of each component as the compiler executes.
Turns on verbose mode, showing how command options expand. Shows each component as it is invoked.
Specify the -xjobs option to set how many processes the compiler creates to complete its work. Currently, -xjobs works only with the -xipo option. When you specify -xjobs=n, the interprocedural optimizer uses n as the maximum number of code generator instances it can invoke to compile different files.
Controls compiler verbosity. There are several values that can be used with this flag:
The default is -verbose=%none.
Specify the -xjobs option to set how many processes the compiler creates to complete its work. Currently, -xjobs works only with the -xipo option. When you specify -xjobs=n, the interprocedural optimizer uses n as the maximum number of code generator instances it can invoke to compile different files.
Directs the compiler to print the name and version ID of each component as the compiler executes.
This flag will cause the Sun Studio Fortran compiler to emit verbose messages.
Specify the -xjobs option to set how many processes the compiler creates to complete its work. Currently, -xjobs works only with the -xipo option. When you specify -xjobs=n, the interprocedural optimizer uses n as the maximum number of code generator instances it can invoke to compile different files.
Directs the compiler to print the name and version ID of each component as the compiler executes.
Turns on verbose mode, showing how command options expand. Shows each component as it is invoked.
This flag will cause the Sun Studio Fortran compiler to emit verbose messages.
This section contains descriptions of flags that were included implicitly by other flags, but which do not have a permanent home at SPEC.
Allows the compiler to assume that your code does not rely on setting of the errno variable.
Assume data is naturally aligned.
Selects faster (but nonstandard) handling of floating point arithmetic exceptions and gradual underflow.
Controls simplifying assumptions for floating point arithmetic:
Evaluate float expressions as single precision.
Turns off all IEEE 754 trapping modes.
Allows the compiler to perform type-based alias analysis at the specified alias level:
Substitute intrinsic functions or inline system functions where profitable for performance.
Analyze loops for inter-iteration data dependencies, and do loop restructuring.
Use inline expansion for math library, libm.
Specify optimization level n:
Allow generation of prefetch instructions. -xprefetch=yes and -xprefetch are synonyms for -xprefetch=auto,explicit. -xprefetch=no is a synonym for -xprefetch=no%auto,no%explicit. (Explicit prefetch macros are not used in the source code of the SPEC CPU2006 benchmarks; therefore, in the context of CPU2006, -xprefetch=yes is effectively a synonym for -xprefetch=auto.)
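The synonym rules above can be restated as a small shell sketch. The function name `normalize_xprefetch` is hypothetical and is not part of any compiler or SPEC tool; it only rewrites the two shorthand spellings into the expanded forms given in the description and passes anything else through unchanged.

```shell
#!/bin/sh
# Hypothetical helper: expand the -xprefetch shorthands described above.
normalize_xprefetch() {
    case "$1" in
        -xprefetch|-xprefetch=yes) echo "-xprefetch=auto,explicit" ;;
        -xprefetch=no)             echo "-xprefetch=no%auto,no%explicit" ;;
        *)                         echo "$1" ;;   # already explicit, or another flag
    esac
}

normalize_xprefetch -xprefetch=yes
normalize_xprefetch -xprefetch=no
```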
Selects options for architecture, chip timing, and cache sizes. These can also be controlled separately, via -xarch, -xchip, and -xcache, respectively. A wide variety of targets can be selected, including ultra3, ultra3cu, ultra3i, ultra3iplus, ultra4, ultra4plus, ultraT1, ultraT2, sparc64vi. In each case, appropriate options are selected for architecture, chip timing, and cache size to match that target.
If -xtarget=native is selected, the compiler chooses options that are appropriate for the system where the compile is being done.
The default is -xtarget=generic, which sets the parameters for the best performance over most 32-bit platform architectures.
On Solaris SPARC systems, the default pointer size with -xtarget=native is 32-bit.
Specifies which instructions can be used. Among the choices are:
xcache defines the cache properties for use by the optimizer. It can specify use of default assumptions ("generic"); use of whatever the compiler can assume about the current platform ("native"); or an explicit description of up to three levels of cache, using colon-separated specifiers of the form si/li/ai, where:
xchip determines timing properties that are assumed by the compiler. It does not limit which instructions are allowed (see xtarget for that). Among the choices are:
Select the optimized math library.
Sets the IEEE 754 trapping mode to common exceptions (invalid, division by zero, and overflow).
Pad local variables, for better use of cache.
Allow the compiler to transform math library calls within loops into calls to the vector math library. Specifying -xvector is equivalent to -xvector=yes.
These platform notes are in two sections: a generic section about system tuning and a section which is specific to the Sun Blade 6000 test of March, 2008.
One or more of the following settings may have been applied to the testbed. If so, the "Platform Notes" section of the report will say so; and you can read below to find out more about what these settings mean.
autoup=<n> (Unix /etc/system)
When the file system flush daemon fsflush runs, it writes to disk all modified file buffers that are more
than n seconds old.
bufhwm=<n> (Unix /etc/system)
Sets the upper limit of the file system buffer cache. The units for bufhwm are in kilobytes. Alternatively, the limit
can be expressed as a percent of total memory, by setting bufhwm_pct.
cpu_bringup_set=<n> (Unix /etc/system)
Specifies which processors are enabled at boot time. <n> represents a bitmap of the
processors to be brought online.
disablecomponent (System Management Services)
This command can be used prior to booting the system for a 1-cpu test. The tester uses disablecomponent to
add all other CPUs to the "blacklist", which is a list of components that cannot be used at boot time.
LD_LIBRARY_PATH=<directories> (linker)
LD_LIBRARY_PATH controls the search order for both the compile-time and run-time linkers. Usually, it can be
left at its default; but testers may sometimes choose to set it explicitly (as documented in the notes in the submission), in order to
ensure that the correct versions of libraries are picked up.
LD_PRELOAD=<shared object> (Unix environment variable)
Adds the named shared object to the runtime environment.
MADV=access_lwp and LD_PRELOAD=madv.so.1 (Unix environment variables)
When the madv.so.1 shared object is present in the LD_PRELOAD list, it is possible to provide advice to the system
about how memory is likely to be accessed. The advice present in MADV applies to all processes and their descendants. A
commonly used value is access_lwp, which means that when memory is allocated, the next process to touch it will be
the primary user. Examples of other possible values include sequential, for memory that is used only once and
then no longer needed, and access_many, for when many processes will be sharing data.
MPSSHEAP=<size>, MPSSSTACK=<size>, and
LD_PRELOAD=mpss.so.1 (Unix environment variables)
When these variables are set, the mpss.so.1 shared object will set the preferred page size for new processes, and their
descendants, to the requested sizes for the heap and stack.
PARALLEL=<n> (Unix environment variable)
If programs have been compiled with -xautopar, this environment variable can be set to the number of
processors that programs should use.
segmap_percent=<n> (Unix /etc/system)
This value controls the size of the segmap cache as a percent of total memory. Set this value to help keep the file
system cache from consuming memory unnecessarily.
STACKSIZE=<n> (Unix environment variable)
Set the size of the stack (temporary storage area) for each slave thread of a multithreaded program.
svcadm disable webconsole (Unix, superuser commands)
Turns off the Sun Web Console, a browser-based interface that performs systems management.
If it is enabled, system administrators can manage systems, devices and services from remote systems.
tsb_rss_factor=<1> (Unix /etc/system)
Suggests that the size of the TSB (Translation Storage Buffer) may be increased if it is more than 25% (128/512)
full. Doing so may reduce TSB traps, at the cost of additional kernel memory.
tune_t_fsflushr=<n> (Unix /etc/system)
Controls the number of seconds between runs of the file system flush daemon, fsflush.
ulimit -s <n> (Unix shell)
Sets the stack size to n kbytes, or "unlimited" to allow the stack size to grow without limit.
Note that the "heap" and the "stack" share space; if your application allocates large amounts of memory on the heap,
then you may find that the stack limit should not be set to "unlimited". A commonly used setting for SPEC CPU2006 purposes
is a stack size of 128MB (131072K).
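As a rough illustration of the units involved, the sketch below converts the 128MB figure to the kbytes that ulimit -s expects and then tries to apply it. The variable names are illustrative only, and the limit is set inside a subshell so that the calling shell is left unchanged; on a given system the request may be refused if it exceeds the hard limit.

```shell
#!/bin/sh
# Convert a stack size in megabytes to the kbyte units used by `ulimit -s`.
stack_mb=128
stack_kb=$((stack_mb * 1024))
echo "stack limit: ${stack_kb}K"

# Apply the limit in a subshell; report instead of failing if it is refused.
(
    if ulimit -s "$stack_kb" 2>/dev/null; then
        echo "stack limit now: $(ulimit -s)"
    else
        echo "could not set stack limit to ${stack_kb}K" >&2
    fi
)
```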
For the testing of the Sun Blade 6000 System, jobs are submitted to processors using
submit = $[top]/config/blade-submit.pl $SPECCOPYNUM "$command"
In this line, the SPEC tools invoke a perl procedure which does arithmetic to derive processor numbers from the SPEC copy number. The procedure receives as input the copy number and the command that actually runs the benchmark, and produces as output a file that assigns the job to the correct location, and starts that file with ssh. Here is the complete text of the procedure:
    #!/bin/perl
    use strict;
    use Cwd;

    # Particular testbed used today:
    my @nodes = qw ( sys115 sys114 sys113 sys112 sys111
                     sys110 sys109 sys108 sys107 sys106 );

    # Processor description:
    # UltraSPARC T2 has 8 cores, each with 2 integer units, each with 4 threads
    my @cores     = qw ( 7 6 5 4 3 2 1 0);   # When assigning,
    my @int_units = qw ( 1 0 );              # ...fill resources from top
    my @threads   = qw ( 3 2 1 0);           # ...to bottom.
    my $threads_per_int_unit = $#threads + 1;
    my $threads_per_core = $threads_per_int_unit * ($#int_units + 1);
    # end of processor description section

    my $rundir = getcwd;
    my $copynum = shift @ARGV;
    my $node = $copynum % ($#nodes+1);
    my $copy_on_node = int($copynum / ($#nodes+1));
    my $core = $copy_on_node % ($#cores+1);
    my $copy_on_core = int($copy_on_node / ($#cores+1));
    my $int_unit = $copy_on_core % ($#int_units+1);
    my $copy_on_int_unit = int($copy_on_core / ($#int_units+1));
    my $processor_num = ($cores[$core] * $threads_per_core)
                      + ($int_units[$int_unit] * $threads_per_int_unit)
                      + $threads[$copy_on_int_unit];

    open DOBMK, "> dobmk" or die "Eh?";
    print DOBMK "cd $rundir\n";
    print DOBMK '/usr/ucb/echo -n "`hostname` " >> pbind.out' , "\n";
    print DOBMK "/usr/sbin/pbind -b $processor_num \$\$ >> pbind.out\n";
    print DOBMK 'sh -c "' . join(' ', @ARGV) . '"' . "\n";
    close DOBMK;
    system '/usr/bin/ssh', '-n', $nodes[$node], 'sh', "$rundir/dobmk";
The effect of the above procedure is to use ssh (Secure Shell) to submit jobs to the nodes listed at the top, binding a copy of the benchmark to a virtual processor on that node with pbind. (Note: the above arithmetic could have been accomplished more efficiently using more traditional 'awk' and 'expr' methods, but the tester felt that a slight loss of efficiency was balanced by the potential improvement in clarity from the above procedure.)
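To make the placement arithmetic easier to follow, here is a minimal shell sketch that mirrors only the index calculations of the Perl procedure; it does not write a dobmk file, call pbind, or use ssh. The function name `copy_to_cpu` is hypothetical. It assumes the same testbed shape as the procedure (10 nodes named sys115 down to sys106, 8 cores, 2 integer units per core, 4 threads per unit) and relies on the fact that every list in the procedure is in descending order, so list index i corresponds to the value (max - i).

```shell
#!/bin/sh
# Illustration only: print "<node> <processor number>" for a SPEC copy number,
# following the same arithmetic as the Perl procedure.
copy_to_cpu() {
    copynum=$1
    nodes=10; cores=8; int_units=2; threads=4

    node=$((copynum % nodes))               # copies are dealt round-robin to nodes
    copy_on_node=$((copynum / nodes))
    core_idx=$((copy_on_node % cores))      # then round-robin across cores
    copy_on_core=$((copy_on_node / cores))
    unit_idx=$((copy_on_core % int_units))  # then across integer units
    thread_idx=$((copy_on_core / int_units))

    # The Perl lists are descending, so index i maps to (max - i).
    core=$((cores - 1 - core_idx))
    unit=$((int_units - 1 - unit_idx))
    thread=$((threads - 1 - thread_idx))

    cpu=$((core * int_units * threads + unit * threads + thread))
    echo "sys$((115 - node)) $cpu"
}

copy_to_cpu 0     # first copy on the first node
copy_to_cpu 1     # second copy goes to the next node
copy_to_cpu 10    # second copy on the first node moves to the next core
copy_to_cpu 80    # ninth copy on the first node moves to the other integer unit
```

With these assumptions, copies 0 and 1 land on processor 63 of sys115 and sys114 respectively, copy 10 lands on processor 55 of sys115, and copy 80 lands on processor 59 of sys115. Copy numbers of 640 or more fall outside the described resources, just as they would index past the end of the thread list in the Perl procedure.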
Flag description origin markings:
For questions about the meanings of these flags, please contact the tester.
For other inquiries, please contact webmaster@spec.org
Copyright 2006-2014 Standard Performance Evaluation Corporation
Tested with SPEC CPU2006 v1.0.1.
Report generated on Tue Jul 22 16:47:43 2014 by SPEC CPU2006 flags formatter v6906.