XL Fortran for AIX 8.1

User's Guide

Running XL Fortran Programs

The default file name for the executable program is a.out. You can select a different name with the -o compiler option. You should avoid giving your programs the same names as system or shell commands (such as test or cp), as you could accidentally execute the wrong command. If a name conflict does occur, you can execute the program by specifying a path name, such as ./test.

You can run a program by entering the path name and file name of an executable object file along with any run-time arguments on the command line.
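As a minimal sketch of the naming advice above, the following commands create a stand-in executable named test and run it by path name. The stand-in is a small shell script used in place of a compiled Fortran program, and the scratch directory created with mktemp is an assumption made only for illustration.

```shell
# Create a scratch directory (hypothetical location chosen by mktemp).
workdir=$(mktemp -d)

# Stand-in for a compiled program that was given the name "test".
printf '#!/bin/sh\necho "my program ran"\n' > "$workdir/test"
chmod +x "$workdir/test"

# Entering just "test" would run the shell's built-in test command,
# which prints nothing. Specifying a path name runs the intended file.
result=$(cd "$workdir" && ./test)
echo "$result"

# Remove the scratch directory again.
rm -rf "$workdir"
```

Run-time arguments, if any, follow the path name on the same command line.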

Canceling Execution

To suspend a running program, press the Ctrl+Z key while the program is in the foreground. Use the fg command to resume running.

To cancel a running program, press the Ctrl+C key while the program is in the foreground.

Running Previously Compiled Programs

Statically linked programs that you compiled with levels of XL Fortran prior to Version 8.1 should continue to run with no change in performance or behavior. They may not run on a system with a level of the operating system different from the system on which they were compiled.

If you have dynamically linked programs compiled by XL Fortran Version 2, 3, 4, 5, 6, or 7, you can run them on systems with the XL Fortran Version 8 libraries. The programs will use the current compiler data formats and I/O behavior, which are somewhat different from those of XL Fortran Version 2.

Compiling and Executing on Different Systems

If you want to move an XL Fortran executable file to a different system for execution, you can link statically and copy the program, and optionally the run-time message catalogs. Alternatively, you can link dynamically and copy the program as well as the XL Fortran libraries if needed and optionally the run-time message catalogs. For non-SMP programs, libxlf90.a is usually the only XL Fortran library needed. For SMP programs, you will usually need at least the libxlf90_r.a and libxlsmp.a libraries. libxlf.a is only needed if the program has any XL Fortran Version 1 or 2 object files linked in. libxlfpmt*.a and libxlfpad.a are only needed if the program is compiled with the -qautodbl option. If your application has dependencies on libhmd.a, refer to Using Debug Memory Routines for XL Fortran for more details on library dependencies.

For a dynamically linked program to work correctly, the XL Fortran libraries and operating system on the execution system must be at either the same level or a more recent level than on the compilation system.

For a statically linked program to work properly, the operating-system level may need to be the same on the execution system as it is on the compilation system.

Related information: See Dynamic and Static Linking.

POSIX Pthreads Binary Compatibility

The XL Fortran compiler and run-time library provide binary compatibility in the following areas:

Executable file binary compatibility. If you created an executable file that had dependencies on the pthreads Draft 7 API (for example, you used XL Fortran Version 5.1.0 or AIX Version 4.2.1), you can upgrade your system to use XL Fortran Version 8.1 or AIX Version 4.3.3 and run your executable file without first recompiling and relinking your program.
Object file or archive library binary compatibility. If you created an object file or archive library that had dependencies on the Draft 7 pthreads API, you can continue to use that object file or archive library with the Draft 7 interface if you move from AIX Version 4.2.1 to AIX Version 4.3.3. For example, if you have a source file called test.f that uses a shared or static archive library called libmy_utility.a (which was created with the Draft 7 interface), you would enter something similar to the following command on AIX Version 4.3.3:
```
xlf95_r7 test.f -lmy_utility -o a.out
```
You do not need to regenerate libmy_utility.a before using it on AIX Version 4.3.3.

There are, however, restrictions on binary compatibility. XL Fortran supports combinations of Draft 7 and 1003.1-1996 standard object files in some instances. For example, if you used XL Fortran Version 5.1.0 to create a library, that library uses the Draft 7 pthreads API. An application that you build with that library can use either the Draft 7 pthreads API or the 1003.1-1996 standard pthreads API as long as the portions of the complete application built with the Draft 7 pthreads API do not share any pthreads data objects (such as mutexes or condition variables) with the portions built with the 1003.1-1996 standard pthreads API. If any such objects need to be used across portions of an application that are compiled with different levels of the pthreads API, the final application needs to use either the Draft 7 pthreads API or the 1003.1-1996 standard pthreads API across the entire application. You can do this in one of two ways:

Build the application by using the xlf_r7, xlf90_r7, or xlf95_r7 command, so that it uses the Draft 7 pthreads API.
Build both the library and the rest of the application by using the xlf_r, xlf90_r, or xlf95_r command.

Run-Time Libraries and Include Directories for POSIX Pthreads Support

Three run-time libraries are connected with POSIX thread support. The libxlf90_r.a library is a multiprocessor-enabled version of the Fortran run-time library. The libxlsmp.a library is the SMP run-time library. The libxlfpthrds_compat.a library provides support for the Draft 7 pthreads API.

The following libraries are used:

/lib/libxlf90.a: Provides 1003.1-1996 standard 32-bit and 64-bit support. This library is linked to libxlf90_r.a.
/lib/libxlsmp.a: Provides 1003.1-1996 standard 32-bit and 64-bit support.
/lib/libxlfpthrds_compat.a: Provides Draft 7 32-bit support.

XL Fortran supplies the following directories for .mod files:

/usr/lpp/xlf/include_32_d7: Provides Draft 7 32-bit support.
/usr/lpp/xlf/include_32: Provides 1003.1-1996 standard 32-bit support.
/usr/lpp/xlf/include_64: Provides 1003.1-1996 standard 64-bit support.

Depending on the invocation command, and in some cases, the compiler option, the appropriate set of libraries and include files for thread support is bound in. For example:

Invocation Command	Libraries Used	Include Files Used	POSIX Pthreads API Level Supported
xlf90_r, xlf95_r	/lib/libxlf90.a, /lib/libxlsmp.a, /lib/libpthreads.a	/usr/lpp/xlf/include_32 (if you specify -q32); /usr/lpp/xlf/include_64 (if you specify -q64)	1003.1-1996 standard
xlf90_r7, xlf95_r7	/lib/libxlf90.a, /lib/libxlsmp.a, /lib/libxlfpthrds_compat.a, /lib/libpthreads.a	/usr/lpp/xlf/include_32_d7	Draft 7

Selecting the Language for Run-Time Messages

To select a language for run-time messages that are issued by an XL Fortran program, set the LANG and NLSPATH environment variables before executing the program.

In addition to setting environment variables, your program should call the C library routine setlocale to set the program's locale at run time. For example, the following program specifies the run-time message category to be set according to the LC_ALL, LC_MESSAGES, and LANG environment variables:

  PROGRAM MYPROG
  PARAMETER(LC_MESSAGES = 5)
  EXTERNAL SETLOCALE
  CHARACTER NULL_STRING /Z'00'/
  CALL SETLOCALE (%VAL(LC_MESSAGES), NULL_STRING)
  END

Related Information: See Environment Variables for National Language Support.

The C library routine setlocale is defined in the AIX Technical Reference: Base Operating System and Extensions Volume 1.

Setting Run-Time Options

Internal switches in an XL Fortran program control run-time behavior, similar to the way compiler options control compile-time behavior. You can set the run-time options through either environment variables or a procedure call within the program.

You can specify all XL Fortran run-time option settings by using one of two environment variables: XLFRTEOPTS and XLSMPOPTS.

The XLFRTEOPTS Environment Variable

The XLFRTEOPTS environment variable allows you to specify options that affect I/O, EOF error-handling, and the specification of random-number generators. You can declare XLFRTEOPTS by using the following ksh command format:

                       .-:------------------------------------------.
                       V                                            |
>>-XLFRTEOPTS=--+---+----runtime_option_name--=----option_setting---+--+---+-><
                '-"-'                                                  '-"-'

You can specify option names and settings in uppercase or lowercase. You can add blanks before and after the colons and equal signs to improve readability. However, if the XLFRTEOPTS option string contains imbedded blanks, you must enclose the entire option string in double quotation marks (").

The environment variable is checked when the program first encounters one of the following conditions:

An I/O statement is executed.
The RANDOM_SEED procedure is executed.
An ALLOCATE statement needs to issue a run-time error message.
A DEALLOCATE statement needs to issue a run-time error message.

Changing the XLFRTEOPTS environment variable during the execution of a program has no effect on the program.

The SETRTEOPTS procedure (which is defined in "Service and Utility Procedures" in the XL Fortran for AIX Language Reference) accepts a single-string argument that contains the same name-value pairs as the XLFRTEOPTS environment variable. It overrides the environment variable and can be used to change settings during the execution of a program. The new settings remain in effect for the rest of the program unless changed by another call to SETRTEOPTS. Only the settings that you specified in the procedure call are changed.

You can specify the following run-time options with the XLFRTEOPTS environment variable or the SETRTEOPTS procedure:

buffering={enable | disable_preconn | disable_all}

Determines whether the XL Fortran run-time library performs buffering for I/O operations.

The library reads data from or writes data to the file system in chunks for READ or WRITE statements, instead of piece by piece. The major benefit of buffering is performance improvement.

If you have applications in which Fortran routines work with routines in other languages or in which a Fortran process works with other processes on the same data file, the data written by Fortran routines may not be seen immediately by other parties (and vice versa), because of the buffering. Also, a Fortran READ statement may read more data than it needs into the I/O buffer and cause the input operation performed by a routine in other languages or another process that is supposed to read the next data item to fail. In these cases, you can use the buffering run-time option to disable the buffering in the XL Fortran run-time library. As a result, a READ statement will read in exactly the data it needs from a file and the data written by a WRITE statement will be flushed out to the file system at the completion of the statement.

Note: I/O buffering is always enabled for files on sequential access devices (such as pipes, terminals, sockets, and tape drives). The setting of the buffering option has no effect on these types of files.

If you disable I/O buffering for a logical unit, you do not need to call the Fortran service routine flush_ to flush the contents of the I/O buffer for that logical unit.

The suboptions for buffering are as follows:

enable: The Fortran run-time library maintains an I/O buffer for each connected logical unit. The current read-write file pointers that the run-time library maintains might not be synchronized with the read-write pointers of the corresponding files in the file system.
disable_preconn: The Fortran run-time library does not maintain an I/O buffer for each preconnected logical unit (0, 5, and 6). However, it does maintain I/O buffers for all other connected logical units. The current read-write file pointers that the run-time library maintains for the preconnected units are the same as the read-write pointers of the corresponding files in the file system.
disable_all: The Fortran run-time library does not maintain I/O buffers for any logical units. You should not specify the buffering=disable_all option with Fortran programs that perform asynchronous I/O.

In the following example, Fortran and C routines read a data file through redirected standard input. First, the main Fortran program reads one integer. Then, the C routine reads one integer. Finally, the main Fortran program reads another integer.

Fortran main program:

integer(4) p1,p2,p3
print *,'Reading p1 in Fortran...'
read(5,*) p1
call c_func(p2)
print *,'Reading p3 in Fortran...'
read(5,*) p3
print *,'p1 p2 p3 Read: ',p1,p2,p3
end

C subroutine (c_func.c):

#include <stdio.h>
void
c_func(int *p2)
{
    int n1 = -1;
 
    printf("Reading p2 in C...\n");
    setbuf(stdin, NULL);    /* Specifies no buffering for stdin */
    fscanf(stdin,"%d",&n1);
    *p2=n1;
}

Input data file (infile):

11111
22222
33333

The main program runs by using infile as redirected standard input, as follows:

$ main < infile

If you turn on buffering=disable_preconn, the results are as follows:

Reading p1 in Fortran...
Reading p2 in C...
Reading p3 in Fortran...
p1 p2 p3 Read:  11111 22222 33333

If you turn on buffering=enable, the results are unpredictable.
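The text above does not show how to turn the option on. As a sketch, assuming the executable and data file are named main and infile as in the example, you could set the option through XLFRTEOPTS before starting the program:

```shell
# Disable buffering for the preconnected units (0, 5, and 6) only.
XLFRTEOPTS=buffering=disable_preconn
export XLFRTEOPTS

# Run the mixed Fortran/C program only if it has been built here;
# "main" and "infile" are the names used in the example above.
if [ -x ./main ] && [ -f ./infile ]; then
    ./main < infile
fi
```

Setting the variable in the environment keeps the source unchanged. A call to SETRTEOPTS with the same string is the in-program alternative.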

cnverr={yes | no}

If you set this run-time option to no, the program does not obey the IOSTAT= and ERR= specifiers for I/O statements that encounter conversion errors. Instead, it performs default recovery actions (regardless of the setting of err_recovery) and may issue warning messages (depending on the setting of xrf_messages).

Related Information: For more information about conversion errors, see "Executing Data Transfer Statements" in the XL Fortran for AIX Language Reference. For more information about IOSTAT values, see "Conditions and IOSTAT Values" in the XL Fortran for AIX Language Reference.

erroreof={yes | no}

Determines whether the program branches to the label specified by the ERR= specifier when an end-of-file condition is encountered and no END= specifier is present.

err_recovery={yes | no}

If you set this run-time option to no, the program stops if there is a recoverable error while executing an I/O statement with no IOSTAT= or ERR= specifiers. By default, the program takes some recovery action and continues when one of these statements encounters a recoverable error. Setting cnverr to yes and err_recovery to no can cause conversion errors to halt the program.

langlvl={extended | 90ext | 90std | 95std }

Determines the level of support for Fortran standards and extensions to the standards. The values of the suboptions are as follows:

90std: Specifies that the compiler should flag any extensions to the Fortran 90 standard I/O statements and formats as errors.
95std: Specifies that the compiler should flag any extensions to the Fortran 95 standard I/O statements and formats as errors.
extended: Specifies that the compiler should accept all extensions to the Fortran 90 standard and Fortran 95 standard I/O statements and formats.
90ext: Currently, provides the same level of support as the extended suboption. 90ext was the default suboption prior to XL Fortran Version 7.1. However, this suboption is now obsolete, and to avoid problems in the future, you should start using the extended suboption as soon as possible.

To obtain support for items that are part of the Fortran 95 standard and are available in XL Fortran as of Version 7.1 (such as namelist comments), you must specify one of the following suboptions:

95std
extended

The following example contains a Fortran 95 extension (the file specifier is missing from the OPEN statement):

program test1
 
call setrteopts("langlvl=95std")
open(unit=1,access="sequential",form="formatted")
 
10 format(I3)
 
write(1,fmt=10) 123
 
end

Specifying langlvl=95std results in a run-time error message.

The following example contains a Fortran 95 feature (namelist comments) that was not part of Fortran 90:

program test2
 
INTEGER I
LOGICAL G
NAMELIST /TODAY/G, I
 
call setrteopts("langlvl=95std:namelist=new")
 
open(unit=2,file="today.new",form="formatted", &
    & access="sequential", status="old")
 
read(2,nml=today)
close(2)
 
end
 
today.new:
 
&TODAY  ! This is a comment
I = 123, G=.true. /

If you specify langlvl=95std, no run-time error message is issued. However, if you specify langlvl=90std, a run-time error message is issued.

The err_recovery setting determines whether any resulting errors are treated as recoverable or severe.

multconn={yes | no}

Enables you to access the same file through more than one logical unit simultaneously. With this option, you can read more than one location within a file simultaneously without making a copy of the file.

You can only use multiple connections within the same program for files on random-access devices, such as disk drives. In particular, you cannot use multiple connections within the same program for:

Asynchronous I/O
Files on sequential-access devices (such as pipes, terminals, sockets, and tape drives)

To avoid the possibility of damaging the file, keep the following points in mind:

The second and subsequent OPEN statements for the same file can only be for reading.
If you initially opened the file for writing, the unit connected to the file by the first OPEN becomes read-only (ACTION='READ') when the second unit is connected. You must close all of the units that are connected to the file and reopen the first unit to restore write access to it.
Two files are considered to be the same file if they share the same device and i-node numbers. Thus, linked files are considered to be the same file.

multconnio={tty | no }

Enables you to connect a TTY device to more than one logical unit. You can then write to or read from more than one logical unit that is attached to the same TTY device.

Note: Using this option can produce unpredictable results.

In your program, you can now specify multiple OPEN statements that contain different values for the UNIT parameters but the same value for the FILE parameters. For example, if you have a symbolic link called mytty that is linked to TTY device /dev/pts/2, you can run the following program when you specify the multconnio=tty option:

PROGRAM iotest
OPEN(UNIT=3, FILE='mytty', ACTION="WRITE")
OPEN(UNIT=7, FILE='mytty', ACTION="WRITE")
END PROGRAM iotest

Fortran preconnects units 0, 5, and 6 to the same TTY device. Normally, you cannot use the OPEN statement to explicitly connect additional units to the TTY device that is connected to units 0, 5, and 6. However, this is possible if you specify the multconnio=tty option. For example, if units 0, 5, and 6 are preconnected to TTY device /dev/pts/2, you can run the following program if you specify the multconnio=tty option:

PROGRAM iotest
OPEN(UNIT=3, FILE='/dev/pts/2')
END PROGRAM iotest

namelist={new | old}

Determines whether the program uses the XL Fortran (new) or XL Fortran Version 1 (old) NAMELIST format for input and output. The Fortran 90 and Fortran 95 standards require the new format.

Note: You may need the old setting to read existing data files that contain NAMELIST output.

With namelist=old, the nonstandard NAMELIST format is not considered an error by either the langlvl=95std or the langlvl=90std setting.

Related Information: For more information about NAMELIST I/O, see "Namelist Formatting" in the XL Fortran for AIX Language Reference.

nlwidth=record_width

By default, a NAMELIST write statement produces a single output record long enough to contain all of the written NAMELIST items. To restrict NAMELIST output records to a given width, use the nlwidth run-time option.

Note: The RECL= specifier for sequential files has largely made this option obsolete, because programs attempt to fit NAMELIST output within the specified record length. You can still use nlwidth in conjunction with RECL= as long as the nlwidth width does not exceed the stated record length for the file.

random={generator1 | generator2}

Specifies the generator to be used by RANDOM_NUMBER if RANDOM_SEED has not yet been called with the GENERATOR argument. The value generator1 (the default) corresponds to GENERATOR=1, and generator2 corresponds to GENERATOR=2. If you call RANDOM_SEED with the GENERATOR argument, it overrides the random option from that point onward in the program. Changing the random option by calling SETRTEOPTS after calling RANDOM_SEED with the GENERATOR option has no effect.

scratch_vars={yes | no}

To give a specific name to a scratch file, set the scratch_vars run-time option to yes, and set the environment variable XLFSCRATCH_unit to the name of the file you want to be associated with the specified unit number. See Naming Scratch Files for examples.

unit_vars={yes | no}

To give a specific name to an implicitly connected file or to a file opened with no FILE= specifier, you can set the run-time option unit_vars=yes and set one or more environment variables with names of the form XLFUNIT_unit to file names. See Naming Files That Are Connected with No Explicit Name for examples.
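A minimal sketch of how the scratch_vars and unit_vars options pair with their environment variables follows. The unit numbers (42 and 10) and the file names are hypothetical and chosen only for illustration.

```shell
# Turn on both naming options in one XLFRTEOPTS setting.
XLFRTEOPTS="unit_vars=yes:scratch_vars=yes"
export XLFRTEOPTS

# Hypothetical unit numbers and file names, for illustration only.
XLFUNIT_42=/tmp/results.dat     # name for unit 42 when no FILE= is given
XLFSCRATCH_10=/tmp/work.scr     # name for the scratch file on unit 10
export XLFUNIT_42 XLFSCRATCH_10
```

Because the run-time option and the per-unit variables are separate, setting XLFUNIT_42 or XLFSCRATCH_10 alone, without the matching option set to yes, should not be expected to rename anything.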

uwidth={32 | 64}

To specify the width of record length fields in unformatted sequential files, specify the value in bits. When the record length of an unformatted sequential file is greater than (2**31 - 1) bytes minus 8 bytes (for the record terminators surrounding the data), you need to set the run-time option uwidth=64 to extend the record length fields to 64 bits. This allows the record length to be up to (2**63 - 1) minus 16 bytes (for the record terminators surrounding the data). The run-time option uwidth is only valid for 64-bit mode applications.
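The 32-bit limit described above can be worked out with shell arithmetic. This sketch assumes a shell with 64-bit integer arithmetic, and the 3 GB record size is a hypothetical value.

```shell
# Largest record that fits when record length fields are 32 bits wide:
# (2**31 - 1) bytes minus 8 bytes for the surrounding record terminators.
limit32=$(( (1 << 31) - 1 - 8 ))
echo "32-bit limit: $limit32 bytes"

# Hypothetical record size: 3 GB of unformatted data in one record.
record_bytes=$(( 3 * 1024 * 1024 * 1024 ))

# Pick the setting that the record size calls for.
if [ "$record_bytes" -gt "$limit32" ]; then
    needed="uwidth=64"
else
    needed="uwidth=32"
fi
echo "$needed"
```

For this record size the comparison selects uwidth=64, which in turn requires a 64-bit mode application.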

xrf_messages={yes | no}

To prevent programs from displaying run-time messages for error conditions during I/O operations, RANDOM_SEED calls, and ALLOCATE or DEALLOCATE statements, set the xrf_messages run-time option to no. Otherwise, run-time messages for conversion errors and other problems are sent to the standard error stream.

The following examples set the cnverr run-time option to yes and the xrf_messages option to no.

# Basic format
  XLFRTEOPTS=cnverr=yes:xrf_messages=no
  export XLFRTEOPTS
 
# With imbedded blanks
  XLFRTEOPTS="xrf_messages = NO : cnverr = YES"
  export XLFRTEOPTS

As a call to SETRTEOPTS, this example could be:

  CALL setrteopts('xrf_messages=NO:cnverr=yes')
! Name is in lowercase in case -U (mixed) option is used.

The XLSMPOPTS Environment Variable

The XLSMPOPTS environment variable allows you to specify options that affect SMP execution.

You can declare XLSMPOPTS by using the following ksh command format:

                      .-:------------------------------------------.
                      V                                            |
>>-XLSMPOPTS=--+---+----runtime_option_name--=----option_setting---+--+---+-><
               '-"-'                                                  '-"-'

You can specify option names and settings in uppercase or lowercase. You can add blanks before and after the colons and equal signs to improve readability. However, if the XLSMPOPTS option string contains imbedded blanks, you must enclose the entire option string in double quotation marks (").

You can specify the following run-time options with the XLSMPOPTS environment variable:

schedule

Selects the scheduling type and chunk size to be used as the default at run time. The scheduling type that you specify will only be used for loops that were not already marked with a scheduling type at compilation time.

Work is assigned to threads in a different manner, depending on the scheduling type and chunk size used. A brief description of the scheduling types and their influence on how work is assigned follows:

dynamic or guided

The run-time library dynamically schedules parallel work for threads on a "first-come, first-do" basis. "Chunks" of the remaining work are assigned to available threads until all work has been assigned. Work is not assigned to threads that are asleep.

static

Chunks of work are assigned to the threads in a "round-robin" fashion. Work is assigned to all threads, both active and asleep. The system must activate sleeping threads in order for them to complete their assigned work.

affinity

The run-time library performs an initial division of the iterations into number_of_threads partitions. The number of iterations that these partitions contain is:

   CEILING(number_of_iterations / number_of_threads)

These partitions are then assigned to each of the threads. It is these partitions that are then subdivided into chunks of iterations. If a thread is asleep, the threads that are active will complete their assigned partition of work.
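As a worked example of the partition formula, the following sketch divides a hypothetical loop of 10 iterations among 4 threads. The contiguous layout and the shorter final partition are assumptions made for illustration; the text above specifies only the partition size.

```shell
iterations=10   # hypothetical loop trip count
threads=4       # hypothetical number of threads

# CEILING(number_of_iterations / number_of_threads) in integer arithmetic.
partition_size=$(( (iterations + threads - 1) / threads ))

layout=""
t=0
while [ "$t" -lt "$threads" ]; do
    first=$(( t * partition_size + 1 ))
    last=$(( (t + 1) * partition_size ))
    if [ "$last" -gt "$iterations" ]; then
        last=$iterations      # assumed: the final partition is shorter
    fi
    if [ "$first" -le "$iterations" ]; then
        layout="${layout:+$layout }$t:$first-$last"
    fi
    t=$(( t + 1 ))
done
echo "partition size: $partition_size"
echo "thread:iterations -> $layout"
```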

Choosing chunking granularity is a tradeoff between overhead and load balancing. The syntax for this option is schedule=suboption, where the suboptions are defined as follows:

affinity[=n]: As described previously, the iterations of a loop are initially divided into partitions, which are then preassigned to the threads. Each of these partitions is then further subdivided into chunks that contain n iterations. If you have not specified n, a chunk consists of CEILING(number_of_iterations_remaining_in_local_partition / 2) loop iterations. When a thread becomes available, it takes the next chunk from its preassigned partition. If there are no more chunks in that partition, the thread takes the next available chunk from a partition preassigned to another thread.
dynamic[=n]: The iterations of a loop are divided into chunks that contain n iterations each. If you have not specified n, a chunk consists of CEILING(number_of_iterations / number_of_threads) iterations.
guided[=n]: The iterations of a loop are divided into progressively smaller chunks until a minimum chunk size of n loop iterations is reached. If you have not specified n, the default value for n is 1 iteration. The first chunk contains CEILING(number_of_iterations / number_of_threads) iterations. Subsequent chunks consist of CEILING(number_of_iterations_remaining / number_of_threads) iterations.
static[=n]: The iterations of a loop are divided into chunks that contain n iterations. Threads are assigned chunks in a "round-robin" fashion. This is known as block cyclic scheduling. If the value of n is 1, the scheduling type is specifically referred to as cyclic scheduling. If you have not specified n, the chunks will contain CEILING(number_of_iterations / number_of_threads) iterations. Each thread is assigned one of these chunks. This is known as block scheduling.

If you have not specified schedule, the default is set to schedule=static, resulting in block scheduling.
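The chunk-size formulas for the suboptions can be checked with a little integer arithmetic. The following sketch uses a hypothetical loop of 100 iterations and 4 threads. It computes the default chunk size for dynamic and static, and then the sequence of chunks that guided would produce with the default minimum of 1.

```shell
iterations=100   # hypothetical loop trip count
threads=4        # hypothetical number of threads
min_chunk=1      # default value of n for guided

# CEILING(a / b) using integer arithmetic.
ceiling_div() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Default chunk size for dynamic and static when n is omitted.
default_chunk=$(ceiling_div "$iterations" "$threads")
echo "default chunk: $default_chunk"

# Guided: each chunk is CEILING(remaining / threads), never below min_chunk.
remaining=$iterations
guided_chunks=""
while [ "$remaining" -gt 0 ]; do
    chunk=$(ceiling_div "$remaining" "$threads")
    if [ "$chunk" -lt "$min_chunk" ]; then
        chunk=$min_chunk
    fi
    if [ "$chunk" -gt "$remaining" ]; then
        chunk=$remaining
    fi
    guided_chunks="${guided_chunks:+$guided_chunks }$chunk"
    remaining=$(( remaining - chunk ))
done
echo "guided chunks: $guided_chunks"
```

The guided sequence starts at 25 iterations and shrinks toward 1. Larger early chunks keep scheduling overhead low and smaller late chunks even out the load, which is the tradeoff described above.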

Related Information: For more information, see the description of the SCHEDULE directive in the XL Fortran for AIX Language Reference.

Parallel execution options

The three parallel execution options, parthds, usrthds, and stack, are as follows:

parthds=num: Specifies the number of threads (num) to be used for parallel execution of code that you compiled with the -qsmp option. The default value for num is the number of online processors on the machine if you specified -qsmp; otherwise, it is 1. Some applications cannot use more than some maximum number of processors, and some applications can achieve performance gains if they use more threads than there are processors. This option gives you full control over the number of execution threads. For more information, see the NUM_PARTHDS intrinsic function in the XL Fortran for AIX Language Reference.
usrthds=num: Specifies the maximum number of threads (num) that you expect your code will explicitly create if the code does explicit thread creation. The default value for num is 0. For more information, see the NUM_USRTHDS intrinsic function in the XL Fortran for AIX Language Reference.
stack=num: Specifies the largest amount of space in bytes (num) that a thread's stack will need. The default value for num is 4194304.

Performance tuning options

When a thread completes its work and there is no new work to do, it can go into either a "busy-wait" state or a "sleep" state. In "busy-wait", the thread keeps executing in a tight loop looking for additional new work. This state is highly responsive but harms the overall utilization of the system. When a thread sleeps, it completely suspends execution until another thread signals it that there is work to do. This state provides better utilization of the system but introduces extra overhead for the application.

The xlsmp run-time library routines use both "busy-wait" and "sleep" states in their approach to waiting for work. You can control these states with the spins, yields, and delays options.

During the busy-wait search for work, the thread repeatedly scans the work queue, up to the number of times that you specified for the spins option. If a thread cannot find work during a given scan, it intentionally wastes cycles in a delay loop that executes the number of times that you specified for the delays option. This delay loop consists of a single meaningless iteration, and the actual time it takes varies among processors. If the value of spins is exceeded and the thread still cannot find work, the thread yields the current time slice (the time allocated by the processor to that thread) to the other threads. The thread yields its time slice up to the number of times that you specified for the yields option. If this value is exceeded, the thread goes to sleep.

In summary, the ordered approach to looking for work consists of the following steps:

Scan the work queue for up to spins number of times. If no work is found in a scan, then loop delays number of times before starting a new scan.
If work has not been found, then yield the current time slice.
Repeat the above steps up to yields number of times.
If work has still not been found, then go to sleep.
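The ordered approach above can be sketched as a pair of nested loops. This is an illustration of the steps as summarized here, not the run-time library's actual code. The function work_available is a hypothetical stand-in that never finds work, the option values are deliberately small, the delay loop is only counted rather than executed, and the special zero value is not modeled.

```shell
spins=3     # scans per round (small value; the documented default is 100)
yields=2    # rounds before sleeping (the documented default is 10)
delays=4    # delay-loop count after a failed scan (the documented default is 500)

# Hypothetical stand-in for scanning the work queue; it never finds work.
work_available() { return 1; }

scans=0
delay_iterations=0
yield_count=0
state=sleep

while [ "$yield_count" -lt "$yields" ]; do
    scan=0
    found=no
    while [ "$scan" -lt "$spins" ]; do
        scans=$(( scans + 1 ))
        if work_available; then
            found=yes
            break
        fi
        # No work in this scan: count the delay iterations before rescanning.
        delay_iterations=$(( delay_iterations + delays ))
        scan=$(( scan + 1 ))
    done
    if [ "$found" = yes ]; then
        state=working
        break
    fi
    yield_count=$(( yield_count + 1 ))   # give up the current time slice
done
echo "scans=$scans delay_iterations=$delay_iterations yields=$yield_count state=$state"
```

Under this reading, a thread that never finds work performs spins times yields scans in total before it goes to sleep.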

The syntax for specifying these options is as follows:

spins[=num]: where num is the number of spins before a yield. The default value for spins is 100.
yields[=num]: where num is the number of yields before a sleep. The default value for yields is 10.
delays[=num]: where num is the number of delays while busy-waiting. The default value for delays is 500.

Zero is a special value for spins and yields, as it can be used to force complete busy-waiting. Normally, in a benchmark test on a dedicated system, you would set both options to zero. However, you can set them individually to achieve other effects.

For instance, on a dedicated 8-way SMP, setting these options to the following:

parthds=8 : schedule=dynamic=10 : spins=0 : yields=0

results in one thread per CPU, with each thread assigned chunks consisting of 10 iterations each, with busy-waiting when there is no immediate work to do.
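Following the ksh command format shown earlier for XLSMPOPTS, the settings above could be placed in the environment as in this sketch. The string contains blanks, so it is enclosed in double quotation marks.

```shell
# Settings for a dedicated 8-way SMP, as described above.
XLSMPOPTS="parthds=8 : schedule=dynamic=10 : spins=0 : yields=0"
export XLSMPOPTS
echo "$XLSMPOPTS"
```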

You can also use the environment variables SPINLOOPTIME and YIELDLOOPTIME to tune performance. Refer to the AIX Performance Management Guide for more information on these variables.

Options to enable and control dynamic profiling

You can use dynamic profiling to reevaluate the compiler's decision to parallelize loops in a program. The three options you can use to do this are: parthreshold, seqthreshold, and profilefreq.

parthreshold=num

Specifies the time, in milliseconds, below which each loop must execute serially. If you set parthreshold to 0, every loop that has been parallelized by the compiler will execute in parallel. The default setting is 0.2 milliseconds, meaning that if a loop requires fewer than 0.2 milliseconds to execute in parallel, it should be serialized.

Typically, parthreshold is set to be equal to the parallelization overhead. If the computation in a parallelized loop is very small and the time taken to execute these loops is spent primarily in the setting up of parallelization, these loops should be executed sequentially for better performance.

seqthreshold=num

Specifies the time, in milliseconds, beyond which a loop that was previously serialized by the dynamic profiler should revert to being a parallel loop. The default setting is 5 milliseconds, meaning that if a loop requires more than 5 milliseconds to execute serially, it should be parallelized.

seqthreshold acts as the reverse of parthreshold.

profilefreq=num

Specifies the frequency with which a loop should be revisited by the dynamic profiler to determine its appropriateness for parallel or serial execution. Loops in a program can be data dependent. A loop that was chosen to execute serially in one pass of dynamic profiling may benefit from parallelization in subsequent executions because of different input data. Therefore, these loops need to be examined periodically to reevaluate the decision to serialize a parallel loop at run time.

The allowed values for this option are the numbers from 0 to 32. If you set profilefreq to one of these values, the following results will occur.

If profilefreq is 0, all profiling is turned off, regardless of other settings. The overheads that occur because of profiling will not be present.
If profilefreq is 1, loops parallelized automatically by the compiler will be monitored every time they are executed.
If profilefreq is 2, loops parallelized automatically by the compiler will be monitored every other time they are executed.
If profilefreq is n, where n is greater than or equal to 2 but less than or equal to 32, each loop will be monitored once every nth time it is executed.
If profilefreq is greater than 32, then 32 is assumed.

Dynamic profiling does not apply to user-specified parallel loops (for example, loops for which you specified the PARALLEL DO directive).
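
A minimal sketch of setting the three dynamic profiling options together, assuming they are given as suboptions of the XLSMPOPTS environment variable in the same colon-separated form as the other run-time options. The parthreshold and seqthreshold values are the defaults described above; the profilefreq value of 16 is chosen only for illustration.

```shell
# parthreshold=0.2 : serialize loops that take less than 0.2 ms in parallel
# seqthreshold=5   : re-parallelize serialized loops that take more than 5 ms
# profilefreq=16   : monitor each automatically parallelized loop once
#                    every 16th time it is executed (illustrative value)
export XLSMPOPTS="parthreshold=0.2:seqthreshold=5:profilefreq=16"
echo "XLSMPOPTS=$XLSMPOPTS"
```

Setting profilefreq=0 instead would turn off all profiling, regardless of the two thresholds.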

OpenMP Environment Variables

The following environment variables, which are included in the OpenMP standard, allow you to control the execution of parallel code.

Note: If you specify both the XLSMPOPTS environment variable and an OpenMP environment variable, the OpenMP environment variable takes precedence.

OMP_DYNAMIC Environment Variable

The OMP_DYNAMIC environment variable enables or disables dynamic adjustment of the number of threads available for the execution of parallel regions. The syntax is as follows:

>>-OMP_DYNAMIC=--+-TRUE--+-------------------------------------><
                 '-FALSE-'

If you set this environment variable to TRUE, the run-time environment can adjust the number of threads it uses for executing parallel regions so that it makes the most efficient use of system resources. If you set this environment variable to FALSE, dynamic adjustment is disabled.

The default value for OMP_DYNAMIC is TRUE. Therefore, if your code needs to use a specific number of threads to run correctly, you should disable dynamic thread adjustment.

The omp_set_dynamic subroutine takes precedence over the OMP_DYNAMIC environment variable.
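
For example, a program that needs an exact number of threads could disable dynamic adjustment before it runs. This is a minimal sketch; the thread count of 8 is illustrative only.

```shell
# Disable dynamic thread adjustment so that OMP_NUM_THREADS is treated as
# the exact number of threads rather than as a maximum.
export OMP_DYNAMIC=FALSE
export OMP_NUM_THREADS=8
echo "OMP_DYNAMIC=$OMP_DYNAMIC OMP_NUM_THREADS=$OMP_NUM_THREADS"
```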

OMP_NESTED Environment Variable

The OMP_NESTED environment variable enables or disables nested parallelism. The syntax is as follows:

>>-OMP_NESTED=--+-TRUE--+--------------------------------------><
                '-FALSE-'

If you set this environment variable to TRUE, nested parallelism is enabled. This means that the run-time environment might deploy extra threads to form the team of threads for the nested parallel region. If you set this environment variable to FALSE, nested parallelism is disabled.

The default value for OMP_NESTED is FALSE.

The omp_set_nested subroutine takes precedence over the OMP_NESTED environment variable.
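
The following minimal sketch shows one way to enable nested parallelism from the shell before running a program:

```shell
# Enable nested parallelism; the run-time environment might then deploy
# extra threads to form the team for a nested parallel region.
export OMP_NESTED=TRUE
echo "OMP_NESTED=$OMP_NESTED"
```

A call to omp_set_nested in the program would still override this setting.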

OMP_NUM_THREADS Environment Variable

The OMP_NUM_THREADS environment variable sets the number of threads that a program will use when it runs. The syntax is as follows:

>>-OMP_NUM_THREADS=--num---------------------------------------><

num: the maximum number of threads that can be used if dynamic adjustment of the number of threads is enabled. If dynamic adjustment of the number of threads is not enabled, the value of OMP_NUM_THREADS is the exact number of threads that can be used. It must be a positive, scalar integer.

The default number of threads that a program uses when it runs is the number of online processors on the machine.

If you specify the number of threads with both the PARTHDS suboption of the XLSMPOPTS environment variable and the OMP_NUM_THREADS environment variable, the OMP_NUM_THREADS environment variable takes precedence. The omp_set_num_threads subroutine takes precedence over the OMP_NUM_THREADS environment variable.

The following example shows how you can set the OMP_NUM_THREADS environment variable:

export OMP_NUM_THREADS=16
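
The precedence rule described above can be illustrated with a short sketch in which both settings request a thread count. The values 4 and 16 are illustrative only.

```shell
# Both settings request a thread count.
export XLSMPOPTS="parthds=4"
export OMP_NUM_THREADS=16

# OMP_NUM_THREADS takes precedence over the PARTHDS suboption, so the
# program would use up to 16 threads (exactly 16 if dynamic adjustment
# is disabled), not 4.
echo "XLSMPOPTS=$XLSMPOPTS OMP_NUM_THREADS=$OMP_NUM_THREADS"
```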

OMP_SCHEDULE Environment Variable

The OMP_SCHEDULE environment variable applies to PARALLEL DO and work-sharing DO directives that have a schedule type of RUNTIME. The syntax is as follows:

>>-OMP_SCHEDULE=--sched_type--+---------------+----------------><
                              '-,--chunk_size-'

sched_type: is DYNAMIC, GUIDED, or STATIC.
chunk_size: is a positive, scalar integer that represents the chunk size.

This environment variable is ignored for PARALLEL DO and work-sharing DO directives that have a schedule type other than RUNTIME.

If you have not specified a schedule type either at compile time (through a directive) or at run time (through the OMP_SCHEDULE environment variable or the SCHEDULE option of the XLSMPOPTS environment variable), the default schedule type is STATIC. The default chunk size depends on N, the total number of threads, and Iters, the total number of iterations in the DO loop. For the first N - 1 threads, it is set to the following:

chunk_size = ceiling(Iters/N)

For the Nth thread, it is set to the following:

chunk_size = Iters - ((N - 1) * ceiling(Iters/N))
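
As a worked example of the two chunk size formulas, the following sketch evaluates them with shell integer arithmetic. The values Iters=100 and N=8 are assumptions chosen for illustration, and the variable names are hypothetical.

```shell
# Illustrative values (assumptions, not taken from the text above).
iters=100   # Iters: total number of iterations in the DO loop
n=8         # N: total number of threads

# ceiling(Iters/N), computed with integer arithmetic
chunk=$(( (iters + n - 1) / n ))

# Chunk size for the Nth (last) thread
last=$(( iters - (n - 1) * chunk ))

echo "first $(( n - 1 )) threads: chunk_size = $chunk"
echo "thread $n: chunk_size = $last"
```

With these values, ceiling(100/8) is 13, so each of the first 7 threads receives 13 iterations and the eighth thread receives 100 - 91 = 9.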

If you specify both the SCHEDULE option of the XLSMPOPTS environment variable and the OMP_SCHEDULE environment variable, the OMP_SCHEDULE environment variable takes precedence.

The following examples show how you can set the OMP_SCHEDULE environment variable:

export OMP_SCHEDULE="GUIDED,4"
export OMP_SCHEDULE="DYNAMIC"
