IBM Books

MPI Programming and Subroutine Reference


Appendix G. Programming Considerations for User Applications in POE

Partial Table-of-Contents

  • MPI Signal-Handling and MPI Threaded Library Considerations
  • Environment Overview
  • Exit Status
  • POE Job Step Function
  • POE Additions To The User Executable
  • Signal Handlers
  • Replacement exit/atexit
  • Let POE Handle Signals When Possible
  • Don't Hard Code File Descriptor Numbers
  • Termination Of A Parallel Job
  • Your Program Can't Run As Root
  • AIX Function Limitations
  • Shell Execution
  • Do Not Rewind stdin, stdout Or stderr
  • Ensuring String Arguments Are Passed To Your Program Correctly
  • Network Tuning Considerations
  • Standard I/O Requires Special Attention
  • STDIN/STDOUT Piping Example
  • Reserved Environment Variables
  • AIX Message Catalog Considerations
  • Language Bindings
  • Available Virtual Memory Segments
  • Using the SP Switch Clock as a Time Source
  • 32-Bit and 64-Bit Support
  • Running Applications With Large Numbers of Tasks
  • MPI Signal-Handling Library Considerations
  • POE Gets Control First And Handles Task Initialization
  • Using Message Passing Handlers
  • POE Additions To The User Executable
  • Message Passing Initialization Module
  • Signal Handlers
  • Interrupted System Calls
  • Forks Are Limited
  • Checkpoint/Restart Limitations
  • MPI Threaded Library Considerations
  • POE Gets Control First And Handles Task Initialization
  • Language Bindings
  • MPI-IO Requires GPFS To Be Used Effectively
  • Use of AIX Signals
  • SIGALRM
  • SIGIO
  • SIGPIPE
  • Limitations In Setting The Thread Stacksize
  • Forks Are Limited
  • Standard I/O Requires Special Attention
  • Thread-Safe Libraries
  • Program And Thread Termination
  • Other Thread-Specific Considerations
  • Order Requirement For System Includes
  • MPI_INIT
  • Collective Communications
  • Support for M:N Threads
  • Fortran Considerations
  • Fortran 90 and MPI
  • Fortran and Threads
  • Restrictions


    This appendix documents various limitations, restrictions, and programming considerations for user applications written to run under the IBM Parallel Environment for AIX (PE) licensed program.

    PE includes two versions of the message passing libraries. These are called the signal-handling library and the threaded library.

    This appendix consists of sections that list the programming considerations common to both libraries, as well as those unique to either the signal-handling library or the threaded library. There is also a subsection on using POE and the Fortran compiler. Specifically, the sections are as follows:


    MPI Signal-Handling and MPI Threaded Library Considerations

    The information in this section pertains to both the (MPL/MPI) signal-handling library and the MPI threaded library.

    Environment Overview

    As the end user, you are encouraged to think of the Parallel Operating Environment (POE) (also referred to as the poe command) as an ordinary (serial) command. It accepts redirected I/O, can be run under the nice and time commands, interprets command flags, and can be invoked in shell scripts.

    An n-task parallel job running in the Parallel Operating Environment actually consists of the n user tasks, an equal number (n) of instances of the IBM Parallel Environment for AIX pmd daemon (which is the parent task of the user's task), and the POE home node task in which the poe command runs. A pmd daemon is started by the POE home node on each machine on which each user task runs, and serves as the point of contact between the home node and the user's tasks.

    The POE home node routes standard input, standard output and standard error streams between the home node and the user's tasks via the pmd daemon, using TCP/IP sockets for this purpose. The sockets are created when the POE home node starts the pmd daemon for each task of a parallel job. The POE home node and pmd also use the sockets to exchange control messages to provide task synchronization, exit status and signaling. These capabilities do not depend upon the message passing library and are available to control any parallel program run by the poe command.

    Exit Status

    Exit status is a value between 0 and 255 inclusive. It is returned from POE on the home node reflecting the composite exit status of your parallel application, as follows:

    POE Job Step Function

    The POE job-step function is intended for the execution of a sequence of separate yet inter-related dependent programs. Therefore, it provides you with a job control mechanism that allows both job-step progression and job-step termination. The job control mechanism is the program's exit code.

    POE Additions To The User Executable

    POE links in the following routines when your executable is compiled with any of the POE compilation scripts (mpcc, mpcc_r, mpxlf, etc.).

    Signal Handlers

    POE installs signal handlers for most signals that cause program termination in order to notify the other tasks of termination and to complete the VT trace file, if enabled. POE then causes the program to exit normally with a code of (128+signal). When running non-threaded applications under POE, you may install a signal handler for any of these signals, and it should call the POE registered signal handler if the task decides to terminate. (See "Let POE Handle Signals When Possible".) When running threaded applications, any attempt to install a signal handler is ignored.

    Signals that are specifically handled by POE or the message passing library follow:

    The signal-handling library uses SIGIO, SIGALRM and SIGPIPE for its operations and it also handles these signals. For more information about the signal-handling library, see "MPI Signal-Handling Library Considerations". For more information about signals, see "Use of AIX Signals".
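
    The (128+signal) convention described above can be decoded by whatever inspects the job's exit status. The following is a minimal sketch, not part of POE: the helper names are hypothetical, and it assumes that every status above 128 encodes a signal, which the text states only for the signals POE handles.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not a POE routine): returns the signal number encoded
 * in an exit status of the form 128+signal, or 0 if the status does not have
 * that form. */
int signal_from_exit_status(int status)
{
    assert(status >= 0 && status <= 255);   /* exit status is 0 through 255 */
    return (status > 128) ? status - 128 : 0;
}

/* Prints a one-line interpretation of an exit status. */
void describe_exit_status(int status)
{
    int sig = signal_from_exit_status(status);

    if (sig != 0)
        printf("exit status %d: terminated by signal %d\n", status, sig);
    else
        printf("exit status %d: no signal encoded\n", status);
}
```

    For example, an exit status of 143 would be reported as signal 15.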

    Replacement exit/atexit

    POE requires its own versions of the library exit()/atexit() functions, and expects to load them dynamically from its own version of libc.a (or libc_r.a) in /usr/lpp/ppe.poe/lib; therefore, do not code your own exit function to override the library function. This is to synchronize profiling and to provide barrier synchronization upon exit.

    Let POE Handle Signals When Possible

    Programs that handle signals must coordinate with POE's handling of most of the common signals (see above).

    DO NOT issue message passing calls from signal handlers. Also, many AIX library calls are not "signal safe", and should not be issued from signal handlers. Check the AIX Technical Reference (function sigaction()) for a list of AIX functions callable from signal handlers.

    POE sets up signal handlers for all the signals that normally terminate program execution. It does this so that it can terminate the entire parallel job in an orderly fashion if one task terminates abnormally (via signal). A user program may install a handler for any or all of these signals, but should save the address of the POE signal handler. If the user program decides to terminate, it should call the POE signal handler. If the user program decides not to terminate, it should just return to the interrupted code. SIGTERM is used by POE to shut down the parallel job in a variety of abnormal circumstances, and should be allowed to terminate the job.

    The POE home node converts a user's SIGTSTP signal (Ctrl-z) to a SIGSTOP signal to all the remote nodes, and passes the SIGCONT signal sent by the fg or bg command to all the remote nodes to restart the job.

    Don't Hard Code File Descriptor Numbers

    Do not use hard coded file descriptor numbers beyond those specified by STDIN, STDOUT and STDERR.

    POE opens several files and uses file descriptors as message passing handles. These are allocated before the user gets control, so the first file descriptor allocated to a user is unpredictable.
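
    As an illustration, the sketch below always uses the descriptor that open() returns and never assumes a particular number such as 3. The function name write_file is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Writes len bytes of buf to the file at path.
 * Returns 0 on success, -1 on error. */
int write_file(const char *path, const char *buf, size_t len)
{
    int fd;
    int rc;
    ssize_t n;

    assert(path != NULL && buf != NULL);

    /* Use whatever descriptor open() hands back; under POE the first
     * descriptor given to the user is unpredictable. */
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    n = write(fd, buf, len);
    rc = (n == (ssize_t)len) ? 0 : -1;

    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```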

    Termination Of A Parallel Job

    POE provides for orderly termination of a parallel job, so that all tasks terminate at the same time. This is accomplished in the atexit routine registered at program initialization. For normal exits (codes 0, 2-127), the atexit routine sends a control message to the POE home node, and waits for a positive response. For abnormal exits and those which don't go through the atexit routine, the pmd daemon catches the exit code and sends a control message to the POE home node.

    For normal exits, when POE gets a control message for every task, it responds to each node, allowing that node to exit normally with its individual exit code. The pmd daemon monitors the exit code and passes it back to the POE home node for presentation to the user.

    For abnormal exits and those detected by pmd, POE sends a message to each pmd asking that it send a SIGTERM signal to its task, thereby terminating the task. When the task finally exits, pmd sends its exit code back to the POE home node and exits itself.

    User-initiated termination of the POE home node via SIGINT (Ctrl-c) and/or SIGQUIT (Ctrl-\) causes a message to be sent to pmd asking that the appropriate signal be sent to the parallel task. Again, pmd waits for the task to die then terminates itself.

    Your Program Can't Run As Root

    To prevent uncontrolled root access to the entire parallel job computation resource, POE checks to see that the user is not root as part of its authentication.

    AIX Function Limitations

    The use of the following AIX functions may be limited, but no formal testing has been done:

    Shell Execution

    You can have POE run a shell script which is loaded and run on the remote nodes as if it were a binary file.

    If the POE home node task is not started under the Korn shell, mounted file system names may not be mapped correctly to the names defined for the automount daemon or AIX equivalent running on the IBM RS/6000 SP. See the IBM Parallel Environment for AIX: Operation and Use, Volume 1 for a discussion of alternative name mapping techniques.

    The program executed by POE on the parallel nodes does not run under a shell on those nodes. Redirection and piping of STDIO applies to the POE home node (poe binary), and not the user's code. If shell processing of a command line is desired on the remote nodes, invoke a shell script on the remote nodes to provide the desired preprocessing before the user's application is executed.

    Do Not Rewind stdin, stdout Or stderr

    The partition manager daemon uses pipes to direct stdin, stdout and stderr to the user's program; therefore, do not rewind these files.

    Ensuring String Arguments Are Passed To Your Program Correctly

    Quotation marks, either single or double, used as argument delimiters are stripped away by the shell and are never "seen" by poe. Therefore, the quotation marks must be escaped to allow the quoted string to be passed correctly to the remote task(s) as one argument. For example, if you want to pass the following string to the user program (including the imbedded blank)

     
        a b
     
    

    then you need to enter the following:

     
        poe user_program \"a b\"
     
    

    user_program is passed the following argument as one token:

     
        a b
     
    

    Without the backslashes, the string would have been treated as two arguments (a and b).

    POE behaves like rsh when arguments are passed to POE. Therefore, the following:

     
        poe user_program "a b"
     
    

    is equivalent to:

     
        rsh some_machine user_program "a b"
     
    

    In order to pass the string argument as one token, the quotes have to be escaped.

    Network Tuning Considerations

    Programs generating large volumes of STDOUT or STDERR may overload the home node. As described previously, standard output and standard error files generated by a user's program are piped to pmd, then forwarded to the poe binary via a TCP/IP socket. It is possible to generate so much data that the IP message buffers on the home node are exhausted and the poe binary hangs (and possibly the entire node may hang). Note that the option -stdoutmode (environment variable MP_STDOUTMODE) controls which output stream is displayed by the poe binary, but does not limit the standard output traffic received from the remote nodes, even if set to display the output of just one node.

    The POE environment variable MP_SNDBUF can be used to override the default network settings for the size of the TCP buffers used.

    If you have large volumes of standard I/O, work with your network administrator to establish appropriate TCP/IP tuning parameters. You may also want to examine if using named pipes is appropriate for your application.

    Standard I/O Requires Special Attention

    When your program runs on the remote nodes, it has no controlling terminal. STDIN, STDOUT, and STDERR are always piped.

    Programs that depend on piping standard input or standard output as part of a processing sequence may wish to bypass the home node poe binary. Running the poe command (or starting a program compiled with one of the POE compile scripts) causes the poe binary to be loaded on the machine on which you typed the command (the POE home node). The poe binary, in turn, starts a daemon named pmd on each parallel node assigned to run the job, and then requests pmd to run your executable (via fork and exec). The poe binary reads STDIN and passes it to each of the parallel tasks via a TCP/IP socket connection to the pmd daemon, which pipes it to the user. Similarly, STDOUT and STDERR from the user are piped to pmd and sent on the socket back to the home node, where they are written to the poe binary's STDOUT and STDERR descriptors. If you know that the task reading STDIN or writing STDOUT must be on the same node (processor) as the poe binary (the POE home node), named pipes can be used to bypass poe's reading and forwarding of STDIN and STDOUT.

    If STDIN is piped or redirected to the poe binary (via ordinary pipes), and your application is linked with the signal-handling message passing library (via mpcc, mpxlf, or mpCC), set the environment variable MP_HOLD_STDIN to "yes". This lets poe initialize the signal-handling library before handling the STDIN file.

    If your application is linked with the threaded library, see "Standard I/O Requires Special Attention" under "MPI Threaded Library Considerations" for more information.

    STDIN/STDOUT Piping Example

    The following two scripts show how STDIN and STDOUT can be piped directly between pre- and post-processing steps, bypassing the POE home node task. This example assumes that parallel task 0 is known or forced to be on the same node as the POE home node.

    The script compute_home runs on the home node; the script compute_parallel runs on the parallel nodes (those running tasks 0 through n-1).

    compute_home:
    #! /bin/ksh
    # Example script compute_home runs three tasks:
    #    data_generator creates/gets data and writes to stdout
    #    data_processor is a parallel program that reads data
    #      from stdin, processes it in parallel, and writes
    #      the results to stdout.
    #    data_consumer reads data from stdin and summarizes it
    #
    mkfifo poe_in_$$
    mkfifo poe_out_$$
    export MP_STDOUTMODE=0
    export MP_STDINMODE=0
    data_generator >poe_in_$$ |
         poe compute_parallel poe_in_$$ poe_out_$$ data_processor |
         data_consumer <poe_out_$$
    # Save the exit status of the pipeline so it survives the cleanup
    rc=$?
    rm poe_in_$$
    rm poe_out_$$
    exit $rc
    

    compute_parallel:
    #! /bin/ksh
    # Example script compute_parallel is a shell script that
    #    takes the following arguments:
    #    1) name of input named pipe (stdin)
    #    2) name of output named pipe (stdout)
    #    3) name of program to be run (and arguments)
    #
    poe_in=$1
    poe_out=$2
    shift 2
    $*   <$poe_in   >$poe_out
    

    Reserved Environment Variables

    Environment variables starting with MP_ are intended for use by POE, and should be set only as instructed in the documentation. POE also uses a handful of MP_... environment variables for internal purposes, which should not be interfered with.

    AIX Message Catalog Considerations

    POE assumes that NLSPATH contains the appropriate POE message catalogs, even if LANG is set to "C" or is unset. Duplicate message catalogs are provided for languages "En_US", "en_US", and "C".

    Language Bindings

    The Fortran, C and C++ bindings for MPI are contained in the same library and can be freely intermixed.

    Refer to "Fortran Considerations" for more information about the Fortran compiler.

    The AIX compilers support the flag -qarch. This option allows you to target code generation to a particular processor architecture. While this option can provide performance enhancements on specific platforms, it inhibits portability, particularly between the Power and PowerPC machines. The MPI library is not targeted to a specific architecture and is the same on PowerPC and Power nodes.

    The MPI-IO functions from MPI-2 are only available with the threaded library.

    Available Virtual Memory Segments

    AIX makes available up to 11 additional address segments for end user programs. The MPI libraries use some of these as listed in Table 16. The remaining are available to the user for either extended heap (-bmaxdata option) or shared memory (shmget). Very large jobs, which include all jobs with more than 1000 tasks, will need to use the -bmaxdata option to ensure a large enough heap.

    Table 16. Memory Segments Used By the MPI and LAPI Libraries

    Component          RS/6000 SP node with switch   RS/6000 workstation or no switch
    MPI User Space     2                             not available
    MPI IP             1*                            0
    VT Trace Capture   1                             0
    LAPI User Space    2                             not available

    * If the environment variable MP_CLOCK_SOURCE=AIX, the value is 0.

    Using the SP Switch Clock as a Time Source

    The RS/6000 SP switch clock is a globally-synchronized counter that may be used as a source for the MPI_WTIME function, provided that all tasks are run on nodes of the same SP system. The environment variable MP_CLOCK_SOURCE provides additional control. Table 17 shows how the clock source is determined. MPI guarantees that MPI_WTIME_IS_GLOBAL has the same value at every task.

    Table 17. How the Clock Source Is Determined

    MP_CLOCK_SOURCE   Library Version   All Nodes SP?   Source Used   MPI_WTIME_IS_GLOBAL
    not set           ip                yes             switch        false
                                        no              AIX           false
                      us                yes             switch        true
                                        no              Error
    SWITCH            ip                yes*            switch        false
                                        no              AIX           false
                      us                yes             switch        true
                                        no              Error
    AIX               ip                yes             AIX           false
                                        no              AIX           false
                      us                yes             AIX           false
                                        no              AIX           false

    * The user is responsible for ensuring all of the nodes are in the same SP system.

    32-Bit and 64-Bit Support

    POE compiles and runs all applications as 32-bit applications. 64-bit applications are not supported yet.

    Running Applications With Large Numbers of Tasks

    If you plan to run your parallel applications with a large number of tasks (more than 256), the following tips may improve stability and performance:


    MPI Signal-Handling Library Considerations

    The information in this subsection provides you with specific additional programming considerations for when you are using POE and the MPL/MPI signal-handling library.

    POE Gets Control First And Handles Task Initialization

    POE sets up its environment via the entry point mp_main(). mp_main() initializes the message passing library, sets up signal handlers, sets up an atexit routine, and initializes VT trace data collection before calling your main program.

    Using Message Passing Handlers

    Only a subset of MPL message passing is allowed on handlers created by the MPL Receive and Call function (mpc_rcvncall or MP_RCVNCALL). MPI calls on these handlers are not supported.

    POE Additions To The User Executable

    POE links in the following routines when your executable is compiled with mpcc, mpxlf or mpCC. These are routines specific for the signal handling environment.

    Message Passing Initialization Module

    POE initializes the parallel message passing library and determines that all nodes can communicate successfully before the user main() program gains control. As a result, any program compiled with the POE compiler scripts must be run under the control of POE and is not suitable as a serial program.

    If communication initialization fails, the parallel task is terminated with an appropriate exit code.

    Signal Handlers

    The message passing library sets up signal handlers for SIGALRM, SIGIO and SIGPIPE to manage message passing activity. A user program may install a handler for any or all of these signals, but should save the address of and invoke the POE signal handler before returning to the interrupted code. The sigaction() function returns the required structure. Also, set SA_RESTART as well as the mask so all signals are masked when the signal handler is running.

    The following are the signals used and specifically handled by the message passing library in a signal handling environment:

    Interrupted System Calls

    The message passing library uses an interval timer to manage message traffic, specifically to ensure that messages progress even when message passing calls are not being made. When this interval timer expires, a SIGALRM signal is sent to the program, interrupting whatever computation is in progress. The message passing library has a signal handler set, and normally handles the signal and returns to the user's program without the program's knowledge. However, the following library and system calls are interrupted and do not complete normally. The user is responsible for testing whether an interrupt occurred and recovering from the interrupt. In many cases, this is accomplished by just retrying the call.

    Note: The normal timer interval is less than 500 milliseconds. As a result, a sleep call (with time specified in seconds) returns the original sleep interval, due to rounding, and can't be used to determine how much time remains in the interval. You should use the functions usleep and nsleep instead. See also the "Sample Replacement Sleep Program" in Appendix H. "Using Signals and the IBM PE Programs".

    With the exception of sleep, system and exec, the routines listed above set the system error indicator (the variable errno) to EINTR, which can be tested by the user's program. See the "Sample Replacement Select Program" in Appendix H. "Using Signals and the IBM PE Programs".

    Normal file read and write are restarted automatically by AIX, and should not require any special treatment by the user.

    The system and fork calls create a new task in which the interval timer is still running. If a fork is followed by an exec (which is what system does), the signal handler for the timer is overlaid, and the task is terminated when the interval timer expires.

    To handle this for the system call, temporarily turn the interval timer off (using the alarm(0) call) before the call, and turn it on again (ualarm(500000, 500000) will do) after the system call.

    To handle the interval timer for a forked child, merely turn off the interval timer via alarm(0) in the child.
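
    The system call case might be wrapped as in the sketch below. The function name is hypothetical, and the ualarm values are the ones suggested in the text above. The interval actually used by the library may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs cmd through system() with the interval timer turned off, then
 * turns the timer on again. Returns the command's exit code, or -1 if
 * the command could not be run or did not exit normally. */
int run_command_with_timer_paused(const char *cmd)
{
    int status;

    assert(cmd != NULL);

    alarm(0);                     /* turn the interval timer off */
    status = system(cmd);
    ualarm(500000, 500000);       /* turn the interval timer on again */

    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

    Outside POE there is no library handler for SIGALRM, so a caller trying this sketch should cancel the timer (alarm(0)) or install a handler before the first interval expires.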

    There are other restrictions on fork described below.

    Forks Are Limited

    As described earlier, if a task forks, the forked child inherits the running timer. The timer should be turned off before forking another program. If the forked child does not exec another program, it should be aware that an atexit routine has been registered for the parent which is also inherited by the child. In most cases, the atexit routine will request POE to terminate the task (parent). A forked child should terminate with an _exit(0) system call to prevent the atexit routine from being called. Also, if the forked parent terminates before the child, the child task will not be cleaned up by POE.

    A forked child must not call the message passing library.
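
    The fork rules above can be sketched as follows. The names fork_worker, wait_for_worker and example_work are hypothetical. The child turns off the inherited timer, does work that does not involve message passing, and leaves through _exit(0) so that the inherited atexit routine is not run.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Placeholder for the child's work; it must not call the message
 * passing library. */
static void example_work(void)
{
    ssize_t n = write(STDOUT_FILENO, "child working\n", 14);
    (void)n;
}

/* Forks a child that runs work(). Returns the child's pid in the
 * parent, or -1 if fork fails. */
pid_t fork_worker(void (*work)(void))
{
    pid_t pid;

    assert(work != NULL);
    pid = fork();
    if (pid == 0) {
        alarm(0);     /* child: turn off the inherited interval timer */
        work();
        _exit(0);     /* skip the atexit routine inherited from the parent */
    }
    return pid;
}

/* Waits for the child so that the parent does not terminate first.
 * Returns the child's exit code, or -1 on error. */
int wait_for_worker(pid_t pid)
{
    int status;
    pid_t r;

    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);   /* retry if a signal interrupted the wait */

    if (r != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

    Waiting for the child in the parent addresses the cleanup concern noted above, because POE does not clean up a child whose parent has already terminated.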

    Checkpoint/Restart Limitations

    A user may initiate a checkpoint sequence from within a parallel MPI program by calling the MP_CHKPT function. All tasks in the parallel job must issue the call, which does not return until the checkpoint files have been created for all tasks. If the job subsequently fails and is restarted, the restart returns from the MP_CHKPT function with an indication that the parallel job has been restarted.

    Programs using the signal handling (non-threaded) MPI library may be linked as a checkpointable executable, which is run as a LoadLeveler batch job. LoadLeveler Version 2.1 or later is required. Restrictions on the program follow:


    MPI Threaded Library Considerations

    When programming in a threaded environment, specific skills and considerations are required. The information in this subsection provides you with specific programming considerations when using POE and the MPI threaded library. It assumes you are familiar with POSIX threads in general, including mutexes, thread condition waiting, thread-specific storage, thread creation and termination.

    POE Gets Control First And Handles Task Initialization

    POE sets up its environment via the entry point mp_main_r(). mp_main_r() sets up signal handlers, initializes VT, and sets up an atexit routine before calling your main program.

    Note: In the threaded library, message passing initialization takes place when MPI_INIT is called and not by mp_main_r. The threaded library and the signal-handling library differ significantly in this regard.

    Language Bindings

    The Fortran, C and C++ bindings for MPI are contained in the same library (libmpi_r.a) and can be freely intermixed.

    Refer to "Fortran Considerations" for more information about running Fortran programs in a threaded environment.

    MPI-IO Requires GPFS To Be Used Effectively

    The subset implementation of MPI-IO provided in the threaded library depends on all tasks running on a single file system. IBM Generalized Parallel File System (GPFS) is able to present a single file system to all nodes of an SP. Shared file systems (NFS and AFS, for example) do not have the same rigorous management of file consistency when updates occur from more than one node.

    MPI-IO can be used with most file systems as long as all tasks are on a single node. This single node approach may be useful in learning to use MPI-IO, but is not likely to be worthwhile in any production context.

    Any production use of MPI-IO must be based on GPFS.

    Use of AIX Signals

    The threaded POE run-time environment creates a thread to handle the following asynchronous signals:

    A user signal handler must not be invoked to handle the above signals, which are handled by sigwait.

    The following signals, which are used by MPI in the non-threaded library, are handled as described below.

    SIGALRM

    The threaded library does not use SIGALRM and long system calls such as sleep are not interrupted by the message passing library. For example, sleep runs its entire duration unless interrupted by a user-generated event.

    SIGIO

    PE blocks SIGIO before calling your program. SIGIO is used in the IP version of the library to notify you of an I/O event or the arrival of a message packet. This notification is enabled via the environment variable MP_CSS_INTERRUPT. If this environment variable is set to YES, the message packet arrival dispatches the interrupt service thread to process the packet.

    The User Space version of the library receives notification of an arriving packet via an AIX kernel event and does not use SIGIO. You may unblock it or use sigwait to process SIGIO signals.

    If you've registered a signal handler (via sigaction) for SIGIO before MPI_INIT is called, the function is added to the interrupt service thread and is executed each time the service thread is dispatched. Although registered as a signal handler, the function is not required to be signal safe because it is executed on a thread. You can use pthread calls to communicate with other threads. You cannot call MPI functions in this handler.

    After MPI_FINALIZE is called, your signal handler is restored but you need to unblock SIGIO in order to receive subsequent SIGIO signals.

    If you register or change the SIGIO signal handler after calling MPI_INIT, your changes are ignored by the MPI library but your changes are not undone by MPI_FINALIZE.

    SIGPIPE

    Neither the threaded nor the non-threaded IP library uses SIGPIPE. The threaded User Space library polls a variable set by the AIX kernel to determine if the switch has faulted and needs to be restarted. As a result, it does not use SIGPIPE.

    Limitations In Setting The Thread Stacksize

    The main thread stacksize is the same as the stacksize used for non-threaded applications. If you write your own MPI reduce functions to use with nonblocking collective communications or a SIGIO handler that will be executed on one of the library service threads, you are limited to a stacksize of 96KB by default. To increase your thread stacksize, use the environment variable MP_THREAD_STACKSIZE. For more information about the default and your ability to change the default, see the manpage for AIX_PTHREAD_SET_STACKSIZE.

    Forks Are Limited

    If a task forks, only the thread that forked exists in the child task. Therefore, the message passing library will not operate properly. Also, if the forked child does not exec another program, it should be aware that an atexit routine has been registered for the parent which is also inherited by the child. In most cases, the atexit routine requests that POE terminate the task (parent). A forked child should terminate with an _exit(0) system call to prevent the atexit routine from being called. Also, if the forked parent terminates before the child, the child task will not be cleaned up by POE.

    A forked child MUST NOT call the message passing library.

    Standard I/O Requires Special Attention

    When your program runs on the remote nodes, it has no controlling terminal. STDIN, STDOUT, and STDERR are always piped.

    If your threaded MPI program processes STDIN from a large file on the home node, you must do one of the following:

    This also includes programs which may not explicitly use MPI.

    If STDIN is piped (or redirected) to the poe binary (via ordinary pipes) and your application is linked with the threaded library, then handle STDIN in the following way:

    Thread-Safe Libraries

    AIX provides thread-safe versions of some libraries, such as libc_r.a. However, not all libraries have a thread-safe version. It is your responsibility to determine whether the libraries you use can be safely called by more than one thread.

    Program And Thread Termination

    MPI_FINALIZE terminates the MPI service threads but does not affect user-created threads. Use pthread_exit to terminate any user-created threads, and exit(m) to terminate the main program (initial thread). The value of m is used to set POE's exit status as explained in "Exit Status".

    Other Thread-Specific Considerations

    Order Requirement For System Includes

    For threaded programs, AIX requires that the system include <pthread.h> come first, with <stdio.h> and other system includes following it. <pthread.h> defines some conditional compile variables that modify the code generation of subsequent includes, particularly <stdio.h>. Please note that <pthread.h> is not required unless your file uses thread-related calls or data.

    MPI_INIT

    Call MPI_INIT once per task, not once per thread. MPI_INIT does not have to be called on the main thread, but MPI_INIT and MPI_FINALIZE must be called on the same thread.

    MPI calls on other threads must adhere to the MPI standard in regard to the following:

    Collective Communications

    Collective communications must meet the MPI standard requirement that all participating tasks execute collective communications on any given communicator in the same order. If collective communications calls are made on multiple threads, it is your responsibility to ensure the proper sequencing or to use distinct communicators.

    Support for M:N Threads

    By default, user threads are created with process contention scope, and M user threads are mapped to N kernel threads. The values of the ratio M:N and the default contention scope are settable by AIX environment variables. The service threads created by MPI, POE, and LAPI have system contention scope, that is, they are mapped 1:1 to kernel threads.

    For PSSP 2.3 and 2.4, you must create system contention scope threads. For PSSP 3.1, you can create process contention scope threads, but any such thread will be converted to a system contention scope thread when it makes its first MPI call.


    Fortran Considerations

    The information in this subsection provides you with some specific programming considerations for when you are using POE and the Fortran compiler.

    Fortran 90 and MPI

    Incompatibilities exist between Fortran 90 and MPI which may affect the ability to use such programs. Refer to the information in

    /usr/lpp/ppe.poe/samples/mpif90/README.mpif90
    

    for further details. PE, Version 2, Release 2 provided the header file mpif90.h for use with Fortran 90. The file is still available in PE, Version 2, Release 4, but should not be used by new code. The mpif.h header file is formatted to work with either mpxlf90 or mpxlf compilation.

    Fortran and Threads

    Version 5 of the AIX XLF Fortran compiler supports threads.

    Version 4.1 of the AIX XLF Fortran compiler is not thread-safe. However, XLF Version 4.1.0.1 provides a partial thread-support XLF runtime library. It supports multi-threaded applications that have one Fortran thread. Be sure you thoroughly test such use.

    The partial thread-support library is libxlf90_t.a and is installed as /usr/lib/libxlf90_t.a. When you use the mpxlf_r command, this library is included automatically.

    Restrictions

    When you use libxlf90_t.a, the following restrictions apply. Therefore, only one Fortran thread in a multi-threaded application may use the library.

