This section contains Parallel ESSL-specific application program coding requirements and considerations for message passing programs--that is, programs coded in Fortran, C, and C++. To make a Parallel ESSL call in a parallel application program:
To look at an application program outline, see the following:
For an example of the use of Parallel ESSL in a sample message passing Fortran 90 application program solving a thermal diffusion problem, see Appendix B. "Sample Programs".
The ESSL Version 3 Guide and Reference manual contains additional information about coding ESSL for AIX subroutine calls in Fortran, C, and C++ programs. That information also applies to Parallel ESSL and is not repeated in this book. The specific topics that apply to Parallel ESSL and that you may want to reference are:
A parallel machine with k processes is often thought of as a one-dimensional linear array of processes labeled 0, 1, ..., k-1. For performance reasons, it is sometimes useful to map this one-dimensional array into a logical two-dimensional rectangular grid of processes, also referred to as a process grid. The process grid can have p process rows and q process columns, where p × q = k. A process can then be indexed by row and column, (i,j), where 0 <= i < p and 0 <= j < q. (This logical rectangular grid may not necessarily be reflected by the underlying hardware--that is, the processor nodes. In most cases, k is less than or equal to the number of SP processor nodes that your job is running on. In special cases, however, the number of processes can be greater than the number of SP processor nodes. This is subject to restrictions imposed by PE. For more details, refer to the appropriate Parallel Environment: Operation and Use manual.)
Before calling the Parallel ESSL subroutines, you need to call BLACS_GET, followed by a call to either BLACS_GRIDINIT or BLACS_GRIDMAP to define the size and dimensions of your process grid. This identifies what processes are involved in the communication. You can reinitialize the BLACS, as needed, at various points in your application program to redefine the process grid.
When you initialize the BLACS, you must specify the (total) size k of the grid to be less than or equal to the value set in the MP_PROCS PE environment variable or its associated command-line flag -procs. If argument values are not valid, an error message is issued and the program is terminated.
An example of initializing the BLACS in a Fortran 90 program is shown in Appendix B. "Sample Programs". See the subroutine initialize_scale in "Module Scale (Message Passing)".
You call the BLACS_PINFO routine when you want to determine how many processes are available. You can use this information as input into other BLACS routines that set up your process grid.
Language | Call Statement |
---|---|
Fortran | CALL BLACS_PINFO (mypnum, nprocs) |
C | blacs_pinfo (&mypnum, &nprocs); |
C++ | extern "FORTRAN" void blacs_pinfo(const int &, const int &); blacs_pinfo (mypnum, nprocs); |
mypnum is the process number of this process. Returned as: a fullword integer value, where: 0 <= mypnum <= nprocs - 1.
nprocs is the number of processes available. Returned as: a fullword integer value.
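As an illustration, the following sketch (the variable names are arbitrary, not part of the Parallel ESSL API) queries the number of available processes and uses it to pick process grid dimensions before the BLACS are initialized:

```
      INTEGER MYPNUM, NPROCS, NPROW, NPCOL
*
* Find out my process number and how many processes are available
*
      CALL BLACS_PINFO(MYPNUM, NPROCS)
*
* Use the process count to choose a grid shape; here, one process row
*
      NPROW = 1
      NPCOL = NPROCS
```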
You call the BLACS_GET routine when you want the values the BLACS are using for internal defaults. The most common use is in retrieving a default system context for input into BLACS_GRIDINIT or BLACS_GRIDMAP.
Language | Call Statement |
---|---|
Fortran | CALL BLACS_GET (icontxt, what, val) |
C | blacs_get (&icontxt, &what, &val); |
C++ (what = 0, 2, or 10) | extern "FORTRAN" void blacs_get(const int &, const int &, const int &); blacs_get (icontxt, what, val); |
icontxt has the following meaning:
If what = 0 or 2, icontxt is ignored.
If what = 10, icontxt is the integer handle indicating the BLACS context.
Specified as: a fullword integer value.
Table 31. Input and Output for BLACS_GET
Value of what | BLACS Internals That Are Returned in val |
---|---|
0 | Handle indicating the default system context |
2 | BLACS debug level |
10 | Handle indicating the system context used to define the BLACS context whose handle is icontxt. You can redefine the shape of your process grid by calling BLACS_GET with what = 10. For examples of how to do this, see the "Notes" section in "BLACS_GRIDINIT" or "BLACS_GRIDMAP". |
what indicates which BLACS internal value is to be returned, as listed in Table 31. Specified as: a fullword integer value; what = 0, 2, or 10.
val is the BLACS internal value requested by what. Returned as: a fullword integer value.
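For example, the following sketch (variable names are arbitrary) retrieves the default system context for later use with BLACS_GRIDINIT or BLACS_GRIDMAP, and then queries the BLACS debug level:

```
      INTEGER ICONTXT, IDEBUG
*
* what = 0: get the default system context; the first argument is
* ignored on entry and the handle is returned in the last argument
*
      CALL BLACS_GET(0, 0, ICONTXT)
*
* what = 2: query the BLACS debug level
*
      CALL BLACS_GET(0, 2, IDEBUG)
```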
You call the BLACS_GRIDINIT routine when you want to map the processes sequentially in row-major order or column-major order into the process grid. You must specify the same input argument values in the calls to BLACS_GRIDINIT on every process.
Language | Call Statement |
---|---|
Fortran | CALL BLACS_GRIDINIT (icontxt, order, nprow, npcol) |
C | blacs_gridinit (&icontxt, &order, &nprow, &npcol); |
C++ | extern "FORTRAN" void blacs_gridinit(const int &, char *, const int &, const int &); blacs_gridinit (icontxt, order, nprow, npcol); |
icontxt is the system context handle, returned by BLACS_GET, to be used in creating the BLACS context. Specified as: a fullword integer value.
order specifies how the processes are mapped into the process grid:
If order = 'R', row-major natural ordering is used. This is the default.
If order = 'C', column-major natural ordering is used.
Specified as: a single character; order = 'R' or 'C'.
nprow is the number of rows in this process grid. Specified as: a fullword integer value, where: 1 <= nprow <= p.
npcol is the number of columns in this process grid. Specified as: a fullword integer value, where: 1 <= npcol <= q.
On return, icontxt is the integer handle to the newly created BLACS context. Returned as: a fullword integer value.
To get the default system context for input to BLACS_GRIDINIT, code:

```
      CALL BLACS_GET(0, 0, icontxt)
```
For example, you can define a 2 × 2 process grid and later redefine its shape to a 1 × 4 process grid, as follows:

```
*
* Define the 2 × 2 process grid
*
      CALL BLACS_GET(0, 0, icontxt)
      CALL BLACS_GRIDINIT(icontxt, 'R', 2, 2)
      .
      .
      .
*
* Redefine the shape to a 1 × 4 process grid
*
      CALL BLACS_GET(icontxt, 10, newcontxt)
      CALL BLACS_GRIDINIT(newcontxt, 'R', 1, 4)
```
Suppose you call BLACS_GRIDINIT in your Fortran program as follows:

```
      CALL BLACS_GRIDINIT(icontxt, 'R', 3, 4)
```

The processes would be mapped sequentially in row-major order into a 3 by 4 process grid, as follows:
Table 32. A 3 by 4 process grid
P(p,q) | 0 | 1 | 2 | 3 |
---|---|---|---|---|
0 | t0 | t1 | t2 | t3 |
1 | t4 | t5 | t6 | t7 |
2 | t8 | t9 | t10 | t11 |
Note: In this example, the process grid is 3 by 4. You must execute a call to Parallel ESSL on all processes whose process row and column indexes satisfy 0 <= i < 3 and 0 <= j < 4, respectively.
You call the BLACS_GRIDINFO routine to obtain the process grid dimensions and your own process row and column indexes.
Language | Call Statement |
---|---|
Fortran | CALL BLACS_GRIDINFO (icontxt, nprow, npcol, myrow, mycol) |
C | blacs_gridinfo (&icontxt, &nprow, &npcol, &myrow, &mycol); |
C++ | extern "FORTRAN" void blacs_gridinfo(const int &, const int &, const int &, const int &, const int &); blacs_gridinfo (icontxt, nprow, npcol, myrow, mycol); |
icontxt is the integer handle to the BLACS context. Specified as: a fullword integer value, returned by BLACS_GRIDINIT or BLACS_GRIDMAP.
nprow is the number of rows in the process grid. Returned as: a fullword integer value, where: 1 <= nprow <= p.
npcol is the number of columns in the process grid. Returned as: a fullword integer value, where: 1 <= npcol <= q.
myrow is my process row index. Returned as: a fullword integer value, where: 0 <= myrow < p.
mycol is my process column index. Returned as: a fullword integer value, where: 0 <= mycol < q.
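A common pattern, also used in the application program outline later in this section, is to retrieve your own grid coordinates and then call Parallel ESSL only on processes that belong to the grid. A minimal sketch (variable names are arbitrary) follows:

```
      INTEGER ICONTXT, NPROW, NPCOL, MYROW, MYCOL
*
* Obtain the grid dimensions and my row and column indexes
*
      CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
*
* Only processes inside the process grid take part in the computation
*
      IF (MYROW .LT. NPROW .AND. MYCOL .LT. NPCOL) THEN
*        ... set up arrays and call Parallel ESSL subroutines ...
      ENDIF
```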
You call the BLACS_GRIDMAP routine when you want to map the processes in a specific manner into a process grid. You pass in a two-dimensional array containing the process numbers, which is mapped into your new process grid. You must specify the same input argument values in the calls to BLACS_GRIDMAP on every process.
Language | Call Statement |
---|---|
Fortran | CALL BLACS_GRIDMAP (icontxt, usermap, ldumap, nprow, npcol) |
C | blacs_gridmap (&icontxt, usermap, &ldumap, &nprow, &npcol); |
C++ | extern "FORTRAN" void blacs_gridmap(const int &, int *, const int &, const int &, const int &); blacs_gridmap (icontxt, usermap, ldumap, nprow, npcol); |
icontxt is the system context handle, returned by BLACS_GET, to be used in creating the BLACS context. Specified as: a fullword integer value.
usermap contains the process numbers to be mapped into the process grid. Specified as: a two-dimensional integer array of size ldumap by npcol.
ldumap is the leading dimension of the array usermap. Specified as: a fullword integer value, where: ldumap >= nprow.
nprow is the number of rows in this process grid. Specified as: a fullword integer value, where: 1 <= nprow <= p.
npcol is the number of columns in this process grid. Specified as: a fullword integer value, where: 1 <= npcol <= q.
On return, icontxt is the integer handle to the newly created BLACS context. Returned as: a fullword integer value.
To get the default system context for input to BLACS_GRIDMAP, code:

```
      CALL BLACS_GET(0, 0, icontxt)
```
For example, you can define a 2 × 2 process grid and later redefine its shape to a 1 × 4 process grid, as follows:

```
*
* Define the 2 × 2 process grid
*
      CALL BLACS_GET(0, 0, icontxt)
      CALL BLACS_GRIDMAP(icontxt, usermap, 2, 2, 2)
      .
      .
      .
*
* Redefine the shape of your 2 × 2 process grid
* to a 1 × 4 process grid
*
      CALL BLACS_GET(icontxt, 10, newcontxt)
      CALL BLACS_GRIDMAP(newcontxt, usermap, 2, 1, 4)
```
Suppose you call BLACS_GRIDMAP in your Fortran program as follows:

```
      CALL BLACS_GRIDMAP(icontxt1, USERMAP1, 5, 3, 4)
```

where array USERMAP1 contains the following integer values:

```
              |  0   1   2   3 |
              |  8   9  10  11 |
 USERMAP1  =  |  4   5   6   7 |
              |  .   .   .   . |
              |  .   .   .   . |
```

Then the processes would be mapped into a 3 by 4 process grid as follows:
P(p,q) | 0 | 1 | 2 | 3 |
---|---|---|---|---|
0 | t0 | t1 | t2 | t3 |
1 | t8 | t9 | t10 | t11 |
2 | t4 | t5 | t6 | t7 |
BLACS_GRIDMAP sets icontxt1. Use the value of icontxt1 in any subsequent calls to Parallel ESSL to use this process grid.
While the above process grid is active, another overlapping process grid can be defined. Suppose you then called BLACS_GRIDMAP in your Fortran program as follows:
```
      CALL BLACS_GRIDMAP(icontxt2, USERMAP2, 2, 2, 2)
```

where USERMAP2 contains the following integer values:

```
 USERMAP2  =  |  1   2 |
              | 10  11 |
```

Then the processes would be mapped into a 2 by 2 process grid as follows:
P(p,q) | 0 | 1 |
---|---|---|
0 | t1 | t2 |
1 | t10 | t11 |
BLACS_GRIDMAP sets icontxt2. Use the value of icontxt2 in any subsequent calls to Parallel ESSL to use this process grid.
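As a sketch of how the second map above might be built (the variable names and the choice of processes are for illustration only), every process initializes the same USERMAP2 array and then calls BLACS_GRIDMAP with identical arguments:

```
      INTEGER USERMAP2(2,2), icontxt2
*
* Map processes 1, 2, 10, and 11 into a 2 x 2 process grid
*
      USERMAP2(1,1) = 1
      USERMAP2(1,2) = 2
      USERMAP2(2,1) = 10
      USERMAP2(2,2) = 11
*
* Get the default system context, then create the grid
*
      CALL BLACS_GET(0, 0, icontxt2)
      CALL BLACS_GRIDMAP(icontxt2, USERMAP2, 2, 2, 2)
```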
Notes:
You call the BLACS_GRIDEXIT routine to release a BLACS context.
Language | Call Statement |
---|---|
Fortran | CALL BLACS_GRIDEXIT (icontxt) |
C | blacs_gridexit (&icontxt); |
C++ | extern "FORTRAN" void blacs_gridexit(const int &); blacs_gridexit (icontxt); |
icontxt is the integer handle to the BLACS context being released. Specified as: a fullword integer value, returned by BLACS_GRIDINIT or BLACS_GRIDMAP.
You call the BLACS_EXIT routine to release all the BLACS contexts and the memory allocated by the BLACS.
Language | Call Statement |
---|---|
Fortran | CALL BLACS_EXIT (continue) |
C | blacs_exit (&continue); |
C++ | extern "FORTRAN" void blacs_exit(const int &); blacs_exit (continue); |
If continue = 0, all the BLACS contexts and memory allocated by the BLACS are released. In addition, Parallel ESSL calls MPI_Finalize to exit from MPI. There can be only one call to MPI_Finalize in your program; therefore, at the end of your program, you should either call BLACS_EXIT with continue = 0 or call MPI_Finalize directly.
If continue <> 0, the BLACS contexts and memory allocated by the BLACS are released; however, you can continue using MPI. When you are finished using MPI, remember to call MPI_Finalize directly.
continue indicates whether you intend to continue using MPI after the BLACS are released. Specified as: a fullword integer value.
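The following sketch contrasts the two cases; it assumes, purely for illustration, that the second program still has MPI work to do after leaving the BLACS:

```
* Case 1: finished with both the BLACS and MPI; Parallel ESSL
* calls MPI_Finalize on your behalf
      CALL BLACS_EXIT(0)

* Case 2: release the BLACS but keep using MPI; you must call
* MPI_Finalize yourself before the program ends
      INTEGER IERR
      CALL BLACS_EXIT(1)
*     ... additional MPI work ...
      CALL MPI_FINALIZE(IERR)
```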
In Fortran 90 programs, the Parallel ESSL sparse linear algebraic equation subroutines are invoked with the CALL statement, using the features of Fortran 90--generic interfaces, optional and keyword arguments, assumed-shape arrays, and modules.
The Fortran 90 sparse linear algebraic equation subroutines require that an explicit interface be provided for each extrinsic procedure entry in the scope where it is called, using an interface block. The interface blocks for the Parallel ESSL subroutines are provided for you in the module F90SPARSE, so you do not have to code the interface blocks yourself. In the beginning of your program, before any other specification statements, you must code the statement:
use f90sparse
This gives the XL Fortran compiler access to the interface blocks. For examples of where to code this statement in your program, see "Application Program Outline for the Fortran 90 Sparse Linear Algebraic Equations and Their Utilities".
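As a small sketch (the program name is arbitrary), the statement belongs immediately after the PROGRAM statement, ahead of IMPLICIT NONE and all other specification statements:

```
      PROGRAM SPARSE_EXAMPLE
      USE F90SPARSE
      IMPLICIT NONE
!     ... declarations, BLACS initialization, and calls to the
!     sparse linear algebraic equation subroutines ...
      END PROGRAM SPARSE_EXAMPLE
```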
For further details on coding the CALL statement and other related aspects of Fortran 90 programs, see the following Fortran manuals:
Before you can call the Parallel ESSL subroutines from your C or C++ program, you must have the Parallel ESSL header file installed on your system. The Parallel ESSL header file allows you to code your function calls as described in Part 2 of this book. The Parallel ESSL header file is named pessl.h. You should check with your system support group to verify that the appropriate Parallel ESSL header file is installed.
In the beginning of your C program, before you call any of the Parallel ESSL subroutines, you must code the following statement for the Parallel ESSL header file:
#include <pessl.h>
In the beginning of your C++ program, before you call any of the Parallel ESSL subroutines, you must code the following statement for the Parallel ESSL header file:
#include <pessl.h>
For the Level 2 and 3 PBLAS, dense and banded linear algebraic equations, and eigensystem analysis and singular value analysis subroutines, this application program outline shows how you can use the BLACS to define a process grid, set up a Type-1 array descriptor, call a Parallel ESSL subroutine, and exit the BLACS. For a complete example, see Appendix B. "Sample Programs".
```
      .
      .
      .
*
* Determine my process number and the total number of available
* processes
*
      CALL BLACS_PINFO(IAM, NNODES)
*
* Define a process grid that is as close to square as possible
*
      NPROW = INT(SQRT(REAL(NNODES)))
      NPCOL = NNODES/NPROW
*
* Get the default system context
* Define the process grid
* Determine my process row and column index
*
      CALL BLACS_GET(0, 0, ICONTXT)
      CALL BLACS_GRIDINIT(ICONTXT, 'R', NPROW, NPCOL)
      CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
*
* Only call the Parallel ESSL subroutine if I am in the process grid
*
      IF (MYROW .LT. NPROW .AND. MYCOL .LT. NPCOL) THEN
*
* Setup input arrays, scalars, array descriptors, etc.
*
        .
        .
*
* NUMROC can be used to return the size of local arrays
* For example, here is one way to setup the descriptor vector for A
*
        DESC_A(1) = DTYPE_A
        DESC_A(2) = ICONTXT
        DESC_A(3) = M_A
        DESC_A(4) = N_A
        DESC_A(5) = MB_A
        DESC_A(6) = NB_A
        DESC_A(7) = RSRC_A
        DESC_A(8) = CSRC_A
        DESC_A(9) = MAX(1, NUMROC(DESC_A(3), DESC_A(5), MYROW, DESC_A(7), NPROW))
        .
        .
        .
*
* CALL Parallel ESSL subroutine
*
        CALL PDTRAN(M, N, ALPHA, A, IA, JA, DESC_A, BETA, C, IC, JC, DESC_C)
*
* Process output arrays, scalars etc.
*
        .
        .
        .
*
* When finished with this process grid, release the process grid.
*
        CALL BLACS_GRIDEXIT(ICONTXT)
        .
      ENDIF
      .
      .
      .
*
* At the end of the program, exit from the BLACS and MPI
*
      CALL BLACS_EXIT(0)
      .
      .
      .
      END
```
The following is an outline for an application program that calls the Fortran 90 sparse linear algebraic equation subroutines and their utilities. For a more complete example, see Example--Using the Fortran 90 Sparse Subroutines or "Fortran 90 Sample Sparse Program".
```
      USE F90SPARSE
      .
      .
      .
!User-defined subroutine
      INTERFACE PARTS
        SUBROUTINE PARTS(...)
          INTEGER GLOBAL_INDEX, N, NP
          INTEGER NV
          INTEGER PV(*)
        END SUBROUTINE PARTS
      END INTERFACE
      .
      .
      .
!Define the process grid
      CALL BLACS_GET (...)
      CALL BLACS_GRIDINIT(...)
      CALL BLACS_GRIDINFO(...)
      .
      .
      .
!Allocate space for and initialize array descriptor desc_a.
      CALL PADALL(...)
!Allocate space and initialize some values
!for sparse matrix A.
      CALL PSPALL(...)
!Allocate and build vectors b and x.
      CALL PGEALL(...)
!Build the sparse matrix A with multiple calls to PSPINS.
!Each process has to call PSPINS as many times as
!necessary to insert the local rows it owns.
!Update array descriptor desc_a.
      do
        CALL PSPINS(...)
      enddo
!Build vectors b and x with multiple calls to PGEINS.
!Each process has to call PGEINS as many times as
!necessary to insert the local elements it owns.
      do
        CALL PGEINS(...)
      enddo
!Finalize the sparse matrix A and array descriptor desc_a
      CALL PSPASB(...)
!Finalize the vectors b and x.
!Matrix A and array descriptor desc_a
!must be finalized before calling PGEASB.
      CALL PGEASB(...)
!Prepare preconditioner
      CALL PSPGPR(...)
!Call solver
      CALL PSPGIS(...)
!Cleanup and exit.
!Deallocate vectors b and x
!Deallocate matrix A and the preconditioner data structure PRC
      CALL PGEFREE(...)
      CALL PSPFREE(...)
!Deallocate the array descriptor desc_a only after
!vectors b and x, and matrix A are deallocated.
      CALL PADFREE(...)
      .
      .
      .
      CALL BLACS_GRIDEXIT(...)
      CALL BLACS_EXIT(...)
```
The following is an outline for an application program that calls the Fortran 77 sparse linear algebraic equation subroutines and their utilities. For a complete example, see Example--Using the Fortran 77 Sparse Subroutines or "Fortran 77 Sample Sparse Program".
```
      .
      .
      .
      EXTERNAL PARTS
      .
      .
      .
!Define the process grid
      CALL BLACS_GET (...)
      CALL BLACS_GRIDINIT(...)
      CALL BLACS_GRIDINFO(...)
      .
      .
      .
!Initialize array descriptor desc_a.
      CALL PADINIT(...)
!Initialize some values
!for sparse matrix A.
      CALL PDSPINIT(...)
!Build the sparse matrix A with multiple calls to PDSPINS.
!Each process has to call PDSPINS as many times as
!necessary to insert the local rows it owns.
!Update array descriptor desc_a.
      do
        CALL PDSPINS(...)
      enddo
!Build vectors b and x with multiple calls to PDGEINS.
!Each process has to call PDGEINS as many times as
!necessary to insert the local elements it owns.
      do
        CALL PDGEINS(...)
      enddo
!Finalize the sparse matrix A and array descriptor desc_a
      CALL PDSPASB(...)
!Finalize the vectors b and x.
      CALL PDGEASB(...)
!Prepare preconditioner
      CALL PDSPGPR(...)
!Call solver
      CALL PDSPGIS(...)
      .
      .
      .
      CALL BLACS_GRIDEXIT(...)
      CALL BLACS_EXIT(...)
```