This section describes the calling sequence arguments for vectors and matrices, and shows how to distribute vectors, matrices, and sequences in a message passing program for the following areas:
An example of block-cyclic distribution of a global matrix in a Fortran 90 program in a message passing environment is shown in Appendix B. "Sample Programs". See the following:
For the Level 2 and 3 PBLAS, Dense Linear Algebraic Equations, and Eigensystem Analysis and Singular Value Analysis subroutines, certain calling sequence arguments are used to specify block-cyclically-distributed vectors or matrices.
Table 14 describes the arguments associated with a vector X. Table 15 describes the arguments associated with a matrix A.
Table 14. Calling Sequence Arguments for a Block-Cyclically-Distributed Vector
Argument | Meaning |
---|---|
x | is the local part of the global matrix X. To determine the size of the local array for X, see "Determining the Number of Rows and Columns in Your Local Arrays". |
ix | is the row index of global matrix X. |
jx | is the column index of global matrix X. |
desc_x | is the array descriptor for global matrix X. (See Table 16.) |
incx | is the stride for the global vector X. |
Note: | A global vector of length n is distributed across process rows the same way as an n × 1 matrix is (in this case M_X is n and N_X is 1). A global vector of length n is distributed across process columns the same way as a 1 × n matrix is (in this case M_X is 1 and N_X is n). |
Table 15. Calling Sequence Arguments for a Block-Cyclically-Distributed Matrix
Argument | Meaning |
---|---|
a | is the local part of the global matrix A. To determine the size of the local array for A, see "Determining the Number of Rows and Columns in Your Local Arrays". |
ia | is the row index of the global matrix A. |
ja | is the column index of the global matrix A. |
desc_a | is the array descriptor for global matrix A. (See Table 16.) |
An array descriptor, which is an integer array, is needed for each block-cyclically-distributed vector or matrix. The process grid definition and array descriptor are used to establish the mapping between the global vector or matrix and its corresponding process and distributed memory location.
Throughout this book, the _ (underscore) symbol in the array descriptor is followed by an X to indicate a vector or an A to indicate a matrix.
An example of setting up descriptor arrays in a Fortran 90 program is shown in Appendix B. "Sample Programs". See the subroutines initialize_rarray and initialize_carray in "Module Scale (Message Passing)".
Table 16 shows the type-1 array descriptor, as it is used in the Level 2 and 3 PBLAS, Dense Linear Algebraic Equations, and Eigensystem Analysis and Singular Value Analysis subroutines.
Table 16. Type-1 Array Descriptor for Block-Cyclically Distributed Vector or Matrix
DESC_( ) | Symbolic name | Meaning |
---|---|---|
1 | DTYPE_ | Descriptor type, where DTYPE_=1 |
2 | CTXT_ | BLACS context in which the global matrix is defined. (See "Initializing the BLACS".) |
3 | M_ | Number of rows in the global matrix |
4 | N_ | Number of columns in the global matrix |
5 | MB_ | Row block size |
6 | NB_ | Column block size |
7 | RSRC_ | The process row of the p × q process grid over which the first row of the global matrix is distributed |
8 | CSRC_ | The process column of the p × q process grid over which the first column of the global matrix is distributed |
9 | LLD_ | Leading dimension of the local array. (See "Determining the Number of Rows and Columns in Your Local Arrays".) This value may be different on each process. |
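The descriptor is an ordinary nine-element integer array, so it can be set by explicit assignment. Following is a minimal Fortran 90 sketch based on Table 16; the variables ictxt, m, n, mb, nb, and lld are illustrative and assumed to have been set elsewhere (ictxt by BLACS_GRIDINIT):

      INTEGER :: desc_a(9)
      desc_a(1) = 1        ! DTYPE_: type-1 descriptor
      desc_a(2) = ictxt    ! CTXT_: BLACS context (assumed from BLACS_GRIDINIT)
      desc_a(3) = m        ! M_: rows in the global matrix
      desc_a(4) = n        ! N_: columns in the global matrix
      desc_a(5) = mb       ! MB_: row block size
      desc_a(6) = nb       ! NB_: column block size
      desc_a(7) = 0        ! RSRC_: process row holding the first row
      desc_a(8) = 0        ! CSRC_: process column holding the first column
      desc_a(9) = lld      ! LLD_: local leading dimension (may differ by process)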
After a global vector or matrix is block-cyclically distributed over a process grid, you may decide to use only a portion of the global data structure. This is called a submatrix. For examples of how to specify the calling sequence arguments, listed in Table 14 and Table 15, for a submatrix, see:
Suppose you decide to distribute your global vector or matrix over the process grid, starting at a process other than 0,0. For examples of how to set the array descriptor values, listed in Table 16, see:
In a Parallel ESSL calling sequence, you specify an array that contains the local part of the global vector or matrix. To determine LOCp(M_) and LOCq(N_), which are used in the subroutine descriptions in Part 2 of this book, you must make a call to NUMROC:

LOCp(M_) = NUMROC (M_, MB_, myrow, RSRC_, p)

LOCq(N_) = NUMROC (N_, NB_, mycol, CSRC_, q)

where myrow and mycol are the row and column indices of the process in the process grid, and p and q are the number of rows and columns in the process grid.
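For illustration, here is a hedged Fortran 90 sketch of sizing and allocating a local array with NUMROC; it assumes myrow, mycol, p, and q have already been obtained from BLACS_GRIDINFO, that m, n, mb, nb, rsrc, and csrc hold the descriptor values, and that the element type matches the subroutine you plan to call:

      INTEGER, EXTERNAL :: NUMROC
      INTEGER :: locp, locq, lld
      REAL*8, ALLOCATABLE :: a(:,:)

      locp = NUMROC(m, mb, myrow, rsrc, p)   ! LOCp(M_): local rows
      locq = NUMROC(n, nb, mycol, csrc, q)   ! LOCq(N_): local columns
      lld  = MAX(1, locp)                    ! leading dimension, LLD_
      ALLOCATE( a(lld, MAX(1, locq)) )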
For the Banded Linear Algebraic Equations, certain calling sequence arguments are used to specify block-cyclically distributed matrices on one-dimensional process grids.
Although the global array is block-cyclically distributed, the actual submatrix used in computation is either block-row or block-column distributed. See the appropriate subroutine for restrictions.
A symmetric band matrix must be distributed over a one-dimensional process grid. Table 17 describes the calling sequence arguments associated with a symmetric band matrix.
Table 17. Calling Sequence Arguments for a Distributed Symmetric Band Matrix
Argument | Meaning |
---|---|
n | is the order of the global symmetric band submatrix A. |
a | is the local part of the global symmetric band matrix A. |
ja | is the column index of the global symmetric band matrix A. |
desc_a | is the array descriptor for the global symmetric band matrix A. For more details, see Table 21 and Table 16. |
A general tridiagonal matrix, represented as three vectors, must be distributed over a one-dimensional process grid using a block-cyclic data distribution. Because vectors are one-dimensional data structures, you can use a type-501, type-502, or type-1 array descriptor regardless of whether the process grid is p × 1 or 1 × p. Table 18 describes the calling sequence arguments associated with a general tridiagonal matrix.
Table 18. Calling Sequence Arguments for General Tridiagonal Matrix
Argument | Meaning |
---|---|
n | is the order of the global general tridiagonal submatrix A. |
dl, d, du | is the local part of the global vectors. (The general tridiagonal matrix A is stored in tridiagonal storage mode in dl, d, and du.) |
ia | is the row index of the global general tridiagonal matrix A. |
desc_a | is the array descriptor for the global general tridiagonal matrix A. For more details, see Table 21, Table 16, or Table 22. |
A symmetric tridiagonal matrix, represented as two vectors, must be distributed over a one-dimensional process grid using block-cyclic data distribution.
Note: | For both serial ESSL and Parallel ESSL, the n-1 elements of the equal off-diagonals of a symmetric tridiagonal matrix are stored in a one-dimensional vector of length n. To be compatible with ScaLAPACK, in Parallel ESSL, the off-diagonal is chosen to be the superdiagonal and is stored in elements ia through ia+n-2. In the serial ESSL library, the off-diagonal is chosen to be the subdiagonal and is stored in elements 2 through n. |
Because vectors are one-dimensional data structures, you can use a type-501, type-502, or type-1 array descriptor regardless of whether the process grid is p × 1 or 1 × p. Table 19 describes the calling sequence arguments associated with a symmetric tridiagonal matrix.
Table 19. Calling Sequence Arguments for a Symmetric Tridiagonal Matrix
Argument | Meaning |
---|---|
n | is the order of the global symmetric tridiagonal submatrix A. |
d, e | is the local part of the global vectors. (The symmetric tridiagonal matrix A is stored in parallel-symmetric-tridiagonal storage mode in d and e.) |
ia | is the row index of the global symmetric tridiagonal matrix A. |
desc_a | is the array descriptor for the global symmetric tridiagonal matrix A. For more details, see Table 21, Table 16, or Table 22. |
For the Banded Linear Algebraic Equations subroutines, a general matrix consisting of multiple right-hand sides must be distributed over a one-dimensional process grid. Table 20 describes the calling sequence arguments associated with the general matrix.
Table 20. Calling Sequence Arguments for a Matrix Containing the Multiple Right-Hand Sides
Argument | Meaning |
---|---|
n | is the number of rows in the global general submatrix B. |
b | is the local part of the global general matrix B. |
ib | is the row index of the global general matrix B. |
desc_b | is the array descriptor for the global general matrix B. For more details, see Table 22 and Table 16. |
An array descriptor, which is an integer array, is needed for each block-distributed matrix. The process grid definition and the array descriptor are used to establish the mapping between the global matrix and its corresponding process and distributed memory location.
In the Banded Linear Algebraic Equations sections throughout this book, the _ (underscore) symbol in the array descriptor is followed by an A or a B. A indicates a banded, tridiagonal, or symmetric tridiagonal matrix. B indicates a matrix containing the multiple right-hand sides.
When you place a call to the banded or tridiagonal subroutines, you must be careful to choose consistent combinations of array descriptor types for matrix A and matrix B, and process grids. For consistent combinations, see the "Notes and Coding Rules" in the subroutine descriptions in Part 2 of this book.
Therefore, depending on which subroutine you are using in the Banded Linear Algebraic Equations, you may choose different array descriptors in the same subroutine calling sequence. Keep in mind that you must create only one process grid; that is, CTXT_A = CTXT_B.
For example, when calling PDPBSV, suppose you choose DTYPE_A = 501 for the band matrix A and DTYPE_B = 502 for matrix B. If you specify CTXT_A as 1 × p, you must also specify CTXT_B as 1 × p. Or if you specify CTXT_A as p × 1, you must also specify CTXT_B as p × 1. For an example of how to set the array descriptor values, see "Example".
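As a sketch of that rule (variable names are illustrative, and ictxt is assumed to come from a single BLACS_GRIDINIT call), the two descriptors may differ in type but must carry the same context:

      desc_a(1) = 501      ! DTYPE_A = 501 for the band matrix A
      desc_a(2) = ictxt    ! CTXT_A
      desc_b(1) = 502      ! DTYPE_B = 502 for matrix B
      desc_b(2) = ictxt    ! CTXT_B: the same context, so CTXT_A = CTXT_B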
Table 21. Type-501 Array Descriptor
DESC_( ) | Symbolic name | Value |
---|---|---|
1 | DTYPE_ | DTYPE_ = 501 for 1 × p or p × 1, where p is the number of processes in a process grid. |
2 | CTXT_ | BLACS context in which the global matrix is defined. The BLACS process grid can be defined as 1 × p or p × 1. (See "Initializing the BLACS".) |
3 | N_ | Number of columns in the global matrix |
4 | NB_ | Column block size. |
5 | CSRC_ | The process column over which the first column of the global matrix is distributed |
6 | LLD_ | Leading dimension of the local array. (See "Determining the Number of Rows or Columns in Your Local Arrays".) This value may be different on each process. For the tridiagonal subroutines, this argument is ignored. |
7 | -- | Reserved. |
Table 22. Type-502 Array Descriptor
DESC_( ) | Symbolic name | Value |
---|---|---|
1 | DTYPE_ | DTYPE_ = 502 for p × 1 or 1 × p, where p is the number of processes in a process grid. |
2 | CTXT_ | BLACS context in which the global matrix is defined. The BLACS process grid can be defined as 1 × p or p × 1. (See "Initializing the BLACS".) |
3 | M_ | Number of rows in the global matrix |
4 | MB_ | Row block size. |
5 | RSRC_ | The process row over which the first row of the global matrix is distributed |
6 | LLD_ | Leading dimension of the local array. (See "Determining the Number of Rows or Columns in Your Local Arrays".) This value may be different on each process. For the tridiagonal subroutines, this argument is ignored for matrix A. |
7 | -- | Reserved. |
For local arrays described by a type-501 array descriptor, the number of rows in the local array is always equal to the number of rows in the global matrix. The number of columns in the local array is determined as follows:

For a 1 × p process grid: LOCq(N_) = NUMROC(N_,NB_,mycol,CSRC_,q)

For a p × 1 process grid: LOCq(N_) = NUMROC(N_,NB_,myrow,CSRC_,q)

where mycol and myrow are the column and row indices of the process in the process grid, and q is the number of processes in the process grid.
For local arrays described by a type-502 array descriptor, the number of columns in the local array is always equal to the number of columns in the global matrix. The number of rows in the local array is determined as follows:

For a p × 1 process grid: LOCp(M_) = NUMROC(M_,MB_,myrow,RSRC_,p)

For a 1 × p process grid: LOCp(M_) = NUMROC(M_,MB_,mycol,RSRC_,p)

where myrow and mycol are the row and column indices of the process in the process grid, and p is the number of processes in the process grid.
You must distribute your data before calling Parallel ESSL from your message passing program. This section shows you how to distribute your data.
All the Parallel ESSL message passing subroutines, except the Banded Linear Algebraic Equations and Fourier transform subroutines, support block-cyclic distribution. The Banded Linear Algebraic Equations and the Fourier transform subroutines only support block distribution.
The following sections provide examples for distributing data over one- or two-dimensional process grids:
Parallel ESSL supports block-cyclic distribution for vectors over one- or two-dimensional process grids. A vector is distributed over a single row or column of the process grid, except for PDURNG. For PDURNG, vectors are distributed block-cyclically over the entire one- or two-dimensional process grid using row-major order, where the length n of the vector x must be evenly divisible by the number of available processes np multiplied by the block size nb; in other words, n/((np)(nb)) must be an integer.
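To make the block-cyclic pattern concrete, the following Fortran fragment (a sketch, not from the sample programs; all names are illustrative) maps a 1-based global index ig to its owning process and local index, for block size nb, p processes, and starting process src:

      iblk = (ig-1)/nb                          ! global block number, counting from 0
      proc = MOD(iblk + src, p)                 ! process that owns global element ig
      il   = (iblk/p)*nb + MOD(ig-1, nb) + 1    ! 1-based index in that process's local array

For the example that follows (nb = 3, p = 4, src = 0), global element 13 lands on process 0 at local position 4, which matches the local arrays shown below.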
This example shows how a global vector of length 24 with blocks of size 3 is distributed block-cyclically over one-dimensional process grids. Assume the following:
Global vector x:
B,D     0
    *      *
    |   8  |
 0  |   2  |
    |   3  |
    |  --  |
    |   6  |
 1  |   5  |
    |   1  |
    |  --  |
    |   9  |
 2  |   5  |
    |   3  |
    |  --  |
    |   6  |
 3  |   2  |
    |   4  |
    |  --  |
    |  10  |
 4  |   7  |
    |   4  |
    |  --  |
    |   2  |
 5  |   8  |
    |   2  |
    |  --  |
    |   8  |
 6  |   9  |
    |   2  |
    |  --  |
    |   3  |
 7  |  11  |
    |  10  |
    *      *
Column-oriented, 4 × 1 process grid:
B,D | 0 |
---|---|
0 4 | P00 |
1 5 | P10 |
2 6 | P20 |
3 7 | P30 |
Local arrays:
p,q  |  0
-----|-----
     |  8
     |  2
     |  3
 0   | 10
     |  7
     |  4
-----|-----
     |  6
     |  5
     |  1
 1   |  2
     |  8
     |  2
-----|-----
     |  9
     |  5
     |  3
 2   |  8
     |  9
     |  2
-----|-----
     |  6
     |  2
     |  4
 3   |  3
     | 11
     | 10

For the column-oriented example, the array descriptor DESC_X contains the following:
DESC_X( ) | Symbolic name | Value |
---|---|---|
1 | DTYPE_X | 1 |
2 | CTXT_X | BLACS context |
3 | M_X | 24 |
4 | N_X | 1 |
5 | MB_X | 3 |
6 | NB_X | 1 |
7 | RSRC_X | 0 |
8 | CSRC_X | 0 |
9 | LLD_X | 6 |
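In a program, this descriptor could be filled in a single assignment (a sketch; ictxt is assumed to hold the BLACS context of the 4 × 1 grid):

      INTEGER :: desc_x(9)
      desc_x(1:9) = (/ 1, ictxt, 24, 1, 3, 1, 0, 0, 6 /)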
Row-oriented, 1 × 4 process grid:
B,D | 0 4 | 1 5 | 2 6 | 3 7 |
---|---|---|---|---|
0 | P00 | P01 | P02 | P03 |
Local array:
p,q  |       0        |       1       |       2       |       3
-----|----------------|---------------|---------------|----------------
 0   | 8 2 3 10 7 4   | 6 5 1 2 8 2   | 9 5 3 8 9 2   | 6 2 4 3 11 10

For the row-oriented example, the array descriptor DESC_X contains the following:
DESC_X( ) | Symbolic name | Value |
---|---|---|
1 | DTYPE_X | 1 |
2 | CTXT_X | BLACS context |
3 | M_X | 1 |
4 | N_X | 24 |
5 | MB_X | 1 |
6 | NB_X | 3 |
7 | RSRC_X | 0 |
8 | CSRC_X | 0 |
9 | LLD_X | 1 |
Note: | The same global vector was distributed over a 4 × 1 grid and then over a 1 × 4 grid. Notice the values contained in the corresponding local arrays are identical. |
This example shows how a global vector of length 18 with block size of 3 is distributed over two-dimensional grids. When a two-dimensional process grid is used, the global vector can be distributed over any single row or any single column of the grid. Assume the following:
Global vector x:
B,D     0
    *     *
    |  4  |
 0  | 11  |
    | 17  |
    | --  |
    | 21  |
 1  |  3  |
    |  7  |
    | --  |
    | 12  |
 2  |  5  |
    |  3  |
    | --  |
    | 15  |
 3  |  3  |
    |  4  |
    | --  |
    |  9  |
 4  | 17  |
    |  1  |
    | --  |
    | 10  |
 5  |  9  |
    | 25  |
    *     *
Two-dimensional, 2 × 3 process grid:
B,D | -- | -- | 0 |
---|---|---|---|
0 2 4 | P00 | P01 | P02 |
1 3 5 | P10 | P11 | P12 |
If the global vector is distributed over the third column of a 2 × 3 process grid, then P02 and P12 contain the following local arrays:
p,q  |  2
-----|-----
     |  4
     | 11
     | 17
     | 12
 0   |  5
     |  3
     |  9
     | 17
     |  1
-----|-----
     | 21
     |  3
     |  7
     | 15
 1   |  3
     |  4
     | 10
     |  9
     | 25

For the single column example, the array descriptor DESC_X contains the following:
DESC_X( ) | Symbolic name | Value |
---|---|---|
1 | DTYPE_X | 1 |
2 | CTXT_X | BLACS context |
3 | M_X | 18 |
4 | N_X | 1 |
5 | MB_X | 3 |
6 | NB_X | 1 |
7 | RSRC_X | 0 |
8 | CSRC_X | 2 |
9 | LLD_X | 9 |
If the global vector is distributed over the second row of a 2 × 3 process grid, then P10, P11, and P12 contain the following local arrays:
p,q  |        0         |        1        |        2
-----|------------------|-----------------|-----------------
 1   | 4 11 17 15 3 4   | 21 3 7 9 17 1   | 12 5 3 10 9 25

For the single row example, the array descriptor DESC_X contains the following:
DESC_X( ) | Symbolic name | Value |
---|---|---|
1 | DTYPE_X | 1 |
2 | CTXT_X | BLACS context |
3 | M_X | 1 |
4 | N_X | 18 |
5 | MB_X | 1 |
6 | NB_X | 3 |
7 | RSRC_X | 1 |
8 | CSRC_X | 0 |
9 | LLD_X | 1 |
For PDURNG, the global vector is distributed block-cyclically over the entire 2 × 3 process grid using row-major order, as follows:
p,q  |     0     |     1     |     2
-----|-----------|-----------|-----------
 0   |  4 11 17  | 21  3  7  | 12  5  3
-----|-----------|-----------|-----------
 1   | 15  3  4  |  9 17  1  | 10  9 25
Notes:
Following is an example of uneven block-cyclic distribution for a global vector of length 20 with block size of 3, where the two local arrays are different sizes. In this case, a fragment of a block with two elements occurs at the end of the vector. Assume the following:
X = (0, 5, 6, 3, 21, 5, 6, 1, 8, 9, 13, 11, 12, 15, 14, 15, 11, 17, 18, 19)
Following is a global vector x with block size 3:
B,D     0
    *     *
    |  0  |
 0  |  5  |
    |  6  |
    | --  |
    |  3  |
 1  | 21  |
    |  5  |
    | --  |
    |  6  |
 2  |  1  |
    |  8  |
    | --  |
    |  9  |
 3  | 13  |
    | 11  |
    | --  |
    | 12  |
 4  | 15  |
    | 14  |
    | --  |
    | 15  |
 5  | 11  |
    | 17  |
    | --  |
 6  | 18  |
    | 19  |
    *     *
Two-dimensional, 2 × 3 process grid:
B,D | 0 | -- | -- |
---|---|---|---|
0 2 4 6 | P00 | P01 | P02 |
1 3 5 | P10 | P11 | P12 |
If the vector is distributed over the first column of a 2 × 3 process grid, then P00 and P10 contain the following local arrays:
p,q  |  0
-----|-----
     |  0
     |  5
     |  6
     |  6
     |  1
 0   |  8
     | 12
     | 15
     | 14
     | 18
     | 19
-----|-----
     |  3
     | 21
     |  5
     |  9
 1   | 13
     | 11
     | 15
     | 11
     | 17
Array descriptor DESC_X contains the following:
DESC_X( ) | Symbolic name | Value |
---|---|---|
1 | DTYPE_X | 1 |
2 | CTXT_X | BLACS context |
3 | M_X | 20 |
4 | N_X | 1 |
5 | MB_X | 3 |
6 | NB_X | 1 |
7 | RSRC_X | 0 |
8 | CSRC_X | 0 |
9 | LLD_X | 11 (For P00); 9 (For P10) |
If the vector is distributed over the first row of the 2 × 3 process grid, then P00, P01, and P02 contain the following local arrays:
p,q  |           0             |         1          |        2
-----|-------------------------|--------------------|-------------------
 0   | 0 5 6 9 13 11 18 19     | 3 21 5 12 15 14    | 6 1 8 15 11 17
Array descriptor DESC_X contains the following:
DESC_X( ) | Symbolic name | Value |
---|---|---|
1 | DTYPE_X | 1 |
2 | CTXT_X | BLACS context |
3 | M_X | 1 |
4 | N_X | 20 |
5 | MB_X | 1 |
6 | NB_X | 3 |
7 | RSRC_X | 0 |
8 | CSRC_X | 0 |
9 | LLD_X | 1 |
The Parallel ESSL subroutines, except the Banded Linear Algebraic Equations, support block-cyclic data distribution for matrices using one- or two-dimensional process grids. The Banded Linear Algebraic Equations support only block data distribution using one-dimensional process grids.
The following terminology is used when it is necessary to distinguish special types of matrices:
This section describes how to distribute a matrix block-cyclically over a one-dimensional process grid. It also shows how matrices for the Banded Linear Algebraic Equations are distributed over a one-dimensional process grid using block distribution.
The examples that follow show how a 6 × 8 global matrix A with blocks of size 2 × 2 is distributed block-cyclically over one-dimensional process grids. Assume the following global matrix A:
B,D      0         1         2         3
     *                                      *
 0   |  0  1  |  2  3  |  4  5  |  6  7  |
     | 10 11  | 12 13  | 14 15  | 16 17  |
     | -------|--------|--------|------- |
 1   | 20 21  | 22 23  | 24 25  | 26 27  |
     | 30 31  | 32 33  | 34 35  | 36 37  |
     | -------|--------|--------|------- |
 2   | 40 41  | 42 43  | 44 45  | 46 47  |
     | 50 51  | 52 53  | 54 55  | 56 57  |
     *                                      *
Column-oriented, 3 × 1 process grid:
B,D | 0 1 2 3 |
---|---|
0 | P00 |
1 | P10 |
2 | P20 |
Local arrays:
p,q  |            0
-----|---------------------------
 0   |  0  1  2  3  4  5  6  7
     | 10 11 12 13 14 15 16 17
-----|---------------------------
 1   | 20 21 22 23 24 25 26 27
     | 30 31 32 33 34 35 36 37
-----|---------------------------
 2   | 40 41 42 43 44 45 46 47
     | 50 51 52 53 54 55 56 57

For the column-oriented example, the array descriptor DESC_A contains:
DESC_A( ) | Symbolic name | Value |
---|---|---|
1 | DTYPE_A | 1 |
2 | CTXT_A | BLACS context |
3 | M_A | 6 |
4 | N_A | 8 |
5 | MB_A | 2 |
6 | NB_A | 2 |
7 | RSRC_A | 0 |
8 | CSRC_A | 0 |
9 | LLD_A | 2 |
Row-oriented, 1 × 2 process grid:
B,D | 0 2 | 1 3 |
---|---|---|
0 1 2 | P00 | P01 |
Local arrays:
p,q  |       0        |       1
-----|----------------|----------------
     |  0  1  4  5    |  2  3  6  7
     | 10 11 14 15    | 12 13 16 17
     | 20 21 24 25    | 22 23 26 27
 0   | 30 31 34 35    | 32 33 36 37
     | 40 41 44 45    | 42 43 46 47
     | 50 51 54 55    | 52 53 56 57

For the row-oriented example, the array descriptor DESC_A contains the following:
DESC_A( ) | Symbolic name | Value |
---|---|---|
1 | DTYPE_A | 1 |
2 | CTXT_A | BLACS context |
3 | M_A | 6 |
4 | N_A | 8 |
5 | MB_A | 2 |
6 | NB_A | 2 |
7 | RSRC_A | 0 |
8 | CSRC_A | 0 |
9 | LLD_A | 6 |
For an example of distributing a matrix over a one-dimensional process grid in a Fortran 90 program, see matrix F in Appendix B. "Sample Programs", which is:
This section shows how to distribute a symmetric band matrix A over a one-dimensional process grid using block-cyclic distribution.
Assume the following symmetric band matrix A of size 9 × 9 with a half bandwidth of 2:
    *                            *
    | 11 21 31  0  0  0  0  0  0 |
    | 21 22 32 42  0  0  0  0  0 |
    | 31 32 33 34 53  0  0  0  0 |
A = |  0 42 34 44 54 64  0  0  0 |
    |  0  0 53 54 55 65 75  0  0 |
    |  0  0  0 64 65 66 76 86  0 |
    |  0  0  0  0 75 76 77 87 97 |
    |  0  0  0  0  0 86 87 88 98 |
    |  0  0  0  0  0  0 97 98 99 |
    *                            *
Matrix A must be stored in upper- or lower-band-packed storage mode. The sections that follow contain examples describing these two storage modes. In these examples, matrix A is stored in an array with dimensions 3 × 9.
The global matrix A with block size of 2 is stored in upper-band-packed storage mode, as follows:
B,D      0        1       2       3      4
    *                                       *
    |  *  *  | 31 42 | 53 64 | 75 86 | 97 |
 0  |  * 21  | 32 34 | 54 65 | 76 87 | 98 |
    | 11 22  | 33 44 | 55 66 | 77 88 | 99 |
    *                                       *
Following is a row-oriented, 1 × 3 process grid:
B,D | 0 3 | 1 4 | 2 |
---|---|---|---|
0 | P00 | P01 | P02 |
The following local arrays A are distributed block-cyclically over the 1 × 3 process grid:
p,q  |      0        |     1      |    2
-----|---------------|------------|--------
     |  *  * 75 86   | 31 42 97   | 53 64
 0   |  * 21 76 87   | 32 34 98   | 54 65
     | 11 22 77 88   | 33 44 99   | 55 66
where * means you do not have to store a value in that position in the local array. However, these storage positions are required and overwritten during the computation.
The type-501 array descriptor DESC_A contains the following:
DESC_A( ) | Symbolic name | Value |
---|---|---|
1 | DTYPE_A | DTYPE_A = 501 for 1 × p |
2 | CTXT_A | BLACS context |
3 | N_A | 9 |
4 | NB_A | 2 |
5 | CSRC_A | 0 |
6 | LLD_A | 3 |
7 | -- | Reserved |
Alternately, the type-1 array descriptor DESC_A contains the following:
DESC_A( ) | Symbolic name | Value |
---|---|---|
1 | DTYPE_A | DTYPE_A = 1 for 1 × p |
2 | CTXT_A | BLACS context |
3 | M_A | 3 |
4 | N_A | 9 |
5 | MB_A | 1 |
6 | NB_A | 2 |
7 | RSRC_A | 0 |
8 | CSRC_A | 0 |
9 | LLD_A | 3 |
The global matrix A with block size of 2 is stored in lower-band-packed storage mode, as follows:
B,D      0        1       2       3      4
    *                                       *
    | 11 22  | 33 44 | 55 66 | 77 88 | 99 |
 0  | 21 32  | 34 54 | 65 76 | 87 98 |  * |
    | 31 42  | 53 64 | 75 86 | 97  * |  * |
    *                                       *
Following is a row-oriented, 1 × 3 process grid:
B,D | 0 3 | 1 4 | 2 |
---|---|---|---|
0 | P00 | P01 | P02 |
The following local arrays A are distributed block-cyclically over the 1 × 3 process grid:
p,q  |      0        |     1      |    2
-----|---------------|------------|--------
     | 11 22 77 88   | 33 44 99   | 55 66
 0   | 21 32 87 98   | 34 54  *   | 65 76
     | 31 42 97  *   | 53 64  *   | 75 86
where * means you do not have to store a value in that position in the local array. However, these storage positions are required and overwritten during the computation.
The type-501 array descriptor DESC_A contains the following:
DESC_A( ) | Symbolic name | Value |
---|---|---|
1 | DTYPE_A | DTYPE_A = 501 for 1 × p |
2 | CTXT_A | BLACS context |
3 | N_A | 9 |
4 | NB_A | 2 |
5 | CSRC_A | 0 |
6 | LLD_A | 3 |
7 | -- | Reserved |
Alternately, the type-1 array descriptor DESC_A contains the following:
DESC_A( ) | Symbolic name | Value |
---|---|---|
1 | DTYPE_A | DTYPE_A = 1 for 1 × p |
2 | CTXT_A | BLACS context |
3 | M_A | 3 |
4 | N_A | 9 |
5 | MB_A | 1 |
6 | NB_A | 2 |
7 | RSRC_A | 0 |
8 | CSRC_A | 0 |
9 | LLD_A | 3 |
For more information on how to store symmetric band matrices, see the ESSL Version 3 Guide and Reference manual.
A general tridiagonal matrix, represented as three vectors, must be distributed over a one-dimensional process grid using a block-cyclic data distribution. Because vectors are one-dimensional data structures, you can use a type-501, type-502, or type-1 array descriptor regardless of whether the process grid is 1 × p or p × 1.
The first part of this section shows how to distribute a general tridiagonal matrix A over a p × 1 process grid. The second part shows how to distribute the same matrix over a 1 × p process grid. In both cases, the values contained in the corresponding local arrays are identical.
Assume the following general tridiagonal matrix A of size 7 × 7:
    *                      *
    | 11 12  0  0  0  0  0 |
    | 21 22 23  0  0  0  0 |
    |  0 32 33 34  0  0  0 |
    |  0  0 43 44 45  0  0 |
    |  0  0  0 54 55 56  0 |
    |  0  0  0  0 65 66 67 |
    |  0  0  0  0  0 76 77 |
    *                      *
Matrix A is stored in tridiagonal storage mode in the following three vectors:
dl= (*, 21, 32, 43, 54, 65, 76)
d= (11, 22, 33, 44, 55, 66, 77)
du= (12, 23, 34, 45, 56, 67, *)
The general tridiagonal matrix A is stored in tridiagonal storage mode in vectors dl, d, and du.
Following is global vector dl:
B,D     0
    *     *
 0  |  *  |
    | 21  |
    | --  |
 1  | 32  |
    | 43  |
    | --  |
 2  | 54  |
    | 65  |
    | --  |
 3  | 76  |
    *     *
Following is global vector d:
B,D     0
    *     *
 0  | 11  |
    | 22  |
    | --  |
 1  | 33  |
    | 44  |
    | --  |
 2  | 55  |
    | 66  |
    | --  |
 3  | 77  |
    *     *
Following is global vector du:
B,D     0
    *     *
 0  | 12  |
    | 23  |
    | --  |
 1  | 34  |
    | 45  |
    | --  |
 2  | 56  |
    | 67  |
    | --  |
 3  |  *  |
    *     *
Following is a column-oriented, 3 × 1 process grid:
B,D | 0 |
---|---|
0 3 | P00 |
1 | P10 |
2 | P20 |
The arrays are block-cyclically distributed over the 3 × 1 process grid.
Following are the local arrays for DL:
p,q  |  0
-----|-----
 0   |  *
     | 21
     | 76
-----|-----
 1   | 32
     | 43
-----|-----
 2   | 54
     | 65
Following are the local arrays for D:
p,q  |  0
-----|-----
 0   | 11
     | 22
     | 77
-----|-----
 1   | 33
     | 44
-----|-----
 2   | 55
     | 66
Following are the local arrays for DU:
p,q  |  0
-----|-----
 0   | 12
     | 23
     |  *
-----|-----
 1   | 34
     | 45
-----|-----
 2   | 56
     | 67
where "*" means you do not have to store a value in that position in the local array. However, these storage positions are required.
The type-502 array descriptor DESC_A contains the following:
DESC_A( ) | Symbolic name | Value |
---|---|---|
1 | DTYPE_A | DTYPE_A = 502 for p × 1 |
2 | CTXT_A | BLACS context |
3 | M_A | 7 |
4 | MB_A | 2 |
5 | RSRC_A | 0 |
6 | LLD_A | Not used |
7 | - | Reserved |
Alternately, the type-1 array descriptor DESC_A contains the following:
DESC_A( ) | Symbolic name | Value |
---|---|---|
1 | DTYPE_A | DTYPE_A = 1 for p × 1 |
2 | CTXT_A | BLACS context |
3 | M_A | 7 |
4 | N_A | 1 |
5 | MB_A | 2 |
6 | NB_A | 1 |
7 | RSRC_A | 0 |
8 | CSRC_A | 0 |
9 | LLD_A | Not used |
The general tridiagonal matrix A is stored in tridiagonal storage mode in vectors dl, d, and du. Because vectors are one-dimensional data structures, the block-cyclically distributed arrays on a 1 × p process grid are identical to the block-cyclically distributed arrays on a p × 1 process grid.
Following is global vector dl:
B,D 0 1 2 3 * * 0 | * 21 | 32 43 | 54 65 | 76 | * *
Following is global vector d:
B,D 0 1 2 3 * * 0 | 11 22 | 33 44 | 55 66 | 77 | * *
Following is global vector du:
B,D 0 1 2 3 * * 0 | 12 23 | 34 45 | 56 67 | * | * *
Following is a row-oriented, 1 × 3 process grid:
B,D | 0 3 | 1 | 2 |
---|---|---|---|
0 | P00 | P01 | P02 |
The arrays are block-cyclically distributed over the 1 × 3 process grid.
Following are the local arrays for DL:
p,q  |    0      |   1    |   2
-----|-----------|--------|--------
 0   | * 21 76   | 32 43  | 54 65
Following are the local arrays for D:
p,q  |     0      |   1    |   2
-----|------------|--------|--------
 0   | 11 22 77   | 33 44  | 55 66
Following are the local arrays for DU:
p,q  |     0      |   1    |   2
-----|------------|--------|--------
 0   | 12 23  *   | 34 45  | 56 67
where "*" means you do not have to store a value in that position in the local array. However, these storage positions are required.
The type-501 array descriptor DESC_A contains the following:
DESC_A( ) | Symbolic name | Value |
---|---|---|
1 | DTYPE_A | DTYPE_A = 501 for 1 × p |
2 | CTXT_A | BLACS context |
3 | N_A | 7 |
4 | NB_A | 2 |
5 | CSRC_A | 0 |
6 | LLD_A | Not used |
7 | - | Reserved |
Alternately, the type-1 array descriptor DESC_A contains the following:
DESC_A( ) | Symbolic name | Value |
---|---|---|
1 | DTYPE_A | DTYPE_A = 1 for 1 × p |
2 | CTXT_A | BLACS context |
3 | M_A | 1 |
4 | N_A | 7 |
5 | MB_A | 1 |
6 | NB_A | 2 |
7 | RSRC_A | 0 |
8 | CSRC_A | 0 |
9 | LLD_A | Not used |
For more information on how to store general tridiagonal matrices, see the ESSL Version 3 Guide and Reference manual.
A symmetric tridiagonal matrix, represented as two vectors, must be distributed over a one-dimensional process grid using a block-cyclic data distribution. Because vectors are one-dimensional data structures, you can use a type-501, type-502, or type-1 array descriptor regardless of whether the process grid is p × 1 or 1 × p.
Note: | For both serial ESSL and Parallel ESSL, the n-1 elements of the equal off-diagonals of a symmetric tridiagonal matrix are stored in a one-dimensional vector of length n. To be compatible with ScaLAPACK, in Parallel ESSL, the off-diagonal is chosen to be the superdiagonal and is stored in elements 1 through n-1. In the serial ESSL library, the off-diagonal is chosen to be the subdiagonal and is stored in elements 2 through n. |
The first part of this section shows how to distribute a symmetric tridiagonal matrix A over a p × 1 process grid. The second part shows how to distribute the same matrix over a 1 × p process grid. In both cases, the values contained in the corresponding local arrays are identical.
Assume the following symmetric tridiagonal matrix A of size 7 × 7:
    *                      *
    | 10  1  0  0  0  0  0 |
    |  1 20  2  0  0  0  0 |
    |  0  2 30  3  0  0  0 |
    |  0  0  3 40  4  0  0 |
    |  0  0  0  4 50  5  0 |
    |  0  0  0  0  5 60  6 |
    |  0  0  0  0  0  6 70 |
    *                      *
Matrix A is stored in parallel-symmetric-tridiagonal storage mode in the following two vectors:
d= (10, 20, 30, 40, 50, 60, 70)
e= (1, 2, 3, 4, 5, 6, *)
The symmetric tridiagonal matrix A is stored in parallel-symmetric-tridiagonal storage mode in vectors d and e.
Following is global vector d:
B,D     0
    *     *
    | 10  |
 0  | 20  |
    | 30  |
    | --  |
    | 40  |
 1  | 50  |
    | 60  |
    | --  |
 2  | 70  |
    *     *
Following is global vector e:
B,D     0
    *     *
    |  1  |
 0  |  2  |
    |  3  |
    | --  |
    |  4  |
 1  |  5  |
    |  6  |
    | --  |
 2  |  *  |
    *     *
Following is a column-oriented, 2 × 1 process grid:
B,D | 0 |
---|---|
0 2 | P00 |
1 | P10 |
The arrays are block-cyclically distributed over the 2 × 1 process grid.
Following are the local arrays for D:
p,q  |  0
-----|-----
     | 10
 0   | 20
     | 30
     | 70
-----|-----
     | 40
 1   | 50
     | 60
Following are the local arrays for E:
p,q  |  0
-----|-----
     |  1
 0   |  2
     |  3
     |  *
-----|-----
     |  4
 1   |  5
     |  6
where * means you do not have to store a value in that position in the local array. However, these storage positions are required.
The type-502 array descriptor DESC_A contains the following:
DESC_A( ) | Symbolic name | Value |
---|---|---|
1 | DTYPE_A | DTYPE_A = 502 for p × 1 |
2 | CTXT_A | BLACS context |
3 | M_A | 7 |
4 | MB_A | 3 |
5 | RSRC_A | 0 |
6 | LLD_A | Not used |
7 | - | Reserved |
Alternately, the type-1 array descriptor DESC_A contains the following:
DESC_A( ) | Symbolic name | Value |
---|---|---|
1 | DTYPE_A | DTYPE_A = 1 for p × 1 |
2 | CTXT_A | BLACS context |
3 | M_A | 7 |
4 | N_A | 1 |
5 | MB_A | 3 |
6 | NB_A | 1 |
7 | RSRC_A | 0 |
8 | CSRC_A | 0 |
9 | LLD_A | Not used |
The symmetric tridiagonal matrix A is stored in parallel-symmetric-tridiagonal storage mode in vectors d and e. Because vectors are one-dimensional data structures, the block-cyclically distributed arrays on a 1 × p process grid are identical to the block-cyclically distributed arrays on a p × 1 process grid.
Following is global vector d:
B,D 0 1 2 * * 0 | 10 20 30 | 40 50 60 | 70 | * *
Following is global vector e:
B,D 0 1 2 * * 0 | 1 2 3 | 4 5 6 | * | * *
Following is a row-oriented, 1 × 2 process grid:
B,D | 0 2 | 1 |
---|---|---|
0 | P00 | P01 |
The arrays are block-cyclically distributed over the 1 × 2 process grid.
Following are the local arrays for D:
p,q  |       0       |     1
-----|---------------|-----------
 0   | 10 20 30 70   | 40 50 60
Following are the local arrays for E:
p,q  |     0      |   1
-----|------------|--------
 0   | 1 2 3 *    | 4 5 6
where "*" means you do not have to store a value in that position in the local array. However, these storage positions are required.
The type-501 array descriptor DESC_A contains the following:
DESC_A( ) | Symbolic name | Value |
---|---|---|
1 | DTYPE_A | DTYPE_A = 501 for 1 × p |
2 | CTXT_A | BLACS context |
3 | N_A | 7 |
4 | NB_A | 3 |
5 | CSRC_A | 0 |
6 | LLD_A | Not used |
7 | - | Reserved |
Alternately, the type-1 array descriptor DESC_A contains the following:
DESC_A( ) | Symbolic name | Value |
---|---|---|
1 | DTYPE_A | DTYPE_A = 1 for 1 × p |
2 | CTXT_A | BLACS context |
3 | M_A | 1 |
4 | N_A | 7 |
5 | MB_A | 1 |
6 | NB_A | 3 |
7 | RSRC_A | 0 |
8 | CSRC_A | 0 |
9 | LLD_A | Not used |
This section shows how to block-cyclically distribute a general matrix B containing the multiple right-hand sides for the Banded Linear Algebraic Equations subroutines.
Following is the global matrix B:
B,D       0
    *            *
 0  | 11 12 13  |
    | 21 22 23  |
    | --------  |
 1  | 31 32 33  |
    | 41 42 43  |
    | --------  |
 2  | 51 52 53  |
    | 61 62 63  |
    | --------  |
 3  | 71 72 73  |
    *            *
Following is a 3 × 1 process grid:
B,D | 0 |
---|---|
0 3 | P00 |
1 | P10 |
2 | P20 |
Following are the local arrays:
p,q  |     0
-----|-----------
 0   | 11 12 13
     | 21 22 23
     | 71 72 73
-----|-----------
 1   | 31 32 33
     | 41 42 43
-----|-----------
 2   | 51 52 53
     | 61 62 63

The type-502 array descriptor DESC_B contains the following:
DESC_B( ) | Symbolic name | Value |
---|---|---|
1 | DTYPE_B | DTYPE_B = 502 for p × 1 |
2 | CTXT_B | BLACS context |
3 | M_B | 7 |
4 | MB_B | 2 |
5 | RSRC_B | 0 |
6 | LLD_B | 3 (For P00); 2 (For P10 and P20) |
7 | -- | Reserved |
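For illustration, this descriptor could be filled as follows (a sketch; ictxt is assumed to hold the BLACS context of the 3 × 1 grid, and lld_b the process-dependent value from the table):

      INTEGER :: desc_b(7)
      desc_b(1) = 502      ! DTYPE_B
      desc_b(2) = ictxt    ! CTXT_B (assumed from BLACS_GRIDINIT)
      desc_b(3) = 7        ! M_B
      desc_b(4) = 2        ! MB_B
      desc_b(5) = 0        ! RSRC_B
      desc_b(6) = lld_b    ! LLD_B: 3 on P00, 2 on P10 and P20
      desc_b(7) = 0        ! reserved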
Alternately, the type-1 array descriptor DESC_B contains the following:
DESC_B( ) | Symbolic name | Value |
---|---|---|
1 | DTYPE_B | DTYPE_B = 1 for p × 1 |
2 | CTXT_B | BLACS context |
3 | M_B | 7 |
4 | N_B | 3 |
5 | MB_B | 2 |
6 | NB_B | 1 |
7 | RSRC_B | 0 |
8 | CSRC_B | 0 |
9 | LLD_B | 3 (For P00); 2 (For P10 and P20) |
This section shows how to distribute general, symmetric, and upper triangular matrices over a two-dimensional process grid using block-cyclic distribution.
This example shows how the data for a global matrix A with block size of 2 × 3 is distributed block-cyclically over the entire 2 × 3 process grid. Assume the following 9 × 26 global matrix A with 45 blocks:
B,D       0           1          2          3           4           5          6           7         8
     *                                                                                                    *
 0   | 112  5  7 |  8  9  3 |  7  5  1 |  3  2  1 |  8 98   4 |  8  9  4 |  1  3 10 |  3  3 10 |  5  3 |
     | 116  9  6 |  7  2  3 |  6  5  6 |  4  3  2 |  7  2 111 |  7  2  1 |  7  6 15 |  7  6 15 |  7  6 |
     | ----------|----------|----------|----------|-----------|----------|----------|----------|------ |
 1   |   1  5  7 |  1  9  3 |  1  5  1 |  1  2  1 |  1  9   4 |  1  9  4 |  5  8 10 |  3  3 11 |  5  3 |
     |   6  9  6 |  7  2  3 |  6  5  6 |  4  3  2 |  7  2   1 |  7  2  1 |  7  6 19 |  7  1 15 |  7  2 |
     | ----------|----------|----------|----------|-----------|----------|----------|----------|------ |
 2   |   2  5  7 |  2  9  3 |  2  5  1 |  2  2  1 |  2  9   4 |  2  9  4 |  1  8 10 |  2  3 11 |  3  3 |
     |   6  9  6 |  7  2  3 |  6  5  6 |  4  3  2 |  7  2   1 |  7  2  1 |  7  3 19 |  7  4 15 |  7  8 |
     | ----------|----------|----------|----------|-----------|----------|----------|----------|------ |
 3   |   3  5  7 |  3  9  3 |  3  5  1 |  3  2  1 |  3  9   4 |  3  9  4 |  9  8 10 |  2  3 11 |  3  3 |
     |   6  9  6 |  7  2  3 |  6  5  6 |  4  3  2 |  7  2   1 |  7  2  1 |  1  3 49 |  7  4 55 |  7  3 |
     | ----------|----------|----------|----------|-----------|----------|----------|----------|------ |
 4   |  20  1  9 |  4  5  6 |  9  8  7 |  1  4  3 |  1 15  21 |  4  7  6 |  9  8 12 |  3  9 18 |  2  4 |
     *                                                                                                    *
Note: | In this example, the global matrix dimensions are not divisible by the respective block size. As a result, all of the block sizes are 2 × 3, except for blocks in the last row and the last column of the blocked matrix. |
Two-dimensional, 2 × 3 process grid:
B,D | 0 3 6 | 1 4 7 | 2 5 8 |
---|---|---|---|
0 2 4 | P00 | P01 | P02 |
1 3 | P10 | P11 | P12 |
Local arrays:
p,q  |            0            |            1             |         2
-----|-------------------------|--------------------------|------------------
     | 112  5  7 3 2 1 1 3 10  |  8 9 3 8 98   4 3 3 10   | 7 5 1 8 9 4 5 3
     | 116  9  6 4 3 2 7 6 15  |  7 2 3 7  2 111 7 6 15   | 6 5 6 7 2 1 7 6
 0   |   2  5  7 2 2 1 1 8 10  |  2 9 3 2  9   4 2 3 11   | 2 5 1 2 9 4 3 3
     |   6  9  6 4 3 2 7 3 19  |  7 2 3 7  2   1 7 4 15   | 6 5 6 7 2 1 7 8
     |  20  1  9 1 4 3 9 8 12  |  4 5 6 1 15  21 3 9 18   | 9 8 7 4 7 6 2 4
-----|-------------------------|--------------------------|------------------
     |   1  5  7 1 2 1 5 8 10  |  1 9 3 1  9   4 3 3 11   | 1 5 1 1 9 4 5 3
     |   6  9  6 4 3 2 7 6 19  |  7 2 3 7  2   1 7 1 15   | 6 5 6 7 2 1 7 2
 1   |   3  5  7 3 2 1 9 8 10  |  3 9 3 3  9   4 2 3 11   | 3 5 1 3 9 4 3 3
     |   6  9  6 4 3 2 1 3 49  |  7 2 3 7  2   1 7 4 55   | 6 5 6 7 2 1 7 3
Array descriptor DESC_A contains the following:
DESC_A( ) | Symbolic name | Value |
---|---|---|
1 | DTYPE_A | 1 |
2 | CTXT_A | BLACS context |
3 | M_A | 9 |
4 | N_A | 26 |
5 | MB_A | 2 |
6 | NB_A | 3 |
7 | RSRC_A | 0 |
8 | CSRC_A | 0 |
9 | LLD_A | 5 (For P00, P01, and P02); 4 (For P10, P11, and P12) |
This example shows how the data for a global symmetric matrix A with block size of 3 × 3 is distributed block-cyclically over a 2 × 3 process grid. Assume the following 18 × 18 global symmetric matrix A with 36 blocks:
B,D       0           1          2           3          4          5
     *                                                                   *
     |  1  2  3 |  4  5  6 |  7  8  9 | 10 11 12 | 13 14 15 | 16 17 18 |
 0   |  2 10 11 | 12 13 14 | 15 16 17 | 18 19 20 | 21 22 23 | 24 25 26 |
     |  3 11 20 | 21 22 23 | 24 25 26 | 27 28 29 | 30 31 32 | 33 34 35 |
     | ---------|----------|----------|----------|----------|--------- |
     |  4 12 21 |  2  3  5 |  7 11 13 | 17 19 23 | 29 31 37 | 41 43 47 |
 1   |  5 13 22 |  3  1  4 |  9 16 25 | 36 49 64 | 81 10 12 | 14 16 19 |
     |  6 14 23 |  5  4  5 |  6 10 11 | 15 16 20 | 21 25 26 | 30 31 35 |
     | ---------|----------|----------|----------|----------|--------- |
     |  7 15 24 |  7  9  6 |  1  2  3 |  4  5  6 |  7  8  9 | 10 11 12 |
 2   |  8 16 25 | 11 16 10 |  2 11 13 | 15 17 19 | 21 23 25 | 27 29 31 |
     |  9 17 26 | 13 25 11 |  3 13  2 |  4  6  8 | 10 12 14 | 16 18 20 |
     | ---------|----------|----------|----------|----------|--------- |
     | 10 18 27 | 17 36 15 |  4 15  4 |  3  6  9 |  2  4  6 |  3  6  9 |
 3   | 11 19 28 | 19 49 16 |  5 17  6 |  6  1  2 |  3  4  5 |  6  7  8 |
     | 12 20 29 | 23 64 20 |  6 19  8 |  9  2  1 |  3  5  7 |  9 11 13 |
     | ---------|----------|----------|----------|----------|--------- |
     | 13 21 30 | 29 81 21 |  7 21 10 |  2  3  3 | 20 22 21 | 24 23 25 |
 4   | 14 22 31 | 31 10 25 |  8 23 12 |  4  4  5 | 22  4  5 |  6  9 10 |
     | 15 23 32 | 37 12 26 |  9 25 14 |  6  5  7 | 21  5  3 |  2  7  8 |
     | ---------|----------|----------|----------|----------|--------- |
     | 16 24 33 | 41 14 30 | 10 27 16 |  3  6  9 | 24  6  2 |  4 11 15 |
 5   | 17 25 34 | 43 16 31 | 11 29 18 |  6  7 11 | 23  9  7 | 11 17 13 |
     | 18 26 35 | 47 19 35 | 12 31 20 |  9  8 13 | 25 10  8 | 15 13 21 |
     *                                                                   *
Two-dimensional, 3 × 2 process grid:
B,D | 0 2 4 | 1 3 5 |
---|---|---|
0 3 | P00 | P01 |
1 4 | P10 | P11 |
2 5 | P20 | P21 |
The symmetric matrix is distributed block-cyclically in lower storage mode over a 3 × 2 process grid:
p,q  |              0               |              1
-----|------------------------------|------------------------------
     |  1  *  *  *  *  *  *  *  *  |  *  *  *  *  *  *  *  *  *
     |  2 10  *  *  *  *  *  *  *  |  *  *  *  *  *  *  *  *  *
     |  3 11 20  *  *  *  *  *  *  |  *  *  *  *  *  *  *  *  *
 0   | 10 18 27  4 15  4  *  *  *  | 17 36 15  3  *  *  *  *  *
     | 11 19 28  5 17  6  *  *  *  | 19 49 16  6  1  *  *  *  *
     | 12 20 29  6 19  8  *  *  *  | 23 64 20  9  2  1  *  *  *
-----|------------------------------|------------------------------
     |  4 12 21  *  *  *  *  *  *  |  2  *  *  *  *  *  *  *  *
     |  5 13 22  *  *  *  *  *  *  |  3  1  *  *  *  *  *  *  *
     |  6 14 23  *  *  *  *  *  *  |  5  4  5  *  *  *  *  *  *
 1   | 13 21 30  7 21 10 20  *  *  | 29 81 21  2  3  3  *  *  *
     | 14 22 31  8 23 12 22  4  *  | 31 10 25  4  4  5  *  *  *
     | 15 23 32  9 25 14 21  5  3  | 37 12 26  6  5  7  *  *  *
-----|------------------------------|------------------------------
     |  7 15 24  1  *  *  *  *  *  |  7  9  6  *  *  *  *  *  *
     |  8 16 25  2 11  *  *  *  *  | 11 16 10  *  *  *  *  *  *
     |  9 17 26  3 13  2  *  *  *  | 13 25 11  *  *  *  *  *  *
 2   | 16 24 33 10 27 16 24  6  2  | 41 14 30  3  6  9  4  *  *
     | 17 25 34 11 29 18 23  9  7  | 43 16 31  6  7 11 11 17  *
     | 18 26 35 12 31 20 25 10  8  | 47 19 35  9  8 13 15 13 21
where * means you do not have to store a value in that position in the local array. However, these storage positions are required.
Notice that the local arrays are not symmetric.
Array descriptor DESC_A contains the following:
DESC_A( ) | Symbolic name | Value |
---|---|---|
1 | DTYPE_A | 1 |
2 | CTXT_A | BLACS context |
3 | M_A | 18 |
4 | N_A | 18 |
5 | MB_A | 3 |
6 | NB_A | 3 |
7 | RSRC_A | 0 |
8 | CSRC_A | 0 |
9 | LLD_A | 6 |
For more information on how to store symmetric matrices, see the ESSL Version 3 Guide and Reference manual.
This example shows how the data for a global upper triangular matrix A with block size of 2 × 2 is distributed block-cyclically over a 2 × 3 process grid. Assume the following 12 × 12 global upper triangular matrix A with 36 blocks:
B,D      0        1        2        3        4        5
     *                                                     *
 0   |  2  1 |  2 13 | 13 10 | 15 21 | 26 31 |  7  5 |
     |  0  3 |  4  4 | 11 23 | 41 45 | 59 67 |  1  8 |
     | ------|-------|-------|-------|-------|------ |
 1   |  0  0 |  5  9 |  6  9 | 33 65 | 21 14 |  9  4 |
     |  0  0 |  0  7 | 16  8 |  7 33 |  3  7 |  5  3 |
     | ------|-------|-------|-------|-------|------ |
 2   |  0  0 |  0  0 | 11 25 | 10  5 | 23  7 | 10  6 |
     |  0  0 |  0  0 |  0 13 | 36 12 |  3 13 |  5  6 |
     | ------|-------|-------|-------|-------|------ |
 3   |  0  0 |  0  0 |  0  0 | 17 49 | 14  1 |  7  2 |
     |  0  0 |  0  0 |  0  0 |  0 19 | 64 16 |  1  7 |
     | ------|-------|-------|-------|-------|------ |
 4   |  0  0 |  0  0 |  0  0 |  0  0 | 23 81 |  6 15 |
     |  0  0 |  0  0 |  0  0 |  0  0 |  0 29 |  9  4 |
     | ------|-------|-------|-------|-------|------ |
 5   |  0  0 |  0  0 |  0  0 |  0  0 |  0  0 |  5  3 |
     |  0  0 |  0  0 |  0  0 |  0  0 |  0  0 |  0  4 |
     *                                                     *
Two-dimensional, 2 × 3 process grid:
B,D | 0 3 | 1 4 | 2 5 |
---|---|---|---|
0 2 4 | P00 | P01 | P02 |
1 3 5 | P10 | P11 | P12 |
The following local arrays are distributed block-cyclically in upper-triangular storage mode over a 2 × 3 process grid:
p,q  |      0       |      1       |      2
-----|--------------|--------------|--------------
     |  2  1 15 21  |  2 13 26 31  | 13 10  7  5
     |  *  3 41 45  |  4  4 59 67  | 11 23  1  8
     |  *  * 10  5  |  *  * 23  7  | 11 25 10  6
 0   |  *  * 36 12  |  *  *  3 13  |  * 13  5  6
     |  *  *  *  *  |  *  * 23 81  |  *  *  6 15
     |  *  *  *  *  |  *  *  * 29  |  *  *  9  4
-----|--------------|--------------|--------------
     |  *  * 33 65  |  5  9 21 14  |  6  9  9  4
     |  *  *  7 33  |  *  7  3  7  | 16  8  5  3
     |  *  * 17 49  |  *  * 14  1  |  *  *  7  2
 1   |  *  *  * 19  |  *  * 64 16  |  *  *  1  7
     |  *  *  *  *  |  *  *  *  *  |  *  *  5  3
     |  *  *  *  *  |  *  *  *  *  |  *  *  *  4
where "*" means you do not have to store a value in that position in the local array. However, these storage positions are required.
Notice the local arrays are not upper triangular.
Array descriptor DESC_A contains the following:
DESC_A( ) | Symbolic name | Value |
---|---|---|
1 | DTYPE_A | 1 |
2 | CTXT_A | BLACS context |
3 | M_A | 12 |
4 | N_A | 12 |
5 | MB_A | 2 |
6 | NB_A | 2 |
7 | RSRC_A | 0 |
8 | CSRC_A | 0 |
9 | LLD_A | 6 |
For more information on how to store triangular matrices, see the ESSL Version 3 Guide and Reference manual.
For the Fortran 90 and Fortran 77 sparse linear algebraic equation subroutines, you must use the sparse utility subroutines provided with Parallel ESSL to build the sparse matrices on each process in the process grid. This section shows the calling sequence arguments associated with the sparse matrix A.
This section contains the following sections:
This section describes the calling sequence arguments associated with a sparse matrix A.
Table 23. Calling Sequence Arguments for the Sparse Matrix
Arguments | Meaning |
---|---|
a | is the local part of the sparse matrix A and specified as derived data type D_SPMAT. For more details about D_SPMAT, see "Derived Data Type D_SPMAT". |
ia | is the row index of the sparse matrix A. |
ja | is the column index of the sparse matrix A. |
desc_a | is the array descriptor for the sparse matrix A and specified as derived data type DESC_TYPE. For more details about DESC_TYPE, see "Derived Data Type DESC_TYPE". |
parts | is a user-supplied subroutine that specifies a mapping between a global index for an element in the global sparse matrix and its corresponding storage location on one or more processes. For details about how you must define the PARTS subroutine, see "Programming Considerations for the Parts Subroutine (Fortran 90 and Fortran 77)". |
Some of the arguments of the Fortran 90 sparse linear algebraic equations and their utility subroutines are derived data types.
For more information on derived data types, see IBM AIX XL Fortran Compiler/6000 Language Reference.
Table 24 describes the components of D_SPMAT that you must provide as input to the PSPINS subroutine. In addition to the components you provide, Parallel ESSL creates other components as necessary that are only for internal use.
Table 24. Components of D_SPMAT
Components of D_SPMAT | Description | Scope |
---|---|---|
M | Number of local rows | Local |
N | Number of local columns | Local |
FIDA | Storage mode for the submatrix | Global |
AS | Pointer to the submatrix, which contains the coefficients. | Local |
IA1 | Pointer to the column numbers of each non-zero element in the submatrix. | Local |
IA2 | Pointer to the starting positions of each row of the submatrix and one position past the end of the submatrix. | Local |
Note: The AS, IA1, and IA2 components, which are described in this table, depend on how you specify the FIDA component. This description assumes you are using storage by rows. For details about how these components must be specified and their special restrictions, see the appropriate argument descriptions in PSPINS--Inserts Local Data into a General Sparse Matrix.
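For illustration only, here is a hedged Fortran 90 sketch of storage by rows for a hypothetical 3 × 3 local submatrix (the array names mirror the components above; this is not a complete PSPINS call):

    | 11  0 12 |
    |  0 22  0 |
    | 31  0 33 |

      REAL*8  :: as(5)  = (/ 11.D0, 12.D0, 22.D0, 31.D0, 33.D0 /)  ! non-zero coefficients, stored by rows
      INTEGER :: ia1(5) = (/ 1, 3, 2, 1, 3 /)   ! column number of each non-zero element
      INTEGER :: ia2(4) = (/ 1, 3, 4, 6 /)      ! start of each row; the last entry is one past the end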
Parallel ESSL builds the array descriptor, desc_a, which is specified as derived data type DESC_TYPE, and its components, as follows:
MATRIX_DATA is one component of the array descriptor. Table 25 describes the elements of DESC_A%MATRIX_DATA that you may want to reference. However, your application programs should not modify the components of the array descriptor directly. These components should only be updated with calls to PSPINS and PSPASB.
Table 25. Elements of DESC_A%MATRIX_DATA(_)
MATRIX_DATA(_) | Name | Description | Data Type | Limits | Scope |
---|---|---|---|---|---|
1 | DEC_TYPE | Type of data distribution | Fullword integer | Internal format | Global |
2 | CTXT | BLACS context | Fullword integer | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
3 | M | Number of rows in the global general sparse matrix A | Fullword integer | M >= 0 and M = N | Global |
4 | N | Number of columns in the global general sparse matrix A | Fullword integer | N >= 0 and M = N | Global |
5 | N_ROW | Number of local rows | Fullword integer | N_ROW >= 1 | Local |
6 | N_COL | Number of local columns | Fullword integer | N_COL >= 1 | Local |
This section contains the following sections:
This section describes the calling sequence arguments associated with a general sparse matrix A.
Table 26. Calling Sequence Arguments for the Sparse Matrix
Arguments | Meaning |
---|---|
as | is the local part of a matrix |
ia | is the row index of the sparse matrix. |
ja | is the column index of the sparse matrix. |
ia1 | is the local part of an array containing the sparse matrix indices. |
ia2 | is the local part of an array containing the sparse matrix indices. |
infoa | is an integer array for a matrix. For details about infoa see Table 27. |
desc_a | is an array descriptor for the sparse matrix. For details about desc_a see "Array Descriptor". |
parts | is a user-supplied subroutine that specifies a mapping between a global index for an element in the global sparse matrix and its corresponding storage location on one or more processes. For details about how you must define the PARTS subroutine, see "Programming Considerations for the Parts Subroutine (Fortran 90 and Fortran 77)". |
Table 27. Contents of INFOA
INFOA() | Description | Scope |
---|---|---|
1 | Length of an array for a matrix | Local |
2 | Length of an array containing sparse matrix indices. | Local |
3 | Length of an array containing sparse matrix indices. | Local |
4 | Storage format of the matrix. | Global |
5 | Type of matrix. | Global |
6 | Number of local rows. | Local |
7 | Number of local columns. | Local |
8 through 30 | Reserved for internal use. | -- |
Note: If infoa is in a subroutine calling sequence, you must always specify a value for INFOA(1), INFOA(2), and INFOA(3).
An integer array descriptor, desc_a, is needed to establish a mapping between the global general sparse matrix A and its corresponding distributed memory location. You must specify an array descriptor length, DLEN, in DESC_A(11) on input to PADINIT:
If there is no overlap, DLEN = 33+3(np)+n+(N_COL)+(np-1)(N_ROW)+(N_COL-N_ROW)
If there is no overlap, 33+3(np)+4n is an upper bound for DLEN.
If overlap occurs, add at most to DLEN: 3(np)+1+2(np)(N_ROW)
where:
33+3(np)+(alpha)n <= DLEN <= 33+6(np)+3n
where:
Note: | The actual length of the array descriptor depends on the sparse matrix structure and therefore is known after a call to PDSPASB. |
Parallel ESSL builds the remaining elements in the array descriptor, as follows:
You may want to use some of the values in desc_a to build vector b containing the right-hand side and vector x containing the initial guess to the solution. (Parallel ESSL creates other elements in the array descriptor that are for internal use only.) Table 28 describes the elements of the array descriptor that you may want to reference. Your application programs should not modify the elements of the array descriptor directly. The elements should only be updated with calls to PDSPINS and PDSPASB.
Table 28. Elements of DESC_A()
DESC_A() | Name | Description | Data Type | Limits | Scope |
---|---|---|---|---|---|
1 | DEC_TYPE | Type of data distribution | Fullword integer | Internal format | Global |
2 | CTXT | BLACS context | Fullword integer | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
3 | M | Number of rows in the global general sparse matrix A | Fullword integer | M >= 0 and M = N | Global |
4 | N | Number of columns in the global general sparse matrix A | Fullword integer | N >= 0 and M = N | Global |
5 | N_ROW | Number of local rows& | Fullword integer | 1 <= N_ROW <= n | Local |
6 | N_COL | Number of local columns£ | Fullword integer | 1 <= N_COL <= n | Local |
11 | DLEN | Length of the array descriptor | Fullword integer | See the formulas shown in the beginning of this section. | Global |
& DESC_A(5) is stable after you have placed a call to PADINIT. You can use this value to calculate lprcs in PDSPGPR.
£ DESC_A(6) is stable after you have placed a call to PDSPASB. You can use this value to calculate lprcs in PDSPGPR.
DESC_A(7) through DESC_A(10) are only for internal use. DESC_A(12) through DESC_A(DLEN) are only for internal use.
This section describes how to design and code the parts subroutine for use by the Parallel ESSL Fortran 90 and Fortran 77 sparse linear algebraic equation subroutines and their utility subroutines.
You must supply a separate subroutine that is callable by Parallel ESSL. You must specify the name of the subroutine in the parts argument. This subroutine name is selected by you. You must declare parts as an external subroutine in your application program.
The parts subroutine specifies the mapping between a global index for an element in the global general sparse matrix A and its corresponding storage location on a process or processes (if overlap occurs).
You should design the parts subroutine so it receives, as input, global_index, n, and np. It also must return to Parallel ESSL, as output, the information in the pv and nv arguments indicating the storage location of global_index on one or more processes.
Fortran | CALL PARTS (global_index, n, np, pv, nv) |
C | parts (&global_index, &n, &np, pv, &nv); |
C++ | extern "Fortran" void parts(const int &, const int &, const int &, int *, const int &); parts (global_index, n, np, pv, nv); |
Examples of how you could code parts for different types of data distribution are shown in:
void part_block(global_indx, n, nnodes, pv, nv)
int *global_indx, *n, *nnodes, *pv, *nv;
{
   int dim_block;

   dim_block = (*n + *nnodes - 1) / (*nnodes);
   *nv = 1;
   pv[*nv - 1] = (*global_indx - 1) / dim_block;
}
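For the Fortran calling convention shown above, an equivalent sketch (illustrative, not from the manual's sample programs) of the same block distribution is:

      SUBROUTINE PART_BLOCK(GLOBAL_INDX, N, NNODES, PV, NV)
      INTEGER GLOBAL_INDX, N, NNODES, NV
      INTEGER PV(*)
      INTEGER DIM_BLOCK
!     Size of each block when N elements are divided among NNODES processes
      DIM_BLOCK = (N + NNODES - 1) / NNODES
!     A block distribution stores each global index on exactly one process
      NV = 1
      PV(1) = (GLOBAL_INDX - 1) / DIM_BLOCK
      RETURN
      END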
This section shows how to use block-column distribution to distribute a two- or three-dimensional sequence over a one-dimensional process grid. It also describes how some of the two- and three-dimensional complex sequences are stored in FFT-packed storage mode.
Following is a two-dimensional sequence using zero-based indexing where the first dimension is n1, and the second dimension is n2:
For the Fourier transform subroutines, a two-dimensional sequence is distributed over a one-dimensional process grid, using block-column distribution. The process grid must be arranged as a row (1 × q, where q is the number of processes).
Note: | Two-dimensional sequences can be thought of as two-dimensional matrices. The term sequence is used because it is traditional for Fourier transforms. |
You must distribute the input sequence sequentially to the processes in the process grid, using block-column distribution. Parallel ESSL also returns the output sequence using block-column distribution. The output sequence may be returned in normal or transposed form.
A sequence can be distributed unevenly; that is, one process in the process grid can receive an array that is smaller than other processes. It can also happen that some processes receive no data. "Example 2" shows an example of uneven data distribution.
LOCq(n) represents the number of columns that a process receives when n is distributed in blocks over q processes. You need to calculate LOCq(n) for each process, as follows:

For every process except P0,k: LOCq(n) = NB2 = (n+q-1)/q

For process P0,k: LOCq(n) = n-(q-1)(NB2)
where:
n represents the following:
n is the second dimension, n2, of the sequence (for normal form)
n is the first dimension, n1, of the sequence (for transposed form and the sequence is not stored in FFT-packed storage mode)
n is n1/2 (for transposed form and the sequence is stored in FFT-packed storage mode)
q is the number of processes in the process grid
P0,k is the process that receives the last block of data. For uneven data distribution, P0,k would receive an array that is smaller than the other processes receive.
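As a quick worked instance of these formulas (a sketch, using the 8 × 12 sequence and 1 × 4 process grid of the example that follows, so n = 12 and q = 4):

      NB2  = (12 + 4 - 1) / 4     ! = 3 columns on every process except P0,k (integer division)
      LOCQ = 12 - (4 - 1) * NB2   ! = 3 columns on P0,k; this particular distribution is even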
Following is an example of block-column distribution for a two-dimensional sequence over a one-dimensional, row-oriented process grid.
Global sequence of size 8 × 12:
B,D      0           1           2            3
    *                                                 *
    | 0 10 20  | 30 40 50  | 60 70 80  | 90 100 110 |
    | 1 11 21  | 31 41 51  | 61 71 81  | 91 101 111 |
    | 2 12 22  | 32 42 52  | 62 72 82  | 92 102 112 |
    | 3 13 23  | 33 43 53  | 63 73 83  | 93 103 113 |
 0  | 4 14 24  | 34 44 54  | 64 74 84  | 94 104 114 |
    | 5 15 25  | 35 45 55  | 65 75 85  | 95 105 115 |
    | 6 16 26  | 36 46 56  | 66 76 86  | 96 106 116 |
    | 7 17 27  | 37 47 57  | 67 77 87  | 97 107 117 |
    *                                                 *
Row-oriented, 1 × 4 process grid:
B,D | 0 | 1 | 2 | 3 |
---|---|---|---|---|
0 | P00 | P01 | P02 | P03 |
Local arrays:
p,q  |    0      |     1     |     2     |      3
-----|-----------|-----------|-----------|-------------
     | 0 10 20   | 30 40 50  | 60 70 80  | 90 100 110
     | 1 11 21   | 31 41 51  | 61 71 81  | 91 101 111
     | 2 12 22   | 32 42 52  | 62 72 82  | 92 102 112
     | 3 13 23   | 33 43 53  | 63 73 83  | 93 103 113
 0   | 4 14 24   | 34 44 54  | 64 74 84  | 94 104 114
     | 5 15 25   | 35 45 55  | 65 75 85  | 95 105 115
     | 6 16 26   | 36 46 56  | 66 76 86  | 96 106 116
     | 7 17 27   | 37 47 57  | 67 77 87  | 97 107 117
An example of the distribution of a two-dimensional sequence in a Fortran 90 program is shown in Appendix B. "Sample Programs". See the following:
The output sequence for PSRCFT2 and PDRCFT2, and the input sequence for PSCRFT2 and PDCRFT2 are stored in FFT-packed storage mode because they consist of complex-conjugate, even symmetric data.
For FFT-packed storage mode, only certain elements of the complex-conjugate, even symmetric data are stored. This section describes how the complex elements of sequence y, which is the output sequence for PSRCFT2 and PDRCFT2, and the input sequence for PSCRFT2 and PDCRFT2, are stored in global matrices Y and X, respectively.
For example, suppose y is the two-dimensional sequence to be stored in FFT-packed storage mode for PDRCFT2. The following list describes how the elements in y correspond to the elements in the global matrix Y (these correspondences can also be read directly from the sample program PACK2D later in this section):

Y0,0 contains (Re(y0,0), Re(y0,n2/2)).

Yn2/2,0 contains (Re(yn1/2,0), Re(yn1/2,n2/2)).

Yj,0 contains y0,j, for j = 1, ..., n2/2-1.

Yn2/2+j,0 contains yn1/2,j, for j = 1, ..., n2/2-1.

Yj,i contains yi,j, for i = 1, ..., n1/2-1 and j = 0, ..., n2-1.

where:

n1 is the first dimension and n2 is the second dimension of sequence y, Re denotes the real part of a complex element, and the global matrix Y has n2 rows and n1/2 columns (transposed form).

The remaining elements of y are not stored because they are the complex conjugates of elements already stored. This relationship is shown in the following equation:

yi,j = conjugate(y(n1-i) mod n1, (n2-j) mod n2)

where:

i = 0, ..., n1-1 and j = 0, ..., n2-1. At the points where (i,j) maps to itself, that is, at (0,0), (0,n2/2), (n1/2,0), and (n1/2,n2/2), this equation forces the imaginary parts to be zero, which is why only the real parts are stored at those positions.
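As a sanity check before calling PSCRFT2 or PDCRFT2, the symmetry equation above can be tested directly. The following sketch is our illustration, not a Parallel ESSL subroutine; the array name Y, its dimensions, and the tolerance TOL are assumed values:

PROGRAM SYMCHK2
IMPLICIT NONE
INTEGER*4 N1, N2, I, J
PARAMETER (N1 = 8, N2 = 8)
REAL*8 TOL
PARAMETER (TOL = 1.0D-12)
COMPLEX*16 Y(0:N1-1,0:N2-1)
! ... fill Y with the data to be checked ...
DO J = 0, N2-1
   DO I = 0, N1-1
      ! MOD maps index 0 to itself; elsewhere it gives n1-i and n2-j.
      IF (ABS(Y(I,J) - DCONJG(Y(MOD(N1-I,N1),MOD(N2-J,N2)))) .GT. TOL) THEN
         PRINT *, 'Symmetry fails at (', I, ',', J, ')'
      ENDIF
   ENDDO
ENDDO
STOP
END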
The following example, which uses zero-based indexing, has complex conjugate, even symmetry. The dimensions of array y are 8 × 8 (that is n1 = n2 = 8), where array y is:
 *                                                                            *
 | (111,0)   (-3,23)   (-8,10)  (-9,4)    (-9,0)   (-9,-4)  (-8,-10)  (-3,-23) |
 | (10,-10)  (4,4)     (9,3)    (-6,2)    (-1,2)   (-2,1)   (-3,1)    (-5,-3)  |
 | (6,-4)    (1,3)     (0,2)    (-7,1)    (-1,9)   (-1,4)   (-2,-4)   (-2,-2)  |
 | (6,-2)    (6,2)     (-5,1)   (-8,8)    (-1,4)   (-1,-1)  (-1,-8)   (-1,-2)  |
 | (6,0)     (-3,2)    (-9,1)   (-1,5)    (-1,0)   (-1,-5)  (-9,-1)   (-3,-2)  |
 | (6,2)     (-1,2)    (-1,8)   (-1,1)    (-1,-4)  (-8,-8)  (-5,-1)   (6,-2)   |
 | (6,4)     (-2,2)    (-2,4)   (-1,-4)   (-1,-9)  (-7,-1)  (0,-2)    (1,-3)   |
 | (10,10)   (-5,3)    (-3,-1)  (-2,-1)   (-1,-2)  (-6,-2)  (9,-3)    (4,-4)   |
 *                                                                            *
Because zero-based indexing is used, y0,0 = (111,0), y3,2 = (-5,1), and y5,7 = (6,-2).
In this example, the real part of y0,0 is 111, the real part of y0,4 is -9, the real part of y4,0 is 6, the real part of y4,4 is -1, and their imaginary parts are all zero. For the FFT-packed storage mode, the imaginary parts at these particular positions are not stored. Therefore, the number stored at position Y0,0 is (111,-9), which represents the contents of both y0,0 and y0,4. The number stored at position Y4,0 is (6,-1), which represents the contents of both y4,0 and y4,4.
The elements y0,1:3 are stored in Y1:3,0. The elements y4,1:3 are stored in Y5:7,0. The rows y1:3,0:7 are stored in columns Y0:7,1:3. For FFT-packed storage mode, the elements in positions y0,5:7, y4,5:7, and rows y5:7,0:7 are not stored.
Following is the global matrix Y in FFT-packed storage mode:
B,D          0                       1
    *                                             *
    | (111,-9)  (10,-10)  |  (6,-4)    (6,-2)    |
    | (-3,23)   (4, 4)    |  (1, 3)    (6, 2)    |
    | (-8,10)   (9, 3)    |  (0, 2)    (-5, 1)   |
    | (-9, 4)   (-6, 2)   |  (-7, 1)   (-8, 8)   |
 0  | (6,-1)    (-1, 2)   |  (-1, 9)   (-1, 4)   |
    | (-3, 2)   (-2, 1)   |  (-1, 4)   (-1,-1)   |
    | (-9, 1)   (-3, 1)   |  (-2,-4)   (-1,-8)   |
    | (-1, 5)   (-5,-3)   |  (-2,-2)   (-1,-2)   |
    *                                             *
Following is a 1 × 2 process grid:
B,D | 0 | 1 |
---|---|---|
0 | P00 | P01 |
After the data has been distributed over the process grid, the following local arrays for Y are stored in FFT-packed storage mode:
p,q  |          0           |          1
-----|----------------------|--------------------
     | (111,-9)  (10,-10)   |  (6,-4)    (6,-2)
     | (-3,23)   (4, 4)     |  (1, 3)    (6, 2)
     | (-8,10)   (9, 3)     |  (0, 2)    (-5, 1)
     | (-9, 4)   (-6, 2)    |  (-7, 1)   (-8, 8)
  0  | (6,-1)    (-1, 2)    |  (-1, 9)   (-1, 4)
     | (-3, 2)   (-2, 1)    |  (-1, 4)   (-1,-1)
     | (-9, 1)   (-3, 1)    |  (-2,-4)   (-1,-8)
     | (-1, 5)   (-5,-3)    |  (-2,-2)   (-1,-2)
The following example shows how to pack data from a two-dimensional array X into a global array XG, whose columns could then be block-column distributed among q processes. Array X must contain complex-conjugate even symmetric data.
Each of the q processes would get LOCq(n) consecutive columns of array XG. Array X is stored as n1 rows by n2 columns. Array XG is stored as n2 rows by n1/2 columns. This is the transposed form required by PSCRFT2 and PDCRFT2 for the input array.
PROGRAM PACK2D
IMPLICIT NONE
INTEGER*4 N1,N2,INDEX,JINDEX
PARAMETER(N1 = 64, N2 = 32)
COMPLEX*16 XG(0:N2-1,0:N1/2-1)
COMPLEX*16 X(0:N1-1,0:N2-1)
! Pack the four elements whose imaginary parts are zero; their real
! parts are stored in pairs as single complex elements.
XG(0,0)    = DCMPLX( DBLE(X(0,0)) , DBLE(X(0,N2/2)) )
XG(N2/2,0) = DCMPLX( DBLE(X(N1/2,0)) , DBLE(X(N1/2,N2/2)) )
! Pack the remaining elements of rows 0 and N1/2 of X into column 0 of XG.
DO INDEX = 1 , N2/2-1
   XG(INDEX,0)      = X(0,INDEX)
   XG(N2/2+INDEX,0) = X(N1/2,INDEX)
ENDDO
! Store rows 1 through N1/2-1 of X, transposed, in columns 1 through
! N1/2-1 of XG.
DO JINDEX = 0,N2-1
   DO INDEX = 1,N1/2-1
      XG(JINDEX,INDEX) = X(INDEX,JINDEX)
   ENDDO
ENDDO
STOP
END
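Going the other way can also be useful when checking results. The following sketch, our construction mirroring PACK2D rather than a Parallel ESSL subroutine, unpacks a global array XG in FFT-packed storage mode back into the full complex array X, using the conjugate symmetry to regenerate the elements that were not stored:

PROGRAM UNPACK2D
IMPLICIT NONE
INTEGER*4 N1, N2, I, J
PARAMETER (N1 = 64, N2 = 32)
COMPLEX*16 XG(0:N2-1,0:N1/2-1)
COMPLEX*16 X(0:N1-1,0:N2-1)
! ... fill XG with packed data here ...
! Unpack the four real values that shared two complex slots.
X(0,0)       = DCMPLX( DBLE(XG(0,0)),     0.0D0 )
X(0,N2/2)    = DCMPLX( DIMAG(XG(0,0)),    0.0D0 )
X(N1/2,0)    = DCMPLX( DBLE(XG(N2/2,0)),  0.0D0 )
X(N1/2,N2/2) = DCMPLX( DIMAG(XG(N2/2,0)), 0.0D0 )
! Rows 0 and N1/2 of X come from column 0 of XG; the right halves
! of those rows follow from conjugate symmetry within the row.
DO J = 1, N2/2-1
   X(0,J)       = XG(J,0)
   X(0,N2-J)    = DCONJG(XG(J,0))
   X(N1/2,J)    = XG(N2/2+J,0)
   X(N1/2,N2-J) = DCONJG(XG(N2/2+J,0))
ENDDO
! Rows 1 through N1/2-1 are stored transposed in XG; the remaining
! rows follow from the complex-conjugate, even symmetry.
DO J = 0, N2-1
   DO I = 1, N1/2-1
      X(I,J)               = XG(J,I)
      X(N1-I,MOD(N2-J,N2)) = DCONJG(XG(J,I))
   ENDDO
ENDDO
STOP
END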
The three-dimensional sequences that follow use zero-based indexing, where the first dimension is n1, the second dimension is n2, and the third dimension is n3.
For the Fourier transform subroutines, a three-dimensional sequence is distributed over a one-dimensional process grid, using block-plane distribution. The process grid must be arranged as a row (1 × q, where q is the number of processes).
Note: | Three-dimensional sequences can be thought of as three-dimensional matrices. The term sequence is used because it is traditional for Fourier transforms. |
You must distribute the three-dimensional input sequence sequentially to the processes in the process grid, using block-plane distribution. Parallel ESSL also returns the output sequence using block-plane distribution. The output sequence may be returned in normal or transposed form.
A sequence can be distributed unevenly; that is, one process in the process grid can receive an array that is smaller than other processes. It can also happen that some processes receive no data. "Example 2" shows an example of when a process does not receive any data.
LOCq(n) represents the number of planes that a process receives when n planes are distributed in blocks over q processes. You need to calculate LOCq(n) for each process, as follows:

LOCq(n) = NB3 = (n+q-1)/q, using integer division (that is, the ceiling of n/q), for processes P0,0, ..., P0,k-1

LOCq(n) = n-k(NB3), for process P0,k

LOCq(n) = 0, for processes P0,k+1, ..., P0,q-1

where:

n represents the following:

n is the third dimension, n3, of the sequence (for normal form)

n is the first dimension, n1, of the sequence (for transposed form, when the sequence is not stored in FFT-packed storage mode)

n is n1/2 (for transposed form, when the sequence is stored in FFT-packed storage mode)

q is the number of processes in the process grid.

P0,k is the process that receives the last (possibly partial) block of data, where k = ceiling(n/NB3)-1. For uneven data distribution, P0,k receives an array that is smaller than the arrays the other processes receive.
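To make the block-plane bookkeeping concrete, here is a small sketch (our illustration; the program and variable names and the sample values n3 = 5 and q = 4 are arbitrary choices, not Parallel ESSL requirements) that prints the range of global plane indices each process owns, including any trailing processes that receive no data:

PROGRAM PLANES
IMPLICIT NONE
INTEGER*4 N3, Q, NB3, K, NPL, FIRST, LAST
PARAMETER (N3 = 5, Q = 4)
! Block size NB3 = ceiling(n3/q); here NB3 = 2.
NB3 = (N3+Q-1)/Q
DO K = 0, Q-1
   NPL = MAX(0, MIN(NB3, N3-K*NB3))
   IF (NPL .GT. 0) THEN
      ! Process P0,k owns NPL consecutive planes starting at K*NB3.
      FIRST = K*NB3
      LAST  = FIRST + NPL - 1
      PRINT *, 'P0,', K, ' owns planes ', FIRST, ' through ', LAST
   ELSE
      PRINT *, 'P0,', K, ' receives no data'
   ENDIF
ENDDO
STOP
END

With these sample values, P0,0 and P0,1 each own two planes, P0,2 owns one plane, and P0,3 receives no data.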
Following is an example of block-plane distribution for a three-dimensional sequence over a one-dimensional process grid.
Three-dimensional global sequence with four planes, each of size 2 × 2:

Plane 0:                Plane 1:
----------------------------------------------
B,D                 0
----------------------------------------------
   *                                 *
 0 |   0    1   |   10   101   |
   |  10   11   |   11   111   |
   *                                 *

Plane 2:                Plane 3:
----------------------------------------------
B,D                 1
----------------------------------------------
   *                                 *
 0 |  20   21   |   30    31   |
   |  23   24   |   33    34   |
   *                                 *
Row-oriented, 1 × 2 process grid:
B,D | 0 | 1 |
---|---|---|
0 | P00 | P01 |
Local arrays:
p,q  |        0         |        1
-----|------------------|------------------
  0  |  0   1   10 101  |  20  21  30  31
     | 10  11   11 111  |  23  24  33  34
The output sequence for PSRCFT3 and PDRCFT3, and the input sequence for PSCRFT3 and PDCRFT3 are stored in FFT-packed storage mode because they consist of complex-conjugate, even symmetric data.
For FFT-packed storage mode, only certain elements of the complex-conjugate, even symmetric data are stored. This section describes how the complex elements of sequence y, which is the output sequence for PSRCFT3 and PDRCFT3, and the input sequence for PSCRFT3 and PDCRFT3, are stored in global matrices Y and X, respectively.
For example, suppose y is the three-dimensional sequence to be stored in FFT-packed storage mode for PDRCFT3. The following list describes how the elements in y correspond to the complex elements in the global matrix Y (these correspondences can also be read directly from the sample program PACK3D later in this section):

Y0,0,0 contains (Re(y0,0,0), Re(y0,0,n3/2)).

Yn3/2,0,0 contains (Re(y0,n2/2,0), Re(y0,n2/2,n3/2)).

Y0,n2/2,0 contains (Re(yn1/2,0,0), Re(yn1/2,0,n3/2)).

Yn3/2,n2/2,0 contains (Re(yn1/2,n2/2,0), Re(yn1/2,n2/2,n3/2)).

Yk,0,0 contains y0,0,k, and Yn3/2+k,0,0 contains y0,n2/2,k, for k = 1, ..., n3/2-1.

Yk,n2/2,0 contains yn1/2,0,k, and Yn3/2+k,n2/2,0 contains yn1/2,n2/2,k, for k = 1, ..., n3/2-1.

Yk,j,0 contains y0,j,k, and Yk,n2/2+j,0 contains yn1/2,j,k, for j = 1, ..., n2/2-1 and k = 0, ..., n3-1.

Yk,j,i contains yi,j,k, for i = 1, ..., n1/2-1, j = 0, ..., n2-1, and k = 0, ..., n3-1.

where:

n1, n2, and n3 are the first, second, and third dimensions of sequence y, Re denotes the real part of a complex element, and the global matrix Y is stored as n3 rows by n2 columns by n1/2 planes (transposed form).

The remaining elements of y are not stored because they are the complex conjugates of elements already stored. This relationship is shown in the following equation:

yi,j,k = conjugate(y(n1-i) mod n1, (n2-j) mod n2, (n3-k) mod n3)

where:

i = 0, ..., n1-1, j = 0, ..., n2-1, and k = 0, ..., n3-1. At the eight points where (i,j,k) maps to itself, this equation forces the imaginary parts to be zero, which is why only the real parts are stored at those positions.
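As in the two-dimensional case, the symmetry equation can be tested directly before calling PSCRFT3 or PDCRFT3. This sketch is again our illustration; the array name Y, its 4 × 4 × 4 dimensions, and the tolerance TOL are assumed values:

PROGRAM SYMCHK3
IMPLICIT NONE
INTEGER*4 N1, N2, N3, I, J, K
PARAMETER (N1 = 4, N2 = 4, N3 = 4)
REAL*8 TOL
PARAMETER (TOL = 1.0D-12)
COMPLEX*16 Y(0:N1-1,0:N2-1,0:N3-1)
! ... fill Y with the data to be checked ...
DO K = 0, N3-1
   DO J = 0, N2-1
      DO I = 0, N1-1
         ! MOD maps index 0 to itself in each dimension.
         IF (ABS(Y(I,J,K) - DCONJG( &
             Y(MOD(N1-I,N1),MOD(N2-J,N2),MOD(N3-K,N3)))) .GT. TOL) THEN
            PRINT *, 'Symmetry fails at (', I, ',', J, ',', K, ')'
         ENDIF
      ENDDO
   ENDDO
ENDDO
STOP
END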
The following example, which uses zero-based indexing, has complex-conjugate, even symmetry. The dimensions of array y are 4 × 4 × 4 (that is n1 = n2 = n3 = 4).
Plane 0:
y0:3,0:3,0 =
 *                                             *
 | (30,0)     (2,-3)      (-0.3,0)   (2,3)     |
 | (-1,0.7)   (-1,-4)     (-2,-0.7)  (0.5,-2)  |
 | (-2,0)     (-2,-0.6)   (2,0)      (-2,0.6)  |
 | (-1,-0.7)  (0.5,2)     (-2,0.7)   (-1,4)    |
 *                                             *
Plane 1:
y0:3,0:3,1 =
 *                                                  *
 | (2,-2)     (-1,1)       (0.7,-2)   (-3,-2)      |
 | (2,2)      (-2,-1)      (-0.5,3)   (0.04,0.5)   |
 | (-0.4,3)   (-0.009,-3)  (0.9,0.1)  (-1,-0.2)    |
 | (-2,-2)    (-2,-1)      (-0.5,2)   (0.1,0.005)  |
 *                                                  *
Plane 2:
y0:3,0:3,2 =
 *                                               *
 | (3,0)      (0.3,0.5)   (0.1,0)   (0.3,-0.5)  |
 | (-0.3,-2)  (1,-3)      (2,3)     (-0.7,3)    |
 | (2,0)      (2,-1)      (1,0)     (2,1)       |
 | (-0.3,2)   (-0.7,-3)   (2,-3)    (1,3)       |
 *                                               *
Plane 3:
y0:3,0:3,3 =
 *                                                  *
 | (2,2)      (-3,2)       (0.7,2)     (-1,-1)     |
 | (-2,2)     (1,-0.005)   (-0.5,-2)   (-0.2,1)    |
 | (-0.4,-3)  (-1,0.2)     (0.9,-0.1)  (-0.009,3)  |
 | (2,-2)     (0.04,-0.5)  (-0.5,-3)   (-2,1)      |
 *                                                  *
Because zero-based indexing is used, y0,0,0 = (30,0), y2,1,1 = (-0.009,-3), and y3,1,3 = (0.04,-0.5).
In this example, the real part of y0,0,0 is 30, the real part of y0,0,2 is 3, and their imaginary parts are zero. For the FFT-packed storage mode, the imaginary parts at these particular positions are not stored. Therefore, the element stored at position Y0,0,0 is (30,3), which represents the contents of both y0,0,0 and y0,0,2.
The element y0,0,1 is stored in the global matrix Y1,0,0 position.
The real part of y0,2,0 is -0.3, the real part of y0,2,2 is 0.1, and their imaginary parts are zero. For the FFT-packed storage mode, the imaginary parts at these particular positions are not stored. Therefore, the element stored at position Y2,0,0 is (-0.3,0.1), which represents the contents of both y0,2,0 and y0,2,2.
The element y0,2,1 is stored in the global matrix Y3,0,0 position.
The real part of y2,0,0 is -2, the real part of y2,0,2 is 2, and their imaginary parts are zero. For the FFT-packed storage mode, the imaginary parts at these particular positions are not stored. Therefore, the element stored at position Y0,2,0 is (-2,2), which represents the contents of both y2,0,0 and y2,0,2.
The element y2,0,1 is stored in the global matrix Y1,2,0 position.
The real part of y2,2,0 is 2, the real part of y2,2,2 is 1, and their imaginary parts are zero. For the FFT-packed storage mode, the imaginary parts at these particular positions are not stored. Therefore, the element stored at position Y2,2,0 is (2,1), which represents the contents of both y2,2,0 and y2,2,2.
The element y2,2,1 is stored in the global matrix Y3,2,0 position.
The rows y0,1,0:3 are stored in columns Y0:3,1,0. The rows y2,1,0:3 are stored in columns Y0:3,3,0. The plane y1,0:3,0:3 is stored in plane Y0:3,0:3,1. For FFT-packed storage mode, the remaining elements do not need to be stored due to symmetry.
Following is the global matrix Y in FFT-packed storage mode:
Plane 0:
B,D                        0
   *                                                   *
   | (30, 3)      (2, -3)     (-2, 2)     (-2, -0.6)  |
   | (2, -2)      (-1, 1)     (-0.4, 3)   (-0.009,-3) |
 0 | (-0.3, 0.1)  (0.3, 0.5)  (2, 1)      (2, -1)     |
   | (0.7, -2)    (-3, 2)     (0.9, 0.1)  (-1, 0.2)   |
   *                                                   *
Plane 1:
B,D                        1
   *                                                    *
   | (-1, 0.7)    (-1, -4)     (-2, -0.7)  (0.5, -2)   |
   | (2, 2)       (-2, -1)     (-0.5, 3)   (0.04, 0.5) |
 0 | (-0.3, -2)   (1, -3)      (2, 3)      (-0.7, 3)   |
   | (-2, 2)      (1, -0.005)  (-0.5, -2)  (-0.2, 1)   |
   *                                                    *
Following is a 1 × 2 process grid:
B,D | 0 | 1 |
---|---|---|
0 | P00 | P01 |
After the data has been distributed over the process grid, the following local arrays for Y are stored in FFT-packed storage mode:
p,q  |                       0                        |                       1
-----|------------------------------------------------|--------------------------------------------------
     | (30, 3)      (2, -3)     (-2, 2)     (-2, -0.6) | (-1, 0.7)   (-1, -4)     (-2, -0.7)  (0.5, -2)
     | (2, -2)      (-1, 1)     (-0.4, 3)   (-0.009,-3)| (2, 2)      (-2, -1)     (-0.5, 3)   (0.04, 0.5)
  0  | (-0.3, 0.1)  (0.3, 0.5)  (2, 1)      (2, -1)    | (-0.3, -2)  (1, -3)      (2, 3)      (-0.7, 3)
     | (0.7, -2)    (-3, 2)     (0.9, 0.1)  (-1, 0.2)  | (-2, 2)     (1, -0.005)  (-0.5, -2)  (-0.2, 1)
The following example shows how to pack data from a three-dimensional array X into a global array XG, whose planes could then be block distributed among q processes. Array X must contain complex-conjugate even symmetric data.
Each of the q processes would get LOCq(n) consecutive planes of array XG. Array X is stored as n1 rows by n2 columns by n3 planes. Array XG is stored as n3 rows by n2 columns by n1/2 planes. This is the transposed form required by PSCRFT3 and PDCRFT3 for the input array. In this example, n1, n2, and n3 are divisible by 2q, as PSCRFT3 and PDCRFT3 require.
PROGRAM PACK3D
IMPLICIT NONE
INTEGER*4 N1,N2,N3
INTEGER*4 IINDEX,JINDEX,KINDEX
PARAMETER(N1 = 64, N2 = 32, N3 = 48)
COMPLEX*16 XG(0:N3-1,0:N2-1,0:N1/2-1)
COMPLEX*16 X(0:N1-1,0:N2-1,0:N3-1)
! Pack the eight elements whose imaginary parts are zero; their real
! parts are stored in pairs as single complex elements.
XG(0,0,0)       = DCMPLX( DBLE(X(0,0,0)) , DBLE(X(0,0,N3/2)) )
XG(N3/2,0,0)    = DCMPLX( DBLE(X(0,N2/2,0)) , DBLE(X(0,N2/2,N3/2)) )
XG(0,N2/2,0)    = DCMPLX( DBLE(X(N1/2,0,0)) , DBLE(X(N1/2,0,N3/2)) )
XG(N3/2,N2/2,0) = DCMPLX( DBLE(X(N1/2,N2/2,0)) , DBLE(X(N1/2,N2/2,N3/2)) )
! Pack the remaining elements of the lines (0,0,*), (0,N2/2,*),
! (N1/2,0,*), and (N1/2,N2/2,*) into columns 0 and N2/2 of plane 0 of XG.
DO IINDEX = 1 , N3/2-1
   XG(IINDEX,0,0)         = X(0,0,IINDEX)
   XG(N3/2+IINDEX,0,0)    = X(0,N2/2,IINDEX)
   XG(IINDEX,N2/2,0)      = X(N1/2,0,IINDEX)
   XG(N3/2+IINDEX,N2/2,0) = X(N1/2,N2/2,IINDEX)
ENDDO
! Pack the rest of planes 0 and N1/2 of X into plane 0 of XG.
DO KINDEX = 0,N3-1
   DO JINDEX = 1,N2/2-1
      XG(KINDEX,JINDEX,0)      = X(0,JINDEX,KINDEX)
      XG(KINDEX,N2/2+JINDEX,0) = X(N1/2,JINDEX,KINDEX)
   ENDDO
ENDDO
! Store planes 1 through N1/2-1 of X, transposed, in planes 1 through
! N1/2-1 of XG.
DO KINDEX = 0,N3-1
   DO JINDEX = 0,N2-1
      DO IINDEX = 1,N1/2-1
         XG(KINDEX,JINDEX,IINDEX) = X(IINDEX,JINDEX,KINDEX)
      ENDDO
   ENDDO
ENDDO
STOP
END