Guide and Reference

Level 3 PBLAS (Message Passing)

This chapter describes the Level 3 PBLAS subroutines.

Overview of the Level 3 PBLAS Subroutines

The Level 3 PBLAS include a subset of the standard set of distributed memory parallel versions of the Level 3 BLAS.
Note: These subroutines are designed in accordance with the proposed Level 3 PBLAS standard. (See references [14], [15], and [17].) If these subroutines do not comply with the standard as approved, IBM will consider updating them to do so. If IBM updates these subroutines, the update could require modifications of the calling application program.

Table 44. List of Level 3 PBLAS (Message Passing)

Descriptive Name	Long-Precision Subprogram	Page
Matrix-Matrix Product for a General Matrix, Its Transpose, or Its Conjugate Transpose	PDGEMM PZGEMM	PDGEMM and PZGEMM--Matrix-Matrix Product for a General Matrix, Its Transpose, or Its Conjugate Transpose
Matrix-Matrix Product Where One Matrix is Real Symmetric	PDSYMM	PDSYMM--Matrix-Matrix Product Where One Matrix is Real Symmetric
Triangular Matrix-Matrix Product	PDTRMM	PDTRMM--Triangular Matrix-Matrix Product
Solution of Triangular System of Equations with Multiple Right-Hand Sides	PDTRSM	PDTRSM--Solution of Triangular System of Equations with Multiple Right-Hand Sides
Rank-K Update of a Real Symmetric Matrix	PDSYRK	PDSYRK--Rank-K Update of a Real Symmetric Matrix
Rank-2K Update of a Real Symmetric Matrix	PDSYR2K	PDSYR2K--Rank-2K Update of a Real Symmetric Matrix
Matrix Transpose for a General Matrix	PDTRAN	PDTRAN--Matrix Transpose for a General Matrix

Level 3 PBLAS Subroutines

This section contains the Level 3 PBLAS subroutine descriptions.

PDGEMM and PZGEMM--Matrix-Matrix Product for a General Matrix, Its Transpose, or Its Conjugate Transpose

PDGEMM performs any one of the following combined matrix computations:

C <-- alphaAB+betaC

C <-- alphaAB^T+betaC

C <-- alphaA^TB+betaC

C <-- alphaA^TB^T+betaC

PZGEMM performs any one of the following combined matrix computations:

C <-- alphaAB+betaC

C <-- alphaAB^T+betaC

C <-- alphaA^TB+betaC

C <-- alphaA^TB^T+betaC

C <-- alphaA^HB+betaC

C <-- alphaA^HB^T+betaC

C <-- alphaAB^H+betaC

C <-- alphaA^TB^H+betaC

C <-- alphaA^HB^H+betaC

where, in the PDGEMM and PZGEMM formulas above:

A represents the global general submatrix:

For transa = 'N', it is A_{ia:ia+m-1, ja:ja+k-1}.
For transa = 'T' or 'C', it is A_{ia:ia+k-1, ja:ja+m-1}.

B represents the global general submatrix:

For transb = 'N', it is B_{ib:ib+k-1, jb:jb+n-1}.
For transb = 'T' or 'C', it is B_{ib:ib+n-1, jb:jb+k-1}.

C represents the global general submatrix C_{ic:ic+m-1, jc:jc+n-1}.

alpha and beta are scalars.

Note: No data should be moved to form the matrix transposes; that is, the matrices should always be stored in their untransposed forms.

In the following four cases, no computation is performed and the subroutine returns after doing some parameter checking:

m = 0
n = 0
alpha is zero and beta is one.
k = 0 and beta is one.

Assuming the above conditions do not exist, if beta is not one and k is 0, then betaC is returned.

See references [14] and [15].
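As an illustration only, the computation and the quick-return cases described above can be modeled serially. The following Python sketch is not part of the library: it assumes that the complete global submatrices are held as nested lists on a single process, and the names op and gemm_reference are hypothetical.

```python
def op(x, trans):
    """Return x ('N'), its transpose ('T'), or its conjugate transpose ('C')."""
    t = trans.upper()  # lowercase letters are accepted
    if t == "N":
        return [list(row) for row in x]
    cols = [list(col) for col in zip(*x)]
    if t == "T":
        return cols
    if t == "C":
        return [[v.conjugate() for v in row] for row in cols]
    raise ValueError("trans must be 'N', 'T', or 'C'")


def gemm_reference(transa, transb, alpha, a, b, beta, c):
    """Serial model of C <-- alpha*op(A)*op(B) + beta*C on nested lists."""
    opa, opb = op(a, transa), op(b, transb)
    m = len(c)
    n = len(c[0]) if c else 0
    k = len(opb)
    # Quick return: m = 0, n = 0, or (alpha is zero or k = 0) with beta equal to one.
    if m == 0 or n == 0 or ((alpha == 0 or k == 0) and beta == 1):
        return [list(row) for row in c]
    result = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = sum(opa[i][l] * opb[l][j] for l in range(k))
            # When beta is zero, the input value of C is not used.
            prev = beta * c[i][j] if beta != 0 else 0
            row.append(alpha * acc + prev)
        result.append(row)
    return result
```

A model of this kind can be used to check small distributed results, such as those in the examples later in this section, against a serial computation.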

Table 45. Data Types

A, B, C, alpha, beta Subroutine
Long-precision real PDGEMM
Long-precision complex PZGEMM

Syntax

Fortran	CALL PDGEMM | PZGEMM (`transa`, `transb`, `m`, `n`, `k`, `alpha`, `a`, `ia`, `ja`, `desc_a`, `b`, `ib`, `jb`, `desc_b`, `beta`, `c`, `ic`, `jc`, `desc_c`)
C and C++	pdgemm | pzgemm (`transa`, `transb`, `m`, `n`, `k`, `alpha`, `a`, `ia`, `ja`, `desc_a`, `b`, `ib`, `jb`, `desc_b`, `beta`, `c`, `ic`, `jc`, `desc_c`);

On Entry

transa

indicates the form of matrix A to use in the computation, where:

If transa = 'N', A is used in the computation.

If transa = 'T', A^T is used in the computation.

If transa = 'C', A^H is used in the computation.

Scope: global

Specified as: a single character; transa = 'N', 'T', or 'C'

transb

indicates the form of matrix B to use in the computation, where:

If transb = 'N', B is used in the computation.

If transb = 'T', B^T is used in the computation.

If transb = 'C', B^H is used in the computation.

Scope: global

Specified as: a single character; transb = 'N', 'T', or 'C'

m

is the number of rows in submatrix C used in the computation, and:

If transa = 'N', it is the number of rows in submatrix A.

If transa = 'T' or 'C', it is the number of columns in submatrix A.

Scope: global

Specified as: a fullword integer; m >= 0.

n

is the number of columns in submatrix C used in the computation, and:

If transb = 'N', it is the number of columns in submatrix B.

If transb = 'T' or 'C', it is the number of rows in submatrix B.

Scope: global

Specified as: a fullword integer; n >= 0.

k

has the following meaning:

If transa = 'N', it is the number of columns in submatrix A.

If transa = 'T' or 'C', it is the number of rows in submatrix A.

In addition:

If transb = 'N', it is the number of rows in submatrix B.

If transb = 'T' or 'C', it is the number of columns in submatrix B.

Scope: global

Specified as: a fullword integer; k >= 0.

alpha

is the scalar alpha.

Scope: global

Specified as: a number of the data type indicated in Table 45.

a

If transa = 'N', the leading LOCp(ia+m-1) by LOCq(ja+k-1) part of the local array A must contain the local pieces of the leading ia+m-1 by ja+k-1 part of the global matrix.
If transa = 'T' or 'C', the leading LOCp(ia+k-1) by LOCq(ja+m-1) part of the local array A must contain the local pieces of the leading ia+k-1 by ja+m-1 part of the global matrix.

Note:

No data should be moved to form A^T or A^H; that is, the matrix A should always be stored in its untransposed form.

Scope: local

Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 45. Details about the block-cyclic data distribution of global matrix A are stored in desc_a.

ia

is the row index of the global matrix A, identifying the first row of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ia <= M_A, and:

If transa = 'N', then ia+m-1 <= M_A.

If transa = 'T' or 'C', then ia+k-1 <= M_A.

ja

is the column index of the global matrix A, identifying the first column of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ja <= N_A, and:

If transa = 'N', then ja+k-1 <= N_A.

If transa = 'T' or 'C', then ja+m-1 <= N_A.

desc_a

is the array descriptor for global matrix A, described in the following table:

`desc_a`	Name	Description	Limits	Scope
1	DTYPE_A	Descriptor type	DTYPE_A=1	Global
2	CTXT_A	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_A	Number of rows in the global matrix	If `m` = 0 or `k` = 0: M_A >= 0 Otherwise: M_A >= 1	Global
4	N_A	Number of columns in the global matrix	If `m` = 0 or `k` = 0: N_A >= 0 Otherwise: N_A >= 1	Global
5	MB_A	Row block size	MB_A >= 1	Global
6	NB_A	Column block size	NB_A >= 1	Global
7	RSRC_A	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_A < `p`	Global
8	CSRC_A	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_A < `q`	Global
9	LLD_A	The leading dimension of the local array	LLD_A >= max(1,LOCp(M_A))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

b

If transb = 'N', the leading LOCp(ib+k-1) by LOCq(jb+n-1) part of the local array B must contain the local pieces of the leading ib+k-1 by jb+n-1 part of the global matrix.
If transb = 'T' or 'C', the leading LOCp(ib+n-1) by LOCq(jb+k-1) part of the local array B must contain the local pieces of the leading ib+n-1 by jb+k-1 part of the global matrix.

Note:

No data should be moved to form B^T or B^H; that is, the matrix B should always be stored in its untransposed form.

Scope: local

Specified as: an LLD_B by (at least) LOCq(N_B) array, containing numbers of the data type indicated in Table 45. Details about the block-cyclic data distribution of global matrix B are stored in desc_b.

ib

is the row index of the global matrix B, identifying the first row of the submatrix B.

Scope: global

Specified as: a fullword integer; 1 <= ib <= M_B, and:

If transb = 'N', then ib+k-1 <= M_B.

If transb = 'T' or 'C', then ib+n-1 <= M_B.

jb

is the column index of the global matrix B, identifying the first column of the submatrix B.

Scope: global

Specified as: a fullword integer; 1 <= jb <= N_B, and:

If transb = 'N', then jb+n-1 <= N_B.

If transb = 'T' or 'C', then jb+k-1 <= N_B.

desc_b

is the array descriptor for global matrix B, described in the following table:

`desc_b`	Name	Description	Limits	Scope
1	DTYPE_B	Descriptor type	DTYPE_B=1	Global
2	CTXT_B	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_B	Number of rows in the global matrix	If `k` = 0 or `n` = 0: M_B >= 0 Otherwise: M_B >= 1	Global
4	N_B	Number of columns in the global matrix	If `k` = 0 or `n` = 0: N_B >= 0 Otherwise: N_B >= 1	Global
5	MB_B	Row block size	MB_B >= 1	Global
6	NB_B	Column block size	NB_B >= 1	Global
7	RSRC_B	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_B < `p`	Global
8	CSRC_B	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_B < `q`	Global
9	LLD_B	The leading dimension of the local array	LLD_B >= max(1,LOCp(M_B))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

beta

is the scalar beta.

Scope: global

Specified as: a number of the data type indicated in Table 45.

c

is the local part of the global general matrix C. This identifies the first element of the local array C. This subroutine computes the location of the first element of the local subarray used, based on ic, jc, desc_c, p, q, myrow, and mycol; therefore, the leading LOCp(ic+m-1) by LOCq(jc+n-1) part of the local array C must contain the local pieces of the leading ic+m-1 by jc+n-1 part of the global matrix.

When beta is zero, C need not be set on input.

Scope: local

Specified as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 45. Details about the block-cyclic data distribution of global matrix C are stored in desc_c.

ic

is the row index of the global matrix C, identifying the first row of the submatrix C.

Scope: global

Specified as: a fullword integer; 1 <= ic <= M_C and ic+m-1 <= M_C.

jc

is the column index of the global matrix C, identifying the first column of the submatrix C.

Scope: global

Specified as: a fullword integer; 1 <= jc <= N_C and jc+n-1 <= N_C.

desc_c

is the array descriptor for global matrix C, described in the following table:

`desc_c`	Name	Description	Limits	Scope
1	DTYPE_C	Descriptor type	DTYPE_C=1	Global
2	CTXT_C	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_C	Number of rows in the global matrix	If `m` = 0 or `n` = 0: M_C >= 0 Otherwise: M_C >= 1	Global
4	N_C	Number of columns in the global matrix	If `m` = 0 or `n` = 0: N_C >= 0 Otherwise: N_C >= 1	Global
5	MB_C	Row block size	MB_C >= 1	Global
6	NB_C	Column block size	NB_C >= 1	Global
7	RSRC_C	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_C < `p`	Global
8	CSRC_C	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_C < `q`	Global
9	LLD_C	The leading dimension of the local array	LLD_C >= max(1,LOCp(M_C))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

On Return

c

is the updated local part of the global matrix C, containing the results of the computation.

Scope: local

Returned as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 45.

Notes and Coding Rules

1. This subroutine accepts lowercase letters for the transa and transb arguments.
2. For PDGEMM, if you specify 'C' for the transa or transb argument, it is interpreted as though you specified 'T'.
3. The matrices must have no common elements; otherwise, results are unpredictable.
4. The NUMROC utility subroutine can be used to determine the values of LOCp(M_) and LOCq(N_) used in the argument descriptions above. For details, see "Determining the Number of Rows and Columns in Your Local Arrays" and NUMROC--Compute the Number of Rows or Columns of a Block-Cyclically Distributed Matrix Contained in a Process.
5. For suggested block sizes, see "Coding Tips for Optimizing Parallel Performance".
6. The following values must be equal: CTXT_A = CTXT_B = CTXT_C.
7. The coding rules described in this note depend upon which matrix--A, B, or C--is used as the reference matrix, which is referred to, in general, as matrix X. For each of the three possible selections for the reference matrix, there is a unique set of coding rules that must be met. These are detailed in Table 46 and Table 47. Follow these steps to select a reference matrix and determine what coding rules to use:

Step 1: First, the reference matrix is selected. For optimal performance, the reference matrix is selected based on the arguments m, n, and k, as follows:

If k <= min(m, n), then X = C

If n <= min(m, k), then X = A

If m <= min(n, k), then X = B
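The preference stated by these three conditions can be written as a small function. This is a sketch in Python with a hypothetical name, preferred_reference. The source does not say how ties are resolved, so testing the conditions in the order listed is an assumption.

```python
def preferred_reference(m, n, k):
    """Return the reference matrix preferred for performance: 'A', 'B', or 'C'."""
    # Assumption: the conditions are tested in the order in which they are listed.
    if k <= min(m, n):
        return "C"
    if n <= min(m, k):
        return "A"
    # Otherwise m is the smallest of the three, so m <= min(n, k) holds.
    return "B"
```

For instance, with m = 6, n = 4, and k = 5, n is the smallest dimension, so the sketch prefers A.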

The matrix selected must satisfy coding rules a, c, and d, described below, to be a suitable reference matrix. If it does, go to Step 2. If it does not, the subroutine checks whether either of the other two matrices satisfies coding rules a, c, and d, making one of them a suitable reference matrix. If one of them is suitable, go to Step 2. If neither matrix is suitable, an error condition results.

Step 2: After a suitable reference matrix is chosen in Step 1, all remaining coding rules, described below, are checked. If the rules are satisfied, the subroutine continues normally. If they are not, an error condition results.

Coding Rules: Following are the coding rules:

a. The reference matrix must be aligned on a block boundary; that is:

ix-1 must be a multiple of MB_X.
jx-1 must be a multiple of NB_X.

These indexes are indicated in column 5 of Table 46 for each entry for X.

b. The block sizes that must be equal are indicated in column 4 of Table 46 for each entry for X. The rules for block sizes depend only upon the values of transa and transb, and not on the reference matrix selected; however, for your convenience, the rules are repeated in the table for each reference matrix.

c. Given the reference matrix X, additional rules apply to the block row and block column offsets of the two nonreference matrices. These rules are listed in column 7 of Table 46 for each entry for X. These rules must only be met when looping is required--that is, either of the conditions in column 8 is met.

d. The indexes of the nonreference matrices, which need to be on a block boundary, are listed in column 6 of Table 46 for each entry for X.

Table 46. Coding Rules for the Reference Matrix X

-1- X	-2- `transa`	-3- `transb`	-4- (b) Equal Block Sizes	-5- (a) Block Bndry For X	-6- (d) Block Bndry For Other	-7- (c) Equal Block Offsets (If Looping is Required)	-8- (c) Conditions For Looping
A	'N'	'N'	MB_A = MB_C NB_B = NB_C NB_A = MB_B	`ia, ja`	`ib, ic`	mod(`jb`-1, NB_B) = mod(`jc`-1, NB_C)	`n`+mod(`jb`-1, NB_B) > NB_B -or- `n`+mod(`jc`-1, NB_C) > NB_C
A	'N'	'T' or 'C'	MB_A = MB_C MB_B = NB_C NB_A = NB_B	`ia, ja`	`jb, ic`	mod(`ib`-1, MB_B) = mod(`jc`-1, NB_C)	`n`+mod(`ib`-1, MB_B) > MB_B -or- `n`+mod(`jc`-1, NB_C) > NB_C
A	'T' or 'C'	'N'	NB_A = MB_C NB_B = NB_C MB_A = MB_B	`ia, ja`	`ib, ic`	mod(`jb`-1, NB_B) = mod(`jc`-1, NB_C)	`n`+mod(`jb`-1, NB_B) > NB_B -or- `n`+mod(`jc`-1, NB_C) > NB_C
A	'T' or 'C'	'T' or 'C'	NB_A = MB_C MB_B = NB_C MB_A = NB_B	`ia, ja`	`jb, ic`	mod(`ib`-1, MB_B) = mod(`jc`-1, NB_C)	`n`+mod(`ib`-1, MB_B) > MB_B -or- `n`+mod(`jc`-1, NB_C) > NB_C
B	'N'	'N'	MB_A = MB_C NB_B = NB_C NB_A = MB_B	`ib, jb`	`ja, jc`	mod(`ia`-1, MB_A) = mod(`ic`-1, MB_C)	`m`+mod(`ia`-1, MB_A) > MB_A -or- `m`+mod(`ic`-1, MB_C) > MB_C
B	'N'	'T' or 'C'	MB_A = MB_C MB_B = NB_C NB_A = NB_B	`ib, jb`	`ja, jc`	mod(`ia`-1, MB_A) = mod(`ic`-1, MB_C)	`m`+mod(`ia`-1, MB_A) > MB_A -or- `m`+mod(`ic`-1, MB_C) > MB_C
B	'T' or 'C'	'N'	NB_A = MB_C NB_B = NB_C MB_A = MB_B	`ib, jb`	`ia, jc`	mod(`ja`-1, NB_A) = mod(`ic`-1, MB_C)	`m`+mod(`ja`-1, NB_A) > NB_A -or- `m`+mod(`ic`-1, MB_C) > MB_C
B	'T' or 'C'	'T' or 'C'	NB_A = MB_C MB_B = NB_C MB_A = NB_B	`ib, jb`	`ia, jc`	mod(`ja`-1, NB_A) = mod(`ic`-1, MB_C)	`m`+mod(`ja`-1, NB_A) > NB_A -or- `m`+mod(`ic`-1, MB_C) > MB_C
C	'N'	'N'	MB_A = MB_C NB_B = NB_C NB_A = MB_B	`ic, jc`	`ia, jb`	mod(`ja`-1, NB_A) = mod(`ib`-1, MB_B)	`k`+mod(`ja`-1, NB_A) > NB_A -or- `k`+mod(`ib`-1, MB_B) > MB_B
C	'N'	'T' or 'C'	MB_A = MB_C MB_B = NB_C NB_A = NB_B	`ic, jc`	`ia, ib`	mod(`ja`-1, NB_A) = mod(`jb`-1, NB_B)	`k`+mod(`ja`-1, NB_A) > NB_A -or- `k`+mod(`jb`-1, NB_B) > NB_B
C	'T' or 'C'	'N'	NB_A = MB_C NB_B = NB_C MB_A = MB_B	`ic, jc`	`ja, jb`	mod(`ia`-1, MB_A) = mod(`ib`-1, MB_B)	`k`+mod(`ia`-1, MB_A) > MB_A -or- `k`+mod(`ib`-1, MB_B) > MB_B
C	'T' or 'C'	'T' or 'C'	NB_A = MB_C MB_B = NB_C MB_A = NB_B	`ic, jc`	`ja, ib`	mod(`ia`-1, MB_A) = mod(`jb`-1, NB_B)	`k`+mod(`ia`-1, MB_A) > MB_A -or- `k`+mod(`jb`-1, NB_B) > NB_B

e. Additional rules apply to the row and column alignment of the various matrices in the process grid; specifically, the process row or process column containing the first row or column of the reference submatrix X, respectively, must also contain the first row or column of one of the other two nonreference submatrices, as indicated in column 4 of Table 47 for each entry for X. Following is the definition of ixrow and ixcol, which holds true for A, B, and C:

ixrow = mod((((ix-1)/MB_X)+RSRC_X), p)

ixcol = mod((((jx-1)/NB_X)+CSRC_X), q)
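A minimal Python sketch of these two formulas follows. It assumes that the division is integer division, as it would be for fullword integers in Fortran. The function name owning_process is hypothetical.

```python
def owning_process(ix, jx, mb_x, nb_x, rsrc_x, csrc_x, p, q):
    """Return (ixrow, ixcol), the grid coordinates of the process that holds
    global element (ix, jx) of a block-cyclically distributed matrix X.

    ix and jx are 1-based global indexes; p and q are the grid dimensions.
    """
    ixrow = ((ix - 1) // mb_x + rsrc_x) % p
    ixcol = ((jx - 1) // nb_x + csrc_x) % q
    return ixrow, ixcol
```

For a matrix with 3 × 2 blocks distributed from process (0, 0) of a 2 × 2 grid, element (1, 1) maps to process (0, 0) and element (4, 3) maps to process (1, 1).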

Table 47. Coding Rules for the Reference Matrix X

-1- X	-2- `transa`	-3- `transb`	-4- (e) Process Grid Alignment
A	'N'	'N'	`iarow` = `icrow`
A	'N'	'T' or 'C'	`iarow` = `icrow` `ibcol` = `iacol`
A	'T' or 'C'	'N'	`iarow` = `ibrow`
A	'T' or 'C'	'T' or 'C'	(no rules)
B	'N'	'N'	`ibcol` = `iccol`
B	'N'	'T' or 'C'	`ibcol` = `iacol`
B	'T' or 'C'	'N'	`iarow` = `ibrow` `ibcol` = `iccol`
B	'T' or 'C'	'T' or 'C'	(no rules)
C	'N'	'N'	`iarow` = `icrow` `ibcol` = `iccol`
C	'N'	'T' or 'C'	`iarow` = `icrow`
C	'T' or 'C'	'N'	`ibcol` = `iccol`
C	'T' or 'C'	'T' or 'C'	(no rules)

Example: Following is an example of the coding rules necessary for the case where transa = 'N' and transb = 'N', where the reference matrix selected is A. Following are the indexes, dimensions, and block sizes used in the computation for the matrices:

Indexes:           ic    jc                  ia    ja        ib    jb               ic    jc
                   |     |                   |     |         |     |                |     |
Dimensions:   C (  m  ,  n  ) <-- alpha A (  m  ,  k  ) B (  k  ,  n  ) + beta C (  m  ,  n  )
                   |     |                   |     |         |     |                |     |
Block Sizes:      MB_C  NB_C                MB_A  NB_A      MB_B  NB_B             MB_C  NB_C

a. A must be aligned on a block boundary, as indicated in column 5 in Table 46:

ia-1 must be a multiple of MB_A.
ja-1 must be a multiple of NB_A.

b. The block sizes that correspond to each matrix dimension must be equal, where MB_ represents the row dimension and NB_ represents the column dimension, as indicated in column 4 in Table 46:

MB_A = MB_C
NB_B = NB_C
NB_A = MB_B

c. As shown above, m and k are the dimensions of the reference matrix A; therefore, n is used to determine if looping is required; that is, if one of the following is true, as indicated in column 8 in Table 46:

n+mod(jc-1, NB_C) > NB_C
n+mod(jb-1, NB_B) > NB_B

then the following offsets must be equal, as indicated in column 7 in Table 46:

mod(jb-1, NB_B) = mod(jc-1, NB_C)

d. The other indexes from each of the nonreference matrices--not used in c above--must be aligned on a block boundary, as indicated in column 6 in Table 46:

ic-1 must be a multiple of MB_C.
ib-1 must be a multiple of MB_B.

e. In the process grid, the process row containing the first row of the submatrix A must also contain the first row of the submatrix C, as indicated in column 4 in Table 47; that is, iarow = icrow, where:

iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
icrow = mod((((ic-1)/MB_C)+RSRC_C), p)
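The checks in this example can be collected into one function. The following Python sketch covers only the case shown here (transa = 'N', transb = 'N', reference matrix A). It is not the library's own checking code: the function name, the dictionary form of the descriptors, and the returned list of rule letters are all assumptions made for illustration.

```python
def check_rules_ref_a_nn(n, ia, ja, ib, jb, ic, jc, desc_a, desc_b, desc_c, p):
    """Return the letters of the coding rules violated for X = A, 'N', 'N'.

    Each desc_* is a dict with the keys "MB", "NB", and "RSRC", a simplified
    stand-in for the nine-element array descriptor. p is the number of
    process rows in the grid.
    """
    mb_a, nb_a = desc_a["MB"], desc_a["NB"]
    mb_b, nb_b = desc_b["MB"], desc_b["NB"]
    mb_c, nb_c = desc_c["MB"], desc_c["NB"]
    violated = []
    # (a) The reference matrix A is aligned on a block boundary.
    if (ia - 1) % mb_a != 0 or (ja - 1) % nb_a != 0:
        violated.append("a")
    # (b) Block sizes that share a matrix dimension are equal.
    if not (mb_a == mb_c and nb_b == nb_c and nb_a == mb_b):
        violated.append("b")
    # (c) If looping is required, the block column offsets of B and C are equal.
    looping = (n + (jc - 1) % nb_c > nb_c) or (n + (jb - 1) % nb_b > nb_b)
    if looping and (jb - 1) % nb_b != (jc - 1) % nb_c:
        violated.append("c")
    # (d) The remaining indexes of B and C are aligned on a block boundary.
    if (ic - 1) % mb_c != 0 or (ib - 1) % mb_b != 0:
        violated.append("d")
    # (e) The first rows of submatrices A and C lie in the same process row.
    iarow = ((ia - 1) // mb_a + desc_a["RSRC"]) % p
    icrow = ((ic - 1) // mb_c + desc_c["RSRC"]) % p
    if iarow != icrow:
        violated.append("e")
    return violated
```

An empty list means that, under these assumptions, A is acceptable as the reference matrix for the given indexes and descriptors.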

Error Conditions

Computational Errors

None

Resource Errors

Unable to allocate work space

Input-Argument and Miscellaneous Errors

Stage 1

DTYPE_A is invalid.
DTYPE_B is invalid.
DTYPE_C is invalid.

Stage 2

CTXT_A is invalid.

Stage 3

The subroutine was called from outside the process grid.

Stage 4

transa <> 'N', 'T', or 'C'
transb <> 'N', 'T', or 'C'
m < 0
n < 0
k < 0
M_A < 0 and (m = 0 or k = 0); M_A < 1 otherwise
N_A < 0 and (m = 0 or k = 0); N_A < 1 otherwise
M_B < 0 and (k = 0 or n = 0); M_B < 1 otherwise
N_B < 0 and (k = 0 or n = 0); N_B < 1 otherwise
M_C < 0 and (m = 0 or n = 0); M_C < 1 otherwise
N_C < 0 and (m = 0 or n = 0); N_C < 1 otherwise
ia < 1
ib < 1
ic < 1
ja < 1
jb < 1
jc < 1
MB_A < 1
MB_B < 1
MB_C < 1
NB_A < 1
NB_B < 1
NB_C < 1
RSRC_A < 0 or RSRC_A >= p
RSRC_B < 0 or RSRC_B >= p
RSRC_C < 0 or RSRC_C >= p
CSRC_A < 0 or CSRC_A >= q
CSRC_B < 0 or CSRC_B >= q
CSRC_C < 0 or CSRC_C >= q
CTXT_A <> CTXT_B
CTXT_A <> CTXT_C

Stage 5

If m <> 0 and k <> 0:

transa = 'N' and ia+m-1 > M_A
transa = 'T' or 'C' and ia+k-1 > M_A
transa = 'N' and ja+k-1 > N_A
transa = 'T' or 'C' and ja+m-1 > N_A
ia > M_A
ja > N_A

If n <> 0 and k <> 0:
transb = 'N' and ib+k-1 > M_B
transb = 'T' or 'C' and ib+n-1 > M_B
transb = 'N' and jb+n-1 > N_B
transb = 'T' or 'C' and jb+k-1 > N_B
ib > M_B
jb > N_B

If m <> 0 and n <> 0:
ic+m-1 > M_C
jc+n-1 > N_C
ic > M_C
jc > N_C
For the reference matrix (defined in note 7 in "Notes and Coding Rules") and the appropriate transa and transb values, the indexes listed in column 5 of Table 46 are not aligned on a block boundary, where boundary alignment is defined as:

ix-1 must be a multiple of MB_X.
jx-1 must be a multiple of NB_X.
For the two nonreference matrices (defined in note 7 in "Notes and Coding Rules") and the appropriate transa and transb values, the indexes listed in column 6 of Table 46 are not aligned on a block boundary. Using Z to represent one of the nonreference matrices, each boundary alignment is expressed as one of the following:

iz-1 must be a multiple of MB_Z.
jz-1 must be a multiple of NB_Z.
For the reference matrix (defined in note 7 in "Notes and Coding Rules") and the appropriate transa and transb values, if looping occurs--that is, one of the conditions in column 8 of Table 46 is true--then the block offsets indicated in column 7 are not equal.

Stage 6

For the appropriate transa and transb values indicated in Table 46 (where the reference matrix does not matter), some of the block sizes indicated in column 4 are not equal.
LLD_A < max(1, LOCp(M_A))
LLD_B < max(1, LOCp(M_B))
LLD_C < max(1, LOCp(M_C))
In the process grid, the process row or process column containing the first row or column of the reference submatrix X (defined in note 7 in "Notes and Coding Rules"), respectively, does not contain the first row or column of one of the other two nonreference submatrices, as indicated in column 4 of Table 47. Following is the definition of ixrow and ixcol, which holds true for A, B, and C:

ixrow = mod((((ix-1)/MB_X)+RSRC_X), p)
ixcol = mod((((jx-1)/NB_X)+CSRC_X), q)

Example 1

This example computes C = betaC+alphaAB using a 2 × 2 process grid.

Call Statements and Input

 ORDER = 'R'
 NPROW = 2
 NPCOL = 2
 CALL BLACS_GET (0, 0, ICONTXT)
 CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
 CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
            TRANSA TRANSB  M    N    K   ALPHA    A  IA  JA   DESC_A   B  IB  JB
               |      |    |    |    |     |      |   |   |     |      |   |   |
 CALL PDGEMM( 'N' ,  'N' , 6  , 4  , 5 , 1.0D0  , A , 1 , 1 , DESC_A , B , 1 , 1 ,
 
              DESC_B    BETA    C  IC  JC   DESC_C
                |         |     |   |   |     |
              DESC_B ,  2.0D0 , C , 1 , 1 , DESC_C )

Field	Desc_A	Desc_B	Desc_C
DTYPE_	1	1	1
CTXT_	icontxt¹	icontxt¹	icontxt¹
M_	6	5	6
N_	5	4	4
MB_	3	2	3
NB_	2	2	2
RSRC_	0	0	0
CSRC_	0	0	0
LLD_	See below²	See below²	See below²

¹ icontxt is the output of the BLACS_GRIDINIT call.

² Each process should set the LLD_ as follows:

LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
LLD_B = MAX(1,NUMROC(M_B, MB_B, MYROW, RSRC_B, NPROW))
LLD_C = MAX(1,NUMROC(M_C, MB_C, MYROW, RSRC_C, NPROW))

In this example, LLD_A = LLD_C = 3 on all processes, and LLD_B = 3 on P₀₀ and P₀₁ and LLD_B = 2 on P₁₀ and P₁₁.

Global general 6 × 5 matrix A with block size 3 × 2:

B,D        0             1          2
     *                                  *
     |  1.0  2.0  |  -1.0 -1.0  |   4.0 |
 0   |  2.0  0.0  |   1.0  1.0  |  -1.0 |
     |  1.0 -1.0  |  -1.0  1.0  |   2.0 |
     | -----------|-------------|------ |
     | -3.0  2.0  |   2.0  2.0  |   0.0 |
 1   |  4.0  0.0  |  -2.0  1.0  |  -1.0 |
     | -1.0 -1.0  |   1.0 -3.0  |   2.0 |
     *                                  *

The following is the 2 × 2 process grid:

B,D	0 2	1
0	P₀₀	P₀₁
1	P₁₀	P₁₁

Local arrays for A:

p,q  |       0         |      1
-----|-----------------|------------
     |  1.0  2.0  4.0  |  -1.0 -1.0
 0   |  2.0  0.0 -1.0  |   1.0  1.0
     |  1.0 -1.0  2.0  |  -1.0  1.0
-----|-----------------|------------
     | -3.0  2.0  0.0  |   2.0  2.0
 1   |  4.0  0.0 -1.0  |  -2.0  1.0
     | -1.0 -1.0  2.0  |   1.0 -3.0
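The local arrays above follow from the block-cyclic distribution of the global matrix. As a rough illustration, the following Python sketch picks out the rows and columns of a global matrix that a given process would hold. It works on a complete copy of the global matrix held as nested lists, which a real application would not normally have, and the name local_array is hypothetical.

```python
def local_array(global_matrix, mb, nb, myrow, mycol, nprow, npcol, rsrc=0, csrc=0):
    """Return the part of a block-cyclically distributed matrix held by
    process (myrow, mycol) of an nprow x npcol grid.

    mb and nb are the row and column block sizes; rsrc and csrc are the
    process row and column that hold the first block.
    """
    nrows = len(global_matrix)
    ncols = len(global_matrix[0]) if global_matrix else 0
    # A 0-based global row i lies in block row i // mb, which is dealt to
    # process row (i // mb + rsrc) mod nprow; columns work the same way.
    rows = [i for i in range(nrows) if (i // mb + rsrc) % nprow == myrow]
    cols = [j for j in range(ncols) if (j // nb + csrc) % npcol == mycol]
    return [[global_matrix[i][j] for j in cols] for i in rows]
```

Called with the 6 × 5 matrix A, block size 3 × 2, and a 2 × 2 grid, the sketch returns for process (0, 0) the rows 1 through 3 and the columns 1, 2, and 5 of A.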

Global general 5 × 4 matrix B with block size 2 × 2:

B,D        0             1
     *                         *
 0   |  1.0 -1.0  |   0.0  2.0 |
     |  2.0  2.0  |  -1.0 -2.0 |
     | -----------|----------- |
 1   |  1.0  0.0  |  -1.0  1.0 |
     | -3.0 -1.0  |   1.0 -1.0 |
     | -----------|----------- |
 2   |  4.0  2.0  |  -1.0  1.0 |
     *                         *

The following is the 2 × 2 process grid:

B,D	0	1
0 2	P₀₀	P₀₁
1	P₁₀	P₁₁

Local arrays for B:

p,q  |     0      |      1
-----|------------|------------
     |  1.0 -1.0  |   0.0  2.0
 0   |  2.0  2.0  |  -1.0 -2.0
     |  4.0  2.0  |  -1.0  1.0
-----|------------|------------
 1   |  1.0  0.0  |  -1.0  1.0
     | -3.0 -1.0  |   1.0 -1.0

Global general 6 × 4 matrix C with block size 3 × 2:

B,D        0             1
     *                         *
     |  0.5  0.5  |   0.5  0.5 |
 0   |  0.5  0.5  |   0.5  0.5 |
     |  0.5  0.5  |   0.5  0.5 |
     | -----------|----------- |
     |  0.5  0.5  |   0.5  0.5 |
 1   |  0.5  0.5  |   0.5  0.5 |
     |  0.5  0.5  |   0.5  0.5 |
     *                         *

The following is the 2 × 2 process grid:

B,D	0	1
0	P₀₀	P₀₁
1	P₁₀	P₁₁

Local arrays for C:

p,q  |     0      |      1
-----|------------|------------
     |  0.5  0.5  |   0.5  0.5
 0   |  0.5  0.5  |   0.5  0.5
     |  0.5  0.5  |   0.5  0.5
-----|------------|------------
     |  0.5  0.5  |   0.5  0.5
 1   |  0.5  0.5  |   0.5  0.5
     |  0.5  0.5  |   0.5  0.5

Output:

Global general 6 × 4 matrix C with block size 3 × 2:

B,D         0               1
     *                             *
     |  24.0  13.0  |   -5.0   3.0 |
 0   |  -3.0  -4.0  |    2.0   4.0 |
     |   4.0   1.0  |    2.0   5.0 |
     | -------------|------------- |
     |  -2.0   6.0  |   -1.0  -9.0 |
 1   |  -4.0  -6.0  |    5.0   5.0 |
     |  16.0   7.0  |   -4.0   7.0 |
     *                             *

The following is the 2 × 2 process grid:

B,D	0	1
0	P₀₀	P₀₁
1	P₁₀	P₁₁

Local arrays for C:

p,q  |      0       |       1
-----|--------------|--------------
     |  24.0  13.0  |   -5.0   3.0
 0   |  -3.0  -4.0  |    2.0   4.0
     |   4.0   1.0  |    2.0   5.0
-----|--------------|--------------
     |  -2.0   6.0  |   -1.0  -9.0
 1   |  -4.0  -6.0  |    5.0   5.0
     |  16.0   7.0  |   -4.0   7.0

Example 2

This example computes C = betaC+alphaAB using a 2 × 2 process grid.

Call Statements and Input

 ORDER = 'R'
 NPROW = 2
 NPCOL = 2
 CALL BLACS_GET (0, 0, ICONTXT)
 CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
 CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
           TRANSA TRANSB M   N   K       ALPHA       A  IA  JA   DESC_A   B  IB  JB
              |     |    |   |   |         |         |   |   |     |      |   |   |
 CALL PZGEMM('N' , 'N' , 6 , 2 , 3 , (1.0D0,0.0D0) , A , 1 , 1 , DESC_A , B , 1 , 1 ,
 
              DESC_B       BETA        C  IC  JC   DESC_C
                |            |         |   |   |     |
              DESC_B , (2.0D0,0.0D0) , C , 1 , 1 , DESC_C)

Field	Desc_A	Desc_B	Desc_C
DTYPE_	1	1	1
CTXT_	icontxt¹	icontxt¹	icontxt¹
M_	6	3	6
N_	3	2	2
MB_	2	2	2
NB_	2	2	2
RSRC_	0	0	0
CSRC_	0	0	0
LLD_	See below²	See below²	See below²

¹ icontxt is the output of the BLACS_GRIDINIT call.

² Each process should set the LLD_ as follows:

LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
LLD_B = MAX(1,NUMROC(M_B, MB_B, MYROW, RSRC_B, NPROW))
LLD_C = MAX(1,NUMROC(M_C, MB_C, MYROW, RSRC_C, NPROW))

In this example, LLD_A = 4 on P₀₀ and P₀₁ and LLD_A = 2 on P₁₀ and P₁₁. LLD_B = 2 on P₀₀ and LLD_B = 1 on P₁₀. LLD_C = 4 on P₀₀ and LLD_C = 2 on P₁₀.
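
The LLD_ values quoted above can be reproduced with a small serial model of the NUMROC arithmetic. The sketch below is a Python illustration only, not the library routine; the names `numroc` and `lld` are chosen here for illustration, and RSRC_ = 0 is an assumption inferred from the local arrays shown in this example.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Count the rows (or columns) of an n-long dimension, cut into blocks
    of size nb and dealt out cyclically, that land on process iproc."""
    # Distance of this process from the process that owns the first block.
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb                      # number of whole blocks
    count = (nblocks // nprocs) * nb       # every process gets at least this
    extra = nblocks % nprocs               # whole blocks left over
    if mydist < extra:
        count += nb                        # one more whole block
    elif mydist == extra:
        count += n % nb                    # the trailing partial block
    return count


def lld(m, mb, myrow, rsrc, nprow):
    """Smallest valid leading dimension, as in the footnote formula."""
    return max(1, numroc(m, mb, myrow, rsrc, nprow))


if __name__ == "__main__":
    # M_A = 6, M_B = 3, M_C = 6, MB_ = 2, NPROW = 2, RSRC_ = 0 (assumed)
    for myrow in range(2):
        print(myrow, lld(6, 2, myrow, 0, 2), lld(3, 2, myrow, 0, 2),
              lld(6, 2, myrow, 0, 2))
```

Because LLD_ depends only on the process row, P₀₀ and P₀₁ share one value and P₁₀ and P₁₁ share another.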

Global general 6 × 3 matrix A with block size 2 × 2:

B,D              0                   1
     *                                      *
 0   |  (1.0,5.0)  (9.0,2.0)  |   (1.0,9.0) |
     |  (2.0,4.0)  (8.0,3.0)  |   (1.0,8.0) |
     | -----------------------|------------ |
 1   |  (3.0,3.0)  (7.0,5.0)  |   (1.0,7.0) |
     |  (4.0,2.0)  (4.0,7.0)  |   (1.0,5.0) |
     | -----------------------|------------ |
 2   |  (5.0,1.0)  (5.0,1.0)  |   (1.0,6.0) |
     |  (6.0,6.0)  (3.0,6.0)  |   (1.0,4.0) |
     *                                      *

The following is the 2 × 2 process grid:

B,D	0	1
0 2	P₀₀	P₀₁
1	P₁₀	P₁₁

Local arrays for A:

p,q  |           0            |      1
-----|------------------------|-------------
     |  (1.0,5.0)  (9.0,2.0)  |   (1.0,9.0)
     |  (2.0,4.0)  (8.0,3.0)  |   (1.0,8.0)
 0   |  (5.0,1.0)  (5.0,1.0)  |   (1.0,6.0)
     |  (6.0,6.0)  (3.0,6.0)  |   (1.0,4.0)
-----|------------------------|-------------
 1   |  (3.0,3.0)  (7.0,5.0)  |   (1.0,7.0)
     |  (4.0,2.0)  (4.0,7.0)  |   (1.0,5.0)
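
The mapping from the global matrix A to the local arrays above can be sketched as follows. This is a Python model of a block-cyclic distribution, assuming RSRC_A = CSRC_A = 0; the helper names `owner` and `distribute` are hypothetical. The nested lists are indexed by row for readability, whereas the Fortran local arrays are stored by column.

```python
def owner(index, block, src, nprocs):
    """Process coordinate that owns the 0-based global index."""
    return (index // block + src) % nprocs


def distribute(a, mb, nb, p, q, rsrc=0, csrc=0):
    """Split global matrix a into local pieces on a p x q process grid."""
    m, n = len(a), len(a[0])
    local = {}
    for pr in range(p):
        rows = [i for i in range(m) if owner(i, mb, rsrc, p) == pr]
        for pc in range(q):
            cols = [j for j in range(n) if owner(j, nb, csrc, q) == pc]
            local[(pr, pc)] = [[a[i][j] for j in cols] for i in rows]
    return local


# Global 6 x 3 matrix A from this example, block size 2 x 2.
A = [
    [1 + 5j, 9 + 2j, 1 + 9j],
    [2 + 4j, 8 + 3j, 1 + 8j],
    [3 + 3j, 7 + 5j, 1 + 7j],
    [4 + 2j, 4 + 7j, 1 + 5j],
    [5 + 1j, 5 + 1j, 1 + 6j],
    [6 + 6j, 3 + 6j, 1 + 4j],
]
LOCAL_A = distribute(A, mb=2, nb=2, p=2, q=2)

if __name__ == "__main__":
    for coords in sorted(LOCAL_A):
        print(coords, LOCAL_A[coords])
```

Block rows 0 and 2 land on process row 0 and block row 1 lands on process row 1, which is why the local array on P₀₀ holds global rows 1, 2, 5, and 6.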

Global general 3 × 2 matrix B with block size 2 × 2:

B,D              0
     *                       *
 0   |  (1.0,8.0)  (2.0,7.0) |
     |  (4.0,4.0)  (6.0,8.0) |
     | --------------------- |
 1   |  (6.0,2.0)  (4.0,5.0) |
     *                       *

The following is the 2 × 2 process grid:

B,D	0	--
0	P₀₀	P₀₁
1	P₁₀	P₁₁

Local arrays for B:

p,q  |           0
-----|-----------------------
 0   |  (1.0,8.0)  (2.0,7.0)
     |  (4.0,4.0)  (6.0,8.0)
-----|-----------------------
 1   |  (6.0,2.0)  (4.0,5.0)

Global general 6 × 2 matrix C with block size 2 × 2:

B,D              0
     *                       *
 0   |  (0.5,0.0)  (0.5,0.0) |
     |  (0.5,0.0)  (0.5,0.0) |
     | --------------------- |
 1   |  (0.5,0.0)  (0.5,0.0) |
     |  (0.5,0.0)  (0.5,0.0) |
     | --------------------- |
 2   |  (0.5,0.0)  (0.5,0.0) |
     |  (0.5,0.0)  (0.5,0.0) |
     *                       *

The following is the 2 × 2 process grid:

B,D	0	--
0 2	P₀₀	P₀₁
1	P₁₀	P₁₁

Local arrays for C:

p,q  |           0
-----|-----------------------
     |  (0.5,0.0)  (0.5,0.0)
     |  (0.5,0.0)  (0.5,0.0)
 0   |  (0.5,0.0)  (0.5,0.0)
     |  (0.5,0.0)  (0.5,0.0)
-----|-----------------------
 1   |  (0.5,0.0)  (0.5,0.0)
     |  (0.5,0.0)  (0.5,0.0)

Output:

Global general 6 × 2 matrix C with block size 2 × 2:

B,D                  0
     *                               *
 0   |  (-22.0,113.0)  (-35.0,142.0) |
     |  (-19.0,114.0)  (-35.0,141.0) |
     | ----------------------------- |
 1   |  (-20.0,119.0)  (-43.0,146.0) |
     |  (-27.0,110.0)  (-58.0,131.0) |
     | ----------------------------- |
 2   |  (8.0,103.0)    (0.0,112.0)   |
     |  (-55.0,116.0)  (-75.0,135.0) |
     *                               *

The following is the 2 × 2 process grid:

B,D	0	--
0 2	P₀₀	P₀₁
1	P₁₀	P₁₁

Local arrays for C:

p,q  |               0
-----|-------------------------------
     |  (-22.0,113.0)  (-35.0,142.0)
     |  (-19.0,114.0)  (-35.0,141.0)
 0   |  (8.0,103.0)    (0.0,112.0)
     |  (-55.0,116.0)  (-75.0,135.0)
-----|-------------------------------
 1   |  (-20.0,119.0)  (-43.0,146.0)
     |  (-27.0,110.0)  (-58.0,131.0)
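
The output of Example 2 can be checked against a serial computation of C = betaC+alphaAB on the global matrices. The following Python sketch is a plain triple-loop model for verification, not a call to PZGEMM; the function name `gemm_reference` is hypothetical.

```python
def gemm_reference(alpha, a, b, beta, c):
    """Return beta*C + alpha*A*B for matrices stored as nested lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [beta * c[i][j]
         + alpha * sum(a[i][k] * b[k][j] for k in range(inner))
         for j in range(cols)]
        for i in range(rows)
    ]


# Global input matrices from Example 2.
A = [
    [1 + 5j, 9 + 2j, 1 + 9j],
    [2 + 4j, 8 + 3j, 1 + 8j],
    [3 + 3j, 7 + 5j, 1 + 7j],
    [4 + 2j, 4 + 7j, 1 + 5j],
    [5 + 1j, 5 + 1j, 1 + 6j],
    [6 + 6j, 3 + 6j, 1 + 4j],
]
B = [
    [1 + 8j, 2 + 7j],
    [4 + 4j, 6 + 8j],
    [6 + 2j, 4 + 5j],
]
C = [[0.5 + 0j, 0.5 + 0j] for _ in range(6)]

# ALPHA = (1.0, 0.0) and BETA = (2.0, 0.0), as in the call statement.
RESULT = gemm_reference(1 + 0j, A, B, 2 + 0j, C)

if __name__ == "__main__":
    for row in RESULT:
        print(row)
```

Each entry of the result is the corresponding product term plus 2 × 0.5 = 1 from the betaC contribution.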

PDSYMM--Matrix-Matrix Product Where One Matrix is Real Symmetric

This subroutine computes one of the following matrix-matrix products:

1. C <-- alphaAB+betaC

2. C <-- alphaBA+betaC

where, in the formulas above:

A represents the global symmetric submatrix:

For side = 'L', it is A_{ia:ia+m-1, ja:ja+m-1}.
For side = 'R', it is A_{ia:ia+n-1, ja:ja+n-1}.

B represents the global general submatrix B_{ib:ib+m-1, jb:jb+n-1}.

C represents the global general submatrix C_{ic:ic+m-1, jc:jc+n-1}.

alpha and beta are scalars.

In the following two cases, no computation is performed and the subroutine returns after doing some parameter checking:

m = 0 or n = 0
alpha is zero and beta is one.

See references [14] and [15].

Table 48. Data Types

alpha, beta, A, B, C	Subprogram
Long-precision real	PDSYMM

Syntax

Fortran	CALL PDSYMM (`side`, `uplo`, `m`, `n`, `alpha`, `a`, `ia`, `ja`, `desc_a`, `b`, `ib`, `jb`, `desc_b`, `beta`, `c`, `ic`, `jc`, `desc_c`)
C and C++	pdsymm (`side`, `uplo`, `m`, `n`, `alpha`, `a`, `ia`, `ja`, `desc_a`, `b`, `ib`, `jb`, `desc_b`, `beta`, `c`, `ic`, `jc`, `desc_c`);

On Entry

side

indicates whether A is located to the left or right of B in the equation used for this computation, where:

If side = 'L', A is to the left of B, resulting in equation 1.

If side = 'R', A is to the right of B, resulting in equation 2.

Scope: global

Specified as: a single character; side = 'L' or 'R'.

uplo

indicates whether the upper or lower triangular part of the global symmetric submatrix A is referenced, where:

If uplo = 'U', the upper triangular part is referenced.

If uplo = 'L', the lower triangular part is referenced.

Scope: global

Specified as: a single character; uplo = 'U' or 'L'.

m

is the number of rows in submatrices B and C used in the computation, and:

If side = 'L', it is the number of rows and columns in submatrix A used in the computation.

Scope: global

Specified as: a fullword integer; m >= 0.

n

is the number of columns in submatrices B and C used in the computation, and:

If side = 'R', it is the number of rows and columns in submatrix A used in the computation.

Scope: global

Specified as: a fullword integer; n >= 0.

alpha

is the scalar alpha.

Scope: global

Specified as: a number of the data type indicated in Table 48.

a

is the local part of the global symmetric matrix A. This identifies the first element of the local array A. This subroutine computes the location of the first element of the local subarray used, based on ia, ja, desc_a, p, q, myrow, and mycol; therefore, assuming the following:

If side = 'L', numa = m

If side = 'R', numa = n

the leading LOCp(ia+numa-1) by LOCq(ja+numa-1) part of the local array A must contain the local pieces of the leading ia+numa-1 by ja+numa-1 part of the global matrix, and:

If uplo = 'U', the leading numa × numa upper triangular part of the global symmetric submatrix A_{ia:ia+numa-1, ja:ja+numa-1} must contain the upper triangular part of the submatrix, and the strictly lower triangular part is not referenced.
If uplo = 'L', the leading numa × numa lower triangular part of the global symmetric submatrix A_{ia:ia+numa-1, ja:ja+numa-1} must contain the lower triangular part of the submatrix, and the strictly upper triangular part is not referenced.

Scope: local

Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 48. Details about the block-cyclic data distribution of global matrix A are stored in desc_a.

ia

is the row index of the global matrix A, identifying the first row of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ia <= M_A and ia+numa-1 <= M_A.

ja

is the column index of the global matrix A, identifying the first column of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ja <= N_A and ja+numa-1 <= N_A.

desc_a

is the array descriptor for global matrix A, described in the following table:

`desc_a`	Name	Description	Limits	Scope
1	DTYPE_A	Descriptor type	DTYPE_A=1	Global
2	CTXT_A	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_A	Number of rows in the global matrix	If `m` = 0 and `side` = 'L' or `n` = 0 and `side` = 'R': M_A >= 0 Otherwise: M_A >= 1	Global
4	N_A	Number of columns in the global matrix	If `m` = 0 and `side` = 'L' or `n` = 0 and `side` = 'R': N_A >= 0 Otherwise: N_A >= 1	Global
5	MB_A	Row block size	MB_A >= 1	Global
6	NB_A	Column block size	NB_A >= 1	Global
7	RSRC_A	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_A < `p`	Global
8	CSRC_A	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_A < `q`	Global
9	LLD_A	The leading dimension of the local array	LLD_A >= max(1,LOCp(M_A))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

b

is the local part of the global general matrix B. This identifies the first element of the local array B. This subroutine computes the location of the first element of the local subarray used, based on ib, jb, desc_b, p, q, myrow, and mycol; therefore, the leading LOCp(ib+m-1) by LOCq(jb+n-1) part of the local array B must contain the local pieces of the leading ib+m-1 by jb+n-1 part of the global matrix.

Scope: local

Specified as: an LLD_B by (at least) LOCq(N_B) array, containing numbers of the data type indicated in Table 48. Details about the block-cyclic data distribution of global matrix B are stored in desc_b.

ib

is the row index of the global matrix B, identifying the first row of the submatrix B.

Scope: global

Specified as: a fullword integer; 1 <= ib <= M_B and ib+m-1 <= M_B.

jb

is the column index of the global matrix B, identifying the first column of the submatrix B.

Scope: global

Specified as: a fullword integer; 1 <= jb <= N_B and jb+n-1 <= N_B.

desc_b

is the array descriptor for global matrix B, described in the following table:

`desc_b`	Name	Description	Limits	Scope
1	DTYPE_B	Descriptor type	DTYPE_B=1	Global
2	CTXT_B	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_B	Number of rows in the global matrix	If `m` = 0 or `n` = 0: M_B >= 0 Otherwise: M_B >= 1	Global
4	N_B	Number of columns in the global matrix	If `m` = 0 or `n` = 0: N_B >= 0 Otherwise: N_B >= 1	Global
5	MB_B	Row block size	MB_B >= 1	Global
6	NB_B	Column block size	NB_B >= 1	Global
7	RSRC_B	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_B < `p`	Global
8	CSRC_B	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_B < `q`	Global
9	LLD_B	The leading dimension of the local array	LLD_B >= max(1,LOCp(M_B))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

beta

is the scalar beta.

Scope: global

Specified as: a number of the data type indicated in Table 48.

c

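is the local part of the global general matrix C. This identifies the first element of the local array C. This subroutine computes the location of the first element of the local subarray used, based on ic, jc, desc_c, p, q, myrow, and mycol; therefore, the leading LOCp(ic+m-1) by LOCq(jc+n-1) part of the local array C must contain the local pieces of the leading ic+m-1 by jc+n-1 part of the global matrix. When beta is zero, C need not be set on input.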

Scope: local

Specified as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 48. Details about the block-cyclic data distribution of global matrix C are stored in desc_c.

ic

is the row index of the global matrix C, identifying the first row of the submatrix C.

Scope: global

Specified as: a fullword integer; 1 <= ic <= M_C and ic+m-1 <= M_C.

jc

is the column index of the global matrix C, identifying the first column of the submatrix C.

Scope: global

Specified as: a fullword integer; 1 <= jc <= N_C and jc+n-1 <= N_C.

desc_c

is the array descriptor for global matrix C, described in the following table:

`desc_c`	Name	Description	Limits	Scope
1	DTYPE_C	Descriptor type	DTYPE_C=1	Global
2	CTXT_C	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_C	Number of rows in the global matrix	If `m` = 0 or `n` = 0: M_C >= 0 Otherwise: M_C >= 1	Global
4	N_C	Number of columns in the global matrix	If `m` = 0 or `n` = 0: N_C >= 0 Otherwise: N_C >= 1	Global
5	MB_C	Row block size	MB_C >= 1	Global
6	NB_C	Column block size	NB_C >= 1	Global
7	RSRC_C	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_C < `p`	Global
8	CSRC_C	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_C < `q`	Global
9	LLD_C	The leading dimension of the local array	LLD_C >= max(1,LOCp(M_C))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

On Return

c

is the updated local part of the global matrix C, containing the results of the computation.

Scope: local

Returned as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 48.

Notes and Coding Rules

This subroutine accepts lowercase letters for the side and uplo arguments.
The matrices must have no common elements; otherwise, results are unpredictable.
The NUMROC utility subroutine can be used to determine the values of LOCp(M_) and LOCq(N_) used in the argument descriptions above. For details, see "Determining the Number of Rows and Columns in Your Local Arrays" and NUMROC--Compute the Number of Rows or Columns of a Block-Cyclically Distributed Matrix Contained in a Process.
For suggested block sizes, see "Coding Tips for Optimizing Parallel Performance".
The following values must be equal: CTXT_A = CTXT_B = CTXT_C.
If side = 'L':
- In the process grid, the process row containing the first row of the submatrix A must also contain the first row of the submatrices B and C; that is:
  
  iarow = ibrow
  iarow = icrow
  where:
  
  iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
  ibrow = mod((((ib-1)/MB_B)+RSRC_B), p)
  icrow = mod((((ic-1)/MB_C)+RSRC_C), p)
- If looping is required--that is, either of the following is true:
  
  n+mod(jb-1, NB_B) > NB_B
  n+mod(jc-1, NB_C) > NB_C
  
  then:
  - The following block sizes must be equal: NB_B = NB_C.
  - The block column offset of B must be equal to the block column offset of C; that is, mod(jb-1, NB_B) = mod(jc-1, NB_C).
If side = 'R':
- In the process grid, the process column containing the first column of the submatrix A must also contain the first column of the submatrices B and C; that is:
  
  iacol = ibcol
  iacol = iccol
  where:
  
  iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
  ibcol = mod((((jb-1)/NB_B)+CSRC_B), q)
  iccol = mod((((jc-1)/NB_C)+CSRC_C), q)
- If looping is required--that is, either of the following is true:
  
  m+mod(ib-1, MB_B) > MB_B
  m+mod(ic-1, MB_C) > MB_C
  
  then:
  - The following block sizes must be equal: MB_B = MB_C.
  - The block row offset of B must be equal to the block row offset of C; that is, mod(ib-1, MB_B) = mod(ic-1, MB_C).
If all the following are true:
- A is contained within a single block, that is:
  
  numa+mod(ia-1, MB_A) <= MB_A
  numa+mod(ja-1, NB_A) <= NB_A
  where:
  
  If side = 'L', numa = m
  If side = 'R', numa = n
- If side = 'L', then (in the process grid) the process column containing the first column of the submatrix B must also contain the first column of the submatrix C, that is, ibcol = iccol, where:
  
  ibcol = mod((((jb-1)/NB_B)+CSRC_B), q)
  iccol = mod((((jc-1)/NB_C)+CSRC_C), q)
- If side = 'R', then (in the process grid) the process row containing the first row of the submatrix B must also contain the first row of the submatrix C; that is, ibrow = icrow, where:
  
  ibrow = mod((((ib-1)/MB_B)+RSRC_B), p)
  icrow = mod((((ic-1)/MB_C)+RSRC_C), p)
then you must follow these rules:
- If side = 'L', then B and C must be block row matrices; that is, if p > 1:
  
  m+mod(ib-1, MB_B) <= MB_B
  m+mod(ic-1, MB_C) <= MB_C
- If side = 'R', then B and C must be block column matrices; that is, if q > 1:
  
  n+mod(jb-1, NB_B) <= NB_B
  n+mod(jc-1, NB_C) <= NB_C
If the following is true:
- A is not contained within a single block.
or if all the following are true:
- A is contained within a single block.
- If side = 'L', then (in the process grid) the process column containing the first column of the submatrix B does not contain the first column of the submatrix C, that is, ibcol <> iccol, where:
  
  ibcol = mod((((jb-1)/NB_B)+CSRC_B), q)
  iccol = mod((((jc-1)/NB_C)+CSRC_C), q)
- If side = 'R', then (in the process grid) the process row containing the first row of the submatrix B does not contain the first row of the submatrix C; that is, ibrow <> icrow, where:
  
  ibrow = mod((((ib-1)/MB_B)+RSRC_B), p)
  icrow = mod((((ic-1)/MB_C)+RSRC_C), p)
then you must follow these rules:
- The global symmetric matrix A must be distributed using a square block-cyclic distribution; that is, MB_A = NB_A.
- The global symmetric matrix A must be aligned on a block boundary, that is:
  
  ia-1 must be a multiple of MB_A.
  ja-1 must be a multiple of NB_A.
- If side = 'L':
  - The following block sizes must be equal: MB_B = MB_C = NB_A.
  - The global matrices B and C must be aligned on a block row boundary, that is:
    
    ib-1 must be a multiple of MB_B.
    ic-1 must be a multiple of MB_C.
- If side = 'R':
  - The following block sizes must be equal: NB_B = NB_C = MB_A.
  - The global matrices B and C must be aligned on a block column boundary, that is:
    
    jb-1 must be a multiple of NB_B.
    jc-1 must be a multiple of NB_C.
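
The index formulas in the rules above can be evaluated before the call to see which rules apply to a given set of arguments. The Python sketch below models only the arithmetic. The helper names are hypothetical, and the sample values assume a side = 'R' call with m = 16, n = 8, all starting indices 1, MB_A = NB_A = 2, MB_B = MB_C = 4, NB_B = NB_C = 2, RSRC_ = CSRC_ = 0, and a 2 × 2 process grid.

```python
def owning_process(index, block, src, nprocs):
    """mod(((index-1)/block)+src, nprocs) for a 1-based global index."""
    return ((index - 1) // block + src) % nprocs


def within_single_block(count, start, block):
    """True when count rows/columns starting at 1-based start fit in one block."""
    return count + (start - 1) % block <= block


def looping_required(count, start_b, block_b, start_c, block_c):
    """True when either B or C spans more than one block in this dimension."""
    return (count + (start_b - 1) % block_b > block_b
            or count + (start_c - 1) % block_c > block_c)


def same_offset(start_b, block_b, start_c, block_c):
    """True when B and C start at the same offset inside their blocks."""
    return (start_b - 1) % block_b == (start_c - 1) % block_c


if __name__ == "__main__":
    m, n, q = 16, 8, 2
    # side = 'R': iacol, ibcol, and iccol must agree.
    print("iacol, ibcol, iccol:",
          owning_process(1, 2, 0, q), owning_process(1, 2, 0, q),
          owning_process(1, 2, 0, q))
    # A spans several blocks, so the square-block and alignment rules apply.
    print("A in single block:",
          within_single_block(n, 1, 2) and within_single_block(n, 1, 2))
    # Looping over rows of B and C requires MB_B = MB_C and equal offsets.
    print("looping required:", looping_required(m, 1, 4, 1, 4))
    print("offsets equal:", same_offset(1, 4, 1, 4))
```

Integer division in the sketch matches the truncating division implied by the formulas because all operands are nonnegative.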

Error Conditions

Computational Errors

None

Resource Errors

Unable to allocate work space

Input-Argument and Miscellaneous Errors

Stage 1

DTYPE_A is invalid.
DTYPE_B is invalid.
DTYPE_C is invalid.

Stage 2

CTXT_A is invalid.

Stage 3

PDSYMM was called from outside the process grid.

Stage 4

side <> 'L' or 'R'
uplo <> 'U' or 'L'
m < 0
n < 0
M_A < 0 and m = 0 and side = 'L'; M_A < 0 and n = 0 and side = 'R'; M_A < 1 otherwise
N_A < 0 and m = 0 and side = 'L'; N_A < 0 and n = 0 and side = 'R'; N_A < 1 otherwise
MB_A < 1
NB_A < 1
RSRC_A < 0 or RSRC_A >= p
CSRC_A < 0 or CSRC_A >= q
ia < 1
ja < 1
M_B < 0 and (m = 0 or n = 0); M_B < 1 otherwise
N_B < 0 and (m = 0 or n = 0); N_B < 1 otherwise
MB_B < 1
NB_B < 1
RSRC_B < 0 or RSRC_B >= p
CSRC_B < 0 or CSRC_B >= q
ib < 1
jb < 1
M_C < 0 and (m = 0 or n = 0); M_C < 1 otherwise
N_C < 0 and (m = 0 or n = 0); N_C < 1 otherwise
MB_C < 1
NB_C < 1
RSRC_C < 0 or RSRC_C >= p
CSRC_C < 0 or CSRC_C >= q
ic < 1
jc < 1
CTXT_A <> CTXT_B
CTXT_A <> CTXT_C

Stage 5

If (m <> 0 or side <> 'L') and (n <> 0 or side <> 'R'):

ia > M_A
ja > N_A
ia+numa-1 > M_A
ja+numa-1 > N_A
where numa = m if side = 'L' and numa = n if side = 'R'.

If m <> 0 and n <> 0:

ib > M_B
jb > N_B
ib+m-1 > M_B
jb+n-1 > N_B
ic > M_C
jc > N_C
ic+m-1 > M_C
jc+n-1 > N_C

Stage 6

If A is contained within a single block, that is:

numa+mod(ia-1, MB_A) <= MB_A

numa+mod(ja-1, NB_A) <= NB_A

where:

If side = 'L', numa = m

If side = 'R', numa = n

and:

If side = 'L', then (in the process grid) the process column containing the first column of the submatrix B must also contain the first column of the submatrix C, that is, ibcol = iccol, where:

ibcol = mod((((jb-1)/NB_B)+CSRC_B), q)
iccol = mod((((jc-1)/NB_C)+CSRC_C), q)
If side = 'R', then (in the process grid) the process row containing the first row of the submatrix B must also contain the first row of the submatrix C; that is, ibrow = icrow, where:

ibrow = mod((((ib-1)/MB_B)+RSRC_B), p)
icrow = mod((((ic-1)/MB_C)+RSRC_C), p)

then:

If side = 'L':
1. p > 1 and m+mod(ib-1, MB_B) > MB_B
2. p > 1 and m+mod(ic-1, MB_C) > MB_C
If side = 'R':
1. q > 1 and n+mod(jb-1, NB_B) > NB_B
2. q > 1 and n+mod(jc-1, NB_C) > NB_C

If A is not contained within a single block, or if A is contained within a single block and:

If side = 'L', then (in the process grid) the process column containing the first column of the submatrix B does not contain the first column of the submatrix C, that is, ibcol <> iccol, where:

ibcol = mod((((jb-1)/NB_B)+CSRC_B), q)
iccol = mod((((jc-1)/NB_C)+CSRC_C), q)
If side = 'R', then (in the process grid) the process row containing the first row of the submatrix B does not contain the first row of the submatrix C; that is, ibrow <> icrow, where:

ibrow = mod((((ib-1)/MB_B)+RSRC_B), p)
icrow = mod((((ic-1)/MB_C)+RSRC_C), p)

then:

MB_A <> NB_A
mod(ia-1, MB_A) <> 0
mod(ja-1, NB_A) <> 0

If side = 'L':
MB_B <> NB_A
MB_C <> NB_A
mod(ib-1, MB_B) <> 0
mod(ic-1, MB_C) <> 0

If side = 'R':
NB_B <> MB_A
NB_C <> MB_A
mod(jb-1, NB_B) <> 0
mod(jc-1, NB_C) <> 0

In all cases:

LLD_A < max(1, LOCp(M_A))
LLD_B < max(1, LOCp(M_B))
LLD_C < max(1, LOCp(M_C))

If side = 'L' and looping is required--that is, either of the following is true:

n+mod(jb-1, NB_B) > NB_B
n+mod(jc-1, NB_C) > NB_C

then:
NB_B <> NB_C
mod(jb-1, NB_B) <> mod(jc-1, NB_C).

If side = 'L':
In the process grid, the process row containing the first row of the submatrix A does not contain the first row of the submatrix B; that is, iarow <> ibrow, where:

iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
ibrow = mod((((ib-1)/MB_B)+RSRC_B), p)
In the process grid, the process row containing the first row of the submatrix A does not contain the first row of the submatrix C; that is, iarow <> icrow, where:

iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
icrow = mod((((ic-1)/MB_C)+RSRC_C), p)

If side = 'R' and looping is required--that is, either of the following is true:

m+mod(ib-1, MB_B) > MB_B
m+mod(ic-1, MB_C) > MB_C

then:
MB_B <> MB_C
mod(ib-1, MB_B) <> mod(ic-1, MB_C).

If side = 'R':
In the process grid, the process column containing the first column of the submatrix A does not contain the first column of the submatrix B; that is, iacol <> ibcol, where:

iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
ibcol = mod((((jb-1)/NB_B)+CSRC_B), q)
In the process grid, the process column containing the first column of the submatrix A does not contain the first column of the submatrix C; that is, iacol <> iccol, where:

iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
iccol = mod((((jc-1)/NB_C)+CSRC_C), q)

Example

This example computes C = betaC+alphaBA using a 2 × 2 process grid.

Call Statements and Input

 ORDER = 'R'
 NPROW = 2
 NPCOL = 2
 CALL BLACS_GET(0, 0, ICONTXT)
 CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
 CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
              SIDE  UPLO   M   N    ALPHA    A  IA  JA   DESC_A   B  IB   JB
               |     |     |   |      |      |   |   |     |      |   |   |
 CALL PDSYMM( 'R' , 'U' , 16 , 8 ,  1.0D0  , A , 1 , 1 , DESC_A , B , 1 , 1 ,
 
              DESC_B   BETA    C  IC  JC   DESC_C
                |        |     |   |   |     |
              DESC_B , 0.0D0 , C , 1 , 1 , DESC_C )

	Desc_A	Desc_B	Desc_C
DTYPE_	1	1	1
CTXT_	icontxt¹	icontxt¹	icontxt¹
M_	8	16	16
N_	8	8	8
MB_	2	4	4
NB_	2	2	2
RSRC_	0	0	0
CSRC_	0	0	0
LLD_	See below²	See below²	See below²

¹ icontxt is the output of the BLACS_GRIDINIT call.

² Each process should set the LLD_ as follows:

LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
LLD_B = MAX(1,NUMROC(M_B, MB_B, MYROW, RSRC_B, NPROW))
LLD_C = MAX(1,NUMROC(M_C, MB_C, MYROW, RSRC_C, NPROW))

In this example, LLD_A = 4 on all processes, and LLD_B = LLD_C = 8 on all processes.

Global symmetric matrix A of order 8 with block size 2 × 2:

B,D        0             1             2             3
     *                                                     *
 0   |  0.0 -1.0  |  -1.0  0.0  |   0.0  0.0  |   0.0  0.0 |
     |   .   1.0  |   0.0  1.0  |   0.0  1.0  |   0.0  1.0 |
     | -----------|-------------|-------------|----------- |
 1   |   .   .    |  -1.0 -1.0  |   0.0  0.0  |   1.0  0.0 |
     |   .   .    |    .  -1.0  |   1.0  1.0  |   0.0  1.0 |
     | -----------|-------------|-------------|----------- |
 2   |   .    .   |    .    .   |  -1.0  0.0  |   0.0  0.0 |
     |   .    .   |    .    .   |    .   1.0  |   0.0  0.0 |
     | -----------|-------------|-------------|----------- |
 3   |   .    .   |    .    .   |    .    .   |   0.0  0.0 |
     |   .    .   |    .    .   |    .    .   |    .   0.0 |
     *                                                     *

The following is the 2 × 2 process grid:

B,D	0 2	1 3
0 2	P₀₀	P₀₁
1 3	P₁₀	P₁₁

Local arrays for A:

p,q  |          0           |           1
-----|----------------------|----------------------
     |  0.0 -1.0  0.0  0.0  |  -1.0  0.0  0.0  0.0
     |   .   1.0  0.0  1.0  |   0.0  1.0  0.0  1.0
 0   |   .    .  -1.0  0.0  |    .    .   0.0  0.0
     |   .    .    .   1.0  |    .    .   0.0  0.0
-----|----------------------|----------------------
     |   .    .   0.0  0.0  |  -1.0 -1.0  1.0  0.0
     |   .    .   1.0  1.0  |    .  -1.0  0.0  1.0
 1   |   .    .    .    .   |    .    .   0.0  0.0
     |   .    .    .    .   |    .    .    .   0.0

Global general 16 × 8 matrix B with block size 4 × 2:

B,D        0             1             2             3
     *                                                     *
     | -1.0  0.0  |   1.0 -1.0  |   1.0  1.0  |  -1.0 -1.0 |
     | -1.0 -1.0  |   1.0  0.0  |   1.0 -1.0  |  -1.0  1.0 |
 0   |  1.0  1.0  |  -1.0  0.0  |  -1.0  0.0  |   1.0  0.0 |
     |  0.0 -1.0  |   0.0  0.0  |   0.0  0.0  |   0.0 -1.0 |
     | -----------|-------------|-------------|----------- |
     |  0.0  1.0  |   0.0  1.0  |   0.0  1.0  |   1.0  0.0 |
     |  0.0  0.0  |   1.0  0.0  |  -1.0 -1.0  |   0.0  0.0 |
 1   |  1.0  1.0  |   0.0  0.0  |   1.0  1.0  |   0.0 -1.0 |
     |  0.0  0.0  |  -1.0  0.0  |   0.0  1.0  |   0.0  1.0 |
     | -----------|-------------|-------------|----------- |
     |  0.0  0.0  |   0.0 -1.0  |   1.0  1.0  |   0.0  1.0 |
     | -1.0 -1.0  |   1.0  0.0  |   0.0 -1.0  |   0.0  1.0 |
 2   |  0.0  0.0  |   0.0  1.0  |   1.0  0.0  |   0.0  0.0 |
     |  0.0  0.0  |   1.0  1.0  |   0.0 -1.0  |   0.0  0.0 |
     | -----------|-------------|-------------|----------- |
     |  1.0  1.0  |  -1.0  0.0  |  -1.0 -1.0  |   1.0  1.0 |
     |  0.0  0.0  |   0.0  0.0  |   1.0  0.0  |   0.0 -1.0 |
 3   |  0.0  1.0  |   0.0  0.0  |   0.0  0.0  |   0.0  0.0 |
     | -1.0  0.0  |  -1.0  0.0  |   0.0  1.0  |   1.0  0.0 |
     *                                                     *

The following is the 2 × 2 process grid:

B,D	0 2	1 3
0 2	P₀₀	P₀₁
1 3	P₁₀	P₁₁

Local arrays for B:

p,q  |          0           |           1
-----|----------------------|----------------------
     | -1.0  0.0  1.0  1.0  |   1.0 -1.0 -1.0 -1.0
     | -1.0 -1.0  1.0 -1.0  |   1.0  0.0 -1.0  1.0
     |  1.0  1.0 -1.0  0.0  |  -1.0  0.0  1.0  0.0
     |  0.0 -1.0  0.0  0.0  |   0.0  0.0  0.0 -1.0
 0   |  0.0  0.0  1.0  1.0  |   0.0 -1.0  0.0  1.0
     | -1.0 -1.0  0.0 -1.0  |   1.0  0.0  0.0  1.0
     |  0.0  0.0  1.0  0.0  |   0.0  1.0  0.0  0.0
     |  0.0  0.0  0.0 -1.0  |   1.0  1.0  0.0  0.0
-----|----------------------|----------------------
     |  0.0  1.0  0.0  1.0  |   0.0  1.0  1.0  0.0
     |  0.0  0.0 -1.0 -1.0  |   1.0  0.0  0.0  0.0
     |  1.0  1.0  1.0  1.0  |   0.0  0.0  0.0 -1.0
     |  0.0  0.0  0.0  1.0  |  -1.0  0.0  0.0  1.0
 1   |  1.0  1.0 -1.0 -1.0  |  -1.0  0.0  1.0  1.0
     |  0.0  0.0  1.0  0.0  |   0.0  0.0  0.0 -1.0
     |  0.0  1.0  0.0  0.0  |   0.0  0.0  0.0  0.0
     | -1.0  0.0  0.0  1.0  |  -1.0  0.0  1.0  0.0

Output:

Global general 16 × 8 matrix C with block size 4 × 2:

B,D        0             1             2             3
     *                                                     *
     | -1.0  0.0  |   0.0  1.0  |  -2.0  0.0  |   1.0 -1.0 |
     |  0.0  0.0  |  -1.0 -1.0  |  -1.0 -2.0  |   1.0 -1.0 |
 0   |  0.0  0.0  |   1.0  1.0  |   1.0  1.0  |  -1.0  1.0 |
     |  1.0 -2.0  |   0.0 -2.0  |   0.0 -1.0  |   0.0 -1.0 |
     | -----------|-------------|-------------|----------- |
     | -1.0  3.0  |   0.0  1.0  |   1.0  3.0  |   0.0  2.0 |
     | -1.0 -1.0  |  -1.0 -3.0  |   1.0 -1.0  |   1.0  0.0 |
 1   | -1.0  0.0  |  -1.0  2.0  |  -1.0  2.0  |   0.0  1.0 |
     |  1.0  2.0  |   1.0  3.0  |   0.0  1.0  |  -1.0  0.0 |
     | -----------|-------------|-------------|----------- |
     |  0.0  1.0  |   1.0  4.0  |  -2.0  0.0  |   0.0 -1.0 |
     |  0.0  0.0  |   0.0 -2.0  |   0.0 -2.0  |   1.0 -1.0 |
 2   |  0.0  1.0  |  -1.0  0.0  |   0.0  1.0  |   0.0  1.0 |
     | -1.0  0.0  |  -2.0 -3.0  |   1.0  0.0  |   1.0  1.0 |
     | -----------|-------------|-------------|----------- |
     |  0.0  0.0  |   1.0  1.0  |   1.0  0.0  |  -1.0  1.0 |
     |  0.0 -1.0  |   0.0  0.0  |  -1.0  0.0  |   0.0  0.0 |
 3   | -1.0  1.0  |   0.0  1.0  |   0.0  1.0  |   0.0  1.0 |
     |  1.0  2.0  |   3.0  2.0  |   0.0  1.0  |  -1.0  0.0 |
     *                                                     *

The following is the 2 × 2 process grid:

B,D	0 2	1 3
0 2	P₀₀	P₀₁
1 3	P₁₀	P₁₁

Local arrays for C:

p,q  |          0           |           1
-----|----------------------|----------------------
     | -1.0  0.0 -2.0  0.0  |   0.0  1.0  1.0 -1.0
     |  0.0  0.0 -1.0 -2.0  |  -1.0 -1.0  1.0 -1.0
     |  0.0  0.0  1.0  1.0  |   1.0  1.0 -1.0  1.0
     |  1.0 -2.0  0.0 -1.0  |   0.0 -2.0  0.0 -1.0
 0   |  0.0  1.0 -2.0  0.0  |   1.0  4.0  0.0 -1.0
     |  0.0  0.0  0.0 -2.0  |   0.0 -2.0  1.0 -1.0
     |  0.0  1.0  0.0  1.0  |  -1.0  0.0  0.0  1.0
     | -1.0  0.0  1.0  0.0  |  -2.0 -3.0  1.0  1.0
-----|----------------------|----------------------
     | -1.0  3.0  1.0  3.0  |   0.0  1.0  0.0  2.0
     | -1.0 -1.0  1.0 -1.0  |  -1.0 -3.0  1.0  0.0
     | -1.0  0.0 -1.0  2.0  |  -1.0  2.0  0.0  1.0
     |  1.0  2.0  0.0  1.0  |   1.0  3.0 -1.0  0.0
 1   |  0.0  0.0  1.0  0.0  |   1.0  1.0 -1.0  1.0
     |  0.0 -1.0 -1.0  0.0  |   0.0  0.0  0.0  0.0
     | -1.0  1.0  0.0  1.0  |   0.0  1.0  0.0  1.0
     |  1.0  2.0  0.0  1.0  |   3.0  2.0 -1.0  0.0

PDTRMM--Triangular Matrix-Matrix Product

This subroutine computes one of the following matrix-matrix products:

1. B <-- alphaAB
2. B <-- alphaA^TB
3. B <-- alphaBA
4. B <-- alphaBA^T

where, in the formulas above:

A represents the global triangular submatrix:

For side = 'L', it is A_{ia:ia+m-1, ja:ja+m-1}.
For side = 'R', it is A_{ia:ia+n-1, ja:ja+n-1}.

B represents the global general submatrix B_{ib:ib+m-1, jb:jb+n-1}.

alpha is a scalar.

Note: No data should be moved to form the matrix transpose; that is, the matrix should always be stored in its untransposed form.

If m = 0 or n = 0, no computation is performed, and the subroutine returns after doing some parameter checking.
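
The four products can be modeled serially to see how side, uplo, transa, and diag select the computation. The Python sketch below is an illustration on small global matrices, not the distributed PDTRMM algorithm; the function name `trmm_reference` and the sample data are hypothetical. Note that A is never physically transposed: the transpose is formed by swapping indices on access.

```python
def trmm_reference(side, uplo, transa, diag, alpha, a, b):
    """Return alpha*op(A)*B (side 'L') or alpha*B*op(A) (side 'R'),
    where A is triangular and op(A) is A or its transpose."""
    m, n = len(b), len(b[0])
    order = m if side == 'L' else n

    def tri(i, j):
        # Element (i, j) of A, reading only the triangle selected by uplo.
        if i == j:
            return 1.0 if diag == 'U' else a[i][j]
        stored = (i < j) if uplo == 'U' else (i > j)
        return a[i][j] if stored else 0.0

    def op(i, j):
        # Swap indices instead of moving data to form the transpose.
        return tri(j, i) if transa == 'T' else tri(i, j)

    if side == 'L':
        return [[alpha * sum(op(i, k) * b[k][j] for k in range(order))
                 for j in range(n)] for i in range(m)]
    return [[alpha * sum(b[i][k] * op(k, j) for k in range(order))
             for j in range(n)] for i in range(m)]


if __name__ == "__main__":
    # The entry 99.0 lies in the lower triangle and is never referenced.
    A = [[2.0, 3.0], [99.0, 4.0]]
    B = [[1.0, 2.0], [3.0, 4.0]]
    for side in ('L', 'R'):
        for transa in ('N', 'T'):
            print(side, transa, trmm_reference(side, 'U', transa, 'N', 1.0, A, B))
```

With diag = 'U' the stored diagonal of A is ignored and treated as all ones, which the sketch handles in the `tri` helper.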

See references [14] and [15].

Table 49. Data Types

alpha, A, B	Subprogram
Long-precision real	PDTRMM

Syntax

Fortran	CALL PDTRMM (`side`, `uplo`, `transa`, `diag`, `m`, `n`, `alpha`, `a`, `ia`, `ja`, `desc_a`, `b`, `ib`, `jb`, `desc_b`)
C and C++	pdtrmm (`side`, `uplo`, `transa`, `diag`, `m`, `n`, `alpha`, `a`, `ia`, `ja`, `desc_a`, `b`, `ib`, `jb`, `desc_b`);

On Entry

side

indicates whether A is located to the left or right of B in the equation used for this computation, where:

If side = 'L', A is to the left of B, resulting in equation 1 or 2.

If side = 'R', A is to the right of B, resulting in equation 3 or 4.

Scope: global

Specified as: a single character; side = 'L' or 'R'.

uplo

indicates whether the upper or lower triangular part of the global triangular submatrix A is referenced, where:

If uplo = 'U', the upper triangular part is referenced.

If uplo = 'L', the lower triangular part is referenced.

Scope: global

Specified as: a single character; uplo = 'U' or 'L'.

transa

indicates the form of matrix A to use in the computation, where:

If transa = 'N', A is used in the computation, resulting in equation 1 or 3.

If transa = 'T', A^T is used in the computation, resulting in equation 2 or 4.

Scope: global

Specified as: a single character; transa = 'N' or 'T'.

diag

indicates the characteristics of the diagonal of matrix A, where:

If diag = 'U', A is a unit triangular matrix.

If diag = 'N', A is not a unit triangular matrix.

Scope: global

Specified as: a single character; diag = 'U' or 'N'.

m

is the number of rows in submatrix B, and:

If side = 'L', it is the number of rows and columns in submatrix A used in the computation.

Scope: global

Specified as: a fullword integer; m >= 0.

n

is the number of columns in submatrix B, and:

If side = 'R', it is the number of rows and columns in submatrix A used in the computation.

Scope: global

Specified as: a fullword integer; n >= 0.

alpha

is the scalar alpha.

Scope: global

Specified as: a number of the data type indicated in Table 49.

a

is the local part of the global triangular matrix A. This identifies the first element of the local array A. This subroutine computes the location of the first element of the local subarray used, based on ia, ja, desc_a, p, q, myrow, and mycol; therefore, assuming the following:

If side = 'L', numa = m

If side = 'R', numa = n

the leading LOCp(ia+numa-1) by LOCq(ja+numa-1) part of the local array A must contain the local pieces of the leading ia+numa-1 by ja+numa-1 part of the global matrix, and:

If uplo = 'U', the leading numa × numa upper triangular part of the global triangular submatrix A_{ia:ia+numa-1, ja:ja+numa-1} must contain the upper triangular part of the submatrix, and the strictly lower triangular part is not referenced.

If uplo = 'L', the leading numa × numa lower triangular part of the global triangular submatrix A_{ia:ia+numa-1, ja:ja+numa-1} must contain the lower triangular part of the submatrix, and the strictly upper triangular part is not referenced.

Note: No data should be moved to form A^T; that is, the matrix A should always be stored in its untransposed form.

Scope: local

Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 49. Details about the square block-cyclic data distribution of global matrix A are stored in desc_a.

ia

is the row index of the global matrix A, identifying the first row of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ia <= M_A and ia+numa-1 <= M_A.

ja

is the column index of the global matrix A, identifying the first column of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ja <= N_A and ja+numa-1 <= N_A.

desc_a

is the array descriptor for global matrix A, described in the following table:

`desc_a`	Name	Description	Limits	Scope
1	DTYPE_A	Descriptor type	DTYPE_A=1	Global
2	CTXT_A	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_A	Number of rows in the global matrix	If `m` = 0 and `side` = 'L' or `n` = 0 and `side` = 'R': M_A >= 0 Otherwise: M_A >= 1	Global
4	N_A	Number of columns in the global matrix	If `m` = 0 and `side` = 'L' or `n` = 0 and `side` = 'R': N_A >= 0 Otherwise: N_A >= 1	Global
5	MB_A	Row block size	MB_A >= 1	Global
6	NB_A	Column block size	NB_A >= 1	Global
7	RSRC_A	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_A < `p`	Global
8	CSRC_A	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_A < `q`	Global
9	LLD_A	The leading dimension of the local array	LLD_A >= max(1,LOCp(M_A))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

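b

is the local part of the global general matrix B. This identifies the first element of the local array B. This subroutine computes the location of the first element of the local subarray used, based on ib, jb, desc_b, p, q, myrow, and mycol; therefore, the leading LOCp(ib+m-1) by LOCq(jb+n-1) part of the local array B must contain the local pieces of the leading ib+m-1 by jb+n-1 part of the global matrix.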

Scope: local

Specified as: an LLD_B by (at least) LOCq(N_B) array, containing numbers of the data type indicated in Table 49. Details about the block-cyclic data distribution of global matrix B are stored in desc_b.

ib

is the row index of the global matrix B, identifying the first row of the submatrix B.

Scope: global

Specified as: a fullword integer; 1 <= ib <= M_B and ib+m-1 <= M_B.

jb

is the column index of the global matrix B, identifying the first column of the submatrix B.

Scope: global

Specified as: a fullword integer; 1 <= jb <= N_B and jb+n-1 <= N_B.

desc_b

is the array descriptor for global matrix B, described in the following table:

`desc_b`	Name	Description	Limits	Scope
1	DTYPE_B	Descriptor type	DTYPE_B=1	Global
2	CTXT_B	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_B	Number of rows in the global matrix	If `m` = 0 or `n` = 0: M_B >= 0 Otherwise: M_B >= 1	Global
4	N_B	Number of columns in the global matrix	If `m` = 0 or `n` = 0: N_B >= 0 Otherwise: N_B >= 1	Global
5	MB_B	Row block size	MB_B >= 1	Global
6	NB_B	Column block size	NB_B >= 1	Global
7	RSRC_B	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_B < `p`	Global
8	CSRC_B	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_B < `q`	Global
9	LLD_B	The leading dimension of the local array	LLD_B >= max(1,LOCp(M_B))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

On Return

b

is the updated local part of the global matrix B, containing the results of the computation.

Scope: local

Returned as: an LLD_B by (at least) LOCq(N_B) array, containing numbers of the data type indicated in Table 49.

Notes and Coding Rules

This subroutine accepts lowercase letters for the side, uplo, transa, and diag arguments.
If you specify 'C' for transa, it is interpreted as though you specified 'T'.
The matrices must have no common elements; otherwise, results are unpredictable.
PDTRMM assumes certain values in your array for parts of a triangular matrix. As a result, you do not have to set these values. For unit triangular matrices, the elements of the diagonal are assumed to be one. When using an upper or lower triangular matrix, the unreferenced elements in the lower and upper triangular part, respectively, are assumed to be zero.
The NUMROC utility subroutine can be used to determine the values of LOCp(M_) and LOCq(N_) used in the argument descriptions above. For details, see "Determining the Number of Rows and Columns in Your Local Arrays" and NUMROC--Compute the Number of Rows or Columns of a Block-Cyclically Distributed Matrix Contained in a Process.
For suggested block sizes, see "Coding Tips for Optimizing Parallel Performance".
The following values must be equal: CTXT_A = CTXT_B.
If A is not contained within a single block, that is:

numa+mod(ia-1, MB_A) > MB_A
numa+mod(ja-1, NB_A) > NB_A
where:

If side = 'L', numa = m
If side = 'R', numa = n

then:
- The global triangular matrix A must be distributed using a square block-cyclic distribution; that is, MB_A = NB_A.
- The global triangular matrix A must be aligned on a block boundary, that is:
  
  ia-1 must be a multiple of MB_A.
  ja-1 must be a multiple of NB_A.
If side = 'L':
- If A is not contained within a single block, then:
  - The following block sizes must be equal: MB_B = NB_A.
  - The global matrix B must be aligned on a block row boundary; that is, ib-1 must be a multiple of MB_B.
- In the process grid, the process row containing the first row of the submatrix A must also contain the first row of the submatrix B; that is, iarow = ibrow, where:
  
  iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
  ibrow = mod((((ib-1)/MB_B)+RSRC_B), p)
- If A is contained within a single block, then B must be a block row matrix; that is, if p > 1:
  
  m+mod(ib-1, MB_B) <= MB_B
If side = 'R':
- If A is not contained within a single block, then:
  - The following block sizes must be equal: NB_B = MB_A
  - The global matrix B must be aligned on a block column boundary; that is, jb-1 must be a multiple of NB_B.
- In the process grid, the process column containing the first column of the submatrix A must also contain the first column of the submatrix B, that is, iacol = ibcol, where:
  
  iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
  ibcol = mod((((jb-1)/NB_B)+CSRC_B), q)
- If A is contained within a single block, then B must be a block column matrix; that is, if q > 1:
  
  n+mod(jb-1, NB_B) <= NB_B
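
The layout rules above can be collected into a small pre-call check. The following Python sketch is illustrative only and is not part of the library: the function names, the dictionary-based descriptors, and the message strings are hypothetical, and it assumes that A spans more than one block when either of the two tests listed above is true.

```python
def owner(index, block, src, nprocs):
    """Process row or column owning a 1-based global index (iarow/iacol formula)."""
    return ((index - 1) // block + src) % nprocs


def pdtrmm_layout_problems(side, m, n, ia, ja, ib, jb, desc_a, desc_b, p, q):
    """Return the layout rules that are violated (an empty list means none found).

    desc_a and desc_b are plain dicts with the keys MB, NB, RSRC, and CSRC.
    """
    numa = m if side == "L" else n
    mb_a, nb_a = desc_a["MB"], desc_a["NB"]
    mb_b, nb_b = desc_b["MB"], desc_b["NB"]
    # Assumption: A is "not contained within a single block" if either test holds.
    multi_block = (numa + (ia - 1) % mb_a > mb_a) or (numa + (ja - 1) % nb_a > nb_a)

    problems = []
    if multi_block:
        if mb_a != nb_a:
            problems.append("MB_A <> NB_A")
        if (ia - 1) % mb_a != 0:
            problems.append("ia-1 is not a multiple of MB_A")
        if (ja - 1) % nb_a != 0:
            problems.append("ja-1 is not a multiple of NB_A")

    if side == "L":
        if multi_block:
            if mb_b != nb_a:
                problems.append("MB_B <> NB_A")
            if (ib - 1) % mb_b != 0:
                problems.append("ib-1 is not a multiple of MB_B")
        elif p > 1 and m + (ib - 1) % mb_b > mb_b:
            problems.append("B is not a block row matrix")
        if owner(ia, mb_a, desc_a["RSRC"], p) != owner(ib, mb_b, desc_b["RSRC"], p):
            problems.append("iarow <> ibrow")
    else:
        if multi_block:
            if nb_b != mb_a:
                problems.append("NB_B <> MB_A")
            if (jb - 1) % nb_b != 0:
                problems.append("jb-1 is not a multiple of NB_B")
        elif q > 1 and n + (jb - 1) % nb_b > nb_b:
            problems.append("B is not a block column matrix")
        if owner(ja, nb_a, desc_a["CSRC"], q) != owner(jb, nb_b, desc_b["CSRC"], q):
            problems.append("iacol <> ibcol")
    return problems


# Hypothetical layout: 2 x 2 blocks, source process (0, 0), 2 x 2 process grid.
desc = {"MB": 2, "NB": 2, "RSRC": 0, "CSRC": 0}
print(pdtrmm_layout_problems("L", 5, 3, 1, 1, 1, 1, desc, desc, 2, 2))
```

The check covers only the block-size, alignment, and process-ownership rules in this list. It does not replace the argument checking that the subroutine performs itself.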

Error Conditions

Computational Errors

None

Resource Errors

Unable to allocate work space

Input-Argument and Miscellaneous Errors

Stage 1

DTYPE_A is invalid.
DTYPE_B is invalid.

Stage 2

CTXT_A is invalid.

Stage 3

PDTRMM was called from outside the process grid.

Stage 4

side <> 'L' or 'R'
uplo <> 'U' or 'L'
transa <> 'N', 'T', or 'C'
diag <> 'N' or 'U'
m < 0
n < 0
M_A < 0 and m = 0 and side = 'L'; M_A < 0 and n = 0 and side = 'R'; M_A < 1 otherwise
N_A < 0 and m = 0 and side = 'L'; N_A < 0 and n = 0 and side = 'R'; N_A < 1 otherwise
MB_A < 1
NB_A < 1
M_B < 0 and (m = 0 or n = 0); M_B < 1 otherwise
N_B < 0 and (m = 0 or n = 0); N_B < 1 otherwise
MB_B < 1
NB_B < 1
RSRC_A < 0 or RSRC_A >= p
CSRC_A < 0 or CSRC_A >= q
RSRC_B < 0 or RSRC_B >= p
CSRC_B < 0 or CSRC_B >= q
ia < 1
ja < 1
ib < 1
jb < 1
CTXT_A <> CTXT_B

Stage 5

MB_A <> NB_A

If A is not contained within a single block, that is:

numa+mod(ia-1, MB_A) > MB_A
numa+mod(ja-1, NB_A) > NB_A
where:

If side = 'L', numa = m
If side = 'R', numa = n

and:
side = 'L' and MB_B <> NB_A
side = 'R' and NB_B <> MB_A

If (m <> 0 or side <> 'L') and (n <> 0 or side <> 'R'):

ia > M_A
ja > N_A
ia+numa-1 > M_A
ja+numa-1 > N_A
where numa = m if side = 'L' and numa = n if side = 'R'.

If m <> 0 and n <> 0:

ib > M_B
jb > N_B
ib+m-1 > M_B
jb+n-1 > N_B

If A is not contained in a single block:

mod(ia-1, MB_A) <> 0
mod(ja-1, NB_A) <> 0
side = 'L' and mod(ib-1, MB_B) <> 0
side = 'R' and mod(jb-1, NB_B) <> 0

Stage 6

LLD_A < max(1, LOCp(M_A))
LLD_B < max(1, LOCp(M_B))

If side = 'L':
In the process grid, the process row containing the first row of the submatrix A does not contain the first row of the submatrix B; that is, iarow <> ibrow, where:

iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
ibrow = mod((((ib-1)/MB_B)+RSRC_B), p)
If A is contained in a single block:

p > 1 and m+mod(ib-1, MB_B) > MB_B

If side = 'R':
In the process grid, the process column containing the first column of the submatrix A does not contain the first column of the submatrix B; that is, iacol <> ibcol, where:

iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
ibcol = mod((((jb-1)/NB_B)+CSRC_B), q)
If A is contained in a single block:

q > 1 and n+mod(jb-1, NB_B) > NB_B

Example

This example computes B <-- alphaAB using a 2 × 2 process grid.

Call Statements and Input

 ORDER = 'R'
 NPROW = 2
 NPCOL = 2
 CALL BLACS_GET(0, 0, ICONTXT)
 CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
 CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
              SIDE  UPLO  TRANSA  DIAG  M   N    ALPHA    A  IA  JA   DESC_A
               |     |      |      |    |   |      |      |   |   |     |
 CALL PDTRMM( 'L' , 'U'  , 'N'  , 'N' , 5 , 3 ,  1.0D0  , A , 1 , 1 , DESC_A ,
 
              B  IB  JB   DESC_B
              |   |   |     |
              B , 1 , 1 , DESC_B )

Name	Desc_A	Desc_B
DTYPE_	1	1
CTXT_	icontxt¹	icontxt¹
M_	5	5
N_	5	3
MB_	2	2
NB_	2	2
RSRC_	0	0
CSRC_	0	0
LLD_	See below²	See below²

¹ icontxt is the output of the BLACS_GRIDINIT call.

² Each process should set the LLD_ as follows:

LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
LLD_B = MAX(1,NUMROC(M_B, MB_B, MYROW, RSRC_B, NPROW))

In this example, LLD_A = LLD_B = 3 on P₀₀ and P₀₁, and LLD_A = LLD_B = 2 on P₁₀ and P₁₁.
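
The LLD_ values quoted above can be reproduced with a short Python sketch of the NUMROC arithmetic. This follows the commonly published ScaLAPACK form of the calculation and is offered as an illustration, not as the library's source. The values M = 5, MB = 2, and RSRC = 0 are assumed from the matrices and block sizes shown in this example.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Number of rows (or columns) of an n-long, nb-blocked dimension held by iproc."""
    # Distance of this process from the process that holds the first block.
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb                     # number of whole blocks
    count = (nblocks // nprocs) * nb      # whole blocks that every process receives
    extra = nblocks % nprocs              # leftover whole blocks
    if mydist < extra:
        count += nb                       # this process gets one more whole block
    elif mydist == extra:
        count += n % nb                   # this process gets the trailing partial block
    return count


# Assumed example values: M_A = M_B = 5, MB_A = MB_B = 2, RSRC = 0, NPROW = 2.
for myrow in range(2):
    lld = max(1, numroc(5, 2, myrow, 0, 2))
    print("process row", myrow, "LLD =", lld)
```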

Global triangular matrix A of order 5 is upper triangular with block size 2 × 2:

B,D        0             1          2
     *                                  *
 0   |  3.0 -1.0  |   2.0  2.0  |   1.0 |
     |   .  -2.0  |   4.0 -1.0  |   3.0 |
     | -----------|-------------|------ |
 1   |   .    .   |  -3.0  0.0  |   2.0 |
     |   .    .   |    .   4.0  |  -2.0 |
     | -----------|-------------|------ |
 2   |   .    .   |    .    .   |   1.0 |
     *                                  *

The following is the 2 × 2 process grid:

B,D	0 2	1
0 2	P₀₀	P₀₁
1	P₁₀	P₁₁

Local arrays for A:

p,q  |       0         |      1
-----|-----------------|------------
     |  3.0 -1.0  1.0  |   2.0  2.0
 0   |   .  -2.0  3.0  |   4.0 -1.0
     |   .    .   1.0  |    .    .
-----|-----------------|------------
 1   |   .    .   2.0  |  -3.0  0.0
     |   .    .  -2.0  |    .   4.0

Global rectangular 5 × 3 matrix B with block size 2 × 2:

B,D        0          1
     *                    *
 0   |  2.0  3.0  |   1.0 |
     |  5.0  5.0  |   4.0 |
     | -----------|------ |
 1   |  0.0  1.0  |   2.0 |
     |  3.0  1.0  |  -3.0 |
     | -----------|------ |
 2   | -1.0  2.0  |   1.0 |
     *                    *

The following is the 2 × 2 process grid:

B,D	0	1
0 2	P₀₀	P₀₁
1	P₁₀	P₁₁

Local arrays for B:

p,q  |     0      |   1
-----|------------|-------
     |  2.0  3.0  |   1.0
 0   |  5.0  5.0  |   4.0
     | -1.0  2.0  |   1.0
-----|------------|-------
 1   |  0.0  1.0  |   2.0
     |  3.0  1.0  |  -3.0

Output:

Global rectangular 5 × 3 matrix B with block size 2 × 2:

B,D         0            1
     *                       *
 0   |   6.0  10.0  |   -2.0 |
     | -16.0  -1.0  |    6.0 |
     | -------------|------- |
 1   |  -2.0   1.0  |   -4.0 |
     |  14.0   0.0  |  -14.0 |
     | -------------|------- |
 2   |  -1.0   2.0  |    1.0 |
     *                       *

The following is the 2 × 2 process grid:

B,D	0	1
0 2	P₀₀	P₀₁
1	P₁₀	P₁₁

Local arrays for B:

p,q  |      0       |    1
-----|--------------|--------
     |   6.0  10.0  |   -2.0
 0   | -16.0  -1.0  |    6.0
     |  -1.0   2.0  |    1.0
-----|--------------|--------
 1   |  -2.0   1.0  |   -4.0
     |  14.0   0.0  |  -14.0
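
The global result above can be checked serially. The following Python sketch computes alpha times the upper triangle of A times B on undistributed nested lists. It is a minimal illustration of the arithmetic only, with a hypothetical function name, and says nothing about how the parallel subroutine organizes the work. The strictly lower triangle of A is filled with None to show that it is never read.

```python
def trmm_left_upper(alpha, a, b, unit_diag=False):
    """Return alpha * triu(A) * B for an m x m matrix A and an m x n matrix B."""
    m, n = len(b), len(b[0])
    result = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            total = 0.0
            # Only columns k >= i of row i belong to the upper triangle.
            for k in range(i, m):
                a_ik = 1.0 if (unit_diag and k == i) else a[i][k]
                total += a_ik * b[k][j]
            result[i][j] = alpha * total
    return result


X = None  # placeholder for the unreferenced strictly lower triangle
A = [
    [3.0, -1.0,  2.0,  2.0,  1.0],
    [X,   -2.0,  4.0, -1.0,  3.0],
    [X,    X,   -3.0,  0.0,  2.0],
    [X,    X,    X,    4.0, -2.0],
    [X,    X,    X,    X,    1.0],
]
B = [
    [ 2.0, 3.0,  1.0],
    [ 5.0, 5.0,  4.0],
    [ 0.0, 1.0,  2.0],
    [ 3.0, 1.0, -3.0],
    [-1.0, 2.0,  1.0],
]

for row in trmm_left_upper(1.0, A, B):
    print(row)
```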

PDTRSM--Solution of Triangular System of Equations with Multiple Right-Hand Sides

This subroutine performs one of the following solves for a triangular system of equations with multiple right-hand sides:

Solution	Equation
1. B <-- alpha(A^-1)B	AX = alphaB
2. B <-- alpha(A^-T)B	A^TX = alphaB
3. B <-- alphaB(A^-1)	XA = alphaB
4. B <-- alphaB(A^-T)	XA^T = alphaB

where, in the formulas above:

A represents the global triangular submatrix:

For side = 'L', it is A_{ia:ia+m-1, ja:ja+m-1}.
For side = 'R', it is A_{ia:ia+n-1, ja:ja+n-1}.

B represents the global general submatrix B_{ib:ib+m-1, jb:jb+n-1}.

alpha is a scalar.

Notes:

The term X used in the systems of equations listed above represents the output solution matrix. It is important to note that, in this subroutine, the solution matrix is actually returned in the input-output argument b.
No data should be moved to form the matrix transpose; that is, the matrix should always be stored in its untransposed form.

If m = 0 or n = 0, no computation is performed, and the subroutine returns after doing some parameter checking.

See references [14] and [15].

Table 50. Data Types

alpha, A, B	Subprogram
Long-precision real	PDTRSM

Syntax

Fortran	CALL PDTRSM (`side`, `uplo`, `transa`, `diag`, `m`, `n`, `alpha`, `a`, `ia`, `ja`, `desc_a`, `b`, `ib`, `jb`, `desc_b`)
C and C++	pdtrsm (`side`, `uplo`, `transa`, `diag`, `m`, `n`, `alpha`, `a`, `ia`, `ja`, `desc_a`, `b`, `ib`, `jb`, `desc_b`);

On Entry

side

indicates whether A is located to the left or right of B in the system of equations, where:

If side = 'L', A is to the left of B, resulting in solution 1 or 2.

If side = 'R', A is to the right of B, resulting in solution 3 or 4.

Scope: global

Specified as: a single character; side = 'L' or 'R'.

uplo

indicates whether the upper or lower triangular part of the global triangular submatrix A is referenced, where:

If uplo = 'U', the upper triangular part is referenced.

If uplo = 'L', the lower triangular part is referenced.

Scope: global

Specified as: a single character; uplo = 'U' or 'L'.

transa

indicates the form of matrix A used in the system of equations, where:

If transa = 'N', A is used in the system of equations, resulting in solution 1 or 3.

If transa = 'T', A^T is used in the system of equations, resulting in solution 2 or 4.

Scope: global

Specified as: a single character; transa = 'N' or 'T'.

diag

indicates the characteristics of the diagonal of matrix A, where:

If diag = 'U', A is a unit triangular matrix.

If diag = 'N', A is not a unit triangular matrix.

Scope: global

Specified as: a single character; diag = 'U' or 'N'.

m

is the number of rows in submatrix B, and:

If side = 'L', it is the number of rows and columns in submatrix A used in the computation.

Scope: global

Specified as: a fullword integer; m >= 0.

n

is the number of columns in submatrix B, and:

If side = 'R', it is the number of rows and columns in submatrix A used in the computation.

Scope: global

Specified as: a fullword integer; n >= 0.

alpha

is the scalar alpha.

Scope: global

Specified as: a number of the data type indicated in Table 50.

a

is the local part of the global triangular matrix A, used in the system of equations. This identifies the first element of the local array A. This subroutine computes the location of the first element of the local subarray used, based on ia, ja, desc_a, p, q, myrow, and mycol; therefore, assuming the following:

If side = 'L', numa = m

If side = 'R', numa = n

the leading LOCp(ia+numa-1) by LOCq(ja+numa-1) part of the local array A must contain the local pieces of the leading ia+numa-1 by ja+numa-1 part of the global matrix, and:

If uplo = 'U', the leading numa × numa upper triangular part of the global triangular submatrix A_{ia:ia+numa-1, ja:ja+numa-1} must contain the upper triangular part of the submatrix, and the strictly lower triangular part is not referenced.

If uplo = 'L', the leading numa × numa lower triangular part of the global triangular submatrix A_{ia:ia+numa-1, ja:ja+numa-1} must contain the lower triangular part of the submatrix, and the strictly upper triangular part is not referenced.

Note: No data should be moved to form A^T; that is, the matrix A should always be stored in its untransposed form.

Scope: local

Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 50. Details about the block-cyclic data distribution of global matrix A are stored in desc_a.

ia

is the row index of the global matrix A, identifying the first row of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ia <= M_A and ia+numa-1 <= M_A.

ja

is the column index of the global matrix A, identifying the first column of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ja <= N_A and ja+numa-1 <= N_A.

desc_a

is the array descriptor for global matrix A, described in the following table:

`desc_a`	Name	Description	Limits	Scope
1	DTYPE_A	Descriptor type	DTYPE_A=1	Global
2	CTXT_A	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_A	Number of rows in the global matrix	If `side` = 'L' and `m` = 0: M_A >= 0 If `side` = 'R' and `n` = 0: M_A >= 0 Otherwise: M_A >= 1	Global
4	N_A	Number of columns in the global matrix	If `side` = 'L' and `m` = 0: N_A >= 0 If `side` = 'R' and `n` = 0: N_A >= 0 Otherwise: N_A >= 1	Global
5	MB_A	Row block size	MB_A >= 1	Global
6	NB_A	Column block size	NB_A >= 1	Global
7	RSRC_A	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_A < `p`	Global
8	CSRC_A	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_A < `q`	Global
9	LLD_A	The leading dimension of the local array	LLD_A >= max(1,LOCp(M_A))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

b

is the local part of the global general matrix B, containing the right-hand sides of the triangular system to be solved. This identifies the first element of the local array B. This subroutine computes the location of the first element of the local subarray used, based on ib, jb, desc_b, p, q, myrow, and mycol; therefore, the leading LOCp(ib+m-1) by LOCq(jb+n-1) part of the local array B must contain the local pieces of the leading ib+m-1 by jb+n-1 part of the global matrix.

Scope: local

Specified as: an LLD_B by (at least) LOCq(N_B) array, containing numbers of the data type indicated in Table 50. Details about the block-cyclic data distribution of global matrix B are stored in desc_b.

ib

is the row index of the global matrix B, identifying the first row of the submatrix B.

Scope: global

Specified as: a fullword integer; 1 <= ib <= M_B and ib+m-1 <= M_B.

jb

is the column index of the global matrix B, identifying the first column of the submatrix B.

Scope: global

Specified as: a fullword integer; 1 <= jb <= N_B and jb+n-1 <= N_B.

desc_b

is the array descriptor for global matrix B, described in the following table:

`desc_b`	Name	Description	Limits	Scope
1	DTYPE_B	Descriptor type	DTYPE_B=1	Global
2	CTXT_B	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_B	Number of rows in the global matrix	If `side` = 'L' and `m` = 0: M_B >= 0 If `side` = 'R' and `n` = 0: M_B >= 0 Otherwise: M_B >= 1	Global
4	N_B	Number of columns in the global matrix	If `side` = 'L' and `m` = 0: N_B >= 0 If `side` = 'R' and `n` = 0: N_B >= 0 Otherwise: N_B >= 1	Global
5	MB_B	Row block size	MB_B >= 1	Global
6	NB_B	Column block size	NB_B >= 1	Global
7	RSRC_B	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_B < `p`	Global
8	CSRC_B	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_B < `q`	Global
9	LLD_B	The leading dimension of the local array	LLD_B >= max(1,LOCp(M_B))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

On Return

b

is the updated local part of the global matrix B, containing the n solution vectors of length m.

Scope: local

Returned as: an LLD_B by (at least) LOCq(N_B) array, containing numbers of the data type indicated in Table 50.

Notes and Coding Rules

This subroutine accepts lowercase letters for the side, uplo, transa, and diag arguments.
If you specify 'C' for transa, it is interpreted as though you specified 'T'.
The matrices must have no common elements; otherwise, results are unpredictable.
PDTRSM assumes certain values in your array for parts of a triangular matrix. As a result, you do not have to set these values. For unit triangular matrices, the elements of the diagonal are assumed to be one. When using an upper or lower triangular matrix, the unreferenced elements in the lower and upper triangular part, respectively, are assumed to be zero.
The NUMROC utility subroutine can be used to determine the values of LOCp(M_) and LOCq(N_) used in the argument descriptions above. For details, see "Determining the Number of Rows and Columns in Your Local Arrays" and NUMROC--Compute the Number of Rows or Columns of a Block-Cyclically Distributed Matrix Contained in a Process.
For suggested block sizes, see "Coding Tips for Optimizing Parallel Performance".
The following values must be equal: CTXT_A = CTXT_B.
If looping is required--that is, either of the following is true:

side = 'L' and m+mod(ia-1, MB_A) > MB_A
side = 'R' and n+mod(ja-1, NB_A) > NB_A

then the global triangular matrix A must be distributed using a square block-cyclic distribution; that is, MB_A = NB_A.
If A is not contained within a single block, that is:

numa+mod(ia-1, MB_A) > MB_A
numa+mod(ja-1, NB_A) > NB_A
where:

If side = 'L', numa = m
If side = 'R', numa = n

then the global triangular matrix A must be aligned on a block boundary, that is:

ia-1 must be a multiple of MB_A.
ja-1 must be a multiple of NB_A.
If side = 'L':
- If A is not contained within a single block, then:
  - The following block sizes must be equal: MB_B = NB_A
  - The global matrix B must be aligned on a block row boundary; that is, ib-1 must be a multiple of MB_B.
- In the process grid, the process row containing the first row of the submatrix A must also contain the first row of the submatrix B; that is, iarow = ibrow, where:
  
  iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
  ibrow = mod((((ib-1)/MB_B)+RSRC_B), p)
If side = 'R':
- If A is not contained within a single block, then:
  - The following block sizes must be equal: NB_B = MB_A
  - The global matrix B must be aligned on a block column boundary; that is, jb-1 must be a multiple of NB_B.
- In the process grid, the process column containing the first column of the submatrix A must also contain the first column of the submatrix B, that is, iacol = ibcol, where:
  
  iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
  ibcol = mod((((jb-1)/NB_B)+CSRC_B), q)
If A is contained within a single block, then:
- If side = 'L', then B must be a block row matrix; that is, if p > 1:
  
  m+mod(ib-1, MB_B) <= MB_B
- If side = 'R', then B must be a block column matrix; that is, if q > 1:
  
  n+mod(jb-1, NB_B) <= NB_B

Error Conditions

Computational Errors

None

Resource Errors

Unable to allocate work space

Input-Argument and Miscellaneous Errors

Stage 1

DTYPE_A is invalid.
DTYPE_B is invalid.

Stage 2

CTXT_A is invalid.

Stage 3

PDTRSM was called from outside the process grid.

Stage 4

side <> 'L' or 'R'
uplo <> 'U' or 'L'
transa <> 'N', 'T', or 'C'
diag <> 'N' or 'U'
m < 0
n < 0
M_A < 0 and m = 0 and side = 'L'; M_A < 0 and n = 0 and side = 'R'; M_A < 1 otherwise
N_A < 0 and m = 0 and side = 'L'; N_A < 0 and n = 0 and side = 'R'; N_A < 1 otherwise
MB_A < 1
NB_A < 1
M_B < 0 and (m = 0 or n = 0); M_B < 1 otherwise
N_B < 0 and (m = 0 or n = 0); N_B < 1 otherwise
MB_B < 1
NB_B < 1
RSRC_A < 0 or RSRC_A >= p
CSRC_A < 0 or CSRC_A >= q
RSRC_B < 0 or RSRC_B >= p
CSRC_B < 0 or CSRC_B >= q
ia < 1
ja < 1
ib < 1
jb < 1
CTXT_A <> CTXT_B

Stage 5

If A is not contained within a single block, that is:

numa+mod(ia-1, MB_A) > MB_A

numa+mod(ja-1, NB_A) > NB_A

where:

If side = 'L', numa = m

If side = 'R', numa = n

then:

MB_A <> NB_A
side = 'L' and MB_B <> NB_A
side = 'R' and NB_B <> MB_A

If (m <> 0 or side <> 'L') and (n <> 0 or side <> 'R'):

ia > M_A
ja > N_A
ia+numa-1 > M_A
ja+numa-1 > N_A
where numa = m if side = 'L' and numa = n if side = 'R'.

If m <> 0 and n <> 0:

ib > M_B
jb > N_B
ib+m-1 > M_B
jb+n-1 > N_B

If A is not contained in a single block:

mod(ia-1, MB_A) <> 0
mod(ja-1, NB_A) <> 0
side = 'L' and mod(ib-1, MB_B) <> 0
side = 'R' and mod(jb-1, NB_B) <> 0

Stage 6

LLD_A < max(1, LOCp(M_A))
LLD_B < max(1, LOCp(M_B))

If side = 'L':

In the process grid, the process row containing the first row of the submatrix A does not contain the first row of the submatrix B; that is, iarow <> ibrow, where:

iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
ibrow = mod((((ib-1)/MB_B)+RSRC_B), p)
If A is contained in a single block:

p > 1 and m+mod(ib-1, MB_B) > MB_B

If side = 'R':

In the process grid, the process column containing the first column of the submatrix A does not contain the first column of the submatrix B; that is, iacol <> ibcol, where:

iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
ibcol = mod((((jb-1)/NB_B)+CSRC_B), q)
If A is contained in a single block:

q > 1 and n+mod(jb-1, NB_B) > NB_B

Example

This example shows the solution B <-- alpha(A^-1)B using a 2 × 2 process grid.

Call Statements and Input

 ORDER = 'R'
 NPROW = 2
 NPCOL = 2
 CALL BLACS_GET (0, 0, ICONTXT)
 CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
 CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
              SIDE  UPLO  TRANSA   DIAG   M   N    ALPHA    A  IA  JA    DESC_A
               |      |      |       |    |   |      |      |   |   |      |
 CALL PDTRSM( 'L'  , 'U' ,  'N'  ,  'N' , 5 , 3 ,  1.0D0  , A , 1 , 1 ,  DESC_A ,
 
              B  IB  JB   DESC_B
              |   |   |     |
              B , 1 , 1 , DESC_B )

Name	Desc_A	Desc_B
DTYPE_	1	1
CTXT_	icontxt¹	icontxt¹
M_	5	5
N_	5	3
MB_	2	2
NB_	2	2
RSRC_	0	0
CSRC_	0	0
LLD_	See below²	See below²

¹ icontxt is the output of the BLACS_GRIDINIT call.

² Each process should set the LLD_ as follows:

LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
LLD_B = MAX(1,NUMROC(M_B, MB_B, MYROW, RSRC_B, NPROW))

In this example, LLD_A = LLD_B = 3 on P₀₀ and P₀₁, and LLD_A = LLD_B = 2 on P₁₀ and P₁₁.

Global triangular matrix A of order 5 is upper triangular with block size 2 × 2:

B,D        0             1          2
     *                                  *
 0   |  3.0 -1.0  |   2.0  2.0  |   1.0 |
     |   .  -2.0  |   4.0 -1.0  |   3.0 |
     | -----------|-------------|------ |
 1   |   .    .   |  -3.0  0.0  |   2.0 |
     |   .    .   |    .   4.0  |  -2.0 |
     | -----------|-------------|------ |
 2   |   .    .   |    .    .   |   1.0 |
     *                                  *

The following is the 2 × 2 process grid:

B,D	0 2	1
0 2	P₀₀	P₀₁
1	P₁₀	P₁₁

Local arrays for A:

p,q  |       0         |      1
-----|-----------------|------------
     |  3.0 -1.0  1.0  |   2.0  2.0
 0   |   .  -2.0  3.0  |   4.0 -1.0
     |   .    .   1.0  |    .    .
-----|-----------------|------------
 1   |   .    .   2.0  |  -3.0  0.0
     |   .    .  -2.0  |    .   4.0

Global general 5 × 3 matrix B with block size 2 × 2:

B,D         0            1
     *                       *
 0   |   6.0  10.0  |   -2.0 |
     | -16.0  -1.0  |    6.0 |
     | -------------|------- |
 1   |  -2.0   1.0  |   -4.0 |
     |  14.0   0.0  |  -14.0 |
     | -------------|------- |
 2   |  -1.0   2.0  |    1.0 |
     *                       *

The following is the 2 × 2 process grid:

B,D	0	1
0 2	P₀₀	P₀₁
1	P₁₀	P₁₁

Local arrays for B:

p,q  |      0       |    1
-----|--------------|--------
     |   6.0  10.0  |   -2.0
 0   | -16.0  -1.0  |    6.0
     |  -1.0   2.0  |    1.0
-----|--------------|--------
 1   |  -2.0   1.0  |   -4.0
     |  14.0   0.0  |  -14.0

Output:

Global general 5 × 3 matrix B with block size 2 × 2:

B,D        0          1
     *                    *
 0   |  2.0  3.0  |   1.0 |
     |  5.0  5.0  |   4.0 |
     | -----------|------ |
 1   |  0.0  1.0  |   2.0 |
     |  3.0  1.0  |  -3.0 |
     | -----------|------ |
 2   | -1.0  2.0  |   1.0 |
     *                    *

The following is the 2 × 2 process grid:

B,D	0	1
0 2	P₀₀	P₀₁
1	P₁₀	P₁₁

Local arrays for B:

p,q  |     0      |   1
-----|------------|-------
     |  2.0  3.0  |   1.0
 0   |  5.0  5.0  |   4.0
     | -1.0  2.0  |   1.0
-----|------------|-------
 1   |  0.0  1.0  |   2.0
     |  3.0  1.0  |  -3.0
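
The output above can be checked serially with back substitution. The following Python sketch solves AX = alphaB for an upper triangular A on undistributed nested lists. The function name is hypothetical and the code illustrates only the arithmetic of solution 1, not the parallel algorithm. The strictly lower triangle of A is filled with None to show that it is never read.

```python
def trsm_left_upper(alpha, a, b, unit_diag=False):
    """Solve triu(A) * X = alpha * B by back substitution and return X."""
    m, n = len(b), len(b[0])
    x = [[alpha * value for value in row] for row in b]
    # Work upward from the last row; row i depends only on rows below it.
    for i in range(m - 1, -1, -1):
        for j in range(n):
            total = x[i][j]
            for k in range(i + 1, m):
                total -= a[i][k] * x[k][j]
            x[i][j] = total if unit_diag else total / a[i][i]
    return x


X = None  # placeholder for the unreferenced strictly lower triangle
A = [
    [3.0, -1.0,  2.0,  2.0,  1.0],
    [X,   -2.0,  4.0, -1.0,  3.0],
    [X,    X,   -3.0,  0.0,  2.0],
    [X,    X,    X,    4.0, -2.0],
    [X,    X,    X,    X,    1.0],
]
B = [
    [  6.0, 10.0,  -2.0],
    [-16.0, -1.0,   6.0],
    [ -2.0,  1.0,  -4.0],
    [ 14.0,  0.0, -14.0],
    [ -1.0,  2.0,   1.0],
]

for row in trsm_left_upper(1.0, A, B):
    print(row)
```

The input B here is the output of the PDTRMM example, so the solve recovers that example's original B.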

PDSYRK--Rank-K Update of a Real Symmetric Matrix

This subroutine computes one of the following rank-k updates:

1. C <-- alphaAA^T+betaC

2. C <-- alphaA^TA+betaC

where, in the formulas above:

A represents the global general submatrix:

For trans = 'N', it is A_{ia:ia+n-1, ja:ja+k-1}.
For trans = 'T', it is A_{ia:ia+k-1, ja:ja+n-1}.

C represents the global symmetric submatrix C_{ic:ic+n-1, jc:jc+n-1}.

alpha and beta are scalars.

Note: No data should be moved to form the matrix transpose; that is, the matrix should always be stored in its untransposed form.

In the following two cases, no computation is performed and the subroutine returns after doing some parameter checking:

n = 0
beta is one, and alpha is zero or k = 0.

See references [14] and [15].
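
To make the two update forms concrete, the following Python sketch applies a rank-k update to undistributed nested lists and touches only the triangle of C selected by uplo. The function name and the sample data are hypothetical. It illustrates the formulas only and is not a model of the parallel implementation.

```python
def syrk(uplo, trans, n, k, alpha, a, beta, c):
    """Return a copy of C with the selected triangle updated; the rest is untouched."""
    result = [row[:] for row in c]
    for i in range(n):
        cols = range(i, n) if uplo == "U" else range(0, i + 1)
        for j in cols:
            if trans == "N":
                # A is n x k: (A A^T)[i][j] = sum over l of A[i][l] * A[j][l]
                dot = sum(a[i][l] * a[j][l] for l in range(k))
            else:
                # A is k x n: (A^T A)[i][j] = sum over l of A[l][i] * A[l][j]
                dot = sum(a[l][i] * a[l][j] for l in range(k))
            result[i][j] = alpha * dot + beta * c[i][j]
    return result


# Hypothetical data: n = 3, k = 2. The 9.0 entries mark the unreferenced triangle.
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
C = [[1.0, 0.0, 0.0], [9.0, 1.0, 0.0], [9.0, 9.0, 1.0]]

for row in syrk("U", "N", 3, 2, 1.0, A, 2.0, C):
    print(row)
```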

Table 51. Data Types

alpha, beta, A, C	Subprogram
Long-precision real	PDSYRK

Syntax

Fortran	CALL PDSYRK (`uplo`, `trans`, `n`, `k`, `alpha`, `a`, `ia`, `ja`, `desc_a`, `beta`, `c`, `ic`, `jc`, `desc_c`)
C and C++	pdsyrk (`uplo`, `trans`, `n`, `k`, `alpha`, `a`, `ia`, `ja`, `desc_a`, `beta`, `c`, `ic`, `jc`, `desc_c`);

On Entry

uplo

indicates whether the upper or lower triangular part of the symmetric submatrix C is referenced, where:

If uplo = 'U', the upper triangular part is referenced.

If uplo = 'L', the lower triangular part is referenced.

Scope: global

Specified as: a single character; uplo = 'U' or 'L'.

trans

indicates which computation is performed, where:

If trans = 'N', the computation in equation 1 is performed.

If trans = 'T', the computation in equation 2 is performed.

Scope: global

Specified as: a single character; trans = 'N' or 'T'.

n

is the order of the global symmetric submatrix C used in the computation, and:

If trans = 'N', it is the number of rows in submatrix A used in the computation.

If trans = 'T', it is the number of columns in submatrix A used in the computation.

Scope: global

Specified as: a fullword integer; n >= 0.

k

has the following meaning:

If trans = 'N', it is the number of columns in submatrix A used in the computation.

If trans = 'T', it is the number of rows in submatrix A used in the computation.

Scope: global

Specified as: a fullword integer; k >= 0.

alpha

is the scalar alpha.

Scope: global

Specified as: a number of the data type indicated in Table 51.

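a

is the local part of the global general matrix A. This identifies the first element of the local array A. This subroutine computes the location of the first element of the local subarray used, based on ia, ja, desc_a, p, q, myrow, and mycol; therefore: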

If trans = 'N', the leading LOCp(ia+n-1) by LOCq(ja+k-1) part of the local array A must contain the local pieces of the leading ia+n-1 by ja+k-1 part of the global matrix.
If trans = 'T', the leading LOCp(ia+k-1) by LOCq(ja+n-1) part of the local array A must contain the local pieces of the leading ia+k-1 by ja+n-1 part of the global matrix.

Note:

No data should be moved to form A^T; that is, the matrix A should always be stored in its untransposed form.

Scope: local

Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 51. Details about the block-cyclic data distribution of global matrix A are stored in desc_a.

ia

is the row index of the global matrix A, identifying the first row of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ia <= M_A, and:

If trans = 'N', then ia+n-1 <= M_A.

If trans = 'T', then ia+k-1 <= M_A.

ja

is the column index of the global matrix A, identifying the first column of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ja <= N_A, and:

If trans = 'N', then ja+k-1 <= N_A.

If trans = 'T', then ja+n-1 <= N_A.
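
The limits on ia and ja depend on trans, because the submatrix A is n by k for trans = 'N' and k by n for trans = 'T'. A minimal Python sketch of that check follows; the helper name `a_bounds_ok` is hypothetical, and the sketch assumes n and k are both nonzero.

```python
def a_bounds_ok(trans, n, k, ia, ja, m_a, n_a):
    """Check the documented limits on ia and ja for submatrix A.

    trans = 'N': A is n rows by k columns.
    trans = 'T' (or 'C', which is treated as 'T'): A is k rows by n columns.
    Hypothetical helper; assumes n >= 1 and k >= 1.
    """
    rows, cols = (n, k) if trans.upper() == 'N' else (k, n)
    return (1 <= ia <= m_a and 1 <= ja <= n_a
            and ia + rows - 1 <= m_a
            and ja + cols - 1 <= n_a)
```

With the values from the Example section (trans = 'N', n = 8, k = 5, ia = ja = 1, M_A = 8, N_A = 5), the check passes.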

desc_a

is the array descriptor for global matrix A, described in the following table:

`desc_a`	Name	Description	Limits	Scope
1	DTYPE_A	Descriptor type	DTYPE_A=1	Global
2	CTXT_A	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_A	Number of rows in the global matrix	If `n` = 0 or `k` = 0: M_A >= 0 Otherwise: M_A >= 1	Global
4	N_A	Number of columns in the global matrix	If `n` = 0 or `k` = 0: N_A >= 0 Otherwise: N_A >= 1	Global
5	MB_A	Row block size	MB_A >= 1	Global
6	NB_A	Column block size	NB_A >= 1	Global
7	RSRC_A	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_A < `p`	Global
8	CSRC_A	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_A < `q`	Global
9	LLD_A	The leading dimension of the local array	LLD_A >= max(1,LOCp(M_A))	Local

Specified as: an array of (at least) length 9, containing fullword integers.
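
The Limits column of the descriptor table can be checked mechanically before the call. The following Python sketch is an illustration only: `check_descriptor` and `DESC_FIELDS` are hypothetical names, the descriptor is assumed to hold at least nine integers in the order shown in the table, and the CTXT and LLD entries are not checked because they need the BLACS context and LOCp(M_A).

```python
DESC_FIELDS = ("DTYPE", "CTXT", "M", "N", "MB", "NB", "RSRC", "CSRC", "LLD")


def check_descriptor(desc, p, q, n=1, k=1):
    """Return the names of descriptor entries that violate their limits.

    desc : sequence of at least 9 integers, in the table's order
    p, q : process grid dimensions
    n, k : problem sizes; M and N may be zero only if n = 0 or k = 0
    Hypothetical helper; CTXT and LLD are not validated here.
    """
    d = dict(zip(DESC_FIELDS, desc))
    min_dim = 0 if (n == 0 or k == 0) else 1
    problems = []
    if d["DTYPE"] != 1:
        problems.append("DTYPE")
    if d["M"] < min_dim:
        problems.append("M")
    if d["N"] < min_dim:
        problems.append("N")
    if d["MB"] < 1:
        problems.append("MB")
    if d["NB"] < 1:
        problems.append("NB")
    if not 0 <= d["RSRC"] < p:
        problems.append("RSRC")
    if not 0 <= d["CSRC"] < q:
        problems.append("CSRC")
    return problems
```

An empty list means that every checked entry is within its limits.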

beta

is the scalar beta.

Scope: global

Specified as: a number of the data type indicated in Table 51.

c

is the local part of the global symmetric matrix C. This identifies the first element of the local array C. This subroutine computes the location of the first element of the local subarray used, based on ic, jc, desc_c, p, q, myrow, and mycol; therefore, the leading LOCp(ic+n-1) by LOCq(jc+n-1) part of the local array C must contain the local pieces of the leading ic+n-1 by jc+n-1 part of the global matrix, and:

If uplo = 'U', the leading n × n upper triangular part of the global symmetric submatrix C_{ic:ic+n-1, jc:jc+n-1} must contain the upper triangular part of the submatrix, and the strictly lower triangular part is not referenced.
If uplo = 'L', the leading n × n lower triangular part of the global symmetric submatrix C_{ic:ic+n-1, jc:jc+n-1} must contain the lower triangular part of the submatrix, and the strictly upper triangular part is not referenced.

When beta is zero, C need not be set on input.

Scope: local

Specified as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 51. Details about the block-cyclic data distribution of global matrix C are stored in desc_c.

ic

is the row index of the global matrix C, identifying the first row of the submatrix C.

Scope: global

Specified as: a fullword integer; 1 <= ic <= M_C and ic+n-1 <= M_C.

jc

is the column index of the global matrix C, identifying the first column of the submatrix C.

Scope: global

Specified as: a fullword integer; 1 <= jc <= N_C and jc+n-1 <= N_C.

desc_c

is the array descriptor for global matrix C, described in the following table:

`desc_c`	Name	Description	Limits	Scope
1	DTYPE_C	Descriptor type	DTYPE_C=1	Global
2	CTXT_C	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_C	Number of rows in the global matrix	If `n` = 0: M_C >= 0 Otherwise: M_C >= 1	Global
4	N_C	Number of columns in the global matrix	If `n` = 0: N_C >= 0 Otherwise: N_C >= 1	Global
5	MB_C	Row block size	MB_C >= 1	Global
6	NB_C	Column block size	NB_C >= 1	Global
7	RSRC_C	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_C < `p`	Global
8	CSRC_C	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_C < `q`	Global
9	LLD_C	The leading dimension of the local array	LLD_C >= max(1,LOCp(M_C))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

On Return

c

is the updated local part of the global symmetric matrix C, containing the results of the computation.

Scope: local

Returned as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 51.

Notes and Coding Rules

This subroutine accepts lowercase letters for the uplo and trans arguments.
If you specify 'C' for the trans argument, it is interpreted as though you specified 'T'.
The matrices must have no common elements; otherwise, results are unpredictable.
The NUMROC utility subroutine can be used to determine the values of LOCp(M_) and LOCq(N_) used in the argument descriptions above. For details, see "Determining the Number of Rows and Columns in Your Local Arrays" and NUMROC--Compute the Number of Rows or Columns of a Block-Cyclically Distributed Matrix Contained in a Process.
For suggested block sizes, see "Coding Tips for Optimizing Parallel Performance".
The following values must be equal: CTXT_A = CTXT_C.
If C is not contained within a single block, that is, either of the following is true:

n+mod(ic-1, MB_C) > MB_C
n+mod(jc-1, NB_C) > NB_C

then:
- The global symmetric matrix C must be distributed using a square block-cyclic distribution; that is, MB_C = NB_C.
- The global symmetric matrix C must be aligned on a block boundary, that is:
  
  ic-1 must be a multiple of MB_C.
  jc-1 must be a multiple of NB_C.
If trans = 'N':
- If C is not contained within a single block, then:
  - The following block sizes must be equal: MB_A = NB_C.
  - The global matrix A must be aligned on a block row boundary; that is, ia-1 must be a multiple of MB_A.
- In the process grid, the process row containing the first row of the submatrix C must also contain the first row of the submatrix A; that is, icrow = iarow, where:
  
  icrow = mod((((ic-1)/MB_C)+RSRC_C), p)
  iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
If trans = 'T':
- If C is not contained within a single block, then:
  - The following block sizes must be equal: NB_A = MB_C.
  - The global matrix A must be aligned on a block column boundary; that is, ja-1 must be a multiple of NB_A.
- In the process grid, the process column containing the first column of the submatrix C must also contain the first column of the submatrix A; that is, iccol = iacol, where:
  
  iccol = mod((((jc-1)/NB_C)+CSRC_C), q)
  iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
If C is contained within a single block:
- If trans = 'N', A must be a block row matrix; that is, if p > 1:
  
  n+mod(ia-1, MB_A) <= MB_A
- If trans = 'T', A must be a block column matrix; that is, if q > 1:
  
  n+mod(ja-1, NB_A) <= NB_A
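
The alignment rules above can be collected into one checking routine. The Python sketch below is illustrative only: `owner` and `pdsyrk_rule_violations` are hypothetical names, the division in the icrow and iarow formulas is assumed to be integer division, and C is treated as contained within a single block only when both of its extents fit in one block.

```python
def owner(index, block, src, nprocs):
    """Process row or column holding a 1-based global index.

    Assumes integer (truncating) division, as in the icrow/iarow formulas.
    """
    return ((index - 1) // block + src) % nprocs


def pdsyrk_rule_violations(trans, n, ia, ja, ic, jc,
                           mb_a, nb_a, rsrc_a, csrc_a,
                           mb_c, nb_c, rsrc_c, csrc_c, p, q):
    """Return a list of violated coding rules (empty if none)."""
    is_n = trans.upper() == 'N'   # 'C' is treated as 'T'
    single = (n + (ic - 1) % mb_c <= mb_c and
              n + (jc - 1) % nb_c <= nb_c)
    bad = []
    if not single:
        if mb_c != nb_c:
            bad.append("MB_C must equal NB_C")
        if (ic - 1) % mb_c != 0:
            bad.append("ic-1 must be a multiple of MB_C")
        if (jc - 1) % nb_c != 0:
            bad.append("jc-1 must be a multiple of NB_C")
    if is_n:
        if not single:
            if mb_a != nb_c:
                bad.append("MB_A must equal NB_C")
            if (ia - 1) % mb_a != 0:
                bad.append("ia-1 must be a multiple of MB_A")
        elif p > 1 and n + (ia - 1) % mb_a > mb_a:
            bad.append("A must be a block row matrix")
        if owner(ic, mb_c, rsrc_c, p) != owner(ia, mb_a, rsrc_a, p):
            bad.append("icrow must equal iarow")
    else:
        if not single:
            if nb_a != mb_c:
                bad.append("NB_A must equal MB_C")
            if (ja - 1) % nb_a != 0:
                bad.append("ja-1 must be a multiple of NB_A")
        elif q > 1 and n + (ja - 1) % nb_a > nb_a:
            bad.append("A must be a block column matrix")
        if owner(jc, nb_c, csrc_c, q) != owner(ja, nb_a, csrc_a, q):
            bad.append("iccol must equal iacol")
    return bad
```

For the call in the Example section (trans = 'N', n = 8, all indices 1, all block sizes 2, all source processes 0, on a 2 × 3 grid), the list is empty.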

Error Conditions

Computational Errors

None

Resource Errors

Unable to allocate work space

Input-Argument and Miscellaneous Errors

Stage 1

DTYPE_A is invalid.
DTYPE_C is invalid.

Stage 2

CTXT_A is invalid.

Stage 3

PDSYRK was called from outside the process grid.

Stage 4

uplo <> 'U' or 'L'
trans <> 'N', 'T', or 'C'
n < 0 and trans = 'N'
n < 0 and trans = 'T' or 'C'
n < 0 and trans is invalid.
k < 0 and trans = 'N'
k < 0 and trans = 'T' or 'C'
k < 0 and trans is invalid.
M_A < 0 and (n = 0 or k = 0); M_A < 1 otherwise
N_A < 0 and (n = 0 or k = 0); N_A < 1 otherwise
MB_A < 1
NB_A < 1
RSRC_A < 0 or RSRC_A >= p
CSRC_A < 0 or CSRC_A >= q
ia < 1
ja < 1
M_C < 0 and n = 0; M_C < 1 otherwise
N_C < 0 and n = 0; N_C < 1 otherwise
MB_C < 1
NB_C < 1
RSRC_C < 0 or RSRC_C >= p
CSRC_C < 0 or CSRC_C >= q
ic < 1
jc < 1
CTXT_A <> CTXT_C

If n <> 0 and k <> 0:

ia > M_A
ja > N_A
trans = 'N' and ia+n-1 > M_A
trans = 'N' and ja+k-1 > N_A
trans = 'T' and ia+k-1 > M_A
trans = 'T' and ja+n-1 > N_A

If n <> 0:

ic > M_C
jc > N_C
ic+n-1 > M_C
jc+n-1 > N_C

Stage 5

If C is not contained within a single block, that is, either of the following is true:

n+mod(ic-1, MB_C) > MB_C
n+mod(jc-1, NB_C) > NB_C

and:

NB_C <> MB_C
trans = 'N' and NB_C <> MB_A
trans = 'T' and MB_C <> NB_A

If C is not contained within a single block:

mod(ic-1, MB_C) <> 0
mod(jc-1, NB_C) <> 0
trans = 'N' and mod(ia-1, MB_A) <> 0
trans = 'T' and mod(ja-1, NB_A) <> 0

Stage 6

LLD_A < max(1, LOCp(M_A))
LLD_C < max(1, LOCp(M_C))
If trans = 'N', then (in the process grid) the process row containing the first row of the submatrix C does not contain the first row of the submatrix A; that is, icrow <> iarow, where:

icrow = mod((((ic-1)/MB_C)+RSRC_C), p)
iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
If trans = 'T', then (in the process grid) the process column containing the first column of the submatrix C does not contain the first column of the submatrix A; that is, iccol <> iacol, where:

iccol = mod((((jc-1)/NB_C)+CSRC_C), q)
iacol = mod((((ja-1)/NB_A)+CSRC_A), q)

If C is contained within a single block:

If trans = 'N':

p > 1 and n+mod(ia-1, MB_A) > MB_A
If trans = 'T':

q > 1 and n+mod(ja-1, NB_A) > NB_A

Example

This example computes C = alphaAA^T+betaC using a 2 × 3 process grid.

Call Statements and Input

 ORDER = 'R'
 NPROW = 2
 NPCOL = 3
 CALL BLACS_GET (0, 0, ICONTXT)
 CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
 CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
             UPLO   TRANS   N    K     ALPHA    A  IA  JA    DESC_A    BETA
               |      |     |    |       |      |   |   |      |         |
 CALL PDSYRK( 'L' ,  'N' ,  8  , 5  ,  1.0D0  , A , 1 , 1 ,  DESC_A ,  1.0D0 ,
 
               C  IC  JC   DESC_C
               |   |   |     |
               C , 1 , 1 , DESC_C )

Element	Desc_A	Desc_C
DTYPE_	1	1
CTXT_	icontxt¹	icontxt¹
M_	8	8
N_	5	8
MB_	2	2
NB_	2	2
RSRC_	0	0
CSRC_	0	0
LLD_	See below²	See below²

¹ icontxt is the output of the BLACS_GRIDINIT call.
² Each process should set the LLD_ as follows:
LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
LLD_C = MAX(1,NUMROC(M_C, MB_C, MYROW, RSRC_C, NPROW))

In this example, LLD_A = LLD_C = 4 on all processes.
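
To see where the value 4 comes from, the count that NUMROC returns can be reproduced outside the library. The following Python sketch follows the standard block-cyclic counting rule. It is a stand-in written for illustration, not the library routine, and the argument names are assumptions.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Number of rows or columns of a block-cyclically distributed
    dimension of size n (block size nb) held by process iproc.

    Illustrative re-implementation of the usual counting rule.
    """
    # Distance of this process from the one holding the first block
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb                      # number of whole blocks
    count = (nblocks // nprocs) * nb       # blocks every process gets
    extrablks = nblocks % nprocs           # leftover whole blocks
    if mydist < extrablks:
        count += nb                        # one extra whole block
    elif mydist == extrablks:
        count += n % nb                    # the final partial block
    return count


# Values from this example: M_A = M_C = 8, MB_A = MB_C = 2, NPROW = 2
lld_per_process_row = [max(1, numroc(8, 2, myrow, 0, 2)) for myrow in range(2)]

# Column counts: N_A = 5 and N_C = 8 with NB = 2 on NPCOL = 3
a_cols = [numroc(5, 2, mycol, 0, 3) for mycol in range(3)]
c_cols = [numroc(8, 2, mycol, 0, 3) for mycol in range(3)]
```

Here `lld_per_process_row` is `[4, 4]`, and the column counts `[2, 2, 1]` and `[4, 2, 2]` match the widths of the local arrays for A and C shown in this example.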

Global general 8 × 5 matrix A with block size 2 × 2:

B,D         0                1             2
     *                                         *
 0   |   0.0    8.0  |   16.0   24.0  |   32.0 |
     |   1.0    9.0  |   17.0   25.0  |   33.0 |
     | --------------|----------------|------- |
 1   |   2.0   10.0  |   18.0   26.0  |   34.0 |
     |   3.0   11.0  |   19.0   27.0  |   35.0 |
     | --------------|----------------|------- |
 2   |   4.0   12.0  |   20.0   28.0  |   36.0 |
     |   5.0   13.0  |   21.0   29.0  |   37.0 |
     | --------------|----------------|------- |
 3   |   6.0   14.0  |   22.0   30.0  |   38.0 |
     |   7.0   15.0  |   23.0   31.0  |   39.0 |
     *                                         *

The following is the 2 × 3 process grid:

B,D  |   0   |   1   |   2
-----|-------|-------|-------
0 2  |  P₀₀  |  P₀₁  |  P₀₂
1 3  |  P₁₀  |  P₁₁  |  P₁₂

Local arrays for A:

p,q  |     0       |       1        |    2
-----|-------------|----------------|--------
     |  0.0   8.0  |   16.0   24.0  |   32.0
     |  1.0   9.0  |   17.0   25.0  |   33.0
 0   |  4.0  12.0  |   20.0   28.0  |   36.0
     |  5.0  13.0  |   21.0   29.0  |   37.0
-----|-------------|----------------|--------
     |  2.0  10.0  |   18.0   26.0  |   34.0
     |  3.0  11.0  |   19.0   27.0  |   35.0
 1   |  6.0  14.0  |   22.0   30.0  |   38.0
     |  7.0  15.0  |   23.0   31.0  |   39.0
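
The local arrays above follow directly from the block-cyclic mapping. The Python sketch below rebuilds them from the global matrix A. It assumes RSRC_A = CSRC_A = 0, as in this example, and `local_array` is a hypothetical helper name.

```python
def local_array(global_matrix, mb, nb, myrow, mycol, p, q):
    """Extract the local piece of a block-cyclically distributed matrix.

    Assumes the first block row and block column live on process (0, 0).
    Indices here are 0-based.
    """
    nrows = len(global_matrix)
    ncols = len(global_matrix[0])
    # Global row i belongs to block row i // mb, dealt round-robin to p rows
    rows = [i for i in range(nrows) if (i // mb) % p == myrow]
    # Global column j belongs to block column j // nb, dealt to q columns
    cols = [j for j in range(ncols) if (j // nb) % q == mycol]
    return [[global_matrix[i][j] for j in cols] for i in rows]


# Global 8 x 5 matrix A of this example: element (i, j) is i + 8*j
A = [[float(i + 8 * j) for j in range(5)] for i in range(8)]

local_00 = local_array(A, 2, 2, 0, 0, 2, 3)   # process P00
local_12 = local_array(A, 2, 2, 1, 2, 2, 3)   # process P12
```

`local_00` holds global rows 1, 2, 5, 6 and columns 1, 2 (1-based), and `local_12` holds global rows 3, 4, 7, 8 of column 5, in agreement with the table.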

Global symmetric matrix C of order 8 with block size 2 × 2:

B,D         0               1               2               3
     *                                                             *
 0   |   0.0    .   |     .     .   |     .     .   |     .     .  |
     |   1.0   8.0  |     .     .   |     .     .   |     .     .  |
     | -------------|---------------|---------------|------------- |
 1   |   2.0   9.0  |   15.0    .   |     .     .   |     .     .  |
     |   3.0  10.0  |   16.0  21.0  |     .     .   |     .     .  |
     | -------------|---------------|---------------|------------- |
 2   |   4.0  11.0  |   17.0  22.0  |   26.0    .   |     .     .  |
     |   5.0  12.0  |   18.0  23.0  |   27.0  30.0  |     .     .  |
     | -------------|---------------|---------------|------------- |
 3   |   6.0  13.0  |   19.0  24.0  |   28.0  31.0  |   33.0    .  |
     |   7.0  14.0  |   20.0  25.0  |   29.0  32.0  |   34.0  35.0 |
     *                                                             *

The following is the 2 × 3 process grid:

B,D  |  0 3  |   1   |   2
-----|-------|-------|-------
0 2  |  P₀₀  |  P₀₁  |  P₀₂
1 3  |  P₁₀  |  P₁₁  |  P₁₂

Local arrays for C:

p,q  |            0             |       1       |       2
-----|--------------------------|---------------|--------------
     |   0.0    .     .     .   |     .     .   |     .     .
     |   1.0   8.0    .     .   |     .     .   |     .     .
 0   |   4.0  11.0    .     .   |   17.0  22.0  |   26.0    .
     |   5.0  12.0    .     .   |   18.0  23.0  |   27.0  30.0
-----|--------------------------|---------------|--------------
     |   2.0   9.0    .     .   |   15.0    .   |     .     .
     |   3.0  10.0    .     .   |   16.0  21.0  |     .     .
 1   |   6.0  13.0  33.0    .   |   19.0  24.0  |   28.0  31.0
     |   7.0  14.0  34.0  35.0  |   20.0  25.0  |   29.0  32.0

Output:

Global symmetric matrix C of order 8 with block size 2 × 2:

B,D           0                   1                   2                   3
     *                                                                             *
 0   |  1920.0      .   |       .       .   |       .       .   |       .       .  |
     |  2001.0  2093.0  |       .       .   |       .       .   |       .       .  |
     | -----------------|-------------------|-------------------|----------------- |
 1   |  2082.0  2179.0  |   2275.0      .   |       .       .   |       .       .  |
     |  2163.0  2265.0  |   2366.0  2466.0  |       .       .   |       .       .  |
     | -----------------|-------------------|-------------------|----------------- |
 2   |  2244.0  2351.0  |   2457.0  2562.0  |   2666.0      .   |       .       .  |
     |  2325.0  2437.0  |   2548.0  2658.0  |   2767.0  2875.0  |       .       .  |
     | -----------------|-------------------|-------------------|----------------- |
 3   |  2406.0  2523.0  |   2639.0  2754.0  |   2868.0  2981.0  |   3093.0      .  |
     |  2487.0  2609.0  |   2730.0  2850.0  |   2969.0  3087.0  |   3204.0  3320.0 |
     *                                                                             *

The following is the 2 × 3 process grid:

B,D  |  0 3  |   1   |   2
-----|-------|-------|-------
0 2  |  P₀₀  |  P₀₁  |  P₀₂
1 3  |  P₁₀  |  P₁₁  |  P₁₂

Local arrays for C:

p,q  |                0                 |         1         |         2
-----|----------------------------------|-------------------|------------------
     |  1920.0      .       .       .   |       .       .   |       .       .
     |  2001.0  2093.0      .       .   |       .       .   |       .       .
 0   |  2244.0  2351.0      .       .   |   2457.0  2562.0  |   2666.0      .
     |  2325.0  2437.0      .       .   |   2548.0  2658.0  |   2767.0  2875.0
-----|----------------------------------|-------------------|------------------
     |  2082.0  2179.0      .       .   |   2275.0      .   |       .       .
     |  2163.0  2265.0      .       .   |   2366.0  2466.0  |       .       .
 1   |  2406.0  2523.0  3093.0      .   |   2639.0  2754.0  |   2868.0  2981.0
     |  2487.0  2609.0  3204.0  3320.0  |   2730.0  2850.0  |   2969.0  3087.0
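
The output of this example can be checked against a serial computation of the lower triangle of alpha·A·A^T + beta·C. The following Python sketch is a plain reference loop written for illustration, not the parallel algorithm used by the subroutine, and `syrk_lower_reference` is a hypothetical name.

```python
def syrk_lower_reference(alpha, a, beta, c):
    """Serial reference for the lower triangle of alpha*A*A^T + beta*C.

    Entries above the diagonal are copied through unchanged, mirroring
    the statement that the strictly upper part is not referenced.
    """
    n = len(a)
    k = len(a[0])
    out = [row[:] for row in c]
    for i in range(n):
        for j in range(i + 1):
            dot = sum(a[i][l] * a[j][l] for l in range(k))
            out[i][j] = alpha * dot + beta * c[i][j]
    return out


# Input A of this example: element (i, j) is i + 8*j
A = [[float(i + 8 * j) for j in range(5)] for i in range(8)]

# Input C of this example: lower triangle numbered 0.0, 1.0, ... by columns
C = [[None] * 8 for _ in range(8)]
value = 0.0
for j in range(8):
    for i in range(j, 8):
        C[i][j] = value
        value += 1.0

result = syrk_lower_reference(1.0, A, 1.0, C)
```

For instance, `result[0][0]` is 1920.0 and `result[7][7]` is 3320.0, matching the first and last entries of the global output matrix.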

PDSYR2K--Rank-2K Update of a Real Symmetric Matrix

This subroutine computes one of the following rank-2k updates:

1. C <-- alphaAB^T+alphaBA^T+betaC

2. C <-- alphaA^TB+alphaB^TA+betaC

where, in the formulas above:

A represents the global general submatrix:

For trans = 'N', it is A_{ia:ia+n-1, ja:ja+k-1}.
For trans = 'T', it is A_{ia:ia+k-1, ja:ja+n-1}.

B represents the global general submatrix:

For trans = 'N', it is B_{ib:ib+n-1, jb:jb+k-1}.
For trans = 'T', it is B_{ib:ib+k-1, jb:jb+n-1}.

C represents the global symmetric submatrix C_{ic:ic+n-1, jc:jc+n-1}.

alpha and beta are scalars.

Note: No data should be moved to form the matrix transposes; that is, the matrices should always be stored in their untransposed forms.

In the following two cases, no computation is performed and the subroutine returns after doing some parameter checking:

n = 0
beta is one, and alpha is zero or k = 0.

See references [14] and [15].
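
The two rank-2k updates differ only in which index of A and B is summed over. The Python sketch below is a serial reference for both forms, written for illustration only. `syr2k_reference` is a hypothetical name, the matrices are plain nested lists, and n and k are assumed to be at least 1.

```python
def syr2k_reference(uplo, trans, alpha, a, b, beta, c):
    """Serial reference for the rank-2k update on one triangle of C.

    trans = 'N': C <- alpha*A*B^T + alpha*B*A^T + beta*C  (A, B are n x k)
    trans = 'T': C <- alpha*A^T*B + alpha*B^T*A + beta*C  (A, B are k x n)
    Only the triangle selected by uplo is updated.
    """
    is_n = trans.upper() == 'N'   # 'C' is treated as 'T'
    if is_n:
        n, k = len(a), len(a[0])
    else:
        k, n = len(a), len(a[0])

    def op(m, i, l):
        # Element (i, l) of m for trans = 'N', of m^T otherwise
        return m[i][l] if is_n else m[l][i]

    upper = uplo.upper() == 'U'
    out = [row[:] for row in c]
    for i in range(n):
        cols = range(i, n) if upper else range(i + 1)
        for j in cols:
            s = sum(op(a, i, l) * op(b, j, l) + op(b, i, l) * op(a, j, l)
                    for l in range(k))
            # When beta is zero, C need not be set on input
            old = beta * c[i][j] if beta != 0 else 0.0
            out[i][j] = alpha * s + old
    return out
```

With A = [[1], [2]] and B = [[3], [4]] (n = 2, k = 1), alpha = 1, and beta = 0, the lower triangle of the result is 6, 10, 16. The same values result from trans = 'T' with the transposed inputs A = [[1, 2]] and B = [[3, 4]].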

Table 52. Data Types

alpha, beta, A, B, C	Subprogram
Long-precision real	PDSYR2K

Syntax

Fortran	CALL PDSYR2K (`uplo`, `trans`, `n`, `k`, `alpha`, `a`, `ia`, `ja`, `desc_a`, `b`, `ib`, `jb`, `desc_b`, `beta`, `c`, `ic`, `jc`, `desc_c`)
C and C++	pdsyr2k (`uplo`, `trans`, `n`, `k`, `alpha`, `a`, `ia`, `ja`, `desc_a`, `b`, `ib`, `jb`, `desc_b`, `beta`, `c`, `ic`, `jc`, `desc_c`);

On Entry

uplo

indicates whether the upper or lower triangular part of the symmetric submatrix C is referenced, where:

If uplo = 'U', the upper triangular part is referenced.

If uplo = 'L', the lower triangular part is referenced.

Scope: global

Specified as: a single character; uplo = 'U' or 'L'.

trans

indicates which computation is performed, where:

If trans = 'N', the computation in equation 1 is performed.

If trans = 'T', the computation in equation 2 is performed.

Scope: global

Specified as: a single character; trans = 'N' or 'T'.

n

is the order of the global symmetric submatrix C used in the computation, and:

If trans = 'N', it is the number of rows in submatrices A and B used in the computation.

If trans = 'T', it is the number of columns in submatrices A and B used in the computation.

Scope: global

Specified as: a fullword integer; n >= 0.

k

has the following meaning:

If trans = 'N', it is the number of columns in submatrices A and B used in the computation.

If trans = 'T', it is the number of rows in submatrices A and B used in the computation.

Scope: global

Specified as: a fullword integer; k >= 0.

alpha

is the scalar alpha.

Scope: global

Specified as: a number of the data type indicated in Table 52.

a

is the local part of the global general matrix A, where:

If trans = 'N', the leading LOCp(ia+n-1) by LOCq(ja+k-1) part of the local array A must contain the local pieces of the leading ia+n-1 by ja+k-1 part of the global matrix.
If trans = 'T', the leading LOCp(ia+k-1) by LOCq(ja+n-1) part of the local array A must contain the local pieces of the leading ia+k-1 by ja+n-1 part of the global matrix.

Note:

No data should be moved to form A^T; that is, the matrix A should always be stored in its untransposed form.

Scope: local

Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 52. Details about the block-cyclic data distribution of global matrix A are stored in desc_a.

ia

is the row index of the global matrix A, identifying the first row of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ia <= M_A, and:

If trans = 'N', then ia+n-1 <= M_A.

If trans = 'T', then ia+k-1 <= M_A.

ja

is the column index of the global matrix A, identifying the first column of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ja <= N_A, and:

If trans = 'N', then ja+k-1 <= N_A.

If trans = 'T', then ja+n-1 <= N_A.

desc_a

is the array descriptor for global matrix A, described in the following table:

`desc_a`	Name	Description	Limits	Scope
1	DTYPE_A	Descriptor type	DTYPE_A=1	Global
2	CTXT_A	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_A	Number of rows in the global matrix	If `n` = 0 or `k` = 0: M_A >= 0 Otherwise: M_A >= 1	Global
4	N_A	Number of columns in the global matrix	If `n` = 0 or `k` = 0: N_A >= 0 Otherwise: N_A >= 1	Global
5	MB_A	Row block size	MB_A >= 1	Global
6	NB_A	Column block size	NB_A >= 1	Global
7	RSRC_A	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_A < `p`	Global
8	CSRC_A	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_A < `q`	Global
9	LLD_A	The leading dimension of the local array	LLD_A >= max(1,LOCp(M_A))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

b

is the local part of the global general matrix B, where:

If trans = 'N', the leading LOCp(ib+n-1) by LOCq(jb+k-1) part of the local array B must contain the local pieces of the leading ib+n-1 by jb+k-1 part of the global matrix.
If trans = 'T', the leading LOCp(ib+k-1) by LOCq(jb+n-1) part of the local array B must contain the local pieces of the leading ib+k-1 by jb+n-1 part of the global matrix.

Note:

No data should be moved to form B^T; that is, the matrix B should always be stored in its untransposed form.

Scope: local

Specified as: an LLD_B by (at least) LOCq(N_B) array, containing numbers of the data type indicated in Table 52. Details about the block-cyclic data distribution of global matrix B are stored in desc_b.

ib

is the row index of the global matrix B, identifying the first row of the submatrix B.

Scope: global

Specified as: a fullword integer; 1 <= ib <= M_B, and:

If trans = 'N', then ib+n-1 <= M_B.

If trans = 'T', then ib+k-1 <= M_B.

jb

is the column index of the global matrix B, identifying the first column of the submatrix B.

Scope: global

Specified as: a fullword integer; 1 <= jb <= N_B, and:

If trans = 'N', then jb+k-1 <= N_B.

If trans = 'T', then jb+n-1 <= N_B.

desc_b

is the array descriptor for global matrix B, described in the following table:

`desc_b`	Name	Description	Limits	Scope
1	DTYPE_B	Descriptor type	DTYPE_B=1	Global
2	CTXT_B	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_B	Number of rows in the global matrix	If `n` = 0 or `k` = 0: M_B >= 0 Otherwise: M_B >= 1	Global
4	N_B	Number of columns in the global matrix	If `n` = 0 or `k` = 0: N_B >= 0 Otherwise: N_B >= 1	Global
5	MB_B	Row block size	MB_B >= 1	Global
6	NB_B	Column block size	NB_B >= 1	Global
7	RSRC_B	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_B < `p`	Global
8	CSRC_B	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_B < `q`	Global
9	LLD_B	The leading dimension of the local array	LLD_B >= max(1,LOCp(M_B))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

beta

is the scalar beta.

Scope: global

Specified as: a number of the data type indicated in Table 52.

c

is the local part of the global symmetric matrix C, where:

If uplo = 'U', the leading n × n upper triangular part of the global symmetric submatrix C_{ic:ic+n-1, jc:jc+n-1} must contain the upper triangular part of the submatrix, and the strictly lower triangular part is not referenced.
If uplo = 'L', the leading n × n lower triangular part of the global symmetric submatrix C_{ic:ic+n-1, jc:jc+n-1} must contain the lower triangular part of the submatrix, and the strictly upper triangular part is not referenced.

When beta is zero, C need not be set on input.

Scope: local

Specified as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 52. Details about the block-cyclic data distribution of global matrix C are stored in desc_c.

ic

is the row index of the global matrix C, identifying the first row of the submatrix C.

Scope: global

Specified as: a fullword integer; 1 <= ic <= M_C and ic+n-1 <= M_C.

jc

is the column index of the global matrix C, identifying the first column of the submatrix C.

Scope: global

Specified as: a fullword integer; 1 <= jc <= N_C and jc+n-1 <= N_C.

desc_c

is the array descriptor for global matrix C, described in the following table:

`desc_c`	Name	Description	Limits	Scope
1	DTYPE_C	Descriptor type	DTYPE_C=1	Global
2	CTXT_C	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_C	Number of rows in the global matrix	If `n` = 0: M_C >= 0 Otherwise: M_C >= 1	Global
4	N_C	Number of columns in the global matrix	If `n` = 0: N_C >= 0 Otherwise: N_C >= 1	Global
5	MB_C	Row block size	MB_C >= 1	Global
6	NB_C	Column block size	NB_C >= 1	Global
7	RSRC_C	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_C < `p`	Global
8	CSRC_C	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_C < `q`	Global
9	LLD_C	The leading dimension of the local array	LLD_C >= max(1,LOCp(M_C))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

On Return

c

is the updated local part of the global symmetric matrix C, containing the results of the computation.

Scope: local

Returned as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 52.

Notes and Coding Rules

This subroutine accepts lowercase letters for the uplo and trans arguments.
If you specify 'C' for the trans argument, it is interpreted as though you specified 'T'.
The matrices must have no common elements; otherwise, results are unpredictable.
The NUMROC utility subroutine can be used to determine the values of LOCp(M_) and LOCq(N_) used in the argument descriptions above. For details, see "Determining the Number of Rows and Columns in Your Local Arrays" and NUMROC--Compute the Number of Rows or Columns of a Block-Cyclically Distributed Matrix Contained in a Process.
For suggested block sizes, see "Coding Tips for Optimizing Parallel Performance".
The following values must be equal: CTXT_A = CTXT_B = CTXT_C.
If trans = 'N':
- In the process grid, the process row containing the first row of the submatrix C must also contain the first row of the submatrices A and B; that is:
  
  icrow = iarow
  icrow = ibrow
  where:
  
  iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
  ibrow = mod((((ib-1)/MB_B)+RSRC_B), p)
  icrow = mod((((ic-1)/MB_C)+RSRC_C), p)
- If looping is required--that is, either of the following is true:
  
  k+mod(ja-1, NB_A) > NB_A
  k+mod(jb-1, NB_B) > NB_B
  
  then the block column offset of A must be equal to the block column offset of B; that is, mod(ja-1, NB_A) = mod(jb-1, NB_B).
If trans = 'T':
- In the process grid, the process column containing the first column of the submatrix C must also contain the first column of the submatrices A and B; that is:
  
  iccol = iacol
  iccol = ibcol
  where:
  
  iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
  ibcol = mod((((jb-1)/NB_B)+CSRC_B), q)
  iccol = mod((((jc-1)/NB_C)+CSRC_C), q)
- If looping is required--that is, either of the following is true:
  
  k+mod(ia-1, MB_A) > MB_A
  k+mod(ib-1, MB_B) > MB_B
  
  then the block row offset of A must be equal to the block row offset of B; that is, mod(ia-1, MB_A) = mod(ib-1, MB_B).
If all the following are true:
- C is contained within a single block, that is:
  
  n+mod(ic-1, MB_C) <= MB_C
  n+mod(jc-1, NB_C) <= NB_C
- If trans = 'N', then (in the process grid) the process column containing the first column of the submatrix A also contains the first column of the submatrix B; that is, iacol = ibcol, where:
  
  iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
  ibcol = mod((((jb-1)/NB_B)+CSRC_B), q)
- If trans = 'T', then (in the process grid) the process row containing the first row of the submatrix A also contains the first row of the submatrix B; that is, iarow = ibrow, where:
  
  iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
  ibrow = mod((((ib-1)/MB_B)+RSRC_B), p)
then you must follow these rules:
- If trans = 'N':
  - A and B must be block row matrices; that is, if p > 1:
    
    n+mod(ia-1, MB_A) <= MB_A
    n+mod(ib-1, MB_B) <= MB_B
  - If looping is required, the following block sizes must be equal: NB_A = NB_B.
- If trans = 'T':
  - A and B must be block column matrices; that is, if q > 1:
    
    n+mod(ja-1, NB_A) <= NB_A
    n+mod(jb-1, NB_B) <= NB_B
  - If looping is required, the following block sizes must be equal: MB_A = MB_B.
If the following is true:
- C is not contained within a single block.
or if all the following are true:
- C is contained within a single block.
- If trans = 'N', then (in the process grid) the process column containing the first column of the submatrix A does not contain the first column of the submatrix B; that is, iacol <> ibcol, where:
  
  iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
  ibcol = mod((((jb-1)/NB_B)+CSRC_B), q)
- If trans = 'T', then (in the process grid) the process row containing the first row of the submatrix A does not contain the first row of the submatrix B; that is, iarow <> ibrow, where:
  
  iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
  ibrow = mod((((ib-1)/MB_B)+RSRC_B), p)
then you must follow these rules:
- The global symmetric matrix C must be distributed using a square block-cyclic distribution; that is, MB_C = NB_C.
- The global symmetric matrix C must be aligned on a block boundary, that is:
  
  ic-1 must be a multiple of MB_C.
  jc-1 must be a multiple of NB_C.
- If trans = 'N':
  - The following block sizes must be equal:
    
    NB_A = NB_B
    MB_A = MB_B = NB_C.
  - The global matrices A and B must be aligned on a block row boundary, that is:
    
    ia-1 must be a multiple of MB_A.
    ib-1 must be a multiple of MB_B.
- If trans = 'T':
  - The following block sizes must be equal:
    
    MB_A = MB_B
    NB_A = NB_B = MB_C.
  - The global matrices A and B must be aligned on a block column boundary, that is:
    
    ja-1 must be a multiple of NB_A.
    jb-1 must be a multiple of NB_B.
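
Several of the rules above depend on whether looping is required and, if so, on whether the block offsets of A and B agree. The Python sketch below expresses those two tests. The function names `looping_required` and `offsets_agree` are hypothetical and are used here only for illustration.

```python
def looping_required(trans, k, ia, ja, ib, jb, mb_a, nb_a, mb_b, nb_b):
    """True when the k dimension of A or B spans more than one block.

    trans = 'N': k runs along the columns of A and B.
    trans = 'T' (or 'C'): k runs along the rows of A and B.
    """
    if trans.upper() == 'N':
        return (k + (ja - 1) % nb_a > nb_a) or (k + (jb - 1) % nb_b > nb_b)
    return (k + (ia - 1) % mb_a > mb_a) or (k + (ib - 1) % mb_b > mb_b)


def offsets_agree(trans, ia, ja, ib, jb, mb_a, nb_a, mb_b, nb_b):
    """True when A and B start at the same offset within their blocks
    along the k dimension, as required when looping is needed."""
    if trans.upper() == 'N':
        return (ja - 1) % nb_a == (jb - 1) % nb_b
    return (ia - 1) % mb_a == (ib - 1) % mb_b
```

For example, with k = 5 and 2 × 2 blocks, looping is required, so ja = 1 with jb = 2 would break the offset rule for trans = 'N'.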

Error Conditions

Computational Errors

None

Resource Errors

Unable to allocate work space

Input-Argument and Miscellaneous Errors

Stage 1

DTYPE_A is invalid.
DTYPE_B is invalid.
DTYPE_C is invalid.

Stage 2

CTXT_A is invalid.

Stage 3

PDSYR2K was called from outside the process grid.

Stage 4

uplo <> 'U' or 'L'
trans <> 'N', 'T', or 'C'
n < 0 and trans = 'N'; n < 0 and trans = 'T' or 'C'; n < 0 and trans is invalid.
k < 0 and trans = 'N'; k < 0 and trans = 'T' or 'C'; k < 0 and trans is invalid.
M_A < 0 and (n = 0 or k = 0); M_A < 1 otherwise
N_A < 0 and (n = 0 or k = 0); N_A < 1 otherwise
MB_A < 1
NB_A < 1
RSRC_A < 0 or RSRC_A >= p
CSRC_A < 0 or CSRC_A >= q
ia < 1
ja < 1
M_B < 0 and (n = 0 or k = 0); M_B < 1 otherwise
N_B < 0 and (n = 0 or k = 0); N_B < 1 otherwise
MB_B < 1
NB_B < 1
RSRC_B < 0 or RSRC_B >= p
CSRC_B < 0 or CSRC_B >= q
ib < 1
jb < 1
M_C < 0 and n = 0; M_C < 1 otherwise
N_C < 0 and n = 0; N_C < 1 otherwise
MB_C < 1
NB_C < 1
RSRC_C < 0 or RSRC_C >= p
CSRC_C < 0 or CSRC_C >= q
ic < 1
jc < 1
CTXT_A <> CTXT_B
CTXT_A <> CTXT_C

Stage 5

If n <> 0 and k <> 0:

ia > M_A
ja > N_A
trans = 'N' and ia+n-1 > M_A
trans = 'N' and ja+k-1 > N_A
trans = 'T' and ia+k-1 > M_A
trans = 'T' and ja+n-1 > N_A
ib > M_B
jb > N_B
trans = 'N' and ib+n-1 > M_B
trans = 'N' and jb+k-1 > N_B
trans = 'T' and ib+k-1 > M_B
trans = 'T' and jb+n-1 > N_B

If n <> 0:
ic > M_C
jc > N_C
ic+n-1 > M_C
jc+n-1 > N_C

Stage 6

If C is contained within a single block, that is:

n+mod(ic-1, MB_C) <= MB_C

n+mod(jc-1, NB_C) <= NB_C

and:

If trans = 'N', then (in the process grid) the process column containing the first column of the submatrix A also contains the first column of the submatrix B; that is, iacol = ibcol, where:

iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
ibcol = mod((((jb-1)/NB_B)+CSRC_B), q)
If trans = 'T', then (in the process grid) the process row containing the first row of the submatrix A also contains the first row of the submatrix B; that is, iarow = ibrow, where:

iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
ibrow = mod((((ib-1)/MB_B)+RSRC_B), p)

then:

If trans = 'N':
1. p > 1 and n+mod(ia-1, MB_A) > MB_A
2. p > 1 and n+mod(ib-1, MB_B) > MB_B
3. Looping is required--that is, either of the following is true:
  
  k+mod(ja-1, NB_A) > NB_A
  k+mod(jb-1, NB_B) > NB_B
  
  and NB_A <> NB_B.
If trans = 'T':
1. q > 1 and n+mod(ja-1, NB_A) > NB_A
2. q > 1 and n+mod(jb-1, NB_B) > NB_B
3. Looping is required--that is, either of the following is true:
  
  k+mod(ia-1, MB_A) > MB_A
  k+mod(ib-1, MB_B) > MB_B
  
  and MB_A <> MB_B.

If C is not contained within a single block, or if C is contained within a single block and:

If trans = 'N', then (in the process grid) the process column containing the first column of the submatrix A does not contain the first column of the submatrix B; that is, iacol <> ibcol, where:

iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
ibcol = mod((((jb-1)/NB_B)+CSRC_B), q)
If trans = 'T', then (in the process grid) the process row containing the first row of the submatrix A does not contain the first row of the submatrix B; that is, iarow <> ibrow, where:

iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
ibrow = mod((((ib-1)/MB_B)+RSRC_B), p)

then:

MB_C <> NB_C
mod(ic-1, MB_C) <> 0
mod(jc-1, NB_C) <> 0

If trans = 'N':
NB_C <> MB_A
NB_C <> MB_B
NB_A <> NB_B
mod(ia-1, MB_A) <> 0
mod(ib-1, MB_B) <> 0

If trans = 'T':
MB_C <> NB_A
MB_C <> NB_B
MB_A <> MB_B
mod(ja-1, NB_A) <> 0
mod(jb-1, NB_B) <> 0

In all cases:

LLD_A < max(1, LOCp(M_A))
LLD_B < max(1, LOCp(M_B))
LLD_C < max(1, LOCp(M_C))

If trans = 'N':
Looping is required and mod(ja-1, NB_A) <> mod(jb-1, NB_B).
In the process grid, the process row containing the first row of the submatrix C does not contain the first row of the submatrix A; that is, icrow <> iarow, where:

icrow = mod((((ic-1)/MB_C)+RSRC_C), p)
iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
In the process grid, the process row containing the first row of the submatrix C does not contain the first row of the submatrix B; that is, icrow <> ibrow, where:

icrow = mod((((ic-1)/MB_C)+RSRC_C), p)
ibrow = mod((((ib-1)/MB_B)+RSRC_B), p)

If trans = 'T':
Looping is required and mod(ia-1, MB_A) <> mod(ib-1, MB_B).
In the process grid, the process column containing the first column of the submatrix C does not contain the first column of the submatrix A; that is, iccol <> iacol, where:

iccol = mod((((jc-1)/NB_C)+CSRC_C), q)
iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
In the process grid, the process column containing the first column of the submatrix C does not contain the first column of the submatrix B; that is, iccol <> ibcol, where:

iccol = mod((((jc-1)/NB_C)+CSRC_C), q)
ibcol = mod((((jb-1)/NB_B)+CSRC_B), q)
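
The process-row and process-column formulas used in the Stage 6 conditions can be evaluated directly. The following Python sketch is illustrative only and is not part of the library; the function names owner_row and owner_col are hypothetical, and the division in the formulas is assumed to be integer division, as in Fortran.

```python
def owner_row(i, mb, rsrc, p):
    """Process row owning global row i (1-based): mod(((i-1)/mb)+rsrc, p)."""
    return ((i - 1) // mb + rsrc) % p


def owner_col(j, nb, csrc, q):
    """Process column owning global column j (1-based): mod(((j-1)/nb)+csrc, q)."""
    return ((j - 1) // nb + csrc) % q


# Assumed values: 2 x 2 grid, MB_A = 2, NB_A = 4, RSRC_A = CSRC_A = 0.
iarow = owner_row(3, 2, 0, 2)   # global row 3 lies in block row 1
iacol = owner_col(9, 4, 0, 2)   # global column 9 lies in block column 2
print(iarow, iacol)
```

Comparing two such values (for example, iarow and ibrow) gives the equality tests that the conditions above refer to.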

Example

This example computes C = alphaA^TB+alphaB^TA+betaC using a 2 × 2 process grid.

Call Statements and Input

 ORDER = 'R'
 NPROW = 2
 NPCOL = 2
 CALL BLACS_GET (0, 0, ICONTXT)
 CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
 CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
              UPLO   TRANS   N    K     ALPHA    A  IA  JA    DESC_A   B  IB  JB
                |      |     |    |       |      |   |   |      |      |   |   |
 CALL PDSYR2K( 'U' ,  'T' ,  9  , 8  ,  1.0D0  , A , 1 , 1 ,  DESC_A , B , 1 , 1 ,
 
               DESC_B    BETA    C  IC  JC   DESC_C
                 |         |     |   |   |     |
               DESC_B ,  0.0D0 , C , 1 , 1 , DESC_C )

Desc_A Desc_B Desc_C
DTYPE_ 1 1 1
CTXT_ icontxt¹ icontxt¹ icontxt¹
M_ 8 8 9
N_ 9 9 9
MB_ 2 2 4
NB_ 4 4 4
RSRC_ 0 0 0
CSRC_ 0 0 0
LLD_ See below² See below² See below²

¹ icontxt is the output of the BLACS_GRIDINIT call.
² Each process should set the LLD_ as follows:
LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
LLD_B = MAX(1,NUMROC(M_B, MB_B, MYROW, RSRC_B, NPROW))
LLD_C = MAX(1,NUMROC(M_C, MB_C, MYROW, RSRC_C, NPROW))

In this example, LLD_A = LLD_B = 4 on all processes, LLD_C = 5 on P₀₀ and P₀₁, and LLD_C = 4 on P₁₀ and P₁₁.
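
The LLD_ values quoted above can be reproduced with a short sketch. The Python function below follows the block-counting logic commonly used for NUMROC; it is an illustration written for this example, not the library routine, and its argument names simply mirror the call shown in the footnote.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Rows (or columns) of an n-long, nb-blocked dimension held by process iproc."""
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from the source process
    nblocks = n // nb                               # number of whole blocks
    count = (nblocks // nprocs) * nb                # whole blocks every process gets
    extra = nblocks % nprocs                        # leftover whole blocks
    if mydist < extra:
        count += nb                                 # one more whole block
    elif mydist == extra:
        count += n % nb                             # the trailing partial block
    return count


nprow = 2
for myrow in range(nprow):
    lld_a = max(1, numroc(8, 2, myrow, 0, nprow))   # M_A = 8, MB_A = 2
    lld_c = max(1, numroc(9, 4, myrow, 0, nprow))   # M_C = 9, MB_C = 4
    print(myrow, lld_a, lld_c)
```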

Global general 8 × 9 matrix A with block size 2 × 4:

B,D             0                       1               2
     *                                                      *
 0   |  0.0 -1.0 -1.0  0.0  |   0.0  0.0  0.0  0.0  |   1.0 |
     |  0.0  1.0  0.0  1.0  |   0.0  1.0  0.0  1.0  |   1.0 |
     | ---------------------|-----------------------|------ |
 1   |  0.0  0.0 -1.0 -1.0  |   0.0  0.0  1.0  0.0  |   1.0 |
     |  0.0  1.0  0.0 -1.0  |   1.0  1.0  0.0  1.0  |   1.0 |
     | ---------------------|-----------------------|------ |
 2   |  1.0  0.0  0.0  0.0  |  -1.0  0.0  0.0  0.0  |   1.0 |
     |  1.0  0.0  0.0  0.0  |   1.0  1.0  0.0  0.0  |   1.0 |
     | ---------------------|-----------------------|------ |
 3   |  0.0  0.0 -1.0  0.0  |  -1.0  0.0  0.0  0.0  |   1.0 |
     | -1.0  0.0  0.0  0.0  |   0.0  0.0 -1.0  0.0  |   1.0 |
     *                                                      *

The following is the 2 × 2 process grid:

B,D  |  0 2  |  1
-----|-------|-------
0 2  |  P₀₀  |  P₀₁
-----|-------|-------
1 3  |  P₁₀  |  P₁₁

Local arrays for A:

p,q  |            0              |           1
-----|---------------------------|----------------------
     |  0.0 -1.0 -1.0  0.0  1.0  |   0.0  0.0  0.0  0.0
     |  0.0  1.0  0.0  1.0  1.0  |   0.0  1.0  0.0  1.0
 0   |  1.0  0.0  0.0  0.0  1.0  |  -1.0  0.0  0.0  0.0
     |  1.0  0.0  0.0  0.0  1.0  |   1.0  1.0  0.0  0.0
-----|---------------------------|----------------------
     |  0.0  0.0 -1.0 -1.0  1.0  |   0.0  0.0  1.0  0.0
     |  0.0  1.0  0.0 -1.0  1.0  |   1.0  1.0  0.0  1.0
 1   |  0.0  0.0 -1.0  0.0  1.0  |  -1.0  0.0  0.0  0.0
     | -1.0  0.0  0.0  0.0  1.0  |   0.0  0.0 -1.0  0.0
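
The mapping from the global matrix to the local arrays can be checked with a small sketch. The Python code below is illustrative only; local_part is a hypothetical helper that selects, for one process, the global rows and columns assigned to it by the block-cyclic rule, assuming RSRC_A = CSRC_A = 0 as in this example.

```python
# Global 8 x 9 matrix A from the example above.
A = [
    [ 0, -1, -1,  0,  0,  0,  0,  0, 1],
    [ 0,  1,  0,  1,  0,  1,  0,  1, 1],
    [ 0,  0, -1, -1,  0,  0,  1,  0, 1],
    [ 0,  1,  0, -1,  1,  1,  0,  1, 1],
    [ 1,  0,  0,  0, -1,  0,  0,  0, 1],
    [ 1,  0,  0,  0,  1,  1,  0,  0, 1],
    [ 0,  0, -1,  0, -1,  0,  0,  0, 1],
    [-1,  0,  0,  0,  0,  0, -1,  0, 1],
]


def local_part(a, mb, nb, myrow, mycol, nprow, npcol, rsrc=0, csrc=0):
    """Local array held by process (myrow, mycol) under block-cyclic distribution."""
    rows = [i for i in range(len(a)) if (i // mb + rsrc) % nprow == myrow]
    cols = [j for j in range(len(a[0])) if (j // nb + csrc) % npcol == mycol]
    return [[a[i][j] for j in cols] for i in rows]


p00 = local_part(A, 2, 4, 0, 0, 2, 2)   # 4 x 5 local array on P00
p11 = local_part(A, 2, 4, 1, 1, 2, 2)   # 4 x 4 local array on P11
for row in p00:
    print(row)
```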

Global general 8 × 9 matrix B with block size 2 × 4:

B,D             0                       1               2
     *                                                      *
 0   |  0.0  1.0  1.0  0.0  |   0.0  0.0  0.0  0.0  |  -1.0 |
     |  0.0 -1.0  0.0 -1.0  |   0.0 -1.0  0.0 -1.0  |  -1.0 |
     | ---------------------|-----------------------|------ |
 1   |  0.0  0.0  1.0  1.0  |   0.0  0.0 -1.0  0.0  |  -1.0 |
     |  0.0 -1.0  0.0  1.0  |  -1.0 -1.0  0.0 -1.0  |  -1.0 |
     | ---------------------|-----------------------|------ |
 2   | -1.0  0.0  0.0  0.0  |   1.0  0.0  0.0  0.0  |  -1.0 |
     | -1.0  0.0  0.0  0.0  |  -1.0 -1.0  0.0  0.0  |  -1.0 |
     | ---------------------|-----------------------|------ |
 3   |  0.0  0.0  1.0  0.0  |   1.0  0.0  0.0  0.0  |  -1.0 |
     |  1.0  0.0  0.0  0.0  |   0.0  0.0  1.0  0.0  |  -1.0 |
     *                                                      *

The following is the 2 × 2 process grid:

B,D  |  0 2  |  1
-----|-------|-------
0 2  |  P₀₀  |  P₀₁
-----|-------|-------
1 3  |  P₁₀  |  P₁₁

Local arrays for B:

p,q  |            0              |           1
-----|---------------------------|----------------------
     |  0.0  1.0  1.0  0.0 -1.0  |   0.0  0.0  0.0  0.0
     |  0.0 -1.0  0.0 -1.0 -1.0  |   0.0 -1.0  0.0 -1.0
 0   | -1.0  0.0  0.0  0.0 -1.0  |   1.0  0.0  0.0  0.0
     | -1.0  0.0  0.0  0.0 -1.0  |  -1.0 -1.0  0.0  0.0
-----|---------------------------|----------------------
     |  0.0  0.0  1.0  1.0 -1.0  |   0.0  0.0 -1.0  0.0
     |  0.0 -1.0  0.0  1.0 -1.0  |  -1.0 -1.0  0.0 -1.0
 1   |  0.0  0.0  1.0  0.0 -1.0  |   1.0  0.0  0.0  0.0
     |  1.0  0.0  0.0  0.0 -1.0  |   0.0  0.0  1.0  0.0

Output:

Global symmetric matrix C of order 9 with block size 4 × 4:

B,D             0                       1                2
     *                                                       *
     | -6.0  0.0  0.0  0.0  |   0.0 -2.0 -2.0  0.0  |  -2.0  |
     |   .  -6.0 -2.0  0.0  |  -2.0 -4.0  0.0 -4.0  |  -2.0  |
 0   |   .    .  -6.0 -2.0  |  -2.0  0.0  2.0  0.0  |   6.0  |
     |   .    .    .  -6.0  |   2.0  0.0  2.0  0.0  |   2.0  |
     | ---------------------|-----------------------|------- |
     |   .    .    .    .   |  -8.0 -4.0  0.0 -2.0  |   0.0  |
     |   .    .    .    .   |    .  -6.0  0.0 -4.0  |  -6.0  |
 1   |   .    .    .    .   |    .    .  -4.0  0.0  |   0.0  |
     |   .    .    .    .   |    .    .    .  -4.0  |  -4.0  |
     | ---------------------|-----------------------|------- |
 2   |   .    .    .    .   |    .    .    .    .   |  -16.0 |
     *                                                       *

The following is the 2 × 2 process grid:

B,D  |  0 2  |  1
-----|-------|-------
0 2  |  P₀₀  |  P₀₁
-----|-------|-------
1    |  P₁₀  |  P₁₁

Local arrays for C:

p,q  |             0              |           1
-----|----------------------------|----------------------
     | -6.0  0.0  0.0  0.0  -2.0  |   0.0 -2.0 -2.0  0.0
     |   .  -6.0 -2.0  0.0  -2.0  |  -2.0 -4.0  0.0 -4.0
 0   |   .    .  -6.0 -2.0   6.0  |  -2.0  0.0  2.0  0.0
     |   .    .    .  -6.0   2.0  |   2.0  0.0  2.0  0.0
     |   .    .    .    .  -16.0  |    .    .    .    .
-----|----------------------------|----------------------
     |   .    .    .    .    0.0  |  -8.0 -4.0  0.0 -2.0
     |   .    .    .    .   -6.0  |    .  -6.0  0.0 -4.0
 1   |   .    .    .    .    0.0  |    .    .  -4.0  0.0
     |   .    .    .    .   -4.0  |    .    .    .  -4.0

PDTRAN--Matrix Transpose for a General Matrix

This subroutine performs the following matrix computation:

C <-- betaC+alphaA^T

where, in the formula above:

A represents the global general submatrix A_{ia:ia+n-1, ja:ja+m-1}.

C represents the global general submatrix C_{ic:ic+m-1, jc:jc+n-1}.

alpha and beta are scalars.

Note: No data should be moved to form the matrix transpose; that is, the matrix should always be stored in its untransposed form.

In the following three cases, no computation is performed and the subroutine returns after doing some parameter checking:

m = 0
n = 0
alpha is zero and beta is one.

See references [14] and [15].

Table 53. Data Types

alpha, beta, A, C Subprogram
Long-precision real PDTRAN

Syntax

Fortran	CALL PDTRAN (`m`, `n`, `alpha`, `a`, `ia`, `ja`, `desc_a`, `beta`, `c`, `ic`, `jc`, `desc_c`)
C and C++	pdtran (`m`, `n`, `alpha`, `a`, `ia`, `ja`, `desc_a`, `beta`, `c`, `ic`, `jc`, `desc_c`);

On Entry

m

is the number of rows in submatrix C and the number of columns in submatrix A used in the computation.

Scope: global

Specified as: a fullword integer; m >= 0.

n

is the number of columns in submatrix C and the number of rows in submatrix A used in the computation.

Scope: global

Specified as: a fullword integer; n >= 0.

alpha

is the scalar alpha.

Scope: global

Specified as: a number of the data type indicated in Table 53.

a

is the local part of the global general matrix A. This identifies the first element of the local array A. This subroutine computes the location of the first element of the local subarray used, based on ia, ja, desc_a, p, q, myrow, and mycol; therefore, the leading LOCp(ia+n-1) by LOCq(ja+m-1) part of the local array A must contain the local pieces of the leading ia+n-1 by ja+m-1 part of the global matrix.

Note:

No data should be moved to form A^T; that is, the matrix A should always be stored in its untransposed form.

Scope: local

Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 53. Details about the block-cyclic data distribution of global matrix A are stored in desc_a.

ia

is the row index of the global matrix A, identifying the first row of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ia <= M_A and ia+n-1 <= M_A.

ja

is the column index of the global matrix A, identifying the first column of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ja <= N_A and ja+m-1 <= N_A.

desc_a

is the array descriptor for global matrix A, described in the following table:

`desc_a`	Name	Description	Limits	Scope
1	DTYPE_A	Descriptor type	DTYPE_A=1	Global
2	CTXT_A	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_A	Number of rows in the global matrix	If `m` = 0 or `n` = 0: M_A >= 0 Otherwise: M_A >= 1	Global
4	N_A	Number of columns in the global matrix	If `m` = 0 or `n` = 0: N_A >= 0 Otherwise: N_A >= 1	Global
5	MB_A	Row block size	MB_A >= 1	Global
6	NB_A	Column block size	NB_A >= 1	Global
7	RSRC_A	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_A < `p`	Global
8	CSRC_A	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_A < `q`	Global
9	LLD_A	The leading dimension of the local array	LLD_A >= max(1,LOCp(M_A))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

beta

is the scalar beta.

Scope: global

Specified as: a number of the data type indicated in Table 53.

c

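is the local part of the global general matrix C. This identifies the first element of the local array C. This subroutine computes the location of the first element of the local subarray used, based on ic, jc, desc_c, p, q, myrow, and mycol; therefore, the leading LOCp(ic+m-1) by LOCq(jc+n-1) part of the local array C must contain the local pieces of the leading ic+m-1 by jc+n-1 part of the global matrix.

When beta is zero, C need not be set on input.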

Scope: local

Specified as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 53. Details about the block-cyclic data distribution of global matrix C are stored in desc_c.

ic

is the row index of the global matrix C, identifying the first row of the submatrix C.

Scope: global

Specified as: a fullword integer; 1 <= ic <= M_C and ic+m-1 <= M_C.

jc

is the column index of the global matrix C, identifying the first column of the submatrix C.

Scope: global

Specified as: a fullword integer; 1 <= jc <= N_C and jc+n-1 <= N_C.

desc_c

is the array descriptor for global matrix C, described in the following table:

`desc_c`	Name	Description	Limits	Scope
1	DTYPE_C	Descriptor type	DTYPE_C=1	Global
2	CTXT_C	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_C	Number of rows in the global matrix	If `m` = 0 or `n` = 0: M_C >= 0 Otherwise: M_C >= 1	Global
4	N_C	Number of columns in the global matrix	If `m` = 0 or `n` = 0: N_C >= 0 Otherwise: N_C >= 1	Global
5	MB_C	Row block size	MB_C >= 1	Global
6	NB_C	Column block size	NB_C >= 1	Global
7	RSRC_C	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_C < `p`	Global
8	CSRC_C	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_C < `q`	Global
9	LLD_C	The leading dimension of the local array	LLD_C >= max(1,LOCp(M_C))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

On Return

c

is the updated local part of the global general matrix C, containing the results of the computation.

Scope: local

Returned as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 53.

Notes and Coding Rules

The matrices must have no common elements; otherwise, results are unpredictable.
The NUMROC utility subroutine can be used to determine the values of LOCp(M_) and LOCq(N_) used in the argument descriptions above. For details, see "Determining the Number of Rows and Columns in Your Local Arrays" and NUMROC--Compute the Number of Rows or Columns of a Block-Cyclically Distributed Matrix Contained in a Process.
For suggested block sizes, see "Coding Tips for Optimizing Parallel Performance".
The following values must be equal: CTXT_A = CTXT_C.
The coding rules (given in this section) and the error conditions (given in the next section) are written in terms of adist. To determine a value for adist, check the following conditions, in order, and choose the first value having a true condition:
1. If A is a block column matrix, that is:
  
  m+mod(ja-1, NB_A) <= NB_A
  
  then adist = 'C'
2. If A is a block row matrix, that is:
  
  n+mod(ia-1, MB_A) <= MB_A
  
  then adist = 'R'
3. If A is neither a block column nor a block row matrix, then:
  - If m <= n, then adist = 'C'.
  - Otherwise, adist = 'R'.
If adist = 'C', then you must follow these coding rules:
- A must be aligned on a block row boundary, that is:
  
  ia-1 must be a multiple of MB_A.
- C must be aligned on a block column boundary, that is:
  
  jc-1 must be a multiple of NB_C.
- MB_A = NB_C
- If looping is required--that is, either of the following is true:
  
  m+mod(ja-1, NB_A) > NB_A
  m+mod(ic-1, MB_C) > MB_C
  
  then:
  - The block column offset of A must be equal to the block row offset of C; that is, mod(ja-1, NB_A) = mod(ic-1, MB_C).
  - NB_A = MB_C
If adist = 'R', then you must follow these coding rules:
- A must be aligned on a block column boundary, that is:
  
  ja-1 must be a multiple of NB_A.
- C must be aligned on a block row boundary, that is:
  
  ic-1 must be a multiple of MB_C.
- NB_A = MB_C
- If looping is required--that is, either of the following is true:
  
  n+mod(ia-1, MB_A) > MB_A
  n+mod(jc-1, NB_C) > NB_C
  
  then:
  - The block row offset of A must be equal to the block column offset of C; that is, mod(ia-1, MB_A) = mod(jc-1, NB_C).
  - MB_A = NB_C
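
The selection of adist and the coding rules that depend on it can be sketched as follows. This Python code is a minimal illustration of the rules listed above, not the checking performed by the subroutine; the function names adist and violations and the message strings are hypothetical.

```python
def adist(m, n, ia, ja, mb_a, nb_a):
    """Pick 'C' or 'R' by testing the conditions in the order given above."""
    if m + (ja - 1) % nb_a <= nb_a:
        return 'C'                      # A is a block column matrix
    if n + (ia - 1) % mb_a <= mb_a:
        return 'R'                      # A is a block row matrix
    return 'C' if m <= n else 'R'


def violations(m, n, ia, ja, ic, jc, mb_a, nb_a, mb_c, nb_c):
    """Return the coding rules broken by the arguments (empty list if none)."""
    bad = []
    if adist(m, n, ia, ja, mb_a, nb_a) == 'C':
        if (ia - 1) % mb_a != 0:
            bad.append("ia-1 is not a multiple of MB_A")
        if (jc - 1) % nb_c != 0:
            bad.append("jc-1 is not a multiple of NB_C")
        if mb_a != nb_c:
            bad.append("MB_A <> NB_C")
        if m + (ja - 1) % nb_a > nb_a or m + (ic - 1) % mb_c > mb_c:  # looping
            if (ja - 1) % nb_a != (ic - 1) % mb_c:
                bad.append("mod(ja-1, NB_A) <> mod(ic-1, MB_C)")
            if nb_a != mb_c:
                bad.append("NB_A <> MB_C")
    else:
        if (ja - 1) % nb_a != 0:
            bad.append("ja-1 is not a multiple of NB_A")
        if (ic - 1) % mb_c != 0:
            bad.append("ic-1 is not a multiple of MB_C")
        if nb_a != mb_c:
            bad.append("NB_A <> MB_C")
        if n + (ia - 1) % mb_a > mb_a or n + (jc - 1) % nb_c > nb_c:  # looping
            if (ia - 1) % mb_a != (jc - 1) % nb_c:
                bad.append("mod(ia-1, MB_A) <> mod(jc-1, NB_C)")
            if mb_a != nb_c:
                bad.append("MB_A <> NB_C")
    return bad


# m = 9, n = 8, A blocked 2 x 4, C blocked 4 x 2, all indices 1.
print(adist(9, 8, 1, 1, 2, 4), violations(9, 8, 1, 1, 1, 1, 2, 4, 4, 2))
```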

Error Conditions

Computational Errors

None

Resource Errors

Unable to allocate work space

Input-Argument and Miscellaneous Errors

Stage 1

DTYPE_A is invalid.
DTYPE_C is invalid.

Stage 2

CTXT_A is invalid.

Stage 3

PDTRAN was called from outside the process grid.

Stage 4

m < 0
n < 0
M_A < 0 and (m = 0 or n = 0); M_A < 1 otherwise
N_A < 0 and (m = 0 or n = 0); N_A < 1 otherwise
MB_A < 1
NB_A < 1
RSRC_A < 0 or RSRC_A >= p
CSRC_A < 0 or CSRC_A >= q
ia < 1
ja < 1
M_C < 0 and (m = 0 or n = 0); M_C < 1 otherwise
N_C < 0 and (m = 0 or n = 0); N_C < 1 otherwise
MB_C < 1
NB_C < 1
RSRC_C < 0 or RSRC_C >= p
CSRC_C < 0 or CSRC_C >= q
ic < 1
jc < 1
CTXT_A <> CTXT_C

Stage 5

Note:

Some of the following error conditions depend on the value of adist--that is, adist = 'C' or adist = 'R'. For details on determining the value, see "Notes and Coding Rules".

If m <> 0 and n <> 0:

ia > M_A
ja > N_A
ia+n-1 > M_A
ja+m-1 > N_A
ic > M_C
jc > N_C
ic+m-1 > M_C
jc+n-1 > N_C

If adist = 'C':

mod(ia-1, MB_A) <> 0
mod(jc-1, NB_C) <> 0
MB_A <> NB_C
If looping is required--that is, either of the following is true:

m+mod(ja-1, NB_A) > NB_A
m+mod(ic-1, MB_C) > MB_C

then:
1. mod(ja-1, NB_A) <> mod(ic-1, MB_C)
2. NB_A <> MB_C.

If adist = 'R':

mod(ja-1, NB_A) <> 0
mod(ic-1, MB_C) <> 0
NB_A <> MB_C
If looping is required--that is, either of the following is true:

n+mod(ia-1, MB_A) > MB_A
n+mod(jc-1, NB_C) > NB_C

then:
1. mod(ia-1, MB_A) <> mod(jc-1, NB_C)
2. MB_A <> NB_C.

Stage 6

LLD_A < max(1, LOCp(M_A))
LLD_C < max(1, LOCp(M_C))

Example

This example computes C = betaC+alphaA^T using a 2 × 2 process grid.

Call Statements and Input

ORDER = 'R'
NPROW = 2
NPCOL = 2
CALL BLACS_GET (0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
              M    N     ALPHA    A  IA  JA    DESC_A    BETA    C  IC  JC   DESC_C
              |    |       |      |   |   |      |         |     |   |   |     |
CALL PDTRAN(  9  , 8  ,  1.0D0  , A , 1 , 1 ,  DESC_A ,  1.0D0 , C , 1 , 1 , DESC_C )

Desc_A Desc_C
DTYPE_ 1 1
CTXT_ icontxt¹ icontxt¹
M_ 8 9
N_ 9 8
MB_ 2 4
NB_ 4 2
RSRC_ 0 0
CSRC_ 0 0
LLD_ See below² See below²

¹ icontxt is the output of the BLACS_GRIDINIT call.

² Each process should set the LLD_ as follows:

LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
LLD_C = MAX(1,NUMROC(M_C, MB_C, MYROW, RSRC_C, NPROW))

In this example, LLD_A = 4 on all processes, LLD_C = 5 on P₀₀ and P₀₁, and LLD_C = 4 on P₁₀ and P₁₁.

Global general 8 × 9 matrix A with block size 2 × 4:

B,D             0                       1               2
     *                                                      *
 0   |  0.0 -1.0 -1.0  0.0  |   0.0  0.0  0.0  0.0  |   1.0 |
     |  0.0  1.0  0.0  1.0  |   0.0  1.0  0.0  1.0  |   1.0 |
     | ---------------------|-----------------------|------ |
 1   |  0.0  0.0 -1.0 -1.0  |   0.0  0.0  1.0  0.0  |   1.0 |
     |  0.0  1.0  0.0 -1.0  |   1.0  1.0  0.0  1.0  |   1.0 |
     | ---------------------|-----------------------|------ |
 2   |  1.0  0.0  0.0  0.0  |  -1.0  0.0  0.0  0.0  |   1.0 |
     |  1.0  0.0  0.0  0.0  |   1.0  1.0  0.0  0.0  |   1.0 |
     | ---------------------|-----------------------|------ |
 3   |  0.0  0.0 -1.0  0.0  |  -1.0  0.0  0.0  0.0  |   1.0 |
     | -1.0  0.0  0.0  0.0  |   0.0  0.0 -1.0  0.0  |   1.0 |
     *                                                      *

The following is the 2 × 2 process grid:

B,D  |  0 2  |  1
-----|-------|-------
0 2  |  P₀₀  |  P₀₁
-----|-------|-------
1 3  |  P₁₀  |  P₁₁

Local arrays for A:

p,q  |            0              |           1
-----|---------------------------|----------------------
     |  0.0 -1.0 -1.0  0.0  1.0  |   0.0  0.0  0.0  0.0
     |  0.0  1.0  0.0  1.0  1.0  |   0.0  1.0  0.0  1.0
 0   |  1.0  0.0  0.0  0.0  1.0  |  -1.0  0.0  0.0  0.0
     |  1.0  0.0  0.0  0.0  1.0  |   1.0  1.0  0.0  0.0
-----|---------------------------|----------------------
     |  0.0  0.0 -1.0 -1.0  1.0  |   0.0  0.0  1.0  0.0
     |  0.0  1.0  0.0 -1.0  1.0  |   1.0  1.0  0.0  1.0
 1   |  0.0  0.0 -1.0  0.0  1.0  |  -1.0  0.0  0.0  0.0
     | -1.0  0.0  0.0  0.0  1.0  |   0.0  0.0 -1.0  0.0

Global general 9 × 8 matrix C with block size 4 × 2:

B,D        0             1             2             3
     *                                                     *
     |  0.0  1.0  |   1.0  5.0  |   6.0  7.0  |   8.0  9.0 |
     |  0.0 -1.0  |   0.0 -1.0  |   0.0 -1.0  |   0.0  1.0 |
 0   |  0.0  0.0  |   1.0  1.0  |   0.0  0.0  |  -1.0  0.0 |
     |  0.0 -1.0  |   0.0  1.0  |  -1.0 -1.0  |   0.0  1.0 |
     | -----------|-------------|-------------|----------- |
     | -1.0  2.0  |   0.0  0.0  |   1.0  0.0  |   0.0  0.0 |
     | -1.0  3.0  |   0.0  0.0  |  -1.0 -1.0  |   0.0  0.0 |
 1   |  0.0  4.0  |   1.0  0.0  |   1.0  0.0  |   0.0  0.0 |
     |  1.0  5.0  |   0.0  0.0  |   0.0  0.0  |   1.0  0.0 |
     | -----------|-------------|-------------|----------- |
 2   |  1.0  2.0  |   3.0  4.0  |   1.0  1.0  |   1.0  1.0 |
     *                                                     *

The following is the 2 × 2 process grid:

B,D  |  0 2  |  1 3
-----|-------|-------
0 2  |  P₀₀  |  P₀₁
-----|-------|-------
1    |  P₁₀  |  P₁₁

Local arrays for C:

p,q  |          0           |           1
-----|----------------------|----------------------
     |  0.0  1.0  6.0  7.0  |   1.0  5.0  8.0  9.0
     |  0.0 -1.0  0.0 -1.0  |   0.0 -1.0  0.0  1.0
 0   |  0.0  0.0  0.0  0.0  |   1.0  1.0 -1.0  0.0
     |  0.0 -1.0 -1.0 -1.0  |   0.0  1.0  0.0  1.0
     |  1.0  2.0  1.0  1.0  |   3.0  4.0  1.0  1.0
-----|----------------------|----------------------
     | -1.0  2.0  1.0  0.0  |   0.0  0.0  0.0  0.0
     | -1.0  3.0 -1.0 -1.0  |   0.0  0.0  0.0  0.0
 1   |  0.0  4.0  1.0  0.0  |   1.0  0.0  0.0  0.0
     |  1.0  5.0  0.0  0.0  |   0.0  0.0  1.0  0.0

Output:

Global general 9 × 8 matrix C with block size 4 × 2:

B,D        0             1             2             3
     *                                                     *
     |  0.0  1.0  |   1.0  5.0  |   7.0  8.0  |   8.0  8.0 |
     | -1.0  0.0  |   0.0  0.0  |   0.0 -1.0  |   0.0  1.0 |
 0   | -1.0  0.0  |   0.0  1.0  |   0.0  0.0  |  -2.0  0.0 |
     |  0.0  0.0  |  -1.0  0.0  |  -1.0 -1.0  |   0.0  1.0 |
     | -----------|-------------|-------------|----------- |
     | -1.0  2.0  |   0.0  1.0  |   0.0  1.0  |  -1.0  0.0 |
     | -1.0  4.0  |   0.0  1.0  |  -1.0  0.0  |   0.0  0.0 |
 1   |  0.0  4.0  |   2.0  0.0  |   1.0  0.0  |   0.0 -1.0 |
     |  1.0  6.0  |   0.0  1.0  |   0.0  0.0  |   1.0  0.0 |
     | -----------|-------------|-------------|----------- |
 2   |  2.0  3.0  |   4.0  5.0  |   2.0  2.0  |   2.0  2.0 |
     *                                                     *

The following is the 2 × 2 process grid:

B,D  |  0 2  |  1 3
-----|-------|-------
0 2  |  P₀₀  |  P₀₁
-----|-------|-------
1    |  P₁₀  |  P₁₁

Local arrays for C:

p,q  |          0           |           1
-----|----------------------|----------------------
     |  0.0  1.0  7.0  8.0  |   1.0  5.0  8.0  8.0
     | -1.0  0.0  0.0 -1.0  |   0.0  0.0  0.0  1.0
 0   | -1.0  0.0  0.0  0.0  |   0.0  1.0 -2.0  0.0
     |  0.0  0.0 -1.0 -1.0  |  -1.0  0.0  0.0  1.0
     |  2.0  3.0  2.0  2.0  |   4.0  5.0  2.0  2.0
-----|----------------------|----------------------
     | -1.0  2.0  0.0  1.0  |   0.0  1.0 -1.0  0.0
     | -1.0  4.0 -1.0  0.0  |   0.0  1.0  0.0  0.0
 1   |  0.0  4.0  1.0  0.0  |   2.0  0.0  0.0 -1.0
     |  1.0  6.0  0.0  0.0  |   0.0  1.0  1.0  0.0
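
The result can be cross-checked serially. The Python sketch below is illustrative only: it ignores the block-cyclic distribution and does not call the library; tran is a hypothetical helper that evaluates beta*C + alpha*A^T on the global matrices of this example.

```python
# Global 8 x 9 matrix A and the 9 x 8 input matrix C from the example.
A = [
    [ 0, -1, -1,  0,  0,  0,  0,  0, 1],
    [ 0,  1,  0,  1,  0,  1,  0,  1, 1],
    [ 0,  0, -1, -1,  0,  0,  1,  0, 1],
    [ 0,  1,  0, -1,  1,  1,  0,  1, 1],
    [ 1,  0,  0,  0, -1,  0,  0,  0, 1],
    [ 1,  0,  0,  0,  1,  1,  0,  0, 1],
    [ 0,  0, -1,  0, -1,  0,  0,  0, 1],
    [-1,  0,  0,  0,  0,  0, -1,  0, 1],
]
C_IN = [
    [ 0,  1, 1,  5,  6,  7,  8, 9],
    [ 0, -1, 0, -1,  0, -1,  0, 1],
    [ 0,  0, 1,  1,  0,  0, -1, 0],
    [ 0, -1, 0,  1, -1, -1,  0, 1],
    [-1,  2, 0,  0,  1,  0,  0, 0],
    [-1,  3, 0,  0, -1, -1,  0, 0],
    [ 0,  4, 1,  0,  1,  0,  0, 0],
    [ 1,  5, 0,  0,  0,  0,  1, 0],
    [ 1,  2, 3,  4,  1,  1,  1, 1],
]


def tran(alpha, a, beta, c):
    """Return beta*C + alpha*A^T for A (n x m) and C (m x n) as nested lists."""
    return [[beta * c[i][j] + alpha * a[j][i] for j in range(len(a))]
            for i in range(len(c))]


C_OUT = tran(1.0, A, 1.0, C_IN)
for row in C_OUT:
    print(row)
```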

