This chapter describes the Level 3 PBLAS subroutines.
The Level 3 PBLAS include a subset of the standard set of distributed memory parallel versions of the Level 3 BLAS.
Note: These subroutines are designed in accordance with the proposed Level 3 PBLAS standard. (See references [14], [15], and [17].) If these subroutines do not comply with the standard as approved, IBM will consider updating them to do so. If IBM updates these subroutines, the update could require modifications of the calling application program.
Table 44. List of Level 3 PBLAS (Message Passing)
Descriptive Name | Long-Precision Subprogram | Page |
---|---|---|
Matrix-Matrix Product for a General Matrix, Its Transpose, or Its Conjugate Transpose | PDGEMM, PZGEMM | PDGEMM and PZGEMM--Matrix-Matrix Product for a General Matrix, Its Transpose, or Its Conjugate Transpose |
Matrix-Matrix Product Where One Matrix is Real Symmetric | PDSYMM | PDSYMM--Matrix-Matrix Product Where One Matrix is Real Symmetric |
Triangular Matrix-Matrix Product | PDTRMM | PDTRMM--Triangular Matrix-Matrix Product |
Solution of Triangular System of Equations with Multiple Right-Hand Sides | PDTRSM | PDTRSM--Solution of Triangular System of Equations with Multiple Right-Hand Sides |
Rank-K Update of a Real Symmetric Matrix | PDSYRK | PDSYRK--Rank-K Update of a Real Symmetric Matrix |
Rank-2K Update of a Real Symmetric Matrix | PDSYR2K | PDSYR2K--Rank-2K Update of a Real Symmetric Matrix |
Matrix Transpose for a General Matrix | PDTRAN | PDTRAN--Matrix Transpose for a General Matrix |
This section contains the Level 3 PBLAS subroutine descriptions.
PDGEMM performs any one of the following combined matrix computations:
PZGEMM performs any one of the following combined matrix computations:
where, in the PDGEMM and PZGEMM formulas above:
Note: No data should be moved to form the matrix transposes; that is, the matrices should always be stored in their untransposed forms.
In the following four cases, no computation is performed and the subroutine returns after doing some parameter checking:
If none of the above conditions applies, and beta is not one and k is 0, then betaC is returned.
Table 45. Data Types
A, B, C, alpha, beta | Subroutine |
---|---|
Long-precision real | PDGEMM |
Long-precision complex | PZGEMM |
Fortran | CALL PDGEMM or PZGEMM (transa, transb, m, n, k, alpha, a, ia, ja, desc_a, b, ib, jb, desc_b, beta, c, ic, jc, desc_c) |
C and C++ | pdgemm or pzgemm (transa, transb, m, n, k, alpha, a, ia, ja, desc_a, b, ib, jb, desc_b, beta, c, ic, jc, desc_c); |
If transa = 'N', A is used in the computation.
If transa = 'T', A^T is used in the computation.
If transa = 'C', A^H is used in the computation.
Scope: global
Specified as: a single character; transa = 'N', 'T', or 'C'
If transb = 'N', B is used in the computation.
If transb = 'T', B^T is used in the computation.
If transb = 'C', B^H is used in the computation.
Scope: global
Specified as: a single character; transb = 'N', 'T', or 'C'
If transa = 'N', it is the number of rows in submatrix A.
If transa = 'T' or 'C', it is the number of columns in submatrix A.
Scope: global
Specified as: a fullword integer; m >= 0.
If transb = 'N', it is the number of columns in submatrix B.
If transb = 'T' or 'C', it is the number of rows in submatrix B.
Scope: global
Specified as: a fullword integer; n >= 0.
If transa = 'N', it is the number of columns in submatrix A.
If transa = 'T' or 'C', it is the number of rows in submatrix A.
In addition:
If transb = 'N', it is the number of rows in submatrix B.
If transb = 'T' or 'C', it is the number of columns in submatrix B.
Scope: global
Specified as: a fullword integer; k >= 0.
Scope: global
Specified as: a number of the data type indicated in Table 45.
Note: No data should be moved to form A^T or A^H; that is, the matrix A should always be stored in its untransposed form.
Scope: local
Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 45. Details about the block-cyclic data distribution of global matrix A are stored in desc_a.
Scope: global
Specified as: a fullword integer; 1 <= ia <= M_A, and:
If transa = 'N', then ia+m-1 <= M_A.
If transa = 'T' or 'C', then ia+k-1 <= M_A.
Scope: global
Specified as: a fullword integer; 1 <= ja <= N_A, and:
If transa = 'N', then ja+k-1 <= N_A.
If transa = 'T' or 'C', then ja+m-1 <= N_A.
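The range rules for m, k, ia, and ja can be collected into one check. The following is only an illustrative Python sketch and is not part of the library; the function name `check_a_submatrix` is hypothetical, and M_A and N_A stand for the global dimensions stored in desc_a. It applies the limits exactly as they are written above.

```python
def check_a_submatrix(transa, m, k, ia, ja, M_A, N_A):
    """Return True when submatrix A fits inside the M_A by N_A global matrix."""
    if transa not in ("N", "T", "C"):
        raise ValueError("transa must be 'N', 'T', or 'C'")
    if m < 0 or k < 0:
        return False
    # Stored (untransposed) shape of submatrix A:
    # m by k when transa = 'N', k by m when transa = 'T' or 'C'.
    rows, cols = (m, k) if transa == "N" else (k, m)
    return (1 <= ia <= M_A and ia + rows - 1 <= M_A and
            1 <= ja <= N_A and ja + cols - 1 <= N_A)
```

For instance, with a 6 by 5 global matrix and ia = ja = 1, the call with transa = 'N', m = 6, k = 5 passes, while the same m and k with transa = 'T' fails because the stored submatrix would need 5 rows and 6 columns.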
desc_a | Name | Description | Limits | Scope |
---|---|---|---|---|
1 | DTYPE_A | Descriptor type | DTYPE_A=1 | Global |
2 | CTXT_A | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
3 | M_A | Number of rows in the global matrix | If m = 0 or k = 0: M_A >= 0 Otherwise: M_A >= 1 | Global |
4 | N_A | Number of columns in the global matrix | If m = 0 or k = 0: N_A >= 0 Otherwise: N_A >= 1 | Global |
5 | MB_A | Row block size | MB_A >= 1 | Global |
6 | NB_A | Column block size | NB_A >= 1 | Global |
7 | RSRC_A | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_A < p | Global |
8 | CSRC_A | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_A < q | Global |
9 | LLD_A | The leading dimension of the local array | LLD_A >= max(1,LOCp(M_A)) | Local |
Specified as: an array of (at least) length 9, containing fullword integers.
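The LLD_A limit depends on LOCp(M_A), the number of rows of the global matrix that land on the calling process. The sketch below is a pure-Python illustration, under the assumption that LOCp follows the usual block-cyclic counting (the same counting performed by the ScaLAPACK tool routine NUMROC); the helper names `numroc` and `make_desc` are hypothetical stand-ins and do not call any library routine.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Count the entries of a dimension of length n, cut into blocks of size nb
    and dealt cyclically over nprocs processes, that are owned by iproc."""
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from source process
    nblocks = n // nb                               # number of full blocks
    count = (nblocks // nprocs) * nb                # complete rounds of blocks
    extra = nblocks % nprocs
    if mydist < extra:
        count += nb                                 # one additional full block
    elif mydist == extra:
        count += n % nb                             # trailing partial block
    return count


def make_desc(ctxt, m, n, mb, nb, rsrc, csrc, myrow, nprow):
    """Build a 9-element descriptor in the order DTYPE, CTXT, M, N, MB, NB,
    RSRC, CSRC, LLD, using the smallest leading dimension the limits allow."""
    lld = max(1, numroc(m, mb, myrow, rsrc, nprow))
    return [1, ctxt, m, n, mb, nb, rsrc, csrc, lld]
```

A larger LLD_A is also acceptable, because the limit is only a lower bound.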
Note: No data should be moved to form B^T or B^H; that is, the matrix B should always be stored in its untransposed form.
Scope: local
Specified as: an LLD_B by (at least) LOCq(N_B) array, containing numbers of the data type indicated in Table 45. Details about the block-cyclic data distribution of global matrix B are stored in desc_b.
Scope: global
Specified as: a fullword integer; 1 <= ib <= M_B, and:
If transb = 'N', then ib+k-1 <= M_B.
If transb = 'T' or 'C', then ib+n-1 <= M_B.
Scope: global
Specified as: a fullword integer; 1 <= jb <= N_B, and:
If transb = 'N', then jb+n-1 <= N_B.
If transb = 'T' or 'C', then jb+k-1 <= N_B.
desc_b | Name | Description | Limits | Scope |
---|---|---|---|---|
1 | DTYPE_B | Descriptor type | DTYPE_B=1 | Global |
2 | CTXT_B | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
3 | M_B | Number of rows in the global matrix | If k = 0 or n = 0: M_B >= 0 Otherwise: M_B >= 1 | Global |
4 | N_B | Number of columns in the global matrix | If k = 0 or n = 0: N_B >= 0 Otherwise: N_B >= 1 | Global |
5 | MB_B | Row block size | MB_B >= 1 | Global |
6 | NB_B | Column block size | NB_B >= 1 | Global |
7 | RSRC_B | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_B < p | Global |
8 | CSRC_B | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_B < q | Global |
9 | LLD_B | The leading dimension of the local array | LLD_B >= max(1,LOCp(M_B)) | Local |
Specified as: an array of (at least) length 9, containing fullword integers.
Scope: global
Specified as: a number of the data type indicated in Table 45.
When beta is zero, C need not be set on input.
Scope: local
Specified as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 45. Details about the block-cyclic data distribution of global matrix C are stored in desc_c.
Scope: global
Specified as: a fullword integer; 1 <= ic <= M_C and ic+m-1 <= M_C.
Scope: global
Specified as: a fullword integer; 1 <= jc <= N_C and jc+n-1 <= N_C.
desc_c | Name | Description | Limits | Scope |
---|---|---|---|---|
1 | DTYPE_C | Descriptor type | DTYPE_C=1 | Global |
2 | CTXT_C | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
3 | M_C | Number of rows in the global matrix | If m = 0 or n = 0: M_C >= 0 Otherwise: M_C >= 1 | Global |
4 | N_C | Number of columns in the global matrix | If m = 0 or n = 0: N_C >= 0 Otherwise: N_C >= 1 | Global |
5 | MB_C | Row block size | MB_C >= 1 | Global |
6 | NB_C | Column block size | NB_C >= 1 | Global |
7 | RSRC_C | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_C < p | Global |
8 | CSRC_C | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_C < q | Global |
9 | LLD_C | The leading dimension of the local array | LLD_C >= max(1,LOCp(M_C)) | Local |
Specified as: an array of (at least) length 9, containing fullword integers.
Scope: local
Returned as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 45.
Step 1: First, the reference matrix is selected. For optimal performance, the reference matrix is selected based on the arguments m, n, and k, as follows:
The matrix selected must satisfy coding rules a and d, described below, to be a suitable reference matrix. If it does, processing continues with Step 2. If it does not, the subroutine checks whether either of the other two matrices satisfies coding rules a, c, and d, which would make it a suitable reference matrix. If one of them is suitable, processing continues with Step 2. If neither matrix is suitable, an error condition results.
Step 2: After a suitable reference matrix is chosen in Step 1, all remaining coding rules, described below, are checked. If the rules are satisfied, the subroutine continues normally. If they are not, an error condition results.
Coding Rules: Following are the coding rules:
These indexes are indicated in column 5 of Table 46 for each entry for X.
Table 46. Coding Rules for the Reference Matrix X
-1- X | -2- transa | -3- transb | -4- (b) Equal Block Sizes | -5- (a) Block Bndry For X | -6- (d) Block Bndry For Other | -7- (c) Equal Block Offsets (If Looping is Required) | -8- (c) Conditions For Looping |
---|---|---|---|---|---|---|---|
A | 'N' | 'N' | MB_A = MB_C; NB_B = NB_C; NB_A = MB_B | ia, ja | ib, ic | mod(jb-1, NB_B) = mod(jc-1, NB_C) | n+mod(jb-1, NB_B) > NB_B -or- n+mod(jc-1, NB_C) > NB_C |
A | 'N' | 'T' or 'C' | MB_A = MB_C; MB_B = NB_C; NB_A = NB_B | ia, ja | jb, ic | mod(ib-1, MB_B) = mod(jc-1, NB_C) | n+mod(ib-1, MB_B) > MB_B -or- n+mod(jc-1, NB_C) > NB_C |
A | 'T' or 'C' | 'N' | NB_A = MB_C; NB_B = NB_C; MB_A = MB_B | ia, ja | ib, ic | mod(jb-1, NB_B) = mod(jc-1, NB_C) | n+mod(jb-1, NB_B) > NB_B -or- n+mod(jc-1, NB_C) > NB_C |
A | 'T' or 'C' | 'T' or 'C' | NB_A = MB_C; MB_B = NB_C; MB_A = NB_B | ia, ja | jb, ic | mod(ib-1, MB_B) = mod(jc-1, NB_C) | n+mod(ib-1, MB_B) > MB_B -or- n+mod(jc-1, NB_C) > NB_C |
B | 'N' | 'N' | MB_A = MB_C; NB_B = NB_C; NB_A = MB_B | ib, jb | ja, jc | mod(ia-1, MB_A) = mod(ic-1, MB_C) | m+mod(ia-1, MB_A) > MB_A -or- m+mod(ic-1, MB_C) > MB_C |
B | 'N' | 'T' or 'C' | MB_A = MB_C; MB_B = NB_C; NB_A = NB_B | ib, jb | ja, jc | mod(ia-1, MB_A) = mod(ic-1, MB_C) | m+mod(ia-1, MB_A) > MB_A -or- m+mod(ic-1, MB_C) > MB_C |
B | 'T' or 'C' | 'N' | NB_A = MB_C; NB_B = NB_C; MB_A = MB_B | ib, jb | ia, jc | mod(ja-1, NB_A) = mod(ic-1, MB_C) | m+mod(ja-1, NB_A) > NB_A -or- m+mod(ic-1, MB_C) > MB_C |
B | 'T' or 'C' | 'T' or 'C' | NB_A = MB_C; MB_B = NB_C; MB_A = NB_B | ib, jb | ia, jc | mod(ja-1, NB_A) = mod(ic-1, MB_C) | m+mod(ja-1, NB_A) > NB_A -or- m+mod(ic-1, MB_C) > MB_C |
C | 'N' | 'N' | MB_A = MB_C; NB_B = NB_C; NB_A = MB_B | ic, jc | ia, jb | mod(ja-1, NB_A) = mod(ib-1, MB_B) | k+mod(ja-1, NB_A) > NB_A -or- k+mod(ib-1, MB_B) > MB_B |
C | 'N' | 'T' or 'C' | MB_A = MB_C; MB_B = NB_C; NB_A = NB_B | ic, jc | ia, ib | mod(ja-1, NB_A) = mod(jb-1, NB_B) | k+mod(ja-1, NB_A) > NB_A -or- k+mod(jb-1, NB_B) > NB_B |
C | 'T' or 'C' | 'N' | NB_A = MB_C; NB_B = NB_C; MB_A = MB_B | ic, jc | ja, jb | mod(ia-1, MB_A) = mod(ib-1, MB_B) | k+mod(ia-1, MB_A) > MB_A -or- k+mod(ib-1, MB_B) > MB_B |
C | 'T' or 'C' | 'T' or 'C' | NB_A = MB_C; MB_B = NB_C; MB_A = NB_B | ic, jc | ja, ib | mod(ia-1, MB_A) = mod(jb-1, NB_B) | k+mod(ia-1, MB_A) > MB_A -or- k+mod(jb-1, NB_B) > NB_B |
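Two of the Table 46 columns reduce to short tests. The following Python sketch is an illustration only, with hypothetical function names: `equal_block_sizes` expresses column 4 for all four transa/transb combinations by comparing the block sizes of the matrices as they appear in the product, and the other two functions express columns 8 and 7 for the single case X = A, transa = 'N', transb = 'N'.

```python
def equal_block_sizes(transa, transb, MB_A, NB_A, MB_B, NB_B, MB_C, NB_C):
    """Column 4, rule (b): block sizes must agree along every shared dimension."""
    a_t = transa in ("T", "C")
    b_t = transb in ("T", "C")
    # Row and column block sizes of A and B after the optional transpose.
    opa_rows, opa_cols = (NB_A, MB_A) if a_t else (MB_A, NB_A)
    opb_rows, opb_cols = (NB_B, MB_B) if b_t else (MB_B, NB_B)
    return opa_rows == MB_C and opb_cols == NB_C and opa_cols == opb_rows


def looping_required(n, jb, jc, NB_B, NB_C):
    """Column 8 for X = A, transa = 'N', transb = 'N'."""
    return n + (jb - 1) % NB_B > NB_B or n + (jc - 1) % NB_C > NB_C


def offsets_ok(n, jb, jc, NB_B, NB_C):
    """Column 7 for the same case; it applies only when looping is required."""
    if not looping_required(n, jb, jc, NB_B, NB_C):
        return True
    return (jb - 1) % NB_B == (jc - 1) % NB_C
```

With block sizes MB_A = 3, NB_A = 2, MB_B = 2, NB_B = 2, MB_C = 3, NB_C = 2 and transa = transb = 'N', `equal_block_sizes` returns True; with n = 4 and jb = jc = 1, looping is required and the offsets are equal.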
Table 47. Coding Rules for the Reference Matrix X
-1- X | -2- transa | -3- transb | -4- (e) Process Grid Alignment |
---|---|---|---|
A | 'N' | 'N' | iarow = icrow |
A | 'N' | 'T' or 'C' | iarow = icrow; ibcol = iacol |
A | 'T' or 'C' | 'N' | iarow = ibrow |
A | 'T' or 'C' | 'T' or 'C' | (no rules) |
B | 'N' | 'N' | ibcol = iccol |
B | 'N' | 'T' or 'C' | ibcol = iacol |
B | 'T' or 'C' | 'N' | iarow = ibrow; ibcol = iccol |
B | 'T' or 'C' | 'T' or 'C' | (no rules) |
C | 'N' | 'N' | iarow = icrow; ibcol = iccol |
C | 'N' | 'T' or 'C' | iarow = icrow |
C | 'T' or 'C' | 'N' | ibcol = iccol |
C | 'T' or 'C' | 'T' or 'C' | (no rules) |
Example: Following is an example of the coding rules necessary for the case where transa = 'N' and transb = 'N', where the reference matrix selected is A. Following are the indexes, dimensions, and block sizes used in the computation for the matrices:
C(m, n) <-- alpha A(m, k) B(k, n) + beta C(m, n)
Term | Indexes | Dimensions | Block Sizes |
---|---|---|---|
C (result) | ic, jc | m, n | MB_C, NB_C |
A | ia, ja | m, k | MB_A, NB_A |
B | ib, jb | k, n | MB_B, NB_B |
C (input) | ic, jc | m, n | MB_C, NB_C |
then the following offsets must be equal, as indicated in column 7 in Table 46:
None
Unable to allocate work space
If m <> 0 and k <> 0:
If n <> 0 and k <> 0:
If m <> 0 and n <> 0:
This example computes C = betaC+alphaAB using a 2 × 2 process grid.
ORDER = 'R'
NPROW = 2
NPCOL = 2
CALL BLACS_GET (0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)

            TRANSA TRANSB  M   N   K   ALPHA   A  IA  JA  DESC_A   B  IB  JB
              |      |     |   |   |     |     |   |   |     |     |   |   |
CALL PDGEMM( 'N' ,  'N' ,  6 , 4 , 5 , 1.0D0 , A , 1 , 1 , DESC_A , B , 1 , 1 ,

             DESC_B   BETA   C  IC  JC  DESC_C
               |       |     |   |   |     |
             DESC_B , 2.0D0 , C , 1 , 1 , DESC_C )
 | Desc_A | Desc_B | Desc_C |
---|---|---|---|
DTYPE_ | 1 | 1 | 1 |
CTXT_ | icontxt¹ | icontxt¹ | icontxt¹ |
M_ | 6 | 5 | 6 |
N_ | 5 | 4 | 4 |
MB_ | 3 | 2 | 3 |
NB_ | 2 | 2 | 2 |
RSRC_ | 0 | 0 | 0 |
CSRC_ | 0 | 0 | 0 |
LLD_ | See below² | See below² | See below² |
Global general 6 × 5 matrix A with block size 3 × 2:
B,D          0              1            2
     *                                        *
     |   1.0   2.0  |  -1.0  -1.0  |   4.0    |
 0   |   2.0   0.0  |   1.0   1.0  |  -1.0    |
     |   1.0  -1.0  |  -1.0   1.0  |   2.0    |
     | -------------|--------------|--------  |
     |  -3.0   2.0  |   2.0   2.0  |   0.0    |
 1   |   4.0   0.0  |  -2.0   1.0  |  -1.0    |
     |  -1.0  -1.0  |   1.0  -3.0  |   2.0    |
     *                                        *
The following is the 2 × 2 process grid:
B,D | 0 2 | 1 |
---|---|---|
0 | P00 | P01 |
1 | P10 | P11 |
Local arrays for A:
p,q  |         0          |      1
-----|--------------------|--------------
     |   1.0   2.0   4.0  |  -1.0  -1.0
 0   |   2.0   0.0  -1.0  |   1.0   1.0
     |   1.0  -1.0   2.0  |  -1.0   1.0
-----|--------------------|--------------
     |  -3.0   2.0   0.0  |   2.0   2.0
 1   |   4.0   0.0  -1.0  |  -2.0   1.0
     |  -1.0  -1.0   2.0  |   1.0  -3.0
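The local arrays above follow from dealing the 3 × 2 blocks of the global matrix cyclically over the process rows and columns. The Python sketch below illustrates that mapping under the assumption of a standard block-cyclic distribution; the function name `local_piece` is hypothetical, and the result is a list of rows, whereas a Fortran program would hold the same values in a column-major array with leading dimension LLD_A.

```python
def local_piece(glob, mb, nb, myrow, mycol, nprow, npcol, rsrc=0, csrc=0):
    """Return the part of `glob` (a list of rows) owned by process (myrow, mycol)."""
    def owned(n, blk, me, src, nprocs):
        # 0-based global indices whose block is dealt to process `me`.
        return [g for g in range(n) if (g // blk + src) % nprocs == me]

    rows = owned(len(glob), mb, myrow, rsrc, nprow)
    cols = owned(len(glob[0]), nb, mycol, csrc, npcol)
    return [[glob[i][j] for j in cols] for i in rows]


# Global 6 x 5 matrix A from this example, block size 3 x 2, 2 x 2 process grid.
A = [
    [ 1.0,  2.0, -1.0, -1.0,  4.0],
    [ 2.0,  0.0,  1.0,  1.0, -1.0],
    [ 1.0, -1.0, -1.0,  1.0,  2.0],
    [-3.0,  2.0,  2.0,  2.0,  0.0],
    [ 4.0,  0.0, -2.0,  1.0, -1.0],
    [-1.0, -1.0,  1.0, -3.0,  2.0],
]
local_p00 = local_piece(A, 3, 2, 0, 0, 2, 2)
local_p11 = local_piece(A, 3, 2, 1, 1, 2, 2)
```

Process P00 receives block columns 0 and 2, which is why its local array has three columns while P01 has two.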
Global general 5 × 4 matrix B with block size 2 × 2:
B,D          0             1
     *                              *
 0   |   1.0  -1.0  |   0.0   2.0   |
     |   2.0   2.0  |  -1.0  -2.0   |
     | -------------|-------------  |
 1   |   1.0   0.0  |  -1.0   1.0   |
     |  -3.0  -1.0  |   1.0  -1.0   |
     | -------------|-------------  |
 2   |   4.0   2.0  |  -1.0   1.0   |
     *                              *
The following is the 2 × 2 process grid:
B,D | 0 | 1 |
---|---|---|
0 2 | P00 | P01 |
1 | P10 | P11 |
Local arrays for B:
p,q  |      0       |      1
-----|--------------|--------------
     |   1.0  -1.0  |   0.0   2.0
 0   |   2.0   2.0  |  -1.0  -2.0
     |   4.0   2.0  |  -1.0   1.0
-----|--------------|--------------
 1   |   1.0   0.0  |  -1.0   1.0
     |  -3.0  -1.0  |   1.0  -1.0
Global general 6 × 4 matrix C with block size 3 × 2:
B,D          0             1
     *                              *
     |   0.5   0.5  |   0.5   0.5   |
 0   |   0.5   0.5  |   0.5   0.5   |
     |   0.5   0.5  |   0.5   0.5   |
     | -------------|-------------  |
     |   0.5   0.5  |   0.5   0.5   |
 1   |   0.5   0.5  |   0.5   0.5   |
     |   0.5   0.5  |   0.5   0.5   |
     *                              *
The following is the 2 × 2 process grid:
B,D | 0 | 1 |
---|---|---|
0 | P00 | P01 |
1 | P10 | P11 |
Local arrays for C:
p,q  |      0       |      1
-----|--------------|--------------
     |   0.5   0.5  |   0.5   0.5
 0   |   0.5   0.5  |   0.5   0.5
     |   0.5   0.5  |   0.5   0.5
-----|--------------|--------------
     |   0.5   0.5  |   0.5   0.5
 1   |   0.5   0.5  |   0.5   0.5
     |   0.5   0.5  |   0.5   0.5
Output:
Global general 6 × 4 matrix C with block size 3 × 2:
B,D          0              1
     *                                *
     |  24.0  13.0  |  -5.0   3.0     |
 0   |  -3.0  -4.0  |   2.0   4.0     |
     |   4.0   1.0  |   2.0   5.0     |
     | -------------|---------------  |
     |  -2.0   6.0  |  -1.0  -9.0     |
 1   |  -4.0  -6.0  |   5.0   5.0     |
     |  16.0   7.0  |  -4.0   7.0     |
     *                                *
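The global result can be reproduced on a single process with a few lines of plain Python. This is a serial cross-check of the arithmetic only, not a model of how the parallel subroutine works; the function name `gemm` is a hypothetical helper, and the input values are copied from the global matrices A, B, and C of this example.

```python
def gemm(alpha, A, B, beta, C):
    """Return beta*C + alpha*A*B for matrices stored as lists of rows."""
    m, k, n = len(A), len(B), len(B[0])
    return [[beta * C[i][j] + alpha * sum(A[i][p] * B[p][j] for p in range(k))
             for j in range(n)] for i in range(m)]


A = [
    [ 1.0,  2.0, -1.0, -1.0,  4.0],
    [ 2.0,  0.0,  1.0,  1.0, -1.0],
    [ 1.0, -1.0, -1.0,  1.0,  2.0],
    [-3.0,  2.0,  2.0,  2.0,  0.0],
    [ 4.0,  0.0, -2.0,  1.0, -1.0],
    [-1.0, -1.0,  1.0, -3.0,  2.0],
]
B = [
    [ 1.0, -1.0,  0.0,  2.0],
    [ 2.0,  2.0, -1.0, -2.0],
    [ 1.0,  0.0, -1.0,  1.0],
    [-3.0, -1.0,  1.0, -1.0],
    [ 4.0,  2.0, -1.0,  1.0],
]
C = [[0.5] * 4 for _ in range(6)]

# alpha = 1.0 and beta = 2.0, as in the call sequence of this example.
result = gemm(1.0, A, B, 2.0, C)
for row in result:
    print(row)
```

Each entry is the dot product of a row of A with a column of B, plus 2.0 × 0.5 = 1.0 from the beta term.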
The following is the 2 × 2 process grid:
B,D | 0 | 1 |
---|---|---|
0 | P00 | P01 |
1 | P10 | P11 |
Local arrays for C:
p,q  |      0       |      1
-----|--------------|--------------
     |  24.0  13.0  |  -5.0   3.0
 0   |  -3.0  -4.0  |   2.0   4.0
     |   4.0   1.0  |   2.0   5.0
-----|--------------|--------------
     |  -2.0   6.0  |  -1.0  -9.0
 1   |  -4.0  -6.0  |   5.0   5.0
     |  16.0   7.0  |  -4.0   7.0
This example computes C = betaC+alphaAB using a 2 × 2 process grid.
ORDER = 'R'
NPROW = 2
NPCOL = 2
CALL BLACS_GET (0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)

            TRANSA TRANSB  M   N   K       ALPHA       A  IA  JA  DESC_A   B  IB  JB
              |      |     |   |   |         |         |   |   |     |     |   |   |
CALL PZGEMM( 'N' ,  'N' ,  6 , 2 , 3 , (1.0D0,0.0D0) , A , 1 , 1 , DESC_A , B , 1 , 1 ,

             DESC_B       BETA       C  IC  JC  DESC_C
               |           |         |   |   |     |
             DESC_B , (2.0D0,0.0D0) , C , 1 , 1 , DESC_C )
 | Desc_A | Desc_B | Desc_C |
---|---|---|---|
DTYPE_ | 1 | 1 | 1 |
CTXT_ | icontxt¹ | icontxt¹ | icontxt¹ |
M_ | 6 | 3 | 6 |
N_ | 3 | 2 | 2 |
MB_ | 2 | 2 | 2 |
NB_ | 2 | 2 | 2 |
RSRC_ | 0 | 0 | 0 |
CSRC_ | 0 | 0 | 0 |
LLD_ | See below² | See below² | See below² |
Global general 6 × 3 matrix A with block size 2 × 2:
B,D               0                   1
     *                                         *
 0   | (1.0,5.0)  (9.0,2.0)  | (1.0,9.0)       |
     | (2.0,4.0)  (8.0,3.0)  | (1.0,8.0)       |
     | ----------------------|-----------      |
 1   | (3.0,3.0)  (7.0,5.0)  | (1.0,7.0)       |
     | (4.0,2.0)  (4.0,7.0)  | (1.0,5.0)       |
     | ----------------------|-----------      |
 2   | (5.0,1.0)  (5.0,1.0)  | (1.0,6.0)       |
     | (6.0,6.0)  (3.0,6.0)  | (1.0,4.0)       |
     *                                         *
The following is the 2 × 2 process grid:
B,D | 0 | 1 |
---|---|---|
0 2 | P00 | P01 |
1 | P10 | P11 |
Local arrays for A:
p,q  |           0            |      1
-----|------------------------|-------------
     | (1.0,5.0)  (9.0,2.0)   | (1.0,9.0)
     | (2.0,4.0)  (8.0,3.0)   | (1.0,8.0)
 0   | (5.0,1.0)  (5.0,1.0)   | (1.0,6.0)
     | (6.0,6.0)  (3.0,6.0)   | (1.0,4.0)
-----|------------------------|-------------
 1   | (3.0,3.0)  (7.0,5.0)   | (1.0,7.0)
     | (4.0,2.0)  (4.0,7.0)   | (1.0,5.0)
Global general 3 × 2 matrix B with block size 2 × 2:
B,D               0
     *                          *
 0   | (1.0,8.0)  (2.0,7.0)     |
     | (4.0,4.0)  (6.0,8.0)     |
     | ---------------------    |
 1   | (6.0,2.0)  (4.0,5.0)     |
     *                          *
The following is the 2 × 2 process grid:
B,D | 0 | -- |
---|---|---|
0 | P00 | P01 |
1 | P10 | P11 |
Local arrays for B:
p,q  |           0
-----|-----------------------
 0   | (1.0,8.0)  (2.0,7.0)
     | (4.0,4.0)  (6.0,8.0)
-----|-----------------------
 1   | (6.0,2.0)  (4.0,5.0)
Global general 6 × 2 matrix C with block size 2 × 2:
B,D               0
     *                          *
 0   | (0.5,0.0)  (0.5,0.0)     |
     | (0.5,0.0)  (0.5,0.0)     |
     | ---------------------    |
 1   | (0.5,0.0)  (0.5,0.0)     |
     | (0.5,0.0)  (0.5,0.0)     |
     | ---------------------    |
 2   | (0.5,0.0)  (0.5,0.0)     |
     | (0.5,0.0)  (0.5,0.0)     |
     *                          *
The following is the 2 × 2 process grid:
B,D | 0 | -- |
---|---|---|
0 2 | P00 | P01 |
1 | P10 | P11 |
Local arrays for C:
p,q  |           0
-----|-----------------------
     | (0.5,0.0)  (0.5,0.0)
     | (0.5,0.0)  (0.5,0.0)
 0   | (0.5,0.0)  (0.5,0.0)
     | (0.5,0.0)  (0.5,0.0)
-----|-----------------------
 1   | (0.5,0.0)  (0.5,0.0)
     | (0.5,0.0)  (0.5,0.0)
Output:
Global general 6 × 2 matrix C with block size 2 × 2:
B,D                  0
     *                                  *
 0   | (-22.0,113.0)  (-35.0,142.0)     |
     | (-19.0,114.0)  (-35.0,141.0)     |
     | -----------------------------    |
 1   | (-20.0,119.0)  (-43.0,146.0)     |
     | (-27.0,110.0)  (-58.0,131.0)     |
     | -----------------------------    |
 2   |   (8.0,103.0)    (0.0,112.0)     |
     | (-55.0,116.0)  (-75.0,135.0)     |
     *                                  *
The following is the 2 × 2 process grid:
B,D | 0 | -- |
---|---|---|
0 2 | P00 | P01 |
1 | P10 | P11 |
Local arrays for C:
p,q  |               0
-----|-------------------------------
     | (-22.0,113.0)  (-35.0,142.0)
     | (-19.0,114.0)  (-35.0,141.0)
 0   |   (8.0,103.0)    (0.0,112.0)
     | (-55.0,116.0)  (-75.0,135.0)
-----|-------------------------------
 1   | (-20.0,119.0)  (-43.0,146.0)
     | (-27.0,110.0)  (-58.0,131.0)
This subroutine computes one of the following matrix-matrix products:
where, in the formulas above:
In the following two cases, no computation is performed and the subroutine returns after doing some parameter checking:
Table 48. Data Types
alpha, beta, A, B, C | Subprogram |
---|---|
Long-precision real | PDSYMM |
Fortran | CALL PDSYMM (side, uplo, m, n, alpha, a, ia, ja, desc_a, b, ib, jb, desc_b, beta, c, ic, jc, desc_c) |
C and C++ | pdsymm (side, uplo, m, n, alpha, a, ia, ja, desc_a, b, ib, jb, desc_b, beta, c, ic, jc, desc_c); |
If side = 'L', A is to the left of B, resulting in equation 1.
If side = 'R', A is to the right of B, resulting in equation 2.
Scope: global
Specified as: a single character; side = 'L' or 'R'.
If uplo = 'U', the upper triangular part is referenced.
If uplo = 'L', the lower triangular part is referenced.
Scope: global
Specified as: a single character; uplo = 'U' or 'L'.
If side = 'L', it is the number of rows and columns in submatrix A used in the computation.
Scope: global
Specified as: a fullword integer; m >= 0.
If side = 'R', it is the number of rows and columns in submatrix A used in the computation.
Scope: global
Specified as: a fullword integer; n >= 0.
Scope: global
Specified as: a number of the data type indicated in Table 48.
If side = 'L', numa = m
If side = 'R', numa = n
the leading LOCp(ia+numa-1) by LOCq(ja+numa-1) part of the local array A must contain the local pieces of the leading ia+numa-1 by ja+numa-1 part of the global matrix, and:
Scope: local
Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 48. Details about the block-cyclic data distribution of global matrix A are stored in desc_a.
Scope: global
Specified as: a fullword integer; 1 <= ia <= M_A and ia+numa-1 <= M_A.
Scope: global
Specified as: a fullword integer; 1 <= ja <= N_A and ja+numa-1 <= N_A.
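The role of numa in the limits for ia and ja can be written out as a short check. This Python sketch is illustrative only; the function names `numa` and `check_symm_a` are hypothetical, and M_A and N_A stand for the global dimensions stored in desc_a.

```python
def numa(side, m, n):
    """Order of the symmetric submatrix A: m when side = 'L', n when side = 'R'."""
    if side == "L":
        return m
    if side == "R":
        return n
    raise ValueError("side must be 'L' or 'R'")


def check_symm_a(side, m, n, ia, ja, M_A, N_A):
    """Apply the limits on ia and ja exactly as they are stated above."""
    order = numa(side, m, n)
    return (1 <= ia <= M_A and ia + order - 1 <= M_A and
            1 <= ja <= N_A and ja + order - 1 <= N_A)
```

For an 8 by 8 global matrix A with m = 16 and n = 8, side = 'R' gives numa = 8 and the check passes, while side = 'L' would require a submatrix of order 16 and fails.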
desc_a | Name | Description | Limits | Scope |
---|---|---|---|---|
1 | DTYPE_A | Descriptor type | DTYPE_A=1 | Global |
2 | CTXT_A | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
3 | M_A | Number of rows in the global matrix | If m = 0 and side = 'L' or n = 0 and side = 'R': M_A >= 0 Otherwise: M_A >= 1 | Global |
4 | N_A | Number of columns in the global matrix | If m = 0 and side = 'L' or n = 0 and side = 'R': N_A >= 0 Otherwise: N_A >= 1 | Global |
5 | MB_A | Row block size | MB_A >= 1 | Global |
6 | NB_A | Column block size | NB_A >= 1 | Global |
7 | RSRC_A | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_A < p | Global |
8 | CSRC_A | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_A < q | Global |
9 | LLD_A | The leading dimension of the local array | LLD_A >= max(1,LOCp(M_A)) | Local |
Specified as: an array of (at least) length 9, containing fullword integers.
Scope: local
Specified as: an LLD_B by (at least) LOCq(N_B) array, containing numbers of the data type indicated in Table 48. Details about the block-cyclic data distribution of global matrix B are stored in desc_b.
Scope: global
Specified as: a fullword integer; 1 <= ib <= M_B and ib+m-1 <= M_B.
Scope: global
Specified as: a fullword integer; 1 <= jb <= N_B and jb+n-1 <= N_B.
desc_b | Name | Description | Limits | Scope |
---|---|---|---|---|
1 | DTYPE_B | Descriptor type | DTYPE_B=1 | Global |
2 | CTXT_B | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
3 | M_B | Number of rows in the global matrix | If m = 0 or n = 0: M_B >= 0 Otherwise: M_B >= 1 | Global |
4 | N_B | Number of columns in the global matrix | If m = 0 or n = 0: N_B >= 0 Otherwise: N_B >= 1 | Global |
5 | MB_B | Row block size | MB_B >= 1 | Global |
6 | NB_B | Column block size | NB_B >= 1 | Global |
7 | RSRC_B | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_B < p | Global |
8 | CSRC_B | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_B < q | Global |
9 | LLD_B | The leading dimension of the local array | LLD_B >= max(1,LOCp(M_B)) | Local |
Specified as: an array of (at least) length 9, containing fullword integers.
Scope: global
Specified as: a number of the data type indicated in Table 48.
When beta is zero, C need not be set on input.
Scope: local
Specified as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 48. Details about the block-cyclic data distribution of global matrix C are stored in desc_c.
Scope: global
Specified as: a fullword integer; 1 <= ic <= M_C and ic+m-1 <= M_C.
Scope: global
Specified as: a fullword integer; 1 <= jc <= N_C and jc+n-1 <= N_C.
desc_c | Name | Description | Limits | Scope |
---|---|---|---|---|
1 | DTYPE_C | Descriptor type | DTYPE_C=1 | Global |
2 | CTXT_C | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
3 | M_C | Number of rows in the global matrix | If m = 0 or n = 0: M_C >= 0 Otherwise: M_C >= 1 | Global |
4 | N_C | Number of columns in the global matrix | If m = 0 or n = 0: N_C >= 0 Otherwise: N_C >= 1 | Global |
5 | MB_C | Row block size | MB_C >= 1 | Global |
6 | NB_C | Column block size | NB_C >= 1 | Global |
7 | RSRC_C | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_C < p | Global |
8 | CSRC_C | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_C < q | Global |
9 | LLD_C | The leading dimension of the local array | LLD_C >= max(1,LOCp(M_C)) | Local |
Specified as: an array of (at least) length 9, containing fullword integers.
Scope: local
Returned as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 48.
then:
then:
then you must follow these rules:
or if all the following are true:
then you must follow these rules:
None
Unable to allocate work space
If (m <> 0 or side <> 'L') and (n <> 0 or side <> 'R'):
where numa = m if side = 'L' and numa = n if side = 'R'.
If m <> 0 and n <> 0:
If A is contained within a single block, that is:
and:
then:
If A is not contained within a single block, or if A is contained within a single block and:
then:
If side = 'L':
If side = 'R':
In all cases:
If side = 'L' and looping is required--that is, either of the following is true:
then:
If side = 'L':
If side = 'R' and looping is required--that is, either of the following is true:
then:
If side = 'R':
This example computes C = betaC+alphaBA using a 2 × 2 process grid.
ORDER = 'R'
NPROW = 2
NPCOL = 2
CALL BLACS_GET(0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)

             SIDE  UPLO   M    N   ALPHA   A  IA  JA  DESC_A   B  IB  JB
              |     |     |    |     |     |   |   |     |     |   |   |
CALL PDSYMM( 'R' , 'U' , 16 ,  8 , 1.0D0 , A , 1 , 1 , DESC_A , B , 1 , 1 ,

             DESC_B   BETA   C  IC  JC  DESC_C
               |       |     |   |   |     |
             DESC_B , 0.0D0 , C , 1 , 1 , DESC_C )
 | Desc_A | Desc_B | Desc_C |
---|---|---|---|
DTYPE_ | 1 | 1 | 1 |
CTXT_ | icontxt¹ | icontxt¹ | icontxt¹ |
M_ | 8 | 16 | 16 |
N_ | 8 | 8 | 8 |
MB_ | 2 | 4 | 4 |
NB_ | 2 | 2 | 2 |
RSRC_ | 0 | 0 | 0 |
CSRC_ | 0 | 0 | 0 |
LLD_ | See below² | See below² | See below² |
Global symmetric matrix A of order 8 with block size 2 × 2:
B,D          0              1              2              3
     *                                                          *
 0   |   0.0  -1.0  |  -1.0   0.0  |   0.0   0.0  |   0.0   0.0 |
     |    .    1.0  |   0.0   1.0  |   0.0   1.0  |   0.0   1.0 |
     | -------------|--------------|--------------|------------ |
 1   |    .     .   |  -1.0  -1.0  |   0.0   0.0  |   1.0   0.0 |
     |    .     .   |    .   -1.0  |   1.0   1.0  |   0.0   1.0 |
     | -------------|--------------|--------------|------------ |
 2   |    .     .   |    .     .   |  -1.0   0.0  |   0.0   0.0 |
     |    .     .   |    .     .   |    .    1.0  |   0.0   0.0 |
     | -------------|--------------|--------------|------------ |
 3   |    .     .   |    .     .   |    .     .   |   0.0   0.0 |
     |    .     .   |    .     .   |    .     .   |    .    0.0 |
     *                                                          *
The following is the 2 × 2 process grid:
B,D | 0 2 | 1 3 |
---|---|---|
0 2 | P00 | P01 |
1 3 | P10 | P11 |
Local arrays for A:
p,q  |            0             |            1
-----|--------------------------|--------------------------
     |   0.0  -1.0   0.0   0.0  |  -1.0   0.0   0.0   0.0
     |    .    1.0   0.0   1.0  |   0.0   1.0   0.0   1.0
 0   |    .     .   -1.0   0.0  |    .     .    0.0   0.0
     |    .     .     .    1.0  |    .     .    0.0   0.0
-----|--------------------------|--------------------------
     |    .     .    0.0   0.0  |  -1.0  -1.0   1.0   0.0
     |    .     .    1.0   1.0  |    .   -1.0   0.0   1.0
 1   |    .     .     .     .   |    .     .    0.0   0.0
     |    .     .     .     .   |    .     .     .    0.0
Global general 16 × 8 matrix B with block size 4 × 2:
B,D          0              1              2              3
     *                                                          *
     |  -1.0   0.0  |   1.0  -1.0  |   1.0   1.0  |  -1.0  -1.0 |
     |  -1.0  -1.0  |   1.0   0.0  |   1.0  -1.0  |  -1.0   1.0 |
 0   |   1.0   1.0  |  -1.0   0.0  |  -1.0   0.0  |   1.0   0.0 |
     |   0.0  -1.0  |   0.0   0.0  |   0.0   0.0  |   0.0  -1.0 |
     | -------------|--------------|--------------|------------ |
     |   0.0   1.0  |   0.0   1.0  |   0.0   1.0  |   1.0   0.0 |
     |   0.0   0.0  |   1.0   0.0  |  -1.0  -1.0  |   0.0   0.0 |
 1   |   1.0   1.0  |   0.0   0.0  |   1.0   1.0  |   0.0  -1.0 |
     |   0.0   0.0  |  -1.0   0.0  |   0.0   1.0  |   0.0   1.0 |
     | -------------|--------------|--------------|------------ |
     |   0.0   0.0  |   0.0  -1.0  |   1.0   1.0  |   0.0   1.0 |
     |  -1.0  -1.0  |   1.0   0.0  |   0.0  -1.0  |   0.0   1.0 |
 2   |   0.0   0.0  |   0.0   1.0  |   1.0   0.0  |   0.0   0.0 |
     |   0.0   0.0  |   1.0   1.0  |   0.0  -1.0  |   0.0   0.0 |
     | -------------|--------------|--------------|------------ |
     |   1.0   1.0  |  -1.0   0.0  |  -1.0  -1.0  |   1.0   1.0 |
     |   0.0   0.0  |   0.0   0.0  |   1.0   0.0  |   0.0  -1.0 |
 3   |   0.0   1.0  |   0.0   0.0  |   0.0   0.0  |   0.0   0.0 |
     |  -1.0   0.0  |  -1.0   0.0  |   0.0   1.0  |   1.0   0.0 |
     *                                                          *
The following is the 2 × 2 process grid:
B,D | 0 2 | 1 3 |
---|---|---|
0 2 | P00 | P01 |
1 3 | P10 | P11 |
Local arrays for B:
p,q  |            0             |            1
-----|--------------------------|--------------------------
     |  -1.0   0.0   1.0   1.0  |   1.0  -1.0  -1.0  -1.0
     |  -1.0  -1.0   1.0  -1.0  |   1.0   0.0  -1.0   1.0
     |   1.0   1.0  -1.0   0.0  |  -1.0   0.0   1.0   0.0
     |   0.0  -1.0   0.0   0.0  |   0.0   0.0   0.0  -1.0
 0   |   0.0   0.0   1.0   1.0  |   0.0  -1.0   0.0   1.0
     |  -1.0  -1.0   0.0  -1.0  |   1.0   0.0   0.0   1.0
     |   0.0   0.0   1.0   0.0  |   0.0   1.0   0.0   0.0
     |   0.0   0.0   0.0  -1.0  |   1.0   1.0   0.0   0.0
-----|--------------------------|--------------------------
     |   0.0   1.0   0.0   1.0  |   0.0   1.0   1.0   0.0
     |   0.0   0.0  -1.0  -1.0  |   1.0   0.0   0.0   0.0
     |   1.0   1.0   1.0   1.0  |   0.0   0.0   0.0  -1.0
     |   0.0   0.0   0.0   1.0  |  -1.0   0.0   0.0   1.0
 1   |   1.0   1.0  -1.0  -1.0  |  -1.0   0.0   1.0   1.0
     |   0.0   0.0   1.0   0.0  |   0.0   0.0   0.0  -1.0
     |   0.0   1.0   0.0   0.0  |   0.0   0.0   0.0   0.0
     |  -1.0   0.0   0.0   1.0  |  -1.0   0.0   1.0   0.0
Output:
Global general 16 × 8 matrix C with block size 4 × 2:
B,D          0              1              2              3
     *                                                          *
     |  -1.0   0.0  |   0.0   1.0  |  -2.0   0.0  |   1.0  -1.0 |
     |   0.0   0.0  |  -1.0  -1.0  |  -1.0  -2.0  |   1.0  -1.0 |
 0   |   0.0   0.0  |   1.0   1.0  |   1.0   1.0  |  -1.0   1.0 |
     |   1.0  -2.0  |   0.0  -2.0  |   0.0  -1.0  |   0.0  -1.0 |
     | -------------|--------------|--------------|------------ |
     |  -1.0   3.0  |   0.0   1.0  |   1.0   3.0  |   0.0   2.0 |
     |  -1.0  -1.0  |  -1.0  -3.0  |   1.0  -1.0  |   1.0   0.0 |
 1   |  -1.0   0.0  |  -1.0   2.0  |  -1.0   2.0  |   0.0   1.0 |
     |   1.0   2.0  |   1.0   3.0  |   0.0   1.0  |  -1.0   0.0 |
     | -------------|--------------|--------------|------------ |
     |   0.0   1.0  |   1.0   4.0  |  -2.0   0.0  |   0.0  -1.0 |
     |   0.0   0.0  |   0.0  -2.0  |   0.0  -2.0  |   1.0  -1.0 |
 2   |   0.0   1.0  |  -1.0   0.0  |   0.0   1.0  |   0.0   1.0 |
     |  -1.0   0.0  |  -2.0  -3.0  |   1.0   0.0  |   1.0   1.0 |
     | -------------|--------------|--------------|------------ |
     |   0.0   0.0  |   1.0   1.0  |   1.0   0.0  |  -1.0   1.0 |
     |   0.0  -1.0  |   0.0   0.0  |  -1.0   0.0  |   0.0   0.0 |
 3   |  -1.0   1.0  |   0.0   1.0  |   0.0   1.0  |   0.0   1.0 |
     |   1.0   2.0  |   3.0   2.0  |   0.0   1.0  |  -1.0   0.0 |
     *                                                          *
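Because uplo = 'U', only the upper triangle of A is referenced, and the entries shown as dots are never read. The Python sketch below is a serial illustration of that idea, not the parallel algorithm: the hypothetical helper `symmetrize_upper` rebuilds the full symmetric matrix from its upper triangle, and `symm_right` forms betaC + alphaBA for the first two rows of B from this example. `NA` marks the unreferenced entries.

```python
def symmetrize_upper(U):
    """Build the full symmetric matrix, reading only entries with row <= column."""
    n = len(U)
    return [[U[min(i, j)][max(i, j)] for j in range(n)] for i in range(n)]


def symm_right(alpha, B, A_full, beta, C):
    """Return beta*C + alpha*B*A for matrices stored as lists of rows."""
    k = len(A_full)
    return [[beta * C[i][j] + alpha * sum(B[i][p] * A_full[p][j] for p in range(k))
             for j in range(k)] for i in range(len(B))]


NA = None  # lower-triangle entries: present in storage, never referenced
A_upper = [
    [0.0, -1.0, -1.0,  0.0,  0.0, 0.0, 0.0, 0.0],
    [NA,   1.0,  0.0,  1.0,  0.0, 1.0, 0.0, 1.0],
    [NA,   NA,  -1.0, -1.0,  0.0, 0.0, 1.0, 0.0],
    [NA,   NA,   NA,  -1.0,  1.0, 1.0, 0.0, 1.0],
    [NA,   NA,   NA,   NA,  -1.0, 0.0, 0.0, 0.0],
    [NA,   NA,   NA,   NA,   NA,  1.0, 0.0, 0.0],
    [NA,   NA,   NA,   NA,   NA,  NA,  0.0, 0.0],
    [NA,   NA,   NA,   NA,   NA,  NA,  NA,  0.0],
]
# First two rows of the global matrix B from this example.
B_rows = [
    [-1.0,  0.0, 1.0, -1.0, 1.0,  1.0, -1.0, -1.0],
    [-1.0, -1.0, 1.0,  0.0, 1.0, -1.0, -1.0,  1.0],
]
C_rows = [[0.0] * 8 for _ in B_rows]

# alpha = 1.0 and beta = 0.0, as in the call sequence of this example.
result = symm_right(1.0, B_rows, symmetrize_upper(A_upper), 0.0, C_rows)
```

The two computed rows agree with the first two rows of the global output matrix C shown above.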
The following is the 2 × 2 process grid:
B,D | 0 2 | 1 3 |
---|---|---|
0 2 | P00 | P01 |
1 3 | P10 | P11 |
Local arrays for C:
p,q  |            0             |            1
-----|--------------------------|--------------------------
     |  -1.0   0.0  -2.0   0.0  |   0.0   1.0   1.0  -1.0
     |   0.0   0.0  -1.0  -2.0  |  -1.0  -1.0   1.0  -1.0
     |   0.0   0.0   1.0   1.0  |   1.0   1.0  -1.0   1.0
     |   1.0  -2.0   0.0  -1.0  |   0.0  -2.0   0.0  -1.0
 0   |   0.0   1.0  -2.0   0.0  |   1.0   4.0   0.0  -1.0
     |   0.0   0.0   0.0  -2.0  |   0.0  -2.0   1.0  -1.0
     |   0.0   1.0   0.0   1.0  |  -1.0   0.0   0.0   1.0
     |  -1.0   0.0   1.0   0.0  |  -2.0  -3.0   1.0   1.0
-----|--------------------------|--------------------------
     |  -1.0   3.0   1.0   3.0  |   0.0   1.0   0.0   2.0
     |  -1.0  -1.0   1.0  -1.0  |  -1.0  -3.0   1.0   0.0
     |  -1.0   0.0  -1.0   2.0  |  -1.0   2.0   0.0   1.0
     |   1.0   2.0   0.0   1.0  |   1.0   3.0  -1.0   0.0
 1   |   0.0   0.0   1.0   0.0  |   1.0   1.0  -1.0   1.0
     |   0.0  -1.0  -1.0   0.0  |   0.0   0.0   0.0   0.0
     |  -1.0   1.0   0.0   1.0  |   0.0   1.0   0.0   1.0
     |   1.0   2.0   0.0   1.0  |   3.0   2.0  -1.0   0.0
This subroutine computes one of the following matrix-matrix products:
1. B <-- alphaAB | 3. B <-- alphaBA |
2. B <-- alphaA^TB | 4. B <-- alphaBA^T |
where, in the formulas above:
Note: No data should be moved to form the matrix transpose; that is, the matrix should always be stored in its untransposed form.
If m = 0 or n = 0, no computation is performed, and the subroutine returns after doing some parameter checking.
alpha, A, B | Subprogram |
Long-precision real | PDTRMM |
Fortran | CALL PDTRMM (side, uplo, transa, diag, m, n, alpha, a, ia, ja, desc_a, b, ib, jb, desc_b) |
C and C++ | pdtrmm (side, uplo, transa, diag, m, n, alpha, a, ia, ja, desc_a, b, ib, jb, desc_b); |
If side = 'L', A is to the left of B, resulting in equation 1 or 2.
If side = 'R', A is to the right of B, resulting in equation 3 or 4.
Scope: global
Specified as: a single character; side = 'L' or 'R'.
If uplo = 'U', the upper triangular part is referenced.
If uplo = 'L', the lower triangular part is referenced.
Scope: global
Specified as: a single character; uplo = 'U' or 'L'.
If transa = 'N', A is used in the computation, resulting in equation 1 or 3.
If transa = 'T', $A^{T}$ is used in the computation, resulting in equation 2 or 4.
Scope: global
Specified as: a single character; transa = 'N' or 'T'.
If diag = 'U', A is a unit triangular matrix.
If diag = 'N', A is not a unit triangular matrix.
Scope: global
Specified as: a single character; diag = 'U' or 'N'.
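The way side, uplo, transa, and diag select one of the four products can be sketched with a small serial reference routine. This is a minimal illustration only, assuming plain Python lists of lists for the matrices; the function name `trmm` is hypothetical and does not model the distributed storage or the in-place update performed by PDTRMM.

```python
def trmm(side, uplo, transa, diag, alpha, a, b):
    """Return alpha*op(A)*B for side='L' or alpha*B*op(A) for side='R'."""
    order = len(a)
    # Build the triangle that is actually referenced: entries outside the
    # uplo triangle are ignored, and diag='U' forces a unit diagonal.
    t = [[0.0] * order for _ in range(order)]
    for i in range(order):
        for j in range(order):
            if i == j:
                t[i][j] = 1.0 if diag == 'U' else a[i][j]
            elif (j > i) == (uplo == 'U'):
                t[i][j] = a[i][j]
    if transa == 'T':
        t = [list(row) for row in zip(*t)]
    m, n = len(b), len(b[0])
    if side == 'L':   # A is m by m (equations 1 and 2)
        return [[alpha * sum(t[i][k] * b[k][j] for k in range(m))
                 for j in range(n)] for i in range(m)]
    # side == 'R': A is n by n (equations 3 and 4)
    return [[alpha * sum(b[i][k] * t[k][j] for k in range(n))
             for j in range(n)] for i in range(m)]
```

Multiplying by an identity matrix B returns the referenced triangle itself, which makes it easy to see which entries of A each option reads.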
If side = 'L', m is the number of rows and columns in submatrix A used in the computation.
Scope: global
Specified as: a fullword integer; m >= 0.
If side = 'R', n is the number of rows and columns in submatrix A used in the computation.
Scope: global
Specified as: a fullword integer; n >= 0.
Scope: global
Specified as: a number of the data type indicated in Table 49.
If side = 'L', numa = m
If side = 'R', numa = n
the leading LOCp(ia+numa-1) by LOCq(ja+numa-1) part of the local array A must contain the local pieces of the leading ia+numa-1 by ja+numa-1 part of the global matrix, and:
Note: | No data should be moved to form $A^{T}$; that is, the matrix A should always be stored in its untransposed form. |
Scope: local
Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 49. Details about the square block-cyclic data distribution of global matrix A are stored in desc_a.
Scope: global
Specified as: a fullword integer; 1 <= ia <= M_A and ia+numa-1 <= M_A.
Scope: global
Specified as: a fullword integer; 1 <= ja <= N_A and ja+numa-1 <= N_A.
desc_a | Name | Description | Limits | Scope |
---|---|---|---|---|
1 | DTYPE_A | Descriptor type | DTYPE_A=1 | Global |
2 | CTXT_A | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
3 | M_A | Number of rows in the global matrix | If (m = 0 and side = 'L') or (n = 0 and side = 'R'): M_A >= 0. Otherwise: M_A >= 1 | Global |
4 | N_A | Number of columns in the global matrix | If (m = 0 and side = 'L') or (n = 0 and side = 'R'): N_A >= 0. Otherwise: N_A >= 1 | Global |
5 | MB_A | Row block size | MB_A >= 1 | Global |
6 | NB_A | Column block size | NB_A >= 1 | Global |
7 | RSRC_A | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_A < p | Global |
8 | CSRC_A | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_A < q | Global |
9 | LLD_A | The leading dimension of the local array | LLD_A >= max(1,LOCp(M_A)) | Local |
Specified as: an array of (at least) length 9, containing fullword integers.
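The LLD_A limit depends on LOCp(M_A), the number of rows of the global matrix that land on a given process row. The sketch below shows one way to compute that count for a block-cyclic distribution; it follows the counting rule of the ScaLAPACK NUMROC tool function as far as I understand it, and the helper name `min_lld` is hypothetical.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Rows (or columns) of an n-long dimension, blocked by nb, owned by iproc."""
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from the source process
    nblocks = n // nb                              # number of whole blocks
    count = (nblocks // nprocs) * nb               # complete rounds of the cycle
    extra = nblocks % nprocs                       # whole blocks left over
    if mydist < extra:
        count += nb                                # one more whole block
    elif mydist == extra:
        count += n % nb                            # the trailing partial block
    return count


def min_lld(m, mb, myrow, rsrc, nprow):
    """Smallest leading dimension allowed by LLD >= max(1, LOCp(M))."""
    return max(1, numroc(m, mb, myrow, rsrc, nprow))
```

For M_A = 5, MB_A = 2, and two process rows, process row 0 holds three rows and process row 1 holds two.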
Scope: local
Specified as: an LLD_B by (at least) LOCq(N_B) array, containing numbers of the data type indicated in Table 49. Details about the block-cyclic data distribution of global matrix B are stored in desc_b.
Scope: global
Specified as: a fullword integer; 1 <= ib <= M_B and ib+m-1 <= M_B.
Scope: global
Specified as: a fullword integer; 1 <= jb <= N_B and jb+n-1 <= N_B.
desc_b | Name | Description | Limits | Scope |
---|---|---|---|---|
1 | DTYPE_B | Descriptor type | DTYPE_B=1 | Global |
2 | CTXT_B | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
3 | M_B | Number of rows in the global matrix | If m = 0 or n = 0: M_B >= 0. Otherwise: M_B >= 1 | Global |
4 | N_B | Number of columns in the global matrix | If m = 0 or n = 0: N_B >= 0. Otherwise: N_B >= 1 | Global |
5 | MB_B | Row block size | MB_B >= 1 | Global |
6 | NB_B | Column block size | NB_B >= 1 | Global |
7 | RSRC_B | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_B < p | Global |
8 | CSRC_B | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_B < q | Global |
9 | LLD_B | The leading dimension of the local array | LLD_B >= max(1,LOCp(M_B)) | Local |
Specified as: an array of (at least) length 9, containing fullword integers.
Scope: local
Returned as: an LLD_B by (at least) LOCq(N_B) array, containing numbers of the data type indicated in Table 49.
then:
None
Unable to allocate work space
If A is not contained within a single block, that is:
and:
If (m <> 0 or side <> 'L') and (n <> 0 or side <> 'R'):
where numa = m if side = 'L' and numa = n if side = 'R'.
If m <> 0 and n <> 0:
If A is not contained in a single block:
If side = 'L':
If side = 'R':
This example computes $B \leftarrow \alpha A B$ using a 2 × 2 process grid.
ORDER = 'R'
NPROW = 2
NPCOL = 2
CALL BLACS_GET(0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)

             SIDE  UPLO  TRANSA  DIAG   M   N   ALPHA   A   IA  JA  DESC_A
              |     |      |      |     |   |     |     |   |   |      |
CALL PDTRMM( 'L' , 'U' ,  'N' ,  'N' ,  5 , 3 , 1.0D0 , A , 1 , 1 , DESC_A ,

             B   IB  JB  DESC_B
             |   |   |     |
             B , 1 , 1 , DESC_B )
| Desc_A | Desc_B |
---|---|---|
DTYPE_ | 1 | 1 |
CTXT_ | icontxt¹ | icontxt¹ |
M_ | 5 | 5 |
N_ | 5 | 3 |
MB_ | 2 | 2 |
NB_ | 2 | 2 |
RSRC_ | 0 | 0 |
CSRC_ | 0 | 0 |
LLD_ | See below² | See below² |
Global triangular matrix A of order 5 is upper triangular with block size 2 × 2:
B,D        0             1          2
    *                                   *
0   |   3.0  -1.0 |   2.0   2.0 |   1.0 |
    |    .   -2.0 |   4.0  -1.0 |   3.0 |
    | ----------- | ----------- | ----- |
1   |    .     .  |  -3.0   0.0 |   2.0 |
    |    .     .  |    .    4.0 |  -2.0 |
    | ----------- | ----------- | ----- |
2   |    .     .  |    .     .  |   1.0 |
    *                                   *
The following is the 2 × 2 process grid:
B,D | 0 2 | 1 |
---|---|---|
0 2 | P00 | P01 |
1 | P10 | P11 |
Local arrays for A:
p,q  |         0         |      1
-----|-------------------|-------------
     |   3.0  -1.0   1.0 |   2.0   2.0
0    |    .   -2.0   3.0 |   4.0  -1.0
     |    .     .    1.0 |    .     .
-----|-------------------|-------------
1    |    .     .    2.0 |  -3.0   0.0
     |    .     .   -2.0 |    .    4.0
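The local arrays above follow directly from the block-cyclic mapping: block row i goes to process row mod(i, p) and block column j goes to process column mod(j, q) when RSRC_A = CSRC_A = 0. The sketch below reproduces them for this example; `local_piece` and `A_GLOBAL` are hypothetical names, and `None` stands for the unreferenced entries shown as dots.

```python
def local_piece(g, mb, nb, myrow, mycol, nprow, npcol):
    """Local array of process (myrow, mycol), assuming RSRC = CSRC = 0."""
    rows = [i for i in range(len(g)) if (i // mb) % nprow == myrow]
    cols = [j for j in range(len(g[0])) if (j // nb) % npcol == mycol]
    return [[g[i][j] for j in cols] for i in rows]


_ = None  # entry not referenced (strictly lower triangle)
A_GLOBAL = [
    [3.0, -1.0, 2.0, 2.0, 1.0],
    [_, -2.0, 4.0, -1.0, 3.0],
    [_, _, -3.0, 0.0, 2.0],
    [_, _, _, 4.0, -2.0],
    [_, _, _, _, 1.0],
]

# Print the local array held by each process of the 2 x 2 grid.
for p in range(2):
    for q in range(2):
        print((p, q), local_piece(A_GLOBAL, 2, 2, p, q, 2, 2))
```

Process P00 ends up with global rows 1, 2, 5 and global columns 1, 2, 5, which is why its local array is 3 by 3.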
Global rectangular 5 × 3 matrix B with block size 2 × 2:
B,D        0          1
    *                     *
0   |   2.0   3.0 |   1.0 |
    |   5.0   5.0 |   4.0 |
    | ----------- | ----- |
1   |   0.0   1.0 |   2.0 |
    |   3.0   1.0 |  -3.0 |
    | ----------- | ----- |
2   |  -1.0   2.0 |   1.0 |
    *                     *
The following is the 2 × 2 process grid:
B,D | 0 | 1 |
---|---|---|
0 2 | P00 | P01 |
1 | P10 | P11 |
Local arrays for B:
p,q  |      0      |   1
-----|-------------|-------
     |   2.0   3.0 |   1.0
0    |   5.0   5.0 |   4.0
     |  -1.0   2.0 |   1.0
-----|-------------|-------
1    |   0.0   1.0 |   2.0
     |   3.0   1.0 |  -3.0
Output:
Global rectangular 5 × 3 matrix B with block size 2 × 2:
B,D         0            1
    *                        *
0   |    6.0   10.0 |   -2.0 |
    |  -16.0   -1.0 |    6.0 |
    | ------------- | ------ |
1   |   -2.0    1.0 |   -4.0 |
    |   14.0    0.0 |  -14.0 |
    | ------------- | ------ |
2   |   -1.0    2.0 |    1.0 |
    *                        *
The following is the 2 × 2 process grid:
B,D | 0 | 1 |
---|---|---|
0 2 | P00 | P01 |
1 | P10 | P11 |
Local arrays for B:
p,q  |       0       |    1
-----|---------------|--------
     |    6.0   10.0 |   -2.0
0    |  -16.0   -1.0 |    6.0
     |   -1.0    2.0 |    1.0
-----|---------------|--------
1    |   -2.0    1.0 |   -4.0
     |   14.0    0.0 |  -14.0
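The output of this example can be checked with a serial sketch that overwrites B in place, as the subroutine's B <-- form suggests. This is only an illustration on the global matrices, assuming side = 'L', uplo = 'U', transa = 'N', and diag = 'N'; the function name `upper_trmm_inplace` is hypothetical.

```python
def upper_trmm_inplace(alpha, a, b):
    """Overwrite b with alpha * a * b, reading only the upper triangle of a."""
    m, n = len(b), len(b[0])
    # Row i of the result needs only rows i..m-1 of the old b, so sweeping
    # from the top row down never reads an entry that was already overwritten.
    for i in range(m):
        for j in range(n):
            b[i][j] = alpha * sum(a[i][k] * b[k][j] for k in range(i, m))
    return b


_ = None  # strictly lower triangle of A is never referenced
A = [
    [3.0, -1.0, 2.0, 2.0, 1.0],
    [_, -2.0, 4.0, -1.0, 3.0],
    [_, _, -3.0, 0.0, 2.0],
    [_, _, _, 4.0, -2.0],
    [_, _, _, _, 1.0],
]
B = [[2.0, 3.0, 1.0], [5.0, 5.0, 4.0], [0.0, 1.0, 2.0],
     [3.0, 1.0, -3.0], [-1.0, 2.0, 1.0]]

RESULT = upper_trmm_inplace(1.0, A, B)
```

The rows of `RESULT` match the global output matrix B shown above.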
This subroutine performs one of the following solves for a triangular system of equations with multiple right-hand sides:
Solution | Equation |
---|---|
1. $B \leftarrow \alpha A^{-1} B$ | $AX = \alpha B$ |
2. $B \leftarrow \alpha A^{-T} B$ | $A^{T}X = \alpha B$ |
3. $B \leftarrow \alpha B A^{-1}$ | $XA = \alpha B$ |
4. $B \leftarrow \alpha B A^{-T}$ | $XA^{T} = \alpha B$ |
where, in the formulas above:
Notes:
If m = 0 or n = 0, no computation is performed, and the subroutine returns after doing some parameter checking.
alpha, A, B | Subprogram |
Long-precision real | PDTRSM |
Fortran | CALL PDTRSM (side, uplo, transa, diag, m, n, alpha, a, ia, ja, desc_a, b, ib, jb, desc_b) |
C and C++ | pdtrsm (side, uplo, transa, diag, m, n, alpha, a, ia, ja, desc_a, b, ib, jb, desc_b); |
If side = 'L', A is to the left of B, resulting in solution 1 or 2.
If side = 'R', A is to the right of B, resulting in solution 3 or 4.
Scope: global
Specified as: a single character; side = 'L' or 'R'.
If uplo = 'U', the upper triangular part is referenced.
If uplo = 'L', the lower triangular part is referenced.
Scope: global
Specified as: a single character; uplo = 'U' or 'L'.
If transa = 'N', A is used in the system of equations, resulting in solution 1 or 3.
If transa = 'T', $A^{T}$ is used in the system of equations, resulting in solution 2 or 4.
Scope: global
Specified as: a single character; transa = 'N' or 'T'.
If diag = 'U', A is a unit triangular matrix.
If diag = 'N', A is not a unit triangular matrix.
Scope: global
Specified as: a single character; diag = 'U' or 'N'.
If side = 'L', m is the number of rows and columns in submatrix A used in the computation.
Scope: global
Specified as: a fullword integer; m >= 0.
If side = 'R', n is the number of rows and columns in submatrix A used in the computation.
Scope: global
Specified as: a fullword integer; n >= 0.
Scope: global
Specified as: a number of the data type indicated in Table 50.
If side = 'L', numa = m
If side = 'R', numa = n
the leading LOCp(ia+numa-1) by LOCq(ja+numa-1) part of the local array A must contain the local pieces of the leading ia+numa-1 by ja+numa-1 part of the global matrix, and:
Note: | No data should be moved to form $A^{T}$; that is, the matrix A should always be stored in its untransposed form. |
Scope: local
Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 50. Details about the block-cyclic data distribution of global matrix A are stored in desc_a.
Scope: global
Specified as: a fullword integer; 1 <= ia <= M_A and ia+numa-1 <= M_A.
Scope: global
Specified as: a fullword integer; 1 <= ja <= N_A and ja+numa-1 <= N_A.
desc_a | Name | Description | Limits | Scope |
---|---|---|---|---|
1 | DTYPE_A | Descriptor type | DTYPE_A=1 | Global |
2 | CTXT_A | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
3 | M_A | Number of rows in the global matrix | If side = 'L' and m = 0: M_A >= 0. If side = 'R' and n = 0: M_A >= 0. Otherwise: M_A >= 1 | Global |
4 | N_A | Number of columns in the global matrix | If side = 'L' and m = 0: N_A >= 0. If side = 'R' and n = 0: N_A >= 0. Otherwise: N_A >= 1 | Global |
5 | MB_A | Row block size | MB_A >= 1 | Global |
6 | NB_A | Column block size | NB_A >= 1 | Global |
7 | RSRC_A | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_A < p | Global |
8 | CSRC_A | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_A < q | Global |
9 | LLD_A | The leading dimension of the local array | LLD_A >= max(1,LOCp(M_A)) | Local |
Specified as: an array of (at least) length 9, containing fullword integers.
Scope: local
Specified as: an LLD_B by (at least) LOCq(N_B) array, containing numbers of the data type indicated in Table 50. Details about the block-cyclic data distribution of global matrix B are stored in desc_b.
Scope: global
Specified as: a fullword integer; 1 <= ib <= M_B and ib+m-1 <= M_B.
Scope: global
Specified as: a fullword integer; 1 <= jb <= N_B and jb+n-1 <= N_B.
desc_b | Name | Description | Limits | Scope |
---|---|---|---|---|
1 | DTYPE_B | Descriptor type | DTYPE_B=1 | Global |
2 | CTXT_B | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
3 | M_B | Number of rows in the global matrix | If side = 'L' and m = 0: M_B >= 0. If side = 'R' and n = 0: M_B >= 0. Otherwise: M_B >= 1 | Global |
4 | N_B | Number of columns in the global matrix | If side = 'L' and m = 0: N_B >= 0. If side = 'R' and n = 0: N_B >= 0. Otherwise: N_B >= 1 | Global |
5 | MB_B | Row block size | MB_B >= 1 | Global |
6 | NB_B | Column block size | NB_B >= 1 | Global |
7 | RSRC_B | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_B < p | Global |
8 | CSRC_B | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_B < q | Global |
9 | LLD_B | The leading dimension of the local array | LLD_B >= max(1,LOCp(M_B)) | Local |
Specified as: an array of (at least) length 9, containing fullword integers.
Scope: local
Returned as: an LLD_B by (at least) LOCq(N_B) array, containing numbers of the data type indicated in Table 50.
then the global triangular matrix A must be distributed using a square block-cyclic distribution; that is, MB_A = NB_A.
then the global triangular matrix A must be aligned on a block boundary, that is:
None
Unable to allocate work space
If A is not contained within a single block, that is:
then:
If (m <> 0 or side <> 'L') and (n <> 0 or side <> 'R'):
where numa = m if side = 'L' and numa = n if side = 'R'.
If m <> 0 and n <> 0:
If A is not contained in a single block:
If side = 'L':
If side = 'R':
This example shows the solution $B \leftarrow \alpha A^{-1} B$ using a 2 × 2 process grid.
ORDER = 'R'
NPROW = 2
NPCOL = 2
CALL BLACS_GET (0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)

             SIDE  UPLO  TRANSA  DIAG   M   N   ALPHA   A   IA  JA  DESC_A
              |     |      |      |     |   |     |     |   |   |      |
CALL PDTRSM( 'L' , 'U' ,  'N' ,  'N' ,  5 , 3 , 1.0D0 , A , 1 , 1 , DESC_A ,

             B   IB  JB  DESC_B
             |   |   |     |
             B , 1 , 1 , DESC_B )
| Desc_A | Desc_B |
---|---|---|
DTYPE_ | 1 | 1 |
CTXT_ | icontxt¹ | icontxt¹ |
M_ | 5 | 5 |
N_ | 5 | 3 |
MB_ | 2 | 2 |
NB_ | 2 | 2 |
RSRC_ | 0 | 0 |
CSRC_ | 0 | 0 |
LLD_ | See below² | See below² |
Global triangular matrix A of order 5 is upper triangular with block size 2 × 2:
B,D        0             1          2
    *                                   *
0   |   3.0  -1.0 |   2.0   2.0 |   1.0 |
    |    .   -2.0 |   4.0  -1.0 |   3.0 |
    | ----------- | ----------- | ----- |
1   |    .     .  |  -3.0   0.0 |   2.0 |
    |    .     .  |    .    4.0 |  -2.0 |
    | ----------- | ----------- | ----- |
2   |    .     .  |    .     .  |   1.0 |
    *                                   *
The following is the 2 × 2 process grid:
B,D | 0 2 | 1 |
---|---|---|
0 2 | P00 | P01 |
1 | P10 | P11 |
Local arrays for A:
p,q  |         0         |      1
-----|-------------------|-------------
     |   3.0  -1.0   1.0 |   2.0   2.0
0    |    .   -2.0   3.0 |   4.0  -1.0
     |    .     .    1.0 |    .     .
-----|-------------------|-------------
1    |    .     .    2.0 |  -3.0   0.0
     |    .     .   -2.0 |    .    4.0
Global general 5 × 3 matrix B with block size 2 × 2:
B,D         0            1
    *                        *
0   |    6.0   10.0 |   -2.0 |
    |  -16.0   -1.0 |    6.0 |
    | ------------- | ------ |
1   |   -2.0    1.0 |   -4.0 |
    |   14.0    0.0 |  -14.0 |
    | ------------- | ------ |
2   |   -1.0    2.0 |    1.0 |
    *                        *
The following is the 2 × 2 process grid:
B,D | 0 | 1 |
---|---|---|
0 2 | P00 | P01 |
1 | P10 | P11 |
Local arrays for B:
p,q  |       0       |    1
-----|---------------|--------
     |    6.0   10.0 |   -2.0
0    |  -16.0   -1.0 |    6.0
     |   -1.0    2.0 |    1.0
-----|---------------|--------
1    |   -2.0    1.0 |   -4.0
     |   14.0    0.0 |  -14.0
Output:
Global general 5 × 3 matrix B with block size 2 × 2:
B,D        0          1
    *                     *
0   |   2.0   3.0 |   1.0 |
    |   5.0   5.0 |   4.0 |
    | ----------- | ----- |
1   |   0.0   1.0 |   2.0 |
    |   3.0   1.0 |  -3.0 |
    | ----------- | ----- |
2   |  -1.0   2.0 |   1.0 |
    *                     *
The following is the 2 × 2 process grid:
B,D | 0 | 1 |
---|---|---|
0 2 | P00 | P01 |
1 | P10 | P11 |
Local arrays for B:
p,q  |      0      |   1
-----|-------------|-------
     |   2.0   3.0 |   1.0
0    |   5.0   5.0 |   4.0
     |  -1.0   2.0 |   1.0
-----|-------------|-------
1    |   0.0   1.0 |   2.0
     |   3.0   1.0 |  -3.0
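This example is the inverse of the PDTRMM example: solving AX = alpha B with the PDTRMM output as the right-hand side recovers the original B. A serial back-substitution sketch on the global matrices shows this, assuming side = 'L', uplo = 'U', transa = 'N', and diag = 'N'; the function name `upper_trsm_inplace` is hypothetical.

```python
def upper_trsm_inplace(alpha, a, b):
    """Overwrite b with the solution X of a * X = alpha * b (a upper triangular)."""
    m, n = len(b), len(b[0])
    # Back substitution: the last row is solved first, and rows below row i
    # already hold solution values when row i is processed.
    for i in range(m - 1, -1, -1):
        for j in range(n):
            s = alpha * b[i][j] - sum(a[i][k] * b[k][j] for k in range(i + 1, m))
            b[i][j] = s / a[i][i]
    return b


_ = None  # strictly lower triangle of A is never referenced
A = [
    [3.0, -1.0, 2.0, 2.0, 1.0],
    [_, -2.0, 4.0, -1.0, 3.0],
    [_, _, -3.0, 0.0, 2.0],
    [_, _, _, 4.0, -2.0],
    [_, _, _, _, 1.0],
]
B = [[6.0, 10.0, -2.0], [-16.0, -1.0, 6.0], [-2.0, 1.0, -4.0],
     [14.0, 0.0, -14.0], [-1.0, 2.0, 1.0]]

X = upper_trsm_inplace(1.0, A, B)
```

The rows of `X` match the global output matrix B shown above.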
This subroutine computes one of the following rank-k updates:
1. $C \leftarrow \alpha A A^{T} + \beta C$
2. $C \leftarrow \alpha A^{T} A + \beta C$
where, in the formulas above:
Note: | No data should be moved to form the matrix transpose; that is, the matrix should always be stored in its untransposed form. |
In the following two cases, no computation is performed and the subroutine returns after doing some parameter checking:
alpha, beta, A, C | Subprogram |
Long-precision real | PDSYRK |
Fortran | CALL PDSYRK (uplo, trans, n, k, alpha, a, ia, ja, desc_a, beta, c, ic, jc, desc_c) |
C and C++ | pdsyrk (uplo, trans, n, k, alpha, a, ia, ja, desc_a, beta, c, ic, jc, desc_c); |
If uplo = 'U', the upper triangular part is referenced.
If uplo = 'L', the lower triangular part is referenced.
Scope: global
Specified as: a single character; uplo = 'U' or 'L'.
If trans = 'N', the computation in equation 1 is performed.
If trans = 'T', the computation in equation 2 is performed.
Scope: global
Specified as: a single character; trans = 'N' or 'T'.
If trans = 'N', n is the number of rows in submatrix A used in the computation.
If trans = 'T', n is the number of columns in submatrix A used in the computation.
Scope: global
Specified as: a fullword integer; n >= 0.
If trans = 'N', k is the number of columns in submatrix A used in the computation.
If trans = 'T', k is the number of rows in submatrix A used in the computation.
Scope: global
Specified as: a fullword integer; k >= 0.
Scope: global
Specified as: a number of the data type indicated in Table 51.
Note: | No data should be moved to form $A^{T}$; that is, the matrix A should always be stored in its untransposed form. |
Scope: local
Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 51. Details about the block-cyclic data distribution of global matrix A are stored in desc_a.
Scope: global
Specified as: a fullword integer; 1 <= ia <= M_A, and:
If trans = 'N', then ia+n-1 <= M_A.
If trans = 'T', then ia+k-1 <= M_A.
Scope: global
Specified as: a fullword integer; 1 <= ja <= N_A, and:
If trans = 'N', then ja+k-1 <= N_A.
If trans = 'T', then ja+n-1 <= N_A.
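The ia and ja limits amount to requiring that the submatrix of A fit inside the global matrix, where the submatrix is n by k for trans = 'N' and k by n for trans = 'T'. A minimal sketch of that check follows; the function name `syrk_a_bounds_ok` is hypothetical, and it covers only the limits listed above, not the other argument checks.

```python
def syrk_a_bounds_ok(trans, n, k, ia, ja, m_a, n_a):
    """True when submatrix A lies inside the M_A by N_A global matrix."""
    # Shape of the submatrix actually read, depending on trans.
    rows, cols = (n, k) if trans == 'N' else (k, n)
    return (1 <= ia <= m_a and ia + rows - 1 <= m_a and
            1 <= ja <= n_a and ja + cols - 1 <= n_a)
```

For an 8 × 5 global matrix with n = 8 and k = 5, the check passes for trans = 'N' at ia = ja = 1 and fails for trans = 'T', because an 8-column submatrix does not fit in 5 columns.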
desc_a | Name | Description | Limits | Scope |
---|---|---|---|---|
1 | DTYPE_A | Descriptor type | DTYPE_A=1 | Global |
2 | CTXT_A | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
3 | M_A | Number of rows in the global matrix | If n = 0 or k = 0: M_A >= 0. Otherwise: M_A >= 1 | Global |
4 | N_A | Number of columns in the global matrix | If n = 0 or k = 0: N_A >= 0. Otherwise: N_A >= 1 | Global |
5 | MB_A | Row block size | MB_A >= 1 | Global |
6 | NB_A | Column block size | NB_A >= 1 | Global |
7 | RSRC_A | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_A < p | Global |
8 | CSRC_A | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_A < q | Global |
9 | LLD_A | The leading dimension of the local array | LLD_A >= max(1,LOCp(M_A)) | Local |
Specified as: an array of (at least) length 9, containing fullword integers.
Scope: global
Specified as: a number of the data type indicated in Table 51.
When beta is zero, C need not be set on input.
Scope: local
Specified as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 51. Details about the block-cyclic data distribution of global matrix C are stored in desc_c.
Scope: global
Specified as: a fullword integer; 1 <= ic <= M_C and ic+n-1 <= M_C.
Scope: global
Specified as: a fullword integer; 1 <= jc <= N_C and jc+n-1 <= N_C.
desc_c | Name | Description | Limits | Scope |
---|---|---|---|---|
1 | DTYPE_C | Descriptor type | DTYPE_C=1 | Global |
2 | CTXT_C | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
3 | M_C | Number of rows in the global matrix | If n = 0: M_C >= 0. Otherwise: M_C >= 1 | Global |
4 | N_C | Number of columns in the global matrix | If n = 0: N_C >= 0. Otherwise: N_C >= 1 | Global |
5 | MB_C | Row block size | MB_C >= 1 | Global |
6 | NB_C | Column block size | NB_C >= 1 | Global |
7 | RSRC_C | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_C < p | Global |
8 | CSRC_C | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_C < q | Global |
9 | LLD_C | The leading dimension of the local array | LLD_C >= max(1,LOCp(M_C)) | Local |
Specified as: an array of (at least) length 9, containing fullword integers.
Scope: local
Returned as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 51.
then:
None
Unable to allocate work space
If n <> 0 and k <> 0:
If n <> 0:
and NB_C <> MB_C.
If C is not contained within a single block:
If C is contained within a single block:
This example computes $C \leftarrow \alpha A A^{T} + \beta C$ using a 2 × 3 process grid.
ORDER = 'R'
NPROW = 2
NPCOL = 3
CALL BLACS_GET (0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)

             UPLO  TRANS  N   K   ALPHA   A   IA  JA  DESC_A   BETA
              |      |    |   |     |     |   |   |     |        |
CALL PDSYRK( 'L' ,  'N' , 8 , 5 , 1.0D0 , A , 1 , 1 , DESC_A , 1.0D0 ,

             C   IC  JC  DESC_C
             |   |   |     |
             C , 1 , 1 , DESC_C )
| Desc_A | Desc_C |
---|---|---|
DTYPE_ | 1 | 1 |
CTXT_ | icontxt¹ | icontxt¹ |
M_ | 8 | 8 |
N_ | 5 | 8 |
MB_ | 2 | 2 |
NB_ | 2 | 2 |
RSRC_ | 0 | 0 |
CSRC_ | 0 | 0 |
LLD_ | See below² | See below² |
Global general 8 × 5 matrix A with block size 2 × 2:
B,D        0             1          2
    *                                   *
0   |   0.0   8.0 |  16.0  24.0 |  32.0 |
    |   1.0   9.0 |  17.0  25.0 |  33.0 |
    | ----------- | ----------- | ----- |
1   |   2.0  10.0 |  18.0  26.0 |  34.0 |
    |   3.0  11.0 |  19.0  27.0 |  35.0 |
    | ----------- | ----------- | ----- |
2   |   4.0  12.0 |  20.0  28.0 |  36.0 |
    |   5.0  13.0 |  21.0  29.0 |  37.0 |
    | ----------- | ----------- | ----- |
3   |   6.0  14.0 |  22.0  30.0 |  38.0 |
    |   7.0  15.0 |  23.0  31.0 |  39.0 |
    *                                   *
The following is the 2 × 3 process grid:
B,D | 0 | 1 | 2 |
---|---|---|---|
0 2 | P00 | P01 | P02 |
1 3 | P10 | P11 | P12 |
Local arrays for A:
p,q  |      0      |      1      |   2
-----|-------------|-------------|-------
     |   0.0   8.0 |  16.0  24.0 |  32.0
     |   1.0   9.0 |  17.0  25.0 |  33.0
0    |   4.0  12.0 |  20.0  28.0 |  36.0
     |   5.0  13.0 |  21.0  29.0 |  37.0
-----|-------------|-------------|-------
     |   2.0  10.0 |  18.0  26.0 |  34.0
     |   3.0  11.0 |  19.0  27.0 |  35.0
1    |   6.0  14.0 |  22.0  30.0 |  38.0
     |   7.0  15.0 |  23.0  31.0 |  39.0
Global symmetric matrix C of order 8 with block size 2 × 2:
B,D        0             1             2             3
    *                                                       *
0   |   0.0    .  |    .     .  |    .     .  |    .     .  |
    |   1.0   8.0 |    .     .  |    .     .  |    .     .  |
    | ----------- | ----------- | ----------- | ----------- |
1   |   2.0   9.0 |  15.0    .  |    .     .  |    .     .  |
    |   3.0  10.0 |  16.0  21.0 |    .     .  |    .     .  |
    | ----------- | ----------- | ----------- | ----------- |
2   |   4.0  11.0 |  17.0  22.0 |  26.0    .  |    .     .  |
    |   5.0  12.0 |  18.0  23.0 |  27.0  30.0 |    .     .  |
    | ----------- | ----------- | ----------- | ----------- |
3   |   6.0  13.0 |  19.0  24.0 |  28.0  31.0 |  33.0    .  |
    |   7.0  14.0 |  20.0  25.0 |  29.0  32.0 |  34.0  35.0 |
    *                                                       *
The following is the 2 × 3 process grid:
B,D | 0 3 | 1 | 2 |
---|---|---|---|
0 2 | P00 | P01 | P02 |
1 3 | P10 | P11 | P12 |
Local arrays for C:
p,q  |            0            |      1      |      2
-----|-------------------------|-------------|-------------
     |   0.0    .     .     .  |    .     .  |    .     .
     |   1.0   8.0    .     .  |    .     .  |    .     .
0    |   4.0  11.0    .     .  |  17.0  22.0 |  26.0    .
     |   5.0  12.0    .     .  |  18.0  23.0 |  27.0  30.0
-----|-------------------------|-------------|-------------
     |   2.0   9.0    .     .  |  15.0    .  |    .     .
     |   3.0  10.0    .     .  |  16.0  21.0 |    .     .
1    |   6.0  13.0  33.0    .  |  19.0  24.0 |  28.0  31.0
     |   7.0  14.0  34.0  35.0 |  20.0  25.0 |  29.0  32.0
Output:
Global symmetric matrix C of order 8 with block size 2 × 2:
B,D          0                 1                 2                 3
    *                                                                       *
0   |  1920.0      .  |      .       .  |      .       .  |      .       .  |
    |  2001.0  2093.0 |      .       .  |      .       .  |      .       .  |
    | --------------- | --------------- | --------------- | --------------- |
1   |  2082.0  2179.0 |  2275.0      .  |      .       .  |      .       .  |
    |  2163.0  2265.0 |  2366.0  2466.0 |      .       .  |      .       .  |
    | --------------- | --------------- | --------------- | --------------- |
2   |  2244.0  2351.0 |  2457.0  2562.0 |  2666.0      .  |      .       .  |
    |  2325.0  2437.0 |  2548.0  2658.0 |  2767.0  2875.0 |      .       .  |
    | --------------- | --------------- | --------------- | --------------- |
3   |  2406.0  2523.0 |  2639.0  2754.0 |  2868.0  2981.0 |  3093.0      .  |
    |  2487.0  2609.0 |  2730.0  2850.0 |  2969.0  3087.0 |  3204.0  3320.0 |
    *                                                                       *
The following is the 2 × 3 process grid:
B,D | 0 3 | 1 | 2 |
---|---|---|---|
0 2 | P00 | P01 | P02 |
1 3 | P10 | P11 | P12 |
Local arrays for C:
p,q  |                0                |        1        |        2
-----|---------------------------------|-----------------|-----------------
     |  1920.0      .       .       .  |      .       .  |      .       .
     |  2001.0  2093.0      .       .  |      .       .  |      .       .
0    |  2244.0  2351.0      .       .  |  2457.0  2562.0 |  2666.0      .
     |  2325.0  2437.0      .       .  |  2548.0  2658.0 |  2767.0  2875.0
-----|---------------------------------|-----------------|-----------------
     |  2082.0  2179.0      .       .  |  2275.0      .  |      .       .
     |  2163.0  2265.0      .       .  |  2366.0  2466.0 |      .       .
1    |  2406.0  2523.0  3093.0      .  |  2639.0  2754.0 |  2868.0  2981.0
     |  2487.0  2609.0  3204.0  3320.0 |  2730.0  2850.0 |  2969.0  3087.0
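The output values can be reproduced with a serial sketch of the rank-k update that touches only the lower triangle of C, as uplo = 'L' specifies. The input data is generated from the patterns visible in the example (A(i,j) counts down the columns, and the lower triangle of C is numbered column by column); the function name `syrk_lower` is hypothetical, and `None` marks entries that are not referenced.

```python
def syrk_lower(alpha, a, beta, c):
    """Overwrite the lower triangle of c with alpha * a * a^T + beta * c."""
    n, k = len(a), len(a[0])
    for i in range(n):
        for j in range(i + 1):  # lower triangle, diagonal included
            dot = sum(a[i][p] * a[j][p] for p in range(k))
            c[i][j] = alpha * dot + beta * c[i][j]
    return c


# Global 8 x 5 matrix A of the example: columns hold 0..7, 8..15, and so on.
A = [[float(i + 8 * j) for j in range(5)] for i in range(8)]

# Lower triangle of the input C, numbered 0.0, 1.0, ... down each column.
C = [[None] * 8 for _ in range(8)]
value = 0.0
for j in range(8):
    for i in range(j, 8):
        C[i][j] = value
        value += 1.0

C_OUT = syrk_lower(1.0, A, 1.0, C)
```

For instance, `C_OUT[0][0]` is 1920.0 and `C_OUT[7][7]` is 3320.0, matching the first and last diagonal entries of the output matrix, while the strictly upper triangle is left untouched.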
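This subroutine computes one of the following rank-2k updates:
1. $C \leftarrow \alpha A B^{T} + \alpha B A^{T} + \beta C$
2. $C \leftarrow \alpha A^{T} B + \alpha B^{T} A + \beta C$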
where, in the formulas above:
Note: | No data should be moved to form the matrix transposes; that is, the matrices should always be stored in their untransposed forms. |
In the following two cases, no computation is performed and the subroutine returns after doing some parameter checking:
alpha, beta, A, B, C | Subprogram |
Long-precision real | PDSYR2K |
Fortran | CALL PDSYR2K (uplo, trans, n, k, alpha, a, ia, ja, desc_a, b, ib, jb, desc_b, beta, c, ic, jc, desc_c) |
C and C++ | pdsyr2k (uplo, trans, n, k, alpha, a, ia, ja, desc_a, b, ib, jb, desc_b, beta, c, ic, jc, desc_c); |
If uplo = 'U', the upper triangular part is referenced.
If uplo = 'L', the lower triangular part is referenced.
Scope: global
Specified as: a single character; uplo = 'U' or 'L'.
If trans = 'N', the computation in equation 1 is performed.
If trans = 'T', the computation in equation 2 is performed.
Scope: global
Specified as: a single character; trans = 'N' or 'T'.
If trans = 'N', n is the number of rows in submatrices A and B used in the computation.
If trans = 'T', n is the number of columns in submatrices A and B used in the computation.
Scope: global
Specified as: a fullword integer; n >= 0.
If trans = 'N', k is the number of columns in submatrices A and B used in the computation.
If trans = 'T', k is the number of rows in submatrices A and B used in the computation.
Scope: global
Specified as: a fullword integer; k >= 0.
Scope: global
Specified as: a number of the data type indicated in Table 52.
Note: | No data should be moved to form $A^{T}$; that is, the matrix A should always be stored in its untransposed form. |
Scope: local
Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 52. Details about the block-cyclic data distribution of global matrix A are stored in desc_a.
Scope: global
Specified as: a fullword integer; 1 <= ia <= M_A, and:
If trans = 'N', then ia+n-1 <= M_A.
If trans = 'T', then ia+k-1 <= M_A.
Scope: global
Specified as: a fullword integer; 1 <= ja <= N_A, and:
If trans = 'N', then ja+k-1 <= N_A.
If trans = 'T', then ja+n-1 <= N_A.
desc_a | Name | Description | Limits | Scope |
---|---|---|---|---|
1 | DTYPE_A | Descriptor type | DTYPE_A=1 | Global |
2 | CTXT_A | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
3 | M_A | Number of rows in the global matrix | If n = 0 or k = 0: M_A >= 0. Otherwise: M_A >= 1 | Global |
4 | N_A | Number of columns in the global matrix | If n = 0 or k = 0: N_A >= 0. Otherwise: N_A >= 1 | Global |
5 | MB_A | Row block size | MB_A >= 1 | Global |
6 | NB_A | Column block size | NB_A >= 1 | Global |
7 | RSRC_A | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_A < p | Global |
8 | CSRC_A | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_A < q | Global |
9 | LLD_A | The leading dimension of the local array | LLD_A >= max(1,LOCp(M_A)) | Local |
Specified as: an array of (at least) length 9, containing fullword integers.
Note: | No data should be moved to form $B^{T}$; that is, the matrix B should always be stored in its untransposed form. |
Scope: local
Specified as: an LLD_B by (at least) LOCq(N_B) array, containing numbers of the data type indicated in Table 52. Details about the block-cyclic data distribution of global matrix B are stored in desc_b.
Scope: global
Specified as: a fullword integer; 1 <= ib <= M_B, and:
If trans = 'N', then ib+n-1 <= M_B.
If trans = 'T', then ib+k-1 <= M_B.
Scope: global
Specified as: a fullword integer; 1 <= jb <= N_B, and:
If trans = 'N', then jb+k-1 <= N_B.
If trans = 'T', then jb+n-1 <= N_B.
desc_b | Name | Description | Limits | Scope |
---|---|---|---|---|
1 | DTYPE_B | Descriptor type | DTYPE_B=1 | Global |
2 | CTXT_B | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
3 | M_B | Number of rows in the global matrix | If n = 0 or k = 0: M_B >= 0. Otherwise: M_B >= 1 | Global |
4 | N_B | Number of columns in the global matrix | If n = 0 or k = 0: N_B >= 0. Otherwise: N_B >= 1 | Global |
5 | MB_B | Row block size | MB_B >= 1 | Global |
6 | NB_B | Column block size | NB_B >= 1 | Global |
7 | RSRC_B | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_B < p | Global |
8 | CSRC_B | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_B < q | Global |
9 | LLD_B | The leading dimension of the local array | LLD_B >= max(1,LOCp(M_B)) | Local |
Specified as: an array of (at least) length 9, containing fullword integers.
Scope: global
Specified as: a number of the data type indicated in Table 52.
When beta is zero, C need not be set on input.
Scope: local
Specified as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 52. Details about the block-cyclic data distribution of global matrix C are stored in desc_c.
Scope: global
Specified as: a fullword integer; 1 <= ic <= M_C and ic+n-1 <= M_C.
Scope: global
Specified as: a fullword integer; 1 <= jc <= N_C and jc+n-1 <= N_C.
desc_c | Name | Description | Limits | Scope |
---|---|---|---|---|
1 | DTYPE_C | Descriptor type | DTYPE_C=1 | Global |
2 | CTXT_C | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
3 | M_C | Number of rows in the global matrix | If n = 0: M_C >= 0. Otherwise: M_C >= 1 | Global |
4 | N_C | Number of columns in the global matrix | If n = 0: N_C >= 0. Otherwise: N_C >= 1 | Global |
5 | MB_C | Row block size | MB_C >= 1 | Global |
6 | NB_C | Column block size | NB_C >= 1 | Global |
7 | RSRC_C | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_C < p | Global |
8 | CSRC_C | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_C < q | Global |
9 | LLD_C | The leading dimension of the local array | LLD_C >= max(1,LOCp(M_C)) | Local |
Specified as: an array of (at least) length 9, containing fullword integers.
Scope: local
Returned as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 52.
then the block column offset of A must be equal to the block column offset of B; that is, mod(ja-1, NB_A) = mod(jb-1, NB_B).
then the block row offset of A must be equal to the block row offset of B; that is, mod(ia-1, MB_A) = mod(ib-1, MB_B)
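The two offset conditions above are simple modular comparisons. The sketch below evaluates both so that a caller can test whichever one applies; it does not decide which condition is required in a given case, since that depends on the surrounding rules. The function name `block_offsets_match` is hypothetical.

```python
def block_offsets_match(ia, ja, mb_a, nb_a, ib, jb, mb_b, nb_b):
    """Return (row_offsets_equal, column_offsets_equal) for submatrices A and B."""
    # Offsets are measured from the start of the block that contains the
    # first row or column of each submatrix (indices are 1-based).
    row_ok = (ia - 1) % mb_a == (ib - 1) % mb_b
    col_ok = (ja - 1) % nb_a == (jb - 1) % nb_b
    return row_ok, col_ok
```

With ia = ja = ib = jb = 1 both offsets are zero, so both comparisons hold regardless of the block sizes.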
then you must follow these rules:
or if all the following are true:
then you must follow these rules:
None
Unable to allocate work space
If n <> 0 and k <> 0:
If n <> 0:
If C is contained within a single block, that is:
and:
then:
and NB_A <> NB_B.
and MB_A <> MB_B.
If C is not contained within a single block, or if C is contained within a single block and:
then:
If trans = 'N':
If trans = 'T':
In all cases:
If trans = 'N':
If trans = 'T':
This example computes $C \leftarrow \alpha A^{T} B + \alpha B^{T} A + \beta C$ using a 2 × 2 process grid.
ORDER = 'R'
NPROW = 2
NPCOL = 2
CALL BLACS_GET (0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)

              UPLO  TRANS  N   K   ALPHA   A   IA  JA  DESC_A   B   IB  JB
               |      |    |   |     |     |   |   |     |      |   |   |
CALL PDSYR2K( 'U' ,  'T' , 9 , 8 , 1.0D0 , A , 1 , 1 , DESC_A , B , 1 , 1 ,

              DESC_B   BETA    C   IC  JC  DESC_C
                |        |     |   |   |     |
              DESC_B , 0.0D0 , C , 1 , 1 , DESC_C )
| Desc_A | Desc_B | Desc_C |
---|---|---|---|
DTYPE_ | 1 | 1 | 1 |
CTXT_ | icontxt¹ | icontxt¹ | icontxt¹ |
M_ | 8 | 8 | 9 |
N_ | 9 | 9 | 9 |
MB_ | 2 | 2 | 4 |
NB_ | 4 | 4 | 4 |
RSRC_ | 0 | 0 | 0 |
CSRC_ | 0 | 0 | 0 |
LLD_ | See below² | See below² | See below² |
Global general 8 × 9 matrix A with block size 2 × 4:
B,D              0                         1                2
    *                                                           *
 0  |   0.0  -1.0  -1.0   0.0 |   0.0   0.0   0.0   0.0 |   1.0 |
    |   0.0   1.0   0.0   1.0 |   0.0   1.0   0.0   1.0 |   1.0 |
    |-------------------------|-------------------------|-------|
 1  |   0.0   0.0  -1.0  -1.0 |   0.0   0.0   1.0   0.0 |   1.0 |
    |   0.0   1.0   0.0  -1.0 |   1.0   1.0   0.0   1.0 |   1.0 |
    |-------------------------|-------------------------|-------|
 2  |   1.0   0.0   0.0   0.0 |  -1.0   0.0   0.0   0.0 |   1.0 |
    |   1.0   0.0   0.0   0.0 |   1.0   1.0   0.0   0.0 |   1.0 |
    |-------------------------|-------------------------|-------|
 3  |   0.0   0.0  -1.0   0.0 |  -1.0   0.0   0.0   0.0 |   1.0 |
    |  -1.0   0.0   0.0   0.0 |   0.0   0.0  -1.0   0.0 |   1.0 |
    *                                                           *
The following is the 2 × 2 process grid:
B,D | 0 2 | 1 |
---|---|---|
0 2 | P00 | P01 |
1 3 | P10 | P11 |
Local arrays for A:
p,q  |               0               |            1
-----|-------------------------------|------------------------
     |   0.0  -1.0  -1.0   0.0   1.0 |   0.0   0.0   0.0   0.0
     |   0.0   1.0   0.0   1.0   1.0 |   0.0   1.0   0.0   1.0
 0   |   1.0   0.0   0.0   0.0   1.0 |  -1.0   0.0   0.0   0.0
     |   1.0   0.0   0.0   0.0   1.0 |   1.0   1.0   0.0   0.0
-----|-------------------------------|------------------------
     |   0.0   0.0  -1.0  -1.0   1.0 |   0.0   0.0   1.0   0.0
     |   0.0   1.0   0.0  -1.0   1.0 |   1.0   1.0   0.0   1.0
 1   |   0.0   0.0  -1.0   0.0   1.0 |  -1.0   0.0   0.0   0.0
     |  -1.0   0.0   0.0   0.0   1.0 |   0.0   0.0  -1.0   0.0
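The local arrays for A follow from the two-dimensional block-cyclic mapping of the global matrix onto the process grid. The sketch below is a serial Python model of that mapping, not a Parallel ESSL call. The function names `owned` and `local_array` are hypothetical, and RSRC_A = CSRC_A = 0 is assumed, as in this example.

```python
# Serial model of 2-D block-cyclic distribution (illustrative only).
# Global 8 x 9 matrix A from this example: block size 2 x 4, 2 x 2 grid.
A = [
    [ 0.0, -1.0, -1.0,  0.0,  0.0,  0.0,  0.0,  0.0, 1.0],
    [ 0.0,  1.0,  0.0,  1.0,  0.0,  1.0,  0.0,  1.0, 1.0],
    [ 0.0,  0.0, -1.0, -1.0,  0.0,  0.0,  1.0,  0.0, 1.0],
    [ 0.0,  1.0,  0.0, -1.0,  1.0,  1.0,  0.0,  1.0, 1.0],
    [ 1.0,  0.0,  0.0,  0.0, -1.0,  0.0,  0.0,  0.0, 1.0],
    [ 1.0,  0.0,  0.0,  0.0,  1.0,  1.0,  0.0,  0.0, 1.0],
    [ 0.0,  0.0, -1.0,  0.0, -1.0,  0.0,  0.0,  0.0, 1.0],
    [-1.0,  0.0,  0.0,  0.0,  0.0,  0.0, -1.0,  0.0, 1.0],
]


def owned(n, nb, iproc, nprocs, src=0):
    """0-based global indices along one dimension that process iproc owns.

    Block number g // nb is dealt out cyclically, starting at process src.
    """
    return [g for g in range(n) if (g // nb + src) % nprocs == iproc]


def local_array(a, mb, nb, myrow, mycol, nprow, npcol):
    """Local piece of global matrix a held by process (myrow, mycol)."""
    rows = owned(len(a), mb, myrow, nprow)
    cols = owned(len(a[0]), nb, mycol, npcol)
    return [[a[i][j] for j in cols] for i in rows]


p00 = local_array(A, 2, 4, 0, 0, 2, 2)  # process P00
p11 = local_array(A, 2, 4, 1, 1, 2, 2)  # process P11
```

In this model, P00 holds block rows 0 and 2 and block columns 0 and 2 of A, which is why its local array is 4 × 5 while the local array on P11 is 4 × 4.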
Global general 8 × 9 matrix B with block size 2 × 4:
B,D              0                         1                2
    *                                                           *
 0  |   0.0   1.0   1.0   0.0 |   0.0   0.0   0.0   0.0 |  -1.0 |
    |   0.0  -1.0   0.0  -1.0 |   0.0  -1.0   0.0  -1.0 |  -1.0 |
    |-------------------------|-------------------------|-------|
 1  |   0.0   0.0   1.0   1.0 |   0.0   0.0  -1.0   0.0 |  -1.0 |
    |   0.0  -1.0   0.0   1.0 |  -1.0  -1.0   0.0  -1.0 |  -1.0 |
    |-------------------------|-------------------------|-------|
 2  |  -1.0   0.0   0.0   0.0 |   1.0   0.0   0.0   0.0 |  -1.0 |
    |  -1.0   0.0   0.0   0.0 |  -1.0  -1.0   0.0   0.0 |  -1.0 |
    |-------------------------|-------------------------|-------|
 3  |   0.0   0.0   1.0   0.0 |   1.0   0.0   0.0   0.0 |  -1.0 |
    |   1.0   0.0   0.0   0.0 |   0.0   0.0   1.0   0.0 |  -1.0 |
    *                                                           *
The following is the 2 × 2 process grid:
B,D | 0 2 | 1 |
---|---|---|
0 2 | P00 | P01 |
1 3 | P10 | P11 |
Local arrays for B:
p,q  |               0               |            1
-----|-------------------------------|------------------------
     |   0.0   1.0   1.0   0.0  -1.0 |   0.0   0.0   0.0   0.0
     |   0.0  -1.0   0.0  -1.0  -1.0 |   0.0  -1.0   0.0  -1.0
 0   |  -1.0   0.0   0.0   0.0  -1.0 |   1.0   0.0   0.0   0.0
     |  -1.0   0.0   0.0   0.0  -1.0 |  -1.0  -1.0   0.0   0.0
-----|-------------------------------|------------------------
     |   0.0   0.0   1.0   1.0  -1.0 |   0.0   0.0  -1.0   0.0
     |   0.0  -1.0   0.0   1.0  -1.0 |  -1.0  -1.0   0.0  -1.0
 1   |   0.0   0.0   1.0   0.0  -1.0 |   1.0   0.0   0.0   0.0
     |   1.0   0.0   0.0   0.0  -1.0 |   0.0   0.0   1.0   0.0
Output:
Global symmetric matrix C of order 9 with block size 4 × 4:
B,D              0                         1                2
    *                                                           *
    |  -6.0   0.0   0.0   0.0 |   0.0  -2.0  -2.0   0.0 |  -2.0 |
    |    .   -6.0  -2.0   0.0 |  -2.0  -4.0   0.0  -4.0 |  -2.0 |
 0  |    .     .   -6.0  -2.0 |  -2.0   0.0   2.0   0.0 |   6.0 |
    |    .     .     .   -6.0 |   2.0   0.0   2.0   0.0 |   2.0 |
    |-------------------------|-------------------------|-------|
    |    .     .     .     .  |  -8.0  -4.0   0.0  -2.0 |   0.0 |
    |    .     .     .     .  |    .   -6.0   0.0  -4.0 |  -6.0 |
 1  |    .     .     .     .  |    .     .   -4.0   0.0 |   0.0 |
    |    .     .     .     .  |    .     .     .   -4.0 |  -4.0 |
    |-------------------------|-------------------------|-------|
 2  |    .     .     .     .  |    .     .     .     .  | -16.0 |
    *                                                           *
The following is the 2 × 2 process grid:
B,D | 0 2 | 1 |
---|---|---|
0 2 | P00 | P01 |
1 | P10 | P11 |
Local arrays for C:
p,q  |               0               |            1
-----|-------------------------------|------------------------
     |  -6.0   0.0   0.0   0.0  -2.0 |   0.0  -2.0  -2.0   0.0
     |    .   -6.0  -2.0   0.0  -2.0 |  -2.0  -4.0   0.0  -4.0
 0   |    .     .   -6.0  -2.0   6.0 |  -2.0   0.0   2.0   0.0
     |    .     .     .   -6.0   2.0 |   2.0   0.0   2.0   0.0
     |    .     .     .     .  -16.0 |    .     .     .     .
-----|-------------------------------|------------------------
     |    .     .     .     .    0.0 |  -8.0  -4.0   0.0  -2.0
     |    .     .     .     .   -6.0 |    .   -6.0   0.0  -4.0
 1   |    .     .     .     .    0.0 |    .     .   -4.0   0.0
     |    .     .     .     .   -4.0 |    .     .     .   -4.0
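The global output matrix C of the PDSYR2K example can be cross-checked with a serial computation of alpha·AᵀB + alpha·BᵀA + beta·C. The sketch below is plain Python, not a call to PDSYR2K, and the function name `syr2k_t` is hypothetical. It uses the observation that every element of B in this example is the negative of the corresponding element of A. Unlike the subroutine, it fills the whole matrix rather than only the upper triangle selected by uplo = 'U'.

```python
# Serial cross-check of the PDSYR2K example (illustrative only).
A = [
    [ 0.0, -1.0, -1.0,  0.0,  0.0,  0.0,  0.0,  0.0, 1.0],
    [ 0.0,  1.0,  0.0,  1.0,  0.0,  1.0,  0.0,  1.0, 1.0],
    [ 0.0,  0.0, -1.0, -1.0,  0.0,  0.0,  1.0,  0.0, 1.0],
    [ 0.0,  1.0,  0.0, -1.0,  1.0,  1.0,  0.0,  1.0, 1.0],
    [ 1.0,  0.0,  0.0,  0.0, -1.0,  0.0,  0.0,  0.0, 1.0],
    [ 1.0,  0.0,  0.0,  0.0,  1.0,  1.0,  0.0,  0.0, 1.0],
    [ 0.0,  0.0, -1.0,  0.0, -1.0,  0.0,  0.0,  0.0, 1.0],
    [-1.0,  0.0,  0.0,  0.0,  0.0,  0.0, -1.0,  0.0, 1.0],
]
# In this example, B is the element-wise negative of A.
B = [[-x for x in row] for row in A]


def syr2k_t(alpha, a, b, beta, c):
    """Serial reference for C = alpha*A^T*B + alpha*B^T*A + beta*C.

    a and b are k x n (lists of rows); c is n x n.
    """
    k, n = len(a), len(a[0])
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = sum(a[l][i] * b[l][j] + b[l][i] * a[l][j] for l in range(k))
            out[i][j] = alpha * s + beta * c[i][j]
    return out


C_in = [[0.0] * 9 for _ in range(9)]      # beta = 0.0, so the input is irrelevant
C = syr2k_t(1.0, A, B, 0.0, C_in)          # n = 9, k = 8, as in the call above
```

The first row of `C` and the entry `C[8][8]` can then be compared with the first row and the last diagonal element of the global output matrix.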
This subroutine performs the following matrix computation:
where, in the formula above:
Note: | No data should be moved to form the matrix transpose; that is, the matrix should always be stored in its untransposed form. |
In the following three cases, no computation is performed and the subroutine returns after doing some parameter checking:
alpha, beta, A, C | Subprogram |
Long-precision real | PDTRAN |
Fortran | CALL PDTRAN (m, n, alpha, a, ia, ja, desc_a, beta, c, ic, jc, desc_c) |
C and C++ | pdtran (m, n, alpha, a, ia, ja, desc_a, beta, c, ic, jc, desc_c); |
Scope: global
Specified as: a fullword integer; m >= 0.
Scope: global
Specified as: a fullword integer; n >= 0.
Scope: global
Specified as: a number of the data type indicated in Table 53.
Note: | No data should be moved to form $A^{T}$; that is, the matrix A should always be stored in its untransposed form. |
Scope: local
Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 53. Details about the block-cyclic data distribution of global matrix A are stored in desc_a.
Scope: global
Specified as: a fullword integer; 1 <= ia <= M_A and ia+n-1 <= M_A.
Scope: global
Specified as: a fullword integer; 1 <= ja <= N_A and ja+m-1 <= N_A.
desc_a | Name | Description | Limits | Scope |
---|---|---|---|---|
1 | DTYPE_A | Descriptor type | DTYPE_A=1 | Global |
2 | CTXT_A | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
3 | M_A | Number of rows in the global matrix | If m = 0 or n = 0: M_A >= 0. Otherwise: M_A >= 1 | Global |
4 | N_A | Number of columns in the global matrix | If m = 0 or n = 0: N_A >= 0. Otherwise: N_A >= 1 | Global |
5 | MB_A | Row block size | MB_A >= 1 | Global |
6 | NB_A | Column block size | NB_A >= 1 | Global |
7 | RSRC_A | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_A < p | Global |
8 | CSRC_A | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_A < q | Global |
9 | LLD_A | The leading dimension of the local array | LLD_A >= max(1,LOCp(M_A)) | Local |
Specified as: an array of (at least) length 9, containing fullword integers.
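The limit LLD_A >= max(1,LOCp(M_A)) depends on how many rows of the global matrix fall on the calling process row. The sketch below is a serial Python model of that count, written on the assumption that LOCp(M_A) is the value the ScaLAPACK tool routine NUMROC would return for (M_A, MB_A, MYROW, RSRC_A, NPROW). The Python function `numroc` is an illustrative reimplementation, not the library routine.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Count the rows (or columns) of a dimension of size n, distributed
    block-cyclically with block size nb, that land on process iproc.

    isrcproc is the process that owns the first block (RSRC_ or CSRC_).
    """
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from source process
    nblocks = n // nb                              # number of full blocks
    count = (nblocks // nprocs) * nb               # full rounds of blocks
    extra = nblocks % nprocs                       # leftover full blocks
    if mydist < extra:
        count += nb                                # one more full block
    elif mydist == extra:
        count += n % nb                            # the trailing partial block
    return count


# Values from the examples in this chapter: a 2 x 2 grid with RSRC_ = 0.
loc_rows_a_p0 = numroc(8, 2, 0, 0, 2)   # M_A = 8, MB_A = 2, process row 0
loc_rows_c_p0 = numroc(9, 4, 0, 0, 2)   # M_C = 9, MB_C = 4, process row 0
loc_rows_c_p1 = numroc(9, 4, 1, 0, 2)   # M_C = 9, MB_C = 4, process row 1
lld_a = max(1, loc_rows_a_p0)           # smallest LLD_A satisfying the limit
```

The `max(1, ...)` keeps the leading dimension valid even on a process that owns no rows of the matrix.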
Scope: global
Specified as: a number of the data type indicated in Table 53.
When beta is zero, C need not be set on input.
Scope: local
Specified as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 53. Details about the block-cyclic data distribution of global matrix C are stored in desc_c.
Scope: global
Specified as: a fullword integer; 1 <= ic <= M_C and ic+m-1 <= M_C.
Scope: global
Specified as: a fullword integer; 1 <= jc <= N_C and jc+n-1 <= N_C.
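The limits on ia, ja, ic, and jc reflect the shapes of the two submatrices: the submatrix of C is m by n, so the submatrix of A that is transposed into it is n by m. The sketch below collects the limits stated in the argument descriptions into one serial Python check. The function name `pdtran_index_violations` is hypothetical, and the check applies the limits literally as written, without special handling of the m = 0 or n = 0 cases.

```python
def pdtran_index_violations(m, n, ia, ja, ic, jc, m_a, n_a, m_c, n_c):
    """Return the stated argument limits that the given values violate."""
    limits = [
        ("m >= 0", m >= 0),
        ("n >= 0", n >= 0),
        ("1 <= ia <= M_A", 1 <= ia <= m_a),
        ("ia+n-1 <= M_A", ia + n - 1 <= m_a),   # submatrix of A has n rows
        ("1 <= ja <= N_A", 1 <= ja <= n_a),
        ("ja+m-1 <= N_A", ja + m - 1 <= n_a),   # submatrix of A has m columns
        ("1 <= ic <= M_C", 1 <= ic <= m_c),
        ("ic+m-1 <= M_C", ic + m - 1 <= m_c),   # submatrix of C has m rows
        ("1 <= jc <= N_C", 1 <= jc <= n_c),
        ("jc+n-1 <= N_C", jc + n - 1 <= n_c),   # submatrix of C has n columns
    ]
    return [text for text, ok in limits if not ok]


# m = 9, n = 8 with an 8 x 9 global A and a 9 x 8 global C.
ok = pdtran_index_violations(9, 8, 1, 1, 1, 1, 8, 9, 9, 8)
# Shifting ia to 2 pushes the n-row submatrix of A past row M_A = 8.
bad = pdtran_index_violations(9, 8, 2, 1, 1, 1, 8, 9, 9, 8)
```

A check of this kind is only a convenience for the caller; it is not a description of how the subroutine itself validates its arguments.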
desc_c | Name | Description | Limits | Scope |
---|---|---|---|---|
1 | DTYPE_C | Descriptor type | DTYPE_C=1 | Global |
2 | CTXT_C | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
3 | M_C | Number of rows in the global matrix | If m = 0 or n = 0: M_C >= 0. Otherwise: M_C >= 1 | Global |
4 | N_C | Number of columns in the global matrix | If m = 0 or n = 0: N_C >= 0. Otherwise: N_C >= 1 | Global |
5 | MB_C | Row block size | MB_C >= 1 | Global |
6 | NB_C | Column block size | NB_C >= 1 | Global |
7 | RSRC_C | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_C < p | Global |
8 | CSRC_C | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_C < q | Global |
9 | LLD_C | The leading dimension of the local array | LLD_C >= max(1,LOCp(M_C)) | Local |
Specified as: an array of (at least) length 9, containing fullword integers.
Scope: local
Returned as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 53.
then adist = 'C'
then adist = 'R'
then:
then:
None
Unable to allocate work space
Note: | Some of the following error conditions depend on the value of adist, that is, on whether adist = 'C' or adist = 'R'. For details on determining the value, see "Notes and Coding Rules". |
If m <> 0 and n <> 0:
If adist = 'C':
then:
If adist = 'R':
then:
This example computes $C = \beta C + \alpha A^{T}$ using a 2 × 2 process grid.
ORDER = 'R'
NPROW = 2
NPCOL = 2
CALL BLACS_GET (0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)

             M   N   ALPHA   A   IA  JA  DESC_A   BETA     C   IC  JC  DESC_C
             |   |     |     |   |   |     |        |      |   |   |     |
CALL PDTRAN( 9 , 8 , 1.0D0 , A , 1 , 1 , DESC_A , 1.0D0 , C , 1 , 1 , DESC_C )
Descriptor element | Desc_A | Desc_C |
---|---|---|
DTYPE_ | 1 | 1 |
CTXT_ | icontxt¹ | icontxt¹ |
M_ | 8 | 9 |
N_ | 9 | 8 |
MB_ | 2 | 4 |
NB_ | 4 | 2 |
RSRC_ | 0 | 0 |
CSRC_ | 0 | 0 |
LLD_ | See below² | See below² |
Global general 8 × 9 matrix A with block size 2 × 4:
B,D              0                         1                2
    *                                                           *
 0  |   0.0  -1.0  -1.0   0.0 |   0.0   0.0   0.0   0.0 |   1.0 |
    |   0.0   1.0   0.0   1.0 |   0.0   1.0   0.0   1.0 |   1.0 |
    |-------------------------|-------------------------|-------|
 1  |   0.0   0.0  -1.0  -1.0 |   0.0   0.0   1.0   0.0 |   1.0 |
    |   0.0   1.0   0.0  -1.0 |   1.0   1.0   0.0   1.0 |   1.0 |
    |-------------------------|-------------------------|-------|
 2  |   1.0   0.0   0.0   0.0 |  -1.0   0.0   0.0   0.0 |   1.0 |
    |   1.0   0.0   0.0   0.0 |   1.0   1.0   0.0   0.0 |   1.0 |
    |-------------------------|-------------------------|-------|
 3  |   0.0   0.0  -1.0   0.0 |  -1.0   0.0   0.0   0.0 |   1.0 |
    |  -1.0   0.0   0.0   0.0 |   0.0   0.0  -1.0   0.0 |   1.0 |
    *                                                           *
The following is the 2 × 2 process grid:
B,D | 0 2 | 1 |
---|---|---|
0 2 | P00 | P01 |
1 3 | P10 | P11 |
Local arrays for A:
p,q  |               0               |            1
-----|-------------------------------|------------------------
     |   0.0  -1.0  -1.0   0.0   1.0 |   0.0   0.0   0.0   0.0
     |   0.0   1.0   0.0   1.0   1.0 |   0.0   1.0   0.0   1.0
 0   |   1.0   0.0   0.0   0.0   1.0 |  -1.0   0.0   0.0   0.0
     |   1.0   0.0   0.0   0.0   1.0 |   1.0   1.0   0.0   0.0
-----|-------------------------------|------------------------
     |   0.0   0.0  -1.0  -1.0   1.0 |   0.0   0.0   1.0   0.0
     |   0.0   1.0   0.0  -1.0   1.0 |   1.0   1.0   0.0   1.0
 1   |   0.0   0.0  -1.0   0.0   1.0 |  -1.0   0.0   0.0   0.0
     |  -1.0   0.0   0.0   0.0   1.0 |   0.0   0.0  -1.0   0.0
Global general 9 × 8 matrix C with block size 4 × 2:
B,D        0             1             2             3
    *                                                       *
    |   0.0   1.0 |   1.0   5.0 |   6.0   7.0 |   8.0   9.0 |
    |   0.0  -1.0 |   0.0  -1.0 |   0.0  -1.0 |   0.0   1.0 |
 0  |   0.0   0.0 |   1.0   1.0 |   0.0   0.0 |  -1.0   0.0 |
    |   0.0  -1.0 |   0.0   1.0 |  -1.0  -1.0 |   0.0   1.0 |
    |-------------|-------------|-------------|-------------|
    |  -1.0   2.0 |   0.0   0.0 |   1.0   0.0 |   0.0   0.0 |
    |  -1.0   3.0 |   0.0   0.0 |  -1.0  -1.0 |   0.0   0.0 |
 1  |   0.0   4.0 |   1.0   0.0 |   1.0   0.0 |   0.0   0.0 |
    |   1.0   5.0 |   0.0   0.0 |   0.0   0.0 |   1.0   0.0 |
    |-------------|-------------|-------------|-------------|
 2  |   1.0   2.0 |   3.0   4.0 |   1.0   1.0 |   1.0   1.0 |
    *                                                       *
The following is the 2 × 2 process grid:
B,D | 0 2 | 1 3 |
---|---|---|
0 2 | P00 | P01 |
1 | P10 | P11 |
Local arrays for C:
p,q  |            0            |            1
-----|-------------------------|------------------------
     |   0.0   1.0   6.0   7.0 |   1.0   5.0   8.0   9.0
     |   0.0  -1.0   0.0  -1.0 |   0.0  -1.0   0.0   1.0
 0   |   0.0   0.0   0.0   0.0 |   1.0   1.0  -1.0   0.0
     |   0.0  -1.0  -1.0  -1.0 |   0.0   1.0   0.0   1.0
     |   1.0   2.0   1.0   1.0 |   3.0   4.0   1.0   1.0
-----|-------------------------|------------------------
     |  -1.0   2.0   1.0   0.0 |   0.0   0.0   0.0   0.0
     |  -1.0   3.0  -1.0  -1.0 |   0.0   0.0   0.0   0.0
 1   |   0.0   4.0   1.0   0.0 |   1.0   0.0   0.0   0.0
     |   1.0   5.0   0.0   0.0 |   0.0   0.0   1.0   0.0
Output:
Global general 9 × 8 matrix C with block size 4 × 2:
B,D        0             1             2             3
    *                                                       *
    |   0.0   1.0 |   1.0   5.0 |   7.0   8.0 |   8.0   8.0 |
    |  -1.0   0.0 |   0.0   0.0 |   0.0  -1.0 |   0.0   1.0 |
 0  |  -1.0   0.0 |   0.0   1.0 |   0.0   0.0 |  -2.0   0.0 |
    |   0.0   0.0 |  -1.0   0.0 |  -1.0  -1.0 |   0.0   1.0 |
    |-------------|-------------|-------------|-------------|
    |  -1.0   2.0 |   0.0   1.0 |   0.0   1.0 |  -1.0   0.0 |
    |  -1.0   4.0 |   0.0   1.0 |  -1.0   0.0 |   0.0   0.0 |
 1  |   0.0   4.0 |   2.0   0.0 |   1.0   0.0 |   0.0  -1.0 |
    |   1.0   6.0 |   0.0   1.0 |   0.0   0.0 |   1.0   0.0 |
    |-------------|-------------|-------------|-------------|
 2  |   2.0   3.0 |   4.0   5.0 |   2.0   2.0 |   2.0   2.0 |
    *                                                       *
The following is the 2 × 2 process grid:
B,D | 0 2 | 1 3 |
---|---|---|
0 2 | P00 | P01 |
1 | P10 | P11 |
Local arrays for C:
p,q  |            0            |            1
-----|-------------------------|------------------------
     |   0.0   1.0   7.0   8.0 |   1.0   5.0   8.0   8.0
     |  -1.0   0.0   0.0  -1.0 |   0.0   0.0   0.0   1.0
 0   |  -1.0   0.0   0.0   0.0 |   0.0   1.0  -2.0   0.0
     |   0.0   0.0  -1.0  -1.0 |  -1.0   0.0   0.0   1.0
     |   2.0   3.0   2.0   2.0 |   4.0   5.0   2.0   2.0
-----|-------------------------|------------------------
     |  -1.0   2.0   0.0   1.0 |   0.0   1.0  -1.0   0.0
     |  -1.0   4.0  -1.0   0.0 |   0.0   1.0   0.0   0.0
 1   |   0.0   4.0   1.0   0.0 |   2.0   0.0   0.0  -1.0
     |   1.0   6.0   0.0   0.0 |   0.0   1.0   1.0   0.0
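The global output of the PDTRAN example can be cross-checked serially by forming beta·C + alpha·Aᵀ element by element. The sketch below is plain Python, not a call to PDTRAN, and the function name `tran_update` is hypothetical. It takes the global input matrices A and C from this example, with alpha = beta = 1.0 as in the call.

```python
# Serial cross-check of the PDTRAN example (illustrative only).
A = [  # global 8 x 9 input matrix A
    [ 0.0, -1.0, -1.0,  0.0,  0.0,  0.0,  0.0,  0.0, 1.0],
    [ 0.0,  1.0,  0.0,  1.0,  0.0,  1.0,  0.0,  1.0, 1.0],
    [ 0.0,  0.0, -1.0, -1.0,  0.0,  0.0,  1.0,  0.0, 1.0],
    [ 0.0,  1.0,  0.0, -1.0,  1.0,  1.0,  0.0,  1.0, 1.0],
    [ 1.0,  0.0,  0.0,  0.0, -1.0,  0.0,  0.0,  0.0, 1.0],
    [ 1.0,  0.0,  0.0,  0.0,  1.0,  1.0,  0.0,  0.0, 1.0],
    [ 0.0,  0.0, -1.0,  0.0, -1.0,  0.0,  0.0,  0.0, 1.0],
    [-1.0,  0.0,  0.0,  0.0,  0.0,  0.0, -1.0,  0.0, 1.0],
]
C_in = [  # global 9 x 8 input matrix C
    [ 0.0,  1.0, 1.0,  5.0,  6.0,  7.0,  8.0, 9.0],
    [ 0.0, -1.0, 0.0, -1.0,  0.0, -1.0,  0.0, 1.0],
    [ 0.0,  0.0, 1.0,  1.0,  0.0,  0.0, -1.0, 0.0],
    [ 0.0, -1.0, 0.0,  1.0, -1.0, -1.0,  0.0, 1.0],
    [-1.0,  2.0, 0.0,  0.0,  1.0,  0.0,  0.0, 0.0],
    [-1.0,  3.0, 0.0,  0.0, -1.0, -1.0,  0.0, 0.0],
    [ 0.0,  4.0, 1.0,  0.0,  1.0,  0.0,  0.0, 0.0],
    [ 1.0,  5.0, 0.0,  0.0,  0.0,  0.0,  1.0, 0.0],
    [ 1.0,  2.0, 3.0,  4.0,  1.0,  1.0,  1.0, 1.0],
]


def tran_update(alpha, a, beta, c):
    """Serial reference for C = beta*C + alpha*A^T (a is n x m, c is m x n)."""
    m, n = len(c), len(c[0])
    return [[beta * c[i][j] + alpha * a[j][i] for j in range(n)]
            for i in range(m)]


C_out = tran_update(1.0, A, 1.0, C_in)  # m = 9, n = 8
```

Because the last column of A is all ones, the last row of `C_out` is the last row of the input C with 1.0 added to every element, which gives a quick sanity check against the global output matrix.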