Guide and Reference


Level 3 PBLAS (Message Passing)

This chapter describes the Level 3 PBLAS subroutines.


Overview of the Level 3 PBLAS Subroutines

The Level 3 PBLAS include a subset of the standard set of distributed memory parallel versions of the Level 3 BLAS.
Note: These subroutines are designed in accordance with the proposed Level 3 PBLAS standard. (See references [14], [15], and [17].) If these subroutines do not comply with the standard as approved, IBM will consider updating them to do so. If IBM updates these subroutines, the update could require modifications of the calling application program.

Table 44. List of Level 3 PBLAS (Message Passing)

Descriptive Name                                                      Long-Precision Subprogram
Matrix-Matrix Product for a General Matrix, Its Transpose, or Its     PDGEMM
  Conjugate Transpose                                                 PZGEMM
Matrix-Matrix Product Where One Matrix is Real Symmetric              PDSYMM
Triangular Matrix-Matrix Product                                      PDTRMM
Solution of Triangular System of Equations with Multiple Right-Hand   PDTRSM
  Sides
Rank-K Update of a Real Symmetric Matrix                              PDSYRK
Rank-2K Update of a Real Symmetric Matrix                             PDSYR2K
Matrix Transpose for a General Matrix                                 PDTRAN

Level 3 PBLAS Subroutines

This section contains the Level 3 PBLAS subroutine descriptions.

PDGEMM and PZGEMM--Matrix-Matrix Product for a General Matrix, Its Transpose, or Its Conjugate Transpose

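PDGEMM performs any one of the following combined matrix computations:

$C \leftarrow \alpha AB + \beta C$
$C \leftarrow \alpha AB^T + \beta C$
$C \leftarrow \alpha A^T B + \beta C$
$C \leftarrow \alpha A^T B^T + \beta C$

PZGEMM performs any one of the following combined matrix computations:

$C \leftarrow \alpha AB + \beta C$
$C \leftarrow \alpha AB^T + \beta C$
$C \leftarrow \alpha A^T B + \beta C$
$C \leftarrow \alpha A^T B^T + \beta C$
$C \leftarrow \alpha A^H B + \beta C$
$C \leftarrow \alpha A^H B^T + \beta C$
$C \leftarrow \alpha AB^H + \beta C$
$C \leftarrow \alpha A^T B^H + \beta C$
$C \leftarrow \alpha A^H B^H + \beta C$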

where, in the PDGEMM and PZGEMM formulas above:

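A represents the global general submatrix $A_{ia:ia+m-1,\, ja:ja+k-1}$ if transa = 'N', or $A_{ia:ia+k-1,\, ja:ja+m-1}$ if transa = 'T' or 'C'.
B represents the global general submatrix $B_{ib:ib+k-1,\, jb:jb+n-1}$ if transb = 'N', or $B_{ib:ib+n-1,\, jb:jb+k-1}$ if transb = 'T' or 'C'.
C represents the global general submatrix $C_{ic:ic+m-1,\, jc:jc+n-1}$.
$\alpha$ and $\beta$ are scalars.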

Note: No data should be moved to form the matrix transposes; that is, the matrices should always be stored in their untransposed forms.

In the following four cases, no computation is performed and the subroutine returns after doing some parameter checking:

  1. m is 0.
  2. n is 0.
  3. alpha is zero and beta is one.
  4. k is 0 and beta is one.

Assuming the above conditions do not exist, if beta is not one and k is 0, then $\beta C$ is returned.
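To make the submatrix conventions above concrete, here is a minimal serial sketch in C of the real-data computation that PDGEMM performs. It is not part of Parallel ESSL: it assumes the whole global matrices A, B, and C sit in one address space in column-major order, with hypothetical leading dimensions lda, ldb, and ldc, and the names ref_gemm_sub and op_elem are illustrative only. It treats 'C' like 'T', as PDGEMM does for real data, does not read C when beta is zero, and reduces to $\beta C$ when k is 0.

```c
#include <stddef.h>

/* Element (i, j), 0-based within the submatrix, of op(X), where X starts at
 * the 1-based global position (ix, jx). For real data 'C' acts like 'T'. */
static double op_elem(char trans, const double *x, int ldx,
                      int ix, int jx, int i, int j)
{
    int r, c;
    if (trans == 'N' || trans == 'n') { r = ix - 1 + i; c = jx - 1 + j; }
    else                              { r = ix - 1 + j; c = jx - 1 + i; }
    return x[(size_t)c * ldx + r];
}

/* Hypothetical serial reference:
 *   C(ic:ic+m-1, jc:jc+n-1) <- alpha*op(A)*op(B) + beta*C(ic:ic+m-1, jc:jc+n-1) */
void ref_gemm_sub(char transa, char transb, int m, int n, int k,
                  double alpha, const double *a, int lda, int ia, int ja,
                  const double *b, int ldb, int ib, int jb,
                  double beta, double *c, int ldc, int ic, int jc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double sum = 0.0;                /* stays 0 when k == 0 */
            for (int l = 0; l < k; ++l)
                sum += op_elem(transa, a, lda, ia, ja, i, l)
                     * op_elem(transb, b, ldb, ib, jb, l, j);
            double *cij = &c[(size_t)(jc - 1 + j) * ldc + (ic - 1 + i)];
            /* When beta is zero, C is not read, so it need not be set. */
            *cij = (beta == 0.0) ? alpha * sum : alpha * sum + beta * (*cij);
        }
    }
}
```

For example, with A = [1 2; 3 4] and B = [5 6; 7 8] stored column-major, the call ref_gemm_sub('N', 'N', 2, 2, 2, 1.0, a, 2, 1, 1, b, 2, 1, 1, 0.0, c, 2, 1, 1) leaves AB = [19 22; 43 50] in c.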

See references [14] and [15].

Table 45. Data Types
A, B, C, alpha, beta Subroutine
Long-precision real PDGEMM
Long-precision complex PZGEMM

Syntax

Fortran CALL PDGEMM | PZGEMM (transa, transb, m, n, k, alpha, a, ia, ja, desc_a, b, ib, jb, desc_b, beta, c, ic, jc, desc_c)
C and C++ pdgemm | pzgemm (transa, transb, m, n, k, alpha, a, ia, ja, desc_a, b, ib, jb, desc_b, beta, c, ic, jc, desc_c);

On Entry

transa

indicates the form of matrix A to use in the computation, where:

If transa = 'N', A is used in the computation.

If transa = 'T', $A^T$ is used in the computation.

If transa = 'C', $A^H$ is used in the computation.

Scope: global

Specified as: a single character; transa = 'N', 'T', or 'C'

transb

indicates the form of matrix B to use in the computation, where:

If transb = 'N', B is used in the computation.

If transb = 'T', $B^T$ is used in the computation.

If transb = 'C', $B^H$ is used in the computation.

Scope: global

Specified as: a single character; transb = 'N', 'T', or 'C'

m

is the number of rows in submatrix C used in the computation, and:

If transa = 'N', it is the number of rows in submatrix A.

If transa = 'T' or 'C', it is the number of columns in submatrix A.

Scope: global

Specified as: a fullword integer; m >= 0.

n

is the number of columns in submatrix C used in the computation, and:

If transb = 'N', it is the number of columns in submatrix B.

If transb = 'T' or 'C', it is the number of rows in submatrix B.

Scope: global

Specified as: a fullword integer; n >= 0.

k

has the following meaning:

If transa = 'N', it is the number of columns in submatrix A.

If transa = 'T' or 'C', it is the number of rows in submatrix A.

In addition:

If transb = 'N', it is the number of rows in submatrix B.

If transb = 'T' or 'C', it is the number of columns in submatrix B.

Scope: global

Specified as: a fullword integer; k >= 0.

alpha

is the scalar alpha.

Scope: global

Specified as: a number of the data type indicated in Table 45.

a

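is the local part of the global general matrix A. This identifies the first element of the local array A. This subroutine computes the location of the first element of the local subarray used, based on ia, ja, desc_a, p, q, myrow, and mycol; therefore:

If transa = 'N', the leading LOCp(ia+m-1) by LOCq(ja+k-1) part of the local array A must contain the local pieces of the leading ia+m-1 by ja+k-1 part of the global matrix.

If transa = 'T' or 'C', the leading LOCp(ia+k-1) by LOCq(ja+m-1) part of the local array A must contain the local pieces of the leading ia+k-1 by ja+m-1 part of the global matrix.

Note: No data should be moved to form $A^T$ or $A^H$; that is, the matrix A should always be stored in its untransposed form.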

Scope: local

Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 45. Details about the block-cyclic data distribution of global matrix A are stored in desc_a.

ia

is the row index of the global matrix A, identifying the first row of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ia <= M_A, and:

If transa = 'N', then ia+m-1 <= M_A.

If transa = 'T' or 'C', then ia+k-1 <= M_A.

ja

is the column index of the global matrix A, identifying the first column of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ja <= N_A, and:

If transa = 'N', then ja+k-1 <= N_A.

If transa = 'T' or 'C', then ja+m-1 <= N_A.
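The limits on ia and ja depend on transa because the submatrix of A has m rows and k columns when transa = 'N', and k rows and m columns otherwise. The following C sketch expresses these checks; the function name a_indexes_valid is hypothetical. It follows the error conditions listed later for this subroutine: ia and ja must always be at least 1 (Stage 4), while the upper bounds are checked only when m and k are both nonzero (Stage 5).

```c
/* Hypothetical helper (not a Parallel ESSL routine): returns 1 if ia and ja
 * are acceptable for the submatrix of A described by transa, m, and k on a
 * global matrix with m_a rows and n_a columns, and 0 otherwise. */
int a_indexes_valid(char transa, int m, int k, int ia, int ja,
                    int m_a, int n_a)
{
    if (ia < 1 || ja < 1)
        return 0;                      /* always checked */
    if (m == 0 || k == 0)
        return 1;                      /* upper bounds are skipped */
    int notrans = (transa == 'N' || transa == 'n');
    int rows = notrans ? m : k;        /* rows of the submatrix of A */
    int cols = notrans ? k : m;        /* columns of the submatrix of A */
    if (ia > m_a || ja > n_a)
        return 0;
    return ia + rows - 1 <= m_a && ja + cols - 1 <= n_a;
}
```

With the values of Example 1 (transa = 'N', m = 6, k = 5, ia = ja = 1, M_A = 6, N_A = 5) the function returns 1; the same call with transa = 'T' returns 0, because ja+m-1 = 6 exceeds N_A.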

desc_a

is the array descriptor for global matrix A, described in the following table:
desc_a  Name     Description                          Limits                           Scope
1       DTYPE_A  Descriptor type                      DTYPE_A = 1                      Global
2       CTXT_A   BLACS context                        Valid value, as returned by      Global
                                                      BLACS_GRIDINIT or BLACS_GRIDMAP
3       M_A      Number of rows in the global matrix  If m = 0 or k = 0: M_A >= 0      Global
                                                      Otherwise: M_A >= 1
4       N_A      Number of columns in the global      If m = 0 or k = 0: N_A >= 0      Global
                 matrix                               Otherwise: N_A >= 1
5       MB_A     Row block size                       MB_A >= 1                        Global
6       NB_A     Column block size                    NB_A >= 1                        Global
7       RSRC_A   The process row of the p × q grid    0 <= RSRC_A < p                  Global
                 over which the first row of the
                 global matrix is distributed
8       CSRC_A   The process column of the p × q      0 <= CSRC_A < q                  Global
                 grid over which the first column of
                 the global matrix is distributed
9       LLD_A    The leading dimension of the local   LLD_A >= max(1, LOCp(M_A))       Local
                 array

Specified as: an array of (at least) length 9, containing fullword integers.
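LOCp(M_A) is the number of rows of the global matrix that land on the calling process row, which is what the NUMROC utility computes. The C sketch below reproduces the usual block-cyclic counting (the algorithm used by the ScaLAPACK NUMROC tool) and uses it to form the smallest legal LLD_A. The names numroc_sketch and min_lld are hypothetical; the argument order follows the NUMROC calls shown in the examples (n, nb, iproc, isrcproc, nprocs).

```c
/* Hypothetical C re-implementation of the block-cyclic counting done by
 * NUMROC: how many of the n global rows (or columns), dealt out in blocks
 * of nb starting at process isrcproc, are owned by process iproc when there
 * are nprocs processes in that grid dimension. */
int numroc_sketch(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    int mydist = (nprocs + iproc - isrcproc) % nprocs; /* distance from source */
    int nblocks = n / nb;                              /* number of full blocks */
    int count = (nblocks / nprocs) * nb;               /* complete rounds */
    int extrablks = nblocks % nprocs;                  /* leftover full blocks */
    if (mydist < extrablks)
        count += nb;                                   /* one extra full block */
    else if (mydist == extrablks)
        count += n % nb;                               /* trailing partial block */
    return count;
}

/* Smallest value allowed for LLD_ on process row myrow: max(1, LOCp(M_)). */
int min_lld(int m_global, int mb, int myrow, int rsrc, int nprow)
{
    int locp = numroc_sketch(m_global, mb, myrow, rsrc, nprow);
    return locp > 1 ? locp : 1;
}
```

For instance, for a global matrix with 5 rows, row block size 2, and RSRC_ = 0 on a grid with 2 process rows, min_lld gives 3 on process row 0 and 2 on process row 1.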

b

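is the local part of the global general matrix B. This identifies the first element of the local array B. This subroutine computes the location of the first element of the local subarray used, based on ib, jb, desc_b, p, q, myrow, and mycol; therefore:

If transb = 'N', the leading LOCp(ib+k-1) by LOCq(jb+n-1) part of the local array B must contain the local pieces of the leading ib+k-1 by jb+n-1 part of the global matrix.

If transb = 'T' or 'C', the leading LOCp(ib+n-1) by LOCq(jb+k-1) part of the local array B must contain the local pieces of the leading ib+n-1 by jb+k-1 part of the global matrix.

Note: No data should be moved to form $B^T$ or $B^H$; that is, the matrix B should always be stored in its untransposed form.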

Scope: local

Specified as: an LLD_B by (at least) LOCq(N_B) array, containing numbers of the data type indicated in Table 45. Details about the block-cyclic data distribution of global matrix B are stored in desc_b.

ib

is the row index of the global matrix B, identifying the first row of the submatrix B.

Scope: global

Specified as: a fullword integer; 1 <= ib <= M_B, and:

If transb = 'N', then ib+k-1 <= M_B.

If transb = 'T' or 'C', then ib+n-1 <= M_B.

jb

is the column index of the global matrix B, identifying the first column of the submatrix B.

Scope: global

Specified as: a fullword integer; 1 <= jb <= N_B, and:

If transb = 'N', then jb+n-1 <= N_B.

If transb = 'T' or 'C', then jb+k-1 <= N_B.

desc_b

is the array descriptor for global matrix B, described in the following table:
desc_b  Name     Description                          Limits                           Scope
1       DTYPE_B  Descriptor type                      DTYPE_B = 1                      Global
2       CTXT_B   BLACS context                        Valid value, as returned by      Global
                                                      BLACS_GRIDINIT or BLACS_GRIDMAP
3       M_B      Number of rows in the global matrix  If k = 0 or n = 0: M_B >= 0      Global
                                                      Otherwise: M_B >= 1
4       N_B      Number of columns in the global      If k = 0 or n = 0: N_B >= 0      Global
                 matrix                               Otherwise: N_B >= 1
5       MB_B     Row block size                       MB_B >= 1                        Global
6       NB_B     Column block size                    NB_B >= 1                        Global
7       RSRC_B   The process row of the p × q grid    0 <= RSRC_B < p                  Global
                 over which the first row of the
                 global matrix is distributed
8       CSRC_B   The process column of the p × q      0 <= CSRC_B < q                  Global
                 grid over which the first column of
                 the global matrix is distributed
9       LLD_B    The leading dimension of the local   LLD_B >= max(1, LOCp(M_B))       Local
                 array

Specified as: an array of (at least) length 9, containing fullword integers.

beta

is the scalar beta.

Scope: global

Specified as: a number of the data type indicated in Table 45.

c

is the local part of the global general matrix C. This identifies the first element of the local array C. This subroutine computes the location of the first element of the local subarray used, based on ic, jc, desc_c, p, q, myrow, and mycol; therefore, the leading LOCp(ic+m-1) by LOCq(jc+n-1) part of the local array C must contain the local pieces of the leading ic+m-1 by jc+n-1 part of the global matrix.

When beta is zero, C need not be set on input.

Scope: local

Specified as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 45. Details about the block-cyclic data distribution of global matrix C are stored in desc_c.

ic

is the row index of the global matrix C, identifying the first row of the submatrix C.

Scope: global

Specified as: a fullword integer; 1 <= ic <= M_C and ic+m-1 <= M_C.

jc

is the column index of the global matrix C, identifying the first column of the submatrix C.

Scope: global

Specified as: a fullword integer; 1 <= jc <= N_C and jc+n-1 <= N_C.

desc_c

is the array descriptor for global matrix C, described in the following table:
desc_c  Name     Description                          Limits                           Scope
1       DTYPE_C  Descriptor type                      DTYPE_C = 1                      Global
2       CTXT_C   BLACS context                        Valid value, as returned by      Global
                                                      BLACS_GRIDINIT or BLACS_GRIDMAP
3       M_C      Number of rows in the global matrix  If m = 0 or n = 0: M_C >= 0      Global
                                                      Otherwise: M_C >= 1
4       N_C      Number of columns in the global      If m = 0 or n = 0: N_C >= 0      Global
                 matrix                               Otherwise: N_C >= 1
5       MB_C     Row block size                       MB_C >= 1                        Global
6       NB_C     Column block size                    NB_C >= 1                        Global
7       RSRC_C   The process row of the p × q grid    0 <= RSRC_C < p                  Global
                 over which the first row of the
                 global matrix is distributed
8       CSRC_C   The process column of the p × q      0 <= CSRC_C < q                  Global
                 grid over which the first column of
                 the global matrix is distributed
9       LLD_C    The leading dimension of the local   LLD_C >= max(1, LOCp(M_C))       Local
                 array

Specified as: an array of (at least) length 9, containing fullword integers.

On Return

c

is the updated local part of the global matrix C, containing the results of the computation.

Scope: local

Returned as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 45.

Notes and Coding Rules

  1. This subroutine accepts lowercase letters for the transa and transb arguments.

  2. For PDGEMM, if you specify 'C' for the transa or transb argument, it is interpreted as though you specified 'T'.

  3. The matrices must have no common elements; otherwise, results are unpredictable.

  4. The NUMROC utility subroutine can be used to determine the values of LOCp(M_) and LOCq(N_) used in the argument descriptions above. For details, see "Determining the Number of Rows and Columns in Your Local Arrays" and NUMROC--Compute the Number of Rows or Columns of a Block-Cyclically Distributed Matrix Contained in a Process.

  5. For suggested block sizes, see "Coding Tips for Optimizing Parallel Performance".

  6. The following values must be equal: CTXT_A = CTXT_B = CTXT_C.

  7. The coding rules described in this note depend upon which matrix--A, B, or C--is used as the reference matrix, which is referred to, in general, as matrix X. For each of the three possible selections for the reference matrix, there is a unique set of coding rules that must be met. These are detailed in Table 46 and Table 47. The reference matrix is selected, and the applicable coding rules are determined, as follows:

    Step 1: The reference matrix is selected. For optimal performance, it is chosen based on the arguments m, n, and k, as follows:

    If k <= min(m, n), then X = C
    If n <= min(m, k), then X = A
    If m <= min(n, k), then X = B

    The matrix selected must satisfy coding rules 1 and 4, described below, to be a suitable reference matrix. If it does, processing continues with Step 2. If it does not, the subroutine checks whether either of the other two matrices satisfies coding rules 1, 3, and 4, which would make it a suitable reference matrix. If one of them is suitable, processing continues with Step 2. If neither matrix is suitable, an error condition results.

    Step 2: After a suitable reference matrix is chosen in Step 1, all remaining coding rules, described below, are checked. If the rules are satisfied, the subroutine continues normally. If they are not, an error condition results.

    Coding Rules: Following are the coding rules:

    1. The reference matrix must be aligned on a block boundary; that is:
      ix-1 must be a multiple of MB_X.
      jx-1 must be a multiple of NB_X.

      These indexes are indicated in column 5 of Table 46 for each entry for X.

    2. The block sizes that must be equal are indicated in column 4 of Table 46 for each entry for X. The rules for block sizes depend only upon the values of transa and transb, and not on the reference matrix selected; however, for your convenience, the rules are repeated in the table for each reference matrix.

    3. Given the reference matrix X, additional rules apply to the block row and block column offsets of the two nonreference matrices. These rules are listed in column 7 of Table 46 for each entry for X. They need to be met only when looping is required, that is, when either of the conditions in column 8 is met.

    4. The indexes of the nonreference matrices, which need to be on a block boundary, are listed in column 6 of Table 46 for each entry for X.

      Table 46. Coding Rules for the Reference Matrix X

      Column 1: X
      Column 2: transa
      Column 3: transb
      Column 4 (rule 2): Equal block sizes
      Column 5 (rule 1): Block boundary for X
      Column 6 (rule 4): Block boundary for the other (nonreference) matrices
      Column 7 (rule 3): Equal block offsets (if looping is required)
      Column 8 (rule 3): Conditions for looping

      X = A, transa = 'N', transb = 'N'
        Column 4: MB_A = MB_C; NB_B = NB_C; NB_A = MB_B
        Column 5: ia, ja
        Column 6: ib, ic
        Column 7: mod(jb-1, NB_B) = mod(jc-1, NB_C)
        Column 8: n+mod(jb-1, NB_B) > NB_B -or- n+mod(jc-1, NB_C) > NB_C

      X = A, transa = 'N', transb = 'T' or 'C'
        Column 4: MB_A = MB_C; MB_B = NB_C; NB_A = NB_B
        Column 5: ia, ja
        Column 6: jb, ic
        Column 7: mod(ib-1, MB_B) = mod(jc-1, NB_C)
        Column 8: n+mod(ib-1, MB_B) > MB_B -or- n+mod(jc-1, NB_C) > NB_C

      X = A, transa = 'T' or 'C', transb = 'N'
        Column 4: NB_A = MB_C; NB_B = NB_C; MB_A = MB_B
        Column 5: ia, ja
        Column 6: ib, ic
        Column 7: mod(jb-1, NB_B) = mod(jc-1, NB_C)
        Column 8: n+mod(jb-1, NB_B) > NB_B -or- n+mod(jc-1, NB_C) > NB_C

      X = A, transa = 'T' or 'C', transb = 'T' or 'C'
        Column 4: NB_A = MB_C; MB_B = NB_C; MB_A = NB_B
        Column 5: ia, ja
        Column 6: jb, ic
        Column 7: mod(ib-1, MB_B) = mod(jc-1, NB_C)
        Column 8: n+mod(ib-1, MB_B) > MB_B -or- n+mod(jc-1, NB_C) > NB_C

      X = B, transa = 'N', transb = 'N'
        Column 4: MB_A = MB_C; NB_B = NB_C; NB_A = MB_B
        Column 5: ib, jb
        Column 6: ja, jc
        Column 7: mod(ia-1, MB_A) = mod(ic-1, MB_C)
        Column 8: m+mod(ia-1, MB_A) > MB_A -or- m+mod(ic-1, MB_C) > MB_C

      X = B, transa = 'N', transb = 'T' or 'C'
        Column 4: MB_A = MB_C; MB_B = NB_C; NB_A = NB_B
        Column 5: ib, jb
        Column 6: ja, jc
        Column 7: mod(ia-1, MB_A) = mod(ic-1, MB_C)
        Column 8: m+mod(ia-1, MB_A) > MB_A -or- m+mod(ic-1, MB_C) > MB_C

      X = B, transa = 'T' or 'C', transb = 'N'
        Column 4: NB_A = MB_C; NB_B = NB_C; MB_A = MB_B
        Column 5: ib, jb
        Column 6: ia, jc
        Column 7: mod(ja-1, NB_A) = mod(ic-1, MB_C)
        Column 8: m+mod(ja-1, NB_A) > NB_A -or- m+mod(ic-1, MB_C) > MB_C

      X = B, transa = 'T' or 'C', transb = 'T' or 'C'
        Column 4: NB_A = MB_C; MB_B = NB_C; MB_A = NB_B
        Column 5: ib, jb
        Column 6: ia, jc
        Column 7: mod(ja-1, NB_A) = mod(ic-1, MB_C)
        Column 8: m+mod(ja-1, NB_A) > NB_A -or- m+mod(ic-1, MB_C) > MB_C

      X = C, transa = 'N', transb = 'N'
        Column 4: MB_A = MB_C; NB_B = NB_C; NB_A = MB_B
        Column 5: ic, jc
        Column 6: ia, jb
        Column 7: mod(ja-1, NB_A) = mod(ib-1, MB_B)
        Column 8: k+mod(ja-1, NB_A) > NB_A -or- k+mod(ib-1, MB_B) > MB_B

      X = C, transa = 'N', transb = 'T' or 'C'
        Column 4: MB_A = MB_C; MB_B = NB_C; NB_A = NB_B
        Column 5: ic, jc
        Column 6: ia, ib
        Column 7: mod(ja-1, NB_A) = mod(jb-1, NB_B)
        Column 8: k+mod(ja-1, NB_A) > NB_A -or- k+mod(jb-1, NB_B) > NB_B

      X = C, transa = 'T' or 'C', transb = 'N'
        Column 4: NB_A = MB_C; NB_B = NB_C; MB_A = MB_B
        Column 5: ic, jc
        Column 6: ja, jb
        Column 7: mod(ia-1, MB_A) = mod(ib-1, MB_B)
        Column 8: k+mod(ia-1, MB_A) > MB_A -or- k+mod(ib-1, MB_B) > MB_B

      X = C, transa = 'T' or 'C', transb = 'T' or 'C'
        Column 4: NB_A = MB_C; MB_B = NB_C; MB_A = NB_B
        Column 5: ic, jc
        Column 6: ja, ib
        Column 7: mod(ia-1, MB_A) = mod(jb-1, NB_B)
        Column 8: k+mod(ia-1, MB_A) > MB_A -or- k+mod(jb-1, NB_B) > NB_B

    5. Additional rules apply to the row and column alignment of the various matrices in the process grid; specifically, the process row or process column containing the first row or column of the reference submatrix X, respectively, must also contain the first row or column of one of the other two nonreference submatrices, as indicated in column 4 of Table 47 for each entry for X. Following is the definition of ixrow and ixcol, which holds true for A, B, and C:
      ixrow = mod((((ix-1)/MB_X)+RSRC_X), p)
      ixcol = mod((((jx-1)/NB_X)+CSRC_X), q)


      Table 47. Coding Rules for the Reference Matrix X

      -1-  -2-          -3-          -4-
      X    transa       transb       Process Grid Alignment (rule 5)

      A    'N'          'N'          iarow = icrow
      A    'N'          'T' or 'C'   iarow = icrow
                                     ibcol = iacol
      A    'T' or 'C'   'N'          iarow = ibrow
      A    'T' or 'C'   'T' or 'C'   (no rules)
      B    'N'          'N'          ibcol = iccol
      B    'N'          'T' or 'C'   ibcol = iacol
      B    'T' or 'C'   'N'          iarow = ibrow
                                     ibcol = iccol
      B    'T' or 'C'   'T' or 'C'   (no rules)
      C    'N'          'N'          iarow = icrow
                                     ibcol = iccol
      C    'N'          'T' or 'C'   iarow = icrow
      C    'T' or 'C'   'N'          ibcol = iccol
      C    'T' or 'C'   'T' or 'C'   (no rules)

    Example: Following is an example of the coding rules necessary for the case where transa = 'N' and transb = 'N', where the reference matrix selected is A. Following are the indexes, dimensions, and block sizes used in the computation for the matrices:



    Indexes:        ic  jc             ia  ja        ib  jb             ic  jc
                     |   |              |   |         |   |              |   |
    Dimensions:  C ( m , n )  <--  alpha  A ( m , k )   B ( k , n )  +  beta  C ( m , n )
                     |   |              |   |         |   |              |   |
    Block Sizes:   MB_C NB_C          MB_A NB_A     MB_B NB_B          MB_C NB_C
    

    1. A must be aligned on a block boundary, as indicated in column 5 in Table 46:
      ia-1 must be a multiple of MB_A.
      ja-1 must be a multiple of NB_A.

    2. The block sizes that correspond to each matrix dimension must be equal, where MB_ represents the row dimension and NB_ represents the column dimension, as indicated in column 4 in Table 46:
      MB_A = MB_C
      NB_B = NB_C
      NB_A = MB_B

    3. As shown above, m and k are the dimensions of the reference matrix A; therefore, n is used to determine if looping is required; that is, if one of the following is true, as indicated in column 8 in Table 46:
      n+mod(jc-1, NB_C) > NB_C
      n+mod(jb-1, NB_B) > NB_B

      then the following offsets must be equal, as indicated in column 7 in Table 46:

      mod(jb-1, NB_B) = mod(jc-1, NB_C)

    4. The other indexes from each of the nonreference matrices (those not used in step 3 above) must be aligned on a block boundary, as indicated in column 6 in Table 46:
      ic-1 must be a multiple of MB_C.
      ib-1 must be a multiple of MB_B.

    5. In the process grid, the process row containing the first row of the submatrix A must also contain the first row of the submatrix C, as indicated in column 4 in Table 47; that is, iarow = icrow, where:
      iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
      icrow = mod((((ic-1)/MB_C)+RSRC_C), p)
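The reference-matrix selection in Step 1 and the five rules for the case just worked through can be sketched in C as follows. This is an illustration, not the subroutine's actual logic: it assumes the three Step 1 tests are tried in the order listed, it covers only the X = A, transa = 'N', transb = 'N' entries of Table 46 and Table 47, and all names (preferred_reference, rules_ok_a_nn, blk_t, owner_coord) are hypothetical. The integer division in owner_coord matches the (ix-1)/MB_X term in the definition of ixrow.

```c
/* Which matrix Step 1 prefers as the reference matrix X (hypothetical). */
typedef enum { REF_A, REF_B, REF_C } ref_t;

/* Block sizes and source process coordinates taken from one descriptor. */
typedef struct { int mb, nb, rsrc, csrc; } blk_t;

static int min2(int x, int y) { return x < y ? x : y; }

/* Step 1, assuming the three tests are applied in the order listed. */
ref_t preferred_reference(int m, int n, int k)
{
    if (k <= min2(m, n)) return REF_C;
    if (n <= min2(m, k)) return REF_A;
    return REF_B;                      /* here m <= min(n, k) must hold */
}

/* 1-based global index ix lies on a block boundary: ix-1 is a multiple of blk. */
static int on_boundary(int ix, int blk) { return (ix - 1) % blk == 0; }

/* Process row (or column) holding global row (or column) ix, as in ixrow. */
static int owner_coord(int ix, int blk, int src, int nprocs)
{
    return ((ix - 1) / blk + src) % nprocs;
}

/* Rules 1 through 5 for X = A, transa = 'N', transb = 'N', p process rows. */
int rules_ok_a_nn(int n, int ia, int ja, int ib, int jb, int ic, int jc,
                  blk_t a, blk_t b, blk_t c, int p)
{
    /* Rule 1 (column 5): ia and ja on block boundaries. */
    if (!on_boundary(ia, a.mb) || !on_boundary(ja, a.nb)) return 0;
    /* Rule 2 (column 4): equal block sizes. */
    if (a.mb != c.mb || b.nb != c.nb || a.nb != b.mb) return 0;
    /* Rule 3 (columns 7 and 8): equal offsets, only if looping is required. */
    int offb = (jb - 1) % b.nb, offc = (jc - 1) % c.nb;
    int looping = (n + offb > b.nb) || (n + offc > c.nb);
    if (looping && offb != offc) return 0;
    /* Rule 4 (column 6): ib and ic on block boundaries. */
    if (!on_boundary(ib, b.mb) || !on_boundary(ic, c.mb)) return 0;
    /* Rule 5 (Table 47): iarow = icrow. */
    if (owner_coord(ia, a.mb, a.rsrc, p) != owner_coord(ic, c.mb, c.rsrc, p))
        return 0;
    return 1;
}
```

With the arguments of Example 1 below (m = 6, n = 4, k = 5, all indexes 1, block sizes 3 × 2, 2 × 2, and 3 × 2, all source coordinates 0, and p = 2), preferred_reference returns REF_A and rules_ok_a_nn returns 1.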

Error Conditions

Computational Errors

None

Resource Errors

Unable to allocate work space

Input-Argument and Miscellaneous Errors

Stage 1
  1. DTYPE_A is invalid.
  2. DTYPE_B is invalid.
  3. DTYPE_C is invalid.

Stage 2
  1. CTXT_A is invalid.

Stage 3
  1. The subroutine was called from outside the process grid.

Stage 4
  1. transa <> 'N', 'T', or 'C'
  2. transb <> 'N', 'T', or 'C'
  3. m < 0
  4. n < 0
  5. k < 0
  6. M_A < 0 and (m = 0 or k = 0); M_A < 1 otherwise
  7. N_A < 0 and (m = 0 or k = 0); N_A < 1 otherwise
  8. M_B < 0 and (k = 0 or n = 0); M_B < 1 otherwise
  9. N_B < 0 and (k = 0 or n = 0); N_B < 1 otherwise
  10. M_C < 0 and (m = 0 or n = 0); M_C < 1 otherwise
  11. N_C < 0 and (m = 0 or n = 0); N_C < 1 otherwise
  12. ia < 1
  13. ib < 1
  14. ic < 1
  15. ja < 1
  16. jb < 1
  17. jc < 1
  18. MB_A < 1
  19. MB_B < 1
  20. MB_C < 1
  21. NB_A < 1
  22. NB_B < 1
  23. NB_C < 1
  24. RSRC_A < 0 or RSRC_A >= p
  25. RSRC_B < 0 or RSRC_B >= p
  26. RSRC_C < 0 or RSRC_C >= p
  27. CSRC_A < 0 or CSRC_A >= q
  28. CSRC_B < 0 or CSRC_B >= q
  29. CSRC_C < 0 or CSRC_C >= q
  30. CTXT_A <> CTXT_B
  31. CTXT_A <> CTXT_C
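Items 6 through 11 and 18 through 29 apply the same kind of limits to each of the three descriptors. The C sketch below groups those per-descriptor checks, using the entry positions from the descriptor tables (entry 3 is M_, entry 5 is MB_, and so on, shifted to 0-based indexes). The function name desc_limits_ok and its empty flag are hypothetical: empty is nonzero when the matrix takes no part in the product, for example when m = 0 or k = 0 for desc_a.

```c
/* 0-based positions of the descriptor entries (entry number minus one). */
enum { DTYPE_ = 0, CTXT_ = 1, M_ = 2, N_ = 3, MB_ = 4, NB_ = 5,
       RSRC_ = 6, CSRC_ = 7, LLD_ = 8 };

/* Returns 1 if desc passes the Stage 4 limits for one matrix on a p x q
 * process grid, and 0 otherwise (hypothetical helper). */
int desc_limits_ok(const int desc[9], int empty, int p, int q)
{
    int min_dim = empty ? 0 : 1;       /* M_, N_ >= 0 if empty, else >= 1 */
    if (desc[M_] < min_dim || desc[N_] < min_dim) return 0;
    if (desc[MB_] < 1 || desc[NB_] < 1) return 0;
    if (desc[RSRC_] < 0 || desc[RSRC_] >= p) return 0;
    if (desc[CSRC_] < 0 || desc[CSRC_] >= q) return 0;
    return 1;
}
```

The context checks (items 30 and 31) compare entries across descriptors, so they are left out of this per-descriptor sketch.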

Stage 5

    If m <> 0 and k <> 0:

  1. transa = 'N' and ia+m-1 > M_A
  2. transa = 'T' or 'C' and ia+k-1 > M_A
  3. transa = 'N' and ja+k-1 > N_A
  4. transa = 'T' or 'C' and ja+m-1 > N_A
  5. ia > M_A
  6. ja > N_A

    If n <> 0 and k <> 0:

  7. transb = 'N' and ib+k-1 > M_B
  8. transb = 'T' or 'C' and ib+n-1 > M_B
  9. transb = 'N' and jb+n-1 > N_B
  10. transb = 'T' or 'C' and jb+k-1 > N_B
  11. ib > M_B
  12. jb > N_B

    If m <> 0 and n <> 0:

  13. ic+m-1 > M_C
  14. jc+n-1 > N_C
  15. ic > M_C
  16. jc > N_C
  17. For the reference matrix (defined in note 7 in "Notes and Coding Rules") and the appropriate transa and transb values, the indexes listed in column 5 of Table 46 are not aligned on a block boundary, where boundary alignment is defined as:
    ix-1 must be a multiple of MB_X.
    jx-1 must be a multiple of NB_X.
  18. For the two nonreference matrices (defined in note 7 in "Notes and Coding Rules") and the appropriate transa and transb values, the indexes listed in column 6 of Table 46 are not aligned on a block boundary. Using Z to represent one of the nonreference matrices, each boundary alignment is expressed as one of the following:
    iz-1 must be a multiple of MB_Z.
    jz-1 must be a multiple of NB_Z.
  19. For the reference matrix (defined in note 7 in "Notes and Coding Rules") and the appropriate transa and transb values, if looping occurs--that is, one of the conditions in column 8 of Table 46 is true--then the block offsets indicated in column 7 are not equal.

Stage 6
  1. For the appropriate transa and transb values indicated in Table 46 (where the reference matrix does not matter), some of the block sizes indicated in column 4 are not equal.
  2. LLD_A < max(1, LOCp(M_A))
  3. LLD_B < max(1, LOCp(M_B))
  4. LLD_C < max(1, LOCp(M_C))
  5. In the process grid, the process row or process column containing the first row or column of the reference submatrix X (defined in note 7 in "Notes and Coding Rules"), respectively, does not contain the first row or column of one of the other two nonreference submatrices, as indicated in column 4 of Table 47. Following is the definition of ixrow and ixcol, which holds true for A, B, and C:
    ixrow = mod((((ix-1)/MB_X)+RSRC_X), p)
    ixcol = mod((((jx-1)/NB_X)+CSRC_X), q)

Example 1

This example computes $C = \beta C + \alpha AB$ using a 2 × 2 process grid.

Call Statements and Input


 ORDER = 'R'
 NPROW = 2
 NPCOL = 2
 CALL BLACS_GET (0, 0, ICONTXT)
 CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
 CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
            TRANSA TRANSB  M    N    K   ALPHA    A  IA  JA   DESC_A   B  IB  JB
               |      |    |    |    |     |      |   |   |     |      |   |   |
 CALL PDGEMM( 'N' ,  'N' , 6  , 4  , 5 , 1.0D0  , A , 1 , 1 , DESC_A , B , 1 , 1 ,
 
              DESC_B    BETA    C  IC  JC   DESC_C
                |         |     |   |   |     |
              DESC_B ,  2.0D0 , C , 1 , 1 , DESC_C )



          Desc_A         Desc_B         Desc_C
DTYPE_    1              1              1
CTXT_     icontxt (1)    icontxt (1)    icontxt (1)
M_        6              5              6
N_        5              4              4
MB_       3              2              3
NB_       2              2              2
RSRC_     0              0              0
CSRC_     0              0              0
LLD_      See below (2)  See below (2)  See below (2)

(1) icontxt is the output of the BLACS_GRIDINIT call.

(2) Each process should set the LLD_ as follows:

LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
LLD_B = MAX(1,NUMROC(M_B, MB_B, MYROW, RSRC_B, NPROW))
LLD_C = MAX(1,NUMROC(M_C, MB_C, MYROW, RSRC_C, NPROW))

In this example, LLD_A = LLD_C = 3 on all processes, and LLD_B = 3 on P00 and P01 and LLD_B = 2 on P10 and P11.

Global general 6 × 5 matrix A with block size 3 × 2:

B,D        0             1          2
     *                                  *
     |  1.0  2.0  |  -1.0 -1.0  |   4.0 |
 0   |  2.0  0.0  |   1.0  1.0  |  -1.0 |
     |  1.0 -1.0  |  -1.0  1.0  |   2.0 |
     | -----------|-------------|------ |
     | -3.0  2.0  |   2.0  2.0  |   0.0 |
 1   |  4.0  0.0  |  -2.0  1.0  |  -1.0 |
     | -1.0 -1.0  |   1.0 -3.0  |   2.0 |
     *                                  *

The following is the 2 × 2 process grid:
B,D     0, 2    1
0       P00     P01
1       P10     P11

Local arrays for A:

p,q  |       0         |      1
-----|-----------------|------------
     |  1.0  2.0  4.0  |  -1.0 -1.0
 0   |  2.0  0.0 -1.0  |   1.0  1.0
     |  1.0 -1.0  2.0  |  -1.0  1.0
-----|-----------------|------------
     | -3.0  2.0  0.0  |   2.0  2.0
 1   |  4.0  0.0 -1.0  |  -2.0  1.0
     | -1.0 -1.0  2.0  |   1.0 -3.0
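The local arrays above follow directly from the block-cyclic mapping: global row i belongs to process row mod((i-1)/MB_A + RSRC_A, p), and the blocks owned by one process are packed together in order. The following C sketch is a serial illustration of that mapping, not a Parallel ESSL routine; the names g2p, g2l, and scatter_local are hypothetical. It copies the pieces of a column-major global matrix that one process owns into that process's local array.

```c
#include <stddef.h>

/* Process coordinate owning 1-based global index ig (block size nb). */
static int g2p(int ig, int nb, int src, int nprocs)
{
    return ((ig - 1) / nb + src) % nprocs;
}

/* 0-based local index of 1-based global index ig on its owning process. */
static int g2l(int ig, int nb, int nprocs)
{
    int blk = (ig - 1) / nb;                  /* global block number */
    return (blk / nprocs) * nb + (ig - 1) % nb;
}

/* Copy the parts of the m x n column-major global matrix g (leading
 * dimension ldg) owned by process (myrow, mycol) into loc (leading
 * dimension lld), for an mb x nb block-cyclic layout on a p x q grid. */
void scatter_local(const double *g, int ldg, int m, int n, int mb, int nb,
                   int rsrc, int csrc, int p, int q, int myrow, int mycol,
                   double *loc, int lld)
{
    for (int j = 1; j <= n; ++j) {
        if (g2p(j, nb, csrc, q) != mycol) continue;   /* column not ours */
        for (int i = 1; i <= m; ++i) {
            if (g2p(i, mb, rsrc, p) != myrow) continue;   /* row not ours */
            loc[(size_t)g2l(j, nb, q) * lld + g2l(i, mb, p)] =
                g[(size_t)(j - 1) * ldg + (i - 1)];
        }
    }
}
```

Applied to the 6 × 5 matrix A above with block size 3 × 2, this sketch reproduces the local arrays shown for P00 and P11; for example, global column 5 lands in local column 3 on process column 0.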

Global general 5 × 4 matrix B with block size 2 × 2:

B,D        0             1
     *                         *
 0   |  1.0 -1.0  |   0.0  2.0 |
     |  2.0  2.0  |  -1.0 -2.0 |
     | -----------|----------- |
 1   |  1.0  0.0  |  -1.0  1.0 |
     | -3.0 -1.0  |   1.0 -1.0 |
     | -----------|----------- |
 2   |  4.0  2.0  |  -1.0  1.0 |
     *                         *

The following is the 2 × 2 process grid:
B,D     0      1
0, 2    P00    P01
1       P10    P11

Local arrays for B:

p,q  |     0      |      1
-----|------------|------------
     |  1.0 -1.0  |   0.0  2.0
 0   |  2.0  2.0  |  -1.0 -2.0
     |  4.0  2.0  |  -1.0  1.0
-----|------------|------------
 1   |  1.0  0.0  |  -1.0  1.0
     | -3.0 -1.0  |   1.0 -1.0

Global general 6 × 4 matrix C with block size 3 × 2:

B,D        0             1
     *                         *
     |  0.5  0.5  |   0.5  0.5 |
 0   |  0.5  0.5  |   0.5  0.5 |
     |  0.5  0.5  |   0.5  0.5 |
     | -----------|----------- |
     |  0.5  0.5  |   0.5  0.5 |
 1   |  0.5  0.5  |   0.5  0.5 |
     |  0.5  0.5  |   0.5  0.5 |
     *                         *

The following is the 2 × 2 process grid:
B,D 0 1
0 P00 P01
1 P10 P11

Local arrays for C:

p,q  |     0      |      1
-----|------------|------------
     |  0.5  0.5  |   0.5  0.5
 0   |  0.5  0.5  |   0.5  0.5
     |  0.5  0.5  |   0.5  0.5
-----|------------|------------
     |  0.5  0.5  |   0.5  0.5
 1   |  0.5  0.5  |   0.5  0.5
     |  0.5  0.5  |   0.5  0.5

Output:

Global general 6 × 4 matrix C with block size 3 × 2:

B,D         0               1
     *                             *
     |  24.0  13.0  |   -5.0   3.0 |
 0   |  -3.0  -4.0  |    2.0   4.0 |
     |   4.0   1.0  |    2.0   5.0 |
     | -------------|------------- |
     |  -2.0   6.0  |   -1.0  -9.0 |
 1   |  -4.0  -6.0  |    5.0   5.0 |
     |  16.0   7.0  |   -4.0   7.0 |
     *                             *

The following is the 2 × 2 process grid:
B,D 0 1
0 P00 P01
1 P10 P11

Local arrays for C:

p,q  |      0       |       1
-----|--------------|--------------
     |  24.0  13.0  |   -5.0   3.0
 0   |  -3.0  -4.0  |    2.0   4.0
     |   4.0   1.0  |    2.0   5.0
-----|--------------|--------------
     |  -2.0   6.0  |   -1.0  -9.0
 1   |  -4.0  -6.0  |    5.0   5.0
     |  16.0   7.0  |   -4.0   7.0
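As a serial cross-check of Example 1 that does not depend on the process grid, the C sketch below recomputes $\alpha AB + \beta C$ from the global matrices shown above (alpha = 1.0, beta = 2.0, C initially 0.5 everywhere) and returns the largest absolute difference from the output matrix C. The function name example1_max_diff is hypothetical, and the matrices are written row by row for readability.

```c
/* Serial cross-check of Example 1 (hypothetical helper, not part of
 * Parallel ESSL). Returns the largest |computed - documented| entry. */
double example1_max_diff(void)
{
    static const double a[6][5] = {
        { 1.0,  2.0, -1.0, -1.0,  4.0},
        { 2.0,  0.0,  1.0,  1.0, -1.0},
        { 1.0, -1.0, -1.0,  1.0,  2.0},
        {-3.0,  2.0,  2.0,  2.0,  0.0},
        { 4.0,  0.0, -2.0,  1.0, -1.0},
        {-1.0, -1.0,  1.0, -3.0,  2.0}};
    static const double b[5][4] = {
        { 1.0, -1.0,  0.0,  2.0},
        { 2.0,  2.0, -1.0, -2.0},
        { 1.0,  0.0, -1.0,  1.0},
        {-3.0, -1.0,  1.0, -1.0},
        { 4.0,  2.0, -1.0,  1.0}};
    static const double expected[6][4] = {
        {24.0, 13.0, -5.0,  3.0},
        {-3.0, -4.0,  2.0,  4.0},
        { 4.0,  1.0,  2.0,  5.0},
        {-2.0,  6.0, -1.0, -9.0},
        {-4.0, -6.0,  5.0,  5.0},
        {16.0,  7.0, -4.0,  7.0}};
    const double alpha = 1.0, beta = 2.0, c_in = 0.5;  /* C is 0.5 on input */
    double maxdiff = 0.0;
    for (int i = 0; i < 6; ++i)
        for (int j = 0; j < 4; ++j) {
            double sum = 0.0;
            for (int l = 0; l < 5; ++l)
                sum += a[i][l] * b[l][j];
            double d = alpha * sum + beta * c_in - expected[i][j];
            if (d < 0) d = -d;
            if (d > maxdiff) maxdiff = d;
        }
    return maxdiff;
}
```

Because every value in this example is a small integer or 0.5, the arithmetic is exact in double precision and the function returns 0.0.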

Example 2

This example computes $C = \beta C + \alpha AB$ using a 2 × 2 process grid.

Call Statements and Input


 ORDER = 'R'
 NPROW = 2
 NPCOL = 2
 CALL BLACS_GET (0, 0, ICONTXT)
 CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
 CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
           TRANSA TRANSB M   N   K       ALPHA       A  IA  JA   DESC_A   B  IB  JB
              |     |    |   |   |         |         |   |   |     |      |   |   |
 CALL PZGEMM('N' , 'N' , 6 , 2 , 3 , (1.0D0,0.0D0) , A , 1 , 1 , DESC_A , B , 1 , 1 ,
 
              DESC_B       BETA        C  IC  JC   DESC_C
                |            |         |   |   |     |
              DESC_B , (2.0D0,0.0D0) , C , 1 , 1 , DESC_C)



          Desc_A         Desc_B         Desc_C
DTYPE_    1              1              1
CTXT_     icontxt (1)    icontxt (1)    icontxt (1)
M_        6              3              6
N_        3              2              2
MB_       2              2              2
NB_       2              2              2
RSRC_     0              0              0
CSRC_     0              0              0
LLD_      See below (2)  See below (2)  See below (2)

(1) icontxt is the output of the BLACS_GRIDINIT call.

(2) Each process should set the LLD_ as follows:

LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
LLD_B = MAX(1,NUMROC(M_B, MB_B, MYROW, RSRC_B, NPROW))
LLD_C = MAX(1,NUMROC(M_C, MB_C, MYROW, RSRC_C, NPROW))

In this example, LLD_A = 4 on P00 and P01 and LLD_A = 2 on P10 and P11. LLD_B = 2 on P00 and LLD_B = 1 on P10. LLD_C = 4 on P00 and LLD_C = 2 on P10.

Global general 6 × 3 matrix A with block size 2 × 2:

B,D              0                   1
     *                                      *
 0   |  (1.0,5.0)  (9.0,2.0)  |   (1.0,9.0) |
     |  (2.0,4.0)  (8.0,3.0)  |   (1.0,8.0) |
     | -----------------------|------------ |
 1   |  (3.0,3.0)  (7.0,5.0)  |   (1.0,7.0) |
     |  (4.0,2.0)  (4.0,7.0)  |   (1.0,5.0) |
     | -----------------------|------------ |
 2   |  (5.0,1.0)  (5.0,1.0)  |   (1.0,6.0) |
     |  (6.0,6.0)  (3.0,6.0)  |   (1.0,4.0) |
     *                                      *

The following is the 2 × 2 process grid:
B,D     0      1
0, 2    P00    P01
1       P10    P11

Local arrays for A:

p,q  |           0            |      1
-----|------------------------|-------------
     |  (1.0,5.0)  (9.0,2.0)  |   (1.0,9.0)
     |  (2.0,4.0)  (8.0,3.0)  |   (1.0,8.0)
 0   |  (5.0,1.0)  (5.0,1.0)  |   (1.0,6.0)
     |  (6.0,6.0)  (3.0,6.0)  |   (1.0,4.0)
-----|------------------------|-------------
 1   |  (3.0,3.0)  (7.0,5.0)  |   (1.0,7.0)
     |  (4.0,2.0)  (4.0,7.0)  |   (1.0,5.0)

Global general 3 × 2 matrix B with block size 2 × 2:

B,D              0
     *                       *
 0   |  (1.0,8.0)  (2.0,7.0) |
     |  (4.0,4.0)  (6.0,8.0) |
     | --------------------- |
 1   |  (6.0,2.0)  (4.0,5.0) |
     *                       *

The following is the 2 × 2 process grid:
B,D      0       --
0        P00     P01
1        P10     P11

Local arrays for B:

p,q  |           0
-----|-----------------------
 0   |  (1.0,8.0)  (2.0,7.0)
     |  (4.0,4.0)  (6.0,8.0)
-----|-----------------------
 1   |  (6.0,2.0)  (4.0,5.0)

Global general 6 × 2 matrix C with block size 2 × 2:

B,D              0
     *                       *
 0   |  (0.5,0.0)  (0.5,0.0) |
     |  (0.5,0.0)  (0.5,0.0) |
     | --------------------- |
 1   |  (0.5,0.0)  (0.5,0.0) |
     |  (0.5,0.0)  (0.5,0.0) |
     | --------------------- |
 2   |  (0.5,0.0)  (0.5,0.0) |
     |  (0.5,0.0)  (0.5,0.0) |
     *                       *

The following is the 2 × 2 process grid:
B,D      0       --
0, 2     P00     P01
1        P10     P11

Local arrays for C:

p,q  |           0
-----|-----------------------
     |  (0.5,0.0)  (0.5,0.0)
     |  (0.5,0.0)  (0.5,0.0)
 0   |  (0.5,0.0)  (0.5,0.0)
     |  (0.5,0.0)  (0.5,0.0)
-----|-----------------------
 1   |  (0.5,0.0)  (0.5,0.0)
     |  (0.5,0.0)  (0.5,0.0)

Output:

Global general 6 × 2 matrix C with block size 2 × 2:

B,D                  0
     *                               *
 0   |  (-22.0,113.0)  (-35.0,142.0) |
     |  (-19.0,114.0)  (-35.0,141.0) |
     | ----------------------------- |
 1   |  (-20.0,119.0)  (-43.0,146.0) |
     |  (-27.0,110.0)  (-58.0,131.0) |
     | ----------------------------- |
 2   |  (8.0,103.0)    (0.0,112.0)   |
     |  (-55.0,116.0)  (-75.0,135.0) |
     *                               *

The following is the 2 × 2 process grid:
B,D      0       --
0, 2     P00     P01
1        P10     P11

Local arrays for C:

p,q  |               0
-----|-------------------------------
     |  (-22.0,113.0)  (-35.0,142.0)
     |  (-19.0,114.0)  (-35.0,141.0)
 0   |  (8.0,103.0)    (0.0,112.0)
     |  (-55.0,116.0)  (-75.0,135.0)
-----|-------------------------------
 1   |  (-20.0,119.0)  (-43.0,146.0)
     |  (-27.0,110.0)  (-58.0,131.0)
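A serial reference can confirm the output above. The part of the call shown above passes beta = (2.0,0.0). This C sketch assumes alpha = (1.0,0.0) and no transposition of A or B, and under that assumption it reproduces the printed C. zgemm_nn_ref is a hypothetical name. It uses C99 complex arithmetic and column-major storage, as in Fortran.

```c
#include <assert.h>
#include <complex.h>

/* Serial reference for C <- alpha*A*B + beta*C with no transposition.
 * A is m x k, B is k x n, C is m x n; all column-major with leading
 * dimensions lda, ldb, ldc. */
void zgemm_nn_ref(int m, int n, int k, double complex alpha,
                  const double complex *a, int lda,
                  const double complex *b, int ldb,
                  double complex beta, double complex *c, int ldc)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++) {
            double complex sum = 0.0;
            for (int l = 0; l < k; l++)
                sum += a[i + l * lda] * b[l + j * ldb];
            c[i + j * ldc] = alpha * sum + beta * c[i + j * ldc];
        }
}
```

For example, row 1 of A times column 1 of B gives (-23.0,113.0). Adding beta × 0.5 = (1.0,0.0) yields the (-22.0,113.0) shown in the first position of C.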

PDSYMM--Matrix-Matrix Product Where One Matrix is Real Symmetric

This subroutine computes one of the following matrix-matrix products:

1. $C \leftarrow \alpha A B + \beta C$
2. $C \leftarrow \alpha B A + \beta C$

where, in the formulas above:

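A represents the global symmetric submatrix $A_{ia:ia+\mathit{numa}-1,\; ja:ja+\mathit{numa}-1}$, where numa = m if side = 'L' and numa = n if side = 'R'.
B represents the global general submatrix $B_{ib:ib+m-1,\; jb:jb+n-1}$.
C represents the global general submatrix $C_{ic:ic+m-1,\; jc:jc+n-1}$.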
alpha and beta are scalars.

In the following two cases, no computation is performed and the subroutine returns after doing some parameter checking:

  1. m = 0 or n = 0
  2. alpha = 0 and beta = 1

See references [14] and [15].
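A serial reference is useful for checking distributed results on small cases. This C sketch forms either product while reading only the triangle of A that uplo selects. It mirrors the side and uplo arguments described below but is not the Parallel ESSL algorithm. dsymm_ref is a hypothetical name. Storage is column-major and indices are 0-based.

```c
#include <assert.h>

/* Element (i, j) of a symmetric matrix when only the triangle selected by
 * uplo ('U' or 'L') is referenced; column-major storage, 0-based indices. */
static double sym_elem(char uplo, const double *a, int lda, int i, int j)
{
    int upper = (uplo == 'U' || uplo == 'u');
    if ((upper && i > j) || (!upper && i < j)) {
        int t = i;                        /* use the mirrored element */
        i = j;
        j = t;
    }
    return a[i + j * lda];
}

/* Serial reference for the two PDSYMM products (B and C are m x n):
 *   side = 'L': C <- alpha*A*B + beta*C, with A of order m
 *   side = 'R': C <- alpha*B*A + beta*C, with A of order n          */
void dsymm_ref(char side, char uplo, int m, int n, double alpha,
               const double *a, int lda, const double *b, int ldb,
               double beta, double *c, int ldc)
{
    assert(m >= 0 && n >= 0);
    int left = (side == 'L' || side == 'l');
    int k = left ? m : n;                 /* order of A (numa in the text) */
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++) {
            double sum = 0.0;
            for (int l = 0; l < k; l++)
                sum += left ? sym_elem(uplo, a, lda, i, l) * b[l + j * ldb]
                            : b[i + l * ldb] * sym_elem(uplo, a, lda, l, j);
            /* When beta is zero, C is not read, so it need not be set. */
            c[i + j * ldc] = (beta == 0.0) ? alpha * sum
                                           : alpha * sum + beta * c[i + j * ldc];
        }
}
```

Apply it to the first row of B from the example that follows (m = 1, n = 8, side = 'R', uplo = 'U', alpha = 1, beta = 0). It yields -1.0 0.0 0.0 1.0 -2.0 0.0 1.0 -1.0, which is the first row of the output matrix C.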

Table 48. Data Types
alpha, beta, A, B, C Subprogram
Long-precision real PDSYMM

Syntax

Fortran CALL PDSYMM (side, uplo, m, n, alpha, a, ia, ja, desc_a, b, ib, jb, desc_b, beta, c, ic, jc, desc_c)
C and C++ pdsymm (side, uplo, m, n, alpha, a, ia, ja, desc_a, b, ib, jb, desc_b, beta, c, ic, jc, desc_c);

On Entry

side

indicates whether A is located to the left or right of B in the equation used for this computation, where:

If side = 'L', A is to the left of B, resulting in equation 1.

If side = 'R', A is to the right of B, resulting in equation 2.

Scope: global

Specified as: a single character; side = 'L' or 'R'.

uplo

indicates whether the upper or lower triangular part of the global symmetric submatrix A is referenced, where:

If uplo = 'U', the upper triangular part is referenced.

If uplo = 'L', the lower triangular part is referenced.

Scope: global

Specified as: a single character; uplo = 'U' or 'L'.

m

is the number of rows in submatrices B and C used in the computation, and:

If side = 'L', it is the number of rows and columns in submatrix A used in the computation.

Scope: global

Specified as: a fullword integer; m >= 0.

n

is the number of columns in submatrices B and C used in the computation, and:

If side = 'R', it is the number of rows and columns in submatrix A used in the computation.

Scope: global

Specified as: a fullword integer; n >= 0.

alpha

is the scalar alpha.

Scope: global

Specified as: a number of the data type indicated in Table 48.

a

is the local part of the global symmetric matrix A. This identifies the first element of the local array A. This subroutine computes the location of the first element of the local subarray used, based on ia, ja, desc_a, p, q, myrow, and mycol; therefore, assuming the following:

If side = 'L', numa = m

If side = 'R', numa = n

the leading LOCp(ia+numa-1) by LOCq(ja+numa-1) part of the local array A must contain the local pieces of the leading ia+numa-1 by ja+numa-1 part of the global matrix, and:

Scope: local

Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 48. Details about the block-cyclic data distribution of global matrix A are stored in desc_a.

ia

is the row index of the global matrix A, identifying the first row of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ia <= M_A and ia+numa-1 <= M_A.

ja

is the column index of the global matrix A, identifying the first column of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ja <= N_A and ja+numa-1 <= N_A.

desc_a

is the array descriptor for global matrix A, described in the following table:
desc_a | Name | Description | Limits | Scope
1 | DTYPE_A | Descriptor type | DTYPE_A = 1 | Global
2 | CTXT_A | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global
3 | M_A | Number of rows in the global matrix | If m = 0 and side = 'L', or n = 0 and side = 'R': M_A >= 0; otherwise: M_A >= 1 | Global
4 | N_A | Number of columns in the global matrix | If m = 0 and side = 'L', or n = 0 and side = 'R': N_A >= 0; otherwise: N_A >= 1 | Global
5 | MB_A | Row block size | MB_A >= 1 | Global
6 | NB_A | Column block size | NB_A >= 1 | Global
7 | RSRC_A | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_A < p | Global
8 | CSRC_A | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_A < q | Global
9 | LLD_A | The leading dimension of the local array | LLD_A >= max(1,LOCp(M_A)) | Local

Specified as: an array of (at least) length 9, containing fullword integers.

b

is the local part of the global general matrix B. This identifies the first element of the local array B. This subroutine computes the location of the first element of the local subarray used, based on ib, jb, desc_b, p, q, myrow, and mycol; therefore, the leading LOCp(ib+m-1) by LOCq(jb+n-1) part of the local array B must contain the local pieces of the leading ib+m-1 by jb+n-1 part of the global matrix.

Scope: local

Specified as: an LLD_B by (at least) LOCq(N_B) array, containing numbers of the data type indicated in Table 48. Details about the block-cyclic data distribution of global matrix B are stored in desc_b.

ib

is the row index of the global matrix B, identifying the first row of the submatrix B.

Scope: global

Specified as: a fullword integer; 1 <= ib <= M_B and ib+m-1 <= M_B.

jb

is the column index of the global matrix B, identifying the first column of the submatrix B.

Scope: global

Specified as: a fullword integer; 1 <= jb <= N_B and jb+n-1 <= N_B.

desc_b

is the array descriptor for global matrix B, described in the following table:
desc_b | Name | Description | Limits | Scope
1 | DTYPE_B | Descriptor type | DTYPE_B = 1 | Global
2 | CTXT_B | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global
3 | M_B | Number of rows in the global matrix | If m = 0 or n = 0: M_B >= 0; otherwise: M_B >= 1 | Global
4 | N_B | Number of columns in the global matrix | If m = 0 or n = 0: N_B >= 0; otherwise: N_B >= 1 | Global
5 | MB_B | Row block size | MB_B >= 1 | Global
6 | NB_B | Column block size | NB_B >= 1 | Global
7 | RSRC_B | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_B < p | Global
8 | CSRC_B | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_B < q | Global
9 | LLD_B | The leading dimension of the local array | LLD_B >= max(1,LOCp(M_B)) | Local

Specified as: an array of (at least) length 9, containing fullword integers.

beta

is the scalar beta.

Scope: global

Specified as: a number of the data type indicated in Table 48.

c

is the local part of the global general matrix C. This identifies the first element of the local array C. This subroutine computes the location of the first element of the local subarray used, based on ic, jc, desc_c, p, q, myrow, and mycol; therefore, the leading LOCp(ic+m-1) by LOCq(jc+n-1) part of the local array C must contain the local pieces of the leading ic+m-1 by jc+n-1 part of the global matrix.

When beta is zero, C need not be set on input.

Scope: local

Specified as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 48. Details about the block-cyclic data distribution of global matrix C are stored in desc_c.

ic

is the row index of the global matrix C, identifying the first row of the submatrix C.

Scope: global

Specified as: a fullword integer; 1 <= ic <= M_C and ic+m-1 <= M_C.

jc

is the column index of the global matrix C, identifying the first column of the submatrix C.

Scope: global

Specified as: a fullword integer; 1 <= jc <= N_C and jc+n-1 <= N_C.

desc_c

is the array descriptor for global matrix C, described in the following table:
desc_c | Name | Description | Limits | Scope
1 | DTYPE_C | Descriptor type | DTYPE_C = 1 | Global
2 | CTXT_C | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global
3 | M_C | Number of rows in the global matrix | If m = 0 or n = 0: M_C >= 0; otherwise: M_C >= 1 | Global
4 | N_C | Number of columns in the global matrix | If m = 0 or n = 0: N_C >= 0; otherwise: N_C >= 1 | Global
5 | MB_C | Row block size | MB_C >= 1 | Global
6 | NB_C | Column block size | NB_C >= 1 | Global
7 | RSRC_C | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_C < p | Global
8 | CSRC_C | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_C < q | Global
9 | LLD_C | The leading dimension of the local array | LLD_C >= max(1,LOCp(M_C)) | Local

Specified as: an array of (at least) length 9, containing fullword integers.

On Return

c

is the updated local part of the global matrix C, containing the results of the computation.

Scope: local

Returned as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 48.

Notes and Coding Rules

  1. This subroutine accepts lowercase letters for the side and uplo arguments.

  2. The matrices must have no common elements; otherwise, results are unpredictable.

  3. The NUMROC utility subroutine can be used to determine the values of LOCp(M_) and LOCq(N_) used in the argument descriptions above. For details, see "Determining the Number of Rows and Columns in Your Local Arrays" and NUMROC--Compute the Number of Rows or Columns of a Block-Cyclically Distributed Matrix Contained in a Process.

  4. For suggested block sizes, see "Coding Tips for Optimizing Parallel Performance".

  5. The following values must be equal: CTXT_A = CTXT_B = CTXT_C.

  6. If side = 'L':

  7. If side = 'R':

  8. If all the following are true:

    then you must follow these rules:

  9. If the following is true:

    or if all the following are true:

    then you must follow these rules:

Error Conditions

Computational Errors

None

Resource Errors

Unable to allocate work space

Input-Argument and Miscellaneous Errors

Stage 1
  1. DTYPE_A is invalid.
  2. DTYPE_B is invalid.
  3. DTYPE_C is invalid.

Stage 2
  1. CTXT_A is invalid.

Stage 3
  1. PDSYMM was called from outside the process grid.

Stage 4
  1. side <> 'L' or 'R'
  2. uplo <> 'U' or 'L'
  3. m < 0
  4. n < 0
  5. M_A < 0 and m = 0 and side = 'L'; M_A < 0 and n = 0 and side = 'R'; M_A < 1 otherwise
  6. N_A < 0 and m = 0 and side = 'L'; N_A < 0 and n = 0 and side = 'R'; N_A < 1 otherwise
  7. MB_A < 1
  8. NB_A < 1
  9. RSRC_A < 0 or RSRC_A >= p
  10. CSRC_A < 0 or CSRC_A >= q
  11. ia < 1
  12. ja < 1
  13. M_B < 0 and (m = 0 or n = 0); M_B < 1 otherwise
  14. N_B < 0 and (m = 0 or n = 0); N_B < 1 otherwise
  15. MB_B < 1
  16. NB_B < 1
  17. RSRC_B < 0 or RSRC_B >= p
  18. CSRC_B < 0 or CSRC_B >= q
  19. ib < 1
  20. jb < 1
  21. M_C < 0 and (m = 0 or n = 0); M_C < 1 otherwise
  22. N_C < 0 and (m = 0 or n = 0); N_C < 1 otherwise
  23. MB_C < 1
  24. NB_C < 1
  25. RSRC_C < 0 or RSRC_C >= p
  26. CSRC_C < 0 or CSRC_C >= q
  27. ic < 1
  28. jc < 1
  29. CTXT_A <> CTXT_B
  30. CTXT_A <> CTXT_C

Stage 5

If (m <> 0 or side <> 'L') and (n <> 0 or side <> 'R'):

  1. ia > M_A
  2. ja > N_A
  3. ia+numa-1 > M_A
  4. ja+numa-1 > N_A

    where numa = m if side = 'L' and numa = n if side = 'R'.

If m <> 0 and n <> 0:

  1. ib > M_B
  2. jb > N_B
  3. ib+m-1 > M_B
  4. jb+n-1 > N_B
  5. ic > M_C
  6. jc > N_C
  7. ic+m-1 > M_C
  8. jc+n-1 > N_C

Stage 6

If A is contained within a single block, that is:

numa+mod(ia-1, MB_A) <= MB_A
numa+mod(ja-1, NB_A) <= NB_A
where:
If side = 'L', numa = m
If side = 'R', numa = n

and:

then:

If A is not contained within a single block, or if A is contained within a single block and:

then:

  1. MB_A <> NB_A
  2. mod(ia-1, MB_A) <> 0
  3. mod(ja-1, NB_A) <> 0

    If side = 'L':

  4. MB_B <> NB_A
  5. MB_C <> NB_A
  6. mod(ib-1, MB_B) <> 0
  7. mod(ic-1, MB_C) <> 0

    If side = 'R':

  8. NB_B <> MB_A
  9. NB_C <> MB_A
  10. mod(jb-1, NB_B) <> 0
  11. mod(jc-1, NB_C) <> 0

In all cases:

  1. LLD_A < max(1, LOCp(M_A))
  2. LLD_B < max(1, LOCp(M_B))
  3. LLD_C < max(1, LOCp(M_C))

    If side = 'L' and looping is required--that is, either of the following is true:

    n+mod(jb-1, NB_B) > NB_B
    n+mod(jc-1, NB_C) > NB_C

    then:

  4. NB_B <> NB_C
  5. mod(jb-1, NB_B) <> mod(jc-1, NB_C).

    If side = 'L':

  6. In the process grid, the process row containing the first row of the submatrix A does not contain the first row of the submatrix B; that is, iarow <> ibrow, where:
    iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
    ibrow = mod((((ib-1)/MB_B)+RSRC_B), p)
  7. In the process grid, the process row containing the first row of the submatrix A does not contain the first row of the submatrix C; that is, iarow <> icrow, where:
    iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
    icrow = mod((((ic-1)/MB_C)+RSRC_C), p)

    If side = 'R' and looping is required--that is, either of the following is true:

    m+mod(ib-1, MB_B) > MB_B
    m+mod(ic-1, MB_C) > MB_C

    then:

  8. MB_B <> MB_C
  9. mod(ib-1, MB_B) <> mod(ic-1, MB_C).

    If side = 'R':

  10. In the process grid, the process column containing the first column of the submatrix A does not contain the first column of the submatrix B; that is, iacol <> ibcol, where:
    iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
    ibcol = mod((((jb-1)/NB_B)+CSRC_B), q)
  11. In the process grid, the process column containing the first column of the submatrix A does not contain the first column of the submatrix C; that is, iacol <> iccol, where:
    iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
    iccol = mod((((jc-1)/NB_C)+CSRC_C), q)
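The process-row and process-column checks in Stage 6 can be evaluated before PDSYMM is called. This hypothetical C helper applies the iarow/ibrow/icrow formulas for side = 'L', or the iacol/ibcol/iccol formulas for side = 'R'. It is a sketch that covers only those alignment rules, not the block-size or looping rules.

```c
#include <assert.h>

/* Process row (or column) holding 1-based global index ig when blocks of
 * size nb are dealt cyclically over nprocs processes, starting at src.
 * This is the iarow formula: mod((((ia-1)/MB_A)+RSRC_A), p). */
static int proc_of(int ig, int nb, int src, int nprocs)
{
    assert(ig >= 1 && nb >= 1 && nprocs >= 1);
    return ((ig - 1) / nb + src) % nprocs;
}

/* Returns 1 if submatrices A, B, and C start in the same process row or
 * column, 0 otherwise.
 *   side = 'L': pass first rows, that is, ia, MB_A, RSRC_A, ..., p.
 *   side = 'R': pass first columns, that is, ja, NB_A, CSRC_A, ..., q. */
int first_procs_match(int ia, int nb_a, int src_a,
                      int ib, int nb_b, int src_b,
                      int ic, int nb_c, int src_c, int nprocs)
{
    int pa = proc_of(ia, nb_a, src_a, nprocs);
    return pa == proc_of(ib, nb_b, src_b, nprocs)
        && pa == proc_of(ic, nb_c, src_c, nprocs);
}
```

In the example that follows, side = 'R', ja = jb = jc = 1, and every CSRC_ is 0. So first_procs_match(1, 2, 0, 1, 2, 0, 1, 2, 0, 2) returns 1.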

Example

This example computes $C = \beta C + \alpha B A$ using a 2 × 2 process grid. Because beta = 0, C need not be set on input, so only A and B are shown.

Call Statements and Input


 ORDER = 'R'
 NPROW = 2
 NPCOL = 2
 CALL BLACS_GET(0, 0, ICONTXT)
 CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
 CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
              SIDE  UPLO   M   N    ALPHA    A  IA  JA   DESC_A   B  IB   JB
               |     |     |   |      |      |   |   |     |      |   |   |
 CALL PDSYMM( 'R' , 'U' , 16 , 8 ,  1.0D0  , A , 1 , 1 , DESC_A , B , 1 , 1 ,
 
              DESC_B   BETA    C  IC  JC   DESC_C
                |        |     |   |   |     |
              DESC_B , 0.0D0 , C , 1 , 1 , DESC_C )



       | Desc_A        | Desc_B        | Desc_C
DTYPE_ | 1             | 1             | 1
CTXT_  | icontxt (1)   | icontxt (1)   | icontxt (1)
M_     | 8             | 16            | 16
N_     | 8             | 8             | 8
MB_    | 2             | 4             | 4
NB_    | 2             | 2             | 2
RSRC_  | 0             | 0             | 0
CSRC_  | 0             | 0             | 0
LLD_   | See below (2) | See below (2) | See below (2)

(1) icontxt is the output of the BLACS_GRIDINIT call.

(2) Each process should set the LLD_ as follows:

LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
LLD_B = MAX(1,NUMROC(M_B, MB_B, MYROW, RSRC_B, NPROW))
LLD_C = MAX(1,NUMROC(M_C, MB_C, MYROW, RSRC_C, NPROW))

In this example, LLD_A = 4 on all processes, and LLD_B = LLD_C = 8 on all processes.

Global symmetric matrix A of order 8 with block size 2 × 2:

B,D        0             1             2             3
     *                                                     *
 0   |  0.0 -1.0  |  -1.0  0.0  |   0.0  0.0  |   0.0  0.0 |
     |   .   1.0  |   0.0  1.0  |   0.0  1.0  |   0.0  1.0 |
     | -----------|-------------|-------------|----------- |
 1   |   .   .    |  -1.0 -1.0  |   0.0  0.0  |   1.0  0.0 |
     |   .   .    |    .  -1.0  |   1.0  1.0  |   0.0  1.0 |
     | -----------|-------------|-------------|----------- |
 2   |   .    .   |    .    .   |  -1.0  0.0  |   0.0  0.0 |
     |   .    .   |    .    .   |    .   1.0  |   0.0  0.0 |
     | -----------|-------------|-------------|----------- |
 3   |   .    .   |    .    .   |    .    .   |   0.0  0.0 |
     |   .    .   |    .    .   |    .    .   |    .   0.0 |
     *                                                     *

The following is the 2 × 2 process grid:
B,D      0, 2    1, 3
0, 2     P00     P01
1, 3     P10     P11

Local arrays for A:

p,q  |          0           |           1
-----|----------------------|----------------------
     |  0.0 -1.0  0.0  0.0  |  -1.0  0.0  0.0  0.0
     |   .   1.0  0.0  1.0  |   0.0  1.0  0.0  1.0
 0   |   .    .  -1.0  0.0  |    .    .   0.0  0.0
     |   .    .    .   1.0  |    .    .   0.0  0.0
-----|----------------------|----------------------
     |   .    .   0.0  0.0  |  -1.0 -1.0  1.0  0.0
     |   .    .   1.0  1.0  |    .  -1.0  0.0  1.0
 1   |   .    .    .    .   |    .    .   0.0  0.0
     |   .    .    .    .   |    .    .    .   0.0

Global general 16 × 8 matrix B with block size 4 × 2:

B,D        0             1             2             3
     *                                                     *
     | -1.0  0.0  |   1.0 -1.0  |   1.0  1.0  |  -1.0 -1.0 |
     | -1.0 -1.0  |   1.0  0.0  |   1.0 -1.0  |  -1.0  1.0 |
 0   |  1.0  1.0  |  -1.0  0.0  |  -1.0  0.0  |   1.0  0.0 |
     |  0.0 -1.0  |   0.0  0.0  |   0.0  0.0  |   0.0 -1.0 |
     | -----------|-------------|-------------|----------- |
     |  0.0  1.0  |   0.0  1.0  |   0.0  1.0  |   1.0  0.0 |
     |  0.0  0.0  |   1.0  0.0  |  -1.0 -1.0  |   0.0  0.0 |
 1   |  1.0  1.0  |   0.0  0.0  |   1.0  1.0  |   0.0 -1.0 |
     |  0.0  0.0  |  -1.0  0.0  |   0.0  1.0  |   0.0  1.0 |
     | -----------|-------------|-------------|----------- |
     |  0.0  0.0  |   0.0 -1.0  |   1.0  1.0  |   0.0  1.0 |
     | -1.0 -1.0  |   1.0  0.0  |   0.0 -1.0  |   0.0  1.0 |
 2   |  0.0  0.0  |   0.0  1.0  |   1.0  0.0  |   0.0  0.0 |
     |  0.0  0.0  |   1.0  1.0  |   0.0 -1.0  |   0.0  0.0 |
     | -----------|-------------|-------------|----------- |
     |  1.0  1.0  |  -1.0  0.0  |  -1.0 -1.0  |   1.0  1.0 |
     |  0.0  0.0  |   0.0  0.0  |   1.0  0.0  |   0.0 -1.0 |
 3   |  0.0  1.0  |   0.0  0.0  |   0.0  0.0  |   0.0  0.0 |
     | -1.0  0.0  |  -1.0  0.0  |   0.0  1.0  |   1.0  0.0 |
     *                                                     *

The following is the 2 × 2 process grid:
B,D      0, 2    1, 3
0, 2     P00     P01
1, 3     P10     P11

Local arrays for B:

p,q  |          0           |           1
-----|----------------------|----------------------
     | -1.0  0.0  1.0  1.0  |   1.0 -1.0 -1.0 -1.0
     | -1.0 -1.0  1.0 -1.0  |   1.0  0.0 -1.0  1.0
     |  1.0  1.0 -1.0  0.0  |  -1.0  0.0  1.0  0.0
     |  0.0 -1.0  0.0  0.0  |   0.0  0.0  0.0 -1.0
 0   |  0.0  0.0  1.0  1.0  |   0.0 -1.0  0.0  1.0
     | -1.0 -1.0  0.0 -1.0  |   1.0  0.0  0.0  1.0
     |  0.0  0.0  1.0  0.0  |   0.0  1.0  0.0  0.0
     |  0.0  0.0  0.0 -1.0  |   1.0  1.0  0.0  0.0
-----|----------------------|----------------------
     |  0.0  1.0  0.0  1.0  |   0.0  1.0  1.0  0.0
     |  0.0  0.0 -1.0 -1.0  |   1.0  0.0  0.0  0.0
     |  1.0  1.0  1.0  1.0  |   0.0  0.0  0.0 -1.0
     |  0.0  0.0  0.0  1.0  |  -1.0  0.0  0.0  1.0
 1   |  1.0  1.0 -1.0 -1.0  |  -1.0  0.0  1.0  1.0
     |  0.0  0.0  1.0  0.0  |   0.0  0.0  0.0 -1.0
     |  0.0  1.0  0.0  0.0  |   0.0  0.0  0.0  0.0
     | -1.0  0.0  0.0  1.0  |  -1.0  0.0  1.0  0.0

Output:

Global general 16 × 8 matrix C with block size 4 × 2:

B,D        0             1             2             3
     *                                                     *
     | -1.0  0.0  |   0.0  1.0  |  -2.0  0.0  |   1.0 -1.0 |
     |  0.0  0.0  |  -1.0 -1.0  |  -1.0 -2.0  |   1.0 -1.0 |
 0   |  0.0  0.0  |   1.0  1.0  |   1.0  1.0  |  -1.0  1.0 |
     |  1.0 -2.0  |   0.0 -2.0  |   0.0 -1.0  |   0.0 -1.0 |
     | -----------|-------------|-------------|----------- |
     | -1.0  3.0  |   0.0  1.0  |   1.0  3.0  |   0.0  2.0 |
     | -1.0 -1.0  |  -1.0 -3.0  |   1.0 -1.0  |   1.0  0.0 |
 1   | -1.0  0.0  |  -1.0  2.0  |  -1.0  2.0  |   0.0  1.0 |
     |  1.0  2.0  |   1.0  3.0  |   0.0  1.0  |  -1.0  0.0 |
     | -----------|-------------|-------------|----------- |
     |  0.0  1.0  |   1.0  4.0  |  -2.0  0.0  |   0.0 -1.0 |
     |  0.0  0.0  |   0.0 -2.0  |   0.0 -2.0  |   1.0 -1.0 |
 2   |  0.0  1.0  |  -1.0  0.0  |   0.0  1.0  |   0.0  1.0 |
     | -1.0  0.0  |  -2.0 -3.0  |   1.0  0.0  |   1.0  1.0 |
     | -----------|-------------|-------------|----------- |
     |  0.0  0.0  |   1.0  1.0  |   1.0  0.0  |  -1.0  1.0 |
     |  0.0 -1.0  |   0.0  0.0  |  -1.0  0.0  |   0.0  0.0 |
 3   | -1.0  1.0  |   0.0  1.0  |   0.0  1.0  |   0.0  1.0 |
     |  1.0  2.0  |   3.0  2.0  |   0.0  1.0  |  -1.0  0.0 |
     *                                                     *

The following is the 2 × 2 process grid:
B,D      0, 2    1, 3
0, 2     P00     P01
1, 3     P10     P11

Local arrays for C:

p,q  |          0           |           1
-----|----------------------|----------------------
     | -1.0  0.0 -2.0  0.0  |   0.0  1.0  1.0 -1.0
     |  0.0  0.0 -1.0 -2.0  |  -1.0 -1.0  1.0 -1.0
     |  0.0  0.0  1.0  1.0  |   1.0  1.0 -1.0  1.0
     |  1.0 -2.0  0.0 -1.0  |   0.0 -2.0  0.0 -1.0
 0   |  0.0  1.0 -2.0  0.0  |   1.0  4.0  0.0 -1.0
     |  0.0  0.0  0.0 -2.0  |   0.0 -2.0  1.0 -1.0
     |  0.0  1.0  0.0  1.0  |  -1.0  0.0  0.0  1.0
     | -1.0  0.0  1.0  0.0  |  -2.0 -3.0  1.0  1.0
-----|----------------------|----------------------
     | -1.0  3.0  1.0  3.0  |   0.0  1.0  0.0  2.0
     | -1.0 -1.0  1.0 -1.0  |  -1.0 -3.0  1.0  0.0
     | -1.0  0.0 -1.0  2.0  |  -1.0  2.0  0.0  1.0
     |  1.0  2.0  0.0  1.0  |   1.0  3.0 -1.0  0.0
 1   |  0.0  0.0  1.0  0.0  |   1.0  1.0 -1.0  1.0
     |  0.0 -1.0 -1.0  0.0  |   0.0  0.0  0.0  0.0
     | -1.0  1.0  0.0  1.0  |   0.0  1.0  0.0  1.0
     |  1.0  2.0  0.0  1.0  |   3.0  2.0 -1.0  0.0

PDTRMM--Triangular Matrix-Matrix Product

This subroutine computes one of the following matrix-matrix products:
1. $B \leftarrow \alpha A B$
2. $B \leftarrow \alpha A^T B$
3. $B \leftarrow \alpha B A$
4. $B \leftarrow \alpha B A^T$

where, in the formulas above:

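A represents the global triangular submatrix $A_{ia:ia+\mathit{numa}-1,\; ja:ja+\mathit{numa}-1}$, where numa = m if side = 'L' and numa = n if side = 'R'.
B represents the global general submatrix $B_{ib:ib+m-1,\; jb:jb+n-1}$.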
alpha is a scalar.

Note: No data should be moved to form the matrix transpose; that is, the matrix should always be stored in its untransposed form.

If m = 0 or n = 0, no computation is performed, and the subroutine returns after doing some parameter checking.

See references [14] and [15].
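As a serial cross-check of the four products, the following C sketch reads only the triangle of A that uplo selects. It treats the diagonal as one when diag = 'U' and accepts 'C' as 'T', following the coding rules for this subroutine. dtrmm_ref is a hypothetical name and this is not the Parallel ESSL algorithm. Storage is column-major, and a work copy of B is used so that B can be overwritten.

```c
#include <assert.h>
#include <stdlib.h>

/* Element (i, j) of op(A), where A is triangular, column-major, 0-based.
 * Entries outside the triangle selected by uplo are taken as zero, and the
 * diagonal as one when diag = 'U', so those array elements are never read. */
static double tri_elem(char uplo, char transa, char diag,
                       const double *a, int lda, int i, int j)
{
    if (transa == 'T' || transa == 't' || transa == 'C' || transa == 'c') {
        int t = i;                        /* op(A)(i, j) = A(j, i) */
        i = j;
        j = t;
    }
    int upper = (uplo == 'U' || uplo == 'u');
    if (i == j && (diag == 'U' || diag == 'u'))
        return 1.0;
    if ((upper && i > j) || (!upper && i < j))
        return 0.0;
    return a[i + j * lda];
}

/* Serial reference: B <- alpha*op(A)*B if side = 'L' (A is m x m), or
 * B <- alpha*B*op(A) if side = 'R' (A is n x n). B is m x n, overwritten. */
void dtrmm_ref(char side, char uplo, char transa, char diag, int m, int n,
               double alpha, const double *a, int lda, double *b, int ldb)
{
    assert(m >= 0 && n >= 0);
    if (m == 0 || n == 0)
        return;                           /* nothing to compute */
    int left = (side == 'L' || side == 'l');
    int k = left ? m : n;                 /* order of A (numa in the text) */

    /* Packed work copy of the input B, so B can be overwritten safely. */
    double *w = malloc((size_t)m * (size_t)n * sizeof *w);
    if (w == NULL)
        abort();
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
            w[i + j * m] = b[i + j * ldb];

    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++) {
            double sum = 0.0;
            for (int l = 0; l < k; l++)
                sum += left ? tri_elem(uplo, transa, diag, a, lda, i, l) * w[l + j * m]
                            : w[i + l * m] * tri_elem(uplo, transa, diag, a, lda, l, j);
            b[i + j * ldb] = alpha * sum;
        }
    free(w);
}
```

For example, take the upper triangular A = [1 2; 0 3], stored column-major as {1.0, x, 2.0, 3.0} where x is never read. With side = 'L', transa = 'T', and B = [1; 1], the routine leaves B = [1; 5].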

Table 49. Data Types
alpha, A, B Subprogram
Long-precision real PDTRMM

Syntax

Fortran CALL PDTRMM (side, uplo, transa, diag, m, n, alpha, a, ia, ja, desc_a, b, ib, jb, desc_b)
C and C++ pdtrmm (side, uplo, transa, diag, m, n, alpha, a, ia, ja, desc_a, b, ib, jb, desc_b);

On Entry

side

indicates whether A is located to the left or right of B in the equation used for this computation, where:

If side = 'L', A is to the left of B, resulting in equation 1 or 2.

If side = 'R', A is to the right of B, resulting in equation 3 or 4.

Scope: global

Specified as: a single character; side = 'L' or 'R'.

uplo

indicates whether the upper or lower triangular part of the global triangular submatrix A is referenced, where:

If uplo = 'U', the upper triangular part is referenced.

If uplo = 'L', the lower triangular part is referenced.

Scope: global

Specified as: a single character; uplo = 'U' or 'L'.

transa

indicates the form of matrix A to use in the computation, where:

If transa = 'N', A is used in the computation, resulting in equation 1 or 3.

If transa = 'T', $A^T$ is used in the computation, resulting in equation 2 or 4.

Scope: global

Specified as: a single character; transa = 'N' or 'T'.

diag

indicates the characteristics of the diagonal of matrix A, where:

If diag = 'U', A is a unit triangular matrix.

If diag = 'N', A is not a unit triangular matrix.

Scope: global

Specified as: a single character; diag = 'U' or 'N'.

m

is the number of rows in submatrix B, and:

If side = 'L', it is the number of rows and columns in submatrix A used in the computation.

Scope: global

Specified as: a fullword integer; m >= 0.

n

is the number of columns in submatrix B, and:

If side = 'R', it is the number of rows and columns in submatrix A used in the computation.

Scope: global

Specified as: a fullword integer; n >= 0.

alpha

is the scalar alpha.

Scope: global

Specified as: a number of the data type indicated in Table 49.

a

is the local part of the global triangular matrix A. This identifies the first element of the local array A. This subroutine computes the location of the first element of the local subarray used, based on ia, ja, desc_a, p, q, myrow, and mycol; therefore, assuming the following:

If side = 'L', numa = m

If side = 'R', numa = n

the leading LOCp(ia+numa-1) by LOCq(ja+numa-1) part of the local array A must contain the local pieces of the leading ia+numa-1 by ja+numa-1 part of the global matrix, and:

Note: No data should be moved to form $A^T$; that is, the matrix A should always be stored in its untransposed form.

Scope: local

Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 49. Details about the square block-cyclic data distribution of global matrix A are stored in desc_a.

ia

is the row index of the global matrix A, identifying the first row of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ia <= M_A and ia+numa-1 <= M_A.

ja

is the column index of the global matrix A, identifying the first column of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ja <= N_A and ja+numa-1 <= N_A.

desc_a

is the array descriptor for global matrix A, described in the following table:
desc_a | Name | Description | Limits | Scope
1 | DTYPE_A | Descriptor type | DTYPE_A = 1 | Global
2 | CTXT_A | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global
3 | M_A | Number of rows in the global matrix | If m = 0 and side = 'L', or n = 0 and side = 'R': M_A >= 0; otherwise: M_A >= 1 | Global
4 | N_A | Number of columns in the global matrix | If m = 0 and side = 'L', or n = 0 and side = 'R': N_A >= 0; otherwise: N_A >= 1 | Global
5 | MB_A | Row block size | MB_A >= 1 | Global
6 | NB_A | Column block size | NB_A >= 1 | Global
7 | RSRC_A | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_A < p | Global
8 | CSRC_A | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_A < q | Global
9 | LLD_A | The leading dimension of the local array | LLD_A >= max(1,LOCp(M_A)) | Local

Specified as: an array of (at least) length 9, containing fullword integers.

b

is the local part of the global general matrix B. This identifies the first element of the local array B. This subroutine computes the location of the first element of the local subarray used, based on ib, jb, desc_b, p, q, myrow, and mycol; therefore, the leading LOCp(ib+m-1) by LOCq(jb+n-1) part of the local array B must contain the local pieces of the leading ib+m-1 by jb+n-1 part of the global matrix.

Scope: local

Specified as: an LLD_B by (at least) LOCq(N_B) array, containing numbers of the data type indicated in Table 49. Details about the block-cyclic data distribution of global matrix B are stored in desc_b.

ib

is the row index of the global matrix B, identifying the first row of the submatrix B.

Scope: global

Specified as: a fullword integer; 1 <= ib <= M_B and ib+m-1 <= M_B.

jb

is the column index of the global matrix B, identifying the first column of the submatrix B.

Scope: global

Specified as: a fullword integer; 1 <= jb <= N_B and jb+n-1 <= N_B.

desc_b

is the array descriptor for global matrix B, described in the following table:
desc_b | Name | Description | Limits | Scope
1 | DTYPE_B | Descriptor type | DTYPE_B = 1 | Global
2 | CTXT_B | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global
3 | M_B | Number of rows in the global matrix | If m = 0 or n = 0: M_B >= 0; otherwise: M_B >= 1 | Global
4 | N_B | Number of columns in the global matrix | If m = 0 or n = 0: N_B >= 0; otherwise: N_B >= 1 | Global
5 | MB_B | Row block size | MB_B >= 1 | Global
6 | NB_B | Column block size | NB_B >= 1 | Global
7 | RSRC_B | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_B < p | Global
8 | CSRC_B | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_B < q | Global
9 | LLD_B | The leading dimension of the local array | LLD_B >= max(1,LOCp(M_B)) | Local

Specified as: an array of (at least) length 9, containing fullword integers.

On Return

b

is the updated local part of the global matrix B, containing the results of the computation.

Scope: local

Returned as: an LLD_B by (at least) LOCq(N_B) array, containing numbers of the data type indicated in Table 49.

Notes and Coding Rules

  1. This subroutine accepts lowercase letters for the side, uplo, transa, and diag arguments.

  2. If you specify 'C' for transa, it is interpreted as though you specified 'T'.

  3. The matrices must have no common elements; otherwise, results are unpredictable.

  4. PDTRMM assumes certain values in your array for parts of a triangular matrix. As a result, you do not have to set these values. For unit triangular matrices, the elements of the diagonal are assumed to be one. When using an upper or lower triangular matrix, the unreferenced elements in the lower and upper triangular part, respectively, are assumed to be zero.

  5. The NUMROC utility subroutine can be used to determine the values of LOCp(M_) and LOCq(N_) used in the argument descriptions above. For details, see "Determining the Number of Rows and Columns in Your Local Arrays" and NUMROC--Compute the Number of Rows or Columns of a Block-Cyclically Distributed Matrix Contained in a Process.

  6. For suggested block sizes, see "Coding Tips for Optimizing Parallel Performance".

  7. The following values must be equal: CTXT_A = CTXT_B.

  8. If A is not contained within a single block, that is:
    numa+mod(ia-1, MB_A) > MB_A
    numa+mod(ja-1, NB_A) > NB_A
    where:
    If side = 'L', numa = m
    If side = 'R', numa = n

    then:

  9. If side = 'L':

  10. If side = 'R':

Error Conditions

Computational Errors

None

Resource Errors

Unable to allocate work space

Input-Argument and Miscellaneous Errors

Stage 1
  1. DTYPE_A is invalid.
  2. DTYPE_B is invalid.

Stage 2
  1. CTXT_A is invalid.

Stage 3
  1. PDTRMM was called from outside the process grid.

Stage 4
  1. side <> 'L' or 'R'
  2. uplo <> 'U' or 'L'
  3. transa <> 'N', 'T', or 'C'
  4. diag <> 'N' or 'U'
  5. m < 0
  6. n < 0
  7. M_A < 0 and m = 0 and side = 'L'; M_A < 0 and n = 0 and side = 'R'; M_A < 1 otherwise
  8. N_A < 0 and m = 0 and side = 'L'; N_A < 0 and n = 0 and side = 'R'; N_A < 1 otherwise
  9. MB_A < 1
  10. NB_A < 1
  11. M_B < 0 and (m = 0 or n = 0); M_B < 1 otherwise
  12. N_B < 0 and (m = 0 or n = 0); N_B < 1 otherwise
  13. MB_B < 1
  14. NB_B < 1
  15. RSRC_A < 0 or RSRC_A >= p
  16. CSRC_A < 0 or CSRC_A >= q
  17. RSRC_B < 0 or RSRC_B >= p
  18. CSRC_B < 0 or CSRC_B >= q
  19. ia < 1
  20. ja < 1
  21. ib < 1
  22. jb < 1
  23. CTXT_A <> CTXT_B

Stage 5
  1. MB_A <> NB_A

    If A is not contained within a single block, that is:

    numa+mod(ia-1, MB_A) > MB_A
    numa+mod(ja-1, NB_A) > NB_A
    where:
    If side = 'L', numa = m
    If side = 'R', numa = n

    and:

  2. side = 'L' and MB_B <> NB_A
  3. side = 'R' and NB_B <> MB_A

If (m <> 0 or side <> 'L') and (n <> 0 or side <> 'R'):

  1. ia > M_A
  2. ja > N_A
  3. ia+numa-1 > M_A
  4. ja+numa-1 > N_A

    where numa = m if side = 'L' and numa = n if side = 'R'.

If m <> 0 and n <> 0:

  1. ib > M_B
  2. jb > N_B
  3. ib+m-1 > M_B
  4. jb+n-1 > N_B

If A is not contained in a single block:

  1. mod(ia-1, MB_A) <> 0
  2. mod(ja-1, NB_A) <> 0
  3. side = 'L' and mod(ib-1, MB_B) <> 0
  4. side = 'R' and mod(jb-1, NB_B) <> 0

Stage 6
  1. LLD_A < max(1, LOCp(M_A))
  2. LLD_B < max(1, LOCp(M_B))

    If side = 'L':

  3. In the process grid, the process row containing the first row of the submatrix A does not contain the first row of the submatrix B; that is, iarow <> ibrow, where:
    iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
    ibrow = mod((((ib-1)/MB_B)+RSRC_B), p)
  4. If A is contained in a single block:
    p > 1 and m+mod(ib-1, MB_B) > MB_B

    If side = 'R':

  5. In the process grid, the process column containing the first column of the submatrix A does not contain the first column of the submatrix B; that is, iacol <> ibcol, where:
    iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
    ibcol = mod((((jb-1)/NB_B)+CSRC_B), q)
  6. If A is contained in a single block:
    q > 1 and n+mod(jb-1, NB_B) > NB_B
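The Stage 6 placement checks for side = 'L' come down to a little integer arithmetic. A minimal Python sketch follows. `owner` and `stage6_side_left_errors` are hypothetical names, not Parallel ESSL routines. The side = 'R' checks are the same, with columns, NB_, CSRC_, and q substituted for rows, MB_, RSRC_, and p.

```python
# Minimal sketch (hypothetical helper names) of the Stage 6 placement checks
# for side = 'L'. Indices are 1-based; // is integer division, % is mod.

def owner(i, block, src, nprocs):
    """Process row (or column) owning global row (or column) i:
    mod(((i-1)/block)+src, nprocs)."""
    return ((i - 1) // block + src) % nprocs

def stage6_side_left_errors(ia, mb_a, rsrc_a, ib, mb_b, rsrc_b, m, p,
                            a_single_block):
    """Return the Stage 6 conditions (side = 'L') that the arguments violate."""
    errors = []
    iarow = owner(ia, mb_a, rsrc_a, p)
    ibrow = owner(ib, mb_b, rsrc_b, p)
    if iarow != ibrow:
        errors.append("iarow <> ibrow")
    if a_single_block and p > 1 and m + (ib - 1) % mb_b > mb_b:
        errors.append("p > 1 and m+mod(ib-1, MB_B) > MB_B")
    return errors

# Example values: ia = ib = 1, MB_A = MB_B = 2, sources 0, p = 2.
print(stage6_side_left_errors(1, 2, 0, 1, 2, 0, 5, 2, False))  # []
# Starting B at row 3 moves its first row to process row 1.
print(stage6_side_left_errors(1, 2, 0, 3, 2, 0, 3, 2, False))  # ['iarow <> ibrow']
```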

Example

This example computes $B \leftarrow \alpha AB$ using a 2 × 2 process grid.

Call Statements and Input


 ORDER = 'R'
 NPROW = 2
 NPCOL = 2
 CALL BLACS_GET(0, 0, ICONTXT)
 CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
 CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
              SIDE  UPLO  TRANSA  DIAG  M   N    ALPHA    A  IA  JA   DESC_A
               |     |      |      |    |   |      |      |   |   |     |
 CALL PDTRMM( 'L' , 'U'  , 'N'  , 'N' , 5 , 3 ,  1.0D0  , A , 1 , 1 , DESC_A ,
 
              B  IB  JB   DESC_B
              |   |   |     |
              B , 1 , 1 , DESC_B )



         Desc_A          Desc_B
DTYPE_   1               1
CTXT_    icontxt (1)     icontxt (1)
M_       5               5
N_       5               3
MB_      2               2
NB_      2               2
RSRC_    0               0
CSRC_    0               0
LLD_     See below (2)   See below (2)

(1) icontxt is the output of the BLACS_GRIDINIT call.

(2) Each process should set the LLD_ as follows:

LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
LLD_B = MAX(1,NUMROC(M_B, MB_B, MYROW, RSRC_B, NPROW))

In this example, LLD_A = LLD_B = 3 on P00 and P01, and LLD_A = LLD_B = 2 on P10 and P11.
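The LLD_ values quoted above can be reproduced serially. This Python sketch follows the algorithm of the ScaLAPACK TOOLS routine NUMROC, which takes the same arguments: full blocks are dealt round-robin starting at the source process, and the partial last block goes to the next process in turn. It is an illustration under that assumption, not the Parallel ESSL implementation.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Rows (or columns) of an n-long block-cyclically distributed dimension
    stored on process iproc, with block size nb and source process isrcproc."""
    # Distance of this process from the process holding the first block.
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb                      # number of full blocks
    count = (nblocks // nprocs) * nb       # full blocks every process gets
    extrablks = nblocks % nprocs           # leftover full blocks
    if mydist < extrablks:
        count += nb                        # one extra full block
    elif mydist == extrablks:
        count += n % nb                    # the trailing partial block
    return count

# LLD_A = LLD_B = MAX(1, NUMROC(5, 2, MYROW, 0, 2)) in this example.
for myrow in range(2):
    print(myrow, max(1, numroc(5, 2, myrow, 0, 2)))
# prints "0 3" then "1 2"
```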

Global triangular matrix A of order 5 is upper triangular with block size 2 × 2:

B,D        0             1          2
     *                                  *
 0   |  3.0 -1.0  |   2.0  2.0  |   1.0 |
     |   .  -2.0  |   4.0 -1.0  |   3.0 |
     | -----------|-------------|------ |
 1   |   .    .   |  -3.0  0.0  |   2.0 |
     |   .    .   |    .   4.0  |  -2.0 |
     | -----------|-------------|------ |
 2   |   .    .   |    .    .   |   1.0 |
     *                                  *

The following is the 2 × 2 process grid:

B,D    |  0, 2  |  1
-------|--------|------
0, 2   |  P00   |  P01
-------|--------|------
1      |  P10   |  P11

Local arrays for A:

p,q  |       0         |      1
-----|-----------------|------------
     |  3.0 -1.0  1.0  |   2.0  2.0
 0   |   .  -2.0  3.0  |   4.0 -1.0
     |   .    .   1.0  |    .    .
-----|-----------------|------------
 1   |   .    .   2.0  |  -3.0  0.0
     |   .    .  -2.0  |    .   4.0

Global rectangular 5 × 3 matrix B with block size 2 × 2:

B,D        0          1
     *                    *
 0   |  2.0  3.0  |   1.0 |
     |  5.0  5.0  |   4.0 |
     | -----------|------ |
 1   |  0.0  1.0  |   2.0 |
     |  3.0  1.0  |  -3.0 |
     | -----------|------ |
 2   | -1.0  2.0  |   1.0 |
     *                    *

The following is the 2 × 2 process grid:

B,D    |  0     |  1
-------|--------|------
0, 2   |  P00   |  P01
-------|--------|------
1      |  P10   |  P11

Local arrays for B:

p,q  |     0      |   1
-----|------------|-------
     |  2.0  3.0  |   1.0
 0   |  5.0  5.0  |   4.0
     | -1.0  2.0  |   1.0
-----|------------|-------
 1   |  0.0  1.0  |   2.0
     |  3.0  1.0  |  -3.0

Output:

Global rectangular 5 × 3 matrix B with block size 2 × 2:

B,D         0            1
     *                       *
 0   |   6.0  10.0  |   -2.0 |
     | -16.0  -1.0  |    6.0 |
     | -------------|------- |
 1   |  -2.0   1.0  |   -4.0 |
     |  14.0   0.0  |  -14.0 |
     | -------------|------- |
 2   |  -1.0   2.0  |    1.0 |
     *                       *

The following is the 2 × 2 process grid:

B,D    |  0     |  1
-------|--------|------
0, 2   |  P00   |  P01
-------|--------|------
1      |  P10   |  P11

Local arrays for B:

p,q  |      0       |    1
-----|--------------|--------
     |   6.0  10.0  |   -2.0
 0   | -16.0  -1.0  |    6.0
     |  -1.0   2.0  |    1.0
-----|--------------|--------
 1   |  -2.0   1.0  |   -4.0
     |  14.0   0.0  |  -14.0
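As a serial cross-check of this example (not a parallel implementation), the following Python sketch applies the same operation to the global matrices shown above: side = 'L', uplo = 'U', transa = 'N', diag = 'N', alpha = 1.0. The function name `trmm_left_upper` is hypothetical. Like PDTRMM, it reads only the upper triangle of A.

```python
# Serial reference for the example: B <- alpha*A*B with side='L', uplo='U',
# transa='N', diag='N'. Only the upper triangle of A is read.

def trmm_left_upper(alpha, a, b, unit_diag=False):
    m, n = len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # Diagonal term (taken as 1 for a unit triangular matrix).
            s = b[i][j] if unit_diag else a[i][i] * b[i][j]
            for k in range(i + 1, m):     # strictly upper part of row i
                s += a[i][k] * b[k][j]
            out[i][j] = alpha * s
    return out

# Global A and B from the example; 0.0 stands in for the unreferenced '.'.
A = [[3.0, -1.0,  2.0,  2.0,  1.0],
     [0.0, -2.0,  4.0, -1.0,  3.0],
     [0.0,  0.0, -3.0,  0.0,  2.0],
     [0.0,  0.0,  0.0,  4.0, -2.0],
     [0.0,  0.0,  0.0,  0.0,  1.0]]
B = [[2.0, 3.0, 1.0], [5.0, 5.0, 4.0], [0.0, 1.0, 2.0],
     [3.0, 1.0, -3.0], [-1.0, 2.0, 1.0]]

for row in trmm_left_upper(1.0, A, B):
    print(row)
```

The printed rows match the output matrix B above, row for row.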

PDTRSM--Solution of Triangular System of Equations with Multiple Right-Hand Sides

This subroutine performs one of the following solves for a triangular system of equations with multiple right-hand sides:
Solution                              Equation
1. $B \leftarrow \alpha A^{-1}B$      $AX = \alpha B$
2. $B \leftarrow \alpha A^{-T}B$      $A^{T}X = \alpha B$
3. $B \leftarrow \alpha BA^{-1}$      $XA = \alpha B$
4. $B \leftarrow \alpha BA^{-T}$      $XA^{T} = \alpha B$

where, in the formulas above:

A represents the global triangular submatrix $A_{ia:ia+numa-1,\ ja:ja+numa-1}$, where numa = m if side = 'L' and numa = n if side = 'R'.
B represents the global general submatrix $B_{ib:ib+m-1,\ jb:jb+n-1}$.
alpha is a scalar.
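When A is upper triangular, solution 1 reduces column by column to back substitution. A minimal serial Python sketch of that case follows. The name `trsm_left_upper` is hypothetical. The sketch handles only side = 'L', uplo = 'U', transa = 'N', and it says nothing about how PDTRSM distributes the work.

```python
# Minimal serial sketch of solution 1 (side='L', uplo='U', transa='N'):
# solve A*X = alpha*B by back substitution, one column of B at a time.
# A copy of B is overwritten with X, much as PDTRSM overwrites b.

def trsm_left_upper(alpha, a, b, unit_diag=False):
    m, n = len(b), len(b[0])
    x = [[alpha * b[i][j] for j in range(n)] for i in range(m)]
    for j in range(n):
        for i in range(m - 1, -1, -1):    # last row first
            s = x[i][j]
            for k in range(i + 1, m):     # subtract already-solved unknowns
                s -= a[i][k] * x[k][j]
            x[i][j] = s if unit_diag else s / a[i][i]
    return x

# 2-by-2 check: [[2, 1], [0, 4]] * X = [[4], [8]]  gives  X = [[1], [2]]
print(trsm_left_upper(1.0, [[2.0, 1.0], [0.0, 4.0]], [[4.0], [8.0]]))
```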

Notes:

  1. The term X used in the systems of equations listed above represents the output solution matrix. In this subroutine, X is returned in the input-output argument b, overwriting the right-hand sides.

  2. No data should be moved to form the matrix transpose; that is, the matrix should always be stored in its untransposed form.

If m = 0 or n = 0, no computation is performed, and the subroutine returns after doing some parameter checking.

See references [14] and [15].

Table 50. Data Types
alpha, A, B Subprogram
Long-precision real PDTRSM

Syntax

Fortran CALL PDTRSM (side, uplo, transa, diag, m, n, alpha, a, ia, ja, desc_a, b, ib, jb, desc_b)
C and C++ pdtrsm (side, uplo, transa, diag, m, n, alpha, a, ia, ja, desc_a, b, ib, jb, desc_b);

On Entry

side

indicates whether A is located to the left or right of B in the system of equations, where:

If side = 'L', A is to the left of B, resulting in solution 1 or 2.

If side = 'R', A is to the right of B, resulting in solution 3 or 4.

Scope: global

Specified as: a single character; side = 'L' or 'R'.

uplo

indicates whether the upper or lower triangular part of the global triangular submatrix A is referenced, where:

If uplo = 'U', the upper triangular part is referenced.

If uplo = 'L', the lower triangular part is referenced.

Scope: global

Specified as: a single character; uplo = 'U' or 'L'.

transa

indicates the form of matrix A used in the system of equations, where:

If transa = 'N', A is used in the system of equations, resulting in solution 1 or 3.

If transa = 'T', $A^{T}$ is used in the system of equations, resulting in solution 2 or 4.

Scope: global

Specified as: a single character; transa = 'N' or 'T'.

diag

indicates the characteristics of the diagonal of matrix A, where:

If diag = 'U', A is a unit triangular matrix.

If diag = 'N', A is not a unit triangular matrix.

Scope: global

Specified as: a single character; diag = 'U' or 'N'.

m

is the number of rows in submatrix B, and:

If side = 'L', it is the number of rows and columns in submatrix A used in the computation.

Scope: global

Specified as: a fullword integer; m >= 0.

n

is the number of columns in submatrix B, and:

If side = 'R', it is the number of rows and columns in submatrix A used in the computation.

Scope: global

Specified as: a fullword integer; n >= 0.

alpha

is the scalar alpha.

Scope: global

Specified as: a number of the data type indicated in Table 50.

a

is the local part of the global triangular matrix A, used in the system of equations. This identifies the first element of the local array A. This subroutine computes the location of the first element of the local subarray used, based on ia, ja, desc_a, p, q, myrow, and mycol; therefore, assuming the following:

If side = 'L', numa = m

If side = 'R', numa = n

the leading LOCp(ia+numa-1) by LOCq(ja+numa-1) part of the local array A must contain the local pieces of the leading ia+numa-1 by ja+numa-1 part of the global matrix.

Note: No data should be moved to form $A^{T}$; that is, the matrix A should always be stored in its untransposed form.

Scope: local

Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 50. Details about the block-cyclic data distribution of global matrix A are stored in desc_a.

ia

is the row index of the global matrix A, identifying the first row of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ia <= M_A and ia+numa-1 <= M_A.

ja

is the column index of the global matrix A, identifying the first column of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ja <= N_A and ja+numa-1 <= N_A.

desc_a

is the array descriptor for global matrix A, described in the following table:
desc_a Name Description Limits Scope
1 DTYPE_A Descriptor type DTYPE_A=1 Global
2 CTXT_A BLACS context Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP Global
3 M_A Number of rows in the global matrix
If side = 'L' and m = 0:
     M_A >= 0
If side = 'R' and n = 0:
     M_A >= 0
Otherwise:
     M_A >= 1

Global
4 N_A Number of columns in the global matrix
If side = 'L' and m = 0:
     N_A >= 0
If side = 'R' and n = 0:
     N_A >= 0
Otherwise:
     N_A >= 1

Global
5 MB_A Row block size MB_A >= 1 Global
6 NB_A Column block size NB_A >= 1 Global
7 RSRC_A The process row of the p × q grid over which the first row of the global matrix is distributed 0 <= RSRC_A < p Global
8 CSRC_A The process column of the p × q grid over which the first column of the global matrix is distributed 0 <= CSRC_A < q Global
9 LLD_A The leading dimension of the local array LLD_A >= max(1,LOCp(M_A)) Local

Specified as: an array of (at least) length 9, containing fullword integers.

b

is the local part of the global general matrix B, containing the right-hand sides of the triangular system to be solved. This identifies the first element of the local array B. This subroutine computes the location of the first element of the local subarray used, based on ib, jb, desc_b, p, q, myrow, and mycol; therefore, the leading LOCp(ib+m-1) by LOCq(jb+n-1) part of the local array B must contain the local pieces of the leading ib+m-1 by jb+n-1 part of the global matrix.

Scope: local

Specified as: an LLD_B by (at least) LOCq(N_B) array, containing numbers of the data type indicated in Table 50. Details about the block-cyclic data distribution of global matrix B are stored in desc_b.

ib

is the row index of the global matrix B, identifying the first row of the submatrix B.

Scope: global

Specified as: a fullword integer; 1 <= ib <= M_B and ib+m-1 <= M_B.

jb

is the column index of the global matrix B, identifying the first column of the submatrix B.

Scope: global

Specified as: a fullword integer; 1 <= jb <= N_B and jb+n-1 <= N_B.

desc_b

is the array descriptor for global matrix B, described in the following table:
desc_b Name Description Limits Scope
1 DTYPE_B Descriptor type DTYPE_B=1 Global
2 CTXT_B BLACS context Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP Global
3 M_B Number of rows in the global matrix
If side = 'L' and m = 0:
     M_B >= 0
If side = 'R' and n = 0:
     M_B >= 0
Otherwise:
     M_B >= 1

Global
4 N_B Number of columns in the global matrix
If side = 'L' and m = 0:
     N_B >= 0
If side = 'R' and n = 0:
     N_B >= 0
Otherwise:
     N_B >= 1

Global
5 MB_B Row block size MB_B >= 1 Global
6 NB_B Column block size NB_B >= 1 Global
7 RSRC_B The process row of the p × q grid over which the first row of the global matrix is distributed 0 <= RSRC_B < p Global
8 CSRC_B The process column of the p × q grid over which the first column of the global matrix is distributed 0 <= CSRC_B < q Global
9 LLD_B The leading dimension of the local array LLD_B >= max(1,LOCp(M_B)) Local

Specified as: an array of (at least) length 9, containing fullword integers.

On Return

b

is the updated local part of the global matrix B, containing the m by n solution matrix X.

Scope: local

Returned as: an LLD_B by (at least) LOCq(N_B) array, containing numbers of the data type indicated in Table 50.

Notes and Coding Rules

  1. This subroutine accepts lowercase letters for the side, uplo, transa, and diag arguments.

  2. If you specify 'C' for transa, it is interpreted as though you specified 'T'.

  3. The matrices must have no common elements; otherwise, results are unpredictable.

  4. PDTRSM assumes certain values in your array for parts of a triangular matrix. As a result, you do not have to set these values. For unit triangular matrices, the elements of the diagonal are assumed to be one. When using an upper or lower triangular matrix, the unreferenced elements in the lower and upper triangular part, respectively, are assumed to be zero.

  5. The NUMROC utility subroutine can be used to determine the values of LOCp(M_) and LOCq(N_) used in the argument descriptions above. For details, see "Determining the Number of Rows and Columns in Your Local Arrays" and NUMROC--Compute the Number of Rows or Columns of a Block-Cyclically Distributed Matrix Contained in a Process.

  6. For suggested block sizes, see "Coding Tips for Optimizing Parallel Performance".

  7. The following values must be equal: CTXT_A = CTXT_B.

  8. If looping is required--that is, either of the following is true:
    side = 'L' and m+mod(ia-1, MB_A) > MB_A
    side = 'R' and n+mod(ja-1, NB_A) > NB_A

    then the global triangular matrix A must be distributed using a square block-cyclic distribution; that is, MB_A = NB_A.

  9. If A is not contained within a single block, that is:
    numa+mod(ia-1, MB_A) > MB_A
    numa+mod(ja-1, NB_A) > NB_A
    where:
    If side = 'L', numa = m
    If side = 'R', numa = n

    then the global triangular matrix A must be aligned on a block boundary, that is:

    ia-1 must be a multiple of MB_A.
    ja-1 must be a multiple of NB_A.

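  10. If side = 'L':
    If A is not contained within a single block, then MB_B = NB_A, and ib-1 must be a multiple of MB_B.
    In the process grid, the process row containing the first row of the submatrix A must also contain the first row of the submatrix B.

  11. If side = 'R':
    If A is not contained within a single block, then NB_B = MB_A, and jb-1 must be a multiple of NB_B.
    In the process grid, the process column containing the first column of the submatrix A must also contain the first column of the submatrix B.

  12. If A is contained within a single block, then:
    If side = 'L' and p > 1, m+mod(ib-1, MB_B) <= MB_B.
    If side = 'R' and q > 1, n+mod(jb-1, NB_B) <= NB_B.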

Error Conditions

Computational Errors

None

Resource Errors

Unable to allocate work space

Input-Argument and Miscellaneous Errors

Stage 1
  1. DTYPE_A is invalid.
  2. DTYPE_B is invalid.

Stage 2
  1. CTXT_A is invalid.

Stage 3
  1. PDTRSM was called from outside the process grid.

Stage 4
  1. side <> 'L' or 'R'
  2. uplo <> 'U' or 'L'
  3. transa <> 'N', 'T', or 'C'
  4. diag <> 'N' or 'U'
  5. m < 0
  6. n < 0
  7. M_A < 0 and m = 0 and side = 'L'; M_A < 0 and n = 0 and side = 'R'; M_A < 1 otherwise
  8. N_A < 0 and m = 0 and side = 'L'; N_A < 0 and n = 0 and side = 'R'; N_A < 1 otherwise
  9. MB_A < 1
  10. NB_A < 1
  11. M_B < 0 and (m = 0 or n = 0); M_B < 1 otherwise
  12. N_B < 0 and (m = 0 or n = 0); N_B < 1 otherwise
  13. MB_B < 1
  14. NB_B < 1
  15. RSRC_A < 0 or RSRC_A >= p
  16. CSRC_A < 0 or CSRC_A >= q
  17. RSRC_B < 0 or RSRC_B >= p
  18. CSRC_B < 0 or CSRC_B >= q
  19. ia < 1
  20. ja < 1
  21. ib < 1
  22. jb < 1
  23. CTXT_A <> CTXT_B

Stage 5

If A is not contained within a single block, that is:

numa+mod(ia-1, MB_A) > MB_A
numa+mod(ja-1, NB_A) > NB_A
where:
If side = 'L', numa = m
If side = 'R', numa = n

then:

  1. MB_A <> NB_A
  2. side = 'L' and MB_B <> NB_A
  3. side = 'R' and NB_B <> MB_A

If (m <> 0 or side <> 'L') and (n <> 0 or side <> 'R'):

  1. ia > M_A
  2. ja > N_A
  3. ia+numa-1 > M_A
  4. ja+numa-1 > N_A

    where numa = m if side = 'L' and numa = n if side = 'R'.

If m <> 0 and n <> 0:

  1. ib > M_B
  2. jb > N_B
  3. ib+m-1 > M_B
  4. jb+n-1 > N_B

If A is not contained in a single block:

  1. mod(ia-1, MB_A) <> 0
  2. mod(ja-1, NB_A) <> 0
  3. side = 'L' and mod(ib-1, MB_B) <> 0
  4. side = 'R' and mod(jb-1, NB_B) <> 0

Stage 6
  1. LLD_A < max(1, LOCp(M_A))
  2. LLD_B < max(1, LOCp(M_B))

If side = 'L':

  1. In the process grid, the process row containing the first row of the submatrix A does not contain the first row of the submatrix B; that is, iarow <> ibrow, where:
    iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
    ibrow = mod((((ib-1)/MB_B)+RSRC_B), p)

  2. If A is contained in a single block:
    p > 1 and m+mod(ib-1, MB_B) > MB_B

If side = 'R':

  1. In the process grid, the process column containing the first column of the submatrix A does not contain the first column of the submatrix B; that is, iacol <> ibcol, where:
    iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
    ibcol = mod((((jb-1)/NB_B)+CSRC_B), q)

  2. If A is contained in a single block:
    q > 1 and n+mod(jb-1, NB_B) > NB_B

Example

This example shows the solution $B \leftarrow \alpha A^{-1}B$ using a 2 × 2 process grid.

Call Statements and Input


 ORDER = 'R'
 NPROW = 2
 NPCOL = 2
 CALL BLACS_GET (0, 0, ICONTXT)
 CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
 CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
              SIDE  UPLO  TRANSA   DIAG   M   N    ALPHA    A  IA  JA    DESC_A
               |      |      |       |    |   |      |      |   |   |      |
 CALL PDTRSM( 'L'  , 'U' ,  'N'  ,  'N' , 5 , 3 ,  1.0D0  , A , 1 , 1 ,  DESC_A ,
 
              B  IB  JB   DESC_B
              |   |   |     |
              B , 1 , 1 , DESC_B )



         Desc_A          Desc_B
DTYPE_   1               1
CTXT_    icontxt (1)     icontxt (1)
M_       5               5
N_       5               3
MB_      2               2
NB_      2               2
RSRC_    0               0
CSRC_    0               0
LLD_     See below (2)   See below (2)

(1) icontxt is the output of the BLACS_GRIDINIT call.

(2) Each process should set the LLD_ as follows:

LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
LLD_B = MAX(1,NUMROC(M_B, MB_B, MYROW, RSRC_B, NPROW))

In this example, LLD_A = LLD_B = 3 on P00 and P01, and LLD_A = LLD_B = 2 on P10 and P11.

Global triangular matrix A of order 5 is upper triangular with block size 2 × 2:

B,D        0             1          2
     *                                  *
 0   |  3.0 -1.0  |   2.0  2.0  |   1.0 |
     |   .  -2.0  |   4.0 -1.0  |   3.0 |
     | -----------|-------------|------ |
 1   |   .    .   |  -3.0  0.0  |   2.0 |
     |   .    .   |    .   4.0  |  -2.0 |
     | -----------|-------------|------ |
 2   |   .    .   |    .    .   |   1.0 |
     *                                  *

The following is the 2 × 2 process grid:

B,D    |  0, 2  |  1
-------|--------|------
0, 2   |  P00   |  P01
-------|--------|------
1      |  P10   |  P11

Local arrays for A:

p,q  |       0         |      1
-----|-----------------|------------
     |  3.0 -1.0  1.0  |   2.0  2.0
 0   |   .  -2.0  3.0  |   4.0 -1.0
     |   .    .   1.0  |    .    .
-----|-----------------|------------
 1   |   .    .   2.0  |  -3.0  0.0
     |   .    .  -2.0  |    .   4.0

Global general 5 × 3 matrix B with block size 2 × 2:

B,D         0            1
     *                       *
 0   |   6.0  10.0  |  -2.0  |
     | -16.0  -1.0  |   6.0  |
     | -------------|------- |
 1   |  -2.0   1.0  |   -4.0 |
     |  14.0   0.0  |  -14.0 |
     | -------------|------- |
 2   |  -1.0   2.0  |    1.0 |
     *                       *

The following is the 2 × 2 process grid:

B,D    |  0     |  1
-------|--------|------
0, 2   |  P00   |  P01
-------|--------|------
1      |  P10   |  P11

Local arrays for B:

p,q  |      0       |    1
-----|--------------|--------
     |   6.0  10.0  |   -2.0
 0   | -16.0  -1.0  |    6.0
     |  -1.0   2.0  |    1.0
-----|--------------|--------
 1   |  -2.0   1.0  |   -4.0
     |  14.0   0.0  |  -14.0

Output:

Global general 5 × 3 matrix B with block size 2 × 2:

B,D        0          1
     *                    *
 0   |  2.0  3.0  |   1.0 |
     |  5.0  5.0  |   4.0 |
     | -----------|------ |
 1   |  0.0  1.0  |   2.0 |
     |  3.0  1.0  |  -3.0 |
     | -----------|------ |
 2   | -1.0  2.0  |   1.0 |
     *                    *

The following is the 2 × 2 process grid:

B,D    |  0     |  1
-------|--------|------
0, 2   |  P00   |  P01
-------|--------|------
1      |  P10   |  P11

Local arrays for B:

p,q  |     0      |   1
-----|------------|-------
     |  2.0  3.0  |   1.0
 0   |  5.0  5.0  |   4.0
     | -1.0  2.0  |   1.0
-----|------------|-------
 1   |  0.0  1.0  |   2.0
     |  3.0  1.0  |  -3.0
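One way to check the PDTRSM output serially is to multiply A by the returned X and compare the product with the input B (alpha = 1.0 here). This is a Python sketch under that assumption, using the global matrices shown above. It is a verification aid, not part of the PDTRSM interface, and `upper_times` is a hypothetical helper name.

```python
# Residual check for the PDTRSM example: multiply the upper triangle of A by
# the returned X and compare with the input B (alpha = 1.0).

# Global A (upper triangle only; 0.0 stands in for the unreferenced '.').
A = [[3.0, -1.0,  2.0,  2.0,  1.0],
     [0.0, -2.0,  4.0, -1.0,  3.0],
     [0.0,  0.0, -3.0,  0.0,  2.0],
     [0.0,  0.0,  0.0,  4.0, -2.0],
     [0.0,  0.0,  0.0,  0.0,  1.0]]
B_in = [[6.0, 10.0, -2.0], [-16.0, -1.0, 6.0], [-2.0, 1.0, -4.0],
        [14.0, 0.0, -14.0], [-1.0, 2.0, 1.0]]
X = [[2.0, 3.0, 1.0], [5.0, 5.0, 4.0], [0.0, 1.0, 2.0],
     [3.0, 1.0, -3.0], [-1.0, 2.0, 1.0]]

def upper_times(a, x):
    """Product of the upper triangle of a (diagonal included) with x."""
    m, n = len(x), len(x[0])
    return [[sum(a[i][k] * x[k][j] for k in range(i, m)) for j in range(n)]
            for i in range(m)]

residual = max(abs(ax - b)
               for ax_row, b_row in zip(upper_times(A, X), B_in)
               for ax, b in zip(ax_row, b_row))
print(residual)  # 0.0
```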

PDSYRK--Rank-K Update of a Real Symmetric Matrix

This subroutine computes one of the following rank-k updates:

1. $C \leftarrow \alpha AA^{T} + \beta C$
2. $C \leftarrow \alpha A^{T}A + \beta C$

where, in the formulas above:

A represents the global general submatrix $A_{ia:ia+n-1,\ ja:ja+k-1}$ if trans = 'N', or $A_{ia:ia+k-1,\ ja:ja+n-1}$ if trans = 'T'.
C represents the global symmetric submatrix $C_{ic:ic+n-1,\ jc:jc+n-1}$.
alpha and beta are scalars.

Note: No data should be moved to form the matrix transpose; that is, the matrix should always be stored in its untransposed form.

In the following two cases, no computation is performed and the subroutine returns after doing some parameter checking:

  1. n = 0
  2. alpha = 0 or k = 0, and beta = 1

See references [14] and [15].
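The two updates, restricted to the triangle selected by uplo, can be written as a short serial reference. This is a minimal Python sketch, not the PDSYRK algorithm. The name `syrk_reference` is hypothetical, and the quick-return test mirrors the usual BLAS convention (n = 0, or alpha = 0 or k = 0 with beta = 1).

```python
def syrk_reference(uplo, trans, alpha, a, beta, c):
    """Serial sketch: C <- alpha*A*A^T + beta*C (trans='N', A is n by k) or
    C <- alpha*A^T*A + beta*C (trans='T', A is k by n). Only the triangle
    selected by uplo is updated; the other triangle of C is left as is."""
    n = len(c)
    if trans.upper() == 'N':
        k = len(a[0]) if a else 0
        elem = lambda i, l: a[i][l]        # element (i, l) of A
    else:
        k = len(a)
        elem = lambda i, l: a[l][i]        # element (i, l) of A^T
    # Quick return: n = 0, or (alpha = 0 or k = 0) and beta = 1.
    if n == 0 or ((alpha == 0.0 or k == 0) and beta == 1.0):
        return [row[:] for row in c]
    out = [row[:] for row in c]
    for i in range(n):
        cols = range(i + 1) if uplo.upper() == 'L' else range(i, n)
        for j in cols:
            s = sum(elem(i, l) * elem(j, l) for l in range(k))
            # When beta is zero, C is not read, so it need not be set.
            out[i][j] = alpha * s + (beta * c[i][j] if beta != 0.0 else 0.0)
    return out

# Lower triangle of A*A^T for A = [[1, 2], [3, 4]]:
print(syrk_reference('L', 'N', 1.0, [[1.0, 2.0], [3.0, 4.0]], 0.0,
                     [[0.0, 0.0], [0.0, 0.0]]))
```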

Table 51. Data Types
alpha, beta, A, C Subprogram
Long-precision real PDSYRK

Syntax

Fortran CALL PDSYRK (uplo, trans, n, k, alpha, a, ia, ja, desc_a, beta, c, ic, jc, desc_c)
C and C++ pdsyrk (uplo, trans, n, k, alpha, a, ia, ja, desc_a, beta, c, ic, jc, desc_c);

On Entry

uplo

indicates whether the upper or lower triangular part of the symmetric submatrix C is referenced, where:

If uplo = 'U', the upper triangular part is referenced.

If uplo = 'L', the lower triangular part is referenced.

Scope: global

Specified as: a single character; uplo = 'U' or 'L'.

trans

indicates which computation is performed, where:

If trans = 'N', the computation in equation 1 is performed.

If trans = 'T', the computation in equation 2 is performed.

Scope: global

Specified as: a single character; trans = 'N' or 'T'.

n

is the order of the global symmetric submatrix C used in the computation, and:

If trans = 'N', it is the number of rows in submatrix A used in the computation.

If trans = 'T', it is the number of columns in submatrix A used in the computation.

Scope: global

Specified as: a fullword integer; n >= 0.

k

has the following meaning:

If trans = 'N', it is the number of columns in submatrix A used in the computation.

If trans = 'T', it is the number of rows in submatrix A used in the computation.

Scope: global

Specified as: a fullword integer; k >= 0.

alpha

is the scalar alpha.

Scope: global

Specified as: a number of the data type indicated in Table 51.

a

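is the local part of the global general matrix A. This identifies the first element of the local array A. This subroutine computes the location of the first element of the local subarray used, based on ia, ja, desc_a, p, q, myrow, and mycol; therefore:

If trans = 'N', the leading LOCp(ia+n-1) by LOCq(ja+k-1) part of the local array A must contain the local pieces of the leading ia+n-1 by ja+k-1 part of the global matrix.

If trans = 'T', the leading LOCp(ia+k-1) by LOCq(ja+n-1) part of the local array A must contain the local pieces of the leading ia+k-1 by ja+n-1 part of the global matrix.

Note: No data should be moved to form $A^{T}$; that is, the matrix A should always be stored in its untransposed form.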

Scope: local

Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 51. Details about the block-cyclic data distribution of global matrix A are stored in desc_a.

ia

is the row index of the global matrix A, identifying the first row of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ia <= M_A, and:

If trans = 'N', then ia+n-1 <= M_A.

If trans = 'T', then ia+k-1 <= M_A.

ja

is the column index of the global matrix A, identifying the first column of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ja <= N_A, and:

If trans = 'N', then ja+k-1 <= N_A.

If trans = 'T', then ja+n-1 <= N_A.

desc_a

is the array descriptor for global matrix A, described in the following table:
desc_a Name Description Limits Scope
1 DTYPE_A Descriptor type DTYPE_A=1 Global
2 CTXT_A BLACS context Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP Global
3 M_A Number of rows in the global matrix
If n = 0 or k = 0:
     M_A >= 0
Otherwise:
     M_A >= 1

Global
4 N_A Number of columns in the global matrix
If n = 0 or k = 0:
     N_A >= 0
Otherwise:
     N_A >= 1

Global
5 MB_A Row block size MB_A >= 1 Global
6 NB_A Column block size NB_A >= 1 Global
7 RSRC_A The process row of the p × q grid over which the first row of the global matrix is distributed 0 <= RSRC_A < p Global
8 CSRC_A The process column of the p × q grid over which the first column of the global matrix is distributed 0 <= CSRC_A < q Global
9 LLD_A The leading dimension of the local array LLD_A >= max(1,LOCp(M_A)) Local

Specified as: an array of (at least) length 9, containing fullword integers.

beta

is the scalar beta.

Scope: global

Specified as: a number of the data type indicated in Table 51.

c

is the local part of the global symmetric matrix C. This identifies the first element of the local array C. This subroutine computes the location of the first element of the local subarray used, based on ic, jc, desc_c, p, q, myrow, and mycol; therefore, the leading LOCp(ic+n-1) by LOCq(jc+n-1) part of the local array C must contain the local pieces of the leading ic+n-1 by jc+n-1 part of the global matrix.

When beta is zero, C need not be set on input.

Scope: local

Specified as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 51. Details about the block-cyclic data distribution of global matrix C are stored in desc_c.

ic

is the row index of the global matrix C, identifying the first row of the submatrix C.

Scope: global

Specified as: a fullword integer; 1 <= ic <= M_C and ic+n-1 <= M_C.

jc

is the column index of the global matrix C, identifying the first column of the submatrix C.

Scope: global

Specified as: a fullword integer; 1 <= jc <= N_C and jc+n-1 <= N_C.

desc_c

is the array descriptor for global matrix C, described in the following table:
desc_c Name Description Limits Scope
1 DTYPE_C Descriptor type DTYPE_C=1 Global
2 CTXT_C BLACS context Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP Global
3 M_C Number of rows in the global matrix
If n = 0:
     M_C >= 0
Otherwise:
     M_C >= 1

Global
4 N_C Number of columns in the global matrix
If n = 0:
     N_C >= 0
Otherwise:
     N_C >= 1

Global
5 MB_C Row block size MB_C >= 1 Global
6 NB_C Column block size NB_C >= 1 Global
7 RSRC_C The process row of the p × q grid over which the first row of the global matrix is distributed 0 <= RSRC_C < p Global
8 CSRC_C The process column of the p × q grid over which the first column of the global matrix is distributed 0 <= CSRC_C < q Global
9 LLD_C The leading dimension of the local array LLD_C >= max(1,LOCp(M_C)) Local

Specified as: an array of (at least) length 9, containing fullword integers.

On Return

c

is the updated local part of the global symmetric matrix C, containing the results of the computation.

Scope: local

Returned as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 51.

Notes and Coding Rules

  1. This subroutine accepts lowercase letters for the uplo and trans arguments.

  2. If you specify 'C' for the trans argument, it is interpreted as though you specified 'T'.

  3. The matrices must have no common elements; otherwise, results are unpredictable.

  4. The NUMROC utility subroutine can be used to determine the values of LOCp(M_) and LOCq(N_) used in the argument descriptions above. For details, see "Determining the Number of Rows and Columns in Your Local Arrays" and NUMROC--Compute the Number of Rows or Columns of a Block-Cyclically Distributed Matrix Contained in a Process.

  5. For suggested block sizes, see "Coding Tips for Optimizing Parallel Performance".

  6. The following values must be equal: CTXT_A = CTXT_C.

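  7. If C is not contained within a single block, that is:
    n+mod(ic-1, MB_C) > MB_C
    n+mod(jc-1, NB_C) > NB_C

    then the global symmetric matrix C must be distributed using a square block-cyclic distribution (MB_C = NB_C) and must be aligned on a block boundary, that is:

    ic-1 must be a multiple of MB_C.
    jc-1 must be a multiple of NB_C.

  8. If trans = 'N':
    NB_C = MB_A.
    If C is not contained within a single block, then ia-1 must be a multiple of MB_A.
    In the process grid, the process row containing the first row of the submatrix C must also contain the first row of the submatrix A.

  9. If trans = 'T':
    MB_C = NB_A.
    If C is not contained within a single block, then ja-1 must be a multiple of NB_A.
    In the process grid, the process column containing the first column of the submatrix C must also contain the first column of the submatrix A.

  10. If C is contained within a single block:
    If trans = 'N' and p > 1, then n+mod(ia-1, MB_A) <= MB_A.
    If trans = 'T' and q > 1, then n+mod(ja-1, NB_A) <= NB_A.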

Error Conditions

Computational Errors

None

Resource Errors

Unable to allocate work space

Input-Argument and Miscellaneous Errors

Stage 1
  1. DTYPE_A is invalid.
  2. DTYPE_C is invalid.

Stage 2
  1. CTXT_A is invalid.

Stage 3
  1. PDSYRK was called from outside the process grid.

Stage 4
  1. uplo <> 'U' or 'L'
  2. trans <> 'N', 'T', or 'C'
  3. n < 0 and trans = 'N'
  4. n < 0 and trans = 'T' or 'C'
  5. n < 0 and trans is invalid.
  6. k < 0 and trans = 'N'
  7. k < 0 and trans = 'T' or 'C'
  8. k < 0 and trans is invalid.
  9. M_A < 0 and (n = 0 or k = 0); M_A < 1 otherwise
  10. N_A < 0 and (n = 0 or k = 0); N_A < 1 otherwise
  11. MB_A < 1
  12. NB_A < 1
  13. RSRC_A < 0 or RSRC_A >= p
  14. CSRC_A < 0 or CSRC_A >= q
  15. ia < 1
  16. ja < 1
  17. M_C < 0 and n = 0; M_C < 1 otherwise
  18. N_C < 0 and n = 0; N_C < 1 otherwise
  19. MB_C < 1
  20. NB_C < 1
  21. RSRC_C < 0 or RSRC_C >= p
  22. CSRC_C < 0 or CSRC_C >= q
  23. ic < 1
  24. jc < 1
  25. CTXT_A <> CTXT_C

If n <> 0 and k <> 0:

  1. ia > M_A
  2. ja > N_A
  3. trans = 'N' and ia+n-1 > M_A
  4. trans = 'N' and ja+k-1 > N_A
  5. trans = 'T' and ia+k-1 > M_A
  6. trans = 'T' and ja+n-1 > N_A

If n <> 0:

  1. ic > M_C
  2. jc > N_C
  3. ic+n-1 > M_C
  4. jc+n-1 > N_C

Stage 5
  1. If C is not contained within a single block, that is:
    n+mod(ic-1, MB_C) > MB_C
    n+mod(jc-1, NB_C) > NB_C

    and NB_C <> MB_C.

  2. trans = 'N' and NB_C <> MB_A.
  3. trans = 'T' and MB_C <> NB_A.

If C is not contained within a single block:

  1. mod(ic-1, MB_C) <> 0
  2. mod(jc-1, NB_C) <> 0
  3. trans = 'N' and mod(ia-1, MB_A) <> 0
  4. trans = 'T' and mod(ja-1, NB_A) <> 0

Stage 6
  1. LLD_A < max(1, LOCp(M_A))
  2. LLD_C < max(1, LOCp(M_C))
  3. If trans = 'N', then (in the process grid) the process row containing the first row of the submatrix C does not contain the first row of the submatrix A; that is, icrow <> iarow, where:
    icrow = mod((((ic-1)/MB_C)+RSRC_C), p)
    iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
  4. If trans = 'T', then (in the process grid) the process column containing the first column of the submatrix C does not contain the first column of the submatrix A; that is, iccol <> iacol, where:
    iccol = mod((((jc-1)/NB_C)+CSRC_C), q)
    iacol = mod((((ja-1)/NB_A)+CSRC_A), q)

If C is contained within a single block:

  1. If trans = 'N':
    p > 1 and n+mod(ia-1, MB_A) > MB_A
  2. If trans = 'T':
    q > 1 and n+mod(ja-1, NB_A) > NB_A

Example

This example computes $C \leftarrow \alpha AA^{T} + \beta C$ using a 2 × 3 process grid.

Call Statements and Input


 ORDER = 'R'
 NPROW = 2
 NPCOL = 3
 CALL BLACS_GET (0, 0, ICONTXT)
 CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
 CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
             UPLO   TRANS   N    K     ALPHA    A  IA  JA    DESC_A    BETA
               |      |     |    |       |      |   |   |      |         |
 CALL PDSYRK( 'L' ,  'N' ,  8  , 5  ,  1.0D0  , A , 1 , 1 ,  DESC_A ,  1.0D0 ,
 
               C  IC  JC   DESC_C
               |   |   |     |
               C , 1 , 1 , DESC_C )



         Desc_A          Desc_C
DTYPE_   1               1
CTXT_    icontxt (1)     icontxt (1)
M_       8               8
N_       5               8
MB_      2               2
NB_      2               2
RSRC_    0               0
CSRC_    0               0
LLD_     See below (2)   See below (2)

(1) icontxt is the output of the BLACS_GRIDINIT call.

(2) Each process should set the LLD_ as follows:

LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
LLD_C = MAX(1,NUMROC(M_C, MB_C, MYROW, RSRC_C, NPROW))

In this example, LLD_A = LLD_C = 4 on all processes.

Global general 8 × 5 matrix A with block size 2 × 2:

B,D         0                1             2
     *                                         *
 0   |   0.0    8.0  |   16.0   24.0  |   32.0 |
     |   1.0    9.0  |   17.0   25.0  |   33.0 |
     | --------------|----------------|------- |
 1   |   2.0   10.0  |   18.0   26.0  |   34.0 |
     |   3.0   11.0  |   19.0   27.0  |   35.0 |
     | --------------|----------------|------- |
 2   |   4.0   12.0  |   20.0   28.0  |   36.0 |
     |   5.0   13.0  |   21.0   29.0  |   37.0 |
     | --------------|----------------|------- |
 3   |   6.0   14.0  |   22.0   30.0  |   38.0 |
     |   7.0   15.0  |   23.0   31.0  |   39.0 |
     *                                         *

The following is the 2 × 3 process grid:

B,D    |  0     |  1     |  2
-------|--------|--------|------
0, 2   |  P00   |  P01   |  P02
-------|--------|--------|------
1, 3   |  P10   |  P11   |  P12

Local arrays for A:

p,q  |     0       |       1        |    2
-----|-------------|----------------|--------
     |  0.0   8.0  |   16.0   24.0  |   32.0
     |  1.0   9.0  |   17.0   25.0  |   33.0
 0   |  4.0  12.0  |   20.0   28.0  |   36.0
     |  5.0  13.0  |   21.0   29.0  |   37.0
-----|-------------|----------------|--------
     |  2.0  10.0  |   18.0   26.0  |   34.0
     |  3.0  11.0  |   19.0   27.0  |   35.0
 1   |  6.0  14.0  |   22.0   30.0  |   38.0
     |  7.0  15.0  |   23.0   31.0  |   39.0
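Each local array above comes from dealing the 2 × 2 blocks of A out cyclically, starting at RSRC_A and CSRC_A. The following is an illustrative sketch only. It is plain Python, and owner_and_local is a hypothetical helper modeled on the arithmetic of the ScaLAPACK tool routines INDXG2P and INDXG2L. It computes the owning process row and the local row for a 1-based global row index. Columns work the same way with NB_, CSRC_, and q.

```python
def owner_and_local(ig, nb, src, nprocs):
    """Map a 1-based global row (or column) index to
    (owning process row or column, 1-based local index).

    nb     -- MB_ for rows, NB_ for columns
    src    -- RSRC_ for rows, CSRC_ for columns
    nprocs -- p for rows, q for columns
    """
    block = (ig - 1) // nb                 # 0-based global block number
    proc = (src + block) % nprocs          # blocks are dealt out cyclically from src
    local_block = block // nprocs          # earlier blocks owned by the same process
    local = local_block * nb + (ig - 1) % nb + 1
    return proc, local


print(owner_and_local(5, 2, 0, 2))   # (0, 3): global row 5 of A, MB_A = 2, p = 2
print(owner_and_local(7, 2, 0, 3))   # (0, 3): global column 7 of C, NB_C = 2, q = 3
```

Global row 5 of A lands in process row 0 as local row 3, which matches the third row of P00's local array. Global column 7 of C (block column 3) wraps around to process column 0 as its local column 3.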

Global symmetric matrix C of order 8 with block size 2 × 2:

B,D         0               1               2               3
     *                                                             *
 0   |   0.0    .   |     .     .   |     .     .   |     .     .  |
     |   1.0   8.0  |     .     .   |     .     .   |     .     .  |
     | -------------|---------------|---------------|------------- |
 1   |   2.0   9.0  |   15.0    .   |     .     .   |     .     .  |
     |   3.0  10.0  |   16.0  21.0  |     .     .   |     .     .  |
     | -------------|---------------|---------------|------------- |
 2   |   4.0  11.0  |   17.0  22.0  |   26.0    .   |     .     .  |
     |   5.0  12.0  |   18.0  23.0  |   27.0  30.0  |     .     .  |
     | -------------|---------------|---------------|------------- |
 3   |   6.0  13.0  |   19.0  24.0  |   28.0  31.0  |   33.0    .  |
     |   7.0  14.0  |   20.0  25.0  |   29.0  32.0  |   34.0  35.0 |
     *                                                             *

The following is the 2 × 3 process grid:

B,D       0, 3    1       2
0, 2      P00     P01     P02
1, 3      P10     P11     P12

Local arrays for C:

p,q  |            0             |       1       |       2
-----|--------------------------|---------------|--------------
     |   0.0    .     .     .   |     .     .   |     .     .
     |   1.0   8.0    .     .   |     .     .   |     .     .
 0   |   4.0  11.0    .     .   |   17.0  22.0  |   26.0    .
     |   5.0  12.0    .     .   |   18.0  23.0  |   27.0  30.0
-----|--------------------------|---------------|--------------
     |   2.0   9.0    .     .   |   15.0    .   |     .     .
     |   3.0  10.0    .     .   |   16.0  21.0  |     .     .
 1   |   6.0  13.0  33.0    .   |   19.0  24.0  |   28.0  31.0
     |   7.0  14.0  34.0  35.0  |   20.0  25.0  |   29.0  32.0

Output:

Global symmetric matrix C of order 8 with block size 2 × 2:


B,D           0                   1                   2                   3
     *                                                                             *
 0   |  1920.0      .   |       .       .   |       .       .   |       .       .  |
     |  2001.0  2093.0  |       .       .   |       .       .   |       .       .  |
     | -----------------|-------------------|-------------------|----------------- |
 1   |  2082.0  2179.0  |   2275.0      .   |       .       .   |       .       .  |
     |  2163.0  2265.0  |   2366.0  2466.0  |       .       .   |       .       .  |
     | -----------------|-------------------|-------------------|----------------- |
 2   |  2244.0  2351.0  |   2457.0  2562.0  |   2666.0      .   |       .       .  |
     |  2325.0  2437.0  |   2548.0  2658.0  |   2767.0  2875.0  |       .       .  |
     | -----------------|-------------------|-------------------|----------------- |
 3   |  2406.0  2523.0  |   2639.0  2754.0  |   2868.0  2981.0  |   3093.0      .  |
     |  2487.0  2609.0  |   2730.0  2850.0  |   2969.0  3087.0  |   3204.0  3320.0 |
     *                                                                             *

The following is the 2 × 3 process grid:

B,D       0, 3    1       2
0, 2      P00     P01     P02
1, 3      P10     P11     P12

Local arrays for C:


p,q  |                0                 |         1         |         2
-----|----------------------------------|-------------------|------------------
     |  1920.0      .       .       .   |       .       .   |       .       .
     |  2001.0  2093.0      .       .   |       .       .   |       .       .
 0   |  2244.0  2351.0      .       .   |   2457.0  2562.0  |   2666.0      .
     |  2325.0  2437.0      .       .   |   2548.0  2658.0  |   2767.0  2875.0
-----|----------------------------------|-------------------|------------------
     |  2082.0  2179.0      .       .   |   2275.0      .   |       .       .
     |  2163.0  2265.0      .       .   |   2366.0  2466.0  |       .       .
 1   |  2406.0  2523.0  3093.0      .   |   2639.0  2754.0  |   2868.0  2981.0
     |  2487.0  2609.0  3204.0  3320.0  |   2730.0  2850.0  |   2969.0  3087.0
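As a cross-check, the output above can be reproduced serially. The sketch below is plain Python with no BLACS or Parallel ESSL calls, and syrk_lower is a hypothetical reference helper, not part of the library. It rebuilds A and the lower triangle of C from the example input: A(i,j) = (i-1) + 8(j-1) in 1-based terms, and C is numbered down each column. It then applies C = alpha*A*A^T + beta*C to the lower triangle only, as uplo = 'L' and trans = 'N' request.

```python
def syrk_lower(alpha, a, beta, c):
    """Serial C <- alpha*A*A^T + beta*C, updating only the lower triangle (uplo = 'L')."""
    n, k = len(a), len(a[0])
    for i in range(n):
        for j in range(i + 1):                    # j <= i: lower triangle only
            dot = sum(a[i][l] * a[j][l] for l in range(k))
            c[i][j] = alpha * dot + beta * c[i][j]
    return c


# A(i,j) = (i-1) + 8*(j-1) in 1-based terms: the 8 x 5 input matrix
a = [[float(i + 8 * j) for j in range(5)] for i in range(8)]

# Lower triangle of C holds 0.0, 1.0, 2.0, ... numbered down each column
c = [[0.0] * 8 for _ in range(8)]
value = 0.0
for j in range(8):
    for i in range(j, 8):
        c[i][j] = value
        value += 1.0

syrk_lower(1.0, a, 1.0, c)
print(c[0][0], c[1][0], c[7][7])   # 1920.0 2001.0 3320.0
```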

PDSYR2K--Rank-2K Update of a Real Symmetric Matrix

This subroutine computes one of the following rank-2k updates:

1. $C \leftarrow \alpha A B^{T} + \alpha B A^{T} + \beta C$
2. $C \leftarrow \alpha A^{T} B + \alpha B^{T} A + \beta C$

where, in the formulas above:

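A represents the global general submatrix $A_{ia:ia+n-1,\,ja:ja+k-1}$ if trans = 'N', or $A_{ia:ia+k-1,\,ja:ja+n-1}$ if trans = 'T'.
B represents the global general submatrix $B_{ib:ib+n-1,\,jb:jb+k-1}$ if trans = 'N', or $B_{ib:ib+k-1,\,jb:jb+n-1}$ if trans = 'T'.
C represents the global symmetric submatrix $C_{ic:ic+n-1,\,jc:jc+n-1}$.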
alpha and beta are scalars.

Note: No data should be moved to form the matrix transposes; that is, the matrices should always be stored in their untransposed forms.

In the following two cases, no computation is performed and the subroutine returns after doing some parameter checking:

See references [14] and [15].

Table 52. Data Types

alpha, beta, A, B, C      Subprogram
Long-precision real       PDSYR2K

Syntax

Fortran CALL PDSYR2K (uplo, trans, n, k, alpha, a, ia, ja, desc_a, b, ib, jb, desc_b, beta, c, ic, jc, desc_c)
C and C++ pdsyr2k (uplo, trans, n, k, alpha, a, ia, ja, desc_a, b, ib, jb, desc_b, beta, c, ic, jc, desc_c);

On Entry

uplo

indicates whether the upper or lower triangular part of the symmetric submatrix C is referenced, where:

If uplo = 'U', the upper triangular part is referenced.

If uplo = 'L', the lower triangular part is referenced.

Scope: global

Specified as: a single character; uplo = 'U' or 'L'.

trans

indicates which computation is performed, where:

If trans = 'N', the computation in equation 1 is performed.

If trans = 'T', the computation in equation 2 is performed.

Scope: global

Specified as: a single character; trans = 'N' or 'T'.

n

is the order of the global symmetric submatrix C used in the computation, and:

If trans = 'N', it is the number of rows in submatrices A and B used in the computation.

If trans = 'T', it is the number of columns in submatrices A and B used in the computation.

Scope: global

Specified as: a fullword integer; n >= 0.

k

has the following meaning:

If trans = 'N', it is the number of columns in submatrices A and B used in the computation.

If trans = 'T', it is the number of rows in submatrices A and B used in the computation.

Scope: global

Specified as: a fullword integer; k >= 0.

alpha

is the scalar alpha.

Scope: global

Specified as: a number of the data type indicated in Table 52.

a

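is the local part of the global general matrix A. This identifies the first element of the local array A. This subroutine computes the location of the first element of the local subarray used, based on ia, ja, desc_a, p, q, myrow, and mycol. Therefore, if trans = 'N', the leading LOCp(ia+n-1) by LOCq(ja+k-1) part of the local array A must contain the local pieces of the leading ia+n-1 by ja+k-1 part of the global matrix. If trans = 'T', the leading LOCp(ia+k-1) by LOCq(ja+n-1) part of the local array A must contain the local pieces of the leading ia+k-1 by ja+n-1 part of the global matrix.
Note: No data should be moved to form $A^{T}$; that is, the matrix A should always be stored in its untransposed form.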

Scope: local

Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 52. Details about the block-cyclic data distribution of global matrix A are stored in desc_a.

ia

is the row index of the global matrix A, identifying the first row of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ia <= M_A, and:

If trans = 'N', then ia+n-1 <= M_A.

If trans = 'T', then ia+k-1 <= M_A.

ja

is the column index of the global matrix A, identifying the first column of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ja <= N_A, and:

If trans = 'N', then ja+k-1 <= N_A.

If trans = 'T', then ja+n-1 <= N_A.

desc_a

is the array descriptor for global matrix A, described in the following table:
desc_a Name Description Limits Scope
1 DTYPE_A Descriptor type DTYPE_A=1 Global
2 CTXT_A BLACS context Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP Global
3 M_A Number of rows in the global matrix
If n = 0 or k = 0:
     M_A >= 0
Otherwise:
     M_A >= 1

Global
4 N_A Number of columns in the global matrix
If n = 0 or k = 0:
     N_A >= 0
Otherwise:
     N_A >= 1

Global
5 MB_A Row block size MB_A >= 1 Global
6 NB_A Column block size NB_A >= 1 Global
7 RSRC_A The process row of the p × q grid over which the first row of the global matrix is distributed 0 <= RSRC_A < p Global
8 CSRC_A The process column of the p × q grid over which the first column of the global matrix is distributed 0 <= CSRC_A < q Global
9 LLD_A The leading dimension of the local array LLD_A >= max(1,LOCp(M_A)) Local

Specified as: an array of (at least) length 9, containing fullword integers.

b

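is the local part of the global general matrix B. This identifies the first element of the local array B. This subroutine computes the location of the first element of the local subarray used, based on ib, jb, desc_b, p, q, myrow, and mycol. Therefore, if trans = 'N', the leading LOCp(ib+n-1) by LOCq(jb+k-1) part of the local array B must contain the local pieces of the leading ib+n-1 by jb+k-1 part of the global matrix. If trans = 'T', the leading LOCp(ib+k-1) by LOCq(jb+n-1) part of the local array B must contain the local pieces of the leading ib+k-1 by jb+n-1 part of the global matrix.
Note: No data should be moved to form $B^{T}$; that is, the matrix B should always be stored in its untransposed form.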

Scope: local

Specified as: an LLD_B by (at least) LOCq(N_B) array, containing numbers of the data type indicated in Table 52. Details about the block-cyclic data distribution of global matrix B are stored in desc_b.

ib

is the row index of the global matrix B, identifying the first row of the submatrix B.

Scope: global

Specified as: a fullword integer; 1 <= ib <= M_B, and:

If trans = 'N', then ib+n-1 <= M_B.

If trans = 'T', then ib+k-1 <= M_B.

jb

is the column index of the global matrix B, identifying the first column of the submatrix B.

Scope: global

Specified as: a fullword integer; 1 <= jb <= N_B, and:

If trans = 'N', then jb+k-1 <= N_B.

If trans = 'T', then jb+n-1 <= N_B.

desc_b

is the array descriptor for global matrix B, described in the following table:
desc_b Name Description Limits Scope
1 DTYPE_B Descriptor type DTYPE_B=1 Global
2 CTXT_B BLACS context Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP Global
3 M_B Number of rows in the global matrix
If n = 0 or k = 0:
     M_B >= 0
Otherwise:
     M_B >= 1

Global
4 N_B Number of columns in the global matrix
If n = 0 or k = 0:
     N_B >= 0
Otherwise:
     N_B >= 1

Global
5 MB_B Row block size MB_B >= 1 Global
6 NB_B Column block size NB_B >= 1 Global
7 RSRC_B The process row of the p × q grid over which the first row of the global matrix is distributed 0 <= RSRC_B < p Global
8 CSRC_B The process column of the p × q grid over which the first column of the global matrix is distributed 0 <= CSRC_B < q Global
9 LLD_B The leading dimension of the local array LLD_B >= max(1,LOCp(M_B)) Local

Specified as: an array of (at least) length 9, containing fullword integers.

beta

is the scalar beta.

Scope: global

Specified as: a number of the data type indicated in Table 52.

c

is the local part of the global symmetric matrix C. This identifies the first element of the local array C. This subroutine computes the location of the first element of the local subarray used, based on ic, jc, desc_c, p, q, myrow, and mycol. Therefore, the leading LOCp(ic+n-1) by LOCq(jc+n-1) part of the local array C must contain the local pieces of the leading ic+n-1 by jc+n-1 part of the global matrix. Only the triangular part of submatrix C selected by uplo is referenced.

When beta is zero, C need not be set on input.

Scope: local

Specified as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 52. Details about the block-cyclic data distribution of global matrix C are stored in desc_c.

ic

is the row index of the global matrix C, identifying the first row of the submatrix C.

Scope: global

Specified as: a fullword integer; 1 <= ic <= M_C and ic+n-1 <= M_C.

jc

is the column index of the global matrix C, identifying the first column of the submatrix C.

Scope: global

Specified as: a fullword integer; 1 <= jc <= N_C and jc+n-1 <= N_C.

desc_c

is the array descriptor for global matrix C, described in the following table:
desc_c Name Description Limits Scope
1 DTYPE_C Descriptor type DTYPE_C=1 Global
2 CTXT_C BLACS context Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP Global
3 M_C Number of rows in the global matrix
If n = 0:
     M_C >= 0
Otherwise:
     M_C >= 1

Global
4 N_C Number of columns in the global matrix
If n = 0:
     N_C >= 0
Otherwise:
     N_C >= 1

Global
5 MB_C Row block size MB_C >= 1 Global
6 NB_C Column block size NB_C >= 1 Global
7 RSRC_C The process row of the p × q grid over which the first row of the global matrix is distributed 0 <= RSRC_C < p Global
8 CSRC_C The process column of the p × q grid over which the first column of the global matrix is distributed 0 <= CSRC_C < q Global
9 LLD_C The leading dimension of the local array LLD_C >= max(1,LOCp(M_C)) Local

Specified as: an array of (at least) length 9, containing fullword integers.

On Return

c

is the updated local part of the global symmetric matrix C, containing the results of the computation.

Scope: local

Returned as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 52.

Notes and Coding Rules

  1. This subroutine accepts lowercase letters for the uplo and trans arguments.

  2. If you specify 'C' for the trans argument, it is interpreted as though you specified 'T'.

  3. The matrices must have no common elements; otherwise, results are unpredictable.

  4. The NUMROC utility subroutine can be used to determine the values of LOCp(M_) and LOCq(N_) used in the argument descriptions above. For details, see "Determining the Number of Rows and Columns in Your Local Arrays" and NUMROC--Compute the Number of Rows or Columns of a Block-Cyclically Distributed Matrix Contained in a Process.

  5. For suggested block sizes, see "Coding Tips for Optimizing Parallel Performance".

  6. The following values must be equal: CTXT_A = CTXT_B = CTXT_C.

  7. If trans = 'N':

  8. If trans = 'T':

  9. If all the following are true:

    then you must follow these rules:

  10. If the following is true:

    or if all the following are true:

    then you must follow these rules:

Error Conditions

Computational Errors

None

Resource Errors

Unable to allocate work space

Input-Argument and Miscellaneous Errors

Stage 1
  1. DTYPE_A is invalid.
  2. DTYPE_B is invalid.
  3. DTYPE_C is invalid.

Stage 2
  1. CTXT_A is invalid.

Stage 3
  1. PDSYR2K was called from outside the process grid.

Stage 4
  1. uplo <> 'U' or 'L'
  2. trans <> 'N', 'T', or 'C'
  3. n < 0 and trans = 'N'; n < 0 and trans = 'T' or 'C'; n < 0 and trans is invalid.
  4. k < 0 and trans = 'N'; k < 0 and trans = 'T' or 'C'; k < 0 and trans is invalid.
  5. M_A < 0 and (n = 0 or k = 0); M_A < 1 otherwise
  6. N_A < 0 and (n = 0 or k = 0); N_A < 1 otherwise
  7. MB_A < 1
  8. NB_A < 1
  9. RSRC_A < 0 or RSRC_A >= p
  10. CSRC_A < 0 or CSRC_A >= q
  11. ia < 1
  12. ja < 1
  13. M_B < 0 and (n = 0 or k = 0); M_B < 1 otherwise
  14. N_B < 0 and (n = 0 or k = 0); N_B < 1 otherwise
  15. MB_B < 1
  16. NB_B < 1
  17. RSRC_B < 0 or RSRC_B >= p
  18. CSRC_B < 0 or CSRC_B >= q
  19. ib < 1
  20. jb < 1
  21. M_C < 0 and n = 0; M_C < 1 otherwise
  22. N_C < 0 and n = 0; N_C < 1 otherwise
  23. MB_C < 1
  24. NB_C < 1
  25. RSRC_C < 0 or RSRC_C >= p
  26. CSRC_C < 0 or CSRC_C >= q
  27. ic < 1
  28. jc < 1
  29. CTXT_A <> CTXT_B
  30. CTXT_A <> CTXT_C

Stage 5

If n <> 0 and k <> 0:

  1. ia > M_A
  2. ja > N_A
  3. trans = 'N' and ia+n-1 > M_A
  4. trans = 'N' and ja+k-1 > N_A
  5. trans = 'T' and ia+k-1 > M_A
  6. trans = 'T' and ja+n-1 > N_A
  7. ib > M_B
  8. jb > N_B
  9. trans = 'N' and ib+n-1 > M_B
  10. trans = 'N' and jb+k-1 > N_B
  11. trans = 'T' and ib+k-1 > M_B
  12. trans = 'T' and jb+n-1 > N_B

    If n <> 0:

  13. ic > M_C
  14. jc > N_C
  15. ic+n-1 > M_C
  16. jc+n-1 > N_C
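The Stage 5 checks are bounds arithmetic on the submatrix extents. The plain-Python sketch below mirrors the list above. syr2k_stage5 is a hypothetical helper, and unlike the subroutine it simply collects every violated condition. The passing case uses the argument values from the Example that follows.

```python
def syr2k_stage5(trans, n, k, ia, ja, m_a, n_a, ib, jb, m_b, n_b, ic, jc, m_c, n_c):
    """Return the PDSYR2K Stage 5 conditions that are violated ([] if none).

    Assumes trans already passed the Stage 4 check ('N', 'T', or 'C').
    """
    errors = []
    if n != 0 and k != 0:
        # A and B are n x k for trans = 'N', and k x n for trans = 'T' or 'C'
        if trans.upper() == "N":
            rows, cols, rsym, csym = n, k, "n", "k"
        else:
            rows, cols, rsym, csym = k, n, "k", "n"
        for x, i, j, m_g, n_g in (("a", ia, ja, m_a, n_a), ("b", ib, jb, m_b, n_b)):
            X = x.upper()
            if i > m_g:
                errors.append(f"i{x} > M_{X}")
            if j > n_g:
                errors.append(f"j{x} > N_{X}")
            if i + rows - 1 > m_g:
                errors.append(f"i{x}+{rsym}-1 > M_{X}")
            if j + cols - 1 > n_g:
                errors.append(f"j{x}+{csym}-1 > N_{X}")
    if n != 0:
        # C is always an n x n submatrix
        if ic > m_c:
            errors.append("ic > M_C")
        if jc > n_c:
            errors.append("jc > N_C")
        if ic + n - 1 > m_c:
            errors.append("ic+n-1 > M_C")
        if jc + n - 1 > n_c:
            errors.append("jc+n-1 > N_C")
    return errors


# Arguments and descriptor sizes from the PDSYR2K Example
print(syr2k_stage5("T", n=9, k=8, ia=1, ja=1, m_a=8, n_a=9, ib=1, jb=1, m_b=8, n_b=9,
                   ic=1, jc=1, m_c=9, n_c=9))   # []
```

With the same sizes but trans = 'N', A and B would need 9 rows, so the sketch reports ia+n-1 > M_A and ib+n-1 > M_B.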

Stage 6

If C is contained within a single block, that is:

n+mod(ic-1, MB_C) <= MB_C
n+mod(jc-1, NB_C) <= NB_C

and:

then:

If C is not contained within a single block, or if C is contained within a single block and:

then:

  1. MB_C <> NB_C
  2. mod(ic-1, MB_C) <> 0
  3. mod(jc-1, NB_C) <> 0

    If trans = 'N':

  4. NB_C <> MB_A
  5. NB_C <> MB_B
  6. NB_A <> NB_B
  7. mod(ia-1, MB_A) <> 0
  8. mod(ib-1, MB_B) <> 0

    If trans = 'T':

  9. MB_C <> NB_A
  10. MB_C <> NB_B
  11. MB_A <> MB_B
  12. mod(ja-1, NB_A) <> 0
  13. mod(jb-1, NB_B) <> 0

In all cases:

  1. LLD_A < max(1, LOCp(M_A))
  2. LLD_B < max(1, LOCp(M_B))
  3. LLD_C < max(1, LOCp(M_C))

    If trans = 'N':

  4. Looping is required and mod(ja-1, NB_A) <> mod(jb-1, NB_B).
  5. In the process grid, the process row containing the first row of the submatrix C does not contain the first row of the submatrix A; that is, icrow <> iarow, where:
    icrow = mod((((ic-1)/MB_C)+RSRC_C), p)
    iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
  6. In the process grid, the process row containing the first row of the submatrix C does not contain the first row of the submatrix B; that is, icrow <> ibrow, where:
    icrow = mod((((ic-1)/MB_C)+RSRC_C), p)
    ibrow = mod((((ib-1)/MB_B)+RSRC_B), p)

    If trans = 'T':

  7. Looping is required and mod(ia-1, MB_A) <> mod(ib-1, MB_B).
  8. In the process grid, the process column containing the first column of the submatrix C does not contain the first column of the submatrix A; that is, iccol <> iacol, where:
    iccol = mod((((jc-1)/NB_C)+CSRC_C), q)
    iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
  9. In the process grid, the process column containing the first column of the submatrix C does not contain the first column of the submatrix B; that is, iccol <> ibcol, where:
    iccol = mod((((jc-1)/NB_C)+CSRC_C), q)
    ibcol = mod((((jb-1)/NB_B)+CSRC_B), q)
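Items 5-6 and 8-9 above compare which process row or column owns the first row or column of each submatrix. The following is a minimal Python sketch. It assumes descriptors are held as 9-element lists in the order of the descriptor tables (DTYPE_, CTXT_, M_, N_, MB_, NB_, RSRC_, CSRC_, LLD_). The names owner and syr2k_grid_mismatches are hypothetical, and the CTXT_ value 0 is a placeholder. Only these process-grid items are modeled, not the looping or LLD_ checks.

```python
# 0-based positions of MB_, NB_, RSRC_, CSRC_ in a 9-element descriptor
MB, NB, RSRC, CSRC = 4, 5, 6, 7


def owner(i, nb, src, nprocs):
    """Process row (or column) holding 1-based global row (or column) i."""
    return ((i - 1) // nb + src) % nprocs


def syr2k_grid_mismatches(trans, p, q, ia, ja, desc_a, ib, jb, desc_b, ic, jc, desc_c):
    """Return the PDSYR2K Stage 6 process-grid conditions that are violated."""
    errors = []
    if trans.upper() == "N":
        icrow = owner(ic, desc_c[MB], desc_c[RSRC], p)
        if icrow != owner(ia, desc_a[MB], desc_a[RSRC], p):
            errors.append("icrow <> iarow")
        if icrow != owner(ib, desc_b[MB], desc_b[RSRC], p):
            errors.append("icrow <> ibrow")
    else:  # 'T' (or 'C', which is interpreted as 'T')
        iccol = owner(jc, desc_c[NB], desc_c[CSRC], q)
        if iccol != owner(ja, desc_a[NB], desc_a[CSRC], q):
            errors.append("iccol <> iacol")
        if iccol != owner(jb, desc_b[NB], desc_b[CSRC], q):
            errors.append("iccol <> ibcol")
    return errors


# Descriptors from the PDSYR2K Example (CTXT_ = 0 is a placeholder; LLD_ as on P00)
desc_a = [1, 0, 8, 9, 2, 4, 0, 0, 4]
desc_b = [1, 0, 8, 9, 2, 4, 0, 0, 4]
desc_c = [1, 0, 9, 9, 4, 4, 0, 0, 5]
print(syr2k_grid_mismatches("T", 2, 2, 1, 1, desc_a, 1, 1, desc_b, 1, 1, desc_c))  # []
```

Starting A at global column 5 instead (ja = 5) would place its first column in process column 1 while C starts in process column 0, so item 8 would be violated.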

Example

This example computes $C = \alpha A^{T} B + \alpha B^{T} A + \beta C$ using a 2 × 2 process grid.

Call Statements and Input


 ORDER = 'R'
 NPROW = 2
 NPCOL = 2
 CALL BLACS_GET (0, 0, ICONTXT)
 CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
 CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
              UPLO   TRANS   N    K     ALPHA    A  IA  JA    DESC_A   B  IB  JB
                |      |     |    |       |      |   |   |      |      |   |   |
 CALL PDSYR2K( 'U' ,  'T' ,  9  , 8  ,  1.0D0  , A , 1 , 1 ,  DESC_A , B , 1 , 1 ,
 
               DESC_B    BETA    C  IC  JC   DESC_C
                 |         |     |   |   |     |
               DESC_B ,  0.0D0 , C , 1 , 1 , DESC_C )



            Desc_A         Desc_B         Desc_C
DTYPE_      1              1              1
CTXT_       icontxt (1)    icontxt (1)    icontxt (1)
M_          8              8              9
N_          9              9              9
MB_         2              2              4
NB_         4              4              4
RSRC_       0              0              0
CSRC_       0              0              0
LLD_        See below (2)  See below (2)  See below (2)

(1) icontxt is the output of the BLACS_GRIDINIT call.

(2) Each process should set the LLD_ as follows:

LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
LLD_B = MAX(1,NUMROC(M_B, MB_B, MYROW, RSRC_B, NPROW))
LLD_C = MAX(1,NUMROC(M_C, MB_C, MYROW, RSRC_C, NPROW))

In this example, LLD_A = LLD_B = 4 on all processes, LLD_C = 5 on P00 and P01, and LLD_C = 4 on P10 and P11.
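The LLD_ values above can be checked against the algorithm behind NUMROC. The code below is a Python transcription of the standard ScaLAPACK NUMROC logic, for illustration only; in a real program, call the NUMROC utility subroutine itself. With M_C = 9 and MB_C = 4, process row 0 receives one full block plus the final one-row partial block. That is why LLD_C is 5 there and 4 on process row 1.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Rows (or columns) of an n-long dimension, blocked by nb, held by process iproc."""
    mydist = (nprocs + iproc - isrcproc) % nprocs   # distance from the source process
    nblocks = n // nb                                # number of whole blocks
    num = (nblocks // nprocs) * nb                   # whole blocks every process gets
    extrablks = nblocks % nprocs                     # leftover whole blocks
    if mydist < extrablks:
        num += nb                                    # one extra whole block
    elif mydist == extrablks:
        num += n % nb                                # the final partial block
    return num


nprow, rsrc = 2, 0
for myrow in range(nprow):
    lld_a = max(1, numroc(8, 2, myrow, rsrc, nprow))   # M_A = 8, MB_A = 2
    lld_c = max(1, numroc(9, 4, myrow, rsrc, nprow))   # M_C = 9, MB_C = 4
    print(myrow, lld_a, lld_c)
# 0 4 5
# 1 4 4
```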

Global general 8 × 9 matrix A with block size 2 × 4:

B,D             0                       1               2
     *                                                      *
 0   |  0.0 -1.0 -1.0  0.0  |   0.0  0.0  0.0  0.0  |   1.0 |
     |  0.0  1.0  0.0  1.0  |   0.0  1.0  0.0  1.0  |   1.0 |
     | ---------------------|-----------------------|------ |
 1   |  0.0  0.0 -1.0 -1.0  |   0.0  0.0  1.0  0.0  |   1.0 |
     |  0.0  1.0  0.0 -1.0  |   1.0  1.0  0.0  1.0  |   1.0 |
     | ---------------------|-----------------------|------ |
 2   |  1.0  0.0  0.0  0.0  |  -1.0  0.0  0.0  0.0  |   1.0 |
     |  1.0  0.0  0.0  0.0  |   1.0  1.0  0.0  0.0  |   1.0 |
     | ---------------------|-----------------------|------ |
 3   |  0.0  0.0 -1.0  0.0  |  -1.0  0.0  0.0  0.0  |   1.0 |
     | -1.0  0.0  0.0  0.0  |   0.0  0.0 -1.0  0.0  |   1.0 |
     *                                                      *

The following is the 2 × 2 process grid:

B,D       0, 2    1
0, 2      P00     P01
1, 3      P10     P11

Local arrays for A:

p,q  |            0              |           1
-----|---------------------------|----------------------
     |  0.0 -1.0 -1.0  0.0  1.0  |   0.0  0.0  0.0  0.0
     |  0.0  1.0  0.0  1.0  1.0  |   0.0  1.0  0.0  1.0
 0   |  1.0  0.0  0.0  0.0  1.0  |  -1.0  0.0  0.0  0.0
     |  1.0  0.0  0.0  0.0  1.0  |   1.0  1.0  0.0  0.0
-----|---------------------------|----------------------
     |  0.0  0.0 -1.0 -1.0  1.0  |   0.0  0.0  1.0  0.0
     |  0.0  1.0  0.0 -1.0  1.0  |   1.0  1.0  0.0  1.0
 1   |  0.0  0.0 -1.0  0.0  1.0  |  -1.0  0.0  0.0  0.0
     | -1.0  0.0  0.0  0.0  1.0  |   0.0  0.0 -1.0  0.0

Global general 8 × 9 matrix B with block size 2 × 4:

B,D             0                       1               2
     *                                                      *
 0   |  0.0  1.0  1.0  0.0  |   0.0  0.0  0.0  0.0  |  -1.0 |
     |  0.0 -1.0  0.0 -1.0  |   0.0 -1.0  0.0 -1.0  |  -1.0 |
     | ---------------------|-----------------------|------ |
 1   |  0.0  0.0  1.0  1.0  |   0.0  0.0 -1.0  0.0  |  -1.0 |
     |  0.0 -1.0  0.0  1.0  |  -1.0 -1.0  0.0 -1.0  |  -1.0 |
     | ---------------------|-----------------------|------ |
 2   | -1.0  0.0  0.0  0.0  |   1.0  0.0  0.0  0.0  |  -1.0 |
     | -1.0  0.0  0.0  0.0  |  -1.0 -1.0  0.0  0.0  |  -1.0 |
     | ---------------------|-----------------------|------ |
 3   |  0.0  0.0  1.0  0.0  |   1.0  0.0  0.0  0.0  |  -1.0 |
     |  1.0  0.0  0.0  0.0  |   0.0  0.0  1.0  0.0  |  -1.0 |
     *                                                      *

The following is the 2 × 2 process grid:

B,D       0, 2    1
0, 2      P00     P01
1, 3      P10     P11

Local arrays for B:

p,q  |            0              |           1
-----|---------------------------|----------------------
     |  0.0  1.0  1.0  0.0 -1.0  |   0.0  0.0  0.0  0.0
     |  0.0 -1.0  0.0 -1.0 -1.0  |   0.0 -1.0  0.0 -1.0
 0   | -1.0  0.0  0.0  0.0 -1.0  |   1.0  0.0  0.0  0.0
     | -1.0  0.0  0.0  0.0 -1.0  |  -1.0 -1.0  0.0  0.0
-----|---------------------------|----------------------
     |  0.0  0.0  1.0  1.0 -1.0  |   0.0  0.0 -1.0  0.0
     |  0.0 -1.0  0.0  1.0 -1.0  |  -1.0 -1.0  0.0 -1.0
 1   |  0.0  0.0  1.0  0.0 -1.0  |   1.0  0.0  0.0  0.0
     |  1.0  0.0  0.0  0.0 -1.0  |   0.0  0.0  1.0  0.0

Output:

Global symmetric matrix C of order 9 with block size 4 × 4:

B,D             0                       1                2
     *                                                       *
     | -6.0  0.0  0.0  0.0  |   0.0 -2.0 -2.0  0.0  |  -2.0  |
     |   .  -6.0 -2.0  0.0  |  -2.0 -4.0  0.0 -4.0  |  -2.0  |
 0   |   .    .  -6.0 -2.0  |  -2.0  0.0  2.0  0.0  |   6.0  |
     |   .    .    .  -6.0  |   2.0  0.0  2.0  0.0  |   2.0  |
     | ---------------------|-----------------------|------- |
     |   .    .    .    .   |  -8.0 -4.0  0.0 -2.0  |   0.0  |
     |   .    .    .    .   |    .  -6.0  0.0 -4.0  |  -6.0  |
 1   |   .    .    .    .   |    .    .  -4.0  0.0  |   0.0  |
     |   .    .    .    .   |    .    .    .  -4.0  |  -4.0  |
     | ---------------------|-----------------------|------- |
 2   |   .    .    .    .   |    .    .    .    .   |  -16.0 |
     *                                                       *

The following is the 2 × 2 process grid:

B,D       0, 2    1
0, 2      P00     P01
1         P10     P11

Local arrays for C:

p,q  |             0              |           1
-----|----------------------------|----------------------
     | -6.0  0.0  0.0  0.0  -2.0  |   0.0 -2.0 -2.0  0.0
     |   .  -6.0 -2.0  0.0  -2.0  |  -2.0 -4.0  0.0 -4.0
 0   |   .    .  -6.0 -2.0   6.0  |  -2.0  0.0  2.0  0.0
     |   .    .    .  -6.0   2.0  |   2.0  0.0  2.0  0.0
     |   .    .    .    .  -16.0  |    .    .    .    .
-----|----------------------------|----------------------
     |   .    .    .    .    0.0  |  -8.0 -4.0  0.0 -2.0
     |   .    .    .    .   -6.0  |    .  -6.0  0.0 -4.0
 1   |   .    .    .    .    0.0  |    .    .  -4.0  0.0
     |   .    .    .    .   -4.0  |    .    .    .  -4.0
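The output above can likewise be reproduced serially. In this example B is exactly -A, so the result equals -2*A^T*A. The plain-Python sketch below applies the trans = 'T' form of the rank-2k update to the upper triangle. syr2k_reference is a hypothetical helper, not a Parallel ESSL routine. Because beta is zero, the input contents of C are ignored, as the description of c allows.

```python
def syr2k_reference(uplo, trans, alpha, a, b, beta, c):
    """Serial rank-2k update of one triangle of C (lists of rows, 0-based).

    trans = 'N': C <- alpha*A*B^T + alpha*B*A^T + beta*C  (A, B are n x k)
    trans = 'T': C <- alpha*A^T*B + alpha*B^T*A + beta*C  (A, B are k x n)
    """
    if trans.upper() in ("T", "C"):
        # Transposed copies local to this sketch let one loop cover both forms
        a = [list(col) for col in zip(*a)]
        b = [list(col) for col in zip(*b)]
    n, k = len(a), len(a[0])
    for i in range(n):
        cols = range(i, n) if uplo.upper() == "U" else range(i + 1)
        for j in cols:
            s = sum(a[i][l] * b[j][l] + b[i][l] * a[j][l] for l in range(k))
            # When beta is zero, the old value of C is not referenced
            c[i][j] = alpha * s if beta == 0 else alpha * s + beta * c[i][j]
    return c


# Global 8 x 9 matrix A from the example input
a = [
    [ 0, -1, -1,  0,  0,  0,  0,  0, 1],
    [ 0,  1,  0,  1,  0,  1,  0,  1, 1],
    [ 0,  0, -1, -1,  0,  0,  1,  0, 1],
    [ 0,  1,  0, -1,  1,  1,  0,  1, 1],
    [ 1,  0,  0,  0, -1,  0,  0,  0, 1],
    [ 1,  0,  0,  0,  1,  1,  0,  0, 1],
    [ 0,  0, -1,  0, -1,  0,  0,  0, 1],
    [-1,  0,  0,  0,  0,  0, -1,  0, 1],
]
b = [[-x for x in row] for row in a]   # in this example, B is exactly -A
c = [[0.0] * 9 for _ in range(9)]      # beta = 0: these values are not used
syr2k_reference("U", "T", 1.0, a, b, 0.0, c)
print(c[0][0], c[2][8], c[8][8])       # -6.0 6.0 -16.0
```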

PDTRAN--Matrix Transpose for a General Matrix

This subroutine performs the following matrix computation:

$C \leftarrow \beta C + \alpha A^{T}$

where, in the formula above:

A represents the global general submatrix $A_{ia:ia+n-1,\,ja:ja+m-1}$.
C represents the global general submatrix $C_{ic:ic+m-1,\,jc:jc+n-1}$.
alpha and beta are scalars.

Note: No data should be moved to form the matrix transpose; that is, the matrix should always be stored in its untransposed form.
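The indexing in the formula above can be made concrete with a small serial model. In the plain-Python sketch below, tran_reference is a hypothetical helper, and the 2 × 3 matrix is made-up data, not taken from this manual. The helper takes 1-based ia, ja, ic, and jc as in the argument list. It writes element (i, j) of the m by n submatrix C from element (j, i) of the n by m submatrix A.

```python
def tran_reference(m, n, alpha, a, ia, ja, beta, c, ic, jc):
    """Serial C(ic:ic+m-1, jc:jc+n-1) <- beta*C + alpha*A(ia:ia+n-1, ja:ja+m-1)^T.

    a and c are lists of rows; ia, ja, ic, jc are 1-based, as in the argument list.
    """
    for i in range(m):          # row of the C submatrix = column of the A submatrix
        for j in range(n):      # column of the C submatrix = row of the A submatrix
            t = alpha * a[ia - 1 + j][ja - 1 + i]
            ci, cj = ic - 1 + i, jc - 1 + j
            # When beta is zero, the old value of C is not referenced
            c[ci][cj] = t if beta == 0 else beta * c[ci][cj] + t
    return c


a = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
c = [[0.0] * 2 for _ in range(3)]
tran_reference(3, 2, 1.0, a, 1, 1, 0.0, c, 1, 1)
print(c)   # [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
```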

In the following three cases, no computation is performed and the subroutine returns after doing some parameter checking:

See references [14] and [15].

Table 53. Data Types

alpha, beta, A, C         Subprogram
Long-precision real       PDTRAN

Syntax

Fortran CALL PDTRAN (m, n, alpha, a, ia, ja, desc_a, beta, c, ic, jc, desc_c)
C and C++ pdtran (m, n, alpha, a, ia, ja, desc_a, beta, c, ic, jc, desc_c);

On Entry

m

is the number of rows in submatrix C and the number of columns in submatrix A used in the computation.

Scope: global

Specified as: a fullword integer; m >= 0.

n

is the number of columns in submatrix C and the number of rows in submatrix A used in the computation.

Scope: global

Specified as: a fullword integer; n >= 0.

alpha

is the scalar alpha.

Scope: global

Specified as: a number of the data type indicated in Table 53.

a

is the local part of the global general matrix A. This identifies the first element of the local array A. This subroutine computes the location of the first element of the local subarray used, based on ia, ja, desc_a, p, q, myrow, and mycol; therefore, the leading LOCp(ia+n-1) by LOCq(ja+m-1) part of the local array A must contain the local pieces of the leading ia+n-1 by ja+m-1 part of the global matrix.
Note: No data should be moved to form $A^{T}$; that is, the matrix A should always be stored in its untransposed form.

Scope: local

Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 53. Details about the block-cyclic data distribution of global matrix A are stored in desc_a.

ia

is the row index of the global matrix A, identifying the first row of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ia <= M_A and ia+n-1 <= M_A.

ja

is the column index of the global matrix A, identifying the first column of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ja <= N_A and ja+m-1 <= N_A.

desc_a

is the array descriptor for global matrix A, described in the following table:
desc_a Name Description Limits Scope
1 DTYPE_A Descriptor type DTYPE_A=1 Global
2 CTXT_A BLACS context Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP Global
3 M_A Number of rows in the global matrix
If m = 0 or n = 0:
     M_A >= 0
Otherwise:
     M_A >= 1

Global
4 N_A Number of columns in the global matrix
If m = 0 or n = 0:
     N_A >= 0
Otherwise:
     N_A >= 1

Global
5 MB_A Row block size MB_A >= 1 Global
6 NB_A Column block size NB_A >= 1 Global
7 RSRC_A The process row of the p × q grid over which the first row of the global matrix is distributed 0 <= RSRC_A < p Global
8 CSRC_A The process column of the p × q grid over which the first column of the global matrix is distributed 0 <= CSRC_A < q Global
9 LLD_A The leading dimension of the local array LLD_A >= max(1,LOCp(M_A)) Local

Specified as: an array of (at least) length 9, containing fullword integers.

beta

is the scalar beta.

Scope: global

Specified as: a number of the data type indicated in Table 53.

c

is the local part of the global general matrix C. This identifies the first element of the local array C. This subroutine computes the location of the first element of the local subarray used, based on ic, jc, desc_c, p, q, myrow, and mycol; therefore, the leading LOCp(ic+m-1) by LOCq(jc+n-1) part of the local array C must contain the local pieces of the leading ic+m-1 by jc+n-1 part of the global matrix.

When beta is zero, C need not be set on input.

Scope: local

Specified as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 53. Details about the block-cyclic data distribution of global matrix C are stored in desc_c.

ic

is the row index of the global matrix C, identifying the first row of the submatrix C.

Scope: global

Specified as: a fullword integer; 1 <= ic <= M_C and ic+m-1 <= M_C.

jc

is the column index of the global matrix C, identifying the first column of the submatrix C.

Scope: global

Specified as: a fullword integer; 1 <= jc <= N_C and jc+n-1 <= N_C.

desc_c

is the array descriptor for global matrix C, described in the following table:
desc_c Name Description Limits Scope
1 DTYPE_C Descriptor type DTYPE_C=1 Global
2 CTXT_C BLACS context Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP Global
3 M_C Number of rows in the global matrix
If m = 0 or n = 0:
     M_C >= 0
Otherwise:
     M_C >= 1

Global
4 N_C Number of columns in the global matrix
If m = 0 or n = 0:
     N_C >= 0
Otherwise:
     N_C >= 1

Global
5 MB_C Row block size MB_C >= 1 Global
6 NB_C Column block size NB_C >= 1 Global
7 RSRC_C The process row of the p × q grid over which the first row of the global matrix is distributed 0 <= RSRC_C < p Global
8 CSRC_C The process column of the p × q grid over which the first column of the global matrix is distributed 0 <= CSRC_C < q Global
9 LLD_C The leading dimension of the local array LLD_C >= max(1,LOCp(M_C)) Local

Specified as: an array of (at least) length 9, containing fullword integers.

On Return

c

is the updated local part of the global general matrix C, containing the results of the computation.

Scope: local

Returned as: an LLD_C by (at least) LOCq(N_C) array, containing numbers of the data type indicated in Table 53.

Notes and Coding Rules

  1. The matrices must have no common elements; otherwise, results are unpredictable.

  2. The NUMROC utility subroutine can be used to determine the values of LOCp(M_) and LOCq(N_) used in the argument descriptions above. For details, see "Determining the Number of Rows and Columns in Your Local Arrays" and NUMROC--Compute the Number of Rows or Columns of a Block-Cyclically Distributed Matrix Contained in a Process.

  3. For suggested block sizes, see "Coding Tips for Optimizing Parallel Performance".

  4. The following values must be equal: CTXT_A = CTXT_C.

  5. The coding rules (given in this section) and the error conditions (given in the next section) are written in terms of adist. To determine a value for adist, check the following conditions, in order, and choose the first value whose condition is true:

    1. If A is a block column matrix, that is:
      m+mod(ja-1, NB_A) <= NB_A

      then adist = 'C'

    2. If A is a block row matrix, that is:
      n+mod(ia-1, MB_A) <= MB_A

      then adist = 'R'

    3. If A is neither a block column matrix nor a block row matrix, then:

      • If m <= n, then adist = 'C'.

      • Otherwise, adist = 'R'.

  6. If adist = 'C', then you must follow these coding rules:

  7. If adist = 'R', then you must follow these coding rules:
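The order of the tests in note 5 matters, because a matrix can be both a block column matrix and a block row matrix, and the first true condition wins. A plain-Python sketch of the selection follows; pdtran_adist is a hypothetical helper. Take the values from the Example later in this section: m = 9, n = 8, ia = ja = 1, MB_A = 2, and NB_A = 4. There A is neither a block column matrix nor a block row matrix, and m > n, so adist = 'R'.

```python
def pdtran_adist(m, n, ia, ja, mb_a, nb_a):
    """Choose adist as in note 5: check the conditions in order; the first true one wins."""
    if m + (ja - 1) % nb_a <= nb_a:      # A is a block column matrix
        return "C"
    if n + (ia - 1) % mb_a <= mb_a:      # A is a block row matrix
        return "R"
    return "C" if m <= n else "R"        # neither: compare m and n


print(pdtran_adist(9, 8, 1, 1, 2, 4))   # R
```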

Error Conditions

Computational Errors

None

Resource Errors

Unable to allocate work space

Input-Argument and Miscellaneous Errors

Stage 1
  1. DTYPE_A is invalid.
  2. DTYPE_C is invalid.

Stage 2
  1. CTXT_A is invalid.

Stage 3
  1. PDTRAN was called from outside the process grid.

Stage 4
  1. m < 0
  2. n < 0
  3. M_A < 0 and (m = 0 or n = 0); M_A < 1 otherwise
  4. N_A < 0 and (m = 0 or n = 0); N_A < 1 otherwise
  5. MB_A < 1
  6. NB_A < 1
  7. RSRC_A < 0 or RSRC_A >= p
  8. CSRC_A < 0 or CSRC_A >= q
  9. ia < 1
  10. ja < 1
  11. M_C < 0 and (m = 0 or n = 0); M_C < 1 otherwise
  12. N_C < 0 and (m = 0 or n = 0); N_C < 1 otherwise
  13. MB_C < 1
  14. NB_C < 1
  15. RSRC_C < 0 or RSRC_C >= p
  16. CSRC_C < 0 or CSRC_C >= q
  17. ic < 1
  18. jc < 1
  19. CTXT_A <> CTXT_C

Stage 5
Note: Some of the following error conditions depend on the value of adist (that is, whether adist = 'C' or adist = 'R'). For details on determining the value, see "Notes and Coding Rules".

If m <> 0 and n <> 0:

  1. ia > M_A
  2. ja > N_A
  3. ia+n-1 > M_A
  4. ja+m-1 > N_A
  5. ic > M_C
  6. jc > N_C
  7. ic+m-1 > M_C
  8. jc+n-1 > N_C

If adist = 'C':

  1. mod(ia-1, MB_A) <> 0
  2. mod(jc-1, NB_C) <> 0
  3. MB_A <> NB_C
  4. If looping is required--that is, either of the following is true:
    m+mod(ja-1, NB_A) > NB_A
    m+mod(ic-1, MB_C) > MB_C

    then:

    1. mod(ja-1, NB_A) <> mod(ic-1, MB_C)
    2. NB_A <> MB_C.

If adist = 'R':

  1. mod(ja-1, NB_A) <> 0
  2. mod(ic-1, MB_C) <> 0
  3. NB_A <> MB_C
  4. If looping is required--that is, either of the following is true:
    n+mod(ia-1, MB_A) > MB_A
    n+mod(jc-1, NB_C) > NB_C

    then:

    1. mod(ia-1, MB_A) <> mod(jc-1, NB_C)
    2. MB_A <> NB_C.
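The Stage 5 conditions can be sketched the same way, taking adist as an input (determined as described in "Notes and Coding Rules"). In this Python sketch, `stage5_errors` and the dict-based descriptors are hypothetical. The function returns each true condition as it is printed above. Following the layout above, it applies the adist-dependent checks independently of the m <> 0 and n <> 0 test; that reading is an assumption.

```python
from typing import Dict, List, Tuple

Desc = Dict[str, int]  # hypothetical: descriptor fields keyed by name


def stage5_errors(adist: str, m: int, n: int, desc_a: Desc, ia: int, ja: int,
                  desc_c: Desc, ic: int, jc: int) -> List[str]:
    """Return the Stage 5 conditions (as printed in the list) that are true."""
    M_A, N_A, MB_A, NB_A = (desc_a[k] for k in ("M", "N", "MB", "NB"))
    M_C, N_C, MB_C, NB_C = (desc_c[k] for k in ("M", "N", "MB", "NB"))
    checks: List[Tuple[str, bool]] = []
    if m != 0 and n != 0:
        # A is n x m starting at (ia, ja); C is m x n starting at (ic, jc).
        checks += [
            ("ia > M_A", ia > M_A), ("ja > N_A", ja > N_A),
            ("ia+n-1 > M_A", ia + n - 1 > M_A), ("ja+m-1 > N_A", ja + m - 1 > N_A),
            ("ic > M_C", ic > M_C), ("jc > N_C", jc > N_C),
            ("ic+m-1 > M_C", ic + m - 1 > M_C), ("jc+n-1 > N_C", jc + n - 1 > N_C),
        ]
    if adist == "C":
        checks += [
            ("mod(ia-1, MB_A) <> 0", (ia - 1) % MB_A != 0),
            ("mod(jc-1, NB_C) <> 0", (jc - 1) % NB_C != 0),
            ("MB_A <> NB_C", MB_A != NB_C),
        ]
        # Looping: A's columns or C's rows span more than one block.
        if m + (ja - 1) % NB_A > NB_A or m + (ic - 1) % MB_C > MB_C:
            checks += [
                ("mod(ja-1, NB_A) <> mod(ic-1, MB_C)",
                 (ja - 1) % NB_A != (ic - 1) % MB_C),
                ("NB_A <> MB_C", NB_A != MB_C),
            ]
    else:  # adist == "R"
        checks += [
            ("mod(ja-1, NB_A) <> 0", (ja - 1) % NB_A != 0),
            ("mod(ic-1, MB_C) <> 0", (ic - 1) % MB_C != 0),
            ("NB_A <> MB_C", NB_A != MB_C),
        ]
        # Looping: A's rows or C's columns span more than one block.
        if n + (ia - 1) % MB_A > MB_A or n + (jc - 1) % NB_C > NB_C:
            checks += [
                ("mod(ia-1, MB_A) <> mod(jc-1, NB_C)",
                 (ia - 1) % MB_A != (jc - 1) % NB_C),
                ("MB_A <> NB_C", MB_A != NB_C),
            ]
    return [label for label, bad in checks if bad]


# Descriptors from the example; adist = 'R' for that case.
desc_a = {"M": 8, "N": 9, "MB": 2, "NB": 4}
desc_c = {"M": 9, "N": 8, "MB": 4, "NB": 2}
print(stage5_errors("R", 9, 8, desc_a, 1, 1, desc_c, 1, 1))  # prints []
```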

Stage 6
  1. LLD_A < max(1, LOCp(M_A))
  2. LLD_C < max(1, LOCp(M_C))

Example

This example computes C = beta*C + alpha*A^T using a 2 × 2 process grid.

Call Statements and Input


ORDER = 'R'
NPROW = 2
NPCOL = 2
CALL BLACS_GET (0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
              M    N     ALPHA    A  IA  JA    DESC_A    BETA    C  IC  JC   DESC_C
              |    |       |      |   |   |      |         |     |   |   |     |
CALL PDTRAN(  9  , 8  ,  1.0D0  , A , 1 , 1 ,  DESC_A ,  1.0D0 , C , 1 , 1 , DESC_C )



        |   DESC_A        |   DESC_C
--------|-----------------|-----------------
DTYPE_  |   1             |   1
CTXT_   |   icontxt(1)    |   icontxt(1)
M_      |   8             |   9
N_      |   9             |   8
MB_     |   2             |   4
NB_     |   4             |   2
RSRC_   |   0             |   0
CSRC_   |   0             |   0
LLD_    |   See below(2)  |   See below(2)

(1) icontxt is the output of the BLACS_GRIDINIT call.

(2) Each process should set the LLD_ as follows:

LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
LLD_C = MAX(1,NUMROC(M_C, MB_C, MYROW, RSRC_C, NPROW))

In this example, LLD_A = 4 on all processes, LLD_C = 5 on P00 and P01, and LLD_C = 4 on P10 and P11.
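The LLD_ values above can be reproduced with a small Python port of the NUMROC computation. This is a sketch, assuming NUMROC follows the reference ScaLAPACK algorithm: split the whole blocks evenly over the processes, then give each leftover whole block, and finally the trailing partial block, to the processes that follow the source process. The function `numroc` here is a local stand-in, not the Parallel ESSL routine.

```python
def numroc(n: int, nb: int, iproc: int, isrcproc: int, nprocs: int) -> int:
    """Number of rows (or columns) of an n-long, block-cyclically distributed
    dimension with block size nb that process iproc owns."""
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from the source process
    nblocks = n // nb                              # number of whole blocks
    count = (nblocks // nprocs) * nb               # whole blocks every process gets
    extrablks = nblocks % nprocs                   # leftover whole blocks
    if mydist < extrablks:
        count += nb                                # one extra whole block
    elif mydist == extrablks:
        count += n % nb                            # the trailing partial block
    return count


# LLD_A uses M_A = 8, MB_A = 2; LLD_C uses M_C = 9, MB_C = 4; NPROW = 2.
for myrow in (0, 1):
    lld_a = max(1, numroc(8, 2, myrow, 0, 2))
    lld_c = max(1, numroc(9, 4, myrow, 0, 2))
    print(myrow, lld_a, lld_c)  # prints "0 4 5", then "1 4 4"
```

For C, the two whole row blocks are shared evenly between the two process rows, and the trailing one-row block goes to process row 0, which is why LLD_C is 5 there and 4 on process row 1.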

Global general 8 × 9 matrix A with block size 2 × 4:

B,D             0                       1               2
     *                                                      *
 0   |  0.0 -1.0 -1.0  0.0  |   0.0  0.0  0.0  0.0  |   1.0 |
     |  0.0  1.0  0.0  1.0  |   0.0  1.0  0.0  1.0  |   1.0 |
     | ---------------------|-----------------------|------ |
 1   |  0.0  0.0 -1.0 -1.0  |   0.0  0.0  1.0  0.0  |   1.0 |
     |  0.0  1.0  0.0 -1.0  |   1.0  1.0  0.0  1.0  |   1.0 |
     | ---------------------|-----------------------|------ |
 2   |  1.0  0.0  0.0  0.0  |  -1.0  0.0  0.0  0.0  |   1.0 |
     |  1.0  0.0  0.0  0.0  |   1.0  1.0  0.0  0.0  |   1.0 |
     | ---------------------|-----------------------|------ |
 3   |  0.0  0.0 -1.0  0.0  |  -1.0  0.0  0.0  0.0  |   1.0 |
     | -1.0  0.0  0.0  0.0  |   0.0  0.0 -1.0  0.0  |   1.0 |
     *                                                      *

The following is the 2 × 2 process grid:
 B,D  |  0 2  |   1
------|-------|-------
 0 2  |  P00  |  P01
------|-------|-------
 1 3  |  P10  |  P11

Local arrays for A:

p,q  |            0              |           1
-----|---------------------------|----------------------
     |  0.0 -1.0 -1.0  0.0  1.0  |   0.0  0.0  0.0  0.0
     |  0.0  1.0  0.0  1.0  1.0  |   0.0  1.0  0.0  1.0
 0   |  1.0  0.0  0.0  0.0  1.0  |  -1.0  0.0  0.0  0.0
     |  1.0  0.0  0.0  0.0  1.0  |   1.0  1.0  0.0  0.0
-----|---------------------------|----------------------
     |  0.0  0.0 -1.0 -1.0  1.0  |   0.0  0.0  1.0  0.0
     |  0.0  1.0  0.0 -1.0  1.0  |   1.0  1.0  0.0  1.0
 1   |  0.0  0.0 -1.0  0.0  1.0  |  -1.0  0.0  0.0  0.0
     | -1.0  0.0  0.0  0.0  1.0  |   0.0  0.0 -1.0  0.0
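The local arrays above follow from the block-cyclic mapping: global row i belongs to process row mod(RSRC_A + (i-1)/MB_A, NPROW) (integer division), and global column j is mapped the same way using NB_A, CSRC_A, and NPCOL. The Python sketch below extracts one process's local array from the global matrix. The helpers `owner` and `local_array` are hypothetical, and the sketch relies on the fact that, under block-cyclic distribution, a process stores its rows and columns in the same order as they appear globally.

```python
from typing import List

Matrix = List[List[float]]


def owner(g: int, nb: int, src: int, nprocs: int) -> int:
    """Process coordinate that owns 1-based global index g (block size nb)."""
    return (src + (g - 1) // nb) % nprocs


def local_array(a: Matrix, mb: int, nb: int, rsrc: int, csrc: int,
                nprow: int, npcol: int, myrow: int, mycol: int) -> Matrix:
    """Return the local array that process (myrow, mycol) holds."""
    rows = [i for i in range(1, len(a) + 1) if owner(i, mb, rsrc, nprow) == myrow]
    cols = [j for j in range(1, len(a[0]) + 1) if owner(j, nb, csrc, npcol) == mycol]
    return [[a[i - 1][j - 1] for j in cols] for i in rows]


# Global 8 x 9 matrix A from the example (values written as integers).
A = [
    [0, -1, -1, 0, 0, 0, 0, 0, 1],
    [0, 1, 0, 1, 0, 1, 0, 1, 1],
    [0, 0, -1, -1, 0, 0, 1, 0, 1],
    [0, 1, 0, -1, 1, 1, 0, 1, 1],
    [1, 0, 0, 0, -1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 1, 0, 0, 1],
    [0, 0, -1, 0, -1, 0, 0, 0, 1],
    [-1, 0, 0, 0, 0, 0, -1, 0, 1],
]

# Local array on P10 (MB_A = 2, NB_A = 4, RSRC_A = CSRC_A = 0, 2 x 2 grid).
for row in local_array(A, 2, 4, 0, 0, 2, 2, 1, 0):
    print(row)
```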

Global general 9 × 8 matrix C with block size 4 × 2:

B,D        0             1             2             3
     *                                                     *
     |  0.0  1.0  |   1.0  5.0  |   6.0  7.0  |   8.0  9.0 |
     |  0.0 -1.0  |   0.0 -1.0  |   0.0 -1.0  |   0.0  1.0 |
 0   |  0.0  0.0  |   1.0  1.0  |   0.0  0.0  |  -1.0  0.0 |
     |  0.0 -1.0  |   0.0  1.0  |  -1.0 -1.0  |   0.0  1.0 |
     | -----------|-------------|-------------|----------- |
     | -1.0  2.0  |   0.0  0.0  |   1.0  0.0  |   0.0  0.0 |
     | -1.0  3.0  |   0.0  0.0  |  -1.0 -1.0  |   0.0  0.0 |
 1   |  0.0  4.0  |   1.0  0.0  |   1.0  0.0  |   0.0  0.0 |
     |  1.0  5.0  |   0.0  0.0  |   0.0  0.0  |   1.0  0.0 |
     | -----------|-------------|-------------|----------- |
 2   |  1.0  2.0  |   3.0  4.0  |   1.0  1.0  |   1.0  1.0 |
     *                                                     *

The following is the 2 × 2 process grid:
 B,D  |  0 2  |  1 3
------|-------|-------
 0 2  |  P00  |  P01
------|-------|-------
  1   |  P10  |  P11

Local arrays for C:

p,q  |          0           |           1
-----|----------------------|----------------------
     |  0.0  1.0  6.0  7.0  |   1.0  5.0  8.0  9.0
     |  0.0 -1.0  0.0 -1.0  |   0.0 -1.0  0.0  1.0
 0   |  0.0  0.0  0.0  0.0  |   1.0  1.0 -1.0  0.0
     |  0.0 -1.0 -1.0 -1.0  |   0.0  1.0  0.0  1.0
     |  1.0  2.0  1.0  1.0  |   3.0  4.0  1.0  1.0
-----|----------------------|----------------------
     | -1.0  2.0  1.0  0.0  |   0.0  0.0  0.0  0.0
     | -1.0  3.0 -1.0 -1.0  |   0.0  0.0  0.0  0.0
 1   |  0.0  4.0  1.0  0.0  |   1.0  0.0  0.0  0.0
     |  1.0  5.0  0.0  0.0  |   0.0  0.0  1.0  0.0

Output:

Global general 9 × 8 matrix C with block size 4 × 2:

B,D        0             1             2             3
     *                                                     *
     |  0.0  1.0  |   1.0  5.0  |   7.0  8.0  |   8.0  8.0 |
     | -1.0  0.0  |   0.0  0.0  |   0.0 -1.0  |   0.0  1.0 |
 0   | -1.0  0.0  |   0.0  1.0  |   0.0  0.0  |  -2.0  0.0 |
     |  0.0  0.0  |  -1.0  0.0  |  -1.0 -1.0  |   0.0  1.0 |
     | -----------|-------------|-------------|----------- |
     | -1.0  2.0  |   0.0  1.0  |   0.0  1.0  |  -1.0  0.0 |
     | -1.0  4.0  |   0.0  1.0  |  -1.0  0.0  |   0.0  0.0 |
 1   |  0.0  4.0  |   2.0  0.0  |   1.0  0.0  |   0.0 -1.0 |
     |  1.0  6.0  |   0.0  1.0  |   0.0  0.0  |   1.0  0.0 |
     | -----------|-------------|-------------|----------- |
 2   |  2.0  3.0  |   4.0  5.0  |   2.0  2.0  |   2.0  2.0 |
     *                                                     *

The following is the 2 × 2 process grid:
 B,D  |  0 2  |  1 3
------|-------|-------
 0 2  |  P00  |  P01
------|-------|-------
  1   |  P10  |  P11

Local arrays for C:

p,q  |          0           |           1
-----|----------------------|----------------------
     |  0.0  1.0  7.0  8.0  |   1.0  5.0  8.0  8.0
     | -1.0  0.0  0.0 -1.0  |   0.0  0.0  0.0  1.0
 0   | -1.0  0.0  0.0  0.0  |   0.0  1.0 -2.0  0.0
     |  0.0  0.0 -1.0 -1.0  |  -1.0  0.0  0.0  1.0
     |  2.0  3.0  2.0  2.0  |   4.0  5.0  2.0  2.0
-----|----------------------|----------------------
     | -1.0  2.0  0.0  1.0  |   0.0  1.0 -1.0  0.0
     | -1.0  4.0 -1.0  0.0  |   0.0  1.0  0.0  0.0
 1   |  0.0  4.0  1.0  0.0  |   2.0  0.0  0.0 -1.0
     |  1.0  6.0  0.0  0.0  |   0.0  1.0  1.0  0.0
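The output can be cross-checked with a serial computation of C = beta*C + alpha*A^T on the global matrices. This is a plain Python sketch: the helper `tran_update` is hypothetical and performs no distribution or communication. It only confirms the arithmetic of the example, where alpha = beta = 1.0.

```python
from typing import List

Matrix = List[List[float]]


def tran_update(alpha: float, a: Matrix, beta: float, c: Matrix) -> Matrix:
    """Return beta*C + alpha*A^T, where C is m x n and A is n x m."""
    m, n = len(c), len(c[0])
    if len(a) != n or len(a[0]) != m:
        raise ValueError("A must be n x m when C is m x n")
    return [[beta * c[i][j] + alpha * a[j][i] for j in range(n)] for i in range(m)]


# Global input matrices from the example (values written as integers).
A = [
    [0, -1, -1, 0, 0, 0, 0, 0, 1],
    [0, 1, 0, 1, 0, 1, 0, 1, 1],
    [0, 0, -1, -1, 0, 0, 1, 0, 1],
    [0, 1, 0, -1, 1, 1, 0, 1, 1],
    [1, 0, 0, 0, -1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 1, 0, 0, 1],
    [0, 0, -1, 0, -1, 0, 0, 0, 1],
    [-1, 0, 0, 0, 0, 0, -1, 0, 1],
]
C = [
    [0, 1, 1, 5, 6, 7, 8, 9],
    [0, -1, 0, -1, 0, -1, 0, 1],
    [0, 0, 1, 1, 0, 0, -1, 0],
    [0, -1, 0, 1, -1, -1, 0, 1],
    [-1, 2, 0, 0, 1, 0, 0, 0],
    [-1, 3, 0, 0, -1, -1, 0, 0],
    [0, 4, 1, 0, 1, 0, 0, 0],
    [1, 5, 0, 0, 0, 0, 1, 0],
    [1, 2, 3, 4, 1, 1, 1, 1],
]
# Expected global output matrix C from the example.
C_OUT = [
    [0, 1, 1, 5, 7, 8, 8, 8],
    [-1, 0, 0, 0, 0, -1, 0, 1],
    [-1, 0, 0, 1, 0, 0, -2, 0],
    [0, 0, -1, 0, -1, -1, 0, 1],
    [-1, 2, 0, 1, 0, 1, -1, 0],
    [-1, 4, 0, 1, -1, 0, 0, 0],
    [0, 4, 2, 0, 1, 0, 0, -1],
    [1, 6, 0, 1, 0, 0, 1, 0],
    [2, 3, 4, 5, 2, 2, 2, 2],
]

print(tran_update(1.0, A, 1.0, C) == C_OUT)  # prints True
```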

