Guide and Reference


Reference Information (HPF)

This part of the book provides reference information for coding the Parallel ESSL calling sequences in a High Performance Fortran (HPF) program. It is organized into five areas.


PBLAS (HPF)

This chapter describes the Level 2 and 3 PBLAS subroutines that can be called from an HPF program.


Overview of the PBLAS Subroutines

The Level 2 and 3 PBLAS include a subset of the standard set of distributed memory parallel versions of the Level 2 and 3 BLAS.
Note: These subroutines are designed to be consistent with the proposals for the Fortran 90 BLAS and the Fortran 90 LAPACK. (See references [30] and [31].) If these subroutines do not comply with any eventual proposal for HPF interfaces to the PBLAS and ScaLAPACK, IBM will consider updating them to do so. If IBM updates these subroutines, the update could require modifications of the calling application program.

Level 2 PBLAS


Table 110. List of Level 2 PBLAS (HPF)
Descriptive Name | Long-Precision Subprogram | Page
Matrix-Vector Product for a General Matrix or Its Transpose | GEMM | GEMM--Matrix-Matrix Product for a General Matrix, Its Transpose, or Its Conjugate Transpose
Matrix-Vector Product for a Real Symmetric Matrix | SYMM | SYMM--Matrix-Matrix Product Where One Matrix is Real Symmetric
Rank-One Update of a General Matrix | GEMM | GEMM--Matrix-Matrix Product for a General Matrix, Its Transpose, or Its Conjugate Transpose
Rank-One Update of a Real Symmetric Matrix | SYRK | SYRK--Rank-K Update of a Real Symmetric Matrix
Rank-Two Update of a Real Symmetric Matrix | SYR2K | SYR2K--Rank-2K Update of a Real Symmetric Matrix
Matrix-Vector Product for a Triangular Matrix or Its Transpose | TRMM | TRMM--Triangular Matrix-Matrix Product
Solution of Triangular System of Equations with a Single Right-Hand Side | TRSM | TRSM--Solution of Triangular System of Equations with Multiple Right-Hand Sides

Level 3 PBLAS


Table 111. List of Level 3 PBLAS (HPF)
Descriptive Name | Long-Precision Subprogram | Page
Matrix-Matrix Product for a General Matrix, Its Transpose, or Its Conjugate Transpose | GEMM | GEMM--Matrix-Matrix Product for a General Matrix, Its Transpose, or Its Conjugate Transpose
Matrix-Matrix Product Where One Matrix is Real Symmetric | SYMM | SYMM--Matrix-Matrix Product Where One Matrix is Real Symmetric
Triangular Matrix-Matrix Product | TRMM | TRMM--Triangular Matrix-Matrix Product
Solution of Triangular System of Equations with Multiple Right-Hand Sides | TRSM | TRSM--Solution of Triangular System of Equations with Multiple Right-Hand Sides
Rank-K Update of a Real Symmetric Matrix | SYRK | SYRK--Rank-K Update of a Real Symmetric Matrix
Rank-2K Update of a Real Symmetric Matrix | SYR2K | SYR2K--Rank-2K Update of a Real Symmetric Matrix
Matrix Transpose for a General Matrix | TRAN | TRAN--Matrix Transpose for a General Matrix

PBLAS Subroutines

This section contains the PBLAS subroutine descriptions.

GEMM--Matrix-Matrix Product for a General Matrix, Its Transpose, or Its Conjugate Transpose

This subroutine performs any one of the following combined matrix computations:

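1. $C \leftarrow \alpha A B + \beta C$
2. $C \leftarrow \alpha A B^{T} + \beta C$
3. $C \leftarrow \alpha A^{T} B + \beta C$
4. $C \leftarrow \alpha A^{T} B^{T} + \beta C$
5. $C \leftarrow \alpha A^{H} B + \beta C$
6. $C \leftarrow \alpha A^{H} B^{T} + \beta C$
7. $C \leftarrow \alpha A B^{H} + \beta C$
8. $C \leftarrow \alpha A^{T} B^{H} + \beta C$
9. $C \leftarrow \alpha A^{H} B^{H} + \beta C$
10. $c \leftarrow \alpha A b + \beta c$
11. $c \leftarrow \alpha A^{T} b + \beta c$
12. $C \leftarrow \alpha a b^{T} + C$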

where, in the formulas above:

A, B, and C are general matrices.
a, b, and c are vectors.
alpha and beta are scalars.

Note: No data should be moved to form the matrix transposes or matrix conjugate transposes; that is, the matrices should always be stored in their untransposed forms.
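The note above can be illustrated on a single process. The following is a minimal sketch, not the Parallel ESSL implementation: it uses only the standard Fortran intrinsics MATMUL and TRANSPOSE to evaluate equation 3 while the stored array stays in its untransposed form. The program name, the array values, and the variable a_saved are assumptions made for illustration.

```fortran
program gemm_op_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp), parameter :: alpha = 1.0_dp, beta = 2.0_dp
  real(dp) :: a(2,3), a_saved(2,3), b(2,2), c(3,2)

  ! A is stored once, in its untransposed 2 x 3 form (column-major fill):
  !   A = | 1 2 3 |
  !       | 4 5 6 |
  a = reshape((/ 1.0_dp, 4.0_dp, 2.0_dp, 5.0_dp, 3.0_dp, 6.0_dp /), (/ 2, 3 /))
  ! B is the 2 x 2 identity, C starts at 0.5 everywhere.
  b = reshape((/ 1.0_dp, 0.0_dp, 0.0_dp, 1.0_dp /), (/ 2, 2 /))
  c = 0.5_dp
  a_saved = a

  ! Equation 3: C <- alpha * A**T * B + beta * C.
  ! TRANSPOSE forms the transposed operand inside the expression only;
  ! the array a itself is never rearranged.
  c = alpha * matmul(transpose(a), b) + beta * c

  print *, 'A left in untransposed form: ', all(a == a_saved)
  print *, 'C(1,:) = ', c(1,:)
end program gemm_op_sketch
```

For complex data, the conjugate transpose in equations 5 through 9 would correspond to CONJG(TRANSPOSE(a)) in the same position.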

In the following cases, no computation is performed and the subroutine returns after doing some parameter checking:

See references [17], [30], [31], and [44].

Table 112. Data Types
alpha, beta, A, B, C, a, b, c | Subroutine
Long-precision real | GEMM
Long-precision complex | GEMM

Syntax

HPF, Equations 1-9:
    CALL GEMM (alpha, a, b, beta, c)
    CALL GEMM (alpha, a, b, beta, c, transa, transb)

HPF, Equations 10 and 11:
    CALL GEMM (alpha, a, b, beta, c)
    CALL GEMM (alpha, a, b, beta, c, transa)

HPF, Equation 12:
    CALL GEMM (alpha, a, b, c)

On Entry

alpha

is the scalar alpha.

Type: required

Specified as: a number of the data type indicated in Table 112.

a

is the general matrix A or the vector a, where:

If transa = 'N', $A$ is used in the computation.

If transa = 'T', $A^{T}$ is used in the computation.

If transa = 'C', $A^{H}$ is used in the computation.

Note: No data should be moved to form $A^{T}$ or $A^{H}$; that is, the matrix A should always be stored in its untransposed form.

Type: required

Specified as: an assumed-shape array with shape (:,:) or (:), containing numbers of the data type indicated in Table 112.

b

is the general matrix B or the vector b, where:

If transb = 'N', $B$ is used in the computation.

If transb = 'T', $B^{T}$ is used in the computation.

If transb = 'C', $B^{H}$ is used in the computation.

Note: No data should be moved to form $B^{T}$ or $B^{H}$; that is, the matrix B should always be stored in its untransposed form.

Type: required

Specified as: an assumed-shape array with shape (:,:) or (:), containing numbers of the data type indicated in Table 112.

beta

is the scalar beta.

Type: required (equations 1-11); not present (equation 12)

Specified as: a number of the data type indicated in Table 112.

c

is the general matrix C or the vector c. When beta is zero, c need not be set on input.

Type: required

Specified as: an assumed-shape array with shape (:,:) or (:), containing numbers of the data type indicated in Table 112.

transa

indicates the form of matrix A to use in the computation, where:

If transa = 'N', $A$ is used in the computation, resulting in equation 1, 2, 7, or 10.

If transa = 'T', $A^{T}$ is used in the computation, resulting in equation 3, 4, 8, or 11.

If transa = 'C', $A^{H}$ is used in the computation, resulting in equation 5, 6, or 9.

Type: optional (equations 1-11); not present (equation 12)

Default: transa = 'N'

Specified as: a single character; transa = 'N', 'T', or 'C'.

transb

indicates the form of matrix B to use in the computation, where:

If transb = 'N', $B$ is used in the computation, resulting in equation 1, 3, or 5.

If transb = 'T', $B^{T}$ is used in the computation, resulting in equation 2, 4, or 6.

If transb = 'C', $B^{H}$ is used in the computation, resulting in equation 7, 8, or 9.

Type: optional (equations 1-9); not present (equations 10-12)

Default: transb = 'N'

Specified as: a single character; transb = 'N', 'T', or 'C'.
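The optional transa and transb arguments, with their default of 'N', follow the usual Fortran 90 pattern of OPTIONAL dummy arguments tested with PRESENT. The sketch below is a hypothetical single-process stand-in named ref_gemm, written only to show that pattern for long-precision real data. It is not the GEMM interface from the PESSL_HPF module, it does no argument checking, and it treats any value other than 'N' as a transpose request.

```fortran
module ref_gemm_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical serial reference for equations 1-4 with real data.
  subroutine ref_gemm(alpha, a, b, beta, c, transa, transb)
    real(dp), intent(in)            :: alpha, beta
    real(dp), intent(in)            :: a(:,:), b(:,:)
    real(dp), intent(inout)         :: c(:,:)
    character, intent(in), optional :: transa, transb
    character :: ta, tb

    ! An absent optional argument takes the documented default 'N'.
    ta = 'N'
    if (present(transa)) ta = transa
    tb = 'N'
    if (present(transb)) tb = transb

    if (ta == 'N' .and. tb == 'N') then
      c = alpha * matmul(a, b) + beta * c                        ! equation 1
    else if (ta == 'N') then
      c = alpha * matmul(a, transpose(b)) + beta * c             ! equation 2
    else if (tb == 'N') then
      c = alpha * matmul(transpose(a), b) + beta * c             ! equation 3
    else
      c = alpha * matmul(transpose(a), transpose(b)) + beta * c  ! equation 4
    end if
  end subroutine ref_gemm
end module ref_gemm_mod

program optional_args_sketch
  use ref_gemm_mod
  implicit none
  real(dp) :: a(2,2), b(2,2), c1(2,2), c2(2,2)

  a = reshape((/ 1.0_dp, 3.0_dp, 2.0_dp, 4.0_dp /), (/ 2, 2 /))
  b = reshape((/ 0.0_dp, 1.0_dp, 1.0_dp, 0.0_dp /), (/ 2, 2 /))
  c1 = 0.5_dp
  c2 = 0.5_dp

  ! Omitting the optional arguments is the same as passing 'N' for both.
  call ref_gemm(1.0_dp, a, b, 2.0_dp, c1)
  call ref_gemm(1.0_dp, a, b, 2.0_dp, c2, transa='N', transb='N')
  print *, 'Both call forms agree: ', all(c1 == c2)
end program optional_args_sketch
```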

On Return

c

is the updated matrix C or vector c, containing the results of the computation.

Type: required

Returned as: an assumed-shape array with shape (:,:) or (:), containing numbers of the data type indicated in Table 112.

Notes and Coding Rules

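  1. The assumed-shape arrays must have the exact size required for the computation. The shape combinations that are rejected as incompatible are listed under "Error Conditions" for this subroutine.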

  2. This subroutine accepts lowercase letters for the transa and transb arguments.

  3. If you are using long-precision real data and specify 'C' for the transa or transb argument, it is interpreted as though you specified 'T'.

  4. The assumed-shape arrays must have no common elements; otherwise, results are unpredictable.

  5. For details on how to set up and code your HPF program using Parallel ESSL, see "Coding Your HPF Program".

  6. Block-cyclic data distribution is required for your array data. Because data directives are included in the interface module PESSL_HPF, you can specify any data distribution for your vectors and matrices, and the XL HPF compiler will, if necessary, redistribute the data prior to calling this subroutine. For how to code your HPF directives, see "Distributing Data in an HPF Program". For a sample program including directives, see Figure 9.

  7. The restrictions given in "Notes and Coding Rules", "Notes and Coding Rules", and "Notes and Coding Rules" also apply to this subroutine.

  8. An example of the use of this subroutine in a thermal diffusion application program is shown in Appendix B. "Sample Programs". See "Program Main (HPF)".
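Rules 2 and 3 above can be pictured as a small normalization step applied to transa and transb. The function below is a hypothetical helper written for illustration, assuming long-precision real data; the name normalize_trans and the '?' return value for an invalid character are not part of Parallel ESSL.

```fortran
program trans_normalize_sketch
  implicit none

  print *, normalize_trans('n'), ' ', normalize_trans('t'), ' ', normalize_trans('C')

contains

  ! Hypothetical helper: maps a transa/transb character to 'N' or 'T'
  ! for long-precision real data; returns '?' for any other value.
  function normalize_trans(trans) result(t)
    character, intent(in) :: trans
    character :: t
    select case (trans)
    case ('N', 'n')
      t = 'N'
    case ('T', 't', 'C', 'c')
      ! Lowercase is accepted, and with real data 'C' is read as 'T'.
      t = 'T'
    case default
      t = '?'
    end select
  end function normalize_trans

end program trans_normalize_sketch
```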

Error Conditions

HPF-specific errors are listed below. Resource and input-argument errors listed in "Error Conditions", "Error Conditions", and "Error Conditions" also apply to this subroutine.

Input-Argument Errors for Equations 1-9

Stage 1
  1. The rank of the ultimate align target is greater than 2 for a, b, or c.
  2. The process rank is not the same for a, b, and c.
  3. The process rank is not 1 or 2 for a, b, or c.

Stage 2
  1. transa is present, and transa <> 'N', 'T', or 'C'.
  2. transb is present, and transb <> 'N', 'T', or 'C'.

Stage 3

The process grid is not the same for a, b, and c.

Stage 4

The data distribution is inconsistent for a, b, and c.

Stage 5

The shape of the assumed-shape arrays a, b, and c is incompatible:

  1. transa = 'N' and transb = 'N':
    size(a,1) <> size(c,1) or
    size(b,2) <> size(c,2) or
    size(a,2) <> size(b,1)
  2. transa = 'N' and transb = 'T' or 'C':
    size(a,1) <> size(c,1) or
    size(b,1) <> size(c,2) or
    size(a,2) <> size(b,2)
  3. transa = 'T' or 'C' and transb = 'N':
    size(a,2) <> size(c,1) or
    size(b,2) <> size(c,2) or
    size(a,1) <> size(b,1)
  4. transa = 'T' or 'C' and transb = 'T' or 'C':
    size(a,2) <> size(c,1) or
    size(b,1) <> size(c,2) or
    size(a,1) <> size(b,2)
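The four Stage 5 cases reduce to one rule: the rows of op(A) must match the rows of C, the columns of op(B) must match the columns of C, and the inner dimensions of op(A) and op(B) must agree. The following sketch expresses that rule as a hypothetical function shapes_ok; it is an illustration of the checks listed above, not the library's own validation code.

```fortran
program gemm_shape_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: a(6,5), b(5,4), c(6,4)

  a = 0.0_dp
  b = 0.0_dp
  c = 0.0_dp

  ! 6 x 5 times 5 x 4 into 6 x 4 is compatible without transposes.
  print *, 'transa = N, transb = N: ', shapes_ok(a, b, c, 'N', 'N')
  ! Using the transpose of the same a (5 x 6) no longer matches b and c.
  print *, 'transa = T, transb = N: ', shapes_ok(a, b, c, 'T', 'N')

contains

  ! Hypothetical helper mirroring the Stage 5 checks for equations 1-9.
  ! Any value other than 'N' is treated as 'T' or 'C'.
  logical function shapes_ok(a, b, c, transa, transb)
    real(dp), intent(in)  :: a(:,:), b(:,:), c(:,:)
    character, intent(in) :: transa, transb
    integer :: m, k, kb, n

    ! Rows (m) and columns (k) of op(A).
    if (transa == 'N') then
      m = size(a,1)
      k = size(a,2)
    else
      m = size(a,2)
      k = size(a,1)
    end if

    ! Rows (kb) and columns (n) of op(B).
    if (transb == 'N') then
      kb = size(b,1)
      n  = size(b,2)
    else
      kb = size(b,2)
      n  = size(b,1)
    end if

    shapes_ok = (m == size(c,1)) .and. (n == size(c,2)) .and. (k == kb)
  end function shapes_ok

end program gemm_shape_sketch
```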

Stage 6

The data distribution for a, b, or c is unsupported.

Input-Argument Errors for Equations 10 and 11

Stage 1
  1. The rank of the ultimate align target is greater than 2 for a, b, or c.
  2. The process rank is not the same for a, b, and c.
  3. The process rank is not 1 or 2 for a, b, or c.

Stage 2

transa is present, and transa <> 'N', 'T', or 'C'.

Stage 3

The process grid is not the same for a, b, and c.

Stage 4
  1. The vector for b or c is replicated.
  2. The data distribution for a is unsupported.

Stage 5
  1. Vector distribution error for b or c.
  2. The shape of the assumed-shape arrays a, b, and c is incompatible:
    1. transa = 'N':
      size(a,1) <> size(c) or
      size(a,2) <> size(b)
    2. transa = 'T':
      size(a,1) <> size(b) or
      size(a,2) <> size(c)

Stage 6

The data distribution for a, b, or c is unsupported.

Input-Argument Errors for Equation 12

Stage 1
  1. The rank of the ultimate align target is greater than 2 for a, b, or c.
  2. The process rank is not the same for a, b, and c.
  3. The process rank is not 1 or 2 for a, b, or c.

Stage 2

The process grid is not the same for a, b, and c.

Stage 3

The data distribution for c is unsupported.

Stage 4

The vector for a or b is replicated.

Stage 5

The data distribution is inconsistent for a and b.

Stage 6
  1. Vector distribution error for a or b.
  2. The shape of the assumed-shape arrays a, b, and c is incompatible:
    size(c,1) <> size(a) or
    size(c,2) <> size(b)

Stage 7

The data distribution for a, b, or c is unsupported.

Example 1

This example computes $C = \alpha A B + \beta C$. As in "Example 1", array data is block-cyclically distributed using a 2 × 2 process grid.

!HPF$ PROCESSORS PROC(2,2)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: A, B, C
 
CALL GEMM( 1.0D0 , A , B , 2.0D0 , C )
-or-
CALL GEMM( 1.0D0 , A , B , 2.0D0 , C , TRANSA='N' , TRANSB='N' )
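To see what the (CYCLIC, CYCLIC) directive means for the 6 × 5 matrix A of this example, the sketch below prints the processor coordinates that would hold each element on the 2 × 2 arrangement. It assumes the HPF rule for CYCLIC with block size 1 on an array with lower bound 1, under which index i in a dimension maps to processor coordinate mod(i-1, np) + 1. The program and variable names are illustrative, and the sketch performs no distribution itself.

```fortran
program cyclic_owner_sketch
  implicit none
  integer, parameter :: nprow = 2, npcol = 2   ! PROCESSORS PROC(2,2)
  integer :: i, j
  integer :: prow(6), pcol(5)

  ! Under CYCLIC, indices are dealt round-robin to the processors of
  ! the corresponding dimension of the processor arrangement.
  prow = (/ (mod(i - 1, nprow) + 1, i = 1, 6) /)
  pcol = (/ (mod(j - 1, npcol) + 1, j = 1, 5) /)

  ! Print, for each element A(i,j), the coordinates of its owner in PROC.
  do i = 1, 6
    print '(5(a,i1,a,i1,a))', ('(', prow(i), ',', pcol(j), ') ', j = 1, 5)
  end do
end program cyclic_owner_sketch
```

Each printed row alternates between two processor columns and successive rows alternate between the two processor rows, so neighboring elements of A land on different processes.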

Input

General 6 × 5 matrix A:

 *                              *
 |  1.0   2.0  -1.0  -1.0   4.0 |
 |  2.0   0.0   1.0   1.0  -1.0 |
 |  1.0  -1.0  -1.0   1.0   2.0 |
 | -3.0   2.0   2.0   2.0   0.0 |
 |  4.0   0.0  -2.0   1.0  -1.0 |
 | -1.0  -1.0   1.0  -3.0   2.0 |
 *                              *

General 5 × 4 matrix B:

 *                        *
 |  1.0  -1.0   0.0   2.0 |
 |  2.0   2.0  -1.0  -2.0 |
 |  1.0   0.0  -1.0   1.0 |
 | -3.0  -1.0   1.0  -1.0 |
 |  4.0   2.0  -1.0   1.0 |
 *                        *

General 6 × 4 matrix C:

 *                    *
 | 0.5  0.5  0.5  0.5 |
 | 0.5  0.5  0.5  0.5 |
 | 0.5  0.5  0.5  0.5 |
 | 0.5  0.5  0.5  0.5 |
 | 0.5  0.5  0.5  0.5 |
 | 0.5  0.5  0.5  0.5 |
 *                    *

Output

General 6 × 4 matrix C:

 *                        *
 | 24.0  13.0  -5.0   3.0 |
 | -3.0  -4.0   2.0   4.0 |
 |  4.0   1.0   2.0   5.0 |
 | -2.0   6.0  -1.0  -9.0 |
 | -4.0  -6.0   5.0   5.0 |
 | 16.0   7.0  -4.0   7.0 |
 *                        *
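The output above can be reproduced on a single process with the MATMUL intrinsic. The program below is a serial cross-check of the arithmetic only, with alpha = 1.0 and beta = 2.0 as in the call; it does not call Parallel ESSL, and the program and variable names are chosen for illustration.

```fortran
program gemm_example1_check
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: a(6,5), b(5,4), c(6,4), expected(6,4)

  ! RESHAPE fills in column-major order, so each matrix is typed row by
  ! row into the transposed shape and then transposed.
  a = transpose(reshape((/ &
         1.0_dp,  2.0_dp, -1.0_dp, -1.0_dp,  4.0_dp, &
         2.0_dp,  0.0_dp,  1.0_dp,  1.0_dp, -1.0_dp, &
         1.0_dp, -1.0_dp, -1.0_dp,  1.0_dp,  2.0_dp, &
        -3.0_dp,  2.0_dp,  2.0_dp,  2.0_dp,  0.0_dp, &
         4.0_dp,  0.0_dp, -2.0_dp,  1.0_dp, -1.0_dp, &
        -1.0_dp, -1.0_dp,  1.0_dp, -3.0_dp,  2.0_dp /), (/ 5, 6 /)))
  b = transpose(reshape((/ &
         1.0_dp, -1.0_dp,  0.0_dp,  2.0_dp, &
         2.0_dp,  2.0_dp, -1.0_dp, -2.0_dp, &
         1.0_dp,  0.0_dp, -1.0_dp,  1.0_dp, &
        -3.0_dp, -1.0_dp,  1.0_dp, -1.0_dp, &
         4.0_dp,  2.0_dp, -1.0_dp,  1.0_dp /), (/ 4, 5 /)))
  ! Output matrix C as listed in this example.
  expected = transpose(reshape((/ &
        24.0_dp, 13.0_dp, -5.0_dp,  3.0_dp, &
        -3.0_dp, -4.0_dp,  2.0_dp,  4.0_dp, &
         4.0_dp,  1.0_dp,  2.0_dp,  5.0_dp, &
        -2.0_dp,  6.0_dp, -1.0_dp, -9.0_dp, &
        -4.0_dp, -6.0_dp,  5.0_dp,  5.0_dp, &
        16.0_dp,  7.0_dp, -4.0_dp,  7.0_dp /), (/ 4, 6 /)))
  c = 0.5_dp

  ! Equation 1: C <- alpha * A * B + beta * C.
  c = 1.0_dp * matmul(a, b) + 2.0_dp * c

  print *, 'max |C - expected| =', maxval(abs(c - expected))
end program gemm_example1_check
```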

Example 2

This example computes $C = \alpha A B + \beta C$. As in "Example 2", array data is block-cyclically distributed using a 2 × 2 process grid.

!HPF$ PROCESSORS PROC(2,2)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: A, B, C
 
CALL GEMM( (1.0D0,0.0D0) , A , B , (2.0D0,0.0D0) , C )
-or-
CALL GEMM( (1.0D0,0.0D0) , A , B , (2.0D0,0.0D0) , C , TRANSA='N' , TRANSB='N' )

Input

General 6 × 3 matrix A:

 *                                   *
 |  (1.0,5.0)   (9.0,2.0)  (1.0,9.0) |
 |  (2.0,4.0)   (8.0,3.0)  (1.0,8.0) |
 |  (3.0,3.0)   (7.0,5.0)  (1.0,7.0) |
 |  (4.0,2.0)   (4.0,7.0)  (1.0,5.0) |
 |  (5.0,1.0)   (5.0,1.0)  (1.0,6.0) |
 |  (6.0,6.0)   (3.0,6.0)  (1.0,4.0) |
 *                                   *

General 3 × 2 matrix B:

 *                       *
 |  (1.0,8.0)  (2.0,7.0) |
 |  (4.0,4.0)  (6.0,8.0) |
 |  (6.0,2.0)  (4.0,5.0) |
 *                       *

General 6 × 2 matrix C:

 *                      *
 | (0.5,0.0)  (0.5,0.0) |
 | (0.5,0.0)  (0.5,0.0) |
 | (0.5,0.0)  (0.5,0.0) |
 | (0.5,0.0)  (0.5,0.0) |
 | (0.5,0.0)  (0.5,0.0) |
 | (0.5,0.0)  (0.5,0.0) |
 *                      *

Output

General 6 × 2 matrix C:

 *                                *
 |  (-22.0,113.0)  (-35.0,142.0)  |
 |  (-19.0,114.0)  (-35.0,141.0)  |
 |  (-20.0,119.0)  (-43.0,146.0)  |
 |  (-27.0,110.0)  (-58.0,131.0)  |
 |    (8.0,103.0)    (0.0,112.0)  |
 |  (-55.0,116.0)  (-75.0,135.0)  |
 *                                *

Example 3

This example computes $c = \alpha A b + \beta c$. The input matrices A, B, and C used here are the same as the matrices used in "Example 1". The updated portion of C is also the same as in "Example 1", because this computation is equivalent to a portion of that computation.

Array sections are specified for arguments a, b, and c, resulting in the computation using a submatrix A starting at row 3 and column 1 in an array, a column vector b starting at row 1 and column 2 in an array, and a column vector c, starting at row 3 and column 2 in an array.

As in "Example 1", array data is block-cyclically distributed using a 2 × 2 process grid.

!HPF$ PROCESSORS PROC(2,2)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: A, B, C
 
CALL GEMM( 1.0D0 , A(3:6,1:5) , B(1:5,2:2) , 2.0D0 , C(3:6,2:2) )
-or-
CALL GEMM( 1.0D0 , A(3:6,1:5) , B(1:5,2:2) , 2.0D0 , C(3:6,2:2) , TRANSA='N' )

Input

Only a portion of the data structure is used--that is, submatrix A. Following is the 4 × 5 submatrix A, starting at row 3 and column 1 in the 6 × 5 array:

 *                              *
 |   .     .     .     .     .  |
 |   .     .     .     .     .  |
 |  1.0  -1.0  -1.0   1.0   2.0 |
 | -3.0   2.0   2.0   2.0   0.0 |
 |  4.0   0.0  -2.0   1.0  -1.0 |
 | -1.0  -1.0   1.0  -3.0   2.0 |
 *                              *

Only a portion of the data structure is used--that is, vector b, which is a column vector. Following is the vector b of size 5, starting at row 1 and column 2 in the 5 × 4 array:

 *                       *
 |  .   -1.0    .     .  |
 |  .    2.0    .     .  |
 |  .    0.0    .     .  |
 |  .   -1.0    .     .  |
 |  .    2.0    .     .  |
 *                       *

Only a portion of the data structure is used--that is, vector c, which is a column vector. Following is the vector c of size 4, starting at row 3 and column 2 in the 6 × 4 array:

 *                    *
 |  .    .    .    .  |
 |  .    .    .    .  |
 |  .   0.5   .    .  |
 |  .   0.5   .    .  |
 |  .   0.5   .    .  |
 |  .   0.5   .    .  |
 *                    *

Output

Only a portion of the data structure is used--that is, vector c, which is a column vector. Following is the vector c of size 4, starting at row 3 and column 2 in the 6 × 4 array:

 *                       *
 |  .     .     .     .  |
 |  .     .     .     .  |
 |  .    1.0    .     .  |
 |  .    6.0    .     .  |
 |  .   -6.0    .     .  |
 |  .    7.0    .     .  |
 *                       *
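The same array-section idea can be checked serially. In the sketch below, only the referenced sections are filled with the Example 1 data, and the matrix-vector product is formed with MATMUL on those sections. Unlike the call above, which passes the sections B(1:5,2:2) and C(3:6,2:2), the sketch uses the rank-one sections B(1:5,2) and C(3:6,2) because MATMUL accepts them directly; that substitution and the zero placeholders are assumptions made for the illustration.

```fortran
program gemm_example3_check
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: a(6,5), b(5,4), c(6,4)

  ! Elements outside the referenced sections are set to zero here
  ! purely as placeholders; the computation never reads them.
  a = 0.0_dp
  b = 0.0_dp
  a(3:6,1:5) = transpose(reshape((/ &
         1.0_dp, -1.0_dp, -1.0_dp,  1.0_dp,  2.0_dp, &
        -3.0_dp,  2.0_dp,  2.0_dp,  2.0_dp,  0.0_dp, &
         4.0_dp,  0.0_dp, -2.0_dp,  1.0_dp, -1.0_dp, &
        -1.0_dp, -1.0_dp,  1.0_dp, -3.0_dp,  2.0_dp /), (/ 5, 4 /)))
  b(1:5,2) = (/ -1.0_dp, 2.0_dp, 0.0_dp, -1.0_dp, 2.0_dp /)
  c = 0.5_dp

  ! c <- alpha * A * b + beta * c on the sections, alpha = 1, beta = 2.
  c(3:6,2) = 1.0_dp * matmul(a(3:6,1:5), b(1:5,2)) + 2.0_dp * c(3:6,2)

  print *, 'c(3:6,2) =', c(3:6,2)
  print *, 'rows 1-2 of C untouched: ', all(c(1:2,:) == 0.5_dp)
end program gemm_example3_check
```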

Example 4

This example computes $c = \alpha A b + \beta c$. The input matrices A, B, and C used here are the same as A, B, and C used in "Example 1".

Array sections are specified for arguments a, b, and c, resulting in the computation using a submatrix A starting at row 2 and column 2 in an array, a row vector b starting at row 4 and column 2 in an array, and a column vector c starting at row 2 and column 3 in an array.

As in "Example 2", array data is block-cyclically distributed using a 2 × 2 process grid.

!HPF$ PROCESSORS PROC(2,2)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: A, B, C
 
CALL GEMM( 1.0D0 , A(2:5,2:4) , B(4:4,2:4) , 2.0D0 , C(2:5,3:3) )
-or-
CALL GEMM( 1.0D0 , A(2:5,2:4) , B(4:4,2:4) , 2.0D0 , C(2:5,3:3) , TRANSA='N' )

Input

Only a portion of the data structure is used--that is, submatrix A. Following is the 4 × 3 submatrix A, starting at row 2 and column 2 in the 6 × 5 array:

 *                             *
 |  .     .     .     .     .  |
 |  .    0.0   1.0   1.0    .  |
 |  .   -1.0  -1.0   1.0    .  |
 |  .    2.0   2.0   2.0    .  |
 |  .    0.0  -2.0   1.0    .  |
 |  .     .     .     .     .  |
 *                             *

Only a portion of the data structure is used--that is, vector b, which is a row vector. Following is the vector b of size 3, starting at row 4 and column 2 in the 5 × 4 array:

 *                       *
 |  .     .     .     .  |
 |  .     .     .     .  |
 |  .     .     .     .  |
 |  .   -1.0   1.0  -1.0 |
 |  .     .     .     .  |
 *                       *

Only a portion of the data structure is used--that is, vector c, which is a column vector. Following is the vector c of size 4, starting at row 2 and column 3 in the 6 × 4 array:

 *                    *
 |  .    .    .    .  |
 |  .    .   0.5   .  |
 |  .    .   0.5   .  |
 |  .    .   0.5   .  |
 |  .    .   0.5   .  |
 |  .    .    .    .  |
 *                    *

Output

Only a portion of the data structure is used--that is, vector c, which is a column vector. Following is the vector c of size 4, starting at row 2 and column 3 in the 6 × 4 array:

 *                       *
 |  .     .     .     .  |
 |  .     .    1.0    .  |
 |  .     .    0.0    .  |
 |  .     .   -1.0    .  |
 |  .     .   -2.0    .  |
 |  .     .     .     .  |
 *                       *

Example 5

This example computes $C = \alpha a b^{T} + C$.

Array sections are specified for arguments a, b, and c, resulting in the computation using a submatrix C starting at row 2 and column 2 in an array, a column vector a, starting at element 2 in an array, and a row vector b starting at element 2 in an array.

As in "Example", array data is block-cyclically distributed using a 2 × 2 process grid.

!HPF$ PROCESSORS PROC(2,2)
!HPF$ ALIGN A(:) WITH C(:,1)
!HPF$ ALIGN B(:) WITH C(1,:)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: C
 
CALL GEMM( 1.0D0 , A(2:10) , B(2:10) , C(2:10,2:10) )

Input

Only a portion of the data structure is used--that is, submatrix C. Following is the 9 × 9 submatrix C, starting at row 2 and column 2 in the 10 × 10 array:

 *                                                                   *
 |  .     .      .      .      .      .      .      .      .      .  |
 |  .   12.0   22.0   32.0   42.0   52.0   62.0   72.0   82.0   92.0 |
 |  .   13.0   23.0   33.0   43.0   53.0   63.0   73.0   83.0   93.0 |
 |  .   14.0   24.0   34.0   44.0   54.0   64.0   74.0   84.0   94.0 |
 |  .   15.0   25.0   35.0   45.0   55.0   65.0   75.0   85.0   95.0 |
 |  .   16.0   26.0   36.0   46.0   56.0   66.0   76.0   86.0   96.0 |
 |  .   17.0   27.0   37.0   47.0   57.0   67.0   77.0   87.0   97.0 |
 |  .   18.0   28.0   38.0   48.0   58.0   68.0   78.0   88.0   98.0 |
 |  .   19.0   29.0   39.0   49.0   59.0   69.0   79.0   89.0   99.0 |
 |  .   20.0   30.0   40.0   50.0   60.0   70.0   80.0   90.0  100.0 |
 *                                                                   *

Only a portion of the data structure is used--that is, vector a, which is a column vector. Following is the vector a of size 9, starting at element 2 in the array of size 11:

 *     *
 |  .  |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 |  .  |
 *     *

Only a portion of the data structure is used--that is, vector b, which is a row vector. Following is the vector b of size 9, starting at element 2 in the array of size 11:

 *                                                                 *
 |  .    2.0   3.0   4.0   5.0   6.0   7.0   8.0   9.0  10.0    .  |
 *                                                                 *

Output

Only a portion of the data structure is used--that is, submatrix C. Following is the 9 × 9 submatrix C, starting at row 2 and column 2 in the 10 × 10 array:

 *                                                                   *
 |  .     .      .      .      .      .      .      .      .      .  |
 |  .   14.0   25.0   36.0   47.0   58.0   69.0   80.0   91.0  102.0 |
 |  .   15.0   26.0   37.0   48.0   59.0   70.0   81.0   92.0  103.0 |
 |  .   16.0   27.0   38.0   49.0   60.0   71.0   82.0   93.0  104.0 |
 |  .   17.0   28.0   39.0   50.0   61.0   72.0   83.0   94.0  105.0 |
 |  .   18.0   29.0   40.0   51.0   62.0   73.0   84.0   95.0  106.0 |
 |  .   19.0   30.0   41.0   52.0   63.0   74.0   85.0   96.0  107.0 |
 |  .   20.0   31.0   42.0   53.0   64.0   75.0   86.0   97.0  108.0 |
 |  .   21.0   32.0   43.0   54.0   65.0   76.0   87.0   98.0  109.0 |
 |  .   22.0   33.0   44.0   55.0   66.0   77.0   88.0   99.0  110.0 |
 *                                                                   *
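The rank-one update of equation 12 can be cross-checked serially by forming the outer product with the SPREAD intrinsic. In the sketch below, the input pattern C(i,j) = 10(j-1) + i is read off the input listing, only elements 2 through 10 of a and b are declared because those are the only ones referenced, and row 1 and column 1 of C (shown as dots in the listings) are given values from the same pattern as placeholders. These choices and the program name are assumptions made for illustration.

```fortran
program gemm_example5_check
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: c(10,10), c_in(10,10), a(2:10), b(2:10)
  integer :: i, j

  ! Input pattern taken from the listing: C(i,j) = 10*(j-1) + i.
  do j = 1, 10
    do i = 1, 10
      c(i,j) = real(10*(j - 1) + i, dp)
    end do
  end do
  c_in = c
  a = 1.0_dp
  b = (/ (real(j, dp), j = 2, 10) /)

  ! Equation 12: C <- alpha * a * b**T + C on the 9 x 9 section.
  ! SPREAD(a, 2, 9) repeats a across columns and SPREAD(b, 1, 9)
  ! repeats b down rows; their elementwise product is a * b**T.
  c(2:10,2:10) = 1.0_dp * spread(a, 2, 9) * spread(b, 1, 9) + c(2:10,2:10)

  print *, 'C(2,2)   =', c(2,2)
  print *, 'C(10,10) =', c(10,10)
  print *, 'row 1 and column 1 untouched: ', &
           all(c(1,:) == c_in(1,:)) .and. all(c(:,1) == c_in(:,1))
end program gemm_example5_check
```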

SYMM--Matrix-Matrix Product Where One Matrix is Real Symmetric

This subroutine computes one of the following matrix-matrix products:

1. $C \leftarrow \alpha A B + \beta C$
2. $C \leftarrow \alpha B A + \beta C$
3. $c \leftarrow \alpha A b + \beta c$

where, in the formulas above:

A is a symmetric matrix.
B and C are general matrices.
b and c are vectors.
alpha and beta are scalars.

In the following two cases, no computation is performed and the subroutine returns after doing some parameter checking:

See references [17], [30], [31], and [44].

Table 113. Data Types
alpha, beta, A, B, C, b, c | Subprogram
Long-precision real | SYMM

Syntax

HPF, Equations 1 and 2:
    CALL SYMM (alpha, a, b, beta, c, uplo, side)

HPF, Equation 3:
    CALL SYMM (alpha, a, b, beta, c, uplo)

On Entry

alpha

is the scalar alpha.

Type: required

Specified as: a number of the data type indicated in Table 113.

a

is the symmetric matrix A, where:

If uplo = 'U', the array contains the upper triangle of the symmetric matrix A in its upper triangle, and its strictly lower triangular part is not referenced.

If uplo = 'L', the array contains the lower triangle of the symmetric matrix A in its lower triangle, and its strictly upper triangular part is not referenced.

Type: required

Specified as: an assumed-shape array with shape (:,:), containing numbers of the data type indicated in Table 113, where size(a,1) = size(a,2).

b

is the general matrix B or the vector b.

Type: required

Specified as: an assumed-shape array with shape (:,:) or (:), containing numbers of the data type indicated in Table 113.

beta

is the scalar beta.

Type: required

Specified as: a number of the data type indicated in Table 113.

c

is the general matrix C or the vector c. When beta is zero, c need not be set on input.

Type: required

Specified as: an assumed-shape array with shape (:,:) or (:), containing numbers of the data type indicated in Table 113.

uplo

indicates whether the upper or lower triangular part of the symmetric matrix A is referenced, where:

If uplo = 'U', the upper triangular part is referenced.

If uplo = 'L', the lower triangular part is referenced.

Type: required

Specified as: a single character; uplo = 'U' or 'L'.

side

indicates whether A is located to the left or right of B in the equation used for this computation, where:

If side = 'L', A is to the left of B, resulting in equation 1.

If side = 'R', A is to the right of B, resulting in equation 2.

Type: required (equations 1 and 2); not present (equation 3)

Specified as: a single character; side = 'L' or 'R'.
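The roles of uplo and side can be sketched on a single process. The program below stores only the upper triangle of a 2 × 2 symmetric matrix (the uplo = 'U' case), places a junk value in the strictly lower triangle to show that it is never read, mirrors the triangle into a full matrix, and then forms the product on each side of B with alpha = 1 and beta = 0. The values and names are assumptions for illustration; this is not how Parallel ESSL is implemented.

```fortran
program symm_side_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: a(2,2), full(2,2), b(2,2), c_left(2,2), c_right(2,2)
  integer :: i, j

  ! Upper triangle holds the symmetric matrix | 1 2 |; a(2,1) is junk.
  !                                           | 2 3 |
  a = reshape((/ 1.0_dp, -99.0_dp, 2.0_dp, 3.0_dp /), (/ 2, 2 /))
  ! B = | 1 1 |
  !     | 0 1 |
  b = reshape((/ 1.0_dp, 0.0_dp, 1.0_dp, 1.0_dp /), (/ 2, 2 /))

  ! Mirror the upper triangle; the strictly lower part of a is not read.
  do j = 1, 2
    do i = 1, 2
      if (i <= j) then
        full(i,j) = a(i,j)
      else
        full(i,j) = a(j,i)
      end if
    end do
  end do

  c_left  = matmul(full, b)   ! side = 'L': equation 1
  c_right = matmul(b, full)   ! side = 'R': equation 2

  print *, 'side = L, row 1 of C:', c_left(1,:)
  print *, 'side = R, row 1 of C:', c_right(1,:)
end program symm_side_sketch
```

Because matrix multiplication is not commutative, the two settings of side generally give different results even though A is symmetric.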

On Return

c

is the updated matrix C or vector c, containing the results of the computation.

Type: required

Returned as: an assumed-shape array with shape (:,:) or (:), containing numbers of the data type indicated in Table 113.

Notes and Coding Rules

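  1. The assumed-shape arrays must have the exact size required for the computation. The shape combinations that are rejected as incompatible are listed under "Error Conditions" for this subroutine.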

  2. For migration purposes, note that the side and uplo arguments appear in reverse order from the corresponding BLAS and PBLAS subroutines.

  3. This subroutine accepts lowercase letters for the side and uplo arguments.

  4. The assumed-shape arrays must have no common elements; otherwise, results are unpredictable.

  5. The restrictions given in "Notes and Coding Rules" and "Notes and Coding Rules" also apply to this subroutine.

  6. For details on how to set up and code your HPF program using Parallel ESSL, see "Coding Your HPF Program".

  7. Block-cyclic data distribution is required for your array data. Because data directives are included in the interface module PESSL_HPF, you can specify any data distribution for your vectors and matrices, and the XL HPF compiler will, if necessary, redistribute the data prior to calling this subroutine. For how to code your HPF directives, see "Distributing Data in an HPF Program". For a sample program including directives, see Figure 9.

Error Conditions

HPF-specific errors are listed below. Resource and input-argument errors listed in "Error Conditions" and "Error Conditions" also apply to this subroutine.

Input-Argument Errors for Equations 1 and 2

Stage 1
  1. The rank of the ultimate align target is greater than 2 for a, b, or c.
  2. The process rank is not the same for a, b, and c.
  3. The process rank is not 1 or 2 for a, b, or c.

Stage 2

The process grid is not the same for a, b, and c.

Stage 3

The data distribution is inconsistent for a, b, and c.

Stage 4
  1. side is present, and side <> 'L' or 'R'.
  2. side = 'L' or 'R', and the shape of the assumed-shape arrays a, b, and c is incompatible:
    1. side = 'L' and:
      size(b,1) <> size(c,1) or
      size(c,1) <> size(a,1) or
      size(a,1) <> size(a,2) or
      size(b,2) <> size(c,2)
    2. side = 'R' and:
      size(b,1) <> size(c,1) or
      size(b,2) <> size(c,2) or
      size(c,2) <> size(a,1) or
      size(a,1) <> size(a,2)
  3. The shape of the assumed-shape array for a is invalid: size(a,1) <> size(a,2)

Stage 5

The data distribution for a, b, or c is unsupported.

Input-Argument Errors for Equation 3

Stage 1
  1. The rank of the ultimate align target is greater than 2 for a, b, or c.
  2. The process rank is not the same for a, b, and c.
  3. The process rank is not 1 or 2 for a, b, or c.

Stage 2

The process grid is not the same for a, b, and c.

Stage 3
  1. The vector for b or c is replicated.
  2. The data distribution for a is unsupported.

Stage 4
  1. Vector distribution error for b or c.
  2. The shape of the assumed-shape arrays a, b, and c is incompatible:
    size(a,1) <> size(a,2) or
    size(a,1) <> size(b) or
    size(a,1) <> size(c)
  3. The shape of the assumed-shape array for a is invalid: size(a,1) <> size(a,2)

Stage 5

The data distribution for a, b, or c is unsupported.

Example 1

This example computes $C = \alpha B A + \beta C$. Because beta = 0, C need not be set on input. As in "Example", array data is block-cyclically distributed using a 2 × 2 process grid.

!HPF$ PROCESSORS PROC(2,2)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: A, B, C
 
CALL SYMM( 1.0D0, A, B, 0.0D0, C, 'U', 'R' )

Input

Symmetric matrix A of order 8:

 *                                               *
 | 0.0  -1.0  -1.0   0.0   0.0   0.0   0.0   0.0 |
 |  .    1.0   0.0   1.0   0.0   1.0   0.0   1.0 |
 |  .     .   -1.0  -1.0   0.0   0.0   1.0   0.0 |
 |  .     .     .   -1.0   1.0   1.0   0.0   1.0 |
 |  .     .     .     .   -1.0   0.0   0.0   0.0 |
 |  .     .     .     .     .    1.0   0.0   0.0 |
 |  .     .     .     .     .     .    0.0   0.0 |
 |  .     .     .     .     .     .     .    0.0 |
 *                                               *

General 16 × 8 matrix B:

 *                                                *
 | -1.0   0.0   1.0  -1.0   1.0   1.0  -1.0  -1.0 |
 | -1.0  -1.0   1.0   0.0   1.0  -1.0  -1.0   1.0 |
 |  1.0   1.0  -1.0   0.0  -1.0   0.0   1.0   0.0 |
 |  0.0  -1.0   0.0   0.0   0.0   0.0   0.0  -1.0 |
 |  0.0   1.0   0.0   1.0   0.0   1.0   1.0   0.0 |
 |  0.0   0.0   1.0   0.0  -1.0  -1.0   0.0   0.0 |
 |  1.0   1.0   0.0   0.0   1.0   1.0   0.0  -1.0 |
 |  0.0   0.0  -1.0   0.0   0.0   1.0   0.0   1.0 |
 |  0.0   0.0   0.0  -1.0   1.0   1.0   0.0   1.0 |
 | -1.0  -1.0   1.0   0.0   0.0  -1.0   0.0   1.0 |
 |  0.0   0.0   0.0   1.0   1.0   0.0   0.0   0.0 |
 |  0.0   0.0   1.0   1.0   0.0  -1.0   0.0   0.0 |
 |  1.0   1.0  -1.0   0.0  -1.0  -1.0   1.0   1.0 |
 |  0.0   0.0   0.0   0.0   1.0   0.0   0.0  -1.0 |
 |  0.0   1.0   0.0   0.0   0.0   0.0   0.0   0.0 |
 | -1.0   0.0  -1.0   0.0   0.0   1.0   1.0   0.0 |
 *                                                *

Output

General 16 × 8 matrix C:

 *                                                *
 | -1.0   0.0   0.0   1.0  -2.0   0.0   1.0  -1.0 |
 |  0.0   0.0  -1.0  -1.0  -1.0  -2.0   1.0  -1.0 |
 |  0.0   0.0   1.0   1.0   1.0   1.0  -1.0   1.0 |
 |  1.0  -2.0   0.0  -2.0   0.0  -1.0   0.0  -1.0 |
 | -1.0   3.0   0.0   1.0   1.0   3.0   0.0   2.0 |
 | -1.0  -1.0  -1.0  -3.0   1.0  -1.0   1.0   0.0 |
 | -1.0   0.0  -1.0   2.0  -1.0   2.0   0.0   1.0 |
 |  1.0   2.0   1.0   3.0   0.0   1.0  -1.0   0.0 |
 |  0.0   1.0   1.0   4.0  -2.0   0.0   0.0  -1.0 |
 |  0.0   0.0   0.0  -2.0   0.0  -2.0   1.0  -1.0 |
 |  0.0   1.0  -1.0   0.0   0.0   1.0   0.0   1.0 |
 | -1.0   0.0  -2.0  -3.0   1.0   0.0   1.0   1.0 |
 |  0.0   0.0   1.0   1.0   1.0   0.0  -1.0   1.0 |
 |  0.0  -1.0   0.0   0.0  -1.0   0.0   0.0   0.0 |
 | -1.0   1.0   0.0   1.0   0.0   1.0   0.0   1.0 |
 |  1.0   2.0   3.0   2.0   0.0   1.0  -1.0   0.0 |
 *                                                *

Example 2

This example computes $c = \alpha A b + \beta c$. As in "Example", array data is block-cyclically distributed using a 2 × 2 process grid.

!HPF$ PROCESSORS PROC(2,2)
!HPF$ ALIGN B(:) WITH A(:,1)
!HPF$ ALIGN C(:) WITH A(:,1)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: A
 
CALL SYMM( 1.0D0, A, B, 0.0D0, C, 'U' )

Input

Symmetric matrix A of order 8:

 *                                               *
 | 0.0  -1.0  -1.0   0.0   0.0   0.0   0.0   0.0 |
 |  .    1.0   0.0   1.0   0.0   1.0   0.0   1.0 |
 |  .     .   -1.0  -1.0   0.0   0.0   1.0   0.0 |
 |  .     .     .   -1.0   1.0   1.0   0.0   1.0 |
 |  .     .     .     .   -1.0   0.0   0.0   0.0 |
 |  .     .     .     .     .    1.0   0.0   0.0 |
 |  .     .     .     .     .     .    0.0   0.0 |
 |  .     .     .     .     .     .     .    0.0 |
 *                                               *

Vector b of size 8:

 *     *
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 *     *

Output

Vector c of size 8:

 *      *
 | -2.0 |
 |  3.0 |
 | -2.0 |
 |  2.0 |
 |  0.0 |
 |  3.0 |
 |  1.0 |
 |  2.0 |
 *      *
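The vector c above can be reproduced serially from the stored upper triangle. The sketch below enters the triangle exactly as listed (with zeros standing in for the unreferenced dots), mirrors it into a full symmetric matrix, and multiplies by b with alpha = 1 and beta = 0. It is a cross-check of the arithmetic only, and the names u, full, and expected are illustrative.

```fortran
program symm_example2_check
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: u(8,8), full(8,8), b(8), c(8), expected(8)
  integer :: i, j

  ! Upper triangle of A, typed row by row; zeros replace the dots.
  u = transpose(reshape((/ &
    0.0_dp, -1.0_dp, -1.0_dp,  0.0_dp,  0.0_dp, 0.0_dp, 0.0_dp, 0.0_dp, &
    0.0_dp,  1.0_dp,  0.0_dp,  1.0_dp,  0.0_dp, 1.0_dp, 0.0_dp, 1.0_dp, &
    0.0_dp,  0.0_dp, -1.0_dp, -1.0_dp,  0.0_dp, 0.0_dp, 1.0_dp, 0.0_dp, &
    0.0_dp,  0.0_dp,  0.0_dp, -1.0_dp,  1.0_dp, 1.0_dp, 0.0_dp, 1.0_dp, &
    0.0_dp,  0.0_dp,  0.0_dp,  0.0_dp, -1.0_dp, 0.0_dp, 0.0_dp, 0.0_dp, &
    0.0_dp,  0.0_dp,  0.0_dp,  0.0_dp,  0.0_dp, 1.0_dp, 0.0_dp, 0.0_dp, &
    0.0_dp,  0.0_dp,  0.0_dp,  0.0_dp,  0.0_dp, 0.0_dp, 0.0_dp, 0.0_dp, &
    0.0_dp,  0.0_dp,  0.0_dp,  0.0_dp,  0.0_dp, 0.0_dp, 0.0_dp, 0.0_dp /), &
    (/ 8, 8 /)))

  ! uplo = 'U': mirror the upper triangle into the lower triangle.
  do j = 1, 8
    do i = 1, 8
      if (i <= j) then
        full(i,j) = u(i,j)
      else
        full(i,j) = u(j,i)
      end if
    end do
  end do

  b = 1.0_dp
  expected = (/ -2.0_dp, 3.0_dp, -2.0_dp, 2.0_dp, 0.0_dp, 3.0_dp, 1.0_dp, 2.0_dp /)

  ! Equation 3 with alpha = 1 and beta = 0, so c need not be set on input.
  c = matmul(full, b)

  print *, 'max |c - expected| =', maxval(abs(c - expected))
end program symm_example2_check
```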

TRMM--Triangular Matrix-Matrix Product

This subroutine computes one of the following matrix-matrix products:

1. $B \leftarrow \alpha A B$
2. $B \leftarrow \alpha A^{T} B$
3. $B \leftarrow \alpha B A$
4. $B \leftarrow \alpha B A^{T}$
5. $b \leftarrow A b$
6. $b \leftarrow A^{T} b$

where, in the formulas above:

A is a triangular matrix.
B is a general matrix.
b is a vector.
alpha is a scalar.

Note: No data should be moved to form the matrix transpose; that is, the matrix should always be stored in its untransposed form.

If any of the assumed-shape arrays have a size of zero, no computation is performed, and the subroutine returns after doing some parameter checking.
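Equations 1 and 2 can be sketched serially with standard intrinsics. In the program below, the upper triangle of a 3 × 3 array holds the triangular matrix, the strictly lower triangle holds a junk value that is never used, and B is overwritten with the product. The values, the choice alpha = 2, and all names are assumptions made for illustration; the sketch does not call Parallel ESSL.

```fortran
program trmm_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp), parameter :: alpha = 2.0_dp
  real(dp) :: a(3,3), tri(3,3), b1(3,2), b2(3,2)
  integer :: i, j

  ! Stored array: upper triangle | 1 2 3 |, junk (-99) below the diagonal.
  !                              | . 4 5 |
  !                              | . . 6 |
  a = -99.0_dp
  a(1,1:3) = (/ 1.0_dp, 2.0_dp, 3.0_dp /)
  a(2,2:3) = (/ 4.0_dp, 5.0_dp /)
  a(3,3)   = 6.0_dp

  ! Extract the upper triangular matrix; the junk entries are skipped.
  tri = 0.0_dp
  do j = 1, 3
    do i = 1, j
      tri(i,j) = a(i,j)
    end do
  end do

  b1 = 1.0_dp
  b2 = 1.0_dp

  ! Equation 1: B <- alpha * A * B (B is overwritten).
  b1 = alpha * matmul(tri, b1)
  ! Equation 2: B <- alpha * A**T * B, with A still stored untransposed.
  b2 = alpha * matmul(transpose(tri), b2)

  print *, 'equation 1, column 1 of B:', b1(:,1)
  print *, 'equation 2, column 1 of B:', b2(:,1)
end program trmm_sketch
```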

See references [17], [30], [31], and [44].

Table 114. Data Types
alpha, A, B, b         Subprogram
Long-precision real    TRMM

Syntax

HPF   Equations 1-4       CALL TRMM (alpha, a, b, uplo, side)
                          CALL TRMM (alpha, a, b, uplo, side, transa, diag)
      Equations 5 and 6   CALL TRMM (a, b, uplo)
                          CALL TRMM (a, b, uplo, transa, diag)

On Entry

alpha

is the scalar alpha.

Type: required (equations 1-4); not present (equations 5 and 6)

Specified as: a number of the data type indicated in Table 114.

a

is the triangular matrix A, where:

If uplo = 'U', the array contains the upper triangle of the triangular matrix A in its upper triangle, and its strictly lower triangular part is not referenced.

If uplo = 'L', the array contains the lower triangle of the triangular matrix A in its lower triangle, and its strictly upper triangular part is not referenced.

Type: required

Specified as: an assumed-shape array with shape (:,:), containing numbers of the data type indicated in Table 114, where size(a,1) = size(a,2).

b

is the general matrix B or the vector b.

Type: required

Specified as: an assumed-shape array with shape (:,:) or (:), containing numbers of the data type indicated in Table 114.

uplo

indicates whether the upper or lower triangular part of the triangular matrix A is referenced, where:

If uplo = 'U', the upper triangular part is referenced.

If uplo = 'L', the lower triangular part is referenced.

Type: required

Specified as: a single character; uplo = 'U' or 'L'.

side

indicates whether A is located to the left or right of B in the equation used for this computation, where:

If side = 'L', A is to the left of B, resulting in equation 1 or 2.

If side = 'R', A is to the right of B, resulting in equation 3 or 4.

Type: required (equations 1-4); not present (equations 5 and 6)

Specified as: a single character; side = 'L' or 'R'.

transa

indicates the form of matrix A to use in the computation, where:

If transa = 'N', A is used in the computation, resulting in equation 1, 3, or 5.

If transa = 'T', A^T is used in the computation, resulting in equation 2, 4, or 6.

Type: optional

Default: transa = 'N'

Specified as: a single character; transa = 'N' or 'T'.

diag

indicates the characteristics of the diagonal of matrix A, where:

If diag = 'U', A is a unit triangular matrix.

If diag = 'N', A is not a unit triangular matrix.

Type: optional

Default: diag = 'N'

Specified as: a single character; diag = 'U' or 'N'.

On Return

b

is the updated matrix B or vector b, containing the results of the computation.

Type: required

Returned as: an assumed-shape array with shape (:,:) or (:), containing numbers of the data type indicated in Table 114.

Notes and Coding Rules

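  1. The assumed-shape arrays must have the exact size required for the computation, that is:
    1. For equations 1-4 with side = 'L': size(a,1) = size(a,2) = size(b,1)
    2. For equations 1-4 with side = 'R': size(a,1) = size(a,2) = size(b,2)
    3. For equations 5 and 6: size(a,1) = size(a,2) = size(b)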

  2. For migration purposes, note that the side and uplo arguments appear in reverse order from the corresponding BLAS and PBLAS subroutines.

  3. This subroutine accepts lowercase letters for the side, uplo, transa, and diag arguments.

  4. If you specify 'C' for transa, it is interpreted as though you specified 'T'.

  5. The assumed-shape arrays must have no common elements; otherwise, results are unpredictable.

  6. This subroutine assumes certain values in your array for parts of a triangular matrix. As a result, you do not have to set these values. For unit triangular matrices, the elements of the diagonal are assumed to be one. When using an upper or lower triangular matrix, the unreferenced elements in the strictly lower or upper triangular part, respectively, are assumed to be zero.

  7. For details on how to set up and code your HPF program using Parallel ESSL, see "Coding Your HPF Program".

  8. Block-cyclic data distribution is required for your array data. Because data directives are included in the interface module PESSL_HPF, you can specify any data distribution for your vectors and matrices, and the XL HPF compiler will, if necessary, redistribute the data prior to calling this subroutine. For how to code your HPF directives, see "Distributing Data in an HPF Program". For a sample program including directives, see Figure 9.

  9. The restrictions given in "Notes and Coding Rules" and "Notes and Coding Rules" also apply to this subroutine.

Error Conditions

HPF-specific errors are listed below. Resource and input-argument errors listed in "Error Conditions" and "Error Conditions" also apply to this subroutine.

Input-Argument Errors for Equations 1-4

Stage 1
  1. The rank of the ultimate align target is greater than 2 for a or b.
  2. The process rank is not the same for a and b.
  3. The process rank is not 1 or 2 for a or b.

Stage 2

The process grid is not the same for a and b.

Stage 3

The data distribution is inconsistent for a and b.

Stage 4
  1. side is present, and side <>'L' or 'R'.
  2. side = 'L' or 'R', and the shape of the assumed-shape arrays a and b is incompatible:
    1. side = 'L' and:
      size(b,1) <> size(a,2) or
      size(a,1) <> size(a,2)
    2. side = 'R' and:
      size(b,2) <> size(a,1) or
      size(a,1) <> size(a,2)
  3. The shape of the assumed-shape array for a is invalid: size(a,1) <> size(a,2)

Stage 5

The data distribution for a or b is unsupported.
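
The Stage 4 shape conditions for equations 1-4 can be mirrored in a small serial check. This is only a sketch in plain Fortran 90: the module name TRMM_SHAPE_REF and the function SHAPES_OK are hypothetical and are not part of Parallel ESSL, and the sketch says nothing about the distribution checks in the other stages.

```fortran
MODULE TRMM_SHAPE_REF
  IMPLICIT NONE
CONTAINS
  ! Hypothetical helper: returns .TRUE. when none of the Stage 4 shape
  ! errors for equations 1-4 would apply to a, b, and side.
  LOGICAL FUNCTION SHAPES_OK(A, B, SIDE)
    DOUBLE PRECISION, INTENT(IN) :: A(:,:), B(:,:)
    CHARACTER(LEN=1), INTENT(IN) :: SIDE
    SHAPES_OK = .FALSE.
    ! a must be square regardless of side.
    IF (SIZE(A,1) /= SIZE(A,2)) RETURN
    SELECT CASE (SIDE)
    CASE ('L', 'l')
       SHAPES_OK = (SIZE(B,1) == SIZE(A,2))
    CASE ('R', 'r')
       SHAPES_OK = (SIZE(B,2) == SIZE(A,1))
    END SELECT
    ! Any other value of side leaves the result .FALSE.
  END FUNCTION SHAPES_OK
END MODULE TRMM_SHAPE_REF

PROGRAM SHAPE_DEMO
  USE TRMM_SHAPE_REF
  IMPLICIT NONE
  DOUBLE PRECISION :: A(5,5), B(5,3)
  A = 0.0D0
  B = 0.0D0
  PRINT *, SHAPES_OK(A, B, 'L')   ! prints T: size(b,1) = size(a,2) = 5
  PRINT *, SHAPES_OK(A, B, 'R')   ! prints F: size(b,2) = 3, size(a,1) = 5
END PROGRAM SHAPE_DEMO
```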

Input-Argument Errors for Equations 5 and 6

Stage 1
  1. The rank of the ultimate align target is greater than 2 for a or b.
  2. The process rank is not the same for a and b.
  3. The process rank is not 1 or 2 for a or b.

Stage 2

The process grid is not the same for a and b.

Stage 3

The data distribution is inconsistent for a and b.

Stage 4

The shape of the assumed-shape array for a is invalid: size(a,1) <> size(a,2)

Stage 5

The data distribution for a or b is unsupported.

Stage 6

The shape of the assumed-shape arrays a and b is incompatible: size(a,1) <> size(b)

Example 1

This example computes B = alpha A B. As in "Example", array data is block-cyclically distributed using a 2 × 2 process grid.

!HPF$ PROCESSORS PROC(2,2)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: A, B
 
CALL TRMM( 1.0D0 , A , B , 'U' , 'L' )
-or-
CALL TRMM( 1.0D0 , A , B , 'U' , 'L' , TRANSA='N' , DIAG='N' )

Input

Triangular matrix A of order 5 is upper triangular:

 *                             *
 | 3.0  -1.0   2.0   2.0   1.0 |
 |  .   -2.0   4.0  -1.0   3.0 |
 |  .     .   -3.0   0.0   2.0 |
 |  .     .     .    4.0  -2.0 |
 |  .     .     .     .    1.0 |
 *                             *

Rectangular 5 × 3 matrix B:

 *                  *
 |  2.0   3.0   1.0 |
 |  5.0   5.0   4.0 |
 |  0.0   1.0   2.0 |
 |  3.0   1.0  -3.0 |
 | -1.0   2.0   1.0 |
 *                  *

Output

Rectangular 5 × 3 matrix B:

 *                     *
 |   6.0   10.0   -2.0 |
 | -16.0   -1.0    6.0 |
 |  -2.0    1.0   -4.0 |
 |  14.0    0.0  -14.0 |
 |  -1.0    2.0    1.0 |
 *                     *
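
As a cross-check of this example, the following serial sketch recomputes B = alpha A B in plain Fortran 90 without Parallel ESSL or data distribution. It fills the strictly lower triangle of the array with an arbitrary value to illustrate that, for uplo = 'U', that part is not referenced. The program name TRMM_CHECK, the work array T, and the filler value are assumptions made for illustration.

```fortran
PROGRAM TRMM_CHECK
  ! Serial reference for Example 1 (assumption: plain Fortran 90, no PESSL).
  IMPLICIT NONE
  INTEGER, PARAMETER :: M = 5, N = 3
  DOUBLE PRECISION :: A(M,M), B(M,N), T(M,M), ALPHA
  INTEGER :: I, J

  ! Arbitrary filler below the diagonal; it must not affect the result.
  A = -99.0D0
  A(1,1:5) = (/  3.0D0, -1.0D0,  2.0D0,  2.0D0, 1.0D0 /)
  A(2,2:5) = (/ -2.0D0,  4.0D0, -1.0D0,  3.0D0 /)
  A(3,3:5) = (/ -3.0D0,  0.0D0,  2.0D0 /)
  A(4,4:5) = (/  4.0D0, -2.0D0 /)
  A(5,5)   = 1.0D0

  B(:,1) = (/ 2.0D0, 5.0D0, 0.0D0,  3.0D0, -1.0D0 /)
  B(:,2) = (/ 3.0D0, 5.0D0, 1.0D0,  1.0D0,  2.0D0 /)
  B(:,3) = (/ 1.0D0, 4.0D0, 2.0D0, -3.0D0,  1.0D0 /)
  ALPHA = 1.0D0

  ! Build the triangular operand: copy the upper triangle, treat the rest as zero.
  T = 0.0D0
  DO J = 1, M
     DO I = 1, J
        T(I,J) = A(I,J)
     END DO
  END DO

  ! B is overwritten with alpha*A*B, as in equation 1.
  B = ALPHA * MATMUL(T, B)

  DO I = 1, M
     PRINT '(3F8.1)', B(I,:)
  END DO
END PROGRAM TRMM_CHECK
```

The printed rows should agree with the output matrix B of this example.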

Example 2

This example computes b = Ab, where A is not a unit triangular matrix, and b is a column vector.

Array sections are specified for arguments a and b, resulting in the computation using a submatrix A starting at row 2 and column 2 in an array and a column vector b starting at element 2 in an array.

As in "Example", array data is block-cyclically distributed using a 2 × 2 process grid.

!HPF$ PROCESSORS PROC(2,2)
!HPF$ ALIGN B(:) WITH A(:,1)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: A
 
CALL TRMM( A(2:13,2:13) , B(2:13) , 'U' )
-or-
CALL TRMM( A(2:13,2:13) , B(2:13) , 'U' , TRANSA='N' , DIAG='N' )

Input

Only a portion of the data structure is used--that is, submatrix A. Following is the triangular submatrix A of order 12, starting at row 2 and column 2 in the array of order 13:

 *                                                                 *
 |  .    .    .    .    .    .    .    .    .    .    .    .    .  |
 |  .   1.0  2.0  1.0  2.0  1.0  1.0  3.0  1.0  1.0  2.0  3.0  2.0 |
 |  .    .   3.0  2.0  3.0  1.0  2.0  3.0  1.0  1.0  2.0  3.0  3.0 |
 |  .    .    .   3.0  1.0  3.0  2.0  1.0  2.0  1.0  2.0  3.0  1.0 |
 |  .    .    .    .   1.0  2.0  2.0  1.0  1.0  1.0  2.0  3.0  2.0 |
 |  .    .    .    .    .   2.0  1.0  2.0  2.0  1.0  2.0  3.0  3.0 |
 |  .    .    .    .    .    .   1.0  2.0  1.0  1.0  2.0  3.0  1.0 |
 |  .    .    .    .    .    .    .   2.0  1.0  1.0  2.0  3.0  2.0 |
 |  .    .    .    .    .    .    .    .   2.0  1.0  2.0  3.0  3.0 |
 |  .    .    .    .    .    .    .    .    .   3.0  1.0  3.0  1.0 |
 |  .    .    .    .    .    .    .    .    .    .   2.0  2.0  2.0 |
 |  .    .    .    .    .    .    .    .    .    .    .   1.0  3.0 |
 |  .    .    .    .    .    .    .    .    .    .    .    .   1.0 |
 *                                                                 *

Only a portion of the data structure is used--that is, vector b, which is a column vector. Following is the vector b of size 12, starting at element 2 in the array of size 13:

 *     *
 |  .  |
 | 2.0 |
 | 3.0 |
 | 1.0 |
 | 2.0 |
 | 3.0 |
 | 1.0 |
 | 2.0 |
 | 3.0 |
 | 1.0 |
 | 2.0 |
 | 3.0 |
 | 1.0 |
 *     *

Output

Only a portion of the data structure is used--that is, vector b, which is a column vector. Following is the vector b of size 12, starting at element 2 in the array of size 13:

 *      *
 |   .  |
 | 42.0 |
 | 48.0 |
 | 39.0 |
 | 31.0 |
 | 34.0 |
 | 23.0 |
 | 23.0 |
 | 23.0 |
 | 15.0 |
 | 12.0 |
 |  6.0 |
 |  1.0 |
 *      *

TRSM--Solution of Triangular System of Equations with Multiple Right-Hand Sides

This subroutine performs one of the following solves for a triangular system of equations with multiple right-hand sides:
   Solution                   Equation
1. B <-- alpha A^(-1) B       A X = alpha B
2. B <-- alpha A^(-T) B       A^T X = alpha B
3. B <-- alpha B A^(-1)       X A = alpha B
4. B <-- alpha B A^(-T)       X A^T = alpha B
5. b <-- A^(-1) b             A x = b
6. b <-- A^(-T) b             A^T x = b

where, in the formulas above:

A is a triangular matrix.
B is a general matrix.
b is a vector.
alpha is a scalar.

Notes:

  1. The term X or x used in the systems of equations listed above represents the output solution matrix or vector, respectively. It is important to note that, in this subroutine, the solution matrix or vector is actually returned in the input-output argument b.

  2. No data should be moved to form the matrix transpose; that is, the matrix should always be stored in its untransposed form.

If any of the assumed-shape arrays have a size of zero, no computation is performed, and the subroutine returns after doing some parameter checking.

See references [17], [30], [31], and [44].

Table 115. Data Types
alpha, A, B, b         Subprogram
Long-precision real    TRSM

Syntax

HPF   Solutions 1-4       CALL TRSM (alpha, a, b, uplo, side)
                          CALL TRSM (alpha, a, b, uplo, side, transa, diag)
      Solutions 5 and 6   CALL TRSM (a, b, uplo)
                          CALL TRSM (a, b, uplo, transa, diag)

On Entry

alpha

is the scalar alpha.

Type: required (solutions 1-4); not present (solutions 5 and 6)

Specified as: a number of the data type indicated in Table 115.

a

is the triangular matrix A used in the system of equations, where:

If uplo = 'U', the array contains the upper triangle of the triangular matrix A in its upper triangle, and its strictly lower triangular part is not referenced.

If uplo = 'L', the array contains the lower triangle of the triangular matrix A in its lower triangle, and its strictly upper triangular part is not referenced.

Type: required

Specified as: an assumed-shape array with shape (:,:), containing numbers of the data type indicated in Table 115, where size(a,1) = size(a,2).

b

is the general matrix B or the vector b, containing the right-hand side(s) of the triangular system to be solved.

Type: required

Specified as: an assumed-shape array with shape (:,:) or (:), containing numbers of the data type indicated in Table 115.

uplo

indicates whether the upper or lower triangular part of the triangular matrix A is referenced, where:

If uplo = 'U', the upper triangular part is referenced.

If uplo = 'L', the lower triangular part is referenced.

Type: required

Specified as: a single character; uplo = 'U' or 'L'.

side

indicates whether A is located to the left or right of B in the system of equations, where:

If side = 'L', A is to the left of B, resulting in solution 1 or 2.

If side = 'R', A is to the right of B, resulting in solution 3 or 4.

Type: required (solutions 1-4); not present (solutions 5 and 6)

Specified as: a single character; side = 'L' or 'R'.

transa

indicates the form of matrix A used in the system of equations, where:

If transa = 'N', A is used in the system of equations, resulting in solution 1, 3, or 5.

If transa = 'T', A^T is used in the system of equations, resulting in solution 2, 4, or 6.

Type: optional

Default: transa = 'N'

Specified as: a single character; transa = 'N' or 'T'.

diag

indicates the characteristics of the diagonal of matrix A, where:

If diag = 'U', A is a unit triangular matrix.

If diag = 'N', A is not a unit triangular matrix.

Type: optional

Default: diag = 'N'

Specified as: a single character; diag = 'U' or 'N'.

On Return

b

is the updated matrix B or vector b, containing the solution vector(s).

Type: required

Returned as: an assumed-shape array with shape (:,:) or (:), containing numbers of the data type indicated in Table 115.

Notes and Coding Rules

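  1. The assumed-shape arrays must have the exact size required for the computation, that is:
    1. For solutions 1-4 with side = 'L': size(a,1) = size(a,2) = size(b,1)
    2. For solutions 1-4 with side = 'R': size(a,1) = size(a,2) = size(b,2)
    3. For solutions 5 and 6: size(a,1) = size(a,2) = size(b)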

  2. For migration purposes, note that the side and uplo arguments appear in reverse order from the corresponding BLAS and PBLAS subroutines.

  3. This subroutine accepts lowercase letters for the side, uplo, transa, and diag arguments.

  4. If you specify 'C' for transa, it is interpreted as though you specified 'T'.

  5. The assumed-shape arrays must have no common elements; otherwise, results are unpredictable.

  6. This subroutine assumes certain values in your array for parts of a triangular matrix. As a result, you do not have to set these values. For unit triangular matrices, the elements of the diagonal are assumed to be one. When using an upper or lower triangular matrix, the unreferenced elements in the strictly lower or upper triangular part, respectively, are assumed to be zero.

  7. For details on how to set up and code your HPF program using Parallel ESSL, see "Coding Your HPF Program".

  8. Block-cyclic data distribution is required for your array data. Because data directives are included in the interface module PESSL_HPF, you can specify any data distribution for your vectors and matrices, and the XL HPF compiler will, if necessary, redistribute the data prior to calling this subroutine. For how to code your HPF directives, see "Distributing Data in an HPF Program". For a sample program including directives, see Figure 9.

  9. The restrictions given in "Notes and Coding Rules" and "Notes and Coding Rules" also apply to this subroutine.

Error Conditions

HPF-specific errors are listed below. Resource and input-argument errors listed in "Error Conditions" and "Error Conditions" also apply to this subroutine.

Input-Argument Errors for Solutions 1-4

Stage 1
  1. The rank of the ultimate align target is greater than 2 for a or b.
  2. The process rank is not the same for a and b.
  3. The process rank is not 1 or 2 for a or b.

Stage 2

The process grid is not the same for a and b.

Stage 3

The data distribution is inconsistent for a and b.

Stage 4
  1. side is present, and side <>'L' or 'R'.
  2. side = 'L' or 'R', and the shape of the assumed-shape arrays a and b is incompatible:
    1. side = 'L' and:
      size(b,1) <> size(a,2) or
      size(a,1) <> size(a,2)
    2. side = 'R' and:
      size(b,2) <> size(a,1) or
      size(a,1) <> size(a,2)
  3. The shape of the assumed-shape array for a is invalid: size(a,1) <> size(a,2)

Stage 5

The data distribution for a or b is unsupported.

Input-Argument Errors for Solutions 5 and 6

Stage 1
  1. The rank of the ultimate align target is greater than 2 for a or b.
  2. The process rank is not the same for a and b.
  3. The process rank is not 1 or 2 for a or b.

Stage 2

The process grid is not the same for a and b.

Stage 3

The data distribution is inconsistent for a and b.

Stage 4

The shape of the assumed-shape array for a is invalid: size(a,1) <> size(a,2)

Stage 5

The data distribution for a or b is unsupported.

Stage 6

The shape of the assumed-shape arrays a and b is incompatible: size(a,1) <> size(b)

Example 1

This example shows the solution B <-- alpha A^(-1) B. As in "Example", array data is block-cyclically distributed using a 2 × 2 process grid.

!HPF$ PROCESSORS PROC(2,2)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: A, B
 
CALL TRSM(  1.0D0 , A , B , 'U' , 'L' )
-or-
CALL TRSM(  1.0D0 , A , B , 'U' , 'L' , TRANSA='N' , DIAG='N' )

Input

Triangular matrix A of order 5 is upper triangular:

 *                             *
 | 3.0  -1.0   2.0   2.0   1.0 |
 |  .   -2.0   4.0  -1.0   3.0 |
 |  .     .   -3.0   0.0   2.0 |
 |  .     .     .    4.0  -2.0 |
 |  .     .     .     .    1.0 |
 *                             *

General 5 × 3 matrix B:

 *                     *
 |   6.0   10.0   -2.0 |
 | -16.0   -1.0    6.0 |
 |  -2.0    1.0   -4.0 |
 |  14.0    0.0  -14.0 |
 |  -1.0    2.0    1.0 |
 *                     *

Output

General 5 × 3 matrix B:

 *                  *
 |  2.0   3.0   1.0 |
 |  5.0   5.0   4.0 |
 |  0.0   1.0   2.0 |
 |  3.0   1.0  -3.0 |
 | -1.0   2.0   1.0 |
 *                  *
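
The relationship between solution 1 and its equation A X = alpha B can be illustrated with serial back substitution. The sketch below assumes plain Fortran 90 without Parallel ESSL or data distribution; it is not a description of how TRSM is implemented, and the program name TRSM_CHECK is hypothetical.

```fortran
PROGRAM TRSM_CHECK
  ! Serial reference for Example 1 (assumption: plain Fortran 90, no PESSL).
  ! Solves A*X = alpha*B for upper triangular A and overwrites B with X.
  IMPLICIT NONE
  INTEGER, PARAMETER :: M = 5, N = 3
  DOUBLE PRECISION :: A(M,M), B(M,N), ALPHA
  INTEGER :: I, K

  A = 0.0D0
  A(1,1:5) = (/  3.0D0, -1.0D0,  2.0D0,  2.0D0, 1.0D0 /)
  A(2,2:5) = (/ -2.0D0,  4.0D0, -1.0D0,  3.0D0 /)
  A(3,3:5) = (/ -3.0D0,  0.0D0,  2.0D0 /)
  A(4,4:5) = (/  4.0D0, -2.0D0 /)
  A(5,5)   = 1.0D0

  B(:,1) = (/  6.0D0, -16.0D0, -2.0D0,  14.0D0, -1.0D0 /)
  B(:,2) = (/ 10.0D0,  -1.0D0,  1.0D0,   0.0D0,  2.0D0 /)
  B(:,3) = (/ -2.0D0,   6.0D0, -4.0D0, -14.0D0,  1.0D0 /)
  ALPHA = 1.0D0

  B = ALPHA*B

  ! Back substitution, last row first. Row I of B becomes row I of X,
  ! using only rows of X that were already computed (K > I).
  DO I = M, 1, -1
     DO K = I+1, M
        B(I,:) = B(I,:) - A(I,K)*B(K,:)
     END DO
     B(I,:) = B(I,:) / A(I,I)
  END DO

  DO I = 1, M
     PRINT '(3F8.1)', B(I,:)
  END DO
END PROGRAM TRSM_CHECK
```

The printed rows should agree with the output matrix B of this example, which is also the input matrix B of Example 1 for TRMM.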

Example 2

This example solves b <-- A^(-1) b, where A is a unit triangular matrix, and b is a row vector.

Array sections are specified for arguments a and b, resulting in the computation using a submatrix A starting at row 2 and column 2 in an array and a row vector b starting at element 2 in an array.

As in "Example", array data is block-cyclically distributed using a 2 × 2 process grid.

!HPF$ PROCESSORS PROC(2,2)
!HPF$ ALIGN B(:) WITH A(1,:)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: A
 
CALL TRSM( A(2:13,2:13) , B(2:13) , 'L' , DIAG='U' )
-or-
CALL TRSM( A(2:13,2:13) , B(2:13) , 'L' , TRANSA='N' , DIAG='U' )

Input

Only a portion of the data structure is used--that is, submatrix A. Following is the triangular submatrix A of order 12, starting at row 2 and column 2 in the array of order 13:

 *                                                                 *
 |  .    .    .    .    .    .    .    .    .    .    .    .    .  |
 |  .   1.0   .    .    .    .    .    .    .    .    .    .    .  |
 |  .   2.0  1.0   .    .    .    .    .    .    .    .    .    .  |
 |  .   3.0  2.0  1.0   .    .    .    .    .    .    .    .    .  |
 |  .   1.0  3.0  2.0  1.0   .    .    .    .    .    .    .    .  |
 |  .   2.0  1.0  3.0  2.0  1.0   .    .    .    .    .    .    .  |
 |  .   3.0  2.0  1.0  3.0  2.0  1.0   .    .    .    .    .    .  |
 |  .   1.0  3.0  2.0  1.0  3.0  2.0  1.0   .    .    .    .    .  |
 |  .   2.0  1.0  3.0  2.0  1.0  3.0  2.0  1.0   .    .    .    .  |
 |  .   3.0  2.0  1.0  3.0  2.0  1.0  3.0  2.0  1.0   .    .    .  |
 |  .   1.0  3.0  2.0  1.0  3.0  2.0  1.0  3.0  2.0  1.0   .    .  |
 |  .   2.0  1.0  3.0  2.0  1.0  3.0  2.0  1.0  3.0  2.0  1.0   .  |
 |  .   3.0  2.0  1.0  3.0  2.0  1.0  3.0  2.0  1.0  3.0  2.0  1.0 |
 *                                                                 *
Note: Because matrix A is unit triangular, the diagonal elements are not referenced. This subroutine assumes a value of 1.0 for the diagonal elements.

Only a portion of the data structure is used--that is, vector b, which is a row vector. Following is the vector b of size 12, starting at element 2 in the array of size 13:

 *                                                                             *
 |  .    2.0   7.0  13.0  15.0  17.0  26.0  28.0  27.0  39.0  41.0  37.0  52.0 |
 *                                                                             *

Output

Only a portion of the data structure is used--that is, vector b, which is a row vector. Following is the vector b of size 12, starting at element 2 in the array of size 13:

 *                                                                 *
 |  .   2.0  3.0  1.0  2.0  3.0  1.0  2.0  3.0  1.0  2.0  3.0  1.0 |
 *                                                                 *
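
The following serial sketch mirrors this example in plain Fortran 90, without Parallel ESSL or data distribution. It passes the same array sections to a routine with assumed-shape dummy arguments and solves the unit lower triangular system by forward substitution, never reading the diagonal. The names TRI_REF and UNIT_LOWER_SOLVE are hypothetical, and the loop that fills A relies on the observation that the printed submatrix follows the pattern MOD(i-j,3)+1 on and below the diagonal.

```fortran
MODULE TRI_REF
  IMPLICIT NONE
CONTAINS
  ! Hypothetical helper: overwrites B with the solution x of A*x = b, where A
  ! is unit lower triangular. The diagonal of A is assumed to be 1.0 and is
  ! not referenced.
  SUBROUTINE UNIT_LOWER_SOLVE(A, B)
    DOUBLE PRECISION, INTENT(IN)    :: A(:,:)
    DOUBLE PRECISION, INTENT(INOUT) :: B(:)
    INTEGER :: I
    ! Inside the routine the sections are indexed from 1.
    DO I = 2, SIZE(B)
       B(I) = B(I) - DOT_PRODUCT(A(I,1:I-1), B(1:I-1))
    END DO
  END SUBROUTINE UNIT_LOWER_SOLVE
END MODULE TRI_REF

PROGRAM TRSM_SECTION_CHECK
  USE TRI_REF
  IMPLICIT NONE
  DOUBLE PRECISION :: A(13,13), B(13)
  INTEGER :: I, J

  ! Row 1 and column 1 of the array are outside the submatrix.
  A = 0.0D0
  DO J = 2, 13
     DO I = J, 13
        A(I,J) = DBLE(MOD(I-J,3) + 1)
     END DO
  END DO

  B = 0.0D0
  B(2:13) = (/  2.0D0,  7.0D0, 13.0D0, 15.0D0, 17.0D0, 26.0D0, &
               28.0D0, 27.0D0, 39.0D0, 41.0D0, 37.0D0, 52.0D0 /)

  CALL UNIT_LOWER_SOLVE(A(2:13,2:13), B(2:13))

  PRINT '(12F5.1)', B(2:13)
END PROGRAM TRSM_SECTION_CHECK
```

The printed values should agree with elements 2 through 13 of the output vector b.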

SYRK--Rank-K Update of a Real Symmetric Matrix

This subroutine computes one of the following rank-k updates:

1. C <-- alpha A A^T + beta C
2. C <-- alpha A^T A + beta C
3. C <-- alpha a a^T + C

where, in the formulas above:

A is a general matrix.
C is a symmetric matrix.
a is a vector.
alpha and beta are scalars.

Note: No data should be moved to form the matrix transpose; that is, the matrix should always be stored in its untransposed form.

In the following cases, no computation is performed and the subroutine returns after doing some parameter checking:

See references [17], [30], [31], and [44].

Table 116. Data Types
alpha, beta, A, C, a   Subprogram
Long-precision real    SYRK

Syntax

HPF   Equations 1 and 2   CALL SYRK (alpha, a, beta, c, uplo)
                          CALL SYRK (alpha, a, beta, c, uplo, trans)
      Equation 3          CALL SYRK (alpha, a, c, uplo)

On Entry

alpha

is the scalar alpha.

Type: required

Specified as: a number of the data type indicated in Table 116.

a

is the general matrix A or the vector a.

Type: required

Specified as: an assumed-shape array with shape (:,:) or (:), containing numbers of the data type indicated in Table 116.

beta

is the scalar beta.

Type: required (equations 1 and 2); not present (equation 3)

Specified as: a number of the data type indicated in Table 116.

c

is the symmetric matrix C, where:

If uplo = 'U', the array contains the upper triangle of the symmetric matrix C in its upper triangle, and its strictly lower triangular part is not referenced.

If uplo = 'L', the array contains the lower triangle of the symmetric matrix C in its lower triangle, and its strictly upper triangular part is not referenced.

For equations 1 and 2, when beta is zero, C need not be set on input.

Type: required

Specified as: an assumed-shape array with shape (:,:), containing numbers of the data type indicated in Table 116, where size(c,1) = size(c,2).

uplo

indicates whether the upper or lower triangular part of the symmetric matrix C is referenced, where:

If uplo = 'U', the upper triangular part is referenced.

If uplo = 'L', the lower triangular part is referenced.

Type: required

Specified as: a single character; uplo = 'U' or 'L'.

trans

indicates which computation is performed, where:

If trans = 'N', the computation in equation 1 is performed.

If trans = 'T', the computation in equation 2 is performed.

Type: optional (equations 1 and 2); not present (equation 3)

Default: trans = 'N'

Specified as: a single character; trans = 'N' or 'T'.

On Return

c

is the updated symmetric matrix C, containing the results of the computation.

Type: required

Returned as: an assumed-shape array with shape (:,:), containing numbers of the data type indicated in Table 116.

Notes and Coding Rules

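  1. The assumed-shape arrays must have the exact size required for the computation, that is:
    1. For equation 1 (trans = 'N'): size(c,1) = size(c,2) = size(a,1)
    2. For equation 2 (trans = 'T'): size(c,1) = size(c,2) = size(a,2)
    3. For equation 3: size(c,1) = size(c,2) = size(a)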

  2. This subroutine accepts lowercase letters for the uplo and trans arguments.

  3. If you specify 'C' for the trans argument, it is interpreted as though you specified 'T'.

  4. The assumed-shape arrays must have no common elements; otherwise, results are unpredictable.

  5. For details on how to set up and code your HPF program using Parallel ESSL, see "Coding Your HPF Program".

  6. Block-cyclic data distribution is required for your array data. Because data directives are included in the interface module PESSL_HPF, you can specify any data distribution for your vectors and matrices, and the XL HPF compiler will, if necessary, redistribute the data prior to calling this subroutine. For how to code your HPF directives, see "Distributing Data in an HPF Program". For a sample program including directives, see Figure 9.

  7. The restrictions given in "Notes and Coding Rules" and "Notes and Coding Rules" also apply to this subroutine.

Error Conditions

HPF-specific errors are listed below. Resource and input-argument errors listed in "Error Conditions" and "Error Conditions" also apply to this subroutine.

Input-Argument Errors for Equations 1 and 2

Stage 1
  1. The rank of the ultimate align target is greater than 2 for a or c.
  2. The process rank is not the same for a and c.
  3. The process rank is not 1 or 2 for a or c.

Stage 2

The process grid is not the same for a and c.

Stage 3

The data distribution is inconsistent for a and c.

Stage 4
  1. trans is present, and trans <>'N', 'T', or 'C'
  2. trans = 'N', 'T', or 'C', and the shape of the assumed-shape arrays a and c is incompatible:
    1. trans = 'N':
      size(c,2) <> size(a,1) or
      size(c,1) <> size(c,2)
    2. trans = 'T':
      size(c,2) <> size(a,2) or
      size(c,1) <> size(c,2)
  3. The shape of the assumed-shape array for c is invalid: size(c,1) <> size(c,2)

Stage 5

The data distribution for a or c is unsupported.

Input-Argument Errors for Equation 3

Stage 1
  1. The rank of the ultimate align target is greater than 2 for a or c.
  2. The process rank is not the same for a and c.
  3. The process rank is not 1 or 2 for a or c.

Stage 2

The process grid is not the same for a and c.

Stage 3

The data distribution is unsupported for c.

Stage 4

The vector for a is replicated.

Stage 5

The data distribution for a is unsupported.

Stage 6

The shape of the assumed-shape array for c is invalid: size(c,1) <> size(c,2)

Stage 7

The data distribution for c or a is unsupported.

Stage 8

The shape of the assumed-shape arrays c and a is incompatible: size(c,1) <> size(a)

Example 1

This example computes C = alpha A A^T + beta C. As in "Example", array data is block-cyclically distributed using a 2 × 3 process grid.

!HPF$ PROCESSORS PROC(2,3)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: A, C
 
CALL SYRK( 1.0D0 , A , 1.0D0 , C , UPLO='L' )
-or-
CALL SYRK( 1.0D0 , A , 1.0D0 , C , UPLO='L' , TRANS='N' )

Input

General 8 × 5 matrix A:

 *                             *
 | 0.0   8.0  16.0  24.0  32.0 |
 | 1.0   9.0  17.0  25.0  33.0 |
 | 2.0  10.0  18.0  26.0  34.0 |
 | 3.0  11.0  19.0  27.0  35.0 |
 | 4.0  12.0  20.0  28.0  36.0 |
 | 5.0  13.0  21.0  29.0  37.0 |
 | 6.0  14.0  22.0  30.0  38.0 |
 | 7.0  15.0  23.0  31.0  39.0 |
 *                             *

Symmetric matrix C of order 8:

 *                                               *
 | 0.0    .     .     .     .     .     .     .  |
 | 1.0   8.0    .     .     .     .     .     .  |
 | 2.0   9.0  15.0    .     .     .     .     .  |
 | 3.0  10.0  16.0  21.0    .     .     .     .  |
 | 4.0  11.0  17.0  22.0  26.0    .     .     .  |
 | 5.0  12.0  18.0  23.0  27.0  30.0    .     .  |
 | 6.0  13.0  19.0  24.0  28.0  31.0  33.0    .  |
 | 7.0  14.0  20.0  25.0  29.0  32.0  34.0  35.0 |
 *                                               *

Output

Symmetric matrix C of order 8:

 *                                                                *
 | 1920.0      .       .       .       .       .       .       .  |
 | 2001.0  2093.0      .       .       .       .       .       .  |
 | 2082.0  2179.0  2275.0      .       .       .       .       .  |
 | 2163.0  2265.0  2366.0  2466.0      .       .       .       .  |
 | 2244.0  2351.0  2457.0  2562.0  2666.0      .       .       .  |
 | 2325.0  2437.0  2548.0  2658.0  2767.0  2875.0      .       .  |
 | 2406.0  2523.0  2639.0  2754.0  2868.0  2981.0  3093.0      .  |
 | 2487.0  2609.0  2730.0  2850.0  2969.0  3087.0  3204.0  3320.0 |
 *                                                                *
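
The following serial sketch recomputes this example in plain Fortran 90, without Parallel ESSL or data distribution, updating only the lower triangle of C as uplo = 'L' implies. The program name SYRK_CHECK is hypothetical. The two fill loops rely on patterns observed in the printed input: A(i,j) = (i-1) + 8(j-1), and the lower triangle of C numbered 0.0 through 35.0 column by column.

```fortran
PROGRAM SYRK_CHECK
  ! Serial reference for Example 1 (assumption: plain Fortran 90, no PESSL).
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 8, K = 5
  DOUBLE PRECISION :: A(N,K), C(N,N), ALPHA, BETA
  INTEGER :: I, J, L

  ! General 8 x 5 matrix A from the Input section.
  DO J = 1, K
     DO I = 1, N
        A(I,J) = DBLE((I-1) + N*(J-1))
     END DO
  END DO

  ! Lower triangle of C from the Input section, column by column.
  C = 0.0D0
  L = 0
  DO J = 1, N
     DO I = J, N
        C(I,J) = DBLE(L)
        L = L + 1
     END DO
  END DO

  ALPHA = 1.0D0
  BETA  = 1.0D0

  ! Element (i,j) of A*A^T is the dot product of rows i and j of A.
  ! Only elements on and below the diagonal are updated.
  DO J = 1, N
     DO I = J, N
        C(I,J) = ALPHA*DOT_PRODUCT(A(I,:), A(J,:)) + BETA*C(I,J)
     END DO
  END DO

  DO I = 1, N
     PRINT '(8F8.1)', C(I,1:I)
  END DO
END PROGRAM SYRK_CHECK
```

The printed lower triangle should agree with the output matrix C of this example; for instance, the first row is 1920.0 and the last element is 3320.0.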

Example 2

This example computes C = alpha a a^T + C. As in "Example", array data is block-cyclically distributed using a 2 × 2 process grid.

!HPF$ PROCESSORS PROC(2,2)
!HPF$ ALIGN A(:) WITH C(:,1)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: C
 
CALL SYRK( 1.0D0 , A , C , UPLO='L' )

Input

Symmetric matrix C of order 9:

 *                                                     *
 | 1.0    .     .     .     .     .     .     .     .  |
 | 2.0  12.0    .     .     .     .     .     .     .  |
 | 3.0  13.0  23.0    .     .     .     .     .     .  |
 | 4.0  14.0  24.0  34.0    .     .     .     .     .  |
 | 5.0  15.0  25.0  35.0  45.0    .     .     .     .  |
 | 6.0  16.0  26.0  36.0  46.0  56.0    .     .     .  |
 | 7.0  17.0  27.0  37.0  47.0  57.0  67.0    .     .  |
 | 8.0  18.0  28.0  38.0  48.0  58.0  68.0  78.0    .  |
 | 9.0  19.0  29.0  39.0  49.0  59.0  69.0  79.0  89.0 |
 *                                                     *

Vector a of size 9:

 *     *
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 *     *

Output

Symmetric matrix C of order 9:

 *                                                      *
 |  2.0    .     .     .     .     .     .     .     .  |
 |  3.0  13.0    .     .     .     .     .     .     .  |
 |  4.0  14.0  24.0    .     .     .     .     .     .  |
 |  5.0  15.0  25.0  35.0    .     .     .     .     .  |
 |  6.0  16.0  26.0  36.0  46.0    .     .     .     .  |
 |  7.0  17.0  27.0  37.0  47.0  57.0    .     .     .  |
 |  8.0  18.0  28.0  38.0  48.0  58.0  68.0    .     .  |
 |  9.0  19.0  29.0  39.0  49.0  59.0  69.0  79.0    .  |
 | 10.0  20.0  30.0  40.0  50.0  60.0  70.0  80.0  90.0 |
 *                                                      *
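
The following serial sketch recomputes this rank-one update in plain Fortran 90, without Parallel ESSL or data distribution. The program name SYR_CHECK is hypothetical, and the fill loop relies on the observation that the printed input satisfies C(i,j) = 10(j-1) + i on and below the diagonal.

```fortran
PROGRAM SYR_CHECK
  ! Serial reference for Example 2 (assumption: plain Fortran 90, no PESSL).
  ! Equation 3: C <-- alpha*a*a^T + C, lower triangle only.
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 9
  DOUBLE PRECISION :: C(N,N), A(N), ALPHA
  INTEGER :: I, J

  ! Lower triangle of C from the Input section.
  C = 0.0D0
  DO J = 1, N
     DO I = J, N
        C(I,J) = DBLE(10*(J-1) + I)
     END DO
  END DO

  A = 1.0D0
  ALPHA = 1.0D0

  ! Element (i,j) of a*a^T is a(i)*a(j).
  DO J = 1, N
     DO I = J, N
        C(I,J) = C(I,J) + ALPHA*A(I)*A(J)
     END DO
  END DO

  DO I = 1, N
     PRINT '(9F6.1)', C(I,1:I)
  END DO
END PROGRAM SYR_CHECK
```

Because every element of a is 1.0, each stored element of C increases by exactly 1.0, which matches the output matrix of this example.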

SYR2K--Rank-2K Update of a Real Symmetric Matrix

This subroutine computes one of the following rank-2k updates:

1. C <-- alpha A B^T + alpha B A^T + beta C
2. C <-- alpha A^T B + alpha B^T A + beta C
3. C <-- alpha a b^T + alpha b a^T + C

where, in the formulas above:

A and B are general matrices.
C is a symmetric matrix.
a and b are vectors.
alpha and beta are scalars.

Note: No data should be moved to form the matrix transposes; that is, the matrices should always be stored in their untransposed forms.
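
As an illustration of equation 1, the following serial sketch forms the rank-2k update in plain Fortran 90, without Parallel ESSL or data distribution, and stores only the lower triangle of C. The program name SYR2K_SKETCH, the work array U, and all matrix values are made up for illustration; none of them come from this book.

```fortran
PROGRAM SYR2K_SKETCH
  ! Serial sketch of equation 1 (assumption: plain Fortran 90, no PESSL):
  ! C <-- alpha*A*B^T + alpha*B*A^T + beta*C, lower triangle only.
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 3, K = 2
  DOUBLE PRECISION :: A(N,K), B(N,K), C(N,N), U(N,N), ALPHA, BETA
  INTEGER :: I, J

  ! Made-up data, stored column by column:
  ! A has rows (1,2), (3,4), (5,6); B has rows (1,0), (0,1), (1,1).
  A = RESHAPE((/ 1.0D0, 3.0D0, 5.0D0, 2.0D0, 4.0D0, 6.0D0 /), (/ N, K /))
  B = RESHAPE((/ 1.0D0, 0.0D0, 1.0D0, 0.0D0, 1.0D0, 1.0D0 /), (/ N, K /))
  C = 0.0D0
  ALPHA = 1.0D0
  BETA  = 0.0D0

  ! A*B^T and B*A^T are transposes of each other, so their sum is symmetric.
  U = MATMUL(A, TRANSPOSE(B))
  U = U + TRANSPOSE(U)

  ! Only elements on and below the diagonal of C are updated.
  DO J = 1, N
     DO I = J, N
        C(I,J) = ALPHA*U(I,J) + BETA*C(I,J)
     END DO
  END DO

  DO I = 1, N
     PRINT '(3F6.1)', C(I,1:I)
  END DO
END PROGRAM SYR2K_SKETCH
```

For this made-up data the lower triangle of C is 2.0 in row 1; 5.0 and 8.0 in row 2; and 8.0, 13.0, and 22.0 in row 3. Because the sum is symmetric, storing one triangle loses no information.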

In the following cases, no computation is performed and the subroutine returns after doing some parameter checking:

See references [17], [30], [31], and [44].

Table 117. Data Types
alpha, beta, A, B, C, a, b   Subprogram
Long-precision real          SYR2K

Syntax

HPF   Equations 1 and 2   CALL SYR2K (alpha, a, b, beta, c, uplo)
                          CALL SYR2K (alpha, a, b, beta, c, uplo, trans)
      Equation 3          CALL SYR2K (alpha, a, b, c, uplo)

On Entry

alpha

is the scalar alpha.

Type: required

Specified as: a number of the data type indicated in Table 117.

a

is the general matrix A or the vector a.

Type: required

Specified as: an assumed-shape array with shape (:,:) or (:), containing numbers of the data type indicated in Table 117.

b

is the general matrix B or the vector b.

Type: required

Specified as: an assumed-shape array with shape (:,:) or (:), containing numbers of the data type indicated in Table 117.

beta

is the scalar beta.

Type: required (equations 1 and 2); not present (equation 3)

Specified as: a number of the data type indicated in Table 117.

c

is the symmetric matrix C, where:

If uplo = 'U', the array contains the upper triangle of the symmetric matrix C in its upper triangle, and its strictly lower triangular part is not referenced.

If uplo = 'L', the array contains the lower triangle of the symmetric matrix C in its lower triangle, and its strictly upper triangular part is not referenced.

For equations 1 and 2, when beta is zero, C need not be set on input.

Type: required

Specified as: an assumed-shape array with shape (:,:), containing numbers of the data type indicated in Table 117, where size(c,1) = size(c,2).

uplo

indicates whether the upper or lower triangular part of the symmetric matrix C is referenced, where:

If uplo = 'U', the upper triangular part is referenced.

If uplo = 'L', the lower triangular part is referenced.

Type: required

Specified as: a single character; uplo = 'U' or 'L'.

trans

indicates which computation is performed, where:

If trans = 'N', the computation in equation 1 is performed.

If trans = 'T', the computation in equation 2 is performed.

Type: optional (equations 1 and 2); not present (equation 3)

Default: trans = 'N'

Specified as: a single character; trans = 'N' or 'T'.

On Return

c

is the updated symmetric matrix C, containing the results of the computation.

Type: required

Returned as: an assumed-shape array with shape (:,:), containing numbers of the data type indicated in Table 117.
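
The argument descriptions above can be summarized with a small serial model. The following Python sketch is not Parallel ESSL code and does no data distribution; the function name syr2k_reference is hypothetical. It assumes that equation 1 has the standard SYR2K form (alpha*A*B^T + alpha*B*A^T + beta*C), which is consistent with the shape rules for trans = 'N', and it takes equation 2 from Example 1. Only the triangle selected by uplo is read or written, and C is not read when beta is zero.

```python
def syr2k_reference(alpha, a, b, beta, c, uplo, trans="N"):
    """Serial model of equations 1 and 2; updates the uplo triangle of c in place."""
    t = trans.upper()
    if t == "C":                      # 'C' is interpreted as 'T'
        t = "T"
    if t not in ("N", "T"):
        raise ValueError("trans must be 'N', 'T', or 'C'")
    u = uplo.upper()
    if u not in ("U", "L"):
        raise ValueError("uplo must be 'U' or 'L'")
    n = len(c)                                   # order of C
    k = len(a[0]) if t == "N" else len(a)        # length of the summed dimension
    for i in range(n):
        cols = range(i, n) if u == "U" else range(0, i + 1)
        for j in cols:
            if t == "N":   # assumed equation 1: A*B^T + B*A^T
                s = sum(a[i][p] * b[j][p] + b[i][p] * a[j][p] for p in range(k))
            else:          # equation 2: A^T*B + B^T*A
                s = sum(a[p][i] * b[p][j] + b[p][i] * a[p][j] for p in range(k))
            # When beta is zero, c need not be set on input, so it is not read.
            c[i][j] = alpha * s + (beta * c[i][j] if beta != 0 else 0.0)
    return c
```

Entries in the triangle that is not selected are left exactly as passed in, which mirrors the statement that the strictly opposite triangle is not referenced.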

Notes and Coding Rules

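  1. The assumed-shape arrays must have the exact size required for the computation, that is (as implied by the shape checks under "Error Conditions"):
     For equation 1 (trans = 'N'): size(c,1) = size(c,2) = size(a,1) = size(b,1) and size(a,2) = size(b,2)
     For equation 2 (trans = 'T'): size(c,1) = size(c,2) = size(a,2) = size(b,2) and size(a,1) = size(b,1)
     For equation 3: size(c,1) = size(c,2) = size(a) = size(b)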

  2. This subroutine accepts lowercase letters for the uplo and trans arguments.

  3. If you specify 'C' for the trans argument, it is interpreted as though you specified 'T'.

  4. The assumed-shape arrays must have no common elements; otherwise, results are unpredictable.

  5. For details on how to set up and code your HPF program using Parallel ESSL, see "Coding Your HPF Program".

  6. Block-cyclic data distribution is required for your array data. Because data directives are included in the interface module PESSL_HPF, you can specify any data distribution for your vectors and matrices, and the XL HPF compiler will, if necessary, redistribute the data prior to calling this subroutine. For how to code your HPF directives, see "Distributing Data in an HPF Program". For a sample program including directives, see Figure 9.

  7. The restrictions given in "Notes and Coding Rules" and "Notes and Coding Rules" also apply to this subroutine.
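
To make the data distribution in note 6 concrete, the following Python sketch models how the HPF directive DISTRIBUTE (CYCLIC, CYCLIC) deals array elements onto a processor arrangement such as PROC(2,2). In HPF, CYCLIC is equivalent to CYCLIC(1), so consecutive indices go to consecutive processors in round-robin order. The sketch assumes array lower bounds of 1; the function names are hypothetical, and how abstract processors map to physical processes is not modeled.

```python
def cyclic_owner(i, j, nprow=2, npcol=2):
    """Return the 1-based (row, col) of the abstract processor owning element (i, j).

    i and j are 1-based array indices, as in Fortran.
    """
    return ((i - 1) % nprow + 1, (j - 1) % npcol + 1)


def local_elements(m, n, prow, pcol, nprow=2, npcol=2):
    """List the (i, j) indices of an m x n array that live on processor (prow, pcol)."""
    return [
        (i, j)
        for i in range(1, m + 1)
        for j in range(1, n + 1)
        if cyclic_owner(i, j, nprow, npcol) == (prow, pcol)
    ]


# For a 9 x 9 matrix on a 2 x 2 arrangement, processor (1,1) holds the
# odd-numbered rows and columns, and processor (2,2) the even-numbered ones.
on_11 = local_elements(9, 9, 1, 1)
on_22 = local_elements(9, 9, 2, 2)
```

Because 9 is odd, the four processors do not hold equal shares of a 9 × 9 matrix; the sketch makes it easy to count how many elements each one owns.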

Error Conditions

HPF-specific errors are listed below. Resource and input-argument errors listed in "Notes and Coding Rules" and "Notes and Coding Rules" also apply to this subroutine.

Input-Argument Errors for Equations 1 and 2

Stage 1
  1. The rank of the ultimate align target is greater than 2 for a, b, or c.
  2. The process rank is not the same for a, b, and c.
  3. The process rank is not 1 or 2 for a, b, or c.

Stage 2

The process grid is not the same for a, b, and c.

Stage 3

The data distribution is inconsistent for a, b, and c.

Stage 4
  1. trans is present, and trans <> 'N', 'T', or 'C'
  2. trans = 'N', 'T', or 'C', and the shape of the assumed-shape arrays for a, b, and c is incompatible:
    1. trans = 'N':
      size(c,1) <> size(c,2) or
      size(c,2) <> size(a,1) or
      size(a,1) <> size(b,1) or
      size(a,2) <> size(b,2)
    2. trans = 'T':
      size(c,1) <> size(c,2) or
      size(c,2) <> size(a,2) or
      size(a,2) <> size(b,2) or
      size(a,1) <> size(b,1)
  3. The shape of the assumed-shape array for c is invalid: size(c,1) <> size(c,2)

Stage 5

The data distribution for a, b, or c is unsupported.
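
The Stage 4 conditions for equations 1 and 2 can be restated as a small check. The following Python sketch is illustrative only; the function name and the returned message strings are hypothetical and are not Parallel ESSL error messages. Shapes are passed as (size(x,1), size(x,2)) pairs, and an absent trans is treated as the default 'N'.

```python
def syr2k_stage4_error(a_shape, b_shape, c_shape, trans=None):
    """Return a description of the first failing Stage 4 condition, or None."""
    t = "N" if trans is None else trans.upper()   # lowercase letters are accepted
    if t not in ("N", "T", "C"):
        return "trans <> 'N', 'T', or 'C'"
    if c_shape[0] != c_shape[1]:
        return "size(c,1) <> size(c,2)"
    d = 0 if t == "N" else 1    # dimension of a and b that must equal the order of c
    o = 1 - d                   # the other dimension, which must agree between a and b
    if c_shape[1] != a_shape[d]:
        return f"size(c,2) <> size(a,{d + 1})"
    if a_shape[d] != b_shape[d]:
        return f"size(a,{d + 1}) <> size(b,{d + 1})"
    if a_shape[o] != b_shape[o]:
        return f"size(a,{o + 1}) <> size(b,{o + 1})"
    return None


# The 8 x 9 matrices of Example 1 are valid with trans = 'T' and a 9 x 9 c,
# but the same arrays fail the check when trans = 'N'.
ok = syr2k_stage4_error((8, 9), (8, 9), (9, 9), "T")
bad = syr2k_stage4_error((8, 9), (8, 9), (9, 9), "N")
```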

Input-Argument Errors for Equation 3

Stage 1
  1. The rank of the ultimate align target is greater than 2 for a, b, or c.
  2. The process rank is not the same for a, b, and c.
  3. The process rank is not 1 or 2 for a, b, or c.

Stage 2

The process grid is not the same for a, b, and c.

Stage 3

The data distribution is unsupported for c.

Stage 4

The vector for a or b is replicated.

Stage 5

The data distribution is unsupported for a or b.

Stage 6
  1. The shape of the assumed-shape arrays for a, b, and c is incompatible:
    size(c,1) <> size(c,2) or
    size(c,1) <> size(a) or
    size(c,1) <> size(b)
  2. The shape of the assumed-shape array for c is invalid: size(c,1) <> size(c,2)

Stage 7

The data distribution for a, b, or c is unsupported.

Example 1

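This example computes $C = \alpha A^{T} B + \alpha B^{T} A + \beta C$. Because beta is zero, C need not be set on input, so only A and B are shown as input. As in "Example", array data is block-cyclically distributed using a 2 × 2 process grid.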

!HPF$ PROCESSORS PROC(2,2)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: A, B, C
 
CALL SYR2K( 1.0D0 , A , B , 0.0D0 , C , 'U' , 'T' )

Input

General 8 × 9 matrix A:

 *                                                      *
 |  0.0  -1.0  -1.0   0.0   0.0   0.0   0.0   0.0   1.0 |
 |  0.0   1.0   0.0   1.0   0.0   1.0   0.0   1.0   1.0 |
 |  0.0   0.0  -1.0  -1.0   0.0   0.0   1.0   0.0   1.0 |
 |  0.0   1.0   0.0  -1.0   1.0   1.0   0.0   1.0   1.0 |
 |  1.0   0.0   0.0   0.0  -1.0   0.0   0.0   0.0   1.0 |
 |  1.0   0.0   0.0   0.0   1.0   1.0   0.0   0.0   1.0 |
 |  0.0   0.0  -1.0   0.0  -1.0   0.0   0.0   0.0   1.0 |
 | -1.0   0.0   0.0   0.0   0.0   0.0  -1.0   0.0   1.0 |
 *                                                      *

General 8 × 9 matrix B:

 *                                                      *
 |  0.0   1.0   1.0   0.0   0.0   0.0   0.0   0.0  -1.0 |
 |  0.0  -1.0   0.0  -1.0   0.0  -1.0   0.0  -1.0  -1.0 |
 |  0.0   0.0   1.0   1.0   0.0   0.0  -1.0   0.0  -1.0 |
 |  0.0  -1.0   0.0   1.0  -1.0  -1.0   0.0  -1.0  -1.0 |
 | -1.0   0.0   0.0   0.0   1.0   0.0   0.0   0.0  -1.0 |
 | -1.0   0.0   0.0   0.0  -1.0  -1.0   0.0   0.0  -1.0 |
 |  0.0   0.0   1.0   0.0   1.0   0.0   0.0   0.0  -1.0 |
 |  1.0   0.0   0.0   0.0   0.0   0.0   1.0   0.0  -1.0 |
 *                                                      *

Output

Symmetric matrix C of order 9:

 *                                                              *
 | -6.0    0.0    0.0    0.0    0.0   -2.0   -2.0    0.0   -2.0 |
 |    .   -6.0   -2.0    0.0   -2.0   -4.0    0.0   -4.0   -2.0 |
 |    .      .   -6.0   -2.0   -2.0    0.0    2.0    0.0    6.0 |
 |    .      .      .   -6.0    2.0    0.0    2.0    0.0    2.0 |
 |    .      .      .      .   -8.0   -4.0    0.0   -2.0    0.0 |
 |    .      .      .      .      .   -6.0    0.0   -4.0   -6.0 |
 |    .      .      .      .      .      .   -4.0    0.0    0.0 |
 |    .      .      .      .      .      .      .   -4.0   -4.0 |
 |    .      .      .      .      .      .      .      .  -16.0 |
 *                                                              *
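
The output above can be checked with a serial calculation. The following Python sketch is not an HPF program and does not call Parallel ESSL; it recomputes the upper triangle of alpha*A^T*B + alpha*B^T*A for the input shown, using the observation that B is the negative of A in this example. The function name is hypothetical.

```python
A = [
    [0, -1, -1, 0, 0, 0, 0, 0, 1],
    [0, 1, 0, 1, 0, 1, 0, 1, 1],
    [0, 0, -1, -1, 0, 0, 1, 0, 1],
    [0, 1, 0, -1, 1, 1, 0, 1, 1],
    [1, 0, 0, 0, -1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 1, 0, 0, 1],
    [0, 0, -1, 0, -1, 0, 0, 0, 1],
    [-1, 0, 0, 0, 0, 0, -1, 0, 1],
]
B = [[-x for x in row] for row in A]   # in Example 1, B = -A


def example1_upper(alpha, a, b):
    """Upper triangle of alpha*A^T*B + alpha*B^T*A (beta = 0, uplo = 'U')."""
    m, n = len(a), len(a[0])
    c = [[None] * n for _ in range(n)]          # None marks entries not referenced
    for i in range(n):
        for j in range(i, n):
            c[i][j] = alpha * sum(
                a[p][i] * b[p][j] + b[p][i] * a[p][j] for p in range(m)
            )
    return c


C = example1_upper(1.0, A, B)
for row in C:
    print(["." if v is None else v for v in row])
```

Since B = -A here, the result equals -2*A^T*A, which is why every diagonal entry is negative.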

Example 2

This example computes $C = \alpha a b^{T} + \alpha b a^{T} + C$. As in "Example", array data is block-cyclically distributed using a 2 × 2 process grid.

!HPF$ PROCESSORS PROC(2,2)
!HPF$ ALIGN A(:) WITH C(:,1)
!HPF$ ALIGN B(:) WITH C(:,1)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: C
 
CALL SYR2K( 1.0D0 , A , B , C , 'L' )

Input

Symmetric matrix C of order 9:

 *                                                     *
 | 1.0    .     .     .     .     .     .     .     .  |
 | 2.0  12.0    .     .     .     .     .     .     .  |
 | 3.0  13.0  23.0    .     .     .     .     .     .  |
 | 4.0  14.0  24.0  34.0    .     .     .     .     .  |
 | 5.0  15.0  25.0  35.0  45.0    .     .     .     .  |
 | 6.0  16.0  26.0  36.0  46.0  56.0    .     .     .  |
 | 7.0  17.0  27.0  37.0  47.0  57.0  67.0    .     .  |
 | 8.0  18.0  28.0  38.0  48.0  58.0  68.0  78.0    .  |
 | 9.0  19.0  29.0  39.0  49.0  59.0  69.0  79.0  89.0 |
 *                                                     *

Vector a of size 9:

 *     *
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 *     *

Vector b of size 9:

 *     *
 | 2.0 |
 | 2.0 |
 | 2.0 |
 | 2.0 |
 | 2.0 |
 | 2.0 |
 | 2.0 |
 | 2.0 |
 | 2.0 |
 *     *

Output

Symmetric matrix C of order 9:

 *                                                      *
 |  5.0    .     .     .     .     .     .     .     .  |
 |  6.0  16.0    .     .     .     .     .     .     .  |
 |  7.0  17.0  27.0    .     .     .     .     .     .  |
 |  8.0  18.0  28.0  38.0    .     .     .     .     .  |
 |  9.0  19.0  29.0  39.0  49.0    .     .     .     .  |
 | 10.0  20.0  30.0  40.0  50.0  60.0    .     .     .  |
 | 11.0  21.0  31.0  41.0  51.0  61.0  71.0    .     .  |
 | 12.0  22.0  32.0  42.0  52.0  62.0  72.0  82.0    .  |
 | 13.0  23.0  33.0  43.0  53.0  63.0  73.0  83.0  93.0 |
 *                                                      *
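
The rank-two update in this example can also be reproduced serially. The following Python sketch is illustrative only (the function name is hypothetical, and no data distribution is modeled). It applies equation 3 to the lower triangle, after checking the size relationship that Stage 6 requires.

```python
def rank_two_update_lower(alpha, a, b, c):
    """Add alpha*a*b^T + alpha*b*a^T to the lower triangle of c, in place."""
    n = len(c)
    # Stage 6 shape rule: size(c,1) = size(c,2) = size(a) = size(b)
    if any(len(row) != n for row in c) or len(a) != n or len(b) != n:
        raise ValueError("incompatible shapes for a, b, and c")
    for i in range(n):
        for j in range(i + 1):          # lower triangle only (uplo = 'L')
            c[i][j] += alpha * (a[i] * b[j] + b[i] * a[j])
    return c


n = 9
# Lower triangle of the input matrix C from Example 2; entries above the
# diagonal are not referenced and are marked None.
C = [[float(10 * j + i + 1) if j <= i else None for j in range(n)] for i in range(n)]
a = [1.0] * n
b = [2.0] * n
rank_two_update_lower(1.0, a, b, C)
```

With a identically 1.0 and b identically 2.0, every referenced element increases by 4.0, which matches the output shown above.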

TRAN--Matrix Transpose for a General Matrix

This subroutine performs the following matrix computation:

$C \leftarrow \beta C + \alpha A^{T}$

where, in the formula above:

A and C are general matrices.
alpha and beta are scalars.

Note: No data should be moved to form the matrix transpose; that is, the matrix should always be stored in its untransposed form.

In the following two cases, no computation is performed and the subroutine returns after doing some parameter checking:

See references [17], [30], [31], and [44].

Table 118. Data Types
alpha, beta, A, C      Subprogram
Long-precision real    TRAN

Syntax

HPF    CALL TRAN (alpha, a, beta, c)

On Entry

alpha

is the scalar alpha.

Type: required

Specified as: a number of the data type indicated in Table 118.

a

is the general matrix A.

Type: required

Specified as: an assumed-shape array with shape (:,:), containing numbers of the data type indicated in Table 118.

beta

is the scalar beta.

Type: required

Specified as: a number of the data type indicated in Table 118.

c

is the general matrix C. When beta is zero, C need not be set on input.

Type: required

Specified as: an assumed-shape array with shape (:,:), containing numbers of the data type indicated in Table 118.

On Return

c

is the updated general matrix C, containing the results of the computation.

Type: required

Returned as: an assumed-shape array with shape (:,:), containing numbers of the data type indicated in Table 118.
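
As a serial illustration of what TRAN computes, the following Python sketch forms beta*C + alpha*A^T element by element without ever building a transposed copy of A. It is not Parallel ESSL code; the function name and the error text are hypothetical. The shape test restates the Stage 4 condition, and C is not read when beta is zero.

```python
def tran_reference(alpha, a, beta, c):
    """Serial model of C <- beta*C + alpha*A^T; updates c in place."""
    m, n = len(a), len(a[0])            # A is m x n, so C must be n x m
    if len(c) != n or any(len(row) != m for row in c):
        raise ValueError("size(c,1) <> size(a,2) or size(c,2) <> size(a,1)")
    for i in range(n):
        for j in range(m):
            # When beta is zero, c need not be set on input, so it is not read.
            scaled = beta * c[i][j] if beta != 0 else 0.0
            c[i][j] = scaled + alpha * a[j][i]   # a is indexed in untransposed form
    return c


# A 2 x 3 matrix transposed and accumulated into a 3 x 2 matrix.
a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
c = [[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]]
tran_reference(2.0, a, 1.0, c)
```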

Notes and Coding Rules

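  1. The assumed-shape arrays must have the exact size required for the computation, that is (as implied by the shape check under "Error Conditions"): size(c,1) = size(a,2) and size(c,2) = size(a,1)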

  2. The assumed-shape arrays must have no common elements; otherwise, results are unpredictable.

  3. For details on how to set up and code your HPF program using Parallel ESSL, see "Coding Your HPF Program".

  4. Block-cyclic data distribution is required for your array data. Because data directives are included in the interface module PESSL_HPF, you can specify any data distribution for your matrices, and the XL HPF compiler will, if necessary, redistribute the data prior to calling this subroutine. For how to code your HPF directives, see "Distributing Data in an HPF Program". For a sample program including directives, see Figure 9.

  5. The restrictions given in "Notes and Coding Rules" also apply to this subroutine.

Error Conditions

HPF-specific errors are listed below. Resource and input-argument errors listed in "Error Conditions" also apply to this subroutine.

Input-Argument Errors

Stage 1
  1. The rank of the ultimate align target is greater than 2 for a or c.
  2. The process rank is not the same for a and c.
  3. The process rank is not 1 or 2 for a or c.

Stage 2

The process grid is not the same for a and c.

Stage 3

The data distribution is inconsistent for a and c.

Stage 4

The shape of the assumed-shape arrays a and c is incompatible:

size(c,1) <> size(a,2) or
size(c,2) <> size(a,1)

Stage 5

The data distribution for a or c is unsupported.

Example

This example computes $C = \beta C + \alpha A^{T}$. As in "Example", array data is block-cyclically distributed using a 2 × 2 process grid.

!HPF$ PROCESSORS PROC(2,2)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: A, C
 
CALL TRAN( 1.0D0 , A , 1.0D0 , C )

Input

General 8 × 9 matrix A:

 *                                                      *
 |  0.0  -1.0  -1.0   0.0   0.0   0.0   0.0   0.0   1.0 |
 |  0.0   1.0   0.0   1.0   0.0   1.0   0.0   1.0   1.0 |
 |  0.0   0.0  -1.0  -1.0   0.0   0.0   1.0   0.0   1.0 |
 |  0.0   1.0   0.0  -1.0   1.0   1.0   0.0   1.0   1.0 |
 |  1.0   0.0   0.0   0.0  -1.0   0.0   0.0   0.0   1.0 |
 |  1.0   0.0   0.0   0.0   1.0   1.0   0.0   0.0   1.0 |
 |  0.0   0.0  -1.0   0.0  -1.0   0.0   0.0   0.0   1.0 |
 | -1.0   0.0   0.0   0.0   0.0   0.0  -1.0   0.0   1.0 |
 *                                                      *

General 9 × 8 matrix C:

 *                                                *
 |  0.0   1.0   1.0   5.0   6.0   7.0   8.0   9.0 |
 |  0.0  -1.0   0.0  -1.0   0.0  -1.0   0.0   1.0 |
 |  0.0   0.0   1.0   1.0   0.0   0.0  -1.0   0.0 |
 |  0.0  -1.0   0.0   1.0  -1.0  -1.0   0.0   1.0 |
 | -1.0   2.0   0.0   0.0   1.0   0.0   0.0   0.0 |
 | -1.0   3.0   0.0   0.0  -1.0  -1.0   0.0   0.0 |
 |  0.0   4.0   1.0   0.0   1.0   0.0   0.0   0.0 |
 |  1.0   5.0   0.0   0.0   0.0   0.0   1.0   0.0 |
 |  1.0   2.0   3.0   4.0   1.0   1.0   1.0   1.0 |
 *                                                *

Output

General 9 × 8 matrix C:

 *                                                *
 |  0.0   1.0   1.0   5.0   7.0   8.0   8.0   8.0 |
 | -1.0   0.0   0.0   0.0   0.0  -1.0   0.0   1.0 |
 | -1.0   0.0   0.0   1.0   0.0   0.0  -2.0   0.0 |
 |  0.0   0.0  -1.0   0.0  -1.0  -1.0   0.0   1.0 |
 | -1.0   2.0   0.0   1.0   0.0   1.0  -1.0   0.0 |
 | -1.0   4.0   0.0   1.0  -1.0   0.0   0.0   0.0 |
 |  0.0   4.0   2.0   0.0   1.0   0.0   0.0  -1.0 |
 |  1.0   6.0   0.0   1.0   0.0   0.0   1.0   0.0 |
 |  2.0   3.0   4.0   5.0   2.0   2.0   2.0   2.0 |
 *                                                *
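
The output above can be verified with a short serial calculation. The following Python sketch is not an HPF program; it recomputes beta*C + alpha*A^T for the input matrices of this example with alpha = beta = 1.0. The variable names are hypothetical.

```python
A = [
    [0, -1, -1, 0, 0, 0, 0, 0, 1],
    [0, 1, 0, 1, 0, 1, 0, 1, 1],
    [0, 0, -1, -1, 0, 0, 1, 0, 1],
    [0, 1, 0, -1, 1, 1, 0, 1, 1],
    [1, 0, 0, 0, -1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 1, 0, 0, 1],
    [0, 0, -1, 0, -1, 0, 0, 0, 1],
    [-1, 0, 0, 0, 0, 0, -1, 0, 1],
]
C_in = [
    [0, 1, 1, 5, 6, 7, 8, 9],
    [0, -1, 0, -1, 0, -1, 0, 1],
    [0, 0, 1, 1, 0, 0, -1, 0],
    [0, -1, 0, 1, -1, -1, 0, 1],
    [-1, 2, 0, 0, 1, 0, 0, 0],
    [-1, 3, 0, 0, -1, -1, 0, 0],
    [0, 4, 1, 0, 1, 0, 0, 0],
    [1, 5, 0, 0, 0, 0, 1, 0],
    [1, 2, 3, 4, 1, 1, 1, 1],
]
alpha, beta = 1.0, 1.0

# Row i of the result combines row i of C with column i of A.
A_cols = [list(col) for col in zip(*A)]        # the 9 columns of A, each of length 8
C_out = [
    [beta * c + alpha * t for c, t in zip(c_row, a_col)]
    for c_row, a_col in zip(C_in, A_cols)
]
for row in C_out:
    print(row)
```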

