Guide and Reference

Reference Information (HPF)

This part of the book provides reference information for coding the Parallel ESSL calling sequences in a High Performance Fortran (HPF) program. It is organized into the following five areas:

PBLAS
Linear Algebraic Equations
Eigensystem Analysis and Singular Value Analysis
Fourier Transforms
Random Number Generation

PBLAS (HPF)

This chapter describes the Level 2 and 3 PBLAS subroutines that can be called from an HPF program.

Overview of the PBLAS Subroutines

The Level 2 and 3 PBLAS include a subset of the standard set of distributed memory parallel versions of the Level 2 and 3 BLAS.
Note: These subroutines are designed to be consistent with the proposals for the Fortran 90 BLAS and the Fortran 90 LAPACK. (See references [30] and [31].) If these subroutines do not comply with any eventual proposal for HPF interfaces to the PBLAS and ScaLAPACK, IBM will consider updating them to do so. If IBM updates these subroutines, the update could require modifications of the calling application program.

Level 2 PBLAS

Table 110. List of Level 2 PBLAS (HPF)

Descriptive Name	Long-Precision Subprogram	Described in
Matrix-Vector Product for a General Matrix or Its Transpose	GEMM	GEMM--Matrix-Matrix Product for a General Matrix, Its Transpose, or Its Conjugate Transpose
Matrix-Vector Product for a Real Symmetric Matrix	SYMM	SYMM--Matrix-Matrix Product Where One Matrix is Real Symmetric
Rank-One Update of a General Matrix	GEMM	GEMM--Matrix-Matrix Product for a General Matrix, Its Transpose, or Its Conjugate Transpose
Rank-One Update of a Real Symmetric Matrix	SYRK	SYRK--Rank-K Update of a Real Symmetric Matrix
Rank-Two Update of a Real Symmetric Matrix	SYR2K	SYR2K--Rank-2K Update of a Real Symmetric Matrix
Matrix-Vector Product for a Triangular Matrix or Its Transpose	TRMM	TRMM--Triangular Matrix-Matrix Product
Solution of Triangular System of Equations with a Single Right-Hand Side	TRSM	TRSM--Solution of Triangular System of Equations with Multiple Right-Hand Sides

Level 3 PBLAS

Table 111. List of Level 3 PBLAS (HPF)

Descriptive Name	Long-Precision Subprogram	Described in
Matrix-Matrix Product for a General Matrix, Its Transpose, or Its Conjugate Transpose	GEMM	GEMM--Matrix-Matrix Product for a General Matrix, Its Transpose, or Its Conjugate Transpose
Matrix-Matrix Product Where One Matrix is Real Symmetric	SYMM	SYMM--Matrix-Matrix Product Where One Matrix is Real Symmetric
Triangular Matrix-Matrix Product	TRMM	TRMM--Triangular Matrix-Matrix Product
Solution of Triangular System of Equations with Multiple Right-Hand Sides	TRSM	TRSM--Solution of Triangular System of Equations with Multiple Right-Hand Sides
Rank-K Update of a Real Symmetric Matrix	SYRK	SYRK--Rank-K Update of a Real Symmetric Matrix
Rank-2K Update of a Real Symmetric Matrix	SYR2K	SYR2K--Rank-2K Update of a Real Symmetric Matrix
Matrix Transpose for a General Matrix	TRAN	TRAN--Matrix Transpose for a General Matrix

PBLAS Subroutines

This section contains the PBLAS subroutine descriptions.

GEMM--Matrix-Matrix Product for a General Matrix, Its Transpose, or Its Conjugate Transpose

This subroutine performs any one of the following combined matrix computations:

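1. $C \leftarrow \alpha AB + \beta C$

2. $C \leftarrow \alpha AB^{T} + \beta C$

3. $C \leftarrow \alpha A^{T}B + \beta C$

4. $C \leftarrow \alpha A^{T}B^{T} + \beta C$

5. $C \leftarrow \alpha A^{H}B + \beta C$

6. $C \leftarrow \alpha A^{H}B^{T} + \beta C$

7. $C \leftarrow \alpha AB^{H} + \beta C$

8. $C \leftarrow \alpha A^{T}B^{H} + \beta C$

9. $C \leftarrow \alpha A^{H}B^{H} + \beta C$

10. $c \leftarrow \alpha Ab + \beta c$

11. $c \leftarrow \alpha A^{T}b + \beta c$

12. $C \leftarrow \alpha ab^{T} + C$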

where, in the formulas above:

A, B, and C are general matrices.

a, b, and c are vectors.

alpha and beta are scalars.

Note: No data should be moved to form the matrix transposes or matrix conjugate transposes; that is, the matrices should always be stored in their untransposed forms.
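To make the nine matrix forms concrete, the following is a minimal serial sketch in standard Fortran (Fortran 2008 is assumed for ERROR STOP). It is not Parallel ESSL and uses no HPF distribution; `gemm_ref` and `op` are hypothetical names chosen for illustration. The helper `op` applies a transa or transb code with the TRANSPOSE and CONJG intrinsics and returns a new array, so the stored arrays stay in untransposed form. The small test case exercises equation 7.

```fortran
program gemm_forms_sketch
  ! Serial reference for GEMM equations 1-9:
  !   C <- alpha * op(A) * op(B) + beta * C
  ! where op(X) is X ('N'), X**T ('T'), or X**H ('C').
  ! gemm_ref and op are hypothetical names; nothing here calls Parallel ESSL.
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  complex(dp), parameter :: zero = (0.0_dp, 0.0_dp), one = (1.0_dp, 0.0_dp)
  complex(dp), parameter :: i1 = (0.0_dp, 1.0_dp)
  complex(dp) :: a(2,3), b(2,3), c(2,2), expected(2,2)

  ! A and B are both 2 x 3, so transa = 'N', transb = 'C' (equation 7)
  ! gives the 2 x 2 product C = A * B**H.
  a(1,:) = [one,  i1,  zero]
  a(2,:) = [zero, one, 2.0_dp * one]
  b(1,:) = [one,          zero, i1]
  b(2,:) = [2.0_dp * one, one,  zero]
  c = zero   ! beta = 0 below; C is still initialized so 0*C is well defined

  call gemm_ref(one, a, b, zero, c, 'N', 'C')

  expected(1,:) = [one, (2.0_dp, 1.0_dp)]
  expected(2,:) = [(0.0_dp, -2.0_dp), one]
  if (maxval(abs(c - expected)) > 1.0e-12_dp) error stop 'unexpected result'
  print '(a)', 'equation 7 (C = A * B**H) reproduced'

contains

  function op(x, trans) result(y)
    ! x, its transpose, or its conjugate transpose, selected by trans
    complex(dp), intent(in) :: x(:,:)
    character(len=1), intent(in) :: trans
    complex(dp), allocatable :: y(:,:)
    select case (trans)
    case ('N', 'n')
      y = x
    case ('T', 't')
      y = transpose(x)
    case ('C', 'c')
      y = conjg(transpose(x))
    case default
      error stop 'trans must be N, T, or C'
    end select
  end function op

  subroutine gemm_ref(alpha, a, b, beta, c, transa, transb)
    ! Serial stand-in for equations 1-9; shapes are assumed to conform
    complex(dp), intent(in) :: alpha, beta, a(:,:), b(:,:)
    complex(dp), intent(inout) :: c(:,:)
    character(len=1), intent(in) :: transa, transb
    c = alpha * matmul(op(a, transa), op(b, transb)) + beta * c
  end subroutine gemm_ref

end program gemm_forms_sketch
```

Because `op` builds a temporary, this sketch trades memory for clarity; it is meant only for checking small results by hand.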

In the following cases, no computation is performed and the subroutine returns after doing some parameter checking:

For equations 1-9:
- The assumed-shape array for C has a size of zero.
- alpha is zero and beta is one.
- beta is one, and the assumed-shape arrays for A and B have a size of zero.
If none of the above conditions applies, but beta is not one and the assumed-shape arrays for A and B have a size of zero, then $\beta C$ is returned.
For equations 10 and 11:
- Any of the assumed-shape arrays have a size of zero.
- alpha is zero and beta is one.
For equation 12:
- Any of the assumed-shape arrays have a size of zero.
- alpha is zero.
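The quick-return rules for equations 1-9 can be summarized as a decision function. This sketch only classifies a call from alpha, beta, and the element counts of a, b, and c; the function name `gemm_action` and its 'none', 'scale', and 'full' results are hypothetical and are not values produced by Parallel ESSL. Cases not listed above, such as alpha = 0 with beta not one, simply fall through to 'full' here.

```fortran
program gemm_quick_return_sketch
  ! Classifies a GEMM call (equations 1-9) by the quick-return rules above.
  ! gemm_action is hypothetical; no matrix arithmetic is performed.
  implicit none
  integer, parameter :: dp = kind(1.0d0)

  print '(a)', gemm_action(1.0_dp, 2.0_dp, 30, 20, 0)    ! size(c) = 0
  print '(a)', gemm_action(0.0_dp, 1.0_dp, 30, 20, 24)   ! alpha = 0, beta = 1
  print '(a)', gemm_action(1.0_dp, 2.0_dp, 0, 0, 24)     ! A, B empty, beta /= 1
  print '(a)', gemm_action(1.0_dp, 2.0_dp, 30, 20, 24)   ! general case

contains

  function gemm_action(alpha, beta, size_a, size_b, size_c) result(action)
    real(dp), intent(in) :: alpha, beta
    integer, intent(in) :: size_a, size_b, size_c   ! total element counts
    character(len=5) :: action
    if (size_c == 0) then
      action = 'none'
    else if (alpha == 0.0_dp .and. beta == 1.0_dp) then
      action = 'none'
    else if (beta == 1.0_dp .and. size_a == 0 .and. size_b == 0) then
      action = 'none'
    else if (size_a == 0 .and. size_b == 0) then
      action = 'scale'   ! only beta*C is formed
    else
      action = 'full'    ! alpha*op(A)*op(B) + beta*C
    end if
  end function gemm_action

end program gemm_quick_return_sketch
```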

See references [17], [30], [31], and [44].

Table 112. Data Types

alpha, beta, A, B, C, a, b, c	Subroutine
Long-precision real	GEMM
Long-precision complex	GEMM

Syntax

HPF	Equations 1-9	CALL GEMM (`alpha`, `a`, `b`, `beta`, `c`)
		CALL GEMM (`alpha`, `a`, `b`, `beta`, `c`, `transa`, `transb`)
HPF	Equations 10 and 11	CALL GEMM (`alpha`, `a`, `b`, `beta`, `c`)
		CALL GEMM (`alpha`, `a`, `b`, `beta`, `c`, `transa`)
HPF	Equation 12	CALL GEMM (`alpha`, `a`, `b`, `c`)

On Entry

alpha

is the scalar alpha.

Type: required

Specified as: a number of the data type indicated in Table 112.

a

is the general matrix A or the vector a, where:

If transa = 'N', A is used in the computation.

If transa = 'T', A^T is used in the computation.

If transa = 'C', A^H is used in the computation.
Note: No data should be moved to form A^T or A^H; that is, the matrix A should always be stored in its untransposed form.

Type: required

Specified as: an assumed-shape array with shape (:,:) or (:), containing numbers of the data type indicated in Table 112.

b

is the general matrix B or the vector b, where:

If transb = 'N', B is used in the computation.

If transb = 'T', B^T is used in the computation.

If transb = 'C', B^H is used in the computation.

Note: No data should be moved to form B^T or B^H; that is, the matrix B should always be stored in its untransposed form.

Type: required

Specified as: an assumed-shape array with shape (:,:) or (:), containing numbers of the data type indicated in Table 112.

beta

is the scalar beta.

Type: required (equations 1-11); not present (equation 12)

Specified as: a number of the data type indicated in Table 112.

c

is the general matrix C or the vector c. When beta is zero, c need not be set on input.

Type: required

Specified as: an assumed-shape array with shape (:,:) or (:), containing numbers of the data type indicated in Table 112.

transa

indicates the form of matrix A to use in the computation, where:

If transa = 'N', A is used in the computation, resulting in equation 1, 2, 7, or 10.

If transa = 'T', A^T is used in the computation, resulting in equation 3, 4, 8, or 11.

If transa = 'C', A^H is used in the computation, resulting in equation 5, 6, or 9.

Type: optional (equations 1-11); not present (equation 12)

Default: transa = 'N'

Specified as: a single character; transa = 'N', 'T', or 'C'.

transb

indicates the form of matrix B to use in the computation, where:

If transb = 'N', B is used in the computation, resulting in equation 1, 3, or 5.

If transb = 'T', B^T is used in the computation, resulting in equation 2, 4, or 6.

If transb = 'C', B^H is used in the computation, resulting in equation 7, 8, or 9.

Type: optional (equations 1-9); not present (equations 10-12)

Default: transb = 'N'

Specified as: a single character; transb = 'N', 'T', or 'C'.
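The transa and transb descriptions above amount to a 3 × 3 lookup for equations 1-9 (equations 10 and 11 are selected by passing vectors for b and c, not by transb). A minimal Fortran sketch of that lookup follows; `gemm_equation` and `trans_index` are hypothetical helpers, and lowercase codes are accepted, as the Notes and Coding Rules allow.

```fortran
program gemm_equation_number_sketch
  ! Maps the transa and transb codes to GEMM equation numbers 1-9.
  ! gemm_equation and trans_index are hypothetical, not Parallel ESSL routines.
  implicit none
  print '(i0)', gemm_equation('N', 'N')   ! 1
  print '(i0)', gemm_equation('t', 'N')   ! 3
  print '(i0)', gemm_equation('C', 'T')   ! 6
  print '(i0)', gemm_equation('N', 'C')   ! 7

contains

  integer function gemm_equation(transa, transb)
    character(len=1), intent(in) :: transa, transb
    ! table(i,j): row i is transa = N, T, C; column j is transb = N, T, C
    integer, parameter :: table(3,3) = reshape([1, 3, 5, 2, 4, 6, 7, 8, 9], [3,3])
    gemm_equation = table(trans_index(transa), trans_index(transb))
  end function gemm_equation

  integer function trans_index(trans)
    character(len=1), intent(in) :: trans
    select case (trans)
    case ('N', 'n')
      trans_index = 1
    case ('T', 't')
      trans_index = 2
    case ('C', 'c')
      trans_index = 3
    case default
      error stop 'trans must be N, T, or C'
    end select
  end function trans_index

end program gemm_equation_number_sketch
```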

On Return

c

is the updated matrix C or vector c, containing the results of the computation.

Type: required

Returned as: an assumed-shape array with shape (:,:) or (:), containing numbers of the data type indicated in Table 112.

Notes and Coding Rules

The assumed-shape arrays must have the exact size required for the computation, that is:
- For equations 1 through 9:
  - If transa = 'N' and transb = 'N':
    - size(a,1) = size(c,1)
    - size(b,2) = size(c,2)
    - size(a,2) = size(b,1)
  - If transa = 'N' and transb = 'T' or 'C':
    - size(a,1) = size(c,1)
    - size(b,1) = size(c,2)
    - size(a,2) = size(b,2)
  - If transa = 'T' or 'C' and transb = 'N':
    - size(a,2) = size(c,1)
    - size(b,2) = size(c,2)
    - size(a,1) = size(b,1)
  - If transa = 'T' or 'C' and transb = 'T' or 'C':
    - size(a,2) = size(c,1)
    - size(b,1) = size(c,2)
    - size(a,1) = size(b,2)
- For equations 10 and 11:
  - If transa = 'N':
    - size(a,1) = size(c)
    - size(a,2) = size(b)
  - If transa = 'T':
    - size(a,1) = size(b)
    - size(a,2) = size(c)
- For equation 12:
  - size(c,1) = size(a)
  - size(c,2) = size(b)
This subroutine accepts lowercase letters for the transa and transb arguments.
If you are using long-precision real data and specify 'C' for the transa or transb argument, it is interpreted as though you specified 'T'.
The assumed-shape arrays must have no common elements; otherwise, results are unpredictable.
For details on how to set up and code your HPF program using Parallel ESSL, see "Coding Your HPF Program".
Block-cyclic data distribution is required for your array data. Because data directives are included in the interface module PESSL_HPF, you can specify any data distribution for your vectors and matrices, and the XL HPF compiler will, if necessary, redistribute the data prior to calling this subroutine. For how to code your HPF directives, see "Distributing Data in an HPF Program". For a sample program including directives, see Figure 9.
The restrictions given in "Notes and Coding Rules", "Notes and Coding Rules", and "Notes and Coding Rules" also apply to this subroutine.
An example of the use of this subroutine in a thermal diffusion application program is shown in "Program Main (HPF)" in Appendix B, "Sample Programs".
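The exact-size rules for equations 1-9 reduce to one condition: op(A) must be m × k, op(B) must be k × n, and C must be m × n. The sketch below checks that condition from the shapes alone. It treats 'C' like 'T' (only dimensions matter here), accepts lowercase codes, and assumes transa and transb were already validated; `gemm_shapes_ok` is a hypothetical helper, not a Parallel ESSL routine.

```fortran
program gemm_shape_rules_sketch
  ! Checks the exact-size rules for GEMM equations 1-9 from array shapes.
  ! gemm_shapes_ok is hypothetical; any code other than N/n is treated as a
  ! transpose, so transa and transb are assumed to be N, T, or C already.
  implicit none
  ! Shapes from GEMM Example 1: A is 6 x 5, B is 5 x 4, C is 6 x 4
  print '(l1)', gemm_shapes_ok([6,5], [5,4], [6,4], 'N', 'N')   ! T
  print '(l1)', gemm_shapes_ok([6,5], [5,4], [6,4], 'n', 't')   ! F
  print '(l1)', gemm_shapes_ok([5,6], [4,5], [6,4], 'T', 'C')   ! T

contains

  logical function gemm_shapes_ok(sa, sb, sc, transa, transb)
    integer, intent(in) :: sa(2), sb(2), sc(2)   ! shape(a), shape(b), shape(c)
    character(len=1), intent(in) :: transa, transb
    integer :: m_a, k_a, k_b, n_b
    ! op(A) is m_a x k_a
    if (transa == 'N' .or. transa == 'n') then
      m_a = sa(1); k_a = sa(2)
    else
      m_a = sa(2); k_a = sa(1)
    end if
    ! op(B) is k_b x n_b
    if (transb == 'N' .or. transb == 'n') then
      k_b = sb(1); n_b = sb(2)
    else
      k_b = sb(2); n_b = sb(1)
    end if
    gemm_shapes_ok = m_a == sc(1) .and. n_b == sc(2) .and. k_a == k_b
  end function gemm_shapes_ok

end program gemm_shape_rules_sketch
```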

Error Conditions

HPF-specific errors are listed below. Resource and input-argument errors listed in "Error Conditions", "Error Conditions", and "Error Conditions" also apply to this subroutine.

Input-Argument Errors for Equations 1-9

Stage 1

The rank of the ultimate align target is greater than 2 for a, b, or c.
The process rank is not the same for a, b, and c.
The process rank is not 1 or 2 for a, b, or c.

Stage 2

transa is present and is not 'N', 'T', or 'C'.
transb is present and is not 'N', 'T', or 'C'.

Stage 3

The process grid is not the same for a, b, and c.

Stage 4

The data distribution is inconsistent for a, b, and c.

Stage 5

The shape of the assumed-shape arrays a, b, and c is incompatible:

transa = 'N' and transb = 'N':

size(a,1) <> size(c,1) or
size(b,2) <> size(c,2) or
size(a,2) <> size(b,1)
transa = 'N' and transb = 'T' or 'C':

size(a,1) <> size(c,1) or
size(b,1) <> size(c,2) or
size(a,2) <> size(b,2)
transa = 'T' or 'C' and transb = 'N':

size(a,2) <> size(c,1) or
size(b,2) <> size(c,2) or
size(a,1) <> size(b,1)
transa = 'T' or 'C' and transb = 'T' or 'C':

size(a,2) <> size(c,1) or
size(b,1) <> size(c,2) or
size(a,1) <> size(b,2)

Stage 6

The data distribution for a, b, or c is unsupported.

Input-Argument Errors for Equations 10 and 11

Stage 1

The rank of the ultimate align target is greater than 2 for a, b, or c.
The process rank is not the same for a, b, and c.
The process rank is not 1 or 2 for a, b, or c.

Stage 2

transa is present and is not 'N', 'T', or 'C'.

Stage 3

The process grid is not the same for a, b, and c.

Stage 4

The vector for b or c is replicated.
The data distribution for a is unsupported.

Stage 5

Vector distribution error for b or c.
The shape of the assumed-shape arrays a, b, and c is incompatible:
1. transa = 'N':
  
  size(a,1) <> size(c) or
  size(a,2) <> size(b)
2. transa = 'T':
  
  size(a,1) <> size(b) or
  size(a,2) <> size(c)

Stage 6

The data distribution for a, b, or c is unsupported.

Input-Argument Errors for Equation 12

Stage 1

The rank of the ultimate align target is greater than 2 for a, b, or c.
The process rank is not the same for a, b, and c.
The process rank is not 1 or 2 for a, b, or c.

Stage 2

The process grid is not the same for a, b, and c.

Stage 3

The data distribution for c is unsupported.

Stage 4

The vector for a or b is replicated.

Stage 5

The data distribution is inconsistent for a and b.

Stage 6

Vector distribution error for a or b.
The shape of the assumed-shape arrays a, b, and c is incompatible:

size(c,1) <> size(a) or
size(c,2) <> size(b)

Stage 7

The data distribution for a, b, or c is unsupported.

Example 1

This example computes $C = \alpha AB + \beta C$. As in "Example 1", array data is block-cyclically distributed using a 2 × 2 process grid.

!HPF$ PROCESSORS PROC(2,2)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: A, B, C
 
CALL GEMM( 1.0D0 , A , B , 2.0D0 , C )
-or-
CALL GEMM( 1.0D0 , A , B , 2.0D0 , C , TRANSA='N' , TRANSB='N' )

Input

General 6 × 5 matrix A:

 *                              *
 |  1.0   2.0  -1.0  -1.0   4.0 |
 |  2.0   0.0   1.0   1.0  -1.0 |
 |  1.0  -1.0  -1.0   1.0   2.0 |
 | -3.0   2.0   2.0   2.0   0.0 |
 |  4.0   0.0  -2.0   1.0  -1.0 |
 | -1.0  -1.0   1.0  -3.0   2.0 |
 *                              *

General 5 × 4 matrix B:

 *                        *
 |  1.0  -1.0   0.0   2.0 |
 |  2.0   2.0  -1.0  -2.0 |
 |  1.0   0.0  -1.0   1.0 |
 | -3.0  -1.0   1.0  -1.0 |
 |  4.0   2.0  -1.0   1.0 |
 *                        *

General 6 × 4 matrix C:

 *                    *
 | 0.5  0.5  0.5  0.5 |
 | 0.5  0.5  0.5  0.5 |
 | 0.5  0.5  0.5  0.5 |
 | 0.5  0.5  0.5  0.5 |
 | 0.5  0.5  0.5  0.5 |
 | 0.5  0.5  0.5  0.5 |
 *                    *

Output

General 6 × 4 matrix C:

 *                        *
 | 24.0  13.0  -5.0   3.0 |
 | -3.0  -4.0   2.0   4.0 |
 |  4.0   1.0   2.0   5.0 |
 | -2.0   6.0  -1.0  -9.0 |
 | -4.0  -6.0   5.0   5.0 |
 | 16.0   7.0  -4.0   7.0 |
 *                        *
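Example 1 can be checked on a single process with the MATMUL intrinsic. This is a serial sketch, not a Parallel ESSL call: the !HPF$ directives are dropped, and the program stops with ERROR STOP (Fortran 2008 is assumed) if its result differs from the output shown above.

```fortran
program gemm_example1_check
  ! Serial recomputation of GEMM Example 1:  C = 1.0*A*B + 2.0*C
  ! MATMUL replaces Parallel ESSL; there is only one process here.
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: a(6,5), b(5,4), c(6,4), expected(6,4)
  integer :: i

  ! ORDER=[2,1] fills each array row by row, matching the printed layout
  a = reshape([ 1, 2,-1,-1, 4, &
                2, 0, 1, 1,-1, &
                1,-1,-1, 1, 2, &
               -3, 2, 2, 2, 0, &
                4, 0,-2, 1,-1, &
               -1,-1, 1,-3, 2], [6,5], order=[2,1])
  b = reshape([ 1,-1, 0, 2, &
                2, 2,-1,-2, &
                1, 0,-1, 1, &
               -3,-1, 1,-1, &
                4, 2,-1, 1], [5,4], order=[2,1])
  expected = reshape([24,13,-5, 3, &
                      -3,-4, 2, 4, &
                       4, 1, 2, 5, &
                      -2, 6,-1,-9, &
                      -4,-6, 5, 5, &
                      16, 7,-4, 7], [6,4], order=[2,1])
  c = 0.5_dp

  c = 1.0_dp * matmul(a, b) + 2.0_dp * c

  do i = 1, size(c, 1)
    print '(4f7.1)', c(i,:)
  end do
  if (maxval(abs(c - expected)) > 1.0e-12_dp) error stop 'mismatch with Example 1'
end program gemm_example1_check
```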

Example 2

This example computes $C = \alpha AB + \beta C$. As in "Example 2", array data is block-cyclically distributed using a 2 × 2 process grid.

!HPF$ PROCESSORS PROC(2,2)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: A, B, C
 
CALL GEMM( (1.0D0,0.0D0) , A , B , (2.0D0,0.0D0) , C )
-or-
CALL GEMM( (1.0D0,0.0D0) , A , B , (2.0D0,0.0D0) , C , TRANSA='N' , TRANSB='N' )

Input

General 6 × 3 matrix A:

 *                                   *
 |  (1.0,5.0)   (9.0,2.0)  (1.0,9.0) |
 |  (2.0,4.0)   (8.0,3.0)  (1.0,8.0) |
 |  (3.0,3.0)   (7.0,5.0)  (1.0,7.0) |
 |  (4.0,2.0)   (4.0,7.0)  (1.0,5.0) |
 |  (5.0,1.0)   (5.0,1.0)  (1.0,6.0) |
 |  (6.0,6.0)   (3.0,6.0)  (1.0,4.0) |
 *                                   *

General 3 × 2 matrix B:

 *                       *
 |  (1.0,8.0)  (2.0,7.0) |
 |  (4.0,4.0)  (6.0,8.0) |
 |  (6.0,2.0)  (4.0,5.0) |
 *                       *

General 6 × 2 matrix C:

 *                      *
 | (0.5,0.0)  (0.5,0.0) |
 | (0.5,0.0)  (0.5,0.0) |
 | (0.5,0.0)  (0.5,0.0) |
 | (0.5,0.0)  (0.5,0.0) |
 | (0.5,0.0)  (0.5,0.0) |
 | (0.5,0.0)  (0.5,0.0) |
 *                      *

Output

General 6 × 2 matrix C:

 *                                *
 |  (-22.0,113.0)  (-35.0,142.0)  |
 |  (-19.0,114.0)  (-35.0,141.0)  |
 |  (-20.0,119.0)  (-43.0,146.0)  |
 |  (-27.0,110.0)  (-58.0,131.0)  |
 |    (8.0,103.0)    (0.0,112.0)  |
 |  (-55.0,116.0)  (-75.0,135.0)  |
 *                                *
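The complex data of Example 2 can be checked the same way. This serial sketch assumes a Fortran 2008 compiler, enters the real and imaginary parts separately and combines them with CMPLX, and recomputes C with MATMUL rather than calling Parallel ESSL.

```fortran
program gemm_example2_check
  ! Serial recomputation of GEMM Example 2:  C = (1,0)*A*B + (2,0)*C
  ! MATMUL replaces Parallel ESSL; this is a checking aid only.
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  complex(dp) :: a(6,3), b(3,2), c(6,2), expected(6,2)
  integer :: i

  ! Real and imaginary parts, entered row by row (ORDER=[2,1])
  a = cmplx(reshape([1,9,1, 2,8,1, 3,7,1, 4,4,1, 5,5,1, 6,3,1], [6,3], order=[2,1]), &
            reshape([5,2,9, 4,3,8, 3,5,7, 2,7,5, 1,1,6, 6,6,4], [6,3], order=[2,1]), kind=dp)
  b = cmplx(reshape([1,2, 4,6, 6,4], [3,2], order=[2,1]), &
            reshape([8,7, 4,8, 2,5], [3,2], order=[2,1]), kind=dp)
  expected = cmplx(reshape([-22,-35, -19,-35, -20,-43, -27,-58, 8,0, -55,-75], [6,2], order=[2,1]), &
                   reshape([113,142, 114,141, 119,146, 110,131, 103,112, 116,135], [6,2], order=[2,1]), &
                   kind=dp)
  c = (0.5_dp, 0.0_dp)

  c = (1.0_dp, 0.0_dp) * matmul(a, b) + (2.0_dp, 0.0_dp) * c

  do i = 1, size(c, 1)
    print '(2("(",f6.1,",",f6.1,")",2x))', c(i,:)
  end do
  if (maxval(abs(c - expected)) > 1.0e-12_dp) error stop 'mismatch with Example 2'
end program gemm_example2_check
```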

Example 3

This example computes $c = \alpha Ab + \beta c$. The input matrices A, B, and C used here are the same as those used in "Example 1". The updated portion of C is also the same as in that example, because this computation is equivalent to part of the computation performed there.

Array sections are specified for arguments a, b, and c, resulting in the computation using a submatrix A starting at row 3 and column 1 in an array, a column vector b starting at row 1 and column 2 in an array, and a column vector c, starting at row 3 and column 2 in an array.

As in "Example 1", array data is block-cyclically distributed using a 2 × 2 process grid.

!HPF$ PROCESSORS PROC(2,2)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: A, B, C
 
CALL GEMM( 1.0D0 , A(3:6,1:5) , B(1:5,2) , 2.0D0 , C(3:6,2) )
-or-
CALL GEMM( 1.0D0 , A(3:6,1:5) , B(1:5,2) , 2.0D0 , C(3:6,2) , TRANSA='N' )

Input

Only a portion of the data structure is used--that is, submatrix A. Following is the 4 × 5 submatrix A, starting at row 3 and column 1 in the 6 × 5 array:

 *                              *
 |   .     .     .     .     .  |
 |   .     .     .     .     .  |
 |  1.0  -1.0  -1.0   1.0   2.0 |
 | -3.0   2.0   2.0   2.0   0.0 |
 |  4.0   0.0  -2.0   1.0  -1.0 |
 | -1.0  -1.0   1.0  -3.0   2.0 |
 *                              *

Only a portion of the data structure is used--that is, vector b, which is a column vector. Following is the vector b of size 5, starting at row 1 and column 2 in the 5 × 4 array:

 *                       *
 |  .   -1.0    .     .  |
 |  .    2.0    .     .  |
 |  .    0.0    .     .  |
 |  .   -1.0    .     .  |
 |  .    2.0    .     .  |
 *                       *

Only a portion of the data structure is used--that is, vector c, which is a column vector. Following is the vector c of size 4, starting at row 3 and column 2 in the 6 × 4 array:

 *                    *
 |  .    .    .    .  |
 |  .    .    .    .  |
 |  .   0.5   .    .  |
 |  .   0.5   .    .  |
 |  .   0.5   .    .  |
 |  .   0.5   .    .  |
 *                    *

Output

Only a portion of the data structure is used--that is, vector c, which is a column vector. Following is the vector c of size 4, starting at row 3 and column 2 in the 6 × 4 array:

 *                       *
 |  .     .     .     .  |
 |  .     .     .     .  |
 |  .    1.0    .     .  |
 |  .    6.0    .     .  |
 |  .   -6.0    .     .  |
 |  .    7.0    .     .  |
 *                       *
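Example 3 uses a column of B and a column of C as the vectors b and c. In standard Fortran, sections such as B(1:5,2) and C(3:6,2) are rank-one arrays, so the matrix-vector form applies. The serial sketch below uses these sections with MATMUL (not Parallel ESSL) and checks the four updated elements against the output above.

```fortran
program gemm_example3_check
  ! Serial recomputation of GEMM Example 3 with MATMUL (no Parallel ESSL):
  !   c = alpha*A*b + beta*c  on array sections
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: a(6,5), b(5,4), c(6,4)

  ! Same input arrays as Example 1, entered row by row
  a = reshape([ 1, 2,-1,-1, 4, &
                2, 0, 1, 1,-1, &
                1,-1,-1, 1, 2, &
               -3, 2, 2, 2, 0, &
                4, 0,-2, 1,-1, &
               -1,-1, 1,-3, 2], [6,5], order=[2,1])
  b = reshape([ 1,-1, 0, 2, &
                2, 2,-1,-2, &
                1, 0,-1, 1, &
               -3,-1, 1,-1, &
                4, 2,-1, 1], [5,4], order=[2,1])
  c = 0.5_dp

  ! A(3:6,1:5) is 4 x 5; B(1:5,2) and C(3:6,2) are vectors of sizes 5 and 4
  c(3:6,2) = 1.0_dp * matmul(a(3:6,1:5), b(1:5,2)) + 2.0_dp * c(3:6,2)

  print '(4f6.1)', c(3:6,2)
  if (any(abs(c(3:6,2) - [1.0_dp, 6.0_dp, -6.0_dp, 7.0_dp]) > 1.0e-12_dp)) &
    error stop 'mismatch with Example 3'
end program gemm_example3_check
```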

Example 4

This example computes $c = \alpha Ab + \beta c$. The input matrices A, B, and C used here are the same as those used in "Example 1".

Array sections are specified for arguments a, b, and c, resulting in the computation using a submatrix A starting at row 2 and column 2 in an array, a row vector b starting at row 4 and column 2 in an array, and a column vector c starting at row 2 and column 3 in an array.

As in "Example 2", array data is block-cyclically distributed using a 2 × 2 process grid.

!HPF$ PROCESSORS PROC(2,2)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: A, B, C
 
CALL GEMM( 1.0D0 , A(2:5,2:4) , B(4,2:4) , 2.0D0 , C(2:5,3) )
-or-
CALL GEMM( 1.0D0 , A(2:5,2:4) , B(4,2:4) , 2.0D0 , C(2:5,3) , TRANSA='N' )

Input

Only a portion of the data structure is used--that is, submatrix A. Following is the 4 × 3 submatrix A, starting at row 2 and column 2 in the 6 × 5 array:

 *                             *
 |  .     .     .     .     .  |
 |  .    0.0   1.0   1.0    .  |
 |  .   -1.0  -1.0   1.0    .  |
 |  .    2.0   2.0   2.0    .  |
 |  .    0.0  -2.0   1.0    .  |
 |  .     .     .     .     .  |
 *                             *

Only a portion of the data structure is used--that is, vector b, which is a row vector. Following is the vector b of size 3, starting at row 4 and column 2 in the 5 × 4 array:

 *                       *
 |  .     .     .     .  |
 |  .     .     .     .  |
 |  .     .     .     .  |
 |  .   -1.0   1.0  -1.0 |
 |  .     .     .     .  |
 *                       *

Only a portion of the data structure is used--that is, vector c, which is a column vector. Following is the vector c of size 4, starting at row 2 and column 3 in the 6 × 4 array:

 *                    *
 |  .    .    .    .  |
 |  .    .   0.5   .  |
 |  .    .   0.5   .  |
 |  .    .   0.5   .  |
 |  .    .   0.5   .  |
 |  .    .    .    .  |
 *                    *

Output

Only a portion of the data structure is used--that is, vector c, which is a column vector. Following is the vector c of size 4, starting at row 2 and column 3 in the 6 × 4 array:

 *                       *
 |  .     .     .     .  |
 |  .     .    1.0    .  |
 |  .     .    0.0    .  |
 |  .     .   -1.0    .  |
 |  .     .   -2.0    .  |
 |  .     .     .     .  |
 *                       *
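Example 4 takes b from a row of B. In standard Fortran the section B(4,2:4) is a rank-one array of size 3, whereas B(4:4,2:4) would be a 1 × 3 matrix. This serial sketch uses the rank-one section with MATMUL (not Parallel ESSL) and checks the result against the output above.

```fortran
program gemm_example4_check
  ! Serial recomputation of GEMM Example 4 with MATMUL (no Parallel ESSL)
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: a(6,5), b(5,4), c(6,4)

  ! Same input arrays as Example 1, entered row by row
  a = reshape([ 1, 2,-1,-1, 4, &
                2, 0, 1, 1,-1, &
                1,-1,-1, 1, 2, &
               -3, 2, 2, 2, 0, &
                4, 0,-2, 1,-1, &
               -1,-1, 1,-3, 2], [6,5], order=[2,1])
  b = reshape([ 1,-1, 0, 2, &
                2, 2,-1,-2, &
                1, 0,-1, 1, &
               -3,-1, 1,-1, &
                4, 2,-1, 1], [5,4], order=[2,1])
  c = 0.5_dp

  ! A(2:5,2:4) is 4 x 3; the row section B(4,2:4) is a vector of size 3
  c(2:5,3) = 1.0_dp * matmul(a(2:5,2:4), b(4,2:4)) + 2.0_dp * c(2:5,3)

  print '(4f6.1)', c(2:5,3)
  if (any(abs(c(2:5,3) - [1.0_dp, 0.0_dp, -1.0_dp, -2.0_dp]) > 1.0e-12_dp)) &
    error stop 'mismatch with Example 4'
end program gemm_example4_check
```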

Example 5

This example computes $C = \alpha ab^{T} + C$.

Array sections are specified for arguments a, b, and c, resulting in the computation using a submatrix C starting at row 2 and column 2 in an array, a column vector a, starting at element 2 in an array, and a row vector b starting at element 2 in an array.

As in "Example", array data is block-cyclically distributed using a 2 × 2 process grid.

!HPF$ PROCESSORS PROC(2,2)
!HPF$ ALIGN A(:) WITH C(:,1)
!HPF$ ALIGN B(:) WITH C(1,:)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: C
 
CALL GEMM( 1.0D0 , A(2:10) , B(2:10) , C(2:10,2:10) )

Input

Only a portion of the data structure is used--that is, submatrix C. Following is the 9 × 9 submatrix C, starting at row 2 and column 2 in the 10 × 10 array:

 *                                                                   *
 |  .     .      .      .      .      .      .      .      .      .  |
 |  .   12.0   22.0   32.0   42.0   52.0   62.0   72.0   82.0   92.0 |
 |  .   13.0   23.0   33.0   43.0   53.0   63.0   73.0   83.0   93.0 |
 |  .   14.0   24.0   34.0   44.0   54.0   64.0   74.0   84.0   94.0 |
 |  .   15.0   25.0   35.0   45.0   55.0   65.0   75.0   85.0   95.0 |
 |  .   16.0   26.0   36.0   46.0   56.0   66.0   76.0   86.0   96.0 |
 |  .   17.0   27.0   37.0   47.0   57.0   67.0   77.0   87.0   97.0 |
 |  .   18.0   28.0   38.0   48.0   58.0   68.0   78.0   88.0   98.0 |
 |  .   19.0   29.0   39.0   49.0   59.0   69.0   79.0   89.0   99.0 |
 |  .   20.0   30.0   40.0   50.0   60.0   70.0   80.0   90.0  100.0 |
 *                                                                   *

Only a portion of the data structure is used--that is, vector a, which is a column vector. Following is the vector a of size 9, starting at element 2 in the array of size 11:

 *     *
 |  .  |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 |  .  |
 *     *

Only a portion of the data structure is used--that is, vector b, which is a row vector. Following is the vector b of size 9, starting at element 2 in the array of size 11:

 *                                                                 *
 |  .    2.0   3.0   4.0   5.0   6.0   7.0   8.0   9.0  10.0    .  |
 *                                                                 *

Output

Only a portion of the data structure is used--that is, submatrix C. Following is the 9 × 9 submatrix C, starting at row 2 and column 2 in the 10 × 10 array:

 *                                                                   *
 |  .     .      .      .      .      .      .      .      .      .  |
 |  .   14.0   25.0   36.0   47.0   58.0   69.0   80.0   91.0  102.0 |
 |  .   15.0   26.0   37.0   48.0   59.0   70.0   81.0   92.0  103.0 |
 |  .   16.0   27.0   38.0   49.0   60.0   71.0   82.0   93.0  104.0 |
 |  .   17.0   28.0   39.0   50.0   61.0   72.0   83.0   94.0  105.0 |
 |  .   18.0   29.0   40.0   51.0   62.0   73.0   84.0   95.0  106.0 |
 |  .   19.0   30.0   41.0   52.0   63.0   74.0   85.0   96.0  107.0 |
 |  .   20.0   31.0   42.0   53.0   64.0   75.0   86.0   97.0  108.0 |
 |  .   21.0   32.0   43.0   54.0   65.0   76.0   87.0   98.0  109.0 |
 |  .   22.0   33.0   44.0   55.0   66.0   77.0   88.0   99.0  110.0 |
 *                                                                   *
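Example 5 is a rank-one update. In standard Fortran it can be written with the SPREAD intrinsic: spread(a, 2, n) repeats a as columns and spread(b, 1, m) repeats b as rows, so their elementwise product is the outer product ab^T. The following serial sketch (not Parallel ESSL) rebuilds the input C from the pattern C(i,j) = 10(j-1) + i visible in the printed values and checks every updated element.

```fortran
program gemm_example5_check
  ! Serial recomputation of GEMM Example 5, the rank-one update
  !   C(2:10,2:10) = 1.0 * a(2:10) * b(2:10)**T + C(2:10,2:10)
  ! SPREAD forms the outer product; Parallel ESSL is not called.
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: a(11), b(11), c(10,10)
  integer :: i, j

  ! Input values as printed: C(i,j) = 10*(j-1) + i, a(i) = 1, b(j) = j
  do j = 1, 10
    do i = 1, 10
      c(i,j) = real(10*(j-1) + i, dp)
    end do
  end do
  a = 1.0_dp
  b = [(real(j, dp), j = 1, 11)]

  c(2:10,2:10) = 1.0_dp * spread(a(2:10), dim=2, ncopies=9) &
                        * spread(b(2:10), dim=1, ncopies=9) + c(2:10,2:10)

  ! Each updated element grows by a(i)*b(j) = j
  do j = 2, 10
    do i = 2, 10
      if (abs(c(i,j) - real(10*(j-1) + i + j, dp)) > 1.0e-12_dp) &
        error stop 'mismatch with Example 5'
    end do
  end do
  print '(9f7.1)', (c(2,j), j = 2, 10)
end program gemm_example5_check
```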

SYMM--Matrix-Matrix Product Where One Matrix is Real Symmetric

This subroutine computes one of the following matrix-matrix products:

1. $C \leftarrow \alpha AB + \beta C$

2. $C \leftarrow \alpha BA + \beta C$

3. $c \leftarrow \alpha Ab + \beta c$

where, in the formulas above:

A is a symmetric matrix.

B and C are general matrices.

b and c are vectors.

alpha and beta are scalars.
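Because only one triangle of A is referenced, a serial reference has to rebuild the full symmetric matrix before multiplying. The sketch below does that with a hypothetical helper `full_symmetric` (not a Parallel ESSL routine) and deliberately places a junk value in the unreferenced triangle to show that it is never read. It computes equation 1 with alpha = 1 and beta = 0 using MATMUL.

```fortran
program symm_triangle_sketch
  ! Serial sketch of SYMM equation 1, C = alpha*A*B + beta*C, reading only
  ! the triangle of A selected by uplo. full_symmetric is hypothetical.
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: a(2,2), b(2,3), c(2,3)
  integer :: i

  ! Upper triangle holds A = [1 2; 2 3]; a(2,1) is junk and must be ignored
  a = reshape([  1.0_dp, 2.0_dp, &
               -99.0_dp, 3.0_dp], [2,2], order=[2,1])
  b = reshape([1.0_dp, 0.0_dp, 1.0_dp, &
               0.0_dp, 1.0_dp, 1.0_dp], [2,3], order=[2,1])

  ! alpha = 1 and beta = 0, so C is not read (side = 'L').
  ! For side = 'R' (equation 2) the product would be matmul(b, full_symmetric(a, uplo)).
  c = matmul(full_symmetric(a, 'U'), b)

  do i = 1, 2
    print '(3f6.1)', c(i,:)
  end do
  if (any(abs(c(1,:) - [1.0_dp, 2.0_dp, 3.0_dp]) > 1.0e-12_dp) .or. &
      any(abs(c(2,:) - [2.0_dp, 3.0_dp, 5.0_dp]) > 1.0e-12_dp)) error stop 'unexpected result'

contains

  function full_symmetric(a, uplo) result(s)
    real(dp), intent(in) :: a(:,:)
    character(len=1), intent(in) :: uplo
    real(dp) :: s(size(a,1), size(a,2))
    integer :: i, j
    do j = 1, size(a,2)
      do i = 1, size(a,1)
        if (uplo == 'U' .or. uplo == 'u') then
          s(i,j) = a(min(i,j), max(i,j))   ! read the upper triangle only
        else
          s(i,j) = a(max(i,j), min(i,j))   ! read the lower triangle only
        end if
      end do
    end do
  end function full_symmetric

end program symm_triangle_sketch
```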

In the following two cases, no computation is performed and the subroutine returns after doing some parameter checking:

Any of the assumed-shape arrays have a size of zero.
alpha is zero and beta is one.

See references [17], [30], [31], and [44].

Table 113. Data Types

alpha, beta, A, B, C, b, c	Subprogram
Long-precision real	SYMM

Syntax

HPF	Equations 1 and 2	CALL SYMM (`alpha`, `a`, `b`, `beta`, `c`, `uplo`, `side`)
HPF	Equation 3	CALL SYMM (`alpha`, `a`, `b`, `beta`, `c`, `uplo`)

On Entry

alpha

is the scalar alpha.

Type: required

Specified as: a number of the data type indicated in Table 113.

a

is the symmetric matrix A, where:

If uplo = 'U', the array contains the upper triangle of the symmetric matrix A in its upper triangle, and its strictly lower triangular part is not referenced.

If uplo = 'L', the array contains the lower triangle of the symmetric matrix A in its lower triangle, and its strictly upper triangular part is not referenced.

Type: required

Specified as: an assumed-shape array with shape (:,:), containing numbers of the data type indicated in Table 113, where size(a,1) = size(a,2).

b

is the general matrix B or the vector b.

Type: required

Specified as: an assumed-shape array with shape (:,:) or (:), containing numbers of the data type indicated in Table 113.

beta

is the scalar beta.

Type: required

Specified as: a number of the data type indicated in Table 113.

c

is the general matrix C or the vector c. When beta is zero, c need not be set on input.

Type: required

Specified as: an assumed-shape array with shape (:,:) or (:), containing numbers of the data type indicated in Table 113.

uplo

indicates whether the upper or lower triangular part of the symmetric matrix A is referenced, where:

If uplo = 'U', the upper triangular part is referenced.

If uplo = 'L', the lower triangular part is referenced.

Type: required

Specified as: a single character; uplo = 'U' or 'L'.

side

indicates whether A is located to the left or right of B in the equation used for this computation, where:

If side = 'L', A is to the left of B, resulting in equation 1.

If side = 'R', A is to the right of B, resulting in equation 2.

Type: required (equations 1 and 2); not present (equation 3)

Specified as: a single character; side = 'L' or 'R'.

On Return

c

is the updated matrix C or vector c, containing the results of the computation.

Type: required

Returned as: an assumed-shape array with shape (:,:) or (:), containing numbers of the data type indicated in Table 113.

Notes and Coding Rules

The assumed-shape arrays must have the exact size required for the computation, that is:
- For equations 1 and 2:
  - If side = 'L':
    - size(b,1) = size(c,1) = size(a,1) = size(a,2)
    - size(b,2) = size(c,2)
  - If side = 'R':
    - size(b,1) = size(c,1)
    - size(b,2) = size(c,2) = size(a,1) = size(a,2)
- For equation 3: size(a,1) = size(a,2) = size(b) = size(c)
For migration purposes, note that the side and uplo arguments appear in reverse order from the corresponding BLAS and PBLAS subroutines.
This subroutine accepts lowercase letters for the side and uplo arguments.
The assumed-shape arrays must have no common elements; otherwise, results are unpredictable.
The restrictions given in "Notes and Coding Rules" and "Notes and Coding Rules" also apply to this subroutine.
For details on how to set up and code your HPF program using Parallel ESSL, see "Coding Your HPF Program".
Block-cyclic data distribution is required for your array data. Because data directives are included in the interface module PESSL_HPF, you can specify any data distribution for your vectors and matrices, and the XL HPF compiler will, if necessary, redistribute the data prior to calling this subroutine. For how to code your HPF directives, see "Distributing Data in an HPF Program". For a sample program including directives, see Figure 9.
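The SYMM size rules for equations 1 and 2 can be checked from the shapes alone: A must be square, B and C must have the same shape, and the order of A must match the rows of B (side = 'L') or the columns of B (side = 'R'). `symm_shapes_ok` is a hypothetical helper, not a Parallel ESSL routine; the example shapes are those of SYMM Example 1.

```fortran
program symm_shape_rules_sketch
  ! Checks the SYMM exact-size rules for equations 1 and 2.
  ! symm_shapes_ok is hypothetical; side is assumed to be L, l, R, or r.
  implicit none
  ! Shapes from SYMM Example 1: A is 8 x 8, B and C are 16 x 8
  print '(l1)', symm_shapes_ok([8,8], [16,8], [16,8], 'R')   ! T
  print '(l1)', symm_shapes_ok([8,8], [16,8], [16,8], 'L')   ! F

contains

  logical function symm_shapes_ok(sa, sb, sc, side)
    integer, intent(in) :: sa(2), sb(2), sc(2)   ! shape(a), shape(b), shape(c)
    character(len=1), intent(in) :: side
    ! A must be square, and B and C must have the same shape
    symm_shapes_ok = sa(1) == sa(2) .and. all(sb == sc)
    if (side == 'L' .or. side == 'l') then
      symm_shapes_ok = symm_shapes_ok .and. sb(1) == sa(1)   ! C = A*B
    else
      symm_shapes_ok = symm_shapes_ok .and. sb(2) == sa(1)   ! C = B*A
    end if
  end function symm_shapes_ok

end program symm_shape_rules_sketch
```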

Error Conditions

HPF-specific errors are listed below. Resource and input-argument errors listed in "Error Conditions" and "Error Conditions" also apply to this subroutine.

Input-Argument Errors for Equations 1 and 2

Stage 1

The rank of the ultimate align target is greater than 2 for a, b, or c.
The process rank is not the same for a, b, and c.
The process rank is not 1 or 2 for a, b, or c.

Stage 2

The process grid is not the same for a, b, and c.

Stage 3

The data distribution is inconsistent for a, b, and c.

Stage 4

side is present and is not 'L' or 'R'.
side = 'L' or 'R', and the shape of the assumed-shape arrays a, b, and c is incompatible:
1. side = 'L' and:
  
  size(b,1) <> size(c,1) or
  size(c,1) <> size(a,1) or
  size(a,1) <> size(a,2) or
  size(b,2) <> size(c,2)
2. side = 'R' and:
  
  size(b,1) <> size(c,1) or
  size(b,2) <> size(c,2) or
  size(c,2) <> size(a,1) or
  size(a,1) <> size(a,2)
The shape of the assumed-shape array for a is invalid: size(a,1) <> size(a,2)

Stage 5

The data distribution for a, b, or c is unsupported.

Input-Argument Errors for Equation 3

Stage 1

The rank of the ultimate align target is greater than 2 for a, b, or c.
The process rank is not the same for a, b, and c.
The process rank is not 1 or 2 for a, b, or c.

Stage 2

The process grid is not the same for a, b, and c.

Stage 3

The vector for b or c is replicated.
The data distribution for a is unsupported.

Stage 4

Vector distribution error for b or c.
The shape of the assumed-shape arrays a, b, and c is incompatible:

size(a,1) <> size(a,2) or
size(a,1) <> size(b) or
size(a,1) <> size(c)
The shape of the assumed-shape array for a is invalid: size(a,1) <> size(a,2)

Stage 5

The data distribution for a, b, or c is unsupported.

Example 1

This example computes $C = \alpha BA + \beta C$. Because beta = 0, C need not be set on input. As in "Example", array data is block-cyclically distributed using a 2 × 2 process grid.

!HPF$ PROCESSORS PROC(2,2)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: A, B, C
 
CALL SYMM( 1.0D0, A, B, 0.0D0, C, 'U', 'R' )

Input

Symmetric matrix A of order 8:

 *                                               *
 | 0.0  -1.0  -1.0   0.0   0.0   0.0   0.0   0.0 |
 |  .    1.0   0.0   1.0   0.0   1.0   0.0   1.0 |
 |  .     .   -1.0  -1.0   0.0   0.0   1.0   0.0 |
 |  .     .     .   -1.0   1.0   1.0   0.0   1.0 |
 |  .     .     .     .   -1.0   0.0   0.0   0.0 |
 |  .     .     .     .     .    1.0   0.0   0.0 |
 |  .     .     .     .     .     .    0.0   0.0 |
 |  .     .     .     .     .     .     .    0.0 |
 *                                               *

General 16 × 8 matrix B:

 *                                                *
 | -1.0   0.0   1.0  -1.0   1.0   1.0  -1.0  -1.0 |
 | -1.0  -1.0   1.0   0.0   1.0  -1.0  -1.0   1.0 |
 |  1.0   1.0  -1.0   0.0  -1.0   0.0   1.0   0.0 |
 |  0.0  -1.0   0.0   0.0   0.0   0.0   0.0  -1.0 |
 |  0.0   1.0   0.0   1.0   0.0   1.0   1.0   0.0 |
 |  0.0   0.0   1.0   0.0  -1.0  -1.0   0.0   0.0 |
 |  1.0   1.0   0.0   0.0   1.0   1.0   0.0  -1.0 |
 |  0.0   0.0  -1.0   0.0   0.0   1.0   0.0   1.0 |
 |  0.0   0.0   0.0  -1.0   1.0   1.0   0.0   1.0 |
 | -1.0  -1.0   1.0   0.0   0.0  -1.0   0.0   1.0 |
 |  0.0   0.0   0.0   1.0   1.0   0.0   0.0   0.0 |
 |  0.0   0.0   1.0   1.0   0.0  -1.0   0.0   0.0 |
 |  1.0   1.0  -1.0   0.0  -1.0  -1.0   1.0   1.0 |
 |  0.0   0.0   0.0   0.0   1.0   0.0   0.0  -1.0 |
 |  0.0   1.0   0.0   0.0   0.0   0.0   0.0   0.0 |
 | -1.0   0.0  -1.0   0.0   0.0   1.0   1.0   0.0 |
 *                                                *

Output

General 16 × 8 matrix C:

 *                                                *
 | -1.0   0.0   0.0   1.0  -2.0   0.0   1.0  -1.0 |
 |  0.0   0.0  -1.0  -1.0  -1.0  -2.0   1.0  -1.0 |
 |  0.0   0.0   1.0   1.0   1.0   1.0  -1.0   1.0 |
 |  1.0  -2.0   0.0  -2.0   0.0  -1.0   0.0  -1.0 |
 | -1.0   3.0   0.0   1.0   1.0   3.0   0.0   2.0 |
 | -1.0  -1.0  -1.0  -3.0   1.0  -1.0   1.0   0.0 |
 | -1.0   0.0  -1.0   2.0  -1.0   2.0   0.0   1.0 |
 |  1.0   2.0   1.0   3.0   0.0   1.0  -1.0   0.0 |
 |  0.0   1.0   1.0   4.0  -2.0   0.0   0.0  -1.0 |
 |  0.0   0.0   0.0  -2.0   0.0  -2.0   1.0  -1.0 |
 |  0.0   1.0  -1.0   0.0   0.0   1.0   0.0   1.0 |
 | -1.0   0.0  -2.0  -3.0   1.0   0.0   1.0   1.0 |
 |  0.0   0.0   1.0   1.0   1.0   0.0  -1.0   1.0 |
 |  0.0  -1.0   0.0   0.0  -1.0   0.0   0.0   0.0 |
 | -1.0   1.0   0.0   1.0   0.0   1.0   0.0   1.0 |
 |  1.0   2.0   3.0   2.0   0.0   1.0  -1.0   0.0 |
 *                                                *

Example 2

This example computes c = alpha Ab + beta c. As in "Example", array data is block-cyclically distributed using a 2 × 2 process grid.

!HPF$ PROCESSORS PROC(2,2)
!HPF$ ALIGN B(:) WITH A(:,1)
!HPF$ ALIGN C(:) WITH A(:,1)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: A
 
CALL SYMM( 1.0D0, A, B, 0.0D0, C, 'U' )

Input

Symmetric matrix A of order 8:

 *                                               *
 | 0.0  -1.0  -1.0   0.0   0.0   0.0   0.0   0.0 |
 |  .    1.0   0.0   1.0   0.0   1.0   0.0   1.0 |
 |  .     .   -1.0  -1.0   0.0   0.0   1.0   0.0 |
 |  .     .     .   -1.0   1.0   1.0   0.0   1.0 |
 |  .     .     .     .   -1.0   0.0   0.0   0.0 |
 |  .     .     .     .     .    1.0   0.0   0.0 |
 |  .     .     .     .     .     .    0.0   0.0 |
 |  .     .     .     .     .     .     .    0.0 |
 *                                               *

Vector b of size 8:

 *     *
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 *     *

Output

Vector c of size 8:

 *      *
 | -2.0 |
 |  3.0 |
 | -2.0 |
 |  2.0 |
 |  0.0 |
 |  3.0 |
 |  1.0 |
 |  2.0 |
 *      *
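As a plain serial check of Example 2 (not Parallel ESSL or HPF code, and ignoring data distribution), the Python sketch below reads only the upper triangle of A, as uplo = 'U' requires, mirrors it to supply the strictly lower part, and forms c = alpha Ab + beta c. The function name symv_upper and the use of None for unreferenced elements are assumptions made for illustration. When beta is zero the sketch never reads c, consistent with C not needing to be set on input.

```python
# Serial sketch only: mimics what SYMM computes for a vector b with uplo = 'U'.
# The name symv_upper is hypothetical, not part of Parallel ESSL.

def symv_upper(alpha, a, b, beta, c):
    """Return alpha*A*b + beta*c, reading only a[i][j] with i <= j."""
    n = len(a)
    out = []
    for i in range(n):
        acc = 0.0
        for j in range(n):
            # The strictly lower part is never read; mirror it from the upper part.
            aij = a[i][j] if i <= j else a[j][i]
            acc += aij * b[j]
        # With beta == 0, c is not read at all.
        out.append(alpha * acc + (beta * c[i] if beta != 0.0 else 0.0))
    return out

N = None  # marks array elements that SYMM does not reference
A = [
    [0.0, -1.0, -1.0,  0.0,  0.0, 0.0, 0.0, 0.0],
    [N,    1.0,  0.0,  1.0,  0.0, 1.0, 0.0, 1.0],
    [N,    N,   -1.0, -1.0,  0.0, 0.0, 1.0, 0.0],
    [N,    N,    N,   -1.0,  1.0, 1.0, 0.0, 1.0],
    [N,    N,    N,    N,   -1.0, 0.0, 0.0, 0.0],
    [N,    N,    N,    N,    N,   1.0, 0.0, 0.0],
    [N,    N,    N,    N,    N,    N,  0.0, 0.0],
    [N,    N,    N,    N,    N,    N,   N,  0.0],
]
b = [1.0] * 8
c = symv_upper(1.0, A, b, 0.0, [N] * 8)   # c need not be set because beta = 0
print(c)  # [-2.0, 3.0, -2.0, 2.0, 0.0, 3.0, 1.0, 2.0], as in the Output
```

Because b is all ones here, each element of c is simply a row sum of the full symmetric matrix, which makes the Output easy to check by hand.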

TRMM--Triangular Matrix-Matrix Product

This subroutine computes one of the following matrix-matrix products:

1. B <-- alpha AB

2. B <-- alpha A^T B

3. B <-- alpha BA

4. B <-- alpha BA^T

5. b <-- Ab

6. b <-- A^T b

where, in the formulas above:

A is a triangular matrix.

B is a general matrix.

b is a vector.

alpha is a scalar.

Note: No data should be moved to form the matrix transpose; that is, the matrix should always be stored in its untransposed form.
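To make the six forms and the uplo, side, transa, and diag conventions concrete, here is a serial Python sketch of what TRMM computes. It is not Parallel ESSL code and ignores data distribution; the names trmm_ref and tri_entry are hypothetical, and alpha is a trailing keyword argument for convenience, whereas the HPF calling sequence passes it first. Passing a one-dimensional list for b selects equations 5 and 6, which take no alpha or side. As in the subroutine, the transpose is never formed; op(i, j) just swaps the indices it reads.

```python
# Serial reference sketch of TRMM's six forms (hypothetical names; not Parallel ESSL).
# Matrices are lists of rows; a flat list for b selects equations 5 and 6.

def tri_entry(a, i, j, uplo, diag):
    """Effective A(i,j): only the uplo triangle is read, the other triangle
    counts as zero, and diag='U' treats the diagonal as 1.0 without reading it."""
    if i == j:
        return 1.0 if diag == 'U' else a[i][j]
    if (uplo == 'U') == (i < j):
        return a[i][j]
    return 0.0

def trmm_ref(a, b, uplo, side='L', transa='N', diag='N', alpha=1.0):
    uplo, side, diag = uplo.upper(), side.upper(), diag.upper()
    transa = 'T' if transa.upper() in ('T', 'C') else 'N'   # 'C' acts as 'T'
    n = len(a)

    def op(i, j):                       # entry (i, j) of A or A^T
        if transa == 'T':
            return tri_entry(a, j, i, uplo, diag)
        return tri_entry(a, i, j, uplo, diag)

    if not isinstance(b[0], list):      # equations 5 and 6: b <-- op(A) b
        return [sum(op(i, k) * b[k] for k in range(n)) for i in range(n)]
    rows, cols = len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if side == 'L':             # equations 1 and 2: alpha op(A) B
                s = sum(op(i, k) * b[k][j] for k in range(n))
            else:                       # equations 3 and 4: alpha B op(A)
                s = sum(b[i][k] * op(k, j) for k in range(n))
            out[i][j] = alpha * s
    return out

# Example 1 data: A is upper triangular (uplo='U'); None marks unreferenced elements.
N = None
A = [[3.0, -1.0, 2.0, 2.0, 1.0],
     [N, -2.0, 4.0, -1.0, 3.0],
     [N, N, -3.0, 0.0, 2.0],
     [N, N, N, 4.0, -2.0],
     [N, N, N, N, 1.0]]
B = [[2.0, 3.0, 1.0], [5.0, 5.0, 4.0], [0.0, 1.0, 2.0], [3.0, 1.0, -3.0], [-1.0, 2.0, 1.0]]
print(trmm_ref(A, B, 'U', side='L', alpha=1.0))   # compare with the Example 1 Output
```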

If any of the assumed-shape arrays have a size of zero, no computation is performed, and the subroutine returns after doing some parameter checking.

See references [17], [30], [31], and [44].

Table 114. Data Types

alpha, A, B, b	Subprogram
Long-precision real	TRMM

Syntax

HPF	Equations 1-4	CALL TRMM (`alpha`, `a`, `b`, `uplo`, `side`)
		CALL TRMM (`alpha`, `a`, `b`, `uplo`, `side`, `transa`, `diag`)
HPF	Equations 5 and 6	CALL TRMM (`a`, `b`, `uplo`)
		CALL TRMM (`a`, `b`, `uplo`, `transa`, `diag`)

On Entry

alpha

is the scalar alpha.

Type: required (equations 1-4); not present (equations 5 and 6)

Specified as: a number of the data type indicated in Table 114.

a

is the triangular matrix A, where:

If uplo = 'U', the array contains the upper triangle of the triangular matrix A in its upper triangle, and its strictly lower triangular part is not referenced.

If uplo = 'L', the array contains the lower triangle of the triangular matrix A in its lower triangle, and its strictly upper triangular part is not referenced.

Type: required

Specified as: an assumed-shape array with shape (:,:), containing numbers of the data type indicated in Table 114, where size(a,1) = size(a,2).

b

is the general matrix B or the vector b.

Type: required

Specified as: an assumed-shape array with shape (:,:) or (:), containing numbers of the data type indicated in Table 114.

uplo

indicates whether the upper or lower triangular part of the triangular matrix A is referenced, where:

If uplo = 'U', the upper triangular part is referenced.

If uplo = 'L', the lower triangular part is referenced.

Type: required

Specified as: a single character; uplo = 'U' or 'L'.

side

indicates whether A is located to the left or right of B in the equation used for this computation, where:

If side = 'L', A is to the left of B, resulting in equation 1 or 2.

If side = 'R', A is to the right of B, resulting in equation 3 or 4.

Type: required (equations 1-4); not present (equations 5 and 6)

Specified as: a single character; side = 'L' or 'R'.

transa

indicates the form of matrix A to use in the computation, where:

If transa = 'N', A is used in the computation, resulting in equation 1, 3, or 5.

If transa = 'T', A^T is used in the computation, resulting in equation 2, 4, or 6.

Type: optional

Default: transa = 'N'

Specified as: a single character; transa = 'N' or 'T'.

diag

indicates the characteristics of the diagonal of matrix A, where:

If diag = 'U', A is a unit triangular matrix.

If diag = 'N', A is not a unit triangular matrix.

Type: optional

Default: diag = 'N'

Specified as: a single character; diag = 'U' or 'N'.

On Return

b

is the updated matrix B or vector b, containing the results of the computation.

Type: required

Returned as: an assumed-shape array with shape (:,:) or (:), containing numbers of the data type indicated in Table 114.

Notes and Coding Rules

The assumed-shape arrays must have the exact size required for the computation, that is:
- For equations 1 through 4:
  - If side = 'L', size(b,1) = size(a,1) = size(a,2)
  - If side = 'R', size(b,2) = size(a,1) = size(a,2)
- For equations 5 and 6: size(b) = size(a,1) = size(a,2)
For migration purposes, note that the side and uplo arguments appear in reverse order from the corresponding BLAS and PBLAS subroutines.
This subroutine accepts lowercase letters for the side, uplo, transa, and diag arguments.
If you specify 'C' for transa, it is interpreted as though you specified 'T'.
The assumed-shape arrays must have no common elements; otherwise, results are unpredictable.
This subroutine assumes certain values in your array for parts of a triangular matrix. As a result, you do not have to set these values. For unit triangular matrices, the elements of the diagonal are assumed to be one. When using an upper or lower triangular matrix, the unreferenced elements in the strictly lower or upper triangular part, respectively, are assumed to be zero.
For details on how to set up and code your HPF program using Parallel ESSL, see "Coding Your HPF Program".
Block-cyclic data distribution is required for your array data. Because data directives are included in the interface module PESSL_HPF, you can specify any data distribution for your vectors and matrices, and the XL HPF compiler will, if necessary, redistribute the data prior to calling this subroutine. For how to code your HPF directives, see "Distributing Data in an HPF Program". For a sample program including directives, see Figure 9.
The restrictions given in "Notes and Coding Rules" and "Notes and Coding Rules" also apply to this subroutine.
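The examples in this chapter use !HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC. In HPF, CYCLIC means CYCLIC(1), the block-cyclic case with a block size of 1, so consecutive rows (and columns) are dealt out to the process rows (and columns) in turn. The sketch below only illustrates that mapping, assuming array lower bounds of 1; the XL HPF compiler performs the real mapping, and the helper names cyclic_owner and local_indices are hypothetical.

```python
# Illustration of HPF (CYCLIC, CYCLIC) ownership, assuming lower bounds of 1.
# Helper names are hypothetical; the compiler does the real mapping.

def cyclic_owner(i, j, prow, pcol):
    """1-based element (i, j) -> 1-based process coordinates in PROC(prow, pcol)."""
    return ((i - 1) % prow + 1, (j - 1) % pcol + 1)

def local_indices(nrows, ncols, prow, pcol, r, c):
    """Global (i, j) pairs stored on PROC(r, c), grouped by local row."""
    return [[(i, j) for j in range(c, ncols + 1, pcol)]
            for i in range(r, nrows + 1, prow)]

# A 5 x 5 matrix, as in TRMM Example 1, on a 2 x 2 grid.
for r in (1, 2):
    for c in (1, 2):
        block = local_indices(5, 5, 2, 2, r, c)
        print(f"PROC({r},{c}) holds a {len(block)} x {len(block[0])} local block")
```

The same helpers work for other grid shapes, such as the 2 × 3 grid used in SYRK Example 1.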

Error Conditions

HPF-specific errors are listed below. Resource and input-argument errors listed in "Error Conditions" and "Error Conditions" also apply to this subroutine.

Input-Argument Errors for Equations 1-4

Stage 1

The rank of the ultimate align target is greater than 2 for a or b.
The process rank is not the same for a and b.
The process rank is not 1 or 2 for a or b.

Stage 2

The process grid is not the same for a and b.

Stage 3

The data distribution is inconsistent for a and b.

Stage 4

side is present, and side is neither 'L' nor 'R'.
side = 'L' or 'R', and the shape of the assumed-shape arrays a and b is incompatible:
1. side = 'L' and:
  
  size(b,1) <> size(a,2) or
  size(a,1) <> size(a,2)
2. side = 'R' and:
  
  size(b,2) <> size(a,1) or
  size(a,1) <> size(a,2)
The shape of the assumed-shape array for a is invalid: size(a,1) <> size(a,2)

Stage 5

The data distribution for a or b is unsupported.

Input-Argument Errors for Equations 5 and 6

Stage 1

The rank of the ultimate align target is greater than 2 for a or b.
The process rank is not the same for a and b.
The process rank is not 1 or 2 for a or b.

Stage 2

The process grid is not the same for a and b.

Stage 3

The data distribution is inconsistent for a and b.

Stage 4

The shape of the assumed-shape array for a is invalid: size(a,1) <> size(a,2)

Stage 5

The data distribution for a or b is unsupported.

Stage 6

The shape of the assumed-shape arrays a and b is incompatible: size(a,1) <> size(b)
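The exact-size rules for TRMM and the shape errors listed above reduce to a few comparisons. A hypothetical pre-call check in Python might look like the following; it returns the first problem it finds, and its messages and ordering are illustrative only, not the messages or stage order that Parallel ESSL itself produces. TRSM states the same size rules, so the same check applies there.

```python
# Hypothetical pre-call shape check mirroring TRMM's size rules (not Parallel ESSL).

def trmm_shape_error(a_shape, b_shape, side=None):
    """a_shape = (size(a,1), size(a,2)); b_shape = (size(b,1), size(b,2)) for
    equations 1-4, or (size(b),) for equations 5 and 6.
    Returns a description of the first problem found, or None."""
    m, n = a_shape
    if len(b_shape) == 2:                          # equations 1-4
        side = (side or '').upper()
        if side not in ('L', 'R'):
            return "side is not 'L' or 'R'"
        if m != n:
            return "size(a,1) <> size(a,2)"
        if side == 'L' and b_shape[0] != n:
            return "size(b,1) <> size(a,2)"
        if side == 'R' and b_shape[1] != m:
            return "size(b,2) <> size(a,1)"
        return None
    if m != n:                                     # equations 5 and 6
        return "size(a,1) <> size(a,2)"
    if b_shape[0] != m:
        return "size(a,1) <> size(b)"
    return None

print(trmm_shape_error((5, 5), (5, 3), 'L'))       # Example 1: None
print(trmm_shape_error((12, 12), (12,)))           # Example 2 sections: None
print(trmm_shape_error((5, 5), (5, 3), 'R'))       # size(b,2) <> size(a,1)
```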

Example 1

This example computes B = alpha AB. As in "Example", array data is block-cyclically distributed using a 2 × 2 process grid.

!HPF$ PROCESSORS PROC(2,2)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: A, B
 
CALL TRMM( 1.0D0 , A , B , 'U' , 'L' )
-or-
CALL TRMM( 1.0D0 , A , B , 'U' , 'L' , TRANSA='N' , DIAG='N' )

Input

Triangular matrix A of order 5 is upper triangular:

 *                             *
 | 3.0  -1.0   2.0   2.0   1.0 |
 |  .   -2.0   4.0  -1.0   3.0 |
 |  .     .   -3.0   0.0   2.0 |
 |  .     .     .    4.0  -2.0 |
 |  .     .     .     .    1.0 |
 *                             *

Rectangular 5 × 3 matrix B:

 *                  *
 |  2.0   3.0   1.0 |
 |  5.0   5.0   4.0 |
 |  0.0   1.0   2.0 |
 |  3.0   1.0  -3.0 |
 | -1.0   2.0   1.0 |
 *                  *

Output

Rectangular 5 × 3 matrix B:

 *                     *
 |   6.0   10.0   -2.0 |
 | -16.0   -1.0    6.0 |
 |  -2.0    1.0   -4.0 |
 |  14.0    0.0  -14.0 |
 |  -1.0    2.0    1.0 |
 *                     *

Example 2

This example computes b = Ab, where A is not a unit triangular matrix, and b is a column vector.

Array sections are specified for arguments a and b, resulting in the computation using a submatrix A starting at row 2 and column 2 in an array and a column vector b starting at element 2 in an array.

As in "Example", array data is block-cyclically distributed using a 2 × 2 process grid.

!HPF$ PROCESSORS PROC(2,2)
!HPF$ ALIGN B(:) WITH A(:,1)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: A
 
CALL TRMM( A(2:13,2:13) , B(2:13) , 'U' )
-or-
CALL TRMM( A(2:13,2:13) , B(2:13) , 'U' , TRANSA='N' , DIAG='N' )

Input

Only a portion of the data structure is used--that is, submatrix A. Following is the triangular submatrix A of order 12, starting at row 2 and column 2 in the array of order 13:

 *                                                                 *
 |  .    .    .    .    .    .    .    .    .    .    .    .    .  |
 |  .   1.0  2.0  1.0  2.0  1.0  1.0  3.0  1.0  1.0  2.0  3.0  2.0 |
 |  .    .   3.0  2.0  3.0  1.0  2.0  3.0  1.0  1.0  2.0  3.0  3.0 |
 |  .    .    .   3.0  1.0  3.0  2.0  1.0  2.0  1.0  2.0  3.0  1.0 |
 |  .    .    .    .   1.0  2.0  2.0  1.0  1.0  1.0  2.0  3.0  2.0 |
 |  .    .    .    .    .   2.0  1.0  2.0  2.0  1.0  2.0  3.0  3.0 |
 |  .    .    .    .    .    .   1.0  2.0  1.0  1.0  2.0  3.0  1.0 |
 |  .    .    .    .    .    .    .   2.0  1.0  1.0  2.0  3.0  2.0 |
 |  .    .    .    .    .    .    .    .   2.0  1.0  2.0  3.0  3.0 |
 |  .    .    .    .    .    .    .    .    .   3.0  1.0  3.0  1.0 |
 |  .    .    .    .    .    .    .    .    .    .   2.0  2.0  2.0 |
 |  .    .    .    .    .    .    .    .    .    .    .   1.0  3.0 |
 |  .    .    .    .    .    .    .    .    .    .    .    .   1.0 |
 *                                                                 *

Only a portion of the data structure is used--that is, vector b, which is a column vector. Following is the vector b of size 12, starting at element 2 in the array of size 13:

 *     *
 |  .  |
 | 2.0 |
 | 3.0 |
 | 1.0 |
 | 2.0 |
 | 3.0 |
 | 1.0 |
 | 2.0 |
 | 3.0 |
 | 1.0 |
 | 2.0 |
 | 3.0 |
 | 1.0 |
 *     *

Output

Only a portion of the data structure is used--that is, vector b, which is a column vector. Following is the vector b of size 12, starting at element 2 in the array of size 13:

 *      *
 |   .  |
 | 42.0 |
 | 48.0 |
 | 39.0 |
 | 31.0 |
 | 34.0 |
 | 23.0 |
 | 23.0 |
 | 23.0 |
 | 15.0 |
 | 12.0 |
 |  6.0 |
 |  1.0 |
 *      *
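The array sections in Example 2 can be checked serially. In the Python sketch below, which is not HPF and ignores the distribution, the Fortran sections A(2:13,2:13) and B(2:13) become the 0-based, end-exclusive slices [1:13], and element 1 of B is left untouched. The variable names are illustrative.

```python
# Serial check of TRMM Example 2 (uplo='U', transa='N', diag='N'); not HPF code.

# Upper triangle of submatrix A, one row per line, each starting at the diagonal.
UPPER_ROWS = [
    [1, 2, 1, 2, 1, 1, 3, 1, 1, 2, 3, 2],
    [3, 2, 3, 1, 2, 3, 1, 1, 2, 3, 3],
    [3, 1, 3, 2, 1, 2, 1, 2, 3, 1],
    [1, 2, 2, 1, 1, 1, 2, 3, 2],
    [2, 1, 2, 2, 1, 2, 3, 3],
    [1, 2, 1, 1, 2, 3, 1],
    [2, 1, 1, 2, 3, 2],
    [2, 1, 2, 3, 3],
    [3, 1, 3, 1],
    [2, 2, 2],
    [1, 3],
    [1],
]

# Whole 13 x 13 array; None marks elements the call never references.
full_a = [[None] * 13 for _ in range(13)]
for k, row in enumerate(UPPER_ROWS):
    full_a[k + 1][k + 1:] = [float(v) for v in row]
full_b = [None] + [2.0, 3.0, 1.0] * 4

# Fortran sections A(2:13,2:13) and B(2:13) correspond to Python slices [1:13].
a_sec = [row[1:13] for row in full_a[1:13]]
b_sec = full_b[1:13]

# b <-- A b, reading only a_sec[i][j] with j >= i; results land in elements 2-13.
full_b[1:13] = [sum(a_sec[i][j] * b_sec[j] for j in range(i, 12)) for i in range(12)]
print(full_b)
```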

TRSM--Solution of Triangular System of Equations with Multiple Right-Hand Sides

This subroutine performs one of the following solves for a triangular system of equations with multiple right-hand sides:

Solution	Equation
1. B <-- alpha (A^-1) B	AX = alpha B
2. B <-- alpha (A^-T) B	A^T X = alpha B
3. B <-- alpha B (A^-1)	XA = alpha B
4. B <-- alpha B (A^-T)	XA^T = alpha B
5. b <-- (A^-1) b	Ax = b
6. b <-- (A^-T) b	A^T x = b

where, in the formulas above:

A is a triangular matrix.

B is a general matrix.

b is a vector.

alpha is a scalar.

Notes:

The term X or x used in the systems of equations listed above represents the output solution matrix or vector, respectively. It is important to note that, in this subroutine, the solution matrix or vector is actually returned in the input-output argument b.
No data should be moved to form the matrix transpose; that is, the matrix should always be stored in its untransposed form.
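Serially, each of these solves is forward or back substitution applied one column of B at a time. The Python sketch below covers side = 'L' (solutions 1, 2, 5, and 6); it is not Parallel ESSL code, the names trsv_ref and trsm_left_ref are hypothetical, and alpha is a trailing keyword for convenience. If op(A) is lower triangular (uplo = 'L' with transa = 'N', or uplo = 'U' with transa = 'T'), the solve runs forward; otherwise it runs backward. The solution overwrites a copy of the right-hand side, echoing how TRSM returns it in b.

```python
# Serial sketch of TRSM with side = 'L' (hypothetical names; not Parallel ESSL).

def trsv_ref(a, b, uplo, transa='N', diag='N', alpha=1.0):
    """Solve op(A) x = alpha*b for one right-hand side and return x.
    Only the uplo triangle of a is read; diag='U' uses 1.0 on the diagonal."""
    uplo, diag = uplo.upper(), diag.upper()
    trans = transa.upper() in ('T', 'C')           # 'C' is treated as 'T'
    n = len(a)

    def op(i, j):                                  # entry (i, j) of op(A)
        r, c = (j, i) if trans else (i, j)
        if r == c:
            return 1.0 if diag == 'U' else a[r][c]
        return a[r][c] if (uplo == 'U') == (r < c) else 0.0

    lower = (uplo == 'L') != trans                 # is op(A) lower triangular?
    x = [alpha * v for v in b]
    for i in (range(n) if lower else range(n - 1, -1, -1)):
        solved = range(i) if lower else range(i + 1, n)
        x[i] = (x[i] - sum(op(i, k) * x[k] for k in solved)) / op(i, i)
    return x

def trsm_left_ref(a, bmat, uplo, transa='N', diag='N', alpha=1.0):
    """Solutions 1 and 2: solve column by column and reassemble B."""
    cols = [trsv_ref(a, [row[j] for row in bmat], uplo, transa, diag, alpha)
            for j in range(len(bmat[0]))]
    return [[col[i] for col in cols] for i in range(len(bmat))]

# Example 1 data: upper triangular A; None marks unreferenced elements.
N = None
A1 = [[3.0, -1.0, 2.0, 2.0, 1.0],
      [N, -2.0, 4.0, -1.0, 3.0],
      [N, N, -3.0, 0.0, 2.0],
      [N, N, N, 4.0, -2.0],
      [N, N, N, N, 1.0]]
B1 = [[6.0, 10.0, -2.0], [-16.0, -1.0, 6.0], [-2.0, 1.0, -4.0],
      [14.0, 0.0, -14.0], [-1.0, 2.0, 1.0]]
print(trsm_left_ref(A1, B1, 'U'))

# Example 2 data: unit lower triangular A. Its strictly lower part repeats
# 2, 3, 1 moving left from the diagonal, so it is generated rather than typed.
L2 = [[[1.0, 2.0, 3.0][(i - j) % 3] if j < i else N for j in range(12)]
      for i in range(12)]
b2 = [2.0, 7.0, 13.0, 15.0, 17.0, 26.0, 28.0, 27.0, 39.0, 41.0, 37.0, 52.0]
print(trsv_ref(L2, b2, 'L', diag='U'))
```

Since the diagonal of L2 is never read when diag = 'U', it can hold anything; here it holds None.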

If any of the assumed-shape arrays have a size of zero, no computation is performed, and the subroutine returns after doing some parameter checking.

See references [17], [30], [31], and [44].

Table 115. Data Types

alpha, A, B, b	Subprogram
Long-precision real	TRSM

Syntax

HPF	Solutions 1-4	CALL TRSM (`alpha`, `a`, `b`, `uplo`, `side`)
		CALL TRSM (`alpha`, `a`, `b`, `uplo`, `side`, `transa`, `diag`)
HPF	Solutions 5 and 6	CALL TRSM (`a`, `b`, `uplo`)
		CALL TRSM (`a`, `b`, `uplo`, `transa`, `diag`)

On Entry

alpha

is the scalar alpha.

Type: required (solutions 1-4); not present (solutions 5 and 6)

Specified as: a number of the data type indicated in Table 115.

a

is the triangular matrix A used in the system of equations, where:

If uplo = 'U', the array contains the upper triangle of the triangular matrix A in its upper triangle, and its strictly lower triangular part is not referenced.

If uplo = 'L', the array contains the lower triangle of the triangular matrix A in its lower triangle, and its strictly upper triangular part is not referenced.

Type: required

Specified as: an assumed-shape array with shape (:,:), containing numbers of the data type indicated in Table 115, where size(a,1) = size(a,2).

b

is the general matrix B or the vector b, containing the right-hand side(s) of the triangular system to be solved.

Type: required

Specified as: an assumed-shape array with shape (:,:) or (:), containing numbers of the data type indicated in Table 115.

uplo

indicates whether the upper or lower triangular part of the triangular matrix A is referenced, where:

If uplo = 'U', the upper triangular part is referenced.

If uplo = 'L', the lower triangular part is referenced.

Type: required

Specified as: a single character; uplo = 'U' or 'L'.

side

indicates whether A is located to the left or right of B in the system of equations, where:

If side = 'L', A is to the left of B, resulting in solution 1 or 2.

If side = 'R', A is to the right of B, resulting in solution 3 or 4.

Type: required (solutions 1-4); not present (solutions 5 and 6)

Specified as: a single character; side = 'L' or 'R'.

transa

indicates the form of matrix A used in the system of equations, where:

If transa = 'N', A is used in the system of equations, resulting in solution 1, 3, or 5.

If transa = 'T', A^T is used in the system of equations, resulting in solution 2, 4, or 6.

Type: optional

Default: transa = 'N'

Specified as: a single character; transa = 'N' or 'T'.

diag

indicates the characteristics of the diagonal of matrix A, where:

If diag = 'U', A is a unit triangular matrix.

If diag = 'N', A is not a unit triangular matrix.

Type: optional

Default: diag = 'N'

Specified as: a single character; diag = 'U' or 'N'.

On Return

b

is the updated matrix B or vector b, containing the solution vector(s).

Type: required

Returned as: an assumed-shape array with shape (:,:) or (:), containing numbers of the data type indicated in Table 115.

Notes and Coding Rules

The assumed-shape arrays must have the exact size required for the computation, that is:
- For solutions 1 through 4:
  - If side = 'L', size(b,1) = size(a,1) = size(a,2)
  - If side = 'R', size(b,2) = size(a,1) = size(a,2)
- For solutions 5 and 6: size(b) = size(a,1) = size(a,2)
For migration purposes, note that the side and uplo arguments appear in reverse order from the corresponding BLAS and PBLAS subroutines.
This subroutine accepts lowercase letters for the side, uplo, transa, and diag arguments.
If you specify 'C' for transa, it is interpreted as though you specified 'T'.
The assumed-shape arrays must have no common elements; otherwise, results are unpredictable.
This subroutine assumes certain values in your array for parts of a triangular matrix. As a result, you do not have to set these values. For unit triangular matrices, the elements of the diagonal are assumed to be one. When using an upper or lower triangular matrix, the unreferenced elements in the strictly lower or upper triangular part, respectively, are assumed to be zero.
For details on how to set up and code your HPF program using Parallel ESSL, see "Coding Your HPF Program".
Block-cyclic data distribution is required for your array data. Because data directives are included in the interface module PESSL_HPF, you can specify any data distribution for your vectors and matrices, and the XL HPF compiler will, if necessary, redistribute the data prior to calling this subroutine. For how to code your HPF directives, see "Distributing Data in an HPF Program". For a sample program including directives, see Figure 9.
The restrictions given in "Notes and Coding Rules" and "Notes and Coding Rules" also apply to this subroutine.

Error Conditions

HPF-specific errors are listed below. Resource and input-argument errors listed in "Error Conditions" and "Error Conditions" also apply to this subroutine.

Input-Argument Errors for Solutions 1-4

Stage 1

The rank of the ultimate align target is greater than 2 for a or b.
The process rank is not the same for a and b.
The process rank is not 1 or 2 for a or b.

Stage 2

The process grid is not the same for a and b.

Stage 3

The data distribution is inconsistent for a and b.

Stage 4

side is present, and side is neither 'L' nor 'R'.
side = 'L' or 'R', and the shape of the assumed-shape arrays a and b is incompatible:
1. side = 'L' and:
  
  size(b,1) <> size(a,2) or
  size(a,1) <> size(a,2)
2. side = 'R' and:
  
  size(b,2) <> size(a,1) or
  size(a,1) <> size(a,2)
The shape of the assumed-shape array for a is invalid: size(a,1) <> size(a,2)

Stage 5

The data distribution for a or b is unsupported.

Input-Argument Errors for Solutions 5 and 6

Stage 1

The rank of the ultimate align target is greater than 2 for a or b.
The process rank is not the same for a and b.
The process rank is not 1 or 2 for a or b.

Stage 2

The process grid is not the same for a and b.

Stage 3

The data distribution is inconsistent for a and b.

Stage 4

The shape of the assumed-shape array for a is invalid: size(a,1) <> size(a,2)

Stage 5

The data distribution for a or b is unsupported.

Stage 6

The shape of the assumed-shape arrays a and b is incompatible: size(a,1) <> size(b)

Example 1

This example shows the solution B <-- alpha (A^-1) B. As in "Example", array data is block-cyclically distributed using a 2 × 2 process grid.

!HPF$ PROCESSORS PROC(2,2)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: A, B
 
CALL TRSM(  1.0D0 , A , B , 'U' , 'L' )
-or-
CALL TRSM(  1.0D0 , A , B , 'U' , 'L' , TRANSA='N' , DIAG='N' )

Input

Triangular matrix A of order 5 is upper triangular:

 *                             *
 | 3.0  -1.0   2.0   2.0   1.0 |
 |  .   -2.0   4.0  -1.0   3.0 |
 |  .     .   -3.0   0.0   2.0 |
 |  .     .     .    4.0  -2.0 |
 |  .     .     .     .    1.0 |
 *                             *

General 5 × 3 matrix B:

 *                     *
 |   6.0   10.0   -2.0 |
 | -16.0   -1.0    6.0 |
 |  -2.0    1.0   -4.0 |
 |  14.0    0.0  -14.0 |
 |  -1.0    2.0    1.0 |
 *                     *

Output

General 5 × 3 matrix B:

 *                  *
 |  2.0   3.0   1.0 |
 |  5.0   5.0   4.0 |
 |  0.0   1.0   2.0 |
 |  3.0   1.0  -3.0 |
 | -1.0   2.0   1.0 |
 *                  *

Example 2

This example solves b <-- (A^-1) b, where A is a unit triangular matrix, and b is a row vector.

Array sections are specified for arguments a and b, resulting in the computation using a submatrix A starting at row 2 and column 2 in an array and a row vector b starting at element 2 in an array.

As in "Example", array data is block-cyclically distributed using a 2 × 2 process grid.

!HPF$ PROCESSORS PROC(2,2)
!HPF$ ALIGN B(:) WITH A(1,:)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: A
 
CALL TRSM( A(2:13,2:13) , B(2:13) , 'L' , DIAG='U' )
-or-
CALL TRSM( A(2:13,2:13) , B(2:13) , 'L' , TRANSA='N' , DIAG='U' )

Input

Only a portion of the data structure is used--that is, submatrix A. Following is the triangular submatrix A of order 12, starting at row 2 and column 2 in the array of order 13:

 *                                                                 *
 |  .    .    .    .    .    .    .    .    .    .    .    .    .  |
 |  .   1.0   .    .    .    .    .    .    .    .    .    .    .  |
 |  .   2.0  1.0   .    .    .    .    .    .    .    .    .    .  |
 |  .   3.0  2.0  1.0   .    .    .    .    .    .    .    .    .  |
 |  .   1.0  3.0  2.0  1.0   .    .    .    .    .    .    .    .  |
 |  .   2.0  1.0  3.0  2.0  1.0   .    .    .    .    .    .    .  |
 |  .   3.0  2.0  1.0  3.0  2.0  1.0   .    .    .    .    .    .  |
 |  .   1.0  3.0  2.0  1.0  3.0  2.0  1.0   .    .    .    .    .  |
 |  .   2.0  1.0  3.0  2.0  1.0  3.0  2.0  1.0   .    .    .    .  |
 |  .   3.0  2.0  1.0  3.0  2.0  1.0  3.0  2.0  1.0   .    .    .  |
 |  .   1.0  3.0  2.0  1.0  3.0  2.0  1.0  3.0  2.0  1.0   .    .  |
 |  .   2.0  1.0  3.0  2.0  1.0  3.0  2.0  1.0  3.0  2.0  1.0   .  |
 |  .   3.0  2.0  1.0  3.0  2.0  1.0  3.0  2.0  1.0  3.0  2.0  1.0 |
 *                                                                 *

Note:

Because matrix A is unit triangular, the diagonal elements are not referenced. This subroutine assumes a value of 1.0 for the diagonal elements.

Only a portion of the data structure is used--that is, vector b, which is a row vector. Following is the vector b of size 12, starting at element 2 in the array of size 13:

 *                                                                             *
 |  .    2.0   7.0  13.0  15.0  17.0  26.0  28.0  27.0  39.0  41.0  37.0  52.0 |
 *                                                                             *

Output

Only a portion of the data structure is used--that is, vector b, which is a row vector. Following is the vector b of size 12, starting at element 2 in the array of size 13:

 *                                                                 *
 |  .   2.0  3.0  1.0  2.0  3.0  1.0  2.0  3.0  1.0  2.0  3.0  1.0 |
 *                                                                 *

SYRK--Rank-K Update of a Real Symmetric Matrix

This subroutine computes one of the following rank-k updates:

1. C <-- alpha AA^T + beta C

2. C <-- alpha A^T A + beta C

3. C <-- alpha aa^T + C

where, in the formulas above:

A is a general matrix.

C is a symmetric matrix.

a is a vector.

alpha and beta are scalars.

Note: No data should be moved to form the matrix transpose; that is, the matrix should always be stored in its untransposed form.

In the following cases, no computation is performed and the subroutine returns after doing some parameter checking:

For equations 1 and 2:
- Any of the assumed-shape arrays have a size of zero.
- alpha is zero and beta is one.
For equation 3:
- Any of the assumed-shape arrays have a size of zero.
- alpha is zero.
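A serial Python sketch of the three SYRK updates follows, including the quick-return cases just listed. It is not Parallel ESSL code; syrk_ref is a hypothetical name, passing beta=None is this sketch's own way of selecting equation 3, and c is updated in place in the uplo triangle only, which suffices because the result is symmetric.

```python
# Serial sketch of SYRK (hypothetical name; not Parallel ESSL).

def syrk_ref(alpha, a, c, uplo, beta=None, trans='N'):
    """Equations 1 and 2 when beta is given (a is a list of rows);
    equation 3 when beta is None (a is a flat list). Updates the uplo
    triangle of c in place and returns c."""
    uplo = uplo.upper()
    trans = 'T' if trans.upper() in ('T', 'C') else 'N'   # 'C' acts as 'T'
    vector = beta is None
    # Quick returns: a size-zero array, or an update that changes nothing.
    if len(c) == 0 or len(a) == 0 or (not vector and len(a[0]) == 0):
        return c
    if alpha == 0.0 and (vector or beta == 1.0):
        return c
    if vector:                                   # (a a^T)(i, j)
        def prod(i, j):
            return a[i] * a[j]
    elif trans == 'N':                           # (A A^T)(i, j): rows i and j of A
        def prod(i, j):
            return sum(x * y for x, y in zip(a[i], a[j]))
    else:                                        # (A^T A)(i, j): columns i and j of A
        def prod(i, j):
            return sum(row[i] * row[j] for row in a)
    n = len(c)
    for i in range(n):
        for j in (range(i, n) if uplo == 'U' else range(i + 1)):
            if vector:
                c[i][j] = alpha * prod(i, j) + c[i][j]
            elif beta == 0.0:                    # C need not be set on input
                c[i][j] = alpha * prod(i, j)
            else:
                c[i][j] = alpha * prod(i, j) + beta * c[i][j]
    return c

# Example 1: A(i,j) = (i-1) + 8(j-1) in 1-based terms; the lower triangle of C
# holds 0, 1, 2, ... numbered down each column in turn.
A = [[float(i + 8 * j) for j in range(5)] for i in range(8)]
C = [[None] * 8 for _ in range(8)]
k = 0
for j in range(8):
    for i in range(j, 8):
        C[i][j] = float(k)
        k += 1
syrk_ref(1.0, A, C, 'L', beta=1.0)
print(C[0][0], C[1][0], C[7][7])   # 1920.0 2001.0 3320.0, as in the Output
```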

See references [17], [30], [31], and [44].

Table 116. Data Types

alpha, beta, A, C, a	Subprogram
Long-precision real	SYRK

Syntax

HPF	Equations 1 and 2	CALL SYRK (`alpha`, `a`, `beta`, `c`, `uplo`)
		CALL SYRK (`alpha`, `a`, `beta`, `c`, `uplo`, `trans`)
HPF	Equation 3	CALL SYRK (`alpha`, `a`, `c`, `uplo`)

On Entry

alpha

is the scalar alpha.

Type: required

Specified as: a number of the data type indicated in Table 116.

a

is the general matrix A or the vector a.

Type: required

Specified as: an assumed-shape array with shape (:,:) or (:), containing numbers of the data type indicated in Table 116.

beta

is the scalar beta.

Type: required (equations 1 and 2); not present (equation 3)

Specified as: a number of the data type indicated in Table 116.

c

is the symmetric matrix C, where:

If uplo = 'U', the array contains the upper triangle of the symmetric matrix C in its upper triangle, and its strictly lower triangular part is not referenced.

If uplo = 'L', the array contains the lower triangle of the symmetric matrix C in its lower triangle, and its strictly upper triangular part is not referenced.

For equations 1 and 2, when beta is zero, C need not be set on input.

Type: required

Specified as: an assumed-shape array with shape (:,:), containing numbers of the data type indicated in Table 116, where size(c,1) = size(c,2).

uplo

indicates whether the upper or lower triangular part of the symmetric matrix C is referenced, where:

If uplo = 'U', the upper triangular part is referenced.

If uplo = 'L', the lower triangular part is referenced.

Type: required

Specified as: a single character; uplo = 'U' or 'L'.

trans

indicates which computation is performed, where:

If trans = 'N', the computation in equation 1 is performed.

If trans = 'T', the computation in equation 2 is performed.

Type: optional (equations 1 and 2); not present (equation 3)

Default: trans = 'N'

Specified as: a single character; trans = 'N' or 'T'.

On Return

c

is the updated symmetric matrix C, containing the results of the computation.

Type: required

Returned as: an assumed-shape array with shape (:,:), containing numbers of the data type indicated in Table 116.

Notes and Coding Rules

The assumed-shape arrays must have the exact size required for the computation, that is:
- For equations 1 and 2:
  - If trans = 'N', size(c,1) = size(c,2) = size(a,1)
  - If trans = 'T', size(c,1) = size(c,2) = size(a,2)
- For equation 3: size(c,1) = size(c,2) = size(a)
This subroutine accepts lowercase letters for the uplo and trans arguments.
If you specify 'C' for the trans argument, it is interpreted as though you specified 'T'.
The assumed-shape arrays must have no common elements; otherwise, results are unpredictable.
For details on how to set up and code your HPF program using Parallel ESSL, see "Coding Your HPF Program".
Block-cyclic data distribution is required for your array data. Because data directives are included in the interface module PESSL_HPF, you can specify any data distribution for your vectors and matrices, and the XL HPF compiler will, if necessary, redistribute the data prior to calling this subroutine. For how to code your HPF directives, see "Distributing Data in an HPF Program". For a sample program including directives, see Figure 9.
The restrictions given in "Notes and Coding Rules" and "Notes and Coding Rules" also apply to this subroutine.

Error Conditions

HPF-specific errors are listed below. Resource and input-argument errors listed in "Error Conditions" and "Error Conditions" also apply to this subroutine.

Input-Argument Errors for Equations 1 and 2

Stage 1

The rank of the ultimate align target is greater than 2 for a or c.
The process rank is not the same for a and c.
The process rank is not 1 or 2 for a or c.

Stage 2

The process grid is not the same for a and c.

Stage 3

The data distribution is inconsistent for a and c.

Stage 4

trans is present, and trans is not 'N', 'T', or 'C'.
trans = 'N', 'T', or 'C', and the shape of the assumed-shape arrays a and c is incompatible:
1. trans = 'N':
  
  size(c,2) <> size(a,1) or
  size(c,1) <> size(c,2)
2. trans = 'T':
  
  size(c,2) <> size(a,2) or
  size(c,1) <> size(c,2)
The shape of the assumed-shape array for c is invalid: size(c,1) <> size(c,2)

Stage 5

The data distribution for a or c is unsupported.

Input-Argument Errors for Equation 3

Stage 1

The rank of the ultimate align target is greater than 2 for a or c.
The process rank is not the same for a and c.
The process rank is not 1 or 2 for a or c.

Stage 2

The process grid is not the same for a and c.

Stage 3

The data distribution is unsupported for c.

Stage 4

The vector for a is replicated.

Stage 5

The data distribution for a is unsupported.

Stage 6

The shape of the assumed-shape array for c is invalid: size(c,1) <> size(c,2)

Stage 7

The data distribution for c or a is unsupported.

Stage 8

The shape of the assumed-shape arrays c and a is incompatible: size(c,1) <> size(a)

Example 1

This example computes C = alpha AA^T + beta C. As in "Example", array data is block-cyclically distributed using a 2 × 3 process grid.

!HPF$ PROCESSORS PROC(2,3)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: A, C
 
CALL SYRK( 1.0D0 , A , 1.0D0 , C , UPLO='L' )
-or-
CALL SYRK( 1.0D0 , A , 1.0D0 , C , UPLO='L' , TRANS='N' )

Input

General 8 × 5 matrix A:

 *                             *
 | 0.0   8.0  16.0  24.0  32.0 |
 | 1.0   9.0  17.0  25.0  33.0 |
 | 2.0  10.0  18.0  26.0  34.0 |
 | 3.0  11.0  19.0  27.0  35.0 |
 | 4.0  12.0  20.0  28.0  36.0 |
 | 5.0  13.0  21.0  29.0  37.0 |
 | 6.0  14.0  22.0  30.0  38.0 |
 | 7.0  15.0  23.0  31.0  39.0 |
 *                             *

Symmetric matrix C of order 8:

 *                                               *
 | 0.0    .     .     .     .     .     .     .  |
 | 1.0   8.0    .     .     .     .     .     .  |
 | 2.0   9.0  15.0    .     .     .     .     .  |
 | 3.0  10.0  16.0  21.0    .     .     .     .  |
 | 4.0  11.0  17.0  22.0  26.0    .     .     .  |
 | 5.0  12.0  18.0  23.0  27.0  30.0    .     .  |
 | 6.0  13.0  19.0  24.0  28.0  31.0  33.0    .  |
 | 7.0  14.0  20.0  25.0  29.0  32.0  34.0  35.0 |
 *                                               *

Output

Symmetric matrix C of order 8:

 *                                                                *
 | 1920.0      .       .       .       .       .       .       .  |
 | 2001.0  2093.0      .       .       .       .       .       .  |
 | 2082.0  2179.0  2275.0      .       .       .       .       .  |
 | 2163.0  2265.0  2366.0  2466.0      .       .       .       .  |
 | 2244.0  2351.0  2457.0  2562.0  2666.0      .       .       .  |
 | 2325.0  2437.0  2548.0  2658.0  2767.0  2875.0      .       .  |
 | 2406.0  2523.0  2639.0  2754.0  2868.0  2981.0  3093.0      .  |
 | 2487.0  2609.0  2730.0  2850.0  2969.0  3087.0  3204.0  3320.0 |
 *                                                                *

Example 2

This example computes C = alpha aa^T + C. As in "Example", array data is block-cyclically distributed using a 2 × 2 process grid.

!HPF$ PROCESSORS PROC(2,2)
!HPF$ ALIGN A(:) WITH C(:,1)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: C
 
CALL SYRK( 1.0D0 , A , C , UPLO='L' )

Input

Symmetric matrix C of order 9:

 *                                                     *
 | 1.0    .     .     .     .     .     .     .     .  |
 | 2.0  12.0    .     .     .     .     .     .     .  |
 | 3.0  13.0  23.0    .     .     .     .     .     .  |
 | 4.0  14.0  24.0  34.0    .     .     .     .     .  |
 | 5.0  15.0  25.0  35.0  45.0    .     .     .     .  |
 | 6.0  16.0  26.0  36.0  46.0  56.0    .     .     .  |
 | 7.0  17.0  27.0  37.0  47.0  57.0  67.0    .     .  |
 | 8.0  18.0  28.0  38.0  48.0  58.0  68.0  78.0    .  |
 | 9.0  19.0  29.0  39.0  49.0  59.0  69.0  79.0  89.0 |
 *                                                     *

Vector a of size 9:

 *     *
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 *     *

Output

Symmetric matrix C of order 9:

 *                                                      *
 |  2.0    .     .     .     .     .     .     .     .  |
 |  3.0  13.0    .     .     .     .     .     .     .  |
 |  4.0  14.0  24.0    .     .     .     .     .     .  |
 |  5.0  15.0  25.0  35.0    .     .     .     .     .  |
 |  6.0  16.0  26.0  36.0  46.0    .     .     .     .  |
 |  7.0  17.0  27.0  37.0  47.0  57.0    .     .     .  |
 |  8.0  18.0  28.0  38.0  48.0  58.0  68.0    .     .  |
 |  9.0  19.0  29.0  39.0  49.0  59.0  69.0  79.0    .  |
 | 10.0  20.0  30.0  40.0  50.0  60.0  70.0  80.0  90.0 |
 *                                                      *
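Because alpha = 1 and every element of a is 1.0, each referenced entry of the lower triangle increases by exactly 1.0, which can be checked against the input and output above:

```latex
c_{ij} \leftarrow c_{ij} + \alpha\, a_i a_j = c_{ij} + 1.0 \cdot 1.0 \cdot 1.0,
\qquad\text{e.g. } c_{11} = 1.0 + 1.0 = 2.0,\quad c_{99} = 89.0 + 1.0 = 90.0
```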

SYR2K--Rank-2K Update of a Real Symmetric Matrix

This subroutine computes one of the following rank-2k updates:

1. C <-- alpha AB^T + alpha BA^T + beta C

2. C <-- alpha A^T B + alpha B^T A + beta C

3. C <-- alpha ab^T + alpha ba^T + C

where, in the formulas above:

A and B are general matrices.

C is a symmetric matrix.

a and b are vectors.

alpha and beta are scalars.

Note: No data should be moved to form the matrix transposes; that is, the matrices should always be stored in their untransposed forms.
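Equations 1 and 2 can be read directly as loops over the referenced triangle of C. The following is a minimal serial sketch in Fortran that illustrates only the arithmetic and the triangle-only update. The module syr2k_ref_mod and the routine syr2k_ref are hypothetical names and are not part of Parallel ESSL. The sketch has no HPF directives, no block-cyclic distribution and no argument checking. In keeping with the Note, it reads rows or columns of A and B in place instead of forming transposed copies.

```fortran
! Hypothetical serial reference for SYR2K equations 1 and 2.
! NOT Parallel ESSL: no HPF distribution, no PESSL_HPF module, no argument checks.
module syr2k_ref_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  subroutine syr2k_ref(alpha, a, b, beta, c, uplo, trans)
    real(dp), intent(in)    :: alpha, beta
    real(dp), intent(in)    :: a(:,:), b(:,:)   ! stored untransposed
    real(dp), intent(inout) :: c(:,:)
    character(len=1), intent(in) :: uplo, trans
    integer  :: i, j, ilo, ihi, n
    real(dp) :: s
    n = size(c, 1)
    do j = 1, n
      ! Visit only the referenced triangle of column j (anything but 'U'/'u' is treated as 'L').
      if (uplo == 'U' .or. uplo == 'u') then
        ilo = 1
        ihi = j
      else
        ilo = j
        ihi = n
      end if
      do i = ilo, ihi
        if (trans == 'N' .or. trans == 'n') then
          ! Equation 1: (AB^T + BA^T)(i,j) sums over the columns of A and B.
          s = sum(a(i,:) * b(j,:)) + sum(b(i,:) * a(j,:))
        else
          ! Equation 2 ('T', or 'C' treated as 'T'): (A^T B + B^T A)(i,j)
          ! sums over the rows of A and B, reading columns of A and B directly.
          s = sum(a(:,i) * b(:,j)) + sum(b(:,i) * a(:,j))
        end if
        if (beta == 0.0_dp) then
          c(i,j) = alpha * s             ! C need not be set when beta is zero
        else
          c(i,j) = alpha * s + beta * c(i,j)
        end if
      end do
    end do
  end subroutine syr2k_ref
end module syr2k_ref_mod
```

Equation 3 is the special case of equation 1 in which A and B are single columns and beta is one. In this sketch it could be exercised by passing reshape(a, [n, 1]) and reshape(b, [n, 1]) with beta = 1.0_dp.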

In the following cases, no computation is performed and the subroutine returns after doing some parameter checking:

For equations 1 and 2:
- All of the assumed-shape arrays have a size of zero.
- beta is one, and (alpha is zero or the assumed-shape arrays for a and b have a size of zero).
For equation 3:
- Any of the assumed-shape arrays have a size of zero.
- alpha is zero.
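The no-computation cases above can be restated as two predicates. This is a hypothetical sketch: the module syr2k_quick_mod and the functions noop_eq12 and noop_eq3 are illustrative names only. It assumes that "the assumed-shape arrays for a and b have a size of zero" means that both a and b are empty. It does not model the parameter checking that the subroutine still performs before returning.

```fortran
! Hypothetical restatement of the documented SYR2K "no computation" cases.
module syr2k_quick_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Equations 1 and 2: matrices a(:,:), b(:,:), c(:,:) and a beta argument.
  logical function noop_eq12(alpha, a, b, beta, c)
    real(dp), intent(in) :: alpha, beta, a(:,:), b(:,:), c(:,:)
    noop_eq12 = (size(a) == 0 .and. size(b) == 0 .and. size(c) == 0) .or. &
                (beta == 1.0_dp .and. (alpha == 0.0_dp .or. &
                 (size(a) == 0 .and. size(b) == 0)))
  end function noop_eq12

  ! Equation 3: vectors a(:), b(:), matrix c(:,:), and no beta argument.
  logical function noop_eq3(alpha, a, b, c)
    real(dp), intent(in) :: alpha, a(:), b(:), c(:,:)
    noop_eq3 = size(a) == 0 .or. size(b) == 0 .or. size(c) == 0 .or. &
               alpha == 0.0_dp
  end function noop_eq3
end module syr2k_quick_mod
```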

See references [17], [30], [31], and [44].

Table 117. Data Types

alpha, beta, A, B, C, a, b	Subprogram
Long-precision real	SYR2K

Syntax

HPF (equations 1 and 2)

CALL SYR2K (alpha, a, b, beta, c, uplo)
CALL SYR2K (alpha, a, b, beta, c, uplo, trans)

HPF (equation 3)

CALL SYR2K (alpha, a, b, c, uplo)

On Entry

alpha

is the scalar alpha.

Type: required

Specified as: a number of the data type indicated in Table 117.

a

is the general matrix A or the vector a.

Type: required

Specified as: an assumed-shape array with shape (:,:) or (:), containing numbers of the data type indicated in Table 117.

b

is the general matrix B or the vector b.

Type: required

Specified as: an assumed-shape array with shape (:,:) or (:), containing numbers of the data type indicated in Table 117.

beta

is the scalar beta.

Type: required (equations 1 and 2); not present (equation 3)

Specified as: a number of the data type indicated in Table 117.

c

is the symmetric matrix C, where:

If uplo = 'U', the array contains the upper triangle of the symmetric matrix C in its upper triangle, and its strictly lower triangular part is not referenced.

If uplo = 'L', the array contains the lower triangle of the symmetric matrix C in its lower triangle, and its strictly upper triangular part is not referenced.

For equations 1 and 2, when beta is zero, C need not be set on input.

Type: required

Specified as: an assumed-shape array with shape (:,:), containing numbers of the data type indicated in Table 117, where size(c,1) = size(c,2).

uplo

indicates whether the upper or lower triangular part of the symmetric matrix C is referenced, where:

If uplo = 'U', the upper triangular part is referenced.

If uplo = 'L', the lower triangular part is referenced.

Type: required

Specified as: a single character; uplo = 'U' or 'L'.

trans

indicates which computation is performed, where:

If trans = 'N', the computation in equation 1 is performed.

If trans = 'T', the computation in equation 2 is performed.

Type: optional (equations 1 and 2); not present (equation 3)

Default: trans = 'N'

Specified as: a single character; trans = 'N' or 'T'.

On Return

c

is the updated symmetric matrix C, containing the results of the computation.

Type: required

Returned as: an assumed-shape array with shape (:,:), containing numbers of the data type indicated in Table 117.

Notes and Coding Rules

The assumed-shape arrays must have the exact size required for the computation, that is:
- For equations 1 and 2:
  - If trans = 'N':
    - size(c,1) = size(c,2) = size(a,1) = size(b,1)
    - size(a,2) = size(b,2)
  - If trans = 'T':
    - size(c,1) = size(c,2) = size(a,2) = size(b,2)
    - size(a,1) = size(b,1)
- For equation 3: size(c,1) = size(c,2) = size(a) = size(b)
This subroutine accepts lowercase letters for the uplo and trans arguments.
If you specify 'C' for the trans argument, it is interpreted as though you specified 'T'.
The assumed-shape arrays must have no common elements; otherwise, results are unpredictable.
For details on how to set up and code your HPF program using Parallel ESSL, see "Coding Your HPF Program".
Block-cyclic data distribution is required for your array data. Because data directives are included in the interface module PESSL_HPF, you can specify any data distribution for your vectors and matrices, and the XL HPF compiler will, if necessary, redistribute the data prior to calling this subroutine. For how to code your HPF directives, see "Distributing Data in an HPF Program". For a sample program including directives, see Figure 9.
The restrictions given in "Notes and Coding Rules" and "Notes and Coding Rules" also apply to this subroutine.
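The exact-size rules and the handling of trans in the notes above can be expressed as two small checks. This is a hypothetical sketch: the module syr2k_check_mod and the functions normalize_trans and shapes_ok are illustrative names. Returning a blank character for an invalid trans is an assumption of the sketch and does not reflect how Parallel ESSL reports the error. The optional-argument default (trans = 'N') is not modeled.

```fortran
! Hypothetical checks mirroring the SYR2K Notes and Coding Rules (equations 1 and 2).
module syr2k_check_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Lowercase is accepted and 'C' is treated as 'T'; ' ' marks an invalid trans.
  character(len=1) function normalize_trans(trans)
    character(len=1), intent(in) :: trans
    select case (trans)
    case ('N', 'n')
      normalize_trans = 'N'
    case ('T', 't', 'C', 'c')
      normalize_trans = 'T'
    case default
      normalize_trans = ' '
    end select
  end function normalize_trans

  ! .true. when a, b, and c have exactly the sizes the computation requires.
  logical function shapes_ok(a, b, c, trans)
    real(dp), intent(in) :: a(:,:), b(:,:), c(:,:)
    character(len=1), intent(in) :: trans
    integer :: n
    n = size(c, 1)
    select case (normalize_trans(trans))
    case ('N')
      shapes_ok = size(c,2) == n .and. size(a,1) == n .and. size(b,1) == n &
                  .and. size(a,2) == size(b,2)
    case ('T')
      shapes_ok = size(c,2) == n .and. size(a,2) == n .and. size(b,2) == n &
                  .and. size(a,1) == size(b,1)
    case default
      shapes_ok = .false.
    end select
  end function shapes_ok
end module syr2k_check_mod
```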

Error Conditions

HPF-specific errors are listed below. Resource and input-argument errors listed in "Notes and Coding Rules" and "Notes and Coding Rules" also apply to this subroutine.

Input-Argument Errors for Equations 1 and 2

Stage 1

The rank of the ultimate align target is greater than 2 for a, b, or c.
The process rank is not the same for a, b, and c.
The process rank is not 1 or 2 for a, b, or c.

Stage 2

The process grid is not the same for a, b, and c.

Stage 3

The data distribution is inconsistent for a, b, and c.

Stage 4

trans is present, and trans <> 'N', 'T', or 'C'.
trans = 'N', 'T', or 'C', and the shape of the assumed-shape arrays for a, b, and c is incompatible:
1. trans = 'N':
  size(c,1) <> size(c,2) or
  size(c,2) <> size(a,1) or
  size(a,1) <> size(b,1) or
  size(a,2) <> size(b,2)
2. trans = 'T':
  size(c,1) <> size(c,2) or
  size(c,2) <> size(a,2) or
  size(a,2) <> size(b,2) or
  size(a,1) <> size(b,1)
The shape of the assumed-shape array for c is invalid: size(c,1) <> size(c,2)

Stage 5

The data distribution for a, b, or c is unsupported.

Input-Argument Errors for Equation 3

Stage 1

The rank of the ultimate align target is greater than 2 for a, b, or c.
The process rank is not the same for a, b, and c.
The process rank is not 1 or 2 for a, b, or c.

Stage 2

The process grid is not the same for a, b, and c.

Stage 3

The data distribution is unsupported for c.

Stage 4

The vector for a or b is replicated.

Stage 5

The data distribution is unsupported for a or b.

Stage 6

The shape of the assumed-shape arrays for a, b, and c is incompatible:

size(c,1) <> size(c,2) or
size(c,1) <> size(a) or
size(c,1) <> size(b)
The shape of the assumed-shape array for c is invalid: size(c,1) <> size(c,2)

Stage 7

The data distribution for a, b, or c is unsupported.

Example 1

This example computes C = alpha A^T B + alpha B^T A + beta C. As in "Example", array data is block-cyclically distributed using a 2 × 2 process grid.

!HPF$ PROCESSORS PROC(2,2)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: A, B, C
 
CALL SYR2K( 1.0D0 , A , B , 0.0D0 , C , 'U' , 'T' )

Input

General 8 × 9 matrix A:

 *                                                      *
 |  0.0  -1.0  -1.0   0.0   0.0   0.0   0.0   0.0   1.0 |
 |  0.0   1.0   0.0   1.0   0.0   1.0   0.0   1.0   1.0 |
 |  0.0   0.0  -1.0  -1.0   0.0   0.0   1.0   0.0   1.0 |
 |  0.0   1.0   0.0  -1.0   1.0   1.0   0.0   1.0   1.0 |
 |  1.0   0.0   0.0   0.0  -1.0   0.0   0.0   0.0   1.0 |
 |  1.0   0.0   0.0   0.0   1.0   1.0   0.0   0.0   1.0 |
 |  0.0   0.0  -1.0   0.0  -1.0   0.0   0.0   0.0   1.0 |
 | -1.0   0.0   0.0   0.0   0.0   0.0  -1.0   0.0   1.0 |
 *                                                      *

General 8 × 9 matrix B:

 *                                                      *
 |  0.0   1.0   1.0   0.0   0.0   0.0   0.0   0.0  -1.0 |
 |  0.0  -1.0   0.0  -1.0   0.0  -1.0   0.0  -1.0  -1.0 |
 |  0.0   0.0   1.0   1.0   0.0   0.0  -1.0   0.0  -1.0 |
 |  0.0  -1.0   0.0   1.0  -1.0  -1.0   0.0  -1.0  -1.0 |
 | -1.0   0.0   0.0   0.0   1.0   0.0   0.0   0.0  -1.0 |
 | -1.0   0.0   0.0   0.0  -1.0  -1.0   0.0   0.0  -1.0 |
 |  0.0   0.0   1.0   0.0   1.0   0.0   0.0   0.0  -1.0 |
 |  1.0   0.0   0.0   0.0   0.0   0.0   1.0   0.0  -1.0 |
 *                                                      *

Output

Symmetric matrix C of order 9:

 *                                                              *
 | -6.0    0.0    0.0    0.0    0.0   -2.0   -2.0    0.0   -2.0 |
 |    .   -6.0   -2.0    0.0   -2.0   -4.0    0.0   -4.0   -2.0 |
 |    .      .   -6.0   -2.0   -2.0    0.0    2.0    0.0    6.0 |
 |    .      .      .   -6.0    2.0    0.0    2.0    0.0    2.0 |
 |    .      .      .      .   -8.0   -4.0    0.0   -2.0    0.0 |
 |    .      .      .      .      .   -6.0    0.0   -4.0   -6.0 |
 |    .      .      .      .      .      .   -4.0    0.0    0.0 |
 |    .      .      .      .      .      .      .   -4.0   -4.0 |
 |    .      .      .      .      .      .      .      .  -16.0 |
 *                                                              *
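In this example every entry of B is the negative of the corresponding entry of A. The result therefore reduces to C = -2A^T A, and each diagonal entry is -2 times the squared norm of a column of A. For example, column 9 of A is all ones, which gives -16.0. The following hypothetical serial check reproduces the output. The module syr2k_example1_mod and its functions are illustrative names, not Parallel ESSL routines. The check uses the Fortran intrinsic transpose only to verify the numbers; the subroutine itself never forms a transposed copy.

```fortran
! Hypothetical serial check of SYR2K Example 1 (alpha = 1, beta = 0, trans = 'T').
module syr2k_example1_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! The 8 x 9 input matrix A from Example 1, entered one row per line.
  function example1_a() result(a)
    real(dp) :: a(8,9)
    a = transpose(reshape(real([ &
          0, -1, -1,  0,  0,  0,  0,  0,  1, &
          0,  1,  0,  1,  0,  1,  0,  1,  1, &
          0,  0, -1, -1,  0,  0,  1,  0,  1, &
          0,  1,  0, -1,  1,  1,  0,  1,  1, &
          1,  0,  0,  0, -1,  0,  0,  0,  1, &
          1,  0,  0,  0,  1,  1,  0,  0,  1, &
          0,  0, -1,  0, -1,  0,  0,  0,  1, &
         -1,  0,  0,  0,  0,  0, -1,  0,  1], dp), [9, 8]))
  end function example1_a

  ! Full 9 x 9 value of A^T B + B^T A (uplo = 'U' means only the upper triangle is returned).
  function example1_c() result(c)
    real(dp) :: c(9,9)
    real(dp) :: a(8,9), b(8,9)
    a = example1_a()
    b = -a   ! in Example 1, every entry of B is the negative of A
    c = matmul(transpose(a), b) + matmul(transpose(b), a)
  end function example1_c
end module syr2k_example1_mod
```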

Example 2

This example computes C = alpha ab^T + alpha ba^T + C. As in "Example", array data is block-cyclically distributed using a 2 × 2 process grid.

!HPF$ PROCESSORS PROC(2,2)
!HPF$ ALIGN A(:) WITH C(:,1)
!HPF$ ALIGN B(:) WITH C(:,1)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: C
 
CALL SYR2K( 1.0D0 , A , B , C , 'L' )

Input

Symmetric matrix C of order 9:

 *                                                     *
 | 1.0    .     .     .     .     .     .     .     .  |
 | 2.0  12.0    .     .     .     .     .     .     .  |
 | 3.0  13.0  23.0    .     .     .     .     .     .  |
 | 4.0  14.0  24.0  34.0    .     .     .     .     .  |
 | 5.0  15.0  25.0  35.0  45.0    .     .     .     .  |
 | 6.0  16.0  26.0  36.0  46.0  56.0    .     .     .  |
 | 7.0  17.0  27.0  37.0  47.0  57.0  67.0    .     .  |
 | 8.0  18.0  28.0  38.0  48.0  58.0  68.0  78.0    .  |
 | 9.0  19.0  29.0  39.0  49.0  59.0  69.0  79.0  89.0 |
 *                                                     *

Vector a of size 9:

 *     *
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 | 1.0 |
 *     *

Vector b of size 9:

 *     *
 | 2.0 |
 | 2.0 |
 | 2.0 |
 | 2.0 |
 | 2.0 |
 | 2.0 |
 | 2.0 |
 | 2.0 |
 | 2.0 |
 *     *

Output

Symmetric matrix C of order 9:

 *                                                      *
 |  5.0    .     .     .     .     .     .     .     .  |
 |  6.0  16.0    .     .     .     .     .     .     .  |
 |  7.0  17.0  27.0    .     .     .     .     .     .  |
 |  8.0  18.0  28.0  38.0    .     .     .     .     .  |
 |  9.0  19.0  29.0  39.0  49.0    .     .     .     .  |
 | 10.0  20.0  30.0  40.0  50.0  60.0    .     .     .  |
 | 11.0  21.0  31.0  41.0  51.0  61.0  71.0    .     .  |
 | 12.0  22.0  32.0  42.0  52.0  62.0  72.0  82.0    .  |
 | 13.0  23.0  33.0  43.0  53.0  63.0  73.0  83.0  93.0 |
 *                                                      *
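With alpha = 1, a filled with 1.0, and b filled with 2.0, every referenced entry of the lower triangle increases by the same amount, which matches the input and output above:

```latex
c_{ij} \leftarrow c_{ij} + \alpha\,(a_i b_j + b_i a_j) = c_{ij} + 1.0\,(1.0 \cdot 2.0 + 2.0 \cdot 1.0) = c_{ij} + 4.0,
\qquad\text{e.g. } c_{11} = 1.0 + 4.0 = 5.0,\quad c_{99} = 89.0 + 4.0 = 93.0
```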

TRAN--Matrix Transpose for a General Matrix

This subroutine performs the following matrix computation:

C <-- beta C + alpha A^T

where, in the formula above:

A and C are general matrices.

alpha and beta are scalars.

Note: No data should be moved to form the matrix transpose; that is, the matrix should always be stored in its untransposed form.
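The computation can be sketched serially by reading A(j,i) wherever A^T(i,j) is needed, so no transposed copy of A is formed. The module tran_ref_mod and the routine tran_ref are hypothetical names. This is a minimal illustration of the formula and the size rule, not the Parallel ESSL implementation, and it uses no HPF distribution.

```fortran
! Hypothetical serial reference for C <-- beta*C + alpha*A^T (not Parallel ESSL).
module tran_ref_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  subroutine tran_ref(alpha, a, beta, c)
    real(dp), intent(in)    :: alpha, beta, a(:,:)
    real(dp), intent(inout) :: c(:,:)
    integer :: i, j
    ! Exact sizes required: size(c,1) = size(a,2) and size(c,2) = size(a,1).
    if (size(c,1) /= size(a,2) .or. size(c,2) /= size(a,1)) &
      error stop 'tran_ref: incompatible shapes'
    do j = 1, size(c, 2)
      do i = 1, size(c, 1)
        if (beta == 0.0_dp) then
          c(i,j) = alpha * a(j,i)                  ! C need not be set when beta is zero
        else
          c(i,j) = beta * c(i,j) + alpha * a(j,i)  ! A^T(i,j) is read as A(j,i)
        end if
      end do
    end do
  end subroutine tran_ref
end module tran_ref_mod
```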

In the following two cases, no computation is performed and the subroutine returns after doing some parameter checking:

- All of the assumed-shape arrays have a size of zero.
- alpha is zero and beta is one.

See references [17], [30], [31], and [44].

Table 118. Data Types

alpha, beta, A, C	Subprogram
Long-precision real	TRAN

Syntax

HPF

CALL TRAN (alpha, a, beta, c)

On Entry

alpha

is the scalar alpha.

Type: required

Specified as: a number of the data type indicated in Table 118.

a

is the general matrix A.

Type: required

Specified as: an assumed-shape array with shape (:,:), containing numbers of the data type indicated in Table 118.

beta

is the scalar beta.

Type: required

Specified as: a number of the data type indicated in Table 118.

c

is the general matrix C. When beta is zero, C need not be set on input.

Type: required

Specified as: an assumed-shape array with shape (:,:), containing numbers of the data type indicated in Table 118.

On Return

c

is the updated general matrix C, containing the results of the computation.

Type: required

Returned as: an assumed-shape array with shape (:,:), containing numbers of the data type indicated in Table 118.

Notes and Coding Rules

The assumed-shape arrays must have the exact size required for the computation, that is:
- size(c,1) = size(a,2)
- size(c,2) = size(a,1)
The assumed-shape arrays must have no common elements; otherwise, results are unpredictable.
For details on how to set up and code your HPF program using Parallel ESSL, see "Coding Your HPF Program".
Block-cyclic data distribution is required for your array data. Because data directives are included in the interface module PESSL_HPF, you can specify any data distribution for your matrices, and the XL HPF compiler will, if necessary, redistribute the data prior to calling this subroutine. For how to code your HPF directives, see "Distributing Data in an HPF Program". For a sample program including directives, see Figure 9.
The restrictions given in "Notes and Coding Rules" also apply to this subroutine.

Error Conditions

Input-Argument Errors

Stage 1

The rank of the ultimate align target is greater than 2 for a or c.
The process rank is not the same for a and c.
The process rank is not 1 or 2 for a or c.

The shape of the assumed-shape arrays for a and c is incompatible:

size(c,1) <> size(a,2) or
size(c,2) <> size(a,1)

Stage 5

The data distribution for a or c is unsupported.

Example

This example computes C = beta C + alpha A^T. As in "Example", array data is block-cyclically distributed using a 2 × 2 process grid.

!HPF$ PROCESSORS PROC(2,2)
!HPF$ DISTRIBUTE (CYCLIC, CYCLIC) ONTO PROC :: A, C
 
CALL TRAN( 1.0D0 , A , 1.0D0 , C )

Input

General 8 × 9 matrix A:

 *                                                      *
 |  0.0  -1.0  -1.0   0.0   0.0   0.0   0.0   0.0   1.0 |
 |  0.0   1.0   0.0   1.0   0.0   1.0   0.0   1.0   1.0 |
 |  0.0   0.0  -1.0  -1.0   0.0   0.0   1.0   0.0   1.0 |
 |  0.0   1.0   0.0  -1.0   1.0   1.0   0.0   1.0   1.0 |
 |  1.0   0.0   0.0   0.0  -1.0   0.0   0.0   0.0   1.0 |
 |  1.0   0.0   0.0   0.0   1.0   1.0   0.0   0.0   1.0 |
 |  0.0   0.0  -1.0   0.0  -1.0   0.0   0.0   0.0   1.0 |
 | -1.0   0.0   0.0   0.0   0.0   0.0  -1.0   0.0   1.0 |
 *                                                      *

General 9 × 8 matrix C:

 *                                                *
 |  0.0   1.0   1.0   5.0   6.0   7.0   8.0   9.0 |
 |  0.0  -1.0   0.0  -1.0   0.0  -1.0   0.0   1.0 |
 |  0.0   0.0   1.0   1.0   0.0   0.0  -1.0   0.0 |
 |  0.0  -1.0   0.0   1.0  -1.0  -1.0   0.0   1.0 |
 | -1.0   2.0   0.0   0.0   1.0   0.0   0.0   0.0 |
 | -1.0   3.0   0.0   0.0  -1.0  -1.0   0.0   0.0 |
 |  0.0   4.0   1.0   0.0   1.0   0.0   0.0   0.0 |
 |  1.0   5.0   0.0   0.0   0.0   0.0   1.0   0.0 |
 |  1.0   2.0   3.0   4.0   1.0   1.0   1.0   1.0 |
 *                                                *

Output

General 9 × 8 matrix C:

 *                                                *
 |  0.0   1.0   1.0   5.0   7.0   8.0   8.0   8.0 |
 | -1.0   0.0   0.0   0.0   0.0  -1.0   0.0   1.0 |
 | -1.0   0.0   0.0   1.0   0.0   0.0  -2.0   0.0 |
 |  0.0   0.0  -1.0   0.0  -1.0  -1.0   0.0   1.0 |
 | -1.0   2.0   0.0   1.0   0.0   1.0  -1.0   0.0 |
 | -1.0   4.0   0.0   1.0  -1.0   0.0   0.0   0.0 |
 |  0.0   4.0   2.0   0.0   1.0   0.0   0.0  -1.0 |
 |  1.0   6.0   0.0   1.0   0.0   0.0   1.0   0.0 |
 |  2.0   3.0   4.0   5.0   2.0   2.0   2.0   2.0 |
 *                                                *
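With alpha = beta = 1, each entry of C is incremented by the corresponding entry of A^T, that is, by a_ji. A few entries checked against the input and output above:

```latex
\begin{aligned}
c_{ij} &\leftarrow 1.0\,c_{ij} + 1.0\,a_{ji}\\
c_{15} &\leftarrow 6.0 + a_{51} = 6.0 + 1.0 = 7.0\\
c_{18} &\leftarrow 9.0 + a_{81} = 9.0 + (-1.0) = 8.0\\
c_{9j} &\leftarrow c_{9j} + a_{j9} = c_{9j} + 1.0 \quad (\text{column 9 of } A \text{ is all ones})
\end{aligned}
```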

