Guide and Reference

Performance and Accuracy Considerations

This section describes some key points about performance and accuracy in the matrix operations subroutines.

In General

The matrix operation subroutines use algorithms that are tuned specifically to the workstation processors they run on. The techniques involve using any one of several computational methods, based on certain operation counts and sizes of data.
The short-precision multiplication subroutines provide increased accuracy by partially accumulating results in long precision.
Strassen's method is not stable for certain row or column scalings of the input matrices A and B. Therefore, for matrices A and B with divergent exponent values, Strassen's method may give inaccurate results. For these cases, you should use the _GEMUL or _GEMM subroutines.
There are ESSL-specific rules that apply to the results of computations on the workstation processors using the ANSI/IEEE standards. For details, see "What Data Type Standards Are Used by ESSL, and What Exceptions Should You Know About?".

If you are using large square matrices in your matrix multiplication operations, you get better performance by using SGEMMS, DGEMMS, CGEMMS, and ZGEMMS. These subroutines use Winograd's variation of Strassen's algorithm for both real and complex matrices.

For Combined Operations

If you want to perform a combined matrix multiplication and addition with scaling, SGEMM, DGEMM, CGEMM, and ZGEMM provide better performance than if you perform the parts of the computation separately in your program. See references [32] and [35].

Matrix Operation Subroutines

This section contains the matrix operation subroutine descriptions.

SGEADD, DGEADD, CGEADD, and ZGEADD--Matrix Addition for General Matrices or Their Transposes

These subroutines can perform any one of the following matrix additions, using matrices A and B or their transposes, and matrix C:

C <-- A+B

C <-- A^T+B

C <-- A+B^T

C <-- A^T+B^T

Table 72. Data Types

A, B, C Subroutine
Short-precision real SGEADD
Long-precision real DGEADD
Short-precision complex CGEADD
Long-precision complex ZGEADD

Syntax

Fortran	CALL SGEADD \| DGEADD \| CGEADD \| ZGEADD (`a`, `lda`, `transa`, `b`, `ldb`, `transb`, `c`, `ldc`, `m`, `n`)
C and C++	sgeadd \| dgeadd \| cgeadd \| zgeadd (`a`, `lda`, `transa`, `b`, `ldb`, `transb`, `c`, `ldc`, `m`, `n`);
PL/I	CALL SGEADD \| DGEADD \| CGEADD \| ZGEADD (`a`, `lda`, `transa`, `b`, `ldb`, `transb`, `c`, `ldc`, `m`, `n`);

On Entry

a

is the matrix A, where:

If transa = 'N', A is used in the computation, and A has m rows and n columns.

If transa = 'T', A^T is used in the computation, and A has n rows and m columns.
Note: No data should be moved to form A^T; that is, the matrix A should always be stored in its untransposed form.

Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 72, where:

If transa = 'N', its size must be lda by (at least) n.

If transa = 'T', its size must be lda by (at least) m.

lda

is the leading dimension of the array specified for a. Specified as: a fullword integer; lda > 0 and:

If transa = 'N', lda >= m.

If transa = 'T', lda >= n.

transa

indicates the form of matrix A to use in the computation, where:

If transa = 'N', A is used in the computation.

If transa = 'T', A^T is used in the computation.

Specified as: a single character; transa = 'N' or 'T'.

b

is the matrix B, where:

If transb = 'N', B is used in the computation, and B has m rows and n columns.

If transb = 'T', B^T is used in the computation, and B has n rows and m columns.
Note: No data should be moved to form B^T; that is, the matrix B should always be stored in its untransposed form.

Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 72, where:

If transb = 'N', its size must be ldb by (at least) n.

If transb = 'T', its size must be ldb by (at least) m.

ldb

is the leading dimension of the array specified for b. Specified as: a fullword integer; ldb > 0 and:

If transb = 'N', ldb >= m.

If transb = 'T', ldb >= n.

transb

indicates the form of matrix B to use in the computation, where:

If transb = 'N', B is used in the computation.

If transb = 'T', B^T is used in the computation.

Specified as: a single character; transb = 'N' or 'T'.

c

See 'On Return'.

ldc

is the leading dimension of the array specified for c. Specified as: a fullword integer; ldc > 0 and ldc >= m.

m

is the number of rows in matrix C. Specified as: a fullword integer; 0 <= m <= ldc.

n

is the number of columns in matrix C. Specified as: a fullword integer; 0 <= n.

On Return

c: is the m by n matrix C, containing the results of the computation. Returned as: an ldc by (at least) n array, containing numbers of the data type indicated in Table 72.

Notes

All subroutines accept lowercase letters for the transa and transb arguments.
Matrix C must have no common elements with matrices A or B. However, C may (exactly) coincide with A if transa = 'N', and C may (exactly) coincide with B if transb = 'N'. Otherwise, results are unpredictable. See "Concepts".

Function

The matrix sum is expressed as follows, where a_ij, b_ij, and c_ij are elements of matrices A, B, and C, respectively:

c_ij = a_ij+b_ij for C <-- A+B

c_ij = a_ij+b_ji for C <-- A+B^T

c_ij = a_ji+b_ij for C <-- A^T+B

c_ij = a_ji+b_ji for C <-- A^T+B^T

for i = 1, m and j = 1, n

If m or n is 0, no computation is performed.

Special Usage

You can compute the transpose C^T of each of the four computations listed under "Function" by using the following matrix identities:

(A+B)^T = A^T+B^T

(A+B^T)^T = A^T+B

(A^T+B)^T = A+B^T

(A^T+B^T)^T = A+B

Be careful that your output array receiving C^T has dimensions large enough to hold the transposed matrix. See "Example 4".

Error Conditions

Computational Errors

None

Input-Argument Errors

lda, ldb, ldc <= 0
m, n < 0
m > ldc
transa, transb <> 'N' or 'T'
transa = 'N' and m > lda
transa = 'T' and n > lda
transb = 'N' and m > ldb
transb = 'T' and n > ldb

Example 1

This example shows the computation C <-- A+B, where A and C are contained in larger arrays A and C, respectively, and B is the same size as array B, in which it is contained.

Call Statement and Input

             A  LDA TRANSA  B  LDB TRANSB  C  LDC  M   N
             |   |    |     |   |    |     |   |   |   |
CALL SGEADD( A , 6 , 'N'  , B , 4 , 'N'  , C , 5 , 4 , 3 )

        *                              *
        | 110000.0  120000.0  130000.0 |
        | 210000.0  220000.0  230000.0 |
A    =  | 310000.0  320000.0  330000.0 |
        | 410000.0  420000.0  430000.0 |
        |       .         .         .  |
        |       .         .         .  |
        *                              *

        *                  *
        | 11.0  12.0  13.0 |
B    =  | 21.0  22.0  23.0 |
        | 31.0  32.0  33.0 |
        | 41.0  42.0  43.0 |
        *                  *

Output

        *                              *
        | 110011.0  120012.0  130013.0 |
        | 210021.0  220022.0  230023.0 |
C    =  | 310031.0  320032.0  330033.0 |
        | 410041.0  420042.0  430043.0 |
        |       .         .         .  |
        *                              *

Example 2

This example shows the computation C <-- A^T+B, where A, B, and C are the same size as arrays A, B, and C, in which they are contained.

Call Statement and Input

             A  LDA TRANSA  B  LDB TRANSB  C  LDC  M   N
             |   |    |     |   |    |     |   |   |   |
CALL SGEADD( A , 3 , 'T'  , B , 4 , 'N'  , C , 4 , 4 , 3 )

        *                                        *
        | 110000.0  120000.0  130000.0  140000.0 |
A    =  | 210000.0  220000.0  230000.0  240000.0 |
        | 310000.0  320000.0  330000.0  340000.0 |
        *                                        *

        *                  *
        | 11.0  12.0  13.0 |
B    =  | 21.0  22.0  23.0 |
        | 31.0  32.0  33.0 |
        | 41.0  42.0  43.0 |
        *                  *

Output

        *                              *
        | 110011.0  210012.0  310013.0 |
C    =  | 120021.0  220022.0  320023.0 |
        | 130031.0  230032.0  330033.0 |
        | 140041.0  240042.0  340043.0 |
        *                              *

Example 3

This example shows computation C <-- A+B^T, where A is contained in a larger array A, and B and C are the same size as arrays B and C, in which they are contained.

Call Statement and Input

             A  LDA TRANSA  B  LDB TRANSB  C  LDC  M   N
             |   |    |     |   |    |     |   |   |   |
CALL SGEADD( A , 5 , 'N'  , B , 3 , 'T'  , C , 4 , 4 , 3 )

        *                              *
        | 110000.0  120000.0  130000.0 |
        | 210000.0  220000.0  230000.0 |
A    =  | 310000.0  320000.0  330000.0 |
        | 410000.0  420000.0  430000.0 |
        |       .         .         .  |
        *                              *

        *                        *
        | 11.0  12.0  13.0  14.0 |
B    =  | 21.0  22.0  23.0  24.0 |
        | 31.0  32.0  33.0  34.0 |
        *                        *

Output

        *                              *
        | 110011.0  120021.0  130031.0 |
C    =  | 210012.0  220022.0  230032.0 |
        | 310013.0  320023.0  330033.0 |
        | 410014.0  420024.0  430034.0 |
        *                              *

Example 4

This example shows how to produce the transpose of the result of the computation performed in "Example 3", C <-- A+B^T, which uses the calling sequence:

      CALL SGEADD( A , 5 , 'N' , B , 3 , 'T' , C , 4 , 4 , 3 )

You instead code a calling sequence for C^T <-- A^T+B, as shown below, where the resulting matrix C^T in the output array CT is the transpose of the matrix in the output array C in "Example 3". Note that the array CT has dimensions large enough to receive the transposed matrix. For a description of all the matrix identities, see "Special Usage".

Call Statement and Input

             A  LDA TRANSA  B  LDB TRANSB  C   LDC  M   N
             |   |    |     |   |    |     |    |   |   |
CALL SGEADD( A , 5 , 'T'  , B , 3 , 'N'  , CT , 4 , 3 , 4 )

        *                              *
        | 110000.0  120000.0  130000.0 |
        | 210000.0  220000.0  230000.0 |
A    =  | 310000.0  320000.0  330000.0 |
        | 410000.0  420000.0  430000.0 |
        |       .         .         .  |
        *                              *

        *                        *
        | 11.0  12.0  13.0  14.0 |
B    =  | 21.0  22.0  23.0  24.0 |
        | 31.0  32.0  33.0  34.0 |
        *                        *

Output

        *                                        *
        | 110011.0  210012.0  310013.0  410014.0 |
CT   =  | 120021.0  220022.0  320023.0  420024.0 |
        | 130031.0  230032.0  330033.0  430034.0 |
        |       .         .         .         .  |
        *                                        *

Example 5

This example shows the computation C <-- A^T+B^T, where A, B, and C are the same size as the arrays A, B, and C, in which they are contained.

Call Statement and Input

             A  LDA TRANSA  B  LDB TRANSB  C  LDC  M   N
             |   |    |     |   |    |     |   |   |   |
CALL SGEADD( A , 3 , 'T'  , B , 3 , 'T'  , C , 4 , 4 , 3 )

        *                                        *
        | 110000.0  120000.0  130000.0  140000.0 |
A    =  | 210000.0  220000.0  230000.0  240000.0 |
        | 310000.0  320000.0  330000.0  340000.0 |
        *                                        *

        *                        *
        | 11.0  12.0  13.0  14.0 |
B    =  | 21.0  22.0  23.0  24.0 |
        | 31.0  32.0  33.0  34.0 |
        *                        *

Output

        *                              *
        | 110011.0  210021.0  310031.0 |
C    =  | 120012.0  220022.0  320032.0 |
        | 130013.0  230023.0  330033.0 |
        | 140014.0  240024.0  340034.0 |
        *                              *

Example 6

This example shows the computation C <-- A+B, where A, B, and C are contained in larger arrays A, B, and C, respectively, and the arrays contain complex data.

Call Statement and Input

             A  LDA TRANSA  B  LDB TRANSB  C  LDC  M   N
             |   |    |     |   |    |     |   |   |   |
CALL CGEADD( A , 6 , 'N'  , B , 5 , 'N'  , C , 5 , 4 , 3 )

        *                                    *
        | (1.0, 5.0)  (9.0, 2.0)  (1.0, 9.0) |
        | (2.0, 4.0)  (8.0, 3.0)  (1.0, 8.0) |
A    =  | (3.0, 3.0)  (7.0, 5.0)  (1.0, 7.0) |
        | (6.0, 6.0)  (3.0, 6.0)  (1.0, 4.0) |
        |     .           .           .      |
        |     .           .           .      |
        *                                    *

        *                                    *
        | (1.0, 8.0)  (2.0, 7.0)  (3.0, 2.0) |
        | (4.0, 4.0)  (6.0, 8.0)  (6.0, 3.0) |
B    =  | (6.0, 2.0)  (4.0, 5.0)  (4.0, 5.0) |
        | (7.0, 2.0)  (6.0, 4.0)  (1.0, 6.0) |
        |     .           .           .      |
        *                                    *

Output

        *                                         *
        |  (2.0, 13.0)  (11.0,  9.0)  (4.0, 11.0) |
        |  (6.0,  8.0)  (14.0, 11.0)  (7.0, 11.0) |
C    =  |  (9.0,  5.0)  (11.0, 10.0)  (5.0, 12.0) |
        | (13.0,  8.0)   (9.0, 10.0)  (2.0, 10.0) |
        |      .             .            .       |
        *                                         *

SGESUB, DGESUB, CGESUB, and ZGESUB--Matrix Subtraction for General Matrices or Their Transposes

These subroutines can perform any one of the following matrix subtractions, using matrices A and B or their transposes, and matrix C:

C <-- A-B

C <-- A^T-B

C <-- A-B^T

C <-- A^T-B^T

Table 73. Data Types

A, B, C Subroutine
Short-precision real SGESUB
Long-precision real DGESUB
Short-precision complex CGESUB
Long-precision complex ZGESUB

Syntax

Fortran	CALL SGESUB \| DGESUB \| CGESUB \| ZGESUB (`a`, `lda`, `transa`, `b`, `ldb`, `transb`, `c`, `ldc`, `m`, `n`)
C and C++	sgesub \| dgesub \| cgesub \| zgesub (`a`, `lda`, `transa`, `b`, `ldb`, `transb`, `c`, `ldc`, `m`, `n`);
PL/I	CALL SGESUB \| DGESUB \| CGESUB \| ZGESUB (`a`, `lda`, `transa`, `b`, `ldb`, `transb`, `c`, `ldc`, `m`, `n`);

On Entry

a

is the matrix A, where:

If transa = 'N', A is used in the computation, and A has m rows and n columns.

Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 73, where:

If transa = 'N', its size must be lda by (at least) n.

If transa = 'T', its size must be lda by (at least) m.

lda

is the leading dimension of the array specified for a. Specified as: a fullword integer; lda > 0 and:

If transa = 'N', lda >= m.

If transa = 'T', lda >= n.

transa

indicates the form of matrix A to use in the computation, where:

If transa = 'N', A is used in the computation.

If transa = 'T', A^T is used in the computation.

Specified as: a single character; transa = 'N' or 'T'.

b

is the matrix B, where:

If transb = 'N', B is used in the computation, and B has m rows and n columns.

Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 72, where:

If transb = 'N', its size must be ldb by (at least) n.

If transb = 'T', its size must be ldb by (at least) m.

ldb

is the leading dimension of the array specified for b. Specified as: a fullword integer; ldb > 0 and:

If transb = 'N', ldb >= m.

If transb = 'T', ldb >= n.

transb

indicates the form of matrix B to use in the computation, where:

If transb = 'N', B is used in the computation.

If transb = 'T', B^T is used in the computation.

Specified as: a single character; transb = 'N' or 'T'.

c

See 'On Return'.

ldc

is the leading dimension of the array specified for c. Specified as: a fullword integer; ldc > 0 and ldc >= m.

m

is the number of rows in matrix C. Specified as: a fullword integer; 0 <= m <= ldc.

n

is the number of columns in matrix C. Specified as: a fullword integer; 0 <= n.

On Return

c: is the m by n matrix C, containing the results of the computation. Returned as: an ldc by (at least) n array, containing numbers of the data type indicated in Table 73.

Notes

All subroutines accept lowercase letters for the transa and transb arguments.
Matrix C must have no common elements with matrices A or B. However, C may (exactly) coincide with A if transa = 'N', and C may (exactly) coincide with B if transb = 'N'. Otherwise, results are unpredictable. See "Concepts".

Function

The matrix subtraction is expressed as follows, where a_ij, b_ij, and c_ij are elements of matrices A, B, and C, respectively:

c_ij = a_ij-b_ij for C <-- A-B

c_ij = a_ij-b_ji for C <-- A-B^T

c_ij = a_ji-b_ij for C <-- A^T-B

c_ij = a_ji-b_ji for C <-- A^T-B^T

for i = 1, m and j = 1, n

If m or n is 0, no computation is performed.

Special Usage

You can compute the transpose C^T of each of the four computations listed under "Function" by using the following matrix identities:

(A-B)^T = A^T-B^T

(A-B^T)^T = A^T-B

(A^T-B)^T = A-B^T

(A^T-B^T)^T = A-B

Be careful that your output array receiving C^T has dimensions large enough to hold the transposed matrix. See "Example 5".

Error Conditions

Computational Errors

None

Input-Argument Errors

lda, ldb, ldc <= 0
m, n < 0
m > ldc
transa, transb <> 'N' or 'T'
transa = 'N' and m > lda
transa = 'T' and n > lda
transb = 'N' and m > ldb
transb = 'T' and n > ldb

Example 1

This example shows the computation C <-- A-B, where A and C are contained in larger arrays A and C, respectively, and B is the same size as array B, in which it is contained.

Call Statement and Input

             A  LDA TRANSA  B  LDB TRANSB  C  LDC  M   N
             |   |    |     |   |    |     |   |   |   |
CALL SGESUB( A , 6 , 'N'  , B , 4 , 'N'  , C , 5 , 4 , 3 )

        *                              *
        | 110000.0  120000.0  130000.0 |
        | 210000.0  220000.0  230000.0 |
A    =  | 310000.0  320000.0  330000.0 |
        | 410000.0  420000.0  430000.0 |
        |       .         .         .  |
        |       .         .         .  |
        *                              *

        *                     *
        | -11.0  -12.0  -13.0 |
B    =  | -21.0  -22.0  -23.0 |
        | -31.0  -32.0  -33.0 |
        | -41.0  -42.0  -43.0 |
        *                     *

Output

        *                              *
        | 110011.0  120012.0  130013.0 |
        | 210021.0  220022.0  230023.0 |
C    =  | 310031.0  320032.0  330033.0 |
        | 410041.0  420042.0  430043.0 |
        |       .         .         .  |
        *                              *

Example 2

This example shows the computation C <-- A^T-B, where A, B, and C are the same size as arrays A, B, and C, in which they are contained.

Call Statement and Input

             A  LDA TRANSA  B  LDB TRANSB  C  LDC  M   N
             |   |    |     |   |    |     |   |   |   |
CALL SGESUB( A , 3 , 'T'  , B , 4 , 'N'  , C , 4 , 4 , 3 )

        *                                        *
        | 110000.0  120000.0  130000.0  140000.0 |
A    =  | 210000.0  220000.0  230000.0  240000.0 |
        | 310000.0  320000.0  330000.0  340000.0 |
        *                                        *

        *                     *
        | -11.0  -12.0  -13.0 |
B    =  | -21.0  -22.0  -23.0 |
        | -31.0  -32.0  -33.0 |
        | -41.0  -42.0  -43.0 |
        *                     *

Output

        *                              *
        | 110011.0  210012.0  310013.0 |
C    =  | 120021.0  220022.0  320023.0 |
        | 130031.0  230032.0  330033.0 |
        | 140041.0  240042.0  340043.0 |
        *                              *

Example 3

This example shows computation C <-- A-B^T, where A is contained in a larger array A, and B and C are the same size as arrays B and C, in which they are contained.

Call Statement and Input

             A  LDA TRANSA  B  LDB TRANSB  C  LDC  M   N
             |   |    |     |   |    |     |   |   |   |
CALL SGESUB( A , 5 , 'N'  , B , 3 , 'T'  , C , 4 , 4 , 3 )

        *                              *
        | 110000.0  120000.0  130000.0 |
        | 210000.0  220000.0  230000.0 |
A    =  | 310000.0  320000.0  330000.0 |
        | 410000.0  420000.0  430000.0 |
        |       .         .         .  |
        *                              *

        *                            *
        | -11.0  -12.0  -13.0  -14.0 |
B    =  | -21.0  -22.0  -23.0  -24.0 |
        | -31.0  -32.0  -33.0  -34.0 |
        *                            *

Output

        *                              *
        | 110011.0  120021.0  130031.0 |
C    =  | 210012.0  220022.0  230032.0 |
        | 310013.0  320023.0  330033.0 |
        | 410014.0  420024.0  430034.0 |
        *                              *

Example 4

This example shows the computation C <-- A^T-B^T, where A, B, and C are the same size as the arrays A, B, and C, in which they are contained.

Call Statement and Input

             A  LDA TRANSA  B  LDB TRANSB  C  LDC  M   N
             |   |    |     |   |    |     |   |   |   |
CALL SGESUB( A , 3 , 'T'  , B , 3 , 'T'  , C , 4 , 4 , 3 )

        *                                        *
        | 110000.0  120000.0  130000.0  140000.0 |
A    =  | 210000.0  220000.0  230000.0  240000.0 |
        | 310000.0  320000.0  330000.0  340000.0 |
        *                                        *

        *                            *
        | -11.0  -12.0  -13.0  -14.0 |
B    =  | -21.0  -22.0  -23.0  -24.0 |
        | -31.0  -32.0  -33.0  -34.0 |
        *                            *

Output

        *                              *
        | 110011.0  210021.0  310031.0 |
C    =  | 120012.0  220022.0  320032.0 |
        | 130013.0  230023.0  330033.0 |
        | 140014.0  240024.0  340034.0 |
        *                              *

Example 5

This example shows how to produce the transpose of the result of the computation performed in "Example 4", C <-- A^T-B^T, which uses the calling sequence:

      CALL SGESUB( A , 3 , 'T' , B , 3 , 'T' , C , 4 , 4 , 3 )

You instead code a calling sequence for C^T <-- A-B, as shown below, where the resulting matrix C^T in the output array CT is the transpose of the matrix in the output array C in "Example 4". Note that the array CT has dimensions large enough to receive the transposed matrix. For a description of all the matrix identities, see "Special Usage".

Call Statement and Input

             A  LDA TRANSA  B  LDB TRANSB  C   LDC  M   N
             |   |    |     |   |    |     |    |   |   |
CALL SGESUB( A , 3 , 'N'  , B , 3 , 'N'  , CT , 3 , 3 , 4 )

        *                                        *
        | 110000.0  120000.0  130000.0  140000.0 |
A    =  | 210000.0  220000.0  230000.0  240000.0 |
        | 310000.0  320000.0  330000.0  340000.0 |
        *                                        *

        *                            *
        | -11.0  -12.0  -13.0  -14.0 |
B    =  | -21.0  -22.0  -23.0  -24.0 |
        | -31.0  -32.0  -33.0  -34.0 |
        *                            *

Output

        *                                        *
        | 110011.0  120012.0  130013.0  140014.0 |
CT   =  | 210021.0  220022.0  230023.0  240024.0 |
        | 310031.0  320032.0  330033.0  340034.0 |
        *                                        *

Example 6

This example shows the computation C <-- A-B, where A, B, and C are contained in larger arrays A, B, and C, respectively, and the arrays contain complex data.

Call Statement and Input

             A  LDA TRANSA  B  LDB TRANSB  C  LDC  M   N
             |   |    |     |   |    |     |   |   |   |
CALL CGESUB( A , 6 , 'N'  , B , 5 , 'N'  , C , 5 , 4 , 3 )

        *                                    *
        | (1.0, 5.0)  (9.0, 2.0)  (1.0, 9.0) |
        | (2.0, 4.0)  (8.0, 3.0)  (1.0, 8.0) |
A    =  | (3.0, 3.0)  (7.0, 5.0)  (1.0, 7.0) |
        | (6.0, 6.0)  (3.0, 6.0)  (1.0, 4.0) |
        |     .           .           .      |
        |     .           .           .      |
        *                                    *

        *                                    *
        | (1.0, 8.0)  (2.0, 7.0)  (3.0, 2.0) |
        | (4.0, 4.0)  (6.0, 8.0)  (6.0, 3.0) |
B    =  | (6.0, 2.0)  (4.0, 5.0)  (4.0, 5.0) |
        | (7.0, 2.0)  (6.0, 4.0)  (1.0, 6.0) |
        |     .           .           .      |
        *                                    *

Output

        *                                          *
        |  (0.0, -3.0)   (7.0, -5.0)  (-2.0,  7.0) |
        | (-2.0,  0.0)   (2.0, -5.0)  (-5.0,  5.0) |
C    =  | (-3.0,  1.0)   (3.0,  0.0)  (-3.0,  2.0) |
        | (-1.0,  4.0)  (-3.0,  2.0)   (0.0, -2.0) |
        |      .             .             .       |
        *                                          *

SGEMUL, DGEMUL, CGEMUL, and ZGEMUL--Matrix Multiplication for General Matrices, Their Transposes, or Conjugate Transposes

SGEMUL and DGEMUL can perform any one of the following matrix multiplications, using matrices A and B or their transposes, and matrix C:

C <-- AB C <-- AB^T
C <-- A^TB C <-- A^TB^T

CGEMUL and ZGEMUL can perform any one of the following matrix multiplications, using matrices A and B, their transposes or their conjugate transposes, and matrix C:

C <-- AB C <-- AB^T C <-- AB^H
C <-- A^TB C <-- A^TB^T C <-- A^TB^H
C <-- A^HB C <-- A^HB^T C <-- A^HB^H

Table 74. Data Types

A, B, C Subroutine
Short-precision real SGEMUL
Long-precision real DGEMUL
Short-precision complex CGEMUL
Long-precision complex ZGEMUL

Syntax

Fortran	CALL SGEMUL \| DGEMUL \| CGEMUL \| ZGEMUL (`a`, `lda`, `transa`, `b`, `ldb`, `transb`, `c`, `ldc`, `l`, `m`, `n`)
C and C++	sgemul \| dgemul \| cgemul \| zgemul (`a`, `lda`, `transa`, `b`, `ldb`, `transb`, `c`, `ldc`, `l`, `m`, `n`);
PL/I	CALL SGEMUL \| DGEMUL \| CGEMUL \| ZGEMUL (`a`, `lda`, `transa`, `b`, `ldb`, `transb`, `c`, `ldc`, `l`, `m`, `n`);
APL2	SGEMUL \| DGEMUL \| CGEMUL \| ZGEMUL `a` `lda` `transa` `b` `ldb` `transb` '`c`' `ldc` `l` `m` `n`

On Entry

a

is the matrix A, where:

If transa = 'N', A is used in the computation, and A has l rows and m columns.

If transa = 'T', A^T is used in the computation, and A has m rows and l columns.

If transa = 'C', A^H is used in the computation, and A has m rows and l columns.
Note: No data should be moved to form A^T or A^H; that is, the matrix A should always be stored in its untransposed form.

Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 74, where:

If transa = 'N', its size must be lda by (at least) m.

If transa = 'T' or 'C', its size must be lda by (at least) l.

lda

is the leading dimension of the array specified for a. Specified as: a fullword integer; lda > 0 and:

If transa = 'N', lda >= l.

If transa = 'T' or 'C', lda >= m.

transa

indicates the form of matrix A to use in the computation, where:

If transa = 'N', A is used in the computation.

If transa = 'T', A^T is used in the computation.

If transa = 'C', A^H is used in the computation.

Specified as: a single character; transa = 'N' or 'T' for SGEMUL and DGEMUL; transa = 'N', 'T', or 'C' for CGEMUL and ZGEMUL.

b

is the matrix B, where:

If transb = 'N', B is used in the computation, and B has m rows and n columns.

If transb = 'T', B^T is used in the computation, and B has n rows and m columns.

If transb = 'C', B^H is used in the computation, and B has n rows and m columns.
Note: No data should be moved to form B^T or B^H; that is, the matrix B should always be stored in its untransposed form.

Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 74, where:

If transb = 'N', its size must be ldb by (at least) n.

If transb = 'T' or 'C', its size must be ldb by (at least) m.

ldb

is the leading dimension of the array specified for b. Specified as: a fullword integer; ldb > 0 and:

If transb = 'N', ldb >= m.

If transb = 'T' or 'C', ldb >= n.

transb

indicates the form of matrix B to use in the computation, where:

If transb = 'N', B is used in the computation.

If transb = 'T', B^T is used in the computation.

If transb = 'C', B^H is used in the computation.

Specified as: a single character; transb = 'N' or 'T' for SGEMUL and DGEMUL; transb = 'N', 'T', or 'C' for CGEMUL and ZGEMUL.

c

See 'On Return'.

ldc

is the leading dimension of the array specified for c. Specified as: a fullword integer; ldc > 0 and ldc >= l.

l

is the number of rows in matrix C. Specified as: a fullword integer; 0 <= l <= ldc.

m

has the following meaning, where:

If transa = 'N', it is the number of columns in matrix A.

If transa = 'T' or 'C', it is the number of rows in matrix A.

In addition:

If transb = 'N', it is the number of rows in matrix B.

If transb = 'T' or 'C', it is the number of columns in matrix B.

Specified as: a fullword integer; m >= 0.

n

is the number of columns in matrix C. Specified as: a fullword integer; n >= 0.

On Return

c: is the l by n matrix C, containing the results of the computation. Returned as: an ldc by (at least) n numbers of the data type indicated in Table 74.

Notes

All subroutines accept lowercase letters for the transa and transb arguments.
Matrix C must have no common elements with matrices A or B; otherwise, results are unpredictable. See "Concepts".

Function

The matrix multiplication is expressed as follows, where a_ik, b_kj, and c_ij are elements of matrices A, B, and C, respectively:

See reference [38]. If l or n is 0, no computation is performed. If l and n are greater than 0, and m is 0, an l by n matrix of zeros is returned.

Special Usage

Equivalence Rules

By using the following equivalence rules, you can compute the transpose C^T or the conjugate transpose C^H of some of the computations performed by these subroutines:

Transpose Conjugate Transpose
(AB)^T = B^TA^T (AB)^H = B^HA^H
(A^TB)^T = B^TA (A^HB)^H = B^HA
(AB^T)^T = BA^T (AB^H)^H = BA^H
(A^TB^T)^T = BA (A^HB^H)^H = BA

Transpose	Conjugate Transpose
(AB)^T = B^TA^T	(AB)^H = B^HA^H
(A^TB)^T = B^TA	(A^HB)^H = B^HA
(AB^T)^T = BA^T	(AB^H)^H = BA^H
(A^TB^T)^T = BA	(A^HB^H)^H = BA

When coding the calling sequences for these cases, be careful to code your matrix arguments and dimension arguments in the order indicated by the rule. Also, be careful that your output array, receiving C^T or C^H, has dimensions large enough to hold the resulting transposed or conjugate transposed matrix. See "Example 2" and "Example 4".

Error Conditions

Resource Errors

Unable to allocate internal work area (CGEMUL and ZGEMUL only).

Computational Errors

None

Input-Argument Errors

lda, ldb, ldc <= 0
l, m, n < 0
l > ldc
transa, transb <> 'N' or 'T' for SGEMUL and DGEMUL
transa, transb <> 'N', 'T', or 'C' for CGEMUL and ZGEMUL
transa = 'N' and l > lda
transa = 'T' or 'C' and m > lda
transb = 'N' and m > ldb
transb = 'T' or 'C' and n > ldb

Example 1

This example shows the computation C <-- AB, where A, B, and C are contained in larger arrays A, B, and C, respectively.

Call Statement and Input

             A  LDA TRANSA  B  LDB TRANSB  C  LDC  L   M   N
             |   |    |     |   |    |     |   |   |   |   |
CALL SGEMUL( A , 8 , 'N'  , B , 6 , 'N'  , C , 7 , 6 , 5 , 4 )

        *                              *
        |  1.0   2.0  -1.0  -1.0   4.0 |
        |  2.0   0.0   1.0   1.0  -1.0 |
        |  1.0  -1.0  -1.0   1.0   2.0 |
A    =  | -3.0   2.0   2.0   2.0   0.0 |
        |  4.0   0.0  -2.0   1.0  -1.0 |
        | -1.0  -1.0   1.0  -3.0   2.0 |
        |   .     .     .     .     .  |
        |   .     .     .     .     .  |
        *                              *

        *                        *
        |  1.0  -1.0   0.0   2.0 |
        |  2.0   2.0  -1.0  -2.0 |
B    =  |  1.0   0.0  -1.0   1.0 |
        | -3.0  -1.0   1.0  -1.0 |
        |  4.0   2.0  -1.0   1.0 |
        |   .     .     .     .  |
        *                        *

Output

        *                         *
        | 23.0  12.0  -6.0    2.0 |
        | -4.0  -5.0   1.0    3.0 |
        |  3.0   0.0   1.0    4.0 |
C    =  | -3.0   5.0  -2.0  -10.0 |
        | -5.0  -7.0   4.0    4.0 |
        | 15.0   6.0  -5.0    6.0 |
        |   .     .     .      .  |
        *                         *

Example 2

This example shows how to produce the transpose of the result of the computation performed in "Example 1", C <-- AB, which uses the calling sequence:

      CALL  SGEMUL (A,8,'N',B,6,'N',C,7,6,5,4)

You instead code a calling sequence for C^T <-- B^TA^T, as shown below, where the resulting matrix C^T in the output array CT is the transpose of the matrix in the output array C in "Example 1". Note that the array CT has dimensions large enough to receive the transposed matrix. For a description of all the matrix identities, see "Special Usage".

Call Statement and Input

             A  LDA TRANSA  B  LDB TRANSB  C   LDC  L   M   N
             |   |    |     |   |    |     |    |   |   |   |
CALL SGEMUL( B , 6 , 'T'  , A , 8 , 'T'  , CT , 5 , 4 , 5 , 6 )

        *                        *
        |  1.0  -1.0   0.0   2.0 |
        |  2.0   2.0  -1.0  -2.0 |
B    =  |  1.0   0.0  -1.0   1.0 |
        | -3.0  -1.0   1.0  -1.0 |
        |  4.0   2.0  -1.0   1.0 |
        |   .     .     .     .  |
        *                        *

        *                              *
        |  1.0   2.0  -1.0  -1.0   4.0 |
        |  2.0   0.0   1.0   1.0  -1.0 |
        |  1.0  -1.0  -1.0   1.0   2.0 |
A    =  | -3.0   2.0   2.0   2.0   0.0 |
        |  4.0   0.0  -2.0   1.0  -1.0 |
        | -1.0  -1.0   1.0  -3.0   2.0 |
        |   .     .     .     .     .  |
        |   .     .     .     .     .  |
        *                              *

Output

        *                                    *
        | 23.0  -4.0  3.0   -3.0  -5.0  15.0 |
        | 12.0  -5.0  0.0    5.0  -7.0   6.0 |
CT   =  | -6.0   1.0  1.0   -2.0   4.0  -5.0 |
        |  2.0   3.0  4.0  -10.0   4.0   6.0 |
        |   .     .    .      .     .     .  |
        *                                    *

Example 3

This example shows the computation C <-- A^TB, where A and C are contained in larger arrays A and C, respectively, and B is the same size as the

Call Statement and Input

             A  LDA TRANSA  B  LDB TRANSB  C  LDC  L   M   N
             |   |    |     |   |    |     |   |   |   |   |
CALL SGEMUL( A , 4 , 'T'  , B , 3 , 'N'  , C , 5 , 3 , 3 , 6 )

        *                 *
        | 1.0  -3.0   2.0 |
A    =  | 2.0   4.0   0.0 |
        | 1.0  -1.0  -1.0 |
        |  .     .     .  |
        *                 *

        *                                   *
        | 1.0  -3.0   2.0   2.0  -1.0   2.0 |
B    =  | 2.0   4.0   0.0   0.0   1.0  -2.0 |
        | 1.0  -1.0  -1.0  -1.0  -1.0   1.0 |
        *                                   *

Output

        *                                    *
        | 6.0   4.0   1.0   1.0   0.0   -1.0 |
        | 4.0  26.0  -5.0  -5.0   8.0  -15.0 |
C    =  | 1.0  -5.0   5.0   5.0  -1.0    3.0 |
        |  .     .     .     .     .      .  |
        |  .     .     .     .     .      .  |
        *                                    *

Example 4

This example shows how to produce the transpose of the result of the computation performed in "Example 3", C <-- A^TB, which uses the calling sequence:

      CALL  SGEMUL (A,4,'T',B,3,'N',C,5,3,3,6)

You instead code the calling sequence for C^T <-- B^TA, as shown below, where the resulting matrix C^T in the output array CT is the transpose of the matrix in the output array C in "Example 3". Note that the array CT has dimensions large enough to receive the transposed matrix. For a description of all the matrix identities, see "Special Usage".

Call Statement and Input

             A  LDA TRANSA  B  LDB TRANSB  C   LDC  L   M   N
             |   |    |     |   |    |     |    |   |   |   |
CALL SGEMUL( B , 3 , 'T'  , A , 4 , 'N'  , CT , 8 , 6 , 3 , 3 )

        *                                   *
        | 1.0  -3.0   2.0   2.0  -1.0   2.0 |
B    =  | 2.0   4.0   0.0   0.0   1.0  -2.0 |
        | 1.0  -1.0  -1.0  -1.0  -1.0   1.0 |
        *                                   *

        *                 *
        | 1.0  -3.0   2.0 |
A    =  | 2.0   4.0   0.0 |
        | 1.0  -1.0  -1.0 |
        |  .     .     .  |
        *                 *

Output

        *                   *
        |  6.0    4.0   1.0 |
        |  4.0   26.0  -5.0 |
        |  1.0   -5.0   5.0 |
CT   =  |  1.0   -5.0   5.0 |
        |  0.0    8.0  -1.0 |
        | -1.0  -15.0   3.0 |
        |   .      .     .  |
        |   .      .     .  |
        *                   *

Example 5

This example shows the computation C <-- AB^T, where A and C are contained in larger arrays A and C, respectively, and B is the same size as the array B in which it is contained.

Call Statement and Input

             A  LDA TRANSA  B  LDB TRANSB  C  LDC  L   M   N
             |   |    |     |   |    |     |   |   |   |   |
CALL SGEMUL( A , 4 , 'N'  , B , 3 , 'T'  , C , 5 , 3 , 2 , 3 )

        *           *
        | 1.0  -3.0 |
A    =  | 2.0   4.0 |
        | 1.0  -1.0 |
        |  .     .  |
        *           *

        *           *
        | 1.0  -3.0 |
B    =  | 2.0   4.0 |
        | 1.0  -1.0 |
        *           *

Output

        *                    *
        |  10.0  -10.0   4.0 |
        | -10.0   20.0  -2.0 |
C    =  |   4.0   -2.0   2.0 |
        |    .      .     .  |
        |    .      .     .  |
        *                    *

Example 6

This example shows the computation C <-- A^TB^T, where A, B, and C are the same size as the arrays A, B, and C in which they are contained. (Based on the dimensions of the matrices, A is actually a column vector, and C is actually a row vector.)

Call Statement and Input

             A  LDA TRANSA  B  LDB TRANSB  C  LDC  L   M   N
             |   |    |     |   |    |     |   |   |   |   |
CALL SGEMUL( A , 3 , 'T'  , B , 3 , 'T'  , C , 1 , 1 , 3 , 3 )

        *     *
        | 1.0 |
A    =  | 2.0 |
        | 1.0 |
        *     *

        *                 *
        | 1.0  -3.0   2.0 |
B    =  | 2.0   4.0   0.0 |
        | 1.0  -1.0  -1.0 |
        *                 *

Output

        *                  *
B    =  | -3.0  10.0  -2.0 |
        *                  *

Example 7

This example shows the computation C <-- A^TB using complex data, where A, B, and C are contained in larger arrays A, B, and C, respectively.

Call Statement and Input

             A  LDA TRANSA  B  LDB TRANSB  C  LDC  L   M   N
             |   |    |     |   |    |     |   |   |   |   |
CALL CGEMUL( A , 6 , 'T'  , B , 7 , 'N'  , C , 3 , 2 , 3 , 3 )

        *                        *
        | (1.0, 2.0)  (3.0, 4.0) |
        | (4.0, 6.0)  (7.0, 1.0) |
A    =  | (6.0, 3.0)  (2.0, 5.0) |
        |     .           .      |
        |     .           .      |
        |     .           .      |
        *                        *

        *                                    *
        | (1.0, 9.0)  (2.0, 6.0)  (5.0, 6.0) |
        | (2.0, 5.0)  (6.0, 2.0)  (6.0, 4.0) |
        | (2.0, 6.0)  (5.0, 4.0)  (2.0, 6.0) |
B    =  |     .           .           .      |
        |     .           .           .      |
        |     .           .           .      |
        |     .           .           .      |
        *                                    *

Output

        *                                             *
        | (-45.0, 85.0)  (20.0, 93.0)  (-13.0, 110.0) |
C    =  | (-50.0, 90.0)  (12.0, 79.0)    (3.0,  94.0) |
        |       .             .              .        |
        *                                             *

Example 8

This example shows the computation C <-- AB^H using complex data, where A and C are contained in larger arrays A and C, respectively, and B is the same size as the array B in which it is contained.

Call Statement and Input

             A  LDA TRANSA  B  LDB TRANSB  C  LDC  L   M   N
             |   |    |     |   |    |     |   |   |   |   |
CALL CGEMUL( A , 4 , 'N'  , B , 3 , 'C'  , C , 4 , 3 , 2 , 3 )

        *                        *
        | (1.0, 2.0)  (-3.0, 2.0) |
A    =  | (2.0, 6.0)   (4.0, 5.0) |
        | (1.0, 2.0)  (-1.0, 8.0) |
        |     .            .      |
        *                         *

        *                         *
        | (1.0, 3.0)  (-3.0, 2.0) |
B    =  | (2.0, 5.0)   (4.0, 6.0) |
        | (1.0, 1.0)  (-1.0, 9.0) |
        *                         *

Output

        *                                            *
        | (20.0,  -1.0)  (12.0, 25.0)  (24.0,  26.0) |
C    =  | (18.0, -23.0)  (80.0, -2.0)  (49.0, -37.0) |
        | (26.0, -23.0)  (56.0, 37.0)  (76.0,   2.0) |
        |      .              .             .        |
        *                                            *

SGEMMS, DGEMMS, CGEMMS, and ZGEMMS--Matrix Multiplication for General Matrices, Their Transposes, or Conjugate Transposes Using Winograd's Variation of Strassen's Algorithm

These subroutines use Winograd's variation of the Strassen's algorithm to perform the matrix multiplication for both real and complex matrices. SGEMMS and DGEMMS can perform any one of the following matrix multiplications, using matrices A and B or their transposes, and matrix C:

C <-- AB C <-- AB^T
C <-- A^TB C <-- A^TB^T

CGEMMS and ZGEMMS can perform any one of the following matrix multiplications, using matrices A and B, their transposes or their conjugate transposes, and matrix C:

C <-- AB C <-- AB^T C <-- AB^H
C <-- A^TB C <-- A^TB^T C <-- A^TB^H
C <-- A^HB C <-- A^HB^T C <-- A^HB^H

Table 75. Data Types

A, B, C aux Subroutine
Short-precision real Short-precision real SGEMMS
Long-precision real Long-precision real DGEMMS
Short-precision complex Short-precision real CGEMMS
Long-precision complex Long-precision real ZGEMMS

Syntax

Fortran	CALL SGEMMS \| DGEMMS \| CGEMMS \| ZGEMMS (`a`, `lda`, `transa`, `b`, `ldb`, `transb`, `c`, `ldc`, `l`, `m`, `n`, `aux`, `naux`)
C and C++	sgemms \| dgemms \| cgemms \| zgemms (`a`, `lda`, `transa`, `b`, `ldb`, `transb`, `c`, `ldc`, `l`, `m`, `n`, `aux`, `naux`);
PL/I	CALL SGEMMS \| DGEMMS \| CGEMMS \| ZGEMMS (`a`, `lda`, `transa`, `b`, `ldb`, `transb`, `c`, `ldc`, `l`, `m`, `n`, `aux`, `naux`);

On Entry

a

is the matrix A, where:

If transa = 'N', A is used in the computation, and A has l rows and m columns.

If transa = 'T', A^T is used in the computation, and A has m rows and l columns.

Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 75, where:

If transa = 'N', its size must be lda by (at least) m.

If transa = 'T' or 'C', its size must be lda by (at least) l.

lda

is the leading dimension of the array specified for a. Specified as: a fullword integer; lda > 0 and:

If transa = 'N', lda >= l.

If transa = 'T' or 'C', lda >= m.

transa

indicates the form of matrix A to use in the computation, where:

If transa = 'N', A is used in the computation.

If transa = 'T', A^T is used in the computation.

If transa = 'C', A^H is used in the computation.

Specified as: a single character; transa = 'N' or 'T' for SGEMMS and DGEMMS; transa = 'N', 'T', or 'C' for CGEMMS and ZGEMMS.

b

is the matrix B, where:

If transb = 'N', B is used in the computation, and B has m rows and n columns.

If transb = 'T', B^T is used in the computation, and B has n rows and m columns.

Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 75, where:

If transb = 'N', its size must be ldb by (at least) n.

If transb = 'T' or 'C', its size must be ldb by (at least) m.

ldb

is the leading dimension of the array specified for b. Specified as: a fullword integer; ldb > 0 and:

If transb = 'N', ldb >= m.

If transb = 'T' or 'C', ldb >= n.

transb

indicates the form of matrix B to use in the computation, where:

If transb = 'N', B is used in the computation.

If transb = 'T', B^T is used in the computation.

If transb = 'C', B^H is used in the computation.

Specified as: a single character; transb = 'N' or 'T' for SGEMMS and DGEMMS; transb = 'N', 'T', or 'C' for CGEMMS and ZGEMMS.

c

See 'On Return'.

ldc

is the leading dimension of the array specified for c. Specified as: a fullword integer; ldc > 0 and ldc >= l.

l

is the number of rows in matrix C. Specified as: a fullword integer; 0 <= l <= ldc.

m

has the following meaning, where:

If transa = 'N', it is the number of columns in matrix A.

If transa = 'T' or 'C', it is the number of rows in matrix A.

In addition:

If transb = 'N', it is the number of rows in matrix B.

If transb = 'T' or 'C', it is the number of columns in matrix B.

Specified as: a fullword integer; m >= 0.

n

is the number of columns in matrix C. Specified as: a fullword integer; n >= 0.

aux

has the following meaning:

If naux = 0 and error 2015 is unrecoverable, aux is ignored.

Otherwise, is the storage work area used by this subroutine. Its size is specified by naux.

Specified as: an area of storage containing numbers of the data type indicated in Table 75.

naux

is the size of the work area specified by aux--that is, the number of elements in aux.

Specified as: a fullword integer, where:

If naux = 0 and error 2015 is unrecoverable, SGEMMS, DGEMMS, CGEMMS, and ZGEMMS dynamically allocate the work area used by the subroutine. The work area is deallocated before control is returned to the calling program.

Otherwise,

When this subroutine uses Strassen's algorithm:

For SGEMMS and DGEMMS:

Use naux = max[(n)(l), 0.7m(l+n)].
For CGEMMS and ZGEMMS:

Use naux = max[(n)(l), 0.7m(l+n)]+nb1+nb2, where:

If l >= n, then nb1 >= (l)(n+20) and nb2 >= max[(n)(l), (m)(n+20)].
If l < n, then nb1 >= (m)(n+20) and nb2 >= max[(n)(l), (l)(m+20)].

When this subroutine uses the direct method (_GEMUL), use naux >= 0.

Notes:

In most cases, these formulas provide an overestimate.
For an explanation of when this subroutine uses the direct method versus Strassen's algorithm, see "Notes".

On Return

c: is the l by n matrix C, containing the results of the computation. Returned as: an ldc by (at least) n array, containing numbers of the data type indicated in Table 75.

Notes

There are two instances when these subroutines use the direct method (_GEMUL), rather than using Strassen's algorithm:
- When either or both of the input matrices are small
- For CGEMMS and ZGEMMS, when input matrices A and B overlap
In these instances when the direct method is used, the subroutine does not use auxiliary storage, and you can specify naux = 0.
For CGEMMS and ZGEMMS, one of the input matrices, A or B, is rearranged during the computation and restored to its original form on return. Keep this in mind when diagnosing an abnormal termination.
All subroutines accept lowercase letters for the transa and transb arguments.
Matrix C must have no common elements with matrices A or B; otherwise, results are unpredictable. See "Concepts".
You have the option of having the minimum required value for naux dynamically returned to your program. For details, see "Using Auxiliary Storage in ESSL".

Function

The matrix multiplications performed by these subroutines are functionally equivalent to those performed by SGEMUL, DGEMUL, CGEMUL, and ZGEMUL. For details on the computations performed, see "Function".

SGEMMS, DGEMMS, CGEMMS, and ZGEMMS use Winograd's variation of the Strassen's algorithm with minor changes for tuning purposes. (See pages 45 and 46 in reference [11].) The subroutines compute matrix multiplication for both real and complex matrices of large sizes. Complex matrix multiplication uses a special technique, using three real matrix multiplications and five real matrix additions. Each of these three resulting matrix multiplications then uses Strassen's algorithm.

Strassen's Algorithm

The steps of Strassen's algorithm can be repeated up to four times by these subroutines, with each step reducing the dimensions of the matrix by a factor of two. The number of steps used by this subroutine depends on the size of the input matrices. Each step reduces the number of operations by about 10% from the normal matrix multiplication. On the other hand, if the matrix is small, a normal matrix multiplication is performed without using the Strassen's algorithm, and no improvement is gained. For details about small matrices, see "Notes".

Complex Matrix Multiplication

The complex multiplication is performed by forming the real and imaginary parts of the input matrices. These subroutines uses three real matrix multiplications and five real matrix additions, instead of the normal four real matrix multiplications and two real matrix additions. Using only three real matrix multiplications allows the subroutine to achieve up to a 25% reduction in matrix operations, which can result in a significant savings in computing time for large matrices.

Accuracy Considerations

Strassen's method is not stable for certain row or column scalings of the input matrices A and B. Therefore, for matrices A and B with divergent exponent values Strassen's method may give inaccurate results. For these cases, you should use the _GEMUL or _GEMM subroutines.

Special Usage

The equivalence rules, defined for matrix multiplication of A and B in "Special Usage", also apply to these subroutines. You should use the equivalence rules when you want to transpose or conjugate transpose the result of the multiplication computation. When coding the calling sequences for these cases, be careful to code your matrix arguments and dimension arguments in the order indicated by the rule. Also, be careful that your output array, receiving C^T or C^H, has dimensions large enough to hold the resulting transposed or conjugate transposed matrix. See "Example 2" and "Example 4".

Error Conditions

Resource Errors

Error 2015 is unrecoverable, naux = 0, and unable to allocate work area.

Computational Errors

None

Input-Argument Errors

lda, ldb, ldc <= 0
l, m, n < 0
l > ldc
transa, transb <> 'N' or 'T' for SGEMMS and DGEMMS
transa, transb <> 'N', 'T', or 'C' for CGEMMS and ZGEMMS
transa = 'N' and l > lda
transa = 'T' or 'C' and m > lda
transb = 'N' and m > ldb
transb = 'T' or 'C' and n > ldb
Error 2015 is recoverable or naux<>0, and naux is too small--that is, less than the minimum required value. Return code 1 is returned if error 2015 is recoverable.

Example 1

This example shows the computation C <-- AB, where A, B, and C are contained in larger arrays A, B, and C, respectively. It shows how to code the calling sequence for SGEMMS, but does not use the Strassen algorithm for doing the computation. The calling sequence is shown below. The input and output, other than auxiliary storage, is the same as in "Example 1" for SGEMUL.

Call Statement and Input

             A  LDA TRANSA  B  LDB TRANSB  C  LDC  L   M   N   AUX  NAUX
             |   |    |     |   |    |     |   |   |   |   |    |    |
CALL SGEMMS( A , 8 , 'N'  , B , 6 , 'N'  , C , 7 , 6 , 5 , 4 , AUX , 0  )

Example 2

This example shows the computation C <-- AB^H, where A and C are contained in larger arrays A and C, respectively, and B is the same size as the array B in which it is contained. The arrays contain complex data. This example shows how to code the calling sequence for CGEMMS, but does not use the Strassen algorithm for doing the computation. The calling sequence is shown below. The input and output, other than auxiliary storage, is the same as in "Example 8" for CGEMUL.

Call Statement and Input

             A  LDA TRANSA  B  LDB TRANSB  C  LDC  L   M   N   AUX  NAUX
             |   |    |     |   |    |     |   |   |   |   |    |    |
CALL CGEMMS( A , 4 , 'N'  , B , 3 , 'C'  , C , 4 , 3 , 2 , 3 , AUX , 0 )

SGEMM, DGEMM, CGEMM, and ZGEMM--Combined Matrix Multiplication and Addition for General Matrices, Their Transposes, or Conjugate Transposes

SGEMM and DGEMM can perform any one of the following combined matrix computations, using scalars alpha and beta, matrices A and B or their transposes, and matrix C:

C <-- alphaAB+betaC C <-- alphaAB^T+betaC
C <-- alphaA^TB+betaC C <-- alphaA^TB^T+betaC

CGEMM and ZGEMM can perform any one of the following combined matrix computations, using scalars alpha and beta, matrices A and B, their transposes or their conjugate transposes, and matrix C:

C <-- alphaAB+betaC C <-- alphaAB^T+betaC C <-- alphaAB^H+betaC
C <-- alphaA^TB+betaC C <-- alphaA^TB^T+betaC C <-- alphaA^TB^H+betaC
C <-- alphaA^HB+betaC C <-- alphaA^HB^T+betaC C <-- alphaA^HB^H+betaC

Table 76. Data Types

A, B, C, alpha, beta Subroutine
Short-precision real SGEMM
Long-precision real DGEMM
Short-precision complex CGEMM
Long-precision complex ZGEMM

Syntax

Fortran	CALL SGEMM \| DGEMM \| CGEMM \| ZGEMM (`transa`, `transb`, `l`, `n`, `m`, `alpha`, `a`, `lda`, `b`, `ldb`, `beta`, `c`, `ldc`)
C and C++	sgemm \| dgemm \| cgemm \| zgemm (`transa`, `transb`, `l`, `n`, `m`, `alpha`, `a`, `lda`, `b`, `ldb`, `beta`, `c`, `ldc`);
PL/I	CALL SGEMM \| DGEMM \| CGEMM \| ZGEMM (`transa`, `transb`, `l`, `n`, `m`, `alpha`, `a`, `lda`, `b`, `ldb`, `beta`, `c`, `ldc`);

On Entry

transa

indicates the form of matrix A to use in the computation, where:

If transa = 'N', A is used in the computation.

If transa = 'T', A^T is used in the computation.

If transa = 'C', A^H is used in the computation.

Specified as: a single character; transa = 'N', 'T', or 'C'.

transb

indicates the form of matrix B to use in the computation, where:

If transb = 'N', B is used in the computation.

If transb = 'T', B^T is used in the computation.

If transb = 'C', B^H is used in the computation.

Specified as: a single character; transb = 'N', 'T', or 'C'.

l

is the number of rows in matrix C. Specified as: a fullword integer; 0 <= l <= ldc.

n

is the number of columns in matrix C. Specified as: a fullword integer; n >= 0.

m

has the following meaning, where:

If transa = 'N', it is the number of columns in matrix A.

If transa = 'T' or 'C', it is the number of rows in matrix A.

In addition:

If transb = 'N', it is the number of rows in matrix B.

If transb = 'T' or 'C', it is the number of columns in matrix B.

Specified as: a fullword integer; m >= 0.

alpha

is the scalar alpha. Specified as: a number of the data type indicated in Table 76.

a

is the matrix A, where:

If transa = 'N', A is used in the computation, and A has l rows and m columns.

If transa = 'T', A^T is used in the computation, and A has m rows and l columns.

Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 76, where:

If transa = 'N', its size must be lda by (at least) m.

If transa = 'T' or 'C', its size must be lda by (at least) l.

lda

is the leading dimension of the array specified for a. Specified as: a fullword integer; lda > 0 and:

If transa = 'N', lda >= l.

If transa = 'T' or 'C', lda >= m.

b

is the matrix B, where:

If transb = 'N', B is used in the computation, and B has m rows and n columns.

If transb = 'T', B^T is used in the computation, and B has n rows and m columns.

Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 76, where:

If transb = 'N', its size must be ldb by (at least) n.

If transb = 'T' or 'C', its size must be ldb by (at least) m.

ldb

is the leading dimension of the array specified for b. Specified as: a fullword integer; ldb > 0 and:

If transb = 'N', ldb >= m.

If transb = 'T' or 'C', ldb >= n.

beta

is the scalar beta. Specified as: a number of the data type indicated in Table 76.

c

is the l by n matrix C. Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 76.

ldc

is the leading dimension of the array specified for c. Specified as: a fullword integer; ldc > 0 and ldc >= l.

On Return

c: is the l by n matrix C, containing the results of the computation. Returned as: an ldc by (at least) n array, containing numbers of the data type indicated in Table 76.

Notes

All subroutines accept lowercase letters for the transa and transb arguments.
For SGEMM and DGEMM, if you specify 'C' for the transa or transb argument, it is interpreted as though you specified 'T'.
Matrix C must have no common elements with matrices A or B; otherwise, results are unpredictable. See "Concepts".

Function

The combined matrix addition and multiplication is expressed as follows, where a_ik, b_kj, and c_ij are elements of matrices A, B, and C, respectively:

See references [32] and [38]. In the following three cases, no computation is performed:

l is 0.
n is 0.
beta is 1 and alpha is 0.

Assuming the above conditions do not exist, if beta <> 1 and m is 0, then betaC is returned.

Special Usage

Equivalence Rules

The equivalence rules, defined for matrix multiplication of A and B in "Special Usage", also apply to the matrix multiplication part of the computation performed by this subroutine. You should use the equivalent rules when you want to transpose or conjugate transpose the multiplication part of the computation. When coding the calling sequences for these cases, be careful to code your matrix arguments and dimension arguments in the order indicated by the rule. Also, be careful that your input and output array C has dimensions large enough to hold the resulting matrix. See "Example 4".

Error Conditions

Resource Errors

Unable to allocate internal work area (CGEMM and ZGEMM only).

Computational Errors

None

Input-Argument Errors

lda, ldb, ldc <= 0
l, m, n < 0
l > ldc
transa, transb <> 'N', 'T', or 'C'
transa = 'N' and l > lda
transa = 'T' or 'C' and m > lda
transb = 'N' and m > ldb
transb = 'T' or 'C' and n > ldb

Example 1

This example shows the computation C <-- alphaAB+betaC, where A, B, and C are contained in larger arrays A, B, and C, respectively.

Call Statement and Input

           TRANSA TRANSB  L   N   M  ALPHA  A  LDA  B  LDB  BETA  C  LDC
             |      |     |   |   |    |    |   |   |   |    |    |   |
CALL SGEMM( 'N'  , 'N'  , 6 , 4 , 5 , 1.0 , A , 8 , B , 6 , 2.0 , C , 7 )

        *                              *
        |  1.0   2.0  -1.0  -1.0   4.0 |
        |  2.0   0.0   1.0   1.0  -1.0 |
        |  1.0  -1.0  -1.0   1.0   2.0 |
A    =  | -3.0   2.0   2.0   2.0   0.0 |
        |  4.0   0.0  -2.0   1.0  -1.0 |
        | -1.0  -1.0   1.0  -3.0   2.0 |
        |   .     .     .     .     .  |
        |   .     .     .     .     .  |
        *                              *

        *                        *
        |  1.0  -1.0   0.0   2.0 |
        |  2.0   2.0  -1.0  -2.0 |
B    =  |  1.0   0.0  -1.0   1.0 |
        | -3.0  -1.0   1.0  -1.0 |
        |  4.0   2.0  -1.0   1.0 |
        |   .     .     .     .  |
        *                        *

        *                    *
        | 0.5  0.5  0.5  0.5 |
        | 0.5  0.5  0.5  0.5 |
        | 0.5  0.5  0.5  0.5 |
C    =  | 0.5  0.5  0.5  0.5 |
        | 0.5  0.5  0.5  0.5 |
        | 0.5  0.5  0.5  0.5 |
        |  .    .    .    .  |
        *                    *

Output

        *                        *
        | 24.0  13.0  -5.0   3.0 |
        | -3.0  -4.0   2.0   4.0 |
        |  4.0   1.0   2.0   5.0 |
C    =  | -2.0   6.0  -1.0  -9.0 |
        | -4.0  -6.0   5.0   5.0 |
        | 16.0   7.0  -4.0   7.0 |
        |   .     .     .     .  |
        *                        *

Example 2

This example shows the computation C <-- alphaAB^T+betaC, where A and C are contained in larger arrays A and C, respectively, and B is the same size as array B in which it is contained.

Call Statement and Input

           TRANSA TRANSB  L   N   M  ALPHA  A  LDA  B  LDB  BETA  C  LDC
             |      |     |   |   |    |    |   |   |   |    |    |   |
CALL SGEMM( 'N'  , 'T'  , 3 , 3 , 2 , 1.0 , A , 4 , B , 3 , 2.0 , C , 5 )

        *           *
        | 1.0  -3.0 |
A    =  | 2.0   4.0 |
        | 1.0  -1.0 |
        |  .     .  |
        *           *

        *           *
        | 1.0  -3.0 |
B    =  | 2.0   4.0 |
        | 1.0  -1.0 |
        *           *

        *               *
        | 0.5  0.5  0.5 |
        | 0.5  0.5  0.5 |
C    =  | 0.5  0.5  0.5 |
        |  .    .    .  |
        |  .    .    .  |
        *               *

Output

        *                  *
        | 11.0  -9.0   5.0 |
        | -9.0  21.0  -1.0 |
C    =  |  5.0  -1.0   3.0 |
        |   .     .     .  |
        |   .     .     .  |
        *                  *

Example 3

This example shows the computation C <-- alphaAB+betaC using complex data, where A, B, and C are contained in larger arrays, A, B, and C, respectively.

Call Statement and Input

           TRANSA TRANSB  L   N   M   ALPHA   A  LDA  B  LDB  BETA   C  LDC
             |      |     |   |   |     |     |   |   |   |    |     |   |
CALL CGEMM( 'N'  , 'N'  , 6 , 2 , 3 , ALPHA , A , 8 , B , 4 , BETA , C , 8 )

ALPHA    =  (1.0, 0.0)
BETA     =  (2.0, 0.0)
 
        *                                    *
        | (1.0, 5.0)  (9.0, 2.0)  (1.0, 9.0) |
        | (2.0, 4.0)  (8.0, 3.0)  (1.0, 8.0) |
        | (3.0, 3.0)  (7.0, 5.0)  (1.0, 7.0) |
A    =  | (4.0, 2.0)  (4.0, 7.0)  (1.0, 5.0) |
        | (5.0, 1.0)  (5.0, 1.0)  (1.0, 6.0) |
        | (6.0, 6.0)  (3.0, 6.0)  (1.0, 4.0) |
        |     .           .           .      |
        |     .           .           .      |
        *                                    *

        *                        *
        | (1.0, 8.0)  (2.0, 7.0) |
B    =  | (4.0, 4.0)  (6.0, 8.0) |
        | (6.0, 2.0)  (4.0, 5.0) |
        |     .           .      |
        *                        *

        *                        *
        | (0.5, 0.0)  (0.5, 0.0) |
        | (0.5, 0.0)  (0.5, 0.0) |
        | (0.5, 0.0)  (0.5, 0.0) |
C    =  | (0.5, 0.0)  (0.5, 0.0) |
        | (0.5, 0.0)  (0.5, 0.0) |
        | (0.5, 0.0)  (0.5, 0.0) |
        |     .           .      |
        |     .           .      |
        *                        *

Output

        *                                *
        | (-22.0, 113.0)  (-35.0, 142.0) |
        | (-19.0, 114.0)  (-35.0, 141.0) |
        | (-20.0, 119.0)  (-43.0, 146.0) |
C    =  | (-27.0, 110.0)  (-58.0, 131.0) |
        |   (8.0, 103.0)    (0.0, 112.0) |
        | (-55.0, 116.0)  (-75.0, 135.0) |
        |       .               .        |
        |       .               .        |
        *                                *

Example 4

This example shows how to obtain the conjugate transpose of AB^H.

This shows the conjugate transpose of the computation performed in "Example 8" for CGEMUL, which uses the following calling sequence:

CALL CGEMUL( A , 4 , 'N' , B , 3 , 'C' , C , 4 , 3 , 2 , 3 )

You instead code the calling sequence for C <-- betaC+alphaBA^H, where beta = 0, alpha = 1, and the array C has the correct dimensions to receive the transposed matrix. Because beta is zero, betaC = 0. For a description of all the matrix identities, see "Special Usage".

Call Statement and Input

           TRANSA TRANSB  L   N   M   ALPHA   A  LDA  B  LDB  BETA   C  LDC
             |      |     |   |   |     |     |   |   |   |    |     |   |
CALL CGEMM( 'N'  , 'C'  , 3 , 3 , 2 , ALPHA , B , 3 , A , 3 , BETA , C , 4 )

ALPHA    =  (1.0, 0.0)
BETA     =  (0.0, 0.0)
 
        *                         *
        | (1.0, 3.0)  (-3.0, 2.0) |
B    =  | (2.0, 5.0)   (4.0, 6.0) |
        | (1.0, 1.0)  (-1.0, 9.0) |
        *                         *

        *                         *
        | (1.0, 2.0)  (-3.0, 2.0) |
A    =  | (2.0, 6.0)   (4.0, 5.0) |
        | (1.0, 2.0)  (-1.0, 8.0) |
        |     .            .      |
        *                         *

C        =(not relevant)

Output

        *                                            *
        | (20.0,   1.0)  (18.0, 23.0)  (26.0,  23.0) |
C    =  | (12.0, -25.0)  (80.0,  2.0)  (56.0, -37.0) |
        | (24.0, -26.0)  (49.0, 37.0)  (76.0,  -2.0) |
        |      .              .             .        |
        *                                            *

Example 5

This example shows the computation C <-- alphaA^TB^H+betaC using complex data, where A, B, and C are the same size as the arrays A, B, and C, in which they are contained. Because beta is zero, betaC = 0. (Based on the dimensions of the matrices, A is actually a column vector, and C is actually a row vector.)

Call Statement and Input

           TRANSA TRANSB  L   N   M   ALPHA   A  LDA  B  LDB  BETA   C  LDC
             |      |     |   |   |     |     |   |   |   |    |     |   |
CALL CGEMM( 'T'  , 'C'  , 1 , 3 , 3 , ALPHA , A , 3 , B , 3 , BETA , C , 1 )

ALPHA    =  (1.0, 1.0)
BETA     =  (0.0, 0.0)
 
        *             *
        | (1.0,  2.0) |
A    =  | (2.0,  5.0) |
        | (1.0,  6.0) |
        *             *

        *                                      *
        | (1.0, 6.0)  (-3.0, 4.0)   (2.0, 6.0) |
B    =  | (2.0, 3.0)   (4.0, 6.0)   (0.0, 3.0) |
        | (1.0, 3.0)  (-1.0, 6.0)  (-1.0, 9.0) |
        *                                      *

C        =(not relevant)

Output

        *                                         *
A    =  | (86.0, 44.0) (58.0, 70.0) (121.0, 55.0) |
        *                                         *

SSYMM, DSYMM, CSYMM, ZSYMM, CHEMM, and ZHEMM--Matrix-Matrix Product Where One Matrix is Real or Complex Symmetric or Complex Hermitian

These subroutines compute one of the following matrix-matrix products, using the scalars alpha and beta and matrices A, B, and C:

1. C <-- alphaAB+betaC

2. C <-- alphaBA+betaC

where matrix A is stored in either upper or lower storage mode, and:

For SSYMM and DSYMM, matrix A is real symmetric.
For CSYMM and ZSYMM, matrix A is complex symmetric.
For CHEMM and ZHEMM, matrix A is complex Hermitian.

Table 77. Data Types

alpha, A, B, beta, C Subprogram
Short-precision real SSYMM
Long-precision real DSYMM
Short-precision complex CSYMM and CHEMM
Long-precision complex ZSYMM and ZHEMM

Syntax

Fortran	CALL SSYMM \| DSYMM \| CSYMM \| ZSYMM \| CHEMM \| ZHEMM (`side`, `uplo`, `m`, `n`, `alpha`, `a`, `lda`, `b`, `ldb`, `beta`, `c`, `ldc`)
C and C++	ssymm \| dsymm \| csymm \| zsymm \| chemm \| zhemm (`side`, `uplo`, `m`, `n`, `alpha`, `a`, `lda`, `b`, `ldb`, `beta`, `c`, `ldc`);
PL/I	CALL SSYMM \| DSYMM \| CSYMM \| ZSYMM \| CHEMM \| ZHEMM (`side`, `uplo`, `m`, `n`, `alpha`, `a`, `lda`, `b`, `ldb`, `beta`, `c`, `ldc`);

On Entry

side

indicates whether matrix A is located to the left or right of rectangular matrix B in the equation used for this computation, where:

If side = 'L', A is to the left of B, resulting in equation 1.

If side = 'R', A is to the right of B, resulting in equation 2.

Specified as: a single character. It must be 'L' or 'R'.

uplo

indicates the storage mode used for matrix A, where:

If uplo = 'U', A is stored in upper storage mode.

If uplo = 'L', A is stored in lower storage mode.

Specified as: a single character. It must be 'U' or 'L'.

m

is the number of rows in rectangular matrices B and C, and:

If side = 'L', m is the order of triangular matrix A.

Specified as: a fullword integer; 0 <= m <= ldb, m <= ldc, and:

If side = 'L', m <= lda.

n

is the number of columns in rectangular matrices B and C, and:

If side = 'R', n is the order of triangular matrix A.

Specified as: a fullword integer; n >= 0 and:

If side = 'R', n <= lda.

alpha

is the scalar alpha. Specified as: a number of the data type indicated in Table 77.

a

is the real symmetric, complex symmetric, or complex Hermitian matrix A, where:

If side = 'L', A is order m.

If side = 'R', A is order n.

and where it is stored as follows:

If uplo = 'U', A is stored in upper storage mode.

If uplo = 'L', A is stored in lower storage mode.

Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 77, where:

If side = 'L', its size must be lda by (at least) m.

If side = 'R', it size must be lda by (at least) n.

lda

is the leading dimension of the array specified for a. Specified as: a fullword integer; lda > 0 and:

If side = 'L', lda >= m.

If side = 'R', lda >= n.

b

is the m by n rectangular matrix B. Specified as: an ldb by (at least) n array, containing numbers of the data type indicated in Table 77.

ldb

is the leading dimension of the array specified for b. Specified as: a fullword integer; ldb > 0 and ldb >= m.

beta

is the scalar beta. Specified as: a number of the data type indicated in Table 77.

c

is the m by n rectangular matrix C. Specified as: an ldc by (at least) n array, containing numbers of the data type indicated in Table 77.

ldc

is the leading dimension of the array specified for c. Specified as: a fullword integer; ldc > 0 and ldc >= m.

On Return

c: is the m by n matrix C, containing the results of the computation.
Returned as: an ldc by (at least) n array, containing numbers of the data type indicated in Table 77.

Notes

These subroutines accept lowercase letters for the side and uplo arguments.
Matrices A, B, and C must have no common elements; otherwise, results are unpredictable.
If matrix A is upper triangular (uplo = 'U'), these subroutines use only the data in the upper triangular portion of the array. If matrix A is lower triangular, (uplo = 'L'), these subroutines use only the data in the lower triangular portion of the array. In each case, the other portion of the array is altered during the computation, but restored before exit.
The imaginary parts of the diagonal elements of a complex Hermitian matrix A are assumed to be zero, so you do not have to set these values.
For a description of how symmetric matrices are stored in upper and lower storage mode, see "Symmetric Matrix". For a description of how complex Hermitian matrices are stored in upper and lower storage mode, see "Complex Hermitian Matrix".

Function

These subroutines can perform the following matrix-matrix product computations using matrix A, which is real symmetric for SSYMM and DSYMM, complex symmetric for CSYMM and ZSYMM, and complex Hermitian for CHEMM and ZHEMM:

C <-- alphaAB+betaC
C <-- alphaBA+betaC
where:

alpha and beta are scalars.
A is a matrix of the type indicated above, stored in upper or lower storage mode. It is order m for equation 1 and order n for equation 2.
B and C are m by n rectangular matrices.

See references [32] and [38]. In the following two cases, no computation is performed:

n or m is 0.
beta is one and alpha is zero.

Error Conditions

Resource Errors

Unable to allocate internal work area.

Computational Errors

None

Input-Argument Errors

m < 0
m > ldb
m > ldc
n < 0
lda, ldb, ldc <= 0
side <> 'L' or 'R'
uplo <> 'L' or 'U'
side = 'L' and m > lda
side = 'R' and n > lda

Example 1

This example shows the computation C <-- alphaAB+betaC, where A is a real symmetric matrix of order 5, stored in upper storage mode, and B and C are 5 by 4 rectangular matrices.

Call Statement and Input

            SIDE  UPLO  M   N   ALPHA   A  LDA  B  LDB  BETA  C  LDC
             |     |    |   |     |     |   |   |   |    |    |   |
CALL SSYMM( 'L' , 'U' , 5 , 4 ,  2.0  , A , 8 , B , 6 , 1.0 , C , 5 )

        *                            *
        | 1.0  2.0  -1.0  -1.0   4.0 |
        |  .   0.0   1.0   1.0  -1.0 |
        |  .    .   -1.0   1.0   2.0 |
A    =  |  .    .     .    2.0   0.0 |
        |  .    .     .     .   -1.0 |
        |  .    .     .     .     .  |
        |  .    .     .     .     .  |
        |  .    .     .     .     .  |
        *                            *

        *                        *
        |  1.0  -1.0   0.0   2.0 |
        |  2.0   2.0  -1.0  -2.0 |
B    =  |  1.0   0.0  -1.0   1.0 |
        | -3.0  -1.0   1.0  -1.0 |
        |  4.0   2.0  -1.0   1.0 |
        |   .     .     .     .  |
        *                        *

        *                        *
        | 23.0  12.0  -6.0   2.0 |
        | -4.0  -5.0   1.0   3.0 |
C    =  |  5.0   6.0  -1.0  -4.0 |
        | -4.0   1.0   0.0  -5.0 |
        |  8.0  -4.0  -2.0  13.0 |
        *                        *

Output

        *                            *
        |  69.0   36.0  -18.0    6.0 |
        | -12.0  -15.0    3.0    9.0 |
C    =  |  15.0   18.0   -3.0  -12.0 |
        | -12.0    3.0    0.0  -15.0 |
        |   8.0  -20.0   -2.0   35.0 |
        *                            *

Example 2

This example shows the computation C <-- alphaAB+betaC, where A is a real symmetric matrix of order 3, stored in lower storage mode, and B and C are 3 by 6 rectangular matrices.

Call Statement and Input

            SIDE  UPLO  M   N   ALPHA   A  LDA  B  LDB  BETA  C  LDC
             |     |    |   |     |     |   |   |   |    |    |   |
CALL SSYMM( 'L' , 'L' , 3 , 6 ,  2.0  , A , 4 , B , 3 , 2.0 , C , 5 )

        *                 *
        | 1.0    .     .  |
A    =  | 2.0   4.0    .  |
        | 1.0  -1.0  -1.0 |
        |  .     .     .  |
        *                 *

        *                                   *
        | 1.0  -3.0   2.0   2.0  -1.0   2.0 |
B    =  | 2.0   4.0   0.0   0.0   1.0  -2.0 |
        | 1.0  -1.0  -1.0  -1.0  -1.0   1.0 |
        *                                   *

        *                                  *
        |  6.0   4.0  1.0  1.0   0.0  -1.0 |
        |  9.0  11.0  5.0  5.0   3.0  -5.0 |
C    =  | -2.0  -6.0  3.0  3.0  -1.0  32.0 |
        |   .     .    .    .     .     .  |
        |   .     .    .    .     .     .  |
        *                                  *

Output

        *                                      *
        | 24.0   16.0   4.0   4.0   0.0   -4.0 |
        | 36.0   44.0  20.0  20.0  12.0  -20.0 |
C    =  | -8.0  -24.0  12.0  12.0  -4.0   12.0 |
        |   .      .     .     .     .      .  |
        |   .      .     .     .     .      .  |
        *                                      *

Example 3

This example shows the computation C <-- alphaBA+betaC, where A is a real symmetric matrix of order 3, stored in upper storage mode, and B and C are 2 by 3 rectangular matrices.

Call Statement and Input

            SIDE  UPLO  M   N   ALPHA   A  LDA  B  LDB  BETA  C  LDC
             |     |    |   |     |     |   |   |   |    |    |   |
CALL SSYMM( 'R' , 'U' , 2 , 3 ,  2.0  , A , 4 , B , 3 , 1.0 , C , 5 )

        *                 *
        | 1.0  -3.0   1.0 |
A    =  |  .    4.0  -1.0 |
        |  .     .    2.0 |
        |  .     .     .  |
        *                 *

        *                 *
        | 1.0  -3.0   3.0 |
B    =  | 2.0   4.0  -1.0 |
        |  .     .     .  |
        *                 *

        *                    *
        |  13.0  -18.0  10.0 |
        | -11.0   11.0  -4.0 |
C    =  |    .      .     .  |
        |    .      .     .  |
        |    .      .     .  |
        *                    *

Output

        *                     *
        |  39.0  -54.0   30.0 |
        | -33.0   33.0  -12.0 |
C    =  |    .      .      .  |
        |    .      .      .  |
        |    .      .      .  |
        *                     *

Example 4

This example shows the computation C <-- alphaBA+betaC, where A is a real symmetric matrix of order 3, stored in lower storage mode, and B and C are 3 by 3 square matrices.

Call Statement and Input

            SIDE  UPLO  M   N   ALPHA   A  LDA  B  LDB  BETA  C  LDC
             |     |    |   |     |     |   |   |   |    |    |   |
CALL SSYMM( 'R' , 'L' , 3 , 3 , -1.0  , A , 3 , B , 3 , 1.0 , C , 3 )

        *                *
        | 1.0    .    .  |
A    =  | 2.0  10.0   .  |
        | 1.0  11.0  4.0 |
        *                *

        *                 *
        | 1.0  -3.0   2.0 |
B    =  | 2.0   4.0   0.0 |
        | 1.0  -1.0  -1.0 |
        *                 *

        *                  *
        |  1.0   5.0  -9.0 |
C    =  | -3.0  10.0  -2.0 |
        | -2.0   8.0   0.0 |
        *                  *

Output

        *                     *
        |   4.0   11.0   15.0 |
C    =  | -13.0  -34.0  -48.0 |
        |   0.0   27.0   14.0 |
        *                     *

Example 5

This example shows the computation C <-- alphaBA+betaC, where A is a complex symmetric matrix of order 3, stored in upper storage mode, and B and C are 2 by 3 rectangular matrices.

Call Statement and Input

            SIDE  UPLO  M   N   ALPHA   A  LDA  B  LDB  BETA   C  LDC
             |     |    |   |     |     |   |   |   |    |     |   |
CALL CSYMM( 'R' , 'U' , 2 , 3 , ALPHA , A , 4 , B , 3 , BETA , C , 5 )

ALPHA    =  (2.0, 3.0)
 
BETA     =  (1.0, 6.0)

        *                                      *
        | (1.0, 5.0)  (-3.0, 2.0)   (1.0, 6.0) |
A    =  |     .        (4.0, 5.0)  (-1.0, 4.0) |
        |     .            .        (2.0, 5.0) |
        |     .            .            .      |
        *                                      *

        *                                      *
        | (1.0, 1.0)  (-3.0, 2.0)   (3.0, 3.0) |
B    =  | (2.0, 6.0)   (4.0, 5.0)  (-1.0, 4.0) |
        |     .            .            .      |
        *                                      *

        *                                         *
        |  (13.0, 6.0)  (-18.0, 6.0)  (10.0, 7.0) |
        | (-11.0, 8.0)   (11.0, 1.0)  (-4.0, 2.0) |
C    =  |       .             .            .      |
        |       .             .            .      |
        |       .             .            .      |
        *                                         *

Output

        *                                                      *
        |  (-96.0,   72.0)  (-141.0, -226.0)  (-112.0,   38.0) |
        | (-230.0, -269.0)  (-133.0,  -23.0)  (-272.0, -198.0) |
C    =  |        .                 .                 .         |
        |        .                 .                 .         |
        |        .                 .                 .         |
        *                                                      *

Example 6

This example shows the computation C <-- alphaBA+betaC, where A is a complex Hermitian matrix of order 3, stored in lower storage mode, and B and C are 3 by 3 square matrices.
Note: The imaginary parts of the diagonal elements of a complex Hermitian matrix are assumed to be zero, so you do not have to set these values.

Call Statement and Input

            SIDE  UPLO  M   N   ALPHA   A  LDA  B  LDB  BETA   C  LDC
             |     |    |   |     |     |   |   |   |    |     |   |
CALL CHEMM( 'R' , 'L' , 2 , 3 , ALPHA , A , 4 , B , 3 , BETA , C , 5 )

ALPHA    =  (2.0, 3.0)
 
BETA     =  (1.0, 6.0)

        *                                     *
        |  (1.0,  . )      .           .      |
A    =  |  (3.0, 2.0)  (4.0,  . )      .      |
        | (-1.0, 6.0)  (1.0, 4.0)  (2.0,  . ) |
        |      .           .           .      |
        *                                     *

        *                                      *
        | (1.0, 1.0)  (-3.0, 2.0)   (3.0, 3.0) |
B    =  | (2.0, 6.0)   (4.0, 5.0)  (-1.0, 4.0) |
        |     .            .            .      |
        *                                      *

        *                                         *
        |  (13.0, 6.0)  (-18.0, 6.0)  (10.0, 7.0) |
        | (-11.0, 8.0)   (11.0, 1.0)  (-4.0, 2.0) |
C    =  |       .             .            .      |
        |       .             .            .      |
        |       .             .            .      |
        *                                         *

Output

        *                                                   *
        | (-137.0,  17.0)  (-158.0, -102.0)  (-39.0, 141.0) |
        | (-154.0, -77.0)   (-63.0,  186.0)  (159.0, 104.0) |
C    =  |        .                .                .        |
        |        .                .                .        |
        |        .                .                .        |
        *                                                   *

STRMM, DTRMM, CTRMM, and ZTRMM--Triangular Matrix-Matrix Product

STRMM and DTRMM compute one of the following matrix-matrix products, using the scalar alpha, rectangular matrix B, and triangular matrix A or its transpose:

1. B <-- alphaAB 3. B <-- alphaBA
2. B <-- alphaA^TB 4. B <-- alphaBA^T

CTRMM and ZTRMM compute one of the following matrix-matrix products, using the scalar alpha, rectangular matrix B, and triangular matrix A, its transpose, or its conjugate transpose:

1. B <-- alphaAB 3. B <-- alphaBA 5. B <-- alphaA^HB
2. B <-- alphaA^TB 4. B <-- alphaBA^T 6. B <-- alphaBA^H

Table 78. Data Types

A, B, alpha Subroutine
Short-precision real STRMM
Long-precision real DTRMM
Short-precision complex CTRMM
Long-precision complex ZTRMM

Syntax

Fortran	CALL STRMM \| DTRMM \| CTRMM \| ZTRMM (`side`, `uplo`, `transa`, `diag`, `m`, `n`, `alpha`, `a`, `lda`, `b`, `ldb`)
C and C++	strmm \| dtrmm \| ctrmm \| ztrmm (`side`, `uplo`, `transa`, `diag`, `m`, `n`, `alpha`, `a`, `lda`, `b`, `ldb`);
PL/I	CALL STRMM \| DTRMM \| CTRMM \| ZTRMM (`side`, `uplo`, `transa`, `diag`, `m`, `n`, `alpha`, `a`, `lda`, `b`, `ldb`);

On Entry

side

indicates whether the triangular matrix A is located to the left or right of rectangular matrix B in the equation used for this computation, where:

If side = 'L', A is to the left of B in the equation, resulting in either equation 1, 2, or 5.

If side = 'R', A is to the right of B in the equation, resulting in either equation 3, 4, or 6.

Specified as: a single character. It must be 'L' or 'R'.

uplo

indicates whether matrix A is an upper or lower triangular matrix, where:

If uplo = 'U', A is an upper triangular matrix.

If uplo = 'L', A is a lower triangular matrix.

Specified as: a single character. It must be 'U' or 'L'.

transa

indicates the form of matrix A to use in the computation, where:

If transa = 'N', A is used in the computation, resulting in either equation 1 or 3.

If transa = 'T', A^T is used in the computation, resulting in either equation 2 or 4.

If transa = 'C', A^H is used in the computation, resulting in either equation 5 or 6.

Specified as: a single character. It must be 'N', 'T', or 'C'.

diag

indicates the characteristics of the diagonal of matrix A, where:

If diag = 'U', A is a unit triangular matrix.

If diag = 'N', A is not a unit triangular matrix.

Specified as: a single character. It must be 'U' or 'N'.

m

is the number of rows in rectangular matrix B, and:

If side = 'L', m is the order of triangular matrix A.

Specified as: a fullword integer, where:

If side = 'L', 0 <= m <= lda and m <= ldb.

If side = 'R', 0 <= m <= ldb.

n

is the number of columns in rectangular matrix B, and:

If side = 'R', n is the order of triangular matrix A.

Specified as: a fullword integer; n >= 0 and:

If side = 'R', n <= lda.

alpha

is the scalar alpha. Specified as: a number of the data type indicated in Table 78.

a

is the triangular matrix A, of which only the upper or lower triangular portion is used, where:

If side = 'L', A is order m.

If side = 'R', A is order n.
Note: No data should be moved to form A^T or A^H; that is, the matrix A should always be stored in its untransposed form.

Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 78, where:

If side = 'L', its size must be lda by (at least) m.

If side = 'R', it size must be lda by (at least) n.

lda

is the leading dimension of the array specified for a. Specified as: a fullword integer; lda > 0 and:

If side = 'L', lda >= m.

If side = 'R', lda >= n.

b

is the m by n rectangular matrix B. Specified as: an ldb by (at least) n array, containing numbers of the data type indicated in Table 78.

ldb

is the leading dimension of the array specified for b. Specified as: a fullword integer; ldb > 0 and ldb >= m.

On Return

b: is the m by n matrix B, containing the results of the computation. Returned as: an ldb by (at least) n array, containing numbers of the data type indicated in Table 78.

Notes

These subroutines accept lowercase letters for the side, uplo, transa, and diag arguments.
For STRMM and DTRMM, if you specify 'C' for the transa argument, it is interpreted as though you specified 'T'.
Matrices A and B must have no common elements; otherwise, results are unpredictable.
ESSL assumes certain values in your array for parts of a triangular matrix. As a result, you do not have to set these values. For unit triangular matrices, the elements of the diagonal are assumed to be 1.0 for real matrices and (1.0, 0.0) for complex matrices. When using upper- or lower-triangular storage, the unreferenced elements in the lower and upper triangular part, respectively, are assumed to be zero.
For a description of triangular matrices and how they are stored, see "Triangular Matrix".

Function

These subroutines can perform the following matrix-matrix product computations, using the triangular matrix A, its transpose, or its conjugate transpose, where A can be either upper- or lower-triangular:

B <-- alphaAB
B <-- alphaA^TB
B <-- alphaA^HB (for CTRMM and ZTRMM only)
where:

alpha is a scalar.
A is a triangular matrix of order m.
B is an m by n rectangular matrix.
B <-- alphaBA
B <-- alphaBA^T
B <-- alphaBA^H (for CTRMM and ZTRMM only)
where:

alpha is a scalar.
A is a triangular matrix of order n.
B is an m by n rectangular matrix.

See references [32] and [38]. If n or m is 0, no computation is performed.

Error Conditions

Resource Errors

Unable to allocate internal work area.

Computational Errors

None

Input-Argument Errors

m < 0
n < 0
lda, ldb <= 0
side <> 'L' or 'R'
uplo <> 'L' or 'U'
transa <> 'T', 'N', or 'C'
diag <> 'N' or 'U'
side = 'L' and m > lda
m > ldb
side = 'R' and n > lda

Example 1

This example shows the computation B <-- alphaAB, where A is a 5 by 5 upper triangular matrix that is not unit triangular, and B is a 5 by 3 rectangular matrix.

Call Statement and Input

            SIDE  UPLO TRANSA  DIAG  M   N   ALPHA   A  LDA  B  LDB
             |     |     |      |    |   |     |     |   |   |   |
CALL STRMM( 'L' , 'U' , 'N'  , 'N' , 5 , 3 ,  1.0  , A , 7 , B , 6 )

        *                             *
        | 3.0  -1.0   2.0   2.0   1.0 |
        |  .   -2.0   4.0  -1.0   3.0 |
        |  .     .   -3.0   0.0   2.0 |
A    =  |  .     .     .    4.0  -2.0 |
        |  .     .     .     .    1.0 |
        |  .     .     .     .     .  |
        |  .     .     .     .     .  |
        *                             *

        *                 *
        |  2.0  3.0   1.0 |
        |  5.0  5.0   4.0 |
B    =  |  0.0  1.0   2.0 |
        |  3.0  1.0  -3.0 |
        | -1.0  2.0   1.0 |
        |   .    .     .  |
        *                 *

Output

        *                    *
        |   6.0  10.0   -2.0 |
        | -16.0  -1.0    6.0 |
B    =  |  -2.0   1.0   -4.0 |
        |  14.0   0.0  -14.0 |
        |  -1.0   2.0    1.0 |
        |    .     .      .  |
        *                    *

Example 2

This example shows the computation B <-- alphaA^TB, where A is a 5 by 5 upper triangular matrix that is not unit triangular, and B is a 5 by 4 rectangular matrix.

Call Statement and Input

            SIDE  UPLO TRANSA  DIAG  M   N    ALPHA  A  LDA  B  LDB
             |     |     |      |    |   |     |     |   |   |   |
CALL STRMM( 'L' , 'U' , 'T'  , 'N' , 5 , 4 ,  1.0  , A , 7 , B , 6 )

        *                              *
        | -1.0  -4.0  -2.0   2.0   3.0 |
        |   .   -2.0   2.0   2.0   2.0 |
        |   .     .   -3.0  -1.0   4.0 |
A    =  |   .     .     .    1.0   0.0 |
        |   .     .     .     .   -2.0 |
        |   .     .     .     .     .  |
        |   .     .     .     .     .  |
        *                              *

        *                        *
        |  1.0   2.0   3.0   4.0 |
        |  3.0   3.0  -1.0   2.0 |
B    =  | -2.0  -1.0   0.0   1.0 |
        |  4.0   4.0  -3.0  -3.0 |
        |  2.0   2.0   2.0   2.0 |
        |   .     .     .     .  |
        *                        *

Output

        *                          *
        | -1.0  -2.0   -3.0   -4.0 |
        |  2.0  -2.0  -14.0  -12.0 |
B    =  | 10.0   5.0   -8.0   -7.0 |
        | 14.0  15.0    1.0    8.0 |
        | -3.0   4.0    3.0   16.0 |
        |   .     .      .      .  |
        *                          *

Example 3

This example shows the computation B <-- alphaBA, where A is a 5 by 5 lower triangular matrix that is not unit triangular, and B is a 3 by 5 rectangular matrix.

Call Statement and Input

            SIDE  UPLO TRANSA  DIAG  M   N    ALPHA  A  LDA  B  LDB
             |     |     |      |    |   |     |     |   |   |   |
CALL STRMM( 'R' , 'L' , 'N'  , 'N' , 3 , 5 ,  1.0  , A , 7 , B , 4 )

        *                            *
        | 2.0   .     .     .     .  |
        | 2.0  3.0    .     .     .  |
        | 2.0  1.0   1.0    .     .  |
A    =  | 0.0  3.0   0.0  -2.0    .  |
        | 2.0  4.0  -1.0   2.0  -1.0 |
        |  .    .     .     .     .  |
        |  .    .     .     .     .  |
        *                            *

        *                              *
        |  3.0   4.0  -1.0  -1.0  -1.0 |
B    =  |  2.0   1.0  -1.0   0.0   3.0 |
        | -2.0  -1.0  -3.0   0.0   2.0 |
        |   .     .     .     .     .  |
        *                              *

Output

        *                             *
        | 10.0   4.0   0.0  0.0   1.0 |
B    =  | 10.0  14.0  -4.0  6.0  -3.0 |
        | -8.0   2.0  -5.0  4.0  -2.0 |
        |   .     .     .    .     .  |
        *                             *

Example 4

This example shows the computation B <-- alphaBA, where A is a 6 by 6 upper triangular matrix that is unit triangular, and B is a 1 by 6 rectangular matrix.
Note: Because matrix A is unit triangular, the diagonal elements are not referenced. ESSL assumes a value of 1.0 for the diagonal elements.

Call Statement and Input

            SIDE  UPLO TRANSA  DIAG  M   N    ALPHA  A  LDA  B  LDB
             |     |     |      |    |   |     |     |   |   |   |
CALL STRMM( 'R' , 'U' , 'N'  , 'U' , 1 , 6 ,  1.0  , A , 7 , B , 2 )

        *                               *
        | .  2.0  -3.0  1.0   2.0   4.0 |
        | .   .    0.0  1.0   1.0  -2.0 |
        | .   .     .   4.0  -1.0   1.0 |
A    =  | .   .     .    .    0.0  -1.0 |
        | .   .     .    .     .    2.0 |
        | .   .     .    .     .     .  |
        | .   .     .    .     .     .  |
        *                               *

        *                                *
B    =  | 1.0  2.0  1.0  3.0  -1.0  -2.0 |
        |  .    .    .    .     .     .  |
        *                                *

Output

        *                                 *
B    =  | 1.0  4.0  -2.0  10.0  2.0  -6.0 |
        |  .    .     .     .    .     .  |
        *                                 *

Example 5

This example shows the computation B <-- alphaA^HB, where A is a 5 by 5 upper triangular matrix that is not unit triangular, and B is a 5 by 1 rectangular matrix.

Call Statement and Input

            SIDE  UPLO TRANSA  DIAG  M   N    ALPHA   A  LDA  B  LDB
             |     |     |      |    |   |      |     |   |   |   |
CALL CTRMM( 'L' , 'U' , 'C'  , 'N' , 5 , 1 ,  ALPHA , A , 6 , B , 6 )
 
ALPHA    =  (1.0, 0.0)

        *                                                                   *
        | (-4.0, 1.0)  (4.0, -3.0)  (-1.0,  3.0)  (0.0,  0.0)  (-1.0,  0.0) |
        |      .      (-2.0,  0.0)  (-3.0, -1.0) (-2.0, -1.0)   (4.0,  3.0) |
A    =  |      .           .        (-5.0,  3.0) (-3.0, -3.0)  (-5.0, -5.0) |
        |      .           .             .        (4.0, -4.0)   (2.0,  0.0) |
        |      .           .             .            .         (2.0, -1.0) |
        |      .           .             .            .             .       |
        *                                                                   *

        *             *
        |  (3.0, 4.0) |
        | (-4.0, 2.0) |
B    =  | (-5.0, 0.0) |
        |  (1.0, 3.0) |
        |  (3.0, 1.0) |
        |      .      |
        *             *

Output

        *               *
        | (-8.0, -19.0) |
        |  (8.0,  21.0) |
B    =  | (44.0,  -8.0) |
        | (13.0,  -7.0) |
        | (19.0,   2.0) |
        |      .        |
        *               *

SSYRK, DSYRK, CSYRK, ZSYRK, CHERK, and ZHERK--Rank-K Update of a Real or Complex Symmetric or a Complex Hermitian Matrix

These subroutines compute one of the following rank-k updates, where matrix C is stored in either upper or lower storage mode. SSYRK, DSYRK, CSYRK, and ZSYRK use the scalars alpha and beta, real or complex matrix A or its transpose, and real or complex symmetric matrix C to compute:

1. C <-- alphaAA^T+betaC

2. C <-- alphaA^TA+betaC

CHERK and ZHERK use the scalars alpha and beta, complex matrix A or its complex conjugate transpose, and complex Hermitian matrix C to compute:

3. C <-- alphaAA^H+betaC

4. C <-- alphaA^HA+betaC

Table 79. Data Types

A, C alpha, beta Subprogram
Short-precision real Short-precision real SSYRK
Long-precision real Long-precision real DSYRK
Short-precision complex Short-precision complex CSYRK
Long-precision complex Long-precision complex ZSYRK
Short-precision complex Short-precision real CHERK
Long-precision complex Long-precision real ZHERK

Syntax

Fortran	CALL SSYRK \| DSYRK \| CSYRK \| ZSYRK \| CHERK \| ZHERK (`uplo`, `trans`, `n`, `k`, `alpha`, `a`, `lda`, `beta`, `c`, `ldc`)
C and C++	ssyrk \| dsyrk \| csyrk \| zsyrk \| cherk \| zherk (`uplo`, `trans`, `n`, `k`, `alpha`, `a`, `lda`, `beta`, `c`, `ldc`);
PL/I	CALL SSYRK \| DSYRK \| CSYRK \| ZSYRK \| CHERK \| ZHERK (`uplo`, `trans`, `n`, `k`, `alpha`, `a`, `lda`, `beta`, `c`, `ldc`);

On Entry

uplo

indicates the storage mode used for matrix C, where:

If uplo = 'U', C is stored in upper storage mode.

If uplo = 'L', C is stored in lower storage mode.

Specified as: a single character. It must be 'U' or 'L'.

trans

indicates the form of matrix A to use in the computation, where:

If trans = 'N', A is used, resulting in equation 1 or 3.

If trans = 'T', A^T is used, resulting in equation 2.

If trans = 'C', A^H is used, resulting in equation 4.

Specified as: a single character, where:

For SSYRK and DSYRK, it must be 'N', 'T', or 'C'.

For CSYRK and ZSYRK, it must be 'N' or 'T'.

For CHERK and ZHERK, it must be 'N' or 'C'.

n

is the order of matrix C. Specified as: a fullword integer; 0 <= n <= ldc and:

If trans = 'N', then n <= lda.

k

has the following meaning, where:

If trans = 'N', it is the number of columns in matrix A.

If trans = 'T' or 'C', it is the number of rows in matrix A.

Specified as: a fullword integer; k >= 0 and:

If trans = 'T' or 'C', then k <= lda.

alpha

is the scalar alpha. Specified as: a number of the data type indicated in Table 79.

a

is the rectangular matrix A, where:

If trans = 'N', A is n by k.

If trans = 'T' or 'C', A is k by n.
Note: No data should be moved to form A^T or A^H; that is, the matrix A should always be stored in its untransposed form.

Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 79, where:

If trans = 'N', its size must be lda by (at least) k.

If trans = 'T' or 'C', its size must be lda by (at least) n.

lda

is the leading dimension of the array specified for a. Specified as: a fullword integer; lda > 0 and:

If trans = 'N', lda >= n.

If trans = 'T' or 'C', lda >= k.

beta

is the scalar beta. Specified as: a number of the data type indicated in Table 79.

c

is matrix C of order n, which is real symmetric, complex symmetric, or complex Hermitian, where:

If uplo = 'U', C is stored in upper storage mode.

If uplo = 'L', C is stored in lower storage mode.

Specified as: an ldc by (at least) n array, containing numbers of the data type indicated in Table 79.

ldc

is the leading dimension of the array specified for c. Specified as: a fullword integer; ldc > 0 and ldc >= n.

On Return

c

is matrix C of order n, which is real symmetric, complex symmetric, or complex Hermitian, containing the results of the computation, where:

If uplo = 'U', C is stored in upper storage mode.

If uplo = 'L', C is stored in lower storage mode.

Returned as: an ldc by (at least) n array, containing numbers of the data type indicated in Table 79.

Notes

These subroutines accept lowercase letters for the uplo and trans arguments.
For SSYRK and DSYRK, if you specify 'C' for the trans argument, it is interpreted as though you specified 'T'.
Matrices A and C must have no common elements; otherwise, results are unpredictable.
The imaginary parts of the diagonal elements of a complex Hermitian matrix A are assumed to be zero, so you do not have to set these values. On output, they are set to zero, except when beta is one and alpha or k is zero, in which case no computation is performed.
For a description of how symmetric matrices are stored in upper and lower storage mode, see "Symmetric Matrix". For a description of how complex Hermitian matrices are stored in upper and lower storage mode, see "Complex Hermitian Matrix".

Function

These subroutines can perform the following rank-k updates. For SSYRK and DSYRK, matrix C is real symmetric. For CSYRK and ZSYRK, matrix C is complex symmetric. They perform:

1. C <-- alphaAA^T+betaC

2. C <-- alphaA^TA+betaC

For CHERK and ZHERK, matrix C is complex Hermitian. They perform:

3. C <-- alphaAA^H+betaC

4. C <-- alphaA^HA+betaC

where:

alpha and beta are scalars.

A is a rectangular matrix, which is n by k for equations 1 and 3, and is k by n for equations 2 and 4.

C is a matrix of order n of the type indicated above, stored in upper or lower storage mode.

See references [32] and [38]. In the following two cases, no computation is performed:

n is 0.
beta is one, and alpha is zero or k is zero.

Assuming the above conditions do not exist, if beta is not one, and alpha is zero or k is zero, then betaC is returned.

Error Conditions

Resource Errors

Unable to allocate internal work area.

Computational Errors

None

Input-Argument Errors

lda, ldc <= 0
ldc < n
k, n < 0
uplo <> 'U' or 'L'
trans <> 'N', 'T', or 'C' for SSYRK and DSYRK
trans <> 'N' or 'T' for CSYRK and ZSYRK
trans <> 'N' or 'C' for CHERK and ZHERK
trans = 'N' and lda < n
trans = 'T' or 'C' and lda < k

Example 1

This example shows the computation C <-- alphaAA^T+betaC, where A is an 8 by 2 real rectangular matrix, and C is a real symmetric matrix of order 8, stored in upper storage mode.

Call Statement and Input

            UPLO TRANS   N   K   ALPHA   A  LDA  BETA  C   LDC
             |     |     |   |     |     |   |    |    |    |
CALL SSYRK( 'U' , 'N' ,  8 , 2 ,  1.0  , A , 9 , 1.0 , C , 10 )

        *           *
        | 0.0   8.0 |
        | 1.0   9.0 |
        | 2.0  10.0 |
        | 3.0  11.0 |
A    =  | 4.0  12.0 |
        | 5.0  13.0 |
        | 6.0  14.0 |
        | 7.0  15.0 |
        |  .     .  |
        *           *

        *                                            *
        | 0.0  1.0  3.0  6.0  10.0  15.0  21.0  28.0 |
        |  .   2.0  4.0  7.0  11.0  16.0  22.0  29.0 |
        |  .    .   5.0  8.0  12.0  17.0  23.0  30.0 |
        |  .    .    .   9.0  13.0  18.0  24.0  31.0 |
C    =  |  .    .    .    .   14.0  19.0  25.0  32.0 |
        |  .    .    .    .     .   20.0  26.0  33.0 |
        |  .    .    .    .     .     .   27.0  34.0 |
        |  .    .    .    .     .     .     .   35.0 |
        |  .    .    .    .     .     .     .     .  |
        |  .    .    .    .     .     .     .     .  |
        *                                            *

Output

        *                                                      *
        | 64.0  73.0   83.0   94.0  106.0  119.0  133.0  148.0 |
        |   .   84.0   96.0  109.0  123.0  138.0  154.0  171.0 |
        |   .     .   109.0  124.0  140.0  157.0  175.0  194.0 |
        |   .     .      .   139.0  157.0  176.0  196.0  217.0 |
C    =  |   .     .      .      .   174.0  195.0  217.0  240.0 |
        |   .     .      .      .      .   214.0  238.0  263.0 |
        |   .     .      .      .      .      .   259.0  286.0 |
        |   .     .      .      .      .      .      .   309.0 |
        |   .     .      .      .      .      .      .      .  |
        |   .     .      .      .      .      .      .      .  |
        *                                                      *

Example 2

This example shows the computation C <-- alphaA^TA+betaC, where A is a 3 by 8 real rectangular matrix, and C is a real symmetric matrix of order 8, stored in lower storage mode.

Call Statement and Input

            UPLO TRANS   N   K   ALPHA   A  LDA  BETA  C  LDC
             |     |     |   |     |     |   |    |    |   |
CALL SSYRK( 'L' , 'T' ,  8 , 3 ,  1.0  , A , 4 , 1.0 , C , 8 )

        *                                             *
        | 0.0  3.0  6.0   9.0  12.0  15.0  18.0  21.0 |
A    =  | 1.0  4.0  7.0  10.0  13.0  16.0  19.0  22.0 |
        | 2.0  5.0  8.0  11.0  14.0  17.0  20.0  23.0 |
        |  .    .    .     .     .     .     .     .  |
        *                                             *

        *                                               *
        | 0.0    .     .     .     .     .     .     .  |
        | 1.0   8.0    .     .     .     .     .     .  |
        | 2.0   9.0  15.0    .     .     .     .     .  |
C    =  | 3.0  10.0  16.0  21.0    .     .     .     .  |
        | 4.0  11.0  17.0  22.0  26.0    .     .     .  |
        | 5.0  12.0  18.0  23.0  27.0  30.0    .     .  |
        | 6.0  13.0  19.0  24.0  28.0  31.0  33.0    .  |
        | 7.0  14.0  20.0  25.0  29.0  32.0  34.0  35.0 |
        *                                               *

Output

        *                                                          *
        |  5.0     .      .      .      .       .       .       .  |
        | 15.0   58.0     .      .      .       .       .       .  |
        | 25.0   95.0  164.0     .      .       .       .       .  |
C    =  | 35.0  132.0  228.0  323.0     .       .       .       .  |
        | 45.0  169.0  292.0  414.0  535.0      .       .       .  |
        | 55.0  206.0  356.0  505.0  653.0   800.0      .       .  |
        | 65.0  243.0  420.0  596.0  771.0   945.0  1118.0      .  |
        | 75.0  280.0  484.0  687.0  889.0  1090.0  1290.0  1489.0 |
        *                                                          *

Example 3

This example shows the computation C <-- alphaAA^T+betaC, where A is a 3 by 5 complex rectangular matrix, and C is a complex symmetric matrix of order 3, stored in upper storage mode.

Call Statement and Input

            UPLO TRANS   N   K   ALPHA   A  LDA  BETA   C  LDC
             |     |     |   |     |     |   |    |     |   |
CALL CSYRK( 'U' , 'N' ,  3 , 5 , ALPHA , A , 3 , BETA , C , 4 )

ALPHA    =  (1.0, 1.0)
BETA     =  (1.0, 1.0)

        *                                                        *
        | (2.0, 0.0) (3.0, 2.0) (4.0, 1.0) (1.0, 7.0) (0.0, 0.0) |
A    =  | (3.0, 3.0) (8.0, 0.0) (2.0, 5.0) (2.0, 4.0) (1.0, 2.0) |
        | (1.0, 3.0) (2.0, 1.0) (6.0, 0.0) (3.0, 2.0) (2.0, 2.0) |
        *                                                        *

        *                                  *
        | (2.0, 1.0) (1.0, 9.0) (4.0, 5.0) |
C    =  |     .      (3.0, 1.0) (6.0, 7.0) |
        |     .          .      (8.0, 1.0) |
        |     .          .          .      |
        *                                  *

Output

        *                                            *
        | (-57.0, 13.0) (-63.0, 79.0) (-24.0,  70.0) |
C    =  |       .       (-28.0, 90.0) (-55.0, 103.0) |
        |       .             .        (13.0,  75.0) |
        |       .             .             .        |
        *                                            *

Example 4

This example shows the computation C <-- alphaA^HA+betaC, where A is a 5 by 3 complex rectangular matrix, and C is a complex Hermitian matrix of order 3, stored in lower storage mode.
Note: The imaginary parts of the diagonal elements of a complex Hermitian matrix are assumed to be zero, so you do not have to set these values. On output, they are set to zero.

Call Statement and Input

            UPLO TRANS   N   K   ALPHA   A  LDA  BETA   C  LDC
             |     |     |   |     |     |   |    |     |   |
CALL CHERK( 'L' , 'C' ,  3 , 5 ,  1.0  , A , 5 , 1.0  , C , 4 )

        *                                  *
        | (2.0, 0.0) (3.0, 2.0) (4.0, 1.0) |
        | (3.0, 3.0) (8.0, 0.0) (2.0, 5.0) |
A    =  | (1.0, 3.0) (2.0, 1.0) (6.0, 0.0) |
        | (3.0, 3.0) (8.0, 0.0) (2.0, 5.0) |
        | (1.0, 9.0) (3.0, 0.0) (6.0, 7.0) |
        *                                  *

        *                                    *
        | (6.0,  . )       .           .     |
C    =  | (3.0, 4.0)  (10.0,  . )      .     |
        | (9.0, 1.0)  (12.0, 2.0)  (3.0, . ) |
        |     .            .           .     |
        *                                    *

Output

        *                                             *
        | (138.0,  0.0)        .               .      |
C    =  |  (65.0, 80.0)  (165.0,   0.0)        .      |
        | (134.0, 46.0)   (88.0, -88.0)  (199.0, 0.0) |
        |       .              .               .      |
        *                                             *

SSYR2K, DSYR2K, CSYR2K, ZSYR2K, CHER2K, and ZHER2K--Rank-2K Update of a Real or Complex Symmetric or a Complex Hermitian Matrix

These subroutines compute one of the following rank-2k updates, where matrix C is stored in upper or lower storage mode. SSYR2K, DSYR2K, CSYR2K, and ZSYR2K use the scalars alpha and beta, real or complex matrices A and B or their transposes, and real or complex symmetric matrix C to compute:

1. C <-- alphaAB^T+alphaBA^T+betaC

2. C <-- alphaA^TB+alphaB^TA+betaC

CHER2K and ZHER2K use the scalars alpha and beta, complex matrices A and B or their complex conjugate transposes, and complex Hermitian matrix C to compute:

Table 80. Data Types

A, B, C, alpha beta Subprogram
Short-precision real Short-precision real SSYR2K
Long-precision real Long-precision real DSYR2K
Short-precision complex Short-precision complex CSYR2K
Long-precision complex Long-precision complex ZSYR2K
Short-precision complex Short-precision real CHER2K
Long-precision complex Long-precision real ZHER2K

Syntax

Fortran	CALL SSYR2K \| DSYR2K \| CSYR2K \| ZSYR2K \| CHER2K \| ZHER2K (`uplo`, `trans`, `n`, `k`, `alpha`, `a`, `lda`, `b`, `ldb`, `beta`, `c`, `ldc`)
C and C++	ssyr2k \| dsyr2k \| csyr2k \| zsyr2k \| cher2k \| zher2k (`uplo`, `trans`, `n`, `k`, `alpha`, `a`, `lda`, `b`, `ldb`, `beta`, `c`, `ldc`);
PL/I	CALL SSYR2K \| DSYR2K \| CSYR2K \| ZSYR2K \| CHER2K \| ZHER2K (`uplo`, `trans`, `n`, `k`, `alpha`, `a`, `lda`, `b`, `ldb`, `beta`, `c`, `ldc`);

On Entry

uplo

indicates the storage mode used for matrix C, where:

If uplo = 'U', C is stored in upper storage mode.

If uplo = 'L', C is stored in lower storage mode.

Specified as: a single character. It must be 'U' or 'L'.

trans

indicates the form of matrices A and B to use in the computation, where:

If trans = 'N', A and B are used, resulting in equation 1 or 3.

If trans = 'T', A^T and B^T are used, resulting in equation 2.

If trans = 'C', A^H and B^H are used, resulting in equation 4.

Specified as: a single character, where:

For SSYR2K and DSYR2K, it must be 'N', 'T', or 'C'.

For CSYR2K and ZSYR2K, it must be 'N' or 'T'.

For CHER2K and ZHER2K, it must be 'N' or 'C'.

n

is the order of matrix C. Specified as: a fullword integer; 0 <= n <= ldc and:

If trans = 'N', then n <= lda and n <= ldb.

k

has the following meaning, where:

If trans = 'N', it is the number of columns in matrices A and B.

If trans = 'T' or 'C', it is the number of rows in matrices A and B.

Specified as: a fullword integer; k >= 0 and:

If trans = 'T' or 'C', then k <= lda and k <= ldb.

alpha

is the scalar alpha. Specified as: a number of the data type indicated in Table 80.

a

is the rectangular matrix A, where:

If trans = 'N', A is n by k.

If trans = 'T' or 'C', A is k by n.
Note: No data should be moved to form A^T or A^H; that is, the matrix A should always be stored in its untransposed form.

Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 80, where:

If trans = 'N', its size must be lda by (at least) k.

If trans = 'T' or 'C', its size must be lda by (at least) n.

lda

is the leading dimension of the array specified for a. Specified as: a fullword integer; lda > 0 and:

If trans = 'N', lda >= n.

If trans = 'T' or 'C', lda >= k.

b

is the rectangular matrix B, where:

If trans = 'N', B is n by k.

If trans = 'T' or 'C', B is k by n.
Note: No data should be moved to form B^T or B^H; that is, the matrix B should always be stored in its untransposed form.

Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 80, where:

If trans = 'N', its size must be ldb by (at least) k.

If trans = 'T' or 'C', its size must be ldb by (at least) n.

ldb

is the leading dimension of the array specified for b. Specified as: a fullword integer; ldb > 0 and:

If trans = 'N', ldb >= n.

If trans = 'T' or 'C', ldb >= k.

beta

is the scalar beta. Specified as: a number of the data type indicated in Table 80.

c

is matrix C of order n, which is real symmetric, complex symmetric, or complex Hermitian, where:

If uplo = 'U', C is stored in upper storage mode.

If uplo = 'L', C is stored in lower storage mode.

Specified as: an ldc by (at least) n array, containing numbers of the data type indicated in Table 80.

ldc

is the leading dimension of the array specified for c. Specified as: a fullword integer; ldc > 0 and ldc >= n.

On Return

c

is matrix C of order n, which is real symmetric, complex symmetric, or complex Hermitian, containing the results of the computation, where:

If uplo = 'U', C is stored in upper storage mode.

If uplo = 'L', C is stored in lower storage mode.

Returned as: an ldc by (at least) n array, containing numbers of the data type indicated in Table 80.

Notes

These subroutines accept lowercase letters for the uplo and trans arguments.
For SSYR2K and DSYR2K, if you specify 'C' for the trans argument, it is interpreted as though you specified 'T'.
Matrices A and B must have no common elements with matrix C; otherwise, results are unpredictable.
The imaginary parts of the diagonal elements of a complex Hermitian matrix A are assumed to be zero, so you do not have to set these values. On output, they are set to zero, except when beta is one and alpha or k is zero, in which case no computation is performed.
For a description of how symmetric matrices are stored in upper and lower storage mode, see "Symmetric Matrix". For a description of how complex Hermitian matrices are stored in upper and lower storage mode, see "Complex Hermitian Matrix".

Function

These subroutines can perform the following rank-2k updates. For SSYR2K and DSYR2K, matrix C is real symmetric. For CSYR2K and ZSYR2K, matrix C is complex symmetric. They perform:

1. C <-- alphaAB^T + alphaBA^T + betaC

2. C <-- alphaA^TB + alphaB^TA + betaC

For CHER2K and ZHER2K, matrix C is complex Hermitian. They perform:

where:

alpha and beta are scalars.

A and B are rectangular matrices, which are n by k for equations 1 and 3, and are k by n for equations 2 and 4.

C is a matrix of order n of the type indicated above, stored in upper or lower storage mode.

See references [32], [38], and [66]. In the following two cases, no computation is performed:

n is 0.
beta is one, and alpha is zero or k is zero.

Assuming the above conditions do not exist, if beta is not one, and alpha is zero or k is zero, then betaC is returned.

Error Conditions

Resource Errors

Unable to allocate internal work area.

Computational Errors

None

Input-Argument Errors

lda, ldb, ldc <= 0
ldc < n
k, n < 0
uplo <> 'U' or 'L'
trans <> 'N', 'T', or 'C' for SSYR2K and DSYR2K
trans <> 'N' or 'T' for CSYR2K and ZSYR2K
trans <> 'N' or 'C' for CHER2K and ZHER2K
trans = 'N' and lda < n
trans = 'T' or 'C' and lda < k
trans = 'N' and ldb < n
trans = 'T' or 'C' and ldb < k

Example 1

This example shows the computation C <-- alphaAB^T+alphaBA^T+betaC, where A and B are 8 by 2 real rectangular matrices, and C is a real symmetric matrix of order 8, stored in upper storage mode.

Call Statement and Input

             UPLO TRANS   N   K   ALPHA   A  LDA  B  LDB  BETA  C   LDC
              |     |     |   |     |     |   |   |   |    |    |    |
CALL SSYR2K( 'U' , 'N' ,  8 , 2 ,  1.0  , A , 9 , B , 8 , 1.0 , C , 10 )

        *           *
        | 0.0   8.0 |
        | 1.0   9.0 |
        | 2.0  10.0 |
        | 3.0  11.0 |
A    =  | 4.0  12.0 |
        | 5.0  13.0 |
        | 6.0  14.0 |
        | 7.0  15.0 |
        |  .     .  |
        *           *

        *           *
        | 15.0  7.0 |
        | 14.0  6.0 |
        | 13.0  5.0 |
B    =  | 12.0  4.0 |
        | 11.0  3.0 |
        | 10.0  2.0 |
        |  9.0  1.0 |
        |  8.0  0.0 |
        *           *

        *                                            *
        | 0.0  1.0  3.0  6.0  10.0  15.0  21.0  28.0 |
        |  .   2.0  4.0  7.0  11.0  16.0  22.0  29.0 |
        |  .    .   5.0  8.0  12.0  17.0  23.0  30.0 |
        |  .    .    .   9.0  13.0  18.0  24.0  31.0 |
C    =  |  .    .    .    .   14.0  19.0  25.0  32.0 |
        |  .    .    .    .     .   20.0  26.0  33.0 |
        |  .    .    .    .     .     .   27.0  34.0 |
        |  .    .    .    .     .     .     .   35.0 |
        |  .    .    .    .     .     .     .     .  |
        |  .    .    .    .     .     .     .     .  |
        *                                            *

Output

        *                                                        *
        | 112.0  127.0  143.0  160.0  178.0  197.0  217.0  238.0 |
        |    .   138.0  150.0  163.0  177.0  192.0  208.0  225.0 |
        |    .      .   157.0  166.0  176.0  187.0  199.0  212.0 |
        |    .      .      .   169.0  175.0  182.0  190.0  199.0 |
C    =  |    .      .      .      .   174.0  177.0  181.0  186.0 |
        |    .      .      .      .      .   172.0  172.0  173.0 |
        |    .      .      .      .      .      .   163.0  160.0 |
        |    .      .      .      .      .      .      .   147.0 |
        |    .      .      .      .      .      .      .      .  |
        |    .      .      .      .      .      .      .      .  |
        *                                                        *

Example 2

This example shows the computation C <-- alphaA^TB+alphaB^TA+betaC, where A and B are 3 by 8 real rectangular matrices, and C is a real symmetric matrix of order 8, stored in lower storage mode.

Call Statement and Input

             UPLO TRANS   N   K   ALPHA   A  LDA  B  LDB  BETA  C  LDC
              |     |     |   |     |     |   |   |   |    |    |   |
CALL SSYR2K( 'L' , 'T' ,  8 , 3 ,  1.0  , A , 4 , B , 5 , 1.0 , C , 8 )

        *                                             *
        | 0.0  3.0  6.0   9.0  12.0  15.0  18.0  21.0 |
A    =  | 1.0  4.0  7.0  10.0  13.0  16.0  19.0  22.0 |
        | 2.0  5.0  8.0  11.0  14.0  17.0  20.0  23.0 |
        |  .    .    .     .     .     .     .     .  |
        *                                             *

        *                                         *
        | 1.0  2.0  3.0  4.0  5.0  6.0  7.0   8.0 |
        | 2.0  3.0  4.0  5.0  6.0  7.0  8.0   9.0 |
B    =  | 3.0  4.0  5.0  6.0  7.0  8.0  9.0  10.0 |
        |  .    .    .    .    .    .    .     .  |
        |  .    .    .    .    .    .    .     .  |
        *                                         *

        *                                               *
        | 0.0    .     .     .     .     .     .     .  |
        | 1.0   8.0    .     .     .     .     .     .  |
        | 2.0   9.0  15.0    .     .     .     .     .  |
C    =  | 3.0  10.0  16.0  21.0    .     .     .     .  |
        | 4.0  11.0  17.0  22.0  26.0    .     .     .  |
        | 5.0  12.0  18.0  23.0  27.0  30.0    .     .  |
        | 6.0  13.0  19.0  24.0  28.0  31.0  33.0    .  |
        | 7.0  14.0  20.0  25.0  29.0  32.0  34.0  35.0 |
        *                                               *

Output

        *                                                   *
        |  16.0    .     .     .     .     .      .      .  |
        |  38.0  84.0    .     .     .     .      .      .  |
        |  60.0 124.0 187.0    .     .     .      .      .  |
C    =  |  82.0 164.0 245.0 325.0    .     .      .      .  |
        | 104.0 204.0 303.0 401.0 498.0    .      .      .  |
        | 126.0 244.0 361.0 477.0 592.0 706.0     .      .  |
        | 148.0 284.0 419.0 553.0 686.0 818.0  949.0     .  |
        | 170.0 324.0 477.0 629.0 780.0 930.0 1079.0 1227.0 |
        *                                                   *

Example 3

This example shows the computation C <-- alphaAB^T+alphaBA^T+betaC, where A and B are 3 by 5 complex rectangular matrices, and C is a complex symmetric matrix of order 3, stored in lower storage mode.

Call Statement and Input

             UPLO TRANS   N   K   ALPHA   A  LDA  B  LDB  BETA   C  LDC
              |     |     |   |     |     |   |   |   |    |     |   |
CALL CSYR2K( 'L' , 'N' ,  3 , 5 , ALPHA , A , 3 , B , 3 , BETA , C , 4 )

ALPHA    =  (1.0, 1.0)
 
BETA     =  (1.0, 1.0)

        *                                                        *
        | (2.0, 5.0) (3.0, 2.0) (4.0, 1.0) (1.0, 7.0) (0.0, 0.0) |
A    =  | (3.0, 3.0) (8.0, 5.0) (2.0, 5.0) (2.0, 4.0) (1.0, 2.0) |
        | (1.0, 3.0) (2.0, 1.0) (6.0, 5.0) (3.0, 2.0) (2.0, 2.0) |
        *                                                        *

        *                                                        *
        | (1.0, 5.0) (6.0, 2.0) (3.0, 1.0) (2.0, 0.0) (1.0, 0.0) |
B    =  | (2.0, 4.0) (7.0, 5.0) (2.0, 5.0) (2.0, 4.0) (0.0, 0.0) |
        | (3.0, 5.0) (8.0, 1.0) (1.0, 5.0) (1.0, 0.0) (1.0, 1.0) |
        *                                                        *

        *                                  *
        | (2.0, 3.0)     .          .      |
C    =  | (1.0, 9.0) (3.0, 3.0)     .      |
        | (4.0, 5.0) (6.0, 7.0) (8.0, 3.0) |
        |     .          .          .      |
        *                                  *

Output

        *                                                 *
        | (-101.0, 121.0)        .               .        |
C    =  | (-182.0, 192.0) (-274.0, 248.0)        .        |
        |  (-98.0, 146.0) (-163.0, 205.0) (-151.0, 115.0) |
        |        .               .               .        |
        *                                                 *

Example 4

This example shows the computation:

where A and B are 5 by 3 complex rectangular matrices, and C is a complex Hermitian matrix of order 3, stored in upper storage mode.
Note: The imaginary parts of the diagonal elements of a complex Hermitian matrix are assumed to be zero, so you do not have to set these values. On output, they are set to zero.

Call Statement and Input

             UPLO TRANS   N   K   ALPHA   A  LDA  B  LDB  BETA   C  LDC
              |     |     |   |     |     |   |   |   |    |     |   |
CALL CHER2K( 'U' , 'C' ,  3 , 5 , ALPHA , A , 5 , B , 5 , 1.0  , C , 4 )
 
ALPHA    =  (1.0, 1.0)

        *                                  *
        | (2.0, 0.0) (3.0, 2.0) (4.0, 1.0) |
        | (3.0, 3.0) (8.0, 0.0) (2.0, 5.0) |
A    =  | (1.0, 3.0) (2.0, 1.0) (6.0, 0.0) |
        | (3.0, 3.0) (8.0, 0.0) (2.0, 5.0) |
        | (1.0, 9.0) (3.0, 0.0) (6.0, 7.0) |
        *                                  *

        *                                  *
        | (4.0, 5.0) (6.0, 7.0) (8.0, 0.0) |
        | (1.0, 9.0) (3.0, 0.0) (6.0, 7.0) |
B    =  | (3.0, 3.0) (8.0, 0.0) (2.0, 5.0) |
        | (1.0, 3.0) (2.0, 1.0) (6.0, 0.0) |
        | (2.0, 0.0) (3.0, 2.0) (4.0, 1.0) |
        *                                  *

        *                                    *
        | (6.0,  . )  (3.0, 4.0)  (9.0, 1.0) |
C    =  |     .      (10.0,  . ) (12.0, 2.0) |
        |     .           .       (3.0,  . ) |
        |     .           .           .      |
        *                                    *

Output

        *                                             *
        | (102.0, 0.0)  (56.0, -143.0) (244.0, -96.0) |
C    =  |       .      (174.0,    0.0) (238.0,  78.0) |
        |       .            .         (363.0,   0.0) |
        |       .            .               .        |
        *                                             *

SGETMI, DGETMI, CGETMI, and ZGETMI--General Matrix Transpose (In-Place)

These subroutines transpose an n by n matrix A in place--that is, in matrix A:

A <-- A^T

Table 81. Data Types

A Subroutine
Short-precision real SGETMI
Long-precision real DGETMI
Short-precision complex CGETMI
Long-precision complex ZGETMI

Syntax

Fortran	CALL SGETMI \| DGETMI \| CGETMI \| ZGETMI (`a`, `lda`, `n`)
C and C++	sgetmi \| dgetmi \| cgetmi \| zgetmi (`a`, `lda`, `n`);
PL/I	CALL SGETMI \| DGETMI \| CGETMI \| ZGETMI (`a`, `lda`, `n`);

On Entry

a: is the matrix A having n rows and n columns. Specified as: an lda by (at least) n array, containing numbers of the data type indicated in Table 81.
lda: is the leading dimension of the array specified for a. Specified as: a fullword integer; lda > 0 and lda >= n.
n: is the number of rows and columns in matrix A. Specified as: a fullword integer; n >= 0.

On Return

a: is the n by n matrix A^T, containing the results of the matrix transpose operation Returned as: an lda by (at least) n array, containing numbers of the data type indicated in Table 81.

Notes

To achieve optimal performance in these subroutines, specify an even value for lda. An odd value may degrade performance.
To achieve optimal performance in CGETMI, align the array specified for a on a doubleword boundary.

Function

Matrix A is transposed in place; that is, the n rows and n columns in matrix A are exchanged. For matrix A with elements a_ij, where i, j = 1, n, the in-place transpose is expressed as a_ji = a_ij for i, j = 1, n.

For the following input matrix A:

the in-place matrix transpose operation A <-- A^T is expressed as:

If n is 0, no computation is performed.

Error Conditions

Computational Errors

None

Input-Argument Errors

n < 0 or n > lda
lda <= 0

Example

This example shows an in-place matrix transpose of matrix A having 5 rows and 5 columns.

Call Statement and Input

               A     LDA   N
               |      |    |
CALL SGETMI( A(2,3) , 10 , 5 )

        *                                    *
        | .   .   .     .     .     .     .  |
        | .   .  1.0   6.0  11.0  16.0  21.0 |
        | .   .  2.0   7.0  12.0  17.0  22.0 |
        | .   .  3.0   8.0  13.0  18.0  23.0 |
A    =  | .   .  4.0   9.0  14.0  19.0  24.0 |
        | .   .  5.0  10.0  15.0  20.0  25.0 |
        | .   .   .     .     .     .     .  |
        | .   .   .     .     .     .     .  |
        | .   .   .     .     .     .     .  |
        | .   .   .     .     .     .     .  |
        *                                    *

Output

        *                                     *
        | .   .    .     .     .     .     .  |
        | .   .   1.0   2.0   3.0   4.0   5.0 |
        | .   .   6.0   7.0   8.0   9.0  10.0 |
        | .   .  11.0  12.0  13.0  14.0  15.0 |
A    =  | .   .  16.0  17.0  18.0  19.0  20.0 |
        | .   .  21.0  22.0  23.0  24.0  25.0 |
        | .   .    .     .     .     .     .  |
        | .   .    .     .     .     .     .  |
        | .   .    .     .     .     .     .  |
        | .   .    .     .     .     .     .  |
        *                                     *

SGETMO, DGETMO, CGETMO, and ZGETMO--General Matrix Transpose (Out-of-Place)

These subroutines transpose an m by n matrix A out of place, returning the result in matrix B:

B <-- A^T

Table 82. Data Types

A, B Subroutine
Short-precision real SGETMO
Long-precision real DGETMO
Short-precision complex CGETMO
Long-precision complex ZGETMO

Syntax

Fortran	CALL SGETMO \| DGETMO \| CGETMO \| ZGETMO (`a`, `lda`, `m`, `n`, `b`, `ldb`)
C and C++	sgetmo \| dgetmo \| cgetmo \| zgetmo (`a`, `lda`, `m`, `n`, `b`, `ldb`);
PL/I	CALL SGETMO \| DGETMO \| CGETMO \| ZGETMO (`a`, `lda`, `m`, `n`, `b`, `ldb`);

On Entry

a: is the matrix A having m rows and n columns. Specified as: an lda by (at least) n array, containing numbers of the data type indicated in Table 82.
lda: is the leading dimension of the array specified for a. Specified as: a fullword integer; lda > 0 and lda >= m.
m: is the number of rows in matrix A and the number of columns in matrix B. Specified as: a fullword integer; m >= 0.
n: is the number of columns in matrix A and the number of rows in matrix B. Specified as: a fullword integer; n >= 0.
b: See 'On Return'.
ldb: is the leading dimension of the array specified for b. Specified as: a fullword integer; ldb > 0 and ldb >= n.

On Return

b: is the matrix B having n rows and m columns, containing the results of the matrix transpose operation, A^T. Returned as: an ldb by (at least) m array, containing numbers of the data type indicated in Table 82.

Notes

The matrix B must have no common elements with matrix A; otherwise, results are unpredictable. See "Concepts".
To achieve optimal performance in CGETMO, align the arrays specified for a and b on doubleword boundaries.

Function

Matrix A is transposed out of place; that is, the m rows and n columns in matrix A are stored in n rows and m columns of matrix B. For matrix A with elements a_ij, where i = 1, m and j = 1, n, the out-of-place transpose is expressed as b_ji = a_ij for i = 1, m and j = 1, n.

For the following input matrix A:

the out-of-place matrix transpose operation B <-- A^T is expressed as:

If m or n is 0, no computation is performed.

Error Conditions

Computational Errors

None

Input-Argument Errors

m < 0 or m > lda
n < 0 or n > ldb
lda <= 0
ldb <= 0

Example 1

This example shows an out-of-place matrix transpose of matrix A, having 5 rows and 4 columns, with the result going into matrix B.

Call Statement and Input

               A     LDA   M   N     B     LDB
               |      |    |   |     |      |
CALL SGETMO( A(2,3) , 10 , 5 , 4 , B(2,2) , 6 )

        *                                *
        | .  .   .     .     .     .   . |
        | .  .  1.0   6.0  11.0  16.0  . |
        | .  .  2.0   7.0  12.0  17.0  . |
        | .  .  3.0   8.0  13.0  18.0  . |
A    =  | .  .  4.0   9.0  14.0  19.0  . |
        | .  .  5.0  10.0  15.0  20.0  . |
        | .  .   .     .     .     .   . |
        | .  .   .     .     .     .   . |
        | .  .   .     .     .     .   . |
        | .  .   .     .     .     .   . |
        *                                *

Output

        *                                    *
        | .    .     .     .     .     .   . |
        | .   1.0   2.0   3.0   4.0   5.0  . |
B    =  | .   6.0   7.0   8.0   9.0  10.0  . |
        | .  11.0  12.0  13.0  14.0  15.0  . |
        | .  16.0  17.0  18.0  19.0  20.0  . |
        | .    .     .     .     .     .   . |
        *                                    *

Example 2

This example uses the same input matrix A as in Example 1 to show that transposes can be achieved in the same array as long as the input and output data do not overlap. On output, the input data is not overwritten in the array.

Call Statement and Input

               A     LDA   M   N     B     LDB
               |      |    |   |     |      |
CALL SGETMO( A(2,3) , 10 , 5 , 4 , A(7,1) , 10 )

        *                                       *
        |   .     .     .     .     .     .   . |
        |   .     .    1.0   6.0  11.0  16.0  . |
        |   .     .    2.0   7.0  12.0  17.0  . |
        |   .     .    3.0   8.0  13.0  18.0  . |
A    =  |   .     .    4.0   9.0  14.0  19.0  . |
        |   .     .    5.0  10.0  15.0  20.0  . |
        |  1.0   2.0   3.0   4.0   5.0    .   . |
        |  6.0   7.0   8.0   9.0  10.0    .   . |
        | 11.0  12.0  13.0  14.0  15.0    .   . |
        | 16.0  17.0  18.0  19.0  20.0    .   . |
        *                                       *

[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]

A, B, C	Subroutine
Short-precision real	SGEADD
Long-precision real	DGEADD
Short-precision complex	CGEADD
Long-precision complex	ZGEADD

C <-- AB	C <-- AB^T	C <-- AB^H
C <-- A^TB	C <-- A^TB^T	C <-- A^TB^H
C <-- A^HB	C <-- A^HB^T	C <-- A^HB^H

A, B, C	`aux`	Subroutine
Short-precision real	Short-precision real	SGEMMS
Long-precision real	Long-precision real	DGEMMS
Short-precision complex	Short-precision real	CGEMMS
Long-precision complex	Long-precision real	ZGEMMS

C <-- alphaAB+betaC	C <-- alphaAB^T+betaC
C <-- alphaA^TB+betaC	C <-- alphaA^TB^T+betaC

C <-- alphaAB+betaC	C <-- alphaAB^T+betaC	C <-- alphaAB^H+betaC
C <-- alphaA^TB+betaC	C <-- alphaA^TB^T+betaC	C <-- alphaA^TB^H+betaC
C <-- alphaA^HB+betaC	C <-- alphaA^HB^T+betaC	C <-- alphaA^HB^H+betaC

A, B, C, alpha, beta	Subroutine
Short-precision real	SGEMM
Long-precision real	DGEMM
Short-precision complex	CGEMM
Long-precision complex	ZGEMM

alpha, A, B, beta, C	Subprogram
Short-precision real	SSYMM
Long-precision real	DSYMM
Short-precision complex	CSYMM and CHEMM
Long-precision complex	ZSYMM and ZHEMM

1. B <-- alphaAB	3. B <-- alphaBA
2. B <-- alphaA^TB	4. B <-- alphaBA^T

1. B <-- alphaAB	3. B <-- alphaBA	5. B <-- alphaA^HB
2. B <-- alphaA^TB	4. B <-- alphaBA^T	6. B <-- alphaBA^H

A, B, alpha	Subroutine
Short-precision real	STRMM
Long-precision real	DTRMM
Short-precision complex	CTRMM
Long-precision complex	ZTRMM

A, C	alpha, beta	Subprogram
Short-precision real	Short-precision real	SSYRK
Long-precision real	Long-precision real	DSYRK
Short-precision complex	Short-precision complex	CSYRK
Long-precision complex	Long-precision complex	ZSYRK
Short-precision complex	Short-precision real	CHERK
Long-precision complex	Long-precision real	ZHERK

A, B, C, alpha	beta	Subprogram
Short-precision real	Short-precision real	SSYR2K
Long-precision real	Long-precision real	DSYR2K
Short-precision complex	Short-precision complex	CSYR2K
Long-precision complex	Long-precision complex	ZSYR2K
Short-precision complex	Short-precision real	CHER2K
Long-precision complex	Long-precision real	ZHER2K

A	Subroutine
Short-precision real	SGETMI
Long-precision real	DGETMI
Short-precision complex	CGETMI
Long-precision complex	ZGETMI

A, B	Subroutine
Short-precision real	SGETMO
Long-precision real	DGETMO
Short-precision complex	CGETMO
Long-precision complex	ZGETMO