This section describes some key points about performance and accuracy in the matrix operations subroutines.
If you are using large square matrices in your matrix multiplication operations, you get better performance by using SGEMMS, DGEMMS, CGEMMS, and ZGEMMS. These subroutines use Winograd's variation of Strassen's algorithm for both real and complex matrices.
If you want to perform a combined matrix multiplication and addition with scaling, SGEMM, DGEMM, CGEMM, and ZGEMM provide better performance than if you perform the parts of the computation separately in your program. See references [32] and [35].
This section contains the matrix operation subroutine descriptions.
These subroutines can perform any one of the following matrix additions, using matrices A and B or their transposes, and matrix C:
A, B, C | Subroutine |
Short-precision real | SGEADD |
Long-precision real | DGEADD |
Short-precision complex | CGEADD |
Long-precision complex | ZGEADD |
Fortran | CALL SGEADD | DGEADD | CGEADD | ZGEADD (a, lda, transa, b, ldb, transb, c, ldc, m, n) |
C and C++ | sgeadd | dgeadd | cgeadd | zgeadd (a, lda, transa, b, ldb, transb, c, ldc, m, n); |
PL/I | CALL SGEADD | DGEADD | CGEADD | ZGEADD (a, lda, transa, b, ldb, transb, c, ldc, m, n); |
If transa = 'N', A is used in the computation, and A has m rows and n columns.
If transa = 'T', AT is used in the computation, and A has n rows and m columns.
Note: | No data should be moved to form AT; that is, the matrix A should always be stored in its untransposed form. |
Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 72, where:
If transa = 'N', its size must be lda by (at least) n.
If transa = 'T', its size must be lda by (at least) m.
If transa = 'N', lda >= m.
If transa = 'T', lda >= n.
If transa = 'N', A is used in the computation.
If transa = 'T', AT is used in the computation.
Specified as: a single character; transa = 'N' or 'T'.
If transb = 'N', B is used in the computation, and B has m rows and n columns.
If transb = 'T', BT is used in the computation, and B has n rows and m columns.
Note: | No data should be moved to form BT; that is, the matrix B should always be stored in its untransposed form. |
Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 72, where:
If transb = 'N', its size must be ldb by (at least) n.
If transb = 'T', its size must be ldb by (at least) m.
If transb = 'N', ldb >= m.
If transb = 'T', ldb >= n.
If transb = 'N', B is used in the computation.
If transb = 'T', BT is used in the computation.
Specified as: a single character; transb = 'N' or 'T'.
The matrix sum is expressed as follows, where aij, bij, and cij are elements of matrices A, B, and C, respectively:
If m or n is 0, no computation is performed.
You can compute the transpose CT of each of the four computations listed under "Function" by using the following matrix identities:
Be careful that your output array receiving CT has dimensions large enough to hold the transposed matrix. See "Example 4".
None
This example shows the computation C <-- A+B, where A and C are contained in larger arrays A and C, respectively, and B is the same size as array B, in which it is contained.
A LDA TRANSA B LDB TRANSB C LDC M N | | | | | | | | | | CALL SGEADD( A , 6 , 'N' , B , 4 , 'N' , C , 5 , 4 , 3 )
* * | 110000.0 120000.0 130000.0 | | 210000.0 220000.0 230000.0 | A = | 310000.0 320000.0 330000.0 | | 410000.0 420000.0 430000.0 | | . . . | | . . . | * *
* * | 11.0 12.0 13.0 | B = | 21.0 22.0 23.0 | | 31.0 32.0 33.0 | | 41.0 42.0 43.0 | * *
* * | 110011.0 120012.0 130013.0 | | 210021.0 220022.0 230023.0 | C = | 310031.0 320032.0 330033.0 | | 410041.0 420042.0 430043.0 | | . . . | * *
This example shows the computation C <-- AT+B, where A, B, and C are the same size as arrays A, B, and C, in which they are contained.
A LDA TRANSA B LDB TRANSB C LDC M N | | | | | | | | | | CALL SGEADD( A , 3 , 'T' , B , 4 , 'N' , C , 4 , 4 , 3 )
* * | 110000.0 120000.0 130000.0 140000.0 | A = | 210000.0 220000.0 230000.0 240000.0 | | 310000.0 320000.0 330000.0 340000.0 | * *
* * | 11.0 12.0 13.0 | B = | 21.0 22.0 23.0 | | 31.0 32.0 33.0 | | 41.0 42.0 43.0 | * *
* * | 110011.0 210012.0 310013.0 | C = | 120021.0 220022.0 320023.0 | | 130031.0 230032.0 330033.0 | | 140041.0 240042.0 340043.0 | * *
This example shows computation C <-- A+BT, where A is contained in a larger array A, and B and C are the same size as arrays B and C, in which they are contained.
A LDA TRANSA B LDB TRANSB C LDC M N | | | | | | | | | | CALL SGEADD( A , 5 , 'N' , B , 3 , 'T' , C , 4 , 4 , 3 )
* * | 110000.0 120000.0 130000.0 | | 210000.0 220000.0 230000.0 | A = | 310000.0 320000.0 330000.0 | | 410000.0 420000.0 430000.0 | | . . . | * *
* * | 11.0 12.0 13.0 14.0 | B = | 21.0 22.0 23.0 24.0 | | 31.0 32.0 33.0 34.0 | * *
* * | 110011.0 120021.0 130031.0 | C = | 210012.0 220022.0 230032.0 | | 310013.0 320023.0 330033.0 | | 410014.0 420024.0 430034.0 | * *
This example shows how to produce the transpose of the result of the computation performed in "Example 3", C <-- A+BT, which uses the calling sequence:
CALL SGEADD( A , 5 , 'N' , B , 3 , 'T' , C , 4 , 4 , 3 )
You instead code a calling sequence for CT <-- AT+B, as shown below, where the resulting matrix CT in the output array CT is the transpose of the matrix in the output array C in "Example 3". Note that the array CT has dimensions large enough to receive the transposed matrix. For a description of all the matrix identities, see "Special Usage".
A LDA TRANSA B LDB TRANSB C LDC M N | | | | | | | | | | CALL SGEADD( A , 5 , 'T' , B , 3 , 'N' , CT , 4 , 3 , 4 )
* * | 110000.0 120000.0 130000.0 | | 210000.0 220000.0 230000.0 | A = | 310000.0 320000.0 330000.0 | | 410000.0 420000.0 430000.0 | | . . . | * *
* * | 11.0 12.0 13.0 14.0 | B = | 21.0 22.0 23.0 24.0 | | 31.0 32.0 33.0 34.0 | * *
* * | 110011.0 210012.0 310013.0 410014.0 | CT = | 120021.0 220022.0 320023.0 420024.0 | | 130031.0 230032.0 330033.0 430034.0 | | . . . . | * *
This example shows the computation C <-- AT+BT, where A, B, and C are the same size as the arrays A, B, and C, in which they are contained.
A LDA TRANSA B LDB TRANSB C LDC M N | | | | | | | | | | CALL SGEADD( A , 3 , 'T' , B , 3 , 'T' , C , 4 , 4 , 3 )
* * | 110000.0 120000.0 130000.0 140000.0 | A = | 210000.0 220000.0 230000.0 240000.0 | | 310000.0 320000.0 330000.0 340000.0 | * *
* * | 11.0 12.0 13.0 14.0 | B = | 21.0 22.0 23.0 24.0 | | 31.0 32.0 33.0 34.0 | * *
* * | 110011.0 210021.0 310031.0 | C = | 120012.0 220022.0 320032.0 | | 130013.0 230023.0 330033.0 | | 140014.0 240024.0 340034.0 | * *
This example shows the computation C <-- A+B, where A, B, and C are contained in larger arrays A, B, and C, respectively, and the arrays contain complex data.
A LDA TRANSA B LDB TRANSB C LDC M N | | | | | | | | | | CALL CGEADD( A , 6 , 'N' , B , 5 , 'N' , C , 5 , 4 , 3 )
* * | (1.0, 5.0) (9.0, 2.0) (1.0, 9.0) | | (2.0, 4.0) (8.0, 3.0) (1.0, 8.0) | A = | (3.0, 3.0) (7.0, 5.0) (1.0, 7.0) | | (6.0, 6.0) (3.0, 6.0) (1.0, 4.0) | | . . . | | . . . | * *
* * | (1.0, 8.0) (2.0, 7.0) (3.0, 2.0) | | (4.0, 4.0) (6.0, 8.0) (6.0, 3.0) | B = | (6.0, 2.0) (4.0, 5.0) (4.0, 5.0) | | (7.0, 2.0) (6.0, 4.0) (1.0, 6.0) | | . . . | * *
* * | (2.0, 13.0) (11.0, 9.0) (4.0, 11.0) | | (6.0, 8.0) (14.0, 11.0) (7.0, 11.0) | C = | (9.0, 5.0) (11.0, 10.0) (5.0, 12.0) | | (13.0, 8.0) (9.0, 10.0) (2.0, 10.0) | | . . . | * *
These subroutines can perform any one of the following matrix subtractions, using matrices A and B or their transposes, and matrix C:
A, B, C | Subroutine |
Short-precision real | SGESUB |
Long-precision real | DGESUB |
Short-precision complex | CGESUB |
Long-precision complex | ZGESUB |
Fortran | CALL SGESUB | DGESUB | CGESUB | ZGESUB (a, lda, transa, b, ldb, transb, c, ldc, m, n) |
C and C++ | sgesub | dgesub | cgesub | zgesub (a, lda, transa, b, ldb, transb, c, ldc, m, n); |
PL/I | CALL SGESUB | DGESUB | CGESUB | ZGESUB (a, lda, transa, b, ldb, transb, c, ldc, m, n); |
If transa = 'N', A is used in the computation, and A has m rows and n columns.
If transa = 'T', AT is used in the computation, and A has n rows and m columns.
Note: | No data should be moved to form AT; that is, the matrix A should always be stored in its untransposed form. |
Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 73, where:
If transa = 'N', its size must be lda by (at least) n.
If transa = 'T', its size must be lda by (at least) m.
If transa = 'N', lda >= m.
If transa = 'T', lda >= n.
If transa = 'N', A is used in the computation.
If transa = 'T', AT is used in the computation.
Specified as: a single character; transa = 'N' or 'T'.
If transb = 'N', B is used in the computation, and B has m rows and n columns.
If transb = 'T', BT is used in the computation, and B has n rows and m columns.
Note: | No data should be moved to form BT; that is, the matrix B should always be stored in its untransposed form. |
Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 72, where:
If transb = 'N', its size must be ldb by (at least) n.
If transb = 'T', its size must be ldb by (at least) m.
If transb = 'N', ldb >= m.
If transb = 'T', ldb >= n.
If transb = 'N', B is used in the computation.
If transb = 'T', BT is used in the computation.
Specified as: a single character; transb = 'N' or 'T'.
The matrix subtraction is expressed as follows, where aij, bij, and cij are elements of matrices A, B, and C, respectively:
If m or n is 0, no computation is performed.
You can compute the transpose CT of each of the four computations listed under "Function" by using the following matrix identities:
Be careful that your output array receiving CT has dimensions large enough to hold the transposed matrix. See "Example 5".
None
This example shows the computation C <-- A-B, where A and C are contained in larger arrays A and C, respectively, and B is the same size as array B, in which it is contained.
A LDA TRANSA B LDB TRANSB C LDC M N | | | | | | | | | | CALL SGESUB( A , 6 , 'N' , B , 4 , 'N' , C , 5 , 4 , 3 )
* * | 110000.0 120000.0 130000.0 | | 210000.0 220000.0 230000.0 | A = | 310000.0 320000.0 330000.0 | | 410000.0 420000.0 430000.0 | | . . . | | . . . | * *
* * | -11.0 -12.0 -13.0 | B = | -21.0 -22.0 -23.0 | | -31.0 -32.0 -33.0 | | -41.0 -42.0 -43.0 | * *
* * | 110011.0 120012.0 130013.0 | | 210021.0 220022.0 230023.0 | C = | 310031.0 320032.0 330033.0 | | 410041.0 420042.0 430043.0 | | . . . | * *
This example shows the computation C <-- AT-B, where A, B, and C are the same size as arrays A, B, and C, in which they are contained.
A LDA TRANSA B LDB TRANSB C LDC M N | | | | | | | | | | CALL SGESUB( A , 3 , 'T' , B , 4 , 'N' , C , 4 , 4 , 3 )
* * | 110000.0 120000.0 130000.0 140000.0 | A = | 210000.0 220000.0 230000.0 240000.0 | | 310000.0 320000.0 330000.0 340000.0 | * *
* * | -11.0 -12.0 -13.0 | B = | -21.0 -22.0 -23.0 | | -31.0 -32.0 -33.0 | | -41.0 -42.0 -43.0 | * *
* * | 110011.0 210012.0 310013.0 | C = | 120021.0 220022.0 320023.0 | | 130031.0 230032.0 330033.0 | | 140041.0 240042.0 340043.0 | * *
This example shows computation C <-- A-BT, where A is contained in a larger array A, and B and C are the same size as arrays B and C, in which they are contained.
A LDA TRANSA B LDB TRANSB C LDC M N | | | | | | | | | | CALL SGESUB( A , 5 , 'N' , B , 3 , 'T' , C , 4 , 4 , 3 )
* * | 110000.0 120000.0 130000.0 | | 210000.0 220000.0 230000.0 | A = | 310000.0 320000.0 330000.0 | | 410000.0 420000.0 430000.0 | | . . . | * *
* * | -11.0 -12.0 -13.0 -14.0 | B = | -21.0 -22.0 -23.0 -24.0 | | -31.0 -32.0 -33.0 -34.0 | * *
* * | 110011.0 120021.0 130031.0 | C = | 210012.0 220022.0 230032.0 | | 310013.0 320023.0 330033.0 | | 410014.0 420024.0 430034.0 | * *
This example shows the computation C <-- AT-BT, where A, B, and C are the same size as the arrays A, B, and C, in which they are contained.
A LDA TRANSA B LDB TRANSB C LDC M N | | | | | | | | | | CALL SGESUB( A , 3 , 'T' , B , 3 , 'T' , C , 4 , 4 , 3 )
* * | 110000.0 120000.0 130000.0 140000.0 | A = | 210000.0 220000.0 230000.0 240000.0 | | 310000.0 320000.0 330000.0 340000.0 | * *
* * | -11.0 -12.0 -13.0 -14.0 | B = | -21.0 -22.0 -23.0 -24.0 | | -31.0 -32.0 -33.0 -34.0 | * *
* * | 110011.0 210021.0 310031.0 | C = | 120012.0 220022.0 320032.0 | | 130013.0 230023.0 330033.0 | | 140014.0 240024.0 340034.0 | * *
This example shows how to produce the transpose of the result of the computation performed in "Example 4", C <-- AT-BT, which uses the calling sequence:
CALL SGESUB( A , 3 , 'T' , B , 3 , 'T' , C , 4 , 4 , 3 )
You instead code a calling sequence for CT <-- A-B, as shown below, where the resulting matrix CT in the output array CT is the transpose of the matrix in the output array C in "Example 4". Note that the array CT has dimensions large enough to receive the transposed matrix. For a description of all the matrix identities, see "Special Usage".
A LDA TRANSA B LDB TRANSB C LDC M N | | | | | | | | | | CALL SGESUB( A , 3 , 'N' , B , 3 , 'N' , CT , 3 , 3 , 4 )
* * | 110000.0 120000.0 130000.0 140000.0 | A = | 210000.0 220000.0 230000.0 240000.0 | | 310000.0 320000.0 330000.0 340000.0 | * *
* * | -11.0 -12.0 -13.0 -14.0 | B = | -21.0 -22.0 -23.0 -24.0 | | -31.0 -32.0 -33.0 -34.0 | * *
* * | 110011.0 120012.0 130013.0 140014.0 | CT = | 210021.0 220022.0 230023.0 240024.0 | | 310031.0 320032.0 330033.0 340034.0 | * *
This example shows the computation C <-- A-B, where A, B, and C are contained in larger arrays A, B, and C, respectively, and the arrays contain complex data.
A LDA TRANSA B LDB TRANSB C LDC M N | | | | | | | | | | CALL CGESUB( A , 6 , 'N' , B , 5 , 'N' , C , 5 , 4 , 3 )
* * | (1.0, 5.0) (9.0, 2.0) (1.0, 9.0) | | (2.0, 4.0) (8.0, 3.0) (1.0, 8.0) | A = | (3.0, 3.0) (7.0, 5.0) (1.0, 7.0) | | (6.0, 6.0) (3.0, 6.0) (1.0, 4.0) | | . . . | | . . . | * *
* * | (1.0, 8.0) (2.0, 7.0) (3.0, 2.0) | | (4.0, 4.0) (6.0, 8.0) (6.0, 3.0) | B = | (6.0, 2.0) (4.0, 5.0) (4.0, 5.0) | | (7.0, 2.0) (6.0, 4.0) (1.0, 6.0) | | . . . | * *
* * | (0.0, -3.0) (7.0, -5.0) (-2.0, 7.0) | | (-2.0, 0.0) (2.0, -5.0) (-5.0, 5.0) | C = | (-3.0, 1.0) (3.0, 0.0) (-3.0, 2.0) | | (-1.0, 4.0) (-3.0, 2.0) (0.0, -2.0) | | . . . | * *
SGEMUL and DGEMUL can perform any one of the following matrix multiplications, using matrices A and B or their transposes, and matrix C:
C <-- AB | C <-- ABT |
|
C <-- ATB | C <-- ATBT |
|
CGEMUL and ZGEMUL can perform any one of the following matrix
multiplications, using matrices A and B, their
transposes or their conjugate transposes, and matrix C:
C <-- AB | C <-- ABT | C <-- ABH |
C <-- ATB | C <-- ATBT | C <-- ATBH |
C <-- AHB | C <-- AHBT | C <-- AHBH |
A, B, C | Subroutine |
Short-precision real | SGEMUL |
Long-precision real | DGEMUL |
Short-precision complex | CGEMUL |
Long-precision complex | ZGEMUL |
Fortran | CALL SGEMUL | DGEMUL | CGEMUL | ZGEMUL (a, lda, transa, b, ldb, transb, c, ldc, l, m, n) |
C and C++ | sgemul | dgemul | cgemul | zgemul (a, lda, transa, b, ldb, transb, c, ldc, l, m, n); |
PL/I | CALL SGEMUL | DGEMUL | CGEMUL | ZGEMUL (a, lda, transa, b, ldb, transb, c, ldc, l, m, n); |
APL2 | SGEMUL | DGEMUL | CGEMUL | ZGEMUL a lda transa b ldb transb 'c' ldc l m n |
If transa = 'N', A is used in the computation, and A has l rows and m columns.
If transa = 'T', AT is used in the computation, and A has m rows and l columns.
If transa = 'C', AH is used in the computation, and A has m rows and l columns.
Note: | No data should be moved to form AT or AH; that is, the matrix A should always be stored in its untransposed form. |
Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 74, where:
If transa = 'N', its size must be lda by (at least) m.
If transa = 'T' or 'C', its size must be lda by (at least) l.
If transa = 'N', lda >= l.
If transa = 'T' or 'C', lda >= m.
If transa = 'N', A is used in the computation.
If transa = 'T', AT is used in the computation.
If transa = 'C', AH is used in the computation.
Specified as: a single character; transa = 'N' or 'T' for SGEMUL and DGEMUL; transa = 'N', 'T', or 'C' for CGEMUL and ZGEMUL.
If transb = 'N', B is used in the computation, and B has m rows and n columns.
If transb = 'T', BT is used in the computation, and B has n rows and m columns.
If transb = 'C', BH is used in the computation, and B has n rows and m columns.
Note: | No data should be moved to form BT or BH; that is, the matrix B should always be stored in its untransposed form. |
Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 74, where:
If transb = 'N', its size must be ldb by (at least) n.
If transb = 'T' or 'C', its size must be ldb by (at least) m.
If transb = 'N', ldb >= m.
If transb = 'T' or 'C', ldb >= n.
If transb = 'N', B is used in the computation.
If transb = 'T', BT is used in the computation.
If transb = 'C', BH is used in the computation.
Specified as: a single character; transb = 'N' or 'T' for SGEMUL and DGEMUL; transb = 'N', 'T', or 'C' for CGEMUL and ZGEMUL.
If transa = 'N', it is the number of columns in matrix A.
If transa = 'T' or 'C', it is the number of rows in matrix A.
In addition:
If transb = 'N', it is the number of rows in matrix B.
If transb = 'T' or 'C', it is the number of columns in matrix B.
Specified as: a fullword integer; m >= 0.
The matrix multiplication is expressed as follows, where aik, bkj, and cij are elements of matrices A, B, and C, respectively:
See reference [38]. If l or n is 0, no computation is performed. If l and n are greater than 0, and m is 0, an l by n matrix of zeros is returned.
By using the following equivalence rules, you can compute the transpose
CT or the conjugate transpose CH
of some of the computations performed by these subroutines:
Transpose | Conjugate Transpose |
---|---|
(AB)T = BTAT | (AB)H = BHAH |
(ATB)T = BTA | (AHB)H = BHA |
(ABT)T = BAT | (ABH)H = BAH |
(ATBT)T = BA | (AHBH)H = BA |
When coding the calling sequences for these cases, be careful to code your matrix arguments and dimension arguments in the order indicated by the rule. Also, be careful that your output array, receiving CT or CH, has dimensions large enough to hold the resulting transposed or conjugate transposed matrix. See "Example 2" and "Example 4".
Unable to allocate internal work area (CGEMUL and ZGEMUL only).
None
This example shows the computation C <-- AB, where A, B, and C are contained in larger arrays A, B, and C, respectively.
A LDA TRANSA B LDB TRANSB C LDC L M N | | | | | | | | | | | CALL SGEMUL( A , 8 , 'N' , B , 6 , 'N' , C , 7 , 6 , 5 , 4 )
* * | 1.0 2.0 -1.0 -1.0 4.0 | | 2.0 0.0 1.0 1.0 -1.0 | | 1.0 -1.0 -1.0 1.0 2.0 | A = | -3.0 2.0 2.0 2.0 0.0 | | 4.0 0.0 -2.0 1.0 -1.0 | | -1.0 -1.0 1.0 -3.0 2.0 | | . . . . . | | . . . . . | * *
* * | 1.0 -1.0 0.0 2.0 | | 2.0 2.0 -1.0 -2.0 | B = | 1.0 0.0 -1.0 1.0 | | -3.0 -1.0 1.0 -1.0 | | 4.0 2.0 -1.0 1.0 | | . . . . | * *
* * | 23.0 12.0 -6.0 2.0 | | -4.0 -5.0 1.0 3.0 | | 3.0 0.0 1.0 4.0 | C = | -3.0 5.0 -2.0 -10.0 | | -5.0 -7.0 4.0 4.0 | | 15.0 6.0 -5.0 6.0 | | . . . . | * *
This example shows how to produce the transpose of the result of the computation performed in "Example 1", C <-- AB, which uses the calling sequence:
CALL SGEMUL (A,8,'N',B,6,'N',C,7,6,5,4)
You instead code a calling sequence for CT <-- BTAT, as shown below, where the resulting matrix CT in the output array CT is the transpose of the matrix in the output array C in "Example 1". Note that the array CT has dimensions large enough to receive the transposed matrix. For a description of all the matrix identities, see "Special Usage".
A LDA TRANSA B LDB TRANSB C LDC L M N | | | | | | | | | | | CALL SGEMUL( B , 6 , 'T' , A , 8 , 'T' , CT , 5 , 4 , 5 , 6 )
* * | 1.0 -1.0 0.0 2.0 | | 2.0 2.0 -1.0 -2.0 | B = | 1.0 0.0 -1.0 1.0 | | -3.0 -1.0 1.0 -1.0 | | 4.0 2.0 -1.0 1.0 | | . . . . | * *
* * | 1.0 2.0 -1.0 -1.0 4.0 | | 2.0 0.0 1.0 1.0 -1.0 | | 1.0 -1.0 -1.0 1.0 2.0 | A = | -3.0 2.0 2.0 2.0 0.0 | | 4.0 0.0 -2.0 1.0 -1.0 | | -1.0 -1.0 1.0 -3.0 2.0 | | . . . . . | | . . . . . | * *
* * | 23.0 -4.0 3.0 -3.0 -5.0 15.0 | | 12.0 -5.0 0.0 5.0 -7.0 6.0 | CT = | -6.0 1.0 1.0 -2.0 4.0 -5.0 | | 2.0 3.0 4.0 -10.0 4.0 6.0 | | . . . . . . | * *
This example shows the computation C <-- ATB, where A and C are contained in larger arrays A and C, respectively, and B is the same size as the
A LDA TRANSA B LDB TRANSB C LDC L M N | | | | | | | | | | | CALL SGEMUL( A , 4 , 'T' , B , 3 , 'N' , C , 5 , 3 , 3 , 6 )
* * | 1.0 -3.0 2.0 | A = | 2.0 4.0 0.0 | | 1.0 -1.0 -1.0 | | . . . | * *
* * | 1.0 -3.0 2.0 2.0 -1.0 2.0 | B = | 2.0 4.0 0.0 0.0 1.0 -2.0 | | 1.0 -1.0 -1.0 -1.0 -1.0 1.0 | * *
* * | 6.0 4.0 1.0 1.0 0.0 -1.0 | | 4.0 26.0 -5.0 -5.0 8.0 -15.0 | C = | 1.0 -5.0 5.0 5.0 -1.0 3.0 | | . . . . . . | | . . . . . . | * *
This example shows how to produce the transpose of the result of the computation performed in "Example 3", C <-- ATB, which uses the calling sequence:
CALL SGEMUL (A,4,'T',B,3,'N',C,5,3,3,6)
You instead code the calling sequence for CT <-- BTA, as shown below, where the resulting matrix CT in the output array CT is the transpose of the matrix in the output array C in "Example 3". Note that the array CT has dimensions large enough to receive the transposed matrix. For a description of all the matrix identities, see "Special Usage".
A LDA TRANSA B LDB TRANSB C LDC L M N | | | | | | | | | | | CALL SGEMUL( B , 3 , 'T' , A , 4 , 'N' , CT , 8 , 6 , 3 , 3 )
* * | 1.0 -3.0 2.0 2.0 -1.0 2.0 | B = | 2.0 4.0 0.0 0.0 1.0 -2.0 | | 1.0 -1.0 -1.0 -1.0 -1.0 1.0 | * *
* * | 1.0 -3.0 2.0 | A = | 2.0 4.0 0.0 | | 1.0 -1.0 -1.0 | | . . . | * *
* * | 6.0 4.0 1.0 | | 4.0 26.0 -5.0 | | 1.0 -5.0 5.0 | CT = | 1.0 -5.0 5.0 | | 0.0 8.0 -1.0 | | -1.0 -15.0 3.0 | | . . . | | . . . | * *
This example shows the computation C <-- ABT, where A and C are contained in larger arrays A and C, respectively, and B is the same size as the array B in which it is contained.
A LDA TRANSA B LDB TRANSB C LDC L M N | | | | | | | | | | | CALL SGEMUL( A , 4 , 'N' , B , 3 , 'T' , C , 5 , 3 , 2 , 3 )
* * | 1.0 -3.0 | A = | 2.0 4.0 | | 1.0 -1.0 | | . . | * *
* * | 1.0 -3.0 | B = | 2.0 4.0 | | 1.0 -1.0 | * *
* * | 10.0 -10.0 4.0 | | -10.0 20.0 -2.0 | C = | 4.0 -2.0 2.0 | | . . . | | . . . | * *
This example shows the computation C <-- ATBT, where A, B, and C are the same size as the arrays A, B, and C in which they are contained. (Based on the dimensions of the matrices, A is actually a column vector, and C is actually a row vector.)
A LDA TRANSA B LDB TRANSB C LDC L M N | | | | | | | | | | | CALL SGEMUL( A , 3 , 'T' , B , 3 , 'T' , C , 1 , 1 , 3 , 3 )
* * | 1.0 | A = | 2.0 | | 1.0 | * *
* * | 1.0 -3.0 2.0 | B = | 2.0 4.0 0.0 | | 1.0 -1.0 -1.0 | * *
* * B = | -3.0 10.0 -2.0 | * *
This example shows the computation C <-- ATB using complex data, where A, B, and C are contained in larger arrays A, B, and C, respectively.
A LDA TRANSA B LDB TRANSB C LDC L M N | | | | | | | | | | | CALL CGEMUL( A , 6 , 'T' , B , 7 , 'N' , C , 3 , 2 , 3 , 3 )
* * | (1.0, 2.0) (3.0, 4.0) | | (4.0, 6.0) (7.0, 1.0) | A = | (6.0, 3.0) (2.0, 5.0) | | . . | | . . | | . . | * *
* * | (1.0, 9.0) (2.0, 6.0) (5.0, 6.0) | | (2.0, 5.0) (6.0, 2.0) (6.0, 4.0) | | (2.0, 6.0) (5.0, 4.0) (2.0, 6.0) | B = | . . . | | . . . | | . . . | | . . . | * *
* * | (-45.0, 85.0) (20.0, 93.0) (-13.0, 110.0) | C = | (-50.0, 90.0) (12.0, 79.0) (3.0, 94.0) | | . . . | * *
This example shows the computation C <-- ABH using complex data, where A and C are contained in larger arrays A and C, respectively, and B is the same size as the array B in which it is contained.
A LDA TRANSA B LDB TRANSB C LDC L M N | | | | | | | | | | | CALL CGEMUL( A , 4 , 'N' , B , 3 , 'C' , C , 4 , 3 , 2 , 3 )
* * | (1.0, 2.0) (-3.0, 2.0) | A = | (2.0, 6.0) (4.0, 5.0) | | (1.0, 2.0) (-1.0, 8.0) | | . . | * *
* * | (1.0, 3.0) (-3.0, 2.0) | B = | (2.0, 5.0) (4.0, 6.0) | | (1.0, 1.0) (-1.0, 9.0) | * *
* * | (20.0, -1.0) (12.0, 25.0) (24.0, 26.0) | C = | (18.0, -23.0) (80.0, -2.0) (49.0, -37.0) | | (26.0, -23.0) (56.0, 37.0) (76.0, 2.0) | | . . . | * *
These subroutines use Winograd's variation of the Strassen's
algorithm to perform the matrix multiplication for both real and complex
matrices. SGEMMS and DGEMMS can perform any one of the following matrix
multiplications, using matrices A and B or their
transposes, and matrix C:
C <-- AB | C <-- ABT |
|
C <-- ATB | C <-- ATBT |
|
CGEMMS and ZGEMMS can perform any one of the following matrix
multiplications, using matrices A and B, their
transposes or their conjugate transposes, and matrix C:
C <-- AB | C <-- ABT | C <-- ABH |
C <-- ATB | C <-- ATBT | C <-- ATBH |
C <-- AHB | C <-- AHBT | C <-- AHBH |
A, B, C | aux | Subroutine |
Short-precision real | Short-precision real | SGEMMS |
Long-precision real | Long-precision real | DGEMMS |
Short-precision complex | Short-precision real | CGEMMS |
Long-precision complex | Long-precision real | ZGEMMS |
Fortran | CALL SGEMMS | DGEMMS | CGEMMS | ZGEMMS (a, lda, transa, b, ldb, transb, c, ldc, l, m, n, aux, naux) |
C and C++ | sgemms | dgemms | cgemms | zgemms (a, lda, transa, b, ldb, transb, c, ldc, l, m, n, aux, naux); |
PL/I | CALL SGEMMS | DGEMMS | CGEMMS | ZGEMMS (a, lda, transa, b, ldb, transb, c, ldc, l, m, n, aux, naux); |
If transa = 'N', A is used in the computation, and A has l rows and m columns.
If transa = 'T', AT is used in the computation, and A has m rows and l columns.
If transa = 'C', AH is used in the computation, and A has m rows and l columns.
Note: | No data should be moved to form AT or AH; that is, the matrix A should always be stored in its untransposed form. |
Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 75, where:
If transa = 'N', its size must be lda by (at least) m.
If transa = 'T' or 'C', its size must be lda by (at least) l.
If transa = 'N', lda >= l.
If transa = 'T' or 'C', lda >= m.
If transa = 'N', A is used in the computation.
If transa = 'T', AT is used in the computation.
If transa = 'C', AH is used in the computation.
Specified as: a single character; transa = 'N' or 'T' for SGEMMS and DGEMMS; transa = 'N', 'T', or 'C' for CGEMMS and ZGEMMS.
If transb = 'N', B is used in the computation, and B has m rows and n columns.
If transb = 'T', BT is used in the computation, and B has n rows and m columns.
If transb = 'C', BH is used in the computation, and B has n rows and m columns.
Note: | No data should be moved to form BT or BH; that is, the matrix B should always be stored in its untransposed form. |
Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 75, where:
If transb = 'N', its size must be ldb by (at least) n.
If transb = 'T' or 'C', its size must be ldb by (at least) m.
If transb = 'N', ldb >= m.
If transb = 'T' or 'C', ldb >= n.
If transb = 'N', B is used in the computation.
If transb = 'T', BT is used in the computation.
If transb = 'C', BH is used in the computation.
Specified as: a single character; transb = 'N' or 'T' for SGEMMS and DGEMMS; transb = 'N', 'T', or 'C' for CGEMMS and ZGEMMS.
If transa = 'N', it is the number of columns in matrix A.
If transa = 'T' or 'C', it is the number of rows in matrix A.
In addition:
If transb = 'N', it is the number of rows in matrix B.
If transb = 'T' or 'C', it is the number of columns in matrix B.
Specified as: a fullword integer; m >= 0.
If naux = 0 and error 2015 is unrecoverable, aux is ignored.
Otherwise, is the storage work area used by this subroutine. Its size is specified by naux.
Specified as: an area of storage containing numbers of the data type indicated in Table 75.
Specified as: a fullword integer, where:
If naux = 0 and error 2015 is unrecoverable, SGEMMS, DGEMMS, CGEMMS, and ZGEMMS dynamically allocate the work area used by the subroutine. The work area is deallocated before control is returned to the calling program.
Otherwise,
When this subroutine uses Strassen's algorithm:
When this subroutine uses the direct method (_GEMUL), use naux >= 0.
Notes:
In these instances when the direct method is used, the subroutine does not use auxiliary storage, and you can specify naux = 0.
The matrix multiplications performed by these subroutines are functionally equivalent to those performed by SGEMUL, DGEMUL, CGEMUL, and ZGEMUL. For details on the computations performed, see "Function".
SGEMMS, DGEMMS, CGEMMS, and ZGEMMS use Winograd's variation of the Strassen's algorithm with minor changes for tuning purposes. (See pages 45 and 46 in reference [11].) The subroutines compute matrix multiplication for both real and complex matrices of large sizes. Complex matrix multiplication uses a special technique, using three real matrix multiplications and five real matrix additions. Each of these three resulting matrix multiplications then uses Strassen's algorithm.
The steps of Strassen's algorithm can be repeated up to four times by these subroutines, with each step reducing the dimensions of the matrix by a factor of two. The number of steps used by this subroutine depends on the size of the input matrices. Each step reduces the number of operations by about 10% from the normal matrix multiplication. On the other hand, if the matrix is small, a normal matrix multiplication is performed without using the Strassen's algorithm, and no improvement is gained. For details about small matrices, see "Notes".
The complex multiplication is performed by forming the real and imaginary parts of the input matrices. These subroutines uses three real matrix multiplications and five real matrix additions, instead of the normal four real matrix multiplications and two real matrix additions. Using only three real matrix multiplications allows the subroutine to achieve up to a 25% reduction in matrix operations, which can result in a significant savings in computing time for large matrices.
Strassen's method is not stable for certain row or column scalings of the input matrices A and B. Therefore, for matrices A and B with divergent exponent values Strassen's method may give inaccurate results. For these cases, you should use the _GEMUL or _GEMM subroutines.
The equivalence rules, defined for matrix multiplication of A and B in "Special Usage", also apply to these subroutines. You should use the equivalence rules when you want to transpose or conjugate transpose the result of the multiplication computation. When coding the calling sequences for these cases, be careful to code your matrix arguments and dimension arguments in the order indicated by the rule. Also, be careful that your output array, receiving CT or CH, has dimensions large enough to hold the resulting transposed or conjugate transposed matrix. See "Example 2" and "Example 4".
Error 2015 is unrecoverable, naux = 0, and unable to allocate work area.
None
This example shows the computation C <-- AB, where A, B, and C are contained in larger arrays A, B, and C, respectively. It shows how to code the calling sequence for SGEMMS, but does not use the Strassen algorithm for doing the computation. The calling sequence is shown below. The input and output, other than auxiliary storage, is the same as in "Example 1" for SGEMUL.
A LDA TRANSA B LDB TRANSB C LDC L M N AUX NAUX | | | | | | | | | | | | | CALL SGEMMS( A , 8 , 'N' , B , 6 , 'N' , C , 7 , 6 , 5 , 4 , AUX , 0 )
This example shows the computation C <-- ABH, where A and C are contained in larger arrays A and C, respectively, and B is the same size as the array B in which it is contained. The arrays contain complex data. This example shows how to code the calling sequence for CGEMMS, but does not use the Strassen algorithm for doing the computation. The calling sequence is shown below. The input and output, other than auxiliary storage, is the same as in "Example 8" for CGEMUL.
A LDA TRANSA B LDB TRANSB C LDC L M N AUX NAUX | | | | | | | | | | | | | CALL CGEMMS( A , 4 , 'N' , B , 3 , 'C' , C , 4 , 3 , 2 , 3 , AUX , 0 )
SGEMM and DGEMM can perform any one of the following combined matrix
computations, using scalars alpha and beta, matrices A and
B or their transposes, and matrix C:
C <-- alphaAB+betaC | C <-- alphaABT+betaC |
|
C <-- alphaATB+betaC | C <-- alphaATBT+betaC |
|
CGEMM and ZGEMM can perform any one of the following combined matrix
computations, using scalars alpha and beta, matrices A and
B, their transposes or their conjugate transposes, and matrix
C:
C <-- alphaAB+betaC | C <-- alphaABT+betaC | C <-- alphaABH+betaC |
C <-- alphaATB+betaC | C <-- alphaATBT+betaC | C <-- alphaATBH+betaC |
C <-- alphaAHB+betaC | C <-- alphaAHBT+betaC | C <-- alphaAHBH+betaC |
A, B, C, alpha, beta | Subroutine |
Short-precision real | SGEMM |
Long-precision real | DGEMM |
Short-precision complex | CGEMM |
Long-precision complex | ZGEMM |
Fortran | CALL SGEMM | DGEMM | CGEMM | ZGEMM (transa, transb, l, n, m, alpha, a, lda, b, ldb, beta, c, ldc) |
C and C++ | sgemm | dgemm | cgemm | zgemm (transa, transb, l, n, m, alpha, a, lda, b, ldb, beta, c, ldc); |
PL/I | CALL SGEMM | DGEMM | CGEMM | ZGEMM (transa, transb, l, n, m, alpha, a, lda, b, ldb, beta, c, ldc); |
If transa = 'N', A is used in the computation.
If transa = 'T', AT is used in the computation.
If transa = 'C', AH is used in the computation.
Specified as: a single character; transa = 'N', 'T', or 'C'.
If transb = 'N', B is used in the computation.
If transb = 'T', BT is used in the computation.
If transb = 'C', BH is used in the computation.
Specified as: a single character; transb = 'N', 'T', or 'C'.
If transa = 'N', it is the number of columns in matrix A.
If transa = 'T' or 'C', it is the number of rows in matrix A.
In addition:
If transb = 'N', it is the number of rows in matrix B.
If transb = 'T' or 'C', it is the number of columns in matrix B.
Specified as: a fullword integer; m >= 0.
If transa = 'N', A is used in the computation, and A has l rows and m columns.
If transa = 'T', AT is used in the computation, and A has m rows and l columns.
If transa = 'C', AH is used in the computation, and A has m rows and l columns.
Note: | No data should be moved to form AT or AH; that is, the matrix A should always be stored in its untransposed form. |
Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 76, where:
If transa = 'N', its size must be lda by (at least) m.
If transa = 'T' or 'C', its size must be lda by (at least) l.
If transa = 'N', lda >= l.
If transa = 'T' or 'C', lda >= m.
If transb = 'N', B is used in the computation, and B has m rows and n columns.
If transb = 'T', BT is used in the computation, and B has n rows and m columns.
If transb = 'C', BH is used in the computation, and B has n rows and m columns.
Note: | No data should be moved to form BT or BH; that is, the matrix B should always be stored in its untransposed form. |
Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 76, where:
If transb = 'N', its size must be ldb by (at least) n.
If transb = 'T' or 'C', its size must be ldb by (at least) m.
If transb = 'N', ldb >= m.
If transb = 'T' or 'C', ldb >= n.
The combined matrix addition and multiplication is expressed as follows, where aik, bkj, and cij are elements of matrices A, B, and C, respectively:
See references [32] and [38]. In the following three cases, no computation is performed:
Assuming the above conditions do not exist, if beta <> 1 and m is 0, then betaC is returned.
The equivalence rules, defined for matrix multiplication of A and B in "Special Usage", also apply to the matrix multiplication part of the computation performed by this subroutine. You should use the equivalent rules when you want to transpose or conjugate transpose the multiplication part of the computation. When coding the calling sequences for these cases, be careful to code your matrix arguments and dimension arguments in the order indicated by the rule. Also, be careful that your input and output array C has dimensions large enough to hold the resulting matrix. See "Example 4".
Unable to allocate internal work area (CGEMM and ZGEMM only).
None
This example shows the computation C <-- alphaAB+betaC, where A, B, and C are contained in larger arrays A, B, and C, respectively.
TRANSA TRANSB L N M ALPHA A LDA B LDB BETA C LDC | | | | | | | | | | | | | CALL SGEMM( 'N' , 'N' , 6 , 4 , 5 , 1.0 , A , 8 , B , 6 , 2.0 , C , 7 )
* * | 1.0 2.0 -1.0 -1.0 4.0 | | 2.0 0.0 1.0 1.0 -1.0 | | 1.0 -1.0 -1.0 1.0 2.0 | A = | -3.0 2.0 2.0 2.0 0.0 | | 4.0 0.0 -2.0 1.0 -1.0 | | -1.0 -1.0 1.0 -3.0 2.0 | | . . . . . | | . . . . . | * *
* * | 1.0 -1.0 0.0 2.0 | | 2.0 2.0 -1.0 -2.0 | B = | 1.0 0.0 -1.0 1.0 | | -3.0 -1.0 1.0 -1.0 | | 4.0 2.0 -1.0 1.0 | | . . . . | * *
* * | 0.5 0.5 0.5 0.5 | | 0.5 0.5 0.5 0.5 | | 0.5 0.5 0.5 0.5 | C = | 0.5 0.5 0.5 0.5 | | 0.5 0.5 0.5 0.5 | | 0.5 0.5 0.5 0.5 | | . . . . | * *
* * | 24.0 13.0 -5.0 3.0 | | -3.0 -4.0 2.0 4.0 | | 4.0 1.0 2.0 5.0 | C = | -2.0 6.0 -1.0 -9.0 | | -4.0 -6.0 5.0 5.0 | | 16.0 7.0 -4.0 7.0 | | . . . . | * *
This example shows the computation C <-- alphaABT+betaC, where A and C are contained in larger arrays A and C, respectively, and B is the same size as array B in which it is contained.
TRANSA TRANSB L N M ALPHA A LDA B LDB BETA C LDC | | | | | | | | | | | | | CALL SGEMM( 'N' , 'T' , 3 , 3 , 2 , 1.0 , A , 4 , B , 3 , 2.0 , C , 5 )
* * | 1.0 -3.0 | A = | 2.0 4.0 | | 1.0 -1.0 | | . . | * *
* * | 1.0 -3.0 | B = | 2.0 4.0 | | 1.0 -1.0 | * *
* * | 0.5 0.5 0.5 | | 0.5 0.5 0.5 | C = | 0.5 0.5 0.5 | | . . . | | . . . | * *
* * | 11.0 -9.0 5.0 | | -9.0 21.0 -1.0 | C = | 5.0 -1.0 3.0 | | . . . | | . . . | * *
This example shows the computation C <-- alphaAB+betaC using complex data, where A, B, and C are contained in larger arrays, A, B, and C, respectively.
TRANSA TRANSB L N M ALPHA A LDA B LDB BETA C LDC | | | | | | | | | | | | | CALL CGEMM( 'N' , 'N' , 6 , 2 , 3 , ALPHA , A , 8 , B , 4 , BETA , C , 8 )
ALPHA = (1.0, 0.0) BETA = (2.0, 0.0) * * | (1.0, 5.0) (9.0, 2.0) (1.0, 9.0) | | (2.0, 4.0) (8.0, 3.0) (1.0, 8.0) | | (3.0, 3.0) (7.0, 5.0) (1.0, 7.0) | A = | (4.0, 2.0) (4.0, 7.0) (1.0, 5.0) | | (5.0, 1.0) (5.0, 1.0) (1.0, 6.0) | | (6.0, 6.0) (3.0, 6.0) (1.0, 4.0) | | . . . | | . . . | * *
* * | (1.0, 8.0) (2.0, 7.0) | B = | (4.0, 4.0) (6.0, 8.0) | | (6.0, 2.0) (4.0, 5.0) | | . . | * *
* * | (0.5, 0.0) (0.5, 0.0) | | (0.5, 0.0) (0.5, 0.0) | | (0.5, 0.0) (0.5, 0.0) | C = | (0.5, 0.0) (0.5, 0.0) | | (0.5, 0.0) (0.5, 0.0) | | (0.5, 0.0) (0.5, 0.0) | | . . | | . . | * *
* * | (-22.0, 113.0) (-35.0, 142.0) | | (-19.0, 114.0) (-35.0, 141.0) | | (-20.0, 119.0) (-43.0, 146.0) | C = | (-27.0, 110.0) (-58.0, 131.0) | | (8.0, 103.0) (0.0, 112.0) | | (-55.0, 116.0) (-75.0, 135.0) | | . . | | . . | * *
This example shows how to obtain the conjugate transpose of ABH.
This shows the conjugate transpose of the computation performed in "Example 8" for CGEMUL, which uses the following calling sequence:
CALL CGEMUL( A , 4 , 'N' , B , 3 , 'C' , C , 4 , 3 , 2 , 3 )
You instead code the calling sequence for C <-- betaC+alphaBAH, where beta = 0, alpha = 1, and the array C has the correct dimensions to receive the transposed matrix. Because beta is zero, betaC = 0. For a description of all the matrix identities, see "Special Usage".
TRANSA TRANSB L N M ALPHA A LDA B LDB BETA C LDC | | | | | | | | | | | | | CALL CGEMM( 'N' , 'C' , 3 , 3 , 2 , ALPHA , B , 3 , A , 3 , BETA , C , 4 )
ALPHA = (1.0, 0.0) BETA = (0.0, 0.0) * * | (1.0, 3.0) (-3.0, 2.0) | B = | (2.0, 5.0) (4.0, 6.0) | | (1.0, 1.0) (-1.0, 9.0) | * *
* * | (1.0, 2.0) (-3.0, 2.0) | A = | (2.0, 6.0) (4.0, 5.0) | | (1.0, 2.0) (-1.0, 8.0) | | . . | * *
C =(not relevant)
* * | (20.0, 1.0) (18.0, 23.0) (26.0, 23.0) | C = | (12.0, -25.0) (80.0, 2.0) (56.0, -37.0) | | (24.0, -26.0) (49.0, 37.0) (76.0, -2.0) | | . . . | * *
This example shows the computation C <-- alphaATBH+betaC using complex data, where A, B, and C are the same size as the arrays A, B, and C, in which they are contained. Because beta is zero, betaC = 0. (Based on the dimensions of the matrices, A is actually a column vector, and C is actually a row vector.)
TRANSA TRANSB L N M ALPHA A LDA B LDB BETA C LDC | | | | | | | | | | | | | CALL CGEMM( 'T' , 'C' , 1 , 3 , 3 , ALPHA , A , 3 , B , 3 , BETA , C , 1 )
ALPHA = (1.0, 1.0) BETA = (0.0, 0.0) * * | (1.0, 2.0) | A = | (2.0, 5.0) | | (1.0, 6.0) | * *
* * | (1.0, 6.0) (-3.0, 4.0) (2.0, 6.0) | B = | (2.0, 3.0) (4.0, 6.0) (0.0, 3.0) | | (1.0, 3.0) (-1.0, 6.0) (-1.0, 9.0) | * *
C =(not relevant)
* * A = | (86.0, 44.0) (58.0, 70.0) (121.0, 55.0) | * *
These subroutines compute one of the following matrix-matrix products, using the scalars alpha and beta and matrices A, B, and C:
where matrix A is stored in either upper or lower storage mode, and:
alpha, A, B, beta, C | Subprogram |
Short-precision real | SSYMM |
Long-precision real | DSYMM |
Short-precision complex | CSYMM and CHEMM |
Long-precision complex | ZSYMM and ZHEMM |
Fortran | CALL SSYMM | DSYMM | CSYMM | ZSYMM | CHEMM | ZHEMM (side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc) |
C and C++ | ssymm | dsymm | csymm | zsymm | chemm | zhemm (side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc); |
PL/I | CALL SSYMM | DSYMM | CSYMM | ZSYMM | CHEMM | ZHEMM (side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc); |
If side = 'L', A is to the left of B, resulting in equation 1.
If side = 'R', A is to the right of B, resulting in equation 2.
Specified as: a single character. It must be 'L' or 'R'.
If uplo = 'U', A is stored in upper storage mode.
If uplo = 'L', A is stored in lower storage mode.
Specified as: a single character. It must be 'U' or 'L'.
If side = 'L', m is the order of triangular matrix A.
Specified as: a fullword integer; 0 <= m <= ldb, m <= ldc, and:
If side = 'L', m <= lda.
If side = 'R', n is the order of triangular matrix A.
Specified as: a fullword integer; n >= 0 and:
If side = 'R', n <= lda.
If side = 'L', A is order m.
If side = 'R', A is order n.
and where it is stored as follows:
If uplo = 'U', A is stored in upper storage mode.
If uplo = 'L', A is stored in lower storage mode.
Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 77, where:
If side = 'L', its size must be lda by (at least) m.
If side = 'R', it size must be lda by (at least) n.
If side = 'L', lda >= m.
If side = 'R', lda >= n.
Returned as: an ldc by (at least) n array, containing numbers of the data type indicated in Table 77.
These subroutines can perform the following matrix-matrix product computations using matrix A, which is real symmetric for SSYMM and DSYMM, complex symmetric for CSYMM and ZSYMM, and complex Hermitian for CHEMM and ZHEMM:
where:
See references [32] and [38]. In the following two cases, no computation is performed:
Unable to allocate internal work area.
None
This example shows the computation C <-- alphaAB+betaC, where A is a real symmetric matrix of order 5, stored in upper storage mode, and B and C are 5 by 4 rectangular matrices.
SIDE UPLO M N ALPHA A LDA B LDB BETA C LDC | | | | | | | | | | | | CALL SSYMM( 'L' , 'U' , 5 , 4 , 2.0 , A , 8 , B , 6 , 1.0 , C , 5 )
* * | 1.0 2.0 -1.0 -1.0 4.0 | | . 0.0 1.0 1.0 -1.0 | | . . -1.0 1.0 2.0 | A = | . . . 2.0 0.0 | | . . . . -1.0 | | . . . . . | | . . . . . | | . . . . . | * *
* * | 1.0 -1.0 0.0 2.0 | | 2.0 2.0 -1.0 -2.0 | B = | 1.0 0.0 -1.0 1.0 | | -3.0 -1.0 1.0 -1.0 | | 4.0 2.0 -1.0 1.0 | | . . . . | * *
* * | 23.0 12.0 -6.0 2.0 | | -4.0 -5.0 1.0 3.0 | C = | 5.0 6.0 -1.0 -4.0 | | -4.0 1.0 0.0 -5.0 | | 8.0 -4.0 -2.0 13.0 | * *
* * | 69.0 36.0 -18.0 6.0 | | -12.0 -15.0 3.0 9.0 | C = | 15.0 18.0 -3.0 -12.0 | | -12.0 3.0 0.0 -15.0 | | 8.0 -20.0 -2.0 35.0 | * *
This example shows the computation C <-- alphaAB+betaC, where A is a real symmetric matrix of order 3, stored in lower storage mode, and B and C are 3 by 6 rectangular matrices.
SIDE UPLO M N ALPHA A LDA B LDB BETA C LDC | | | | | | | | | | | | CALL SSYMM( 'L' , 'L' , 3 , 6 , 2.0 , A , 4 , B , 3 , 2.0 , C , 5 )
* * | 1.0 . . | A = | 2.0 4.0 . | | 1.0 -1.0 -1.0 | | . . . | * *
* * | 1.0 -3.0 2.0 2.0 -1.0 2.0 | B = | 2.0 4.0 0.0 0.0 1.0 -2.0 | | 1.0 -1.0 -1.0 -1.0 -1.0 1.0 | * *
* * | 6.0 4.0 1.0 1.0 0.0 -1.0 | | 9.0 11.0 5.0 5.0 3.0 -5.0 | C = | -2.0 -6.0 3.0 3.0 -1.0 32.0 | | . . . . . . | | . . . . . . | * *
* * | 24.0 16.0 4.0 4.0 0.0 -4.0 | | 36.0 44.0 20.0 20.0 12.0 -20.0 | C = | -8.0 -24.0 12.0 12.0 -4.0 12.0 | | . . . . . . | | . . . . . . | * *
This example shows the computation C <-- alphaBA+betaC, where A is a real symmetric matrix of order 3, stored in upper storage mode, and B and C are 2 by 3 rectangular matrices.
SIDE UPLO M N ALPHA A LDA B LDB BETA C LDC | | | | | | | | | | | | CALL SSYMM( 'R' , 'U' , 2 , 3 , 2.0 , A , 4 , B , 3 , 1.0 , C , 5 )
* * | 1.0 -3.0 1.0 | A = | . 4.0 -1.0 | | . . 2.0 | | . . . | * *
* * | 1.0 -3.0 3.0 | B = | 2.0 4.0 -1.0 | | . . . | * *
* * | 13.0 -18.0 10.0 | | -11.0 11.0 -4.0 | C = | . . . | | . . . | | . . . | * *
* * | 39.0 -54.0 30.0 | | -33.0 33.0 -12.0 | C = | . . . | | . . . | | . . . | * *
This example shows the computation C <-- alphaBA+betaC, where A is a real symmetric matrix of order 3, stored in lower storage mode, and B and C are 3 by 3 square matrices.
SIDE UPLO M N ALPHA A LDA B LDB BETA C LDC | | | | | | | | | | | | CALL SSYMM( 'R' , 'L' , 3 , 3 , -1.0 , A , 3 , B , 3 , 1.0 , C , 3 )
* * | 1.0 . . | A = | 2.0 10.0 . | | 1.0 11.0 4.0 | * *
* * | 1.0 -3.0 2.0 | B = | 2.0 4.0 0.0 | | 1.0 -1.0 -1.0 | * *
* * | 1.0 5.0 -9.0 | C = | -3.0 10.0 -2.0 | | -2.0 8.0 0.0 | * *
* * | 4.0 11.0 15.0 | C = | -13.0 -34.0 -48.0 | | 0.0 27.0 14.0 | * *
This example shows the computation C <-- alphaBA+betaC, where A is a complex symmetric matrix of order 3, stored in upper storage mode, and B and C are 2 by 3 rectangular matrices.
SIDE UPLO M N ALPHA A LDA B LDB BETA C LDC | | | | | | | | | | | | CALL CSYMM( 'R' , 'U' , 2 , 3 , ALPHA , A , 4 , B , 3 , BETA , C , 5 )
ALPHA = (2.0, 3.0) BETA = (1.0, 6.0)
* * | (1.0, 5.0) (-3.0, 2.0) (1.0, 6.0) | A = | . (4.0, 5.0) (-1.0, 4.0) | | . . (2.0, 5.0) | | . . . | * *
* * | (1.0, 1.0) (-3.0, 2.0) (3.0, 3.0) | B = | (2.0, 6.0) (4.0, 5.0) (-1.0, 4.0) | | . . . | * *
* * | (13.0, 6.0) (-18.0, 6.0) (10.0, 7.0) | | (-11.0, 8.0) (11.0, 1.0) (-4.0, 2.0) | C = | . . . | | . . . | | . . . | * *
* * | (-96.0, 72.0) (-141.0, -226.0) (-112.0, 38.0) | | (-230.0, -269.0) (-133.0, -23.0) (-272.0, -198.0) | C = | . . . | | . . . | | . . . | * *
This example shows the computation C <-- alphaBA+betaC, where A is a complex Hermitian matrix of order 3, stored in lower storage mode, and B and C are 3 by 3 square matrices.
Note: | The imaginary parts of the diagonal elements of a complex Hermitian matrix are assumed to be zero, so you do not have to set these values. |
SIDE UPLO M N ALPHA A LDA B LDB BETA C LDC | | | | | | | | | | | | CALL CHEMM( 'R' , 'L' , 2 , 3 , ALPHA , A , 4 , B , 3 , BETA , C , 5 )
ALPHA = (2.0, 3.0) BETA = (1.0, 6.0)
* * | (1.0, . ) . . | A = | (3.0, 2.0) (4.0, . ) . | | (-1.0, 6.0) (1.0, 4.0) (2.0, . ) | | . . . | * *
* * | (1.0, 1.0) (-3.0, 2.0) (3.0, 3.0) | B = | (2.0, 6.0) (4.0, 5.0) (-1.0, 4.0) | | . . . | * *
* * | (13.0, 6.0) (-18.0, 6.0) (10.0, 7.0) | | (-11.0, 8.0) (11.0, 1.0) (-4.0, 2.0) | C = | . . . | | . . . | | . . . | * *
* * | (-137.0, 17.0) (-158.0, -102.0) (-39.0, 141.0) | | (-154.0, -77.0) (-63.0, 186.0) (159.0, 104.0) | C = | . . . | | . . . | | . . . | * *
STRMM and DTRMM compute one of the following matrix-matrix products, using
the scalar alpha, rectangular matrix B, and triangular matrix
A or its transpose:
1. B <-- alphaAB | 3. B <-- alphaBA |
|
2. B <-- alphaATB | 4. B <-- alphaBAT |
|
CTRMM and ZTRMM compute one of the following matrix-matrix products, using
the scalar alpha, rectangular matrix B, and triangular matrix
A, its transpose, or its conjugate transpose:
1. B <-- alphaAB | 3. B <-- alphaBA | 5. B <-- alphaAHB |
2. B <-- alphaATB | 4. B <-- alphaBAT | 6. B <-- alphaBAH |
A, B, alpha | Subroutine |
Short-precision real | STRMM |
Long-precision real | DTRMM |
Short-precision complex | CTRMM |
Long-precision complex | ZTRMM |
Fortran | CALL STRMM | DTRMM | CTRMM | ZTRMM (side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb) |
C and C++ | strmm | dtrmm | ctrmm | ztrmm (side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb); |
PL/I | CALL STRMM | DTRMM | CTRMM | ZTRMM (side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb); |
If side = 'L', A is to the left of B in the equation, resulting in either equation 1, 2, or 5.
If side = 'R', A is to the right of B in the equation, resulting in either equation 3, 4, or 6.
Specified as: a single character. It must be 'L' or 'R'.
If uplo = 'U', A is an upper triangular matrix.
If uplo = 'L', A is a lower triangular matrix.
Specified as: a single character. It must be 'U' or 'L'.
If transa = 'N', A is used in the computation, resulting in either equation 1 or 3.
If transa = 'T', AT is used in the computation, resulting in either equation 2 or 4.
If transa = 'C', AH is used in the computation, resulting in either equation 5 or 6.
Specified as: a single character. It must be 'N', 'T', or 'C'.
If diag = 'U', A is a unit triangular matrix.
If diag = 'N', A is not a unit triangular matrix.
Specified as: a single character. It must be 'U' or 'N'.
If side = 'L', m is the order of triangular matrix A.
Specified as: a fullword integer, where:
If side = 'L', 0 <= m <= lda and m <= ldb.
If side = 'R', 0 <= m <= ldb.
If side = 'R', n is the order of triangular matrix A.
Specified as: a fullword integer; n >= 0 and:
If side = 'R', n <= lda.
If side = 'L', A is order m.
If side = 'R', A is order n.
Note: | No data should be moved to form AT or AH; that is, the matrix A should always be stored in its untransposed form. |
Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 78, where:
If side = 'L', its size must be lda by (at least) m.
If side = 'R', it size must be lda by (at least) n.
If side = 'L', lda >= m.
If side = 'R', lda >= n.
These subroutines can perform the following matrix-matrix product computations, using the triangular matrix A, its transpose, or its conjugate transpose, where A can be either upper- or lower-triangular:
where:
where:
See references [32] and [38]. If n or m is 0, no computation is performed.
Unable to allocate internal work area.
None
This example shows the computation B <-- alphaAB, where A is a 5 by 5 upper triangular matrix that is not unit triangular, and B is a 5 by 3 rectangular matrix.
SIDE UPLO TRANSA DIAG M N ALPHA A LDA B LDB | | | | | | | | | | | CALL STRMM( 'L' , 'U' , 'N' , 'N' , 5 , 3 , 1.0 , A , 7 , B , 6 )
* * | 3.0 -1.0 2.0 2.0 1.0 | | . -2.0 4.0 -1.0 3.0 | | . . -3.0 0.0 2.0 | A = | . . . 4.0 -2.0 | | . . . . 1.0 | | . . . . . | | . . . . . | * *
* * | 2.0 3.0 1.0 | | 5.0 5.0 4.0 | B = | 0.0 1.0 2.0 | | 3.0 1.0 -3.0 | | -1.0 2.0 1.0 | | . . . | * *
* * | 6.0 10.0 -2.0 | | -16.0 -1.0 6.0 | B = | -2.0 1.0 -4.0 | | 14.0 0.0 -14.0 | | -1.0 2.0 1.0 | | . . . | * *
This example shows the computation B <-- alphaATB, where A is a 5 by 5 upper triangular matrix that is not unit triangular, and B is a 5 by 4 rectangular matrix.
SIDE UPLO TRANSA DIAG M N ALPHA A LDA B LDB | | | | | | | | | | | CALL STRMM( 'L' , 'U' , 'T' , 'N' , 5 , 4 , 1.0 , A , 7 , B , 6 )
* * | -1.0 -4.0 -2.0 2.0 3.0 | | . -2.0 2.0 2.0 2.0 | | . . -3.0 -1.0 4.0 | A = | . . . 1.0 0.0 | | . . . . -2.0 | | . . . . . | | . . . . . | * *
* * | 1.0 2.0 3.0 4.0 | | 3.0 3.0 -1.0 2.0 | B = | -2.0 -1.0 0.0 1.0 | | 4.0 4.0 -3.0 -3.0 | | 2.0 2.0 2.0 2.0 | | . . . . | * *
* * | -1.0 -2.0 -3.0 -4.0 | | 2.0 -2.0 -14.0 -12.0 | B = | 10.0 5.0 -8.0 -7.0 | | 14.0 15.0 1.0 8.0 | | -3.0 4.0 3.0 16.0 | | . . . . | * *
This example shows the computation B <-- alphaBA, where A is a 5 by 5 lower triangular matrix that is not unit triangular, and B is a 3 by 5 rectangular matrix.
SIDE UPLO TRANSA DIAG M N ALPHA A LDA B LDB | | | | | | | | | | | CALL STRMM( 'R' , 'L' , 'N' , 'N' , 3 , 5 , 1.0 , A , 7 , B , 4 )
* * | 2.0 . . . . | | 2.0 3.0 . . . | | 2.0 1.0 1.0 . . | A = | 0.0 3.0 0.0 -2.0 . | | 2.0 4.0 -1.0 2.0 -1.0 | | . . . . . | | . . . . . | * *
* * | 3.0 4.0 -1.0 -1.0 -1.0 | B = | 2.0 1.0 -1.0 0.0 3.0 | | -2.0 -1.0 -3.0 0.0 2.0 | | . . . . . | * *
* * | 10.0 4.0 0.0 0.0 1.0 | B = | 10.0 14.0 -4.0 6.0 -3.0 | | -8.0 2.0 -5.0 4.0 -2.0 | | . . . . . | * *
This example shows the computation B <-- alphaBA, where A is a 6 by 6 upper triangular matrix that is unit triangular, and B is a 1 by 6 rectangular matrix.
Note: | Because matrix A is unit triangular, the diagonal elements are not referenced. ESSL assumes a value of 1.0 for the diagonal elements. |
SIDE UPLO TRANSA DIAG M N ALPHA A LDA B LDB | | | | | | | | | | | CALL STRMM( 'R' , 'U' , 'N' , 'U' , 1 , 6 , 1.0 , A , 7 , B , 2 )
* * | . 2.0 -3.0 1.0 2.0 4.0 | | . . 0.0 1.0 1.0 -2.0 | | . . . 4.0 -1.0 1.0 | A = | . . . . 0.0 -1.0 | | . . . . . 2.0 | | . . . . . . | | . . . . . . | * *
* * B = | 1.0 2.0 1.0 3.0 -1.0 -2.0 | | . . . . . . | * *
* * B = | 1.0 4.0 -2.0 10.0 2.0 -6.0 | | . . . . . . | * *
This example shows the computation B <-- alphaAHB, where A is a 5 by 5 upper triangular matrix that is not unit triangular, and B is a 5 by 1 rectangular matrix.
SIDE UPLO TRANSA DIAG M N ALPHA A LDA B LDB | | | | | | | | | | | CALL CTRMM( 'L' , 'U' , 'C' , 'N' , 5 , 1 , ALPHA , A , 6 , B , 6 ) ALPHA = (1.0, 0.0)
* * | (-4.0, 1.0) (4.0, -3.0) (-1.0, 3.0) (0.0, 0.0) (-1.0, 0.0) | | . (-2.0, 0.0) (-3.0, -1.0) (-2.0, -1.0) (4.0, 3.0) | A = | . . (-5.0, 3.0) (-3.0, -3.0) (-5.0, -5.0) | | . . . (4.0, -4.0) (2.0, 0.0) | | . . . . (2.0, -1.0) | | . . . . . | * *
* * | (3.0, 4.0) | | (-4.0, 2.0) | B = | (-5.0, 0.0) | | (1.0, 3.0) | | (3.0, 1.0) | | . | * *
* * | (-8.0, -19.0) | | (8.0, 21.0) | B = | (44.0, -8.0) | | (13.0, -7.0) | | (19.0, 2.0) | | . | * *
These subroutines compute one of the following rank-k updates, where matrix C is stored in either upper or lower storage mode. SSYRK, DSYRK, CSYRK, and ZSYRK use the scalars alpha and beta, real or complex matrix A or its transpose, and real or complex symmetric matrix C to compute:
CHERK and ZHERK use the scalars alpha and beta, complex matrix A or its complex conjugate transpose, and complex Hermitian matrix C to compute:
A, C | alpha, beta | Subprogram |
Short-precision real | Short-precision real | SSYRK |
Long-precision real | Long-precision real | DSYRK |
Short-precision complex | Short-precision complex | CSYRK |
Long-precision complex | Long-precision complex | ZSYRK |
Short-precision complex | Short-precision real | CHERK |
Long-precision complex | Long-precision real | ZHERK |
Fortran | CALL SSYRK | DSYRK | CSYRK | ZSYRK | CHERK | ZHERK (uplo, trans, n, k, alpha, a, lda, beta, c, ldc) |
C and C++ | ssyrk | dsyrk | csyrk | zsyrk | cherk | zherk (uplo, trans, n, k, alpha, a, lda, beta, c, ldc); |
PL/I | CALL SSYRK | DSYRK | CSYRK | ZSYRK | CHERK | ZHERK (uplo, trans, n, k, alpha, a, lda, beta, c, ldc); |
If uplo = 'U', C is stored in upper storage mode.
If uplo = 'L', C is stored in lower storage mode.
Specified as: a single character. It must be 'U' or 'L'.
If trans = 'N', A is used, resulting in equation 1 or 3.
If trans = 'T', AT is used, resulting in equation 2.
If trans = 'C', AH is used, resulting in equation 4.
Specified as: a single character, where:
For SSYRK and DSYRK, it must be 'N', 'T', or 'C'.
For CSYRK and ZSYRK, it must be 'N' or 'T'.
For CHERK and ZHERK, it must be 'N' or 'C'.
If trans = 'N', then n <= lda.
If trans = 'N', it is the number of columns in matrix A.
If trans = 'T' or 'C', it is the number of rows in matrix A.
Specified as: a fullword integer; k >= 0 and:
If trans = 'T' or 'C', then k <= lda.
If trans = 'N', A is n by k.
If trans = 'T' or 'C', A is k by n.
Note: | No data should be moved to form AT or AH; that is, the matrix A should always be stored in its untransposed form. |
Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 79, where:
If trans = 'N', its size must be lda by (at least) k.
If trans = 'T' or 'C', its size must be lda by (at least) n.
If trans = 'N', lda >= n.
If trans = 'T' or 'C', lda >= k.
If uplo = 'U', C is stored in upper storage mode.
If uplo = 'L', C is stored in lower storage mode.
Specified as: an ldc by (at least) n array, containing numbers of the data type indicated in Table 79.
If uplo = 'U', C is stored in upper storage mode.
If uplo = 'L', C is stored in lower storage mode.
Returned as: an ldc by (at least) n array, containing numbers of the data type indicated in Table 79.
These subroutines can perform the following rank-k updates. For SSYRK and DSYRK, matrix C is real symmetric. For CSYRK and ZSYRK, matrix C is complex symmetric. They perform:
For CHERK and ZHERK, matrix C is complex Hermitian. They perform:
where:
See references [32] and [38]. In the following two cases, no computation is performed:
Assuming the above conditions do not exist, if beta is not one, and alpha is zero or k is zero, then betaC is returned.
Unable to allocate internal work area.
None
This example shows the computation C <-- alphaAAT+betaC, where A is an 8 by 2 real rectangular matrix, and C is a real symmetric matrix of order 8, stored in upper storage mode.
UPLO TRANS N K ALPHA A LDA BETA C LDC | | | | | | | | | | CALL SSYRK( 'U' , 'N' , 8 , 2 , 1.0 , A , 9 , 1.0 , C , 10 )
* * | 0.0 8.0 | | 1.0 9.0 | | 2.0 10.0 | | 3.0 11.0 | A = | 4.0 12.0 | | 5.0 13.0 | | 6.0 14.0 | | 7.0 15.0 | | . . | * *
* * | 0.0 1.0 3.0 6.0 10.0 15.0 21.0 28.0 | | . 2.0 4.0 7.0 11.0 16.0 22.0 29.0 | | . . 5.0 8.0 12.0 17.0 23.0 30.0 | | . . . 9.0 13.0 18.0 24.0 31.0 | C = | . . . . 14.0 19.0 25.0 32.0 | | . . . . . 20.0 26.0 33.0 | | . . . . . . 27.0 34.0 | | . . . . . . . 35.0 | | . . . . . . . . | | . . . . . . . . | * *
* * | 64.0 73.0 83.0 94.0 106.0 119.0 133.0 148.0 | | . 84.0 96.0 109.0 123.0 138.0 154.0 171.0 | | . . 109.0 124.0 140.0 157.0 175.0 194.0 | | . . . 139.0 157.0 176.0 196.0 217.0 | C = | . . . . 174.0 195.0 217.0 240.0 | | . . . . . 214.0 238.0 263.0 | | . . . . . . 259.0 286.0 | | . . . . . . . 309.0 | | . . . . . . . . | | . . . . . . . . | * *
This example shows the computation C <-- alphaATA+betaC, where A is a 3 by 8 real rectangular matrix, and C is a real symmetric matrix of order 8, stored in lower storage mode.
UPLO TRANS N K ALPHA A LDA BETA C LDC | | | | | | | | | | CALL SSYRK( 'L' , 'T' , 8 , 3 , 1.0 , A , 4 , 1.0 , C , 8 )
* * | 0.0 3.0 6.0 9.0 12.0 15.0 18.0 21.0 | A = | 1.0 4.0 7.0 10.0 13.0 16.0 19.0 22.0 | | 2.0 5.0 8.0 11.0 14.0 17.0 20.0 23.0 | | . . . . . . . . | * *
* * | 0.0 . . . . . . . | | 1.0 8.0 . . . . . . | | 2.0 9.0 15.0 . . . . . | C = | 3.0 10.0 16.0 21.0 . . . . | | 4.0 11.0 17.0 22.0 26.0 . . . | | 5.0 12.0 18.0 23.0 27.0 30.0 . . | | 6.0 13.0 19.0 24.0 28.0 31.0 33.0 . | | 7.0 14.0 20.0 25.0 29.0 32.0 34.0 35.0 | * *
* * | 5.0 . . . . . . . | | 15.0 58.0 . . . . . . | | 25.0 95.0 164.0 . . . . . | C = | 35.0 132.0 228.0 323.0 . . . . | | 45.0 169.0 292.0 414.0 535.0 . . . | | 55.0 206.0 356.0 505.0 653.0 800.0 . . | | 65.0 243.0 420.0 596.0 771.0 945.0 1118.0 . | | 75.0 280.0 484.0 687.0 889.0 1090.0 1290.0 1489.0 | * *
This example shows the computation C <-- alphaAAT+betaC, where A is a 3 by 5 complex rectangular matrix, and C is a complex symmetric matrix of order 3, stored in upper storage mode.
UPLO TRANS N K ALPHA A LDA BETA C LDC | | | | | | | | | | CALL CSYRK( 'U' , 'N' , 3 , 5 , ALPHA , A , 3 , BETA , C , 4 )
ALPHA = (1.0, 1.0) BETA = (1.0, 1.0)
* * | (2.0, 0.0) (3.0, 2.0) (4.0, 1.0) (1.0, 7.0) (0.0, 0.0) | A = | (3.0, 3.0) (8.0, 0.0) (2.0, 5.0) (2.0, 4.0) (1.0, 2.0) | | (1.0, 3.0) (2.0, 1.0) (6.0, 0.0) (3.0, 2.0) (2.0, 2.0) | * *
* * | (2.0, 1.0) (1.0, 9.0) (4.0, 5.0) | C = | . (3.0, 1.0) (6.0, 7.0) | | . . (8.0, 1.0) | | . . . | * *
* * | (-57.0, 13.0) (-63.0, 79.0) (-24.0, 70.0) | C = | . (-28.0, 90.0) (-55.0, 103.0) | | . . (13.0, 75.0) | | . . . | * *
This example shows the computation C <-- alphaAHA+betaC, where A is a 5 by 3 complex rectangular matrix, and C is a complex Hermitian matrix of order 3, stored in lower storage mode.
Note: | The imaginary parts of the diagonal elements of a complex Hermitian matrix are assumed to be zero, so you do not have to set these values. On output, they are set to zero. |
UPLO TRANS N K ALPHA A LDA BETA C LDC | | | | | | | | | | CALL CHERK( 'L' , 'C' , 3 , 5 , 1.0 , A , 5 , 1.0 , C , 4 )
* * | (2.0, 0.0) (3.0, 2.0) (4.0, 1.0) | | (3.0, 3.0) (8.0, 0.0) (2.0, 5.0) | A = | (1.0, 3.0) (2.0, 1.0) (6.0, 0.0) | | (3.0, 3.0) (8.0, 0.0) (2.0, 5.0) | | (1.0, 9.0) (3.0, 0.0) (6.0, 7.0) | * *
* * | (6.0, . ) . . | C = | (3.0, 4.0) (10.0, . ) . | | (9.0, 1.0) (12.0, 2.0) (3.0, . ) | | . . . | * *
* * | (138.0, 0.0) . . | C = | (65.0, 80.0) (165.0, 0.0) . | | (134.0, 46.0) (88.0, -88.0) (199.0, 0.0) | | . . . | * *
These subroutines compute one of the following rank-2k updates, where matrix C is stored in upper or lower storage mode. SSYR2K, DSYR2K, CSYR2K, and ZSYR2K use the scalars alpha and beta, real or complex matrices A and B or their transposes, and real or complex symmetric matrix C to compute:
CHER2K and ZHER2K use the scalars alpha and beta, complex matrices A and B or their complex conjugate transposes, and complex Hermitian matrix C to compute:
A, B, C, alpha | beta | Subprogram |
Short-precision real | Short-precision real | SSYR2K |
Long-precision real | Long-precision real | DSYR2K |
Short-precision complex | Short-precision complex | CSYR2K |
Long-precision complex | Long-precision complex | ZSYR2K |
Short-precision complex | Short-precision real | CHER2K |
Long-precision complex | Long-precision real | ZHER2K |
Fortran | CALL SSYR2K | DSYR2K | CSYR2K | ZSYR2K | CHER2K | ZHER2K (uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc) |
C and C++ | ssyr2k | dsyr2k | csyr2k | zsyr2k | cher2k | zher2k (uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc); |
PL/I | CALL SSYR2K | DSYR2K | CSYR2K | ZSYR2K | CHER2K | ZHER2K (uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc); |
If uplo = 'U', C is stored in upper storage mode.
If uplo = 'L', C is stored in lower storage mode.
Specified as: a single character. It must be 'U' or 'L'.
If trans = 'N', A and B are used, resulting in equation 1 or 3.
If trans = 'T', AT and BT are used, resulting in equation 2.
If trans = 'C', AH and BH are used, resulting in equation 4.
Specified as: a single character, where:
For SSYR2K and DSYR2K, it must be 'N', 'T', or 'C'.
For CSYR2K and ZSYR2K, it must be 'N' or 'T'.
For CHER2K and ZHER2K, it must be 'N' or 'C'.
If trans = 'N', then n <= lda and n <= ldb.
If trans = 'N', it is the number of columns in matrices A and B.
If trans = 'T' or 'C', it is the number of rows in matrices A and B.
Specified as: a fullword integer; k >= 0 and:
If trans = 'T' or 'C', then k <= lda and k <= ldb.
If trans = 'N', A is n by k.
If trans = 'T' or 'C', A is k by n.
Note: | No data should be moved to form AT or AH; that is, the matrix A should always be stored in its untransposed form. |
Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 80, where:
If trans = 'N', its size must be lda by (at least) k.
If trans = 'T' or 'C', its size must be lda by (at least) n.
If trans = 'N', lda >= n.
If trans = 'T' or 'C', lda >= k.
If trans = 'N', B is n by k.
If trans = 'T' or 'C', B is k by n.
Note: | No data should be moved to form BT or BH; that is, the matrix B should always be stored in its untransposed form. |
Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 80, where:
If trans = 'N', its size must be ldb by (at least) k.
If trans = 'T' or 'C', its size must be ldb by (at least) n.
If trans = 'N', ldb >= n.
If trans = 'T' or 'C', ldb >= k.
If uplo = 'U', C is stored in upper storage mode.
If uplo = 'L', C is stored in lower storage mode.
Specified as: an ldc by (at least) n array, containing numbers of the data type indicated in Table 80.
If uplo = 'U', C is stored in upper storage mode.
If uplo = 'L', C is stored in lower storage mode.
Returned as: an ldc by (at least) n array, containing numbers of the data type indicated in Table 80.
These subroutines can perform the following rank-2k updates. For SSYR2K and DSYR2K, matrix C is real symmetric. For CSYR2K and ZSYR2K, matrix C is complex symmetric. They perform:
For CHER2K and ZHER2K, matrix C is complex Hermitian. They perform:
where:
See references [32], [38], and [66]. In the following two cases, no computation is performed:
Assuming the above conditions do not exist, if beta is not one, and alpha is zero or k is zero, then betaC is returned.
Unable to allocate internal work area.
None
This example shows the computation C <-- alphaABT+alphaBAT+betaC, where A and B are 8 by 2 real rectangular matrices, and C is a real symmetric matrix of order 8, stored in upper storage mode.
UPLO TRANS N K ALPHA A LDA B LDB BETA C LDC | | | | | | | | | | | | CALL SSYR2K( 'U' , 'N' , 8 , 2 , 1.0 , A , 9 , B , 8 , 1.0 , C , 10 )
* * | 0.0 8.0 | | 1.0 9.0 | | 2.0 10.0 | | 3.0 11.0 | A = | 4.0 12.0 | | 5.0 13.0 | | 6.0 14.0 | | 7.0 15.0 | | . . | * *
* * | 15.0 7.0 | | 14.0 6.0 | | 13.0 5.0 | B = | 12.0 4.0 | | 11.0 3.0 | | 10.0 2.0 | | 9.0 1.0 | | 8.0 0.0 | * *
* * | 0.0 1.0 3.0 6.0 10.0 15.0 21.0 28.0 | | . 2.0 4.0 7.0 11.0 16.0 22.0 29.0 | | . . 5.0 8.0 12.0 17.0 23.0 30.0 | | . . . 9.0 13.0 18.0 24.0 31.0 | C = | . . . . 14.0 19.0 25.0 32.0 | | . . . . . 20.0 26.0 33.0 | | . . . . . . 27.0 34.0 | | . . . . . . . 35.0 | | . . . . . . . . | | . . . . . . . . | * *
* * | 112.0 127.0 143.0 160.0 178.0 197.0 217.0 238.0 | | . 138.0 150.0 163.0 177.0 192.0 208.0 225.0 | | . . 157.0 166.0 176.0 187.0 199.0 212.0 | | . . . 169.0 175.0 182.0 190.0 199.0 | C = | . . . . 174.0 177.0 181.0 186.0 | | . . . . . 172.0 172.0 173.0 | | . . . . . . 163.0 160.0 | | . . . . . . . 147.0 | | . . . . . . . . | | . . . . . . . . | * *
This example shows the computation C <-- alphaATB+alphaBTA+betaC, where A and B are 3 by 8 real rectangular matrices, and C is a real symmetric matrix of order 8, stored in lower storage mode.
UPLO TRANS N K ALPHA A LDA B LDB BETA C LDC | | | | | | | | | | | | CALL SSYR2K( 'L' , 'T' , 8 , 3 , 1.0 , A , 4 , B , 5 , 1.0 , C , 8 )
* * | 0.0 3.0 6.0 9.0 12.0 15.0 18.0 21.0 | A = | 1.0 4.0 7.0 10.0 13.0 16.0 19.0 22.0 | | 2.0 5.0 8.0 11.0 14.0 17.0 20.0 23.0 | | . . . . . . . . | * *
* * | 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 | | 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 | B = | 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 | | . . . . . . . . | | . . . . . . . . | * *
* * | 0.0 . . . . . . . | | 1.0 8.0 . . . . . . | | 2.0 9.0 15.0 . . . . . | C = | 3.0 10.0 16.0 21.0 . . . . | | 4.0 11.0 17.0 22.0 26.0 . . . | | 5.0 12.0 18.0 23.0 27.0 30.0 . . | | 6.0 13.0 19.0 24.0 28.0 31.0 33.0 . | | 7.0 14.0 20.0 25.0 29.0 32.0 34.0 35.0 | * *
* * | 16.0 . . . . . . . | | 38.0 84.0 . . . . . . | | 60.0 124.0 187.0 . . . . . | C = | 82.0 164.0 245.0 325.0 . . . . | | 104.0 204.0 303.0 401.0 498.0 . . . | | 126.0 244.0 361.0 477.0 592.0 706.0 . . | | 148.0 284.0 419.0 553.0 686.0 818.0 949.0 . | | 170.0 324.0 477.0 629.0 780.0 930.0 1079.0 1227.0 | * *
This example shows the computation C <-- alphaABT+alphaBAT+betaC, where A and B are 3 by 5 complex rectangular matrices, and C is a complex symmetric matrix of order 3, stored in lower storage mode.
UPLO TRANS N K ALPHA A LDA B LDB BETA C LDC | | | | | | | | | | | | CALL CSYR2K( 'L' , 'N' , 3 , 5 , ALPHA , A , 3 , B , 3 , BETA , C , 4 )
ALPHA = (1.0, 1.0) BETA = (1.0, 1.0)
* * | (2.0, 5.0) (3.0, 2.0) (4.0, 1.0) (1.0, 7.0) (0.0, 0.0) | A = | (3.0, 3.0) (8.0, 5.0) (2.0, 5.0) (2.0, 4.0) (1.0, 2.0) | | (1.0, 3.0) (2.0, 1.0) (6.0, 5.0) (3.0, 2.0) (2.0, 2.0) | * *
* * | (1.0, 5.0) (6.0, 2.0) (3.0, 1.0) (2.0, 0.0) (1.0, 0.0) | B = | (2.0, 4.0) (7.0, 5.0) (2.0, 5.0) (2.0, 4.0) (0.0, 0.0) | | (3.0, 5.0) (8.0, 1.0) (1.0, 5.0) (1.0, 0.0) (1.0, 1.0) | * *
* * | (2.0, 3.0) . . | C = | (1.0, 9.0) (3.0, 3.0) . | | (4.0, 5.0) (6.0, 7.0) (8.0, 3.0) | | . . . | * *
* * | (-101.0, 121.0) . . | C = | (-182.0, 192.0) (-274.0, 248.0) . | | (-98.0, 146.0) (-163.0, 205.0) (-151.0, 115.0) | | . . . | * *
This example shows the computation:
where A and B are 5 by 3 complex rectangular matrices, and C is a complex Hermitian matrix of order 3, stored in upper storage mode.
Note: | The imaginary parts of the diagonal elements of a complex Hermitian matrix are assumed to be zero, so you do not have to set these values. On output, they are set to zero. |
UPLO TRANS N K ALPHA A LDA B LDB BETA C LDC | | | | | | | | | | | | CALL CHER2K( 'U' , 'C' , 3 , 5 , ALPHA , A , 5 , B , 5 , 1.0 , C , 4 ) ALPHA = (1.0, 1.0)
* * | (2.0, 0.0) (3.0, 2.0) (4.0, 1.0) | | (3.0, 3.0) (8.0, 0.0) (2.0, 5.0) | A = | (1.0, 3.0) (2.0, 1.0) (6.0, 0.0) | | (3.0, 3.0) (8.0, 0.0) (2.0, 5.0) | | (1.0, 9.0) (3.0, 0.0) (6.0, 7.0) | * *
* * | (4.0, 5.0) (6.0, 7.0) (8.0, 0.0) | | (1.0, 9.0) (3.0, 0.0) (6.0, 7.0) | B = | (3.0, 3.0) (8.0, 0.0) (2.0, 5.0) | | (1.0, 3.0) (2.0, 1.0) (6.0, 0.0) | | (2.0, 0.0) (3.0, 2.0) (4.0, 1.0) | * *
* * | (6.0, . ) (3.0, 4.0) (9.0, 1.0) | C = | . (10.0, . ) (12.0, 2.0) | | . . (3.0, . ) | | . . . | * *
* * | (102.0, 0.0) (56.0, -143.0) (244.0, -96.0) | C = | . (174.0, 0.0) (238.0, 78.0) | | . . (363.0, 0.0) | | . . . | * *
These subroutines transpose an n by n matrix A in place--that is, in matrix A:
A | Subroutine |
Short-precision real | SGETMI |
Long-precision real | DGETMI |
Short-precision complex | CGETMI |
Long-precision complex | ZGETMI |
Fortran | CALL SGETMI | DGETMI | CGETMI | ZGETMI (a, lda, n) |
C and C++ | sgetmi | dgetmi | cgetmi | zgetmi (a, lda, n); |
PL/I | CALL SGETMI | DGETMI | CGETMI | ZGETMI (a, lda, n); |
Matrix A is transposed in place; that is, the n rows and n columns in matrix A are exchanged. For matrix A with elements aij, where i, j = 1, n, the in-place transpose is expressed as aji = aij for i, j = 1, n.
For the following input matrix A:
the in-place matrix transpose operation A <-- AT is expressed as:
If n is 0, no computation is performed.
None
This example shows an in-place matrix transpose of matrix A having 5 rows and 5 columns.
A LDA N | | | CALL SGETMI( A(2,3) , 10 , 5 )
* * | . . . . . . . | | . . 1.0 6.0 11.0 16.0 21.0 | | . . 2.0 7.0 12.0 17.0 22.0 | | . . 3.0 8.0 13.0 18.0 23.0 | A = | . . 4.0 9.0 14.0 19.0 24.0 | | . . 5.0 10.0 15.0 20.0 25.0 | | . . . . . . . | | . . . . . . . | | . . . . . . . | | . . . . . . . | * *
* * | . . . . . . . | | . . 1.0 2.0 3.0 4.0 5.0 | | . . 6.0 7.0 8.0 9.0 10.0 | | . . 11.0 12.0 13.0 14.0 15.0 | A = | . . 16.0 17.0 18.0 19.0 20.0 | | . . 21.0 22.0 23.0 24.0 25.0 | | . . . . . . . | | . . . . . . . | | . . . . . . . | | . . . . . . . | * *
These subroutines transpose an m by n matrix A out of place, returning the result in matrix B:
A, B | Subroutine |
Short-precision real | SGETMO |
Long-precision real | DGETMO |
Short-precision complex | CGETMO |
Long-precision complex | ZGETMO |
Fortran | CALL SGETMO | DGETMO | CGETMO | ZGETMO (a, lda, m, n, b, ldb) |
C and C++ | sgetmo | dgetmo | cgetmo | zgetmo (a, lda, m, n, b, ldb); |
PL/I | CALL SGETMO | DGETMO | CGETMO | ZGETMO (a, lda, m, n, b, ldb); |
Matrix A is transposed out of place; that is, the m rows and n columns in matrix A are stored in n rows and m columns of matrix B. For matrix A with elements aij, where i = 1, m and j = 1, n, the out-of-place transpose is expressed as bji = aij for i = 1, m and j = 1, n.
For the following input matrix A:
the out-of-place matrix transpose operation B <-- AT is expressed as:
If m or n is 0, no computation is performed.
None
This example shows an out-of-place matrix transpose of matrix A, having 5 rows and 4 columns, with the result going into matrix B.
A LDA M N B LDB | | | | | | CALL SGETMO( A(2,3) , 10 , 5 , 4 , B(2,2) , 6 )
* * | . . . . . . . | | . . 1.0 6.0 11.0 16.0 . | | . . 2.0 7.0 12.0 17.0 . | | . . 3.0 8.0 13.0 18.0 . | A = | . . 4.0 9.0 14.0 19.0 . | | . . 5.0 10.0 15.0 20.0 . | | . . . . . . . | | . . . . . . . | | . . . . . . . | | . . . . . . . | * *
* * | . . . . . . . | | . 1.0 2.0 3.0 4.0 5.0 . | B = | . 6.0 7.0 8.0 9.0 10.0 . | | . 11.0 12.0 13.0 14.0 15.0 . | | . 16.0 17.0 18.0 19.0 20.0 . | | . . . . . . . | * *
This example uses the same input matrix A as in Example 1 to show that transposes can be achieved in the same array as long as the input and output data do not overlap. On output, the input data is not overwritten in the array.
A LDA M N B LDB | | | | | | CALL SGETMO( A(2,3) , 10 , 5 , 4 , A(7,1) , 10 )
* * | . . . . . . . | | . . 1.0 6.0 11.0 16.0 . | | . . 2.0 7.0 12.0 17.0 . | | . . 3.0 8.0 13.0 18.0 . | A = | . . 4.0 9.0 14.0 19.0 . | | . . 5.0 10.0 15.0 20.0 . | | 1.0 2.0 3.0 4.0 5.0 . . | | 6.0 7.0 8.0 9.0 10.0 . . | | 11.0 12.0 13.0 14.0 15.0 . . | | 16.0 17.0 18.0 19.0 20.0 . . | * *