Guide and Reference

Performance and Accuracy Considerations

In ESSL, the SSCAL and DSCAL subroutines provide the fastest way to zero out contiguous (stride 1) arrays, by specifying incx = 1 and alpha = 0.
Where possible, use the matrix-vector linear algebra subprograms rather than the vector-scalar subprograms to optimize performance. Because data is presented in matrices rather than vectors, a single ESSL subprogram can perform multiple operations.
Where possible, use subprograms that do multiple computations, such as SNDOT and SNAXPY, rather than subprograms that do individual computations, such as SDOT and SAXPY; this gives better performance.
Many of the short-precision subprograms provide increased accuracy by accumulating results in long precision. This is noted in the functional description of each subprogram.
In some of the subprograms, because implementation techniques vary to optimize performance, accuracy of the results may vary for different array sizes. In the subprograms in which this occurs, a general description of the implementation techniques is given in the functional description for each subprogram.
To select the sparse matrix subroutine that gives you the best performance, you must consider the layout of the data in your matrix. From this, you can determine the most efficient storage mode for your sparse matrix. ESSL provides two versions of each of its sparse matrix-vector subroutines that you can use. One operates on sparse matrices stored in compressed-matrix storage mode, and the other operates on sparse matrices stored in compressed-diagonal storage mode. These two storage modes are described in "Sparse Matrix".
Compressed-matrix storage mode is generally applicable. It should be used when each row of the matrix contains approximately the same number of nonzero elements. However, if the matrix has a special form--that is, where the nonzero elements are concentrated along a few diagonals--compressed-diagonal storage mode gives improved performance.
There are some ESSL-specific rules that apply to the results of computations on the workstation processors using the ANSI/IEEE standards. For details, see "What Data Type Standards Are Used by ESSL, and What Exceptions Should You Know About?".
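
The earlier point about short-precision subprograms accumulating results in long precision can be illustrated with a minimal C sketch. This is not ESSL code: the function names `dot_float_acc` and `dot_double_acc` are hypothetical, and the sketch only contrasts a short-precision dot product summed in `float` with the same dot product summed in `double`.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of two short-precision vectors, accumulated in float.
   Small products can be lost once the running sum is much larger. */
float dot_float_acc(size_t n, const float *x, const float *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i] * y[i];
    return acc;
}

/* Same inputs, but each product is widened to double and the running
   sum is kept in double (long precision). */
double dot_double_acc(size_t n, const float *x, const float *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += (double)x[i] * (double)y[i];
    return acc;
}
```

For example, if the first elements of x and y are 1.0 and the next million elements are 1.0E-4, each later product (about 1.0E-8) is smaller than half the spacing of short-precision values near 1.0. The `float` accumulator then typically stays at 1.0, while the `double` accumulator reaches approximately 1.01.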

Vector-Scalar Subprograms

This section contains the vector-scalar subprogram descriptions.

ISAMAX, IDAMAX, ICAMAX, and IZAMAX--Position of the First or Last Occurrence of the Vector Element Having the Largest Magnitude

ISAMAX and IDAMAX find the position i of the first or last occurrence of a vector element having the maximum absolute value. ICAMAX and IZAMAX find the position i of the first or last occurrence of a vector element having the largest sum of the absolute values of the real and imaginary parts of the vector elements.

You get the position of the first or last occurrence of an element by specifying positive or negative stride, respectively, for vector x. Regardless of the stride, the position i is always relative to the location specified in the calling sequence for vector x (in argument x).

Table 36. Data Types

x	Subprogram
Short-precision real	ISAMAX
Long-precision real	IDAMAX
Short-precision complex	ICAMAX
Long-precision complex	IZAMAX

Syntax

Fortran	ISAMAX | IDAMAX | ICAMAX | IZAMAX (`n`, `x`, `incx`)
C and C++	isamax | idamax | icamax | izamax (`n`, `x`, `incx`);
PL/I	ISAMAX | IDAMAX | ICAMAX | IZAMAX (`n`, `x`, `incx`);

On Entry

n: is the number of elements in vector x. Specified as: a fullword integer; n >= 0.
x: is the vector x of length n. Specified as: a one-dimensional array of (at least) length 1+(n-1)|incx|, containing numbers of the data type indicated in Table 36.
incx: is the stride for vector x. Specified as: a fullword integer. It can have any value.

On Return

Function value: is the position i of the element in the array, where:

If incx >= 0, i is the position of the first occurrence.

If incx < 0, i is the position of the last occurrence.

Returned as: a fullword integer; 0 <= i <= n.

Note

Declare the ISAMAX, IDAMAX, ICAMAX, and IZAMAX functions in your program as returning a fullword integer value.

Function

ISAMAX and IDAMAX find the first element x_k, where k is defined as the smallest index k, such that:

|x_k| = max{|x_j| for j = 1, n}

ICAMAX and IZAMAX find the first element x_k, where k is defined as the smallest index k, such that:

|a_k|+|b_k| = max{|a_j|+|b_j| for j = 1, n}

where x_k = (a_k, b_k)

By specifying a positive or negative stride for vector x, the first or last occurrence, respectively, is found in the array. The position i, returned as the value of the function, is always computed relative to the location specified in the calling sequence for vector x (in argument x). Therefore, depending on the stride specified for incx, i has the following values:

For incx >= 0, i = k

For incx < 0, i = n-k+1

See reference [73]. The result is returned as a function value. If n is 0, then 0 is returned as the value of the function.
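
The selection rule described above can be sketched in portable C. This is a reference sketch under stated assumptions, not the ESSL routine: `ref_isamax` is a hypothetical name, it handles only the short-precision real case, and it reports an invalid n with `assert` rather than through ESSL error handling. The position is counted in array order, so a negative stride selects the last occurrence, which corresponds to i = n-k+1.

```c
#include <assert.h>

/* Returns the 1-based position i of the element of largest magnitude,
   counted in units of |incx| from the start of x.
   incx >= 0 selects the first occurrence; incx < 0 selects the last.
   Returns 0 when n is 0. */
int ref_isamax(int n, const float *x, int incx)
{
    assert(n >= 0);                       /* n < 0 is an input-argument error */
    if (n == 0)
        return 0;

    long step = incx < 0 ? -(long)incx : (long)incx;
    int best = 1;
    float best_mag = x[0] < 0.0f ? -x[0] : x[0];

    for (int p = 2; p <= n; p++) {
        float v = x[(long)(p - 1) * step];
        float mag = v < 0.0f ? -v : v;
        /* Strict '>' keeps the earliest maximum; '>=' moves to the latest. */
        if (incx >= 0 ? (mag > best_mag) : (mag >= best_mag)) {
            best = p;
            best_mag = mag;
        }
    }
    return best;
}
```

With the vector used in Example 1, `ref_isamax(9, x, 1)` returns 6, because -10.0 at position 6 precedes 10.0 at position 8. With a stride of -1 the same call returns 8.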

Error Conditions

Computational Errors

None

Input-Argument Errors

n < 0

Example 1

This example shows a vector, x, with a stride of 1.

Function Reference and Input

               N   X   INCX
               |   |    |
IMAX = ISAMAX( 9 , X ,  1   )
 
X        =  (1.0, 2.0, 7.0, -8.0, -5.0, -10.0, -9.0, 10.0, 6.0)

Output

IMAX     =  6

x	Subprogram
Short-precision real	ISAMIN
Long-precision real	IDAMIN

Fortran	ISAMIN | IDAMIN (`n`, `x`, `incx`)
C and C++	isamin | idamin (`n`, `x`, `incx`);
PL/I	ISAMIN | IDAMIN (`n`, `x`, `incx`);

x	Subprogram
Short-precision real	ISMAX
Long-precision real	IDMAX

Fortran	ISMAX | IDMAX (`n`, `x`, `incx`)
C and C++	ismax | idmax (`n`, `x`, `incx`);
PL/I	ISMAX | IDMAX (`n`, `x`, `incx`);

x	Subprogram
Short-precision real	ISMIN
Long-precision real	IDMIN

Fortran	ISMIN | IDMIN (`n`, `x`, `incx`)
C and C++	ismin | idmin (`n`, `x`, `incx`);
PL/I	ISMIN | IDMIN (`n`, `x`, `incx`);

x	Result	Subprogram
Short-precision real	Short-precision real	SASUM
Long-precision real	Long-precision real	DASUM
Short-precision complex	Short-precision real	SCASUM
Long-precision complex	Long-precision real	DZASUM

Fortran	SASUM | DASUM | SCASUM | DZASUM (`n`, `x`, `incx`)
C and C++	sasum | dasum | scasum | dzasum (`n`, `x`, `incx`);
PL/I	SASUM | DASUM | SCASUM | DZASUM (`n`, `x`, `incx`);

`alpha`, x, y	Subprogram
Short-precision real	SAXPY
Long-precision real	DAXPY
Short-precision complex	CAXPY
Long-precision complex	ZAXPY

Fortran	CALL SCOPY | DCOPY | CCOPY | ZCOPY (`n`, `x`, `incx`, `y`, `incy`)
C and C++	scopy | dcopy | ccopy | zcopy (`n`, `x`, `incx`, `y`, `incy`);
PL/I	CALL SCOPY | DCOPY | CCOPY | ZCOPY (`n`, `x`, `incx`, `y`, `incy`);

Fortran	SDOT | DDOT | CDOTU | ZDOTU | CDOTC | ZDOTC (`n`, `x`, `incx`, `y`, `incy`)
C and C++	sdot | ddot | cdotu | zdotu | cdotc | zdotc (`n`, `x`, `incx`, `y`, `incy`);
PL/I	SDOT | DDOT | CDOTU | ZDOTU | CDOTC | ZDOTC (`n`, `x`, `incx`, `y`, `incy`);

Fortran	CALL SNAXPY | DNAXPY (`n`, `m`, `a`, `inca`, `x`, `incxi`, `incxo`, `y`, `incyi`, `incyo`)
C and C++	snaxpy | dnaxpy (`n`, `m`, `a`, `inca`, `x`, `incxi`, `incxo`, `y`, `incyi`, `incyo`);
PL/I	CALL SNAXPY | DNAXPY (`n`, `m`, `a`, `inca`, `x`, `incxi`, `incxo`, `y`, `incyi`, `incyo`);

`s`_i <-- x_i * y_i	Store positive dot product
`s`_i <-- -x_i * y_i	Store negative dot product
`s`_i <-- `s`_i+x_i * y_i	Accumulate positive dot product
`s`_i <-- `s`_i-x_i * y_i	Accumulate negative dot product
for `i` = 1, `n`
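
The four update rules listed above can be sketched in C as follows. This is an illustrative sketch, not the ESSL interface: `ref_ndot` and the `dot_mode` names are hypothetical, the sketch assumes that each of the n vectors has m elements, and it stores vector i contiguously starting at index i*m instead of using the stride arguments of the real subroutines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the four update rules. */
enum dot_mode { STORE_POS, STORE_NEG, ACCUM_POS, ACCUM_NEG };

/* Computes n dot products of length-m vectors and applies the chosen
   update rule to each result element s[i]. */
void ref_ndot(size_t n, size_t m, double *s, enum dot_mode mode,
              const double *x, const double *y)
{
    assert(n == 0 || (s != NULL && x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++) {
        double dot = 0.0;
        for (size_t j = 0; j < m; j++)
            dot += x[i * m + j] * y[i * m + j];

        switch (mode) {
        case STORE_POS: s[i] = dot;  break;   /* store positive dot product */
        case STORE_NEG: s[i] = -dot; break;   /* store negative dot product */
        case ACCUM_POS: s[i] += dot; break;   /* accumulate positive */
        case ACCUM_NEG: s[i] -= dot; break;   /* accumulate negative */
        }
    }
}
```

Computing all the dot products in one call is what lets a single subprogram replace a loop of individual dot-product calls.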

Fortran	CALL SAXPY | DAXPY | CAXPY | ZAXPY (`n`, `alpha`, `x`, `incx`, `y`, `incy`)
C and C++	saxpy | daxpy | caxpy | zaxpy (`n`, `alpha`, `x`, `incx`, `y`, `incy`);
PL/I	CALL SAXPY | DAXPY | CAXPY | ZAXPY (`n`, `alpha`, `x`, `incx`, `y`, `incy`);

x, y	Subprogram
Short-precision real	SCOPY
Long-precision real	DCOPY
Short-precision complex	CCOPY
Long-precision complex	ZCOPY

x, y, Result	Subprogram
Short-precision real	SDOT
Long-precision real	DDOT
Short-precision complex	CDOTU and CDOTC
Long-precision complex	ZDOTU and ZDOTC

a, x, y	Subprogram
Short-precision real	SNAXPY
Long-precision real	DNAXPY

s, x, y	Subprogram
Short-precision real	SNDOT
Long-precision real	DNDOT

Fortran	CALL SNDOT | DNDOT (`n`, `m`, `s`, `incs`, `isw`, `x`, `incxi`, `incxo`, `y`, `incyi`, `incyo`)
C and C++	sndot | dndot (`n`, `m`, `s`, `incs`, `isw`, `x`, `incxi`, `incxo`, `y`, `incyi`, `incyo`);
PL/I	CALL SNDOT | DNDOT (`n`, `m`, `s`, `incs`, `isw`, `x`, `incxi`, `incxo`, `y`, `incyi`, `incyo`);

Fortran	SNRM2 | DNRM2 | SCNRM2 | DZNRM2 (`n`, `x`, `incx`)
C and C++	snrm2 | dnrm2 | scnrm2 | dznrm2 (`n`, `x`, `incx`);
PL/I	SNRM2 | DNRM2 | SCNRM2 | DZNRM2 (`n`, `x`, `incx`);

Fortran	SNORM2 | DNORM2 | CNORM2 | ZNORM2 (`n`, `x`, `incx`)
C and C++	snorm2 | dnorm2 | cnorm2 | znorm2 (`n`, `x`, `incx`);
PL/I	SNORM2 | DNORM2 | CNORM2 | ZNORM2 (`n`, `x`, `incx`);

`a`, `b`, `r`, `s`	`c`	`z`	Subprogram
Short-precision real	Short-precision real	Short-precision real	SROTG
Long-precision real	Long-precision real	Long-precision real	DROTG
Short-precision complex	Short-precision real	(No value returned)	CROTG
Long-precision complex	Long-precision real	(No value returned)	ZROTG

Fortran	CALL SROTG | DROTG | CROTG | ZROTG (`a`, `b`, `c`, `s`)
C and C++	srotg | drotg | crotg | zrotg (`a`, `b`, `c`, `s`);
PL/I	CALL SROTG | DROTG | CROTG | ZROTG (`a`, `b`, `c`, `s`);

Fortran	CALL SROT | DROT | CROT | ZROT | CSROT | ZDROT (`n`, `x`, `incx`, `y`, `incy`, `c`, `s`)
C and C++	srot | drot | crot | zrot | csrot | zdrot (`n`, `x`, `incx`, `y`, `incy`, `c`, `s`);
PL/I	CALL SROT | DROT | CROT | ZROT | CSROT | ZDROT (`n`, `x`, `incx`, `y`, `incy`, `c`, `s`);

alpha	x	Subprogram
Short-precision real	Short-precision real	SSCAL
Long-precision real	Long-precision real	DSCAL
Short-precision complex	Short-precision complex	CSCAL
Long-precision complex	Long-precision complex	ZSCAL
Short-precision real	Short-precision complex	CSSCAL
Long-precision real	Long-precision complex	ZDSCAL

Fortran	CALL SSCAL | DSCAL | CSCAL | ZSCAL | CSSCAL | ZDSCAL (`n`, `alpha`, `x`, `incx`)
C and C++	sscal | dscal | cscal | zscal | csscal | zdscal (`n`, `alpha`, `x`, `incx`);
PL/I	CALL SSCAL | DSCAL | CSCAL | ZSCAL | CSSCAL | ZDSCAL (`n`, `alpha`, `x`, `incx`);

Fortran	CALL SSWAP | DSWAP | CSWAP | ZSWAP (`n`, `x`, `incx`, `y`, `incy`)
C and C++	sswap | dswap | cswap | zswap (`n`, `x`, `incx`, `y`, `incy`);
PL/I	CALL SSWAP | DSWAP | CSWAP | ZSWAP (`n`, `x`, `incx`, `y`, `incy`);

x, y, z	Subprogram
Short-precision real	SVEA
Long-precision real	DVEA
Short-precision complex	CVEA
Long-precision complex	ZVEA

Fortran	CALL SVEA | DVEA | CVEA | ZVEA (`n`, `x`, `incx`, `y`, `incy`, `z`, `incz`)
C and C++	svea | dvea | cvea | zvea (`n`, `x`, `incx`, `y`, `incy`, `z`, `incz`);
PL/I	CALL SVEA | DVEA | CVEA | ZVEA (`n`, `x`, `incx`, `y`, `incy`, `z`, `incz`);

Fortran	CALL SVES | DVES | CVES | ZVES (`n`, `x`, `incx`, `y`, `incy`, `z`, `incz`)
C and C++	sves | dves | cves | zves (`n`, `x`, `incx`, `y`, `incy`, `z`, `incz`);
PL/I	CALL SVES | DVES | CVES | ZVES (`n`, `x`, `incx`, `y`, `incy`, `z`, `incz`);

Fortran	CALL SVEM | DVEM | CVEM | ZVEM (`n`, `x`, `incx`, `y`, `incy`, `z`, `incz`)
C and C++	svem | dvem | cvem | zvem (`n`, `x`, `incx`, `y`, `incy`, `z`, `incz`);
PL/I	CALL SVEM | DVEM | CVEM | ZVEM (`n`, `x`, `incx`, `y`, `incy`, `z`, `incz`);

Fortran	CALL SYAX | DYAX | CYAX | ZYAX | CSYAX | ZDYAX (`n`, `alpha`, `x`, `incx`, `y`, `incy`)
C and C++	syax | dyax | cyax | zyax | csyax | zdyax (`n`, `alpha`, `x`, `incx`, `y`, `incy`);
PL/I	CALL SYAX | DYAX | CYAX | ZYAX | CSYAX | ZDYAX (`n`, `alpha`, `x`, `incx`, `y`, `incy`);

alpha, x, y, z	Subprogram
Short-precision real	SZAXPY
Long-precision real	DZAXPY
Short-precision complex	CZAXPY
Long-precision complex	ZZAXPY