Guide and Reference
This section describes how you can achieve the best possible performance
from the ESSL subroutines.
There are many ways in which you can improve the performance of your
program. Here are some of them:
- Use the basic linear algebra subprograms and matrix operations in the
order of optimum performance: matrix-matrix computations first, then
matrix-vector computations, then vector-scalar computations. When you present
data as matrices or vectors, rather than as vectors or scalars, a single ESSL
subroutine can perform multiple operations.
- Where possible, use subroutines that do multiple computations, such as
SNDOT and SNAXPY, rather than individual computations, such as SDOT and SAXPY.
- Use a stride of 1 for the data in your computations. Accessing vector
elements that are not consecutive in storage can degrade your performance. The
closer the vector elements are to each other in storage, the better your
performance. For an explanation of stride, see "How Stride Is Used for Vectors".
- Do not specify the size of the leading dimension of an array
(lda) or the stride of a vector (inc) equal to or near a
multiple of:
  - 128 for a long-precision array
  - 256 for a short-precision array
- Do not specify the individual sizes of your one-dimensional
arrays as multiples of 128. This is especially important when you are passing
several one-dimensional arrays to an ESSL subroutine. (The multiplicity can
cause a performance problem that otherwise might not occur.)
- For small problems, avoid using a large leading dimension (lda)
for your matrix.
- In general, align your arrays on doubleword boundaries, regardless of the
type of data; however, when running on a POWER2 processor, it is best to align
your long-precision arrays on a quadword boundary. For information on how your
programming language aligns data, see your programming language manuals.
- One subroutine may do scaling while another does not. If scaling is not
necessary for your data, you get better performance by using the subroutine
without scaling. For example, SNORM2 and DNORM2 do not do scaling, whereas
SNRM2 and DNRM2 do.
- Use the STRIDE subroutine to calculate the optimal stride values for your
input or output data when using any of the Fourier transform subroutines,
except _RCFT and _CRFT. Using these stride values for your data allows the
Fourier transform subroutines to achieve maximum performance. You first obtain
the optimal stride values from STRIDE, calling it once for each stride value
desired. You then arrange your data using these stride values. After the data
is set up, you call the Fourier transform subroutine. For details on the
STRIDE subroutine and how to use it for each Fourier transform subroutine, see
"STRIDE--Determine the Stride Value for Optimal Performance in Specified
Fourier Transform Subroutines". For additional information, see "Setting Up Your Data".
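To make the stride recommendation concrete, here is a minimal sketch in plain C. It does not call ESSL, and the function names sum_strided, sum_column, and sum_row are hypothetical. It assumes a matrix stored in column-major (Fortran-style) order, so that element (i, j) is found at a[i + j*lda]. Under that assumption a column is traversed with a stride of 1, while a row is traversed with a stride of lda:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an ESSL routine: sum n elements of x,
   stepping inc elements between consecutive accesses. */
double sum_strided(const double *x, size_t n, size_t inc)
{
    assert(x != NULL && inc > 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i * inc];
    return s;
}

/* Column j of an m-row column-major matrix: elements are adjacent
   in storage, so the stride is 1. */
double sum_column(const double *a, size_t lda, size_t m, size_t j)
{
    return sum_strided(a + j * lda, m, 1);
}

/* Row i of an n-column column-major matrix: consecutive elements are
   lda apart in storage, so the stride is lda. */
double sum_row(const double *a, size_t lda, size_t n, size_t i)
{
    return sum_strided(a + i, n, lda);
}
```

Both traversals produce correct results; the difference is only in how far apart successive elements lie in storage, which is the property the stride guideline asks you to minimize.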
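The guideline on leading dimensions can be sketched as a small helper that pads a candidate lda until it is no longer equal to or near a multiple of 128 (long precision) or 256 (short precision). The name pick_lda and the margin parameter are assumptions made for illustration: this section does not say how close counts as "near", so the margin is left for the caller to choose.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of ESSL. Returns the smallest leading
   dimension >= rows that is at least `margin` elements away from every
   multiple of `bad` (128 for long precision, 256 for short precision).
   The margin value is an assumption; the guide does not quantify "near". */
size_t pick_lda(size_t rows, size_t bad, size_t margin)
{
    assert(rows > 0 && bad > 0 && 2 * margin < bad);
    size_t lda = rows;
    for (;;) {
        size_t r = lda % bad;
        /* distance from lda to the nearest multiple of bad */
        size_t dist = (r < bad - r) ? r : bad - r;
        if (dist >= margin)
            return lda;
        lda++;  /* pad by one element and try again */
    }
}
```

For example, with a margin of 4, a matrix with 128 rows of long-precision data would be declared with a leading dimension of 132, while a matrix with 100 rows would keep a leading dimension of 100. Keeping the padding small is consistent with the advice to avoid a large leading dimension for small problems.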
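As an illustration of the alignment guideline, the following sketch uses the C11 library function aligned_alloc to place an array on a quadword (16-byte) boundary, which also satisfies doubleword (8-byte) alignment. The helper names alloc_quadword_aligned and is_aligned are hypothetical, and a C11 (or later) compiler and library are assumed; how other languages align data is described in their own manuals.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not part of ESSL: allocate room for n doubles on a
   16-byte (quadword) boundary. Release the result with free(). */
double *alloc_quadword_aligned(size_t n)
{
    assert(n > 0);
    size_t bytes = n * sizeof(double);
    /* C11 aligned_alloc expects the size to be a multiple of the alignment,
       so round the byte count up to the next multiple of 16. */
    bytes = (bytes + 15u) / 16u * 16u;
    return aligned_alloc(16, bytes);
}

/* Returns 1 if p lies on a `boundary`-byte boundary, 0 otherwise. */
int is_aligned(const void *p, size_t boundary)
{
    assert(boundary > 0);
    return ((uintptr_t)p % boundary) == 0;
}
```

A call such as alloc_quadword_aligned(1000) returns a pointer that can be passed wherever a long-precision array is expected; is_aligned(p, 16) can be used to check arrays obtained by other means.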
Information about performance can be found in the following places:
- Many of the techniques ESSL uses to achieve the best possible performance
are described in "High Performance of ESSL".
- Migration considerations concerning performance are described in "Migrating ESSL Version 2 Programs to Version 3".
- Specific information on performance for each area of ESSL is given in
"Performance and Accuracy Considerations" in each chapter introduction
in Part 2.
- Detailed performance information for selected subroutines can be found in
references [30], [41], and [42], and on the
IBM RS/6000 web site at
http://www.rs6000.ibm.com/software/Apps/essl.html.