XL Fortran for AIX 8.1

User's Guide


Optimizing Loops and Array Language

The -qhot option performs the following transformations to improve the performance of loops, array language, and memory management:

The -qhot option requires an optimization level of at least -O2. The -C option inhibits it.

If you have SMP hardware, you can enable automatic parallelization of loops by specifying the -qsmp option. This optimization applies to explicitly coded DO loops as well as to DO loops that the compiler generates for array language (WHERE, FORALL, array assignment, and so on). The compiler can parallelize only loops that are independent, that is, loops in which each iteration can be computed independently of any other iteration. One case in which the compiler does not automatically parallelize a loop is when the loop contains I/O, because doing so could lead to unexpected results. In this case, you can use the PARALLEL DO directive to advise the compiler that such a loop can be safely parallelized. However, the type of I/O must be one of the following:

For more details, refer to the description of the PARALLEL DO directive in the XL Fortran for AIX Language Reference.
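The distinction between independent and dependent iterations can be illustrated with a minimal sketch. The program and variable names below are hypothetical and are not taken from the manual; the sketch shows only the property that the compiler looks for, not how -qsmp transforms the code, and it does not use the PARALLEL DO directive.

```fortran
program independent_loops
  implicit none
  integer, parameter :: n = 8
  real :: a(n), b(n), c(n)
  integer :: i

  b = (/ (real(i), i = 1, n) /)

  ! Independent: iteration i writes a(i) and reads only b(i), so the
  ! iterations could run in any order, or at the same time.
  do i = 1, n
    a(i) = 2.0 * b(i)
  end do

  ! Not independent: iteration i reads c(i-1), which iteration i-1 wrote.
  ! Running these iterations out of order would change the result.
  c(1) = b(1)
  do i = 2, n
    c(i) = c(i-1) + b(i)
  end do

  print *, 'a(n) =', a(n)   ! value is 16.0
  print *, 'c(n) =', c(n)   ! value is 36.0 (running sum of 1..8)
end program independent_loops
```

The first loop is a candidate for automatic parallelization; the second carries a dependence from one iteration to the next and would have to be restructured before it could be parallelized.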

You can use the -qhot and -qsmp options on:

Unrolling Loops

Loop unrolling involves expanding the loop body to do the work of two, three, or more iterations, and reducing the iteration count proportionately. Benefits of loop unrolling for programs compiled for the POWER, POWER2, and PowerPC architectures include the following:

Loop unrolling also increases the code size of the new loop body, which can increase register pressure and possibly cause register spilling. For this reason, unrolling sometimes does not improve performance.
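As an illustration of the transformation itself, the following sketch unrolls a simple loop by a factor of two by hand. The names and the unroll factor are assumptions chosen for the example; the compiler's own unrolling may pick a different factor and a different way of handling leftover iterations.

```fortran
program unroll_by_two
  implicit none
  integer, parameter :: n = 9          ! odd on purpose, to exercise the remainder
  real :: x(n), y1(n), y2(n)
  integer :: i, m

  x  = (/ (real(i), i = 1, n) /)
  y1 = 0.0
  y2 = 0.0

  ! Original loop: one element per iteration.
  do i = 1, n
    y1(i) = 3.0 * x(i) + 1.0
  end do

  ! Unrolled by two: each trip does the work of two iterations,
  ! so the main loop runs about half as many times.
  m = n - mod(n, 2)                    ! largest even count not exceeding n
  do i = 1, m, 2
    y2(i)   = 3.0 * x(i)   + 1.0
    y2(i+1) = 3.0 * x(i+1) + 1.0
  end do

  ! Remainder loop: handles the leftover iteration when n is odd.
  do i = m + 1, n
    y2(i) = 3.0 * x(i) + 1.0
  end do

  if (all(y1 == y2)) then
    print *, 'unrolled loop matches original'
  else
    print *, 'MISMATCH between original and unrolled loop'
  end if
end program unroll_by_two
```

The unrolled body holds twice as many live values per trip, which is the source of the extra register demand described above.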

Related Information:
See -qunroll Option.

Efficiency of Different Array Forms

In general, operations on arrays with constant or adjustable bounds, assumed-size arrays, and pointee arrays require less processing than those on automatic, assumed-shape, or deferred-shape arrays and are thus likely to be faster.
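For reference, the array forms named above look like this in source code. This is a sketch with hypothetical module, procedure, and variable names; pointee arrays (an extension) are left out, and the sketch shows only the declarations, not a measurement of their relative cost.

```fortran
module array_forms
  implicit none
contains

  ! Forms that generally require less processing.
  subroutine cheaper_forms(n, fixed, adj, asz)
    integer, intent(in) :: n
    real, intent(inout) :: fixed(100)  ! constant bounds
    real, intent(inout) :: adj(n)      ! adjustable: bounds come from a dummy argument
    real, intent(inout) :: asz(*)      ! assumed-size
    fixed(1) = 1.0
    adj(1)   = 2.0
    asz(1)   = 3.0
  end subroutine cheaper_forms

  ! Forms that generally require more processing.
  subroutine costlier_forms(n, shp)
    integer, intent(in) :: n
    real, intent(inout) :: shp(:)      ! assumed-shape: bounds passed in a descriptor
    real :: work(n)                    ! automatic: sized and created on entry
    real, allocatable :: dyn(:)        ! deferred-shape: sized at ALLOCATE time
    allocate(dyn(n))
    work = 1.0
    dyn  = 2.0
    shp(1:n) = work + dyn
    deallocate(dyn)
  end subroutine costlier_forms

end module array_forms

program array_forms_demo
  use array_forms
  implicit none
  real :: a(100), b(100), c(100), d(100)
  a = 0.0
  b = 0.0
  c = 0.0
  d = 0.0
  call cheaper_forms(100, a, b, c)
  call costlier_forms(100, d)
  print *, a(1), b(1), c(1), d(1)      ! values are 1.0, 2.0, 3.0, 3.0
end program array_forms_demo
```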

Reducing Use of Temporary Arrays

If your program uses array language but never performs array assignments in which the array on the left-hand side of the assignment overlaps the array on the right-hand side, specifying the option -qalias=noaryovrlp can improve performance by reducing the use of temporary array objects.

The -qhot option also eliminates many temporary arrays.
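The following sketch contrasts an array assignment without overlap against one with overlap. The names are hypothetical. Because the second assignment does overlap, a program containing it would not meet the condition for -qalias=noaryovrlp described above.

```fortran
program overlap_demo
  implicit none
  integer, parameter :: n = 6
  integer :: a(n), b(n), i

  a = (/ (i, i = 1, n) /)
  b = 0

  ! No overlap: the left and right sides are different arrays, so the
  ! elements can be copied directly without a temporary.
  b(2:n) = a(1:n-1)

  ! Overlap: a appears on both sides with shifted sections. The language
  ! requires the right-hand side to be evaluated as if completely before
  ! any element is stored, so the compiler may need a temporary copy
  ! (or an equivalent reordering) to get the correct result.
  a(2:n) = a(1:n-1)

  print *, a   ! values are 1 1 2 3 4 5
  print *, b   ! values are 0 1 2 3 4 5
end program overlap_demo
```

A naive element-by-element forward copy of the second assignment would instead fill a with 1s, which is why overlap forces the more conservative code.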

Cost Model for Loop Transformations

The loop transformations performed by the -qhot option are controlled by a set of assumptions about the characteristics of typical loops and the costs (in terms of registers used and potential delays introduced) of performing particular transformations.

The cost model takes into consideration:

When the compiler can determine information precisely, such as the number of iterations of a loop, it uses this information to improve the accuracy of the cost model at that location in the program. If it cannot determine the information, the compiler relies on the default assumptions of the cost model. You can change these default assumptions, and thus influence how the compiler optimizes loops, by specifying compiler options:

A program might contain a variety of loops, some of which are sped up by these options while others are unaffected or even slowed down. Therefore, you might want to determine which loops benefit most from which options, split the loops into different files, and compile each file with the set of options that suits it best.

Describing the Hardware Configuration

The -qtune setting determines the default assumptions about the number of registers and functional units in the processor. For example, when tuning loops, -qtune=pwr2 may cause the compiler to unroll most of the inner loops to a depth of two to take advantage of the extra arithmetic units.

The -qcache setting determines the blocking factor that the compiler uses when it blocks loops. The more cache memory that is available, the larger the blocking factor.
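To show what blocking means, the following sketch blocks a matrix transpose by hand. The blocking factor bs is a hypothetical value chosen for the example; the factor that the compiler derives from the -qcache setting is not specified here, and the compiler's generated code may differ in structure.

```fortran
program blocked_transpose
  implicit none
  integer, parameter :: n  = 10        ! deliberately not a multiple of bs
  integer, parameter :: bs = 4         ! hypothetical blocking factor
  real :: a(n, n), b(n, n), c(n, n)
  integer :: i, j, ii, jj

  do j = 1, n
    do i = 1, n
      a(i, j) = real(i + (j - 1) * n)
    end do
  end do

  ! Straightforward loop nest: one of the two arrays is always
  ! accessed with a large stride.
  do j = 1, n
    do i = 1, n
      b(j, i) = a(i, j)
    end do
  end do

  ! Blocked loop nest: works on bs-by-bs tiles so that the data touched
  ! by the inner loops is more likely to stay in cache.
  do jj = 1, n, bs
    do ii = 1, n, bs
      do j = jj, min(jj + bs - 1, n)
        do i = ii, min(ii + bs - 1, n)
          c(j, i) = a(i, j)
        end do
      end do
    end do
  end do

  if (all(b == c)) then
    print *, 'blocked transpose matches straightforward transpose'
  else
    print *, 'MISMATCH between blocked and straightforward transpose'
  end if
end program blocked_transpose
```

A larger cache can hold larger tiles, which is why more available cache memory leads to a larger blocking factor.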

Array Padding

Because of the implementation of the POWER, POWER2, POWER3, POWER4, and PowerPC cache architectures, array dimensions that are powers of 2 can lead to decreased cache utilization.

The optional arraypad suboption of the -qhot option permits the compiler to increase the dimensions of arrays where doing so might improve the efficiency of array-processing loops. If you have large arrays with some dimensions (particularly the first one) that are powers of 2 or if you find that your array-processing programs are slowed down by cache misses or page faults, consider specifying -qhot=arraypad or -qhot=arraypad=n rather than just -qhot.

The padding that -qhot=arraypad performs is conservative. It also assumes that there are no cases in the source code (such as those created by an EQUIVALENCE statement) where storage elements have a relationship that is broken by padding. You can also manually pad array dimensions if you determine that doing so does not affect the program's results.

The additional storage taken up by the padding, especially for arrays with many dimensions, might increase the storage overhead of the program to the point where it slows down again or even runs out of storage. For more information, see -qhot Option.
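Manual padding can be sketched as follows. The pad amount of one element and all names are assumptions made for the example; the amount that -qhot=arraypad chooses is not specified here. The program uses only the logical n-by-n part of the array, so the padding does not change its results.

```fortran
program manual_padding
  implicit none
  integer, parameter :: n   = 1024     ! logical size: a power of 2
  integer, parameter :: pad = 1        ! hypothetical pad amount
  real, save :: a(n + pad, n)          ! first dimension padded to n + pad
  integer :: i, j
  real :: total

  a = 0.0

  ! Only the logical n-by-n part is used; the padding rows stay unused.
  do j = 1, n
    do i = 1, n
      a(i, j) = 1.0
    end do
  end do

  ! Walking along a row: successive elements are now n + pad elements
  ! apart in storage instead of n, so the stride is no longer a power of 2.
  total = 0.0
  do j = 1, n
    total = total + a(1, j)
  end do

  print *, 'row sum =', total          ! value is 1024.0
end program manual_padding
```

The cost is n extra elements here; with more dimensions or larger pad amounts the extra storage grows accordingly, as noted above.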

