XL Fortran for AIX 8.1

Language Reference

PREFETCH

Purpose

You can use prefetching to instruct the compiler to load specific data from main memory into the cache before the data is referenced. POWER3 and later hardware may perform some prefetching automatically, but because compiler-assisted software prefetching uses information found in the source code, using the directives can significantly reduce the number of cache misses.

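XL Fortran provides five directives for compiler-assisted software prefetching: PREFETCH_BY_LOAD, PREFETCH_FOR_LOAD, PREFETCH_FOR_STORE, PREFETCH_BY_STREAM_BACKWARD, and PREFETCH_BY_STREAM_FORWARD.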

Format

The PREFETCH directive can take the following forms:



>>-PREFETCH_BY_LOAD--(--prefetch_variable_list--)--------------><

>>-PREFETCH_FOR_LOAD--(--prefetch_variable_list--)-------------><

>>-PREFETCH_FOR_STORE--(--prefetch_variable_list--)------------><

>>-PREFETCH_BY_STREAM_BACKWARD--(--prefetch_variable--)--------><

>>-PREFETCH_BY_STREAM_FORWARD--(--prefetch_variable--)---------><
 
 

prefetch_variable_list
is a comma-separated list of one or more prefetch_variable items.

prefetch_variable
is a variable to be prefetched. The variable can be of any data type, including intrinsic and derived data types. It cannot be a procedure name, subroutine name, module name, function name, constant, label, zero-sized string, or an array with a vector subscript.
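
The vector-subscript restriction can be illustrated with a small sketch. The function name, argument names, and array sizes below are hypothetical and chosen only for illustration. Compilers other than XL Fortran treat the !IBM* line as a comment.

```fortran
      REAL FUNCTION GATHER_FIRST(ARR, IDX)
      IMPLICIT NONE
      REAL, INTENT(IN) :: ARR(100)
      INTEGER, INTENT(IN) :: IDX(4)

! Allowed: a single array element, even with a computed subscript.
!IBM* PREFETCH_BY_LOAD (ARR(IDX(1)))

! Not allowed: ARR(IDX) is an array with a vector subscript, so it
! cannot be named in a prefetch directive.

      GATHER_FIRST = ARR(IDX(1))
      END FUNCTION GATHER_FIRST
```

Here ARR(IDX(1)) names one element, so it qualifies as a prefetch_variable, whereas ARR(IDX) would select several elements through the integer array IDX.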

Rules

To use the PREFETCH_BY_STREAM_BACKWARD, PREFETCH_BY_STREAM_FORWARD, PREFETCH_FOR_LOAD and PREFETCH_FOR_STORE directives, you must compile for PowerPC hardware.

When you prefetch a variable, the memory block that includes the variable address is loaded into the cache. A memory block is equal to the size of a cache line. Because the variable can lie anywhere within its memory block, a single prefetch may not bring in all the elements of an array.
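
As a rough illustration of this cache-line arithmetic, the following sketch computes how many cache lines an array spans and which element begins each line, so that one prefetch per line can cover the whole array. It assumes that the array starts on a cache-line boundary, which is not guaranteed. The module and function names are hypothetical and are not part of XL Fortran.

```fortran
      MODULE CACHE_LINE_MATH
      IMPLICIT NONE
      CONTAINS

! Number of cache lines spanned by N elements of ELEM_BYTES bytes,
! assuming the first element sits on a line boundary (an assumption).
      INTEGER FUNCTION LINES_SPANNED(N, ELEM_BYTES, LINE_BYTES)
      INTEGER, INTENT(IN) :: N, ELEM_BYTES, LINE_BYTES
      LINES_SPANNED = (N*ELEM_BYTES + LINE_BYTES - 1) / LINE_BYTES
      END FUNCTION LINES_SPANNED

! Index of the first element in cache line number LINE (1-based),
! under the same alignment assumption.
      INTEGER FUNCTION FIRST_ELEMENT(LINE, ELEM_BYTES, LINE_BYTES)
      INTEGER, INTENT(IN) :: LINE, ELEM_BYTES, LINE_BYTES
      FIRST_ELEMENT = (LINE - 1)*(LINE_BYTES/ELEM_BYTES) + 1
      END FUNCTION FIRST_ELEMENT

      END MODULE CACHE_LINE_MATH
```

For a 32-element REAL*4 array and 64-byte cache lines, LINES_SPANNED(32, 4, 64) gives 2 and FIRST_ELEMENT(2, 4, 64) gives 17. Prefetching elements 1 and 17, as Example 1 does with ARRA(1) and ARRA(2**4+1), therefore touches both lines when the array is line-aligned.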

These directives may appear anywhere in your source code where executable constructs may appear.

These directives can add run-time overhead to your program. Therefore, use them only where necessary.

To maximize the effectiveness of the prefetch directives, specify the LIGHT_SYNC directive after a single prefetch or at the end of a series of prefetches.
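
A minimal sketch of this recommendation: a short series of prefetches followed by a single LIGHT_SYNC. The function name and the array size are hypothetical, and the choice of elements 1 and 17 assumes 64-byte cache lines. Compilers other than XL Fortran treat the !IBM* lines as comments, so the function returns the same result either way.

```fortran
      REAL FUNCTION SUM_AFTER_PREFETCH(X)
      IMPLICIT NONE
      REAL, INTENT(IN) :: X(32)
      INTEGER I

! Issue the series of prefetches first (elements 1 and 17 are 64
! bytes apart), then one LIGHT_SYNC at the end of the series.
!IBM* PREFETCH_FOR_LOAD (X(1))
!IBM* PREFETCH_FOR_LOAD (X(17))
!IBM* LIGHT_SYNC

      SUM_AFTER_PREFETCH = 0.0
      DO I = 1, 32
        SUM_AFTER_PREFETCH = SUM_AFTER_PREFETCH + X(I)
      END DO
      END FUNCTION SUM_AFTER_PREFETCH
```

The single LIGHT_SYNC follows the last prefetch in the series; it is not repeated after each one.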

Examples

Example 1: This example shows valid uses of the PREFETCH_BY_LOAD, PREFETCH_FOR_LOAD, and PREFETCH_FOR_STORE directives.

For this example, assume that the size of the cache line is 64 bytes and that none of the declared data items exist in the cache at the beginning of the program. The comments in the code give the rationale for each directive.

      PROGRAM GOODPREFETCH
 
      REAL*4 A, B, C, TEMP
      REAL*4 ARRA(2**5), ARRB(2**10), ARRC(2**5)
      INTEGER(4) I, K
 
! Bring ARRA into cache for writing.
!IBM* PREFETCH_FOR_STORE (ARRA(1), ARRA(2**4+1))
 
! Bring ARRC into cache for reading.
!IBM* PREFETCH_FOR_LOAD (ARRC(1), ARRC(2**4+1))
 
! Bring all scalar variables into the cache.
!IBM* PREFETCH_BY_LOAD (A, B, C, TEMP, I, K)
 
! A subroutine is called to allow clock cycles to pass so that the
! data is loaded into the cache before the data is referenced.
      CALL FOO()
      K = 32
! I runs to 2**5 so that ARRA(I), ARRC(I), and ARRB(I*K) stay
! within their declared bounds.
      DO I = 1, 2**5
 
! Bring ARRB(I*K) into the cache
!IBM* PREFETCH_BY_LOAD (ARRB(I*K))
        A = -I
        B = I + 1
        C = I + 2
        TEMP = SQRT(B*B - 4*A*C)
        ARRA(I) = ARRC(I) + (-B + TEMP) / (2*A)
        ARRB(I*K) = (-B - TEMP) / (2*A)
      END DO
 
      CONTAINS
 
! FOO stands in for unrelated work; it is left empty here.
      SUBROUTINE FOO()
      END SUBROUTINE FOO
 
      END PROGRAM GOODPREFETCH

Example 2: In this example, assume that the cache line size is 256 bytes and that none of the declared data items are initially in the cache or in registers. The directive then reads all elements of arrays ARRA and ARRC into the cache.

      PROGRAM PREFETCH_STREAM
 
      REAL*4 A, B, C, TEMP
      REAL*4 ARRA(2**5), ARRC(2**5), ARRB(2**10)
      INTEGER*4 I, K
 
! All elements of ARRA and ARRC are read into the cache.
!IBM* PREFETCH_BY_STREAM_FORWARD (ARRA(1))
! You can substitute PREFETCH_BY_STREAM_BACKWARD (ARRC(2**5)) to read all
! elements of ARRA and ARRC into the cache.
      K = 32
      DO I = 1, 2**5
        A = -I
        B = I + 1
        C = I + 2
        TEMP = SQRT(B*B - 4*A*C)
        ARRA(I) = ARRC(I) + (-B + TEMP) / (2*A)
        ARRB(I*K) = (-B - TEMP) / (2*A)
      END DO
      END PROGRAM PREFETCH_STREAM
 

Related Information

See LIGHT_SYNC for details on the LIGHT_SYNC directive.

