XL Fortran for AIX 8.1

Language Reference

SCHEDULE

Purpose

The SCHEDULE directive allows you to specify the chunking method for parallelization. Work is assigned to threads differently depending on the scheduling type and chunk size used.

The SCHEDULE directive only takes effect if you specify the -qsmp compiler option.

Format



>>-SCHEDULE--(--sched_type--+------+--)------------------------><
                            '-,--n-'
 
 

n
n must be a positive specification expression. You must not specify n for the sched_type RUNTIME.

sched_type
is AFFINITY, DYNAMIC, GUIDED, RUNTIME, or STATIC.

Definitions:

number_of_iterations
is the number of iterations in the loop to be parallelized.

number_of_threads
is the number of threads used by the program.

AFFINITY
The iterations of a loop are initially divided into number_of_threads partitions, each containing

CEILING(number_of_iterations / number_of_threads)

iterations. Each partition is initially assigned to a thread, and is then further subdivided into chunks containing n iterations, if n has been specified. If n has not been specified, then each chunk consists of

CEILING(number_of_iterations_remaining_in_partition / 2)

loop iterations.

When a thread becomes free, it takes the next chunk from its initially assigned partition. If there are no more chunks in that partition, then the thread takes the next available chunk from a partition that is initially assigned to another thread.

Threads that are active will complete the work in a partition that is initially assigned to a sleeping thread.
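The AFFINITY partitioning and subdivision arithmetic can be sketched as a small simulation (illustrative Python, not part of XL Fortran; the function name and signature are ours):

```python
import math

def affinity_partitions(num_iterations, num_threads, n=None):
    """Simulate AFFINITY chunking: split the iterations into per-thread
    partitions, then subdivide each partition into chunks."""
    partition_size = math.ceil(num_iterations / num_threads)
    partitions = []
    start = 1
    while start <= num_iterations:
        end = min(start + partition_size - 1, num_iterations)
        remaining = end - start + 1
        chunks = []
        while remaining > 0:
            # a chunk holds n iterations if n was given, otherwise half
            # of the iterations remaining in the partition, rounded up
            size = n if n is not None else math.ceil(remaining / 2)
            size = min(size, remaining)
            chunks.append(size)
            remaining -= size
        partitions.append(chunks)
        start = end + 1
    return partitions
```

With 100 iterations and 4 threads, this reproduces the chunk sizes 13, 6, 3, 2, 1 in each partition shown in Example 2.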

DYNAMIC
If n has been specified, the iterations of a loop are divided into chunks containing n iterations each. If n has not been specified, then the default chunk size is 1 iteration.

Threads are assigned these chunks on a "first-come, first-do" basis. Chunks of the remaining work are assigned to available threads, until all work has been assigned.

If a thread is asleep, its assigned work will be taken over by an active thread once that thread becomes available.
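The DYNAMIC division into fixed-size chunks can be sketched as follows (illustrative Python, not part of XL Fortran; the function name is ours):

```python
def dynamic_chunks(num_iterations, n=1):
    """List the (first, last) iteration ranges of DYNAMIC chunks,
    each containing n iterations (the last may be smaller)."""
    chunks = []
    start = 1
    while start <= num_iterations:
        end = min(start + n - 1, num_iterations)
        chunks.append((start, end))
        start = end + 1
    return chunks
```

With 1000 iterations and a chunk size of 100, this yields the ten chunks of Example 3; at run time the chunks are then handed out first-come, first-do rather than in a fixed thread order.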

GUIDED
If you specify a value for n, the iterations of a loop are divided into chunks whose sizes decrease exponentially. n specifies the size of the smallest chunk, except possibly the last. If you do not specify a value for n, the default value is 1.

The size of the initial chunk is

CEILING(number_of_iterations / number_of_threads)

iterations. Subsequent chunks consist of

CEILING(number_of_iterations_remaining / number_of_threads)

iterations. As each thread finishes a chunk, it dynamically obtains the next available chunk.

You can use guided scheduling in a situation in which multiple threads in a team might arrive at a DO work-sharing construct at varying times, and each iteration requires roughly the same amount of work. For example, if you have a DO loop preceded by one or more work-sharing SECTIONS or DO constructs with NOWAIT clauses, you can guarantee that no thread waits at the barrier longer than it takes another thread to execute its final iteration, or final k iterations if a chunk size of k is specified. The GUIDED schedule requires the fewest synchronizations of all the scheduling methods.

An n expression is evaluated outside of the context of the DO construct. Any function reference in the n expression must not have side effects.

The value of the n parameter on the SCHEDULE clause must be the same for all of the threads in the team.
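The GUIDED chunk-size sequence can be sketched as a short simulation (illustrative Python, not part of XL Fortran; the function name is ours):

```python
import math

def guided_chunk_sizes(num_iterations, num_threads, n=1):
    """Compute the sequence of chunk sizes produced by GUIDED scheduling."""
    sizes = []
    remaining = num_iterations
    while remaining > 0:
        # each chunk is the remaining work divided by the thread count,
        # rounded up, but never smaller than n ...
        size = max(math.ceil(remaining / num_threads), n)
        # ... except possibly the final chunk, which takes what is left
        size = min(size, remaining)
        sizes.append(size)
        remaining -= size
    return sizes
```

With 1000 iterations and 4 threads, this reproduces the chunk sizes 250, 188, 141, ... shown in Example 1.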

RUNTIME
The scheduling type is determined at run time.

At run time, the scheduling type can be specified using the environment variable XLSMPOPTS. If no scheduling type is specified using that variable, then the default scheduling type used is STATIC.

STATIC
If n has been specified, the iterations of a loop are divided into chunks that contain n iterations. Each thread is assigned chunks in a "round robin" fashion. This is known as block cyclic scheduling. If the value of n is 1, then the scheduling type is specifically referred to as cyclic scheduling.

If n has not been specified, the chunks will contain

CEILING(number_of_iterations / number_of_threads)

iterations. Each thread is assigned one of these chunks. This is known as block scheduling.

If a thread is asleep and it has been assigned work, it will be awakened so that it may complete its work.

STATIC is the default scheduling type if the user has not specified any scheduling type at compile-time or run time.
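The STATIC round-robin assignment of chunks to threads can be sketched as follows (illustrative Python, not part of XL Fortran; the function name is ours):

```python
import math

def static_assignment(num_iterations, num_threads, n=None):
    """Assign STATIC chunks of iterations to threads round robin.
    Returns one list of (first, last) iteration ranges per thread."""
    # with no n, each thread gets one block of the iterations
    chunk_size = n if n is not None else math.ceil(num_iterations / num_threads)
    threads = [[] for _ in range(num_threads)]
    start, index = 1, 0
    while start <= num_iterations:
        end = min(start + chunk_size - 1, num_iterations)
        threads[index % num_threads].append((start, end))
        index += 1
        start = end + 1
    return threads
```

With 100 iterations and 4 threads and no n, each thread receives one block of 25 iterations, as in Example 4; with n = 10, the ten chunks are dealt out cyclically, so thread 1 gets iterations 1-10, 41-50, and 81-90.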

Rules

The SCHEDULE directive must appear in the specification part of a scoping unit.

Only one SCHEDULE directive may appear in the specification part of a scoping unit.

The SCHEDULE directive applies to one of the following:

Any dummy arguments appearing or referenced in the specification expression for the chunk size n must also appear in the SUBROUTINE or FUNCTION statement and in all ENTRY statements appearing in the given subprogram.

If the specified chunk size n is greater than the number of iterations, the loop will not be parallelized and will execute on a single thread.

If you specify more than one method of determining the chunking algorithm, the compiler follows this order of precedence:

  1. SCHEDULE clause of the PARALLEL DO directive
  2. SCHEDULE directive
  3. schedule suboption of the -qsmp compiler option. See "-qsmp Option" in the User's Guide.
  4. XLSMPOPTS run-time option. See "XLSMPOPTS" in the User's Guide.
  5. run-time default (that is, STATIC)

Examples

Example 1. Given the following information:

number of iterations = 1000
number of threads = 4
 

and using the GUIDED scheduling type, the chunk sizes would be as follows:

250 188 141 106 79 59 45 33 25 19 14 11 8 6 4 3 3 2 1 1 1 1
 

The iterations would then be divided into the following chunks:

chunk  1 = iterations    1 to  250
chunk  2 = iterations  251 to  438
chunk  3 = iterations  439 to  579
chunk  4 = iterations  580 to  685
chunk  5 = iterations  686 to  764
chunk  6 = iterations  765 to  823
chunk  7 = iterations  824 to  868
chunk  8 = iterations  869 to  901
chunk  9 = iterations  902 to  926
chunk 10 = iterations  927 to  945
chunk 11 = iterations  946 to  959
chunk 12 = iterations  960 to  970
chunk 13 = iterations  971 to  978
chunk 14 = iterations  979 to  984
chunk 15 = iterations  985 to  988
chunk 16 = iterations  989 to  991
chunk 17 = iterations  992 to  994
chunk 18 = iterations  995 to  996
chunk 19 = iterations  997 to  997
chunk 20 = iterations  998 to  998
chunk 21 = iterations  999 to  999
chunk 22 = iterations 1000 to 1000
 

A possible scenario for the division of work could be:

thread 1 executes chunks 1 5 10 13 18 20
thread 2 executes chunks 2 7  9 14 16 22
thread 3 executes chunks 3 6 12 15 19
thread 4 executes chunks 4 8 11 17 21
 

Example 2. Given the following information:

number of iterations = 100
number of threads = 4
 

and using the AFFINITY scheduling type, the iterations would be divided into the following partitions:

partition 1 = iterations  1 to  25
partition 2 = iterations 26 to  50
partition 3 = iterations 51 to  75
partition 4 = iterations 76 to 100
 

The partitions would be divided into the following chunks:

chunk 1a = iterations   1 to  13
chunk 1b = iterations  14 to  19
chunk 1c = iterations  20 to  22
chunk 1d = iterations  23 to  24
chunk 1e = iterations  25 to  25
 
chunk 2a = iterations  26 to  38
chunk 2b = iterations  39 to  44
chunk 2c = iterations  45 to  47
chunk 2d = iterations  48 to  49
chunk 2e = iterations  50 to  50
 
chunk 3a = iterations  51 to  63
chunk 3b = iterations  64 to  69
chunk 3c = iterations  70 to  72
chunk 3d = iterations  73 to  74
chunk 3e = iterations  75 to  75
 
chunk 4a = iterations  76 to  88
chunk 4b = iterations  89 to  94
chunk 4c = iterations  95 to  97
chunk 4d = iterations  98 to  99
chunk 4e = iterations 100 to 100
 

A possible scenario for the division of work could be:

thread 1 executes chunks 1a 1b 1c 1d 1e 4d
thread 2 executes chunks 2a 2b 2c 2d
thread 3 executes chunks 3a 3b 3c 3d 3e 2e
thread 4 executes chunks 4a 4b 4c 4e
 

Note that in this scenario, thread 1 finished executing all the chunks in its partition and then grabbed an available chunk from the partition of thread 4. Similarly, thread 3 finished executing all the chunks in its partition and then grabbed an available chunk from the partition of thread 2.

Example 3. Given the following information:

number of iterations = 1000
number of threads = 4
 

and using the DYNAMIC scheduling type and chunk size of 100, the chunk sizes would be as follows:

100 100 100 100 100 100 100 100 100 100
 

The iterations would be divided into the following chunks:

chunk  1 = iterations   1 to  100
chunk  2 = iterations 101 to  200
chunk  3 = iterations 201 to  300
chunk  4 = iterations 301 to  400
chunk  5 = iterations 401 to  500
chunk  6 = iterations 501 to  600
chunk  7 = iterations 601 to  700
chunk  8 = iterations 701 to  800
chunk  9 = iterations 801 to  900
chunk 10 = iterations 901 to 1000
 

A possible scenario for the division of work could be:

thread 1 executes chunks 1  5  9
thread 2 executes chunks 2  8
thread 3 executes chunks 3  6  10
thread 4 executes chunks 4  7
 

Example 4. Given the following information:

number of iterations = 100
number of threads = 4
 

and using the STATIC scheduling type, the iterations would be divided into the following chunks:

chunk 1 = iterations  1 to  25
chunk 2 = iterations 26 to  50
chunk 3 = iterations 51 to  75
chunk 4 = iterations 76 to 100
 

A possible scenario for the division of work could be:

thread 1 executes chunks 1
thread 2 executes chunks 2
thread 3 executes chunks 3
thread 4 executes chunks 4
 
