This chapter tells you how to submit and manage parallel jobs. For information on setting up and planning for parallel jobs, see Chapter 6, "Administration Tasks for Parallel Jobs".
LoadLeveler allows you to schedule parallel batch jobs that have been written using the following:

- The IBM Parallel Environment library (invoked through POE)
- PVM 3.3 (RS6K architecture)
- PVM 3.3.11+ (SP2MPI architecture)
Note that for parallel batch jobs, LoadLeveler no longer interacts with the PSSP Resource Manager, since all Resource Manager function has been incorporated into LoadLeveler. For more information, see "Resource Manager Functions Now in LoadLeveler".
Several LoadLeveler job command language keywords are associated with parallel jobs. Whether a keyword applies depends on the type of job and the type of LoadLeveler scheduler you are running.
Table 4 shows the parallel keywords supported by the LoadLeveler Backfill scheduler, based on the type of job you are running.

Table 4. Parallel Keywords Supported by the Backfill Scheduler (keywords listed by job type: job_type=parallel, job_type=pvm3)
Table 5 shows the parallel keywords supported by the default LoadLeveler scheduler, based on the type of job you are running.

Table 5. Parallel Keywords Supported by the Default Scheduler (keywords listed by job type: job_type=parallel, job_type=pvm3)
These keywords are used in the examples in this chapter, and are described in more detail in "Job Command File Keywords".
If you disable the default LoadLeveler scheduler to run an external scheduler, see "Usage Notes" for an explanation of which keywords are supported.
This section contains sample job command files for the following parallel environments:

- POE 2.4.0
- PVM 3.3 (RS6K architecture)
- PVM 3.3.11+ (SP2MPI architecture)
Figure 17 is a sample job command file for POE 2.4.0.
Figure 17. POE 2.4.0 Job Command File - Multiple Tasks Per Node
#
# @ job_type = parallel
# @ environment = COPY_ALL
# @ output = poe.out
# @ error = poe.error
# @ node = 8,10
# @ tasks_per_node = 2
# @ network.LAPI = switch,shared,US
# @ network.MPI = switch,shared,US
# @ wall_clock_limit = 60
# @ executable = /usr/bin/poe
# @ arguments = /u/richc/My_POE_program -euilib "us"
# @ class = POE
# @ queue
Figure 17 shows the following:

- The node keyword, together with the tasks_per_node keyword, requests a minimum of 8 and a maximum of 10 nodes, with two tasks running on each node (between 16 and 20 tasks in total).
- The two network statements request user space (US) communication for both LAPI and MPI over a switch adapter that may be shared with other tasks.
- The executable is /usr/bin/poe; the program to run and its POE options are passed with the arguments keyword.
- The wall_clock_limit keyword sets a wall clock limit for the job step.
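To run this example, you could save the statements in a file and submit that file with the llsubmit command (the file name here is only an illustration):

  llsubmit poe24.cmd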
Figure 18 is a second sample job command file for POE 2.4.0.
Figure 18. POE Sample Job Command File - Invoking POE Twice
#
# @ job_type = parallel
# @ input = poe.in.1
# @ output = poe.out.1
# @ error = poe.err
# @ node = 2,8
# @ network.MPI = switch,shared,IP
# @ wall_clock_limit = 60
# @ class = POE
# @ queue
/usr/bin/poe /u/richc/my_POE_setup_program -infolevel 2
/usr/bin/poe /u/richc/my_POE_main_program -infolevel 2
Figure 18 shows the following:

- Because no executable keyword is coded, the job command file itself is executed, and POE is invoked twice: once for my_POE_setup_program and once for my_POE_main_program. Both invocations follow the queue statement.
- The node keyword requests a minimum of 2 and a maximum of 8 nodes, and the network.MPI statement requests IP communication over a shared switch adapter.
- Standard input, output, and error for the job step are directed by the input, output, and error keywords.
Figure 19 shows a sample job command file for PVM 3.3 (RS6K architecture). Before using PVM, users should contact their administrator to determine which PVM architecture has been installed; one way to check from the command line is sketched below.
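If PVM is already installed and the PVM_ROOT environment variable is set, one quick check is the pvmgetarch script that ships with PVM 3.3, which prints the PVM architecture name for the machine it runs on; your administrator can confirm whether the installed version is RS6K or SP2MPI:

  $PVM_ROOT/lib/pvmgetarch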
Figure 19. Sample PVM 3.3 Job Command File
# @ executable = my_PVM_program
# @ job_type = pvm3
# @ parallel_path = /home/LL_userid/cmds/pvm3/$PVM_ARCH:$PVM_ROOT/lib/$PVM_ARCH
# @ class = PVM3
# @ requirements = (Pool == 4)
# @ output = my_PVM_program.$(cluster).$(process).out
# @ error = my_PVM_program.$(cluster).$(process).err
# @ min_processors = 8
# @ max_processors = 10
# @ queue
Note the following requirements for PVM 3.3 (RS6K architecture) jobs:

- The job_type keyword must be set to pvm3.
- The parallel_path keyword specifies where the PVM executables and the PVM library for the architecture are located; note the use of the $PVM_ARCH and $PVM_ROOT environment variables in the example.
- The number of processors is requested with the min_processors and max_processors keywords rather than the node keyword.
- You do not start the PVM daemon yourself; for pvm3 jobs, LoadLeveler starts and stops the daemon for you. Because PVM allows only one daemon per user per machine, only one PVM job per user can run on a machine at a time.
Figure 20 shows a sample job command file for PVM 3.3.11+ (SP2MPI architecture). Before using PVM, users should contact their administrator to determine which PVM architecture has been installed. The SP2MPI architecture version should be used when users require that their jobs run in user space.
Figure 20. Sample PVM 3.3.11+ (SP2MPI Architecture) Job Command File
# @ job_type = parallel
# @ class = PVM3
# @ requirements = (Adapter == "hps_us")
# @ output = my_PVM_program.$(cluster).$(process).out
# @ error = my_PVM_program.$(cluster).$(process).err
# @ node = 3,3
# @ queue

# Set PVM daemon and starter path dictated by LoadLeveler administrator
starter_path=/home/userid/loadl/pvm3/bin/SP2MPI
daemon_path=/home/userid/loadl/pvm3/lib/SP2MPI

# Export "MP_EUILIB" before starting PVM3 (default is "ip")
export MP_EUILIB=us
echo MP_EUILIB=$MP_EUILIB

# Clean up old PVM log and daemon files belonging to user
filelog=/tmp/pvml.`id | awk -F'=' '{print $2}' | awk -F'(' '{print $1}'`
filedaemon=/tmp/pvmd.`id | awk -F'=' '{print $2}' | awk -F'(' '{print $1}'`
rm -f $filelog > /dev/null
rm -f $filedaemon > /dev/null

# Start PVM daemon in background
$daemon_path/pvmd3 &
echo "pvm background pid=$!"
echo "Sleep 2 seconds"
sleep 2
echo "PVM daemon started"

# Start parallel executable
llnode_cnt=`echo "$LOADL_PROCESSOR_LIST" | awk '{print NF}'`
actual_cnt=`expr "$llnode_cnt" - 1`
$starter_path/starter -n $actual_cnt /home/userid/my_PVM_program
echo "Parallel executable starting"

# Check processes running and halt PVM daemon
echo "ps -a" | /home/userid/loadl/pvm3/lib/SP2MPI/pvm
echo "Halt PVM daemon"
echo "halt" | /home/userid/loadl/pvm3/lib/SP2MPI/pvm
wait
echo "PVM daemon completed"
Note the following requirements for PVM 3.3.11+ (SP2MPI architecture) jobs:

- The job_type keyword must be set to parallel, not pvm3; your job command file script, not LoadLeveler, starts and halts the PVM daemon.
- The MP_EUILIB environment variable must be exported before the daemon is started; set it to us for user space communication (the default is ip). The Adapter requirement in the example requests the user space switch adapter.
- Old PVM log and daemon files (the /tmp/pvml.* and /tmp/pvmd.* files belonging to the user) must be removed before the daemon is started, as the script does.
- The script passes the starter one task fewer than the number of nodes in LOADL_PROCESSOR_LIST, so request one more node than the number of tasks your program needs.
- The script issues a wait after halting the daemon so that the job step does not end until the daemon has completed.
This example demonstrates the sequence of events that occurs when you submit the sample job command file shown in Figure 20. Figure 21 illustrates that sequence: LoadLeveler allocates the requested nodes and runs the script, the script starts the PVM daemon and uses the starter to launch the parallel tasks, and when the tasks complete the script halts the daemon and the job step ends.
Figure 21. Sequence of Events in a PVM 3.3.11+ Job
Both end users and LoadLeveler administrators can obtain the status of parallel jobs in the same way as they obtain the status of serial jobs: either by using the llq command or by viewing the Jobs window on the graphical user interface (GUI). By issuing llq -l, or by using the Job Details selection in the GUI, users get a list of the machines allocated to the parallel job. See "llq - Query Job Status" for sample output from an llq -l command issued to query a parallel job.
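For example, to see the long listing, including the allocated hosts, for a single job step, you could issue the following (the job step identifier shown here is only an illustration; use an identifier reported by llq or llsubmit):

  llq -l mars.45.0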
Also, administrators can create a class for parallel jobs. Users can check the status of their parallel jobs by specifying this class in the Class field on the Jobs window of the GUI.
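For example, an administrator might define such a class with a stanza like the following in the LoadLeveler administration file. This is a minimal sketch; the stanza name and attribute values are illustrative, not required:

  POE: type = class
       class_comment = "class for POE parallel jobs"
       wall_clock_limit = 60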
The output of llq -l includes the names of the allocated hosts. Another way to obtain the allocated host names is through the LOADL_PROCESSOR_LIST environment variable, which you can use from a shell script in your job command file, as shown in Figure 22.

This example uses LOADL_PROCESSOR_LIST to perform a remote copy of a local file to all of the nodes, and then invokes POE. Note that the processor list contains one entry for each task running on a node: if two tasks are running on a node, LOADL_PROCESSOR_LIST contains two instances of the host name where those tasks are running. The example in Figure 22 removes any duplicate entries.
Note that LOADL_PROCESSOR_LIST is set by LoadLeveler, not by the user.
Figure 22. Using LOADL_PROCESSOR_LIST in a Shell Script
#!/bin/ksh
# @ output = my_POE_program.$(cluster).$(process).out
# @ error = my_POE_program.$(cluster).$(process).err
# @ class = POE
# @ job_type = parallel
# @ node = 8,12
# @ network.MPI = css0,shared,US
# @ queue

tmp_file="/tmp/node_list"
rm -f $tmp_file

# Copy each entry in the list to a new line in a file so
# that duplicate entries can be removed.
for node in $LOADL_PROCESSOR_LIST
do
    echo $node >> $tmp_file
done

# Sort the file, removing duplicate entries, and save the list in a variable
nodelist=`sort -u $tmp_file`

for node in $nodelist
do
    rcp localfile $node:/home/userid
done

rm -f $tmp_file

/usr/bin/poe /home/userid/my_POE_program
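The temporary file in Figure 22 exists only to remove duplicate host names. As an alternative, the same result can be obtained in a single pipeline; this sketch relies only on standard utilities:

  # Convert the space-separated list to one name per line,
  # then remove duplicates.
  nodelist=`echo "$LOADL_PROCESSOR_LIST" | tr ' ' '\n' | sort -u`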