IBM Books

Using and Administering


Chapter 6. Administration Tasks for Parallel Jobs

This chapter describes administration tasks that apply to parallel jobs. For more general information on administering and configuring LoadLeveler, see Chapter 5. "Administering and Configuring LoadLeveler". For information on submitting parallel jobs, see Chapter 4. "Submitting and Managing Parallel Jobs".


Scheduling Considerations for Parallel Jobs

For parallel jobs, the LoadLeveler Backfill scheduler makes the most efficient use of your resources. This scheduler runs both serial and parallel jobs, but is primarily meant for installations running parallel jobs.

Also, the Backfill scheduler supports:

You specify the Backfill scheduler using the SCHEDULER_TYPE keyword. For more information on this keyword, and for information on other schedulers you can run, see "Choosing a Scheduler".


Setting Up to Allow Users to Submit Interactive POE Jobs

Follow the steps in this section to set up your system so that users can submit interactive POE jobs to LoadLeveler.

  1. Make sure that you have installed LoadLeveler and defined LoadLeveler administrators. See "Quick Set Up" for information on defining LoadLeveler administrators.

  2. Run the llextSDR command to extract node and adapter information from the SDR. See "llextSDR - Extract adapter information from the SDR" for information on using this command.

  3. Incorporate the appropriate node and adapter information into your LoadLeveler administration file stanzas.

    For example, the following output represents two adapter stanzas and their corresponding machine stanza:

    k10n09.ppd.pok.ibm.com: type = adapter
     adapter_name = en0
     network_type = ethernet
     interface_address = 9.114.51.73
     interface_name = k10n09.ppd.pok.ibm.com
     
    k10sn09.ppd.pok.ibm.com: type = adapter
     adapter_name = css0
     network_type = switch
     interface_address = 9.114.51.137
     interface_name = k10sn09.ppd.pok.ibm.com
     switch_node_number = 8
     
    k10n09.ppd.pok.ibm.com: type=machine
     adapter_stanzas = k10n09.ppd.pok.ibm.com k10sn09.ppd.pok.ibm.com
     spacct_exclusive_enable = true
    

  4. Define a machine to act as the LoadLeveler central manager. See "Quick Set Up" for more information.

  5. Define your scheduler to be the LoadLeveler Backfill scheduler by setting SCHEDULER_TYPE = BACKFILL in the LoadLeveler configuration file. See "Choosing a Scheduler" for more information.

  6. Consider setting up a class stanza for your interactive POE jobs. See "Setting Up a Class for Parallel Jobs" for more information. Define this class to be your default class for interactive jobs by specifying this class name on the default_interactive_class keyword. See "Step 2: Specify User Stanzas" for more information.

  7. Configure optional functions, including:

  8. Start LoadLeveler using the llctl command. See "Quick Set Up" for more information.
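
Before starting LoadLeveler, it can help to confirm that every name listed on a machine stanza's adapter_stanzas keyword has a matching adapter stanza (step 3). The following Python sketch is illustrative only and is not part of LoadLeveler. The helper names parse_stanzas and missing_adapters are hypothetical, and the parser assumes the simple layout of the example in step 3: a "label: type = value" line followed by "keyword = value" lines, with no comments or continuation lines.

```python
def parse_stanzas(text):
    """Parse stanzas into a list of dicts, one per stanza.

    Simplified sketch: handles only 'label: type = x' headers followed by
    'keyword = value' lines, as in the example in step 3.
    """
    stanzas = []
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition("=")
        if not sep:
            continue  # blank line or not a 'keyword = value' line
        key, value = key.strip(), value.strip()
        if ":" in key:
            # Header line such as 'k10n09.ppd.pok.ibm.com: type = adapter'
            label, _, key = key.partition(":")
            stanzas.append({"label": label.strip()})
            key = key.strip()
        if stanzas:
            stanzas[-1][key] = value
    return stanzas


def missing_adapters(stanzas):
    """Map each machine label to the adapter stanza names it lists but that are undefined."""
    adapters = {s["label"] for s in stanzas if s.get("type") == "adapter"}
    missing = {}
    for s in stanzas:
        if s.get("type") != "machine":
            continue
        wanted = s.get("adapter_stanzas", "").split()
        absent = [name for name in wanted if name not in adapters]
        if absent:
            missing[s["label"]] = absent
    return missing


SAMPLE = """
k10n09.ppd.pok.ibm.com: type = adapter
 adapter_name = en0
 network_type = ethernet
 interface_address = 9.114.51.73
 interface_name = k10n09.ppd.pok.ibm.com

k10sn09.ppd.pok.ibm.com: type = adapter
 adapter_name = css0
 network_type = switch
 interface_address = 9.114.51.137
 interface_name = k10sn09.ppd.pok.ibm.com
 switch_node_number = 8

k10n09.ppd.pok.ibm.com: type=machine
 adapter_stanzas = k10n09.ppd.pok.ibm.com k10sn09.ppd.pok.ibm.com
 spacct_exclusive_enable = true
"""

print(missing_adapters(parse_stanzas(SAMPLE)))  # prints {}
```

An empty result means every adapter stanza named by a machine stanza is defined. Stanzas are kept in a list rather than keyed by label because, as in the example, an adapter stanza and a machine stanza can share the same label.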

Setting Up to Allow Users to Submit PVM Jobs

If users will be submitting PVM jobs, your installation must first obtain and install PVM. PVM is a public domain package distributed through electronic mail by Oak Ridge National Laboratory. To obtain information on PVM, issue the following command:

echo "send index from pvm3" | mail netlib@ornl.gov

For RS6K architecture PVM, LoadLeveler expects to find PVM installed in ~loadl/pvm3. You can override this location using the pvm_root entry in the machine stanza. LoadLeveler uses the value of pvm_root to set the environment variable $(PVM_ROOT), which PVM requires. For example:

gallifrey:  type = machine
central_manager = true
schedd_host = true
alias = drwho
pvm_root = /home/userid/loadl/1.2.0/aix32/pvm3

For PVM 3.3.11+ (that is, SP2MPI architecture), LoadLeveler does not expect to find PVM installed in ~loadl/pvm3. PVM 3.3.11+ must be installed in a directory that is accessible to and executable by all nodes in the LoadLeveler cluster. Administrators must communicate the location of this directory to their users.

Running PVM requires that each user be allowed to run only one instance of PVM per machine. To ensure that LoadLeveler does not attempt to start more than one PVM job per machine, you can set up a class for PVM jobs. To do this, add a class stanza to your administration file and a Class statement to your configuration file. The following is an example of a PVM class stanza that you can add to your administration file:

PVM3:  type = class
max_node = 15  # max of 15 processors per user per job

The following is an example of statements that you can add to your configuration file:

MAX_STARTERS = 2
Class = {"ClassA" "ClassA" "PVM3" }

Together, the MAX_STARTERS and Class keywords allow either two ClassA jobs, or one ClassA job and one PVM3 job, to run on the machine at the same time. Limiting PVM jobs by using a class when MAX_STARTERS is greater than 1 is only a policy, because a user can still submit a PVM job to ClassA. Note also that setting MAX_STARTERS = 1 would enforce a policy of one job per machine.
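
The effect of combining these two keywords can be checked with a small model. The following Python sketch is a simplification for illustration, not LoadLeveler's actual scheduling logic: it assumes each running job occupies one starter and one entry of its class on the Class statement. The function name can_run_together is hypothetical.

```python
from collections import Counter

# Values taken from the configuration file example above.
MAX_STARTERS = 2
CLASS_SLOTS = ["ClassA", "ClassA", "PVM3"]


def can_run_together(job_classes, class_slots=CLASS_SLOTS, max_starters=MAX_STARTERS):
    """Return True if all the given jobs could run on one machine at once.

    Assumed model: one starter and one matching class entry per job.
    """
    if len(job_classes) > max_starters:
        return False
    free = Counter(class_slots)
    needed = Counter(job_classes)
    return all(free[name] >= count for name, count in needed.items())


print(can_run_together(["ClassA", "ClassA"]))  # prints True
print(can_run_together(["ClassA", "PVM3"]))    # prints True
print(can_run_together(["PVM3", "PVM3"]))      # prints False
```

Under this model, two PVM3 jobs never run together because the Class statement has only one PVM3 entry. Nothing prevents a user from running PVM inside a ClassA job, which is why the limit is only a policy.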

See "Common Set Up Problems with Parallel Jobs" for more information.

Restrictions and Limitations for PVM Jobs

For PVM 3.3, dynamic allocation and de-allocation of parallel machines are not supported.


Setting Up a Class for Parallel Jobs

To define the characteristics of parallel jobs run by your installation, set up a class stanza in the administration file and define a class (on the Class statement in the configuration file) for each task you want to run on a node.

Suppose your installation plans to submit long-running parallel jobs, and you want to define the following characteristics:

The following is a sample class stanza for long-running parallel jobs which takes into account the above characteristics:

  long_parallel: type=class
  wall_clock_limit = 1800
  include_users = jack queen king ace
  priority = 50
  total_tasks = 120
  max_node = 60
  maxjobs = 2

Note the following about this class stanza:

Suppose users need to submit job command files containing the following statements:

  node = 30
  tasks_per_node = 4

You must code the Class statement such that at least 30 nodes have four or more long_parallel classes defined. That is, the configuration file for each of these nodes must include the following statement:

  Class = { "long_parallel" "long_parallel" "long_parallel" "long_parallel" }
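
The arithmetic behind this example can be sketched in Python. This is an illustrative aid, not a LoadLeveler tool: the helper names are hypothetical, and the check assumes that max_node caps the number of nodes a job may request and that total_tasks caps the number of tasks a job may request.

```python
def class_statement(class_name, tasks_per_node):
    """Build a Class statement with one entry per task to run on a node."""
    entries = " ".join(f'"{class_name}"' for _ in range(tasks_per_node))
    return "Class = { " + entries + " }"


def fits_class_limits(node, tasks_per_node, max_node, total_tasks):
    """Check a job request against the class limits (assumed semantics)."""
    return node <= max_node and node * tasks_per_node <= total_tasks


print(class_statement("long_parallel", 4))
# prints Class = { "long_parallel" "long_parallel" "long_parallel" "long_parallel" }

# node = 30 and tasks_per_node = 4 request 30 * 4 = 120 tasks in total.
print(fits_class_limits(30, 4, max_node=60, total_tasks=120))  # prints True
```

With the sample long_parallel stanza, the request for 30 nodes with four tasks each uses exactly the 120 tasks allowed by total_tasks and stays within max_node = 60.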

Setting Up a Parallel Master Node

LoadLeveler allows you to define a parallel master node, which LoadLeveler selects as the first node for a job submitted to a particular class. To set up a parallel master node, code the following keywords in the class stanza and the machine stanza of the administration file:

# MACHINE STANZA: (optional)
mach1:     type = machine
master_node_exclusive = true
 
# CLASS STANZA: (optional)
pmv3:      type = class
master_node_requirement = true

Setting master_node_requirement = true forces all parallel jobs in this class to use, as their first node, a machine that has the master_node_exclusive = true setting. For more information on these keywords, see "Step 1: Specify Machine Stanzas" and "Step 3: Specify Class Stanzas".
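
The first-node rule can be pictured with a short Python sketch. It models only the rule stated above and is not LoadLeveler's node selection algorithm. The function order_nodes and the machine names mach2 and mach3 are hypothetical.

```python
def order_nodes(candidates, master_exclusive, master_required):
    """Return the candidate machines with a qualifying master node first.

    candidates       -- machine names available to the job, in any order
    master_exclusive -- names of machines with master_node_exclusive = true
    master_required  -- True if the class has master_node_requirement = true

    Returns None when a master node is required but none is available.
    """
    if not master_required:
        return list(candidates)
    for name in candidates:
        if name in master_exclusive:
            return [name] + [other for other in candidates if other != name]
    return None


print(order_nodes(["mach2", "mach1", "mach3"], {"mach1"}, True))
# prints ['mach1', 'mach2', 'mach3']
```

In this model, a job in a class that requires a master node cannot be placed at all when no machine with master_node_exclusive = true is available.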

