LoadLeveler is a job management system that allows users to run more jobs in less time by matching their processing needs to available resources. LoadLeveler serves as a job scheduler and provides a facility for building, submitting, and processing jobs quickly and efficiently in a dynamic environment.
Figure 1 shows the different environments to which LoadLeveler can schedule jobs. Together, these environments make up the LoadLeveler cluster. An environment can include heterogeneous clusters, dedicated nodes, and the RISC System/6000 Scalable POWERparallel System (SP).
Figure 1. Example of a LoadLeveler Configuration
In addition, LoadLeveler can schedule jobs written for NQS to machines outside the LoadLeveler cluster for execution. As Figure 1 also illustrates, a LoadLeveler cluster can include submit-only machines, which give users access to a limited set of LoadLeveler features. Submit-only machines are discussed further in "Roles of Machines".
This section describes how LoadLeveler works by introducing some basic job scheduling concepts.
A network job management and job scheduling system, such as LoadLeveler, is a software program that schedules and manages jobs that you submit to one or more machines under its control. LoadLeveler accepts jobs that users submit and reviews the job requirements. LoadLeveler then examines the machines under its control to determine which machines are best suited to run each job.
LoadLeveler schedules your jobs on one or more machines for processing. In this context, a job is a set of job steps. For each job step, you can specify a different executable. (The executable is the part of the job that gets processed.) You can use LoadLeveler to submit jobs that consist of one or more job steps, where a job step can depend on the completion status of a previous job step. For example, Figure 2 illustrates a stream of job steps:
Figure 2. LoadLeveler Job Steps
Each of these job steps is defined in a single job command file. A job command file specifies the name of the job, as well as the job steps that you want to submit, and can contain other LoadLeveler statements.
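As an illustration of how a stream of dependent job steps might be written down, the following Python sketch renders a job command file with two steps, where the second step runs only if the first one exits with status 0. It assumes the `# @ keyword = value` statement form and the `job_name`, `step_name`, `executable`, `dependency`, and `queue` keywords; check these against the job command file keyword reference before relying on them. The step names and executable paths are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobStep:
    """One job step: a name, the executable it runs, and an optional dependency."""
    name: str
    executable: str
    dependency: Optional[str] = None  # e.g. "prepare == 0": run only if prepare exited with 0


def render_job_command_file(job_name: str, steps: List[JobStep]) -> str:
    """Render a job command file using '# @ keyword = value' statements.

    Each step's statements end with '# @ queue', which submits that step.
    """
    lines = [f"# @ job_name = {job_name}"]
    for step in steps:
        lines.append(f"# @ step_name = {step.name}")
        lines.append(f"# @ executable = {step.executable}")
        if step.dependency is not None:
            lines.append(f"# @ dependency = ({step.dependency})")
        lines.append("# @ queue")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical two-step job: "analyze" runs only if "prepare" succeeds.
    steps = [
        JobStep("prepare", "/home/user/bin/prepare_data"),
        JobStep("analyze", "/home/user/bin/analyze", dependency="prepare == 0"),
    ]
    print(render_job_command_file("my_job", steps), end="")
```

Each `# @ queue` statement closes one job step, so a file with two `queue` statements describes a two-step job.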
LoadLeveler tries to execute each of your job steps on a machine that has enough resources to support executing and checkpointing each step. If your job command file has multiple job steps, the job steps will not necessarily run on the same machine, unless you explicitly request that they do.
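The placement rule can be sketched as a first-fit search. The sketch below treats "enough resources" as free memory to run the step plus free disk to checkpoint it; both resource fields and the first-fit policy are simplifying assumptions, not LoadLeveler's actual matching logic. Because each step is matched on its own, two steps of the same job can land on different machines.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Machine:
    name: str
    free_memory_mb: int
    free_disk_mb: int  # assumed to be where checkpoint files would be written


@dataclass
class StepRequest:
    name: str
    memory_mb: int           # memory needed to execute the step
    checkpoint_disk_mb: int  # hypothetical: disk needed to checkpoint the step


def find_machine(step: StepRequest, machines: List[Machine]) -> Optional[Machine]:
    """First fit: return the first machine that can both run and checkpoint
    the step, or None if no machine currently has enough resources."""
    for machine in machines:
        if (machine.free_memory_mb >= step.memory_mb
                and machine.free_disk_mb >= step.checkpoint_disk_mb):
            return machine
    return None


if __name__ == "__main__":
    machines = [Machine("node1", 512, 100), Machine("node2", 4096, 2000)]
    # The small step fits on node1; the large one only fits on node2.
    for step in (StepRequest("step1", 256, 50), StepRequest("step2", 2048, 1000)):
        match = find_machine(step, machines)
        print(step.name, "->", match.name if match else "no machine")
```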
You can submit batch jobs to LoadLeveler for scheduling. Batch jobs run in the background and generally do not require any input from the user. Batch jobs can be either serial or parallel. A serial job runs on a single machine, while a parallel job (a job written using a parallel language Application Program Interface, or API) is separated into multiple parts that can be processed simultaneously by several machines.
In order for LoadLeveler to schedule a job on a machine, the machine must be a valid member of the LoadLeveler cluster. A cluster is the combination of all of the different types of machines that use LoadLeveler. The following types of machines can be in a LoadLeveler cluster:
To make a machine a member of the LoadLeveler cluster, the administrator has to install the LoadLeveler software onto the machine and identify the central manager (described in "Roles of Machines"). Once the machine becomes a valid member of the cluster, LoadLeveler can schedule jobs to the machine.
Each machine in the LoadLeveler cluster performs one or more roles that make job scheduling possible. These roles are described below:
Keep in mind that one machine can assume multiple roles.
There may be times when some of the machines in the LoadLeveler cluster are not available to process jobs, for example, because their owners have decided to make them unavailable. Because LoadLeveler lets owners restrict the use of their machines, it provides flexibility and control over resources.
Machine owners can make their personal workstations available to other LoadLeveler users in several ways. For example, you can specify that:
Owners can also specify that their personal workstations will never be available to other LoadLeveler users.
This section lists the daemons that LoadLeveler uses to process jobs. For more detailed information, see "Daemons and Processes".
Once a user submits a job to LoadLeveler, LoadLeveler examines the job to determine what resources it needs. Then, LoadLeveler determines which machines in the LoadLeveler cluster are best suited to run the job. Once it finds the appropriate machine, LoadLeveler dispatches the job to that machine (or, for a parallel job, to several machines). To provide this function, LoadLeveler uses the concept of queues.
A job queue is a list of jobs that are waiting to be processed. When a user submits a job to LoadLeveler, the job enters an internal database that resides on one of the machines in the LoadLeveler cluster, where it stays until it is ready to be dispatched to another machine to run, as shown in Figure 3.
Figure 3. Job Queues
Once LoadLeveler examines the job to determine its required resources, the job is dispatched to a machine to be processed. Arrows 2 and 3 indicate that the job can be dispatched to either one machine or, in the case of parallel jobs, to multiple machines. Once the job reaches the executing machine, the job runs.
Jobs are not necessarily dispatched to machines in the cluster on a first-come, first-served basis. Instead, LoadLeveler examines the requirements and characteristics of each job and the availability of machines, and determines the best time to dispatch the job.
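To see why dispatch order need not match submission order, consider a single scheduling pass that tries each queued job against the machines' currently free CPUs. This greedy first-fit pass is a hypothetical simplification: LoadLeveler's real decision also weighs priority, class, and other job characteristics. The function and its CPU-count model are illustrative only.

```python
from typing import Dict, List, Tuple


def dispatch_pass(queue: List[Tuple[str, int]],
                  free_cpus: Dict[str, int]) -> List[Tuple[str, str]]:
    """Run one greedy scheduling pass over the job queue.

    queue: (job_name, cpus_needed) pairs in submission order.
    free_cpus: machine name -> free CPUs; updated in place as jobs are placed.
    Returns the (job_name, machine) pairs dispatched in this pass. A job that
    fits nowhere stays queued, and later jobs that do fit are dispatched
    ahead of it.
    """
    dispatched: List[Tuple[str, str]] = []
    for job, cpus in queue:
        for machine, free in free_cpus.items():
            if free >= cpus:
                free_cpus[machine] = free - cpus  # reserve the CPUs for this job
                dispatched.append((job, machine))
                break
    return dispatched


if __name__ == "__main__":
    free = {"node1": 4}
    # "big" was submitted first but needs 8 CPUs, so only "small" is dispatched.
    print(dispatch_pass([("big", 8), ("small", 2)], free))
```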
LoadLeveler also uses the concept of job classes to schedule jobs to run on machines. A job class is a classification to which a job can belong. For example, short running jobs may belong to a job class called short_jobs. Similarly, jobs that are only allowed to run on the weekends may belong to a class called weekend. The system administrator can define these job classes and select the users that are authorized to submit jobs of these classes. For more information, see "Step 3: Specify Class Stanzas".
You can control which types of jobs run on a machine by specifying the job classes that the machine supports. For more information, see "Step 1: Specify Machine Stanzas".
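The two checks described above, whether a user may submit to a class and which machines accept that class, can be modeled with two lookup tables. The table contents, user names, and machine names below are hypothetical; in LoadLeveler this information comes from the class and machine stanzas referenced above.

```python
from typing import Dict, Set

# Hypothetical class definitions: class name -> users authorized to submit to it
CLASS_USERS: Dict[str, Set[str]] = {
    "short_jobs": {"alice", "bob"},
    "weekend": {"alice"},
}

# Hypothetical machine definitions: machine name -> job classes it supports
MACHINE_CLASSES: Dict[str, Set[str]] = {
    "node1": {"short_jobs"},
    "node2": {"short_jobs", "weekend"},
}


def can_submit(user: str, job_class: str) -> bool:
    """True if the user is authorized to submit jobs of this class."""
    return user in CLASS_USERS.get(job_class, set())


def eligible_machines(job_class: str) -> Set[str]:
    """Machines that accept jobs of this class."""
    return {m for m, classes in MACHINE_CLASSES.items() if job_class in classes}
```

Keeping the two tables separate mirrors the split in the text: the administrator authorizes users per class, and each machine independently declares which classes it will run.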
LoadLeveler also examines a job's priority to determine when to schedule the job on a machine. A job's priority determines its position in the list of all jobs waiting to be dispatched. For more information on job priority, see "Setting and Changing the Priority of a Job".
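A priority-ordered list of waiting jobs maps naturally onto a heap. In this sketch a larger number means higher priority and ties are broken by submission order; both rules are assumptions for illustration, since LoadLeveler's own priority calculation is described in the section referenced above. The `JobQueue` class is hypothetical.

```python
import heapq
import itertools
from typing import List, Tuple


class JobQueue:
    """Waiting jobs ordered by priority (higher first), then by submission order."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._counter = itertools.count()  # tie-breaker: earlier submissions first

    def submit(self, job: str, priority: int) -> None:
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), job))

    def next_job(self) -> str:
        """Remove and return the job at the front of the queue."""
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    q = JobQueue()
    q.submit("a", 10)
    q.submit("b", 50)
    q.submit("c", 50)
    # "b" and "c" share the top priority, so submission order decides between them.
    print([q.next_job() for _ in range(len(q))])
```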
Figure 4 illustrates the information flow through the LoadLeveler cluster:
Figure 4. High-Level Job Flow
A LoadLeveler cluster has a managing machine known as the central manager, as well as machines that act as scheduling machines and machines that serve as executing machines. The arrows in Figure 4 illustrate the following:
The following figures break Figure 4 down into more detailed diagrams that illustrate how LoadLeveler processes a job.
Figure 5. Job is Submitted to LoadLeveler
Figure 5 illustrates that the schedd daemon runs on the scheduling machine. This machine can also have the startd daemon running on it. The negotiator daemon resides on the central manager machine. The arrows in Figure 5 illustrate the following:
Figure 6. LoadLeveler Authorizes the Job
In Figure 6, arrow 4 indicates that the negotiator daemon authorizes the schedd daemon to begin taking steps to run the job. This authorization is called a permit to run. Once this is done, the job is considered Pending or Starting. (See "LoadLeveler Job States" for more information.)
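The job states mentioned in this walkthrough can be tracked with a small state machine. Only Pending and Starting come from the text; the Idle, Running, and Completed names, the transition table, and the step each transition is attributed to are assumptions made for this simplified model, which is not LoadLeveler's complete set of job states (see "LoadLeveler Job States").

```python
from enum import Enum
from typing import Dict, Set


class JobState(Enum):
    IDLE = "Idle"            # assumption: queued by schedd, waiting for a permit
    PENDING = "Pending"      # from the text: permit to run received
    STARTING = "Starting"    # from the text: permit to run received
    RUNNING = "Running"      # assumption: the user's job has been launched
    COMPLETED = "Completed"  # assumption: the job has finished


# Simplified transition table; comments name the step assumed to trigger each move.
TRANSITIONS: Dict[JobState, Set[JobState]] = {
    JobState.IDLE: {JobState.PENDING, JobState.STARTING},  # negotiator grants a permit
    JobState.PENDING: {JobState.STARTING},                 # schedd contacts startd
    JobState.STARTING: {JobState.RUNNING},                 # the job begins executing
    JobState.RUNNING: {JobState.COMPLETED},                # the job exits
    JobState.COMPLETED: set(),
}


def advance(current: JobState, new: JobState) -> JobState:
    """Return the new state, or raise ValueError if the move is not allowed."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new
```

Rejecting unlisted transitions makes ordering mistakes visible, for example a job reported as running before it ever received a permit to run.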
Figure 7. LoadLeveler Prepares to Run the Job
In Figure 7, arrow 5 illustrates that the schedd daemon contacts the startd daemon on the executing machine and requests that it start the job. The executing machine can either be a local machine (the machine from which the job was submitted) or a remote machine (another machine in the cluster).
Figure 8. LoadLeveler Starts the Job
The arrows in Figure 8 illustrate the following:
The starter forks a child process that executes the user's job, and the parent starter process waits for the child to complete.
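The fork-and-wait pattern described here is the standard POSIX one. The following Python sketch (POSIX only) shows just that pattern: the child replaces itself with the user's program via exec, and the parent blocks in `waitpid` and reports the child's exit status. The function name is hypothetical, and the sketch does not reproduce anything else the LoadLeveler starter does.

```python
import os
import sys
from typing import List


def run_like_starter(argv: List[str]) -> int:
    """Fork a child that execs argv, wait for it, and return its exit status.

    POSIX only. Returns 127 if the program could not be executed, and the
    negated signal number if the child was killed by a signal.
    """
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the user's program.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            pass
        # Reached only if exec failed; _exit skips the parent's cleanup handlers.
        os._exit(127)
    # Parent: block until this specific child terminates.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)


if __name__ == "__main__":
    # Run a stand-in "user job" that exits with status 3.
    code = run_like_starter([sys.executable, "-c", "import sys; sys.exit(3)"])
    print("job exited with", code)
```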
Figure 9. LoadLeveler Completes the Job
The arrows in Figure 9 illustrate the following: