Using and Administering

Chapter 2. LoadLeveler Daemons and Job States

This chapter presents a detailed explanation of LoadLeveler daemons and processes. Included here is a description of job states, which are controlled by certain daemons. See "LoadLeveler Job States" for more information.

Daemons and Processes

This section presents a detailed explanation of LoadLeveler daemons and processes. For more information on configuration file keywords mentioned in this section, see "Configuring LoadLeveler".

The master Daemon

The master daemon runs on every machine in the LoadLeveler cluster, except the submit-only machine. The real and effective user ID of this daemon must be root.

The master daemon determines whether to start any other daemons by checking the START_DAEMONS keyword in the global or local configuration file. If the keyword is set to true, the daemons are started. If the keyword is set to false, the master daemon terminates and generates a message.

On the machine designated as the central manager, the master runs the negotiator daemon. The master also controls the central manager backup function. The negotiator runs on either the primary or an alternate central manager. If a central manager failure is detected, one of the alternate central managers becomes the primary central manager by starting the negotiator.

The master daemon starts and, if necessary, restarts all the LoadLeveler daemons that its machine is configured to run. As part of its startup procedure, this daemon executes the .llrc file (a dummy file is provided in the bin subdirectory of the release directory). You can use this script to customize your local configuration file, specifying what particular data is stored locally. This daemon also runs the kbdd daemon, which monitors keyboard and mouse activity.

When the master daemon detects that one of the daemons it is monitoring has failed, it attempts to restart that daemon. Because certain situations may prevent a daemon from running, the master limits its restart attempts to the number defined for the RESTARTS_PER_HOUR keyword in the configuration file. If this limit is exceeded, the master aborts and all daemons are killed.

When a daemon must be restarted, the master sends mail to the administrator(s) identified by the LOADL_ADMIN keyword in the configuration file. The mail contains the name of the failing daemon, its termination status, and a section of the daemon's most recent log file. If the master aborts after exceeding RESTARTS_PER_HOUR, it will also send that mail before exiting.
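The restart limit described above can be illustrated with a minimal Python sketch. It is not LoadLeveler code: the class name `RestartLimiter` is hypothetical, and the use of a sliding one-hour window is an assumption, since this chapter does not say how the hour is measured.

```python
from collections import deque


class RestartLimiter:
    """Hypothetical helper: caps restart attempts within a time window."""

    def __init__(self, restarts_per_hour, window_seconds=3600.0):
        self.limit = restarts_per_hour      # value of RESTARTS_PER_HOUR
        self.window = window_seconds        # assumed sliding window of one hour
        self.history = deque()              # timestamps of recent restarts

    def allow_restart(self, now):
        """Return True and record the attempt if the limit is not yet reached."""
        # Forget restarts that have aged out of the window.
        while self.history and now - self.history[0] >= self.window:
            self.history.popleft()
        if len(self.history) >= self.limit:
            return False                    # limit exceeded: the master would abort
        self.history.append(now)
        return True
```

In this sketch a `False` result corresponds to the point at which the master would send its mail, kill all daemons, and exit.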

The master daemon may perform the following actions in response to an llctl command:

Kill all daemons and exit
Kill all daemons and execute a new master
Re-run the .llrc file, reread the configuration files, and stop or start daemons as appropriate for the new configuration files
Send drain request to startd and schedd
Send flush request to startd and send result to caller
Send suspend request to startd and send result to caller
Send resume request to startd and schedd, and send result to caller

The schedd Daemon

The schedd daemon receives jobs sent by the llsubmit command and schedules those jobs to machines selected by the negotiator daemon. The schedd daemon is started, restarted, signalled, and stopped by the master daemon.

The schedd daemon can be in any one of the following states:

Available: This machine is available to schedule jobs.
Draining: The schedd daemon has been drained by the administrator but some jobs are still running. The state of the machine remains Draining until all running jobs complete. At that time, the machine status changes to Drained.
Drained: The schedd machine accepts no more jobs; jobs in the Starting or Running state are allowed to continue running, and jobs in the Idle state are drained, meaning they will not get dispatched.
Down: The daemon is not running on this machine. The schedd daemon enters this state when it has not reported its status to the negotiator. This can occur when the machine is actually down, or because there is a network failure.

The schedd daemon performs the following functions:

Assigns new job ids when requested by the job submission process (for example, by the llsubmit command).
Receives new jobs from the llsubmit command. A new job is received as a job object for each job step. A job object is the data structure in memory containing all the information about a job step. The schedd forwards the job object to the negotiator daemon as soon as it is received from the submit command.
Maintains on-disk copies of jobs submitted locally (on this machine) that are either waiting or running on a remote (different) machine. The central manager can use this information to reconstruct the job information in the event of a failure. This information is also used for accounting purposes.
Responds to directives sent by the administrator through the negotiator daemon. The directives include:
- Run a job.
- Change the priority of a job.
- Remove a job.
- Hold or release a job.
- Send information about all jobs.
Sends job events to the negotiator daemon when:
- schedd is restarting.
- A new series of job objects is arriving.
- A job is started.
- A job was rejected, completed, removed, or vacated. schedd determines the status by examining the exit status returned by the startd.
Communicates with the Parallel Operating Environment (POE) when you run a POE job.
Requests that a remote startd daemon kill a job.
Handles the checkpoint file associated with the job, provided checkpointing has been enabled. For more information, see "Step 13: Enable Checkpointing".
Receives accounting information from startd.

The startd Daemon

The startd daemon monitors jobs and machine resources on the local machine and forwards this information to the negotiator daemon. The startd also receives and executes job requests originating from remote machines. The master daemon starts, restarts, signals, and stops the startd daemon.

The startd daemon can be in any one of the following states:

Busy: The maximum number of jobs is running on this machine.
Down: The daemon is not running on this machine. The startd daemon enters this state when it has not reported its status to the negotiator. This can occur when the machine is actually down, or because there is a network failure.
Drained: The startd machine accepts no more jobs; jobs already running on the startd machine are allowed to complete.
Draining: The startd daemon has been drained by the administrator but some jobs are still running. The state of the machine remains Draining until all running jobs complete. At that time, the machine status changes to Drained.
Flush: All jobs on this machine have been flushed. No new jobs are accepted.
Idle: The machine is not running any jobs.
None: LoadLeveler is running on this machine, but no jobs can run here.
Reserved: The resource manager has this machine reserved for use by interactive jobs.
Running: The machine is running one or more jobs and is capable of running more.
Suspend: All jobs on this machine have been suspended by the administrator.

The startd daemon performs these functions:

Runs a timeout procedure that builds a snapshot of the state of the machine, including both static and dynamic data. This timeout procedure is run at the following times:
- After a job completes.
- According to the definition of the POLLING_FREQUENCY keyword in the configuration file.
Records the following information in LoadLeveler variables and sends the information to the negotiator. These variables are described in "LoadLeveler Variables".
- State (of the startd daemon)
- EnteredCurrentState
- Memory
- Disk
- KeyboardIdle
- Cpus
- LoadAvg
- Machine
- Adapter
- AvailableClasses
Calculates the SUSPEND, RESUME, CONTINUE, and VACATE expressions. These are described in "Step 7: Manage a Job's Status Using Control Expressions".

Receives job requests from the schedd daemon to:
- Start a job
- Vacate a job
- Cancel a job

When the schedd daemon tells the startd to start a job, the startd determines whether its own state permits a new job to run:

If:	Then this happens:
Yes, it can start a new job	The startd forks a starter process.
No, it cannot start a new job	The startd rejects the request for one of the following reasons:
- Jobs have been suspended, flushed, or drained
- The job limit set for the MAX_STARTERS keyword has been reached
- There are not enough classes available for the designated job class

Receives requests from the master (via llctl) to do one of the following:
- Drain
- Flush
- Suspend
- Resume.
For each request, startd marks its own new state, forwards its new state to the negotiator daemon, and then performs the appropriate action for any jobs that are active.
Receives notification of keyboard and mouse activity from the kbdd daemon.
Periodically examines the process table for LoadLeveler jobs and accumulates resources consumed by those jobs. This resource data is used to determine if a job has exceeded its job limit and for recording in the history file.
Sends accounting information to schedd.
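
The start-request decision described earlier in this list (accept and fork a starter, or reject for one of three reasons) can be sketched in Python as follows. This is an illustration under stated assumptions, not the startd implementation: `StartdSnapshot` and `decide_start` are hypothetical names, and mapping "suspended, flushed, or drained" onto the Suspend, Flush, Draining, and Drained states is an assumption.

```python
from dataclasses import dataclass

# Assumed mapping of "suspended, flushed, or drained" onto startd state names.
BLOCKING_STATES = {"Suspend", "Flush", "Draining", "Drained"}


@dataclass
class StartdSnapshot:
    """Hypothetical view of the data the startd consults for a start request."""
    state: str                # current startd state, for example "Idle"
    running_jobs: int         # jobs currently running on this machine
    max_starters: int         # value of the MAX_STARTERS keyword
    available_classes: list   # job classes with free slots


def decide_start(snapshot, job_class):
    """Return (accepted, reason) for a start request from the schedd."""
    if snapshot.state in BLOCKING_STATES:
        return False, "jobs have been suspended, flushed, or drained"
    if snapshot.running_jobs >= snapshot.max_starters:
        return False, "MAX_STARTERS limit reached"
    if job_class not in snapshot.available_classes:
        return False, "no class available for the job class"
    return True, "fork a starter process"
```

The checks run in the order the rejection reasons are listed above; the source does not state which reason takes precedence when several apply.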

The starter Process

The startd daemon spawns a starter process after the schedd daemon tells the startd to start a job. The starter process manages all the processes associated with a job step. The starter process is responsible for running the job and reporting status back to startd.

The starter process performs these functions:

Processes the prolog and epilog programs as defined by the JOB_PROLOG and JOB_EPILOG keywords in the configuration file. The job will not run if the prolog program exits with a return code other than zero.
Handles authentication. This includes:
- Authenticates AFS, if necessary
- Verifies that the submitting user is not root
- Verifies that the submitting user has access to the appropriate directories in the local file system.
Runs the job by forking a child process that runs with the user id and all groups of the submitting user. The starter child creates a new process group of which it is the process group leader, and executes the user's program or a shell. The starter parent is responsible for detecting the termination of the starter child. LoadLeveler does not monitor the children of the parent.
Responds to vacate and suspend orders from the startd.
Periodically generates a new checkpoint file, provided checkpointing has been enabled, and sends it to the scheduling machine.

The negotiator Daemon

The negotiator daemon maintains the status of each job and machine in the cluster and responds to queries from the llstatus and llq commands. The negotiator daemon runs on a single machine in the cluster (the central manager machine). This daemon is started, restarted, signalled, and stopped by the master daemon.

The negotiator daemon receives status messages from each schedd and startd daemon running in the cluster. The negotiator daemon tracks:

Which schedd daemons are running
Which startd daemons are running, and the status of each startd machine.

If the negotiator does not receive an update from a machine within the time period defined by the MACHINE_UPDATE_INTERVAL keyword, it assumes that the machine is down and therefore that the schedd and startd daemons on it are also down.
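
The timeout rule above amounts to comparing each machine's last update time with the configured interval. A minimal Python sketch, assuming a hypothetical dictionary of last-update timestamps in seconds:

```python
def machines_presumed_down(last_update, now, machine_update_interval):
    """Return the names of machines whose last update is older than the interval.

    last_update maps machine name to the time (in seconds) of its most
    recent status message; this structure is an assumption for illustration.
    """
    return sorted(
        name
        for name, seen in last_update.items()
        if now - seen > machine_update_interval
    )
```

A machine returned by this function would be treated as down, along with its schedd and startd daemons.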

The negotiator also maintains in memory several queues and tables that it uses to determine where each job should run.

The negotiator performs the following functions:

Receives and records job status changes from the schedd daemon.
Schedules jobs based on a variety of scheduling criteria and policy options. Once a job is selected, the negotiator contacts the schedd that originally created the job.
Handles requests to:
- Set priorities
- Query about jobs
- Remove a job
- Hold or release a job
- Favor or unfavor a user or a job.
Receives notification of schedd resets indicating that a schedd has restarted.

The kbdd Daemon

The kbdd daemon monitors keyboard and mouse activity. The kbdd daemon is spawned by the master daemon if the X_RUNS_HERE keyword in the configuration file is set to true.

The kbdd daemon notifies the startd daemon when it detects keyboard or mouse activity; however, kbdd is not interrupt driven. It sleeps for the number of seconds defined by the POLLING_FREQUENCY keyword in the LoadLeveler configuration file, and then determines if X events, in the form of mouse or keyboard activity, have occurred. For more information on the configuration file, see Chapter 5, "Administering and Configuring LoadLeveler".
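
The sleep-then-check cycle described above can be sketched in Python. The function and parameter names are hypothetical; the two callables stand in for the X event check and the notification to startd, neither of which is specified in this chapter.

```python
import time


def poll_keyboard(activity_since_last_poll, notify_startd,
                  polling_frequency, cycles, sleep=time.sleep):
    """Run a fixed number of polling cycles and return how many notifications were sent.

    activity_since_last_poll: callable returning True if X events occurred.
    notify_startd: callable invoked when activity is detected.
    polling_frequency: seconds to sleep, as set by POLLING_FREQUENCY.
    """
    notifications = 0
    for _ in range(cycles):
        sleep(polling_frequency)            # not interrupt driven: sleep first
        if activity_since_last_poll():      # then check for keyboard or mouse events
            notify_startd()
            notifications += 1
    return notifications
```

Because the loop polls rather than reacting to interrupts, activity is reported up to one polling interval after it occurs.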

LoadLeveler Job States

As LoadLeveler processes a job, the job moves into various states. Some states are unique to specific daemons; for example, only the negotiator places a job in the NotQueued state. For more information on daemons, see "Daemons and Processes". Possible job states are:

Completed: The job has completed.
Deferred: The job will not be assigned to a machine until a specified date. This date may have been specified by the user in the job command file, or may have been generated by the negotiator because a parallel job did not accumulate enough machines to run the job. (Only the negotiator places a job in the Deferred state.)
Idle: The job is being considered to run on a machine, though no machine has been selected.
NotQueued: The job is not being considered to run on a machine. A job can enter this state because the associated schedd is down, the user or group associated with the job is at its maximum maxqueued or maxidle value, or because the job has a dependency which cannot be determined. For more information on these keywords, see "Controlling the Mix of Idle and Running Jobs". (Only the negotiator places a job in the NotQueued state.)
Not Run: The job will never be run because a dependency associated with the job was found to be false.
Pending: The job is in the process of starting on one or more machines. (The negotiator indicates this state until the schedd acknowledges that it has received the request to start the job. Then the negotiator changes the state of the job to Starting. The schedd indicates the Pending state until all startd machines have acknowledged receipt of the start request. The schedd then changes the state of the job to Starting.)
Reject Pending: The job did not start. Possible reasons why a job is rejected are: job requirements were not met on the target machine, or the user ID of the person running the job is not valid on the target machine. After a job leaves the Reject Pending state, it is moved into one of the following states: Idle, User Hold, or Removed.
Removed: The job was removed (cancelled), either by LoadLeveler or by the user.
Remove Pending: The job is in the process of being removed, but not all associated machines have acknowledged the removal of the job.
Running: The job is running: the job was dispatched and has started on the designated machine.
Starting: The job is starting: the job was dispatched, was received by the target machine, and LoadLeveler is setting up the environment in which to run the job. For a parallel job, LoadLeveler sets up the environment on all required nodes. See the description of the "Pending" state for more information on when the negotiator or the schedd daemon moves a job into the Starting state.
System Hold: The job has been put in system hold.
System User Hold: The job has been put in system hold and user hold.
User Hold: The job has been put in user hold.
Vacate: The job started but did not complete. The negotiator will reschedule the job (provided the job is allowed to be rescheduled). Possible reasons why a job moves to the Vacate state are: the machine where the job was running was flushed, the VACATE expression in the configuration file evaluated to True, or LoadLeveler detected a condition indicating the job needed to be vacated. For more information on the VACATE expression, see "Step 7: Manage a Job's Status Using Control Expressions".

You may also see other states that include "Pending," such as Complete Pending and Vacate Pending. These are intermediate, temporary states usually associated with parallel jobs.
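
The schedd's side of the Pending to Starting transition, as described under the Pending state, depends on every required startd machine acknowledging the start request. A minimal Python sketch, with a hypothetical function name and machine names:

```python
def schedd_reported_state(required_machines, acknowledged_machines):
    """Return the state the schedd would indicate for a job being started.

    The job stays Pending until every required startd machine has
    acknowledged receipt of the start request, then moves to Starting.
    """
    if set(required_machines) <= set(acknowledged_machines):
        return "Starting"
    return "Pending"
```

For a parallel job that needs machines n1 and n2, the sketch reports Pending while only n1 has acknowledged and Starting once both have.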
