Using and Administering
This chapter presents a detailed explanation of LoadLeveler daemons and
processes. Included here is a description of job states, which are
controlled by certain daemons. See "LoadLeveler Job States" for more information.
This section presents a detailed explanation of LoadLeveler
daemons and processes. For more information on configuration file
keywords mentioned in this section, see "Configuring LoadLeveler".
The master daemon runs on every machine in the LoadLeveler
cluster, except the submit-only machine. The real and effective user ID
of this daemon must be root.
The master daemon determines whether to start any other daemons by checking
the START_DAEMONS keyword in the global or local configuration
file. If the keyword is set to true, the daemons are
started. If the keyword is set to false, the master daemon
terminates and generates a message.
On the machine designated as the central manager, the master runs the
negotiator daemon. The master also controls the central
manager backup function. The negotiator runs on either the primary or
an alternate central manager. If a central manager failure is detected,
one of the alternate central managers becomes the primary central manager by
starting the negotiator.
The master daemon starts and, if necessary, restarts all the LoadLeveler
daemons that the machine it resides on is configured to run. As part of
its startup procedure, this daemon executes the .llrc file
(a dummy file is provided in the bin subdirectory of the release
directory). You can use
this script to customize your local configuration file, specifying what
particular data is stored locally. This daemon also runs the
kbdd daemon, which monitors keyboard and mouse activity.
When the master daemon detects a failure on one of the daemons that it is
monitoring, it attempts to restart it. Because this daemon recognizes
that certain situations may prevent a daemon from running, it limits its
restart attempts to the number defined for the RESTARTS_PER_HOUR
keyword in the configuration file. If this limit is exceeded, the
master aborts and all daemons are killed.
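The restart limit described above can be sketched as a small counter. This is only an illustration, not LoadLeveler code: the class name RestartLimiter, its method, and the choice of a sliding one-hour window are assumptions, since the text does not say how the hour is measured.

```python
from collections import deque


class RestartLimiter:
    """Hypothetical helper: counts restarts inside a sliding time window."""

    def __init__(self, restarts_per_hour, window_seconds=3600):
        self.limit = restarts_per_hour   # value of RESTARTS_PER_HOUR
        self.window = window_seconds     # assumed sliding window of one hour
        self.history = deque()           # timestamps of recent restarts

    def allow_restart(self, now):
        """Record a restart at time `now` (in seconds).

        Returns False once the number of restarts inside the window
        exceeds the limit, which is the point where the master would abort.
        """
        # Forget restarts that are older than the window.
        while self.history and now - self.history[0] >= self.window:
            self.history.popleft()
        self.history.append(now)
        return len(self.history) <= self.limit
```

With a limit of 2, the first two restarts inside an hour are allowed and the third is refused; once the earlier restarts age out of the window, restarts are allowed again.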
When a daemon must be restarted, the master sends mail to the
administrator(s) identified by the LOADL_ADMIN keyword in the
configuration file. The mail contains the name of the failing daemon,
its termination status, and a section of the daemon's most recent log
file. If the master aborts after exceeding
RESTARTS_PER_HOUR, it will also send that mail before
exiting.
The master daemon may perform the following actions in response to an
llctl command:
- Kill all daemons and exit
- Kill all daemons and execute a new master
- Re-run the .llrc file, reread the configuration files,
stop or start daemons as appropriate for the new configuration files
- Send drain request to startd and schedd
- Send flush request to startd and send result to caller
- Send suspend request to startd and send result to caller
- Send resume request to startd and schedd, and send result to caller
The schedd daemon receives jobs sent by the llsubmit
command and schedules those jobs to machines selected by the negotiator
daemon. The schedd daemon is started, restarted, signalled, and stopped
by the master daemon.
The schedd daemon can be in any one of the following states:
- Available
- This machine is available to schedule jobs.
- Draining
- The schedd daemon has been drained by the administrator but some jobs are
still running. The state of the machine remains Draining until all
running jobs complete. At that time, the machine status changes to
Drained.
- Drained
- The schedd machine accepts no more jobs; jobs in the Starting or Running
state are allowed to continue running, and jobs in the Idle state are drained,
meaning they will not get dispatched.
- Down
- The daemon is not running on this machine. The schedd daemon enters
this state when it has not reported its status to the negotiator. This
can occur when the machine is actually down, or because there is a network
failure.
The schedd daemon performs the following functions:
- Assigns new job ids when requested by the job submission process (for
example, by the llsubmit command).
- Receives new jobs from the llsubmit command. A new job
is received as a
job object for each job step. A job object is the data
structure in memory containing all the information about a job step.
The schedd forwards the job object to the negotiator daemon as soon as it is
received from the submit command.
- Maintains on disk copies of jobs submitted locally (on this machine) that
are either waiting or running on a remote (different) machine. The
central manager can use this information to reconstruct the job information in
the event of a failure. This information is also used for accounting
purposes.
- Responds to directives sent by the administrator through the negotiator
daemon. The directives include:
- Run a job.
- Change the priority of a job.
- Remove a job.
- Hold or release a job.
- Send information about all jobs.
- Sends job events to the negotiator daemon when:
- schedd is restarting.
- A new series of job objects is arriving.
- A job is started.
- A job was rejected, completed, removed, or vacated. schedd
determines the status by examining the exit status returned by the
startd.
- Communicates with the Parallel Operating Environment (POE) when you run a
POE job.
- Requests that a remote startd daemon kill a job.
- Handles the checkpoint file associated with the job, provided
checkpointing has been enabled. For more information, see "Step 13: Enable Checkpointing".
- Receives accounting information from startd.
The startd daemon monitors jobs and machine resources on the
local machine and forwards this information to the negotiator daemon.
The startd also receives and executes job requests originating from remote
machines. The master daemon starts, restarts, signals, and stops the
startd daemon.
The startd daemon can be in any one of the following states:
- Busy
- The maximum number of jobs is running on this machine.
- Down
- The daemon is not running on this machine. The startd daemon enters
this state when it has not reported its status to the negotiator. This
can occur when the machine is actually down, or because there is a network
failure.
- Drained
- The startd machine accepts no more jobs; jobs already running on the
startd machine are allowed to complete.
- Draining
- The startd daemon has been drained by the administrator but some jobs are
still running. The state of the machine remains Draining until all
running jobs complete. At that time, the machine status changes to
Drained.
- Flush
- All jobs on this machine have been flushed. No new jobs are
accepted.
- Idle
- The machine is not running any jobs.
- None
- LoadLeveler is running on this machine, but no jobs can run here.
- Reserved
- The resource manager has this machine reserved for use by interactive
jobs.
- Running
- The machine is running one or more jobs and is capable of running
more.
- Suspend
- All jobs on this machine have been suspended by the administrator.
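As a rough illustration of how several of the startd states listed above relate to one another, the following sketch derives a state from a few inputs. The function name and its parameters are hypothetical, and the Flush, None, Reserved, and Suspend states are deliberately left out because they depend on administrator or resource manager actions not modeled here.

```python
def startd_state(reporting, drained_by_admin, running_jobs, max_jobs):
    """Classify a startd machine (illustrative only, not LoadLeveler code)."""
    if not reporting:
        # No status has reached the negotiator: machine or network problem.
        return "Down"
    if drained_by_admin:
        # Draining lasts until the last running job completes.
        return "Draining" if running_jobs > 0 else "Drained"
    if running_jobs == 0:
        return "Idle"
    if running_jobs >= max_jobs:
        # The maximum number of jobs is running.
        return "Busy"
    # Running one or more jobs and capable of running more.
    return "Running"
```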
The startd daemon performs these functions:
The startd daemon spawns a starter process after the schedd
daemon tells the startd to start a job. The starter process manages all
the processes associated with a job step. The starter process is
responsible for running the job and reporting status back to startd.
The starter process performs these functions:
- Processes the prolog and epilog programs as defined by the
JOB_PROLOG and JOB_EPILOG keywords in the configuration
file. The job will not run if the prolog program exits with a return
code other than zero.
- Handles authentication. This includes:
- Authenticates AFS, if necessary
- Verifies that the submitting user is not root
- Verifies that the submitting user has access to the appropriate
directories in the local file system.
- Runs the job by forking a child process that runs with the user id and all
groups of the submitting user. The starter child creates a new process
group of which it is the process group leader, and executes the user's
program or a shell. The starter parent is responsible for detecting the
termination of the starter child. LoadLeveler does not monitor the
children of the parent.
- Responds to vacate and suspend orders from the startd.
- Periodically generates a new checkpoint file, provided checkpointing has
been enabled, and sends it to the scheduling machine.
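The prolog rule in the list above (the job does not run if the prolog exits with a nonzero return code) can be sketched as follows. The function name run_job_step is hypothetical, and the epilog is omitted because the text does not say how it behaves when the prolog fails.

```python
import subprocess
import sys


def run_job_step(prolog_cmd, job_cmd):
    """Run the prolog, then the job, each given as an argument list.

    Returns None if the prolog failed (the job never ran), otherwise the
    job's exit status. Illustrative only, not LoadLeveler code.
    """
    prolog = subprocess.run(prolog_cmd)
    if prolog.returncode != 0:
        # A nonzero prolog exit code prevents the job from running.
        return None
    job = subprocess.run(job_cmd)
    return job.returncode
```

For example, calling run_job_step with a prolog command that exits with status 1 returns None without ever launching the job command.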
The negotiator daemon maintains status of each job and machine
in the cluster and responds to queries from the llstatus and
llq commands. The negotiator daemon runs on a single machine
in the cluster (the central manager machine). This daemon is started,
restarted, signalled, and stopped by the master daemon.
The negotiator daemon receives status messages from each schedd and startd
daemon running in the cluster. The negotiator daemon tracks:
- Which schedd daemons are running
- Which startd daemons are running, and the status of each startd
machine.
If the negotiator does not receive an update from any machine within the
time period defined by the MACHINE_UPDATE_INTERVAL keyword, it
assumes that machine is down and therefore the schedd and startd daemons are
also down.
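The timeout check described above amounts to comparing each machine's last update time with the configured interval. The sketch below assumes a hypothetical dictionary that maps machine names to the time, in seconds, of their most recent update; it does not reflect the negotiator's actual data structures.

```python
def machines_presumed_down(last_update, now, machine_update_interval):
    """Return the machines whose last update is older than the interval.

    last_update: dict of machine name -> time of last status update (seconds)
    machine_update_interval: value of MACHINE_UPDATE_INTERVAL (seconds)
    """
    return sorted(
        name
        for name, seen in last_update.items()
        if now - seen > machine_update_interval
    )
```

The schedd and startd daemons on every machine returned here would then be treated as down as well.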
The negotiator also maintains in its memory several queues and tables which
determine where the job should run.
The negotiator performs the following functions:
- Receives and records job status changes from the schedd daemon.
- Schedules jobs based on a variety of scheduling criteria and policy
options. Once a job is selected, the negotiator contacts the schedd
that originally created the job.
- Handles requests to:
- Set priorities
- Query about jobs
- Remove a job
- Hold or release a job
- Favor or unfavor a user or a job.
- Receives notification of schedd resets indicating that a schedd has
restarted.
The kbdd daemon monitors keyboard and mouse activity. The
kbdd daemon is spawned by the master daemon if the X_RUNS_HERE
keyword in the configuration file is set to true.
The kbdd daemon notifies the startd daemon when it detects keyboard or
mouse activity; however, kbdd is not interrupt driven. It
sleeps for the number of seconds defined by the POLLING_FREQUENCY
keyword in the LoadLeveler configuration file, and then determines if X
events, in the form of mouse or keyboard activity, have occurred. For
more information on the configuration file, see Chapter 5. "Administering and Configuring LoadLeveler".
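Because kbdd polls rather than reacting to interrupts, its behavior resembles a sleep-and-check loop. The sketch below is an assumption-laden illustration: the function name, the two callbacks, and the bounded cycle count are all hypothetical, and the real daemon queries the X server instead of calling a Python function.

```python
import time


def poll_for_activity(activity_seen, notify_startd, polling_frequency,
                      cycles, sleep=time.sleep):
    """Sleep, check for keyboard or mouse activity, and notify if any was seen.

    activity_seen: callable returning True if X events occurred (hypothetical)
    notify_startd: callable invoked when activity is detected (hypothetical)
    polling_frequency: value of POLLING_FREQUENCY, in seconds
    cycles: number of polling cycles to run (a real daemon loops forever)
    Returns the number of notifications sent.
    """
    sent = 0
    for _ in range(cycles):
        sleep(polling_frequency)   # not interrupt driven: wait, then look
        if activity_seen():
            notify_startd()
            sent += 1
    return sent
```

One consequence of this design is that activity can go unreported for up to POLLING_FREQUENCY seconds.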
As LoadLeveler processes a job, the job moves into various states.
Some states are unique to specific daemons; for example, only the negotiator
places a job in the NotQueued state. For more information on daemons,
see "Daemons and Processes". Possible job states are:
- Completed
- The job has completed.
- Deferred
- The job will not be assigned to a machine until a specified date.
This date may have been specified by the user in the job command file, or may
have been generated by the negotiator because a parallel job did not
accumulate enough machines to run the job. (Only the negotiator places
a job in the Deferred state.)
- Idle
- The job is being considered to run on a machine, though no machine has
been selected.
- NotQueued
- The job is not being considered to run on a machine. A job can
enter this state because the associated schedd is down, the user or group
associated with the job is at its maximum maxqueued or
maxidle value, or because the job has a dependency which cannot be
determined. For more information on these keywords, see "Controlling the Mix of Idle and Running Jobs". (Only the negotiator places a job in the NotQueued state.)
- Not Run
- The job will never be run because a dependency associated with the job was
found to be false.
- Pending
- The job is in the process of starting on one or more machines. (The
negotiator indicates this state until the schedd acknowledges that it has
received the request to start the job. Then the negotiator changes the
state of the job to Starting. The schedd indicates the Pending state
until all startd machines have acknowledged receipt of the start
request. The schedd then changes the state of the job to
Starting.)
- Reject Pending
- The job did not start. Possible reasons why a job is rejected
are: job requirements were not met on the target machine, or the user ID
of the person running the job is not valid on the target machine. After
a job leaves the Reject Pending state, it is moved into one of the following
states: Idle, User Hold, or Removed.
- Removed
- The job was removed (cancelled), either by LoadLeveler or by the
user.
- Remove Pending
- The job is in the process of being removed, but not all associated
machines have acknowledged the removal of the job.
- Running
- The job is running: the job was dispatched and has started on the
designated machine.
- Starting
- The job is starting: the job was dispatched, was received by the
target machine, and LoadLeveler is setting up the environment in which to run
the job. For a parallel job, LoadLeveler sets up the environment on all
required nodes. See the description of the "Pending" state for
more information on when the negotiator or the schedd daemon moves a job into
the Starting state.
- System Hold
- The job has been put in system hold.
- System User Hold
- The job has been put in system hold and user hold.
- User Hold
- The job has been put in user hold.
- Vacate
- The job started but did not complete. The negotiator will
reschedule the job (provided the job is allowed to be rescheduled).
Possible reasons why a job moves to the Vacate state are: the machine
where the job was running was flushed, the VACATE expression in the
configuration file evaluated to True, or LoadLeveler detected a condition
indicating the job needed to be vacated. For more information on the
VACATE expression, see "Step 7: Manage a Job's Status Using Control Expressions".
You may also see other states that include "Pending," such as
Complete Pending and Vacate Pending. These are intermediate, temporary
states usually associated with parallel jobs.
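The Pending entry in the list above describes two daemons that each move a job from Pending to Starting on a different acknowledgment. A minimal sketch of those two views follows; the function names and the dictionary of acknowledgments are hypothetical.

```python
def negotiator_view(schedd_acknowledged):
    """State shown by the negotiator: Pending until the schedd acknowledges."""
    return "Starting" if schedd_acknowledged else "Pending"


def schedd_view(startd_acks):
    """State shown by the schedd: Pending until every startd acknowledges.

    startd_acks: dict of machine name -> True once that startd has
    acknowledged receipt of the start request.
    """
    all_acked = bool(startd_acks) and all(startd_acks.values())
    return "Starting" if all_acked else "Pending"
```

For a parallel job on two machines, the schedd keeps reporting Pending while only one of them has acknowledged the start request.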