Using and Administering
The schedd daemon receives jobs sent by the llsubmit
command and schedules those jobs to machines selected by the negotiator
daemon. The schedd daemon is started, restarted, signalled, and stopped
by the master daemon.
The schedd daemon can be in any one of the following states:
- Available
  This machine is available to schedule jobs.
- Draining
  The schedd daemon has been drained by the administrator but some jobs are
  still running. The state of the machine remains Draining until all
  running jobs complete. At that time, the machine status changes to
  Drained.
- Drained
  The schedd machine accepts no more jobs; jobs in the Starting or
  Running state are allowed to continue running, and jobs in the Idle state are
  drained, meaning they will not get dispatched.
- Down
  The daemon is not running on this machine. The schedd daemon enters
  this state when it has not reported its status to the negotiator. This
  can occur when the machine is actually down, or because there is a network
  failure.
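The state descriptions above can be sketched as a small model. The following Python is an illustration only, not LoadLeveler code: the names ScheddState, state_after_drain, and accepts_new_jobs are hypothetical, and treating Draining as a state that refuses new jobs is an assumption that the text implies but does not state.

```python
from enum import Enum


class ScheddState(Enum):
    """The four schedd states named in this section."""
    AVAILABLE = "Available"
    DRAINING = "Draining"
    DRAINED = "Drained"
    DOWN = "Down"


def state_after_drain(running_jobs: int) -> ScheddState:
    """State reported once the administrator has drained schedd.

    The machine stays Draining while any job is still running and
    becomes Drained when all running jobs have completed.
    """
    if running_jobs < 0:
        raise ValueError("running_jobs must not be negative")
    return ScheddState.DRAINING if running_jobs > 0 else ScheddState.DRAINED


def accepts_new_jobs(state: ScheddState) -> bool:
    """Assumption: only an Available schedd takes new submissions."""
    return state is ScheddState.AVAILABLE
```

Under these assumptions, state_after_drain(3) yields ScheddState.DRAINING and state_after_drain(0) yields ScheddState.DRAINED. The Down state is not derived here because it reflects missing status reports to the negotiator rather than an administrator action.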
The schedd daemon performs the following functions:
- Assigns new job ids when requested by the job submission process (for
example, by the llsubmit command).
- Receives new jobs from the llsubmit command. A new job is received as a
job object for each job step. A job object is the data structure in memory
containing all the information about a job step. The schedd forwards the
job object to the negotiator daemon as soon as it is received from the
submit command.
- Maintains on-disk copies of jobs submitted locally (on this machine) that
are either waiting or running on a remote (different) machine. The
central manager can use this information to reconstruct the job information in
the event of a failure. This information is also used for accounting
purposes.
- Responds to directives sent by the administrator through the negotiator
daemon. The directives include:
  - Run a job.
  - Change the priority of a job.
  - Remove a job.
  - Hold or release a job.
  - Send information about all jobs.
- Sends job events to the negotiator daemon when:
  - schedd is restarting.
  - A new series of job objects is arriving.
  - A job is started.
  - A job was rejected, completed, removed, or vacated. schedd
    determines the status by examining the exit status returned by the
    startd.
- Communicates with the Parallel Operating Environment (POE) when you run a
POE job.
- Requests that a remote startd daemon kill a job.
- Handles the checkpoint file associated with the job, provided
checkpointing has been enabled. For more information, see Step 14: Enable Checkpointing.
- Receives accounting information from startd.
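Several of the functions listed above (assigning job ids, receiving one job object per job step, and responding to directives) can be pictured with a toy in-memory model. This is a minimal sketch under stated assumptions, not the schedd implementation: ToySchedd, JobStep, Directive, the field names, the default priority of 50, and step numbering from 0 are all hypothetical choices made for illustration. Forwarding to the negotiator, on-disk copies, and accounting are deliberately left out.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Directive(Enum):
    """Directives listed in this section (names are hypothetical)."""
    RUN = auto()
    CHANGE_PRIORITY = auto()
    REMOVE = auto()
    HOLD = auto()
    RELEASE = auto()
    SEND_ALL_JOBS = auto()


@dataclass
class JobStep:
    """Hypothetical stand-in for a job object: one per job step."""
    job_id: int
    step: int
    priority: int = 50      # illustrative default, not a LoadLeveler value
    held: bool = False
    started: bool = False


@dataclass
class ToySchedd:
    """Toy model: keeps job objects in a dict keyed by (job_id, step)."""
    next_job_id: int = 1
    queue: dict = field(default_factory=dict)

    def assign_job_id(self) -> int:
        # Hand out a new job id when the submission process asks for one.
        job_id = self.next_job_id
        self.next_job_id += 1
        return job_id

    def receive_job(self, step_count: int) -> list:
        # A new job arrives as one job object per job step.
        job_id = self.assign_job_id()
        steps = [JobStep(job_id, n) for n in range(step_count)]
        for s in steps:
            self.queue[(job_id, s.step)] = s
        return steps

    def handle(self, directive, key=None, priority=None):
        # Apply one directive; key is a (job_id, step) tuple.
        if directive is Directive.SEND_ALL_JOBS:
            return list(self.queue.values())
        step = self.queue[key]
        if directive is Directive.RUN:
            step.started = True
        elif directive is Directive.CHANGE_PRIORITY:
            step.priority = priority
        elif directive is Directive.HOLD:
            step.held = True
        elif directive is Directive.RELEASE:
            step.held = False
        elif directive is Directive.REMOVE:
            del self.queue[key]
        return step
```

For example, calling receive_job(2) on a fresh ToySchedd creates two job objects that share one job id, and handle(Directive.HOLD, (1, 0)) marks the first step as held. A dict keyed by job id and step was chosen only to keep lookups simple in the sketch.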