Using and Administering
As LoadLeveler processes a job, the job moves into various states.
Some states are unique to specific daemons; for example, only the
negotiator places a job in the NotQueued state. For more information on
daemons, see Daemons and Processes. Possible job states are:
- Cancelled
- The job was cancelled either by a user or by an administrator.
- Completed
- The job has completed.
- Deferred
- The job will not be assigned to a machine until a specified date.
This date may have been specified by the user in the job command file, or may
have been generated by the negotiator because a parallel job did not
accumulate enough machines to run the job. (Only the negotiator places
a job in the Deferred state.)
- Idle
- The job is being considered to run on a machine, though no machine has
been selected.
- NotQueued
- The job is not being considered to run on a machine. A job can
enter this state because the associated schedd is down, because the user or
group associated with the job has reached its maxqueued or maxidle limit, or
because the job has a dependency that cannot be determined. For more
information on these keywords, see Controlling the Mix of Idle and Running
Jobs. (Only the negotiator places a job in the NotQueued state.)
- Not Run
- The job will never be run because a dependency associated with the job was
found to be false.
- Pending
- The job is in the process of starting on one or more machines. (The
negotiator indicates this state until the schedd acknowledges that it has
received the request to start the job. Then the negotiator changes the
state of the job to Starting. The schedd indicates the Pending state
until all startd machines have acknowledged receipt of the start
request. The schedd then changes the state of the job to
Starting.)
- Reject Pending
- The job did not start. A job can be rejected because its
requirements were not met on the target machine, or because the user ID
of the person running the job is not valid on the target machine. After
a job leaves the Reject Pending state, it moves into one of the following
states: Idle, User Hold, or Removed.
- Removed
- The job was stopped by LoadLeveler.
- Remove Pending
- The job is in the process of being removed, but not all associated
machines have acknowledged the removal of the job.
- Running
- The job is running: the job was dispatched and has started on the
designated machine.
- Starting
- The job is starting: the job was dispatched, was received by the
target machine, and LoadLeveler is setting up the environment in which to run
the job. For a parallel job, LoadLeveler sets up the environment on all
required nodes. See the description of the "Pending" state for
more information on when the negotiator or the schedd daemon moves a job into
the Starting state.
- System Hold
- The job has been put in system hold.
- System User Hold
- The job has been put in system hold and user hold.
- Terminated
- The job completed while the negotiator and schedd daemons were
temporarily unable to exchange job status information because of
communication problems. A job that completes during such a period is
removed from the schedd's list of active jobs. When communication resumes
between the two daemons, the negotiator moves the job to the Terminated
state, where it remains for a set period of time (specified by the
NEGOTIATOR_REMOVE_COMPLETED keyword in the configuration file). When this
time has passed, the negotiator removes the job from its active list.
- User Hold
- The job has been put in user hold.
- Vacated
- The job started but did not complete. The negotiator will
reschedule the job (provided the job is allowed to be rescheduled).
Possible reasons why a job moves to the Vacated state are: the machine
where the job was running was flushed, the VACATE expression in the
configuration file evaluated to True, or LoadLeveler detected a condition
indicating the job needed to be vacated. For more information on the
VACATE expression, see Step 8: Manage a Job's Status Using Control Expressions.
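The transitions that the list above states explicitly can be collected into a
small lookup table. The following Python sketch is illustrative only: it is
not part of LoadLeveler, and the names DOCUMENTED_TRANSITIONS and can_move are
hypothetical. It records just the moves named above (Pending to Starting, and
Reject Pending to Idle, User Hold, or Removed) and makes no claim about any
other transition.

```python
# Illustrative only: not a LoadLeveler API. The table holds just the
# transitions that the state descriptions above spell out.
DOCUMENTED_TRANSITIONS = {
    "Pending": {"Starting"},
    "Reject Pending": {"Idle", "User Hold", "Removed"},
}


def can_move(current, target):
    """Return True if the text above documents a move from current to target.

    A False result means only that the move is not documented in this
    section, not that LoadLeveler forbids it.
    """
    return target in DOCUMENTED_TRANSITIONS.get(current, set())


if __name__ == "__main__":
    print(can_move("Reject Pending", "Idle"))
    print(can_move("Pending", "Running"))
```

A table like this could help a monitoring script flag a state change it does
not expect, but it would need to be extended from the rest of the
documentation before being relied on.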
You may also see other states that include "Pending," such as
Complete Pending and Vacate Pending. These are intermediate, temporary
states usually associated with parallel jobs.
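A script that groups jobs by state may want to set these transitional states
aside. The Python sketch below is one possible way to do that, assuming the
script receives full state names spelled as in this section; the function
names are hypothetical and are not part of LoadLeveler. Note that the check
also matches the plain Pending state, as well as Reject Pending and Remove
Pending.

```python
# Illustrative only: works on the full state names as spelled in this
# section, not on the output of any particular LoadLeveler command.
def is_pending_state(state):
    """Return True if the last word of the state name is 'Pending'."""
    words = state.split()
    return bool(words) and words[-1] == "Pending"


def split_by_pending(states):
    """Partition state names into (pending, other) lists, keeping order."""
    pending = [s for s in states if is_pending_state(s)]
    other = [s for s in states if not is_pending_state(s)]
    return pending, other
```

For example, split_by_pending(["Running", "Vacate Pending", "Idle"]) places
"Vacate Pending" in the first list and the other two names in the second.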