Using and Administering

llctl - Control LoadLeveler Daemons

Purpose

Controls LoadLeveler daemons on all members of the LoadLeveler cluster.

Syntax

llctl [-?] [-H] [-v] [-q] [-g | -h host] [keyword]

Flags

-?
Provides a short usage message.

-H
Provides extended help information.

-v
Outputs the name of the command, release number, service level, service level date, and operating system used to build the command.

-q
Specifies quiet mode: print no messages other than error messages.

-g
Indicates that the command applies globally to all machines in the administration file.

-h host
Indicates that the command applies only to the machine host in the LoadLeveler cluster. If neither -h nor -g is specified, the command applies to the machine on which llctl is issued.

keyword
Must be specified after all flags and can be one of the following:

purge list_of_machines
Forces a schedd to delete any queued transactions to the machines in list_of_machines. This option is not necessary if all jobs on the listed machines have completed and no messages are pending to those machines.

This option is intended for recovery and cleanup after a machine has permanently crashed or was inadvertently removed from the LoadLeveler cluster before all activity on it was quiesced. Do not use this option unless the machines in list_of_machines are guaranteed not to return to the LoadLeveler cluster.

If you need to return a purged machine to the cluster later, you must first clear all files from its spool and execute directories.

capture eventname
Captures accounting data for all jobs running on the designated machines. eventname is the name you associate with the data, and must be a character string containing no blanks. For more information, see Collecting Job Resource Data Based on Events.

drain [schedd|startd [classlist|allclasses]]
When you issue drain with no options, (1) no more LoadLeveler jobs can begin running on this machine, and (2) no more LoadLeveler jobs can be submitted through this machine.

When you issue drain schedd, (1) the schedd machine accepts no more LoadLeveler jobs for submission, (2) jobs in the Starting or Running state in the schedd queue are allowed to continue running, and (3) jobs in the Idle state in the schedd queue are drained, meaning they are not dispatched.

When you issue drain startd, (1) the startd machine accepts no more LoadLeveler jobs to be run, and (2) jobs already running on the startd machine are allowed to complete.

When you issue drain startd classlist, the specified classes that are available on the startd machine are drained (made unavailable).

When you issue drain startd allclasses, all available classes on the startd machine are drained.

flush
Terminates running jobs on this machine and sends them back, in the Idle state, to the negotiator to await redispatch (provided restart=yes is specified in the job command file). No new jobs are sent to this machine until resume is issued. If jobs are enabled for checkpointing, flush forces a checkpoint; however, the checkpoint is canceled if it does not complete within five minutes.

purgeschedd
Requests that all jobs scheduled by the specified host machine be purged (removed). To use this keyword, you must first specify schedd_fenced=true in the machine stanza for this host. The -g option cannot be specified with this keyword. For more information, see "How Do I Recover Resources Allocated by a schedd Machine?" in the IBM LoadLeveler for AIX: Diagnosis and Messages Guide.

reconfig
Forces all daemons to reread the configuration files.

recycle
Stops all LoadLeveler daemons and restarts them.

resume [schedd|startd [classlist|allclasses]]
When you issue resume with no options, job submission and job execution on this machine are resumed.

When you issue resume schedd, the schedd machine resumes the submission of jobs.

When you issue resume startd, the startd machine resumes the execution of jobs.

When you issue resume startd classlist, the startd machine resumes the execution of the specified job classes that are also configured (defined) on the machine.

When you issue resume startd allclasses, the startd machine resumes the execution of all configured classes.

start
Starts the LoadLeveler daemons on the specified machine. You must have rsh privileges to start LoadLeveler on a remote machine.

stop
Stops the LoadLeveler daemons on the specified machine.

suspend
Suspends all jobs on this machine. This is not supported for parallel jobs.

version
Displays version and release data on the screen.

Description

This command sends a message to the master daemon on the target machine requesting that action be taken on the members of the LoadLeveler cluster. Note the following when using this command:

Examples

This example stops LoadLeveler on the machine named iron:

llctl -h iron stop

This example starts the LoadLeveler daemons on all members of the LoadLeveler cluster, starting with the central manager, as defined in the machine stanzas of the administration file:

llctl -g start

This example causes the LoadLeveler daemons on machine iron to reread the configuration files, which may contain new configuration information for iron:

llctl -h iron reconfig
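Because -h names a single machine, reconfiguring (or otherwise controlling) a few machines without touching every machine in the administration file, as -g would, takes one -h invocation per host. The following is a minimal POSIX shell sketch under that assumption; the function name llctl_each_host and the LLCTL override variable are hypothetical conveniences, not part of LoadLeveler.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with LoadLeveler: runs one llctl keyword
# on each host in a blank-separated list, using "-h host" per invocation.
# Set LLCTL to another command (for example a dry-run function) to test it.
LLCTL=${LLCTL:-llctl}

llctl_each_host() {
    # Usage: llctl_each_host "host1 host2 ..." keyword [keyword arguments...]
    if [ "$#" -lt 2 ]; then
        echo "usage: llctl_each_host \"host...\" keyword [args...]" >&2
        return 2
    fi
    hosts=$1
    shift
    rc=0
    for host in $hosts; do
        # Flags come before the keyword, as the llctl syntax requires.
        "$LLCTL" -h "$host" "$@" || rc=1
    done
    return "$rc"
}
```

For example, llctl_each_host "iron earth" reconfig issues llctl -h iron reconfig and then llctl -h earth reconfig, and returns a nonzero status if either call fails. Continuing past a failed host, rather than stopping, is a design choice of this sketch.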

For the following examples, suppose the classes small, medium, and large are available on the machine called iron.

This example drains the classes medium and large on the machine named iron.

llctl -h iron drain startd medium large

This example drains the classes medium and large on all machines.

llctl -g drain startd medium large

This example stops all the jobs on the system, then allows only jobs of a certain class (medium) to run.

llctl -g drain startd allclasses
llctl -g flush
llctl -g resume
llctl -g resume startd medium

This example resumes the classes medium and large on the machine named iron.

llctl -h iron resume startd medium large
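The drain and resume keywords share one grammar, drain|resume [schedd|startd [classlist|allclasses]], in which a class list is accepted only after startd. A site script that builds these commands might check that grammar before calling llctl. The sketch below assumes a POSIX shell; the function name class_ctl and the LLCTL override are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper that checks the drain/resume grammar before calling:
#   llctl -h host drain|resume [schedd | startd [classlist | allclasses]]
LLCTL=${LLCTL:-llctl}

class_ctl() {
    # Usage: class_ctl host drain|resume [schedd | startd [class... | allclasses]]
    if [ "$#" -lt 2 ]; then
        echo "usage: class_ctl host drain|resume [schedd|startd [classes]]" >&2
        return 2
    fi
    host=$1
    action=$2
    shift 2
    case $action in
        drain | resume) ;;
        *) echo "class_ctl: action must be drain or resume" >&2; return 2 ;;
    esac
    if [ "$#" -gt 0 ]; then
        case $1 in
            startd) ;;
            schedd)
                # Class lists and allclasses apply only to the startd.
                if [ "$#" -gt 1 ]; then
                    echo "class_ctl: classes can be given only with startd" >&2
                    return 2
                fi
                ;;
            *) echo "class_ctl: expected schedd or startd, got '$1'" >&2; return 2 ;;
        esac
    fi
    "$LLCTL" -h "$host" "$action" "$@"
}
```

With this sketch, class_ctl iron resume startd medium large produces the same command as the example above, while class_ctl iron drain schedd medium is rejected before llctl is called.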

This example illustrates how to capture accounting information on a work shift called day on the machine iron:

llctl -h iron capture day

You can capture accounting information on all the machines in the LoadLeveler cluster by using the -g option, or you can collect accounting information on the local machine by simply issuing the following:

llctl capture day

Capturing information on the local machine is the default. For more information, see Collecting Job Resource Data Based on Events.
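Because eventname must be a character string containing no blanks, a script that takes the event name from user input might validate it before contacting any daemon. This is a hypothetical POSIX shell sketch; capture_event and the LLCTL override are illustrative names, and optional -g or -h flags are passed through unchanged.

```shell
#!/bin/sh
# Hypothetical wrapper around "llctl [flags] capture eventname" that rejects
# event names containing blanks before any daemon is contacted.
LLCTL=${LLCTL:-llctl}

capture_event() {
    # Usage: capture_event eventname [-g | -h host]
    if [ "$#" -lt 1 ]; then
        echo "usage: capture_event eventname [-g | -h host]" >&2
        return 2
    fi
    eventname=$1
    shift
    case $eventname in
        "" | *[[:space:]]*)
            echo "capture_event: event name must be non-empty and contain no blanks" >&2
            return 1
            ;;
    esac
    # Any -g or -h flags must precede the capture keyword.
    "$LLCTL" "$@" capture "$eventname"
}
```

Here capture_event day -h iron issues llctl -h iron capture day, capture_event day with no flags captures on the local machine, and capture_event "day shift" fails without running llctl.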

Assume the machine earth has crashed while running jobs and its hard disk needs to be replaced. You try to cancel the jobs that were running on that machine, but the schedd marks the jobs Remove Pending until it gets confirmation from earth that they were removed. Because earth will be reinstalled, you need to tell the schedd not to wait for that confirmation.

Assume the schedd is named mars, and the running jobs are named mars.1.0 and mars.1.1. First, tell the negotiator to remove the jobs:

llcancel mars.1.0
llcancel mars.1.1

Next, tell the schedd not to wait for confirmation from earth before marking the jobs removed:

llctl -h mars purge earth

Results

The following shows the result of the llctl -h mars purge earth command:

llctl: Sent purge command to host mars
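The cancel-and-purge recovery above can be scripted for a larger number of jobs. The following POSIX shell sketch is hypothetical: the function name recover_dead_machine, the LLCANCEL and LLCTL override variables, and the CONFIRM_PERMANENT_REMOVAL safety check are illustrative site conventions, not LoadLeveler features. It issues only the llcancel job_id and llctl -h host purge machine commands shown in the example.

```shell
#!/bin/sh
# Hypothetical script form of the earth/mars recovery steps: cancel the jobs
# that were running on a dead machine, then tell their schedd (via purge) not
# to wait for confirmation from that machine. Not part of LoadLeveler.
# LLCANCEL and LLCTL can be overridden, for example with dry-run functions.
LLCANCEL=${LLCANCEL:-llcancel}
LLCTL=${LLCTL:-llctl}

recover_dead_machine() {
    # Usage: recover_dead_machine schedd_host dead_machine job_id [job_id...]
    if [ "$#" -lt 3 ]; then
        echo "usage: recover_dead_machine schedd_host dead_machine job_id..." >&2
        return 2
    fi
    # purge is intended only for machines that will not rejoin the cluster
    # with their old spool and execute directories, so require confirmation.
    if [ "${CONFIRM_PERMANENT_REMOVAL:-}" != "yes" ]; then
        echo "refusing: set CONFIRM_PERMANENT_REMOVAL=yes to confirm" >&2
        return 1
    fi
    schedd_host=$1
    dead_machine=$2
    shift 2
    for job in "$@"; do
        # Step 1: cancel each job; the schedd marks it Remove Pending.
        "$LLCANCEL" "$job" || return 1
    done
    # Step 2: stop the schedd from waiting for the dead machine to confirm.
    "$LLCTL" -h "$schedd_host" purge "$dead_machine"
}
```

With CONFIRM_PERMANENT_REMOVAL=yes set, recover_dead_machine mars earth mars.1.0 mars.1.1 issues the same three commands as the example above. Stopping at the first failed llcancel, so that purge is never sent for a partially canceled set of jobs, is a choice of this sketch.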

