LoadLeveler provides two types of commands: those that are available to all users of LoadLeveler, and those that are reserved for LoadLeveler administrators. (Administrators are identified by the LOADL_ADMIN keyword in the configuration file.)
The administrator commands can operate on the entire LoadLeveler job queue and all machines configured. The user commands mainly affect those jobs submitted by that user. Some commands, such as llhold, include options that can only be performed by an administrator.
The following table summarizes the LoadLeveler commands:
Command | Description | Who Can Issue? | For More Information |
---|---|---|---|
llacctmrg | Collects all individual machine history files together into a single file. | Administrators | See page llacctmrg - Collect machine history files |
llcancel | Cancels a submitted job. | Users and Administrators | See page llcancel - Cancel a Submitted Job |
llclass | Returns information about LoadLeveler classes. | Users and Administrators | See page llclass - Query Class Information |
llctl | Controls daemons on one or more machines in the LoadLeveler cluster. | Administrators | See page llctl - Control LoadLeveler Daemons |
llextSDR | Extracts adapter information from the system data repository (SDR). | Users and Administrators | See page llextSDR - Extract adapter information from the SDR |
llfavorjob | Raises one or more jobs to the highest priority, or restores original priority. | Administrators | See page llfavorjob - Reorder System Queue by Job |
llfavoruser | Raises job(s) submitted by one or more users to the highest priority, or restores original priority. | Administrators | See page llfavoruser - Reorder System Queue by User |
llhold | Holds or releases a hold on a job. | Users and Administrators | See page llhold - Hold or Release a Submitted Job |
llinit | Initializes a new machine as a member of the LoadLeveler cluster. | Administrators | See page llinit - Initialize Machines in the LoadLeveler Cluster |
llprio | Changes the user priority of a submitted job step. | Users and Administrators | See page llprio - Change the User Priority of Submitted Job Steps |
llq | Queries the status of LoadLeveler jobs. | Users and Administrators | See page llq - Query Job Status |
llstatus | Queries the status of LoadLeveler machines. | Users and Administrators | See page llstatus - Query Machine Status |
llsubmit | Submits a job. | Users and Administrators | See page llsubmit - Submit a Job |
llsummary | Returns resource information on completed jobs. | Administrators | See page llsummary - Return Job Resource Information for Accounting |
Purpose
Collects individual machine history files together into a single file specified as a parameter.
Syntax
llacctmrg [-?] [-H] [-v] [-h hostlist] [-d directory]
Flags
Description
This command by default collects data from all the machines identified in the administration file. To override the default, specify a machine or a list of machines using the -h flag.
When the llacctmrg command ends, accounting information is stored in a file called globalhist.YYYYMMDDHHmm, where YYYYMMDDHHmm is the year, month, day, hour, and minute at which the file was created. This file stores information such as the amount of resources consumed by each job and other job-related data.
You can use this file as input to the llsummary command. For example, if you created the file globalhist.199808301050, you can issue llsummary globalhist.199808301050 to record information on all machines.
Data on processes that fork child processes is included in the file only if the parent process waits for the child process to end. Therefore, complete data may not be collected for jobs that are not composed of simple parent/child processes. For example, if a LoadLeveler job invokes an rsh command to execute a function on another machine, the resources consumed on the other machine are not collected as part of the accounting data.
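As an illustration of the naming convention described above, the following sketch builds a filename of the same form with date(1). This is illustrative only, not a LoadLeveler command; the actual timestamp is chosen by llacctmrg when it writes the file:

```shell
# Illustrative only: build a filename following the globalhist.YYYYMMDDHHmm pattern
fname="globalhist.$(date +%Y%m%d%H%M)"
echo "$fname"
```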
Examples
The following example collects data from machines named mars and pluto.
llacctmrg -h mars pluto
The following example collects data from the machine named mars and places the data in an existing directory called merge.
llacctmrg -h mars -d merge
Results
The following shows a sample system response from the llacctmrg -h mars -d merge command.
llacctmrg: History transferred successfully from mars (10080 bytes)
Purpose
Cancels one or more jobs from the LoadLeveler queue.
Syntax
llcancel [-?] [-H] [-v] [-q] [-u userlist] [-h hostlist] [joblist]
Flags
The -u or -h flags override the host.jobid.stepid parameters.
When the -h flag is specified by a non-administrator, all jobs submitted from the machines in hostlist by the user issuing the command are cancelled.
When the -h flag is specified by an administrator, all jobs submitted by the administrator are cancelled, unless the -u flag is also specified, in which case all jobs that were both submitted by users in userlist and monitored on machines in hostlist are cancelled.
Group administrators and class administrators are considered normal users unless they are also LoadLeveler administrators.
Description
When you issue llcancel, the command is sent to the negotiator. You should then use the llq command to verify your job was cancelled. A job state of RM (Removed) indicates the job was cancelled. A job state of RP (Remove Pending) indicates the job is in the process of being cancelled.
When cancelling a job from a submit-only machine, you must specify the name of the machine that scheduled the job. For example, if you submitted the job from machine A (a submit-only machine) and machine B (a scheduling machine) scheduled the job to run, you must specify machine B's name in the cancel command. If machines A and B are in different sub-domains, you must specify the fully-qualified name of the job in the cancel command. You can use the llq -l command to determine the fully-qualified name of the job.
Examples
This example cancels job step 3 of job 18, which is scheduled by the machine named bronze:
llcancel bronze.18.3
This example cancels all job steps of job 8, which is scheduled by the machine named gold:
llcancel gold.8
Results
The following shows a sample system response for the llcancel gold.8 command.
llcancel: Cancel command has been sent to the central manager.
Purpose
Returns information about classes.
Syntax
llclass [-?] [-H] [-v] [-l] [classlist]
Flags
If you have more than a few classes configured for LoadLeveler, consider redirecting the output to a file when you use the -l flag.
Examples
This example generates a long listing for classes named silver and gold.
llclass -l silver gold
The Standard Listing: The standard listing is generated when you do not specify the -l flag with the llclass command. The following is sample output from the llclass silver command, where five slots are configured for the silver class in the cluster, with one silver-class job currently running.
Name      MaxJobCPU   MaxProcCPU  Free   Max    Description
          d+hh:mm:ss  d+hh:mm:ss  Slots  Slots

silver    0+00:30:00  0+00:10:00  4      5      silver grade jobs
The standard listing includes the following fields:
The Long Listing: The long listing is generated when you specify the -l flag with the llclass command. The following is sample output from the llclass -l silver command, where five slots are configured for the silver class in the cluster, with one silver-class job currently running.
=============== Class silver ==========
               Name: silver
           priority: 50
              admin: brownap
          NQS_class: F
         NQS_submit:
          NQS_query:
     max_processors: 1
           max_jobs: 3
      class_comment: silver grade jobs
   wall_clock_limit: 0+00:60:00, -1
      job_cpu_limit: 0+00:30:00, -1
          cpu_limit: 0+00:10:00, -1
         data_limit: -1, -1
         core_limit: -1, -1
         file_limit: -1, -1
        stack_limit: -1, -1
          rss_limit: -1, -1
               nice: 15
               free: 4
            maximum: 5
The long listing includes these fields:
Related Information
Each machine periodically updates the central manager with a snapshot of its environment. Since the information returned by llclass is a collection of these snapshots, all taken at varying times, the total picture may not be completely consistent.
Purpose
Controls LoadLeveler daemons on all members of the LoadLeveler cluster.
Syntax
llctl [-?] [-H] [-v] [-q] [-g | -h host] keyword
Flags
This option is intended for recovery and cleanup after a machine has permanently crashed or was inadvertently removed from the LoadLeveler cluster before all activity on it was quiesced. Do not use this option unless the machines specified are guaranteed not to return to the LoadLeveler cluster.
If you need to return the machine to the cluster later, you must clear all files from the spool and execute directories of the machine that was deleted.
Description
This command sends a message to the master daemon on the target machine requesting that action be taken on the members of the LoadLeveler cluster. Note the following when using this command:
Draining all the classes on a startd machine is not equivalent to draining the startd machine. When you drain all the classes, the startd enters the Idle state. When you drain the startd, the startd enters the Drained state. Similarly, resuming all the classes on a startd machine is not equivalent to resuming the startd machine.
If a serial job is running on a machine that receives the llctl recycle command, or the llctl stop and llctl start commands, the running job is terminated. You can restart the job by resubmitting the job or by enabling checkpointing and specifying the restart=yes option in the job command file.
Examples
This example stops LoadLeveler on the machine named iron:
llctl -h iron stop
This example starts the LoadLeveler daemons on all members of the LoadLeveler cluster, starting with the central manager, as defined in the machine stanzas of the administration file:
llctl -g start
This example causes the LoadLeveler daemons on machine iron to re-read the configuration files, which may contain new configuration information for the iron machine:
llctl -h iron reconfig
For the next three examples, suppose the classes small, medium, and large are available on the machine called iron.
This example drains the classes medium and large on the machine named iron.
llctl -h iron drain startd medium large
This example drains the classes medium and large on all machines.
llctl -g drain medium large
This example resumes the classes medium and large on the machine named iron.
llctl -h iron resume startd medium large
This example illustrates how to capture accounting information on a work shift called day on the machine iron:
llctl -h iron capture day
You can capture accounting information on all the machines in the LoadLeveler cluster by using the -g option, or you can collect accounting information on the local machine by simply issuing the following:
llctl capture day
Capturing information on the local machine is the default. For more information, see "Collecting Job Resource Data Based on Events".
Assume the machine earth has crashed while running jobs and its hard disk needs to be replaced. You try to cancel the jobs that were running on that machine. The schedd marks the jobs Remove Pending until it gets confirmation from earth that they were removed. Since earth will be reinstalled, you need to inform the schedd that it should not wait for confirmation.
Assume the schedd is named mars, and the running jobs are named mars.1.0 and mars.1.1. First you want to tell the negotiator to remove the jobs:
llcancel mars.1.0
llcancel mars.1.1
Next, tell the schedd not to wait for confirmation from earth before marking the jobs removed.
llctl -h mars purge earth
Results
The following shows the result of the llctl -h mars purge earth command.
llctl: Sent purge command to host mars
Purpose
Extracts adapter information from the system data repository (SDR) and creates adapter and machine stanzas for each node in an RS/6000 SP partition. You can use the information in these stanzas in the LoadLeveler administration file. This command writes the stanzas to standard output.
Syntax
llextSDR [-?] [-H] [-v] [-a adapter]
Flags
This command is available to users and administrators.
In the SDR, the Node class contains an entry for each node in the SP partition. The Adapter class contains an entry for each adapter configured on a node. This command extracts the information in the Adapter class and creates an adapter stanza. This command also creates a machine stanza which identifies the node and the adapters attached to the node. The generated machine stanza also includes the spacct_excluse_enable keyword, whose value is obtained from the spacct_excluse_enable attribute in the SP class of the SDR. For more information on adapter stanzas, see "Step 5: Specify Adapter Stanzas". For more information on machine stanzas, see "Step 1: Specify Machine Stanzas".
The partition for which information is extracted is either the default partition or that specified with the SP_NAME environment variable. For the control workstation, the default partition is the default system partition. For an SP node, the default partition is the partition to which the node belongs.
You must issue this command on a machine with the ssp.clients fileset installed. If you issue this command from a non-SP workstation, you must set SP_NAME to the IP address of the appropriate SDR instance for the partition.
Examples
The following example creates adapter and machine stanzas for all nodes in a partition:
llextSDR
The following example creates machine stanzas with each node's css0 interface name as the label.
llextSDR -a css0
Results
You may need to alter or add information to the stanzas produced by this command when you incorporate the stanzas into the administration file. For example, administrators may want to have each network_type field use a value that reflects the type of nodes installed on the network. Users will need to know the values used for network_type so that they can specify an appropriate value in their job command files.
Also, the output of this command includes fully-qualified machine names. If your existing administration file uses short names, you may need to change either the command output or your existing administration file so that you use either all fully-qualified names or all short names.
The following shows sample output for the llextSDR command, where the default partition is k4s. This sample output shows the first two nodes in the partition.
k4inst.ppd.pok.ibm.com:
    type = machine
    adapter_stanzas = k4n01.ppd.pok.ibm.com k4sn01.ppd.pok.ibm.com k4inst.ppd.pok.ibm.com
    spacct_excluse_enable = true

k4n01.ppd.pok.ibm.com:
    type = adapter
    adapter_name = en1
    network_type = ethernet
    interface_address = 9.114.45.65
    interface_name = k4n01.ppd.pok.ibm.com

k4sn01.ppd.pok.ibm.com:
    type = adapter
    adapter_name = css0
    network_type = switch
    interface_address = 9.114.45.129
    interface_name = k4sn01.ppd.pok.ibm.com
    switch_node_number = 0

k4inst.ppd.pok.ibm.com:
    type = adapter
    adapter_name = en0
    network_type = ethernet
    interface_address = 9.114.45.1
    interface_name = k4inst.ppd.pok.ibm.com

k4n03.ppd.pok.ibm.com:
    type = machine
    adapter_stanzas = k4sn03.ppd.pok.ibm.com k4n03.ppd.pok.ibm.com
    spacct_excluse_enable = true

k4sn03.ppd.pok.ibm.com:
    type = adapter
    adapter_name = css0
    network_type = switch
    interface_address = 9.114.45.131
    interface_name = k4sn03.ppd.pok.ibm.com
    switch_node_number = 2

k4n03.ppd.pok.ibm.com:
    type = adapter
    adapter_name = en0
    network_type = ethernet
    interface_address = 9.114.45.67
    interface_name = k4n03.ppd.pok.ibm.com
. . .
The following shows sample output for the llextSDR -a css0 command for a single node.
k10sn09.ppd.pok.ibm.com:
    type = machine
    adapter_stanzas = k10sn09.ppd.pok.ibm.com k10n09.ppd.pok.ibm.com
    spacct_excluse_enable = true

k10sn09.ppd.pok.ibm.com:
    type = adapter
    adapter_name = css0
    network_type = switch
    interface_address = 9.114.51.137
    interface_name = k10sn09.ppd.pok.ibm.com
    switch_node_number = 8

k10n09.ppd.pok.ibm.com:
    type = adapter
    adapter_name = en0
    network_type = ethernet
    interface_address = 9.114.51.73
    interface_name = k10n09.ppd.pok.ibm.com
Purpose
Sets specified jobs to a higher system priority than all jobs that are not favored. This command also unfavors previously favored job(s), restoring the original priority, when you specify the -u flag.
Syntax
llfavorjob [-?] [-H] [-v] [-q] [-u] joblist
Flags
Description
If this command is issued against jobs that are already running, it has no effect. If the job vacates, however, and returns to the queue, the job gets re-ordered with the new priority.
If more than one job is affected by this command, the favored jobs are ordered by the sysprio expression and are scanned before jobs that are not favored. However, favored jobs whose requirements do not match any available machine may still run after unfavored jobs. This command remains in effect until reversed with the -u flag.
Examples
This example assigns jobs 12.4 on the machine iron and 8.2 on zinc the highest priorities in the system, with the jobs ordered by the sysprio expression.
llfavorjob iron.12.4 zinc.8.2
This example unfavors jobs 12.4 on the machine iron and 8.2 on the machine zinc.
llfavorjob -u iron.12.4 zinc.8.2
Purpose
Sets a user's job(s) to the highest priority in the system, regardless of the current setting of the job priority. Jobs already running are not affected. This command also unfavors the user's job(s), restoring the original priority, when you specify the -u flag.
Syntax
llfavoruser [-?] [-H] [-v] [-q] [-u] userlist
Flags
Description
This command affects your current and future jobs until you remove the favor.
When the central manager daemon is restarted, any favor applied to users is revoked.
The user's jobs still remain ordered by user priority (which may cause the user's jobs to swap sysprio values). If more than one user is affected by this command, the jobs of favored users are ordered by the sysprio expression and are scanned before the jobs of users who are not favored. However, jobs of favored users whose requirements do not match any available machine may still run after jobs of unfavored users.
Examples
This example grants highest priority to all queued jobs submitted by users ellen and fred according to the sysprio expression.
llfavoruser ellen fred
This example unfavors all queued jobs submitted by users ellen and fred.
llfavoruser -u ellen fred
Purpose
Places jobs in user hold or system hold and releases jobs from both types of hold. Users can only move their own jobs into and out of user hold. Only LoadLeveler administrators can move jobs into and release them from system hold.
Syntax
llhold [-?] [-H] [-v] [-q] [-s] [-r] [-u userlist] [-h hostlist] [joblist]
Flags
If neither -s nor -r is specified, LoadLeveler puts the job(s) in user hold.
Only a LoadLeveler administrator can release jobs from system hold. Only an administrator or the owner of a job can release it from user hold.
When issued by a non-administrator, this option only acts upon jobs that user has submitted to the machines in hostlist.
When issued by an administrator, all jobs monitored on the machines are acted upon unless the -u option is also used. In that case, the userlist is also part of the selection process, and only jobs both submitted by users in userlist and monitored on the machines in the hostlist are acted upon.
If the job was submitted from a submit-only machine, this is the name of the schedd machine that sent the job to the negotiator.
Description
This command does not affect a job step that is running unless the job step attempts to enter the Idle state. At this point, the job step is placed in the Hold state.
To ensure a job is released from both system hold and user hold, the administrator must issue the command with -r specified to release it from system hold. The administrator or the submitting user can reissue the command to release the job from user hold.
This command will fail if:
Examples
This example places job 23, job step 0 and job 19, job step 1 on hold.
llhold 23.0 19.1
This example releases job 23, job step 0; job 19, job step 1; and job 20, job step 3 from a hold state.
llhold -r 23.0 19.1 20.3
This example places all jobs from users abe, barbara, and carol2 in system hold.
llhold -s -u abe barbara carol2
This example releases from a hold state all jobs on machines bronze, iron, and steel.
llhold -r -h bronze iron steel
This example releases from a hold state all jobs on machines bronze, iron, and steel that smith submitted.
llhold -r -u smith -h bronze iron steel
Results
The following shows a sample system response for the llhold -r -h bronze command.
llhold: Hold command has been sent to the central manager.
Purpose
Initializes a new machine as a member of the LoadLeveler hardware resource cluster.
Syntax
llinit [-?] [-H] [-q] [-prompt] [-local pathname] [-release pathname] [-cm machine] [-debug]
Flags
There must be a unique local directory for each LoadLeveler cluster member.
Description
This command runs once on each machine during the installation process. It must be run by the user ID you have defined as the LoadLeveler user ID. The log, spool, and execute directories are created with the correct modes and ownerships. The LoadLeveler configuration and administration files, LoadL_config and LoadL_admin, respectively, are copied from LoadLeveler's release directory to LoadLeveler's home directory. The local configuration file, LoadL_config.local, is copied from LoadLeveler's release directory to LoadLeveler's local directory.
llinit initializes a new machine as a member of the LoadLeveler resource cluster by doing the following:
Before running llinit, ensure that your HOME environment variable is set to LoadLeveler's home directory. To run llinit, you must have:
Examples
The following example initializes a machine, assigning /var/loadl as the local directory, /usr/lpp/LoadL/full as the release directory, and the machine named bronze as the central manager.
llinit -local /var/loadl -release /usr/lpp/LoadL/full -cm bronze
Results
The following is sample output from this command:
llinit -local /home/ll_admin -release /usr/lpp/LoadL/full -cm mars
llinit: creating directory "/home/ll_admin/spool"
llinit: creating directory "/home/ll_admin/log"
llinit: creating directory "/home/ll_admin/execute"
llinit: set permission "700" on "/home/ll_admin/spool"
llinit: set permission "775" on "/home/ll_admin/log"
llinit: set permission "1777" on "/home/ll_admin/execute"
llinit: creating file "/home/ll_admin/LoadL_admin"
llinit: creating file "/home/ll_admin/LoadL_config"
llinit: creating file "/home/ll_admin/LoadL_config.local"
llinit: editing file /home/ll_admin/LoadL_config
llinit: editing file /home/ll_admin/LoadL_admin
llinit: creating symbolic link "/home/ll_admin/bin -> /usr/lpp/LoadL/full/bin"
llinit: creating symbolic link "/home/ll_admin/lib -> /usr/lpp/LoadL/full/lib"
llinit: creating symbolic link "/home/ll_admin/man -> /usr/lpp/LoadL/full/man"
llinit: creating symbolic link "/home/ll_admin/samples -> /usr/lpp/LoadL/full/samples"
llinit: creating symbolic link "/home/ll_admin/include -> /usr/lpp/LoadL/full/include"
llinit: program complete.
Purpose
Changes the user priority of one or more job steps in the LoadLeveler queue. You can adjust the priority by supplying a + (plus) or - (minus) immediately followed by an integer value. llprio does not affect a job step that is running, even if its priority is lower than other jobs steps, unless the job step goes into the Idle state.
Syntax
llprio [-?] [-H] [-v] [-q] [+integer | -integer | -p priority] joblist
Flags
If the job step was submitted from a submit-only machine, this is the name of the machine where the schedd daemon that sent the job to the negotiator resides.
Description
The user priority of a job step ranges from 0 to 100, inclusive, with higher numbers corresponding to greater priority. The default priority is 50. Only the owner of a job step or the LoadLeveler administrator can change the priority of that job step. Note that this priority is not the UNIX nice priority.
Priority changes resulting in a value less than 0 become 0.
Priority changes resulting in a value greater than 100 become 100.
Any change to a job step's priority applied by a user is relative only to that user's other job steps in the same class. If you have three job steps enqueued, you can reorder those three job steps with llprio but the result does not affect job steps submitted by other users, regardless of their priority and position in the queue.
See "Setting and Changing the Priority of a Job" for more information.
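The clamping rules above can be sketched as shell arithmetic. This is illustrative only; LoadLeveler applies these bounds internally, and the clamp function here is a hypothetical helper, not part of LoadLeveler:

```shell
# Hypothetical helper: clamp a priority value to the valid 0-100 range
clamp() {
  local p=$1
  (( p < 0 )) && p=0
  (( p > 100 )) && p=100
  echo "$p"
}
clamp 125   # prints 100
clamp -10   # prints 0
```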
Examples
This example raises the priority of job 4, job step 1 submitted to machine bronze by a value of 25.
llprio +25 bronze.4.1
This example sets the priority of job 18, job step 4 submitted to machine silver to 100, the highest possible value.
llprio -p 100 silver.18.4
Results
The following shows a sample system response for the llprio -p 100 silver.18.4 command.
llprio: Priority command has been sent to the central manager.
Purpose
Returns information about jobs that have been dispatched.
Syntax
llq [-?] [-H] [-v] [-x] [-s] [-l] [joblist] [-u userlist] [-h hostlist] [-c classlist] [-f category_list] [-r category_list]
Flags
CPU usage and other resource consumption information on active jobs can only be reported using the -x flag if the LoadLeveler administrator has enabled it by specifying A_ON and A_DETAIL for the ACCT keyword in the LoadLeveler configuration file.
Normally, llq connects with the central manager to obtain job information. When you specify -x, llq connects to the schedd machine that received the specified job to get extended job information.
When specified without -l, CPU usage for active jobs is reported in the short format. Using -x can produce a very long report and can cause excess network traffic.
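As a sketch, the configuration-file entry that enables this extended accounting might look like the following (assuming the keyword syntax described above; check your LoadL_config for the exact form):

```
ACCT = A_ON A_DETAIL
```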
If -l is not specified, then the standard listing is generated as shown in Results.
If the job was submitted from a submit-only machine, this is the name of the machine where the schedd daemon that sent the job to the negotiator resides.
If the -u or -h options are not specified, and if no jobid is specified, then all jobs are queried.
The -u and -h options override the jobid parameters.
Examples
This example generates a long listing for job 8, job step 2 submitted to machine gold.
llq -l gold.8.2
This example generates a standard listing for all job steps of job name 12 submitted to the local machine.
llq 12
Results
In this section, the term "job step" refers to either a serial job step or a parallel task.
Standard Listing: The standard listing is generated when you do not specify the -l option with the llq command. The following is sample output from the llq -h mars command, where the machine mars has two jobs running and one job waiting.
Id                       Owner      Submitted   ST PRI Class        Running On
------------------------ ---------- ----------- -- --- ------------ -----------
mars.498.0               brownap    5/20 11:31  R  100 silver       mars
mars.499.0               brownap    5/20 11:31  R  50  No_Class     mars
mars.501.0               brownap    5/20 11:31  I  50  silver

3 job steps in queue, 1 waiting, 0 pending, 2 running, 0 held.
The standard listing includes the following fields:
For a detailed explanation of job states, see "LoadLeveler Job States".
Customized, Formatted Standard Listing: A customized and formatted standard listing is generated when you specify llq with the -f flag. The following is sample output from this command:
llq -f %id %c %dq %dd %gl %h
Step Id           Class      Queue Date  Disp. Date  LL Group   Running On
----------------- ---------- ----------- ----------- ---------- ---------------
ll6.2.0           No_Class   04/08 09:19 04/08 09:21 No_Group   ll6.pok.ibm.com
ll6.1.0           No_Class   04/08 09:19 04/08 09:21 No_Group   ll6.pok.ibm.com
ll6.3.0           No_Class   04/08 09:19 04/08 09:21 No_Group   ll5.pok.ibm.com

3 job steps in queue, 0 waiting, 0 pending, 3 running, 0 held
Customized, Unformatted Standard Listing: A customized and unformatted (raw) standard listing is generated when you specify llq with the -r flag. Output fields are separated by an exclamation point (!). The following is sample output from this command:
llq -r %id %c %dq %dd %gl %h
ll6.pok.ibm.com.2.0!No_Class!04/08 09:19!04/08 09:21!No_Group!ll6.pok.ibm.com
ll6.pok.ibm.com.1.0!No_Class!04/08 09:19!04/08 09:21!No_Group!ll6.pok.ibm.com
ll6.pok.ibm.com.3.0!No_Class!04/08 09:19!04/08 09:21!No_Group!ll5.pok.ibm.com
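Because the raw listing separates fields with exclamation points, it is convenient for post-processing with standard tools. For example, using one of the sample lines from the raw listing (this pipeline is illustrative, not part of LoadLeveler):

```shell
# Extract the step id and running host (fields 1 and 6) from a raw llq line
echo 'll6.pok.ibm.com.2.0!No_Class!04/08 09:19!04/08 09:21!No_Group!ll6.pok.ibm.com' \
  | awk -F'!' '{ print $1, $6 }'
# prints: ll6.pok.ibm.com.2.0 ll6.pok.ibm.com
```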
The Long Listing: The long listing is generated when you specify the -l option with the llq command. This section contains sample output for two llq commands: one querying a serial job and one querying a parallel job. Following the sample output is an explanation of all possible fields displayed by the llq command.
The following is sample output for the llq -l command for the serial job "ll6.pok.ibm.com.2."
=============== Job Step ll6.pok.ibm.com.2.0 ===============
        Job Step Id: ll6.pok.ibm.com.2.0
           Job Name: ll6.pok.ibm.com.2
          Step Name: ltest1
  Structure Version: 9
              Owner: loadl
         Queue Date: Wed Apr 8 09:19:21 1998
             Status: Running
      Dispatch Time: Wed Apr 8 09:21:40 1998
    Completion Date:
    Completion Code:
      User Priority: 50
       user_sysprio: 0
      class_sysprio: 30
      group_sysprio: 0
    System Priority: -1116
          q_sysprio: -1116
      Notifications: Complete
 Virtual Image Size: 1 kilobytes
         Checkpoint:
            Restart: yes
     Hold Job Until:
                Cmd: c_test1.cmd
               Args:
                Env:
                 In: /dev/null
                Out: c_test1_cmd.ll6.2.0.out
                Err: c_test1_cmd.ll6.2.0.err
Initial Working Dir: /home/loadl/TEST_DIR
         Dependency:
       Requirements: ((Arch == "R6000") && (OpSys == "AIX43"))
        Preferences:
          Step Type: Serial
     Min Processors:
     Max Processors:
     Allocated Host: ll6.pok.ibm.com
    Submitting host: ll6.pok.ibm.com
        Notify User: loadl@ll6.pok.ibm.com
              Shell: /bin/ksh
  LoadLeveler Group: No_Group
              Class: No_Class
     Cpu Hard Limit: -1
     Cpu Soft Limit: -1
    Data Hard Limit: -1
    Data Soft Limit: -1
    Core Hard Limit: -1
    Core Soft Limit: -1
    File Hard Limit: -1
    File Soft Limit: -1
   Stack Hard Limit: -1
   Stack Soft Limit: -1
     Rss Hard Limit: -1
     Rss Soft Limit: -1
Step Cpu Hard Limit: -1
Step Cpu Soft Limit: -1
Wall Clk Hard Limit: 3000 seconds
Wall Clk Soft Limit: -1
            Comment:
            Account:
         Unix Group: loadl
 User Space Windows: 0
   NQS Submit Queue:
   NQS Query Queues:
The following is sample output for the llq -l -x k10n10.3.0 command, where k10n10.3.0 is a parallel job.
=============== Job Step k10n10.ppd.pok.ibm.com.3.0 ===============
        Job Step Id: k10n10.ppd.pok.ibm.com.3.0
           Job Name: k10n10.ppd.pok.ibm.com.3
          Step Name: 0
  Structure Version: 9
              Owner: richc
         Queue Date: Wed Apr 8 13:33:10 1998
             Status: Running
      Dispatch Time:
         Start Time:
    Completion Date:
    Completion Code:
      User Priority: 50
       user_sysprio: 0
      class_sysprio: 0
      group_sysprio: 0
    System Priority: 0
          q_sysprio: 0
      Notifications: Always
 Virtual Image Size: 506 kilobytes
         Checkpoint:
            Restart: yes
     Hold Job Until:
                Env:
                 In: /dev/null
                Out: mpi.out
                Err: mpi.err
Initial Working Dir: /u/richc/sp/mpi
         Dependency:
          Step Type: General Parallel
    Submitting host: k10n10.ppd.pok.ibm.com
        Notify User: richc@k10n10.ppd.pok.ibm.com
              Shell: /bin/ksh
  LoadLeveler Group: No_Group
              Class: No_Class
     Cpu Hard Limit: -1
     Cpu Soft Limit: -1
    Data Hard Limit: -1
    Data Soft Limit: -1
    Core Hard Limit: -1
    Core Soft Limit: -1
    File Hard Limit: -1
    File Soft Limit: -1
   Stack Hard Limit: -1
   Stack Soft Limit: -1
     Rss Hard Limit: -1
     Rss Soft Limit: -1
Step Cpu Hard Limit: -1
Step Cpu Soft Limit: -1
Wall Clk Hard Limit: 6000 seconds
Wall Clk Soft Limit: 5965 seconds
            Comment:
            Account:
         Unix Group: usr
   NQS Submit Queue:
   NQS Query Queues:
Negotiator Messages:
--------------- Detail for k10n10.ppd.pok.ibm.com.3.0 ---------------
Running Host: k10n09.ppd.pok.ibm.com
Machine Speed: 1.000000
Starter User Time: 0+00:00:00.170000
Starter System Time: 0+00:00:00.300000
Starter Total Time: 0+00:00:00.470000
Starter maxrss: 1256
Starter ixrss: 5628
Starter idrss: 9552
Starter isrss: 0
Starter minflt: 793
Starter majflt: 10
Starter nswap: 0
Starter inblock: 0
Starter oublock: 0
Starter msgsnd: 0
Starter msgrcv: 0
Starter nsignals: 0
Starter nvcsw: 399
Starter nivcsw: 31
Step User Time: 0+00:00:00.40000
Step System Time: 0+00:00:00.40000
Step Total Time: 0+00:00:00.80000
Step maxrss: 960
Step ixrss: 2120
Step idrss: 2436
Step isrss: 0
Step minflt: 273
Step majflt: 12
Step nswap: 0
Step inblock: 0
Step oublock: 0
Step msgsnd: 0
Step msgrcv: 0
Step nsignals: 0
Step nvcsw: 0
Step nivcsw: 0
-------------------------------------------------
Node
----

Name :
Requirements :
Preferences :
Node minimum : 2
Node maximum : 2
Node actual : 2
Allocated Hosts : k10n09.ppd.pok.ibm.com:RUNNING:css0(0,MPI,us),
                  css0(1,MPI,us)
                + k10n10.ppd.pok.ibm.com:RUNNING:css0(0,MPI,us),
                  css0(1,MPI,us)
Master Task
-----------

Executable : /u/richc/sp/poe/poe.musppa
Exec Args : /u/richc/sp/mpi/fvt_mpi -v 131072 -euilib us -ilevel 6
            -labelio yes -pmdlog yes
Num Task Inst: 1
Task Instance: k10n09:-1

Task
----

Num Task Inst: 4
Task Instance: k10n09:0:css0(0,MPI,us)
Task Instance: k10n09:1:css0(1,MPI,us)
Task Instance: k10n10:2:css0(0,MPI,us)
Task Instance: k10n10:3:css0(1,MPI,us)
The long listing includes these fields:
For a detailed explanation of these job states, see "LoadLeveler Job States".
Other fields displayed when issuing llq -x -l are:
Other fields displayed for parallel jobs are:
Purpose
Returns status information about machines in the LoadLeveler cluster. It does not provide status on any NQS machine.
Syntax
llstatus [-?] [-H] [-v] [-l] [-f category_list] [-r category_list] [hostlist]
Flags
Description
If no hostlist is specified, all machines are queried.
If you have more than a few machines configured for LoadLeveler, consider redirecting the output to a file when using the -l flag.
Each machine periodically sends the central manager a snapshot of its situation. Because the information llstatus returns is a collection of these snapshots, each taken at a different time, the overall picture may not be entirely consistent.
Examples
This example requests a long status listing for machines named silver and gold.
llstatus -l silver gold
In this section, the term "job step" refers to either a serial job step or a parallel task.
The Standard Listing: The standard listing is generated when you do not specify the -l option with the llstatus command. The following is sample output from the llstatus command, where there are two nodes in the cluster.
Name                    Schedd  InQ  Act  Startd  Run  LdAvg  Idle  Arch   OpSys
k10n09.ppd.pok.ibm.com  Avail     3    3  Run       1   2.72     0  R6000  AIX43
k10n12.ppd.pok.ibm.com  Avail     0    0  Idle      0   0.00   365  R6000  AIX43

R6000/AIX43     2 machines  3 jobs  1 running
Total Machines  2 machines  3 jobs  1 running

The Central Manager is defined on k10n09.ppd.pok.ibm.com

All machines on the machine_list are present.
The standard listing includes the following fields:
For a detailed explanation of these states, see "The schedd Daemon".
For a detailed explanation of these states, see "The startd Daemon".
Customized, Formatted Standard Listing: A customized and formatted standard listing is generated when you specify llstatus with the -f option. The following is sample output from this command:
llstatus -f %n %scs %inq %m %v %sts %l %o
Name             Schedd  InQ  Memory  FreeVMemory  Startd  LdAvg  OpSys
ll5.pok.ibm.com  Avail     0     128        22708  Run      0.23  AIX43
ll6.pok.ibm.com  Avail     3     224        16732  Run      0.51  AIX43

R6000/AIX43     2 machines  3 jobs  3 running
Total Machines  2 machines  3 jobs  3 running

The Central Manager is defined on ll5.pok.ibm.com

All machines on the machine_list are present.
Customized, Unformatted Standard Listing: A customized and unformatted (raw) standard listing is generated when you specify llstatus with the -r flag. Output fields are separated by an exclamation point (!). The following is sample output from this command:
llstatus -r %n %scs %inq %m %v %sts %l %o
ll5.pok.ibm.com!Avail!0!128!22688!Running!0.14!AIX43
ll6.pok.ibm.com!Avail!3!224!16668!Running!0.37!AIX43
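Because the field separator is fixed, the raw listing is convenient for post-processing by scripts. The sketch below is not part of LoadLeveler; it simply pairs each raw field with the format specifier that produced it, using a line copied from the sample output above:

```python
# Sketch: split one llstatus -r record on the '!' separator and pair
# each value with the format specifier that produced it. The sample
# line is taken from the raw listing above.
SPECS = ["%n", "%scs", "%inq", "%m", "%v", "%sts", "%l", "%o"]

def parse_raw_record(line):
    """Return a dict mapping each format specifier to its raw value."""
    return dict(zip(SPECS, line.rstrip().split("!")))

record = parse_raw_record("ll5.pok.ibm.com!Avail!0!128!22688!Running!0.14!AIX43")
print(record["%n"], record["%sts"])   # ll5.pok.ibm.com Running
```

The same approach works for any combination of specifiers, as long as the script and the llstatus -r invocation agree on the order.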
The Long Listing: The long listing is generated when you specify the -l option with the llstatus command. Following the sample output is an explanation of all possible fields displayed by the llstatus command.
The following is sample output from the llstatus -l ll6 command:
================================================================================
Name = ll6.pok.ibm.com
Machine = ll6.pok.ibm.com
Arch = R6000
OpSys = AIX43
SYSPRIO = (0 - QDate)
MACHPRIO = (0 - LoadAvg)
VirtualMemory = 16640
Disk = 23000
KeyboardIdle = 600
Tmp = 48868
LoadAvg = 0.302991
ConfiguredClasses = No_Class(2) osl(1) small(2) medium(1) POE(2)
AvailableClasses = No_Class(0) osl(1) small(2) medium(1) POE(2)
DrainingClasses =
DrainedClasses =
Pool = 1
Adapter = css0(tb3mx,llx5,9.114.16.155,26,4)
Feature =
Max_Starters = 2
Memory = 224
ConfigTimeStamp = Wed Apr 8 09:05:36 1998
Cpus = 1
Speed = 1.000000
Subnet = 9.117.17
MasterMachPriority = 0.000000
CustomMetric = 1
StartdAvail = 1
State = Running
EnteredCurrentState = Wed Apr 8 09:46:33 1998
START = T
SUSPEND = F
CONTINUE = T
VACATE = F
KILL = F
Machine Mode = general
Running = 2
ScheddAvail = 1
ScheddState = Avail
ScheddRunning = 3
Pending = 0
Starting = 0
Idle = 0
Unexpanded = 0
Held = 0
Removed = 0
RemovedPending = 0
Completed = 0
TotalJobs = 3
TimeStamp = Wed Apr 8 09:47:45 1998
The long listing includes these fields:
For a detailed explanation of these states, see "The startd Daemon".
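The long listing's one-keyword-per-line Keyword = Value layout also lends itself to simple scripting. The following sketch is an illustration, not a LoadLeveler utility; it collects the pairs into a dictionary, with sample values taken from the listing above:

```python
# Sketch: collect the "Keyword = Value" lines of an llstatus -l
# listing into a dictionary. Splitting on the first '=' only keeps
# values such as "(0 - QDate)" intact.
def parse_llstatus_long(text):
    info = {}
    for line in text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            info[key.strip()] = value.strip()
    return info

sample = """Name = ll6.pok.ibm.com
Arch = R6000
Max_Starters = 2
SYSPRIO = (0 - QDate)"""

info = parse_llstatus_long(sample)
print(info["Arch"], info["Max_Starters"])   # R6000 2
```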
Purpose
Submits a job to LoadLeveler to be dispatched based upon job requirements in the job command file.
You can submit both LoadLeveler jobs and NQS jobs. To submit NQS jobs, the job command file must contain the shell script to be submitted to the NQS node.
Syntax
llsubmit [-?] [-H] [-v] [-q] [-n] [cmdfile | -]
Flags
When a job completes, the mail notification may include a message such as:

Your LoadLeveler job myjob1 exited with status 139.

The return code 139 comes from the user's job; it is not a LoadLeveler return code.
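An exit status above 128 follows the usual UNIX shell convention: the process was ended by signal number status − 128, so 139 corresponds to signal 11 (SIGSEGV). This decoding is standard shell behavior rather than anything specific to LoadLeveler; a sketch:

```python
# Sketch: decode a shell-style exit status such as the 139 above.
# By UNIX convention, a status above 128 means the process was
# terminated by signal number (status - 128).
import signal

def decode_status(status):
    if status > 128:
        signum = status - 128
        return "killed by signal %d (%s)" % (signum, signal.Signals(signum).name)
    return "exited with code %d" % status

print(decode_status(139))   # killed by signal 11 (SIGSEGV)
```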
Examples
In this example, a job command file named qtrlyrun.cmd is submitted.
llsubmit qtrlyrun.cmd
Results
The following shows the results of the llsubmit qtrlyrun.cmd command issued from the machine earth:
llsubmit: The job "earth.505" has been submitted.
Note that 505 is the job identifier assigned by LoadLeveler, and earth is the name of the machine from which the job was submitted.
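Job identifiers have the form host.jobid, and the job step identifiers shown elsewhere in this chapter append a step number (for example, the first step of this job would be earth.505.0). Because fully qualified hostnames themselves contain dots, a script that takes a step identifier apart should split from the right. The helper below is illustrative only:

```python
# Sketch: split a LoadLeveler job step identifier of the form
# <host>.<jobid>.<stepid>. Hostnames may contain dots, so split
# from the right. The helper name is illustrative, not official.
def split_step_id(step_id):
    host, jobid, stepid = step_id.rsplit(".", 2)
    return host, int(jobid), int(stepid)

print(split_step_id("ll6.pok.ibm.com.2.0"))   # ('ll6.pok.ibm.com', 2, 0)
```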
Purpose
Returns job resource information on completed jobs for accounting purposes.
Syntax
llsummary [-?] [-H] [-v] [-x] [-l] [-s MM/DD/YY to MM/DD/YY] [-e MM/DD/YY to MM/DD/YY] [-u user] [-c class] [-g group] [-G unixgroup] [-a allocated] [-r report] [-j host.jobid] [-d section] [filename]
Flags
To generate any of the four throughput reports, you must enable the recording of accounting data by specifying ACCT = A_ON A_DETAIL in your LoadL_config file.
Examples
The following example requests summary reports (standard listing) of all the jobs submitted on your machine between the days of September 12, 1999 and October 12, 1999:
llsummary -s 09/12/99 to 10/12/99
The Standard Listing: The standard listing is generated when you do not specify -l, -r, or -d with llsummary. This sample report includes summaries of the following data:
The following is an example of the standard listing:
Name         Jobs  Steps     Job Cpu  Starter Cpu  Leverage
krystal        15     36  0+00:09:50   0+00:00:10      59.0
lixin3         18     54  0+00:08:28   0+00:00:16      31.8
TOTAL          33     90  0+00:18:18   0+00:00:27      40.7

Class        Jobs  Steps     Job Cpu  Starter Cpu  Leverage
small           9     21  0+00:01:03   0+00:00:06      10.5
large          12     36  0+00:13:45   0+00:00:11      75.0
osl2            3      9  0+00:00:27   0+00:00:02      13.5
No_Class        9     24  0+00:03:01   0+00:00:06      30.2
TOTAL          33     90  0+00:18:18   0+00:00:27      40.7

Group        Jobs  Steps     Job Cpu  Starter Cpu  Leverage
No_Group       12     30  0+00:09:32   0+00:00:09      63.6
chemistry       7     18  0+00:04:50   0+00:00:05      58.0
engineering    14     42  0+00:03:56   0+00:00:12      19.7
TOTAL          33     90  0+00:18:18   0+00:00:27      40.7

Account      Jobs  Steps     Job Cpu  Starter Cpu  Leverage
33333          16     39  0+00:05:54   0+00:00:11      32.2
22222          15     45  0+00:12:05   0+00:00:13      55.8
99999           2      6  0+00:00:18   0+00:00:01      18.0
TOTAL          33     90  0+00:18:18   0+00:00:27      40.7
The standard listing includes the following fields:
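As the sample numbers suggest, the Leverage column is the ratio of job CPU time to starter CPU time, with durations printed in the d+hh:mm:ss format used throughout these listings. A sketch of the arithmetic (the helper names are illustrative):

```python
# Sketch: reproduce the Leverage column of the report above as job
# CPU time divided by starter CPU time, with durations given in the
# d+hh:mm:ss format used throughout these listings.
def to_seconds(t):
    days, clock = t.split("+")
    h, m, s = (int(x) for x in clock.split(":"))
    return int(days) * 86400 + h * 3600 + m * 60 + s

def leverage(job_cpu, starter_cpu):
    return round(to_seconds(job_cpu) / to_seconds(starter_cpu), 1)

# First line of the sample report: krystal, 0+00:09:50 over 0+00:00:10
print(leverage("0+00:09:50", "0+00:00:10"))   # 59.0
```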
The -r Listing: The following is sample output from the llsummary -r throughput command. Only the user section is shown; the class, group, and account sections are omitted here.
Name    Jobs  Steps  AvgQueueTime  AvgRealTime  AvgCPUTime
loadl      1      4    0+00:00:03   0+00:05:27  0+00:05:17
user1      2      6    0+00:03:05   0+00:03:45  0+00:03:04
ALL        3     10    0+00:01:52   0+00:04:26  0+00:03:58

Name    Jobs  Steps  MinQueueTime  MinRealTime  MinCPUTime
loadl      1      4    0+00:00:01   0+00:02:49  0+00:02:44
user1      2      6    0+00:02:02   0+00:03:43  0+00:03:02
ALL        3     10    0+00:00:01   0+00:02:49  0+00:02:44

Name    Jobs  Steps  MaxQueueTime  MaxRealTime  MaxCPUTime
loadl      1      4    0+00:00:06   0+00:12:58  0+00:12:37
user1      2      6    0+00:06:21   0+00:03:48  0+00:03:07
ALL        3     10    0+00:06:21   0+00:12:58  0+00:12:37
The -r listing includes the following fields:
The MaxQueueTime, MaxRealTime, and MaxCPUTime fields display the time of the job with the greatest amount of queue, wall clock, and CPU time, respectively. The ALL line for the Average listing displays the average time for all users, classes, groups, and accounts. The ALL line for the Minimum listing displays the time of the job with the least amount of time for all users, classes, groups, and accounts. The ALL line for the Maximum listing displays the time of the job with the greatest amount of time for all users, classes, groups, and accounts.
The Long Listing: When you specify the -x option in conjunction with the -l option on the llsummary command, the long report resembles the following:
================== Job ll1.kgn.ibm.com.772 =================
Job Id: ll1.kgn.ibm.com.772
Job Name: ll1.kgn.ibm.com.772
Structure Version: 140
Owner: anton
Unix_group: staff
Submitting Host: ll1.kgn.ibm.com
Submitting Userid: 17212
Submitting Groupid: 100
Number of Steps: 1
----------------- Step ll1.kgn.ibm.com.772.0 -----------------
Job Step Id: ll1.kgn.ibm.com.772.0
Step Name: c_test
Queue Date: Wed Sep 6 09:43:38 CDT 1998
Job Step Dependency:
Status: Completed
Completion Date: Wed Sep 6 10:27:23 CDT 1998
Completion Code: 0
Start Count: 1
User Priority: 50
user_sysprio: 0
class_sysprio: 0
group_sysprio: 0
Notifications: Complete
Virtual Image Size: 19 kilobytes
Checkpoint: no
Restart: yes
Hold Job Until:
Cmd: job1.cmd
Args:
Env: LOADL_CORESIZE = 1024
In: /dev/null
Out: job1.ll1.772.0.out
Err: job1.ll1.772.0.err
Initial Working Dir: /u/jeffli/regress
Requirements: (Arch == "R6000") && (OpSys == "AIX43")
Preferences:
Step Type: Serial
Min Processors:
Max Processors:
Allocated Host: ll1.kgn.ibm.com
Notify User: anton@ll1.kgn.ibm.com
Shell: /bin/ksh
LoadLeveler Group: No_Group
Class: No_Class
Cpu Hard Limit: 300 seconds
Cpu Soft Limit: 100 seconds
Data Hard Limit: 262144000 bytes
Data Soft Limit: 230686720 bytes
Core Hard Limit: 262144000 bytes
Core Soft Limit: -1
File Hard Limit: 262144000 bytes
File Soft Limit: 230686720 bytes
Stack Hard Limit: 262144000 bytes
Stack Soft Limit: -1
Rss Hard Limit: 262144000 bytes
Rss Soft Limit: -1
Step Cpu Hard Limit: 400 seconds
Step Cpu Soft Limit: 200 seconds
Wall Clk Hard Limit: 600 seconds
Wall Clk Soft Limit: 300 seconds
Comment:
Account:
NQS Submit Queue:
NQS Query Queues:
Job Tracking Exit:
Job Tracking Args:
--------------- Detail for ll1.kgn.ibm.com.772.0 ------
Running Host: ll1.kgn.ibm.com
Machine Speed: 1.000000
Event: System
Event Name: completed
Time of Event: Wed Sep 6 10:27:23 CDT 1998
Starter User Time: 0+00:00:00.240000
Starter System Time: 0+00:00:00.390000
Starter Total Time: 0+00:00:00.630000
Starter maxrss: 828
Starter ixrss: 8388
Starter idrss: 6896
Starter isrss: 0
Starter minflt: 202
Starter majflt: 0
Starter nswap: 0
Starter inblock: 0
Starter oublock: 0
Starter msgsnd: 12
Starter msgrcv: 11
Starter nsignals: 1
Starter nvcsw: 79
Starter nivcsw: 0
Step User Time: 0+00:00:00.810000
Step System Time: 0+00:00:01.500000
Step Total Time: 0+00:00:02.310000
Step maxrss: 712
Step ixrss: 15540
Step idrss: 14296
Step isrss: 0
Step minflt: 1443
Step majflt: 0
Step nswap: 0
Step inblock: 0
Step oublock: 0
Step msgsnd: 5
Step msgrcv: 4
Step nsignals: 14
Step nvcsw: 70
Step nivcsw: 0
For an explanation of these fields, see the description of the output fields for the long listing of the llq command.