LoadLeveler provides two types of commands: those that are available to all users of LoadLeveler, and those that are reserved for LoadLeveler administrators. (Administrators are identified by the LOADL_ADMIN keyword in the configuration file.)
The administrator commands can operate on the entire LoadLeveler job queue and all machines configured. The user commands mainly affect those jobs submitted by that user. Some commands, such as llhold, include options that can only be performed by an administrator.
The following table summarizes the LoadLeveler commands:
Command | Description | Who Can Issue? | For More Information |
---|---|---|---|
llacctmrg | Collects all individual machine history files together into a single file. | Administrators | See page llacctmrg - Collect machine history files |
llcancel | Cancels a submitted job. | Users and Administrators | See page llcancel - Cancel a Submitted Job |
llclass | Returns information about LoadLeveler classes. | Users and Administrators | See page llclass - Query Class Information |
llctl | Controls daemons on one or more machines in the LoadLeveler cluster. | Administrators | See page llctl - Control LoadLeveler Daemons |
llextSDR | Extracts adapter information from the system data repository (SDR). | Users and Administrators | See page llextSDR - Extract adapter information from the SDR |
llfavorjob | Raises one or more jobs to the highest priority, or restores original priority. | Administrators | See page llfavorjob - Reorder System Queue by Job |
llfavoruser | Raises job(s) submitted by one or more users to the highest priority, or restores original priority. | Administrators | See page llfavoruser - Reorder System Queue by User |
llhold | Holds or releases a hold on a job. | Users and Administrators | See page llhold - Hold or Release a Submitted Job |
llinit | Initializes a new machine as a member of the LoadLeveler cluster. | Administrators | See page llinit - Initialize Machines in the LoadLeveler Cluster |
llprio | Changes the user priority of a submitted job step. | Users and Administrators | See page llprio - Change the User Priority of Submitted Job Steps |
llq | Queries the status of LoadLeveler jobs. | Users and Administrators | See page llq - Query Job Status |
llstatus | Queries the status of LoadLeveler machines. | Users and Administrators | See page llstatus - Query Machine Status |
llsubmit | Submits a job. | Users and Administrators | See page llsubmit - Submit a Job |
llsummary | Returns resource information on completed jobs. | Administrators | See page llsummary - Return Job Resource Information for Accounting |
Purpose
Collects individual machine history files together into a single file specified as a parameter.
Syntax
llacctmrg [-?] [-H] [-v] [-h hostlist] [-d directory]
Flags
Description
This command by default collects data from all the machines identified in the administration file. To override the default, specify a machine or a list of machines using the -h flag.
When the llacctmrg command ends, accounting information is stored in a file called globalhist.YYYYMMDDHHmm, where YYYYMMDDHHmm is the year, month, day, hour, and minute at which the file was created. This file stores information such as the amount of resources consumed by each job and other job-related data.
You can use this file as input to the llsummary command. For example, if you created the file globalhist.199808301050, you can issue llsummary globalhist.199808301050 to record information on all machines.
Data on processes that fork child processes is included in the file only if the parent process waits for the child process to end. Therefore, complete data may not be collected for jobs that are not composed of simple parent/child processes. For example, if a LoadLeveler job invokes an rsh command to execute a function on another machine, the resources consumed on the other machine are not collected as part of the accounting data.
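As an illustration of the naming convention described above, the following sketch builds a filename of the same form with date(1). This is illustrative only, not a LoadLeveler command; the actual timestamp is chosen by llacctmrg when it writes the file:

```shell
# Illustrative only: build a filename following the globalhist.YYYYMMDDHHmm pattern
fname="globalhist.$(date +%Y%m%d%H%M)"
echo "$fname"
```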
Examples
The following example collects data from machines named mars and pluto.
llacctmrg -h mars pluto
The following example collects data from the machine named mars and places the data in an existing directory called merge.
llacctmrg -h mars -d merge
Results
The following shows a sample system response from the llacctmrg -h mars -d merge command.
llacctmrg: History transferred successfully from mars (10080 bytes)
Purpose
Cancels one or more jobs from the LoadLeveler queue.
Syntax
llcancel [-?] [-H] [-v] [-q] [-u userlist] [-h hostlist] [joblist]
Flags
The -u or -h flags override the host.jobid.stepid parameters.
When the -h flag is specified by a non-administrator, all jobs submitted from the machines in hostlist by the user issuing the command are cancelled.
When the -h flag is specified by an administrator, all jobs submitted by the administrator are cancelled, unless the -u flag is also specified, in which case all jobs that were both submitted by users in userlist and monitored on machines in hostlist are cancelled.
Group administrators and class administrators are considered normal users unless they are also LoadLeveler administrators.
Description
When you issue llcancel, the command is sent to the negotiator. You should then use the llq command to verify your job was cancelled. A job state of RM (Removed) indicates the job was cancelled. A job state of RP (Remove Pending) indicates the job is in the process of being cancelled.
When cancelling a job from a submit-only machine, you must specify the name of the machine that scheduled the job. For example, if you submitted the job from machine A (a submit-only machine) and machine B (a scheduling machine) scheduled the job to run, you must specify machine B's name in the cancel command. If machines A and B are in different sub-domains, you must specify the fully-qualified name of the job in the cancel command. You can use the llq -l command to determine the fully-qualified name of the job.
Examples
This example cancels job step 3 of job 18, which is scheduled by the machine named bronze:
llcancel bronze.18.3
This example cancels all job steps of job 8, which is scheduled by the machine named gold:
llcancel gold.8
Results
The following shows a sample system response for the llcancel gold.8 command.
llcancel: Cancel command has been sent to the central manager.
Purpose
Returns information about classes.
Syntax
llclass [-?] [-H] [-v] [-l] [classlist]
Flags
If you have more than a few classes configured for LoadLeveler, consider redirecting the output to a file when you use the -l flag.
Examples
This example generates a long listing for classes named silver and gold.
llclass -l silver gold
The Standard Listing: The standard listing is generated when you do not specify the -l flag with the llclass command. The following is sample output from the llclass silver command, where five slots are configured for the silver class in the cluster, with one silver-class job currently running.
Name      MaxJobCPU   MaxProcCPU  Free   Max    Description
          d+hh:mm:ss  d+hh:mm:ss  Slots  Slots

silver    0+00:30:00  0+00:10:00  4      5      silver grade jobs
The standard listing includes the following fields:
The Long Listing: The long listing is generated when you specify the -l flag with the llclass command. The following is sample output from the llclass -l silver command, where five slots are configured for the silver class in the cluster, with one silver-class job currently running.
=============== Class silver ==========
               Name: silver
           priority: 50
              admin: brownap
          NQS_class: F
         NQS_submit:
          NQS_query:
     max_processors: 1
           max_jobs: 3
      class_comment: silver grade jobs
   wall_clock_limit: 0+00:60:00, -1
      job_cpu_limit: 0+00:30:00, -1
          cpu_limit: 0+00:10:00, -1
         data_limit: -1, -1
         core_limit: -1, -1
         file_limit: -1, -1
        stack_limit: -1, -1
          rss_limit: -1, -1
               nice: 15
               free: 4
            maximum: 5
The long listing includes these fields:
Related Information
Each machine periodically updates the central manager with a snapshot of its environment. Since the information returned by llclass is a collection of these snapshots, all taken at varying times, the total picture may not be completely consistent.
Purpose
Controls LoadLeveler daemons on all members of the LoadLeveler cluster.
Syntax
llctl [-?] [-H] [-v] [-q] [-g | -h host] keyword
Flags
This option is intended for recovery and cleanup after a machine has permanently crashed or was inadvertently removed from the LoadLeveler cluster before all activity on it was quiesced. Do not use this option unless the machines specified are guaranteed not to return to the LoadLeveler cluster.
If you need to return the machine to the cluster later, you must clear all files from the spool and execute directories of the machine that was deleted.
Description
This command sends a message to the master daemon on the target machine requesting that action be taken on the members of the LoadLeveler cluster. Note the following when using this command:
Draining all the classes on a startd machine is not equivalent to draining the startd machine. When you drain all the classes, the startd enters the Idle state. When you drain the startd, the startd enters the Drained state. Similarly, resuming all the classes on a startd machine is not equivalent to resuming the startd machine.
If a serial job is running on a machine that receives the llctl recycle command, or the llctl stop and llctl start commands, the running job is terminated. You can restart the job by resubmitting the job or by enabling checkpointing and specifying the restart=yes option in the job command file.
Examples
This example stops LoadLeveler on the machine named iron:
llctl -h iron stop
This example starts the LoadLeveler daemons on all members of the LoadLeveler cluster, starting with the central manager, as defined in the machine stanzas of the administration file:
llctl -g start
This example causes the LoadLeveler daemons on machine iron to re-read the configuration files, which may contain new configuration information for the iron machine:
llctl -h iron reconfig
For the next three examples, suppose the classes small, medium, and large are available on the machine called iron.
This example drains the classes medium and large on the machine named iron.
llctl -h iron drain startd medium large
This example drains the classes medium and large on all machines.
llctl -g drain medium large
This example resumes the classes medium and large on the machine named iron.
llctl -h iron resume startd medium large
This example illustrates how to capture accounting information on a work shift called day on the machine iron:
llctl -h iron capture day
You can capture accounting information on all the machines in the LoadLeveler cluster by using the -g option, or you can collect accounting information on the local machine by simply issuing the following:
llctl capture day
Capturing information on the local machine is the default. For more information, see "Collecting Job Resource Data Based on Events".
Assume the machine earth has crashed while running jobs and its hard disk needs to be replaced. You try to cancel the jobs that were running on that machine. The schedd marks the jobs Remove Pending until it gets confirmation from earth that they were removed. Since earth will be reinstalled, you need to inform the schedd that it should not wait for confirmation.
Assume the schedd is named mars, and the running jobs are named mars.1.0 and mars.1.1. First you want to tell the negotiator to remove the jobs:
llcancel mars.1.0
llcancel mars.1.1
Next, tell the schedd not to wait for confirmation from earth before marking the jobs removed.
llctl -h mars purge earth
Results
The following shows the result of the llctl -h mars purge earth command.
llctl: Sent purge command to host mars
Purpose
Extracts adapter information from the system data repository (SDR) and creates adapter and machine stanzas for each node in an RS/6000 SP partition. You can use the information in these stanzas in the LoadLeveler administration file. This command writes the stanzas to standard output.
Syntax
llextSDR [-?] [-H] [-v] [-a adapter]
Flags
This command is available to users and administrators.
In the SDR, the Node class contains an entry for each node in the SP partition. The Adapter class contains an entry for each adapter configured on a node. This command extracts the information in the Adapter class and creates an adapter stanza. This command also creates a machine stanza which identifies the node and the adapters attached to the node. The generated machine stanza also includes the spacct_excluse_enable keyword, whose value is obtained from the spacct_excluse_enable attribute in the SP class of the SDR. For more information on adapter stanzas, see "Step 5: Specify Adapter Stanzas". For more information on machine stanzas, see "Step 1: Specify Machine Stanzas".
The partition for which information is extracted is either the default partition or that specified with the SP_NAME environment variable. For the control workstation, the default partition is the default system partition. For an SP node, the default partition is the partition to which the node belongs.
You must issue this command on a machine with the ssp.clients fileset installed. If you issue this command from a non-SP workstation, you must set SP_NAME to the IP address of the appropriate SDR instance for the partition.
Examples
The following example creates adapter and machine stanzas for all nodes in a partition:
llextSDR
The following example creates machine stanzas with each node's css0 interface name as the label.
llextSDR -a css0
Results
You may need to alter or add information to the stanzas produced by this command when you incorporate the stanzas into the administration file. For example, administrators may want to have each network_type field use a value that reflects the type of nodes installed on the network. Users will need to know the values used for network_type so that they can specify an appropriate value in their job command files.
Also, the output of this command includes fully-qualified machine names. If your existing administration file uses short names, you may need to change either the command output or your existing administration file so that you use either all fully-qualified names or all short names.
The following shows sample output for the llextSDR command, where the default partition is k4s. This sample output shows the first two nodes in the partition.
k4inst.ppd.pok.ibm.com:
    type = machine
    adapter_stanzas = k4n01.ppd.pok.ibm.com k4sn01.ppd.pok.ibm.com k4inst.ppd.pok.ibm.com
    spacct_excluse_enable = true

k4n01.ppd.pok.ibm.com:
    type = adapter
    adapter_name = en1
    network_type = ethernet
    interface_address = 9.114.45.65
    interface_name = k4n01.ppd.pok.ibm.com

k4sn01.ppd.pok.ibm.com:
    type = adapter
    adapter_name = css0
    network_type = switch
    interface_address = 9.114.45.129
    interface_name = k4sn01.ppd.pok.ibm.com
    switch_node_number = 0

k4inst.ppd.pok.ibm.com:
    type = adapter
    adapter_name = en0
    network_type = ethernet
    interface_address = 9.114.45.1
    interface_name = k4inst.ppd.pok.ibm.com

k4n03.ppd.pok.ibm.com:
    type = machine
    adapter_stanzas = k4sn03.ppd.pok.ibm.com k4n03.ppd.pok.ibm.com
    spacct_excluse_enable = true

k4sn03.ppd.pok.ibm.com:
    type = adapter
    adapter_name = css0
    network_type = switch
    interface_address = 9.114.45.131
    interface_name = k4sn03.ppd.pok.ibm.com
    switch_node_number = 2

k4n03.ppd.pok.ibm.com:
    type = adapter
    adapter_name = en0
    network_type = ethernet
    interface_address = 9.114.45.67
    interface_name = k4n03.ppd.pok.ibm.com
. . .
The following shows sample output for the llextSDR -a css0 command for a single node.
k10sn09.ppd.pok.ibm.com:
    type = machine
    adapter_stanzas = k10sn09.ppd.pok.ibm.com k10n09.ppd.pok.ibm.com
    spacct_excluse_enable = true

k10sn09.ppd.pok.ibm.com:
    type = adapter
    adapter_name = css0
    network_type = switch
    interface_address = 9.114.51.137
    interface_name = k10sn09.ppd.pok.ibm.com
    switch_node_number = 8

k10n09.ppd.pok.ibm.com:
    type = adapter
    adapter_name = en0
    network_type = ethernet
    interface_address = 9.114.51.73
    interface_name = k10n09.ppd.pok.ibm.com
Purpose
Sets specified jobs to a higher system priority than all jobs that are not favored. This command also unfavors previously favored job(s), restoring the original priority, when you specify the -u flag.
Syntax
llfavorjob [-?] [-H] [-v] [-q] [-u] joblist
Flags
Description
If this command is issued against jobs that are already running, it has no effect. If the job vacates, however, and returns to the queue, the job gets re-ordered with the new priority.
If more than one job is affected by this command, the favored jobs are ordered by the sysprio expression and are scanned before jobs that are not favored. However, favored jobs whose requirements do not match any available machine may still run after unfavored jobs. This command remains in effect until reversed with the -u flag.
Examples
This example assigns jobs 12.4 on the machine iron and 8.2 on zinc the highest priorities in the system, with the jobs ordered by the sysprio expression.
llfavorjob iron.12.4 zinc.8.2
This example unfavors jobs 12.4 on the machine iron and 8.2 on the machine zinc.
llfavorjob -u iron.12.4 zinc.8.2
Purpose
Sets a user's job(s) to the highest priority in the system, regardless of the current setting of the job priority. Jobs already running are not affected. This command also unfavors the user's job(s), restoring the original priority, when you specify the -u flag.
Syntax
llfavoruser [-?] [-H] [-v] [-q] [-u] userlist
Flags
Description
This command affects your current and future jobs until you remove the favor.
When the central manager daemon is restarted, any favor applied to users is revoked.
The user's jobs still remain ordered by user priority (which may cause the user's jobs to swap sysprio values). If more than one user is affected by this command, the jobs of favored users are ordered by the sysprio expression and are scanned before the jobs of users who are not favored. However, jobs of favored users whose requirements do not match any available machine may still run after jobs of unfavored users.
Examples
This example grants highest priority to all queued jobs submitted by users ellen and fred according to the sysprio expression.
llfavoruser ellen fred
This example unfavors all queued jobs submitted by users ellen and fred.
llfavoruser -u ellen fred
Purpose
Places jobs in user hold or system hold and releases jobs from both types of hold. Users can only move their own jobs into and out of user hold. Only LoadLeveler administrators can move jobs into and release them from system hold.
Syntax
llhold [-?] [-H] [-v] [-q] [-s] [-r] [-u userlist] [-h hostlist] [joblist]
Flags
If neither -s nor -r is specified, LoadLeveler puts the job(s) in user hold.
Only a LoadLeveler administrator can release jobs from system hold. Only an administrator or the owner of a job can release it from user hold.
When issued by a non-administrator, this option only acts upon jobs that user has submitted to the machines in hostlist.
When issued by an administrator, all jobs monitored on the machines are acted upon unless the -u option is also used. In that case, the userlist is also part of the selection process, and only jobs both submitted by users in userlist and monitored on the machines in the hostlist are acted upon.
If the job was submitted from a submit-only machine, this is the name of the schedd machine that sent the job to the negotiator.
Description
This command does not affect a job step that is running unless the job step attempts to enter the Idle state. At this point, the job step is placed in the Hold state.
To ensure a job is released from both system hold and user hold, the administrator must issue the command with -r specified to release it from system hold. The administrator or the submitting user can reissue the command to release the job from user hold.
This command will fail if:
Examples
This example places job 23, job step 0 and job 19, job step 1 on hold.
llhold 23.0 19.1
This example releases job 23, job step 0; job 19, job step 1; and job 20, job step 3 from a hold state.
llhold -r 23.0 19.1 20.3
This example places all jobs from users abe, barbara, and carol2 in system hold.
llhold -s -u abe barbara carol2
This example releases from a hold state all jobs on machines bronze, iron, and steel.
llhold -r -h bronze iron steel
This example releases from a hold state all jobs on machines bronze, iron, and steel that smith submitted.
llhold -r -u smith -h bronze iron steel
Results
The following shows a sample system response for the llhold -r -h bronze command.
llhold: Hold command has been sent to the central manager.
Purpose
Initializes a new machine as a member of the LoadLeveler hardware resource cluster.
Syntax
llinit [-?] [-H] [-q] [-prompt] [-local pathname] [-release pathname] [-cm machine] [-debug]
Flags
There must be a unique local directory for each LoadLeveler cluster member.
Description
This command runs once on each machine during the installation process. It must be run by the user ID you have defined as the LoadLeveler user ID. The log, spool, and execute directories are created with the correct modes and ownerships. The LoadLeveler configuration and administration files, LoadL_config and LoadL_admin, respectively, are copied from LoadLeveler's release directory to LoadLeveler's home directory. The local configuration file, LoadL_config.local, is copied from LoadLeveler's release directory to LoadLeveler's local directory.
llinit initializes a new machine as a member of the LoadLeveler resource cluster by doing the following:
Before running llinit, ensure that your HOME environment variable is set to LoadLeveler's home directory. To run llinit, you must have:
Examples
The following example initializes a machine, assigning /var/loadl as the local directory, /usr/lpp/LoadL/full as the release directory, and the machine named bronze as the central manager.
llinit -local /var/loadl -release /usr/lpp/LoadL/full -cm bronze
Results
The following is sample output from this command:
llinit -local /home/ll_admin -release /usr/lpp/LoadL/full -cm mars
llinit: creating directory "/home/ll_admin/spool"
llinit: creating directory "/home/ll_admin/log"
llinit: creating directory "/home/ll_admin/execute"
llinit: set permission "700" on "/home/ll_admin/spool"
llinit: set permission "775" on "/home/ll_admin/log"
llinit: set permission "1777" on "/home/ll_admin/execute"
llinit: creating file "/home/ll_admin/LoadL_admin"
llinit: creating file "/home/ll_admin/LoadL_config"
llinit: creating file "/home/ll_admin/LoadL_config.local"
llinit: editing file /home/ll_admin/LoadL_config
llinit: editing file /home/ll_admin/LoadL_admin
llinit: creating symbolic link "/home/ll_admin/bin -> /usr/lpp/LoadL/full/bin"
llinit: creating symbolic link "/home/ll_admin/lib -> /usr/lpp/LoadL/full/lib"
llinit: creating symbolic link "/home/ll_admin/man -> /usr/lpp/LoadL/full/man"
llinit: creating symbolic link "/home/ll_admin/samples -> /usr/lpp/LoadL/full/samples"
llinit: creating symbolic link "/home/ll_admin/include -> /usr/lpp/LoadL/full/include"
llinit: program complete.
Purpose
Changes the user priority of one or more job steps in the LoadLeveler queue. You can adjust the priority by supplying a + (plus) or - (minus) immediately followed by an integer value. llprio does not affect a job step that is running, even if its priority is lower than other jobs steps, unless the job step goes into the Idle state.
Syntax
llprio [-?] [-H] [-v] [-q] [+integer | -integer | -p priority] joblist
Flags
If the job step was submitted from a submit-only machine, this is the name of the machine where the schedd daemon that sent the job to the negotiator resides.
Description
The user priority of a job step ranges from 0 to 100, inclusive, with higher numbers corresponding to greater priority. The default priority is 50. Only the owner of a job step or the LoadLeveler administrator can change the priority of that job step. Note that this priority is not the UNIX nice priority.
Priority changes resulting in a value less than 0 become 0.
Priority changes resulting in a value greater than 100 become 100.
Any change to a job step's priority applied by a user is relative only to that user's other job steps in the same class. If you have three job steps enqueued, you can reorder those three job steps with llprio but the result does not affect job steps submitted by other users, regardless of their priority and position in the queue.
See "Setting and Changing the Priority of a Job" for more information.
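The clamping rules above can be sketched as shell arithmetic. This is illustrative only; LoadLeveler applies these bounds internally, and the clamp function here is a hypothetical helper, not part of LoadLeveler:

```shell
# Hypothetical helper: clamp a priority value to the valid 0-100 range
clamp() {
  local p=$1
  (( p < 0 )) && p=0
  (( p > 100 )) && p=100
  echo "$p"
}
clamp 125   # prints 100
clamp -10   # prints 0
```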
Examples
This example raises the priority of job 4, job step 1 submitted to machine bronze by a value of 25.
llprio +25 bronze.4.1
This example sets the priority of job 18, job step 4 submitted to machine silver to 100, the highest possible value.
llprio -p 100 silver.18.4
Results
The following shows a sample system response for the llprio -p 100 silver.18.4 command.
llprio: Priority command has been sent to the central manager.
Purpose
Returns information about jobs that have been dispatched.
Syntax
llq [-?] [-H] [-v] [-x] [-s] [-l] [joblist] [-u userlist] [-h hostlist] [-c classlist] [-f category_list] [-r category_list]
Flags
CPU usage and other resource consumption information on active jobs can only be reported using the -x flag if the LoadLeveler administrator has enabled it by specifying A_ON and A_DETAIL for the ACCT keyword in the LoadLeveler configuration file.
Normally, llq connects with the central manager to obtain job information. When you specify -x, llq connects to the schedd machine that received the specified job to get extended job information.
When specified without -l, CPU usage for active jobs is reported in the short format. Using -x can produce a very long report and can cause excess network traffic.
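As a sketch, the configuration-file entry that enables this extended accounting might look like the following (assuming the keyword syntax described above; check your LoadL_config for the exact form):

```
ACCT = A_ON A_DETAIL
```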
If -l is not specified, then the standard listing is generated as shown in Results.
If the job was submitted from a submit-only machine, this is the name of the machine where the schedd daemon that sent the job to the negotiator resides.
If the -u or -h options are not specified, and if no jobid is specified, then all jobs are queried.
The -u and -h options override the jobid parameters.
Examples
This example generates a long listing for job 8, job step 2 submitted to machine gold.
llq -l gold.8.2
This example generates a standard listing for all job steps of job name 12 submitted to the local machine.
llq 12
Results
In this section, the term "job step" refers to either a serial job step or a parallel task.
Standard Listing: The standard listing is generated when you do not specify the -l option with the llq command. The following is sample output from the llq -h mars command, where the machine mars has two jobs running and one job waiting.
Id                       Owner      Submitted   ST PRI Class        Running On
------------------------ ---------- ----------- -- --- ------------ -----------
mars.498.0               brownap    5/20 11:31  R  100 silver       mars
mars.499.0               brownap    5/20 11:31  R  50  No_Class     mars
mars.501.0               brownap    5/20 11:31  I  50  silver

3 job steps in queue, 1 waiting, 0 pending, 2 running, 0 held.
The standard listing includes the following fields:
For a detailed explanation of job states, see "LoadLeveler Job States".
Customized, Formatted Standard Listing: A customized and formatted standard listing is generated when you specify llq with the -f flag. The following is sample output from this command:
llq -f %id %c %dq %dd %gl %h
Step Id           Class      Queue Date  Disp. Date  LL Group   Running On
----------------- ---------- ----------- ----------- ---------- ---------------
ll6.2.0           No_Class   04/08 09:19 04/08 09:21 No_Group   ll6.pok.ibm.com
ll6.1.0           No_Class   04/08 09:19 04/08 09:21 No_Group   ll6.pok.ibm.com
ll6.3.0           No_Class   04/08 09:19 04/08 09:21 No_Group   ll5.pok.ibm.com

3 job steps in queue, 0 waiting, 0 pending, 3 running, 0 held
Customized, Unformatted Standard Listing: A customized and unformatted (raw) standard listing is generated when you specify llq with the -r flag. Output fields are separated by an exclamation point (!). The following is sample output from this command:
llq -r %id %c %dq %dd %gl %h
ll6.pok.ibm.com.2.0!No_Class!04/08 09:19!04/08 09:21!No_Group!ll6.pok.ibm.com
ll6.pok.ibm.com.1.0!No_Class!04/08 09:19!04/08 09:21!No_Group!ll6.pok.ibm.com
ll6.pok.ibm.com.3.0!No_Class!04/08 09:19!04/08 09:21!No_Group!ll5.pok.ibm.com
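Because the raw listing separates fields with exclamation points, it is convenient for post-processing with standard tools. For example, using one of the sample lines from the raw listing (this pipeline is illustrative, not part of LoadLeveler):

```shell
# Extract the step id and running host (fields 1 and 6) from a raw llq line
echo 'll6.pok.ibm.com.2.0!No_Class!04/08 09:19!04/08 09:21!No_Group!ll6.pok.ibm.com' \
  | awk -F'!' '{ print $1, $6 }'
# prints: ll6.pok.ibm.com.2.0 ll6.pok.ibm.com
```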
The Long Listing: The long listing is generated when you specify the -l option with the llq command. This section contains sample output for two llq commands: one querying a serial job and one querying a parallel job. Following the sample output is an explanation of all possible fields displayed by the llq command.
The following is sample output for the llq -l command for the serial job "ll6.pok.ibm.com.2."
=============== Job Step ll6.pok.ibm.com.2.0 ===============
        Job Step Id: ll6.pok.ibm.com.2.0
           Job Name: ll6.pok.ibm.com.2
          Step Name: ltest1
  Structure Version: 9
              Owner: loadl
         Queue Date: Wed Apr 8 09:19:21 1998
             Status: Running
      Dispatch Time: Wed Apr 8 09:21:40 1998
    Completion Date:
    Completion Code:
      User Priority: 50
       user_sysprio: 0
      class_sysprio: 30
      group_sysprio: 0
    System Priority: -1116
          q_sysprio: -1116
      Notifications: Complete
 Virtual Image Size: 1 kilobytes
         Checkpoint:
            Restart: yes
     Hold Job Until:
                Cmd: c_test1.cmd
               Args:
                Env:
                 In: /dev/null
                Out: c_test1_cmd.ll6.2.0.out
                Err: c_test1_cmd.ll6.2.0.err
Initial Working Dir: /home/loadl/TEST_DIR
         Dependency:
       Requirements: ((Arch == "R6000") && (OpSys == "AIX43"))
        Preferences:
          Step Type: Serial
     Min Processors:
     Max Processors:
     Allocated Host: ll6.pok.ibm.com
    Submitting host: ll6.pok.ibm.com
        Notify User: loadl@ll6.pok.ibm.com
              Shell: /bin/ksh
  LoadLeveler Group: No_Group
              Class: No_Class
     Cpu Hard Limit: -1
     Cpu Soft Limit: -1
    Data Hard Limit: -1
    Data Soft Limit: -1
    Core Hard Limit: -1
    Core Soft Limit: -1
    File Hard Limit: -1
    File Soft Limit: -1
   Stack Hard Limit: -1
   Stack Soft Limit: -1
     Rss Hard Limit: -1
     Rss Soft Limit: -1
Step Cpu Hard Limit: -1
Step Cpu Soft Limit: -1
Wall Clk Hard Limit: 3000 seconds
Wall Clk Soft Limit: -1
            Comment:
            Account:
         Unix Group: loadl
 User Space Windows: 0
   NQS Submit Queue:
   NQS Query Queues:
The following is sample output for the llq -l -x k10n10.3.0 command, where k10n10.3.0 is a parallel job.
=============== Job Step k10n10.ppd.pok.ibm.com.3.0 ===============
        Job Step Id: k10n10.ppd.pok.ibm.com.3.0
           Job Name: k10n10.ppd.pok.ibm.com.3
          Step Name: 0
  Structure Version: 9
              Owner: richc
         Queue Date: Wed Apr 8 13:33:10 1998
             Status: Running
      Dispatch Time:
         Start Time:
    Completion Date:
    Completion Code:
      User Priority: 50
       user_sysprio: 0
      class_sysprio: 0
      group_sysprio: 0
    System Priority: 0
          q_sysprio: 0
      Notifications: Always
 Virtual Image Size: 506 kilobytes
         Checkpoint:
            Restart: yes
     Hold Job Until:
                Env:
                 In: /dev/null
                Out: mpi.out
                Err: mpi.err
Initial Working Dir: /u/richc/sp/mpi
         Dependency:
          Step Type: General Parallel
    Submitting host: k10n10.ppd.pok.ibm.com
        Notify User: richc@k10n10.ppd.pok.ibm.com
              Shell: /bin/ksh
  LoadLeveler Group: No_Group
              Class: No_Class
     Cpu Hard Limit: -1
     Cpu Soft Limit: -1
    Data Hard Limit: -1
    Data Soft Limit: -1
    Core Hard Limit: -1
    Core Soft Limit: -1
    File Hard Limit: -1
    File Soft Limit: -1
   Stack Hard Limit: -1
   Stack Soft Limit: -1
     Rss Hard Limit: -1
     Rss Soft Limit: -1
Step Cpu Hard Limit: -1
Step Cpu Soft Limit: -1
Wall Clk Hard Limit: 6000 seconds
Wall Clk Soft Limit: 5965 seconds
            Comment:
            Account:
         Unix Group: usr
   NQS Submit Queue:
   NQS Query Queues:
Negotiator Messages:
--------------- Detail for k10n10.ppd.pok.ibm.com.3.0 ---------------
Running Host: k10n09.ppd.pok.ibm.com
Machine Speed: 1.000000
Starter User Time: 0+00:00:00.170000
Starter System Time: 0+00:00:00.300000
Starter Total Time: 0+00:00:00.470000
Starter maxrss: 1256
Starter ixrss: 5628
Starter idrss: 9552
Starter isrss: 0
Starter minflt: 793
Starter majflt: 10
Starter nswap: 0
Starter inblock: 0
Starter oublock: 0
Starter msgsnd: 0
Starter msgrcv: 0
Starter nsignals: 0
Starter nvcsw: 399
Starter nivcsw: 31
Step User Time: 0+00:00:00.40000
Step System Time: 0+00:00:00.40000
Step Total Time: 0+00:00:00.80000
Step maxrss: 960
Step ixrss: 2120
Step idrss: 2436
Step isrss: 0
Step minflt: 273
Step majflt: 12
Step nswap: 0
Step inblock: 0
Step oublock: 0
Step msgsnd: 0
Step msgrcv: 0
Step nsignals: 0
Step nvcsw: 0
Step nivcsw: 0
-------------------------------------------------
Node
----

Name :
Requirements :
Preferences :
Node minimum : 2
Node maximum : 2
Node actual : 2
Allocated Hosts : k10n09.ppd.pok.ibm.com:RUNNING:css0(0,MPI,us),
                  css0(1,MPI,us)
                + k10n10.ppd.pok.ibm.com:RUNNING:css0(0,MPI,us),
                  css0(1,MPI,us)
Master Task
-----------

Executable : /u/richc/sp/poe/poe.musppa
Exec Args : /u/richc/sp/mpi/fvt_mpi -v 131072 -euilib us -ilevel 6
            -labelio yes -pmdlog yes
Num Task Inst: 1
Task Instance: k10n09:-1

Task
----

Num Task Inst: 4
Task Instance: k10n09:0:css0(0,MPI,us)
Task Instance: k10n09:1:css0(1,MPI,us)
Task Instance: k10n10:2:css0(0,MPI,us)
Task Instance: k10n10:3:css0(1,MPI,us)
The long listing includes these fields:
For a detailed explanation of these job states, see "LoadLeveler Job States".
Other fields displayed when issuing llq -x -l are:
Other fields displayed for parallel jobs are:
Purpose
Returns status information about machines in the LoadLeveler cluster. It does not provide status on any NQS machine.
Syntax
llstatus [-?] [-H] [-v] [-l] [-f category_list] [-r category_list] [hostlist]
Flags
Description
If no hostlist is specified, all machines are queried.
If you have more than a few machines configured for LoadLeveler, consider redirecting the output to a file when using the -l flag.
Each machine periodically sends the central manager a snapshot of its situation. Because the information llstatus returns is a collection of these snapshots, each taken at a different time, the overall picture may not be entirely consistent.
Examples
This example requests a long status listing for machines named silver and gold.
llstatus -l silver gold
In this section, the term "job step" refers to either a serial job step or a parallel task.
The Standard Listing: The standard listing is generated when you do not specify the -l option with the llstatus command. The following is sample output from the llstatus command, where there are two nodes in the cluster.
Name                    Schedd  InQ  Act  Startd  Run  LdAvg  Idle  Arch   OpSys
k10n09.ppd.pok.ibm.com  Avail     3    3  Run       1   2.72     0  R6000  AIX43
k10n12.ppd.pok.ibm.com  Avail     0    0  Idle      0   0.00   365  R6000  AIX43

R6000/AIX43     2 machines  3 jobs  1 running
Total Machines  2 machines  3 jobs  1 running

The Central Manager is defined on k10n09.ppd.pok.ibm.com

All machines on the machine_list are present.
The standard listing includes the following fields:
For a detailed explanation of these states, see "The schedd Daemon".
For a detailed explanation of these states, see "The startd Daemon".
Customized, Formatted Standard Listing: A customized and formatted standard listing is generated when you specify llstatus with the -f option. The following is sample output from this command:
llstatus -f %n %scs %inq %m %v %sts %l %o
Name             Schedd  InQ  Memory  FreeVMemory  Startd  LdAvg  OpSys
ll5.pok.ibm.com  Avail     0     128        22708  Run      0.23  AIX43
ll6.pok.ibm.com  Avail     3     224        16732  Run      0.51  AIX43

R6000/AIX43     2 machines  3 jobs  3 running
Total Machines  2 machines  3 jobs  3 running

The Central Manager is defined on ll5.pok.ibm.com

All machines on the machine_list are present.
Customized, Unformatted Standard Listing: A customized and unformatted (raw) standard listing is generated when you specify llstatus with the -r flag. Output fields are separated by an exclamation point (!). The following is sample output from this command:
llstatus -r %n %scs %inq %m %v %sts %l %o
ll5.pok.ibm.com!Avail!0!128!22688!Running!0.14!AIX43
ll6.pok.ibm.com!Avail!3!224!16668!Running!0.37!AIX43
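Because the field separator is fixed, the raw listing is convenient for post-processing by scripts. The sketch below is not part of LoadLeveler; it simply pairs each raw field with the format specifier that produced it, using a line copied from the sample output above:

```python
# Sketch: split one llstatus -r record on the '!' separator and pair
# each value with the format specifier that produced it. The sample
# line is taken from the raw listing above.
SPECS = ["%n", "%scs", "%inq", "%m", "%v", "%sts", "%l", "%o"]

def parse_raw_record(line):
    """Return a dict mapping each format specifier to its raw value."""
    return dict(zip(SPECS, line.rstrip().split("!")))

record = parse_raw_record("ll5.pok.ibm.com!Avail!0!128!22688!Running!0.14!AIX43")
print(record["%n"], record["%sts"])   # ll5.pok.ibm.com Running
```

The same approach works for any combination of specifiers, as long as the script and the llstatus -r invocation agree on the order.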
The Long Listing: The long listing is generated when you specify the -l option with the llstatus command. Following the sample output is an explanation of all possible fields displayed by the llstatus command.
The following is sample output from the llstatus -l ll6 command:
================================================================================
Name = ll6.pok.ibm.com
Machine = ll6.pok.ibm.com
Arch = R6000
OpSys = AIX43
SYSPRIO = (0 - QDate)
MACHPRIO = (0 - LoadAvg)
VirtualMemory = 16640
Disk = 23000
KeyboardIdle = 600
Tmp = 48868
LoadAvg = 0.302991
ConfiguredClasses = No_Class(2) osl(1) small(2) medium(1) POE(2)
AvailableClasses = No_Class(0) osl(1) small(2) medium(1) POE(2)
DrainingClasses =
DrainedClasses =
Pool = 1
Adapter = css0(tb3mx,llx5,9.114.16.155,26,4)
Feature =
Max_Starters = 2
Memory = 224
ConfigTimeStamp = Wed Apr 8 09:05:36 1998
Cpus = 1
Speed = 1.000000
Subnet = 9.117.17
MasterMachPriority = 0.000000
CustomMetric = 1
StartdAvail = 1
State = Running
EnteredCurrentState = Wed Apr 8 09:46:33 1998
START = T
SUSPEND = F
CONTINUE = T
VACATE = F
KILL = F
Machine Mode = general
Running = 2
ScheddAvail = 1
ScheddState = Avail
ScheddRunning = 3
Pending = 0
Starting = 0
Idle = 0
Unexpanded = 0
Held = 0
Removed = 0
RemovedPending = 0
Completed = 0
TotalJobs = 3
TimeStamp = Wed Apr 8 09:47:45 1998
The long listing includes these fields:
For a detailed explanation of these states, see "The startd Daemon".
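The long listing's one-keyword-per-line Keyword = Value layout also lends itself to simple scripting. The following sketch is an illustration, not a LoadLeveler utility; it collects the pairs into a dictionary, with sample values taken from the listing above:

```python
# Sketch: collect the "Keyword = Value" lines of an llstatus -l
# listing into a dictionary. Splitting on the first '=' only keeps
# values such as "(0 - QDate)" intact.
def parse_llstatus_long(text):
    info = {}
    for line in text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            info[key.strip()] = value.strip()
    return info

sample = """Name = ll6.pok.ibm.com
Arch = R6000
Max_Starters = 2
SYSPRIO = (0 - QDate)"""

info = parse_llstatus_long(sample)
print(info["Arch"], info["Max_Starters"])   # R6000 2
```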
Purpose
Submits a job to LoadLeveler to be dispatched based upon job requirements in the job command file.
You can submit both LoadLeveler jobs and NQS jobs. To submit NQS jobs, the job command file must contain the shell script to be submitted to the NQS node.
Syntax
llsubmit [-?] [-H] [-v] [-q] [-n] [cmdfile | -]
Flags
When a job completes, the mail notification may include a message such as:

Your LoadLeveler job myjob1 exited with status 139.

The return code 139 comes from the user's job; it is not a LoadLeveler return code.
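An exit status above 128 follows the usual UNIX shell convention: the process was ended by signal number status − 128, so 139 corresponds to signal 11 (SIGSEGV). This decoding is standard shell behavior rather than anything specific to LoadLeveler; a sketch:

```python
# Sketch: decode a shell-style exit status such as the 139 above.
# By UNIX convention, a status above 128 means the process was
# terminated by signal number (status - 128).
import signal

def decode_status(status):
    if status > 128:
        signum = status - 128
        return "killed by signal %d (%s)" % (signum, signal.Signals(signum).name)
    return "exited with code %d" % status

print(decode_status(139))   # killed by signal 11 (SIGSEGV)
```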
Examples
In this example, a job command file named qtrlyrun.cmd is submitted.
llsubmit qtrlyrun.cmd
Results
The following shows the results of the llsubmit qtrlyrun.cmd command issued from the machine earth:
llsubmit: The job "earth.505" has been submitted.
Note that 505 is the job identifier assigned by LoadLeveler, and earth is the name of the machine from which the job was submitted.
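Job identifiers have the form host.jobid, and the job step identifiers shown elsewhere in this chapter append a step number (for example, the first step of this job would be earth.505.0). Because fully qualified hostnames themselves contain dots, a script that takes a step identifier apart should split from the right. The helper below is illustrative only:

```python
# Sketch: split a LoadLeveler job step identifier of the form
# <host>.<jobid>.<stepid>. Hostnames may contain dots, so split
# from the right. The helper name is illustrative, not official.
def split_step_id(step_id):
    host, jobid, stepid = step_id.rsplit(".", 2)
    return host, int(jobid), int(stepid)

print(split_step_id("ll6.pok.ibm.com.2.0"))   # ('ll6.pok.ibm.com', 2, 0)
```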
Purpose
Returns job resource information on completed jobs for accounting purposes.
Syntax
llsummary [-?] [-H] [-v] [-x] [-l] [-s MM/DD/YY to MM/DD/YY] [-e MM/DD/YY to MM/DD/YY] [-u user] [-c class] [-g group] [-G unixgroup] [-a allocated] [-r report] [-j host.jobid] [-d section] [filename]
Flags
To generate any of the four throughput reports, you must enable the recording of accounting data by specifying ACCT = A_ON A_DETAIL in your LoadL_config file.
Examples
The following example requests summary reports (standard listing) of all the jobs submitted on your machine between the days of September 12, 1999 and October 12, 1999:
llsummary -s 09/12/99 to 10/12/99
The Standard Listing: The standard listing is generated when you do not specify -l, -r, or -d with llsummary. This sample report includes summaries of the following data:
The following is an example of the standard listing:
Name         Jobs  Steps     Job Cpu  Starter Cpu  Leverage
krystal        15     36  0+00:09:50   0+00:00:10      59.0
lixin3         18     54  0+00:08:28   0+00:00:16      31.8
TOTAL          33     90  0+00:18:18   0+00:00:27      40.7

Class        Jobs  Steps     Job Cpu  Starter Cpu  Leverage
small           9     21  0+00:01:03   0+00:00:06      10.5
large          12     36  0+00:13:45   0+00:00:11      75.0
osl2            3      9  0+00:00:27   0+00:00:02      13.5
No_Class        9     24  0+00:03:01   0+00:00:06      30.2
TOTAL          33     90  0+00:18:18   0+00:00:27      40.7

Group        Jobs  Steps     Job Cpu  Starter Cpu  Leverage
No_Group       12     30  0+00:09:32   0+00:00:09      63.6
chemistry       7     18  0+00:04:50   0+00:00:05      58.0
engineering    14     42  0+00:03:56   0+00:00:12      19.7
TOTAL          33     90  0+00:18:18   0+00:00:27      40.7

Account      Jobs  Steps     Job Cpu  Starter Cpu  Leverage
33333          16     39  0+00:05:54   0+00:00:11      32.2
22222          15     45  0+00:12:05   0+00:00:13      55.8
99999           2      6  0+00:00:18   0+00:00:01      18.0
TOTAL          33     90  0+00:18:18   0+00:00:27      40.7
The standard listing includes the following fields:
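As the sample numbers suggest, the Leverage column is the ratio of job CPU time to starter CPU time, with durations printed in the d+hh:mm:ss format used throughout these listings. A sketch of the arithmetic (the helper names are illustrative):

```python
# Sketch: reproduce the Leverage column of the report above as job
# CPU time divided by starter CPU time, with durations given in the
# d+hh:mm:ss format used throughout these listings.
def to_seconds(t):
    days, clock = t.split("+")
    h, m, s = (int(x) for x in clock.split(":"))
    return int(days) * 86400 + h * 3600 + m * 60 + s

def leverage(job_cpu, starter_cpu):
    return round(to_seconds(job_cpu) / to_seconds(starter_cpu), 1)

# First line of the sample report: krystal, 0+00:09:50 over 0+00:00:10
print(leverage("0+00:09:50", "0+00:00:10"))   # 59.0
```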
The -r Listing: The following is sample output from the llsummary -r throughput command. Only the user section is shown; the class, group, and account sections are omitted here.
Name    Jobs  Steps  AvgQueueTime  AvgRealTime  AvgCPUTime
loadl      1      4    0+00:00:03   0+00:05:27  0+00:05:17
user1      2      6    0+00:03:05   0+00:03:45  0+00:03:04
ALL        3     10    0+00:01:52   0+00:04:26  0+00:03:58

Name    Jobs  Steps  MinQueueTime  MinRealTime  MinCPUTime
loadl      1      4    0+00:00:01   0+00:02:49  0+00:02:44
user1      2      6    0+00:02:02   0+00:03:43  0+00:03:02
ALL        3     10    0+00:00:01   0+00:02:49  0+00:02:44

Name    Jobs  Steps  MaxQueueTime  MaxRealTime  MaxCPUTime
loadl      1      4    0+00:00:06   0+00:12:58  0+00:12:37
user1      2      6    0+00:06:21   0+00:03:48  0+00:03:07
ALL        3     10    0+00:06:21   0+00:12:58  0+00:12:37
The -r listing includes the following fields:
The MaxQueueTime, MaxRealTime, and MaxCPUTime fields display the time of the job with the greatest amount of queue, wall clock, and CPU time, respectively. The ALL line for the Average listing displays the average time for all users, classes, groups, and accounts. The ALL line for the Minimum listing displays the time of the job with the least amount of time for all users, classes, groups, and accounts. The ALL line for the Maximum listing displays the time of the job with the greatest amount of time for all users, classes, groups, and accounts.
The Long Listing: When you specify the -x option in conjunction with the -l option on the llsummary command, the long report resembles the following:
================== Job ll1.kgn.ibm.com.772 =================
Job Id: ll1.kgn.ibm.com.772
Job Name: ll1.kgn.ibm.com.772
Structure Version: 140
Owner: anton
Unix_group: staff
Submitting Host: ll1.kgn.ibm.com
Submitting Userid: 17212
Submitting Groupid: 100
Number of Steps: 1
----------------- Step ll1.kgn.ibm.com.772.0 -----------------
Job Step Id: ll1.kgn.ibm.com.772.0
Step Name: c_test
Queue Date: Wed Sep 6 09:43:38 CDT 1998
Job Step Dependency:
Status: Completed
Completion Date: Wed Sep 6 10:27:23 CDT 1998
Completion Code: 0
Start Count: 1
User Priority: 50
user_sysprio: 0
class_sysprio: 0
group_sysprio: 0
Notifications: Complete
Virtual Image Size: 19 kilobytes
Checkpoint: no
Restart: yes
Hold Job Until:
Cmd: job1.cmd
Args:
Env: LOADL_CORESIZE = 1024
In: /dev/null
Out: job1.ll1.772.0.out
Err: job1.ll1.772.0.err
Initial Working Dir: /u/jeffli/regress
Requirements: (Arch == "R6000") && (OpSys == "AIX43")
Preferences:
Step Type: Serial
Min Processors:
Max Processors:
Allocated Host: ll1.kgn.ibm.com
Notify User: anton@ll1.kgn.ibm.com
Shell: /bin/ksh
LoadLeveler Group: No_Group
Class: No_Class
Cpu Hard Limit: 300 seconds
Cpu Soft Limit: 100 seconds
Data Hard Limit: 262144000 bytes
Data Soft Limit: 230686720 bytes
Core Hard Limit: 262144000 bytes
Core Soft Limit: -1
File Hard Limit: 262144000 bytes
File Soft Limit: 230686720 bytes
Stack Hard Limit: 262144000 bytes
Stack Soft Limit: -1
Rss Hard Limit: 262144000 bytes
Rss Soft Limit: -1
Step Cpu Hard Limit: 400 seconds
Step Cpu Soft Limit: 200 seconds
Wall Clk Hard Limit: 600 seconds
Wall Clk Soft Limit: 300 seconds
Comment:
Account:
NQS Submit Queue:
NQS Query Queues:
Job Tracking Exit:
Job Tracking Args:
--------------- Detail for ll1.kgn.ibm.com.772.0 ------
Running Host: ll1.kgn.ibm.com
Machine Speed: 1.000000
Event: System
Event Name: completed
Time of Event: Wed Sep 6 10:27:23 CDT 1998
Starter User Time: 0+00:00:00.240000
Starter System Time: 0+00:00:00.390000
Starter Total Time: 0+00:00:00.630000
Starter maxrss: 828
Starter ixrss: 8388
Starter idrss: 6896
Starter isrss: 0
Starter minflt: 202
Starter majflt: 0
Starter nswap: 0
Starter inblock: 0
Starter oublock: 0
Starter msgsnd: 12
Starter msgrcv: 11
Starter nsignals: 1
Starter nvcsw: 79
Starter nivcsw: 0
Step User Time: 0+00:00:00.810000
Step System Time: 0+00:00:01.500000
Step Total Time: 0+00:00:02.310000
Step maxrss: 712
Step ixrss: 15540
Step idrss: 14296
Step isrss: 0
Step minflt: 1443
Step majflt: 0
Step nswap: 0
Step inblock: 0
Step oublock: 0
Step msgsnd: 5
Step msgrcv: 4
Step nsignals: 14
Step nvcsw: 70
Step nivcsw: 0
For an explanation of these fields, see the description of the output fields for the long listing of the llq command.