Using and Administering

Step 3: Specify Class Stanzas

The information in a class stanza defines characteristics for that class. Class stanzas are optional. Class stanzas take the following format. Default values for keywords appear in bold.

Figure 27. Format of a Class Stanza

label: type = class
admin = list
class_comment = "string"
default_resources = name(count) name(count)...name(count)
exclude_groups = list
exclude_users = list
include_groups = list
include_users = list
master_node_requirement = true | false
maxjobs = number
max_node = number
max_processors = number
nice = value
NQS_class = true | false
NQS_submit = name
NQS_query = queue names
priority = number
total_tasks = number
core_limit = hardlimit,softlimit
cpu_limit = hardlimit,softlimit
data_limit = hardlimit,softlimit
file_limit = hardlimit,softlimit
job_cpu_limit = hardlimit,softlimit
rss_limit = hardlimit,softlimit
stack_limit = hardlimit,softlimit
wall_clock_limit = hardlimit,softlimit

You can specify the following keywords in a class stanza:

admin = list
where list is a blank-delimited list of administrators for this class. These administrators can hold, release, and cancel jobs in this class.

class_comment = "string"
where string is text characterizing the class. This information appears when the user is building a job command file using the GUI and requests Choice information on the classes to which he or she is authorized to submit jobs. The length of the string cannot exceed 1024 characters.

default_resources = name(count) name(count)...name(count)

Specifies the default amount of resources consumed by a task of a job step, provided that no resources keyword is coded for the step in the job command file. If a resources keyword is coded for a job step, it overrides any default_resources defined for the associated job class. The syntax is:

resources=name(count) name(count) ... name(count)

where name(count) could also be ConsumableMemory(count units) or ConsumableVirtualMemory(count units). ConsumableMemory and ConsumableVirtualMemory are the only two consumable resources that can be specified with both a count and units. The count for each specified resource must be an integer greater than or equal to zero, with three exceptions: ConsumableCpus and ConsumableMemory must be specified with a value greater than zero, and ConsumableVirtualMemory must be specified with a value that is greater than zero and greater than or equal to the image_size. If the count is not valid, LoadLeveler issues an error message and does not submit the job. The allowable units are those normally used with LoadLeveler data limits:

b     bytes
w     words
kb    kilobytes (2**10 bytes)
kw    kilowords (2**10 words)
mb    megabytes (2**20 bytes)
mw    megawords (2**20 words)
gb    gigabytes (2**30 bytes)
gw    gigawords (2**30 words)

ConsumableMemory and ConsumableVirtualMemory values are stored in megabytes (mb) and rounded up. Therefore, the smallest amount of ConsumableMemory or ConsumableVirtualMemory that you can request is one megabyte. If no units are specified, megabytes are assumed. However, image size units are in kilobytes. Resources defined here that are not in the SCHEDULE_BY_RESOURCES list in the global configuration file will not affect the scheduling of the job. If the resources keyword is not specified in the job step, the default_resources (if any) defined in the administration file for the class are used for each task of the job step.
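
As a rough illustration of the rounding rule described above, the following Python sketch converts a ConsumableMemory or ConsumableVirtualMemory count to whole megabytes. The function name consumable_memory_mb is hypothetical and is not part of LoadLeveler. Word-based units (w, kw, mw, gw) are left out of the sketch because the word size is not stated in this section.

```python
import re

# Bytes per unit for the byte-based units listed above.
# Word-based units are omitted: the word size is not given in this section.
BYTES_PER_UNIT = {"b": 1, "kb": 2**10, "mb": 2**20, "gb": 2**30}
MEGABYTE = 2**20


def consumable_memory_mb(spec):
    """Return the whole number of megabytes for a 'count [units]' string.

    Hypothetical helper for illustration only, not a LoadLeveler API.
    """
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]*)\s*", spec.lower())
    if match is None:
        raise ValueError(f"not an integer count with optional units: {spec!r}")
    count = int(match.group(1))
    units = match.group(2) or "mb"  # no units given: megabytes are assumed
    if units not in BYTES_PER_UNIT:
        raise ValueError(f"units not handled by this sketch: {units!r}")
    if count <= 0:
        raise ValueError("the count must be greater than zero")
    total_bytes = count * BYTES_PER_UNIT[units]
    # Round up to the next whole megabyte using integer arithmetic.
    return -(-total_bytes // MEGABYTE)
```

Under these assumptions, a request of 1500kb is a little under 1.5 megabytes and would therefore be stored as 2 megabytes, and any request smaller than one megabyte would be stored as 1.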

exclude_groups = list
where list is a blank-delimited list of groups who are not allowed to submit jobs of that class name. Do not specify both a list of included groups and a list of excluded groups. Only one of these may be used for any class. The default is that no groups are excluded.

exclude_users = list
where list is a blank-delimited list of users who are not permitted to submit jobs of that class name. Do not specify both a list of included users and a list of excluded users. Only one of these may be used for any class. The default is that no users are excluded.

include_groups = list
where list is a blank-delimited list of groups who are allowed to submit jobs of that class name. If provided, this list limits groups of that class to those on the list. Do not specify both a list of included groups and a list of excluded groups. Only one of these may be used for any class. The default is to include all groups.

include_users = list
where list is a blank-delimited list of users who are permitted to submit jobs of that class name. If provided, this list limits users of that class to those on the list. Do not specify both a list of included users and a list of excluded users. Only one of these may be used for any class. The default is to include all users.
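
The include and exclude rules for users can be pictured with a short Python sketch. It only mirrors the rules stated above (one list per class, defaults of "include all" and "exclude none"); the function may_submit is a hypothetical name, group lists are ignored for brevity, and this is not how LoadLeveler itself is implemented.

```python
def may_submit(user, include_users=(), exclude_users=()):
    """Return True if `user` may submit jobs to a class (illustrative only)."""
    if include_users and exclude_users:
        # Only one of the two lists may be used for any class.
        raise ValueError("specify include_users or exclude_users, not both")
    if include_users:
        # An include list limits the class to the users it names.
        return user in include_users
    if exclude_users:
        return user not in exclude_users
    return True  # defaults: all users are included, none are excluded


# Blank-delimited lists, as they would appear in a class stanza.
authorized = "bob sally".split()
banned = "green judy".split()
```

With these lists, may_submit("bob", include_users=authorized) is true, while may_submit("green", exclude_users=banned) is false.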

master_node_requirement = true | false
where true specifies that parallel jobs in this class require the master node feature. For these jobs, LoadLeveler allocates the first node (called the "master") on a machine having the master_node_exclusive = true setting in its machine stanza. If most or all of your parallel jobs require this feature, you should consider placing the statement master_node_requirement = true in your default class stanza. Then, for classes that do not require this feature, you can use the statement master_node_requirement = false in their class stanzas to override the default setting. One machine per class should have the true setting; if more than one machine has this setting, normal scheduling selection is performed.

maxjobs = number
where number is the maximum number of jobs that can run in this class. If the class stanza does not specify maxjobs, or if there is no class stanza at all, the maximum number of jobs that can run simultaneously in this class is defined in the default stanza. The default is -1, which means that no limit is placed on the number of jobs a user can submit.

max_processors = number
where number specifies the maximum number of processors a user submitting jobs to this class can request for a parallel job in a job command file using the min_processors and max_processors keywords. The default is -1, which means that there is no limit.

max_node = number
where number specifies the maximum number of nodes a user submitting jobs in this class can request for a parallel job in a job command file using the node keyword. The default is -1, which means there is no limit. The max_node keyword will not affect the use of the min_processors and max_processors keywords in the job command file.

nice = value
where value is the amount by which the current UNIX nice value is incremented. The nice value is one factor in a job's run priority. The lower the number, the higher the run priority. If two jobs are running on a machine, the nice value determines the percentage of the CPU allocated to each job.

This value ranges from -20 to 20. Values out of this range are placed at the top (or bottom) of the range. For example, if your current nice value is 15, and you specify nice = 10, the resulting value is 20 (the upper limit) rather than 25. The default is 0.
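
The range handling described above amounts to a simple clamp. The following Python sketch reproduces the example from the text; the function name resulting_nice is hypothetical and is used only for illustration.

```python
NICE_MIN, NICE_MAX = -20, 20  # range stated in the text above


def resulting_nice(current, increment):
    """Add the class nice increment to the current value, clamped to the range."""
    return max(NICE_MIN, min(NICE_MAX, current + increment))


# Example from the text: current value 15, nice = 10 in the class stanza.
example = resulting_nice(15, 10)  # 20 (the upper limit), not 25
```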

For more information, consult the appropriate UNIX documentation.

NQS_class = true | false
When true, any job submitted to this class will be routed to an NQS machine.

NQS_submit = name
where name is the name of the NQS pipe queue to which the job will be routed. When the job is dispatched to LoadLeveler, LoadLeveler will invoke the qsub command using the name of this queue. There is no default.

NQS_query = queue names
where queue names is a blank-delimited list of queue names (including host names if necessary) to be used with the qstat command to monitor the job and with the qdel command to cancel the job. There is no default.

For more information on routing jobs to machines running NQS, refer to Figure 31.

priority = number
where number is an integer that specifies the priority for jobs in this class. The default is 0. The number specified for priority is referenced as ClassSysprio in the configuration file. You can use ClassSysprio when assigning job priorities. If the variable ClassSysprio does not appear in the SYSPRIO expression, then the priority specified here in the administration file is ignored. See Step 6: Prioritize the Queue Maintained by the Negotiator for more information about the ClassSysprio keyword.

total_tasks = number
where number specifies the maximum number of tasks a user submitting jobs in this class can request for a parallel job in a job command file using the total_tasks keyword. The default is -1, which means there is no limit.

Limit Keywords

The class stanza includes the following limit keywords, which allow you to control the amount of resources used by a job step or a job process.

Table 10. Types of Limit Keywords
Limit | How It Is Enforced
core_limit | Per process
cpu_limit | Per process
data_limit | Per process
file_limit | Per process
job_cpu_limit | Per job step
rss_limit | Per process
stack_limit | Per process
wall_clock_limit | Per job step

Individual keywords are described in Specifying Limits in the Class Stanza. The following section gives you a general overview of limits.

Overview of Limits

A limit is the amount of a resource that a job step or a process is allowed to use. (A process is a dispatchable unit of work.) A job step may be made up of several processes.

Limits include both a hard limit and a soft limit. When a hard limit is exceeded, the job is usually terminated. When a soft limit is exceeded, the job is usually given a chance to perform some recovery actions. For more information, see Exceeding Limits.

Limits are enforced either per process or per job step, depending on the type of limit. For parallel job steps, which consist of multiple tasks running on multiple machines, limits are enforced on a per task basis.

For example, a common limit is the cpu_limit, which limits the amount of CPU time a single process can use. If you set cpu_limit to five hours and you have a job step that forks five processes, each process can use up to five hours of CPU time, for a total of 25 CPU hours. Another limit that controls the amount of CPU used is job_cpu_limit. This is the total amount of CPU that the entire serial job step can use. If you impose a job_cpu_limit of five hours, the entire job step (made up of all five processes) cannot consume more than five CPU hours.
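
The arithmetic in this example can be written out as a small Python sketch. The function name max_step_cpu_seconds is hypothetical; it simply combines the per process limit and the per job step limit as described above.

```python
HOUR = 3600  # seconds


def max_step_cpu_seconds(cpu_limit, processes, job_cpu_limit=None):
    """Upper bound on the total CPU seconds a serial job step can consume.

    cpu_limit applies to each process separately; job_cpu_limit, when set,
    caps the sum over all processes of the job step.
    """
    per_process_total = cpu_limit * processes
    if job_cpu_limit is None:
        return per_process_total
    return min(per_process_total, job_cpu_limit)


# Five processes, cpu_limit of five hours each: 25 CPU hours in total.
without_job_limit = max_step_cpu_seconds(5 * HOUR, 5)
# Adding a job_cpu_limit of five hours caps the whole job step at five hours.
with_job_limit = max_step_cpu_seconds(5 * HOUR, 5, job_cpu_limit=5 * HOUR)
```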

You can specify limits in either the class stanza of the administration file or in the job command file. The lower of these two limits will be used to run the job. If the class limit is used, the job will be started regardless of the user's system limit.

Exceeding Limits

Process limits are enforced by the operating system. Job step limits are enforced by LoadLeveler.

Exceeding Job Step Limits

When a hard limit is exceeded, LoadLeveler sends a non-trappable signal to the process (except in the case of a parallel job). When a soft limit is exceeded, LoadLeveler sends a trappable signal to the process. The following chart summarizes the actions that occur when a job step limit is exceeded:

Table 11. Exceeding Job Step Limits
Type of Job | When a Soft Limit is Exceeded | When a Hard Limit is Exceeded
Serial | SIGXCPU or SIGKILL issued | SIGKILL issued
Parallel (non-PVM) | SIGXCPU issued to both the user program and to the parallel daemon | SIGTERM issued
PVM | SIGXCPU issued to the user program | pvm_halt invoked to shut down PVM

On systems that do not support SIGXCPU, LoadLeveler does not distinguish between hard and soft limits. When a soft limit is reached on these platforms, LoadLeveler issues a SIGKILL.
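
Because the soft limit signal is trappable, a job can install a handler and use the remaining time for recovery work. The following Python sketch shows one way a job program might do this on a UNIX system that supports SIGXCPU. The handler name and the soft_limit_seen flag are hypothetical, and the recovery action itself (for example, saving partial results) is left to the job.

```python
import signal

# Flag that the main loop of the job can poll; the name is hypothetical.
soft_limit_seen = {"hit": False}


def on_soft_limit(signum, frame):
    """Record that the soft limit signal arrived so the job can wind down."""
    soft_limit_seen["hit"] = True


# SIGXCPU is trappable, so the job can catch it. SIGKILL, which is sent
# when a hard limit is exceeded, cannot be caught.
signal.signal(signal.SIGXCPU, on_soft_limit)
```

A long-running loop in the job could then check soft_limit_seen["hit"] at convenient points and exit cleanly before the hard limit is reached.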

Exceeding Per Process Limits

For per process limits, what happens when your job reaches and exceeds either the soft limit or the hard limit depends on the operating system you are using.

Note that when a job forks a process which exceeds a per process limit, such as the CPU limit, the operating system (and not LoadLeveler) terminates the process by issuing a SIGXCPU. As a result, you will not see an entry in the LoadLeveler logs indicating that the process exceeded the limit. The job will complete with a 0 return code. LoadLeveler can only report the status of any processes it has started.

If you need more specific information, refer to your operating system documentation.

Syntax

The syntax for setting a limit is

limit_type = hardlimit,softlimit

For example:

core_limit = 120kb,100kb

To specify only a hard limit, you can enter, for example:

core_limit = 120kb

To specify only a soft limit, you can enter, for example:

core_limit = ,100kb

In a keyword statement, you cannot have any blanks between the numerical value (100 in the above example) and the units (kb). Also, you cannot have any blanks to the left or right of the comma when you define a limit in a job command file.
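
The three forms shown above (both limits, hard limit only, soft limit only) can be separated with a few lines of Python. This is a sketch only: split_limit is a hypothetical name, and rejecting every blank is a simplification of the rules stated above.

```python
def split_limit(value):
    """Split 'hardlimit,softlimit' into (hard, soft); a missing part is None."""
    if any(ch.isspace() for ch in value):
        # Simplification: this sketch rejects any blank inside the value.
        raise ValueError(f"blanks are not allowed in a limit value: {value!r}")
    hard, _, soft = value.partition(",")
    return (hard or None, soft or None)


both = split_limit("120kb,100kb")   # ('120kb', '100kb')
hard_only = split_limit("120kb")    # ('120kb', None)
soft_only = split_limit(",100kb")   # (None, '100kb')
```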

For limit keywords that refer to a data limit -- such as data_limit, core_limit, file_limit, stack_limit, and rss_limit -- the hard limit and the soft limit are expressed as:

integer[.fraction][units]

where integer and fraction represent numerical strings of up to eight characters. units can be:

b     bytes
w     words
kb    kilobytes (2**10 bytes)
kw    kilowords (2**10 words)
mb    megabytes (2**20 bytes)
mw    megawords (2**20 words)
gb    gigabytes (2**30 bytes)
gw    gigawords (2**30 words)

If no units are specified for data limits, then bytes are assumed.
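
A data limit in the integer[.fraction][units] form can be converted to bytes as in the following Python sketch. The function name data_limit_bytes is hypothetical. The size of a word is not stated in this section, so the caller must supply bytes_per_word when a word-based unit is used. Truncating a fractional byte count is also an assumption, and the special strings (rlim_infinity, unlimited, copy) are not handled.

```python
import re
from decimal import Decimal

BYTE_UNITS = {"b": 1, "kb": 2**10, "mb": 2**20, "gb": 2**30}
WORD_UNITS = {"w": 1, "kw": 2**10, "mw": 2**20, "gw": 2**30}
# integer[.fraction][units], with integer and fraction up to eight characters
LIMIT_PATTERN = re.compile(r"(\d{1,8})(?:\.(\d{1,8}))?([a-z]*)")


def data_limit_bytes(text, bytes_per_word=None):
    """Convert a data limit string to bytes (illustrative sketch only)."""
    match = LIMIT_PATTERN.fullmatch(text.lower())
    if match is None:
        raise ValueError(f"not in integer[.fraction][units] form: {text!r}")
    integer, fraction, units = match.groups()
    number = Decimal(integer + "." + (fraction or "0"))
    units = units or "b"  # no units given: bytes are assumed
    if units in BYTE_UNITS:
        scale = BYTE_UNITS[units]
    elif units in WORD_UNITS:
        if bytes_per_word is None:
            raise ValueError("bytes_per_word is required for word-based units")
        scale = WORD_UNITS[units] * bytes_per_word
    else:
        raise ValueError(f"unknown units: {units!r}")
    return int(number * scale)  # assumption: fractional bytes are truncated
```

For example, data_limit_bytes("2mb") gives 2097152, and data_limit_bytes("125621") gives 125621 because bytes are assumed when no units are present.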

For limit keywords that refer to a time limit -- such as cpu_limit, job_cpu_limit, and wall_clock_limit -- the hard limit and the soft limit are expressed as:

[[hours:]minutes:]seconds[.fraction]

Fractions are rounded to seconds.
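
The time format can be converted to whole seconds as in the following Python sketch. The function name time_limit_seconds is hypothetical, and rounding a half second upward is an assumption, since the text only says that fractions are rounded.

```python
def time_limit_seconds(text):
    """Convert '[[hours:]minutes:]seconds[.fraction]' to whole seconds."""
    fields = text.split(":")
    if not 1 <= len(fields) <= 3:
        raise ValueError(f"expected at most hours:minutes:seconds, got {text!r}")
    seconds = float(fields[-1])
    minutes = int(fields[-2]) if len(fields) >= 2 else 0
    hours = int(fields[-3]) if len(fields) == 3 else 0
    # Assumption: a fraction of .5 or more rounds up to the next second.
    return hours * 3600 + minutes * 60 + int(seconds + 0.5)


twelve_hours_plus = time_limit_seconds("12:56:21")  # 46581 seconds
one_minute_three = time_limit_seconds("1:03")       # 63 seconds
```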

You can use the following character strings with all limit keywords, except that copy cannot be used with wall_clock_limit:

rlim_infinity
Represents the largest positive number.
unlimited
Has same effect as rlim_infinity.
copy
Uses the limit currently active when the job is submitted.

See Table 12 for more information on specifying limits.

Table 12. Setting limits
If the hard limit: | Then the:
Is set in both the class stanza and the job command file | Smaller of the two limits is taken into consideration. If the smaller limit is the job limit, the job limit is then compared with the user limit set on the machine that runs the job. The smaller of these two values is used. If the limit used is the class limit, the class limit is used without being compared to the machine limit.
Is not set in either the class stanza or the job command file | User per process limit set on the machine that runs the job is used.
Is set in the job command file and is less than its respective job soft limit | The job is not submitted.
Is set in the class stanza and is less than its respective class stanza soft limit | Soft limit is adjusted downward to equal the hard limit.
Is specified in the job command file | Hard limit must be greater than or equal to the specified soft limit and less than or equal to the limit set by the administrator in the class stanza of the administration file.

Note: If the per process limit is not defined in the administration file and the hard limit defined by the user in the job command file is greater than the limit on the executing machine, then the hard limit is set to the machine limit.
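
The selection rules for a per process hard limit can be summarized in a short Python sketch. The function name effective_hard_limit is hypothetical, and None stands for "not set". Two cases are assumptions rather than statements from Table 12: when only the class limit is set, the sketch uses it as is (following the earlier remark that a class limit is used regardless of the user's system limit), and when the class and job limits are equal, the sketch treats the class limit as the one in use.

```python
def effective_hard_limit(class_limit, job_limit, machine_limit):
    """Pick the hard limit used to run a job (illustrative sketch only).

    All values are in the same unit; None means the limit is not set.
    """
    if class_limit is None and job_limit is None:
        # Not set anywhere: the user limit on the executing machine applies.
        return machine_limit
    if class_limit is None:
        # Job limit only: it cannot exceed the limit on the executing machine.
        return min(job_limit, machine_limit)
    if job_limit is None:
        # Assumption: a class limit alone is used without a machine comparison.
        return class_limit
    if job_limit < class_limit:
        # The job limit is the smaller one, so compare it with the machine limit.
        return min(job_limit, machine_limit)
    # The class limit is used without being compared to the machine limit.
    return class_limit
```

For instance, with a class limit of 50, a job limit of 30, and a machine limit of 20 (all in the same unit), the sketch returns 20. With a job limit of 80 instead, it returns the class limit of 50.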

Specifying Limits in the Class Stanza

You can specify the following limit keywords:

core_limit = hardlimit,softlimit

Specifies the hard limit and/or soft limit for the size of a core file.

Examples:

core_limit = unlimited
core_limit = 30mb

For more information, see Overview of Limits.

cpu_limit = hardlimit,softlimit
Specifies hard limit and/or soft limit for the CPU time to be used by each individual process of a job step. For example, if you impose a cpu_limit of five hours and you have a job step composed of five processes, each process can consume five CPU hours; the entire job step can therefore consume 25 total hours of CPU.

Examples:

cpu_limit = 12:56:21       # hardlimit = 12 hours 56 minutes 21 seconds
cpu_limit = 56:00,50:00    # hardlimit = 56 minutes 0 seconds
                           # softlimit = 50 minutes 0 seconds
cpu_limit = 1:03           # hardlimit = 1 minute 3 seconds
cpu_limit = unlimited      # hardlimit = 2,147,483,647 seconds
                           # (X'7FFFFFFF')
cpu_limit = rlim_infinity  # hardlimit = 2,147,483,647 seconds
                           # (X'7FFFFFFF')
cpu_limit = copy           # current CPU hardlimit value on the
                           # submitting machine.

For more information, see Overview of Limits.

data_limit = hardlimit,softlimit

Specifies hard limit and/or soft limit for the data segment to be used by each process of the submitted job.

Examples:

data_limit = 125621         # hardlimit = 125621 bytes
data_limit = 5621kb         # hardlimit = 5621 kilobytes
data_limit = 2mb            # hardlimit = 2 megabytes
data_limit = 2.5mw          # hardlimit = 2.5 megawords
data_limit = unlimited      # hardlimit = 2,147,483,647 bytes
                            #             (X'7FFFFFFF')
data_limit = rlim_infinity  # hardlimit = 2,147,483,647 bytes
                            #             (X'7FFFFFFF')
data_limit = copy           # copy data hardlimit value from submitting
                            # machine.

For more information, see Overview of Limits.

file_limit = hardlimit,softlimit

Specifies the hard limit and/or soft limit for the size of a file. For more information, see Overview of Limits.

job_cpu_limit = hardlimit,softlimit
Specifies the maximum total CPU time to be used by all processes of a job step. That is, if a job step forks to produce multiple processes, the sum total of CPU consumed by all of the processes is added and controlled by this limit.

For example:

job_cpu_limit = 10000

For more information on this keyword, see the JOB_LIMIT_POLICY keyword in Chapter 7, Gathering Job Accounting Data. For more general information on limits, see Overview of Limits.

rss_limit = hardlimit,softlimit

Specifies the hard limit and/or soft limit for the resident size. For more information, see Overview of Limits.

stack_limit = hardlimit,softlimit

Specifies the hard limit and/or soft limit for the size of a stack. For more information, see Overview of Limits.

wall_clock_limit = hardlimit,softlimit

Specifies the hard limit and/or soft limit for the elapsed time for which a job can run. Note that LoadLeveler uses the time the negotiator daemon dispatches the job as the start time of the job. When a job is checkpointed, vacated, and then restarted, the wall_clock_limit is not adjusted to account for the amount of time that elapsed before the checkpoint occurred. This keyword is not supported for NQS jobs. Also, if the startd daemon terminates abnormally with running jobs, any wall clock limits are not supported when the daemon is restarted.

If you are running the Backfill scheduler, you must set a wall clock limit either in the job command file or in a class stanza (for the class associated with the job you submit). LoadLeveler administrators should consider setting a default wall clock limit in a default class stanza. For more information on setting a wall clock limit when using the Backfill scheduler, see Choosing a Scheduler.

For more general information on limits, see Overview of Limits.

Examples of Class Stanzas

Example 1: Creating a Class that Excludes Certain Users

class_a: type=class                # class that excludes users
priority=10                        # ClassSysprio
exclude_users=green judy           # Excluded users

Example 2: Creating a Class for Small-Size Jobs

small:  type=class                 # class for small jobs
priority=80                        # ClassSysprio (max=100)
cpu_limit=00:02:00                 # 2 minute limit
data_limit=30mb                    # max 30 MB data segment
default_resources=ConsumableVirtualMemory(10mb)   # resources consumed by each 
ConsumableCpus(1) resA(3) floatinglicenseX(1)     # task of a small job step if 
                                   # resources are not explicitly 
                                   # specified in the job command file
core_limit=10mb                    # max 10 MB core file
file_limit=50mb                    # max file size 50 MB
stack_limit=10mb                   # max stack size 10 MB
rss_limit=35mb                     # max resident set size 35 MB
include_users = bob sally          # authorized users

Example 3: Creating a Class for Medium-Size Jobs

medium: type=class             # class for medium jobs
priority=70                    # ClassSysprio
cpu_limit=00:10:00             # 10 minute run time limit
data_limit=80mb,60mb           # max 80 MB data segment
                               # soft limit 60 MB data segment
core_limit=30mb                # max 30 MB core file
file_limit=80mb                # max file size 80 MB
stack_limit=30mb               # max stack size 30 MB
rss_limit=100mb                # max resident set size 100 MB
job_cpu_limit=1800,1200        # hard limit is 30 minutes,
                               # soft limit is 20 minutes

Example 4: Creating a Class for Large-Size Jobs

large:  type=class             # class for large jobs
priority=60                    # ClassSysprio
cpu_limit=00:10:00             # 10 minute run time limit
data_limit=120mb               # max 120 MB data segment
default_resources=ConsumableVirtualMemory(40mb)         # resources consumed by each 
ConsumableCpus(2) resA(8) floatinglicenseX(1) resB(1)   # task of a large job step if 
                               # resources are not explicitly 
                               # specified in the job command file
core_limit=30mb                # max 30 MB core file
file_limit=120mb               # max file size 120 MB
stack_limit=unlimited          # unlimited stack size
rss_limit=150mb                # max resident set size 150 MB
job_cpu_limit = 3600,2700      # hard limit 60 minutes
                               # soft limit 45 minutes
wall_clock_limit=12:00:00,11:59:55 # hard limit is 12 hours
 

Example 5: Creating a Class to Route Jobs to NQS Machines

nqs:   type=class               # class for NQS jobs
NQS_class=true
NQS_submit=pipe_queue           # NQS pipe queue name
NQS_query=one two three         # list of queue names

You can use the class names in control expressions in both the global and local configuration files.

Example 6: Creating a Class for PVM Jobs

PVM3:  type=class             # class for PVM jobs
priority=60                   # ClassSysprio (max=100)
max_processors=15             # maximum number of processors

Example 7: Creating a Class for Master Node Machines

sp-6hr-sp:  type=class               # class for master node machines
priority=50                          # ClassSysprio (max=100)
cpu_limit = 06:00:00                 # 6 hour limit
job_cpu_limit = 06:00:00             # hard limit is 6 hours
core_limit = 1mb                     # max 1 MB core file
master_node_requirement = true       # master node definition

