This chapter gives you an overview, including configuration information, of some LoadLeveler customers. These profiles are meant to highlight how customers in different industries use LoadLeveler.
Note that all of these configurations apply to Version 1 Release 3 of the default LoadLeveler scheduler unless otherwise noted.
The Cornell Theory Center (CTC) of Cornell University provides a high-performance computing environment to advance and facilitate research and education.
The CTC runs a 160-node SP with 16 wide nodes and 144 thin nodes. The SP nodes include two interactive nodes and two submit-only nodes. The majority of the other SP nodes run batch jobs. The LoadLeveler central manager runs on a workstation outside of the SP. Also, two other non-SP workstations act as schedd hosts.
The CTC runs parallel jobs by disabling the default LoadLeveler scheduler (SCHEDULER_API=YES) and running an external scheduler. The CTC has developed this scheduler to meet the needs of its users.
The following figures represent sections of the CTC's LoadL_admin file. Note that not all nodes are shown here.
############################################################################# # DEFAULTS FOR MACHINE, CLASS, USER, AND GROUP STANZAS: # Remove initial # (comment), and edit to suit. ############################################################################# default: type = machine central_manager = false # default not central manager schedd_host = false # default not a public scheduler submit_only = false # default not a submit-only machine pvm_root = /usr/local/app/pvm3 # default pvm3 directory rm_host = true # default is parallel SP2 node # speed = 1 # default machine speed # cpu_speed_scale = false # scale cpu limits by speed default: type = class # default class stanza # priority = 0 # default ClassSysprio # max_processors = -1 # default max processors for class (no
default: type = user # default user stanza # priority = 0 # default UserSysprio default_class = DSI # default class default_group = No_Group # default group = No_Group (not # optional) # maxjobs = -1 # default maximum jobs user is allowed # to run simultaneously (no limit) # maxqueued = -1 # default maximum jobs user is allowed # on system queue (no limit). does not # limit jobs submitted. default: type = group # default group stanza # priority = 0 # default GroupSysprio # maxjobs = -1 # default maximum jobs group is allowed # to run simultaneously (no limit) # maxqueued = -1 # default maximum jobs group is allowed # on system queue (no limit). does not # limit jobs submitted. ############################################################################# # MACHINE STANZAS: # These are the machine stanzas; the first machine is defined as # the central manager. mach1:, mach2:, etc. are machine name labels - # revise these placeholder labels with the names of the machines in the # pool, and specify any schedd_host and submit_only keywords and values # (true or false), if required. ############################################################################# # spscheduler is a 43P running EASY-LL and the Central Manager spscheduler.tc.cornell.edu: type = machine central_manager = true rm_host =false # ctc1 and ctc2 are two 43P's running as dedicated SchedDs ctc1.tc.cornell.edu: type = machine schedd_host = true ctc2.tc.cornell.edu: type = machine schedd_host = true # Submit only node for Sweb server arms.tc.cornell.edu: type = machine submit_only = true
# # Nodes of the SP2 # # Rack 1 # # PIOFS name server, HiPPi router, Switch & JMD primary #r01n01.tc.cornell.edu: type = machine # alias = r01n01-css # r01n02 & r01n05 are interactive nodes r01n03.tc.cornell.edu: type = machine alias = r01n03-css submit_only = true r01n05.tc.cornell.edu: type = machine alias = r01n05-css submit_only = true r01n07.tc.cornell.edu: type = machine alias = r01n07-css r01n09.tc.cornell.edu: type = machine alias = r01n09-css r01n11.tc.cornell.edu: type = machine alias = r01n11-css r01n13.tc.cornell.edu: type = machine alias = r01n13-css r01n15.tc.cornell.edu: type = machine alias = r01n15-css # # Rack 2 # # HPSS/PIOFS backup #r02n01.tc.cornell.edu: type = machine # alias = r02n01-css # r02n03, r02n05, r02n07, r02n09 are splong nodes r02n03.tc.cornell.edu: type = machine alias = r02n03-css submit_only = true r02n05.tc.cornell.edu: type = machine alias = r02n05-css submit_only = true r02n07.tc.cornell.edu: type = machine alias = r02n07-css submit_only = true r02n09.tc.cornell.edu: type = machine alias = r02n09-css submit_only = true # VIS node #r02n11.tc.cornell.edu: type = machine # alias = r02n11-css r02n13.tc.cornell.edu: type = machine alias = r02n13-css r02n15.tc.cornell.edu: type = machine alias = r02n15-css
# # Rack 3 # r03n01.tc.cornell.edu: type = machine alias = r03n01-css r03n02.tc.cornell.edu: type = machine alias = r03n02-css r03n03.tc.cornell.edu: type = machine alias = r03n03-css r03n04.tc.cornell.edu: type = machine alias = r03n04-css r03n05.tc.cornell.edu: type = machine alias = r03n05-css r03n06.tc.cornell.edu: type = machine alias = r03n06-css r03n07.tc.cornell.edu: type = machine alias = r03n07-css r03n08.tc.cornell.edu: type = machine alias = r03n08-css r03n09.tc.cornell.edu: type = machine alias = r03n09-css r03n10.tc.cornell.edu: type = machine alias = r03n10-css r03n11.tc.cornell.edu: type = machine alias = r03n11-css r03n12.tc.cornell.edu: type = machine alias = r03n12-css r03n13.tc.cornell.edu: type = machine alias = r03n13-css r03n14.tc.cornell.edu: type = machine alias = r03n14-css r03n15.tc.cornell.edu: type = machine alias = r03n15-css # ATM/FDDI routing node #r03n16.tc.cornell.edu: type = machine # alias = r03n16-css
# # Rack 4 # r04n01.tc.cornell.edu: type = machine alias = r04n01-css r04n02.tc.cornell.edu: type = machine alias = r04n02-css r04n03.tc.cornell.edu: type = machine alias = r04n03-css r04n04.tc.cornell.edu: type = machine alias = r04n04-css r04n05.tc.cornell.edu: type = machine alias = r04n05-css r04n06.tc.cornell.edu: type = machine alias = r04n06-css r04n07.tc.cornell.edu: type = machine alias = r04n07-css r04n08.tc.cornell.edu: type = machine alias = r04n08-css r04n09.tc.cornell.edu: type = machine alias = r04n09-css r04n10.tc.cornell.edu: type = machine alias = r04n10-css r04n11.tc.cornell.edu: type = machine alias = r04n11-css # r04n12 - r14n16 HPSS nodes #r04n12.tc.cornell.edu: type = machine # alias = r04n12-css #r04n13.tc.cornell.edu: type = machine # alias = r04n13-css #r04n14.tc.cornell.edu: type = machine # alias = r04n14-css #r04n15.tc.cornell.edu: type = machine # alias = r04n15-css #r04n16.tc.cornell.edu: type = machine # alias = r04n16-css # ############################################################################# # CLASS STANZAS: (optional) # These are sample class stanzas; small, medium, large, and nqs are sample # labels for job classes - revise these labels and specify attributes # to each class. ############################################################################# DSI: type = class piofs: type = class #############################################################################
The following represents the CTC's LoadL_config file.
# # Machine Description # ARCH = R6000 # # Specify LoadLeveler Administrators here: # LOADL_ADMIN = loadl admin1 admin2 admin3 admin4 # # Default to starting LoadLeveler daemons when requested # START_DAEMONS = TRUE # # Machine authentication # # If TRUE, only connections from machines in the ADMIN_LIST are accepted. # If FALSE, connections from any machine are accepted. Default if not # specified is FALSE. # MACHINE_AUTHENTICATE = FALSE # # Specify which daemons run on each node # SCHEDD_RUNS_HERE = False STARTD_RUNS_HERE = True # # Specify information for backup central manager # # CENTRAL_MANAGER_HEARTBEAT_INTERVAL = 300 # CENTRAL_MANAGER_TIMEOUT = 6
# # Specify pathnames # RELEASEDIR = /usr/lpp/LoadL/nfs LOCAL_CONFIG = $(tilde)/local/configs/LoadL_config.$(host) ADMIN_FILE = $(tilde)/LoadL_admin LOG = /var/loadl/log SPOOL = /var/loadl/spool EXECUTE = /var/loadl/execute HISTORY = $(SPOOL)/history BIN = $(RELEASEDIR)/bin LIB = $(RELEASEDIR)/lib ETC = $(RELEASEDIR)/etc # # Specify port numbers # COLLECTOR_STREAM_PORT = 9612 MASTER_STREAM_PORT = 9616 NEGOTIATOR_STREAM_PORT = 9614 SCHEDD_STREAM_PORT = 9605 STARTD_STREAM_PORT = 9611 COLLECTOR_DGRAM_PORT = 9613 STARTD_DGRAM_PORT = 9615 MASTER_DGRAM_PORT = 9617 SCHEDULER_API = YES SCHEDULER_PORT = 9624 # # Specify accounting controls # ACCT = A_ON ACCT_VALIDATION = $(BIN)/llacctval GLOBAL_HISTORY = $(SPOOL) # # Specify prolog and epilog path names # JOB_PROLOG = $(ETC)/llprolog JOB_EPILOG = $(ETC)/llepilog JOB_USER_PROLOG = $(ETC)/ll_user_prolog JOB_USER_EPILOG = $(ETC)/ll_user_epilog # # # Refresh AFS token program. # AFS_GETNEWTOKEN = $(ETC)/tokenreviveclient
# # Customized mail delivery program. # # MAIL = # # Customized submit (job command file) filter program. # # SUBMIT_FILTER = # # Specify checkpointing intervals # MIN_CKPT_INTERVAL = 900 MAX_CKPT_INTERVAL = 7200 # LoadL_KeyboardD Macros # KBDD = $(BIN)/LoadL_kbdd KBDD_LOG = $(LOG)/KbdLog MAX_KBDD_LOG = 64000 KBDD_DEBUG = # # Specify whether to start the keyboard daemon # X_RUNS_HERE = False # # Specify whether to use X server XGetIdleTime() protocol extension # USE_X_IDLE_EXTENSION = False # # LoadL_StartD Macros # STARTD = $(BIN)/LoadL_startd STARTD_LOG = $(LOG)/StartLog MAX_STARTD_LOG = 5000000 #STARTD_DEBUG = D_STARTD D_FULLDEBUG D_THREAD STARTD_DEBUG = D_FULLDEBUG POLLING_FREQUENCY = 10 POLLS_PER_UPDATE = 24 JOB_LIMIT_POLICY = 240 JOB_ACCT_Q_POLICY = 3600 # # LoadL_SchedD Macros # SCHEDD = $(BIN)/LoadL_schedd SCHEDD_LOG = $(LOG)/SchedLog MAX_SCHEDD_LOG = 5000000 SCHEDD_DEBUG = D_SCHEDD SCHEDD_INTERVAL = 180 CLIENT_TIMEOUT = 300
# # Negotiator Macros # NEGOTIATOR = $(BIN)/LoadL_negotiator NEGOTIATOR_DEBUG = D_FULLDEBUG D_ALWAYS D_NEGOTIATE NEGOTIATOR_LOG = $(LOG)/NegotiatorLog MAX_NEGOTIATOR_LOG = 5000000 NEGOTIATOR_INTERVAL = 60 MACHINE_UPDATE_INTERVAL = 600 NEGOTIATOR_PARALLEL_DEFER = 1800 NEGOTIATOR_PARALLEL_HOLD = 300 NEGOTIATOR_REDRIVE_PENDING = 1800 NEGOTIATOR_RESCAN_QUEUE = 180 NEGOTIATOR_REMOVE_COMPLETED = 0 # # Sets the interval between recalculation of the SYSPRIO values # for all the jobs in the queue # NEGOTIATOR_RECALCULATE_SYSPRIO_INTERVAL = 0 # # Starter Macros # STARTER = $(BIN)/LoadL_starter STARTER_DEBUG = D_FULLDEBUG STARTER_LOG = $(LOG)/StarterLog MAX_STARTER_LOG = 500000 # # LoadL_Master Macros # MASTER = $(BIN)/LoadL_master MASTER_LOG = $(LOG)/MasterLog MASTER_DEBUG = D_FULLDEBUG MAX_MASTER_LOG = 64000 RESTARTS_PER_HOUR = 12 PUBLISH_OBITUARIES = TRUE OBITUARY_LOG_LENGTH = 25 # # Specify whether log files are truncated when opened # TRUNC_MASTER_LOG_ON_OPEN = False TRUNC_STARTD_LOG_ON_OPEN = False TRUNC_SCHEDD_LOG_ON_OPEN = False TRUNC_KBDD_LOG_ON_OPEN = False TRUNC_STARTER_LOG_ON_OPEN = False TRUNC_COLLECTOR_LOG_ON_OPEN = False TRUNC_NEGOTIATOR_LOG_ON_OPEN = False
# NQS Directory # # # For users of NQS resources: # Specify the directory containing qsub, qstat, qdel # # NQS_DIR = /usr/bin # # Specify Custom metric keywords # # CUSTOM_METRIC = # CUSTOM_METRIC_COMMAND = $(ETC)/sw_chip_number # # Machine control expressions and macros # OpSys : $(OPSYS) Arch : $(ARCH) Machine : $(HOST).$(DOMAIN) # # Expressions used to control starting and stopping of foreign jobs # MINUTE = 60 HOUR = (60 * $(MINUTE)) StateTimer = (CurrentTime - EnteredCurrentState) BackgroundLoad = 0.7 HighLoad = 1.5 StartIdleTime = 15 * $(MINUTE) ContinueIdleTime = 5 * $(MINUTE) MaxSuspendTime = 10 * $(MINUTE) MaxVacateTime = 10 * $(MINUTE) KeyboardBusy= KeyboardIdle < $(POLLING_FREQUENCY) CPU_Idle = LoadAvg <= $(BackgroundLoad) CPU_Busy = LoadAvg >= $(HighLoad) # START : $(CPU_Idle) && KeyboardIdle > $(StartIdleTime) # SUSPEND : $(CPU_Busy) || $(KeyboardBusy) # CONTINUE : $(CPU_Idle) && KeyboardIdle > $(ContinueIdleTime) # VACATE : $(StateTimer) > $(MaxSuspendTime) # KILL : $(StateTimer) > $(MaxVacateTime) START : T SUSPEND : F CONTINUE : T VACATE : F KILL : F
#
# Expressions used to prioritize job queue
#
# Values which can be part of the SYSPRIO expression are:
#
# QDate              Job submission time
# UserPrio           User priority
# UserSysprio        System priority value based on userid (from the user
#                    list file with default of 0)
# ClassSysprio       System priority value based on job class (from the class
#                    list file with default of 0)
# UserRunningProcs   Number of jobs running for the user
# GroupRunningProcs  Number of jobs running for the group
#
# The following expression is an example.
#
#SYSPRIO: (ClassSysprio * 100) + (UserSysprio * 10) + (GroupSysprio * 1) - (QDate)
#
# The following (default) expression for SYSPRIO creates a FIFO job queue.
#
SYSPRIO: (ClassSysprio * 100) - (QDate)
#
# Expressions used to prioritize machines
#
# The following example orders machines by the load average
# normalized for machine speed:
#
#MACHPRIO: 0 - (1000 * (LoadAvg / (Cpus * Speed)))
#
# The following (default) expression for MACHPRIO orders
# machines by load average.
#
#MACHPRIO: 0 - (LoadAvg) + (MasterMachPriority * 10000)
# The following expression for MACHPRIO orders
# machines by increasing amount of memory and
# decreasing node number.
#
MACHPRIO: 0 - (100 * Memory) + CustomMetric + (MasterMachPriority * 10000)
#
# The MAX_JOB_REJECT value determines how many times a job can be
# rejected before it is canceled or put on hold. The default value
# is -1, which indicates no limit to the number of times a job can be
# rejected.
#
MAX_JOB_REJECT = 0
#
# When ACTION_ON_MAX_REJECT is HOLD, jobs will be put on user hold
# when the number of rejects reaches the MAX_JOB_REJECT value. When
# ACTION_ON_MAX_REJECT is CANCEL, jobs will be canceled when the
# number of rejects reaches the MAX_JOB_REJECT value. The default
# value is HOLD.
#
ACTION_ON_MAX_REJECT = CANCEL
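The effect of the SYSPRIO and MACHPRIO expressions in this file can be illustrated with a small Python sketch. It is not part of LoadLeveler; the class names, field names, and sample values are hypothetical, and it assumes that the job or machine with the highest computed value is considered first.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    class_sysprio: int  # ClassSysprio, taken from the class stanza
    qdate: int          # QDate, the submission time in seconds


@dataclass
class Machine:
    name: str
    memory: int             # Memory, as reported for the machine
    custom_metric: int      # CustomMetric, produced by CUSTOM_METRIC_COMMAND
    master_mach_priority: int = 0  # MasterMachPriority


def sysprio(job: Job) -> int:
    # SYSPRIO: (ClassSysprio * 100) - (QDate)
    return job.class_sysprio * 100 - job.qdate


def machprio(machine: Machine) -> int:
    # MACHPRIO: 0 - (100 * Memory) + CustomMetric + (MasterMachPriority * 10000)
    return (0 - 100 * machine.memory
            + machine.custom_metric
            + machine.master_mach_priority * 10000)


def order_jobs(jobs):
    # Assumption: the highest SYSPRIO value is considered first.
    return sorted(jobs, key=sysprio, reverse=True)


def order_machines(machines):
    # Assumption: the highest MACHPRIO value is preferred.
    return sorted(machines, key=machprio, reverse=True)


if __name__ == "__main__":
    jobs = [Job("late", 0, 2000), Job("early", 0, 1000)]
    print([j.name for j in order_jobs(jobs)])
    nodes = [Machine("big", 512, 3), Machine("small", 128, 7)]
    print([m.name for m in order_machines(nodes)])
```

With equal class priorities, the job submitted earlier has the larger SYSPRIO value, which gives the FIFO behavior described in the comments. Because Memory is subtracted, machines with less memory rank ahead of machines with more.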
This customer performs CPU-intensive work in the area of circuit simulation using Electronic Design Automation (EDA).
The customer has 752 batch servers; 173 are dedicated to running LoadLeveler jobs 24 hours a day (the central manager is excluded). LoadLeveler uses the rest when they are not in use by their respective owners.
The LoadLeveler administrators control all the 173 dedicated machines. That means that users cannot get onto these systems without submitting a LoadLeveler job. 117 of the dedicated machines are public schedulers. The user machines are submit-only machines, and users do not have access to their root password. If a user needs root access to his or her machine, he or she is allowed alternate root access only; he or she cannot get global root access to all the machines on site. (Site administrators use a common global root password.)
This site runs over 31,000 jobs per week, accounting for about 2,800 CPU days of resource utilization. The central manager is a RISC System/6000 model 370 with 128MB of RAM. The batch machines are generally 80 percent busy, and the central manager is about 35 to 70 percent busy. The central manager does not run any jobs; it only manages the cluster. All of the LoadLeveler machines run one job at a time (that is, MAX_STARTERS=1).
This customer occasionally sees some machines in a down state. The administrator believes that the CPUs on these machines are too busy for LoadLeveler to get a time slice in which to report the machine's state to the central manager. However, this down state does not cause any problems for this customer.
The 117 public schedulers are a subset of the 173 dedicated machines and are listed in the admin file.
The following figures represent sections of this customer's LoadL_admin file for dedicated machines. Notice the default stanza. Also, every machine in the LoadLeveler cluster is listed in this file.
#=============================================================================#
# type = machine default stanza
#=============================================================================#
default:  type = machine           # defaults for machine stanzas
          central_manager = false  # no central manager on machine
          schedd_host = true       # public schedd on machine
#=============================================================================#
# Central Manager
#=============================================================================#
mips1:    type = machine           # PRIMARY server - MANAGER 370 128M 3.2.5
          central_manager = true   # runs negotiator
#=============================================================================#
# Primary Servers
#=============================================================================#
beast100: type = machine # PRIMARY C=a/b/o/s2/t2 . . 550 128M 3.2.5
beast101: type = machine # PRIMARY C=a/b/b1/b4/c/o/r/s/t F . 550 128M 3.2.5
beast102: type = machine # PRIMARY C=a F . 550 128M 3.2.5
beast103: type = machine # PRIMARY C=a . . 550 128M 3.2.5
Later in the LoadL_admin file, user machines are defined. Notice the default stanza.
#=============================================================================#
default: type = machine           # defaults for machine stanzas
         central_manager = false  # no central manager on machine
         schedd_host = false      # no public schedd on machine
#=============================================================================#
agni:    type = machine # SECONDARY server - rmkohn 550 64M 3.2.5
akama:   type = machine # SECONDARY server - poulter 365 64M 3.2.5
alaska:  type = machine # SECONDARY server - jcahill 340 64M 3.2.5
alcor:   type = machine # SECONDARY server - drolson 340 64M 3.2.5
The following represents a local configuration file for a dedicated, public scheduler machine:
# PRIMARY LoadL SERVER ==> mips27
#
# this loadl.config.local is tuned for a machine that is part of a compute
# farm. Interactive users are discouraged.
#
# Run at most one job at a time.
#
# Always start a job if there is a class available.
#
# Never suspend a job.
#
# Since jobs never get suspended they never get vacated or killed.
#
SCHEDD_RUNS_HERE = True
STARTD_RUNS_HERE = True
Class = { "a" "b" "b1" "b4" "c" "k" "r" "s" "t" }
Feature = { "PRI" }
MAX_STARTERS = 1
POLLING_FREQUENCY = 30
POLLS_PER_UPDATE = 15
START : T
SUSPEND : F
START_DAEMONS = True
X_RUNS_HERE = False
The following represents a local configuration file for a user's machine.
# SECONDARY SERVER ==> common
#
# This loadl_config.local is tuned to be "nice" to a workstation owner
# who permits loadl jobs on his system but wants good response whenever
# he is doing his own work.
#
# Run only one LoadLeveler job at a time.
#
# Check the keyboard for activity every five seconds.
#
#
# Suspend a job if the load average exceeds 1.4
#
# Continue a job when keyboard again goes idle for 10 minutes and the load
# average is <.5
SCHEDD_RUNS_HERE = False
STARTD_RUNS_HERE = True
Class = { "a" "b" "b1" "b4" "c" "o" "r" "s" "t" }
MAX_STARTERS = 1
START : $(FirstShift_KB9999) && $(StartS1) || ($(Off_Shift) || $(Week_End)) && $(Mach_Idle_S)
SUSPEND : $(CPU_Busy) || $(KeyboardBusy)
CONTINUE : $(Mach_Idle_C)
VACATE : ((Class == "a") && $(Vacate_A)) || ($(Vacate_ClassesB) && $(Vacate_B)) || $(Vacate_X)
KILL : $(Kill_Job)
START_DAEMONS = True
X_RUNS_HERE = True
This scientific customer provides experimental facilities for physicists from its 17 member states and for visiting scientists from throughout the world. The computing requirements of these users vary from mail and text processing to heavy batch and parallel processing.
Their processor is an SP2 using RISC System/6000 nodes linked by an internal high-speed network, with a centrally managed software environment. The nodes are functionally divided into four groups of 16 each for different types of work: interactive logins; sequential job batch processing; parallel job batch processing; and data, tape, and network services.
This customer uses AFS heavily. It provides the single system image for users' home directories and the files common to their experiments. Many software products are served directly out of AFS using symbolic links.
LoadLeveler provides this customer with the following facilities:
The batch configuration is designed to maximize short job turnaround while allowing the heavy CPU jobs to get good usage of the resources available.
The basic configuration uses a range of classes (short, medium, long, and verylong) with maximum job CPU times ranging from five minutes to six days. An additional class, night, provides off-peak and weekend computing time on the interactive areas of the SP2 during periods of low demand. Access to this class is limited to specific users.
Users in different experiments are defined in LoadLeveler groups which provide associated queue priorities. This allows groups with a large computing budget to be given higher priorities. An automated procedure calculates each group's resource utilization over the last month and adjusts their priorities accordingly. This ensures a fair allocation of CPU time among the groups.
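The customer's automated procedure is not described here, so the following Python sketch only illustrates the general idea of adjusting group priorities from recent usage. The formula, the function name, and the sample figures are all assumptions: each group's priority is taken to be proportional to the difference between its share of the computing budget and its share of the CPU time actually consumed over the period.

```python
def adjust_group_priorities(budget_share, cpu_days_used, scale=100):
    """Return a hypothetical priority per group.

    budget_share  -- group name -> fraction of the total budget (sums to 1.0)
    cpu_days_used -- group name -> CPU days consumed over the last month
    scale         -- multiplier that turns the share difference into an integer
    """
    total_used = sum(cpu_days_used.values())
    priorities = {}
    for group, share in budget_share.items():
        if total_used > 0:
            used_share = cpu_days_used.get(group, 0.0) / total_used
        else:
            used_share = 0.0
        # Groups that used less than their share are raised; heavy users drop.
        priorities[group] = round(scale * (share - used_share))
    return priorities


if __name__ == "__main__":
    budget = {"exp_a": 0.5, "exp_b": 0.5}
    used = {"exp_a": 30.0, "exp_b": 10.0}
    print(adjust_group_priorities(budget, used))
```

The resulting values could then be written to the priority keyword of each group stanza in the administration file, which is where the group system priority is defined.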
This customer uses the Interactive Session Support facility to provide a name server that returns the least loaded node according to a site-defined metric. This allows a user to be given the least loaded operational node when he or she logs in.
This metric is based on the number of logged in users, with some weight given to those using Xstations. Every few minutes, the system is scanned to evaluate the following:
Xterminals*3 + Telnet*2 + Process
Where:
This metric tries to balance users across the system while providing some factor for their likely future utilization. A metric based on the CPU load average is too dependent on the current load to provide good balancing.
The metric can also be set to return a low priority if the file /etc/iss.nologin exists. This allows the administrator to drain the interactive use of a node if there is scheduled system maintenance. When the maintenance is completed, the file can be removed and the metric will return the correct value for the node. Users will therefore see improved availability, since they will not be given a node that is about to shut down.
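The login metric and the drain file can be sketched in Python as follows. This is an illustration rather than the customer's implementation: the function names are hypothetical, the three inputs are assumed to be per-node counts of X terminal sessions, telnet sessions, and processes, and a drained node is assumed to be represented by an infinitely large load value so that it is never chosen.

```python
import os

# Assumption: a drained node reports a value no healthy node can exceed.
DRAINED = float("inf")


def node_metric(xterminals, telnet, process, nologin_file="/etc/iss.nologin"):
    """Compute Xterminals*3 + Telnet*2 + Process for one node."""
    if os.path.exists(nologin_file):
        # The administrator is draining interactive use of this node.
        return DRAINED
    return xterminals * 3 + telnet * 2 + process


def least_loaded(metrics):
    """Return the node name with the smallest metric value."""
    return min(metrics, key=metrics.get)


if __name__ == "__main__":
    metrics = {
        "node1": node_metric(2, 1, 4),
        "node2": node_metric(0, 1, 3),
    }
    print(least_loaded(metrics))
```

Passing a different nologin_file path makes the function easy to exercise without touching /etc on a live system.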
The processors are configured as follows:
Is_Weekend = (tm_wday==0 || tm_wday==6)
Is_Start_Night_Time = (tm_hour>18)
START: $(Is_Start_Night_Time) || $(Is_Weekend)
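To see when this START expression is true, it can be mirrored in a few lines of Python. The sketch assumes that tm_wday and tm_hour follow the C struct tm convention (0 is Sunday, hours run from 0 to 23); the function name is hypothetical.

```python
from datetime import datetime


def night_class_may_start(now: datetime) -> bool:
    # Convert to the C convention: 0 = Sunday, 6 = Saturday.
    tm_wday = now.isoweekday() % 7
    tm_hour = now.hour
    is_weekend = tm_wday == 0 or tm_wday == 6
    is_start_night_time = tm_hour > 18
    # START: $(Is_Start_Night_Time) || $(Is_Weekend)
    return is_start_night_time or is_weekend


if __name__ == "__main__":
    for when in (datetime(2024, 1, 3, 12), datetime(2024, 1, 3, 19),
                 datetime(2024, 1, 6, 12)):
        print(when, night_class_may_start(when))
```

As written, the expression becomes true at 19:00 on weekdays, because tm_hour>18 excludes the hour beginning at 18:00, and it is false between midnight and the following evening unless the day is a Saturday or Sunday.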
This customer uses EDA to perform work in the area of computer chip design.
The customer has seven clusters of RISC System/6000 machines. The largest cluster has 530 machines; the smallest cluster has 87 machines. The total number of machines at this installation is over 1200.
This customer has defined two configuration files for interactive work: one for standard workstations and one for large interactive servers. These files are meant to be tailored to machines of differing processing power.
#==============================================================================# # Description: LoadL_config.local for Standard Workstations (<370 Class) #==============================================================================# # Need 2x Paging Space to Real Memory ( minimum ) For Worst Case Of One # Suspended and One Foreground Running Job. # *) All Jobs (btv,lp) Suspend on LoadAvg or Keyboard/Mouse Movement. #==============================================================================# # Class defines the permissable classes, MAX_STARTERS defines the max # total jobs to be permitted. #==============================================================================# Class = { "btv" "lp" } MAX_STARTERS = 1 #==============================================================================# # The next definitions are used in the expressions below to regulate the # conditions under which jobs get started, suspended, and evicted. # All times are specified in units of seconds. #==============================================================================# BackgroundLoad = 0.8 HighLoad = 1.6 StartIdleTime = 900 ContinueIdleTime = 900
#==============================================================================# # LoadAvg is an internal variable whose value is the (Berkeley) load average # of the machine. # # CPU_Idle - No LoadL job running, or One job just finishing. # CPU_Busy - One LoadL job running, second job ( Foreground or Batch ) # starting up. # CPU_Max - Two LoadL jobs running. #==============================================================================# CPU_Idle = (LoadAvg <= $(BackgroundLoad)) CPU_Busy = (LoadAvg >= $(HighLoad)) #==============================================================================# # This defines a boolean "KeyboardBusy" whose value is TRUE if the keyboard # or mouse has been used since loadl last checked. Thus if POLLING_FREQUENCY # is 5 seconds, KeyboardBusy is TRUE if anybody has used the kbd or mouse in # the last 5 seconds. #==============================================================================# KeyboardBusy = KeyboardIdle < $(POLLING_FREQUENCY) #==============================================================================# # This statement indicates when a job should be started on this machine #==============================================================================# Weekend = ( (tm_wday >= 6) || (tm_wday < 1) ) Day = ( (tm_hour >= 7) && (tm_hour < 18) ) Night = ( (tm_hour >= 18) || (tm_hour < 4) ) Inactive = ( (KeyboardIdle > $(StartIdleTime)) && $(CPU_Idle) ) HP = ( (Class == "btv") ) LP = ( ($(Weekend) || $(Night)) ) START : ( ($(HP) || $(LP)) && $(Inactive) ) #==============================================================================# # The SUSPEND statement here says that a job should be suspended but not # killed if: # LoadAvg >= 1.6 Or KeyboardIdle < 5 #==============================================================================# SUSPEND : ( $(CPU_Busy) || $(KeyboardBusy) ) #==============================================================================# # This CONTINUE statement indicates that a suspended job should be 
continued # if the cpu goes idle and the keyboard/mouse has not been used for the last # 15 minutes. #==============================================================================# CONTINUE : $(CPU_Idle) && KeyboardIdle > $(ContinueIdleTime) #==============================================================================# # Jobs in the SUSPEND state are never killed, after 60 minutes they are # relocated to a different machine if possible. #==============================================================================# MaxSuspendTime = 60 * $(MINUTE) VACATE : $(StateTimer) > $(MaxSuspendTime) KILL : F
#==============================================================================# # If you set START_DAEMONS to False loadl can never start on this machine. # For example you may want to stop loadl for a couple days for maintenance # and make sure no procedure automatically restarts it. #==============================================================================# START_DAEMONS = True #==============================================================================# # Set the maximum size each of the logs can reach before wrapping. #==============================================================================# MAX_SCHEDD_LOG = 128000 MAX_COLLECTOR_LOG = 128000 MAX_STARTD_LOG = 128000 MAX_SHADOW_LOG = 128000 MAX_KBDD_LOG = 128000
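The control expressions in the standard workstation file can be restated as a Python sketch, which may make the interaction between job class, time of day, load, and keyboard activity easier to follow. The names are hypothetical. POLLING_FREQUENCY and MINUTE are not set in this local file, so the values 5 and 60 used here are assumptions based on the file's own comments.

```python
from dataclasses import dataclass

# Thresholds copied from the standard workstation file; times are in seconds.
BACKGROUND_LOAD = 0.8
HIGH_LOAD = 1.6
START_IDLE_TIME = 900
CONTINUE_IDLE_TIME = 900
MAX_SUSPEND_TIME = 60 * 60  # 60 * $(MINUTE), assuming MINUTE = 60
POLLING_FREQUENCY = 5       # assumption, taken from the file's comments


@dataclass
class Snapshot:
    load_avg: float     # LoadAvg
    keyboard_idle: int  # KeyboardIdle, in seconds
    tm_wday: int        # 0 = Sunday ... 6 = Saturday
    tm_hour: int        # 0 ... 23


def may_start(job_class: str, s: Snapshot) -> bool:
    weekend = s.tm_wday >= 6 or s.tm_wday < 1
    night = s.tm_hour >= 18 or s.tm_hour < 4
    inactive = (s.keyboard_idle > START_IDLE_TIME
                and s.load_avg <= BACKGROUND_LOAD)
    hp = job_class == "btv"
    lp = weekend or night
    # START : ( ($(HP) || $(LP)) && $(Inactive) )
    return (hp or lp) and inactive


def should_suspend(s: Snapshot) -> bool:
    # SUSPEND : ( $(CPU_Busy) || $(KeyboardBusy) )
    return s.load_avg >= HIGH_LOAD or s.keyboard_idle < POLLING_FREQUENCY


def may_continue(s: Snapshot) -> bool:
    # CONTINUE : $(CPU_Idle) && KeyboardIdle > $(ContinueIdleTime)
    return (s.load_avg <= BACKGROUND_LOAD
            and s.keyboard_idle > CONTINUE_IDLE_TIME)


def should_vacate(seconds_suspended: int) -> bool:
    # VACATE : $(StateTimer) > $(MaxSuspendTime)
    return seconds_suspended > MAX_SUSPEND_TIME


if __name__ == "__main__":
    weekday_morning = Snapshot(0.2, 1200, 3, 10)
    print(may_start("btv", weekday_morning), may_start("lp", weekday_morning))
```

On an idle workstation, a btv job can start at any time, while an lp job must wait for the night or weekend window.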
#==============================================================================# # Description: LoadL_config.local for Interactive Large Servers (580-590 Class) #==============================================================================# # Need 3x Real Memory To Paging Space ( minimum ) For Worst Case Of Two # Suspended and One Foreground Running Job. # *) All Jobs (btv,lp) Suspend on LoadAvg or Keyboard/Mouse Movement. # *) Real Memory >= 192meg. #==============================================================================# #==============================================================================# # Class defines the permissable classes, MAX_STARTERS defines the max # total jobs to be permitted. #==============================================================================# Class = { "btv" "lp" } MAX_STARTERS = 2 #==============================================================================# # The next definitions are used in the expressions below to regulate the # conditions under which jobs get started, suspended, and evicted. # # All times are specified in units of seconds. #==============================================================================# BackgroundLoad = 0.8 LowLoad = 1.0 HighLoad = 1.6 MaxLoad = 2.0 StartIdleTime = 900 ContinueIdleTime = 900
#==============================================================================# # LoadAvg is an internal variable whose value is the (Berkeley) load average # of the machine. # # CPU_Idle - No LoadL job running, or One job just finishing. # CPU_Busy - One LoadL job running, second job ( Foreground or Batch ) # starting up. # CPU_Max - Two LoadL jobs running. #==============================================================================# CPU_Idle = (LoadAvg <= $(BackgroundLoad)) CPU_Run = (LoadAvg <= $(LowLoad)) CPU_Busy = (LoadAvg >= $(HighLoad)) CPU_Max = (LoadAvg >= $(MaxLoad)) #==============================================================================# # This defines a boolean "KeyboardBusy" whose value is TRUE if the keyboard # or mouse has been used since loadl last checked. Thus if POLLING_FREQUENCY # is 5 seconds, KeyboardBusy is TRUE if anybody has used the kbd or mouse in # the last 5 seconds. #==============================================================================# KeyboardBusy = KeyboardIdle < $(POLLING_FREQUENCY) #==============================================================================# # This statement indicates when a job should be started on this machine #==============================================================================# Weekend = ( (tm_wday >= 6) || (tm_wday < 1) ) Day = ( (tm_hour >= 7) && (tm_hour < 18) ) Night = ( (tm_hour >= 18) || (tm_hour < 4) ) Inactive1 = ( (KeyboardIdle > $(StartIdleTime)) ) Inactive2 = ( (KeyboardIdle > $(ContinueIdleTime)) ) HP = ( (Class == "btv") ) LP = ( (Class == "lp") && $(CPU_Idle) ) START : ( ($(HP) || $(LP)) && $(Inactive1) ) #==============================================================================# # The SUSPEND statement here says that a job should be suspended but not # killed if: # KeyboardIdle < 5 Or # lp Class And LoadAvg >= 1.6 Or # btv Class And LoadAvg >= 2.0 #==============================================================================# SUSPEND : ( ( (Class == "lp") 
&& $(CPU_Busy) ) || \ ( (Class == "btv") && $(CPU_Max) ) || \ ( $(KeyboardBusy) ) ) #==============================================================================# # This CONTINUE statement indicates that a suspended job should be continued # if: # lp Class And LoadAvg <= 0.8 And KeyboardIdle > 15 min Or # btv Class And LoadAvg <= 1.0 And KeyboardIdle > 15 min #==============================================================================# CONTINUE : ( ( (Class == "lp") && $(CPU_Idle) && $(Inactive2) ) || \ ( (Class == "btv") && $(CPU_Run) && $(Inactive2) ) )
#==============================================================================# # Jobs in the SUSPEND state are never killed, after 60 minutes they are # relocated to a different box if possible. #==============================================================================# MaxSuspendTime = 60 * $(MINUTE) VACATE : $(StateTimer) > $(MaxSuspendTime) KILL : F #==============================================================================# # If you set START_DAEMONS to False loadl can never start on this machine. # For example you may want to stop loadl for a couple days for maintenance # and make sure no procedure automatically restarts it. #==============================================================================# START_DAEMONS = True #==============================================================================# # Set the maximum size each of the logs can reach before wrapping. #==============================================================================# MAX_SCHEDD_LOG = 128000 MAX_COLLECTOR_LOG = 128000 MAX_STARTD_LOG = 128000 MAX_SHADOW_LOG = 128000 MAX_KBDD_LOG = 128000
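On the large interactive servers, the SUSPEND and CONTINUE thresholds depend on the job class. A minimal Python sketch of that rule follows; the function and table names are hypothetical, and POLLING_FREQUENCY is assumed to be 5 seconds, as suggested by the comments in the file.

```python
POLLING_FREQUENCY = 5     # assumption, taken from the file's comments
CONTINUE_IDLE_TIME = 900  # ContinueIdleTime, in seconds

# Load average at or above which a job of each class is suspended
# (HighLoad for lp, MaxLoad for btv).
SUSPEND_LOAD = {"lp": 1.6, "btv": 2.0}

# Load average at or below which a suspended job of each class may continue
# (BackgroundLoad for lp, LowLoad for btv).
CONTINUE_LOAD = {"lp": 0.8, "btv": 1.0}


def should_suspend(job_class: str, load_avg: float, keyboard_idle: int) -> bool:
    # Keyboard or mouse activity suspends every class.
    if keyboard_idle < POLLING_FREQUENCY:
        return True
    limit = SUSPEND_LOAD.get(job_class)
    return limit is not None and load_avg >= limit


def may_continue(job_class: str, load_avg: float, keyboard_idle: int) -> bool:
    limit = CONTINUE_LOAD.get(job_class)
    return (limit is not None
            and load_avg <= limit
            and keyboard_idle > CONTINUE_IDLE_TIME)


if __name__ == "__main__":
    print(should_suspend("lp", 1.7, 1000), should_suspend("btv", 1.7, 1000))
```

At a load average of 1.7 with an idle keyboard, an lp job is suspended while a btv job keeps running, which is how the higher-priority class is favored on these machines.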
The following configuration file defines dedicated batch machines. Notice, however, that jobs in the lp class are suspended when a machine's load average reaches the HighLoad threshold (1.6). In this sense, the machines are not fully dedicated.
#==============================================================================#
# Description: LoadL_config.local for Large Batch Servers ( 580 - 590 Class )
#==============================================================================#
# Need 3x Real Memory To Paging Space ( minimum ) For Worst Case Of One
# Suspended and Two Foreground Running Job.
#   *) High Priority Jobs (btv) Never Suspend.
#   *) Job Suspension (lp) Based on LoadAvg Only.
#   *) Real Memory >= 192meg.
#==============================================================================#

#==============================================================================#
# Class defines the permissible classes, MAX_STARTERS defines the max
# total jobs to be permitted.
#==============================================================================#
Class = { "btv" "lp" }
MAX_STARTERS = 2

#==============================================================================#
# The next definitions are used in the expressions below to regulate the
# conditions under which jobs get started, suspended, and evicted.
#
# All times are specified in units of seconds.
#==============================================================================#
BackgroundLoad   = 0.5
HighLoad         = 1.6
StartIdleTime    = 900
ContinueIdleTime = 900

#==============================================================================#
# LoadAvg is an internal variable whose value is the (Berkeley) load average
# of the machine.
#
# CPU_Idle - No LoadL job running, or One job just finishing.
# CPU_Busy - One LoadL job running, second job ( Foreground or Batch )
#            starting up.
# CPU_Max  - Two LoadL jobs running.
#==============================================================================#
CPU_Idle = (LoadAvg <= $(BackgroundLoad))
CPU_Busy = (LoadAvg >= $(HighLoad))

#==============================================================================#
# This defines a boolean "KeyboardBusy" whose value is TRUE if the keyboard
# or mouse has been used since loadl last checked. Thus if POLLING_FREQUENCY
# is 5 seconds, KeyboardBusy is TRUE if anybody has used the kbd or mouse in
# the last 5 seconds.
#==============================================================================#
KeyboardBusy = KeyboardIdle < $(POLLING_FREQUENCY)

#==============================================================================#
# This statement indicates when a job should be started on this machine
#==============================================================================#
HP = ( (Class == "btv") )
LP = ( (Class == "lp") && $(CPU_Idle) )
START : ( $(HP) || $(LP) )

#==============================================================================#
# The SUSPEND statement here says that a "lp" job should be suspended but not
# killed if a high priority job starts up or a foreground job causes the
# LoadAvg to be greater than CPU_Busy ( 1.6 ).
#==============================================================================#
SUSPEND : (Class == "lp") && $(CPU_Busy)

#==============================================================================#
# This CONTINUE statement indicates that a suspended job should be continued
# if the cpu goes idle and the keyboard/mouse has not been used for the last
# 15 minutes.
#==============================================================================#
CONTINUE : $(CPU_Idle) && KeyboardIdle > $(ContinueIdleTime)

#==============================================================================#
# Jobs in the SUSPEND state are never killed, after 60 minutes they are
# relocated to a different box if possible.
#==============================================================================#
MaxSuspendTime = 60 * $(MINUTE)
VACATE : $(StateTimer) > $(MaxSuspendTime)
KILL : F

#==============================================================================#
# If you set START_DAEMONS to False loadl can never start on this machine.
# For example you may want to stop loadl for a couple days for maintenance
# and make sure no procedure automatically restarts it.
#==============================================================================#
START_DAEMONS = True

#==============================================================================#
# Set the maximum size each of the logs can reach before wrapping.
#==============================================================================#
MAX_SCHEDD_LOG    = 128000
MAX_COLLECTOR_LOG = 128000
MAX_STARTD_LOG    = 128000
MAX_SHADOW_LOG    = 128000
MAX_KBDD_LOG      = 128000
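The gap between the suspend threshold (HighLoad = 1.6) and the continue threshold (BackgroundLoad = 0.5) in the batch-server file means a suspended lp job does not resume as soon as the load dips slightly. The Python sketch below traces one lp job through a series of load samples to show this, along with the 60-minute VACATE limit. It is a simplified illustration, not LoadLeveler's actual evaluation loop: the sample interval, the order in which VACATE and CONTINUE are checked, the value of MINUTE (taken to be 60 seconds), and the name trace_lp_job are all assumptions.

```python
# Illustrative trace of one "lp" job under the batch-server expressions.
BACKGROUND_LOAD = 0.5        # BackgroundLoad, from the file
HIGH_LOAD = 1.6              # HighLoad, from the file
CONTINUE_IDLE_TIME = 900     # ContinueIdleTime in seconds, from the file
MAX_SUSPEND_TIME = 60 * 60   # 60 * $(MINUTE), assuming MINUTE is 60 seconds
SAMPLE_INTERVAL = 600        # hypothetical spacing between samples (seconds)


def trace_lp_job(samples):
    """Follow an lp job through (load_avg, keyboard_idle) samples.

    Returns the job state after each sample: "running", "suspended",
    or "vacated".
    """
    state = "running"
    suspended_for = 0
    history = []
    for load_avg, keyboard_idle in samples:
        if state == "running":
            if load_avg >= HIGH_LOAD:                 # SUSPEND
                state, suspended_for = "suspended", 0
        elif state == "suspended":
            suspended_for += SAMPLE_INTERVAL
            if suspended_for > MAX_SUSPEND_TIME:      # VACATE
                state = "vacated"
            elif (load_avg <= BACKGROUND_LOAD
                  and keyboard_idle > CONTINUE_IDLE_TIME):
                state = "running"                     # CONTINUE
        history.append(state)
    return history


# Load rises, settles between the two thresholds, then drops.
print(trace_lp_job([(0.4, 2000), (1.8, 2000), (1.0, 2000), (0.4, 2000)]))
```

In this trace the job stays suspended at a load average of 1.0, because that value is below HighLoad but still above BackgroundLoad; it resumes only once the load falls to 0.5 or lower.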
The following statements define a machine that schedules jobs but does not run them. Notice that nothing in these statements prevents the schedd daemon from running.
#
# This loadl local configuration file is set up to make a machine a
# submitter only.
#
# No jobs are allowed to run on this system.
#
MAX_STARTERS = 0
START : F
#
# If you set START_DAEMONS to False loadl can never start on this machine.
# For example you may want to stop loadl for a couple days for maintenance
# and make sure no procedure automatically restarts it.
#
START_DAEMONS = True