Users can submit NQS scripts to LoadLeveler and have them routed to a machine outside of the LoadLeveler cluster that runs NQS. LoadLeveler supports COSMIC NQS version 2.0 and other versions of NQS that support the same commands and options and produce similar output for those commands.
The following diagram illustrates a typical environment that allows users to have their jobs routed to machines outside of LoadLeveler for processing:
Figure 31. Environment illustrating jobs being routed to NQS machines.
As the diagram illustrates, machines A, B, and C are members of the LoadLeveler cluster. Machine A runs the central manager, and machine B runs both LoadLeveler and NQS. Machine C is a third member of the cluster. Machine D is outside of the cluster and runs NQS.
When a user submits a job to LoadLeveler, machine A, which runs the central manager, schedules the job to machine B. LoadLeveler running on machine B then routes the job to machine D using NQS. Keep this diagram in mind as you continue to read this chapter.
Setting up the NQS environment involves the following:
In the previous diagram, you would create the NQS pipe queue on machine B.
To designate a machine to which your jobs will be routed, follow these steps:
You can set up multiple classes to access different machines.
The following example walks you through the process of setting up your environment to route jobs to machines that run NQS.
Assume Figure 31 depicts your environment. You have three machines in the cluster named A, B, and C. Outside of the cluster, you have machine D running NQS.
After setting up your NQS environment, modify the LoadL_admin file to define the class NQS with the following stanza:
NQS: type = class
     NQS_class = true
     NQS_submit = pipe_a
     NQS_query = queue@chevy.kgn.ibm.com
Modify the LoadL_config.local file on each machine that you want to accept this class of jobs. In this example, you would modify machine B's LoadL_config.local file. To do this, add a class statement similar to:
CLASS = {"NQS" "a" "b" ....}
where NQS is the name of the class of jobs that will be routed to the machines that run NQS, and a and b are names of additional classes.
After you perform the previous tasks, users can route their jobs to machines running NQS using the llsubmit command. The job command file must specify the class keyword. For example:
class = NQS
The job command file must also contain the shell script to be submitted to the NQS node. NQS accepts only shell scripts; binaries are not allowed. LoadLeveler uses all options in the command file that pertain to scheduling to schedule the job. When the job is dispatched to the node running the specified NQS class, the LoadLeveler options that pertain to the runtime environment are converted to NQS options and the job is submitted to the specified NQS queue.
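As an illustration of the two requirements above (a class keyword plus a shell script body), the following Python sketch assembles the text of a job command file. The function name build_job_command_file is hypothetical. The "# @" directive prefix and the "# @ queue" statement are assumed from common LoadLeveler job command file conventions and are not shown in this chapter, so check them against your installation.

```python
def build_job_command_file(class_name: str, script_body: str) -> str:
    """Return job command file text that requests the given class.

    Assumption: directives are written as '# @ keyword = value' and the
    job step is closed with '# @ queue'. Neither appears in this chapter.
    """
    lines = [
        "#!/bin/sh",
        f"# @ class = {class_name}",      # routes the job to the NQS class
        "# @ queue",                      # assumed end-of-step statement
        script_body.rstrip("\n"),         # NQS accepts shell scripts only
    ]
    return "\n".join(lines) + "\n"


job_text = build_job_command_file("NQS", "echo running on the NQS node")
print(job_text)
```

The resulting text would be saved to a file and passed to llsubmit.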
LoadLeveler command file options are used as follows:
Users can also submit an NQS script. In this case, any NQS options in the script are used to schedule the job, and once LoadLeveler dispatches the job, the file is sent to NQS unmodified.
LoadLeveler schedules these jobs the same way it schedules other jobs. When the job is dispatched, LoadLeveler determines whether it is running in an NQS class. If it is, LoadLeveler issues the NQS qsub command.
LoadLeveler monitors the job by periodically invoking a qstat command. A qstat command is first issued for the pipe queue on the local host. If the request id is not found, a qstat is issued for each queue listed in the NQS_query class keyword. If the request id is still not found, starter marks the job as complete.
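The lookup order described above can be sketched in Python. This is only a model of the decision logic, not LoadLeveler's implementation: request_in_queue is a hypothetical callable that stands in for running qstat against one queue and checking its output for the request id, and the request id "42.hostb" is made up for the demonstration.

```python
from typing import Callable, Iterable


def job_still_queued(
    request_id: str,
    local_pipe_queue: str,
    query_queues: Iterable[str],
    request_in_queue: Callable[[str, str], bool],
) -> bool:
    """Return True while the request is visible in any monitored queue."""
    # Step 1: check the pipe queue on the local host.
    if request_in_queue(local_pipe_queue, request_id):
        return True
    # Step 2: check each queue listed in the NQS_query keyword.
    for queue in query_queues:
        if request_in_queue(queue, request_id):
            return True
    # Step 3: not found anywhere, so the job is treated as complete.
    return False


# Demonstration with an in-memory stand-in for qstat (hypothetical data).
fake_queues = {"pipe_a": set(), "queue@chevy.kgn.ibm.com": {"42.hostb"}}


def fake_lookup(queue: str, request_id: str) -> bool:
    return request_id in fake_queues.get(queue, set())


running = job_still_queued(
    "42.hostb", "pipe_a", ["queue@chevy.kgn.ibm.com"], fake_lookup
)
fake_queues["queue@chevy.kgn.ibm.com"].clear()
finished = not job_still_queued(
    "42.hostb", "pipe_a", ["queue@chevy.kgn.ibm.com"], fake_lookup
)
```

Passing the lookup as a callable keeps the ordering logic separate from how qstat output is actually parsed, which this chapter does not specify.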
When a job is sent to an NQS class, llsubmit saves the following environment variables:
When LoadLeveler dispatches the job, these environment variables are installed so that they are available to qsub. llsubmit also saves the name of the current directory (pwd) and the current value of the user file create mask (umask).
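To make the save-and-restore idea concrete, here is a minimal Python sketch of capturing a submit-time context and rebuilding an environment from it at dispatch time. The function names are hypothetical, and the variable names passed in are placeholders chosen by the caller; they do not reproduce the list of variables that llsubmit actually saves.

```python
import os
from typing import Dict, Iterable


def capture_submit_context(variable_names: Iterable[str]) -> dict:
    """Record selected environment variables, the cwd, and the umask."""
    # os.umask() sets a new mask and returns the old one, so read the
    # current value by setting a temporary mask and restoring it at once.
    current_mask = os.umask(0)
    os.umask(current_mask)
    return {
        "env": {
            name: os.environ[name]
            for name in variable_names
            if name in os.environ      # skip variables that are not set
        },
        "cwd": os.getcwd(),
        "umask": current_mask,
    }


def build_dispatch_env(context: dict, base_env: Dict[str, str]) -> Dict[str, str]:
    """Return base_env with the saved variables installed on top."""
    env = dict(base_env)
    env.update(context["env"])
    return env


# Placeholder variable name used only for this demonstration.
os.environ["DEMO_SAVED_VAR"] = "example"
context = capture_submit_context(["DEMO_SAVED_VAR", "DEMO_UNSET_VAR"])
dispatch_env = build_dispatch_env(context, {"PATH": "/usr/bin"})
```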
Users can obtain the status of NQS jobs in the same way as they obtain the status of LoadLeveler jobs: either by using the llq command or by viewing the Jobs window on the graphical user interface. Users can identify NQS jobs by the class field on the Jobs window.
LoadLeveler monitors the job until qstat shows the job is no longer in any specified queue.
NQS does not provide job accounting. Therefore, the only accounting information LoadLeveler will have is the total time for the job.
LoadLeveler will not send mail when the job completes. The LoadLeveler notification option is translated to the appropriate NQS flag (me or mb) and NQS will send the mail.
Users can cancel NQS jobs using the LoadLeveler llcancel command. All they need to know is the LoadLeveler job id for the NQS job. Once they submit their request to cancel the job, LoadLeveler forwards the request to the appropriate node, and a qdel is issued for the job against the queues listed in the NQS_submit and NQS_query keywords.
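The set of queues targeted by a cancel request can be modeled as the submit queue followed by the query queues. The Python sketch below shows one way to build that list; the function name queues_for_cancel is hypothetical, and removing duplicates is a design choice made here for illustration, not documented LoadLeveler behavior.

```python
from typing import List


def queues_for_cancel(submit_queue: str, query_queues: List[str]) -> List[str]:
    """Return the queues to target with qdel, submit queue first."""
    targets: List[str] = []
    for queue in [submit_queue, *query_queues]:
        if queue not in targets:       # illustrative choice: skip repeats
            targets.append(queue)
    return targets


# Values taken from the class stanza example in this chapter.
cancel_targets = queues_for_cancel("pipe_a", ["queue@chevy.kgn.ibm.com"])
```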
Scripts originally written for NQS that contain NQS options are acceptable to LoadLeveler. The options are mapped as closely as possible to the features provided by LoadLeveler, but the exact function is not always available. NQS options map to LoadLeveler as follows: