Using and Administering

Collecting Job Resource Data on Serial and Parallel Jobs

Resource information on completed serial and parallel jobs is gathered with the UNIX wait3 system call. Information on serial and parallel jobs that are still running is gathered in a platform-dependent manner by examining data from the UNIX processes.
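The following minimal sketch shows what wait3 reports for a completed process, using Python's os.wait3 wrapper on a UNIX system. The run_and_collect helper is hypothetical. It illustrates only the system call and does not show how LoadLeveler itself invokes it.

```python
import os
import sys


def run_and_collect(argv):
    """Run argv in a child process and wait for it with wait3.

    Returns (status, rusage): the raw wait status and a
    resource.struct_rusage describing what the child consumed.
    """
    pid = os.fork()
    if pid == 0:
        # Child: replace this process with the requested command.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # reached only if exec fails
    # Parent: wait3 waits for any child and also returns its resource usage.
    # In this sketch, statuses of unrelated children reaped here are discarded.
    while True:
        waited_pid, status, usage = os.wait3(0)
        if waited_pid == pid:
            return status, usage


if __name__ == "__main__":
    status, usage = run_and_collect([sys.executable, "-c", "sum(range(10**6))"])
    print("exit code:", os.WEXITSTATUS(status))
    print("user CPU (s):", usage.ru_utime)
    print("system CPU (s):", usage.ru_stime)
    print("max RSS (units vary by platform):", usage.ru_maxrss)
```

Because wait3 returns usage only once the process has exited, it cannot describe a running job. That limitation is why data on running jobs has to come from a platform-dependent source.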

Accounting information on a completed serial job is determined by accumulating resources consumed by that job on the machine(s) that ran the job. Similarly, accounting information on completed parallel jobs is gathered by accumulating resources used on all of the nodes that ran the job.
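The accumulation step can be pictured as a sum over per-machine records. The sketch below assumes a hypothetical Usage record holding only user and system CPU time, keyed by node name. It is not LoadLeveler's internal data format.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical per-machine resource record for one job."""
    user_cpu_s: float = 0.0
    system_cpu_s: float = 0.0

    def __add__(self, other):
        # Resources from two machines combine field by field.
        return Usage(self.user_cpu_s + other.user_cpu_s,
                     self.system_cpu_s + other.system_cpu_s)


def job_total(per_node):
    """Sum the usage reported by every node that ran the job."""
    total = Usage()
    for usage in per_node.values():
        total = total + usage
    return total


if __name__ == "__main__":
    # A parallel job that ran on two nodes.
    print(job_total({"node1": Usage(1.5, 0.25), "node2": Usage(2.0, 0.5)}))
```

A serial job is the same computation with a single entry in the mapping.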

You can also view resource consumption information on serial and parallel jobs that are still running by specifying the -x option of the llq command. To enable llq -x, specify the following keywords in the configuration file:

ACCT = A_ON A_DETAIL: Turns accounting data recording on. For more information on this keyword, see Step 9: Define Job Accounting.
JOB_ACCT_Q_POLICY = number: where number is the amount of time in seconds that determines how often the startd daemon updates the schedd daemon with accounting data of running jobs. This controls the accuracy of the llq -x command. The default is 300 seconds.
JOB_LIMIT_POLICY = number: where number is an amount of time in seconds. The smaller of JOB_LIMIT_POLICY and JOB_ACCT_Q_POLICY is used to control how often the startd daemon collects resource consumption data on running jobs, and how often the job_cpu_limit is checked. The default for JOB_LIMIT_POLICY is POLLING_FREQUENCY multiplied by POLLS_PER_UPDATE.
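The sketch below models how these keywords interact. It checks that ACCT includes both A_ON and A_DETAIL, and it computes the sampling interval as the smaller of JOB_LIMIT_POLICY and JOB_ACCT_Q_POLICY. The parser is a simplified assumption that reads plain "KEYWORD = value" lines and treats "#" as a comment. It ignores any other configuration-file syntax. This section does not give defaults for POLLING_FREQUENCY and POLLS_PER_UPDATE, so the sketch requires them whenever JOB_LIMIT_POLICY is absent.

```python
def parse_keywords(text):
    """Simplified parser for 'KEYWORD = value' lines ('#' starts a comment)."""
    config = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            config[key.strip()] = value.strip()
    return config


def llq_x_enabled(config):
    """llq -x needs ACCT to include both A_ON and A_DETAIL."""
    flags = config.get("ACCT", "").split()
    return "A_ON" in flags and "A_DETAIL" in flags


def effective_collect_interval(config):
    """Seconds between startd samples of running jobs: the smaller of
    JOB_LIMIT_POLICY and JOB_ACCT_Q_POLICY (default 300)."""
    acct_q = int(config.get("JOB_ACCT_Q_POLICY", 300))
    if "JOB_LIMIT_POLICY" in config:
        limit = int(config["JOB_LIMIT_POLICY"])
    else:
        # Default: POLLING_FREQUENCY multiplied by POLLS_PER_UPDATE.
        limit = (int(config["POLLING_FREQUENCY"])
                 * int(config["POLLS_PER_UPDATE"]))
    return min(acct_q, limit)


if __name__ == "__main__":
    cfg = parse_keywords(
        "ACCT = A_ON A_DETAIL\n"
        "JOB_ACCT_Q_POLICY = 60\n"
        "JOB_LIMIT_POLICY = 120\n"
    )
    print("llq -x enabled:", llq_x_enabled(cfg))
    print("startd samples every", effective_collect_interval(cfg), "seconds")
```

With the sample values, the startd daemon would collect data every 60 seconds. A smaller JOB_ACCT_Q_POLICY makes llq -x more current, at the cost of more frequent updates from startd to schedd.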
