Using and Administering


Planning Considerations

Node availability
Some workstation owners might agree to accept LoadLeveler jobs only when they are not using the workstation themselves. Using LoadLeveler keywords, these workstations can be configured to be available at designated times only.
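
The idea of a designated availability window can be sketched in a few lines of Python. This is only an illustration of the time-window logic, not LoadLeveler keyword syntax; the function name accepts_jobs and the 18:00 to 08:00 window are assumptions made for the example.

```python
from datetime import time


def accepts_jobs(now, start, end):
    """Return True if `now` falls inside the half-open window [start, end)."""
    if start <= end:
        # Ordinary window within one day (empty when start == end).
        return start <= now < end
    # Window wraps past midnight, for example 18:00 to 08:00.
    return now >= start or now < end


# Hypothetical policy: the owner uses the workstation from 08:00 to 18:00,
# so jobs are accepted only outside those hours.
window_start, window_end = time(18, 0), time(8, 0)
for probe in (time(7, 30), time(12, 0), time(22, 0)):
    print(probe, accepts_jobs(probe, window_start, window_end))
```

The wrap-around branch matters because an overnight window starts later in the day than it ends, so a plain range comparison would never match.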

Common name space
To run jobs on any machine in the LoadLeveler cluster, a user needs the same uid (the system ID number for a user) and gid (the system ID number for a group) on every machine in the cluster. The term cluster refers to all machines mentioned in the configuration file.

For example, if there are two machines in your LoadLeveler cluster, machine_1 and machine_2, user john must have the same user ID and login group ID in the /etc/passwd file on both machines. If user john has user ID 1234 and login group ID 100 on machine_1, then user john must have the same user ID and login group ID in /etc/passwd on machine_2. This ensures that the getuid system call returns the same user ID on both systems, which allows a job to run with the same user ID and group ID as the person who submitted it.

If you do not have a user ID on one machine, your jobs will not run on that machine. Also, many commands, such as llq, will not work correctly if a user does not have a user ID on the central manager machine.

However, there are cases where you may choose not to give a user a login ID on a particular machine. For example, a user does not need an ID on every submit-only machine; the user only needs to be able to submit jobs from at least one such machine. Also, you may choose to restrict a user's access to a schedd machine that is not a public scheduler; again, the user only needs access to at least one schedd machine.
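
One way to check the common name space requirement before adding machines to a cluster is to compare passwd entries across machines. The following Python sketch is not part of LoadLeveler; the function names parse_passwd and find_id_problems are hypothetical, and the passwd contents are inline sample data modeled on the machine_1 and machine_2 example above. In practice you would read each machine's /etc/passwd (or query your name service) instead.

```python
def parse_passwd(text):
    """Map each user name to its (uid, gid) pair from passwd-format text."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        # Skip malformed lines and entries without numeric IDs.
        if len(fields) < 4 or not (fields[2].isdigit() and fields[3].isdigit()):
            continue
        entries[fields[0]] = (int(fields[2]), int(fields[3]))
    return entries


def find_id_problems(passwd_by_machine, users):
    """Return a list of readable problems for the given users."""
    problems = []
    for user in users:
        seen = {}
        for machine, entries in passwd_by_machine.items():
            if user in entries:
                seen[machine] = entries[user]
            else:
                problems.append(f"{user}: no entry on {machine}")
        if len(set(seen.values())) > 1:
            problems.append(f"{user}: uid/gid differ across machines: {seen}")
    return problems


# Sample data (assumed): john's user ID differs on machine_2.
cluster = {
    "machine_1": parse_passwd("john:x:1234:100:John:/home/john:/bin/sh\n"),
    "machine_2": parse_passwd("john:x:1235:100:John:/home/john:/bin/sh\n"),
}
for problem in find_id_problems(cluster, ["john"]):
    print(problem)
```

A missing entry is reported separately from a mismatch because, as noted above, a missing ID on a submit-only or non-public schedd machine can be a deliberate choice, whereas differing IDs are always a configuration error.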

Performance
To maximize performance, keep the log, spool, and execute directories on a local file system. To measure the performance of your network, consider using one of the available products, such as Toolbox/6000.

Management
Managing distributed software systems is a primary concern for all system administrators. Allowing users to share file systems, so that they see a single, network-wide image, is one way to make managing LoadLeveler easier.

Resource Handling
Some nodes in the LoadLeveler cluster might have special software installed that users might need to run their jobs successfully. You should configure LoadLeveler to distinguish those nodes from other nodes using, for example, machine features.
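
The matching that machine features make possible can be illustrated with a short Python sketch. It models the idea only and does not use LoadLeveler configuration syntax; the function name machines_with_features and the feature names compiler_x and solver_y are hypothetical.

```python
def machines_with_features(machine_features, required):
    """Return the sorted names of machines offering every required feature."""
    required = set(required)
    return sorted(
        name
        for name, features in machine_features.items()
        if required <= set(features)  # subset test: all required are present
    )


# Assumed inventory: which special software each node has installed.
cluster_features = {
    "machine_1": {"compiler_x"},
    "machine_2": {"compiler_x", "solver_y"},
    "machine_3": set(),
}

# A job that needs solver_y can run only where that feature is advertised.
print(machines_with_features(cluster_features, ["solver_y"]))
```

An empty requirement list matches every machine, which mirrors a job that needs no special software.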

