Diagnosis and Messages Guide
Error logging is the writing of information to persistent storage to be
used for debugging purposes. This type of logging is for subsystems
that perform a service or function on behalf of an end user. The
subsystem does not communicate directly with the end user and, therefore,
needs to log events to some storage location. The events that are
logged are primarily error events.
LoadLeveler error logging uses AIX error log facilities to report events on
a per machine basis. The intent is to have the AIX error log be the
starting point for diagnosing system problems. For more information on
AIX error log facilities, see IBM AIX Problem Solving Guide and Reference
(SC23-2606) and IBM General Concepts and Procedures for RISC System/6000
(GC23-2202). For more information on the error log facilities available
with the SP, see IBM RS/6000 Scalable POWERparallel SP Diagnosis and
Messages Guide (SC23-3866).
Error log entries include a "DETECTING MODULE" string
that identifies the software component, module name, module level, and line of
code or function that detected the logged event. The format of this
information depends on the logging facility used to view it. For example,
in the AIX Error Log facility the information appears as:
DETECTING MODULE
LPP=LPP name Fn=filename SID_level_of_the_file L#=Line number
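As a minimal sketch, the DETECTING MODULE string can be split into its fields with a few lines of Python. The input used here is the comma-separated string from the sample error report in this section. The function name parse_detecting_module is hypothetical, and the layout may differ in other logging facilities.

```python
def parse_detecting_module(text):
    """Split a DETECTING MODULE string into a dict of its key=value fields."""
    fields = {}
    for part in text.split(","):
        part = part.strip()
        if "=" in part:
            # Split on the first "=" only, so values may contain "=".
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


sample = "LPP=LoadL,Fn=dprintf_config.c,SID=1.2,L#=392,"
info = parse_detecting_module(sample)
print(info["Fn"], info["L#"])
```

The trailing comma in the sample produces an empty piece, which the function skips because it contains no "=".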
The following table maps LoadLeveler error label suffixes to
AIX Error Log error types.
Table 1. LoadLeveler Error Label Suffixes Mapped to AIX Error Log
Error Label Suffix | AIX Error Log Error Type | AIX Error Log Description
EM | PEND | The loss of availability of a device is imminent.
ER | PERM | No recovery from this condition. A permanent error occurred.
ST | UNKN | It is not possible to determine the severity of the error.
TR | UNKN | It is not possible to determine the severity of the error.
RE | TEMP | Condition was recovered after several unsuccessful attempts.
DE | UNKN | It is not possible to determine the severity of the error.
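The mapping in Table 1 can be expressed as a simple lookup. The sketch below assumes that the suffix is whatever follows the last underscore in an error label, as in LL_TRUNCATE_ST. The function name error_type_for_label is hypothetical.

```python
# Mapping taken from Table 1: error label suffix -> AIX Error Log error type.
SUFFIX_TO_ERROR_TYPE = {
    "EM": "PEND",
    "ER": "PERM",
    "ST": "UNKN",
    "TR": "UNKN",
    "RE": "TEMP",
    "DE": "UNKN",
}


def error_type_for_label(label):
    """Return the AIX error type for a label, or None for an unknown suffix."""
    suffix = label.rsplit("_", 1)[-1]
    return SUFFIX_TO_ERROR_TYPE.get(suffix)


print(error_type_for_label("LL_TRUNCATE_ST"))
print(error_type_for_label("LL_INIT_ER"))
```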
The following events are recorded by the LoadLeveler error logging
facility:
- Errors occurring during the initial startup of LoadLeveler daemons
- Installation and customization errors
- The restarts per hour limit being exceeded
- LoadL_master detecting the crash of a daemon
- Changes in the state of a daemon
- LoadLeveler truncating the log files
Enter the following command to view the LoadLeveler error reports for a
machine:
errpt -a -N LoadLeveler
Enter the following command to clear all LoadLeveler entries
from the error logs of a machine:
errclear -N LoadLeveler 0
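If these commands need to be driven from a script, one possible approach is a thin Python wrapper. The sketch below only builds the two command lines shown above and runs them when the tool is present on the host. The helper names are hypothetical, and on a machine without the AIX tools the runner simply returns None.

```python
import shutil
import subprocess

RESOURCE = "LoadLeveler"


def errpt_command(resource=RESOURCE):
    """Command line that prints the detailed error reports for a resource."""
    return ["errpt", "-a", "-N", resource]


def errclear_command(resource=RESOURCE, days=0):
    """Command line that clears entries older than `days` for a resource."""
    return ["errclear", "-N", resource, str(days)]


def run_if_available(command):
    """Run the command and return its stdout, or None if the tool is missing."""
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return result.stdout


report = run_if_available(errpt_command())
print(report if report is not None else "errpt not found on this host")
```

Only errpt is run in the example. Clearing the log is destructive, so errclear_command is left for the caller to run deliberately.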
You can be notified of a LoadLeveler error when it occurs by using the AIX
Error Notification Facility.
This facility runs a method, which the administrator defines in the ODM, when a
particular error occurs or a particular process fails. IBM General
Concepts and Procedures for RISC System/6000 (GC23-2202) explains how to use the AIX Error Notification Facility.
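As a rough sketch of what such a notification definition might look like, the Python below renders a stanza for the ODM errnotify object class that could then be loaded with odmadd. The descriptor names (en_name, en_persistenceflg, en_resource, en_method) and the $1 sequence-number argument reflect the AIX Error Notification Facility as generally documented, but they are not taken from this guide, so verify them against the AIX documentation. The object name and the mail command are hypothetical.

```python
def errnotify_stanza(name, resource, method, persistent=True):
    """Render one errnotify stanza as text (assumed layout; verify on your AIX level)."""
    lines = [
        "errnotify:",
        f'\ten_name = "{name}"',
        f"\ten_persistenceflg = {1 if persistent else 0}",
        f'\ten_resource = "{resource}"',
        f'\ten_method = "{method}"',
    ]
    return "\n".join(lines) + "\n"


stanza = errnotify_stanza(
    name="loadl_notify",  # hypothetical object name
    resource="LoadLeveler",  # resource name shown in the error reports
    method="errpt -a -l $1 | mail -s 'LoadLeveler error' root",  # hypothetical action
)
print(stanza)
```

The stanza is produced as text only. Nothing is written to the ODM, which leaves the administrator free to review it first.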
The following is a sample LoadLeveler error report:
Figure 1. Sample LoadLeveler Error Log
LABEL: LL_TRUNCATE_ST
IDENTIFIER: 05D868E9
Date/Time: Fri Mar 31 18:07:43
Sequence Number: 271
Machine Id: 000001801000
Node Id: ll2
Class: S
Type: UNKN
Resource Name: LoadLeveler
Description
SOFTWARE
Probable Causes
APPLICATION PROGRAM
User Causes
FILE NEEDS REORGANIZATION
Recommended Actions
NO ACTION NECESSARY
Detail Data
DETECTING MODULE
LPP=LoadL,Fn=dprintf_config.c,SID=1.2,L#=392,
DIAGNOSTIC EXPLANATION
Logfile "/u/loadl/log/StartLog" was truncated
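A report like the one in Figure 1 is straightforward to process in a script. The sketch below collects the "name: value" header lines that precede the Description section, using an excerpt of the sample report above. The function name parse_report_header is hypothetical, and real errpt output may use different spacing.

```python
def parse_report_header(report):
    """Collect the 'name: value' lines that precede the Description section."""
    header = {}
    for line in report.splitlines():
        if line.strip() == "Description":
            break
        # Split on the first colon only, so the time of day stays intact.
        key, sep, value = line.partition(":")
        if sep:
            header[key.strip()] = value.strip()
    return header


sample_report = """\
LABEL: LL_TRUNCATE_ST
IDENTIFIER: 05D868E9
Date/Time: Fri Mar 31 18:07:43
Sequence Number: 271
Type: UNKN
Resource Name: LoadLeveler
Description
SOFTWARE
"""

header = parse_report_header(sample_report)
print(header["LABEL"], header["Type"], header["Date/Time"])
```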
Table 2 describes the possible causes for some common problems reported by the
error logging facility:
Table 2. Possible Causes for LoadLeveler Failure
- Label:
- LL_INIT_ER
- Error Type:
- PERM
- Diagnostic Explanation:
- LoadL_master must be run as root
- Explanation:
- The ownership and the permissions of the LoadL_master file are
incorrect
- Cause:
- This error occurs for one of the following reasons:
- The LoadL_master file is not owned by root
- The LoadL_master file does not have Set User ID permission (the
"s" bit)
- Action:
- Perform the following actions:
- Issue chown root LoadL_master
- Issue chmod u+s LoadL_master
- Restart LoadLeveler
|
- Label:
- LL_INIT_ER
- Error Type:
- PERM
- Diagnostic Explanation:
- master: could not set the real uid to ROOT, rc=nn
- Explanation:
- Unable to set the real UID to 0 (root).
- Cause:
- The setuid system call failed for an authorized caller.
- Action:
- Contact the system administrator
|
- Label:
- LL_INIT_ER
- Error Type:
- PERM
- Diagnostic Explanation:
- Error setting group id to nn. errno = nn.
- Explanation:
- LoadL_master was unable to set the group ID.
- Cause:
- The setgid() call failed.
- Action:
- Verify that the following are true:
- LoadLGroupid is defined correctly in
/etc/LoadL.cfg
- The group ID reported in the error description is defined on the
system.
|
- Label:
- LL_START_ER
- Error Type:
- PERM
- Diagnostic Explanation:
- LoadLeveler cannot continue; this host not in machine list
- Explanation:
- You attempted to start LoadLeveler on a machine that is not defined in the
LoadL_admin file, and MACHINE_AUTHENTICATE = TRUE is set in the
LoadL_config file.
- Cause:
- See "Explanation."
- Action:
- Do one of the following:
- Define the machine in the LoadL_admin file
- Set MACHINE_AUTHENTICATE = FALSE in the LoadL_config file
Then, restart LoadLeveler on the machine where the problem occurred.
|
- Label:
- LL_START_ER
- Error Type:
- PERM
- Diagnostic Explanation:
- Cannot start LoadLeveler. Must be the designated LoadLeveler
administrator
- Explanation:
- A user other than the user defined as the LoadLeveler administrator
attempted to start LoadLeveler.
- Cause:
- The LOADL_ADMIN keyword does not contain the user that
attempted to start LoadLeveler.
- Action:
- Add the user to the LOADL_ADMIN keyword.
|
- Label:
- LL_START_ER
- Error Type:
- PERM
- Diagnostic Explanation:
- START_DAEMONS flag was set to value. Exiting.
- Explanation:
- LoadL_master could not start the LoadLeveler daemons.
- Cause:
- The START_DAEMONS keyword was not defined or was set to
False.
- Action:
- Set START_DAEMONS = true.
|
- Label:
- LL_START_ER
- Error Type:
- PERM
- Diagnostic Explanation:
- Cannot get address of hostname from nameserver; LoadLeveler not
starting.
- Explanation:
- LoadLeveler could not resolve the hostname of the machine defined as the
central manager.
- Cause:
- gethostbyname() failed.
- Action:
- Follow standard procedures for resolving nameserver problems.
|
- Label:
- LL_INSTL_CUST_ER
- Error Type:
- PERM
- Diagnostic Explanation:
- keyword not specified in config file. Admin file not
specified in config! Cannot continue.
- Explanation:
- The specified keyword is not defined correctly.
- Cause:
- This error occurs for one of the following reasons:
- The keyword value is blank
- The keyword is misspelled
- The keyword is commented out
- The keyword is missing
- Action:
- Define the keyword correctly.
|
- Label:
- LL_INSTL_CUST_ER
- Error Type:
- PERM
- Diagnostic Explanation:
- keyword=value can not execute
- Explanation:
- LoadLeveler was unable to execute the binary specified by the keyword.
- Cause:
- This error occurs for one of the following reasons:
- The binary does not have execute permission
- The binary name was misspelled
- The binary does not exist
- Action:
- Correct the problem and reissue the keyword.
|
- Label:
- LL_INSTL_CUST_ER
- Error Type:
- PERM
- Diagnostic Explanation:
- Cannot open file filename
- Explanation:
- LoadLeveler was unable to open the specified file.
- Cause:
- This error occurs for one of the following reasons:
- Incorrect permissions are set for the file
- The file does not exist
- Action:
- Correct the problem with the file.
|
- Label:
- LL_RESTART_ER
- Error Type:
- PERM
- Diagnostic Explanation:
- Exceeded nn restarts / hour
- Explanation:
- LoadL_master exceeded the maximum allowed restarts for a
daemon, as defined by the RESTARTS_PER_HOUR keyword.
- Cause:
- The LoadL_master daemon died after exceeding the
RESTARTS_PER_HOUR limit.
- Action:
- Check the LoadLeveler error logs and the administrator mail for more
information.
|
- Label:
- LL_CRASH_ER
- Error Type:
- PERM
- Diagnostic Explanation:
- The daemon_name (process pid) died
- Explanation:
- The specified daemon died.
- Cause:
- This error occurs for one of the following reasons:
- The daemon received a signal
- The daemon experienced an unrecoverable error
- Action:
- Check the LoadLeveler error logs for more information.
|
- Label:
- LL_INFO_ST
- Error Type:
- UNKN
- Diagnostic Explanation:
- Started daemon_name, pid and pgroup=pid
- Explanation:
- The specified daemon started.
- Action:
- None. This is an informational message.
|
- Label:
- LL_INFO_ST
- Error Type:
- UNKN
- Diagnostic Explanation:
- Got SHUTDOWN command. Got RECONFIG command.
- Explanation:
- LoadL_master received the specified commands.
- Cause:
- The llctl command was issued with "stop" or
"reconfig".
- Action:
- None. This is an informational message.
|
- Label:
- LL_INFO_ST
- Error Type:
- UNKN
- Diagnostic Explanation:
- Alternate central manager cent_manager will become active central
manager. Switching central manager to cent_manager.
- Explanation:
- LoadLeveler performed central manager recovery.
- Cause:
- The backup central manager was unable to contact the primary central
manager.
- Action:
- Check the LoadLeveler error logs and the administrator mail for more
information.
|
- Label:
- LL_INFO_ST
- Error Type:
- UNKN
- Diagnostic Explanation:
- Alternate central manager cent_manager will return to stand-by
state.
- Explanation:
- LoadLeveler performed central manager recovery.
- Cause:
- The backup central manager gave control back to the primary central
manager.
- Action:
- None. This is an informational message.
|
- Label:
- LL_INFO_ST
- Error Type:
- UNKN
- Diagnostic Explanation:
- machine is unable to serve as alternate central manager.
-
- The machine specified cannot be the alternate central manager because the
appropriate negotiator keyword is not defined for this machine, or is defined
incorrectly.
- Cause:
- The specified machine was defined as the central manager.
- Action:
- Define the appropriate keywords.
|
- Label:
- LL_TRUNCATE_ST
- Error Type:
- UNKN
- Diagnostic Explanation:
- Logfile filename was truncated.
- Explanation:
- LoadLeveler truncated the specified log file because the file exceeded the
size defined in the LoadL_config file.
- Action:
- None. This is an informational message.
|
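Several of the causes listed in Table 2 can be checked before starting LoadLeveler. The Python sketch below covers four of them: ownership and setuid bit of the LoadL_master file (LL_INIT_ER), keyword definitions in a configuration file and executability of a configured binary (LL_INSTL_CUST_ER), and hostname resolution (LL_START_ER). All function names are hypothetical. The "KEYWORD = value" syntax follows the examples in the table, while treating "#" as the comment marker is an assumption, as is matching keyword names exactly.

```python
import os
import socket
import stat


def master_binary_problems(path):
    """LL_INIT_ER: the file must be owned by root and have the setuid bit."""
    info = os.stat(path)
    problems = []
    if info.st_uid != 0:
        problems.append("not owned by root")
    if not info.st_mode & stat.S_ISUID:
        problems.append("setuid bit not set")
    return problems


def keyword_status(config_text, keyword):
    """LL_INSTL_CUST_ER: return defined, blank, commented out, or missing."""
    status = "missing"
    for raw in config_text.splitlines():
        line = raw.strip()
        commented = line.startswith("#")  # assumed comment marker
        name, sep, value = line.lstrip("#").partition("=")
        if not sep or name.strip() != keyword:
            continue
        if commented:
            if status == "missing":
                status = "commented out"
        elif not value.strip():
            status = "blank"
        else:
            return "defined"
    return status


def binary_problem(path):
    """LL_INSTL_CUST_ER: explain why a configured binary cannot be executed."""
    if not os.path.exists(path):
        return "does not exist"
    if not os.access(path, os.X_OK):
        return "no execute permission"
    return None


def resolve_host(hostname):
    """LL_START_ER: return the IPv4 address, or None if resolution fails."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        return None


sample_config = """\
# START_DAEMONS = TRUE
RESTARTS_PER_HOUR =
MACHINE_AUTHENTICATE = TRUE
"""
for key in ("START_DAEMONS", "RESTARTS_PER_HOUR",
            "MACHINE_AUTHENTICATE", "LOADL_ADMIN"):
    print(key, "->", keyword_status(sample_config, key))
```

A misspelled keyword or binary name cannot be told apart from a missing one by these checks, so both are reported as missing or nonexistent. The paths to LoadL_master and to the configuration file depend on the installation and are left to the caller.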