IBM Books

Diagnosis and Messages Guide


Error Logging Overview

Error logging is the writing of information to persistent storage to be used for debugging purposes. This type of logging is for subsystems that perform a service or function on behalf of an end user. The subsystem does not communicate directly with the end user and, therefore, needs to log events to some storage location. The events that are logged are primarily error events.

LoadLeveler error logging uses the AIX error log facilities to report events on a per-machine basis. The intent is for the AIX error log to be the starting point for diagnosing system problems. For more information on the AIX error log facilities, see IBM AIX Problem Solving Guide and Reference, SC23-2606, and IBM General Concepts and Procedures for RISC System/6000, GC23-2202. For more information on the Error Log facilities available with the SP, see IBM RS/6000 Scalable POWERparallel SP Diagnosis and Messages Guide, SC23-3866.

Error log entries include a "DETECTING MODULE" string that identifies the software component, module name, module level, and line of code or function that detected the logged event. The format of this information depends on the logging facility you use to view it. For example, in the AIX Error Log facility the information appears as:

DETECTING MODULE
LPP=LPP name Fn=filename SID=SID level of the file L#=line number
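
The DETECTING MODULE string can be split into its fields by a script. The following is a minimal sketch, assuming the comma-separated KEY=value layout shown in the sample report later in this section; the function name parse_detecting_module is illustrative and is not part of LoadLeveler or AIX.

```python
def parse_detecting_module(text):
    """Split a DETECTING MODULE string into a dict of its KEY=value fields.

    Assumes comma-separated fields, as in the sample report
    ("LPP=LoadL,Fn=dprintf_config.c,SID=1.2,L#=392,").
    """
    fields = {}
    for token in text.split(","):
        key, sep, value = token.strip().partition("=")
        if sep and key:  # ignore empty tokens, such as the one after a trailing comma
            fields[key] = value
    return fields


module = parse_detecting_module("LPP=LoadL,Fn=dprintf_config.c,SID=1.2,L#=392,")
print(module["Fn"], module["L#"])
```

All values are returned as strings; convert the L# field with int() if a numeric line number is needed.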

How Error Log Events Are Classified

The following table shows how LoadLeveler error label suffixes map to AIX Error Log error types.

Table 1. LoadLeveler Error Label Suffixes Mapped to AIX Error Log

Error Label Suffix   AIX Error Log Error Type   AIX Error Log Description
EM                   PEND                       The loss of availability of a device is imminent.
ER                   PERM                       No recovery from this condition. A permanent error occurred.
ST                   UNKN                       It is not possible to determine the severity of the error.
TR                   UNKN                       It is not possible to determine the severity of the error.
RE                   TEMP                       Condition was recovered after several unsuccessful attempts.
DE                   UNKN                       It is not possible to determine the severity of the error.
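
The suffix mapping in Table 1 can be expressed as a small lookup, for example in a script that sorts error labels by severity. This is a sketch only: the dictionary reproduces Table 1, and the helper name error_type_for_label is an assumption made for illustration.

```python
# Suffix-to-error-type mapping, copied from Table 1.
SUFFIX_TO_ERROR_TYPE = {
    "EM": "PEND",
    "ER": "PERM",
    "ST": "UNKN",
    "TR": "UNKN",
    "RE": "TEMP",
    "DE": "UNKN",
}


def error_type_for_label(label):
    """Return the AIX error type for a label such as 'LL_INIT_ER'.

    Returns None if the label does not end in a suffix listed in Table 1.
    """
    suffix = label.rsplit("_", 1)[-1]
    return SUFFIX_TO_ERROR_TYPE.get(suffix)


print(error_type_for_label("LL_TRUNCATE_ST"))
```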

What Events are Recorded

The following events are recorded by the LoadLeveler error logging facility:

Viewing LoadLeveler Error Log Reports

Enter the following command to view the LoadLeveler error reports for a machine:

errpt -a -N LoadLeveler
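
When the report is collected from a script, the same command can be built and run programmatically. The sketch below assumes an AIX machine with errpt on the PATH. The -T (error type list) and -J (error label list) filters are standard errpt flags that this section does not otherwise use, so check them against the errpt documentation for your AIX level. The function names are illustrative.

```python
import subprocess


def build_errpt_command(resource="LoadLeveler", error_types=(), labels=()):
    """Build the errpt argument list for a detailed report on one resource."""
    cmd = ["errpt", "-a", "-N", resource]
    if error_types:
        cmd += ["-T", ",".join(error_types)]  # for example PERM or TEMP
    if labels:
        cmd += ["-J", ",".join(labels)]  # for example LL_INIT_ER
    return cmd


def run_errpt(**filters):
    """Run errpt and return its text output (works only where errpt exists)."""
    result = subprocess.run(
        build_errpt_command(**filters),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


print(" ".join(build_errpt_command(error_types=["PERM"])))
```

Building the argument list separately from running it lets the command be inspected or logged before it is issued.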

Clearing All LoadLeveler Error Log Entries

Enter the following command to clear all LoadLeveler entries from the error logs of a machine:

errclear -N LoadLeveler 0

Error Notification

You can be notified of a LoadLeveler error when it occurs by using the AIX Error Notification Facility. This facility runs a method that the administrator defines in the ODM (Object Data Manager) when a particular error occurs or a particular process fails. IBM General Concepts and Procedures for RISC System/6000, GC23-2202, explains how to use the AIX Error Notification Facility.
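
A notification method is defined by adding an object to the ODM errnotify object class, typically from a stanza file loaded with odmadd. The sketch below only generates such a stanza as text. The descriptor names (en_name, en_persistenceflg, en_resource, en_type, en_label, en_method) are taken from the AIX errnotify object class, but the object name, the mail command, and the helper function are assumptions made for illustration. Verify the result against the AIX documentation before loading it.

```python
def errnotify_stanza(name, method, resource="LoadLeveler",
                     err_type=None, label=None, persistent=True):
    """Render an errnotify stanza (as text) that matches one resource name."""
    fields = [
        ("en_name", name),
        ("en_persistenceflg", 1 if persistent else 0),  # 1 = keep across restarts
        ("en_resource", resource),
    ]
    if err_type:
        fields.append(("en_type", err_type))
    if label:
        fields.append(("en_label", label))
    fields.append(("en_method", method))

    lines = ["errnotify:"]
    for key, value in fields:
        # Numeric descriptors are written bare; strings are double-quoted.
        rendered = str(value) if isinstance(value, int) else '"%s"' % value
        lines.append("        %s = %s" % (key, rendered))
    return "\n".join(lines) + "\n"


# Hypothetical example: mail the full entry to root for permanent errors.
# $1 is replaced by the sequence number of the logged entry.
stanza = errnotify_stanza(
    "loadl_perm",
    "errpt -a -l $1 | mail -s 'LoadLeveler error' root",
    err_type="PERM",
)
print(stanza)
```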

Sample Error Log

The following is a sample LoadLeveler error report:

Figure 1. Sample LoadLeveler Error Log

 
LABEL:         LL_TRUNCATE_ST
IDENTIFIER:    05D868E9
 
Date/Time:       Fri Mar 31 18:07:43
Sequence Number: 271
Machine Id:      000001801000
Node Id:         ll2
Class:           S
Type:            UNKN
Resource Name:   LoadLeveler
 
Description
SOFTWARE
 
Probable Causes
APPLICATION PROGRAM
 
User Causes
FILE NEEDS REORGANIZATION
 
Recommended Actions
NO ACTION NECESSARY
 
Detail Data
DETECTING MODULE
LPP=LoadL,Fn=dprintf_config.c,SID=1.2,L#=392,
DIAGNOSTIC EXPLANATION
Logfile "/u/loadl/log/StartLog" was truncated
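
A report entry like the one in Figure 1 can be read by a script. The following sketch collects the "Name: value" header lines that precede the Description section. It assumes the layout shown in Figure 1; the function name is illustrative, and the sample text is abbreviated from the figure.

```python
SAMPLE_ENTRY = """\
LABEL:         LL_TRUNCATE_ST
IDENTIFIER:    05D868E9

Date/Time:       Fri Mar 31 18:07:43
Sequence Number: 271
Type:            UNKN
Resource Name:   LoadLeveler

Description
SOFTWARE

Detail Data
DETECTING MODULE
LPP=LoadL,Fn=dprintf_config.c,SID=1.2,L#=392,
"""


def parse_entry_header(text):
    """Return the header fields of one detailed error report entry as a dict."""
    header = {}
    for line in text.splitlines():
        if line.strip() == "Description":
            break  # header ends here; detail text may contain colons of its own
        name, sep, value = line.partition(":")
        if sep and value.strip():
            header[name.strip()] = value.strip()
    return header


header = parse_entry_header(SAMPLE_ENTRY)
print(header["LABEL"], header["Type"])
```

Splitting on the first colon only keeps time stamps such as 18:07:43 intact in the Date/Time value.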

Possible Causes for LoadLeveler Errors

Table 2 describes the possible causes for some common problems reported by the error logging facility:

Table 2. Possible Causes for LoadLeveler Failure
Label:
LL_INIT_ER
Error Type:
PERM
Diagnostic Explanation:
LoadL_master must be run as root
Explanation:
The ownership and the permissions of LoadL_master file are incorrect
Cause:
This error occurs for one of the following reasons:
  • The LoadL_master file is not owned by root
  • The LoadL_master file does not have Set User ID permission (the "s" bit)
Action:
Perform the following actions:
  • Issue chown root LoadL_master
  • Issue chmod u+s LoadL_master
  • Restart LoadLeveler
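
The two causes listed for this entry can be checked before LoadLeveler is restarted. This is a minimal sketch: the function names are illustrative, and the path to LoadL_master must be supplied by the caller because it depends on the installation.

```python
import os
import stat


def master_permission_problems(st_uid, st_mode):
    """Return the problems found for a given owner uid and file mode."""
    problems = []
    if st_uid != 0:
        problems.append("not owned by root")
    if not st_mode & stat.S_ISUID:
        problems.append("setuid bit not set")
    return problems


def check_master(path):
    """Check the LoadL_master binary at the given path."""
    st = os.stat(path)
    return master_permission_problems(st.st_uid, st.st_mode)


# A file owned by uid 1000 with mode rwxr-xr-x fails both checks.
print(master_permission_problems(1000, 0o100755))
```

An empty list means that neither of the listed causes applies.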

Label:
LL_INIT_ER
Error Type:
PERM
Diagnostic Explanation:
master: could not set the real uid to ROOT, rc=nn
Explanation:
Unable to set the real Uid to 0 (root).
Cause:
The setuid system call failed for an authorized caller.
Action:
Contact the system administrator

Label:
LL_INIT_ER
Error Type:
PERM
Diagnostic Explanation:
Error setting group id to nn. errno = nn.
Explanation:
LoadL_master was unable to set the group ID.
Cause:
The setgid() call failed.
Action:
Verify that the following are true:
  • LoadLGroupid is defined correctly in /etc/LoadL.cfg
  • The group ID reported in the error description is defined on the system.

Label:
LL_START_ER
Error Type:
PERM
Diagnostic Explanation:
LoadLeveler cannot continue; this host not in machine list
Explanation:
You attempted to start LoadLeveler on a machine that is not defined in the LoadL_admin file, and MACHINE_AUTHENTICATE = TRUE is set in the LoadL_config file.
Cause:
See "Explanation."
Action:
Do one of the following:
  • Define the machine in the LoadL_admin file
  • Set MACHINE_AUTHENTICATE = FALSE in the LoadL_config file

Then, restart LoadLeveler on the machine where the problem occurred.


Label:
LL_START_ER
Error Type:
PERM
Diagnostic Explanation:
Cannot start LoadLeveler. Must be the designated LoadLeveler administrator
Explanation:
A user other than the user defined as the LoadLeveler administrator attempted to start LoadLeveler.
Cause:
The LOADL_ADMIN keyword does not contain the user that attempted to start LoadLeveler.
Action:
Add the user to the LOADL_ADMIN keyword in the LoadL_config file.

Label:
LL_START_ER
Error Type:
PERM
Diagnostic Explanation:
START_DAEMONS flag was set to value. Exiting.
Explanation:
LoadL_master could not start the LoadLeveler daemons.
Cause:
The START_DAEMONS keyword was not defined or was set to FALSE.
Action:
Set START_DAEMONS = TRUE.

Label:
LL_START_ER
Error Type:
PERM
Diagnostic Explanation:
Cannot get address of hostname from nameserver; LoadLeveler not starting.
Explanation:
LoadLeveler could not resolve the hostname of the machine defined as the central manager.
Cause:
gethostbyname() failed.
Action:
Follow standard procedures for resolving nameserver problems.
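
Before investigating the name server itself, it can help to confirm whether the host name resolves on the affected machine. The sketch below uses the same kind of lookup named under Cause; the function name is illustrative, and the host name to test is whatever your configuration defines as the central manager.

```python
import socket


def resolve_host(hostname):
    """Return the IPv4 address for hostname, or None if it cannot be resolved."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:  # covers socket.gaierror and socket.herror
        return None


# An IPv4 address is returned unchanged, without consulting the name server.
print(resolve_host("127.0.0.1"))
```

A result of None for the central manager's host name points to the name resolution problem described in this entry.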

Label:
LL_INSTL_CUST_ER
Error Type:
PERM
Diagnostic Explanation:
keyword not specified in config file.
Admin file not specified in config! Cannot continue.
Explanation:
The specified keyword is not defined correctly.
Cause:
This error occurs for one of the following reasons:
  • The keyword value is equal to a blank
  • The keyword is misspelled
  • The keyword is commented out
  • The keyword is missing
Action:
Define the keyword correctly.

Label:
LL_INSTL_CUST_ER
Error Type:
PERM
Diagnostic Explanation:
keyword=value can not execute
Explanation:
LoadLeveler was unable to execute the specified keyword (binary).
Cause:
This error occurs for one of the following reasons:
  • The binary does not have execute permission
  • The binary name was misspelled
  • The binary does not exist
Action:
Correct the problem and reissue the keyword.

Label:
LL_INSTL_CUST_ER
Error Type:
PERM
Diagnostic Explanation:
Cannot open file filename
Explanation:
LoadLeveler was unable to open the specified file.
Cause:
This error occurs for one of the following reasons:
  • Incorrect permissions are set for the file
  • The file does not exist
Action:
Correct the problem with the file.

Label:
LL_RESTART_ER
Error Type:
PERM
Diagnostic Explanation:
Exceeded nn restarts / hour
Explanation:
LoadL_master exceeded the maximum allowed restarts for a daemon, as defined by the RESTARTS_PER_HOUR keyword.
Cause:
The LoadL_master daemon died after exceeding the RESTARTS_PER_HOUR limit.
Action:
Check the LoadLeveler error logs and the administrator mail for more information.

Label:
LL_CRASH_ER
Error Type:
PERM
Diagnostic Explanation:
The daemon_name (process pid) died
Explanation:
The specified daemon died.
Cause:
This error occurs for one of the following reasons:
  • The daemon received a signal
  • The daemon experienced an unrecoverable error
Action:
Check the LoadLeveler error logs for more information.

Label:
LL_INFO_ST
Error Type:
UNKN
Diagnostic Explanation:
Started daemon_name, pid and pgroup=pid
Explanation:
The specified daemon started.
Action:
None. This is an informational message.

Label:
LL_INFO_ST
Error Type:
UNKN
Diagnostic Explanation:
Got SHUTDOWN command.
Got RECONFIG command.
Explanation:
LoadL_master received the specified commands.
Cause:
The llctl command was issued with "stop" or "reconfig".
Action:
None. This is an informational message.

Label:
LL_INFO_ST
Error Type:
UNKN
Diagnostic Explanation:
Alternate central manager cent_manager will become active central manager.
Switching central manager to cent_manager.
Explanation:
LoadLeveler performed central manager recovery.
Cause:
The backup central manager was unable to contact the primary central manager.
Action:
Check the LoadLeveler error logs and the administrator mail for more information.

Label:
LL_INFO_ST
Error Type:
UNKN
Diagnostic Explanation:
Alternate central manager cent_manager will return to stand-by state.
Explanation:
LoadLeveler performed central manager recovery.
Cause:
The backup central manager gave control back to the primary central manager.
Action:
None. This is an informational message.

Label:
LL_INFO_ST
Error Type:
UNKN
Diagnostic Explanation:
machine is unable to serve as alternate central manager.
Explanation:
The machine specified cannot be the alternate central manager because the appropriate negotiator keyword is not defined for this machine, or is defined incorrectly.
Cause:
The specified machine was defined as the central manager.
Action:
Define the appropriate keywords.

Label:
LL_TRUNCATE_ST
Error Type:
UNKN
Diagnostic Explanation:
Logfile filename was truncated.
Explanation:
LoadLeveler truncated the specified log file because the file exceeded the size defined in the LoadL_config file.
Action:
None. This is an informational message.

