Installation


Chapter 2. Planning to Install the PE Software

When planning to install the IBM Parallel Environment for AIX software, you need to ensure that you have met all of the necessary system requirements. You also need to think about what your programming environment will be and the strategy for using that environment. The following sections address these and other important pre-installation topics.


PE Installation Requirements

This section describes the system requirements for installing and running the PE software. It contains sections for hardware, software, and disk space requirements and provides additional information relevant to installing PE.

Hardware Requirements

The PE software runs on the following:

The message passing libraries support the following hardware configurations:

Total random access memory (RAM) and fixed disk storage requirements for the machine are based on the licensed programs and user applications you install. See "Disk Space Requirements" for more information. For information on RAM and disk storage requirements for AIX Version 4.3.1 and associated programs, refer to IBM RS/6000 SP: Planning, Volume 1, Hardware and Physical Environment, GA22-7280.

Software Requirements

The software required for PE includes PE filesets plus additional software, as explained in the following sections.

PE Fileset Requirements

PE Version 2.4 consists of the filesets listed in the table below. You need to decide which of these filesets to install on the various nodes in your system, based on the PE product options you plan to use.

Notes:

  1. For more information about nodes, see "Node Resources".

  2. For information about installing any of the following product options individually, see "PE Installation Procedure Summary".

Table 1. PE Fileset Requirements

If you plan to develop and execute parallel applications from a node:
  Product option required: Parallel Operating Environment
  Fileset name: ppe.poe
  Notes: The VT trace and data collection code, as well as the pdbx command-line parallel debugger, is part of POE. However, if you plan to use poestat, you also need to install ppe.vt.

  Installing POE with NIS: When POE is installed, it adds entries to the /etc/services and /etc/inetd.conf files. When POE is executed, a copy of the Partition Manager daemon is run on each remote node, and is identified by these files. If you are using NIS or another master server for /etc/services, you need to update the master server with the same information that POE puts into the individual files.

If you plan to perform trace visualization and performance monitoring, or to run the poestat command of POE:
  Product option required: Visualization Tool
  Fileset name: ppe.vt
  Notes: You need to have POE installed on any node on which you also want to generate trace files or to do performance monitoring. You do not need POE on nodes in order to look at trace files generated from other nodes.

If you plan to use the X-Windows version of the debugger facility:
  Product option required: X-Windows Parallel Debugger
  Fileset name: ppe.pedb
  Notes: The fileset ppe.poe is required on any node that is to install pedb. Install ppe.pedb on those nodes or systems on which you want to use pedb to debug your code.

If you plan to use the Xprofiler X-Windows Performance Profiler tool:
  Product option required: Xprofiler X-Windows Performance Profiler
  Fileset name: ppe.xprofiler
  Notes: Xprofiler extends the usability of gprof by providing an environment for exploring the gmon.out data in a variety of ways. It provides a graphical function call tree display, navigation tools, and various filters to analyze an application's performance profiling data.

  The installation of ppe.xprofiler is not dependent on the prior installation of any other PE products.

  Note: Although it is not required to install Xprofiler on every node, it is advisable to install it on at least one node in each group of nodes that have the same software library levels. (See "Limitations".)

If you plan to access the online documentation:
  Product option required: PE documentation
  Fileset name: ppe.pedocs
  Notes: Installing ppe.pedocs gives you the PE documentation, as described under PE Documentation (PEDOCS).
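The dependencies in Table 1 can be captured in a small lookup, which may help when deciding which filesets each node needs. The following is a minimal C sketch, not part of PE: the enum names and the function required_filesets are hypothetical, and the mapping encodes only the dependencies stated in the table (pedb, poestat, and performance monitoring need ppe.poe; poestat and performance monitoring also need ppe.vt; viewing trace files generated elsewhere needs only ppe.vt).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flags for the planned uses of one node (see Table 1). */
enum pe_use {
    USE_RUN_PARALLEL = 1 << 0, /* develop and execute parallel applications   */
    USE_TRACE_GEN    = 1 << 1, /* generate VT trace files (code is in POE)     */
    USE_VIEW_TRACES  = 1 << 2, /* look at trace files generated on other nodes */
    USE_PERF_MONITOR = 1 << 3, /* VT performance monitoring                    */
    USE_POESTAT      = 1 << 4, /* run the poestat command                      */
    USE_PEDB         = 1 << 5, /* X-Windows parallel debugger                  */
    USE_XPROFILER    = 1 << 6, /* Xprofiler performance profiler               */
    USE_DOCS         = 1 << 7  /* online documentation                         */
};

/* Write the filesets needed for the given uses into out[], which must have
   room for at least 5 entries. Returns the number of filesets written. */
size_t required_filesets(unsigned uses, const char *out[])
{
    size_t n = 0;
    if (uses & (USE_RUN_PARALLEL | USE_TRACE_GEN | USE_PERF_MONITOR |
                USE_POESTAT | USE_PEDB))
        out[n++] = "ppe.poe";       /* POE underlies all of these uses */
    if (uses & (USE_VIEW_TRACES | USE_PERF_MONITOR | USE_POESTAT))
        out[n++] = "ppe.vt";        /* viewing traces alone needs only ppe.vt */
    if (uses & USE_PEDB)
        out[n++] = "ppe.pedb";
    if (uses & USE_XPROFILER)
        out[n++] = "ppe.xprofiler"; /* no dependency on other PE filesets */
    if (uses & USE_DOCS)
        out[n++] = "ppe.pedocs";
    return n;
}
```

For example, a node used only to view trace files needs just ppe.vt, while a node used for debugging with pedb needs ppe.poe and ppe.pedb.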

Additional Software Requirements

PE Version 2.4 also requires some additional software products or filesets, listed in the table below. You need to decide which of these software products or filesets to install on your system, based on how you plan to use PE.

Table 2. Additional Software Requirements

Always required:
  Software required: AIX Version 4.3.2 for IBM RS/6000 (5765-C34) for Servers

If you plan to run a parallel program on the IBM RS/6000 SP:
  Software required: IBM Parallel System Support Programs (PSSP) for AIX (5765-D51) Version 3.1
  Notes: On the SP, ensure that the Communication Subsystem (CSS) libraries, Resource Manager, and System Data Repository (SDR) client libraries are available in their default locations.

If you plan to compile parallel executables:
  Software required: one of the following:

  • IBM C for AIX Version 4.3, 5765-AAR (part number 04L0675 with feature number 2163)

  • IBM C and C++ Compilers for AIX Version 3.6.4, 5801-AAR (part number 04L3535)

  • IBM XL Fortran for AIX Version 5.1.1 or later, 5808-AAR (part number 04L2110)
    Note: Due to limitations with threads support, XL Fortran for AIX Version 4 (5765-658) is supported in binary compatibility mode.

  • IBM XL High Performance Fortran (HPF) for AIX Version 1.3.1 or later, 5765-613

  Notes: For full debugging support within PE, use Fortran 77 and C.

If you plan to submit a POE job that will use the SP Resource Manager from a non-SP node (for example, a standalone RS/6000 workstation run off the rack):
  Software required: ssp.clients fileset, on the non-SP node
  Notes: See "When to Install ssp.clients (SP Resource Manager)" for detailed information.

If you plan to submit a POE job from outside a LoadLeveler cluster:
  Software required: loadl.so on the node outside the LoadLeveler cluster
  Notes: See "When to Install loadl.so (LoadLeveler)" for detailed information.

If you plan to use either of the PE debuggers:
  Software required: bos.adt.debug fileset

If you plan to use Xprofiler:
  Software required: for the CDE environment:

  • X11.Dt.lib 4.2.1.0 or later

If you plan to use the VT visualization capability:
  Software required: IBM C and C++ Compilers for AIX Version 3.6.4, 5801-AAR (part number 04L3535)
  Notes: On the SP, use the Network Time Protocol (NTP) utility to synchronize the clocks on all machines. On an IBM RS/6000 network cluster, use any available Internet host/server utility to synchronize the clocks on all machines.
  Note: If the SP Switch is installed, VT uses it for common time on each node, and no additional synchronization software is required.

If you plan to use LoadLeveler to allow execution of batch jobs:
  Software required: LoadLeveler Version 2.1

Disk Space Requirements

The following table lists the amount of disk space you need in the appropriate directories for each of the separately-installable PE product options.
Note: If you plan to install the PE software on an IBM RS/6000 network cluster, each machine in the cluster on which you install it must meet these disk space requirements.

PE Fileset       Number of 512-Byte Blocks Required in Directory:
                 /usr      /tmp      /etc
ppe.poe          75000     15000     10
ppe.pedb         5600      N/A       N/A
ppe.vt           6000      N/A       N/A
ppe.xprofiler    6500      N/A       N/A
ppe.pedocs       15000     N/A       N/A
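Because the table is expressed in 512-byte blocks, it can be convenient to total the blocks for the filesets planned for a node and convert them to bytes. The following is a minimal C sketch; the struct and function names are hypothetical, and N/A entries are treated as 0 blocks.

```c
#include <stddef.h>
#include <string.h>

/* Disk space from the table above, in 512-byte blocks (N/A stored as 0). */
struct pe_disk {
    const char *fileset;
    long usr_blocks, tmp_blocks, etc_blocks;
};

static const struct pe_disk PE_DISK[] = {
    { "ppe.poe",       75000, 15000, 10 },
    { "ppe.pedb",       5600,     0,  0 },
    { "ppe.vt",         6000,     0,  0 },
    { "ppe.xprofiler",  6500,     0,  0 },
    { "ppe.pedocs",    15000,     0,  0 },
};

/* Sum the blocks needed in /usr, /tmp, and /etc for the named filesets.
   Names that are not in the table are ignored. */
void pe_disk_total(const char *const names[], size_t count,
                   long *usr, long *tmp, long *etc)
{
    *usr = *tmp = *etc = 0;
    for (size_t i = 0; i < count; i++)
        for (size_t j = 0; j < sizeof PE_DISK / sizeof PE_DISK[0]; j++)
            if (strcmp(names[i], PE_DISK[j].fileset) == 0) {
                *usr += PE_DISK[j].usr_blocks;
                *tmp += PE_DISK[j].tmp_blocks;
                *etc += PE_DISK[j].etc_blocks;
            }
}

/* Convert a count of 512-byte blocks to bytes. */
long long blocks_to_bytes(long blocks)
{
    return (long long)blocks * 512;
}
```

Installing all five filesets, for example, needs 75000 + 5600 + 6000 + 6500 + 15000 = 108100 blocks in /usr, which is 108100 × 512 = 55,347,200 bytes (about 52.8 MiB), plus 15000 blocks in /tmp and 10 blocks in /etc.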


Limitations

Some PE product options and related software are subject to certain limitations, as explained below.

Fortran 90 and MPI

Incompatibilities exist between Fortran 90 and MPI that may affect the ability to use Fortran 90 programs that call MPI. For further information, refer to /usr/lpp/ppe.poe/samples/mpif90/README.mpif90 after installing POE, and to "Enabling Fortran 90 Compiler Support".

MPI-IO

This release of PE includes a subset of the new MPI functionality defined by the MPI-IO chapter of the MPI-2 document. This MPI-2 functionality is provided in the threaded version of the MPI library, but not in the signals-based version. MPI-IO is intended to be used with the IBM Generalized Parallel File System (GPFS). MPI-IO depends on having a single file system underlying all tasks of an MPI job. Shared file systems such as NFS and AFS do not meet this requirement when they are used across multiple nodes. MPI jobs that have all tasks on a single node can use non-GPFS file systems, but this is not expected to be a useful model for production use of MPI-IO.

Parallel Applications and System Calls

User-written parallel applications are limited in their use of system calls. IBM Parallel Environment for AIX: MPI Programming and Subroutine Reference, GC23-3894 provides a discussion of these limitations.

Parallel Debuggers

When using the Parallel Debuggers, the application should be compiled with the parallel compiler scripts supplied with POE (namely mpcc, mpcc_r, mpxlf, or mpxlf_r). Both debuggers currently support only Fortran 77 and C.

SP Switch Adapter-1

Customers with SP systems who have SP Switch Adapter-1 installed will not be able to use Version 2.4 of PE, because this adapter is no longer supported.

VisualAge C++

Parallel Environment does not support the incremental compiler and the Version 4.0 C++ runtime library of IBM VisualAge C++ Professional for AIX, Version 4.0. This restriction does not apply to the batch IBM C and C++ Version 3.6 compilers and the Version 3.6 C++ runtime libraries that are also included in VisualAge C++ Version 4.0.

VT

VT generates trace files for applications running on up to 128 nodes. When the visualization portion of VT is used, some of the displays become harder to use for applications running on more than 32 nodes.

Xprofiler

If users plan to collect a gmon.out file on one machine and then use Xprofiler to analyze the data on another machine, they should be aware that some shared (system) libraries may not be the same on the two machines, which may result in different function call tree displays for those shared libraries.

32-Bit and 64-Bit Application Support

Parallel Environment supports 32-bit applications only. 64-bit applications are not supported and will not run.


Information for the System Administrator

Software Compatibility Within Workstation Clusters and Within Partitions

For all processors within a workstation cluster, the same release level of PE software is required. (This ensures that an individual PE application can run on any workstation in the cluster.)

When you use partitioning (available in PSSP Version 2 or later) on an IBM RS/6000 SP, you may have partitions at different levels of PE software; however, within a partition, all the nodes must be at the same level of PE software. (This ensures that an individual PE application can run on any node in the partition.)
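The rule that PE levels may differ between partitions but not within one can be checked mechanically once each node's PE level has been collected (for example, from the output of the AIX lslpp command; collecting it is outside this sketch). The following C sketch is illustrative only: the struct, the level strings such as "2.4.0.0", and the function first_mismatch are hypothetical. For a workstation cluster, give every node the same partition number.

```c
#include <stddef.h>
#include <string.h>

/* One node's PE level, as collected by the administrator (hypothetical). */
struct node_level {
    const char *node;      /* node name                                 */
    int partition;         /* partition number; one value for a cluster */
    const char *pe_level;  /* PE release level string                   */
};

/* Returns the index of the first node whose PE level differs from an
   earlier node in the same partition, or -1 if every partition is
   internally consistent. Different partitions may be at different levels. */
long first_mismatch(const struct node_level nodes[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < i; j++)
            if (nodes[j].partition == nodes[i].partition &&
                strcmp(nodes[j].pe_level, nodes[i].pe_level) != 0)
                return (long)i;
    return -1;
}
```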

Table 3 lists the versions of PSSP and AIX required on a particular workstation cluster or partition, depending on the version of PE installed on that cluster or partition, and the possible migration paths.
Note: PSSP cannot be installed on a workstation.

Note About Upgrading AIX Without Upgrading Compilers

Many compilers link to different libraries depending on the AIX OSLEVEL value at the time the compiler is installed. If you migrate only AIX, the compilers may remain linked to libraries for the back level of AIX; in that case, be sure to change the compiler library links or reinstall the compilers.

Node Resources

How you plan your node resources will vary according to whether you are installing PE on an IBM RS/6000 SP or an IBM RS/6000 network cluster.

On an IBM RS/6000 SP...

...Using Resource Manager

On an SP system, you partition nodes into pools and assign numbers and other information to these pools.

Pools and the Resource Manager

Pools are managed by the Resource Manager. You tell the POE Partition Manager which pools to use; the Partition Manager in turn requests nodes in the specified pools from the Resource Manager.

For more information on the Resource Manager and setting up pools, see IBM Parallel System Support Programs for AIX: Administration Guide, GC23-3897.

...Within a LoadLeveler Cluster

On an SP system within a LoadLeveler cluster, the system administrator uses LoadLeveler to partition nodes into pools and/or features, to which he or she assigns numbers and other information.

Home Node

The workstation from which parallel jobs are started. The home node can be any workstation on the LAN.

On an IBM RS/6000 Network Cluster

On an IBM RS/6000 network cluster, you assign workstations to the following categories:

Deciding Which Nodes Require Which PE Filesets or Additional Software

An important aspect of planning your PE node resources is deciding which nodes will require which PE filesets or additional software. You do not need to install all of the PE filesets on every node. Refer to "Software Requirements" for more information on the filesets and their dependencies to help you decide how to install PE and additional required software on your nodes.

File Systems

The PE filesets (poe, vt, pedb, pedocs, and xprofiler) are installed in the /usr file system. When the poe fileset is installed, it adds entries to the /etc/services and /etc/inetd.conf files. When poe is executed, a copy of the Partition Manager daemon is run on each remote node, and is identified in these files.

If you are using NIS or another master server for /etc/services, you need to create updates with the same information that is put into the individual files.

If you do not use a shared file system, you need to copy the user's executable files to the other nodes. To copy them, use the scripts provided by PE: mprcp and mpmkdir. You can also use mcp, the message passing file copy command. For more information on copying files and on these scripts, see IBM Parallel Environment for AIX: Operation and Use, Volume 1, SC28-1979.

Also, you can declare these files part of a file collection. A file collection is a set of files and directories that are duplicated on multiple machines in a network and managed by tools that simplify their control and maintenance. For more information about file collections, see IBM Parallel System Support Programs for AIX: Administration Guide, SA22-7348.

User IDs on Remote Nodes

The system administrator must set up a user ID, other than a root ID, for each user on each remote node that requires POE access.

Each user must have an account on all nodes where a job runs. Both the user name and user ID must be the same on all nodes. Also, the user must be a member of the same named group on the home node and the remote nodes.
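One way to compare a user's identity across nodes is to run a small lookup on each node and compare the results. The following is a minimal C sketch using the standard POSIX getpwnam() and getgrgid() calls; the struct and function names are hypothetical, only the user's primary group is compared, and gathering the results from remote nodes is left to the administrator.

```c
#define _POSIX_C_SOURCE 200809L
#include <grp.h>
#include <pwd.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Identity of one user on the local node. */
struct user_identity {
    uid_t uid;
    char group[64];   /* name of the user's primary group */
};

/* Look up user_name in the local user database. Returns false if the user,
   or the user's primary group, is unknown on this node. */
bool local_identity(const char *user_name, struct user_identity *out)
{
    struct passwd *pw = getpwnam(user_name);
    if (pw == NULL)
        return false;
    struct group *gr = getgrgid(pw->pw_gid);
    if (gr == NULL)
        return false;
    out->uid = pw->pw_uid;
    snprintf(out->group, sizeof out->group, "%s", gr->gr_name);
    return true;
}

/* Two nodes agree only if both the user ID and the group name match. */
bool identities_match(const struct user_identity *a,
                      const struct user_identity *b)
{
    return a->uid == b->uid && strcmp(a->group, b->group) == 0;
}
```

Since the user name is the lookup key, comparing the user ID and group name covers the remaining requirements for that user.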

User Authorization

With PE Version 2 Release 4, interactive and batch parallel jobs can be submitted under LoadLeveler. When LoadLeveler is used, LoadLeveler is completely responsible for user authorization, and POE's own user authorization checks are bypassed.

When LoadLeveler is not used, POE handles the user authorization. The following sections on POE user authorization apply when POE is used without LoadLeveler.

POE supports two methods of user authorization for submitting a parallel job:

  • AIX authorization (default), via /etc/hosts.equiv or .rhosts entries.

  • DFS/DCE authorization, in which POE checks for a valid set of DCE credentials for the user.

The user authorization mechanism is controlled by the MP_AUTH POE environment variable. This variable can be defined by the system administrator in the /etc/poe.limits file, as described in "Using the /etc/poe.limits File", so that users do not need to decide which mechanism to use.

The two types of authorization cannot be mixed in the same parallel job. All tasks and nodes defined for a POE job must use the same type of authorization.
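The following minimal C sketch shows how a job-checking tool might interpret MP_AUTH and enforce the rule that the two types cannot be mixed. It is not POE's implementation: the enum, the function names, and the spellings "AIX" and "DFS" for the values are assumptions of this sketch, and settings in /etc/poe.limits are not modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum poe_auth { AUTH_AIX, AUTH_DFS, AUTH_INVALID };

/* Map an MP_AUTH value to an authorization type. An unset or empty value
   selects the default, AIX authorization. The accepted spellings are
   assumptions of this sketch. */
enum poe_auth parse_mp_auth(const char *value)
{
    if (value == NULL || value[0] == '\0')
        return AUTH_AIX;
    if (strcmp(value, "AIX") == 0)
        return AUTH_AIX;
    if (strcmp(value, "DFS") == 0)
        return AUTH_DFS;
    return AUTH_INVALID;
}

/* Read the setting from this process's environment only. */
enum poe_auth current_mp_auth(void)
{
    return parse_mp_auth(getenv("MP_AUTH"));
}

/* Every task in a job must use the same, valid authorization type. */
bool job_auth_consistent(const enum poe_auth per_task[], size_t ntasks)
{
    for (size_t i = 0; i < ntasks; i++)
        if (per_task[i] == AUTH_INVALID || per_task[i] != per_task[0])
            return false;
    return true;
}
```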

Using AIX User Authorization

If AIX user authorization (the default) is used as a security mechanism on the system, each node needs to be set up so that each user ID is authorized to access that node from the initiating home node. The /etc/hosts.equiv file and/or the .rhosts file are used to specify this user ID authorization, as explained below.

If the combination of the home node machine and user name:

For more information on .rhosts and /etc/hosts.equiv, refer to the chapter on managing jobs in IBM AIX Version 4 Files Reference for AIX, SC23-2512.
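The matching rule for a single entry can be sketched in C as follows. This is a simplified, hypothetical helper, not the AIX implementation: it handles only the basic "hostname [username]" form, compares host names as exact strings, skips lines beginning with "#", and ignores the "+", "-", and netgroup forms as well as any additional checks the system applies.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Does one /etc/hosts.equiv or .rhosts line admit remote_user on home_host
   to the local account local_user? */
bool rhosts_line_allows(const char *line, const char *home_host,
                        const char *remote_user, const char *local_user)
{
    char host[256], user[64];
    int fields = sscanf(line, "%255s %63s", host, user);
    if (fields < 1 || host[0] == '#')   /* empty line or comment */
        return false;
    if (strcmp(host, home_host) != 0)   /* entry is for another host */
        return false;
    if (fields == 1)                    /* no user field: names must match */
        return strcmp(remote_user, local_user) == 0;
    return strcmp(user, remote_user) == 0;
}
```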

Using DFS/DCE User Authorization

If DFS/DCE user authorization is used as a security mechanism on the system, POE accepts a valid set of DCE user credentials as user authorization for executing parallel jobs.

In order to use DFS/DCE with POE, the following are required:

Note: When DFS/DCE authorization is selected, there is no need for entries in either the /etc/hosts.equiv file or the .rhosts file, as these are not checked by POE.

For more information about running POE in a DFS/DCE environment, and about the poeauth command, see IBM Parallel Environment for AIX: Operation and Use, Volume 1, SC28-1979.

Port Numbers

When POE is installed, it modifies entries in /etc/services and in /etc/inetd.conf to install the Partition Manager daemon. In doing so, it requires an available port number, which must be the same on all nodes on which POE is installed and run. You need to ensure that such a port number is available.
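Because the Partition Manager daemon's port must be identical on every node, it can be worth confirming the pmv2 entry (default port 6125, per the table in "Partition Manager Daemon Services and Installation") on each node. The C sketch below parses text in /etc/services format; the function names are hypothetical. On nodes that obtain services from NIS, the standard getservbyname() call in <netdb.h> reports what the configured source returns and may be the better check there.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Parse one line of the form "name port/protocol [aliases] [# comment]".
   Stores the port and returns true if name and protocol match. */
bool services_line_port(const char *line, const char *service,
                        const char *proto, int *port)
{
    char name[64], p[16];
    int num;
    if (sscanf(line, "%63s %d/%15s", name, &num, p) != 3)
        return false;                   /* comment, blank, or malformed */
    if (strcmp(name, service) != 0 || strcmp(p, proto) != 0)
        return false;
    *port = num;
    return true;
}

/* Scan an open services file; returns the port, or -1 if not found. */
int find_service_port(FILE *f, const char *service, const char *proto)
{
    char line[512];
    int port;
    while (fgets(line, sizeof line, f) != NULL)
        if (services_line_port(line, service, proto, &port))
            return port;
    return -1;
}
```

Running find_service_port on each node's /etc/services with "pmv2" and "tcp" and comparing the results shows whether the ports agree.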

Running Large POE Jobs and IP Buffer Usage

A POE application may require additional IP buffers (mbufs) under any of the following circumstances:

The need for additional IP buffers is usually evident when repeated requests for memory are denied. The netstat -m command shows when such a condition exists. In such a case, it may be necessary to use the no command to change the network option system parameters on the home node or on the SP nodes being used in the partition. (You can also use the no command to check the current values first.)

The number of IP buffers allocated in the kernel is controlled by the thewall parameter of the no command. Increasing the value of the thewall parameter increases the number of IP buffers.

Notes:

  1. You must have root authority to change options with the no command, and the setting applies to all processes running on the node on which it is executed.

  2. In AIX Version 4.3, the thewall default value is 16384.

On SP nodes, you can use the dsh command to execute the no command on each node of an SP. (See the section on tuning in IBM Parallel System Support Programs for AIX: Administration Guide, SA22-7348 for more information on dsh.)

For non-SP nodes, you can also set the values at system boot time by adding the appropriate call to the no command in either /etc/rc.net or /etc/rc.tcpip.

For more information in general on mbufs, see IBM AIX Versions 3.2 and 4 Performance Monitoring and Tuning, SC23-2365.

Running Multiple Versions of POE

POE Version 1 and POE Version 2 are not compatible. All of your tasks must run with either POE Version 2 or POE Version 1, not a combination of the two. The POE home node and all remote nodes must run with the same version of code. You must be at the same level of AIX and PSSP within a partition to submit PE jobs. See Chapter 3. "PE Version 2.4 Migration Information" for more information.

Partition Manager Daemon Services and Installation

In Version 2, the Partition Manager daemon (pmd) and the POE executables have different names than their Version 1 counterparts. Version 2 also uses different TCP/IP port numbers and daemon service names, and Version 2 and Version 1 files use different directory path names.

The following table summarizes the differences and can be used to tell which version of POE you have if you are not sure.
Type of Name or Number            POE Version 1    POE Version 2
Service name in /etc/services     pm2              pmv2
Daemon name in /etc/inetd.conf    pmd2             pmdv2
Default port number               6124             6125
pmd executable name               pmd2             pmdv2
File path name                    /usr/lpp/poe     /usr/lpp/ppe.poe
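The table above can be turned into a lookup for scripts or tools that need to tell which POE version a name belongs to. The following is a minimal C sketch; the struct and function names are hypothetical, and the port lookup recognizes only the default port numbers, so it cannot identify an installation that uses a different port.

```c
#include <stddef.h>
#include <string.h>

/* Names and numbers from the table above. */
struct poe_names {
    int version;
    const char *service;   /* service name in /etc/services           */
    const char *daemon;    /* daemon name in /etc/inetd.conf and pmd  */
    int port;              /* default port number                     */
    const char *path;      /* file path name                          */
};

static const struct poe_names POE_NAMES[] = {
    { 1, "pm2",  "pmd2",  6124, "/usr/lpp/poe"     },
    { 2, "pmv2", "pmdv2", 6125, "/usr/lpp/ppe.poe" },
};

#define POE_NAMES_COUNT (sizeof POE_NAMES / sizeof POE_NAMES[0])

/* Returns 1 or 2 for a known service name, daemon name, or path; 0 otherwise. */
int poe_version_of(const char *name)
{
    for (size_t i = 0; i < POE_NAMES_COUNT; i++) {
        const struct poe_names *v = &POE_NAMES[i];
        if (strcmp(name, v->service) == 0 || strcmp(name, v->daemon) == 0 ||
            strcmp(name, v->path) == 0)
            return v->version;
    }
    return 0;
}

/* Same lookup by default port number; 0 if the port is not a default. */
int poe_version_of_port(int port)
{
    for (size_t i = 0; i < POE_NAMES_COUNT; i++)
        if (POE_NAMES[i].port == port)
            return POE_NAMES[i].version;
    return 0;
}
```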

