When planning to install the IBM Parallel Environment for AIX software, you need to ensure that you have met all of the necessary system requirements. You also need to think about what your programming environment will be and the strategy for using that environment. The following sections address these and other important pre-installation topics.
This section describes the system requirements for installing and running the PE software. It contains sections for hardware, software, and disk space requirements and provides additional information relevant to installing PE.
The PE software runs on the following:
The message passing libraries support the following hardware configurations:
Total random access memory (RAM) and fixed disk storage requirements for the machine are based on the licensed programs and user applications you install. See "Disk Space Requirements" for more information. For information on RAM and disk storage requirements for AIX Version 4.3.1 and associated programs, refer to IBM RS/6000 SP: Planning, Volume 1, Hardware and Physical Environment, GA22-7280.
The software required for PE includes PE filesets plus additional software, as explained in the following sections.
PE Version 2.4 consists of the filesets listed in the table below. You need to decide which of these filesets to install on the various nodes in your system, based on the PE product options you plan to use.
Notes:
Table 1. PE Fileset Requirements
If you plan to... | ...this product option is required: | Fileset Name: | Notes: |
---|---|---|---|
...develop and execute parallel applications from a node | Parallel Operating Environment | ppe.poe | The VT trace and data collection code, as well as the pdbx command-line parallel debugger, is part of POE. However, if you plan to use poestat, you also need to install ppe.vt. |
...perform trace visualization and performance monitoring or if you want to run the poestat command of poe | Visualization Tool | ppe.vt | You need to have POE installed on any node on which you also want to generate trace files or to do performance monitoring. You do not need POE on nodes in order to look at trace files generated from other nodes. |
...use the X-Windows version of the debugger facility | X-Windows Parallel Debugger | ppe.pedb | The fileset ppe.poe is required on any node that is to install pedb. Install ppe.pedb on those nodes or system on which you want to use pedb to debug your code. |
...use the Xprofiler X-Windows Performance Profiler tool | Xprofiler X-Windows Performance Profiler | ppe.xprofiler | Xprofiler extends the usability of gprof by providing an environment for exploring the gmon.out data in a variety of ways. It provides a graphical function call tree display, navigation tools, and various filters to analyze an application's performance profiling data. The installation of ppe.xprofiler is not dependent on the prior installation of any other PE products. |
...access the online documentation | PE documentation | ppe.pedocs | Installing ppe.pedocs gives you the PE documentation, as described under PE Documentation (PEDOCS). |
PE Version 2.4 also requires some additional software products
or filesets, listed in the table below. You need to decide which of
these software products or filesets to install on your system, based on how
you plan to use PE.
Table 2. Additional Software Requirements
If you plan to... | ...this software is required: | Notes: |
---|---|---|
Always required | AIX Version 4.3.2 for IBM RS/6000 (5765-C34) for Servers | |
...run a parallel program on the IBM RS/6000 SP | IBM Parallel System Support Programs (PSSP) for AIX (5765-D51) Version 3.1 | |
...compile parallel executables | IBM C for AIX Version 4.3, 5765-AAR (part number 04L0675 with feature number 2163) or IBM C and C++ Compilers for AIX Version 3.6.4, 5801-AAR (part number 04L3535) or IBM XL Fortran for AIX Version 5.1.1 or later, 5808-AAR (part number 04L2110) or IBM XL High Performance Fortran (HPF) for AIX Version 1.3.1 or later, 5765-613 | |
...submit a POE job that will use the SP Resource Manager from a non-SP node (for example, a standalone RS/6000 workstation run off the rack) | ssp.clients fileset, on the non-SP node | See "When to Install ssp.clients (SP Resource Manager)" for detailed information. |
...submit a POE job from outside a LoadLeveler cluster | loadl.so on the node outside the LoadLeveler cluster | See "When to Install loadl.so (LoadLeveler)" for detailed information. |
...use either of the PE debuggers | bos.adt.debug fileset | |
...use Xprofiler | For the CDE environment: | |
...use the VT visualization capability | IBM C and C++ Compilers for AIX Version 3.6.4, 5801-AAR (part number 04L3535) | |
...use LoadLeveler to allow execution of batch jobs | LoadLeveler Version 2.1 | |
The following table lists the amount of disk space you need in the appropriate directories for each of the separately-installable PE product options.
Note: If you plan to install the PE software on an IBM RS/6000 network cluster, each machine in the cluster on which you install it must meet these disk space requirements.
PE Fileset | 512-Byte Blocks Required in /usr | 512-Byte Blocks Required in /tmp | 512-Byte Blocks Required in /etc |
---|---|---|---|
ppe.poe | 75000 | 15000 | 10 |
ppe.pedb | 5600 | N/A | N/A |
ppe.vt | 6000 | N/A | N/A |
ppe.xprofiler | 6500 | N/A | N/A |
ppe.pedocs | 15000 | N/A | N/A |
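The per-directory totals for a given node depend on which filesets you choose to install there. The following is a minimal Python sketch, not part of PE, that adds up the block counts from the table above and converts them to megabytes. The function names blocks_needed and to_megabytes are illustrative, and directories listed as N/A are treated as needing no space.

```python
# 512-byte block requirements per fileset, copied from the disk space table.
# Directories listed as N/A in the table are simply omitted.
BLOCKS_REQUIRED = {
    "ppe.poe": {"/usr": 75000, "/tmp": 15000, "/etc": 10},
    "ppe.pedb": {"/usr": 5600},
    "ppe.vt": {"/usr": 6000},
    "ppe.xprofiler": {"/usr": 6500},
    "ppe.pedocs": {"/usr": 15000},
}

BLOCK_SIZE = 512  # bytes per block, as stated in the table heading


def blocks_needed(filesets):
    """Return total 512-byte blocks per directory for the chosen filesets."""
    totals = {}
    for name in filesets:
        for directory, blocks in BLOCKS_REQUIRED[name].items():
            totals[directory] = totals.get(directory, 0) + blocks
    return totals


def to_megabytes(blocks):
    """Convert a count of 512-byte blocks to megabytes (1 MB = 1024 * 1024 bytes)."""
    return blocks * BLOCK_SIZE / (1024 * 1024)


if __name__ == "__main__":
    # Example: a node that will run POE and the Visualization Tool.
    for directory, blocks in sorted(blocks_needed(["ppe.poe", "ppe.vt"]).items()):
        print(f"{directory}: {blocks} blocks ({to_megabytes(blocks):.1f} MB)")
```

Compare the result with the free space that the node actually reports for each file system before you install.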
Some PE product options and related software are subject to certain limitations, as explained below.
Incompatibilities exist between Fortran 90 and MPI that may affect the ability to use such programs. For further information, refer to /usr/lpp/ppe.poe/samples/mpif90/README.mpif90 after installing POE, and to "Enabling Fortran 90 Compiler Support".
User-written parallel applications are limited in their use of system calls. IBM Parallel Environment for AIX: MPI Programming and Subroutine Reference, GC23-3894 provides a discussion of these limitations.
When using the Parallel Debuggers, the application should have been compiled using the parallel compiler scripts supplied with POE (namely mpcc, mpcc_r, mpxlf, or mpxlf_r). Both debuggers currently support only Fortran 77 and C.
Customers with SP systems who have SP Switch Adapter-1 installed will not be able to use Version 2.4 of PE, because this adapter is no longer supported.
Parallel Environment does not support IBM VisualAge C++ Professional for AIX, Version 4.0 incremental compiler and C++ runtime library Version 4.0. This does not apply to the batch IBM C and C++ Version 3.6 compilers and the Version 3.6 C++ runtime libraries that are also included in VisualAge C++ Version 4.0.
If users plan to collect a gmon.out file on one machine and then use Xprofiler to analyze the data on another machine, they should be aware that some shared (system) libraries may not be the same on the two machines, which may result in different function call tree displays for those shared libraries.
Parallel Environment supports 32-bit applications only. 64-bit applications are not supported and will not run.
For all processors within a workstation cluster, the same release level of PE software is required. (This ensures that an individual PE application can run on any workstation in the cluster.)
When you use partitioning (available in PSSP Version 2 or later) on an IBM RS/6000 SP, you may have partitions at different levels of PE software; however, within a partition, all the nodes must be at the same level of PE software. (This ensures that an individual PE application can run on any node in the partition.)
Table 3 lists the versions of PSSP and AIX required on a particular workstation cluster or partition, depending on the version of PE installed on that cluster or partition, and the possible migration paths.
Note: PSSP cannot be put on a workstation.
Many of the compilers link to different libraries based on the AIX OSLEVEL value when they are installed. If you migrate just AIX and you will be using libraries for a back level, be sure to change the compiler library links or reinstall compilers.
How you plan your node resources will vary according to whether you are installing PE on an IBM RS/6000 SP or an IBM RS/6000 network cluster.
On an SP system, you partition nodes into pools and assign numbers and other information to these pools.
Pools are managed by the Resource Manager. You tell the POE Partition Manager which pools to use; the Partition Manager in turn asks the Resource Manager for nodes in the specified pools.
For more information on the Resource Manager and setting up pools, see IBM Parallel System Support Programs for AIX: Administration Guide, GC23-3897.
On an SP system within a LoadLeveler cluster, the system administrator uses LoadLeveler to partition nodes into pools and/or features, to which he or she assigns numbers and other information.
On an IBM RS/6000 network cluster, you assign workstations to the following categories:
Home node: The workstation from which parallel jobs are started. The home node can be any workstation on the LAN.
You need to identify these workstations running as execution nodes by name in a host list file.
An important aspect of planning your PE node resources is deciding which nodes will require which PE filesets or additional software. You do not need to install all of the PE filesets on every node. Refer to "Software Requirements" for more information on the filesets and their dependencies to help you decide how to install PE and additional required software on your nodes.
The PE filesets (poe, vt, pedb, pedocs, and xprofiler) are installed in the /usr file system. When the poe fileset is installed, it adds entries to the /etc/services and /etc/inetd.conf files. When poe is executed, a copy of the Partition Manager daemon is run on each remote node, and is identified in these files.
If you are using NIS or another master server for /etc/services, you need to create updates with the same information that is put into the individual files.
If you do not use a shared file system, you need to copy the user's executable files to the other nodes. To copy them, use the scripts provided by PE: mprcp and mpmkdir. You can also use mcp, the message passing file copy command. For more information on copying the file system and these scripts, see IBM Parallel Environment for AIX: Operation and Use, Volume 1, SC28-1979.
Also, you can declare these files part of a file collection. A file collection is a set of files and directories that are duplicated on multiple machines in a network and managed by tools that simplify their control and maintenance. For more information about file collections, see IBM Parallel System Support Programs for AIX: Administration Guide, SA22-7348.
The system administrator must set up a user ID, other than a root ID, for each user on each remote node that requires POE access.
Each user must have an account on all nodes where a job runs. Both the user name and user ID must be the same on all nodes. Also, the user must be a member of the same named group on the home node and the remote nodes.
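The account rules above can be checked mechanically once you have collected each node's account data. The following Python sketch is illustrative only: the function inconsistent_nodes, the node names, and the shape of the accounts mapping (node name to a pair of numeric user ID and group name for one user name) are assumptions, and how you gather that data from each node is left open.

```python
def inconsistent_nodes(home_node, accounts):
    """List the nodes whose entry for a user differs from the home node's.

    accounts maps node name -> (user ID, group name) for a single user name.
    A node is reported when its user ID or group name does not match the
    home node's entry.
    """
    expected = accounts[home_node]
    return sorted(node for node, entry in accounts.items() if entry != expected)


def uses_root_id(accounts):
    """Return True if any node gives this user the root user ID (0)."""
    return any(uid == 0 for uid, _group in accounts.values())


if __name__ == "__main__":
    # Hypothetical data for one user on a home node and two remote nodes.
    sample = {
        "home": (1201, "staff"),
        "remote1": (1201, "staff"),
        "remote2": (1305, "staff"),
    }
    print(inconsistent_nodes("home", sample))
    print(uses_root_id(sample))
```

An empty list from inconsistent_nodes means that the user ID and group name match the home node everywhere in the data you supplied.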
With PE Version 2 Release 4, interactive and batch parallel jobs can be submitted under LoadLeveler. When LoadLeveler is used, LoadLeveler is completely responsible for the user authorization. Any user authorization under POE is bypassed.
When LoadLeveler is not used, POE handles the user authorization. The following sections on POE user authorization apply when POE is used without LoadLeveler.
POE supports two methods of user authorization for submitting a parallel job:
AIX user authorization, via /etc/hosts.equiv or .rhosts entries. This is the default mechanism.
DFS/DCE user authorization, where POE checks for a valid set of DCE credentials for the user.
The user authorization mechanism is controlled by the MP_AUTH POE environment variable. This variable can be defined by the system administrator in the /etc/poe.limits file, as described in "Using the /etc/poe.limits File", so that users do not need to decide which mechanism to use.
The two types of authorization cannot be mixed in the same parallel job. All tasks and nodes defined for a POE job must use the same type of authorization.
If AIX user authorization (the default) is used as a security mechanism on the system, each node needs to be set up so that each user ID is authorized to access that node or remote link from the initiating home node. The /etc/hosts.equiv file and/or the .rhosts file are used to specify this user ID authorization, as explained below.
If the combination of the home node machine and user name:
For more information on .rhosts and /etc/hosts.equiv, refer to the chapter on managing jobs in IBM AIX Version 4 Files Reference for AIX, SC23-2512.
If DFS/DCE user authorization is used as a security mechanism on the system, POE accepts a valid set of DCE user credentials as user authorization for executing parallel jobs.
In order to use DFS/DCE with POE, the following are required:
Note: When DFS/DCE authorization is selected, there is no need for entries in either the /etc/hosts.equiv file or the .rhosts file, as these are not checked by POE.
For more information about running POE in a DFS/DCE environment, and about the poeauth command, see IBM Parallel Environment for AIX: Operation and Use, Volume 1, SC28-1979.
When POE is installed, it modifies entries in /etc/services and in /etc/inetd.conf to install the Partition Manager daemon. In doing so, it requires an available port number which must be the same number on all nodes on which POE is to be installed and running. You need to ensure such a port number is available.
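One way to confirm that a candidate port is free on every node is to scan a copy of each node's /etc/services file. The following Python sketch is illustrative only and assumes the usual services layout of a service name followed by port/protocol, with # starting a comment. The function names, node names, and sample entries are hypothetical.

```python
def services_on_port(services_text, port):
    """Return the service names that an /etc/services-style text assigns to port."""
    names = []
    for line in services_text.splitlines():
        # Drop any trailing comment, then split into whitespace-separated fields.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        number = fields[1].partition("/")[0]
        if number.isdigit() and int(number) == port:
            names.append(fields[0])
    return names


def nodes_with_conflict(services_by_node, port):
    """Map each node where the port is already assigned to the services using it."""
    conflicts = {}
    for node, text in services_by_node.items():
        names = services_on_port(text, port)
        if names:
            conflicts[node] = names
    return conflicts


if __name__ == "__main__":
    # Hypothetical file contents gathered from two nodes.
    sample = {
        "node1": "ftp 21/tcp\n",
        "node2": "ftp 21/tcp\nother 6125/tcp  # already in use\n",
    }
    print(nodes_with_conflict(sample, 6125))
```

An empty result means that none of the supplied files assigns the port, so the same number can be used on all of those nodes.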
A POE application may require additional IP buffers (mbufs) under any of the following circumstances:
The need for additional IP buffers is usually evident when repeated requests for memory are denied. The netstat -m command can tell you when such a condition exists. In such a case, it may be necessary to use the no command to change the network option system parameters on the home node or on the SP nodes being used in the partition. (You can also use the no command to check the current values first.)
The number of IP buffers allocated in the kernel is controlled by the thewall parameter of the no command. Increasing the value of the thewall parameter increases the number of IP buffers.
Notes:
On SP nodes, you can use the dsh command to execute the no command on each node of an SP. (See the section on tuning in IBM Parallel System Support Programs for AIX: Administration Guide, SA22-7348 for more information on dsh.)
For non-SP nodes, you can also set the values at system boot time by adding the appropriate call to the no command in either /etc/rc.net or /etc/rc.tcpip.
For more general information on mbufs, see IBM AIX Versions 3.2 and 4 Performance Monitoring and Tuning, SC23-2365.
POE Version 1 and POE Version 2 are not compatible. All of your tasks must run with either POE Version 2 or with POE Version 1, not a combination of the two. The POE home node and all remote nodes must run with the same version of code. You must be at the same level of AIX and PSSP within a partition to submit PE jobs. See Chapter 3, "PE Version 2.4 Migration Information" for more information.
As part of the Version 2 installation, the Partition Manager daemon (pmd) and POE executables have different names than their Version 1 counterparts. Also, different TCP/IP port numbers and daemon service names are used. Furthermore, Version 2 and Version 1 files use different directory path names.
The following table summarizes the differences and can be used to tell
which version of POE you have if you are not sure.
Type of Name or Number | POE Version 1 | POE Version 2 |
---|---|---|
Service name in /etc/services | pm2 | pmv2 |
Daemon name in /etc/inetd.conf | pmd2 | pmdv2 |
Default port number | 6124 | 6125 |
pmd executable name | pmd2 | pmdv2 |
File path name | /usr/lpp/poe | /usr/lpp/ppe.poe |
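Because the service names differ between versions, the contents of /etc/services are enough to tell which version of POE is registered on a node. The following Python sketch is illustrative only: it takes the service names and default port numbers from the table above, assumes the usual services layout of a service name followed by port/protocol, and uses a hypothetical function name and sample entry.

```python
# Service name -> (POE version, default port), taken from the table above.
POE_SERVICES = {
    "pm2": (1, 6124),
    "pmv2": (2, 6125),
}


def registered_poe_versions(services_text):
    """Return {POE version: port number} for the POE services found in the text.

    The port is None when the entry's port field is not a plain number.
    """
    found = {}
    for line in services_text.splitlines():
        # Drop any trailing comment, then split into whitespace-separated fields.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] in POE_SERVICES:
            version, _default_port = POE_SERVICES[fields[0]]
            number = fields[1].partition("/")[0]
            found[version] = int(number) if number.isdigit() else None
    return found


if __name__ == "__main__":
    # Hypothetical excerpt from /etc/services on a node with POE Version 2.
    sample = "ftp 21/tcp\npmv2 6125/tcp  # Partition Manager daemon\n"
    print(registered_poe_versions(sample))
```

If both versions appear in the result, both service entries are present on that node. Comparing the reported port with the default in the table also shows whether the port was changed at installation.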