This chapter describes the eigensystem analysis and singular value analysis subroutines.
The eigensystem analysis and singular value analysis subroutines include a subset of the ScaLAPACK subroutines. See references [19] and [20].
Note: | These subroutines are designed in accordance with the proposed ScaLAPACK standard. If these subroutines do no comply with the standard as approved, IBM will consider updating them to do so. If IBM updates these subroutines, the update could require modifications of the calling application program. |
Table 95. List of Eigensystem Analysis and Singular Value Analysis Subroutines (Message Passing)
Descriptive Name | Long-Precision Subroutine | Page |
---|---|---|
Selected Eigenvalues and, Optionally, the Eigenvectors of a Real Symmetric Matrix | PDSYEVX | PDSYEVX--Selected Eigenvalues and, Optionally, the Eigenvectors of a Real Symmetric Matrix |
Reduce a Real Symmetric Matrix to Tridiagonal Form | PDSYTRD | PDSYTRD--Reduce a Real Symmetric Matrix to Tridiagonal Form |
Reduce a General Matrix to Upper Hessenberg Form | PDGEHRD | PDGEHRD--Reduce a General Matrix to Upper Hessenberg Form |
Reduce a General Matrix to Bidiagonal Form | PDGEBRD | PDGEBRD--Reduce a General Matrix to Bidiagonal Form |
This section contains the eigensystem analysis subroutine descriptions.
This subroutine computes selected eigenvalues and, optionally, the eigenvectors of a real symmetric matrix A, where A represents the global symmetric submatrix Aia:ia+n-1, ja:ja+n-1. Eigenvalues and eigenvectors can be selected by specifying a range of values or a range of indices for the eigenvalues.
If n = 0, no computation is performed and the subroutine returns after doing some parameter checking.
See references [13], [24], [25], and [26].
A, vl, vu, abstol, orfac, Z, w, work, gap | iwork, ifail, iclustr | Subroutine |
Long-precision real | Integer | PDSYEVX |
Fortran | CALL PDSYEVX (jobz, range, uplo, n, a, ia, ja, desc_a, vl, vu, il, iu, abstol, m, nz, w, orfac, z, iz, jz, desc_z, work, lwork, iwork, liwork, ifail, iclustr, gap, info) |
C and C++ | pdsyevx (jobz, range, uplo, n, a, ia, ja, desc_a, vl, vu, il, iu, abstol, m, nz, w, orfac, z, iz, jz, desc_z, work, lwork, iwork, liwork, ifail, iclustr, gap, info); |
If jobz = 'N', eigenvalues only are computed.
If jobz = 'V', eigenvalues and eigenvectors are computed.
Scope: global
Specified as: a single character; jobz = 'N' or 'V'.
If range = 'A', all eigenvalues are to be found.
If range = 'V', all eigenvalues in the interval [vl, vu] are to be found.
If range = 'I', the il-th through iu-th eigenvalues are to be found.
Scope: global
Specified as: a single character; range = 'A', 'V', or 'I'.
If uplo = 'U', the upper triangular part is referenced.
If uplo = 'L', the lower triangular part is referenced.
Scope: global
Specified as: a single character; uplo = 'U' or 'L'.
Scope: global
Specified as: a fullword integer; n >= 0.
Scope: local
Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 96. Details about the square block-cyclic data distribution of global matrix A are stored in desc_a.
Scope: global
Specified as: a fullword integer; 1 <= ia <= M_A and ia+n-1 <= M_A.
Scope: global
Specified as: a fullword integer; 1 <= ja <= N_A and ja+n-1 <= N_A.
desc_a | Name | Description | Limits | Scope |
---|---|---|---|---|
1 | DTYPE_A | Descriptor type | DTYPE_A=1 | Global |
2 | CTXT_A | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
3 | M_A | Number of rows in the global matrix | If n = 0: M_A >= 0
Otherwise: M_A >= 1 | Global |
4 | N_A | Number of columns in the global matrix | If n = 0: N_A >= 0
Otherwise: N_A >= 1 | Global |
5 | MB_A | Row block size | MB_A >= 1 | Global |
6 | NB_A | Column block size | NB_A >= 1 | Global |
7 | RSRC_A | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_A < p | Global |
8 | CSRC_A | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_A < q | Global |
9 | LLD_A | The leading dimension of the local array | LLD_A >= max(1,LOCp(M_A)) | Local |
Specified as: an array of (at least) length 9, containing fullword integers.
If range = 'V', it is the lower bound of the interval to be searched for eigenvalues.
If range <> 'V', this argument is ignored.
Scope: global
Specified as: a number of the data type indicated in Table 96. If range = 'V', vl < vu.
If range = 'V', it is the upper bound of the interval to be searched for eigenvalues.
If range <> 'V', this argument is ignored.
Scope: global
Specified as: a number of the data type indicated in Table 96. If range = 'V', vl < vu.
If range = 'I', it is the index (from smallest to largest) of the smallest eigenvalue to be returned.
If range <> 'I', this argument is ignored.
Scope: global
Specified as: a fullword integer; il >= 1.
If range = 'I', it is the index (from smallest to largest) of the largest eigenvalue to be returned.
If range <> 'I', this argument is ignored.
Scope: global
Specified as: a fullword integer; min(il, n) <= iu <= n.
where epsilon is the machine precision. If abstol is less than or equal to zero, then epsilon(norm(T)) is used in its place, where norm(T) is the 1-norm of the tridiagonal matrix obtained by reducing A to tridiagonal form. For most problems, this is the appropriate level of accuracy to request.
For certain strongly graded matrices, greater accuracy can be obtained in very small eigenvalues by setting abstol to a very small positive number. However, if abstol is less than:
where unfl is the underflow threshold, then:
is used in its place.
Eigenvalues are computed most accurately when abstol is set to twice the underflow threshold--that is, (2)(unfl).
If jobz = 'V', then setting abstol to unfl, the underflow threshold, yields the most orthogonal eigenvectors.
Scope: global
Specified as: a number of the data type indicated in Table 96.
of each other (where norm(A) is the 1-norm of A) are to be reorthogonalized.
However, if the workspace is insufficient (see lwork), ortol may be decreased until all eigenvectors to be reorthogonalized can be stored in one process.
If orfac is zero, no reorthogonalization is done.
If orfac is less than zero, a default value of 10-3 is used.
Scope: global
Specified as: a number of the data type indicated in Table 96.
Scope: global
Specified as: a fullword integer; 1 <= iz <= M_Z and iz+n-1 <= M_Z.
Scope: global
Specified as: a fullword integer; 1 <= jz <= N_Z and jz+n-1 <= N_Z.
desc_z | Name | Description | Limits | Scope |
---|---|---|---|---|
1 | DTYPE_Z | Descriptor type | DTYPE_Z=1 | Global |
2 | CTXT_Z | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
3 | M_Z | Number of rows in the global matrix | If n = 0: M_Z >= 0
Otherwise: M_Z >= 1 | Global |
4 | N_Z | Number of columns in the global matrix | If n = 0: N_Z >= 0
Otherwise: N_Z >= 1 | Global |
5 | MB_Z | Row block size | MB_Z >= 1 | Global |
6 | NB_Z | Column block size | NB_Z >= 1 | Global |
7 | RSRC_Z | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_Z < p | Global |
8 | CSRC_Z | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_Z < q | Global |
9 | LLD_Z | The leading dimension of the local array | LLD_Z >= max(1,LOCp(M_Z)) | Local |
Specified as: an array of (at least) length 9, containing fullword integers.
If lwork = 0, work is ignored.
If lwork <> 0, work is a work area used by this subroutine, where:
Scope: local
Specified as: an area of storage containing numbers of data type indicated in Table 96.
Scope:
Specified as: a fullword integer; where:
lwork >= 3(n+ja-1)+2n +max(5nn, (nb)(np+1))
lwork >= 3(n+ja-1)+2n +max(5nn, (np0)(mq0)+2(nb)(nb)) + iceil(neig, (nprow)(npcol))(nn)
where:
The computed eigenvectors may not be orthogonal if the minimum workspace is supplied and ortol is too small; therefore, if you want to guarantee orthogonality (at the cost of potentially compromising performance), you should add the following to lwork:
(clustersize-1)(n)
where clustersize is the number of eigenvalues in the largest cluster, where a cluster is defined as a set of close eigenvalues:
Note: | This subroutine does not add this amount when dynamically allocating this workspace. You must use static allocation if you want to guarantee orthogonality. |
When lwork is too small:
Relationship between workspace, orthogonality, and performance:
then providing enough space to compute all the eigenvectors orthogonally causes serious degradation in performance. In the limit (clustersize = n-1), performance may be no better than using one process.
then reorthogonalizing all eigenvectors increases the total execution time by a factor of 2 or more.
then execution time grows as the square of the cluster size, assuming all other factors remain equal and there is enough workspace. Less workspace means less reorthogonalization, but faster execution.
If liwork = 0, iwork is ignored.
If liwork <> 0, iwork is a work area used by this subroutine, where:
Scope: local
Specified as: an area of storage containing fullword integers.
Scope:
Specified as: a fullword integer; where:
liwork >= max(isizestein, isizestebz)+2n
where:
If uplo = 'U', the upper triangle and diagonal of submatrix Aia:ia+n-1, ja:ja+n-1 are overwritten; that is, the original input is not preserved.
If uplo = 'L', the lower triangle and diagonal of submatrix Aia:ia+n-1, ja:ja+n-1 are overwritten; that is, the original input is not preserved.
Scope: local
Returned as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 96.
Scope: global
Returned as: a fullword integer; 0 <= m <= n.
If jobz <> 'V', then nz is ignored.
If jobz = 'V', then nz is the number of eigenvectors computed--that is, the number of columns of Z used in the computation. On output, nz = m unless you provide insufficient space. To get all the eigenvectors requested, you must supply both sufficient space to hold the eigenvectors in Z and sufficient workspace to compute them (see lwork).
If range = 'A' or 'I', PDSYEVX does not perform any computations if the work space supplied is insufficient. In this case, an input-argument error is issued and your job is terminated. For range = 'V', the number of requested eigenvectors is unknown until the eigenvalues are found. In this case, PDSYEVX computes as many eigenvectors as space allows. Then, if nz <> m, a computational error message is issued.
Scope: global
Returned as: a fullword integer; 0 <= nz <= m.
Scope: global
Returned as: a one-dimensional array of (at least) length n, containing numbers of the data type indicated in Table 96.
If jobz = 'N', then z is ignored.
If jobz = 'V' and there is a normal exit (see info), then this is the updated local part of the global matrix Z, where columns jz to jz+m-1 of the global matrix Z contain the orthonormal eigenvectors of the global matrix A, corresponding to the selected eigenvalues. If an eigenvector fails to converge, then the corresponding column of the global matrix Z contains the last approximation to the eigenvector, and the index of the eigenvector is returned in ifail.
This identifies the first element of the local array Z. This subroutine computes the location of the first element of the local subarray used, based on iz, jz, desc_z, p, q, myrow, and mycol; therefore, the leading LOCp(iz+n-1) by LOCq(jz+n-1) part of the local array Z must contain the local pieces of the leading iz+n-1 by jz+n-1 part of the global matrix Z.
Scope: local
Returned as: an LLD_Z by (at least) LOCq(N_Z) array, containing numbers of the data type indicated in Table 96.
If lwork <> 0 and lwork <> -1, its size is (at least) of length lwork.
If lwork = -1, its size is (at least) of length 1.
Scope: local
Returned as: an area of storage, containing numbers of the data type indicated in Table 96, where:
If lwork >= 1 or lwork = -1, then:
If liwork <> 0 and liwork <> -1, then its size is (at least) of length liwork.
If liwork = -1, then its size is (at least) of length 1.
Scope: local
Returned as: an area of storage, where:
If liwork >= 1 or liwork = -1, then iwork1 is set to the minimum liwork value and contains numbers of the data type indicated in Table 96. Except for iwork1, the contents of iwork are overwritten on return.
If jobz = 'N', then ifail is ignored.
If jobz = 'V', it is vector ifail, where:
Scope: global
Returned as: a one-dimensional array of (at least) length n, containing fullword integers; 0 <= ifaili <= n.
If jobz = 'N', then iclustr is ignored.
If jobz = 'V', it is vector iclustr, containing the indices of the eigenvectors corresponding to a cluster of eigenvalues that could not be reorthogonalized due to insufficient workspace. Eigenvectors corresponding to clusters of eigenvalues indexed iclustr2i-1 to iclustr2i could not be reorthogonalized due to lack of workspace. Hence, the eigenvectors corresponding to these clusters may not be orthogonal.
iclustr is a zero-terminated vector; that is, the last element of iclustr is set to zero. Assuming that k is the number of clusters, then:
Scope: global
Returned as: a one-dimensional array of (at least) length 2(nprow)(npcol), containing fullword integers; 0 <= iclustri <= n.
If jobz = 'N', then gap is ignored.
If jobz = 'V', it is vector gap, containing the gap between the eigenvalues whose eigenvectors could not be reorthogonalized. The values in this vector correspond to the clusters indicated by iclustr. As a result, the dot product between the eigenvectors corresponding to the i-th cluster may be as high as (C)(n)/gapi, where C is a small constant.
Scope: global
Returned as: a one-dimensional array of (at least) length (nprow)(npcol), containing numbers of the data type indicated in Table 96.
If info = 0, then no input-argument errors or computational errors occurred. This indicates a normal exit.
Note: | One use of info in ScaLAPACK is to identify whether input-argument errors occurred. Because Parallel ESSL terminates the application if input-argument errors occur, the setting of info is irrelevant for these errors. |
If info > 0, then one or more of the following computational errors occurred and the appropriate error messages were issued, indicating an error exit, where:
Scope: global
Returned as: a fullword integer; info >= 0.
where:
This subroutine computes selected eigenvalues and, optionally, the eigenvectors of a real symmetric matrix A. Eigenvalues and eigenvectors can be selected by specifying a range of values or a range of indices for the eigenvalues. The computation involves the following steps:
Note: | For more details, see output argument info. |
If jobz = 'V':
If n <> 0:
If n <> 0 and jobz = 'V':
In all cases:
If jobz = 'V':
If jobz = 'V':
Each of the following global input arguments are checked to determine whether its value differs from the value specified on process P00:
Also:
Each of the following global input arguments are checked to determine whether its value differs from the value specified on process P00:
If range = 'V':
If range = 'I':
If jobz = 'V':
This example shows how to find all the eigenvalues and eigenvectors of a real symmetric matrix A of order 4 using a 2 × 2 process grid.
Notes:
ORDER = 'R' NPROW = 2 NPCOL = 2 CALL BLACS_GET(0, 0, ICONTXT) CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL) CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL) JOBZ RANGE UPLO N A IA JA DESC_A VL VU IL IU ABSTOL M NZ W | | | | | | | | | | | | | | | | CALL PDSYEVX( 'V', 'A', 'U', 4, A, 1, 1, DESC_A, 0.0D0, 0.0D0, 0, 0, -1.0D0, M, NZ, W, ORFAC Z IZ JZ DESC_Z WORK LWORK IWORK LIWORK IFAIL ICLUSTR GAP INFO + | | | | | | | | | | | | | -1.0D0, Z, 1, 1, DESC_Z, WORK , 0 , IWORK , 0 , IFAIL, ICLUSTR, GAP, INFO)
| DESC_A | DESC_Z | ||
---|---|---|---|---|
DTYPE_ | 1 | 1 | ||
CTXT_ | icontxt1 | icontxt1 | ||
M_ | 4 | 4 | ||
N_ | 4 | 4 | ||
MB_ | 1 | 1 | ||
NB_ | 1 | 1 | ||
RSRC_ | 0 | 0 | ||
CSRC_ | 0 | 0 | ||
LLD_ | See below2 | See below2 | ||
|
Global symmetric matrix A of order 4 with block sizes 1 × 1:
B,D 0 1 2 3 * * 0 | 5.0 | 4.0 | 1.0 | 1.0 | | ------|--------|--------|------ | 1 | . | 5.0 | 1.0 | 1.0 | | ------|--------|--------|------ | 2 | . | . | 4.0 | 2.0 | | ------|--------|--------|------ | 3 | . | . | . | 4.0 | * *
The following is the 2 × 2 process grid:
B,D | 0 2 | 1 3 |
---|---|---|
0
2 | P00 | P01 |
1
3 | P10 | P11 |
Local arrays for A:
p,q | 0 | 1 -----|------------|------------ 0 | 5.0 1.0 | 4.0 1.0 | . 4.0 | . 2.0 -----|------------|------------ 1 | . 1.0 | 5.0 1.0 | . . | . 4.0
Output:
The upper triangle, including the diagonal, of the global symmetric matrix A is overwritten; that is, the original input is not preserved.
On all processes, m = 4 and nz = 4.
Global vector w of length 4 is the same on all processes:
Global general matrix Z of order 4 with block sizes 1 × 1:
B,D 0 1 2 3 * * 0 | 0.7071 | 0.0000 | -0.3162 | -0.6325 | | ---------|-----------|-----------|--------- | 1 | -0.7071 | 0.0000 | -0.3162 | -0.6325 | | ---------|-----------|-----------|--------- | 2 | 0.0000 | -0.7071 | 0.6325 | -0.3162 | | ---------|-----------|-----------|--------- | 3 | 0.0000 | 0.7071 | 0.6325 | -0.3162 | * *
The following is the 2 × 2 process grid:
B,D | 0 2 | 1 3 |
---|---|---|
0
2 | P00 | P01 |
1
3 | P10 | P11 |
Local arrays for Z:
p,q | 0 | 1 -----|-------------------|------------------- 0 | 0.7071 -0.3162 | 0.0000 -0.6325 | 0.0000 0.6325 | -0.7071 -0.3162 -----|-------------------|------------------- 1 | -0.7071 -0.3162 | 0.0000 -0.6325 | 0.0000 0.6325 | 0.7071 -0.3162
Global vector ifail of length 4 is the same on all processes:
Global vector iclustr of length 8 ( = 2(nprow)(npcol)) is the same on all processes:
Global vector gap of length 4 ( = (nprow)(npcol)) is the same on all processes:
The value of info is 0 on all processes.
This subroutine reduces a real symmetric matrix A to symmetric tridiagonal form T by an orthogonal similarity transformation:
where A represents the global symmetric submatrix Aia:ia+n-1, ja:ja+n-1.
If n = 0, no computation is performed and the subroutine returns after doing some parameter checking.
A, d, e, tau, work | Subroutine |
Long-precision real | PDSYTRD |
Fortran | CALL PDSYTRD (uplo, n, a, ia, ja, desc_a, d, e, tau, work, lwork, info) |
C and C++ | pdsytrd (uplo, n, a, ia, ja, desc_a, d, e, tau, work, lwork, info); |
If uplo = 'U', the upper triangular part is referenced.
If uplo = 'L', the lower triangular part is referenced.
Scope: global
Specified as: a single character; uplo = 'U' or 'L'.
Scope: global
Specified as: a fullword integer; n >= 0.
Scope: local
Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 97. Details about the square block-cyclic data distribution of global matrix A are stored in desc_a.
Scope: global
Specified as: a fullword integer; 1 <= ia <= M_A and ia+n-1 <= M_A.
Scope: global
Specified as: a fullword integer; 1 <= ja <= N_A and ja+n-1 <= N_A.
desc_a | Name | Description | Limits | Scope |
---|---|---|---|---|
1 | DTYPE_A | Descriptor type | DTYPE_A=1 | Global |
2 | CTXT_A | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
3 | M_A | Number of rows in the global matrix | If n = 0: M_A >= 0
Otherwise: M_A >= 1 | Global |
4 | N_A | Number of columns in the global matrix | If n = 0: N_A >= 0
Otherwise: N_A >= 1 | Global |
5 | MB_A | Row block size | MB_A >= 1 | Global |
6 | NB_A | Column block size | NB_A >= 1 | Global |
7 | RSRC_A | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_A < p | Global |
8 | CSRC_A | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_A < q | Global |
9 | LLD_A | The leading dimension of the local array | LLD_A >= max(1,LOCp(M_A)) | Local |
Specified as: an array of (at least) length 9, containing fullword integers.
If lwork = 0, work is ignored.
If lwork <> 0, work is the work area used by this subroutine, where:
Scope: local
Specified as: an area of storage containing numbers of data type indicated in Table 97.
Scope:
Specified as: a fullword integer; where:
lwork >= max(nb(np+1), 3nb)
where:
See "Function", for more information.
Scope: local
Returned as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 97. Details about the square block-cyclic data distribution of global matrix A are stored in desc_a.
This identifies the first element of the local array D. This subroutine computes the location of the first element of the local subarray used, based on ja, desc_a, p, q, myrow, and mycol; therefore, the leading 1 by LOCq(ja+n-1) part of the local array D must contain the local pieces of the leading 1 by ja+n-1 part of the global matrix D.
A copy of the vector d, with a block size of NB_A and global index ja, is returned to each row of the process grid. The process column over which the first column of d is distributed is CSRC_A.
Scope: local
Returned as: a 1 by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 97.
If uplo = 'U', then eja = 0 and eja+1:ja+n-1 contains the superdiagonal elements of the tridiagonal matrix T.
If uplo = 'L', then eja:ja+n-2 contains the subdiagonal elements of the tridiagonal matrix T, and eja+n-1 = 0.
This identifies the first element of the local array E. This subroutine computes the location of the first element of the local subarray used, based on ja, desc_a, p, q, myrow, and mycol; therefore, the leading 1 by LOCq(ja+n-1) part of the local array E must contain the local pieces of the leading 1 by ja+n-1 part of the global matrix E.
A copy of the vector e, with a block size of NB_A and global index ja, is returned to each row of the process grid. The process column over which the first column of E is distributed is CSRC_A.
Scope: local
Returned as: a 1 by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 97.
If uplo = 'U', then tauja = 0 and tauja+1:ja+n-1 contains the scalar factors of the elementary reflectors.
If uplo = 'L', then tauja:ja+n-2 contains the scalar factors of the elementary reflectors and tauja+n-1 = 0
This identifies the first element of the local array tau. This subroutine computes the location of the first element of the local subarray used, based on ja, desc_a, p, q, myrow, and mycol; therefore, the leading 1 by LOCq(ja+n-1) part of the local array tau must contain the local pieces of the leading 1 by ja+n-1 part of the global matrix tau.
A copy of the vector tau, with a block size of NB_A and global index ja, is returned to each row of the process grid. The process column over which the first column of tau is distributed is CSRC_A.
Scope: local
Returned as: a 1 by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 97.
If lwork <> 0 and lwork <> -1, its size is (at least) of length lwork.
If lwork = -1, its size is (at least) of length 1.
Scope: local
Returned as: an area of storage, where:
If lwork >= 1 or lwork = -1, then work1 is set to the minimum lwork value and contains numbers of the data type indicated in Table 97. Except for work1, the contents of work are overwritten on return.
Scope: global
Returned as: a fullword integer; info = 0.
This subroutine reduces a real symmetric matrix A to symmetric tridiagonal form T by an orthogonal similarity transformation:
where:
where:
If uplo = 'U', then the following example shows the contents of A on return with n = 5 and ia = ja = 1:
where:
where:
If uplo = 'L', then the following example shows the contents of A on return with n = 5 and ia = ij = 1:
where:
None
If n <> 0:
In all cases:
where:
Each of the following global input arguments are checked to determine whether its value differs from the value specified on process P00:
Also:
This example shows the reduction of a symmetric matrix of order 4 to symmetric tridiagonal form, using a 2 × 2 process grid.
Note: | Because lwork = 0, PDSYTRD dynamically allocates the work area used by this subroutine. |
ORDER = 'R' NPROW = 2 NPCOL = 2 CALL BLACS_GET(0, 0, ICONTXT) CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL) CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL) UPLO N A IA JA DESC_A D E TAU WORK LWORK INFO | | | | | | | | | | | | CALL PDSYTRD( 'U' , 4 , A , 1 , 1 , DESC_A , D , E , TAU , WORK , 0 , INFO )
| DESC_A | ||
---|---|---|---|
DTYPE_ | 1 | ||
CTXT_ | icontxt1 | ||
M_ | 4 | ||
N_ | 4 | ||
MB_ | 1 | ||
NB_ | 1 | ||
RSRC_ | 0 | ||
CSRC_ | 0 | ||
LLD_ | See below2 | ||
|
Global symmetric matrix A of order 4 with block sizes 1 × 1:
B,D 0 1 2 3 * * 0 | 5.0 | 4.0 | 1.0 | 1.0 | | ------|--------|--------|------ | 1 | . | 5.0 | 1.0 | 1.0 | | ------|--------|--------|------ | 2 | . | . | 4.0 | 2.0 | | ------|--------|--------|------ | 3 | . | . | . | 4.0 | * *
The following is the 2 × 2 process grid:
B,D | 0 2 | 1 3 |
---|---|---|
0
2 | P00 | P01 |
1
3 | P10 | P11 |
Local arrays for A:
p,q | 0 | 1 -----|------------|------------ 0 | 5.0 1.0 | 4.0 1.0 | . 4.0 | . 2.0 -----|------------|------------ 1 | . 1.0 | 5.0 1.0 | . . | . 4.0
Output:
Global symmetric matrix A of order 4 with block sizes 1 × 1:
B,D 0 1 2 3 * * 0 | 1.00 | 0.00 | 0.41 | 0.22 | | -------|---------|---------|------- | 1 | . | 6.00 | 2.83 | 0.22 | | -------|---------|---------|------- | 2 | . | . | 7.00 | -2.45 | | -------|---------|---------|------- | 3 | . | . | . | 4.00 | * *
The following is the 2 × 2 process grid:
B,D | 0 2 | 1 3 |
---|---|---|
0
2 | P00 | P01 |
1
3 | P10 | P11 |
Local arrays for A:
p,q | 0 | 1 -----|--------------|-------------- 0 | 1.00 0.41 | 0.00 0.22 | . 7.00 | . -2.45 -----|--------------|-------------- 1 | . 2.83 | 6.00 0.22 | . . | . 4.00
Global row vector D of length 4 with block sizes 1 × 1:
B,D 0 1 2 3 * * 0 | 1.00 | 6.00 | 7.00 | 4.00 | * *
Note: | A copy of D is distributed across each row of the process grid. |
The following is the 2 × 2 process grid:
B,D | 0 2 | 1 3 |
---|---|---|
| P00 | P01 |
| P10 | P11 |
Local arrays for D:
p,q | 0 | 1 -----|--------------|-------------- 0 | 1.00 7.00 | 6.00 4.00 -----|--------------|-------------- 1 | 1.00 7.00 | 6.00 4.00
Global row vector E of length 4 with block sizes 1 × 1:
B,D 0 1 2 3 * * 0 | 0.00 | 0.00 | 2.83 | -2.45 | * *
Note: | A copy of E is distributed across each row of the process grid. |
The following is the 2 × 2 process grid:
B,D | 0 2 | 1 3 |
---|---|---|
| P00 | P01 |
| P10 | P11 |
Local arrays for E:
p,q | 0 | 1 -----|--------------|-------------- 0 | 0.00 2.83 | 0.00 -2.45 -----|--------------|-------------- 1 | 0.00 2.83 | 0.00 -2.45
Global row vector tau of length 4 with block sizes 1 × 1:
B,D 0 1 2 3 * * 0 | 0.00 | 0.00 | 1.71 | 1.82 | * *
Note: | A copy of tau is distributed across each row of the process grid. |
The following is the 2 × 2 process grid:
B,D | 0 2 | 1 3 |
---|---|---|
| P00 | P01 |
| P10 | P11 |
Local arrays for tau:
p,q | 0 | 1 -----|--------------|-------------- 0 | 0.00 1.71 | 0.00 1.82 -----|--------------|-------------- 1 | 0.00 1.71 | 0.00 1.82
The value of info is 0 on all processes.
This subroutine reduces a real general matrix A to upper Hessenberg form H by an orthogonal similarity transformation:
where A represents the global general submatrix Aia+ilo-1: ia+ihi-1, ja+ilo-1: ja+ihi-1.
If n = 0, no computation is performed, and the subroutine returns after doing some parameter checking. Then, if ihi = ilo, the subroutine returns after doing some parameter checking and setting tauja:ja+ilo-2 and tauja+ihi-1:ja+n-2 to zero.
A, tau, work | Subroutine |
Long-precision real | PDGEHRD |
Fortran | CALL PDGEHRD (n, ilo, ihi, a, ia, ja, desc_a, tau, work, lwork, info) |
C and C++ | pdgehrd (n, ilo, ihi, a, ia, ja, desc_a, tau, work, lwork, info); |
Scope: global
Specified as: a fullword integer; n >= 0.
Scope: global
Specified as: a fullword integer; 1 <= ilo <= max(1, n).
Scope: global
Specified as: a fullword integer; min(ilo, n) <= ihi <= n.
Scope: local
Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 98. Details about the square block-cyclic data distribution of global matrix A are stored in desc_a.
Scope: global
Specified as: a fullword integer; 1 <= ia <= M_A and ia+n-1 <= M_A.
Scope: global
Specified as: a fullword integer; 1 <= ja <= N_A and ja+n-1 <= N_A.
desc_a | Name | Description | Limits | Scope |
---|---|---|---|---|
1 | DTYPE_A | Descriptor type | DTYPE_A=1 | Global |
2 | CTXT_A | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
3 | M_A | Number of rows in the global matrix | If n = 0: M_A >= 0
Otherwise: M_A >= 1 | Global |
4 | N_A | Number of columns in the global matrix | If n = 0: N_A >= 0
Otherwise: N_A >= 1 | Global |
5 | MB_A | Row block size | MB_A >= 1 | Global |
6 | NB_A | Column block size | NB_A >= 1 | Global |
7 | RSRC_A | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_A < p | Global |
8 | CSRC_A | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_A < q | Global |
9 | LLD_A | The leading dimension of the local array | LLD_A >= max(1,LOCp(M_A)) | Local |
Specified as: an array of (at least) length 9, containing fullword integers.
If lwork = 0, work is ignored.
If lwork <> 0, work is the work area used by this subroutine, where:
Scope: local
Specified as: an area of storage containing numbers of data type indicated in Table 98.
Scope:
Specified as: a fullword integer; where:
lwork >= (nb × nb)+nb × max(ihip+1, ihlp+inlq)
where:
The upper triangle and the first subdiagonal of Aia:ia+n-1, ja:ja+n-1 are overwritten by the corresponding elements of the upper Hessenberg matrix H. The elements below the first subdiagonal are overwritten with vi+2:ihi. These elements with tau represent the orthogonal matrix Q as a product of elementary reflectors.
See "Function", for more information.
Scope: local
Returned as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 98. Details about the square block-cyclic data distribution of global matrix A are stored in desc_a.
This identifies the first element of the local array tau. This subroutine computes the location of the first element of the local subarray used, based on ja, desc_a, p, q, myrow, and mycol; therefore, the leading 1 by LOCq(ja+n-2) part of the local array tau must contain the local pieces of the leading 1 by ja+n-2 part of the global matrix tau.
A copy of the vector tau, with a block size of NB_A and global index ja, is returned to each row of the process grid. The process column over which the first column of tau is distributed is CSRC_A.
Scope: local
Returned as: a 1 by (at least) LOCq(N_A-1) array, containing numbers of the data type indicated in Table 98.
If lwork <> 0 and lwork <> -1, its size is (at least) of length lwork.
If lwork = -1, its size is (at least) of length 1.
Scope: local
Returned as: an area of storage, where:
If lwork >= 1 or lwork = -1, then work1 is set to the minimum lwork value and contains numbers of the data type indicated in Table 98. Except for work1, the contents of work are overwritten on return.
Scope: global
Returned as: a fullword integer; info = 0.
If n = 0, you should set ilo = 1 and ihi = 0. If n > 0, you should set 1 <= ilo <= ihi <= n.
This subroutine reduces a real general matrix A to upper Hessenberg form H by an orthogonal similarity transformation:
where:
where:
The following example shows the contents of the general submatrix A on entry with n = 7, ia = ja = 1, ilo = 2, and ihi = 6:
Following is the general submatrix A on return:
where:
None
If n <> 0:
In all cases:
where:
Each of the following global input arguments are checked to determine whether its value differs from the value specified on process P00:
Also:
This example shows the reduction of a general matrix of order 3 to upper Hessenberg form using a 2 × 2 process grid.
Note: | Because lwork = 0, PDGEHRD dynamically allocates the work area used by this subroutine. |
ORDER = 'R' NPROW = 2 NPCOL = 2 CALL BLACS_GET(0, 0, ICONTXT) CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL) CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL) N ILO IHI A IA JA DESC_A TAU WORK LWORK INFO | | | | | | | | | | | CALL PDGEHRD( 3 , 1 , 3 , A , 1 , 1 , DESC_A , TAU , WORK , 0 , INFO)
| DESC_A | ||
---|---|---|---|
DTYPE_ | 1 | ||
CTXT_ | icontxt1 | ||
M_ | 3 | ||
N_ | 3 | ||
MB_ | 1 | ||
NB_ | 1 | ||
RSRC_ | 0 | ||
CSRC_ | 0 | ||
LLD_ | See below2 | ||
|
Global general matrix A of order 3 with block sizes 1 × 1:
B,D 0 1 2 * * 0 | 33.0 | 16.0 | 72.0 | | -------|---------|------- | 1 | -24.0 | -10.0 | -57.0 | | -------|---------|------- | 2 | -8.0 | -4.0 | -17.0 | * *
The following is the 2 × 2 process grid:
B,D | 0 2 | 1 |
---|---|---|
0
2 | P00 | P01 |
1 | P10 | P11 |
Local arrays for A:
p,q | 0 | 1 -----|--------------|-------- 0 | 33.0 72.0 | 16.0 | -8.0 -17.0 | -4.0 -----|--------------|-------- 1 | -24.0 -57.0 | -10.0
Output:
Global general matrix A of order 3 with block sizes 1 × 1:
B,D 0 1 2 * * 0 | 33.00 | -37.95 | 63.25 | | --------|----------|-------- | 1 | 25.30 | -29.00 | 53.00 | | --------|----------|-------- | 2 | 0.16 | 0.00 | 2.00 | * *
The following is the 2 × 2 process grid:
B,D | 0 2 | 1 |
---|---|---|
0
2 | P00 | P01 |
1 | P10 | P11 |
Local arrays for A:
p,q | 0 | 1 -----|----------------|--------- 0 | 33.0 63.25 | -37.95 | 0.16 2.00 | 0.00 -----|----------------|--------- 1 | 25.30 53.00 | -29.00
Global row vector tau of length 2 with block sizes of 1:
B,D 0 1 * * 0 | 1.95 | 0.00 | * *
Note: | A copy of tau is distributed across each row of the process grid. |
The following is the 2 × 2 process grid:
B,D | 0 | 1 |
---|---|---|
| P00 | P01 |
| P10 | P11 |
Local arrays for tau:
p,q | 0 | 1 -----|--------|-------- 0 | 1.95 | 0.00 -----|--------|-------- 1 | 1.95 | 0.00
The value of info is 0 on all processes.
This subroutine reduces a real general matrix A of order m by n to upper or lower bidiagonal form B by an orthogonal transformation:
where:
If min(m, n) = 0, no computation is performed and the subroutine returns after doing some parameter checking.
A, d, e, tauq, taup, work | Subroutine |
Long-precision real | PDGEBRD |
Fortran | CALL PDGEBRD (m, n, a, ia, ja, desc_a, d, e, tauq, taup, work, lwork, info) |
C and C++ | pdgebrd (m, n, a, ia, ja, desc_a, d, e, tauq, taup, work, lwork, info); |
Scope: global
Specified as: a fullword integer; m >= 0
Scope: global
Specified as: a fullword integer; n >= 0.
Scope: local
Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 99. Details about the square block-cyclic data distribution of global matrix A are stored in desc_a.
Scope: global
Specified as: a fullword integer; 1 <= ia <= M_A and ia+m-1 <= M_A.
Scope: global
Specified as: a fullword integer; 1 <= ja <= N_A and ja+n-1 <= N_A.
desc_a | Name | Description | Limits | Scope |
---|---|---|---|---|
1 | DTYPE_A | Descriptor type | DTYPE_A=1 | Global |
2 | CTXT_A | BLACS context | Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP | Global |
3 | M_A | Number of rows in the global matrix | If m = 0 or n = 0:
M_A >= 0
Otherwise: M_A >= 1 | Global |
4 | N_A | Number of columns in the global matrix | If m = 0 or n = 0:
N_A >= 0
Otherwise: N_A >= 1 | Global |
5 | MB_A | Row block size | MB_A >= 1 | Global |
6 | NB_A | Column block size | NB_A >= 1 | Global |
7 | RSRC_A | The process row of the p × q grid over which the first row of the global matrix is distributed | 0 <= RSRC_A < p | Global |
8 | CSRC_A | The process column of the p × q grid over which the first column of the global matrix is distributed | 0 <= CSRC_A < q | Global |
9 | LLD_A | The leading dimension of the local array | LLD_A >= max(1,LOCp(M_A)) | Local |
Specified as: an array of (at least) length 9, containing fullword integers.
If lwork = 0, work is ignored.
If lwork <> 0, work is the work area used by this subroutine, where:
Scope: local
Specified as: an area of storage containing numbers of data type indicated in Table 99.
Scope:
Specified as: a fullword integer; where:
lwork >= nb(mp0+nq0+1)+nq0
where:
See "Function", for more information.
Scope: local
Returned as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 99. Details about the square block-cyclic data distribution of global matrix A are stored in desc_a.
This identifies the first element of the local array D. This subroutine computes the location of the first element of the local subarray used, based on ja, desc_a, p, q, myrow, and mycol; therefore, the leading 1 by LOCq(ja+n-1) part of the local array D must contain the local pieces of the leading 1 by ja+n-1 part of the global matrix D.
A copy of the vector d, with a block size of NB_A and global index ja, is returned to each row of the process grid. The process column over which the first column of d is distributed is CSRC_A.
This identifies the first element of the local array D. This subroutine computes the location of the first element of the local subarray used, based on ia, desc_a, p, q, myrow, and mycol; therefore, the leading LOCp(ia+m-1) by 1 part of the local array D must contain the local pieces of the leading ia+m-1 by 1 part of the global matrix D.
A copy of the vector d, with a block size of MB_A and global index ia, is returned to each column of the process grid. The process row over which the first row of d is distributed is RSRC_A.
Scope: local
Returned as: a 1 by (at least) LOCq(N_A) array if m >= n, and a LOCp(M_A) by 1 array if m < n, containing numbers of the data type indicated in Table 99.
This identifies the first element of the local array E. This subroutine computes the location of the first element of the local subarray used, based on ia, desc_a, p, q, myrow, and mycol; therefore, the leading LOCp(ia+n-2) by 1 part of the local array E must contain the local pieces of the leading ia+n-2 by 1 part of the global matrix E.
A copy of the vector e, with a block size of MB_A and global index ia, is returned to each column of the process grid. The process row over which the first row of e is distributed is RSRC_A.
This identifies the first element of the local array D. This subroutine computes the location of the first element of the local subarray used, based on ja, desc_a, p, q, myrow, and mycol; therefore, the leading 1 by LOCq(ja+m-2) part of the local array E must contain the local pieces of the leading 1 by ja+m-2 part of the global matrix E.
A copy of the vector e, with a block size of NB_A and global index ja, is returned to each row of the process grid. The process column over which the first column of e is distributed is CSRC_A.
Scope: local
Returned as: an (at least) LOCp(N_A-1) by 1 array if m >= n and a 1 by (at least) LOCq(M_A-1) array if m < n, containing numbers of the data type indicated in Table 99.
contains the scalar factors of the elementary reflectors which represent the orthogonal matrix Q. See "Function" for more details.
This identifies the first element of the local array tauq. This subroutine computes the location of the first element of the local subarray used, based on ja, desc_a, p, q, myrow, and mycol; therefore, the leading 1 by LOCq(ja+min(m, n)-1) part of the local array tauq must contain the local pieces of the leading 1 by ja+min(m, n)-1 part of the global matrix tauq.
A copy of the vector tauq, with a block size of NB_A and global index ja, is returned to each row of the process grid. The process column over which the first column of tauq is distributed is CSRC_A.
Scope: local
Returned as: a 1 by (at least) LOCq(min(M_A, N_A)) array, containing numbers of the data type indicated in Table 99.
contains the scalar factors of the elementary reflectors which represent the orthogonal matrix P. See "Function" for more details.
This identifies the first element of the local array taup. This subroutine computes the location of the first element of the local subarray used, based on ia, desc_a, p, q, myrow, and mycol; therefore, the leading LOCp(ia+min(m, n)-1) by 1 part of the local array taup must contain the local pieces of the leading ia+min(m, n)-1 by 1 part of the global matrix taup.
A copy of the vector taup, with a block size of MB_A and global index ia, is returned to each column of the process grid. The process row over which the first row of taup is distributed is RSRC_A.
Scope: local
Returned as: an (at least) LOCp(min(M_A, N_A)) by 1 array, containing numbers of the data type indicated in Table 99.
If lwork <> 0 and lwork <> -1, its size is (at least) of length lwork.
If lwork = -1, its size is (at least) of length 1.
Scope: local
Returned as: an area of storage, where:
If lwork >= 1 or lwork = -1, then work1 is set to the minimum lwork value and contains numbers of the data type indicated in Table 99. Except for work1, the contents of work are overwritten on return.
Scope: global
Returned as: a fullword integer; info = 0.
This subroutine reduces a real general matrix A of order m by n to upper or lower bidiagonal form B by an orthogonal transformation:
where:
where:
The following example shows the contents of A on return with ia = ja = 1, m = 6, and n = 5:
where:
where:
The following example shows the contents of A on return with ia = ja = 1, m = 5, and n = 6:
where:
None
If m <> 0 and n <> 0:
In all cases:
where:
Each of the following global input arguments are checked to determine whether its value differs from the value specified on process P00:
Also:
This example shows the reduction of a general matrix of order 4 by 3 to bidiagonal form using a 2 × 2 process grid.
Note: | Because lwork = 0, PDGEBRD dynamically allocates the work area used by this subroutine. |
ORDER = 'R' NPROW = 2 NPCOL = 2 CALL BLACS_GET(0, 0, ICONTXT) CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL) CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL) M N A IA JA DESC_A D E TAUQ TAUP WORK LWORK INFO | | | | | | | | | | | | | CALL PDGEBRD( 4 , 3 , A , 1 , 1 , DESC_A , D , E , TAUQ , TAUP , WORK , 0 , INFO )
| DESC_A | ||
---|---|---|---|
DTYPE_ | 1 | ||
CTXT_ | icontxt1 | ||
M_ | 4 | ||
N_ | 3 | ||
MB_ | 2 | ||
NB_ | 2 | ||
RSRC_ | 0 | ||
CSRC_ | 0 | ||
LLD_ | See below2 | ||
|
Global general matrix A of order 4 × 3 with block sizes 2 × 2:
B,D 0 1 * * 0 | 10.0 5.0 | 9.0 | | 2.0 16.0 | 10.0 | | -----------|------ | 1 | 3.0 7.0 | 21.0 | | 4.0 8.0 | 12.0 | * *
The following is the 2 × 2 process grid:
B,D | 0 | 1 |
---|---|---|
0 | P00 | P01 |
1 | P10 | P11 |
Local arrays for A:
p,q | 0 | 1 -----|------------|------- 0 | 10.0 5.0 | 9.0 | 2.0 16.0 | 10.0 -----|------------|------- 1 | 3.0 7.0 | 21.0 | 4.0 8.0 | 12.0
Output:
Global general matrix A of order 4 × 3 with block sizes 2 × 2:
B,D 0 1 * * 0 | -11.36 22.80 | 0.56 | | 0.09 23.32 | 1.67 | | ---------------|-------- | 1 | 0.14 0.46 | -9.68 | | 0.19 0.22 | 0.08 | * *
The following is the 2 × 2 process grid:
B,D | 0 | 1 |
---|---|---|
0 | P00 | P01 |
1 | P10 | P11 |
Local arrays for A:
p,q | 0 | 1 -----|----------------|--------- 0 | -11.36 22.80 | 0.56 | 0.09 23.32 | 1.67 -----|----------------|--------- 1 | 0.14 0.46 | -9.68 | 0.19 0.22 | 0.08
Global row vector D of length 3 with block size 2:
B,D 0 1 * * 0 | -11.36 23.32 | -9.68 | * *
Note: | A copy of D is distributed across each row of the process grid. |
The following is the 2 × 2 process grid:
B,D | 0 | 1 |
---|---|---|
| P00 | P01 |
| P10 | P11 |
Local arrays for D:
p,q | 0 | 1 -----|----------------|--------- 0 | -11.36 23.32 | -9.68 -----|----------------|--------- 1 | -11.36 23.32 | -9.68
Global column vector E of length 2 with block size 2:
B,D 0 * * 0 | 22.80 | | 1.67 | * *
Note: | A copy of E is distributed across each column of the process grid. |
The following is the 2 × 2 process grid:
B,D |
|
|
---|---|---|
0 | P00 | P01 |
| P10 | P11 |
Local arrays for E:
p,q | 0 | 1 -----|--------|-------- 0 | 22.80 | 22.80 | 1.67 | 1.67 -----|--------|-------- 1 | . | .
Global row vector tauq of length 3 with block size 2:
B,D 0 1 * * 0 | 1.88 1.59 | 1.99 | * *
Note: | A copy of tauq is distributed across each row of the process grid. |
The following is the 2 × 2 process grid:
B,D | 0 | 1 |
---|---|---|
| P00 | P01 |
| P10 | P11 |
Local arrays for tauq:
p,q | 0 | 1 -----|--------------|-------- 0 | 1.88 1.59 | 1.99 -----|--------------|-------- 1 | 1.88 1.59 | 1.99
Global column vector taup of length 3 with block size 2:
B,D 0 * * 0 | 1.52 | | 0.00 | | ----- | 1 | 0.00 | * *
Note: | A copy of taup is distributed across each column of the process grid. |
The following is the 2 × 2 process grid:
B,D |
|
|
---|---|---|
0 | P00 | P01 |
1 | P10 | P11 |
Local arrays for taup:
p,q | 0 | 1 -----|--------|-------- 0 | 1.52 | 1.52 | 0.00 | 0.00 -----|--------|-------- 1 | 0.00 | 0.00
The value of info is 0 on all processes.