Guide and Reference

Reference Information (Message Passing)

This part of the book provides reference information for coding the Parallel ESSL message passing subroutines. It is organized into seven areas:

Level 2 PBLAS
Level 3 PBLAS
Linear Algebraic Equations
Eigensystem Analysis and Singular Value Analysis
Fourier Transforms
Random Number Generation
Utilities

Level 2 PBLAS (Message Passing)

This chapter describes the Level 2 PBLAS subroutines.

Overview of the Level 2 PBLAS Subroutines

The Level 2 PBLAS include a subset of the standard set of distributed memory parallel versions of the Level 2 BLAS.
Note: These subroutines are designed in accordance with the proposed Level 2 PBLAS standard. (See references [14], [15], and [17].) If these subroutines do not comply with the standard as approved, IBM will consider updating them to do so. If IBM updates these subroutines, the update could require modifications of the calling application program.

Table 36. List of Level 2 PBLAS (Message Passing)

Descriptive Name	Long-Precision Subprogram	Page
Matrix-Vector Product for a General Matrix or Its Transpose	PDGEMV	PDGEMV--Matrix-Vector Product for a General Matrix or Its Transpose
Matrix-Vector Product for a Real Symmetric Matrix	PDSYMV	PDSYMV--Matrix-Vector Product for a Real Symmetric Matrix
Rank-One Update of a General Matrix	PDGER	PDGER--Rank-One Update of a General Matrix
Rank-One Update of a Real Symmetric Matrix	PDSYR	PDSYR--Rank-One Update of a Real Symmetric Matrix
Rank-Two Update of a Real Symmetric Matrix	PDSYR2	PDSYR2--Rank-Two Update of a Real Symmetric Matrix
Matrix-Vector Product for a Triangular Matrix or Its Transpose	PDTRMV	PDTRMV--Matrix-Vector Product for a Triangular Matrix or Its Transpose
Solution of Triangular System of Equations with a Single Right-Hand Side	PDTRSV	PDTRSV--Solution of Triangular System of Equations with a Single Right-Hand Side

Level 2 PBLAS Subroutines

This section contains the Level 2 PBLAS subroutine descriptions.

PDGEMV--Matrix-Vector Product for a General Matrix or Its Transpose

This subroutine computes one of the following matrix-vector products:

y <-- alpha A x + beta y

y <-- alpha A^T x + beta y

where, in the formulas above:

A represents the global general submatrix A_{ia:ia+m-1, ja:ja+n-1}.

x represents the global vector:

For transa = 'N':
- For incx = M_X, it is X_{ix:ix, jx:jx+n-1}.
- For incx = 1 and incx <> M_X, it is X_{ix:ix+n-1, jx:jx}.
For transa = 'T':
- For incx = M_X, it is X_{ix:ix, jx:jx+m-1}.
- For incx = 1 and incx <> M_X, it is X_{ix:ix+m-1, jx:jx}.

y represents the global vector:

For transa = 'N':
- For incy = M_Y, it is Y_{iy:iy, jy:jy+m-1}.
- For incy = 1 and incy <> M_Y, it is Y_{iy:iy+m-1, jy:jy}.
For transa = 'T':
- For incy = M_Y, it is Y_{iy:iy, jy:jy+n-1}.
- For incy = 1 and incy <> M_Y, it is Y_{iy:iy+n-1, jy:jy}.

alpha and beta are scalars.

Note: No data should be moved to form the matrix transpose; that is, the matrix should always be stored in its untransposed form.

In the following three cases, no computation is performed and the subroutine returns after doing some parameter checking:

m = 0
n = 0
alpha is zero and beta is one.
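
The following Python sketch is a serial model of the computation described above. It is intended only to illustrate the meaning of transa and the quick-return cases; it is not part of Parallel ESSL and models no data distribution. The function name gemv_reference and the use of plain Python lists are assumptions made for illustration. The sample data is taken from Example 1 later in this section.

```python
def gemv_reference(transa, m, n, alpha, a, x, beta, y):
    """Serial model of y <-- alpha*op(A)*x + beta*y on plain Python lists.

    a is the m-by-n submatrix stored as a list of rows. It is never
    transposed in storage; for transa = 'T' (or 'C') it is only indexed
    differently.
    """
    t = transa.upper()
    if t not in ("N", "T", "C"):
        raise ValueError("transa must be 'N', 'T', or 'C'")
    # Quick-return cases: no computation is performed.
    if m == 0 or n == 0 or (alpha == 0.0 and beta == 1.0):
        return list(y)
    if t == "N":
        rows, cols = m, n
        elem = lambda i, j: a[i][j]
    else:
        rows, cols = n, m
        elem = lambda i, j: a[j][i]  # read A^T without moving data
    result = []
    for i in range(rows):
        dot = sum(elem(i, j) * x[j] for j in range(cols))
        # When beta is zero, the incoming value of y is not read.
        prior = beta * y[i] if beta != 0.0 else 0.0
        result.append(alpha * dot + prior)
    return result


# Data from Example 1: the 4 x 5 submatrix A, vector x, and vector y.
A = [[ 1.0, -1.0, -1.0,  1.0,  2.0],
     [-3.0,  2.0,  2.0,  2.0,  0.0],
     [ 4.0,  0.0, -2.0,  1.0, -1.0],
     [-1.0, -1.0,  1.0, -3.0,  2.0]]
x = [-1.0, 2.0, 0.0, -1.0, 2.0]
y = [0.5, 0.5, 0.5, 0.5]
print(gemv_reference("N", 4, 5, 1.0, A, x, 2.0, y))  # [1.0, 6.0, -6.0, 7.0]
```

The printed values agree with the output vector shown for Example 1.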

See references [14] and [15].

Table 37. Data Types

alpha, beta, A, x, y	Subprogram
Long-precision real	PDGEMV

Syntax

Fortran	CALL PDGEMV (`transa`, `m`, `n`, `alpha`, `a`, `ia`, `ja`, `desc_a`, `x`, `ix`, `jx`, `desc_x`, `incx`, `beta`, `y`, `iy`, `jy`, `desc_y`, `incy`)
C and C++	pdgemv (`transa`, `m`, `n`, `alpha`, `a`, `ia`, `ja`, `desc_a`, `x`, `ix`, `jx`, `desc_x`, `incx`, `beta`, `y`, `iy`, `jy`, `desc_y`, `incy`);

On Entry

transa

indicates the form of matrix A to use in the computation, where:

If transa = 'N', A is used in the computation.

If transa = 'T', A^T is used in the computation.

Scope: global

Specified as: a single character; transa = 'N' or 'T'.

m

is the number of rows in submatrix A used in the computation, and:

If transa = 'N', it is the number of elements in vector y.

If transa = 'T', it is the number of elements in vector x.

Scope: global

Specified as: a fullword integer; m >= 0.

n

is the number of columns in submatrix A used in the computation, and:

If transa = 'N', it is the number of elements in vector x.

If transa = 'T', it is the number of elements in vector y.

Scope: global

Specified as: a fullword integer; n >= 0.

alpha

is the scalar alpha.

Scope: global

Specified as: a number of the data type indicated in Table 37.

a

is the local part of the global general matrix A. This identifies the first element of the local array A. This subroutine computes the location of the first element of the local subarray used, based on ia, ja, desc_a, p, q, myrow, and mycol; therefore, the leading LOCp(ia+m-1) by LOCq(ja+n-1) part of the local array A must contain the local pieces of the leading ia+m-1 by ja+n-1 part of the global matrix.

Note:

No data should be moved to form A^T; that is, the matrix A should always be stored in its untransposed form.

Scope: local

Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 37. Details about the block-cyclic data distribution of global matrix A are stored in desc_a.

ia

is the row index of the global matrix A, identifying the first row of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ia <= M_A and ia+m-1 <= M_A.

ja

is the column index of the global matrix A, identifying the first column of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ja <= N_A and ja+n-1 <= N_A.

desc_a

is the array descriptor for global matrix A, described in the following table:

`desc_a`	Name	Description	Limits	Scope
1	DTYPE_A	Descriptor type	DTYPE_A=1	Global
2	CTXT_A	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_A	Number of rows in the global matrix	If `m` = 0 or `n` = 0: M_A >= 0 Otherwise: M_A >= 1	Global
4	N_A	Number of columns in the global matrix	If `m` = 0 or `n` = 0: N_A >= 0 Otherwise: N_A >= 1	Global
5	MB_A	Row block size	MB_A >= 1	Global
6	NB_A	Column block size	NB_A >= 1	Global
7	RSRC_A	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_A < `p`	Global
8	CSRC_A	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_A < `q`	Global
9	LLD_A	The leading dimension of the local array	LLD_A >= max(1,LOCp(M_A))	Local

Specified as: an array of (at least) length 9, containing fullword integers.
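
As a minimal sketch of the descriptor layout in the table above, the following Python function assembles the nine entries in order and checks the limits that do not depend on the call arguments m and n. The name make_descriptor, the grid dimensions p and q passed as arguments, and the placeholder context value 0 are assumptions made for illustration. The check that LLD_ is at least max(1, LOCp(M_)) is omitted because it requires the NUMROC computation.

```python
def make_descriptor(ctxt, m, n, mb, nb, rsrc, csrc, lld, p, q):
    """Return a 9-element type-1 array descriptor as a plain list.

    Entries follow the order of the descriptor table (1-based there,
    0-based in the returned list). p and q are the process grid
    dimensions, used only to range-check RSRC_ and CSRC_.
    """
    checks = [
        (m >= 0 and n >= 0, "M_ and N_ must not be negative"),
        (mb >= 1 and nb >= 1, "MB_ and NB_ must be at least 1"),
        (0 <= rsrc < p, "RSRC_ must satisfy 0 <= RSRC_ < p"),
        (0 <= csrc < q, "CSRC_ must satisfy 0 <= CSRC_ < q"),
        (lld >= 1, "LLD_ must be at least 1"),
    ]
    for ok, message in checks:
        if not ok:
            raise ValueError(message)
    dtype = 1  # DTYPE_ = 1 is the only value the table allows
    return [dtype, ctxt, m, n, mb, nb, rsrc, csrc, lld]


# Descriptor for the 6 x 5 matrix A of Example 1 on a 2 x 2 grid.
# The context value 0 is a placeholder for the value returned by
# BLACS_GRIDINIT.
desc_a = make_descriptor(ctxt=0, m=6, n=5, mb=3, nb=2,
                         rsrc=0, csrc=0, lld=3, p=2, q=2)
print(desc_a)  # [1, 0, 6, 5, 3, 2, 0, 0, 3]
```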

x

is the local part of the global matrix X. This identifies the first element of the local array X. This subroutine computes the location of the first element of the local subarray used, based on ix, jx, desc_x, p, q, myrow, and mycol; therefore, assuming the following:

If transa = 'N', numx = n

If transa = 'T', numx = m

the following must be true:

If incx = M_X, the leading LOCp(ix) by LOCq(jx+numx-1) part of the local array X must contain the local pieces of the leading ix by jx+numx-1 part of the global matrix.
If incx = 1 and incx <> M_X, the leading LOCp(ix+numx-1) by LOCq(jx) part of the local array X must contain the local pieces of the leading ix+numx-1 by jx part of the global matrix.

Scope: local

Specified as: an LLD_X by (at least) LOCq(N_X) array, containing numbers of the data type indicated in Table 37. Details about the block-cyclic data distribution of the global matrix X are stored in desc_x.

ix

has the following meaning:

If incx = M_X, it indicates which row of global matrix X is used for vector x.

If incx = 1 and incx <> M_X, it is the row index of global matrix X, identifying the first element of vector x.

Scope: global

Specified as: a fullword integer; 1 <= ix <= M_X, and if incx = 1 and incx <> M_X, then:

If transa = 'N', then ix+n-1 <= M_X.

If transa = 'T', then ix+m-1 <= M_X.

jx

has the following meaning:

If incx = M_X, it is the column index of global matrix X, identifying the first element of vector x.

If incx = 1 and incx <> M_X, it indicates which column of global matrix X is used for vector x.

Scope: global

Specified as: a fullword integer; 1 <= jx <= N_X, and if incx = M_X, then:

If transa = 'N', then jx+n-1 <= N_X.

If transa = 'T', then jx+m-1 <= N_X.

desc_x

is the array descriptor for global matrix X, described in the following table:

`desc_x`	Name	Description	Limits	Scope
1	DTYPE_X	Descriptor type	DTYPE_X=1	Global
2	CTXT_X	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_X	Number of rows in the global matrix	If `transa` = 'N' and `n` = 0: M_X >= 0 If `transa` = 'T' and `m` = 0: M_X >= 0 Otherwise: M_X >= 1	Global
4	N_X	Number of columns in the global matrix	If `transa` = 'N' and `n` = 0: N_X >= 0 If `transa` = 'T' and `m` = 0: N_X >= 0 Otherwise: N_X >= 1	Global
5	MB_X	Row block size	MB_X >= 1	Global
6	NB_X	Column block size	NB_X >= 1	Global
7	RSRC_X	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_X < `p`	Global
8	CSRC_X	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_X < `q`	Global
9	LLD_X	The leading dimension of the local array	LLD_X >= max(1,LOCp(M_X))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

incx

is the stride for global vector x.

Scope: global

Specified as: a fullword integer; incx = 1 or incx = M_X, where:

If incx = M_X, then x is a row-distributed vector.

If incx = 1 and incx <> M_X, then x is a column-distributed vector.
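
The way ix, jx, and incx select vector x within global matrix X can be sketched as follows. The function name vector_elements is hypothetical, and numx stands for n or m according to transa, as in the description of argument x. Testing incx = M_X first reproduces the rule that incx = 1 selects a column only when incx <> M_X.

```python
def vector_elements(ix, jx, incx, m_x, numx):
    """List the 1-based global (row, column) positions in X that form x."""
    if incx == m_x:
        # Row-distributed vector: numx consecutive elements along row ix.
        return [(ix, jx + k) for k in range(numx)]
    if incx == 1:
        # Column-distributed vector: numx consecutive elements down column jx.
        return [(ix + k, jx) for k in range(numx)]
    raise ValueError("incx must be 1 or M_X")


# Example 1: incx = 1, ix = 1, jx = 2, M_X = 5, numx = n = 5.
print(vector_elements(1, 2, 1, 5, 5))
# Example 2: incx = M_X = 5, ix = 4, jx = 2, numx = n = 3.
print(vector_elements(4, 2, 5, 5, 3))
```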

beta

is the scalar beta.

Scope: global

Specified as: a number of the data type indicated in Table 37.

y

is the local part of the global matrix Y. This identifies the first element of the local array Y. This subroutine computes the location of the first element of the local subarray used, based on iy, jy, desc_y, p, q, myrow, and mycol; therefore, assuming the following:

If transa = 'N', numy = m

If transa = 'T', numy = n

the following must be true:

If incy = M_Y, the leading LOCp(iy) by LOCq(jy+numy-1) part of the local array Y must contain the local pieces of the leading iy by jy+numy-1 part of the global matrix.
If incy = 1 and incy <> M_Y, the leading LOCp(iy+numy-1) by LOCq(jy) part of the local array Y must contain the local pieces of the leading iy+numy-1 by jy part of the global matrix.

When beta is zero, y need not be set on input.

Scope: local

Specified as: an LLD_Y by (at least) LOCq(N_Y) array, containing numbers of the data type indicated in Table 37. Details about the block-cyclic data distribution of the global matrix Y are stored in desc_y.

iy

has the following meaning:

If incy = M_Y, it indicates which row of global matrix Y is used for vector y.

If incy = 1 and incy <> M_Y, it is the row index of global matrix Y, identifying the first element of vector y.

Scope: global

Specified as: a fullword integer; 1 <= iy <= M_Y, and if incy = 1 and incy <> M_Y, then:

If transa = 'N', then iy+m-1 <= M_Y.

If transa = 'T', then iy+n-1 <= M_Y.

jy

has the following meaning:

If incy = M_Y, it is the column index of global matrix Y, identifying the first element of vector y.

If incy = 1 and incy <> M_Y, it indicates which column of global matrix Y is used for vector y.

Scope: global

Specified as: a fullword integer; 1 <= jy <= N_Y, and if incy = M_Y, then:

If transa = 'N', then jy+m-1 <= N_Y.

If transa = 'T', then jy+n-1 <= N_Y.

desc_y

is the array descriptor for global matrix Y, described in the following table:

`desc_y`	Name	Description	Limits	Scope
1	DTYPE_Y	Descriptor type	DTYPE_Y=1	Global
2	CTXT_Y	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_Y	Number of rows in the global matrix	If `transa` = 'N' and `m` = 0: M_Y >= 0 If `transa` = 'T' and `n` = 0: M_Y >= 0 Otherwise: M_Y >= 1	Global
4	N_Y	Number of columns in the global matrix	If `transa` = 'N' and `m` = 0: N_Y >= 0 If `transa` = 'T' and `n` = 0: N_Y >= 0 Otherwise: N_Y >= 1	Global
5	MB_Y	Row block size	MB_Y >= 1	Global
6	NB_Y	Column block size	NB_Y >= 1	Global
7	RSRC_Y	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_Y < `p`	Global
8	CSRC_Y	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_Y < `q`	Global
9	LLD_Y	The leading dimension of the local array	LLD_Y >= max(1,LOCp(M_Y))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

incy

is the stride for global vector y.

Scope: global

Specified as: a fullword integer; incy = 1 or incy = M_Y, where:

If incy = M_Y, then y is a row-distributed vector.

If incy = 1 and incy <> M_Y, then y is a column-distributed vector.

On Return

y

is the updated local part of the global matrix Y, containing the results of the computation.

Scope: local

Returned as: an LLD_Y by (at least) LOCq(N_Y) array, containing numbers of the data type indicated in Table 37.

Notes and Coding Rules

This subroutine accepts lowercase letters for the transa argument.
If you specify 'C' for the transa, it is interpreted as though you specified 'T'.
The matrix and vectors must have no common elements; otherwise, results are unpredictable.
The NUMROC utility subroutine can be used to determine the values of LOCp(M_) and LOCq(N_) used in the argument descriptions above. For details, see "Determining the Number of Rows and Columns in Your Local Arrays" and NUMROC--Compute the Number of Rows or Columns of a Block-Cyclically Distributed Matrix Contained in a Process.
For suggested block sizes, see "Coding Tips for Optimizing Parallel Performance".
The following values must be equal: CTXT_A = CTXT_X = CTXT_Y.
The following coding rules depend upon the values specified for transa and incx:
- If transa = 'N' and incx = M_X:
  - The following block sizes must be equal: NB_A = NB_X.
  - In the process grid, the process column containing the first column of the submatrix X must also contain the first column of the submatrix A; that is, iacol = ixcol, where:
    
    iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
    ixcol = mod((((jx-1)/NB_X)+CSRC_X), q)
  - The block column offset of x must be equal to the block column offset of A; that is, mod(jx-1, NB_X) = mod(ja-1, NB_A).
- If transa = 'N' and incx = 1( <> M_X):
  - The following block sizes must be equal: NB_A = MB_X.
  - The block row offset of x must be equal to the block column offset of A; that is, mod(ix-1, MB_X) = mod(ja-1, NB_A).
- If transa = 'T' and incx = M_X:
  - The following block sizes must be equal: MB_A = NB_X.
  - The block column offset of x must be equal to the block row offset of A; that is, mod(jx-1, NB_X) = mod(ia-1, MB_A).
- If transa = 'T' and incx = 1( <> M_X):
  - The following block sizes must be equal: MB_A = MB_X.
  - In the process grid, the process row containing the first row of the submatrix X must also contain the first row of the submatrix A; that is, iarow = ixrow, where:
    
    iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
    ixrow = mod((((ix-1)/MB_X)+RSRC_X), p)
  - The block row offset of x must be equal to the block row offset of A; that is, mod(ix-1, MB_X) = mod(ia-1, MB_A).
The following coding rules depend upon the values specified for transa and incy:
- If transa = 'N' and incy = M_Y:
  - The following block sizes must be equal: MB_A = NB_Y.
  - The block column offset of y must be equal to the block row offset of A; that is, mod(jy-1, NB_Y) = mod(ia-1, MB_A).
- If transa = 'N' and incy = 1( <> M_Y):
  - The following block sizes must be equal: MB_A = MB_Y.
  - In the process grid, the process row containing the first row of the submatrix Y must also contain the first row of the submatrix A; that is, iarow = iyrow, where:
    
    iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
    iyrow = mod((((iy-1)/MB_Y)+RSRC_Y), p)
  - The block row offset of y must be equal to the block row offset of A; that is, mod(iy-1, MB_Y) = mod(ia-1, MB_A).
- If transa = 'T' and incy = M_Y:
  - The following block sizes must be equal: NB_A = NB_Y.
  - In the process grid, the process column containing the first column of the submatrix Y must also contain the first column of the submatrix A; that is, iacol = iycol, where:
    
    iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
    iycol = mod((((jy-1)/NB_Y)+CSRC_Y), q)
  - The block column offset of y must be equal to the block column offset of A; that is, mod(jy-1, NB_Y) = mod(ja-1, NB_A).
- If transa = 'T' and incy = 1( <> M_Y):
  - The following block sizes must be equal: NB_A = MB_Y.
  - The block row offset of y must be equal to the block column offset of A; that is, mod(iy-1, MB_Y) = mod(ja-1, NB_A).
An example of the use of this subroutine in a thermal diffusion application program is shown in Appendix B. "Sample Programs". See "Program Main (Message Passing)".
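
The alignment rules above that depend on transa, incx, and incy follow one pattern: the vector is matched against either the columns or the rows of A, and the process coordinate is compared only when both lie along the same grid dimension. The following Python sketch encodes that pattern for checking arguments before a call. The function name alignment_violations, the dictionary representation of the descriptors, and the message strings are assumptions made for illustration; the subroutine performs its own checks.

```python
def alignment_violations(a_uses_cols, vec_is_row, ia, ja, desc_a,
                         iv, jv, desc_v, p, q):
    """Return the violated alignment rules for one vector (empty if none).

    For x, pass a_uses_cols = (transa == 'N'); for y, pass
    a_uses_cols = (transa != 'N'). vec_is_row is (inc == M_ of the vector).
    Descriptors are dicts with keys MB, NB, RSRC, CSRC.
    """
    # Each side: (block size, 1-based start index, source process, extent, axis)
    if a_uses_cols:
        a_side = (desc_a["NB"], ja, desc_a["CSRC"], q, "column")
    else:
        a_side = (desc_a["MB"], ia, desc_a["RSRC"], p, "row")
    if vec_is_row:
        v_side = (desc_v["NB"], jv, desc_v["CSRC"], q, "column")
    else:
        v_side = (desc_v["MB"], iv, desc_v["RSRC"], p, "row")

    problems = []
    if a_side[0] != v_side[0]:
        problems.append("block sizes differ")
    if (a_side[1] - 1) % a_side[0] != (v_side[1] - 1) % v_side[0]:
        problems.append("block offsets differ")
    if a_side[4] == v_side[4]:
        # Integer division, as in the iarow/iacol formulas above.
        owner_a = ((a_side[1] - 1) // a_side[0] + a_side[2]) % a_side[3]
        owner_v = ((v_side[1] - 1) // v_side[0] + v_side[2]) % v_side[3]
        if owner_a != owner_v:
            problems.append("first process " + a_side[4] + " differs")
    return problems


# Descriptors and arguments of Example 1 (transa = 'N', incx = incy = 1).
desc_a = {"MB": 3, "NB": 2, "RSRC": 0, "CSRC": 0}
desc_x = {"MB": 2, "NB": 2, "RSRC": 0, "CSRC": 0}
desc_y = {"MB": 3, "NB": 2, "RSRC": 0, "CSRC": 0}
print(alignment_violations(True, False, 3, 1, desc_a, 1, 2, desc_x, 2, 2))
print(alignment_violations(False, False, 3, 1, desc_a, 3, 2, desc_y, 2, 2))
```

Both calls return an empty list for the Example 1 arguments. Changing ix from 1 to 2 would report a block offset mismatch.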

Error Conditions

Computational Errors

None

Resource Errors

Unable to allocate work space

Input-Argument and Miscellaneous Errors

Stage 1

DTYPE_A is invalid.
DTYPE_X is invalid.
DTYPE_Y is invalid.

Stage 2

CTXT_A is invalid.

Stage 3

PDGEMV was called from outside the process grid.

Stage 4

transa <> 'N', 'T', or 'C'
m < 0
n < 0
M_A < 0 and (m = 0 or n = 0); M_A < 1 otherwise
N_A < 0 and (m = 0 or n = 0); N_A < 1 otherwise
MB_A < 1
NB_A < 1
RSRC_A < 0 or RSRC_A >= p
CSRC_A < 0 or CSRC_A >= q
ia < 1
ja < 1

If (n = 0 and transa = 'N') or (m = 0 and transa = 'T'):
M_X < 0
N_X < 0

Otherwise:
M_X < 1
N_X < 1

In all cases:
MB_X < 1
NB_X < 1
RSRC_X < 0 or RSRC_X >= p
CSRC_X < 0 or CSRC_X >= q
CTXT_A <> CTXT_X
ix < 1
jx < 1

If (m = 0 and transa = 'N') or (n = 0 and transa = 'T'):
M_Y < 0
N_Y < 0

Otherwise:
M_Y < 1
N_Y < 1

In all cases:
MB_Y < 1
NB_Y < 1
RSRC_Y < 0 or RSRC_Y >= p
CSRC_Y < 0 or CSRC_Y >= q
CTXT_A <> CTXT_Y
iy < 1
jy < 1

Stage 5

If m <> 0 and n <> 0:

ia > M_A
ja > N_A
ia+m-1 > M_A
ja+n-1 > N_A

If (n <> 0 and transa = 'N') or (m <> 0 and transa = 'T'):
ix > M_X
jx > N_X

If (m <> 0 and transa = 'N') or (n <> 0 and transa = 'T'):
iy > M_Y
jy > N_Y

If incx = M_X and transa = 'N':
NB_X <> NB_A
mod(jx-1, NB_X) <> mod(ja-1, NB_A)
n <> 0 and jx+n-1 > N_X

If incx = M_X and transa = 'T':
NB_X <> MB_A
mod(jx-1, NB_X) <> mod(ia-1, MB_A)
m <> 0 and jx+m-1 > N_X

If incx = 1( <> M_X) and transa = 'N':
MB_X <> NB_A
mod(ix-1, MB_X) <> mod(ja-1, NB_A)
n <> 0 and ix+n-1 > M_X

If incx = 1( <> M_X) and transa = 'T':
MB_X <> MB_A
mod(ix-1, MB_X) <> mod(ia-1, MB_A)
m <> 0 and ix+m-1 > M_X

In all cases:
incx <> M_X and incx <> 1

If incy = M_Y and transa = 'N':
NB_Y <> MB_A
mod(jy-1, NB_Y) <> mod(ia-1, MB_A)
m <> 0 and jy+m-1 > N_Y

If incy = M_Y and transa = 'T':
NB_Y <> NB_A
mod(jy-1, NB_Y) <> mod(ja-1, NB_A)
n <> 0 and jy+n-1 > N_Y

If incy = 1( <> M_Y) and transa = 'N':
MB_Y <> MB_A
mod(iy-1, MB_Y) <> mod(ia-1, MB_A)
m <> 0 and iy+m-1 > M_Y

If incy = 1( <> M_Y) and transa = 'T':
MB_Y <> NB_A
mod(iy-1, MB_Y) <> mod(ja-1, NB_A)
n <> 0 and iy+n-1 > M_Y

In all cases:
incy <> M_Y and incy <> 1

Stage 6

If transa = 'N':

If incx = M_X, then (in the process grid) the process column containing the first column of the submatrix X does not contain the first column of the submatrix A; that is, iacol <> ixcol, where:

iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
ixcol = mod((((jx-1)/NB_X)+CSRC_X), q)
If incy = 1( <> M_Y), then (in the process grid) the process row containing the first row of the submatrix Y does not contain the first row of the submatrix A; that is, iarow <> iyrow, where:

iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
iyrow = mod((((iy-1)/MB_Y)+RSRC_Y), p)

If transa = 'T':
If incx = 1( <> M_X), then (in the process grid) the process row containing the first row of the submatrix X does not contain the first row of the submatrix A; that is, iarow <> ixrow, where:

iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
ixrow = mod((((ix-1)/MB_X)+RSRC_X), p)
If incy = M_Y, then (in the process grid) the process column containing the first column of the submatrix Y does not contain the first column of the submatrix A; that is, iacol <> iycol, where:

iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
iycol = mod((((jy-1)/NB_Y)+CSRC_Y), q)

In all cases:
LLD_A < max(1, LOCp(M_A))
LLD_X < max(1, LOCp(M_X))
LLD_Y < max(1, LOCp(M_Y))

Example 1

This example computes y = alpha A x + beta y using a 2 × 2 process grid. The input matrices A, X, and Y, used here, are the same as A, B, and C, used in "Example 1" for PDGEMM. The updated portion of Y is the same as for C in PDGEMM, as this computation is equivalent to a portion of the PDGEMM computation.

This example uses a global submatrix A within a global matrix A by specifying ia = 3 and ja = 1. It uses vectors x and y, which are column-distributed vectors within a column of X and Y, respectively, by specifying incx = 1, ix = 1, and jx = 2 for x and incy = 1, iy = 3, and jy = 2 for y.

Call Statements and Input

 ORDER = 'R'
 NPROW = 2
 NPCOL = 2
 CALL BLACS_GET (0, 0, ICONTXT)
 CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
 CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
             TRANSA M   N    ALPHA    A  IA  JA    DESC_A   X  IX  JX
               |    |   |      |      |   |   |      |      |   |   |
 CALL PDGEMV( 'N' , 4 , 5  , 1.0D0  , A , 3 , 1 ,  DESC_A , X , 1 , 2 ,
 
             DESC_X  INCX  BETA    Y  IY  JY   DESC_Y   INCY
               |      |     |      |   |   |     |       |
             DESC_X , 1 ,  2.0D0 , Y , 3 , 2 , DESC_Y ,  1 )

	Desc_A	Desc_X	Desc_Y
DTYPE_	1	1	1
CTXT_	icontxt¹	icontxt¹	icontxt¹
M_	6	5	6
N_	5	4	4
MB_	3	2	3
NB_	2	2	2
RSRC_	0	0	0
CSRC_	0	0	0
LLD_	See below²	See below²	See below²

¹ icontxt is the output of the BLACS_GRIDINIT call.
² Each process should set the LLD_ as follows:
LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
LLD_X = MAX(1,NUMROC(M_X, MB_X, MYROW, RSRC_X, NPROW))
LLD_Y = MAX(1,NUMROC(M_Y, MB_Y, MYROW, RSRC_Y, NPROW))

In this example, LLD_A = LLD_Y = 3 on all processes, LLD_X = 3 on P₀₀ and P₀₁, and LLD_X = 2 on P₁₀ and P₁₁.
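
The LLD_ values quoted above can be reproduced with a short Python model of the NUMROC computation. This is a sketch that assumes NUMROC follows the usual block-cyclic counting rule of the ScaLAPACK routine of the same name; the argument order mirrors the NUMROC calls shown in the footnote.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Count the rows (or columns) of a block-cyclically distributed
    dimension of length n, with block size nb, that process iproc holds."""
    # Distance of this process from the one holding the first block.
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb                      # number of complete blocks
    count = (nblocks // nprocs) * nb       # blocks every process receives
    extra = nblocks % nprocs               # leftover complete blocks
    if mydist < extra:
        count += nb                        # one more complete block
    elif mydist == extra:
        count += n % nb                    # the trailing partial block
    return count


# LLD_ values for Example 1 on the 2 x 2 grid (RSRC_ = 0 everywhere).
for myrow in (0, 1):
    lld_a = max(1, numroc(6, 3, myrow, 0, 2))
    lld_x = max(1, numroc(5, 2, myrow, 0, 2))
    lld_y = max(1, numroc(6, 3, myrow, 0, 2))
    print(myrow, lld_a, lld_x, lld_y)
# prints "0 3 3 3" and then "1 3 2 3"
```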

After the global matrix A is distributed over the process grid, only a portion of the global data structure is used--that is, global submatrix A. Following is the global 4 × 5 submatrix A, starting at row 3 and column 1 in global general 6 × 5 matrix A with block size 3 × 2:

B,D        0             1          2
     *                                  *
     |   .    .   |    .    .   |    .  |
 0   |   .    .   |    .    .   |    .  |
     |  1.0 -1.0  |  -1.0  1.0  |   2.0 |
     | -----------|-------------|------ |
     | -3.0  2.0  |   2.0  2.0  |   0.0 |
 1   |  4.0  0.0  |  -2.0  1.0  |  -1.0 |
     | -1.0 -1.0  |   1.0 -3.0  |   2.0 |
     *                                  *

The following is the 2 × 2 process grid:

B,D	0 2	1
0	P₀₀	P₀₁
1	P₁₀	P₁₁

Local arrays for A:

p,q  |       0         |      1
-----|-----------------|------------
     |   .    .    .   |    .    .
 0   |   .    .    .   |    .    .
     |  1.0 -1.0  2.0  |  -1.0  1.0
-----|-----------------|------------
     | -3.0  2.0  0.0  |   2.0  2.0
 1   |  4.0  0.0 -1.0  |  -2.0  1.0
     | -1.0 -1.0  2.0  |   1.0 -3.0
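
The local arrays shown above follow from the block-cyclic distribution of the global matrix. The following Python sketch splits a global matrix held as a list of rows into one local array per process. The function name distribute and the dictionary keyed by (process row, process column) are assumptions made for illustration, and None marks the global elements that this example does not reference.

```python
def distribute(global_matrix, mb, nb, rsrc, csrc, p, q):
    """Split a global matrix (list of rows) block-cyclically over a
    p x q process grid; return {(prow, pcol): local array}."""
    m = len(global_matrix)
    n = len(global_matrix[0]) if m else 0
    local = {}
    for prow in range(p):
        # 0-based global rows whose row block lands on process row prow.
        rows = [i for i in range(m) if (i // mb + rsrc) % p == prow]
        for pcol in range(q):
            cols = [j for j in range(n) if (j // nb + csrc) % q == pcol]
            local[(prow, pcol)] = [[global_matrix[i][j] for j in cols]
                                   for i in rows]
    return local


U = None  # element not referenced in Example 1
A = [[U, U, U, U, U],
     [U, U, U, U, U],
     [ 1.0, -1.0, -1.0,  1.0,  2.0],
     [-3.0,  2.0,  2.0,  2.0,  0.0],
     [ 4.0,  0.0, -2.0,  1.0, -1.0],
     [-1.0, -1.0,  1.0, -3.0,  2.0]]
local_a = distribute(A, mb=3, nb=2, rsrc=0, csrc=0, p=2, q=2)
for coords in sorted(local_a):
    print(coords, local_a[coords])
```

Process column 0 receives global columns 1, 2, and 5, which is why the value 2.0 from column 5 appears as the third local column on P₀₀.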

After the global matrix X is distributed over the process grid, only a portion of the global data structure is used--that is, global vector x, which is a column-distributed vector. Following is the global vector x of size 5 × 1, starting at row 1 and column 2 in 5 × 4 global matrix X with block size 2 × 2:

B,D        0            1
     *                        *
 0   |   .  -1.0  |    .    . |
     |   .   2.0  |    .    . |
     | -----------|---------- |
 1   |   .   0.0  |    .    . |
     |   .  -1.0  |    .    . |
     | -----------|---------- |
 2   |   .   2.0  |    .    . |
     *                        *

The following is the 2 × 2 process grid:

B,D	0	1
0 2	P₀₀	P₀₁
1	P₁₀	P₁₁

Local arrays for x:

p,q  |     0      |     1
-----|------------|-----------
     |   .  -1.0  |    .    .
 0   |   .   2.0  |    .    .
     |   .   2.0  |    .    .
-----|------------|-----------
 1   |   .   0.0  |    .    .
     |   .  -1.0  |    .    .

After the global matrix Y is distributed over the process grid, only a portion of the global data structure is used--that is, global vector y, which is a column-distributed vector. Following is the global vector y of size 4 × 1, starting at row 3 and column 2 in 6 × 4 global matrix Y with block size 3 × 2:

B,D        0            1
     *                        *
     |   .    .   |    .    . |
 0   |   .    .   |    .    . |
     |   .   0.5  |    .    . |
     | -----------|---------- |
     |   .   0.5  |    .    . |
 1   |   .   0.5  |    .    . |
     |   .   0.5  |    .    . |
     *                        *

The following is the 2 × 2 process grid:

B,D	0	1
0	P₀₀	P₀₁
1	P₁₀	P₁₁

Local arrays for y:

p,q  |     0      |     1
-----|------------|-----------
     |   .    .   |    .    .
 0   |   .    .   |    .    .
     |   .   0.5  |    .    .
-----|------------|-----------
     |   .   0.5  |    .    .
 1   |   .   0.5  |    .    .
     |   .   0.5  |    .    .

Output:

B,D        0            1
     *                        *
     |   .    .   |    .    . |
 0   |   .    .   |    .    . |
     |   .   1.0  |    .    . |
     | -----------|---------- |
     |   .   6.0  |    .    . |
 1   |   .  -6.0  |    .    . |
     |   .   7.0  |    .    . |
     *                        *

The following is the 2 × 2 process grid:

B,D	0	1
0	P₀₀	P₀₁
1	P₁₀	P₁₁

Local arrays for y:

p,q  |     0      |     1
-----|------------|-----------
     |   .    .   |    .    .
 0   |   .    .   |    .    .
     |   .   1.0  |    .    .
-----|------------|-----------
     |   .   6.0  |    .    .
 1   |   .  -6.0  |    .    .
     |   .   7.0  |    .    .

Example 2

This example computes y = alpha A x + beta y using a 2 × 2 process grid. The input matrices A, X, and Y, used here, are the same as A, B, and C, used in "Example 1" for PDGEMM.

This example uses a global submatrix A within a global matrix A by specifying ia = 2 and ja = 2. It uses vector x, which is a row-distributed vector within a row of X, by specifying incx = M_X = 5, ix = 4, and jx = 2. It uses vector y, which is a column-distributed vector within a column of Y, by specifying incy = 1, iy = 2, and jy = 3.

Call Statements and Input

 ORDER = 'R'
 NPROW = 2
 NPCOL = 2
 CALL BLACS_GET (0, 0, ICONTXT)
 CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
 CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
             TRANSA M   N    ALPHA    A  IA  JA    DESC_A   X  IX  JX
               |    |   |      |      |   |   |      |      |   |   |
 CALL PDGEMV( 'N' , 4 , 3  , 1.0D0  , A , 2 , 2 ,  DESC_A , X , 4 , 2 ,
 
             DESC_X INCX   BETA    Y  IY  JY   DESC_Y   INCY
               |      |     |      |   |   |     |       |
             DESC_X , 5 ,  2.0D0 , Y , 2 , 3 , DESC_Y ,  1 )

	Desc_A	Desc_X	Desc_Y
DTYPE_	1	1	1
CTXT_	icontxt¹	icontxt¹	icontxt¹
M_	6	5	6
N_	5	4	4
MB_	3	2	3
NB_	2	2	2
RSRC_	0	0	0
CSRC_	0	0	0
LLD_	See below²	See below²	See below²

¹ icontxt is the output of the BLACS_GRIDINIT call.

² Each process should set the LLD_ as follows:

LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
LLD_X = MAX(1,NUMROC(M_X, MB_X, MYROW, RSRC_X, NPROW))
LLD_Y = MAX(1,NUMROC(M_Y, MB_Y, MYROW, RSRC_Y, NPROW))

In this example, LLD_A = LLD_Y = 3 on all processes, LLD_X = 3 on P₀₀ and P₀₁, and LLD_X = 2 on P₁₀ and P₁₁.

After the global matrix A is distributed over the process grid, only a portion of the global data structure is used--that is, global submatrix A. Following is the global 4 × 3 submatrix A, starting at row 2 and column 2 in global general 6 × 5 matrix A with block size 3 × 2:

B,D        0             1          2
     *                                 *
     |   .    .   |    .    .   |    . |
 0   |   .   0.0  |   1.0  1.0  |    . |
     |   .  -1.0  |  -1.0  1.0  |    . |
     | -----------|-------------|----- |
     |   .   2.0  |   2.0  2.0  |    . |
 1   |   .   0.0  |  -2.0  1.0  |    . |
     |   .    .   |    .    .   |    . |
     *                                 *

The following is the 2 × 2 process grid:

B,D	0 2	1
0	P₀₀	P₀₁
1	P₁₀	P₁₁

Local arrays for A:

p,q  |       0        |      1
-----|----------------|------------
     |   .    .    .  |    .    .
 0   |   .   0.0   .  |   1.0  1.0
     |   .  -1.0   .  |  -1.0  1.0
-----|----------------|------------
     |   .   2.0   .  |   2.0  2.0
 1   |   .   0.0   .  |  -2.0  1.0
     |   .    .    .  |    .    .

After the global matrix X is distributed over the process grid, only a portion of the global data structure is used--that is, global vector x, which is a row-distributed vector. Following is the global vector x of size 1 × 3, starting at row 4 and column 2 in 5 × 4 global matrix X with block size 2 × 2:

B,D        0             1
     *                         *
 0   |   .    .   |    .    .  |
     |   .    .   |    .    .  |
     | -----------|----------- |
 1   |   .    .   |    .    .  |
     |   .  -1.0  |   1.0 -1.0 |
     | -----------|----------- |
 2   |   .    .   |    .    .  |
     *                         *

The following is the 2 × 2 process grid:

B,D	0	1
0 2	P₀₀	P₀₁
1	P₁₀	P₁₁

Local arrays for x:

p,q  |     0      |      1
-----|------------|------------
     |   .    .   |    .    .
 0   |   .    .   |    .    .
     |   .    .   |    .    .
-----|------------|------------
 1   |   .    .   |    .    .
     |   .  -1.0  |   1.0 -1.0
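
The placement of the three elements of x in the local arrays above can be checked by mapping each global index to its owning process and local index. The following Python sketch does this for a block-cyclic distribution; the function name locate is hypothetical and all indices are 1-based, as in the argument descriptions.

```python
def locate(i, j, mb, nb, rsrc, csrc, p, q):
    """Map 1-based global indices (i, j) to
    ((process row, process column), (local row, local column))."""
    def one_dim(g, blk, src, nprocs):
        block = (g - 1) // blk                 # 0-based global block index
        proc = (block + src) % nprocs          # process holding that block
        # Blocks owned by one process are nprocs apart, so the local
        # block index is block // nprocs.
        local = (block // nprocs) * blk + (g - 1) % blk + 1
        return proc, local

    prow, li = one_dim(i, mb, rsrc, p)
    pcol, lj = one_dim(j, nb, csrc, q)
    return (prow, pcol), (li, lj)


# Vector x of Example 2 occupies X(4,2), X(4,3), and X(4,4);
# X is 5 x 4 with block size 2 x 2 on a 2 x 2 grid.
for jx in (2, 3, 4):
    print((4, jx), locate(4, jx, mb=2, nb=2, rsrc=0, csrc=0, p=2, q=2))
```

The first element lands on P₁₀ at local position (2,2), and the other two land on P₁₁ at local positions (2,1) and (2,2), in agreement with the local arrays shown.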

After the global matrix Y is distributed over the process grid, only a portion of the global data structure is used--that is, global vector y, which is a column-distributed vector. Following is the global vector y of size 4 × 1, starting at row 2 and column 3 in 6 × 4 global matrix Y with block size 3 × 2:

B,D       0            1
     *                       *
     |   .    .  |    .    . |
 0   |   .    .  |   0.5   . |
     |   .    .  |   0.5   . |
     | ----------|---------- |
     |   .    .  |   0.5   . |
 1   |   .    .  |   0.5   . |
     |   .    .  |    .    . |
     *                       *

The following is the 2 × 2 process grid:

B,D	0	1
0	P₀₀	P₀₁
1	P₁₀	P₁₁

Local arrays for y:

p,q  |    0      |     1
-----|-----------|-----------
     |   .    .  |    .    .
 0   |   .    .  |   0.5   .
     |   .    .  |   0.5   .
-----|-----------|-----------
     |   .    .  |   0.5   .
 1   |   .    .  |   0.5   .
     |   .    .  |    .    .

Output:

Following is the updated global vector y of size 4 × 1, starting at row 2 and column 3 in 6 × 4 global matrix Y with block size 3 × 2:

B,D       0            1
     *                       *
     |   .    .  |    .    . |
 0   |   .    .  |   1.0   . |
     |   .    .  |   0.0   . |
     | ----------|---------- |
     |   .    .  |  -1.0   . |
 1   |   .    .  |  -2.0   . |
     |   .    .  |    .    . |
     *                       *

The following is the 2 × 2 process grid:

B,D	0	1
0	P₀₀	P₀₁
1	P₁₀	P₁₁

Local arrays for y:

p,q  |    0      |     1
-----|-----------|-----------
     |   .    .  |    .    .
 0   |   .    .  |   1.0   .
     |   .    .  |   0.0   .
-----|-----------|-----------
     |   .    .  |  -1.0   .
 1   |   .    .  |  -2.0   .
     |   .    .  |    .    .

PDSYMV--Matrix-Vector Product for a Real Symmetric Matrix

This subroutine computes the following matrix-vector product:

y <-- alpha A x + beta y

where, in the formula above:

A represents the global symmetric submatrix A_{ia:ia+n-1, ja:ja+n-1}.

x represents the global vector:

For incx = M_X, it is X_{ix:ix, jx:jx+n-1}.
For incx = 1 and incx <> M_X, it is X_{ix:ix+n-1, jx:jx}.

y represents the global vector:

For incy = M_Y, it is Y_{iy:iy, jy:jy+n-1}.
For incy = 1 and incy <> M_Y, it is Y_{iy:iy+n-1, jy:jy}.

alpha and beta are scalars.

In the following two cases, no computation is performed and the subroutine returns after doing some parameter checking:

n = 0
alpha is zero and beta is one.

See references [14] and [15].
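
The computation defined above can be illustrated with a serial reference. The following is a minimal Python sketch, not the Parallel ESSL implementation: the function name `symv_reference` and the list-of-rows storage are assumptions made for illustration. It reads only the triangle selected by uplo, applies the two quick-return cases, and does not read y when beta is zero.

```python
def symv_reference(uplo, n, alpha, a, x, beta, y):
    """Serial y <- alpha*A*x + beta*y, reading only one triangle of a."""
    # Quick-return cases: n = 0, or alpha is zero and beta is one.
    if n == 0 or (alpha == 0.0 and beta == 1.0):
        return list(y)
    upper = uplo.upper() == "U"  # lowercase letters are accepted
    result = []
    for i in range(n):
        acc = 0.0
        for j in range(n):
            # Use element (i, j) if it lies in the referenced triangle,
            # otherwise its mirror image (j, i).
            if (upper and i <= j) or (not upper and i >= j):
                aij = a[i][j]
            else:
                aij = a[j][i]
            acc += aij * x[j]
        # When beta is zero, y is not read on input.
        result.append(alpha * acc + (beta * y[i] if beta != 0.0 else 0.0))
    return result


# The strictly lower triangle holds None to show that it is never read.
a_upper = [[2.0, 1.0],
           [None, 3.0]]
print(symv_reference("U", 2, 1.0, a_upper, [1.0, 1.0], 0.0, [None, None]))
```

With the 2 × 2 input shown, the full symmetric matrix has rows (2, 1) and (1, 3), so the call prints [3.0, 4.0].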

Table 38. Data Types

alpha, beta, A, x, y Subprogram
Long-precision real PDSYMV

Syntax

Fortran	CALL PDSYMV (`uplo`, `n`, `alpha`, `a`, `ia`, `ja`, `desc_a`, `x`, `ix`, `jx`, `desc_x`, `incx`, `beta`, `y`, `iy`, `jy`, `desc_y`, `incy`)
C and C++	pdsymv (`uplo`, `n`, `alpha`, `a`, `ia`, `ja`, `desc_a`, `x`, `ix`, `jx`, `desc_x`, `incx`, `beta`, `y`, `iy`, `jy`, `desc_y`, `incy`);

On Entry

uplo

indicates whether the upper or lower triangular part of the global symmetric submatrix A is referenced, where:

If uplo = 'U', the upper triangular part is referenced.

If uplo = 'L', the lower triangular part is referenced.

Scope: global

Specified as: a single character; uplo = 'U' or 'L'.

n

is the number of rows and columns in submatrix A and the number of elements in vectors x and y used in the computation.

Scope: global

Specified as: a fullword integer; n >= 0.

alpha

is the scalar alpha.

Scope: global

Specified as: a number of the data type indicated in Table 38.

a

is the local part of the global symmetric matrix A. This identifies the first element of the local array A. This subroutine computes the location of the first element of the local subarray used, based on ia, ja, desc_a, p, q, myrow, and mycol; therefore, the leading LOCp(ia+n-1) by LOCq(ja+n-1) part of the local array A must contain the local pieces of the leading ia+n-1 by ja+n-1 part of the global matrix, and:

If uplo = 'U', the leading n × n upper triangular part of the global symmetric submatrix A_{ia:ia+n-1, ja:ja+n-1} must contain the upper triangular part of the submatrix, and the strictly lower triangular part is not referenced.
If uplo = 'L', the leading n × n lower triangular part of the global symmetric submatrix A_{ia:ia+n-1, ja:ja+n-1} must contain the lower triangular part of the submatrix, and the strictly upper triangular part is not referenced.

Scope: local

Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 38. Details about the square block-cyclic data distribution of global matrix A are stored in desc_a.

ia

is the row index of the global matrix A, identifying the first row of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ia <= M_A and ia+n-1 <= M_A.

ja

is the column index of the global matrix A, identifying the first column of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ja <= N_A and ja+n-1 <= N_A.

desc_a

is the array descriptor for global matrix A, described in the following table:

`desc_a`	Name	Description	Limits	Scope
1	DTYPE_A	Descriptor type	DTYPE_A=1	Global
2	CTXT_A	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_A	Number of rows in the global matrix	If `n` = 0: M_A >= 0 Otherwise: M_A >= 1	Global
4	N_A	Number of columns in the global matrix	If `n` = 0: N_A >= 0 Otherwise: N_A >= 1	Global
5	MB_A	Row block size	MB_A >= 1	Global
6	NB_A	Column block size	NB_A >= 1	Global
7	RSRC_A	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_A < `p`	Global
8	CSRC_A	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_A < `q`	Global
9	LLD_A	The leading dimension of the local array	LLD_A >= max(1,LOCp(M_A))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

x

is the local part of the global matrix X, where:

If incx = M_X, the leading LOCp(ix) by LOCq(jx+n-1) part of the local array X must contain the local pieces of the leading ix by jx+n-1 part of the global matrix.
If incx = 1 and incx <> M_X, the leading LOCp(ix+n-1) by LOCq(jx) part of the local array X must contain the local pieces of the leading ix+n-1 by jx part of the global matrix.

Scope: local

Specified as: an LLD_X by (at least) LOCq(N_X) array, containing numbers of the data type indicated in Table 38. Details about the block-cyclic data distribution of the global matrix X are stored in desc_x.

ix

has the following meaning:

If incx = M_X, it indicates which row of global matrix X is used for vector x.

If incx = 1 and incx <> M_X, it is the row index of global matrix X, identifying the first element of vector x.

Scope: global

Specified as: a fullword integer; 1 <= ix <= M_X and:

If incx = 1 and incx <> M_X, then ix+n-1 <= M_X.

jx

has the following meaning:

If incx = M_X, it is the column index of global matrix X, identifying the first element of vector x.

If incx = 1 and incx <> M_X, it indicates which column of global matrix X is used for vector x.

Scope: global

Specified as: a fullword integer; 1 <= jx <= N_X and:

If incx = M_X, then jx+n-1 <= N_X.

desc_x

is the array descriptor for global matrix X, described in the following table:

`desc_x`	Name	Description	Limits	Scope
1	DTYPE_X	Descriptor type	DTYPE_X=1	Global
2	CTXT_X	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_X	Number of rows in the global matrix	If `n` = 0: M_X >= 0 Otherwise: M_X >= 1	Global
4	N_X	Number of columns in the global matrix	If `n` = 0: N_X >= 0 Otherwise: N_X >= 1	Global
5	MB_X	Row block size	MB_X >= 1	Global
6	NB_X	Column block size	NB_X >= 1	Global
7	RSRC_X	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_X < `p`	Global
8	CSRC_X	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_X < `q`	Global
9	LLD_X	The leading dimension of the local array	LLD_X >= max(1,LOCp(M_X))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

incx

is the stride for global vector x.

Scope: global

Specified as: a fullword integer; incx = 1 or incx = M_X, where:

If incx = M_X, then x is a row-distributed vector.

If incx = 1 and incx <> M_X, then x is a column-distributed vector.
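
The stride rule above can be restated as a small decision function. This is an illustrative Python sketch only; the name `vector_layout` is hypothetical and is not part of the library. It shows that the test against M_ is made first, so a stride of 1 selects a column-distributed vector only when the global matrix has more than one row.

```python
def vector_layout(inc, m_global):
    """Classify a vector by its stride, following the incx/incy rules."""
    if inc == m_global:
        return "row"      # inc = M_: the vector lies along a row
    if inc == 1:
        return "column"   # inc = 1 and inc <> M_: it lies along a column
    raise ValueError("stride must be 1 or M_")


print(vector_layout(1, 8))  # column
print(vector_layout(6, 6))  # row
print(vector_layout(1, 1))  # row, because inc = M_ is tested first
```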

beta

is the scalar beta.

Scope: global

Specified as: a number of the data type indicated in Table 38.

y

is the local part of the global matrix Y, where:

If incy = M_Y, the leading LOCp(iy) by LOCq(jy+n-1) part of the local array Y must contain the local pieces of the leading iy by jy+n-1 part of the global matrix.
If incy = 1 and incy <> M_Y, the leading LOCp(iy+n-1) by LOCq(jy) part of the local array Y must contain the local pieces of the leading iy+n-1 by jy part of the global matrix.

When beta is zero, y need not be set on input.

Scope: local

Specified as: an LLD_Y by (at least) LOCq(N_Y) array, containing numbers of the data type indicated in Table 38. Details about the block-cyclic data distribution of the global matrix Y are stored in desc_y.

iy

has the following meaning:

If incy = M_Y, it indicates which row of global matrix Y is used for vector y.

If incy = 1 and incy <> M_Y, it is the row index of global matrix Y, identifying the first element of vector y.

Scope: global

Specified as: a fullword integer; 1 <= iy <= M_Y and:

If incy = 1 and incy <> M_Y, then iy+n-1 <= M_Y.

jy

has the following meaning:

If incy = M_Y, it is the column index of global matrix Y, identifying the first element of vector y.

If incy = 1 and incy <> M_Y, it indicates which column of global matrix Y is used for vector y.

Scope: global

Specified as: a fullword integer; 1 <= jy <= N_Y and:

If incy = M_Y, then jy+n-1 <= N_Y.

desc_y

is the array descriptor for global matrix Y, described in the following table:

`desc_y`	Name	Description	Limits	Scope
1	DTYPE_Y	Descriptor type	DTYPE_Y=1	Global
2	CTXT_Y	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_Y	Number of rows in the global matrix	If `n` = 0: M_Y >= 0 Otherwise: M_Y >= 1	Global
4	N_Y	Number of columns in the global matrix	If `n` = 0: N_Y >= 0 Otherwise: N_Y >= 1	Global
5	MB_Y	Row block size	MB_Y >= 1	Global
6	NB_Y	Column block size	NB_Y >= 1	Global
7	RSRC_Y	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_Y < `p`	Global
8	CSRC_Y	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_Y < `q`	Global
9	LLD_Y	The leading dimension of the local array	LLD_Y >= max(1,LOCp(M_Y))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

incy

is the stride for global vector y.

Scope: global

Specified as: a fullword integer; incy = 1 or incy = M_Y, where:

If incy = M_Y, then y is a row-distributed vector.

If incy = 1 and incy <> M_Y, then y is a column-distributed vector.

On Return

y

is the updated local part of the global matrix Y, containing the results of the computation.

Scope: local

Returned as: an LLD_Y by (at least) LOCq(N_Y) array, containing numbers of the data type indicated in Table 38.

Notes and Coding Rules

This subroutine accepts lowercase letters for the uplo argument.
The matrix and vectors must have no common elements; otherwise, results are unpredictable.
The NUMROC utility subroutine can be used to determine the values of LOCp(M_) and LOCq(N_) used in the argument descriptions above. For details, see "Determining the Number of Rows and Columns in Your Local Arrays" and NUMROC--Compute the Number of Rows or Columns of a Block-Cyclically Distributed Matrix Contained in a Process.
For suggested block sizes, see "Coding Tips for Optimizing Parallel Performance".
The following values must be equal: CTXT_A = CTXT_X = CTXT_Y.
The global symmetric matrix A must be distributed using a square block-cyclic distribution; that is, MB_A = NB_A.
The block row and block column offsets of the global symmetric matrix A must be equal; that is, mod(ia-1, MB_A) = mod(ja-1, NB_A).
The vectors x and y must be distributed along the same axis--that is, they must both be row distributed or column distributed, where:
- incx = M_X and incy = M_Y for row distribution
- incx = 1( <> M_X) and incy = 1( <> M_Y) for column distribution
If incx = M_X and incy = M_Y, then (in the process grid) the process column containing the first column of the submatrix A must also contain the first column of the submatrices X and Y; that is:

iacol = ixcol
iacol = iycol
where:

iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
ixcol = mod((((jx-1)/NB_X)+CSRC_X), q)
iycol = mod((((jy-1)/NB_Y)+CSRC_Y), q)
If incx = 1( <> M_X) and incy = 1( <> M_Y), then (in the process grid) the process row containing the first row of the submatrix A must also contain the first row of the submatrices X and Y; that is:

iarow = ixrow
iarow = iyrow
where:

iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
ixrow = mod((((ix-1)/MB_X)+RSRC_X), p)
iyrow = mod((((iy-1)/MB_Y)+RSRC_Y), p)
If incx = M_X:
- The block column offset of x must be equal to the block column offset of A; that is, mod(jx-1, NB_X) = mod(ja-1, NB_A).
- The following block sizes must be equal: NB_X = NB_A.
If incx = 1( <> M_X):
- The block row offset of x must be equal to the block column offset of A; that is, mod(ix-1, MB_X) = mod(ja-1, NB_A).
- The following block sizes must be equal: MB_X = NB_A.
If incy = M_Y:
- The block column offset of y must be equal to the block row offset of A; that is, mod(jy-1, NB_Y) = mod(ia-1, MB_A).
- The following block sizes must be equal: NB_Y = MB_A.
If incy = 1( <> M_Y):
- The block row offset of y must be equal to the block row offset of A; that is, mod(iy-1, MB_Y) = mod(ia-1, MB_A).
- The following block sizes must be equal: MB_Y = MB_A.
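
The alignment rules above for column-distributed vectors (incx = 1 and incy = 1) can be checked before the call. The following Python sketch is an assumption-laden illustration, not library code: the function names are hypothetical, the division in the iarow formula is taken to be integer division, and only the column-distributed case is covered.

```python
def block_offset(index, block_size):
    """Offset of a 1-based global index within its block: mod(index-1, block)."""
    return (index - 1) % block_size


def owner(index, block_size, src, nprocs):
    """Process row or column owning a 1-based global index:
    mod(((index-1)/block) + src, nprocs), with integer division."""
    return ((index - 1) // block_size + src) % nprocs


def pdsymv_column_case_problems(ia, ja, mb_a, nb_a, rsrc_a,
                                ix, mb_x, rsrc_x,
                                iy, mb_y, rsrc_y, p):
    """List the rules violated when x and y are column-distributed."""
    problems = []
    if mb_a != nb_a:
        problems.append("MB_A <> NB_A")
    if block_offset(ia, mb_a) != block_offset(ja, nb_a):
        problems.append("mod(ia-1, MB_A) <> mod(ja-1, NB_A)")
    if mb_x != nb_a:
        problems.append("MB_X <> NB_A")
    if mb_y != mb_a:
        problems.append("MB_Y <> MB_A")
    if block_offset(ix, mb_x) != block_offset(ja, nb_a):
        problems.append("mod(ix-1, MB_X) <> mod(ja-1, NB_A)")
    if block_offset(iy, mb_y) != block_offset(ia, mb_a):
        problems.append("mod(iy-1, MB_Y) <> mod(ia-1, MB_A)")
    iarow = owner(ia, mb_a, rsrc_a, p)
    if owner(ix, mb_x, rsrc_x, p) != iarow:
        problems.append("ixrow <> iarow")
    if owner(iy, mb_y, rsrc_y, p) != iarow:
        problems.append("iyrow <> iarow")
    return problems


# Illustrative values: block size 2 everywhere, all indices 1, p = 2.
example = dict(ia=1, ja=1, mb_a=2, nb_a=2, rsrc_a=0,
               ix=1, mb_x=2, rsrc_x=0,
               iy=1, mb_y=2, rsrc_y=0, p=2)
print(pdsymv_column_case_problems(**example))
print(pdsymv_column_case_problems(**{**example, "ix": 3}))
```

The first call reports no problems. In the second, x starts in the second row block, which lies on a different process row than the first row of A, so the process-row rule is reported.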

Error Conditions

Computational Errors

None

Resource Errors

Unable to allocate work space

Input-Argument and Miscellaneous Errors

Stage 1

DTYPE_A is invalid.
DTYPE_X is invalid.
DTYPE_Y is invalid.

Stage 2

CTXT_A is invalid.

Stage 3

PDSYMV was called from outside the process grid.

Stage 4

uplo <> 'U' or 'L'
n < 0
MB_X < 1
NB_X < 1
M_X < 0 and n = 0; M_X < 1 otherwise
N_X < 0 and n = 0; N_X < 1 otherwise
RSRC_X < 0 or RSRC_X >= p
CSRC_X < 0 or CSRC_X >= q
CTXT_A <> CTXT_X
ix < 1
jx < 1
MB_Y < 1
NB_Y < 1
M_Y < 0 and n = 0; M_Y < 1 otherwise
N_Y < 0 and n = 0; N_Y < 1 otherwise
RSRC_Y < 0 or RSRC_Y >= p
CSRC_Y < 0 or CSRC_Y >= q
CTXT_A <> CTXT_Y
iy < 1
jy < 1
RSRC_A < 0 or RSRC_A >= p
CSRC_A < 0 or CSRC_A >= q
M_A < 0 and n = 0; M_A < 1 otherwise
N_A < 0 and n = 0; N_A < 1 otherwise
NB_A < 1
MB_A < 1
ja < 1
ia < 1

Stage 5

MB_A <> NB_A

If n <> 0:
ix > M_X
jx > N_X
iy > M_Y
jy > N_Y
ia+n-1 > M_A
ja+n-1 > N_A

If incx = M_X and incy = M_Y:
NB_A <> NB_X
MB_A <> NB_Y
mod(jx-1, NB_X) <> mod(ja-1, NB_A)
mod(jy-1, NB_Y) <> mod(ia-1, MB_A)
n <> 0 and jx+n-1 > N_X
n <> 0 and jy+n-1 > N_Y

If incx = 1( <> M_X) and incy = 1( <> M_Y):
NB_A <> MB_X
MB_A <> MB_Y
mod(ix-1, MB_X) <> mod(ja-1, NB_A)
mod(iy-1, MB_Y) <> mod(ia-1, MB_A)
n <> 0 and ix+n-1 > M_X
n <> 0 and iy+n-1 > M_Y

Otherwise:
incx <> M_X and incx <> 1
incy <> M_Y and incy <> 1

Stage 6

LLD_A < max(1, LOCp(M_A))
LLD_X < max(1, LOCp(M_X))
LLD_Y < max(1, LOCp(M_Y))
mod(ia-1, MB_A) <> mod(ja-1, NB_A)
If incx = M_X and incy = M_Y, then (in the process grid) the process column containing the first column of the submatrix A does not contain the first column of the submatrices X and Y; that is:

ixcol <> iacol
iycol <> iacol
where:

iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
ixcol = mod((((jx-1)/NB_X)+CSRC_X), q)
iycol = mod((((jy-1)/NB_Y)+CSRC_Y), q)
If incx = 1( <> M_X) and incy = 1( <> M_Y), then (in the process grid) the process row containing the first row of the submatrix A does not contain the first row of the submatrices X and Y; that is:

ixrow <> iarow
iyrow <> iarow
where:

iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
ixrow = mod((((ix-1)/MB_X)+RSRC_X), p)
iyrow = mod((((iy-1)/MB_Y)+RSRC_Y), p)

Example

This example computes y = alpha A x + beta y using a 2 × 2 process grid.

Call Statements and Input

 ORDER = 'R'
 NPROW = 2
 NPCOL = 2
 CALL BLACS_GET (0, 0, ICONTXT)
 CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
 CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
               UPLO   N    ALPHA    A  IA  JA    DESC_A   X   IX   JX
                |     |      |      |   |   |      |      |    |    |
 CALL PDSYMV(  'U'  , 8 ,  1.0D0  , A , 1 , 1 ,  DESC_A , X ,  1 ,  1 ,
 
              DESC_X   INCX    BETA    Y   IY   JY   DESC_Y   INCY
                |        |      |      |    |    |     |       |
              DESC_X ,   1  , 0.0D0  , Y ,  1 ,  1 , DESC_Y ,  1 )

	Desc_A	Desc_X	Desc_Y
DTYPE_			
CTXT_	icontxt¹	icontxt¹	icontxt¹
MB_			
NB_			
RSRC_			
CSRC_			
LLD_	See below²	See below²	See below²

¹ icontxt is the output of the BLACS_GRIDINIT call.

² Each process should set the LLD_ as follows:

LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
LLD_X = MAX(1,NUMROC(M_X, MB_X, MYROW, RSRC_X, NPROW))
LLD_Y = MAX(1,NUMROC(M_Y, MB_Y, MYROW, RSRC_Y, NPROW))

In this example, LLD_A = 4 on all processes, and LLD_X = LLD_Y = 4 on P₀₀ and P₁₀.
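
The LLD_A value quoted above can be reproduced with the usual block-cyclic counting rule. The Python function below is a sketch of that rule written for illustration; it is not the NUMROC subroutine itself, and it assumes M_A = 8 and MB_A = 2 (from the order-8 matrix with 2 × 2 blocks in this example) and RSRC_A = 0.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Count how many of n rows (or columns), split into blocks of nb and
    dealt cyclically to nprocs processes, land on process iproc."""
    # Distance of this process from the one holding the first block.
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb                  # number of complete blocks
    count = (nblocks // nprocs) * nb   # full rounds that every process gets
    extra = nblocks % nprocs           # complete blocks left after full rounds
    if mydist < extra:
        count += nb                    # one more complete block
    elif mydist == extra:
        count += n % nb                # the trailing partial block, if any
    return count


M_A, MB_A, RSRC_A, NPROW = 8, 2, 0, 2  # assumed values, see the lead-in
for myrow in range(NPROW):
    lld_a = max(1, numroc(M_A, MB_A, myrow, RSRC_A, NPROW))
    print(f"process row {myrow}: LLD_A = {lld_a}")
```

Both process rows report LLD_A = 4, which agrees with the statement that LLD_A = 4 on all processes.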

Global symmetric matrix A of order 8 with block size 2 × 2:

B,D        0             1             2             3
     *                                                     *
 0   |  0.0 -1.0  |  -1.0  0.0  |   0.0  0.0  |   0.0  0.0 |
     |   .   1.0  |   0.0  1.0  |   0.0  1.0  |   0.0  1.0 |
     | -----------|-------------|-------------|----------- |
 1   |   .    .   |  -1.0 -1.0  |   0.0  0.0  |   1.0  0.0 |
     |   .    .   |    .  -1.0  |   1.0  1.0  |   0.0  1.0 |
     | -----------|-------------|-------------|----------- |
 2   |   .    .   |    .    .   |  -1.0  0.0  |   0.0  0.0 |
     |   .    .   |    .    .   |    .   1.0  |   0.0  0.0 |
     | -----------|-------------|-------------|----------- |
 3   |   .    .   |    .    .   |    .    .   |   0.0  0.0 |
     |   .    .   |    .    .   |    .    .   |    .   0.0 |
     *                                                     *

The following is the 2 × 2 process grid:

B,D	0 2	1 3
0 2	P₀₀	P₀₁
1 3	P₁₀	P₁₁

Local arrays for A:

p,q  |          0           |           1
-----|----------------------|----------------------
     |  0.0 -1.0  0.0  0.0  |  -1.0  0.0  0.0  0.0
     |   .   1.0  0.0  1.0  |   0.0  1.0  0.0  1.0
 0   |   .    .  -1.0  0.0  |    .    .   0.0  0.0
     |   .    .    .   1.0  |    .    .   0.0  0.0
-----|----------------------|----------------------
     |   .    .   0.0  0.0  |  -1.0 -1.0  1.0  0.0
     |   .    .   1.0  1.0  |    .  -1.0  0.0  1.0
 1   |   .    .    .    .   |    .    .   0.0  0.0
     |   .    .    .    .   |    .    .    .   0.0
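
The relationship between the global matrix and the local arrays shown above follows the block-cyclic mapping. The Python sketch below is illustrative only; `global_to_local` and `rows_held` are hypothetical helper names, and the source process is assumed to be 0 in both grid dimensions. It maps a 1-based global index to the owning process row (or column) and the 1-based local index.

```python
def global_to_local(index, block, src, nprocs):
    """Map a 1-based global index to (owning process, 1-based local index)."""
    blk = (index - 1) // block               # global block number, from 0
    proc = (blk + src) % nprocs              # blocks are dealt cyclically
    local = (blk // nprocs) * block + (index - 1) % block + 1
    return proc, local


def rows_held(n, block, src, nprocs):
    """Global indices held by each process, in local storage order."""
    held = {proc: [] for proc in range(nprocs)}
    for index in range(1, n + 1):
        proc, _ = global_to_local(index, block, src, nprocs)
        held[proc].append(index)
    return held


# Order-8 matrix, block size 2, two process rows (and two process columns).
print(rows_held(8, 2, 0, 2))
# Global element (3, 7): owning process row and column, with local indices.
print(global_to_local(3, 2, 0, 2), global_to_local(7, 2, 0, 2))
```

Process row 0 holds global rows 1, 2, 5, and 6, and process row 1 holds rows 3, 4, 7, and 8; the same split applies to columns. Global element (3, 7), which is 1.0, maps to local element (1, 3) on P₁₁, in agreement with the local arrays.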

Global vector x of size 8 × 1 with block size 2:

B,D     0
     *      *
 0   |  1.0 |
     |  1.0 |
     | ---- |
 1   |  1.0 |
     |  1.0 |
     | ---- |
 2   |  1.0 |
     |  1.0 |
     | ---- |
 3   |  1.0 |
     |  1.0 |
     *      *

The following is the 2 × 2 process grid:

B,D	0	--
0 2	P₀₀	P₀₁
1 3	P₁₀	P₁₁

Local arrays for x:

p,q  |  0
-----|------
     |  1.0
     |  1.0
 0   |  1.0
     |  1.0
-----|------
     |  1.0
     |  1.0
 1   |  1.0
     |  1.0

Output:

Global vector y of size 8 × 1 with block size 2 × 1:

B,D     0
     *      *
 0   | -2.0 |
     |  3.0 |
     | ---- |
 1   | -2.0 |
     |  2.0 |
     | ---- |
 2   |  0.0 |
     |  3.0 |
     | ---- |
 3   |  1.0 |
     |  2.0 |
     *      *

The following is the 2 × 2 process grid:

B,D	0	--
0 2	P₀₀	P₀₁
1 3	P₁₀	P₁₁

Local arrays for y:

p,q  |  0
-----|------
     | -2.0
     |  3.0
 0   |  0.0
     |  3.0
-----|------
     | -2.0
     |  2.0
 1   |  1.0
     |  2.0
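
The output of this example can be checked serially. The Python sketch below is an illustration rather than a call to the library: it copies the upper triangle of A from the example, uses x with every element equal to 1.0, alpha = 1.0, and beta = 0.0, and then splits y over the two process rows with block size 2. The variable names are chosen for this sketch.

```python
_ = None  # marks the strictly lower triangle, which is never read
U = [
    [0.0, -1.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [_,    1.0,  0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
    [_, _,      -1.0, -1.0, 0.0, 0.0, 1.0, 0.0],
    [_, _, _,         -1.0, 1.0, 1.0, 0.0, 1.0],
    [_, _, _, _,            -1.0, 0.0, 0.0, 0.0],
    [_, _, _, _, _,               1.0, 0.0, 0.0],
    [_, _, _, _, _, _,                 0.0, 0.0],
    [_, _, _, _, _, _, _,                   0.0],
]
n = 8
x = [1.0] * n

# y = A x, taking element (i, j) from the upper triangle or its mirror image.
y = [sum((U[i][j] if i <= j else U[j][i]) * x[j] for j in range(n))
     for i in range(n)]

# Column-distributed y: blocks of 2 rows dealt to process rows 0 and 1.
local_y = {0: [], 1: []}
for i, value in enumerate(y):
    local_y[(i // 2) % 2].append(value)

print(y)
print(local_y)
```

The computed y is (-2.0, 3.0, -2.0, 2.0, 0.0, 3.0, 1.0, 2.0), and the pieces on process rows 0 and 1 match the local arrays for y shown above.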

PDGER--Rank-One Update of a General Matrix

This subroutine computes the following rank-one update:

A <-- alpha x y^T + A

where, in the formula above:

A represents the global general submatrix A_{ia:ia+m-1, ja:ja+n-1}.

x represents the global vector:

For incx = M_X, it is X_{ix:ix, jx:jx+m-1}.
For incx = 1 and incx <> M_X, it is X_{ix:ix+m-1, jx:jx}.

y represents the global vector:

For incy = M_Y, it is Y_{iy:iy, jy:jy+n-1}.
For incy = 1 and incy <> M_Y, it is Y_{iy:iy+n-1, jy:jy}.

alpha is a scalar.

Note: No data should be moved to form the vector transpose; that is, the vector should always be stored in its untransposed form.

In the following three cases, no computation is performed and the subroutine returns after doing some parameter checking:

m = 0
n = 0
alpha is zero.

See references [14] and [15].
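
The rank-one update defined above can be illustrated with a serial reference. This Python sketch is not the Parallel ESSL implementation; the name `ger_reference` and the list-of-rows storage are assumptions made for illustration. Note that y is used as stored; no transposed copy is formed.

```python
def ger_reference(m, n, alpha, x, y, a):
    """Serial A <- alpha*x*y^T + A for an m-by-n matrix stored as rows."""
    # Quick-return cases: m = 0, n = 0, or alpha is zero.
    if m == 0 or n == 0 or alpha == 0.0:
        return [row[:] for row in a]
    # Element (i, j) of x*y^T is x[i]*y[j].
    return [[a[i][j] + alpha * x[i] * y[j] for j in range(n)]
            for i in range(m)]


a = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
print(ger_reference(2, 3, 2.0, [1.0, 2.0], [1.0, 0.0, -1.0], a))
```

For the input shown, the call prints [[3.0, 2.0, 1.0], [8.0, 5.0, 2.0]].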

Table 39. Data Types

alpha, A, x, y Subprogram
Long-precision real PDGER

Syntax

Fortran	CALL PDGER (`m`, `n`, `alpha`, `x`, `ix`, `jx`, `desc_x`, `incx`, `y`, `iy`, `jy`, `desc_y`, `incy`, `a`, `ia`, `ja`, `desc_a`)
C and C++	pdger (`m`, `n`, `alpha`, `x`, `ix`, `jx`, `desc_x`, `incx`, `y`, `iy`, `jy`, `desc_y`, `incy`, `a`, `ia`, `ja`, `desc_a`);

On Entry

m

is the number of rows in submatrix A and the number of elements in vector x used in the computation.

Scope: global

Specified as: a fullword integer; m >= 0.

n

is the number of columns in submatrix A and the number of elements in vector y used in the computation.

Scope: global

Specified as: a fullword integer; n >= 0.

alpha

is the scalar alpha.

Scope: global

Specified as: a number of the data type indicated in Table 39.

x

is the local part of the global matrix X, where:

If incx = M_X, the leading LOCp(ix) by LOCq(jx+m-1) part of the local array X must contain the local pieces of the leading ix by jx+m-1 part of the global matrix.
If incx = 1 and incx <> M_X, the leading LOCp(ix+m-1) by LOCq(jx) part of the local array X must contain the local pieces of the leading ix+m-1 by jx part of the global matrix.

Scope: local

Specified as: an LLD_X by (at least) LOCq(N_X) array, containing numbers of the data type indicated in Table 39. Details about the block-cyclic data distribution of the global matrix X are stored in desc_x.

ix

has the following meaning:

If incx = M_X, it indicates which row of global matrix X is used for vector x.

If incx = 1 and incx <> M_X, it is the row index of global matrix X, identifying the first element of vector x.

Scope: global

Specified as: a fullword integer; 1 <= ix <= M_X and:

If incx = 1 and incx <> M_X, then ix+m-1 <= M_X.

jx

has the following meaning:

If incx = M_X, it is the column index of global matrix X, identifying the first element of vector x.

If incx = 1 and incx <> M_X, it indicates which column of global matrix X is used for vector x.

Scope: global

Specified as: a fullword integer; 1 <= jx <= N_X and:

If incx = M_X, then jx+m-1 <= N_X.

desc_x

is the array descriptor for global matrix X, described in the following table:

`desc_x`	Name	Description	Limits	Scope
1	DTYPE_X	Descriptor type	DTYPE_X=1	Global
2	CTXT_X	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_X	Number of rows in the global matrix	If `m` = 0: M_X >= 0 Otherwise: M_X >= 1	Global
4	N_X	Number of columns in the global matrix	If `m` = 0: N_X >= 0 Otherwise: N_X >= 1	Global
5	MB_X	Row block size	MB_X >= 1	Global
6	NB_X	Column block size	NB_X >= 1	Global
7	RSRC_X	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_X < `p`	Global
8	CSRC_X	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_X < `q`	Global
9	LLD_X	The leading dimension of the local array	LLD_X >= max(1,LOCp(M_X))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

incx

is the stride for global vector x.

Scope: global

Specified as: a fullword integer; incx = 1 or incx = M_X, where:

If incx = M_X, then x is a row-distributed vector.

If incx = 1 and incx <> M_X, then x is a column-distributed vector.

y

is the local part of the global matrix Y, where:

If incy = M_Y, the leading LOCp(iy) by LOCq(jy+n-1) part of the local array Y must contain the local pieces of the leading iy by jy+n-1 part of the global matrix.
If incy = 1 and incy <> M_Y, the leading LOCp(iy+n-1) by LOCq(jy) part of the local array Y must contain the local pieces of the leading iy+n-1 by jy part of the global matrix.

Note:

No data should be moved to form y^T; that is, the vector y should always be stored in its untransposed form.

Scope: local

Specified as: an LLD_Y by (at least) LOCq(N_Y) array, containing numbers of the data type indicated in Table 39. Details about the block-cyclic data distribution of the global matrix Y are stored in desc_y.

iy

has the following meaning:

If incy = M_Y, it indicates which row of global matrix Y is used for vector y.

If incy = 1 and incy <> M_Y, it is the row index of global matrix Y, identifying the first element of vector y.

Scope: global

Specified as: a fullword integer; 1 <= iy <= M_Y and:

If incy = 1 and incy <> M_Y, then iy+n-1 <= M_Y.

jy

has the following meaning:

If incy = M_Y, it is the column index of global matrix Y, identifying the first element of vector y.

If incy = 1 and incy <> M_Y, it indicates which column of global matrix Y is used for vector y.

Scope: global

Specified as: a fullword integer; 1 <= jy <= N_Y and:

If incy = M_Y, then jy+n-1 <= N_Y.

desc_y

is the array descriptor for global matrix Y, described in the following table:

`desc_y`	Name	Description	Limits	Scope
1	DTYPE_Y	Descriptor type	DTYPE_Y=1	Global
2	CTXT_Y	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_Y	Number of rows in the global matrix	If `n` = 0: M_Y >= 0 Otherwise: M_Y >= 1	Global
4	N_Y	Number of columns in the global matrix	If `n` = 0: N_Y >= 0 Otherwise: N_Y >= 1	Global
5	MB_Y	Row block size	MB_Y >= 1	Global
6	NB_Y	Column block size	NB_Y >= 1	Global
7	RSRC_Y	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_Y < `p`	Global
8	CSRC_Y	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_Y < `q`	Global
9	LLD_Y	The leading dimension of the local array	LLD_Y >= max(1,LOCp(M_Y))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

incy

is the stride for global vector y.

Scope: global

Specified as: a fullword integer; incy = 1 or incy = M_Y, where:

If incy = M_Y, then y is a row-distributed vector.

If incy = 1 and incy <> M_Y, then y is a column-distributed vector.

a

is the local part of the global general matrix A.

Scope: local

Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 39. Details about the block-cyclic data distribution of global matrix A are stored in desc_a.

ia

is the row index of the global matrix A, identifying the first row of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ia <= M_A and ia+m-1 <= M_A.

ja

is the column index of the global matrix A, identifying the first column of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ja <= N_A and ja+n-1 <= N_A.

desc_a

is the array descriptor for global matrix A, described in the following table:

`desc_a`	Name	Description	Limits	Scope
1	DTYPE_A	Descriptor type	DTYPE_A=1	Global
2	CTXT_A	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_A	Number of rows in the global matrix	If `m` = 0 or `n` = 0: M_A >= 0 Otherwise: M_A >= 1	Global
4	N_A	Number of columns in the global matrix	If `m` = 0 or `n` = 0: N_A >= 0 Otherwise: N_A >= 1	Global
5	MB_A	Row block size	MB_A >= 1	Global
6	NB_A	Column block size	NB_A >= 1	Global
7	RSRC_A	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_A < `p`	Global
8	CSRC_A	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_A < `q`	Global
9	LLD_A	The leading dimension of the local array	LLD_A >= max(1,LOCp(M_A))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

On Return

a

is the updated local part of the global matrix A, containing the results of the computation.

Scope: local

Returned as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 39.

Notes and Coding Rules

The matrix and vectors must have no common elements; otherwise, results are unpredictable.
The NUMROC utility subroutine can be used to determine the values of LOCp(M_) and LOCq(N_) used in the argument descriptions above. For details, see "Determining the Number of Rows and Columns in Your Local Arrays" and NUMROC--Compute the Number of Rows or Columns of a Block-Cyclically Distributed Matrix Contained in a Process.
For suggested block sizes, see "Coding Tips for Optimizing Parallel Performance".
The following values must be equal: CTXT_A = CTXT_X = CTXT_Y.
If incx = M_X:
- The block column offset of x must be equal to the block row offset of A; that is, mod(jx-1, NB_X) = mod(ia-1, MB_A).
- The following block sizes must be equal: NB_X = MB_A.
If incx = 1( <> M_X):
- In the process grid, the process row containing the first row of the submatrix A must also contain the first row of the submatrix X; that is, iarow = ixrow, where:
  
  iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
  ixrow = mod((((ix-1)/MB_X)+RSRC_X), p)
- The block row offset of x must be equal to the block row offset of A; that is, mod(ix-1, MB_X) = mod(ia-1, MB_A).
- The following block sizes must be equal: MB_X = MB_A.
If incy = M_Y:
- In the process grid, the process column containing the first column of the submatrix A must also contain the first column of the submatrix Y; that is, iacol = iycol, where:
  
  iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
  iycol = mod((((jy-1)/NB_Y)+CSRC_Y), q)
- The block column offset of y must be equal to the block column offset of A; that is, mod(jy-1, NB_Y) = mod(ja-1, NB_A).
- The following block sizes must be equal: NB_Y = NB_A.
If incy = 1( <> M_Y):
- The block row offset of y must be equal to the block column offset of A; that is, mod(iy-1, MB_Y) = mod(ja-1, NB_A).
- The following block sizes must be equal: MB_Y = NB_A.

Error Conditions

Computational Errors

None

Resource Errors

Unable to allocate work space

Input-Argument and Miscellaneous Errors

Stage 1

DTYPE_A is invalid.
DTYPE_X is invalid.
DTYPE_Y is invalid.

Stage 2

CTXT_A is invalid.

Stage 3

PDGER was called from outside the process grid.

Stage 4

m < 0
n < 0
M_X < 0 and m = 0; M_X < 1 otherwise
N_X < 0 and m = 0; N_X < 1 otherwise
MB_X < 1
NB_X < 1
RSRC_X < 0 or RSRC_X >= p
CSRC_X < 0 or CSRC_X >= q
CTXT_A <> CTXT_X
ix < 1
jx < 1
M_Y < 0 and n = 0; M_Y < 1 otherwise
N_Y < 0 and n = 0; N_Y < 1 otherwise
MB_Y < 1
NB_Y < 1
RSRC_Y < 0 or RSRC_Y >= p
CSRC_Y < 0 or CSRC_Y >= q
CTXT_A <> CTXT_Y
iy < 1
jy < 1
M_A < 0 and (m = 0 or n = 0); M_A < 1 otherwise
N_A < 0 and (m = 0 or n = 0); N_A < 1 otherwise
MB_A < 1
NB_A < 1
RSRC_A < 0 or RSRC_A >= p
CSRC_A < 0 or CSRC_A >= q
ia < 1
ja < 1

Stage 5

If m <> 0 and n <> 0:

ia > M_A
ja > N_A
ia+m-1 > M_A
ja+n-1 > N_A

If m <> 0:
ix > M_X
jx > N_X

If n <> 0:
iy > M_Y
jy > N_Y

If incx = M_X:
NB_X <> MB_A
mod(jx-1, NB_X) <> mod(ia-1, MB_A)
m <> 0 and jx+m-1 > N_X

If incx = 1( <> M_X):
MB_X <> MB_A
mod(ix-1, MB_X) <> mod(ia-1, MB_A)
m <> 0 and ix+m-1 > M_X

Otherwise:
incx <> M_X and incx <> 1

If incy = M_Y:
NB_Y <> NB_A
mod(jy-1, NB_Y) <> mod(ja-1, NB_A)
n <> 0 and jy+n-1 > N_Y

If incy = 1( <> M_Y):
MB_Y <> NB_A
mod(iy-1, MB_Y) <> mod(ja-1, NB_A)
n <> 0 and iy+n-1 > M_Y

Otherwise:
incy <> M_Y and incy <> 1

Stage 6

If incx = 1( <> M_X), then (in the process grid) the process row containing the first row of the submatrix A does not contain the first row of the submatrix X; that is, iarow <> ixrow, where:

iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
ixrow = mod((((ix-1)/MB_X)+RSRC_X), p)
If incy = M_Y, then (in the process grid) the process column containing the first column of the submatrix A does not contain the first column of the submatrix Y; that is, iacol <> iycol, where:

iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
iycol = mod((((jy-1)/NB_Y)+CSRC_Y), q)
LLD_A < max(1, LOCp(M_A))
LLD_X < max(1, LOCp(M_X))
LLD_Y < max(1, LOCp(M_Y))

Example

This example computes A = alpha x y^T + A using a 2 × 2 process grid. It uses a global submatrix A within a global matrix A by specifying ia = 2 and ja = 2. It uses vector x, which is a column-distributed vector within a column of global matrix X, by specifying incx = 1, ix = 2, and jx = 1. It uses vector y, which is a row-distributed vector within a row of global matrix Y, by specifying incy = M_Y = 1, iy = 1, and jy = 2.

Call Statements and Input

 ORDER = 'R'
 NPROW = 2
 NPCOL = 2
 CALL BLACS_GET (0, 0, ICONTXT)
 CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
 CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
              M   N    ALPHA    X  IX  JX    DESC_X   INCX   Y   IY   JY
              |   |      |      |   |   |      |       |     |    |    |
 CALL PDGER(  9 , 9 ,  1.0D0  , X , 2 , 1 ,  DESC_X ,  1   , Y ,  1 ,  2 ,
 
              DESC_Y   INCY   A   IA   JA    DESC_A
                |        |    |    |    |      |
              DESC_Y ,   1  , A ,  2 ,  2 ,  DESC_A  )

	Desc_A	Desc_X	Desc_Y
DTYPE_			
CTXT_	icontxt¹	icontxt¹	icontxt¹
MB_			
NB_			
RSRC_			
CSRC_			
LLD_	See below²	See below²	See below²

¹ icontxt is the output of the BLACS_GRIDINIT call.

² Each process should set the LLD_ as follows:

LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
LLD_X = MAX(1,NUMROC(M_X, MB_X, MYROW, RSRC_X, NPROW))
LLD_Y = MAX(1,NUMROC(M_Y, MB_Y, MYROW, RSRC_Y, NPROW))

In this example, LLD_A = 6 on P₀₀ and P₀₁, LLD_A = 4 on P₁₀ and P₁₁, LLD_X = 7 on P₀₀, LLD_X = 4 on P₁₀, and LLD_Y = 1 on P₀₀ and P₀₁.
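The LLD_ values quoted here can be checked with a small serial model of the NUMROC calculation. The following Python sketch is not part of Parallel ESSL. It assumes the usual block-cyclic counting rule and RSRC_A = RSRC_X = 0 (consistent with the local arrays in this example), and the names numroc and lld are hypothetical helpers introduced only for illustration:

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Count the rows (or columns) of an n-long dimension that process iproc
    holds when blocks of size nb are dealt cyclically over nprocs processes,
    starting with process isrcproc."""
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from the source process
    nblocks = n // nb                              # number of complete blocks
    count = (nblocks // nprocs) * nb               # complete rounds of dealing
    extrablks = nblocks % nprocs                   # complete blocks left over
    if mydist < extrablks:
        count += nb                                # this process gets one more full block
    elif mydist == extrablks:
        count += n % nb                            # this process gets the partial block
    return count


def lld(m, mb, myrow, rsrc, nprow):
    """Leading dimension as suggested in the footnote: MAX(1, NUMROC(...))."""
    return max(1, numroc(m, mb, myrow, rsrc, nprow))


# Assumed descriptor values: M_A = 10, MB_A = 4, M_X = 11, MB_X = 4, sources = 0.
lld_a = [lld(10, 4, myrow, 0, 2) for myrow in (0, 1)]
lld_x = [lld(11, 4, myrow, 0, 2) for myrow in (0, 1)]
print(lld_a, lld_x)
```

Under these assumptions, process row 0 holds two row blocks of A (one of them partial) and process row 1 holds one, which is why the leading dimensions differ between process rows.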

After the global matrix A is distributed over the process grid, only a portion of the global data structure is used--that is, global submatrix A. Following is the global 9 × 9 submatrix A, starting at row 2 and column 2 in global general 10 × 10 matrix A with block size 4 × 4:

B,D               0                           1                     2
     *                                                                     *
     |    .     .     .     .   |     .     .     .     .   |     .     .  |
     |    .   12.0  22.0  32.0  |   42.0  52.0  62.0  72.0  |   82.0  92.0 |
 0   |    .   13.0  23.0  33.0  |   43.0  53.0  63.0  73.0  |   83.0  93.0 |
     |    .   14.0  24.0  34.0  |   44.0  54.0  64.0  74.0  |   84.0  94.0 |
     | -------------------------|---------------------------|------------- |
     |    .   15.0  25.0  35.0  |   45.0  55.0  65.0  75.0  |   85.0  95.0 |
     |    .   16.0  26.0  36.0  |   46.0  56.0  66.0  76.0  |   86.0  96.0 |
 1   |    .   17.0  27.0  37.0  |   47.0  57.0  67.0  77.0  |   87.0  97.0 |
     |    .   18.0  28.0  38.0  |   48.0  58.0  68.0  78.0  |   88.0  98.0 |
     | -------------------------|---------------------------|------------- |
 2   |    .   19.0  29.0  39.0  |   49.0  59.0  69.0  79.0  |   89.0  99.0 |
     |    .   20.0  30.0  40.0  |   50.0  60.0  70.0  80.0  |   90.0 100.0 |
     *                                                                     *

The following is the 2 × 2 process grid:

B,D	0 2	1
0 2	P₀₀	P₀₁
1	P₁₀	P₁₁

Local arrays for A:

p,q  |                  0                   |             1
-----|--------------------------------------|--------------------------
     |    .     .     .     .     .     .   |     .     .     .     .
     |    .   12.0  22.0  32.0  82.0  92.0  |   42.0  52.0  62.0  72.0
     |    .   13.0  23.0  33.0  83.0  93.0  |   43.0  53.0  63.0  73.0
 0   |    .   14.0  24.0  34.0  84.0  94.0  |   44.0  54.0  64.0  74.0
     |    .   19.0  29.0  39.0  89.0  99.0  |   49.0  59.0  69.0  79.0
     |    .   20.0  30.0  40.0  90.0 100.0  |   50.0  60.0  70.0  80.0
-----|--------------------------------------|--------------------------
     |    .   15.0  25.0  35.0  85.0  95.0  |   45.0  55.0  65.0  75.0
     |    .   16.0  26.0  36.0  86.0  96.0  |   46.0  56.0  66.0  76.0
 1   |    .   17.0  27.0  37.0  87.0  97.0  |   47.0  57.0  67.0  77.0
     |    .   18.0  28.0  38.0  88.0  98.0  |   48.0  58.0  68.0  78.0
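The local arrays for A follow from the block-cyclic mapping of global rows and columns onto the process grid. The following Python sketch is illustrative only. It assumes RSRC_A = CSRC_A = 0, reads the entry pattern 10(j-1)+i off the global matrix shown in this example, and uses hypothetical helper names (owned_indices, local_array, a_entry):

```python
def owned_indices(n, nb, iproc, isrcproc, nprocs):
    """1-based global indices along one dimension that process iproc holds."""
    return [g for g in range(1, n + 1)
            if ((g - 1) // nb + isrcproc) % nprocs == iproc]


def local_array(value, m, n, mb, nb, myrow, mycol, nprow, npcol, rsrc=0, csrc=0):
    """Gather the entries value(i, j) that land on process (myrow, mycol)."""
    rows = owned_indices(m, mb, myrow, rsrc, nprow)
    cols = owned_indices(n, nb, mycol, csrc, npcol)
    return [[value(i, j) for j in cols] for i in rows]


def a_entry(i, j):
    """Entry of the 10 x 10 global matrix; None marks the unused first row and column."""
    return float(10 * (j - 1) + i) if i >= 2 and j >= 2 else None


# One local array per process of the 2 x 2 grid, keyed by (process row, process column).
local = {(r, c): local_array(a_entry, 10, 10, 4, 4, r, c, 2, 2)
         for r in (0, 1) for c in (0, 1)}

for row in local[(0, 0)]:
    print(row)
```

The rows printed for process (0, 0) correspond to global rows 1 to 4 and 9 to 10, and the columns to global columns 1 to 4 and 9 to 10.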

After the global matrix X is distributed over the process grid, only a portion of the global data structure is used--that is, global vector x, which is a column-distributed vector. Following is the global vector x of size 9 × 1, starting at row 2 and column 1 in 11 × 1 global matrix X with block size 4:

B,D     0
     *      *
     |   .  |
     |  1.0 |
 0   |  1.0 |
     |  1.0 |
     | ---- |
     |  1.0 |
     |  1.0 |
 1   |  1.0 |
     |  1.0 |
     | ---- |
     |  1.0 |
 2   |  1.0 |
     |   .  |
     *      *

The following is the 2 × 2 process grid:

B,D	0	--
0 2	P₀₀	P₀₁
1	P₁₀	P₁₁

Local arrays for x:

p,q  |  0
-----|------
     |   .
     |  1.0
     |  1.0
 0   |  1.0
     |  1.0
     |  1.0
     |   .
-----|------
     |  1.0
     |  1.0
 1   |  1.0
     |  1.0

After the global matrix Y is distributed over the process grid, only a portion of the global data structure is used--that is, global vector y, which is a row-distributed vector. Following is the global vector y of size 1 × 9, starting at row 1 and column 2 in 1 × 11 global matrix Y with block size 4:

B,D             0                       1                    2
     *                                                               *
 0   |   .   2.0  3.0  4.0  |   5.0  6.0  7.0  8.0  |   9.0 10.0   . |
     *                                                               *

The following is the 2 × 2 process grid:

B,D	0 2	1
0	P₀₀	P₀₁
--	P₁₀	P₁₁

Local arrays for y:

p,q  |                 0                   |           1
-----|-------------------------------------|----------------------
 0   |   .   2.0  3.0  4.0  9.0  10.0   .  |   5.0  6.0  7.0  8.0

Output:

B,D               0                           1                     2
     *                                                                     *
     |    .     .     .     .   |     .     .     .     .   |     .     .  |
     |    .   14.0  25.0  36.0  |   47.0  58.0  69.0  80.0  |   91.0 102.0 |
 0   |    .   15.0  26.0  37.0  |   48.0  59.0  70.0  81.0  |   92.0 103.0 |
     |    .   16.0  27.0  38.0  |   49.0  60.0  71.0  82.0  |   93.0 104.0 |
     | -------------------------|---------------------------|------------- |
     |    .   17.0  28.0  39.0  |   50.0  61.0  72.0  83.0  |   94.0 105.0 |
     |    .   18.0  29.0  40.0  |   51.0  62.0  73.0  84.0  |   95.0 106.0 |
 1   |    .   19.0  30.0  41.0  |   52.0  63.0  74.0  85.0  |   96.0 107.0 |
     |    .   20.0  31.0  42.0  |   53.0  64.0  75.0  86.0  |   97.0 108.0 |
     | -------------------------|---------------------------|------------- |
 2   |    .   21.0  32.0  43.0  |   54.0  65.0  76.0  87.0  |   98.0 109.0 |
     |    .   22.0  33.0  44.0  |   55.0  66.0  77.0  88.0  |   99.0 110.0 |
     *                                                                     *

The following is the 2 × 2 process grid:

B,D	0 2	1
0 2	P₀₀	P₀₁
1	P₁₀	P₁₁

Local arrays for A:

p,q  |                 0                   |             1
-----|-------------------------------------|--------------------------
     |   .    .     .     .     .      .   |     .     .     .     .
     |   .  14.0  25.0  36.0  91.0  102.0  |   47.0  58.0  69.0  80.0
     |   .  15.0  26.0  37.0  92.0  103.0  |   48.0  59.0  70.0  81.0
 0   |   .  16.0  27.0  38.0  93.0  104.0  |   49.0  60.0  71.0  82.0
     |   .  21.0  32.0  43.0  98.0  109.0  |   54.0  65.0  76.0  87.0
     |   .  22.0  33.0  44.0  99.0  110.0  |   55.0  66.0  77.0  88.0
-----|-------------------------------------|--------------------------
     |   .  17.0  28.0  39.0  94.0  105.0  |   50.0  61.0  72.0  83.0
     |   .  18.0  29.0  40.0  95.0  106.0  |   51.0  62.0  73.0  84.0
 1   |   .  19.0  30.0  41.0  96.0  107.0  |   52.0  63.0  74.0  85.0
     |   .  20.0  31.0  42.0  97.0  108.0  |   53.0  64.0  75.0  86.0
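The output of this example can be cross-checked with a serial model of the rank-one update on the 9 × 9 submatrix. The following Python sketch is not the PDGER implementation; it ignores the data distribution entirely, and ger_update is a hypothetical helper name:

```python
def ger_update(a, x, y, alpha):
    """Return alpha*x*y^T + a for a matrix stored as a list of rows."""
    return [[a[i][j] + alpha * x[i] * y[j] for j in range(len(y))]
            for i in range(len(x))]


# Submatrix A(2:10, 2:10) of the example; entries follow the pattern 10*(j-1)+i.
a = [[float(10 * (j - 1) + i) for j in range(2, 11)] for i in range(2, 11)]
x = [1.0] * 9                          # global vector x (all ones)
y = [float(v) for v in range(2, 11)]   # global vector y (2.0 through 10.0)

out = ger_update(a, x, y, 1.0)
print(out[0])    # first row of the updated submatrix
print(out[-1])   # last row of the updated submatrix
```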

PDSYR--Rank-One Update of a Real Symmetric Matrix

This subroutine computes the following rank-one update:

A <-- alpha x x^T + A

where, in the formula above:

A represents the global symmetric submatrix A_{ia:ia+n-1, ja:ja+n-1}.

x represents the global vector:

For incx = M_X, it is X_{ix:ix, jx:jx+n-1}.
For incx = 1 and incx <> M_X, it is X_{ix:ix+n-1, jx:jx}.

alpha is a scalar.

Note: No data should be moved to form the vector transpose; that is, the vector should always be stored in its untransposed form.

In the following two cases, no computation is performed and the subroutine returns after doing some parameter checking:

n = 0
alpha is zero.

See references [14] and [15].

Table 40. Data Types

A, x, alpha	Subprogram
Long-precision real	PDSYR

Syntax

Fortran	CALL PDSYR (`uplo`, `n`, `alpha`, `x`, `ix`, `jx`, `desc_x`, `incx`, `a`, `ia`, `ja`, `desc_a`)
C and C++	pdsyr (`uplo`, `n`, `alpha`, `x`, `ix`, `jx`, `desc_x`, `incx`, `a`, `ia`, `ja`, `desc_a`);

On Entry

uplo

indicates whether the upper or lower triangular part of the global symmetric submatrix A is referenced, where:

If uplo = 'U', the upper triangular part is referenced.

If uplo = 'L', the lower triangular part is referenced.

Scope: global

Specified as: a single character; uplo = 'U' or 'L'.

n

is the number of rows and columns in submatrix A and the number of elements in vector x used in the computation.

Scope: global

Specified as: a fullword integer; n >= 0.

alpha

is the scalar alpha.

Scope: global

Specified as: a number of the data type indicated in Table 40.

x

is the local part of the global matrix X, where:

If incx = M_X, the leading LOCp(ix) by LOCq(jx+n-1) part of the local array X must contain the local pieces of the leading ix by jx+n-1 part of the global matrix.
If incx = 1 and incx <> M_X, the leading LOCp(ix+n-1) by LOCq(jx) part of the local array X must contain the local pieces of the leading ix+n-1 by jx part of the global matrix.

Note:

No data should be moved to form x^T; that is, the vector x should always be stored in its untransposed form.

Scope: local

Specified as: an LLD_X by (at least) LOCq(N_X) array, containing numbers of the data type indicated in Table 40. Details about the block-cyclic data distribution of the global matrix X are stored in desc_x.

ix

has the following meaning:

If incx = M_X, it indicates which row of global matrix X is used for vector x.

If incx = 1 and incx <> M_X, it is the row index of global matrix X, identifying the first element of vector x.

Scope: global

Specified as: a fullword integer; 1 <= ix <= M_X and:

If incx = 1 and incx <> M_X, then ix+n-1 <= M_X.

jx

has the following meaning:

If incx = M_X, it is the column index of global matrix X, identifying the first element of vector x.

If incx = 1 and incx <> M_X, it indicates which column of global matrix X is used for vector x.

Scope: global

Specified as: a fullword integer; 1 <= jx <= N_X and:

If incx = M_X, then jx+n-1 <= N_X.

desc_x

is the array descriptor for global matrix X, described in the following table:

`desc_x`	Name	Description	Limits	Scope
1	DTYPE_X	Descriptor type	DTYPE_X=1	Global
2	CTXT_X	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_X	Number of rows in the global matrix	If `n` = 0: M_X >= 0 Otherwise: M_X >= 1	Global
4	N_X	Number of columns in the global matrix	If `n` = 0: N_X >= 0 Otherwise: N_X >= 1	Global
5	MB_X	Row block size	MB_X >= 1	Global
6	NB_X	Column block size	NB_X >= 1	Global
7	RSRC_X	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_X < `p`	Global
8	CSRC_X	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_X < `q`	Global
9	LLD_X	The leading dimension of the local array	LLD_X >= max(1,LOCp(M_X))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

incx

is the stride for global vector x.

Scope: global

Specified as: a fullword integer; incx = 1 or incx = M_X, where:

If incx = M_X, then x is a row-distributed vector.

If incx = 1 and incx <> M_X, then x is a column-distributed vector.
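The two cases for incx can be summarized as a selection of rows and columns of the global matrix X. The following Python sketch is an illustration only; vector_span is a hypothetical helper, and the bounds it checks are the ones listed for ix and jx above. Note that when M_X = 1, incx = 1 satisfies incx = M_X, so the vector is treated as row-distributed:

```python
def vector_span(ix, jx, n, incx, m_x, n_x):
    """Return ((first_row, last_row), (first_col, last_col)), 1-based and
    inclusive, for the part of global matrix X that holds vector x."""
    if not (1 <= ix <= m_x and 1 <= jx <= n_x):
        raise ValueError("ix or jx is outside the global matrix")
    if incx == m_x:
        # Row-distributed vector: one row, n consecutive columns.
        if jx + n - 1 > n_x:
            raise ValueError("jx+n-1 > N_X")
        return (ix, ix), (jx, jx + n - 1)
    if incx == 1:
        # Column-distributed vector: n consecutive rows, one column.
        if ix + n - 1 > m_x:
            raise ValueError("ix+n-1 > M_X")
        return (ix, ix + n - 1), (jx, jx)
    raise ValueError("incx must be 1 or M_X")


print(vector_span(1, 1, 9, 1, 9, 1))    # column-distributed: M_X = 9, incx = 1
print(vector_span(1, 2, 9, 1, 1, 11))   # row-distributed: M_X = 1, incx = 1 = M_X
```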

a

is the local part of the global symmetric matrix A, where:

If uplo = 'U', the leading n × n upper triangular part of the global symmetric submatrix A_{ia:ia+n-1, ja:ja+n-1} must contain the upper triangular part of the submatrix, and the strictly lower triangular part is not referenced.
If uplo = 'L', the leading n × n lower triangular part of the global symmetric submatrix A_{ia:ia+n-1, ja:ja+n-1} must contain the lower triangular part of the submatrix, and the strictly upper triangular part is not referenced.

Scope: local

Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 40. Details about the square block-cyclic data distribution of global matrix A are stored in desc_a.

ia

is the row index of the global matrix A, identifying the first row of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ia <= M_A and ia+n-1 <= M_A.

ja

is the column index of the global matrix A, identifying the first column of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ja <= N_A and ja+n-1 <= N_A.

desc_a

is the array descriptor for global matrix A, described in the following table:

`desc_a`	Name	Description	Limits	Scope
1	DTYPE_A	Descriptor type	DTYPE_A=1	Global
2	CTXT_A	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_A	Number of rows in the global matrix	If `n` = 0: M_A >= 0 Otherwise: M_A >= 1	Global
4	N_A	Number of columns in the global matrix	If `n` = 0: N_A >= 0 Otherwise: N_A >= 1	Global
5	MB_A	Row block size	MB_A >= 1	Global
6	NB_A	Column block size	NB_A >= 1	Global
7	RSRC_A	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_A < `p`	Global
8	CSRC_A	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_A < `q`	Global
9	LLD_A	The leading dimension of the local array	LLD_A >= max(1,LOCp(M_A))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

On Return

a

is the updated local part of the global matrix A, containing the results of the computation.

Scope: local

Returned as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 40.

Notes and Coding Rules

This subroutine accepts lowercase letters for the uplo argument.
The matrix and vector must have no common elements; otherwise, results are unpredictable.
The NUMROC utility subroutine can be used to determine the values of LOCp(M_) and LOCq(N_) used in the argument descriptions above. For details, see "Determining the Number of Rows and Columns in Your Local Arrays" and NUMROC--Compute the Number of Rows or Columns of a Block-Cyclically Distributed Matrix Contained in a Process.
For suggested block sizes, see "Coding Tips for Optimizing Parallel Performance".
The following values must be equal: CTXT_A = CTXT_X.
The global symmetric matrix A must be distributed using a square block-cyclic distribution; that is, MB_A = NB_A.
The block row and block column offsets of the global symmetric matrix A must be equal; that is, mod(ia-1, MB_A) = mod(ja-1, NB_A).
If incx = M_X:
- In the process grid, the process column containing the first column of the submatrix A must also contain the first column of the submatrix X; that is, iacol = ixcol, where:
  
  iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
  ixcol = mod((((jx-1)/NB_X)+CSRC_X), q)
- The block column offset of x must be equal to the block row offset of A; that is, mod(jx-1, NB_X) = mod(ia-1, MB_A).
- The following block sizes must be equal: NB_X = NB_A.
If incx = 1( <> M_X):
- In the process grid, the process row containing the first row of the submatrix A must also contain the first row of the submatrix X; that is, iarow = ixrow, where:
  
  iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
  ixrow = mod((((ix-1)/MB_X)+RSRC_X), p)
- The block row offset of x must be equal to the block row offset of A; that is, mod(ix-1, MB_X) = mod(ia-1, MB_A).
- The following block sizes must be equal: MB_X = MB_A.
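The alignment rules above can be checked before the call with a few integer operations. The following Python sketch is one possible pre-check, not part of Parallel ESSL. It assumes that "/" in the formulas denotes integer division, represents each descriptor as a dictionary with hypothetical keys (M, MB, NB, RSRC, CSRC), and uses the hypothetical function name pdsyr_alignment_errors:

```python
def pdsyr_alignment_errors(ia, ja, ix, jx, incx, desc_a, desc_x, p, q):
    """Return the list of alignment rules that the arguments violate."""
    errors = []
    if desc_a["MB"] != desc_a["NB"]:
        errors.append("MB_A <> NB_A")
    row_off_a = (ia - 1) % desc_a["MB"]           # block row offset of A
    if row_off_a != (ja - 1) % desc_a["NB"]:
        errors.append("mod(ia-1, MB_A) <> mod(ja-1, NB_A)")
    if incx == desc_x["M"]:
        # Row-distributed x: compare process columns and column offsets.
        iacol = ((ja - 1) // desc_a["NB"] + desc_a["CSRC"]) % q
        ixcol = ((jx - 1) // desc_x["NB"] + desc_x["CSRC"]) % q
        if iacol != ixcol:
            errors.append("iacol <> ixcol")
        if (jx - 1) % desc_x["NB"] != row_off_a:
            errors.append("mod(jx-1, NB_X) <> mod(ia-1, MB_A)")
        if desc_x["NB"] != desc_a["NB"]:
            errors.append("NB_X <> NB_A")
    elif incx == 1:
        # Column-distributed x: compare process rows and row offsets.
        iarow = ((ia - 1) // desc_a["MB"] + desc_a["RSRC"]) % p
        ixrow = ((ix - 1) // desc_x["MB"] + desc_x["RSRC"]) % p
        if iarow != ixrow:
            errors.append("iarow <> ixrow")
        if (ix - 1) % desc_x["MB"] != row_off_a:
            errors.append("mod(ix-1, MB_X) <> mod(ia-1, MB_A)")
        if desc_x["MB"] != desc_a["MB"]:
            errors.append("MB_X <> MB_A")
    else:
        errors.append("incx <> M_X and incx <> 1")
    return errors


# Illustrative descriptors: a 9 x 9 matrix A and a 9 x 1 matrix X on a 2 x 2 grid.
# NB_X = 1 is an assumed value chosen only for this sketch.
desc_a = {"M": 9, "MB": 4, "NB": 4, "RSRC": 0, "CSRC": 0}
desc_x = {"M": 9, "MB": 4, "NB": 1, "RSRC": 0, "CSRC": 0}
print(pdsyr_alignment_errors(1, 1, 1, 1, 1, desc_a, desc_x, 2, 2))
print(pdsyr_alignment_errors(1, 1, 2, 1, 1, desc_a, desc_x, 2, 2))
```

The second call shifts ix to 2, so the block row offset of x no longer matches that of A and the corresponding rule is reported.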

Error Conditions

Computational Errors

None

Resource Errors

Unable to allocate work space

Input-Argument and Miscellaneous Errors

Stage 1

DTYPE_A is invalid.
DTYPE_X is invalid.

Stage 2

CTXT_A is invalid.

Stage 3

PDSYR was called from outside the process grid.

Stage 4

uplo <> 'U' or 'L'
n < 0
M_X < 0 and n = 0; M_X < 1 otherwise
N_X < 0 and n = 0; N_X < 1 otherwise
MB_X < 1
NB_X < 1
RSRC_X < 0 or RSRC_X >= p
CSRC_X < 0 or CSRC_X >= q
CTXT_A <> CTXT_X
ix < 1
jx < 1
M_A < 0 and n = 0; M_A < 1 otherwise
N_A < 0 and n = 0; N_A < 1 otherwise
MB_A < 1
NB_A < 1
RSRC_A < 0 or RSRC_A >= p
CSRC_A < 0 or CSRC_A >= q
ia < 1
ja < 1

Stage 5

NB_A <> MB_A

If n <> 0:
ia > M_A
ja > N_A
ia+n-1 > M_A
ja+n-1 > N_A
ix > M_X
jx > N_X

If incx = M_X:
NB_X <> NB_A
mod(jx-1, NB_X) <> mod(ia-1, MB_A)
n <> 0 and jx+n-1 > N_X

If incx = 1( <> M_X):
MB_X <> MB_A
mod(ix-1, MB_X) <> mod(ia-1, MB_A)
n <> 0 and ix+n-1 > M_X

Otherwise:
incx <> M_X and incx <> 1

Stage 6

mod(ja-1, NB_A) <> mod(ia-1, MB_A)
If incx = M_X, then (in the process grid) the process column containing the first column of the submatrix A does not contain the first column of the submatrix X; that is, iacol <> ixcol, where:

iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
ixcol = mod((((jx-1)/NB_X)+CSRC_X), q)
If incx = 1( <> M_X), then (in the process grid) the process row containing the first row of the submatrix A does not contain the first row of the submatrix X; that is, iarow <> ixrow, where:

iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
ixrow = mod((((ix-1)/MB_X)+RSRC_X), p)
LLD_A < max(1, LOCp(M_A))
LLD_X < max(1, LOCp(M_X))

Example

This example computes A = alpha x x^T + A using a 2 × 2 process grid.

Call Statements and Input

ORDER = 'R'
NPROW = 2
NPCOL = 2
CALL BLACS_GET (0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
            UPLO  N   ALPHA   X  IX  JX   DESC_X  INCX  A  IA  JA   DESC_A
             |    |     |     |   |   |     |      |    |   |   |     |
CALL PDSYR( 'L' , 9 , 1.0D0 , X , 1 , 1 , DESC_X , 1  , A , 1 , 1 , DESC_A)

	Desc_A	Desc_X
DTYPE_
CTXT_	icontxt¹	icontxt¹
MB_
NB_
RSRC_
CSRC_
LLD_	See below²	See below²

¹ icontxt is the output of the BLACS_GRIDINIT call.

² Each process should set the LLD_ as follows:

LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
LLD_X = MAX(1,NUMROC(M_X, MB_X, MYROW, RSRC_X, NPROW))

In this example, LLD_A = 5 on P₀₀ and P₀₁, LLD_A = 4 on P₁₀ and P₁₁, LLD_X = 5 on P₀₀, and LLD_X = 4 on P₁₀.

Global symmetric matrix A of order 9 with block size 4 × 4:

B,D               0                           1                  2
     *                                                               *
     |   1.0    .     .     .   |     .     .     .     .   |     .  |
     |   2.0  12.0    .     .   |     .     .     .     .   |     .  |
 0   |   3.0  13.0  23.0    .   |     .     .     .     .   |     .  |
     |   4.0  14.0  24.0  34.0  |     .     .     .     .   |     .  |
     | -------------------------|---------------------------|------- |
     |   5.0  15.0  25.0  35.0  |   45.0    .     .     .   |     .  |
     |   6.0  16.0  26.0  36.0  |   46.0  56.0    .     .   |     .  |
 1   |   7.0  17.0  27.0  37.0  |   47.0  57.0  67.0    .   |     .  |
     |   8.0  18.0  28.0  38.0  |   48.0  58.0  68.0  78.0  |     .  |
     | -------------------------|---------------------------|------- |
 2   |   9.0  19.0  29.0  39.0  |   49.0  59.0  69.0  79.0  |   89.0 |
     *                                                               *

The following is the 2 × 2 process grid:

B,D	0 2	1
0 2	P₀₀	P₀₁
1	P₁₀	P₁₁

Local arrays for A:

p,q  |               0                |             1
-----|--------------------------------|--------------------------
     |   1.0    .     .     .     .   |     .     .     .     .
     |   2.0  12.0    .     .     .   |     .     .     .     .
 0   |   3.0  13.0  23.0    .     .   |     .     .     .     .
     |   4.0  14.0  24.0  34.0    .   |     .     .     .     .
     |   9.0  19.0  29.0  39.0  89.0  |   49.0  59.0  69.0  79.0
-----|--------------------------------|--------------------------
     |   5.0  15.0  25.0  35.0    .   |   45.0    .     .     .
     |   6.0  16.0  26.0  36.0    .   |   46.0  56.0    .     .
 1   |   7.0  17.0  27.0  37.0    .   |   47.0  57.0  67.0    .
     |   8.0  18.0  28.0  38.0    .   |   48.0  58.0  68.0  78.0

Global vector x of size 9 × 1 with block size 4:

B,D     0
     *      *
     |  1.0 |
     |  1.0 |
 0   |  1.0 |
     |  1.0 |
     | ---- |
     |  1.0 |
     |  1.0 |
 1   |  1.0 |
     |  1.0 |
     | ---- |
 2   |  1.0 |
     *      *

The following is the 2 × 2 process grid:

B,D	0	--
0 2	P₀₀	P₀₁
1	P₁₀	P₁₁

Local arrays for x:

p,q  |  0
-----|------
     |  1.0
     |  1.0
 0   |  1.0
     |  1.0
     |  1.0
-----|------
     |  1.0
     |  1.0
 1   |  1.0
     |  1.0

Output:

Global matrix A of order 9 with block size 4 × 4:

B,D               0                           1                  2
     *                                                               *
     |   2.0    .     .     .   |     .     .     .     .   |     .  |
     |   3.0  13.0    .     .   |     .     .     .     .   |     .  |
 0   |   4.0  14.0  24.0    .   |     .     .     .     .   |     .  |
     |   5.0  15.0  25.0  35.0  |     .     .     .     .   |     .  |
     | -------------------------|---------------------------|------- |
     |   6.0  16.0  26.0  36.0  |   46.0    .     .     .   |     .  |
     |   7.0  17.0  27.0  37.0  |   47.0  57.0    .     .   |     .  |
 1   |   8.0  18.0  28.0  38.0  |   48.0  58.0  68.0    .   |     .  |
     |   9.0  19.0  29.0  39.0  |   49.0  59.0  69.0  79.0  |     .  |
     | -------------------------|---------------------------|------- |
 2   |  10.0  20.0  30.0  40.0  |   50.0  60.0  70.0  80.0  |   90.0 |
     *                                                               *

The following is the 2 × 2 process grid:

B,D	0 2	1
0 2	P₀₀	P₀₁
1	P₁₀	P₁₁

Local arrays for A:

p,q  |               0                |             1
-----|--------------------------------|--------------------------
     |   2.0    .     .     .     .   |     .     .     .     .
     |   3.0  13.0    .     .     .   |     .     .     .     .
 0   |   4.0  14.0  24.0    .     .   |     .     .     .     .
     |   5.0  15.0  25.0  35.0    .   |     .     .     .     .
     |  10.0  20.0  30.0  40.0  90.0  |   50.0  60.0  70.0  80.0
-----|--------------------------------|--------------------------
     |   6.0  16.0  26.0  36.0    .   |   46.0    .     .     .
     |   7.0  17.0  27.0  37.0    .   |   47.0  57.0    .     .
 1   |   8.0  18.0  28.0  38.0    .   |   48.0  58.0  68.0    .
     |   9.0  19.0  29.0  39.0    .   |   49.0  59.0  69.0  79.0
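The global output of this example can be reproduced with a serial model of the symmetric rank-one update that touches only the triangle selected by uplo. The following Python sketch is not the PDSYR implementation; it ignores the data distribution, and syr_update is a hypothetical helper name:

```python
def syr_update(a, x, alpha, uplo):
    """Update, in place, the uplo triangle of a with alpha*x*x^T.
    Entries outside that triangle are never read or written."""
    n = len(x)
    lower = uplo.upper() == "L"          # lowercase letters are accepted
    for i in range(n):
        for j in range(n):
            if (j <= i) if lower else (j >= i):
                a[i][j] += alpha * x[i] * x[j]
    return a


# Lower triangle of the order-9 example matrix; None marks unreferenced entries.
a = [[float(10 * (j - 1) + i) if i >= j else None for j in range(1, 10)]
     for i in range(1, 10)]
x = [1.0] * 9

syr_update(a, x, 1.0, "L")
print(a[0][:3])   # start of the first row after the update
print(a[8])       # last row after the update
```

Storing None in the strictly upper triangle makes the sketch fail loudly if that part were ever referenced, which mirrors the statement that it is not referenced when uplo = 'L'.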

PDSYR2--Rank-Two Update of a Real Symmetric Matrix

This subroutine computes the following rank-two update:

A <-- alpha x y^T + alpha y x^T + A

where, in the formula above:

A represents the global symmetric submatrix A_{ia:ia+n-1, ja:ja+n-1}.

x represents the global vector:

For incx = M_X, it is X_{ix:ix, jx:jx+n-1}.
For incx = 1 and incx <> M_X, it is X_{ix:ix+n-1, jx:jx}.

y represents the global vector:

For incy = M_Y, it is Y_{iy:iy, jy:jy+n-1}.
For incy = 1 and incy <> M_Y, it is Y_{iy:iy+n-1, jy:jy}.

alpha is a scalar.

Note: No data should be moved to form the vector transposes; that is, the vectors should always be stored in their untransposed form.

In the following two cases, no computation is performed and the subroutine returns after doing some parameter checking:

n = 0
alpha is zero.

See references [14] and [15].
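As an illustration of the formula above, the following Python sketch applies the rank-two update serially to one triangle of a small matrix. It is a model of the arithmetic only, not of the distributed PDSYR2 subroutine, and syr2_update is a hypothetical helper name:

```python
def syr2_update(a, x, y, alpha, uplo):
    """Update, in place, the uplo triangle of a with alpha*x*y^T + alpha*y*x^T."""
    n = len(x)
    if n == 0 or alpha == 0:
        return a                         # the two no-computation cases
    lower = uplo.upper() == "L"
    for i in range(n):
        for j in range(n):
            if (j <= i) if lower else (j >= i):
                a[i][j] += alpha * (x[i] * y[j] + y[i] * x[j])
    return a


# A 2 x 2 symmetric matrix stored in its lower triangle (None = not referenced).
a = [[1.0, None],
     [2.0, 3.0]]
syr2_update(a, [1.0, 2.0], [3.0, 4.0], 1.0, "L")
print(a)
```

Because x y^T + y x^T is symmetric, updating a single triangle is sufficient to represent the full updated matrix.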

Table 41. Data Types

A, x, y, alpha	Subprogram
Long-precision real	PDSYR2

Syntax

Fortran	CALL PDSYR2 (`uplo`, `n`, `alpha`, `x`, `ix`, `jx`, `desc_x`, `incx`, `y`, `iy`, `jy`, `desc_y`, `incy`, `a`, `ia`, `ja`, `desc_a`)
C and C++	pdsyr2 (`uplo`, `n`, `alpha`, `x`, `ix`, `jx`, `desc_x`, `incx`, `y`, `iy`, `jy`, `desc_y`, `incy`, `a`, `ia`, `ja`, `desc_a`);

On Entry

uplo

indicates whether the upper or lower triangular part of the global symmetric submatrix A is referenced, where:

If uplo = 'U', the upper triangular part is referenced.

If uplo = 'L', the lower triangular part is referenced.

Scope: global

Specified as: a single character; uplo = 'U' or 'L'.

n

is the number of rows and columns in submatrix A and the number of elements in vectors x and y used in the computation.

Scope: global

Specified as: a fullword integer; n >= 0.

alpha

is the scalar alpha.

Scope: global

Specified as: a number of the data type indicated in Table 41.

x

is the local part of the global matrix X, where:

If incx = M_X, the leading LOCp(ix) by LOCq(jx+n-1) part of the local array X must contain the local pieces of the leading ix by jx+n-1 part of the global matrix.
If incx = 1 and incx <> M_X, the leading LOCp(ix+n-1) by LOCq(jx) part of the local array X must contain the local pieces of the leading ix+n-1 by jx part of the global matrix.

Note:

No data should be moved to form x^T; that is, the vector x should always be stored in its untransposed form.

Scope: local

Specified as: an LLD_X by (at least) LOCq(N_X) array, containing numbers of the data type indicated in Table 41. Details about the block-cyclic data distribution of the global matrix X are stored in desc_x.

ix

has the following meaning:

If incx = M_X, it indicates which row of global matrix X is used for vector x.

If incx = 1 and incx <> M_X, it is the row index of global matrix X, identifying the first element of vector x.

Scope: global

Specified as: a fullword integer; 1 <= ix <= M_X and:

If incx = 1 and incx <> M_X, then ix+n-1 <= M_X.

jx

has the following meaning:

If incx = M_X, it is the column index of global matrix X, identifying the first element of vector x.

If incx = 1 and incx <> M_X, it indicates which column of global matrix X is used for vector x.

Scope: global

Specified as: a fullword integer; 1 <= jx <= N_X and:

If incx = M_X, then jx+n-1 <= N_X.

desc_x

is the array descriptor for global matrix X, described in the following table:

`desc_x`	Name	Description	Limits	Scope
1	DTYPE_X	Descriptor type	DTYPE_X=1	Global
2	CTXT_X	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_X	Number of rows in the global matrix	If `n` = 0: M_X >= 0 Otherwise: M_X >= 1	Global
4	N_X	Number of columns in the global matrix	If `n` = 0: N_X >= 0 Otherwise: N_X >= 1	Global
5	MB_X	Row block size	MB_X >= 1	Global
6	NB_X	Column block size	NB_X >= 1	Global
7	RSRC_X	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_X < `p`	Global
8	CSRC_X	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_X < `q`	Global
9	LLD_X	The leading dimension of the local array	LLD_X >= max(1,LOCp(M_X))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

incx

is the stride for global vector x.

Scope: global

Specified as: a fullword integer; incx = 1 or incx = M_X, where:

If incx = M_X, then x is a row-distributed vector.

If incx = 1 and incx <> M_X, then x is a column-distributed vector.

y

is the local part of the global matrix Y, where:

If incy = M_Y, the leading LOCp(iy) by LOCq(jy+n-1) part of the local array Y must contain the local pieces of the leading iy by jy+n-1 part of the global matrix.
If incy = 1 and incy <> M_Y, the leading LOCp(iy+n-1) by LOCq(jy) part of the local array Y must contain the local pieces of the leading iy+n-1 by jy part of the global matrix.

Scope: local

Specified as: an LLD_Y by (at least) LOCq(N_Y) array, containing numbers of the data type indicated in Table 41. Details about the block-cyclic data distribution of the global matrix Y are stored in desc_y.

iy

has the following meaning:

If incy = M_Y, it indicates which row of global matrix Y is used for vector y.

If incy = 1 and incy <> M_Y, it is the row index of global matrix Y, identifying the first element of vector y.

Scope: global

Specified as: a fullword integer; 1 <= iy <= M_Y and:

If incy = 1 and incy <> M_Y, then iy+n-1 <= M_Y.

jy

has the following meaning:

If incy = M_Y, it is the column index of global matrix Y, identifying the first element of vector y.

If incy = 1 and incy <> M_Y, it indicates which column of global matrix Y is used for vector y.

Scope: global

Specified as: a fullword integer; 1 <= jy <= N_Y and:

If incy = M_Y, then jy+n-1 <= N_Y.

desc_y

is the array descriptor for global matrix Y, described in the following table:

`desc_y`	Name	Description	Limits	Scope
1	DTYPE_Y	Descriptor type	DTYPE_Y=1	Global
2	CTXT_Y	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_Y	Number of rows in the global matrix	If `n` = 0: M_Y >= 0 Otherwise: M_Y >= 1	Global
4	N_Y	Number of columns in the global matrix	If `n` = 0: N_Y >= 0 Otherwise: N_Y >= 1	Global
5	MB_Y	Row block size	MB_Y >= 1	Global
6	NB_Y	Column block size	NB_Y >= 1	Global
7	RSRC_Y	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_Y < `p`	Global
8	CSRC_Y	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_Y < `q`	Global
9	LLD_Y	The leading dimension of the local array	LLD_Y >= max(1,LOCp(M_Y))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

incy

is the stride for global vector y.

Scope: global

Specified as: a fullword integer; incy = 1 or incy = M_Y, where:

If incy = M_Y, then y is a row-distributed vector.

If incy = 1 and incy <> M_Y, then y is a column-distributed vector.

a

is the local part of the global symmetric matrix A, where:

If uplo = 'U', the leading n × n upper triangular part of the global symmetric submatrix A_{ia:ia+n-1, ja:ja+n-1} must contain the upper triangular part of the submatrix, and the strictly lower triangular part is not referenced.
If uplo = 'L', the leading n × n lower triangular part of the global symmetric submatrix A_{ia:ia+n-1, ja:ja+n-1} must contain the lower triangular part of the submatrix, and the strictly upper triangular part is not referenced.

Scope: local

Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 41. Details about the square block-cyclic data distribution of global matrix A are stored in desc_a.

ia

is the row index of the global matrix A, identifying the first row of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ia <= M_A and ia+n-1 <= M_A.

ja

is the column index of the global matrix A, identifying the first column of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ja <= N_A and ja+n-1 <= N_A.

desc_a

is the array descriptor for global matrix A, described in the following table:

`desc_a`	Name	Description	Limits	Scope
1	DTYPE_A	Descriptor type	DTYPE_A=1	Global
2	CTXT_A	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_A	Number of rows in the global matrix	If `n` = 0: M_A >= 0 Otherwise: M_A >= 1	Global
4	N_A	Number of columns in the global matrix	If `n` = 0: N_A >= 0 Otherwise: N_A >= 1	Global
5	MB_A	Row block size	MB_A >= 1	Global
6	NB_A	Column block size	NB_A >= 1	Global
7	RSRC_A	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_A < `p`	Global
8	CSRC_A	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_A < `q`	Global
9	LLD_A	The leading dimension of the local array	LLD_A >= max(1,LOCp(M_A))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

On Return

a

is the updated local part of the global matrix A, containing the results of the computation.

Scope: local

Returned as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 41.

Notes and Coding Rules

This subroutine accepts lowercase letters for the uplo argument.
The matrix and vectors must have no common elements; otherwise, results are unpredictable.
The NUMROC utility subroutine can be used to determine the values of LOCp(M_) and LOCq(N_) used in the argument descriptions above. For details, see "Determining the Number of Rows and Columns in Your Local Arrays" and NUMROC--Compute the Number of Rows or Columns of a Block-Cyclically Distributed Matrix Contained in a Process.
For suggested block sizes, see "Coding Tips for Optimizing Parallel Performance".
The following values must be equal: CTXT_A = CTXT_X = CTXT_Y.
The vectors x and y must be distributed along the same axis--that is, they must both be row distributed or column distributed, where:
- incx = M_X and incy = M_Y for row distribution
- incx = 1( <> M_X) and incy = 1( <> M_Y) for column distribution
The global symmetric matrix A must be distributed using a square block-cyclic distribution; that is, MB_A = NB_A.
The block row and block column offsets of the global symmetric matrix A must be equal; that is, mod(ia-1, MB_A) = mod(ja-1, NB_A).
If incx = M_X:
- In the process grid, the process column containing the first column of the submatrix X must also contain the first column of the submatrix A; that is, iacol = ixcol, where:
  
  iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
  ixcol = mod((((jx-1)/NB_X)+CSRC_X), q)
- The block column offset of x must be equal to the block row offset of A; that is, mod(jx-1, NB_X) = mod(ia-1, MB_A).
- The following block sizes must be equal: NB_X = NB_A.
If incx = 1( <> M_X):
- In the process grid, the process row containing the first row of the submatrix X must also contain the first row of the submatrix A; that is, iarow = ixrow, where:
  
  iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
  ixrow = mod((((ix-1)/MB_X)+RSRC_X), p)
- The block row offset of x must be equal to the block row offset of A; that is, mod(ix-1, MB_X) = mod(ia-1, MB_A).
- The following block sizes must be equal: MB_X = MB_A.
If incy = M_Y:
- In the process grid, the process column containing the first column of the submatrix Y must also contain the first column of the submatrix A; that is, iacol = iycol, where:
  
  iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
  iycol = mod((((jy-1)/NB_Y)+CSRC_Y), q)
- The block column offset of y must be equal to the block row offset of A; that is, mod(jy-1, NB_Y) = mod(ia-1, MB_A).
- The following block sizes must be equal: NB_Y = NB_A.
If incy = 1( <> M_Y):
- In the process grid, the process row containing the first row of the submatrix Y must also contain the first row of the submatrix A; that is, iarow = iyrow, where:
  
  iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
  iyrow = mod((((iy-1)/MB_Y)+RSRC_Y), p)
- The block row offset of y must be equal to the block row offset of A; that is, mod(iy-1, MB_Y) = mod(ia-1, MB_A).
- The following block sizes must be equal: MB_Y = MB_A.
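
The alignment rules above can be checked before the call is made. The following Python sketch is illustrative only and is not part of Parallel ESSL; the function name column_vector_aligned and its argument order are hypothetical. It covers the column-distributed case (incx = 1 <> M_X) and assumes that the division in the iarow and ixrow formulas is integer division, as in Fortran integer arithmetic.

```python
def column_vector_aligned(ia, ja, mb_a, nb_a, rsrc_a, ix, mb_x, rsrc_x, p):
    """Return True if a column-distributed vector x (incx = 1 <> M_X)
    satisfies the alignment rules stated for the symmetric submatrix A."""
    # Square block-cyclic distribution: MB_A = NB_A.
    if mb_a != nb_a:
        return False
    # Block row and block column offsets of A must be equal.
    if (ia - 1) % mb_a != (ja - 1) % nb_a:
        return False
    # Block sizes must match: MB_X = MB_A.
    if mb_x != mb_a:
        return False
    # Block row offset of x must equal the block row offset of A.
    if (ix - 1) % mb_x != (ia - 1) % mb_a:
        return False
    # The first rows must be held by the same process row: iarow = ixrow.
    iarow = ((ia - 1) // mb_a + rsrc_a) % p
    ixrow = ((ix - 1) // mb_x + rsrc_x) % p
    return iarow == ixrow


# ia = ja = ix = 1, 4 x 4 blocks, RSRC_A = RSRC_X = 0 (assumed), p = 2.
print(column_vector_aligned(1, 1, 4, 4, 0, 1, 4, 0, 2))   # True
# Starting x one row lower breaks the block row offset rule.
print(column_vector_aligned(1, 1, 4, 4, 0, 2, 4, 0, 2))   # False
```

The same pattern applies to y, and to the row-distributed case with the column quantities (ja, NB_, CSRC_, q) substituted.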

Error Conditions

Computational Errors

None

Resource Errors

Unable to allocate work space

Input-Argument and Miscellaneous Errors

Stage 1

DTYPE_A is invalid.
DTYPE_X is invalid.
DTYPE_Y is invalid.

Stage 2

CTXT_A is invalid.

Stage 3

PDSYR2 was called from outside the process grid.

Stage 4

uplo <> 'U' or 'L'
n < 0
M_X < 0 and n = 0; M_X < 1 otherwise
N_X < 0 and n = 0; N_X < 1 otherwise
MB_X < 1
NB_X < 1
RSRC_X < 0 or RSRC_X >= p
CSRC_X < 0 or CSRC_X >= q
CTXT_A <> CTXT_X
ix < 1
jx < 1
M_Y < 0 and n = 0; M_Y < 1 otherwise
N_Y < 0 and n = 0; N_Y < 1 otherwise
MB_Y < 1
NB_Y < 1
RSRC_Y < 0 or RSRC_Y >= p
CSRC_Y < 0 or CSRC_Y >= q
CTXT_A <> CTXT_Y
iy < 1
jy < 1
M_A < 0 and n = 0; M_A < 1 otherwise
N_A < 0 and n = 0; N_A < 1 otherwise
MB_A < 1
NB_A < 1
RSRC_A < 0 or RSRC_A >= p
CSRC_A < 0 or CSRC_A >= q
ia < 1
ja < 1

Stage 5

NB_A <> MB_A

If n <> 0:
ia > M_A
ja > N_A
ia+n-1 > M_A
ja+n-1 > N_A
ix > M_X
jx > N_X
iy > M_Y
jy > N_Y

If incx = M_X:
NB_X <> NB_A
mod(jx-1, NB_X) <> mod(ia-1, MB_A)
n <> 0 and jx+n-1 > N_X

If incx = 1( <> M_X):
MB_X <> MB_A
mod(ix-1, MB_X) <> mod(ia-1, MB_A)
n <> 0 and ix+n-1 > M_X

Otherwise:
incx <> M_X and incx <> 1

If incy = M_Y:
NB_Y <> NB_A
mod(jy-1, NB_Y) <> mod(ia-1, MB_A)
n <> 0 and jy+n-1 > N_Y

If incy = 1( <> M_Y):
MB_Y <> MB_A
mod(iy-1, MB_Y) <> mod(ia-1, MB_A)
n <> 0 and iy+n-1 > M_Y

Otherwise:
incy <> M_Y and incy <> 1

Stage 6

mod(ja-1, NB_A) <> mod(ia-1, MB_A)
If incx = M_X, then (in the process grid) the process column containing the first column of the submatrix A does not contain the first column of the submatrix X; that is, iacol <> ixcol, where:

iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
ixcol = mod((((jx-1)/NB_X)+CSRC_X), q)
If incx = 1( <> M_X), then (in the process grid) the process row containing the first row of the submatrix A does not contain the first row of the submatrix X; that is, iarow <> ixrow, where:

iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
ixrow = mod((((ix-1)/MB_X)+RSRC_X), p)
If incy = M_Y, then (in the process grid) the process column containing the first column of the submatrix A does not contain the first column of the submatrix Y; that is, iacol <> iycol, where:

iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
iycol = mod((((jy-1)/NB_Y)+CSRC_Y), q)
If incy = 1( <> M_Y), then (in the process grid) the process row containing the first row of the submatrix A does not contain the first row of the submatrix Y; that is, iarow <> iyrow, where:

iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
iyrow = mod((((iy-1)/MB_Y)+RSRC_Y), p)
LLD_A < max(1, LOCp(M_A))
LLD_X < max(1, LOCp(M_X))
LLD_Y < max(1, LOCp(M_Y))

Example

This example computes A = alpha xy^T + alpha yx^T + A using a 2 × 2 process grid.

Call Statements and Input

 ORDER = 'R'
 NPROW = 2
 NPCOL = 2
 CALL BLACS_GET (0, 0, ICONTXT)
 CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
 CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
              UPLO   N    ALPHA    X  IX  JX    DESC_X   INCX   Y  IY  JY
               |     |      |      |   |   |      |       |     |   |   |
 CALL PDSYR2( 'L'  , 9 ,  1.0D0  , X , 1 , 1 ,  DESC_X ,  1  ,  Y , 1 , 1 ,
 
               DESC_Y   INCY   A  IA  JA   DESC_A
                 |       |     |   |   |     |
               DESC_Y ,  1   , A , 1 , 1 , DESC_A  )

	Desc_A	Desc_X	Desc_Y
DTYPE_	1	1	1
CTXT_	icontxt¹	icontxt¹	icontxt¹
M_	9	9	9
N_	9	1	1
MB_	4	4	4
NB_	4		
RSRC_	0	0	0
CSRC_	0	0	0
LLD_	See below²	See below²	See below²

¹ icontxt is the output of the BLACS_GRIDINIT call.

² Each process should set the LLD_ as follows:

LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
LLD_X = MAX(1,NUMROC(M_X, MB_X, MYROW, RSRC_X, NPROW))
LLD_Y = MAX(1,NUMROC(M_Y, MB_Y, MYROW, RSRC_Y, NPROW))

In this example, LLD_A = 5 on P₀₀ and P₀₁, LLD_A = 4 on P₁₀ and P₁₁, LLD_X = LLD_Y = 5 on P₀₀, and LLD_X = LLD_Y = 4 on P₁₀.
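
The LLD_ values quoted above can be reproduced with a small model of the block-counting arithmetic that NUMROC performs. The following Python sketch is illustrative only; it is not the library routine, and the lowercase name numroc is used here only for the model. It assumes M_A = 9 and MB_A = 4 (from the matrix description in this example) and RSRC_A = 0.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Number of rows (or columns) of a block-cyclically distributed
    dimension of global size n that are held by process coordinate iproc."""
    mydist = (nprocs + iproc - isrcproc) % nprocs   # distance from the source process
    nblocks = n // nb                               # number of complete blocks
    count = (nblocks // nprocs) * nb                # complete rounds every process receives
    extrablks = nblocks % nprocs                    # leftover complete blocks
    if mydist < extrablks:
        count += nb                                 # one extra complete block
    elif mydist == extrablks:
        count += n % nb                             # the trailing partial block
    return count


# M_A = 9, MB_A = 4, RSRC_A = 0 (assumed), NPROW = 2.
lld_a = {myrow: max(1, numroc(9, 4, myrow, 0, 2)) for myrow in (0, 1)}
print(lld_a)   # {0: 5, 1: 4}
```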

Global symmetric matrix A of order 9 with block size 4 × 4:

B,D               0                           1                  2
     *                                                               *
     |   1.0    .     .     .   |     .     .     .     .   |     .  |
     |   2.0  12.0    .     .   |     .     .     .     .   |     .  |
 0   |   3.0  13.0  23.0    .   |     .     .     .     .   |     .  |
     |   4.0  14.0  24.0  34.0  |     .     .     .     .   |     .  |
     | -------------------------|---------------------------|------- |
     |   5.0  15.0  25.0  35.0  |   45.0    .     .     .   |     .  |
     |   6.0  16.0  26.0  36.0  |   46.0  56.0    .     .   |     .  |
 1   |   7.0  17.0  27.0  37.0  |   47.0  57.0  67.0    .   |     .  |
     |   8.0  18.0  28.0  38.0  |   48.0  58.0  68.0  78.0  |     .  |
     | -------------------------|---------------------------|------- |
 2   |   9.0  19.0  29.0  39.0  |   49.0  59.0  69.0  79.0  |   89.0 |
     *                                                               *

The following is the 2 × 2 process grid:

B,D  |  0 2  |   1
-----|-------|-------
0 2  |  P₀₀  |  P₀₁
1    |  P₁₀  |  P₁₁

Local arrays for A:

p,q  |               0                |             1
-----|--------------------------------|--------------------------
     |   1.0    .     .     .     .   |     .     .     .     .
     |   2.0  12.0    .     .     .   |     .     .     .     .
 0   |   3.0  13.0  23.0    .     .   |     .     .     .     .
     |   4.0  14.0  24.0  34.0    .   |     .     .     .     .
     |   9.0  19.0  29.0  39.0  89.0  |   49.0  59.0  69.0  79.0
-----|--------------------------------|--------------------------
     |   5.0  15.0  25.0  35.0    .   |   45.0    .     .     .
     |   6.0  16.0  26.0  36.0    .   |   46.0  56.0    .     .
 1   |   7.0  17.0  27.0  37.0    .   |   47.0  57.0  67.0    .
     |   8.0  18.0  28.0  38.0    .   |   48.0  58.0  68.0  78.0

Global vector x of size 9 × 1 with block size 4:

B,D     0
     *      *
     |  1.0 |
     |  1.0 |
 0   |  1.0 |
     |  1.0 |
     | ---- |
     |  1.0 |
     |  1.0 |
 1   |  1.0 |
     |  1.0 |
     | ---- |
 2   |  1.0 |
     *      *

The following is the 2 × 2 process grid:

B,D  |   0   |  --
-----|-------|-------
0 2  |  P₀₀  |  P₀₁
1    |  P₁₀  |  P₁₁

Local arrays for x:

p,q  |  0
-----|------
     |  1.0
     |  1.0
 0   |  1.0
     |  1.0
     |  1.0
-----|------
     |  1.0
     |  1.0
 1   |  1.0
     |  1.0

Global vector y of size 9 × 1 with block size 4:

B,D     0
     *      *
     |  2.0 |
     |  2.0 |
 0   |  2.0 |
     |  2.0 |
     | ---- |
     |  2.0 |
     |  2.0 |
 1   |  2.0 |
     |  2.0 |
     | ---- |
 2   |  2.0 |
     *      *

The following is the 2 × 2 process grid:

B,D  |   0   |  --
-----|-------|-------
0 2  |  P₀₀  |  P₀₁
1    |  P₁₀  |  P₁₁

Local arrays for y:

p,q  |  0
-----|------
     |  2.0
     |  2.0
 0   |  2.0
     |  2.0
     |  2.0
-----|------
     |  2.0
     |  2.0
 1   |  2.0
     |  2.0

Output:

Global matrix A of order 9 with block size 4 × 4:

B,D               0                           1                  2
     *                                                               *
     |   5.0    .     .     .   |     .     .     .     .   |     .  |
     |   6.0  16.0    .     .   |     .     .     .     .   |     .  |
 0   |   7.0  17.0  27.0    .   |     .     .     .     .   |     .  |
     |   8.0  18.0  28.0  38.0  |     .     .     .     .   |     .  |
     | -------------------------|---------------------------|------- |
     |   9.0  19.0  29.0  39.0  |   49.0    .     .     .   |     .  |
     |  10.0  20.0  30.0  40.0  |   50.0  60.0    .     .   |     .  |
 1   |  11.0  21.0  31.0  41.0  |   51.0  61.0  71.0    .   |     .  |
     |  12.0  22.0  32.0  42.0  |   52.0  62.0  72.0  82.0  |     .  |
     | -------------------------|---------------------------|------- |
 2   |  13.0  23.0  33.0  43.0  |   53.0  63.0  73.0  83.0  |   93.0 |
     *                                                               *

The following is the 2 × 2 process grid:

B,D  |  0 2  |   1
-----|-------|-------
0 2  |  P₀₀  |  P₀₁
1    |  P₁₀  |  P₁₁

Local arrays for A:

p,q  |               0                |             1
-----|--------------------------------|--------------------------
     |   5.0    .     .     .     .   |     .     .     .     .
     |   6.0  16.0    .     .     .   |     .     .     .     .
 0   |   7.0  17.0  27.0    .     .   |     .     .     .     .
     |   8.0  18.0  28.0  38.0    .   |     .     .     .     .
     |  13.0  23.0  33.0  43.0  93.0  |   53.0  63.0  73.0  83.0
-----|--------------------------------|--------------------------
     |   9.0  19.0  29.0  39.0    .   |   49.0    .     .     .
     |  10.0  20.0  30.0  40.0    .   |   50.0  60.0    .     .
 1   |  11.0  21.0  31.0  41.0    .   |   51.0  61.0  71.0    .
     |  12.0  22.0  32.0  42.0    .   |   52.0  62.0  72.0  82.0
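
The output above can be cross-checked against a serial computation. The following Python sketch is illustrative only and does not call Parallel ESSL; the function name syr2_lower is hypothetical. It relies on the observation that every stored entry of the input matrix in this example equals 10(j-1) + i for row i and column j (1-based, i >= j), and it updates only the lower triangle, as uplo = 'L' specifies.

```python
def syr2_lower(alpha, x, y, a):
    """Serial reference for A <- alpha*x*y^T + alpha*y*x^T + A,
    touching only the lower triangle (including the diagonal)."""
    n = len(x)
    for i in range(n):
        for j in range(i + 1):
            a[i][j] += alpha * x[i] * y[j] + alpha * y[i] * x[j]
    return a


n = 9
# Input lower triangle of the example; entries above the diagonal are not
# referenced and are marked None here.
a = [[float(10 * j + i + 1) if i >= j else None for j in range(n)]
     for i in range(n)]
x = [1.0] * n
y = [2.0] * n

syr2_lower(1.0, x, y, a)
print(a[0][0], a[8][0], a[8][8])   # 5.0 13.0 93.0
```

With alpha = 1.0, x = 1.0, and y = 2.0 throughout, every referenced element grows by 4.0, which matches the output matrix shown above.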

PDTRMV--Matrix-Vector Product for a Triangular Matrix or Its Transpose

This subroutine computes one of the following matrix-vector products:

1. x <-- Ax

2. x <-- A^Tx

where, in the formulas above:

A represents the global triangular submatrix A_{ia:ia+n-1, ja:ja+n-1}.

x represents the global vector:

For incx = M_X, it is X_{ix:ix, jx:jx+n-1}.
For incx = 1 and incx <> M_X, it is X_{ix:ix+n-1, jx:jx}.

Note: No data should be moved to form the matrix transpose; that is, the matrix should always be stored in its untransposed form.

If n = 0, no computation is performed and the subroutine returns after doing some parameter checking. See references [14] and [15].

Table 42. Data Types

A, x	Subprogram
Long-precision real	PDTRMV

Syntax

Fortran	CALL PDTRMV (`uplo`, `transa`, `diag`, `n`, `a`, `ia`, `ja`, `desc_a`, `x`, `ix`, `jx`, `desc_x`, `incx`)
C and C++	pdtrmv (`uplo`, `transa`, `diag`, `n`, `a`, `ia`, `ja`, `desc_a`, `x`, `ix`, `jx`, `desc_x`, `incx`);

On Entry

uplo

indicates whether the upper or lower triangular part of the global triangular submatrix A is referenced, where:

If uplo = 'U', the upper triangular part is referenced.

If uplo = 'L', the lower triangular part is referenced.

Scope: global

Specified as: a single character; uplo = 'U' or 'L'.

transa

indicates the form of matrix A to use in the computation, where:

If transa = 'N', A is used in the computation, resulting in equation 1.

If transa = 'T', A^T is used in the computation, resulting in equation 2.

Scope: global

Specified as: a single character; transa = 'N' or 'T'.

diag

indicates the characteristics of the diagonal of matrix A, where:

If diag = 'U', A is a unit triangular matrix.

If diag = 'N', A is not a unit triangular matrix.

Scope: global

Specified as: a single character; diag = 'U' or 'N'.

n

is the order of global triangular submatrix A and the length of global vector x.

Scope: global

Specified as: a fullword integer; n >= 0.

a

is the local part of the global triangular matrix A. This identifies the first element of the local array A. This subroutine computes the location of the first element of the local subarray used, based on ia, ja, desc_a, p, q, myrow, and mycol; therefore, the leading LOCp(ia+n-1) by LOCq(ja+n-1) part of the local array A must contain the local pieces of the leading ia+n-1 by ja+n-1 part of the global matrix, and:

If uplo = 'U', the leading n × n upper triangular part of the global triangular submatrix A_{ia:ia+n-1, ja:ja+n-1} must contain the upper triangular part of the submatrix, and the strictly lower triangular part is not referenced.
If uplo = 'L', the leading n × n lower triangular part of the global triangular submatrix A_{ia:ia+n-1, ja:ja+n-1} must contain the lower triangular part of the submatrix, and the strictly upper triangular part is not referenced.

Note:

No data should be moved to form A^T; that is, the matrix A should always be stored in its untransposed form.

Scope: local

Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 42. Details about the block-cyclic data distribution of global matrix A are stored in desc_a.

ia

is the row index of the global matrix A, identifying the first row of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ia and ia+n-1 <= M_A.

ja

is the column index of the global matrix A, identifying the first column of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ja and ja+n-1 <= N_A.

desc_a

is the array descriptor for global matrix A, described in the following table:

`desc_a`	Name	Description	Limits	Scope
1	DTYPE_A	Descriptor type	DTYPE_A=1	Global
2	CTXT_A	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_A	Number of rows in the global matrix	If `n` = 0: M_A >= 0 Otherwise: M_A >= 1	Global
4	N_A	Number of columns in the global matrix	If `n` = 0: N_A >= 0 Otherwise: N_A >= 1	Global
5	MB_A	Row block size	MB_A >= 1	Global
6	NB_A	Column block size	NB_A >= 1	Global
7	RSRC_A	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_A < `p`	Global
8	CSRC_A	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_A < `q`	Global
9	LLD_A	The leading dimension of the local array	LLD_A >= max(1,LOCp(M_A))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

x

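is the local part of the global matrix X. This identifies the first element of the local array X. This subroutine computes the location of the first element of the local subarray used, based on ix, jx, desc_x, p, q, myrow, and mycol; therefore:

If incx = M_X, the leading LOCp(ix) by LOCq(jx+n-1) part of the local array X must contain the local pieces of the leading ix by jx+n-1 part of the global matrix.
If incx = 1 and incx <> M_X, the leading LOCp(ix+n-1) by LOCq(jx) part of the local array X must contain the local pieces of the leading ix+n-1 by jx part of the global matrix.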

Scope: local

Specified as: an LLD_X by (at least) LOCq(N_X) array, containing numbers of the data type indicated in Table 42. Details about the block-cyclic data distribution of the global matrix X are stored in desc_x.

ix

has the following meaning:

If incx = M_X, it indicates which row of global matrix X is used for vector x.

If incx = 1 and incx <> M_X, it is the row index of global matrix X, identifying the first element of vector x.

Scope: global

Specified as: a fullword integer; 1 <= ix <= M_X and:

If incx = 1 and incx <> M_X, then ix+n-1 <= M_X.

jx

has the following meaning:

If incx = M_X, it is the column index of global matrix X, identifying the first element of vector x.

If incx = 1 and incx <> M_X, it indicates which column of global matrix X is used for vector x.

Scope: global

Specified as: a fullword integer; 1 <= jx <= N_X and:

If incx = M_X, then jx+n-1 <= N_X.

desc_x

is the array descriptor for global matrix X, described in the following table:

`desc_x`	Name	Description	Limits	Scope
1	DTYPE_X	Descriptor type	DTYPE_X=1	Global
2	CTXT_X	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_X	Number of rows in the global matrix	If `n` = 0: M_X >= 0 Otherwise: M_X >= 1	Global
4	N_X	Number of columns in the global matrix	If `n` = 0: N_X >= 0 Otherwise: N_X >= 1	Global
5	MB_X	Row block size	MB_X >= 1	Global
6	NB_X	Column block size	NB_X >= 1	Global
7	RSRC_X	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_X < `p`	Global
8	CSRC_X	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_X < `q`	Global
9	LLD_X	The leading dimension of the local array	LLD_X >= max(1,LOCp(M_X))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

incx

is the stride for global vector x.

Scope: global

Specified as: a fullword integer; incx = 1 or incx = M_X, where:

If incx = M_X, then x is a row-distributed vector.

If incx = 1 and incx <> M_X, then x is a column-distributed vector.
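
The two cases can be expressed as a selection of global index ranges within X. The following Python sketch is illustrative only; the function name vector_extent is hypothetical. Testing incx = M_X first mirrors the wording "incx = 1 and incx <> M_X": when M_X = 1, a stride of 1 denotes a row-distributed vector.

```python
def vector_extent(ix, jx, n, incx, m_x):
    """Return ((first_row, last_row), (first_col, last_col)) of vector x
    inside global matrix X, using 1-based inclusive indices."""
    if incx == m_x:
        # Row-distributed: x lies along row ix, X(ix:ix, jx:jx+n-1).
        return (ix, ix), (jx, jx + n - 1)
    if incx == 1:
        # Column-distributed: x lies along column jx, X(ix:ix+n-1, jx:jx).
        return (ix, ix + n - 1), (jx, jx)
    raise ValueError("incx must be 1 or M_X")


# Column-distributed vector of length 12 starting at row 2 of a 13-row X.
print(vector_extent(2, 1, 12, 1, 13))   # ((2, 13), (1, 1))
# Row-distributed vector of length 5 in a 1-row X (incx = M_X = 1).
print(vector_extent(1, 3, 5, 1, 1))     # ((1, 1), (3, 7))
```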

On Return

x

is the updated local part of the global matrix X, containing the results of the computation.

Scope: local

Returned as: an LLD_X by (at least) LOCq(N_X) array, containing numbers of the data type indicated in Table 42.

Notes and Coding Rules

This subroutine accepts lowercase letters for the uplo, transa, and diag arguments.
If you specify 'C' for transa, it is interpreted as though you specified 'T'.
The matrix and vector must have no common elements; otherwise, results are unpredictable.
PDTRMV assumes certain values in your array for parts of a triangular matrix. As a result, you do not have to set these values. For unit triangular matrices, the elements of the diagonal are assumed to be one. When using an upper or lower triangular matrix, the unreferenced elements in the strictly lower or upper triangular part, respectively, are assumed to be zero.
The NUMROC utility subroutine can be used to determine the values of LOCp(M_) and LOCq(N_) used in the argument descriptions above. For details, see "Determining the Number of Rows and Columns in Your Local Arrays" and NUMROC--Compute the Number of Rows or Columns of a Block-Cyclically Distributed Matrix Contained in a Process.
For suggested block sizes, see "Coding Tips for Optimizing Parallel Performance".
The following values must be equal: CTXT_A = CTXT_X.
The global triangular matrix A must be distributed using a square block-cyclic distribution; that is, MB_A = NB_A.
The block row and block column offsets of the global triangular matrix A must be equal; that is, mod(ia-1, MB_A) = mod(ja-1, NB_A).
If incx = M_X:
- The following block sizes must be equal: NB_X = MB_A = NB_A
- In the process grid, the process column containing the first column of the submatrix A must also contain the first column of the submatrix X; that is, iacol = ixcol, where:
  
  iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
  ixcol = mod((((jx-1)/NB_X)+CSRC_X), q)
- The block column offset of x must be equal to the block row and block column offsets of A; that is, mod(jx-1, NB_X) = mod(ja-1, NB_A) = mod(ia-1, MB_A).
If incx = 1( <> M_X):
- The following block sizes must be equal: MB_X = MB_A = NB_A
- In the process grid, the process row containing the first row of the submatrix A must also contain the first row of the submatrix X; that is, iarow = ixrow, where:
  
  iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
  ixrow = mod((((ix-1)/MB_X)+RSRC_X), p)
- The block row offset of x must be equal to the block row and block column offsets of A; that is, mod(ix-1, MB_X) = mod(ia-1, MB_A) = mod(ja-1, NB_A).

Error Conditions

Computational Errors

None

Resource Errors

Unable to allocate work space

Input-Argument and Miscellaneous Errors

Stage 1

DTYPE_A is invalid.
DTYPE_X is invalid.

Stage 2

CTXT_A is invalid.

Stage 3

PDTRMV was called from outside the process grid.

Stage 4

uplo <> 'U' or 'L'
transa <> 'N', 'T', or 'C'
diag <> 'N' or 'U'
n < 0
M_A < 0 and n = 0; M_A < 1 otherwise
N_A < 0 and n = 0; N_A < 1 otherwise
MB_A < 1
NB_A < 1
RSRC_A < 0 or RSRC_A >= p
CSRC_A < 0 or CSRC_A >= q
CTXT_A <> CTXT_X
M_X < 0 and n = 0; M_X < 1 otherwise
N_X < 0 and n = 0; N_X < 1 otherwise
MB_X < 1
NB_X < 1
RSRC_X < 0 or RSRC_X >= p
CSRC_X < 0 or CSRC_X >= q

Stage 5

MB_A <> NB_A
mod(ia-1, MB_A) <> mod(ja-1, NB_A)

If n <> 0:
ix > M_X
jx > N_X
ia > M_A
ja > N_A
ia+n-1 > M_A
ja+n-1 > N_A

If incx = M_X:
NB_A <> NB_X
mod(jx-1, NB_X) <> mod(ja-1, NB_A)
n <> 0 and jx+n-1 > N_X

If incx = 1( <> M_X):
MB_A <> MB_X
mod(ix-1, MB_X) <> mod(ia-1, MB_A)
n <> 0 and ix+n-1 > M_X

Otherwise:
incx <> 1 and incx <> M_X

Stage 6

LLD_A < max(1, LOCp(M_A))
LLD_X < max(1, LOCp(M_X))
If incx = M_X, then (in the process grid) the process column containing the first column of the submatrix A does not contain the first column of the submatrix X; that is, iacol <> ixcol, where:

iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
ixcol = mod((((jx-1)/NB_X)+CSRC_X), q)
If incx = 1( <> M_X), then (in the process grid) the process row containing the first row of the submatrix A does not contain the first row of the submatrix X; that is, iarow <> ixrow, where:

iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
ixrow = mod((((ix-1)/MB_X)+RSRC_X), p)

Example

This example computes x = Ax using a 2 × 2 process grid. It uses a global submatrix A within a global matrix A by specifying ia = 2 and ja = 2. It uses vector x, which is a column-distributed vector within a column of X, by specifying incx = 1, ix = 2, and jx = 1.

Call Statements and Input

ORDER = 'R'
NPROW = 2
NPCOL = 2
CALL BLACS_GET (0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
            UPLO  TRANSA DIAG  N    A  IA  JA   DESC_A   X  IX  JX
              |     |     |    |    |   |   |     |      |   |   |
CALL PDTRMV( 'U' , 'N' , 'N' , 12 , A , 2 , 2 , DESC_A , X , 2 , 1 ,
 
              DESC_X INCX
                |      |
              DESC_X , 1 )

	Desc_A	Desc_X
DTYPE_	1	1
CTXT_	icontxt¹	icontxt¹
M_	13	13
N_	13	1
MB_	3	3
NB_	3	
RSRC_	0	0
CSRC_	0	0
LLD_	See below²	See below²

¹ icontxt is the output of the BLACS_GRIDINIT call.

² Each process should set the LLD_ as follows:

LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
LLD_X = MAX(1,NUMROC(M_X, MB_X, MYROW, RSRC_X, NPROW))

In this example, LLD_A = 7 on P₀₀ and P₀₁, LLD_A = 6 on P₁₀ and P₁₁, LLD_X = 7 on P₀₀, and LLD_X = 6 on P₁₀.

After the global matrix A is distributed over the process grid, only a portion of the global data structure is used--that is, global submatrix A. Following is the global submatrix A of order 12, starting at row 2 and column 2 in global triangular matrix A of order 13 with block size 3 × 3:

B,D          0                  1                  2                  3             4
     *                                                                                  *
     |   .    .    .   |    .    .    .   |    .    .    .   |    .    .    .   |    .  |
 0   |   .   1.0  2.0  |   1.0  2.0  1.0  |   1.0  3.0  1.0  |   1.0  2.0  3.0  |   2.0 |
     |   .    .   3.0  |   2.0  3.0  1.0  |   2.0  3.0  1.0  |   1.0  2.0  3.0  |   3.0 |
     | ----------------|------------------|------------------|------------------|------ |
     |   .    .    .   |   3.0  1.0  3.0  |   2.0  1.0  2.0  |   1.0  2.0  3.0  |   1.0 |
 1   |   .    .    .   |    .   1.0  2.0  |   2.0  1.0  1.0  |   1.0  2.0  3.0  |   2.0 |
     |   .    .    .   |    .    .   2.0  |   1.0  2.0  2.0  |   1.0  2.0  3.0  |   3.0 |
     | ----------------|------------------|------------------|------------------|------ |
     |   .    .    .   |    .    .    .   |   1.0  2.0  1.0  |   1.0  2.0  3.0  |   1.0 |
 2   |   .    .    .   |    .    .    .   |    .   2.0  1.0  |   1.0  2.0  3.0  |   2.0 |
     |   .    .    .   |    .    .    .   |    .    .   2.0  |   1.0  2.0  3.0  |   3.0 |
     | ----------------|------------------|------------------|------------------|------ |
     |   .    .    .   |    .    .    .   |    .    .    .   |   3.0  1.0  3.0  |   1.0 |
 3   |   .    .    .   |    .    .    .   |    .    .    .   |    .   2.0  2.0  |   2.0 |
     |   .    .    .   |    .    .    .   |    .    .    .   |    .    .   1.0  |   3.0 |
     | ----------------|------------------|------------------|------------------|------ |
 4   |   .    .    .   |    .    .    .   |    .    .    .   |    .    .    .   |   1.0 |
     *                                                                                  *

The following is the 2 × 2 process grid:

B,D    |  0 2 4  |   1 3
-------|---------|---------
0 2 4  |   P₀₀   |   P₀₁
1 3    |   P₁₀   |   P₁₁

Local arrays for A:

p,q  |                 0                   |                1
-----|-------------------------------------|--------------------------------
     |   .    .    .    .    .    .    .   |    .    .    .    .    .    .
     |   .   1.0  2.0  1.0  3.0  1.0  2.0  |   1.0  2.0  1.0  1.0  2.0  3.0
     |   .    .   3.0  2.0  3.0  1.0  3.0  |   2.0  3.0  1.0  1.0  2.0  3.0
 0   |   .    .    .   1.0  2.0  1.0  1.0  |    .    .    .   1.0  2.0  3.0
     |   .    .    .    .   2.0  1.0  2.0  |    .    .    .   1.0  2.0  3.0
     |   .    .    .    .    .   2.0  3.0  |    .    .    .   1.0  2.0  3.0
     |   .    .    .    .    .    .   1.0  |    .    .    .    .    .    .
-----|-------------------------------------|--------------------------------
     |   .    .    .   2.0  1.0  2.0  1.0  |   3.0  1.0  3.0  1.0  2.0  3.0
     |   .    .    .   2.0  1.0  1.0  2.0  |    .   1.0  2.0  1.0  2.0  3.0
     |   .    .    .   1.0  2.0  2.0  3.0  |    .    .   2.0  1.0  2.0  3.0
 1   |   .    .    .    .    .    .   1.0  |    .    .    .   3.0  1.0  3.0
     |   .    .    .    .    .    .   2.0  |    .    .    .    .   2.0  2.0
     |   .    .    .    .    .    .   3.0  |    .    .    .    .    .   1.0

After the global matrix X is distributed over the process grid, only a portion of the global data structure is used--that is, global vector x, which is a column-distributed vector. Following is the global vector x of size 12 × 1, starting at row 2 in 13 × 1 global matrix X with block size 3:

B,D     0
     *      *
     |   .  |
 0   |  2.0 |
     |  3.0 |
     | ---- |
     |  1.0 |
 1   |  2.0 |
     |  3.0 |
     | ---- |
     |  1.0 |
 2   |  2.0 |
     |  3.0 |
     | ---- |
     |  1.0 |
 3   |  2.0 |
     |  3.0 |
     | ---- |
 4   |  1.0 |
     *      *

The following is the 2 × 2 process grid:

B,D    |    0    |   --
-------|---------|---------
0 2 4  |   P₀₀   |   P₀₁
1 3    |   P₁₀   |   P₁₁

Local arrays for x:

p,q  |  0
-----|------
     |   .
     |  2.0
     |  3.0
 0   |  1.0
     |  2.0
     |  3.0
     |  1.0
-----|------
     |  1.0
     |  2.0
     |  3.0
 1   |  1.0
     |  2.0
     |  3.0

Output:

After the global matrix X is distributed over the process grid, only a portion of the global data structure is used--that is, global vector x, which is a column-distributed vector. Following is the global vector x of size 12 × 1, starting at row 2 in 13 × 1 global matrix X with block size 3:

B,D      0
     *       *
     |    .  |
 0   |  42.0 |
     |  48.0 |
     | ----- |
     |  39.0 |
 1   |  31.0 |
     |  34.0 |
     | ----- |
     |  23.0 |
 2   |  23.0 |
     |  23.0 |
     | ----- |
     |  15.0 |
 3   |  12.0 |
     |   6.0 |
     | ----- |
 4   |   1.0 |
     *       *

The following is the 2 × 2 process grid:

B,D    |    0    |   --
-------|---------|---------
0 2 4  |   P₀₀   |   P₀₁
1 3    |   P₁₀   |   P₁₁

Local arrays for x:

p,q  |   0
-----|-------
     |    .
     |  42.0
     |  48.0
 0   |  23.0
     |  23.0
     |  23.0
     |   1.0
-----|-------
     |  39.0
     |  31.0
     |  34.0
 1   |  15.0
     |  12.0
     |   6.0
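
The output vector above can be cross-checked against a serial computation. The following Python sketch is illustrative only and does not call Parallel ESSL; the function name trmv_upper is hypothetical. The matrix entries are copied from the upper triangle of the global submatrix A(2:13, 2:13) shown in this example, and the vector is X(2:13, 1).

```python
# Upper triangle of the order-12 submatrix, one list per row,
# each list starting at the diagonal element.
upper = [
    [1, 2, 1, 2, 1, 1, 3, 1, 1, 2, 3, 2],
    [3, 2, 3, 1, 2, 3, 1, 1, 2, 3, 3],
    [3, 1, 3, 2, 1, 2, 1, 2, 3, 1],
    [1, 2, 2, 1, 1, 1, 2, 3, 2],
    [2, 1, 2, 2, 1, 2, 3, 3],
    [1, 2, 1, 1, 2, 3, 1],
    [2, 1, 1, 2, 3, 2],
    [2, 1, 2, 3, 3],
    [3, 1, 3, 1],
    [2, 2, 2],
    [1, 3],
    [1],
]
x = [2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1]


def trmv_upper(upper, x):
    """Serial reference for x <- Ax with A upper triangular and a
    non-unit diagonal (uplo = 'U', transa = 'N', diag = 'N')."""
    n = len(x)
    # Row i of an upper triangular matrix only involves x[i:].
    return [sum(a * xv for a, xv in zip(upper[i], x[i:])) for i in range(n)]


print(trmv_upper(upper, x))
# [42, 48, 39, 31, 34, 23, 23, 23, 15, 12, 6, 1]
```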

PDTRSV--Solution of Triangular System of Equations with a Single Right-Hand Side

This subroutine performs one of the following solves for a triangular system of equations with a single right-hand side:

Solution	Equation
1. x <-- A^-1x	Ax = b
2. x <-- A^-Tx	A^Tx = b

where, in the formulas above:

A represents the global triangular submatrix A_{ia:ia+n-1, ja:ja+n-1}.

x represents the global vector:

For incx = M_X, it is X_{ix:ix, jx:jx+n-1}.
For incx = 1 and incx <> M_X, it is X_{ix:ix+n-1, jx:jx}.

Notes:

The term b used in the systems of equations listed above represents the right-hand side of the system. It is important to note that in these subroutines the right-hand side of the equation is actually provided in the input-output argument x.
No data should be moved to form the matrix transpose; that is, the matrix should always be stored in its untransposed form.

If n = 0, no computation is performed and the subroutine returns after doing some parameter checking. See references [14] and [15].
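
As a serial illustration of solution 1 for an upper triangular matrix with a non-unit diagonal, the following Python sketch applies back substitution. It is illustrative only and does not reflect how Parallel ESSL implements the distributed solve; the function name trsv_upper and the 3 × 3 data are hypothetical. As with the subroutine, the right-hand side b is supplied in x and is replaced by the solution.

```python
def trsv_upper(a, b):
    """Solve Ax = b for upper triangular A (non-unit diagonal)
    by back substitution and return the solution."""
    n = len(b)
    x = list(b)                       # x holds b on entry, the solution on return
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (x[i] - s) / a[i][i]
    return x


# Hypothetical data chosen so that the solution is (1, 2, 3).
a = [[2.0, 1.0, 1.0],
     [0.0, 4.0, 2.0],
     [0.0, 0.0, 5.0]]
b = [7.0, 14.0, 15.0]
print(trsv_upper(a, b))   # [1.0, 2.0, 3.0]
```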

Table 43. Data Types

A, x	Subprogram
Long-precision real	PDTRSV

Syntax

Fortran	CALL PDTRSV (`uplo`, `transa`, `diag`, `n`, `a`, `ia`, `ja`, `desc_a`, `x`, `ix`, `jx`, `desc_x`, `incx`)
C and C++	pdtrsv (`uplo`, `transa`, `diag`, `n`, `a`, `ia`, `ja`, `desc_a`, `x`, `ix`, `jx`, `desc_x`, `incx`);

On Entry

uplo

indicates whether the upper or lower triangular part of the global triangular submatrix A is referenced, where:

If uplo = 'U', the upper triangular part is referenced.

If uplo = 'L', the lower triangular part is referenced.

Scope: global

Specified as: a single character; uplo = 'U' or 'L'.

transa

indicates the form of matrix A used in the system of equations, where:

If transa = 'N', A is used in the system of equations, resulting in solution 1.

If transa = 'T', A^T is used in the system of equations, resulting in solution 2.

Scope: global

Specified as: a single character; transa = 'N' or 'T'.

diag

indicates the characteristics of the diagonal of matrix A, where:

If diag = 'U', A is a unit triangular matrix.

If diag = 'N', A is not a unit triangular matrix.

Scope: global

Specified as: a single character; diag = 'U' or 'N'.

n

is the order of global triangular submatrix A and the length of global vector x.

Scope: global

Specified as: a fullword integer; n >= 0.

a

is the local part of the global triangular matrix A, used in the system of equations. This identifies the first element of the local array A. This subroutine computes the location of the first element of the local subarray used, based on ia, ja, desc_a, p, q, myrow, and mycol; therefore, the leading LOCp(ia+n-1) by LOCq(ja+n-1) part of the local array A must contain the local pieces of the leading ia+n-1 by ja+n-1 part of the global matrix, and:

If uplo = 'U', the leading n × n upper triangular part of the global triangular submatrix A_{ia:ia+n-1, ja:ja+n-1} must contain the upper triangular part of the submatrix, and the strictly lower triangular part is not referenced.
If uplo = 'L', the leading n × n lower triangular part of the global triangular submatrix A_{ia:ia+n-1, ja:ja+n-1} must contain the lower triangular part of the submatrix, and the strictly upper triangular part is not referenced.

Note:

No data should be moved to form A^T; that is, the matrix A should always be stored in its untransposed form.

Scope: local

Specified as: an LLD_A by (at least) LOCq(N_A) array, containing numbers of the data type indicated in Table 43. Details about the block-cyclic data distribution of global matrix A are stored in desc_a.

ia

is the row index of the global matrix A, identifying the first row of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ia and ia+n-1 <= M_A.

ja

is the column index of the global matrix A, identifying the first column of the submatrix A.

Scope: global

Specified as: a fullword integer; 1 <= ja and ja+n-1 <= N_A.

desc_a

is the array descriptor for global matrix A, described in the following table:

`desc_a`	Name	Description	Limits	Scope
1	DTYPE_A	Descriptor type	DTYPE_A=1	Global
2	CTXT_A	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_A	Number of rows in the global matrix	If `n` = 0: M_A >= 0 Otherwise: M_A >= 1	Global
4	N_A	Number of columns in the global matrix	If `n` = 0: N_A >= 0 Otherwise: N_A >= 1	Global
5	MB_A	Row block size	MB_A >= 1	Global
6	NB_A	Column block size	NB_A >= 1	Global
7	RSRC_A	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_A < `p`	Global
8	CSRC_A	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_A < `q`	Global
9	LLD_A	The leading dimension of the local array	LLD_A >= max(1,LOCp(M_A))	Local

Specified as: an array of (at least) length 9, containing fullword integers.
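
The limits in the table above can be checked before the call. The following Python sketch is illustrative only and is not part of Parallel ESSL: the helper name `check_desc`, the plain-list descriptor, and the `loc_rows` argument (standing in for LOCp(M_A)) are assumptions made for this example. CTXT_A is not checked, because its validity depends on the BLACS library.

```python
def check_desc(desc, n, p, q, loc_rows):
    """Return the names of descriptor entries that violate the table limits.

    desc     -- [DTYPE, CTXT, M, N, MB, NB, RSRC, CSRC, LLD]
    n        -- order of the triangular submatrix
    p, q     -- process grid dimensions
    loc_rows -- LOCp(M) on the calling process (assumed to be known,
                for example from NUMROC)
    """
    dtype, _ctxt, m, n_cols, mb, nb, rsrc, csrc, lld = desc
    min_dim = 0 if n == 0 else 1  # M and N may be 0 only when n = 0
    checks = [
        ("DTYPE", dtype == 1),
        ("M", m >= min_dim),
        ("N", n_cols >= min_dim),
        ("MB", mb >= 1),
        ("NB", nb >= 1),
        ("RSRC", 0 <= rsrc < p),
        ("CSRC", 0 <= csrc < q),
        ("LLD", lld >= max(1, loc_rows)),
    ]
    return [name for name, ok in checks if not ok]


# Hypothetical descriptor: 13 x 13 matrix, 3 x 3 blocks, 2 x 2 grid.
print(check_desc([1, 0, 13, 13, 3, 3, 0, 0, 7], n=12, p=2, q=2, loc_rows=7))
```

An empty list means every checked limit holds. The same helper applies to desc_x, because both descriptors share the same layout and limits.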

x

is the local part of the global matrix X, containing the right-hand side of the triangular system to be solved. This identifies the first element of the local array X. This subroutine computes the location of the first element of the local subarray used, based on ix, jx, desc_x, p, q, myrow, and mycol; therefore:

If incx = M_X, the leading LOCp(ix) by LOCq(jx+n-1) part of the local array X must contain the local pieces of the leading ix by jx+n-1 part of the global matrix.
If incx = 1 and incx <> M_X, the leading LOCp(ix+n-1) by LOCq(jx) part of the local array X must contain the local pieces of the leading ix+n-1 by jx part of the global matrix.

Scope: local

Specified as: an LLD_X by (at least) LOCq(N_X) array, containing numbers of the data type indicated in Table 43. Details about the block-cyclic data distribution of the global matrix X are stored in desc_x.

ix

has the following meaning:

If incx = M_X, it indicates which row of global matrix X is used for vector x.

If incx = 1 and incx <> M_X, it is the row index of global matrix X, identifying the first element of vector x.

Scope: global

Specified as: a fullword integer; 1 <= ix <= M_X and:

If incx = 1 and incx <> M_X, then ix+n-1 <= M_X.

jx

has the following meaning:

If incx = M_X, it is the column index of global matrix X, identifying the first element of vector x.

If incx = 1 and incx <> M_X, it indicates which column of global matrix X is used for vector x.

Scope: global

Specified as: a fullword integer; 1 <= jx <= N_X and:

If incx = M_X, then jx+n-1 <= N_X.

desc_x

is the array descriptor for global matrix X, described in the following table:

`desc_x`	Name	Description	Limits	Scope
1	DTYPE_X	Descriptor type	DTYPE_X=1	Global
2	CTXT_X	BLACS context	Valid value, as returned by BLACS_GRIDINIT or BLACS_GRIDMAP	Global
3	M_X	Number of rows in the global matrix	If `n` = 0: M_X >= 0 Otherwise: M_X >= 1	Global
4	N_X	Number of columns in the global matrix	If `n` = 0: N_X >= 0 Otherwise: N_X >= 1	Global
5	MB_X	Row block size	MB_X >= 1	Global
6	NB_X	Column block size	NB_X >= 1	Global
7	RSRC_X	The process row of the `p` × `q` grid over which the first row of the global matrix is distributed	0 <= RSRC_X < `p`	Global
8	CSRC_X	The process column of the `p` × `q` grid over which the first column of the global matrix is distributed	0 <= CSRC_X < `q`	Global
9	LLD_X	The leading dimension of the local array	LLD_X >= max(1,LOCp(M_X))	Local

Specified as: an array of (at least) length 9, containing fullword integers.

incx

is the stride for global vector x.

Scope: global

Specified as: a fullword integer; incx = 1 or incx = M_X, where:

If incx = M_X, then x is a row-distributed vector.

If incx = 1 and incx <> M_X, then x is a column-distributed vector.
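
The rules for incx, ix, and jx can be summarized in a few lines of Python. This is a sketch for illustration, not library code: the function name `vector_layout` is hypothetical, and it assumes n >= 1. It tests incx = M_X first, so a one-row global matrix X (M_X = 1) with incx = 1 is treated as a row-distributed vector, as the conditions above require.

```python
def vector_layout(incx, m_x, n_x, ix, jx, n):
    """Return 'row' or 'column' for global vector x, or raise ValueError.

    Assumes n >= 1. All indices are 1-based, as in the argument
    descriptions.
    """
    if not (1 <= ix <= m_x and 1 <= jx <= n_x):
        raise ValueError("ix or jx lies outside global matrix X")
    if incx == m_x:
        # Row-distributed: x occupies row ix, columns jx .. jx+n-1.
        if jx + n - 1 > n_x:
            raise ValueError("jx+n-1 > N_X")
        return "row"
    if incx == 1:
        # Column-distributed: x occupies column jx, rows ix .. ix+n-1.
        if ix + n - 1 > m_x:
            raise ValueError("ix+n-1 > M_X")
        return "column"
    raise ValueError("incx must be 1 or M_X")
```

For example, `vector_layout(1, 1, 13, 1, 2, 12)` classifies a vector of length 12 in a 1 × 13 matrix as a row-distributed vector.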

On Return

x

is the updated local part of the global matrix X, containing the solution vector.

Scope: local

Returned as: an LLD_X by (at least) LOCq(N_X) array, containing numbers of the data type indicated in Table 43.

Notes and Coding Rules

This subroutine accepts lowercase letters for the uplo, transa, and diag arguments.
If you specify 'C' for transa, it is interpreted as though you specified 'T'.
The matrix and vector must have no common elements; otherwise, results are unpredictable.
PDTRSV assumes certain values in your array for parts of a triangular matrix. As a result, you do not have to set these values. For unit triangular matrices, the elements of the diagonal are assumed to be one. When using an upper or lower triangular matrix, the unreferenced elements in the strictly lower or upper triangular part, respectively, are assumed to be zero.
The NUMROC utility subroutine can be used to determine the values of LOCp(M_) and LOCq(N_) used in the argument descriptions above. For details, see "Determining the Number of Rows and Columns in Your Local Arrays" and NUMROC--Compute the Number of Rows or Columns of a Block-Cyclically Distributed Matrix Contained in a Process.
For suggested block sizes, see "Coding Tips for Optimizing Parallel Performance".
The following values must be equal: CTXT_A = CTXT_X.
The global triangular matrix A must be distributed using a square block-cyclic distribution; that is, MB_A = NB_A.
The block row and block column offsets of the global triangular matrix A must be equal; that is, mod(ia-1,MB_A) = mod(ja-1,NB_A).
If incx = M_X:
- The following block sizes must be equal: NB_X = MB_A = NB_A
- If transa = 'T', then (in the process grid) the process column containing the first column of the submatrix A must also contain the first column of the submatrix X; that is, iacol = ixcol, where:
  
  iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
  ixcol = mod((((jx-1)/NB_X)+CSRC_X), q)
- The block column offset of x must be equal to the block row and block column offsets of A; that is, mod(jx-1, NB_X) = mod(ja-1, NB_A) = mod(ia-1, MB_A).
If incx = 1 and incx <> M_X:
- The following block sizes must be equal: MB_X = MB_A = NB_A
- If transa = 'N', then (in the process grid) the process row containing the first row of the submatrix A must also contain the first row of the submatrix X; that is, iarow = ixrow, where:
  
  iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
  ixrow = mod((((ix-1)/MB_X)+RSRC_X), p)
- The block row offset of x must be equal to the block row and block column offsets of A; that is, mod(ix-1, MB_X) = mod(ia-1, MB_A) = mod(ja-1, NB_A).
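
The alignment rules in this list can be collected into a single pre-call check. The Python sketch below is an illustration under stated assumptions: the function name `alignment_errors` and the dictionary layout for the descriptors (keys M, MB, NB, RSRC, CSRC) are hypothetical and are not part of the Parallel ESSL interface.

```python
def alignment_errors(transa, incx, ia, ja, ix, jx, a, x, p, q):
    """Return a list of violated distribution rules (empty if none).

    a, x -- dicts with keys M, MB, NB, RSRC, CSRC (hypothetical layout)
    p, q -- process grid dimensions
    """
    errs = []
    if a["MB"] != a["NB"]:
        errs.append("MB_A <> NB_A")
    if (ia - 1) % a["MB"] != (ja - 1) % a["NB"]:
        errs.append("block row and column offsets of A differ")
    if incx == x["M"]:  # row-distributed x
        if x["NB"] != a["NB"]:
            errs.append("NB_X <> NB_A")
        if (jx - 1) % x["NB"] != (ja - 1) % a["NB"]:
            errs.append("block column offset of x differs from A")
        if transa.upper() in ("T", "C"):  # 'C' is interpreted as 'T'
            iacol = ((ja - 1) // a["NB"] + a["CSRC"]) % q
            ixcol = ((jx - 1) // x["NB"] + x["CSRC"]) % q
            if iacol != ixcol:
                errs.append("iacol <> ixcol")
    elif incx == 1:  # column-distributed x
        if x["MB"] != a["MB"]:
            errs.append("MB_X <> MB_A")
        if (ix - 1) % x["MB"] != (ia - 1) % a["MB"]:
            errs.append("block row offset of x differs from A")
        if transa.upper() == "N":
            iarow = ((ia - 1) // a["MB"] + a["RSRC"]) % p
            ixrow = ((ix - 1) // x["MB"] + x["RSRC"]) % p
            if iarow != ixrow:
                errs.append("iarow <> ixrow")
    else:
        errs.append("incx <> 1 and incx <> M_X")
    return errs
```

The integer divisions mirror the iacol, ixcol, iarow, and ixrow formulas given above; the function only reports violations and does not replace the checks that PDTRSV performs itself.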

Error Conditions

Computational Errors

None

Resource Errors

Unable to allocate work space

Input-Argument and Miscellaneous Errors

Stage 1

DTYPE_A is invalid.
DTYPE_X is invalid.

Stage 2

CTXT_A is invalid.

Stage 3

PDTRSV was called from outside the process grid.

Stage 4

uplo <> 'U' or 'L'
transa <> 'N', 'T', or 'C'
diag <> 'N' or 'U'
n < 0
M_A < 0 and n = 0; M_A < 1 otherwise
N_A < 0 and n = 0; N_A < 1 otherwise
MB_A < 1
NB_A < 1
RSRC_A < 0 or RSRC_A >= p
CSRC_A < 0 or CSRC_A >= q
CTXT_A <> CTXT_X
M_X < 0 and n = 0; M_X < 1 otherwise
N_X < 0 and n = 0; N_X < 1 otherwise
MB_X < 1
NB_X < 1
RSRC_X < 0 or RSRC_X >= p
CSRC_X < 0 or CSRC_X >= q

Stage 5

MB_A <> NB_A
mod(ia-1, MB_A) <> mod(ja-1, NB_A)

If n <> 0:
ix > M_X
jx > N_X
ia > M_A
ja > N_A
ia+n-1 > M_A
ja+n-1 > N_A

If incx = M_X:
NB_A <> NB_X
mod(jx-1, NB_X) <> mod(ja-1, NB_A)
n <> 0 and jx+n-1 > N_X

If incx = 1 and incx <> M_X:
MB_A <> MB_X
mod(ix-1, MB_X) <> mod(ia-1, MB_A)
n <> 0 and ix+n-1 > M_X

Otherwise:
incx <> 1 and incx <> M_X

Stage 6

LLD_A < max(1, LOCp(M_A))
LLD_X < max(1, LOCp(M_X))
If incx = M_X and transa = 'T', then (in the process grid) the process column containing the first column of the submatrix A does not contain the first column of the submatrix X; that is, iacol <> ixcol, where:

iacol = mod((((ja-1)/NB_A)+CSRC_A), q)
ixcol = mod((((jx-1)/NB_X)+CSRC_X), q)
If incx = 1 (and incx <> M_X) and transa = 'N', then (in the process grid) the process row containing the first row of the submatrix A does not contain the first row of the submatrix X; that is, iarow <> ixrow, where:

iarow = mod((((ia-1)/MB_A)+RSRC_A), p)
ixrow = mod((((ix-1)/MB_X)+RSRC_X), p)

Example

This example solves x <-- A^(-1)x using a 2 × 2 process grid, where A is a unit lower triangular matrix. It uses a global submatrix A within a global matrix A by specifying ia = 2 and ja = 2. It uses vector x, which is a row-distributed vector within a row of global matrix X, by specifying incx = M_X = 1, ix = 1, and jx = 2.

Call Statements and Input

ORDER = 'R'
NPROW = 2
NPCOL = 2
CALL BLACS_GET (0, 0, ICONTXT)
CALL BLACS_GRIDINIT(ICONTXT, ORDER, NPROW, NPCOL)
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYROW, MYCOL)
 
            UPLO  TRANSA DIAG   N    A  IA  JA   DESC_A   X  IX  JX
              |     |      |    |    |   |   |     |      |   |   |
CALL PDTRSV( 'L' , 'N'  , 'U' , 12 , A , 2 , 2 , DESC_A , X , 1 , 2 ,
 
            DESC_X INCX
               |     |
            DESC_X , 1 )

	Desc_A	Desc_X
DTYPE_	1	1
CTXT_	icontxt¹	icontxt¹
M_	13	1
N_	13	13
MB_	3	
NB_	3	3
RSRC_	0	0
CSRC_	0	0
LLD_	See below²	See below²

¹ icontxt is the output of the BLACS_GRIDINIT call.

² Each process should set the LLD_ as follows:

LLD_A = MAX(1,NUMROC(M_A, MB_A, MYROW, RSRC_A, NPROW))
LLD_X = MAX(1,NUMROC(M_X, MB_X, MYROW, RSRC_X, NPROW))

In this example, LLD_A = 7 on P₀₀ and P₀₁, LLD_A = 6 on P₁₀ and P₁₁, and LLD_X = 1 on all processes.
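
These LLD_ values can be reproduced with a small Python re-implementation of the block-cyclic count that NUMROC returns. This is a sketch for checking the arithmetic, not the library routine itself; it assumes M_A = 13 and MB_A = 3 (as shown in the figure of global matrix A below), M_X = 1, and RSRC_ = 0. For M_X = 1 the result does not depend on MB_X, so the loop tries several block sizes.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Number of rows (or columns) of an n-long dimension, split into
    blocks of size nb, that land on process coordinate iproc."""
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from source
    nblocks = n // nb                              # whole blocks
    count = (nblocks // nprocs) * nb               # evenly shared blocks
    extra = nblocks % nprocs                       # leftover whole blocks
    if mydist < extra:
        count += nb                                # one more whole block
    elif mydist == extra:
        count += n % nb                            # the partial last block
    return count


for myrow in (0, 1):
    lld_a = max(1, numroc(13, 3, myrow, 0, 2))
    print("process row", myrow, "LLD_A =", lld_a)

# LLD_X for a one-row matrix, for a few candidate row block sizes.
for mb_x in (1, 2, 3):
    print([max(1, numroc(1, mb_x, myrow, 0, 2)) for myrow in (0, 1)])
```

Process row 0 owns row blocks 0, 2, and 4 (3 + 3 + 1 rows) and process row 1 owns row blocks 1 and 3 (3 + 3 rows), which gives the values 7 and 6 stated above.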

B,D          0                  1                  2                  3             4
     *                                                                                  *
     |   .    .    .   |    .    .    .   |    .    .    .   |    .    .    .   |    .  |
 0   |   .   1.0   .   |    .    .    .   |    .    .    .   |    .    .    .   |    .  |
     |   .   2.0  1.0  |    .    .    .   |    .    .    .   |    .    .    .   |    .  |
     | ----------------|------------------|------------------|------------------|------ |
     |   .   3.0  2.0  |   1.0   .    .   |    .    .    .   |    .    .    .   |    .  |
 1   |   .   1.0  3.0  |   2.0  1.0   .   |    .    .    .   |    .    .    .   |    .  |
     |   .   2.0  1.0  |   3.0  2.0  1.0  |    .    .    .   |    .    .    .   |    .  |
     | ----------------|------------------|------------------|------------------|------ |
     |   .   3.0  2.0  |   1.0  3.0  2.0  |   1.0   .    .   |    .    .    .   |    .  |
 2   |   .   1.0  3.0  |   2.0  1.0  3.0  |   2.0  1.0   .   |    .    .    .   |    .  |
     |   .   2.0  1.0  |   3.0  2.0  1.0  |   3.0  2.0  1.0  |    .    .    .   |    .  |
     | ----------------|------------------|------------------|------------------|------ |
     |   .   3.0  2.0  |   1.0  3.0  2.0  |   1.0  3.0  2.0  |   1.0   .    .   |    .  |
 3   |   .   1.0  3.0  |   2.0  1.0  3.0  |   2.0  1.0  3.0  |   2.0  1.0   .   |    .  |
     |   .   2.0  1.0  |   3.0  2.0  1.0  |   3.0  2.0  1.0  |   3.0  2.0  1.0  |    .  |
     | ----------------|------------------|------------------|------------------|------ |
 4   |   .   3.0  2.0  |   1.0  3.0  2.0  |   1.0  3.0  2.0  |   1.0  3.0  2.0  |   1.0 |
     *                                                                                  *

Note:

Because matrix A is unit triangular, the diagonal elements are not referenced. This subroutine assumes a value of 1.0 for the diagonal elements.

The following is the 2 × 2 process grid:

B,D	0 2 4	1 3
0 2 4	P₀₀	P₀₁
1 3	P₁₀	P₁₁

Local arrays for A:

p,q  |                 0                  |                1
-----|------------------------------------|--------------------------------
     |   .    .    .    .    .    .    .  |    .    .    .    .    .    .
     |   .    .    .    .    .    .    .  |    .    .    .    .    .    .
     |   .   2.0   .    .    .    .    .  |    .    .    .    .    .    .
 0   |   .   3.0  2.0   .    .    .    .  |   1.0  3.0  2.0   .    .    .
     |   .   1.0  3.0  2.0   .    .    .  |   2.0  1.0  3.0   .    .    .
     |   .   2.0  1.0  3.0  2.0   .    .  |   3.0  2.0  1.0   .    .    .
     |   .   3.0  2.0  1.0  3.0  2.0   .  |   1.0  3.0  2.0  1.0  3.0  2.0
-----|------------------------------------|--------------------------------
     |   .   3.0  2.0   .    .    .    .  |    .    .    .    .    .    .
     |   .   1.0  3.0   .    .    .    .  |   2.0   .    .    .    .    .
     |   .   2.0  1.0   .    .    .    .  |   3.0  2.0   .    .    .    .
 1   |   .   3.0  2.0  1.0  3.0  2.0   .  |   1.0  3.0  2.0   .    .    .
     |   .   1.0  3.0  2.0  1.0  3.0   .  |   2.0  1.0  3.0  2.0   .    .
     |   .   2.0  1.0  3.0  2.0  1.0   .  |   3.0  2.0  1.0  3.0  2.0   .
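
To relate an element of the global matrix to its place in a local array, the block-cyclic mapping can be written as a short function. The Python sketch below is illustrative; the name `global_to_local` is hypothetical, and the block size of 3 and source process of 0 are read from the figures in this example.

```python
def global_to_local(g, nb, src, nprocs):
    """Map a 1-based global index to (owning process coordinate,
    1-based local index) for block size nb on nprocs processes."""
    block, offset = divmod(g - 1, nb)
    owner = (block + src) % nprocs
    local = (block // nprocs) * nb + offset + 1
    return owner, local


# Global element (7, 2) of matrix A: row 7 and column 2.
prow, lrow = global_to_local(7, 3, 0, 2)
pcol, lcol = global_to_local(2, 3, 0, 2)
print((prow, pcol), (lrow, lcol))
```

Global element (7, 2), which is 3.0, maps to process (0, 0) at local position (4, 2); the fourth row of the local array on P₀₀ indeed holds 3.0 in its second column.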

After the global matrix X is distributed over the process grid, only a portion of the global data structure is used--that is, global vector x, which is a row-distributed vector. Following is the global vector x of size 1 × 12, starting at row 1 and column 2 in 1 × 13 global matrix X with block size 3:

B,D          0                  1                  2                  3             4
     *                                                                                  *
 0   |   .   2.0  7.0  |  13.0 15.0 17.0  |  26.0 28.0 27.0  |  39.0 41.0 37.0  |  52.0 |
     *                                                                                  *

The following is the 2 × 2 process grid:

B,D	0 2 4	1 3
0	P₀₀	P₀₁
--	P₁₀	P₁₁

Local arrays for x:

p,q  |                 0                   |                1
-----|-------------------------------------|--------------------------------
 0   |   .   2.0  7.0 26.0 28.0 27.0 52.0  |  13.0 15.0 17.0 39.0 41.0 37.0
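
The local arrays for x follow from dealing the column blocks of the 1 × 13 global matrix X to the two process columns in turn. The Python sketch below reproduces that layout; it is an illustration only, the name `owned_columns` is hypothetical, and `None` stands for the unreferenced element in global column 1.

```python
def owned_columns(n_cols, nb, csrc, q):
    """Return, for each process column, the 1-based global columns
    it owns under a block-cyclic distribution with block size nb."""
    owned = [[] for _ in range(q)]
    for g in range(1, n_cols + 1):
        owned[((g - 1) // nb + csrc) % q].append(g)
    return owned


# Row 1 of global matrix X; column 1 lies outside vector x.
global_row = [None, 2.0, 7.0, 13.0, 15.0, 17.0, 26.0,
              28.0, 27.0, 39.0, 41.0, 37.0, 52.0]

local = [[global_row[g - 1] for g in cols]
         for cols in owned_columns(13, 3, 0, 2)]
for pcol, values in enumerate(local):
    print("process column", pcol, values)
```

Process column 0 receives column blocks 0, 2, and 4, and process column 1 receives column blocks 1 and 3, matching the local arrays shown above.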

Output:

After the call, global vector x holds the solution. Following is the global vector x of size 1 × 12, starting at row 1 and column 2 in the 1 × 13 global matrix X with block size 3:

B,D          0                  1                  2                  3             4
     *                                                                                  *
 0   |   .   2.0  3.0  |   1.0  2.0  3.0  |   1.0  2.0  3.0  |   1.0  2.0  3.0  |   1.0 |
     *                                                                                  *

The following is the 2 × 2 process grid:

B,D	0 2 4	1 3
0	P₀₀	P₀₁
--	P₁₀	P₁₁

Local arrays for x:

p,q  |                 0                   |                1
-----|-------------------------------------|--------------------------------
 0   |   .   2.0  3.0  1.0  2.0  3.0  1.0  |   1.0  2.0  3.0  1.0  2.0  3.0
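
The output can be checked with a serial forward substitution. The Python sketch below is a verification aid only and does not reflect how PDTRSV performs the distributed solve. It assumes the 12 × 12 submatrix read from the figure of global matrix A: a unit diagonal, with the strictly lower element at distance d below the diagonal equal to 2.0, 3.0, 1.0, 2.0, ... for d = 1, 2, 3, 4, ...

```python
def unit_lower_solve(a, b):
    """Solve A x = b for unit lower triangular A by forward substitution.
    The diagonal of a is never read; it is taken to be 1."""
    x = list(b)
    for i in range(len(b)):
        for j in range(i):
            x[i] -= a[i][j] * x[j]
    return x


n = 12
pattern = [1.0, 2.0, 3.0]  # value at distance d below the diagonal: pattern[d % 3]
a = [[pattern[(i - j) % 3] if i > j else 0.0 for j in range(n)]
     for i in range(n)]
b = [2.0, 7.0, 13.0, 15.0, 17.0, 26.0, 28.0, 27.0, 39.0, 41.0, 37.0, 52.0]

x = unit_lower_solve(a, b)
print(x)
```

The result is the repeating sequence 2.0, 3.0, 1.0, which agrees with the output vector shown above.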
