pghpf Version 2.1

Release Notes

The Portland Group, Inc.

9150 SW Pioneer Court, Suite H

Wilsonville, Oregon 97070

While every precaution has been taken in the preparation of this document, The Portland Group, Inc. makes no warranty for the use of its products and assumes no responsibility for any errors which may appear, or for damages resulting from the use of the information contained herein. The Portland Group, Inc. retains the right to make changes to this information at any time, without notice. The software described in this document is distributed under license from The Portland Group, Inc. and may be used or copied only in accordance with the terms of the license agreement. No part of this document may be reproduced or transmitted in any form or by any means, for any purpose other than the purchaser's personal use without the express written permission of The Portland Group, Inc. Commercial uses are strictly prohibited.

PGI, pghpf, pgf77, pgcc, pgprof, and pgdbg are trademarks of The Portland Group, Inc. Other brands and names are the property of their respective owners.

pghpf Version 2.1 Release Notes
Copyright (c) 1996 The Portland Group, Inc.
All rights reserved.
Printed in the United States of America

Printing History
May 1996: First Printing

Part Number: 2401-990-990-0596

Phone: (503) 682-2806
Fax: (503) 682-2637
e-mail: trs@pgroup.com

Table of Contents

  1. pghpf 2.1 Features
  2. Getting Started
  3. Restrictions and Omissions
  4. Optimization Features
  5. pghpf 2.1 Input/Output
  6. INDEPENDENT Loops
  7. Profiling
  8. Debugging
  9. Bug Fixes
  10. Compiler Command-line Options
  11. Contacting PGI
Appendix A
HPF_LOCAL_LIBRARY Procedures

pghpf 2.1 Release Notes


This document describes important issues relating to pghpf Version 2.1, including the changes from pghpf version 2.0 to version 2.1.

1 pghpf 2.1 Features

This section briefly lists the features of pghpf Version 2.1.

Most features of full HPF are included in pghpf 2.1, with the few exceptions noted in these release notes. If you encounter any HPF feature that is not supported, and not listed in Section 3, "Restrictions and Omissions - pghpf 2.1", you should consider it a bug and report it to PGI at the e-mail address trs@pgroup.com.

HPF language features include:

1.1 Release 2.1 Changes from Previous Release 2.0

Fortran 90 Additions and Changes

HPF Additions and Changes

USE HPF_LOCAL_LIBRARY

pghpf Driver Additions and Changes



Note

The driver has been updated so that -Mautopar no longer sets the optimization level to -O2. If you have existing makefiles that use -Mautopar but do not set the optimization level, you should update the makefiles.
The driver supports several new command-line options. These options are:

-Mnofree

-Mnofreeform

These two options serve the same function. Using pghpf 2.1, the compiler treats files with a .f90 extension as Fortran 90 files using free source form. Using either of these options specifies fixed source form for files with the .f90 extension. For files with other extensions, for example .F or .hpf, the option -Mfreeform specifies Fortran 90 free source form.

-Mnoindependent

Independent DO loop processing has been significantly expanded in this release. This new option disables parallelization associated with INDEPENDENT DO loops.

-Moverlap=size:n

This option sets the size of the overlap shift area created for the overlap shift optimization. The default size is 4. A different size can be set either to reduce the memory allocated or to reduce communications when the compiler generates the overlap shift optimization. A size of 0 disables the overlap shift optimization. For more details on this option, refer to section 4.1 "Overlap Shift Optimization".

2 Getting Started

Once pghpf has been installed, use the following steps to start using the compiler (this assumes you are using csh or a variant of csh; for other shells the commands may differ). Assume that the compiler has been installed in the directory /usr/pgi on your system, that the target is platform (for example, rs6000, sp2, solaris, hp, sgi, etc.), and that a valid license.dat file has been placed in /usr/pgi:
% setenv PGI /usr/pgi 
% set path=($PGI/platform/bin $path) 
% setenv LM_LICENSE_FILE $PGI/license.dat
You should now be able to compile and run HPF programs as follows:
% pghpf hello.hpf  
% a.out options -pghpf pghpf_options
If you wish to link and run with a version of pghpf other than the default for your system, refer to the pghpf User's Guide for more details.

3 Restrictions and Omissions

The following is a list of restrictions that apply to pghpf 2.1. Some of these restrictions are known bugs, others are Fortran 90 or HPF features that are not yet implemented.

3.1 Known Restrictions

  1. The HPF_LIBRARY routines GRADE_UP and GRADE_DOWN require a DIM argument. These routines also do not yet support cyclic distributions of the selected dimension.

  2. The compiler command-line option -g is a beta feature for debugger developers. Details for this option are available by contacting PGI at sales@pgroup.com. For a file filename.hpf, the -g option creates a file named filename.stb in the current directory.

  3. An object of derived type cannot be initialized with a DATA statement; instead use the Fortran 90-style form for initializing an object.

If a and b are automatic arrays whose extents n and m are equal, pghpf currently cannot conclude that n and m are equal. If the programmer declares both arrays with the same extent, pghpf may perform more optimizations. For example, the compiler treats n and m as unrelated in the following code:
        subroutine foo(a,b)
        common /c1/ n,m
        integer, dimension(n) :: a
        integer, dimension(m) :: b
!hpf$   distribute (block) :: a,b
        a(:) = b(:)
        end
The assignment a(:) = b(:) implies that a and b are conformable, so the two arrays must be the same size. When the same variable (either n or m) is used in the declarations of both a and b, the compiler performs additional optimizations compared with the code shown above.
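A minimal sketch of the single-extent form follows, assuming both arrays can be declared with the extent n. The subroutine name foo_same and the explicit integer declaration of n and m are illustrative and do not come from the original example.

```fortran
! Sketch: both dummy arrays are declared with the same extent n, so the
! compiler does not have to prove that n and m are equal.
subroutine foo_same(a, b)
  integer :: n, m
  common /c1/ n, m
  integer, dimension(n) :: a
  integer, dimension(n) :: b      ! declared with n instead of m
!hpf$ distribute (block) :: a, b
  a(:) = b(:)
end subroutine foo_same
```

The caller is still responsible for setting n in the common block before the call.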

Character String Restriction

In evaluating a character string expression on the right-hand side of an assignment, the variable being assigned on the left-hand side may not be referenced. This is allowed in Fortran 90, but is not currently supported in pghpf. For example, the following is not supported:
DATE(2:5) = DATE(1:4)
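One possible way to avoid the overlapping reference, which these notes do not themselves document, is to copy the source substring into a temporary first. The subroutine name shift_date and the temporary tmp are hypothetical.

```fortran
! Sketch of a possible work-around (an assumption, not a documented fix).
subroutine shift_date(date)
  character(len=*) :: date
  character(len=4) :: tmp
  ! Copy the source substring first so that the right-hand side of the
  ! final assignment does not reference the variable being assigned.
  tmp = date(1:4)
  date(2:5) = tmp
end subroutine shift_date
```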

PURE Restriction

The pghpf 2.1 implementation of PURE conforms to the HPF 1.1 language specification, with the following exception: in PURE subroutines pghpf will not generate any communication for distributed COMMON variables or distributed MODULE variables. The user is advised to pass distributed COMMON variables as arguments to a PURE subroutine, or to use non-distributed COMMON.

Optional Argument Restrictions

An optional argument should not be used as an align-target for any variable that is not also optional. If an alignee is present, then its align-target must also be present. For example:
        subroutine test1(a,b)
        integer, dimension(10):: a,b
        optional :: a
!hpf$   template t(n)
!hpf$   distribute (block)::t
!hpf$   align a(i) with t(i)
!hpf$   align b(i) with a(i) ! THIS IS A PROBLEM
This should be rewritten as:
        subroutine test1(a,b)
        integer, dimension(10):: a,b
        optional :: a
!hpf$   template ta(n)
!hpf$   distribute (block):: ta
!hpf$   align a(i) with ta(i)
!hpf$   template tb(n)
!hpf$   distribute (block):: tb
!hpf$   align b(i) with tb(i) ! THIS IS FINE

Pointer Restrictions

The pghpf 2.1 compiler supports Fortran 90 pointers with the following restrictions:
  1. Objects with the POINTER attribute cannot appear in COMMON (they can appear in a module if they are not distributed).
  2. Objects with the POINTER attribute cannot be DYNAMIC.
  3. DERIVED TYPE components cannot have the POINTER attribute.
  4. COMPLEX objects cannot have the TARGET attribute when compiling for some systems.
  5. Objects with the TARGET attribute cannot have CYCLIC or CYCLIC(N) distributions. It may not be possible to detect this at compile-time in all cases, for example, when a CYCLIC actual argument is passed to a dummy with the TARGET attribute.
  6. A scalar POINTER cannot be associated with a distributed array element. For example:
        integer, pointer :: p
        integer, target :: a(10),b(10)
!hpf$   distribute (block) :: a
        p => a(1)   ! unsupported
        p => b(1)   ! supported
        end
Finally, do not use a pointer dummy variable to declare other variables such as automatic arrays using lbound(), ubound() and size() intrinsics. For example:
      subroutine sub(p)
         integer, pointer, dimension(:,:) :: p
         integer, dimension(lbound(p,1):    &
                  ubound(p,1),size(p,2)) :: a ! does not work
The compiler error messages for the pointer limitations are:
PGHPF-S-0000-Internal error. POINTER common block member not supported
PGHPF-S-0155-DYNAMIC object may not have the POINTER attribute
PGHPF-S-0000-Internal error. POINTER component of derived type not supported
PGHPF-W-0155-Complex TARGET may not be properly aligned
PGHPF-F-0155 scalar POINTER associated with distributed object is unsupported 
Runtime Error Message:
POINTER: cyclic distribution of target unsupported

Derived Type Restrictions

The DATA statement does not support array constructors and arrays of derived type. As a work-around, use entity-style initialization.
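A minimal sketch of entity-style initialization for an array of derived type follows. The type point, the array corners, and the function corner_sum are hypothetical names chosen for illustration.

```fortran
integer function corner_sum()
  type point
     integer :: x, y
  end type point
  ! Entity-style initialization: the initial value is given in the type
  ! declaration statement rather than in a separate DATA statement.
  type(point), dimension(2) :: corners = (/ point(1, 2), point(3, 4) /)
  corner_sum = corners(1)%x + corners(1)%y + corners(2)%x + corners(2)%y
end function corner_sum
```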

The following Fortran 90 intrinsics will not work with variables or arrays of derived type:

ALLOCATED(ARRAY)
CSHIFT(ARRAY,SHIFT,DIM)
EOSHIFT(ARRAY,SHIFT,BOUNDARY,DIM)
LBOUND(ARRAY,DIM)
MERGE(TSOURCE,FSOURCE,MASK)
PACK(ARRAY,MASK,VECTOR)
PRESENT(A)
RESHAPE(SOURCE,SHAPE,PAD,ORDER)
SHAPE(SOURCE)
SIZE(ARRAY,DIM)
SPREAD(SOURCE,DIM,NCOPIES)
TRANSFER(SOURCE,MOLD,SIZE)
TRANSPOSE(MATRIX)
UBOUND(ARRAY,DIM)
UNPACK(VECTOR,MASK,FIELD)
The following HPF library and intrinsic procedures will not work with variables or arrays of derived type:
COPY_PREFIX()
COPY_SCATTER()
COPY_SUFFIX()
HPF_ALIGNMENT()
HPF_DISTRIBUTE()
HPF_TEMPLATE()

Named Constant Restrictions

Named array or structure constants cannot be subscripted or referenced by member to yield a constant value. For example:
INTEGER, PARAMETER, DIMENSION(3):: X=(/1,2,3/)
The following will not work:
INTEGER, PARAMETER:: Y=X(1)   !WILL NOT WORK
Because of this limitation, named array or structure constants cannot be used in the following places:

Module Restrictions

Named array constants defined in a module cannot be used as initializers in a subprogram that USEs the module.

Named array or structure constants found in modules cannot be used in the following cases:

NAMELIST objects are not allowed in the specification part of a MODULE.

The module PUBLIC/PRIVATE access statements cannot reference a CONTAINed subprogram.

A MODULE cannot contain forward references to procedures defined in the same module. For example, the module B below will not work in the current release, while module C will work:

MODULE B
CONTAINS
  FUNCTION G()
  .
  .
  .
    CALL H
  END FUNCTION G
  SUBROUTINE H
  .
  .
  .
  END SUBROUTINE H
END MODULE B
MODULE C
CONTAINS
  SUBROUTINE H
  .
  .
  .
  END SUBROUTINE H
  FUNCTION G()
  .
  .
  .
    CALL H
  END FUNCTION G
END MODULE C

3.2 Omissions - Version 2.1

This section lists Fortran 90 and HPF features that are omitted from pghpf 2.1.

Fortran 90 Language Omissions

HPF Language Omissions

PGHPF-W-3011-Non-replicated mapping for character/struct/union array, 
char_table, ignored (file.F: lineno)
LOCAL_TO_GLOBAL()

3.3 System Specific Notes

CRAY T3D Runtime

The execution of a T3D program depends on the policies of the host site. In general, programs are executed with:
%a.out mppexec_opt user_opt -pghpf HPF_opt
The mppexec options are described in the mppexec(1) man page. The -npes mppexec_opt option is required and specifies the number of processors. The number of processors must be a power of 2.

The only supported HPF options are -stat and -np. The HPF -np option may be specified to reduce the number of processors from the value specified by the -npes option. The use of the -np option is not recommended as the unused processors are not available for other uses.

CRAY T3D Profiling

The profiler, pgprof, is not currently supported on the CRAY T3D. However, CRAY T3D programs can be compiled and run with the -Mprof options. The resulting pgprof.out file can be analyzed on any supported workstation platform.

CRAY T3D Compiler Options

One compiler option is available only on the T3D:
%pghpf -Ojump file1.hpf
The -Ojump switch will pass "-Wf,-ojump" to the T3D Fortran 77 compiler and link a version of the runtime library compiled with -h jump. See the documentation on -h jump for the T3D C compiler for more details.

IBM Systems

IBM SP2 Runtime
The MPI implementation used by pghpf is the IBM version (MPI-F 1.41). The execution of an SP2 MPI program depends on the policies of the host site. For example, programs could be executed with:
%mpirun -np numberofprocs a.out user_opt -pghpf HPF_opt
The only supported HPF option is -stat. The HPF -np option is not supported.

For the IBM SP2, the MPL communications library is also available. To use the MPL library, include the option -Mmpl on the compiler command line (this library is loaded when linking occurs).

IBM Directory Structure
Previously, IBM RS6000 and SP2 systems used the same directory structure. For example, to set the path to include PGI binaries on an rs6000, the following was used:
set path=($path $PGI/rs6000/bin)
In pghpf 2.1, the path setting for rs6000 workstations is as follows:
set path=($path $PGI/rs6000/bin)
And for SP2 systems it is:
set path=($path $PGI/sp2/bin)

SGI Systems

Due to a known bug in the IRIX 6.0 linker, some HPF programs may fail to link and could produce the following link-time error:
ERROR 104: GOT page/offset relocation out of range: x.o
ERROR 104: GOT page/offset relocation out of range: x.o
where x.o is one of the object files being linked. This problem should not occur with the version of the linker included in IRIX 6.1. No known work-around is available.
SGI MPI Communications Libraries
Several versions of the MPI communications libraries are available on some SGI systems, including versions of MPI-CH and SGI's MPI. With pghpf, the default is the MPI-CH environment.

The directory $PGI/patches/ contains a patch script and a README file with the latest information for changing an installation for the different MPI versions.

With the patches for SGI MPI 2.0 installed, every link will give the following warnings:

ld64: WARNING 85: definition of atexit in /usr/pgi/pcxl/lib/mips4/lib...
ld64: WARNING 85: definition of exit in /usr/pgi/pcxl/lib/mips4/libpg...
These warning messages can be ignored.
SGI - Setting MPIRUN_UNEX
On SGI systems, the following error may occur:
    Too many HIPPI messages in input queue without matching receives.
MPI can hold only 64 such unexpected messages per process at a time. The environment variable MPIRUN_UNEX may be used to increase this limit. Setting MPIRUN_UNEX to 1024 should work for most code.

Convex Exemplar

The pghpf compiler, running on the Convex Exemplar, requires the permissions on /dev/lan0 to be 0666 for licensing to work correctly. These permissions may have security implications for the site.

The Convex Exemplar PVM runtime implementation has a limited buffer size. This may cause compiled programs to fail. The buffer size can be increased. Refer to your system administrator or the bugs section in the PVM Readme.mp file for more information.

When compiling extrinsic routines, the Fortran 77 compiler option +ppu should be used. This option appends underscores at the end of definitions of and references to externally visible symbols. Since the caller appends underscores for extrinsic names, the callee extrinsic needs this option when it is compiled.

Intel Paragon Systems

The pghpf 2.1 release supports cross development from various systems to Intel Paragon systems. To support this cross development environment, several variables need to be set. The environment variable PARAGON_XDEV needs to be set to use the Intel tools. This should be one directory above the Intel-supplied paragon directory. Intel's documentation should provide information on how to do this.

For example:

setenv PARAGON_XDEV /usr/local
The environment variable PGI needs to be set:
setenv PGI /usr/local/paragon/pgi
Then two elements need to be added to the path:
set path=($PARAGON_XDEV/paragon/bin.arch \
          $PGI/pgon/bin.arch $path)
where arch is the architecture on which the compilation is performed. Choices for arch include: sgi, solaris, and sun4.

4 Optimization Features

The pghpf 2.1 compiler performs many optimizations. Some of these optimizations are only available using higher levels of optimization (-O1 or -O2 on the compiler command-line). These optimizations include:
  1. Generation of overlap shift communications in the presence of appropriate indexing patterns, and for CSHIFT calls. This optimization involves generation of overlap shift communication when certain compile-time specifications are met. For example:
      INTEGER, DIMENSION(N,N) :: A,B
!HPF$ DISTRIBUTE (BLOCK,BLOCK) :: A,B
      A = CSHIFT(B, DIM=1, SHIFT=2)
For this example, a temporary will not be created, and data requiring communication will be communicated through an overlap area.

  2. Generation of collective regular communication calls. For example:
      INTEGER, DIMENSION(N,N) :: A,B
!HPF$ DISTRIBUTE (BLOCK,BLOCK) :: A,B
      FORALL(I=1:N,J=1:N) A(I,J) = B(J,I)
This example will generate a call to a runtime routine that handles permutations of axes for communications.

  3. Generation of collective irregular communication calls in the presence of indexed array assignments or FORALL. For example:
      INTEGER, DIMENSION(N,N) :: A,B,C
!HPF$ DISTRIBUTE (BLOCK,BLOCK) :: A,B,C
      FORALL(I=1:N,J=1:N) A(C(1,I),C(2,J)) = B(J,I)
This scatter array access is recognized and a scatter communication call is generated.

  4. Sharing of runtime data descriptors for arrays of like size and shape that are identically aligned to a common template. For example:
      INTEGER, DIMENSION(N,N) :: A,B
!HPF$ DISTRIBUTE (BLOCK,BLOCK) :: A,B
One descriptor is created and shared for both arrays A and B. The compiler automatically aligns A and B to the same template. This alignment occurs even though the programmer did not align A and B with each other or align them to a common template.

  5. Use of INTENT information to eliminate unnecessary copying of arguments at subroutine boundaries. Note that many Fortran 90 compilers currently ignore INTENT statements. If you compile code containing erroneous INTENT statements, your program may fail under pghpf.

  6. Common runtime call elimination across basic blocks. For example:
      INTEGER, DIMENSION(N,N) :: A,B,C,D
!HPF$ DISTRIBUTE (BLOCK,BLOCK) :: A,B,D
!HPF$ DISTRIBUTE (CYCLIC,CYCLIC) :: C
      A(:,:) = C(:,:)
      B(:,:) = C(:,:)
This code will generate a single runtime communication sequence, so any communication involved with the use of C will only happen once.

  7. Sharing of communications schedules, including schedules generated for irregular communications. For example, with the arrays in the example above and an indirection array V:
      A = B(V)
      C = D(V)
This sequence could generate two gather communication sequences, one for B(V) and one for D(V). However, the compiler creates a single schedule for the communications since they both use the same communication pattern. This reduces the overhead of scheduling computation, especially in nested loops.

  8. Fusing of Fortran 90 array assignments and FORALL statements. For example, the assignments to arrays A and C below are fused into a single generated loop. Without this optimization, a separate sequence of loops is generated for each statement:
      A = B
      C = D

  9. Hoisting of invariant communication calls out of loops. For example:
      DO I = 1,N
         A(I) = B(1) + A(I)*2
         CALL FOO(A(I))
      END DO
The communication of B(1) is loop invariant and will be hoisted out of the loop.
The following points may be helpful in enabling you to obtain the best possible performance with the 2.1 release:

4.1 Overlap Shift Optimization

A new compilation flag is available to control the size of the overlap area that the compiler generates for certain array expressions. The overlap area is only generated and used for BLOCK distributed arrays. For most programs, the compiler's default handling for overlap areas should suffice as a balance between memory needs and the possible use of the overlap shift communications optimization. However, using too large an overlap area may result in a runtime memory allocation error such as the following:
0: ALLOCATE: xxxx bytes requested; not enough memory
The new overlap option is available for such cases. Use -Moverlap as follows:
%pghpf -Moverlap[=size:n]
This option controls the size of the overlap area the compiler generates for certain arrays. In some cases, increasing or decreasing the size of the overlap area may improve a program's performance. The default size is 4. You may want to change the size from the default to improve performance in cases where pghpf generates overlap_shift() communications. For example, in the code:
!hpf$ distribute (block) :: a,b
      forall(i=1:n) a(i) = b(i+10)
using the default overlap size of 4, pghpf does not use the overlap shift optimization, since 4 is too small. By increasing the overlap size to 10, pghpf generates overlap shifts.

The compiler only performs overlap shift communications for BLOCK distributed dimensions where an array's shift amount is a compile-time constant that does not exceed the default overlap size or the size specified with -Moverlap.

Reducing the overlap size may also improve performance for some codes. Setting the size to 0 completely disables overlap shifts. If the expressions in a program that use the overlap optimization never use an offset greater than one or two, then specifying an overlap size smaller than the default, for example a value of 2, will reduce memory usage and may reduce communications. For example, the following code shows an expression that would only require an overlap size of 1.

!hpf$ distribute (block) :: a,b
forall(i=1:n) a(i) = b(i-1) + b(i) + b(i+1)

5 pghpf 2.1 Input/Output

The pghpf 2.1 implementation supports full Fortran 90 I/O semantics. This includes non-advancing I/O, namelist I/O, and I/O of array sections. There are no restrictions on using mapped arrays in list directed, formatted, or unformatted I/O statements. For example:
	INTEGER, DIMENSION(N,N) :: A,B
!HPF$  DISTRIBUTE (BLOCK,BLOCK) :: A,B
	FORALL(I=1:N,J=1:N) A(I,J) = B(J,I)
	PRINT *, A,B
	END
Variables used in namelist groups (NAMELIST) may not be mapped; the compiler issues a warning message if an attempt is made to map a variable in a namelist group:
PGHPF-W-0311-Non-replicated mapping for namelist
             array, name, ignored (test1.hpf:4)
Input and output is currently serialized. One processor reads or writes the data and sends or receives it to or from the other processors owning the data.

pghpf uses one of two methods to perform I/O, depending upon the data items being read or written. For example, assuming a and b are arrays, the statement:

read(...) (a(i),b(i),i=1,1000)
will not be very efficient when running on more than a small number of processors. All of a and b are read by a single processor and then broadcast to all nodes.
read(...) a(1),a(3),a(5),...
The statement above reads a list of scalars. This is also not the most efficient form of I/O for pghpf.
read(...) a,b
The statement above should perform better; a single processor still reads all the data, but it sends each node only the parts of arrays a and b that the node requires. The example below will generate similar code:
read(...) (a(i),i=1,1000)

6 INDEPENDENT Loops

Beginning in pghpf version 2.1, the compiler parallelizes INDEPENDENT DO loops without first transforming them into FORALL statements. Auto-parallelization is also provided in pghpf and will process DO loops that have not already been parallelized by processing of INDEPENDENT DO loops (for details on this facility, refer to the information on the command-line option -Mautopar).

An INDEPENDENT DO loop is designated by the programmer by preceding it with the INDEPENDENT directive. For example:

!HPF$ INDEPENDENT
The compiler accepts the above, or any of the standard HPF directive prefixes, as well as additional INDEPENDENT clauses (section 6.2 "INDEPENDENT Clauses").

No command-line switches are needed to invoke parallelization of INDEPENDENT loops. The -Mnoindependent switch is available to inhibit parallelization of all INDEPENDENT loops. The -Minfo command-line switch reports which loops have been parallelized.
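As a minimal sketch, a complete INDEPENDENT loop over BLOCK distributed arrays might look like the following. The subroutine name scale_add and its arguments are hypothetical; compiling with -Minfo should report whether the loop was parallelized.

```fortran
subroutine scale_add(a, b, n)
  integer :: n, i
  real, dimension(n) :: a, b
!HPF$ DISTRIBUTE (BLOCK) :: a, b
  ! Each iteration writes only a(i) and reads only b(i), so no
  ! iteration depends on another.
!HPF$ INDEPENDENT
  do i = 1, n
     a(i) = 2.0 * b(i) + 1.0
  end do
end subroutine scale_add
```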

6.1 Restrictions on Parallelization



Warning

Use of the INDEPENDENT directive on some loops may cause array copying and introduce significant overhead. If you find that performance for a program degrades with pghpf version 2.1, try using option -Mnoindependent with -Mautopar. If performance improves, the program may contain a loop that should not use the INDEPENDENT directive.


At present, only INDEPENDENT loops with Fortran 77 constructs can be parallelized. In particular, the presence of array assignments, WHERE statements, FORALL statements, and ALLOCATE statements will eliminate loops from consideration for parallelization. INDEPENDENT loops can be nested, currently to a depth of seven loops, but there can be at most one INDEPENDENT loop directly nested within another INDEPENDENT loop. For example, the following loop nest will not be parallelized since two INDEPENDENT loops are present at the same level.

!HPF$ INDEPENDENT
      DO i = 1, n
!HPF$ INDEPENDENT
         DO 10 j = 1, m
 10         A(j,i) = (j-1) * n + i
!HPF$ INDEPENDENT
         DO 20 k = m, 1, -1
 20         B(k,i) = A(m-k+1,i)
      ENDDO
This restriction has been added to ensure that a unique home array can be found for the entire INDEPENDENT loop nest (see "The ON HOME Clause" in section 6.2 for a discussion of home arrays). For the same reason, trip counts and strides for non-outermost INDEPENDENT loops must be invariant with respect to the entire loop nest.
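One possible restructuring, offered as an assumption rather than a documented work-around, splits the loop nest above into two nests so that each outer INDEPENDENT loop directly contains only one INDEPENDENT loop. The subroutine name fill_ab and its argument list are hypothetical, and whether the compiler parallelizes both nests should be checked with -Minfo.

```fortran
subroutine fill_ab(a, b, n, m)
  integer :: n, m, i, j, k
  integer, dimension(m, n) :: a, b
  ! First nest: fills A. Only one INDEPENDENT loop is nested in the
  ! outer INDEPENDENT loop.
!HPF$ INDEPENDENT
  do i = 1, n
!HPF$ INDEPENDENT
     do j = 1, m
        a(j, i) = (j - 1) * n + i
     end do
  end do
  ! Second nest: fills B from the completed A.
!HPF$ INDEPENDENT
  do i = 1, n
!HPF$ INDEPENDENT
     do k = m, 1, -1
        b(k, i) = a(m - k + 1, i)
     end do
  end do
end subroutine fill_ab
```

The split keeps the original results because every element of A that the second nest reads has already been written by the first nest.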

There are additional cases where INDEPENDENT loops are not parallelized or are only parallelized if an INDEPENDENT clause is used (refer to the following section for a description of INDEPENDENT clauses). To describe these cases, we must first define several terms. Each INDEPENDENT DO loop defines an INDEPENDENT index, which is the DO loop's index. In processing INDEPENDENT loops, the compiler will replicate those variables that do not contain subscripts that are functions of INDEPENDENT indices. As a degenerate case, all scalars will be replicated. Variables that the compiler replicates may originally be distributed. To perform parallelization, the compiler will create replicated copies. The resulting variables are compiler-replicated.

The compiler must ensure that values of compiler-replicated variables will be identical across all processors. If a compiler-replicated variable can be modified within an INDEPENDENT loop, and is used outside the loop, the loop will not be parallelized.

Modifications to compiler-replicated variables can be made through assignment statements, or through procedure calls. Any modification to a compiler-replicated variable disables parallelization of the INDEPENDENT loop unless NEW or REDUCTION clauses are specified for the modified variable or there are no uses (refer to section 6.2). The presence of INTERFACE blocks for procedures describing the INTENTs of parameters will help the compiler to identify variables that are not modified across procedure calls (refer to section 6.3 "Procedure Calling").

Uses of variables may be explicit, and can occur either after the INDEPENDENT loop nest, or within the same loop nest. For example, the following INDEPENDENT loop has a likely programming error because variable j is both read and written on different iterations, violating Bernstein's conditions (refer to page 193 of The High Performance Fortran Handbook).

!HPF$ INDEPENDENT
      DO 10 i = 1, n
 10      j = j + A(i)
Implicit uses of variables arise either because the variables exist in COMMON blocks, or because the variables occur as dummy parameters with INTENT INOUT or INTENT OUT.

Another reason that INDEPENDENT loops may not be parallelized is the presence of array aliases: there may be distinct array references, where at least one reference is a store, that refer to the same array locations on certain iterations. When the compiler must copy programmer-defined arrays to compiler-created arrays and array aliasing arises, the compiler cannot determine how to replace a given array reference. This problem can arise in the following INDEPENDENT loop.

!HPF$ INDEPENDENT
	DO i = 1, n
	    A(J1(I)) = 0
	    A(J2(I)) = 1
	ENDDO
If the first reference to array A is replaced with A$TMP1 and the second is replaced by A$TMP2, the compiler cannot determine which temporary array to copy back to A after the loop.

6.2 INDEPENDENT Clauses

The full syntax of the pghpf 2.1 implementation of the INDEPENDENT directive is as follows:
INDEPENDENT [, ON HOME ( home-array )]
            [, NEW ( var-list )]
            [, REDUCTION ( var-list )]
The following sections describe the NEW, ON HOME, and REDUCTION clauses.

Warning

The ON HOME and REDUCTION clauses are not part of the HPF 1.1 language standard. These clauses are provided to assist users in parallelizing INDEPENDENT loops. Both the name and the syntax for these clauses may change in upcoming releases of pghpf, in accordance with changes made in the HPF language standard.

The NEW Clause

The NEW clause specifies a list of compiler-replicated variable names (separated with commas). Assignment to a compiler-replicated variable violates Bernstein's conditions (the variable will be assigned values in multiple iterations), and will prevent parallelization (see section 6.1). However, when the variable is present in a NEW clause, the loop is treated as if a new instance of the variable is created for each iteration of the INDEPENDENT loop, and Bernstein's conditions are discharged.

The following example demonstrates use of the NEW clause.

!HPF$ INDEPENDENT, NEW (S)
	DO I = 1, n
	    S = SQRT(A(i)**2 + B(i)**2)
	    C(i) = S
	ENDDO
After execution of the INDEPENDENT loop, values of compiler-replicated variables appearing in NEW clauses may be different across different processors, causing errors if these variables are used without intervening assignments.

The ON HOME Clause (pghpf 2.1 Extension to HPF)

The ON HOME clause specifies an array reference to be used to localize loop iterations for an INDEPENDENT loop nest. The ON HOME clause associates INDEPENDENT indices to dimensions of the home array.

The ON HOME clause is optional. If it is not specified, the compiler will select a suitable home array from array references within the INDEPENDENT loop, or will create a home array (without actually allocating space for it).

Each INDEPENDENT index of a loop nest should be a subscript in a mapped dimension of the home array reference in the ON HOME clause. Valid distribution attributes are BLOCK and BLOCK(N). The home-array should reference valid array locations for all values of the INDEPENDENT indices. When a subscript is not an INDEPENDENT index, it can be a subscript triplet. The following example demonstrates use of the ON HOME clause.

	DIMENSION A(0:n+1,1:m)
!HPF$	DISTRIBUTE A(BLOCK,*)
!HPF$ INDEPENDENT, ON HOME (A(i,:))
	DO 1 i = 1, n
1	    B(i) = i

The REDUCTION Clause (pghpf 2.1 Extension to HPF)

The REDUCTION clause specifies a list of accumulator variable names (separated with commas). When an accumulator is compiler-replicated, its appearance in a reduction statement will violate Bernstein's conditions in the same way that other assignments to compiler-replicated variables violate these conditions. It is not correct to place accumulators in NEW clauses because their values must be accumulated across processors. The REDUCTION clause specifies that reduction statements do not violate Bernstein's conditions.

The following example demonstrates use of the REDUCTION clause.

!HPF$ INDEPENDENT, REDUCTION (S)
	DO I = 1, n
	    S = S + A(I)
	ENDDO

A reduction statement is an assignment statement in one of the forms below:
A = A + E
A = A * E
A = A .or. E
A = A .and. E
A = A .neqv. E
A = iand(A, E1, ..., En)
A = ior(A, E1, ..., En)
A = ieor(A, E1, ..., En)
A = min(A, E1, ..., En)
A = max(A, E1, ..., En)

In these reduction statements, A is an accumulator appearing in a REDUCTION clause, and expressions E, E1, ..., En do not contain A. The compiler produces statements to perform reductions locally on all processors, then combines all local accumulators globally.
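
The max form from the list above can be sketched as follows. This is an illustrative program, not taken from the pghpf distribution; the program name REDMAX, the array size, and the initial values are assumptions.

```fortran
      PROGRAM REDMAX
      INTEGER, PARAMETER :: n = 8
      REAL A(n), S
      INTEGER I
!HPF$ DISTRIBUTE A(BLOCK)
      DO I = 1, n
         A(I) = REAL(I)
      ENDDO
!     The accumulator is initialized before the loop.
      S = A(1)
!HPF$ INDEPENDENT, REDUCTION (S)
      DO I = 1, n
!        Reduction statement of the form A = max(A, E1)
         S = MAX(S, A(I))
      ENDDO
!     S now holds the largest element of A (8.0 for this data).
      PRINT *, S
      END
```

Each processor would compute the maximum over its own block of A, and the local maxima would then be combined, so S is consistent on all processors after the loop, unlike a variable listed in a NEW clause.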

6.3 Procedure Calling

Calls to subroutines, functions, and most intrinsics can occur within INDEPENDENT loops. Due to the presence of side-effects in their implementation, intrinsics of class "Subroutine" (for example random_number()) will prevent parallelization of INDEPENDENT loops. All called subroutines and functions must be PURE; by declaring a procedure PURE, the programmer asserts that no communication will be generated within the called program unit. If a called subroutine or function is not declared PURE in an INTERFACE block, the compiler issues the following warning message:

             PGHPF-W-0313-Sub-program x within INDEPENDENT loop not PURE
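
A minimal sketch of a call that should not trigger this warning is shown below. The function HYPOT2 and the program name CALLPURE are hypothetical names chosen for illustration; the INTERFACE block is what tells the compiler that the function is PURE.

```fortran
      PROGRAM CALLPURE
      INTEGER, PARAMETER :: n = 4
      REAL A(n), B(n), C(n)
      INTEGER I
!     The INTERFACE block declares the (hypothetical) function
!     HYPOT2 as PURE.
      INTERFACE
         PURE REAL FUNCTION HYPOT2(X, Y)
         REAL, INTENT(IN) :: X, Y
         END FUNCTION HYPOT2
      END INTERFACE
!HPF$ DISTRIBUTE (BLOCK) :: A, B, C
      A = 3.0
      B = 4.0
!HPF$ INDEPENDENT
      DO I = 1, n
         C(I) = HYPOT2(A(I), B(I))
      ENDDO
      PRINT *, C
      END

      PURE REAL FUNCTION HYPOT2(X, Y)
      REAL, INTENT(IN) :: X, Y
      HYPOT2 = SQRT(X**2 + Y**2)
      END FUNCTION HYPOT2
```

The function reads only its INTENT(IN) arguments and assigns only its result, which is what allows it to be declared PURE.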

6.4 Independent Performance

DO loops are marked with INDEPENDENT directives to inform the compiler that the loops can be executed in parallel, thereby attaining improved performance. To achieve correct program behavior in the presence of parallelism, the compiler must first analyze INDEPENDENT loops, and then possibly perform transformations on the loops. Some of these transformations may impede performance to the point that the transformed loops execute more slowly than similar loops running sequentially on replicated data. While the compiler attempts to reduce the number of such transformations, the programmer has a large role to play in eliminating them. This section discusses strategies that programmers can use to improve the performance of INDEPENDENT loops.

Every INDEPENDENT loop nest is assigned a home array by the compiler. All array references in an INDEPENDENT loop nest are examined to see if they are aligned with the home array. Array references that are not aligned are replaced with new temporary arrays which are aligned with the home array. The time required to allocate and deallocate new temporary arrays, as well as the time to copy data both to the temporary arrays and then back to the original arrays can be substantial, and is the primary cause of slowdown in performance of INDEPENDENT loops.

The compiler's -Minfo command-line switch informs programmers about the presence of temporary arrays for which performance overhead of array copying may be substantial. In this case, the compiler produces messages such as the following:

    14, Independent loop parallelized
        expensive communication: all-to-all communication (copy_section)
    18, expensive communication: all-to-all communication (copy_section)

The first "expensive communication" message is produced for the copy into a temporary array, and is associated with the first line of the INDEPENDENT loop nest (line 14 in the above message). The second "expensive communication" message is produced for the copy from the temporary array to the original array, and is associated with the last line of the INDEPENDENT loop nest (line 18 in the above message).

Small changes to a program can lead to a substantial reduction in the number of temporary arrays created by the compiler. There are two primary strategies that can be followed:

  1. Change the number of INDEPENDENT loops to match the home array.
  2. Change array distributions to align with the home array.

In the following loop nest, no suitable home array can be found, because its only array is distributed over just one dimension, while both loops in the nest are INDEPENDENT:

!HPF$ DISTRIBUTE (BLOCK,*) :: A
!HPF$ INDEPENDENT
DO 1 i = 1, m
!HPF$ INDEPENDENT
DO 1 j = 1, n
1 A(i,j) = (i-1) * n + j

For this loop nest, a temporary copy will be created for A. The temporary array will be distributed over both of its dimensions. This temporary can be eliminated in one of two ways:

  1. Eliminate the inner INDEPENDENT directive. With this change, array reference A(i,j) becomes a suitable candidate as a home array. Of course a disadvantage to this change is that the amount of parallelism is reduced. But the amount of available parallelism is constrained by the distribution of A, since it is mapped over just one dimension.
  2. Remap A to be (BLOCK,BLOCK).
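
The two alternatives can be sketched side by side as follows. The program name HOMEFIX, the second array B, and the array sizes are assumptions made for illustration. The NEW (j) clause in the first nest is a cautious addition that follows the NEW clause rules described earlier, since j is assigned in every iteration of the outer loop.

```fortran
      PROGRAM HOMEFIX
      INTEGER, PARAMETER :: m = 4, n = 3
      INTEGER A(m,n), B(m,n)
      INTEGER i, j
!HPF$ DISTRIBUTE (BLOCK,*) :: A
!HPF$ DISTRIBUTE (BLOCK,BLOCK) :: B
!     Alternative 1: only the outer loop is INDEPENDENT, which
!     matches the single distributed dimension of A.
!HPF$ INDEPENDENT, NEW (j)
      DO i = 1, m
         DO j = 1, n
            A(i,j) = (i-1) * n + j
         ENDDO
      ENDDO
!     Alternative 2: both loops stay INDEPENDENT, and B is
!     mapped over both of its dimensions.
!HPF$ INDEPENDENT
      DO i = 1, m
!HPF$ INDEPENDENT
         DO j = 1, n
            B(i,j) = (i-1) * n + j
         ENDDO
      ENDDO
!     Both nests compute the same values.
      PRINT *, ALL(A .EQ. B)
      END
```

Compiling such a program with -Minfo is one way to check whether the "expensive communication" messages described above still appear for either nest.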

7 Profiling

PGI provides a graphical HPF profiling tool, pgprof, which allows function and line level profiling of HPF programs. The pgprof User's Guide contains more information on the profiler. Using the profiler involves compiling with the profiling flags, executing the program to create the profile data file pgprof.out, and then running the profiler to view the profile data.

7.1 Profiling Compilation

The following list shows the pghpf switches which cause profile data collection calls to be inserted and libraries to be linked in the executable file:
-Mprof=func
    Insert calls to produce a pgprof.out file for function level data.
-Mprof=lines
    Insert calls to produce a pgprof.out file which contains both function and line level data.

For example:

% pghpf -Mprof=lines -otest1 test_prog.hpf

7.2 Program Execution

Once a program is compiled for profiling, it needs to be executed. The profiled program is invoked normally, but while running it collects call counts and/or timing data. When the program terminates, it generates a profile data file called pgprof.out.

7.3 Profiler Invocation and Initialization

Running the profiler, pgprof, initializes the profiler and allows the profile data produced during the execution phase to be analyzed.

The profiler pgprof is invoked as follows:

% pgprof [options] [-I srcdir] [-o prog] [datafile]

If invoked without any command-line options or arguments, pgprof looks for the pgprof.out data file and the program source files in the current directory. The program's executable name, as specified when the program was run, is usually stored in the profile data file. If all program related activity occurs in a single directory, pgprof needs no arguments. If present, the arguments are interpreted as follows:

8 Debugging

Debugging programs developed under pghpf 2.1 can be difficult. No HPF debugger is currently provided. However, if necessary the programmer can debug the generated SPMD Fortran 77 program on each node using multiple X windows. This can be particularly useful in obtaining a traceback on a program that is crashing unexpectedly.

To prepare an HPF program for debugging, use the -Mg compile-time option to pghpf. The generated Fortran 77 output will be saved and the Fortran 77 node compiler will be invoked with the -g compile-time option to provide symbolic information in the image file. If you wish to execute the program on only a single processor, use the -Mrpm1 compiler option when linking the program (this is not available on all platforms). You can then use a standard debugger on the image file.

If you need to execute the program on multiple processors, the following sequence is useful:

Another debugging technique is to use the compiler command-line option -Mprof=lines, and then run the program. If a control-C is pressed, a traceback will usually result.

9 Bug Fixes

This section briefly lists the bugs fixed from release 2.0.3 to 2.1.

The following bugs are fixed only in pghpf 2.0.3 and newer versions of pghpf.

The following bugs are fixed only in pghpf 2.0.2 and newer versions of pghpf.

This section briefly lists the bugs fixed from release 1.3 to 2.0.

    subroutine sub(a)
    implicit none
    common /c/ n
    integer n
    character*8 a(n)
    end

10 Compiler Command-line Options

The table below provides a list of compiler command-line options that are valid on many systems. Some systems do not support all of these options. In addition, most node compiler options are available for systems with a node compiler that is not supplied with pghpf. The pghpf-specific -Marg options are listed after this table, and the pghpf User's Guide describes them in more detail.

Compiler Command-line Options

Option               Description                                       
-c                   Stops after assembling (results placed in         
                     filename.o).                                      
-Dname[=val]         Defines a preprocessor macro name with value      
                     val.                                              
-dryrun              Show but do not execute all commands created by   
                     the driver.                                       
-E                   Displays preprocessed HPF file to the standard    
                     output.                                           
-F                   Saves a preprocessed HPF file in filename.f.      
-help                Display the complete list of driver options.      
-Idirectory          Adds directory to the search path for #include    
                     files.                                            
-Ldirectory          Adds directory to the search path for library     
                     files.                                            
-llibrary            Loads the library, in addition to the standard    
                     libraries.                                        
-O[level]            Specifies code optimization at the specified      
                     level.                                            
-ofilename           Names the object file filename.                   
-r4                  Interpret DOUBLE PRECISION variables as REAL.     
-r8                  Interpret REAL variables as DOUBLE PRECISION.     
-time                Print execution times for the various compiler    
                     steps.                                            
-Uname               Undefine a preprocessor macro name.               
-V                   Displays the compiler phase version messages.     
-v                   Displays the compiler, assembler and linker       
                     phase invocation.                                 
-W0,arg              Passes arguments arg to the node compiler.        
-Wa,arg              Passes arguments arg to the assembler.            
-Wl,arg              Passes arguments arg to the linker.               
-Wh,arg              Passes arguments arg to the HPF compiler.         
-w                   Do not print warning messages.                    

pghpf Compiler Options

Option              Description                                         
-Mautopar           Auto-parallelize Fortran DO loops.                  
-M[no]backslash     Determines how the backslash character is treated   
                    in quoted strings.                                  
-Mcmf               Provides limited support for CM Fortran             
                    compatibility.                                      
-Mextract           Perform a manual extract phase for procedures       
                    within INDEPENDENT DO loops that are to be          
                    inlined. See the -Minline option.                   
-M[no]dclchk        Determines whether all program variables must be    
                    declared.                                           
-M[no]depchk        Compiler checks for potential data dependencies.    
-M[no]dlines        The compiler treats lines containing "D" in         
                    column 1 as executable statements. With nodlines    
                    the compiler does not treat lines containing "D"    
                    in column 1 as executable statements (does not      
                    ignore the "D").                                    
-Mextend            The compiler accepts 132-column source code;        
                    without this option lines are 72 columns.           
-Mfreeform          Process source using Fortran 90 freeform input      
                    specifications.                                     
-Mftn               Stop after compiling HPF and keep the               
                    intermediate Fortran 77 output.                     
-Mg                 Set the debug option, as well as the -Mkeepftn      
                    option, and also set the pghpf compiler flag that   
                    makes debugging the Fortran 77 output easy by       
                    suppressing HPF line numbers in the generated       
                    Fortran 77 intermediate file.                       
-Minfo              Instructs the compiler to produce size, time, and   
                    other compilation information.                      
-Minform            Specify the minimum level of error severity that    
                    the compiler will display.                          
-Minline            Perform procedure inlining within INDEPENDENT DO    
                    loops.                                              
-Mkeepftn           Retain Fortran 77 intermediate files.               
-Mmpi               Link a version of the HPF runtime libraries and     
                    startup routines for the PGI MPI environment        
                    (valid only on certain platforms).                  
-Mmpl               Link a version of the HPF runtime libraries and     
                    startup routines for the PGI MPL environment        
                    (valid only on certain platforms).                  
-M[no]list          Specifies whether the compiler creates a listing    
                    file.                                               
-Mnofree[form]      Use fixed form formatting for file processing.      
-Mnohpfc            Skip the HPF compilation step and compile using     
                    the Fortran 77 compiler if a file with a .f or      
                    .F extension is supplied.                           
-Mnoindependent     Do not apply the INDEPENDENT directive to DO        
                    loops.                                              
-Moverlap           Set the size of the overlap area for BLOCK          
                    distributed arrays.                                 
-Mpreprocess        Run the preprocessor on the input source file.      
-Mprof              Select profiling. Insert calls to profile           
                    routines and link profiler libraries.               
-Mpvm               Generate code using runtime libraries and startup   
                    routines for the PVM environment.                   
-Mr8                Promote REAL variables and constants to DOUBLE      
                    PRECISION and COMPLEX to DOUBLE COMPLEX.            
-Mreplicate         The array replicator eliminates calls to            
                    pghpf_get_scalar() by replicating distributed       
                    arrays that satisfy certain conditions.             
-Mrpm               Link a version of the HPF runtime libraries and     
                    startup routines for the PGI RPM environment        
                    (valid only on certain platforms).                  
-Mrpm1              Link a version of the HPF runtime libraries and     
                    startup routines for the PGI RPM single-process     
                    environment for debugging (valid only on certain    
                    platforms).                                         
-M[no]sequence      All variables are created as SEQUENCE variables,    
                    where sequential storage is assumed. With           
                    -Mnosequence, all variables are created as          
                    nonsequential variables unless an explicit          
                    SEQUENCE directive is supplied or the variable is   
                    an assumed size array.                              
-Mstats             Link a version of the runtime libraries for         
                    printing runtime communications and message         
                    passing statistics.                                 
-Mstandard          Causes the compiler to flag source code that does   
                    not conform to the ANSI Fortran 90 standard.        
-Mupcase            Allow uppercase letters in identifiers.             



Note

The default pghpf compiler options depend on values set in the .pghpfrc driver configuration file. Depending on the PGI product you purchased, your defaults may differ.

11 Contacting PGI

The Portland Group, Inc. has the following mail address and telephone number. You can call PGI, or contact us by e-mail as described below.

The Portland Group, Inc.
9150 SW Pioneer Ct, Suite H        +1-503-682-2806 (voice)            
Wilsonville, OR  97070             +1-503-682-2637 (FAX)              

11.1 Obtaining Sales Information

To obtain further information on pghpf 2.1, or on other PGI products, please send e-mail to sales@pgroup.com or contact PGI at the address/number shown above.

The Portland Group, Inc. also maintains a WWW home page with information on PGI and its products; the URL is http://www.pgroup.com.

11.2 Reporting Bugs

To report bugs with the pghpf compiler or runtime, please send e-mail to trs@pgroup.com. If you are reporting a bug, it is best if you include a code sample that demonstrates the bug, a description of the system you are using, as well as the error message and the options used to compile. If it is a runtime error, or a problem with your program's results, also include the options used while running the program.

To obtain further assistance on pghpf 2.1, or on other PGI products, you can also use the address/number shown above.

11.3 Retrieving Software and Documentation

MPI
PGI currently supports version 1.10 of MPI. For information on obtaining the current version of MPI, contact the following:
http://www.mcs.anl.gov/mpi/index.html
PVM
PGI currently supports the latest version of PVM version 3.3. For information on obtaining PVM, contact the following:
http://www.netlib.org/pvm3

11.4 pghpf 2.1 Online Documentation

Online documentation is available for pghpf 2.1 using a WWW browser such as Mosaic. To access the online documentation, open the file pghpf.index.html. For example, using Mosaic the command to bring up the online documents would be:
% xmosaic $PGI/doc/hpf/html/pghpf.index.html

     

Appendix A
HPF_LOCAL_LIBRARY Procedures

This appendix lists the HPF_LOCAL_LIBRARY procedures. The table below briefly lists the procedures. Refer to Appendix A and B for details on the intrinsics defined in the Fortran 90 Language Specification and for HPF LIBRARY procedures.

For complete descriptions of the HPF_LOCAL_LIBRARY routines, and the current standards for HPF_LOCAL extrinsics, refer to Annex A, "Coding Local Routines in HPF and Fortran 90", in the High Performance Fortran Language Specification (Version 1.1, November 10, 1994, http://www.erc.msstate.edu/hpff/hpf-report/hpf-report/hpf-report.html or http://www.crpc.rice.edu/HPFF/home.html).

HPF_LOCAL_LIBRARY Procedures

Intrinsic                   Description                                
ABSTRACT_TO_PHYSICAL        Returns processor identification for       
                            physical processor associated with a       
                            specified abstract processor.              
GLOBAL_ALIGNMENT            Returns information about the global HPF   
                            array argument.                            
GLOBAL_DISTRIBUTION         Returns information about the global HPF   
                            array argument.                            
GLOBAL_LBOUND               Returns lower bounds of the actual HPF     
                            global array associated with a dummy       
                            array.                                     
GLOBAL_SHAPE                Returns the shape of the global HPF        
                            actual argument.                           
GLOBAL_SIZE                 Returns the global extent of the           
                            specified argument.                        
GLOBAL_TEMPLATE             Returns template information for the       
                            global HPF array argument.                 
GLOBAL_TO_LOCAL             Converts a set of global coordinates       
                            within a global HPF actual argument to     
                            an equivalent set of local coordinates.    
GLOBAL_UBOUND               Returns upper bounds of the actual HPF     
                            global array associated with a dummy       
                            array.                                     
LOCAL_BLKCNT                Returns the number of blocks of elements   
                            in each dimension on a given processor.    
LOCAL_LINDEX                Returns the lowest local index of all      
                            blocks of an array dummy argument.         
LOCAL_TO_GLOBAL             Converts a set of local coordinates        
                            within a local dummy array to an           
                            equivalent set of global coordinates.      
LOCAL_UINDEX                Returns the highest local index of all     
                            blocks of an array dummy argument.         
MY_PROCESSOR                Returns the identifying number of the      
                            calling physical processor.                
PHYSICAL_TO_ABSTRACT        Returns coordinates for an abstract        
                            processor, relative to a global actual     
                            argument array.