Performance Modelling of Parallel Programs using a Performance Evaluating Virtual Parallel Machine


Motivation

Developing and optimising message-passing parallel programs using MPI can be very difficult. A program's speedup, and its scalability to large numbers of CPUs, can vary significantly between parallel computing architectures, particularly when their communications networks differ. Optimising the program requires re-testing its performance and scalability every time it is modified, which is time-consuming and cannot easily be done across a variety of architectures or for large numbers of CPUs.

To address these problems, there has been a lot of work on techniques for modelling the performance of parallel programs. However, to obtain accurate results, solutions to date have typically required very detailed, complex models that take significant effort to develop and are very slow to run. Simpler and faster models are usually less accurate and, in particular, are not sensitive to the effects of network contention, which can be significant for large numbers of processors or for applications that require large amounts of communication.

Performance Evaluating Virtual Parallel Machine (PEVPM)

We have developed a new performance modelling system, called the Performance Evaluating Virtual Parallel Machine (PEVPM). Unlike previous techniques, the PEVPM system is relatively easy to use, inexpensive to apply and potentially very accurate. It uses a novel bottom-up approach, in which submodels of individual computation and communication events are dynamically constructed from data dependencies, current contention levels and the performance distributions of low-level operations. These distributions describe how the performance of each operation varies in the face of contention. During model evaluation, the performance distribution attached to each submodel is sampled using Monte Carlo techniques, thus simulating the effects of contention. This allows the PEVPM to accurately simulate a program's execution structure, even if it is non-deterministic, and thus to predict its performance. Obtaining these performance distributions required the development of a new MPI communications benchmarking tool, called MPIBench.
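The sampling idea described above can be illustrated with a small Python sketch. This is not the PEVPM implementation: the benchmark table, the function names and the toy program structure (every process computes, then all processes send one message at the same time) are assumptions made purely for illustration. It shows only how message times might be drawn from contention-dependent distributions and averaged over many Monte Carlo trials.

```python
import random
import statistics

# Hypothetical benchmark data (made up for this sketch): observed one-way
# message times in microseconds, keyed by the number of messages that were
# simultaneously in flight (the contention level) when they were measured.
MESSAGE_TIMES_US = {
    1: [50, 51, 52, 50, 53],
    2: [55, 58, 61, 57, 70],
    4: [65, 72, 90, 80, 120],
}


def sample_message_time(in_flight, rng):
    """Draw one message time from the empirical distribution of the highest
    measured contention level that does not exceed `in_flight`."""
    level = max(k for k in MESSAGE_TIMES_US if k <= max(in_flight, 1))
    return rng.choice(MESSAGE_TIMES_US[level])


def simulate_once(num_procs, steps, compute_us, rng):
    """One Monte Carlo trial of a toy program (an assumption of this sketch).

    In each step every process computes for `compute_us` and then sends one
    message. All sends overlap, so the contention level equals `num_procs`,
    and the step ends when the slowest message has arrived.
    """
    elapsed = 0.0
    for _ in range(steps):
        slowest = max(sample_message_time(num_procs, rng)
                      for _ in range(num_procs))
        elapsed += compute_us + slowest
    return elapsed


def predict_runtime(num_procs, steps, compute_us, trials=1000, seed=0):
    """Return (mean, standard deviation) of the simulated run time in
    microseconds over `trials` independent Monte Carlo trials."""
    rng = random.Random(seed)
    runs = [simulate_once(num_procs, steps, compute_us, rng)
            for _ in range(trials)]
    return statistics.mean(runs), statistics.stdev(runs)


if __name__ == "__main__":
    for procs in (1, 2, 4):
        mean_us, sd_us = predict_runtime(procs, steps=10, compute_us=100)
        print(f"{procs} processes: {mean_us:.1f} us (sd {sd_us:.1f})")
```

In this sketch the contention level is fixed by the number of processes. A fuller model would instead track which messages are actually in flight at each simulated moment and respect the data dependencies between events.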

Currently, performance modelling using PEVPM requires some manual steps to develop a PEVPM performance model and then execute it using the Virtual Parallel Machine. Further work on this project aims to automate the procedure and to test and improve the accuracy of the system on a wider variety of parallel programs. In particular, we aim to develop a system that can use performance measurements from systems with small numbers of CPUs to accurately predict performance and scalability on large numbers of CPUs. This is one of the goals of the Jabberwocky project.


Paul Coddington, paulc@cs.adelaide.edu.au
Distributed and High-Performance Computing Group, School of Computer Science, University of Adelaide.