Archived: 30 January 2004
Updated: 30 January 2006
Published in International Journal of Supercomputing, vol. 34, 201-217 (2005).
© Copyright Springer.
An earlier version was published in Proc. of Workshop on Performance Modeling, Evaluation, and Optimization of Parallel and Distributed Systems (PMEO-PDS'04), Santa Fe, April 2004.
This paper gives an overview of two related tools that we have developed to provide more accurate measurement and modelling of the performance of message-passing programs and communications on distributed memory parallel computers. MPIBench uses a very precise, globally synchronised clock to measure the performance of MPI communication routines, and can generate probability distributions of communication times, not just the average values produced by other MPI benchmarks. This allows useful insights into the MPI communications performance of parallel computers, particularly the effects of network contention. PEVPM provides a simple, fast and accurate technique for performance modelling and prediction of message-passing parallel programs. It uses a virtual parallel machine to simulate the execution of the parallel program. The effects of network contention can be accurately modelled by sampling from the probability distributions generated by MPIBench. These tools are particularly useful on Beowulf clusters with commodity Ethernet networks, where relatively high latencies, network congestion and TCP problems can significantly affect communication performance, and can be difficult to model accurately using other tools.
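To illustrate why sampling from a distribution of communication times can matter more than using an average, the following is a minimal Monte Carlo sketch in Python. It is not the PEVPM algorithm or the MPIBench output format: the function names, the compute-then-exchange program structure, and the sample timings are all hypothetical, chosen only to show the general idea under the assumption that each step finishes when the slowest process finishes.

```python
import random

def sample_time(samples, rng):
    """Draw one communication time from an empirical distribution,
    represented here simply as a list of measured values."""
    return rng.choice(samples)

def predict_runtime(samples, n_procs, n_steps, compute_time, rng):
    """Simulate n_steps of a compute-then-exchange loop.
    Assumption: every step ends when the slowest process finishes."""
    total = 0.0
    for _ in range(n_steps):
        total += max(compute_time + sample_time(samples, rng)
                     for _ in range(n_procs))
    return total

def predict_runtime_from_mean(samples, n_steps, compute_time):
    """Same loop, but using only the average communication time."""
    mean = sum(samples) / len(samples)
    return n_steps * (compute_time + mean)

# Hypothetical message times in microseconds: mostly fast, with a few
# contention delays and one very long outlier (e.g. a TCP timeout).
samples = [100.0] * 95 + [500.0] * 4 + [20000.0]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
sampled = predict_runtime(samples, n_procs=16, n_steps=1000,
                          compute_time=1000.0, rng=rng)
averaged = predict_runtime_from_mean(samples, n_steps=1000,
                                     compute_time=1000.0)
print(f"sampled prediction:  {sampled:.0f} us")
print(f"averaged prediction: {averaged:.0f} us")
```

In this toy setting the sampled prediction is larger than the averaged one, because with many processes per step a rare slow message is likely to delay at least one of them, and that effect is lost when only the mean is used.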
Keywords: parallel computing, cluster computing, performance modelling.
The original publication is available at www.springerlink.com.