NOTE: This scholarship offer is now closed.
The Distributed and High-Performance Computing Group in the School of Computer Science at the University of Adelaide has a PhD scholarship available for research into performance modelling and prediction of parallel computer networks and parallel programs. The PhD research work is part of the Jabberwocky project, in collaboration with the Department of Computer Science at the Australian National University, and Alexander Technology, a company that designs and builds cluster computers (high-performance parallel computers that use commodity components).
The successful applicant will receive an APA_IT PhD scholarship, which has a stipend of $24,650 pa (tax free). The project also offers a top-up of up to $5000 pa for strongly qualified candidates, as well as conference travel and visits, internships, and part-time work opportunities with Alexander Technology.
An APA_IT scholarship requires an Honours I or IIA degree in a computing-related discipline (or equivalent qualification). Applicants must be Australian permanent residents or New Zealand citizens to be eligible for APA_IT scholarships. However, if no suitably qualified Australian resident can be found, the scholarship may be offered to non-residents, so we encourage applications from international students.
Applicants should have an interest, and preferably some experience, in the areas of networks, network traffic modelling and/or parallel computing.
NOTE: APPLICANTS MUST BE ABLE TO START BY 31 DECEMBER 2006.
If you are interested in this scholarship, you should send a copy of your CV, academic transcript, and the names and contact details of three referees to Dr Paul Coddington. Some information about previous research projects (e.g. thesis, reports, papers) will also be helpful.
Developing and optimising message-passing parallel programs using MPI can be very difficult. Speedup and scalability of the program to large numbers of CPUs can vary significantly on different parallel computing architectures, particularly with different communications networks. Optimising the program requires testing performance and scalability every time the program is modified, which is a time-consuming task, and cannot easily be done across a variety of architectures and for large numbers of CPUs.
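As a small illustration of the quantities mentioned above, the following sketch computes speedup (single-CPU runtime divided by the runtime on N CPUs) and parallel efficiency (speedup divided by N) from a table of runtimes. The function name and the timings are hypothetical and made up for illustration; they are not measurements from this project.

```python
def speedup_and_efficiency(runtimes):
    """Return {cpus: (speedup, efficiency)} relative to the 1-CPU runtime.

    runtimes maps a CPU count to a wall-clock time and must contain key 1.
    """
    t1 = runtimes[1]
    results = {}
    for cpus, t in sorted(runtimes.items()):
        speedup = t1 / t
        results[cpus] = (speedup, speedup / cpus)
    return results


# Hypothetical wall-clock times in seconds (illustrative only, not measured).
example_runtimes = {1: 100.0, 2: 52.0, 4: 28.0, 8: 17.0, 16: 12.5}

for cpus, (s, e) in speedup_and_efficiency(example_runtimes).items():
    print(f"{cpus:3d} CPUs: speedup {s:5.2f}, efficiency {e:4.2f}")
```

Efficiency that falls as CPUs are added is the typical sign that communication costs are starting to dominate, which is why the same program may scale quite differently on a different network.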
To address these problems, there has been a lot of work on techniques for modelling the performance of parallel programs. However, to get accurate results, solutions to date have typically required very detailed, complex models that take significant effort to develop and are very slow to run. Simpler and faster models do not usually give very accurate results and, in particular, are not sensitive to the effects of network contention, which can be quite significant for large numbers of processors and/or when the application requires large amounts of communication.
We have developed a new performance modelling system, called the Performance Evaluating Virtual Parallel Machine (PEVPM). Unlike previous techniques, the PEVPM system is relatively easy to use, inexpensive to apply and potentially very accurate. To accurately simulate the performance of the communications network of the parallel computer, including the effects of contention, PEVPM uses a new MPI communications benchmarking tool that we have developed, called MPIBench.
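To give a feel for why a model built on a single average communication time can be inaccurate, here is a toy Monte Carlo sketch. It is not PEVPM and does not use MPIBench data: the bulk-synchronous loop, the log-normal distribution and all parameter values are assumptions chosen purely for illustration. Each iteration lasts as long as the slowest process, so the predicted runtime grows with the number of processes even though the average communication time stays the same.

```python
import math
import random


def simulate_runtime(num_procs, iterations, compute_time, sample_comm, rng):
    """Total time of a bulk-synchronous loop.

    Every iteration costs the compute time plus the slowest of num_procs
    sampled communication times, because all processes wait for the exchange.
    """
    total = 0.0
    for _ in range(iterations):
        slowest_comm = max(sample_comm(rng) for _ in range(num_procs))
        total += compute_time + slowest_comm
    return total


# Assumed communication-time distribution: log-normal, in seconds.
# In practice the distribution would come from benchmark measurements.
MU, SIGMA = -7.0, 0.5


def sample_comm(rng):
    return rng.lognormvariate(MU, SIGMA)


ITERATIONS = 200
COMPUTE_TIME = 0.005  # assumed per-iteration compute time in seconds

# A model that only knows the mean predicts the same time for any process count.
mean_comm = math.exp(MU + SIGMA ** 2 / 2)
mean_based = ITERATIONS * (COMPUTE_TIME + mean_comm)

rng = random.Random(1)
for procs in (2, 16, 128):
    sampled = simulate_runtime(procs, ITERATIONS, COMPUTE_TIME, sample_comm, rng)
    print(f"{procs:4d} processes: mean-based {mean_based:.3f} s, sampled {sampled:.3f} s")
```

The samples here are drawn independently, so the sketch only shows the effect of variability. Contention would additionally change the shape of the distribution itself as the load on the network grows.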
Currently, performance modelling using PEVPM requires some manual steps to develop a PEVPM performance model and then execute it using the Virtual Parallel Machine. Further work on this project aims to automate the procedure and to test and improve the accuracy of the system on a wider variety of parallel programs.
We are particularly aiming at developing a system that can use performance measurements on systems with small numbers of CPUs to accurately predict performance and scalability on large numbers of CPUs. This is the goal of the project that the PhD student will be working on.
There are two related components of the project that will be studied by researchers at the DHPC group at the University of Adelaide, and our collaborators at the Australian National University:
Efficient Application Performance Modelling and Prediction for Cluster Computers
This part of the project aims to develop tools and methodologies to accurately
predict the performance of MPI applications running on medium to large
scale clusters. The predictions will be based on an analysis of the
application's performance on a small-scale cluster. The project's
approach is to enhance and extend the MPIBench / PEVPM methodology,
making it easier to use by automating some of the steps that are
currently done manually, and developing tools to efficiently derive
performance models of an application executing on the cluster.
Improved Models of Cluster Networks
This part of the project aims to develop more accurate and sophisticated analytical
models of the communication performance of cluster networks. These
models will be based on fitting distributions of communication times
measured by MPIBench to standard probability distribution functions
used in network performance modelling. Provided the function is
carefully selected, and its parameters are accurately estimated, subtle
effects such as switch-level contention and the degree of CPU involvement
in communication can be modelled. Of key importance is modelling these
as a function of the cluster's size - this will enhance the MPIBench /
PEVPM methodology to make accurate predictions on large-scale clusters,
and also on clusters using next-generation networks.
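
The fitting step described above can be sketched as follows. The text does not say which standard distribution will be used, so the log-normal here is an assumption, as are the function names; the "measured" times are synthetic samples generated in the code, not MPIBench output. The fit takes the mean and standard deviation of the logarithms of the samples, which are the maximum-likelihood estimates for a log-normal, and then compares a tail quantile of the fitted model against the data.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def fit_lognormal(samples):
    """Estimate (mu, sigma) of a log-normal from positive samples."""
    logs = [math.log(x) for x in samples]
    return fmean(logs), pstdev(logs)


def lognormal_quantile(mu, sigma, p):
    """Value below which a fraction p of a log-normal(mu, sigma) falls."""
    return math.exp(NormalDist(mu, sigma).inv_cdf(p))


# Synthetic stand-in for measured message times in seconds (not real data).
rng = random.Random(42)
times = [rng.lognormvariate(-9.0, 0.3) for _ in range(5000)]

mu, sigma = fit_lognormal(times)
model_p99 = lognormal_quantile(mu, sigma, 0.99)
empirical_p99 = sorted(times)[int(0.99 * len(times))]

print(f"fitted mu={mu:.3f}, sigma={sigma:.3f}")
print(f"99th percentile: model {model_p99:.3e} s, data {empirical_p99:.3e} s")
```

Checking the tail rather than only the mean matters because slow outliers are where contention shows up. Real communication times also have a minimum latency below which no message completes, so a plain log-normal may need a shift or a different family to fit well.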