DHPC Adelaide

DHPC Technical Report DHPC-038

Service Scheduling on Wide-Area Metacomputer Clusters

K.A. Hawick, P.D. Coddington and H.A. James

Archived: 16 March 1998

Abstract

Providing a robust and portable software environment that links together clusters of workstations and other heterogeneous computers is a significant problem. It is particularly difficult when the clusters to be managed span administrative boundaries across wide-area networks. We review some of the technologies that have recently emerged for managing arbitrary computer programs across clusters of computers, and draw on our experience with such systems to illustrate the difficulties of managing systems across wide areas.

A simplifying approach is to limit the services provided across wide-area clusters to well-defined processing and data access modules that are specified a priori and advertised between servers. Client programs can then invoke queries on databases and set up processing tasks based on combinations of these well-defined services. Developers can build new modules or services that conform to a well-specified application programming interface, and new services can be tested within administrative boundaries before being made available across wide-area clusters. This is the approach we take with our DISCWorld metacomputing environment.
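The idea of advertised, composable services can be illustrated with a minimal sketch. This is not the DISCWorld API: the class `ServiceRegistry` and its methods `advertise` and `compose` are hypothetical names chosen for illustration, and a single in-process dictionary stands in for the advertisement of services between servers.

```python
from typing import Any, Callable, Dict, List


class ServiceRegistry:
    """Hypothetical registry of well-defined services (illustration only)."""

    def __init__(self) -> None:
        self._services: Dict[str, Callable[[Any], Any]] = {}

    def advertise(self, name: str, func: Callable[[Any], Any]) -> None:
        # A server makes a service known under an agreed name.
        self._services[name] = func

    def compose(self, names: List[str]) -> Callable[[Any], Any]:
        # A client may only combine services that have been advertised.
        missing = [n for n in names if n not in self._services]
        if missing:
            raise KeyError(f"services not advertised: {missing}")
        stages = [self._services[n] for n in names]

        def pipeline(data: Any) -> Any:
            # Feed the output of each service into the next one.
            for stage in stages:
                data = stage(data)
            return data

        return pipeline


registry = ServiceRegistry()
# Assumed example services: a stand-in data query and a processing step.
registry.advertise("query", lambda region: [3, 1, 2])
registry.advertise("sort", sorted)

task = registry.compose(["query", "sort"])
result = task("some-region")
```

Because clients can only name services that already exist in the registry, a new service can be exercised locally first and advertised more widely once it is trusted.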

We focus on the scheduling aspects of managing multiple job streams across wide-area clusters to optimise either user response time or cluster utilisation. We describe how a server-less, non-hierarchical architecture maintains scalability as additional cluster nodes are added. This high-level service-based approach gives a coarser granularity of distributed computation than other systems, and so provides a way to amortise the latency that accrues over wide areas. Services can be provided as portable code modules that may run on a variety of service providers, such as Java modules running on distributed Java Virtual Machines, or as optimised native code that runs on specific high-performance resources in the clusters. This provides a way of encapsulating parallel supercomputers in a wide-area cluster environment.
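The trade-off between the two scheduling goals, and the way coarse-grained tasks amortise wide-area latency, can be sketched with a toy cost model. This is an assumption-laden illustration and not the DISCWorld scheduler: the `Node` fields, the cost formula (latency plus queued and new work divided by speed) and both selection functions are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    speed: float    # work units processed per second (assumed known)
    latency: float  # network delay in seconds to reach the node
    queued: float   # work units already waiting on the node


def estimated_response(node: Node, work: float) -> float:
    # Toy model: fixed latency plus time to drain the queue and run the job.
    return node.latency + (node.queued + work) / node.speed


def pick_for_response_time(nodes: List[Node], work: float) -> Node:
    # Favour the user: choose the node that should finish this job soonest.
    return min(nodes, key=lambda n: estimated_response(n, work))


def pick_for_utilisation(nodes: List[Node], work: float) -> Node:
    # Favour the cluster: choose the node with the least outstanding busy
    # time, regardless of the size of the incoming job.
    return min(nodes, key=lambda n: n.queued / n.speed)


nodes = [
    Node("local", speed=1.0, latency=0.01, queued=2.0),
    Node("remote", speed=100.0, latency=0.5, queued=300.0),
]

coarse = pick_for_response_time(nodes, work=100.0)  # large service request
fine = pick_for_response_time(nodes, work=0.1)      # tiny request
balanced = pick_for_utilisation(nodes, work=100.0)
```

Under these assumed numbers the large request is best sent to the fast remote node, because its 0.5 s latency is small compared with the computation, while the tiny request stays local because the latency would dominate. The utilisation policy picks the local node in both cases, since it has less outstanding work.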


