DHPC Adelaide

DHPC Technical Report DHPC-133

Active Data Repositories for Computational Grid Applications

Jimmy Lee

Archived: 11 February 2003

University of Adelaide Honours thesis, October 2002.

Supervisors: Andrew Wendelborn, Paul Coddington and Kevin Maciunas

Abstract

A grid computing environment supports the coordinated sharing of heterogeneous resources between individuals and/or institutions over a wide area network. A data grid is a grid environment that manages remote access to large amounts of data. Data grids must cope with the combination of large dataset sizes, geographic distance and the need for computationally intensive analysis.

Applications in a data grid are often required to access large datasets, but may need only a subset of a dataset, or may use it to compute a result of significantly smaller size. Transferring the entire dataset to the application in these cases wastes network bandwidth. A better approach is to process the data where it is located, reducing the amount of data that must be transmitted over the network. This allows an application to request data that does not yet exist, termed active or virtual data.
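
The difference between shipping a dataset and processing it in place can be illustrated with a minimal sketch. It is not taken from the thesis and uses no Globus, DARC or PAGIS interfaces: the DataSite class, its fetch_all and request_active methods, and the select_multiples_of_100 operation are all hypothetical names chosen for illustration, and the "network" is simulated by counting the records that leave the site.

```python
from typing import Callable, List


class DataSite:
    """Hypothetical repository holding a large dataset (not a Globus or DARC API)."""

    def __init__(self, records: List[int]) -> None:
        self.records = records
        self.sent = 0  # number of records shipped over the simulated network

    def fetch_all(self) -> List[int]:
        """Passive access: ship the whole dataset to the caller."""
        self.sent += len(self.records)
        return list(self.records)

    def request_active(
        self, operation: Callable[[List[int]], List[int]]
    ) -> List[int]:
        """Active access: run the operation at the site, ship only the result."""
        result = operation(self.records)
        self.sent += len(result)
        return result


def select_multiples_of_100(records: List[int]) -> List[int]:
    """Illustrative analysis step that keeps a small subset of the data."""
    return [r for r in records if r % 100 == 0]


# Passive: transfer everything, then compute the subset locally.
passive_site = DataSite(list(range(10_000)))
passive_result = select_multiples_of_100(passive_site.fetch_all())

# Active: ask the site for data that does not exist until it is requested.
active_site = DataSite(list(range(10_000)))
active_result = active_site.request_active(select_multiples_of_100)

print(passive_site.sent, active_site.sent)
```

Both calls yield the same result, but in this toy setting the active request moves only the derived subset across the simulated network rather than the full dataset.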

The Globus toolkit provides low-level data management tools and services that are widely used in grid computing. We have provided support for active data in two ways: through a pure Globus implementation, and by incorporating Globus tools and services into the Distributed Active Resource ArChitecture (DARC). Both implementations have been incorporated into the Process Networks Architecture for Geographical Information Systems (PAGIS) to demonstrate the applicability of this work to computational grid applications.

PDF version

Postscript version (gzip compressed)

