Technical Report DHPC-029 Compression of Image Data and Performance Tradeoffs for Client/Server Systems

Technical Report DHPC-029


Compression of Image Data and Performance Tradeoffs for Client/Server Systems

J.F.Hercus and K.A.Hawick
Email: {james,khawick}@cs.adelaide.edu.au
Department of Computer Science, University of Adelaide, SA 5005, Australia

23 January 1997





Introduction

 

We discuss issues relating We to the compression system in the distributed image repository of the DISCWorld system. We The distributed image repository is intended for the distributed storage and retrieval of large scientific imagery datasets. Given the large and regular nature of this type of data there is considerable scope for the application of data compression techniques in the system to reduce storage requirements and data transmission times.

We develop a framework for modeling the performance of the compression system within the DISCWorld, with the intention of applying this to and in the design of the system. The model is applied to two example situations to illustrate its application in the design and optimisation of the repository.

Compression in a Distributed Image Repository

This section discusses the requirements of compression algorithms in the DISCWorld system[2] and describes the criteria which can be used to evaluate the suitability of algorithms in a distributed system.

The reasons for using data compression in the DISCWorld are twofold. By storing an image, or any other data, in a smaller amount of space, we can reduce the total amount of storage required in the system and also the transmission times for individual data items. Transmission includes transfer to and from storage devices as well as across networks. The corollary of reduced storage requirements is the capacity to store more data, which applies to the various levels of cache in a distributed system as well as to the primary store. These benefits come at the cost of the increased processing required to convert the data between the compressed format(s) in which it is stored in the system and the uncompressed format in which it is used, and of impaired flexibility of access to the data. This last cost can effectively be expressed as the additional processing time required to overcome the rigidities imposed by compressed formats. These benefits and costs, in broad terms, define the trade-off space which must be considered in the design of a compression system and associated algorithms for the DISCWorld.

The two reasons stated above for using compression constitute the primary quantitative criteria for assessing the compression system. To these must be added the time required to add data to the system, which is essentially the compression time. Of themselves none of these criteria are particularly involved or difficult to quantify. However, they are made more complex because of the other important requirement of the system, flexibility of access, which is perhaps the most important requirement of all. The need to provide for a wide variety of access patterns, and the trade-offs between this requirement and the others mentioned above complicates the measurement of the criteria, particularly access times.

Hierarchical Storage
This refers to the ability to retrieve images at less than full resolution. This might be used to provide a preview of an image to a user prior to transfer of the whole image, or for use in a procedure which doesn't require, or cannot handle, a full scale image. If, in the former example, the user decides against retrieval of the complete image then the total transmission time has been significantly reduced. On the other hand, if the full image is requested then we would like the compression system to utilise the low resolution image already transmitted in order to reduce the balance of the data which must be transmitted. More precisely, the system should determine whether such a course of action is beneficial or whether, as may often be the case, the additional processing is not justified by the reduction in transmission time.

Data Tiling and Parallel (de)compression
Parallel processing of images often requires that the image data be distributed among the processing elements in a manner which is not consistent with the raster scan order in which images are usually stored. There are a wide variety of tiling patterns, all of which disrupt, to some degree, the locality among the pixels in the image. Therefore, to store the image in the tiled format will most likely lead to reduced compression efficiency or longer compression and decompression times. Conversely, storing the data in normal raster scan order may increase the cost of loading the image into a parallel process. This means that a trade-off is required between these two issues.

Another interesting issue raised by parallel processing is parallel compression algorithms. Decompression times could be drastically reduced if the decompression were shared among numerous processors, perhaps even in such a way that each processor decompressed that data which it subsequently required in a parallel computation. This would add computation overheads and also inter-process communication overhead, reducing the theoretical efficiency. However, the actual decompression time could be considerably shorter and if the additional processors used in the parallel decompression would have been idle otherwise, which is often the case, then the reduced efficiency is irrelevant.

Sub-image Access
Accesses to the repository will not always be for complete images or lower resolution versions of whole images. Users will frequently be interested in smaller geographical regions of an image and most of these will be for one of a small number of regions (eg. the region encompassing Australia). The difficulty with this for an image compressed in raster scan order is that it is hard or impossible to extract some data from a compressed file without first extracting all of the data which precedes the required section. One method of overcoming this problem is to divide the image into blocks in a regular manner and then satisfy the requests by extracting only those blocks which overlap the requested area. This significantly reduces the redundant computation which is required. Alternatively, instead of dividing the image in a regular manner, it could be divided in a manner corresponding to the commonly requested sub-regions.

Satellite Image Compression

One particular data set which the DISCWorld will be used to store is geostationary satellite imagery, in particular an archive of data from the Japanese GMS5 satellite[3]. Most such data can be thought of as a four dimensional data set where two dimensions are spatial, one is spectral and the last is temporal. In the case of the GMS5 data, the temporal frequency of the images is about one image set per hour, each image set consists of four images from different spectrums and the individual images are 2291x2291 pixels (about 5.4M uncompressed).

All of the general compression requirement above apply to this data set. Below we discuss some issues which are specific to this type of data set.

Since the satellite data set is four dimensional there are opportunities to improve the compression algorithm beyond what is possible for normal 2D images. Broadly, there are two ways in which this can be done, one per extra dimension. The first of these is to exploit the similarities between the images in different spectrums from the same time. This is not possible for all spectrums; some of them have virtually no correlation with other images in the set. When there is considerable correlation it should be possible to use this to obtain compression ratios higher than could be obtained for the images compressed in isolation. This approach also has disadvantages. If two or more images are compressed as a single unit then it may be more expensive to extract a single image than would otherwise be the case.

The second way in which the compression could be improved for this data set is to try and exploit similarities between images from different times; either images either side of the one in question or perhaps from 24 hours earlier. It might be possible to find useful correlations between these images. This approach is complicated by the nutation of the satellite in its orbit which leads to the shifting of the earth disc within the image. Corresponding pixels in two images from different times do not necessarily correspond to the same region of the earths surface. Using the satellite location information in the metadata associated with the images it is possible to partially correct for this effect by calculating offsets between any two images. Furthermore, at least for this particular data set, there are less correlations between temporally separated images than between spectrally separated ones.

Another strategy to improve the compression system which was mentioned above is to isolate common elements of the elements of a data set and to store these separately to the remainder of the data. In the case of the satellite data set this common data might be statistical information which is common to the whole data set (or large parts of it, one spectrum for instance) or some type of spatial template which had features common to the whole data set.

This strategy is particularly useful in the distributed context. The common data described above could be duplicated at many (or every) site in the system, thereby reducing the volume of data which would need to be transmitted across the network. As described below, it would also mean that during an access to the repository some processing could be performed at the client end prior to data arriving from the server, thereby improving total access times.

Modeling the Compression System

When designing a compression system it will be useful to be able to obtain quantitative measurements of the criteria discussed above. However, as can be seen from the previous section, they are sufficiently complicated by the other requirements of the system that it would be very difficult to obtain a single number measurement of how well the system satisfied the criteria.

An alternative to this approach is to develop a model of the system which can incorporate most of the complexities found in it. This model can then be used to obtain measurements (or estimates) of the performance of the system in any given situation. It can also be used in the design of the system, for performance optimisation and estimation and for selection and design of compression algorithms.

To begin with, consider the following sequence of tasks which are to be performed when a user initiates a request for some data object which might be, for instance, a low resolution image for preview purposes or a full resolution image.

  1. After searching for the requested object in a local cache, the client identifies the location of the server which stores the data (possibly by referring to a database) and issues a request to the server asking for the requested object.
  2. The server receives the request, determines what data is necessary and locates the data.
  3. The necessary files are loaded from the local data repository.
  4. If necessary, some processing is performed on the server to generate the data object requested by the client. This processing might reflect any of the image access requirements stated above.
  5. The data object is transmitted to the client.
  6. The client may perform some processing to generate the data item requested in step 1. This will most likely involve decompression of the data.
  7. The data is given to the user.

In the discussion which follows: T = Time, D = Data Size, L = Latency and B = Bandwidth. For subscripts: numbers 1 to 6 refer to the steps above, T = total and D = decompression.

Given the above series of steps, the simplest cost model is:


equation23

This is very simplistic and is not satisfactory. We can do better by considering latency and bandwidth separately. This will reveal the relationship between the steps in the process and permit investigation of possible pipelining of operations. For the steps which involve processing, we use a linear cost model, and characterise them by fixed and proportional cost (C and M) using the equation


equation27

To do this it is necessary to add some detail to the above sequence of steps.

  1. tex2html_wrap_inline194 is the latency of sending a message from the client to the server. This is a small message and so its length can be approximated by 0.
  2. This step has a fixed cost of tex2html_wrap_inline196; no significant data volume is involved. It might involve querying a database in order to locate the required data.
  3. This step involves loading data from the local data repository at the server. Since this might be a hierarchical storage system, the latency in some instances could be very large. Also, since access may be required to multiple files, there could be many different latency values. To simplify this we will just consider a single value, tex2html_wrap_inline198. This step is also characterised by the bandwidth achieved by the store, tex2html_wrap_inline200, which also could vary according to where in the storage system the data resides. The bandwidth will also vary with system load.
  4. This step has a fixed cost of tex2html_wrap_inline202, proportional cost of tex2html_wrap_inline204, and its data input and output are tex2html_wrap_inline206 and tex2html_wrap_inline208 respectively.
  5. This step will have parameters tex2html_wrap_inline210, tex2html_wrap_inline212 and tex2html_wrap_inline214. tex2html_wrap_inline214 is the same as tex2html_wrap_inline208. tex2html_wrap_inline210 will not necessarily be the same as tex2html_wrap_inline194 as different protocols may be used for both steps. For example, the request in step 1 may be in a high level language such as Java and the large data transfer in step 5 may use a UNIX socket to achieve greater speed.
  6. In the simple case this step has parameters tex2html_wrap_inline224, tex2html_wrap_inline226 and data output tex2html_wrap_inline226. It could be more complex. As discussed above the compression system may duplicate some common parts of the data set at all nodes in the system. In this case the requested data item would be derived from the combination of the local duplicated data and the data retrieved from the remote site. This would mean that some processing would be performed at the client end concurrently with the server responding to the request. Consider the data inputs to this step to be tex2html_wrap_inline230 from the local store on the client and tex2html_wrap_inline232 from the remote store. For simplicity, assume that the processing of data tex2html_wrap_inline230 is completed before data begins to arrive from the server. The remaining processing has parameters tex2html_wrap_inline236 and tex2html_wrap_inline232. The advantage of this split processing, if there is any, will derive mainly from a reduction of the latency of processing the incoming data from the remote store; ie. We require that tex2html_wrap_inline236 be significantly smaller than tex2html_wrap_inline224 would have been. There is also some gain if less data requires transmission in step 5. However, for this gain to be significant large amounts of data would have to be stored at the client end, which is inappropriate.

We now apply this model to a simple example in which a user makes a request for a complete image from the repository. The image is not cached anywhere and must be retrieved from a remote server in the manner described above. Assume that the required image resides in the data store of the server in the required format (but possibly compressed) such that no processing is required at either the client or the server, with the possible exception of decompression at the client if the image is compressed. Therefore, step 4 is empty and step 6 involves only decompression. Steps 3 and 5 involve simple data transfers and therefore their time performance would usually be characterised in terms of latency and bandwidth. For this example we will represent this as a simple linear equation as described above. For step 3 this equation is:


 equation36

where tex2html_wrap_inline206 is the data volume, tex2html_wrap_inline246 is the latency of the transfer and


equation39

where tex2html_wrap_inline200 is the bandwidth of the link. The model for step 5 is the same.

The time performance of most decompression algorithms can be characterised by the formula


equation41

where tex2html_wrap_inline250 is the volume of data, tex2html_wrap_inline252 is the startup cost of the decompression program and tex2html_wrap_inline254 measures the speed of the decompresser (it is the time required to decompress a single byte). Most decompression algorithms have a very low fixed startup cost, provided that they are implemented in a pipeline fashion without large scale buffering of output. (The corresponding compression algorithms, by contrast, frequently have high startup costs due to analysis of the image prior to commencing encoding). We will assume the startup cost for the decompression in step 6 to be 0, ie tex2html_wrap_inline256.

Finally, for this example we will assume that the three stages discussed above are pipelined; the fixed cost occurs at the beginning of the stage, after which the subsequent stage may commence. Given this, we wish to find a formula for the time performance of the entire operation which looks like


equation43

To do this we must first make the above individual equations equivalent by expressing all of the data transfer rates in terms of the same amount of data. This choice is arbitrary but from the users perspective the most useful figure, and that used below, would be the data transfer rate of uncompressed data. Ie tex2html_wrap_inline258. Rewrite equation 3 above as


equation46

where


equation50
and


equation54

and tex2html_wrap_inline260 is the compression ratio.

Now we can write


equation59
and


equation62

So,


equation66

To apply this equation to the design of the compression algorithm we must try to minimise the max term. Of the first two components of this term, the second is likely to be the larger as tex2html_wrap_inline262 involves a network and tex2html_wrap_inline264, a disk. Therefore given a monotonic relationship between tex2html_wrap_inline266 and tex2html_wrap_inline260, the optimal point occurs when


 equation71

or


 equation75

This is a reasonably intuitive result which shows how the model can be used to analyse the tradeoff space for the time performance of the compression algorithms. It does not yet address the other main criteria, storage space.

Compression Algorithms

At present the satellite image data set is stored in our archive in the format in which it is distributed, HDF (Hierarchical Data Format) files compressed with the UNIX Compress utility. To convey a brief impression of the tradeoff space involved in the design of the compression system, this algorithm is compared with two lossless image coders: lossless JPEG[1] and CALIC[4]. Lossless JPEG is the standard for lossless image compression specified by the JPEG committee. It is far from being the state of the art, but is a reasonably simple algorithm which provides reasonable compression at reasonable cost. CALIC is a more expensive algorithm which is close to the state of the art in lossless image compression (as measured by compression ratio). It is one of the submissions to JPEG for the next generation of lossless image compression in this standard.

Table 1 gives the relevant figures for these algorithms. These figures are for compressing an image from the IR1 spectrum of the GMS5 dataset. The lossless JPEG algorithm has 7 modes, the figures below are the one which gives the best compression ratio. These figures were derived from running the algorithms on a Ultra 2170 with 2*167MHz UltraSPARC processors and 256M ram.

  table83
Table 1: Comparison of lossless compression algorithms

Given this information, we can apply the model developed above to some example operations. Consider making a request over a wide area network. Data transfer rates between Adelaide and Canberra across AARNET at an inter-peak time are around 350 KB/s. This gives a value of tex2html_wrap_inline278. Given this value, the optimal total transfer time would be provided by an algorithm with a slightly higher decompression time and lower compression ratio than the lossless JPEG algorithm. At the other en of the network bandwidth spectrum, consider a modem connection which is about 100 times slower than the above example. In this situation the best algorithm would be slower than CALIC and provide better compression.

These examples illustrate another desirable feature of the compression algorithms which are employed in the repository. It would be useful to be able to parameterise the algorithm such that compression ratios and processing times could be optimised for particular applications. However, this issue is more complex than the selection or optimisation of an algorithm based on the model above. The parameters of the compressed image, and therefore decompression times, are set when the image is compressed. Instead of algorithm modifications, the model could also be used to design parallel decompression algorithms and to optimise them for a particular network.

Discussion

Other issues that arise in a real system include the relative economic cost of the system components. In the above analysis the tradeoff space has been explored as if the only consideration were the time to complete. In some time critical operations, particularly those in military applications with effectively unlimited budgets, this is the situation. In other application domains, budgetary cost is important. This can be complicated by the need to utilise legacy resources that effectively cost the organisation nothing compared to newer components. The tradeoff between storage capacity cost and response cost, processing response time and network delivery time is then no longer entirely technical.

Acknowledgments

  This work is being carried out as part of the Distributed High Performance Computing Infrastructure (DHPC-I) project of the Research Data Networks Cooperative Research Center (RDN CRC) and is managed under the On-Line Data Archives Program of the Advanced Computational Systems CRC. RDN and ACSys are established under the Australian Government's CRC Program. The authors thank Don Bone of CSIRO for helpful discussions on this work.

References

1
C. Wayne Brown and Barry J Shepherd. Graphics File Formats - reference and guide. Manning Publications Co., 1995.

2
K. A. Hawick et al. DISCWorld: An Integrated Data Environment for Distributed High-Performance Computing. Technical report, Department of Computer Science, University of Adelaide, Adelaide, South Australia, 5005, 1998.

3
Meteorological Satellite Center, 3-235 Nakakiyoto, Kiyose, Tokyo 204, Japan. The GMS User's Guide, 2nd edition, 1989.

4
Xiaolin Wu. An Algorithmic Study on Lossless Image Compression. In James A Storer and Martin Cohn, editors, Proceedings of the 1996 IEEE Data Compression Conference, pages 150-159, Los Alamitos, California, March 1996. IEEE Computer Society Press.

About this document ...

Technical Report DHPC-029

Compression of Image Data and Performance Tradeoffs for Client/Server Systems

James Hercus
Sat Jan 24 18:02:42 CST 1998