The services that are presented for performance measurement perform
the same functions as those found in the ERIC
prototype [129] (see
appendix
). As such, a direct comparison may be made
between the performance of the services using the DISCWorld daemon and
ERIC. Both ERIC and the implemented services provide remote
access to a repository of GMS satellite [133] imagery. In
addition to providing browsing access, they allow processing
operations to be invoked on the stored images.
The main difference between the two approaches is that the ERIC
program involves processing on the server-side only. The program is
implemented as a Perl common gateway interface
(cgi-bin) [100] script. The script interfaces to
user-level command-line programs which communicate by the use of
shared file systems and pipes. No distribution of the command-line
programs is possible unless it is explicitly incorporated by the
author of the script. The user composes a query using a HTML forms
interface, and the result is returned as another HTML page. Although
ERIC implements basic caching, only resultant images, metadata and
MPEGs [134] are stored. No partial results are
stored, and final results cannot be used as inputs for further
processing. Other limitations of the ERIC prototype are discussed in
appendix
.
In contrast, the DISCWorld daemon allows processing to be performed at both the client-side and server-side. For example, the scheduling placement algorithm may decide that simple imagery operations can be more efficiently performed at the client-side than the server-side. Services can be written as pure Java or may be implemented as a wrapper around a native program (using either JNI [148] or system calls). Objects can be sent between nodes. Consequently, there is no need for common file-systems. The DISCWorld daemon stores all results of services, whether they are the final results of the user's processing request or the results of intermediate computations. Intermediate results can be used in subsequent processing requests since the DISCWorld model assigns canonical names to all objects..
For performance analysis, we focus on three example services:
findGMSImage, processGMSImage, and
createMPEG. FindGMSImage is very similar to one of the fundamental
operations found in ERIC: it invokes a shell-script to retrieve an
image from the spatial imagery archive. After finding the image, it is
loaded into the DISCWorld daemon using services built from the methods
in the Java Advanced Imaging (JAI) [217] API. The image
is then accessible via its DRAM. The input parameters to the service
are the date and time at which the image was created, and its spectral
channel. This service invokes a native program, and it is marked as
``not movable'' (as described in chapter
); it
cannot be transferred between nodes.
ProcessGMSImage uses services from the JAI API to crop and scale the GMS Image. It performs the same imagery operations as found in the ERIC. The inputs to the service are the GMS image, the area to be cropped, and the zoom scale of the resultant image. As this service is written completely in Java, it can be transferred between nodes.
CreateMPEG is a service that creates MPEG animations from sequences of images. This service invokes the mpeg_encode [87] program as a system call. Due to the way in which the program is implemented, it is necessary to write the images and an associated parameter file to a temporary disk area. The program also writes the newly-created MPEG to the disk, where it is ingested into the system using services from the JAI API. Like the findGMSImage service, the createMPEG service is deemed ``not movable''.
Three computers were used for testing the performance of the DISCWorld daemon: a dual-processor Sun Enterprise 250, (called lerwick) with 256Mb of physical and 300Mb of virtual memory that runs Solaris 2.6 and Java 1.2.1; a dual-processor Celeron, (called banff) with 256Mb of physical and 1Gb of virtual memory that runs Solaris 2.6 and Java 1.2.1; and a single-processor Pentium II, (called geronimo) with 64Mb of physical and 64Mb of virtual memory that runs Windows NT 4.0 and Java 1.2. Lerwick and banff are connected via a 100Mbps network; geronimo is connected via a 10Mbps network.

Table 8: Comparison of overheads when individual
services are executed by different environments. The execution time of
the services implemented as command-line programs are
measured so the overheads of the Java system can be measured.
The time taken for a pre-computed result to be retrieved from the
local daemon, and from a remote daemon via a slow 10Mbps link are also
shown. Measurements of the ERIC system,
from [127] are shown for comparison. All
times are based on at least 10 measurements. Variances are based on a
least-squares linear fit of the measured values. The error in the
timers used to collect timing information is that of Java's
currentTimeMillis method, which does not have an accuracy of exactly
one millisecond [157].
Table
shows the time that each
of the services takes to run on lerwick under a number of different
execution environments. The Unix command line measurements are the
execution time of the command-line programs. ProcessGMSImage cannot
be measured in this way as it is written purely in Java; it cannot be
directly executed without the creation of a JVM. All services within
DISCWorld are encapsulated by a service wrapper, which provides
JavaBean interfaces to the methods of the service. A service's
run() method is invoked to execute the service. The run-time penalty
due to Java can be calculated by invoking the service from a thin
client on a local machine. The run-time overhead on findGMSImage is
approximately 3000ms. We attribute this to: the necessity of creating an
instance of the service in the JVM; invoking the service's set
method for each parameter; and forking a new process to execute the
shell-script. This overhead is increased with the createMPEG service
because of the need to write each of the input images to disk before
executing the mpeg_encode program.
It is useful to compare the performance of each of our example
services when executed as stand-alone servers in an RMI environment. As
table
shows, the additional
overhead of each service when invoked via RMI is approximately
2000ms. The increase in execution time can be attributed to
service brokering by the RMI registry and the fact that each of the
services is executed in their own JVM. This overhead is not
substantially increased when the RMI client is non-local.
The times quoted for the DISCWorld daemon are the total elapsed time from when the user client submits a processing request to when the client receives the final DRAMD. This includes the time taken to: transmit the processing request to the daemon; parse and place the service; return the DRAMF to the client; for the client to inspect the DRAMF; for the service to be executed; and for the DRAMD to be returned to the client. As the majority of the time spent executing a user processing request is at the server-side, the additional time taken to submit a request from a remote client is relatively small. Clearly, the DISCWorld daemon introduces some overheads, which when amortised over large services, become relatively inconsequential.
As previously mentioned, all results and partial results in DISCWorld
are cached. This proves to be useful when a client requests an object
that has already been cached. As illustrated in
table
, requesting a cached
object from a DISCWorld daemon's store is extremely efficient. In
contrast to the remote clients using RMI and DISCWorld, which were
measured using a fast (100Mbps) network, the results of the remote
retrieval from the DISCWorld cache are across a slow (10Mbps)
network. The retrieval times are dominated by the network
characteristics. The time is comparable to the time taken to execute
the service using a thin Java client on the faster machine, or using
RMI on the faster machine. Use of the remote client to request
information from the local DISCWorld daemon illustrates the re-use of
DRAMDs and DRAMFs inside the daemon - in this example, the original
processing request that caused the data to be created was made from a
client on the node local to the daemon. This further emphasises the
usefulness of the DRAM mechanism, whereby the data may be further
processed before being returned to the client, which may be connected
via a very slow network.
In addition to the timing information presented in
table
, further measurements of
the ERIC system are presented
in [127]. While timing information for the
findGMSImage alone service is not available for ERIC, the times for
ERIC's processGMSImage service include that taken by findGMSImage. No
standard deviation is provided in [127]
for the time to create an MPEG at the resolution and zoom that we
consider in this analysis. The figure given for the time taken to
retrieve a cached MPEG is that taken to retrieve any object from
ERIC's store.
The performance of the DISCWorld daemon is very dependent on the background load of the processors. The performance of Java is very susceptible to these effects because not only do the Java application's threads need to be scheduled by the JVM, but the JVM needs to be scheduled by the host operating system.
Table
shows the performance of
the DISCWorld daemon in the case where a single server is chosen to be
used by the placement algorithm. The first two columns of this table
show the effects of adding a second service which is independent of
the first. The mean time to return DRAMFs, corresponding to the
outputs of the services, scales linearly with the number of
independent services. However, there is a high standard deviation in
the DRAMF creation time for the case in which there are two
independent services. The increase in DRAMF creation time may be
attributed to additional parsing necessary for the extra service, and
also to increased contention for the quartermaster's synchronised
store object, as described in
chapter
. Interestingly, while the mean time
shown to execute two services is less than that for a single service,
the standard deviation is larger. Within the bounds of experimental
measurement, these values appear to be the same. This is feasible
because the daemon is multi-threaded, and it able to execute many
services simultaneously.

Table 9: Performance of DISCWorld prototype using multiple processing
requests. In the case of two services, two independent findGMSImage
services are requested; in the case of four services, two independent
instances of a findGMSImage and processGMSImage service are
requested. The DISCWorld daemon processes all independent
services concurrently.
The last column of table
presents the results of a client submitting a processing request in
which there are two independent requests, comprised of a
processGMSImage service that uses the output of a findGMSImage
service. This may be compared with the second column of the table, in
which two independent findGMSImage services are requested. The time
taken to create and return DRAMFs is found to scale linearly with the
total number of services requested
. The addition of
the two processGMSImage services, dependent on the output of their
respective findGMSImage services, causes the total execution time of
the processing request to double. Thus for these services, at least in
this situation, the performance scales linearly.
The situation in which more than one client simultaneously makes a
processing request to the same DISCWorld daemon is shown in
table
. The time taken for the
daemon to create and return a DRAMF, and to return previously-created
DRAMDs from the cache, increase by approximately 750ms per additional
client. Each additional client increases the time taken to execute a
single findGMSImage service by approximately 7200ms. This again
emphasises that the daemon scales linearly, subject to the limitations
as described above.

Table 10: Performance of DISCWorld prototype using single service when
processing requests are made simultaneously. Within the bounds of
control, each client submits a processing request, and then inspects
the resulting DRAMFs at the same time.
Multiple servers will only be used by the scheduling placement algorithm when it will result in a lower processing cost, or if the required services are only available on different servers. Of course, when using multiple servers the effects of the interconnection network become significant, as illustrated above.
As described in section
,
whenever a daemon receives a DJPL script, the script
is parsed and tested to see if any extra placement information needs
to be generated. If two nodes are used for processing, the first
node (local to the client from which it was submitted) generates an
annotated DJPL script for the request. The annotated DJPL script is
then distributed to all participating nodes. When the script is parsed
and the services created, no DRAMFs are produced until all the DRAMFs
that will be used as inputs to that service are received. This adds
extra overhead that is not been taken into consideration when creating
the schedule. This is a second-order effect, and has not been
incorporated into the model.
Further to the example presented and discussed in
section
, we present a detailed
example of event timing within the execution of a client processing
request. We use this example to discuss the strengths and weaknesses
of the DISCWorld architecture model.
Consider a processing request in which a number of GMS images are to
be retrieved and processed on lerwick, and then transferred to
banff, where they are used to make an MPEG. This sequence of
events is shown pictorially in
figure
. An individual GMS image
may be identified by the time and date at which it was made, and the
spectral channel it represents [129]. A
processed GMS image has the additional information of the bounding box
to define the area of interest, and the zoom scale of the resulting
image. The figure shows the services involved in the creation of such
an MPEG; the images represent the data being manipulated at each step
(although at greatly reduced resolution). When the processing request
is parsed by the DISCWorld daemon, DRAMFs corresponding to the future
data product are returned to the user. When the DRAMFs are inspected,
the computation is started. Therefore, the arrows in
figure
represent both DRAMFs
before the computation is started, and DRAMDs after the partial
products have been made. For all the image cases tested, the
performance of the daemon scales approximately linearly. The case in
which there are four images to be used and the processing request is
submitted from lerwick, the following sequence of events, at the
following times are observed (we ignore the following steps discussed
in section
: resource discovery, user
logging onto the client, and composing the processing request):

Figure 47: Pictorial representation of a complex DJPL request. A
processed GMS image is defined by the tuple (time, date, spectral
channel, bounding box, zoom scale). A sequence of processed GMS images
is used as input by the createMPEG service. The query is constructed
using DRAMDs and DRAMFs, denoted conceptually by arrows. The images
show the data that is produced and consumed at each processing step.
On lerwick,
When the DJPL script and DRAMFs are received by banff,
The user client, running on lerwick, receives the DRAMF for the output of the createMPEG service, and inspects it as soon as it is received.
Banff receives the request for the DRAMF to be created
Lerwick receives the request for the data to which the DRAMFs correspond,
On banff, the DRAMDs corresponding to the output of the processGMSImage services are received,

Figure 48: Event
timing diagram of the complex processing request, shown in
figure
, using four images. By
far, the greatest amount of time is spent in the findGMSImage and
processGMSImage services. This is not unreasonable due to the penalty of
co-locating services on the same node.
The event sequence of this processing request is shown in
figure
. It can be seen that the majority
of the total execution time is spent when the DRAMFs that are produced
by the processGMSImage services are inspected. The request, instigated
on banff, causes the services on lerwick to be started, and the final
results returned. This example shows how DRAMs can be used to set up,
and execute complex processing requests that span multiple nodes in a
distributed system.
It can be seen in this example that the parsing of the annotated DJPL, as distributed by the DISCWorld daemon on lerwick, takes an abnormally large amount of time. While we are unable to pinpoint the exact cause of this anomaly, we suspect that its cause may be due to the way in which the annotations are implemented by our parsing routines.
This example shows the case in which a server has been specialised to
create MPEG animations. Another example of a server specialising in a
specific service is the dploader tool (described in
chapter
), which offers a parameterised simulation
service that uses a network of workstations that are unable
to run their own JVMs.
It can be seen from this example the benefits of caching partial results of services. For example, if a number of the same images are to be re-used, the total service execution time will be approximately 145000ms shorter. We have not addressed the issue of flushing cache contents; this is planned in future work. If a DRAMD with the same name is available from a different server at a lower cost, it may be retrieved from the alternate server, thus optimising the execution of a processing request.
As has been previously discussed, one of the limitations of the current implementation of the DISCWorld daemon is its poor memory management. If the cache becomes too full, the daemon will fail, losing its current state information.
The current DISCWorld daemon implementation relies heavily on information about available nodes and interconnection networks. In the current implementation, if both the node and interconnection network information are not available, a node is not used. The reason for this is that while a node may be available all of the time and may have a very short waiting time, its interconnections to the remainder of the network may introduce a processing bottleneck. Future versions are intended to perform an on-demand analysis of available nodes and interconnection networks, similar to the Network Weather Service [237] of the AppLeS project.
This model relies on maintaining a mean size for objects of given types, in order to make future estimates. Since the same tests are repeated during performance analysis, we are unable to take advantage of the repeated creation of objects of the same type. Consequently, the estimation effects are unable to be seen in these performance results.
A serious limitation of the current implementation's performance is due to Java's virtual machine (JVM). The amount of memory that a Java virtual machine can use is limited by the amount of physical and virtual memory on a node. This places an upper limit on the amount of data that can be stored in the DISCWorld daemon's store in the current implementation. It was observed that when the JVM runs out of memory, not only will it eventually crash, but the daemon fails in curious (but unpredictable) ways: some threads will be created but not others; partial parsing of new processing requests will be performed; and, some requests for information in the store will be successful, others will not. The presence of other users' processes running on the nodes further reduces the amount of memory that the daemon can use for caching.
When a service is moved to a new node, the execution time and variance are assumed to be the same as that of the node from where it has come. When the service is downloaded to a node, the past run-times are set to zero, so new averages and variances can be computed. Thus, the scheduling node makes the assumption of the same general performance characteristics; when the service is used the new values replace the assumed characteristics.
Due to the way in which the mpeg_encode program is implemented, each of the images to be used for input must be written to disk. This is primarily so that the images can be converted to a file format suitable for the program. The current implementation of the createMPEG service writes the files in the base format required by the program. Unfortunately, the need to write files to a disk makes the service reliant on the amount of space available on the disk. If, during the course of service execution, the disk fills, the service fails. While this does not cause the daemon to crash, the user client is not notified, and no attempt is made to rectify the failure.
A simple timer thread is used to approximate the load on a node. The operation of the timer thread is very simple: it resets and increments a counter for a given time period. Depending on what percentage of the current maximum is returned by the counter will give an estimate of the load on the node. In the current implementation, services are only executed once the estimated load is below a certain threshold. Of course, the execution of services is not the only contributing factor to the observed load: the parsing of new DJPL scripts; any blocking requests; and saving the state of the store all contribute significantly from within the JVM. This figure will be additionally affected by the external load on the system. In future versions of the daemon, a more sophisticated method of estimating load must be developed and incorporated.
The daemon's thread count, shown in
figure
, shows at least 8 threads
that are executing while the daemon is quiescent. This figure can
more than double when processing requests are being parsed,
distributed and executed. While implementations that use native
threads can handle the large number of threads created during
processing, we have found that implementations that used green
threads had serious synchronisation problems. In fact, although the
daemon was developed on Java 1.2 on Windows NT 4, the release of
Java 1.2.1 for Solaris was the first in which our implementation did
not freeze.
As the DISCWorld daemon (and client program) are written in pure Java, we must rely on the mechanisms provided within the Java language specification [88] and the Java Runtime Environment [218] for robustness and fault tolerance. Although the prototype emphasises the usefulness of adaptive scheduling and DRAMs as an enabling mechanism, for the remainder of this section we discuss how reliability, robustness and fault tolerance might be incorporated into a production version of DISCWorld written in Java.
One of the fundamental issues that must be addressed in a production system is that of availability. On whatever platform the daemon runs, there must be a guarantee that whenever a host receives a message destined for a daemon, a daemon is running to accept it. On a UNIX machine the DISCWorld daemon can be treated similarly to the operating system daemons. An entry can be placed into the file /etc/inetd.conf so that if a message is received on the port dedicated to the DISCWorld daemon, a new daemon will be created if none is currently running. Similarly on Windows NT a file in /WINNT/system32/drivers/etc can be modified to the same effect.
As implemented, the DISCWorld prototype has two major failings: the
store fills very quickly, which causes the daemon to crash; and, if
the daemon fails while parsing or executing a processing request, the
request is forgotten when the daemon is restarted. Problems with the
store and avenues for future research are discussed in
section
. An obvious solution to the
problem of fault tolerance is to store all daemon state information in
a Java-accessible object-oriented database such as
Oracle [182], Informix [121] and
JDBC [218]. Thus by using transaction technology to ensure
the database remains in a consistent state, when the daemon is
restarted the state is able to be restored.
Writing all state information to a database could result in an unacceptable level of system overhead. Thus one of the management policies enforced by the daemon might be to categorise all state information and only store the highest levels. This scheme, similar to an incremental file backup system, would ensure that when the daemon receives important messages such as processing requests or DRAM dereference instructions, they will be stored. If the daemon is restarted after the message has been saved the action can be re-performed without too much cost to the daemon. Network failures are harder to detect because of the lack of acknowledgement messages in the daemon-daemon and client-daemon protocols. We are investigating the use of generous time-outs after which messages are resent.
Inside the daemon Java's strong typing and byte-code verification system provide the necessary framework within which a type-safe environment can be built. Other researcher in the Distributed and High Performance Computing group are working on adding user- and message-verification functions into the daemon's portal module. Authentication systems such as provided by Kerberos [173] are being considered for this task.
In this chapter, we have analysed the performance of the prototype DISCWorld daemon. The experimental results have shown that the scheduling algorithm is practically useful and DRAMs are a good enabling mechanism. However, the performance of the daemon is limited by the fact that several crucial parts of the daemon have not been implemented. The limitations of the current implementation are discussed, which suggest a large body of possible future work in this area.
We have compared the performance of the DISCWorld daemon with Java RMI and cgi-bin versions of the same programs. Results have shown that while the DISCWorld daemon is slower than the RMI version in the construction of the same data products, the time taken for the RMI version is constant, whereas the DISCWorld daemon benefits from caching all intermediate results. The DISCWorld model of computation also allows client-side computation, which is unavailable in cgi-bin scripts.
DRAMFs allow demand-driven execution of processing requests. Execution of processing requests can be optimised if the same data is available from alternate nodes. The prototype DISCWorld daemon has been shown to have performance which scales linearly within the limitations of the current implementation. A detailed example has shown how DRAMs can be used to construct and execute complex processing requests that span multiple servers. An event timing diagram has shown the relative time costs of the different processing stages of request execution.
It is important to remember that although the DISCWorld daemon is a vital component in the implementation of scheduling in DISCWorld, the creation of a perfect DISCWorld daemon is not the aim of this thesis. The main topics are: the creation and distribution of good service placement decisions in the presence of incomplete system state information; and, the implementation of a framework that allows high-level information about data and services to be traded between participating DISCWorld daemons and clients, to allow on-demand client- and server-side processing.
We have achieved these two goals through the use of: a platform independent Distributed Job Placement language, with which processing requests can be described; an execution-cost minimisation function, which incorporates the concept of a daemon's willingness to perform a given function; and the DISCWorld Remote Access Mechanism, which allows services and data, whether the data has been physically created or not, to be accessed and traded between clients and servers.