DHPC Adelaide

DHPC Technical Report DHPC-109

J.A.Toon

Archived 8 August 2002.

Abstract

We live in a world where the ever-escalating volume of raw data demands increasingly sophisticated methods of processing it into valuable information. Many different approaches exist, often under differing and somewhat ill-defined terms, drawing on expertise from numerous fields. This paper attempts to summarise the current state of the field, its commercial applicability, and its great potential both for future CAST courses within the University of Wales, Bangor, and for academic research in this area.

An unfortunate by-product of the ``information age'' is that along with the increasing quantities of ``information'', we have ever-increasing levels of its raw material, data: ``in the next ten years we will generate as much codified data as has been generated in all years before''. Data is everywhere, from satellite telemetry to food chemical analysis, from product marketing statistics to sales trends, and innumerable other sources. Irrespective of its source, if we wish to attach some meaning to the data, we need to analyse it and determine structures within it that indicate some underlying feature.

Typically such data will be present in vast quantities. For example, it is perfectly normal for the log of a web server running a major site to contain millions of entries. Processing this much data manually is infeasible. Instead, we wish to analyse it automatically and extract the relevant information, and to do so quickly. This is what the field of data mining attempts to accomplish.

Ideally we want most, if not all, of our systems to have this behaviour. We want this process to be as automatic as possible; we want to have intelligent systems. The progression towards automated systems involves the application of many diverse fields. Almost every major system now maintains its data store via a suitable Database Management System (DBMS). The numerous data mining techniques attempt to help us organise, categorise, and understand the data; data mining can therefore be viewed as a subfield of database technology. More generally, some subsume this process under the title of ``Knowledge Discovery in Databases'' (KDD). Mathematically, data mining can also be categorised as a specialisation of pattern recognition. Furthermore, the techniques employed are often highly statistical, so it could equally be described as a statistical field.

Naturally, data mining can draw heavily from Artificial Intelligence (AI) techniques, utilising its central paradigm of search and notions of agents. The combination of these AI techniques with the various statistical methods is often categorised under the term ``machine learning''.
