Computer Science and Software Engineering

An Overview of Data Mining Algorithms

Professor Tadao Takaoka

Dept. of CSSE, University of Canterbury

Fri Sep 09 15:10:00 NZST 2005 in Room 031, MSCS

Abstract

Data mining is the extraction of useful information from a vast amount of data, typically a large database. Here, useful information means interesting information that can be found only by going through a large database with a computer, since no human could scan that much data unaided.

The database may hold a supermarket's sales data, image data such as X-ray images in medical records, and so on. Interesting information may be customers' purchasing behavior in the sales database, or an abnormality in the medical images. Because such databases are measured in gigabytes of disk space, algorithms that process the data must not only be fast but must also access the disk as little as possible, since disk access is expensive in time.

An association rule A ==> B means that, when its confidence is high, a customer who buys A is likely to buy B. In this talk I explain some representative algorithms for data mining. Specifically, I explain how the legendary association rule "nappy ==> beer" can be found. I also explain the relationship between association rules and the maximum subarray problem.
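
To make the notion of confidence concrete, here is a minimal sketch in Python. It assumes the common definitions, which the abstract does not spell out: the support of an itemset is the fraction of transactions containing it, and the confidence of A ==> B is the number of transactions containing both A and B divided by the number containing A. The function names and the five toy baskets are hypothetical, invented purely for illustration; they are not taken from the talk.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], items: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    wanted = frozenset(items)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


def confidence(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """Estimate of P(consequent | antecedent) over the transactions."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    n_a = sum(1 for t in transactions if a <= t)
    if n_a == 0:
        return 0.0  # the rule is never triggered, so report zero confidence
    n_both = sum(1 for t in transactions if both <= t)
    return n_both / n_a


# Hypothetical toy data: each frozenset is one customer's basket.
baskets = [
    frozenset({"nappy", "beer", "milk"}),
    frozenset({"nappy", "beer"}),
    frozenset({"nappy", "bread"}),
    frozenset({"beer", "crisps"}),
    frozenset({"milk", "bread"}),
]

print(confidence(baskets, {"nappy"}, {"beer"}))  # 2 of the 3 nappy baskets
print(support(baskets, {"nappy", "beer"}))       # 2 of the 5 baskets
```

This sketch rescans the whole list for every query, which is fine for an in-memory toy but is exactly the repeated access pattern that a disk-resident database makes expensive.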

