Abstract for HONS 04/05 - Computer Science and Software Engineering - University of Canterbury - New Zealand
HONS 04/05

Evaluating ensemble classifiers for spam filtering

James M Carpinter
Department of Computer Science
University of Canterbury

Abstract

In this study, the ensemble classifier presented by Caruana, Niculescu-Mizil, Crew & Ksikes (2004) is investigated. Their ensemble approach generates thousands of models using a variety of machine learning algorithms and uses a forward stepwise selection to build robust ensembles that can be optimised to an arbitrary metric. On average, the resulting ensemble out-performs the best individual machine learning models. The classifier is implemented in the WEKA machine learning environment, which allows the results presented by the original paper to be validated and the classifier to be extended to multi-class problem domains. The behaviour of different ensemble building strategies is also investigated. The classifier is then applied to the spam filtering domain, where it is tested on three different corpora in an attempt to provide a realistic evaluation of the system. It records similar performance levels to that seen in other problem domains and out-performs individual models and the naive Bayesian filtering technique regularly used by commercial spam filtering solutions. Caruana et al.’s (2004) classifier will typically outperform the best known models in a variety of problems.

  • Phone: +64 3 369 2777
    Fax: +64 3 364 2569
    CSSEadministration@canterbury.ac.nz
  • Computer Science and Software Engineering
    University of Canterbury
    Private Bag 4800, Christchurch
    New Zealand
  • Follow us
    FacebookYoutubetwitterLinked In