An Evaluation of Machine Learning Techniques for Enterprise Spam Filters

Authors: 

Andrew Tuttle
Evangelos Milios
Nauzer Kalyaniwalla

Author Addresses: 

Faculty of Computer Science
Dalhousie University
6050 University Ave.
PO Box 15000
Halifax, Nova Scotia, Canada
B3H 4R2

Abstract: 

Like a distributed denial-of-service attack, the barrage of spam email is overwhelming enterprise network resources. We propose and evaluate an architecture for a practical enterprise spam filter that provides personalized filtering on the server side using machine learning algorithms. We also introduce a novel experimental methodology that overcomes the "privacy barrier", making it possible to evaluate spam classifiers on a variety of individual, complete streams of real email. Our tests yield convincing evidence that these algorithms can be used to build practical enterprise spam filters. We show that the proposed architecture will likely be efficient and scalable. We show that the filters can be, on average, highly effective even at very low false positive rates. Finally, we show that the algorithms offer a well-behaved tuning mechanism that can be used to manage the overall enterprise risk of legitimate mail loss.
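The abstract refers to operating "at very low false positive rates" and to a tuning mechanism for managing the risk of losing legitimate mail. The sketch below shows one generic way such a knob can work. It assumes that a classifier assigns each message a real-valued spam score and that a message is filed as spam when its score exceeds a threshold. The function names (`threshold_for_fp_rate`, `evaluate`) and the sample scores are hypothetical illustrations and are not taken from the report. The report's own tuning mechanism may differ.

```python
from typing import Sequence, Tuple


def threshold_for_fp_rate(ham_scores: Sequence[float], max_fp_rate: float) -> float:
    """Pick a threshold so that at most max_fp_rate of legitimate (ham)
    messages score strictly above it, i.e. would be misfiled as spam.

    Hypothetical helper: assumes higher scores mean "more likely spam"."""
    ranked = sorted(ham_scores, reverse=True)
    # Number of ham messages we are willing to misfile (rounded down, so the
    # constraint is met conservatively).
    allowed = int(max_fp_rate * len(ranked))
    if allowed >= len(ranked):
        return float("-inf")  # every message may be flagged
    # Only the `allowed` highest-scoring ham messages can exceed this value.
    return ranked[allowed]


def evaluate(ham_scores: Sequence[float], spam_scores: Sequence[float],
             threshold: float) -> Tuple[float, float]:
    """Return (false positive rate, spam recall) at the given threshold."""
    if not ham_scores or not spam_scores:
        raise ValueError("need at least one ham and one spam score")
    fp_rate = sum(s > threshold for s in ham_scores) / len(ham_scores)
    recall = sum(s > threshold for s in spam_scores) / len(spam_scores)
    return fp_rate, recall


# Hypothetical held-out scores, for illustration only.
ham = [0.05, 0.10, 0.20, 0.30, 0.40, 0.55, 0.60, 0.70, 0.80, 0.95]
spam = [0.50, 0.65, 0.75, 0.85, 0.90, 0.92, 0.97, 0.99]

t = threshold_for_fp_rate(ham, 0.1)
print(t, evaluate(ham, spam, t))  # -> 0.8 (0.1, 0.625)
```

Lowering the permitted false positive rate raises the threshold, which reduces how much spam is caught. An administrator can choose the point on that trade-off that matches the organization's tolerance for lost legitimate mail.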

Tech Report Number: 
CS-2004-03
Report Date: 
March 12, 2004
Attachment: CS-2004-03.pdf (5 MB)