Comparison of Unsupervised Learning Techniques for Encrypted Traffic Identification

Authors: 

Carlos Bacquet
Kubra Gumus
Dogukan Tizer
A. Nur Zincir-Heywood
Malcolm I. Heywood

Author Addresses: 

Faculty of Computer Science, Dalhousie University, 6050 University Avenue, Halifax NS, Canada bacquet@cs.dal.ca

Computer Engineering Department, Bilkent University, 06800 Bilkent, Ankara, Turkey kgumus@ug.bilkent.edu.tr

Computer Engineering Department, Ege University, 35040 Bornova-IZMIR, Turkey 05050007004@ogrenci.ege.edu.tr

Faculty of Computer Science, Dalhousie University, 6050 University Avenue, Halifax NS, Canada zincir@cs.dal.ca

Faculty of Computer Science, Dalhousie University, 6050 University Avenue, Halifax NS, Canada mheywood@cs.dal.ca

Abstract: 

The growing use of encrypted traffic, combined with non-standard port associations, makes traffic identification increasingly difficult. This work benchmarks the performance of five unsupervised clustering algorithms (Basic K-Means, Semi-supervised K-Means, DBSCAN, EM, and MOGA) for encrypted traffic identification, specifically SSH. Results show that MOGA, a multi-objective clustering approach using a Genetic Algorithm, not only performs better than the others but also provides a good trade-off in terms of detection rate, false positive rate, and time to build and run the model. This is a very desirable property for a potential implementation of an encrypted traffic identification system.
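
As an illustration of the two accuracy metrics named in the abstract, the sketch below scores a clustering of flows against known SSH / non-SSH labels. It is a minimal example, not the report's evaluation code: the function name `score_clustering`, the input layout, and the majority-vote rule for labelling a cluster as SSH are all assumptions made here for illustration.

```python
from collections import Counter, defaultdict


def score_clustering(cluster_ids, is_ssh):
    """Return (detection_rate, false_positive_rate) for SSH identification.

    Assumption (not taken from the report): a cluster is labelled SSH
    when strictly more than half of its flows are SSH.
    """
    # Count SSH and non-SSH flows inside every cluster.
    counts = defaultdict(Counter)
    for cid, label in zip(cluster_ids, is_ssh):
        counts[cid][bool(label)] += 1

    # Majority vote decides which clusters are treated as SSH.
    ssh_clusters = {cid for cid, c in counts.items() if c[True] > c[False]}

    tp = fp = fn = tn = 0
    for cid, label in zip(cluster_ids, is_ssh):
        predicted = cid in ssh_clusters
        if label and predicted:
            tp += 1
        elif label:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1

    # Detection rate = TP / (TP + FN); false positive rate = FP / (FP + TN).
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_positive_rate


if __name__ == "__main__":
    # Hypothetical toy data: seven flows assigned to two clusters.
    cluster_ids = [0, 0, 0, 1, 1, 1, 1]
    is_ssh = [True, True, False, False, False, False, True]
    dr, fpr = score_clustering(cluster_ids, is_ssh)
    print(f"detection rate = {dr:.3f}, false positive rate = {fpr:.3f}")
```

A high detection rate with a low false positive rate is the trade-off the abstract refers to; the time to build and run the model would be measured separately.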

Tech Report Number: 
CS-2009-09
Report Date: 
December 18, 2009
Attachment:
CS-2009-09.pdf (953.7 KB)