An Evolutionary Algorithm for Feature Selective Double Clustering of Text Documents

Authors: 

S. N. Nourashrafeddin
Evangelos Milios
Dirk V. Arnold

Author Addresses: 

Faculty of Computer Science
Dalhousie University
6050 University Ave.
PO Box 15000
Halifax, Nova Scotia, Canada
B3H 4R2

Abstract: 

We propose FSDC, an evolutionary algorithm for Feature Selective Double Clustering of text documents. We first cluster the terms existing in the document corpus. The term clusters are then fed into multiobjective genetic algorithms to prune non-informative terms and form sets of keyterms representing topics. Based on the topic keyterms found, representative documents for each topic are extracted. These documents are then used as seeds to cluster all documents in the dataset. FSDC is compared to some well-known co-clusterers on real text datasets. The experimental results show that our algorithm can outperform the competitors.
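
The last two steps of the abstract (extracting representative documents from topic keyterms, then using them as seeds to cluster every document) can be illustrated with a minimal sketch. This is not the FSDC implementation from the report: the term clustering and multiobjective genetic pruning stages are not shown, so the keyterm sets are simply given as input. The function names (`seed_documents`, `cluster_from_seeds`), the keyterm-count scoring, the cosine-similarity assignment, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def seed_documents(doc_vectors, topic_keyterms, seeds_per_topic=1):
    """Pick, for each topic, the documents containing the most keyterm occurrences.

    Assumption: keyterm occurrence count stands in for whatever
    representativeness criterion the report actually uses.
    """
    seeds = {}
    for topic, keyterms in topic_keyterms.items():
        scored = [(sum(vec[t] for t in keyterms), i)
                  for i, vec in enumerate(doc_vectors)]
        # Highest score first; ties broken by lowest document index.
        scored.sort(key=lambda s: (-s[0], s[1]))
        seeds[topic] = [i for score, i in scored[:seeds_per_topic] if score > 0]
    return seeds


def cluster_from_seeds(doc_vectors, seeds):
    """Assign every document to the topic whose seed centroid is most similar."""
    centroids = {}
    for topic, indices in seeds.items():
        centroid = Counter()
        for i in indices:
            centroid.update(doc_vectors[i])  # sum the seed documents' term counts
        centroids[topic] = centroid
    topics = sorted(centroids)  # fixed order so ties resolve deterministically
    return [max(topics, key=lambda t: cosine(vec, centroids[t]))
            for vec in doc_vectors]


# Toy corpus and hand-written keyterm sets (hypothetical stand-ins for the
# output of the term clustering and genetic pruning stages).
docs = [
    "genetic algorithm mutation crossover fitness",
    "genetic mutation population selection",
    "hockey goal ice team",
    "hockey team season playoff goal",
    "crossover fitness population",
    "ice season team",
]
topic_keyterms = {
    "evolution": {"genetic", "mutation", "fitness"},
    "sports": {"hockey", "goal", "team"},
}

doc_vectors = [Counter(d.lower().split()) for d in docs]
seeds = seed_documents(doc_vectors, topic_keyterms)
labels = cluster_from_seeds(doc_vectors, seeds)
print(seeds)
print(labels)
```

In this sketch a document that shares no terms with any seed falls back to the alphabetically first topic; a real system would need a better policy for such documents.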

Tech Report Number: 
CS-2013-01
Report Date: 
February 2, 2013
Attachment: 
CS-2013-01.pdf (1.46 MB)