Experimental investigation of linear mixing in real world datasets

Authors: 

Jie Ouyang
Thomas Trappenberg
Andrew Back

Author Addresses: 

Faculty of Computer Science
Dalhousie University
6050 University Ave.
PO Box 15000
Halifax, Nova Scotia, Canada
B3H 4R2

Abstract: 

It is widely acknowledged in the data mining community that feature values, which become the input variables for modeling the system, are often statistically dependent. In this paper we attempt to quantify these dependencies by assuming a linear mixing model and using independent component analysis (ICA) to estimate the mixing matrix. The major difficulty in quantifying the mixing strength is that ICA algorithms estimate the mixing matrix only up to row permutations and scalar factors. We propose several measures of the mixing strength that are either appropriate estimates or lower bounds of the true linear mixing strength. These measures are tested on generic data and on 30 datasets from standard machine learning repositories. The experimental results indicate not only that statistical mixtures between input variables exist in real world problems, but also that most of them are strong.
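
The permutation and scaling ambiguity mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the report's method or any of its proposed measures: the sources, the mixing matrix `A`, the permutation `P`, and the scaling `D` are all hypothetical values chosen for illustration. The sketch also assumes the column convention x = A s, in which the ambiguity appears as column permutations of `A`; under the transposed convention it appears as row permutations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 independent non-Gaussian sources, 1000 samples each.
n_sources, n_samples = 3, 1000
S = rng.laplace(size=(n_sources, n_samples))

# Hypothetical mixing matrix (assumed convention: x = A s).
A = np.array([[1.0, 0.5, 0.2],
              [0.3, 1.0, 0.4],
              [0.1, 0.6, 1.0]])
X = A @ S  # observed, linearly mixed variables

# Reorder and rescale the sources with a permutation P and a nonzero
# diagonal scaling D, and compensate in the mixing matrix.
P = np.eye(n_sources)[[2, 0, 1]]
D = np.diag([2.0, -0.5, 3.0])

S_alt = P @ D @ S                    # still mutually independent sources
A_alt = A @ np.linalg.inv(D) @ P.T   # inverse of a permutation is its transpose
X_alt = A_alt @ S_alt

# Both models explain the same observations with different mixing matrices,
# so the observations alone cannot distinguish A from A_alt.
same_observations = np.allclose(X, X_alt)
matrices_differ = not np.allclose(A, A_alt)
print(same_observations, matrices_differ)
```

Because both factorizations reproduce the same observations, any measure of mixing strength computed from an estimated mixing matrix has to be insensitive to such reorderings and rescalings, or otherwise account for them.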

Tech Report Number: 
CS-2004-05
Report Date: 
May 31, 2004
Attachment: CS-2004-05.pdf (352.87 KB)