Experimental investigation of linear mixing in real world datasets
It is widely acknowledged in the data mining community that feature values, which serve as the input variables for modeling a system, are often statistically dependent. In this paper we attempt to quantify these dependencies by assuming a linear mixing model and using independent component analysis (ICA) to estimate the mixing matrix. The main difficulty in quantifying the mixing strength is that ICA algorithms estimate the mixing matrix only up to row permutations and scalar factors. We propose several measures of the mixing strength that are either appropriate estimates or lower bounds of the true linear mixing strength. These measures are tested on generic data and on 30 datasets from standard machine learning repositories. The experimental results indicate not only that statistical mixtures between input variables exist in real-world problems, but also that most of them are strong.
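
The permutation and scaling ambiguity mentioned above can be illustrated with a minimal sketch. It is not one of the measures proposed in the paper; it only shows why two different mixing matrices can explain the same observations. The matrix `A`, the sources `S`, the scale factors, and the helper `matmul` are all hypothetical values chosen for illustration. The sketch assumes the convention x = A s with one source per row of `S`, so the ambiguity appears in the columns of `A`; under the transposed convention it appears in the rows.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


random.seed(0)

# Hypothetical data: 2 independent sources, 5 samples each (one source per row).
S = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(2)]

# Assumed "true" mixing matrix under the convention x = A s.
A = [[1.0, 0.5],
     [0.2, 1.0]]
X = matmul(A, S)

# Alternative sources: swap the two sources and rescale them by 2 and -4.
S_alt = [[2.0 * v for v in S[1]],
         [-4.0 * v for v in S[0]]]

# Compensating mixing matrix: swap the columns of A and divide by the same factors.
A_alt = [[A[0][1] / 2.0, A[0][0] / -4.0],
         [A[1][1] / 2.0, A[1][0] / -4.0]]
X_alt = matmul(A_alt, S_alt)

# Largest absolute difference between the two sets of observations.
max_diff = max(abs(p - q)
               for row, row_alt in zip(X, X_alt)
               for p, q in zip(row, row_alt))
print("max |X - X_alt| =", max_diff)
```

Because `A` and `A_alt` produce the same observations up to floating-point rounding, the raw entries of an estimated mixing matrix cannot be read directly as mixing strengths. Any usable measure has to be insensitive to this reordering and rescaling.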