Dalhousie University: ICA Voices Demo

	Matt Boardman, Faculty of Computer Science

Attached is a demonstration of how Independent Components Analysis (ICA) can separate similar voices spoken simultaneously. Here, the voice is my own, speaking some well-known quotes from U.S. President George W. Bush (*). Each recording is about 3 seconds in length.

Click the button near the recordings to hear an .mp3 of the waveform in a popup window.

Original Sources: Several sources are independently recorded, using a single microphone at 22 kHz. The voice is my own, speaking softly and evenly to keep the signals as similar as possible: this should make the ICA more difficult. Source signals were filtered to reduce a buzzing artefact present in the original recordings.

Random Mixtures: The source recordings are randomly mixed together to create 12 jumbled mixtures. Each of the 12 mixtures contains a different proportion of all nine sources.
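The mixing step can be sketched in a few lines of Python. This is only an illustration, not the ica.m code: the sinusoidal toy sources, the uniform random weights, and the names `make_sources` and `random_mixtures` are all assumptions made for the example.

```python
import math
import random

def make_sources(n_sources, n_samples):
    """Toy stand-ins for the recordings: sinusoids at different frequencies."""
    return [
        [math.sin(2 * math.pi * (k + 1) * t / n_samples) for t in range(n_samples)]
        for k in range(n_sources)
    ]

def random_mixtures(sources, n_mixtures, rng):
    """Return (A, X): a random mixing matrix A and the mixtures X = A * S."""
    n_sources = len(sources)
    n_samples = len(sources[0])
    # Each row of A holds the proportion of every source in one mixture.
    # Uniform weights in [0, 1) are an assumption of this sketch.
    A = [[rng.random() for _ in range(n_sources)] for _ in range(n_mixtures)]
    X = [
        [sum(A[i][j] * sources[j][t] for j in range(n_sources))
         for t in range(n_samples)]
        for i in range(n_mixtures)
    ]
    return A, X

rng = random.Random(0)  # fixed seed so the sketch is repeatable
S = make_sources(n_sources=9, n_samples=200)
A, X = random_mixtures(S, n_mixtures=12, rng=rng)
print(len(X), "mixtures of", len(S), "sources")
```

Having more mixtures (12) than sources (9) makes the mixing matrix tall, which is why a later step can reduce the dimension back to nine.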

Whitened Mixtures: A Principal Components Analysis (PCA) reduces the number of mixtures from 12 to 9. Whitening, or sphering, forces the signals to be uncorrelated with unit variance (see here for more information).
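In symbols, PCA whitening is commonly written as follows. The notation is an assumption introduced here, not taken from ica.m: x is the vector of 12 mixture values at one sample, x-bar its mean, V_k and Lambda_k hold the k = 9 leading eigenvectors and eigenvalues of the covariance matrix C, and z is the whitened 9-dimensional signal.

```latex
\begin{aligned}
\mathbf{C} &= \mathrm{E}\{(\mathbf{x}-\bar{\mathbf{x}})(\mathbf{x}-\bar{\mathbf{x}})^{\mathsf T}\}
            = \mathbf{V}\,\boldsymbol{\Lambda}\,\mathbf{V}^{\mathsf T},\\
\mathbf{z} &= \boldsymbol{\Lambda}_k^{-1/2}\,\mathbf{V}_k^{\mathsf T}\,(\mathbf{x}-\bar{\mathbf{x}}),
\qquad
\mathrm{E}\{\mathbf{z}\mathbf{z}^{\mathsf T}\} = \mathbf{I}_k .
\end{aligned}
```

With nine sources and noise-free mixing, only nine eigenvalues of C are non-zero, so keeping k = 9 components discards nothing but redundancy.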

Separated Sources: Independent Components Analysis (ICA) recreates nine maximally independent signals, which closely match the originals. ICA can return the signals in any order, so a correlation measure was run to determine which signal was which. ICA can also return a signal with exactly opposite phase. This does not greatly affect the sound of the signal, but it does affect the correlation measure, so the absolute maximum correlation is used in the measure. For example, the sixth signal below has exactly opposite phase from the original, which is apparent by visual comparison to the corresponding eighth original signal. The image shows the signals obtained using the tanh non-linearity, but .mp3 files for all four non-linearities are available to the right of the image for comparison.
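The matching step can be illustrated with a small Python sketch. It assumes a normalized (Pearson) correlation, which is not necessarily the measure used for this demo, and the function names are hypothetical. Taking the absolute value makes the match insensitive to a sign flip.

```python
import math

def correlation(a, b):
    """Pearson correlation coefficient between two equal-length signals."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def match_to_originals(separated, originals):
    """For each separated signal, return (index of best original, |correlation|)."""
    matches = []
    for sig in separated:
        # abs() so that an opposite-phase copy still scores as a strong match
        scores = [abs(correlation(sig, orig)) for orig in originals]
        best = max(range(len(originals)), key=lambda j: scores[j])
        matches.append((best, scores[best]))
    return matches

# Toy example: two originals, returned in swapped order, one with flipped sign.
originals = [
    [math.sin(0.1 * t) for t in range(300)],
    [math.sin(0.37 * t + 1.0) for t in range(300)],
]
separated = [
    [-2.0 * v for v in originals[1]],  # opposite phase, different scale
    [0.5 * v for v in originals[0]],
]
matches = match_to_originals(separated, originals)
print(matches)
```

This greedy version picks the best original for each separated signal independently, so two separated signals could in principle claim the same original. A one-to-one assignment would need an extra step.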

Non-linearities: skew, pow3, gauss, tanh

Comparison of Non-Linearities used in ICA: Audibly, some whispering residuals of the other voices remain in the separated sources, but overall ICA performs very well. Cross-correlation, shown in the following table, measures the statistical similarity with the original source signals. Higher numbers indicate a better match.

Signal	skew	pow3	gauss	tanh
"The future will be better tomorrow."	23.5	25.7	24.7	25.8
"Public speaking is very easy."	19.3	20.0	19.5	20.2
"If we don't succeed, we run the risk of failure."	22.1	22.0	22.6	23.5
"Welcome to Mrs. Bush and my fellow astronauts."	25.3	26.9	25.8	26.9
"I am not part of the problem, I am a Republican."	27.6	28.3	27.4	28.8
"For NASA, space is still a high priority."	20.8	21.7	20.9	21.7
"Verbosity leads to unclear, inarticulate things."	17.7	18.6	18.0	18.6
"I stand by all the misstatements that I have made."	22.3	21.9	21.9	23.1
"It's time for the human race to enter the solar system."	30.3	30.9	29.8	31.2
Mean	23.2	24.0	23.4	24.4
Standard Deviation	4.0	4.1	3.8	4.1
Time (s)	0.84	4.74	2.06	3.75

We can better compare the performance of the non-linearities by computing the ratio of the cross-correlation achieved for each non-linearity to the mean cross-correlation across all four non-linearities. As seen in the following graph, the tanh non-linearity consistently achieves the best cross-correlation, although the cubic (pow3) non-linearity often comes quite close, while the square (skew) and Gaussian (gauss) non-linearities achieve poorer cross-correlation results with this data set. This suggests that the tanh non-linearity is better able to distinguish the true signal from cross-talk contamination from the remaining sources.
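The ratio described above can be recomputed directly from the table. The following Python sketch copies the nine rows of cross-correlation scores and divides each score by its row mean. The variable names are illustrative only.

```python
# Cross-correlation scores copied from the table above, one row per signal,
# in the column order skew, pow3, gauss, tanh.
NAMES = ["skew", "pow3", "gauss", "tanh"]
SCORES = [
    [23.5, 25.7, 24.7, 25.8],
    [19.3, 20.0, 19.5, 20.2],
    [22.1, 22.0, 22.6, 23.5],
    [25.3, 26.9, 25.8, 26.9],
    [27.6, 28.3, 27.4, 28.8],
    [20.8, 21.7, 20.9, 21.7],
    [17.7, 18.6, 18.0, 18.6],
    [22.3, 21.9, 21.9, 23.1],
    [30.3, 30.9, 29.8, 31.2],
]

def ratios_to_row_mean(row):
    """Divide each score by the mean score across the four non-linearities."""
    mean = sum(row) / len(row)
    return [value / mean for value in row]

RATIOS = [ratios_to_row_mean(row) for row in SCORES]
for i, ratios in enumerate(RATIOS, start=1):
    print(i, "  ".join(f"{name}={r:.3f}" for name, r in zip(NAMES, ratios)))

# True when tanh is the best, or tied for best, for every signal.
TANH_BEST = all(row[3] == max(row) for row in SCORES)
print("tanh best or tied for every signal:", TANH_BEST)
```

A ratio above 1 means the non-linearity did better than the four-way average for that signal. At the precision shown in the table, pow3 ties tanh for three of the nine signals.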

Tests were performed using FastICA with the deflation approach in MATLAB 7.1 on a 3.4 GHz Pentium 4-HT with 1 GB dual-channel SDRAM and Microsoft Windows XP. Listed times include ICA only, excluding PCA and whitening. The .mp3 files were encoded from .wav files created by MATLAB, using LAME 3.97b2 with 48-128 kbps VBR and a 22 kHz resampling frequency. Source recordings were filtered using GoldWave's Noise Reduction filter with default settings.
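For reference, the one-unit FastICA fixed-point update with deflation is usually written as below, together with the four non-linearities g that the FastICA package names skew, pow3, gauss and tanh. Here z is the whitened data, w is the weight vector for the p-th component, and w_1 to w_{p-1} are the components already found. The constants a_1 and a_2 are package parameters, both 1 by default as far as I recall. This is the textbook form of the algorithm, not a transcription of ica.m.

```latex
\begin{aligned}
\mathbf{w}^{+} &= \mathrm{E}\{\mathbf{z}\,g(\mathbf{w}^{\mathsf T}\mathbf{z})\}
                - \mathrm{E}\{g'(\mathbf{w}^{\mathsf T}\mathbf{z})\}\,\mathbf{w},\\
\mathbf{w}^{+} &\leftarrow \mathbf{w}^{+}
                - \sum_{j=1}^{p-1} \bigl(\mathbf{w}^{+\mathsf T}\mathbf{w}_j\bigr)\,\mathbf{w}_j,
\qquad
\mathbf{w} \leftarrow \frac{\mathbf{w}^{+}}{\lVert \mathbf{w}^{+} \rVert},\\[6pt]
\text{skew:}\quad  g(u) &= u^{2},                  & g'(u) &= 2u,\\
\text{pow3:}\quad  g(u) &= u^{3},                  & g'(u) &= 3u^{2},\\
\text{gauss:}\quad g(u) &= u\,e^{-a_2 u^{2}/2},    & g'(u) &= (1 - a_2 u^{2})\,e^{-a_2 u^{2}/2},\\
\text{tanh:}\quad  g(u) &= \tanh(a_1 u),           & g'(u) &= a_1\bigl(1 - \tanh^{2}(a_1 u)\bigr).
\end{aligned}
```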

MATLAB code to perform this demo is available here: ica.m. Please note that FastICA must be installed and on the current MATLAB path.

* Quotes are selected from public speeches made by sitting U.S. President George W. Bush, as quoted by Comedy Central.
