The curse of D- and the LDA

All you dataheads know the curse whose name must not be spoken. The curse of D(imensionality)! Let's look at how the curse sickens us when we perform Linear Discriminant Analysis (LDA). Our intuition when we perform LDA is that we are rotating a higher-dimensional data space and casting a shadow onto a lower-dimensional surface. We rotate things so that the shadow exaggerates the separation between data coming from different sources (categories), and we hope that data which look jumbled in the higher dimension cast a shadow in which the categories are better separated.
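For two classes, this rotation-and-shadow picture boils down to finding a single direction w that maximizes the spread between the class means relative to the spread within each class (the Fisher criterion). Here is a minimal sketch of that idea using numpy only; the helper name fisher_direction is mine, not a library function.

import numpy as np

def fisher_direction(X, y):
  """Two-class Fisher/LDA direction: w proportional to Sw^-1 (mu1 - mu0)."""
  X0, X1 = X[y == 0], X[y == 1]
  mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
  # Within-class scatter (pooled covariance, up to a constant factor)
  Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
  w = np.linalg.solve(Sw, mu1 - mu0)
  return w / np.linalg.norm(w)

# The one-dimensional 'shadow' described above is just X @ fisher_direction(X, y).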

I have some data that can be thought of as having a large number of dimensions, say about a hundred. (Each dimension is a neuron in the brain, if you must know.) I know, from plotting the data from the neurons individually, that some neurons just don't carry any information I'm interested in, while others do. I'm interested in the question: if I combine multiple neurons, can I get more information out of them than if I look at them individually? It is possible to construct toy scenarios where this is true, so I want to know if it works in a real brain.

A question quickly arose: which neurons should I pick to combine? I know that by using some criterion I can rank the neurons by informativeness and then pick the top 50% or 30% of them to put together. But what happens if I just take all the neurons? Will LDA discard the useless, noisy ones? Will my data space be rotated so that these useless neurons don't cast any shadow and are eclipsed by the useful ones?

This is not entirely an idle question, nor simply a question of data analysis. The brain itself, if it is using data from these neurons, needs some kind of mechanism to figure out which neurons are important for what. I find this an important question, and I don't think we are sure of the answers yet.

However, back to the math. We can generate a toy scenario to test this. I created datasets with 10, 25, and 50 dimensions. Only one dimension is informative; the rest are noise. Data in the first dimension come from two different classes and are separated. What happens when we rotate these spaces so that the points from the two classes are as well separated as possible?

The plot below shows the original data (blue and green classes). You can see that the 'bumps' are decently separated. Then you can see the 10d, 25d, and 50d data.

Wow! Adding irrelevant dimensions sure helps us separate our data! Shouldn't we all do this: just add noise as additional dimensions and then rotate the space to cast a well-separated shadow? Uh oh! The curse of D- strikes again!

We aren't fooled, though. We know what's going on. In real life we have limited data; in this dataset, for example, I used 100 samples. Our intuition tells us that as the dimensions increase, the points get less crowded. Each point is able to nestle into a nice niche in a higher dimension, further and further away from its neighbors. It's like the more dimensions we add, the more streets and alleys we add to the city. The points no longer have to live on the same street. They now have their own zip codes. (OK, I'll stop this theme now.)
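The 'own zip codes' claim is easy to check numerically. Here is a minimal sketch of my own, using numpy only: with the sample count held at 100, the average distance to the nearest neighbor grows as dimensions are added.

import numpy as np

np.random.seed(0)
N = 100
for D in (1, 10, 25, 50):
  X = np.random.randn(N, D)
  # All pairwise distances; blank out the zero self-distances on the diagonal
  d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
  np.fill_diagonal(d, np.inf)
  print(D, d.min(axis=1).mean())  # mean nearest-neighbor distance grows with D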

Poor old LDA knows nothing about this. LDA simply picks up our space, starts to rotate it, is extremely happy when the shadow looks well separated, and stops. The illusion is removed as soon as we actually try to use the shadow. Say we split our data into train and test sets (see the sketch after the code below). Our train-set data look nicely separated, but the moment we dump in the test data: CHAOS! It's really jumbled. Those separate zip codes – mail fraud! Thanks to the curse of D-.

Code follows:

import numpy as np
import matplotlib.pyplot as plt
# LDA now lives in sklearn.discriminant_analysis (it used to be sklearn.lda)
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

def myplot(ax, F, title):
  """Histogram the first column of F, one trace per class (the classes are the two halves of the rows)."""
  bins = np.arange(-15, 15, 1)
  N = F.shape[0]
  ax.hist(F[N//2:, 0], bins, histtype='step', lw=3)
  ax.hist(F[:N//2, 0], bins, histtype='step', lw=3)
  ax.set_title(title)

D = 50   # Dimensions
N = 100  # Samples
np.random.seed(0)
F = np.random.randn(N, D)  # Pure noise in every dimension
C = np.zeros(N)
C[:N//2] = 1               # Category vector: first half of the rows is class 1
F[:, 0] += C*4             # Only the 1st dimension carries category information

lda = LDA(n_components=1)
fig, ax = plt.subplots(4, 1, sharex=True, figsize=(4, 8))

myplot(ax[0], F, 'original')
F_new = lda.fit_transform(F[:, :10], C)  # Ten dimensions
myplot(ax[1], F_new, '10d')
F_new = lda.fit_transform(F[:, :25], C)  # 25 dimensions
myplot(ax[2], F_new, '25d')
F_new = lda.fit_transform(F[:, :50], C)  # 50 dimensions
myplot(ax[3], F_new, '50d')
plt.show()
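And here is the train/test check mentioned above, as a minimal sketch rather than part of the original script. It assumes the script above has just been run, so F, C, N, LDA, np and plt are all defined. LDA is fit on a random half of the 50d data and then used to project the held-out half.

rng = np.random.RandomState(1)
train = np.zeros(N, dtype=bool)
train[rng.permutation(N)[:N//2]] = True  # random 50/50 split into train/test
test = ~train

lda_tt = LDA(n_components=1)
lda_tt.fit(F[train, :50], C[train])      # fit on the training half only

fig2, ax2 = plt.subplots(2, 1, sharex=True, figsize=(4, 4))
for axis, name, mask in [(ax2[0], 'train (50d)', train), (ax2[1], 'test (50d)', test)]:
  P = lda_tt.transform(F[mask, :50])[:, 0]  # project this half onto the LDA axis
  axis.hist(P[C[mask] == 0], 30, histtype='step', lw=3)
  axis.hist(P[C[mask] == 1], 30, histtype='step', lw=3)
  axis.set_title(name)
plt.show()

With only 50 training samples in 50 dimensions, LDA has plenty of room to overfit, so the training histograms separate nicely while the held-out ones come out jumbled: the mail fraud described above.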