Support Vector Machines and dimensionality

When I was introduced to support vector machines I initially thought: this is great, the method takes care of irrelevant dimensions. My intuition was that since the algorithm tilts hyperplanes to cut the space, adding irrelevant dimensions does not matter at all, since the hyperplane would just lie parallel to the irrelevant dimensions.


Practically speaking, however, as the number of dimensions increases while the number of examples stays fixed, our data become sparse, which can ‘fool’ the partitioning algorithm.

We can run some simple simulations to explore this question.

import pylab
from sklearn import svm, model_selection

Let’s generate a dataset consisting of 500 examples of 200-dimensional data. The category of each example depends only on the first dimension.

d = 200
N = 500
C = pylab.randint(0,high=2,size=N)  # class labels, 0 or 1
F = pylab.randn(N,d)                # standard normal features
F[:,0] += C*2                       # shift class 1 by 2 along the first dimension only

We set up a linear SVM classifier and cross-validate with stratified K-folds (K = 10).

clf = svm.SVC(kernel='linear', C=1)
cv = model_selection.StratifiedKFold(n_splits=10)

If we run the classifier with just the first dimension, we get a classifier accuracy of 0.83 (chance being 0.5).

scores = model_selection.cross_val_score(clf, F[:,:1], C, cv=cv)
scores.mean()

As we add in the first 99 irrelevant dimensions, our accuracy drops to 0.784

scores = model_selection.cross_val_score(clf, F[:,:100], C, cv=cv)
scores.mean()

and when we add in all 199 irrelevant dimensions, our accuracy drops to 0.754.

scores = model_selection.cross_val_score(clf, F[:,:], C, cv=cv)
scores.mean()
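To connect this back to the original intuition, one can look at the fitted weight vector directly. If the hyperplane really lay parallel to the irrelevant dimensions, all of the weight would sit on the first dimension. The following is only a rough sketch: it regenerates a dataset of the same shape using NumPy and a recent scikit-learn, and the fixed seed and the names `features`, `labels` and `irrelevant_share` are my own choices for illustration, not part of the simulation above.

```python
import numpy as np
from sklearn.svm import SVC

# Fixed seed is an assumption, used only to make the sketch repeatable.
rng = np.random.default_rng(0)

d, N = 200, 500
labels = rng.integers(0, 2, size=N)        # class labels, 0 or 1
features = rng.standard_normal((N, d))     # standard normal features
features[:, 0] += 2 * labels               # only dimension 0 carries class information

# Fit a linear SVM on all 200 dimensions and pull out the hyperplane normal.
clf = SVC(kernel="linear", C=1).fit(features, labels)
w = clf.coef_.ravel()

# Share of the squared weight norm on the informative vs. the irrelevant dimensions.
informative_share = w[0] ** 2 / np.sum(w ** 2)
irrelevant_share = 1.0 - informative_share

print("weight share on dimension 0:        %.3f" % informative_share)
print("weight share on the other 199 dims: %.3f" % irrelevant_share)
```

Any weight that ends up on the other 199 dimensions is the classifier fitting noise in the training sample, which is one way to read the drop in cross-validated accuracy.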

Now, this is an extreme example (with so many dimensions), but it is a good lesson to keep in mind: the more features your dataset has, the more data you have to collect.
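The flip side can be checked the same way: keep all 200 dimensions and vary the number of examples instead. This is a hedged sketch rather than a rerun of the experiment above. It assumes a recent scikit-learn (the `model_selection` module), and the helper `mean_cv_accuracy`, the seed and the particular sample sizes are hypothetical choices of mine.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold, cross_val_score


def mean_cv_accuracy(n_examples, n_dims=200, seed=0):
    """Mean 10-fold accuracy of a linear SVM on data where only
    dimension 0 is informative (hypothetical helper for this sketch)."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n_examples)
    features = rng.standard_normal((n_examples, n_dims))
    features[:, 0] += 2 * labels            # same class shift as in the post
    clf = SVC(kernel="linear", C=1)
    cv = StratifiedKFold(n_splits=10)
    return cross_val_score(clf, features, labels, cv=cv).mean()


# Sample sizes are arbitrary illustration values, not taken from the post.
sample_sizes = [250, 500, 2000]
accuracy_by_n = {n: mean_cv_accuracy(n) for n in sample_sizes}

for n, acc in accuracy_by_n.items():
    print("N = %4d  mean CV accuracy = %.3f" % (n, acc))
```

With the dimensionality held fixed, the accuracy should move back toward the one-dimensional figure as the number of examples grows, though the exact values depend on the random draw.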

PS. For those wondering, the featured image is from Deus Ex: Human Revolution. It has no relevance to the post except that it has cool geometrical features. If you haven’t played Deus Ex: HR yet, you should.
