When we learned about matrices it wasn’t hard to think of slicing 2D matrices along rows and columns. But what about slicing higher-dimensional matrices? I always get confused when I go above three dimensions. Interestingly, an easy way for me to imagine slicing such high-dimensional matrices is to use trees and to think of how the numbers are stored in a computer.

Consider a 2x2x2x2 matrix, something we can construct as follows:

Now imagine the matrix is stored in the computer as a simple sequence of numbers (this is kind of true). What makes these numbers behave as a matrix is the indexing scheme. The indexing scheme can be thought of as a kind of tree.
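To make the “simple sequence of numbers” concrete: in a row-major (C-order) layout, the element at index (i, j, k, l) of a 2x2x2x2 array sits at flat position i*8 + j*4 + k*2 + l. A quick numpy check (my illustration, not from the original post):

```python
import numpy as np

A = np.arange(16).reshape(2, 2, 2, 2)   # flat storage is just 0..15
i, j, k, l = 1, 0, 1, 1
flat = i*8 + j*4 + k*2 + l              # strides of 8, 4, 2, 1 elements
assert A[i, j, k, l] == A.ravel()[flat] == 11
```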

Each level of the tree depicts a dimension, with the root being the outermost dimension and the leaves the innermost dimension.
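In this picture, slicing is just picking subtrees: indexing the outermost dimension keeps one whole subtree below the root, while indexing the innermost dimension picks one leaf from every pair. A small numpy illustration of this (mine, not from the original):

```python
import numpy as np

A = np.arange(16).reshape(2, 2, 2, 2)
left = A[0]          # one subtree at the root: elements 0..7, shape (2, 2, 2)
leaves = A[..., 0]   # one leaf from each innermost pair, shape (2, 2, 2)
print(left.ravel().tolist())     # [0, 1, 2, 3, 4, 5, 6, 7]
print(leaves.ravel().tolist())   # [0, 2, 4, 6, 8, 10, 12, 14]
```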

Why do we visualize data? One reason is that it can be aesthetically pleasing. The beauty of data, of course, is in the eye of the data collector. Often data looks beautiful to me simply because I put a lot of effort into collecting it. But in general, I consider data visualization as one of the perks of the whole scientific chain of collecting, analyzing, interpreting and presenting data. But is it simply art? Is the utility of visualizing data simply to grab attention and thereby improve the impact of an analysis presentation by making it more memorable?

We visualize data because our memory is poor

I think that we visualize data for a deeper reason: we create visualizations because our short term (working) memory is fairly poor. There is a convincing body of research suggesting that we can hold only about 4 items in short term memory. We have a trick, called chunking, that allows us to glob items together into chunks, which seemingly lets us remember more items, but this depends on long term memory, and in reality we are only remembering the bigger chunks. For example, when given a list of known words (dog, cat, horse, cart) we don’t remember individual letters, we remember the words. In fact I could conceivably chunk (dog, cat) and (horse, cart), as these could form single concepts in my mind.

This poverty of short term memory makes it difficult for us to grasp patterns in data that is presented sequentially. So a table of data (in addition to being dull) requires concentrated study to interpret, whereas a bar graph is much quicker to grasp, and when done honestly and diligently, conveys the same quantitative precision and can be used to judge significance when confidence intervals (or the results of a significance test) are added to it.

This is because, now, instead of getting the information sequentially as we read down a table, we get it all at once as a pattern. We are good at categorizing and learning visual patterns. Through training we have learned to quickly interpret things like x-y plots, scatter plots, bar graphs and so on. The flip side, of course, is that when one of our colleagues comes up with a novel visual representation we have to train ourselves to understand it. Sometimes we can be misled because the visualization looks like a different kind of chart we are used to, while the creator intends something different.

Non-interactive visualizations

My personal favorite non-interactive data visualizations take the form of animations. I often do work where I track things through time, and I like to display that as a video. Time can also serve as an additional dimension, i.e., there is no explicit time in the data, but time forms one of the axes of our data plot.

Here is an animation of Don Knuth’s famous Algorithm X being used to solve a sudoku grid. The code is here. In this case the animation time represents the steps of the algorithm, and so is more or less literally time.

I want to improve the visualization to make it more illustrative of the searching nature of the task. All suggestions welcome.
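The linked code isn’t reproduced here, but as an illustrative sketch of the underlying idea (sudoku is solved by reducing it to an exact cover problem in this same framework), here is a compact dict-based rendering of Algorithm X, run on Knuth’s toy exact-cover example rather than a full sudoku:

```python
# A compact, dict-based sketch of Knuth's Algorithm X (exact cover).
# X maps each constraint (column) to the set of rows that satisfy it;
# Y maps each row to the list of constraints it satisfies.
def algorithm_x(X, Y, solution=None):
    if solution is None:
        solution = []
    if not X:
        yield list(solution)                       # every constraint covered
        return
    c = min(X, key=lambda col: len(X[col]))        # most-constrained column
    for r in list(X[c]):
        solution.append(r)
        removed = select(X, Y, r)                  # cover: remove clashing rows/cols
        yield from algorithm_x(X, Y, solution)
        deselect(X, Y, r, removed)                 # uncover: restore on backtrack
        solution.pop()

def select(X, Y, r):
    removed = []
    for j in Y[r]:
        for i in X[j]:
            for k in Y[i]:
                if k != j:
                    X[k].remove(i)
        removed.append(X.pop(j))
    return removed

def deselect(X, Y, r, removed):
    for j in reversed(Y[r]):
        X[j] = removed.pop()
        for i in X[j]:
            for k in Y[i]:
                if k != j:
                    X[k].add(i)

# Knuth's toy example: cover columns 1..7 with a subset of these rows.
Y = {'A': [1, 4, 7], 'B': [1, 4], 'C': [4, 5, 7],
     'D': [3, 5, 6], 'E': [2, 3, 6, 7], 'F': [2, 7]}
X = {j: {r for r in Y if j in Y[r]} for j in range(1, 8)}
print(sorted(next(algorithm_x(X, Y))))   # ['B', 'D', 'F']
```

Watching `solution` grow and shrink as `select`/`deselect` fire is exactly the searching behavior the animation tries to show.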

The following animation shows the receptive fields of neurons in primary visual cortex. The top panel shows which electrode we are showing data from. The bottom shows the intensity of the neural response (darker = more response) at different points in the visual field. There are two things to note. First, the response is concentrated in blobs; such a blob is called the receptive field of the neuron at that location. Second, neurons on the right side of the brain respond to visual objects in the left visual field, and vice versa.

Here time represents a non-temporal dimension. As you may have guessed, it represents which electrode we are recording from. This is unpublished data from an experiment I did.

The following is also an animation of neural data in V1, but this time I’ve reversed the representation. I am showing Local Field Potentials in V1. The top panel is the visual display that the subject sees. The middle panel is the voltage recorded on each of the electrodes of two 6×8 electrode arrays implanted on visual cortex. The bottom panel shows the passage of time.

Here time serves a dual role. On the smaller scale time represents time (the bar traveling from left to right on the slider). On the larger scale time represents different visual stimuli placed at different positions.

Touching data

There is no reason for us to stick to the visual domain. Data can conceivably be presented in the domains of sound and touch. A cool demonstration of this is simply Braille. Blind people will let their fingers fly across a pockmarked page and fluidly read a novel where all we feel is a gentle raspiness. A more dramatic example is this prototype device that converts the visual scene to a ‘touch’ like representation that allows a blind person to see.

(I must say the description in this news clip is really good, in contrast to our typical crappy sound-bite focused news).

Blue sky

Taking off from that tongue-based data visualization machine: if you remove the constraints of current technology, I would say a grand adventure in data visualization would be to directly insert information across the cortex and let the person fly through high dimensional data spaces – literally think through high dimensional data spaces. This is something only practiced mathematicians seem able to do, and even then with abstract, mathematically perfect constructs rather than real data.

Right now this would be an invasive process of placing electrodes on the cortex and hooking up a signal processor that creates a spatio-temporal code out of the data and feeds it to different parts of the cortical sheet. I imagine the user would control the display by imagining motions, which would let them fly through this representational space.

From what I understand about cortex and learning, we would, with practice, learn to navigate this abstract data space just as we navigate real spaces, and we would notice links and shortcuts between different parts of the space which may give us insights for a more rigorous analysis of the data set.

Considerations of technology aside, I think one problem with this is that though cortex seems to be able to learn new representations it does not seem to be able to switch between different representations quickly. So, if we did learn to interpret this abstract representational space I think that our ability to interpret normal stimuli would be compromised.

If we placed the information in visual cortex, for example, after training we would interpret signals in our visual cortex in terms of the structure of our data set. When we went back to normal life, everything we saw would be very peculiar because our brain is processing the signals in cortex very differently now.

I have some data where each point consists of two values (x,y). A large bunch of the (x,y) values are likely to come from a distribution centered around the origin. A few of the points may come from distributions where x is significantly greater than zero. I wanted to find out if y for those points was significantly greater than zero.

My first idea was to draw a best fit line to the (x,y) pairs and see if the slope was significantly different from zero. I don’t have any hypothesis about how x is related to y, except that I think it is likely that if x is not different from zero, neither will be y.

To start with, I don’t think such a fitting is the correct approach, but I pursued it for want of a better idea. One thing I worried about is that if I performed such a fitting, the noise from the uncorrelated points at the origin would drown out the contribution of the far fewer points away from the origin.

Suppose you have a cluster of (x,y) points where both x and y come from a normal distribution with mean 0 and variance 1 such that x,y are independent. Now suppose you have one additional point for which x=y. What is the best fit line (‘chain’) through this cloud of points?

As your intuition will probably tell you, that depends on how far away the lone correlated point is from the ‘ball’ (the cluster of uncorrelated points round the origin). The following two animations show how the slope and correlation coefficient change as this lone point gets further and further from the origin. In the first animation we have 10 points on the origin and in the second animation we have 100 points.

The further away the point is from the origin, the more influence it has on the best fit line and the correlation coefficient. It is interesting to note that the best fit line has a bias towards the horizontal. This is, of course, because of the form of the best fit line: y = mx + b penalizes only vertical errors. If we had proposed a three parameter fit ax + by + c = 0, then for a purely uncorrelated cloud the orientation of the ‘best fit line’ would be distributed uniformly around the circle.

This bias is interesting because we can see that the effect of the uncorrelated ball is to pull the chain towards the horizontal (corresponding to a reduction in the correlation coefficient). If we used a different fitting form (such as ax + by + c = 0) this bias would disappear.
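As a sketch of that claim (my own illustration, not from the original post): the symmetric ax + by + c = 0 form corresponds to total least squares, which can be computed from the principal eigenvector of the covariance matrix. With one lone correlated point stuck onto an uncorrelated ball, ordinary least squares is pulled towards the horizontal while total least squares is not:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.standard_normal(n)
y = rng.standard_normal(n)
x[0] = y[0] = 8.0          # one lone correlated point far from the ball

# ordinary least squares slope of y on x (penalizes vertical errors only)
m_ols = np.polyfit(x, y, 1)[0]

# total least squares: direction of largest variance of the point cloud
cov = np.cov(x, y)
evals, evecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
v = evecs[:, -1]                     # principal eigenvector
m_tls = v[1] / v[0]                  # slope of the symmetric fit

print(m_ols, m_tls)   # the OLS slope is noticeably smaller (more horizontal)
```

The sign ambiguity of the eigenvector doesn’t matter here, since the slope is the ratio of its components.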

import pylab, scipy.stats as ss, matplotlib.animation as animation

N = 100   # total number of points
Nc = 1    # number of correlated points (the 'chain')

def run_fit(i=0):
    # Generate the point cloud and fit a line. From frame 100 onwards the
    # lone correlated point marches away from the origin along y = x.
    x = pylab.randn(N)
    y = pylab.randn(N)
    xx = pylab.randn(Nc) if i < 100 else (i - 100) / 20.
    x[:Nc] = xx
    y[:Nc] = x[:Nc]                  # the correlated point lies on y = x
    fitp = ss.linregress(x, y)       # (slope, intercept, r, p, stderr)
    xsim = pylab.array([-20, 20])
    yhat = fitp[0] * xsim + fitp[1]
    return x, y, xsim, yhat, fitp

def init():
    fig = pylab.figure(figsize=(3, 3))
    ax = pylab.axes(xlim=(-20, 20), ylim=(-20, 20))
    ax.set_aspect('equal')
    x, y, xsim, yhat, fitp = run_fit()
    line1, = pylab.plot(x, y, 'ko')
    line2, = pylab.plot(xsim, yhat, 'k--', lw=2)
    text = pylab.text(-10, 10, 'm={:+4.2f}\nb={:+4.2f}\nr={:+4.2f}'.format(fitp[0], fitp[1], fitp[2]))
    return fig, line1, line2, text

def animate(i, line1, line2, text):
    # Redraw the points, the fitted line and the fit parameters each frame.
    x, y, xsim, yhat, fitp = run_fit(i)
    line1.set_data(x, y)
    line2.set_data(xsim, yhat)
    text.set_text('m={:6.2f}\nb={:6.2f}\nr={:6.2f}'.format(fitp[0], fitp[1], fitp[2]))
    return line1, line2, text

fig, line1, line2, text = init()
anim = animation.FuncAnimation(fig, animate, fargs=(line1, line2, text),
                               frames=400, interval=20, repeat=False)
anim.save('ball_and_chain.mp4', fps=30, extra_args=['-vcodec', 'libx264'])
pylab.show()