Why do we visualize data? One reason is that it can be aesthetically pleasing. The beauty of data, of course, is in the eye of the data collector. Often data looks beautiful to me simply because I put a lot of effort into collecting it. In general, though, I consider data visualization one of the perks of the whole scientific chain of collecting, analyzing, interpreting and presenting data. But is it simply art? Is the utility of visualizing data simply to grab attention, and thereby improve the impact of a presentation by making it more memorable?
We visualize data because our memory is poor
I think we visualize data for a deeper reason: we create visualizations because our short-term (working) memory is fairly poor. There is a convincing body of research showing that we can hold only about four items in short-term memory. We have a trick, called chunking, that lets us group items together into chunks so that we seemingly remember more items. But this depends on long-term memory, and in reality we are only remembering the bigger chunks. For example, when given a list of familiar words (dog, cat, horse, cart), we don’t remember individual letters; we remember the words. In fact, I could conceivably chunk (dog, cat) and (horse, cart), since each pair could form a single concept in my mind.
This poverty of short-term memory makes it difficult for us to grasp patterns in data that are presented sequentially. A table of data (in addition to being dull) requires concentrated study to interpret, whereas a bar graph is much quicker to grasp. When done honestly and diligently, a bar graph conveys the same quantitative precision, and with confidence intervals (or the results of a significance test) added, it can also be used to judge significance.
This is because instead of receiving the information sequentially, as we do when reading down a table, we take it in all at once as a pattern. We are good at categorizing and learning visual patterns. Through training we have learned to quickly interpret things like x-y plots, scatter plots, bar graphs and so on. The flip side, of course, is that when one of our colleagues comes up with a novel visual representation, we have to train ourselves to understand it. Sometimes we can be misled because a visualization resembles a familiar kind of chart while its creator intends something different.
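Adding the confidence intervals mentioned above to a bar graph takes only a few lines. The sketch below assumes a normal approximation (error-bar half-width = 1.96 × the standard error of the mean, roughly 95% coverage); for small samples a Student t critical value would be more accurate. The function names `bar_with_ci` and `intervals_overlap` are hypothetical, chosen for illustration.

```python
import math
import statistics


def bar_with_ci(samples, z=1.96):
    """Return (bar height, error-bar half-width) for one group of samples.

    Bar height is the sample mean. The half-width is z * s / sqrt(n), a
    normal-approximation confidence interval (assumption: z = 1.96 for ~95%).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate the spread")
    mean = statistics.mean(samples)
    half_width = z * statistics.stdev(samples) / math.sqrt(n)
    return mean, half_width


def intervals_overlap(a, b):
    """True if two (mean, half_width) intervals share any values."""
    (m1, h1), (m2, h2) = a, b
    return abs(m1 - m2) <= h1 + h2
```

The `(mean, half_width)` pairs map directly onto a plotting call such as matplotlib’s `bar(x, height, yerr=...)`. Non-overlapping 95% intervals are a conservative sign of a difference between groups, but overlapping intervals do not by themselves rule one out.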
My personal favorite non-interactive data visualizations are animations. I often track things through time, and I like to display them as video. Time can also serve as an additional dimension: even when there is no explicit time in the data, the sequence of frames can form one of the axes of the plot.
Here is an animation of Don Knuth’s famous Algorithm X being used to solve a sudoku grid. The code is here. In this case time represents the steps of the algorithm, so it arguably is literally time.
I would like to improve the visualization so that it better illustrates the searching nature of the task. All suggestions are welcome.
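Algorithm X itself is compact. As a rough sketch of the technique (not the code linked above), here is a dictionary-of-sets version of Algorithm X in Python that treats sudoku as an exact-cover problem: each candidate (row, column, digit) covers one cell constraint, one row-digit, one column-digit and one box-digit constraint. The `frames` list is a hypothetical hook for animation: it records the partial assignment after every placement and every backtrack, so playing the frames back would show the search trying and retracting candidates. The names `solve_sudoku`, `search` and `frames` are illustrative assumptions.

```python
from itertools import product


def build_columns(Y):
    """Invert candidate -> constraints into constraint -> set of candidates."""
    X = {}
    for cand, constraints in Y.items():
        for col in constraints:
            X.setdefault(col, set()).add(cand)
    return X


def select(X, Y, r):
    """Choose candidate r: remove its columns and every clashing candidate."""
    removed = []
    for j in Y[r]:
        for i in X[j]:
            for k in Y[i]:
                if k != j:
                    X[k].remove(i)
        removed.append(X.pop(j))
    return removed


def deselect(X, Y, r, removed):
    """Undo select() in reverse order (the backtracking step)."""
    for j in reversed(Y[r]):
        X[j] = removed.pop()
        for i in X[j]:
            for k in Y[i]:
                if k != j:
                    X[k].add(i)


def search(X, Y, partial, frames):
    """Algorithm X; appends a snapshot of the partial solution to frames."""
    if not X:
        yield list(partial)
        return
    col = min(X, key=lambda c: len(X[c]))  # most constrained column first
    for r in list(X[col]):
        partial.append(r)
        frames.append(list(partial))  # frame: a candidate was placed
        removed = select(X, Y, r)
        yield from search(X, Y, partial, frames)
        deselect(X, Y, r, removed)
        partial.pop()
        frames.append(list(partial))  # frame: the search backtracked


def solve_sudoku(grid):
    """grid: 9x9 lists, 0 = empty. Returns (solved grid or None, frames)."""
    Y = {}
    for r, c, n in product(range(9), range(9), range(1, 10)):
        b = (r // 3) * 3 + c // 3
        Y[(r, c, n)] = [("rc", (r, c)), ("rn", (r, n)), ("cn", (c, n)), ("bn", (b, n))]
    X = build_columns(Y)
    for r, c in product(range(9), range(9)):
        if grid[r][c]:
            select(X, Y, (r, c, grid[r][c]))  # givens are fixed up front
    frames = []
    solution = next(search(X, Y, [], frames), None)
    if solution is None:
        return None, frames
    solved = [row[:] for row in grid]
    for r, c, n in solution:
        solved[r][c] = n
    return solved, frames


EXAMPLE = [
    [5, 3, 0, 0, 7, 0, 0, 0, 0], [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0], [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1], [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0], [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
]

if __name__ == "__main__":
    solved, frames = solve_sudoku(EXAMPLE)
    print(len(frames), "search frames")
    for row in solved:
        print(" ".join(map(str, row)))
```

Each entry in `frames` is a list of `(row, column, digit)` placements on top of the givens, so a renderer could draw one grid per frame; highlighting the frames where the list shrinks would make the backtracking visible.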
The following animation shows the receptive fields of neurons in primary visual cortex. The top panel shows which electrode we are showing data from. The bottom panel shows the intensity of the neural response (darker = stronger response) at different points in the visual field. There are two things to note. First, the response is concentrated in blobs; each blob is called the receptive field of the neuron at that location. Second, neurons on the right side of the brain respond to visual objects in the left visual field, and vice versa.
Here time represents a non-temporal dimension. As you may have guessed, it represents which electrode we are recording from. This is unpublished data from an experiment I did.
The following is also an animation of neural data in V1, but this time I’ve reversed the representation. I am showing local field potentials (LFPs) in V1. The top panel is the visual display that the subject sees. The middle panel is the voltage recorded on each of the electrodes of two 6×8 electrode arrays implanted in visual cortex. The bottom panel shows the passage of time.
Here time serves a dual role. On the smaller scale, time represents actual time (the bar traveling from left to right on the slider). On the larger scale, time represents different visual stimuli placed at different positions.
There is no reason for us to stick to the visual domain. Data can conceivably be presented in the domains of sound and touch. A simple but striking demonstration of this is Braille. Blind readers let their fingers fly across a pockmarked page and fluidly read a novel, where the rest of us feel only a gentle raspiness. A more dramatic example is this prototype device, which converts the visual scene into a touch-like representation on the tongue that allows a blind person to see.
(I must say the description in this news clip is really good, in contrast to our typical crappy, sound-bite-focused news.)
Taking off from that tongue-based data visualization machine: if we remove the constraints of current technology, I would say a grand adventure in data visualization would be to insert information directly across the cortex and let the person fly through high-dimensional data spaces, literally thinking through them. This is something only practiced mathematicians seem able to do, and even then with abstract, mathematically perfect constructs rather than real data.
Right now this would be an invasive process of placing electrodes on cortex and hooking up a signal processor that would create a spatio-temporal code from the data and feed it to different parts of the cortical sheet. I imagine the user would be able to control the display by imagining motions, which would then let them fly through this representational space.
From what I understand about cortex and learning, with practice we would learn to navigate this abstract data space as we navigate real spaces, and we would notice links and shortcuts between different parts of the space that might give us insights for a more rigorous analysis of the data set.
Technology aside, I think one problem with this is that although cortex seems able to learn new representations, it does not seem able to switch between different representations quickly. So if we did learn to interpret this abstract representational space, I think our ability to interpret normal stimuli would be compromised.
If we placed the information in visual cortex, for example, then after training we would interpret signals in our visual cortex in terms of the structure of our data set. When we went back to normal life, everything we saw would look very peculiar, because our brain would now be processing the signals in cortex very differently.