# What is machine learning, and why should I care?

This is a (longish) informal intro to machine learning aimed at Biology/Neuroscience undergraduates who are encountering this for the first time in the context of biological data analysis. We are, as you know, furiously lazy. It’s a biological adaptation: we always try to find the path of least resistance. Only people with distorted senses of…

# Data Management

They told you you’d have to manage your time, manage your equipment, even manage your research assistants, but nobody told you that you’d have to manage your data. This post is a haphazard list of tips that I’ve found useful for keeping data well behaved and obedient and that I would like to share with…

# Depth-of-field

Depth-of-field (DoF) is one of the most fun things about photography. It is enjoyable on both the technical and artistic levels. Depth-of-field is the extent (“depth”) of a scene that is in focus (“field”) in a photograph. Artistically, it is usually used to isolate a subject from the surroundings and can be used to indicate…

# The curse of D- and the LDA

All you dataheads know the curse whose name must not be spoken. The curse of D(imensionality)! Let’s look at how the curse sickens us when we perform Linear Discriminant Analysis (LDA). Our intuition, when we perform LDA, is that we are rotating a higher-dimensional data space and casting a shadow onto a lower-dimensional…
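To make the "casting a shadow" picture concrete, here is a minimal sketch of a two-class Fisher discriminant in NumPy. It is not the post's own code: the function name `fisher_direction`, the synthetic Gaussian data, and the choice of five dimensions are all assumptions made for illustration.

```python
import numpy as np

def fisher_direction(X0, X1):
    """Unit vector w that the two classes are projected onto (hypothetical helper)."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: summed outer products of the centered samples.
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    # Fisher's solution: w is proportional to Sw^{-1} (mu1 - mu0).
    w = np.linalg.solve(Sw, mu1 - mu0)
    return w / np.linalg.norm(w)

# Assumed toy data: two Gaussian clouds in D = 5 dimensions,
# separated only along the first two axes.
rng = np.random.default_rng(0)
D = 5
shift = np.zeros(D)
shift[:2] = 3.0
X0 = rng.normal(size=(200, D))
X1 = rng.normal(size=(200, D)) + shift

w = fisher_direction(X0, X1)

# The "shadow": every 5-D point collapses to a single number along w.
z0, z1 = X0 @ w, X1 @ w
print("projected class means:", z0.mean(), z1.mean())
```

With plenty of samples per dimension, as here, `Sw` is comfortably invertible. Shrinking the sample count towards `D` in this sketch is one way to watch the estimate become unstable.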

# Slicing high dimensional arrays

When we learned about matrices it wasn’t hard to think of slicing 2D matrices along rows and columns. But what about slicing higher-dimensional matrices? I always get confused when I go above three dimensions. Interestingly, for me, an easy way to imagine slicing such high-dimensional matrices is by using trees and thinking of how…
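One possible reading of the tree picture, sketched with NumPy: treat each axis as one level of a tree, so that every index you supply descends one level and every `:` keeps all branches at that level. The array shape and variable names below are assumptions chosen for illustration, not taken from the post.

```python
import numpy as np

# A 4-D array: 2 "roots", each with 3 children, each with 4 children,
# each holding 5 leaf values.
a = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)

# Fix axis 0 and axis 2, keep every branch on axes 1 and 3.
s = a[1, :, 2, :]
print(s.shape)  # (3, 5)

# The same slice taken one level at a time: descend into root 1,
# then pick child 2 on what is now the second axis of the subtree.
subtree = a[1]          # shape (3, 4, 5)
same = subtree[:, 2]    # shape (3, 5)

# Ellipsis stands for "all remaining levels": first leaf of every branch.
first_leaves = a[..., 0]

# The nested-list form of the array is literally the tree.
tree = a.tolist()
leaf = tree[1][2][3][4]
```

Walking the nested lists one index at a time reaches the same element as the single tuple index `a[1, 2, 3, 4]`, which can make it easier to keep track of which axis a given slice position refers to.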

# Beautiful Data

Why do we visualize data? One reason is that it can be aesthetically pleasing. The beauty of data, of course, is in the eye of the data collector. Often data looks beautiful to me simply because I put a lot of effort into collecting it. But in general, I consider data visualization as one of…

# Ball and Chain

I have some data where each point consists of two values (x,y). A large bunch of the (x,y) values are likely to come from a distribution centered around the origin. A few of the points may come from distributions where x is significantly greater than zero. I wanted to find out if y for those…