Why do we use ASCII files in genomics?

This rant was triggered when I was going over the format for FASTQ files. This is a pure rant: I propose no good solutions for anything here. I’m not even angry – just bemused. I’m a novice, so there are probably good reasons for keeping ASCII files for raw data which I just don’t know about. First,…

A lame adventure with progressiveCactus

Progressive Cactus is a set of tools that will align multiple DNA/protein sequences and save the result in the interesting HAL format. I decided to take it out for a spin. Compiling on Mac OS X Mavericks was easy (I just followed their Readme), except for one problem with wget, which was an easy fix. The command…

Random numbers in a parallel world

TL;DR: It’s always a great idea to explicitly initialize your random number generator, especially in a parallel computing environment. Random number generation in computing commonly uses pseudorandom number generators. These are iterated functions that take an initial number (a seed) and spit out a new number each time you call them. Each number is part…
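
To make the seeding point concrete, here is a minimal sketch using Python's standard `random` module. The names `worker_stream`, `base_seed` and `worker_id` are made up for illustration and are not from the post; the idea is only that each parallel worker builds its own generator from an explicit, distinct seed instead of relying on whatever default it inherits.

```python
import random

def worker_stream(base_seed, worker_id, n=3):
    """Return n pseudorandom floats for one (hypothetical) worker."""
    # Seed explicitly with a string unique to this worker. String seeds are
    # hashed deterministically by random.Random, so the same
    # (base_seed, worker_id) pair always reproduces the same stream.
    rng = random.Random(f"{base_seed}-{worker_id}")
    return [rng.random() for _ in range(n)]

# Four workers, four explicitly seeded, reproducible streams.
streams = [worker_stream(42, w) for w in range(4)]
```

Re-running with the same `base_seed` reproduces every stream, and changing `worker_id` changes the stream. For heavier numerical work, a library with dedicated support for spawning independent streams is probably a better choice than this hand-rolled scheme.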

Down the rabbit hole

I was putting the finishing touches on pre-processing some data for an analysis I was raring to do. The plan was to create some pretty pictures, get some insight, get this off my desk by noon and go into the weekend with no backlog and a clear conscience. But here I am, this…

What is Mutual Information?

The mutual information between two things is a measure of how much knowing one thing can tell you about the other thing. In this respect, it’s a bit like correlation, but cooler – at least in theory. Suppose we have accumulated a lot of data about the size of apartments and their rent and we…
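
As a rough illustration of "how much knowing one thing tells you about the other", here is a small sketch that computes mutual information (in bits) from a joint probability table, using the standard definition: the sum over all pairs of p(x, y) · log2(p(x, y) / (p(x) p(y))). The function name and the toy apartment/rent categories are invented for this example and are not data from the post.

```python
from math import log2

def mutual_information(joint):
    """Mutual information in bits for a dict mapping (x, y) -> probability."""
    # Marginal distributions p(x) and p(y), obtained by summing the joint.
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    # Zero-probability cells contribute nothing, so skip them.
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Toy case 1: size and rent are independent, so knowing one tells you nothing.
independent = {(s, r): 0.25 for s in ("small", "large") for r in ("low", "high")}

# Toy case 2: size determines rent exactly.
dependent = {("small", "low"): 0.5, ("large", "high"): 0.5}
```

In the independent case the result is zero bits; in the fully dependent case it is one bit, because learning the size removes all uncertainty about a 50/50 rent category.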