Raw (unaligned), paired reads in a BAM file

Since BAM files are the binary version of SAM files, which in turn stand for Sequence Alignment/Mapping, its a little strange to store unaligned data (the raw reads from the sequencer) in a BAM. However, as eloquently argued by others, the text based FASTQ format is showing its age and an indexed binary file is a…

Screen Shot 2014-03-26 at 3.05.42 PM

SAM! BAM! VCF! What?

As part of an exercise I’m doing I was supplied with a FASTQ file which contains a set of reads and I needed to figure out how to get a VCF file out. A what what now? Exactly. When we try to obtain the sequence of some DNA the big machine that we dump our DNA does…

Why do we use ASCII files in genomics?

This rant was triggered when I was going over the format for FASTQ files. This is a pure rant: I propose no good solutions for anything here. I’m not even angry – just bemused. I’m a novice, so there are probably good reasons for keeping ASCII files for raw data which I just don’t know about. First,…

A lame adventure with progressiveCactus

Progressive Cactus is a set of tools that will align multiple DNA/protein sequences and save to the interesting HAL format. I decided to take it out for a spin. Compiling on Max OS Mavericks was easy (I just followed their Readme), except for this one problem with wget, but it was a easy fix. The command…

Screen Shot 2013-11-22 at 11.20.34 AM

Random numbers in a parallel world

TL;DR: It’s always a great idea to explicitly initialize your random number generator especially in a parallel computing environment. Random number generation in computing commonly uses pseudorandom number generators. These are iterated functions that take an initial number (a seed) and spit out a new number each time you call them. Each number is part…

The_Scream

Down the rabbit hole

I was putting some finalizing touches to pre-processing some data in preparation for some analysis I was raring to do. The plan was to create some pretty pictures, get some insight, get this off my desk by noon and go into the weekend with no backlog and a clear conscience. But here I am, this…