functools.partial is one of those things that you don’t need until you need it, and then you need it badly. Trying to create a series of lambda functions I discovered that the simplest answer that came to my mind did something counter intuitive, and functools.partial – which I had ignored so far, came to my rescue.

Effectively, I wanted to create a list of functions

f1(x), f2(x), f3(x) ...

where each function took the same input but did something different, for example

f1(x) = x + 1, f2(x) = x + 2, f3(x) = x + 3

What I tried was as follows

fl = [lambda x: x + i for i in range(4)]

It ran fine, but it gave me the following output

fl[0](1) -> 4
fl[1](1) -> 4
fl[2](1) -> 4
fl[3](1) -> 4

What was happening was that all the lambda functions were taking the last value of i (as a global), rather than the value of i at the time the function was created.

An object oriented person would probably do something like

class F:
  def __init__(self, i):
    self.i = i
  def f(self, x):
    return x + self.i

fl = [F(i) for i in range(4)]

But I enjoy Python because it borrows a lot of things from LISP and allows you to write a lot of code very succinctly, making it easier to grasp (if done well – it’s also possible to write Perl in Python)

Turns out the way you can capture the state of i in the lambda is to create a partial function, and functools.partial allows you to do this succinctly as

fl = [functools.partial(lambda x, j: x + j, j=i) for i in range(4)]

For those who are curious, WHY I had to do this – well, here is an analogy to my use case: I had data that was well expressed as a table of values. The data itself was stored as a list of dictionary which was sometimes nested – some of the values formed part of a hierarchy. For example, one row of the data could look like this:

row = {
 'name': 'Baker',
 'rank': 22,
 'score': {
   'raw': 43,
   'scaled': 25

I wanted a table with columns

Last Name | Rank | raw score | scaled score |
    Baker |   22 |        43 |            25|

The actual table I was creating was meant to show ALL the data from an experiment with many variables and so had a lot of columns. I was wary of making a mistake in labeling the column relative to the value it had. It would be one of those silent and deadly errors.

If I had a simple flat structure, I might simply have changed the code such that the dictionary keys matched the column headers, weighing the fact that there was other code – already written and debugged – that used the existing keys and that the existing keys were short and succinct and I liked them.

My solution was to create a list of tuples

[(column header, function to extract column value from dictionary)]

And then simply loop over this list, first to create the header, and then for each row, to extract the data value into it’s correct place. Having the column header right next to the extraction function minimized the chances I would mess up somewhere, especially when adding or moving a column when editing or refactoring the code.

Common Lisp doesn’t yield (but it maps)

I didn’t use Python’s yield statement until I’d been programming for a few years. Within a few days of learning it, however, I could’t live without it. I find Python generators and list/dict comprehensions one of the key things to making Python code compact and readable. I was pretty bummed to learn that Common Lisp does not yield to this kind of program structure. The clever pun not-withstanding, I was happy to learn that common lisp has a different paradigm: a map-like one.

Consider the following straightforward problem: I have a text file and am interested in the string of characters (called a “read”) present on every fourth line. For each such read I’m interested in getting each K character long section (called a “kmer”) (bioinformatics savvy readers will recognize the file as a FASTQ file, and the K character long sections as k-mers from the reads in the FASTQ.

In Python there’s a very elegant way to abstract this extraction of data from the FASTQ file using generators (which is what yield allows us to create)

def read_fastq(fname):
  with open(fname, 'r') as fp:
    for n, ln in enumerate(fp):
      if (n - 1) % 4 == 0:
        yield ln.strip()

def kmers_from_seq(seq, kmer_size):
  assert len(seq) > kmer_size
  for n in range(len(seq) - kmer_size + 1):
    yield seq[n:n + kmer_size]

def kmers_from_fastq(fname, kmer_size):
  for seq in read_fastq(fname):
    for kmer in kmers_from_seq(seq, kmer_size):
      yield kmer

Note how nonchalantly I nest the two generators which allows me to do:

for kmer in kmers_from_fastq(fastq_fname, kmer_size):

Update: My first run through is now appended to the end of the main article. Rainer Joswig saw my original post and showed me the Lisp way to do this, which turns out to be to use a nested map like pattern. Many thanks to Rainer! My only reservation was the medium of instruction – a series of tweets. It might work for the 45th president, but I personally prefer the more civilized medium of post comments, which creates a more coherent record.

Rainer’s example code is here. I’ve minorly changed the code below:

(defun map-to-seq-in-fastq (in fn)
  "Apply fn to every second line of the FASTQ file (seq line)"
  (flet ((read-seq-line (in)
	       (read-line in nil)
	       (read-line in nil)
	     (read-line in nil)
	     (read-line in nil))))
    (loop for seq = (read-seq-line in)
	 while seq do (funcall fn seq))))

(defun map-to-kmer-in-seq (seq kmer-size fn)
  "Apply fn to all kmers of length kmer-size in seq."
  (let ((len (length seq)))
      for pos from 0
      for end = (+ pos kmer-size)
      while (<= end len)
      do (funcall fn (subseq seq pos end)))))

(defun map-to-kmer-in-fastq (in kmer-size fn)
   (lambda (seq)
     (map-to-kmer-in-seq seq kmer-size fn))))

You can see what is happening. Computation functions are being passed in, Russian doll-like, into a nest of other functions that loop over some source of data. I get it, but it’s still not as neat looking as Python.

Appendix: my first run through using closures.

Well, I can’t do things like this in Common Lisp. Not easily and not with built-ins.

I’m pretty surprised at this. This has got to be the first idiom I know from Python that I can not replicate in CL. The closest I could get was via a closure:

;; generator (closure) based on
(defun fastq-reader (fname)
  (let ((in (open fname)))
    #'(lambda ()
	(prog2  ; <-- prog1 and prog2 are nifty!
	    (read-line in nil)
	    (read-line in nil)  ; <-- this is the sequence line
	  (read-line in nil)
	  (read-line in nil)))))	; using nil for return value

This can be called using the loop macro as follows

;; loop based on
(defparameter gen (fastq-reader "../../DATA/sm_r1.fq"))
(loop for seq = (funcall gen) ; <-- Note this
   while seq do (format t "~A~%" seq))

Similarly, I can write the kmer extraction as a closure

(defun kmer-gen (seq &key (kmer-size 30) (discard-N? nil))
  (let ((n 0) (n-max (- (length seq) kmer-size)))
   #'(lambda ()
	   (if (< n n-max)
	       (subseq seq n (+ n kmer-size))
	 (incf n)))))

And combine the two closures as:

(defparameter gen (fastq-reader "../../DATA/sm_r1.fq"))
(loop for seq = (funcall gen)
   while seq do
     (let ((g (kmer-gen seq :kmer-size 10)))
       (loop for s = (funcall g)
	    while s do (format t "~A~%" s))))

It’s not terrible as the code still looks compact and readable but it’s not as nicely abstracted away as the nested Python generator.

(A complete non sequitor: Remember how I had issues with [square brackets] and (parentheses) for array indexing when I moved from MATLAB to Python? Well now I have this issue with prefix and infix notation. I keep trying to write (print kmer) and print(kmer) is starting to look plain weird …)

Some notes on using Python multiprocessing

This is a short note describing a common code pattern useful for parallelizing some computations using the Python multiprocessing module.

Many problems are of the embarrassingly parallel type, where the task consists of the same set of computations done independently on a large set of input data. In many languages such problems are solved using multi-threaded code.  Due to events too complicated and embarrassing to mention Python doesn’t do too well with multiple threads, and we need to use multiple processes via the multiprocessing module instead.

A common pattern is to set up a number of workers and use queues to pass data back-and -forth. The pattern looks like this:


The use of this pattern is relatively straight forward but there are some things to pay attention to.

Sharing read only data

The general topic of sharing data between processes is worthy of deeper discussion, and there are some pointers in the Python documentation but there is an easy way to share read only data between processes – use globals that are written to in the parent process BEFORE the child processes are started.

If you declare a global variable and then modify it after the child processes are started, or if you modify the global from the child process you’ll enter into the twilight zone. There will be no errors and from within the process where you do the writing everything will look consistent, but the rest of the universe will not pay any attention to your writes.

Queues have overhead – the object has to pickled by the sender and then un-pickled by the receiver, and while the object is in the queue, it’s taking up memory. If you have a large amount of read-only data that needs to be shared amongst child processes – like a lookup table of primes, for example – this pattern of using globals can be pragmatic.


Explicitly set Queue size

On Mac OS X the default size of the queue is  32767 items. On Linux, the default size is … INFINITY. I learned this the hard way. I usually prototype my code on my mac, and then deploy to a linux container that runs god knows where on data that is several orders of magnitude larger than my prototype set. On Linux things SEEMED to run fine but the size of the result set was abnormally small compared to the input. After a lot of poking around I discovered this property of the Python implementation and what was happening was that in Linux my queues got larger and larger, taking up more and more RAM and the OS started to kill off the child processes. Setting an explicit queue size allows the processes to block when the queue becomes full, capping memory usage.

Wrap the child process function using traceback

Error messages from child processes are messy and uninformative. Importantly, the friendly python stack trace print out is missing. A fairly convenient pattern is to wrap the actual function call within a try-except block in thin outer function, using the traceback module, like so:

def func_w(args):
  """A thin wrapper to allow proper tracebacks when things go wrong in a thread

  :param args:
  import traceback
    return func(**args)
  except Exception as e:
    raise e

Check for normal termination from the child

One of the reasons it took me so long to track down the above mentioned memory error was that when the child processes was being killed by the OS no Python error was raised. The principled way to check for normal termination in is to look at the exitcode value of the process.

When not to use this pattern

Passing data via queues can be expensive in terms of computation and memory. If your individual workers have to pass a lot of data back and forth – say large submatrices of an even larger matrix – it may be neater and faster to use shared memory objects as supplied by the multiprocessing module which takes care of semaphores for you.


I think repr() is Python’s genius touch. What?! I’m sure you’re yelling at the screen right now, incoherent as items from the long list of things you love about Python fight each other to get gain command of your voice. But it’s what I think. And as you know, Lisp got there first.

Now, of course, you’re really mad at me. But listen. What do we use Python a lot for? We use it for anything that requires our language to be nimble, fluid and user friendly, like data analysis and algorithm prototyping. And why do we use Python? Because it has a REPL. We’ll think of something to do, we’ll write a set of functions and classes to do it (sometimes never leaving the REPL) and then run our data on it.

The key feature of the REPL is being able to see the results of a computation immediately printed below what we did. This is a user comfort feature, and user comfort has a disproportionate impact on our motivation to keep trying out new ideas.

Python’s built in types have decent printed representations, but these can get crowded (dicts with many keys get hard to read quickly, for example). And what about user defined classes? Often, the results of a computation are not a few strings or simple, single numbers. The result of adding two arrays or two tables can be visually complex, for instance. Representing these as unreadable combinations of Python’s built in types or equally unreadable pointer ids starts to defeat the purpose of the interpreted, REPL workflow.

To mitigate this, Python supplies us with the special repr() method. When we create a class we don’t have to define this method, but if we do define it, making it return a succinct string which we find useful as a summary of the object, at the REPL we are then rewarded with these succinct, readable representations.

For example, say we are doing some work with vectors and are interested in how the magnitude of the resultant vector changes as we add vectors together. We could do this as follows:

class Vect:
  def __init__(self, x, y):
    self.x, self.y = x, y


  def __radd__(self, other):
    return Vect(self.x + other.x, self.y + other.y)


  def __add__(self, other):
    return Vect(self.x + other.x, self.y + other.y)


  def __repr__(self):
    return &quot;({}, {}):{:0.3f}&quot;.format(self.x, self.y, (self.x ** 2 + self.y ** 2) ** 0.5)

There are some things here in addition to repr which enable us to do vector additions. We will ignore those here.

So now when we create some vectors, we get a nice printout

Vect(3,4)   # -&gt; (3, 4):5.000
Vect(5,12)  # -&gt; (5, 12):13.000

This feature plays wonderfully with Python’s built in types, so a Set of these classes, a dict with these classes as keys or values, or a simple list of these classes all work as you would hope.

[Vect(3,4), Vect(5,12)]       # -&gt; [(3, 4):5.000, (5, 12):13.000]
{Vect(3, 4), Vect(5, 12)}     # -&gt; {(3, 4):5.000, (5, 12):13.000}
{Vect(3, 4): 'A', Vect(5, 12): 'B'}  # -&gt; {(3, 4):5.000: 'A', (5, 12):13.000: 'B'}
sum([Vect(3,4), Vect(5,12)])  # -&gt; (8, 16):17.889

PS. I forgot – I dropped a teaser about how Lisp got there first (as usual). In Common Lisp when you define a structure, the structure gets a default “printer” which you can override:

(defstruct vect x y)    ; -> uses default printer

(make-vect 😡 3 :y 4)   ; -> #S(VECT :X 3 :Y 4)

Now we can replace this with our own printer

(defstruct vect x y)    ; -> uses default printer

(make-vect 😡 3 :y 4)   ; -> #S(VECT :X 3 :Y 4)

Python, global state, multiprocessing and other ways to hang yourself

C and C++ were supposed to be the “dangerous” languages. There’s Stroustrup’s quote, for example, widely circulated on the internet. However, Stroustrup clarifies that his statement applies to all powerful languages. I find Python to be a powerful language that, by design, protects you from “simple” dangers, but lets you wander into more complex dangers without much warning.

Stroustrup’s statement in more detail is (from his website):

“C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do it blows your whole leg off”. Yes, I said something like that (in 1986 or so). What people tend to miss, is that what I said there about C++ is to a varying extent true for all powerful languages. As you protect people from simple dangers, they get themselves into new and less obvious problems.

The surprising thing about Python I’d like to briefly discuss here is how the multiprocessing module can silently fail when dealing with shared state.

What I effectively did was have a variable declared in the parent process, passed to the child process which then modified it. Python happily lets you do this, but changes to the variable are not seen by the parent process. As this answer to the question on stackoverflow explains:

When you use multiprocessing to open a second process, an entirely new instance of Python, with its own global state, is created. That global state is not shared, so changes made by child processes to global variables will be invisible to the parent process.

I was thrown though, by two additional layers of complexity. Firstly as the example code below shows, I shared the state in a somewhat hidden manner – I passed an instance method to the new process. I should have realized that this implicitly shares the original object – via self – with the new process.

Secondly, when I printed out the ids of the relevant object in the parent and child processes the ids came out to be the same. As the documents explain:

CPython implementation detail: This is the address of the object in memory.

The same id (= memory address) business threw me for a bit. Then I heard a faint voice in my head mumbling things about ‘virtual addressing’ and ‘copy-on-write’ and ‘os.fork()’. So what’s going on here? A little bit of perusing on stack overflow allows us to collect the appropriate details.

As mentioned above, Python, because of some implementation reasons (keyword: Global Interpreter Lock – GIL) uses os.fork() to achieve true multiprocessing via the multiprocessing module. fork() creates an exact copy of the old process and starts it in a new one. This means, that, for everything to be consistent, the original pointers in the program need to keep pointing to the same things, otherwise the new process will fall apart. But WAIT! This is chaos! Now we have two identical running processes writing and reading the same memory locations! Not quite.

Modern OSes use virtual addressing. Basically the address values (pointers) you see inside your program are not actual physical memory locations, but pointers to an index table (virtual addresses) that in turn contains pointers to the actual physical memory locations. Because of this indirection, you can have the same virtual address point to different physical addresses IF the virtual addresses belong to index tables of separate processes.

In our case, this explains the same id() value and the fact that when the child process modified the object with the same id value, it was actually accessing a different physical object, which explains why it’s parent doppelganger now diverges.

For completeness, we should mention copy-on-write. What this means is that the OS cleverly manages things such that initially the virtual addresses actually point to the original physical addresses – as if you copied the address table from the original process. This allows the two processes to share memory while reading (and saves a bunch of copying). Once either of the processes writes to a memory location, however, a bunch of copying is done and the relevant values now reside in a new memory location and one of the virtual tables are updated to reflect this.

Which brings us to the question: what the heck am I doing worrying about addresses in Python?! Shouldn’t this stuff just work? Isn’t that why I’m incurring the performance penalty, so I can have neater code and not worry about the low level details? Well, nothing’s perfect and keep in mind Stroustrup’s saw, I guess.

Also, never pass up a learning opportunity, much of which only comes through the school of hard knocks. It also gives a dopamine rush you’ll not believe.  I wonder what kind of shenanigans you can get into doing concurrent programming in Lisp, hmm …

So, what about the other ways to hang yourself, as promised in the title? Oh, that was just clickbait, sorry. This is all I got.

Code follows:

import Queue as Q
from multiprocessing import Process, Queue
import time

def describe_set(s):
  return 'id: {}, contents: {}'.format(id(s), s)

class SetManager(object):
  def __init__(self):
    self.my_set = set()
    print('Child process: {}'.format(describe_set(self.my_set)))

  def add_to_set(self, item):
    print('Adding {}'.format(item))
    print('Child process: {}'.format(describe_set(self.my_set)))

def test1():
  print('\n\nBasic test, no multiprocessing')

  sm = SetManager()
  print('Parent process: {}'.format(describe_set(sm.my_set)))

  print('Parent process: {}'.format(describe_set(sm.my_set)))

class SetManager2(SetManager):
  def __init__(self, q, q_reply):
    super(SetManager2, self).__init__()
    # SetManager.__init__(self)
    self.keep_running = True

    self.q, self.q_reply = q, q_reply
    self.sp = Process(target=self.loop, args=())

  def loop(self):
    while self.keep_running:
        msg = self.q.get(timeout=1)
        if msg == 'quit':
          self.keep_running = False
        elif msg == 'peek':
      except Q.Empty:

def test2():
  print('\n\nMultiprocessing with method set off in new thread')

  q, q_reply = Queue(), Queue()

  sm = SetManager2(q, q_reply)
  print('Parent process: {}'.format(describe_set(sm.my_set)))

  print('Parent process: {}'.format(describe_set(sm.my_set)))

  print('Parent process: {}'.format(describe_set(sm.my_set)))

  print('Reply from child process: {}'.format(describe_set(q_reply.get())))


class SetManager3(SetManager2):
  def __init__(self, q, q_reply):
    super(SetManager2, self).__init__()
    self.keep_running = True
    self.q = q
    self.q_reply = q_reply

def start_set_manager3_in_process(q, q_reply):
  sm = SetManager3(q, q_reply)

def test3():
  print('\n\nMultiprocessing with object created in new thread')

  q, q_reply = Queue(), Queue()

  sp = Process(target=start_set_manager3_in_process, args=(q, q_reply))
  # print('Parent process: items are: {}'.format(sm.my_set))



  print('Reply from child process: {}'.format(describe_set(q_reply.get())))



Chasing down a Cython error

My python program segfaulted. When I wrote C++ programs segfaults were no big deal. I would hook into gdb mostly through my IDE’s debugger, and dance around the point the program crashed until I had an idea of where I was going wrong. Python gives a wonderful stack trace and tells you where things have gone wrong, so you can just look into the source code from the information Python spits out before it dies. But not for C-extensions.

When things go wrong in a Cython or C module that you are calling from Python you are stuck, a little bit, in no-mans land. Fortunately, from a stack-overflow thread here, you can just run Python in gdb, run your code and then debug it all as a C program (Python) operating on some data (Your Python program).

Unfortunately, I’m on Mac OS X, and though I could use homebrew to install gdb, I could not get it to run my python interpreter due to a binary compatibility issue. I decided to be brave and use LLDB (which comes with Mac OS X) to try and debug this error.

lldb python

This attaches Python to the debugger. Then you can run the Python program as follows

run /path/to/my/script arg1 arg2

My program crashed with

Fatal Python error: GC object already tracked
Process 59193 stopped

And some instructions in assembly which weren’t so helpful. However, I could do

thread backtrace

Which then led me to some candidate functions in the c-source file of the translated Cython code, like:

   frame #4: 0x000000010004d658 Python`PyTuple_New + 252
    frame #5: 0x00000001036ad6af`__Pyx_PyObject_CallOneArg(_object*, _object*) [inlined] __Pyx__PyObject_CallOneArg(arg=&lt;unavailable&gt;) + 5 at util_cython.cpp:7694
    frame #6: 0x00000001036ad6aa`__Pyx_PyObject_CallOneArg(func=0x000000010456f170, arg=0x0000000104c62a90) + 149 at util_cython.cpp:7712
    frame #7: 0x00000001036b1480`__pyx_pw_5mitty_3lib_11util_cython_9markov_sequences(_object*, _object*, _object*) [inlined] __pyx_f_5mitty_3lib_11util_cython_markov_chain_sequence_gen(__pyx_v_first_letter='C', __pyx_v_l=&lt;unavailable&gt;, __pyx_v_max_len=10, __pyx_v_rng=&lt;unavailable&gt;) [5], unsigned char*, unsigned long*, unsigned long, _object*) + 110 at util_cython.cpp:2898
    frame #8: 0x00000001036b1412`__pyx_pw_5mitty_3lib_11util_cython_9markov_sequences(_object*, _object*, _object*) [inlined] __pyx_pf_5mitty_3lib_11util_cython_8markov_sequences(__pyx_v_max_len=0x0000000100300678) + 2150 at util_cython.cpp:3710
    frame #9: 0x00000001036b0bac`__pyx_pw_5mitty_3lib_11util_cython_9markov_sequences(__pyx_self=&lt;unavailable&gt;, __pyx_args=&lt;unavailable&gt;, __pyx_kwds=&lt;unavailable&gt;) + 6070 at util_cython.cpp:3263
    frame #10: 0x000000010008614d Python`PyEval_EvalFrameEx + 8080
    frame #11: 0x0000000100084093 Python`PyEval_EvalCodeEx + 1641

Which was enough information for me to look more carefully at my markov_chain_sequence_gen function.

While x:

This is a cautionary tale about not misusing Python’s kindness. Or perhaps a cautionary tale about not trusting even widely used libraries to mirror what Python does. It is a little technical …

I have a program that, for the purposes of this post, operates on a list of pairs of numbers, going through each pair one by one until it exhausts the list. For certain reasons I’m using an iterator on the list and using Python’s ‘None’ to indicate that the list has run out, like so:

x = [(0, 0), (1, 2), (3,1)]
itr = x.__iter__()
v = next(itr, None)
while v:
  print v
  v = next(itr, None)

The pair of numbers represents particular data. The first number can range from 0 onwards and is always increasing monotonically while the second number is one of {0, 1, 2}. I have several tests for this function and have checked for different edge cases.

As you can see, in the snippet above, the list x is a list of Python tuples. After the bulk of the development for this (and related) functions had been completed, I had decided that a better way to store my list of tuples would be as a numpy structured array. That way I would have labels for my first and second numbers in the pair and wouldn’t get confused which was which six months from now.

And Python, being the great dynamic language it is, let me make the switch without a murmur. Python doesn’t care that the object I was initially passing it was a list of tuples and the new one is a numpy structured array! Python sees both objects yield iterators as needed and both objects are spitting out pairs of numbers at each iteration. So Python just works!

Things ran just fine for a while.

One day my colleague, who has used my program many times before, came across some very strange results in their experiments. I knew right away that this was going to be an interesting bug, just like any bug that is found after a program has been used for a while. The simple/easy bugs have all been squashed. What remains are the cockroaches. Evil, secretive errors in logic that scurry way from the daylight and come out only in the shadows of the night.

So I took the input data my colleague was using and ran my program. Indeed something bizarre was going on. For some of the data the program was acting exactly as it should. But for some of the data the program claimed that the list of pairs of numbers was empty and would exit right away. It would do the rest of the processing without any errors.

My first guess was that something was wrong with the input files. The input files were being generated with some hastily written code that I had not fully tested, so it was the logical choice, though this would make the bug less interesting.

I reviewed the input data and found that it was just fine. So the input to the program was correct. The program was just borking, seemingly, randomly.

I then started to do things the old fashioned way. I started to put in print statements at various parts of the code and simply print the length of my array of pairs of numbers. This was being done on a remote machine and I hadn’t learned yet how to hook my debugger into the remote machine. Also, sometimes, print statements are just a more pragmatic way to debug, regardless of what the eggheads will tell you.

The program loads the list. A million elements in the list. Good. The program does a transformation on the list. A million elements. Good. Another transformation. Still a million little pairs of numbers, hanging on for dear life. Then they enter this piece of code I sketched out above. And BAM! the program skips the loop, grinning as it whizzes by, zero little pairs of numbers.

I run the program again, this time with a different input set. A hundred thousand elements. Passes first transform, second transform, and then my loop. Round and round it goes in the loop, a hundred thousand times before shooting out, all dizzy at the end, having done exactly what it was supposed to do.

Very interesting.

What is different between the two data sets that would make the code DO this? So I opened by the data and started to look closely at the ones that failed and the ones that succeeded and then it struck me. All the ones that failed had (0, 0) as their first element.

And then it was clear.

When I wrote the loop:

itr = x.__iter__()
v = next(itr, None)
while v:
  print v
  v = next(itr, None)

I was a bit lazy. What I really meant was, “exit the loop if v is None”. Python is very kind and None evaluates to False in this test, so it is equivalent. The problem is that the loop will also terminate if v == 0. But, wait this is actually not a problem because my v is not actually a scalar number. It is a pair of numbers and during my testing I have repeatedly verified that (0, 0) != 0. One is a scalar value – zero – while the other is a Python object – a tuple. That can’t possibly be the problem!

But wait! Halfway through development I told you I switched from using Python lists of tuples to numpy arrays.

So I looked a bit into that, and BAM, sure enough, that’s where the behavior differs. For some reason the numpy library has been written to evaluate an element of a structured array the same as zero if both their elements are zero. This is different from taking a slice of a multi-dimensional array, where Python will complain that “ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()”

This is what still throws me. When I ask numpy for an element of my structured array of pairs of numbers, I get back a pair of numbers. These numbers are not simply another numpy array of two elements – which is what you would get from slicing a 2D numpy array. It’s a pair of numbers. However, this pair does not act as a Python tuple. It’s a different beast altogether.

Well, folks, the moral of the story is, don’t be lazy. If you want to test for None, test it explicitly. You never know what evaluates to 1 or to 0 even if you think you do.

If you would like to see this bug for yourself, here is a sufficient example:

import numpy as np

def looper(x):
  itr = x.__iter__()
  v = next(itr, None)
  while v:
    print v
    v = next(itr, None)

def python_list():
  print 'Python list'
  l = [(0, 0), (1, 0), (2, 2), (3, 0)]

def numpy_struct_array():
  print 'Numpy array'
  l = np.array([(0, 0), (1, 0), (2, 2), (3, 0)], dtype=[('x', 'int32'), ('y', 'int8')])

if __name__ == '__main__':

Why would I ever write in C?

I’ve made computers do tricks in both C/C++ and Python for many years now. For the past few years I’ve written exclusively in Python for both work and play. I recently went back to a personal project to rewrite it in C++ and, for the first time in my life, thought, “Why would I ever write in C?”

I have two personal projects (A note taker/reference manager and a photo organizer) that I’ve been working on for over a decade now. I started both in C++ using the QT widget library. I moved the note taker to Ruby (on-rails, I had my browser-as-GUI phase) then to Python (bottle is an amazing and lightweight web framework, you should try it) and then I abandoned it in favor of paper and pencil (but I don’t do much manuscript writing anymore, which accounts for me not needing a reference manager and making do with paper notes).

The photo organizer I made two versions of in C++ using QT, and then I moved it to Python. I was quite happy with the Python version, but I decided I wanted some changes, and it seemed to be a new decade, so I started to itch to move it back into C++. QT had changed a lot since 2012 (which I think was the last time I messed with the C++/QT version of my photo organizer). There were a lot of new bells and whistles and I got excited and I wanted to sit down and see what I could do.

I wanted to start of with a library of core functions and an associated test suite. In Python, this kind of code is very easy to set up: I put the library in a source directory and the tests in a directory marked ‘tests’ and use nosetests to automatically discover and run them. The tests come about organically as short pieces of code I write in the REPL to test newly added features that smoothly build up the module

The first thing I realized is that I missed the REPL. For twenty years I had done without a REPL. I would code up a skeleton of an application, craft the makefile, have function stubs and then just make sure it compiled. I would then add functionality by fleshing out the stubs and adding new functions, refactoring into new files and so on. But now, just the thought of spending hours, perhaps days, setting everything up without any feedback, raised a barrier. It made things not fun. It was WORK! This is not a good thing for a project done in spare time, for enjoyment.

The next thing I missed were nosetests. The scene for unit tests is now much better for C++ and QT has it’s own little framework for writing and running tests. But they all look so CLUNKY! You have to declare a class, then include a header file, then make an application out of that. You have to tediously add every test separately, then add every function separately, then add every test case.

I guess, even with IDEs like QT Creator, there was too much boilerplate, too much gap between me writing some code and seeing what it did. The compile cycle didn’t help. Doxygen sounded OK, but there was no smile on my face when I went to download it and see how I would be using it in my project.


My first introduction to computers was via BASIC, then Pascal. I then went to C and C++. I had a class in 8085 assembly (We used kits packaged in neat wooden boxes. I enjoyed keying in the program and the data and then hitting go. My favorite project had been to write an infinite precision division program. I had seen Apollo 13 a few years back, and those kits and their interfaces reminded me of the DSKY).

My introduction to proper C++ was in a data structures and algorithms class, and the height of my C++ programming adventures was in a computer graphics class where I made animations out of particles in OpenGL and a kind of fly through of a complex scene. These were done on SGIs O2 machines, at the Indian Institute of Science’s Super Computer Center, where I would retire early each evening after dinner, because the O2 machines were limited in number and the cubicles would be quickly occupied.

I say all this, perhaps in a subconscious attempt to convince you that I’m not totally inexperienced in programming and do indeed intrinsically enjoy programming computers and understanding the art of programming computers, and don’t view it merely as a means to an end, which is to make these amazing calculating machines do clever tricks.

Screen Shot 2015-05-09 at 12.19.07 AM

But I think Python has spoiled me for ever in terms of the manner in which I enjoy computers. Writing C/C++ has become work. It’s something I would do wearing a suit and tie (not that I wear a suit and tie at work – which should tell you something). It’s something that I would do with a serious demeanor and a feeling, I imagine, mental patients have when wearing a straightjacket.

Boilerplate, boilerplate everywhere I look. Don’t smile. The doctors will take you away to the padded room. Better to look serious and keep your head down.

Performance you say? Well, to be honest, there are very few parts of my code where I need “C like” speed. And where I do, well, I can use CFFI, or more likely, I use Cython. It looks a little ugly, but hey, I have to write so little of it. The only annoying thing is inspecting the generated C code to make sure it’s getting rid of the dynamic Python calls.

Oh, yeah, pip install has spoiled me too. I ain’t chasing down your exotic library to read EXIF tag data in photo files. I haven’t got that kind of time. But I do have time to type pip install.

So, perhaps the only caveat I would add, in closing, is to say, the full title of this post should be “Why would I ever write in C for fun?”

How to shoot yourself in the foot with Cython

If you’ve grown fat and complacent using Python, try a little Cython. It’ll put hair on your chest and take some off your scalp.

Take the following Cython code I was writing (heavily paraphrased)

  char *s = seq  # seq is passed in to the function
  unsigned char i, j
  unsigned char sub_mat[85]

for i, j in zip([65, 67, 71, 84], [84, 65, 67, 71]):
  sub_mat[i] = j

return [ sub_mat[s[i]]  for i in range(len(seq))]

This code compiles file. It even runs fine for a few test cases I set up for it. But when I put it into a standard testing pipeline I use, it caused a different test to freeze up.

My first reaction was, I’ve lost a pointer somewhere. So I started to dig around. (That reminds me, there has got to be a easier way to hook a debugger up to Cython. The prescribed ways are so complicated). I started to sprinkle print statements in the code. Nothing.

I then broke the final list comprehension out into a loop and put print statements in there. Suddenly I saw that my program wasn’t going off the reservation by losing a pointer. It was actually looping over and over again at that point. And then the lightbulb went off. The loop variable i had been typed as an unsigned char (max value 255) and if len(seq) was ever greater than that it would just cause the loop to go on for ever as i overflowed and reset to 0 repeatedly.

Of course, those of you programming in C will have immediately noticed this issue and asked – “Does len(seq) ever get larger than 255?” But I’ve grown fat and lazy on Python which lets me do things like 1<<1024 without missing a beat (yes, that creates a 1025 bit integer. Suck it up C programmer), so, on occasion, I have to lose some hair from the top of my head when I slum it with all you C-coders.

Get more out of Cython

Getting more out of your Cython code doesn’t have to be a black art. A key step is to peek into the cythonized .c file and note if inner loops are getting down to bare C without any of the overheads associated with Python’s dynamism.

In an earlier post I had introduced Cython, which is a really cool language that straddles Python and C. Cython is Python with some ctype defs that helps a translator (Cython) to convert the Python code into a .c file which can be compiled and imported as a module. I’m one of the many folks who rave about this system because, just by changing .py to .pyx and running the code through the Cython system get get a 2x speedup for your functions. The next stage is to add some type declarations to your code. These give the real speedups.

I became curious as to what the type declarations actually do, and how they can help. Beyond curiosity, I was hoping that understanding this process would help me write more efficient Cython code.

I was working on a simple piece of code to determine the transition matrix for a DNA sequence. The DNA sequence is a string of characters from the set {A,C,G,T,N} (The N represents an unknown character, or, sometimes, a don’t care character). The transition matrix simply represents the probabilities that, in the sequence, character X will be followed by character Y, for every possible transition. This means a 5×5 matrix.

My first pass looked like this:

def compute_transition_matrix(bytes seq):
  # Given a string, computes the transition matrix description of that string
    unsigned long int t_mat[85][85]
    unsigned long int seq_len = len(seq), n, m

  for n in [65, 67, 71, 84, 78]:
    for m in [65, 67, 71, 84, 78]:
      t_mat[n][m] = 0

  for n in range(1, seq_len):
    t_mat[<unsigned char> seq[n - 1]][<unsigned char> seq[n]] += 1

  # Prefer to copy the data into a python list, rather than mess around with pointer conversion etc.
  return [[t_mat[n][m] for m in [65, 67, 71, 84, 78]] for n in [65, 67, 71, 84, 78]]

Inspecting the actg.c code generated by running cython on this file I found that my inner loop looked like this:

  /* "actg.pyx":17
 *       t_mat[n][m] = 0
 *   for n in range(1, seq_len):             # <<<<<<<<<<<<<<
 *     t_mat[<unsigned char> seq[n - 1]][<unsigned char> seq[n]] += 1
  __pyx_t_4 = __pyx_v_seq_len;
  for (__pyx_t_7 = 1; __pyx_t_7 < __pyx_t_4; __pyx_t_7+=1) {
    __pyx_v_n = __pyx_t_7;

    /* "actg.pyx":18
 *   for n in range(1, seq_len):
 *     t_mat[<unsigned char> seq[n - 1]][<unsigned char> seq[n]] += 1             # <<<<<<<<<<<<<<
 *   # Prefer to copy the data into a python list, rather than mess around with pointer conversion etc.
    if (unlikely(__pyx_v_seq == Py_None)) {
      PyErr_SetString(PyExc_TypeError, "'NoneType' object is not subscriptable");
      {__pyx_filename = __pyx_f[0]; __pyx_lineno = 18; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __pyx_t_8 = (__pyx_v_n - 1);
    __pyx_t_9 = __Pyx_PyBytes_GetItemInt(__pyx_v_seq, __pyx_t_8, 1); if (unlikely(__pyx_t_9 == ((char)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 18; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __pyx_t_10 = ((unsigned char)__pyx_t_9);
    if (unlikely(__pyx_v_seq == Py_None)) {
      PyErr_SetString(PyExc_TypeError, "'NoneType' object is not subscriptable");
      {__pyx_filename = __pyx_f[0]; __pyx_lineno = 18; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __pyx_t_9 = __Pyx_PyBytes_GetItemInt(__pyx_v_seq, __pyx_v_n, 1); if (unlikely(__pyx_t_9 == ((char)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 18; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __pyx_t_11 = ((unsigned char)__pyx_t_9);
    ((__pyx_v_t_mat[__pyx_t_10])[__pyx_t_11]) = (((__pyx_v_t_mat[__pyx_t_10])[__pyx_t_11]) + 1);

I know that the first thing you are thinking is that, Man this looks butt-ugly. Also gotos?!?! Well, the code is not really meant for human consumption AND there are some things in here that makes it easy to find what you are looking for. The first is that the name mangling preserves the core of the function name, so you can do a search for the function in an editor. The second thing, which turns out to be very, very important is that the original code is in comments along side the relevant C code. And for this I take my hats off the Cython folks, because they did not have to do this, but they put it in any way, which makes our inspection a LOT easier.

Now, you can see right away that the inner loop has a bunch of funny checks related to making sure variable types are correct and have appropriate attributes etc. And I thought, I don’t need this cruft, I KNOW this is a string of bytes – an array of chars.

My change was to explicitly tell Cython that my bytes seq variable was really an array of char (Note the char *seq = s):

def compute_transition_matrix(bytes s):
  """Given a string, computes the transition matrix description of that string"""
    unsigned long int t_mat[85][85]
    unsigned long int seq_len = len(s), n, m
    char *seq = s

  for n in [65, 67, 71, 84, 78]:
    for m in [65, 67, 71, 84, 78]:
      t_mat[n][m] = 0

  for n in range(1, seq_len):
    t_mat[<unsigned char> seq[n - 1]][<unsigned char> seq[n]] += 1

  # Prefer to copy the data into a python list, rather than mess around with pointer conversion etc.
  return [[t_mat[n][m] for m in [65, 67, 71, 84, 78]] for n in [65, 67, 71, 84, 78]]

The inner loop that results from this code, looks like:

  /* "actg.pyx":35
 *       t_mat[n][m] = 0
 *   for n in range(1, seq_len):             # <<<<<<<<<<<<<<
 *     t_mat[<unsigned char> seq[n - 1]][<unsigned char> seq[n]] += 1
  __pyx_t_5 = __pyx_v_seq_len;
  for (__pyx_t_8 = 1; __pyx_t_8 < __pyx_t_5; __pyx_t_8+=1) {
    __pyx_v_n = __pyx_t_8;

    /* "actg.pyx":36
 *   for n in range(1, seq_len):
 *     t_mat[<unsigned char> seq[n - 1]][<unsigned char> seq[n]] += 1             # <<<<<<<<<<<<<<
 *   # Prefer to copy the data into a python list, rather than mess around with pointer conversion etc.
    __pyx_t_9 = ((unsigned char)(__pyx_v_seq[(__pyx_v_n - 1)]));
    __pyx_t_10 = ((unsigned char)(__pyx_v_seq[__pyx_v_n]));
    ((__pyx_v_t_mat[__pyx_t_9])[__pyx_t_10]) = (((__pyx_v_t_mat[__pyx_t_9])[__pyx_t_10]) + 1);

Which looks A LOT BETTER.

This is reflected in some simple %timeit based benchmarking by running this code on Human chromosome 1. The first version takes 1.4s to run while the second version takes 0.27s – a 5x speedup, which corresponds with us taking all that dynamic cruft out of the inner loop.

One thing to note is that the PROPER way to do all this is profile your code first and find out where the bottle necks are. Sometimes doing %timeit on individual functions are all that is needed. Sometimes you will need to do a proper profile, and Cython as directives to allow you to profile Cython functions, but these can slow the code down.