Cythonize!

In this post I take Python code for a task (drawing fractals), write it in the spirit of Python (a pleasure to write, read and maintain), use profiling to figure out the bottlenecks, and then speed up the most time-consuming function using Cython.

Python is great for writing projects of all sizes. Python code is easy to write, read and maintain (when well written) and there are many packages people have written to cover almost any use case you might have. Python, however, is slow. You don’t really feel this when you prototype things, but when you throw real data at the Python code, you wonder if things could be faster.

When I first took up Python I learned about Boost.Python, SWIG and similar ways of wrapping C code so it can be called from Python. The boilerplate involved dampened my enthusiasm. Such boilerplate (and the slow compile/debug cycle) was the major reason I moved away from C/C++, which I otherwise quite liked.

Then, recently, I discovered Cython. Cython is pretty interesting: it is a superset of the Python language with type declaration directives. It turns out that dynamic typing is a major reason why Python is “slow”, and Cython lets us statically type the variables of our choice, getting rid of some of that overhead.

Consider a program to compute a representation of the Mandelbrot fractal:

def f(x, y, max_iter=1000):
  """
  z <- z^2 + c
  """
  c = x + y*1j
  z = 0 + 0j
  for n in range(max_iter):
    z = z**2 + c
    if abs(z) > 2:
      return 1.0 - float(n)/max_iter
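  # for/else: the else clause runs only when the loop finishes without hitting the early return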
  else:
    return 0.0

def mandelbrot(x0=-2.0, y0=-1.5, x1=1.0, y1=1.5, grid=50, max_iter=1000):
  dx, dy = float(x1 - x0)/grid, float(y1 - y0)/grid
  return [[f(x0 + n * dx, y0 + m * dy, max_iter) for n in range(grid)] for m in range(grid)]

This can be called and plotted as follows:

import pylab
pylab.imshow(mandelbrot(grid=500), cmap=pylab.cm.gray)

[Figure: the Mandelbrot set plotted on a 500×500 grid]

Once you run the code a few times you will figure out that things get slow real fast, because this is an O(n^2) algorithm: the number of computations grows as the square of the grid’s side length.
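To see the scaling concretely, here is a rough timing sketch (the grid values are arbitrary, and this assumes mandelbrot is defined as above):

import timeit

# work grows as grid**2, so doubling the side should roughly quadruple the time
for g in (50, 100, 200):
  t = timeit.timeit(lambda: mandelbrot(grid=g), number=1)
  print("grid=%4d: %.2f s" % (g, t))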

Now, since this is a reduced example, we can tell by simple inspection that the bottleneck function here is f. In a real-world case you would make sure, perhaps using Python’s useful cProfile module, where the bottleneck really is.

An easy way to do that is to import the code we just wrote and then run it using the profiler.

import cProfile
cProfile.run('mandelbrot(grid=100)', 'stats')

This will profile the run of the function and save the statistics to the file ‘stats’. We then use the pstats module to interpret and print these statistics, choosing to order them by the total time spent in each function, since we are trying to figure out where the computer spends most of its effort.

import pstats
p = pstats.Stats('stats')
p.sort_stats('tottime').print_stats(10)

Leading to an analysis that looks a bit like:

In [87]: run mandelbrot.py
Sat Jul 19 19:41:57 2014    stats

         1757763 function calls in 0.755 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    0.557    0.000    0.748    0.000 .../mandelbrot.py:2(f)
  1737658    0.138    0.000    0.138    0.000 {abs}
    10101    0.053    0.000    0.053    0.000 {range}
        1    0.007    0.007    0.755    0.755 .../mandelbrot.py:15(mandelbrot)
        1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.zeros}
        1    0.000    0.000    0.755    0.755 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

As we guessed, the inner-loop function f is the main CPU hog. In the past I would have stopped there. Sure, I could rewrite the inner loop in C – which would be satisfactorily efficient – but the headaches of making it work with Python! Cython, however, lets us get a speedup with a minimum of fuss.

For a full introduction to Cython you should follow the official tutorial, but here is one approach to using Cython to speed up the code we just wrote.

First, we simply move the bottleneck function into its own Cython module, indicated by the .pyx extension. Since Cython is a superset of Python, for this first pass we leave f as it is, creating the file

mandel_f.pyx
def f(x, y, max_iter=1000):
  """
  z <- z^2 + c
  """
  c = x + y*1j
  z = 0 + 0j
  for n in range(max_iter):
    z = z**2 + c
    if abs(z) > 2:
      return 1.0 - float(n)/max_iter
  else:
    return 0.0

We then modify our original script to use the cythonized function:

import pyximport
pyximport.install(pyimport=True)
from mandel_f import f

def mandelbrot(x0=-2.0, y0=-1.5, x1=1.0, y1=1.5, grid=50, max_iter=1000):
  dx, dy = float(x1 - x0)/grid, float(y1 - y0)/grid
  return [[f(x0 + n * dx, y0 + m * dy, max_iter) for n in range(grid)] for m in range(grid)]

There are several things happening here all at once, and there is more than one way to do this. Whatever code you put in the .pyx file is taken to be Cython code; since Cython is a superset of Python, we left the Python code as-is this time and the Cython system does just fine with it. Cython first translates the .pyx file into C, then compiles it into a library with the appropriate hooks into Python, and finally we import this compiled library into our code.

We could do this manually, but if our code is simple we can use the pyximport module.
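For the record, the manual route is not much more code. A minimal setup.py, following the standard Cython recipe (a sketch):

from distutils.core import setup
from Cython.Build import cythonize

# translate mandel_f.pyx to C and compile it into an importable extension
setup(ext_modules=cythonize("mandel_f.pyx"))

Running python setup.py build_ext --inplace then leaves the compiled module next to the source, ready to be imported like any other module.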

Executing pyximport.install(pyimport=True) causes any imported Cython file to be compiled automatically, hiding the whole process from us. Therefore, the statement from mandel_f import f causes mandel_f.pyx to be compiled and imported.

Profiling this code yields

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    0.348    0.000    0.348    0.000 {mandel_f.f}
    ...

Which is a small speedup – and we haven’t really done anything yet.

Suppose we start to put some type information into the code:

def f(float x, float y, int max_iter=1000):
  """
  z <- z^2 + c
  """
  c = x + y*1j
  z = 0 + 0j
  cdef int n
  for n in range(max_iter):
    z = z**2 + c
    if abs(z) > 2:
      return 1.0 - float(n)/max_iter
  else:
    return 0.0

The function signature now resembles a C signature, and in the body we have declared n to be an int. Profiling yields

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    0.243    0.000    0.243    0.000 {mandel_f2.f}

Which is a modest 2x speedup over the original pure Python code.

Now we will do something that is a bit messy. See how we left the complex numbers alone? We can try to cythonize those too.

cdef extern from "complexobject.h":

  struct Py_complex:
    double real
    double imag

  ctypedef class __builtin__.complex [object PyComplexObject]:
    cdef Py_complex cval

def f(float x, float y, int max_iter=1000):
  """
  z <- z^2 + c
  """
  cdef complex c = x + y*1j
  cdef complex z = 0 + 0j
  cdef int n
  for n in range(max_iter):
    z = z**2 + c
    if abs(z) > 2:
      return 1.0 - float(n)/max_iter
  else:
    return 0.0

Profiling this we get

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    0.186    0.000    0.186    0.000 {mandel_f3.f}

which is around a 3x speedup.

A proper comparison, of course, would be to rewrite the function in C and then profile that wrapper.
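Short of a hand-written C rewrite, we can get close by keeping everything in the inner loop at the C level. Here is a sketch (the module name mandel_f4 is my own invention; it uses Cython’s native C complex type instead of the extern declaration above, and I haven’t profiled it):

mandel_f4.pyx
def f(double x, double y, int max_iter=1000):
  """
  z <- z^2 + c
  """
  cdef double complex c = x + y*1j
  cdef double complex z = 0
  cdef int n
  for n in range(max_iter):
    z = z*z + c
    # compare |z|^2 against 4 to avoid the square root hidden in abs()
    if z.real*z.real + z.imag*z.imag > 4.0:
      return 1.0 - float(n)/max_iter
  return 0.0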

I would like to do one more thing before we leave. Function calls in Python have a lot of overhead, and this function is called heavily. Let’s see how much we gain by vectorizing our function: moving the loops inside it maintains the neatness of the function while also reducing the function call overhead.

def f(x_list, y_list, max_iter=1000):
  """
  z <- z^2 + c
  """
  row = []
  for y in y_list:
    col = []
    for x in x_list:
      c = x + y*1j
      z = 0 + 0j
      for n in range(max_iter):
        z = z**2 + c
        if abs(z) > 2:
          rv = 1.0 - float(n)/max_iter
          break
      else:
        rv = 0.0
      col.append(rv)
    row.append(col)
  return row

def mandelbrot(x0=-2.0, y0=-1.5, x1=1.0, y1=1.5, grid=50, max_iter=1000):
  dx, dy = float(x1 - x0)/grid, float(y1 - y0)/grid
  return f([x0 + n * dx for n in range(grid)] , [y0 + m * dy for m in range(grid)] , max_iter)

def quick_plot(grid=100):
  import pylab
  pylab.imshow(mandelbrot(grid=grid), cmap=pylab.cm.gray)
  pylab.show()

Profiling this pure Python vectorized version yields:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.501    0.501    0.670    0.670 .../mandelbrot_vec.py:1(f)
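
I won’t reproduce the cythonized vectorized module in full here, but a plausible mandel_f_vec.pyx, reusing the same complex declaration trick, would look something like this (a sketch reconstructed from the pieces above, not the exact file I profiled):

mandel_f_vec.pyx
cdef extern from "complexobject.h":

  struct Py_complex:
    double real
    double imag

  ctypedef class __builtin__.complex [object PyComplexObject]:
    cdef Py_complex cval

def f(x_list, y_list, int max_iter=1000):
  cdef double x, y
  cdef complex c, z
  cdef int n
  row = []
  for y in y_list:
    col = []
    for x in x_list:
      c = x + y*1j
      z = 0 + 0j
      rv = 0.0  # default, in case the loop never escapes
      for n in range(max_iter):
        z = z**2 + c
        if abs(z) > 2:
          rv = 1.0 - float(n)/max_iter
          break
      col.append(rv)
    row.append(col)
  return row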

After applying all the cythonizations we learned:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.188    0.188    0.188    0.188 {mandel_f_vec.f}

Not THAT impressive, huh? I was pretty surprised, as I had hoped the “vectorized” form would speed things up a bunch by getting rid of the function call overhead.

A note on “The Worst Programming Interview Question”

Rod Hilton has a blog post where he argues against using puzzler-type questions during an interview. I agree with him in spirit, but feel that puzzler questions have a place in interviews, provided they are used correctly.

Hilton’s piece is very well written and I recommend it, especially if you are in a position where you interview candidates. Hilton is strongly against puzzler questions because he feels they add stress to an already stressful situation and only detect whether a candidate has heard the question before and memorized the answer.

I generally agree with Hilton’s sentiments, but if you form the union of all the questions/strategies that different people consider unsuitable for judging job candidates, you end up with nothing left to ask or look at. This suggests that interviewing is an art: a complex human activity that can be learned through observation and practice, but is probably hard to convey in writing.

When I was interviewing for jobs I experienced a variety of interview styles. I am lucky to report that I enjoyed every interview. I interviewed at three places, and only in one did I get a puzzler kind of question.

At one firm, the first round was a very easy ETL problem (industry speak for extracting usable, neat data from a raw data file), which I solved in Python and which I guessed they were using to weed out folks who could not program at all. In the next round I gave a talk and faced a panel of interviewers, most of whom asked me straight technical questions related to statistics (though one of the interviewers clearly had no interest in interviewing me, and went through my CV in front of me asking random questions from it).

At the next firm the first round was simply getting to know me and them (which I guessed they were using to screen out people who could not communicate or were socially unsuitable). The next round I found immensely enjoyable: they gave me some of their data and just let me loose on it with a very open-ended question. I had a lot of fun playing with it and ended up making an IPython notebook of my results. They were very thoughtful in giving guidelines: they said they were looking at an applicant’s ability to condition data (detect and toss out bad data) and to present their analysis.

This was the interview I enjoyed the most, and I suspect the one that took both me and the company folks the most time. When I was done with my notebook I was very satisfied and eager to go work for these folks. I was invited over for a site interview, but by that time I had already received and accepted a job offer. This brings me to the puzzler question.

The job offer that I finally accepted was from the company where I was presented with a puzzler question. It was a statistical puzzler, rather than a programming/algorithm one. It caught me off guard. I worked through it the best I could, but I could tell the answer wasn’t quite there. My interviewer would listen to me think, wait for my partial answer and then give me a hint. I used the hints to move forward, but I felt I was failing.

At the end, I was pretty sure I would not hear from the company again, so I was surprised when I was asked in for a second interview. This was a broader discussion of work and interests; I felt it went really well, and indeed I received an offer shortly afterwards.

A few months into the job, I spoke about the puzzler question with the colleague who had interviewed me. His reasoning for the puzzler, which I found satisfactory, was that it was a way to test how people think when faced with a tough problem, and whether they think at all, since the work we do is often difficult and often does not have clear-cut answers. I also realized, on my first day on the job, that my colleagues had been reading one of my blogs (this one) and that this had formed part of their opinion of me.

Hilton’s points are well taken. I feel, however, that there is great value in seeing whether people are willing to think. When I was preparing for my Ph.D. defense one of my advisors told me that some questions during a defense are thrown at the candidate simply to see how they react. The panel assumes – the question being so bizarrely out of scope of the thesis – that the candidate doesn’t know the answer, and they want to see how the candidate thinks through novel situations.

When the job you are hiring for requires people to take initiative and solve problems on their own, given a minimum of guidance, it is very effective to see how a candidate reacts to such situations: do they seem eager to tackle open-ended problems, or are they unhappy when directions are vague and outcomes unclear? A puzzler question may well unearth some of these qualities.

If I were to use puzzler questions during an interview, I would open by being honest: I would say that this is an intentionally difficult question, meant to test approach rather than knowledge. I would then pick a case from my own work that I had to think through, so that I would be familiar with the topic AND it would be a relevant kind of puzzler question.

What you cannot judge from this kind of puzzler question, of course, is staying power: the person was willing to spend ten minutes thinking about a problem, but will they come back to it day after day, week after week, until they crack it?

There are also many other factors at work here: perhaps the person does not like to expose their vulnerabilities to strangers. Faced with the same question among familiar colleagues, the person who choked at the interview could be spectacular to brainstorm with.

Looking back, the interview I liked the most – the one where I got to play with some data analysis – was actually a puzzler question. It was open-ended, it did not have a set answer, and it was designed to see how I thought. But it was not a cookie-cutter puzzler question out of an interviewing book; it was a real-life test, relevant to the position they were interviewing for.

Taking this further, I think the only way firms should interview candidates is to have paid internships or trial periods where they get to see a person in their “natural” state, working through the kind of problems they would actually face, and interacting as they normally would.

The problem, of course, like many other things we don’t do well, is that we don’t have that kind of time …