While x:

This is a cautionary tale about not misusing Python’s kindness. Or perhaps a cautionary tale about not trusting even widely used libraries to mirror what Python does. It is a little technical …

I have a program that, for the purposes of this post, operates on a list of pairs of numbers, going through each pair one by one until it exhausts the list. For certain reasons I’m using an iterator on the list and using Python’s ‘None’ to indicate that the list has run out, like so:

x = [(0, 0), (1, 2), (3, 1)]
itr = iter(x)
v = next(itr, None)
while v:
  print(v)
  v = next(itr, None)

The pair of numbers represents particular data. The first number can range from 0 onwards and is always increasing monotonically while the second number is one of {0, 1, 2}. I have several tests for this function and have checked for different edge cases.

As you can see in the snippet above, the list x is a list of Python tuples. After the bulk of the development for this (and related) functions had been completed, I decided that a better way to store my list of tuples would be as a numpy structured array. That way I would have labels for the first and second numbers in the pair and wouldn’t get confused about which was which six months from now.
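A minimal sketch of what the labels buy you. The field names 'x' and 'y' and the dtypes are illustrative assumptions, not necessarily the ones in the real program:

```python
import numpy as np

# Field names and dtypes are illustrative assumptions.
pairs = np.array([(0, 0), (1, 2), (3, 1)],
                 dtype=[('x', 'int32'), ('y', 'int8')])

first_numbers = pairs['x']         # every first number, as an int32 array
second_numbers = pairs['y']        # every second number, as an int8 array
second_of_middle = pairs[1]['y']   # one field of one element

print(first_numbers.tolist())      # [0, 1, 3]
print(second_numbers.tolist())     # [0, 2, 1]
print(int(second_of_middle))       # 2
```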

And Python, being the great dynamic language it is, let me make the switch without a murmur. Python doesn’t care that the object I was initially passing it was a list of tuples and the new one is a numpy structured array! Python sees both objects yield iterators as needed and both objects are spitting out pairs of numbers at each iteration. So Python just works!

Things ran just fine for a while.

One day my colleague, who has used my program many times before, came across some very strange results in their experiments. I knew right away that this was going to be an interesting bug, just like any bug that is found after a program has been used for a while. The simple/easy bugs have all been squashed. What remains are the cockroaches. Evil, secretive errors in logic that scurry away from the daylight and come out only in the shadows of the night.

So I took the input data my colleague was using and ran my program. Indeed something bizarre was going on. For some of the data the program was acting exactly as it should. But for some of the data the program claimed that the list of pairs of numbers was empty and would exit right away. It would do the rest of the processing without any errors.

My first guess was that something was wrong with the input files. The input files were being generated with some hastily written code that I had not fully tested, so it was the logical choice, though this would make the bug less interesting.

I reviewed the input data and found that it was just fine. So the input to the program was correct. The program was just borking, seemingly, randomly.

I then started to do things the old-fashioned way. I put print statements at various points in the code and simply printed the length of my array of pairs of numbers. This was being done on a remote machine and I hadn’t yet learned how to hook my debugger into the remote machine. Also, sometimes, print statements are just a more pragmatic way to debug, regardless of what the eggheads will tell you.

The program loads the list. A million elements in the list. Good. The program does a transformation on the list. A million elements. Good. Another transformation. Still a million little pairs of numbers, hanging on for dear life. Then they enter this piece of code I sketched out above. And BAM! the program skips the loop, grinning as it whizzes by, zero little pairs of numbers.

I run the program again, this time with a different input set. A hundred thousand elements. Passes first transform, second transform, and then my loop. Round and round it goes in the loop, a hundred thousand times before shooting out, all dizzy at the end, having done exactly what it was supposed to do.

Very interesting.

What is different between the two data sets that would make the code DO this? So I opened up the data and started to look closely at the sets that failed and the sets that succeeded, and then it struck me. All the ones that failed had (0, 0) as their first element.

And then it was clear.

When I wrote the loop:

itr = iter(x)
v = next(itr, None)
while v:
  print(v)
  v = next(itr, None)

I was a bit lazy. What I really meant was, “exit the loop if v is None”. Python is very kind and None evaluates to False in this test, so it is equivalent. The problem is that the loop will also terminate if v == 0, or if v is anything else that evaluates to False. But wait, this is actually not a problem, because my v is not a scalar number. It is a pair of numbers, and during my testing I have repeatedly verified that (0, 0) != 0. One is a scalar value – zero – while the other is a Python object – a tuple. That can’t possibly be the problem!
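That reasoning about tuples can be checked in a few lines of plain Python. This is a minimal sketch, not part of the original program:

```python
# A non-empty tuple is truthy no matter what it contains.
pair = (0, 0)

print(pair != 0)     # True: a tuple never equals the scalar zero
print(bool(pair))    # True: only the empty tuple is falsy
print(bool(()))      # False
print(bool(None))    # False: this is what was meant to end the loop
```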

But wait! Halfway through development I told you I switched from using Python lists of tuples to numpy arrays.

So I looked a bit into that, and BAM, sure enough, that’s where the behavior differs. For some reason numpy evaluates an element of a structured array as False when all of its fields are zero. This is different from taking a row of a multi-dimensional array, where numpy raises “ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()”

This is what still throws me. When I ask numpy for an element of my structured array of pairs of numbers, I get back a pair of numbers. These numbers are not simply another numpy array of two elements – which is what you would get from slicing a 2D numpy array. It’s a pair of numbers. However, this pair does not act as a Python tuple. It’s a different beast altogether.
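The three behaviors can be put side by side in a short sketch. The field names and dtypes are illustrative assumptions, and the results in the comments are what I would expect from numpy's structured scalars (type numpy.void); it is worth checking them against your own numpy version:

```python
import numpy as np

as_list = [(0, 0), (1, 0)]
# Field names and dtypes are illustrative assumptions.
as_struct = np.array(as_list, dtype=[('x', 'int32'), ('y', 'int8')])
as_2d = np.array(as_list)

# Tuples are truthy whenever they are non-empty.
list_truth = [bool(v) for v in as_list]
# Structured elements are falsy when every field is zero.
struct_truth = [bool(v) for v in as_struct]

# A row of a 2D array has two elements, so its truth value is ambiguous.
try:
    bool(as_2d[0])
    row_raises = False
except ValueError:
    row_raises = True

print(list_truth)                    # [True, True]
print(struct_truth)                  # [False, True]
print(row_raises)                    # True
print(type(as_struct[0]).__name__)   # void
```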

Well, folks, the moral of the story is, don’t be lazy. If you want to test for None, test it explicitly. You never know what evaluates to 1 or to 0 even if you think you do.
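A minimal sketch of the explicit test, next to the lazy one. ZeroIsFalse is a hypothetical stand-in for any element type that is falsy when all of its values are zero, as the structured array elements were; it is not a numpy class, and the function names are made up for this illustration:

```python
class ZeroIsFalse(object):
    """Hypothetical stand-in for an element that is falsy when all-zero."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __bool__(self):           # Python 3 truth hook
        return bool(self.x or self.y)
    __nonzero__ = __bool__        # Python 2 spelling of the same hook


def lazy_loop(items):
    """Collect elements until one is falsy (the buggy test)."""
    seen = []
    itr = iter(items)
    v = next(itr, None)
    while v:
        seen.append((v.x, v.y))
        v = next(itr, None)
    return seen


def explicit_loop(items):
    """Collect elements until the iterator is exhausted."""
    seen = []
    itr = iter(items)
    v = next(itr, None)
    while v is not None:
        seen.append((v.x, v.y))
        v = next(itr, None)
    return seen


data = [ZeroIsFalse(0, 0), ZeroIsFalse(1, 2), ZeroIsFalse(3, 1)]
print(lazy_loop(data))       # []
print(explicit_loop(data))   # [(0, 0), (1, 2), (3, 1)]
```

If None could itself be a legitimate element, a private sentinel object (for example `_END = object()`, passed as the default to next() and tested with `is not _END`) would serve the same purpose.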

If you would like to see this bug for yourself, here is a sufficient example:

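import numpy as np

def looper(x):
  itr = iter(x)
  v = next(itr, None)
  while v:  # the bug: stops on any falsy element, not only on None
    print(v)
    v = next(itr, None)

def python_list():
  print('Python list')
  l = [(0, 0), (1, 0), (2, 2), (3, 0)]
  looper(l)  # prints all four pairs

def numpy_struct_array():
  print('Numpy array')
  l = np.array([(0, 0), (1, 0), (2, 2), (3, 0)], dtype=[('x', 'int32'), ('y', 'int8')])
  looper(l)  # prints nothing: the first element (0, 0) is falsy

if __name__ == '__main__':
  python_list()
  numpy_struct_array()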

6 thoughts on “While x:”

  1. Goran says:

    Perhaps beside the point of the post, but any reason for not iterating the “proper” Pythonic way (for v in iter)?
    (It usually yields a bit faster, but much cleaner code, and it fits into other language constructs better…)

    • KG says:

      Hi Goran! Bwahahaha. Ok this is another post in ‘cargo-cult programming’. In the actual code there is a conditional for advancing the item while looping. So sometimes I can run through the loop twice before going to the next item. I’m looking at the code to see if I can rewrite it to use the for v in iter version, but I think it would look uglier because I’d have to repeat some code. This final version evolved as I squashed some nasty bugs. I then decided to let well enough alone once things were working as I wanted.

      The construct looks like:

      # set up state based on data
      while v is not None:
        if f(v, state):
          # advance state
        else:
          # advance state differently
          v = next(itr, None)

      I can mail you the link to the code if you may have suggestions for making it more compact. I could also use an index, but I don’t know if that makes things slower.


      • Anonymous says:

        Sure, send the code, just for fun 🙂
        Also, (almost) never index in python 🙂 if you need indices do “for i, x in enumerate(iter)”

  2. Nice war story. I’ve had a handful of cases like that, that boiled down to mixing numbers, bools, and Nulls. It’s a blessing when the error is discovered because of a *leading* zero… so were there data sets that got prematurely (and subtly) truncated because the first zero is buried 90 percent of the way toward the end?

    • KG says:

      Hi Greg, good to hear from you!

      So the only time this error shows up is when the element is (0, 0). The first entry is monotonic (the sequence is sorted by the first entry). In much of the data the first entry is unlikely to be zero so the bug never manifests. In this new data set we were trying out, the first entry is always 0, so now we have a 1/3 chance of hitting this bug, because the second element is 0,1,2. Yes, it would be very subtle if the list were not sorted and the (0, 0) entry could happen anywhere.

