maximum [1..1e6]

I had heard about Haskell – the coolest language on the block – from several friends. After reading the chapter on typeclasses fromĀ Learn You A Haskell for Great Good my mind was blown away and I wanted to get cracking and learn this cool functional programming thing.

I’m a practical guy. As I followed the tutorial along defining toy functions, foraging for a suitable editor (lighttable works, but I’m still looking), figuring out why cabal was giving 502 errors (define a different remote, hackage goes down sometimes) I suddenly thought, how does Haskell stack up against Python in doing a simple thing like finding the maximum of a list of numbers?

In Python, this stuff is easy. You open up IPython and do something like this:

%%timeit x = xrange(int(1e6))
max(x)

And get something like this:

10 loops, best of 3: 22.4 ms per loop

In the interactive Haskell compiler ghci

:set +s

turns on profiling and one can do

maximum [1..1e6]

To get

(0.20 secs, 244690144 bytes)

Both of these numbers (time taken, memory used) seem a little high! 244 megabytes for this?

Scuttlebutt on the internet has it that the ghci runs slowly, so I wanted to time a compiled version of this code.

To start off, I wanted to see if the bare bones version of the code would compile and run:

main = do let z = maximum [1..1e6]
          putStrLn $ "The result is: " ++ show z

Which led to:

Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize -RTS' to increase it.

Uh? Mog no understand. How much stack do we need for this again? It’s just a million numbers. Python, even Matlab, eats this for breakfast everyday, as a starter, with the OJ.

Ok, let’s try RTS.

./max +RTS -K100M
max: Most RTS options are disabled. Link with -rtsopts to enable them.

Oh, alright. We’ll compile with

ghc -rtsopts  max.hs

(Don’t forget to delete the intermediate files, got tripped up there)

The result is: 1000000.0

real    0m0.262s
user    0m0.222s
sys 0m0.039s

Well, this isn’t THAT impressive.

What does a guy have to do round here to get some decent performance from this thing? (I also tried with pure ints, in case you were wondering, with the same results)

As a comparison, what does a simple C implementation of bubble max get us? (Please excuse my rusty C)

#include <stdio.h>

struct ListGenPtr {
  int current;
  int next() {
    if (current < 1000000)
      return ++current;
    else
      return -1;
  }
};

int bubble_max(ListGenPtr *list_gen_ptr) {
  int m0 = list_gen_ptr->next(),
      m1 = list_gen_ptr->next();
  while (m1 != -1) {
    if (m0 < m1) m0 = m1;
    m1 = list_gen_ptr->next();    
  }
  return m0;
}

int main() {
  ListGenPtr lgp = ListGenPtr();
  lgp.current = 0;
  printf("%d\n", bubble_max(&lgp));
}

Compiled with

g++ max.c -o maxc
time ./maxc
1000000

real    0m0.008s
user    0m0.006s
sys 0m0.002s

Which seems reasonable and three times faster than Python.

(We can push Python to the limit and do:

%%timeit x = xrange(int(1e9))
max(x)
   ....: 
1 loops, best of 3: 22.9 s per loop

And against C:

time ./maxc
1000000000

real    0m4.547s
user    0m4.540s
sys 0m0.005s

Which is 5 times faster.)

Hmm. How do we get some decent performance out of Haskell for the kind of datasets we expect to play with in the real world? It seems that to do this I will need to learn more about Haskell internals – things that will come under advanced topics – things that meddle with lazy evaluation and tail recursion.

So, right now, to me, Haskell is an academic language, in that with Python I can get up and running and write good code very concisely while with Haskell, even for a simple thing like the maximum of a list we have to start tinkering with non-trivial parts to get acceptable performance.