maximum [1..1e6]

I had heard about Haskell – the coolest language on the block – from several friends. After reading the chapter on typeclasses from Learn You A Haskell for Great Good my mind was blown away and I wanted to get cracking and learn this cool functional programming thing.

I’m a practical guy. As I followed the tutorial along defining toy functions, foraging for a suitable editor (lighttable works, but I’m still looking), figuring out why cabal was giving 502 errors (define a different remote, hackage goes down sometimes) I suddenly thought, how does Haskell stack up against Python in doing a simple thing like finding the maximum of a list of numbers?

In Python, this stuff is easy. You open up IPython and do something like this:

%%timeit x = xrange(int(1e6))

And get something like this:

10 loops, best of 3: 22.4 ms per loop

In the interactive Haskell compiler ghci

:set +s

turns on profiling and one can do

maximum [1..1e6]

To get

(0.20 secs, 244690144 bytes)

Both of these numbers (time taken, memory used) seem a little high! 244 megabytes for this?

Scuttlebutt on the internet has it that the ghci runs slowly, so I wanted to time a compiled version of this code.

To start off, I wanted to see if the bare bones version of the code would compile and run:

main = do let z = maximum [1..1e6]
          putStrLn $ "The result is: " ++ show z

Which led to:

Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize -RTS' to increase it.

Uh? Mog no understand. How much stack do we need for this again? It’s just a million numbers. Python, even Matlab, eats this for breakfast everyday, as a starter, with the OJ.

Ok, let’s try RTS.

./max +RTS -K100M
max: Most RTS options are disabled. Link with -rtsopts to enable them.

Oh, alright. We’ll compile with

ghc -rtsopts  max.hs

(Don’t forget to delete the intermediate files, got tripped up there)

The result is: 1000000.0

real    0m0.262s
user    0m0.222s
sys 0m0.039s

Well, this isn’t THAT impressive.

What does a guy have to do round here to get some decent performance from this thing? (I also tried with pure ints, in case you were wondering, with the same results)

As a comparison, what does a simple C implementation of bubble max get us? (Please excuse my rusty C)

#include <stdio.h>

struct ListGenPtr {
  int current;
  int next() {
    if (current < 1000000)
      return ++current;
      return -1;

int bubble_max(ListGenPtr *list_gen_ptr) {
  int m0 = list_gen_ptr->next(),
      m1 = list_gen_ptr->next();
  while (m1 != -1) {
    if (m0 < m1) m0 = m1;
    m1 = list_gen_ptr->next();    
  return m0;

int main() {
  ListGenPtr lgp = ListGenPtr();
  lgp.current = 0;
  printf("%d\n", bubble_max(&lgp));

Compiled with

g++ max.c -o maxc
time ./maxc

real    0m0.008s
user    0m0.006s
sys 0m0.002s

Which seems reasonable and three times faster than Python.

(We can push Python to the limit and do:

%%timeit x = xrange(int(1e9))
1 loops, best of 3: 22.9 s per loop

And against C:

time ./maxc

real    0m4.547s
user    0m4.540s
sys 0m0.005s

Which is 5 times faster.)

Hmm. How do we get some decent performance out of Haskell for the kind of datasets we expect to play with in the real world? It seems that to do this I will need to learn more about Haskell internals – things that will come under advanced topics – things that meddle with lazy evaluation and tail recursion.

So, right now, to me, Haskell is an academic language, in that with Python I can get up and running and write good code very concisely while with Haskell, even for a simple thing like the maximum of a list we have to start tinkering with non-trivial parts to get acceptable performance.


7 Replies to “maximum [1..1e6]”

  1. Apparently the problem is regrettably just that maximum is implemented in a really stupid way in the Prelude, and isn’t lazy enough. Implementing it the way it should be implemented (“maximum (x:xs) = foldl’ max x xs”), your example runs in GHCi in 0.05s in my system (using maximum from the prelude, it’s 0.31s). I’ve been told that if you specify -O2, GHC might be able to catch this and introduce the needed strictness, so do give that a try.

    I really wish that the people upstream would work to get rid of these silly, pointless hurdles for beginners. The likely explanation for this is that the strict foldl’ isn’t portable across different Haskell compilers and the GHC guys didn’t want the hassle of maintaining their own variation different from the one in the report. So foldl1 it was, despite people knowing full well that lazy left folds are a terrible idea.

    1. Apparently, because of the overloading introduced by the Ord typeclass, lazy foldl does actually give a more correct implementation, potentially, in cases where (>) might be lazy. I am still wondering if this is the right tradeoff in the design space though, given that people are not very likely to work with these instances..

      Note that maximum here is different from the maximum functions you’re using in C++ and Python, so it shouldn’t be too surprising that it performs worse. To make a better comparison, try something like defining “maximum’ (x:xs) = foldl’ max x xs”. This version of maximum is more like the one in C++ and Python, and would be a better basis for comparison.

  2. Welcome to the Haskell community! Writing a frustrated performance blog post is a rite of passage 🙂 The foldl thing is indeed embarrassing, and for a handful of reasons like this, Haskell isn’t a language that someone who’s good in C can just pick up and use easily. Just as an experienced Python programmer might be initially insulted by his/her first encounters c (“Using an array as a function’s return value will cause a seg fault? Seriously?”).

    FWIW, maximum doesn’t call foldl in the latest compiler release. But getting good speed from Haskell on such small benchmarks will probably never be something that comes out of the box with just a week or two of studying. It’s not a beginner-level problem, but it’s not research-level either. There are just a few rules of thumb and troubleshooting techniques to learn.

    The stock performance advice is avoid lists and the builtin list functions if you want to very quickly scan. Haskell lists are great as control structures, but you don’t want to use them for fast scans.

    import qualified Data.Vector.Unboxed as V
    main = print $ V.maximum (V.generate 1000000 id)

    This runs as fast as your c bubbleMax on my machine… including the time spent starting the Haskell runtime.

    Anyway, if you liked the typeclass stuff, that’s just the tip of the iceberg. Don’t let a little foldl spoil your functional fun!

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s