The magic of memoization

Project Euler problem #15 (This may be a spoiler BTW) can be solved via recursion, but because the tree is so wide it quickly blows up. More irritatingly, much like computing the Fibonacci sequence recursively, we keep doing the same computation over and over again. I knew how to solve this using imperative programming and I was trying to shoehorn that solution into Racket, but just thinking about it made me feel dirty. If only there was some way to retain the elegance of recursion but not have to redo computations …

Project Euler problem #15 poses the challenge: Starting in the top left corner of a 2×2 grid, and only being able to move to the right and down, there are exactly 6 routes to the bottom right corner. How many such routes are there through a 20×20 grid?

It turns out that there is an analytical solution to this problem involving combinatorics. I kind of got there while sketching out solutions, but fortunately for me, I decided to let the computer do all the work. Fortunately because I learned a cool new functional tool.
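(For the record: any path through an n×n grid is a sequence of 2n moves of which exactly n are "right", so the count is the binomial coefficient C(2n, n). For n = 20 that is C(40, 20) = 137846528820, which matches the program output further down.)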

The straightforward way to solve this problem using functional programming paradigms is to use recursion. Just set off little machines that keep forking off copies of themselves chasing down alternate paths. If a machine goes off the grid, discard it, and if it reaches the exit corner, tell it to report back. Count up the reports and you are done!

#lang racket
(require rackunit)

; A 2D value
(struct pos (x y) #:inspector #f)

;; The two legal moves
(define (move-right p0)
  (struct-copy pos p0 [x (+ (pos-x p0) 1)])) 

(define (move-down p0)
  (struct-copy pos p0 [y (+ (pos-y p0) 1)])) 

;; Return #t if we are at the grid boundary
(define (boundary-pos p0 grid-size)
  (or (= (pos-x grid-size) (pos-x p0)) (= (pos-y grid-size) (pos-y p0))))


;; Use recursion to explore all possible moves. Our base case occurs when we
;; land on the grid boundary. We know that from then on there is only one
;; path to the exit - either straight right or straight down. Return 1 when that
;; happens
; p0 - initial position (pos struct)
; grid-size - size of grid (pos struct)
(define (count-paths grid-size [p0 (pos 0 0)])
  (cond
    [(boundary-pos p0 grid-size) 1]
    [else (+ (count-paths grid-size (move-right p0)) (count-paths grid-size (move-down p0)))]))

(test-case "Recursive"
           (check-equal? (count-paths (pos 2 2)) 6)
           (check-equal? (count-paths (pos 1 3)) 4)
           )

(time (let ([k (count-paths (pos 10 10))]) (display k))) 
;-> 184756  cpu time: 88 real time: 88 gc time: 0
;; 20x20 grid does not finish and keeps asking for more memory

Now this thing blows up quickly because, as you can see, each node we visit spawns two machines (function calls) and pretty soon our grid is crawling with machines. Since every base case contributes a 1 to the final sum, the naive version has to make at least as many calls as there are paths (about 1.4 × 10^11 of them for the 20×20 grid). On my machine I can't even compute the 20×20 solution.

What is annoying is that it doesn't have to be this way: we can see that the recursion repeatedly visits the same lattice nodes (after all, there are not THAT many of them). An interesting property of this problem is that the number of paths from a given node to the exit does not depend on how we got to that node. So we don't really need to send machines down from a node we have already visited.

The way I would have exploited this property in imperative programming is to build a two-dimensional array representing the grid. I would have started to fill out this array by walking backwards from the exit and noting, in the array values, how many paths there are to the exit. The value of each node is the sum of the values of its two "child" nodes – the nodes that can be reached by going down or by going right.

If you want keywords, this is a bottom up dynamic programming approach because we are taking small, individual pieces of the problem and working out the answer separately and then putting things together to solve bigger parts of the problem. (Wikipedia’s page on dynamic programming is pretty good)
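For concreteness, here is roughly what that table fill might look like in Racket, using mutable vectors (a sketch I wrote for this post, not code I actually used; count-paths-dp is a made-up name):

(define (count-paths-dp n)
  ;; (n+1) x (n+1) table of path counts; the boundary row and column stay at 1
  (define table (build-vector (+ n 1) (lambda (_) (make-vector (+ n 1) 1))))
  (for* ([i (in-range (- n 1) -1 -1)]
         [j (in-range (- n 1) -1 -1)])
    ;; paths from (i,j) = paths from the node below + paths from the node to the right
    (vector-set! (vector-ref table i) j
                 (+ (vector-ref (vector-ref table (+ i 1)) j)
                    (vector-ref (vector-ref table i) (+ j 1)))))
  (vector-ref (vector-ref table 0) 0))

(count-paths-dp 2)  ; -> 6
(count-paths-dp 20) ; -> 137846528820

It works, but look at all that explicit indexing and mutation.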

I started to poke around Racket and look into matrices and arrays and mutability and I started to get a yucky feeling. It started to feel a lot like housekeeping and housekeeping is one thing functional programming promises to free us from, so we may concentrate on the fun stuff.

While doing various web searches related to “dynamic programming functional languages” and “update mutable array Racket” (yes, I was desperate) I kept hitting the word memoization. I’d come across this term before, but had not paid any attention to it. Then I found this great post on the Racket blog.

Before we go any further, take a look at the code below:

(require memoize)

(define/memo* (count-paths-m grid-size p0)
  (cond
    [(boundary-pos p0 grid-size) 1]
    [else (+ (count-paths-m grid-size (move-right p0)) (count-paths-m grid-size (move-down p0)))]))

(test-case "Memoize"
           (check-equal? (count-paths-m (pos 2 2) (pos 0 0)) 6)
           (check-equal? (count-paths-m (pos 1 3) (pos 0 0)) 4)
           )

(time (let ([k (count-paths-m (pos 20 20) (pos 0 0))]) (display k))) 
; -> 137846528820  cpu time: 0 real time: 0 gc time: 0

;; Note that this last operation would basically run out of memory and crash using the naive recursive implementation

What just went down here? (Yes, I learned this catchy phrase from Tim Roughgarden’s lectures)

We made just two changes – we required the memoize package and we defined the function as a memoized function. (We also had to remove the default argument value; apparently the memoization package does not support that.)

What is this Memoization magic?

Recall that what I was looking for was a scheme that would prevent us having to recompute already known values. The approach I was familiar with from imperative programming was an explicit table store of known values that I could reference each time I went to do a computation.

Memoize is a very elegant implementation of a related idea. However, instead of putting the implementation (e.g. a matrix of known values) first, it puts the function first and says “What we really want is a way for a function to remember the values it computes”.

What the two lines of Racket above do (and note, really, that we did not have to touch the algorithm at ALL) is to say, here’s a function count-paths-m and I want to remember the answer to each call that is made to it, so if I ever call the function with the same arguments, instead of executing the body of the function, just return the stored value.

And THAT’s IT! No need to mess with dirty implementation details. Memoize gets to the heart of what we want to do.

Now, such memoizations, because they have to work with any function, are usually implemented as a hash (I know that the Racket one is – by the way, the Racket implementation is one short file). Performance freaks may hem and haw, since in this particular case, for example, rather than a hash lookup, it would be faster and more compact to use a 2D array lookup. But, as you can see from the timing, it's fast enough.
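If you are curious what the core of the idea looks like, here is a toy version I sketched out (memoize-fn is a made-up name; this is not the package's actual code): wrap a function in a closure holding an equal?-based hash, keyed on the argument list.

(define (memoize-fn f)
  (define cache (make-hash)) ; equal?-based, so argument lists compare by contents
  (lambda args
    (hash-ref! cache args (lambda () (apply f args)))))

A real version also needs recursive calls to go back through the cache; defining the function with define/memo* takes care of that, since the function's name itself is bound to the memoized version.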

As a side note on Racket, I was very impressed by the smoothness of getting this package installed. I used the GUI to search for "memoize"; the manager searched in different repositories, found the GitHub repo, downloaded the code and integrated the documentation with the rest, so now I can find the docs for memoize just like I would for any of the pre-installed packages.

Finally, a small point specific to this Racket package: note I used define/memo*. I first used define/memo and got very unexpected results, which seemed a lot like memoization wasn't working. It turns out to be a detail of the package: I was using structs – not primitive types – as inputs to the function, and the starred versions of the forms implement the structural comparisons needed to handle such derived data types.
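You can see why this matters in the REPL: every call to the path counter constructs a fresh pos struct, so cache keys have to be compared by contents rather than by identity (a quick sketch, using the pos struct from above):

(eq?    (pos 0 0) (pos 0 0)) ; -> #f, two distinct objects
(equal? (pos 0 0) (pos 0 0)) ; -> #t, same contents (the struct is transparent)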

UPDATE: I forgot to put this in: note that now we have a kind of state in the function. It's not a state that changes the input/output characteristics of the function, but rather an "implementation state", if you will. The state changes how the function is implemented based on the previous history of the program. This does not give us any problems when reasoning about what the function does (as befits a proper functional program), but it does make it harder to reason about the speed characteristics of the function, in a way that mirrors the difficulty of reasoning about the algorithmic characteristics of an imperative program with mutable state. Funny world.

Quick thoughts on starting Functional Programming (in Racket)

Having been "functionally curious" for a while now, I've started with Racket, a language in the Scheme family, which is a type of Lisp. Yes. Racket rhymes with bracket.

My first foray into functional programming was with Haskell. When I started with Learn You a Haskell for Great Good I was initially excited, but then I got a little confused and this was enough of a barrier that I stopped.

When I came back to trying functional programming (almost a year later) I made two changes: I picked a different language (Racket) and I decided that, instead of my usual method of jumping into the deep end by starting a real-life project with the language, I would patiently step through the Project Euler problems in that language. The combination seems to be working.

Racket

DrRacket is a very awesome learning tool. It's what the IPython notebook and the IPython interactive console aspire to be, and more. I don't use the pop-out hints (in the right-hand corner) but I make extensive use of searching the documentation for standard library descriptions, of the graphical display of function usage (DrRacket draws arrows from a function definition to its uses!) and of its help locating errors.

[Screenshot: DrRacket's documentation search and definition arrows]

Here’s a nice use case for this. I copied over a function intending to modify it and make a different version of it (this is part of some learning code, so I like to keep older versions of algorithms around for comparison). By hovering over the older definition and noting the long arrow that goes down into the new (pasted) code I can see that I haven’t changed the function name in one place yet:

[Screenshot: DrRacket drawing an arrow from the old function definition into the pasted copy]

The documentation is very good, though I haven't run into a user community (Edit: the folks on the mailing list are very kind and helpful!). Perhaps I should wander down to Northeastern U. I started with the tutorial in the guide, but after settling on Project Euler, I abandoned the tutorial and now make extensive but non-linear use of the guide and reference (which are, incidentally, included with the distribution, so I don't need an internet connection) and Stack Overflow.

In terms of syntax, Racket is a bit annoying. It's fairly easy for me to write, but I don't like to read it. The Polish prefix notation is annoying but strangely addictive; I definitely prefer infix, but there is a perverse pleasure in getting fluent in prefix. The real problem is the brackets. Perhaps I've just not learned how to format things and need to look at other people's Racket code, but my code is unreadable.

Project Euler

Project Euler seems just the ticket for getting me started with functional programming. The problems are paced and are a nice mixture of mathematical theory and code. For each problem I find myself scribbling on paper, thinking a bit and then coding. I really like this. Solving problems in order of difficulty keeps me from getting stuck in one place for too long.

Functional Programming

I have to admit learning a new paradigm is a lot more fun than just learning a new language. After a lifetime of coding imperatively it's a nice challenge to deny myself the easy pleasures of loops and variable assignments (though that takes it a bit far if done too strictly). I'm sure there is a lot more to FP, but that is what I've picked up doing the first few Project Euler problems.

Using list comprehensions in Python (which are implicitly map and/or filter) has made the concept of map and reduce very easy to grasp. (As an aside, I really like Python’s syntax for list/dictionary comprehensions which I find to be concise and readable.)

In a way map and reduce are easy. Map carries no state anyway and so is a really trivial “mapping” from the loop form

for i in range(N):
    l[i] = f(m[i])

simply becomes

l = map(f, m)

Reduce is slightly more interesting because it has an accumulator, but it almost seems like a cheat to me because reduce is a function that actually bundles an interesting functional concept (recursion) inside and lets you get away without thinking about it.
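To see what reduce is hiding, here is a toy left fold written out as the recursion it bundles (my own sketch; my-foldl is a made-up name, but it takes its arguments in the same order as Racket's built-in foldl):

(define (my-foldl f acc lst)
  (if (empty? lst)
      acc
      ;; fold the head into the accumulator, then recurse on the tail
      (my-foldl f (f (first lst) acc) (rest lst))))

(my-foldl + 0 `(1 2 3 4)) ; -> 10, same as (foldl + 0 `(1 2 3 4))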

Pattern matching

The first pattern matching I was exposed to was Haskell's, and I fell in love with it. The notation is so close to how you would write down functions on a piece of paper that it seems the only natural way to do it. (I'm sad to say that Racket's pattern matching syntax is kind of cumbersome, especially when you want to match on conditions. I started using if statements, as they are more succinct, and then I discovered cond, which leads to pretty neat code.)
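For comparison, here is the merge function from further down rewritten with Racket's match* form (a sketch I put together for this post; it works, but to my eye the cond version below reads better):

(define (merge/match l1 l2)
  (match* (l1 l2)
    [('() l2) l2]                       ; first list exhausted
    [(l1 '()) l1]                       ; second list exhausted
    [((cons a r1) (cons b r2))          ; take the smaller head and recurse
     (if (< a b)
         (cons a (merge/match r1 l2))
         (cons b (merge/match l1 r2)))]))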

Recursions instead of loops

Though Racket has loop-like constructs, I've stayed away from them and tried to write everything as recursions. It was a little challenging at first but became more and more natural. It felt a little awkward because when I write Python there is this voice in the back of my mind that keeps saying <ghostly quiver in voice> "Function call overhead", so I usually end up writing somewhat meaty functions. In Racket, because I'm doing things like recursion, I end up abstracting computations into functions just to make the code look neater and more atomic.

Testing

I'm really happy that Racket has a very easy to use and fully featured unit testing system (rackunit). It encourages me to include the tests right underneath an important function, and I just run the file to execute them.

Get it right the first time

I've now written a bunch of short scripts in Racket, mostly solving Euler problems, but some also to follow along with Tim Roughgarden's Algorithms course on Coursera. One of the things that has startled me a bit is how often I get my Racket code right the first time. One big caveat is that I'm usually doing simple things, but it still startles me. I'll be writing this dense-looking code in this new language, I'll write out the tests, and then run it and it'll all pass. I think a large part of this is because I'm doing pure algorithms – I'm not having to mess with real data, which has a thousand edge cases that clutter up the code – and part of it has to do with me breaking up each task into atomic parts and coding and testing each part separately.

When writing in Racket I often feel I'm writing machine code again: I'm down in the weeds, doing one small task at a time. In this way Python is more expressive; I feel I can do pretty complex things with a single list comprehension, for example. But functional code can be pretty cool and compact to look at when I get it right.

;; Given two sorted lists l1 and l2, merge them into a single sorted list
(define (merge l1 l2)
  (cond
    [(empty? l1) l2]
    [(empty? l2) l1]
    [(< (first l1) (first l2)) (cons (first l1) (merge (rest l1) l2))]
    [else (cons (first l2) (merge l1 (rest l2)))]))

(test-case "merge"
           (check-equal? (merge `(1 2 3) `(4 5 6)) `(1 2 3 4 5 6))
           (check-equal? (merge `(1 3 5) `(2 4 6)) `(1 2 3 4 5 6))
           (check-equal? (merge `(2 4 6) `(1 3 5 7)) `(1 2 3 4 5 6 7))
           (check-equal? (merge `(2 4 6) `(2 4 6)) `(2 2 4 4 6 6))
           )

(define (merge-sort l)
  (cond
    [(< (length l) 2) l]
    [else (let-values ([(l1 l2) (split-at l (quotient (length l) 2))])
            (merge (merge-sort l1) (merge-sort l2)))]))

(test-case "merge-sort"
           (check-equal? (merge-sort `(6 5 4 3 2 1)) `(1 2 3 4 5 6))
           (check-equal? (merge-sort `(5 4 3 2 1)) `(1 2 3 4 5))
           (check-equal? (merge-sort `()) `())
           (check-equal? (merge-sort `(2)) `(2))
           (check-equal? (merge-sort `(6 1 4 3 2 5)) `(1 2 3 4 5 6))           
           )