Random numbers in a parallel world

TL;DR: It’s always a great idea to explicitly initialize your random number generator especially in a parallel computing environment.

Random number generation in computing commonly uses pseudorandom number generators. These are iterated functions that take an initial number (a seed) and spit out a new number each time you call them. Each number is part of a sequence that is completely defined by the seed. The sequence of numbers itself satisfies all the important properties of a random number sequence.

When you do a computer “experiment” you can explicitly initialize your random number generator by feeding it a seed. You can also let the algorithm pick out a seed from a stream of random numbers being constantly spit out by the computer. If you don’t care about repeatability and just want good quality randomness, the default in modern packages – which is to pick up a seed from the computer – works fine.

Python numpy’s random number generator uses the Mersenne Twister which, I am assured, is good enough for government work. By default the twister is initialized using /dev/urandom or the clock, which should be, also, good enough. Normally it is, but when we move into the weird whacky twilight world of parallel computing strange things can happen…

Say we have an embarrassingly parallel job where we simply start up a bunch of nodes and do some computations. What happens if we ask for a stream of 5 random numbers in each child process?

Say we ask for five numbers each:

[-1.83711524 -0.57025788 -0.47002959  1.13873683 -1.20119582]
[ 1.24164455 -1.05592017 -2.04148199 -1.10787051  0.0561248 ] <--- B
[-0.95616591 -0.17653271 -1.81273353 -0.39608772 -0.08856469] <--- C
[ 1.23821309  0.44823325 -2.09541964 -1.7484197  -0.46045061] <--- D
[ 0.21477345 -1.30221712 -0.53970868 -0.00636153  0.85769071] <--- E
[-1.1929044   0.29644317  0.64373556 -1.38675638 -0.42406317]
[ 0.84775676  0.69999582  0.36933264  1.79700735  0.34603859]
[ 0.12984335  0.26137673 -0.32926777  0.06448171  0.33496994]
[ 0.10993647  0.6695855  -1.32065528  0.93429973  0.32059549]
[-1.83711524 -0.57025788 -0.47002959  1.13873683 -1.20119582]
[ 0.81037046  0.056652    0.54062643  1.18642807 -0.16265761]
[ 1.06970208  0.13867405  1.2972758  -1.02361929  0.08262749]
[ 1.24164455 -1.05592017 -2.04148199 -1.10787051  0.0561248 ] <--- B
[-0.95616591 -0.17653271 -1.81273353 -0.39608772 -0.08856469] <--- C
[ 0.35640486 -0.36840793 -0.84085094  1.82913844 -0.09785973]
[ 1.03816641 -0.26439036 -1.73092253 -1.05740638  1.40258063]
[ 0.69154406 -0.11333892 -0.55908131 -0.80996816  1.33116669]
[-0.42959555  1.31447804 -0.5917133  -0.26106957 -1.42283209]
[ 1.23821309  0.44823325 -2.09541964 -1.7484197  -0.46045061] <--- D
[ 0.21477345 -1.30221712 -0.53970868 -0.00636153  0.85769071] <--- E

It LOOKS OK, but on closer inspection our hair stands on end. There are sequence repeats (I’ve pointed them out). That should scare you. Want to see something even more scary?

How about another run that takes slightly longer for the computer to run? (All I did was change how many random numbers I asked for, see code below):

[-0.1035066   0.93604127  0.34371349 -2.03675653  1.47444023  0.29328905]
[-0.1035066   0.93604127  0.34371349 -2.03675653  1.47444023  0.29328905]
[-0.1035066   0.93604127  0.34371349 -2.03675653  1.47444023  0.29328905]
[ 2.45623686  1.61541923 -0.57659283  1.33272398 -0.1824892  -0.98865263]
[-0.1035066   0.93604127  0.34371349 -2.03675653  1.47444023  0.29328905]
[-0.1035066   0.93604127  0.34371349 -2.03675653  1.47444023  0.29328905]
[-0.1035066   0.93604127  0.34371349 -2.03675653  1.47444023  0.29328905]
[-0.1035066   0.93604127  0.34371349 -2.03675653  1.47444023  0.29328905]
[-2.49928063  0.0620409   0.953208   -1.11290331 -0.0086512  -1.18000836]
[-0.1035066   0.93604127  0.34371349 -2.03675653  1.47444023  0.29328905]
[ 2.45623686  1.61541923 -0.57659283  1.33272398 -0.1824892  -0.98865263]
[ 2.45623686  1.61541923 -0.57659283  1.33272398 -0.1824892  -0.98865263]
[ 2.45623686  1.61541923 -0.57659283  1.33272398 -0.1824892  -0.98865263]
[ 2.45623686  1.61541923 -0.57659283  1.33272398 -0.1824892  -0.98865263]
[ 2.45623686  1.61541923 -0.57659283  1.33272398 -0.1824892  -0.98865263]
[-2.49928063  0.0620409   0.953208   -1.11290331 -0.0086512  -1.18000836]
[ 2.45623686  1.61541923 -0.57659283  1.33272398 -0.1824892  -0.98865263]
[-2.49928063  0.0620409   0.953208   -1.11290331 -0.0086512  -1.18000836]
[-2.49928063  0.0620409   0.953208   -1.11290331 -0.0086512  -1.18000836]
[ 1.20134648  0.99562444  0.39421463 -0.38797808  1.33273899 -0.44182553]

Oh, this is enough to keep us awake at night, huh?

We are not in control of when the child process reads from /dev/urandom or when it reads the system clock. If the process lands on the same machine it can read the same seed from the /dev/urandom or from the system clock, or not. Don’t take the risk, explicitly initialize the generator. An easy way is to pass a unique identifier to the child which can then form the seed.

Code follows:

import multiprocessing as mp, numpy

def child(n):
  numpy.random.seed(n) #<-- comment this out to get the fright of your life.
  m = numpy.random.randn(6)
  return m

N = 20
pool = mp.Pool()
results = pool.map(child, range(N))
for res in results: print res
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s