This is a short note describing a common code pattern useful for parallelizing some computations using the Python multiprocessing module.
Many problems are of the embarrassingly parallel type, where the task consists of the same set of computations done independently on a large set of input data. In many languages such problems are solved using multi-threaded code. Due to events too complicated and embarrassing to mention, Python doesn't do too well with multiple threads, and we need to use multiple processes via the multiprocessing module instead.
A common pattern is to set up a number of workers and use queues to pass data back and forth. The pattern looks like this:
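(A minimal sketch of the pattern; the names task_q, result_q, do_work and the worker count are illustrative, not fixed by the pattern.)

    import multiprocessing as mp

    def do_work(item):
        return item * item          # stand-in for the real computation

    def worker(task_q, result_q):
        # Pull inputs off task_q until the None sentinel arrives,
        # pushing each result onto result_q as we go.
        for item in iter(task_q.get, None):
            result_q.put(do_work(item))

    if __name__ == '__main__':
        task_q, result_q = mp.Queue(), mp.Queue()
        workers = [mp.Process(target=worker, args=(task_q, result_q))
                   for _ in range(4)]
        for p in workers:
            p.start()
        for item in range(100):
            task_q.put(item)
        for _ in workers:
            task_q.put(None)        # one sentinel per worker
        results = [result_q.get() for _ in range(100)]
        for p in workers:
            p.join()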
The use of this pattern is relatively straightforward, but there are some things to pay attention to.
Sharing read-only data
The general topic of sharing data between processes is worthy of deeper discussion, and there are some pointers in the Python documentation, but there is an easy way to share read-only data between processes: use globals that are written to in the parent process BEFORE the child processes are started.
If you declare a global variable and then modify it after the child processes are started, or if you modify the global from a child process, you'll enter the twilight zone. There will be no errors, and from within the process where you do the writing everything will look consistent, but the rest of the universe will not pay any attention to your writes.
Queues have overhead – the object has to be pickled by the sender and then un-pickled by the receiver, and while the object is in the queue, it's taking up memory. If you have a large amount of read-only data that needs to be shared amongst child processes – like a lookup table of primes, for example – this pattern of using globals can be pragmatic.
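A minimal sketch of the globals pattern (prime_table and lookup are illustrative names). Note that this relies on the fork start method – the default on Linux – where children inherit the parent's memory; under the spawn start method the child re-imports the module and never sees the assignment.

    import multiprocessing as mp

    prime_table = None  # written by the parent, read-only everywhere else

    def lookup(n):
        # Children may read prime_table freely; writing to it would only
        # change their private copy (the twilight zone described above).
        return prime_table[n]

    if __name__ == '__main__':
        prime_table = [2, 3, 5, 7, 11, 13]   # set BEFORE any child starts
        with mp.Pool(2) as pool:
            print(pool.map(lookup, range(6)))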
Explicitly set Queue size
On Mac OS X the default size of the queue is 32767 items. On Linux, the default size is … INFINITY. I learned this the hard way. I usually prototype my code on my Mac, and then deploy to a Linux container that runs god knows where, on data that is several orders of magnitude larger than my prototype set. On Linux things SEEMED to run fine, but the size of the result set was abnormally small compared to the input. After a lot of poking around I discovered this property of the Python implementation: on Linux my queues were growing larger and larger, taking up more and more RAM, and the OS started to kill off the child processes. Setting an explicit queue size makes the producers block when the queue becomes full, capping memory usage.
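Setting the cap is a one-liner (1000 here is an arbitrary choice):

    import multiprocessing as mp

    # A bounded queue: put() blocks once 1000 items are in flight, so
    # producers cannot run ahead of consumers and eat all the RAM.
    task_q = mp.Queue(maxsize=1000)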
Wrap the child process function using traceback
Error messages from child processes are messy and uninformative; importantly, the friendly Python stack trace printout is missing. A fairly convenient pattern is to wrap the actual function call within a try-except block in a thin outer function, using the traceback module, like so:
    def func_w(args):
        """A thin wrapper to allow proper tracebacks when things go wrong
        in a child process

        :param args: dict of keyword arguments for func, the actual worker function
        :return: whatever func returns
        """
        import traceback
        try:
            return func(**args)
        except Exception as e:
            traceback.print_exc()
            print('')
            raise e
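In use, the wrapper simply stands in for the bare function when work is dispatched, e.g. (a sketch, assuming func takes keyword arguments):

    pool.map(func_w, [{'x': 1}, {'x': 2}])   # each dict is unpacked as func(**args)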
Check for normal termination from the child
One of the reasons it took me so long to track down the above-mentioned memory error was that when the child processes were being killed by the OS, no Python error was raised. The principled way to check for normal termination is to look at the exitcode value of the process.
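A sketch of the check, assuming workers is the list of Process objects from the pattern above:

    for p in workers:
        p.join()
        if p.exitcode != 0:
            # A negative exitcode means the child was killed by a signal;
            # -9 (SIGKILL) is the signature of the Linux OOM killer.
            raise RuntimeError('{} died with exitcode {}'.format(p.name, p.exitcode))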
When not to use this pattern
Passing data via queues can be expensive in terms of computation and memory. If your individual workers have to pass a lot of data back and forth – say large submatrices of an even larger matrix – it may be neater and faster to use the shared memory objects supplied by the multiprocessing module, which take care of the locking for you.
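A minimal sketch using a shared array (the size, dtype, and worker split are illustrative):

    import ctypes
    import multiprocessing as mp

    def fill(arr, start, stop):
        # The Array carries a lock by default; take it while writing.
        with arr.get_lock():
            for i in range(start, stop):
                arr[i] = float(i)

    if __name__ == '__main__':
        # A flat array of doubles living in shared memory, visible to all children.
        shared = mp.Array(ctypes.c_double, 10000)
        ps = [mp.Process(target=fill, args=(shared, i * 5000, (i + 1) * 5000))
              for i in range(2)]
        for p in ps:
            p.start()
        for p in ps:
            p.join()
        print(shared[0], shared[9999])    # 0.0 9999.0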