Many scientific computing applications consist of processing data and then writing the results out to disk. Often the application will be I/O bound, which means that the computation takes less time than reading or writing the data to disk. Even though modern OSes do their best to write efficiently to the hardware, there are some methods you can use to speed up your I/O-bound application even more.
When writing out computations to disk, the most basic method is to compute, save, compute, save.
The example code is here:
The advantage is that this is very simple, single-threaded code: easy to design, write, and debug. The problem is that while the OS takes care of the final buffering to disk and optimizes the writes in batches, each call to the write function is expensive, especially since it leaves user space and enters the kernel.
Our baseline example performance is:
time ./nobuffer
real    0m6.430s
user    0m2.872s
sys     0m2.532s
(As a refresher, “real” time is wall time, as measured by your stopwatch, “user” is time spent by the code in user space, and “sys” is CPU time spent in the kernel on the program’s behalf, such as servicing its I/O calls.)
A basic upgrade is to buffer the data in memory and then write out the whole buffer with one write call. This code is a little bit more complicated – you have to create a data buffer and keep track of when it fills up – but not too bad. This gives us a speed up simply because we call out to the kernel far less often (Example code here)
time ./singlebuffer
real    0m5.739s
user    0m0.948s
sys     0m2.711s
This is cool: we’ve sped things up, primarily by wasting less time making write calls, though the gain in wall time is not that dramatic.
The next upgrade is more complicated. Here we use two buffers and an extra thread. We save our computation to one buffer while writing out the other buffer to disk. We then switch. We show two versions. In the first one we spin up a new thread each time we write out the buffer. This involves a bit of thread synchronization. (Example code)
time ./doublebuffer
real    0m5.025s
user    0m1.171s
sys     0m3.007s
This is good progress: we’ve cut our wall time down even more, because while we are doing I/O we are also, at least some of the time, doing computation.
In the next version of the double buffer, we use a persistently running thread and a condition variable to synchronize the writing. This code is way, way more complicated. In fact, the first time I wrote this I ended up with a deadlock and had to spend some time debugging what I had missed. (Example code) (It’s not the prettiest code. If I had more time to spend on this, I would make it more understandable, though the principles remain the same.)
time ./doublebuffer2
real    0m4.404s
user    0m1.651s
sys     0m2.754s
So, this complicated code was definitely worth it: we seem to gain quite a bit by not creating a new thread each time we write out the buffer.
In case you were wondering WHY I was playing around with this: my spacecraft trajectory simulator was going to generate gargantuan amounts of data. I mean huge data. My estimate was that a simulation charting 1 year of flight time, with a 1 s time-step, would generate 756 MB/spaceship. I was convinced that I would be I/O bound by this operation. I made all sorts of fancy sketches for double-buffered writing and so on. But then I decided I should actually check: modern OSes do a decent job of buffering disk reads and writes, and my laptop has an SSD, which is fast. I should figure out how bad this all is.
It does seem that using a double buffer is worth it, though it still depends quite a lot on how expensive each computation step is compared to the amount of data. For my particular application, I will still have to profile and make an informed guess as to which buffering method to implement.