Buffering writes

Many scientific computing applications consist of processing data and then writing it out to disk. Often the application will be I/O bound, which means that the computation time is shorter than the time it takes to read or write data to disk. Even though modern OSes do their best to write efficiently to the hardware, there are some methods you can use to speed up your I/O-bound application even more.

When writing out computations to disk, the most basic method is to compute, save, compute, save.

The example code is here:

g++ -std=c++17 -O3 nobuffer.cpp -o nobuffer

#include <cstddef>
#include <fstream>

int main(int argc, char* argv[]) {
    std::ofstream buffer("test.bin", std::ios::binary | std::ios::out);
    for (size_t j = 0; j < (1 << 27); j++) {
        size_t k = (size_t)(j * 1.5 - j * 1.1 + j / 3.2);
        buffer.write((char*)&k, sizeof(size_t));
    }
}

The advantage is that this is very simple, single-threaded code, easy to design, write, and debug. The problem is that while the OS takes care of the final buffering to disk and optimizes the writes into batches, each call to the write function is expensive, especially since it leaves user space and goes into the kernel.

Our baseline example performance is:

time ./nobuffer

real    0m6.430s
user    0m2.872s
sys     0m2.532s

(As a refresher, “real” time is wall time, as measured by your stopwatch, “user” is time spent by the code in user space, and “sys” is time spent calling out to the kernel and waiting for I/O)

A basic upgrade is to buffer the data in memory and then write out the whole buffer with one write call. This code is a little bit more complicated – you have to create a data buffer and keep track of when it fills up – but not too bad. This gives us a speedup simply because we call out to the kernel far less often. (Example code here)

time ./singlebuffer

real    0m5.739s
user    0m0.948s
sys     0m2.711s

This is cool: we’ve sped things up, primarily by wasting less time making write calls, though the gain in wall time is not that dramatic.

The next upgrade is more complicated. Here we use two buffers and an extra thread. We save our computation to one buffer while writing out the other buffer to disk. We then switch. We show two versions. In the first one we spin up a new thread each time we write out the buffer. This involves a bit of thread synchronization. (Example code)

time ./doublebuffer

real    0m5.025s
user    0m1.171s
sys     0m3.007s

This is good progress: we’ve kicked down our wall time even more, because for at least some of the time we are doing computation while the I/O is in flight.

In the next version of the double buffer, we use a persistently running thread and use a condition variable to synchronize the writing. This code is way, way more complicated. In fact, the first time I wrote this I ended up with a deadlock and had to spend some time debugging what I had missed. (Example code) (It’s not the prettiest code. If I had more time to spend on this, I would make it more understandable, though the principles remain the same)

time ./doublebuffer2

real    0m4.404s
user    0m1.651s
sys     0m2.754s

So, this complicated code was definitely worth it, because we do seem to gain quite a bit by not creating a new thread each time we write out the buffer.

In case you were wondering WHY I was playing around with this: My spacecraft trajectory simulator was going to generate gargantuan amounts of data. I mean huge data. My estimates were that for a simulation that charted 1 year of flight time, with a 1 s time-step, I was going to generate 756 MB per spaceship. I was convinced that I would be I/O bound by this operation. I made all sorts of fancy sketches for double-buffered writing and so on. But then I decided I should actually check. I mean, modern OSes do a decent job of buffering disk reads and writes, and my laptop has an SSD, which is fast. I should figure out how bad this all is.

It does seem that using a double buffer is worth it, though how much it helps depends quite a lot on how expensive each computation step is compared to the amount of data. For my particular application, I will still have to profile and make a guess as to which buffering method I should implement.

2 thoughts on “Buffering writes”

  1. Hi Kaushik, I guess your 756MB/ss is raw data, and you can use data compression to trade some CPU resources for I/O bandwidth. It doesn’t cancel double buffering, o’course.

    1. Hi dlgbrdv, thanks for reading. Yes, compression could be an important part. I looked a bit into time series compression. Because the data is likely to have a broad auto-correlation peak, specialized compression algorithms can exploit that structure.
