Simple units checking for cheap in C++

Mixing up units is a very common problem everywhere. A formula takes meters, you put in feet and your Mars probe crashes. It happens. In C++ there are “zero cost” libraries to handle full-fledged dimensional checking, conversions and safeguards. How do they work? Are they really zero cost? Let’s explore using a simple limited example, using what people often call strong typedefs.

(Full code is available as a gist).

Say we have a function that accepts input in meters. In our example the function simply multiplies the distance by two. Without any consideration for units, our code probably looks like

double foo(double m1) { return 2 * m1; }

There are zero safeguards in this code. I can accidentally pass in m1 as feet and get the wrong number.

As a step up, I could define two types

typedef double Meters;
typedef double Feet;

Meters foo(Meters m1) { return 2 * m1; }

In C++, a typedef is only an alias for an existing type, not a new type. So the compiler will silently allow me to do something like this

int main(int argc, char* argv[])
{
    Feet f{ 1 };
    std::cout << double(foo(f)) << std::endl;
}
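To see why the compiler cannot object, here is a small sketch that asks it directly whether the two typedefs are distinct types. The helper name doubled_feet_as_meters is made up for this illustration.

```cpp
#include <type_traits>

typedef double Meters;
typedef double Feet;

// Both names are aliases for the very same type, so the compiler
// has no way to tell a Meters argument from a Feet argument.
static_assert(std::is_same<Meters, Feet>::value, "Meters and Feet are one type");
static_assert(std::is_same<Meters, double>::value, "Meters is just double");

Meters foo(Meters m1) { return 2 * m1; }

// Hypothetical helper: hands a Feet value straight to foo.
// This compiles without a warning, and no unit conversion happens.
double doubled_feet_as_meters(Feet f) { return foo(f); }
```

Calling doubled_feet_as_meters(1.0) simply doubles the number it was given, which is exactly the silent mix-up we are trying to prevent.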

We want stronger checks! What we could do instead is use what people often call a strong typedef.

struct Feet {
    explicit operator double() const { return f; }
    double f;
};

struct Meters {
    explicit operator double() const { return m; }
    double m;
};

Here we define a separate structure for each unit of measurement. For now, ignore the double conversion operator. We’ll get to that in a second. Now if we go to compile our program:

Meters foo(Meters m1) { return Meters{ 2 * m1.m }; }

int main(int argc, char* argv[])
{
    Feet f{ 1 };

    std::cout << double(foo(f)) << std::endl;
}

We make the compiler angry! It says:

note: candidate function not viable: no known conversion from 'Feet' to 'Meters' for 1st
      argument
Meters foo(Meters m1) { return Meters{ 2 * m1.m }; }

This is much better! Now we write in some explicit conversions and we have a working program:

#include <iostream>

struct Feet {
    explicit operator double() const { return f; }

    double f;
};

struct Meters {
    Meters(double m)
        : m(m)
    {
    }
    Meters(Feet f)
        : m(0.3048 * f.f)
    {
    }

    explicit operator double() const { return m; }

    double m;
};

Meters foo(Meters m1) { return Meters{ 2 * m1.m }; }

int main(int argc, char* argv[])
{
    Feet f{ 1 };

    std::cout << double(foo(f)) << std::endl;
}

The important part is the new converting constructor, Meters(Feet), which tells the compiler exactly what to do when converting feet to meters.
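As a rough check of what that constructor buys us, the sketch below repeats the two structs and asks the compiler which conversions it now accepts. The helper feet_to_meters is a made-up name for this illustration.

```cpp
#include <type_traits>

struct Feet {
    explicit operator double() const { return f; }
    double f;
};

struct Meters {
    Meters(double m) : m(m) {}
    Meters(Feet f) : m(0.3048 * f.f) {}  // the one sanctioned Feet -> Meters route
    explicit operator double() const { return m; }
    double m;
};

// Feet can now be passed where Meters is expected...
static_assert(std::is_convertible<Feet, Meters>::value, "Feet converts to Meters");
// ...but nothing allows the reverse direction.
static_assert(!std::is_convertible<Meters, Feet>::value, "Meters does not convert to Feet");

// Hypothetical helper: performs the conversion and exposes the raw number.
double feet_to_meters(Feet f)
{
    Meters m = f;      // uses Meters(Feet)
    return double(m);  // explicit cast back to a plain double
}
```

One caveat of this sketch: Meters(double) is not marked explicit, so a bare double still converts to Meters silently. Marking that constructor explicit would close the gap, at the price of writing Meters{...} at every call site.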

Explicit double

The double operator is there, as you have guessed by now, so we can convert the quantity to a plain double when we want. The explicit keyword is there because we don’t want the compiler to decide on its own when a conversion is to be made. We want to be in control of that. Why? Because if we don’t, the compiler will let us do dangerous things like:

#include <iostream>

struct Feet {
    operator double() const { return f; }

    double f;
};

struct Meters {

    operator double() const { return m; }

    double m;
};

int main(int argc, char* argv[])
{
    Meters m{ 1 };
    Feet   f{ 1 };

    std::cout << m + f << std::endl;
}

Note how it lets us merrily add meters and feet! It has implicitly converted both to doubles and run with that. One thing we could do is settle on some base unit, say meters, and then in the double operator for feet convert it into meters. But that gets messy: what if we want to operate with the length in feet for a bit? These implicit conversions just add to the confusion.
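For contrast, here is a sketch of the same two structs with the explicit keyword restored, plus a small compile-time probe showing that m + f is no longer a valid expression. The can_add trait and the total_in_meters helper are made-up names for this illustration, not part of any library.

```cpp
#include <type_traits>
#include <utility>

struct Feet {
    explicit operator double() const { return f; }
    double f;
};

struct Meters {
    explicit operator double() const { return m; }
    double m;
};

// Hypothetical detection trait: true when `A + B` is a valid expression.
template <typename A, typename B, typename = void>
struct can_add : std::false_type {};

template <typename A, typename B>
struct can_add<A, B, decltype(void(std::declval<A>() + std::declval<B>()))>
    : std::true_type {};

static_assert(can_add<double, double>::value, "plain doubles still add");
static_assert(!can_add<Meters, Feet>::value, "Meters + Feet is rejected");
static_assert(!std::is_convertible<Meters, double>::value, "no silent decay to double");

// Mixing units now has to be spelled out, conversion factor included.
double total_in_meters(Meters m, Feet f)
{
    return static_cast<double>(m) + 0.3048 * static_cast<double>(f);
}
```

The casts are noisy on purpose: every place where a unit is dropped is visible in the source.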

What is the bill?

How much does all this cost at runtime? To find out, we are going to do a simple test where we do what we just did, but on an array. Our simple weakly typedef-ed code will be

#include <iostream>
#include <vector>

typedef double Meters;
typedef double Feet;

Meters foo(Meters m1) { return Meters{ 2 * m1 }; }

int main(int argc, char* argv[])
{
    size_t N = 10000000;

    std::vector<Feet> feet;
    feet.reserve(N);
    for (int i = 0; i < N; i++) {
        feet.push_back(i);
    }

    Meters sum = 0.0;
    for (int i = 0; i < N; i++) {
        sum += foo(0.3048 * feet[i]);
    }

    std::cout << sum;
}

And our strongly typedef-ed code is:

#include <iostream>
#include <vector>

struct Feet {
    double f;
    constexpr Feet(double f)
        : f(f)
    {
    }
    explicit constexpr operator double() const { return f; }
};

struct Meters {
    double m;

    constexpr Meters(double m)
        : m(m)
    {
    }

    constexpr Meters(const Feet& f)
        : m(0.3048 * f.f)
    {
    }

    constexpr Meters& operator+=(const Meters& rhs)
    {
        m += rhs.m;
        return *this;
    }

    explicit constexpr operator double() const { return m; }
};

Meters foo(Meters m1) { return Meters{ 2 * m1.m }; }

int main(int argc, char* argv[])
{
    size_t N = 10000000;

    std::vector<Feet> feet;
    feet.reserve(N);
    for (int i = 0; i < N; i++) {
        feet.push_back(i);
    }

    Meters sum = 0.0;
    for (int i = 0; i < N; i++) {
        sum += foo(feet[i]);
    }

    std::cout << double(sum);
}

constexpr

Before we go on, a quick note about “constexpr”. It tells the compiler that the function may be evaluated at compile time when its inputs are compile-time constants, saving us some runtime budget. As you can imagine, people have done some fun things with this. It is not really relevant to the benchmark here, but it is useful to have in this kind of code, e.g. if we have a constant in Feet that is used in a Meters computation: the conversion can then be done once at compile time rather than repeatedly at run time.
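A minimal sketch of that last case, assuming every constructor involved (including the Feet one) is marked constexpr. The constants kCeilingHeight and kCeilingInMeters and the helper ceiling_in_meters are made up for this illustration.

```cpp
struct Feet {
    double f;
    constexpr Feet(double f) : f(f) {}
    explicit constexpr operator double() const { return f; }
};

struct Meters {
    double m;
    constexpr Meters(double m) : m(m) {}
    constexpr Meters(const Feet& f) : m(0.3048 * f.f) {}
    explicit constexpr operator double() const { return m; }
};

// A hypothetical constant given in feet...
constexpr Feet kCeilingHeight{10.0};
// ...converted to meters once, by the compiler.
constexpr Meters kCeilingInMeters{kCeilingHeight};

// static_assert only accepts compile-time values, so this line compiling
// shows that the conversion happened before the program ever ran.
static_assert(double(kCeilingInMeters) > 3.04 && double(kCeilingInMeters) < 3.05,
              "10 ft is about 3.048 m");

// Hypothetical helper exposing the precomputed value at run time.
double ceiling_in_meters() { return double(kCeilingInMeters); }
```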

Profiling

The weakly typed code (basically just operating on doubles) when compiled with

clang++ --std=c++17

runs in 385ms, with the costliest function being foo, which takes up 18ms. (The most costly operation, besides startup, is the destruction of the vector, which takes 50ms.)

The strongly typedef-ed code takes 581ms, with foo now taking 72ms, some of the time being taken up by our Meters(double) constructor. The converting constructor that turns Feet into Meters takes up 41ms, the Feet constructor 25ms, the Meters += operator 17ms … oh my goodness, this is quite a bill!
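The numbers above come from a profiler. For a rough cross-check without one, the hot loop can also be timed by hand with std::chrono. This is a different and much coarser technique, shown only as a sketch; the function sum_doubled_meters and its elapsed_ms parameter are made up for this illustration.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

struct Feet {
    double f;
    constexpr Feet(double f) : f(f) {}
};

struct Meters {
    double m;
    constexpr Meters(double m) : m(m) {}
    constexpr Meters(const Feet& f) : m(0.3048 * f.f) {}
    Meters& operator+=(const Meters& rhs) { m += rhs.m; return *this; }
    explicit constexpr operator double() const { return m; }
};

Meters foo(Meters m1) { return Meters{ 2 * m1.m }; }

// Hypothetical helper: runs the conversion loop over n values, returns the
// sum in meters, and reports the loop's wall-clock time via elapsed_ms.
double sum_doubled_meters(std::size_t n, double& elapsed_ms)
{
    std::vector<Feet> feet;
    feet.reserve(n);
    for (std::size_t i = 0; i < n; i++) {
        feet.push_back(Feet(static_cast<double>(i)));
    }

    // Time only the loop we care about, not the vector setup.
    const auto start = std::chrono::steady_clock::now();
    Meters sum = 0.0;
    for (std::size_t i = 0; i < n; i++) {
        sum += foo(feet[i]);
    }
    const auto stop = std::chrono::steady_clock::now();

    elapsed_ms = std::chrono::duration<double, std::milli>(stop - start).count();
    return double(sum);
}
```

Calling it with n = 10000000 mirrors the test program. Single runs are noisy, so repeat the measurement a few times before trusting any difference.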

There is no free lunch …

So, should we shake our heads and accept this penalty and mark “zero cost” as advertising hype? Not quite. Let’s see what happens when we turn on compiler optimizations for speed. Compiling with

clang++ --std=c++17 -O3

now makes our two programs run identically (~150ms each)!

[Figures: weak typedef code; strong typedef-ed code]

So, all this amazing type safety goodness comes for free, if we write our type code decently and turn on compiler optimization.
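Part of the reason the optimizer can do this is that the wrapper adds nothing to the data itself. The sketch below checks a few layout properties at compile time. These hold on the mainstream compilers I am aware of, but they are necessary conditions rather than proof that the generated code is identical; for that, compare the timings or the assembly.

```cpp
#include <type_traits>

struct Meters {
    double m;
    constexpr Meters(double m) : m(m) {}
    explicit constexpr operator double() const { return m; }
};

// The wrapper holds one double and nothing else: no padding, no vtable.
static_assert(sizeof(Meters) == sizeof(double), "no storage overhead");
// It can be copied like raw bytes, just as a double can.
static_assert(std::is_trivially_copyable<Meters>::value, "copies like a double");
// Its single member sits at a predictable location.
static_assert(std::is_standard_layout<Meters>::value, "plain layout");

Meters foo(Meters m1) { return Meters{ 2 * m1.m }; }
```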
