
It's 2020, can we finally get strings right?


30 July 2020


What's wrong with this code?

printf("Hello %s! Systems check running. You're on a trajectory to %s. "
       "Oxygen levels are at %u%%. There is %d %s of rocket fuel left.\n",
       operator_name, dest, o2, fuel_remaining);

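At first glance, it appears well-formed, and pretty darn clear in intent. If it's not immediately obvious what's wrong: the format string contains five conversion specifiers (%s, %s, %u, %d, %s; the %% is a literal percent sign), but only four arguments follow it in the vararg list that printf takes.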

Of course, clang issues a warning for this! (You still have to figure out which format specifier is unmatched, though.)

warning: more '%' conversions than data arguments [-Wformat]
printf("Hello %s! Systems check running. You're on a trajectory to %s.
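Once the warning has pointed you at the unmatched specifier, the fix is simply to pass the missing argument. The sketch below wraps a corrected call in a hypothetical systems_check helper; fuel_unit is an assumed name for whatever was meant to fill the final %s, and snprintf into a local buffer is used only so the result can be inspected.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: five conversions (%s, %s, %u, %d, %s), five arguments.
// %% is an escaped literal '%' and consumes no argument.
// fuel_unit is an assumption: the original never says what the last %s was for.
std::string systems_check(const char* operator_name, const char* dest,
                          unsigned o2, int fuel_remaining,
                          const char* fuel_unit) {
  char buf[256];
  // snprintf never writes past sizeof buf; overly long output is truncated.
  std::snprintf(buf, sizeof buf,
                "Hello %s! Systems check running. You're on a trajectory to %s. "
                "Oxygen levels are at %u%%. There is %d %s of rocket fuel left.\n",
                operator_name, dest, o2, fuel_remaining, fuel_unit);
  return buf;
}
```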

A warning like this can be quite easy to miss in a long scrollback, but you should have tooling that brings it to your attention during a review; it's a common problem, and pretty much any C++ code linter can catch it.

Except when it isn't!

Suppose we have a function const char* fmt_string(const char*, ...) which wraps sprintf for one reason or another.

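#include <cstdio>

template<typename... Args>
const char* fmt_string(const char* fmt, Args... args) {
  // Possible (unsafe) impl. Please, don't mind the buf:
  // it leaks, and sprintf can overflow it.
  char* buf = new char[100];
  std::sprintf(buf, fmt, args...);
  return buf;
}
// No arguments: nothing to substitute, so hand the string back as-is.
const char* fmt_string(const char* fmt) { return fmt; }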

Is this code correct?

fmt_string("Hello %s! Systems check running. You're on a trajectory to %s.\n "
           "Oxygen levels are at %u%%. There is %d %s of rocket fuel left.\n",
           operator_name, dest, o2, fuel_remaining);

My compiler thinks so! We've added just a single level of indirection, and suddenly the format warning is erased. No one notices the missing param during code review, and I'm left wondering why I'm getting address errors.
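One partial mitigation, sketched here under the assumption that you compile with GCC or Clang: both accept a format attribute that marks a parameter as a printf-style format string, so -Wformat checks callers again. The attribute expects the checked arguments to be a C-style "..." (GCC, for one, rejects it otherwise), so this hypothetical fmt_string drops the variadic template, forwards a va_list to vsnprintf, and returns a std::string sized to fit instead of a leaked fixed-size buffer.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical checked wrapper (GCC/Clang only). format(printf, 1, 2) means:
// parameter 1 is a printf-style format string, and the arguments to check
// against it start at parameter 2, the "...".
__attribute__((format(printf, 1, 2)))
std::string fmt_string(const char* fmt, ...) {
  std::va_list args;
  va_start(args, fmt);
  std::va_list args_copy;
  va_copy(args_copy, args);  // a va_list can only be walked once, so copy it

  // First pass: ask vsnprintf how long the output will be.
  int len = std::vsnprintf(nullptr, 0, fmt, args);
  va_end(args);

  std::string out;
  if (len > 0) {
    // Second pass: room for len chars plus the NUL vsnprintf always writes.
    out.resize(static_cast<std::size_t>(len) + 1);
    std::vsnprintf(&out[0], out.size(), fmt, args_copy);
    out.resize(static_cast<std::size_t>(len));  // drop the trailing NUL
  }
  va_end(args_copy);
  return out;
}
```

With the attribute in place, a call like fmt_string("There is %d %s of rocket fuel left.", 42) should trigger -Wformat again (on by default in Clang, enabled by -Wall in GCC). It only helps printf-style wrappers, though; it does nothing about the underlying design.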

Printf considered harmful

I've previously written about how warnings are not errors. That is not, however, to say that some warnings shouldn't be errors. And it's definitely not to say that they should be elided completely by a simple wrapper function!

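Screwing up printf-style format strings is really easy, especially with a long argument list. Passing fewer or differently typed arguments than the specifiers ask for is undefined behavior: printf will happily read whatever happens to sit where the argument should have been. Spurious or incorrect format specifiers can therefore introduce unsafe memory accesses, and subsequently security bugs.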

Things like this can make me — at times — hate working with C++. Trivial errors like this distract completely from the more interesting problems at hand, but trivial errors like this can also get you pwned!

It really sucks when a little abstraction which could otherwise unlock super powers actually gets in your way for no good reason.

Eliminating an entire category of errors

Working a lot with Kotlin in the past has really spoiled me with its super convenient string syntaxes. String formatting is practically WYSIWYG, and you can pretty-print any object.

println(
    """
    | Hello $operator! Systems check running. You're on a trajectory to $destination.
    | Oxygen levels are at $o2%. There is $fuel_remaining of rocket fuel left.
    """.trimMargin()
)

The above syntax safely replaces each $-prefixed reference in the template string with the string value (via toString()) of the variable of the same name, all for free. This is called string interpolation with embedded references.

With this language feature, it's impossible to miss a parameter. Loads of modern languages have it, often with very similar syntax: Kotlin, Python, JavaScript, Groovy, C#, Swift, and Ruby, among others.

At the time of writing, the equivalent idiomatic C++ is probably the following. Streams, however, can be pretty verbose, sometimes side-effect-y (a manipulator like std::hex changes the stream's state for everything written after it) and incompatible with each other, and they are otherwise hard to abstract over.

std::cout << "Hello " << operator_name << "! Systems check running. You're on a trajectory to "
          << destination << "." << std::endl << "Oxygen levels are at " << o2 << "%. There is "
          << fuel_remaining << " of rocket fuel left." << std::endl;
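The side-effect-y part is easy to demonstrate. A manipulator such as std::hex doesn't format a single value; it flips a flag on the stream, and that state persists for every later insertion. The function name below is made up for illustration.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical demo: std::hex sets the stream's basefield flag, so every
// integer inserted afterwards is also printed in hexadecimal.
std::string sticky_hex_demo() {
  std::ostringstream out;
  out << std::hex << 255 << ' ' << 16;  // 16 comes out as "10", not "16"
  return out.str();
}
```

On a shared stream such as std::cout, that state would leak into unrelated code further down until someone resets it with std::dec.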

Other languages, like Rust, provide the same guarantees with similar convenience and readability through parameterized string interpolation, which decouples the template from the variables.

The good news is that the draft C++20 specification plans to provide safe string interpolation with parameterized references à la Rust's format! macro. The new std::format family of functions will do away with printf's type-encoding conversion specifiers like %d and %s, and finally provide a good alternative to printf.

Until then, fmtlib provides an implementation, as well as a ton of other awesome features.

#include <fmt/core.h>

int main() {
  fmt::print("The answer is {}.", 42);  // Prints what you expect!
  fmt::print("Curly braces {} expect values!");  // Runtime error, but no unsafe memory access occurs.
}
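To get a feel for why a mismatch can surface as a clean runtime error rather than undefined behavior, here is a toy formatter. It is a hypothetical sketch, not how fmtlib or std::format are actually implemented: it supports only bare {} placeholders (no escaping, no format specs), and it relies on operator<<, so the compiler type-checks every argument. A placeholder with no matching argument throws; as with std::format, extra arguments are simply ignored.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>

// Base case: no arguments left. Any remaining "{}" has nothing to fill it.
inline void toy_format_impl(std::ostringstream& out, const std::string& fmt,
                            std::size_t pos) {
  if (fmt.find("{}", pos) != std::string::npos)
    throw std::runtime_error("format string has more {} than arguments");
  out << fmt.substr(pos);
}

// Recursive case: fill the next "{}" with the first argument, then recurse.
template <typename T, typename... Rest>
void toy_format_impl(std::ostringstream& out, const std::string& fmt,
                     std::size_t pos, const T& first, const Rest&... rest) {
  std::size_t hole = fmt.find("{}", pos);
  if (hole == std::string::npos) {
    out << fmt.substr(pos);  // no placeholders left: ignore extra arguments
    return;
  }
  out << fmt.substr(pos, hole - pos) << first;  // operator<< is type-checked
  toy_format_impl(out, fmt, hole + 2, rest...);
}

// Hypothetical entry point, loosely modeled on the "{}" syntax of fmt::format.
template <typename... Args>
std::string toy_format(const std::string& fmt, const Args&... args) {
  std::ostringstream out;
  toy_format_impl(out, fmt, 0, args...);
  return out.str();
}
```

Because the toy only parses the format string when it runs, the best it can do with a missing argument is throw at runtime, just like the fmt::print call above.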

If only it could be a compile-time error like it is in the equivalent Rust.

error: 1 positional argument in format string, but no arguments were given
  --> src/main.rs:3:27
  |
3 |     format!("Curly braces {} expect values!");
  |                           ^^