Octal Zero considered harmful

🅭🅯🄎 8. April 2018

Abstract: Octal literals that are distinguished by a leading zero are unnecessary, dangerous and should be deprecated. Better syntactic choices are available, though the use of octal numbers outside very few areas is highly questionable to begin with.

Let’s begin with the following two questions:

When was the last time you productively used an octal number-representation?
For anything other than Unix-style file permissions?

My personal answer to the first question is likely “more than a year ago”, and the answer to the second one is, unless I forget something, “not ever”. I’m positive that this is typical for most programmers. We certainly make regular use of hexadecimal numbers, occasionally touch binary numbers and most definitely use decimal all the time. The same is not true for octal.

If you disagree, please share your uses on Reddit; I will follow the discussion there.

Let’s compare this to how we write our literals:


auto b = 0b101;     // binary
auto d = 42;        // decimal
auto o = 023;       // octal
auto h = 0xC05FEFE; // hexadecimal

Decimal is the default if we don’t write anything (as it should be), and binary and hexadecimal are prefixed with 0b and 0x respectively, neither of which can occur in a valid decimal string. That is all good so far. Octal, the most obscure representation of all, does however receive special treatment: its prefix is a bare 0, so every integer literal with a leading zero is read as octal. As a result it is impossible to pad decimal numbers with zeros; only whitespace is left for alignment:


auto b1 = 0b00000001;
auto b2 = 0b00000010;
auto b3 = 0b00000100;

auto d1 = 1;
auto d2 = 10;
auto d3 = 100;

auto o1 = 0001;
auto o2 = 0010;
auto o3 = 0100;

auto h1 = 0x001;
auto h2 = 0x010;
auto h3 = 0x100;

This means that we have fewer options for good formatting and less readability for the most common representation. I assert that this is a stupid thing, and doubly so because we get very little value in return.
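To make the effect concrete, here is a small sketch of what the zero-padded literals from the listing above actually evaluate to. The constant names are mine and only serve as illustration; the code assumes C++14 or later because of the binary literal.

```cpp
// Zero-padded "decimal" constants are silently read as octal.
constexpr int padded_one     = 0001; // octal 1   -> 1
constexpr int padded_ten     = 0010; // octal 10  -> 8, not 10
constexpr int padded_hundred = 0100; // octal 100 -> 64, not 100

static_assert(padded_one == 1, "0001 is still one");
static_assert(padded_ten == 8, "0010 is eight");
static_assert(padded_hundred == 64, "0100 is sixty-four");

// Padding is harmless as soon as a real prefix is present.
static_assert(0x010 == 16, "hexadecimal keeps its value when padded");
static_assert(0b00000010 == 2, "binary keeps its value when padded");
```

Only the first of the three padded constants survives the padding with the value its author presumably intended.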

Some people will now argue that padding with whitespace is more readable anyway; they are kind of right, but your auto-formatter is way more likely to destroy padding that uses whitespace than it is to destroy zero-padding. And even though I have only limited experience with teams, my very first team project was enough to convince me that any code with more than one regular author requires regular auto-formatting, preferably as a git hook before every commit. And this is coming from someone who truly believes that well-done manual formatting is superior to everything auto-formatters will ever produce.

It does however get worse: While you might expect programmers to know these things, you certainly cannot expect users to. Now, users don’t input literals, so one might think that there is little potential for harm, but let’s look a bit closer at our standard library: The preferred way to convert strings (which may well originate from user input) into numbers is the std::sto* family. Luckily these functions assume base 10 by default, and the input cannot convince them to use anything else. But let’s assume that you want to allow your users to input hexadecimal numbers as well. Conveniently, base = 0 enables auto-detection. Regular users won’t notice a difference, while advanced users can input hexadecimal values if those are more appropriate (think of a color picker, where you enter the value of a color channel). This works great until one of the regular users enters a zero-padded number, which is then read as octal.
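A minimal sketch of that trap, assuming the input arrives as a std::string. The helper name parse_auto and the demonstration function are hypothetical and exist only for this example; std::stoi with base 0 is the standard call described above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: parse user input and let the library guess the base.
inline int parse_auto(const std::string& text) {
    return std::stoi(text, nullptr, 0);
}

// Documents the behaviour described in the text; every assertion holds.
inline void demonstrate_auto_detection() {
    assert(parse_auto("10") == 10);    // plain decimal
    assert(parse_auto("0x10") == 16);  // explicit hexadecimal
    assert(parse_auto("010") == 8);    // zero-padded "ten" is read as octal
}
```

The user who typed 010 most likely meant ten and gets eight, without any error being reported.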

The obvious answer to that is of course “don’t use base=0”, but I think that this deserves more discussion: If it were not for those pointless octal values, this would have worked great. As it is, I have to implement the base distinction myself or suffer subtle bugs and potential crashes (maybe I checked for a valid numerical format beforehand, but octal input didn’t get caught because it fits the regular expression [0-9]+; parsing then stops at the first 8 or 9, so “019” silently becomes 1).
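Implementing the base distinction by hand does not take much code. The following is only a sketch under my own assumptions: the name parse_dec_or_hex is made up, signs and leading whitespace are deliberately not supported for hexadecimal input, and anything that is not consumed completely is rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decimal by default, hexadecimal only with an explicit
// "0x"/"0X" prefix, never octal.
inline int parse_dec_or_hex(const std::string& text) {
    const bool is_hex = text.size() > 2 && text[0] == '0' &&
                        (text[1] == 'x' || text[1] == 'X');
    std::size_t consumed = 0;
    const int value = std::stoi(text, &consumed, is_hex ? 16 : 10);
    if (consumed != text.size()) {
        // std::stoi stops at the first character it cannot use; treat that as an error.
        throw std::invalid_argument("trailing characters in number: " + text);
    }
    return value;
}

// Documents the intended behaviour; every assertion holds.
inline void demonstrate_manual_detection() {
    assert(parse_dec_or_hex("010") == 10);   // padding no longer changes the value
    assert(parse_dec_or_hex("019") == 19);   // 8 and 9 are ordinary digits again
    assert(parse_dec_or_hex("0x10") == 16);  // hexadecimal is still available
}
```

This relies on std::stoi accepting an optional 0x prefix when the base is 16, which it inherits from strtol.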

Hopefully I have now convinced you that there is an issue. Now let’s see why we don’t get anything useful in exchange: As noted before, the only somewhat common use (that I am aware of) is setting permissions. I assert that this is a bad idea on its own:

In my experience the preferred method of setting them with chmod is not the use of numbers but meaningful letters such as u+rw to give the owner the permission to read and write the file.
Most modern permission-APIs will have you handle enums instead, since they are more semantic.
Numbers don’t represent permissions, using them as such is an extreme case of exposing users to implementation-details that they should not be exposed to.
Even if we want to expose implementation details, octal numbers are still not reasonable, since we are literally talking about bits here, for which a binary representation is more natural anyway. Octal just requires users to perform the conversion to binary in their heads.
It’s error-prone: The number that you input is also used for further file-settings like the sticky-bit. While accidents from typos are unlikely, they can happen. Since we gain little from it, why take chances?
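As an illustration of the enum and binary points, here is a sketch that uses std::filesystem::perms from C++17 as one example of such an API. The function and constant names are invented for this example.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// rw-r--r--, spelled with named flags instead of a bare number.
inline fs::perms owner_writable_world_readable() {
    return fs::perms::owner_read | fs::perms::owner_write |
           fs::perms::group_read | fs::perms::others_read;
}

// The same bit pattern as a raw number, in octal and in grouped binary.
constexpr unsigned octal_mode  = 0644;
constexpr unsigned binary_mode = 0b110'100'100;
static_assert(octal_mode == binary_mode, "both literals describe the same bits");

// Documents that the named flags produce exactly those bits.
inline void demonstrate_permissions() {
    assert(static_cast<unsigned>(owner_writable_world_readable()) == octal_mode);
}
```

The named version says what it means, and the binary version at least shows the three groups of three bits directly.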

The main advantage that octal numbers offer is that their base is a clean power of two that doesn’t require additional digit symbols. I consider that way too narrow to be regularly useful.

Let’s look at some other languages:

In Python 3, entering a nonzero number with leading zeros results in a syntax error. Octal numbers are written with the prefix 0o.
In Rust, leading zeros are ignored: 023 == 23. Octal literals use the 0o prefix.
D rejects C-style literals such as 023 at compile time instead of silently treating them as octal; octal values are written with the library template std.conv.octal.

We see a pattern emerging: Most later languages didn’t copy this mistake. If we want to fix C++ without removing octal literals completely (for whatever value they really have), the best way would be to use 0o as an alternative prefix, since it shares the virtues of 0x and 0b: It does not form a valid decimal integer and is already well established, making it a perfect fit.
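C++ has no 0o prefix today, but an explicit, impossible-to-misread octal spelling can be approximated with a user-defined literal. The following is purely a sketch of the idea and not a proposal: the suffix _oct is my invention, and the code assumes C++14 or later.

```cpp
#include <stdexcept>

// Hypothetical suffix: 644_oct reads the digits 6, 4, 4 as base 8.
// The raw literal operator receives the digits as written in the source.
constexpr unsigned long long operator""_oct(const char* digits) {
    unsigned long long value = 0;
    for (const char* p = digits; *p != '\0'; ++p) {
        if (*p < '0' || *p > '7') {
            // Reaching this during constant evaluation is a compile-time error.
            throw std::invalid_argument("not an octal digit");
        }
        value = value * 8 + static_cast<unsigned long long>(*p - '0');
    }
    return value;
}

static_assert(644_oct == 420, "6*64 + 4*8 + 4");
static_assert(23_oct == 19, "2*8 + 3");
```

Like 0x and 0b, the marker makes the base visible at the point of use, and a plain 23 stays decimal no matter how it is padded by the reader’s eye.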

The reason I am writing this article is that I just saw that fmtlib, which is likely to become part of the C++ standard in the near future, allows printing octal numbers (no issue so far), but the (admittedly optional) prefix is still 0. While it’s not as bad for printing as it is for reading, I believe that this is just another (small) step in the wrong direction. Instead we should standardize the 0o prefix and deprecate the 0 prefix in all places where it is currently used.
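The same prefix convention already exists in iostreams, so I use those for a sketch here rather than fmtlib itself, to stay within the standard library. Both function names are hypothetical; the second one shows what output with an unambiguous prefix could look like.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>

// Octal with the traditional prefix, as iostreams produce it with showbase.
inline std::string octal_with_zero_prefix(unsigned value) {
    std::ostringstream out;
    out << std::showbase << std::oct << value;
    return out.str();
}

// Hypothetical alternative: octal with an unambiguous "0o" prefix.
inline std::string octal_with_0o_prefix(unsigned value) {
    std::ostringstream out;
    out << "0o" << std::oct << value;
    return out.str();
}

// Documents the two output formats for the value 19; every assertion holds.
inline void demonstrate_octal_output() {
    assert(octal_with_zero_prefix(19) == "023");  // looks like a padded 23
    assert(octal_with_0o_prefix(19) == "0o23");   // cannot be mistaken for decimal
}
```

Anyone who reads the first output without context will take it for twenty-three.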
