Let’s begin with the following two questions:
- When was the last time you productively used an octal number representation?
- For anything other than Unix-style file permissions?
My personal answer for the first question is likely more than a year, and the answer for the second one is, unless I forget something, “not ever”. I’m positive that this is typical for most programmers. We certainly make regular use of hexadecimal numbers, occasionally touch binary numbers and most definitely use decimal all the time. The same is not true for octal.
If you disagree, please share your uses on Reddit; I will follow the discussion there.
Let’s compare this to how we write our literals:

    auto b = 0b101;     // binary
    auto d = 42;        // decimal
    auto o = 023;       // octal
    auto h = 0xC05FEFE; // hexadecimal

Decimal is the default if we don’t write anything (as it should be)
and binary and hexadecimal are prefixed with 0b and 0x respectively,
each of which can never occur in a valid decimal string. That is all
good so far. Octal, the most obscure representation of all, does
however receive special treatment in that its prefix is just a
leading zero. As a result, it is impossible to pad decimal numbers
with anything but whitespace:

    auto b1 = 0b00000001;
    auto b2 = 0b00000010;
    auto b3 = 0b00000100;
    auto d1 = 1;
    auto d2 = 10;
    auto d3 = 100;
    auto o1 = 0001; // 1
    auto o2 = 0010; // 8, not 10
    auto o3 = 0100; // 64, not 100
    auto h1 = 0x001;
    auto h2 = 0x010;
    auto h3 = 0x100;

This means that we have fewer options for good formatting and less readability for the most common representation. I assert that this is a stupid thing, and doubly so because we get very little value in return.
Some people will now argue that padding with whitespace is more readable anyway; they are kind of right, but your auto-formatter is far more likely to destroy padding that uses whitespace than it is to destroy zero-padding. And even though I have only limited experience with teams, my very first team project was enough to convince me that any code with more than one regular author requires regular auto-formatting, preferably as a git hook before every commit. And this is coming from someone who truly believes that well-done manual formatting is superior to everything auto-formatters will ever produce.
It does however get worse: While you might expect programmers to know
these things, you can certainly not expect users to do so. Now, users
don’t input literals, so one might think that there is little
potential harm, but let’s look a bit closer into our standard-library:
The preferred method to convert strings (which may well originate from
user-input) into numbers is the
std::sto* family.
Luckily they will by default assume base 10 and cannot be convinced to
use anything else by the input. But let’s assume that you want to
allow your users to input hexadecimal numbers as well. Luckily
base = 0 enables auto-detection. Regular users won’t notice a difference,
while advanced users can input hexadecimal values if those are more
appropriate (think of a color-picker, where you enter the value of
a color-channel). This works great until one of the regular users puts
in a padded number.
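To make the failure mode concrete, here is a minimal sketch of what auto-detection does with a zero-padded input. The helper names parse_auto_base and check_auto_base are made up for illustration; the only library call involved is std::stoi with base 0.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: parse user input with automatic base detection.
// Base 0 makes std::stoi treat "0x..." as hexadecimal, a leading "0"
// as octal, and everything else as decimal.
int parse_auto_base(const std::string& input) {
    return std::stoi(input, nullptr, 0);
}

// Small self-check of the behaviour described above.
void check_auto_base() {
    assert(parse_auto_base("10") == 10);   // plain decimal
    assert(parse_auto_base("0x10") == 16); // hexadecimal, as intended
    assert(parse_auto_base("010") == 8);   // zero-padded "10" becomes octal
}
```

The user who typed 010 almost certainly meant ten, and nothing in the result tells the program otherwise.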
The obvious answer for that is of course “don’t use base=0”, but I
think that this deserves more discussion: If it were not for those
pointless octal values, this would have worked great. As it is, I
have to implement the base-distinction myself or suffer subtle bugs
(maybe I checked for a valid numerical format before, but octal
input didn’t get caught because it fits the regular expression
[0-9]+; parsing then silently stops at the first 8 or 9, so the
result is wrong and any check that the whole string was consumed
rejects input the user considers perfectly valid).
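Implementing the base-distinction by hand could look roughly like the sketch below. It is one possible approach under stated assumptions, not a definitive implementation: the names parse_dec_or_hex and check_dec_or_hex are hypothetical, and the input is assumed to be a non-negative number without a sign.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: accept "0x"/"0X"-prefixed hexadecimal and treat
// everything else as decimal, so zero-padded input keeps its decimal value.
// Assumption: the input carries no sign.
int parse_dec_or_hex(const std::string& input) {
    const bool is_hex = input.size() > 2 && input[0] == '0' &&
                        (input[1] == 'x' || input[1] == 'X');
    std::size_t consumed = 0;
    const int value = std::stoi(input, &consumed, is_hex ? 16 : 10);
    // std::stoi stops silently at the first character it cannot use,
    // so reject input that was only partially consumed.
    if (consumed != input.size()) {
        throw std::invalid_argument("trailing characters in: " + input);
    }
    return value;
}

// Small self-check: padded decimals stay decimal, hex still works.
void check_dec_or_hex() {
    assert(parse_dec_or_hex("010") == 10);
    assert(parse_dec_or_hex("09") == 9);
    assert(parse_dec_or_hex("0x10") == 16);
}
```

For comparison, std::stoi("09", &pos, 0) returns 0 and leaves pos at 1, because the octal parse ends at the 9.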
Hopefully I have now convinced you that there is an issue. Now let’s see why we don’t get anything useful in exchange: As noted before, the only somewhat common use (that I am aware of) is setting permissions. I assert that this is a bad idea on its own:
- In my experience the preferred method of setting them with chmod is not the use of numbers but meaningful letters such as u+rw to give the owner the permission to read and write the file.
- Most modern permission-APIs will have you handle enums instead, since they are more semantic.
- Numbers don’t represent permissions; using them as such is an extreme case of exposing users to implementation-details that they should not be exposed to.
- Even if we want to expose implementation-details, octal numbers are still not reasonable, since we are literally talking about bits here, for which binary representation is more natural anyway. Octal just requires users to perform the conversion to binary in their head.
- It’s error-prone: The number that you input is also used for further file-settings like the sticky-bit. While accidents from typos are unlikely, they can happen. Since we gain little from it, why take chances?
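The points about enums and bits can be illustrated with std::filesystem::perms from C++17. This is only a sketch; the variable names are chosen for illustration, and it shows the same permission set (rw-r--r--) written as named enumerators, as an octal literal, and as a binary literal with digit separators.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Named enumerators: the intent is readable without any decoding.
constexpr fs::perms as_enum = fs::perms::owner_read | fs::perms::owner_write |
                              fs::perms::group_read | fs::perms::others_read;

// Octal literal: one digit per owner/group/others triple.
constexpr fs::perms as_octal = static_cast<fs::perms>(0644);

// Binary literal: one bit per permission, grouped as rw- r-- r--.
constexpr fs::perms as_binary = static_cast<fs::perms>(0b110'100'100);

static_assert(as_enum == as_octal, "octal 0644 encodes the same bits");
static_assert(as_octal == as_binary, "binary spelling encodes the same bits");
```

All three spellings compile to the same value; they differ only in how much decoding they ask of the reader.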
The main advantage that octal numbers offer is that they are a clean power of two that doesn’t require additional symbols. I consider that way too narrow to be regularly useful.
Let’s look at some other languages:
- In Python entering a number with leading zeros results in a parsing-error. Octal numbers can be generated by prefixing them with 0o.
- In Rust leading zeros are ignored: 023 == 23.
- D behaves similarly to Rust. It used to have 0o for octal numbers, though that syntax is deprecated now.
We see a pattern emerging: Most later languages didn’t copy this
mistake. If we want to fix C++ without removing octal literals
completely (for whatever value they really have), the best way
would be to use 0o as an alternate prefix, since it shares the
virtues of 0x and 0b: It does not form a valid decimal integer,
and is already well established, making it a perfect fit.
The reason I am writing this article is that I just saw that fmtlib,
which is likely to become part of the C++-standard in the near future,
allows printing octal numbers (no issue so far), but the (admittedly
optional) prefix is still 0. While it’s not as bad for printing as
it is for reading, I believe that this is just another (small) step
in the wrong direction. Instead we should standardize the 0o-prefix
and deprecate the 0-prefix in all places where it is currently used.
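The ambiguity of a 0-prefixed output can be sketched without fmtlib, because iostreams already print octal the same way when std::showbase is set. Note that the code below uses only standard iostream and std::stoi calls as a stand-in, not fmtlib's API, and that the helper names are made up for illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: format a value as octal with the traditional
// "0" prefix, using iostreams as a stand-in for fmtlib.
std::string to_prefixed_octal(int value) {
    std::ostringstream out;
    out << std::showbase << std::oct << value;
    return out.str();
}

// Small self-check: the printed text is indistinguishable from a
// zero-padded decimal number.
void check_prefixed_octal() {
    const std::string text = to_prefixed_octal(8);
    assert(text == "010");
    assert(std::stoi(text) == 10);            // read back as decimal
    assert(std::stoi(text, nullptr, 0) == 8); // read back with auto-detection
}
```

Whoever reads that output back, human or program, has to know out of band that the leading zero is a prefix and not padding; a 0o-prefix would carry that information in the text itself.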