NeverSawUs

Two Bit-Twiddling Tricks

And now for something completely different: I'd like to share my two favorite bit-twiddling tricks with you. One I've used and abused for a while, and is taken from Sean Anderson's wonderful bestiary. The other, semi-related hack is something that only recently "clicked" for me, though I've been aware of its use for a while.

For my first trick...

Modulo is a super useful operation in programming. It makes it easy to create ring buffers, wrap indexes through multi-dimensional arrays, and generally keep a value fenced between zero and an upper bound.

One issue — modulo can be a relatively expensive operation to perform in hot code. It's a floating point operation (at least in JS), which incurs some overhead. In cases where you control the size of the upper bound and the incoming data type is a positive small integer (<2^31), this trick might speed things up.

The trick is to exploit the base-two representation of the number in question. If your upper bound is a power of two, you can get away with a single bitwise AND operation against that upperbound minus one.

We'll illustrate this with 8 as our upper bound — we want x % 8, so we're going to try x & 7.

      ┌───┬───┬───┬───┬───┬───┬───┬───┐
8     | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
      └───┴───┴───┴───┴───┴───┴───┴───┘

       We start with the upper bound, 8. Only a single bit
       is set.
                           -1   1        -1   1   1    -1   1   1   1
┌───┬───┬───┬───┐ ┌───┬───┬───┬───┐ ┌───┬───┬───┬───┐ ┌───┬───┬───┬───┐ ┌───┬───┬───┬───┐
| 1 | 0 | 0 | 0 |→| 1 | 0 | 0 | 0⃫ |→| 1 | 0 | 0⃫ | 0⃫ |→| 1 | 0⃫ | 0⃫ | 0⃫ |→| 0 | 1 | 1 | 1 |
└───┴───┴───┴───┘ └───┴───┴───┴───┘ └───┴───┴───┴───┘ └───┴───┴───┴───┘ └───┴───┴───┴───┘
            ─ 1    Subtracting one will set all bits to the right of the 
                   N-th power-of-two bit. Carrying works just like it does
                   in base 10, it just happens more often in base two!

      ┌───┬───┬───┬───┬───┬───┬───┬───┐
8 ─ 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
      └───┴───┴───┴───┴───┴───┴───┴───┘

      ┌───┬───┬───┬───┬───┬───┬───┬───┐
x     | ? | ? | ? | ? | ? | ? | ? | ? |   X comes in. All bits could be zero or one. We only care
      └─┬─┴─┬─┴─┬─┴─┬─┴─┬─┴─┬─┴─┬─┴─┬─┘   about the [0, 8) value range.
        |   |   |   |   |   |   |   |
       ?&0 ?&0 ?&0 ?&0 ?&0 ?&1 ?&1 ?&1
        ║   ║   ║   ║   ║   ║   ║   ║
      ┌─╨─┬─╨─┬─╨─┬─╨─┬─╨─┬─╨─┬─╨─┬─╨─┐
7 & x | 0 | 0 | 0 | 0 | 0 | ? | ? | ? |   By bitwise-AND'ing, we mask off only the bits in the
      └───┴───┴───┴───┴───┴───┴───┴───┘   [0, 8) range. We proclaim victory and go eat a hoagie.

In JS, if you can be certain that the input types are SMI's (small integers), and your hot code path is performing a lot of modulo operations, you stand to see some performance gains!

I like this trick a lot: not only did it help speed up my inflate implementation, but I find that it's a helpful "gateway drug" to other bit twiddling tricks. For instance, now that we know that each column represents a power of two, it follows that shifting left is equivalent to multiplying your value by a power of two. From there, you can see that shifting right divides by a power of two — and so on, and so forth. Neat!

For my next trick...

I've long been aware of this trick — it's incredibly common to see it used in VMs. For whatever reason though, I had a hard time visualizing how it worked until recently. Actually, the first trick illustrates very nicely how the second trick works.

Tagged pointers allow a language runtime to add metadata directly to pointers to objects. For example, a pointer could contain the type of the object it points to for quickly checking the validity of an operation before incurring the cost of a dereference. Alternatively, it could signal that the value of the pointer is a literal number, so that mathematical operations on "small" integer values doesn't incur a pointer dereference.

You may have intuited by now that tagged pointers are in use in all of the leading JS implementations. A single bit denotes whether a pointer refers to a target (be that a double precision float, an object, or another primitive value) or represents a SMI. This is why SMIs can only represent 2^31 different values.

The key to this trick is the knowledge that many memory allocators require that allocations be 8-byte aligned. This fact eluded me at first!

This means that there are three bits which will never be set by a valid pointer — pointers may only point to multiples of 8!

                  ┌───┬───┬───┬───┬───┬───┬───┬───┐
valid pointer     | ? | ? | ? | ? | ? | 0 | 0 | 0 |
                  └───┴───┴───┴───┴───┴───┴───┴───┘

                   The least significant three bits will *never* be 
                   set on a valid pointer: they're free for us to use.


                  ┌───┬───┬───┬───┬───┬───┬───┬───┐
SMI               | ? | ? | ? | ? | ? | ? | ? | 1 |
                  └───┴───┴───┴───┴───┴───┴───┴───┘

                   If the least significant bit is set, then all other
                   bits represent an integer value. We need only shift
                   right one to obtain the value.


                  ┌───┬───┬───┬───┬───┬───┬───┬───┐
tagged pointer    | ? | ? | ? | ? | ? | 1 | 1 | 0 |
                  └───┴───┴───┴───┴───┴───┴───┴───┘

                   We can store extra metadata in the next two bits - the
                   type of the referenced object, for example. We're limited
                   to 2 bits, so only four values are available: the other bit
                   is already used to indicate whether the value is a SMI or
                   not! The original pointer value can be retrieved by AND-ing 
                   it with the complement of 7 (~7).