Bit twiddling

Sol's take on bit masking, bit manipulation and Boolean algebra.

Foreword

Some readers of my graphics tutorials have expressed frustration... err, interest in bitwise operations, and have asked me to write a tutorial about them.

Bitwise operations and, in general, Boolean algebra is rather difficult to explain in an entertaining way, but it's just one of those things that any programmer worth their salt (or dublons, or whatever) pretty much has to know by heart. Instead of being enteretaining, I try to be practical. I'll go through the basics, as well as some useful applications and tricks.

1. Bits, bytes, words, dwords, and so on

Computers are built on transistors, which are kind of like switches. Either electricity flows or it doesn't. On and off. One and zero. It follows from this that the very basic operations in computing are done through bits.

Doing more complicated calculations requires us to use more bits. The magnitude of an unsigned integer that can be represented by N bits is easily calculated as 2^N-1, like so:

Bits	2^N	Max value
1	2	1
2	4	3
3	8	7
4	16	15
5	32	31
6	64	63
7	128	127
8	256	255
9	512	511
10	1024	1023
11	2048	2047
12	4096	4095
13	8192	8191
14	16384	16383
15	32768	32767
16	65536	65535

If you've played with computers for a while, you should recognize many of the above values from various places in computer software and hardware.

Just to make sure we're on the same page with the notation:

2^N means 2*2*.. N times. i.e. 2⁴ = 2*2*2*2 = 16

Typical, but not in any way standard naming convention for groups of bits is as follows:

Name	Bits	Max value
Bit	1	1
Nibble	4	15
Byte	8	255
Word	16	65535
Doubleword	32	4294967295

Historically, a 'word' is supposed to be the native machine word on whatever platform you're working on, which then should on the current 32-bit platforms be 32-bit. However, the longevity of the x86 platform has blurred things a bit. I don't know if there's any standardized term for 64-bit words, but I've heard qword (or quadrupleword) been thrown around.

The C standard is a bit loose on the integer data type definitions, but the 'de facto' sizes for data types are as follows:

Data type	Bits
char	8
short	16
int	32
long	32
long long	64

The definitions of the data types are bound to change from one platform to another. For example, on 64-bit platforms, 'long' may be 64-bit, and on 16-bit platforms, 'int' may be 16-bit.

Note the data type char. It was supposed to be a data type that contains a character, which back then meant the western alphabet, which conveniently fits inside 8 bits. Java chars are 16-bit, and most of the internationalized applications these days use unicode, and that's way outside the scope of this tutorial. Let's just say that chars are usually 8 bit and get over it.

The C99 standard adds a header stdint.h which includes definitions for specific integer types, including int16_t for 16-bit signed integer and uint32_t for unsigned 32-bit integer. This header has finally started to appear even in Microsoft's tools, so you may wish to use it (if you care about specific variable type sizes, that is).

There's no native data types for nibbles or bits, but they're handy to know nevertheless. Nibble happens to be a half-byte, and can be expressed with one hexadecimal digit. I won't go deep into hexadecimals (or octals) here, though.

Native data types are by default signed (although there are some exceptions here). Most of the bit manipulations work equally well on signed and unsigned values, while others are tricky or even undefined on signed values. Hence, if you run into problems, first see if your variables are unsigned.

Since standard C does not have a notation for entering binary numbers, using hex is the easiest way. Setting single bits in hex is relatively easy, like so:

Bit pattern	Hex
00000000	0x00
00000001	0x01
00000010	0x02
00000100	0x04
00001000	0x08
00010000	0x10
00100000	0x20
01000000	0x40
10000000	0x80

2. AND, OR, NOT, XOR

The three common Boolean operations are AND, OR and NOT. XOR is also so handy that it's good to know. (XOR can be implemented using AND, OR and NOT. Trust me).

AND ('&') can be thought of as a multiplication and OR ('|') can be considered as an addition operation. If the result is zero, the result is zero. If it's anything else, the result is one.

NOT ('~') negates the value. Meaning, 0 becomes 1, and 1 becomes 0.

XOR ('^'), or exclusive or, is 1 if the two parameters don't agree, and 0 if they do.

A	B	A and B	A or B	A xor B	not A	not B
0	0	0	0	0	1	1
1	0	0	1	1	0	1
0	1	0	1	1	1	0
1	1	1	1	0	0	0

In C, these operations don't work on single bits, but whole variables at the same time. Brace yourself, the following table may look somewhat confusing, but it's really just the same thing as above:

Bit pattern	Operation	Value
0000011110110111	A	1975
0000011111010110	B	2006
0000011110010110	A and B	1942
0000011111110111	A or B	2039
0000000001100001	A xor B	97
1111100001001000	not A	63560
1111100000101001	not B	63529

Wasn't that fun? Ok, so what does this have to do with anything?

Many APIs have been developed so that different features are enabled or disabled using bit masks.

Let's say you have a couple of bit masks, FEATURE1 and FEATURE2, and you need to set a variable to hold these bits. You'll want to use the operator that combines the masks, i.e. turns the bits on. You'll want to use OR.

variable = FEATURE1 | FEATURE2;

Now then, if you want to test if variable has FEATURE1 bits on. You want to separate the bits out of the variable, or in other words you want to find the bits that are on in both variable and FEATURE1. Choose AND.

if ((variable & FEATURE1) == FEATURE1)

If the FEATURE1 has only one bit on, you can simplify this to:

if (variable & FEATURE1)

If you're unsure what FEATURE1 includes, or if you don't want to check whether any of the bits are enabled in variable, it's better to use the first version.

If you get a variable, and have no idea what it contains, but you need to disable FEATURE2, you'll end up with the following kind of snippet:

variable = variable & ~(FEATURE2);

Here variable gets ANDed with all bits on except the ones that were on in FEATURE2.

One thing to keep in mind is that C has two kinds of boolean operators, namely logical and binary operators. Logical operators come in pairs - && and || versus the binary operators & and |. The logical operators don't care about binary contents of the parameters, and just check if the whole variable is zero or not. You've been warned.

So what's the XOR useful for?

If you want to flip the FEATURE1 on and off, but don't really care which state it came in with, you can use the XOR operator.

variable = variable ^ FEATURE1;

A variable can also be cleared by XORing it with itself. Generally speaking, XORing something twice with the same value returns you to the original value. This has led to some very simple encryption algorithms.

3. Shifting

Shifting is a very useful bitwise operation, as many multiplications and divisions can be done using simple shift operations, and it's faster than using actual multiplication and division (except for some perverse platforms such as the 8051 microcontroller).

There are two different kinds of shift operations; arithmetic and logical. For unsigned values, the two are the same, but for negative values, the arithmetic shift keeps negative values as negative.

The C standard (as far as I know, anyway) does not define what should happen if signed values are shifted, but typically C compilers seem to perform arithmetic shifts on signed values and logical shifts on unsigned values.

So what's shifting about, then? Moving bits within a word.

Bits	Operation	Value
0000011110110111	none	1975
0000111101101110	<< 1	3950
0001111011011100	<< 2	7900
0011110110111000	<< 3	15800
0111101101110000	<< 4	31600
1111011011100000	<< 5	63200
1110110111000000	<< 6	60864
1101101110000000	<< 7	56192
1011011100000000	<< 8	46848

As seen from the table above, shifting to the left multiplies the value by 2. Shifting by two multiplies by 4, shifting by 4 multiplies by 16, and shifting by 6 multiplies by 64. Sound familiar? Back to the powers of two.

As the bits are shifted left, new zeros are added to the right end of the bit stream.

1975 multiplied by 64 is actually 126400, but since we only had 16 bits available above (or a 'short'), the highest bits get dropped off and the value is truncated to 60864.

Confused? It's ok, this might take some time to figure out.

Similar operations shifting right:

Bits	Operation	Value
0000011110110111	none	1975
0000001111011011	>> 1	987
0000000111101101	>> 2	493
0000000011110110	>> 3	246
0000000001111011	>> 4	123
0000000000111101	>> 5	61
0000000000011110	>> 6	30
0000000000001111	>> 7	15
0000000000000111	>> 8	7

Here shifting by one bit is division by two. By two bits is division by 4, 4 is division by 16, and 6 is division by 64.

As the bits are shifted right, new zeros (here, red) are added to the left end of the bit stream.

Note that if you shift enough to the left or right, you'll end up with zero! If you shift with larger values than the target word, the C standard doesn't even guarantee that you end up with zero. Thus, it's better not to try.

In arithmetic shifts, the negative values are recognized by the fact that the highest bit is 1, and the highest bit is simply replicated. It's usually safer to just stick to unsigned values when shifting things in C.

Most CPU architectures also have operations for bit rotation. This is also a very handy operation, where the bits that would be shifted outside the word are inserted back in to the other end of the word. Unfortunately C does not have an operator for this, and this is an example where many operations in C may result in a single operation in Assembler.

4. Tricks and tips

When converted to binary (use a calculator if you want), 2^N results in one bit on and the rest of the bits as zero, while 2^N-1 ends up with a string of ones. This is very useful in truncating values.

Let's say you have a variable and want to discard all too high bits so that the value is below 64. Since 64-1 = 63, and 63 happens to have a bit pattern of just six '1':s, you can do the following:

variable = variable & 63;

You may end up in a situation where you need to truncate a value to some power of two. You can be sneaky about it and write something like;

variable = variable & ((1 << poweroftwo) - 1);

Let's say you need to multiply by some value that's not a power of two; say, 320, and you don't want to (or can't, for some reason) use multiply operations. What should you do?

x * 320 is the same thing as x * 256 + x * 64, so all you need to do is shift x left by 8 and by 6 and add these values together.

variable = (variable << 8) + (variable << 6);

Please note the use of parentheses. Since the shift operators are evaluated in a bit awkward order in C, it's usually better to be safe than sorry.

Pixel color values are often packed into single integers. For instance, if you have a 16-bit color value of the format RGB 5:6:5, the highest 5 bits are red, middle 6 bits are green, and the lowest 5 bits are blue. Should you want to extract the color values from a variable, you could use:

red   = (variable >> 11) & ((1 << 5) - 1);
green = (variable >>  5) & ((1 << 6) - 1);
blue  = (variable >>  0) & ((1 << 5) - 1);

Let's say you get a 5-bit value (from the above 5:6:5 pixel format, for instance) and want to extend this to 8-bit so that zero is zero while maximum value is maximum value for both values. You need to fill out the bottom bits as well, like so:

variable = (variable << 3) | (variable >> 2);

This results in the lower bits filling out very neatly, like follows:

5 bits	8 bits
00000	00000000
00001	00001000
00011	00011000
00111	00111001
10111	10111101
11111	11111111

A related trick from Jim Blinn's "dirty pixels" chapter 19 is where you wish to multiply two 8-bit values and want 0*0 to be 0 and 255*255 to be 255 (since in pixel-speak, 255 equals 1.0);

temp = a * b + 128;
result = (temp + (temp >> 8)) >> 8;

This is sometimes called the "Blinn fix". Without this kind of operation, blending white with white would result in non-white, and applying this several times would end up with something that's not really.. white.

This is related to fixed point math, which you may be interested in; Jetro Lauha did a pretty good seminar about that subject. Slides as PDF and a video of the seminar are available.

If you wish to generate a checkerboard pattern, you can use XOR on one bit in the X and Y coordinates. For example,

for (y = 0; y < 128; y++)
{
   for (x = 0; x < 128; x++)
   {
   int color;
   if ((x & 1) ^ (y & 1))
   {
      color = white;
   }
   else
   {
      color = black;
   }        
   putpixel(x, y, color);
   }
}

For bigger checker squares, you could pick some higher bit in the coordinates. For 16x16 checkers, you could do (x & 4) ^ (y & 4).

Another XOR trick is swapping two values without a temporary variable. This is a great trick to use if you want to reduce the readability of your code..

a = a ^ b;
b = b ^ a;
a = a ^ b;

5. Further reading

A great compilation of bit twiddling hacks:

http://graphics.stanford.edu/~seander/bithacks.html

I recommend printing this out for safekeeping, should it disappear.

Wikipedia has lots of pages related to boolean algebra:

http://en.wikipedia.org/wiki/List_of_Boolean_algebra_topics

The formal stuff can be found at:

http://en.wikipedia.org/wiki/Boolean_algebra

You may especially be interested in Karnaugh maps:

http://en.wikipedia.org/wiki/Karnaugh_map

and De Morgan's laws:

http://en.wikipedia.org/wiki/De_Morgan%27s_laws

although it's unlikely that you'll ever use those outside the classroom environment.

Any comments etc. can be emailed to me.