Playing with bit masks

With storage and bandwidth being easily available, compressing data for general use is less of a concern today than it was 20 years ago. However neither is free or unlimited, and with large scale applications a few bits can get multiplied over a few million connections; similarly, we may be talking to a client living on a smart toaster which doesn’t have much in the way of storage space. In both situations, reducing the size of the data we are storing or sending may be technically necessary, and save on operational costs.

One way to do this is to use bit operations to reduce a state to a single number. This is nothing new; anyone who has ever used flags knows how it works. The catch is that you cannot crunch detailed information in this way; it only works for boolean values. Even so, depending on the structure of your data you should be able to leverage it rather easily.

For these examples we’ll use JavaScript; however the concepts apply to any language.

Let’s start with a JSON object. JSON is great, but even if it’s a massive step up from good old XML, it can still be rather verbose. Say we have the following state object we want to save or send.

var gameStateFlags = {
	"mission.abc.complete": false,
	"mission.xyz.complete": true,
	"mission.123.complete": false,
	"collected.gizmo.123": true,
	"talked.to.npc.ssadf": false,
	"talked.to.npc.eoesk": true
}

We can easily compress this, as long as we have a set of numeric values we can map to each property. Let’s say:

var states = {
	"mission.abc.complete": 1,
	"mission.xyz.complete": 2,
	"mission.123.complete": 4,
	"collected.gizmo.123": 8,
	"talked.to.npc.ssadf": 16,
	"talked.to.npc.eoesk": 32
}

Each successive property name is mapped to 2 to the power of n. You could even autogenerate this, though I’m not a fan of that approach; changing the order would cause different values to be mapped, leading to glitches when reading data stored using a different map.

Now that we have a numeric value for the properties, we can OR them together where they are true:

var serialize = function(state) {
	var serializedState = 0;

	Object.keys(state).forEach(function(flag) {
		if (state[flag]) serializedState |= states[flag];
	});

	return serializedState;
}

Let’s call serialize(gameStateFlags). We start with a serializedState of 0 (or 00000000 if we assume an 8 bit number). If we try to process each property in turn:
“mission.abc.complete” is false so nothing happens.
“mission.xyz.complete” is true, so we find its value (2, or 00000010) and OR it into the state. 00000000 | 00000010 gives us 00000010.
“mission.123.complete” is false so nothing happens.
“collected.gizmo.123”: is true, so we find its value (8, or 00001000) and OR it into the state. 00000010 | 00001000 gives us 00001010.
“talked.to.npc.ssadf”: is false so nothing happens.
“talked.to.npc.eoesk”: is true, so we find its value (32, or 00100000) and OR it into the state. 00001010 | 00100000 gives us 00101010.

The end result is 00101010, or 42. (Sure, we could have found this just by doing 2 + 8 + 32, but bear with me – it will help understand the reading later). We can send this over the wire, or store it, using much less space than it would have taken to store the entire object.

Now, how do we turn our number back into a usable object? Provided (and this is important) that the reader has the same set of mapped values, we can just reverse the process using the AND operator:

var deserialize = function(stateValue) {
	var deserializedState = {};

	Object.keys(states).forEach(function(flag) {
		deserializedState[flag] = ((stateValue & states[flag]) != 0);
	});

	return deserializedState;
}

If we deserialize(42), we will compare the bit for each property against the bits that make up our number. The & will work as a bit mask, and the result will only contain bits which appear in both numbers. Let’s take the first one, “mission.abc.complete”. The value for that is 1, or 00000001.

42: 00101010
01: 00000001
————
&: 00000000

In this case, since there are no matching bits we get 0, which we interpret as false.

Taking the second one (“mission.xyz.complete”, value 2):

42: 00101010
01: 00000010
————
&: 00000010

This time we do have a matching bit at position 2, so we get a non-zero value which we interpret as 2.

This is the reason we only use 2^n for the mapping, rather than just picking any which number. “3” would be ambiguous, since it can be the result of both 1 and 2 being present. That said, the reader can use this to advantage if it is only interested in knowing some facts without processing the whole thing. Let’s say your reader only needs to know if “mission.abc.complete” or “collected.gizmo.123” are true. Rather than desrializing the whole thing, you could just use the following:

var missionAbcCompleteOrCollectedGizmo123Mask = states["mission.abc.complete"] + states["collected.gizmo.123"];

This will set the value to 9 (1 + 8). Checking this against our original value, 42:

var missionAbcCompleteOrCollectedGizmo123 = (42 & missionAbcCompleteOrCollectedGizmo123Mask) != 0;

42: 00101010
01: 00001001
————
&: 00001000

This gives 8 (the matching bit), a non-zero value, so we know that one of the two criteria was met.

While this is a pretty old school way of doing things, it’s still a useful trick to have up one’s sleeve, especially if one is working with smaller, low power devices. If you have any questions or thoughts on the subject, let me know in the comments below!

Moar JavaScript? Check out The Little Book of JavaScript!