# Arduino Adventures: Does 255/256 = 0 or 1?

Max Maxfield tries to wrap his brain around the problems associated with truncating the results from integer division operations.

As you may recall from my previous blogs, I recently purchased a 4x4x4 3D tri-colored LED cube kit powered by an Arduino-compatible controller. Creating different effects for this rascal is helping me reacquaint myself with the C programming language (well, the Arduino version thereof). In turn, this is helping prepare me to program my Arduino-powered robot.

I will be writing a more in-depth column on my cube in the not-so-distant future. Suffice it to say, for the moment, that I started out with some very simple test patterns, and that I've gradually increased the complexity and sophistication of my effects.

Each LED has three color channels: red, green, and blue. Each channel is controlled using pulse-width modulation. The values that can be assigned to each channel range from 0x00 to 0xFF (0 to 255 in decimal). Those two values equate to being fully off and fully on, respectively.

Up until now, my effects have involved driving the various channels hard on or hard off, thus allowing me to assign seven colors (red, green, blue, red + green, red + blue, green + blue, and red + green + blue) to each LED. Now I'm at the stage where I want to experiment with a wider gamut of colors and to transition gradually from one color to another, and therein lies my problem.

For the purpose of this discussion, let's assume that we are interested in a single color channel for a single LED, and that I have two global variables of type "int" (integer): oldColor and newColor. Each will be set initially to either 0 or 255. Let's also assume that I have a function that starts off looking a bit like this.
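A sketch of that starting point (a hypothetical reconstruction from the description that follows; the capping logic and the stubbed-out PWM call are assumptions on my part):

```c
#include <assert.h>

// Globals as described in the post -- "int" rather than "byte" so that
// modColor can take negative values when we fade downward.
int oldColor = 0;
int newColor = 255;
int tmpColor = 0;

// Hypothetical reconstruction of lightLEDs(); the call that actually
// drives the PWM hardware is stubbed out as a comment.
void lightLEDs(int howMany)
{
    int modColor = (newColor - oldColor) / howMany; // truncating integer division

    tmpColor = oldColor;
    for (int i = 0; i <= howMany; i++) {    // note <=: one extra pass
        tmpColor = tmpColor + modColor;
        if (tmpColor > 255) tmpColor = 255; // cap at fully on
        if (tmpColor < 0)   tmpColor = 0;   // cap at fully off
        // analogWrite(channelPin, tmpColor); // e.g., drive the channel here
    }
}
```

With howMany = 4 and a 0-to-255 fade, modColor comes out as 63 and the five passes produce 63, 126, 189, 252, and (capped) 255.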

The idea is that, when this function is called, there are four possible value combinations for oldColor and newColor.

This explains why I'm using integer variables instead of byte (eight-bit) variables. I want to be able to employ negative values. The howMany parameter specifies how many steps we wish to use to transition from the old color value to the new color value. The parameter is also used to calculate the modColor value, which is the amount by which we wish to keep incrementing or decrementing the oldColor (starting) value until it reaches the newColor (ending) value.

Purely for the purposes of these discussions, let's assume that we pass a value of 4 into the howMany parameter when we call the lightLEDs function. If the old and new colors start off with the same value (both 0 or both 255), then modColor will end up as 0, so nothing will change as we circle around our for loop. Of course, this is what we want in this case.

Now suppose that we call this function when the old color is 0 and the new color is 255. In this case, modColor will be calculated as 255/4 = 63.75. This will be truncated to 63, since modColor was defined as an integer. (We certainly don't want to use floating-point values. They can consume lots of memory, and performing operations on them can involve a lot of compute cycles if our processor doesn't have a floating-point unit.) In this particular case, this is close enough for government work, as they say. This also explains why I used "i <= howMany" as the control in our for loop. If I had used "i < howMany," we would have gone around our for loop four times (i = 0, 1, 2, and 3). Our tmpColor variable would have reached only 4x63 = 252. By performing the loop an extra cycle while capping the value at 255, we ensure that we leave the LED fully on when we exit the loop.

Now let's consider what happens when the old color is 255 and the new color is 0. Assuming howMany is 4, then modColor will be calculated as -255/4 = -63.75, which will be rounded up (or down, depending on your point of view) to -64. In this case, we really need to perform the loop only four times -- 255-(4x-64) = -1, which will be bottom capped to 0 -- not five times, as the current code would have it. Of course, since we *are* bottom capping the value at 0, the extra cycle won't hurt us, but it's a waste of computing resources, which niggles at me.
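One caveat worth double-checking before leaning on that -64: C99 (and hence avr-gcc, which the Arduino toolchain uses) requires integer division to truncate toward zero, so -255/4 actually evaluates to -63. It's the arithmetic *right shift* that rounds toward negative infinity (strictly speaking, right-shifting a negative value is implementation-defined, though gcc performs an arithmetic shift). A quick check, with hypothetical helper names:

```c
#include <assert.h>

// C99 integer division truncates toward zero, for negative results too.
int divide(int a, int b) { return a / b; }

// Right-shifting a negative value is implementation-defined in C, but gcc
// (including avr-gcc) performs an arithmetic shift, which rounds toward
// negative infinity (i.e., it floors).
int shiftRight(int a, int n) { return a >> n; }
```

So whether the down-fade rounds to -63 or -64 depends on whether the compiler uses a division or a shift.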

The problem is that things get worse the more steps we take. Let's assume the old color is 0, the new color is 255, and we decide to perform the transition using 128 steps. In this case, modColor will be calculated as 255/128 = 1.99, which will be truncated to 1. Even if we perform our for loop 129 times (0 to 128), we'll still end up with a final color value of 129 instead of 255. Or, as an absolute worst case, suppose the old color is 0, the new color is 255, and we perform the transition using 256 steps. Now we end up with modColor as 255/256 = 0.99, which will be truncated to 0. That means we won't change the color at all.
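These worst cases are easy to check; stepSize is just a hypothetical name for the truncating division at the heart of the problem:

```c
#include <assert.h>

// The step size as currently computed: plain integer division, which
// simply discards the fractional part for positive operands.
// (stepSize is a hypothetical wrapper; the post computes this inline.)
int stepSize(int delta, int howMany)
{
    return delta / howMany;
}
```

So stepSize(255, 4) is 63, stepSize(255, 128) is 1, and stepSize(255, 256) is 0 -- at 256 steps the color never moves at all.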

From the above, we know that we don't have to worry about fading the color down from 255 to 0, which involves modColor having negative values, because two's complement values automatically round (truncate) toward negative infinity. It's the positive values we have to worry about. So, how about if I modify my function to look like the following?
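Here is a hypothetical reconstruction of that modified function (the PWM call is again stubbed out; everything apart from the names mentioned in the text is an assumption):

```c
#include <assert.h>

int oldColor = 0;
int newColor = 255;
int tmpColor = 0;

// Modified lightLEDs(): when fading up, multiply by 2 before dividing,
// peel off the least significant bit, halve, then add that bit back in,
// so the step size rounds up instead of truncating.
void lightLEDs(int howMany)
{
    int modColor = newColor - oldColor;

    if (modColor > 0) {
        modColor = (modColor * 2) / howMany; // e.g., (255*2)/4 = 127
        int lsb = modColor & 1;              // extract the least significant bit
        modColor = modColor / 2 + lsb;       // 127/2 = 63, plus 1 => 64
    } else {
        modColor = modColor / howMany;       // fading down: plain division
        // Caveat: on C99 compilers -255/4 is -63 (not -64), so a 255-to-0
        // fade in 4 steps would bottom out at 3 rather than 0; the same
        // rounding trick would then be needed for this branch too.
    }

    tmpColor = oldColor;
    for (int i = 0; i < howMany; i++) {      // "<" now suffices
        tmpColor = tmpColor + modColor;
        if (tmpColor > 255) tmpColor = 255;  // cap at fully on
        if (tmpColor < 0)   tmpColor = 0;    // cap at fully off
        // analogWrite(channelPin, tmpColor); // drive the channel here
    }
}
```

This version lands exactly on 255 for 4, 128, and even 256 steps.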

By Jove, I think I've solved it. As you can see, if we are fading the LED up (modColor has positive values), we start off multiplying modColor by 2 before dividing it by howMany. Next we extract the least significant bit. If this is a 1, then we know the system will round our value downward, so after we've divided by 2 (to counteract our original multiplication by 2), we add this 1 back in.

Remembering that multiplications and divisions by 2 involve shifting the two's complement value one bit left or right, respectively, this doesn't add much of a computational burden. It also means that we can use "i < howMany" as our loop control.

To see how this works, let's return to our original example (old color = 0, new color = 255, number of steps = 4). When we calculate our modColor, we start by multiplying 255 by 2 to generate 510. Next, we divide this value by 4 to generate 127.5, which will be truncated to 127, or 01111111 in binary. This is the point where we extract the least significant bit -- in this case, 1. Now we divide 127 by 2 to generate 63.5, which will be truncated to 63. Finally, we add the 1, representing the old least significant bit, and get a modColor value of 64. When we proceed around our loop for i = 0, 1, 2, and 3, our corresponding tmpColor values will be 64, 128, 192, and 256. The last one will be capped at 255.

What do you think? Have I missed anything, or have I cracked this problem? Also, can you think of a more elegant way to achieve what I'm trying to do?

**Related posts:**

- Silicon Labs' Energy-Friendly ARM Cortex-M0+ MCUs
- See Bugblat's TIF (Tiny FPGA Board) & PIF (FPGA for Raspberry Pi)
- What's the Best Robot Base for My Arduino?
- Infographic: The AVR MCU — Atmel's Secret Maker Sauce
- The Internet of Things That Go Bump in the Night
- Do Arduinos Dream of Electric Sheep?
- Be an Atmel AVR Hero & Win $1,000

Author

Max The Magnificent 10/26/2013 5:58:30 PM

"To obtain the same behaviour..." -- Wonderful -- thanks for this (now I'll try to wrap my brain around it :-)

Author

pviry750 10/26/2013 11:09:21 AM

You're in effect drawing a straight line in a discontinuous space, the coordinates being time and color value. The standard way to do this efficiently is Bresenham's algorithm. The idea is to incrementally track the current error between the next integer color value (y coordinate) and the next "ideal" color value, and, when this error grows too large, to update the current color value accordingly. Here is the code for the inner loop (excerpt from the wikipedia page):
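A sketch of that inner loop, adapted (hypothetically) to the color fade rather than quoting the page verbatim -- x is the step number, y is the color value, and the out[] array simply records the color at each step:

```c
#include <stdlib.h>
#include <assert.h>

// Bresenham-style fade from oldColor to newColor in howMany steps.
// No division inside the loop: the running error is updated with
// comparisons and additions only.
void fade(int oldColor, int newColor, int howMany, int out[])
{
    int dy = abs(newColor - oldColor);         // total color distance
    int dx = howMany;                          // total number of steps
    int step = (newColor > oldColor) ? 1 : -1; // fade direction
    int err = dx / 2;                          // one division, outside the loop
    int color = oldColor;

    for (int i = 0; i < howMany; i++) {
        err -= dy;
        while (err < 0) {                      // error too large: bump the color
            color += step;
            err += dx;
        }
        out[i] = color;
    }
}
```

One note on the sketch: when the color distance exceeds the step count (as in 255 over 4 steps), the inner while runs more than once per iteration; the classic algorithm swaps the axes to keep it to a single comparison and two additions.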

Note that there is no integer division in the loop, hence no room for rounding errors. And it's efficient: never more than a comparison and two additions at each iteration.

Author

josyb 10/26/2013 7:03:14 AM

To obtain the same behaviour, rounding away from zero when dividing positive numbers as you get when dividing negative numbers:

```c
if (modcolor > 0) modcolor = (modcolor + howmany - 1) / howmany;
else              modcolor /= howmany;
```

You can write this in a single line, perhaps generating less code (as only one division operation is specified):

```c
modcolor = (modcolor + (modcolor > 0 ? howmany - 1 : 0)) / howmany;
```

This trick can also be modified for rounding half away from zero:

```c
if (modcolor < 0) {
    sign = 1;
    modcolor = -modcolor;
}
else sign = 0;

modcolor = (modcolor + (howmany / 2) - 1) / howmany;

if (sign == 1) modcolor = -modcolor;
```

If you want e.g. 1.5 to be rounded to 2, you can leave out the '-1'.

Author

Max The Magnificent 10/25/2013 5:53:17 PM

"Please, give me feedback once you try this" -- I will do so.

"If I can help you with other chunk of C code, just let me know ;-)" -- Don't worry ... you can count on it LOL

Author

Max The Magnificent 10/25/2013 5:52:15 PM

Author

Garcia-Lasheras 10/25/2013 5:42:26 PM

"I cannot wait to go home and try your solution thsi evening..."Please, give me feedback once you try this -- if I can help you with other chunk of C code, just let me know ;-)

Author

DrFPGA 10/25/2013 4:43:48 PM

Author

Max The Magnificent 10/25/2013 3:40:24 PM

"Sometimes using Fixed-Point arithmetic can help with these issues..." -- It all sounds so simple the way you say it LOL

Actually, you are correct in that I had completely neglected to consider using something like a 10.6 fixed-point representation. For some tasks this would be very appropriate -- but for my current application I think I'll stick with the solution from Javi (Garcia).

Author

Alvie 10/25/2013 3:21:18 PM

Sometimes using Fixed-Point arithmetic can help with these issues.

The basics behind FixedPoint are quite simple, and easy to implement in most architectures. Let's have a look at your concrete problem (255/256) and see how we could get a somewhat accurate output from it.

The easiest way to implement this would be to use a 10.6 fixed-point format, where 10 bits are used for the integer part (giving a signed range of +511 to -512) and 6 bits for the fractional part. Since 6 bits can represent 64 different values, each bit represents 1/64 = 0.015625. This is your resolution in terms of the fractional part.

The beauty of fixed-point is that most of the arithmetic is done on the integer representation. This includes addition/subtraction and multiplication.

In order to add two 10.6 fixed-point values, each represented by a 16-bit integer, just... add them. The result will also be a 10.6 fixed-point value. Same for subtraction. For multiplication, the result will not be 10.6 but rather 20.12 -- 32 bits. To convert this product back to 10.6, just shift it right by 6 bits and truncate the result (note: if you're dealing with signed numbers, you must do this in a slightly different way, in order to preserve the sign bit).

So, let's take a look at your case: 255/256.

First, let's represent them as 10.6 fixed-point numbers, in binary:

255 => 0011111111.000000
256 => 0100000000.000000

and:

1 => 0000000001.000000

In the following explanations, I'll prefix numbers with "f" when they refer to fixed-point values, and use no prefix for their integer representations.

Now, let's invert f256. This is done by computing (f1/f256), also in fixed-point format. In order to perform this division, we use the integer representations and do an integer division. But, similarly to the multiplication (which doubles the number of fractional bits), the division will also change the fractional bits -- in fact, it removes them entirely.

Let's take a look at a simple (f1/f1) division. This ought to result in f1, of course. If we look at the f1 representation above and compute its integer value, we get 2^6 == 64. If we divide the values using their integer representations, we get 64/64, which gives us 1. But this integer "1" does not represent "f1"; it represents "0000000000.000001". So, we just lost a lot of precision (6 bits, actually) by performing the division. To fix this, we shift the result 6 bits to the left and get the correct value: "0000000001.000000".

Now, another division. Let's divide f3 by f2; the result should be f1.5. The representation of f3 is "0000000011.000000", and f2 is "0000000010.000000". The integer representations are 192 and 128, respectively. If we perform the integer division, we get 1. If we shift this left by 6 bits as in the previous example, we get "0000000001.000000", which is f1 -- not f1.5. So, we just lost all the fractional precision!

There are a few ways to overcome this. One of them is to gain some extra resolution by extending the numbers to the next wider representation, which in this case would be 32 bits. Basically, what we will do is transform the equation:

x = a / b;

into

x = ( (a * N) / b ) / N

which is equivalent. The "N" value would be 2^6 = 64 in this case -- which matches the 6-bit left shift we would otherwise have to perform to correct the division result. If we pre-multiply the first operand by that same "shift amount," we don't need to perform the final shift. We can also use a shift to compute that first operand.

So, for the case of f3/f2, we will get:

a*N (f3 times N): 0000000011.000000[000000] (integer representation: 12288)

b: [000000]0000000010.000000 (integer representation: 128)

And the result is 96 (12288 / 128). In binary (and fixed-point) form it's: [000000]0000000001.100000.

The integer part is 1, and the fractional part is '100000' in binary, 32 in decimal. If we multiply 32 by the fractional resolution we computed in the first place (0.015625), we get exactly the 0.5 we need to add to the integer part.

Doing the same for your 255/256 values, we get a nicely behaved value (the result of (255<<6)/256):

0000000000.111111

We are very close to f1, and we could simply truncate the result. But that would not be wise: the result is far closer to f1 than to f0.

So, we want to round before converting the result back to an integer (which means shifting 6 bits to the right). In order to do so, we add a '1' in the MSB of the to-be-discarded part. We are discarding 6 bits, so we add '100000' to the result:

  0000000000.111111
+ 0000000000.100000
= 0000000001.011111

Shifting 6 bits to the right gives "1".

So, here's your answer, along with a quick, dirty, and somewhat confusing explanation of fixed-point arithmetic :)
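Condensing the walkthrough above into code (a hypothetical sketch; the fix10_6 type and helper names are invented for illustration):

```c
#include <stdint.h>
#include <assert.h>

#define FRAC_BITS 6                  // 10.6 fixed point: 6 fractional bits

typedef int16_t fix10_6;

// Convert a plain integer to 10.6 fixed point.
static fix10_6 to_fix(int x) { return (fix10_6)(x << FRAC_BITS); }

// Divide two 10.6 values: pre-shift the numerator (the a*N trick above,
// with N = 2^6) in 32 bits so the extra 6 bits can't overflow.
static fix10_6 fix_div(fix10_6 a, fix10_6 b)
{
    return (fix10_6)(((int32_t)a << FRAC_BITS) / b);
}

// Round a non-negative 10.6 value to the nearest integer: add 0.5
// (i.e. '100000' in the fractional part) before discarding the
// fractional bits.
static int fix_round(fix10_6 x)
{
    return (x + (1 << (FRAC_BITS - 1))) >> FRAC_BITS;
}
```

With these helpers, fix_round(fix_div(to_fix(255), to_fix(256))) comes out as 1, answering the question in the title, and fix_round(fix_div(to_fix(255), to_fix(4))) gives the 64 the post is after.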

Author

Max The Magnificent 10/25/2013 2:55:31 PM

Of course, I'm handling all 64 LEDs each with three color channels, but the way I'm doing it is by means of a plan so cunning we could pin a tail on it and call it a weasel :-)