# Arduino Adventures: Does 255/256 = 0 or 1?

Max Maxfield tries to wrap his brain around the problems associated with truncating the results from integer division operations.

As you may recall from my previous blogs, I recently purchased a 4x4x4 3D tri-colored LED cube kit powered by an Arduino-compatible controller. Creating different effects for this rascal is helping me reacquaint myself with the C programming language (well, the Arduino version thereof). In turn, this is helping prepare me to program my Arduino-powered robot.

I will be writing a more in-depth column on my cube in the not-so-distant future. Suffice it to say, for the moment, that I started out with some very simple test patterns, and that I've gradually increased the complexity and sophistication of my effects.

Each LED has three color channels: red, green, and blue. Each channel is controlled using pulse-width modulation. The values that can be assigned to each channel range from 0x00 to 0xFF (0 to 255 in decimal). Those two values equate to being fully off and fully on, respectively.

Up until now, my effects have involved driving the various channels hard on or hard off, thus allowing me to assign seven colors (red, green, blue, red + green, red + blue, green + blue, and red + green + blue) to each LED. Now I'm at the stage where I want to experiment with a wider gamut of colors and to transition gradually from one color to another, and therein lies my problem.

For the purpose of this discussion, let's assume that we are interested in a single color channel for a single LED, and that I have two global variables of type "int" (integer): oldColor and newColor. Each will be set initially to either 0 or 255. Let's also assume that I have a function that starts off looking a bit like this.

The idea is that, when this function is called, there are four possible value combinations for oldColor and newColor.

This explains why I'm using integer variables instead of byte (eight-bit) variables. I want to be able to employ negative values. The howMany parameter specifies how many steps we wish to use to transition from the old color value to the new color value. The parameter is also used to calculate the modColor value, which is the amount by which we wish to keep incrementing or decrementing the oldColor (starting) value until it reaches the newColor (ending) value.

Purely for the purposes of these discussions, let's assume that we pass a value of 4 into the howMany parameter when we call out the lightLEDs function. If the old and new colors start off with the same value (both 0 or both 255), then modColor will end up as 0, so nothing will change as we circle around our for loop. Of course, this is what we want in this case.

Now suppose that we call this function when the old color is 0 and the new color is 255. In this case, modColor will be calculated as 255/4 = 63.73. This will be truncated to 63, since modColor was defined as an integer. (We certainly don't want to use floating-point values. They can consume lots of memory, and performing operations on them can involve a lot of compute cycles if our processor doesn't have a floating-point unit.) In this particular case, this is close enough for government work, as they say. This also explains why I used "i <= howMany" as the control in our for loop. If I had used "i < howMany," we would have gone around our for loop four times (i = 0, 1, 2, and 3). Our tmpColor variable would have been 4x63 = 252. By performing the loop an extra cycle while capping the value at 255, we ensure that we leave the LED fully on when we exit the loop.

Now let's consider what happens when the old color is 255 and the new color is 0. Assuming howMany is 4, then modColor will be calculated as -255/4 = -63.73, which will be rounded up (or down, depending on your point of view) to -64. In this case, we really need to perform the loop only four times -- 255-(4x-64) = -1, which will be bottom capped to 0 -- not five times, as the current code would have it. Of course, since we *are* bottom capping the value at 0, the extra cycle won't hurt us, but it's a waste of computing resources, which niggles at me.

The problem is that things get worse the more steps we take. Let's assume the old color is 0, the new color is 255, and we decide to perform the transition using 128 steps. In this case, modColor will be calculated as 255/128 = 1.99, which will be truncated to 1. Even if we perform our for loop 129 times (0 to 128), we'll still end up with a final color value of 129 instead of 255. Or, as an absolute worst case, suppose the old color is 0, the new color is 255, and we perform the transition using 256 steps. Now we end up with modColor as 255/256 = 0.99, which will be truncated to 0. That means we won't change the color at all.

From the above, we know that we don't have to worry about fading the color down from 255 to 0, which involves modColor having negative values, because two's complement values automatically round (truncate) toward negative infinity. It's the positive values we have to worry about. So, how about if I modify my function to look like the following?

By Jove, I think I've solved it. As you can see, if we are fading the LED up (modColor has positive values), we start off multiplying modColor by 2 before dividing it by howMany. Next we extract the least significant bit. If this is a 1, then we know the system will round our value downward, so after we've divided by 2 (to counteract our original multiplication by 2), we add this 1 back in.

Remembering that multiplications and divisions by 2 involve shifting the two's complement value one bit left or right, respectively, this doesn't add much of a computational burden. It also means that we can use "i < howMany" as our loop control.

To see how this works, let's return to our original example (old color = 0, new color = 255, number of steps = 4). When we calculate our modColor, we start by multiplying 255 by 2 to generate 510. Next, we divide this value by 4 to generate 127.5, which will be truncated to 127, or 01111111 in binary. This is the point where we extract the least significant bit -- in this case, 1. Now we divide 127 by 2 to generate 63.5, which will be truncated to 63. Finally, we add the 1, representing the old least significant bit, and get a modColor value of 64. When we proceed around our loop for i = 0, 1, 2, and 3, our corresponding tmpColor values will be 64, 128, 129, and 256. The last one will be capped at 255.

What do you think? Have I missed anything, or have I cracked this problem? Also, can you think of a more elegant way to achieve what I'm trying to do?

**Related posts:**

- Silicon Labs' Energy-Friendly ARM Cortex-M0+ MCUs
- See Bugblat's TIF (Tiny FPGA Board) & PIF (FPGA for Raspberry Pi)
- What's the Best Robot Base for My Arduino?
- Infographic: The AVR MCU — Atmel's Secret Maker Sauce
- The Internet of Things That Go Bump in the Night
- Do Arduinos Dream of Electric Sheep?
- Be an Atmel AVR Hero & Win $1,000

Author

Garcia-Lasheras 10/24/2013 8:07:50 PM

But there are a lot of problems in using integers for this purpose too, as we cannot deal with decimal values. One of them is that the LED intensity increase/decrease is not going to be linear along the cycles -- it may reach the final value far before the for loop ends.

I propose you a bit more computational intensive solution, but I believe the LED regulation may be nicer:

void lightLEDs (int howMany,

int howLong)

{

int tmpColor = oldColor;

for (int i = 1; i <= howMany; i++)

{

if (newColor != oldColor)

{

tmpColor = 255 * i;

tmpColor = tmpColor / howMany;

if (newColor > oldColor)

tmpColor = 0 + tmpColor;

else

tmpColor = 255 - tmpColor;

}

// Use current tmpColor to drive LED

delay(howLong);

}

}

Author

salbayeng 10/25/2013 1:10:56 AM

I use the delta addition approach in Servo systems as well as LED drivers.

I call it "phase accumulation" , the idea has been around for years , embodied in DDS chips.

In your case I would set up an integer "phase accumulator" and then overlay the "led brightness" byte on the high 8 bits. You then add "ramping speed" to "phase accumulator"

All of the accumulating is done in an interrupt routine, so you can do multiple LED's , and other system timers at once , and can have long fade times. You can also interrupt a fade and fade smoothly off to something else.

--------------------------

Re the "missing bit" , yeh you get this issue with fixed point multiply (and divide) . From memory if you do it in assembler you can keep track of the missing bit, something like this: (plucked from code I wrote years ago, and the detail forgotton)

mulsu r23, r20

SBC r19, _zero

ADD r17, r0

ADC r18, r1

ADC r19, _zero

I can't remember exactly how this works , this piece is transferring the partial results from a multiply into the result. Note the

SBC r19, _zero, this seems to be pointlessly subtracting zero from a register, but its actually propagating the missing bit. Note the Atmel chips have 6 flavors of multiplyMUL Rd, RrMultiply Unsigned R1:R0 ← Rd x RrMULS Rd, RrMultiply Signed R1:R0 ← Rd x RrMULSU Rd, RrMultiply Signed with Unsigned R1:R0 ← Rd x RrFMUL Rd, RrFractional Multiply Unsigned R1:R0 ← (Rd x Rr) << 1FMULS Rd, RrFractional Multiply Signed R1:R0 ← (Rd x Rr) << 1FMULSU Rd, RrFractional Multiply Signed with Unsigned R1:R0 ← (Rd x Rr) << 1So if you go to the trouble of writing in assembler,and pick and choose the type of MUL you use, you can make 255/256=1 , It's also useful to see whether "negative zero" can appear in your algorithm and whether it is handled correctly typically -255/-256 = 1 while 255/256=0 but what about 0/-256 ??

A possible fix to your problem is to use a condition test like <= in one direction and > in the other

Author

Max The Magnificent 10/25/2013 9:58:34 AM

Author

Max The Magnificent 10/25/2013 10:02:01 AM

It's also useful to see whether "negative zero" can appear in your algorithmNow you are making my brain ache -- surely one of the main points about the twos complement form of representation is that you cannot get negative zero...

Author

Max The Magnificent 10/25/2013 10:10:12 AM

if (newColor > oldColor)

tmpColor = 0 + tmpColor;

else

tmpColor = 255 - tmpColor;

Isn't the first part of this redundant? Couldn't we just say:

if (newColor < oldColor)

tmpColor = 255 - tmpColor;

Or are we doing it your way to keep the time pretty much consistent for fade-ups and fade-downs by performing roughy the same number/type of calculations each way?

Author

Garcia-Lasheras 10/25/2013 10:19:34 AM

In addition, about the timing coherence, I was assuming that delay(howLong) dominates the fading, so I was not thinking on exactly matching the number of CPU cycles -- I just love symmetry ;-)

The code could be then:

void lightLEDs (int howMany,

int howLong)

{

int tmpColor = oldColor;

for (int i = 1; i <= howMany; i++)

{

if (newColor != oldColor)

{

tmpColor = 255 * i;

tmpColor = tmpColor / howMany;

if (newColor < oldColor)

tmpColor = 255 - tmpColor;

}

// Use current tmpColor to drive LED

delay(howLong);

}

}

Author

Wnderer 10/25/2013 10:54:12 AM

if (modColor > 0) modColor = (modColor + 1) / howMany

else if (modColor < 0) modColor = (modColor - 1)/howMany

if modColor is either 0, 255 or -255 and howMany is always even then you end up with a couple of compares, an increment, a decrement and some bit shifts. If I recall correctly that switch statement will create a jump table, so you're better off with if statements if you're looking for compact assembly code.

Author

Max The Magnificent 10/25/2013 2:52:19 PM

I might be missing something but why not just add or subtract 1?I've not tried this yet, but I think that if I try to fade the LEDs up over 256 increments / levels then this will consume a lot of processing cycles and may be overkill (changes not discernable to the human eye). Even if I transition the color using say 16 steps with 100 milliseconds (0.1 seconds) between each step, this still equates to 1.6 seconds.

My idea of being able to specify the "howMany" increments value and the delay between incremnents will let me experiment a bit. Also, setting the "howMany" parameter to 1 should make the LED immediately turn hard on or hard off. The idea is that I can vary the pace of the display either randomly or in a controlled manner...

I'm having a lot of fun with this.

Author

Max The Magnificent 10/25/2013 2:55:31 PM

Of course, I'm handling all 64 LEDs each with three color channels, but the way I'm doing it is by means of a plan so cunning we could pin a tail on it and call it a weasel :-)

Author

Alvie 10/25/2013 3:21:18 PM

Sometimes using Fixed-Point arithmetic can help with these issues.

The basics behind FixedPoint are quite simple, and easy to implement in most architectures. Let's have a look at your concrete problem (255/256) and see how we could get a somewhat accurate output from it.

The easiest way to implement this would be to use a 10.6 fixed point, where 10 bits are used for the real part (signed number range would be between +511 and -512), and 6 bits for the "fractionary" part - since 6 bits can represent 64 different values, each bit can represent 1/64, 0.01625. This is your resolution in terms of the fractionary part.

The beauty about fixed-point is that most of the arithmetic is done with the integer representation. This includes addition/subtraction and multiplication.

In order to add two 10.6 fixed-point values, represented by a 16-bit integer, just... add them. The result will also be a 10.6 fixed point value. Same for subtraction. For multiplication, the result will not be 10.6, but rather 20.12 - 32 bits. In order to convert this multiplication back to 10.6, just shift it by 6 bits, and truncate the result (note: if you're dealing with signed numbers, you must do this in a slightly different way, in order to preserve the highest bit signal).

So, let's take a look at your case: 255/256.

First, let's represent them as fixed-point, 10.6 numbers, in binary

:

255 => 0011111111.000000

256 => 0100000000.000000

and:

1 => 0000000001.000000

In the following explanations, I'll prefix the numbers with "f" if they are representations of fixed-point values, and no prefix for their integer representations.

Now, let's invert f256. This is done by computing (f1/f256), also in fixed-point format. In order to perform this division, we use the integer representations to perform an integer division. But, similarly to the multiplication (which doubles the number of fractionary bits), the division will also change the fractionary bits - not only that, it will indeed remove them.

Let's take a look at a simple (f1/f1) division. This ought to result in f1, of course. If we take a clear look at the f1 representation above, and compute it's integer value, we get 2^6 == 64. If we divide the values using their integer representation, we get 64/64, which gives us 1. But this integer "1" does not represent "f1", it represents "0000000000.000001". So, we just lost a lot of precision (6 bits actually) by performing the division. To fix this, we shift the result by 6 bits to the left, and get the correct value: "0000000001.000000".

Now, another division. Let's divide f3 by f2, the result should be f1.5. The representation of f3 is "0000000011.000000", and f2 is "0000000010.000000". The integer representations are 192 and 128, respectively. If we perform the integer division, we get 1. If we shift this by 6-bits as in previous example, we get "0000000001.000000", which is f1 - not f1.5. So, we just lost all precision !!!

There are a few solutions to overcome this. One of them is to get a few more resolution by extending the numbers to the next representation, which in this case would be 32-bit. Basically, what we will be doing is to transform the equation:

x = a / b;

into

x = ( (a * N) / b ) / N

which is equivalent. The "N" value would be ((2^6)-1) in this case - which matches the 6-bit shift we have to perform to correct the division result - if we multiply the first argument by the same "shift amount" we have to perform at the end, we don't need to perform the final shift. We can also use a shift to perform the computation of the first operand.

So, for the case of f3/f2, we will get:

a*N (f3 times N): 0000000011.000000[000000] (integer representation: 12288)

b: [000000]0000000001.000000 (integer representation: 128)

And the result is 96 (12288 / 128). In binary form (and fixed point form) it's : [000000]0000000001.100000.

Integer part is 1, and fractionary part is '100000' in binary, 32 in decimal. If we multiply 32 by the fractionary resolution we computed in first place (0.01625), we get... 0.52, close enough to the "0.5" we need to add to the integer part.

Doing the same for your 255/256 values, we get a nicely looking value (result of (255<<6)/256):

0000000000.111111

We are very close to f1, and we can also truncate the result. But that would not be wise : the result is way more close to f1 than to f0.

So, we want to do rounding before converting that to an integer (which means shifting by 6 to the right). In order to do so, we add a '1' in the MSB of the to-be-discarded part. We are discarding 6 bits, so we have to add '100000' to the result:

0000000000.111111

+0000000000.100000

= 0000000001.011111.

Shift by 6 to the right.... gives "1".

So, here's your answer, and a quick, dirty and confuse explanation of fixed point arithmetic :)