# Doing Math in FPGAs, Part 5 (Binary Division)

If you thought binary multiplication was "interesting," just wait until you try to wrap your brain around fixed-point binary division!

I'm afraid to say that I lied to you in my previous column.

I said that column would be the last of these rantings about binary math. But then I realized that I had no good references on performing binary division (especially in fixed-point), so I decided to dedicate this installment to binary division. I apologize in advance for the long-windedness of my pontifications. Fasten your seat belts!

Let's first refresh ourselves on the "long division" that we learned in grade school. Let's say we want to find how many threes are in 136 (136 ÷ 3) -- I don't know why we'd ever want to do this, but I'm sure there's a good application for it somewhere. First, we set up our long division table as illustrated below:

Starting from the left (the higher-order digits), and working towards the right, we test how many times we can fit our divisor into our dividend. So, left-aligning the two numbers, we see that 3 will go into 1 zero times (or, more correctly, 300 will go into 100 zero times):

Next, we multiply the zero and our divisor, and put the properly-aligned result into the table beneath the dividend:

Now we subtract the results of the multiplication and place the result of the subtraction in the table -- perhaps carrying down only the next digit in significance:

Using the next-lower-ordered bit, we see that 3 will go into 13 four times (or, more correctly, 30 will go into 130 four times):

We repeat this subtraction process until we've progressed through all of the digits:

Eventually, we discover that our result is 45 with a remainder or 1 (or 45 and 1/3). Of course, we can keep the division going into those decimal places and generate a result or 45.33333333r (where the "r" indicated that the preceding "3" will keep on recurring).

Of course, this is pretty easy for those of us that have been doing it for a while (my kids still struggle with it some), but how is this process different from binary? Well, it's not, really, but there are a lot of "gotchas" when it comes to performing division using our computing machinery. Let's quickly use the same numbers to perform a binary division as illustrated below:

Simple enough, if you give it some thought. But, there are a couple of "interesting" things here. The first thing, if you notice, is that we are shifting the divisor to the right (÷2) each step. Since this is binary, this makes perfect sense. Next, since we can only multiply by 1, all we ever need to do is set the appropriate quotient bit at the appropriate time. Finally, as in the original base-10 case, if the dividend is greater than the divisor, then we subtract the divisor from the dividend. So, we have the basic steps for performing binary division:

- Bit-align the left-most "1" of the divisor with the left-most "1" of the dividend
- Set all quotient bits to "0"
- If the dividend ≥ divisor
- Set the quotient bit that corresponds to the position of the divisor to "1"
- Subtract the divisor from the dividend

- Shift the divisor 1 bit to the right (@divide;2)
- If the leading bit of the divisor is at a position ≥ 0, return to step 3
- Otherwise we're done

But how do we implement this in our hardware? Right off the bat, we can see some problems. Firstly (and what I see as the "hardest") is this -- how do we align the left-most bits of the two numbers? It seems easy enough to say it, but do we use a left-shift and compare? Do we use a giant multiplexer? The first method takes clock ticks, while the second takes up lots of hardware.

Next, how do we keep track of the right number of zeroes at the front -- that is, how do we keep the bit position of the quotient correct, and properly aligned? It's not impossible, of course, but things to think about. Lastly, what do we do with a remainder? Do we just throw it away, or do we round?

Author

tom-ii 2/21/2014 3:52:51 PM

I was thinking specifically of Newton-Raphson (N-R). Wikipedia covers the gory details of the theory behind it so I won't repeat it here.Yeah, I have actually read through the whole thing. I really like that it finishes in log(N) steps (if your special cases are handled, and you have a good initial guess). The problem with it is the generalization to "any" implementation - I don't see a good way to package up an initial guess - I suppose I could just make it half of 2^N, but then it might not be saving me anything... Of course this is because I want to generalize it for fixed-point. I see that it may be a bit easier for floating point.

Author

betajet 2/21/2014 3:43:36 PM

Basically, to do floating-point division with N-R, you want to find the reciprocal of the denominator D. To use N-R, you first estimate the reciprocal X[0] and then refine the estimate using the N-R iteration X[i+1] = X[i] * (2 - D * X[i]) which has only multiples and adds. Each iteration reduces the error quadratically, so if you start with an estimate that's accurate to 13 bits, one iteration gives you 26 bits (single precision) and two gives you 52 bits (double precision). It you have high-speed multiplication hardware, this is a great way to go.

You find the initial estimate using a look-up ROM. The denominator D is a binary floating-point number of the form M * 2^E, where mantissa M is between 0.5 and 1.0. To calculate your first estimate X[0], look up the 12-16 MSbs of M in a ROM to get a number between 2.0 and 1.0, divided by 2 to renormalize it to between 0.5 and 1.0. The exponent is easy: just negate it and add 1 to compensate for the mantissa look-up. In other words, X[0] = XM * 2^(1-E) where XM is the value read from the ROM.

You need to handle some special cases: If M is zero or an unnormalized fraction, 1/D is +infinity. If D is +infinity, 1/D is +0.

There is also a good N-R iteration for calculating square roots. In this case, you actually calculate 1/sqrt(x) and then multiply by x to get sqrt(x).

Integer square root using the "long division" method is also loads of fun. It's a simplified form of the decimal "long division" method that was taught in USA public schools before 1950 and dropped once TV started to rot the minds of USA youth :-)

Author

tom-ii 2/20/2014 7:26:34 AM

aminterested in your thoughts. I know of the SVT algorithm, but haven't yet quite wrapped my mind around it. I also took a look at Newton-Rhapson, but the need for a scaling factor throws me in that I haven't figured out how to make it generic - that is, if I instantiate a QN = 8,12. but want to then instantiate a QN=3,15, the scaling factors will need to be different (I think).I'm interested in thoughts...

Author

betajet 2/19/2014 2:00:40 PM

If you have fast multiplication hardware -- which you usually do nowadays -- you can use an extremely clever technique to calculate the reciprocal. I think the author is working up to this, so I won't say more.

Author

KarlS01 2/19/2014 1:36:43 PM

Dividend occupied even odd register pair to form 64 bit width Divisor in third 32 bit register.

make both positive by complement if negative, remember if signs were different and make quotient negative at end

do a trial subtraction of divisor from high half of dividend, if positive write result into dividend and shift one into dividend low bit. If negative result shift dividend left and zero into low bit. (Aligns left bits and generates zero high quotient bits)

repeat 32 times(word width) so the 32 bit quotient is shifted into the low register of the dividend pair. The shift and subtract reduces the dividend to a 32 bit remainder in the register that held the high order dividend 32 bits.

Note that shifting zero into the emptied bit of the dividend until the divisor can be successfully subtracted accomplishes the alignment of the operands while reducing the magnitude of the quotient. (As I already wrote...)

Multiples of the divisor (2x,4x, 5x?) were generated and decoding of dividend and divisor bits to generate multiple quotient bits to reduce cycles. That is too complicated to remember.

The ALU and working registers were 64 bits for floating point so no big cost for divide and multiply 64 bit product.

Author

tom-ii 2/19/2014 12:07:34 PM