# Doing Math in FPGAs, Part 1

There are a number of tricks we can use to perform mathematical operations efficiently in FPGA hardware.

For a recent project, I explored doing "real" (that is, non-integer) math on a Spartan 3 FPGA. FPGAs, by their nature, do integer math. That is, there's no floating-point unit (or even fixed-point) in there unless you "roll your own" (I think many of the newer devices have dedicated math functions in them).

This led me (temporarily) down the path of trying to convert the "real" numbers I was working with (which are fairly small) into integers -- basically, by multiplying by 10, or more generally by powers of ten -- to perform the function. Why would I want to do this? Well, I was trying to get around the pains of (large) combinational multipliers and cumbersome dividers. Of course, multiplication is just repeated addition, but it appears that my synthesis tools prefer to generate multipliers from combinational logic. This makes for faster speed at the sacrifice of real estate (or so I suppose).

The synthesis tools, however, do not seem to do this for division. This leaves us with a potential problem, then. If I roll my own divider, it will likely do successive additions/subtractions to compute a result. This means that I would have a stand-alone unit that would need to incorporate "ready" signaling to the next higher assembly. That is, I would have to hand it the numbers to divide and then wait for the module to signal that it had completed the operation before I could continue on with my process.

But, if we can contrive our math such that we are always using powers of two, then we can use shifts, right? That is, a left shift of one bit is equivalent to a multiply by two, while a right shift of one bit is equivalent to a divide by two.
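As a quick software model of what the FPGA's shifter is doing (my own sketch, assuming non-negative values), the equivalences look like this in Python:

```python
# Shifting left by k bits multiplies by 2**k; shifting right by k bits
# divides by 2**k (truncating, for non-negative integers).
x = 100

assert x << 1 == x * 2    # shift left 1  -> multiply by 2
assert x << 3 == x * 8    # shift left 3  -> multiply by 8
assert x >> 1 == x // 2   # shift right 1 -> divide by 2
assert x >> 2 == x // 4   # shift right 2 -> divide by 4
```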

This is handy if we can keep ourselves in the binary world, but sometimes it's nice to work in the more familiar decimal world. To this end, I thought I'd share a couple of tricks I've learned over the years.

The first trick is a fast way to multiply by 10. Even these days, this is a faster way to perform a "multiply by 10" function than what your CPU will normally do when it sees X * 10. So, are you ready to do some math?

Yep -- if you perform a left-shift of one bit (giving 2x) and add it to a left shift of three bits (giving 8x, since 2^3 = 8), then you've got a multiply by 10, because 2x + 8x = 10x. In an FPGA, this is easily done in a single clock cycle... (on an older Intel CPU, it takes three clock cycles, while a normal multiply takes five or more).
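Modeled in Python (a software sketch of the shift-and-add; in hardware the two shifted values would feed a single adder):

```python
def mul10(x):
    """Multiply by 10 using two shifts and an add: 10*x = 8*x + 2*x."""
    return (x << 3) + (x << 1)

assert mul10(7) == 70
assert mul10(123) == 1230
```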

Now, dividing is not so easy, but there is a way to do it. Allegedly, this is what the Microsoft compilers do when they see a ÷10, as it's much faster than doing an actual division (which is slow, slow, slow). What I'm about to show you is all "smoke and mirrors" to me as I've never seen the derivation, but it surely seems to work. Are you ready to scratch your head a lot and then (on your own) try a bunch of examples to prove it really works?

Here's what to do... whatever your word length, fill a register with 19...9Ah. That is, if you have a 16-bit word, load a register with 199Ah. Similarly, if you have a 32-bit system, then fill a register with 1999999Ah. Now multiply this by your number (one clock in the FPGA) to generate a double-word result (that is, a 32-bit result on a 16-bit system or a 64-bit result on a 32-bit system). If you throw away the lower half of the result, you'll find that the upper half matches the original number divided by 10 (over most of the input range, anyway -- the rounding in the constant introduces an off-by-one error for the largest inputs). Here's a pictorial depiction of the process:
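The trick can be sketched in Python for a 16-bit word (my own model, not the author's code). Note the caveat: with the 16-bit constant 0x199A, the result is exact only for inputs below 16389; compilers typically use a slightly wider constant -- e.g. 0xCCCD with a shift of 19 -- to cover the full 16-bit range:

```python
def div10(x):
    """Divide a 16-bit word by 10: multiply by 0x199A
    (~= 2**16 / 10) and keep only the upper half."""
    return (x * 0x199A) >> 16

def div10_exact(x):
    """Full-range variant: wider constant 0xCCCD (= ceil(2**19 / 10))
    with a 19-bit shift, exact for all 16-bit inputs."""
    return (x * 0xCCCD) >> 19

for n in range(16389):
    assert div10(n) == n // 10         # exact in this range

assert div10(16389) == 1639            # first off-by-one (16389 // 10 == 1638)

for n in range(65536):
    assert div10_exact(n) == n // 10   # exact for all 16-bit n
```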

Simple, huh? I'm sure most of you already know this, but there are always a few of us who could use an occasional reminder. I mentioned that I had started down the multiply-by-10 and divide-by-10 path to try to get around the need for "real" number representations in my project. Well, I was never quite able to "tease" that out -- it got too complex too fast. I'm sure one of you folks who's a lot smarter than myself (which is pretty much any of you, I'm sure) could have figured it out, but I'm me, and that's where we'll leave it for now (our editor, Max Maxfield, hates it when I get long-winded, which is pretty easy for me to do).

Next time I'll talk about the implementation I settled on for my project, along with some of the reasons I chose the options I did. Until then, please post any questions or comments below.


Author

sa_penguin 12/6/2013 7:40:07 PM

I suspect 1/10 doesn't translate perfectly to binary, so just like 1/3 becomes 0.33..., you get a hex factor of 0x1999... in your division.

Author

AZskibum 12/7/2013 11:52:18 AM

Author

tom-ii 12/7/2013 5:43:27 PM

I didn't intend to imply that I was stuck with base 10. Base 10 is just natural for us humans, and there are plenty of applications that use it a lot, especially when the end result is decimal math. In this specific application, I am sending the computation engine decimal numbers, and expecting them in return. It was worth my while to run down the rabbit hole to see if there was an immediately easy way to get it done this way.

Author

Brian_D 12/6/2013 7:56:58 PM

http://www.hackersdelight.org/divcMore.pdf

Also, VHDL has a nifty fixed point math package:

http://www.eda.org/fphdl/

-Brian

Author

krisi 12/7/2013 10:01:00 AM

Pretty cool math... I wonder whether they teach something like that in VLSI classes...

Author

Max The Magnificent 12/9/2013 10:04:24 AM

I'd be interested to know that myself -- but I fear not. When I started out, everyone knew and "swapped" tips and tricks like this -- it was key to making programs run as fast as possible when you were limited in terms of memory size, clock frequency, and the fact that CPUs took multiple cycles to do anything. Magazines like BYTE were always publishing stuff like this... I love this stuff.

Author

DrFPGA 12/9/2013 11:02:27 AM

Maybe there should be a 'list' of tricks that users suggest would be good to put into synthesis tools. Perhaps via a discussion thread on Programmable Logic Design Line?

Author

betajet 12/9/2013 2:20:02 PM

Lesson 1: Don't trust automatic state assignment. Some synthesizers make it hard to turn off automatic state assignment, but it's worth the effort.

Lesson 2: Don't trust simulation. It only simulates theoretical models, not the real world.

Lessons 1+2 combined and generalized: Know the limitations of your tools.

Author

DrFPGA 12/9/2013 2:37:47 PM

Anyone else have tools with detailed enough docs that they can tell ahead of time what is being done?

Author

tom-ii 12/7/2013 5:49:11 PM

Interesting stuff. To be honest, though, I'm a complete n00b, and am teaching myself Verilog at this point.

Author

Max The Magnificent 12/9/2013 10:01:11 AM

> There's some nice material about constant division on the companion website for "Hacker's Delight"

I was going to mention that book, Hacker's Delight -- it's a great book, lots of useful tricks in it. I just now found there's a second edition (the link above) -- I have the first -- I'll have to add this second edition to my wish list :-)

Author

KarlS01 12/7/2013 11:11:18 AM

It is a shift-and-add sequence, starting with the low-order bit of the multiplier: if it is a 1, add the multiplicand into the high-order half of the double-wide product and shift right one bit into the low-order half; if the multiplier bit is zero, just shift the product right one.

Repeat until the higher multiplier bits are all zero or the shift count equals the multiplier width. If the high bits are all 0s, then just shift for the remaining word width.

Division was trial subtraction: subtract the divisor from the dividend, and if the result was positive, shift a 1 into the quotient, else shift a 0. The remainder is left in the register that held the high-order dividend, and the quotient in the low order.

If the signs of the dividend and divisor were different, complement at the end.

This is the best I remember -- maybe a few details missing. The shifts amounted to *2 and /2, and the add/subtract would wind up in the appropriate power-of-two positions.

I think the constant used in the compiler is 1/10, so they are multiplying by the reciprocal of 10 to effectively divide by 10.
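The shift-and-add multiply and trial-subtraction (restoring) divide described above can be modeled in a few lines of Python for unsigned operands (a sketch of the classic algorithms, not KarlS01's actual hardware):

```python
def shift_add_mul(multiplicand, multiplier, width=16):
    """Unsigned shift-and-add multiply: for each 1 bit of the
    multiplier, add the appropriately shifted multiplicand in."""
    product = 0
    for i in range(width):
        if (multiplier >> i) & 1:
            product += multiplicand << i
    return product

def restoring_div(dividend, divisor, width=16):
    """Unsigned restoring division by trial subtraction, MSB first.
    Returns (quotient, remainder)."""
    remainder = 0
    quotient = 0
    for i in reversed(range(width)):
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        if remainder >= divisor:          # trial subtraction succeeds
            remainder -= divisor
            quotient |= 1 << i            # shift a 1 into the quotient
    return quotient, remainder

assert shift_add_mul(123, 45) == 123 * 45
assert restoring_div(1000, 7) == (142, 6)
```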

Author

dtejada201 12/7/2013 1:13:51 PM

-32768 <--> -.5

32768 <--> .5

Thus .5 is 0x7FFF (almost). To get .1, I divide by 5, and .1 is 0x199A. This is where the 0x199A factor comes from. When I multiply two Q15 numbers, I get a 32-bit Q31 result. This means .25 * .1 works out as follows:

.25 => 0x4000

.1 => 0x199A

0x4000 * 0x199A => 0x06668000

0x06668000 >> 16 is 0x0666 => which corresponds to .025

Hope this helps
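The arithmetic above checks out; here's a quick Python confirmation (my own sketch, treating the values as raw integers with an implied scale of 2**16):

```python
SCALE = 1 << 16               # 1.0 <--> 65536 under this scaling

q_quarter = 0x4000            # 0.25 * 65536
q_tenth = 0x199A              # round(0.1 * 65536)

raw = q_quarter * q_tenth     # double-width product
assert raw == 0x06668000
assert raw >> 16 == 0x0666    # drop the low half, back to single width

# 0x0666 / 65536 ~= 0.025, as claimed (small rounding error remains)
assert abs((raw >> 16) / SCALE - 0.025) < 1e-4
```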

Author

tom-ii 12/7/2013 5:45:42 PM

@dtejada201

I figured as much, but just hadn't actually sat down and worked it out. Without knowing for a fact, I didn't want to spout complete nonsense.

...

I spout enough nonsense as it is...

Author

sa_penguin 12/7/2013 7:50:19 PM

For a 12-bit number: 1/10 = 409.6/4096 and 410 = 19AH

16 bits: 1/10 = [6,553.6] / [65,536] and 6,554 = 199AH

24 bits: 1/10 = [1,677,721.6] / [16,777,216] and 1,677,722 = 19 999AH

Yes, I have a habit of putting commas in decimal numbers and spaces in hex numbers. I find it makes them easier to distinguish.
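The pattern described above can be generated for any word width by rounding 2**n / 10 (a quick sketch of my own):

```python
def div10_constant(bits):
    """Rounded reciprocal-of-10 constant for an n-bit word."""
    return round((1 << bits) / 10)

assert div10_constant(12) == 410        # 0x19A
assert div10_constant(16) == 6554       # 0x199A
assert div10_constant(24) == 1677722    # 0x19999A
```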

Author

betajet 12/7/2013 2:49:25 PM

Non-negative integers, OTOH, do behave mathematically like modulo-2^n numbers, so you do get the correct result modulo 2^n.

I agree with the above poster regarding using a decimal radix. Why not use base 2 like IEEE floating point or base 16 like IBM/360?

Author

AZskibum 12/7/2013 4:25:49 PM

Fractional fixed point (often referred to as "Q" format) is efficient -- you choose exactly the precision you need, no less and no more -- and the bookkeeping exercise of keeping track of the radix point is not a big deal.

Author

tom-ii 12/7/2013 5:48:22 PM

> Fractional fixed point (often referred to as "Q" format) is efficient -- you choose exactly the precision you need, no less and no more -- and the bookkeeping exercise of keeping track of the radix point is not a big deal.

This is actually what I used in my final implementation - We'll get to that in a few weeks, I suppose.... (Max willing)

Author

Adam-Taylor 12/8/2013 7:22:29 AM

Good blog! I wrote an article on this for Xcell a few years ago -- hope you do not mind me linking to it here:

http://www.eetimes.com/document.asp?doc_id=1279807

Author

tom-ii 12/8/2013 10:31:35 AM

Of course I don't mind!

Author

Adam-Taylor 12/9/2013 4:04:30 AM

This is then scaled by integer(2**16 * 0.1) = 6554, or 0x199A. Doing the multiply then works out the same as doing the divide. This trick can be used for any number when the divisor is fixed.

Author

DrFPGA 12/8/2013 11:49:13 PM

Thanx for posting the link to your article. Very detailed and even had some code!

Author

Adam-Taylor 12/9/2013 3:47:20 AM

The more complex things like filters and CORDICs etc are covered in other articles

I find a simple example often helps to demonstrate the principles

Author

paragdighe 12/10/2013 4:45:31 AM