# Doing Math in FPGAs, Part 1

There are a number of tricks we can use to perform mathematical operations efficiently in FPGA hardware.

For a recent project, I explored doing "real" (that is, non-integer) math on a Spartan-3 FPGA. FPGAs, by their nature, do integer math. That is, there's no floating-point unit (or even fixed-point) in there unless you "roll your own" (although many newer devices do include dedicated hardware multipliers and DSP blocks).

This led me (temporarily) down the path of trying to convert the "real" numbers I was working with (which are fairly small) into integers -- basically multiplying by powers of ten -- so I could perform the function with integer hardware. Why would I want to do this? Well, I was trying to get around the pains of (large) combinational multipliers and cumbersome dividers. Of course, multiplication is just repeated addition, but it appears that my synthesis tools prefer to generate multipliers from combinational logic. This makes for faster speed at the sacrifice of real estate (or so I suppose).

The synthesis tools, however, do not seem to do this for division. This leaves us with a potential problem, then. If I roll my own divider, it will likely do successive additions/subtractions to compute a result. This means that I would have a stand-alone unit that would need to incorporate "ready" signaling to the next higher assembly. That is, I would have to hand it the numbers to divide and then wait for the module to signal that it had completed the operation before I could continue on with my process.

But, if we can contrive our math such that we are always using powers of two, then we can use shifts, right? That is, a left shift of one bit is equivalent to a multiply by two, while a right shift of one bit is equivalent to a divide by two.
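As a minimal sketch in C (the function names are my own invention, not from the article), those shift equivalences look like this:

```c
#include <stdint.h>
#include <assert.h>

/* A left shift by k multiplies by 2^k; a right shift by k divides
   by 2^k (truncating toward zero for unsigned values). */
uint16_t times2(uint16_t x) { return x << 1; }  /* x * 2 */
uint16_t times8(uint16_t x) { return x << 3; }  /* x * 8 */
uint16_t div2(uint16_t x)   { return x >> 1; }  /* x / 2 */
```

The Verilog equivalent is the same operator: `assign y = x << 1;` costs nothing but wiring.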

This is handy if we can keep ourselves in the binary world, but sometimes it's nice to work in the more familiar decimal world. To this end, I thought I'd share a couple of tricks I've learned over the years.

The first trick is a fast way to multiply by 10. In fact, it's the same shift-and-add sequence compilers often emit when they see X * 10, because it beats a general-purpose multiply on many processors. So, are you ready to do some math?

Yep -- if you take a left shift of one bit (2x) and add it to a left shift of three bits (2^{3}x = 8x), then you've got a multiply by 10, since 2x + 8x = 10x. In an FPGA, this is easily done in a single clock cycle... (on an older Intel CPU, it takes three clock cycles, while a normal multiply takes five or more).
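In C (function name is mine, not from the article), the whole trick is two shifts and an add:

```c
#include <stdint.h>
#include <assert.h>

/* Multiply by 10 using two shifts and an add:
   10x = 8x + 2x = (x << 3) + (x << 1).
   The result is widened to 32 bits so large inputs don't overflow. */
uint32_t mul10(uint16_t x)
{
    return ((uint32_t)x << 3) + ((uint32_t)x << 1);
}
```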

Now, dividing is not so easy, but there is a way to do it. Allegedly, this is what the Microsoft compilers do when they see a ÷10, as it's much faster than doing an actual division (which is slow, slow, slow). What I'm about to show you is all "smoke and mirrors" to me as I've never seen the derivation, but it surely seems to work. Are you ready to scratch your head a lot and then (on your own) try a bunch of examples to prove it really works?

Here's what to do... whatever your word length, fill a register with 19...9Ah. That is, if you have a 16-bit word, load a register with 199Ah; similarly, on a 32-bit system, fill a register with 1999999Ah. Now multiply this by your number (one clock in the FPGA) to generate a double-word result (that is, a 32-bit result on a 16-bit system or a 64-bit result on a 32-bit system). Throw away the lower half of the result, and the upper half is your number divided by 10. One caveat: because 19...9Ah slightly over-approximates 1/10, the result can come out one too high for large inputs (above about 16,000 in the 16-bit case); compilers sidestep this by carrying a few extra bits in the constant. Here's a pictorial depiction of the process:
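Here's a C sketch of the 16-bit version described above, alongside the wider-constant variant (0CCCDh with a right shift of 19) that is exact for every 16-bit input -- function names are mine, and the "first failure" value comes from working the arithmetic by hand, not from the article:

```c
#include <stdint.h>
#include <assert.h>

/* Article's trick: multiply by an integer approximation of 1/10
   and keep only the upper half of the double-width product. */
uint16_t div10_199a(uint16_t x)
{
    /* 0x199A / 2^16 ~= 0.1000061 -- slightly above 1/10, so the
       result can be one too high for large x (first failure: x = 16389). */
    return (uint16_t)(((uint32_t)x * 0x199A) >> 16);
}

/* Wider constant plus 3 extra shift bits: 0xCCCD / 2^19 ~= 0.1000004,
   close enough to 1/10 that it never overshoots for 16-bit x. */
uint16_t div10_exact(uint16_t x)
{
    return (uint16_t)(((uint32_t)x * 0xCCCD) >> 19);
}
```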

Simple, huh? I'm sure most of you already know this, but there are always a few of us who could use an occasional reminder. I mentioned that I had started down the multiply-by-10 and divide-by-10 path to try to get around the need for "real" number representations in my project. Well, I was never quite able to "tease" that out -- it got too complex too fast. I'm sure one of you folks who's a lot smarter than me (which is pretty much any of you, I'm sure) could have figured it out, but I'm me, and that's where we'll leave it for now (our editor, Max Maxfield, hates it when I get long-winded, which is pretty easy for me to do).

Next time I'll talk about the implementation I settled on for my project, along with some of the reasons I chose the options I did. Until then, please post any questions or comments below.


Author

Adam-Taylor 12/8/2013 7:22:29 AM

Good blog! I wrote an article on this for Xcell a few years ago -- hope you don't mind me linking to it here:

http://www.eetimes.com/document.asp?doc_id=1279807


sa_penguin 12/7/2013 7:50:19 PM

For a 12-bit number: 1/10 = 409.6/4096 and 410 = 19AH

16 bits: 1/10 = [6,553.6] / [65,536] and 6,554 = 199AH

24 bits: 1/10 = [1,677,721.6] / [16,777,216] and 1,677,722 = 19 999AH

Yes, I have a habit of putting commas in decimal numbers and spaces in hex numbers. I find it makes them easier to distinguish.
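sa_penguin's constants all follow one formula -- 2^n/10 rounded up. A hypothetical helper (mine, not from the comment) that reproduces the table:

```c
#include <stdint.h>
#include <assert.h>

/* Reciprocal-of-10 constant for an n-bit word: ceil(2^n / 10).
   2^n is never divisible by 10, so (2^n + 9) / 10 rounds up. */
uint32_t recip10(unsigned nbits)
{
    uint64_t pow2 = 1ULL << nbits;
    return (uint32_t)((pow2 + 9) / 10);
}
```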


tom-ii 12/7/2013 5:49:11 PM

Interesting stuff. To be honest, though, I'm a complete n00b, and am teaching myself Verilog at this point.


tom-ii 12/7/2013 5:48:22 PM

> Fractional fixed point (often referred to as "Q" format) is efficient -- you choose exactly the precision you need, no less and no more -- and the bookkeeping exercise of keeping track of the radix point is not a big deal.

This is actually what I used in my final implementation -- we'll get to that in a few weeks, I suppose... (Max willing)


tom-ii 12/7/2013 5:45:42 PM

@dtejada201:

I figured as much, but just hadn't actually sat down and worked it out. Without knowing for a fact, I didn't want to spout complete nonsense.

...

I spout enough nonsense as it is...


tom-ii 12/7/2013 5:43:27 PM

I didn't intend to imply that I was stuck with base 10. Base 10 is just natural for us humans, and there are plenty of applications that use it a lot, especially when the end result is decimal math. In this specific application, I am sending the computation engine decimal numbers, and expecting them in return. It was worth my while to run down the rabbit hole to see if there was an immediately easy way to get it done this way.


AZskibum 12/7/2013 4:25:49 PM

Fractional fixed point (often referred to as "Q" format) is efficient -- you choose exactly the precision you need, no less and no more -- and the bookkeeping exercise of keeping track of the radix point is not a big deal.


betajet 12/7/2013 2:49:25 PM

Non-negative integers, OTOH, do behave mathematically like modulo 2^n numbers, so you do get the correct result modulo 2^n.

I agree with the above poster regarding using a decimal radix. Why not use base 2 like IEEE floating point or base 16 like IBM/360?
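betajet's point in code: unsigned arithmetic in C is defined to wrap modulo 2^n, so the low bits of a result come out right even when intermediate values exceed the word size. A tiny illustrative sketch (names are mine):

```c
#include <stdint.h>
#include <assert.h>

/* In C the uint8_t operands promote to int, so the full sum (possibly
   > 255) is computed; the cast back to uint8_t keeps only the low
   8 bits, i.e. the sum modulo 256 -- the same answer an 8-bit adder
   chain in hardware would produce. */
uint8_t sum_mod256(uint8_t a, uint8_t b, uint8_t c)
{
    return (uint8_t)(a + b + c);
}
```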


dtejada201 12/7/2013 1:13:51 PM

-32768 <--> -.5

32768 <--> .5

Thus .5 is 0x7FFF (almost). To get .1, I divide by 5, and .1 is 0x199A. This is where the 0x199A factor comes from. When I multiply two Q15 numbers, I get a 32-bit Q31 result. This means .25*.1 is as follows:

.25 => 0x4000

.1 => 0x199A

0x4000 * 0x199A => 0x06668000

0x06668000 >> 16 is 0x0666, which corresponds to .025

Hope this helps
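dtejada201's worked example, transliterated into C under the scaling his numbers imply (value = raw / 65536; the helper name is mine, not from the comment):

```c
#include <stdint.h>
#include <assert.h>

/* Fixed-point fractions scaled by 2^16: value = raw / 65536.
   The product of two such numbers is scaled by 2^32; shifting it
   right by 16 returns it to the 2^16 scale. */
uint32_t fxmul(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 16);
}
```

So `fxmul(0x4000, 0x199A)` reproduces the .25 * .1 walkthrough above, yielding 0x0666 (~.025).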


AZskibum 12/7/2013 11:52:18 AM