# Doing Math in FPGAs, Part 1

There are a number of tricks we can use to perform mathematical operations efficiently in FPGA hardware.

For a recent project, I explored doing "real" (that is, non-integer) math on a Spartan 3 FPGA. FPGAs, by their nature, do integer math. That is, there's no floating-point unit (or even fixed-point) in there unless you "roll your own" (I think many of the newer devices have dedicated math functions in them).

This led me (temporarily) down the path of trying to convert the "real" numbers I was working with (that are fairly small) into integers (by multiplication by 10 -- basically multiplying by powers) to perform the function. Why would I want to do this? Well, I was trying to get over the pains of (large) combinational multipliers and cumbersome dividers. Of course, multiplication is just repeated addition; but it appears that my synthesis tools prefer to generate them from combinational logic. This makes for faster speed at the sacrifice of real estate (or so I suppose).

The synthesis tools, however, do not seem to do this for division. This leaves us with a potential problem, then. If I roll my own divider, it will likely do successive additions/subtractions to compute a result. This means that I would have a stand-alone unit that would need to incorporate "ready" signaling to the next higher assembly. That is, I would have to hand it the numbers to divide and then wait for the module to signal that it had completed the operation before I could continue on with my process.

But, if we can contrive our math such that we are always using powers of two, then we can use shifts, right? That is, a left shift of one bit is equivalent to a multiply by two, while a right shift of one bit is equivalent to a divide by two:

This is handy if we can keep ourselves in the binary world, but sometimes it's nice to work in the more familiar decimal world. To this end, I thought I'd share a couple of tricks I've learned over the years.

The first trick is a fast way to multiply by 10. Even these days, this is a faster way to perform a "multiply by 10" function than what your CPU will normally do when it sees X * 10. So, are you ready to do some math?

Yep -- if you perform a left-shift of one bit and add it to a left shift of three bits (2^{3} = 8), then you've got a multiply by 10. In an FPGA, this is easily done in a single clock cycle... (on an older Intel CPU, it takes three clock cycles, while a normal multiply takes five or more).

Now, dividing is not so easy, but there is a way to do it. Allegedly, this is what the Microsoft compilers do when they see a ÷10, as it's much faster than doing an actual division (which is slow, slow, slow). What I'm about to show you is all "smoke and mirrors" to me as I've never seen the derivation, but it surely seems to work. Are you ready to scratch your head a lot and then (on your own) try a bunch of examples to prove it really works?

Here's what to do... whatever your word length, fill a register with 19...9Ah. That is, if you have a 16-bit word, load a register with 199Ah (199A in hexadecimal). Similarly, if you have a 32-bit system, then fill a register with 1999999Ah. Now multiply this by your number (1 clock in the FPGA) to generate a double-word result (that is, a 32-bit result on a 16-bit system or a 64-bit result on a 32-bit system). If you throw away the lower half of the result, you'll find that the upper half is identical to dividing the original number by 10. Here's a pictorial depiction of the process:

Simple, huh? I'm sure most of you already know this, but there's always a few of us that could use an occasional reminder. I mentioned that I had started down the multiply-by-10 and divide-by-10 path to try to get around the need for "real" number representations in my project. Well, I was never quite able to "tease" that out -- it got too complex too fast. I'm sure one of you folks that's a lot smarter than myself (which is pretty much any of you, I'm sure) could have figured it out, but I'm me, and that's where we'll leave it for now (our editor, Max Maxfield, hates it when I get long-winded, which is pretty easy for me to do).

Next time I'll talk about the implementation I settled on for my project, along with some of the reasons I chose the options I did. Until then, please post any questions or comments below.

**Related posts:**

- Altera & Enpirion: Great Things Have Arrived
- Lattice Always-On Sensor Solutions for Context-Aware Mobile Devices
- Altera's Secret Processor Unveiled: a Quad-Core ARM Cortex-A53
- Handling Asynchronous Clock Groups in SDC
- Cloud-Based FPGA Verification from OneSpin & Plunify
- Altera Announces 20nm FPGA/SoC Tools
- Xilinx Announces 20-nm FPGA/SoC Devices