Design Article

How to achieve 1 trillion floating-point operations-per-second in an FPGA

Michael Parker, Altera Corporation

9/14/2010 2:56 PM EDT

FPGAs enable specific optimizations for floating-point
FPGAs have characteristics that microprocessors lack, and these can be leveraged to produce a more efficient floating-point flow. First, unlike microprocessors, FPGAs have thousands of hardened multiplier circuits. These can be used both for mantissa multiplication and as shifters. Shifting is required to normalize a result, which sets the mantissa's binary point, and to denormalize mantissas as needed to align exponents. Implementing these shifts with a barrel shifter structure would require very high fan-in multiplexers for each bit location, plus the routing to connect each of the possible bit inputs. This leads to very poor fitting, slow clock rates, and excessive logic usage, which has previously discouraged the use of floating-point operations in FPGAs.
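
The idea of using a multiplier as a shifter can be illustrated in software. The following is a minimal sketch, not Altera's implementation: the function names and the 27-bit default width are assumptions chosen for illustration. A left shift by k is a multiplication by the one-hot value 2^k, and a right shift by k is a multiplication by 2^(width - k) followed by keeping the upper word of the product.

```python
def shift_left_via_multiply(mantissa: int, shift: int, width: int = 27) -> int:
    """Left-shift a width-bit mantissa using a multiply (hypothetical helper).

    Assumes 0 <= shift < width so the one-hot operand fits in width bits.
    """
    one_hot = 1 << shift              # in hardware, a small decoder would build this
    product = mantissa * one_hot      # the hard multiplier replaces the barrel shifter
    return product & ((1 << width) - 1)  # keep the low word; high bits shift out


def shift_right_via_multiply(mantissa: int, shift: int, width: int = 27) -> int:
    """Right-shift a width-bit mantissa using a multiply (hypothetical helper).

    Assumes 1 <= shift <= width so the one-hot operand fits in width bits.
    """
    one_hot = 1 << (width - shift)
    product = mantissa * one_hot
    return product >> width           # keep the high word of the double-width product


if __name__ == "__main__":
    print(bin(shift_left_via_multiply(0b1011, 3)))
    print(bin(shift_right_via_multiply(0b1011000, 3)))
```

In both cases the variable shift amount only selects which one-hot operand is fed to the multiplier, so no wide multiplexer tree is needed in logic.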

Second, an FPGA can use larger mantissas than the IEEE 754 representation specifies. This is possible because the variable-precision DSP blocks support 27x27 and 36x36 multiplier sizes, which can be used for 23-bit single-precision floating-point datapaths with bits to spare. Because the remainder of the circuits is built from configurable logic, it can be made whatever mantissa size is desired. Using a mantissa a few bits wider, such as 27 bits instead of 23 bits, allows extra precision to be carried from one operation to the next, significantly reducing the need for normalization and denormalization.
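
The benefit of a few extra mantissa bits can be seen in a small software model. This is only a sketch under stated assumptions: `round_to_bits` and `accumulate` are hypothetical helpers that emulate rounding to a given number of significant bits after every addition, and they do not model how the fused-datapath tool actually schedules normalization. Here 24 significant bits stands for IEEE 754 single precision (23 stored bits plus the implicit leading one), and 28 bits stands for a mantissa that is 4 bits wider.

```python
import math


def round_to_bits(x: float, bits: int) -> float:
    """Round x to `bits` significant binary digits (ties to even).

    Hypothetical helper; ignores overflow and subnormal handling.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)              # x = m * 2**e with 0.5 <= |m| < 1
    scaled = m * (1 << bits)          # exact, since it is a power-of-two scaling
    return math.ldexp(round(scaled), e - bits)


def accumulate(values, bits: int) -> float:
    """Sum values, rounding to `bits` significant bits after every addition."""
    total = 0.0
    for v in values:
        total = round_to_bits(total + round_to_bits(v, bits), bits)
    return total


if __name__ == "__main__":
    # One large term followed by 16 terms that are each a quarter of a
    # 24-bit unit in the last place at 1.0.
    values = [1.0] + [2.0 ** -25] * 16
    exact = 1.0 + 2.0 ** -21
    for bits in (24, 28):
        print(bits, abs(accumulate(values, bits) - exact))
```

With 24 bits every small term is rounded away as it is added, so the sum never moves from 1.0. With 28 bits each term is representable next to the running total, and the accumulated result matches the exact sum.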

The fused-datapath tool analyzes the need for normalization in the design, and inserts these stages only where necessary. This analysis leads to a dramatic reduction in logic, routing, and multiplier-based shifting resources. It also results in much higher fMAX or achievable clock rates, even in very large floating-point designs as shown graphically in Figure 1.

Figure 1. Fused datapath optimizations

Because an IEEE 754 representation is still necessary to comply with the floating-point world, all of the floating-point functions support this interface at their boundaries, whether the function is a fast Fourier transform (FFT), a matrix inversion, a sine function, or a custom datapath specified by the customer. This raises two questions: whether the fused-datapath toolflow provides the same results as the IEEE 754 approach used by microprocessors, and how verification is performed. Even microprocessors produce different floating-point results, depending on how they are implemented.

The main reason for these differences is that floating-point addition and multiplication are not associative, which is easy to demonstrate by writing a program in C or MATLAB that sums a set of floating-point numbers. Summing the same set of numbers in the opposite order will typically change a few of the least significant bits (LSBs) of the result. To verify the fused-datapath method, the designer must therefore discard the bit-by-bit matching of results typically used in fixed-point data processing. Instead, the tools allow the designer to declare a tolerance and to compare the hardware results output from the fused-datapath toolflow against the simulation model results within that tolerance.
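
Both points can be sketched in a few lines of Python, used here in place of C or MATLAB. The helper names and the default tolerance are assumptions for illustration and are not part of the Altera tools. The first function sums a list in the order given, and the second performs the kind of tolerance-based comparison that replaces bit-by-bit matching.

```python
import math


def sum_in_order(values) -> float:
    """Accumulate left to right, rounding after every addition."""
    total = 0.0
    for v in values:
        total += v
    return total


def within_tolerance(hardware, reference, rel_tol: float = 1e-6) -> bool:
    """Compare two result vectors element by element against a relative tolerance.

    Hypothetical checker; rel_tol is an arbitrary illustrative default.
    """
    hardware, reference = list(hardware), list(reference)
    return len(hardware) == len(reference) and all(
        math.isclose(h, r, rel_tol=rel_tol) for h, r in zip(hardware, reference)
    )


if __name__ == "__main__":
    values = [0.1, 0.2, 0.3]
    forward = sum_in_order(values)
    backward = sum_in_order(reversed(values))
    print(forward == backward)                      # bit-exact comparison
    print(within_tolerance([forward], [backward]))  # tolerance-based comparison
```

The two sums differ in their last bits, so a bit-exact check rejects a result that the tolerance-based check accepts.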

A large single-precision floating-point matrix inversion function can be implemented using the fused-datapath toolflow and tested across input matrices of different sizes. The same results can also be computed on an IEEE 754-based Pentium processor. The reference result is computed on the processor using double-precision floating-point operations, which is effectively exact compared to a single-precision architecture. By comparing both the IEEE 754 single-precision results and the single-precision fused-datapath results against this reference, and computing the Frobenius norm of the differences, it can be shown that the fused-datapath toolflow gives more precise results than the IEEE 754 approach, due to the extra mantissa precision used in the intermediate calculations.
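
The error metric itself is simple to compute. The sketch below is a stand-in under stated assumptions: it uses a closed-form 2x2 inverse rather than the large matrix inversion described in the article, it emulates single precision by rounding every intermediate value through a 32-bit float, and all function names are hypothetical. It shows only how a single-precision result is scored against a double-precision reference with the Frobenius norm of the difference.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float (double) to the nearest IEEE 754 single value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def frobenius_norm(matrix) -> float:
    """Square root of the sum of squares of all matrix entries."""
    return math.sqrt(sum(v * v for row in matrix for v in row))


def error_norm(result, reference) -> float:
    """Frobenius norm of the element-wise difference result - reference."""
    diff = [[a - b for a, b in zip(res_row, ref_row)]
            for res_row, ref_row in zip(result, reference)]
    return frobenius_norm(diff)


def invert_2x2(m, rnd=lambda x: x):
    """Closed-form 2x2 inverse; rnd is applied after every arithmetic step."""
    (a, b), (c, d) = m
    det = rnd(rnd(a * d) - rnd(b * c))
    return [[rnd(d / det), rnd(-b / det)],
            [rnd(-c / det), rnd(a / det)]]


if __name__ == "__main__":
    matrix = [[4.0, 7.0], [2.0, 6.0]]
    reference = invert_2x2(matrix)             # double-precision reference
    single = invert_2x2(matrix, to_float32)    # emulated single precision
    print(error_norm(single, reference))
```

A hardware result would be scored the same way, by passing its output matrix to `error_norm` in place of `single`.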

Table 1 lists the mean, the standard deviation, and the Frobenius norm of the errors. The SD subscript refers to the IEEE 754-based single-precision architecture compared with the reference double-precision architecture, and the HD subscript refers to the hardware-based fused-datapath single-precision architecture compared with the same reference.

Table 1. Fused datapath precision results




Max the Magnificent

9/14/2010 4:55 PM EDT

OK, the next question is how many floating-point DSP operations do different applications require to perform per second ... does anyone know any metrics here?



Yankee

9/15/2010 12:34 PM EDT

The key is "how many seconds do you have?" In order to achieve faster response to demanding operations such as 3D transformation, speech recognition, radar processing, image analysis, and the like, more flops are needed, so the answer is found in the market requirements for products.



unu

9/16/2010 3:59 PM EDT

The Altera people deserve congratulations!
A 2008 web published paper available at alex.zamfirescu.googlepages.com/Zamfirescu_2008.pdf highlighted both the advantages of variable precision and of mixing the fix and float formats, skipping normalization until really required, etc. Altera people went on, and really did it. Bravo Altera! The extra innovation to keep mantissa in two's comp., instead of sign and mag., was not even predicted then. However, the paper described also the HDL extensions and what was feasible at that time in VHDL, and especially in Verilog. Maybe, people will pay attention and focus also on bringing HDL standards, to the level where designers could use them to push progress of FP computations (instead of confusing the execution machine FP formats with the designed FP capability).



Max the Magnificent

9/16/2010 4:36 PM EDT

Alex -- is that you? I haven't seen you for ages -- how are you doing -- I remember that paper -- I recommend other folks to read it also -- regards -- Max


