
It would slightly simplify your equations for determining the required number of bits to simply use 'log' for the quotient of two logs, since the base does not matter. (I also thought the base was presented as a subscript rather than a superscript, but that may be a cultural difference.)
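A minimal sketch of the point about the base, assuming the bit-count expression has the form ceil(log(x) / log(2)); the helper name `bits_for` and the value 1000 are only illustrative and do not come from the article:

```python
import math

def bits_for(value, log=math.log10):
    """Bits needed to span `value`, computed as a quotient of two logs.

    `log` may be any logarithm function, as long as the same one is used
    for numerator and denominator: the base cancels in the quotient.
    """
    return math.ceil(log(value) / log(2))

# Same answer whether we use log10, ln or log2.
for log in (math.log10, math.log, math.log2):
    print(log.__name__, bits_for(1000, log))
```

All three calls give the same bit count because log_b(x) / log_b(2) equals log2(x) for any base b.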
It seems that a high-result multiplier (one that returns the upper half of the product) would be more desirable than a full-precision (double-width result) or low-result multiplier, i.e. one tends to care about the most significant bits. (Since a normalized FP multiply uses the high result, I would guess that FPGAs support such.)
Also, with multiplication (and division), shifts can be done before or after the operation, as long as the multiplier will not lose necessary precision. (When one operand is a constant, this can allow one to avoid shifting.)
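To make the high-result idea and the shift placement concrete, here is a small sketch using plain Python integers as 8-bit unsigned operands. The function names and the operand values are hypothetical, chosen only for illustration:

```python
W = 8  # assumed operand width in bits

def mul_high(a, b, width=W):
    """Multiply two `width`-bit unsigned values and keep the upper half."""
    full = a * b          # full-precision product, 2*width bits
    return full >> width  # shift AFTER the multiply: keep the MSBs

def mul_preshift(a, b, width=W):
    """Shift each operand BEFORE the multiply (needs a narrower multiplier)."""
    half = width // 2
    return (a >> half) * (b >> half)

# Low operand bits are zero: both orderings agree.
print(mul_high(0xC0, 0x80), mul_preshift(0xC0, 0x80))
# Low operand bits are non-zero: pre-shifting discards them and the results differ.
print(mul_high(0xC5, 0x83), mul_preshift(0xC5, 0x83))
```

Shifting first is only safe when the discarded operand bits are not needed, which is the precision caveat in the comment above.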
Sorry, some of that was already mentioned. I should have read more carefully!
re: The basics of FPGA mathematics
AdamTaylor
8/8/2012 6:53:11 AM
No problem, it is hard to determine what to include in these articles. Regarding log10, the equations will of course work in any base, but most calculators offer log10 and ln, hence my use of log10. Any base will do, as you say. Thanks for your comments.
I am guessing that your polynomial constant A is 131.29 (as in the original formula and in the scaled version in the table) not 133.29 (as in the table and the scaling factor calculation).
(By the way, using a superscript for squaring would seem to be clearer than using x2. I am guessing that this was a conversion to html issue.)
It's 11:30pm in the UK as I pen these words. I'm sure Adam will respond in the morning. Max
re: The basics of FPGA mathematics
AdamTaylor
8/8/2012 6:50:56 AM
Sorry gents, you are correct: it should be 131.29. My apologies.
It is good to see these issues being discussed in this forum. Thank you.
A couple of issues for clarity:
1. You don't need more bits to represent a larger number in fixed point, as there is no reason to require the unit digit to be part of the bit representation, e.g. we could have "01" representing 1x2^8 if we like. Equally it could represent 1x2^-8. So long as we know the scaling, it doesn't matter, in effect, whether the binary point is inside or outside the number. Thus the number of bits defines the dynamic range, but not the range of representable numbers. This is not quite captured by the notion of "integer bits" used here.
2. To add and subtract, you must align scalings of fixed point arguments, as you say. But you don't need to do this for division.
Readers may be interested in the latest developments on this and related issues in IEEE Design & Test magazine: "Numerical Data Representations for FPGA-based Scientific Computing", G.A. Constantinides, N. Nicolici, A.B. Kinsman, IEEE Design and Test 28(4).
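The two points above can be sketched by treating a fixed-point number as a pair (raw bits, power-of-two exponent). This is only an illustrative model: the names `value` and `add` are made up here, and `Fraction` is used so the represented values are exact.

```python
from fractions import Fraction

def value(raw, exponent):
    """Real value represented by integer `raw` with scaling 2**exponent."""
    return Fraction(raw) * Fraction(2) ** exponent

def add(raw1, exp1, raw2, exp2):
    """Add two scaled integers; the scalings must be aligned first."""
    exp = min(exp1, exp2)  # align to the finer scaling
    total = (raw1 << (exp1 - exp)) + (raw2 << (exp2 - exp))
    return total, exp

# The same two bits "01" can stand for a large or a small number.
print(value(0b01, 8), value(0b01, -8))

# 0.75 (raw 3, scale 2^-2) plus 1 (raw 1, scale 2^0) needs alignment.
raw, exp = add(3, -2, 1, 0)
print(raw, exp, value(raw, exp))
```

The bit pattern only fixes the dynamic range; the scaling, tracked separately, decides which numbers those bits actually represent.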
re: The basics of FPGA mathematics
AdamTaylor
8/8/2012 5:33:23 PM
George
Thanks for the kind comments.
With respect to point one, I did cover storing different scaling factors in a vector as opposed to the actual width. The key question becomes whether you can accurately represent the number in the vector width available, i.e. the dynamic range, as you correctly point out.
I made the point about aligning the numbers for division because, while it is possible to divide non-aligned numbers, the scaling of the result will be the difference between the two scalings, and you have to be careful not to let it go negative. As this is a basic how-to article, I did not want to introduce too many concepts. I will address this in my blog over at Programmable Planet, however, as it is an important concept.
Thanks again for taking the time to read it. I do appreciate it ;)
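As a rough sketch of the division point, assume each operand is a non-negative integer with a known number of fractional bits. The function name `fx_div` and the `pre_shift` parameter are hypothetical, introduced only to show how the result scaling falls out:

```python
def fx_div(a_raw, a_frac, b_raw, b_frac, pre_shift=0):
    """Divide two scaled integers without aligning them first.

    Returns (quotient, fractional bits of the quotient). The result
    scaling is the difference of the operand scalings, plus any
    left shift applied to the dividend beforehand.
    """
    quotient = (a_raw << pre_shift) // b_raw
    return quotient, a_frac + pre_shift - b_frac

# 6.5 with 8 fractional bits divided by 2.0 with 4 fractional bits.
q, frac = fx_div(1664, 8, 32, 4)
print(q, frac, q / 2**frac)

# Dividend with fewer fractional bits than the divisor: the result
# scaling goes negative and the quotient collapses to zero.
print(fx_div(13, 1, 32, 4))

# Pre-shifting the dividend keeps the result scaling positive.
print(fx_div(13, 1, 32, 4, pre_shift=7))
```

Only non-negative operands are used here, since Python's `//` rounds toward minus infinity, which may differ from what a hardware divider does with signed values.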
re: The basics of FPGA mathematics
larsen
8/9/2012 8:20:51 PM
Thanks for a good article. Your warning about overflow producing an incorrect result, however, is not a concern in all cases, and this has important practical implications. I think the following is little known and quite astonishing:
"You can add any quantity of fixed point signed numbers (say W bits wide), in _any_ order, and ignore overflow, PROVIDED that the final result is within the range of the accumulator. The result will always be correct!"
Example: W=3 bits and, for simplicity, no fractional bits, so numbers can be [-4,-3,-2,-1,0,1,2,3].
We all agree on the following example calculation using decimal numbers:
-4-3+2+3=-2
Now do the summing from left to right using only 3 bits in the accumulator (Bxxx is in binary):
-4-3=-7 (B100+B101=B1001, overflow! Remove the excess bit.) Result=B001=1
1+2=3 (B001+B010=B011)
3+3=6 (B011+B011=B110), but B110 is the same as -2
I.e. the result we were looking for.
All this is due to the modular arithmetic in operation.
Be cautious though. This does NOT work if you, in a mistaken attempt to be cautious and careful to catch errors, implement the adder with saturation! The result will be totally wrong. So in an FIR (Finite Impulse Response) filter, for instance, where such a long sum is bread and butter, one should _not_ use a saturating adder but simply truncate the overflowing bits.
By the way, in this example you could do with just 2 bits in the accumulator (and in each number, for that matter) because the result -2 can be represented in 2 bits. Only the result determines the size required by the accumulator; excess bits can be ignored.
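The worked example can be replayed in a few lines of Python. This is only a sketch: `wrap`, `saturate` and `accumulate` are illustrative helper names, and Python integers stand in for the hardware accumulator.

```python
from itertools import permutations

def wrap(x, bits):
    """Keep only the low `bits` bits, read as a two's complement value."""
    half = 1 << (bits - 1)
    return ((x + half) % (1 << bits)) - half

def saturate(x, bits):
    """Clamp x to the signed range of a `bits`-wide register."""
    half = 1 << (bits - 1)
    return max(-half, min(half - 1, x))

def accumulate(values, bits, fix):
    """Sum left to right, applying `fix` after every addition."""
    acc = 0
    for v in values:
        acc = fix(acc + v, bits)
    return acc

values = [-4, -3, 2, 3]
print(accumulate(values, 3, wrap))      # wrapping 3-bit accumulator
print(accumulate(values, 2, wrap))      # even 2 bits are enough here
print(accumulate(values, 3, saturate))  # saturation gives a different answer

# The wrapped sum does not depend on the order of the additions.
print({accumulate(p, 3, wrap) for p in permutations(values)})
```

The wrapping accumulator lands on -2 in every ordering, while the saturating one loses information as soon as an intermediate sum clips.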
Henning E. Larsen
Excellent point about not using saturation arithmetic in FIR filters, and just allowing modulo arithmetic to do its thing.
re: The basics of FPGA mathematics
I_B_GREEN
8/12/2012 8:48:13 PM
Yes, but when writing code how do you know the outcome? Don't you have to assume worst case?
re: The basics of FPGA mathematics
AdamTaylor
8/13/2012 7:14:35 PM
Whether writing code or designing hardware, one should always ensure it works in the worst case. With code you need to ensure you hit all of the corner cases. You can model it in Excel, MATLAB or Mathcad etc. to ensure the worst case is captured.
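As one way of modelling the worst case, the sketch below computes the accumulator width a sum of products can ever need, assuming integer coefficients and signed two's complement inputs. The function name and the coefficient sets are hypothetical examples, not taken from the article:

```python
def worst_case_bits(coeffs, in_bits):
    """Smallest signed accumulator width that can never overflow.

    Each input sample is driven to whichever extreme makes its
    product most positive (for `hi`) or most negative (for `lo`).
    """
    in_min = -(1 << (in_bits - 1))
    in_max = (1 << (in_bits - 1)) - 1
    hi = sum(c * (in_max if c > 0 else in_min) for c in coeffs)
    lo = sum(c * (in_min if c > 0 else in_max) for c in coeffs)
    bits = 1
    while not (-(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits

print(worst_case_bits([1, 4, 6, 4, 1], 8))  # coefficient magnitudes sum to 16
print(worst_case_bits([-1, 2, -1], 8))      # coefficient magnitudes sum to 4
```

The growth over the input width is governed by the sum of the coefficient magnitudes, which is the bound a spreadsheet or MATLAB model would also arrive at.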
re: The basics of FPGA mathematics
larsen
8/13/2012 9:56:00 PM
If you do financial calculations, for instance, then you had better design for the worst case. When it comes to an FIR filter for signal processing, though, designing for the absolute worst case with respect to maximum amplitude at the input is not always necessary. To illustrate: take a narrow-band FIR filter (centre frequency f0). The maximum output amplitude occurs with a sine wave at the centre frequency f0. But if you know that such an input signal will never be present in practice, then you can relax the maximum supported amplitude and discard some MSBs, as illustrated in my small numeric example with modular arithmetic. A realistic example of such an input: if the sine wave at f0 is always accompanied by other signals, noise for example, then you know that the input sine wave can never fill the whole input amplitude, because if it did, there would already be clipping or overrun at the input. The out-of-band signals (those away from f0) would be filtered away in the FIR filter and contribute little to the output; only the sine wave at f0 gets through. I.e. the net result is that you can spare some of the MSBs without damaging your signal. But again, using saturating adders for the intermediate results would cause distortion at a lower input level than without saturation.
To determine whether the filter is properly designed, you would need to simulate, or at least be prepared to do some iterations of test and redesign.
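A simulation of this kind can be sketched as follows, comparing a filter with a narrow wrapping accumulator against an exact reference. Everything here is an assumption made for illustration: the toy coefficients [4, -4, 4, -4] (a crude high-pass, so slowly varying input is out of band), the 9-bit accumulator, and the test signals.

```python
def wrap(x, bits):
    """Keep only the low `bits` bits, read as a two's complement value."""
    half = 1 << (bits - 1)
    return ((x + half) % (1 << bits)) - half

def fir(samples, coeffs, acc_bits=None):
    """Direct-form FIR; wraps the accumulator after each tap if acc_bits is set."""
    out = []
    for n in range(len(coeffs) - 1, len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            acc += c * samples[n - k]
            if acc_bits is not None:
                acc = wrap(acc, acc_bits)
        out.append(acc)
    return out

def mismatches(samples, coeffs, acc_bits):
    """Count outputs where the narrow accumulator differs from the exact sum."""
    exact = fir(samples, coeffs)
    narrow = fir(samples, coeffs, acc_bits)
    return sum(e != n for e, n in zip(exact, narrow))

coeffs = [4, -4, 4, -4]
slow_ramp = list(range(90, 110))       # out-of-band input: small output
alternating = [100, -100] * 4          # in-band input: large output

print(mismatches(slow_ramp, coeffs, 9))    # partial sums overflow, outputs still exact
print(mismatches(alternating, coeffs, 9))  # final outputs no longer fit in 9 bits
```

If the mismatch count is non-zero for any signal the filter must handle, the accumulator is too narrow and the design needs another iteration.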





