Each sample in the filter core is paired with an index value, which gives the number of samples in the window that are less than that sample. When a new sample is inserted into the window, the samples already in the filter are compared to the new sample. Based on these comparisons the index values are updated, resulting in TAP distinct values ranging from 0 (lowest sample) to TAP-1 (highest sample). However, this holds only if the values already in the filter core were unique. As new samples enter the filter, the samples already in the filter are shifted along with their corresponding index values.
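The index update described above can be modeled in software. The following Python sketch is a behavioral illustration only, not the hardware implementation: the class name `RankWindow`, its methods, and the tie-breaking rule (a new sample equal to an older one is ranked below it, which keeps the indices distinct) are assumptions made for this example.

```python
from collections import deque


class RankWindow:
    """Sliding window that keeps an index (rank) next to every sample.

    Hypothetical software model; names and tie handling are assumptions.
    """

    def __init__(self, taps):
        self.taps = taps
        self.samples = deque()  # oldest sample on the left
        self.ranks = deque()    # ranks[i] = number of samples below samples[i]

    def push(self, new):
        # Once the window is full, the oldest sample leaves and every
        # sample that was ranked above it moves down by one.
        if len(self.samples) == self.taps:
            self.samples.popleft()
            gone = self.ranks.popleft()
            self.ranks = deque(r - 1 if r > gone else r for r in self.ranks)

        # Compare every stored sample with the new one.
        new_rank = 0
        updated = deque()
        for s, r in zip(self.samples, self.ranks):
            if s < new:
                new_rank += 1          # stored sample is below the new one
                updated.append(r)
            else:
                updated.append(r + 1)  # new sample slots in below this one
        self.ranks = updated

        self.samples.append(new)
        self.ranks.append(new_rank)

    def select(self, rank):
        """Return the sample whose index equals `rank`."""
        return self.samples[self.ranks.index(rank)]


window = RankWindow(5)
for value in [5, 3, 8, 1, 9, 4]:
    window.push(value)
middle = window.select(2)  # index 2 of 5 is the middle sample
```

After the six pushes the window holds 3, 8, 1, 9, 4 with indices 1, 3, 0, 4, 2, so `select(2)` returns 4. Picking the middle index is shown only as one possible use of the index values.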
The architecture in Figure 3 illustrates the algorithm for a TAP equal to 5. Five registers store the data: four for the samples already in the filter (D[3 ... 0]) and one for the new data sample (ND). Each stored sample is compared with the new sample to determine whether it is less than the new sample. The results of these comparisons are fed to the TAP-bit-wide shift registers (SH[4 ... 1]). The other bits of the SH registers are updated using comparison results calculated in previous cycles.
Figure 3. Filter core architecture.
At any given time, shift register SH[k] stores the results of comparing the corresponding sample D[k] with the other samples residing in the filter. The sum of the bits in each SH register gives the index of the corresponding sample.
Bit b (b>0) in SH is updated with the negated comparison result of comparator b. Bit 0 of SH, the self-comparison result, is initialized to 1. Figure 4 illustrates the algorithm by presenting two cycles of the 5-TAP example as new samples enter from the right. Table cells show the contents of the data registers (ND, D) and the corresponding shift register (SH) values.
Figure 4. Filter core operation example.
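The comparison-bit bookkeeping can also be sketched in software. The Python model below is a rough illustration under several assumptions that are not taken from Figure 3 or Figure 4: the function names `insert_sample` and `indices` are hypothetical, the bits of each SH entry are ordered by the age of the other sample (oldest first), the window is allowed to fill up gradually, and the self-comparison bit is subtracted so that the index starts at 0.

```python
def insert_sample(samples, sh, new, taps):
    """Return (samples, sh) after `new` enters a window of at most `taps` samples.

    samples: list of values, oldest first.
    sh:      list of bit lists; sh[i][j] == 1 means samples[j] is counted
             as not above samples[i] (the j == i bit is the self-comparison).
    """
    if len(samples) == taps:
        # The oldest sample leaves, and so does every bit that referred to it.
        samples = samples[1:]
        sh = [bits[1:] for bits in sh[1:]]

    # Comparator outputs: is the stored sample less than the new sample?
    less_than_new = [int(s < new) for s in samples]

    # Stored samples take the negated comparator output as their newest bit.
    sh = [bits + [1 - c] for bits, c in zip(sh, less_than_new)]

    # The new sample takes the comparator outputs plus its self-comparison bit.
    sh.append(less_than_new + [1])
    return samples + [new], sh


def indices(sh):
    """Index of each sample: number of 1 bits, excluding the self-comparison."""
    return [sum(bits) - 1 for bits in sh]


samples, sh = [], []
for value in [5, 3, 8, 1, 9, 4]:
    samples, sh = insert_sample(samples, sh, value, taps=5)
result = indices(sh)
```

For this input the window ends as 3, 8, 1, 9, 4 and `result` is `[1, 3, 0, 4, 2]`, that is, five distinct indices from 0 to TAP-1.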
A straightforward way to sum up the number of 1s in the shift registers is to use adder trees. For the new sample, this is the only solution, as the contents of its SH register can change arbitrarily between subsequent clock cycles. To calculate the sum of the SH[TAP-1 ... 1] bits, you can instead use an increment/decrement structure, taking advantage of the correlation between subsequent SH values. For small TAP numbers, adder trees are less complex. As the TAP number increases, the increment/decrement structure becomes more efficient.
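The difference between the two summing strategies can be illustrated with a small Python sketch. It assumes, for illustration only, that on each cycle one bit leaves the register and one bit enters it, so the count of 1s changes by at most one. The function names `full_count` and `shift_and_update` are hypothetical.

```python
def full_count(bits):
    """Adder-tree analogue: sum every bit from scratch."""
    return sum(bits)


def shift_and_update(bits, count, bit_in):
    """Increment/decrement analogue: adjust the running count by the
    difference between the bit that enters and the bit that leaves."""
    bit_out = bits[0]
    shifted = bits[1:] + [bit_in]
    return shifted, count - bit_out + bit_in


bits = [1, 0, 1, 1]
count = full_count(bits)  # computed in full only once, at start-up
for bit_in in [0, 1, 1, 0]:
    bits, count = shift_and_update(bits, count, bit_in)
    # The running count always agrees with a full recount.
    assert count == full_count(bits)
```

The full sum touches every bit on every cycle, whereas the running count only needs the two bits that change, which is why its cost grows more slowly with the register width.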
The performance of the 1D filter structure plays a key role in evaluating 2D implementation options. Table 1 summarizes the maximum operating frequency for different FPGAs and different TAP numbers.
Table 1. Filter core operating frequencies for different filter sizes and families.