Rank-order filtering is a non-linear filtering technique in which an element is selected from an ordered list of samples. Two-dimensional (2D) filtering is performed on the contents of a rectangular window that slides across an image. As the window moves by one pixel, a set of obsolete elements is discarded and a set of new elements is inserted into the filter window. The samples within the window are sorted, and the element with the specified rank is selected as the output. The most common ranks are median, minimum, and maximum.
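As a point of reference, the definition above can be written as a brute-force software model. This is only a minimal sketch for checking results, not the hardware architecture discussed in this article; the function name and the parameters `height`, `width`, and `rank` (0-based) are our own, and border pixels are simply skipped.

```python
def rank_filter_2d(image, height, width, rank):
    """Brute-force 2D rank filter over the interior of `image` (a list of rows).

    `rank` is 0-based: 0 selects the minimum, height*width - 1 the maximum,
    and (height*width) // 2 the median of an odd-sized window.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - height + 1):
        out_row = []
        for c in range(cols - width + 1):
            # Gather every sample under the window, sort, and pick the rank.
            window = [image[r + i][c + j]
                      for i in range(height) for j in range(width)]
            out_row.append(sorted(window)[rank])
        out.append(out_row)
    return out


# A single bright speck on a flat background.
speck = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 10, 10],
]
median = rank_filter_2d(speck, 3, 3, rank=4)
```

With a 3 x 3 window, the median (rank 4) removes the speck, while the maximum (rank 8) spreads it across the window.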
Compared to linear filters such as finite impulse response (FIR) or infinite impulse response (IIR) filters, rank filters can effectively remove specks while preserving edges. This makes them very useful for noise-removal and pre-processing applications. In this article, we'll present an architecture that lends itself well to area and performance trade-offs in high-performance FPGAs.
Earlier rank-filter implementations have not dealt with aspects of color image processing, and have predominantly classified a filter as 2D if it was able to generate a valid pixel output in a single clock cycle.
Bit-serial approaches provide the lowest complexity. Processing rates are not usually dependent on the number of new samples the filter can handle in a single clock cycle. Bit-serial approaches are non-recursive and consequently easy to pipeline, but require a large number of comparators. Filtering time scales with the input data width.
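To make the bit-serial idea concrete, here is a minimal software sketch of one well-known scheme that resolves the output one bit per step, starting at the most significant bit. This article does not say which bit-serial algorithm earlier implementations used, so treat the function below (and its name and arguments) as an illustrative assumption only. The loop runs once per data bit, which is why the processing time grows with the input data width.

```python
def bit_serial_rank(samples, rank, width):
    """Select the rank-th smallest (0-based) of unsigned `width`-bit samples.

    Illustrative model only: one loop iteration stands for one bit time.
    """
    LESS, UNDECIDED, GREATER = -1, 0, 1
    state = [UNDECIDED] * len(samples)
    result = 0
    for bit in range(width - 1, -1, -1):  # MSB first
        # Samples already known to be smaller count as 0, larger as 1.
        effective = []
        for sample, st in zip(samples, state):
            if st == LESS:
                effective.append(0)
            elif st == GREATER:
                effective.append(1)
            else:
                effective.append((sample >> bit) & 1)
        zeros = effective.count(0)
        # If more than `rank` samples show a 0, the selected sample has a 0 here.
        out_bit = 0 if rank < zeros else 1
        result = (result << 1) | out_bit
        # Undecided samples that disagree with the output bit are now resolved.
        for i, sample in enumerate(samples):
            if state[i] == UNDECIDED and ((sample >> bit) & 1) != out_bit:
                state[i] = GREATER if out_bit == 0 else LESS
    return result
```

For example, `bit_serial_rank([7, 2, 9, 4, 5], rank=2, width=4)` selects the median of the five 4-bit samples in four steps.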
Word-parallel architectures usually implement a sorting network that employs bubble sorting, odd/even merge sorting, or other schemes optimized for resource efficiency.
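A sorting network is a fixed arrangement of compare-exchange elements, which is what makes it attractive for hardware. The sketch below models the simplest bubble-sort-style network (odd-even transposition), not the more efficient odd/even merge network; the helper names are hypothetical. It is meant only to show how quickly the comparator count grows with the number of taps.

```python
def transposition_network(n):
    """Return the stages of an odd-even transposition network for n inputs.

    Each stage is a list of (i, j) pairs; all pairs in a stage are independent
    and could be evaluated by parallel comparators in the same clock cycle.
    """
    return [[(i, i + 1) for i in range(stage % 2, n - 1, 2)]
            for stage in range(n)]


def run_network(network, samples):
    """Apply every compare-exchange element in order and return sorted data."""
    data = list(samples)
    for stage in network:
        for i, j in stage:
            if data[i] > data[j]:
                data[i], data[j] = data[j], data[i]
    return data


def comparator_count(network):
    return sum(len(stage) for stage in network)


net9 = transposition_network(9)               # a 3 x 3 window has 9 taps
window = [10, 10, 10, 10, 255, 10, 12, 9, 11]
median = run_network(net9, window)[4]
```

In this model a 9-tap window needs 36 compare-exchange elements arranged in 9 stages, and the count grows as n(n-1)/2.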
Most rank-filter architectures store samples either in order of arrival (FIFO) or ordered by magnitude. Insert/delete architectures store samples ordered by magnitude. The oldest sample is discarded and the most recent input is inserted into the sorting structure at the appropriate location. These solutions require fewer comparators.
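The insert/delete principle can be modeled in a few lines of software. The class below is a sketch under our own naming (it is not taken from any particular implementation): it keeps one list in arrival order, so it knows which sample is the oldest, and one list ordered by magnitude, from which the output is read directly.

```python
from bisect import bisect_left, insort
from collections import deque


class InsertDeleteRankFilter:
    """1D rank filter that keeps its samples ordered by magnitude."""

    def __init__(self, taps, rank):
        self.taps = taps
        self.rank = rank          # 0-based rank of the output sample
        self.by_age = deque()     # arrival order, used to find the oldest sample
        self.by_magnitude = []    # the same samples, kept sorted

    def push(self, sample):
        """Insert one sample; return the output, or None while filling up."""
        if len(self.by_age) == self.taps:
            oldest = self.by_age.popleft()
            # Delete the obsolete sample from its place in the sorted list.
            del self.by_magnitude[bisect_left(self.by_magnitude, oldest)]
        self.by_age.append(sample)
        insort(self.by_magnitude, sample)  # insert at the appropriate location
        if len(self.by_age) < self.taps:
            return None
        return self.by_magnitude[self.rank]


median3 = InsertDeleteRankFilter(taps=3, rank=1)
outputs = [median3.push(s) for s in [5, 1, 9, 3, 7, 2]]
```

Because the list is always sorted, only the new sample has to be compared against the stored ones, which is the software analog of the lower comparator count.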
FIFO-based architectures dynamically calculate the location of the output sample. These architectures are easier to pipeline and adapt to multiple samples per clock cycle, which is essential for 2D filtering.
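One common way to calculate the location of the output sample, sketched here as an assumption rather than as the method of any specific design, is to count for every stored sample how many other samples sort before it. The sample whose count equals the requested rank is the output. Every count is independent of the others, which is what makes this style easy to pipeline and to widen.

```python
def fifo_rank_select(window, rank):
    """Pick the rank-th smallest (0-based) sample from a FIFO-ordered window.

    `window` is in arrival order and is never re-sorted; instead each sample's
    position in the sorted order is computed by pairwise comparisons.
    """
    n = len(window)
    for i in range(n):
        # Ties are broken by arrival order so that every position is unique.
        position = sum(
            1 for j in range(n)
            if window[j] < window[i] or (window[j] == window[i] and j < i)
        )
        if position == rank:
            return window[i]
    raise ValueError("rank must be in range(len(window))")
```

In hardware, the inner comparisons would run in parallel; the sequential loops here only model the arithmetic.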
The Xilinx 2D Rank-Filter Architecture
Other than larger tap numbers, the most important difference between 2D and 1D rank filtering is that multiple input samples must be inserted into the 2D filter core to generate one new output sample. Figure 1 shows the sliding filter window, whose height in pixels we'll call WV, moving across the input image from left to right. The smaller gray squares represent pixels, while the darker squares illustrate new samples.
Figure 1. 2D filtering window.
You can trivially extend 1D filters for 2D use by operating the filter at WV times the pixel clock, reading a new input pixel every clock cycle but generating a valid output pixel only once every WV clock cycles. Depending on the filter size and the targeted FPGA technology, this solution is viable for a wide range of applications. You can filter windows as high as 3 pixels on most FPGAs and as high as 5 pixels on Xilinx Virtex-4 devices or Virtex-5 devices using pure 1D architectures for 75-MHz 1080p HDTV applications.
If pixel clock frequencies are prohibitively high, parallel instances of certain key components of the filter may accept WV new input samples every clock cycle.
Hybrid solutions spanning fully parallel (WV input samples per clock cycle) and word serial (one input sample per clock cycle) allow you to tune the filter core to the maximum clock frequency allowed by the target chip while minimizing resource counts.
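The trade-off can be put into numbers with a simple model. In the sketch below we assume that a core accepting P input samples per clock cycle needs ceil(WV / P) cycles per output pixel, so it must run at that multiple of the pixel clock. The function name and the 250-MHz limit in the example are hypothetical; substitute the clock rate your target device and design can actually reach.

```python
import math


def min_parallel_inputs(wv, pixel_clock_mhz, max_core_clock_mhz):
    """Smallest number of samples per clock that meets the clock limit.

    Returns (samples_per_clock, core_clock_mhz), or None if even the fully
    parallel core (WV samples per clock) would exceed the limit.
    """
    for p in range(1, wv + 1):
        cycles_per_pixel = math.ceil(wv / p)
        core_clock = cycles_per_pixel * pixel_clock_mhz
        if core_clock <= max_core_clock_mhz:
            return p, core_clock
    return None


# 75-MHz pixel clock, assumed 250-MHz ceiling for the filter core.
word_serial = min_parallel_inputs(3, 75, 250)   # pure 1D core is enough
hybrid = min_parallel_inputs(5, 75, 250)        # needs two samples per clock
```

Under these assumptions, a 3-pixel-high window fits a word-serial core running at 225 MHz, while a 5-pixel-high window would need 375 MHz word-serially and therefore moves to a hybrid core taking two samples per clock at 225 MHz.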
Basic Filter Architecture
The architecture illustrated in Figure 2 comprises five main components. The line buffer stores WV-1 lines of the input frame. If required, the filter value generator (FVG) computes magnitude values (such as luminance) for filtering. The delay line block stores full pixel information (such as RGB) for the pixels currently being processed by the filter core, which carries out the actual rank filtering. The control block generates optional data switching, masking, and output valid signals.
Figure 2. Filter architecture.
For typical resolutions and filter sizes, you can implement the line buffer using block RAMs internal to the target FPGA.
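A behavioral model helps show what the line buffer has to deliver: for every incoming pixel, a column of WV vertically adjacent pixels. The class below is a sketch with hypothetical names, using Python lists in place of block RAM and zero as the assumed initial content; it stores WV-1 lines, as described above.

```python
class LineBuffer:
    """Stores WV-1 previous lines; emits one WV-pixel column per input pixel."""

    def __init__(self, wv, line_width):
        # One storage row per buffered line, oldest line first.
        self.lines = [[0] * line_width for _ in range(wv - 1)]
        self.line_width = line_width
        self.col = 0

    def push(self, pixel):
        """Accept the next pixel in raster order and return the column above it."""
        column = [line[self.col] for line in self.lines] + [pixel]
        # Age the stored pixels at this column by one line.
        for k in range(len(self.lines) - 1):
            self.lines[k][self.col] = self.lines[k + 1][self.col]
        if self.lines:
            self.lines[-1][self.col] = pixel
        self.col = (self.col + 1) % self.line_width
        return column


buf = LineBuffer(wv=3, line_width=2)
columns = [buf.push(p) for p in [1, 2, 3, 4, 5, 6]]  # three lines of two pixels
```

Once two full lines have been buffered, each pushed pixel returns the column made of that pixel and the two pixels directly above it, oldest first.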
Color frames are often filtered using a function of the RGB values—typically luminance—rather than the full RGB information. Color information (corresponding to the luminance values processed by the filter core) is stored in a FIFO. Let's designate the variable TAP to mean the size of the filter core; at any given time the number of pixels the FIFO stores is the sum of TAP and the latency of the filter core. The SRL16 primitives in Xilinx FPGAs offer an efficient way to implement this addressable FIFO.
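The sketch below illustrates the idea in software: the rank is decided on a luminance value, and the full RGB pixel belonging to the selected luminance is what leaves the filter. The integer luminance weights (77, 150, 29, divided by 256) are a common approximation that we assume here for illustration, and all function names are ours; the article does not prescribe a particular formula.

```python
def luma(rgb):
    """Approximate 8-bit luminance from an (R, G, B) tuple (assumed weights)."""
    r, g, b = rgb
    return (77 * r + 150 * g + 29 * b) >> 8


def rank_filter_color(window_rgb, rank):
    """Return the RGB pixel whose luminance has the given 0-based rank."""
    keys = [luma(p) for p in window_rgb]
    # sorted() is stable, so equal luminances keep their arrival order.
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    # The index plays the role of the address into the color delay line.
    return window_rgb[order[rank]]


def delay_line_depth(tap, core_latency):
    """Pixels the color FIFO must hold: filter size plus core latency."""
    return tap + core_latency


window = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
picked = rank_filter_color(window, rank=1)
```

Here the median luminance belongs to the red pixel, so the red pixel's full color is output. For a 9-tap core with an assumed latency of 4 clock cycles, `delay_line_depth(9, 4)` gives a FIFO depth of 13 pixels.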