Using dual-port memories ("dual ports") as system interconnects has proven to be an effective interface strategy for bridging multiple processing elements in high-performance applications.
Not only do dual-ports offer high-bandwidth communication between processors, they also provide the flexibility that is often required in fast-evolving design environments. Different dual-port implementations have emerged in recent years, and system designers now have the option of using a traditional dual-port or integrating the dual-port into an onboard FPGA. Newer FPGA families offer internal memory blocks that can be configured as dual-ports, and major FPGA vendors often market their FPGA dual-port as "free integrated memory." However, this is not entirely accurate, as the performance of these dual-ports is highly dependent on device utilization and on how the memory blocks are instantiated. Moreover, overall system cost is a critical factor in the decision-making process for dual-port implementation.
This article evaluates the validity of FPGA vendors' claims by taking the reader through a recent benchmarking effort on integrated dual-ports in FPGAs. Five popular FPGA families are examined and the performance of their integrated dual-ports is benchmarked against an external dual-port implementation.
As the demand for processing power increases in high-performance applications, using multiple processors has become the inevitable choice for many of today's designs. The immediate problem that emerges with a dual (or more) processor architecture is how these processors communicate with one another. In systems where the processors operate independently, one of the proven approaches is to use a dual-port interconnect. Not only do dual-ports offer high-bandwidth communication between processors, they also provide the flexibility that is often required in fast-evolving design environments.
Different dual-port implementations have emerged in recent years, and system designers now have the option of using a discrete dual-port or a dual-port integrated into an FPGA. Newer FPGA families offer internal memory blocks that can be configured as dual-ports. Major FPGA vendors often market their FPGA dual-port as "free integrated memory," although the performance of these dual-ports is highly dependent on device utilization and on how the memory blocks are instantiated. Moreover, FPGA devices offer only a limited amount of configurable memory, and it does not make economic sense to move to a larger FPGA just to integrate more dual-port.
The rest of this article evaluates dual-port approaches based on a recent benchmarking effort of integrated dual-ports using different FPGA families. Three industry-leading high-performance and two low-cost FPGA families have been examined and analyzed, and the performance of the integrated dual-ports is compared with a discrete dual-port implementation.
The decision to use one interconnect solution over another is usually determined by the single most important factor: performance. Different types of dual-port implementation also affect the available bandwidth between processors. In the benchmarking effort, the performance of an integrated dual-port is compared with a discrete dual-port. The two implementations used for benchmarking are shown in Figure 1.
In this illustration, the block diagram on the left represents an integrated dual-port architecture, where port A is the interface between the dual-port memory controller and the integrated dual-port inside the FPGA. Port B of the dual-port interfaces to an external processing element. The block diagram on the right shows a discrete dual-port, with its port A talking to the FPGA memory controller and port B interfacing to an external processing element.
The memory controller in the FPGA supports simple functions such as single write, read, and data compare. No additional logic is loaded into the FPGA, which gives synthesis and place and route (PAR) the most freedom and in turn provides the best-case scenario for the performance benchmarking. The results take into account the internal FPGA operating frequency as well as I/O-level timing, and every setup and hold time is verified to be met.
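To make the controller's role concrete, here is a minimal behavioral sketch of a dual-port memory with a write, read, and compare sequence. It is only an illustration of the functions named above, not the benchmark's actual controller logic; the class name `DualPortMemory`, the helper `write_read_compare`, and the port labels as Python strings are assumptions made for this example.

```python
class DualPortMemory:
    """Behavioral model: one storage array reachable from two ports (A and B).

    Hypothetical illustration only; timing and port arbitration are not modeled.
    """

    def __init__(self, depth: int, width: int) -> None:
        self.depth = depth
        self.mask = (1 << width) - 1      # keeps data within the port width
        self.cells = [0] * depth

    def _check(self, port: str, addr: int) -> None:
        if port not in ("A", "B"):
            raise ValueError(f"unknown port: {port}")
        if not 0 <= addr < self.depth:
            raise IndexError(f"address out of range: {addr}")

    def write(self, port: str, addr: int, data: int) -> None:
        self._check(port, addr)
        self.cells[addr] = data & self.mask

    def read(self, port: str, addr: int) -> int:
        self._check(port, addr)
        return self.cells[addr]


def write_read_compare(mem: DualPortMemory, addr: int, data: int,
                       port: str = "A") -> bool:
    """Single write, single read, then data compare through one port."""
    mem.write(port, addr, data)
    return mem.read(port, addr) == (data & mem.mask)
```

In this sketch the controller exercises port A, while a processing element on port B would see the same stored word, for example `mem.read("B", addr)` after the controller's write.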
As mentioned earlier, the scope of the benchmarking exercise includes both high-performance and low-cost FPGAs. Specifically, the cost-saving FPGAs are smaller devices, in terms of logic count, and can typically integrate less than 2 Mb of true dual-port.
These devices target low-end, high-volume applications, where the average selling price (ASP) is relatively low. On the other hand, the high-performance FPGAs have high logic count and can support up to 10 Mb of integrated dual-port. The ASP of these FPGAs is usually high, and they are mostly used for product prototypes and low-volume applications. The different FPGA families studied are summarized in Table 1. The FPGA vendors and family names are not explicitly disclosed. The three high-performance FPGA families are referred to as HP-A, HP-B and HP-3, while the two low-cost families are referred to as LC-A and LC-B.
To obtain consistent results across every device in the same FPGA family, each FPGA device in a selected family was benchmarked individually. Each FPGA device was configured at different dual-port densities and widths. If an FPGA could integrate up to 2 Mb of dual-port, for example, that FPGA was benchmarked at a variety of dual-port density and width configurations, ranging from the smallest dual-port in the given FPGA to the largest. The widths include x9, x18, x36 and x72. From these runs, a performance versus integration density trend can be extracted. The report file contains post-synthesis and place-and-route performance for each benchmarking case, and all runs were optimized with medium effort. The tools used to perform the analysis are summarized in Table 2.
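As a rough illustration of what such a sweep looks like, the sketch below enumerates density and width combinations and derives the memory depth for each. The widths come from the text; the doubling step between densities and the function name `sweep_configs` are assumptions for this example, since the article does not state how the intermediate densities were chosen.

```python
WIDTHS = (9, 18, 36, 72)  # port widths named in the article


def sweep_configs(min_bits: int, max_bits: int, widths=WIDTHS):
    """Return (density_bits, width, depth) tuples for a benchmark sweep.

    Densities double from min_bits up to max_bits (an assumed step rule).
    A combination is kept only if the width divides the density evenly.
    """
    configs = []
    density = min_bits
    while density <= max_bits:
        for width in widths:
            if density % width == 0:
                configs.append((density, width, density // width))
        density *= 2
    return configs
```

For instance, `sweep_configs(18 * 1024, 4 * 18 * 1024)` yields three densities at four widths each, starting with an 18,432-bit memory organized as 2,048 words of 9 bits.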
Let’s first take a look at the cost-saving FPGAs; Table 3 summarizes the largest device of each family.
The performance versus dual-port integration density of the cost-saving FPGA family, LC-A, is shown in Figure 2. The horizontal axis shows the different integration densities in Mb, while the vertical axis denotes the performance of the dual-port in MHz. As shown in the graph, the maximum available dual-port is 1.82 Mb, and the performance achieved at this density is only around 50 MHz.
As the curves suggest, integrated dual-port performance depends heavily on the memory configuration: only small integration densities can achieve a high interface speed (>166 MHz), and performance degrades drastically as integration density increases.
These cost-saving FPGAs construct their dual-port memory with small memory blocks that are 18 kb in size. Thus, in order to construct a 1.82 Mb dual-port, over one hundred blocks need to be cascaded together, consuming a considerable portion of the available routing resources. This adds to the signal path delays; therefore, the performance of the dual-port memory suffers.
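The block count follows from simple division. The short sketch below works it through; the function name `blocks_needed` is hypothetical, and because the article does not say whether 1 Mb means 1,024 kb or 1,000 kb, that factor is left as a parameter.

```python
import math


def blocks_needed(density_mb: float, block_kb: int = 18,
                  kb_per_mb: int = 1024) -> int:
    """Number of fixed-size memory blocks required to build one dual-port.

    kb_per_mb is an assumption; pass 1000 for decimal megabits.
    """
    return math.ceil(density_mb * kb_per_mb / block_kb)


print(blocks_needed(1.82))                  # binary megabits
print(blocks_needed(1.82, kb_per_mb=1000))  # decimal megabits
```

Under either convention the result is just over one hundred blocks, which is consistent with the figure quoted above.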
Conversely, a discrete dual-port implementation can achieve a consistent 250 MHz performance, independent of the dual-port size.
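To put these clock rates in terms of bandwidth, a first-order estimate multiplies the port width by the clock frequency. The sketch below assumes one transfer per clock cycle per port and uses a x36 port purely as an example; the article does not state which width corresponds to the frequencies quoted, and the function name `peak_bandwidth_mbps` is hypothetical.

```python
def peak_bandwidth_mbps(width_bits: int, clock_mhz: float) -> float:
    """Peak per-port bandwidth in Mbit/s, assuming one transfer per clock."""
    return width_bits * clock_mhz


discrete = peak_bandwidth_mbps(36, 250)    # discrete dual-port at 250 MHz
integrated = peak_bandwidth_mbps(36, 50)   # integrated dual-port at about 50 MHz
print(discrete, integrated, discrete / integrated)
```

Under these assumptions the frequency gap translates directly into a fivefold difference in peak bandwidth per port.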
Three high-performance families were also benchmarked, and the largest device in each of the three families is summarized in
Similar to the trend observed in the cost-saving families, the performance of an integrated dual-port in high-performance FPGAs also depends heavily on the memory configuration. The performance versus integration density for one of the popular high-performance FPGA families, HP-A, is shown in Figure 3. As this figure illustrates, integrated dual-port performance again degrades as integration density increases. At densities under 1 Mb, the integrated dual-port is fast, running at over 150 MHz. At higher densities, however, it runs at a little over 70 MHz. Again, a discrete dual-port runs at its maximum speed of 250 MHz, independent of density.
It is important to emphasize that the best-case scenario was analyzed, as there was no other logic inside the FPGA. Including other logic will only make synthesis and PAR of the FPGA more difficult and will more than likely degrade performance further.
Apart from the memory performance, cost is also an important factor to consider when choosing a dual-port interface architecture.
The cost of each memory bit in an FPGA typically increases exponentially when moving to larger FPGAs, compared to a more linear rise for discrete memories. A sample cost structure (in 10K units) for popular cost-saving FPGAs is shown in Figure 4. A similar cost structure for high-performance FPGA families is shown in Figure 5.
As these graphs illustrate, the cost of FPGAs increases disproportionately to the amount of available integrated memory. For example, a cost-saving FPGA with 0.5 Mb of dual-port costs approximately $40, while a device with 1.82 Mb of memory costs around $250. That is over six times the cost for less than four times the memory density. Similarly, a high-performance FPGA with approximately 4 Mb of memory costs $1000, while an FPGA with approximately 8 Mb costs over $3000. Thus, increasing the size of the FPGA to integrate more memory is cost-ineffective. It is always more efficient to use a discrete dual-port implementation if the logic and I/O requirements can be satisfied with a small FPGA.
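The arithmetic behind this comparison can be checked in a few lines. The sketch below uses only the approximate prices and densities quoted above; the helper name `cost_per_mb` is hypothetical, and the figures are rounded the same way the article rounds them.

```python
def cost_per_mb(price_usd: float, density_mb: float) -> float:
    """Device price divided by integrated dual-port density, in $/Mb."""
    return price_usd / density_mb


# Cost-saving family: 0.5 Mb at about $40 versus 1.82 Mb at about $250
print(250 / 40)                 # cost ratio
print(1.82 / 0.5)               # density ratio
print(cost_per_mb(40, 0.5))     # $/Mb for the small device
print(cost_per_mb(250, 1.82))   # $/Mb for the large device

# High-performance family: about 4 Mb at $1000 versus about 8 Mb at $3000
print(cost_per_mb(1000, 4), cost_per_mb(3000, 8))
```

Because the larger high-performance device costs over $3000, its cost per megabit is at least the value computed here. In both families the price per megabit rises rather than falls as the device grows.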
When evaluating whether to implement a dual-port interface in an FPGA or as a discrete device, developers must consider the trade-off between dual-port performance and integration density. A decision chart is shown in Figure 6, where the horizontal axis denotes the dual-port density requirement in Mb and the vertical axis denotes FPGA cost or the size of the FPGA. The white line is the hard cut-off for the available integrated dual-port in an FPGA, as the largest FPGA offers up to 10 Mb of dual-port. Anything to the right of the white line (green region) represents memory densities beyond what an FPGA can integrate; therefore, the designer must use a discrete dual-port in this region. The red region shows small dual-port densities that can be easily integrated while still achieving sufficient performance. The region in question is the grey area, where the right choice depends heavily on the performance the application requires. That is, it may still make sense to integrate the dual-port so long as the application does not have a high performance requirement. As the dual-port density requirement increases, however, it makes less sense to use an expensive FPGA to integrate all the dual-port, given the severe loss in performance coupled with the uneconomical cost.
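The regions of the decision chart can be expressed as a small rule of thumb. The sketch below is only an approximation of the chart: the 10 Mb cut-off comes from the text, but the boundary of the "easily integrated" region (`small_mb`, set here to 1 Mb) and the function name `classify_dual_port` are assumptions for illustration, since the chart's exact boundaries are not given.

```python
def classify_dual_port(density_mb: float, small_mb: float = 1.0,
                       max_integrated_mb: float = 10.0) -> str:
    """Map a dual-port density requirement to a region of the decision chart.

    small_mb is a hypothetical threshold; max_integrated_mb is the hard
    cut-off quoted in the article for the largest FPGA.
    """
    if density_mb > max_integrated_mb:
        return "discrete"               # beyond what any FPGA can integrate
    if density_mb <= small_mb:
        return "integrated"             # small, easily integrated densities
    return "application-dependent"      # grey area: weigh speed and cost


for need in (0.5, 4.0, 16.0):
    print(need, classify_dual_port(need))
```

A real selection would also weigh the required interface speed and the device cost inside the grey area, which this sketch deliberately leaves to the designer.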
About the author
Danny Tseng is a senior applications engineer in the System Interconnect group at Cypress Semiconductor.