United Business Media EE Times



ASIC Block-Based Design Methodology for Fine-Grained FPGAs

The migration of ASIC design flows is feeding the demand for FPGA-type designs.

By Hichem Belhadj, Paresh Patel


The increasing gate count, speed, and sophistication of architectures offered by FPGA vendors have enabled designers to create complex, high-speed designs using programmable technologies. However, implementing heavily bused designs that regularly operate at frequencies over 100 MHz can contribute to a number of problems at various stages in the development.

Here we present several aspects of design development that can help to cope with these issues and to achieve higher productivity, as well as reduced design costs. The principal cornerstone lies in the definition of a clear, block-based, ASIC-like methodology that helps to avoid lengthy iterations and addresses the considerable challenges of system-level integration. Another element of the solution relates to software capabilities that ensure full consideration of the designer's architectural decisions in addition to efficient debug and validation of the design. Note that we distinguish here between FPGA, ASIC, block, and system designers.

We have used as an example a tool suite that bridges the gap between FPGA and ASIC design, ASICmaster FPGA design suite for the Flash-based ProASIC product, which has ASIC-like silicon architecture and design methodology.

Design issues and barriers

ASIC designers face several barriers when targeting programmable technologies. They need specific tools and a specific way to approach the design, and they must make drastic changes in their design practices. In addition, they often face a steep and costly learning curve before they can use the FPGA capabilities efficiently and obtain the best possible performance.

FPGA designers need to think in terms of resources. They often suffer through a long cycle of iterations in order to achieve targeted performance, and this struggle prevents them from planning ahead and from considering design reuse. As a result, they often end up adopting one-shot design practices. Clearly, the main bottlenecks for both ASIC and FPGA designers are timing convergence, the budgeting of FPGA resources, and the lack of control over both synthesis and place-and-route tools.

A block-based methodology should allow ASIC designers to continue using familiar ASIC design methodology and tools when tackling design complexity, timing, and power issues. In addition, using such methodologies, FPGA designers can more easily structure projects and objectives while reducing overall development and design costs. A block-based methodology links the various design activities and tools, but separates the concerns at the architecture, block, and integration levels of the entire design.

The proposed methodology offers a structured way to consider and solve problems related to complexity management, timing convergence, power budgets, and tool shortcomings. It maps general guidelines for reducing redundancy and increasing the sharing of well-designed blocks across a project's design groups. It also stimulates design reuse and the process of designing for reuse. The outcome is an easy and fast derivation of new versions of the top-level design.

We focus here on general aspects of the strategy with an emphasis on timing issues. We will also demonstrate the higher level of integration required between system-level design and block synthesis, as well as the integration between synthesis and place and route.

Principles

The block-based approach addresses the source of timing and budgeting issues at the block level, which reduces or eliminates these concerns in top-level design. The methodology has evolved into three phases.

Figure 1 - Separating the wheat from the chaff
By separating the processing of the overall system and the blocks, the proposed design approach distinguishes between system architecture, block design, and block integration activities.

The first phase considers the hierarchical blocks individually (see Figure 1). For each individual hierarchical block, an initial synthesis is performed using a default hierarchical wire load model (HWLM). The floorplanning and place-and-route results help generate a more accurate custom wire load model (CWLM). The CWLM is later used for a second synthesis pass to generate a more efficient netlist and a block place and route with more stable timing. Notice that the floorplanning provided by the place-and-route tool is based on a combination of placement and routing constraints. The constraints imply a definition of the mobility associated with the block.

The second phase loads all the netlists of the previously processed blocks with their associated post-layout SDF files. After timing characterization, a final synthesis of the RTL code uses the extracted input and output delays and the CWLMs generated in the first phase. This yields more efficient, stable, and accurate final timing results for each block.

Finally, the last phase updates the top-level design and performs an incremental compile, an incremental place and route, and a final static-timing analysis.
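The three phases can be pictured as a simple orchestration script. The following is only a minimal sketch of the ordering described above: the class, the function names, and the step strings are hypothetical placeholders, not commands of ASICmaster or of any synthesis tool.

```python
from dataclasses import dataclass, field


@dataclass
class BlockState:
    """Hypothetical record of what has been run for one hierarchical block."""
    name: str
    wire_load_model: str = "HWLM"  # default hierarchical wire load model
    steps: list = field(default_factory=list)


def phase1_characterize(block: BlockState) -> None:
    """Phase 1: synthesize with the default HWLM, lay out, then resynthesize with a CWLM."""
    block.steps.append(f"synthesize({block.wire_load_model})")
    block.steps.append("floorplan+place_and_route")
    block.wire_load_model = "CWLM"  # custom model derived from the layout results
    block.steps.append(f"synthesize({block.wire_load_model})")
    block.steps.append("place_and_route")


def phase2_final_synthesis(blocks: list) -> None:
    """Phase 2: load netlists with post-layout SDF, characterize timing, run final synthesis."""
    for block in blocks:
        block.steps.append("load_netlist+post_layout_sdf")
        block.steps.append("timing_characterization")
        block.steps.append("final_synthesis(io_delays, CWLM)")


def phase3_top_level(blocks: list) -> list:
    """Phase 3: update the top level, then incremental compile, place and route, and STA."""
    names = ", ".join(b.name for b in blocks)
    return [
        f"update_top_level({names})",
        "incremental_compile",
        "incremental_place_and_route",
        "static_timing_analysis",
    ]


def run_flow(names: list):
    """Run the three phases in order and return the per-block and top-level step lists."""
    blocks = [BlockState(n) for n in names]
    for b in blocks:
        phase1_characterize(b)
    phase2_final_synthesis(blocks)
    return blocks, phase3_top_level(blocks)


if __name__ == "__main__":
    blocks, top = run_flow(["rx_channel", "tx_channel"])
    for b in blocks:
        print(b.name, b.steps)
    print("top", top)
```

The point of the sketch is the ordering: each block is characterized on its own before any top-level step runs, so the top level only sees blocks whose wire load model already reflects real layout data.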

Based on this flow, a block is defined as an object that may be an RTL code with associated attributes, such as gate-level netlist, floorplan, stable timing shell, power budget, resource utilization ratios, etc.
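One way to picture this definition is as a record whose optional attributes are filled in as the block moves through the flow. The sketch below is illustrative only: the field names, file names, and the "integration ready" check are assumptions made for the example, not part of any tool.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Block:
    """A design block: RTL code plus the attributes gathered during the flow."""
    rtl_source: str                          # path to the RTL code (hypothetical)
    netlist: Optional[str] = None            # gate-level netlist
    floorplan: Optional[str] = None          # floorplan description
    timing_shell: Optional[str] = None       # stable timing shell
    power_budget_mw: Optional[float] = None  # power budget, in milliwatts
    utilization: Optional[float] = None      # resource utilization ratio, 0.0 to 1.0

    def missing_attributes(self) -> list:
        """Names of the attributes that have not been produced yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

    def is_integration_ready(self) -> bool:
        """Assumed rule for this sketch: every attribute must be present."""
        return not self.missing_attributes()


if __name__ == "__main__":
    block = Block(rtl_source="rx_channel.v")
    print(block.missing_attributes())
```

A freshly written block carries only its RTL; an integrator could use a check like `is_integration_ready()` to refuse blocks that have not yet been characterized.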

The main goal of the system architect is to define an optimal partitioning of the implementation. The main areas of interest at this level include overall system performance, cost, testability, and interaction between sub-systems or blocks. While abstraction is recommended at this stage, the general design concept must be informative enough to minimize or even eliminate late design changes. Architects can then extract the meaningful information from the functional requirements of current and future projects. They must deliver clear instructions regarding the different feature sets and configurations that may be used in the future. The architects' instructions must always include clear statements about the assumptions and intent as well as quality assurance guidelines.

Critical design decisions are imperative at this early stage, and considerable attention needs to be given to interface synthesis and inter-block communication protocols. Based on our experience, we recommend that point-to-point signals be adopted rather than tri-states, that channel-based rather than bus-based communication be implemented, and that customizable block interfaces, configurable clock speed, and feature sets be precisely defined.

Blocks versus systems

The block designer builds a library of validated and integration-friendly blocks. The objective is to make them generic and implement them efficiently. Since RTL HDL code quality is also a primary concern, it is recommended that designers adopt a coding style suitable for reuse. While maintaining the implementation quality of results, the end goal is to eliminate the need to redesign the block to fit the system requirements and late changes. In other words, designers should not compromise guarantees of performance and predictability, or the ability to parameterize blocks and top-level design.

The primary concerns of integrators remain architectural considerations and the validation of overall design functions. When timing issues arise, the integrators' role may be extended to include the identification of bottlenecks and their resolution. To achieve resolution, integrators have several choices that depend on the type of timing issues. They may adopt either an incremental or a re-optimization approach, depending on the difficulty faced and the margins available. An early study of the blocks (giving an indication of design congestion), together with an early prediction of what the place-and-route tool can achieve, will enable integrators to fully explore alternative solutions.

If the architectural partitioning is done well at the block level, the blocks have a manageable complexity that favors incremental refinement and reduces the design time lost to late engineering changes. The block designer can then thoroughly investigate the solution space and select the most stable and efficient implementations. This investigation may include achieving certain objectives, such as balancing timing performance, power dissipation, and testability.

At the integration level, integrators worry less about the blocks because they are validated and all the complexity, performance, and power dissipation attributes are known. Integrators have an easier task when balancing competing design constraints. If the place-and-route tool supports certain capabilities, the timing and functional validations are straightforward. In the power arena, the system designer can then implement an overall power control system that turns on and off clocking domains of exclusively active hierarchical blocks.

For the whole design team, the evidence of reuse advantages certainly creates the incentive to negotiate economic and technical barriers. Even if implementing such a methodology looks painful at first, it is quite beneficial in the long run, especially in terms of conserving resources and saving time.

The block methodology is applicable if it supports tight coupling between specification, analysis, synthesis, and layout activities across multiple levels of representations. However, some other conditions need to be met as well. These conditions concern the target architecture and the tool capabilities.

Figure 2 - Seeds of change
The architecture of the ProASIC Logic Tile shows that each tile can implement a 3-input combinational cell, a latch, or a register interchangeably.

Target architecture

Well-established ASIC design tools tend to provide better capabilities to characterize and freeze the timing of the blocks. In our experience, these tools are more effective in fine-grained ASIC-like architectures. The Actel ProASIC Flash-based reprogrammable technology is an appropriate candidate for this discussion, as its logic cells are 3-input, 1-output tiles (see Figure 2).

A second architectural aspect that contributes to the support of the design process is the homogeneity of logic and routing resources in the die and among all the devices within one family. The ProASIC routing architecture is hierarchical and is distributed for two of the four types of routing, namely the global network and the high-speed buses (see Figure 3).

Figure 3 - Global exchange
Global network and high-speed buses use routing resources differently.

In addition, several systems (networking applications, in particular) require a large number of external and internal clocks. Design challenges, such as skews and hold-time violations, arise as a result. The global network can be split into spines, which makes the spines abundantly available and frees designers from needing to budget them (see Figure 4).

Tool capabilities

To consolidate the block-based approach, ASICmaster offers a macro that allows the user to specify placement and routing resource constraints for individual cells or blocks. A macro can be hard, firm or soft, depending on when and how it is defined.

The macro is soft when defined after placement. The user can move, rotate, and flip the macro up/down, right/left. In cases where the macro is defined after routing, it implies a routing floorplan and involves different types of routing resources. In such cases, the macro can be either hard or firm. A hard macro is used if it involves specific routing resources such as a global network, or if it places FIFOs or RAMs in a specific coordinate of the die. Elsewhere, it is called a firm macro and the user can move and flip it to satisfy design requirements.
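The soft, firm, and hard distinction amounts to a small decision rule. The sketch below encodes that rule as described in the text; the function and parameter names are hypothetical, and the empty set of moves for a hard macro is an assumption (the text only says what soft and firm macros allow).

```python
from enum import Enum


class MacroKind(Enum):
    SOFT = "soft"
    FIRM = "firm"
    HARD = "hard"


def classify_macro(defined_after_routing: bool,
                   uses_specific_routing: bool = False,
                   pins_memory_location: bool = False) -> MacroKind:
    """Classify a macro by when it was defined and which resources it fixes.

    defined_after_routing: False means the macro was defined after placement only.
    uses_specific_routing: the macro involves specific routing resources,
        such as a global network.
    pins_memory_location: the macro places FIFOs or RAMs at a specific die coordinate.
    """
    if not defined_after_routing:
        return MacroKind.SOFT
    if uses_specific_routing or pins_memory_location:
        return MacroKind.HARD
    return MacroKind.FIRM


# Transformations the user may apply to each kind. The hard-macro entry is an
# assumption of this sketch: a macro tied to specific resources is treated as fixed.
ALLOWED_MOVES = {
    MacroKind.SOFT: {"move", "rotate", "flip"},
    MacroKind.FIRM: {"move", "flip"},
    MacroKind.HARD: set(),
}


if __name__ == "__main__":
    kind = classify_macro(defined_after_routing=True, uses_specific_routing=True)
    print(kind.value, sorted(ALLOWED_MOVES[kind]))
```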

Figure 4 - Spine-splitting work
A networking application using 14 external clocks is split into spines.

ASICmaster also allows for exploration of several floorplan alternatives while refining the selected floorplan with a guaranteed timing convergence. This allows block designers to better characterize the block at the synthesis level, improve performance, and quickly close the timing loop. Moreover, the architect can then integrate one or several instances of the block at the top level or the higher hierarchical level, knowing that the place-and-route tools will ensure appropriate placement and effective timing.

This macro feature has been utilized in various networking applications that implement a large number of channels with several internal and external clocks for each block. The experimental results indicate that all the attributes associated with a particular macro or block have been fully preserved when these blocks have been integrated in larger designs. Interestingly enough, these performance and placement attributes are preserved for all instances of these macros (illustrated in Figure 5). For validation purposes, the gate-level netlist must be preserved in the layout tool so that all the original hierarchical boundaries, register, and net names are guaranteed to be in the gate-level netlist (see Figure 5).

Tight tool coupling

Under the traditional method of circuit design, engineers perform synthesis separately from place and route, and they do not figure out the routing delays until after the layout. Consequently, iterations between code tweaking, resynthesis, and place and route cause considerable schedule delays. The block methodology reduces these iterations and alerts the project manager and the block designer early on in the process. The newest generation of physical synthesis tools may help to solve this problem. Even if this type of tool is available for FPGAs, however, the design method proposed here dramatically reduces the risk. In the case of ProASIC, the link between synthesis and place and route works in both directions. The post synthesis SDF-path constraints, as well as the list of critical nets, enable ASICmaster to then perform timing-driven place and route. For further design re-optimization, the post layout SDF file is fed into synthesis (see Figure 2).

In designing a large block-based FPGA, it's the ability to perform block-based timing verification and then to perform complete chip verification that is most important. In case of verification failure or large positive margins, the synthesis scripts of certain blocks may be revisited.
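The review step described here can be sketched as a filter over per-block timing slack (required time minus arrival time, so negative slack means a failure). The function name, the block names, and the 2.0 ns threshold for a "large" margin are hypothetical values chosen for illustration.

```python
def review_blocks(slack_ns: dict, large_margin_ns: float = 2.0) -> dict:
    """Return the blocks whose synthesis scripts may be worth revisiting.

    slack_ns maps a block name to its worst-case slack in nanoseconds.
    A negative slack is a verification failure. A slack above large_margin_ns
    suggests the block was over-constrained and could trade speed for area
    or power.
    """
    revisit = {}
    for name, slack in slack_ns.items():
        if slack < 0:
            revisit[name] = "timing failure"
        elif slack > large_margin_ns:
            revisit[name] = "large positive margin"
    return revisit


if __name__ == "__main__":
    print(review_blocks({"rx": -0.4, "tx": 0.3, "fifo": 3.1}))
```

Blocks that meet timing with a modest margin are left alone, which keeps their placement and timing attributes frozen for the next integration pass.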

Figure 5 - Preservation with tools
The macro feature preserves performance and placement attributes.

In the power arena, ASICmaster offers a power estimator that, based on various parameters such as block complexity, number of embedded memory blocks, estimated switching activity, pad types and loads, returns a good estimation of the power dissipation. Based on these estimates for the different blocks, synthesis scripts can be tuned towards an optimization of the factors that contribute to the power required. The architect/integrator may then revisit the targeted frequencies, or implement power control logic that switches on and off some of the blocks at top-level design.
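To make the role of switching activity and clock frequency concrete, the sketch below uses the standard first-order CMOS dynamic power relation, P = a * C * V^2 * f. This is a textbook approximation, not the model used by the ASICmaster power estimator, and all block names and numeric values are hypothetical.

```python
def dynamic_power_mw(switching_activity: float, capacitance_pf: float,
                     vdd_v: float, freq_mhz: float) -> float:
    """First-order dynamic power of one block, in milliwatts.

    pF * V^2 * MHz yields microwatts (1e-12 * 1e6 = 1e-6 W), so the result
    is divided by 1000 to express it in milliwatts.
    """
    return switching_activity * capacitance_pf * vdd_v ** 2 * freq_mhz / 1000.0


def total_power_mw(blocks: dict, active: set) -> float:
    """Sum the power of the blocks whose clock domains are switched on."""
    return sum(dynamic_power_mw(**params)
               for name, params in blocks.items() if name in active)


if __name__ == "__main__":
    # Hypothetical per-block parameters, for illustration only.
    blocks = {
        "rx": dict(switching_activity=0.2, capacitance_pf=500.0, vdd_v=2.5, freq_mhz=100.0),
        "tx": dict(switching_activity=0.1, capacitance_pf=400.0, vdd_v=2.5, freq_mhz=50.0),
    }
    print(total_power_mw(blocks, active={"rx", "tx"}))
    print(total_power_mw(blocks, active={"rx"}))  # "tx" clock domain switched off
```

The second call shows the effect of top-level power control logic: a block whose clock domain is switched off drops out of the dynamic power sum, and lowering a block's target frequency reduces its term proportionally.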

For complete static timing verification, the use of advanced static analysis tools is a must. The post-layout timing data can be easily loaded into an industry-standard tool, and post-layout timing checks can be carried out using the original timing constraint files. In designs that contain multiple clock domains, false paths, and multi-cycle paths, this saves a lot of time because there is no need to re-enter the timing constraints into a separate tool. Additionally, if the layout tool modifies the netlist, it may prove impossible to correctly verify the timing using software tools. In this case, many FPGA designers simply program the chips and test the design in the system. Unfortunately, this doesn't cover worst-case conditions or detect all timing arcs without extensive board-level testing.

Similarly, post-layout simulations often fail at the gate level, even though RTL simulations are satisfactory. There are many reasons for gate-level simulation failures. They range from poor RTL code and synthesis errors to timing issues. Debugging these situations in a gate-level simulator is time consuming. However, since the gate-level netlist is preserved in the ASICmaster flow, all the original hierarchical boundaries, instances, and even net names are in the netlist.

A final look

The implementation of a structured design methodology that satisfactorily copes with system issues from a user perspective must focus on the necessary tools and their linking to achieve the end goals, the example here being a high-speed networking application. Unlike coarse-grained, FPGA-associated design flows, the fine-grained device architecture simplifies both the ASIC and the FPGA designers' task by allowing designers to leverage existing ASIC design flows.


Paresh Patel is the principal of System Level Solutions (Morgan Hill, CA).

Hichem Belhadj is the applications and customer support manager at Actel Corp. (San Jose).

To voice an opinion on this or any other article in Integrated System Design, please e-mail your comments to mikem@isdmag.com.


Copyright © 2000 Integrated System Design Magazine
