Accelerate partial reconfiguration with a 100% hardware solution
S. Lamonnier, M. Thoris, M. Ambielle, Sagem DS (Safran Group)
5/26/2012 12:39 PM EDT
Editor's Note: This article first appeared in the Second Quarter 2012 issue of Xcell Journal, and is reproduced here with the kind permission of Xilinx.
In many modern applications such as video processing, minimizing FPGA reconfiguration time is critical to avoid losing too many images. Partial reconfiguration is a technique that allows users to reconfigure a small part of the FPGA without affecting the logic around it. For the human eye to see an image without flicker, the reconfiguration time must be less than 40 milliseconds. That is very little time in which to reconfigure an entire device, save for the smallest FPGAs, and in certain applications the reconfiguration time must be shorter still. Hence the appeal of partial reconfiguration: because a partial bitstream is smaller than a full one, it takes less time to load.
At Sagem DS, we have devised a technique that allows FPGA designers to perform partial reconfiguration very quickly. We used the Xilinx ML507 reference board [1] to test and validate the solution and to measure timing. The components of this board that matter here are a Virtex-5 FPGA (XC5VFX70T-FFG1136), a CPLD (used as a routing component) and two XCF32P memories (Xilinx Platform Flash).
A MicroBlaze vs. hardware solution
In much of the literature, partial reconfiguration (PR) is driven by an internal controller such as a MicroBlaze or by an external processor. Implementing a processor within the FPGA takes its toll in development time and consumes significant device resources, depending upon the configuration. Likewise, an external processor costs money and board space. Moreover, buses such as PLB or AXI have latency that degrades performance in terms of reconfiguration time.
For those reasons, we adopted an all-hardware solution based on a small state machine and the Internal Configuration Access Port (ICAP) interface for bitstream loading. This approach offers several advantages: there is no bus latency, it consumes few resources (fewer than 300 lookup tables on the FPGA), and designers can optimize the timing of the partial reconfiguration.
Development flow overview
From VHDL design entry to bitstream and partial-bitstream creation, our hardware-based partial-reconfiguration flow is the same as the generic process described in Xilinx tutorials, user guides [2] and application notes, except that there is no embedded processor. Users must define reconfigurable areas in PlanAhead and import reconfigurable modules for each area. For all configuration runs, you import static logic from a previous run.
During partial reconfiguration, the FPGA must be in slave mode. That is to say, the only usable interfaces are JTAG, slave serial, slave SelectMap or ICAP. An external component should drive the FPGA CCLK for reconfiguration; ICAP is not accessible at the first FPGA boot. Because a timing-efficient reconfiguration rules out the serial interfaces, that leaves two interesting choices: SelectMap and ICAP.
The first option is to use the SelectMap interface for both full and partial bitstream loading. This method requires adding a Bitgen option (-g Persist) when creating the bitstreams, so that the FPGA keeps the SelectMap pins reserved for configuration in order to load the partial bitstream. In addition, SelectMap offers no signal that precisely indicates the end of partial reconfiguration, as the DONE signal does for full configuration. It is therefore difficult to know exactly when the partial reconfiguration has ended, and users must create a module that estimates when all configuration data has been sent.
This is why we opted instead to use the ICAP primitive to load the partial bitstreams. ICAP cannot perform the initial configuration as SelectMap can, so we still use SelectMap for the first boot of the FPGA. After that, the user code fully controls the ICAP primitive.
ICAP offers two advantages over a full SelectMap design. First, the switchover (SelectMap for first boot, ICAP for partial-bitstream loading) is transparent for users and requires no Bitgen option, so the user retains control of the memory pins. The second, and most important, advantage is that ICAP continually drives a status value on its output. This status changes while the FPGA is being reconfigured, which allows the user to “see” the end of partial reconfiguration.
Reconfigurable functions and components
At our company, we develop video-processing applications using FPGAs. These functions use logic elements, some BRAM components and many DSP slices. Studying the reconfiguration time of these three element types guides us in sizing a reconfigurable area, based on the answers to two questions: How should we size the area to respect the 40-millisecond image refresh time our applications demand? And how many different elements must we include in this area? On the ML507, we answered those questions by testing various types of configurations.
As shown in Figure 1, on the ML507 we chose the external phase-locked loop (PLL) as a reference clock for partial-reconfiguration control. Some connections are not our choice but are dictated by board design. The schematic does not show safeguards against conflicting levels on nets. The clock has a frequency of 33 MHz, which is the maximum speed of the two XCF32P memories on the board. [3] The data bus is 8 bits wide, which allows a data transfer rate of up to 264 Mbits/second.
Figure 1. Partial-bitstream loading, ICAP synchronization and desynchronization.
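As a quick check of the figures quoted above, the following Python sketch recomputes the peak transfer rate and the largest payload that could be moved within one 40 ms image period. It assumes that one bus word is transferred on every clock cycle with no stalls; the function names are illustrative and not part of the original design.

```python
# Back-of-the-envelope throughput for the configuration interface.
CLOCK_HZ = 33_000_000   # 33 MHz reference clock (from the article)
BUS_WIDTH_BITS = 8      # 8-bit data bus (from the article)
FRAME_BUDGET_MS = 40    # 40 ms image period (from the article)


def throughput_bits_per_s(clock_hz: int, bus_width_bits: int) -> int:
    """Peak rate, assuming one bus word moves on every clock cycle."""
    return clock_hz * bus_width_bits


def max_bytes_in_budget(clock_hz: int, bus_width_bits: int, budget_ms: int) -> int:
    """Largest payload (in bytes) that fits in the time budget at the peak rate."""
    bits = throughput_bits_per_s(clock_hz, bus_width_bits) * budget_ms // 1000
    return bits // 8


if __name__ == "__main__":
    rate = throughput_bits_per_s(CLOCK_HZ, BUS_WIDTH_BITS)
    payload = max_bytes_in_budget(CLOCK_HZ, BUS_WIDTH_BITS, FRAME_BUDGET_MS)
    print(f"peak rate: {rate // 1_000_000} Mbit/s")
    print(f"largest payload in {FRAME_BUDGET_MS} ms: {payload} bytes")
```

Under these idealized assumptions the interface can move roughly 1.3 Mbytes in 40 ms, which bounds the size of any partial bitstream that has to load within one image period.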
On an FPGA, the smallest reconfigurable area is a frame. Frame size differs according to the type of FPGA. For example, on the Virtex-5 the frame is 20 CLBs high [4], but on the Virtex-6 it is twice that. The partial bitstream is a combination of elementary frames, BRAM, DSP and a few configuration words. The configuration interface and the partial-bitstream structure (the number of elements it contains) are the keys to achieving a timing-efficient partial reconfiguration.
Reconfiguration process
Figure 2 illustrates a simplified architecture of an FPGA designed for partial reconfiguration. Everything begins when you switch on the board: the FPGA loads the full bitstream through the parallel SelectMap interface. After that, the user code controls all modules present in the light- and dark-green zones of the illustration. The decoupling logic is a specific zone designed around the PR module.
Figure 2. Hardware PR control module within the FPGA.
During the reconfiguration process, unknown states can appear on connection nets. This zone allows you to maintain a known state on the connection nets between the static zone and reconfigurable zone. The decoupling logic is in fact very small, consisting simply of a D flip-flop and a multiplexer on each net. The signal “PR_END” controls the multiplexer.
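The behavior of one decoupled net can be illustrated with a small Python model. This is only a behavioral sketch under an assumed wiring that the article does not spell out: the flip-flop tracks the live net while PR_END is high, and the multiplexer forwards the held value while PR_END is low. The class and method names are hypothetical.

```python
from typing import Optional


class DecoupledNet:
    """Behavioral model of one decoupled net: a D flip-flop plus a 2:1 mux.

    Assumed wiring (not taken from the article's schematic): the flip-flop
    samples the live net only while pr_end is high, and the mux forwards
    the held value while pr_end is low.
    """

    def __init__(self, reset_value: int = 0) -> None:
        self.held = reset_value  # flip-flop contents

    def clock(self, live_value: Optional[int], pr_end: bool) -> int:
        """Advance one clock edge and return the value seen by the static logic.

        live_value may be None to model an unknown level during reconfiguration.
        """
        if pr_end and live_value is not None:
            self.held = live_value  # normal operation: track the live net
            return live_value       # mux selects the live net
        return self.held            # reconfiguring: mux selects the held value


if __name__ == "__main__":
    net = DecoupledNet()
    print(net.clock(1, pr_end=True))      # live value passes through
    print(net.clock(None, pr_end=False))  # unknown level is masked by the held value
    print(net.clock(0, pr_end=True))      # live value passes through again
```

In hardware this would be a couple of lines of VHDL per net; the model is only meant to show why the static logic never sees an unknown level.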
Now it’s time to reconfigure the PR_MODULE (for example, a histogram correction function) with a new configuration contained in a partial bitstream in memory. The signal “Ask for a reconfiguration” begins the partial-reconfiguration process, which follows a succession of states (the list given in Figure 2 is not exhaustive).
- The first state, ICAP initialization, exists to give a default value to the control signals (CE and RW high).
- The second state prepares ICAP to receive data from memory. ICAP is now activated and is in write mode (write configuration in the FPGA).
- The third state controls memory signals and initializes the partial-bitstream transfer sequence from memory to FPGA. In parallel, decoupling logic retains signals between the static and reconfigurable regions. This prevents undesirable data from propagating in user functions.
- The fourth state is a wait state. During this time, ICAP loads configuration frames into the FPGA, while the “Status Check” block reads the ICAP output status. Detection of a “desynchronize” word releases the wait state.
- The fifth state pushes a synchronous reset in the PR_MODULE to reset and give a known state to the new logical elements.
- Finally, the last state releases decoupling logic and deactivates ICAP and memory.
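The sequence of states listed above can be sketched as a small behavioral model in Python. This is not the authors' VHDL: the state names, the `step` interface and the rule that the wait state must first observe DFh before treating 9Fh as "desynchronized" are assumptions made for illustration.

```python
from enum import Enum, auto

STATUS_SYNCED = 0xDF    # ICAP status while configuration frames are loading
STATUS_DESYNCED = 0x9F  # ICAP status before sync and after desync


class PRState(Enum):
    IDLE = auto()
    ICAP_INIT = auto()          # state 1: default control values (CE, RW high)
    ICAP_WRITE_ENABLE = auto()  # state 2: ICAP active, write mode
    START_TRANSFER = auto()     # state 3: start memory readout, decouple
    WAIT_DESYNC = auto()        # state 4: wait for the desynchronize word
    RESET_MODULE = auto()       # state 5: synchronous reset of PR_MODULE
    RELEASE = auto()            # state 6: release decoupling, disable ICAP


class PRController:
    """Hypothetical cycle-by-cycle model of the PR control state machine."""

    def __init__(self) -> None:
        self.state = PRState.IDLE
        self.seen_sync = False

    @property
    def decoupled(self) -> bool:
        """Decoupling is held from state 3 until the final state releases it."""
        return self.state in (PRState.START_TRANSFER,
                              PRState.WAIT_DESYNC,
                              PRState.RESET_MODULE)

    def step(self, request: bool, icap_status: int) -> PRState:
        s = self.state
        if s is PRState.IDLE:
            if request:
                self.state = PRState.ICAP_INIT
        elif s is PRState.ICAP_INIT:
            self.state = PRState.ICAP_WRITE_ENABLE
        elif s is PRState.ICAP_WRITE_ENABLE:
            self.seen_sync = False
            self.state = PRState.START_TRANSFER
        elif s is PRState.START_TRANSFER:
            self.state = PRState.WAIT_DESYNC
        elif s is PRState.WAIT_DESYNC:
            if icap_status == STATUS_SYNCED:
                self.seen_sync = True
            elif icap_status == STATUS_DESYNCED and self.seen_sync:
                self.state = PRState.RESET_MODULE
        elif s is PRState.RESET_MODULE:
            self.state = PRState.RELEASE
        elif s is PRState.RELEASE:
            self.state = PRState.IDLE
        return self.state


if __name__ == "__main__":
    ctrl = PRController()
    statuses = [0x9F] * 5 + [0xDF] * 3 + [0x9F] * 3
    for cycle, status in enumerate(statuses):
        state = ctrl.step(request=(cycle == 0), icap_status=status)
        print(cycle, state.name, "decoupled" if ctrl.decoupled else "")
```

Tracking whether DFh has been seen avoids leaving the wait state on the initial 9Fh status, before the synchronize word has reached ICAP.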
Inside ICAP
Users need not know or care precisely how ICAP works. That’s because the partial bitstream provides all the things ICAP needs for this application. Nevertheless, it’s important to understand two 32-bit words in ICAP: synchronize and desynchronize.
The first word, “synchronize,” is 5599AA66h on input. This word makes the ICAP output status change from 9Fh to DFh (DFh means “component synchronized”). For as long as the ICAP output status is DFh, the FPGA is loading new configuration frames.
Once all the configuration data has been sent, the bitstream contains a “desynchronize” word, which is 000000B0h. When ICAP receives this word, its output status changes back to 9Fh, indicating that the component is desynchronized. So, checking the ICAP output status gives us a precise measure of the partial-reconfiguration time. Figure 3 shows the reconfiguration time for a frame with relevant signals.
Figure 3. ICAP timing for one reconfigured frame region consisting of six slices.
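The two words quoted above look unfamiliar next to the values usually printed in bitstream listings. As far as we can tell from the Virtex-5 configuration documentation [4], they are the documented sync word AA995566h and the DESYNC command value 0000000Dh with the bits of each byte reversed, as the byte-wide configuration ports expect. The Python sketch below checks that relationship and shows one way a status trace could be turned into a reconfiguration time. The trace format, one status sample per clock cycle, is an assumption, and the helper names are hypothetical.

```python
STATUS_SYNCED = 0xDF
STATUS_DESYNCED = 0x9F


def bit_reverse_byte(b: int) -> int:
    """Reverse the bit order of one byte (bit 0 becomes bit 7, and so on)."""
    return int(f"{b:08b}"[::-1], 2)


def bit_swap_word(word: int) -> int:
    """Reverse the bits inside each byte of a 32-bit word; byte order is kept."""
    out = 0
    for shift in (24, 16, 8, 0):
        out |= bit_reverse_byte((word >> shift) & 0xFF) << shift
    return out


def synced_cycles(status_trace) -> int:
    """Count the samples during which the ICAP status reads DFh."""
    return sum(1 for s in status_trace if s == STATUS_SYNCED)


def reconfig_time_s(status_trace, clock_hz: int = 33_000_000) -> float:
    """Reconfiguration time, assuming one status sample per clock cycle."""
    return synced_cycles(status_trace) / clock_hz


if __name__ == "__main__":
    print(f"{bit_swap_word(0xAA995566):08X}h")  # synchronize word as seen by ICAP
    print(f"{bit_swap_word(0x0000000D):08X}h")  # desynchronize word as seen by ICAP
    trace = [0x9F] * 3 + [0xDF] * 33 + [0x9F] * 2
    print(f"{reconfig_time_s(trace) * 1e6:.2f} us")
```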
Experiments and results
Designing a video-processing function such as automatic histogram correction (AHC) requires logic cells, BRAMs and many DSP slices. Because the corrections differ for illumination systems, thermal video or TV video, the FPGA must adapt itself in a minimum amount of time (less than 40 ms, that is to say, the time it takes the eye to register one image). The results in Figure 4(a), (b) and (c) give us a precise idea of the reconfiguration time for our interface; we measured these timings with small reconfigurable modules (one with logic only, one with logic and BRAM, and the last with logic and DSP). The results show that the time is linear within each component category. Of course, the results are valid only for our interface specification (a 33 MHz clock and an 8-bit-wide bus).
Figure 4. (a, top) Reconfiguration time for slice frames only; (b, middle) reconfiguration time for slice frames and BRAMs; (c, bottom) reconfiguration time for slice frames and DSPs.
Reconfiguring a bigger function, such as one to handle AHC, is more relevant for our products and puts the FPGA through its paces in operational conditions. To explain the AHC function briefly: a TV image has three components, Y, Cr and Cb. Y is the luminance, and Cr and Cb are the red and blue color-difference components. An AHC module must, for example, convert an image into RGB format and apply a specific treatment to each color. In the case of black-and-white images, we can compute a minimum, a maximum and an average, and then apply a gain, an offset or both to add luminosity (in the case of dark images) and contrast. In terms of FPGA resources, this correction costs around 5,000 lookup tables (LUTs), 4,000 flip-flops, 100 DSP slices and 20 BRAM slices.
Figure 5 shows the reconfigurable zone we chose for our AHC function. This zone contains 7,840 LUTs, 7,840 flip-flops, 112 DSP slices and 28 BRAM slices. With such a design, we respect Xilinx’s recommendation that logic density in the reconfigurable zone not exceed 80 percent. The Virtex-5 architecture has four LUTs and four flip-flops in a slice, and a configurable logic block (CLB) has two slices, so a frame 20 CLBs high contains 160 LUTs and 160 flip-flops. Our region contains 49 frames.
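The sizing arithmetic in this paragraph can be reproduced with a few lines of Python. All constants come from the article; the function names are illustrative only.

```python
CLBS_PER_FRAME = 20    # Virtex-5 frame height in CLBs
SLICES_PER_CLB = 2
LUTS_PER_SLICE = 4     # a slice also holds four flip-flops
FRAMES_IN_REGION = 49
AHC_LUTS = 5_000       # approximate LUT cost of the AHC function
MAX_DENSITY = 0.80     # recommended upper bound on logic density


def luts_per_frame() -> int:
    """LUTs (and, equally, flip-flops) in one frame."""
    return CLBS_PER_FRAME * SLICES_PER_CLB * LUTS_PER_SLICE


def region_luts(frames: int) -> int:
    """LUTs available in a region spanning the given number of frames."""
    return frames * luts_per_frame()


def density(used: int, available: int) -> float:
    """Fraction of the region's LUTs consumed by the function."""
    return used / available


if __name__ == "__main__":
    available = region_luts(FRAMES_IN_REGION)
    d = density(AHC_LUTS, available)
    print(f"LUTs per frame: {luts_per_frame()}")
    print(f"LUTs in region: {available}")
    print(f"density: {d:.1%} (limit {MAX_DENSITY:.0%})")
```

With about 5,000 of 7,840 LUTs in use, the density comes out near 64 percent, which leaves headroom under the 80 percent guideline.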
According to Figure 4(a), (b) and (c), we can calculate the reconfiguration time this way: the logic needs 9.8 ms, the 112 DSP slices need 2 ms and the 28 BRAM slices need 6.4 ms. In all, then, the region needs 18.2 ms to reconfigure. That is far less than the 40 ms it takes the eye to register one image. Figure 5 shows the PlanAhead view with the reconfigurable zone.
Figure 5. PlanAhead view with reconfigurable zone.
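The timing budget worked out above can be written as a short Python check. The three contributions are the article's figures, expressed in microseconds to keep the arithmetic exact; the per-frame cost is merely derived from them under the stated linearity and is not a separately measured value.

```python
LOGIC_US = 9_800   # 49 logic frames
DSP_US = 2_000     # 112 DSP slices
BRAM_US = 6_400    # 28 BRAM slices
BUDGET_US = 40_000 # one image period


def total_reconfig_us(*contributions_us: int) -> int:
    """Sum the per-category reconfiguration times."""
    return sum(contributions_us)


def margin_us(total_us: int, budget_us: int) -> int:
    """Time left in the budget after reconfiguration (negative if exceeded)."""
    return budget_us - total_us


def per_unit_us(category_us: int, count: int) -> float:
    """Average cost of one element, assuming time is linear in the count."""
    return category_us / count


if __name__ == "__main__":
    total = total_reconfig_us(LOGIC_US, DSP_US, BRAM_US)
    print(f"total: {total / 1000} ms")
    print(f"margin: {margin_us(total, BUDGET_US) / 1000} ms")
    print(f"derived cost per logic frame: {per_unit_us(LOGIC_US, 49)} us")
```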
Elimination of software development
As shown in Figure 1, we can replace the CPLD with logical components on a custom board. For now, the solution works well, but it is not optimal. We can improve the reconfiguration interface by using a wider bus (ICAP supports up to 32 bits in parallel) and a higher clock frequency.
The concept of 100 percent hardware partial reconfiguration represents a gain in terms of system development cost, because there is no need for software development, as there would be in a solution using the MicroBlaze. By using the ICAP interface, a designer has total control of what is done in the FPGA, and ICAP greatly improves the performance and functionality of the system. Finally, for our video-processing applications in particular, hardware partial reconfiguration is a relevant feature that will enhance our company’s products.
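To get a feel for what a wider bus and a faster clock might buy, the sketch below rescales a measured reconfiguration time by the ratio of interface throughputs. It assumes that reconfiguration is purely transfer-bound and that the memory can keep up with the faster interface, neither of which the article establishes. The 100 MHz clock in the example is a hypothetical value, not a figure taken from the article.

```python
def scaled_time_us(measured_us: float,
                   old_clock_hz: int, old_width_bits: int,
                   new_clock_hz: int, new_width_bits: int) -> float:
    """Rescale a measured time by the ratio of old to new peak throughput.

    Assumes the reconfiguration time is dominated by the data transfer.
    """
    old_rate = old_clock_hz * old_width_bits
    new_rate = new_clock_hz * new_width_bits
    return measured_us * old_rate / new_rate


if __name__ == "__main__":
    measured = 18_200  # us, the 18.2 ms estimated for the AHC region
    same_clock = scaled_time_us(measured, 33_000_000, 8, 33_000_000, 32)
    faster = scaled_time_us(measured, 33_000_000, 8, 100_000_000, 32)
    print(f"32-bit bus at 33 MHz: {same_clock / 1000:.2f} ms")
    print(f"32-bit bus at a hypothetical 100 MHz: {faster / 1000:.2f} ms")
```

Any real gain would have to be confirmed by measurement, since the flash memories on the ML507 are themselves limited to 33 MHz.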
References
- [1] ML505/6/7 Block Diagram, Version A, Rev. 02, Xilinx, January 2008
- [2] Partial Reconfiguration User Guide, UG702, v13.1, Xilinx, March 2011
- [3] Platform Flash In-System Programmable Configuration PROMs, DS123, v2.18, Xilinx, May 2010
- [4] Virtex-5 FPGA Configuration User Guide, UG191, v3.9.1, Xilinx, August 2010
About the authors
Sébastien Lamonnier is an FPGA Designer at Sagem DS (Safran Group). Sébastien may be contacted at sebastien.lamonnier@sagem.com
Marc Thoris is an FPGA Project Manager at Sagem DS (Safran Group). Marc may be contacted at marc.thoris@sagem.com
Marlène Ambielle is an FPGA Designer at Sagem DS (Safran Group). Marlène may be contacted at marlene.ambielle@sagem.com
Dr DSP
5/28/2012 12:59 PM EDT
Seems to me that the tools have a long way to go to make dynamic reconfiguration something that most designs will be able to do. For example, the need to have the decoupling logic explicitly defined (if I'm reading the article correctly) is the type of thing I would expect to be handled automatically.
angelmc
6/20/2012 5:27 AM EDT
At the university we worked (and they still do) on a similar approach for partial reconfiguration through the ICAP, but one that was handled entirely by the FPGA itself. The maximum clock frequency for the ICAP was achieved with little consumption of FPGA resources.
angelmc
6/20/2012 5:56 AM EDT
Just a clarification: they still work with the reconfiguration engine (as they call it) that works correctly. They are working with upper layers to optimize full applications.
srikanth1986
2/14/2013 6:26 AM EST
I had realized a 100% hw solution for PR way back in 2010. So I don't know what's so innovative about it now.