This article highlights the capabilities of the new Xilinx 7 series FPGAs, giving potential users the information they need to understand the features of the families. It describes the functionality of the devices without going into the level of detail contained in the various 7 series FPGA user guides, providing sufficient detail to understand how the new 7 series FPGAs will benefit your system without requiring you to read the full set of detailed product documentation.
Introduction to Xilinx 7 Series FPGAs
The Xilinx 7 series comprises three new FPGA families that address the complete range of system requirements, from low-cost, small-form-factor, high-volume applications to the most demanding high-performance applications that need ultra-high-end connectivity bandwidth, logic capacity and signal-processing capability. The 7 series FPGAs are the programmable silicon foundation for Xilinx Targeted Design Platforms, which enable designers to focus on innovation from the outset of their development cycle.
The 7 series FPGAs include:
- Artix-7 Family: Optimized for lowest cost and power with small-form-factor packaging for the highest-volume applications.
- Kintex-7 Family: Optimized for best price/performance with a 2x improvement over previous-generation devices, enabling a new class of FPGAs.
- Virtex-7 Family: Optimized for highest system performance and 2x improvement in capacity over previous-generation FPGAs, to satisfy the insatiable demand for higher bandwidth and higher performance.
7 Series Families Comparison
All 7 series devices share a unified fourth-generation Advanced Silicon Modular Block (ASMBL) column-based architecture that reduces system development and deployment time with simplified design portability. Building three families on a common platform that is based on the successful Virtex-6 and Spartan-6 architectures enables designers to port their existing designs and IP to the 7 series FPGAs with minimal redesign. It also ensures easy migration between the different 7 series FPGA families as design requirements change.
The Artix-7 family, optimized for the lowest cost and lowest power, is ideally suited to demanding handheld applications including portable ultrasound machines, digital camera control and software-defined radio. The low-power, cost-effective Kintex-7 family provides a perfect balance of features and performance, making it ideal for wireless LTE infrastructure equipment, LED backlit and 3D digital video displays, medical imaging and avionics imaging systems. The highest in system performance, the Virtex-7 family, enables 400G line cards for next-generation optical networks, 300G bridges, terabit switch fabric, 100G OTN muxponders, radar and ASIC emulation. Within the Virtex-7 family, there are three categories of devices: the Virtex-7 T, providing the broadest range of capacity; Virtex-7 XT, providing higher signal-processing capability and increased serial bandwidth; and Virtex-7 HT, offering the very highest bandwidth using the 28.05-Gbit/second serial transceiver technology.
Xilinx’s innovative stacked-silicon interconnect technology enables 100x improvement in die-to-die bandwidth per watt to deliver a two- to threefold capacity advantage over monolithic devices. This technology is central to the Virtex-7 FPGA family. Supported by standard design flows, it provides the high logic, processing, memory and transceiver capacity that demanding applications require.
The logic structure, consisting of lookup tables (LUTs) and flip-flops, is the fundamental building block of any FPGA architecture. Each LUT in a 7 series FPGA can be configured either as one six-input LUT (a 64-bit ROM) with one output, or as two five-input LUTs (32-bit ROMs) with separate outputs but common addresses or logic inputs. Each LUT output can optionally be registered in a flip-flop. Four such LUTs and their eight flip-flops as well as multiplexers and arithmetic carry logic form a slice, and two slices form a configurable logic block (CLB). Four flip-flops per slice (one per LUT) can optionally be configured as latches. In that case, the remaining four flip-flops in that slice must remain unused. In 25 to 50 percent of all slices, you can also use the LUTs as distributed 64-bit RAM, as 32-bit shift registers (SRL32) or as two SRL16s. Modern synthesis tools take advantage of these highly efficient logic, arithmetic and memory features.
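The two LUT modes can be pictured with a small behavioral sketch. It treats the LUT contents as a 64-bit truth table and, as a modeling assumption rather than a statement about the silicon's bit ordering, places the two five-input tables in the lower and upper 32 bits. The function names are hypothetical.

```python
def lut6(init: int, inputs: int) -> int:
    """Six-input mode: the 6-bit input value addresses one bit of a 64-bit table."""
    assert 0 <= init < (1 << 64) and 0 <= inputs < 64
    return (init >> inputs) & 1


def dual_lut5(init: int, inputs: int):
    """Five-input mode: two 32-bit tables share the same five inputs.

    Assumption for this sketch: bits 31..0 hold one table, bits 63..32 the other.
    """
    assert 0 <= init < (1 << 64) and 0 <= inputs < 32
    lower = (init >> inputs) & 1
    upper = (init >> (32 + inputs)) & 1
    return lower, upper


# Truth table of a six-input AND: only address 0b111111 returns 1.
AND6 = 1 << 63
# Truth table of a six-input XOR: 1 wherever the address has odd parity.
XOR6 = sum(1 << a for a in range(64) if bin(a).count("1") % 2)
# Five-input AND in the lower table, five-input OR in the upper table.
AND5_OR5 = (1 << 31) | (((1 << 32) - 2) << 32)
```

Any Boolean function of up to six inputs is just a different 64-bit constant, which is why synthesis tools can map arbitrary logic onto the same structure.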
In addition to the capability to configure the LUTs as distributed RAM, all Xilinx 7 series FPGAs contain blocks of 36-kbit configurable memory called block RAM. Every 7 series FPGA has between 20 and 2,360 dual-port block RAMs, each storing 36 kbits. Each block RAM has two completely independent ports that share nothing but the stored data.
The clock controls each memory access, read or write. All inputs (data, address, clock enables and write enables) are registered. Nothing happens without a clock. The input address is always clocked, retaining data until the next operation. An optional output data pipeline register allows higher clock rates at the cost of an extra cycle of latency. During a write operation, the data output can reflect either the previously stored data or the newly written data, or it can remain unchanged.
Programmable Data Width
You can configure each port as 32K × 1, 16K × 2, 8K × 4, 4K × 9 (or 8), 2K × 18 (or 16), 1K × 36 (or 32) or 512 × 72 (or 64). The two ports can have different aspect ratios without any constraints. Each block RAM can be divided into two completely independent 18-kbit block RAMs, each configurable to any aspect ratio from 16K × 1 to 512 × 36. All attributes of the full 36-kbit block RAM also apply to each of the smaller 18-kbit block RAMs. In simple dual-port (SDP) mode, data widths of greater than 18 bits (18-kbit RAM) or 36 bits (36-kbit RAM) can be accessed. In this mode, one port is dedicated to read operation, the other to write operation. In SDP mode, one side (read or write) can be variable while the other is fixed to 32/36 or 64/72. Both sides of the dual-port 36-kbit RAM can be of variable width. Two adjacent 36-kbit block RAMs can be configured as one cascaded 64K × 1 dual-port RAM without any additional logic.
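The list of aspect ratios follows from simple arithmetic: the data portion of the memory is divided by the data width, and widths of a byte or more carry one extra parity bit per byte. The sketch below reproduces the depth × width pairs under that reading; the function name and arguments are hypothetical.

```python
def port_shapes(data_bits: int, max_data_width: int):
    """List (depth, port_width) pairs for a block RAM with `data_bits` of data storage.

    Port width includes one parity bit per data byte once the data width reaches 8.
    """
    shapes = []
    data_width = 1
    while data_width <= max_data_width:
        parity_bits = data_width // 8
        shapes.append((data_bits // data_width, data_width + parity_bits))
        data_width *= 2
    return shapes


# 36-kbit block: 32 kbits of data plus 4 kbits of parity, up to 64 data bits wide.
RAMB36_SHAPES = port_shapes(32 * 1024, 64)
# 18-kbit block: 16 kbits of data plus 2 kbits of parity, up to 32 data bits wide.
RAMB18_SHAPES = port_shapes(16 * 1024, 32)
```

The widest 36-kbit configuration comes out as 512 words of 72 bits, and the widest 18-kbit configuration as 512 words of 36 bits.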
Error Detection and Correction
Each 64-bit-wide block RAM can generate, store and utilize eight additional Hamming-code bits and perform single-bit error correction and double-bit error detection (ECC) during the read process. The ECC logic can also be used when writing to or reading from external 64- to 72-bit-wide memories.
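To show how eight check bits can protect 64 data bits, here is a minimal sketch of a textbook extended Hamming code: seven check bits at power-of-two positions plus one overall parity bit. The bit layout is an assumption chosen for clarity and is not claimed to match the block RAM's actual encoding.

```python
# Codeword layout assumed for this sketch: position 0 is the overall parity bit,
# positions 1, 2, 4, ..., 64 are Hamming check bits, the other 64 positions carry data.
CHECK_POSITIONS = [1 << i for i in range(7)]
DATA_POSITIONS = [p for p in range(1, 72) if p not in CHECK_POSITIONS]


def _syndrome(word: int) -> int:
    """XOR together the positions (1..71) of all set bits."""
    syndrome = 0
    for pos in range(1, 72):
        if (word >> pos) & 1:
            syndrome ^= pos
    return syndrome


def encode(data: int) -> int:
    """Build a 72-bit codeword from 64 data bits."""
    word = 0
    for i, pos in enumerate(DATA_POSITIONS):
        word |= ((data >> i) & 1) << pos
    syndrome = _syndrome(word)
    for i, pos in enumerate(CHECK_POSITIONS):
        word |= ((syndrome >> i) & 1) << pos   # forces the total syndrome to zero
    word |= bin(word).count("1") & 1           # makes the overall bit count even
    return word


def decode(word: int):
    """Return (data, status); data is None when the error cannot be corrected."""
    syndrome = _syndrome(word)
    odd_parity = bin(word).count("1") & 1
    if odd_parity:
        if syndrome >= 72:
            return None, "uncorrectable"
        word ^= 1 << syndrome                  # syndrome 0 means the parity bit itself
        status = "corrected"
    elif syndrome:
        return None, "double_error"
    else:
        status = "ok"
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        data |= ((word >> pos) & 1) << i
    return data, status
```

A single flipped bit leaves the overall parity odd and the syndrome points at the bad position; two flipped bits leave the parity even with a nonzero syndrome, which can be flagged but not repaired.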
The built-in FIFO controller for single-clock (synchronous) or dual-clock (asynchronous or multirate) operation increments the internal addresses and provides four handshaking flags: full, empty, almost full and almost empty. The almost full and almost empty flags are freely programmable. Similar to the block RAM, the FIFO width and depth are programmable, but the write and read ports are always identical in width. First-word fall-through mode presents the first-written word on the data output even before the first read operation. After the first word has been read, there is no difference between this mode and the standard mode.
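The flag behavior can be sketched with a small single-clock software model. The class name, the threshold parameters and the exact fill levels at which the almost-full and almost-empty flags change are assumptions for illustration; the hardware's precise flag timing is defined in the user guides.

```python
from collections import deque


class FifoModel:
    """Behavioral single-clock FIFO exposing the four flags named in the text."""

    def __init__(self, depth: int, almost_empty_at: int, almost_full_at: int):
        self.depth = depth
        self.almost_empty_at = almost_empty_at   # flag set at or below this fill level
        self.almost_full_at = almost_full_at     # flag set at or above this fill level
        self._words = deque()

    @property
    def empty(self) -> bool:
        return len(self._words) == 0

    @property
    def full(self) -> bool:
        return len(self._words) == self.depth

    @property
    def almost_empty(self) -> bool:
        return len(self._words) <= self.almost_empty_at

    @property
    def almost_full(self) -> bool:
        return len(self._words) >= self.almost_full_at

    def write(self, word) -> None:
        if self.full:
            raise OverflowError("write to a full FIFO")
        self._words.append(word)

    def read(self):
        if self.empty:
            raise IndexError("read from an empty FIFO")
        return self._words.popleft()

    def first_word(self):
        """First-word fall-through view: the oldest word, visible before any read."""
        return self._words[0] if self._words else None
```

In this model, `first_word()` plays the role of the data output in first-word fall-through mode, while `read()` corresponds to the standard mode.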
Digital Signal Processing
DSP applications use many binary multipliers and accumulators, best implemented in dedicated DSP slices. All 7 series FPGAs have many dedicated, full-custom, low-power DSP slices, combining high speed with small size while retaining system design flexibility. Each DSP slice fundamentally consists of a dedicated 25 × 18 bit two's complement multiplier and a 48-bit accumulator. The multiplier can be dynamically bypassed, and two 48-bit inputs can feed a single-instruction-multiple-data (SIMD) arithmetic unit (dual 24-bit add/subtract/accumulate or quad 12-bit add/subtract/accumulate), or a logic unit that can generate any of 10 different logic functions of the two operands. The DSP includes an additional pre-adder, typically used in symmetrical filters. This pre-adder improves performance in densely packed designs and reduces the DSP slice count by up to 50 percent.
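A rough behavioral sketch of these arithmetic modes follows: a 25 × 18 two's complement multiply feeding a 48-bit wraparound accumulator, a pre-adder for symmetric filter taps, and a SIMD add whose carries do not cross lane boundaries. The function names are hypothetical, and pipeline registers and opcode selection are not modeled.

```python
def to_signed(value: int, bits: int) -> int:
    """Wrap an integer into a two's complement range of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mac(acc: int, a: int, b: int) -> int:
    """Multiply a 25-bit operand by an 18-bit operand and accumulate in 48 bits."""
    return to_signed(acc + to_signed(a, 25) * to_signed(b, 18), 48)


def symmetric_tap(acc: int, x_early: int, x_late: int, coeff: int) -> int:
    """Pre-adder use: two samples that share a coefficient cost one multiply."""
    return mac(acc, x_early + x_late, coeff)


def simd_add(x: int, y: int, lanes: int) -> int:
    """Add two 48-bit words as independent lanes (2 x 24-bit or 4 x 12-bit)."""
    width = 48 // lanes
    mask = (1 << width) - 1
    result = 0
    for lane in range(lanes):
        shift = lane * width
        lane_sum = ((x >> shift) & mask) + ((y >> shift) & mask)
        result |= (lane_sum & mask) << shift   # carry out of the lane is dropped
    return result
```

For a four-tap symmetric filter with coefficients c0, c1, c1, c0, the direct form needs four multiplies, while two calls to `symmetric_tap` produce the same sum, which is where the slice-count saving comes from.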
The DSP also includes a 48-bit-wide pattern detector that can be used for convergent or symmetric rounding. The pattern detector is capable of implementing 96-bit-wide logic functions when used in conjunction with the logic unit. The DSP slice provides pipelining and extension capabilities that enhance the speed and efficiency of many applications beyond digital signal processing, such as wide dynamic bus shifters, memory address generators, wide bus multiplexers and memory-mapped I/O register files. The accumulator can also be used as a synchronous up/down counter.
Each 7 series FPGA has up to 24 clock management tiles (CMTs), each consisting of one mixed-mode clock manager (MMCM) and one phase-locked loop (PLL).
Mixed-Mode Clock Manager and Phase-Locked Loop
The MMCM and PLL share many characteristics. Both can serve as a frequency synthesizer for a wide range of frequencies and as a jitter filter for incoming clocks. At the center of both components is a voltage-controlled oscillator (VCO), which speeds up and slows down depending on the input voltage it receives from the phase frequency detector (PFD). There are three sets of programmable frequency dividers: D, M and O. The predivider D, programmable by configuration and afterwards via the dynamic reconfiguration port (DRP), reduces the input frequency and feeds one input of the traditional PLL phase/frequency comparator. The feedback divider M (programmable by configuration and afterwards via DRP) acts as a multiplier because it divides the VCO output frequency before feeding the other input of the phase comparator. D and M must be chosen appropriately to keep the VCO within its specified frequency range. The VCO has eight equally spaced output phases (0°, 45°, 90°, 135°, 180°, 225°, 270° and 315°). Each can be selected to drive one of the output dividers (six for the PLL, O0 to O5, and seven for the MMCM, O0 to O6), each programmable by configuration to divide by any integer from 1 to 128.
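The relationship between the three dividers reduces to two formulas: the VCO runs at the input frequency times M divided by D, and each output is the VCO frequency divided by its own O value. The sketch below checks a candidate setting; the default VCO limits are placeholder assumptions, so take the real range for a given device and speed grade from its data sheet.

```python
def synthesize(f_in_mhz: float, d: int, m: int, o: int,
               vco_range_mhz=(600.0, 1200.0)):
    """Return (f_vco, f_out) in MHz for predivider D, feedback divider M, output divider O.

    vco_range_mhz is an assumed placeholder, not a figure from this article.
    """
    if not 1 <= o <= 128:
        raise ValueError("output divider must be an integer from 1 to 128")
    f_vco = f_in_mhz * m / d
    low, high = vco_range_mhz
    if not low <= f_vco <= high:
        raise ValueError(f"VCO at {f_vco} MHz is outside {low} to {high} MHz")
    return f_vco, f_vco / o
```

For example, a 100-MHz input with D = 1 and M = 10 puts the VCO at 1,000 MHz, and an output divider of 8 yields 125 MHz.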
MMCM Additional Programmable Features
The MMCM can have a fractional counter in either the feedback path (acting as a multiplier) or in one output path. Fractional counters allow noninteger increments of 1/8 and can thus increase frequency synthesis capabilities by a factor of 8. The MMCM can also provide fixed or dynamic phase shift in small increments that depend on the VCO frequency.
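The benefit of 1/8 steps is easiest to see by searching for the multiplier that lands closest to a target frequency. This sketch compares integer and fractional feedback values; the multiplier limits are illustrative assumptions, and the VCO range check is left out for brevity.

```python
def closest_multiplier(f_in_mhz: float, d: int, o: int, target_mhz: float,
                       step: float, m_min: float = 2.0, m_max: float = 64.0):
    """Return (M, f_out) for the feedback value nearest the target output frequency.

    step is 1.0 for an integer counter or 0.125 for a fractional counter.
    m_min and m_max are assumed limits for this sketch.
    """
    count = int((m_max - m_min) / step) + 1
    candidates = [m_min + k * step for k in range(count)]
    best = min(candidates, key=lambda m: abs(f_in_mhz * m / (d * o) - target_mhz))
    return best, f_in_mhz * best / (d * o)
```

With a 100-MHz input, D = 1 and O = 8, a 148.5-MHz target is missed by 1.5 MHz using an integer multiplier of 12, but by only 0.0625 MHz using the fractional value 11.875.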
Every 7 series FPGA provides different types of clock lines (BUFG, BUFR, BUFIO, BUFH, BUFMR and the high-performance clock) to address the different clocking requirements of high fanout, short propagation delay and extremely low skew.
Global Clock Lines
In each 7 series FPGA, 32 global clock lines have the highest fanout and can reach every flip-flop clock, clock enable and set/reset as well as many logic inputs. There are 12 global clock lines within any clock region driven by the horizontal clock buffers (BUFH). Each BUFH can be independently enabled/disabled, allowing for clocks to be turned off within a region, thereby offering fine-grained control over which clock regions consume power. Global clock lines can be driven by global clock buffers, which can also perform glitchless clock multiplexing and clock-enable functions. Global clocks are often driven from the CMT, which can completely eliminate the basic clock distribution delay.
Regional clocks can drive all clock destinations in their region. A region is defined as an area that is 50 I/Os and 50 CLBs high and half the chip width. The 7 series FPGAs have between two and 24 regions, each with four regional clock tracks. Each regional clock buffer can be driven from any of four clock-capable input pins, and its frequency can optionally be divided by any integer from 1 to 8.
I/O clocks are especially fast and serve only I/O logic and serializer/deserializer (serdes) circuits. The 7 series FPGAs have a direct connection from the MMCM to the I/O for low-jitter, high-performance interfaces.
The number of I/O pins varies depending on device and package size. Each I/O is configurable and can comply with a large number of I/O standards. With the exception of the supply pins and a few dedicated configuration pins, all other package pins have the same I/O capabilities, constrained only by certain banking rules. The I/Os in 7 series FPGAs are classed as either high range (HR) or high performance (HP). The HR I/Os offer the widest range of voltage support, from 1.2 V to 3.3 V. The HP I/Os are optimized for highest-performance operation, from 1.2 V to 1.8 V. All I/O pins are organized in banks, with 50 pins per bank. Each bank has one common VCCO output supply, which also powers certain input buffers. Some single-ended input buffers require an internally generated or an externally applied reference voltage (VREF). There are two VREF pins per bank (except configuration bank 0). A single bank can have only one VREF voltage value.
Xilinx 7 series FPGAs use a variety of package types to suit the needs of the user, including small-form-factor wire-bond packages for lowest cost; conventional, high-performance flip-chip packages; and lidless flip-chip packages that balance smaller form factor with high performance. In the flip-chip packages, the silicon device is attached to the package substrate using a high-performance flip-chip process. Controlled ESR discrete decoupling capacitors are mounted on the package substrate to optimize signal integrity under simultaneous switching of outputs (SSO) conditions.
I/O Electrical Characteristics
Single-ended outputs use a conventional CMOS push/pull output structure driving high toward VCCO or low toward ground, and can be put into a high-Z state. The system designer can specify the slew rate and the output strength. The input is always active but is usually ignored while the output is active. Each pin can optionally have a weak pull-up or a weak pull-down resistor.
Any signal pin pair can be configured as a differential input pair or output pair. Differential input pin pairs can optionally be terminated with a 100Ω internal resistor. All 7 series devices support differential standards beyond LVDS: HT, RSDS, BLVDS, differential SSTL and differential HSTL. Each of the I/Os supports memory I/O standards, such as single-ended and differential HSTL as well as single-ended SSTL and differential SSTL. The SSTL I/O standard can support data rates of up to 1,866 Mbits/s for DDR3 interfacing applications.
Three-State Digitally Controlled Impedance and Low-Power I/O Features
Three-state digitally controlled impedance (T_DCI) can control the output drive impedance (series termination) or can provide parallel termination of an input signal to VCCO or split (Thevenin) termination to VCCO/2. This allows users to eliminate off-chip termination for signals using T_DCI. Besides saving board space, the termination automatically turns off when in output mode or when three-stated, saving considerable power compared with off-chip termination. The I/Os also have low-power modes for IBUF and IDELAY to provide further power savings, especially when used to implement memory interfaces.
Input and Output Delay
All inputs and outputs can be configured as either combinatorial or registered. Double data rate (DDR) is supported by all inputs and outputs. Any input and some outputs can be individually delayed by up to 32 increments via the IDELAY and ODELAY features. The number of delay steps can be set by configuration and can also be incremented or decremented while in use.
ISERDES and OSERDES
Many applications combine high-speed, bit-serial I/O with slower parallel operation inside the device. This requires a serializer and deserializer (serdes) inside the I/O structure. Each I/O pin possesses an 8-bit IOSERDES (ISERDES and OSERDES), capable of performing serial-to-parallel or parallel-to-serial conversions with programmable widths of 2, 3, 4, 5, 6, 7 or 8 bits. By cascading two IOSERDES from two adjacent pins (default from differential I/O), wider conversions of 10 and 14 bits can also be supported. The ISERDES has a special oversampling mode capable of asynchronous data recovery for applications like a 1.25-Gbit/s LVDS I/O-based SGMII interface.
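The conversion itself is simple to model in software. The sketch below serializes parallel words into a bit stream and back for any of the widths listed above; the LSB-first bit order and the function names are assumptions made for illustration.

```python
SUPPORTED_WIDTHS = (2, 3, 4, 5, 6, 7, 8, 10, 14)   # 10 and 14 need two cascaded blocks


def serialize(words, width: int):
    """Parallel-to-serial: emit each word's bits, least significant bit first."""
    if width not in SUPPORTED_WIDTHS:
        raise ValueError(f"unsupported width {width}")
    bits = []
    for word in words:
        bits.extend((word >> i) & 1 for i in range(width))
    return bits


def deserialize(bits, width: int):
    """Serial-to-parallel: regroup the bit stream into words of the given width."""
    if width not in SUPPORTED_WIDTHS:
        raise ValueError(f"unsupported width {width}")
    words = []
    for start in range(0, len(bits) - width + 1, width):
        word = 0
        for i, bit in enumerate(bits[start:start + width]):
            word |= bit << i
        words.append(word)
    return words
```

At a conversion ratio of 8, the internal logic handles one word for every eight bits on the pin, so it can run at one-eighth of the serial bit rate.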
Ultrafast serial data transmission to optical modules—between ICs on the same printed-circuit board (PCB), over the backplane or over longer distances—is becoming increasingly popular and important to enable customer line cards to scale to 100 Gbits/s and onward to 400 Gbits/s. This kind of transmission requires specialized, dedicated on-chip circuitry and differential I/O capable of coping with the signal integrity issues at these high data rates.
Transceiver count in the 7 series FPGAs ranges from 0 to 32 transceiver circuits in the Artix-7 and Kintex-7 families and as many as 96 transceiver circuits in the Virtex-7 family. Each serial transceiver is a combined transmitter and receiver. The various 7 series family members offer different top-end data rates. The GTP operates up to 6.6 Gbits/s, the GTX at up to 12.5 Gbits/s, the GTH at up to 13.1 Gbits/s and the GTZ at up to 28.05 Gbits/s. The various 7 series serial transceivers use either a combination of ring oscillators and LC tank or, in the case of the GTZ, a single LC tank architecture to allow the ideal blend of flexibility and performance while enabling IP portability across the family members. Lower data rates can be achieved using FPGA logic-based oversampling. The serial transmitter and receiver are independent circuits that use an advanced PLL architecture to multiply the reference frequency input by certain programmable numbers between 4 and 25 to become the bit-serial data clock. Each transceiver has a large number of user-definable features and parameters that designers can define during device configuration. Many can also be modified during operation.
The transmitter is fundamentally a parallel-to-serial converter with a conversion ratio of 16, 20, 32, 40, 64 or 80. Additionally, the GTZ transmitter supports up to 160-bit data widths. This allows the designer to trade off datapath width for timing margin in high-performance designs. These transmitter outputs drive the PCB with a single-channel differential output signal. TXOUTCLK is the appropriately divided serial data clock and can be used directly to register the parallel data coming from the internal logic. The incoming parallel data is fed through an optional FIFO and has additional hardware support for the 8b/10b, 64b/66b or 64b/67b encoding schemes to provide a sufficient number of transitions. The bit-serial output signal drives two package pins with differential signals. This output signal pair has a programmable signal swing as well as programmable pre- and post-emphasis to compensate for PCB losses and other interconnect characteristics. For shorter channels, the swing can be reduced to curb power consumption.
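Two pieces of arithmetic follow from this description: the parallel-side clock equals the line rate divided by the conversion ratio, and the encoding scheme determines how much of the line rate carries payload. The helper names below are hypothetical, and the efficiencies are simply the ratios implied by the scheme names.

```python
from fractions import Fraction

# Payload bits per transmitted bit for each encoding scheme named in the text.
ENCODING_EFFICIENCY = {
    "8b/10b": Fraction(8, 10),
    "64b/66b": Fraction(64, 66),
    "64b/67b": Fraction(64, 67),
}


def payload_gbps(line_rate_gbps, encoding: str) -> float:
    """Payload rate left after encoding overhead."""
    return float(Fraction(line_rate_gbps) * ENCODING_EFFICIENCY[encoding])


def parallel_clock_mhz(line_rate_gbps: float, width_bits: int) -> float:
    """Clock rate needed on the parallel side for a given conversion ratio."""
    return line_rate_gbps * 1000 / width_bits
```

A 12.5-Gbit/s lane using 8b/10b carries 10 Gbits/s of payload, and a 10-Gbit/s lane with a 40-bit datapath needs a 250-MHz parallel clock; doubling the width to 80 bits halves that clock, which is the width-versus-timing trade-off described above.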
The receiver is fundamentally a serial-to-parallel converter, changing the incoming bit-serial differential signal into a parallel stream of words, each 16, 20, 32, 40, 64 or 80 bits. Additionally, the GTZ receiver supports data widths of up to 160 bits. This allows the FPGA designer to trade off internal datapath width vs. logic timing margin. The receiver takes the incoming differential data stream, feeds it through programmable linear and decision feedback equalizers (to compensate for PC board and other interconnect characteristics) and uses the reference clock input to initiate clock recognition. There is no need for a separate clock line. The data pattern uses nonreturn-to-zero (NRZ) encoding and optionally guarantees sufficient data transitions by using the selected encoding scheme. Parallel data is then transferred into the FPGA logic using the RXUSRCLK clock. For short channels, the transceivers offer a special low-power mode (LPM) to reduce power consumption by approximately 30 percent.
The transceivers provide out-of-band signaling, often used to send low-speed signals from the transmitter to the receiver while high-speed serial data transmission is not active. This is typically done when the link is in a powered-down state or has not yet been initialized. This benefits PCI Express® and SATA/SAS applications.
All 7 series devices with transceivers include at least one integrated block for PCI Express technology that can be configured as an endpoint or root port, compliant to the PCI Express Base Specification Revision 2.1 or 3.0. The root port can be used to build the basis for a compatible root complex, to allow custom FPGA-to-FPGA communication via the PCI Express protocol and to attach ASSP endpoint devices, such as Ethernet controllers or Fibre Channel host bus adapters (HBAs) to the FPGA.
This block is highly configurable to system design requirements and can operate with one, two, four or eight lanes at data rates of 2.5, 5 and 8 Gbits/s. For high-performance applications, advanced buffering techniques offer a flexible maximum payload size of up to 1,024 bytes. This integrated block interfaces to the integrated high-speed transceivers for serial connectivity and to block RAMs for data buffering. Combined, these elements implement the physical, data link and transaction layers of the PCI Express protocol.
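As a rough guide to what these lane counts and rates mean for throughput, the sketch below computes the per-direction link bandwidth after line encoding. It relies on the standard PCI Express encodings (8b/10b at 2.5 and 5 Gbits/s, 128b/130b at 8 Gbits/s) and ignores packet and protocol overhead, so real payload throughput is lower.

```python
# Line encoding per lane rate in Gbit/s: (payload bits, transmitted bits).
PCIE_ENCODING = {2.5: (8, 10), 5.0: (8, 10), 8.0: (128, 130)}


def link_gbytes_per_s(lanes: int, rate_gbps: float) -> float:
    """Per-direction bandwidth in Gbytes/s after line encoding, before protocol overhead."""
    if lanes not in (1, 2, 4, 8):
        raise ValueError("lane count must be 1, 2, 4 or 8")
    payload_bits, line_bits = PCIE_ENCODING[rate_gbps]
    return lanes * rate_gbps * payload_bits / line_bits / 8
```

An eight-lane link at 5 Gbits/s works out to 4 Gbytes/s in each direction, and the same width at 8 Gbits/s to a little under 8 Gbytes/s.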
Xilinx provides a lightweight, configurable, easy-to-use LogiCORE™ IP wrapper that ties the various building blocks (the integrated block for PCI Express, the transceivers, block RAM and clocking resources) into an endpoint or root port solution. The system designer has control over many configurable parameters: lane width, maximum payload size, FPGA logic interface speeds, reference clock frequency and base address register decoding and filtering.
Xilinx offers two wrappers for the integrated block: AXI4-Stream and AXI4 (memory mapped). Legacy TRN/Local Link is not available in 7 series devices for the integrated block for PCI Express. AXI4-Stream is designed for existing customers of the integrated block and enables easy migration from TRN. AXI4 (memory mapped) is designed for the Xilinx Platform Studio/EDK design flow and MicroBlaze processor-based designs.
Xilinx 7 series FPGAs store their customized configuration in SRAM-type internal latches. The number of configuration bits is between 5 and 431 Mbits (0.6 to 54 Mbytes), depending on device size but independent of the specific user-design implementation, unless compression mode is used. The configuration storage is volatile and must be reloaded whenever the FPGA is powered up. This storage can also be reloaded at any time by pulling the PROGRAM_B pin low. Several methods and data formats for loading configuration are available, determined by the three mode pins.
The SPI interface (x1, x2 and x4 modes) and the BPI interface (parallel-NOR x8 and x16) are two common methods used for configuring the FPGA. Users can directly connect an SPI or BPI flash to the FPGA, and the FPGA's internal configuration logic reads the bitstream out of the flash and configures itself. The FPGA automatically detects the bus width on the fly, eliminating the need for any external controls or switches. The larger bus widths increase configuration speed and reduce the amount of time it takes for the FPGA to start up after power-on.
In master mode, the FPGA can drive the configuration clock from an internally generated clock. Alternatively, for higher-speed configuration, the FPGA can use an external configuration clock source. This allows high-speed configuration with the ease of use characteristic of master mode. Slave modes up to 32 bits wide are especially useful for a processor-driven configuration.
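The effect of bus width and clock rate on startup time can be estimated with a one-line calculation. The sketch assumes one bus-width word is transferred per configuration clock cycle and ignores flash read latency and startup sequencing; the 50-MHz clock in the example is an arbitrary illustrative value, not a device specification.

```python
def config_time_s(bitstream_mbits: float, bus_width_bits: int, cclk_mhz: float) -> float:
    """Estimated load time, assuming one bus-width word per configuration clock cycle."""
    return bitstream_mbits / (bus_width_bits * cclk_mhz)


def mbits_to_mbytes(mbits: float) -> float:
    """Convert a bitstream size from Mbits to Mbytes."""
    return mbits / 8
```

Under these assumptions, the largest bitstream mentioned above (431 Mbits) would take about 8.6 seconds over x1 SPI at 50 MHz, but roughly half a second over x16 BPI at the same clock.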
The FPGA has the ability to reconfigure itself with a different image using SPI or BPI flash, eliminating the need for an external controller. The FPGA can reload its original design in case there are any errors in the data transmission, ensuring an operational FPGA at the end of the process. This is especially useful for updates to a design after the end product has been shipped. Customers can ship their products with an early version of the design, thus getting to market faster. This feature allows customers to keep their end users current with the most up-to-date designs while the product is already in the field.
The dynamic reconfiguration port gives the system designer easy access to the configuration and status registers of the MMCM, PLL, XADC, transceivers and integrated block for PCI Express. The DRP behaves like a set of memory-mapped registers, accessing and modifying block-specific configuration bits as well as status and control registers.
Encryption, Readback and Partial Reconfiguration
In all 7 series FPGA devices the FPGA bitstream, which contains sensitive customer IP, can be protected with 256-bit AES encryption and HMAC/SHA-256 authentication to prevent unauthorized copying of the design. The FPGA performs decryption on the fly during configuration using an internally stored 256-bit key. This key can reside in battery-backed RAM or in nonvolatile eFUSE bits.
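The authentication half of this scheme can be illustrated with Python's standard library. The sketch computes and checks an HMAC/SHA-256 tag over an arbitrary byte string; it shows the primitive only, not the 7 series bitstream format or its key handling, and AES decryption is left out because the standard library does not provide it.

```python
import hashlib
import hmac


def sign(key: bytes, image: bytes) -> bytes:
    """Compute an HMAC/SHA-256 tag over a configuration image."""
    return hmac.new(key, image, hashlib.sha256).digest()


def verify(key: bytes, image: bytes, tag: bytes) -> bool:
    """Check the tag in constant time; any change to the image or key fails."""
    return hmac.compare_digest(sign(key, image), tag)


# Hypothetical 256-bit key and stand-in image bytes, for illustration only.
demo_key = bytes(range(32))
demo_image = b"example configuration data"
demo_tag = sign(demo_key, demo_image)
```

Because the tag depends on both the key and every byte of the image, a modified or substituted image is rejected even if the attacker can see valid images and tags.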
Most configuration data can be read back without affecting the system's operation. Typically, configuration is an all-or-nothing operation. But Xilinx 7 series FPGAs support partial reconfiguration, an extremely powerful and flexible feature that allows the user to change portions of the FPGA while other portions remain static. Users can time-slice these portions to fit more IP into smaller devices, saving cost and power. Where applicable in certain designs, partial reconfiguration can greatly improve the versatility of the FPGA.
Xilinx 7 series FPGAs deliver Agile Mixed Signal technology: customized analog with FPGA flexibility. All Xilinx 7 series FPGAs contain a general-purpose analog interface called the XADC, which builds upon the successful system monitor found in previous generations of Virtex FPGAs. The XADC contains two 12-bit analog-to-digital converters, on-chip sensors and external analog input channels. The 12-bit ADCs support sample rates of up to 1 million samples per second and can simultaneously sample two external-input analog channels. The 7 series FPGAs support up to 17 external analog input channels. The ADCs support a diverse range of applications that need to process analog signals with bandwidths of less than 500 kHz.
The XADC optionally uses an on-chip reference circuit, thereby eliminating the need for any external active components for basic on-chip monitoring of temperature and power supply rails. To achieve the full 12-bit performance of the ADCs, an external 1.25-V reference IC is recommended. The on-chip temperature and power supplies are monitored with a measurement accuracy of ±4°C and ±1 percent, respectively, using either reference source. By default, the XADC continuously digitizes the output of all on-chip sensors. The most recent measurement results (together with maximum and minimum readings) are stored in dedicated registers for access at any time via on-chip or external JTAG interfaces. User-defined alarm thresholds can automatically indicate over-temperature events and unacceptable power supply variation. A user-specified limit (for example, 100°C) can be used to initiate an automatic powerdown.
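The monitoring behavior described here (latest value, running minimum and maximum, alarm and powerdown limits) can be sketched as a small state holder. The class and attribute names are hypothetical, and the 85°C alarm threshold in the usage lines is an arbitrary example; only the 100°C powerdown limit echoes the text.

```python
class SensorMonitor:
    """Tracks latest, minimum and maximum readings against user-defined limits."""

    def __init__(self, alarm_limit: float, powerdown_limit: float):
        self.alarm_limit = alarm_limit
        self.powerdown_limit = powerdown_limit
        self.latest = None
        self.minimum = None
        self.maximum = None

    def record(self, value: float) -> None:
        """Store a new measurement and update the running extremes."""
        self.latest = value
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)

    @property
    def alarm(self) -> bool:
        return self.latest is not None and self.latest > self.alarm_limit

    @property
    def powerdown(self) -> bool:
        return self.latest is not None and self.latest >= self.powerdown_limit


# Hypothetical temperature trace in degrees Celsius.
monitor = SensorMonitor(alarm_limit=85.0, powerdown_limit=100.0)
for reading in (40.0, 90.0, 60.0):
    monitor.record(reading)
```

After this trace the alarm is clear again because the latest reading is back below the threshold, while the stored maximum still records the 90°C excursion.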
Xilinx’s 7 series FPGAs are an innovative new family of FPGAs, providing industry-leading I/O bandwidth capability, a breakthrough reduction in power consumption and class-leading DSP performance. Built on the fourth-generation ASMBL columnar architecture, all 7 series FPGA families share a single, unified architecture. This unified approach shortens development time for customers and simplifies the process of migrating designs and IP from Virtex-6 and Spartan-6 FPGA families, while enabling new designs to easily span across all 7 series FPGAs with minimal effort. Adding the innovative stacked-silicon interconnect technology into the mix enables Xilinx to provide all these benefits on the largest FPGAs ever built.
About the author
Nick Mehta is a staff product marketing engineer in the technical marketing team at Xilinx. Having joined Xilinx in 2000, he has held a number of roles across different organizations and business units, authoring many documents, writing and delivering customer training classes, and presenting at seminars around the world. Nick received an honors degree in electrical and electronic engineering from the University of Leicester, U.K., in 1999.