Before we start, I should note that this column grew out of a detailed case study that was conducted while I worked at SilMinds on the productization of a set of patented, high-performance decimal floating-point arithmetic IP cores. These cores were being evaluated as a potential real-time FPGA solution for accelerating accuracy-demanding, computationally intensive high-frequency trading (HFT) platforms.
The rise of HFT in capital markets is creating strong FPGA use cases in networking, messaging, and financial computing acceleration. Stakeholders include institutional and proprietary investors, exchanges, electronic communication networks (ECNs) offering 24×7 exchange-like services, brokerages, and third-party market data providers. In both the US and UK markets, HFT has averaged approximately 60% of equity trading volume over the past five years.
Stimulated by sub-millisecond buy and sell trade orders, these entities are engaged in a speed race to cut market data round-trip latency. Traders demand fresher market data to enable better-qualified order executions that reach the exchange’s order matching engine faster than those of competitors. Various latencies, within and across platforms, are being pushed into the microsecond and sub-microsecond range.
HFT servers are co-located at the exchange/ECN premises, connected to the house LANs through lossless, high-performance Ethernet switches, which are often optimized for HFT operations. Areas for hardware acceleration include network stack, messaging protocols, and raw market data processing. With respect to traders’ platforms, additional areas include risk, order execution, and performance management, as illustrated below.

Several HFT operational characteristics demonstrate specific merit with regard to “reconfigurable” hardware acceleration as follows:
- HFT is sensitive to both latency and jitter. Algorithms work better with deterministic latency segments. Hardware implementations provide the needed determinism.
- HFT algorithms are highly proprietary and need to be reconfigured fairly frequently. Other traders are always monitoring orders at the exchange engine end (public data) and applying sophisticated analytical algorithms to infer issuers’ identities and the types and parameters of their execution algorithms. Particularly at the trader’s end, there are few template functions, so hardware suppliers that target HFT platforms work more as providers of custom-optimized FPGA solutions.
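To make the distinction between latency and jitter in the list above concrete, here is a minimal Python sketch. The sample values, the names `software_path_ns` and `hardware_path_ns`, and the `latency_stats` helper are all hypothetical and chosen purely for illustration; they are not measurements from any real platform.

```python
import statistics


def latency_stats(samples_ns):
    """Summarize latency samples (in nanoseconds).

    Mean latency describes how slow a path is on average; jitter describes
    how much individual samples deviate from that average. Two common jitter
    measures are reported: peak-to-peak spread and standard deviation.
    """
    return {
        "mean_ns": statistics.fmean(samples_ns),
        "peak_to_peak_ns": max(samples_ns) - min(samples_ns),
        "stdev_ns": statistics.pstdev(samples_ns),
    }


# Hypothetical samples: a software path with occasional long stalls,
# and a hardware path whose latency barely moves.
software_path_ns = [4200, 3900, 9800, 4100, 4000, 15500, 4050, 3950]
hardware_path_ns = [910, 912, 909, 911, 910, 910, 912, 910]

for name, samples in [("software", software_path_ns),
                      ("hardware", hardware_path_ns)]:
    stats = latency_stats(samples)
    print(f"{name}: mean={stats['mean_ns']:.1f} ns, "
          f"peak-to-peak={stats['peak_to_peak_ns']} ns, "
          f"stdev={stats['stdev_ns']:.1f} ns")
```

In this made-up data, the hardware path is not only faster on average but also has a far narrower spread, which is the property a trading algorithm relies on when it assumes deterministic latency segments.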
Now, let’s consider various aspects of high-frequency trading, along with the associated challenges, in a little more detail…
Network and messaging latency reduction
There are three sources of network latency: LAN switches, the software network (TCP/UDP) stack, and the radio or fiber metro links that are set up by traders operating across multiple venues. Messaging latencies are a result of software decoding and encoding of application protocols, most commonly FIX (Financial Information eXchange). There are also functions such as multiplexing and message filtering by customer, venue, or traded security symbol. The overall function is referred to as “Feed Handling” or “Market Data Acquisition.”
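As a rough illustration of the software work involved in feed handling, the following Python sketch decodes FIX tag=value messages and filters them by security symbol. It is a simplified sketch under stated assumptions, not a production decoder: the helper names `parse_fix` and `filter_by_symbol` and the sample messages are hypothetical, the messages omit BodyLength (tag 9) and CheckSum (tag 10), and repeating groups are not handled. The tag numbers used (35 MsgType, 55 Symbol, 54 Side, 38 OrderQty, 44 Price) and the SOH delimiter are standard FIX.

```python
SOH = "\x01"  # standard FIX field delimiter (ASCII 0x01)

TAG_MSG_TYPE = 35  # MsgType
TAG_SYMBOL = 55    # Symbol


def parse_fix(raw):
    """Split a tag=value FIX message into a {tag: value} dict.

    Simplification: a repeated tag (as in repeating groups) overwrites the
    earlier value, and BodyLength/CheckSum are not validated.
    """
    fields = {}
    for part in raw.split(SOH):
        if not part:
            continue  # skip the empty piece after the trailing delimiter
        tag, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed FIX field: {part!r}")
        fields[int(tag)] = value
    return fields


def filter_by_symbol(raw_messages, symbols):
    """Decode each message and keep those whose Symbol is in `symbols`."""
    wanted = set(symbols)
    parsed = (parse_fix(raw) for raw in raw_messages)
    return [msg for msg in parsed if msg.get(TAG_SYMBOL) in wanted]


# Hypothetical, abbreviated messages (35=D is a New Order Single).
feed = [
    SOH.join(["8=FIX.4.2", "35=D", "55=IBM", "54=1", "38=100", "44=125.50"]) + SOH,
    SOH.join(["8=FIX.4.2", "35=D", "55=MSFT", "54=2", "38=250", "44=31.20"]) + SOH,
    SOH.join(["8=FIX.4.2", "35=D", "55=IBM", "54=2", "38=75", "44=125.55"]) + SOH,
]

ibm_only = filter_by_symbol(feed, ["IBM"])
for msg in ibm_only:
    print(msg[TAG_SYMBOL], msg[38], msg[44])
```

Every message here is split and converted field by field before the filter can even look at the symbol. That per-message string handling is the kind of work an FPGA feed handler performs in a fixed-latency pipeline instead.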
Cisco’s CPU-based switches achieve, at best, a latency on the order of 250 ns. Cisco’s Nexus 7000 series features an FPGA upgrade option that lets users load their own image onto the configurable device (e.g., a FIX/TCP offload).
xCelor’s XPM2 is a media-agnostic FPGA-based switch with an optional motherboard. Switching latency averages 2.5 ns through the FPGA, in contrast to 200+ ns through the Xeon processor. It can deliver outbound orders to the exchange engine at 90 ns latency and zero jitter, with port multiplexing and multicast filtering.
Enyx offers a comprehensive line of FPGA-based appliances featuring multi-venue distribution, filtering by symbol and other parameters, and bandwidth-optimized multiplexed Ethernet over radio or dark fiber interconnecting venues within a metro area, improving bandwidth utilization by approximately 40% and overcoming radio link reliability issues.




I'd be very interested to hear more about the "high-performance, decimal floating-point arithmetic IP cores" you mention in this article.