Sometimes it's hard to explain how much faster software logic can run when it is deployed in hardware. In a clever benchmarking experiment that incorporates simple logic refactoring, artificial intelligence, and machine compilation from software to FPGA hardware, a Clemson student took a popular board game and turbo-charged it 10x in an Altera Cyclone FPGA.
"Blokus-Duo" is a two-player strategy-based board game. It is a compact version of the original Blokus game. Its small size, progressive logic, and strategy-based approach makes Blokus-Duo a highly suitable application for performance evaluation of various Artificial Intelligence (AI) techniques on limited resource platforms like smaller Field Programmable Gate Arrays (FPGAs).
This implementation involves each player running an AI routine, either on an FPGA or a computer, with a host machine coordinating the two players. The players communicate with the host machine using an RS-232C (D-Sub 9-pin) serial port interface. Each move is transmitted as a code, which is subsequently decoded by the host machine. The host machine then informs the other player of its opponent's move. Besides coordinating, the host machine monitors the status of the game including violations by any player as well as final outcomes.
A minimax-based game tree algorithm is employed with alpha-beta pruning for the AI routine. The Altera DE2 115 development board with a Cyclone IV E FPGA is used as the target platform for the implementation.
Altera DE2 board.
The Altera DE2 board is a common academic "workhorse" due to its low cost relative to the processing power it provides. The AI routines are developed using Impulse C -- a High-Level Synthesis (HLS) tool for FPGA-based applications -- which converts code written in a subset of the C programming language to a Hardware Description Language (HDL) like Verilog and VHDL.
The resulting HDL code can then be further synthesized into bitstreams for programming FPGAs using proprietary tools provided by corresponding FPGA vendors. The ability to generate FPGA bitstreams from high-level C code greatly reduces both the learning curve and development time.
The minimax algorithm searches a game tree of available moves for the best possible move as evaluated by a heuristic function. The depth of the search tree depends on the time limit for making a move. Each branch of the tree represents a sequence of possible moves. A heuristic function, which evaluates the state of the game for every branch, returns a value denoting the preference of that state for the player making the move.
The algorithm tries to maximize the heuristic value for the player making the move, while -- at the same time -- minimizing it for the opponent; hence the name minimax. The algorithm is further optimized using alpha-beta pruning, whereby some of the branches are not searched because they cannot lead to better moves than the current best move.
The FPGA design includes a UART to communicate with a host system that monitors the game. The UART is designed in Verilog HDL. The AI part of the application that consists of a minimax-based game tree algorithm is implemented using C. This is subsequently machine-compiled to generate multiple streaming processes running on the FPGA. The AI part interfaces with the UART using a streaming interface. The high level design (written in C) consists of parallel computational blocks called processes, which communicate using streaming interfaces that are provided as library components.
Architecture of FPGA implementation for Blokus-Duo AI.
The implementation involves three layers of processes as illustrated above. A Level-1 process communicates with the UART and feeds data to four Level-2 processes. Each of these Level-2 processes in turn feeds data to five Level-3 processes. The Level-3 processes evaluate the heuristic function and hence constitute the most critical section in terms of performance. The number of processes at each level is optimized so to balance the load and ensure that all the processes are optimally utilized.
To Page 2 >