This paper presents an FPGA-based accelerated solution for DNA sequencing and dot plotting. It describes how multiple FPGA devices can be deployed to create a scalable cluster dedicated to the task of analyzing large amounts of data, and how this clustered hardware application can be connected to a software application for visualization and analysis.
It also discusses parallel FPGA optimizations, and demonstrates how higher-level programming methods, using the C language, can speed the development of this and other types of highly parallel algorithms.
DNA Sequencing and Dot Plots
DNA sequencing and analysis are key components of modern medical science. DNA sequencing is indispensable in basic research as well as in practical applications such as pharmaceutical development, disease prevention and criminal forensics.
DNA sequencing is just one step in the process of bioinformatics analysis. Managing and understanding sequencing results, and comparing genetic data, are critical to making bioinformatics a practical technology.
A dot plot is a graphical visualization tool for comparing two biological sequences. It provides an easy way to understand a large amount of information about the relationship between two sequences, and serves as a framework for further analysis.
The most basic dot plot is a comparison of every nucleotide in one DNA sequence to every nucleotide in the other. These point-to-point comparisons are viewed as a 2-dimensional grid of dots, as shown in Figure 1. Each sequence is placed on an axis of the grid, and a dot is drawn at each point in the grid at which the corresponding nucleotides in the two sequences are equal. When you look at this image of dots (the dot plot), its lines, blocks, and other patterns clearly reveal the similarities between the two DNA sequences.
Figure 1. A DNA sequence dot plot.
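The point-to-point comparison described above can be sketched in plain C. This is a minimal software illustration only, not the FPGA implementation discussed in this paper. The function names dotplot_fill and dotplot_print, the row-major byte grid, and the '*' and '.' output characters are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: fill a row-major grid of strlen(a) rows by
 * strlen(b) columns with 1 where a[i] == b[j] and 0 elsewhere.
 * The caller must provide at least strlen(a) * strlen(b) bytes.
 * Returns the number of dots (matching positions) found. */
size_t dotplot_fill(const char *a, const char *b, unsigned char *grid)
{
    assert(a != NULL && b != NULL && grid != NULL);

    size_t rows = strlen(a);
    size_t cols = strlen(b);
    size_t dots = 0;

    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < cols; j++) {
            /* Each cell is an independent comparison, which is what
             * makes the problem a natural fit for parallel hardware. */
            unsigned char match = (unsigned char)(a[i] == b[j]);
            grid[i * cols + j] = match;
            dots += match;
        }
    }
    return dots;
}

/* Hypothetical helper: print the grid as text, with sequence b along
 * the top, sequence a down the left side, '*' for a dot and '.' for
 * an empty cell. */
void dotplot_print(const char *a, const char *b,
                   const unsigned char *grid, FILE *out)
{
    size_t rows = strlen(a);
    size_t cols = strlen(b);

    fprintf(out, "  %s\n", b);
    for (size_t i = 0; i < rows; i++) {
        fputc(a[i], out);
        fputc(' ', out);
        for (size_t j = 0; j < cols; j++) {
            fputc(grid[i * cols + j] ? '*' : '.', out);
        }
        fputc('\n', out);
    }
}
```

To use the sketch, allocate a buffer of strlen(a) * strlen(b) bytes, call dotplot_fill, then pass the same buffer to dotplot_print with stdout. Comparing a sequence against itself puts a dot in every cell of the main diagonal, and shared subsequences appear as shorter diagonal runs. The work grows with the product of the two sequence lengths, and no comparison depends on any other.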