While the role of DNA as a biological memory is well established, exploring its potential as a data memory is relatively new. DNA data memory has not yet reached the stage where a blob of DNA can have some wires attached to it to write and read its data content, but good progress has been made.
In their latest work, published in Science magazine, Yaniv Erlich and Dina Zielinski from Columbia University and the New York Genome Center mixed some clever biochemistry with some leading-edge data communications encoding techniques and added a dash of processing power. The result, under the heading of “DNA Fountain,” is a demonstration of the ability to use DNA to store a complete operating system of 1.4 Mbytes, a movie and other files, for a total of more than 2 Mbytes.
This has become possible because the same work also brings a new level of efficiency and reliability to the technique. If DNA data memory must have an acronym to fit it into the SRAM, DRAM, NVRAM memory spectrum, then biologic archival read rarely memory (BARRM) might be one choice.
As illustrated in Figure 1, within the DNA helix each cross-linking nucleotide (nt) contains one of the four nucleobases (bases): adenine (A), cytosine (C), guanine (G) or thymine (T). The ability to place them selectively in order along a DNA helix backbone offers the possibility of a binary data memory storing two bits per base or nucleotide (i.e. 00, 01, 10 and 11). The bonds between the bases linking the DNA spiral backbones are hydrogen bonds, two for an A-T pair and three for a G-C pair.
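To make the two-bits-per-base idea concrete, here is a minimal Python sketch. The particular assignment of bit pairs to bases (00 to A, 01 to C, 10 to G, 11 to T) and the function names are assumptions chosen for illustration; any fixed one-to-one assignment would carry the same two bits per base.

```python
# Assumed mapping for illustration: 00 -> A, 01 -> C, 10 -> G, 11 -> T.
BASES = "ACGT"

def bytes_to_bases(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)

def bases_to_bytes(seq: str) -> bytes:
    """Inverse of bytes_to_bases; the length must be a multiple of four."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)

print(bytes_to_bases(b"Hi"))  # CAGACGGC
```

Each byte becomes four bases, so a file of n bytes would need 4n bases before any redundancy or addressing information is added.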
It is suggested that DNA would offer an eye-catching memory data density of 215 petabytes per gram of DNA, orders of magnitude higher than previous reports.
At its core, the DNA data memory methodology relies on a technique used in data communications: instead of repeating the transmission when an erroneous piece of a data stream is received, the sender transmits enough redundant encoded data to allow the correct data to be extracted from whatever arrives. The technique is based on what are called “Fountain” codes.
Fountain codes allow data (such as a file) to be divided into an unlimited number of encoded pieces, in a form which allows them to be reassembled into the original file from any subset of the encoded pieces, provided that the pieces received add up to a little more than the size of the original file.
In data communications, and now for memory, a “fountain” of suitably encoded data is fired at a receiver, which is able to reassemble the file by catching enough “droplets” (the bits of encoded data). It is immaterial which bits of encoded data are received or missed. The water analogy using “fountains,” “droplets” and “buckets” is now part of the language of these techniques. A bucket full of droplets will give you enough information to extract the original data.
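The droplet idea can be sketched in a few lines of Python. This is a toy illustration, not the scheme used in the DNA Fountain work: it assumes the file is cut into k equal-sized segments, that each droplet is the XOR of a random subset of segments chosen from a seed carried with the droplet, and that subset sizes follow the textbook ideal soliton distribution. All function names are hypothetical.

```python
import random

def soliton_degree(rng, k):
    """Draw a subset size from the ideal soliton distribution over 1..k."""
    weights = [1.0 / k] + [1.0 / (d * (d - 1)) for d in range(2, k + 1)]
    return rng.choices(range(1, k + 1), weights=weights)[0]

def droplet_indices(seed, k):
    """Segments mixed into the droplet with this seed (reproducible from the seed)."""
    rng = random.Random(seed)
    return set(rng.sample(range(k), soliton_degree(rng, k)))

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def make_droplet(segments, seed):
    """A droplet is the pair (seed, XOR of the chosen segments)."""
    payload = bytes(len(segments[0]))
    for i in droplet_indices(seed, len(segments)):
        payload = xor(payload, segments[i])
    return seed, payload

def decode(droplets, k):
    """Peeling decoder: return the k segments, or None if more droplets are needed."""
    pending = [(droplet_indices(seed, k), payload) for seed, payload in droplets]
    solved = {}
    progress = True
    while progress and len(solved) < k:
        progress = False
        remaining = []
        for indices, payload in pending:
            # Remove the contribution of segments that are already known.
            for i in [i for i in indices if i in solved]:
                payload = xor(payload, solved[i])
            indices = {i for i in indices if i not in solved}
            if len(indices) == 1:
                solved[indices.pop()] = payload  # droplet now reveals one segment
                progress = True
            elif indices:
                remaining.append((indices, payload))
        pending = remaining
    return [solved[i] for i in range(k)] if len(solved) == k else None
```

A receiver would keep adding caught droplets to its bucket and calling decode until it stops returning None. Which seeds were lost along the way does not matter, only that enough droplets arrive.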
Fountain codes are only a part of the method of changing a binary data stream into a form suitable for translation into strands of DNA. This latest work adds a new twist which accommodates the special stability needs of a potential DNA data memory: it emphasises the most desirable links and removes undesirable features, such as too many G-C links and long runs of the same base, the latter called homopolymer runs (TTTT...).
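A screening step of this kind could look like the following Python sketch, in which a candidate sequence is rejected if its G-C content is out of range or if it contains a long homopolymer run. The thresholds (runs of at most three identical bases, G-C content between 45% and 55%) and the function names are illustrative assumptions, not values taken from this article.

```python
def max_homopolymer_run(seq: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    longest = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        longest = max(longest, run)
        prev = base
    return longest

def gc_fraction(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)

def passes_screen(seq: str, max_run: int = 3,
                  gc_min: float = 0.45, gc_max: float = 0.55) -> bool:
    """True if the sequence meets the assumed stability limits."""
    if not seq:
        return False
    if max_homopolymer_run(seq) > max_run:
        return False
    return gc_min <= gc_fraction(seq) <= gc_max

print(passes_screen("ACGTACGT"))  # True: balanced G-C content, no repeats
print(passes_screen("AAAATCGC"))  # False: run of four A bases
```

Because a fountain code can produce an effectively unlimited supply of droplets, a candidate that fails such a screen can simply be discarded and another generated in its place.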
The target of the memory “Write” process is to turn the original data stream into a series of DNA oligonucleotides, or “oligos” for short. These can be sent to a company specializing in the manufacture of DNA to order, which returns a small ampoule of the data-encoded DNA.