LONDON – Venray Technology Ltd. (Dallas, Texas) has proposed building a small microprocessor on a DRAM process to save power.
The argument has been made before: because cache transistors outnumber logic transistors on modern processors, a superior design could result from simplifying the processor and optimizing for memory by building the logic in a DRAM process. The idea has not yet gained traction, although Micron Technology Inc. (Boise, Idaho) looked at it in the early 2000s.
Venray, founded in 2007, argues that such a processor can reduce power consumption to between one-fifth and one-twentieth of an ARM or Intel Atom processor, at one-fifth to one-tenth the cost. At the same time, Venray claims performance benefits from being able to connect CPUs and on-chip main memory through ultra-wide buses.
However, it is not clear which applications the Venray architecture is optimized for, although the company's website speaks of improving mobile phone and tablet computer size, weight and performance.
Venray has developed the TOMI architecture (Thread-optimized Multiprocessor Instruction) and designed the Aurora SoC and Borealis chips but does not appear to have taken them to silicon.
Aurora is a four-core processor with 64 Mbytes of memory; Borealis is an eight-core processor with 1 Gbyte of memory. The four-core Aurora has a projected cost of less than $1. Aurora has been designed for a 110-nm DRAM process, and at a 500-MHz clock frequency each CPU would consume 23 mW, Venray claims.
The Moore patents are pale echoes of real, shipped products: the transputer family from Inmos (true, those weren't built in a DRAM process, but...).
And even before that, there was ICL's Distributed Array Processor, which put a tiny single-bit processor in each (rather small at the time) DRAM chip, so you got thousands of processors for no incremental increase in chip cost. The machine ran as a SIMD device, of course.
DRAM needs refresh to top up leaking capacitors used in the memory matrix. The logic itself needs no refresh.
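The refresh point above can be put in rough numbers. A minimal back-of-envelope sketch, assuming common JEDEC-style DDR parameters (64 ms retention window, 8192 refresh commands per window, ~110 ns per refresh) rather than anything specific to Venray's parts:

```python
# Back-of-envelope DRAM refresh overhead.
# All parameters are typical JEDEC-style values, assumed for illustration.

refresh_window_ms = 64.0   # every row must be refreshed within this window
rows_per_window = 8192     # refresh commands issued per window
t_rfc_ns = 110.0           # assumed time one refresh command occupies the bank

busy_ns = rows_per_window * t_rfc_ns          # time spent refreshing per window
window_ns = refresh_window_ms * 1e6           # window length in nanoseconds
overhead = busy_ns / window_ns                # fraction of bank time lost

print(f"Refresh overhead: {overhead:.2%}")    # ~1.4% of bank time
```

The takeaway matches the comment: refresh is a small, fixed tax on the memory array, and the CPU logic embedded alongside it pays none of it.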
Looking at the commodity price of DRAM, this is an attractive approach if the logic content is relatively low and DRAM access time is not a problem.
Combining logic with memory has always been a trade-off. The two have different process needs, so you end up paying more for the process on the part of the chip that doesn't need it. Memory also doesn't shrink as easily as CMOS logic, which can hold you back as you work to get the memory into the next node. A simpler process can save a lot of manufacturing cost and time, but the logic will be lower in performance. I haven't studied it, but I would assume this processor has been optimized to still be competitive, and that it is aimed at applications that don't need the highest performance.
David Patterson's iRAM is probably the best-known previous effort, although the first known description of a CPU in DRAM dates from 1989: http://www.pat2pdf.org/patents/pat5440749.pdf (see Fig. 9).
iRAM, Exacube, Gilgamesh, Cyclops, and others embedded DRAM in logic processes.
TOMI embeds CPU cores in existing DRAMs. The primary benefit is a 500x advantage in cost: http://www.edn.com/photo/294/294788-microprocessor_vs_memory_transistors_graph.jpg
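The cost argument in the linked graph is about dollars per transistor. A hedged illustration with round-number placeholder figures (these are assumptions for arithmetic, not measured prices or real die data):

```python
# Illustrative cost-per-transistor comparison between a commodity DRAM die
# and a logic/CPU die. All four figures below are assumed placeholders.

dram_die_cost = 1.0        # assumed cost of a commodity DRAM die, USD
dram_transistors = 2e9     # assumed transistor count of that die

logic_die_cost = 20.0      # assumed cost of a logic/CPU die, USD
logic_transistors = 100e6  # assumed transistor count of that die

dram_cost_per_t = dram_die_cost / dram_transistors
logic_cost_per_t = logic_die_cost / logic_transistors
ratio = logic_cost_per_t / dram_cost_per_t

print(f"DRAM:  {dram_cost_per_t:.2e} $/transistor")
print(f"Logic: {logic_cost_per_t:.2e} $/transistor")
print(f"Ratio: {ratio:.0f}x")
```

With these placeholder numbers the ratio comes out in the hundreds, which is the order of magnitude the comment's "500x" claim is appealing to: transistors on a dense, regular DRAM array are vastly cheaper than transistors on a logic die.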
I recall Ivan Sutherland proposing "logic in memory" systems in the late 1970s; we only had two metal layers then. More recently, David Patterson's iRAM project proposed vector-style computing with processing and memory on the same chip. This sort of chip is most likely to work with multi-threading approaches similar to the Sun Niagara, targeting applications with significant thread-level parallelism, such as web serving or databases.