Design Article
Improved memory throughput using serial NOR flash - part 1
Cliff Zitlaw, Spansion Inc.
6/11/2012 2:06 PM EDT
System-level improvements
Direct CPU read accesses
Host systems have historically interfaced with SPI flash memories through an integrated peripheral SPI controller. To accomplish a read operation, the host software will:
1. Load the target address into the controller address register
2. Load the read command into the command register
a. Loading the command register automatically initiates the read transaction on the SPI bus
3. Poll the SPI controller status register until the target data has been output by the SPI memory and captured by the host SPI controller
4. Extract the target data from the data buffer in the SPI controller
This process is dramatically slower than a read from a parallel flash memory device, where the external memory is mapped directly into the CPU address space. In the parallel-usage case, the CPU simply reads from the target address, and the data is returned without the need for additional software intervention. Direct access of this kind is a requirement for execute-in-place (XiP) operation, where op-code fetches must be performed directly for efficient code execution.

Figure 4: Direct CPU memory
Several recently released system-on-chip (SoC) products include a bimodal SPI controller that maintains the legacy peripheral access infrastructure while also allowing read operations to be performed directly from the CPU address map. This innovation has led to the rapid adoption of direct code execution out of SPI memories. Direct mapping into the CPU memory map eliminates the throughput bottlenecks that exist during transfers through the SPI peripheral controller.
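The difference between the two access paths can be sketched with a toy model. This is only an illustration under stated assumptions: the controller class, its register names, the BUSY flag, the four-byte data buffer, and the number of polls are all hypothetical and do not describe any particular SoC. The 0x03 opcode is the basic read command used by many serial flash devices.

```python
"""Toy model: peripheral-mode SPI read versus a direct (memory-mapped) read."""

FLASH = bytes(range(256))  # stand-in for the contents of the SPI flash
READ_CMD = 0x03            # basic read opcode used by many serial flashes


class PeripheralSpiController:
    """Hypothetical legacy controller with address/command/status/data registers."""

    BUSY = 0x01

    def __init__(self, flash):
        self.flash = flash
        self.address = 0
        self.status = 0
        self.data = b""
        self.register_accesses = 0  # host accesses to controller registers
        self._polls_left = 0

    def write_address(self, address):
        self.register_accesses += 1
        self.address = address

    def write_command(self, command):
        # Writing the command register starts the bus transaction (step 2a).
        self.register_accesses += 1
        if command == READ_CMD:
            self.status |= self.BUSY
            self._polls_left = 3  # arbitrary: transfer completes on the 3rd poll

    def read_status(self):
        self.register_accesses += 1
        if self._polls_left > 0:
            self._polls_left -= 1
            if self._polls_left == 0:
                # Transfer done: capture the data and clear BUSY.
                self.data = self.flash[self.address:self.address + 4]
                self.status &= ~self.BUSY
        return self.status

    def read_data(self):
        self.register_accesses += 1
        return self.data


def peripheral_read(ctrl, address):
    """The four software steps of a peripheral-mode read."""
    ctrl.write_address(address)            # step 1
    ctrl.write_command(READ_CMD)           # step 2 (starts the transfer)
    while ctrl.read_status() & ctrl.BUSY:  # step 3
        pass
    return ctrl.read_data()                # step 4


def direct_read(flash_window, address):
    """Memory-mapped path: the CPU simply reads from the target address."""
    return flash_window[address:address + 4]
```

In this model both paths return the same four bytes, but the peripheral path costs the host six register accesses (one address write, one command write, three status polls, one data read), while the direct path is a single load that the controller services in hardware.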
Dual SPI interfaces
In a few high-performance applications, the need for increased read throughput and high-density support has led SoC manufacturers to develop dual-channel QuadIO SPI interfaces (see Figure 5). The two channels operate simultaneously in QuadIO mode, doubling the read throughput while adding only six signals to the interface. The resulting pin count (12 pins, up from six) is still much lower than the 40+ signals that a parallel NOR interface would require.

Figure 5: Dual SPI interface.
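The doubling follows from simple arithmetic: each QuadIO channel moves four bits per clock, so two channels move eight. The sketch below assumes single-data-rate transfers and ignores command, address, and dummy-cycle overhead; the 100 MHz clock is an assumed value chosen only for round numbers.

```python
def peak_read_throughput_mb_s(clock_mhz, data_lines):
    """Peak payload rate in MB/s, assuming one bit per data line per clock."""
    return clock_mhz * data_lines / 8


CLOCK_MHZ = 100  # assumed clock frequency, for illustration only

single_quad = peak_read_throughput_mb_s(CLOCK_MHZ, 4)  # one QuadIO channel, 6 pins
dual_quad = peak_read_throughput_mb_s(CLOCK_MHZ, 8)    # two channels, 12 pins

print(f"single QuadIO: {single_quad:.0f} MB/s")
print(f"dual QuadIO:   {dual_quad:.0f} MB/s")
```

With these assumptions the single channel peaks at 50 MB/s and the dual-channel interface at 100 MB/s.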
Device-level improvements
Device-level behavior has also been pressured by the need for higher read throughput. The first-order requirement of increasing the clock rate has been met, but a number of other parameters and characteristics have also received attention to make the higher data rates viable in a real-world system.
Bus-timing enhancements
Two of the more significant timing parameters that have received scrutiny are the clock-to-data-valid time (tV) and the data-hold-after-clock time (tHO). Together, these two parameters determine how long data is valid on the bus: tV describes when data becomes valid after a clock edge, and tHO describes how long data remains valid after the following clock edge. Much work has been done to minimize tV and maximize tHO, extending the data valid period for operation at higher clock frequencies (shorter clock periods).

Figure 6: Data valid timelines
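A simple way to see why both parameters matter is to compute the data valid window for a given clock. The sketch below assumes a single-data-rate output that becomes valid tV after the launching clock edge and stays valid until tHO after the next one, so the window is the clock period minus tV plus tHO. The timing values used are hypothetical and are not taken from any datasheet.

```python
def data_valid_window_ns(clock_mhz, t_v_ns, t_ho_ns):
    """Data valid window in nanoseconds: period - tV + tHO."""
    period_ns = 1000.0 / clock_mhz
    return period_ns - t_v_ns + t_ho_ns


# Hypothetical timings at an assumed 100 MHz clock (10 ns period).
baseline = data_valid_window_ns(100, t_v_ns=8.0, t_ho_ns=2.0)
improved = data_valid_window_ns(100, t_v_ns=6.5, t_ho_ns=3.0)

print(f"baseline window: {baseline:.1f} ns")
print(f"improved window: {improved:.1f} ns")
```

In this model, shaving 1.5 ns off tV and adding 1 ns to tHO widens the window from 4.0 ns to 6.5 ns at the same clock rate, which is the margin the host needs to capture data reliably as the period shrinks.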
Shift from 3 V to 1.8 V operating voltages
Device operating voltages are starting to transition from the legacy 3 V supply to 1.8 V. This migration has been driven largely by the use of SPI devices in cell phones, where the lower operating voltage is attractive from a power-consumption perspective. A secondary advantage of the lower operating voltage is that signal swings on the bus interface are reduced. The smaller voltage swing between logic states means shorter transition times, which are necessary at higher operating frequencies. Current 1.8 V offerings are starting to appear with operating frequencies of 133 MHz, with the possibility of even higher frequencies.
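As a rough illustration of the transition-time argument, assume the output edge slews at a constant rate regardless of supply voltage. This is a deliberate simplification, since real output drivers do not behave this way exactly, and the 2 V/ns slew rate below is an arbitrary, hypothetical value.

```python
def transition_time_ns(swing_v, slew_v_per_ns):
    """Full-swing transition time under a constant-slew-rate assumption."""
    return swing_v / slew_v_per_ns


SLEW_V_PER_NS = 2.0  # hypothetical slew rate, for illustration only

t_3v0 = transition_time_ns(3.0, SLEW_V_PER_NS)
t_1v8 = transition_time_ns(1.8, SLEW_V_PER_NS)

print(f"3.0 V swing: {t_3v0:.2f} ns")
print(f"1.8 V swing: {t_1v8:.2f} ns ({t_1v8 / t_3v0:.0%} of the 3.0 V time)")
```

Under this assumption the 1.8 V transition takes 60% of the time of the 3 V transition, whatever slew rate is chosen.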
Output drive strength control
One emerging trend to maximize signal integrity is to allow the output drive strength to be optimized in the target environment. This capability has long been part of the high-speed DRAM world and is essential to maximizing signal integrity at higher data rates. Typical implementations provide four settings that are configurable in-system by the host processor. Environmental problems related to capacitive loading, trace impedance, and trace length can be mitigated by adjustment of the output drive strength. Figure 7 shows the output drive capabilities from one of the devices that support drive strength control.

Figure 7: Output drive strength control
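With four settings, the drive strength selection fits in a two-bit field that the host can update in-system. The sketch below shows one way host software might modify such a field without disturbing the other bits of a configuration register. The 8-bit register width, the field position, and the function name are hypothetical; the actual register layout, the command used to write it, and the mapping from field value to drive strength are device-specific and must come from the datasheet.

```python
DRIVE_FIELD_SHIFT = 5                     # hypothetical position of the 2-bit field
DRIVE_FIELD_MASK = 0b11 << DRIVE_FIELD_SHIFT


def set_drive_strength(config_reg, setting):
    """Return an 8-bit register value with the drive strength field replaced."""
    if setting not in range(4):
        raise ValueError("drive strength setting must be 0..3")
    cleared = config_reg & ~DRIVE_FIELD_MASK & 0xFF  # keep all other bits
    return cleared | (setting << DRIVE_FIELD_SHIFT)


# Example: select setting 2 while preserving the remaining bits.
new_value = set_drive_strength(0b1001_1111, 2)
print(f"{new_value:#010b}")
```

In practice the host would read the current register value from the device, apply a change like this, write the result back, and then check signal quality on the target board before settling on a setting.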
Part II of this article discusses protocol improvements that can further increase throughput.
Dr DSP
6/13/2012 4:58 PM EDT
Figure 1 image seems broken for me. Is it working for anyone else?
susan.rambo
6/14/2012 1:19 PM EDT
Thanks. I fixed Figure 1 so it's visible now.