The scaling of semiconductor technologies has led to lower operating voltages in semiconductor devices, which in turn reduces the charge stored on the capacitors of volatile memory cells. As a result, devices are generally more sensitive to soft, or transient, errors: even low-energy alpha particles can flip the bits stored in memory cells or change the values held in sequential logic elements, producing erroneous results.
Increasing memory density, growing system-on-chip (SoC) memory content, higher performance, and technology scaling, combined with reduced voltages, increase the probability of multi-bit transient errors. Notably, transient errors are no longer a concern only for aerospace applications. Biomedical, automotive, networking, and high-end computing applications are now also susceptible to them and require high reliability and safety.
Transient error sources are, in many cases, self-inflicted, because alpha particles are commonly generated in materials adjacent to the chip, in solders, and in the packaging. Given the higher susceptibility to multiple-bit (multi-bit) transient errors and the increasing requirement for high reliability, there is a greater need to mitigate transient errors in embedded memories. In this article, we discuss transient error detection and correction methods using advanced error correction code (ECC) based solutions for embedded memories in order to meet the requirements of today’s high-reliability applications.
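To make the distinction between single-bit and multi-bit errors concrete, here is a minimal Python sketch of a textbook extended Hamming (8,4) code, which corrects any single flipped bit and detects (but cannot correct) any two flipped bits. It is an illustration only, not one of the advanced ECC schemes discussed in this article; the function names `encode` and `decode` and the 4-bit word size are assumptions chosen for brevity, and real embedded memories protect much wider words.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit SEC-DED codeword (list of 0/1 bits).

    Index 0 holds the overall parity bit; indices 1, 2, 4 hold Hamming
    parity bits; indices 3, 5, 6, 7 hold the data bits.
    """
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    for p in (1, 2, 4):
        # Parity bit p covers every position whose index has bit p set.
        for i in range(1, 8):
            if i != p and i & p:
                code[p] ^= code[i]
    code[0] = sum(code[1:]) % 2  # make the total number of 1s even
    return code


def decode(code):
    """Return (nibble, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i  # XOR of set positions is 0 for a valid codeword
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd overall parity: assume one flipped bit; the syndrome is its
        # index (0 means the overall parity bit itself was hit).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        status = "uncorrectable"
    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return nibble, status


if __name__ == "__main__":
    word = encode(0b1011)
    word[6] ^= 1                 # single-bit upset
    print(decode(word))
    word[2] ^= 1                 # second upset in the same word
    print(decode(word))
```

With one flipped bit the original data is recovered; with two, the decoder can only flag the word as bad. Three or more flips in one word can be silently miscorrected by a code of this strength, which is why a rising multi-bit error rate calls for stronger schemes.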