SANTA CLARA, Calif. – Some more details of a design error in a companion chip to the Sandy Bridge processor – and the fix being implemented – have emerged in a conference call held by Intel with financial analysts to discuss the issue and the impact on Intel's revenues and margins.
The chip, known as Series 6 or Cougar Point, passed rigorous functional testing performed by both Intel and its OEMs, but a problem can nonetheless show up in a low percentage of chips, according to Steve Smith, vice president of PC client operation enabling, speaking on the call.
Smith said his best estimate was that a single-digit percentage of the chips, about 5 percent, have the potential to be affected over the typical 3-year life of a notebook computer. And the error would manifest itself with up to 4 of 6 serial-ATA channels being degraded in performance or failing altogether.
Systems including the chips only started shipping to consumers on Jan. 9 and there are no reported failures in the field. Nonetheless, Intel has suspended shipments of the chip while it brings up a corrected design and will provide replacements and support to affected parties.
"The root cause is a design oversight, if you will, and all we needed to do was make a metal change to configure that circuit back to a robust operating mode. And it's on one of the later layers of metal so we actually can utilize all the chipset pipeline that has been there and is there in the fab right now," said Smith.
Because the chipset is built in a relatively mature 65-nm process, Smith said there is confidence that the corrected chip can ramp up production quickly.
I have been designing silicon chips for longer than I wish to admit. Making a metal revision to a mask set is a very standard process. Usually the errors are detected during testing in the lab. Thinking that you have a final product and later finding out from a customer that it still contains a bug happens frequently too. You would expect it to happen less at a large company such as Intel, given the army of design verifiers they have; nevertheless, it clearly happens from time to time (there was a big Intel recall a few years back). The truth is that the sheer complexity of a microprocessor, and the number of permutations required in testing, is so large that you never actually know for sure that the silicon is working all the time!...dr Kris
I fully agree with DrizztVD. What kind of world are we living in? Mistakes are only human and all of us make mistakes whether we admit it or not. And if a company fires its employees for making a mistake, it's only creating fear, and in the future nobody will be eager to do new things.
It's not unfortunate to have errors; it's unfortunate not to admit them.
You'd have to be a very messed up manager to want to fire your engineers for something they had no control over. This is not a case of negligence; it is a case of straightforward trial-and-error learning. And it's not embarrassing either; these things happen all the time. Kudos to Intel for doing the right thing and withholding chips until they have reliable chips to sell. Lesser companies would have tried to cover it up.
Intel admitting that it is a "design oversight" is really unfortunate. This is the risk of a basic design issue: it seeps into other blocks, making its impact catastrophic. Surely many heads will have rolled in the aftermath of this. But certainly it is "better late than never".