How was it that my memory tests passed when there wasn't even any memory in the cabinet?
My first job when I got out of university was as a member of a team designing CPUs for mainframe computers. Fortunately, integrated circuits were in use by this time (it would have been a mega-pain designing a mainframe computer at the transistor level), but their geometries were huge compared to today’s deep-sub-micron devices.
Take the case of RAM and ROM memory chips, for example. Today, when I’m wandering around a store like Staples or Best Buy and I see a USB Flash memory stick containing say 16GB for a relatively small amount of money I think “Ho Hum” and carry on my merry way. I must have 10 or more such memory sticks in my backpack and I cannot say how many are scattered around my office (at conferences companies put presentations on them and give them away).
It’s important to note that we’re talking about gigabytes here, where each gigabyte is a thousand megabytes and each megabyte is a million bytes. If someone had used terms like 16GB to me back in 1980 I would have laughed my head off – even a single megabyte seemed to be a HUGE amount of memory at that time.
I remember once being instructed to create a “quick and dirty” test for a memory cabinet. Remember that we’re talking mainframe computers here. A single memory cabinet was the size of one of today’s large fridge-freezers. In the upper portion of the cabinet were a bunch of very large circuit boards, each perhaps 2 feet x 2 feet square and each carrying hundreds of memory chips (the chips themselves contained relatively small amounts of memory). The lower half of the cabinet contained the power supplies for that unit.
I was working in an engineering environment in which we were working on multiple cabinets (CPU, memory, peripheral devices…) and pulling boards out and plugging them in all the time. All that was required of me was to create a really simple test that could be run to check that the memory was functioning at the most rudimentary level (i.e. all of the boards were plugged in, powered up, and working).
So I set to with gusto and abandon. Purely for the sake of this discussion, let’s say that the cabinet contained 100,000 words of memory, where each word was 64 bits wide (truth to tell, I can no longer recall the nitty-gritty details).
You have to remember that I was new to all of this, so I just did what seemed to make sense. First of all I wrote zeros into every location. Next, for each word in the memory I wrote a pattern of 01010101…0101 and read it back; then I wrote a pattern of 10101010…1010 and read that back; then I moved on to the next location.
Creating this test didn’t take long. When my boss passed by I called him over and (using my command line interface) proudly executed the test, which soon completed and displayed something like Main Memory is OK on the screen.
My boss pondered this for a moment and then said: “That’s very interesting. And the thing that’s really interesting about it is that this memory cabinet is empty – the rack containing all of the memory boards is currently downstairs in the service department.”
I opened the access panel to the upper portion of the cabinet and he was right – it was totally empty. Good grief, I felt like an idiot (but where could we find one at that time of the day … sorry, I couldn’t help myself).
So how was it that my tests passed when there wasn’t even any memory in the cabinet? (The answer is given at the end of this column).
Of course I’ve learned a lot of “stuff” since those days of yore; for example…
Whose fault is it anyway?
For the purpose of these discussions, we shall take the term fault to refer to a physical failure mechanism such as a broken wire. Meanwhile, the term fault-effect refers to the way in which a fault manifests itself to the outside world.
In the case of memory devices, faults can be categorized as being either functional or dynamic. Functional faults include bad memory cells or bad access to these cells, while dynamic faults refer to timing failures.
One set of functional faults is predominantly associated with the interconnect (both on the circuit board and in the device). The majority of these will be stuck-at, bridging, or open faults. A stuck-at fault is a short between a signal and a ground or power plane, so (assuming we're working with Positive Logic) these are referred to as stuck-at-0 and stuck-at-1 faults, respectively.
Bridging faults are similar to stuck-ats in that they share common mechanisms (such as solder splashes at the board level or internal shorts at the device level). In the case of a bridging fault, however, the unwanted connection is between two or more signals rather than between a signal and a power plane.
Finally, an open fault refers to the lack of a desired connection, such as a broken track or a bad solder joint at the board level or a disconnected bonding wire at the device level. Open faults are referred to as open-0, open-1, or open-Z depending on the way in which they manifest themselves (where "Z" indicates a high-impedance value). For example, an open-0 fault indicates that a signal or input has become disconnected from its driving device(s), and that this signal or input will consequently “float” to a weak logic 0 value.
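To make these fault-effects a little more concrete, here is a minimal Python sketch of how each one might corrupt a pattern driven onto a group of wires. The function names are made up for illustration, and two things are assumptions rather than anything stated above: a bridge is modeled as a wired-AND (a logic 0 on either wire wins), and an open-0 or open-1 is treated as looking the same as the corresponding stuck-at when observed from the far end. Open-Z is left out because it has no simple 0/1 model.

```python
def stuck_at(pattern: str, wire: int, value: str) -> str:
    """The given wire always reads `value` ('0' or '1'), whatever we drive."""
    return pattern[:wire] + value + pattern[wire + 1:]


def bridge(pattern: str, a: int, b: int) -> str:
    """Wires a and b are shorted together.

    ASSUMPTION: wired-AND behaviour, so both wires read 1 only if both
    were driven to 1. Real bridges may behave differently.
    """
    merged = "1" if pattern[a] == "1" and pattern[b] == "1" else "0"
    observed = list(pattern)
    observed[a] = observed[b] = merged
    return "".join(observed)


def open_fault(pattern: str, wire: int, floats_to: str) -> str:
    """open-0 / open-1: the disconnected wire floats to a weak 0 or 1.

    ASSUMPTION: seen from the far end this looks like a stuck-at fault.
    """
    return stuck_at(pattern, wire, floats_to)


driven = "10110010"                   # wire 0 is the left-most character
print(stuck_at(driven, 0, "0"))       # 00110010
print(bridge(driven, 0, 1))           # 00110010
print(open_fault(driven, 7, "1"))     # 10110011
```

Note that the stuck-at-0 on wire 0 and the bridge between wires 0 and 1 produce the same observed value for this particular pattern, which is one reason a single pattern is never enough to tell faults apart.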
The “nameless” test sequence
Assuming for the moment that we’re interested in a single RAM device (either in isolation or embedded in the middle of a circuit), the first thing we need to do is to test our access to the device in the form of its address and data busses. The reason we perform these tests first is that they are relatively quick and painless, and it’s only after we’ve proved that we can actually “talk” to the device that we would wish to proceed to the time-consuming process of verifying its internal structures.
Before we look at the tests themselves, let’s first consider a group of eight wires named a through h. Let’s assume that we can drive signals into one end of these wires and monitor the results at the other end. Our task is to determine the minimum number of test patterns that are required to detect every possible stuck-at, bridging, and open fault on these wires as illustrated in Figure 1.
Figure 1. What is the minimum number of test patterns that are required to detect every possible stuck-at, bridge, and open fault on eight wires?
First of all, we know that we must check that each wire can be driven to a logic 0 and a logic 1. This will ensure that there are no stuck-at faults and, ignoring any weird capacitive effects, no open faults. To do this we could use just two test patterns, 00000000₂ and 11111111₂, but this would not reveal any bridging faults. In order to detect bridging faults we have to ensure that every wire can carry the opposite logic value to every other wire.
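The two requirements above (every wire sees both a 0 and a 1, and every pair of wires carries opposite values in at least one pattern) are easy to check mechanically. The following Python sketch does just that; the function name `uncovered_faults` is invented for this illustration, and patterns are assumed to be equal-length strings of '0' and '1' with one character per wire.

```python
from itertools import combinations


def uncovered_faults(patterns):
    """List the faults that a set of test patterns would NOT detect."""
    n = len(patterns[0])
    missed = []

    # Stuck-at (and simple open) faults: each wire must be driven to both
    # values. A stuck-at-0 only shows up when we try to drive a 1.
    for wire in range(n):
        values = {p[wire] for p in patterns}
        if "1" not in values:
            missed.append(f"stuck-at-0 on wire {wire}")
        if "0" not in values:
            missed.append(f"stuck-at-1 on wire {wire}")

    # Bridging faults: each pair of wires must differ in at least one pattern.
    for a, b in combinations(range(n), 2):
        if all(p[a] == p[b] for p in patterns):
            missed.append(f"bridge between wires {a} and {b}")

    return missed


missed = uncovered_faults(["00000000", "11111111"])
print(len(missed))   # 28
print(missed[0])     # bridge between wires 0 and 1
```

With only the all-zeros and all-ones patterns, no stuck-at fault is missed, but all 28 possible two-wire bridges on eight wires go undetected.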
One of the simplest test sequences is the “walking ones,” in which each wire is driven with a logic 1 while all of the other wires are driven with logic 0s. Thus, for n wires this sequence requires n test patterns, which, at first glance, doesn’t appear to be an unduly excessive requirement, as illustrated in Figure 2(a).
Figure 2. The “nameless” sequence requires fewer tests than “walking ones”
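As a minimal sketch, a walking ones sequence for n wires can be generated in Python as follows (the function name is just for illustration; wire a is taken to be the left-most character of each pattern):

```python
def walking_ones(n: int) -> list[str]:
    """One pattern per wire: that wire is driven to 1, all the others to 0."""
    return ["0" * i + "1" + "0" * (n - i - 1) for i in range(n)]


for pattern in walking_ones(8):
    print(pattern)
# first line printed: 10000000
# last line printed:  00000001
```

Eight wires need eight patterns, and the count grows linearly with the number of wires.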
For a variety of reasons, however, we often wish to use the smallest possible test sequence that we can. An alternative test sequence that I call the “nameless” sequence (because I made it up myself and had never actually seen it documented anywhere until after I’d penned this piece) commences by dividing the wires into two groups. We start by driving the “left-hand” group with logic 1s and the “right-hand” group with logic 0s; then we proceed to divide each group into two sub-groups, and to drive each “left-hand” sub-group with logic 1s and each right-hand sub-group with logic 0s. This continues until we have alternating logic 1s and logic 0s on each wire, at which point we terminate the sequence by simply inverting all of the wires as illustrated in Figure 2(b).
The beauty of the nameless sequence is that whenever the number of wires doubles we only have to add one new test pattern. That is, 8 wires require 4 test patterns, 16 wires require 5 test patterns, 32 wires require 6 test patterns, and so on. Thus, as the number of wires increases, so does the efficiency of the “nameless” sequence in comparison to the walking ones sequence.
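The halving procedure described above can be sketched in Python like this. The function name `nameless` is, fittingly, invented for the occasion, and the wire ordering (wire a as the left-most character) is an assumption. When the wire count is not a power of two, the sketch pads it up to the next power of two and drops the extra columns at the end.

```python
def nameless(n: int) -> list[str]:
    """Generate the halving ("nameless") test sequence for n wires, n >= 2."""
    if n < 2:
        raise ValueError("need at least two wires")

    # Pad up to the next power of two with imaginary wires.
    width = 1
    while width < n:
        width *= 2

    patterns = []
    size = width // 2                 # size of each group at this level
    while size >= 1:
        # Left-hand (even-numbered) groups get 1s, right-hand groups get 0s.
        patterns.append(
            "".join("1" if (i // size) % 2 == 0 else "0" for i in range(width))
        )
        size //= 2

    # Finish by inverting the alternating 1010... pattern.
    patterns.append("".join("1" if c == "0" else "0" for c in patterns[-1]))

    # Drop the imaginary wires again.
    return [p[:n] for p in patterns]


for pattern in nameless(8):
    print(pattern)
# 11110000
# 11001100
# 10101010
# 01010101
```

Calling `nameless(16)` returns 5 patterns and `nameless(32)` returns 6, matching the counts quoted above.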
Note that if you’re using the “nameless” sequence and the number of wires is not equal to a power of two (2, 4, 8, 16, 32, 64, ...), then you can add some imaginary wires to create the sequence and discard them at the end. For example, if you have 5, 6, or 7 wires, you would add enough pseudo-wires to bring the total number of wires up to 8, write the test sequence based on 8 wires, and then drop the pseudo-wires.
The “nameless” sequence unmasked!
After I had first mentioned my “nameless” sequence in an article in EDN magazine many years ago, I received a jolly pleasant letter from Mr. Norman Megill, Vice President of Engineering at Production Services Corporation, Belmont, MA. Mr. Megill pointed out that my "nameless" sequence should more properly be referred to as the Modified Counting Sequence Algorithm as per a 1989 IEEE paper:
N. Jarwala and C.W. Yau, “A New Framework for Analyzing Test Generation and Diagnosis Algorithms for Wiring Interconnects,” Proceedings, IEEE International Test Conference, 1989, pp. 63-70
Mr. Megill went on to note that, to the best of his knowledge, this algorithm was first documented by himself in a 1979 paper:
N.D. Megill, “Techniques for Reducing Pattern Counts for Functional Testing,” Digest of Papers, IEEE Test Conference 1979, pp. 90-94
In fact my discussions on the nameless sequence prompted a slew of emails from readers who had independently come up with the same thing. Furthermore, several readers noted a trick they used to generate a variation on the nameless sequence, which simply involves writing down a standard binary count sequence, commencing at 1, proceeding up to the number of wires you wish to test, and then “rotating” the results. For example, assuming that we wish to test 10 wires called a through j for stuck-ats, bridges, and open faults, we would commence by writing the binary values for 1 to 10 as illustrated in Figure 3(a).
Figure 3. Generating a variation of the nameless sequence
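The readers' binary-count trick can be sketched in a few lines of Python: give each wire its own binary number, starting at 1, and then read the resulting table column by column. The function name and the choice of putting the most-significant column first are assumptions made for this illustration; a different rotation direction only changes the order of the patterns.

```python
def counting_sequence(n: int) -> list[str]:
    """Test patterns for n wires built from the binary count 1..n."""
    bits = n.bit_length()             # number of bits needed to write n

    # One code per wire: wire a gets 1, wire b gets 2, and so on.
    codes = [format(i, f"0{bits}b") for i in range(1, n + 1)]

    # "Rotate" the table: each bit column becomes one test pattern.
    return ["".join(code[b] for code in codes) for b in range(bits)]


for pattern in counting_sequence(10):
    print(pattern)
# 0000000111
# 0001111000
# 0110011001
# 1010101010
```

Ten wires need only 4 patterns here, whereas padding to 16 wires with the halving approach would need 5. One thing the sketch makes easy to spot: when the wire count is one less than a power of two (3, 7, 15, ...), the last wire's code is all 1s, so that wire is never driven to 0 and an extra pattern would be needed to catch a stuck-at-1 on it.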
Once we’ve generated the binary count, we conceptually “rotate” the table 90 degrees clockwise (or anti-clockwise if you are so-inclined) to create the final test sequence as illustrated in Figure 3(b). This scheme has an advantage over my nameless sequence in that it results in one less test for any number of wires except 2^n (I tell you, I learn something new every day).
Actually testing the device
Using the "nameless" sequence (or similar) to generate addresses, we would first write corresponding "nameless-sequence-based" data values into each "nameless" location and then read these values back again. This would ensure that we had access to the device and that there were no short, open, or bridging faults on the address or data busses.
The next step would be to write functional tests that verify the internals of each memory device, but these techniques are a tad more complex and will therefore be left as topics for future columns.
And the answer is…
And so, returning to my original tale of woe, how could it be that my tests passed when there wasn’t even any memory in the cabinet?
Arrggghhh! It was all due to parasitic capacitances on the data and address busses. Whatever 010101… (or similar) value that I wrote to the bus persisted long enough for me to read it back again.
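For anyone who wants to see the effect without an empty mainframe cabinet to hand, here is a small Python simulation. Everything in it is a made-up model: `EmptyCabinet` simply pretends that the bus holds whatever value was last driven onto it, `WorkingMemory` is an ordinary dictionary, and `deferred_test` is just one possible way around the problem (write a different value to every location first, then read them all back), not the test I ended up using.

```python
P01 = 0x5555555555555555    # 0101...0101 as a 64-bit word
P10 = 0xAAAAAAAAAAAAAAAA    # 1010...1010 as a 64-bit word


class EmptyCabinet:
    """No memory boards fitted: the bus just holds the last value driven."""

    def __init__(self):
        self.bus = 0

    def write(self, address, value):
        self.bus = value            # nothing stores it; the bus "remembers"

    def read(self, address):
        return self.bus             # whatever is still lingering on the bus


class WorkingMemory:
    """A cabinet that actually contains memory."""

    def __init__(self):
        self.cells = {}

    def write(self, address, value):
        self.cells[address] = value

    def read(self, address):
        return self.cells.get(address, 0)


def naive_test(mem, words):
    """The original test: write a pattern and read it straight back."""
    for address in range(words):
        mem.write(address, 0)
    for address in range(words):
        for pattern in (P01, P10):
            mem.write(address, pattern)
            if mem.read(address) != pattern:
                return False
    return True


def deferred_test(mem, words):
    """Write a different value everywhere first, then read it all back."""
    for address in range(words):
        mem.write(address, address ^ P01)
    return all(mem.read(a) == a ^ P01 for a in range(words))


print(naive_test(EmptyCabinet(), 1000))      # True  (the embarrassing result)
print(deferred_test(EmptyCabinet(), 1000))   # False
print(deferred_test(WorkingMemory(), 1000))  # True
```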
I cannot tell you how silly I felt when I discovered my mistake, but that’s how we learn… the real trick is to not make the same mistake more than once (grin).
Editor's Note: It would be great if – in addition to commenting on my articles – you took the time to write down short stories of your own. I can help in the copy editing department, so you don’t need to worry about being “word perfect”. All you have to do is to email your offering to me at max@CliveMaxfield.com with “How it was” in the subject line. I can post your article as “anonymous” if you wish. On the other hand, what would be really cool would be if you wanted to add a few words about yourself – and maybe even provide a couple of “Then and Now” pictures – for example:
On the left we see me as a young sprog – I was still a student at this time, poised on the brink of leaping into my first position at International Computers Limited (ICL). On the right we see me as I am today – a much older and sadder man, beaten down by the pressures of work and bowed by the awesome responsibilities I bear (grin).