Jacob and Rich... YES! I understand that underlying issues (which I'm not sure were adequately discussed or investigated) caused the software to fail. That is, hardware failures (bit-flips) caused the software to fail. Yes, the software could and should be improved, but what about the underlying "strange things happen" issues that we still need to understand? All of this focus on the software is fine, but I posit it's impossible to write software that can account for every possible random bit error the system might encounter (some of which are permanent, some transient). Don't we need to understand what caused the software to fail in the first place before we can even begin to work out how to minimize the risks associated with those failures?