If you were on the Internet in the wee hours of today (Friday), it’s pretty likely that you experienced either significant slowdowns or unreachable domains. Don’t ask me how I know…
The problem was reportedly a consequence of a dispute between two Tier 1 ISPs, or a software upgrade gone awry, or the opening salvo in the “or else” part of EU demands to wrest control of the Internet away from the US and place it under the UN, or a trial run for terrorists. A failure like this leaves plenty of room for even the most paranoid explanation. A 'scoreboard' from the Internet Health Report can show you major routing problems, provided you can reach it.
WAIT! Wasn’t the Internet the product of a project that would ensure control in the event of nuclear war? Sort of. The idea was to maintain redundant routing between any two points. But the commercialization of the technology has paid scant attention to true redundancy. Redundancy costs money. Look at a typical network and count the single points of failure. Common-mode failures, like a bug in router software that every router shares, can bring down entire network segments.
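If you want to try the "count the single points of failure" exercise on paper, here is a minimal sketch in Python. The topology and node names (`home`, `isp`, `backbone-a`, and so on) are invented for illustration and do not describe any real network. The approach is deliberately brute force: remove each node in turn and check whether everything else can still reach everything else.

```python
from collections import deque


def is_connected(adj, removed):
    """Return True if all nodes except `removed` can still reach each other."""
    nodes = [n for n in adj if n != removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        cur = queue.popleft()
        for nxt in adj[cur]:
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == len(nodes)


def single_points_of_failure(links):
    """List the nodes whose loss splits the network into pieces."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return sorted(n for n in adj if not is_connected(adj, n))


# Hypothetical toy topology: two backbone paths, but only one ISP
# and only one exchange in front of the destination site.
TOY_LINKS = [
    ("home", "isp"),
    ("isp", "backbone-a"),
    ("isp", "backbone-b"),
    ("backbone-a", "exchange"),
    ("backbone-b", "exchange"),
    ("exchange", "site"),
]

print(single_points_of_failure(TOY_LINKS))
```

In this toy case the two backbones back each other up, yet the lone ISP and the lone exchange each cut somebody off when they fail. Note that this only catches topological weak spots. A common-mode failure, such as the same software bug hitting both backbones at once, would not show up in a count like this.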
This sort of event demonstrates in no uncertain terms the need for overlay routing schemes which can compensate for major outages along various routes of the backbone. Two papers dealing with this issue can be read at:
It’s time for engineers to re-evaluate system designs for continuity of operation when a single component fails. What good is a cheap cell phone if the loss of a single facility turns it into so much useless plastic and metal? Let’s find ways to make the classic engineering statement “Fast, cheap, reliable, pick two” obsolete by delivering all three: “Fast, cheap, reliable.”