I was writing a post yesterday about the effects of the internet on Christmas when the electrical power went off in my home. It came back in a couple of hours, which is better than usual. But my portable was uncharged, and I couldn’t get back to my post until today. Our electrical service seems to drop out several times a year, usually in the rainy season (tree limbs fall on the wires, equipment that doesn’t like water gets wet, construction in the area, etc.). But we are going through a long dry spell and have had no rain or strong winds in weeks, yet the power failed anyway: business as usual.
I have now rebooted computers, changed all (well, most) of our clocks, charged batteries, and am back in business. But I spent yesterday with an old friend who is a user of large computer systems, and we got into a discussion of system reliability. So let’s think about that, rather than the internet and Christmas, since you all have just experienced that interaction.
Things tend to fail when they are unprecedented and complicated, just after being manufactured (sometimes called “infant mortality”), and after long usage (sometimes called “wearing out”). The system supplying our house with electricity is certainly complicated. But I would not call its components unprecedented or even that exotic, and I trust that they are manufactured with care, and perhaps, if highly critical, even “run in” a bit before installation, because repairing failures results in real costs to the utility and annoys its customers. Yet it fails.
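Reliability engineers often describe this pattern as a “bathtub curve”: a falling failure rate early in life, a flat middle, and a rising rate as parts wear out. A minimal sketch of the idea, using Weibull hazard rates with illustrative shape parameters (the numbers are assumptions for the sketch, not utility data):

```python
# Sketch of the "bathtub curve" behind infant mortality and wear-out,
# using the Weibull hazard rate h(t) = (k/s) * (t/s)**(k-1).
# Shape and scale values below are illustrative assumptions.

def weibull_hazard(t, shape, scale=1.0):
    """Instantaneous failure rate at time t for a Weibull distribution."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def bathtub_hazard(t):
    """Combined hazard: early failures (shape < 1), random failures
    (shape = 1), and wear-out failures (shape > 1) added together."""
    return (weibull_hazard(t, shape=0.5)    # infant mortality, decreasing
            + weibull_hazard(t, shape=1.0)  # constant random failures
            + weibull_hazard(t, shape=3.0)) # wear-out, increasing

if __name__ == "__main__":
    # Hazard is high early, dips in mid-life, then climbs again.
    for t in (0.05, 0.5, 1.0, 1.5):
        print(f"t={t:>4}: hazard = {bathtub_hazard(t):.2f}")
```

The dip in the middle is why “running in” critical parts before installation helps: it pushes the infant-mortality failures onto the bench instead of the field.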
Big systems have failures. There are many reasons for this, not all predictable. There are seeming exceptions, such as the two Voyager spacecraft launched in the 1970s and still operating now, over 15 billion kilometers from the sun. But the amount of care in design, manufacturing, and repetitive testing involved in the Voyager project, combined with a design focused on simplicity and ruggedness, plus the predictability and benign nature of space (once you get past earth orbit), are not typical of the production of terrestrial devices or the environments in which they operate. And the Voyagers, too, will eventually fail.
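One reason big systems fail is simple arithmetic: if a system works only when every one of its many components works, even very reliable parts compound against you. A toy calculation, assuming independent components and a made-up 99.9% per-component reliability:

```python
# Toy series-reliability calculation: if a system works only when ALL of
# its n components work, and each component independently works with
# probability p, the system works with probability p**n.
# The figures below are illustrative assumptions, not measurements.

def series_reliability(p: float, n: int) -> float:
    """Probability an n-component series system works (independence assumed)."""
    return p ** n

if __name__ == "__main__":
    p = 0.999  # assumed per-component reliability
    for n in (10, 100, 1000, 10000):
        print(f"{n:>5} components: system works "
              f"{series_reliability(p, n):.1%} of the time")
```

With 1,000 such components in series, the system works only about 37% of the time, which is why real systems lean on redundancy and repair rather than on perfect parts.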
Computers fail. Yours probably have. The internet is full of stories of large and costly failures. Put “Large Computer Failures” into your browser, and you will get a large number of examples. About a month ago, the Bay Area Rapid Transit system (BART) suffered a major problem with its computer system (here) which caused a large number of people to miss appointments, work schedules, and so on.
Why is it that visionary individuals and companies never seem to consider system failures in their ambitious plans? I cannot help but wonder how much thought about system reliability is going into the planning of such complex systems as highways full of computer-controlled automobiles, a sky full of UAVs (drones), and an ever increasing number of robotic machines in our lives. The government is not likely to release reliability statistics for military drones. I have not heard statistics on the reliability of industrial robots, or of the few exquisitely cared-for automatic automobiles running around at present. But I am happy to see that concern about such things is increasing, and I think there should be more.
When our power goes off, we suffer some annoyance, and some worry about such things as long outages, which can cause inconveniences such as thawing food and non-working cordless phones, computers, TVs, etc. But failures in complex automatic control systems can cause deaths. Consider automobiles.
Automobiles are typically driven until failures become annoyingly frequent, and only then scrapped. What does that say about Highway 5 or Los Angeles being full of computer-driven cars? We hear of airplane accidents attributed to pilot error when the pilot takes over after the autopilot fails. A common explanation is that pilots now spend so much of their time supervising the autopilot, rather than flying manually, that they lose their instincts for what to do in an emergency. Shouldn’t we worry more about computer-driven airplanes with out-of-practice “standby” human pilots, or no pilots at all? Many leaders of the digital world believe that computers are more reliable than humans at controlling complex equipment in a complicated environment. I have not seen any data that convinces me this is true over the total lifetime of such equipment.
I have a reputation as a booster of creativity, innovation, and entrepreneurship, and in fact I am a believer. But just because something is technically feasible on a small scale does not mean it should be introduced too rapidly on a large one, with its attendant social complexity. If your computer-driven car hits another computer-driven car, who is liable for the damages? The car manufacturer? The digital equipment manufacturer? Are cars insured rather than drivers? Are periodic maintenance checks mandatory? How often does your car’s operating system require updating? There are many factors involved other than the capability of hardware and software.