Multiple redundancy
Re: Multiple redundancy
Inter-system crashes
The dangers of assuming too much
Ozone hole undetected for years due to programming error
Sane sanity checks /  risking public discussion
A propos landing gear


Date: Thu, 9 Jan 86 12:14:34 PST
From: ihnp4!utzoo!
Subject: Multiple redundancy

Advocates of multiple redundancy through independently-written software
doing the same job might be interested in an incident involving complete
failure of such a scheme.

During the development of the De Havilland Victor jet bomber, roughly a
contemporary of the B-52, the designers were concerned about possible
problems with the unusual tailplane design.  They were particularly
worried about "flutter" -- a positive feedback loop between slightly-flexible
structures and the airflow around them, dangerous when the frequency of the
resulting oscillation matches a resonant frequency of the structure.  So
they tested for tailplane flutter very carefully:

 1. A specially-built wind-tunnel model was used to investigate the
 flutter behavior.  (Because one cannot scale down the fluid properties
 of the atmosphere, a simple scale model of the aircraft isn't good
 enough to check on subtle problems -- the model must be carefully
 built to answer a specific question.)

 2. Resonance tests were run on the first prototype before it flew,
 with the results cranked into aerodynamic equations.

 3. Early flight tests included some tests whose results could be
 extrapolated to reveal flutter behavior.  (Flutter is sensitive to
 speed, so low-speed tests could be run safely.)

All three methods produced similar answers, agreeing that there was no
flutter problem in the tailplane at any speed the aircraft could reach.

Somewhat later, when the first prototype was doing high-speed low-altitude
passes over an airbase for instrument calibration, the tailplane broke off.
The aircraft crashed instantly, killing the entire crew.  A long investigation
finally discovered what happened:

 1. The stiffness of a crucial part in the wind-tunnel flutter model
 was wrong.

 2. One term in the aerodynamic equations had been put in wrongly.

 3. The flight-test results involved some tricky problems of data
 interpretation, and the engineers had been misled.

And by sheer bad luck, all three wrong answers were roughly the same number.

Reference:  Bill Gunston, "Bombers of the West", Ian Allen 1977(?).

    Henry Spencer @ U of Toronto Zoology


Date: Mon, 13 Jan 86 19:49:18 PST
From: ihnp4!utzoo!
Subject: Re: Multiple redundancy

A correction and an addendum to my earlier contribution about multiple
redundancy.

Correction:  It was not the "De Havilland Victor" but the "Handley Page
Victor".  Blush.  That's like calling Boeing "McDonnell Douglas".

Addendum:  The full reference is  Bill Gunston, "Bombers of the West",
Ian Allan, London 1973, page 92.

    Henry Spencer @ U of Toronto Zoology


Date: Thu, 27 Mar 86 08:32:18 est
From: hammond%lafite@mouton.ARPA (Rich A. Hammond at lafite.UUCP)
Subject: Inter-system crashes

I worked in a hotel once when they were adding a new wing.  The main water
and electricity systems had to be turned off to connect the new wing.
Management decided to do both at the same time so there would only be one
interruption in service.  The problem:  Turning off the electric power
caused the emergency generator to come on, but the generator was cooled by
water which came from the main and ran into the drain, i.e., no
recirculation.  Of course there was no water, so the generator engine
managed to warp its head pretty badly before we shut it off.


Sender: "J._Paul_Holbrook.OsbuSouth"@Xerox.COM
Date: 29 Apr 86 14:32:33 PDT (Tuesday)
Subject: The dangers of assuming too much
From: Holbrook.OsbuSouth@Xerox.COM
To: Risks@SRI-CSL.Arpa, Methodology^.PA@Xerox.COM

[From "Three Mile Island: Thirty Minutes to Meltdown" by Daniel Ford;
Viking Press 1982.]

(The discussion preceding this quote talks about how the temperature of the
fuel rods at Three Mile Island-2 increased from the normal 600 degrees to
over 4000 degrees during the 1979 accident, partially destroying the fuel
rods.  It also notes that instruments to measure core temperatures were not
standard equipment in reactors.)

  "Purely by chance, there were some thermocouples -- temperature-measuring
  devices -- present in the TMI-2 reactor when the accident occurred.  Located
  about 12 inches above the top of the core, these thermocouples ... were
  installed as part of an experimental study of core performance, and were a
  temporary instrumentation feature of the plant, connected to the
  control-room computer for measuring temperatures during normal operation.
  Accordingly, if a control-room operator requested temperature data from the
  computer, he would receive useful information only when the temperature was
  within the normal 600 degree range.  When the temperature got above 700
  degrees, the computer, instead of reporting it, would simply print out a
  string of question marks -- "???????."  Although the thermocouples could
  actually measure much higher temperatures, the computer was not programmed
  to pass these higher temperature readings on to the operators ... there was
  an urgent need for timely, reliable data about the temperature in the core
  in the critical period between 6am and 7am on March 28; what was available
  from the computer was mostly question marks."
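
The display behavior described in the quote can be sketched in a few lines.
This is only an illustration, not the TMI-2 software: the function names and
the flag text are invented here, and the 700-degree cutoff is simply the
figure given in the quote.  The first function suppresses any reading above
the expected range; the second shows one possible alternative, which passes
the number through and marks it as abnormal.

```python
# Hypothetical sketch; names and flag text are illustrative assumptions,
# not taken from the actual plant computer.
NORMAL_MAX_F = 700.0  # cutoff mentioned in the quote


def format_suppressing(reading_f):
    """Behavior described in the quote: readings above range become '???????'."""
    if reading_f > NORMAL_MAX_F:
        return "???????"
    return f"{reading_f:.0f} F"


def format_flagging(reading_f):
    """Alternative: always print the number, and flag it when it is abnormal."""
    if reading_f > NORMAL_MAX_F:
        return f"{reading_f:.0f} F  ** ABOVE NORMAL RANGE **"
    return f"{reading_f:.0f} F"
```

With a reading of 2300 degrees, the first function returns only question
marks, while the second still tells the operator the number.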



Date: Fri, 1 Aug 86 0:48:48 EDT
From: sdcsvax!dcdwest!ittatc!bunker!wtm@ucbvax.Berkeley.EDU (Bill McGarry)
Subject: Ozone hole undetected for years due to programming error

(I read the following in a magazine but when I went to write this
 article, I could not remember which magazine and some of the exact
 details.  My apologies for any inaccuracies.)

Recently, it was disclosed that a large hole in the ozone layer appears once
a year over the South Pole.  The researchers had first detected this hole
approximately 8 years ago by tests done at the South Pole itself.

Why did they wait 8 years to disclose this disturbing fact?  Because the
satellite that normally gives ozone levels had not reported any such hole
and the researchers could not believe that the satellite's figures could be
incorrect.  It took 8 years of testing before they felt confident enough to
dispute the satellite's figures.

And why did the satellite fail to report this hole?  Because it had been
programmed to reject values that fell outside the "normal" range!

I do not know which is more disturbing -- that the researchers had so much
faith in the satellite that it took 8 years of testing before they would
dispute the satellite or that the satellite would observe this huge drop in
the ozone level year after year and just throw the results away.

   Bill McGarry
   Bunker Ramo, Trumbull, CT
   {decvax, philabs, ittatc, fortune}!bunker!wtm

          [A truly remarkable saga.  I read it too, and was going to report
           on it -- but could not find the source.  HELP, PLEASE!  PGN]
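
The difference between rejecting out-of-range values and merely setting them
aside can be sketched as follows.  Everything here is an assumption made for
illustration: the range, the units, and the function names are invented, and
nothing is known from the account above about how the satellite's processing
was actually written.

```python
# Hypothetical sketch; the range and all names are illustrative assumptions.
NORMAL_RANGE = (180.0, 650.0)  # assumed "plausible" band, arbitrary units


def filter_discarding(readings):
    """Silently drop anything outside the expected range."""
    lo, hi = NORMAL_RANGE
    return [r for r in readings if lo <= r <= hi]


def filter_flagging(readings):
    """Keep in-range readings, but retain the outliers for later review."""
    lo, hi = NORMAL_RANGE
    kept, suspect = [], []
    for r in readings:
        (kept if lo <= r <= hi else suspect).append(r)
    return kept, suspect


def persistent_anomaly(suspect_counts_by_year, min_years=3):
    """True if outliers recur in at least min_years consecutive years."""
    run = 0
    for count in suspect_counts_by_year:
        run = run + 1 if count > 0 else 0
        if run >= min_years:
            return True
    return False
```

An outlier that shows up once may well be noise; one that recurs year after
year is a candidate for a real effect, and the second and third functions at
least leave a record that someone could examine.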


Date: Tue, 23 Sep 86 12:54:10 EDT
From: Jim Purtilo <>
Subject: Sane sanity checks /  risking public discussion

  [Regarding ``sanity checks'']

Let us remember that there are sane ``sanity checks'' as well as the other
kind. About 8 years ago while a grad student at an Ohio university that
probably ought to remain unnamed, I learned of the following follies:

The campus had long been doing class registration and scheduling via
computer, but the registrar insisted on a ``sanity check'' in the form of
hard copy.  Once each term, a dozen guys in overalls would spend the day
hauling a room full of paper boxes over from the CS center, representing a
paper copy of each document that had anything to do with the registration
process.  [I first took exception to this because their whole argument in
favor of "computerizing" was based on reduced costs, but I guess that should
be hashed out in NET.TREE-EATERS.]

No one in that registrar's office was at all interested in wading through
all that paper. Not even a little bit.

One fine day, the Burroughs people came through with a little upgrade to the
processor used by campus administration.  And some "unused status bits"
happened to float the other way.

This was right before the preregistration documents were run, and dutifully
about 12,000 students' preregistration requests were scheduled and mailed
back to them.  All of them were signed up "PASS/FAIL".  This was
meticulously recorded on all those trees stored in the back room, but no one
wanted to look.

I suppose a moral would be ``if you include sanity checks, make sure a sane
person would be interested in looking at them.''

  [Regarding break-ins at Stanford]

A lot of the discussion seems to revolve about ``hey, Brian, you got what
you asked for'' (no matter how kindly it is phrased).  Without making
further editorial either way, I'd like to make sure that Brian is commended
for sharing the experience.  Sure would be a shame if ``coming clean'' about
a bad situation will be viewed as itself constituting a risk...

               [I am delighted to see this comment.  Thanks, Brian!  PGN]


Date: Wed, 1 Oct 86 16:55:14 pdt
From: ladkin@kestrel.ARPA (Peter Ladkin)
To: risks@sri-csl
Subject: A propos landing gear

Alan Marcum's comment on gear overrides in the Arrow reminded me
of a recent incident in my flying club (and his, too).
The Beech Duchess, a light training twin, has an override that
maintains the landing gear *down* while there is weight on the
wheels, ostensibly to prevent the pilot from retracting the
gear while on the ground (this is a problem that Beech has in
some of its airplanes, since they chose to use a non-standard
location for gear and flap switches, encouraging a pilot to
mistake one for the other).
Pilots can get into the habit of *retracting* the gear before
takeoff, secure in the knowledge that it will remain down
until weight is lifted off the wheels, whence it will commence
retracting. This has the major advantages that it's one less
thing to do during takeoff, allowing more concentration on
flying, and the gear is retracted at the earliest possible
moment, allowing maximum climb performance, which is important
in case an engine fails at this critical stage.
Can anyone guess the disadvantage of this procedure yet?

Our club pilot, on his ATP check ride, with an FAA inspector
aboard, suffered nosewheel collapse on take-off, and dinged
the nose, and both props, necessitating an expensive repair
and engine rebuild. Thankfully, all walked away unharmed.
It was a windy day.

It is popularly supposed that the premature retraction technique
was used, and a gust of wind near rotation speed caused the weight
to be lifted off the nosewheel. When the plane settled, the
retraction had activated, and the lock had disengaged,
allowing the weight to collapse the nosewheel.
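
The supposed sequence can be written down as a tiny state model.  This is
a sketch of the popular supposition only, not of the actual Duchess
circuitry: the function name, the state names, and the assumption that the
lock does not re-engage once retraction has begun are all invented here
for illustration.

```python
# Hypothetical weight-on-wheels interlock model; all names are assumptions.
def gear_after(weight_samples, switch_up):
    """Return 'down' or 'retracting' after a series of weight-on-wheels samples.

    weight_samples: iterable of booleans, True meaning weight is on the wheels.
    switch_up: True if the gear switch was moved to 'up' before takeoff.
    """
    state = "down"
    for weight_on_wheels in weight_samples:
        if switch_up and not weight_on_wheels:
            # The interlock releases as soon as the weight comes off.
            state = "retracting"
        # Assumed: once retraction starts, weight returning to the wheels
        # does not re-engage the lock, so the state is never reset.
    return state
```

Under these assumptions, a single unloaded sample during the takeoff roll
(a gust, say) is enough to leave the gear unlocked when the weight returns,
but only if the switch was already up.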

Both pilots insist that the gear switch was in the down position,
contrary to the popular supposition.

All gear systems in the aircraft were functioning normally when
tested after the accident.

The relevance to Risks? The system is simple, and understood in
its entirety by all competent users. The technique of
premature retraction has advantages. It's not clear that a
gedankenexperiment could predict the disadvantage.

Peter Ladkin