How To Plan A Computer Disaster
Notes of a talk I gave at SPUSC 99 (South Pacific User Services Conference)
and an NZCS meeting.
Author: Tony Dale, Computer Science Department, University of Canterbury.
August 1999
Updated April 2000
Background
Over 1999 a number of large computer project disasters, each costing more than
10 million dollars, have hit the headlines, so I thought I'd look into some
of the reasons for their occurrence. There is an extensive literature on
project disasters, and they are quite well-characterized, but still they
happen.
The Standish Group has estimated the annual costs of computer software
disasters to be over 80 billion dollars (in J. Johnson, 1995; "Chaos:
the dollar drain of IT
project failures", Application Development Trends, pp 41-7).
For many examples I'll draw on the INCIS disaster, described as
"One of the biggest bureaucratic bungles in NZ history". Anyone want to buy
a used IBM mainframe?
Computing and Other Disasters
Engineering is partially an empirical subject which advances on a trail of
mistakes.
Stressful environments will always flush out design inadequacies, whether
it's a 4WD breaking down in the desert or a bridge falling down because of
wind-induced harmonics; bridge-building technology is still advancing
after more than 10,000 years.
However, computer disasters do seem to be uniquely frequent and well-tolerated.
Consider the outrage generated from a $250,000 WINZ trip to Queenstown, versus
the quiet expiration of the $9 million Waikato Health project, where
the chief executive resigned when the contract was awarded,
and the deputy chairman and
group manager of audit and finance have resigned after the failure.
In May 2000 legal action against the remaining Waikato Health directors
was being mooted.
Large projects can be halted: the $400 million Britomart project
was halted at an
early stage (and the
lawyers are gathering) for want of a $15 million top-up.
However, Britomart was killed off by political changes in the Auckland City
Council, not because it's infeasible to proceed.
Computer disasters, however, are frequently complete failures.
Rick Swinard writes in the Press, 2 June '99: "I find it difficult to think of
another industry that is as good as the computer business at shooting itself in
the foot." I've got news for Rick: it's a team effort.
What Is A Disaster?
- Success is when 70-100% of the project functionality is within
specification. This happens about half the time. Partial success
is 50-70% - actually what
most projects deliver in terms of functionality or performance.
- Failure is 30-50% - you've had a bad experience, but hopefully you
will learn
from it and maybe some of the project is salvageable.
- Disaster is 0-30% - why did you ever start? Considering that INCIS has
completed one stage of a proposed three and blown its budget, we'll give it
33% - a borderline disaster? A complete disaster might mean you lose ownership
of the hardware.
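The four bands above amount to a simple lookup on the percentage of specified functionality delivered. As a minimal sketch in Python, assuming each band includes its lower bound (the list leaves the boundaries open, so exactly 70% counts as success here):

```python
# Outcome bands from the list above, keyed on the percentage of the
# specified functionality actually delivered.
# Assumption: each band includes its lower bound, so exactly 70% is "success".
BANDS = [
    (70, "success"),
    (50, "partial success"),
    (30, "failure"),
    (0, "disaster"),
]


def classify_outcome(percent_delivered: float) -> str:
    """Return the outcome band for a project delivering percent_delivered (0-100)."""
    if not 0 <= percent_delivered <= 100:
        raise ValueError("percent_delivered must be between 0 and 100")
    # Bands run from the highest lower bound to the lowest, so the first
    # match is the right one.
    for lower_bound, label in BANDS:
        if percent_delivered >= lower_bound:
            return label
    return "disaster"  # not reached for valid input: the last band starts at 0


# INCIS: one stage of a proposed three, scored at roughly 33%.
print(classify_outcome(33))  # failure
```

On these bands INCIS's 33% lands just inside "failure", three points above the disaster line, which is why it only rates as a borderline disaster.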
The Progress Of A Disaster
1. The vision
Start with a vision: a vague statement leading to your downfall. Consider
Joan of Arc.
The vision is what keeps a project going long after any sane person would have
abandoned it. Usually the CEO has the vision, and signs the cheques. The
rest is left up to the salespeople and the "tekkies".
Did INCIS have a vision?
INCIS was big and was going to lead the world; a couple of prerequisites for
a disaster.
2. The Spadework
- Complexity.
One big project makes for a much bigger disaster than a lot of little ones.
Combinatorial explosion seems to make costs balloon and schedules slip with
the greatest of ease.
The trouble is frequently that no-one, including the supplier, appreciates the
complexity of the product, and so it's often hugely ambitious. Eg: The
Traveling Salesman Problem was discovered because a US cereal company ran a
competition to chart the shortest path through all State Capitols in the USA:
a puzzle that is easy to state, but whose number of possible routes grows
explosively with each city added.
- Believe what the sales people tell you. (The scariest thing is that often
sales people really believe what they're telling you. Sales people tend not
to think about failure.)
- Do not seek others' advice on a new system or, if you do, ask your supplier.
- Have loose specifications and amend them later: consider a notorious
air force squadron commander's house, where
furnishings were installed, then ripped out and reinstalled, etc, etc.
A Labour Party spokesman, George Hawkins, accused INCIS of being poorly
specified even though the contract ran to 4000 pages!
- Use the latest software technology: 4GLs, 5GLs, OODBMSs, etc.
Be enterprising, be the first, go high-tech.
INCIS specified a lot of high technology at the start, but it was such a big
project that quite a lot of it became obsolete along the way.
Mike Sprange of PriceWaterhouseCoopers says:
"Companies and governments often take a technology-focused approach,
neglecting to consider technology's impact on and suitability for end-users".
- Use the latest hardware. Buy it first if you can, although this applies
more to low budgets, eg, if you spend half your $10,000 budget on
useless hardware.
- Get enthusiastic about the project - it's going to be a winner! Problems of
all sizes are redefined as "teething troubles", the most overworked phrase in
the industry.
INCIS management took the "Amway approach" - think positive!
- Assume it will be easy; never think of what might go wrong because
nothing is going to go wrong, is it?
Rick Swinard writes: "... IT salesmen [and women!] have a gift for convincing
potential buyers that they know all the answers, and that any problems will be
minor glitches able easily to be overcome at minimal cost. Subliminal in all
of this is the message that their clients can't be expected to understand the
sophistication of the software."
The above two points are linked to the incomprehensibility of computers to
"outsiders" such as the CEO. No-one wants to look stupid by asking silly
questions, and the insiders are past masters at dodging pointed questions
(eg: "Why is it so slow?") by making the questioner look stupid. Therefore
everyone tends to avoid discussing messy technical details - the buyer wants
to avoid looking ignorant and the vendor wants to avoid being nailed down
as to what they are really up to.
- Overestimate benefits: at best, a new system is likely to perform no
worse than what you had, and you should assume it will work quite
a bit worse until everyone gets used to it, if it works at all.
INCIS was supposed to save $285 million, really a ridiculously
small payoff for such a
big ($90 million), high-tech gamble.
- Under-configure the hardware - you can buy more later! Scalability is not
just a wonderful tech term here, it's also a great marketing tool, and it
will become a very important issue later
on in the project. However, only expensive servers scale well, so if you're
a low-budget outfit it's probably trade-in time.
- Underestimate costs: calculate them using your hugely optimistic forecasts.
Current cost-forecasting methods are doing well if they get within 100% of
the eventual cost.
INCIS signed up at $90 million in Sept 1994, $120 million in Feb 98,
$130 million plus in May '99.
- Schedule an unrealistically tight timetable and let it slip a lot. We'll
talk about the slippage factors later on, as well.
The INCIS completion date slipped from March '97 to, eventually, August '99 when
it was canned.
- Make sure the Contractor and/or Consultants don't understand your business
operation. In fact, the CEO probably understands the business best but is
happy to pass control into the hands of the SO competent vendor.
IBM had problems here, obviously, when implementing INCIS.
- Don't change or simplify
your business to fit a standard package but rather commission custom
software or (best) modify an existing package extensively.
For a successful MIS system specification you should be
able to write down your business operation on two A4 sheets. In longhand.
- Use computers to introduce management change or (best) computerize
and restructure together. Boards of directors like to do this because it
avoids all that messy union negotiation and interaction with the workers.
INCIS was introduced at the same time as significant police restructuring
(downsizing).
In fact, INCIS was designed to support the restructuring because the police
couldn't afford it in any other way.
- Trust your supplier. Especially, trust their sales staff.
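The Traveling Salesman Problem mentioned under Complexity is a compact illustration of combinatorial explosion. For a symmetric tour (the distance from A to B equals the distance from B to A), fixing the starting city leaves (n-1)! orderings of the rest, and each route is counted twice because it can be driven in either direction, giving (n-1)!/2 distinct round trips. A minimal Python sketch of that count, assuming a brute-force search that examines every tour:

```python
from math import factorial


def distinct_tours(n_cities: int) -> int:
    """Count the distinct round trips through n_cities in a symmetric TSP.

    Fix the starting city, order the remaining n-1 cities in (n-1)! ways,
    then halve the total because a tour and its reverse have the same length.
    """
    if n_cities < 3:
        raise ValueError("a round trip needs at least 3 cities")
    return factorial(n_cities - 1) // 2


# A brute-force search must examine every one of these tours.
for n in (5, 10, 20, 30):
    print(f"{n:>2} cities: {distinct_tours(n):,} tours")
```

Five cities give 12 tours and ten give 181,440; by twenty the count is already in the tens of quadrillions. Each extra city multiplies the count by the number of cities already on the map, the same kind of runaway growth the talk blames for ballooning costs in one big project.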
3. The contract
Rick Swinard again:
"I reckon there's a good opportunity for smart IT lawyers
to get in between govt depts and the likes of the INCIS and Landonline
providers, and insist on performance clauses and cost penalties if their fancy
projects go off the rails."
It's amazing how little use all that stuff is,
when you get down to it. One little change request can void your whole
contract.
The INCIS contract was a 4000 page, fixed-price contract. IBM was happy to
work to this, until the fixed amount of money ran out.
The actual signing up process for a contract is frequently shrouded in
secrecy, but is probably where the vendor will get their
"super-salesperson" involved - the one who will close the contract, remove the
troublesome liability clauses and increase the size of the order, all in one
meeting! Often the actual meeting only involves two or three people, one of
them the CEO, and it's amazing what changes are made to the contract, flying
in the face of all advice. The Wessex Regional Health Authority disaster is
a classic case in point.
Mike Sprange of PriceWaterhouseCoopers says:
"Something that I suspect is lacking in all public sector environments is
a really determined external review of major projects."
Public services in most countries have a Govt Audit office of some kind,
frequently very overworked; however, NZ's office, like the English one, is
pretty much toothless. Only in the USA can the watchdogs stifle a project
at birth, and they only get a few...
4. Impending Disaster
The Press editor asks "why were comparatively unskilled police officers
put in charge of the [INCIS] project?" Actually, they would probably have been
the best people, had they been skilled project managers. As it was, things
just got away on them.
Digging a deeper hole
At this point something interesting happens: the project management starts
to become disconnected from the project implementation (ie: what's really
happening). Here's how:
- Let the tekkies take over - the more technology the better, right? It's
not unknown for the programmers to end up developing something very hi-tech,
but completely useless for the organization, or to use the organization as
a guinea-pig for a brand new system without their knowledge.
The INCIS programmers, by contrast, worked for years for no worthwhile result at all.
- Assume the new system will work fine from day one and so don't run the
old system in parallel: throw it away. This is part and parcel of the "Big
Bang" approach.
- Do not monitor progress properly, especially never ask for bad news:
the messenger is frequently shot, eg: the Health Waikato CEO.
No-one wants to annoy the CEO with details such as massive cost overruns,
schedule slippage, etc. The worse it gets, the less people want to tell.
The police association head, Greg O'Connor, said of INCIS:
"Anyone who questioned the project was pushed aside".
- Don't have "milestones" which the project must pass to continue. Giving up
should be unthinkable.
- Never stop, never go back.
There were a number of obvious places to kill off INCIS, but no-one even
considered it until the money ran out.
- Encourage secrecy: "the programmers will work harder if they compete with
each other" and cover-ups (no bad news, remember?).
- Don't get the end users on board but try to alienate them and
try to force change on them.
Training should be an afterthought: too little, too late.
INCIS users were certainly alienated - they had been hyped up to expect a
better system from the word go.
- Don't run a pilot project or run an unrealistic pilot and then ignore
any bad results.
- Let end users ask for lots of changes late in the project especially
after the
first installation. This is easy to justify because of a "chicken and egg"
problem: often, when you ask them, users don't have a clear idea of what
they want from a computer system until they see it in operation. Therefore,
bespoke software usually develops in an iterative way.
The trouble with last-minute changes versus an inflexible but initially well
specified system is that the schedule is bound to slip, costs are certain to
balloon (NB: often no legal comeback) and bugs will abound. The way to avoid
this is to prototype a system, use it to write a watertight specification
and then
throw the prototype away. However, with a real disaster the prototype
operation is in fact the implementation phase...
INCIS had more than 900 contractual variations over its lifetime.
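The advice above to avoid milestones and never stop is, of course, ironic. As a hypothetical sketch of the opposite practice, here is a simple stage gate in Python: each milestone records planned and actual cumulative spend and whether its deliverable passed review, and the gate returns continue, review or stop. The Milestone class, the 20% tolerance and all the figures are assumptions for illustration, not INCIS data.

```python
from dataclasses import dataclass


@dataclass
class Milestone:
    # Hypothetical milestone record; spend figures are cumulative.
    name: str
    planned_spend: float  # budget planned to be spent by this milestone
    actual_spend: float   # budget actually spent by this milestone
    delivered: bool       # did the milestone's deliverable pass review?


def gate_review(m: Milestone, tolerance: float = 0.2) -> str:
    """Decide 'continue', 'review' or 'stop' at one milestone gate.

    Hypothetical rule: continue only if the deliverable passed and spend is
    within tolerance of plan; stop if it failed AND spend overran; anything
    else goes back to the project board for review.
    """
    over_budget = m.actual_spend > m.planned_spend * (1 + tolerance)
    if m.delivered and not over_budget:
        return "continue"
    if not m.delivered and over_budget:
        return "stop"
    return "review"


# Illustrative figures only (millions of dollars), not taken from INCIS.
plan = [
    Milestone("requirements signed off", 5.0, 5.5, True),
    Milestone("pilot site live", 20.0, 26.0, True),
    Milestone("first release", 45.0, 70.0, False),
    Milestone("full rollout", 90.0, 0.0, False),
]

for milestone in plan:
    decision = gate_review(milestone)
    print(f"{milestone.name}: {decision}")
    if decision == "stop":
        break  # the point of a gate: later stages are never funded
```

Run as written, it passes the first gate, flags the pilot for review because spend is 30% over plan, and stops at the first release, so the full rollout is never funded.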
5. Final Disaster
What follows is consequential on what went on in the previous stages. Now
the chickens come home to roost.
The INCIS failure was unusually spectacular: IBM wanted to renegotiate and
re-specify INCIS, but the government wouldn't move (for no reason, IBM says),
so IBM walked away.
Usually the first external sign of trouble is schedule slippage - only a few
days at first. Other, less visible, signs include:
- Slow (like >5 mins) response times to interactive queries, even in the
prototype system, assuming you had one.
- Huge piles of bug reports, especially "system crash" type bugs.
Of course, these are dismissed as teething troubles, but increasingly
desperate measures are tried:
- Throw money and consultants at any problem to fix it. Throw good money
after bad.
- Don't give up if you've spent 50% (or 100%) of the budget with no results.
- Commit fraud(!) to keep the project going (eg: the Florida Traffic Dept
who were censured for misappropriating funds from a maintenance account for
new development.)
It is possible to turn a sow's ear (a software disaster)
into a silk purse, but you have to spend roughly 300% of your original budget.
The Health Waikato SMS system ended up with a 200% increase.
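The INCIS cost figures quoted earlier ($90 million signed, $120 million in Feb '98, $130 million plus in May '99) and the rescue figures here are easiest to compare as a percentage overrun on the original budget. A minimal Python sketch, assuming overrun means (current - original) / original:

```python
def overrun_percent(original: float, current: float) -> float:
    """Cost growth as a percentage of the original budget."""
    if original <= 0:
        raise ValueError("original budget must be positive")
    return (current - original) / original * 100


# INCIS figures quoted earlier, in millions of dollars. The May '99 figure
# was "$130 million plus", so the last result understates the real growth.
signed = 90
for when, cost in [("Feb '98", 120), ("May '99", 130)]:
    print(f"{when}: ${cost}m is {overrun_percent(signed, cost):.0f}% over ${signed}m")
```

On this measure INCIS was at least 44% over when it was canned. Reading "roughly 300% of your original budget" as a total spend of three times the budget, a rescue is a 200% overrun (overrun_percent(100, 300) gives 200.0), in line with the Health Waikato SMS increase.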
6. Death Of A Project
Often the death of a large project is shrouded in secrecy,
citing commercial sensitivity, etc, as is its birth.
INCIS has been surprisingly public, and
the public enquiry might be quite illuminating.
However, the NZ and English public service both have a real culture of
hiding their failures.
Commercial software failures are much more likely to be hidden: the company
might post a large annual loss, or simply go bankrupt.
7. Litigation
After the project has failed, call in the lawyers. A canny supplier will sue
first so as to put the client on the back foot and hopefully force an
out-of-court settlement.
IBM and the government have settled out of court, with IBM getting all the
monies owed but giving $25 million back to the government.
However, not many clients win in court, no matter how
good the contract (eg: fixed price, no payment until a working system is
delivered), because
- Suppliers are good at getting burdensome contract provisions amended or
removed.
- Even if a "watertight" contract is drawn up, the actual process of
implementation will include lots of non-contractual things, especially
change requests, enough to muddy the waters or nullify the contract.
For example: "Standard" contracts with limitation of liability clauses, Romalpa
clauses, last-minute amendments, etc.
Further research
The Police Commissioner, Peter Doone, wants to put INCIS behind
him and move on (Morning Report 10-8-99).
Watch the Ministry of Ed's new "Tertiary Information Project": a
comprehensive, computerised scheme to link funding, quality assurance and
course and student information; part of the Tertiary White Paper and still
continuing despite the official "hold" on TWP policies.
See http://www.minedu.govt.nz/tertiary/tip/.
Users (schools, polytechs and
universities) will have to pay for the system, and so have a chance of stifling
it at birth (by boycotting it). Already there are concerns about the
functionality and compliance costs of the system: likely to be 3-4 times the
ministry's own allocation of $5m. The users are skeptical! A worrying sign
is the MOE's press releases
saying the project will result in "a major reduction in compliance
and administration costs".
However, recent reports indicate that the project is going well, and the only
institutions up for major compliance costs will be those newly required
to supply information.
Epilogue
Software engineering is still a young science, concerned with the management of
the implementation of hugely complicated systems. Consider wine: a hugely
complicated substance, yet its manufacture is made simple by
many self-regulating processes. In software project implementation there
is often a good
chance for similar self-regulation but the chance is frequently thrown away
because of human nature: politics, competitiveness and fear of failure.
Resources
- Software failure: management failure, Flowers, 1996.
- Crash!, Collins/Bicknell, 1998.
- Recent issues of
The Press, Computerworld, InfoTech Weekly and
NZ Education Review.
- National Radio Morning Report.