How To Plan A Computer Disaster

Notes from a talk I gave at SPUSC 99 (South Pacific User Services Conference)
and an NZCS meeting.

Author: Tony Dale, Computer Science Department, University of Canterbury.


August 1999
Updated April 2000

Background

During 1999 a number of large computer project disasters, each costing more than 10 million dollars, have hit the headlines, so I thought I'd look into some of the reasons for their occurrence. There is an extensive literature on project disasters, and they are quite well-characterized, but still they happen.

The Standish Group has estimated the annual costs of computer software disasters to be over 80 billion dollars (in J. Johnson, 1995; "Chaos: the dollar drain of IT project failures", Applied Development Trends, pp 41-7).

For many examples I'll draw on the INCIS disaster, described as "One of the biggest bureaucratic bungles in NZ history". Anyone want to buy a used IBM mainframe?

Computing and Other Disasters

Engineering is partially an empirical subject which advances on a trail of mistakes.

Stressful environments will always flush out design inadequacies, whether it's a 4WD breaking down in the desert or a bridge falling down because of wind-induced harmonics; bridge-building technology is still advancing after more than 10,000 years.

However, computer disasters do seem to be uniquely frequent and well-tolerated. Consider the outrage generated by a $250,000 WINZ trip to Queenstown, versus the quiet expiration of the $9 million Waikato Health project, where the chief executive resigned when the contract was awarded, and the deputy chairman and the group manager of audit and finance resigned after the failure. In May 2000 legal action against the remaining Waikato Health directors was being mooted.

Large projects can be halted: the $400 million Britomart project was halted at an early stage (and the lawyers are gathering) for want of a $15 million top-up. However, Britomart was killed off by political changes in the Auckland City Council, not because it was infeasible to proceed. Computer disasters, by contrast, are frequently complete failures.

Rick Swinard writes in the Press, 2 June '99: "I find it difficult to think of another industry that is as good as the computer business at shooting itself in the foot." I've got news for Rick: it's a team effort.

What Is A Disaster?

The Progress Of A Disaster

1. The vision

Start with a vision: a vague statement leading to your downfall. Consider Joan of Arc. The vision is what keeps a project going long after any sane person would have abandoned it. Usually the CEO has the vision, and signs the cheques. The rest is left up to the salespeople and the "tekkies".

Did INCIS have a vision?
INCIS was big and was going to lead the world; a couple of prerequisites for a disaster.

2. The Spadework

3. The contract

Rick Swinard again: "I reckon there's a good opportunity for smart IT lawyers to get in between govt depts and the likes of the INCIS and Landonline providers, and insist on performance clauses and cost penalties if their fancy projects go off the rails."

It's amazing how little use all that stuff is, when you get down to it. One little change request can void your whole contract. The INCIS contract was a 4000-page, fixed-price contract. IBM was happy to work to this, until the fixed amount of money ran out.

The actual signing-up process for a contract is frequently shrouded in secrecy, but it is probably where the vendor will get their "super-salesperson" involved - the one who will close the contract, remove the troublesome liability clauses and increase the size of the order, all in one meeting! Often the actual meeting only involves two or three people, one of them the CEO, and it's amazing what changes are made to the contract, flying in the face of all advice. The Wessex Regional Health Authority disaster is a classic case in point.

Mike Sprange of PriceWaterhouseCoopers says: "Something that I suspect is lacking in all public sector environments is a really determined external review of major projects."

Public services in most countries have a government audit office of some kind, frequently very overworked. However, NZ's office, like the English one, is pretty much toothless. Only in the USA can the watchdogs stifle a project at birth, and they only get a few...

4. Impending Disaster

The Press editor asks "why were comparatively unskilled police officers put in charge of the [INCIS] project?" Actually they would probably have been the best people, had they been skilled project managers. As it was, things just got away on them.

Digging a deeper hole

At this point something interesting happens: the project management starts to become disconnected from the project implementation (ie: what's really happening). Here's how:

5. Final Disaster

What follows is consequential on what went on in the previous stages. Now the chickens come home to roost. The INCIS failure was unusually spectacular: IBM wanted to renegotiate and re-specify INCIS, but the government wouldn't move - IBM says for no reason - so IBM walked away.

Usually the first external sign of trouble is schedule slippage - only a few days at first. Other, less visible, signs include:

Of course, these are dismissed as teething troubles, but increasingly desperate measures are tried: It is possible to turn a sow's ear (a software disaster) into a silk purse, but you have to spend roughly 300% of your original budget. The Health Waikato SMS system ended up with a 200% increase.
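The two percentages above are easy to misread, so here is a minimal sketch of the arithmetic. It assumes the Health Waikato figure is an increase over the original budget, in which case a 200% increase and "300% of the original budget" are the same thing. The function names and the 10 million dollar budget are hypothetical, chosen only for illustration.

```python
def total_percent_of_budget(increase_percent):
    """Convert an overrun stated as an increase into a share of the original budget."""
    return 100.0 + increase_percent


def final_cost(original_budget, increase_percent):
    """Final spend for a given original budget and percentage increase."""
    return original_budget * (1 + increase_percent / 100.0)


# Hypothetical project budget (not a figure from any real project).
budget = 10_000_000

# A 200% increase means spending three times the original budget.
print(total_percent_of_budget(200))   # share of original budget, in percent
print(final_cost(budget, 200))        # final spend in dollars
```

On this reading the Health Waikato overrun sits right at the rule-of-thumb figure rather than below it.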

6. Death Of A Project

Often the death of a large project is shrouded in secrecy, citing commercial sensitivity, etc, as is its birth. INCIS has been surprisingly public, and the public enquiry might be quite illuminating. However, the NZ and English public services both have a real culture of hiding their failures.

Commercial software failures are much more likely to be hidden: the company might post a large annual loss, or simply go bankrupt.

7. Litigation

After the project has failed, call in the lawyers. A canny supplier will sue first so as to put the client on the back foot and hopefully force an out-of-court settlement.

IBM and the government have settled out of court, with IBM getting all the monies owed but giving $25 million back to the government.

However, not many clients win in court, no matter how good the contract (eg: fixed price, no payment until a working system is delivered), because

Further research

The Police Commissioner, Peter Doone, wants to put INCIS behind him and move on (Morning Report 10-8-99).

Watch the Ministry of Ed's new "Tertiary Information Project": a comprehensive, computerised scheme to link funding, quality assurance and course and student information; part of the Tertiary White Paper and still continuing despite the official "hold" on TWP policies. See http://www.minedu.govt.nz/tertiary/tip/.

Users (schools, polytechs and universities) will have to pay for the system, and so have a chance of stifling it at birth (by boycotting it). Already there are concerns about the functionality and compliance costs of the system: likely to be 3-4 times the ministry's own allocation of $5m. The users are skeptical! A worrying sign is the MOE's press releases saying the project will result in "a major reduction in compliance and administration costs". However, recent reports indicate that the project is going well, and the only institutions up for major compliance costs will be those newly required to supply information.

Epilogue

Software engineering is still a young science, concerned with the management of the implementation of hugely complicated systems. Consider wine: it is a hugely complicated substance, yet its manufacture is made simple by many self-regulating processes. In software project implementation there is often a good chance for similar self-regulation, but the chance is frequently thrown away because of human nature: politics, competitiveness and fear of failure.

Resources