The recent catastrophic BA outage, allegedly caused by an electrical contractor and resulting in the total and immediate shutdown of the entire data centre, highlighted an interdependency of systems that wasn’t previously known. It was stated that BA relies on around 200 systems that must work together in its operations. That is a surprising number when you consider that, as a consumer, all you are doing is buying a ticket to get on a plane and handing over your ID/passport details.

Monitoring whether a single process is running is one thing, and checking the uptime of a server or the response from a web page is another, but none of these checks on its own says much about the overall “health” of an application that the business relies on. And although the whole is greater than the sum of its parts, those parts need looking after as well.
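
To make the difference between a component check and application health concrete, here is a minimal sketch in Python. It assumes you already have a pass/fail result for each component and a map of which component depends on which. The component names and the function `effective_health` are hypothetical, invented purely for illustration, and are not taken from any real monitoring tool or from BA's systems.

```python
def effective_health(name, own_status, depends_on, _seen=None):
    """Return True only if `name` and everything it depends on is healthy.

    own_status: dict of component name -> bool (result of its own check)
    depends_on: dict of component name -> list of component names it needs
    Both structures are illustrative assumptions.
    """
    seen = set() if _seen is None else _seen
    if name in seen:
        # Already visited (or a circular dependency): do not check it again.
        return True
    seen.add(name)

    # A component with no recorded check result is treated as unhealthy.
    if not own_status.get(name, False):
        return False

    # Healthy only if every dependency is effectively healthy as well.
    return all(
        effective_health(dep, own_status, depends_on, seen)
        for dep in depends_on.get(name, [])
    )


# Hypothetical example: every individual service check passes,
# but the power feed they all ultimately rely on has failed.
own_status = {
    "booking_site": True,
    "payments": True,
    "passenger_db": True,
    "power_feed": False,
}
depends_on = {
    "booking_site": ["payments", "passenger_db"],
    "payments": ["passenger_db"],
    "passenger_db": ["power_feed"],
}

print(effective_health("booking_site", own_status, depends_on))
```

Looked at in isolation, the booking site's own check is green. Only by following the dependencies does the failed power feed show up as a problem for the application as a whole, which is the kind of hidden interdependency that a review should bring to the surface.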

Time should be allocated to go back and review exactly how an application or service is being delivered, and to understand whether any element of that solution is starting to present a risk. This kind of exercise can also highlight the need for cross-training.