Lessons from the O2 Network Outage: The Real Cost of Manual Processes

Friday, December 14, 2018

More than 30 million people in the United Kingdom lost their data connectivity on December 6, 2018, when O2's network suffered a nationwide outage. According to several reports, the incident was caused by a human error at Ericsson, the telecoms supplier responsible for operating certain parts of the O2 network.

To compensate for the downtime and the tarnished reputation, O2's management is now reportedly seeking damages of up to 100 million pounds from Ericsson. A claim of this size suggests that the cost of human errors leading to network downtime is much higher than traditionally accounted for.

At the same time, most service providers working on digital transformation initiatives continue to rely on manual work in many parts of their network automation processes. Based on our discussions with dozens of service providers during 2018, solutions from incumbent network automation vendors such as Cisco, Ericsson, Nokia and VMware still rely on text files and spreadsheets.

As manual processes have been the norm in network management for decades, most organizations view automation from the Operating Expense (OPEX) point of view. An investment in a solution that replaces a pair of human hands is commonly justified by estimating the cost of the manual labor it replaces and calculating the payback time for the investment.

Yet as the recent O2 incident shows, the real total cost of manual work is likely to be much higher. In a report published in mid-2017, Gartner estimated the average cost of network downtime at USD 5,600 per minute, including both direct and indirect costs.

Various estimates also attribute roughly half of all network downtime to human mistakes, so the salaries and overheads paid for manual work are really just the tip of the iceberg.

SD-WAN – Business Case for Automating Manual Network Management Tasks

To provide a concrete example of how the economics of manual steps in network management play out, we prepared a business case for a mid-sized service provider or a large enterprise. As manual mistakes in the source data used for network automation propagate the quickest, we chose Software-Defined Wide Area Networking (SD-WAN) as the example.

Another reason to select SD-WAN is that edge computing is quickly picking up momentum. Yet leading solutions like Cisco Viptela, Nokia Nuage and VMware VeloCloud still often rely on Excel spreadsheets and/or text files as the source of the network data used in service activation. The risk of network downtime caused by human error is at its highest with these emerging technologies, as they allow manual mistakes to propagate further and faster than ever before.

The business case was based on the following assumptions:

  • The operating organization has 10,000 networks under management
  • The organization spends 12 network engineering man-years on manual tasks at an average cost of USD 75,000 per year including overheads
  • On average, each network suffers one minute of downtime annually
  • 49% of downtime is caused by manual errors (Wireless Review)
  • The average total cost of network downtime is USD 5,600 per minute including direct and indirect costs such as lost reputation, lost business, lost productivity and so forth (Gartner)
  • The total investment in the network automation platform is USD 20 million
  • The manual steps make 25% of possible automation use cases impossible, thereby reducing the expected ROI from 20% to 15%

Based on these assumptions, we were able to establish the following total cost for the manual work:

  1. Operating Expense (OPEX) spent on manual tasks: USD 900,000
  2. Foregone ROI due to manual steps in the process: USD 1,000,000
  3. Cost of network downtime: USD 27,440,000
  4. Real annual total cost of manual steps in network automation: USD 29,340,000
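
The article does not spell out the formulas behind these figures, so the following Python sketch uses the simplest ones that reproduce them. In particular, it assumes that the foregone ROI is the investment multiplied by the drop in expected ROI, and that the downtime cost counts only the share of downtime attributed to manual errors. The function and variable names are illustrative only.

```python
# Inputs taken from the assumptions listed above.
NETWORKS = 10_000                  # networks under management
ENGINEER_YEARS = 12                # man-years spent on manual tasks
COST_PER_ENGINEER_YEAR = 75_000    # USD, including overheads
DOWNTIME_MIN_PER_NETWORK = 1       # minutes of downtime per network per year
MANUAL_ERROR_PCT = 49              # % of downtime caused by manual errors
COST_PER_DOWNTIME_MIN = 5_600      # USD per minute, direct and indirect
INVESTMENT = 20_000_000            # USD invested in the automation platform
ROI_EXPECTED_PCT = 20              # expected ROI without manual steps
ROI_WITH_MANUAL_PCT = 15           # expected ROI with manual steps


def manual_work_costs():
    """Return the annual cost components of manual work in USD.

    The formulas are assumptions chosen to match the published figures.
    Percentages are kept as integers so the results are exact.
    """
    opex = ENGINEER_YEARS * COST_PER_ENGINEER_YEAR
    foregone_roi = INVESTMENT * (ROI_EXPECTED_PCT - ROI_WITH_MANUAL_PCT) // 100
    total_downtime_cost = (
        NETWORKS * DOWNTIME_MIN_PER_NETWORK * COST_PER_DOWNTIME_MIN
    )
    # Only the share of downtime caused by manual errors is counted.
    downtime = total_downtime_cost * MANUAL_ERROR_PCT // 100
    return {
        "opex": opex,
        "foregone_roi": foregone_roi,
        "downtime": downtime,
        "total": opex + foregone_roi + downtime,
    }


if __name__ == "__main__":
    for name, value in manual_work_costs().items():
        print(f"{name:>12}: USD {value:,}")
```

Changing a single input, such as the minutes of downtime per network, shows how strongly the total depends on the downtime assumptions rather than on the labor cost.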

As we prepared these calculations in November 2018, we were initially skeptical about whether the outcome could really be true. But with O2 now reportedly seeking up to 100 million pounds over a single incident, our calculation seems to be well within the ballpark.

Conclusion

While the finding that network downtime constitutes 93.5% of the real total cost of manual network management steps was a real eye-opener, these numbers reveal an even more interesting finding. That is, if one invests USD 20 million in automating the propagation of human errors, the chances are that the cost of downtime will exceed the value of the automation investment itself.
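
As a quick check of these two observations, the short Python sketch below recomputes the downtime share and compares the annual downtime cost with the investment. The input figures come from the business case above; the helper name is illustrative.

```python
# Figures from the business case, in USD.
OPEX = 900_000
FOREGONE_ROI = 1_000_000
DOWNTIME = 27_440_000
INVESTMENT = 20_000_000


def downtime_share(opex, foregone_roi, downtime):
    """Return downtime cost as a fraction of the total cost of manual work."""
    return downtime / (opex + foregone_roi + downtime)


share = downtime_share(OPEX, FOREGONE_ROI, DOWNTIME)
ratio = DOWNTIME / INVESTMENT  # annual downtime cost relative to investment

print(f"Downtime share of total cost: {share:.1%}")   # 93.5%
print(f"Downtime cost vs. investment: {ratio:.2f}x")  # 1.37x
```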

As digital transformation marches on, this calculation is unlikely to slow down technology megatrends such as cloud computing, edge computing or 5G. But what it does imply is that anyone who is serious about network automation should not leave any manual steps in an automated process, regardless of what a traditional OPEX-based business case would seem to suggest.