When Telecom and Data Services Fail, People Get Angry
The third week of July 2022 was not a good one for Canadian telecom companies. It was particularly bad for Rogers, after a yet-to-be-explained glitch knocked out internet and wireless services for more than 10 million customers across Canada. And yes, people are angry.
Canada's telecom industry is dominated by three companies: Rogers, Bell Canada and Telus. The three companies combined control more than 90% of the Canadian market so, when one goes down, a lot of people are affected.
What made the most recent shutdown so maddening for so many people was that Rogers was unable to get things up and running in a timely manner. The outage lasted a stunning 19 hours and affected everything from 911 services to local internet access.
The Official Explanation
Rogers' official explanation was a maintenance update failure. That explanation did not sit well with Canadian regulators. They responded by giving the company just a few days to produce a detailed report explaining exactly what happened.
In the meantime, Canadian lawmakers are planning a complete investigation of their own. Rogers continues to try to defuse the situation with ongoing public apologies and the promise of a five-day service credit to compensate for any losses caused by the outage.
Things Happen But…
The Canadian media and government have been absolutely relentless in their criticisms of Rogers since the shutdown. We would not expect anything different but, in fairness, nothing is perfect. All technologies fail at some point. Every system has its weaknesses; every policy has its gaps that remain unforeseen until exposed by a serious problem.
Could Rogers have done anything to prevent the outage? We cannot say until we learn the actual cause. We may find that it simply boils down to the fact that things happen. Even if that is the case, being down for 19 hours is another matter altogether.
That's Why We Have Redundancy
When your customer base is more than ten million, there is a certain expectation that redundancy is built into your system. Redundancy is standard in telecom and data services. Any data centre worth its salt operates with built-in redundancy, and the same goes for colocation centres, telecom centres, etc.
The important question relating to the Rogers failure is not why it happened. Anyone who owns a PC knows that just one software glitch can lead to a blue screen of death and a computer that no longer works. No, the important question is why it took so long to restore service.
Going 19 hours without 911 services is unacceptable and downright dangerous. Expecting people who no longer have landline telephones to go nearly a full day without being able to make a phone call is not reasonable. Something went wrong with Rogers' redundancy capabilities. That is what needs to be determined and fixed.
You Never Really Know
Canada has heavily restricted telecom competition to keep foreign operators out of the market. It has created a highly consolidated system that allows the top three companies to do pretty much what they want. In other countries, including the UK, the exact opposite scenario exists. The one thing all national markets have in common is this: you never really know when service is going to go down.
It's not possible to maintain 100% uptime in perpetuity. Things happen and, when they do, people get angry. The best that telecom and data services companies can do is implement redundancy strategies and policies designed to recover from failures as quickly as possible. Rogers fell down somewhere in that arena in mid-July and now they are paying for it.
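To put the uptime point in numbers, here is a rough sketch of the arithmetic. It assumes, purely for illustration, that the 19-hour outage was the only downtime in a 365-day year, and it uses "four nines" (99.99%) as a commonly quoted industry target rather than anything Rogers is known to promise. The function names are our own.

```python
# Illustrative availability arithmetic. The 19-hour figure comes from the
# outage described above; the one-year period and the 99.99% target are
# assumptions made for this example only.
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year


def availability(downtime_hours: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the fraction of the period during which service was up."""
    if not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime must be between 0 and the period length")
    return 1 - downtime_hours / period_hours


def allowed_downtime_hours(target: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the downtime budget, in hours, for a target availability."""
    return (1 - target) * period_hours


if __name__ == "__main__":
    print(f"Availability with 19 hours down: {availability(19):.3%}")
    print(f"Yearly budget at 99.99%: {allowed_downtime_hours(0.9999) * 60:.1f} minutes")
```

Under those assumptions, a single 19-hour outage leaves a year at roughly 99.78% availability, while a four-nines target allows less than an hour of downtime across the whole year. That gap is why recovery time matters as much as the failure itself.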