Facebook Crash: A Reminder About Redundancy and Backups
The ‘unthinkable’ happened on Monday, 4 October 2021: Facebook crashed around the world for more than six hours. Billions of users could not log in to their Facebook accounts, much less check their messages and newsfeeds. The outage also took down other Facebook-owned services, including Instagram, WhatsApp and Messenger.
Though the outage was yet another black eye for the world's largest social media platform, it was also a reminder of the need for redundancy and backups. No company is too big to fail online. No network is strong enough to maintain 100% uptime in perpetuity.
What Happened to Facebook?
At the time of this writing, it is still not entirely clear what happened. Media reports indicate that Facebook suffered an internal problem caused by a faulty configuration change. As a result, the company's entire network went dark. Recovery was complicated further because technicians reportedly could not even get into Facebook's data centres: their ID cards could not be scanned.
For a more in-depth understanding of what may have happened, take a look at a Cloudflare post from Celso Martinho and Tom Strickx. Written in the hours following the crash, it explains how Facebook's network essentially stopped announcing itself to the wider internet: its BGP routes were withdrawn, which left its DNS name servers unreachable. The post walks through the details of the BGP updates, the DNS lookups that began failing, and the IP address ranges involved.
The most important takeaway is that what happened to Facebook can happen to any company, regardless of size or financial position. Every network is vulnerable. Your best defence against losses caused by downtime is a combination of redundancy and data backups.
Know Your Data Centre
Facebook's case is unusual in that it owns its own data centres and infrastructure. When something on its network goes wrong, Facebook alone is responsible for fixing it. The vast majority of companies with an online presence are not in that position: their data and networks are hosted in commercial data centres.
What should this tell you as a small business owner? Know your data centre provider. Does your provider have built-in redundancy ready to go 24/7? If not, a single server failure could take your business offline for an unacceptable length of time. Likewise, all your data should be backed up on separate servers, just in case something goes wrong.
Downtime Means Lost Revenue
Because we don't know exactly what caused the Facebook crash, it's hard to say why redundancy didn't solve the problem right away. We do know that being down for more than six hours likely cost Facebook dearly, both in lost advertising revenue and in a drop in its share price that erased billions of dollars of market value.
The reality is that downtime means lost revenue. Whenever customers cannot reach your business online, they may well go elsewhere. If your site goes offline often enough, you could see a considerable drop in your customer base. It all adds up to lost revenue that your company cannot afford.
We may never know exactly what happened to Facebook. Its explanation of an internal configuration error may or may not be accurate; either way, the lessons here are clear. Business owners need confidence in their data centre providers. They need to know that their providers offer redundancy and sufficient data backups.
While you are thinking about the reliability of your own data centre provider, all eyes will be on Facebook to see whether more details emerge. It is almost surreal that a company of Facebook's size and reach could suffer such a significant outage for so long. But it demonstrates that outages can happen to even the best.