What if a nuclear bomb hits near your data center...??!!

Let’s face it: if we ever see nuclear bombs used again, the world will have gone too crazy for anyone to maintain anything, let alone a workplace.

But in one of my classes, I asked the students which scenarios should be taken into consideration when planning for business continuity and disaster recovery. I let them talk freely, suggest situations, and discuss the recovery plans. It went perfectly well.

One of the suggested scenarios was “What if a nuclear bomb hits near your data center…??!!” I started to write down their plans and modified them to reflect science – as per my background as a nuclear engineer – and reality as well.

Let’s see some arguments

1- Drop it and run

Obviously, this is the first thing you should think of when you are near a nuclear explosion: drop everything you are doing, whatever it is, and run.

This is one of the mottos advertised by the IAEA (International Atomic Energy Agency) for when you face uncontrolled radiation, because the safety of people is more important than anything else. As the argument goes: “If people die to save your data center, who will be left to work in it?”

If you are working from home, like many people since the pandemic, you are likely away from the blast effects. But for those who are in the area of the blast, here is some information:

In the real world, these are the main effects, based on experimental data, that a blast will produce:

Fireball (~0.9 km wide) — In the area closest to the bomb’s detonation site, searing flames incinerate most buildings, objects, and people.
Radiation (~2 km wide) — A nuclear bomb’s gamma and other radiation are so intense in this zone that 50% or more of people die within “several hours to several weeks,”
Air blast (~7.5 km wide) — This shows a blast area of 5 pounds per square inch, which is powerful enough to collapse most residential buildings and rupture eardrums.
Thermal radiation (~10.5 km wide) — This region is flooded with skin-scorching ultraviolet light, burning anyone within view of the blast. “Third degree burns extend throughout the layers of skin, and are often painless because they destroy the pain nerves. They can cause severe scarring or disablement, and can require amputation.”
Futurism

From the effects above, it is clear that if you have multiple workspaces (i.e., multiple data centers), they should all be more than 10.5 km apart. This keeps them outside the widest effect, which is thermal radiation.
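To make the separation rule concrete, here is a minimal sketch that checks every pair of sites against the 10.5 km figure quoted above. The site names and coordinates are hypothetical, and the haversine formula is my own choice for estimating distance; a real weapon's effect ranges depend on yield and conditions, so treat the threshold as an illustration rather than a safety guarantee.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0   # mean Earth radius
MIN_SEPARATION_KM = 10.5   # widest effect quoted above (thermal radiation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sites_too_close(sites, min_km=MIN_SEPARATION_KM):
    """Return (name_a, name_b, distance_km) for every pair closer than min_km."""
    close = []
    for (name_a, pos_a), (name_b, pos_b) in combinations(sites.items(), 2):
        distance = haversine_km(*pos_a, *pos_b)
        if distance < min_km:
            close.append((name_a, name_b, round(distance, 1)))
    return close


# Hypothetical site names and coordinates, for illustration only.
sites = {
    "dc-primary": (30.0444, 31.2357),
    "dc-nearby": (30.0600, 31.2500),
    "dc-remote": (31.2001, 29.9187),
}
violations = sites_too_close(sites)
for name_a, name_b, distance in violations:
    print(f"{name_a} and {name_b} are only {distance} km apart")
```

Any pair that is printed should be treated as a single site for the purposes of this scenario.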

Another idea came up in my students’ minds: using cloud providers’ services is valuable here. The beauty of the cloud is that most global cloud providers have multiple data centers on different continents.

2- Backup, replication, and other technical stuff

Now, let’s talk about your data…

It is part of our job description to keep our data backed up and to ensure it is replicated to other sites according to a predefined SLA.

But you should take into consideration the different scenarios that could happen and then choose backup and replication options accordingly. In our case here: a nuclear explosion.

Backup and replication (B&R) are an essential part of day-to-day IT life. Existing B&R solutions are always evolving to cope with change, moving from a single backup sent to a single destination toward full data protection solutions.

For example, one widely used solution for backup and replication is Veeam. It has moved from traditional backup policies to full data protection, and it offers different ways to keep even your backups safe and secure, such as backup copy.

With backup copy, you can create several instances of the same backup file and copy them to secondary (target) backup repositories for long-term storage. Target backup repositories can be located in the same site as the source backup repository or can be deployed offsite. The backup copy file has the same format as the primary backup, so you can restore necessary data directly from it in case of a disaster.
Veeam

An archiving job is another way to keep a copy of your backup, in archived form and offloaded from the performance tier.

In the specific case we are talking about, the backup copy or archiving job MUST be planned to be off site, at least 10.5 km away (and now we know why).

Replication is also a good choice to think about. Replication, by definition, is a way to sync your data to one or more locations to ensure consistency and improve reliability, fault tolerance, and accessibility.

If you consider replication, you have to take some terms into consideration, along with how they affect your business. Let’s see.

3- RTO/RPO/MTPD …… and lots of BIA terminologies

In this section, I will discuss some Business Impact Analysis (BIA) terminology that can affect your business, directly or indirectly, according to Wentz Wu.

The first version of NIST SP 800-34 used the term Maximum Allowable Outage (MAO) to describe the downtime threshold of the information system. To further delineate the business process and the information system downtime, Maximum Tolerable Downtime (MTD) and Recovery Time Objective (RTO) terms are used.

Downtime here refers to the disruption of the business process, while outage emphasizes the unavailability of the information system. Terms such as downtime, interruption, and disruption can be used interchangeably, as can allowable, acceptable, and tolerable. Maximum Tolerable Downtime (MTD) is also known as Maximum Tolerable Period of Disruption (MTPD), and Maximum Allowable Outage (MAO) as Maximum Tolerable Outage (MTO).

As various methodologies or approaches may define those terminologies differently and lead to miscommunication, the diagram in this post demonstrates a scenario to introduce common languages used in the analysis of business impact.

Acceptable Interruption Window (AIW)

Acceptable Interruption Window (AIW) is “the maximum period of time that a system can be unavailable before compromising the achievement of the enterprise’s business objectives.” (ISACA, 2019)

AIW is also known as the Maximum Tolerable Downtime (MTD) or Maximum Tolerable Period of Disruption (MTPD). However, the definition by ISACA emphasizes “system,” while MTD or MTPD is a business term that focuses on the disruption of business processes or prioritized activities.

Work Recovery Time (WRT)

Work Recovery Time (WRT) is the “length of time needed to recover lost data, work backlog, and manually captured work once a system is recovered and repaired.” (BRCCI, 2019)

WRT is typically related to the Recovery Point Objective (RPO): the shorter the RPO, the quicker the WRT. The sum of the repair time and WRT should be less than the Recovery Time Objective (RTO).
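The rule above, that repair time plus WRT should stay under the RTO, is easy to check with a few lines of code. This is only a sketch: the function name and the hour figures are hypothetical and not taken from any standard.

```python
from datetime import timedelta


def meets_rto(repair_time, work_recovery_time, rto):
    """True when repair time plus WRT stays under the RTO."""
    return repair_time + work_recovery_time < rto


# Hypothetical figures, for illustration only.
rto = timedelta(hours=8)
ok = meets_rto(timedelta(hours=4), timedelta(hours=3), rto)        # 7 h against an 8 h RTO
too_slow = meets_rto(timedelta(hours=6), timedelta(hours=3), rto)  # 9 h against an 8 h RTO
print(ok, too_slow)
```

If the check fails, either the repair has to get faster or the WRT has to shrink, which usually means a tighter RPO.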

Recovery Time Objective (RTO)

Recovery Time Objective (RTO) is “the amount of time allowed for the recovery of a business function or resource after a disaster occurs.” (ISACA, 2019)

The recovery of a business function or resource means that it meets both the RPO and the Service Delivery Objective (SDO) and is subject to the Maximum Tolerable Outage (MTO); it is restored with the latest data and operates at an adequate level of service within the constraint of the MTO.

Recovery Point Objective (RPO)

Recovery Point Objective (RPO) is “determined based on the acceptable data loss in case of a disruption of operations. It indicates the earliest point in time that is acceptable to recover the data. The RPO effectively quantifies the permissible amount of data loss in case of interruption.” (ISACA, 2019)

The RPO drives the design of recovery or alternate site and backup strategy. It also affects Work Recovery Time (WRT).
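One way to see how the RPO drives the backup strategy is to compare it with the worst-case data loss of a schedule. The model below is my own simplifying assumption: anything written just after a restore point is lost until the next restore point has been created and copied off site, so the worst case is the backup interval plus the off-site copy delay. The function names and figures are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval, offsite_copy_delay=timedelta(0)):
    """Assumed model: backup interval plus the delay to get the copy off site."""
    return backup_interval + offsite_copy_delay


def meets_rpo(backup_interval, rpo, offsite_copy_delay=timedelta(0)):
    """True when the assumed worst-case data loss fits within the RPO."""
    return worst_case_data_loss(backup_interval, offsite_copy_delay) <= rpo


# Hypothetical figures, for illustration only.
rpo = timedelta(hours=4)
nightly = meets_rpo(timedelta(hours=24), rpo)
hourly = meets_rpo(timedelta(hours=1), rpo, offsite_copy_delay=timedelta(minutes=30))
print(nightly, hourly)
```

With a 4-hour RPO, a nightly backup fails the check, while an hourly backup with a 30-minute off-site copy delay passes.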

Service Delivery Objective (SDO)

Service Delivery Objective (SDO) is “directly related to the business needs, it is the level of services to be reached during the alternate mode until the normal situation is restored.” (ISACA, 2019)

When a system is resumed within the RTO and RPO, it operates in alternate mode, in which the system should provide an adequate level of services and meet the SDO.

Maximum Tolerable Outage (MTO)

Maximum Tolerable Outage (MTO) is the maximum time that an enterprise can support processing in alternate mode. (ISACA, 2019)

The alternate mode is not viable for long-term operations. MTO sets the target time period within which the business continuity solutions must transition back to normal mode.

From the above terminology, you can calculate, in brief, how much time and data you can afford to lose while your business continues to work as usual without inconsistency. In our case, you can consider calculating these for a disaster such as a nuclear explosion.

4- Cloud is a possible solution

Moving to the cloud is a good way to avoid many of the above calculations. For example, your workload can be shared between your data center and a public cloud, which is known as “hybrid cloud”. So, in case of such a disaster, you can be more confident that your workload will not be affected.

Another term that has emerged lately is “multi-cloud”, which is a way to host your services across multiple public cloud providers.

Cloudflare

With this approach, your workload not only stays up in case of disaster, but is also protected against a failure at any one of the cloud providers.
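As a rough illustration of the idea, the sketch below picks the first healthy site from a priority list. Everything in it is hypothetical: the site names are made up, and the health check is a plain dictionary lookup standing in for whatever real probe or monitoring you use.

```python
def pick_active_site(sites_in_priority_order, is_healthy):
    """Return the first site that reports healthy, or None if none do."""
    for site in sites_in_priority_order:
        if is_healthy(site):
            return site
    return None


# Hypothetical site names and health status, for illustration only.
priority = ["on-prem-dc", "cloud-provider-a", "cloud-provider-b"]
status = {"on-prem-dc": False, "cloud-provider-a": False, "cloud-provider-b": True}

active = pick_active_site(priority, lambda site: status.get(site, False))
print(active)
```

Here the on-premises data center and the first provider are both down, so the workload lands on the second provider.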

Final thought

Nuclear war is something unimaginable, and many people may consider it the end of life as we know it. It would be one of the worst things that could ever happen, but it could be recovered from, and life would go on. Positive thinking is needed here to imagine how life may go on after the disaster, without losing what you have done and achieved over the years.

Some important links about surviving a nuclear blast:

https://www.cdc.gov/nceh/radiation/emergencies/nuclearfaq.htm

https://www.businessinsider.com/nuclear-disaster-dos-and-donts-2019-9

Experience the power of a nuclear blast in your area:

https://outrider.org/nuclear-weapons/interactive/bomb-blast/

Muhammad Adel

www.archtonic.net
