Mission Critical IT Disaster-Recovery with iTAM™


 Disaster-Recovery Missing?

During the development of security policies for an organization that was at the beginning of its journey towards safeguarding its own (and its customers’) data, we were never asked the most obvious question: What if a disaster strikes and we actually have to go through this?

Everyone was so focused on determining the input that the obvious thought never came up: What exactly do we do when disaster strikes? Having a great policy is all well and good, but we need to get cracking on implementing technical solutions that help us act when the unthinkable really occurs: a disaster-recovery or business-continuity plan.

Based on the policy, strategies for acting in accordance with it emerge. And sometimes, as in this case, new solutions to the issues at hand are discovered.

A Business Continuity Plan (BCP) is how an organization guards against future disasters that could endanger its long-term health or the accomplishment of its primary mission. The primary objective of a Disaster-Recovery plan (a.k.a. Business-Continuity plan) is to describe how an organization has to deal with potential natural or human-induced disasters.

(Figure; source: Carnegie Mellon [1])

The disaster-recovery plan steps that every enterprise incorporates as part of business management include the guidelines and procedures to be undertaken to respond effectively to, and recover from, disaster scenarios that adversely impact information systems and business operations. Well-constructed and well-implemented plan steps enable organizations to minimize the effects of the disaster and resume mission-critical functions quickly.[2]

According to NIST, the DRP only addresses information system disruptions that require relocation[3]. For our short analysis, we will treat the two terms as meaning the same: investing in an alternative location such as a container data center is not necessary (or possible) in all cases.

Businesses should develop an IT disaster-recovery plan. It begins by compiling an inventory of hardware (e.g. servers, desktops, laptops and wireless devices), software applications and data. Unfortunately, inventories pose an issue more often than not: a complete and up-to-date asset list is rarely available in the form a disaster-recovery plan needs. Most tools on the market support only a limited number of operating systems, and non-smart assets mean tedious manual work. The toolset R&P offers, which underpins our field-proven managed project office, is operating-system agnostic and provides real-time information on HW assets, SW assets (all operating systems), firmware assets, BIOS/EFI, router/switch configs and printer-queue/print-server configuration. Its newest addition also provides access to the firmware versions of displays and further non-smart assets.

From this inventory (asset-management), it is fairly easy to identify critical software applications and data and the hardware required to run them. Using standardized hardware will help to replicate and re-image new hardware. Ensure that copies of program software are available to enable re-installation on replacement equipment. Prioritize hardware and software restoration[4]. (Source: HLS US)
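As a rough illustration of how an inventory can drive restoration priorities, the sketch below ranks applications by a criticality score and derives the order in which hardware is needed. The asset names, the `criticality` scale and the data layout are hypothetical; they are not taken from the iTAM toolset or any R&P interface.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Application:
    name: str
    criticality: int  # hypothetical scale: 1 = restore first
    hardware: List[str] = field(default_factory=list)


def restoration_order(apps: List[Application]) -> List[Tuple[str, List[str]]]:
    """Return (application, required hardware) pairs, most critical first."""
    ranked = sorted(apps, key=lambda app: (app.criticality, app.name))
    return [(app.name, sorted(app.hardware)) for app in ranked]


def hardware_priority(apps: List[Application]) -> List[str]:
    """List each hardware item once, in the order the ranked applications need it."""
    ordered: List[str] = []
    for _, hardware in restoration_order(apps):
        for item in hardware:
            if item not in ordered:
                ordered.append(item)
    return ordered


# Hypothetical inventory extract (assumed data, for illustration only).
inventory = [
    Application("office-suite", 3, ["laptop-pool"]),
    Application("erp", 1, ["db-server-01", "app-server-01"]),
    Application("e-mail", 2, ["mail-server-01", "db-server-01"]),
]

for name, hardware in restoration_order(inventory):
    print(name, "->", ", ".join(hardware))
```

In this sketch, hardware shared by several applications (here the assumed `db-server-01`) is listed once, at the position of the most critical application that depends on it.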

Phases of building a BC- or Disaster-Recovery Plan

Phase I – Data Collection

– The project should be organized with timeline, resources, and expected output

– The business impact analysis should be conducted at regular intervals

– A risk assessment should be conducted regularly

– Onsite and Offsite Backup and Recovery procedures should be reviewed regarding suitability and performance

– Alternate site locations (if any) must be selected and ready for use

Phase II – Plan Development and Testing

– Develop the Disaster Recovery Plan (DRP)

– Test the plan (regularly)

Phase III – Monitoring and Maintenance

– Maintenance of the plan by way of updates and regular reviews

– Periodic inspection or audit of DRP

– Documentation of any changes

There is – of course – a need to give staff the necessary information about the plans and to train them on it; otherwise, staff cannot follow the rules once a critical situation hits.

Disaster-Recovery Plan Criteria

Documentation should be maintained for all of the following: the procedures for declaring an emergency, evacuation of the site according to the nature of the disaster, active backup, notification of the relevant officials/DR team/staff, notification of the procedures to be followed when disaster breaks out, and alternate location specifications. It is beneficial to be prepared in advance with sample DRPs and disaster recovery examples so that every individual in the organization is better educated on the basics. Most IT-based organizations have a workable business continuity planning template or scenario plans available to train employees in the procedures to be carried out in the event of a catastrophe[5].

Recovery strategies should be developed for Information technology (IT) systems, applications and data. This includes networks, servers, desktops, laptops, wireless devices, data and connectivity. Priorities for IT recovery should be consistent with the priorities for recovery of business functions and processes[6]. (Source: HLS US)

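Downtime can be identified in several ways[7] (Source: NIST):

– Maximum Tolerable Downtime (MTD): the total amount of time an outage or disruption of a business process can be accepted

– Recovery Time Objective (RTO): the maximum amount of time a system resource can remain unavailable before the impact on other resources and supported processes becomes unacceptable

– Recovery Point Objective (RPO): the point in time, prior to the disruption, to which data can be recovered after an outage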

Cost-Benefit

The longer a disruption is allowed to continue, the more costly it can become to the organization and its operations. Conversely, the shorter the return time to operations, the more expensive the recovery solutions are to implement[8].

Figure: Cost Balance Point / Break Even
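The trade-off between the cost of a disruption and the cost of a faster recovery can be explored numerically. The following is a minimal sketch under stated assumptions: the disruption cost is assumed to grow linearly with downtime, and the recovery solution is assumed to become proportionally more expensive as the target recovery time shrinks. Both cost curves and all figures are hypothetical, not taken from NIST or from R&P data.

```python
from typing import Iterable


def disruption_cost(hours: float, loss_per_hour: float) -> float:
    """Assumed linear model: every hour of downtime costs the same amount."""
    return hours * loss_per_hour


def recovery_cost(hours: float, base_cost: float) -> float:
    """Assumed inverse model: halving the recovery time doubles the solution cost."""
    return base_cost / hours


def cost_balance_point(candidates: Iterable[float],
                       loss_per_hour: float,
                       base_cost: float) -> float:
    """Return the candidate recovery time with the lowest combined cost."""
    return min(
        candidates,
        key=lambda h: disruption_cost(h, loss_per_hour) + recovery_cost(h, base_cost),
    )


# Hypothetical figures: 1,000 per hour of outage, 16,000 for a one-hour recovery solution.
best = cost_balance_point([1, 2, 4, 8, 16], loss_per_hour=1000, base_cost=16000)
print(best)  # 4: at four hours both cost components equal 4,000
```

With real data the two curves will rarely be this regular; the point of the sketch is only that the balance point is where the combined cost stops falling and starts rising.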

(Note that R&P excel in cost-reduction of systems recovery)

IT Disaster-Recovery Strategies

Information technology systems require hardware, software, data and connectivity. Without one component of the “system,” the system may not run. Therefore, recovery strategies should be developed to anticipate the loss of one or more of the following system components:

– Computer room environment (secure computer room with climate control, conditioned and backup power supply, etc.)

– Hardware (networks, servers, desktop and laptop computers, wireless devices and peripherals)

– Connectivity to a service provider (fiber, cable, wireless, etc.)

– Software applications (electronic data interchange, electronic mail, enterprise resource management, office productivity, etc.)

– Data and restoration[9] (Source: HLS US)

Impact Analysis

The impact analysis should identify the operational and financial impacts resulting from the disruption of business functions and processes. Impacts to consider include:

  • Lost sales and income
  • Delayed sales or income
  • Increased expenses (e.g., overtime labor, outsourcing, expediting costs, etc.)
  • Regulatory fines
  • Contractual penalties or loss of contractual bonuses
  • Customer dissatisfaction or defection
  • Delay of new business plans

These impacts apply to corporate businesses and, in similar ways, to public services.[10]
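The financial side of such an impact analysis can be approximated with simple arithmetic. The sketch below assumes that each impact category either accrues per hour of outage or is incurred once per incident; this split, the class name `Impact` and all amounts are hypothetical and only serve as an illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Impact:
    category: str
    per_hour: float = 0.0  # assumed cost accruing for each hour of outage
    one_off: float = 0.0   # assumed cost incurred once per incident


def estimate_impact(impacts: List[Impact], outage_hours: float) -> Dict[str, float]:
    """Return the estimated cost per category plus a 'total' entry."""
    breakdown = {
        impact.category: impact.per_hour * outage_hours + impact.one_off
        for impact in impacts
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


# Hypothetical figures for a four-hour outage.
impacts = [
    Impact("Lost sales and income", per_hour=2000),
    Impact("Increased expenses", per_hour=500),
    Impact("Contractual penalties", one_off=10000),
]
for category, cost in estimate_impact(impacts, outage_hours=4).items():
    print(f"{category}: {cost:,.0f}")
```

Impacts such as customer dissatisfaction or delayed business plans are hard to price and are deliberately left out of this sketch.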

Testing and Maintenance

The dates of testing, the disaster recovery scenarios, and the plans for each scenario should be documented. Maintenance involves keeping records of scheduled reviews (daily, weekly, monthly, quarterly and yearly); reviewing plans, teams, activities and tasks accomplished; and reviewing and updating the complete documentation.
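Keeping track of which scheduled reviews are due is easy to automate. The sketch below is one possible approach, assuming each record stores a review frequency and the date of the last review; the record names are hypothetical, and the day counts used for "monthly" and "quarterly" are rough approximations.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

# Approximate interval lengths in days (assumption: 30-day months, 91-day quarters).
REVIEW_INTERVALS = {"daily": 1, "weekly": 7, "monthly": 30, "quarterly": 91, "yearly": 365}


def overdue_reviews(records: Dict[str, Tuple[str, date]], today: date) -> List[str]:
    """Return the names of all items whose last review is older than their interval."""
    overdue = []
    for name, (frequency, last_review) in records.items():
        if today - last_review > timedelta(days=REVIEW_INTERVALS[frequency]):
            overdue.append(name)
    return sorted(overdue)


# Hypothetical review records.
records = {
    "backup log": ("daily", date(2024, 3, 1)),
    "DR team contact list": ("quarterly", date(2024, 1, 15)),
    "full plan": ("yearly", date(2023, 6, 1)),
}
print(overdue_reviews(records, today=date(2024, 3, 4)))  # ['backup log']
```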

In case of an incident

These are the five recommended steps in case of any incident, be it a hacking attack or another malevolent cyber-incident (e.g. ransomware hitting the organization), a malfunctioning software or operating-system update, or faulty firmware, BIOS or software patches:

– Identification

– Containment

– Eradication (a good example of an action performed during the eradication phase is using the R&P-provided toolset, which allows each complete system to be recovered individually, end-to-end. Professional services close the attack vectors, but at this point it is of the essence not to lose time on forensic or analytical work. If necessary, the R&P tools can clone affected systems for later analysis.)

– Recovery (bring affected systems back into the production environment carefully, so as to ensure that doing so does not lead to another incident. It is essential to test, monitor, and validate the systems that are being put back into production to verify that they are not being re-infected by malware or compromised by some other means.)

– Lessons Learnt[11] (well, this is the task of documentation everyone hates, but it is essential for future reference)
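The phases above are meant to be worked through in order. A minimal sketch of how a tracking tool could enforce that order is shown below; the `Incident` class is hypothetical and not part of the R&P toolset or of the SANS handbook.

```python
from enum import IntEnum
from typing import List, Optional


class Phase(IntEnum):
    IDENTIFICATION = 1
    CONTAINMENT = 2
    ERADICATION = 3
    RECOVERY = 4
    LESSONS_LEARNT = 5


class Incident:
    """Tracks which phases of an incident have been completed, in order."""

    def __init__(self, title: str) -> None:
        self.title = title
        self.completed: List[Phase] = []

    @property
    def next_phase(self) -> Optional[Phase]:
        """The phase that must be completed next, or None when all are done."""
        if len(self.completed) == len(Phase):
            return None
        return Phase(len(self.completed) + 1)

    def complete(self, phase: Phase) -> None:
        """Mark a phase as done; refuse to skip or repeat phases."""
        expected = self.next_phase
        if expected is None:
            raise ValueError("all phases are already complete")
        if phase is not expected:
            raise ValueError(f"expected {expected.name}, got {phase.name}")
        self.completed.append(phase)


incident = Incident("ransomware on file server")  # hypothetical example
incident.complete(Phase.IDENTIFICATION)
incident.complete(Phase.CONTAINMENT)
print(incident.next_phase.name)  # ERADICATION
```

Refusing out-of-order completion is a design choice of this sketch; it keeps, for example, recovery from being signed off before eradication has been recorded.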

Checklist

This checklist helps to make sure all boxes are ticked when an incident hits you:

– Stop the attack in progress.

– Cut off the attack vector.

– Assemble the response team.

– Isolate affected instances.

– Identify timeline of attack.

– Identify compromised data.

– Assess risk to other systems.

– Assess risk of re-attack.

– Apply additional mitigations, additions to monitoring, etc.

– Forensic analysis of compromised systems.

– Internal communication.

– Involve law enforcement (if you are not law enforcement yourselves).

– Reach out to external parties that may have been used as vector for attack.

– External communication.

Getting rid of assumptions as a winning strategy

Summarizing, here are the five major points to consider in disaster-recovery:

  1. Repeated probing and testing of IT security delivers facts and figures instead of a false feeling of safety.
  2. Generally speaking, the lead time to recovery of any of your configuration items (CI) is the best possible recovery time. Any company can be out of business quickly if it is incapable of returning to an operational state. If Deutsche Bank is not operational for one day, it is their doomsday. Security tests will deliver unpleasant facts about IT assets formerly deemed safe. Take 20 minutes to return to normal as a goal.
  3. Companies lose customers when trust in their capabilities vanishes (e.g. through repeated outages or an inability to adapt). Public services sometimes have even more critical usage, where minutes count. Drawing on R&P’s public sector/HLS experience is not a bad idea.
  4. The shortcut to disaster recovery is to build in a proper DR capability as early as the planning phase.
  5. The second-best strategy is to lose no time in reviewing the existing IT infrastructure and enhancing it with the R&P MPO toolset.
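Point 2 names 20 minutes as a recovery goal. One way to turn repeated tests into facts and figures is to compare measured recovery times against that goal, as in the sketch below. The CI names and measurements are hypothetical.

```python
from datetime import timedelta
from typing import Dict, List

RECOVERY_GOAL = timedelta(minutes=20)  # the 20-minute goal named in point 2


def missed_goal(measured: Dict[str, timedelta],
                goal: timedelta = RECOVERY_GOAL) -> List[str]:
    """Return the CIs whose measured recovery time exceeds the goal, slowest first."""
    late = {ci: duration for ci, duration in measured.items() if duration > goal}
    return sorted(late, key=late.get, reverse=True)


# Hypothetical results from a recovery test.
measured = {
    "file-server": timedelta(minutes=12),
    "erp-db": timedelta(minutes=45),
    "mail-gateway": timedelta(minutes=25),
}
print(missed_goal(measured))  # ['erp-db', 'mail-gateway']
```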

Roth & Partners have significant experience in the above five topics and the capability to support IT experts globally in their challenge to enhance IT security systems. Your advantage: give us a bell at one of our centers or write us an e-mail.

Sources:

[1] http://resources.sei.cmu.edu/asset_files/TechnicalReport/2004_005_001_14405.pdf

[2] http://www.disasterrecovery.org/

[3] http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-34r1.pdf

[4] https://www.ready.gov/business/implementation/IT

[5] http://www.disasterrecovery.org/plan_steps.html

[6] https://www.ready.gov/business/implementation/IT

[7] http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-34r1.pdf

[8] http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-34r1.pdf

[9] https://www.ready.gov/business/implementation/IT

[10] https://www.ready.gov/business-impact-analysis

[11] https://www.sans.org/reading-room/whitepapers/incident/incident-handlers-handbook-33901

 
