If a major component of your vital infrastructure failed today, what would you do? Do you have a plan in place for your business to continue, or would you be out of luck until the issue is resolved and the servers are back online? Having a Business Continuity (BC) plan is an important part of any IT security management program. With a BC plan, operations can continue in the event of a failure, whether that is the failure of an individual component, a whole system, or your entire data center.
At ServerConsultant, we specialize in creating effective and reliable Disaster Recovery plans.
When your systems and servers are evaluated, a key task is to identify Single Points of Failure (SPOFs) in your infrastructure. A SPOF is an individual system, component, or process whose failure stops operations, either within its own part of the environment or across the entire network.
For example:
- ServerConsultant may determine that the company's web server could crash. This could result in lost revenue (because all e-commerce transactions fail to process) and in a considerable loss of data.
- Staff in the Accounting department may determine that the accounting application could fail and cause all payroll and AP/AR functions to cease. This would delay payments sent or received.
- An Operations representative might discover that a failure of the air conditioning system in the data center room would cause all computer equipment to overheat and shut down, which could suspend your entire business.
Please note that the same principles and dangers apply to the backup servers as well.
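As a rough illustration of SPOF identification, the sketch below flags every role in an inventory that is served by fewer than two components. The inventory, the role names, and the `find_spofs` helper are hypothetical and only echo the examples above; a real review also covers processes and dependencies that a simple count cannot capture.

```python
from typing import Dict, List


def find_spofs(inventory: Dict[str, List[str]]) -> List[str]:
    """Return the roles that are served by fewer than two components."""
    return sorted(
        role for role, components in inventory.items() if len(components) < 2
    )


# Hypothetical inventory: role -> components currently able to serve that role.
inventory = {
    "web server": ["web-01"],
    "accounting application": ["acct-01"],
    "data center cooling": ["crac-01"],
    "backup server": ["backup-01", "backup-02"],
}

spofs = find_spofs(inventory)
for role in spofs:
    print(f"Single point of failure: {role}")
```

In this made-up inventory, only the backup server role has a second component behind it, so the other three roles are reported.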
At ServerConsultant, we take disaster planning seriously and plan to protect you against worst-case scenarios. We have a network of geo-redundant backup systems spread across America, Europe and Asia. Don't forget to ask our team about our 3-2-1+1 Backup policy and cold storage systems for maximum protection against data loss.
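For readers unfamiliar with the term, 3-2-1+1 is commonly read as: at least three copies of the data, on at least two types of media, with at least one copy offsite, plus one copy that is offline or immutable. The sketch below checks a set of copies against that common reading. It is an assumption about the rule in general, not a description of ServerConsultant's actual policy, and the `BackupCopy` type and sample copies are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BackupCopy:
    name: str
    media: str                  # e.g. "disk", "tape", "object-storage"
    offsite: bool               # stored away from the primary site
    offline_or_immutable: bool  # cold storage, air-gapped, or write-once


def meets_3_2_1_1(copies: List[BackupCopy]) -> bool:
    """Check the common reading of 3-2-1+1 (an assumption, see above)."""
    return (
        len(copies) >= 3                             # 3 copies of the data
        and len({c.media for c in copies}) >= 2      # 2 different media types
        and any(c.offsite for c in copies)           # 1 copy offsite
        and any(c.offline_or_immutable for c in copies)  # +1 offline/immutable
    )


# Hypothetical set of copies; the production data counts as the first copy.
copies = [
    BackupCopy("production", "disk", offsite=False, offline_or_immutable=False),
    BackupCopy("local backup", "disk", offsite=False, offline_or_immutable=False),
    BackupCopy("remote backup", "object-storage", offsite=True, offline_or_immutable=False),
    BackupCopy("cold storage", "tape", offsite=True, offline_or_immutable=True),
]

compliant = meets_3_2_1_1(copies)
print("3-2-1+1 satisfied:", compliant)
```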
Contingency Plan for Each Failure Point
After every possible failure point has been identified, reviewed and documented, our team creates a backup plan for when those items fail. (After all, it's not a matter of if they fail, but when.) System redundancy is key when creating contingency plans, especially for critical IT systems. While it may seem like an unnecessary expense, having multiple backup servers is critical to ensuring ongoing operations of the business in the event of a system failure. For example, if your website is running entirely off a single web server, ServerConsultant invests in a second server that can be brought online quickly if the primary one goes down.
Ideally, both servers stay online at all times behind a load balancer that distributes requests evenly across them. This reduces the likelihood that the failure of one server causes an outage in the first place.
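To make the idea concrete, here is a minimal sketch of round-robin load balancing that skips servers marked unhealthy. The `RoundRobinBalancer` class and the server names are hypothetical; a production setup would normally rely on a dedicated load balancer with its own health checks rather than hand-written code.

```python
from itertools import cycle
from typing import Dict, List


class RoundRobinBalancer:
    """Hand out servers in turn, skipping any that are marked unhealthy."""

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._healthy: Dict[str, bool] = {s: True for s in self._servers}
        self._ring = cycle(self._servers)

    def mark(self, server: str, healthy: bool) -> None:
        """Record the result of a health check for one server."""
        self._healthy[server] = healthy

    def next_server(self) -> str:
        """Return the next healthy server, trying each server at most once."""
        for _ in range(len(self._servers)):
            candidate = next(self._ring)
            if self._healthy[candidate]:
                return candidate
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["web-01", "web-02"])
normal = [balancer.next_server() for _ in range(4)]    # requests alternate

balancer.mark("web-01", False)                         # simulate a crash
degraded = [balancer.next_server() for _ in range(3)]  # all go to web-02

print(normal)
print(degraded)
```

With both servers healthy, requests alternate between them. Once `web-01` is marked down, every request goes to `web-02`, so the site stays up while the failed server is repaired.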
Having a Business Continuity and Disaster Recovery (BCDR) plan for IT systems is usually relatively straightforward, because most of the time it just involves having spare equipment to swap in. Traffic and operations can often be temporarily routed to a backup network or server if the primary system goes down. This allows a business to continue while any issues get resolved.
Test the BCDR Plan
After identifying single points of failure and creating contingency plans for each one, it's time to test those plans. It's one thing to have a plan, but another to have actually tested it and confirmed that it works. Our team sets aside some time outside of normal operations to simulate a failure and see whether your process goes according to plan.
- We simulate an unexpected shutdown.
- We check whether the Backup Server can be brought online and take over.
- If it does, we record how long services were completely down before the Backup Server came online.
- If this Time To Recovery (TTR) is longer than acceptable, we review the process and streamline or automate recovery steps where possible.
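The measurement step in the list above can be sketched as a polling loop that records how long it takes until a health check first succeeds. Everything here is illustrative: `measure_ttr`, the `SimulatedOutage` stand-in, and the 60-second target are assumptions, and the simulated clock exists only so the example runs without a real outage.

```python
import time
from typing import Callable, Optional


def measure_ttr(
    is_service_up: Callable[[], bool],
    timeout_s: float,
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll until the service responds; return elapsed seconds, or None on timeout."""
    start = clock()
    while True:
        if is_service_up():
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(poll_interval_s)


class SimulatedOutage:
    """Stand-in for a real test: the backup takes over after a fixed delay."""

    def __init__(self, recovers_after_s: float) -> None:
        self.now = 0.0
        self.recovers_after_s = recovers_after_s

    def clock(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds  # advance simulated time instead of waiting

    def is_service_up(self) -> bool:
        return self.now >= self.recovers_after_s


ACCEPTABLE_TTR_S = 60.0  # hypothetical target; set this per business need

outage = SimulatedOutage(recovers_after_s=30.0)
ttr = measure_ttr(
    outage.is_service_up,
    timeout_s=300.0,
    poll_interval_s=5.0,
    clock=outage.clock,
    sleep=outage.sleep,
)
within_target = ttr is not None and ttr <= ACCEPTABLE_TTR_S
print(f"TTR: {ttr} s, within target: {within_target}")
```

In a real test, `is_service_up` would be replaced by an actual health check against the service, and the default clock and sleep would be used. The measured TTR is only as precise as the polling interval.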
Whether the test was successful or not, document your observations and bring them to the next committee meeting. You can discuss how the test went and what areas can be improved.