One of the most frequent concerns we hear from clients is what options they have for business continuity beyond basic tape backup.
Disclaimer: IBM i is the operating system that runs on IBM POWER servers. iSeries and AS400 are earlier IBM server lines. I use IBM i, iSeries, and AS400 interchangeably in my title and text to make it easier for people searching the internet to find information like this.
Recovering From Unexpected Downtime
First, let’s consider the causes of unexpected downtime.
Industry studies have documented that about 40% of unplanned downtime is caused by human error: an incorrect procedure that accidentally deletes a file, an uploaded spreadsheet that overwrites correct records, an improper software change, sabotage by a disgruntled employee, and more.
About 25% is caused by hardware or software problems: disk failure, component failure (fan, power supply, motherboard, bus, etc.), data corruption, software bugs, incompatible software versions, application software issues, and the like.
About 17% is the result of viruses or security breaches.
Less than 7% is the result of disaster — a flood, an earthquake, or a jet falling through the building.
In other words, plenty of things besides natural disasters cause downtime. You probably cannot prevent them all, but you can be prepared to recover from them.
Recovery Time Objective (RTO) – How Fast Do You Need To Recover?
The first issue to decide is how fast you need to be back up and running. The industry term is “Recovery Time Objective” or RTO. In essence: how much downtime can your business afford, and therefore how quickly must your team complete the recovery?
You and your team are the only ones who can answer this question, and there are many variables. Over the years, many businesses have considered downtime of two days to as much as two weeks acceptable. However, as businesses become more dependent on their computers (for order processing, scheduling, customer service, government compliance, email, and mission-critical applications) and move to 24 x 7 operations, the acceptable recovery time is shrinking dramatically.
Increasingly, even small businesses want to shorten recovery from 1–2 days to less than 24 hours, or even 1–2 hours.
Recovery Point Objective (RPO) – At What Point Do My Records Need To Be Current?
The second key issue to decide is what recovery point is acceptable. The industry term is “Recovery Point Objective” or RPO. In other words: after a successful recovery, how close to the moment of failure will your data be? Anything entered after the recovery point is lost and has to be re-entered.
For many users, this means restoring the last complete backup, then restoring, in the correct sequence, the backups of changes taken since then.
Increasingly, our clients want to recover right up to the last transaction before the downtime.
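To make the recovery point concrete: whatever was entered between the last recoverable point and the failure is the data you lose. The sketch below is a minimal illustration, assuming a single recovery point (for tape, the end of the last backup; for replication, the last change the target received). The names data_loss_window and meets_rpo and the dates are hypothetical, chosen only for the example.

```python
from datetime import datetime, timedelta


def data_loss_window(last_recovery_point: datetime, failure_time: datetime) -> timedelta:
    """Span of time whose transactions are lost: everything after the last recovery point."""
    return failure_time - last_recovery_point


def meets_rpo(last_recovery_point: datetime, failure_time: datetime, rpo: timedelta) -> bool:
    """True if the data lost in this failure stays within the Recovery Point Objective."""
    return data_loss_window(last_recovery_point, failure_time) <= rpo


# Hypothetical example: nightly backup finished at 11 PM; the system is lost at 3 PM the next day.
last_backup = datetime(2024, 3, 4, 23, 0)
failure = datetime(2024, 3, 5, 15, 0)

print(data_loss_window(last_backup, failure))                # 16:00:00
print(meets_rpo(last_backup, failure, timedelta(hours=24)))  # True
print(meets_rpo(last_backup, failure, timedelta(hours=1)))   # False
```

With a nightly backup, the loss window can approach a full day. Recovering to the last transaction means an RPO close to zero, which backups alone cannot deliver.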
Implications for Business Continuity
So, taking both the recovery point and the recovery time into account, how long will you actually be down? Assume you lost everything on disk but have all your tapes (the last complete backup and all later backups, in the correct sequence) and a working server. Restoring can then take about 12–24 hours, depending on the size and complexity of your system, and that assumes you have no more than roughly 300–700 GB to restore. Once your system grows much larger, say 1–5 terabytes or more, recovery is no longer a simple restore.
Note that 12–24 hours is a typical restore time only if nothing goes wrong. In a worst-case scenario, which might include a faulty tape or tape drive, a power disruption, or delays in getting key hardware, software, software license keys, or personnel, two to five days can be more realistic. There have even been cases where key backup tapes were missing or mislabeled and data had to be recovered from source documents or archives. That can take weeks to months, if it can be done at all.
Consequently, business continuity is a primary issue that merits a close look.
Determining Your Requirements
How much mission-critical software and data do you need to protect? If you have 50–200 GB, you should, at the very least, combine high-speed backup tapes with RAID disk protection.
If you have 500 GB or more, it’s a good idea to start looking for additional options for business continuity.
Now suppose you have a hardware failure. How long does it take to get a replacement part, or a replacement server? Depending on your location, your server and its age, and the availability of parts and knowledgeable tech support, this alone might take 1–3 days. In a remote location, it may take an additional 4–7 days. After a natural disaster such as a hurricane or tornado, who knows how long it would take just to replace critical hardware, or even to restore power.
Also, any data entered since the last backup has to be re-entered, if it is even available. Assuming it is, that may add at least another 1–2 days. So with the traditional tape backup and restore strategy, you can easily be down for 2–5 days, possibly much longer.
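Adding up the phase estimates from the last few paragraphs shows where that figure comes from. This is a minimal sketch, assuming the phases happen strictly one after another and using the article’s rough ranges (1–3 days for hardware, 12–24 hours for the restore, 1–2 days of re-entry). PHASES and total_downtime are illustrative names, not part of any tool.

```python
# Rough (low, high) durations in days for each recovery phase, taken from this article.
PHASES = {
    "obtain replacement hardware": (1.0, 3.0),
    "restore full and change backups": (0.5, 1.0),  # 12-24 hours
    "re-enter data lost since last backup": (1.0, 2.0),
}


def total_downtime(phases: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Sum the phase ranges, assuming each phase starts only after the previous one ends."""
    low = sum(lo for lo, _ in phases.values())
    high = sum(hi for _, hi in phases.values())
    return low, high


low, high = total_downtime(PHASES)
print(f"Estimated downtime: {low:g} to {high:g} days")  # Estimated downtime: 2.5 to 6 days
```

The result roughly matches the 2–5 day estimate, running a little higher at the top end, and it still excludes the worst cases described earlier: remote locations, bad tapes, and natural disasters.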
What to do?
Let’s look at your options.
Backup Server Options
For years, one popular approach has been a backup server, known as a “cold site” (this could be as simple as a backup server in your garage). Don’t laugh! I have lots of clients like this. A “hot site,” as popularized by many DR providers, is another option: either a replacement server, typically delivered in 1–2 days, or access to a similar system at a data center. How much does this approach cost? You can get a rough idea of the cost of a backup server by calling 2–3 used computer dealers. Data center hot site contracts generally run about $1000–$5000 per month or more, depending on your server. Replacement-server contracts for an entry-level IBM i system can run about $300–$1500 per month.
How long will it take you to get up and running again? In most cases, figure 1–2 days to get your server, then anywhere from 12 hours to about three days to restore.
So you have effectively been down for 2–5 days, and limiting your downtime even that much costs $300–$5000 per month. Good enough? Actually, there are newer services that cost less and let you recover WAY faster.
Cloud Online Backup
The next option you might want to consider is Cloud Online Backup. Simply put, only changed data is transmitted daily from an on-premises backup appliance to a remote hosted backup site. This approach offers several advantages. It is affordable: $400–$1000 per month, depending on how much data you need to save. It is frequently cheaper and safer than paying an operator to create a daily backup tape, label and log it properly, and hiring a service to pick it up and store it offsite for future access. When you lose a file, you can easily download it from the hosted site. If you need to restore an entire system, you can have it restored to a Cloud IBM i POWER server within 12–24 hours, have a complete backup copy shipped overnight for a full restore at high speed, or receive a replacement system populated with your environment, current as of your last backup.
Most clients choose to restore to a Cloud IBM i POWER server because recovery is faster and easier.
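The phrase “only changed data is transmitted” is the heart of this approach. The sketch below is a minimal illustration, assuming each saved object can be read as bytes and compared, via a SHA-256 fingerprint, against the fingerprint recorded at the last transmission. The names fingerprint, changed_objects, and run_daily_backup are hypothetical; real backup appliances use their own change-tracking methods.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Content fingerprint used to tell whether an object changed."""
    return hashlib.sha256(data).hexdigest()


def changed_objects(current: dict[str, bytes], last_sent: dict[str, str]) -> dict[str, bytes]:
    """Return only objects that are new or whose content changed since the last transmission."""
    return {
        name: data
        for name, data in current.items()
        if last_sent.get(name) != fingerprint(data)
    }


def run_daily_backup(current: dict[str, bytes], last_sent: dict[str, str]):
    """Pick the changed objects and record their new fingerprints (deletions are not modeled)."""
    to_send = changed_objects(current, last_sent)
    # A real appliance would transmit to_send to the hosted backup site here.
    updated = dict(last_sent)
    updated.update({name: fingerprint(data) for name, data in to_send.items()})
    return to_send, updated


# Hypothetical objects: on day 1 everything is new; on day 2 only ORDERS changed.
day1 = {"CUSTOMERS": b"Acme,Globex", "ORDERS": b"1001"}
first_sent, state = run_daily_backup(day1, {})
day2 = {"CUSTOMERS": b"Acme,Globex", "ORDERS": b"1001,1002"}
sent, state = run_daily_backup(day2, state)
print(sorted(sent))  # ['ORDERS']
```

The first run sends everything; after that, the daily transfer shrinks to whatever actually changed, which is what keeps the monthly cost and bandwidth modest.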
The “Dark Side” of Backup-Based Options
Unfortunately, whether you rely on backup tapes with a backup server or on a cloud backup vault, any data entered after the last backup is lost when the system goes down, and all of it may have to be re-created.
High Availability (HA)
The ultimate in business continuity is High Availability (HA): real-time replication of changed data. Here’s how it works. Every change on your production server is transmitted to a remote target server. That way, you have a mirrored copy of your production system safely offsite, ready to carry on the business if your production environment goes down.
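The replication idea can be sketched as a toy, in-memory model: every production change is queued and applied to the target in the same order, so at failover the only data at risk is whatever had not yet been sent. The Server and Replicator classes are hypothetical illustrations, not how any particular product works; IBM i HA products typically replicate at the database and object level, commonly built on journaling, and handle much more than this.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Server:
    """Toy stand-in for a system holding business records (hypothetical)."""
    name: str
    records: dict = field(default_factory=dict)

    def apply(self, change: tuple) -> None:
        # A change is (operation, key, value); only "put" and "delete" are modeled.
        op, key, value = change
        if op == "put":
            self.records[key] = value
        elif op == "delete":
            self.records.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")


@dataclass
class Replicator:
    """Captures each production change and ships it, in order, to the target."""
    production: Server
    target: Server
    pending: deque = field(default_factory=deque)

    def write(self, change: tuple) -> None:
        self.production.apply(change)
        self.pending.append(change)  # queued for transmission to the target

    def transmit(self) -> None:
        # Apply changes in their original order so the target stays consistent.
        while self.pending:
            self.target.apply(self.pending.popleft())

    def fail_over(self) -> tuple:
        """Production is lost: users switch to the target.
        Only changes that were never transmitted are lost."""
        lost = list(self.pending)
        self.pending.clear()
        return self.target, lost


rep = Replicator(Server("production"), Server("target"))
rep.write(("put", "order-1001", "10 widgets"))
rep.write(("put", "order-1002", "5 gears"))
rep.transmit()  # in a real HA setup this runs continuously
rep.write(("delete", "order-1001", None))
active, lost = rep.fail_over()  # production fails before the delete is sent
print(active.name, active.records, lost)
```

The target ends up holding both orders, and the only thing lost is the one delete that had not yet been transmitted. The shorter the replication lag, the smaller that gap. Switching back to the original server, conflict handling, and security are all left out.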
The cost, and here’s the big surprise, is $1500–$2500 per month. (Naturally, the price varies with the size and complexity of your system.) Thanks to dramatic cost reductions in hardware and replication software, along with faster secure Internet transmission, High Availability replication is now priced in the same range as many hot site contracts, while cutting recovery time from days to minutes. The time it takes to do a role swap? Generally, 20 minutes or less.
If you are serious about protecting your business from potentially disastrous downtime, you should definitely look into High Availability.
Want expert help to design and plan your IBM i (iSeries/AS400) Business Continuity Plan? Email blosey@source-data.com or call 714-593-0387