The purpose of this blog is to highlight the importance of High Availability Monitoring, Problem Remediation, and Maintenance Services. Disclaimer: IBM i is an operating system. iSeries and AS400 are legacy IBM servers that run i5/OS and OS400 respectively. I use all 3 terms interchangeably to attract a wider audience on the web.
If you are using High Availability (HA) to replicate your IBM i production environment to a secondary system, congratulations on making a great move to protect your business.
But are you done?
- Are you 100% confident that you can fail over to your secondary system when needed?
- Do you have regulatory requirements or company policies that require periodic validation of your DR capability? Is your staff practiced in procedures to switch over to your backup system?
- Who is checking the health of your replication process to make sure there are no problems? Is your staff qualified to detect & resolve issues BEFORE they interfere with a failover or planned switch?
- Are you upgrading your MIMIX software regularly and updating your MIMIX configuration when you upgrade your applications?
HA Monitoring, Problem Remediation And Maintenance Services (MPRMS) can help close these gaps so you can be SURE your HA/DR strategy will work when you need it most. MPRMS provides HA users with expert support to detect and correct any issues with HA replication between the production and the target environment, so that you are always switch-ready.
In addition to daily or weekly monitoring and problem remediation, MPRMS can provide updates, configuration support when you upgrade your application software, and annual support for your staff to test and validate your failover plan.
Besides checking your HA solution status, MPRMS also checks and reports on other key IBM i performance indicators to spot early trends that may affect not only your HA replication but your server performance as well.
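To make "spotting early trends" concrete, here is a minimal sketch, not the tooling MPRMS actually uses. It assumes you already collect one reading per day of some indicator, such as disk utilization as a percentage, and it fits a straight line to project how many days remain before that indicator crosses a limit you choose. The function name, the sample readings and the 90% limit are all hypothetical.

```python
from statistics import mean


def days_until_threshold(readings, threshold):
    """Project how many days remain before a metric crosses `threshold`.

    `readings` holds one sample per day, oldest first (hypothetical data,
    for example disk utilization in percent). Returns None when there are
    fewer than two samples or the trend is flat or falling.
    """
    n = len(readings)
    if n < 2:
        return None
    days = range(n)
    day_avg, value_avg = mean(days), mean(readings)
    # Least-squares slope: the average change in the metric per day.
    numerator = sum((d - day_avg) * (v - value_avg) for d, v in zip(days, readings))
    denominator = sum((d - day_avg) ** 2 for d in days)
    slope = numerator / denominator
    if slope <= 0:
        return None
    # Days from the latest reading until the fitted trend reaches the limit.
    return max(0.0, (threshold - readings[-1]) / slope)


# Hypothetical week of readings that grows about 2 points per day.
print(days_until_threshold([70, 72, 74, 76], threshold=90))
```

A straight-line fit is the simplest possible choice here. Real workloads often have weekly or month-end cycles, so treat the projection as a prompt to investigate rather than a forecast.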
Why Monitor High Availability (HA)?
With any HA solution, things can go wrong. Several HA solutions offer built-in auto-heal features that correct many of the issues that come up, but not every issue can be fixed without some intervention.
Some errors have a cascading effect, where a small issue can create bigger downstream problems. So, the earlier you find and fix the error, the better. If you monitor your HA regularly and fix each issue as soon as you identify it, you always have an accurate mirror copy of the production system on the target.
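The daily check described above boils down to a simple triage: list everything that is out of sync or too far behind, and do not call yourself switch-ready until that list is empty. The sketch below illustrates the idea only. It is not an API of MIMIX or any other HA product, and the record fields, item names and 15-minute lag limit are hypothetical stand-ins for whatever your HA solution reports.

```python
from dataclasses import dataclass


@dataclass
class ReplicationItem:
    name: str           # hypothetical: a replicated library, file or object group
    in_sync: bool       # hypothetical: True when the target matches production
    lag_minutes: float  # hypothetical: how far the target trails production


def needs_attention(items, max_lag_minutes=15.0):
    """Return the names of items that would get in the way of a clean switch."""
    return [
        item.name
        for item in items
        if not item.in_sync or item.lag_minutes > max_lag_minutes
    ]


# Hypothetical status snapshot from a daily check.
snapshot = [
    ReplicationItem("ORDERS", in_sync=True, lag_minutes=2.0),
    ReplicationItem("PAYROLL", in_sync=False, lag_minutes=0.0),
    ReplicationItem("INVENTORY", in_sync=True, lag_minutes=45.0),
]

problems = needs_attention(snapshot)
print("Switch-ready" if not problems else f"Fix before switching: {problems}")
```

Running a check like this every day keeps the problem list short, which is the point of the cascading-error warning: one out-of-sync item found today is far easier to fix than several found during a failover.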
What Can Happen When You Don’t Monitor & Remediate High Availability (HA)?
Picture our client who had a catastrophic server failure that took IBM three (3) days to repair. HA replication was configured – but no one was “minding the store”.
When they switched over to the target server, they found that critical data and other objects were missing or out of sync with Production.
This situation can happen for a LOT of reasons: staff turnover, skills deficits, oversights due to heavy staff workloads/special projects, vacations, etc.
Monitoring, Problem Remediation And Maintenance Services With Annual HA Testing/DR Validation Options
Depending on your HA solution, you may have three testing options:
- Virtual Switch – No Downtime. Clients test the target environment while production is still active.
- Limited Live Switch – Switch over to the target for 4-8 hours to validate the failover process & test critical communications processes.
- Extended Live Switch – 2-8 hours. Move production operations to the target system for an extended period of time until a Switch Back is executed.
If your team is NOT trained to monitor and remediate your HA solution AND you want to be sure your DR solution will work, either get your in-house staff the training they need to monitor, remediate and manage your HA solution, or find a competent service you can rely on.
Either way, you WANT to be prepared to fail over successfully to your backup system when the unexpected happens.
Need help? Call me at 714-593-0387 or email me at blosey@source-data.com