High Availability with Batch Processing: what you don’t know!

We have been beating the drum for some time now about the lack of batch monitoring in most High Availability products on the market today (HA4i is an exception). Most are able to replicate the actual ‘Job Queue’ object, but cannot replicate its content as jobs flow through.

Many High Availability sales people state that it’s not important; they believe they provide all of the replication tools you need, so if you have to switch, everything will be exactly as it should be. The truth is IT IS IMPORTANT if your business relies on batch processing to run its applications. Another problem is the good old ‘HA ROLE SWAP TEST’. High Availability consultants generally use this as a sign that you are covered should disaster strike, and it tends to lull you into a false sense of security. Why? Because these role swap tests are ALWAYS PLANNED: the test is run at a specific time of day when all activity has been stopped on the system and everything is ready to change direction. That means all of the pending jobs have been cleared from the job queues and everything is in a ‘KNOWN STATE’, which ensures that when you start on the other system it is always exactly where it needs to be.

Should a disaster strike and the production system be lost during batch processing, all of that goes out of the window! None of the information about the state of the running jobs, or the data they have affected, will be available to the user. This means it’s a crap shoot when you restart your applications: you will have no idea which jobs finished, which jobs crashed along with the system, or which jobs were still waiting to run. Even if you can automatically re-submit jobs using a job scheduler product, you still have no idea about the data and objects that require careful consideration before the application starts.

If you are running IBM’s hardware solution and replicating via iASP, the job queues are still empty when you switch to the target because IBM does not replicate their content either! Add to this the fact that you probably turned off journaling because ‘it’s not needed’, and you have even more heartache: you cannot see what data the jobs touched either. It’s a black box.
A number of companies have seen this reality and invested in our solution. For them it was not only a problem that would appear when a disaster struck; they were also finding that role swaps took longer than they could accept because of the time needed to let the job queues empty. I am sure there are plenty of others who don’t realise that this exposure exists in their solution.

If you are running IBM i with batch processing you may be exposed. Don’t believe that just because you passed the last role swap test you will be covered should the system fail! Give us a call and let’s have a discussion about exactly what exposures you have and just how you can avoid a nasty shock should you ever lose your production system.
