The initial release of AAG has only just made the rounds, and we are already adding new features. With the first couple of installs underway, talking through the product's capabilities with clients has shown us what still needs to be added. The following enhancements were built as a result of those discussions and will be available for download once testing is complete.
Our HA4i product provides a number of features that reduce the management overhead on users. We provide a command that can be set up in the job scheduler to run frequently, check that everything that needs to be running is running, and fix any problems it finds. However, the user-defined audits are not covered by these checks because we have no control over their running attributes or timing. Audits are usually run when no activity is occurring against the objects, so a check of the objects' attributes on both systems should give the same results. In most cases this means the audits run overnight, when no one is around, and during normal batch processing.
The audits can be pretty intensive, long-running processes because of the number of objects being processed and the level of attribute and data checks being carried out. This in turn can cause problems: the job may exceed the time allotted to complete the task, or the results may be skewed because other processes start before the audit completes and change some objects. We have also seen problems found during the audit cause the job queue to be held waiting for operator intervention, which stops later jobs from running.
The customer asked if we could monitor the audits and raise an alert if a job runs outside its allowed time frame or does not start at all. We also wanted to track whether errors are found, even if they are fixed. The data the audits produce for the user did not carry the status information needed to check the timing of an audit, or to determine the number of errors and the resync status, without processing lots of data. So we built a new data collection process that provides all of that information when the audit runs, and we made the collection optional.
Now the user can run a check to make sure the audit started and that it completed within the time frame specified. We also track the number of errors found and whether a request to resync the object was made.
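To make the logic of such a check concrete, here is a minimal sketch in Python. It is not the HA4i implementation: the `AuditRun` record, its field names and the start grace period are all assumptions made for illustration. The only fixed point is the usual Nagios plugin convention of status 0 for OK, 1 for WARNING and 2 for CRITICAL.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Standard Nagios plugin status codes.
OK, WARNING, CRITICAL = 0, 1, 2


@dataclass
class AuditRun:
    """Hypothetical record of one audit run (field names are assumed)."""
    expected_start: datetime               # when the scheduler should start it
    allowed: timedelta                     # time frame allowed for the audit
    started_at: Optional[datetime] = None  # None if the audit never started
    ended_at: Optional[datetime] = None    # None while still running
    errors_found: int = 0
    resyncs_requested: int = 0
    start_grace: timedelta = timedelta(minutes=15)  # assumed tolerance


def check_audit(run: AuditRun, now: datetime) -> Tuple[int, str]:
    """Return a (status code, message) pair for one audit run."""
    if run.started_at is None:
        # Not started: only a problem once the grace period has passed.
        if now > run.expected_start + run.start_grace:
            return CRITICAL, "audit did not start"
        return OK, "audit not yet due"

    # For a running audit, measure elapsed time up to now.
    finish = run.ended_at if run.ended_at is not None else now
    elapsed = finish - run.started_at
    if elapsed > run.allowed:
        return CRITICAL, f"audit exceeded allowed time ({elapsed} > {run.allowed})"

    if run.ended_at is None:
        return OK, "audit running within allowed time"

    if run.errors_found > 0:
        # Errors are reported even when a resync was requested for them.
        return WARNING, (f"{run.errors_found} errors found, "
                         f"{run.resyncs_requested} resync requests")
    return OK, "audit completed in time with no errors"
```

Reporting errors as a WARNING even after a resync request is a choice made for this sketch; it keeps fixed errors visible rather than hiding them behind a successful resync.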
Note: Fixing audit errors each time they are found is not a solution. You should identify the source of the error and fix that, so your recovery is not exposed if you carry out a role swap between the time the object goes out of sync and the time the audit fixes it. If the error is due to a HA4i problem, we need to fix it so everyone gets the benefit.
When we ship changes to our products they are packaged either as a PTF or as an update SAVF. Each time we produce a PTF or create a subsequent update, we create a tracking entry in our database that can be retrieved from our web portal via the maintenance menus provided with each product. We also publish the PTF/update level on the specific product page of our website so users can see what the latest levels are. All of this requires the user to sign on and verify whether the system is at the latest level before taking any action to download and install the fixes and enhancements.
During the install at one client we noticed the products were down-level and needed to be updated, which we did via the maintenance process. Once the updates were installed, the client was amazed at some of the enhancements they now had in the product, and some of the niggles they had been experiencing on a day-to-day basis were gone. This prompted the question of why they had not updated to the latest levels. We had provided the information in a number of places, and yet the customer did not know where to look, so we had to address that.
As we use a lot of Linux servers in house, we are very used to seeing notifications that updates are available for the installed packages through the web-based interface we use (Webmin). We wanted to add something similar so users could set up a regular check for updates and PTFs without being forced to install them at the same time. Signing on to the system and using our Maintenance menu option to display the latest information works only as long as someone remembers to sign on and do the check (we also provide a command that gives similar information, but it never seems to be used :-( ).
As Nagios is used to do lots of other checks against the system, we wanted to add a feature that would allow a regular check to grab the latest data from the web and compare it against the data that shows the installed version. The check runs on the IBM i: it reaches out to the web portal to get the latest information, extracts the installed version, and responds to the Nagios instance if they do not match. We have also added a check for the maintenance status. If you are not under maintenance you cannot download and install any updates or enhancements, so it's important to know if that is the case.
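The comparison at the heart of that check can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the shipped check: the function name, the level strings and the choice of severities (WARNING for a level mismatch, CRITICAL for lapsed maintenance) are all hypothetical, and the step that fetches the latest level from the web portal is left out because its URL and response format are not described here.

```python
from typing import Tuple

# Standard Nagios plugin status codes.
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def check_update_level(installed: str, latest: str,
                       under_maintenance: bool) -> Tuple[int, str]:
    """Compare the installed level with the latest published level.

    'installed' would come from the product on the IBM i and 'latest'
    from the web portal; both are plain strings in this sketch.
    """
    installed, latest = installed.strip(), latest.strip()
    if not under_maintenance:
        # Without maintenance, updates cannot be downloaded at all.
        return CRITICAL, "maintenance not active; updates cannot be downloaded"
    if installed != latest:
        return WARNING, f"update available: installed {installed}, latest {latest}"
    return OK, f"product is at latest level {installed}"


def nagios_line(code: int, text: str) -> str:
    """Format the single status line a Nagios plugin prints."""
    return f"{LABELS[code]} - {text}"
```

A wrapper script would print `nagios_line(code, text)` and exit with `code`, which is how a Nagios plugin reports its state to the monitoring server.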
All of the above is now packaged and ready to ship to the customer base, and the HA4i updates required to run the checks are already available to download and install. As with all Nagios check commands, you can set the check attributes and configure how each notification is handled.
As we went through the above we also recognized that we had to add the maintenance review features to all of the products, so updates have been provided for AAG and EM4i to bring them in line with our other products.
If you need a demo or want to discuss this further, please call and we will be happy to go over what we have. Also, if you have ideas on what you would like to see added to the product, let us know.