Data auditing in a replicated environment

One of the problems I have seen people struggle with is checking that the data contained in one file matches the data stored in another. High Availability (HA) products have supplied an audit suite to check this for some time, with varying degrees of success.

RAP/400 is not an HA product, but it does rely on the same replication technology (Remote Journalling) that these products use. RAP/400 simply applies the data in a different manner, using the IBM commands shipped with the OS rather than a self-designed technology. But the question is still asked: is the data in sync between the two sites?

There are many reasons for the data to fall out of sync. The main one is user error: the wrong data was restored, the save was not done correctly, or a user inadvertently accessed the programs on the remote system and updated some data. The problem is knowing which data is incorrect. Replication processes do not check the data as part of the apply process (this would be too inefficient); they only sense a problem when an update cannot be applied, such as a delete of a record that doesn't exist, an add to the file at the wrong RRN, or an add that fails because the file has fewer records than the RRN being written to. Errors are not an everyday problem in most cases, but you won't know unless you check that the data in each record is the same.
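
To make that blind spot concrete, here is a minimal Python sketch of an RRN-based apply step. It is a simplified model, not how RAP/400 or the IBM apply actually works: a file is assumed to be a list where index i holds RRN i + 1 and None marks a deleted slot, and the names apply_entry and ApplyError are hypothetical. The only failures it can detect are the structural ones listed above; a record whose contents have drifted on the target is overwritten or left alone without complaint.

```python
from typing import List, Optional

# Hypothetical simplified model: index i holds RRN i + 1, None marks a deleted slot.
File = List[Optional[bytes]]


class ApplyError(Exception):
    """Raised when a replicated change cannot be applied to the target file."""


def apply_entry(target: File, op: str, rrn: int, data: Optional[bytes] = None) -> None:
    """Apply one change by RRN. Record contents are never compared."""
    if rrn < 1:
        raise ValueError("RRNs start at 1")
    if op == "add":
        expected = len(target) + 1
        if rrn < expected:
            raise ApplyError(f"add at RRN {rrn}: slot already exists, expected {expected}")
        if rrn > expected:
            raise ApplyError(f"add at RRN {rrn}: file only has {len(target)} records")
        target.append(data)
    elif op == "update":
        if rrn > len(target) or target[rrn - 1] is None:
            raise ApplyError(f"update of RRN {rrn}: record does not exist")
        # The existing data is not checked, so drift on the target goes unnoticed
        target[rrn - 1] = data
    elif op == "delete":
        if rrn > len(target) or target[rrn - 1] is None:
            raise ApplyError(f"delete of RRN {rrn}: record does not exist")
        target[rrn - 1] = None
    else:
        raise ValueError(f"unknown operation {op!r}")
```

For example, if a user changed RRN 2 on the target by hand, applying an update to RRN 1 and an add at RRN 3 both succeed, and the bad data in RRN 2 survives; only something like a delete of RRN 5 would raise an error.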

We decided to look into this problem with a few limitations: the check will only look at the data stored in a DB file (not IFS-based data), and it will only fix data errors. If other errors occur, such as RRNs being out of sync, we believe a full restore is the best option, because the CPU and other resources required to fix that kind of problem could be huge. We also wanted it to run solely on the target system; RAP/400 only installs on the target system, so to stay in line with this we had to come up with an effective single-system solution. We now have a simple technology test developed which, when run, checks the data from each system record by record. If it finds an error, it simply copies the data from the source system and writes it over the target system data. Eventually we will make it store the updates and allow the user to determine which ones should be applied.
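
The record-by-record check can be sketched roughly as follows. This is an illustrative Python model under stated assumptions, not the RAP/400 implementation: both files are assumed to be already available as lists indexed by RRN (how the target system reads the source data is not described here and is left out), and audit_file, AuditResult and StructuralMismatch are hypothetical names. Following the limitations above, any difference in record count, or a record deleted on only one system, is treated as a structural problem that calls for a full restore; only data differences are repaired. The apply_fixes flag is an assumption modelling the planned option of collecting the updates for the user to review.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Record = Optional[bytes]  # None marks a deleted record slot (hypothetical model)


class StructuralMismatch(Exception):
    """RRN layout differs between systems; a full restore is the safer fix."""


@dataclass
class AuditResult:
    checked: int = 0  # number of RRN slots compared
    # (rrn, target_data, source_data) for every record whose data differed
    mismatches: List[Tuple[int, Record, Record]] = field(default_factory=list)


def audit_file(source: List[Record], target: List[Record],
               apply_fixes: bool = True) -> AuditResult:
    """Compare two files record by record and optionally copy source data over target."""
    if len(source) != len(target):
        raise StructuralMismatch(
            f"record counts differ: source {len(source)}, target {len(target)}")
    result = AuditResult()
    for rrn, (src, tgt) in enumerate(zip(source, target), start=1):
        if (src is None) != (tgt is None):
            raise StructuralMismatch(f"RRN {rrn} is deleted on one system only")
        result.checked += 1
        if src != tgt:
            result.mismatches.append((rrn, tgt, src))
            if apply_fixes:
                target[rrn - 1] = src  # overwrite the target with the source data
    return result
```

With source [b"A", b"B", None, b"D"] and target [b"A", b"b", None, b"D"], the audit reports one mismatch at RRN 2 and, when apply_fixes is left on, rewrites that record so the two lists match. Calling it with apply_fixes=False leaves the target untouched and just returns the list of proposed updates.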

We expect this new option to be included in a future release of RAP/400, but if you are interested in understanding and testing the technology, let us know.

Chris…
