We had been responding to a question on the iSeries Network Forums about how to check that 2 files on different systems had the same data content. This is important for those IBM i customers who are running a replication tool to keep the data in sync.
We already have data checking in our RAP product but it is done at the record level. Basically it reads every record in sequence and checks that the data in the record on the target system is the same. We had looked at how to manage a block type analysis before but never brought the technology into play because we felt the complexity of the solution would create more problems than it was worth.
This time we took a different approach to the problem: we decided to simply read all of the data in a member and create an MD5 checksum for it. This can be compared with a checksum generated on the target system, and if they match you can be assured the data is exactly the same.
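As a rough illustration of the idea (not the code in our test programs), the sketch below computes an MD5 checksum over a stream of bytes by reading it in blocks. The function name `md5_of_stream`, the 64K block size and the in-memory sample data are all assumptions made for the example. On a real system the stream would be the member's data.

```python
import hashlib
import io


def md5_of_stream(stream, block_size=64 * 1024):
    """Return the MD5 hex digest of everything readable from `stream`.

    The data is read in fixed-size blocks so a large member never has
    to be held in memory all at once. `block_size` is an arbitrary choice.
    """
    digest = hashlib.md5()
    while True:
        block = stream.read(block_size)
        if not block:  # empty bytes means end of data
            break
        digest.update(block)
    return digest.hexdigest()


# Hypothetical stand-ins for the same member on the source and target systems.
source_member = io.BytesIO(b"CUST0001ACME      " * 1000)
target_member = io.BytesIO(b"CUST0001ACME      " * 1000)

source_sum = md5_of_stream(source_member)
target_sum = md5_of_stream(target_member)
print("match" if source_sum == target_sum else "MISMATCH")
```

Only the two short checksums need to be exchanged between systems, which is what makes this so much cheaper than comparing record by record.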
The first few trials showed promise until we came across a small snag: the data space size for a new member we added was different on each system. We were running V5R4 on the source and V6R1 on the target, and the data space on the target was 32K larger than that on the source! So we had to look at how to do a comparison using the correct data length. On the first pass we used the actual record length and simply multiplied it by the number of records. This was a miserable failure! We started to get different MD5 checksums on the same system for members which had the same record count and data!
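The sketch below illustrates only the first half of that snag: why the checksum has to be bounded to the real data rather than the allocated space. It models a data space as a plain byte buffer with zero padding after the data, which is an assumption for illustration and not the actual IBM i layout. The helper `md5_prefix` and the sample sizes are hypothetical. It does not show how to derive the correct data length on a real system, and as noted above, record length multiplied by record count was not the right answer there.

```python
import hashlib


def md5_prefix(buffer: bytes, data_length: int) -> str:
    """MD5 of only the first `data_length` bytes of `buffer`."""
    return hashlib.md5(buffer[:data_length]).hexdigest()


# Identical data on both systems (hypothetical sample records).
data = b"REC00001" * 100

# Assumed model: each system allocates a different amount of unused
# space after the data; the target allocation is 32K larger.
source_space = data + b"\x00" * 4096
target_space = data + b"\x00" * (4096 + 32 * 1024)

# Hashing the whole allocation compares the padding as well as the data.
whole_source = hashlib.md5(source_space).hexdigest()
whole_target = hashlib.md5(target_space).hexdigest()

# Hashing only the data portion ignores the allocation difference.
bounded_source = md5_prefix(source_space, len(data))
bounded_target = md5_prefix(target_space, len(data))

print("whole allocation equal:", whole_source == whole_target)
print("data only equal:", bounded_source == bounded_target)
```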
After some trial and error we managed to fix the problem and could create an accurate MD5 checksum for each member and file. The results are pretty dramatic when you consider that the record-by-record method takes over 1 hour to check 600,000 records (we saw a peak throughput of approximately 471,000 records per hour), yet the block method takes less than half a minute to do over 1.2 million records on one system and 15 seconds on the other!
We have packaged the test programs into a save file for download if you want it.
To install, simply restore the objects into a library and call the command CRTMD5. The results are presented to the user in summary form unless you change the option on the command to display the block results. We also write the member-level checksums to a DB file, MD5DETS, in case you want to query the results with SQL or compare between systems using DDM etc.
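If you pull the member-level results from both systems into one place, the comparison itself is simple. The sketch below assumes each result can be reduced to a (file, member, checksum) tuple. That shape, the function name `compare_checksums` and the sample values are assumptions for the example and do not describe the actual layout of MD5DETS.

```python
def compare_checksums(source_rows, target_rows):
    """Compare member-level checksums from two systems.

    Each row is an assumed (file, member, checksum) tuple.
    Returns three sorted lists of (file, member) keys:
    mismatched, missing on target, missing on source.
    """
    source = {(f, m): c for f, m, c in source_rows}
    target = {(f, m): c for f, m, c in target_rows}

    common = source.keys() & target.keys()
    mismatched = sorted(k for k in common if source[k] != target[k])
    missing_on_target = sorted(source.keys() - target.keys())
    missing_on_source = sorted(target.keys() - source.keys())
    return mismatched, missing_on_target, missing_on_source


# Hypothetical sample results from each system.
source_rows = [
    ("CUSTMAST", "M1", "aaa111"),
    ("CUSTMAST", "M2", "bbb222"),
    ("ORDERS", "M1", "ccc333"),
]
target_rows = [
    ("CUSTMAST", "M1", "aaa111"),
    ("CUSTMAST", "M2", "zzz999"),
]

mismatched, missing_on_target, missing_on_source = compare_checksums(
    source_rows, target_rows
)
print("mismatched:", mismatched)
print("missing on target:", missing_on_target)
print("missing on source:", missing_on_source)
```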