Using Virtual devices to assist with Availability

We recently presented to the Toronto User Group on technology that can help with an Availability strategy. As part of the presentation we discussed the Virtual devices, which have been available since V5R1 and are being improved with each new release of the OS. We think this is a technology which will continue to be adopted across the customer base, as it offers many benefits to those who invest the time to understand its capabilities. We mentioned that we would publish our findings as part of a new white paper we had under development, but we have decided to show them here in the interests of time.

System Configuration
The tests we recorded were carried out on our internal i520 system, which has the following configuration.
IBM System i520, Processor 7390, with 1GB memory and 140GB DASD
The system is very light in processing activity and DASD utilization, which allows us to carry out the tests in a fairly clean environment.

Background
The reason we started on this endeavor was to manage our own save operations with as little human intervention as possible. We previously saved the data to tape every night and rotated the tapes on a weekly basis to allow for tape wear and tape head cleaning. This worked well as long as we remembered to change the tapes and did the cleaning on a regular basis; when we failed to do so, we ended up with a few damaged saves. When we moved to the new i520 from the old 270, the tapes we had been using were all incompatible: they could be read but not written to! This left us with a short period, while we built the new system and waited for the new tapes from the supplier, in which we were unable to carry out any backups. This prompted us to look for an alternative solution, one which didn’t involve a lot of cost yet provided a level of coverage we felt would be adequate.
We are a software development company, so a lot of our intellectual property is stored on the system; if we lose it we could lose our business. The fact that we develop new technology and code on a regular basis means we have ever-changing data. Our license key cutting and product packaging is also done on the same system, so if it’s not available we can neither develop nor ship product. Having lost 3 disks recently in separate incidents, with no disk protection turned on, we felt the pain pretty quickly, although the tape saves running every night did help. We had saved everything each night as one save operation, which meant we didn’t have to spend a lot of time figuring out which tapes we needed. Recovery was pretty simple once we had the new disks back in place, and we found that we had lost very little data, if any. The biggest problem encountered was the length of time it takes to restore from the tape drive. We have a very small data store, so the amount of data saved each night is under 2GB (something we expect to grow significantly in the near future with our ongoing development plans), but it still took some time to restore.

This prompted us to look at the virtual devices. Originally we expected to save the data to the virtual device and then copy it to tape on completion of the save operation. This would give us the additional benefit of two copies of the data, one on DASD for quick restore and the other on tape for security. We do not have offsite tape storage, so we remove the tapes when we are out of the office for any period of time. We also have a fireproof safe for storing the tapes, although I am not sure it will stand up to a real fire, but that’s another story! Once the initial tests were completed we found the save to the Virtual device was very quick indeed, as our results below will show.
Saving to the Virtual tape gives you the benefit of being able to copy the data to a device which supports the same data structure as the save you have made, but it does require a device which is connected to the System i5 for the save to be carried out. This wasn’t a suitable alternative for us, as all it gave us was a reduced save window during which the system would be locked down. We then looked at the Virtual optical devices, which offer the option of saving to a CD or DVD format that can be reproduced with any CD or DVD burning software. This is what we use today and, as the tests show, it is a very easy-to-use and viable option if reducing your save window is a requirement. Another benefit is that the data streams faster from the CD device than from the Tape Drive on the System i5, so you also get a faster restore from these burned CD images.
To make sure we covered as many bases as possible we built a number of tests which would show each technology and how well it performed. We started with a fairly simple test which saved our normal data to the tape drive. The save was adjusted to use CLEAR *NO, as clearing does take a long time on a tape drive; we normally have it set to CLEAR *YES and were interested in just how long this would take. The system was not being used at any time during the tests, which avoids the lock contention issues we might have encountered in a normal working environment, as we don’t use Save While Active.

Testing
Here is a listing of the first program we used to save directly to tape.
Program to save to Tape (SLR60)

PGM

DCL VAR(&MSGDTA) TYPE(*CHAR) LEN(150)

SNDBRKMSG MSG('Nightly save is starting, please log +
off the system.') TOMSGQ(*ALLWS)

START: INZTAP DEV(TAP01) NEWVOL(NSAVE) CHECK(*NO) +
DENSITY(*CTGTYPE)

SAV DEV('/QSYS.LIB/TAP01.DEVD') OBJ(('/home/*') +
('/www/*') ('/usr/local/Zend/core/etc/*') +
('/usr/local/Zend/apache2/conf/*'))

MONMSG MSGID(CPF3837 CPF3823)
SAVLIB LIB(*ALLUSR) DEV(TAP01) OMITLIB( Q* +
batchlib b_dta_lib jqg_data) +
OUTPUT(*PRINT)
MONMSG MSGID(CPF3777) EXEC(GOTO CMDLBL(CPF3777))
RETURN

CPF3777:
CHGVAR VAR(&MSGDTA) VALUE('Job NSAVE ran +
successfully and ended with the usual +
CPF3777 message - not all objects saved. +
Check the job output for details')

SNDPGMMSG MSGID(CPF9898) MSGF(QCPFMSG) MSGDTA(&MSGDTA) +
TOMSGQ(CHRISH)

ENDPGM

Results
This test took between 15 and 19 minutes to run, which does not include any FTP to the remote host. We ran it a couple of times just to see how much the time differed between runs. 4 minutes may not seem a lot, but it’s approximately an additional 30% and we could not identify the factors that made this difference.

Second test
We then did the same exercise with the CLEAR *YES parameter set so we could see the effect of this parameter on the overall time taken for the save.

Program save to Tape CLEAR *YES

PGM

DCL VAR(&MSGDTA) TYPE(*CHAR) LEN(150)

SNDBRKMSG MSG('Nightly save is starting, please log +
off the system.') TOMSGQ(*ALLWS)

START: INZTAP DEV(TAP01) NEWVOL(NSAVE) CHECK(*NO) +
DENSITY(*CTGTYPE) CLEAR(*YES)

SAV DEV('/QSYS.LIB/TAP01.DEVD') OBJ(('/home/*') +
('/www/*') ('/usr/local/Zend/core/etc/*') +
('/usr/local/Zend/apache2/conf/*'))

MONMSG MSGID(CPF3837 CPF3823)
SAVLIB LIB(*ALLUSR) DEV(TAP01) OMITLIB( Q* +
batchlib b_dta_lib jqg_data) +
OUTPUT(*PRINT)
MONMSG MSGID(CPF3777) EXEC(GOTO CMDLBL(CPF3777))
RETURN

CPF3777:
CHGVAR VAR(&MSGDTA) VALUE('Job NSAVE ran +
successfully and ended with the usual +
CPF3777 message - not all objects saved. +
Check the job output for details')

SNDPGMMSG MSGID(CPF9898) MSGF(QCPFMSG) MSGDTA(&MSGDTA) +
TOMSGQ(CHRISH)
ENDPGM

Results
This test shows the impact that CLEAR *YES has on a save to tape. Our tape drive is just the standard one shipped with the system, which holds 30GB uncompressed and 60GB compressed. The test took 1 hour 54 minutes to run; this is not something many companies will configure unless they have security requirements to ensure the tapes are wiped of all data before the next save. One thing we did notice, and have not explained, was the save time in the logs: the time shown for each object’s save was 12 minutes before the end of the save.

Virtual Device Tests
The next stage was to look at the virtual devices. We set up a device for each format: VRTOPT01 for the CD/DVD image and VRTTAP01 for the Tape image. We went for the biggest tape size just for convenience, as the save times should not be affected by this parameter. One day we will test this theory, but for now we took the stance that it should have little if any effect given our save size.

This is the config for the Virtual Tape Drive


Device description . . . . . . . . : VRTTAP01
Option . . . . . . . . . . . . . . : *BASIC
Category of device . . . . . . . . : *TAP
Device type . . . . . . . . . . . : 63B0
Device model . . . . . . . . . . . : 001
Resource name . . . . . . . . . . : TAPVRT01
Online at IPL . . . . . . . . . . : *YES
Unload device at vary off . . . . : *YES
Allocated to:
Job name . . . . . . . . . . . . . : QTAPARB
User . . . . . . . . . . . . . . : QSYS
Number . . . . . . . . . . . . . : 162726
Message queue . . . . . . . . . . : QSYSOPR
Library . . . . . . . . . . . . : QSYS
Device description . . . . . . . . : VRTTAP01
Option . . . . . . . . . . . . . . : *BASIC
Category of device . . . . . . . . : *TAP
Current message queue . . . . . . : QSYSOPR
Library . . . . . . . . . . . . : QSYS
Text . . . . . . . . . . . . . . . : Virtual Tape Drive

This is the configuration for the Virtual Optical drive

Device description . . . . . . . . : VIRTOPT01
Option . . . . . . . . . . . . . . : *BASIC
Category of device . . . . . . . . : *OPT

Device type . . . . . . . . . . . : 632B
Device model . . . . . . . . . . . : 001
Resource name . . . . . . . . . . : OPTVRT01
Online at IPL . . . . . . . . . . : *YES
Message queue . . . . . . . . . . : QSYSOPR
Library . . . . . . . . . . . . : QSYS
Text . . . . . . . . . . . . . . . : Virtual optical device
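
For readers who have not set these devices up before, the following is a minimal sketch of how device descriptions like the two above might be created. It is not the exact sequence we ran. The device names are the ones used by the programs in this article, RSRCNAME(*VRT) is what asks for a virtual resource instead of real hardware, and the remaining values are assumptions, so prompt each command (F4) on your own release before relying on it.

```
/* Sketch only: create and vary on the two virtual devices.          */
/* RSRCNAME(*VRT) requests a virtual resource rather than hardware.  */
CRTDEVTAP  DEVD(VRTTAP01) RSRCNAME(*VRT) ONLINE(*YES) +
             TEXT('Virtual Tape Drive')
CRTDEVOPT  DEVD(VRTOPT01) RSRCNAME(*VRT) ONLINE(*YES) +
             TEXT('Virtual optical device')
VRYCFG     CFGOBJ(VRTTAP01) CFGTYPE(*DEV) STATUS(*ON)
VRYCFG     CFGOBJ(VRTOPT01) CFGTYPE(*DEV) STATUS(*ON)
```

A virtual device is only useful once an image catalog has been loaded on it, so creating the descriptions is just the first step.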

Overview of the test
We developed a couple of programs for transferring the saved data to the Linux system. They are based on programs we found on the internet and are not guaranteed to work in your environment. We have also tested them against a Windows-based FTP server, but didn’t detail the differences as we will run a Linux-based system for our data storage requirements; it tends to be more stable than the Windows one. We had to develop a couple of scripts for the test because the data is stored in different places on the IFS, so the transfer scripts have to take this into account. We have provided the scripts plus other information where we feel it’s important.

Test scripts
The scripts all have the same flow and are based around a couple of CL programs which save the data to the Virtual Tape drive then FTP it off to the store.
The first set of scripts uses the Virtual Tape Drive as the initial storage device and then copies the resulting image to the FTP store.

Program SAVTST03

PGM

DCL VAR(&MSGDTA) TYPE(*CHAR) LEN(150)

SNDBRKMSG MSG('Nightly save is starting, please log +
off the system.') TOMSGQ(*ALLWS)

START: INZTAP DEV(VRTTAP01) NEWVOL(NSAVE) CHECK(*NO) +
DENSITY(*VRT256K)

SAV DEV('/QSYS.LIB/VRTTAP01.DEVD') OBJ(('/home/*') +
('/www/*') ('/usr/local/Zend/core/etc/*') +
('/usr/local/Zend/apache2/conf/*'))

MONMSG MSGID(CPF3837 CPF3823)
SAVLIB LIB(*ALLUSR) DEV(VRTTAP01) OMITLIB( Q* +
batchlib b_dta_lib jqg_data) +
OUTPUT(*PRINT)
MONMSG MSGID(CPF3777) EXEC(GOTO CMDLBL(CPF3777))
GOTO CMDLBL(FTP)
CPF3777:
CHGVAR VAR(&MSGDTA) VALUE('Job NSAVE ran +
successfully and ended with the usual +
CPF3777 message - not all objects saved. +
Check the job output for details')

SNDPGMMSG MSGID(CPF9898) MSGF(QCPFMSG) MSGDTA(&MSGDTA) +
TOMSGQ(CHRISH)
FTP: CALL PGM(SASLIB/FTPCTLPGM2)

ENDPGM

Program FTPCTLPGM2

PGM

OVRDBF FILE(INPUT) TOFILE(SASLIB/QCLSRC) +
MBR(CPYSAVDTA2)
CLRPFM FILE(SASLIB/QCLSRC) MBR(FTPOUTPUT2)
OVRDBF FILE(OUTPUT) TOFILE(SASLIB/QCLSRC) +
MBR(FTPOUTPUT2)
FTP RMTSYS(PLUTO)
DLTOVR FILE(*ALL)
ENDPGM

Text File CPYSAVDTA2

ftpuser password
BIN
NAMEFMT 1
lcd /shieldtape
MPUT *
QUIT

You will see we have a couple of files, FTPOUTPUT1 and FTPOUTPUT2, which are just empty text files that hold the output of the transfer operation; they could be the same file for the purposes of this test.

Results Virtual Tape
Running the above scripts took 4 minutes 10 seconds from submission to the completion of the save to the remote system. The save took 142 seconds to complete which gave the save a total time of less than 2 minutes. This shows a 7-8 times speed improvement over a standard tape save.

Virtual Optical scripts
Next we did the same exercise with the Virtual Optical devices. We had configured and mounted the Catalogue entries before the test started. If we had not created enough volumes for the test we could have had a problem with configuring, initializing and attaching the next volume, because of the way the OS manages the volume full message and the ability to attach a new volume automatically.
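
The catalog setup we did ahead of the test is not listed in this article, but a minimal sketch of that kind of preparation is shown here. The catalog name SHIELDOPT and the directory /shieldopt come from the program and FTP script used in this test. The image file names and the IMGSIZ value are assumptions made for illustration, so size the images to suit your own save.

```
/* Sketch only: build an image catalog holding two empty images.     */
/* Image file names and IMGSIZ(*DVD4700) are illustrative guesses.   */
CRTIMGCLG  IMGCLG(SHIELDOPT) DIR('/shieldopt') CRTDIR(*YES) +
             TEXT('Nightly save images')
ADDIMGCLGE IMGCLG(SHIELDOPT) FROMFILE(*NEW) TOFILE(SAV00) +
             IMGSIZ(*DVD4700)
ADDIMGCLGE IMGCLG(SHIELDOPT) FROMFILE(*NEW) TOFILE(SAV01) +
             IMGSIZ(*DVD4700)
```

The save program itself then loads the catalog on the device and initializes each volume, so the setup only has to make sure enough images exist to hold the save.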
Program SAVTST02

PGM

DCL VAR(&MSGDTA) TYPE(*CHAR) LEN(150)

SNDBRKMSG MSG('Nightly save is starting, please log +
off the system.') TOMSGQ(*ALLWS)

/********************************************************************** */
/* ADDED FOR THE VIRTUAL OPTICAL SUPPORT */
/********************************************************************** */

LODIMGCLG IMGCLG(SHIELDOPT) DEV(VRTOPT01)
LODIMGCLGE IMGCLG(SHIELDOPT)
INZOPT VOL(SAV00) DEV(VRTOPT01) CHECK(*NO) +
CLEAR(*YES) TEXT('nightly save') MEDFMT(*UDF)
LODIMGCLGE IMGCLG(SHIELDOPT) IMGCLGIDX(*NEXT)
INZOPT VOL(SAV01) DEV(VRTOPT01) CHECK(*NO) +
CLEAR(*YES) TEXT('nightly save') MEDFMT(*UDF)
LODIMGCLGE IMGCLG(SHIELDOPT)

SAV DEV('/QSYS.LIB/vrtopt01.DEVD') +
OBJ(('/home/*') ('/www/*') +
('/usr/local/Zend/core/etc/*') +
('/usr/local/Zend/apache2/conf/*'))

MONMSG MSGID(CPF3837 CPF3823)
SAVLIB LIB(*ALLUSR) DEV(VRTOPT01) OMITLIB(Q* +
BATCHLIB B_DTA_LIB JQG_DATA) OUTPUT(*PRINT)
MONMSG MSGID(CPF3777) EXEC(GOTO CMDLBL(CPF3777))
GOTO CMDLBL(FTP)

CPF3777:
CHGVAR VAR(&MSGDTA) VALUE('Job NSAVE ran +
successfully and ended with the usual +
CPF3777 message - not all objects saved. +
Check the job output for details')

SNDPGMMSG MSGID(CPF9898) MSGF(QCPFMSG) MSGDTA(&MSGDTA) +
TOMSGQ(CHRISH)
FTP: CALL PGM(SASLIB/FTPCTLPGM1)

ENDPGM

Program FTPCTLPGM1

PGM

OVRDBF FILE(INPUT) TOFILE(SASLIB/QCLSRC) +
MBR(CPYSAVDTA1)
CLRPFM FILE(SASLIB/QCLSRC) MBR(FTPOUTPUT1)
OVRDBF FILE(OUTPUT) TOFILE(SASLIB/QCLSRC) +
MBR(FTPOUTPUT1)
FTP RMTSYS(PLUTO)
DLTOVR FILE(*ALL)
ENDPGM

Text File CPYSAVDTA1


ftpuser password
BIN
NAMEFMT 1
lcd /shieldopt
MPUT *
QUIT

Results Virtual Optical
We submitted the above scripts and saw a total time of 7 minutes 6 seconds from submission to the transfer operation completing. The interesting fact here is that the save took less than 2 minutes to complete, even with the mounting and initialization of two volumes. The additional time was taken up by the FTP script. We also noticed that each image took approximately 90 seconds to transfer; the rest of the time must be because we are transferring more than a single object this time.

Conclusions
If there is one thing which reduces the effectiveness of this solution it is the need for more IBM DASD to hold your saves. We are looking at ways of improving this using the Watch APIs and the FTP process and hope to have something to announce soon. Once the saves have been transferred you can delete the IFS images, which recovers the space, and using the same image location and overwriting the previous images also reduces the DASD requirement. There are a lot of other options we hope to review in the near future which could really make this a useful option for many customers. Using a non-System i5 target for the FTP still gives us the security of DASD-based storage which can be restored very quickly, plus a very inviting cost analysis: the DASD for our Linux box (1TB) cost us less than $600. We are paying much more for the i5 tapes, let alone the cost of i5 DASD.
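
As a rough illustration of the "delete the images once they are safely off the box" idea, the sketch below unloads the catalog and deletes it together with its image files. This is not something we have built into the test programs. It assumes the FTP step has already completed successfully, and it means the catalog and images must be created again before the next save runs.

```
/* Sketch only: release the DASD once the FTP has completed.         */
/* KEEP(*NO) asks for the image files to be deleted from the IFS.    */
LODIMGCLG  IMGCLG(SHIELDOPT) DEV(VRTOPT01) OPTION(*UNLOAD)
DLTIMGCLG  IMGCLG(SHIELDOPT) KEEP(*NO)
```

The alternative of overwriting in place needs no extra commands: re-initializing the same volumes with CLEAR(*YES) at the start of each save, as SAVTST02 does, reuses the existing images.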

Things which could be better
A few things we have noticed which need to be addressed with the Virtual Support on the System i5 are:

1. Little or no API support for the Virtual Devices. If these devices are to become more usable in the Availability environment they need better API support, such as being able to display the contents of each Virtual image to an outfile or user space. IBM provides much better support for displaying Save File objects, which could possibly be enhanced to include the IFS-based Virtual images.
2. The Optical support has an issue where the volume fills and a new one has to be attached. The message queue used for the OPT149F message is QSYSOPR and is not configurable to another message queue. Using the STRWCH APIs can help handle this situation, but allowing the message queue to be configured would ensure the message is not inadvertently answered by someone watching the QSYSOPR message queue before the program attached to the watch can answer it. You do have the option to have the system create the new volume and attach it automatically for you, but this removes some of the control from the programmer in how they want to label and track the images.
3. Documentation for the watches and the use of virtual optical devices is not the best I have seen from IBM. Some may be put off using the technology after a few hours of trying to decipher the information. A much simpler description of what is available and how it can be used should be created.

Wrap up
Overall we feel this is very good technology which requires a little more support from IBM to allow the ISV community to develop the user interfaces and hide some of the intricate nuances from the users. This is something that happens with most other technology IBM delivers: they are very, very good at the plumbing but generally stop short of delivering the interface which makes it a winner.

I hope you found this exercise informative. As we develop this into a more automated solution, and hopefully get better API support from IBM, we will publish our findings.

Chris…

3 thoughts on “Using Virtual devices to assist with Availability”

  1. We have just completed a renewal of our old LAN switch and found that even though the System i5 was able to move the link speed to 1Gb, our target system only allows 100Mb as a maximum. We increased the buffer size in accordance with the IBM help and did see a little improvement, but overall the speed was drastically reduced.

    We looked into the switch logs and found no errors or re-transmissions, so it must be normal for a 1Gb to 100Mb link to be slower than a 100Mb to 100Mb link through a 100Mb switch? We tested the transfer a couple of times but still found the link took an average of 142 seconds as opposed to 90 secs for the same data. We then connected to the server with a 1Gb card in it and the transfer rates moved back up.

    Another nuance we noticed was that the second transfer always took a lot longer than the first; this could be a server-related issue as it only occurs on the 1Gb link. The target is a Dual Xeon processor running Windows Small Business Server.

    We hope to replace the card in the Linux server so it can link at 1Gb and try the tests again…

  2. We have now completed further testing for the virtual devices and a comment made earlier is incorrect. We had believed the QSYSOPR message queue was the only message queue that the OPT149F message would be sent to if a Volume had been filled by the save process. However, if you set the message queue for the device to an alternative, the OPT149F message is sent to that message queue, not QSYSOPR as we had previously thought! This removes one of the problems we had, which allows automation of the save process. You can set the message queue up, wait on the message being sent to the queue, respond accordingly to the message and then carry on with the process. This may also help with the ability to split the FTP process into chunks: as each volume is filled you can FTP that volume off to the FTP server and, once completed, delete it from the local source.

    Chris…
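
The wait-and-respond flow described in the comment above could be sketched along the following lines. This is an untested outline, not the program we are developing. SASLIB/SAVMSGQ is a hypothetical queue name, the reply is passed in by the caller and assumed to be a single character, and the value to use should be taken from the valid replies listed for OPT149F on your own system.

```
/* Sketch only: wait on our own queue and answer OPT149F.            */
/* One-time setup before the save job starts (the device may need    */
/* to be varied off for the change to be accepted):                  */
/*   CRTMSGQ   MSGQ(SASLIB/SAVMSGQ)                                  */
/*   CHGDEVOPT DEVD(VRTOPT01) MSGQ(SASLIB/SAVMSGQ)                   */
PGM        PARM(&RPY)
DCL        VAR(&RPY)    TYPE(*CHAR) LEN(1)  /* assumed reply length  */
DCL        VAR(&MSGID)  TYPE(*CHAR) LEN(7)
DCL        VAR(&MSGKEY) TYPE(*CHAR) LEN(4)

/* RMV(*NO) leaves the inquiry on the queue so it can be answered    */
WAIT:      RCVMSG     MSGQ(SASLIB/SAVMSGQ) MSGTYPE(*INQ) WAIT(*MAX) +
                        RMV(*NO) KEYVAR(&MSGKEY) MSGID(&MSGID)
           IF         COND(&MSGID *EQ 'OPT149F') THEN(DO)
           /* Prepare the next volume here, then answer the inquiry  */
           SNDRPY     MSGKEY(&MSGKEY) MSGQ(SASLIB/SAVMSGQ) RPY(&RPY)
           ENDDO
           GOTO       CMDLBL(WAIT)
ENDPGM
```

As written the loop never ends, so it would be run as a separate job alongside the save and ended once the save completes. Any other inquiry message arriving on the queue is left unanswered.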
