Real-Time IBM i Monitoring: Using Watch Programs and Nagios XI

One of the biggest challenges in monitoring is the gap between scheduled checks. You might have a check that runs every hour, but what if a critical message requires a reply right now?

This is where passive checks in Nagios shine. You can maintain a general active check that polls every few hours while simultaneously setting up a passive check that alerts you the instant a specific message hits the queue.
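For context, a passive result is a service state that an outside system pushes into Nagios rather than one Nagios polls for. In Nagios Core, which sits underneath Nagios XI, the raw form is the `PROCESS_SERVICE_CHECK_RESULT` external command: a timestamp, host name, service name, return code and output text. The sketch below only formats such a line to show what a passive result carries. It is not necessarily how NG4i's SNDSVCCHK delivers results, and the helper name `format_passive_result` and host name `IBMI01` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Standard Nagios service states (the same 0/1/2 values WCHEPGM uses). */
enum nagios_state {
    NAGIOS_OK = 0,
    NAGIOS_WARNING = 1,
    NAGIOS_CRITICAL = 2,
    NAGIOS_UNKNOWN = 3
};

/* Hypothetical helper: format a Nagios Core PROCESS_SERVICE_CHECK_RESULT
   external command line.
   Returns -1 if host or service contains ';' (the field separator),
   otherwise snprintf's return value (a value >= outlen means truncation). */
static int format_passive_result(char *out, size_t outlen, long timestamp,
                                 const char *host, const char *service,
                                 enum nagios_state state, const char *output)
{
    assert(out != NULL && outlen > 0);
    if (strchr(host, ';') != NULL || strchr(service, ';') != NULL)
        return -1;
    return snprintf(out, outlen,
                    "[%ld] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s",
                    timestamp, host, service, (int)state, output);
}
```

Calling `format_passive_result(buf, sizeof buf, (long)time(NULL), "IBMI01", "MessageWait", NAGIOS_CRITICAL, "TST0004 waiting for reply")` would produce a line that Nagios records against the MessageWait service on host IBMI01, with no polling involved.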

The Secret Weapon: IBM i Watch Programs

The IBM i has a powerful feature called Watch Programs. A watch can be configured to “watch” for specific messages sent to a message queue or job log, as well as for Licensed Internal Code (LIC) log entries. When a match is found, the system calls an exit program to take immediate action.

By linking this technology with NG4i, we can ensure that any *INQ (Inquiry) messages are highlighted in Nagios XI the second they are received.

The Exit Program

We’ve based our solution on a sample program provided by IBM. Our version takes the event data and calls an NG4i command to communicate directly with the Nagios XI server.

Technical Note: A Word on Performance

Warning: IBM documentation states that using *ALL for message IDs in a watch is bad practice. Our tests didn’t show much overhead, but when the *JOBDTA auditing level is set, QAUDJRN reveals a high volume of background jobs.

// This program is based on the example provided by IBM 
// https://www.ibm.com/docs/en/i/7.4.0?topic=function-scenario-exit-program-watch-event

#include <stdio.h>
#include <string.h>
#include <stdlib.h> 
#include <escwcht.h>   

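// argv[1] = reason called: 10 chars, blank padded (*MSGID, *LICLOG or *PAL)
// argv[2] = session ID of the watch that called the program
// argv[3] = output, 10 chars: leave as blanks, or set to an error value if
//           an error is detected (this ends the watch)
// argv[4] = event data (Qsc_Watch_For_Msg_t for a message watch)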

int main(int argc, char **argv) {
   int len = 0;                                             // length     
   int state = 2;                                           // Nagios status 0 OK 1 warning 2 critical                             
   char cmd[256];                                           // command buffer
   char Data[128];                                          // Data buffer
   Qsc_Watch_For_Msg_t *eD;                                 // event data
   
   // return blanks normally
   memset(argv[3],' ',10);
   // link to the data
   eD = (Qsc_Watch_For_Msg_t *)argv[4];
   // set the library list for the NG4i command
   len = sprintf(cmd,"ADDLIBLE LIB(NG4I30)");
   if(system(cmd) != 0) {
      // could be already in library list?
      // just carry on and try command
      }
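   // build the data to be sent to the command
   // the event data fields are blank padded and not null terminated, so every %s
   // has a precision; %.20s relies on each 10-char name being followed directly
   // by its 10-char library, and %.26s on job name, user and number being adjacent
   len = sprintf(Data,"%.7s:%.20s%.20s:%.26s:%.10s",eD->Message_ID,eD->Message_Queue_Name,eD->Msg_File_Name,
                  eD->Job_Name,eD->Msg_Type);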
   len = sprintf(cmd,"NG4I30/SNDSVCCHK SVC(MessageWait) DTA('%s') STATE(%d)",Data,state);
   if(system(cmd) != 0) {
      memcpy(argv[3],"*ERROR2   ",10);
      exit(0);
      }    
   exit(0);
}
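The `sprintf` call above depends on the layout of the event data: the fields are fixed width, blank padded and not null terminated, and each `%.20s` reads a 10-character name together with the library stored right after it. The sketch below builds the same payload layout but gives every field its own precision-limited `%.Ns`, so no read crosses a member boundary and the logic can be tested off-platform. The struct `demo_watch_msg` is a hypothetical stand-in, not the real `Qsc_Watch_For_Msg_t` from escwcht.h, and `set_field` and `build_payload` are illustrative names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* HYPOTHETICAL stand-in for the watch event data: fixed-width, blank-padded,
   NOT null-terminated fields. The real Qsc_Watch_For_Msg_t has a different,
   larger layout. */
typedef struct {
    char msg_id[7];
    char msgq_name[10], msgq_lib[10];
    char msgf_name[10], msgf_lib[10];
    char job_name[10], job_user[10], job_number[6];
    char msg_type[10];
} demo_watch_msg;

/* Copy a C string into a fixed-width field, padding with blanks
   (and truncating if the string is longer than the field). */
static void set_field(char *dst, size_t width, const char *src)
{
    size_t n = strlen(src);
    memset(dst, ' ', width);
    memcpy(dst, src, n < width ? n : width);
}

/* Build the WCHEPGM payload layout "msgid:queue+lib msgf+lib:job:type",
   formatting each field separately. Returns snprintf's result. */
static int build_payload(const demo_watch_msg *m, char *out, size_t outlen)
{
    assert(m != NULL && out != NULL && outlen > 0);
    return snprintf(out, outlen,
                    "%.7s:%.10s%.10s%.10s%.10s:%.10s%.10s%.6s:%.10s",
                    m->msg_id, m->msgq_name, m->msgq_lib,
                    m->msgf_name, m->msgf_lib,
                    m->job_name, m->job_user, m->job_number,
                    m->msg_type);
}
```

The payload is always 86 characters (7 + 1 + 40 + 1 + 26 + 1 + 10), which fits comfortably in the 128-byte Data buffer used by WCHEPGM.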

Implementation

Create the program using the following command:

    CRTBNDC PGM(YourLib/WCHEPGM) SRCFILE(YourLib/QCSRC) SRCMBR(WCHEPGM)

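Start a watch that calls the program whenever an *INQ message is received:

    STRWCH SSNID(*GEN) WCHPGM(YourLib/WCHEPGM) WCHMSG((*ALL *NONE *MSGDTA *INQ *GE 0))

The WCHMSG elements are the message ID (*ALL), comparison data (*NONE), what to compare against (*MSGDTA), message type (*INQ), relational operator (*GE) and severity (0). Because WCHMSGQ is not specified, the watch uses its default message queue, QSYSOPR.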

Configure Nagios XI

You must set up a passive check in Nagios XI to receive these notifications. Use the Passive Check Wizard to create it with the following settings:

System Name: must exactly match the system name sent from the IBM i.

Service Name: we used MessageWait. Make sure the name contains no spaces.

Volatile: set this to “Yes” so that every new arrival triggers an alert.

Leave everything else at its default and finish the wizard.

Then apply the configuration so that the new service is created and running. We had already tested and reset our service; on a new setup the status will most likely show as Pending until the first result arrives.

You can view the watch with the WRKWCH *ALL command.

The watch details show that it monitors QSYSOPR for all (*ALL) messages of type *INQ and, in our case, calls program CHLIB/WCHEPGM whenever a watch event (*WCHEVT) occurs.
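WCHEPGM treats argv[4] as message data without looking at the reason code in argv[1]. That is fine for this watch, which only covers messages, but if the same program were reused for a watch that also reports LIC log or PAL entries, a check on the reason code would stop it misreading the event data. The sketch below compares the 10-character, blank-padded parameter against *MSGID; the helper names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Compare a fixed-width, blank-padded, non-null-terminated parameter
   against a short literal such as "*MSGID". */
static int fixed_field_equals(const char *field, size_t width, const char *value)
{
    size_t n = strlen(value);
    assert(field != NULL && n <= width);
    if (memcmp(field, value, n) != 0)
        return 0;
    for (size_t i = n; i < width; ++i)      /* the rest must be blanks */
        if (field[i] != ' ')
            return 0;
    return 1;
}

/* True when the watch called the program for a message event. */
static int is_msgid_event(const char *reason)
{
    return fixed_field_equals(reason, 10, "*MSGID");
}
```

In WCHEPGM this would sit right after argv[3] is blanked, for example `if (!is_msgid_event(argv[1])) exit(0);`, so any other event type is ignored without error.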

When an inquiry message (such as TST0004) arrives on QSYSOPR, the watch calls the exit program immediately. Nagios XI switches the service to a Critical state and displays the message ID and job information.

Clearing the Alert

Since this is a passive check triggered by a “one-way” watch, Nagios won’t know when the message has been answered. To clear the status:

Conclusion

By combining IBM i Watch Programs with NG4i, you get the best of both worlds: reduced system overhead from fewer active polls, and instant visibility into critical messages.
