AAG is the top solution to integrate IBM i systems into any Nagios monitoring configuration. With new additions being developed constantly and added to our current 155+ check commands, AAG provides an exceptional number of data points to monitor and track your IBM i systems.
High Availability software is only part of the solution in today’s distributed environments. Ensuring the system is always available requires notification of any system events that could impede application processing or lead to a system loss. This applies not only to the production system but also to the recovery system: we have seen a number of occasions where a recovery system has been offline for a significant period of time because it was not being monitored.
Nagios is a well-known player in the Enterprise Monitoring Solution market but has very little IBM i integration, even with the community plugins provided. At Shield Advanced Solutions we initially looked at the plugins available from community members and IBM for adding IBM i monitoring to Nagios, but soon decided that an alternative approach was required. This resulted in a new agent for the IBM i and a new plugin for Nagios.
AAG is the Nagios plugin that collects system stats from the IBM i and reports them back to Nagios, while NG4i is the IBM i responder application that provides the data to AAG. NG4i is provided as an IBM LPP. Configuration is very simple using the panel groups provided, which means an IBM i instance can be installed and running in minutes.
Since V2R0-100323, AAG also supports monitoring the HMC. Shield provides HMC-specific check_commands and handles communication between your Nagios environment and the HMC, in a similar manner to how AAG monitors your IBM i.
AAG is able to run on many different Linux platforms. We have had great success using the Raspberry Pi, a credit-card-sized Linux board. Shield can provide a plug-and-play solution for your monitoring by shipping a pre-configured and assembled RPi that is matched to your infrastructure. There is no need to question the reliability or longevity of micro-SD cards, as all RPis provided by Shield are converted to support NVMe drives. Due to the current worldwide chip shortage, Raspberry Pis are in short supply. Contact us today to check availability.
As AAG is Shield's most recent development, we are offering demos to provide a more personalized overview of our new solution. Contact us today to request a demo and find out what AAG can do for you!
AAG has been developed to alert users to their infrastructure's status as quickly and efficiently as possible. To keep the user informed, AAG uses several methods of notification. Nagios provides standard email notification for host or service issues. To improve upon this, AAG also implements the Pushover API to send notifications either directly to a user's device or as a broadcast to a user group. Pushover is a cost-effective, single-purchase application that provides simple and effective notifications to Apple devices, Android devices, or a browser interface. Find out more here.
Integrated into the AAG Linux distribution is NagiosTV, developed by Chris Carey. This browser application provides the user with real-time, accurate information regarding the status of their Nagios infrastructure.
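The Pushover notification path mentioned above can be illustrated with a minimal Python sketch. It is not AAG's implementation: the function name, the state-to-priority mapping, and the title format are assumptions made for illustration. Only the endpoint and the `token`, `user`, `title`, `message` and `priority` fields come from the public Pushover message API. The sketch builds the HTTP request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Public Pushover endpoint for posting a message.
PUSHOVER_URL = "https://api.pushover.net/1/messages.json"

# Hypothetical mapping from a Nagios state to a Pushover priority
# (-1 = low, 0 = normal, 1 = high). AAG's own mapping is not documented here.
PRIORITY_BY_STATE = {"OK": -1, "WARNING": 0, "UNKNOWN": 0, "CRITICAL": 1}


def build_pushover_request(app_token, user_key, host, service, state, output):
    """Build (but do not send) a POST request for one Nagios alert.

    user_key may be a single user's key or a group key, which is how a
    broadcast to a user group is addressed in Pushover.
    """
    fields = {
        "token": app_token,
        "user": user_key,
        "title": f"{host}/{service} is {state}",
        "message": output,
        "priority": PRIORITY_BY_STATE.get(state, 0),
    }
    # Supplying a data payload makes urllib issue a POST.
    return Request(PUSHOVER_URL, data=urlencode(fields).encode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send the notification. Real application and user keys are required, so that call is left out of the sketch.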
The following lists show the check commands that can be run against an IBM i host. This set of commands relates to Shield's product HA4i and allows the user to monitor the host's replication status.
check_HA4i_RATE | Transfer rate between the HA4i *MGT and *NET system. |
check_HA4i_APY | Apply status from the *NET system. |
check_HA4i_RJRN | List each remote journal that is configured. |
check_HA4i_OBJ | Object replication status from the *MGT system. |
check_HA4i_SPLF | Spool file replication status from the *MGT system. |
check_HA4i_SYNC | Sync Manager status from the *MGT system. |
check_HA4i_RJOBS | Number of HA4i responder jobs running on the system. |
check_HA4i_SPLFW | Number of spool files that have been marked for replication but are still waiting to be sent to the remote system. |
check_HA4i_IJRN | Number of *INACTIVE journals, configured for replication. |
check_HA4i_STATUS | HA4i Server status for each critical server running in the HA4i subsystem. |
check_HA4i_AUDSTS | Returns information on the last audit that was run. Severity can be set on whether the audit has finished or not. |
check_HA4i_NEWLIB | Lists libraries that have been added but not configured for replication. |
check_HA4i_NEWDEV | Lists devices that have been created but not configured for replication. |
check_HA4i_ROLESWAP | Returns information on the last Role-Swap. |
check_HA4i_AUDERR | Returns any HA4i audits reporting errors in the last 24 hours. |
check_HA4i_RSREADY | Returns Role swap ready state between your HA4i LPARs. |
This set of commands relates to Shield's product EM4i and allows the user to monitor the host's message monitoring application.
check_EM4i_RESPWAIT | Number of *INQ messages which EM4i is waiting for responses to. |
check_EM4I_EMLOG | Returns Email notifications sent out by EM4i. |
*NEW check_EM4I_MSGCFG | Returns unconfigured message IDs that EM4i picked up. |
This set contains Shield's general-use commands. Using these checks, a user can monitor general values on an IBM i.
check_Shield_KEYEXP | Number of days until an LPP license key expires. |
check_Shield_SBSSRCH | Number of jobs running in a specified subsystem, with a *MSGW status. |
check_Shield_JOBSRCH | Number of jobs that are running matching specified search criteria. |
check_Shield_RPYW | Number of messages awaiting a reply in a specific message queue. |
check_Shield_DSBPRF | Number of profiles that are in a disabled state. |
check_Shield_SBSJOB | Number of active jobs for a given subsystem. |
check_Shield_JOBQ | Number of jobs on a specified job queue. |
check_Shield_RCVR | Quantity and total size of the receivers in a specified Library. |
check_Shield_CACHEBAT | Cache battery state and quantity. |
check_Shield_UPDLVL | Returns information about an installed Shield product. Displays PTF level, latest updates and maintenance expiry. |
check_Shield_PRDSYNC | Ensures AAG and NG4i are in-sync after updates. |
check_Shield_APTUPD | Returns information about available apt updates for the Linux distribution. Run this command against the Nagios localhost. |
check_Shield_PTF | Returns PTF levels for OS. If a PTF is not up to date, both latest and installed levels are displayed. |
check_Shield_JOBSCDE | Checks the status of a job schedule entry. Returns success state and time of last / next submission. |
check_Shield_SSLCERT | Returns the number of SSL certificates expiring in the next x days, where x is set in the configuration. |
check_Shield_DSKSTS | Displays number of disks reported / configured and any disk errors. |
check_Shield_UPTIME | Returns the number of minutes since the last IPL. |
check_Shield_SYSVAL | Compares returned system value against a passed in parameter. |
check_Shield_SECUPD | Returns security bulletins from IBM since a specified date. |
check_Shield_AUTUPD | Can be configured to automatically update AAG OR notify users when an update is available for AAG. |
check_Shield_RCVRBKLG | Returns a specified journal receiver backlog and estimated number of minutes to clear. |
check_Shield_PING | Returns ping response time between IBM i and another address. |
check_Shield_OSLVL | Returns current OS level on IBM i. |
check_Shield_ASPSTS | Returns current status of passed ASP. |
check_Shield_ASPAVL | Returns disk space available on passed ASP. |
check_Shield_ASPMIR | Returns current mirror status of passed ASP. |
check_Shield_ASPLIFE | Percentage of life remaining for NVMe drives in passed ASP. |
check_Shield_ASPDSK | Returns Disk status for passed ASP. |
check_Shield_ASPOVRFLW | Amount of overflow storage for passed ASP. |
check_Shield_ASPGEOSTS | Geographic mirror data status. |
check_Shield_DMGOBJ | Number of damaged objects in a library. |
check_Shield_DEVSTS | Returns device status. |
check_Shield_TOPINTTRANS | Top x jobs by interactive transactions. |
check_Shield_TOPINTRS | Top x jobs by interactive response time. |
check_Shield_TOPCPUTIME | Top x jobs by CPU time(ms). |
check_Shield_TOPCPU | Top x jobs by CPU(%). |
check_Shield_TOPDSKIO | Top x jobs by disk I/O. |
check_Shield_JOBCPU | Returns information about jobs that match the entered parameters for CPU used and runtime. |
check_Shield_JOBSTG | Returns information about jobs that match the entered parameters for QTEMP size and temporary storage used. |
check_Shield_DQECOUNT | Returns number of data queue entries. |
check_Shield_LIBSIZE | Returns library size. |
check_Shield_WRKPRB | Problems on the IBM i returned by WRKPRB. |
check_Shield_OUTQC | Returns spoolfile count on an out queue. |
check_Shield_SSTP | Returns the status, password expiry or password expiry date of an SST profile. |
check_Shield_JOBESTS | Returns job END Status. Uses search criteria to pull single or multiple jobs. |
check_Shield_PORTCONN | Returns connected users on a specified port. |
check_Shield_PGMEXP | Returns Program exit point registered status. |
check_Shield_QHST | Returns messages on QHST. |
check_Shield_JOBFUNC | Returns jobs running specified function. |
check_Shield_JOBSBSSTS | Returns jobs with specific status in subsystem. |
check_Shield_FMWR | Returns system firmware level and if there are updates available. |
check_Shield_USRCLS | Returns list of user profiles by user class. |
check_Shield_SPCAUTH | Returns list of user profiles by Special Authority. |
check_Shield_CPURESET | CPU usage with reset flag. |
check_Shield_STLUSR | Returns count and list of Stale Users. |
check_Shield_UPFFL | Returns count of failed logins for a user profile. |
check_Shield_UPFSTS | User profile Status. |
check_Shield_UPFPWDE | Days until password expiry for a user profile. |
check_Shield_UPFCLS | User profile class. |
check_Shield_UPFSA | User profile special authorities. |
check_Shield_UPFSTG | Storage used by a user profile. |
check_Shield_UPFEXP | User Profile Expiry Date. |
check_Shield_MSGSEV | Messages in the last x minutes, filtered by severity. |
check_Shield_JOBSCDE2 | JOBSCDE Status with additional details compared to check_Shield_JOBSCDE. |
*NEW check_Shield_IFSSTR | Search for SubString within an IFS file. |
*NEW check_Shield_DTAASTR | Search for SubString within a Data Area. |
This set of commands relates to the high availability product PowerHA.
check_PowerHA_SYNC | Returns sync percentage complete for PowerHA. |
This set of commands relates to the product BRMS.
check_BRMS_WERR | Returns number of Write errors. |
check_BRMS_RERR | Returns number of Read errors. |
check_BRMS_USED | Returns number of times volume is used. |
check_BRMS_FULL | Returns if volume is full. |
check_BRMS_EXPD | Returns volume expiry status. |
check_BRMS_EDAT | Returns days until volume expiry. |
check_BRMS_DUPD | Returns duplication status for media. |
check_BRMS_STS | Returns BRMS status for control group. |
This set of commands relates to your MIMIX replication status.
check_MMX_JRNSTS | Mimix Journal Status. |
check_MMX_SYSSTS | Mimix System Status. |
check_MMX_AGSTS | Mimix Application Group Status. |
check_MMX_ARSTS | Mimix Receiver Information. |
check_MMX_APYSTS | Mimix Apply Status. |
check_MMX_CNTRSTS | Mimix Container Replication Status. |
check_MMX_CFGCHG | Mimix Config Changes. |
check_MMX_OTESTS | Mimix Object Tracking Entries. |
check_MMX_ITESTS | Mimix IFS Tracking Entries. |
check_MMX_FESTS | Mimix File Tracking Entries. |
check_MMX_OBJAPY | Mimix Object Replication Process. |
check_MMX_RJLNK | Mimix Remote Journal Link. |
check_MMX_SWSTS | Mimix Switch Status. |
check_MMX_DBSND | Mimix DB Send Status. |
This set of commands covers the general status of the IBM i.
check_Status_AVLDISK | Available disk as a percentage. |
check_Status_TOTDISK | Total disk in GB. |
check_Status_AVLDISKGB | Available disk in GB. |
check_Status_SYSNAME | System name. |
check_Status_SYSSTATE | Returns system state. |
check_Status_CPUUSED | Percentage of processor used. |
check_Status_NUMJOB | Number of jobs running on system. |
check_Status_PADDR | Percentage of permanent addresses used. |
check_Status_TADDR | Percentage of temporary addresses used. |
check_Status_ASP | Size of system ASP in GB. |
check_Status_STORAGE | Total storage size in GB. |
check_Status_UNPSTG | Size of unprotected storage in MB. |
check_Status_MAXUNPSTG | Max size of unprotected storage in MB. |
check_Status_NUMPART | Number of partitions on system. |
check_Status_PARTID | Partition ID for host. |
check_Status_CPUCAP | Processor capacity as a percentage. |
check_Status_CPUSHARE | Processor sharing status. |
check_Status_NUMCPU | Number of processors that are licensed. |
check_Status_ACTJOB | Number of *ACTIVE jobs running on system. |
check_Status_ACTTHD | Number of *ACTIVE threads on system. |
check_Status_MAXJOB | Maximum number of jobs on system. |
check_Status_TMP256 | % of temporary 256MB segments used. |
check_Status_PRM256 | % of permanent 256MB segments used. |
check_Status_TMP4GB | % of temporary 4GB segments used. |
check_Status_PRM4GB | % of permanent 4GB segments used. |
check_Status_UCAP | % of uncapped CPU used. |
check_Status_SPOOL | % of shared processor pool used. |
check_Status_MAINMEM | Amount of main memory in GB. |
check_Status_PRCTTU | Amount of processor unit time used in ms for each job. |
check_Status_INTTRN | Number of interactive transactions per job listed. |
check_Status_DBLCKW | Amount of database lock waits per job listed. |
check_Status_INTLCW | Amount of internal machine lock waits per job listed. |
check_Status_NDBLCKW | Amount of non-database lock waits per job listed. |
check_Status_AUXIOR | Amount of auxiliary I/O requests per job listed. |
check_Status_PEAKTS | Amount of peak temporary storage per job listed. |
check_Status_QTEMPS | Size of QTEMP library in MB per job listed. |
check_Status_RESPTT | Total response time in seconds per job listed. |
check_Status_TSDBLW | Total seconds spent in database lock wait, per job listed. |
check_Status_TSINTL | Total seconds spent in internal lock wait, per job listed. |
check_Status_TSNDBL | Total seconds spent in non-database lock wait, per job listed. |
check_Status_TMPSTG | Temporary storage used in MB, per job listed. |
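Checks that return a percentage, such as check_Status_CPUUSED, are typically compared against warning and critical thresholds and reported using the standard Nagios plugin convention (exit codes 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN, with performance data after a `|`). The Python sketch below shows that convention only. The function name and threshold handling are assumptions and do not describe how AAG evaluates values internally.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate_percentage(label, value, warn, crit):
    """Map a 'higher is worse' percentage to a Nagios state and status line.

    Returns (exit_code, status_line). A missing or negative value is
    reported as UNKNOWN rather than guessed at.
    """
    if value is None or value < 0:
        return UNKNOWN, f"{label} UNKNOWN - no valid value returned"
    if value >= crit:
        code = CRITICAL
    elif value >= warn:
        code = WARNING
    else:
        code = OK
    # Performance data format: label=value[UOM];warn;crit
    perfdata = f"{label.lower()}={value:.1f}%;{warn};{crit}"
    return code, f"{label} {STATE_NAMES[code]} - {value:.1f}% | {perfdata}"
```

A plugin script would print the status line and call `sys.exit` with the exit code. For a "lower is worse" value such as check_Status_AVLDISK, the comparisons would be inverted.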
AAG now provides the ability to create an HMC host and monitor it with the following check_commands.
check_HMC_MSYSCON | Returns managed systems connection status. |
check_HMC_MEMSTS | Returns HMC memory status. |
check_HMC_SRVEVNT | Returns HMC managed system service events. |
check_HMC_SYSLED | Returns managed system LED status. |
check_HMC_PARTLED | Returns partition LED status. |
check_HMC_MAINTEXP | Returns managed system hardware maintenance expiry. |
check_HMC_PARTSTS | Returns partition name, status and OS. |
check_HMC_UPD | Returns HMC available updates. |
check_HMC_MIGSTS | Returns partition migration status. |
check_HMC_LOGINS | Returns current logins to the HMC via SSH or the web UI. |
check_HMC_FSSIZE | Returns file system sizes & available space. |
check_HMC_CERT | Returns HMC certificate validity. |
check_HMC_SYSSRC | Returns managed system SRC code. |
check_HMC_PARTSRC | Returns partition SRC code. |
check_HMC_SYSFMWR | Returns system firmware level. |
This set of commands relates to the product FT4i for FTP security.
check_FT4i_LOG | Returns FTP log entries matching criteria. |
This set of commands allows you to monitor your VIOS partitions.
check_VIOS_OSLVL | System OS level. |
check_VIOS_SEASTAT | VIOS SEA status. |
check_VIOS_USRSTS | VIOS user status. |
check_VIOS_ERRLOG | VIOS error log entries. |
check_VIOS_ENTSTATUS | VIOS ENT status. |
check_VIOS_DEVSTAUS | VIOS device status. |
check_VIOS_PVSIZE | Returns VIOS physical volume size. |
check_VIOS_FLOGINS | VIOS failed logins. |
check_VIOS_VMSTATUS | VIOS VM status. |
check_VIOS_FMWRLVL | VIOS firmware level. |
check_VIOS_LPARINFO | VIOS LPAR info. |
Most IBM i/Nagios administrators can install AAG themselves; however, if you need assistance, we provide highly trained consultants who can install and configure AAG to ensure you are monitoring everything correctly.
Any customer with a current maintenance contract in place can use the support portal to raise tickets for issues and questions about the products. The support portal also lists a number of FAQs that can help with product setup and configuration. Any tickets raised via the portal are immediately flagged to the support team to ensure a rapid response to your questions. Access to the support portal is available here.
If you are asked to start a remote desktop session (TeamViewer), use the following link to install the correct version of TeamViewer from our site: TeamViewer Version 8.