Check_MK now supports automatic actions (scripts) to be executed upon the state change of a host or service. This is similar to Nagios "Event Handlers" but has a much more flexible configuration and other advantages.
At the beginning there is a state change of a host or service. It does not matter whether this change is "soft" - because the maximum number of check attempts has not been reached. It simply matters that the state has changed from one of OK/WARN/CRIT/UNKNOWN to another.
Whenever this happens a new global rule chain of Alert Handler Rules is being processed. Each rule that matches calls an external script of your choice. Most times people want to restart services, trigger garbage collections of Java machines or do similar stuff.
Please note that some folks insist that monitoring should not try to repair things or by any other means actively change things. Whether you share this opinion or not is your own decision. Alert handlers do not limit you in what you exactly do with them. But you have been warned.
When you compare alert handlers with the Rule Based Notifications (RBN) then here are some important differences:
Note: this is just the first implementation of the Alert Handlers. Next steps will the introduction of error tracking, notifications tied to alert actions, even more flexible conditions, a system for secure remote execution and much more. Stay tuned!
Setting up Alert handlers
For setting up alert handlers you first need to create a script that should be called. This can be written in any programming language - most people will use a simple BASH script. It must be executable and be installed in ~/local/share/check_mk/alert_handlers and made executable.
The script is provided with all information about the alert with environment variable that being with ALERT_ - very similar to a notification script. A good start for testing is the following script:
#!/bin/bash env | grep ALERT_ | sort > /tmp/alert.out
This will dump all the variable of the alert into the file /tmp/alert.out. When specifying this script in the alert handler rule - simply write foo here.
Useful for debugging is to set Alert handler log level to Full dump of all variables and Logging of the alert processing to on in the global settings. You will find information it ~/var/log/cmc.log and ~/var/log/alerts.log.