Writing checks for Check_MK
May 27, 2015
Why not use local checks or MRPE?
Using local checks or MRPE for adding your own self-written checks to Check_MK is easy. Even inventory and performance data are supported. So why should you want to write native checks for Check_MK? Well, there can be several reasons:
If one or more of those issues are relevant for you, then you'll find all information needed for writing your own checks in this article and a couple of further articles.
Do I have to learn Python?
Well, to be honest: yes, at least to a certain basic degree. People have suggested changing Check_MK so that checks can be written in other languages as well. I understand this request very well. But from a technical point of view I cannot imagine how such an integration could be done in a clean, simple and efficient way. Check_MK's checks are not standalone programs or scripts but are closely integrated into the check mechanism. They need to have access to some of Check_MK's internal functions. In the end, for each host one Python program will be created by combining a base and all checks used by this host into one new program. This feature saves about 75% of the CPU resources when compared to directly calling check_mk for checking.
On the other hand, Python is a language which is cleanly designed, elegant and easy to learn. I'm sure you'll like it once you have some experience with it (even if you dislike its style of indentation).
How Check_MK's checks work
Each check consists at least of the following three components:
Two further components are optional but strongly recommended:
If your check outputs performance data, then two further components form a perfect check:
The data source
Everything begins with the data source, i.e. the source of the data the check operates
on. Currently there are two different kinds of data sources: agent sections (tcp),
and SNMP queries (snmp). An agent section is a part of the output of an agent, for example
the output of the Linux command df. An SNMP based data source returns
data retrieved by one or several SNMP queries on certain OIDs. Both data sources
are presented to the check function as a table (a Python list of lists). We will
call these data the "agent data".
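To make the "list of lists" shape concrete, here is a minimal sketch of how one agent section could be turned into such a table. Check_MK does this step itself; the sample text, the helper name to_table and the whitespace splitting are assumptions made for illustration only.

```python
# Hypothetical raw agent section, roughly what the Linux command df might print.
raw_section = """\
/dev/sda1 10321208 4567892 5229028 47% /
/dev/sda2 51606140 1234568 47750132 3% /home
"""

def to_table(section_text):
    """Split every non-empty line on whitespace: one row per line, one cell per word."""
    return [line.split() for line in section_text.splitlines() if line.strip()]

# "info" is the name used here for the agent data handed to a check.
info = to_table(raw_section)
```

Each row is a list of strings, so info[0][5] is the mount point "/" and any numeric column has to be converted by the check itself.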
If you write a TCP-based check, you need a plugin for the agent. This is typically a small executable script which is placed in the plugins directory of the agent. It uses standard operating system methods for retrieving the data of interest.
It is important to understand the philosophy of Check_MK at this point. The plugin should:
The inventory function
If you want your check to support inventory (which is always a good idea), you have to supply an inventory function. This function examines the agent data of a host and creates a list of all items to be checked on this specific host. An item uniquely identifies a thing to be checked on a host within that type of check. Some examples of items are:
Some checks do not need to distinguish items. This is because the thing they check does not exist more than once on a host. An example is the check mem. But Check_MK always requires an item, so these checks simply use None as the item.
Please note that this does not mean you cannot do an inventory on mem. It's just that the number of items the inventory returns is at most one. In some cases it is even zero: when the agent output does not contain the information needed for the check. This is a very useful feature and enables the Nagios administrator to automatically perform the right checks on the right operating systems.
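A minimal sketch of such an inventory function for a check without items could look like this. The function name, the layout of the agent rows and the return format (a list of item/parameter pairs) are assumptions for illustration; the exact format expected depends on your Check_MK version.

```python
def inventory_mem(info):
    """Return at most one service; None stands in for the item."""
    # Assumption: each row of the agent data looks like ["MemTotal:", "8167848", "kB"].
    keys = {row[0] for row in info if row}
    if "MemTotal:" in keys:
        return [(None, {})]  # exactly one item (None) with empty parameters
    return []                # the agent sent nothing usable: no service is created

linux_info = [["MemTotal:", "8167848", "kB"], ["MemFree:", "1234567", "kB"]]
found = inventory_mem(linux_info)
nothing = inventory_mem([])
```

Here found contains one entry and nothing is empty, which is the "at most one, sometimes zero" behaviour described above.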
Your inventory function does not need to worry whether a certain item was
already configured manually or detected by a previous inventory. Check_MK
handles this in a general way, and makes sure that only newly detected items
are added to the list of services.
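Because of this, an inventory function can simply report everything it sees. The following sketch assumes df-style agent rows with the mount point in the sixth column and, as a further assumption, returns a list of item/parameter pairs; the function name is hypothetical.

```python
def inventory_filesystems(info):
    """Report every mount point found in the agent data as an item."""
    # No filtering against already known services is done here:
    # the text above states that Check_MK takes care of that.
    return [(row[5], {}) for row in info if len(row) >= 6]

info = [
    ["/dev/sda1", "10321208", "4567892", "5229028", "47%", "/"],
    ["/dev/sda2", "51606140", "1234568", "47750132", "3%", "/home"],
]
items = [item for item, _params in inventory_filesystems(info)]
```

With the sample rows above, items is the list of the two mount points "/" and "/home".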
When an actual check of a host is done, all services for this host will be checked in turn. When it's your check's turn, Check_MK will call your check function for each item that is automatically or manually configured for the check and host. Your function will be provided with the checked item, the (optional) parameters of the check and the agent data.
The check function then
This is very similar to what standard Nagios plugins do, with the important
difference that our check is already provided with data from the agent and
does not have to retrieve it by itself.
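As a sketch of the idea, a check function receives the item, the parameters and the agent data, and hands back a status and an explanatory text. The function name, the parameter key "levels", its default values and the df-style row layout are assumptions for illustration, not part of Check_MK.

```python
OK, WARN, CRIT, UNKNOWN = 0, 1, 2, 3  # status codes as used by Nagios

def check_filesystem(item, params, info):
    """Find the row for item, compare usage with the levels, return (status, text)."""
    warn, crit = params.get("levels", (80, 90))  # hypothetical parameter name and defaults
    for row in info:
        if len(row) >= 6 and row[5] == item:
            used_percent = int(row[4].rstrip("%"))
            text = "%d%% used (levels at %d%%/%d%%)" % (used_percent, warn, crit)
            if used_percent >= crit:
                return CRIT, text
            if used_percent >= warn:
                return WARN, text
            return OK, text
    # The item is configured but the agent did not report it.
    return UNKNOWN, "filesystem %s not found in agent data" % item

info = [["/dev/sda1", "10321208", "4567892", "5229028", "47%", "/"]]
result = check_filesystem("/", {}, info)
```

Note that the function never runs df itself. It only looks at the table it was given, which is what distinguishes it from a classical Nagios plugin.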
If you want to pass your check along to others, a manual page for the check is strongly
recommended. Check_MK has its own concept and syntax of check manuals. You do not need to learn
NROFF syntax or stuff like that. A check manual is a relatively simple text file named after
the check and usually installed in
If your check delivers performance data (i.e. not only returns a status and an explanatory text but also values like memused=77364), you should provide a template for PNP4Nagios which nicely displays the evolution of the value.
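To illustrate what "delivering performance data" can mean, the following sketch returns a third element next to status and text. The function names, the row layout and the representation of performance data as a list of name/value tuples are assumptions for illustration; check the Check_MK version you use for the exact format.

```python
def check_mem_used(item, params, info):
    """Return (status, text, perfdata) for a memory check without items."""
    # Assumption: rows look like ["MemTotal:", "100000", "kB"].
    values = {row[0].rstrip(":"): int(row[1]) for row in info if len(row) >= 2}
    used = values["MemTotal"] - values["MemFree"]
    perfdata = [("memused", used)]
    return 0, "%d kB used" % used, perfdata

def format_perfdata(perfdata):
    """Render the tuples in the name=value form known from Nagios plugin output."""
    return " ".join("%s=%s" % (name, value) for name, value in perfdata)

info = [["MemTotal:", "100000", "kB"], ["MemFree:", "22636", "kB"]]
status, text, perfdata = check_mem_used(None, {}, info)
rendered = format_perfdata(perfdata)
```

A graphing tool then stores one value per variable name and check interval, which is the data a graph template displays.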
If you are using another graphing tool, or no graphing tool at all, then a PNP
template is of course not useful to you. You only need one if you want your check
to be officially part of Check_MK.
The same holds for the Perf-O-Meter for Multisite. People like Perf-O-Meters. If you do not use Multisite then Perf-O-Meters are of no use to you. Checks that are to become part of Check_MK must provide a Perf-O-Meter (even if some older checks of Check_MK still do not have one).
Let's jump to practice: Preparing the agent
Let's now write our first check. For a start we offer two tutorials.