Guidelines for writing checks for the official distribution
November 26, 2014
Check file names should be short and unique. They must consist only of lowercase letters, digits and underscores, and must begin with a lowercase letter.
Vendor-specific checks must be prefixed with a unique vendor abbreviation of your own choosing. Example: fsc_ for Fujitsu Siemens Computers.
Product-specific checks must be prefixed with a product abbreviation, for example steelhead_status for a Riverbed Steelhead appliance.
SNMP-based checks: if the check makes use of a standardized MIB which is or might be implemented by more than one vendor, then the check should be named after the MIB, not after the vendor. Examples are the hr_* checks.
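The character rules above can be expressed as a regular expression. The following is a minimal sketch of a standalone helper script, not part of Check_MK; the function name is_valid_check_name and the sample names that fail the test are made up for illustration.

```python
import re

# A lowercase letter first, then any mix of lowercase letters,
# digits and underscores. \Z anchors at the very end of the string.
CHECK_NAME_RE = re.compile(r"^[a-z][a-z0-9_]*\Z")


def is_valid_check_name(name):
    """Return True if name satisfies the character rules for check file names."""
    return CHECK_NAME_RE.match(name) is not None


if __name__ == "__main__":
    for candidate in ["fsc_fans", "steelhead_status", "hr_cpu",
                      "Steelhead", "2fast", "df-netapp"]:
        print("%-18s %s" % (candidate, is_valid_check_name(candidate)))
```

The regular expression only covers the allowed characters. Whether a name is short, unique and correctly prefixed still has to be judged by a human.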
Order of implementation
All checks follow the same order of implementation:
Add an author
If the check is contributed by a third party (i.e. not by the developers of Check_MK), you must
add your name and your email address as a comment into the check, right after the header.
Avoid long lines. Ideally, your lines shouldn't exceed 100 chars.
Use four spaces to indent your code. Just don't use tab chars.
And if you really can't live without tabs, set the tab width to 8 spaces.
For checks that are part of the official Check_MK project the file header with the
copyright information must be present. It is created automatically
when you call 'make headers' in the main source directory.
Including example output of the agent is very helpful for understanding how the check parser works.
TCP-Agent based checks must include an output example of the agent. If the agent output can have different formats or output styles then put an example for each kind of style the check supports (e.g.: the output of multipath -l has changed its layout between SLES 10 and SLES 11).
Configuration variables for main.mk should be named after the check if they are only used by this check. This does not hold for variables that are used by several checks (e.g. filesystem_default_levels is used by df, hr_fs, df_netapp, ...).
If a check does not use check parameters, then the inventory function must return None as parameter and the check function must name the parameter argument _no_params.
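A check without parameters might look like the following sketch. The check name acme_status, its agent section layout and its output texts are hypothetical. In a real check file check_info is provided by Check_MK; it is defined here only so that the snippet runs on its own.

```python
# Hypothetical check "acme_status": the agent section is assumed to
# contain a single line with one status word, e.g. [["ok"]].

def inventory_acme_status(info):
    # One service without an item. The second element is the parameter,
    # which is None because this check does not use parameters.
    if info:
        return [(None, None)]
    return []


def check_acme_status(item, _no_params, info):
    # The parameter argument is named _no_params to make clear
    # that it is intentionally unused.
    status = info[0][0]
    if status == "ok":
        return (0, "Status is ok")
    return (2, "Status is %s" % status)


check_info = {}  # stand-in; supplied by Check_MK in a real check file
check_info["acme_status"] = {
    "inventory_function": inventory_acme_status,
    "check_function": check_acme_status,
    "service_description": "ACME Status",
}
```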
Other details / expected practices
Setting default values for configuration variables
Default values for check parameters (e.g. switch_cpu_default_levels) must be
chosen in a way that they make sense for everybody, not just for your own environment.
If you are unsure, choose levels that are too loose rather than too tight.
This helps avoid false alarms.
If the same configuration variable is used in multiple checks, all of them
must set a default value and all those values must be identical!
Your check should assume that the agent is always producing valid data.
It should not try to handle cases where the agent output is broken.
Check_MK handles such cases itself via Python exceptions. Catching them in the
check would bypass the debug handler and make the code uglier.
int(s) will throw an exception if s is not a valid number string (or is empty). Check_MK will then catch the exception and set the check result to "UNKNOWN" with a corresponding error message. saveint(s) will assume 0 if s is not valid.
Use saveint() in all places where you know or suspect that some device does not supply valid data but the check can work with the rest of the data and produce useful results.
Use int() in all other cases,
e.g. if the check does not make any sense if you have no valid data.
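The difference between the two can be illustrated with a small sketch. The saveint() below is a stand-in written for this example that mimics the behaviour described above; in a real check you would use the function supplied by Check_MK. The agent line layout in parse_line is hypothetical.

```python
def saveint(s):
    # Stand-in for illustration only: returns 0 for anything that
    # int() cannot convert.
    try:
        return int(s)
    except (ValueError, TypeError):
        return 0


def parse_line(line):
    # Hypothetical agent line: [<queue length>, <deferred mails>]
    # The queue length is essential, so invalid data must raise.
    length = int(line[0])
    # The deferred count is optional on some devices, so fall back to 0.
    deferred = saveint(line[1])
    return length, deferred


if __name__ == "__main__":
    print(parse_line(["12", "3"]))
    print(parse_line(["12", ""]))
    try:
        parse_line(["", "3"])
    except ValueError as e:
        print("int() raised: %s" % e)
```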
Many checks have parameters that define levels for warning and critical that are compared against an actual value. Example: the check monitors the length of a mail queue. The critical level is at 100. If the length is exactly 100, then the check should go critical.
Levels for warning and critical are checked with >= and <= (there might be a few exceptions to this where this wouldn't make sense).
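As a minimal sketch of this rule, the two helper functions below (hypothetical names, not part of Check_MK) compare a value against upper and lower levels. A value that is exactly equal to a level already triggers the corresponding state.

```python
def state_for_upper_levels(value, warn, crit):
    # Upper levels: at or above the level triggers the state.
    if value >= crit:
        return 2
    if value >= warn:
        return 1
    return 0


def state_for_lower_levels(value, warn, crit):
    # Lower levels: at or below the level triggers the state.
    if value <= crit:
        return 2
    if value <= warn:
        return 1
    return 0


if __name__ == "__main__":
    # Mail queue with warning at 80 and critical at 100.
    for length in (79, 80, 99, 100):
        print("%d -> %d" % (length, state_for_upper_levels(length, 80, 100)))
```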
If there are just upper or just lower levels then in the WATO ruleset definitions the input fields for such levels are labelled with Warning at ______ and Critical at ______.
If there are both upper and lower levels then the labelling should be Warning at or above ___, Critical at or above ___, Warning at or below ___ and Critical at or below ___.
Each check issues one line of text: the plugin output (sometimes called check output). In order to unify things the output must be formatted with the following rules in mind:
Format of Performance data
Always send int or float data as performance data. Do not attach a unit. Write temp instead of "%0.2fC" % temp!
If you need to omit fields in the middle (e.g. warn or crit), add a None instead, for example [("usage", usage, None, None, 0, size)]
If you need to omit fields at the end, simply omit them. Do not add trailing Nones.
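The three rules above can be sketched as follows. The field order (name, value, warn, crit, min, max) is inferred from the example tuple above; the variable names and numbers are made up.

```python
usage = 734003200          # bytes in use (made-up value)
size = 1073741824          # total size in bytes (made-up value)
warn = size * 80 // 100    # warning level in bytes
crit = size * 90 // 100    # critical level in bytes
temp = 41.5                # temperature as a plain float

# All fields present: name, value, warn, crit, min, max.
full = [("usage", usage, warn, crit, 0, size)]

# No levels, but min and max are known: keep the positions with None.
no_levels = [("usage", usage, None, None, 0, size)]

# Only a value: leave out the trailing fields instead of padding with None.
value_only = [("temp", temp)]

# Not like this: the value is a string with a unit attached.
# wrong = [("temp", "%0.2fC" % temp)]
```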
Naming of performance data variables:
Always use the canonical unit: send Bytes, not KB, MB or GB. Send Celsius, not Fahrenheit.
Send Bits/sec, not MBits/sec. It is the task of the graphing tool to do a useful scaling.
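If a device reports values in other units, convert them before sending performance data. The helpers below are a sketch with hypothetical names. They assume that the device's KB means 1024 bytes and that its MBit means 10^6 bits; check what your device actually reports.

```python
def kb_to_bytes(kb):
    # Assumption: the device reports KB as units of 1024 bytes.
    return kb * 1024


def mbit_to_bit(mbit):
    # Assumption: the device reports MBit as units of 1,000,000 bits.
    return mbit * 1000000


def fahrenheit_to_celsius(f):
    return (f - 32) * 5.0 / 9.0


if __name__ == "__main__":
    perfdata = [
        ("used", kb_to_bytes(2048)),
        ("in", mbit_to_bit(100)),
        ("temp", fahrenheit_to_celsius(98.6)),
    ]
    print(perfdata)
```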
Only set "has_perfdata" to True in check_info
if the check really produces performance data output.
Each check that outputs performance data must have a dedicated PNP graph definition in pnp-templates. If the check has warning and critical levels then the graph must display those levels as yellow and red lines.
PNP graphs should always use the consolidation function MAX (there are some rare exceptions where only MIN makes sense).
However: the Average value that is printed in the labelling of the
graph must use the consolidation function AVERAGE. Using MAX
would compute the average of the maximum values, which is totally useless.
Each check that outputs performance data must also have an RRA definition
that specifies which of MAX, MIN and AVERAGE are needed to display the
graph in its current (and maybe future) forms. Those are in pnp-rraconf.
Use a symlink here.
Each check that outputs performance data should have a Perf-O-Meter.
For checks that are part of Check_MK this must be done in
web/plugins/perfometer/check_mk.py, for third party checks this should
be done in a separate file in web/plugins/perfometer.
Only use numeric OIDs in your checks. Name-based OIDs rely on MIB files, and the check won't work when those files are not in place. Always have your OIDs start at the root, for example: .1.3.6.1.4.1
Neither the check function nor the inventory function may use the print command, write any other data to stdout or stderr, or otherwise communicate with the outside. A rare exception to this are checks that need a dedicated data storage (such as logwatch: it keeps unread log messages in files).
Each check must have a man page. This should be:
Information that must be contained in the check description:
Here are some frequent errors and further mixed guidelines:
When you output a number and a unit, always put one space between the two: write 4.5 ms instead of 4.5ms. Only the percent sign is added without a space.
Checks doing the same should always have the same (consistent) service description. Examples:
Service descriptions should be capitalized like English titles, e.g. "Source of Output"
If your check ships an agent plugin, then:
A check with items must return an UNKNOWN state (3) if the checked item is not found in the agent output or SNMP data. The text in that case should be: Thing not found in SNMP data or Thing not found in agent output (depending on the type of check), where Thing is the name of the item type, for example Database not found, Sensor not found, Domain not found. Do not:
The state markers (!) and (!!) must only be used in checks that can go warning or critical for several different reasons, like sub-checks.
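The rule for missing items and the rule for state markers can be sketched together in one check function. The check acme_sensor, its agent section layout and its output texts are hypothetical. The markers are used here because the service can leave the OK state for two different reasons (temperature and fan).

```python
def check_acme_sensor(item, params, info):
    # Hypothetical agent section, one line per sensor:
    # <sensor name> <temperature in Celsius> <fan state>
    warn, crit = params
    for name, temp, fan in info:
        if name != item:
            continue

        temp = float(temp)
        state = 0

        temp_text = "Temperature: %.1f C" % temp
        if temp >= crit:
            state = 2
            temp_text += " (!!)"
        elif temp >= warn:
            state = 1
            temp_text += " (!)"

        fan_text = "Fan: %s" % fan
        if fan != "ok":
            state = 2
            fan_text += " (!!)"

        return (state, "%s, %s" % (temp_text, fan_text))

    # The item was not present in the agent output at all.
    return (3, "Sensor not found in agent output")
```

With params (60, 70) and the agent line ["cpu", "65.0", "ok"], the item cpu yields state 1 and the text "Temperature: 65.0 C (!), Fan: ok".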
Never use a global import statement in a check file
Do not use datetime for date/time parsing and arithmetic. Use time. It is capable of all that you need. Believe me.
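As a minimal sketch of parsing and arithmetic with the time module only: the helper below (a hypothetical name, as is the timestamp format) computes the number of seconds between two timestamps. The import is placed inside the function so that the check file has no global import statement.

```python
def seconds_between(start, end, fmt="%Y-%m-%d %H:%M:%S"):
    # Local import: keeps the check file free of global imports.
    import time

    # strptime() parses the text into a struct_time, mktime() converts
    # that (interpreted as local time) into seconds since the epoch.
    start_ts = time.mktime(time.strptime(start, fmt))
    end_ts = time.mktime(time.strptime(end, fmt))
    return end_ts - start_ts


if __name__ == "__main__":
    age = seconds_between("2014-01-15 12:00:00", "2014-01-15 13:30:00")
    print("%d minutes" % (age / 60))
```

Note that mktime() interprets the parsed time in the local timezone of the monitoring server, which is usually what you want when the timestamps come from the same host.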