Guidelines for writing checks for the official distribution
March 4, 2011
Writing a really good check has many aspects. If you want your check to be part of the official Check_MK distribution, it has to adhere to the following guidelines:
Check file names should be short and unique. They must consist only of lower case characters, digits and underscores, and must begin with a lower case character.
Vendor specific checks must be prefixed with a unique vendor abbreviation of your own choosing. Example: fsc_ for Fujitsu Siemens Computers.
Product specific checks must be prefixed with a product abbreviation, for example steelhead_status for a Steelhead appliance by Riverbed.
SNMP based checks: if the check makes use of a standardized MIB which is or might be implemented by more than one vendor, then the check should not be named after the vendor but after the MIB. Examples are the hr_* checks.
The service descriptions of different check types that essentially do the same thing must be identical (e.g. if/if64/ifoperstatus). Reason: this makes rules in main.mk simpler for the user!
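The character rules for check file names can be written as a regular expression. The following is only a minimal sketch: the helper is_valid_check_name is hypothetical and not part of Check_MK, and it tests the allowed characters only, not uniqueness or the vendor/product prefix.

```python
import re

# A lower case letter first, then lower case letters, digits or underscores.
_CHECK_NAME_RE = re.compile(r"^[a-z][a-z0-9_]*\Z")


def is_valid_check_name(name):
    """Return True if name satisfies the character rules for check file names."""
    return _CHECK_NAME_RE.match(name) is not None


if __name__ == "__main__":
    for candidate in ["fsc_fans", "steelhead_status", "hr_cpu", "Cisco-Temp", "1st_check"]:
        print("%-18s %s" % (candidate, is_valid_check_name(candidate)))
```

The first three names pass. "Cisco-Temp" fails because of the upper case letters and the hyphen, "1st_check" because it starts with a digit.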
Order of implementation
All checks follow the same order of implementation:
Add an author
If the check is contributed by a third party (like you), you must add your name and your email address as a comment into the check, right after the header.
Readability, looks and indentation
Avoid long lines. Ideally, your lines do not exceed 100 characters.
Use four spaces for indenting your code. Do not use tab characters. If you really can't live without tabs, set the tab width to 8 spaces.
For checks that are part of the official Check_MK distribution, the file header with the copyright information must be present. It is created automatically if you call 'make headers' in the main source directory.
Example agent output
Including example output of the agent is very helpful for understanding how the check parser works.
TCP-Agent based checks must include an output example of the agent. If the agent output can have different formats or output styles then put an example for each kind of style the check supports (e.g.: the output of multipath -l has changed its layout between SLES 10 and SLES 11).
For SNMP based checks include examples if the kind of output is in some respect remarkable.
Configuration variables for main.mk should be named after the check if they are used only by this check. This does not apply to variables that are used by several checks (e.g. filesystem_default_levels is used by df, hr_fs, df_netapp, ...)
If a check does not use check parameters, then the inventory function must return None as parameter and the check function must name the parameter argument _no_params.
The names of the inventory and check functions must be prefixed with the name of the check type, for example inventory_h3c_lanswitch_cpu for the check h3c_lanswitch_cpu.
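A parameterless check that follows these naming rules might look like the sketch below. The check name foo_status and its agent output are hypothetical. The function signatures, the return values and the tuple layout of check_info are assumptions about the check API and may differ in your Check_MK version. check_info is defined here only as a stand-in so that the snippet runs on its own.

```python
# Stand-in for the registry that Check_MK provides when it loads a check file.
check_info = {}

# Hypothetical agent output, which the check receives split into words:
# <<<foo_status>>>
# fan1 ok
# fan2 failed


def inventory_foo_status(checkname, info):
    # One service per line of agent output. The parameter is None
    # because this check does not use check parameters.
    return [(line[0], None) for line in info]


def check_foo_status(item, _no_params, info):
    # The parameter argument is named _no_params to show that it is unused.
    for line in info:
        if line[0] == item:
            if line[1] == "ok":
                return (0, "OK - status is ok")
            return (2, "CRIT - status is %s" % line[1])
    return (3, "UNKNOWN - item not found in agent output")


# Assumed layout: (check function, service description, perfdata flag, inventory function)
check_info["foo_status"] = (check_foo_status, "Foo status %s", 0, inventory_foo_status)
```

Both function names carry the check type as a suffix to the inventory_ and check_ prefixes, so a reader can find them by searching for the check name.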
Other details / expected practices
Setting default values for configuration variables
Default values for check parameters (e.g. switch_cpu_default_levels) must be chosen so that they make sense for everybody, not just for your special case. If you are unsure, choose levels that are too loose rather than too tight. This helps avoid false alarms.
Reuse of configuration variables
If the same configuration variable is used in multiple checks, all of them must set a default value and all those values must be identical!
Your check should assume that the agent always produces valid data. It should not try to handle cases where the agent output is broken; Check_MK handles these via Python exceptions. Catching such errors in the check itself would bypass the debug handler and make the code uglier.
int() vs. saveint() and float() vs. savefloat()
int(s) will throw an exception if s is not a valid number string (or is empty). Check_MK then catches the exception and sets the check result to "UNKNOWN" with a corresponding error message. saveint(s) returns 0 if s is not valid.
Use saveint() in all places where you know or suspect that some device does not supply valid data but the check can work with the rest of the data and produce useful results.
Use int() in all other cases, e.g. if the check does not make any sense if you have no valid data.
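The difference can be illustrated with a small sketch. The saveint() below is a stand-in written for this example so that it runs outside of Check_MK; it only mirrors the behaviour described above. The function parse_counters and its input line are hypothetical.

```python
def saveint(s):
    # Stand-in for illustration: returns 0 if s is not a valid number string.
    try:
        return int(s)
    except (ValueError, TypeError):
        return 0


def parse_counters(line):
    name = line[0]
    # The byte counter is essential: int() raises on broken data, so the
    # check result becomes UNKNOWN instead of silently reporting nonsense.
    octets = int(line[1])
    # The error counter is missing on some devices: saveint() falls back
    # to 0 and the rest of the data stays usable.
    errors = saveint(line[2])
    return name, octets, errors


if __name__ == "__main__":
    print(parse_counters(["port1", "1024", ""]))  # ('port1', 1024, 0)
```

Calling parse_counters(["port1", "", "0"]) raises a ValueError, which is the intended behaviour for data the check cannot do without.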
Only set the perfdata flag (the third parameter in the check_info declaration) to 1 if the check really produces performance data output.
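A check that does produce performance data could be sketched as follows. The check foo_cpu, its default levels and its agent output are hypothetical. The perfdata tuple layout (name, value, warn, crit, min, max) and the check_info layout are assumptions about the check API, and check_info is again a stand-in so that the snippet runs on its own.

```python
# Stand-in for the registry that Check_MK provides when it loads a check file.
check_info = {}

# Hypothetical default levels (warn, crit) in percent, deliberately loose.
foo_cpu_default_levels = (80.0, 90.0)


def inventory_foo_cpu(checkname, info):
    # A single service without an item, created only if the agent sent data.
    if info:
        return [(None, foo_cpu_default_levels)]
    return []


def check_foo_cpu(item, params, info):
    warn, crit = params
    util = float(info[0][0])
    # Assumed perfdata layout: (name, value, warn, crit, min, max)
    perfdata = [("util", util, warn, crit, 0, 100)]
    text = "CPU utilization is %.1f%% (levels at %.1f/%.1f)" % (util, warn, crit)
    if util >= crit:
        return (2, "CRIT - " + text, perfdata)
    if util >= warn:
        return (1, "WARN - " + text, perfdata)
    return (0, "OK - " + text, perfdata)


# The third element is 1 because check_foo_cpu always returns performance data.
check_info["foo_cpu"] = (check_foo_cpu, "CPU utilization", 1, inventory_foo_cpu)
```

Passing the levels along in the performance data is what allows a graph to draw them as warning and critical lines.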
PNP Graph definition
Each check that outputs performance data must have a dedicated PNP graph definition in pnp-templates. If the check has warning and critical levels then the graph must display those levels as yellow and red lines.
Each check that outputs performance data must also have an RRA definition that specifies which of MAX, MIN and AVERAGE is needed to display the graph in its current (and maybe future) forms. Those are in pnp-rraconf. Use a symlink here.
Each check that outputs performance data should have a Perf-O-Meter. For checks that are part of Check_MK this must be done in web/plugins/perfometer/check_mk.py; for third party checks this should be done in a separate file in web/plugins/perfometer.
SNMP based checks
Only use numeric OIDs in your checks. Name based OIDs rely on MIB files, and the check won't work when the MIB files are not in place. Always have your OIDs start with a root, for example: .1.3.6.1.4.1
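Whether an OID is purely numeric can be tested with a short helper. This is only a sketch: is_numeric_oid is a hypothetical function and not part of Check_MK, and it reads "start with a root" as "begin with a leading dot", which is an assumption.

```python
import re

# A leading dot followed by decimal numbers that are separated by dots.
_NUMERIC_OID_RE = re.compile(r"^\.\d+(\.\d+)*\Z")


def is_numeric_oid(oid):
    """Return True if oid is purely numeric and starts with a dot."""
    return _NUMERIC_OID_RE.match(oid) is not None


if __name__ == "__main__":
    for oid in [".1.3.6.1.2.1.1.1.0", "1.3.6.1.2.1.1.1.0", "SNMPv2-MIB::sysDescr.0"]:
        print("%-24s %s" % (oid, is_numeric_oid(oid)))
```

Only the first OID passes. The second lacks the leading dot, and the third is name based and cannot be resolved without the MIB file.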
Neither the check function nor the inventory function may use the print command or otherwise write to stdout or stderr, nor may they communicate with the outside in any other way. A rare exception to this are checks that need dedicated data storage (such as logwatch, which keeps unread log messages in files).
Each check must have a man page. This should be:
Information that must be contained in the check description: