Monitoring filesystems with Check_MK
August 23, 2011
Basic principle
The monitoring of filesystems can cover several aspects. This article is concerned with the most important one: the amount of used or free space. Depending on the type of agent and operating system, Check_MK provides several checks that monitor filesystems. As of version 1.1.11i2 all of them are fully compatible and share the same features and configuration parameters. Here is a table of those checks:
All of those checks support automatic inventory.
Inventory
The inventory automatically creates service checks for all filesystems that the agent provides information for. The Windows agent sends all fixed disks. The Linux agent filters out network filesystems but might send filesystems you do not want to see (e.g. temporary mounts). You can customize the inventory of filesystems in main.mk with three variables. The following settings are the defaults: main.mk
# List of filesystem types to skip at inventory
inventory_df_exclude_fs = [ 'nfs', 'smbfs', 'cifs', 'iso9660' ]
# List of mountpoints to skip at inventory
inventory_df_exclude_mountpoints = [ '/dev' ]
Filesystems specified in one of those two lists will never appear at an inventory. For further details about how to tune the inventory, please refer to the check manual of df.
Default Levels for Warning and Critical
Because in a larger installation one can easily have to deal with more than 1,000 different filesystems, Check_MK makes an effort to make the configuration of warning and critical levels as easy and flexible as possible. You have several methods of defining levels, listed here from less specific to more specific. While this scheme holds for all checks, we take the opportunity to show it for filesystems with a few examples. The whole story about check parameters is covered in a separate article.
Please note that this article is already based on the dictionary-based parameterization of df, which has been available since version 1.1.11i1. If you are using an older version of Check_MK, then please consult the check manual of df that was shipped with your version.
So let's begin: The easiest way is of course to not specify any levels at all. Check_MK will apply its builtin default levels of 80% usage for warning and 90% usage for a critical state. You can easily change these global defaults with the variable filesystem_default_levels in main.mk. It is a dictionary where you need to set the key levels.
It must be a pair of two integers, which are interpreted as percentages: main.mk
filesystem_default_levels["levels"] = ( 90, 95 )
More specific rules for levels
Even if global default levels might be sufficient for most filesystems, you will surely have exceptions. Exceptions can be defined in a flexible yet compact way by making use of the variable check_parameters. The following example configures the levels (95, 98) for the filesystem /var on all hosts: main.mk
check_parameters += [
( { "levels" : (95, 98)} , ALL_HOSTS, [ "fs_/var" ] ),
]
This sets the levels for filesystems mounted at a directory that begins with /var to 95% and 98% for all hosts. For the next example, assume that you have tagged all of your Windows hosts with win. Then you could add a line for those: main.mk
check_parameters += [
( { "levels" : (95, 98)} , ALL_HOSTS, [ "fs_/var" ] ),
( { "levels" : (75, 85)}, [ "win" ], ALL_HOSTS, [ "fs_" ] ),
]
The expression [ "fs_" ] matches all services that begin with fs_, i.e. all filesystem checks (the check type does not matter here; this works for df, hr_fs and others). Exceptions in the list of mount points can be denoted with an exclamation mark. The following rule holds for all filesystems that begin with /var but not for /var/log: main.mk
check_parameters += [
( { "levels" : (75, 85)}, ALL_HOSTS, [ "!fs_/var/log", "fs_/var" ] ),
]
Another way to do something very similar would be by writing two rules: main.mk
check_parameters += [
( { "levels" : (80, 90)}, ALL_HOSTS, [ "fs_/var/log" ] ),
( { "levels" : (75, 85)}, ALL_HOSTS, [ "fs_/var" ] ),
]
Please note that the rules in check_parameters are evaluated from top to bottom. The first rule that matches wins. If no rule matches, then filesystem_default_levels is used. By the way: the Windows agent sends the drive letter followed by :\ instead of a mountpoint. So you could write a rule for all C: and D: drives very easily: main.mk
check_parameters += [
( { "levels" : (80, 90)}, ALL_HOSTS, [ "fs_C:", "fs_D:" ] ),
]
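The evaluation order described above (rules from top to bottom, first match wins, fallback to the global default) can be sketched in a few lines of Python. This is only an illustration, not Check_MK's actual code: the names RULES, pattern_list_matches and lookup_params are hypothetical, host and tag matching is left out, and it assumes that the patterns within one rule are tried in order and that a leading exclamation mark turns a match into an exclusion.

```python
import re

# Hypothetical, simplified rule table: (parameters, service patterns).
# Host lists and host tags are omitted to keep the sketch short.
RULES = [
    ({"levels": (75, 85)}, ["!fs_/var/log", "fs_/var"]),
    ({"levels": (77, 88)}, ["fs_/very/specific$"]),
]

# Builtin default named in the text: 80% warning, 90% critical.
DEFAULT_PARAMS = {"levels": (80, 90)}


def pattern_list_matches(patterns, service):
    """Return True if the pattern list selects the service description.

    Assumption: patterns are tried in order, each one is a regular
    expression anchored at the start of the service description, and a
    leading '!' makes a matching pattern an explicit exclusion.
    """
    for pattern in patterns:
        negate = pattern.startswith("!")
        if negate:
            pattern = pattern[1:]
        if re.match(pattern, service):
            return not negate
    return False


def lookup_params(service, rules=RULES, default=DEFAULT_PARAMS):
    """Return the parameters of the first rule that selects the service."""
    for params, patterns in rules:
        if pattern_list_matches(patterns, service):
            return params
    return default


if __name__ == "__main__":
    for name in ["fs_/var/lib", "fs_/var/log", "fs_/very/specific", "fs_/home"]:
        print(name, lookup_params(name)["levels"])
```

In this sketch /var/log is excluded from the first rule by the negated pattern and, since no later rule matches it, ends up with the default levels.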
Manual level definitions for single filesystems
Defining a specific level for one specific filesystem on one specific host may be bad style, but it is also possible with those rules. Just make sure that you list those specific rules first in check_parameters, as the first match wins. main.mk
check_parameters += [
( { "levels" : (77, 88)}, [ "srv123xz" ], [ "fs_/very/specific" ] ),
]
Please note: the pattern in this rule is interpreted as a regular expression matching the beginning of the service description. If you want to make sure that the rule does not match filesystems mounted below /very/specific, you should add a dollar sign. In the language of regular expressions, this means the end of a string: main.mk
check_parameters += [
( { "levels" : (77, 88)}, [ "srv123xz" ], [ "fs_/very/specific$" ] ),
]
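The difference between the two patterns can be tried out with Python's re module, where re.match anchors a pattern at the beginning of the string only. The service names and the helper function matching below are made up for illustration.

```python
import re

# Hypothetical service descriptions on host srv123xz
services = ["fs_/very/specific", "fs_/very/specific/sub", "fs_/very"]


def matching(pattern, names):
    """Return the names that the pattern matches from their beginning."""
    return [name for name in names if re.match(pattern, name)]


# Without the dollar sign the pattern also catches mounts below the directory
prefix_hits = matching("fs_/very/specific", services)

# With the dollar sign only the exact service description is matched
exact_hits = matching("fs_/very/specific$", services)

print(prefix_hits)
print(exact_hits)
```

Because the patterns are regular expressions, characters such as a dot in a mount point would also be interpreted as regular expression syntax unless they are escaped.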
The df magic number
Defining levels with percentages has the advantage that it is easier to use the same levels for filesystems of different sizes. Some people argue, nevertheless, that while 10% free space might be a problem for a small filesystem, 10% left on a filesystem of 1 TB is 100 GB and should not lead to an immediate alarm. Check_MK supports "magic" adaptation of percentage levels to especially large or small filesystems. This is done by the so-called df magic number, which is configured via the dictionary key magic. main.mk
filesystem_default_levels["magic"] = 0.8
The default magic number is 1.0, which means no magic adaptation. The following table shows the impact of various magic numbers on a level of 80% for various filesystem sizes:
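To get a feeling for how a magic number shifts a level, the effect can also be computed directly. The sketch below is an illustration under assumptions: this article does not state the formula, so the function adapt_level, its parameters and the scaling rule (the free-space margin is multiplied by the ratio of a "felt" size to the real size, both relative to a reference size of 20 GB) are not confirmed by the text. An optional lower bound keeps the result from dropping too far for tiny filesystems.

```python
def adapt_level(level_percent, size_gb, magic=1.0, normsize_gb=20.0,
                floor_percent=None):
    """Scale a percentage level by filesystem size (assumed formula).

    level_percent: configured level, e.g. 80 for 80% usage
    size_gb:       size of the filesystem in GB
    magic:         1.0 means no adaptation, smaller values adapt more
    normsize_gb:   reference size at which the level stays unchanged
    floor_percent: optional lower bound for the adapted level
    """
    relative_size = size_gb / normsize_gb
    felt_size = relative_size ** magic
    scale = felt_size / relative_size
    # Scale the margin between the level and 100% usage
    adapted = 100.0 - (100.0 - level_percent) * scale
    if floor_percent is not None:
        adapted = max(adapted, floor_percent)
    return adapted


if __name__ == "__main__":
    for size in (5, 20, 100, 1000):
        print(size, "GB:", round(adapt_level(80, size, magic=0.8), 1))
```

Under this assumed rule a filesystem of the reference size keeps its level, larger filesystems get a higher level and smaller ones a lower level. For authoritative numbers, use the df_magic_number.py program shipped with Check_MK.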
The normalized filesystem size is set to 20 GB here, which is the default. You can change this to another size in main.mk: main.mk
filesystem_default_levels["magic_normsize"] = 100
In the doc directory of Check_MK you'll find the program df_magic_number.py. Call it with different norm sizes as argument to see the impact on the filesystem levels.
Absolute limits on the levels
In rare cases the levels for warning and critical can get very small or even negative. This happens for very small partitions in combination with a very small magic number (e.g. 0.4). Because of that, the levels for warning and critical will never be set below 50% and 60%, respectively. You can configure these two limits in your configuration: main.mk
filesystem_default_levels["levels_low"] = (40, 50) # never drop levels below 40/50
Filesystem trends
As of version 1.1.9i9 Check_MK supports trends. This means that all filesystem checks are now able to compute the change of the used space over time and can make a forecast into the future. They can estimate the point in time at which the filesystem will be full. In the default configuration the check computes the trend based on the data of the last 24 hours. Similar to the CPU load, this is done with a logarithmic average that weights recent data more heavily than data further in the past. Data older than 24 hours is also reflected in the computation to some small degree. The advantage of this algorithm is a more precise prediction and a simpler implementation, which does not need access to any RRDs or similar storage.
Please note that when monitoring of a filesystem starts, the trend of the past is unknown and is assumed to be zero. That means that it will take at least one trend range of time until the trend approximately reflects reality. By default, no alerting levels are configured for the trends, so if you have attached a graphing tool like PNP4Nagios, you will see the trends but you will not be alerted.
For how to configure trends please refer to the check manual of df.
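The idea of a weighted average that needs no stored history can be illustrated with a small sketch. It is not Check_MK's implementation: the functions update_trend and hours_until_full, the exponential weighting and the units (MB per trend range) are assumptions chosen for illustration. Only the previous averaged trend and the previous measurement are kept between checks.

```python
import math


def update_trend(prev_trend, prev_used_mb, used_mb, elapsed_sec,
                 range_hours=24.0):
    """Return the new averaged growth in MB per trend range.

    Assumption: the old average decays exponentially with the elapsed
    time, so recent measurements count more than older ones and data
    older than one range still contributes a little.
    """
    range_sec = range_hours * 3600.0
    # Growth since the last check, extrapolated to one full trend range
    rate_now = (used_mb - prev_used_mb) / elapsed_sec * range_sec
    weight = math.exp(-elapsed_sec / range_sec)
    return prev_trend * weight + rate_now * (1.0 - weight)


def hours_until_full(free_mb, trend_mb_per_range, range_hours=24.0):
    """Estimate the hours until the filesystem is full, or None."""
    if trend_mb_per_range <= 0:
        return None  # not growing, so no forecast is possible
    return free_mb / trend_mb_per_range * range_hours


if __name__ == "__main__":
    trend, used = 0.0, 1000.0  # the trend starts at zero, as in the text
    for _ in range(1440):      # one day of checks, one per minute
        new_used = used + 10.0
        trend = update_trend(trend, used, new_used, 60.0)
        used = new_used
    print(round(trend), hours_until_full(50000.0, trend))
```

Starting from a trend of zero, this sketch reaches only about 63% of the true growth rate after one trend range, which matches the remark that the trend needs at least one range of time before it approximately reflects reality.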