Monitoring Windows with Check_MK

This article is no longer maintained and may no longer be valid!

1. The Windows Agent

Check_MK provides its own agent for monitoring Windows hosts: check_mk_agent.exe. This agent is installed as a Windows service and has several advantages over NSClient++ (besides the fact that it supports Check_MK, of course):

  • Minimalistic approach - one single executable file, no DLLs or additional files needed.
  • Shipped with source code - compilable with the free compiler MinGW, the Minimalist GNU for Windows. You do not need proprietary software to recompile the agent. On current Linux distributions you can even cross-compile the agent on Linux.
  • Support for monitoring Windows Eventlogs.
  • Security: the agent never reads any data from the network and thus is not endangered by code insertion attacks.
  • Extensions can be written in arbitrary programming languages.
  • And last but not least: No configuration is needed.

2. Where to find

The Windows agent will be installed into a directory that is configurable during setup of Check_MK. When using OMD you will find the agent in the directory share/check_mk/agents/windows below your site directory.

3. Installation of the agent

3.1. The windows agent installer

Starting with version 1.1.13i3 an installer is available for the Windows agent. To install the agent with it, simply download the Windows installer or copy check-mk-agent-<version>.exe from the agent directory of your Check_MK installation to your Windows host and start it.

The installer will ask where to install and whether or not it should install and start the Windows service. The default options should fit most users' needs, so you can simply click through the installer.

By default the agent will be installed to %PROGRAMFILES%\check_mk. When the installation is finished, the service has been created and started. You can start monitoring your Windows host right away.

The installer supports some command line arguments which are:

/S    runs the installer or uninstaller silently
/D=   sets the default installation directory. It must be the last parameter used in the command line and must not contain any quotes, even if the path contains spaces. Only absolute paths are supported. You need to set the path like this: /D=C:\path\to\agent

3.2. Manual Installation of the agent

If you prefer not to use the agent installer you can also install the Windows agent manually. The installation is easy: just copy check_mk_agent.exe to a convenient directory on your Windows host and call it from a command shell with the option install:

C:\some\directory\> check_mk_agent.exe install

This will install a new Windows service called Check_MK_Agent. This service can be started with the Windows service manager or simply by entering:

C:\some\directory\> net start check_mk_agent

3.3. Test

If the agent is running properly you should be able to connect to TCP port 6556 on the Windows host from the Nagios host. You can test this, e.g., with telnet:

user@host:~$ telnet windowshost 6556
Version: 1.1.13i1
AgentOS: windows
WorkingDirectory: C:\some\directory
ConfigFile: C:\some\directory\check_mk.ini
AgentDirectory: C:\some\directory
PluginsDirectory: C:\some\directory\plugins
LocalDirectory: C:\some\directory\local
C:\        NTFS     62902472 18363880 44538592  30% C:\
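
The same test can be scripted. The following is only a minimal sketch, assuming that the agent sends its complete output and then closes the connection, as the telnet session above suggests. The function name fetch_agent_output and the UTF-8 decoding are assumptions made for this illustration and are not part of Check_MK.

```python
import socket


def fetch_agent_output(host, port=6556, timeout=10.0):
    """Connect to the agent and return everything it sends as text."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(4096)
            if not data:  # the remote side closed the connection
                break
            chunks.append(data)
    # Assumption: decode as UTF-8 and replace anything that does not fit
    return b"".join(chunks).decode("utf-8", errors="replace")


# Usage (host name taken from the telnet example above):
#   print(fetch_agent_output("windowshost"))
```

The function never sends any data; it only reads until the connection is closed.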

3.4. Integration into check_mk

The integration of Windows hosts is nothing special and goes the usual way: add the host to all_hosts in your Check_MK configuration and run the inventory with cmk -I. After that update your Nagios configuration files with cmk -U and restart Nagios.

4. Trying out the agent without installing it

The check_mk_agent allows you to try it out without installing it as a service. The simplest way is to call it with the option test. This does not open a TCP socket but simply displays all current data on your console:

C:\some\directory\> check_mk_agent.exe test
Version: 1.0.29rc
C:\        NTFS     18434584 6559144 11875440  36% C:\
[System Process]

Another option is to start the agent with the option adhoc. Now it will open TCP port 6556 and handle requests just like the service. It will do so until you abort it by pressing Ctrl-C:

C:\some\directory\> check_mk_agent.exe adhoc
Listening for TCP connections on port 6556
Close window or press Ctrl-C to exit

5. Information provided by the agent

As of version 1.1.12 the agent provides access to the following data:

  • Size and usage of all fixed disks
  • All running processes
  • All running and stopped services
  • Usage of RAM and the page file
  • CPU utilization
  • Disk IO (throughput)
  • All Eventlogs found in the system
  • The system time (time synchronization)
  • Length of message queues of MS Exchange

Furthermore checks can be added by making use of external plugins written in VBS, command or other languages. Currently we ship the following plugins:

  • Active directory replication
  • DHCP Pools
  • Windows Multipathing
  • Windows Updates
  • Resource usage of processes via WMI

The agent can be configured to output arbitrary Windows performance counters. Check_MK currently only extracts disk throughput, CPU usage and MS Exchange queues. Further checks can be implemented without any changes to the agent.

6. Configuring Checks

Most of the items to be checked are found by the inventory function. If you want to autodetect processes and services as well, some configuration is needed.

6.1. Processes

The output of the Windows agent is compatible with that of the Linux and UNIX agents with respect to the processes. Please refer to "How to monitor processes".

6.2. Services

In order to monitor services you need first to determine which services are of interest to you. The easiest way is to look at the raw output of the agent and look for the section <<<services>>>. You can use cmk -d for this:

user@host:~$  cmk -d winhostxy | fgrep -A 10 '<<<services>>>'
Alerter            stopped  Warndienst
ALG                running  Gatewaydienst auf Anwendungsebene
AppMgmt            stopped  Anwendungsverwaltung
AudioSrv           running  Windows Audio
BITS               running  Intelligenter Hintergrundübertragungsdienst
Browser            running  Computerbrowser
Check_MK_Agent     stopped  Check_MK_Agent
cisvc              stopped  Indexdienst
ClipSrv            stopped  Ablagemappe
COMSysApp          stopped  COM+-Systemanwendung

The first column of the output is the exact internal name of the service. Let's say you want to check if ALG is running on host winhostxy. Then put the following line into your checks variable:
checks = [
 ( 'winhostxy', 'services', 'ALG', None ),
 # some other checks...
]

If you have a larger number of Windows hosts, defining which services you expect on each host is tedious and error-prone work. Check_MK helps you by providing an inventory mechanism for services. All you have to do is provide a list of relevant services. This list is global and needs to be defined only once in the variable inventory_services.

During the next inventory of windows hosts, check_mk scans for these services and automatically creates a check for each one found running.

Let's assume that the services TSMListener, Httpd and TapiSrv should always be monitored if found running on a machine. All you have to do is add the following to your configuration:
inventory_services = ['TSMListener', 'Httpd', 'TapiSrv' ]

At the next inventory, all hosts where those services are running will be detected and the checks will be created automatically.
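
The effect of inventory_services can be pictured with a small sketch. It is not Check_MK's implementation: the helper names parse_services and inventory_checks are made up for this illustration, and the parsing assumes the three-column layout shown in the agent output above (internal name, state, display name).

```python
def parse_services(section_lines):
    """Map the internal service name to its state (running/stopped)."""
    services = {}
    for line in section_lines:
        parts = line.split(None, 2)  # name, state, display name
        if len(parts) < 2:
            continue
        services[parts[0]] = parts[1]
    return services


def inventory_checks(hostname, section_lines, inventory_services):
    """Create one check entry per listed service that is found running."""
    services = parse_services(section_lines)
    return [
        (hostname, "services", name, None)
        for name in inventory_services
        if services.get(name) == "running"
    ]


sample = [
    "Alerter            stopped  Warndienst",
    "ALG                running  Gatewaydienst auf Anwendungsebene",
    "AudioSrv           running  Windows Audio",
]
print(inventory_checks("winhostxy", sample, ["ALG", "Alerter", "TapiSrv"]))
```

Only ALG yields a check entry here: Alerter is stopped and TapiSrv does not appear in the sample output.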

If you need more flexibility, such as:

  • regular expression matching
  • taking the start type of a service into account (auto, demand, ...)
  • using host tags in order to make rules for only a certain group of hosts

then please refer to the check manual of the services check for details.

7. Eventlog Monitoring

The Windows agent sends output that is fully compatible with that of the Logwatch extension of the Linux/UNIX agent and is thus handled in the same way. For sake of simplicity there are nevertheless some differences:

  • The agent does not save its state persistently to disk.
  • The agent does not filter messages, but uses the fact that Windows classifies messages into Informational, Warning and Error. The agent sends all non-informational messages.

What does this mean in detail? When the agent is started (most probably at boot time of the host) it will try to seek to the current end of the Eventlogs and wait there for new records. Only records appearing while the agent is running will be sent to Nagios. If the agent is stopped and started again, it theoretically could miss some messages. As the agent is running permanently this should not be a practical problem, though.

Since the agent is completely configuration-less it does no specific filtering of events. It simply looks for messages of type Warning or Error. If such a message is seen, the complete check interval will be declared as relevant. The agent sends all messages of the logfile that appeared since the previous check to Nagios - even those of type Information. This allows the administrator to have more context information about the problem at hand on the Nagios server.

If you want to suppress some messages or reclassify them from Warning to Critical or vice versa, you can define a message filter in your Check_MK configuration. This is done by setting the variable logwatch_patterns, which is a Python dictionary with a key for each logfile. The value is a list of pairs:
logwatch_patterns = {
    'System': [
        ( 'W', 'sshd' ),
        ( 'W', 'rebooting.*system' ),
        ( 'C', 'path link down' ),
        ( 'I', 'ORA-4711' ),
    ],
    'Application': [
        ( 'W', 'crash.exe' ),
        ( 'C', 'ssh' ),
        ( 'I', 'test.*failed' ),
    ],
}

All patterns for a logfile are executed from first to last. The first match wins. The entry ( 'W', 'sshd' ) reclassifies all messages containing sshd to Warning. There are three possible types:

  • C: reclassify as CRITICAL
  • W: reclassify as WARNING
  • I: ignore these messages

Note that the patterns are regular expressions. Thus the entry ( 'I', 'test.*failed' ) ignores all messages containing the word test followed later by the word failed.

Messages that do not match any pattern retain their classification from the agent. Messages that are classified as context messages by the agent are never reclassified.
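
The matching rules described above (first match wins, unmatched messages keep their level, context messages are never touched) can be sketched as follows. This is an illustration only and not the code Check_MK uses. The function name reclassify and the is_context flag are assumptions, and re.search is used because the text speaks of messages "containing" a pattern.

```python
import re

SYSTEM_PATTERNS = [
    ("W", "sshd"),
    ("W", "rebooting.*system"),
    ("C", "path link down"),
    ("I", "ORA-4711"),
]


def reclassify(agent_level, message, patterns, is_context=False):
    """Return the level of a message after applying the pattern list."""
    if is_context:
        return agent_level  # context messages are never reclassified
    for level, pattern in patterns:
        if re.search(pattern, message):
            return level  # the first matching pattern wins
    return agent_level  # no match: keep the agent's classification


print(reclassify("C", "sshd: rebooting the system", SYSTEM_PATTERNS))
print(reclassify("W", "path link down on port 3", SYSTEM_PATTERNS))
```

The first message matches both sshd and rebooting.*system, but only the earlier entry is applied.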

7.1. Host specific filtering of messages

As of version 1.0.37 of check_mk, host specific message filtering is supported. This means that you can have your reclassification in logwatch_patterns depend on the host where the message has been found.

Host specific patterns include a host list, or a host tag list and a host list, as the first elements of the entry. This works much like many other configuration variables. Please read more about host tags for details on that. The following example makes some of the patterns of the example above host specific:
logwatch_patterns = {
    'System': [

        # reclassify only on host abc123
        ( ["abc123"], 'W', 'sshd' ),

        # the following holds for all hosts
        ( 'C', 'path link down' ),

        # reclassify message to "ignore" on all hosts with the tag "test"
        ( ["test"], ALL_HOSTS, 'I', 'ORA-4711' ),
    ],

    'Application': [
        # Do not reclassify on host "testhost"
        ( ["!testhost"], 'W', 'crash.exe' ),

        # make ssh critical on "dmz" hosts that do not have the tag "test"
        ( ["dmz", "!test"], ALL_HOSTS, 'C', 'ssh' ),

        # this is for all hosts again
        ( 'I', 'test.*failed' ),
    ],
}

7.2. Advanced agent configuration

The eventlog monitoring of the Windows agent can be configured to your needs. For each eventlog you can decide which messages should be sent to Check_MK. The default is that all eventlogs are processed and messages of the types warning or critical (or security failures) are being sent.

If you create a file called check_mk.ini in the agent directory, you can configure which eventlogs and which levels are processed. It is also possible to suppress the context messages for Windows eventlogs with the option nocontext. Here is an example:

    # From the Application log send only critical messages
    logfile application = crit

    # From the system log only send warning/critical messages,
    # but suppress any context messages
    logfile system = warn nocontext

    # Do not process other event logs at all
    logfile * = off

Note: When a logfile is set to all, informational messages are sent as well. As long as you do not reclassify them via logwatch_patterns, those messages will nevertheless not trigger any alarm.

Setting a logfile to off will disable processing of that eventlog. Reading application specific eventlogs can have an impact on the stability of the agent if the application has bugs in their eventlog implementation.

8. Logfile monitoring

Besides the Windows event logs you can also monitor plain text log files. These need to be configured in check_mk.ini, in the section logfiles. Here is an example:

    textfile = C:\temp backup\Testfile*.txt | D:\tmp\info.log
    warn = hostA? WARN*
    crit = *invalid line*
    ignore = User * logged in
    ok = *Backup created*
    crit = *emergency*

    textfile = C:\Windows\logins.txt
    warn = *Access error*
    crit = *Password error*
    ignore = *successfully logged in

The textfile field determines the files to be monitored. Multiple files in one line are separated by |. File names may contain the glob characters * and ?.
The textfile line is followed by the corresponding state patterns. As in the Unix version, the available states are warn, crit, ok and ignore.
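
How such state patterns might be applied to a single log line can be sketched with Python's fnmatch module. Several points here are assumptions and are not stated in this article: that the first matching pattern decides, that matching is case sensitive, and that a pattern must match the whole line. Note also that fnmatch gives square brackets a special meaning, which the agent may not do.

```python
from fnmatch import fnmatchcase

# The state patterns of the first textfile block in the example above
RULES = [
    ("warn", "hostA? WARN*"),
    ("crit", "*invalid line*"),
    ("ignore", "User * logged in"),
    ("ok", "*Backup created*"),
    ("crit", "*emergency*"),
]


def classify_line(line, rules):
    """Return the state of the first matching pattern, or None."""
    for state, pattern in rules:
        if fnmatchcase(line, pattern):
            return state
    return None


print(classify_line("hostA1 WARNING disk almost full", RULES))
print(classify_line("User bob logged in", RULES))
```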

Werk #1103

windows agent: now able to omit context text of logfiles

9. Performance Counters, monitoring MS Exchange

Several Windows checks are based on Performance Counters. These are special objects provided by the Windows operating system that contain information about throughput, queue lengths, latencies and other numbers of the system and applications like MS Exchange.

Performance counters are grouped into Counter Objects. Within the operating system each object has a unique ID. Unfortunately IDs for applications (like MS Exchange) are not fixed but vary from server to server. In the registry there is a translation between those numbers and names, but the names are in the local installation language and thus not portable either. This is very sad. And it makes some configuration necessary if you want to make use of all of the agent's features.

One good thing is - however - that some basic counter objects seem to have fixed IDs. This is at least the case for the counters needed for monitoring the CPU utilization and the disk throughput.

Check_MK ships ready-to-use checks for

  • CPU utilization
  • Disk throughput per physical disk
  • MS Exchange mail queues

If you want to make use of the MS Exchange checks, you first have to determine the ID of the counter object MSExchangeTransport Queues. In order to find that, first open a command box (DOS box) and dump the complete counter information into the file counters.ini:

C:\> lodctr /s:counters.ini

Now you can view this file with Notepad or use find on Windows - which does essentially the same as grep on Linux - and watch for MSExchangeTransport Queues:

C:\> find "MSExchangeTransport Queues" counters.ini
---------- COUNTERS.INI
[PERF_MSExchangeTransport Queues]
10332=MSExchangeTransport Queues

If you prefer analysing the file under Linux, you first need to convert it to UTF-8 (it is UTF-16 little-endian encoded!). This can be done on Linux with:

user@host:~$ recode UTF-16LE..UTF-8 counters.ini
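
As an alternative to converting the file, the lookup can be done with a few lines of Python that read the UTF-16 little-endian file directly. This is a sketch only: the function name find_counter_id is made up, and the code assumes that the object appears on a line of the form ID=name, as in the find output above.

```python
def find_counter_id(path, object_name):
    """Return the numeric ID of a counter object, or None if not found."""
    with open(path, encoding="utf-16-le") as handle:
        for line in handle:
            # Drop a byte order mark (if any) and surrounding whitespace
            line = line.lstrip("\ufeff").strip()
            number, separator, name = line.partition("=")
            if separator and name == object_name and number.isdigit():
                return int(number)
    return None


# Usage:
#   find_counter_id("counters.ini", "MSExchangeTransport Queues")
```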

In this example the ID of the counter object is 10332. The ID can now be configured in check_mk.ini. Create this file in the same directory where check_mk_agent.exe is installed, with the following content:

    counters = 10332:msx_queues

Now restart the agent. When you retrieve the agent output from your monitoring host, you should now get an additional section <<<winperf_msx_queues>>> with a content similar to this one:

12947263852.75 10332
1 instances: _total
2 0 rawcount
4 0 rawcount
6 0 rawcount
8 0 rawcount
10 0 rawcount
12 0 rawcount
14 0 rawcount
16 0 rawcount
18 895 rawcount
20 895 counter
22 0 rawcount
24 0 counter
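
To make the layout of this section easier to follow, here is a sketch of a parser for it. The interpretation is inferred from the sample output alone and is therefore an assumption: the first line holds a timestamp and the object ID, the second line names the instances, and each further line holds a counter ID, one value per instance and the counter type. The function name parse_winperf is made up for this illustration.

```python
def parse_winperf(lines):
    """Parse a winperf section into a dictionary (assumed layout)."""
    timestamp, object_id = lines[0].split()
    result = {
        "timestamp": float(timestamp),
        "object_id": int(object_id),
        "instances": [],
        "counters": {},
    }
    for line in lines[1:]:
        parts = line.split()
        if len(parts) >= 2 and parts[1] == "instances:":
            result["instances"] = parts[2:]
        elif len(parts) >= 3:
            values = [int(value) for value in parts[1:-1]]
            result["counters"][int(parts[0])] = (values, parts[-1])
    return result


SAMPLE = [
    "12947263852.75 10332",
    "1 instances: _total",
    "2 0 rawcount",
    "18 895 rawcount",
    "20 895 counter",
]
section = parse_winperf(SAMPLE)
print(section["instances"], section["counters"][18])
```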

Now when you inventorize the host with cmk -I, new checks for several MS Exchange mail queues should appear. For details please consult the man page of winperf_msx_queues.

Werk #1093

windows agent: performance counter can now be specified by name

10. Extending the Windows agent

10.1. Plugins and local checks

From version 1.1.7i3 on, the Check_MK agent for Windows can be extended with local checks and plugins, just like the Unix agents. Local checks are (usually simple) scripts or programs that perform self-written checks and compute the results directly on the target machines.

Plugins are scripts or programs that output agent sections similar to those built into the agent. Several such scripts are shipped together with the agent and are found in the subdirectory plugins.

In order to use such plugins you need to:

  • Create a directory plugins in the same directory where you have put check_mk_agent.exe (on your windows hosts)
  • copy the plugins in question into that directory
  • make sure the system account the agent is running under has appropriate permissions for running the checks

One example of such a plugin is wmicchecks.bat, which uses wmic in order to list processes with their resource consumption:

@echo off
echo ^<^<^<wmic_process:sep^(44^)^>^>^>
wmic process get name,pagefileusage,virtualsize,workingsetsize,usermodetime,kernelmodetime /format:csv

In order to make use of this agent information your installation of Check_MK needs a check which can process that data. The checks needed for the shipped plugins are part of Check_MK. A tutorial for writing your own checks is available in the Check_MK documentation.

WARNING: Windows' concept of launching other programs as subprocesses is sometimes hard to grasp for people used to Unix-like operating systems. So please make sure that there are no files in your local or plugins directory that are not executable or that open a window when being executed. The latter happens in particular when a text file is found: Windows will then open notepad.exe as a subprocess of the agent on the Windows console. As long as notepad is running, the agent hangs and cannot even be killed or restarted. To resolve such a situation without a reboot:

  1. Remove the critical file(s) from the local and plugins directory.
  2. Go to the windows console. This might involve being physically present at the server.
  3. Close the notepad or whatever window had been opened.

Note: The agent does not execute all files in local and plugins, but first checks the file name extension. By default all files except those with the suffix txt or dir are executed. If you want, you can specify an explicit list of extensions to execute. This is done in the optional file check_mk.ini in the [global] section:

    # Execute only files with the following extensions
    execute = bat exe vbs ps1

Note: these settings are not being honored by MRPE (see below).

10.2. MRPE

Check_MK's Windows agent also supports MRPE, MK's remote plugins executor. MRPE allows you to run classical monitoring plugins locally on a Windows host and gives you access to hundreds of ready-to-use checks that float around on the Internet.

MRPE on Windows does not use a configuration file of its own but uses check_mk.ini. Put the plugins you want to execute into the section [mrpe] using the following format:

    # Run classical Nagios plugins. The word before the command
    # line is the service description for Nagios. Use backslashes
    # in Windows-paths.
    check = Dummy mrpe\check_crit
    check = IP_Configuration mrpe\check_ipconfig
    check = Whatever c:\myplugins\check_whatever -w 10 -c 20

The check_crit dummy is shipped with Check_MK and can be used for your tests. It simply fails with a Nagios state of CRITICAL and displays one line of text.

Please note that just like in the Unix version of MRPE, the service description must not contain spaces.

Werk #0266

windows_agent: now supports MRPE include files

11. Timeouts, caching and parallel processing of local and plugin scripts (since 1.2.3i2)

Since version 1.2.3i2 it is possible to uncouple the local and plugins script execution from the main process. Previous versions had problems whenever the executed scripts took very long or never returned due to programming errors. This caused the agent to hang indefinitely until the responsible script was terminated.

11.1. Basic principle

Starting with version 1.2.3i2 scripts are now executed in separate threads. The job of these threads is to collect the script output and make it available to the main thread. Whenever the agent is queried by the monitoring server, it can now instantly access the cached data from the worker threads. This results in a greatly improved agent response, since the agent no longer has to wait for any scripts to finish.
However, there is a small downside: The data is determined in advance, so the information is slightly outdated - with a maximum of one check cycle. You can still choose not to enable this feature for certain scripts and force the agent to wait for them to finish - just like in previous versions.

Moving the local and plugin checks into different threads opens up plenty of configuration options.

11.2. New local and plugin configuration options

Within the [local] and [plugins] sections you can now set these new configuration parameters:

  • execution (since 1.2.3i7)
    This parameter specifies when a script is executed: either while the agent is answering a request (sync) or afterwards (async), once the agent data has been sent. Data from scripts running in async mode becomes available on the next agent call. The default setting is sync.
  • timeout
    This parameter specifies how many seconds a script is allowed to run before its process is terminated. In this case the script delivers no output at all, which leads to a missing section in the agent output. The default value is 60.
  • retry_count
    This parameter allows a script to fail multiple times (e.g. due to a timeout) before its cached data is discarded and no script output is returned. The default value is 0.
    Generally a local or plugin script should be able to finish in a reasonable amount of time. If it occasionally hangs for some unknown reason it might be a good idea to increase the retry_count for this script.
  • cache_age
    This parameter indicates that the cached data can be reused for the specified number of seconds. As long as the script output is recent enough, there is no need to start the script again. The default value is 0.
    This is especially useful for scripts which cause high CPU load on execution or simply do not provide meaningful new information within such short query intervals. For those scripts you could set the cache_age to 3600, so that the script's cached data is only updated once an hour.

The following example shows some basic configuration for local and plugin scripts.
Each line in the [local] and [plugins] section has this syntax:

{type} {script pattern} = {value}

Like in the other configuration sections you may use glob expressions for the script patterns.

[local]
    # Default timeout is 60 seconds
    # Default cache age is 0 seconds (no caching)
    timeout mem_info.bat = 10
    timeout * = 30

[plugins]
    # Default timeout is 60 seconds
    # Default cache age is 0 seconds (no caching)
    timeout wmic_if.bat = 5

    # The windows updates plugin is executed in async mode,
    # has a timeout of 120 seconds
    # and is only updated every 3600 seconds
    execution windows_updates.vbs = async
    timeout windows_updates.vbs = 120
    cache_age windows_updates.vbs = 3600

    # All other scripts have a cache age of 60 seconds,
    # their data is discarded if the script fails 3 times
    # and they have a timeout of 10 seconds
    cache_age * = 60
    retry_count *.bat = 3
    timeout * = 10
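
The interplay of timeout, cache_age and retry_count can be illustrated with a small model. It is not the agent's code (the agent is not written in Python), and the class name CachedScript is made up. The exact point at which retry_count discards the cache is an assumption here: the cached data is dropped once the number of consecutive failures reaches retry_count, and immediately if retry_count is 0.

```python
import subprocess
import sys
import time


class CachedScript:
    """Model of one script with timeout, cache_age and retry_count."""

    def __init__(self, command, timeout=60, cache_age=0, retry_count=0):
        self.command = command
        self.timeout = timeout
        self.cache_age = cache_age
        self.retry_count = retry_count
        self._output = None
        self._fetched_at = 0.0
        self._failures = 0

    def get_output(self, now=None):
        now = time.time() if now is None else now
        # Reuse the cached output while it is younger than cache_age
        if self._output is not None and now - self._fetched_at < self.cache_age:
            return self._output
        try:
            done = subprocess.run(self.command, capture_output=True,
                                  text=True, timeout=self.timeout)
        except (subprocess.TimeoutExpired, OSError):
            self._failures += 1
            if self._failures >= self.retry_count:
                self._output = None  # discard the cache: section is missing
            return self._output
        self._failures = 0
        self._output = done.stdout
        self._fetched_at = now
        return self._output


script = CachedScript([sys.executable, "-c", "print('hello')"],
                      timeout=30, cache_age=3600)
print(script.get_output())
```

With cache_age = 3600 a second call within the hour returns the stored output without starting the script again.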

11.3. Local and plugin data collection settings

As mentioned before, the output of the scripts is now stored into caches which are read by the main process. There are two distinct ways of collecting these data, each having their pros and cons.

  • Parallel script execution
    With parallel script execution all local and plugin scripts are executed at the same time.
    Pro: The overall execution time is shorter when several scripts are used.
    Con: Might cause high CPU load.

  • Queued script execution
    With queued script execution all local and plugin scripts are executed one after another.
    Pro: Lower CPU load.
    Con: If one script hangs, all other scripts are delayed. To mitigate this, you can specify a maximum timeout for those scripts.

Since 1.2.3i7 the option async_script_execution is available for scripts which are configured to run in async mode. It determines whether these scripts are executed sequentially or in parallel.

    async_script_execution = sequential

The following values are available:

  • sequential
    The async scripts are executed one by one.
  • parallel
    The async scripts are all executed at the same time.

Any output of scripts configured with async execution is delayed by one check cycle, usually one minute.
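
The difference between the two values can be sketched in a few lines. The example uses plain Python callables as stand-ins for scripts. The names run_sequential, run_parallel and make_job are made up for this illustration and do not reflect how the agent is implemented.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_sequential(jobs):
    """Run the jobs one by one: low load, but a slow job delays the rest."""
    return {name: job() for name, job in jobs.items()}


def run_parallel(jobs):
    """Run all jobs at the same time: faster overall, higher peak load."""
    with ThreadPoolExecutor(max_workers=len(jobs) or 1) as pool:
        futures = {name: pool.submit(job) for name, job in jobs.items()}
        return {name: future.result() for name, future in futures.items()}


def make_job(name, seconds):
    def job():
        time.sleep(seconds)  # stands in for the runtime of a real script
        return "<<<%s>>>" % name
    return job


jobs = {"first": make_job("first", 0.1), "second": make_job("second", 0.1)}
print(run_sequential(jobs) == run_parallel(jobs))
```

Both variants deliver the same data; they differ only in how long the whole run takes and how much load it causes at once.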

12. General agent configuration

12.1. Restricting Access

As of version 1.1.9i1, the Check_MK agent for Windows allows for access restriction based on IP addresses - just like xinetd for Linux. If you want to make use of this feature you need to create a configuration file check_mk.ini in the same directory as the agent.

The restriction is configured by the variable only_from in the section [global]:

    only_from =

You may add up to 32 IP addresses or networks in slash-notation. If you do not configure only_from or leave it empty, all clients are allowed.

Make sure to restart the agent after any change of the configuration file:

C:\some\directory\> net stop check_mk_agent
C:\some\directory\> net start check_mk_agent

Please note that the agent cannot inhibit the initial TCP connect, because it cannot check the IP address of the remote site before the connection is accepted. If a disallowed remote host has connected, the agent immediately closes the connection. The client thus sees a successful initial connection followed by a 'connection closed by foreign host' without reading any data. This is exactly the way xinetd behaves.
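
The check the agent performs after accepting a connection can be modelled with Python's ipaddress module. This is a sketch under assumptions: the function name is_allowed is made up, and the entries are assumed to be separated by whitespace, like the other list options in check_mk.ini. The addresses used are placeholders.

```python
import ipaddress


def is_allowed(client_ip, only_from):
    """Return True if the client may talk to the agent."""
    entries = only_from.split()
    if not entries:
        return True  # only_from empty or not set: all clients allowed
    address = ipaddress.ip_address(client_ip)
    return any(
        address in ipaddress.ip_network(entry, strict=False)
        for entry in entries
    )


print(is_allowed("10.1.2.3", "127.0.0.1 10.0.0.0/8"))
print(is_allowed("192.168.1.5", "127.0.0.1 10.0.0.0/8"))
```

A single address such as 127.0.0.1 is treated as a network containing exactly that one host.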

You can also define matching scopes in the "Windows Firewall with Advanced Security" or in IP Security rules.

12.2. Sections to execute

For the purpose of finding problems or speeding up the agent you can configure which sections should be executed. This is accomplished in the [global] section with the configuration option sections. Example:

    sections = check_mk uptime ps df mem

In order for Check_MK to get general information about the agent, you should always enable the section check_mk.

12.3. Host specific configuration

Depending on how you manage your windows servers you might need host-specific configuration settings in check_mk.ini. Check_MK offers a method for keeping settings for different hosts in a single check_mk.ini that can be distributed to all of your hosts.

Let's look at the following example:

    host = zmsex?? zexchange*
    counters = 10332:msx_queues

The directive counters = 10332:msx_queues will only be applied on hosts whose name begins with zexchange or consists of zmsex followed by two further characters. The wildcards * and ? can be used as for filenames. Any number of patterns can be added after host =.

This restriction holds until the end of the section:

    # Counters for some selected hosts
    host = zmsex0? zexchange*
    counters = 10332:msx_queues

    # Counters for some other hosts
    host = zmsex1?
    counters = 10320:msx_queues

    # This will be executed for all hosts:
    logfile * = off

You also can switch off the restriction by setting a filter for *:

    # Logfiles for hosts ending in prod:
    host = *prod
    logfile security = warn

    # This holds for all hosts:
    host = *
    logfile application = off

Note: hostname matching is case insensitive.
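
The matching of host names against a host = line can be reproduced with Python's fnmatch module. This is only a sketch of the rules described above (wildcards * and ?, several patterns per line, case-insensitive matching). The function name host_matches is made up for this illustration.

```python
from fnmatch import fnmatchcase


def host_matches(hostname, host_patterns):
    """Return True if the host name matches one of the patterns."""
    name = hostname.lower()  # matching is case insensitive
    return any(
        fnmatchcase(name, pattern.lower())
        for pattern in host_patterns.split()
    )


print(host_matches("ZMSEX01", "zmsex?? zexchange*"))
print(host_matches("fileserver", "zmsex?? zexchange*"))
```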

12.4. Crash-Debugging

Some users have reported that the Check_MK agent crashes from time to time on a few Windows servers. The global option crash_debug helps in such a case. Add it to your [global] section and set it to yes.

    crash_debug = yes

This will enable error logging during the computation of the agent sections:

  • Before a new client connection is accepted a file connection.log will be created. After the connection is closed this file will be renamed to success.log.
  • If the agent starts and sees a connection.log lying around it assumes that it previously has crashed. It will rename this file to crash.log. Before, it will rename any existing crash.log to crash-1.log, that one to crash-2.log and so on up to crash-9.log.
  • A new subsection of <<<logwatch>>> will be created simulating a new event log file
  • In case of a previous crash the contents of the crash.log will be visible in this logfile and thus be transported to your monitoring server where you can see the famous last words of the agent.
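
The renaming scheme performed at agent start can be written down as a short sketch. It models only the file handling described in the list above and is not the agent's code. The function name rotate_crash_logs is made up, and overwriting an existing crash-9.log is an assumption about what happens to the oldest file.

```python
import os


def rotate_crash_logs(directory, keep=9):
    """Turn a leftover connection.log into crash.log, shifting old logs."""

    def path(name):
        return os.path.join(directory, name)

    if not os.path.exists(path("connection.log")):
        return False  # the previous run ended cleanly, nothing to do
    # Shift crash-8.log to crash-9.log first, then crash-7.log, and so on
    for index in range(keep - 1, 0, -1):
        source = path("crash-%d.log" % index)
        if os.path.exists(source):
            os.replace(source, path("crash-%d.log" % (index + 1)))
    if os.path.exists(path("crash.log")):
        os.replace(path("crash.log"), path("crash-1.log"))
    os.replace(path("connection.log"), path("crash.log"))
    return True
```

The files are shifted from the oldest to the newest so that no file is overwritten before it has been moved.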

Do the following steps when trying to investigate an agent crash:

  1. Set crash_debug = yes in check_mk.ini
  2. Restart the agent
  3. Do an inventory of that host with Check_MK. This will create a new service LOG Check_MK Agent.
  4. Restart Check_MK and your monitoring server in order to make the check active
  5. Wait for the next crash