In previous versions, monitoring sites could try to consume a large amount of system memory, e.g. because the monitoring configuration was too large for the available system memory. In this case the Linux kernel starts to kill "random" processes, which could also be mandatory system processes, for example the cluster resource manager of clustered appliances. Once such a critical process was killed, the appliance was left in an undefined state where only a reboot might help, and only until the next occurrence of the issue.
We have now added a reservation of memory for system processes. Here is the relevant extract from the current documentation:
The system memory of the device is available to your monitoring sites, reduced by the amount of memory which is needed by the system processes of the Check_MK Appliance.
To provide a stable system platform, a fixed amount of memory is reserved for the mandatory system processes. The exact amount of reserved memory depends on your device configuration:
If you want to know exactly how much memory is available to your monitoring sites and how much is currently used, you can monitor your device using Check_MK. After service discovery, the host automatically monitors a service User_Memory which shows you the current and historical values.
In case your monitoring instances try to consume more memory than is available, one of the processes of the monitoring sites is automatically killed. This is done by standard mechanisms of the Linux kernel.
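The relationship between total system memory, the reserved amount and the memory left for monitoring sites can be illustrated with a small calculation. The following is only a sketch under stated assumptions, not how the appliance implements the reservation: the function names, the sample input and the reserved value of 1 GiB are hypothetical and chosen purely for illustration, since the real reserved amount depends on the device configuration. The only external fact used is the format of /proc/meminfo, where MemTotal is reported in kB.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def site_memory_kb(meminfo_text, reserved_kb):
    """Memory left for monitoring sites: total minus the reserved amount."""
    total_kb = parse_meminfo(meminfo_text)["MemTotal"]
    return max(total_kb - reserved_kb, 0)


# Sample input instead of the real /proc/meminfo, so the sketch runs anywhere.
SAMPLE = """MemTotal:        8388608 kB
MemFree:         1048576 kB
"""

# Hypothetical value (1 GiB); the actual reservation is device specific.
HYPOTHETICAL_RESERVED_KB = 1024 * 1024

print(site_memory_kb(SAMPLE, HYPOTHETICAL_RESERVED_KB))
```

On a real system the text could be read from /proc/meminfo instead of the sample string. The User_Memory service remains the authoritative source for the values that apply to your device.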