The Check_MK Micro Core now has an alternative implementation of the Livestatus table statehist. This table is the basis for all availability computations. In the current implementation, which is still the only one available when using the Nagios core, every query has to evaluate all historic logfiles that cover the query range. Despite caching, this can mean an intense effort in CPU and IO usage. If you have a large number of hosts and services, a query over a long time frame can take minutes.
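For orientation, an availability query against statehist could be sent roughly like this. This is a minimal sketch, not the code used by Check_MK itself: the helper names build_statehist_query and send_query are made up for illustration, the selected columns are just one plausible choice, and the path of the Livestatus UNIX socket depends on your installation and has to be passed in by the caller.

```python
import socket


def build_statehist_query(host_name, from_time, until_time):
    """Build the text of a Livestatus query for the statehist table.

    from_time and until_time are UNIX timestamps that limit the
    evaluated time range.
    """
    lines = [
        "GET statehist",
        "Columns: host_name service_description state duration duration_part",
        f"Filter: host_name = {host_name}",
        f"Filter: time >= {int(from_time)}",
        f"Filter: time < {int(until_time)}",
        "OutputFormat: json",
    ]
    # A Livestatus query is terminated by an empty line.
    return "\n".join(lines) + "\n\n"


def send_query(query, socket_path):
    """Send the query to a Livestatus UNIX socket and return the raw reply.

    socket_path is an assumption of this sketch; use the path of the
    Livestatus socket of your own site.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(query.encode("utf-8"))
        # Signal the end of the request, then read until the peer closes.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8")
```

The two time filters define the query range. With the classical implementation this range decides which logfiles have to be read for the query.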
The new implementation needs to be enabled in the global settings for the Check_MK Micro Core: In-memory cache for availability data (experimental). You also have to configure a time range. This limits how far into the past you can do availability queries. The default setting is two years.
During the start of The Core all historic log files for that time range are parsed into a very efficient in-memory database so that future availability queries do not need any disk IO or logfile parsing. The cache is automatically updated when new alerts happen. Please also note that The Core is not restarted during normal operation and activation of changes, so the cache is only invalidated when you reboot your server or do a software update of Check_MK.
The parser can process 500,000 messages per second and more, so if your disk IO is fast enough, even parsing a large history does not take longer than a couple of minutes. This is done in the background and does not prevent The Core from working or queries from being answered. Even availability queries are answered while the cache is still being built up. If the queried time range is already in the cache, the query can be processed immediately. Otherwise it waits for the cache to be ready.
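The waiting behaviour described above can be pictured with a small toy model. It is only a sketch under assumptions and says nothing about how The Core is implemented internally: the class HistoryCache and its methods are hypothetical, and the model assumes that the background loader reports the oldest timestamp it has parsed so far.

```python
import threading


class HistoryCache:
    """Toy model of a cache that is filled by a background loader."""

    def __init__(self):
        self._cond = threading.Condition()
        self._covered_from = None  # oldest timestamp loaded so far
        self._ready = False        # True once all log files are parsed

    def mark_loaded(self, oldest_timestamp, finished=False):
        """Called by the loader after it has parsed another log file."""
        with self._cond:
            self._covered_from = oldest_timestamp
            self._ready = finished
            # Wake up all queries that are waiting for more history.
            self._cond.notify_all()

    def _covers(self, from_time):
        if self._ready:
            return True
        return self._covered_from is not None and self._covered_from <= from_time

    def wait_until_covered(self, from_time, timeout=None):
        """Block until a query starting at from_time can be answered.

        Returns True if the range is covered, False if the timeout expired.
        """
        with self._cond:
            return self._cond.wait_for(lambda: self._covers(from_time), timeout)
```

A query whose start time is already covered returns at once. Any other query sleeps on the condition variable until the loader reports enough history or signals that it has finished.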
When it comes to timeperiod definitions, the new implementation behaves differently: It reflects later changes in the definitions of your timeperiods. This is convenient when you want to work with service periods for your availability queries. The classical implementation evaluates the TIMEPERIOD TRANSITION entries in your logfiles. The new one directly takes the current definitions into account and computes them for the time range in the past.
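To illustrate what applying the current definition to a past time range means, here is a small sketch. It is not the algorithm of The Core: the dictionary format of SERVICE_PERIOD and the function seconds_in_period are invented for this example, and the code uses naive datetimes, so time zones and daylight saving time are ignored.

```python
from datetime import datetime, timedelta

# Hypothetical current definition of a service period:
# weekday (0 = Monday) -> list of (start_hour, end_hour) windows.
SERVICE_PERIOD = {day: [(8, 17)] for day in range(5)}


def seconds_in_period(definition, range_start, range_end):
    """Return the seconds of [range_start, range_end) inside the timeperiod.

    The definition passed in is applied to the whole range, no matter
    what the timeperiod looked like back then.
    """
    total = 0.0
    # Walk through the range day by day, starting at midnight.
    day = range_start.replace(hour=0, minute=0, second=0, microsecond=0)
    while day < range_end:
        for start_hour, end_hour in definition.get(day.weekday(), []):
            # Clip each daily window to the queried range.
            window_start = max(day + timedelta(hours=start_hour), range_start)
            window_end = min(day + timedelta(hours=end_hour), range_end)
            if window_end > window_start:
                total += (window_end - window_start).total_seconds()
        day += timedelta(days=1)
    return total
```

If you later change the definition, for example to extend the daily window, the same past range yields a different result. This is the behaviour described above, in contrast to replaying the logged TIMEPERIOD TRANSITION entries.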
Note: As of today this implementation is still highly experimental and might not only produce wrong results, but might also crash your core.