Monitoring the Connectivity to Remote Sites
Required version: 1.1.5i3
May 25, 2010
The Problem
When using a multisite setup, sooner or later you will run into a situation where one of your remote sites is not responding. This might be due to a network outage, the remote monitoring host being down, a firewall misconfiguration or something else. Multisite handles such situations as well as it can and marks the affected remote site as down. The problem is that this can lead to rather annoying timeouts: as long as you do not manually switch off the connection to the remote site, you will get a timeout each time you load a page.

The Solution
As of version 1.1.5i3 this problem can be solved with a new site configuration variable: status_host. The idea is to add your remote monitoring hosts as hosts to Nagios in your local installation and to take their current host state into account. If that state is not UP, the connection is marked as down or unreachable and Multisite does not try to establish a TCP connection (and thus does not run into a timeout).

The status_host entry has to be added to a site's definition in sites and must consist of a pair of a site name and a host name. The site name must be the name of a (usually the) local site. The host name must be the name of a host monitored by that site and represents the remote monitoring host. Consider the following example in multisite.mk:
sites = {
    # connect to local Nagios
    "nag01": {
        "alias": "Nagios 1",
    },
    # connect to remote site
    "nag02": {
        "alias": "Nagios 2",
        "socket": "tcp:192.168.56.7:6557",
        "nagios_url": "/nag02/nagios",
        "nagios_cgi_url": "/nag02/nagios/cgi-bin",
        "pnp_prefix": "/nag02/pnp4nagios/graph",
        "status_host": ("nag01", "nagios2"),
    },
}
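The rules for status_host stated above (a pair of a site name and a host name, where the site name refers to a site that is itself defined in sites) can be checked mechanically. The following is a minimal Python sketch under that reading; check_status_hosts is a hypothetical helper for illustration and not part of Multisite. Accepting a list as well as a tuple is an assumption.

```python
def check_status_hosts(sites):
    """Return a list of problems found in the status_host entries of a
    Multisite-style sites dictionary (hypothetical helper, not Multisite code)."""
    problems = []
    for name, conf in sites.items():
        if "status_host" not in conf:
            continue  # no status host configured for this site: nothing to check
        entry = conf["status_host"]
        # Assumption: a two-element tuple (as in the example) or list counts as a pair
        if not (isinstance(entry, (tuple, list)) and len(entry) == 2):
            problems.append("%s: status_host must be a (sitename, hostname) pair" % name)
            continue
        status_site, host = entry
        # The site name must refer to a site defined in the same sites dictionary
        if status_site not in sites:
            problems.append("%s: status site %r is not defined in sites" % (name, status_site))
        # The host name must name a host monitored by that site
        if not isinstance(host, str) or not host:
            problems.append("%s: hostname must be a non-empty string" % name)
    return problems


# Usage with the example configuration shown above
example_sites = {
    "nag01": {"alias": "Nagios 1"},
    "nag02": {"alias": "Nagios 2", "status_host": ("nag01", "nagios2")},
}
print(check_status_hosts(example_sites))  # prints []
```

Note that such a check cannot verify that nagios2 is actually monitored by nag01; that requires asking the site itself.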
If the local Nagios instance nag01 has a host nagios2 and that host is either down or unreachable, the remote site nag02 will no longer be contacted.

Disabling the local site
One warning: when the user switches off the local site, Multisite no longer has access to the status hosts of the remote sites. In that case it considers the connection possible, and you might again run into timeouts if one of those sites is not responding.
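The behaviour described in the two sections above can be summarised as a small decision function. This is a minimal Python sketch, not Multisite's actual code: connection_state, its parameters and the returned strings are hypothetical, and host states use the usual Nagios numbering (0 = UP, 1 = DOWN, 2 = UNREACHABLE). Treating a status host with unknown state as reachable is an assumption.

```python
# Nagios host states (0 = UP, 1 = DOWN, 2 = UNREACHABLE)
HOST_UP, HOST_DOWN, HOST_UNREACHABLE = 0, 1, 2


def connection_state(site_name, sites, disabled, host_states):
    """Decide how a site should be treated (hypothetical sketch).

    sites       -- the sites dictionary from multisite.mk
    disabled    -- set of site names the user has switched off
    host_states -- {(status_site, hostname): state} as reported by the status sites
    Returns "online" (try a TCP connection), "down" or "unreachable".
    """
    entry = sites[site_name].get("status_host")
    if entry is None:
        return "online"  # no status host configured: always try to connect
    status_site, host = entry
    if status_site in disabled:
        # The status site is switched off, so the host state cannot be read.
        # The connection is considered possible and may run into a timeout.
        return "online"
    state = host_states.get((status_site, host))
    if state is None or state == HOST_UP:
        return "online"  # Assumption: an unknown state is treated like UP
    # Any state other than UP: skip the TCP connection entirely
    return "down" if state == HOST_DOWN else "unreachable"


sites = {
    "nag01": {"alias": "Nagios 1"},
    "nag02": {"alias": "Nagios 2", "status_host": ("nag01", "nagios2")},
}
states = {("nag01", "nagios2"): HOST_DOWN}
print(connection_state("nag02", sites, set(), states))      # prints down
print(connection_state("nag02", sites, {"nag01"}, states))  # prints online
```

The second call shows the pitfall from the warning: once nag01 is switched off, the down state of nagios2 is no longer visible and nag02 is contacted again.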