***
Update: new version pushed to github. If you have questions, you may shoot me an email at jeremy at etherized dot com (instead of being forced to use my sad little comment system).
***
It's no surprise that puppet can be incredibly useful for managing your nagios config. One common approach is to use puppet's DSL to configure nagios checks using the included nagios_* types, which becomes very powerful when combined with exported/collected resources; this lets you export checks from each host you want to monitor and then collect them on the nagios server. In this manner, hosts automatically show up in nagios once their collection resource has been processed.
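For instance (a minimal, hypothetical sketch of that pattern, not something from my module), each monitored host might export a service check, and the nagios server then collects everything that was exported:
# exported from each monitored node (hypothetical SSH check)
@@nagios_service { "check_ssh_${::fqdn}":
  host_name           => $::fqdn,
  service_description => "SSH",
  check_command       => "check_ssh",
  use                 => "generic-service",
  target              => "/etc/nagios/conf.d/puppet_services.cfg",
}
# collected (realized) on the nagios server
Nagios_service <<| |>>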
I like this approach, but I've been sold on using check_mk along with nagios for a while now, so I need to come up with something a bit different.
For those who are unaware, check_mk consists of an agent (which runs on systems to be monitored by nagios) and a poller (which runs on the nagios server). The idea is that you never need to touch the nagios config files directly; the poller autodetects services and creates a valid nagios config all on its own. In addition, you get a fancy replacement web UI that doesn't look like a time warp from 1995.
In order to work, though, check_mk needs a list of nodes to be monitored and (optionally) a list of tags that describe those nodes. The tags are used by check_mk to apply configurations to resources; for example, you might tag a "production" server with the "production" tag, and configure check_mk to enable 24x7 paging on all services that are so tagged.
So, you can do all this manually, but puppet has all the information you need already. Here's the plan: create a puppet module that all to-be-monitored clients will include, have that module export a config snippet that describes each node, and then have puppet collect those snippets on the nagios server.
I've created such a module; you can find my puppet module that configures check_mk at my github, and an explanation follows below for the curious.
The checkmk class
The first bit is just boilerplate I use on all my modules, which allows them to be disabled with a variable. This is mainly to work around limitations of the dashboard node classifier: it's easy to apply a resource to a group of nodes, but it's not so easy to then exclude that resource from a particular node. For this reason I use a "wrapper" class instead of calling the checkmk::agent class directly, and rely on that ugly little magic variable to disable it as needed.
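The wrapper itself is trivial:
# convenience meta class that may be disabled with a variable; completely optional
class checkmk {
  if $checkmkmoduledisabled {
    # do nothing; the node has opted out via the classifier
  } else {
    include checkmk::agent
  }
}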
The checkmk::agent class
This is textbook basic puppet stuff, so I won't step through it all. The interesting bits are the crazy exported resources:
@@file { "$mk_confdir/$fqdn.mk":
content => template( "checkmk/collection.mk.erb"),
notify => Exec["checkmk_inventory_$fqdn"],
tag => "checkmk_conf",
}
@@exec { "checkmk_inventory_$fqdn":
command => "/usr/bin/check_mk -I $fqdn",
notify => Exec["checkmk_refresh"],
refreshonly => true,
tag => "checkmk_inventory",
}
The important thing to understand is that exported resources are created in the scope of the client but not realized on the client; they are then available to the server, which actually realizes them. In this case, each client calling checkmk::agent will have these resources defined in puppet's stored config backend, and the nagios server will later scoop them up and process them. Exported resources are cousins of virtual resources and the DSL syntax is similar; you simply precede the resource declaration with "@@".
You will notice that I'm creating both a file resource and an exec resource. In my initial version of this module I did not have per-node exec resources, and whenever a node changed I triggered a re-inventory of all nodes. This proved to be a bit excessive; per-node execs allow you to inventory only the nodes that actually changed.
The tricks in the template require a little explaining too. I like to be able to add check_mk tags from variables assigned in the node classifier, and this template takes those variables and turns them into valid check_mk configuration strings. The scope.to_hash.keys stuff uses reflection to identify any variables whose names contain the string "check_mk_tags", and their corresponding values are appended to the list of tags. This is, again, a workaround for limitations of the dashboard classifier: we want some tags to come from a higher scope, but we also want to append to the list, which forces us to use multiple variables.
So, for example, I might have check_mk_tags_webserver = webserver attached to my "webserver" group in the classifier, but also check_mk_tags_paging = "paging|critical" in my "critical nodes" group; I can then place a node in both groups, and this template will smush all of the tags in both variables together (note that you should delimit your tags with a pipe if you assign multiple tags to the same variable).
The other trick is to do a DNS lookup on the fqdn of the host, and if the DNS lookup fails, hard code facter's "ipaddress" as the IP of the host. This prevents check_mk from choking on hosts with broken DNS. In addition, it tags such hosts with "brokendns," and I like to assign a corresponding hostgroup in nagios so I may easily shame whoever created such badness into fixing his problem.
Oh, one last thing; there's nothing stopping you from using facts or any other puppet variable as a tag. Simply append your fact or variable to the 'mktags' array in the ERB template, and you're good to go!
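For reference, the tag-gathering portion of the template looks roughly like this (a simplified sketch rather than the exact file from github; the DNS-fallback logic described above is left out). The generated snippet just adds a line to check_mk's all_hosts list:
<%
  # gather tags from any in-scope variable whose name contains "check_mk_tags";
  # a single variable may hold several tags delimited by "|"
  mktags = []
  scope.to_hash.each do |k, v|
    next unless k.respond_to?('include?') && k.include?("check_mk_tags")
    v.to_s.split("|").each { |t| mktags << t }
  end
-%>
all_hosts += [ "<%= ([scope.lookupvar('fqdn')] + mktags).join('|') %>" ]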
The checkmk::server class
My server class includes an exec resource (which does a check_mk inventory) as well as the crucial collection of the exported resources above. Note the syntax here to collect resources based on tags, which allows you to be selective when realizing resources.
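In condensed form it looks something like this (a sketch rather than the exact class; the refresh command shown is my shorthand for "reload the check_mk config"):
class checkmk::server {
  # $mk_confdir is where the collected snippets land (e.g. /etc/check_mk/conf.d/puppet)
  # reload check_mk whenever a snippet changes (assumed to be check_mk -R here)
  exec { "checkmk_refresh":
    command     => "/usr/bin/check_mk -R",
    refreshonly => true,
  }
  # purge stale snippets for nodes that no longer export anything
  file { "$mk_confdir":
    ensure  => directory,
    purge   => true,
    recurse => true,
    notify  => Exec["checkmk_refresh"],
  }
  # realize the per-node config snippets and inventory execs exported by checkmk::agent
  File <<| tag == "checkmk_conf" |>>
  Exec <<| tag == "checkmk_inventory" |>>
}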
So, that's it! Happy check_mk'ing!
Comments
Michael Klatsky @ Sat Sep 10 21:59:28 -0400 2011
First of all- thanks for your work on this. We are using check_mk and puppet and your work on this is much appreciated.
I am running into a problem, however. When running puppetd on the client, the following error is being thrown:
err: Could not retrieve catalog from remote server: Error 400 on SERVER: Failed to parse template checkmk/collection.mk.erb: undefined method `include?' for :_timestamp:Symbol at /etc/puppet/modules/checkmk/manifests/init.pp:46 on node
Would you have any suggestions?
Thanks!
Michael Klatsky
TNR Global, LLC
Hadley, MA 01062
Jeremy @ Mon Sep 12 13:11:46 -0400 2011
That's curious; I expected all the keys in there to be strings, but... that seems to imply that you have a non-string object as a key in your scope. I didn't even know that was possible :)
Keep in mind that this module assumes you're running a recent puppet ( I use 2.7, and yes, I know I'm using evil unscoped variables and that this module needs to be rejiggered before 2.8 comes around). I've also only tested it against dashboard, but I don't really see why it wouldn't work with foreman.
One possible workaround for you would be to modify line 8 in collection.mk.erb to replace:
if k.include?("check_mk_tags")
with:
if k.respond_to?('include?') && k.include?("check_mk_tags")
That should skip any non-string variable names. But, I do wonder if this is a symptom of something else.
Michael @ Tue Jun 19 15:38:09 -0400 2012
Hi,
could you give an example of how you would add check_mk_tags_webserver = webserver in puppet? Do you just define it as a variable? Or do you have to export it? Let's say I have a module for apache. Can I just say
class http {
  $check_mk_tags_webserver = "webserver"
  .....
}
Or do I need to export or tag it?
Best Michael
Jeremy @ Tue Jul 17 16:38:55 -0400 2012
Michael,
You can do something like that; however, such tags have to be available within the scope of the check_mk agent class itself. That means you can only put tags in things that are *parents* of the check_mk agent class.
The best way I have found to use this is within a node classifier (puppet dashboard, in my case) where I will create a "group" which includes both the class I'm interested in including *and* creates a corresponding variable for cmk purposes at the same time. All variables in Puppet Dashboard are assigned in the highest scope and are therefore available in all child classes (well, that's how they work right now at least).
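If you're not using an ENC at all, the equivalent at node scope would be something like this (hypothetical node name):
node "web01.example.com" {
  # node scope is a parent of the classes declared here, so the agent's template can see these
  $check_mk_tags_webserver = "webserver"
  $check_mk_tags_paging    = "paging|critical"
  include checkmk
  include http
}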
Note that the changes to scope in 2.8 will break this module as currently written, but I believe something similar should still be possible.
http://docs.puppetlabs.com/guides/scope_and_puppet.html
Me @ Thu Aug 30 21:13:24 -0400 2012
Thanks, I can't wait to try this. Muchas gracias ;)
david garvey @ Mon Sep 17 17:55:12 -0400 2012
How are you implementing this? How is checkmkmoduledisabled set? I have moved hosts to another vlan not accessible by the puppetmaster, but the check_mk puppet client on the nagios/check_mk server still sees the resource.
# I like to have a convenience meta class that may be disabled with a variable; this part is completely optional
class checkmk {
  if $checkmkmoduledisabled {
  } else {
    include checkmk::agent
  }
}
Jeremy @ Tue Sep 18 16:59:28 -0400 2012
David: I set a variable in the node's scope in the classifier (in my case, Dashboard) named "checkmkmoduledisabled". When the 'checkmk' module is called directly, it will do nothing if that variable exists and is not boolean false (incidentally, I believe Dashboard only deals in strings, so you cannot set the value to boolean false anyway). Otherwise it includes the good stuff (checkmk::agent).
I actually really despise this pattern but I can't think of much else for this use case that works within the limitations Dashboard imposes. In our deployment I have Dashboard "groups," which contain the basic bundles of classes and variables used by our general types of systems. Sometimes you just have to exclude a module on a node for some reason, and this hack lets me do that without having to remove the Dashboard group (and all of the other stuff it does) from the node.
This is a hack and I avoid using it generally, but it's handy to deal with one-off type exceptions in Dashboard. You're better off never including the module to begin with if you don't want to use it.
Another thing worth mentioning is that stored configs never go away on their own. If you remove a node from puppet completely (or rename it or whatever) the old stored config will live on indefinitely. You need to run puppetstoredconfigclean.rb (or whatever the current equivalent of that is) to actually get rid of stale nodes, or else their cmk configs will continue to be generated.
ingard @ Thu Sep 20 08:37:10 -0400 2012
Hi
Thanks for your work. This module fits perfectly with my setup :)
Just one question. How would you go about forcing a new inventory of a node if for instance another module adds functionality to a node resulting in more checks being available?
Regards
Ingard
Jeremy @ Fri Sep 21 09:22:53 -0400 2012
Hey Ingard - re-inventory should happen automatically whenever the list of tags changes. This is done by:
notify => Exec["checkmk_inventory_$fqdn"],
on the exported resource.
This isn't going to cover every situation, though. If you're adding your own check_mk plugins to the client (e.g. in /usr/lib/check_mk_agent/plugins/) or installing something which adds a section in the agent's return data, puppet itself doesn't have a way to know about that. In such cases you'll have to manually inventory (or, you can add a tag temporarily to trigger automatic re-inventory).
The other thing this won't do on its own is *remove* checks - it only *adds* checks (i.e. it runs check_mk -I, not check_mk -II). If you want to remove stale checks, you'll still need to run check_mk with -II manually (I do this for sanity, so a change in puppet won't remove checks out from under me). If you trust puppet well enough, modify the code to run -II instead.
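If you go that route, the exported exec would just swap the flag, something like:
@@exec { "checkmk_inventory_$fqdn":
  # -II re-detects everything, so removed services disappear too
  command     => "/usr/bin/check_mk -II $fqdn",
  notify      => Exec["checkmk_refresh"],
  refreshonly => true,
  tag         => "checkmk_inventory",
}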
Ing @ Mon Sep 24 02:56:42 -0400 2012
Hi again
Yes, I figured I could add tags to force the re-inventory, but I was kinda looking for how to trigger it via the puppet manifest directly if I make changes that result in the agent's return data changing. Maybe I just got stuck on the idea that that was the way to implement it. I've not really gotten around to using tags yet, but maybe that is the better option altogether. However, I've not gotten my puppet setup to use ENCs either. Anyway, is there a way to trigger the notify from outside the class?
Jeremy @ Thu Oct 04 15:14:05 -0400 2012
Hmm, you could create another resource for that purpose with a similar notify option. You could then modify that resource whenever you like, and the exec would be triggered when it changes.
The big gotcha is that such a resource still has to exist *on the nagios server*. This means that, as with the conf files themselves, this dummy resource would have to go through the collect/export process. Or you'd have to manually create a resource only on the nagios server which references the refresh by actual name (rather than using the $::fqdn variable) to trigger the refresh.
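For the record, one hypothetical shape for that dummy resource (names invented for illustration) would be an exported file alongside the node's snippet:
# exported next to the node's config snippet; bump the content value in the manifest
# to force a re-inventory of this node on the nagios server
# (the .reinventory extension keeps check_mk from parsing it as a .mk file)
@@file { "$mk_confdir/$fqdn.reinventory":
  content => "1",
  notify  => Exec["checkmk_inventory_$fqdn"],
  tag     => "checkmk_conf",
}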
I'll be honest, I haven't modified existing checks in cmk frequently enough for this to be a major issue. When I've added my own checks I've done it manually by shoving a list of nodes in a shell for loop and running check_mk with the appropriate options against them all.
david garvey @ Mon Oct 08 18:11:55 -0400 2012
Thanks Jeremy,
I just went with removal of the host, as we reprovision quite frequently. I am new to check_mk, so any input will gladly be considered. ;)
PuppetMaster:
puppet node clean host
Icinga Server:
cat remove_host.sh
#!/bin/bash
HOST=$1
echo $HOST
mv /etc/check_mk/conf.d/puppet/${HOST}.mk /tmp/
/usr/bin/cmk -uII $HOST
mv /var/lib/check_mk/autochecks/${HOST}.mk /tmp/autocheck_${HOST}.mk
/usr/bin/cmk -R
Jeremy @ Tue Oct 16 16:29:12 -0400 2012
David, that'll get the job done, although you *should* be able to remove a host by doing:
Puppet master:
puppet node clean host
Nagios server:
(wait for the next puppet run to clean it up, or run puppetd -tv if you're in a hurry)
Relevant part of the module is:
file { "$mk_confdir":
ensure => directory,
purge => true,
recurse => true,
notify => Exec["checkmk_refresh"],
}
With that resource, puppet will automatically remove any file in the mk_confdir that it doesn't know about. Once you run 'puppet node clean' you'll get rid of the stored config for that host, and the corresponding file will be killed off in the next puppet run (the 'notify' will trigger check_mk -R, which causes cmk to reload its config to reflect the changes).
IMO, puppet dashboard *should* purge stored configs when you hit the "delete" button in the web UI. Maybe other classifiers handle this more intelligently :)