Sergio Belkin wrote:
2008/5/13 jleaver+centos@reachone.com:
OK, you won :) I'm going to test nagios. I am using centos 5.1 x86_64. Do I lose much if I use rpm from rpmforge (version 2.9)?
We're running version 2.11 at the office (on CentOS 5.1 x86_64). I've looked at some of the things in 3.0, but there's nothing there that I needed yet.
Hopefully you have some way to track changes in /etc/nagios (FSVS is what we use), because it will make your life much easier to have an audit trail.
We created sub-folders under /etc/nagios to hold the various types of entities. For example, we have:
/etc/nagios/commands
/etc/nagios/contacts
/etc/nagios/contactgroups
/etc/nagios/hosts-switches
/etc/nagios/hosts-dmz
/etc/nagios/hosts-servers
/etc/nagios/hosts-lan
/etc/nagios/templates-hosts
/etc/nagios/templates-services
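One way to get Nagios to read folders like these is the cfg_dir directive in the main configuration file, which loads every .cfg file found in the named directory. The fragment below is only a sketch: it assumes the main file is /etc/nagios/nagios.cfg and that each folder is pulled in with its own cfg_dir line, which may not match the setup described here.

```cfg
# /etc/nagios/nagios.cfg (assumed location)
# Each cfg_dir line makes Nagios process all *.cfg files in that directory.
cfg_dir=/etc/nagios/commands
cfg_dir=/etc/nagios/contacts
cfg_dir=/etc/nagios/contactgroups
cfg_dir=/etc/nagios/hosts-switches
cfg_dir=/etc/nagios/hosts-dmz
cfg_dir=/etc/nagios/hosts-servers
cfg_dir=/etc/nagios/hosts-lan
cfg_dir=/etc/nagios/templates-hosts
cfg_dir=/etc/nagios/templates-services
```

With this approach, adding a new host or contact only means dropping a new .cfg file into the right folder; nagios.cfg itself does not need to change.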
We then broke individual elements out of the massive default configuration into individual .cfg files. For example, we chose to create a separate file for each contact rather than putting them all in a single file. So far it works well. It's a lot easier to get a feel for which users have been defined, which hosts are defined, and what the templates are. When I look in templates-services, I can see from the directory listing that I have service templates named X, Y and Z, without having to open a file to look.
We currently put service checks for individual hosts in the same configuration file as the host. So you will have the following definitions in a typical host file (until you get into templating):
define host{
define hostextinfo{
define service{
define service{
...
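Filled in, a host file with that layout might look something like the sketch below. Everything specific in it is made up for illustration: the file name, the host name and address, the template names generic-host and generic-service, and the check commands, which are assumed to be defined elsewhere (for example under /etc/nagios/commands).

```cfg
# /etc/nagios/hosts-servers/web01.cfg (hypothetical file and host)

define host{
        use             generic-host        ; hypothetical host template
        host_name       web01
        alias           Web server 01
        address         192.0.2.10          ; documentation-range example address
        }

define hostextinfo{
        host_name       web01
        notes           Example extended info for web01
        }

define service{
        use                     generic-service     ; hypothetical service template
        host_name               web01
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%  ; assumes a check_ping command is defined
        }

define service{
        use                     generic-service
        host_name               web01
        service_description     HTTP
        check_command           check_http          ; assumes a check_http command is defined
        }
```

The templates are assumed to supply the remaining required settings (check intervals, notification options, contact groups). Without templates, each definition would have to spell those out itself.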
Any plugins that we wrote ourselves we put under a separate folder, which keeps them separate from
/usr/local/lib64/nagios-plugins/
Basically, start small, track your changes, and plan on refactoring it in week #2 after you start monitoring about a dozen hosts. Stay away from advanced things like escalation, monitoring things like disk space on remote servers, or the like until you get the basics working.
Oh, and SELinux will probably get in your way, so you'll need to play with audit2allow to create supplemental policy that gives Nagios additional permissions. (The denials may have been due to PEBKAC issues on my end. I plan on going back, looking at the labeling, and figuring out what I mislabeled.)
I think that's the majority of the issues that we dealt with in the past 2 weeks. We're now in fine-tuning mode and getting ready to start monitoring remote services next week.