On Fri, Dec 16, 2011 at 10:19 AM, Alan McKay alan.mckay@gmail.com wrote:
On the one hand I like having an agent on the remote device since it allows you to have functionality that is more purpose-driven to what we are trying to do. On the other hand, what about devices that cannot run the agent? e.g. monitoring switches and routers. Though to counter my own concern - those are the sorts of things that are either up or down anyway and I'm not sure that they can be "monitored" per se outside of that. Sure you can graph their traffic and so forth, but is any monitoring software able to actually say "there is a potential problem with your router or switch"? Other than "your device is now down" which is pretty easy to figure out anyway without monitoring software since just about anything connected to it is going to start throwing alarms once it is down.
Yes, you can configure most managed devices to send SNMP traps and/or syslog messages about problems to your monitoring receiver. And your monitor polling for SNMP values can alarm on failures and on values that exceed thresholds (like bandwidth percentage used, interface errors, interface drops, etc.).
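To make the threshold idea concrete, here is a rough sketch of how a poller might turn two readings of an interface octet counter into a "bandwidth percentage used" figure and compare it to a limit. This is only an illustration, not how any particular monitoring package does it: the `Sample` class, the function names, and the 80% limit are made up for the example, and the readings are assumed to have been fetched already (e.g. `ifInOctets` and `ifSpeed` from the standard IF-MIB).

```python
from dataclasses import dataclass

# ifInOctets/ifOutOctets are 32-bit counters that wrap back to zero.
COUNTER32_MODULUS = 2 ** 32


@dataclass
class Sample:
    """One poll of an octet counter (hypothetical helper type)."""
    timestamp: float  # seconds since some fixed reference
    octets: int       # raw counter value as returned by the device


def utilization_percent(prev, curr, if_speed_bps, modulus=COUNTER32_MODULUS):
    """Percent of link capacity used between two polls.

    Assumes the counter wrapped at most once between the polls; a
    second wrap cannot be detected from two readings alone.
    """
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0 or if_speed_bps <= 0:
        raise ValueError("need increasing timestamps and a positive speed")
    # Modular subtraction gives the right delta across a single wrap.
    delta_octets = (curr.octets - prev.octets) % modulus
    bits_per_second = delta_octets * 8 / elapsed
    return 100.0 * bits_per_second / if_speed_bps


def check_threshold(percent, alarm_at=80.0):
    """Return "ALARM" at or above the (assumed) limit, else "OK"."""
    return "ALARM" if percent >= alarm_at else "OK"


if __name__ == "__main__":
    # Two polls 300 seconds apart on a 100 Mbit/s link; the counter
    # wrapped in between.
    before = Sample(timestamp=0.0, octets=4_000_000_000)
    after = Sample(timestamp=300.0, octets=1_580_032_704)
    pct = utilization_percent(before, after, if_speed_bps=100_000_000)
    print(f"{pct:.1f}% used: {check_threshold(pct)}")
```

On fast links a 32-bit counter can wrap more than once between polls, which is why the 64-bit `ifHCInOctets`/`ifHCOutOctets` counters are usually preferred where the device supports them; pass `modulus=2 ** 64` in that case.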
Incidentally I also looked at OpenNMS which has a live demo online - I don't like the dashboard and basic functionality as much as Zabbix or Zenoss. And since I did not set it up myself nor configure it, I cannot comment on that.
OpenNMS starts with the assumption that you will be monitoring more things than you can usefully display, so what you see on the home screen will be mostly counts of systems with errors in each category with a drill-down to the actual node entries. This won't mean much if you don't customize the categories for your network. By default it will collect histories of a large number of SNMP values for most common systems/devices and generate events/notifications for failures and thresholds. But, by default, to see the graphs you have to go to a particular node and pick one or more things from its 'resource graph' list. If you want to see a group of graphs from different devices (like the bandwidth on several important router interfaces or the CPU load on a farm of servers) you can arrange them on a 'Key SNMP Customized' (KSC) report page. These pages auto-refresh when viewed and are often the best way to watch something. It is fairly easy to install your own system since you can do a yum-based install.
Thoughts from anyone on any of this?
Network monitoring is not trivial no matter what tool you use. Pick something that you trust to scale to the proportions you will need so you don't do a lot of work and then hit a wall. And if you have a lot of systems, avoid anything that needs per-system configuration or agent installation.