Les Mikesell wrote:
There are things that just have to work together and across platforms, like the inventory, monitoring, and capacity tracking, so I automatically see it as going the wrong direction to even consider something that locks you into a single OS or vendor. I'd like to promote greater Linux use, but can't unless the tools interoperate well, and so far ocsinventory and clonezilla are about the only ones that do.
It'd be nice if there were an integrated cross-platform monitoring/management type package that worked well. So many have tried, and I don't think any have succeeded. It's just too complex a task.
Our monitoring is primarily nagios+cacti, both of which are currently maintained by hand. I personally have literally tens of thousands of hours invested in monitoring scripts, mostly integrating with RRDTool for performance and trending analysis. They cover everything from basic CPU/IO/memory to load balancers, switches, PDUs, databases, storage arrays, etc.
Windows stuff, on the other hand, is more complicated. I tied in some NSclient/perfmon stuff along with SNMP (+snmp informant) and get a few dozen stats off of our MSSQL servers. Honestly, I can't rate the accuracy, so I won't stake anything on those results. They represent a tiny minority of our stuff though; I think we have more load balancers than Windows boxes... well, almost.
Cacti does suck, but it does have a pretty nice UI for end users as far as viewing the data. Its back-end scalability is non-existent at the moment. My more recent scripts rely on updating RRDs outside of cacti and just pointing cacti at them for the presentation layer. My main cacti server collects nearly 16,000 data points a minute, running at ~20% CPU. 6500 of those come from my storage array (they have their own tool, but I like mine more). The main downside is that it's a very labor-intensive process, but I haven't come across anything better yet. Some tools are better in some areas, others in other areas. I even wrote my own originally back in 2003.
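As a rough illustration of the "update RRDs outside of cacti" approach, a collector script only has to call `rrdtool update` on a file that cacti is later pointed at for graphing. The sketch below is not the actual scripts described above; the function names, the file path, and the data source names (`read_iops`, `write_iops`) are all made up for the example. It only assumes the standard `rrdtool update FILE --template ds1:ds2 TIMESTAMP:v1:v2` command-line form.

```python
import shutil
import subprocess


def build_update_cmd(rrd_path, values, timestamp=None):
    """Build the argv for `rrdtool update`, one value per named data source.

    `values` maps data source name -> sample value (hypothetical names).
    A timestamp of None uses "N", which rrdtool treats as "now".
    """
    if not values:
        raise ValueError("at least one value is required")
    ts = "N" if timestamp is None else str(int(timestamp))
    names = ":".join(values.keys())
    sample = ":".join(str(v) for v in values.values())
    return ["rrdtool", "update", rrd_path, "--template", names, f"{ts}:{sample}"]


def push_sample(rrd_path, values, timestamp=None):
    """Run the update if the rrdtool binary is available; return the argv either way."""
    cmd = build_update_cmd(rrd_path, values, timestamp)
    if shutil.which("rrdtool") is not None:
        subprocess.run(cmd, check=True)
    return cmd
```

A poller would call something like `push_sample("/var/lib/rrd/array01.rrd", {"read_iops": 120, "write_iops": 45})` once per interval. Cacti never has to poll the device itself; it just reads the file when drawing graphs.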
The original implementation of cacti struggled to keep roughly 3000 data points updated every 5 minutes, and most of the stats were not accurate. So my system collects about 26 times more data, runs 10x faster, can scale to at least 2x its current load without adding hardware, and, best of all, has truly accurate information.
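The "26 times" figure follows from putting both rates on a per-minute basis, using the numbers quoted in this post (the variable names are just for illustration):

```python
# Original setup: roughly 3000 data points every 5 minutes.
old_per_min = 3000 / 5

# Current setup: nearly 16,000 data points every minute.
new_per_min = 16000

ratio = new_per_min / old_per_min
print(f"{old_per_min:.0f}/min -> {new_per_min}/min, about {ratio:.1f}x")
```

Since "nearly 16,000" is a little under 16,000, the ratio lands at roughly 26.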
I remember back in 2005 a company I was at was trying to deploy sitescope, and they kept saying how my graphs were so much better than what they could get out of that $100,000 product, at least at the time. I'm sure their stuff has gotten better, as has mine! They also tried deploying zabbix, and I think replaced nagios with it, but years after deployment, even though they basically had a full-time zabbix developer, they were _still_ using my scripts and graphs for several key metrics on the system.
At some point maybe I'll get the time to revisit a trending application. My key requirements would be to use an RRD back end, be able to store multiple data points in a single file, be able to use RRDs created by other applications, and have a nice UI for end users. And be scalable, to at least 20,000 updates a minute.
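For the "multiple data points in a single file" requirement, RRDTool already allows several data sources (DS) in one RRD, so one update call can write a whole group of related stats. Below is a minimal sketch of building such a `rrdtool create` command. The helper name, the data source names, the 60-second step, and the single AVERAGE archive are assumptions chosen for the example, not a recommended layout.

```python
import re

# rrdtool limits DS names to 1-19 characters from [a-zA-Z0-9_].
DS_NAME = re.compile(r"[A-Za-z0-9_]{1,19}")


def build_create_cmd(rrd_path, ds_names, step=60, rows=1440):
    """Build the argv for `rrdtool create` with one GAUGE data source per name.

    All data sources share one file and one archive (RRA) that keeps
    `rows` averaged samples at the base step.
    """
    for name in ds_names:
        if not DS_NAME.fullmatch(name):
            raise ValueError(f"invalid data source name: {name!r}")
    heartbeat = step * 2  # tolerate one missed update before marking data unknown
    cmd = ["rrdtool", "create", rrd_path, "--step", str(step)]
    cmd += [f"DS:{name}:GAUGE:{heartbeat}:0:U" for name in ds_names]
    cmd.append(f"RRA:AVERAGE:0.5:1:{rows}")
    return cmd
```

Grouping data sources this way also helps with the scalability target: 20,000 updates a minute is about 333 values per second, and writing, say, ten values per file cuts the number of file updates by a factor of ten.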
I can write/script all of the back end stuff myself, but I'm no programmer, so I can't do the front end.
nate