[CentOS] Spacewalk or Puppet?

Wed Nov 4 22:05:04 UTC 2009
nate <centos at linuxpowered.net>

Les Mikesell wrote:

> There are things that just have to work together and across platforms,
> like the inventory, monitoring, and capacity tracking so I automatically
> see it as going the wrong direction to even consider something that
> locks you into a single OS or vendor.  I'd like to promote greater linux
> use, but can't unless the tools interoperate well and so far
> ocsinventory and clonezilla are about the only ones that do.

It'd be nice if there were an integrated cross-platform
monitoring/management type package that worked well.
So many have tried, and I don't think any have succeeded. It's just
too complex a task.

Our monitoring is primarily nagios+cacti, which are currently
maintained by hand. I personally have literally tens of thousands of
hours invested in monitoring scripts, mostly integrating with RRDTool
for performance and trending analysis. Everything from basic
CPU/IO/memory to load balancers, switches, PDUs, databases, storage
arrays, etc.

Windows stuff, on the other hand, is more complicated. I tied in
some NSclient/perfmon stuff along with SNMP (plus SNMP Informant)
and get a few dozen stats off of our MSSQL servers. Honestly, I
can't rate the accuracy, so I won't stake anything on those results.
They represent a tiny minority of our stuff though; I think we have
more load balancers than Windows boxes... well, almost.

Cacti does suck, but it does have a pretty nice UI for end users as
far as viewing the data. Its back-end scalability is nonexistent
at the moment. My more recent scripts rely on updating RRDs outside
of cacti and just pointing cacti at them for the presentation
layer. My main cacti server collects nearly 16,000 data points
a minute, running at ~20% CPU. 6,500 of those come from my storage
array (they have their own tool but I like mine more). The main
downside is that it's a very labor-intensive process, but I haven't
come across anything better yet. Some tools are better in some areas,
others in other areas. I even wrote my own originally back in 2003.
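
To illustrate the "update RRDs outside of cacti" approach, here is a
minimal Python sketch, not the actual scripts described above. It only
builds the argument list for the standard `rrdtool update` command
(with `--template` to name the data sources). The file path and the
data-source names are hypothetical, and it assumes the RRD file already
exists with matching data sources.

```python
import shutil
import subprocess


def build_update_cmd(rrd_path, values, timestamp=None):
    """Build the argv for `rrdtool update` on one RRD with several data sources.

    `values` maps data-source name -> reading (None means unknown).
    The names used by callers are hypothetical examples.
    """
    names = list(values)
    # "N" tells rrdtool to use the current time.
    stamp = "N" if timestamp is None else str(int(timestamp))
    # "U" is rrdtool's marker for an unknown value.
    readings = ":".join("U" if values[n] is None else str(values[n]) for n in names)
    return [
        "rrdtool", "update", rrd_path,
        "--template", ":".join(names),
        f"{stamp}:{readings}",
    ]


def push(rrd_path, values, timestamp=None):
    """Run the update if rrdtool is installed; otherwise just return the argv."""
    cmd = build_update_cmd(rrd_path, values, timestamp)
    if shutil.which("rrdtool") is not None:
        subprocess.run(cmd, check=True)
    return cmd
```

A collector script would call something like
`push("/var/lib/rrd/array01.rrd", {"read_iops": 1200, "write_iops": 340})`
once per polling interval, and cacti would then be pointed at the same
file purely for graphing.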

The original implementation of cacti struggled to keep roughly
3000 data points updated every 5 minutes (about 600 a minute), and
most of the stats were not accurate. So my system is collecting
roughly 26 times more data, runs 10x faster, and can scale at least
2x higher than what it's on now without adding hardware, and best of
all has truly accurate information.
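
For anyone checking the "26 times" figure, the arithmetic can be
sketched as follows. It uses only the numbers quoted in this post; the
variable names are just for illustration.

```python
# Figures quoted in this post.
old_points_per_5min = 3000      # original cacti setup
new_points_per_min = 16000      # current collection rate
storage_points_per_min = 6500   # portion coming from the storage array

# Normalize the old rate to points per minute before comparing.
old_points_per_min = old_points_per_5min / 5
ratio = new_points_per_min / old_points_per_min
storage_share = storage_points_per_min / new_points_per_min

print(f"old rate: {old_points_per_min:.0f} points/min")   # old rate: 600 points/min
print(f"ratio: {ratio:.1f}x")                             # ratio: 26.7x
print(f"storage share: {storage_share:.0%}")              # storage share: 41%
```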

I remember back in 2005 a company I was at was trying to deploy
sitescope, and they kept saying how my graphs were so much
better than what they could get out of that $100,000 product,
at least at the time. I'm sure their stuff has gotten better,
as has mine! They also tried deploying zabbix, and I think
replaced nagios with it, but years after deployment even
though they basically had a full time zabbix developer they
were _still_ using my scripts and graphs for several key
metrics on the system.

At some point maybe I'll get the time to revisit a trending
application. My key requirements would be to use an RRD back end,
be able to store multiple data points in a single file, be able
to use RRDs created by other applications, and have a nice UI for
end users. And be scalable, to at least 20,000 updates a minute.
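
As a rough illustration of the "multiple data points in a single file"
requirement, the Python sketch below builds an `rrdtool create` command
with several DS (data source) definitions in one RRD. The step,
heartbeat, row count, file name, and data-source names are all
assumptions chosen for the example, not settings from any real
deployment.

```python
import re

# rrdtool data-source names: 1 to 19 characters from [a-zA-Z0-9_].
DS_NAME = re.compile(r"^[A-Za-z0-9_]{1,19}$")


def build_create_cmd(rrd_path, ds_names, step=60, heartbeat=None, rows=1440):
    """Build the argv for `rrdtool create` with one GAUGE DS per name.

    All defaults are illustrative: 60-second step, heartbeat of two
    steps, and a single AVERAGE archive keeping `rows` primary points.
    """
    if not ds_names:
        raise ValueError("at least one data source is required")
    for name in ds_names:
        if not DS_NAME.match(name):
            raise ValueError(f"invalid data-source name: {name!r}")
    if heartbeat is None:
        heartbeat = 2 * step
    cmd = ["rrdtool", "create", rrd_path, "--step", str(step)]
    # One DS entry per metric, all stored in the same file.
    cmd += [f"DS:{name}:GAUGE:{heartbeat}:U:U" for name in ds_names]
    cmd.append(f"RRA:AVERAGE:0.5:1:{rows}")
    return cmd
```

For example, `build_create_cmd("array01.rrd", ["read_iops", "write_iops"])`
yields a single file holding both metrics, so one update call can write
both values at once. That matters for the scalability target: 20,000
updates a minute works out to roughly 333 a second.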

I can write/script all of the back end stuff myself but I'm no
programmer so can't do the front end.

nate