[CentOS] Spacewalk or Puppet?

nate centos at linuxpowered.net
Fri Nov 6 15:07:06 UTC 2009


Karanbir Singh wrote:

> To me, this is a major contribution by some of these tools today -
> spacewalk, puppet, cfengine, chef, bcfg2, slack : all becoming focal
> groups - even if they only address specific use-cases or only address
> certain mindsets / thought process's.

Another aspect of the management tools I forgot to mention is their
own complexity. Most of the enterprise grade stuff is so complicated
you need dedicated trained people to work on it as their sole task,
whether it's something like BMC or HP Openview/sitescope type stuff.

I can only speak for what I've used but at my current company I
was hired after the previous guy left. One of the reasons my company
wanted me was my knowledge of cfengine, which they had deployed but
really nobody knew how to use.

It wasn't easy going into a new environment with a five-nines SLA
and digging into their systems when pretty much the only
documentation I had was a poorly set up cfengine implementation.
I mean, their install process involved *manually* defining classes
on the command line and running the agent to pick up the associated
configurations for those classes. Whereas in my setup the classes
are always defined and no manual intervention is needed.

The internal infrastructure was a total mess, and some of it still
is. The older DNS systems run on Windows and weren't set up
properly; the master DNS failed a couple of days ago, causing
havoc on the back end. I worked around it for a while since they
couldn't get the system back up, then yesterday the main zone for
the back end expired, wreaking havoc again, and we did an
emergency migration to Linux at that point. The guys doing the
Windows stuff don't know what they are doing; it's all legacy and
hasn't been touched in years.
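Roughly what I mean about always-defined classes, sketched in
cfengine 2 style syntax (host and file names are made up for
illustration, and this is from memory, not our actual config) --
the classes come from hostname patterns in cfagent.conf, so
nothing ever has to be passed with -D at install time:

```
# cfagent.conf (sketch, illustrative names only)
classes:
   webservers = ( HostRange(web,1-20) )
   dnsservers = ( ns1 ns2 )

control:
   actionsequence = ( copy )

copy:
   webservers::
      /srv/masterfiles/httpd.conf dest=/etc/httpd/conf/httpd.conf
         server=cfmaster type=checksum
```

A freshly kickstarted host named web07 picks up the webservers
class on its own, instead of someone having to remember the right
cfagent -D incantation.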

So I spent a few months slowly re-writing the entire system, along
with re-doing the entire build system, which was just as bad, but
fortunately kickstart is a lot easier to work with.
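For the curious, a kickstart file is just a short declarative text
file the installer fetches over the network; a stripped-down
sketch with made-up values (not our actual config) looks something
like:

```
# ks.cfg (sketch, illustrative values only)
install
url --url http://ks.example.com/centos/os/x86_64
lang en_US.UTF-8
keyboard us
network --bootproto dhcp
rootpw changeme
clearpart --all --initlabel
autopart
reboot

%packages
@core
```

Point the installer at that over HTTP and it builds the box
unattended, which is most of why it beats what we had before.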

Even today, more than a year later, there are probably 45-50
systems out there running on the "old" stuff. There is no
good/safe/clean way to migrate to the new setup without
re-installing, so I've been doing that as opportunities present
themselves. Probably 40 of those systems will be replaced in the
next 3-5 months, which will allow me to almost complete the
project.

Maybe when I leave my current company the next person will come
in and re-write everything again, or switch to puppet or
something.

Funny, I just realized that when I left my previous company I left
them with another guy who had been there the whole time and knew
CFengine as well. He tried to train someone else before he left,
but since then all of them have left. They hired some other guy,
but I'm not sure if he's still there or whether he ever learned
how things worked; the company has mostly collapsed under multiple
failed business models.

I feel for the folks who are so overworked and stressed out that
they don't have the ability to learn better ways of doing things;
I've been there too. Fortunately I'm now in a position where I
have the luxury of refusing such positions and tasks because
it's not worth my time.

As for monitoring automation, I hear ya, that would be cool to
have. Right now it's not much of an issue for us; our growth rate
is pretty small (roughly 400 systems). We have a tier 1 team that
handles most of the monitoring setup, so I just give them a
ticket telling them what to do and they do it.

I'll be deploying another new edge data center location in a
couple of weeks: 14 servers, 10 bare metal and 4 running ESXi,
about 50 instances of CentOS total. With kickstarting over the
WAN I can usually get most everything up in about a day, which
sadly is faster than the network guy takes to set up his two
switches, two load balancers, and two firewalls. Despite the
small server count, the hardware is benchmarked to run our app
at about 40,000 transactions a second as a whole, which makes it
the fastest app I've ever worked on by several orders of
magnitude. With such a small server count we have more capacity
issues on the load balancers than we have with the servers.

It certainly was an interesting experience the first time
deploying a data center remotely. We just had one really basic
server config used to seed the rest of the network, with
everything done via remote management cards in the servers and
remote installations over the WAN. The exception was the network
gear, which the network guy was on site for a couple of days to
configure.

I have 10x the equipment to manage and can still get things
done faster than him. I could manage all of the network stuff
too without much effort.

If I spent some time I'm sure I could automate some Nagios
integration, but Cacti is a lost cause. Maybe OpenNMS at some
point, who knows. At this time automating monitoring integration
isn't a pressing issue; I spend more time writing custom scripts
to query things in the most scalable way I can than adding new
things to the monitoring stack.
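If I ever did automate the Nagios side, it would probably amount
to generating host definitions from an inventory list, something
like this rough Python sketch (the template and host data are
illustrative, not anything we actually run):

```python
# Rough sketch: render Nagios host definitions from an inventory
# list. Template and host data are illustrative only.

HOST_TEMPLATE = """define host {{
    use        generic-host
    host_name  {name}
    address    {addr}
}}
"""

def render_hosts(hosts):
    """Take (name, address) pairs and return Nagios config text."""
    return "\n".join(HOST_TEMPLATE.format(name=n, addr=a)
                     for n, a in hosts)

if __name__ == "__main__":
    # In practice this would be written out to something like
    # /etc/nagios/conf.d/hosts.cfg and nagios reloaded.
    print(render_hosts([("web01", "10.0.0.10"),
                        ("web02", "10.0.0.11")]))
```

Feed it the host list from the build system and tier 1 never has
to hand-edit a host block again.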

Myself, I'm not holding my breath for any movement or product to
come around and make managing systems, especially cross-platform,
simpler and cheaper. The task is just too complex. The biggest
companies such as Amazon, Google, MS etc. have all realized there
is little point in even trying such a thing.

It would only benefit really small companies that are at a growth
point where they don't have enough business to hire people to
standardize on something (or have teams for each thing), and
those companies can't afford the costs involved with some big
new fancy tool to make their lives easier. The big guys don't
care since they have the teams and stuff to handle it.

Though that won't stop companies from trying.. the latest push,
of course, is to the magical cloud, where you only care about
your apps and no longer care about the infrastructure.

Maybe someday that will work. I talked to a small company
recently that is investing nearly $1M to in-source their
application from the cloud onto their own gear (they have never
hosted it themselves) because of issues with the cloud that
they couldn't work around.

good talks though, the most interesting thread I've seen here
in I don't know how long. Even though it was soooooooo off topic
(from what the list is about anyways) :)

nate




More information about the CentOS mailing list