Hi.
At that point I pass it over to puppet personally. Used to use cfengine, but there are aspects I prefer when it comes to puppet; your mileage may of course vary.
Well, refer back to my initial email on the subject. It's how you split state and policy - Puppet isn't all that great at state management, but it does a great job of policy management and enforcement for that state.
But then again it depends on how you play your setup, and exactly how you define what 'management' really is.
I am personally not that big a fan of Puppet, as things get quite complex in large scenarios and Puppet does not scale well (this has improved in the latest version if you use Passenger instead of WEBrick).
If you are willing to set up complex configurations with dependencies and variables, Puppet may be a good choice. Together with Cobbler or The Foreman you also get provisioning functionality. IMHO you should be familiar with Ruby, too.
Spacewalk is a single tool for all lifecycle management tasks. It is capable of bare-metal provisioning (using its Cobbler integration), re-provisioning (using Koan), configuration management, errata generation and package management. It also scales quite well if you use standalone Oracle instead of XE, and a free database backend based on PostgreSQL will be integrated in the near future.
We are using Satellite/Spacewalk to manage about 2,500 clients and servers.
Best Regards Marcus
Marcus Moeller wrote:
<snip>
How well do any of these tools work when the hosts are widely distributed or distributed with groups in different locations? And how do they handle IP assignment on multiple NICs? Do you need DHCP capability on all segments or do you need to know all the MAC addresses and the cable connectivity starting out?
Also, do they provide a version-controlled history with a way to easily find when a change was made and undo it?
On 11/04/2009 01:39 PM, Les Mikesell wrote:
How well do any of these tools work when the hosts are widely distributed or distributed with groups in different locations?
yes.
And how do they handle IP assignment on multiple NICs? Do you need DHCP capability on all segments or do you need to know all the MAC addresses and the cable connectivity starting out?
You only need that for the initial provisioning setup; once the machine is running, neither of the tools needs any info on the network stack (but you are welcome to plumb in whatever you need/want). DHCP isn't ideal when you have smaller clusters in a geographically spread-out setup.
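For illustration, this is roughly what pinning a box's MAC to an install profile looks like in Cobbler - the names and addresses here are invented and the exact flag spellings vary between Cobbler versions:

  cobbler system add --name=web01 --profile=centos5-x86_64 \
      --mac=00:16:3e:aa:bb:cc --ip=10.0.5.21 --hostname=web01.example.com
  cobbler sync    # regenerate the dhcp/pxe config so the box can install itself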
Also, do they provide a version-controlled history with a way to easily find when a change was made and undo it?
You don't really need that with platform/state, but it's a must for policy and role control. All the major tools in that space encourage use of a VCS, backed with layered ACLs if needed: puppet/cfengine/chef/bcfg2.
- KB
Karanbir Singh wrote:
And how do they handle IP assignment on multiple NICs? Do you need DHCP capability on all segments or do you need to know all the MAC addresses and the cable connectivity starting out?
You only need that for the initial provisioning setup; once the machine is running, neither of the tools needs any info on the network stack (but you are welcome to plumb in whatever you need/want).
What good is a configuration tool if it can't handle a change in NIC setup? That's really about the only thing troublesome enough to do manually that it's worth more automation than a shell loop of ssh commands.
DHCP isn't ideal when you have smaller clusters in a geographically spread-out setup.
Exactly - and remote 'hands on' support generally won't know which NIC is which, making this fairly problematic. And you can't just clone setups because the copies won't work with different MAC addresses.
Also, do they provide a version-controlled history with a way to easily find when a change was made and undo it?
You don't really need that with platform/state, but it's a must for policy and role control. All the major tools in that space encourage use of a VCS, backed with layered ACLs if needed: puppet/cfengine/chef/bcfg2.
I'm not sure I agree with that. I really do want to know about platform/state history even if I can't roll it back. For example, if someone changes the duplex setting on a NIC to match a switch I'd like to have the change recorded - and a way to look at how that machine is different from both the way it was at some other time and from other similar machines.
Also, we almost never roll out a change across all machines in a group at the same time but instead closely schedule individual machines or small sets. Do any of the tools make this easy? That's the main reason I haven't used OCSinventory's deployment mechanism even though its cross-platform capabilities are appealing in a mixed environment.
On 11/04/2009 04:35 PM, Les Mikesell wrote:
What good is a configuration tool if it can't handle a change in NIC setup? That's really about the only thing troublesome enough to do manually that it's worth more automation than a shell loop of ssh commands.
I am not sure how you got from one place to another. Essentially any config tool is able to handle network setups, including stuff like bonding and HA. Think of it like this - if you can do it by hand, you can plumb in the knowledge the system needs to make those decisions for you.
Exactly - and remote 'hands on' support generally won't know which NIC is which, making this fairly problematic. And you can't just clone setups because the copies won't work with different MAC addresses.
I personally think that cloning is for people who don't know what they are doing. A bare-metal install, provisioned into a role with even a moderately usable app deployment strategy and capped with a monitoring system that actually knows what it needs to do, will easily be friendlier than any form of cloning.
I'm not sure I agree with that. I really do want to know about platform/state history even if I can't roll it back.
Well, most people replace or upgrade the platform (hardware) when things break, so rolling back is sort of academic in this sense. You are much better off storing platform inventory in something like OCS/GLPI/Spacewalk (or even doing things like getting sosreport to give you the right output and checking that into a local VCS, so you can 'track' a system's evolution).
For example, if someone changes the duplex setting on a NIC to match a switch I'd like to have the change recorded - and a way to look at how that machine is different from both the way it was at some other time and from other similar machines.
How do you mean 'changes'? I highly recommend there be no way for people to get onto the machine unless in an emergency. So no ssh, as an example. That sort of ensures that policy changes like this need to go via the management system - leaving you a nice audit trail.
Adding an 'ssh box', where people have to contribute $5 every time they get onto ssh, also helps get people into the right mindset :)
Also, we almost never roll out a change across all machines in a group at the same time but instead closely schedule individual machines or small sets.
Puppet, at least, makes this sort of thing trivial, since you can set up environments and then nominate machines to join different environments on demand. It also makes it possible to have code-like release and deployment cycles.
You could potentially even have various tests, and/or wrap some of the policy in tests with rollback capabilities (I tend to do that for my ssh and puppet configs - configs for puppet itself, that is).
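As a rough sketch of the environments mechanism (0.2x-era layout; the environment names and paths are only examples):

  # puppet.conf on the master: one manifest/module tree per environment
  [production]
      manifest   = /etc/puppet/env/production/site.pp
      modulepath = /etc/puppet/env/production/modules

  [testing]
      manifest   = /etc/puppet/env/testing/site.pp
      modulepath = /etc/puppet/env/testing/modules

  # nominate a client into the testing environment for this rollout window
  puppetd --test --environment testing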
Karanbir Singh wrote:
I personally think that cloning is for people who don't know what they are doing.
Of course - that's a feature, though, not a bug. You want the people who bolt the machines in the racks to be able to do it - and you don't want it to dictate to you what OS they can install.
A bare-metal install, provisioned into a role with even a moderately usable app deployment strategy and capped with a monitoring system that actually knows what it needs to do, will easily be friendlier than any form of cloning.
OK, just as soon as one exists that works across platforms for people who don't know what they are doing.
I'm not sure I agree with that. I really do want to know about platform/state history even if I can't roll it back.
Well, most people replace or upgrade the platform (hardware) when things break, so rolling back is sort of academic in this sense. You are much better off storing platform inventory in something like OCS/GLPI/Spacewalk (or even doing things like getting sosreport to give you the right output and checking that into a local VCS, so you can 'track' a system's evolution).
A pre-configured ocs agent is part of our base images, so I have that as soon as the network is set up, but I'd like more detail.
For example, if someone changes the duplex setting on a NIC to match a switch I'd like to have the change recorded - and a way to look at how that machine is different from both the way it was at some other time and from other similar machines.
How do you mean 'changes'?
I'd like to have the whole /etc tree handled more or less like rancid does for cisco configs - that is, toss the whole thing into cvs/subversion, etc., but with similar machines handled as branches to compress the space and give you an easy way to see diffs. And, of course, something similar for the windows registry.
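A nightly cron job gets you most of the way there - a minimal sketch, assuming a subversion working copy has already been checked out per host (the paths are made up):

  #!/bin/sh
  # snapshot /etc into subversion, rancid-style
  WC=/var/lib/etc-svn/$(hostname -s)
  rsync -a --delete --exclude=.svn /etc/ "$WC/etc/"
  cd "$WC" || exit 1
  svn add --force etc > /dev/null                        # pick up new files
  svn status | awk '/^!/ {print $2}' | xargs -r svn rm   # record deletions
  svn commit -m "nightly /etc snapshot on $(hostname -s)"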
I highly recommend there be no way for people to get onto the machine unless in an emergency.
It is always an emergency except for things like content on the web servers (which mostly gets there via rsync over ssh with underlying versioning).
Also, we almost never roll out a change across all machines in a group at the same time but instead closely schedule individual machines or small sets.
Puppet, at least, makes this sort of thing trivial, since you can set up environments and then nominate machines to join different environments on demand. It also makes it possible to have code-like release and deployment cycles.
You could potentially even have various tests, and/or wrap some of the policy in tests with rollback capabilities (I tend to do that for my ssh and puppet configs - configs for puppet itself, that is).
Just dreaming here, but I think this stuff really belongs in something like Hudson with a drools plugin where you could automate all the way up from the source code through testing and deployment. That would start out with something that already has a cross platform scheduling/execution capability and knows more nuts and bolts about programming and platform differences than most admin-only tools.
On 11/04/2009 06:49 PM, Les Mikesell wrote:
I personally think that cloning is for people who don't know what they are doing.
Of course - that's a feature, though, not a bug. You want the people who bolt the machines in the racks to be able to do it - and you don't want it to dictate to you what OS they can install.
Not sure what kind of setup you have in mind - most, if not all, of the people I know running their own management services tend to need and use little other than remote hands for moving stuff around or pressing very specific buttons. Across the board, most DCs in the UK, and the few I have worked with in the US as well, prefer tech support to come from the services organisation - which in this case should be/would be you - and not from the data centre guys, whose main role is to manage power/temperature and, in quite a lot of cases, the network.
OK, just as soon as one exists that works across platforms for people who don't know what they are doing.
If they don't know what they are doing, start by firing the guy who hired them. He (or she) is clearly incompetent in his own role.
How do you mean 'changes'?
I'd like to have the whole /etc tree handled more or less like rancid does for cisco configs
We're not really living in the 1980s anymore... things have moved on a bit in terms of both the mindset and the capabilities of systems management tools :) And there is also a lot of policy and role management that happens outside /etc that is well worth managing too.
It is always an emergency except for things like content on the web servers (which mostly gets there via rsync over ssh with underlying versioning).
You clearly need better management if all you are doing is fighting fires and causing emergencies all the time!
Just dreaming here, but I think this stuff really belongs in something like Hudson
Completely disagree. Hudson is a deployment tool that wraps app and build to release code. Systems management using tooling like that is not only a waste of time, it also adds layers of complexity that only add to the instability of the whole stack.
On the flip side, Hudson could be managed to some extent using your state management tool - like Spacewalk - which could then manage code rollout and build -> test -> release cycles.
Les Mikesell wrote:
What good is a configuration tool if it can't handle a change in NIC setup? That's really about the only thing troublesome enough to do manually that it's worth more automation than a shell loop of ssh commands.
Just wondering, what kind of NIC setup? In the hundreds of systems I have managed I've never had to change the default NIC settings. If you mean interface (IP/etc.) setup then that could be an issue; for me, I have a script that grabs the MAC addresses and serial numbers and polls a web server for config files associated with them to configure the interfaces at system installation (I haven't had to change them post-install; I prefer just to re-install if the system is being re-purposed).
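Roughly the shape of that kind of bootstrap, with an invented server name and layout:

  #!/bin/sh
  # post-install hook: pull this box's interface configs, keyed on
  # serial number + first MAC (cfg.example.com and the tarball layout are made up)
  SERIAL=$(dmidecode -s system-serial-number | tr -d ' ')
  MAC=$(cat /sys/class/net/eth0/address)
  curl -fso /tmp/netcfg.tar "http://cfg.example.com/netcfg/${SERIAL}-${MAC}.tar" &&
      tar -C /etc/sysconfig/network-scripts -xf /tmp/netcfg.tar &&
      service network restart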
Exactly - and remote 'hands on' support generally won't know which NIC is which, making this fairly problematic. And you can't just clone setups because the copies won't work with different MAC addresses.
If your setup is simple, e.g. one network, what I do is bond all of the interfaces into a single bond in active/passive mode; that makes all of the NICs available for the same purpose, with no need to know what is where. If the system needs to access another part of the network, that is handled via routing, not via a physical connection.
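For reference, an active/passive bond on a RHEL-style box looks roughly like this (CentOS 5-era files; the address is a placeholder):

  # /etc/modprobe.conf
  alias bond0 bonding
  options bond0 mode=active-backup miimon=100

  # /etc/sysconfig/network-scripts/ifcfg-bond0
  DEVICE=bond0
  IPADDR=192.168.10.15
  NETMASK=255.255.255.0
  ONBOOT=yes
  BOOTPROTO=none

  # /etc/sysconfig/network-scripts/ifcfg-eth0  (repeat for each slave NIC)
  DEVICE=eth0
  MASTER=bond0
  SLAVE=yes
  ONBOOT=yes
  BOOTPROTO=none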
If you have an issue where you need to change a NIC's duplex setting because of a flawed switch, I'd suggest you look at replacing your switches (at least going forward). I've only had to screw with the duplex setting on a couple of occasions, about 5-6 years ago, with really old HP big iron. Hundreds of x86 boxes and different switch types/models/vendors later, I've never had a problem.
small sets. Do any of the tools make this easy? That's the main reason I haven't used OCSinventory's deployment mechanism even though its cross-platform capabilities are appealing in a mixed environment.
Define easy: in cfengine and puppet (I'm sure, though I've never used it) you can define a class of systems and roll the change out to that class. OCS really is a poor management system IMO; it's ok for inventory but the rest is crap. Can't speak for spacewalk; it sounds like a decent inventory/installation system for Red Hat-based systems, but I myself wouldn't use it beyond that role.
My own cfengine configuration consists of roughly 17,000 lines and a couple thousand files that are pushed out to various systems (in many cases I push out entire config files rather than having cfengine edit them inline).
It takes some time to get ramped up (I've been working with cfengine for many years) but once you're there life is a lot easier. It probably took me a good two years of learning, a lot of it revolving around changing the way you think - how can concept X be applied in a more generic fashion to adapt dynamically to more systems automatically, for example. Such as defining a dynamic class so that when you build a new server it automatically gets everything it needs without having to go touch your policy files.
nate
nate wrote:
What good is a configuration tool if it can't handle a change in NIC setup? That's really about the only thing troublesome enough to do manually that it's worth more automation than a shell loop of ssh commands.
Just wondering, what kind of NIC setup? In the hundreds of systems I have managed I've never had to change the default NIC settings. If you mean interface (IP/etc.) setup then that could be an issue; for me, I have a script that grabs the MAC addresses and serial numbers and polls a web server for config files associated with them to configure the interfaces at system installation (I haven't had to change them post-install; I prefer just to re-install if the system is being re-purposed).
Most of our machines have 5 or so NICs, each connected to special purpose subnets. And even the ones that only need 1 or 2 connections will have the same physical setup so the servers are reusable.
Exactly - and remote 'hands on' support generally won't know which NIC is which, making this fairly problematic. And you can't just clone setups because the copies won't work with different MAC addresses.
If your setup is simple, e.g. one network, what I do is bond all of the interfaces into a single bond in active/passive mode; that makes all of the NICs available for the same purpose, with no need to know what is where. If the system needs to access another part of the network, that is handled via routing, not via a physical connection.
It's not simple. Some of the networks will have multicast data feeds, others have backend data, admin access, or are public facing. So, I need to configure the correct addressing and routes for each.
If you have an issue where you need to change a NIC's duplex setting because of a flawed switch, I'd suggest you look at replacing your switches (at least going forward).
Of course, but that's the point. If you've had old Cisco switches that didn't auto-negotiate well, you'll have all of the connected equipment set to forced full duplex. Then when you replace the switch you have to undo that - probably one subnet at a time. How do you manage real-world things like that with a configuration tool?
I've only had to screw with the duplex setting on a couple of occasions, about 5-6 years ago, with really old HP big iron. Hundreds of x86 boxes and different switch types/models/vendors later, I've never had a problem.
OK, but it's configuration, and it affects every piece of equipment once if you start with older infrastructure.
small sets. Do any of the tools make this easy? That's the main reason I haven't used OCSinventory's deployment mechanism even though its cross-platform capabilities are appealing in a mixed environment.
Define easy: in cfengine and puppet (I'm sure, though I've never used it) you can define a class of systems and roll the change out to that class.
Easier than an ssh loop that does a 'yum update xxx' or similar command across a set of machines.
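i.e. something on the order of this (assuming key-based root ssh and a plain host list; the file and package names are just examples):

  for h in $(cat web-tier.hosts); do
      ssh -o BatchMode=yes root@"$h" 'yum -y update httpd'
  done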
OCS really is a poor management system IMO; it's ok for inventory but the rest is crap.
Yes, but what else works cross-platform? I'm toying with the idea of using its agent to run a command, but running the agent via ssh or winexec/psexec (windows) to control the timing.
Can't speak for spacewalk; it sounds like a decent inventory/installation system for Red Hat-based systems, but I myself wouldn't use it beyond that role.
I can't quite deal with the idea of needing to abstract OS commands and doing it in a way that still only works with one OS. Why not either just automate the actual commands you need to run, or fix the commands in the first place if they are so bad that you have to abstract them into some new language? And RHEL/CentOS boxes are a small part of the operation at the moment.
My own cfengine configuration consists of roughly 17,000 lines and a couple thousand files that are pushed out to various systems (in many cases I push out entire config files rather than having cfengine edit them inline).
And that's supposed to be the easy way?
It takes some time to get ramped up (I've been working with cfengine for many years) but once you're there life is a lot easier. It probably took me a good two years of learning, a lot of it revolving around changing the way you think - how can concept X be applied in a more generic fashion to adapt dynamically to more systems automatically, for example. Such as defining a dynamic class so that when you build a new server it automatically gets everything it needs without having to go touch your policy files.
Could you switch arbitrary boxes to windows or some other OS without changing what the operators see? If you are still tied to the arcana of the underlying system - and vulnerable to its changes, what does this get you?
On Wed, Nov 4, 2009 at 1:07 PM, Les Mikesell lesmikesell@gmail.com wrote:
Yes, but what else works cross-platform? I'm toying with the idea of using its agent to run a command, but running the agent via ssh or winexec/psexec (windows) to control the timing.
Puppet works across Linux / Windows / Mac platforms. It can do more on Linux and Mac than it can on Windows, but it's at least capable of causing arbitrary commands to be executed on Windows.
Les Mikesell wrote:
Most of our machines have 5 or so NICs, each connected to special purpose subnets. And even the ones that only need 1 or 2 connections will have the same physical setup so the servers are reusable.
Trunk all the ports and use VLANs?
Of course, but that's the point. If you've had old Cisco switches that didn't auto-negotiate well, you'll have all of the connected equipment set to forced full duplex. Then when you replace the switch you have to undo that - probably one subnet at a time. How do you manage real-world things like that with a configuration tool?
Set the new switches to be forced full duplex too, and go in and fix the systems with a script or by hand, or just rebuild them (very few of my systems have valuable data on their local drives; everything is stored on or transferred to centralized storage).
Easier than an ssh loop that does a 'yum update xxx' or similar command across a set of machines.
Depends on your needs; for me an ssh loop wouldn't cut it. It works ok on a tiny scale. Having a management system like puppet or cfengine also makes sure the state is kept the same: if someone goes in and changes the passwd file or overwrites ntp.conf, cfengine reverts the change within the hour.
I can't quite deal with the idea of needing to abstract OS commands and doing it in a way that still only works with one OS. Why not either just automate the actual commands you need to run, or fix the commands in the first place if they are so bad that you have to abstract them into some new language? And RHEL/CentOS boxes are a small part of the operation at the moment.
It's not just commands, it's configuration as well, configuration that is different depending on the system's purpose, what data center it's located in, what time of day it is, what applications it's responsible for.
And that's supposed to be the easy way?
It is when you have as many moving parts as we do, yes. I've tried other methods; for years I was using the ssh loop route because we were so slammed we had no time to learn a proper way to manage systems. Once we learned the proper way, things became soooo much easier.
You wouldn't believe the lengthy list of commands needed to build a system from the ground up before I rewrote everything so it is automated. And that lengthy list of commands only covered a couple of particular types of systems; the rest were built by hand from memory. I had to go in, learn how everything was set up, and automate it.
Right now I have roughly 150 classes of systems; each defines a subset of the infrastructure that gets a particular type of configuration enforced on it. Most of those systems are added into the classes dynamically by their host name or other system properties (such as a script to detect whether or not a system is a VM).
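The VM check can be as dumb as a DMI sniff - a hedged sketch, since the product strings depend on your hypervisors:

  #!/bin/sh
  # print a class name the config tool can key off; strings vary per hypervisor
  if dmidecode -s system-product-name 2>/dev/null | grep -qiE 'vmware|virtual|kvm|xen'; then
      echo is_vm
  else
      echo is_physical
  fi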
The head QA guy here was able to build his new VM-based environment in about two weeks (roughly 70 VMs) because of this; previously, he said, it would have taken many, many months.
Could you switch arbitrary boxes to windows or some other OS without changing what the operators see? If you are still tied to the arcana of the underlying system - and vulnerable to its changes, what does this get you?
cfengine runs on many systems, Windows included. I haven't run it myself on anything other than Linux. I don't get involved with Windows stuff; it keeps my stress levels lower.
http://cfengine.com/pages/nova_supported_os
I use an older version of cfengine, not the Nova stuff.
So it all depends on what your needs are; certainly puppet- and cfengine-type tools are not for everyone, and I wouldn't even recommend them for small deployments (say, fewer than 50 servers). If you're one person responsible for servers numbering in the hundreds, or your team is responsible for servers numbering in the thousands, then tools like these are priceless.
nate
nate wrote:
Most of our machines have 5 or so NICs, each connected to special purpose subnets. And even the ones that only need 1 or 2 connections will have the same physical setup so the servers are reusable.
Trunk all the ports and use VLANs?
This might be possible now - it wasn't when the infrastructure was built and it still seems like a bad idea to throw 60Mb+ multicast feeds onto the same physical interface with anything else.
Of course, but that's the point. If you've had old Cisco switches that didn't auto-negotiate well, you'll have all of the connected equipment set to forced full duplex. Then when you replace the switch you have to undo that - probably one subnet at a time. How do you manage real-world things like that with a configuration tool?
Set the new switches to be forced full duplex too, and go in and fix the systems with a script or by hand, or just rebuild them (very few of my systems have valuable data on their local drives; everything is stored on or transferred to centralized storage).
But will the tool do these changes for me?
You wouldn't believe the lengthy list of commands needed to build a system from the ground up before I rewrote everything so it is automated.
Sure, but after doing one right, clonezilla can give you a thousand just like it without caring how it got that way. I agree it's ugly, but it also doesn't depend on those underlying commands being repeatable or on the OS that they ran on.
Les Mikesell wrote:
But will the tool do these changes for me?
The tool will do anything you tell it to; it's a generic tool. You could define a class that runs a script to detect the network settings - if the NIC is forced to full duplex it would return true - which would then trigger another command to run or config files to get copied; if configs are copied, after that it could execute another command (perhaps snmpset to change the switch config or something).
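For example, the detection script itself can be tiny (the interface name is hard-coded here purely for illustration):

  #!/bin/sh
  # exit 0 (class true) if eth0 has auto-negotiation forced off
  ethtool eth0 | grep -q 'Auto-negotiation: off'

and the triggered fix can be equally small:

  ethtool -s eth0 autoneg on
  # assumes an ETHTOOL_OPTS line already exists in the ifcfg file
  sed -i 's/^ETHTOOL_OPTS=.*/ETHTOOL_OPTS="autoneg on"/' \
      /etc/sysconfig/network-scripts/ifcfg-eth0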
Sure, but after doing one right, clonezilla can give you a thousand just like it without caring how it got that way. I agree it's ugly, but it also doesn't depend on those underlying commands being repeatable or on the OS that they ran on.
Yeah, the biggest issue I have with images is hardware types; if all your boxes are the same then it can be ok, but if you have varying types of hardware it can get messy, of course.
nate
nate wrote:
But will the tool do these changes for me?
The tool will do anything you tell it to; it's a generic tool.
OK, but if I have to write the script, why wouldn't I just write the script my way and automate it over ssh which already works instead of learning some new language and having to install some new agent everywhere to run it?
You could define a class that runs a script to detect the network settings - if the NIC is forced to full duplex it would return true - which would then trigger another command to run or config files to get copied; if configs are copied, after that it could execute another command (perhaps snmpset to change the switch config or something).
It's next to impossible to get or set a duplex setting via snmp. And non-trivial to figure out what switch port is connected to what device - OpenNMS does a reasonable job but if you activate all of its checks it can kill things that have full bgp routes.
Sure, but after doing one right, clonezilla can give you a thousand just like it without caring how it got that way. I agree it's ugly, but it also doesn't depend on those underlying commands being repeatable or on the OS that they ran on.
Yeah, the biggest issue I have with images is hardware types; if all your boxes are the same then it can be ok, but if you have varying types of hardware it can get messy, of course.
It's messy any way you look at it. We have a few different types of hardware and have to spend some time tuning the base install for new ones but that's something manageable.
Les Mikesell wrote:
OK, but if I have to write the script, why wouldn't I just write the script my way and automate it over ssh which already works instead of learning some new language and having to install some new agent everywhere to run it?
If you're just interested in doing one thing then you wouldn't... the thread seemed to dive into broader topics than one particular issue.
It's next to impossible to get or set a duplex setting via snmp. And non-trivial to figure out what switch port is connected to what device - OpenNMS does a reasonable job but if you activate all of its checks it can kill things that have full bgp routes.
LLDP is supposed to address that, or CDP if you're Cisco, EDP if you're Extreme, etc. I haven't had the need to go to this level myself, though a former co-worker of mine who is a former Amazonian said they used CDP all the time on their systems to detect what switch/port/etc. their systems were on (5+ years ago, not sure what they do now).
http://openlldp.sourceforge.net/
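FWIW, you can also just sniff a CDP announcement off the wire to see which switch and port a NIC is plugged into - a rough sketch that needs a CDP-speaking switch and up to a minute of patience:

  # capture one CDP frame on eth0; -v decodes the Device-ID / Port-ID fields
  tcpdump -nn -v -s 1500 -c 1 -i eth0 'ether[20:2] == 0x2000'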
nate
nate wrote:
OK, but if I have to write the script, why wouldn't I just write the script my way and automate it over ssh which already works instead of learning some new language and having to install some new agent everywhere to run it?
If you're just interested in doing one thing then you wouldn't... the thread seemed to dive into broader topics than one particular issue.
Yes, but if you have to manage the details anyway I'm having trouble seeing the value of an abstraction - and having to understand both the details and the abstraction. Do the tools give you an easy way to reliably repeat someone else's detailed process without having to understand it?
It's next to impossible to get or set a duplex setting via snmp. And non-trivial to figure out what switch port is connected to what device - OpenNMS does a reasonable job but if you activate all of its checks it can kill things that have full bgp routes.
LLDP is supposed to address that, or CDP if you're Cisco, EDP if you're Extreme, etc. I haven't had the need to go to this level myself, though a former co-worker of mine who is a former Amazonian said they used CDP all the time on their systems to detect what switch/port/etc. their systems were on (5+ years ago, not sure what they do now).
I think scaling is the general topic here. I don't scale well enough to deal with learning a new language/protocol/toolset for every single configuration setting - and especially with variations per vendor. But those are the real-world configuration problems.
Les Mikesell wrote:
nate wrote:
Yes, but if you have to manage the details anyway I'm having trouble seeing the value of an abstraction - and having to understand both the details and the abstraction. Do the tools give you an easy way to reliably repeat someone else's detailed process without having to understand it?
It's kind of hard to put into words, I admit. This article may help:
http://www.linux-magazine.com/w3/issue/101/Cfengine.pdf
As far as repeating someone else's detailed process goes, you have to convert the process into the cfengine (or puppet) language.
You can see an example here; this is a pretty old config from my last company - http://portal.aphroland.org/~aphro/oracle_server.conf
and another: http://portal.aphroland.org/~aphro/mysql_server.conf
My configurations have advanced significantly since then -
http://portal.aphroland.org/~aphro/redhat.conf (the above config is automatically applied if the system is detected as being Red Hat-based, be it Fedora, CentOS or RHEL)
You can probably get the idea that trying to accomplish something similar using the basic traditional methods winds up becoming unmanageable pretty quickly.
CFengine (and puppet too, I'm sure) defines many classes on the fly, allowing you to do dynamic things like configs based on IP subnet, host name, domain, time of day, day of week, other date parameters, tons of variations on the OS type, 32/64-bit, etc.
I think scaling is the general topic here. I don't scale well enough to deal with learning a new language/protocol/toolset for every single configuration setting - and especially with variations per vendor. But those are the real-world configuration problems.
Which is probably why it's pretty common for organizations to standardize on a subset of infrastructure vendors, for exactly the problems you raise. And oftentimes there are different people or teams responsible for different operating systems; Linux/Unix folk often don't touch Windows and vice versa, and big-iron people often don't touch either.
Myself, I am focused on Linux of course. I support five different systems at the moment: CentOS 4 and 5, 32- and 64-bit, and Fedora 8 32-bit (NTP servers only). I haven't deployed any new CentOS 4 systems in a while, but I still need to make sure that all of the important software I push has builds for 4 or 5 different versions and that the appropriate version gets installed depending on the OS.
At my last company I supported about eight different flavors; combine that with the fact that they ran Ruby on Rails and that I had to custom-build a couple of dozen Ruby modules into RPMs, and it was a headache. And no, I didn't trust the Ruby auto-installer stuff; I wanted to ensure the same version was installed everywhere. Too many times the developers relied on the community stuff and the site would be down on occasion, or things would get automatically upgraded, which broke stuff, etc.
I have 107 source rpms at the moment that I build for all of my systems.
I feel for ya if you have to support both Windows and Linux; I used to have to do that myself, but fortunately got out of that rut years ago. People don't even come to me with Windows questions anymore because I'm so out of touch with it. Only so many brain cells, and I'd rather spend them on more valuable things (networking, storage, virtualization, HA, scalability, etc.).
nate
nate wrote:
I feel for ya if you have to support both Windows and Linux; I used to have to do that myself, but fortunately got out of that rut years ago.
There are things that just have to work together and across platforms, like inventory, monitoring, and capacity tracking, so I automatically see it as going in the wrong direction to even consider something that locks you into a single OS or vendor. I'd like to promote greater Linux use, but can't unless the tools interoperate well, and so far ocsinventory and clonezilla are about the only ones that do.
People don't even come to me with Windows questions anymore because I'm so out of touch with it. Only so many brain cells, and I'd rather spend them on more valuable things (networking, storage, virtualization, HA, scalability, etc.).
Well, there's always java, in spite of the damage Red Hat has done to it by shipping a broken imitation for years. Maybe hardware has gotten to the point where the overhead doesn't matter.
Les wrote:
nate wrote:
People don't even come to me with Windows questions anymore because I'm so out of touch with it. Only so many brain cells, and I'd rather spend them on more valuable things (networking, storage, virtualization, HA, scalability, etc.).
Well, there's always java, in spite of the damage Red Hat has done to it by shipping a broken imitation for years. Maybe hardware has gotten to the point where the overhead doesn't matter.
No. It matters. And I don't care what version of java, I really dislike it, because *it's* broken; or, rather, it failed at what it was supposed to do: a) solve the software backlog, and b) it supposedly guaranteed no null pointer references, and useful error messages.
It did not solve the backlog, and after the huuuge stack traces and usually unhelpful error messages.... And it eats memory, including the Sun implementation. It's just Pascal w/ p-code, revived.
mark "java, why did it have to be java?"
m.roth@5-cent.us wrote:
People don't even come to me with Windows questions anymore because I'm so out of touch with it. Only so many brain cells, and I'd rather spend them on more valuable things (networking, storage, virtualization, HA, scalability, etc.).
Well, there's always java, in spite of the damage Red Hat has done to it by shipping a broken imitation for years. Maybe hardware has gotten to the point where the overhead doesn't matter.
No. It matters. And I don't care what version of java, I really dislike it, because *it's* broken; or, rather, it failed at what it was supposed to do: a) solve the software backlog,
You can't do that with companies shipping broken or non-standard implementations. There's not much reason to continue that now.
and b) it supposedly guaranteed no null pointer references, and useful error messages.
Ummm, programmers are clever enough to work around guarantees in most languages.
It did not solve the backlog, and after the huuuge stack traces and usually unhelpful error messages.... And it eats memory, including the Sun implementation. It's just Pascal w/ p-code, revived.
Memory is cheap - that's the last thing to consider these days. Being able to farm out arbitrary chunks of processing across platforms is priceless... But you do have to give up the unix-y idea that it is quick and cheap to start a new process.
Look at stuff like OpenNMS with distributed monitors, or Hudson as a distributed build/test platform, or lucene/solr, or the pentaho analysis tools. How else can you do any of those things?
On Wednesday, 04.11.2009, at 23:04 +0100, Les Mikesell wrote:
m.roth@5-cent.us wrote:
People don't even come to me with Windows questions anymore because I'm so out of touch with it. Only so many brain cells, and I'd rather spend them on more valuable things (networking, storage, virtualization, HA, scalability, etc.).
Well, there's always java, in spite of the damage Red Hat has done to it by shipping a broken imitation for years. Maybe hardware has gotten to the point where the overhead doesn't matter.
No. It matters. And I don't care what version of java, I really dislike it, because *it's* broken; or, rather, it failed at what it was supposed to do: a) solve the software backlog,
You can't do that with companies shipping broken or non-standard implementations. There's not much reason to continue that now.
I work in a java shop and I really think you both are wrong. We do some pretty amazing things with it, and openjdk in CentOS (which I think you were referring to) is working quite well for us.
Chris
Christoph Maser wrote:
I work in a java shop and I really think you both are wrong. We do some pretty amazing things with it, and openjdk in CentOS (which I think you were referring to) is working quite well for us.
For me it's never been an issue; I've been in java shops since pre-RHEL, and we've always installed 3rd-party JDKs. It's not that hard - they come in RPM format, at least the Sun JDK and BEA JRockit. It's by no means the only 3rd-party RPM that we use.
Short of those dropping off the face of the planet, I myself have no reason to try anything else.
nate
nate wrote:
Christoph Maser wrote:
I work in a java shop and I really think you both are wrong. We do some pretty amazing things with it, and openjdk in CentOS (which I think you were referring to) is working quite well for us.
For me it's never been an issue; I've been in java shops since pre-RHEL, and we've always installed 3rd-party JDKs. It's not that hard - they come in RPM format, at least the Sun JDK and BEA JRockit. It's by no means the only 3rd-party RPM that we use.
Short of those dropping off the face of the planet, I myself have no reason to try anything else.
Sure, if you are 'in a java shop' you'll have someone around that knows how (and more importantly why) to find a real version and nuke the broken one supplied in your PATH. But languages don't get popular by only being used in places that already know how to use them. Imagine if every free OS distribution had included a broken copy of bash and perl and maybe even C and internally modified their code so things still mostly worked. What kind of effect would that have had on new people learning to program? That's the kind of damage that's been done to java - which is ironic because it was designed to be perfectly portable.
Les Mikesell wrote:
already know how to use them. Imagine if every free OS distribution had included a broken copy of bash and perl and maybe even C and internally modified their code so things still mostly worked.
Were you around back in the late 90s when Red Hat shipped a broken gcc? :) Even today Red Hat seems to have the biggest mind share and perhaps even market share, so even if nobody else shipped the broken stuff, that left a very large chunk of users impacted by it - and vendors as well, since they built stuff to run on Red Hat.
As for java I suppose having a working java binary in the base install certainly would help a bit, but for me the bulk of my work with java has been with Tomcat and BEA Weblogic. I'm not even sure today if tomcat is available in the base distros, and certainly Weblogic is not since it's a big fat expensive piece of shit, I mean piece of software.
And even if tomcat were included, it's not exactly the easiest thing to use out of the box; even after almost 7 years of using tomcat I still find regular old Apache 10x easier to manage, so I lean towards more basic solutions when they present themselves.
Java for other uses has, I believe, been hindered primarily by performance concerns rather than by a lack of good binaries in the default distributions. It has a big stigma around it for good reason: JVM startup time isn't exactly fast, it tends to have a large memory footprint, and I think it wasn't until Java 1.5 that you had the ability to share a heap between multiple apps (not sure what the right terminology is) - being able to attach an app to an already running "common" VM. Maybe not, but I think I read something about that a few years ago.
Even though I do have the knowledge to install the "right" JVM, I tend to avoid java on my own systems wherever possible. It certainly has its use cases, but I don't see it as something that should (or could) replace something like C or perl etc. on a broad scale (at least not yet).
The thing I dislike most about Java though isn't java itself, it's JMX. But that's another topic..
But I do agree that getting java in with a better license at a much earlier time would have helped; I'm just not sure how much.
nate
Les Mikesell wrote:
already know how to use them. Imagine if every free OS distribution had included a broken copy of bash and perl and maybe even C and internally modified their code so things still mostly worked.
Were you around back in the late 90s when Red Hat shipped a broken gcc? :) Even today Red Hat seems to have the biggest mind share and perhaps
That was 2.96, I think? <snip>
And even if tomcat were included, it's not exactly the easiest thing to use out of the box; even after almost 7 years of using tomcat I still find regular old Apache 10x easier to manage, so I lean towards more basic solutions when they present themselves.
Yeah - tomcat eats memory, and that's what I was thinking of when I mentioned java errors resulting in 50, 100, 200 lines of useless stack trace. <snip> mark
On Thu, 5 Nov 2009, m.roth@5-cent.us wrote:
Were you around back in the late 90s when Red Hat shipped a broken gcc? :) Even today Red Hat seems to have the biggest mind share and perhaps
That was 2.96, I think?
or perhaps you refer to egcs, which was an earlier effort to poke a stick in the lagging FSF effort
Not to troll here, but the argument may be made that the 2.96-denominated gcc variant was more stable (less 'broken') than its upstream at a like level. Bero is long gone from Red Hat of course, but I have an archival copy of his statement here: http://www.owlriver.com/tips/gcc-296-bero/
May we have a new Subject on this drifted thread if it continues, please?
-- Russ herrold
nate wrote:
already know how to use them. Imagine if every free OS distribution had included a broken copy of bash and perl and maybe even C and internally modified their code so things still mostly worked.
Were you around back in the late 90s when Red Hat shipped a broken gcc? :)
Pretty much everything was broken back then. Wade through the changes in sendmail/bind/nfs, etc. And hardly any c compiler would take exactly the same input as any other. But those things got fixed.
As for java I suppose having a working java binary in the base install certainly would help a bit,
Or NOT having a non-standard thing called java... Or cooperating with the jpackage group so their rpms would drop in. Or working with Sun to have their RPM land where packaged things are expected to land.
but for me the bulk of my work with java has been with Tomcat and BEA Weblogic. I'm not even sure today if tomcat is available in the base distros,
Yes tomcat is there now - basically the jpackage concept, but instead of embracing the jpackage repository it is an incompatibly copied version (probably remnants of work wasted on trying to make it run with their earlier non-standard flavor of something-like-java).
And even if tomcat were included, it's not exactly the easiest thing to use out of the box; even after almost 7 years of using tomcat I still find regular old Apache 10x easier to manage, so I lean towards more basic solutions when they present themselves.
Maybe you haven't looked recently to see how easy it could have been all along (and this makes my point about driving everyone away from using java). Take a stock CentOS 5.x install with the openjdk and tomcat5 packages and drop some 3rd-party war files in place (like Hudson or Sun's OpenGrok). If you don't need to overwrite the root directory, you'll have things working in minutes.
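Roughly (package names as they appear in CentOS 5.x-era repos; the paths may differ slightly):

  yum -y install java-1.6.0-openjdk tomcat5 tomcat5-webapps
  cp hudson.war /var/lib/tomcat5/webapps/
  service tomcat5 start && chkconfig tomcat5 on
  # then point a browser at http://yourhost:8080/hudson/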
But by now, no application developer expects an end user to have a working web container as part of their distribution so everything includes an embedded jetty anyway.
Java for other uses has, I believe, been hindered primarily by performance concerns rather than by a lack of good binaries in the default distributions.
Ummm, not working at all is a much worse problem than starting up slowly.
It has a big stigma around it for good reason: JVM startup time isn't exactly fast, it tends to have a large memory footprint, and I think it wasn't until Java 1.5 that you had the ability to share a heap between multiple apps (not sure what the right terminology is) - being able to attach an app to an already running "common" VM. Maybe not, but I think I read something about that a few years ago.
Those are the kinds of problems that people figure out how to work around when they have a working tool in front of them. And they've been worked out to the point where java is a reasonable language to run in a cell phone.
Even though I do have the knowledge to install the "right" JVM, I tend to avoid java on my own systems wherever possible. It certainly has its use cases, but I don't see it as something that should (or could) replace something like C or perl etc. on a broad scale (at least not yet).
Needing the 'right' JVM, or knowing how/where/why to install it, should never have been required. And if standard JVMs had been available in all the places that C and perl were, not only would java be equally important as a language, it would mean you didn't have to care at all about the OS running it.
The thing I dislike most about Java though isn't java itself, it's JMX. But that's another topic..
But I do agree that getting java in with a better license at a much earlier time would have helped; I'm just not sure how much.
Let's see... Netscape was shipped with an approximately equivalent license back then. How popular did HTML turn out to be because you could count on it to work across platforms?
On Thu, Nov 5, 2009 at 7:37 AM, nate centos@linuxpowered.net wrote:
Christoph Maser wrote:
I work in a java shop and I really think you both are wrong. We do some pretty amazing things with it, and openjdk in CentOS (which I think you were referring to) is working quite well for us.
For me it's never been an issue; I've been in java shops since pre-RHEL, and we've always installed 3rd-party JDKs. It's not that hard - they come in RPM format, at least the Sun JDK and BEA JRockit. It's by no means the only 3rd-party RPM that we use.
Short of those dropping off the face of the planet, I myself have no reason to try anything else.
When the developers use non-Sun naming conventions, their apps work on either the open-source or the Sun JRE. So, developers, please test on both java runtimes.
I am spoiled by yum install, and Sun does not have a repository even though the frequent security updates call for one. So I end up having to install the Sun JRE by hand on workstations. Which of the solutions mentioned in this thread is proven to update a fleet of workstations with the latest Sun JRE?
homemade yum repo, clusterssh, opennms, cacti, puppet, ocs, spacewalk
Rob Townley wrote:
homemade yum repo, clusterssh, opennms, cacti, puppet, ocs, spacewalk
Puppet could, and I suppose clusterssh too, cfengine as well. I wouldn't use spacewalk or ocs for it. Cacti and opennms are monitoring suites, not management ones. A yum repo is just a repo; it won't install anything by itself, but if you had a daemon or a script/cron to run you could use it to upgrade systems automatically with yum.
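e.g. nothing fancier than a cron fragment once the repo is sorted out - a sketch, with 'sun-jre' standing in for whatever your rebuilt package is actually called:

  # /etc/cron.d/jre-update
  30 2 * * * root yum -y update sun-jre >> /var/log/jre-update.log 2>&1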
nate
Rob Townley wrote:
I am spoiled by yum install, and Sun does not have a repository even though the frequent security updates call for one. So I end up having to install the Sun JRE by hand on workstations. Which of the solutions mentioned in this thread is proven to update a fleet of workstations with the latest Sun JRE?
homemade yum repo, clusterssh, opennms, cacti, puppet, ocs, spacewalk
The 'right' way to do this used to be, and probably still is, to build an RPM that matches the RH alternatives conventions, put it in your local yum repository which is configured on all of your machines, then use puppet, ssh or whatever you'd normally use to trigger an install or update. I'm not sure if the jpackage repo still has the nosrc package to build this rpm or not.
But, if openjdk 1.6 works for your apps, it is already available as a yum install.
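A minimal sketch of that workflow - the repo path, host list, package file name and alternatives target below are all invented and depend on how your rebuilt package registers itself:

  # on the repo box: drop the rebuilt rpm in and regenerate the metadata
  cp jdk-1.6.0_17-fcs.x86_64.rpm /srv/repo/local/
  createrepo /srv/repo/local/

  # on the clients (already pointed at the 'local' repo), via puppet/ssh/etc.:
  for h in $(cat workstations.txt); do
      ssh root@"$h" 'yum -y --enablerepo=local update jdk &&
          alternatives --set java /usr/lib/jvm/jre-1.6.0-sun/bin/java'
  done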
Christoph Maser wrote:
On Wednesday, 04.11.2009, at 23:04 +0100, Les Mikesell wrote:
m.roth@5-cent.us wrote:
People don't even come to me with Windows questions anymore because I'm so out of touch with it. Only so many brain cells, and I'd rather spend them on more valuable things (networking, storage, virtualization, HA, scalability, etc.).
Well, there's always java, in spite of the damage Red Hat has done to it by shipping a broken imitation for years. Maybe hardware has gotten to the point where the overhead doesn't matter.
No. It matters. And I don't care what version of java, I really dislike it, because *it's* broken; or, rather, it failed at what it was supposed to do: a) solve the software backlog,
You can't do that with companies shipping broken or non-standard implementations. There's not much reason to continue that now.
I work in a java shop and I really think you both are wrong. We do some pretty amazing things with it, and openjdk in CentOS (which I think you were referring to) is working quite well for us.
This wasn't a complaint about openjdk. But how long has that been included in RHEL/Centos and how many years was something else called java shipped that didn't actually run java programs? And how difficult was it to get a properly packaged working version of java installed if/when you understood why your programs didn't run? I believe java has been set back enormously because of that.
On Thu, 5 Nov 2009, Christoph Maser wrote:
You can't do that with companies shipping broken or non-standard [java] implementations. There's not much reason to continue that now.
I work in a java shop and I really think you both are wrong. We do some pretty amazing things with it, and openjdk in CentOS (which I think you were referring to) is working quite well for us.
Check the upstream open bugs on openjdk ... I have a clear reproducer of an error and of non-conformance with Sun's conformance test suite, but after doing the count of unaddressed matters, I won't be bothering to file it ;(
-- Russ herrold
R P Herrold wrote:
On Thu, 5 Nov 2009, Christoph Maser wrote:
You can't do that with companies shipping broken or non-standard [java] implementations. There's not much reason to continue that now.
I work in a java shop and I really think you both are wrong. We do some pretty amazing things with it, and openjdk in CentOS (which I think you were referring to) is working quite well for us.
Check the upstream open bugs on openjdk ... I have a clear reproducer of an error and of non-conformance with Sun's conformance test suite, but after doing the count of unaddressed matters, I won't be bothering to file it ;(
Does it work on that 'other' java version? And if not, have you filed the bug against it?
On Thu, 5 Nov 2009, Les Mikesell wrote:
Does it work on that 'other' java version? And if not, have you filed the bug against it?
It works on Sun's Java implementation if that is what you are asking [as such I did not run the conformance suite against Sun's kit] -- dunno (and don't care) about BEA's or IBM's, as openjdk is the target destination for migration.
-- Russ herrold
R P Herrold wrote:
Does it work on that 'other' java version? And if not, have you filed the bug against it?
It works on Sun's Java implementation if that is what you are asking [as such I did not run the conformance suite against Sun's kit] -- dunno (and don't care) about BEA's or IBM's, as openjdk is the target destination for migration.
No, I meant the thing that Red Hat has shipped for years as /bin/java.
Les Mikesell wrote:
Does it work on that 'other' java version?
It works on Sun's Java implementation if that is what you are asking
No, I meant the thing that Red Hat has shipped for years as /bin/java.
Oh, you mean that thing we uninstall immediately on building a new CentOS box, so we can install an actual JVM in its place, right?
On Thu, 5 Nov 2009, Les Mikesell wrote:
No, I meant the thing that Red Hat has shipped for years as /bin/java.
perhaps: /usr/bin/java ...
That is of late an alternatives link, as I recall -- dereferencing it in the usual case in recent history, the GNU compiler collection's 'javac' as shipped in CentOS does not meet my needs either. I did not run the test suite there, and so dunno beyond that.
-- Russ herrold
Les Mikesell wrote:
There are things that just have to work together and across platforms, like the inventory, monitoring, and capacity tracking, so I automatically see it as a step in the wrong direction to even consider something that locks you into a single OS or vendor. I'd like to promote greater Linux use, but I can't unless the tools interoperate well, and so far ocsinventory and clonezilla are about the only ones that do.
It'd be nice if there were an integrated cross-platform monitoring/management package that worked well. So many have tried; I don't think any have succeeded. It's just too complex a task.
Our monitoring is primarily nagios+cacti which are maintained by hand currently. Myself I have literally tens of thousands of hours invested in monitoring scripts mostly integrating with RRDTool for performance and trending analysis. Everything from basic CPU/IO/memory to load balancers, switches, PDUs, databases, storage arrays etc.
Windows stuff on the other hand is more complicated. I tied in some NSclient/perfmon stuff along with SNMP(+snmp informant) and get a few dozen stats off of our MSSQL servers, honestly can't rate the accuracy, so won't stake anything on those results. They represent a tiny minority of our stuff though, I think we have more load balancers than windows boxes..well almost.
Cacti does suck, but it does have a pretty nice UI for end users as far as viewing the data. Its back-end scalability is nonexistent at the moment. My more recent scripts rely on updating RRDs outside of cacti and just pointing cacti at them for the presentation layer. My main cacti server collects nearly 16,000 data points a minute, running at ~20% CPU. 6500 of those come from my storage array (they have their own tool but I like mine more). The main downside is that it's a very labor-intensive process, but I haven't come across anything better yet. Some tools are better in some areas, others in others. I even wrote my own originally back in 2003.
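Roughly what I mean by multiple data points per file, as a made-up rrdtool example (names, intervals and retention are invented):

# one RRD, several related data sources, one update per interval
rrdtool create webstats.rrd --step 60 \
    DS:cpu_user:GAUGE:120:0:U \
    DS:cpu_sys:GAUGE:120:0:U \
    DS:io_read:COUNTER:120:0:U \
    DS:io_write:COUNTER:120:0:U \
    RRA:AVERAGE:0.5:1:10080 \
    RRA:AVERAGE:0.5:60:8760

# a single update call pushes all four values, in the order the DS lines were defined
rrdtool update webstats.rrd N:12.5:3.1:48231:9920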
The original implementation of cacti struggled to keep roughly 3000 data points updated every 5 minutes, and most of the stats were not accurate. So my system is collecting 26 times more data, runs 10x faster, and can scale at least 2x higher than what it's on now without adding hardware, and best of all has truly accurate information.
I remember back in 2005 a company I was at was trying to deploy sitescope, and they kept saying how my graphs were so much better than what they could get out of that $100,000 product, at least at the time. I'm sure their stuff has gotten better, as has mine! They also tried deploying zabbix, and I think replaced nagios with it, but years after deployment even though they basically had a full time zabbix developer they were _still_ using my scripts and graphs for several key metrics on the system.
At some point maybe I'll get the time to re-visit a trending application; my key requirements would be to use an RRD back end, be able to store multiple data points in a single file, be able to use RRDs created by other applications, and have a nice UI for end users. And it would need to be scalable, to at least 20,000 updates a minute.
I can write/script all of the back end stuff myself but I'm no programmer so can't do the front end.
nate
nate wrote:
It'd be nice if there were an integrated cross-platform monitoring/management package that worked well. So many have tried; I don't think any have succeeded. It's just too complex a task.
So the things that are too complicated for the computers get done by hand...
At some point maybe I'll get the time to re-visit a trending application; my key requirements would be to use an RRD back end, be able to store multiple data points in a single file, be able to use RRDs created by other applications, and have a nice UI for end users. And it would need to be scalable, to at least 20,000 updates a minute.
I can write/script all of the back end stuff myself but I'm no programmer so can't do the front end.
Have you looked at OpenNMS? Its only scaling issue is how fast you can write the rrd files out. It can use a pure-java rrd implementation with a different file format (.jrb) or rrdtool if you prefer. There's a way to query the min/max/average for a time range via http if you want to gather some longer-term values for better-formatted trend watching or aggregate groups of related instances.
Les Mikesell wrote:
Have you looked at OpenNMS? Its only scaling issue is how fast you can write the rrd files out. It can use a pure-java rrd implementation with a different file format (.jrb) or rrdtool if you prefer. There's a way to query the min/max/average for a time range via http if you want to gather some longer-term values for better-formatted trend watching or aggregate groups of related instances.
Not in a few years. A lot of systems rely on storing only one or two data points in a file; is OpenNMS (still) like this? I store upwards of 20 points in a file, which drastically improves scalability but really breaks many of cacti's core assumptions that one data point is associated with one file.
It looks like OpenNMS has a "store by group" feature to group data points but no obvious information on what that feature does specifically and why it may not be enabled by default.
nate
nate wrote:
Have you looked at OpenNMS? Its only scaling issue is how fast you can write the rrd files out. It can use a pure-java rrd implementation with a different file format (.jrb) or rrdtool if you prefer. There's a way to query the min/max/average for a time range via http if you want to gather some longer-term values for better-formatted trend watching or aggregate groups of related instances.
Not in a few years. A lot of systems rely on storing only one or two data points in a file; is OpenNMS (still) like this? I store upwards of 20 points in a file, which drastically improves scalability but really breaks many of cacti's core assumptions that one data point is associated with one file.
It looks like OpenNMS has a "store by group" feature to group data points but no obvious information on what that feature does specifically and why it may not be enabled by default.
I think it is just that you have to modify the graph settings to match what happens to the data source - and there's no handy way to convert existing history. But, if you are interested you should probably ask on the OpenNMS list where someone who knows what they are doing might answer. There are a lot of people using it in large-scale setups.
It is a lot easier to try OpenNMS than it was a few years ago - now there is a yum repo that basically 'just works' with Centos, and there's quite a bit of development work happening on it.
Les Mikesell wrote:
I think it is just that you have to modify the graph settings to match what happens to the data source - and there's no handy way to convert existing history. But, if you are interested you should probably ask on the OpenNMS list where someone who knows what they are doing might answer. There are a lot of people using it in large-scale setups.
It is a lot easier to try OpenNMS than it was a few years ago - now there is a yum repo that basically 'just works' with Centos, and there's quite a bit of development work happening on it.
Added to my list of things to check out, at least this time I won't forget about it again. At this rate not till sometime early next year.
thanks
nate
On 11/04/2009 10:05 PM, nate wrote:
Our monitoring is primarily nagios+cacti which are maintained by hand currently. Myself I have literally tens of thousands of hours invested in monitoring scripts mostly integrating with RRDTool for performance and trending analysis.
You say a lot with just that statement there - it's a place we've all been at, and it's the one issue that some of these tools around *today* help with.
Essentially, every admin has been down the route of setting up a bunch of machines and then working away at them, investing large portions of time in regular admin tasks - like writing scripts to manage small bits of state, writing some sort of config rollouts, doing some post-install tests, etc. The list can go on and on. The important thing here really is that we've *all* done that - and a *large* portion of what we were trying to do was common in most scenarios. But there was never really any traction around any single community that would encourage people to come together, talk about these things, and then move on to creating tool sets that work for people.
To me, this is a major contribution by some of these tools today - spacewalk, puppet, cfengine, chef, bcfg2, slack: all becoming focal groups - even if they only address specific use-cases or only address certain mindsets / thought processes. The main thing is that people are talking, and what's coming from those talks is more capable and better-written tools, which now mean that it may no longer be necessary to spend those hours and hours working in a silo doing the sort of work that we were doing in the past. On the flip side, people argue that by doing the same level of work under the same conditions, people today are producing much better management systems for their own use and for their users.
For example, if the monitoring tool is unable to accept tasks from, and report progress to, a tool which in turn can be connected up to what the machine is actually supposed to be doing, it's a monitoring tool that I don't even want to consider using. I'd rather have something which can let me write a snippet like:
-------------
Machine of type webserver needs:
- packages httpd, mod_ssl
- monitoring for port :80 and :443
  + if not working, run scriptX; if still not working, notify remote monitoring, and remove from production pool
- dir /var/www/html should exist and if file /var/www/app/.TAG does not exist: notify {deploymentmachine} that {thismachine} needs app rollout
- if all is good, run pre-production tests; if all pass, get us / keep us in the production pool

Make machine1,machine2,machine3 a webserver
------------
The advantage of this is that various bits of the descriptive code can be reused in various options and scenarios. Compare that to having to go around to each machine and do things on each box, manually, every time.
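As a very rough puppet rendering of part of that wish list - only the package/service/directory bits; the monitoring and pool logic would be layered on top, and all names here are invented:

# hypothetical webserver.pp; try it standalone with "puppet apply webserver.pp"
# ("puppet webserver.pp" on the 0.25.x series)
cat > webserver.pp <<'EOF'
class webserver {
  package { ['httpd', 'mod_ssl']:
    ensure => installed,
  }

  service { 'httpd':
    ensure  => running,
    enable  => true,
    require => Package['httpd'],
  }

  file { '/var/www/html':
    ensure => directory,
  }
}

node 'machine1', 'machine2', 'machine3' {
  include webserver
}
EOF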
Karanbir Singh wrote:
Our monitoring is primarily nagios+cacti which are maintained by hand currently. Myself I have literally tens of thousands of hours invested in monitoring scripts mostly integrating with RRDTool for performance and trending analysis.
You say a lot with just that statement there - it's a place we've all been at, and it's the one issue that some of these tools around *today* help with.
Essentially, every admin has been down the route of setting up a bunch of machines and then working away at them, investing large portions of time in regular admin tasks - like writing scripts to manage small bits of state, writing some sort of config rollouts, doing some post-install tests, etc. The list can go on and on. The important thing here really is that we've *all* done that - and a *large* portion of what we were trying to do was common in most scenarios. But there was never really any traction around any single community that would encourage people to come together, talk about these things, and then move on to creating tool sets that work for people.
To me, this is a major contribution by some of these tools today - spacewalk, puppet, cfengine, chef, bcfg2, slack: all becoming focal groups - even if they only address specific use-cases or only address certain mindsets / thought processes. The main thing is that people are talking, and what's coming from those talks is more capable and better-written tools, which now mean that it may no longer be necessary to spend those hours and hours working in a silo doing the sort of work that we were doing in the past. On the flip side, people argue that by doing the same level of work under the same conditions, people today are producing much better management systems for their own use and for their users.
For example, if the monitoring tool is unable to accept tasks from, and report progress to, a tool which in turn can be connected up to what the machine is actually supposed to be doing, it's a monitoring tool that I don't even want to consider using. I'd rather have something which can let me write a snippet like:
Machine of type webserver needs:
- packages httpd, mod_ssl
- monitoring for port :80 and :443
- if not working, run scriptX, if still not working, notify remote
monitoring, and remove from production pool
- dir /var/www/html should exist and if file /var/www/app/.TAG does not
exist : notify {deploymentmachine} that {thismachine} needs app rollout
- if all is good, run pre-production tests, if all pass, get us / keep
us in the production pool
Make machine1,machine2,machine3 a webserver
The advantage of this is that various bits of the descriptive code can be reused in various options and scenarios. Compare that to having to go around to each machine and do things on each box, manually, every time.
Anyone who manages some number of servers will very likely also have to deal with an assortment of different operating systems, networking devices, load balancers, etc., so if you choose tools that are only able to manage one type of setup you'll fragment your team into sets that can't help each other and will likely make a mess of your network. And if you aren't heterogeneous yet, just wait for the next round of company acquisitions to start.
You are absolutely right that this is an important topic that doesn't really have a good forum, but what we really need are some cross-platform abstractions and protocols to describe provisioning and deployment.
Karanbir Singh wrote:
To me, this is a major contribution by some of these tools today - spacewalk, puppet, cfengine, chef, bcfg2, slack: all becoming focal groups - even if they only address specific use-cases or only address certain mindsets / thought processes.
Another aspect of the management tools I forgot to mention is their own complexity. Most of the enterprise grade stuff is so complicated you need dedicated trained people to work on it as their sole task, whether it's something like BMC or HP Openview/sitescope type stuff.
I can only speak for what I've used but at my current company I was hired after the previous guy left. One of the reasons my company wanted me was my knowledge of cfengine, which they had deployed but really nobody knew how to use.
It wasn't easy going into a new environment with a five-nines SLA and digging into their systems when pretty much the only documentation I had was a poorly set up cfengine implementation. I mean, their install process involved *manually* defining classes on the command line and running the agent to pick up the associated configurations for those classes, whereas in my setup the classes are always defined and no manual intervention is needed. The internal infrastructure was a total mess, and some of it still is. The older DNS systems run on Windows and weren't set up properly; the master DNS failed a couple of days ago, causing havoc on the back end. I worked around it for a little while since they couldn't get the system back up, then yesterday the main zone for the back end expired, wreaking havoc again, and we did an emergency migration to Linux at that point. The guys doing the Windows stuff don't know what they are doing; it's all legacy and hasn't been touched in years.
So I spent a few months slowly re-writing the entire system, along with re-doing the entire build system which was just as bad but fortunately kickstart is a lot easier to work with.
Even today more than a year later there are probably 45-50 systems out there running on the "old" stuff. There is no good/safe/clean way to migrate to the new w/o re-installing. So I've been doing that as opportunities present themselves. Probably 40 of those systems will be replaced in the next 3-5 months so that will allow me to almost complete the project..
Maybe when I leave my current company the next person will come in and re-write everything again, or switch to puppet or something..
Funny, I just realized that when I left my previous company I left them with another guy who was there the whole time and knew CFengine as well, and he tried to train another guy there before he left, but since then all of them have left. They hired some other guy, but I'm not sure if he's still there or if he ever learned how things worked; the company has mostly collapsed under multiple failed business models.
I feel for the folks who are so overworked and stressed out that they don't have the ability to learn better ways of doing things; I've been there too. Fortunately I am now in a position where I have the luxury of refusing such positions and tasks, because it's not worth my time.
As for monitoring automation, I hear ya, that would be cool to have. Right now it's not much of an issue for us; our growth rate is pretty small (roughly 400 systems). We have a tier 1 team that handles most of the monitoring setup, so I just give them a ticket and tell them to do it and they do.
I'll be deploying another new edge data center location in a couple of weeks: 14 servers, 10 bare metal, 4 running ESXi, about 50 instances of CentOS total. With kickstarting over the WAN I can usually get most everything up in about a day, which sadly is faster than the network guy takes to set up his two switches, two load balancers, and two firewalls. Despite the small server count, the hardware is benchmarked to run our app at about 40,000 transactions a second as a whole, which is the fastest app by several orders of magnitude that I've ever worked on anyway. We have more capacity issues on the load balancers than we have with the servers with such a small server count.
It certainly was an interesting experience the first time deploying a data center from remote, we just had 1 really basic server config used to seed the rest of the network, everything done via remote management cards in the servers and remote installations over the WAN. With the exception of the network stuff which the network guy was on site for a couple of days to configure.
I have 10x the equipment to manage and can still get things done faster than him. I could manage all of the network stuff too without much effort.
If I spent some time I'm sure I could automate some nagios integration but forget about cacti, lost cause. Maybe OpenNMS at some point who knows.. At this time automating monitoring integration isn't a pressing issue. I spend more time writing custom scripts to query things in the most scalable way I can than we do adding new things to the monitoring stuff.
Myself, I am not holding my breath for any movement or product to come around and make managing systems, especially cross-platform, simpler and cheaper. The task is just too complex. The biggest companies such as Amazon, Google, MS etc. have all realized there is little point in even trying such a thing.
It would only benefit really small companies that are at a growth point where they don't have enough business to hire people to standardize on something (or have teams for each thing). And those companies can't afford the costs involved with some big new fancy tool to make their lives easier. The big guys don't care since they have the teams and stuff to handle it.
Though that won't stop companies from trying... the latest push of course is to the magical cloud, where you only care about your apps and no longer care about the infrastructure.
Maybe someday that will work. I talked to a small company recently that is investing nearly $1M to in-source their application from the cloud onto their own gear (they have never hosted it themselves) because of issues with the cloud that they couldn't work around.
good talks though, the most interesting thread I've seen here in I don't know how long. Even though it was soooooooo off topic (from what the list is about anyways) :)
nate
nate wrote:
It certainly was an interesting experience the first time deploying a data center from remote, we just had 1 really basic server config used to seed the rest of the network, everything done via remote management cards in the servers and remote installations over the WAN. With the exception of the network stuff which the network guy was on site for a couple of days to configure.
I have 10x the equipment to manage and can still get things done faster than him. I could manage all of the network stuff too without much effort.
Network equipment is actually much simpler to deal with than computers because it is usually all controlled by a single text file. For the initial setup and big changes where you can reboot, just start with a template, edit it with automated or manual tools, toss a copy into a version control system and tftp it into place. And if you tftp it back after every live change and toss it into your version control system, it is easy to spot anything that has ever changed. I just wish changes were as easy to manage and track on hosts.
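Roughly the sort of loop I mean, with made-up hostnames and using ssh to pull the config rather than the tftp step:

# a plain git working copy; "git init" it once
cd /srv/net-configs || exit 1
for sw in sw-core1 sw-edge1 sw-edge2; do
    ssh admin@$sw 'show running-config' > "$sw.cfg"
done
git add -- *.cfg
git commit -q -m "switch snapshot $(date +%F)" || true   # no-op when nothing changed
git log -p -- sw-core1.cfg                                # full change history for one switch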
If I spent some time I'm sure I could automate some nagios integration but forget about cacti, lost cause. Maybe OpenNMS at some point who knows.. At this time automating monitoring integration isn't a pressing issue. I spend more time writing custom scripts to query things in the most scalable way I can than we do adding new things to the monitoring stuff.
Myself, I am not holding my breath for any movement or product to come around and make managing systems, especially cross-platform, simpler and cheaper. The task is just too complex. The biggest companies such as Amazon, Google, MS etc. have all realized there is little point in even trying such a thing.
Nobody's going to buy Amazon, Google, or MS - and they probably aren't going to merge. Most other companies are less sure about that.
It would only benefit really small companies that are at a growth point where they don't have enough business to hire people to standardize on something (or have teams for each thing). And those companies can't afford the costs involved with some big new fancy tool to make their lives easier. The big guys don't care since they have the teams and stuff to handle it.
What happens in the real world is that small companies build something complex that works, then are acquired by mid-sized companies that are contractually obligated to keep their many separate divisions working but would like to combine common functionality and the staff maintaining things. A company like MS may be able to rip out all the Suns and just hope their replacement design works, but smaller companies can't get away with that and the mix of equipment has to co-exist for years - and their non-interoperable automation tools become extra arcane things to maintain separately.
Though that won't stop companies from trying... the latest push of course is to the magical cloud, where you only care about your apps and no longer care about the infrastructure.
There is a certain appeal to vmware and the like that isolate your OS's from hardware differences. But our applications tax the raw hardware capacity as it is, without adding any additional layers.
Maybe someday that will work. I talked to a small company recently that is investing nearly $1M to in-source their application from the cloud onto their own gear (they have never hosted it themselves) because of issues with the cloud that they couldn't work around.
If you are using the full capacity of a machine, someone else isn't going to be able to sell it to you cheaper from a cloud - but it does make sense for tasks that are rare or need variable capacity.
good talks though, the most interesting thread I've seen here in I don't know how long. Even though it was soooooooo off topic (from what the list is about anyways) :)
Well, if you are using Centos, you are probably running servers and not paying an outside vendor to support them so it's pretty likely that everyone here has the same problem.
Les Mikesell wrote:
What happens in the real world is that small companies build something complex that works, then are acquired by mid-sized companies that are contractually obligated to keep their many separate divisions working but would like to combine common functionality and the staff maintaining things. A company like MS may be able to rip out all the Suns and just hope their replacement design works, but smaller companies can't get away with that and the mix of equipment has to co-exist for years - and their non-interoperable automation tools become extra arcane things to maintain separately.
More of what I meant was those bigger companies can afford to keep the existing teams in place. Just look at the recent T-mobile sidekick thing. All of that infrastructure was Sun/Oracle/Linux.
It took MS years to migrate hotmail off of BSD/Sun. Even after they migrated the front end it took even longer to change out the back end, but at least with the front end swapped you couldn't query their servers and see it was running BSD.
My last company was a small company, they bought another smaller one(1 person shop) for their technology(perl-based) and then spent the next year re-writing it to be java based. Only to lose interest in Java along the way and want to run everything in Ruby. Then they realized their ruby apps were crap and dropped them all and went back to their core java app..full circle I suppose.
Two companies ago I worked for a pretty stressful mobile e-commerce startup that was pretty much entirely linux-based. They started out with windows but then migrated to linux(I came on board after the migration had started)
A few months after I quit they got bought out by a really big company(thousands of people billions of $). They were anti linux. So much so that when my former company decided to drop RHEL in favor of Oracle linux for lower support costs the lead lawyer at the parent firm sent a very threatening letter to my company demanding they turn off all linux systems immediately and that open source was banned from the organization. The COO of the parent company had to go explain the situation and they added an exception. The parent company even entirely re-wired the corporate network and linux systems were not allowed to be connected to the main network, despite many people using it as their primary desktop, they were forced to get secondary systems for the normal corporate network. Glad I left when I did, I sensed a disturbance in the force and got out quick.
I thought that was funny at least. The parent company was so inefficient at running operations that they began outsourcing work to my former company which could operate things 5-10x more efficiently. Though the stress levels remain high there. My friends that are still there want to leave but have no time to even prepare a resume let alone look for a new gig(I remember the feeling..), so those extreme productivity numbers come at a very high personal cost. I'm still recovering from stuff I did 4-5 years ago, though it was an awesome learning experience, probably compressed 5-10 years worth of work/knowledge in 3.
Today the parent company is embracing linux more and has no plans to migrate or re-write the app to run on something else, they've kept the teams in place for the most part, augmenting them with others from the parent over time.
Maybe I've just been lucky or something.
nate
nate wrote:
What happens in the real world is that small companies build something complex that works, then are acquired by mid-sized companies that are contractually obligated to keep their many separate divisions working but would like to combine common functionality and the staff maintaining things. A company like MS may be able to rip out all the Suns and just hope their replacement design works, but smaller companies can't get away with that and the mix of equipment has to co-exist for years - and their non-interoperable automation tools become extra arcane things to maintain separately.
More of what I meant was those bigger companies can afford to keep the existing teams in place. Just look at the recent T-mobile sidekick thing. All of that infrastructure was Sun/Oracle/Linux.
Yeah, that's what they said. But this was an MS acquisition and they were changing it...
It took MS years to migrate hotmail off of BSD/Sun. Even after they migrated the front end it took even longer to change out the back end, but at least with the front end swapped you couldn't query their servers and see it was running BSD.
My last company was a small company, they bought another smaller one(1 person shop) for their technology(perl-based) and then spent the next year re-writing it to be java based. Only to lose interest in Java along the way and want to run everything in Ruby. Then they realized their ruby apps were crap and dropped them all and went back to their core java app..full circle I suppose.
Most things boil down to the application itself - and java is pretty agnostic as to where it runs.
Today the parent company is embracing linux more and has no plans to migrate or re-write the app to run on something else, they've kept the teams in place for the most part, augmenting them with others from the parent over time.
Maybe I've just been lucky or something.
Don't count on exceptions lasting - or on Linux being inherently better. Windows 2003 server and later can be reliable enough to use, so the only real driving forces to convert are the license cost and the administration differences.
Les Mikesell wrote:
Don't count on exceptions lasting - or on Linux being inherently better. Windows 2003 server and later can be reliable enough to use, so the only real driving forces to convert are the license cost and the administration differences.
Yeah, for me it doesn't really matter; if some company comes in and buys us and says we're going to Windows, I'll say I'm going to another company. Lots of places are interested in hiring people in my position, so it's nice to have the flexibility.
In fact now that I think about it, about 9 years ago I was at a company and another IT guy who was more MS-based wanted to migrate from NT4 to Win2k(the bulk of the stuff I was responsible for was Linux/Solaris/HPUX/Tru64/AIX, though there was a touch of NT4 too). I told him(and my boss) if you do I'm leaving(back in my LDAP days wanting to use Samba as PDC). And they never did, well not till after I left anyways. We still joke about it to this day.
I don't have all that much against Windows myself; I just have no interest in using it. It's not worth my time. I stopped using it for the most part about 11 years ago and haven't had a need to look back.
nate
On Wednesday, 04.11.2009, at 20:46 +0100, Les Mikesell wrote:
nate wrote:
But will the tool do these changes for me?
The tool will do anything you tell it to, it's a generic tool.
OK, but if I have to write the script, why wouldn't I just write the script my way and automate it over ssh which already works instead of learning some new language and having to install some new agent everywhere to run it?
Just a small real-life example. Every now and then we find that some web servers on our farm do not have a specific sysctl setting (tcp fin timeout). If you fix that with an ssh loop or mussh you have it fixed now, but only until you add a new server to the farm a month later. If you use a management tool, it will find that this particular server belongs to the farm and should have the setting, apply it for you, and activate it. How do you do that with manual interaction?
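Roughly, in puppet terms (the class name and the value 30 are made up; a real setup would also manage /etc/sysctl.conf so the value survives a reboot):

cat > webfarm.pp <<'EOF'
# hypothetical class included on every node classified as a web-farm member
class webfarm::sysctl {
  # only fires when the live value is wrong, so repeated runs are harmless
  exec { 'tcp-fin-timeout':
    command => '/sbin/sysctl -w net.ipv4.tcp_fin_timeout=30',
    unless  => '/bin/grep -qx 30 /proc/sys/net/ipv4/tcp_fin_timeout',
  }
}
EOF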
Chris
Christoph Maser wrote:
But will the tool do these changes for me?
The tool will do anything you tell it to, it's a generic tool.
OK, but if I have to write the script, why wouldn't I just write the script my way and automate it over ssh which already works instead of learning some new language and having to install some new agent everywhere to run it?
Just a small real-life example. Every now and then we find that some web servers on our farm do not have a specific sysctl setting (tcp fin timeout). If you fix that with an ssh loop or mussh you have it fixed now, but only until you add a new server to the farm a month later. If you use a management tool, it will find that this particular server belongs to the farm and should have the setting, apply it for you, and activate it. How do you do that with manual interaction?
If I wrote the script I would either have set it up to run regularly with a list of targets where it is needed, adding new members to the list as appropriate, or I'd add the setup to our stock images so it wouldn't be needed as a special case. But in my experience, the kinds of things that need time-consuming configuration aren't that predictable ahead of time. What do you do when they relate to the switch something is connected to, or to something other than the group you've put the server in?
On Wednesday, 04.11.2009, at 23:42 +0100, Les Mikesell wrote:
Christoph Maser wrote:
But will the tool do these changes for me?
The tool will do anything you tell it to, it's a generic tool.
OK, but if I have to write the script, why wouldn't I just write the script my way and automate it over ssh which already works instead of learning some new language and having to install some new agent everywhere to run it?
Just a small real-life example. Every now and then we find that some web servers on our farm do not have a specific sysctl setting (tcp fin timeout). If you fix that with an ssh loop or mussh you have it fixed now, but only until you add a new server to the farm a month later. If you use a management tool, it will find that this particular server belongs to the farm and should have the setting, apply it for you, and activate it. How do you do that with manual interaction?
If I wrote the script I would either have set it up to run regularly with a list of targets where it is needed, adding new members to the list as appropriate
Hey wait, isn't that what the management tools provide a framework for? I think this is _the_ essential point.
or I'd add the setup to our stock images so it wouldn't be needed as a special case. But in my experience, the kinds of things that need time-consuming configuration aren't that predictable ahead of time. What do you do when they relate to the switch something is connected to, or to something other than the group you've put the server in?
Well, if you have an automatic way of finding out which switch it is connected to, you can use that as a condition inside the management solution. If you put the server in the wrong group I'd call that human error. Btw, machines are not limited to one group. You can use multiple groups at the same time: one for the server group, one for the OS, one for the switch it's on, one for the rack it is in, etc. But you probably know that.
Back to the first part: I really prefer learning puppet/cfengine over writing an armada of tools myself to have my scripts run reliably and securely on _every_ host. Plus I might get some additional cool features from the framework, too.
Chris
On Wed, Nov 4, 2009 at 11:46 AM, Les Mikesell lesmikesell@gmail.com wrote:
nate wrote:
But will the tool do these changes for me?
The tool will do anything you tell it to, it's a generic tool.
OK, but if I have to write the script, why wouldn't I just write the script my way and automate it over ssh which already works instead of learning some new language and having to install some new agent everywhere to run it?
You could define a class that runs a script to detect the network settings; if the port is forced to full duplex it would return true, which would then trigger another command to run or config files to get copied. If configs are copied, it could then execute another command (perhaps snmpset to change the switch config or something).
It's next to impossible to get or set a duplex setting via snmp. And non-trivial to figure out what switch port is connected to what device - OpenNMS does a reasonable job but if you activate all of its checks it can kill things that have full bgp routes.
Saw something about this at LinuxCon. CME is using Cisco Discovery Protocol and LLDP to figure out the info about the connected port, location, vlan and a bunch of other stuff.
Larry Brigman wrote:
You could define a class that runs a script to detect the network settings; if the port is forced to full duplex it would return true, which would then trigger another command to run or config files to get copied. If configs are copied, it could then execute another command (perhaps snmpset to change the switch config or something).
It's next to impossible to get or set a duplex setting via snmp. And non-trivial to figure out what switch port is connected to what device - OpenNMS does a reasonable job but if you activate all of its checks it can kill things that have full bgp routes.
Saw something about this at LinuxCon. CME is using Cisco Discovery Protocol and LLDP to figure out the info about the connected port, location, vlan and a bunch of other stuff.
That's interesting, thanks! I was surprised to see that cdpr (from epel) would pick up the name/ip/port from a connected Dell PowerConnect switch. But then I repeated it using the -v option and it found the upstream Cisco instead... The production switches are all Cisco though, so this might be a usable hack to permit pre-configuring machines to adjust themselves to whatever order the cables happen to be plugged in. The duplex option just shows a number and doesn't offer to interpret the value, but maybe I can look that up somewhere.
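For anyone else poking at the same thing, this is roughly all it takes (run as root; cdpr sits and waits for the next CDP frame, which can take a minute):

cdpr                                   # device id / port id / IP of the upstream switch
cdpr -v                                # verbose: adds VLAN, duplex and other fields
ethtool eth0 | grep -E 'Speed|Duplex'  # what the NIC itself negotiated, in words rather than a raw number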
On Thu, Nov 5, 2009 at 1:44 PM, Les Mikesell lesmikesell@gmail.com wrote:
Larry Brigman wrote:
You could define a class that runs a script to detect the network settings; if the port is forced to full duplex it would return true, which would then trigger another command to run or config files to get copied. If configs are copied, it could then execute another command (perhaps snmpset to change the switch config or something).
It's next to impossible to get or set a duplex setting via snmp. And non-trivial to figure out what switch port is connected to what device - OpenNMS does a reasonable job but if you activate all of its checks it can kill things that have full bgp routes.
Saw something about this at LinuxCon. CME is using Cisco Discovery Protocol and LLDP to figure out the info about the connected port, location, vlan and a bunch of other stuff.
That's interesting, thanks! I was surprised to see that cdpr (from epel) would pick up the name/ip/port from a connected Dell PowerConnect switch. But then I repeated it using the -v option and it found the upstream Cisco instead... The production switches are all Cisco though, so this might be a usable hack to permit pre-configuring machines to adjust themselves to whatever order the cables happen to be plugged in. The duplex option just shows a number and doesn't offer to interpret the value, but maybe I can look that up somewhere.
CDP is a multicast packet. Non-Cisco devices will pass it on; other Cisco devices will drop it, since it is really only useful between directly connected devices. Other switch vendors wanted to be more vendor-neutral and came up with LLDP (Link Layer Discovery Protocol), which I don't think is on cheaper switches. The other thing about CDP is that unless the network admin has explicitly turned it off, it is "on" by default in all Cisco gear.
CME's basic usage model was to use this to notify the networking group about a mis-configuration by pointing them to the exact switch/port by name and number. They use nothing but Cisco gear.
On 11/04/2009 12:15 PM, Marcus Moeller wrote:
I am personally not that big a fan of Puppet, as things get quite complex in large scenarios and Puppet does not scale well (this has been improved in the latest version if you are using passenger instead of webrick).
Puppet is actually easier to scale beyond a few nodes than spacewalk is. Remember that the client->server model for puppet is optional. E.g. one set of people I introduced puppet to uses git as a policy distribution layer and runs ~12k instances with puppet, with average delivery time of less than 4 minutes from release, and less than 10 min guaranteed for role/policy implementation on the designated nodes (it's a large hosting setup; we did these calculations just last weekend).
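The node end of that is nothing exotic - roughly this sort of thing, with made-up paths and schedule:

# /etc/cron.d/puppet-pull (made-up name), one line:
#   */10 * * * * root /usr/local/sbin/puppet-pull >/dev/null 2>&1

# /usr/local/sbin/puppet-pull:
#!/bin/sh
cd /etc/puppet/policy || exit 1    # a clone of the central policy repo
git pull -q origin master          # pick up whatever has been released
puppet apply manifests/site.pp     # "puppet manifests/site.pp" on the 0.25.x series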
If you are willing to set up complex configurations with dependencies and variables, Puppet may be a good choice. In addition to Cobbler or TheForeman you will get provisioning functionality, too. IMHO you should also be familiar with Ruby.
I don't agree with either of the two. Puppet is one package with ruby on the machine; how many dozens of things do you need to install for spacewalk? What is the stability and maintenance loop like for each component? And how many hoops do you need to jump through in order to get your app rollout linked into state management with spacewalk? Try the same for puppet or even bcfg2/chef/cfengine - it's just a case of comparing apples and oranges. They don't do the same thing and they are both good to have; you just need to make up your mind what level you want to abstract at and how you want to split roles. E.g. Spacewalk's config policy management capabilities are about 3 generations behind what puppet is able to give you today.
Spacewalk is a single tool for all lifecycle management tasks. It is capable of bare provisioning (using Cobbler integration), re-provisioning (using Koan), configuration management, errata generation and package management. It also scales quite well if you are using Oracle Standalone instead of XE. With PostgreSQL, a free database backend will be integrated in the near future.
I think you highlighted the main issues yourself - spacewalk does a fair number of things that are different from puppet; the only real overlap is on config management (which I prefer to call policy, since it's a case of policy defining the role of a specific box). And at that, puppet is way ahead of the capabilities of spacewalk.
The other massive win you can get from an integrated spacewalk/puppet deployment is the ability to stop running ssh or any remote access to a machine. Which in turn means that your spacewalk setup will have (almost by force) a good representation of the platform, and the puppet manifests give you a fantastic view of the entire policy deployment across the entire infrastructure - just this is worth a lot when you have employee churn and/or need to consider platform re-factoring. And with a VCS backing the puppet manifests, you get all the audit, tracking and management around that policy that you need.
Btw, I also don't agree with the idea that using puppet needs knowledge of ruby - it's like saying using cfengine needs knowledge of 'C', or using OpenOffice needs intricate knowledge of Java. On the other hand, Chef aims to be closely associated with ruby - and if you said that Chef needed a fair ruby mindset, I'd agree :)
- KB
PS: Your email client is broken. It's not preserving thread sanity.
Dear Karan,
I am personally not that big a fan of Puppet, as things get quite complex in large scenarios and Puppet does not scale well (this has been improved in the latest version if you are using passenger instead of webrick).
Puppet is actually easier to scale beyond a few nodes than spacewalk is. Remember that the client->server model for puppet is optional. E.g. one set of people I introduced puppet to uses git as a policy distribution layer and runs ~12k instances with puppet, with average delivery time of less than 4 minutes from release, and less than 10 min guaranteed for role/policy implementation on the designated nodes (it's a large hosting setup; we did these calculations just last weekend).
We had massive performance issues with Puppet < 0.25 and Mongrel/Webrick.
If you are willing to set up complex configurations with dependencies and variables, Puppet may be a good choice. In addition to Cobbler or TheForeman you will get provisioning functionality, too. IMHO you should also be familiar with Ruby.
Concerning Ruby you should at least be familiar with quoting/escaping and scopes.
I don't agree with either of the two. Puppet is one package with ruby on the machine; how many dozens of things do you need to install for spacewalk? What is the stability and maintenance loop like for each
There are not so many packages that need to be installed on the client side (about 10), but in return you get functionality like remote commands through osad, and monitoring. Package upgrades can easily be handled with errata and update management.
component? And how many hoops do you need to jump through in order to get your app rollout linked into state management with spacewalk? Try the same for puppet or even bcfg2/chef/cfengine - it's just a case of comparing apples and oranges. They don't do the same thing and they are both good to have; you just need to make up your mind what level you want to abstract at and how you want to split roles. E.g. Spacewalk's config policy management capabilities are about 3 generations behind what puppet is able to give you today.
That's true. Config management should be extended, and there are already some good ideas out there, like remote commands with dependencies.
Best Regards Marcus
PS: Your email client is broken. It's not preserving thread sanity.
Not a problem here.
On 11/04/2009 02:18 PM, Marcus Moeller wrote:
We had massive performance issues with Puppet < 0.25 and Mongrel/Webrick.
Right, I don't think that the default out-of-the-box setup with Webrick is meant to scale much beyond 100 or so machines, but it's trivial to set up an nginx-based proxy in front of multiple mongrels and have that handle the load. Anything > 500 nodes needs specific consideration, but at that level you have both the time and the interest to fix the specific issues.
Concerning Ruby you should at least be familiar with quoting/escaping and scopes.
I think the puppet DSL is slightly different from ruby in that respect. Just working with the language guide for puppet is enough to keep things going. It's only when you get down to lower-level embedded templates with ERB that it might help to know a bit of ruby, but I honestly think most people can do almost everything in puppet without any ruby experience.
There are not so many packages that need to be installed on the client side (about 10)
How about the server side? puppet is still a single package on that end too.
but in return you get functionality like remote commands through osad, and monitoring. Package upgrades can easily be handled with errata and update management.
With puppet you get the ability to carry role-based nagios definitions in sync with the role definition - which almost means zero nagios configuration. So what that means is that when I define what my webserver-type1 should look like, what configs it needs and what policy it needs to implement, I can also define, in the same place, what sort of monitoring would be needed against those components. Then when I apply webserver-type1 to any specific machine, I get the nagios configs for free.
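As a very rough illustration using puppet's built-in nagios_service type (all names here are invented, and a real setup would typically use exported resources collected on the monitoring host):

cat > webserver_monitoring.pp <<'EOF'
# hypothetical monitoring glue declared alongside the webserver-type1 role
class webserver_type1::monitoring {
  nagios_service { "http_${fqdn}":
    host_name           => $fqdn,
    service_description => 'HTTP',
    check_command       => 'check_http',
    use                 => 'generic-service',
    target              => '/etc/nagios/conf.d/puppet_services.cfg',
  }
}
EOF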
And the fact that puppet runs in a definite manner means it can make for a reactive monitoring system in itself (although I prefer to use tools like monit/god for that, especially for time-critical services).
PS: Your email client is broken. It's not preserving thread sanity.
Not a problem here.
Interestingly, for your email: Message-ID: g1m1yig5etitfc1rxzjezwJv4X.penango@mail.gmail.com
The headers contain no References or In-Reply-To headers on the copy that came through to me (your most recent one does have References set). So I'm not sure what mail client you are using, but it's a bit random with its headers.
- KB
If you guys would be so kind would you mind emailing some examples of some puppet policies? It would really be beneficial to me :)
Thanks again for the all replies!
Dan Burkland

-----Original Message-----
From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of Karanbir Singh
Sent: Wednesday, November 04, 2009 8:34 AM
To: CentOS mailing list
Subject: Re: [CentOS] Spacewalk or Puppet?
hi Dan,
Firstly - you are more than welcome on the list - but be polite to everyone else around and take a minute to at least trim posts and be a good mailing-list person!
On 04/11/09 19:04, Dan Burkland wrote:
If you guys would be so kind would you mind emailing some examples of some puppet policies? It would really be beneficial to me :)
start by looking at the puppet wiki - there are quite a few snippets of code, including some tutorials to get you started. They also have a very active irc channel at #puppet@irc.freenode.net and mailing lists.
- KB