Karanbir Singh wrote:
>
>> Our monitoring is primarily nagios+cacti which are maintained by
>> hand currently. Myself I have literally tens of thousands of hours
>> invested in monitoring scripts mostly integrating with RRDTool for
>> performance and trending analysis.
>
> You say a lot with just that statement there - it's a place we've all
> been at, and it's the one issue that some of these tools around *today*
> help with.
>
> Essentially, every admin has been down the route of setting up a bunch
> of machines and then working away at them, investing large portions of
> time with regular admin tasks - like writing scripts to manage small
> bits of state, writing some sort of config rollouts, doing some
> post-install tests etc etc. The list can go on and on. The important
> thing here really is that we've *all* done that - and a *large* portion
> of what we were trying to do was common in most scenarios. But there was
> never really any traction around any single community, that would
> encourage people to come together - talk about these things - and then
> move on creating tool sets that work for people.
>
> To me, this is a major contribution by some of these tools today -
> spacewalk, puppet, cfengine, chef, bcfg2, slack : all becoming focal
> groups - even if they only address specific use-cases or only address
> certain mindsets / thought processes. The main thing is that people are
> talking and what's coming from those talks are more capable and better
> written tools that, kind of now, mean that it may no longer be necessary
> to spend those hours and hours working out of a silo doing the sort of
> work that we were doing in the past. On the flip side, people argue that
> doing the same level of work and working under the same conditions
> people are today producing a much better management system for their own
> use and for their users.
>
> For example, if the monitoring tool is unable to accept tasks and report
> process from a tool, which in turn can be connected up to what the
> machine is actually supposed to be doing, it's a monitoring tool that I
> don't even want to consider using. I'd rather have something which can
> let me write a snippet like:
>
> -------------
> Machine of type webserver needs:
> - packages httpd, mod_ssl
> - monitoring for port :80 and :443
>   + if not working, run scriptX, if still not working, notify remote
>     monitoring, and remove from production pool
> - dir /var/www/html should exist and if file /var/www/app/.TAG does not
>   exist : notify {deploymentmachine} that {thismachine} needs app rollout
> - if all is good, run pre-production tests, if all pass, get us / keep
>   us in the production pool
>
> Make machine1,machine2,machine3 a webserver
> ------------
>
> The advantage from this is that various bits of the descriptive code
> could be used in various options and scenarios. Compare that to having
> to go around to each machine and doing things on each box, manually,
> every time.

Anyone who manages some number of servers will very likely also have to
deal with an assortment of different operating systems, networking
devices, load balancers, etc., so if you choose tools that are only able
to manage one type of setup, you'll fragment your team into sets that
can't help each other and will likely make a mess of your network. And
if you aren't heterogeneous yet, just wait for the next round of company
acquisitions to start.

You are absolutely right that this is an important topic that doesn't
really have a good forum, but what we really need are some
cross-platform abstractions and protocols to describe provisioning and
deployment.

--
   Les Mikesell
    lesmikesell at gmail.com