Greetings,
On 3/24/11, Akemi Yagi amyagi@gmail.com wrote:
I should note, however, that they are not for production use.
Akemi
Thank you very much, Akemi, for your always relevant and brilliant work.
<rant> As an Indian, from the land where the "veda", a repository of knowledge, originated, I have always looked at every member of this list as a repository of knowledge.
http://en.wikipedia.org/wiki/Vedas
I am presuming you are of Japanese origin. Together, India and Japan have had a very, very long shared history indeed: http://www.ece.lsu.edu/kak/VedicJapan.pdf
I stand in awe before the giants who are the members of this list.
I am almost about to roll over to the so-called "golden" age of 50 (the origin of these jubilees I am unaware of). My respectful namo namaH (in ITRANS encoding) to all the members.
I am sure you have heard of Hanuman from the Ramayana, the best grammarian known?
</rant>
Having said that, I have had this troubling thought for the last decade: what exactly is high availability? Is it 24/7 power-on time, or is it "when needed"? Please note I am not talking about the perhaps arrogant "on demand" attitude of a human.
I have been a member of this, linux-cluster, and other lists for almost half a decade.
I have managed a two-node Heartbeat/DRBD setup for about 2 years, which was transformed into a two-node (later three-node) RHCS cluster for at least about two quarters.
This was under extreme circumstances in India, like 4 hours of outage (electrical power load shedding) every day, with no fencing device.
I can claim I at least tried that mating dance with those two beasts in (horror of horrors) the breeding ground of PHBs (vnbrims.org), with about 200 PHBs, without any significant assistance.
I never understood the term "High Availability": does it mean available as in "soliciting"?
What exactly do those lusers want? And what exactly do we self-declared high-tech droids/engineers seek?
What exactly is "production" use? (I know DEV, UAT, blah, blah, TLA, etc.; been there, done that, and I don't have the T-shirt - nobody gave me one.)
Always with warm regards only,
Rajagopal
On 3/24/2011 2:48 PM, Rajagopal Swaminathan wrote:
I never understood the term "High Availability": does it mean available as in "soliciting"?
"High Availability" means that you are pushing toward 100% uptime for your services. You try to make sure that no one event can take you down.
What exactly is "production" use? (I know DEV, UAT, blah, blah, TLA, etc.; been there, done that, and I don't have the T-shirt - nobody gave me one.)
"Production use" basically means that you are using the software/hardware/whatever in a situation where you are relying on it. In other words, not in a development, testing, or other non-critical application. The best example of "Production use" is a company's main website or mail server.
Rajagopal Swaminathan wrote:
On 3/24/11, Akemi Yagi amyagi@gmail.com wrote:
I should note, however, that they are not for production use.
<snip>
</rant>
<snip>
I am sure you have heard of Hanuman from the Ramayana, the best grammarian known?
Huh - it's been more than half a lifetime since I read a translation of the Ramayana, or maybe it's just the translation I read, but I don't remember Hanuman as a grammarian. <g>
</rant>
Having said that, I have had this troubling thought for the last decade: what exactly is high availability? Is it 24/7 power-on time, or is it "when needed"? Please note I am not talking about the perhaps arrogant "on demand" attitude of a human.
Ok, h/a is not "fault tolerance", which is for 99.+% uptime (and you *pay* a lot for additional decimals there). What it is for is well over 90% uptime, though where I've worked, including supporting the City of Chicago 911 system (the emergency system, that is), it was expected to be over 99% uptime. With h/a, as you should be familiar with from your experience, two or more servers share an asserted IP address; if one server goes down, another will see it within whatever's configured - 30 seconds, maybe - and then asserts the IP address and offers all of the services expected. They should also be sharing redundant storage, so that the only thing lost should be transactions that had just started when the first server went down; those that were partly transacted should be rolled back to a known state when the second server asserts the IP.
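If it helps to see the shape of that logic, here is a deliberately stripped-down sketch in Python (hypothetical hostname, port, and IP address; a real cluster stack such as Heartbeat or RHCS also handles fencing and shared storage, none of which is shown here):

    import socket
    import subprocess
    import time

    PEER = "node1.example.com"   # hypothetical peer node
    PORT = 694                   # port chosen only for illustration
    CHECK_INTERVAL = 5           # seconds between heartbeat checks
    DEADTIME = 30                # declare the peer dead after this long

    def peer_alive():
        """Return True if the peer answers on the heartbeat port."""
        try:
            with socket.create_connection((PEER, PORT), timeout=2):
                return True
        except OSError:
            return False

    last_seen = time.time()
    while True:
        if peer_alive():
            last_seen = time.time()
        elif time.time() - last_seen > DEADTIME:
            # Peer presumed dead: take over the shared service IP address.
            # (A real cluster would fence the peer first to avoid split-brain.)
            subprocess.run(["ip", "addr", "add", "192.0.2.10/24", "dev", "eth0"])
            break
        time.sleep(CHECK_INTERVAL)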
I never understood the term "High Availability": does it mean available as in "soliciting"?
Nope - it has real meaning.
What exactly do those lusers want? And what exactly do we self-declared high-tech droids/engineers seek?
What exactly is "production" use? (I know DEV, UAT, blah, blah, TLA, etc.; been there, done that, and I don't have the T-shirt - nobody gave me one.)
All services expected from a server at a given IP should be available up to or over 99% of the time, with no lost transactions, or transactions in an undefined state - that's h/a and production.
mark
Greetings,
Dear Roth, thanks for your reply. I am a member of your fan club.
(Hero worship is not new to India: Dr. S. Chandrasekhar (University of Chicago), Dr. APJ Abdul Kalam, Sachin Tendulkar, Amitabh Bachchan, Rajnikanth and the like) in a country of 1/6th of the earth's population.
But I am still just one.
So the opinions below are entirely mine alone.
On 3/25/11, m.roth@5-cent.us m.roth@5-cent.us wrote:
Huh - it's been more than half a lifetime since I read a translation of the Ramayana, or maybe it's just the translation I read, but I don't remember Hanuman as a grammarian. <g>
</rant>
<rant> Just in case from: http://mythfolklore.net/india/encyclopedia/hanuman.htm "Among his other accomplishments, Hanuman was a grammarian; and the Ramayana says, “The chief of monkeys is perfect; no one equals him in the sastras, in learning, and in ascertaining the sense of the scriptures [or in moving at will]. In all sciences, in the rules of austerity, he rivals the preceptor of the gods. … It is well known that Hanuman was the ninth author of grammar.”" </rant>
will see within whatever's configured - 30 seconds, maybe - the other
I understand and realize what you say, and have in the past supported such "Enterprise" environments 24/7, from the pager era, from KSA to India (circa 1987 to early 2011).
<rant> I have now shifted to 'lowly' predictable work and family hours with ridiculous pay, for the time being, as an ISP support professional agent of a US corporation. (30 years, including my brief stint with "TV Mechanic", "Electronic Foreman", etc. as my floating job titles for about three years.)
It boils down to info being available to grassroots users when needed -- by them -- and not as decided by "us" sysadmins and the like.
They don't mind the system not being available. That gives them breathing space. Give it to them without them feeling guilty; after all, unavailability is a fact of life.
A few minutes is OK for them. It is the PHBs' perception that we have to change. I have tried that at least, and it has helped me tremendously in properly setting expectations for the users. </rant>
I was also writing to ILUC recently, which caused this overflow of disgust, as it may be perceived by some members.
Apologies for any misunderstandings.
As always, with warm regards,
Rajagopal
On 3/24/2011 2:07 PM, m.roth@5-cent.us wrote:
Having said that, I have had this troubling thought for the last decade: what exactly is high availability? Is it 24/7 power-on time, or is it "when needed"? Please note I am not talking about the perhaps arrogant "on demand" attitude of a human.
Ok, h/a is not "fault tolerance", which is for 99.+% uptime (and you *pay* a lot for additional decimals there).
Most of the computation in percentages is like computing rates for insurance - balancing the penalty for missing your SLA against the cost of providing it. But on the operations side you can only work in integer numbers of instances. You know everything will break and if you can't be down long enough to replace it, you need a complete duplicate copy, and then you may need to duplicate that pair in another location. If you aren't already working with grid/cluster systems you almost always more than double your cost to get even the slightest increase in expected availability.
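A toy calculation of why duplication helps at all, assuming independent failures (which real deployments rarely get, so treat it as an upper bound), in Python:

    # Combined availability of N replicas when any single one can serve requests.
    # Assumes failures are independent, which is optimistic in practice.

    def combined_availability(single, replicas):
        return 1 - (1 - single) ** replicas

    single = 0.98  # one server that is up 98% of the time
    for n in (1, 2, 3):
        print(f"{n} replica(s): {combined_availability(single, n):.4%} expected availability")

Doubling a 98% server gets you to about 99.96% in theory, but only if the two never fail together; shared power, shared networks, and correlated bugs eat into that, and you have already more than doubled the cost.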
What exactly is "production" use? (I know DEV, UAT, blah, blah, TLA, etc.; been there, done that, and I don't have the T-shirt - nobody gave me one.)
All services expected from a server at a given IP should be available up to or over 99% of the time, with no lost transactions, or transactions in an undefined state - that's h/a and production.
Not everything deals in transactions, though. The recently popular distributed database versions that scale up are more about doing something reasonable in scenarios where you can't guarantee a transaction state (where 'reasonable' is defined by the application).
On 03/25/11 11:32 AM, Les Mikesell wrote:
Not everything deals in transactions, though. The recently popular distributed database versions that scale up are more about doing something reasonable in scenarios where you can't guarantee a transaction state (where 'reasonable' is defined by the application).
mmm, yes, 'data maybe'. good enough for web forums and blogs.
I'm getting really annoyed when upper corporate management keeps saying we need to cloudify our highly transactionally intensive manufacturing execution system where we can't AFFORD to lose ANY data. Of course, when presented with a plan that has an 8-9 figure budget for a 4-year transition to a new 'cloud'-architected system (we have to do it in overlap or the factory stops until the new system is fully deployed and working, and every single application has to be reengineered from scratch), they cough and sputter.
So, instead, we cross out 'data center' and write 'cloud' on our architectural diagrams, and go ahead and virtualize as much of the middleware layers as we can, since new hardware is so much faster than the older hardware the middleware was designed to run on (hey, 8 vmware esxi boxes running 50 Linux VMs is a cloud, right?)
John R Pierce wrote:
On 03/25/11 11:32 AM, Les Mikesell wrote:
Not everything deals in transactions, though. The recently popular distributed database versions that scale up are more about doing something reasonable in scenarios where you can't guarantee a transaction state (where 'reasonable' is defined by the application).
Well... except that in this context, it's not only database transactions: it's any granular interaction between client and server. You don't, for example, want part of a form you've just clicked <submit> on to only partly get there, if there's a network blip or whatever.
mmm, yes, 'data maybe'. good enough for web forums and blogs.
I'm getting really annoyed when upper corporate management keeps saying we need to cloudify our highly transactionally intensive manufacturing execution system where we can't AFFORD to lose ANY data. Of course,
#insert "wget http://executives_r_us.com/current_buzzwords.html"
(And don't get me started on the nineties, and the "p"* word!!!) <snip>
So, instead, we cross out 'data center' and write 'cloud' on our architectural diagrams, and go ahead and virtualize as much of the middleware layers as we can, since new hardware is so much faster than the older hardware the middleware was designed to run on (hey, 8 vmware esxi boxes running 50 Linux VMs is a cloud, right?)
Why, did you think it wasn't?
mark
* Paradigm, as in, "the new toothpaste flavor of spearmint instead of peppermint is a New Paradigm!!! and will (dare I say it) Change the World As We Know It!!!!!!"
On 03/25/11 12:11 PM, m.roth@5-cent.us wrote:
- Paradigm, as in, "the new toothpaste flavor of spearmint instead of peppermint is a New Paradigm!!! and will (dare I say it) Change the World As We Know It!!!!!!"
:)
We're still trying to stick forks in "Younameit as a Service" (YaaS)
On 3/25/2011 2:11 PM, m.roth@5-cent.us wrote:
Not everything deals in transactions, though. The recently popular distributed database versions that scale up are more about doing something reasonable in scenarios where you can't guarantee a transaction state (where 'reasonable' is defined by the application).
Well... except that in this context, it's not only database transactions: it's any granular interaction between client and server. You don't, for example, want part of a form you've just clicked <submit> on to only partly get there, if there's a network blip or whatever.
If 'get there' is defined as all redundant copies being in a consistent state, then you'll fail at this point in transactional mode in the fairly likely event that you have a network blip between the db master and slave(s) or one of them is down. For a lot of things it would be better to keep running with timestamp or clock vectors on the data that will be used to track the multiple versions you'll have as the system reconverges. I'd expect Amazon's shopping cart to work that way, although they might do something more transactional when finalizing a purchase.
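A minimal sketch of that reconvergence idea (hypothetical data; real Dynamo-style stores use vector clocks and let the application resolve conflicting siblings, this just shows newest-timestamp-wins per field):

    # Merge two diverged copies of a record after a partition heals.
    # Each field value is a (timestamp, data) pair; the newest timestamp wins.

    def merge(copy_a, copy_b):
        merged = {}
        for key in copy_a.keys() | copy_b.keys():
            candidates = [c[key] for c in (copy_a, copy_b) if key in c]
            merged[key] = max(candidates, key=lambda v: v[0])
        return merged

    # Two replicas that were updated independently while disconnected:
    node1 = {"cart": (1301083200, ["book"]), "email": (1301083100, "a@example.com")}
    node2 = {"cart": (1301083260, ["book", "dvd"])}
    print(merge(node1, node2))   # keeps the newer cart and the only known email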
Les Mikesell wrote:
On 3/25/2011 2:11 PM, m.roth@5-cent.us wrote:
Not everything deals in transactions, though. The recently popular distributed database versions that scale up are more about doing something reasonable in scenarios where you can't guarantee a transaction state (where 'reasonable' is defined by the application).
Well... except that in this context, it's not only database transactions: it's any granular interaction between client and server. You don't, for example, want part of a form you've just clicked <submit> on to only partly get there, if there's a network blip or whatever.
If 'get there' is defined as all redundant copies being in a consistent state, then you'll fail at this point in transactional mode in the fairly likely event that you have a network blip between the db master and slave(s) or one of them is down. For a lot of things it would be
<snip> Les, ignore d/b. Think of you submitting, from your browser, a request for something, and the transaction is incomplete when it gets to the server; instead of getting what it was allowed to, it gets something below that level, something it's *not* allowed to get and you really *don't* want accessible, but it can do it as apache....
Or something as trivial as requesting something, and your email address is truncated.
BEA doesn't make Big Bucks from Tuxedo, for example, only for d/b transactions.
mark
On 3/25/2011 2:48 PM, m.roth@5-cent.us wrote:
Well... except that in this context, it's not only database transactions: it's any granular interaction between client and server. You don't, for example, want part of a form you've just clicked <submit> on to only partly get there, if there's a network blip or whatever.
If 'get there' is defined as all redundant copies being in a consistent state, then you'll fail at this point in transactional mode in the fairly likely event that you have a network blip between the db master and slave(s) or one of them is down. For a lot of things it would be
<snip> Les, ignore d/b. Think of you submitting, from your browser, a request for something, and the transaction is incomplete when it gets to the server; instead of getting what it was allowed to, it gets something below that level, something it's *not* allowed to get and you really *don't* want accessible, but it can do it as apache....
Or something as trivial as requesting something, and your email address is truncated.
That doesn't really relate to distributed vs. non-distributed or transactional vs non-transactional. That's just application level stuff.
BEA doesn't make Big Bucks from Tuxedo, for example, only for d/b transactions.
I thought it was about making multiple separate things, each of which might have its own transactional concept, share an upper-level transaction that either fails or completes consistently. That is, it favors the C (consistency) in CAP over availability. The cloud DBs tend to favor availability even when consistency can't be guaranteed temporarily, or they give the client the opportunity to control the consistency level it wants for any operation (i.e., wait for n writes to complete, read n copies with their clock vectors, etc.).
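For the "wait for n writes, read n copies" part, here is a rough sketch of the usual N/R/W quorum arithmetic (numbers are illustrative, not from any particular product):

    # With N total replicas, a write waits for W acknowledgements and a read
    # consults R replicas. If R + W > N, every read overlaps at least one
    # replica that saw the latest write; otherwise you may read stale data.

    N = 3  # total replicas

    def read_sees_latest_write(r, w, n=N):
        return r + w > n

    for r, w in [(1, 1), (1, 3), (2, 2), (3, 1)]:
        verdict = "overlap guaranteed" if read_sees_latest_write(r, w) else "may be stale"
        print(f"R={r}, W={w}, N={N}: {verdict}")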
On Friday, March 25, 2011 03:35:29 pm Les Mikesell wrote:
If 'get there' is defined as all redundant copies being in a consistent state, then you'll fail at this point in transactional mode in the fairly likely event that you have a network blip between the db master and slave(s) or one of them is down.
Puh-lease. TCP has solved that problem; look into the new algorithms and techniques PostgreSQL 9 brings to the ACID table.
Networks at layer 3 are expected to blip; TCP at layer 4 makes it a reliable stream. Or if it goes down both endpoints know it went down, and the database engine has a choice whether to abort and rollback or wait on a retry. Replay write-ahead logs are another way to deal with this.
On 3/26/11 12:51 PM, Lamar Owen wrote:
On Friday, March 25, 2011 03:35:29 pm Les Mikesell wrote:
If 'get there' is defined as all redundant copies being in a consistent state, then you'll fail at this point in transactional mode in the fairly likely event that you have a network blip between the db master and slave(s) or one of them is down.
Puh-lease. TCP has solved that problem; look into the new algorithms and techniques PostgreSQL 9 brings to the ACID table.
For a single instance. The issue in scaling and failover scenarios is that you need multiple, perhaps many, copies of data, and what cloud databases and the nosql and CAP buzzwords are all about are how to handle the situation when part of that storage is unavailable, or worse, the copies are segmented and still running independently.
Networks at layer 3 are expected to blip; TCP at layer 4 makes it a reliable stream. Or if it goes down both endpoints know it went down, and the database engine has a choice whether to abort and rollback or wait on a retry. Replay write-ahead logs are another way to deal with this.
Even with simple replication in an ACID system - if your remote copy also permits updates, you have to decide whether the whole system should become unavailable because of the single failure or whether you should allow potentially conflicting writes to continue while the systems are disconnected. The scalable DBs start with the premise that partitioning is an expected real-world occurrence that applications have to deal with (and the better ones also transparently deal with adding/removing nodes as capacity needs grow and shrink). There are times an application should abort if it can't ensure that all copies are consistent, but they may be rare compared to the times you can continue with the newest data you know about.
On 3/25/2011 1:57 PM, John R Pierce wrote:
Not everything deals in transactions, though. The recently popular distributed database versions that scale up are more about doing something reasonable in scenarios where you can't guarantee a transaction state (where 'reasonable' is defined by the application).
mmm, yes, 'data maybe'. good enough for web forums and blogs.
I'm getting really annoyed when upper corporate management keeps saying we need to cloudify our highly transactionally intensive manufacturing execution system where we can't AFFORD to lose ANY data.
Transactions aren't about not losing data; they mean you fail completely in any situation where you can't guarantee that all copies are synchronized. Things do break, and there are many situations where not failing, and eventually restoring consistency with the data stored while consistency was impossible, is a better approach. Yours may not be one of them. There is a lot of literature on CAP theory if you haven't been swamped by it already. And Riak looks really sensible to me, although I haven't done more than fire up a test instance yet.
Of course, when presented with a plan that has an 8-9 figure budget for a 4-year transition to a new 'cloud'-architected system (we have to do it in overlap or the factory stops until the new system is fully deployed and working, and every single application has to be reengineered from scratch), they cough and sputter.
So no one develops new applications there?
So, instead, we cross out 'data center' and write 'cloud' on our architectural diagrams, and go ahead and virtualize as much of the middleware layers as we can, since new hardware is so much faster than the older hardware the middleware was designed to run on (hey, 8 vmware esxi boxes running 50 Linux VMs is a cloud, right?)
I can't criticize that since we are doing the same. Odds are that you could redistribute the apps to run directly on 8 linux hosts with better performance, but hardware is cheap and VMs are easy.
On 03/25/11 12:21 PM, Les Mikesell wrote:
So no one develops new applications there?
This is a large-scale manufacturing execution system. You don't just go off and design an all-new system based on the buzzwords du jour when your factories are dependent on it.
Picture large factory floors with dozens of assembly lines, each with 100s of pieces of computer-controlled industrial equipment, all developed by different vendors, many 5-10 years old because THEY STILL WORK, talking proprietary protocols to middleware layers of data concentrators, which in turn talk to a cluster of core databases that track everything going on, and then a maze of back-end reporting systems, shipping systems, data warehousing extractors, realtime production analysis (ok, that's part of reporting), statistical error analysis and trend prediction (which feeds back into the reporting databases), etc., etc. There are also subsystems that monitor the overall process flow and manipulate production workloads and product mix, etc., etc.
ALL this stuff would need replacing to work with a radically different core architecture. The last major upgrade of the core database architecture took 5 years to deploy in parallel with the previous system, after 5 years of development (and it maintained backwards compatibility with the floor/middleware side of things). All of our ongoing development work is evolutionary rather than revolutionary; new pieces have to be compatible with old pieces.

We thought we could kill off the legacy support for some really old factory-floor MSDOS-based systems that used some truly ancient protocol APIs we'd developed over 15 years ago, and then we discovered there are still a few hundred of those burn-in ovens running in some of the more remote factories, so we still need to handle the oddball data format they generate (yes, there are middleware layers that translate the really ancient into the merely antique).

The physical factories are in a perpetual balance on the edge of chaos: if there's a problem on a line, work in progress gets manually moved off to other lines, and events can arrive at the core database out of sequence due to network buffering delays, yet we need to process them in order and still be able to produce accurate responses within 1 second of realtime after the preceding event. Every phase of the data flow has resilience designed in.
On 3/25/2011 2:59 PM, John R Pierce wrote:
So no one develops new applications there?
This is a large-scale manufacturing execution system. You don't just go off and design an all-new system based on the buzzwords du jour when your factories are dependent on it.
Pretty much everyone is in the same shape. But if you want to improve things beyond what you get from updating hardware you have to make changes sometime.
Picture large factory floors with dozens of assembly lines, each with 100s of pieces of computer-controlled industrial equipment, all developed by different vendors, many 5-10 years old because THEY STILL WORK, talking proprietary protocols to middleware layers of data concentrators, which in turn talk to a cluster of core databases that track everything going on, and then a maze of back-end reporting systems, shipping systems, data warehousing extractors, realtime production analysis (ok, that's part of reporting), statistical error analysis and trend prediction (which feeds back into the reporting databases), etc., etc. There are also subsystems that monitor the overall process flow and manipulate production workloads and product mix, etc., etc.
We redistribute commodity/stock market data - complexity isn't something surprising, it is the norm.
ALL this stuff would need replacing to work with a radically different core architecture.
Most of our stuff couldn't possibly work the way it did a few years ago due to external changes in volume and scale. But each change had to be non-disruptive and in nearly all cases new/old versions of a component had to co-exist in the server farms transparently. But, I suppose we think of the software as more of a core component of our business than a factory might.
The last major upgrade of the core database architecture took 5 years to deploy in parallel with the previous system after 5 years of development (and it maintained backwards compatability with the floor/middleware side of things).
So doesn't that mean you need to start the next design sooner vs later? If the middleware layer handles most of the component interaction you might have some freedom to make piecemeal changes. On the other hand, distributed database technology is still evolving rapidly so if you aren't running into scaling problems waiting might be a good idea.
On 03/25/11 2:43 PM, Les Mikesell wrote:
So doesn't that mean you need to start the next design sooner vs later? If the middleware layer handles most of the component interaction you might have some freedom to make piecemeal changes. On the other hand, distributed database technology is still evolving rapidly so if you aren't running into scaling problems waiting might be a good idea.
We're doing fine for now... the main databases are Oracle 10g, we're testing 11g now, and they have been moved from big Sun Solaris boxes to big IBM Power AIX boxes. A production server in Thailand I was just doing performance monitoring on has 16 x 3GHz CPUs on a Power 750 (w/ 4 threads per core) and 96GB RAM, and it was humming along just fine once I convinced the local operations folks that one of our newer software components needed its queue files on different disks than the main Oracle DB. Kinda scary watching 64 CPU threads all running at 50-75% and 20 different SAN LUNs humming at 50% write bandwidth, and realizing the system was working just fine. This particular logical server (it's a hardware partition, or LPAR, on a larger box) is running what traditionally was 2 separate systems, so we can unload it back onto two systems if needed, and also scale each system several times larger without getting into architectural limitations, just by cutting big checks for more capital equipment.
CentOS gets used extensively in the middleware layer, btw.
On 03/25/11 3:37 PM, John R Pierce wrote:
On 03/25/11 2:43 PM, Les Mikesell wrote:
So doesn't that mean you need to start the next design sooner vs later? If the middleware layer handles most of the component interaction you might have some freedom to make piecemeal changes. On the other hand, distributed database technology is still evolving rapidly so if you aren't running into scaling problems waiting might be a good idea.
we're doing ...
Shoot, that was going to be a private response to Les, heh.
ooops.
-----Original Message----- From: Rajagopal Swaminathan Sent: Thursday, March 24, 2011 14:49 To: CentOS mailing list Subject: [CentOS] [OT] Re: Installing IMA (Integrity Measurement Architecture) on CentOS 5.5
Greetings,
Having said that, I have had this troubling thought for the last decade: what exactly is high availability? Is it 24/7 power-on time, or is it "when needed"? Please note I am not talking about the perhaps arrogant "on demand" attitude of a human.
You probably don't want it to be your only reference, but in my opinion Wikipedia has a pretty good first-pass definition of high availability [1], i.e., what it means between you and your boss: "...a prearranged level of operational performance will be met during a contractual measurement period."
This was under extreme circumstances in India, like 4 hours of outage (electrical power load shedding) every day
So have you and your boss prearranged a "level of operational performance will be met during a contractual measurement period"? Something like: The system will be available to the users in the building 90% of the time when the local power grid is powered up?
Greetings,
Thanks for your reply.
On 3/25/11, Denniston, Todd A CIV NAVSURFWARCENDIV Crane todd.denniston@navy.mil wrote:
So have you and your boss prearranged a "level of operational performance will be met during a contractual measurement period"? Something like: The system will be available to the users in the building 90% of the time when the local power grid is powered up?
:)
I understand that and have tried to do just that for a long time... explaining to PHBs, fruitlessly, that computer services will take about half an hour to be back on as usual.
Almost all of them are younger than me in biological age.
Even today I am doing that with my current boss, who is younger than me biologically and technologically in FLOSS, but my elder in Microsoft products.
<rant>
Sigh... the things a man has to do to keep his family afloat.
I strove to make it available as soon as I could boot at least one of those beasts, those Fedora 3 and then CentOS nodes.
Well, I have not been managing those beasts since Jan 2008, so it is no bother at this point in time.
This whole HA thing was between Aug 2005 and Dec 2008, during my stint when I owned the process of camera-ready publication using only F3 and CUPS, which was released by Shri N. R. Narayanamurthy (erstwhile CEO, Infosys):
http://vpmthane.org/pub%20-%20research%20vol%202005-6/home.htm
and part during http://vpmthane.org/pub%20-%20research%20vol%202005-6/home.htm
In the same timeframe I had presented some other papers too: http://orientalthane.com/abstracts.pdf
http://orientalthane.com/Inst%20Abstract%202005.pdf
and a few more forays into academia, where I could not make further headway due to lack of paper qualifications. </rant>
Oh! Ignore those few silly articles by me in there.
May I know if we have a centos-social list like IRC where we can rant about our experiences?
Oh, some memories are so vivid and sometimes livid.
Above IMHO,
Regards,
Rajagopal