On 3/26/11 12:51 PM, Lamar Owen wrote:
On Friday, March 25, 2011 03:35:29 pm Les Mikesell wrote:
If 'get there' is defined as all redundant copies being in a consistent state, then you'll fail at this point in transactional mode in the fairly likely event that you have a network blip between the db master and slave(s) or one of them is down.
Puh-lease. TCP has solved that problem; look into the new algorithms and techniques PostgreSQL 9 brings to the ACID table.
For a single instance. The issue in scaling and failover scenarios is that you need multiple, perhaps many, copies of the data. What cloud databases and the NoSQL and CAP buzzwords are all about is how to handle the situation when part of that storage is unavailable or, worse, when the copies are segmented and still running independently.
Networks at layer 3 are expected to blip; TCP at layer 4 makes it a reliable stream. Or, if the connection goes down, both endpoints know it went down, and the database engine has a choice whether to abort and roll back or wait on a retry. Replaying write-ahead logs is another way to deal with this.
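To make the "abort and roll back or wait on a retry" choice concrete, here is a rough sketch in Python. It is not how PostgreSQL or any real engine does it; `commit_with_retry`, `make_flaky_link` and `ReplicationAborted` are made-up names for illustration, and the "replica" is just a list in memory.

```python
import time


class ReplicationAborted(Exception):
    """Raised when the replica stays unreachable and the local write is undone."""


def commit_with_retry(apply_local, send_to_replica, rollback_local,
                      retries=3, delay=0.0):
    """Apply a write locally, then ship it to the replica.

    On a dropped connection, wait and retry. After `retries` failed
    attempts, undo the local write and abort. All three callables are
    hypothetical hooks supplied by the caller.
    """
    apply_local()
    for attempt in range(1, retries + 1):
        try:
            send_to_replica()
            return attempt  # how many attempts it took
        except ConnectionError:
            if attempt < retries:
                time.sleep(delay)
    rollback_local()
    raise ReplicationAborted(f"replica unreachable after {retries} attempts")


def make_flaky_link(failures, replica, value):
    """Return a send function that fails `failures` times, then delivers."""
    state = {"left": failures}

    def send():
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionError("link down")
        replica.append(value)

    return send


# The link blips twice, then the write gets through on the third attempt.
master, replica = [], []
attempts = commit_with_retry(
    apply_local=lambda: master.append("row1"),
    send_to_replica=make_flaky_link(2, replica, "row1"),
    rollback_local=lambda: master.pop(),
)
```

If the link stays down longer than the retry budget, the local write is popped again and the caller sees `ReplicationAborted`, so both copies stay consistent at the cost of rejecting the write.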
Even with simple replication in an ACID system, if your remote copy also permits updates, you have to decide whether the whole system should become unavailable because of a single failure or whether to allow potentially conflicting writes to continue while the systems are disconnected. The scalable DBs start with the premise that partitioning is an expected real-world occurrence that applications have to deal with (and the better ones also transparently deal with adding/removing nodes as capacity needs grow and shrink). There are times an application should abort if it can't ensure that all copies are consistent, but they may be rare compared to the times you can continue with the newest data you know about.
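A toy illustration of that decision, in Python. This is only a sketch under assumptions of my own: two in-memory nodes, caller-supplied logical timestamps, and a "newest timestamp wins" merge when the partition heals. `Node`, `PartitionError` and `reconcile` are hypothetical names, not any particular database's API.

```python
class PartitionError(Exception):
    """Raised by a strict node that refuses writes while its peer is unreachable."""


class Node:
    def __init__(self, name, strict):
        self.name = name
        self.strict = strict          # True: become unavailable during a partition
        self.peer_reachable = True
        self.data = {}                # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        if self.strict and not self.peer_reachable:
            raise PartitionError(f"{self.name}: peer unreachable, write refused")
        self.data[key] = (timestamp, value)


def reconcile(a, b):
    """After the partition heals, keep the newest entry per key on both nodes."""
    for key in set(a.data) | set(b.data):
        candidates = [n.data[key] for n in (a, b) if key in n.data]
        newest = max(candidates, key=lambda entry: entry[0])
        a.data[key] = newest
        b.data[key] = newest


# Both nodes keep accepting writes while disconnected, then merge.
a, b = Node("a", strict=False), Node("b", strict=False)
a.peer_reachable = b.peer_reachable = False
a.write("k", "from-a", timestamp=1)
b.write("k", "from-b", timestamp=2)
a.peer_reachable = b.peer_reachable = True
reconcile(a, b)
```

With `strict=True` the same `write` call raises `PartitionError` instead, which is the "whole system becomes unavailable" branch. Note that the newest-wins merge silently drops the older conflicting write, so it only suits data where that loss is acceptable.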