ACID-compliant filesystem (was:Re: [CentOS] Re: centos] 4.4 upgrade problems)

Mon Sep 11 12:01:36 UTC 2006

[I realize this is growing OT, and I don't plan to discuss this ad nauseam.]
[Note also that a LUFS-based ACID filesystem (using PostgreSQL as the storage 
manager) exists; see http://www.edlsystems.com/psqlfs/ for this.  Since it's 
LUFS-based, you probably couldn't use it for a root filesystem anyway, and 
it's fairly old at this point.  A similar filesystem for FUSE is at 
http://relfs.sourceforge.net/; also, to see how old an idea this is, read 
http://www.linuxjournal.com/article/1383 for a 1997-era take on it.]

On Saturday 09 September 2006 21:08, Les Mikesell wrote:
> On Sat, 2006-09-09 at 16:57, Lamar Owen wrote:
> > In the general case, I'd like to issue something like:
> >
> > # acidfs-begin-transaction
> > # yum -y update
> > [bunch of output]
> > # if yum-no-error-condition; then
> > #    acidfs-commit
> > # else
> > #    acidfs-rollback
> > # fi

> Isn't that what LVM snapshots are supposed to provide?

No.  LVM snapshots could allow you to roll back all changes to a filesystem to 
a previous state (a VMware snapshot likewise); what I'm talking about only 
rolls back (if needed) the changes made by the yum process, allowing other 
filesystem changes to stay (like changes to a running PostgreSQL or MySQL 
database, or web page changes, or /var/log/messages changes, etc).  Basic 
database stuff.  And, critical to what I'm talking about, a truly ACID 
filesystem isolates the changes-in-process-but-not-committed to the process 
doing the changes; the rest of the system is clueless as to the changes until 
the filesystem changes are committed.
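
To make the difference from a whole-filesystem snapshot concrete, here is a minimal sketch in Python.  It is a toy in-memory model, not a real filesystem API: `ToyStore`, `Transaction`, and the paths are all made up for illustration, and it keeps only a private write set per transaction rather than implementing full MVCC.  It shows the two properties described above: uncommitted changes are visible only to the process making them, and a rollback discards only that process's changes.

```python
class ToyStore:
    """Toy in-memory stand-in for a filesystem (hypothetical, illustration only)."""

    def __init__(self):
        self.committed = {}  # path -> content that every process sees

    def write(self, path, content):
        """A change made outside any transaction: visible immediately."""
        self.committed[path] = content

    def read(self, path):
        return self.committed.get(path)

    def begin(self):
        return Transaction(self)


class Transaction:
    """Holds one process's uncommitted changes in a private write set."""

    def __init__(self, store):
        self.store = store
        self.pending = {}  # path -> content, invisible to everyone else

    def write(self, path, content):
        self.pending[path] = content

    def read(self, path):
        # The owner sees its own pending writes first, then committed state.
        if path in self.pending:
            return self.pending[path]
        return self.store.read(path)

    def commit(self):
        self.store.committed.update(self.pending)
        self.pending.clear()

    def rollback(self):
        # Only this transaction's changes are thrown away.
        self.pending.clear()


LIB, LOG = "/usr/lib/libfoo.so", "/var/log/messages"
store = ToyStore()
store.write(LIB, "v1")
store.write(LOG, "boot")

txn = store.begin()                      # stands in for acidfs-begin-transaction
txn.write(LIB, "v2")                     # the update replaces a library
store.write(LOG, "boot; named started")  # unrelated change by another process

seen_by_updater = txn.read(LIB)          # the updating process sees its new file
seen_by_others = store.read(LIB)         # everyone else still sees the old one

txn.rollback()                           # stands in for acidfs-rollback
after_rollback = (store.read(LIB), store.read(LOG))
print(seen_by_updater, seen_by_others, after_rollback)
```

After the rollback the library is back to its old version while the log entry written by the other process is still there, which is exactly what an LVM snapshot rollback would not give you.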

Of course, my procedure above is oversimplified.  One would obviously want to 
write-lock the files impacted by an update: get the list of rpms that are 
going to be updated, then lock for writing the files in those packages.  With 
MVCC, the write lock does not block any readers (they're going to get the 
previous version anyway).
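
A rough sketch of that locking behaviour, again as a toy Python model under stated assumptions: `VersionedFile`, the `package_files` mapping, and the transaction ids are hypothetical, and in practice the file list would have to come from the package manager.  A second writer on a locked file is refused, while readers are never blocked and simply get the last committed version.

```python
class VersionedFile:
    """Toy versioned file: one writer at a time, readers never blocked."""

    def __init__(self, content):
        self.versions = [content]  # committed versions, oldest first
        self.writer = None         # id of the transaction holding the write lock
        self.pending = None        # uncommitted content from that writer

    def lock_for_write(self, txn_id):
        if self.writer not in (None, txn_id):
            raise RuntimeError("already write-locked by " + self.writer)
        self.writer = txn_id

    def write(self, txn_id, content):
        if self.writer != txn_id:
            raise PermissionError("write lock not held by " + txn_id)
        self.pending = content

    def read(self):
        # Readers take no lock; they get the newest committed version.
        return self.versions[-1]

    def commit(self, txn_id):
        if self.writer != txn_id:
            raise PermissionError("write lock not held by " + txn_id)
        if self.pending is not None:
            self.versions.append(self.pending)
        self.pending = None
        self.writer = None


# Hypothetical mapping from packages being updated to the files they own.
package_files = {
    "bind": ["/usr/sbin/named"],
    "bind-libs": ["/usr/lib/libdns.so"],
}
files = {path: VersionedFile("old")
         for paths in package_files.values() for path in paths}

# Lock every file owned by a package that is about to be updated.
for paths in package_files.values():
    for path in paths:
        files[path].lock_for_write("yum")

named = files["/usr/sbin/named"]
named.write("yum", "new")
read_during_update = named.read()   # a reader is not blocked by the write lock

try:
    named.lock_for_write("other-updater")
    second_writer_refused = False
except RuntimeError:
    second_writer_refused = True

for f in files.values():
    f.commit("yum")
read_after_commit = named.read()
print(read_during_update, second_writer_refused, read_after_commit)
```

Keeping the old versions in a list is the part that stands in for MVCC here: the previous version stays readable for as long as the writer has not committed.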

> > The currently loaded programs need to continue to have the
> > older libs available if needed;

> This would not be possible without massive changes in the way
> RPM works.  The new installs won't happen if they can't see
> their dependent libs.

If the filesystem itself is using MVCC, then it is done below the RPM level by 
the filesystem; until the commit occurs all processes except the one doing 
the update still see the old filesystem state.  This is the 'I' in ACID, and 
is a basic trait of all but the most crippled databases.

> The only hope would be to make the equivalent of a virtual machine
> where the old system keeps running until the new one is
> completely constructed.

If the filesystem itself implements process-granular MVCC (multiversion 
concurrency control), then a VM environment isn't necessary.

> Backwards compatibility?  You seem to have confused Linux
> distributions with something else.  Try, for example, to
> copy a Centos 3.x distro onto filesystems created by
> 4.x and make it boot.

That's forwards compatibility.  Take a CentOS 4 system and install to a 
filesystem created by CentOS 3 to test backwards compatibility.

Yes, I know the 'culture' of real backwards compatibility is not strong, and 
that's regrettable, but at the same time even today almost all Linux 
distributions are capable of reading very early ext2 filesystems.  Now, I'm 
not sure if the modern Linuxen would open and run a libc4 a.out binary, 
though....

> Are filesystem authors required to be reasonable?

Why not? (Not that it's relevant).  <rant>And that's one reason I won't touch 
ReiserFS with a ten foot pole, even though it has some very nice 
database-like features.</rant>  Unfortunately there are a few high-profile 
OSS developers who apparently aren't reasonable (Schilling, for instance); 
most OSS developers, thankfully, are fairly reasonable, within reason (the 
PostgreSQL team, for instance, to use one with which I am closely familiar).

> I think I'm missing something here.  If yum itself or the rpm
> database wasn't broken, what went wrong?

Good question.  I don't have a full answer to this; I do know that a 'yum 
rollback' would be a very nice feature for me right now for that one server.  
The rpm database is fine; yum did not crash out, but I still got dupes, and 
bind fell over and died a horrible screaming death because of it.  I do not 
know exactly what happened behind the scenes to create the mess; I am just 
left with a mess after following the 'recommended' procedure (that worked 
fine on several other boxen; none of which were active nameservers, though).

> But you've made a big assumption here that yum itself would
> work properly while doing this.  If yum was working right
> you wouldn't have dups now.

Maybe.  I don't know that for sure; I think the problem lies deeper, myself.  
But I have no evidence to back up my gut feeling.

> > Just throwing an idea out, that's all, for discussion.  This systemic
> > non-atomicity and inconsistency is endemic to all linuxen at the moment.

> And necessarily so, since each rpm package installs independently
> and may have to complete with both process and filesystem changes
> before some of the others will work.

Yeah, I know that far too well; and I know some of RPM's more arcane bugs that 
have, in the past, been resolved with WONTFIX.  Each RPM is standalone; in 
the process of maintaining a fairly interdependent set of RPMs (PostgreSQL) 
for five years I learned this very well.  But thanks to my experience at this 
low level I am of the conviction that this is the wrong thing from a system 
point of view; unfortunately I am not convinced that the 'right way' is out 
there; but I'd know it if I saw it.  An ACID-compliant filesystem, among other 
advantages, could alleviate and work around some of the issues of the RPM 
package system (Debian's isn't any better in this regard).
-- 
Lamar Owen
Director of Information Technology
Pisgah Astronomical Research Institute
1 PARI Drive
Rosman, NC  28772
(828)862-5554
www.pari.edu