Now that I've been enlightened to the terrible write performance of ext3 on my new 3Ware RAID 5 array, I'm stuck choosing an alternative filesystem. I benchmarked XFS, JFS, ReiserFS and ext3 and they came back in that order from best to worst performer.
I'm leaning towards XFS because of performance and because centosplus makes kernel modules available for the stock kernel.
How's the reliability of XFS? It's certainly been around long enough.
Anyone care to sway me one way or another?
Kirk Bocek
For our mysql servers we use reiserfs, which we install via a kernel rpm.
We then install reiserfs-tools rpm, and do some work on /etc/fstab and some mount commands to get it all functioning.
We do this for performance and redundancy.
The daemons you run will likely have a say in which filesystem you plan to deploy, so it's a good idea to post to those lists as well. e.g. "Squid performs horribly on RAID5, and it doesn't use SMP; it likes ext3 just fine because of how it works".
Name some daemons and you'll probably get a lot of opinions from people fairly close to their respective code-bases, or their shadowy minions ; )
-karlski
Well, Karlski, my answer would have to be 'everything.' :)
This will be a pretty general purpose server doing many different things. Mysql is one of the things it will be doing.
Now I've handed the bulk of the array space over to LVM. That would give me the flexibility to use more than one filesystem. Hmmm, I'll have to dig up some mysql benchmarks, run them up the flag pole and see who salutes.
Kirk Bocek
karl@klxsystems.net wrote:
For our mysql servers we use reiserfs, which we install via a kernel rpm.
We then install reiserfs-tools rpm, and do some work on /etc/fstab and some mount commands to get it all functioning.
We do this for performance and redundancy.
The daemons you run will likely have a say in which filesystem you plan to deploy, so it's a good idea to post to those lists as well. e.g. "Squid performs horribly on RAID5, and it doesn't use SMP; it likes ext3 just fine because of how it works".
Name some daemons and you'll probably get a lot of opinions from people fairly close to their respective code-bases, or their shadowy minions ; )
-karlski
karl@klxsystems.net wrote:
For our mysql servers we use reiserfs, which we install via a kernel rpm.
JFYI: I got the following on the reiser mailing list. The OP was also told to upgrade his reiserfs progs to the latest versions.
The bug is fixed in 2.6.18, which I built, but not in 2.6.9-42.0.2.plus.c4, which is the latest standard centos/redhat kernel that supports reiserfs.
On Mon, 2 Oct 2006 at 4:41pm, Kirk Bocek wrote
Now that I've been enlightened to the terrible write performance of ext3 on my new 3Ware RAID 5 array, I'm stuck choosing an alternative filesystem. I benchmarked XFS, JFS, ReiserFS and ext3 and they came back in that order from best to worst performer.
I'm leaning towards XFS because of performance and because centosplus makes kernel modules available for the stock kernel.
How's the reliability of XFS? It's certainly been around long enough.
Anyone care to sway me one way or another?
To a large extent it depends on what the FS will be doing. Each has its strengths.
That being said, I'd lean strongly towards XFS or JFS. Reiser... worries me. AIUI, the current incarnation has been largely abandoned for Reiser4, which is having all sorts of issues getting into the kernel.
I've used XFS for years and had very good luck with it. And some folks I respect very much here are using JFS on critical systems. Test 'em both under your presumed workload and go with whatever gives you the warm fuzzies.
Joshua Baker-LePain wrote:
Reiser... worries me.
A bit of googling gave me the same impression. I don't like being worried.
AIUI,
Ah, the sound I make when a filesystem crashes...
I've used XFS for years and had very good luck with it. And some folks I respect very much here are using JFS on critical systems. Test 'em both under your presumed workload and go with whatever gives you the warm fuzzies.
Since you're the one who started me on this mess (gee, thanks! :)) here's what XFS looks like after enabling memory interleaving and 3.0GB/Sec SATA:
                    ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
Beryl       10G:64k 59751  93 237853 41 59695   8 48936  77 210088 17 256.7   2
Beryl       10G:64k 59533  94 241177 41 59023   8 52625  80 214198 17 261.3   2
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
      files:max:min  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
Beryl            16  4646  23 +++++ +++  4941  20  3050  15 +++++ +++   783   3
Beryl            16  3515  17 +++++ +++  3623  15  2829  14 +++++ +++   827   4
210MB/Sec reads, 235MB/Sec writes. Yummy!
Kirk Bocek
Joshua Baker-LePain wrote:
On Mon, 2 Oct 2006 at 4:41pm, Kirk Bocek wrote
Now that I've been enlightened to the terrible write performance of ext3 on my new 3Ware RAID 5 array, I'm stuck choosing an alternative filesystem. I benchmarked XFS, JFS, ReiserFS and ext3 and they came back in that order from best to worst performer.
I'm leaning towards XFS because of performance and because centosplus makes kernel modules available for the stock kernel.
How's the reliability of XFS? It's certainly been around long enough.
Anyone care to sway me one way or another?
To a large extent it depends on what the FS will be doing. Each has its strengths.
That being said, I'd lean strongly towards XFS or JFS. Reiser... worries me. AIUI, the current incarnation has been largely abandoned for Reiser4, which is having all sorts of issues getting into the kernel.
I've used XFS for years and had very good luck with it. And some folks I respect very much here are using JFS on critical systems. Test 'em both under your presumed workload and go with whatever gives you the warm fuzzies.
I seem to have maxed out at approximately 275mb/sec on writes and about 200mb/sec on reads with the following configuration:
Dual Opteron 275's
2GB RAM (4 x 512MB)
3Ware 9550SX w/8 ports
8 x 750GB Barracudas (RAID 0)
2 x 80GB Seagates for the OS
NCQ off
9550 set to "performance" rather than "balanced" on the storsave (or whatever that parameter was called)
ext3 file system with "blockdev --setra 16384" <-- great find!
CentOS 4.4 64-bit
I'm too chicken/paranoid/etc to fiddle with XFS since I'm cpu bound most of the time (encoding/fondling uncompressed video). At some point, I'll switch the array over to RAID5 so there is some sort of safety net, but right now I'm working with play data so it doesn't really matter.
Cheers,
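For reference, the "blockdev --setra 16384" tweak mentioned above is just the block-device read-ahead setting. A minimal sketch of applying and checking it (the device node is only a placeholder for whatever the 3ware array shows up as):

  # set read-ahead to 16384 sectors (512-byte units, i.e. 8 MiB); /dev/sda is a placeholder
  blockdev --setra 16384 /dev/sda
  # check the current value
  blockdev --getra /dev/sda
  # the setting is lost at reboot; one common approach is to append it to rc.local
  echo "blockdev --setra 16384 /dev/sda" >> /etc/rc.d/rc.local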
Joshua Baker-LePain wrote: I seem to have maxed out at approximately 275mb/sec on writes and about 200mb/sec on reads with the following configuration:
Dual Opteron 275's
2GB RAM (4 x 512MB)
3Ware 9550SX w/8 ports
8 x 750GB Barracudas (RAID 0)
2 x 80GB Seagates for the OS
NCQ off
9550 set to "performance" rather than "balanced" on the storsave (or whatever that parameter was called)
ext3 file system with "blockdev --setra 16384" <-- great find!
CentOS 4.4 64-bit
I'm too chicken/paranoid/etc to fiddle with XFS since I'm cpu bound most of the time (encoding/fondling uncompressed video). At some point, I'll switch the array over to RAID5 so there is some sort of safety net, but right now I'm working with play data so it doesn't really matter.
3Ware's site seems to point to 300+MB/Sec with 8 disks so it sounds like you're close. Read speed seems low. As I said, enabling memory interleaving on my motherboard and setting the drives to 3GB/Sec made a big difference.
8x750 Gig! I still remember when a friend bought his first 512MB drive and I asked him what he was going to do with all that space! Of course that was long before any thought of video on a PC...
Kirk Bocek
Kirk Bocek wrote:
8x750 Gig! I still remember when a friend bought his first 512MB drive and I asked him what he was going to do with all that space! Of course that was long before any thought of video on a PC...
Yeah, it's quite a bit of elbow room for now. Going back to memory lane....I remember starting one of the first public access Internet sites in NYC about 15 years ago. One of the original core machines was a 486/25 with 1 or 2mb of RAM, a couple of 80mb quantum SCSI drives, and some multiport serial cards with lots of modems and octopus cabling all over. People used to call long distance to login with a shell account and do their thing as I was one of the few outfits that had any real bandwidth (128k fractional T1...which was more than a lot of college campuses at the time). That machine would often have 20-30 simultaneous dialup users at a whopping 9600 baud and ran Bill Jolitz's 386BSD and then very quickly migrated to BSDI's BSD/OS. :) Eventually, Usenet began to take over available disk space and I got my first 1gig barracuda....then a 4 gig....then an 8 gig...Time flies when you're having fun. ;) And I remember paying $1-2k for a single 4gig barracuda back then and now you can buy a few terabytes for the same investment......
Cheers,
Kirk Bocek wrote:
Joshua Baker-LePain wrote: I seem to have maxed out at approximately 275mb/sec on writes and about 200mb/sec on reads with the following configuration:
Dual Opteron 275's
2GB RAM (4 x 512MB)
3Ware 9550SX w/8 ports
8 x 750GB Barracudas (RAID 0)
2 x 80GB Seagates for the OS
NCQ off
9550 set to "performance" rather than "balanced" on the storsave (or whatever that parameter was called)
ext3 file system with "blockdev --setra 16384" <-- great find!
CentOS 4.4 64-bit
I'm too chicken/paranoid/etc to fiddle with XFS since I'm cpu bound most of the time (encoding/fondling uncompressed video). At some point, I'll switch the array over to RAID5 so there is some sort of safety net, but right now I'm working with play data so it doesn't really matter.
3Ware's site seems to point to 300+MB/Sec with 8 disks so it sounds like you're close. Read speed seems low. As I said, enabling memory interleaving on my motherboard and setting the drives to 3GB/Sec made a big difference.
8x750 Gig! I still remember when a friend bought his first 512MB drive and I asked him what he was going to do with all that space! Of course that was long before any thought of video on a PC...
I just updated with a newer motherboard bios and enabled memory interleaving and now I'm getting 201mb/sec for writes and 317mb/sec for reads. I think that's definitely fast enough for me to stop fiddling with it. :-)
Cheers,
Joshua Baker-LePain wrote:
On Mon, 2 Oct 2006 at 4:41pm, Kirk Bocek wrote
Now that I've been enlightened to the terrible write performance of ext3 on my new 3Ware RAID 5 array, I'm stuck choosing an alternative filesystem. I benchmarked XFS, JFS, ReiserFS and ext3 and they came back in that order from best to worst performer.
I'm leaning towards XFS because of performance and because centosplus makes kernel modules available for the stock kernel.
How's the reliability of XFS? It's certainly been around long enough.
Anyone care to sway me one way or another?
To a large extent it depends on what the FS will be doing. Each has its strengths.
That being said, I'd lean strongly towards XFS or JFS. Reiser... worries me. AIUI, the current incarnation has been largely abandoned for Reiser4, which is having all sorts of issues getting into the kernel.
I would strongly lean away from XFS. JFS appears to be the safest bet, and its performance is actually very good in all respects, judging from the benchmarks I have seen.
reiser4 is having all sorts of issues getting into the kernel and XFS is having all sorts of issues being maintained. Some kernel developers even went so far as to say that they do not want to have anything to do with XFS.
I've used XFS for years and had very good luck with it. And some folks I respect very much here are using JFS on critical systems. Test 'em both under your presumed workload and go with whatever gives you the warm fuzzies.
XFS is good until you lose power while the disk subsystem is under load. This was when XFS was in its best form too (around 2.4.18 - 2.4.22). Not many people use JFS but it does actually seem to have the best environment.
Feizhou wrote:
XFS is good until you lose power while the disk subsystem is under load. This was when XFS was in its best form too (around 2.4.18 - 2.4.22). Not many people use JFS but it does actually seem to have the best environment.
JFS shares codebase with JFS2 in AIX and sees a lot of development and maintenance there. Filesystems can be tricky from a support POV, especially on large production systems. The upstream provider is pretty picky about filesystems, even if you go for one of the 3rd party supported ones like JFS and OCFS and you have reasons for choosing them.
There is more to filesystems than speed.
Morten Torstensen wrote:
Feizhou wrote:
XFS is good until you lose power while the disk subsystem is under load. This was when XFS was in its best form too (around 2.4.18 - 2.4.22). Not many people use JFS but it does actually seem to have the best environment.
JFS shares codebase with JFS2 in AIX and sees a lot of development and maintenance there. Filesystems can be tricky from a support POV, especially on large production systems. The upstream provider is pretty picky about filesystems, even if you go for one of the 3rd party supported ones like JFS and OCFS and you have reasons for choosing them.
The thing is, I do not see a lot of stuff going on with JFS on LKML. reiser v3 bugs pop up now and then, XFS had spats going on and ext3 is rather lackluster and still gets reports now and then. Upstream going with ext3 is rather expected since Redhat is the backer of ext3 just as Suse is behind reiser v3.
There is more to filesystems than speed.
Most certainly. Where are the Linux JFS related complaints?
Feizhou wrote:
There is more to filesystems than speed.
Most certainly. Where are the Linux JFS related complaints?
Maybe there aren't many? :) Few users or good quality... pick.
Anyway, this is the starting point for JFS: http://jfs.sourceforge.net/
There are mailing lists and bugreporting available there.
Morten Torstensen wrote:
Feizhou wrote:
There is more to filesystems than speed.
Most certainly. Where are the Linux JFS related complaints?
Maybe there aren't many? :) Few users or good quality... pick.
:) Actually I really wondered whether it was due to few users :P
Feizhou wrote:
The thing is, I do not see a lot of stuff going on with JFS on LKML. reiser v3 bugs pop up now and then, XFS had spats going on and ext3 is rather lackluster and still gets reports now and then. Upstream going with ext3 is rather expected since Redhat is the backer of ext3 just as Suse is behind reiser v3.
Okay, so Feizhou is of the glass-is-half-empty school of filesystems. :)
Joshua says he has been using XFS for years. Can anyone else share anecdotes regarding XFS? Anyone else happy with it?
Kirk Bocek
Kirk Bocek wrote:
Feizhou wrote:
The thing is, I do not see a lot of stuff going on with JFS on LKML. reiser v3 bugs pop up now and then, XFS had spats going on and ext3 is rather lackluster and still gets reports now and then. Upstream going with ext3 is rather expected since Redhat is the backer of ext3 just as Suse is behind reiser v3.
Okay, so Feizhou is of the glass-is-half-empty school of filesystems. :)
Joshua says he has been using XFS for years. Can anyone else share anecdotes regarding XFS? Anyone else happy with it?
Is your process even disk throughput bound? If not, you may be agonizing over a decision that needn't even be taken if the "tried and true" and default supported file system (ext3) is "fast enough" to avoid becoming a bottleneck.
That's where I find myself so I've taken the easy way out, for now, and have stuck with the standard file system.
Cheers,
On Tue, 2006-10-03 at 09:15 -0700, Kirk Bocek wrote:
Feizhou wrote:
The thing is, I do not see a lot of stuff going on with JFS on LKML. reiser v3 bugs pop up now and then, XFS had spats going on and ext3 is rather lackluster and still gets reports now and then. Upstream going with ext3 is rather expected since Redhat is the backer of ext3 just as Suse is behind reiser v3.
Okay, so Feizhou is of the glass-is-half-empty school of filesystems. :)
Joshua says he has been using XFS for years. Can anyone else share anecdotes regarding XFS? Anyone else happy with it?
Kirk Bocek
Personally, I would never use anything except ext3 on a RH based kernel ... but that is just me.
Johnny Hughes wrote:
Personally, I would never use anything except ext3 on a RH based kernel ... but that is just me.
Yup.. would love to use JFS, but for me it is not worth it. RH basically tests NOTHING but ext3. They might test function, but not do thorough reliability testing in stress scenarios.
I say that from observing RH, not on actual knowledge of what they test and how.
Bottom line is that I agree with Johnny... if you positively don't *need* another filesystem, use ext3.
On Tue, 2006-10-03 at 22:13 +0200, Morten Torstensen wrote:
Bottom line is that I agree with Johnny... if you positively don't *need* another filesystem, use ext3.
Plus, I have a notion that the "interaction between ext3 and 3ware raid5" referenced in the previous episode might just have something to do with ext3's ordered data writes, which can be turned off.
I personally feel that ext3 is a much maligned filesystem. Tragically, it is maligned because the dev team and Redhat chose to do "the right thing".
"The right thing" being that they set data=ordered as the default. There is a significant, though *usually* not severe, performance penalty. But the data integrity guarantees are substantially better than with most any other journalled FS. Kudos to them.
That reminds me, someone mentioned that reiserfs v3 does not have an ordered or full data journalling mode. That is not correct. I'm no reiserfs fan, but I do know that those modes were quietly added to reiserfs v3 a while back. Namesys is usually a publicity hound deluxe, but only for their current project; the old ones can rot.
So, relatively few people know about those additions to v3.
-Steve
On Tue, 3 Oct 2006 at 3:31pm, Steve Bergman wrote
On Tue, 2006-10-03 at 22:13 +0200, Morten Torstensen wrote:
Bottom line is that I agree with Johnny... if you positively don't *need* another filesystem, use ext3.
Plus, I have a notion that the "interaction between ext3 and 3ware raid5" referenced in the previous episode might just have something to do with ext3's ordered data writes, which can be turned off.
Oh, I tested ext3 vs. 3ware RAID5 in *multitudes* of configurations -- all 3 different journaling configs, external journals, various size journals, etc. Nothing helped. There's just some bad juju there. On the same hardware, XFS and even ext2 pulled far better numbers than ext3. Put the 3ware in RAID10 (or use md), though, and ext3 worked just fine with it.
Trust me, it wasn't for lack of trying.
On Tue, 2006-10-03 at 16:36 -0400, Joshua Baker-LePain wrote:
Trust me, it wasn't for lack of trying.
Hmmm, sorry to hear that.
Have you posted this info to lkml? Even if you don't get an answer, it is something that should be reported. Or perhaps it has already been addressed.
I'm not necessarily recommending that you do this for production, but have you tried a more recent kernel? CentOS's 2.6.9 is sort of ancient.
Due to VM problems with the CentOS 4.4 kernel that I think were likely VMWare related, I recently moved one of my servers to the 2.6.16.x vanilla kernel that is supposed to have long term support now.
It was pretty easy and clean.
I downloaded the 2.6.16.29 source and the original FC5 kernel SRPM, which used kernel 2.6.15.
Copy the proper config file from the configs directory into the 2.6.16.29 source tree, do a "make oldconfig", and then "make", "make modules_install", "make install".
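A rough sketch of those steps, assuming the 2.6.16.29 tarball is unpacked under /usr/src and the config has been pulled out of the FC5 kernel SRPM (paths and file names below are only examples):

  cd /usr/src/linux-2.6.16.29
  # copy the distribution config that matches your architecture into the tree
  cp /path/to/srpm/configs/kernel-2.6.15-x86_64.config .config
  make oldconfig        # answer prompts for options added since 2.6.15
  make                  # build kernel and modules
  make modules_install  # install modules under /lib/modules/2.6.16.29
  make install          # install the kernel, initrd and bootloader entry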
At the very least, you would find out if ext3 in the upcoming CentOS 5 might be likely to handle this better.
Best Of luck!
-Steve
On Tue, 3 Oct 2006 at 3:55pm, Steve Bergman wrote
On Tue, 2006-10-03 at 16:36 -0400, Joshua Baker-LePain wrote:
Trust me, it wasn't for lack of trying.
Hmmm, sorry to hear that.
Have you posted this info to lkml? Even if you don't get an answer, it is something that should be reported. Or perhaps it has already been addressed.
At the time, I had a long discussion about it on nahant-list (see the embarrassingly titled thread that starts here https://www.redhat.com/archives/nahant-list/2005-October/msg00271.html) with no resolution. And I brought it up again on the ext3 list in April.
At the very least, you would find out if ext3 in the upcoming CentOS 5 might be likely to handle this better.
The other issue with ext3 that will soon bite me is the 8TB limitation. It's pretty easy to get above that these days. It's good to see that the RHEL5 beta ups that to 16TB, but I can't say I'm not worried about running ext3 on something that big.
At the time, I had a long discussion about it on nahant-list (see the embarrassingly titled thread that starts here https://www.redhat.com/archives/nahant-list/2005-October/msg00271.html) with no resolution. And I brought it up again on the ext3 list in April.
It seems that some of the "performance sucks" feeling should actually be directed at the 3Ware RAID adapter that is common in those threads rather than the filesystem.
Using Bonnie against hardware RAID5 on a Dell PowerEdge 2850 (PERC4 something-or-other using Megaraid driver) on ext3 gives 75 MB/s.
Dan Stoner Network Administrator Florida Museum of Natural History University of Florida
On Wed, 4 Oct 2006 at 10:50am, Dan Stoner wrote
At the time, I had a long discussion about it on nahant-list (see the embarrassingly titled thread that starts here https://www.redhat.com/archives/nahant-list/2005-October/msg00271.html) with no resolution. And I brought it up again on the ext3 list in April.
It seems that some of the "performance sucks" feeling should actually be directed at the 3Ware RAID adapter that is common in those threads rather than the filesystem.
Except for the fact that other FSes on the same hardware get far better performance. These were my results using 2 7506-8 boards, each in hardware RAID5 mode, with a software RAID0 on top:
      write  read
      -----  ----
ext2     81   180
ext3     34   222
XFS     109   213
Joshua Baker-LePain wrote:
On Wed, 4 Oct 2006 at 10:50am, Dan Stoner wrote
At the time, I had a long discussion about it on nahant-list (see the embarrassingly titled thread that starts here https://www.redhat.com/archives/nahant-list/2005-October/msg00271.html) with no resolution. And I brought it up again on the ext3 list in April.
It seems that some of the "performance sucks" feeling should actually be directed at the 3Ware RAID adapter that is common in those threads rather than the filesystem.
Except for the fact that other FSes on the same hardware get far better performance. These were my results using 2 7506-8 boards, each in hardware RAID5 mode, with a software RAID0 on top:
      write  read
      -----  ----
ext2     81   180
ext3     34   222
XFS     109   213
Joshua,
Did you do any other tuning to get the ext2 numbers? I have two 7506-4 boards that I can only seem to get 13/117 out of.
Best Regards, Camron
On Wed, 4 Oct 2006 at 7:29am, Camron W. Fox wrote
Joshua Baker-LePain wrote:
Except for the fact that other FSes on the same hardware get far better performance. These were my results using 2 7506-8 boards, each in hardware RAID5 mode, with a software RAID0 on top:
      write  read
      -----  ----
ext2     81   180
ext3     34   222
XFS     109   213
Did you do any other tuning to get the ext2 numbers? I have two 7506-4 boards that I can only seem to get 13/117 out of.
Well, keep in mind that the ext2 number was across two 8-port boards. So I wouldn't expect to see much better than 20 on a single 4-port.
As for tuning, you'll have to read through the thread I referenced before. That testing was over a year ago -- I have trouble remembering what I had for lunch yesterday.
On Tuesday 03 October 2006 22:36, Joshua Baker-LePain wrote:
On Tue, 3 Oct 2006 at 3:31pm, Steve Bergman wrote
On Tue, 2006-10-03 at 22:13 +0200, Morten Torstensen wrote:
Bottom line is that I agree with Johnny... if you positively don't *need* another filesystem, use ext3.
Plus, I have a notion that the "interaction between ext3 and 3ware raid5" referenced in the previous episode might just have something to do with ext3's ordered data writes, which can be turned off.
Oh, I tested ext3 vs. 3ware RAID5 in *multitudes* of configurations -- all 3 different journaling configs, external journals, various size journals, etc. Nothing helped. There's just some bad juju there. On the same hardware, XFS and even ext2 pulled far better numbers than ext3. Put the 3ware in RAID10 (or use md), though, and ext3 worked just fine with it.
Trust me, it wasn't for lack of trying.
Like Joshua I've tried many different configs, different kernels, different journal modes, etc... 3ware + raid5 + ext3 just isn't very fast.
/Peter
Steve Bergman wrote:
Plus, I have a notion that the "interaction between ext3 and 3ware raid5" referenced in the previous episode might just have something to do with ext3's ordered data writes, which can be turned off.
I just remounted an ext3 filesystem with '-o data=writeback' and attempted to run bonnie++ as I've been doing all along here. The system basically came to a halt. The first step in the benchmark creates a series of 1GB files. This hasn't taken more than a couple of minutes on any other test. After 10 or 15 minutes with it only half way through the creation process, I decided to abort the benchmark. And that's difficult because the system is now only semi-responsive.
'data=writeback' isn't the answer. Sorry.
Kirk Bocek
Kirk Bocek wrote:
Steve Bergman wrote:
Plus, I have a notion that the "interaction between ext3 and 3ware raid5" referenced in the previous episode might just have something to do with ext3's ordered data writes, which can be turned off.
I just remounted an ext3 filesystem with '-o data=writeback' and attempted to run bonnie++ as I've been doing all along here. The system basically came to a halt. The first step in the benchmark creates a series of 1GB files. This hasn't taken more than a couple of minutes on any other test. After 10 or 15 minutes with it only half way through the creation process, I decided to abort the benchmark. And that's difficult because the system is now only semi-responsive.
'data=writeback' isn't the answer. Sorry.
I am mounting my RAID device with the data=writeback option without incident.
Cheers,
chrism@imntv.com wrote:
Kirk Bocek wrote:
I just remounted an ext3 filesystem with '-o data=writeback' and attempted to run bonnie++ as I've been doing all along here. The system basically came to a halt. The first step in the benchmark creates a series of 1GB files. This hasn't taken more than a couple of minutes on any other test. After 10 or 15 minutes with it only half way through the creation process, I decided to abort the benchmark. And that's difficult because the system is now only semi-responsive.
'data=writeback' isn't the answer. Sorry.
I am mounting my RAID device with the data=writeback option without incident. Cheers,
Okay, I take it back. After remounting with default options, I'm still getting a slowdown. Something else is going on. My Bad!
Morten Torstensen wrote:
Johnny Hughes wrote:
Personally, I would never use anything except ext3 on a RH based kernel ... but that is just me.
Yup.. would love to use JFS, but for me it is not worth it. RH basically tests NOTHING but ext3. They might test function, but not do thorough reliability testing in stress scenarios.
I say that from observing RH, not on actual knowledge of what they test and how.
Bottom line is that I agree with Johnny... if you positively don't *need* another filesystem, use ext3.
The Linux kernel's choices of filesystems all have strengths and drawbacks.
ext3 is robust against minor hardware faults. It however can have its directory and some file data messed up real bad when it crashes or encounters power failure. I have had to manually go through mail queues to see what can be salvaged before deleting the entire lot. This is still better than XFS where I don't even bother looking for salvageable mails.
ext3 never matched XFS' performance though...so it is pick your poison.
I guess the best thing is probably to get a battery backed up NVRAM device to use as your external journal and run with data=journal with ext3. This ought to run all other filesystems out of town in terms of performance and integrity for many cases.
The Linux kernel positively needs another filesystem.
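For anyone who wants to try that, a minimal sketch of setting up an external journal, assuming the battery-backed card appears as an ordinary block device (all device names below are placeholders):

  # format the battery-backed device as an external ext3 journal
  mke2fs -O journal_dev /dev/nvram0
  # create the data filesystem pointing at that journal device
  mkfs.ext3 -J device=/dev/nvram0 /dev/sdb1
  # mount with full data journalling
  mount -t ext3 -o data=journal /dev/sdb1 /mnt/data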
On Tuesday 03 October 2006 05:02, Feizhou wrote:
Suse is behind reiser v3.
Not any more. See http://linux.wordpress.com/2006/09/27/suse-102-ditching-reiserfs-as-it-defau... for details. SuSE is going to ext3 as its default filesystem with 10.2, looks like.
Lamar Owen wrote:
On Tuesday 03 October 2006 05:02, Feizhou wrote:
Suse is behind reiser v3.
Not any more. See http://linux.wordpress.com/2006/09/27/suse-102-ditching-reiserfs-as-it-defau... for details. SuSE is going to ext3 as its default filesystem with 10.2, looks like.
I was just about to post this. It is very informative as to why reiserfs is a bad idea.
On Tue, 2006-10-03 at 16:44 -0700, Nathan Grennan wrote:
I was just about to post this. It is very informative as to why reiserfs is a bad idea.
And by extension, not just v3, but also v4.
I remember back when Linux had no journalling filesystems. Stephen Tweedie said he was working on adding a journalling layer to ext2. Several months after he announced that, and very curious, I emailed to ask him how things were going. He said he'd have something for people to look at in about 6 months.
A year and a half passed before he had something he felt was worthy for the world to see. Progress seemed positively glacial.
Some more time passed, and Hans, of Namesys, announced that he^Wthey were adding journalling to reiserfs. It was all done in almost no time at all.
I wasn't following things all that closely. To my eye, one day reiserfs went from an overly hyped filesystem, entirely based on B-Trees, to being the first Linux filesystem with journalling.
To be honest, I was excited about it at the time. Ext3 was experimental, as I recall, and had only full data journalling, at a substantial performance penalty.
The thing is, over time, ext3 evolved, and became performant, and standard, and really solid. Tweedie and his team were the tortoise, to Namesys's hare.
Meanwhile, the cracks started to reveal themselves in reiserfs.
The horror stories of data loss...
They ended up getting mostly resolved, though from what I hear, Suse is mainly responsible for that.
These days, Namesys's hype is all about Reiser4.
Reiser3 is yesterday! Reiser4 is tomorrow!!!
Yeah, yeah, yeah... Some of us remember last time...
These days, Namesys's hype is all about Reiser4.
Or keeping Hans away from a murder rap....
Reiser3 is yesterday! Reiser4 is tomorrow!!!
I get the feeling Reiser4 will be a few days after tomorrow since the devel has some more pressing life issues.
Back on a serious note -- while it doesn't really compare much to some of the other options going on in here, you can get a decent ext3 performance boost simply by increasing the commit time and mounting with noatime (assuming you don't need atime records). I stick with the default ext3 on most systems (otherwise I'm an xfs fan), but these two mount option adjustments let you crank some more speed out of ext3. By default ext3 commits every 5 seconds; try setting commit to 10, 15, or even 20 seconds and see what you get.
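For reference, a sketch of what those two options look like in practice (device and mount point are only examples):

  # /etc/fstab entry with noatime and a 15 second journal commit interval
  /dev/VolGroup00/LogVol01  /srv/data  ext3  defaults,noatime,commit=15  1 2

  # or apply to an already-mounted filesystem without a reboot
  mount -o remount,noatime,commit=15 /srv/data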
Kirk Bocek wrote:
Jim Perrin wrote:
... (otherwise I'm an xfs fan)...
Share your experiences with xfs a bit, if you would. Joshua seems to have some history with it. Even Feizhou seems to have something nice to say about it. :)
XFS was good...there were tests conducted comparing ext3 and XFS in terms of data loss and directory corruption, and XFS did pretty well, almost the same as ext3...BUT that was XFS version 1.1 patched against a 2.4.20 RH kernel, ages ago.
There have been opinions on LKML that XFS was really good around 2.4.18 - 2.4.22, and things I have been through seem to agree with that. With regard to the current version, things have apparently gotten so messy that some have publicly stated they do not want to have anything to do with it. My experiences with XFS on 2.6.x have not been very positive. I have seen it go into read-only mode quite a few times, and forget about data integrity in a crash or after a power failure. Performance-wise, XFS has mostly been very good.
XFS and Linux just do not meld together because Linux and the Irix kernel do certain things differently, according to some posts on LKML, so I guess there is a reason why Redhat has pulled XFS from its list of supported filesystems.
Nathan Grennan wrote:
Lamar Owen wrote:
On Tuesday 03 October 2006 05:02, Feizhou wrote:
Suse is behind reiser v3.
Not any more. See http://linux.wordpress.com/2006/09/27/suse-102-ditching-reiserfs-as-it-defau... for details. SuSE is going to ext3 as its default filesystem with 10.2, looks like.
I was just about to post this. It is very informative of why reiserfs is a bad idea.
Then there is their dependence on properly working hardware. Recently they have been talking about making reiserfs more robust to hardware faults. So if your disk starts acting up, you might lose data or even your whole filesystem...
On Wed, 04 Oct 2006 10:58:27 +0800 Feizhou feizhou@graffiti.net wrote:
Then there is their dependence on properly working hardware. Recently they have been talking about making reiserfs more robust to hardware faults. So if your disk starts acting up, you might lose data or even your whole filesystem...
There was a nice paper published recently (at OLS, or maybe I got the link from one of the OLS presentations) about an "iron ext3", an ext3 modified to withstand various data corruptions possibly caused by hardware failures. It also has a nice table comparing how different Linux file systems stand up to those corruptions:
http://www.cs.wisc.edu/adsl/Publications/iron-sosp05.pdf
Very good reading to anyone who's concerned about digital data storage.
Also, for discussion about cheap storage there's a mailing list called "linux-ide-arrays": http://marc.theaimsgroup.com/?l=linux-ide-arrays&r=1&w=2 Subscribe at http://lists.math.uh.edu/cgi-bin/mj_wwwusr
But remember ... "Cheap, fast, reliable. Pick any two, you can't have all three" ... is even more true for storage than for anything else ;)
Lamar Owen wrote:
On Tuesday 03 October 2006 05:02, Feizhou wrote:
Suse is behind reiser v3.
Not any more. See http://linux.wordpress.com/2006/09/27/suse-102-ditching-reiserfs-as-it-defau... for details. SuSE is going to ext3 as its default filesystem with 10.2, looks like.
This did not make it to slashdot?!?! Hans sure does not have a lot going for him anymore.
Kirk Bocek wrote:
Now that I've been enlightened to the terrible write performance of ext3 on my new 3Ware RAID 5 array, I'm stuck choosing an alternative filesystem. I benchmarked XFS, JFS, ReiserFS and ext3 and they came back in that order from best to worst performer.
I'm leaning towards XFS because of performance and because centosplus makes kernel modules available for the stock kernel.
How's the reliability of XFS? It's certainly been around long enough.
Anyone care to sway me one way or another?
Here is the story, if not somewhat outdated, that I have learned over time.
XFS, fast, but can fail under load, does XORs of data, so a bad write, as in power failure, can mean garbage in a file. It is meta-data only journaling. Also slow on deletes.
JFS, reasonably fast, not popular; I read of lots of bugs last time I looked into it a few years ago; again, meta-data only journaling.
ReiserFS v3, very buggy, meta-data only, and not well maintained at this point. Bad writes can lead to zeros in your files.
ReiserFS v4, sounds great, may be everything I want in a filesystem, but isn't in the kernel yet. Can do data journaling in addition to meta-data only.
ext3, works for me. It is meta-data only by default, but does it in such a way as to minimize the risk much more than other filesystems. Also has writeback mode which is like other filesystems if you are looking for better performance. Also has full data journalling mode, which is atomic and is actually faster than the other two in certain situations.
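For the record, those three ext3 journalling modes are selected with the data= mount option; a quick illustration (device and mount point are placeholders):

  mount -t ext3 -o data=ordered   /dev/sdb1 /mnt/test   # the default
  mount -t ext3 -o data=writeback /dev/sdb1 /mnt/test   # metadata-only, like most other journalled filesystems
  mount -t ext3 -o data=journal   /dev/sdb1 /mnt/test   # full data journalling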
On Tue, 2006-10-03 at 12:15 -0700, Nathan Grennan wrote:
ext3, works for me. It is meta-data only by default, but does it in such a way as to minimize the risk much more than other filesystems. Also has writeback mode which is like other filesystems if you are looking for better performance. Also has full data journalling mode, which is atomic and is actually faster than the other two in certain situations.
Has anyone done benchmarks on ext3 with the dir_index option? I've used reiserfs in the past for better performance in creating and deleting many files on filesystems that handle maildir directories or for backuppc with its millions of hardlinks, but perhaps ext3 would work as well with the indexes enabled.
Where do you find this option, Les? I don't see it in the man page for mount.
Kirk Bocek
Les Mikesell wrote:
Has anyone done benchmarks on ext3 with the dir_index option? I've used reiserfs in the past for better performance in creating and deleting many files on filesystems that handle maildir directories or for backuppc with its millions of hardlinks, but perhaps ext3 would work as well with the indexes enabled.
On Tue, 2006-10-03 at 13:49 -0700, Kirk Bocek wrote:
Has anyone done benchmarks on ext3 with the dir_index option? I've used reiserfs in the past for better performance in creating and deleting many files on filesystems that handle maildir directories or for backuppc with its millions of hardlinks, but perhaps ext3 would work as well with the indexes enabled.
Where do you find this option, Les? I don't see it in the man page for mount.
It's a filesystem option, not a mount option. Look at man tune2fs.
On Tue, 3 Oct 2006 at 4:23pm, Les Mikesell wrote
On Tue, 2006-10-03 at 13:49 -0700, Kirk Bocek wrote:
Has anyone done benchmarks on ext3 with the dir_index option? I've used reiserfs in the past for better performance in creating and deleting many files on filesystems that handle maildir directories or for backuppc with its millions of hardlinks, but perhaps ext3 would work as well with the indexes enabled.
Where do you find this option, Les? I don't see it in the man page for mount.
It's a filesystem option, not a mount option. Look at man tune2fs.
Also note that it's the default for FSs created by anaconda at install time, but *not* (last I checked) a default for mke2fs. To turn dir_index on at mke2fs time, use the '-O dir_index' flag.
If you use tune2fs to add the option to an extant FS, then dir_index only applies to new directories (i.e. created after you added the option). To retroactively apply it to the whole FS, you have to do the tune2fs and then take the FS offline and run 'e2fsck -fD' on the device.
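Putting that recipe in one place (the device name is only an example):

  # enable dir_index when creating an ext3 filesystem
  mke2fs -j -O dir_index /dev/sdb1

  # or add it to an existing filesystem, then rebuild the indexes offline
  tune2fs -O dir_index /dev/sdb1
  umount /dev/sdb1
  e2fsck -fD /dev/sdb1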
Joshua Baker-LePain wrote:
Also note that it's the default for FSs created by anaconda at install time, but *not* (last I checked) a default for mke2fs. To turn dir_index on at mke2fs time, use the '-O dir_index' flag.
If you use tune2fs to add the option to an extant FS, then dir_index only applies to new directories (i.e. created after you added the option). To retroactively apply it to the whole FS, you have to do the tune2fs and then take the FS offline and run 'e2fsck -fD' on the device.
Thanks Joshua, that's good info. I didn't know about the anaconda interaction.
Paul Heinlein wrote:
On Tue, 3 Oct 2006, Les Mikesell wrote:
Has anyone done benchmarks on ext3 with the dir_index option?
Anecdotally, I've observed dir_index speeding things up noticeably.
But not enough to make a really big difference. I would not use ext3 for directories with tens of thousands of entries. I have had a good experience with RH 2.4.20 patched with an XFS upgrade...but I cannot say much for today's 2.6.x kernels.
Nathan Grennan wrote:
XFS, fast, but can fail under load, does XORs of data, so a bad write, as in power failure, can mean garbage in a file. It is meta-data only journaling. Also slow on deletes.
You and several others point to a greater chance for data corruption. However, this host will be on a UPS. The system will be safely shut down before the power goes off. Isn't that enough protection?
Kirk Bocek
On Tue, 2006-10-03 at 14:09 -0700, Kirk Bocek wrote:
You and several others point to a greater chance for data corruption. However, this host will be on a UPS. The system will be safely shut down before the power goes off. Isn't that enough protection?
In that case, why not use the blazingly fast ext2?
-Steve
On Tue, 2006-10-03 at 14:44 -0700, Kirk Bocek wrote:
Steve Bergman wrote:
In that case, why not use the blazingly fast ext2?
Hmmm, good idea. I hadn't thought to take a step back. Thanks.
Well, that reply was really intended to get you thinking about other ways that the system could be brought down unexpectedly:
If the UPS battery goes bad, and they do, the next power blip and...boom!
Hardware failure of any form.
Kernel panic due to some silly problem like the memory needing to be reseated, due to slightly oxidized connector surfaces.
New admin tripping over the power cord or accidentally unplugging the wrong plug.
Someone plugging a laser printer or space heater into a power strip connected to the UPS.
I actually had that (the space heater thing) happen at one of my client sites once, a long time ago. It wasn't exactly the sort of place that one would expect to see raid5, though. Come to think of it, it was a while back. I think it was a 3B2.
Writing journalling filesystems required a lot more effort than going out and buying a UPS.
But they were written for a reason.
But... if you are certain of your precautions... I've never seen ext2 lose a benchmark... ever.
-Steve
On Tue, 2006-10-03 at 15:03 -0700, Kirk Bocek wrote:
Steve Bergman wrote:
But... if you are certain of your precautions... I've never seen ext2 lose a benchmark... ever.
I'm not certain of anything. That's why I'm asking these questions. :)
I would be interested to see your results if you care to try ext2. The kernel guys are pretty well committed to supporting it long term. They are absolutely *anal* about making changes which could destabilize it in any way.
As far as data integrity goes, you are at about the same level as ext3 with data=writeback, or most other jounalled filesystems. It's probably rather safer than XFS... according to my sources, anyway.
However, with large filesystems, you could be looking at a lengthy fsck process in the event of an unclean shutdown. Some say it can take days on really large raid arrays, but an fsck on my 0.5 terabyte ext3 FS, which has a nearly identical on-disk structure to ext2, takes 8 minutes.
May the Force be with you, and all that sort of rot! ;-)
-Steve
However, with large filesystems, you could be looking at a lengthy fsck process in the event of an unclean shutdown. Some say it can take days on really large raid arrays, but an fsck on my 0.5 terabyte ext3 FS, which has a nearly identical on-disk structure to ext2, takes 8 minutes.
Probably because they used e2fsprogs lower than version 1.35 or somewhere around that. Upgrading meant huge improvements in the time needed to do fsck.
Steve Bergman wrote:
I would be interested to see your results if you care to try ext2.
Well, Steve, I was really hoping to have some benchmarks for you. But some kind of massive slowdown has hit my new server. That's what caused my faulty call on the ext3 journal parameter change.
I don't know what happened. Nothing in the logs or dmesg that I can see. The system now is very slow creating file systems and becomes unresponsive while bonnie++ is writing.
Guess I'll remove all the extraneous LVs and start again.
Steve Bergman wrote:
I would be interested to see your results if you care to try ext2. The kernel guys are pretty well committed to supporting it long term. They are absolutely *anal* about making changes which could destabilize it in any way.
I finally figured out my slowdown problem: I had somehow turned off write-caching on the 3Ware controller. Hoo-Boy! Does that kill throughput! What the heck is that option for anyway?
Here are a handful of bonnie++ benchmarks, I decided to just quote the block write and block read numbers:
                                 MB/Sec
                                Write  Read
XFS:                              231   202
ext2, dir_index:                  221   205
ext3, dir_index, data=ordered:     80   196
ext3, dir_index, data=writeback:   95   199
ext3, data=writeback:              95   201
As you hinted, ext2 has almost the same performance as XFS. Data=writeback on ext3 helps some but not a whole lot. Dir_index doesn't seem to do a thing.
I'm really torn here. I can make use of the extra write speeds of ext2 or XFS. But is XFS stable and supported enough for 'production' use? Will I regret a forced fsck on a 1TB ext2 volume?
Steve, you say you've been happy with XFS for a few years. Have you been using it under any kind of load?
Kirk Bocek
I finally figured out my slowdown problem: I had somehow turned off write-caching on the 3Ware controller. Hoo-Boy! Does that kill throughput! What the heck is that option for anyway?
For cases where you do not want to lose your data when you get a blackout. If you do not have a battery power backup for your cache, you will lose data that is in the cache that has not been committed to the disks.
Here are a handful of bonnie++ benchmarks, I decided to just quote the block write and block read numbers:
                                 MB/Sec
                                Write  Read
XFS:                              231   202
ext2, dir_index:                  221   205
ext3, dir_index, data=ordered:     80   196
ext3, dir_index, data=writeback:   95   199
ext3, data=writeback:              95   201
As you hinted, ext2 has almost the same performance as XFS. Data=writeback on ext3 helps some but not a whole lot. Dir_index doesn't seem to do a thing.
Indexed directories are only useful for cases where there are thousands of files in a directory and you want to access a single file (and you know the name in advance) quickly.
I'm really torn here. I can make use of the extra write speeds of ext2 or XFS. But is XFS stable and supported enough for 'production' use? Will I regret a forced fsck on a 1TB ext2 volume?
Are you using the no write cache flag with bonnie++? Otherwise you may not get the same results from whatever it is that you are running.
Steve, you say you've been happy with XFS for a few years. Have you been using it under any kind of load?
Run XFS without write caching and you should be safe. Are you creating thousands of files?
Feizhou wrote:
For cases where you do not want to lose your data when you get a blackout. If you do not have a battery power backup for your cache, you will lose data that is in the cache that has not been committed to the disks.
That's what I thought. But, jeez, that's *really* slow. I didn't have the patience to wait for a benchmark to finish but I'm guessing about 5MB/Sec writes. Not acceptable.
Indexed directories are only useful for cases where there are thousands of files in a directory and you want to access a single file (and you know the name in advance) quickly.
That's also what I thought but wasn't sure.
Are you using the no write cache flag with bonnie++? Otherwise you may not get the same results from whatever it is that you are running.
I did *not* run bonnie++ with '-b'. I expect to use write caching in application, so that's what I want to benchmark.
Run XFS without write caching and you should be safe. Are you creating thousands of files?
The main stress on this system will be as a media server (mythtv and some other things.) So, no, files will be fewer and larger. Are you talking about write caching on the 3Ware? As I said, without it the system becomes unresponsive during large writes.
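For anyone comparing numbers, a sketch of the two bonnie++ invocations being discussed (directory, size and user are only examples; the 10G:64k size/chunk syntax matches the runs posted earlier):

  # buffered writes, as used for the numbers in this thread
  bonnie++ -d /mnt/test -s 10G:64k -u nobody

  # with -b, bonnie++ does no write buffering (fsync after every write),
  # which largely takes the write cache out of the picture
  bonnie++ -d /mnt/test -s 10G:64k -u nobody -b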
Run XFS without write caching and you should be safe. Are you creating thousands of files?
The main stress on this system will be as a media server (mythtv and some other things.) So, no, files will be fewer and larger. Are you talking about write caching on the 3Ware? As I said, without it the system becomes unresponsive during large writes.
XFS is the best for long concurrent writes...so go with it if you are not too concerned about losing data in a crash (or get battery backup for your 3ware cards -- and the more cache your card has, the better). But do you also expect a lot of reading during your writes? Then JFS might be better.
See http://untroubled.org/benchmarking/2004-04/2.6.5-gentoo/
You can also use the framework at http://untroubled.org/benchmarking/2004-04/fsbench.tar.gz and customize it for your own testing.
Main site for that is http://untroubled.org/benchmarking/2004-04/
Kirk Bocek wrote:
Feizhou wrote:
Run XFS without write caching and you should be safe. Are you creating thousands of files?
I'd still like to know what you mean by 'without write caching.' Do you mean setting it on the 3Ware or is there a FS or OS setting you are talking about?
There are two levels of caching. Hardware and software. Hardware level caching involves in your case the 3ware card cache and the individual disk caches.
Software level involves the kernel's disk cache.
There are FS level settings available. You can mount the filesystem with the sync option, which means writes are done synchronously and so are not stored in the kernel's cache. There is also dirsync, which can be used to ensure metadata on files/directories is also written synchronously.
You can also do it on a per file level by setting the file's sync (S) attribute.
So a completely paranoid setup would include making sure that the file/filesystem is sync'ed and also turning off the write cache on the hardware. You should be able to set sync at the software level and leave the hardware write caching on and see a difference.
You can use Bruce Guenter's framework and test the differences. His is pretty simple and you can take out those filesystems that you do not want to test.
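A quick illustration of the mount options and file attribute described above (device, mount point and file name are placeholders):

  # mount with synchronous data and directory updates
  mount -t ext3 -o sync,dirsync /dev/sdb1 /mnt/data

  # or flag individual files for synchronous writes with the S attribute
  chattr +S /mnt/data/important.db
  lsattr /mnt/data/important.db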
Feizhou wrote:
There are two levels of caching. Hardware and software. Hardware level caching involves in your case the 3ware card cache and the individual disk caches.
Software level involves the kernel's disk cache.
There are FS level settings available. You can mount the filesystem with the sync option, which means writes are done synchronously and so are not stored in the kernel's cache. There is also dirsync, which can be used to ensure metadata on files/directories is also written synchronously.
You can also do it on a per file level by setting the file's sync (S) attribute.
So a completely paranoid setup would include making sure that the file/filesystem is sync'ed and also turning off the write cache on the hardware. You should be able to set sync at the software level and leave the hardware write caching on and see a difference.
So what do *you* do when you disable write caching on one of your systems?
So what do *you* do when you disable write caching on one of your systems?
One of my posts had something about deleting mails from the mail queue...3ware 750x/850x do not have battery power backup.
I tried to rely on data=journal with ext3 but under heavy load the box would crash rather frequently. XFS was plain disastrous.
Maybe try to get as many spindles as you can and go for write caching off with XFS + sync, since it appears your streams are rather important, and see whether it meets your needs both speed-wise and data-integrity-wise. The only hope under Linux appears to be reiser4, but that has to wait for a while...
On Wed, 2006-10-04 at 20:36 -0700, Kirk Bocek wrote:
                                 MB/Sec
                                Write  Read
XFS:                              231   202
ext2, dir_index:                  221   205
ext3, dir_index, data=ordered:     80   196
ext3, dir_index, data=writeback:   95   199
ext3, data=writeback:              95   201
Steve, you say you've been happy with XFS for a few years. Have you been using it under any kind of load?
Thanks for the numbers.
I didn't say I was happy with XFS. I said I don't use it and have heard horror stories about it.
Does anyone know the details as to the work that has gone on in recent kernel.org kernels regarding write barriers? Isn't it supposed to eliminate the need for turning hardware write caching off?
Kirk Bocek wrote:
I finally figured out my slowdown problem: I had somehow turned off write-caching on the 3Ware controller. Hoo-Boy! Does that kill throughput! What the heck is that option for anyway?
Err... are you sure you want to turn on write-caching? You will have filesystem corruption in case of a power-down, panic or other crashes then. It does help to have a battery backup on the 3ware card, assuming it is smart enough to write the changes to disk before the OS starts.
Morten Torstensen wrote:
Err... are you sure you want to turn on write-caching? You will have filesystem corruption in case of a power-down, panic or other crashes then. It does help to have a battery backup on the 3ware card, assuming it is smart enough to write the changes to disk before the OS starts.
Yea, I'm sure. With write-caching turned off on the 3Ware controller only and writing to an XFS filesystem:
                    ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
Beryl        5G:64k  9968  15 10447   1  3924   0 25071  38 201973 18 542.5   3
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
      files:max:min  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
Beryl            16   253   1 +++++ +++   230   1   225   1 +++++ +++   149   1
Block writes drop to 10MB/Sec plus the system becomes non-responsive during large writes. Can anyone tell me this is acceptable? I'm really at a loss when you and Feizhou tell me to turn off write-caching. How do you get anything even approaching *tolerable* performance?
On Thu, 2006-10-05 at 09:34 -0700, Kirk Bocek wrote:
Yea, I'm sure. With write-caching turned off on the 3Ware controller only and writing to an XFS filesystem:
You might look into the work done on "write barriers" in more recent kernels. They are supposed to obviate the need to turn off write caching. However, I'm not certain how they relate to hardware raid. The kernel needs to be able to issue a cache flush command to the drives, which makes it a controller driver issue. Anyone have any more info on this?
On Thu, 2006-10-05 at 12:33 -0500, Steve Bergman wrote:
Yea, I'm sure. With write-caching turned off on the 3Ware controller only and writing to an XFS filesystem:
You might look into the work done on "write barriers" in more recent kernels. They are supposed to obviate the need to turn off write caching. However, I'm not certain how they relate to hardware raid. The kernel needs to be able to issue a cache flush command to the drives, which makes it a controller driver issue. Anyone have any more info on this?
I thought that journalled file systems only had to count on the disk/raid not re-ordering the writes to avoid filesystem corruption. That is, it doesn't matter if the writes are cached as long as what is written is written in the same order as the OS issued the writes. You just lose the data of anything that did not make it to the disk, but you shouldn't mess up the relationship between free/allocated space and the inodes using it. Is that impression incorrect?
I thought that journalled file systems only had to count on the disk/raid not re-ordering the writes to avoid filesystem corruption. That is, it doesn't matter if the writes are cached as long as what is written is written in the same order as the OS issued the writes. You just lose the data of anything that did not make it to the disk, but you shouldn't mess up the relationship between free/allocated space and the inodes using it. Is that impression incorrect?
Journalled file systems do not rely on writes not being reordered. They rely on the metadata (directory/inode stuff) being committed to the journal to preserve filesystem integrity. If the metadata was still in the caches, whether the RAID card's cache or the hard disks' cache, and not committed to the media, you can expect some directory/inode corruption.
On Fri, 2006-10-06 at 11:21 +0800, Feizhou wrote:
I thought that journalled file systems only had to count on the disk/raid not re-ordering the writes to avoid filesystem corruption. That is, it doesn't matter if the writes are cached as long as what is written is written in the same order as the OS issued the writes. You just lose the data of anything that did not make it to the disk but you shouldn't mess up the relationship between free/allocated space and the inodes using it. Is that impression incorrect?
Journalled file systems do not rely on writes not being reordered. They rely on the metadata (directory/inode stuff) being committed to the journal to preserve filesystem integrity. If the metadata was still in the caches, whether the RAID card's cache or the hard disks' cache, and not committed to the media, you can expect some directory/inode corruption.
Yes, but what matters is the state of the disk at any given time, not whether it is in sync with what the OS thinks is on it. You are going to lose data in any case. Whether you get filesystem corruption depends only on the metadata changes being handled in the proper sequence. If none of the changes make it to disk, the filesystem is still fine. If the metadata changes are written in the wrong order, or the data change is written first (all very likely with controllers or drives that cache and optimize the writes), the result will be corrupted if the sequence doesn't complete.
On Thursday 05 October 2006 05:36, Kirk Bocek wrote:
... As you hinted, ext2 has almost the same performance as XFS. Data=writeback on ext3 helps some but not a whole lot. Dir_index doesn't seem to do a thing.
dir_index helps for many small files in one directory, not for sequential read/write.
I'm really torn here. I can make use of the extra write speeds of ext2 or XFS. But is XFS stable and supported enough for 'production' use? Will I regret a forced fsck on a 1TB ext2 volume?
Steve, you say you've been happy with XFS for a few years. Have you been using it under any kind of load?
I use XFS on Centos-4 here with 9500-S and 9550-SX. The load is quite heavy (~30 climate modelling people) and the volume not tiny (~40 TiB). This system works fine and I have no problems sleeping at night. But then again, if you want data security you'll have to run backups anyway. Any filesystem can die.
This has already been stated in this thread, but it's worth saying again, I think: the i386/i686 kernel has 4k kernel stacks, x86_64 has 8k, and XFS does not like 4k stacks.
/Peter
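Both ext3 knobs mentioned above can be tried on an existing filesystem before giving up on it. A minimal sketch; the LVM device name is an assumption, and the tune2fs/e2fsck steps should be run against an unmounted filesystem.
# add the dir_index feature and rebuild the directory indexes
tune2fs -O dir_index /dev/vg0/data
e2fsck -fD /dev/vg0/data
# benchmark run with metadata-only writeback journaling
mount -o data=writeback /dev/vg0/data /data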
Kirk Bocek
Peter Kjellström wrote:
I use XFS on Centos-4 here with 9500-S and 9550-SX. The load is quite heavy (~30 climate modelling people) and the volume not tiny (~40 TiB). This system works fine and I have no problems sleeping at night. But then again, if you want data security you'll have to run backups anyway. Any filesystem can die.
Oh, yea, backups! :)
Heck, 30 users and 40 tibibytes of data is a pretty good stress test. How long have you been running XFS in this setup?
Kirk Bocek wrote:
Peter Kjellström wrote:
I use XFS on Centos-4 here with 9500-S and 9550-SX. The load is quite heavy (~30 climate modelling people) and the volume not tiny (~40 TiB). This system works fine and I have no problems sleeping at night. But then again, if you want data security you'll have to run backups anyway. Any filesystem can die.
Oh, yea, backups! :)
Heck, 30 users and 40 tibibytes of data is a pretty good stress test. How long have you been running XFS in this setup?
Don't forget his comment about the size of the stack. Jeff Mahoney (a Linux filesystem coder) had something to say about XFS's current state of affairs in the Linux kernel, and in particular about XFS with 4k stacks.
On Friday 06 October 2006 05:12, Feizhou wrote:
Kirk Bocek wrote:
Peter Kjellström wrote:
I use XFS on Centos-4 here with 9500-S and 9550-SX. The load is quite heavy (~30 climate modelling people) and the volume not tiny (~40 TiB). This system works fine and I have no problems sleeping at night. But then again, if you want data security you'll have to run backups anyway. Any filesystem can die.
Oh, yea, backups! :)
Heck, 30 users and 40 tibibytes of data is a pretty good stress test. How long have you been running XFS in this setup?
Don't forget his comment about the size of the stack. Jeff Mahoney (a Linux filesystem coder) had something to say about XFS's current state of affairs in the Linux kernel, and in particular about XFS with 4k stacks.
Don't even try it is my feeling (xfs on 4k stacks).
/Peter
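A quick way to check which stack size a given kernel was built with; the config file location is the usual CentOS layout and is an assumption.
# i686 kernels built with CONFIG_4KSTACKS=y are the ones XFS dislikes;
# x86_64 kernels use 8k stacks and don't carry this option at all
grep CONFIG_4KSTACKS /boot/config-$(uname -r)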
On Thursday 05 October 2006 17:45, Kirk Bocek wrote:
Peter Kjellström wrote:
I use XFS on Centos-4 here with 9500-S and 9550-SX. The load is quite heavy (~30 climate modelling people) and the volume not tiny (~40 TiB). This system works fine and I have no problems sleeping at night. But then again, if you want data security you'll have to run backups anyway. Any filesystem can die.
Oh, yea, backups! :)
Heck, 30 users and 40 tibibytes of data is a pretty good stress test. How long have you been running XFS in this setup?
About a year. And this system is not the only one running XFS over here.
/Peter
Kirk Bocek wrote:
Heck, 30 users and 40 tibibytes of data is a pretty good stress test. How long have you been running XFS in this setup?
By the way, thank you for helping me educate myself in binary prefix notation. All these years of explaining the difference to people, and I'd never noticed that there was a standard for this :P
JT
For those too lazy to google:
http://en.wikipedia.org/wiki/Binary_prefix
I'm still not really comfortable pronouncing most of these. A yobibyte sounds like something Frodo would have for lunch.
Kirk Bocek
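For the record, the difference is not just pedantry at this scale; a quick shell sanity check:
echo $(( 40 * 10**12 ))   # 40 TB  = 40000000000000 bytes
echo $(( 40 * 2**40 ))    # 40 TiB = 43980465111040 bytes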
JT Justman wrote:
Kirk Bocek wrote:
Heck, 30 users and 40 tibibytes of data is a pretty good stress test. How long have you been running XFS in this setup?
By the way, thank you for helping me educate myself in binary prefix notation. All these years of explaining the difference to people, and I'd never noticed that there was a standard for this :P
JT
I just figured out that I used the wrong prefix. 2^40 bytes is *tebibytes* not tibi...
JT Justman wrote:
Kirk Bocek wrote:
Heck, 30 users and 40 tibibytes of data is a pretty good stress test. How long have you been running XFS in this setup?
By the way, thank you for helping me educate myself in binary prefix notation. All these years of explaining the difference to people, and I'd never noticed that there was a standard for this :P
On Tue, 2006-10-03 at 17:02 -0500, Steve Bergman wrote:
Writing journalling filesystems required a lot more effort than going out and buying a UPS.
But they were written for a reason.
But... if you are certain of your precautions... I've never seen ext2 lose a benchmark... ever.
Since your data normally isn't journalled anyway, you probably aren't any more likely to lose anything with ext2, but you may have to wait through a long fsck. The main problem I've had with ext2 wasn't so much that it needed the fsck to clean it, it was that the stock setup refused to fix many errors automatically. If a system was at all busy when it crashed it would very likely drop you to a root prompt and make you run fsck manually, answering 'y' to every prompt (as though I wouldn't want it fixed...). No fun at all when the box is miles away.
Les Mikesell wrote:
Since your data normally isn't journalled anyway, you probably aren't any more likely to lose anything with ext2, but you may have to wait through a long fsck. The main problem I've had with ext2 wasn't so much that it needed the fsck to clean it, it was that the stock setup refused to fix many errors automatically. If a system was at all busy when it crashed it would very likely drop you to a root prompt and make you run fsck manually, answering 'y' to every prompt (as though I wouldn't want it fixed...). No fun at all when the box is miles away.
It's been a while since I had to sit through one of those boot-time fscks, but now that you guys remind me of them I remember why I'm using ext3.
Kirk Bocek
On Tue, 2006-10-03 at 15:33 -0700, Kirk Bocek wrote:
It's been a while since I had to sit through one of those boot-time fscks, but now that you guys remind me of them I remember why I'm using ext3.
By default, you still have to with ext3. Just not as often. Every 21 mounts or whatever, or 180 days or whatever.
A wise default, I will agree. I just wish it came with a message that said "DON'T PANIC!" in nice friendly letters, so that my clients didn't *freak* when they saw all the warning messages. I also wish that it would default to doing its *damnedest* to fix the problems rather than dumping them at a cold and lonely # prompt and leaving them with a "broken" system.
If I want it to be conservative about fixing stuff, by God, I can tell it to dump me at a command prompt.
-Steve
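The mount-count and interval defaults being complained about here live in the ext3 superblock and can be inspected or relaxed with tune2fs. The device name is an assumption, and disabling the periodic checks entirely is only sensible if you accept the added risk.
# show the current maximum mount count and check interval
tune2fs -l /dev/vg0/root | grep -Ei 'mount count|check interval'
# relax them; 0 disables the periodic checks
tune2fs -c 0 -i 0 /dev/vg0/root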
On Tue, 2006-10-03 at 17:47 -0500, Steve Bergman wrote:
It's been a while since I had to sit through one of those boot-time fscks, but now that you guys remind me of them I remember why I'm using ext3.
By default, you still have to with ext3. Just not as often. Every 21 mounts or whatever, or 180 days or whatever.
A wise default, I will agree. I just wish it came with a message that said "DON'T PANIC!" in nice friendly letters, so that my clients didn't *freak* when they saw all the warning messages. I also wish that it would default to doing its *damnedest* to fix the problems rather than dumping them at a cold and lonely # prompt and leaving them with a "broken" system.
If I want it to be conservative about fixing stuff, by God, I can tell it to dump me at a command prompt.
Exactly. What are the odds that the person answering the prompts knows more than fsck about how to fix filesystems?
On Tue, 2006-10-03 at 18:02 -0500, Les Mikesell wrote:
Exactly. What are the odds that the person answering the prompts knows more than fsck about how to fix filesystems?
The presumption is that the seasoned administrator, who will always be on hand, might know that there is the possibility that fsck will make the wrong decisions and corrupt the filesystem beyond repair.
So, being there 24 hours a day, or able to return on 5 minutes notice, in the worst case (one has to admire the dedication), he will whip out his trusty filesystem debugger and satisfy himself, in another 5 minutes, since the customers are waiting at the cash registers to get checked out, that either it would be safe to allow fsck to fix the problems, or that it would be better to tell the customers to all go home and come back tomorrow after he is certain that the filesystem is safe.
This leaves the rest of us to wonder how it all applies to us, our users, and our customers... :-(
-Steve
On Tue, 2006-10-03 at 20:45 -0400, chrism@imntv.com wrote:
Les Mikesell wrote:
Exactly. What are the odds that the person answering the prompts knows more than fsck about how to fix filesystems?
Unless you're Ted Ts'o, probably zero. :-)
I imagine he holds the 'Y' key down, too, or has a patch to do it for him...
Steve Bergman wrote:
On Tue, 2006-10-03 at 20:45 -0400, chrism@imntv.com wrote:
Les Mikesell wrote:
Exactly. What are the odds that the person answering the prompts knows more than fsck about how to fix filesystems?
Unless you're Ted Ts'o, probably zero. :-)
I imagine he holds the 'Y' key down, too, or has a patch to do it for him...
nah, fsck -y
On Wed, 2006-10-04 at 11:14 +0800, Feizhou wrote:
I imagine he holds the 'Y' key down, too, or has a patch to do it for him...
nah, fsck -y
No. That still bails you out and asks you to run it without -a or -y.
You can, however, put a matchstick between the 'Y' and the 'T' in such a way that you can fix dinner while it's running. ;-)
Steve Bergman wrote:
On Wed, 2006-10-04 at 11:14 +0800, Feizhou wrote:
I imagine he holds the 'Y' key down, too, or has a patch to do it for him...
nah, fsck -y
No. That still bails you out and asks you to run it without -a or -y.
You can, however, put a matchstick between the 'Y' and the 'T' in such a way that you can fix dinner while it's running. ;-)
I just ran 'fsck -fvy' with no complaints.
On Tue, 2006-10-03 at 22:27, Kirk Bocek wrote:
Steve Bergman wrote:
On Wed, 2006-10-04 at 11:14 +0800, Feizhou wrote:
I imagine he holds the 'Y' key down, too, or has a patch to do it for him...
nah, fsck -y
No. That still bails you out and asks you to run it without -a or -y.
You can, however, put a matchstick between the 'Y' and the 'T' in such a way that you can fix dinner while it's running. ;-)
I just ran 'fsck -fvy' with no complaints.
That works if there is not much damage. If the machine was busy when it crashed there's a fair chance that it will refuse to run with the -y, which is no fun when you really need the machine to restart itself when power is restored.
Les Mikesell wrote:
On Tue, 2006-10-03 at 22:27, Kirk Bocek wrote:
Steve Bergman wrote:
On Wed, 2006-10-04 at 11:14 +0800, Feizhou wrote:
I imagine he holds the 'Y' key down, too, or has a patch to do it for him...
nah, fsck -y
No. That still bails you out and asks you to run it without -a or -y.
You can, however, put a matchstick between the 'Y' and the 'T' in such a way that you can fix dinner while it's running. ;-)
I just ran 'fsck -fvy' with no complaints.
That works if there is not much damage. If the machine was busy when it crashed there's a fair chance that it will refuse to run with the -y, which is no fun when you really need the machine to restart itself when power is restored.
How serious a level of damage before it refuses -y?
I cannot remember any time that I have not been able to do -y and there have been times when I saw a huge amount of errors being automatically fixed.
On Tue, 2006-10-03 at 22:56, Feizhou wrote:
I imagine he holds the 'Y' key down, too, or has a patch to do it for him...
nah, fsck -y
No. That still bails you out and asks you to run it without -a or -y.
You can, however, put a matchstick between the 'Y' and the 'T' in such a way that you can fix dinner while it's running. ;-)
I just ran 'fsck -fvy' with no complaints.
That works if there is not much damage. If the machine was busy when it crashed there's a fair chance that it will refuse to run with the -y, which is no fun when you really need the machine to restart itself when power is restored.
How serious a level of damage before it refuses -y?
Just guessing, but probably anytime 2 or more concurrent writes had allocated space but not completed the updates.
I cannot remember any time that I have not been able to do -y and there have been times when I saw a huge amount of errors being automatically fixed.
With ext2 my odds were at least one out of 10 that a busy machine wouldn't come back up automatically after a power glitch. Ext3 is much better because it normally just uses the journal to recover.
How serious a level of damage before it refuses -y?
Just guessing, but probably anytime 2 or more concurrent writes had allocated space but not completed the updates.
I cannot remember any time that I have not been able to do -y and there have been times when I saw a huge amount of errors being automatically fixed.
With ext2 my odds were at least one out of 10 that a busy machine wouldn't come back up automatically after a power glitch. Ext3 is much better because it normally just uses the journal to recover.
Hang on, I might be off on a tangent here. Are you saying there is a difference between fsck on ext2 and fsck on ext3 (when not doing journal recovery of course) when it comes to -y?
On Tue, 2006-10-03 at 23:53, Feizhou wrote:
How serious a level of damage before it refuses -y?
Just guessing, but probably anytime 2 or more concurrent writes had allocated space but not completed the updates.
I cannot remember any time that I have not been able to do -y and there have been times when I saw a huge amount of errors being automatically fixed.
With ext2 my odds were at least one out of 10 that a busy machine wouldn't come back up automatically after a power glitch. Ext3 is much better because it normally just uses the journal to recover.
Hang on, I might be off on a tangent here. Are you saying there is a difference between fsck on ext2 and fsck on ext3 (when not doing journal recovery of course) when it comes to -y?
I don't know about that. The default unattended startup just uses the journal on ext3 instead of fsck so the odds are much better that it will complete by itself. If there is a difference in fsck it is probably more version-related than ext2 vs ext3.
I don't know about that. The default unattended startup just uses the journal on ext3 instead of fsck so the odds are much better that it will complete by itself. If there is a difference in fsck it is probably more version-related than ext2 vs ext3.
Just my imagination. Bill Schoolcraft has answered the mystery of fsck -y: the refusal is imposed by the boot script and not by e2fsprogs itself.
On Tue, 2006-10-03 at 22:44 -0500, Les Mikesell wrote:
I just ran 'fsck -fvy' with no complaints.
That works if there is not much damage.
"Damage" is too strong a word. It bails if there is anything that it is not absolutely sure it knows how to fix. It doesn't care if it is running on your grandma's home system, a point of sale server at an establishment without a full-time sysadmin, or in a datacenter with experienced admins jumping all over each other to run their favorite filesystem debugger to dissect what might have happened.
The problem is that, so far as I know, there is *no freaking way* to tell it that your grandma, client, or other helpless party is at the controls.
-a, -y -p... all insist upon protecting the user from having his problem fixed until he demonstrates his own competence.
At Tue, 3 Oct 2006 it looks like Steve Bergman composed:
On Wed, 2006-10-04 at 11:14 +0800, Feizhou wrote:
I imagine he holds the 'Y' key down, too, or has a patch to do it for him...
nah, fsck -y
No. That still bails you out and asks you to run it without -a or -y.
You can, however, put a matchstick between the 'Y' and the 'T' in such a way that you can fix dinner while it's running. ;-)
That's exactly what one can do, "been there, done that!" Getting bounced into a "system-forced" fsck will, as I've experienced, deny you the -y option.
Steve Bergman wrote:
Unless you're Ted T'so, probably zero. :-)
I imagine he holds the 'Y' key down, too, or has a patch to do it for him...
That is what I would do. That is also why I do backups. With fresh backups, you don't bother with filesystem recovery. If an fsck cannot fix it, restore. Any other choice is one of desperation (read: no backups).
Kirk Bocek wrote:
Nathan Grennan wrote:
XFS: fast, but it can fail under load and it does XORs of data, so a bad write (as in a power failure) can mean garbage in a file. It is metadata-only journaling. Also slow on deletes.
You and several others point to a greater chance for data corruption. However, this host will be on a UPS. The system will be safely shut down before the power goes off. Isn't that enough protection?
Kirk,
how did you decide about the xfs question? I have almost the same setup as you do (9550SXU-LP, dual Xeon on a Supermicro X6DH8-G2+, 4 SATA-II HDs attached, RAID 5) and I'm following the discussion, but as it grew quite big, I must have lost the trail to your decision. :) I ran my bonnie++ tests with xfs under xen and I'm not quite happy (still searching for a way to get that speed gain into the xen-enabled kernel). To put it briefly, the xen kernel has worse performance than the CentOS kernel. Anyway, xfs is a whole lot faster than ext3.
What worries me a little bit is people's fear about xfs being unsafe under high load. That's why I'd like to hear something about your decision. Thanks, Michael
I'm going to leave the OS volumes (/, /boot, /var, etc) as ext3. This should reduce the chances of creating a non-bootable system. I'm going to take a leap of faith and use XFS on my 'data' volumes. I'm sure I can make use of the 200MB/Sec writes I've benchmarked with bonnie++.
Kirk Bocek
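For the record, the data-volume half of that plan is only a couple of commands. The LVM names and mount point are assumptions, and the fs_passno of 0 reflects the fact that XFS is not checked by the boot-time fsck.
mkfs.xfs /dev/vg0/data
mkdir -p /data
echo '/dev/vg0/data  /data  xfs  defaults,noatime  0 0' >> /etc/fstab
mount /data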
Michael Kress wrote:
Kirk Bocek wrote:
Nathan Grennan wrote:
XFS: fast, but it can fail under load and it does XORs of data, so a bad write (as in a power failure) can mean garbage in a file. It is metadata-only journaling. Also slow on deletes.
You and several others point to a greater chance for data corruption. However, this host will be on a UPS. The system will be safely shut down before the power goes off. Isn't that enough protection?
Kirk,
how did you decide about the xfs question? I have almost the same setup as you do (9550SXU-LP, dual Xeon on a Supermicro X6DH8-G2+, 4 SATA-II HDs attached, RAID 5) and I'm following the discussion, but as it grew quite big, I must have lost the trail to your decision. :) I ran my bonnie++ tests with xfs under xen and I'm not quite happy (still searching for a way to get that speed gain into the xen-enabled kernel). To put it briefly, the xen kernel has worse performance than the CentOS kernel. Anyway, xfs is a whole lot faster than ext3.
What worries me a little bit is people's fear about xfs being unsafe under high load. That's why I'd like to hear something about your decision. Thanks, Michael
Kirk Bocek wrote:
I'm going to leave the OS volumes (/, /boot, /var, etc) as ext3. This should reduce the chances of creating a non-bootable system. I'm going to take a leap of faith and use XFS on my 'data' volumes. I'm sure I can make use of the 200MB/Sec writes I've benchmarked with bonnie++.
Remember, 8k stacks, 8k stacks.
Feizhou wrote:
say...did you try JFS?
Yes I did. However that was while playing with the centosplus kernel and before I realized that I needed to run the stock kernel in order to have access to a set of kernel modules that are available in RPM form only for the stock kernel.
Along the way I managed to delete my JFS benchmarks. I guess I didn't post the actual numbers to the list, but JFS performed a bit worse than XFS and still much better than ext3.
Kirk Bocek
Hi Kirk,
say...did you try JFS?
Yes I did. However that was while playing with the centosplus kernel and before I realized that I needed to run the stock kernel in order to have access to a set of kernel modules that are available in RPM form only for the stock kernel.
Along the way I managed to delete my JFS benchmarks. I guess I didn't post the actual numbers to the list, but JFS performed a bit worse than XFS and still much better than ext3.
So JFS performance still stands. Thank you.
Kirk Bocek wrote:
I'm going to leave the OS volumes (/, /boot, /var, etc) as ext3. This should reduce the chances of creating a non-bootable system. I'm going to take a leap of faith and use XFS on my 'data' volumes. I'm sure I can make use of the 200MB/Sec writes I've benchmarked with bonnie++.
I just converted one of my xen domains running on x86_64 from ext3 to xfs. That's one partition containing everything, system and data. I must say, I am content with the performance compared to ext3. What I've seen is that an umount takes quite a while after having written tons of data (like with bonnie++). By umount I mean: shut down the xen domain, mount this partition, write data to it, then umount it and wait up to a minute for it to complete(!). The only thing I had to take care of for xenU was the xfs module, which had to be available in the ramdisk, so I had no worries about being able to boot the xenU...
mkinitrd /boot/initrd-`uname -r`-xenU.img `uname -r` -v --fstab=/mnt/lvx04/etc/fstab
Greetings - Michael
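One hedged sanity check before trusting the domain to boot from that initrd is to confirm the xfs module is actually loadable in the running kernel:
modprobe xfs && grep xfs /proc/filesystems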
On Sunday, 8 October 2006 at 22:14, Kirk Bocek wrote:
I'm going to leave the OS volumes (/, /boot, /var, etc) as ext3. This should reduce the chances of creating a non-bootable system. I'm going to take a leap of faith and use XFS on my 'data' volumes. I'm sure I can make use of the 200MB/Sec writes I've benchmarked with bonnie++.
Same setup here. Works beautifully. NFS, Samba and Kolab run really fast on our x86_64 machine.
Regarding concerns about filesystem corruption: always make backups. There can be a hardware crash or some other catastrophe at any time. Using ext3 does not save you the trouble of making proper backups and planning for disaster. And if you have a good backup strategy anyway... why not try XFS? :-)
regards, Andreas Micklei
Nathan Grennan wrote:
Kirk Bocek wrote:
Now that I've been enlightened to the terrible write performance of ext3 on my new 3Ware RAID 5 array, I'm stuck choosing an alternative filesystem. I benchmarked XFS, JFS, ReiserFS and ext3 and they came back in that order from best to worst performer.
I'm leaning towards XFS because of performance and because centosplus makes kernel modules available for the stock kernel.
How's the reliability of XFS? It's certainly been around long enough.
Anyone care to sway me one way or another?
Here is the story, if not somewhat outdated, that I have learned over time.
XFS: fast, but it can fail under load and it does XORs of data, so a bad write (as in a power failure) can mean garbage in a file. It is metadata-only journaling. Also slow on deletes.
ext3: works for me. It is metadata-only by default, but does it in such a way as to minimize the risk much more than other filesystems. It also has a writeback mode, which is like the other filesystems, if you are looking for better performance. It also has a full data journalling mode, which is atomic and is actually faster than the other two in certain situations.
BTW, data=writeback is no guarantee of a performance boost. However, the test was done with 2.4, which also gave data=journal a performance boost in certain cases. In any case, Bruce Guenter's testing showed that ordered and writeback do not result in any performance benefit at all.
http://untroubled.org/benchmarking/2004-04/2.6.5-gentoo/
Check out Jeff Mahoney's views on XFS and ext3.
http://linux.wordpress.com/2006/09/27/suse-102-ditching-reiserfs-as-it-defau...
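If you want to reproduce that journal-mode comparison on your own array, the mode is just a mount option. A rough sketch; the device, mount point, benchmark size and the fact that /data starts out mounted are all assumptions.
# remount the data volume in each ext3 journal mode and rerun the benchmark
for mode in ordered writeback journal; do
    umount /data
    mount -o data=$mode /dev/vg0/data /data
    bonnie++ -d /data -s 5g -u nobody
done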
Kirk -
On Mon, Oct 02, 2006 at 04:41:38PM -0700, Kirk Bocek wrote:
Now that I've been enlightened to the terrible write performance of ext3 on my new 3Ware RAID 5 array, I'm stuck choosing an alternative filesystem. ...
There is a known problem with some 3Ware RAID cards and RH-type Linux OSes (I can find the link, if you like). We ran into that but discovered that an upgrade to our RAID card (to the 9000 series) fixed it for us (but I've seen reports of problems even with the 9000 series cards).
A while back we tried both XFS & JFS, but ran into kernel bugs with both of them. It may be that those bugs have been fixed, but it's worthwhile knowing about it ahead of time.
I've not been following ReiserFS for a long time, but the last time I worked with it (several years ago) we ran into problems with file corruption. It may be that they have all been fixed, but I'd look into that as well.
Good luck, Debbie "who hasn't read the entire thread"
Debbie Tropiano wrote:
There is a known problem with some 3Ware RAID cards and RH-type Linux OSes (I can find the link, if you like). We ran into that but discovered that an upgrade to our RAID card (to the 9000 series) fixed it for us (but I've seen reports of problems even with the 9000 series cards).
If you have that link, I'd appreciate seeing whatever is there. I don't expect much since I am using the latest 9550 board.
A while back we tried both XFS & JFS, but ran into kernel bugs with both of them. It may be that those bugs have been fixed, but it's worthwhile knowing about it ahead of time.
I've not been following ReiserFS for a long time, but the last time I worked with it (several years ago) we ran into problems with file corruption. It may be that they have all been fixed, but I'd look into that as well.
Good luck, Debbie "who hasn't read the entire thread"
What? There's only 600 or so posts... (I *knew* I'd set off the fanatics!)
Kirk -
On Wed, Oct 04, 2006 at 08:22:39AM -0700, Kirk Bocek wrote:
Debbie Tropiano wrote:
There is a known problem with some 3Ware RAID cards and RH-type Linux OSes (I can find the link, if you like). We ran into that but discovered that an upgrade to our RAID card (to the 9000 series) fixed it for us (but I've seen reports of problems even with the 9000 series cards).
If you have that link, I'd appreciate seeing whatever is there. I don't expect much since I am using the latest 9550 board.
It's https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=121434 There are *lots* of comments, so you might want to start from the bottom for the more current ones (which also might have the more recent boards listed).
What? There's only 600 or so posts... (I *knew* I'd set off the fanatics!)
:-)
Good luck, Debbie
On Wed, 2006-10-04 at 10:14 -0500, Debbie Tropiano wrote:
Kirk -
I've not been following ReiserFS for a long time, but the last time I worked with it (several years ago) we ran into problems with file corruption. It may be that they have all been fixed, but I'd look into that as well.
Makes one wonder if Linux is reasonably usable with 3ware. ;-)
Does anyone have any bonnie++ results for Linux software RAID 5? I know RAID 5 does not really lend itself to a performant software implementation, but the results would be interesting to see, nonetheless.
On Wed, 4 Oct 2006 at 10:24am, Steve Bergman wrote
Does anyone have any bonnie++ results for Linux software RAID 5? I know RAID 5 does not really lend itself to a performant software implementation, but the results would be interesting to see, nonetheless.
Actually, md is a *very* good performer (although I don't have any benchmarks on hand at the moment). The reason I stick with hardware RAID is that md doesn't handle hot swapping all that well. Not taking systems down to replace bad disks is a Good Thing.
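For anyone who wants numbers of their own, building and benchmarking an md RAID 5 set only takes a few commands. The member devices and mount point are assumptions; adjust to taste.
mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sd[bcde]1
mkfs.xfs /dev/md0
mount /dev/md0 /mnt/test
bonnie++ -d /mnt/test -s 5g -u nobody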
On Wed, 2006-10-04 at 11:29 -0400, Joshua Baker-LePain wrote:
Does anyone have any bonnie++ results for Linux software RAID 5? I know RAID 5 does not really lend itself to a performant software implementation, but the results would be interesting to see, nonetheless.
Actually, md is a *very* good performer (although I don't have any benchmarks on hand at the moment). The reason I stick with hardware RAID is that md doesn't handle hot swapping all that well. Not taking systems down to replace bad disks is a Good Thing.
Now that disks are not so expensive you have to look at the difference in performance with RAID 5 vs. RAID 1 or 0+1. RAID 5 essentially ties all of the disk heads together, and seek time is usually the bottleneck unless you have a single process accessing a huge file. Of course these days you can also throw RAM at the problem and at least avoid the seek to read the block that you are going to partially overwrite by having it already in the buffers.
Les Mikesell wrote:
On Wed, 2006-10-04 at 11:29 -0400, Joshua Baker-LePain wrote:
Does anyone have any bonnie++ results for Linux software RAID 5? I know RAID 5 does not really lend itself to a performant software implementation, but the results would be interesting to see, nonetheless.
Actually, md is a *very* good performer (although I don't have any benchmarks on hand at the moment). The reason I stick with hardware RAID is that md doesn't handle hot swapping all that well. Not taking systems down to replace bad disks is a Good Thing.
Now that disks are not so expensive you have to look at the difference in performance with RAID 5 vs. RAID 1 or 0+1. RAID 5 essentially ties all of the disk heads together, and seek time is usually the bottleneck unless you have a single process accessing a huge file. Of course these days you can also throw RAM at the problem and at least avoid the seek to read the block that you are going to partially overwrite by having it already in the buffers.
raid 1+0 will trounce raid 5 in most situations. Been there and done that. The only area where raid 5 really makes absolute sense is large archives that are rarely modified and not constantly written to. Given enough disks, very little should be able to match raid5 read performance...until one disk goes down...then the bottleneck becomes the processing power available so software raid 5 will likely trump raid 5 on hardware raid cards.
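The RAID 1+0 side of that comparison is a one-liner with md as well, again with the member devices as assumptions:
mdadm --create /dev/md1 --level=10 --raid-devices=4 /dev/sd[fghi]1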
Makes one wonder if Linux is reasonably usable with 3ware. ;-)
Does anyone have any bonnie++ results for Linux software RAID 5? I know RAID 5 does not really lend itself to a performant software implementation, but the results would be interesting to see, nonetheless.
I don't have any hard numbers, but one of my colleagues just doesn't use the 3Ware hardware RAID features because of the performance issues, building his RAID 5 (or possibly RAID 6) arrays in software. Performance screams (=fast).
Back to the filesystem discussion... A bit far-fetched, but I wonder if the really bad (comparatively) ext3 performance could be caused by an unusual interaction between the ext3 code and the 3Ware driver that happens to make writes suck worse when using ext3 on the 3Ware adapters... like ext3 writes ALWAYS fall into some non-optimal driver code path.
- Dan
Dan Stoner wrote:
Makes one wonder if Linux is reasonably usable with 3ware. ;-)
Does anyone have any bonnie++ results for Linux software RAID 5? I know RAID 5 does not really lend itself to a performant software implementation, but the results would be interesting to see, nonetheless.
I don't have any hard numbers, but one of my colleagues just doesn't use the 3Ware hardware RAID features because of the performance issues, building his RAID 5 (or possibly RAID 6) arrays in software. Performance screams (=fast).
If he is using 3ware raid cards below 95xx series, raid 5 performance sucks due to the lack of a large RAM buffer. 3ware 95xx cards should have good performance with raid 5 implementations.
Steve -
On Wed, Oct 04, 2006 at 10:24:57AM -0500, Steve Bergman wrote:
On Wed, 2006-10-04 at 10:14 -0500, Debbie Tropiano wrote:
Kirk -
I've not been following ReiserFS for a long time, but the last time I worked with it (several years ago) we ran into problems with file corruption. It may be that they have all been fixed, but I'd look into that as well.
Makes one wonder if Linux is reasonably usable with 3ware. ;-)
I think you may have intended to snip out something else ...
The ReiserFS work that I did was with different HW and didn't include any 3ware RAID cards (IIRC it was a Dell RAID but that was quite a while back).
But we've had good stability with EXT3 on 3ware cards with RAID 5 once we upgraded our oldest card to the 9000 series. Since ours are production systems, we prioritized stability over performance and have been satisfied. We are now revisiting other FS types (XFS especially) but still need the stability. I'm interested to see what happens for us.
Debbie "who's probably gone OT since our fileservers are running FC2 & 3"
Debbie Tropiano wrote:
Steve -
On Wed, Oct 04, 2006 at 10:24:57AM -0500, Steve Bergman wrote:
On Wed, 2006-10-04 at 10:14 -0500, Debbie Tropiano wrote:
Kirk - I've not been following ReiserFS for a long time, but the last time I worked with it (several years ago) we ran into problems with file corruption. It may be that they have all been fixed, but I'd look into that as well.
Makes one wonder if Linux is reasonably usable with 3ware. ;-)
I think you may have intended to snip out something else ...
The ReiserFS work that I did was with different HW and didn't include any 3ware RAID cards (IIRC it was a Dell RAID but that was quite a while back).
I guess that was before 2.4.18? reiserfs suffered from being out of sync with vfs code before 2.4.18.
But we've had good stability with EXT3 on 3ware cards with RAID 5 once we upgraded our oldest card to the 9000 series. Since ours are production systems, we prioritized stability over performance and have been satisfied. We are now revisiting other FS types (XFS especially) but still need the stability. I'm interested to see what happens for us.
write cache off :P
Debbie "who's probably gone OT since our fileservers are running FC2 & 3"
Nah. These issues are across the board... even the I/O schedulers are the same across FC2/3 and CentOS 4/RHEL 4.
On Wed, Oct 04, 2006 at 10:24:57AM -0500, Steve Bergman wrote:
Makes one wonder if Linux is reasonably usable with 3ware. ;-)
I take it by that emoticon that you are kidding, because everyone (and their cats and dogs) is using 3Ware on Linux these days. I have (P)ATA, SATA and SCSI 3Ware cards on Linux servers, without any issues whatsoever.
[]s
-- Rodrigo Barbosa "Quid quid Latine dictum sit, altum viditur" "Be excellent to each other ..." - Bill & Ted (Wyld Stallyns)
Rodrigo Barbosa wrote:
On Wed, Oct 04, 2006 at 10:24:57AM -0500, Steve Bergman wrote:
Makes one wonder if Linux is reasonably usable with 3ware. ;-)
I take it by that emoticon that you are kidding, because everyone (and their cats and dogs) is using 3Ware on Linux these days. I have (P)ATA, SATA and SCSI 3Ware cards on Linux servers, without any issues whatsoever.
Lucky you, if you had paired your older 3ware card with certain motherboards...
On Thu, Oct 05, 2006 at 12:23:49PM +0800, Feizhou wrote:
On Wed, Oct 04, 2006 at 10:24:57AM -0500, Steve Bergman wrote:
Makes one wonder if Linux is reasonably usable with 3ware. ;-)
I take it by that emoticon that you are kidding, because everyone (and their cats and dogs) is using 3Ware on Linux these days. I have (P)ATA, SATA and SCSI 3Ware cards on Linux servers, without any issues whatsoever.
Lucky you, if you had paired your older 3ware card with certain motherboards...
Maybe. But I have made certain my BIOS (both mb and card) were always up to date.
[]s
-- Rodrigo Barbosa "Quid quid Latine dictum sit, altum viditur" "Be excellent to each other ..." - Bill & Ted (Wyld Stallyns)
Rodrigo Barbosa wrote:
On Thu, Oct 05, 2006 at 12:23:49PM +0800, Feizhou wrote:
On Wed, Oct 04, 2006 at 10:24:57AM -0500, Steve Bergman wrote:
Makes one wonder if Linux is reasonably usable with 3ware. ;-)
I take it by that emoticon that you are kidding, because everyone (and their cats and dogs) is using 3Ware on Linux these days. I have (P)ATA, SATA and SCSI 3Ware cards on Linux servers, without any issues whatsoever.
Lucky you, if you had paired your older 3ware card with certain motherboards...
Maybe. But I have made certain my BIOS (both mb and card) were always up to date.
The problem had nothing to do with bios being up to date. That is why 3ware maintains a list of compatible motherboards. But I did leave out one important bit...the problem I have seen involved riser cards and was resolved after getting a particular riser card from a particular manufacturer.