Hi All, I've been looking around for info on XFS (the filesystem) and its use on CentOS and/or RHEL 4, and there seems to be a lot of noise about 4K stacks (especially on linux-xfs@oss.sgi.com).
So what is the best way to get XFS working with CentOS 4.3, and not have something like this happen?
A quote from the xfs list at sgi
On Tue, 18 Jul 2006 at 10:29am, Andrew Elwell wrote
> using the 2.6.9-34 centosplus SMP kernel (3GHz P4 with hyperthreading enabled)
>
> what we normally see (~once a day) is simply
>
> do_IRQ: stack overflow: 416
> [<c0107a27>]
You don't want to use the XFS in the centosplus kernel. It has major known issues with 4K stacks (leading to overflows). Use the kernel-module-xfs (or somesuch) RPM instead, and you should have better luck.
Do I need a kernel with 8K stacks?
and is this
http://dev.centos.org/centos/4/testing/i386/RPMS/kernel-module-xfs-2.6.9-34....
the "kernel-module-xfs" RPM he was talking about (or equivalent for `uname -r` equals 2.6.9-34.ELsmp).
Regards Mark Strong
Mark Strong wrote:
Hi All, I've been looking around for info on XFS (the filesystem) and its use on CentOS and/or RHEL 4, and there seems to be a lot of noise about 4K stacks (especially on linux-xfs@oss.sgi.com).
you might want to dig into that noise...
So what is the best way to get XFS working with CentOS 4.3, and not have something like this happen?
follow the centos xfs kernel-module rpm process you already highlighted.
You don't want to use the XFS in the centosplus kernel. It has major known issues with 4K stacks (leading to overflows). Use the kernel-module-xfs (or somesuch) RPM instead, and you should have better luck.
Do I need a kernel with 8K stacks?
that's a decision for you to take :)
and is this
http://dev.centos.org/centos/4/testing/i386/RPMS/kernel-module-xfs-2.6.9-34....
the "kernel-module-xfs" RPM he was talking about (or equivalent for `uname -r` equals 2.6.9-34.ELsmp).
yes
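roughly, the steps would be something like this (untested sketch -- the package version string and device name below are placeholders, match the module package to the exact kernel you are running):

  # check which kernel you are running
  uname -r                                   # e.g. 2.6.9-34.ELsmp
  # grab the kernel-module-xfs package built for that exact kernel from
  # dev.centos.org/centos/4/testing/ and install it
  rpm -ivh kernel-module-xfs-<version-matching-your-kernel>.rpm
  # load the module, then create and mount a filesystem (mkfs.xfs comes from xfsprogs)
  modprobe xfs
  mkfs.xfs /dev/sdXN                         # placeholder device
  mount -t xfs /dev/sdXN /mnt/data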
On Tue, 2006-08-15 at 14:13 +1000, Mark Strong wrote:
Hi All, I've been looking around for info on XFS (the filesystem) and its use on CentOS and/or RHEL 4, and there seems to be a lot of noise about 4K stacks (especially on linux-xfs@oss.sgi.com).
So what is the best way to get XFS working with CentOS 4.3, and not have something like this happen?
A quote from the xfs list at sgi
On Tue, 18 Jul 2006 at 10:29am, Andrew Elwell wrote
> using the 2.6.9-34 centosplus SMP kernel (3GHz P4 with hyperthreading enabled)
>
> what we normally see (~once a day) is simply
>
> do_IRQ: stack overflow: 416
> [<c0107a27>]
You don't want to use the XFS in the centosplus kernel. It has major known issues with 4K stacks (leading to overflows). Use the kernel-module-xfs (or somesuch) RPM instead, and you should have better luck.
Do I need a kernel with 8K stacks?
and is this
http://dev.centos.org/centos/4/testing/i386/RPMS/kernel-module-xfs-2.6.9-34....
the "kernel-module-xfs" RPM he was talking about (or equivalent for `uname -r` equals 2.6.9-34.ELsmp).
Regards Mark Strong
Personally, I would not use xfs on Linux ... maybe take a look here:
http://distrowatch.com/weekly.php?issue=20060814
And see what several Debian devels say about XFS.
RedHat says it is not stable enough to use in RHEL.
I don't think everyone can be wrong.
If you really want to use it, you can use the module you referenced above and our kernel. The standard RHEL kernel will not compile w/ anything except 4k stacks (that is how the CentOS kernel is released too) ... so if you want to do that, you'll need to figure it out.
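If you want to check how a given kernel was built, the stack size shows up in the kernel config (a rough check, assuming the config file is installed under /boot as usual):

  # 4K stacks are a compile-time option on i686 kernels
  grep CONFIG_4KSTACKS /boot/config-$(uname -r)
  # CONFIG_4KSTACKS=y            -> 4k stacks (the RHEL/CentOS default)
  # "CONFIG_4KSTACKS is not set" -> 8k stacks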
On Tue, 2006-08-15 at 04:18, Johnny Hughes wrote:
Personally, I would not use xfs on Linux ... maybe take a look here:
http://distrowatch.com/weekly.php?issue=20060814
And see what several Debian devels say about XFS.
Was the update at the bottom of that article, pointing to http://oss.sgi.com/projects/xfs/faq.html#dir2, there when you read it? Turns out to be a kernel bug in 2.6.17 that is fixed in 2.6.17.7 and later. What's the deal with recent kernels anyway? All kinds of stuff has been broken in Fedora in the last several kernel updates, and they are only up to 2.6.17-1 or so. It keeps reminding me why I like CentOS, but how can anyone ship something that won't boot on a mainstream box like an IBM x86 eServer?
On Tue, 15 Aug 2006 at 4:18am, Johnny Hughes wrote
Personally, I would not use xfs on Linux ... maybe take a look here:
Almost every time I've tested performance for my workload of interest, XFS kicks the $#@)$ out of ext3 -- we're talking more than 2X write performance on the same hardware. And every time I point out how poorly ext3 performs (either on the RH lists or the ext3 list) I get ignored or told it's my hardware (despite also providing the XFS numbers proving it's not the hardware).
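The kind of crude test I mean, for anyone who wants to try it on their own hardware (just a sketch -- paths and sizes are placeholders, and a real comparison should use something like bonnie++ or your actual workload):

  # raw sequential write test on an XFS mount vs. an ext3 mount
  time sh -c 'dd if=/dev/zero of=/mnt/xfs/testfile bs=1M count=4096 && sync'
  time sh -c 'dd if=/dev/zero of=/mnt/ext3/testfile bs=1M count=4096 && sync'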
And I won't even go into xfsdump vs. ext2/3 dump.
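(For reference, a minimal xfsdump/xfsrestore round trip looks roughly like this -- paths are placeholders:)

  # level 0 dump of an XFS filesystem to a file
  xfsdump -l 0 -f /backup/home.dump /home
  # restore it somewhere else
  xfsrestore -f /backup/home.dump /mnt/restore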
http://distrowatch.com/weekly.php?issue=20060814
And see what several Debian devels say about XFS.
Yes, there was a bad bug with XFS recently. It's fixed now. It happens.
RedHat says it is not stable enough to use in RHEL.
I've never completely understood RH's opposition to XFS. I've heard several stories -- the 4K stacks issue (which is a long way towards being resolved in recent kernels), support issues, etc. I almost wonder if it isn't a case of NIH.
I don't think everyone can be wrong.
To add one more anecdotal data point, I've used XFS since RH7.3 (using pre 1.0 releases from SGI) and never lost *any* data to it. Transitioning to ext3 (to stay with officially supported kernels) was *painful* -- performance plummeted, and it forced me to rework many of my servers.
If you really want to use it, you can use the module you referenced above and our kernel. The standard RHEL kernel will not compile w/ anything except 4k stacks (that is how the CentOS kernel is released too) ... so if you want to do that, you'll need to figure it out.
Also (to the OP) keep in mind that x86_64 still uses 8K stacks.
On Tue, 2006-08-15 at 09:15 -0400, Joshua Baker-LePain wrote:
Almost every time I've tested performance for my workload of interest, XFS kicks the $#@)$ out of ext3
It is clearly a trade-off. E.g. XFS's lazy allocation causes fewer writes and less fragmentation. But in the event of a crash, it is likely that you will lose more data than with ext3 on filesystems with a lot of frequently changing data.
I've never completely understood RH's opposition to XFS. I've heard several stories -- the 4K stacks issue (which is a long way towards being resolved in recent kernels), support issues, etc. I almost wonder if it isn't a case of NIH.
I guess there are various reasons:
- 4K stacks were an issue at 4.0 time (maybe they still are, I don't know).
- SELinux security labels cannot be stored with the default XFS inode size (of course, the inode size can be set when creating a filesystem -- see the sketch below).
- XFS does not have data block journaling.
- Do you want to support more than one file system, when you have a file system that is good enough?
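(Just to make the SELinux point concrete -- a sketch only, the device is a placeholder:)

  # the default XFS inode size is 256 bytes; 512 leaves room for the SELinux security xattrs
  mkfs.xfs -i size=512 /dev/sdXN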
-- Daniel
Daniel de Kok wrote:
On Tue, 2006-08-15 at 09:15 -0400, Joshua Baker-LePain wrote:
Almost every time I've tested performance for my workload of interest, XFS kicks the $#@)$ out of ext3
It is clearly a trade-off. E.g. XFS's lazy allocation causes fewer writes and less fragmentation. But in the event of a crash, it is likely that you will lose more data than with ext3 on filesystems with a lot of frequently changing data.
Yes, XFS works great for heavy I/O loads like mail queues...until the box crashes during peak hours.
I wonder how Hans and reiser4 are getting along...
Feizhou wrote:
I wonder how Hans and reiser4 are getting along...
Oh, I think those two are getting along great. The problem is: How are Hans and the rest of the world getting along?
scnr,
Ralph
Reading this thread about xfs vs ext3, and how ext3 is safer...
I just got ("just" as in "today") an error on one of my ext3 file systems. Reason? A userspace application allocated a large chunk of memory, the kernel generated an OOM (while there was about 1 gig of swap still free), and a completely unrelated ext3 file system (the app in question wasn't doing anything on it) got an error, was automatically remounted read-only, and marked as in need of fsck.
I've unmounted it and run fsck on it (it found some errors and fixed them), and now each time I try to mount that file system the kernel reports that the file system is marked as having errors from a previous mount and that it is in need of fsck.
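(Roughly the sequence I went through, from memory -- the mount point is a placeholder, dm-2 is the device as the kernel reported it:)

  umount /dev/dm-2
  e2fsck -f -y /dev/dm-2                    # forced check, auto-answer yes to fixes
  tune2fs -l /dev/dm-2 | grep -i state      # still reports the filesystem as having errors
  mount /dev/dm-2 /some/mountpoint          # and the kernel still wants an fsck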
<flame mode="on"> Now, I wouldn't call this kind of thing a "stable" operating system or a "stable" file system. If an application asks for too much memory it should get killed (btw, the system had 1 gig of RAM and the application asked for like 600 meg, plus there was plenty of swap space free too -- so I wouldn't call this a case of the app asking for too much). You definitely shouldn't end up with a corrupted file system. </flame>
On Wed, 2006-08-16 at 15:12 -0500, Aleksandar Milivojevic wrote:
Reading this thread about xfs vs ext3, and how ext3 is safer...
I just got ("just" as in "today") an error on one of my ext3 file systems. Reason? A userspace application allocated a large chunk of memory, the kernel generated an OOM (while there was about 1 gig of swap still free), and a completely unrelated ext3 file system (the app in question wasn't doing anything on it) got an error, was automatically remounted read-only, and marked as in need of fsck.
I've unmounted it and run fsck on it (it found some errors and fixed them), and now each time I try to mount that file system the kernel reports that the file system is marked as having errors from a previous mount and that it is in need of fsck.
<flame mode="on"> Now, I wouldn't call this kind of thing a "stable" operating system or a "stable" file system. If an application asks for too much memory it should get killed (btw, the system had 1 gig of RAM and the application asked for like 600 meg, plus there was plenty of swap space free too -- so I wouldn't call this a case of the app asking for too much). You definitely shouldn't end up with a corrupted file system. </flame>
shit breaks ... it's life :)
On Wed, 2006-08-16 at 15:12 -0500, Aleksandar Milivojevic wrote:
<flame mode="on"> Now, I wouldn't call this kind of thing a "stable" operating system or a "stable" file system. If an application asks for too much memory it should get killed (btw, the system had 1 gig of RAM and the application asked for like 600 meg, plus there was plenty of swap space free too -- so I wouldn't call this a case of the app asking for too much). You definitely shouldn't end up with a corrupted file system. </flame>
- Did you enforce process limits?
- Was the memory fragmented, and how does the application allocate memory?
- I suppose that vm.oom-kill is still set to 1?
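(If it helps, those can be checked quickly with something like this -- I'm assuming the vm.oom-kill sysctl is present on this kernel:)

  ulimit -a                      # per-process limits for the shell that launched the app
  sysctl vm.oom-kill             # the RHEL4-era knob controlling the OOM killer
  cat /proc/buddyinfo            # rough view of free-memory fragmentation per zone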
Oh, and there's always bad karma (or semi-random errors if you prefer) ;).
-- Daniel
Quoting Daniel de Kok danieldk@pobox.com:
On Wed, 2006-08-16 at 15:12 -0500, Aleksandar Milivojevic wrote:
<flame mode="on"> Now, I wouldn't call this kind of thing a "stable" operating system or a "stable" file system. If an application asks for too much memory it should get killed (btw, the system had 1 gig of RAM and the application asked for like 600 meg, plus there was plenty of swap space free too -- so I wouldn't call this a case of the app asking for too much). You definitely shouldn't end up with a corrupted file system. </flame>
- Did you enforce process limits?
Hm, no. There was no need for that. Even if I had, they would be higher than what the app was using (because the system had enough resources).
- Was the memory fragmented, and how does the application allocate memory?
Well, it was a Perl script, and only God knows how Perl allocates memory ;-). It allocated almost all of those 600 megs on startup (probably in smaller chunks), then happily worked on it. Somewhere in the middle, the OOM and file system corruption happened. BTW, some half an hour after the ext3 error, the app happily (and uninterrupted) finished its job.
- I suppose that vm.oom-kill is still set to 1?
Hmmm... Any downside to setting it to 0?
Oh, and there's always bad karma (or semi-random errors if you prefer) ;).
Bad karma is having bad memory or an overheated processor. Not applicable to my case ;-)
There were a bunch of errors logged. Here are just a few of them that seem most relevant:
Node 0 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
Node 0 HighMem: empty
Swap cache: add 1634047, delete 1515588, find 13048002/13194900, race 0+22
Free swap: 844384kB
261856 pages of RAM
5646 reserved pages
108709 pages shared
118455 pages swap cached
do_get_write_access: OOM for frozen_buffer
ext3_splice_branch: aborting transaction: Out of memory in __ext3_journal_get_write_access
EXT3-fs error (device dm-2) in ext3_ordered_writepage: Out of memory
Aborting journal on device dm-2.
ext3_abort called.
EXT3-fs error (device dm-2): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
EXT3-fs error (device dm-2) in ext3_ordered_writepage: IO failure
last message repeated 3 times
__journal_remove_journal_head: freeing b_frozen_data
last message repeated 10 times
__journal_remove_journal_head: freeing b_committed_data
__journal_remove_journal_head: freeing b_frozen_data
__journal_remove_journal_head: freeing b_frozen_data
On Wed, 2006-08-16 at 16:08 -0500, Aleksandar Milivojevic wrote:
Hmmm... Any downside to setting it to 0?
Yeah. It disables the OOM killer, possibly leading to a situation where no one can allocate memory. Having it enabled (and set to kill one process) makes the kernel try to pick a process according to the least-surprise principle.
Node 0 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB Node 0 HighMem: empty
Seems like a very bad OOM situation. Remember that the swapper is not just an extension of physical memory, since kswapd has to take care of paging out memory pages. If you are allocating huge chunks of memory in a short time, you may want to crank up the vm.min_free_kbytes tunable. This variable sets the low watermark of free memory, and setting it higher gives the kernel more room for emergency allocations (e.g. letting kswapd do its work). Setting it to 4096KB should help a lot, though I have seen people setting it much higher on servers.
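A minimal sketch of checking and raising it (using the example value from above):

  # current value, in kB
  cat /proc/sys/vm/min_free_kbytes
  # raise it on the running system
  sysctl -w vm.min_free_kbytes=4096
  # make it persistent across reboots
  echo 'vm.min_free_kbytes = 4096' >> /etc/sysctl.conf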
-- Daniel
Quoting Daniel de Kok danieldk@pobox.com:
On Wed, 2006-08-16 at 16:08 -0500, Aleksandar Milivojevic wrote:
Hmmm... Any downside to setting it to 0?
Yeah. It disables the OOM killer, possibly leading to a situation where no one can allocate memory. Having it enabled (and set to kill one process) makes the kernel try to pick a process according to the least-surprise principle.
So, does "no one can allocate memory" means:
a) things will be put on hold until kswapd frees enough pages (if possible, than apps will start dying), or b) the machine will freeze
Option (a) is what usually happens on other operating systems (Tru64 and/or Solaris), and it is acceptable. If backing store is overallocated and free swap falls to zero, the kernel kills a userland process that it can't handle anymore. However, as I said many times in this thread, this is not what happened in my case. I had more than enough free space on swap to accommodate the offending application.
Node 0 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB Node 0 HighMem: empty
Seems like a very bad OOM situation. Remember that the swapper is not just an extension of physical memory, since kswapd has to take care of paging out memory pages. If you are allocating huge chunks of memory in a short time, you may want to crank up the vm.min_free_kbytes tunable. This variable sets the low watermark of free memory, and setting it higher gives the kernel more room for emergency allocations (e.g. letting kswapd do its work). Setting it to 4096KB should help a lot, though I have seen people setting it much higher on servers.
In my case, min_free_kbytes was set to 1015 (the default, I guess). I'll try cranking it up. The machine has 1 gig of memory, so there's plenty of space to play with that parameter before performance suffers noticeably. Anyhow, I'd expect min_free_kbytes to be a performance tunable, not something I'd need to touch to make the machine more stable.
On Wed, 2006-08-16 at 15:12 -0500, Aleksandar Milivojevic wrote:
<flame mode="on"> Now, I wouldn't call this kind of thing a "stable" operating system or a "stable" file system. If an application asks for too much memory it should get killed (btw, the system had 1 gig of RAM and the application asked for like 600 meg, plus there was plenty of swap space free too -- so I wouldn't call this a case of the app asking for too much).
Just because you have some memory to spare doesn't mean that malloc() can hand you 600 megs of contiguous memory. And I've lost track of what the fashionable way of handling VM is now. Is it the slow but sure "check first and fail if impossible" or the fast and dirty "always succeed and worry about paging later"? Or is it user-configurable now? Usually I just allocate 2 gigs of swap on the theory that if it goes that far the machine will be so unresponsive I'll think it's dead anyway.
Quoting Les Mikesell lesmikesell@gmail.com:
Just because you have some memory to spare doesn't mean that malloc() can hand you 600 megs of contiguous memory. And I've lost track of what the fashionable way of handling VM is now. Is it the slow but sure "check first and fail if impossible" or the fast and dirty "always succeed and worry about paging later"? Or is it user-configurable now? Usually I just allocate 2 gigs of swap on the theory that if it goes that far the machine will be so unresponsive I'll think it's dead anyway.
I believe it is the "lazy" system, where a request for memory allocation is always successful, and the app gets killed if there is not enough backing store later. I know this is a configurable parameter in Tru64 and Solaris; not sure if it is configurable in Linux.
However, the thing is, there was enough free swap space to handle things -- more than 1 gig of free swap space, that is. There was enough free swap to move *everything* that was in physical RAM onto it, if needed. So it wasn't a case of "ran out of free swap". It was a clear failure of the VM to utilize the resources it had (real memory + swap space). The VM should have put things on hold until enough pages were swapped out, not kill things or deny memory to other kernel modules (and, in this case, cause file system corruption).
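Digging a little further, it looks like Linux does expose this. Assuming the usual overcommit sysctls are present on this kernel, something like this shows and changes the policy:

  # 0 = heuristic overcommit (default), 1 = always overcommit, 2 = strict accounting
  sysctl vm.overcommit_memory
  # strict mode also honours vm.overcommit_ratio (percent of RAM counted on top of swap)
  sysctl -w vm.overcommit_memory=2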
Ralph Angenendt wrote:
Feizhou wrote:
I wonder how Hans and reiser4 are getting along...
Oh, I think those two are getting along great. The problem is: How are Hans and the rest of the world getting along?
that's too bad. Last I checked, Andrew Morton was trying to get something going.
On Tuesday 15 August 2006 06:13, Mark Strong wrote:
Hi All, I've been looking around for info on XFS (the filesystem) and its use on CentOS and/or RHEL 4, and there seems to be a lot of noise about 4K stacks (especially on linux-xfs@oss.sgi.com).
...
You don't want to use the XFS in the centosplus kernel. It has major known issues with 4K stacks (leading to overflows). Use the kernel-module-xfs (or somesuch) RPM instead, and you should have better luck.
I don't have a full answer for you; what happens probably depends a lot on both hardware and software configuration and load. But I'll second the above statement: don't use the xfs module as shipped with the centosplus kernel (AFAICT it's still vanilla from 2.6.9, and it did break for me when I tested), but go with the kernel-module-xfs package.
We've been running ~5 servers and ~10T on CentOS with the stand-alone module for quite some time and have seen no problems (NFS serving on x86_64, using 3ware cards for storage). YMMV...
/Peter
On Tue, 15 Aug 2006, Peter Kjellström wrote:
On Tuesday 15 August 2006 06:13, Mark Strong wrote:
Hi All, I've been looking around for info on XFS (the filesystem) and its use on CentOS and/or RHEL 4, and there seems to be a lot of noise about 4K stacks (especially on linux-xfs@oss.sgi.com).
...
You don't want to use the XFS in the centosplus kernel. It has major known issues with 4K stacks (leading to overflows). Use the kernel-module-xfs (or somesuch) RPM instead, and you should have better luck.
I don't have a full answer for you; what happens probably depends a lot on both hardware and software configuration and load. But I'll second the above statement: don't use the xfs module as shipped with the centosplus kernel (AFAICT it's still vanilla from 2.6.9, and it did break for me when I tested), but go with the kernel-module-xfs package.
We've been running ~5 servers and ~10T on CentOS with the stand-alone module for quite some time and have seen no problems (NFS serving on x86_64, using 3ware cards for storage). YMMV...
XFS works a lot better on x86_64 kernels as they have 8K stacks.
-Connie Sieh
/Peter
Hi all, thanks for the replies.
I'll take it under advisement that XFS may eat (read: fill with zeros) my data without a moment's warning.
But I'll give it a go -- the kernel-module-xfs-xxxx.rpm, that is.
I currently have a Debian (my normal distro choice) system with a kernel.org 2.6.11.4 kernel running XFS on a stripe of SCSI disks. It had a little trouble at the start, but it's been working fine lately (uptime of 225 days). XFS can be good under certain circumstances.
And see what several Debian devels say about XFS.
I think those Debian guys are crazy to run XFS on laptops and/or normal desktops (unless you were testing XFS for robustness during the inevitable power failures).
Regards Mark Strong.