Hi list,
Over the last few days there has been a discussion about the stability of XFS; although I have used XFS heavily myself and haven't run into any issues yet, I'd like to ask something *before* we build our next-generation data storage backend...
Les Mikesell wrote in [0] about problems with the combination of XFS and LVM; however, that was discussed in the context of 32-bit kernels.
What I specifically need is to run XFS (or something similar; I am *not* forced to use XFS, but it has been my preference for some years now, and I haven't had any issues with it yet) on top of LVM, so that I can create snapshots. We're talking about several file systems of roughly 4 TiB each.
Elsewhere [1] I read that there were issues with that.
Can anyone shed some light on this? It would be much appreciated.
Regards,
Timo
[0] -- http://lists.centos.org/pipermail/centos/2009-December/086850.html
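As an illustration of the workflow in question, here is a minimal sketch of a snapshot-based backup cycle using LVM2's command-line tools driven from Python; the volume group, logical volume, snapshot size, and paths are hypothetical placeholders, and the nouuid mount option is there because an XFS snapshot carries the same filesystem UUID as its origin volume.

#!/usr/bin/env python3
# Hypothetical snapshot cycle: create an LVM snapshot of an XFS volume,
# mount it read-only, hand it to a backup job, then tear it down again.
# VG/LV names, sizes and paths are placeholders for illustration only.
import subprocess

VG, LV, SNAP = "vg0", "data", "data_snap"
MOUNTPOINT = "/mnt/snap"

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Reserve 10 GiB of copy-on-write space for the snapshot.
run(["lvcreate", "--snapshot", "--size", "10G",
     "--name", SNAP, "/dev/%s/%s" % (VG, LV)])
try:
    # XFS refuses to mount two filesystems with the same UUID,
    # hence -o nouuid for the snapshot.
    run(["mount", "-o", "ro,nouuid", "/dev/%s/%s" % (VG, SNAP), MOUNTPOINT])
    try:
        run(["tar", "-C", MOUNTPOINT, "-cf", "/backup/data.tar", "."])
    finally:
        run(["umount", MOUNTPOINT])
finally:
    run(["lvremove", "-f", "/dev/%s/%s" % (VG, SNAP)])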
On Dec 9, 2009, at 8:05 AM, Timo Schoeler timo.schoeler@riscworks.net wrote:
What I specifically need is to run XFS (or something similar; I am *not* forced to use XFS, but it has been my preference for some years now, and I haven't had any issues with it yet) on top of LVM, so that I can create snapshots. We're talking about several file systems of roughly 4 TiB each.
There is no problem if it is done on x86_64 with its 8k stacks, but on i386, with its 4k stacks, you could run into a stack overflow when running on top of stackable block devices (md RAID, LVM, DRBD, etc.).
Also, since the current LVM on CentOS doesn't support barriers (coming in the next release, I believe), journaling isn't safe on LVM unless you are using a storage controller with a BBU write-back cache.
I have also heard that the current barrier implementation isn't very performant and doesn't take controllers with BBU cache into account, so most people end up mounting with nobarrier, which just leaves them in the same boat as before. Better make sure your machine is bullet-proof, as a power outage or a kernel panic can spell disaster for XFS (or any other file system, really).
It is better to invest in a good hardware RAID controller until the whole barrier story is ironed out. It should really perform better than it does.
-Ross
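As an illustration of how that barrier cost could be measured, here is a minimal sketch that times small synchronous writes on a mounted filesystem; running it against the same filesystem with barriers enabled and then remounted with the nobarrier option shows the difference. The test path is a placeholder for wherever the filesystem under test is mounted.

#!/usr/bin/env python3
# Rough fsync-latency probe: append small records and fsync after each one.
# Compare the numbers on the same filesystem mounted with and without
# barriers. TESTFILE is a placeholder path on the filesystem under test.
import os, time

TESTFILE = "/mnt/test/fsync_probe.dat"
N = 200  # number of synchronous writes

fd = os.open(TESTFILE, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
latencies = []
try:
    for i in range(N):
        start = time.time()
        os.write(fd, b"x" * 4096)   # one 4 KiB record
        os.fsync(fd)                # force it down through the cache
        latencies.append(time.time() - start)
finally:
    os.close(fd)

latencies.sort()
print("fsyncs: %d  avg: %.2f ms  median: %.2f ms  max: %.2f ms" % (
    N,
    1000.0 * sum(latencies) / N,
    1000.0 * latencies[N // 2],
    1000.0 * latencies[-1]))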
thus Ross Walker spake:
It is better to invest in a good hardware RAID controller until the whole barrier story is ironed out. It should really perform better than it does.
Thanks for your detailed explanation, that really clears things up. However, I was intending to build a software RAID 10, as we have had rather bad experiences with hardware RAID controllers in the past (all kinds of phenomena).
Would barriers still be a problem here then?
Timo
On Dec 9, 2009, at 10:39 AM, Timo Schoeler timo.schoeler@riscworks.net wrote:
Thanks for your detailed explanation, that really clears things up. However, I was intending to build a software RAID 10, as we have had rather bad experiences with hardware RAID controllers in the past (all kinds of phenomena).
Would barriers still be a problem here then?
So long as LVM isn't involved, it will use barriers, but I can tell you that you will be less than impressed by the performance.
Go for a good hardware RAID solution with BBU write-back cache; look to spend $350-$700 and get one that supports both SAS and SATA. I like the LSI MegaRAID cards with 512MB of battery-backed cache.
Some cards allow you to run in JBOD mode with the battery-backed write-back cache enabled, so if you really want software RAID you can run it and still have fast, safe performance (though you spread the cache a little thin across that many logical units).
-Ross
thus Ross Walker spake:
So long as LVM isn't involved, it will use barriers, but I can tell you that you will be less than impressed by the performance.
Thanks for your email, Ross. So, reading all of this, I'm really concerned about moving all our data to such a system. The reason we're moving is mainly, but not only, the longish fsck that UFS (FreeBSD) needs after a crash. XFS seemed to fit perfectly, as I have never had trouble with fsck there. However, this discussion seems to be changing my mind. So, what would be an alternative (if possible without hardware RAID controllers, as already mentioned)? ext3 is not; there we have long fsck runs, too. Even ext4 doesn't seem too good in this area...
Timo
thus Christopher Chan spake:
I thought 3ware would have been good. Their cards have been praised for quite some time... have things changed? What about Adaptec?
Well, for me the recommended LSI is fine, as it's my favorite vendor, too. I abandoned Adaptec quite a while ago, and my opinion was confirmed when the OpenBSD vs. Adaptec discussion came up. However, the question of which hardware RAID vendor to use is completely independent of the file system discussion.
I re-read the XFS FAQ on these issues; it seems to me that we have to set up two machines in the lab, one purely software-RAID driven and one with a hardware RAID controller configured as JBOD, and then benchmark and stress-test both setups.
Timo
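As one possible shape for such a stress test, here is a minimal sketch of a crash-consistency probe: the writer appends checksummed records and fsyncs after each one, reporting every acknowledged sequence number, and after cutting power the verifier checks that everything acknowledged before the crash is still present and intact. The log path and record format are arbitrary choices for illustration.

#!/usr/bin/env python3
# Crash-consistency probe (sketch) for the lab setups mentioned above.
# Default mode: append fixed-size records (sequence number + SHA-256 over
# the payload) and fsync after each one, printing the last acknowledged
# sequence number. After cutting power and rebooting, "verify" mode checks
# that every acknowledged record is present and uncorrupted.
# LOGFILE is a placeholder; point it at the filesystem under test.
import hashlib, os, struct, sys

LOGFILE = "/mnt/test/crashlog.bin"
PAYLOAD = b"A" * 4096
HEADER = struct.Struct("<Q32s")   # sequence number + SHA-256 digest

def checksum(seq, payload):
    return hashlib.sha256(payload + seq.to_bytes(8, "little")).digest()

def write_forever():
    fd = os.open(LOGFILE, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    seq = 0
    while True:
        os.write(fd, HEADER.pack(seq, checksum(seq, PAYLOAD)) + PAYLOAD)
        os.fsync(fd)              # the record counts as acknowledged only now
        print("acked", seq, flush=True)
        seq += 1

def verify():
    size = HEADER.size + len(PAYLOAD)
    seq = 0
    with open(LOGFILE, "rb") as f:
        while True:
            chunk = f.read(size)
            if len(chunk) < size:
                break             # a trailing partial record is acceptable
            num, digest = HEADER.unpack(chunk[:HEADER.size])
            if num != seq or digest != checksum(num, chunk[HEADER.size:]):
                print("CORRUPT record at", seq)
                return
            seq += 1
    print("records intact up to", seq - 1)

if __name__ == "__main__":
    verify() if "verify" in sys.argv[1:] else write_forever()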
Timo Schoeler wrote:
Well, for me the recommended LSI is fine, as it's my favorite vendor, too. I abandoned Adaptec quite a while ago, and my opinion was confirmed when the OpenBSD vs. Adaptec discussion came up. However, the question of which hardware RAID vendor to use is completely independent of the file system discussion.
Oh yes, it is. If you use hardware RAID, you do not need barriers and can afford to turn them off for better performance, or use LVM for that matter.
I re-read the XFS FAQ on these issues; it seems to me that we have to set up two machines in the lab, one purely software-RAID driven and one with a hardware RAID controller configured as JBOD, and then benchmark and stress-test both setups.
JBOD? You plan to use software raid with that? Why?!
[off list]
Oh yes, it is. If you use hardware RAID, you do not need barriers and can afford to turn them off for better performance, or use LVM for that matter.
Hi, this is off list: could you please explain the LVM vs. barriers thing to me?
AFAIU, one should turn off the write caches on the HDs (in any case) and, if there is a BBU-backed RAID controller, use its cache but turn off barriers. Where does LVM come into play here? Thanks in advance! :)
JBOD? You plan to use software raid with that? Why?!
Mainly due to better manageability and monitoring. Honestly, the proprietary tools are not the best.
Timo
Timo Schoeler wrote:
Hi, this is off list: could you please explain the LVM vs. barriers thing to me?
AFAIU, one should turn off the write caches on the HDs (in any case) and, if there is a BBU-backed RAID controller, use its cache but turn off barriers. Where does LVM come into play here? Thanks in advance! :)
No, barriers are specifically there to allow you to turn the write caches on the HDs on and not lose data. Before barriers, fsync/fdatasync lied: they would return before the data hit the platters. With barriers, fsync/fdatasync return only after the data has hit the platters.
However, the dm layer does not support barriers, so with LVM you need to turn the write caches off if you care about your data and have no BBU cache to rely on.
If you use a hardware RAID card with BBU cache, you can use LVM without worrying, and if you are not using LVM, you can (and in the case of XFS, should) turn barriers off.
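To make the fsync point concrete, here is a minimal sketch of how an application asks for durability; whether the data really is on the platters when the call returns is exactly what the barrier discussion is about. The path and payload are placeholders.

import os

def durable_write(path, data):
    """Write data and ask the kernel to push it to stable storage."""
    f = open(path, "wb")
    try:
        f.write(data)
        f.flush()               # drain Python's userspace buffer
        os.fsync(f.fileno())    # ask the kernel/disk stack for durability;
                                # only honest if barriers, a BBU cache, or
                                # disabled drive write caches back it up
    finally:
        f.close()

durable_write("/tmp/important.dat", b"payload that must survive a crash\n")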
Mainly due to better manageability and monitoring. Honestly, the proprietary tools are not the best.
3dm2 for 3ware was pretty decent, whether via HTTP or the CLI...
On Dec 10, 2009, at 4:28 AM, Timo Schoeler timo.schoeler@riscworks.net wrote:
AFAIU, one should turn off the write caches on the HDs (in any case) and, if there is a BBU-backed RAID controller, use its cache but turn off barriers. Where does LVM come into play here? Thanks in advance! :)
LVM, like md RAID and DRBD, is a layered block device. If you turn the write caches off on the HDs there is no problem, but HDs aren't designed to perform to spec with the write cache disabled; they expect important data to be written with FUA (forced unit access), so performance will be terrible.
LVM, like md RAID and DRBD, is a layered block device. If you turn the write caches off on the HDs there is no problem, but HDs aren't designed to perform to spec with the write cache disabled; they expect important data to be written with FUA (forced unit access), so performance will be terrible.
I hope I'm not going too far off topic here, but I'm getting worried that I don't fully understand, especially when it comes to data safety:
Considering a stack of:
- ext3
- on top of LVM2
- on top of software RAID1
- on top of regular SATA disks (no hardware RAID)
is it "safe" to have the HD cache enabled?
(Note: ext3, not XFS, hence the possible off-topic...)
In other words, is this discussion about barriers, etc. only relevant to XFS?
Mathieu Baudier wrote:
is it "safe" to have the HD cache enabled?
Nothing is safe once device-mapper is involved.
In other words, is this discussion about barriers, etc. only relevant to XFS?
No, it applies to all filesystems. Prior to barriers, fsync/fdatasync lied. See the man page for fsync.
Chan Chung Hang Christopher wrote:
Nothing is safe once device-mapper is involved.
No, it applies to all filesystems. Prior to barriers, fsync/fdatasync lied. See the man page for fsync.
No mention of barriers in the man page; I'm also getting confused. Is device mapper used for software RAID, i.e. /dev/mdX? If so, what are the implications of barriers, and where are they turned on/off? Forgive me for the potential off-topic, but I too run XFS on LVM, which uses the device mapper... risky??
No mention of barriers in the man page; I'm also getting confused. Is device mapper used for software RAID, i.e. /dev/mdX?
Nope. Software RAID is the md layer; nothing to do with dm. They are two separate layers, although they share a bit of stuff.
If so, what are the implications of barriers, and where are they turned on/off?
Barriers allow one to ensure a true fsync/fdatasync when used with hard disks that have their write cache enabled. This does not apply to hard drives connected to hardware RAID controllers, which is a different ball game.
Forgive me for the potential off-topic, but I too run XFS on LVM, which uses the device mapper... risky??
It is risky IF you are not using hardware RAID with BBU cache, unless, of course, you have disabled the write caches on the hard drives attached to the RAID controller.
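For anyone unsure what their filesystem is actually stacked on, here is a minimal sketch that follows the slaves/ links under /sys/block to print a block device's stack (dm on md on raw disks, for example). The usual sysfs layout is assumed, and the starting device name is a placeholder.

#!/usr/bin/env python3
# Print the stack of block devices underneath a given device by following
# the slaves/ links in sysfs, e.g. dm-0 -> md0 -> sda / sdb.
# Assumes the usual /sys/block layout; the default device is a placeholder.
import os
import sys

def print_stack(dev, indent=0):
    print("  " * indent + dev)
    slaves_dir = os.path.join("/sys/block", dev, "slaves")
    if os.path.isdir(slaves_dir):
        for slave in sorted(os.listdir(slaves_dir)):
            print_stack(slave, indent + 1)

if __name__ == "__main__":
    # e.g. "dm-0" for an LVM logical volume, "md0" for software RAID
    print_stack(sys.argv[1] if len(sys.argv) > 1 else "dm-0")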
On Fri, Dec 11, 2009 at 12:10:59AM +0800, Chan Chung Hang Christopher wrote:
Nothing is safe once device-mapper is involved.
https://www.redhat.com/archives/dm-devel/2009-December/msg00079.html
"Barriers are now supported by all the types of dm devices."
-- Pasi