Hi,
is there anything that speaks against putting a cyrus mail spool onto a btrfs subvolume?
On 09/07/2017 01:57 PM, hw wrote:
Hi,
is there anything that speaks against putting a cyrus mail spool onto a btrfs subvolume?
I might be the lone voice on this, but I refuse to use btrfs for anything, much less a mail spool. I used it in production on DB and Web servers and fought corruption issues and scrubs hanging the system more times than I can count. (This was within the last 24 months.) I was told by certain mailing lists that btrfs isn't considered production level. So, I scrapped the lot, went to xfs and haven't had a problem since.
I'm not sure why you'd want your mail spool on a filesystem that seems to hate being hammered with reads/writes. Personally, on all my mail spools, I use XFS or EXT4. Our servers here handle 600 million messages a month without trouble on those filesystems.
Just my $0.02.
Mark Haney wrote:
On 09/07/2017 01:57 PM, hw wrote:
Hi,
is there anything that speaks against putting a cyrus mail spool onto a btrfs subvolume?
I might be the lone voice on this, but I refuse to use btrfs for anything, much less a mail spool. I used it in production on DB and Web servers and fought corruption issues and scrubs hanging the system more times than I can count. (This was within the last 24 months.) I was told by certain mailing lists that btrfs isn't considered production level. So, I scrapped the lot, went to xfs and haven't had a problem since.
I'm not sure why you'd want your mail spool on a filesystem that seems to hate being hammered with reads/writes. Personally, on all my mail spools, I use XFS or EXT4. Our servers here handle 600 million messages a month without trouble on those filesystems.
Just my $0.02.
Btrfs appears rather useful because the disks are SSDs, because it allows me to create subvolumes and because it handles SSDs nicely. Unfortunately, the SSDs are not suited for hardware RAID.
The only alternative I know is xfs or ext4 on mdadm and no subvolumes, and md RAID has severe performance penalties which I´m not willing to afford.
Part of the data I plan to store on these SSDs greatly benefits from the low latency, making things about 20--30 times faster for an important application.
So what should I do?
PS:
What kind of storage solutions do people use for cyrus mail spools? Apparently you can not use remote storage, at least not NFS. That even makes it difficult to use a VM due to limitations of available disk space.
I´m reluctant to use btrfs, but there doesn´t seem to be any reasonable alternative.
hw wrote:
Mark Haney wrote:
On 09/07/2017 01:57 PM, hw wrote:
Hi,
is there anything that speaks against putting a cyrus mail spool onto a btrfs subvolume?
I might be the lone voice on this, but I refuse to use btrfs for anything, much less a mail spool. I used it in production on DB and Web servers and fought corruption issues and scrubs hanging the system more times than I can count. (This was within the last 24 months.) I was told by certain mailing lists that btrfs isn't considered production level. So, I scrapped the lot, went to xfs and haven't had a problem since.
I'm not sure why you'd want your mail spool on a filesystem that seems to hate being hammered with reads/writes. Personally, on all my mail spools, I use XFS or EXT4. Our servers here handle 600 million messages a month without trouble on those filesystems.
Just my $0.02.
Btrfs appears rather useful because the disks are SSDs, because it allows me to create subvolumes and because it handles SSDs nicely. Unfortunately, the SSDs are not suited for hardware RAID.
The only alternative I know is xfs or ext4 on mdadm and no subvolumes, and md RAID has severe performance penalties which I´m not willing to afford.
Part of the data I plan to store on these SSDs greatly benefits from the low latency, making things about 20--30 times faster for an important application.
So what should I do?
I hate top posting, but since you've got two items I want to comment on, I'll suck it up for now.
Having SSDs alone will give you great performance regardless of filesystem. BTRFS isn't going to impact I/O any more significantly than, say, XFS. It does have serious stability/data integrity issues that XFS doesn't have. There's no reason not to use SSDs for storage of immediate data and mechanical drives for archival data storage.
As for VMs we run a huge Zimbra cluster in VMs on VPC with large primary SSD volumes and even larger (and slower) secondary volumes for archived mail. It's all CentOS 6 and works very well. We process 600 million emails a month on that virtual cluster. All EXT4 inside LVM.
I can't tell you what to do, but it seems to me you're viewing your setup from a narrow SSD/BTRFS standpoint. Lots of ways to skin that cat.
On 09/08/2017 08:07 AM, hw wrote:
PS:
What kind of storage solutions do people use for cyrus mail spools? Apparently you can not use remote storage, at least not NFS. That even makes it difficult to use a VM due to limitations of available disk space.
I´m reluctant to use btrfs, but there doesn´t seem to be any reasonable alternative.
hw wrote:
Mark Haney wrote:
On 09/07/2017 01:57 PM, hw wrote:
Hi,
is there anything that speaks against putting a cyrus mail spool onto a btrfs subvolume?
I might be the lone voice on this, but I refuse to use btrfs for anything, much less a mail spool. I used it in production on DB and Web servers and fought corruption issues and scrubs hanging the system more times than I can count. (This was within the last 24 months.) I was told by certain mailing lists that btrfs isn't considered production level. So, I scrapped the lot, went to xfs and haven't had a problem since.
I'm not sure why you'd want your mail spool on a filesystem that seems to hate being hammered with reads/writes. Personally, on all my mail spools, I use XFS or EXT4. Our servers here handle 600 million messages a month without trouble on those filesystems.
Just my $0.02.
Btrfs appears rather useful because the disks are SSDs, because it allows me to create subvolumes and because it handles SSDs nicely. Unfortunately, the SSDs are not suited for hardware RAID.
The only alternative I know is xfs or ext4 on mdadm and no subvolumes, and md RAID has severe performance penalties which I´m not willing to afford.
Part of the data I plan to store on these SSDs greatly benefits from the low latency, making things about 20--30 times faster for an important application.
So what should I do?
Mark Haney wrote:
I hate top posting, but since you've got two items I want to comment on, I'll suck it up for now.
I do, too, yet sometimes it´s reasonable. I also hate it when the lines are too long :)
Having SSDs alone will give you great performance regardless of filesystem.
It depends, i.e. I can´t tell how these SSDs would behave if large amounts of data were written to or read from them over extended periods of time, because I haven´t tested that. That isn´t the application, anyway.
BTRFS isn't going to impact I/O any more significantly than, say, XFS.
But mdadm does, the impact is severe. I know there are ppl saying otherwise, but I´ve seen the impact myself, and I definitely don´t want it on that particular server because it would likely interfere with other services. I don´t know if the software RAID of btrfs is better in that or not, though, but I´m seeing btrfs on SSDs being fast, and testing with the particular application has shown a speedup of factor 20--30.
That is the crucial improvement. If the hardware RAID delivers that, I´ll use that and probably remove the SSDs from the machine as it wouldn´t even make sense to put temporary data onto them because that would involve software RAID.
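For what it's worth, a quick way to compare latency across the candidate paths (hardware RAID volume, md-RAID, bare SSD) is a small fio run; the file path below is only a placeholder:

    # single-threaded 4k random reads, direct I/O, 60 seconds
    fio --name=latency-test --filename=/mnt/candidate/fio.dat --size=1G \
        --rw=randread --bs=4k --iodepth=1 --ioengine=libaio --direct=1 \
        --runtime=60 --time_based

The completion-latency percentiles in the output are usually more telling than the raw throughput number for a latency-sensitive workload like this.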
It does have serious stability/data integrity issues that XFS doesn't have. There's no reason not to use SSDs for storage of immediate data and mechanical drives for archival data storage.
As for VMs we run a huge Zimbra cluster in VMs on VPC with large primary SSD volumes and even larger (and slower) secondary volumes for archived mail. It's all CentOS 6 and works very well. We process 600 million emails a month on that virtual cluster. All EXT4 inside LVM.
Do you use hardware RAID with SSDs?
I can't tell you what to do, but it seems to me you're viewing your setup from a narrow SSD/BTRFS standpoint. Lots of ways to skin that cat.
That´s because I do not store data on a single disk, without redundancy, and the SSDs I have are not suitable for hardware RAID. So what else is there but either md-RAID or btrfs when I do not want to use ZFS? I also do not want to use md-RAID, hence only btrfs remains. I also like to use sub-volumes, though that isn´t a requirement (because I can use directories instead and lose the ability to make snapshots).
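For reference, the subvolume-per-spool layout that makes those snapshots cheap is only a couple of commands; paths are placeholders, and /srv is assumed to already sit on a btrfs filesystem:

    btrfs subvolume create /srv/imap
    mkdir -p /srv/.snapshots
    # read-only snapshot, e.g. before an upgrade or a migration
    btrfs subvolume snapshot -r /srv/imap /srv/.snapshots/imap-2017-09-08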
I stay away from LVM because that just sucks. It wouldn´t even have any advantage in this case.
On 09/08/2017 08:07 AM, hw wrote:
PS:
What kind of storage solutions do people use for cyrus mail spools? Apparently you can not use remote storage, at least not NFS. That even makes it difficult to use a VM due to limitations of available disk space.
I´m reluctant to use btrfs, but there doesn´t seem to be any reasonable alternative.
hw wrote:
Mark Haney wrote:
On 09/07/2017 01:57 PM, hw wrote:
Hi,
is there anything that speaks against putting a cyrus mail spool onto a btrfs subvolume?
I might be the lone voice on this, but I refuse to use btrfs for anything, much less a mail spool. I used it in production on DB and Web servers and fought corruption issues and scrubs hanging the system more times than I can count. (This was within the last 24 months.) I was told by certain mailing lists that btrfs isn't considered production level. So, I scrapped the lot, went to xfs and haven't had a problem since.
I'm not sure why you'd want your mail spool on a filesystem that seems to hate being hammered with reads/writes. Personally, on all my mail spools, I use XFS or EXT4. Our servers here handle 600 million messages a month without trouble on those filesystems.
Just my $0.02.
Btrfs appears rather useful because the disks are SSDs, because it allows me to create subvolumes and because it handles SSDs nicely. Unfortunately, the SSDs are not suited for hardware RAID.
The only alternative I know is xfs or ext4 on mdadm and no subvolumes, and md RAID has severe performance penalties which I´m not willing to afford.
Part of the data I plan to store on these SSDs greatly benefits from the low latency, making things about 20--30 times faster for an important application.
So what should I do?
hw wrote:
Mark Haney wrote:
<snip>
BTRFS isn't going to impact I/O any more significantly than, say, XFS.
But mdadm does, the impact is severe. I know there are ppl saying otherwise, but I´ve seen the impact myself, and I definitely don´t want it on that particular server because it would likely interfere with other services.
<snip> I haven't really been following this thread, but if your requirements are that heavy, you're past the point that you need to spring some money and buy hardware RAID cards, like LSI, er, Avago, I mean, who's bought them more recently?
mark
m.roth@5-cent.us wrote:
hw wrote:
Mark Haney wrote:
<snip>
BTRFS isn't going to impact I/O any more significantly than, say, XFS.
But mdadm does, the impact is severe. I know there are ppl saying otherwise, but I´ve seen the impact myself, and I definitely don´t want it on that particular server because it would likely interfere with other services.
<snip>
I haven't really been following this thread, but if your requirements are that heavy, you're past the point that you need to spring some money and buy hardware RAID cards, like LSI, er, Avago, I mean, who's bought them more recently?
Heavy requirements are not required for the impact of md-RAID to be noticeable.
Hardware RAID is already in place, but the SSDs are "extra" and, as I said, not suited to be used with hardware RAID.
It remains to be tested how the hardware RAID performs, which may be even better than the SSDs.
On Fri, September 8, 2017 9:48 am, hw wrote:
m.roth@5-cent.us wrote:
hw wrote:
Mark Haney wrote:
<snip>
BTRFS isn't going to impact I/O any more significantly than, say, XFS.
But mdadm does, the impact is severe. I know there are ppl saying otherwise, but I´ve seen the impact myself, and I definitely don´t want it on that particular server because it would likely interfere with other services.
<snip>
I haven't really been following this thread, but if your requirements are that heavy, you're past the point that you need to spring some money and buy hardware RAID cards, like LSI, er, Avago, I mean, who's bought them more recently?
Heavy requirements are not required for the impact of md-RAID to be noticeable.
Hardware RAID is already in place, but the SSDs are "extra" and, as I said, not suited to be used with hardware RAID.
Could someone, please, elaborate on the statement that "SSDs are not suitable for hardware RAID".
Thanks. Valeri
It remains to be tested how the hardware RAID performs, which may be even better than the SSDs.
On 8 September 2017 at 11:00, Valeri Galtsev galtsev@kicp.uchicago.edu wrote:
On Fri, September 8, 2017 9:48 am, hw wrote:
m.roth@5-cent.us wrote:
hw wrote:
Mark Haney wrote:
<snip>
BTRFS isn't going to impact I/O any more significantly than, say, XFS.
But mdadm does, the impact is severe. I know there are ppl saying otherwise, but I´ve seen the impact myself, and I definitely don´t want it on that particular server because it would likely interfere with other services.
<snip>
I haven't really been following this thread, but if your requirements are that heavy, you're past the point that you need to spring some money and buy hardware RAID cards, like LSI, er, Avago, I mean, who's bought them more recently?
Heavy requirements are not required for the impact of md-RAID to be noticeable.
Hardware RAID is already in place, but the SSDs are "extra" and, as I said, not suited to be used with hardware RAID.
Could someone, please, elaborate on the statement that "SSDs are not suitable for hardware RAID".
It will depend on the type of SSD and the type of hardware RAID. There are at least 4 different classes of SSD drives with different levels of cache, write/read performance, number of lifetime writes, etc. There are also multiple types of hardware RAID. A lot of hardware RAID will try to even out disk usage in different ways. This means 'moving' the heavily used data from slow parts to fast parts etc etc. On an SSD all these extra writes aren't needed and so if the hardware RAID doesn't know about SSD technology it will wear out the SSD quickly. Other hardware raid parts that can cause faster failures on SSD's are where it does test writes all the time to see if disks are bad etc. Again if you have gone with commodity SSD's this will wear out the drive faster than expected and boom bad disks.
That said, some hardware RAID's are supposedly made to work with SSD drive technology. They don't do those extra writes, they also assume that the disks underneath will read/write in near constant time so queueing of data is done differently. However that stuff costs extra money and not usually shipped in standard OEM hardware.
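For what it's worth, that kind of wear can at least be watched from the OS side when the controller passes SMART through; a minimal check (attribute names vary by vendor, and the device name is a placeholder):

    # vendor-specific attributes such as Wear_Leveling_Count,
    # Media_Wearout_Indicator or Total_LBAs_Written, if the drive exposes them
    smartctl -A /dev/sda | egrep -i 'wear|wearout|lbas_written'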
Thanks. Valeri
It remains to be tested how the hardware RAID performs, which may be even better than the SSDs.
On Fri, September 8, 2017 11:07 am, Stephen John Smoogen wrote:
On 8 September 2017 at 11:00, Valeri Galtsev galtsev@kicp.uchicago.edu wrote:
On Fri, September 8, 2017 9:48 am, hw wrote:
m.roth@5-cent.us wrote:
hw wrote:
Mark Haney wrote:
<snip>
BTRFS isn't going to impact I/O any more significantly than, say, XFS.
But mdadm does, the impact is severe. I know there are ppl saying otherwise, but I´ve seen the impact myself, and I definitely don´t want it on that particular server because it would likely interfere with other services.
<snip>
I haven't really been following this thread, but if your requirements are that heavy, you're past the point that you need to spring some money and buy hardware RAID cards, like LSI, er, Avago, I mean, who's bought them more recently?
Heavy requirements are not required for the impact of md-RAID to be noticeable.
Hardware RAID is already in place, but the SSDs are "extra" and, as I said, not suited to be used with hardware RAID.
Could someone, please, elaborate on the statement that "SSDs are not suitable for hardware RAID".
It will depend on the type of SSD and the type of hardware RAID. There are at least 4 different classes of SSD drives with different levels of cache, write/read performance, number of lifetime writes, etc. There are also multiple types of hardware RAID. A lot of hardware RAID will try to even out disk usage in different ways. This means 'moving' the heavily used data from slow parts to fast parts etc etc.
Wow, you learn something every day ;-) Which hardware RAIDs do this moving of data (manufacturer/model, please - believe it or not, I never heard of that ;-)? And between which "slow part" and "fast part" are the data being moved?
Thanks in advance for tutorial!
Valeri
On an SSD all these extra writes aren't needed and so if the hardware RAID doesn't know about SSD technology it will wear out the SSD quickly. Other hardware raid parts that can cause faster failures on SSD's are where it does test writes all the time to see if disks are bad etc. Again if you have gone with commodity SSD's this will wear out the drive faster than expected and boom bad disks.
That said, some hardware RAID's are supposedly made to work with SSD drive technology. They don't do those extra writes, they also assume that the disks underneath will read/write in near constant time so queueing of data is done differently. However that stuff costs extra money and not usually shipped in standard OEM hardware.
Thanks. Valeri
It remains to be tested how the hardware RAID performs, which may be even better than the SSDs.
-- Stephen J Smoogen.
On 8 September 2017 at 12:13, Valeri Galtsev galtsev@kicp.uchicago.edu wrote:
On Fri, September 8, 2017 11:07 am, Stephen John Smoogen wrote:
On 8 September 2017 at 11:00, Valeri Galtsev galtsev@kicp.uchicago.edu wrote:
On Fri, September 8, 2017 9:48 am, hw wrote:
m.roth@5-cent.us wrote:
hw wrote:
Mark Haney wrote:
<snip>
BTRFS isn't going to impact I/O any more significantly than, say, XFS.
But mdadm does, the impact is severe. I know there are ppl saying otherwise, but I´ve seen the impact myself, and I definitely don´t want it on that particular server because it would likely interfere with other services.
<snip>
I haven't really been following this thread, but if your requirements are that heavy, you're past the point that you need to spring some money and buy hardware RAID cards, like LSI, er, Avago, I mean, who's bought them more recently?
Heavy requirements are not required for the impact of md-RAID to be noticeable.
Hardware RAID is already in place, but the SSDs are "extra" and, as I said, not suited to be used with hardware RAID.
Could someone, please, elaborate on the statement that "SSDs are not suitable for hardware RAID".
It will depend on the type of SSD and the type of hardware RAID. There are at least 4 different classes of SSD drives with different levels of cache, write/read performance, number of lifetime writes, etc. There are also multiple types of hardware RAID. A lot of hardware RAID will try to even out disk usage in different ways. This means 'moving' the heavily used data from slow parts to fast parts etc etc.
Wow, you learn something every day ;-) Which hardware RAIDs do this moving of data (manufacturer/model, please - believe it or not, I never heard of that ;-)? And between which "slow part" and "fast part" are the data being moved?
Thanks in advance for tutorial!
I thought it was HP who had these, but I can't find it... which means, without references, I get an F. My apologies on that. Thank you for keeping me honest.
Valeri Galtsev wrote:
On Fri, September 8, 2017 9:48 am, hw wrote:
m.roth@5-cent.us wrote:
hw wrote:
Mark Haney wrote:
<snip>
BTRFS isn't going to impact I/O any more significantly than, say, XFS.
But mdadm does, the impact is severe. I know there are ppl saying otherwise, but I´ve seen the impact myself, and I definitely don´t want it on that particular server because it would likely interfere with other services.
<snip>
I haven't really been following this thread, but if your requirements are that heavy, you're past the point that you need to spring some money and buy hardware RAID cards, like LSI, er, Avago, I mean, who's bought them more recently?
Heavy requirements are not required for the impact of md-RAID to be noticeable.
Hardware RAID is already in place, but the SSDs are "extra" and, as I said, not suited to be used with hardware RAID.
Could someone, please, elaborate on the statement that "SSDs are not suitable for hardware RAID".
When you search for it, you´ll find that besides wearing out undesirably fast --- which apparently can be attributed mostly to less overprovisioning of the drive --- you may also experience degraded performance over time which can be worse than you would get with spinning disks, or at least not much better.
Add to that the firmware being designed for an entirely different application and having bugs, and your experiences with surprisingly incompatible hardware, and you can imagine that using an SSD not designed for hardware RAID applications with hardware RAID is a bad idea. There is a difference like night and day between "consumer hardware" and hardware you can actually use, and that is not only the price you pay for it.
On Fri, September 8, 2017 12:56 pm, hw wrote:
Valeri Galtsev wrote:
On Fri, September 8, 2017 9:48 am, hw wrote:
m.roth@5-cent.us wrote:
hw wrote:
Mark Haney wrote:
<snip>
BTRFS isn't going to impact I/O any more significantly than, say, XFS.
But mdadm does, the impact is severe. I know there are ppl saying otherwise, but I´ve seen the impact myself, and I definitely don´t want it on that particular server because it would likely interfere with other services.
<snip>
I haven't really been following this thread, but if your requirements are that heavy, you're past the point that you need to spring some money and buy hardware RAID cards, like LSI, er, Avago, I mean, who's bought them more recently?
Heavy requirements are not required for the impact of md-RAID to be noticeable.
Hardware RAID is already in place, but the SSDs are "extra" and, as I said, not suited to be used with hardware RAID.
Could someone, please, elaborate on the statement that "SSDs are not suitable for hardware RAID".
When you search for it, you´ll find that besides wearing out undesirably fast --- which apparently can be attributed mostly to less overprovisioning of the drive --- you may also experience degraded performance over time which can be worse than you would get with spinning disks, or at least not much better.
Thanks. That seems to clear the fog a little bit. I still would like to hear manufacturers/models here. My choices would be: Areca or LSI (bought out by Intel, so former LSI chipset and microcode/firmware) and as SSD Samsung Evo SATA III. Can anyone who used these in hardware RAID offer any description of bad experiences?
I am kind of shying away from "crap" hardware which in the long run is more expensive, even though it looks cheaper (Pricegrabber is your enemy - I would normally say to my users). So, I never would consider using poorly/cheaply designed hardware in some setup (e.g. hardware RAID based storage) one expects performance from. Am I still taking a chance of hitting a "bad" hardware RAID + SSD combination? Just curious where we actually stand.
Thanks again for fruitful discussion!
Valeri
Add to that the firmware being designed for an entirely different application and having bugs, and your experiences with surprisingly incompatible hardware, and you can imagine that using an SSD not designed for hardware RAID applications with hardware RAID is a bad idea. There is a difference like night and day between "consumer hardware" and hardware you can actually use, and that is not only the price you pay for it.
On 9/8/2017 12:52 PM, Valeri Galtsev wrote:
Thanks. That seems to clear the fog a little bit. I still would like to hear manufacturers/models here. My choices would be: Areca or LSI (bought out by Intel, so former LSI chipset and microcode/firmware) and as SSD Samsung Evo SATA III. Can anyone who used these in hardware RAID offer any description of bad experiences?
Does the Samsung EVO have supercaps and write-back buffer protection? If not, it is in NO way suitable for reliable use in a raid/server environment.
As far as raiding SSDs goes, the ONLY raid I'd use with them is raid1 mirroring (or if more than 2, raid10 striped mirrors). And I'd probably do it with OS-based software raid, as that's more likely to support SSD trim than a hardware raid card, plus it allows the host to monitor the SSDs via SMART, which a hardware raid card probably hides.
I'd also make sure I undercommit the size of the SSD, so if it's a 500GB SSD, I'd make absolutely sure to never have more than 300-350GB of data on it. If it's part of a stripe set, the only way to ensure this is to partition it so the raid slice is only 300-350GB.
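A minimal sketch of that approach on CentOS 7, assuming two 500GB SATA SSDs at /dev/sda and /dev/sdb (device names and sizes are placeholders):

    # partition only ~350GiB of each drive, leaving the rest unprovisioned
    parted -s /dev/sda mklabel gpt mkpart primary 1MiB 350GiB
    parted -s /dev/sdb mklabel gpt mkpart primary 1MiB 350GiB
    # plain md mirror, then XFS on top
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
    mkfs.xfs /dev/md0
    # periodic TRIM, if the util-linux fstrim.timer unit is present
    systemctl enable fstrim.timer && systemctl start fstrim.timer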
On Fri, September 8, 2017 3:06 pm, John R Pierce wrote:
On 9/8/2017 12:52 PM, Valeri Galtsev wrote:
Thanks. That seems to clear the fog a little bit. I still would like to hear manufacturers/models here. My choices would be: Areca or LSI (bought out by Intel, so former LSI chipset and microcode/firmware) and as SSD Samsung Evo SATA III. Can anyone who used these in hardware RAID offer any description of bad experiences?
Does the Samsung EVO have supercaps and write-back buffer protection? If not, it is in NO way suitable for reliable use in a raid/server environment.
With all due respect, John, this is the same as hard drive caches not being backed up power-wise in case of power loss. And hard drives all lie about write operations being completed before data actually are on the platters. So we can claim the same: hard drives are not suitable for RAID. I meant to find out from the experts in what respect they claim SSDs are unsuitable for hardware RAID as opposed to mechanical hard drives.
Am I missing something?
As far as raiding SSDs goes, the ONLY raid I'd use with them is raid1 mirroring (or if more than 2, raid10 striped mirrors). And I'd probably do it with OS-based software raid, as that's more likely to support SSD trim than a hardware raid card, plus it allows the host to monitor the SSDs via SMART, which a hardware raid card probably hides.
Good, thanks. My 3ware RAIDs, through their 3dm daemon, do warn me about SMART status: fail (meaning the drive, though still working, should according to SMART be replaced ASAP). Not certain offhand about the LSI ones (one should be able to query them through the command line client utility).
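For reference, smartctl can usually reach the physical drives behind both controller families directly; the device names and drive numbers below are examples and depend on the driver:

    # MegaRAID: physical drive with device ID 0 behind the RAID volume /dev/sdb
    smartctl -a -d megaraid,0 /dev/sdb
    # 3ware: port 0 on the first controller
    smartctl -a -d 3ware,0 /dev/twa0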
I'd also make sure I undercommit the size of the SSD, so if it's a 500GB SSD, I'd make absolutely sure to never have more than 300-350GB of data on it. If it's part of a stripe set, the only way to ensure this is to partition it so the raid slice is only 300-350GB.
Great point! And one may want to adjust the stripe size to resemble the SSD's internals, as the default is meant for hard drives, right?
Thanks, John, that was instructive!
Valeri
-- john r pierce, recycling bits in santa cruz
On 9/8/2017 2:36 PM, Valeri Galtsev wrote:
With all due respect, John, this is the same as hard drive caches not being backed up power-wise in case of power loss. And hard drives all lie about write operations being completed before data actually are on the platters. So we can claim the same: hard drives are not suitable for RAID. I meant to find out from the experts in what respect they claim SSDs are unsuitable for hardware RAID as opposed to mechanical hard drives.
Am I missing something?
The major difference is that SSDs do a LOT more write buffering, as their internal write blocks are on the order of a few hundred KB; they also extensively reorder data on the media, both for wear leveling and to minimize physical block writes, so there's really no way the host and/or controller can track what's going on.
Enterprise hard disks do NOT do hidden write buffering, it's all fully manageable via SAS or SATA commands. Desktop drives tend to lie about it to achieve better performance. I do NOT use desktop drives in raids.
...
And one may want to adjust the stripe size to resemble the SSD's internals, as the default is meant for hard drives, right?
As the SSD's physical data blocks have no visible relation to logical block numbers or CHS, it's not practical to do this. I'd use a fairly large stripe size, like 1MB, so more data can be sequentially written to the same device (even though the device will scramble it all over as it sees fit).
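On the md side, the rough equivalent of that stripe-size choice is mdadm's --chunk option (given in KiB); a hypothetical 4-drive sketch with made-up device names:

    # RAID10 with a 1MiB chunk; plain RAID1 mirrors have no chunk size at all
    mdadm --create /dev/md1 --level=10 --raid-devices=4 --chunk=1024 \
        /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1

Hardware controllers expose the same knob as a stripe- or strip-size setting in their own configuration utility.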
John R Pierce wrote:
And one may want to adjust stripe size to be resembling SSDs internals, as default is for hard drives, right?
As the SSD's physical data blocks have no visible relation to logical block numbers or CHS, it's not practical to do this. I'd use a fairly large stripe size, like 1MB, so more data can be sequentially written to the same device (even though the device will scramble it all over as it sees fit).
Isn´t it easier for SSDs to write small chunks of data at a time? The small chunk might fit into some free space more easily than a large one which needs to be spread out all over the place.
On Sep 9, 2017, at 12:47 PM, hw hw@gc-24.de wrote:
Isn´t it easier for SSDs to write small chunks of data at a time?
SSDs read/write in large-ish (256k-4M) blocks/pages. It seems to me that drive blocks, hardware RAID stripe size, file system block/cluster/extent sizes and so on should be aligned for best performance.
See: http://codecapsule.com/2014/02/12/coding-for-ssds-part-2-architecture-of-an-...
Specifically the section: NAND-flash pages and blocks
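As a sketch of the alignment idea: mkfs.xfs accepts explicit stripe geometry when it cannot detect it from the device (the values and device name below are made up):

    # su = stripe unit per device, sw = number of data devices in the stripe
    mkfs.xfs -d su=256k,sw=2 /dev/md0
    # what the kernel reports as the optimal I/O size of the device, in bytes
    cat /sys/block/md0/queue/optimal_io_size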
On 9/9/2017 9:47 AM, hw wrote:
Isn´t it easier for SSDs to write small chunks of data at a time? The small chunk might fit into some free space more easily than a large one which needs to be spread out all over the place.
the SSD collects data blocks being written and when a full flash block worth of data is collected, often 256K to several MB, it writes them all at once to a single contiguous block on the flash array, no matter what the 'address' of the blocks being written is. think of it as a 'scatter-gather' operation.
different drive brands and models use different strategies for this, and all this is completely opaque to the host OS so you really can't outguess or manage this process at the OS or disk controller level.
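The little a drive does advertise to the host can be read from sysfs, though for most SSDs these values say nothing about the real flash page or erase-block size (device name is a placeholder):

    cat /sys/block/sda/queue/physical_block_size   # usually 512 or 4096
    cat /sys/block/sda/queue/optimal_io_size       # often 0 on SSDs
    cat /sys/block/sda/queue/discard_granularity   # TRIM granularity, if any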
John R Pierce wrote:
On 9/9/2017 9:47 AM, hw wrote:
Isn´t it easier for SSDs to write small chunks of data at a time? The small chunk might fit into some free space more easily than a large one which needs to be spread out all over the place.
the SSD collects data blocks being written and when a full flash block worth of data is collected, often 256K to several MB, it writes them all at once to a single contiguous block on the flash array, no matter what the 'address' of the blocks being written is. think of it as a 'scatter-gather' operation.
different drive brands and models use different strategies for this, and all this is completely opaque to the host OS so you really can't outguess or manage this process at the OS or disk controller level.
What if the collector is full?
I understand that using small chunk sizes can reduce performance because many chunks need to be dealt with. Using large chunks would involve reading and writing larger amounts of data every time, and that also could reduce performance.
With a chunk size of 1MB, disk access might amount to huge amounts of data being read and written unnecessarily. So what might be a good chunk size for SSDs?
On 13 September 2017 at 09:25, hw hw@gc-24.de wrote:
John R Pierce wrote:
On 9/9/2017 9:47 AM, hw wrote:
Isn´t it easier for SSDs to write small chunks of data at a time? The small chunk might fit into some free space more easily than a large one which needs to be spread out all over the place.
the SSD collects data blocks being written and when a full flash block worth of data is collected, often 256K to several MB, it writes them all at once to a single contiguous block on the flash array, no matter what the 'address' of the blocks being written is. think of it as a 'scatter-gather' operation.
different drive brands and models use different strategies for this, and all this is completely opaque to the host OS so you really can't outguess or manage this process at the OS or disk controller level.
What if the collector is full?
I understand that using small chunk sizes can reduce performance because many chunks need to be dealt with. Using large chunks would involve reading and writing larger amounts of data every time, and that also could reduce performance.
With a chunk size of 1MB, disk access might amount to huge amounts of data being read and written unnecessarily. So what might be a good chunk size for SSDs?
It will depend on the type of SSD. Ones with large cache and various smarts (SAS Enterprise type) can take many different sizes. For SATA ones it depends on what the cache and write behaviour of the SSD is, and very few of them seem to be the same. The SSD also has all kinds of logic which constantly moves data around on disk to wear level, which makes it opaque. The people who have tested this usually have to burn through an SSD set to get an idea about a particular 'run' of a model, but that doesn't cover every version of the model of SATA SSD.
Stephen John Smoogen wrote:
On 13 September 2017 at 09:25, hw hw@gc-24.de wrote:
John R Pierce wrote:
On 9/9/2017 9:47 AM, hw wrote:
Isn´t it easier for SSDs to write small chunks of data at a time? The small chunk might fit into some free space more easily than a large one which needs to be spread out all over the place.
the SSD collects data blocks being written and when a full flash block worth of data is collected, often 256K to several MB, it writes them all at once to a single contiguous block on the flash array, no matter what the 'address' of the blocks being written is. think of it as a 'scatter-gather' operation.
different drive brands and models use different strategies for this, and all this is completely opaque to the host OS so you really can't outguess or manage this process at the OS or disk controller level.
What if the collector is full?
I understand that using small chunk sizes can reduce performance because many chunks need to be dealt with. Using large chunks would involve reading and writing larger amounts of data every time, and that also could reduce performance.
With a chunk size of 1MB, disk access might amount to huge amounts of data being read and written unnecessarily. So what might be a good chunk size for SSDs?
It will depend on the type of SSD. Ones with large cache and various smarts (SAS Enterprise type) can take many different sizes. For SATA ones it depends on what the cache and write behaviour of the SSD is, and very few of them seem to be the same. The SSD also has all kinds of logic which constantly moves data around on disk to wear level, which makes it opaque. The people who have tested this usually have to burn through an SSD set to get an idea about a particular 'run' of a model, but that doesn't cover every version of the model of SATA SSD.
Hm, so much for SSDs ... I can only hope they will be replaced with something better.
I have decided against putting anything onto these SSDs other than temporary data, but even for that, I would need to make an md-RAID, which I don´t want. It may work or not, and "may work" is not enough.
If the performance on the hardware RAID isn´t as good, it can not get worse than it is now, and it may be even better than with the SSDs.
I have two at home with the system installed on btrfs. I´m going to change that to md-RAID1 and xfs. Is there anything special involved in copying the system to another disk? Will 'cp -ax' do, or should I use rsync to copy xattrs etc.? Using the commonly used stripe size of 128KB is something I´d expect the SSDs to be able to handle.
On 13 September 2017 at 12:00, hw hw@gc-24.de wrote:
It will depend on the type of SSD. Ones with large cache and various smarts (SAS Enterprise type) can take many different sizes. For SATA ones it depends on what the cache and write behaviour of the SSD is, and very few of them seem to be the same. The SSD also has all kinds of logic which constantly moves data around on disk to wear level, which makes it opaque. The people who have tested this usually have to burn through an SSD set to get an idea about a particular 'run' of a model, but that doesn't cover every version of the model of SATA SSD.
Hm, so much for SSDs ... I can only hope they will be replaced with something better.
I have decided against putting anything onto these SSDs other than temporary data, but even for that, I would need to make an md-RAID, which I don´t want. It may work or not, and "may work" is not enough.
"May work" is part of any commodity hardware build. SATA hard drives do not use the same technology as 4 years ago, and you may end up with them crapping out on shorter lifetimes because they aren't built to live longer than 3 years, depending on the model. [It doesn't matter the brand... they get built with the same tech and at the same place these days.]
If the performance on the hardware RAID isn´t as good, it can not get worse than it is now, and it may be even better than with the SSDs.
I have two at home with the system installed on btrfs. I´m going to change that to md-RAID1 and xfs. Is there anything special involved in copying the system to another disk? Will 'cp -ax' do, or should I use rsync to copy xattrs etc.? Using the commonly used stripe size of 128KB is something I´d expect the SSDs to be able to handle.
Depending on what CentOS you are working with, cp -a will preserve xattrs.
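On CentOS 7 both should work; an rsync variant that explicitly keeps ACLs, xattrs and hard links (none of which are implied by -a) would look roughly like this, with placeholder mount points:

    rsync -aAXH --numeric-ids \
        --exclude='/dev/*' --exclude='/proc/*' --exclude='/sys/*' \
        --exclude='/run/*' --exclude='/tmp/*' \
        /mnt/oldroot/ /mnt/newroot/

The excludes only matter if the source is a live, running system rather than a separately mounted copy.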
Stephen John Smoogen wrote:
On 13 September 2017 at 12:00, hw hw@gc-24.de wrote:
It will depend on the type of SSD. Ones with large cache and various smarts (SAS Enterprise type) can take many different sizes. For SATA ones it depends on what the cache and write behaviour of the SSD is, and very few of them seem to be the same. The SSD also has all kinds of logic which constantly moves data around on disk to wear level, which makes it opaque. The people who have tested this usually have to burn through an SSD set to get an idea about a particular 'run' of a model, but that doesn't cover every version of the model of SATA SSD.
Hm, so much for SSDs ... I can only hope they will be replaced with something better.
I have decided against putting anything onto these SSDs other than temporary data, but even for that, I would need to make an md-RAID, which I don´t want. It may work or not, and "may work" is not enough.
"May work" is part of any commodity hardware build. SATA hard drives do not use the same technology as 4 years ago, and you may end up with them crapping out on shorter lifetimes because they aren't built to live longer than 3 years, depending on the model. [It doesn't matter the brand... they get built with the same tech and at the same place these days.]
Spinning disks don´t have the trouble with writing that SSDs have, and they are used for hardware RAID in this case. The SSDs may work with hardware RAID or may not. They may work for their purpose or they may not. The spinning disks will work, they have done so for the last two years --- with ZFS rather than hardware RAID, but WD Reds should do fine, and have done so for about a month now.
My experience is that spinning disks fail either within the first three months or when about three years old --- or virtually never because they get so old that they are being replaced by disks with greater capacity before they fail.
Nowadays, what isn´t built to fail as soon as the manufacturer can get away with? :( Even cars you pay 70k for are built to fail after only three years, same as those that cost 256k used. (The 256k one I saw at a BMW dealer, and the sales guy told me they are built to fail. Go figure :) )
If the performance on the hardware RAID isn´t as good, it can not get worse than it is now, and it may be even better than with the SSDs.
I have two at home with the system installed on btrfs. I´m going to change that to md-RAID1 and xfs. Is there anything special involved in copying the system to another disk? Will 'cp -ax' do, or should I use rsync to copy xattrs etc.? Using the commonly used stripe size of 128KB is something I´d expect the SSDs to be able to handle.
Depending on what CentOS you are working with, cp -a will preserve xattrs.
Centos 7
On 9/13/2017 9:21 AM, Stephen John Smoogen wrote:
...The SATA hard drives....[It doesn't matter the brand.. they get built with the same tech and at the same place these days.]
That's most assuredly not true. HD manufacturing is extremely competitive, there's no WAY they 'are built at the same place'. Each of the few remaining major brands of HDs has their own processes, their own factories, and keeps their technology very closely guarded.
On Wed, September 13, 2017 2:16 pm, John R Pierce wrote:
On 9/13/2017 9:21 AM, Stephen John Smoogen wrote:
...The SATA hard drives....[It doesn't matter the brand.. they get built with the same tech and at the same place these days.]
That's most assuredly not true. HD manufacturing is extremely competitive, there's no WAY they 'are built at the same place'. Each of the few remaining major brands of HDs has their own processes, their own factories, and keeps their technology very closely guarded.
Fully agree. And I even have my beloved manufacturer (whose drives have a failure rate almost an order of magnitude lower than others).
Valeri
-- john r pierce, recycling bits in santa cruz
On 13 September 2017 at 16:18, Valeri Galtsev galtsev@kicp.uchicago.edu wrote:
On Wed, September 13, 2017 2:16 pm, John R Pierce wrote:
On 9/13/2017 9:21 AM, Stephen John Smoogen wrote:
...The SATA hard drives....[It doesn't matter the brand.. they get built with the same tech and at the same place these days.]
That's most assuredly not true. HD manufacturing is extremely competitive, there's no WAY they 'are built at the same place'. Each of the few remaining major brands of HDs has their own processes, their own factories, and keeps their technology very closely guarded.
Fully agree. And I even have my beloved manufacturer (whose drives have a failure rate almost an order of magnitude lower than others).
Valeri
You are correct and I was way out on hyperbole on my comment. I was aiming for going over the consolidation of brands where low-end drives are made at the same place and rebadged to different vendors and took it to the extreme.
On Fri, Sep 8, 2017 at 2:52 PM, Valeri Galtsev galtsev@kicp.uchicago.edu wrote:
manufacturers/models here. My choices would be: Areca or LSI (bought out by Intel, so former LSI chipset and microcode/firmware) and as SSD Samsung
Intel only purchased the networking component of LSI, Axxia, from Avago. The RAID division was merged into Broadcom (post Avago merger).
Valeri Galtsev wrote:
Thanks. That seems to clear the fog a little bit. I still would like to hear manufacturers/models here. My choices would be: Areca or LSI (bought out by Intel, so former LSI chipset and microcode/firmware) and as SSD Samsung Evo SATA III. Can anyone who used these in hardware RAID offer any description of bad experiences?
It depends on your budget and on the hardware you plan to use the controller with, and on what you´re intending to do. I wouldn´t recommend using SSDs that are not explicitly rated for use with hardware RAID with hardware RAID.
Samsung seems to have firmware bugs that make the kernel/btrfs disable some features. I´d go with Intel SSDs and either use md-RAID or btrfs, but the reliability of btrfs is questionable, and md-RAID has a performance penalty.
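Whether the kernel has actually disabled anything for a given drive can be checked rather than guessed; a rough check, with a placeholder device name:

    # shows whether discard/TRIM is offered for the device at all
    lsblk --discard /dev/sda
    # libata may log a message if it blacklists queued TRIM for a known-bad firmware
    dmesg | grep -i 'queued trim'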
On 09/08/2017 09:49 AM, hw wrote:
Mark Haney wrote:
I hate top posting, but since you've got two items I want to comment on, I'll suck it up for now.
I do, too, yet sometimes it´s reasonable. I also hate it when the lines are too long :)
I'm afraid you'll have to live with it a bit longer. Sorry.
Having SSDs alone will give you great performance regardless of filesystem.
It depends, i.e. I can´t tell how these SSDs would behave if large amounts of data were written to or read from them over extended periods of time, because I haven´t tested that. That isn´t the application, anyway.
If your I/O is going to be heavy (and you've not mentioned expected traffic, so we can only go on what little we glean from your posts), then SSDs will likely start having issues sooner than a mechanical drive might. (Though, YMMV.) As I've said, we process 600 million messages a month, on primary SSDs in a VMWare cluster, with mechanical storage for older, archived user mail. Archived, may not be exactly correct, but the context should be clear.
BTRFS isn't going to impact I/O any more significantly than, say, XFS.
But mdadm does, the impact is severe. I know there are ppl saying otherwise, but I´ve seen the impact myself, and I definitely don´t want it on that particular server because it would likely interfere with other services. I don´t know if the software RAID of btrfs is better in that or not, though, but I´m seeing btrfs on SSDs being fast, and testing with the particular application has shown a speedup of factor 20--30.
I never said anything about MD RAID. I trust that about as far as I could throw it. And having had 5 surgeries on my throwing shoulder, that wouldn't be far.
That is the crucial improvement. If the hardware RAID delivers that, I´ll use that and probably remove the SSDs from the machine as it wouldn´t even make sense to put temporary data onto them because that would involve software RAID.
Again, if the idea is to have fast primary storage, there are pretty large SSDs available now and I've hardware RAIDED SSDs before without trouble, though not for any heavy lifting, it's my test servers at home. Without an idea of the expected mail traffic, this is all speculation.
It does have serious stability/data integrity issues that XFS doesn't have. There's no reason not to use SSDs for storage of immediate data and mechanical drives for archival data storage.
As for VMs we run a huge Zimbra cluster in VMs on VPC with large primary SSD volumes and even larger (and slower) secondary volumes for archived mail. It's all CentOS 6 and works very well. We process 600 million emails a month on that virtual cluster. All EXT4 inside LVM.
Do you use hardware RAID with SSDs?
We do not here where I work, but that was setup LONG before I arrived.
I can't tell you what to do, but it seems to me you're viewing your setup from a narrow SSD/BTRFS standpoint. Lots of ways to skin that cat.
That´s because I do not store data on a single disk, without redundancy, and the SSDs I have are not suitable for hardware RAID. So what else is there but either md-RAID or btrfs when I do not want to use ZFS? I also do not want to use md-RAID, hence only btrfs remains. I also like to use sub-volumes, though that isn´t a requirement (because I can use directories instead and lose the ability to make snapshots).
If the SSDs you have aren't suitable for hardware RAID, then they aren't good for production level mail spools, IMHO. I mean, you're talking like you're expecting a metric buttload of mail traffic, so it stands to reason you'll need really beefy hardware. I don't think you can do what you seem to need on budget hardware. Personally, and solely based on this thread alone, if I was building this in-house, I'd get a decent server cluster together and build a FC or iSCSI SAN to a Nimble storage array with Flash/SSD front ends and large HDDs in the back end. This solves virtually all your problems. The servers will have tiny SSD boot drives (which I prefer over booting from the SAN) and then everything else gets handled by the storage back-end.
In effect this is how our mail servers are setup here. And they are virtual.
I stay away from LVM because that just sucks. It wouldn´t even have any advantage in this case.
LVM is a joke. It's always been something I've avoided like the plague.
On 09/08/2017 08:07 AM, hw wrote:
PS:
What kind of storage solutions do people use for cyrus mail spools? Apparently you can not use remote storage, at least not NFS. That even makes it difficult to use a VM due to limitations of available disk space.
I´m reluctant to use btrfs, but there doesn´t seem to be any reasonable alternative.
hw wrote:
Mark Haney wrote:
On 09/07/2017 01:57 PM, hw wrote:
Hi,
is there anything that speaks against putting a cyrus mail spool onto a btrfs subvolume?
I might be the lone voice on this, but I refuse to use btrfs for anything, much less a mail spool. I used it in production on DB and Web servers and fought corruption issues and scrubs hanging the system more times than I can count. (This was within the last 24 months.) I was told by certain mailing lists that btrfs isn't considered production level. So, I scrapped the lot, went to xfs and haven't had a problem since.
I'm not sure why you'd want your mail spool on a filesystem that seems to hate being hammered with reads/writes. Personally, on all my mail spools, I use XFS or EXT4. Our servers here handle 600 million messages a month without trouble on those filesystems.
Just my $0.02.
Btrfs appears rather useful because the disks are SSDs, because it allows me to create subvolumes and because it handles SSDs nicely. Unfortunately, the SSDs are not suited for hardware RAID.
The only alternative I know is xfs or ext4 on mdadm and no subvolumes, and md RAID has severe performance penalties which I´m not willing to afford.
Part of the data I plan to store on these SSDs greatly benefits from the low latency, making things about 20--30 times faster for an important application.
So what should I do?
Mark Haney wrote:
On 09/08/2017 09:49 AM, hw wrote:
Mark Haney wrote:
I hate top posting, but since you've got two items I want to comment on, I'll suck it up for now.
I do, too, yet sometimes it´s reasonable. I also hate it when the lines are too long :)
I'm afraid you'll have to live with it a bit longer. Sorry.
Having SSDs alone will give you great performance regardless of filesystem.
It depends, i.e. I can´t tell how these SSDs would behave if large amounts of data were written to or read from them over extended periods of time, because I haven´t tested that. That isn´t the application, anyway.
If your I/O is going to be heavy (and you've not mentioned expected traffic, so we can only go on what little we glean from your posts), then SSDs will likely start having issues sooner than a mechanical drive might. (Though, YMMV.) As I've said, we process 600 million messages a month, on primary SSDs in a VMWare cluster, with mechanical storage for older, archived user mail. Archived, may not be exactly correct, but the context should be clear.
I/O is not heavy in that sense; that´s why I said that´s not the application. There is I/O which, as tests have shown, benefits greatly from low latency, which is where the idea to use SSDs for the relevant data came from. This I/O only involves a small amount of data and is not sustained over long periods of time. What exactly the problem is with the application being slow with spinning disks is unknown because I don´t have the sources, and the maker of the application refuses to deal with the problem entirely.
Since the data requiring low latency will occupy about 5% of the available space on the SSDs and since they are large enough to hold the mail spool for about 10 years at its current rate of growth besides that data, these SSDs could be well used to hold that mail spool.
BTRFS isn't going to impact I/O any more significantly than, say, XFS.
But mdadm does, the impact is severe. I know there are ppl saying otherwise, but I´ve seen the impact myself, and I definitely don´t want it on that particular server because it would likely interfere with other services. I don´t know if the software RAID of btrfs is better in that or not, though, but I´m seeing btrfs on SSDs being fast, and testing with the particular application has shown a speedup of factor 20--30.
I never said anything about MD RAID. I trust that about as far as I could throw it. And having had 5 surgeries on my throwing shoulder, that wouldn't be far.
How else would I create a RAID with these SSDs?
I´ve been using md-RAID for years, and it always worked fine.
That is the crucial improvement. If the hardware RAID delivers that, I´ll use that and probably remove the SSDs from the machine as it wouldn´t even make sense to put temporary data onto them because that would involve software RAID.
Again, if the idea is to have fast primary storage, there are pretty large SSDs available now and I've hardware RAIDED SSDs before without trouble, though not for any heavy lifting, it's my test servers at home. Without an idea of the expected mail traffic, this is all speculation.
The SSDs don´t need to be large, and they aren´t. They are already greatly oversized at 512GB nominal capacity.
There´s only a few hundred emails per day. There is no special requirement for their storage, but there is a lot of free space on these SSDs, and since the email traffic is mostly read-only, it won´t wear out the SSDs. It simply would make sense to put the mail spool onto these SSDs.
It does have serious stability/data integrity issues that XFS doesn't have. There's no reason not to use SSDs for storage of immediate data and mechanical drives for archival data storage.
As for VMs we run a huge Zimbra cluster in VMs on VPC with large primary SSD volumes and even larger (and slower) secondary volumes for archived mail. It's all CentOS 6 and works very well. We process 600 million emails a month on that virtual cluster. All EXT4 inside LVM.
Do you use hardware RAID with SSDs?
We do not here where I work, but that was setup LONG before I arrived.
Probably with the very expensive SSDs suited for this ...
I can't tell you what to do, but it seems to me you're viewing your setup from a narrow SSD/BTRFS standpoint. Lots of ways to skin that cat.
That´s because I do not store data on a single disk, without redundancy, and the SSDs I have are not suitable for hardware RAID. So what else is there but either md-RAID or btrfs when I do not want to use ZFS? I also do not want to use md-RAID, hence only btrfs remains. I also like to use sub-volumes, though that isn´t a requirement (because I can use directories instead and lose the ability to make snapshots).
If the SSDs you have aren't suitable for hardware RAID, then they aren't good for production level mail spools, IMHO. I mean, you're talking like you're expecting a metric buttload of mail traffic, so it stands to reason you'll need really beefy hardware. I don't think you can do what you seem to need on budget hardware. Personally, and solely based on this thread alone, if I was building this in-house, I'd get a decent server cluster together and build a FC or iSCSI SAN to a Nimble storage array with Flash/SSD front ends and large HDDs in the back end. This solves virtually all your problems. The servers will have tiny SSD boot drives (which I prefer over booting from the SAN) and then everything else gets handled by the storage back-end.
If SSDs not suitable for RAID usage aren´t suitable for production use, then basically all SSDs not suitable for RAID usage are SSDs that can´t be used for anything that requires something less volatile than a ramdisk. Experience with such SSDs contradicts this so far.
There is no "storage backend" but a file server, which, instead of 99.95% idling, is being asisgned additional tasks, and since it is difficult to put a cyrus mail spool on remote storage, the email server is one of these tasks.
In effect this is how our mail servers are setup here. And they are virtual.
You have entirely different requirements.
I stay away from LVM because that just sucks. It wouldn´t even have any advantage in this case.
LVM is a joke. It's always been something I've avoided like the plague.
I´ve also avoided it until I had an application where it would have been advantageous if it actually provided the benefits it seems supposed to provide. It turned out that it didn´t and only made things much worse, and I continue to stay away from it.
After all, you´re saying it´s a bad idea to use these SSDs, especially with btrfs. I don´t feel good about it, either, and I´ll try to avoid using them.
hw wrote:
Mark Haney wrote:
On 09/08/2017 09:49 AM, hw wrote:
Mark Haney wrote:
<snip>
Probably with the very expensive SSDs suited for this ...
<snip>
That´s because I do not store data on a single disk, without redundancy, and the SSDs I have are not suitable for hardware RAID.
<snip> That's a biggie: are these SSDs consumer grade, or enterprise grade? It was common knowledge 8-9 years ago that you *never* want consumer grade in anything that mattered, other than maybe a home PC - they wear out much sooner.
But then, you can't really use consumer grade h/ds in a server. We like the NAS-rated ones, like WD Red, which are about 1.33x the price of consumer grade, and solid... and a lot less than the enterprise-grade, which are about 3x consumer grade.
mark
m.roth@5-cent.us wrote:
hw wrote:
Mark Haney wrote:
On 09/08/2017 09:49 AM, hw wrote:
Mark Haney wrote:
<snip>
Probably with the very expensive SSDs suited for this ...
<snip>
That´s because I do not store data on a single disk, without redundancy, and the SSDs I have are not suitable for hardware RAID.
<snip>
That's a biggie: are these SSDs consumer grade, or enterprise grade?
They are not specifically enterprise rated and especially not for use with hardware RAID.
It was common knowledge 8-9 years ago that you *never* want consumer grade in anything that mattered, other than maybe a home PC - they wear out much sooner.
Similar SSDs are in use in a server for about 2 years now as cache for ZFS, and there haven´t been any issues with them.
But then, you can't really use consumer grade h/ds in a server. We like the NAS-rated ones, like WD Red, which are about 1.33x the price of consumer grade, and solid... and a lot less than the enterprise-grade, which are about 3x consumer grade.
Those are pretty worthwhile, though not the fastest. Out of 14, one has failed over the last 3 years or so, and it was still under warranty. They do serve their purpose.
On 09/08/2017 01:31 PM, hw wrote:
Mark Haney wrote:
I/O is not heavy in that sense, that´s why I said that´s not the application. There is I/O which, as tests have shown, benefits greatly from low latency, which is where the idea to use SSDs for the relevant data has arisen from. This I/O only involves a small amount of data and is not sustained over long periods of time. What exactly the problem is with the application being slow with spinning disks is unknown because I don´t have the sources, and the maker of the application refuses to deal with the problem entirely.
Since the data requiring low latency will occupy about 5% of the available space on the SSDs and since they are large enough to hold the mail spool for about 10 years at its current rate of growth besides that data, these SSDs could be well used to hold that mail spool.
See, this is the kind of information that would have made this thread far shorter. (Maybe.) The one thing that you didn't explain is whether this application is the one /using/ the mail spool or if you're adding Cyrus to that system to be a mail server.
BTRFS isn't going to impact I/O any more significantly than, say, XFS.
But mdadm does, the impact is severe. I know there are ppl saying otherwise, but I´ve seen the impact myself, and I definitely don´t want it on that particular server because it would likely interfere with other services. I don´t know if the software RAID of btrfs is better in that or not, though, but I´m seeing btrfs on SSDs being fast, and testing with the particular application has shown a speedup of factor 20--30.
I never said anything about MD RAID. I trust that about as far as I could throw it. And having had 5 surgeries on my throwing shoulder wouldn't be far.
How else would I create a RAID with these SSDs?
I´ve been using md-RAID for years, and it always worked fine.
That is the crucial improvement. If the hardware RAID delivers that, I´ll use that and probably remove the SSDs from the machine as it wouldn´t even make sense to put temporary data onto them because that would involve software RAID.
Again, if the idea is to have fast primary storage, there are pretty large SSDs available now and I've hardware RAIDED SSDs before without trouble, though not for any heavy lifting, it's my test servers at home. Without an idea of the expected mail traffic, this is all speculation.
The SSDs don´t need to be large, and they aren´t. They are already greatly oversized at 512GB nominal capacity.
There´s only a few hundred emails per day. There is no special requirement for their storage, but there is a lot of free space on these SSDs, and since the email traffic is mostly read-only, it won´t wear out the SSDs. It simply would make sense to put the mail spool onto these SSDs.
It does have serious stability/data integrity issues that XFS doesn't have. There's no reason not to use SSDs for storage of immediate data and mechanical drives for archival data storage.
As for VMs we run a huge Zimbra cluster in VMs on VPC with large primary SSD volumes and even larger (and slower) secondary volumes for archived mail. It's all CentOS 6 and works very well. We process 600 million emails a month on that virtual cluster. All EXT4 inside LVM.
Do you use hardware RAID with SSDs?
We do not here where I work, but that was setup LONG before I arrived.
Probably with the very expensive SSDs suited for this ...
Possibly, but that's somewhat irrelevant. I've taken off the shelf SSDs and hardware RAID'd them. If they work for the hell I put them through (processing weather data), they'll work for the type of service you're saying you have.
If the SSDs you have aren't suitable for hardware RAID, then they aren't good for production level mail spools, IMHO. I mean, you're talking like you're expecting a metric buttload of mail traffic, so it stands to reason you'll need really beefy hardware. I don't think you can do what you seem to need on budget hardware. Personally, and solely based on this thread alone, if I was building this in-house, I'd get a decent server cluster together and build a FC or iSCSI SAN to a Nimble storage array with Flash/SSD front ends and large HDDs in the back end. This solves virtually all your problems. The servers will have tiny SSD boot drives (which I prefer over booting from the SAN) and then everything else gets handled by the storage back-end.
If SSDs not suitable for RAID usage aren´t suitable for production use, then basically all SSDs not suitable for RAID usage are SSDs that can´t be used for anything that requires something less volatile than a ramdisk. Experience with such SSDs contradicts this so far.
Not true at all. Maybe 5 years ago SSDs were hit or miss with hardware RAID. Not anymore. It's just another drive to the system, the controllers don't know the difference between a SATA HDD and a SATA SSD. Couple that with the low volume of mail, and you should be fine for HW RAID.
There is no "storage backend" but a file server, which, instead of 99.95% idling, is being asisgned additional tasks, and since it is difficult to put a cyrus mail spool on remote storage, the email server is one of these tasks.
Again, you never mentioned the volume of mail expected, and your previous threads seemed to indicate you were expecting enough to cause issues with SSDs and BTRFS. In IT when we get a 'my printer is broken', we ask for more info since that's not descriptive enough. If this server is as asleep as you (now) make it sound, BTRFS might be fine. Though, personally, I'd avoid it regardless.
In effect this is how our mail servers are setup here. And they are virtual.
You have entirely different requirements.
I know that now. Previously, you made it sound like your mail flow would be a lot closer to 'heavy' than what you've finally described. I can only offer thoughts based on what information I'm given.
I stay away from LVM because that just sucks. It wouldn´t even have any advantage in this case.
LVM is a joke. It's always been something I've avoided like the plague.
I´ve also avoided it until I had an application where it would have been advantageous if it actually provided the benefits it seems supposed to provide. It turned out that it didn´t and only made things much worse, and I continue to stay away from it.
After all, you´re saying it´s a bad idea to use these SSDs, especially with btrfs. I don´t feel good about it, either, and I´ll try to avoid using them.
No, I'm not saying not to use your SSDs. I'm saying that BTRFS is not worth using in any server. The SSD question, prompted by you, was whether the SSDs could: 1) be hardware RAID'd 2) handle the load of mail you were expecting.
512GB SSDs are new enough to probably be HW RAID'd fine, assuming they aren't weird ones from a third party no one has really heard of. I know because my last company bought some inexpensive (I call them knockoffs) third party SSDs that were utter crap from the moment an OS was installed on them. If yours are from Seagate, WD, or another big-name drive maker, I would be surprised if they choked on a hardware RAID card. A setup like yours doesn't appear to need 'Enterprise' level hardware; SMB hardware would likely work for you just as well.
Just not with BTRFS. On any drive. Ever.
Mark Haney wrote:
On 09/08/2017 01:31 PM, hw wrote:
Mark Haney wrote:
<snip>
Probably with the very expensive SSDs suited for this ...
Possibly, but that's somewhat irrelevant. I've taken off the shelf SSDs and hardware RAID'd them. If they work for the hell I put them through (processing weather data), they'll work for the type of service you're saying you have.
<snip>
Not true at all. Maybe 5 years ago SSDs were hit or miss with hardware RAID. Not anymore. It's just another drive to the system, the controllers don't know the difference between a SATA HDD and a SATA SSD. Couple that with the low volume of mail, and you should be fine for HW RAID.
<snip> Actually, with the usage you're talking about, I'm surprised you're using SATA and not SAS.
mark
Mark Haney wrote:
On 09/08/2017 01:31 PM, hw wrote:
Mark Haney wrote:
I/O is not heavy in that sense, that´s why I said that´s not the application. There is I/O which, as tests have shown, benefits greatly from low latency, which is where the idea to use SSDs for the relevant data has arisen from. This I/O only involves a small amount of data and is not sustained over long periods of time. What exactly the problem is with the application being slow with spinning disks is unknown because I don´t have the sources, and the maker of the application refuses to deal with the problem entirely.
Since the data requiring low latency will occupy about 5% of the available space on the SSDs and since they are large enough to hold the mail spool for about 10 years at its current rate of growth besides that data, these SSDs could be well used to hold that mail spool.
See, this is the kind of information that would have made this thread far shorter. (Maybe.) The one thing that you didn't explain is whether this application is the one /using/ the mail spool or if you're adding Cyrus to that system to be a mail server.
It was a simple question to begin with; I only wanted to know if something speaks against using btrfs for a cyrus mail spool. There are things that speak against doing that with NFS, so there might be things with btrfs.
The application doesn´t use the mail spool at all, it has its own dataset.
Do you use hardware RAID with SSDs?
We do not here where I work, but that was setup LONG before I arrived.
Probably with the very expensive SSDs suited for this ...
Possibly, but that's somewhat irrelevant. I've taken off the shelf SSDs and hardware RAID'd them. If they work for the hell I put them through (processing weather data), they'll work for the type of service you're saying you have.
Well, I can´t very well test them with the mail spool, so I´ve been going with what I´ve been reading about SSDs with hardware RAID.
If the SSDs you have aren't suitable for hardware RAID, then they aren't good for production level mail spools, IMHO. I mean, you're talking like you're expecting a metric buttload of mail traffic, so it stands to reason you'll need really beefy hardware. I don't think you can do what you seem to need on budget hardware. Personally, and solely based on this thread alone, if I was building this in-house, I'd get a decent server cluster together and build a FC or iSCSI SAN to a Nimble storage array with Flash/SSD front ends and large HDDs in the back end. This solves virtually all your problems. The servers will have tiny SSD boot drives (which I prefer over booting from the SAN) and then everything else gets handled by the storage back-end.
If SSDs not suitable for RAID usage aren´t suitable for production use, then basically all SSDs not suitable for RAID usage are SSDs that can´t be used for anything that requires something less volatile than a ramdisk. Experience with such SSDs contradicts this so far.
Not true at all. Maybe 5 years ago SSDs were hit or miss with hardware RAID. Not anymore. It's just another drive to the system, the controllers don't know the difference between a SATA HDD and a SATA SSD. Couple that with the low volume of mail, and you should be fine for HW RAID.
I´d need another controller to do hardware RAID, which would require another slot on board, and IIRC, there isn´t a suitable one free anymore. Or I´d have to replace two of the other disks with the SSDs, and that won´t be a good thing to do.
There is no "storage backend" but a file server, which, instead of 99.95% idling, is being asisgned additional tasks, and since it is difficult to put a cyrus mail spool on remote storage, the email server is one of these tasks.
Again, you never mentioned the volume of mail expected, and your previous threads seemed to indicate you were expecting enough to cause issues with SSDs and BTRFS. In IT when we get a 'my printer is broken', we ask for more info since that's not descriptive enough. If this server is as asleep as you (now) make it sound, BTRFS might be fine. Though, personally, I'd avoid it regardless.
Of course --- the issue, or question, is btrfs, not the SSDs.
After all, you´re saying it´s a bad idea to use these SSDs, especially with btrfs. I don´t feel good about it, either, and I´ll try to avoid using them.
No, I'm not saying not to use your SSDs. I'm saying that BTRFS is not worth using in any server. The SSD question, prompted by you, was whether the SSDs could:
- be hardware RAID'd
- handle the load of mail you were expecting.
Yes, I´m the one saying not to use them. My question was if there´s anything that speaks against using btrfs for a cyrus mail spool. It wasn´t about SSDs.
Hardware RAID for the SSDs is not really an option because the ports of the controllers are used otherwise, and it is unknown how well these SSDs would work with them. Otherwise I wouldn´t consider using btrfs.
512GB SSDs are new enough to probably be HW RAID'd fine, assuming they aren't weird ones from a third party no one has really heard of. I know because my last company bought some inexpensive (I call them knockoffs) third party SSDs that were utter crap from the moment an OS was installed on them. If yours are from Seagate, WD, or another big-name drive maker, I would be surprised if they choked on a hardware RAID card. A setup like yours doesn't appear to need 'Enterprise' level hardware; SMB hardware would likely work for you just as well.
Just not with BTRFS. On any drive. Ever.
Well, that´s a problem because when you don´t want md-RAID and can´t do hardware RAID, the only other option is ZFS, which I don´t want either. That leaves me with not using the SSDs at all.
Am 09.09.2017 um 19:22 schrieb hw hw@gc-24.de:
Mark Haney wrote:
On 09/08/2017 01:31 PM, hw wrote:
Mark Haney wrote:
I/O is not heavy in that sense, that´s why I said that´s not the application. There is I/O which, as tests have shown, benefits greatly from low latency, which is where the idea to use SSDs for the relevant data has arisen from. This I/O only involves a small amount of data and is not sustained over long periods of time. What exactly the problem is with the application being slow with spinning disks is unknown because I don´t have the sources, and the maker of the application refuses to deal with the problem entirely.
Since the data requiring low latency will occupy about 5% of the available space on the SSDs and since they are large enough to hold the mail spool for about 10 years at its current rate of growth besides that data, these SSDs could be well used to hold that mail spool.
See, this is the kind of information that would have made this thread far shorter. (Maybe.) The one thing that you didn't explain is whether this application is the one /using/ the mail spool or if you're adding Cyrus to that system to be a mail server.
It was a simple question to begin with; I only wanted to know if something speaks against using btrfs for a cyrus mail spool. There are things that speak against doing that with NFS, so there might be things with btrfs.
The application doesn´t use the mail spool at all, it has its own dataset.
Do you use hardware RAID with SSDs?
We do not here where I work, but that was setup LONG before I arrived.
Probably with the very expensive SSDs suited for this ...
Possibly, but that's somewhat irrelevant. I've taken off the shelf SSDs and hardware RAID'd them. If they work for the hell I put them through (processing weather data), they'll work for the type of service you're saying you have.
Well, I can´t very well test them with the mail spool, so I´ve been going with what I´ve been reading about SSDs with hardware RAID.
It really depends on the RAID-controller and the SSDs. Every RAID-controller has a maximum number of IOPS it can process.
Also, as pointed out, consumer SSDs have various deficiencies that make them unsuitable for enterprise use:
https://blogs.technet.microsoft.com/filecab/2016/11/18/dont-do-it-consumer-ssd/
Enterprise SSDs also fail much more predictably. You basically get an SLA with them about the DWPD/TBW data.
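For reference, DWPD and TBW describe the same endurance budget in different units; a minimal sketch (the capacity, DWPD and warranty values are made-up examples, not specs of any particular drive):

  # TBW is roughly DWPD * capacity * warranty period.
  capacity_tb = 0.512       # e.g. a 512 GB drive
  dwpd = 1.0                # assumed rating: one full drive write per day
  warranty_years = 5
  tbw = dwpd * capacity_tb * 365 * warranty_years
  print("implied endurance: ~%.0f TBW" % tbw)   # ~934 TB written for these inputs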
For small amounts of highly volatile data, I recommend looking into Optane SSDs.
Well, that´s a problem because when you don´t want md-RAID and can´t do hardware RAID, the only other option is ZFS, which I don´t want either. That leaves me with not using the SSDs at all.
As for BTRFS: RedHat dumped it. So, it’s a SuSE/Ubuntu thing right now. Make of that what you want ;-)
Personally, I’d prefer to use ZFS for SSDs. No Hardware-RAID for sure. Not sure if I’d use it on anything else but FreeBSD (even though a Linux port is available and code-wise it’s more or less the same).
From personal experience, it’s better to even ditch the non-RAID HBA and just go with NVMe SSDs for the 2.5" drive slots (a.k.a. 8639, a.k.a. U.2 form factor). If you have spare PCIe slots, you can also go for HHHL PCIe NVMe cards - but of course, you’d have to RAID them.
Mark Haney wrote:
On 09/08/2017 09:49 AM, hw wrote:
Mark Haney wrote:
<snip>
It depends, i. e. I can´t tell how these SSDs would behave if large amounts of data would be written and/or read to/from them over extended periods of time because I haven´t tested that. That isn´t the application, anyway.
If your I/O is going to be heavy (and you've not mentioned expected traffic, so we can only go on what little we glean from your posts), then SSDs will likely start having issues sooner than a mechanical drive might. (Though, YMMV.) As I've said, we process 600 million messages a month, on primary SSDs in a VMWare cluster, with mechanical storage for older, archived user mail. Archived, may not be exactly correct, but the context should be clear.
One thing to note, which I'm aware of because I was recently spec'ing out a Dell server: Dell, at least, offers two kinds of SSDs, one for heavy write, I think it was, and one for equal r/w. You might dig into that.
But mdadm does, the impact is severe. I know there are ppl saying otherwise, but I´ve seen the impact myself, and I definitely don´t want it on that particular server because it would likely interfere with other services. I don´t know if the software RAID of btrfs is better in that or not, though, but I´m seeing btrfs on SSDs being fast, and testing with the particular application has shown a speedup of factor 20--30.
Odd, we've never seen anything like that. Of course, we're not handling the kind of mail you are... but serious scientific computing hits storage hard, also.
I never said anything about MD RAID. I trust that about as far as I could throw it. And having had 5 surgeries on my throwing shoulder wouldn't be far.
Why? We have it all over, and have never seen a problem with it. Nor have I, personally, as I have a RAID 1 at home. <snip>
mark
m.roth@5-cent.us wrote:
Mark Haney wrote:
On 09/08/2017 09:49 AM, hw wrote:
Mark Haney wrote:
<snip>
It depends, i. e. I can´t tell how these SSDs would behave if large amounts of data would be written and/or read to/from them over extended periods of time because I haven´t tested that. That isn´t the application, anyway.
If your I/O is going to be heavy (and you've not mentioned expected traffic, so we can only go on what little we glean from your posts), then SSDs will likely start having issues sooner than a mechanical drive might. (Though, YMMV.) As I've said, we process 600 million messages a month, on primary SSDs in a VMWare cluster, with mechanical storage for older, archived user mail. Archived, may not be exactly correct, but the context should be clear.
One thing to note, which I'm aware of because I was recently spec'ing out a Dell server: Dell, at least, offers two kinds of SSDs, one for heavy write, I think it was, and one for equal r/w. You might dig into that.
But mdadm does, the impact is severe. I know there are ppl saying otherwise, but I´ve seen the impact myself, and I definitely don´t want it on that particular server because it would likely interfere with other services. I don´t know if the software RAID of btrfs is better in that or not, though, but I´m seeing btrfs on SSDs being fast, and testing with the particular application has shown a speedup of factor 20--30.
Odd, we've never seen anything like that. Of course, we're not handling the kind of mail you are... but serious scientific computing hits storage hard, also.
I never said anything about MD RAID. I trust that about as far as I could throw it. And having had 5 surgeries on my throwing shoulder wouldn't be far.
Why? We have it all over, and have never seen a problem with it. Nor have I, personally, as I have a RAID 1 at home.
<snip>
Make a test and replace a software RAID5 with a hardware RAID5. Even with only 4 disks, you will see an overall performance gain. I´m guessing that the SATA controllers they put onto the mainboards are not designed to handle all the data --- which gets multiplied to all the disks --- and that the PCI bus might get clogged. There´s also the CPU being burdened with the calculations required for the RAID, and that may not be displayed by tools like top, so you can be fooled easily.
Graphics cards have acceleration in hardware for a reason. When was the last time you tried to do software rendering, and what frame rates did you get? :) Offloading the I/O to a dedicated controller gives you room for the things you actually want to do, similar to a graphics card.
On 09/08/2017 11:06 AM, hw wrote:
Make a test and replace a software RAID5 with a hardware RAID5. Even with only 4 disks, you will see an overall performance gain. I´m guessing that the SATA controllers they put onto the mainboards are not designed to handle all the data --- which gets multiplied to all the disks --- and that the PCI bus might get clogged. There´s also the CPU being burdened with the calculations required for the RAID, and that may not be displayed by tools like top, so you can be fooled easily.
That sounds like a whole lot of guesswork, which I'd suggest should inspire slightly less confidence than you are showing in it.
RAID parity calculations are accounted under a process named md<number>_raid<level>. You will see time consumed by that code under all of the normal process accounting tools, including total time under "ps" and current time under "top". Typically, your CPU is vastly faster than the cheap processors on hardware RAID controllers, and the advantage will go to software RAID over hardware. If your system is CPU bound, however, and you need that extra fraction of a percent of CPU cycles that go to calculating parity, hardware might offer an advantage.
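If anyone wants to verify that on their own box, here is a small sketch that lists the md RAID kernel threads and their accumulated CPU time (Linux-specific, reads /proc directly; thread names such as md0_raid5 depend on your arrays):

  # List md RAID kernel threads and the CPU time they have used so far.
  # Field 15 of /proc/<pid>/stat is stime in clock ticks; kernel threads
  # accumulate their work there.
  import os

  hz = os.sysconf("SC_CLK_TCK")
  for pid in filter(str.isdigit, os.listdir("/proc")):
      try:
          with open("/proc/%s/comm" % pid) as f:
              comm = f.read().strip()
          if "_raid" not in comm:        # e.g. md0_raid5, md127_raid1
              continue
          with open("/proc/%s/stat" % pid) as f:
              stime_ticks = int(f.read().split()[14])
          print("%-16s %10.1f s CPU" % (comm, stime_ticks / hz))
      except (OSError, ValueError, IndexError):
          continue                       # process vanished or unreadable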
The last system I purchased had its storage controller on a PCIe 3.0 x16 port, so its throughput to the card should be around 16GB/s. Yours might be different. I should be able to put roughly 20 disks on that card before the PCIe bus is the bottleneck. If this were a RAID6 volume, a hardware RAID card would be able to support sustained writes to 22 drives vs 20 for md RAID. I don't see that as a compelling advantage, but it is potentially an advantage for a hypothetical hardware RAID card.
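The arithmetic behind an estimate like that, roughly (the per-lane and per-drive throughput figures here are approximations I'm assuming, not measurements):

  # How many drives a PCIe 3.0 x16 storage controller can feed before the
  # slot itself becomes the bottleneck (all figures are rough assumptions).
  lane_gb_s = 0.985             # usable throughput per PCIe 3.0 lane
  lanes = 16
  slot_gb_s = lane_gb_s * lanes # ~15.8 GB/s for the whole slot
  drive_gb_s = 0.8              # assumed sustained write rate per drive
  drives = slot_gb_s / drive_gb_s
  print("slot: ~%.1f GB/s -> ~%.0f drives at %.0f MB/s each"
        % (slot_gb_s, drives, drive_gb_s * 1000))
  # For RAID6, a hardware card computes parity on board, so the slot only
  # carries the data portion: N drives behind the card need about (N-2)
  # drives' worth of slot bandwidth, which is where "22 vs 20" comes from.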
When you are testing your 4 disk RAID5 array, microbenchmarks like bonnie++ will show you a very significant advantage toward the hardware RAID as very small writes are added to the battery-backed cache on the card and the OS considers them complete. However, on many cards, if the system writes data to the card faster than the card writes to disks, the cache will fill up, and at that point, the system performance can suddenly and unexpectedly plummet. I've run a few workloads where that happened, and we had to replace the system entirely, and use software RAID instead. Software RAID's performance tends to be far more predictable as the workload increases.
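The point at which that cliff hits can be estimated the same way (cache size and throughput numbers below are assumptions for illustration):

  # How long a battery-backed write cache can absorb a burst before writes
  # fall back to the speed of the disks behind the controller.
  cache_gb = 2.0          # assumed controller cache size
  ingest_mb_s = 900.0     # assumed rate at which the host pushes writes
  drain_mb_s = 400.0      # assumed rate at which the array actually drains
  seconds = cache_gb * 1000 / (ingest_mb_s - drain_mb_s)
  print("cache full after ~%.0f s; after that, writes run at ~%.0f MB/s"
        % (seconds, drain_mb_s))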
Outside of microbenchmarks like bonnie++, software RAID often offers much better performance than hardware RAID controllers. Having tested systems extensively for many years, my advice is this: there is no simple answer to the question of whether software or hardware RAID is better. You need to test your specific application on your specific hardware to determine what configuration will work best. There are some workloads where a hardware controller will offer better write performance, since a battery backed write-cache can complete very small random writes very quickly. If that is not the specific behavior of your application, software RAID will very often offer you better performance, as well as other advantages. On the other hand, software RAID absolutely requires a monitored UPS and tested auto-shutdown in order to be remotely reliable, just as a hardware RAID controller requires a battery backed write-cache, and monitoring of the battery state.
Gordon Messmer wrote:
On 09/08/2017 11:06 AM, hw wrote:
Make a test and replace a software RAID5 with a hardware RAID5. Even with only 4 disks, you will see an overall performance gain. I´m guessing that the SATA controllers they put onto the mainboards are not designed to handle all the data --- which gets multiplied to all the disks --- and that the PCI bus might get clogged. There´s also the CPU being burdened with the calculations required for the RAID, and that may not be displayed by tools like top, so you can be fooled easily.
That sounds like a whole lot of guesswork, which I'd suggest should inspire slightly less confidence than you are showing in it.
It´s called "experience". I haven´t tested a great number of machines extensively to experience the difference between software and hardware on them, and I agree with what you´re saying. It´s all theory until it has been suitably tested, hence my recommendation to test it.
RAID parity calculations are accounted under a process named md<number>_raid<level>. You will see time consumed by that code under all of the normal process accounting tools, including total time under "ps" and current time under "top". Typically, your CPU is vastly faster than the cheap processors on hardware RAID controllers, and the advantage will go to software RAID over hardware. If your system is CPU bound, however, and you need that extra fraction of a percent of CPU cycles that go to calculating parity, hardware might offer an advantage.
The last system I purchased had its storage controller on a PCIe 3.0 x16 port, so its throughput to the card should be around 16GB/s. Yours might be different. I should be able to put roughly 20 disks on that card before the PCIe bus is the bottleneck. If this were a RAID6 volume, a hardware RAID card would be able to support sustained writes to 22 drives vs 20 for md RAID. I don't see that as a compelling advantage, but it is potentially an advantage for a hypothetical hardware RAID card.
When you are testing your 4 disk RAID5 array, microbenchmarks like bonnie++ will show you a very significant advantage toward the hardware RAID as very small writes are added to the battery-backed cache on the card and the OS considers them complete. However, on many cards, if the system writes data to the card faster than the card writes to disks, the cache will fill up, and at that point, the system performance can suddenly and unexpectedly plummet. I've run a few workloads where that happened, and we had to replace the system entirely, and use software RAID instead. Software RAID's performance tends to be far more predictable as the workload increases.
Outside of microbenchmarks like bonnie++, software RAID often offers much better performance than hardware RAID controllers. Having tested systems extensively for many years, my advice is this: there is no simple answer to the question of whether software or hardware RAID is better. You need to test your specific application on your specific hardware to determine what configuration will work best. There are some workloads where a hardware controller will offer better write performance, since a battery backed write-cache can complete very small random writes very quickly. If that is not the specific behavior of your application, software RAID will very often offer you better performance, as well as other advantages. On the other hand, software RAID absolutely requires a monitored UPS and tested auto-shutdown in order to be remotely reliable, just as a hardware RAID controller requires a battery backed write-cache, and monitoring of the battery state.
I think it depends on who you ask. Facebook and Netflix are using it extensively in production:
https://www.linux.com/news/learn/intro-to-linux/how-facebook-uses-linux-and-...
Though they have the in-house kernel engineering resources to troubleshoot problems. When I see quotes like this [1] on the product's WIKI:
"The parity RAID code has multiple serious data-loss bugs in it. It should not be used for anything other than testing purposes."
I'm reluctant to store anything of value on it. Have you considered using ZoL? I've been using it for quite some time and haven't lost data.
- Ryan http://prefetch.net
[1] https://btrfs.wiki.kernel.org/index.php/RAID56
On Thu, Sep 7, 2017 at 2:12 PM, Mark Haney mark.haney@neonova.net wrote:
On 09/07/2017 01:57 PM, hw wrote:
Hi,
is there anything that speaks against putting a cyrus mail spool onto a btrfs subvolume?
I might be the lone voice on this, but I refuse to use btrfs for anything, much less a mail spool. I used it in production on DB and Web servers and fought corruption issues and scrubs hanging the system more times than I can count. (This was within the last 24 months.) I was told by certain mailing lists, that btrfs isn't considered production level. So, I scrapped the lot, went to xfs and haven't had a problem since.
I'm not sure why you'd want your mail spool on a filesystem and seems to hate being hammered with reads/writes. Personally, on all my mail spools, I use XFS or EXT4. OUr servers here handle 600million messages a month without trouble on those filesystems.
Just my $0.02.
Mark Haney Network Engineer at NeoNova 919-460-3330 option 1 mark.haney@neonova.net www.neonova.net
Matty wrote:
I think it depends on who you ask. Facebook and Netflix are using it extensively in production:
https://www.linux.com/news/learn/intro-to-linux/how-facebook-uses-linux-and-...
Though they have the in-house kernel engineering resources to troubleshoot problems. When I see quotes like this [1] on the product's WIKI:
"The parity RAID code has multiple serious data-loss bugs in it. It should not be used for anything other than testing purposes."
It´s RAID1, not 5/6. It´s only 2 SSDs.
I do not /need/ to put the mail spool there, but it makes sense because the data that benefits from the low latency fills about only 5% of them, and the spool is mostly read, resulting in not so much wear of the SSDs.
I can probably do a test with that data on the hardware RAID, and if performance is comparable, I rather put it there than on the SSDs.
I'm reluctant to store anything of value on it. Have you considered using ZoL? I've been using it for quite some time and haven't lost data.
Yes, and I´m moving away from ZFS because it remains alien, and the performance is poor. ZFS wasn´t designed with performance in mind, and that shows.
It is amazing that SSDs with Linux are still so pointless and that there is no file system available actually suited for production use providing features ZFS and btrfs are valued for. It´s even frustrating that disk access still continues to defeat performance so much.
Maybe it´s crazy wanting to put data onto SSDs with btrfs because the hardware RAID is also RAID1, for performance and better resistance against failures than RAID5 has. I guess I really shouldn´t do that.
Now I´m looking forward to the test with the hardware RAID. A RAID1 of 8 disks may yield even better performance than 2 SSDs in software RAID1 with btrfs.
On 09/07/2017 12:57 PM, hw wrote:
Hi,
is there anything that speaks against putting a cyrus mail spool onto a btrfs subvolume?
This is what Red Hat says about btrfs:
The Btrfs file system has been in Technology Preview state since the initial release of Red Hat Enterprise Linux 6. Red Hat will not be moving Btrfs to a fully supported feature and it will be removed in a future major release of Red Hat Enterprise Linux.
The Btrfs file system did receive numerous updates from the upstream in Red Hat Enterprise Linux 7.4 and will remain available in the Red Hat Enterprise Linux 7 series. However, this is the last planned update to this feature.
Johnny Hughes wrote:
On 09/07/2017 12:57 PM, hw wrote:
Hi,
is there anything that speaks against putting a cyrus mail spool onto a btrfs subvolume?
This is what Red Hat says about btrfs:
The Btrfs file system has been in Technology Preview state since the initial release of Red Hat Enterprise Linux 6. Red Hat will not be moving Btrfs to a fully supported feature and it will be removed in a future major release of Red Hat Enterprise Linux.
The Btrfs file system did receive numerous updates from the upstream in Red Hat Enterprise Linux 7.4 and will remain available in the Red Hat Enterprise Linux 7 series. However, this is the last planned update to this feature.
That surely speaks against it.
However, it´s hard to believe. They must be expecting btrfs never to become useable.