hi,
I am planning to replace my old CentOS 6 mail server soon. Most details are fairly obvious and do not need to change, but the old system ran on spinning disks, which is certainly not the best option for today's mail servers.
With spinning disks, HW-RAID6 was the way to go to increase reliability and speed. Today I get the feeling that traditional RAID is not the best option for SSDs. I keep reading that all RAID members in an SSD array age synchronously, so that a massive failure of more than one disk is more likely than with HDDs. There are many other concerns as well, such as the extra write load compared to non-RAID systems.
Is there any consensus on what disk layout should be used these days?
I have been looking for some kind of master-slave setup in which one or more SSDs take all reads and writes, while a slave HDD runs in parallel as a backup, much like in a RAID1 array. Is there any such system?
Any thoughts?
best regards Michael Schumacher
On Wed, 16 Sep 2020 at 12:12, Michael Schumacher <michael.schumacher@pamas.de> wrote:
I have been looking for some kind of master-slave system, where the (one or many) SSD is taking all writes and reads, but the slave HDD runs in parallel as a backup system like in a RAID1 system. Is there any such system?
I don't think so, because the drives would always be out of sync, and on a restart it would be hard to know whether a drive is out of sync for a good reason or a bad one. For most of the SSD RAIDs I have seen, people just make sure to buy disks that are spec'd for more writes, or with similar 'smarter' enterprise trim. I have also read about the synchronous-failure problem, but I think it may be a theory-versus-reality issue. In theory they should all fail at once; in reality, at least for the arrays I have used for three years, they seem to fail at different times. That said, I only have 3 systems over 3 years with SSD drives running RAID6, so I only have anecdata rather than data.
On 2020-09-16 11:26, Stephen John Smoogen wrote:
I fully agree that the synchronous failure of SSDs in RAID is made up, or at least grossly overrated. SSD failure _probability_ increases with the number of write operations (to the same area), but failure still has a stochastic nature. If an SSD is spec'd for N writes, that doesn't mean the SSD will fail on write N+1. It only means that after N writes the failure probability is still below some acceptable value, which, however, is much higher than that of an unused SSD.
That said, the probability that a single SSD fails after a long run is some small value, say q. The failure of another SSD is an event independent of the first failure (even though both probabilities q increase with the number of writes), hence the failure probabilities are:
one SSD failed: q
two SSDs failed: (q)^2
three SSDs failed: (q)^3
Thus multiple failures (within some period of time, say 1 day or 1 week) are still far less probable than a single failure. The following numbers have nothing to do with the failure probability of any real device; they are just an illustration:
if q = 10 ^ (-10) (ten to the minus 10th power), then
(q)^2 = 10 ^ (-20)
(q)^3 = 10 ^ (-30)
My apologies for stating trivial things; IMHO they just give a feeling for what to take into consideration and what can safely be ignored.
And no, I don't intend to start a flame war over views on statistics, hardware vs. software RAID, or RAID vs. ZFS. Just think it over and draw your own conclusions.
Valeri
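An illustrative aside, not part of the original message: for an array of n members that fail independently, each with probability q within a given window, the number of failures is binomial, so a k-disk loss picks up a combinatorial factor but stays on the order of q^k:

P(exactly k fail) = C(n,k) * q^k * (1-q)^(n-k), which is approximately C(n,k) * q^k for small q

For example, with n = 8 and q = 10 ^ (-10), a triple failure has probability of about C(8,3) * q^3 = 56 * 10 ^ (-30), still vanishingly small next to the single-failure probability of roughly 8 * 10 ^ (-10).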
Hi Michael,
With SSDs, no matter what storage technology is used, you pay your money and you take your choice.
The more expensive SSDs have higher I/O rates, higher data bandwidth and better durability.
I would go for NVMe, as this gives a higher data rate with PCIe 3.0, and PCIe 4.0 drives (twice the data rate) are just coming onto the market.
I believe that traditional RAID 5 and 6 are not required for SSDs.
I have configured all my customers' SSD subsystems as RAID 1 (mirror), which reduces overhead.
Cost determines whether the above is acceptable.
Also, will you use hardware or software RAID 1?
There are many other questions, but the above is a start.
Regards,
Mark Woolfson
MW Consultancy Ltd
Leeds LS18 4LY
United Kingdom
Tel: +44 113 259 1204
Mob: +44 786 065 2778
Hi Michael,
RAID 1 is not uncommon with SSDs (be they SATA/SAS/NVMe). RAID 5/6 wear SSD drives more, so they are generally best avoided.
You really need to monitor your SSDs' health to help avoid failures. And obviously, always have your backups...
-yoctozepto
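As an illustration of the health-monitoring advice above (a minimal sketch, not from the original thread; the device names and the presence of smartmontools/nvme-cli are assumptions):

# SATA SSD: overall health plus vendor wear attributes (attribute names vary by vendor)
smartctl -H /dev/sda
smartctl -A /dev/sda | grep -Ei 'wear|percent|lifetime'

# NVMe SSD: the NVMe health log exposes wear directly
smartctl -a /dev/nvme0 | grep -Ei 'percentage used|available spare|media and data'
# or, with nvme-cli installed:
# nvme smart-log /dev/nvme0

Running checks like these from cron or a monitoring agent and alerting on rising wear or error counters catches most drives before they fail outright.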
On 16/09/2020 17:11, Michael Schumacher wrote:
I have been looking for some kind of master-slave system, where the (one or many) SSD is taking all writes and reads, but the slave HDD runs in parallel as a backup system like in a RAID1 system. Is there any such system?
You can achieve this with a hybrid RAID1 by mixing SSDs and HDDs and marking the HDD members as --write-mostly, meaning most of the reads will come from the faster SSDs, retaining much of the speed advantage, while you still have the redundancy of both SSDs and HDDs in the array.
Read performance is not far off the native read performance of the SSD, and writes are mostly cached / happen in the background, so they are not so noticeable on a mail server anyway.
I kind of stumbled across this setup by accident when I added an NVMe SSD to an existing RAID1 array consisting of 2 HDDs.
# cat /proc/mdstat
Personalities : [raid1]
md127 : active raid1 sda1[2](W) sdb1[4](W) nvme0n1p1[3]
      485495616 blocks super 1.0 [3/3] [UUU]
      bitmap: 3/4 pages [12KB], 65536KB chunk
See how we have 3 devices in the above RAID1 array: 2 x HDDs, marked with a (W) indicating they are in --write-mostly mode, and one NVMe SSD device. I just went for 3 devices in the array because it started life as a 2 x HDD array and I added the third (SSD) device, but you can mix and match to suit your needs.
See the following article which may be helpful or search 'mdadm write-mostly' for more info.
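For reference, a minimal sketch of setting this up from scratch (hypothetical device and array names, not taken from the article above):

# Create a three-member RAID1 with the NVMe SSD as the preferred read
# device and the two HDDs flagged write-mostly:
mdadm --create /dev/md0 --level=1 --raid-devices=3 \
      /dev/nvme0n1p1 --write-mostly /dev/sda1 /dev/sdb1

# Or add a write-mostly HDD to an existing mirror and grow it:
mdadm /dev/md0 --add --write-mostly /dev/sdc1
mdadm --grow /dev/md0 --raid-devices=3

# On reasonably recent kernels the flag can also be toggled on a live
# member through sysfs:
echo writemostly > /sys/block/md0/md/dev-sda1/state     # set
echo -writemostly > /sys/block/md0/md/dev-sda1/state    # clear

The (W) markers in /proc/mdstat, as in the output above, confirm which members carry the flag.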
Hello Phil,
Wednesday, September 16, 2020, 7:40:24 PM, you wrote:
PP> You can achieve this with a hybrid RAID1 by mixing SSDs and HDDs, and
PP> marking the HDD members as --write-mostly, meaning most of the reads
PP> will come from the faster SSDs retaining much of the speed advantage,
PP> but you have the redundancy of both SSDs and HDDs in the array.
PP> Read performance is not far off native write performance of the SSD, and
PP> writes mostly cached / happen in the background so are not so noticeable
PP> on a mail server anyway.
Very interesting. Do you or anybody else have experience with this setup? Any test results to compare? I will do some testing if nobody can come up with comparisons.
best regards --- Michael Schumacher
On 17/09/2020 13:35, Michael Schumacher wrote:
Very interesting. Do you or anybody else have experience with this setup? Any test results to compare? I will do some testing if nobody can come up with comparisons.
Here are a few performance stats from my setup, made with fio.
The first set of figures is for a RAID1 array of 2 x WD Black 1TB drives. The second set is for a RAID1 array with the same 2 WD Black 1TB drives plus a WD Blue NVMe (PCIe x2) added into the array, with the 2 x HDDs set to --write-mostly.
Sequential write, QD32: 147 MB/s (2 x HDD RAID1) | 156 MB/s (1 x NVMe + 2 x HDD RAID1)
The write tests give near-identical performance with and without the SSD in the array: once any cache has been saturated, write speeds are presumably limited by the slowest device in the array.
Sequential read, QD32: 187 MB/s (2 x HDD RAID1) | 1725 MB/s (1 x NVMe + 2 x HDD RAID1)
Sequential read, QD1: 162 MB/s (2 x HDD RAID1) | 1296 MB/s (1 x NVMe + 2 x HDD RAID1)
4K random read: 712 kB/s (2 x HDD RAID1) | 55.0 MB/s (1 x NVMe + 2 x HDD RAID1)
The read speeds are a completely different story: the array essentially performs at the native speed of the SSD once the slower HDDs are set to --write-mostly, meaning reads are directed to the SSD. The NVMe SSD is limited to PCIe x2, which is why sequential read speeds top out at 1725 MB/s; current PCIe x4 devices should be able to double that.
To summarize, a hybrid RAID1 mixing HDDs and SSDs will have write performance similar to the HDD (slowest device) and read performance similar to the SSD (fastest device), as long as the slower HDDs are added to the array with the --write-mostly flag set. Obviously these are synthetic I/O tests and may not reflect real-world application performance, but they at least give you a good idea of where the underlying bottlenecks may be.
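For anyone wanting to reproduce this kind of comparison, something along the following lines should do (a sketch only, not Phil's exact commands; the file path, size and runtime are arbitrary):

# Sequential reads at QD32 and QD1 against a test file on the array
fio --name=seqread-qd32 --filename=/srv/fiotest/testfile --size=4G \
    --rw=read --bs=1M --iodepth=32 --ioengine=libaio --direct=1 \
    --runtime=60 --time_based

fio --name=seqread-qd1 --filename=/srv/fiotest/testfile --size=4G \
    --rw=read --bs=1M --iodepth=1 --ioengine=libaio --direct=1 \
    --runtime=60 --time_based

# 4K random reads at QD1
fio --name=randread-qd1 --filename=/srv/fiotest/testfile --size=4G \
    --rw=randread --bs=4k --iodepth=1 --ioengine=libaio --direct=1 \
    --runtime=60 --time_based

Run the same jobs against the plain HDD array and the hybrid array to get comparable before/after numbers, and use a scratch file rather than a raw device so nothing on the array gets overwritten.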
On 9/17/20 4:25 PM, Phil Perry wrote:
Here are a few performance stats from my setup, made with fio. [...]
Too bad the 4k random write tests are missing above.
I have used SSD + HDD RAID1 configurations in dozens of CentOS desktops and servers for years, and it works very well with the --write-mostly flag set on the HDD. With most reads coming from the SSD, programs start much more quickly.
However, I find the write queue to be very, very small, so the system "feels" like a slow HDD system during writing. But it is possible to configure an extended write-behind buffer/queue, which greatly improves 'bursty' write performance (e.g., Yum/DNF updates or unpacking a tarball with many small files).
Do test, lest some kernel bugs over the years, such as [1], rear their ugly head (you will get a panic quickly). The bug returned at some point, and I gave up hoping upstream would not break it again. On desktops it left me unable to boot and required console access to fix.
In short, use 'mdadm --examine-bitmap' on a component (not the md device itself) and look at "Write Mode". I set it to the maximum of 16383, which must be done when the bitmap is created, so remove the bitmap and create a new one with the options you prefer:
mdadm /dev/mdX --grow --bitmap=none
mdadm /dev/mdX --grow --bitmap=internal --bitmap-chunk=512M --write-behind=16383
Note sync_action must be idle if you decide to script this. Bigger bitmap-chunks are my preference, but might not be yours. Your mileage and performance may differ. :-)
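For example (hypothetical device names), verifying the settings afterwards:

# Inspect the bitmap on a member device, not on /dev/mdX itself;
# "Write Mode" and "Chunksize" show the write-behind and chunk settings.
mdadm --examine-bitmap /dev/sda1

# Check that no resync/recovery is in progress before removing or
# recreating the bitmap (should print "idle"):
cat /sys/block/mdX/md/sync_action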
I've been meaning to test big write-behind values on my CentOS 8 systems...
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1582673 (login required to view)
On 19/09/2020 19:19, Chris Schanzle via CentOS wrote:
Too bad the 4k random write tests are missing above.
4k random writes, QD1:
with fsync=1: 56.6 kB/s (2 x HDD RAID1) | 77.8 kB/s (1 x NVMe + 2 x HDD RAID1)
with fsync=1000: 1431 kB/s (2 x HDD RAID1) | 1760 kB/s (1 x NVMe + 2 x HDD RAID1)
I have used SSD + HDD RAID1 configurations in dozens of CentOS desktops and servers for years and it works very well with the --write-mostly flag being set on the HDD. With most reads coming from the SSD, starting programs are much quicker.
However, I find the write queue to be very, very small, so the system "feels" like a slow HDD system during writing.
Yes, as per above, 4k random write performance is similar to that of a pure HDD RAID array.
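For completeness, the sort of fio invocation that produces numbers like those above (assumed options, not necessarily the exact ones used; the file path and size are arbitrary):

# 4k random writes at QD1, syncing after every write vs. every 1000 writes
fio --name=randwrite-fsync1 --filename=/srv/fiotest/testfile --size=1G \
    --rw=randwrite --bs=4k --iodepth=1 --ioengine=libaio --direct=1 --fsync=1

fio --name=randwrite-fsync1000 --filename=/srv/fiotest/testfile --size=1G \
    --rw=randwrite --bs=4k --iodepth=1 --ioengine=libaio --direct=1 --fsync=1000

The fsync=1 case roughly approximates a mail server committing each message to stable storage, which is why the write-behind tuning discussed earlier matters more than raw sequential figures.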
On 9/16/20 10:40 AM, Phil Perry wrote:
You can achieve this with a hybrid RAID1 by mixing SSDs and HDDs, and marking the HDD members as --write-mostly, meaning most of the reads will come from the faster SSDs retaining much of the speed advantage, but you have the redundancy of both SSDs and HDDs in the array.
Was the write-behind crash bug ever actually fixed? I don't see it in more recent release notes, but the bug listed isn't public, so I can't check its status.
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/htm...