[CentOS] storage for mailserver

Sat Sep 19 18:19:57 UTC 2020
Chris Schanzle <chris.schanzle at nist.gov>

On 9/17/20 4:25 PM, Phil Perry wrote:
> On 17/09/2020 13:35, Michael Schumacher wrote:
>> Hello Phil,
>>
>> Wednesday, September 16, 2020, 7:40:24 PM, you wrote:
>>
>> PP> You can achieve this with a hybrid RAID1 by mixing SSDs and HDDs, and
>> PP> marking the HDD members as --write-mostly, meaning most of the reads
>> PP> will come from the faster SSDs, retaining much of the speed advantage,
>> PP> but you have the redundancy of both SSDs and HDDs in the array.
>>
>> PP> Read performance is not far off native read performance of the SSD, and
>> PP> writes are mostly cached or happen in the background, so they are not so
>> PP> noticeable on a mail server anyway.
>>
>> very interesting. Do you or anybody else have experience with this
>> setup? Any test results to compare? I will do some testing if nobody
>> can come up with comparisons.
>>
>>
>> best regards
>> ---
>> Michael Schumacher
>
> Here are a few performance stats from my setup, measured with fio.
>
> The first set of figures is for a RAID1 array of 2 x WD Black 1TB drives. The second set is for a RAID1 array with the same 2 x WD Black 1TB drives plus a WD Blue NVMe (PCIe x2) added to the array, with the 2 x HDDs set to --write-mostly.
>
> Sequential write QD32
> 147MB/s (2 x HDD RAID1)
> 156MB/s (1 x NVMe, 2 x HDD RAID1)
>
> The write tests give near-identical performance with and without the SSD in the array because, once any cache has been saturated, write speeds are presumably limited by the slowest device in the array.
>
> Sequential read QD32
> 187MB/s (2 x HDD RAID1)
> 1725MB/s (1 x NVMe, 2 x HDD RAID1)
>
> Sequential read QD1
> 162MB/s (2 x HDD RAID1)
> 1296MB/s (1 x NVMe, 2 x HDD RAID1)
>
> 4K random read
> 712kB/s (2 x HDD RAID1)
> 55.0MB/s (1 x NVMe, 2 x HDD RAID1)
>
> The read speeds are a completely different story: once the slower HDDs are set to --write-mostly, reads are prioritized to the SSD and the array performs essentially at the SSD's native speed. The NVMe SSD is limited to PCIe x2, which is why sequential reads top out at 1725MB/s. Current PCIe x4 devices should be able to roughly double that.
>
> To summarize, a hybrid RAID1 mixing HDDs and SSDs will have write performance similar to the HDD (slowest device) and read performance similar to the SSD (fastest device), as long as the slower HDDs are added to the array with the --write-mostly flag set. Obviously these are synthetic I/O tests and may not reflect real-world application performance, but they at least give you a good idea of where the underlying bottlenecks may be.
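For reference, a minimal sketch of creating such a hybrid array with mdadm. It is not necessarily how the quoted array was built: the helper name make_hybrid_raid1 and the device names (/dev/md0, /dev/nvme0n1p1, /dev/sda1, /dev/sdb1) are hypothetical placeholders. It relies on mdadm's --write-mostly flagging every device listed after it, so the SSD goes first; DRY_RUN=1 prints the command instead of running it.

```shell
# Sketch: create a 3-way RAID1 where the SSD serves reads and the HDDs are
# flagged write-mostly. Helper and device names are hypothetical.
# Running this for real DESTROYS any data on the listed devices.
# Set DRY_RUN=1 to print the mdadm command instead of executing it.
make_hybrid_raid1() {
    md_dev=$1    # md device to create, e.g. /dev/md0
    ssd=$2       # fast member, listed first so it is NOT write-mostly
    shift 2      # remaining arguments: the slow HDD members
    # Arguments are expanded before `set` runs, so $# and "$@" still refer
    # to the HDD list here. --write-mostly applies to every device after it.
    set -- mdadm --create "$md_dev" --level=1 --raid-devices=$(( $# + 1 )) \
        --bitmap=internal "$ssd" --write-mostly "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

Usage: `DRY_RUN=1 make_hybrid_raid1 /dev/md0 /dev/nvme0n1p1 /dev/sda1 /dev/sdb1`. The internal bitmap is included because write-behind (discussed below) requires one.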


Too bad the 4K random write results are missing above.
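A 4K random-write run could be sketched with fio roughly as follows. The job parameters (test file path, 1G size, 60 s runtime, queue depth) and the helper name fio_randwrite_4k are assumptions, not the settings behind the quoted figures. --direct=1 bypasses the page cache so the result reflects the md array rather than RAM; point it at a scratch file, never at the raw md device.

```shell
# Sketch: 4K random-write benchmark with fio. All job parameters are
# assumptions; the test file path passed in is a hypothetical scratch file
# on the md-backed filesystem. Set DRY_RUN=1 to print the command only.
fio_randwrite_4k() {
    testfile=$1        # scratch file, e.g. /mnt/test/fio.dat
    iodepth=${2:-1}    # queue depth; 1 approximates a single sync writer
    set -- fio --name=randwrite-4k --filename="$testfile" --size=1G \
        --rw=randwrite --bs=4k --iodepth="$iodepth" --ioengine=libaio \
        --direct=1 --runtime=60 --time_based --group_reporting
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

Running it at QD1 and QD32, before and after enlarging the write-behind limit, would show how much the queue helps bursty small writes.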

I have used SSD + HDD RAID1 configurations in dozens of CentOS desktops and servers for years, and it works very well with the --write-mostly flag set on the HDD.  With most reads coming from the SSD, programs start much more quickly.
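On an array that already exists, the flag can also be set per member through the kernel's md sysfs interface (writing "writemostly" to md/dev-XXX/state) instead of re-adding the device with --write-mostly. A minimal sketch, assuming hypothetical names md0/sda1 and hypothetical helper names; SYSFS is overridable only so the logic can be tried against a scratch directory, and the real write needs root.

```shell
# Sketch: flag an existing RAID1 member write-mostly via md sysfs and check
# the result. Helper names and md/member names are hypothetical.
SYSFS=${SYSFS:-/sys}

set_write_mostly() {
    # $1 = md device name (e.g. md0), $2 = member name (e.g. sda1)
    echo writemostly > "$SYSFS/block/$1/md/dev-$2/state"
}

is_write_mostly() {
    # The state file holds a comma-separated flag list; accept both the
    # "writemostly" and "write_mostly" spellings.
    grep -Eq 'write_?mostly' "$SYSFS/block/$1/md/dev-$2/state"
}
```

Usage: `set_write_mostly md0 sda1 && is_write_mostly md0 sda1 && echo ok`.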

However, I find the write queue to be very, very small, so the system "feels" like a slow HDD system during writes.  But it is possible to configure an extended write-behind buffer/queue, which will greatly improve 'bursty' write performance (e.g., Yum/DNF updates or unpacking a tarball with many small files).

Do test, lest kernel bugs that have cropped up over the years, such as [1], rear their ugly heads (you will get a panic quickly).  The bug returned at some point, and I gave up hoping that upstream would not break it again.  On desktops, it left me unable to boot, and fixing it required console access.

In short, use 'mdadm --examine-bitmap' on a component device (not the md device itself) and look at "Write Mode."  I set it to the maximum of 16383, which must be done when the bitmap is created, so remove the bitmap and create a new one with the options you prefer:

mdadm /dev/mdX --grow --bitmap=none
mdadm /dev/mdX --grow --bitmap=internal --bitmap-chunk=512M --write-behind=16383

Note that the array's sync_action (/sys/block/mdX/md/sync_action) must be idle if you decide to script this.  Bigger bitmap chunks are my preference, but might not be yours.  Your mileage and performance may vary.  :-)
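A minimal sketch of scripting the two bitmap commands: it refuses to run unless sync_action reads "idle", then drops and recreates the internal bitmap, and includes a small filter for checking "Write Mode" afterwards. The helper names (run, recreate_bitmap, write_mode) and the md name are hypothetical; SYSFS and DRY_RUN exist only so the logic can be tried without touching a real array.

```shell
# Sketch: recreate an md bitmap with a larger write-behind limit, only when
# the array is idle. Helper names and md names are hypothetical.
SYSFS=${SYSFS:-/sys}

run() {
    # Print the command when DRY_RUN=1, otherwise execute it.
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$@"; else "$@"; fi
}

recreate_bitmap() {
    md=$1                  # md device name, e.g. md0
    chunk=${2:-512M}       # bitmap chunk size
    behind=${3:-16383}     # write-behind limit; 16383 is the maximum
    action=$(cat "$SYSFS/block/$md/md/sync_action") || return 1
    if [ "$action" != idle ]; then
        echo "$md: sync_action is '$action', not idle; skipping" >&2
        return 1
    fi
    run mdadm "/dev/$md" --grow --bitmap=none || return 1
    run mdadm "/dev/$md" --grow --bitmap=internal \
        --bitmap-chunk="$chunk" --write-behind="$behind"
}

write_mode() {
    # Filter for `mdadm --examine-bitmap <component>` output on stdin:
    # print whatever follows the "Write Mode :" label.
    sed -n 's/^[[:space:]]*Write Mode[[:space:]]*:[[:space:]]*//p'
}
```

Usage: `recreate_bitmap md0 && mdadm --examine-bitmap /dev/sda1 | write_mode`, which should then report the new write-behind limit.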

I've been meaning to test big write-behind values on my CentOS 8 systems...

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1582673  (login required to view)