I have a rather large box (2x8-core Xeon, 96GB RAM) where I have a couple of disk arrays connected to an Areca controller. I just added a new external array, 8 3TB drives in RAID5, and the testing I'm doing right now is on this array, but this seems to be a problem on this machine in general, on all file systems (even, possibly, NFS, but I'm not sure about that one yet).
So, if I use iozone -a to test write speeds on the raw device, I get results in the 500-800MB/sec range, depending on write sizes, which is about what I'd expect.
However, when I have an ext4 filesystem on this device, mounted with noatime and data=writeback (the filesystem is completely empty), and I test with dd, the results are less encouraging:
dd bs=1M if=/dev/zero of=/Volumes/data_10-2/test.bin count=40000
40000+0 records in
40000+0 records out
41943040000 bytes (42 GB) copied, 292.288 s, 143 MB/s
Now, I'm not expecting to get the raw device speeds, but this seems at least to be 2-3 times slower than what I'd expect.
Using conv=fsync oflag=direct makes it utterly pathetic:
dd bs=1M if=/dev/zero of=/Volumes/data_10-2/test.bin oflag=direct conv=fsync count=5000
5000+0 records in
5000+0 records out
5242880000 bytes (5.2 GB) copied, 178.791 s, 29.3 MB/s
Now, I'm sure there can be many reasons for this, but I wonder where I should start looking to debug this.
On 2014-10-14, Joakim Ziegler <joakim@terminalmx.com> wrote:
> So, if I use iozone -a to test write speeds on the raw device, I get results in the 500-800MB/sec range, depending on write sizes, which is about what I'd expect.
> However, when I have an ext4 filesystem on this device, mounted with noatime and data=writeback (the filesystem is completely empty), and I test with dd, the results are less encouraging:
My first question would be, why not test the filesystem with iozone too? (And/or, test the device with dd.) You may or may not come up with the same results, but at least someone can't come back and blame your testing methodology for the odd results.
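Something along these lines, for example (just a sketch; the scratch file name and the 4g size cap are arbitrary, and the mount point is the one from your dd test):

# iozone automatic mode against a file on the mounted ext4 filesystem,
# capped at 4GB per test file so it finishes in reasonable time
iozone -a -g 4g -f /Volumes/data_10-2/iozone.tmp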
(Just as an aside, if your 6.4 box is on a public network, you should probably consider updating it as well, since many security and bug fixes have been issued since 6.4 was released.)
If you are still getting poor results from ext4, you have at least two more options.
== Check with the ext4 mailing list; they're usually pretty helpful.
== Try your tests against xfs. Try to make sure your tests are replicating your use cases as closely as you can manage; you wouldn't want to pick a filesystem based on a test that doesn't actually replicate how you're going to use the fs.
--keith
On 13/10/14, 20:59, Keith Keller wrote:
> On 2014-10-14, Joakim Ziegler <joakim@terminalmx.com> wrote:
>> So, if I use iozone -a to test write speeds on the raw device, I get results in the 500-800MB/sec range, depending on write sizes, which is about what I'd expect.
>> However, when I have an ext4 filesystem on this device, mounted with noatime and data=writeback (the filesystem is completely empty), and I test with dd, the results are less encouraging:
> My first question would be, why not test the filesystem with iozone too? (And/or, test the device with dd.) You may or may not come up with the same results, but at least someone can't come back and blame your testing methodology for the odd results.
> (Just as an aside, if your 6.4 box is on a public network, you should probably consider updating it as well, since many security and bug fixes have been issued since 6.4 was released.)
> If you are still getting poor results from ext4, you have at least two more options.
> == Check with the ext4 mailing list; they're usually pretty helpful.
> == Try your tests against xfs. Try to make sure your tests are replicating your use cases as closely as you can manage; you wouldn't want to pick a filesystem based on a test that doesn't actually replicate how you're going to use the fs.
Googling shows some people who solved what seems like a similar problem with a kernel upgrade, so I'm going to try that. This box is on 2.6.32-358, and 2.6.32-431.29.2 seems to be the newest. At least it's a factor to eliminate.
On 10/14/2014 02:15 PM, Joakim Ziegler wrote:
> I have a rather large box (2x8-core Xeon, 96GB RAM) where I have a couple of disk arrays connected to an Areca controller. I just added a new external array, 8 3TB drives in RAID5, and the testing I'm doing right now is on this array, but this seems to be a problem on this machine in general, on all file systems (even, possibly, NFS, but I'm not sure about that one yet).
The first thing I would check is that you have a BBU installed on the Areca controller and that it is functioning properly (check the CLI; I don't know the exact commands off the top of my head). Also make sure that write caching is enabled on the controller (after you've checked the BBU, of course). Without a working BBU in place, hardware RAID controllers such as Areca's disable write caching by default, and this will have a significant impact on write speeds.
Note that newer controllers use a type of flash memory instead of a BBU.
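If the BBU and write cache check out, one more thing that might be worth testing, purely as a diagnostic, is whether the flush/barrier requests ext4 issues are what slows the filesystem down relative to the raw device. A rough sketch (only sensible to leave in place long-term if the BBU really is healthy; the mount point is the one from the dd test):

# Temporarily disable ext4 write barriers, then rerun the dd test;
# a big jump would point at flush handling on the controller
mount -o remount,nobarrier /Volumes/data_10-2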
Peter
On 13/10/14, 21:16, Peter wrote:
> On 10/14/2014 02:15 PM, Joakim Ziegler wrote:
>> I have a rather large box (2x8-core Xeon, 96GB RAM) where I have a couple of disk arrays connected to an Areca controller. I just added a new external array, 8 3TB drives in RAID5, and the testing I'm doing right now is on this array, but this seems to be a problem on this machine in general, on all file systems (even, possibly, NFS, but I'm not sure about that one yet).
> The first thing I would check is that you have a BBU installed on the Areca controller and that it is functioning properly (check the CLI; I don't know the exact commands off the top of my head). Also make sure that write caching is enabled on the controller (after you've checked the BBU, of course). Without a working BBU in place, hardware RAID controllers such as Areca's disable write caching by default, and this will have a significant impact on write speeds.
> Note that newer controllers use a type of flash memory instead of a BBU.
Yes, I have a BBU and it's working. A lack of write caching shouldn't affect raw device writes and filesystem writes so differently, though, I think.
On Mon, 13 Oct 2014 20:15:11 -0500 Joakim Ziegler <joakim@terminalmx.com> wrote:
> ...
> So, if I use iozone -a to test write speeds on the raw device, I get results in the 500-800MB/sec range, depending on write sizes, which is about what I'd expect.
> However, when I have an ext4 filesystem on this device, mounted with noatime and data=writeback (the filesystem is completely empty), and I test with dd, the results are less encouraging:
> ...
> Now, I'm sure there can be many reasons for this, but I wonder where I should start looking to debug this.
First I'd suggest comparing apples to apples. That is, try doing the dd test on the raw device and compare it to dd on ext4.
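For example (just a sketch; /dev/sdX stands in for the array's block device, and note that writing to it destroys whatever is on it):

# Same workload as the filesystem test, but straight against the block device
dd bs=1M if=/dev/zero of=/dev/sdX count=40000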
Then you may want to try changing the I/O scheduler from the default cfq to deadline. This typically works better for many RAID controllers, but YMMV.
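Roughly like this (sdX is again a placeholder; the change doesn't survive a reboot unless you also add elevator=deadline to the kernel command line):

# Show the active scheduler (the one in brackets), then switch to deadline
cat /sys/block/sdX/queue/scheduler
echo deadline > /sys/block/sdX/queue/scheduler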
Also, testing with xfs instead of ext4 is probably worth it. xfs usually outperforms ext4 in streaming writes (like dd). Of course this raises the question of whether dd is a useful metric for your actual load... xfs may in fact be needed anyway (3T * 7 = 21 TB, which is above the ext4 maximum, if I remember correctly; check Red Hat's documentation for RHEL 6 to make sure).
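If you do try it, something like this (a sketch; the device, mount point, and su/sw stripe values are placeholders and should be matched to the controller's actual stripe size and the seven data disks in your RAID5):

# Make an xfs filesystem with explicit stripe geometry, then mount it;
# inode64 is generally a good idea on filesystems this large
mkfs.xfs -d su=64k,sw=7 /dev/sdX1
mount -o noatime,inode64 /dev/sdX1 /Volumes/data_10-2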
Good luck, Peter K
On 14/10/14, 6:45, Peter Kjellström wrote:
> On Mon, 13 Oct 2014 20:15:11 -0500 Joakim Ziegler <joakim@terminalmx.com> wrote:
>> ...
>> So, if I use iozone -a to test write speeds on the raw device, I get results in the 500-800MB/sec range, depending on write sizes, which is about what I'd expect.
>> However, when I have an ext4 filesystem on this device, mounted with noatime and data=writeback (the filesystem is completely empty), and I test with dd, the results are less encouraging:
>> ...
>> Now, I'm sure there can be many reasons for this, but I wonder where I should start looking to debug this.
> First I'd suggest comparing apples to apples. That is, try doing the dd test on the raw device and compare it to dd on ext4.
> Then you may want to try changing the I/O scheduler from the default cfq to deadline. This typically works better for many RAID controllers, but YMMV.
> Also, testing with xfs instead of ext4 is probably worth it. xfs usually outperforms ext4 in streaming writes (like dd). Of course this raises the question of whether dd is a useful metric for your actual load... xfs may in fact be needed anyway (3T * 7 = 21 TB, which is above the ext4 maximum, if I remember correctly; check Red Hat's documentation for RHEL 6 to make sure).
Upgrading to 6.5 with its new kernel did not fix the problem. I will be doing some more testing. The strange thing is, I have a near-identical machine also running CentOS 6.5, also with ext4 on the same controller (and another, newer Areca controller), and there it's extremely fast: on the fastest controller there, dd hits around 2GB/sec sustained over 200GB of data on a 24-disk RAID6 (both systems have 96GB of RAM).
And yes, I've formatted with a newer version of e2fsprogs than is included with the distro, to get 16TB+ support, although in the case of the device I'm currently testing, it actually has two partitions, so I wouldn't have needed to.
I'll do a bit more testing and come back with my results.