I've posted here about this a number of times. The other admin I work with had been digging into it again recently, prompted by some real problems we'd been having, and this time, with a year or so's more stuff to google and newer documentation, he found the problem.
What we'd been seeing: cd to an NFS-mounted directory and, from that NFS-mounted directory, tar -xzvf a 25M or so tar.gz, which unpacks to about 105M. Under CentOS 5 on a local drive it takes seconds; over the NFS mount (mount options included sync), about 35 seconds. Under 6.x, from the beginning, the same operation took 6.5 to 7 *minutes*.
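(For anyone who wants to reproduce it, the test is nothing fancier than this; the path and archive name below are placeholders, not our real ones:)

    # both the working directory and the tarball sit on the NFS mount
    cd /net/home/testdir
    time tar -xzvf stuff.tar.gz    # ~25M .tar.gz, unpacks to ~105M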
The result was that we'd been keeping our home directory servers on 5.
What he found was the barrier mount option. From the one or two hits I turned up googling, it's not even clear that 5.x recognizes this option. According to the upstream docs, https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/writebarrieronoff.html, it's enabled by default on 6 and affects *all* journalled filesystems.
After remounting the drive with -o nobarrier, I NFS-mounted an exported directory and reran the same test... and it took 20 seconds.
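(If you want to try it yourself, the change is just the mount option; the mount point and fstab line below are placeholders, and think about the power-loss tradeoff before doing this on anything without battery backup:)

    # one-off, on an already-mounted ext4 or xfs filesystem
    mount -o remount,nobarrier /export/home

    # or make it stick across reboots with an /etc/fstab entry like
    # /dev/sdb1  /export/home  ext4  defaults,nobarrier  1 2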
Since most of our systems are on UPSes, we're not worried about sudden power loss... and my manager did a jig, and we're starting to talk about migrating the rest of our home directory servers....
mark
On Mon, Nov 4, 2013 at 12:06 PM, m.roth@5-cent.us wrote:
> What we'd been seeing: cd to an NFS-mounted directory and, from that NFS-mounted directory, tar -xzvf a 25M or so tar.gz, which unpacks to about 105M. Under CentOS 5 on a local drive it takes seconds; over the NFS mount (mount options included sync), about 35 seconds. Under 6.x, from the beginning, the same operation took 6.5 to 7 *minutes*.
> [...]
> After remounting the drive with -o nobarrier, I NFS-mounted an exported directory and reran the same test... and it took 20 seconds.
I'm trying to make sense of that timing. Does that mean that pre-6.x, fsync() didn't really wait for the data to be written to disk, or does it somehow take 7 minutes to get 100M onto your disk in the right order? Or is this an artifact of a specific raid controller and what you have to do to flush its cache?
Les Mikesell wrote:
> I'm trying to make sense of that timing. Does that mean that pre-6.x, fsync() didn't really wait for the data to be written to disk, or does it somehow take 7 minutes to get 100M onto your disk in the right order? Or is this an artifact of a specific raid controller and what you have to do to flush its cache?
No, this happens regardless of the box: old Penguins, newer Dells with PERC 600 or 700 RAID controllers. Apparently this "barrier" option controls the journal transactions, keeping them in order on disk, or something like that.
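(If you want to check whether a given filesystem is actually running with barriers, /proc/mounts is where I'd look; the exact option names in the output vary by kernel and filesystem, so treat this as a rough check:)

    # look for barrier= or nobarrier in the options field
    grep ' /export/home ' /proc/mounts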
mark
On Mon, Nov 4, 2013 at 12:26 PM, m.roth@5-cent.us wrote:
> No, this happens regardless of the box: old Penguins, newer Dells with PERC 600 or 700 RAID controllers. Apparently this "barrier" option controls the journal transactions, keeping them in order on disk, or something like that.
I just don't see where that kind of time can go unless it is forcing a flush of a large (and probably mostly unrelated) cache to disk, possibly even the internal drive caches if there is a way to do that, and waiting for it to complete after each file close.
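(One way to poke at the drive-cache part of that theory, at least on a plain SATA disk rather than a RAID controller's virtual disk, would be something like this; /dev/sda is just an example device:)

    # show whether the drive's volatile write cache is enabled
    hdparm -W /dev/sda

    # turning it off removes most of what barriers protect against, at a write-performance cost:
    # hdparm -W0 /dev/sda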