Thought I'd post this here, too - I emailed it to the redhat list, and that's pretty moribund, while I've seen redhatters here....
---------------------------- Original Message ----------------------------
Subject: Bug 800181: NFSv4 on RHEL 6.2 over six times slower than 5.7
From: m.roth@5-cent.us
Date: Tue, July 10, 2012 09:54
To: "General Red Hat Linux discussion list" <redhat-list@redhat.com>
---------------------------------------------------------------------------
m.roth@5-cent.us wrote:
For any redhatters on the list, I'm going to be reopening this bug today.
I am also VERY unhappy with Redhat. I filed the bug months ago, and it was *never* assigned - no one apparently even looked at it. It's a show-stopper for us, since it hits us on our home directory servers.
A week or so ago, I updated our test system to 6.3, and *nothing* has changed. Unpacking a large file locally takes seconds. Unpacking from an NFS-mounted directory to a local disk takes about 1.5 minutes. But if I NFS-mount either an ext3 or an ext4 fs, cd to that directory, and run a job to unpack a large file into the NFS-mounted directory, it takes between 6.5 and 7.5 *MINUTES*. We cannot move our home directory servers to 6.x with this unacknowledged ->BUG<-.
"Large file" here means a 28M .gz file that unpacks to 92M.
This is 100% repeatable.
I tried sending an email to our support weeks ago, and got no response. Maybe it takes shaming in a public forum to get anyone to acknowledge this exists....
mark
It may not be a bug, it may be that RHEL 6.x implements I/O barriers correctly, which slows things down but keeps you from losing data....
We have this issue.
I have a support call open with Red Hat about it. Bug reports only really get actioned if you open a support call and point it at the bug report.
I also have this issue, though much worse, on Fedora (using BTRFS); it will surely have to be fixed before BTRFS becomes the default fs in RHEL. The Fedora bug I have open on this provided some useful insights, especially on NFSv4:
https://bugzilla.redhat.com/show_bug.cgi?id=790232#c2
Particularly:
"NFS file and directory creates are synchronous operation: before the create can return, the client must get a reply from the server saying not only that it has created the new object, but that the create has actually hit the disk."
Also listed here is a proposed protocol extension to NFS v4 to make file creation more efficient:
http://tools.ietf.org/html/draft-myklebust-nfsv4-unstable-file-creation-01
Not sure if this will be added to RH.
Also RH support found:
http://archive09.linux.com/feature/138453
"NFSv4 file creation is actually about half the speed of file creation over NFSv3, but NFSv4 can delete files quicker than NFSv3. By far the largest speed gains come from running with the async option on, though using this can lead to issues if the NFS server crashes or is rebooted."
I'm glad we aren't the only ones seeing this; it sort of looked like we were when talking to support!
I'll add this RH bug number to my RH support ticket.
But think yourself lucky, BTRFS on Fedora 16 was much worse. This was the time it took me to untar a vlc tarball.
F16 to RHEL5     - 0m 28.170s
F16 to F16 ext4  - 4m 12.450s
F16 to F16 btrfs - 14m 31.252s
A quick test suggests this is better in F17 (3m 7.240s on BTRFS); it still looks like we are hitting NFSv4 issues there, but btrfs itself is better.
Thanks
Colin
On Wed, Jul 11, 2012 at 11:29 AM, Colin Simpson Colin.Simpson@iongeo.com wrote:
But think yourself lucky, BTRFS on Fedora 16 was much worse. This was the time it took me to untar a vlc tarball.
F16 to RHEL5     - 0m 28.170s
F16 to F16 ext4  - 4m 12.450s
F16 to F16 btrfs - 14m 31.252s
A quick test suggests this is better in F17 (3m 7.240s on BTRFS); it still looks like we are hitting NFSv4 issues there, but btrfs itself is better.
I wonder if the real issue is that NFSv4 waits for a directory change to sync to disk but linux wants to flush the whole disk cache before saying the sync is complete.
This is likely to be a bug in RHEL5 rather than one in RHEL6. RHEL5 (kernel 2.6.18) does not always guarantee that the disk cache is flushed before 'fsync' returns. This is especially true if you use software RAID and/or LVM. You may be able to get the old performance back by disabling I/O barriers and using a UPS, a RAID controller that has battery backed RAM, or enterprise-grade drives that guarantee flushing all the data to disk by using a 'supercap' to store enough energy to complete all writes.
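To make that concrete: barriers are controlled per filesystem by mount options on the NFS server, so a minimal sketch of what "disabling I/O barriers" would look like in /etc/fstab is below (the device and mount point are placeholders, and again this is only sensible with a UPS or battery/flash-backed cache):

    # /etc/fstab on the NFS server -- device and mount point are placeholders
    # ext4: 'nobarrier' turns write barriers off (the ext3 equivalent is 'barrier=0')
    /dev/sdb1  /export/home  ext4  defaults,nobarrier  0  2

After editing, remount the filesystem (or reboot) for the change to take effect.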
Gé
Gé Weijers wrote:
This is likely to be a bug in RHEL5 rather than one in RHEL6. RHEL5 (kernel 2.6.18) does not always guarantee that the disk cache is flushed before 'fsync' returns. This is especially true if you use software RAID and/or LVM. You may be able to get the old performance back by disabling I/O barriers and using a UPS, a RAID controller that has battery backed RAM, or enterprise-grade drives that guarantee flushing all the data to disk by using a 'supercap' to store enough energy to complete all writes.
Gé
Les Mikesell wrote:

I wonder if the real issue is that NFSv4 waits for a directory change to sync to disk but linux wants to flush the whole disk cache before saying the sync is complete.
Thanks, Les, that's *very* interesting.
Based on that, I'm trying again with async on both server and client, as I did back in March when I filed the original bug (which I think means we were on 6.0 or 6.1 at the time), and I'm getting different results.
Gé, sorry, but it hit us with the same configuration we had in 5 when we tried to move to 6.
And please don't top post.
mark
----- Original Message -----
I wonder if the real issue is that NFSv4 waits for a directory change to sync to disk but linux wants to flush the whole disk cache before saying the sync is complete.
-- Les Mikesell lesmikesell@gmail.com
I think you are right that it is the forcing of the sync operation for all writes in NFSv4 that is making it slow. I just tested with a server and client both running RHEL 6.3, exporting a directory that held an old tar.gz of the OpenOffice.org 3.0 distribution for Linux (175MB). Exported with the default sync option, it took 26 seconds to extract from the client mount; exported with the async option, the extraction took only 4 seconds. To be clear about what I tested with: this is over 1GbE. The NFS server has an Intel Core i3-2125 CPU @ 3.3GHz and 16GB RAM, and the export directory sits on a 22-drive Linux RAID6 connected via a 6Gb/sec SAS HBA. The client is an Intel Core 2 Duo E8400 @ 3GHz with 4GB RAM.
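For anyone who wants to repeat the comparison, the only thing that changes between the two runs is the export option; a minimal sketch (the export path, client network, and tarball name below are placeholders, not the exact ones used here) would be something like:

    # /etc/exports on the server -- 'sync' is the default; change it to 'async' for the second run
    /export/test  192.168.1.0/24(rw,sync,no_subtree_check)

    # on the server, re-export without restarting nfsd
    exportfs -ra

    # on the client (assuming the export is visible at this path under the v4 pseudo-root)
    mount -t nfs4 server:/export/test /mnt/test
    time tar xzf /tmp/bigfile.tar.gz -C /mnt/test

Timing the same tar with sync and then async in the export line is enough to show the difference.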
Mark,
Have you tried using async in your export options yet? Any difference?
David.
I have tried the async option and that reverts to being as fast as previously.
So I guess the choice is either to use the less safe async and get quick file creation, or to live with the slowdown until a protocol extension potentially appears to help with this.
Colin
On 07/12/12 06:41, Colin Simpson wrote:
I have tried the async option and that reverts to being as fast as previously.
So I guess the choice is either to use the less safe async and get quick file creation, or to live with the slowdown until a protocol extension potentially appears to help with this.
The most aggravating part of this is that when my manager first set me the problem of finding a workaround, I *did* try async, and got no difference. Now I can't replicate that... but the oldest version I have is still 6.2, and I think I was working under 6.0 or 6.1 at the time.
*After* I test further, I think it's up to my manager and our users to decide whether it's worth going with the less safe option - this is a real issue, since some of their jobs run for days, or one or two weeks, on an HBS* or a good-sized cluster. (We're speaking of serious scientific computing here.)
mark
* Technical term: honkin' big server, things like 48 or 64 cores, quarter of a terabyte of memory or so....
On Fri, Jul 13, 2012 at 7:12 AM, mark m.roth@5-cent.us wrote:
*After* I test further, I think it's up to my manager and our users to decide whether it's worth going with the less safe option - this is a real issue, since some of their jobs run for days, or one or two weeks, on an HBS* or a good-sized cluster. (We're speaking of serious scientific computing here.)
I always wondered why the default for nfs was ever sync in the first place. Why shouldn't it be the same as local use of the filesystem? The few things that care should be doing fsync's at the right places anyway.
On 07/13/2012 07:40 AM, Les Mikesell wrote:
I always wondered why the default for nfs was ever sync in the first place. Why shouldn't it be the same as local use of the filesystem? The few things that care should be doing fsync's at the right places anyway.
Well, the reason would be that LOCAL operations complete massively faster (by factors of hundreds or thousands of times) than operations that take place via NFS on a normal network. If you are doing something with your network connection to make it so low latency that its speed rivals local operations, then it would likely be fine to use the exact same settings as for local operations. If you are not, then you are increasing the risk that the system thinks something has happened while the operation is still queued, so things like a loss of power will leave different items on disk than the system knows about, etc. But people get to override the default settings and trade some risk for performance if they choose to.
On Tue, Jul 17, 2012 at 4:33 AM, Johnny Hughes johnny@centos.org wrote:
I always wondered why the default for nfs was ever sync in the first place. Why shouldn't it be the same as local use of the filesystem? The few things that care should be doing fsync's at the right places anyway.
Well, the reason would be that LOCAL operations complete massively faster (by factors of hundreds or thousands of times) than operations that take place via NFS on a normal network.
Everything _except_ moving a disk head around, which is the specific operation we are talking about.
If you are doing something with your network connection to make it very low latency where the speeds rival local operations, then it would likely be fine to use the exact same settings as local operations.
What I mean is that nobody ever uses sync operations locally - writes are always buffered unless the app does an fsync, and data will sit in that buffer much longer than it does on the network.
Les Mikesell wrote:
On Tue, Jul 17, 2012 at 4:33 AM, Johnny Hughes johnny@centos.org wrote:
I always wondered why the default for nfs was ever sync in the first place. Why shouldn't it be the same as local use of the filesystem? The few things that care should be doing fsync's at the right places anyway.
Well, the reason would be that LOCAL operations complete massively faster (by factors of hundreds or thousands of times) than operations that take place via NFS on a normal network.
I would also think that, historically speaking, networks used to be noisier, and more prone to dropping things on the floor (watch out for the bitrot in the carpet, all those bits get into it, y'know...), and so it was for reliability of data. <snip>
What I mean is that nobody ever uses sync operations locally - writes are always buffered unless the app does an fsync, and data will sit in that buffer much longer than it does on the network.
But unless the system goes down, that data *will* get written. As I said in what I think was my previous post on this subject, I do have concerns about data security when it might be the o/p of a job that's been running for days.
mark
On Tue, Jul 17, 2012 at 8:27 AM, m.roth@5-cent.us wrote:
I always wondered why the default for nfs was ever sync in the first place. Why shouldn't it be the same as local use of the filesystem? The few things that care should be doing fsync's at the right places anyway.
Well, the reason would be that LOCAL operations complete massively faster (by factors of hundreds or thousands of times) than operations that take place via NFS on a normal network.
I would also think that, historically speaking, networks used to be noisier, and more prone to dropping things on the floor (watch out for the bitrot in the carpet, all those bits get into it, y'know...), and so it was for reliability of data.
How many apps really expect the status of every write() to mean they have a recoverable checkpoint?
What I mean is that nobody ever uses sync operations locally - writes are always buffered unless the app does an fsync, and data will sit in that buffer much longer than it does on the network.
But unless the system goes down, that data *will* get written.
But the thing with the spinning disks is the thing that will go down. Not much reason for a network to break - at least since people stopped using thin coax.
As I said in what I think was my previous post on this subject, I do have concerns about data security when it might be the o/p of a job that's been running for days.
It is a rare application that can recover (or expects to) without losing any data from a random disk write. In fact it would be a foolish application that expects that, since it isn't guaranteed to be committed to disk locally without an fsync. Maybe things like link and rename that applications use as atomic checkpoints in the file system need it. These days wouldn't it be better to use one of the naturally-distributed and redundant databases (riak, cassandra, mongo, etc.) for big jobs instead of nfs filesystems anyway?
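To make "fsync at the right places" concrete, here is a minimal sketch of the usual write-temp-then-rename checkpoint pattern an application would use (Python purely for illustration; the function and file names are made up):

    import os

    def checkpoint(path, data):
        # Write new contents to a temporary file, force them to stable
        # storage, then atomically replace the old file and make the
        # rename itself durable.
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # file data hits the disk
        os.rename(tmp, path)       # atomic swap: readers see old or new, never a torn file
        dirfd = os.open(os.path.dirname(path) or ".", os.O_DIRECTORY)
        try:
            os.fsync(dirfd)        # the directory entry change hits the disk too
        finally:
            os.close(dirfd)

An application that skips this and relies on bare write() calls is already at the mercy of the local page cache, which is the point being made about the sync export default.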
On Tuesday, July 17, 2012 12:28:00 PM Les Mikesell wrote:
But the thing with the spinning disks is the thing that will go down. Not much reason for a network to break - at least since people stopped using thin coax.
Just a few days ago I watched a facility's switched network go basically 'down' due to a jabbering NIC. A power cycle of the workstation in question fixed the issue. The network was a small one, using good midrange vendor 'C' switches. All VLANs on all switches got flooded; the congestion was so bad that only one out of every ten pings would get a reply, from any station to any other station, except on the switches more than one switch away from the jabbering workstation.
Jabbering, of course, being a technical term..... :-)
While managed switches with a dedicated management VLAN are good, when the traffic in question overwhelms the control plane things get unmanaged really quickly. COPP isn't available on these particular switches, unfortunately.
On 07/19/2012 06:31 AM, Lamar Owen wrote:
On Tuesday, July 17, 2012 12:28:00 PM Les Mikesell wrote:
But the thing with the spinning disks is the thing that will go down. Not much reason for a network to break - at least since people stopped using thin coax.
Just a few days ago I watched a facility's switched network go basically 'down' due to a jabbering NIC. A power cycle of the workstation in question fixed the issue. The network was a small one, using good midrange vendor 'C' switches. All VLANs on all switches got flooded; the congestion was so bad that only one out of every ten pings would get a reply, from any station to any other station, except on the switches more than one switch away from the jabbering workstation.
Jabbering, of course, being a technical term..... :-)
While managed switches with a dedicated management VLAN are good, when the traffic in question overwhelms the control plane things get unmanaged really quickly. COPP isn't available on these particular switches, unfortunately.
Just two weeks ago I had a similar issue with a broadband modem repeatedly restarting itself - it flooded our network and all our VPNs with "jabbering" (TM) and basically left us in an unworkable situation until we got someone on site.
On Wed, Jul 18, 2012 at 1:31 PM, Lamar Owen lowen@pari.edu wrote:
On Tuesday, July 17, 2012 12:28:00 PM Les Mikesell wrote:
But the thing with the spinning disks is the thing that will go down. Not much reason for a network to break - at least since people stopped using thin coax.
Just a few days ago I watched a facility's switched network go basically 'down' due to a jabbering NIC. <snip>
Sure, everything can break and most will sometime, but does this happen often enough that you'd want to slow down all of your network disk writes by an order of magnitude on the odd chance that some app really cares about a random write that it didn't bother to fsync?
On Wednesday, July 18, 2012 03:31:53 PM Les Mikesell wrote:
Sure, everything can break and most will sometime, but does this happen often enough that you'd want to slow down all of your network disk writes by an order of magnitude on the odd chance that some app really cares about a random write that it didn't bother to fsync?
For some applications, yes, that is exactly what I would want to do. It depends upon whether performance is more or less important than reliability.
On Thu, Jul 19, 2012 at 12:06 PM, Lamar Owen lowen@pari.edu wrote:
On Wednesday, July 18, 2012 03:31:53 PM Les Mikesell wrote:
Sure, everything can break and most will sometime, but does this happen often enough that you'd want to slow down all of your network disk writes by an order of magnitude on the odd chance that some app really cares about a random write that it didn't bother to fsync?
For some applications, yes, that is exactly what I would want to do. It depends upon whether performance is more or less important than reliability.
I realize that admins often have to second-guess badly designed things, but shouldn't the application make that decision itself and fsync at the points where restarting is possible or useful? Done at the admin level, it becomes a mount-point choice rather than a per-application setting.
On 11.07.2012 00:58, Gé Weijers wrote:
It may not be a bug, it may be that RHEL 6.x implements I/O barriers correctly, which slows things down but keeps you from losing data....
Which is of course no excuse for not even responding to a support request. "It's not a bug, it's a feature" may not be the response the client wants to hear, but it's much better than no response at all.
Jm2c
On 11/07/12 00:18, m.roth@5-cent.us wrote:
For any redhatters on the list, I'm going to be reopening this bug today.
I am also VERY unhappy with Redhat. I filed the bug months ago, and it was *never* assigned - no one apparently even looked at it. It's a show-stopper for us, since it hits us on our home directory servers.
Out of curiosity, do you have a Red Hat subscription with Standard or better support? The SLAs for even a severity 4 issue should have got you a response within 2 business days.
https://access.redhat.com/support/offerings/production/sla.html
Did you give them a call?
If you are just using the Red Hat bugzilla, that might be your problem. I've heard a rumour that Red Hat doesn't really monitor that channel, giving preference to issues raised through their customer portal. That does make _some_ commercial sense, but if so, it would be polite to shut down the old bugzilla service and save some frustration. I don't have a Red Hat subscription myself, so I can't really test this. Can anyone, perhaps with a subscription, shed any light on this?
It occurs to me that I might be hijacking a thread here, so apologies if that is the case.
Cheers,
Kal