Matt Garman wrote:
On Thu, Oct 27, 2016 at 12:03 AM, Larry Martell larry.martell@gmail.com wrote:
<snip>
On Thu, Oct 27, 2016 at 3:05 AM, Larry Martell larry.martell@gmail.com wrote:
Well, I spoke too soon. The importer (the one that was initially hanging, which I came here to fix) hung after running for 20 hours. There were no NFS errors or messages on either the client or the server. When I restarted it, it hung after 1 minute. I restarted it again and it hung after 20 seconds. After that, every time I restarted it, it hung immediately. Still no NFS errors or messages. I tried running the process on the server and it worked fine. So I have to believe this is related to nobarrier. Tomorrow I will try removing that setting, but I am no closer to solving this and I have to leave Japan Saturday :-(
The bad disk still has not been replaced - that is supposed to happen tomorrow, but I won't have enough time after that to draw any conclusions.
I've seen behavior like that with disks that are on their way out...
<snip> I just had a truly unpleasant thought, speaking of disks. Years ago, we tried some WD Green drives in our servers, and that was a disaster. Within days to weeks, the drives would go offline. I finally found out what happened: consumer-grade drives are intended for desktops, and the TLER (time-limited error recovery) timeout - how long the drive keeps trying to read or write to a sector before giving up, marking the sector bad, and going somewhere else - is two *minutes*. Our servers were expecting the TLER to be 7 *seconds* or under. Any chance the client cheaped out on any of the drives?
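For anyone who wants to check this on their own drives: smartmontools exposes the timeout as SCT Error Recovery Control, queried with `smartctl -l scterc <device>` and set with `smartctl -l scterc,READ,WRITE <device>`, where the values are in tenths of a second. Below is a minimal Python sketch that only builds those command lines. The helper names and the `/dev/sda` device are hypothetical, and it assumes smartctl is installed and the drive supports SCT ERC at all (many consumer drives do not).

```python
def scterc_query_command(device):
    """Build the smartctl command that reports the current SCT ERC timeouts."""
    return ["smartctl", "-l", "scterc", device]


def scterc_set_command(device, read_seconds=7.0, write_seconds=7.0):
    """Build the smartctl command that sets the SCT ERC read/write timeouts.

    smartctl takes the values in units of 100 ms (deciseconds),
    so 7 seconds becomes 70.
    """
    read_ds = round(read_seconds * 10)
    write_ds = round(write_seconds * 10)
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]


if __name__ == "__main__":
    # Hypothetical device name; print the commands rather than running them.
    print(" ".join(scterc_query_command("/dev/sda")))
    print(" ".join(scterc_set_command("/dev/sda")))
```

To actually run one of these, pass the list to `subprocess.run` as root. On many drives the setting does not survive a power cycle, so it may need to be reapplied at boot.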
mark