I've been seeing some random Proliant DL380 G4 64bit crashes. Each time, there are messages on the console relating to jbd2/cciss and something about a wait for 120 seconds. Is anybody else seeing anything like this? Oddly, I can't seem to find this in the logs. I guess the system can't write to them when this happens.
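The "wait for 120 seconds" message is most likely the kernel's hung-task warning (`INFO: task <name>:<pid> blocked for more than 120 seconds.`). The kernel prints it when a task, here presumably the jbd2 journal thread for a cciss volume, stays blocked longer than `/proc/sys/kernel/hung_task_timeout_secs` (120 by default). Because the message may never reach the local logs, it would have to be captured from the console, for example over a serial console. Below is a minimal Python sketch that scans such a capture. It assumes the 2.6.32-era wording of the warning. The sample line, PID, device name `cciss!c0d0p1`, and helper names are hypothetical.

```python
import re

# Assumed hung-task warning format (2.6.32-era wording):
# "INFO: task <comm>:<pid> blocked for more than <n> seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds\."
)


def find_hung_tasks(lines):
    """Return (comm, pid, seconds) for every hung-task warning in lines."""
    hits = []
    for line in lines:
        m = HUNG_TASK_RE.search(line)
        if m:
            hits.append((m.group("comm"), int(m.group("pid")), int(m.group("secs"))))
    return hits


def is_cciss_journal_hang(comm):
    """True if the blocked task looks like a jbd2 journal thread on a cciss device."""
    # Kernel thread names replace '/' in device names with '!', e.g. "cciss!c0d0p1"
    return comm.startswith("jbd2/cciss!")


if __name__ == "__main__":
    # Hypothetical console capture; PID and device name are made up for illustration
    console = [
        "INFO: task jbd2/cciss!c0d0p1-8:412 blocked for more than 120 seconds.",
        '"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.',
    ]
    for comm, pid, secs in find_hung_tasks(console):
        print(comm, pid, secs, is_cciss_journal_hang(comm))
```

Writing 0 to `hung_task_timeout_secs` only silences the warning. It does not unblock the I/O, so the sketch is useful for confirming the symptom, not for fixing it.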
Starting last June, I had the same issue on an HP Proliant DL785. There are 2 bugs at Red Hat about it: https://bugzilla.redhat.com/show_bug.cgi?id=605444 https://bugzilla.redhat.com/show_bug.cgi?id=615543
But I could not find a stable configuration, even with HP cciss driver 3.6.28-12, which was supposed to solve the I/O hangs: http://h20566.www2.hp.com/portal/site/hpsc/template.PAGE/public/psi/swdDetai...
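To check whether a box already runs at least the HP cciss driver release mentioned above (3.6.28-12), you can compare the running driver's version string against that release. Below is a minimal Python sketch. It assumes versions of the form `X.Y.Z` or `X.Y.Z-N` and treats a missing `-N` as release 0. Obtaining the string is left outside the code; `modinfo -F version cciss` is one option, but whether HP's build reports exactly this format is an assumption. The helper names are hypothetical.

```python
import re


def parse_version(v):
    """Split '3.6.28-12' into (3, 6, 28, 12); a missing '-N' counts as 0 (assumption)."""
    m = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:-(\d+))?", v.strip())
    if not m:
        raise ValueError(f"unrecognised version string: {v!r}")
    major, minor, patch, release = m.groups()
    return (int(major), int(minor), int(patch), int(release or 0))


def at_least(installed, required="3.6.28-12"):
    """True if the installed driver version is >= the required one (tuple comparison)."""
    return parse_version(installed) >= parse_version(required)


if __name__ == "__main__":
    for v in ("3.6.26", "3.6.28-12", "3.6.28-14"):
        print(v, at_least(v))
```

As the thread goes on to show, a new enough driver alone was not sufficient. The workaround reportedly only helps once the controller and related firmware are also up to date.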
In September, I upgraded to SL6, and there has been no crash since.
-- Pierre-François Honoré
2011/12/17 John Hinton webmaster@ew3d.com
I've been seeing some random Proliant DL380 G4 64bit crashes. [...]
-- John Hinton 877-777-1407 ext 502 http://www.ew3d.com Comprehensive Online Solutions
CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
If you follow the cited Bugzillas, you'll see that you *must* upgrade your HP firmware too (for everything(!!), particularly RAID controllers, SAS expanders, etc.) to the absolute latest release. [Note: the updates on the 9.30 ISO are *not* recent enough, btw.] Then you need the latest kernel version, which has a work-around in the cciss / hpsa driver.
HTH
-rak-
On 12/18/2011 2:22 PM, Richard Karhuse wrote:
If you follow the cited Bugzillas, you'll see that you *must* upgrade your HP firmware too [...]
Thanks. I have already started down the firmware path. This is irritating! 15 years of solid reliability out of Proliant products and then suddenly this! :( I'm starting to wonder if the Linux kernel is just trying to do too many things... geez... (Isn't that what Windows does?) Maybe there is a need for a server kernel that could be a simplified version of a desktop or full kernel? Then again, I have no insight into what led to this... perhaps it was introduced by the server-side features.
So, by "latest kernel", I suppose you don't mean the latest CentOS 6.1 kernel? If not, does anyone know whether the fix is in any kernel provided by upstream, and whether it will soon be available under CentOS? For instance, in 6.2, which seems to be just around the corner?
Upstream seemed to blame it on their upstream, or on the kernel. The cases I found were closed despite there being no good resolution. There has to be a ton of Proliant stuff out there. Actually, HP seems to have a lot of gaps in its RH6 support and offers only RH5 versions of many of these firmware updates. I did successfully run HP's RH5 firmware updates on an RH6 box, but I'm not so happy about taking chances like that.
Or worse... perhaps we are starting to see a degradation under HP ownership compared with the fine products that Compaq created? I certainly hope not!
Meanwhile, I guess I'll sit back and wait to see if what I have done is enough.
On Sun, Dec 18, 2011 at 3:21 PM, John Hinton webmaster@ew3d.com wrote:
[...] I'm starting to wonder if the Linux kernel is just trying to do too many things... [...]
The problem is *not* the Linux kernel; it's the HP firmware. Look at the kernel changes and you'll see where the kernel is working around the HP FW.
Note: Some of the firmware upgrades *require* that the box and disks/MSAs be power cycled (as in, you must pull the power cord!) for the FW upgrade to take effect. If you don't do that, the new FW isn't what's being used... (but then, I assume most folks realise that about FW upgrades...)
So, by "latest kernel", I suppose that would not be the latest CentOS 6.1 kernel? If not, does anyone know if it is in any kernel provided by upstream and if it will soon be available under CentOS? For instance 6.2 that seems to be just around the corner?
The latest kernel in the channel should have the "fix" (aka work-around) in it. Of course, it is not effective unless the corresponding FW patch has also been applied. You have to be very diligent: find the FWs on the HP site and get the very latest. Not sure about G4s, but on G6s the motherboard FW upgrade was also important (and is not part of 9.30).
HTH.
-rak-
On 12/18/2011 3:44 PM, Richard Karhuse wrote:
The problem is *not* the Linux kernel; it's the HP firmware. [...]
Richard,
After hours of Googling, I find you have summarized the issues and the resolution clearly. The Bugzillas hopped around, with too much bad information interspersed between the few good bits. I will now assume that those who said this didn't work either downloaded too old a firmware update or didn't go far enough. Thank you very much!