Hi All, I have a CentOS 5 box serving NFSv3 shares from an LSI MegaRAID card. The box has been going up and down for about a week and I'm trying to figure out what's wrong. I found a syslog message today about "APIC error on CPU", and after rebooting with the noapic kernel option I now get this:
# cat /var/log/kernel | grep BUG
Mar 17 09:51:05 ofdmz kernel: BUG: soft lockup - CPU#0 stuck for 10s! [migration/0:2]
Mar 17 09:52:21 ofdmz kernel: BUG: soft lockup - CPU#0 stuck for 10s! [ssh:3491]
Does anyone know what this means? I found a thread* from 2006 on this list that suggests updating the BIOS, but I thought I would post early in case that doesn't fix it.
Thanks
[*] - http://lists.centos.org/pipermail/centos/2006-June/023933.html
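
For anyone collecting the same messages, here is a rough sketch (not something from the original post) that tallies soft-lockup reports per CPU and task, to show whether the lockups always hit the same CPU. The function name tally_lockups and the regular expression are illustrative assumptions based only on the two log lines quoted above; other kernels may format the message differently.

```python
import re
from collections import Counter

# Pattern derived from the two log lines quoted in this thread, e.g.
#   ... kernel: BUG: soft lockup - CPU#0 stuck for 10s! [migration/0:2]
# The trailing bracket holds "task name:pid".
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>.+):(?P<pid>\d+)\]"
)


def tally_lockups(lines):
    """Count soft-lockup reports per (cpu number, task name)."""
    counts = Counter()
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            counts[(int(match.group("cpu")), match.group("task"))] += 1
    return counts


if __name__ == "__main__":
    # The two lines from the post; to scan a real log, pass an open file
    # instead (the poster's log lives at /var/log/kernel).
    sample = [
        "Mar 17 09:51:05 ofdmz kernel: BUG: soft lockup - CPU#0 stuck for 10s! [migration/0:2]",
        "Mar 17 09:52:21 ofdmz kernel: BUG: soft lockup - CPU#0 stuck for 10s! [ssh:3491]",
    ]
    for (cpu, task), n in sorted(tally_lockups(sample).items()):
        print(f"CPU#{cpu} {task}: {n}")
```

If every report names the same CPU, that may point toward a problem with that CPU or with the interrupts routed to it, though a tally like this only narrows things down and proves nothing by itself.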
Try upgrading to the latest kernel.
-----Original Message-----
From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of tblader
Sent: Tuesday, March 17, 2009 9:05 AM
To: CentOS mailing list
Subject: [CentOS] syslog: CPU stuck for 10s!
<snip>
On 03/17/2009 02:38 PM, Martin Suehowicz wrote:
Try upgrading to the latest kernel.
Hi, I believe I've got the latest already:
uname -a
Linux ofdmz.localdomain 2.6.18-92.1.22.el5 #1 SMP Tue Dec 16 12:03:43 EST 2008 i686 athlon i386 GNU/Linux
tblader wrote:
<snip>
My first guess is a problem with irqbalance, but that wouldn't explain the crashes.
It's possible you have a misbehaving CPU, maybe one that is overheating, since the box is crashing. Bad thermal grease, perhaps.
Can you look at the temperatures?
On 03/17/2009 05:19 PM, Agile Aspect wrote: <snip>
My first guess is a problem with irqbalance, but that wouldn't explain the crashes.
Interestingly, I stopped irqbalance yesterday afternoon and it ran all night just fine.
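
One way to see what irqbalance is or isn't doing is to compare per-CPU interrupt totals from /proc/interrupts with the daemon running and stopped. The sketch below is only an illustration: the function name per_cpu_interrupt_totals is made up, and the sample text is hypothetical, not output from this box.

```python
def per_cpu_interrupt_totals(text):
    """Sum the per-CPU counters found in /proc/interrupts-style text.

    The first line is expected to list the CPU columns (CPU0, CPU1, ...).
    Rows with fewer numeric counters than CPUs (for example ERR: or MIS:
    on a multi-CPU box) are skipped. On a single-CPU box those rows cannot
    be told apart from per-CPU rows and would be counted too.
    """
    lines = text.strip().splitlines()
    cpus = lines[0].split()
    totals = dict.fromkeys(cpus, 0)
    for line in lines[1:]:
        # Drop the "IRQ:" label, keep what follows the first colon.
        _, _, rest = line.partition(":")
        fields = rest.split()[:len(cpus)]
        if len(fields) < len(cpus) or not all(f.isdigit() for f in fields):
            continue
        for cpu, count in zip(cpus, fields):
            totals[cpu] += int(count)
    return totals


if __name__ == "__main__":
    # Hypothetical sample; on a real system read /proc/interrupts instead:
    #   with open("/proc/interrupts") as f: text = f.read()
    sample = """\
           CPU0       CPU1
  0:       1000          5    IO-APIC-edge  timer
 14:        200        300    IO-APIC-edge  ide0
NMI:          0          0
ERR:          7
"""
    for cpu, total in per_cpu_interrupt_totals(sample).items():
        print(cpu, total)
```

Running it twice a few minutes apart and comparing the totals shows roughly where new interrupts are landing.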
It's possible you have a misbehaving CPU, maybe one that is overheating, since the box is crashing. Bad thermal grease, perhaps.
Can you look at the temperatures?
I've reinstalled using the x86_64 distro and kernel 2.6.18-92.1.22.el5.centos.plus. gkrellm reports a CPU temperature of 104°F (40°C), which seems accurate: the heat pipes on the CPU cooler are just warm.