I am seeing ``APIC error on CPU3: 60(60)'' warnings from dmesg periodically on a CentOS 5.4 box, kernel 2.6.18-164.11.1.el5. The CPU is an Intel(R) Atom(TM) CPU 330 @ 1.60GHz. I am not a hardware type, and don't have a clue what this means.
This is occurring while an rsync-3.0.4 process is receiving data sent by a machine running rsync-3.0.7 (I just updated the CentOS box to rsync-3.0.7 since noticing that it was a bit dated). This is the only significant load on this machine at this time.
This machine has locked up twice, requiring a hard reset each time, while this rsync process was running at night. It has had no problems at other times.
Any suggestions?
Thanks.
Bill
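For anyone wondering what the numbers in that message mean: 2.6-era kernels print the local APIC's Error Status Register in hexadecimal, once as first read and once as read again after the register is written. The sketch below decodes such a line. The bit names follow the comment next to the kernel's APIC error handler; the helper name `decode_apic_error` is made up for illustration, and the bit table should be checked against the kernel source or Intel's manual for the hardware in question.

```python
import re

# Bit meanings of the local APIC Error Status Register, as listed in the
# comment beside the kernel's APIC error handler (assumption: bit 4 is
# reserved on this hardware generation).
ESR_BITS = {
    0: "send checksum error",
    1: "receive checksum error",
    2: "send accept error",
    3: "receive accept error",
    4: "reserved",
    5: "send illegal vector",
    6: "received illegal vector",
    7: "illegal register address",
}

def decode_apic_error(line):
    """Hypothetical helper: return (cpu, [flag names]) for one dmesg line,
    or None if the line is not an APIC error message."""
    m = re.search(r"APIC error on CPU(\d+): ([0-9a-fA-F]+)\(([0-9a-fA-F]+)\)", line)
    if m is None:
        return None
    cpu = int(m.group(1))
    value = int(m.group(2), 16)  # the kernel prints the register in hex
    flags = [name for bit, name in sorted(ESR_BITS.items()) if value & (1 << bit)]
    return cpu, flags

print(decode_apic_error("APIC error on CPU3: 60(60)"))
# → (3, ['send illegal vector', 'received illegal vector'])
```

Under those assumptions, 60 means bits 5 and 6 are set, so the complaint is about an illegal interrupt vector being sent and received. That would point at interrupt delivery rather than at rsync itself.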
On Mon, 2010-03-15 at 19:13 -0700, Bill Campbell wrote:
I am seeing ``APIC error on CPU3: 60(60)'' warnings from dmesg periodically on a CentOS 5.4 box, kernel 2.6.18-164.11.1.el5. The CPU is an Intel(R) Atom(TM) CPU 330 @ 1.60GHz. I am not a hardware type, and don't have a clue what this means.
Try "noapic" as a kernel boot parameter. If that doesn't work out, also try "acpi=off".
This is occurring while an rsync-3.0.4 process is receiving data sent by a machine running rsync-3.0.7 (I just updated the CentOS box to rsync-3.0.7 since noticing that it was a bit dated). This is the only significant load on this machine at this time.
Maybe you're running out of kernel threads, and/or the APIC can't distribute interrupts across the CPUs. Or the APIC doesn't like your motherboard/CPU under stress.
This machine has locked up twice, requiring a hard reset each time, while this rsync process was running at night. It has had no problems at other times.
Most of the hard lockups I have seen came from interrupt distribution problems or failing hardware.
John
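On CentOS 5 the parameter would normally be appended to the `kernel` line in /boot/grub/grub.conf, followed by a reboot. A small sketch for confirming afterwards that the running kernel actually saw it: on a live system the string would come from /proc/cmdline. The sample command line below, including the root device, is made up for illustration, and the function names are hypothetical.

```python
def boot_params(cmdline):
    """Split a kernel command line into a {name: value-or-None} dict."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params

def apic_workarounds(cmdline):
    """Report which of the two suggested workarounds are active."""
    params = boot_params(cmdline)
    return {
        "noapic": "noapic" in params,
        "acpi=off": params.get("acpi") == "off",
    }

# Made-up sample; on a real box use: open("/proc/cmdline").read()
sample = "ro root=/dev/VolGroup00/LogVol00 rhgb quiet noapic"
print(apic_workarounds(sample))
# → {'noapic': True, 'acpi=off': False}
```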
thus JohnS spake:
Try "noapic" as a kernel boot parameter. If that doesn't work out, also try "acpi=off".
Hi,
just jumpin' in: I too have an Atom-based machine which runs *rock solid* with ``noapic'' as parameter, and crashes without.
However, I've got another machine based on exactly the same hardware (board, CPU, memory, HD, everything) and the same BIOS config -- running flawlessly without the parameter given.
Maybe you're running out of kernel threads, and/or the APIC can't distribute interrupts across the CPUs. Or the APIC doesn't like your motherboard/CPU under stress.
My impression was that it was not load (I tortured both machines running BOINC for a few weeks) but traffic. Thus, I suspect the (on board) NIC to be a bit... crappy (IIRC it was Realtek)? I've always wanted to test it with a reasonable NIC.
Timo
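One way to look at the interrupt-distribution theory versus the NIC theory is to watch how the NIC's interrupt counts are spread over the CPUs in /proc/interrupts, with and without ``noapic''. The sketch below parses that file's layout. The sample text and its numbers are invented for illustration, `eth0` is an assumed interface name, and the function names are hypothetical.

```python
# Invented sample in the layout of /proc/interrupts on a four-thread box.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
  0:    5000000          0          0          0    IO-APIC-edge  timer
 66:     900000        100          0          0         PCI-MSI  eth0
NMI:          0          0          0          0
ERR:          0
"""

def parse_interrupts(text):
    """Return {irq label: (per-CPU counts, description)}."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row names one column per CPU
    rows = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        tokens = rest.split()
        counts = []
        # Leading numeric tokens are counters; some rows (e.g. ERR) have
        # fewer counters than there are CPUs.
        while tokens and len(counts) < ncpu and tokens[0].isdigit():
            counts.append(int(tokens.pop(0)))
        rows[label.strip()] = (counts, " ".join(tokens))
    return rows

def counts_for(text, device):
    """Per-CPU interrupt counts for the first row mentioning `device`."""
    for counts, desc in parse_interrupts(text).values():
        if device in desc:
            return counts
    return None

print(counts_for(SAMPLE, "eth0"))
# → [900000, 100, 0, 0]
```

Taking two snapshots a few seconds apart during a transfer and subtracting them would show which CPU is actually servicing the NIC at that moment.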
If it helps I'm experiencing a *very* similar problem with all Atom N270 based company netbooks (Lenovo S10e) with openSUSE 11.2 and Kernel 2.6.31, as well as openSUSE 11.1 and Kernel 2.6.27. Putting load on the NIC works fine until I start rsync. Literally everything else allows me to put load on it - ftp, scp, downloading large files via HTTP, copying via NFS or SMB, etc. But as soon as I start rsync it locks up. It doesn't even start to transfer anything - the NIC instantly dies and more often than not takes the entire system down with it. Every now and then instead of a full lockup I only get a dead NIC. Killing the rsync process hard (-9) and restarting the network often helps.
I realize that this is a very different software environment but I was about to try CentOS on that baby next. I'll try the noapic option when I get back to my office but it's interesting how such different environments seem to produce similar errors. I guess what I'm trying to say is that it's probably not a CentOS-specific issue we're dealing with here.
Martin
thus Martin Jungowski spake:
If it helps I'm experiencing a *very* similar problem with all Atom N270 based company netbooks (Lenovo S10e) with openSUSE 11.2 and Kernel 2.6.31, as well as openSUSE 11.1 and Kernel 2.6.27. Putting load on the NIC works fine until I start rsync. Literally everything else allows me to put load on it - ftp, scp, downloading large files via HTTP, copying via NFS or SMB, etc. But as soon as I start rsync it locks up.
For me it dies on any kind of traffic, not just rsync. I even saw this fetching ISOs using FTP. The machine just died (network-wise; I don't have console on this machine in the data centre, but I do have a machine of this type here in the office, so I could build a test setup with KVM).
On the machine that runs stable for weeks now
# uptime
 09:51:33 up 47 days, 47 min, 1 user, load average: 0.14, 0.11, 0.09
I've got TOR running, which uses a sustained bandwidth of about three Mbit/s (which is not that much; it had about ten Mbit/s for quite a while with an additional FreeNet daemon running).
Timo
On Tue, 2010-03-16 at 09:53 +0100, Timo Schoeler wrote:
For me it dies on any kind of traffic, not just rsync. I even saw this fetching ISOs using FTP. The machine just died (network-wise; I don't have console on this machine in the data centre, but I do have a machine of this type here in the office, so I could build a test setup with KVM).
On the machine that runs stable for weeks now
# uptime
 09:51:33 up 47 days, 47 min, 1 user, load average: 0.14, 0.11, 0.09
What brand of NIC do you have, and is the BIOS version different or the same? I'm just wondering, since you guys are having problems, because I have been considering buying an Atom-based machine.
John
thus JohnS spake:
What brand of NIC do you have, and is the BIOS version different or the same? I'm just wondering, since you guys are having problems, because I have been considering buying an Atom-based machine.
I can't tell you the exact BIOS version I'm running, but I presume it's the same on both hosts (the one freaking out and the other running rock solid); the NIC is
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
Board is Intel D945GLF2, CPU is Intel Atom CPU 330@1.60GHz stepping 02
HTH,
Timo
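Rather than presuming that two boxes match, one could dump the same hardware report on each (for example the output of dmidecode or lspci saved to a file) and diff the two. A minimal sketch with Python's difflib follows. The two report strings and the BIOS labels `version-X` and `version-Y` are placeholders invented for illustration, not values from either machine, and `diff_reports` is a hypothetical helper.

```python
import difflib

def diff_reports(a_text, b_text, a_name="host-a", b_name="host-b"):
    """Return unified-diff lines between two text reports (no context lines)."""
    return list(difflib.unified_diff(
        a_text.splitlines(), b_text.splitlines(),
        fromfile=a_name, tofile=b_name, lineterm="", n=0))

# Placeholder reports; in practice, read the saved command output of each host.
host_a = "Board: D945GLF2\nBIOS: version-X\nNIC: RTL8111/8168B (rev 02)"
host_b = "Board: D945GLF2\nBIOS: version-Y\nNIC: RTL8111/8168B (rev 02)"

for line in diff_reports(host_a, host_b):
    print(line)
```

An empty result means the two reports are identical line for line; any `-`/`+` pair is a difference worth ruling out before blaming the APIC.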
On 03/16/2010 07:17 AM Timo Schoeler wrote:
I can't tell you the exact BIOS version I'm running, but I presume it's the same on both hosts (the one freaking out and the other running rock solid); the NIC is
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
Board is Intel D945GLF2, CPU is Intel Atom CPU 330@1.60GHz stepping 02
Timo, just to be clear: are the specs cited above for the machine that works or for the one with the problem?
tnx
thus ken spake:
Timo, just to be clear: are the specs cited above for the machine that works or for the one with the problem?
Both. I've got two totally identical machines running with the specs mentioned above. One of them runs rock solid without the ``noapic'' kernel boot argument; the other one crashes without it. With ``noapic'' set, it also runs rock solid. The OS on both is CentOS 5.4 x86_64 with all updates applied.
Timo
On Tue, Mar 16, 2010, Timo Schoeler wrote:
just jumpin' in: I too have an Atom-based machine which runs *rock solid* with ``noapic'' as parameter, and crashes without.
However, I've got another machine based on exactly the same hardware (board, CPU, memory, HD, everything) and the same BIOS config -- running flawlessly without the parameter given.
We have four boxes in small chassis (micro-atx?) with Atom processors that are having no problems. These machines are basically gateway boxes for small businesses and do OpenVPN tunnels inter-connecting three offices in Texas and one in Missouri.
The box in question is in a larger chassis that doesn't require a low-profile NIC. It's several months newer than the others so I don't know if they're the same main board.
My impression was that it was not load (I tortured both machines running BOINC for a few weeks) but traffic. Thus, I suspect the (on board) NIC to be a bit... crappy (IIRC it was Realtek)? I've always wanted to test it with a reasonable NIC.
This shouldn't be on the on-board RealTek NIC, but on the Intel that's in a regular slot. On the other hand, when I look at the dmesg output it appears that it's the RealTek on the public NIC.
FWIW, after I updated this to rsync-3.0.7 yesterday afternoon, I restarted the rsync using -vP to monitor it, and it has been transferring without a glitch for 15 hours now.
Bill
thus Bill Campbell spake:
FWIW, after I updated this to rsync-3.0.7 yesterday afternoon, I restarted the rsync using -vP to monitor it, and it has been transferring without a glitch for 15 hours now.
However, I'm really convinced that an application/daemon (rsync in this case) should NOT be able to crash the entire system.
Timo