APIC error on Intel Atom CPU, CentOS 5.x


Bill Campbell

16 Mar 2010

2:13 a.m.

I am seeing ``APIC error on CPU3: 60(60)'' warnings from dmesg periodically on a CentOS 5.4 box, kernel 2.6.18-164.11.1.el5. The CPU is an Intel(R) Atom(TM) CPU 330 @ 1.60GHz. I am not a hardware type, and don't have a clue what this means.
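For readers who, like Bill, are unsure what the numbers in the message mean: as far as I know, the kernel prints the local APIC Error Status Register (ESR) in hexadecimal, once as read before the register is cleared and once after. The following is only a sketch for decoding such a value. The bit names are taken from the comment in the 2.6-era kernel's APIC error interrupt handler and should be treated as an assumption to check against the source of the kernel actually in use; `decode_apic_esr` is a hypothetical helper name.

```python
# Bit meanings as listed in the comment of the 2.6-era kernel's APIC error
# interrupt handler. Assumption: verify against your own kernel source.
ESR_BITS = {
    0: "Send CS error",
    1: "Receive CS error",
    2: "Send accept error",
    3: "Receive accept error",
    4: "Reserved",
    5: "Send illegal vector",
    6: "Received illegal vector",
    7: "Illegal register address",
}


def decode_apic_esr(hex_value):
    """Return the names of the bits set in a hexadecimal ESR value."""
    value = int(hex_value, 16)
    return [name for bit, name in sorted(ESR_BITS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    # "60" is the value from the dmesg line quoted in this thread.
    for name in decode_apic_esr("60"):
        print(name)
```

Under that reading, 0x60 has bits 5 and 6 set, which would point at interrupt vectors rather than at a checksum or register-access problem. It does not by itself say which device or driver is responsible.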

This is occurring while an rsync-3.0.4 process is receiving data sent by a machine running rsync-3.0.7 (I just updated the CentOS box to rsync-3.0.7 since noticing that it was a bit dated). This is the only significant load on this machine at this time.

This machine has locked up twice, requiring a hard reset each time, while this rsync process was running at night. There have been no problems at other times.

Any suggestions?

Thanks.

Bill

--
INTERNET: bill@celestial.com    Bill Campbell; Celestial Software LLC
URL: http://www.celestial.com/  PO Box 820; 6641 E. Mercer Way
Voice: (206) 236-1676           Mercer Island, WA 98040-0820
Fax: (206) 232-9186             Skype: jwccsllc (206) 855-5792

It is better to die on your feet than to live on your knees! -- Emiliano Zapata.


JohnS

16 Mar

7:21 a.m.

On Mon, 2010-03-15 at 19:13 -0700, Bill Campbell wrote:

...

I am seeing ``APIC error on CPU3: 60(60)'' warnings from dmesg periodically on a CentOS 5.4 box, kernel 2.6.18-164.11.1.el5. The CPU is an Intel(R) Atom(TM) CPU 330 @ 1.60GHz. I am not a hardware type, and don't have a clue what this means.

Try adding "noapic" to the kernel boot parameters. If that doesn't work out, try "acpi=off".
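A minimal sketch of what adding the parameter could look like, assuming the box boots through GRUB legacy with its `kernel` lines in `/boot/grub/grub.conf`, as a stock CentOS 5 install normally does. The helper `add_kernel_param` and the root device in the sample are hypothetical; editing the file by hand, or pressing `e` at the GRUB menu for a one-off test boot, achieves the same thing.

```python
def add_kernel_param(grub_conf_text, param):
    """Append param to every 'kernel' line that does not already carry it."""
    out = []
    for line in grub_conf_text.splitlines():
        words = line.split()
        if words and words[0] == "kernel" and param not in words[1:]:
            line = line.rstrip() + " " + param
        out.append(line)
    return "\n".join(out) + "\n"


# Sample stanza; the root= device is made up for illustration.
SAMPLE = """\
default=0
timeout=5
title CentOS (2.6.18-164.11.1.el5)
\troot (hd0,0)
\tkernel /vmlinuz-2.6.18-164.11.1.el5 ro root=/dev/VolGroup00/LogVol00
\tinitrd /initrd-2.6.18-164.11.1.el5.img
"""

if __name__ == "__main__":
    print(add_kernel_param(SAMPLE, "noapic"), end="")
```

The function only transforms text and never writes to disk, so the result can be reviewed (and the original file backed up) before anything on the machine changes. The parameter takes effect at the next boot.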

...

This is occurring while an rsync-3.0.4 process is receiving data sent by a machine running rsync-3.0.7 (I just updated the CentOS box to rsync-3.0.7 since noticing that it was a bit dated). This is the only significant load on this machine at this time.

Maybe you're running out of kernel threads, and/or the APIC can't distribute interrupts across the CPUs. Or the APIC doesn't like your motherboard/CPU under stress.
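One way to look at how interrupts are actually spread across the CPUs is `/proc/interrupts`. The sketch below parses text in that format and reports, per interrupt line, the share handled by the busiest CPU. The function names are hypothetical and the sample counts are invented for illustration; they are not taken from any machine in this thread.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into (cpu_names, {label: (counts, description)})."""
    lines = text.strip("\n").splitlines()
    cpus = lines[0].split()
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:len(cpus)]:
            if not field.isdigit():
                break
            counts.append(int(field))
        # Rows such as ERR or MIS carry a single counter; skip them.
        if len(counts) == len(cpus):
            table[label.strip()] = (counts, " ".join(fields[len(cpus):]))
    return cpus, table


def busiest_cpu_share(counts):
    """Fraction of an interrupt's total handled by the single busiest CPU."""
    total = sum(counts)
    return max(counts) / total if total else 0.0


# Invented sample data in the /proc/interrupts layout.
SAMPLE = """
           CPU0       CPU1       CPU2       CPU3
  0:    4000000          0          0          0    IO-APIC-edge  timer
 58:     750000          0          0     250000         PCI-MSI  eth0
ERR:          0
"""

if __name__ == "__main__":
    cpus, table = parse_interrupts(SAMPLE)
    for label, (counts, desc) in table.items():
        print(label, desc, round(busiest_cpu_share(counts), 2))
```

On a live system the input would come from reading `/proc/interrupts` itself. Comparing two snapshots taken a few seconds apart during an rsync run shows which CPU is servicing the NIC at that moment.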

...

This machine has locked up twice, requiring a hard reset each time, while this rsync process was running at night. There have been no problems at other times.

Most of the hard lockups I have seen were caused by interrupt distribution problems or failing hardware.

John

Timo Schoeler

7:27 a.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

thus JohnS spake:

...

On Mon, 2010-03-15 at 19:13 -0700, Bill Campbell wrote:

...
I am seeing ``APIC error on CPU3: 60(60)'' warnings from dmesg periodically on a CentOS 5.4 box, kernel 2.6.18-164.11.1.el5. The CPU is an Intel(R) Atom(TM) CPU 330 @ 1.60GHz. I am not a hardware type, and don't have a clue what this means.

Try adding "noapic" to the kernel boot parameters. If that doesn't work out, try "acpi=off".

Hi,

just jumpin' in: I too have an Atom-based machine which runs *rock solid* with ``noapic'' as parameter, and crashes without.

However, I've got another machine based on exactly the same hardware (board, CPU, memory, HD, everything) and the same BIOS config -- running flawlessly without the parameter given.

...

...
This is occurring while an rsync-3.0.4 process is receiving data sent by a machine running rsync-3.0.7 (I just updated the CentOS box to rsync-3.0.7 since noticing that it was a bit dated). This is the only significant load on this machine at this time.

Maybe you're running out of kernel threads, and/or the APIC can't distribute interrupts across the CPUs. Or the APIC doesn't like your motherboard/CPU under stress.

My impression was that it was not load (I tortured both machines running BOINC for a few weeks) but traffic. Thus, I suspect the (on board) NIC to be a bit... crappy (IIRC it was Realtek)? I've always wanted to test it with a reasonable NIC.

...

...
This machine has locked up twice, requiring a hard reset each time, while this rsync process was running at night. There have been no problems at other times.

Most of the hard lockups I have seen were caused by interrupt distribution problems or failing hardware.

John

Timo


-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) iD8DBQFLnzLYfg746kcGBOwRAhsPAJ0Tm4Tae9aIkL/t9QLElofbUDlUdQCgvs1P 8oQPDwRtzXyPyh9ArKnaCtQ= =BCE2 -----END PGP SIGNATURE-----

Martin Jungowski

7:50 a.m.

If it helps I'm experiencing a *very* similar problem with all Atom N270 based company netbooks (Lenovo S10e) with openSUSE 11.2 and Kernel 2.6.31, as well as openSUSE 11.1 and Kernel 2.6.27. Putting load on the NIC works fine until I start rsync. Literally everything else allows me to put load on it - ftp, scp, downloading large files via HTTP, copying via NFS or SMB, etc. But as soon as I start rsync it locks up. It doesn't even start to transfer anything - the NIC instantly dies and more often than not takes the entire system down with it. Every now and then instead of a full lockup I only get a dead NIC. Killing the rsync process hard (-9) and restarting the network often helps.

I realize that this is a very different software environment, but I was about to try CentOS on that baby next. I'll try the noapic option when I get back to my office, but it's interesting how such different environments seem to produce similar errors. I guess what I'm trying to say is that it's probably not a CentOS-specific issue we're dealing with here.

Martin

-- Rieke Computersysteme GmbH Hellerholz 5 D-82061 Neuried Email: martin@rhm.de

Timo Schoeler

8:53 a.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

thus Martin Jungowski spake:

...

If it helps I'm experiencing a *very* similar problem with all Atom N270 based company netbooks (Lenovo S10e) with openSUSE 11.2 and Kernel 2.6.31, as well as openSUSE 11.1 and Kernel 2.6.27. Putting load on the NIC works fine until I start rsync. Literally everything else allows me to put load on it - ftp, scp, downloading large files via HTTP, copying via NFS or SMB, etc. But as soon as I start rsync it locks up.

For me it dies on any kind of traffic, not just rsync. I even saw this fetching ISOs using FTP. The machine just died (network-wise; I don't have console on this machine in the data centre, but I do have a machine of this type here in the office, so I could build a test setup with KVM).

On the machine that has been running stable for weeks now:

# uptime
 09:51:33 up 47 days, 47 min, 1 user, load average: 0.14, 0.11, 0.09

I've got Tor running, which uses a sustained bandwidth of about three Mbit/s (which is not that much; it had about ten Mbit/s for quite a while with an additional Freenet daemon running).

...

It doesn't even start to transfer anything - the NIC instantly dies and more often than not takes the entire system down with it. Every now and then instead of a full lockup I only get a dead NIC. Killing the rsync process hard (-9) and restarting the network often helps.

I realize that this is a very different software environment, but I was about to try CentOS on that baby next. I'll try the noapic option when I get back to my office, but it's interesting how such different environments seem to produce similar errors. I guess what I'm trying to say is that it's probably not a CentOS-specific issue we're dealing with here.

Martin

Timo


-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) iD8DBQFLn0cMfg746kcGBOwRAgQ4AKCBQGXz3lp+6UdPsUd+GR6RqjJAzgCgr4tB qRwPowIh8EyVX4JppTIpmZk= =qybD -----END PGP SIGNATURE-----

JohnS

11:11 a.m.

On Tue, 2010-03-16 at 09:53 +0100, Timo Schoeler wrote:

...

For me it dies on any kind of traffic, not just rsync. I even saw this fetching ISOs using FTP. The machine just died (network-wise; I don't have console on this machine in the data centre, but I do have a machine of this type here in the office, so I could build a test setup with KVM).

On the machine that has been running stable for weeks now:

# uptime
 09:51:33 up 47 days, 47 min, 1 user, load average: 0.14, 0.11, 0.09

What brand of NIC do you have, and is the BIOS version different or the same? I'm just wondering, since you guys are having problems and I have been considering buying an Atom-based machine.

John

Timo Schoeler

11:17 a.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

thus JohnS spake:

...

On Tue, 2010-03-16 at 09:53 +0100, Timo Schoeler wrote:

...
For me it dies on any kind of traffic, not just rsync. I even saw this fetching ISOs using FTP. The machine just died (network-wise; I don't have console on this machine in the data centre, but I do have a machine of this type here in the office, so I could build a test setup with KVM).

On the machine that has been running stable for weeks now:

# uptime
 09:51:33 up 47 days, 47 min, 1 user, load average: 0.14, 0.11, 0.09

What brand of NIC do you have, and is the BIOS version different or the same? I'm just wondering, since you guys are having problems and I have been considering buying an Atom-based machine.

I can't tell you the exact BIOS version I'm running, but I presume it's the same on both hosts (the one freaking out and the other running rock solid); the NIC is

01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)

Board is Intel D945GLF2, CPU is Intel Atom CPU 330@1.60GHz stepping 02

...

John

HTH,

Timo


-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) iD8DBQFLn2jbfg746kcGBOwRAgzlAKCia1RyLp+usLH21QwTe110S9HOfQCfYi2e JWhGzcjHC7qee+/PUB2U8xk= =Wvsp -----END PGP SIGNATURE-----

ken

11:32 a.m.

On 03/16/2010 07:17 AM Timo Schoeler wrote:

...

thus JohnS spake:

...
On Tue, 2010-03-16 at 09:53 +0100, Timo Schoeler wrote:

...
...
For me it dies on any kind of traffic, not just rsync. I even saw this fetching ISOs using FTP. The machine just died (network-wise; I don't have console on this machine in the data centre, but I do have a machine of this type here in the office, so I could build a test setup with KVM).

On the machine that has been running stable for weeks now:

# uptime
 09:51:33 up 47 days, 47 min, 1 user, load average: 0.14, 0.11, 0.09

What brand of NIC do you have, and is the BIOS version different or the same? I'm just wondering, since you guys are having problems and I have been considering buying an Atom-based machine.

I can't tell you the exact BIOS version I'm running, but I presume it's the same on both hosts (the one freaking out and the other running rock solid); the NIC is

01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)

Board is Intel D945GLF2, CPU is Intel Atom CPU 330@1.60GHz stepping 02

...

Timo

Timo, just to be clear: are the specs cited above for the machine that works or the one with the problem?

tnx

Timo Schoeler

11:35 a.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

thus ken spake:

...

On 03/16/2010 07:17 AM Timo Schoeler wrote:

...
thus JohnS spake:

...
On Tue, 2010-03-16 at 09:53 +0100, Timo Schoeler wrote:

...
For me it dies on any kind of traffic, not just rsync. I even saw this fetching ISOs using FTP. The machine just died (network-wise; I don't have console on this machine in the data centre, but I do have a machine of this type here in the office, so I could build a test setup with KVM).

On the machine that has been running stable for weeks now:

# uptime
 09:51:33 up 47 days, 47 min, 1 user, load average: 0.14, 0.11, 0.09

What brand of NIC do you have, and is the BIOS version different or the same? I'm just wondering, since you guys are having problems and I have been considering buying an Atom-based machine.

I can't tell you the exact BIOS version I'm running, but I presume it's the same on both hosts (the one freaking out and the other running rock solid); the NIC is

01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)

Board is Intel D945GLF2, CPU is Intel Atom CPU 330@1.60GHz stepping 02

...

Timo

Timo, just to be clear: are the specs cited above for the machine that works or the one with the problem?

Both. I've got two totally identical machines with the specs mentioned above. One of them runs rock solid without the ``noapic'' kernel boot argument; the other one crashes without it. With ``noapic'' set, it also runs rock solid. The OS on both is CentOS 5.4 x86_64 with all updates applied.
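When comparing two supposedly identical machines like this, it can help to confirm which one really booted with the argument. On Linux the running kernel's command line is exposed in `/proc/cmdline`; the sketch below checks a command-line string for a parameter. The helper name `has_boot_param` is hypothetical.

```python
def has_boot_param(cmdline, param):
    """True if param appears as a whole word on the kernel command line."""
    # Whole-word matching, so a longer option that merely contains the
    # same letters does not count as a match.
    return param in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Return the command line the running kernel was booted with."""
    with open(path) as handle:
        return handle.read()
```

On each host, `has_boot_param(read_cmdline(), "noapic")` then reports whether the parameter is active for the current boot, independent of what the boot loader configuration file says.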

...

tnx

Timo


-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) iD8DBQFLn20Ofg746kcGBOwRAhfEAKC2sW2N7kydU5tA/JEdkytrwtbgqACeKAfL K079Ga471ulRVZRjeBAgxVk= =auF2 -----END PGP SIGNATURE-----

Bill Campbell

5:02 p.m.

On Tue, Mar 16, 2010, Timo Schoeler wrote:

...

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

thus JohnS spake:

...
On Mon, 2010-03-15 at 19:13 -0700, Bill Campbell wrote:

...
I am seeing ``APIC error on CPU3: 60(60)'' warnings from dmesg periodically on a CentOS 5.4 box, kernel 2.6.18-164.11.1.el5. The CPU is an Intel(R) Atom(TM) CPU 330 @ 1.60GHz. I am not a hardware type, and don't have a clue what this means.

Try adding "noapic" to the kernel boot parameters. If that doesn't work out, try "acpi=off".

Hi,

just jumpin' in: I too have an Atom-based machine which runs *rock solid* with ``noapic'' as parameter, and crashes without.

However, I've got another machine based on exactly the same hardware (board, CPU, memory, HD, everything) and the same BIOS config -- running flawlessly without the parameter given.

We have four boxes in small chassis (micro-atx?) with Atom processors that are having no problems. These machines are basically gateway boxes for small businesses and do OpenVPN tunnels inter-connecting three offices in Texas and one in Missouri.

The box in question is in a larger chassis that doesn't require a low-profile NIC. It's several months newer than the others so I don't know if they're the same main board.

...

...
...
This is occurring while an rsync-3.0.4 process is receiving data sent by a machine running rsync-3.0.7 (I just updated the CentOS box to rsync-3.0.7 since noticing that it was a bit dated). This is the only significant load on this machine at this time.

Maybe you're running out of kernel threads, and/or the APIC can't distribute interrupts across the CPUs. Or the APIC doesn't like your motherboard/CPU under stress.

My impression was that it was not load (I tortured both machines running BOINC for a few weeks) but traffic. Thus, I suspect the (on board) NIC to be a bit... crappy (IIRC it was Realtek)? I've always wanted to test it with a reasonable NIC.

This traffic shouldn't be on the on-board Realtek NIC, but on the Intel NIC that's in a regular slot. On the other hand, when I look at the dmesg output, it appears that the public NIC is the Realtek.

FWIW, after I updated this to rsync-3.0.7 yesterday afternoon, I restarted the rsync using -vP to monitor it, and it has been transferring without a glitch for 15 hours now.
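For anyone wanting to watch a transfer the same way: `-v` makes rsync verbose and `-P` is shorthand for `--partial --progress`. The sketch below only assembles such a command line. The source and destination in the usage note are placeholders, and any other options Bill used are not given in the thread, so none are assumed here.

```python
import subprocess


def build_rsync_command(source, destination, extra_options=()):
    """Assemble an rsync command line with verbose output and progress display."""
    # -v: verbose; -P: same as --partial --progress
    return ["rsync", "-vP"] + list(extra_options) + [source, destination]


def run_rsync(source, destination, extra_options=()):
    """Run rsync in the foreground and return its exit status."""
    return subprocess.call(build_rsync_command(source, destination, extra_options))
```

For example, `build_rsync_command("backuphost:/srv/data/", "/srv/data/", ["-a"])` yields the argument list for an archive-mode pull with progress output. Both paths and the host name are hypothetical.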

Bill

--
INTERNET: bill@celestial.com    Bill Campbell; Celestial Software LLC
URL: http://www.celestial.com/  PO Box 820; 6641 E. Mercer Way
Voice: (206) 236-1676           Mercer Island, WA 98040-0820
Fax: (206) 232-9186             Skype: jwccsllc (206) 855-5792

Property must be secured, or liberty cannot exist. -- John Adams

Timo Schoeler

17 Mar

1:21 p.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

thus Bill Campbell spake:

...

On Tue, Mar 16, 2010, Timo Schoeler wrote:

...
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

thus JohnS spake:

...
On Mon, 2010-03-15 at 19:13 -0700, Bill Campbell wrote:

...
I am seeing ``APIC error on CPU3: 60(60)'' warnings from dmesg periodically on a CentOS 5.4 box, kernel 2.6.18-164.11.1.el5. The CPU is an Intel(R) Atom(TM) CPU 330 @ 1.60GHz. I am not a hardware type, and don't have a clue what this means.

Try adding "noapic" to the kernel boot parameters. If that doesn't work out, try "acpi=off".

Hi,

just jumpin' in: I too have an Atom-based machine which runs *rock solid* with ``noapic'' as parameter, and crashes without.

However, I've got another machine based on exactly the same hardware (board, CPU, memory, HD, everything) and the same BIOS config -- running flawlessly without the parameter given.

We have four boxes in small chassis (micro-atx?) with Atom processors that are having no problems. These machines are basically gateway boxes for small businesses and do OpenVPN tunnels inter-connecting three offices in Texas and one in Missouri.

The box in question is in a larger chassis that doesn't require a low-profile NIC. It's several months newer than the others so I don't know if they're the same main board.

...
...
...
This is occurring while an rsync-3.0.4 process is receiving data sent by a machine running rsync-3.0.7 (I just updated the CentOS box to rsync-3.0.7 since noticing that it was a bit dated). This is the only significant load on this machine at this time.

Maybe you're running out of kernel threads, and/or the APIC can't distribute interrupts across the CPUs. Or the APIC doesn't like your motherboard/CPU under stress.

My impression was that it was not load (I tortured both machines running BOINC for a few weeks) but traffic. Thus, I suspect the (on board) NIC to be a bit... crappy (IIRC it was Realtek)? I've always wanted to test it with a reasonable NIC.

This traffic shouldn't be on the on-board Realtek NIC, but on the Intel NIC that's in a regular slot. On the other hand, when I look at the dmesg output, it appears that the public NIC is the Realtek.

FWIW, after I updated this to rsync-3.0.7 yesterday afternoon, I restarted the rsync using -vP to monitor it, and it has been transferring without a glitch for 15 hours now.

However, I'm really convinced that an application/daemon (rsync in this case) should NOT be able to crash the entire system.

Timo

...

Bill


-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) iD8DBQFLoNdBfg746kcGBOwRAlUiAJ44LO7NDdWNkkWXbd9ENJg++fIanQCgjogU 5c/4dj1dmKPevzRTEzbB2qc= =5Jeu -----END PGP SIGNATURE-----

discuss@lists.centos.org