[CentOS] APIC error on Intel Atom CPU, CentOS 5.x

Wed Mar 17 13:21:05 UTC 2010
Timo Schoeler <timo.schoeler at riscworks.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

thus Bill Campbell spake:
> On Tue, Mar 16, 2010, Timo Schoeler wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> thus JohnS spake:
>>> On Mon, 2010-03-15 at 19:13 -0700, Bill Campbell wrote:
>>>> I am seeing ``APIC error on CPU3: 60(60)'' warnings from dmesg
>>>> periodically on a CentOS 5.4 box, kernel 2.6.18-164.11.1.el5.
>>>> The CPU is an Intel(R) Atom(TM) CPU 330 @ 1.60GHz.  I am not a
>>>> hardware type, and don't have a clue what this means.
>>> Try "noapic" as a kernel boot parameter.  Also, if that doesn't work
>>> out, try "acpi=off".
>> Hi,
>>
>> just jumpin' in: I too have an Atom-based machine which runs *rock
>> solid* with ''noapic'' as parameter, and crashes without.
>>
>> However, I've got another machine based on exactly the same hardware
>> (board, CPU, memory, HD, everything) and the same BIOS config -- running
>> flawlessly without the parameter given.
> 
> We have four boxes in small chassis (micro-atx?) with Atom
> processors that are having no problems.  These machines are
> basically gateway boxes for small businesses and do OpenVPN
> tunnels inter-connecting three offices in Texas and one in
> Missouri.
> 
> The box in question is in a larger chassis that doesn't require a 
> low-profile NIC.  It's several months newer than the others so I
> don't know if they're the same main board.
> 
>>>> This is occurring while an rsync-3.0.4 process is receiving data
>>>> sent by a machine running rsync-3.0.7 (I just updated the CentOS
>>>> box to rsync-3.0.7 since noticing that it was a bit dated).  This
>>>> is the only significant load on this machine at this time.
>>> Maybe you're running out of kernel threads and/or the APIC can't
>>> distribute interrupts across the CPUs.  Or the APIC doesn't like your
>>> motherboard/CPU under stress.
>> My impression was that it was not load (I tortured both machines running
>> BOINC for a few weeks) but traffic. Thus, I suspect the (on board) NIC
>> to be a bit... crappy (IIRC it was Realtek)? I've always wanted to test
>> it with a reasonable NIC.
> 
> This shouldn't be on the on-board RealTek NIC, but on the Intel
> that's in a regular slot.  On the other hand, when I look at the
> dmesg output it appears that it's the RealTek on the public NIC.
> 
> FWIW, after I updated this to rsync-3.0.7 yesterday afternoon, I
> restarted the rsync using -vP to monitor it, and it has been
> transferring without a glitch for 15 hours now.

However, I'm really convinced that an application/daemon (rsync in this
case) should NOT be able to crash the entire system.

Timo

> Bill

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)

iD8DBQFLoNdBfg746kcGBOwRAlUiAJ44LO7NDdWNkkWXbd9ENJg++fIanQCgjogU
5c/4dj1dmKPevzRTEzbB2qc=
=5Jeu
-----END PGP SIGNATURE-----
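
For anyone wondering what the "60(60)" in the dmesg line quoted at the
top of this thread encodes: the two numbers are hexadecimal readings of
the local APIC Error Status Register (ESR). Below is a minimal Python
sketch that counts such lines per CPU and decodes the bits. The bit names
follow the table in the Linux kernel's APIC error handler; the helper
names (decode_esr, summarize) are hypothetical, and decoding the second
(parenthesised) value is an assumption about which reading is the more
useful one.

```python
import re
from collections import Counter

# Local APIC Error Status Register bits, as listed in the Linux kernel's
# APIC error handler. Bit 4 is "reserved" in older kernels and
# "redirectable IPI" in newer ones.
ESR_BITS = {
    0: "send checksum error",
    1: "receive checksum error",
    2: "send accept error",
    3: "receive accept error",
    4: "reserved or redirectable IPI",
    5: "send illegal vector",
    6: "received illegal vector",
    7: "illegal register address",
}

# Matches lines such as: APIC error on CPU3: 60(60)
APIC_LINE = re.compile(
    r"APIC error on CPU(\d+): ([0-9a-fA-F]+)\(([0-9a-fA-F]+)\)"
)


def decode_esr(value):
    """Return the names of all ESR bits set in `value`, lowest bit first."""
    return [name for bit, name in ESR_BITS.items() if value & (1 << bit)]


def summarize(log_text):
    """Count APIC error lines per (cpu, esr) pair found in `log_text`."""
    counts = Counter()
    for match in APIC_LINE.finditer(log_text):
        cpu = int(match.group(1))
        # Assumption: use the second (parenthesised) hex value.
        esr = int(match.group(3), 16)
        counts[(cpu, esr)] += 1
    return counts


if __name__ == "__main__":
    sample = "APIC error on CPU3: 60(60)\n"
    for (cpu, esr), n in sorted(summarize(sample).items()):
        reasons = ", ".join(decode_esr(esr))
        print(f"CPU{cpu}: 0x{esr:02x} seen {n} time(s): {reasons}")
```

To run it against a real log, the output of dmesg can be saved to a file
and its contents passed to summarize(). Under this reading, 0x60 has bits
5 and 6 set, i.e. illegal interrupt vectors sent and received, which says
something about interrupt delivery but does not by itself identify the
NIC or the board as the culprit.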