[CentOS] Problems with motherboard support? INTEL DP43BF

Mon Dec 27 21:19:55 UTC 2010
John R Pierce <pierce at hogranch.com>

On 12/27/10 11:04 AM, robert mena wrote:
> Hi,
>
> I've installed Centos 5.5 (plus updates) in a machine with INTEL 
> DP43BF motherboard.  In order to make Linux detect the PCIs I've added 
> the pci=assign-busses in my GRUB conf.
>
> Everything runs fine but within less than 2 days of uptime the machine 
> simply freezes (black console no connectivity).  This has happened 
> more than one time so I'm considering to be a problem. The memtest 
> passed without a problem and the machine uses a compact flash (sandisk 
> extreme III 4GB) as a disk.
>
> I could only find the error messages in my /var/log/messages but those 
> appear hours before the actual lock.
>
> kernel: 0000:00:1a.7 EHCI: BIOS handoff failed (BIOS bug ?) 01010001
>
> kernel: 0000:00:1d.7 EHCI: BIOS handoff failed (BIOS bug ?) 01010001
>
>
> kernel: eth4: PCI Bus error a290.
>
> kernel: eth4: PCI Bus error 0290.
>
> kernel: eth3: PCI Bus error 2290.
>
> kernel: eth3: PCI Bus error 0290.
>
>
> Any tips?
>


That's a desktop board, right?  So it probably doesn't have ECC or any 
of the other system integrity features of a server board.  Desktop 
boards also usually lack the IO bus bandwidth to handle substantial IO 
workloads.

PCI bus errors are not a good thing at all, either.  You have 5 Ethernet 
adapters in use?  What sort of Ethernet controller?  I believe those 
PCI Bus errors are being reported by your Ethernet adapters, and could 
be the result of excess bus contention.  A single gigE link running 
full duplex can more than saturate a 32-bit 33 MHz PCI (parallel) bus.  
All the PCI slots on a desktop board like yours are on the same bus and 
contend for the same bandwidth.
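
As a rough sanity check on the bandwidth argument, here is a minimal 
Python sketch (not from the original post) comparing the theoretical 
peak of a 32-bit 33 MHz PCI bus with the line rate of gigabit Ethernet.  
The function names are made up for illustration, and the figures are 
theoretical peaks only; real PCI throughput is lower because of 
arbitration and protocol overhead.

```python
# Back-of-the-envelope comparison: shared parallel PCI vs. gigabit Ethernet.
# All numbers are theoretical peaks, not measurements from this machine.

PCI_BUS_WIDTH_BITS = 32      # classic parallel PCI data width
PCI_CLOCK_HZ = 33.33e6       # nominal 33 MHz PCI clock
GIGE_LINE_RATE_BPS = 1e9     # gigabit Ethernet, per direction


def pci_peak_mb_per_s(width_bits=PCI_BUS_WIDTH_BITS, clock_hz=PCI_CLOCK_HZ):
    """Peak transfer rate of the whole shared bus, in MB/s."""
    return (width_bits / 8) * clock_hz / 1e6


def gige_mb_per_s(ports=1, full_duplex=True):
    """Aggregate line rate of `ports` gigE adapters, in MB/s."""
    per_direction = GIGE_LINE_RATE_BPS / 8 / 1e6
    directions = 2 if full_duplex else 1
    return ports * per_direction * directions


if __name__ == "__main__":
    bus = pci_peak_mb_per_s()
    for ports in (1, 2, 5):
        demand = gige_mb_per_s(ports)
        print(f"{ports} gigE port(s): {demand:.0f} MB/s wanted, "
              f"{bus:.0f} MB/s available on the shared bus")
```

Even one adapter sending and receiving at line rate asks for roughly 
twice what the bus can deliver in theory, and every additional card in 
a PCI slot shares that same budget.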

Also, as mentioned, thermal problems are a definite possibility.  Intel 
CPUs tend to self-throttle if they get too hot, but the chipset might 
not be that good at it, so watch the chipset and memory temperatures as 
well as the CPU.  Another possible cause would be silent memory 
corruption, although that would be more likely to cause a kernel fault 
("Fatal kernel error - system halted").  However, if your display is in 
a GUI mode, you won't see that message unless the console is directed 
to a serial port which is being monitored.