Hi All,
We have installed CentOS 5.3 x86_64 on an HP DL585 server with AMD Opteron 64-bit processors and 16 GB RAM. The kernel version is 2.6.18-128.el5. It has now logged the following error messages in /var/log/messages:
Jul 3 21:41:11 db1 kernel: EDAC k8 MC0: general bus error: participating processor(local node origin), time-out(no timeout) memory transaction type(generic read), mem or i/o(mem access), cache level(generic)
Jul 3 21:41:11 db1 kernel: EDAC MC0: CE page 0x65bc7, offset 0x6a0, grain 8, syndrome 0x6e1a, row 0, channel 0, label "": k8_edac
Jul 3 21:41:11 db1 kernel: EDAC k8 MC0: extended error code: ECC chipkill x4 error
Jul 3 22:00:00 db1 ntpdate[3813]: step time server 120.88.46.10 offset -4.375417 sec
Jul 3 22:12:57 db1 kernel: EDAC k8 MC0: general bus error: participating processor(local node origin), time-out(no timeout) memory transaction type(generic read), mem or i/o(mem access), cache level(generic)
Jul 3 22:12:57 db1 kernel: EDAC MC0: CE page 0x65bc7, offset 0x6a0, grain 8, syndrome 0x6e1a, row 0, channel 0, label "": k8_edac
Jul 3 22:12:57 db1 kernel: EDAC k8 MC0: extended error code: ECC chipkill x4 error
I understand this is an error message from the Error Detection And Correction (EDAC) module, and I just want to confirm that this is not a kernel or software related issue. If it is hardware related, is it confined to the physical memory stick installed, or could it be processor related? Any hints on this?
Regards,
Kurian Thayil.
Greetings,
On Sat, Jul 4, 2009 at 12:34 PM, Kurian Thayil <kurianmthayil@gmail.com> wrote:
> Hi All,
> We have installed CentOS 5.3 x86_64 on an HP DL585 server with AMD Opteron 64-bit processors and 16 GB RAM. The kernel version is 2.6.18-128.el5. It has now logged the following error messages in /var/log/messages:
> Jul 3 21:41:11 db1 kernel: EDAC k8 MC0: general bus error: participating processor(local node origin), time-out(no timeout) memory transaction type(generic read), mem or i/o(mem access), cache level(generic)
While I can't throw much light on this particular problem, I have faced some problems myself, as the Opteron-based DL3x5 G5 servers require RAM to be populated evenly among the CPU sockets.
Just my 2p
Kurian Thayil wrote:
> I understand this is an error message from the Error Detection And Correction (EDAC) module, and I just want to confirm that this is not a kernel or software related issue. If it is hardware related, is it confined to the physical memory stick installed, or could it be processor related? Any hints on this?
Log in to the iLO/iLO2 interface and look at the system event log. All DL585s log memory errors there, and the log will even tell you which memory module is affected.
nate
On Sat, Jul 4, 2009 at 1:33 PM, nate <centos@linuxpowered.net> wrote:
> Kurian Thayil wrote:
>> I understand this is an error message from the Error Detection And Correction (EDAC) module, and I just want to confirm that this is not a kernel or software related issue. If it is hardware related, is it confined to the physical memory stick installed, or could it be processor related? Any hints on this?
> Log in to the iLO/iLO2 interface and look at the system event log. All DL585s log memory errors there, and the log will even tell you which memory module is affected.
In addition to Nate's suggestion, you may be able to use the mcelog utility to check for CPU and memory faults:
http://prefetch.net/blog/index.php/2009/06/11/locating-hardware-faults-on-li...
Hope this helps,
- Ryan
--
http://prefetch.net
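
For anyone triaging messages like the ones quoted at the top of this thread, here is a rough Python sketch that pulls the corrected-error (CE) lines out of a syslog excerpt and counts them per memory controller, row and channel. Repeated hits at the same location, as in the original post, would tend to point at one DIMM rather than at software, though the iLO event log remains the place to confirm which module. The regular expression is an assumption based only on the two sample lines in this thread, the function names are made up for illustration, and the physical address assumes that "page" is a page frame number with 4 KiB pages.

```python
import re
from collections import Counter

# Pattern for the EDAC corrected-error (CE) lines as they appear in this
# thread. Assumption: other kernels or drivers may format these differently.
CE_RE = re.compile(
    r"EDAC MC(?P<mc>\d+): CE page 0x(?P<page>[0-9a-fA-F]+), "
    r"offset 0x(?P<offset>[0-9a-fA-F]+), grain \d+, "
    r"syndrome 0x(?P<syndrome>[0-9a-fA-F]+), "
    r"row (?P<row>\d+), channel (?P<channel>\d+)"
)

PAGE_SIZE = 4096  # assumption: 4 KiB pages, the usual x86_64 default


def parse_ce_events(lines):
    """Return one dict per CE line; lines that do not match are skipped."""
    events = []
    for line in lines:
        m = CE_RE.search(line)
        if m is None:
            continue
        page = int(m.group("page"), 16)
        offset = int(m.group("offset"), 16)
        events.append({
            "mc": int(m.group("mc")),
            "row": int(m.group("row")),
            "channel": int(m.group("channel")),
            "syndrome": int(m.group("syndrome"), 16),
            # Physical address = page frame number * page size + offset.
            "phys_addr": page * PAGE_SIZE + offset,
        })
    return events


def tally_by_location(events):
    """Count CE events per (memory controller, row, channel)."""
    return Counter((e["mc"], e["row"], e["channel"]) for e in events)


# Lines copied from the original post (two CE lines plus unrelated ones).
SAMPLE = [
    'Jul 3 21:41:11 db1 kernel: EDAC MC0: CE page 0x65bc7, offset 0x6a0, '
    'grain 8, syndrome 0x6e1a, row 0, channel 0, label "": k8_edac',
    'Jul 3 21:41:11 db1 kernel: EDAC k8 MC0: extended error code: '
    'ECC chipkill x4 error',
    'Jul 3 22:00:00 db1 ntpdate[3813]: step time server 120.88.46.10 '
    'offset -4.375417 sec',
    'Jul 3 22:12:57 db1 kernel: EDAC MC0: CE page 0x65bc7, offset 0x6a0, '
    'grain 8, syndrome 0x6e1a, row 0, channel 0, label "": k8_edac',
]

if __name__ == "__main__":
    events = parse_ce_events(SAMPLE)
    for (mc, row, channel), count in tally_by_location(events).items():
        print(f"MC{mc} row {row} channel {channel}: {count} corrected error(s)")
    for addr in sorted({e["phys_addr"] for e in events}):
        print(f"physical address (assuming 4 KiB pages): {addr:#x}")
```

To run it against a real log, pass the lines of /var/log/messages to parse_ce_events() instead of SAMPLE. Both CE entries in the original post carry the same page, offset and syndrome, so they fall into a single bucket.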