[CentOS] DIMM problem

Wed Apr 24 18:21:05 UTC 2013
m.roth at 5-cent.us <m.roth at 5-cent.us>

Luigi Rosa wrote:
> m.roth at 5-cent.us said the following on 24/04/2013 19:51:
>> The *memory* was new - I replaced all, I think, of the original memory.
>> The server's from '09. If they had a warranty, it's well past that, and HP
>> won't chat or email without $$$.
> ProLiant DL 580 servers have an integrated log.
> If you boot with SmartStart CD you can run "Integrated Management Log
> Viewer" application and see if the system has logged some event related
> to ECC memory.
> If you find some errors about ECC memory, you have a faulty memory module
> (the entry in the integrated log SHOULD say what module is faulty).
> If the memory module is new you should be able to get a replacement.

Oh, I know I can get a replacement. In the meantime, it's in *use*, and I
need to arrange to be able to take it down. Then there's the issue of what
comes out - it's got, I don't remember, maybe 32 DIMMs, across 3 or 4
riser boards. The bank=2 makes me *think* it's riser 2, but which of the
four? And where's its mirror (I think it's mirrored memory)?

Good idea, though, and I just installed OpenIPMI and ipmitool... and the
only thing that ipmitool sel list shows is a power supply failure
yesterday. I did go into the datacenter and look at it, and it's got this
cute pull-out little display... and it's not showing any of the DIMMs as
failing, which goes with the results of
cat /sys/devices/system/edac/mc/mc0/csrow*/*count *all* giving me zero,
though /sys/devices/system/edac/mc/mc0/ce_count shows 20260 and rising.
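
For reference, a minimal sketch of dumping every EDAC counter in one go, so
the controller-level and per-csrow numbers can be compared side by side. It
assumes the standard sysfs layout used in the paths above; the function name
edac_counts is made up for illustration, and the optional directory argument
exists only so it can be pointed at a copy of the tree.

```shell
# Print every *_count file for each memory controller and each csrow.
# $1 is an optional root directory (an assumption for illustration);
# it defaults to the usual EDAC sysfs location.
edac_counts() {
    root="${1:-/sys/devices/system/edac/mc}"
    for f in "$root"/mc*/*_count "$root"/mc*/csrow*/*_count; do
        [ -r "$f" ] || continue    # skip glob patterns that matched nothing
        # Show the path relative to the root, followed by the counter value.
        printf '%s %s\n' "${f#"$root"/}" "$(cat "$f")"
    done
}
```

Calling edac_counts with no argument reads the live tree. On kernels whose
EDAC driver exposes it, mc0/ce_noinfo_count is worth checking in the output:
it counts corrected errors that the driver could not attribute to a specific
row, which may be why ce_count keeps rising while every csrow counter stays
at zero.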