[CentOS] Machine check events

Glenn Eychaner geychaner at mac.com
Thu Nov 28 12:37:26 UTC 2013


m.roth writes:

> Is the system still under warranty? How 'bout the memory, if you've
> replaced it? You *should* replace it. It's not going to get better....

This is brand-new Kingston 1600 MHz ECC memory in a workstation/server
running at high altitude in a relatively open environment; I am loath to
replace it based on a single correctable parity error every few days.
That is especially true since both active computers are (so far) seeing
about the same error frequency (though it will take many more days, or even
weeks, to establish that for certain; I haven't seen one on either active
computer in the last three days), and memtest ran on these computers
overnight (18+ hours) between build and deployment without any apparent issue.

[The computers were built in the States and then shipped 10,000 miles to
the observatory.]

And the turnaround time for servicing between the observatory and the U.S.
is no small matter. I have five of these computers (two active, one "hot"
spare, one "cold" spare, one test system); if in the long run one proves to
be a problem, I will deal with it at that time. Before I conclude the memory
is from a bad batch, I'll need more proof.

-G.

On Nov 27, 2013, at 3:56 PM, Glenn Eychaner <geychaner at mac.com> wrote:

> And all that work was done to get this: the output for a corrected memory
> parity error. I get about one of these per workstation every 3 days, more or
> less; is that a surprising rate? (The workstation under the heaviest load
> gets more, while the idle spare gets none at all; no surprise there!)
> 
> MCE 6
> CPU 1 BANK 0 
> TIME 1385426237 Mon Nov 25 21:37:17 2013
> MCG status:
> MCi status:
> Corrected error
> Error enabled
> MCA: Internal parity error
> STATUS 90000040000f0005 MCGSTATUS 0
> MCGCAP c09 APICID 2 SOCKETID 0 
> CPUID Vendor Intel Family 6 Model 60
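For readers wondering how mcelog arrives at "Corrected error", "Error enabled" and "Internal parity error", the STATUS value above can be decoded by hand. The sketch below is a minimal illustration, assuming the value is a raw IA32_MCi_STATUS register laid out as described in the Intel SDM, Vol. 3B (Machine-Check Architecture); the names `decode_status` and `FLAGS` are hypothetical, and the model-specific bits are left undecoded.

```python
# Sketch: decode the STATUS value printed by mcelog above, assuming the
# architectural IA32_MCi_STATUS layout from the Intel SDM, Vol. 3B.
# decode_status and FLAGS are hypothetical names used for illustration.

STATUS = 0x90000040000F0005

# Architectural flag bits in the upper part of MCi_STATUS.
FLAGS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error (0 means corrected)
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MCi_MISC holds additional information
    58: "ADDRV",  # MCi_ADDR holds the error address
    57: "PCC",    # processor context corrupt
}


def decode_status(status: int) -> dict:
    """Split a raw MCi_STATUS value into flags and error-code fields."""
    flags = {name: bool((status >> bit) & 1) for bit, name in FLAGS.items()}
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,               # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


if __name__ == "__main__":
    info = decode_status(STATUS)
    for name, value in info["flags"].items():
        print(f"{name:6} {int(value)}")
    print(f"MCA error code:      {info['mca_error_code']:#06x}")
    print(f"Model-specific code: {info['model_specific_code']:#06x}")
```

With this value, VAL and EN are set while UC and OVER are clear, which matches the "Corrected error" and "Error enabled" lines; the low 16 bits are 0x0005, the simple error code the SDM lists for an internal parity error.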

--
Glenn Eychaner (geychaner at lco.cl)
Telescope Systems Programmer, Las Campanas Observatory
