And all that work was done to get this: the output of a corrected internal
parity error. I get about one of these per workstation per 3 days, more or
less; is this a surprising number? (The workstation under the heaviest load
gets more, while the idle spare gets none at all; no surprise there!)

MCE 6
CPU 1 BANK 0
TIME 1385426237 Mon Nov 25 21:37:17 2013
MCG status:
MCi status:
Corrected error
Error enabled
MCA: Internal parity error
STATUS 90000040000f0005 MCGSTATUS 0
MCGCAP c09 APICID 2 SOCKETID 0
CPUID Vendor Intel Family 6 Model 60

Anyway,
-G.

On Nov 27, 2013, at 3:32 PM, Glenn Eychaner <geychaner at mac.com> wrote:

> On further, further, further toying, I now have mcelog running on my 32-bit
> CentOS 6 systems! I admit to doing it the "dumb" way: I grabbed the source
> from the git repository, compiled and installed it, and THEN discovered
> that the init.d file supplied with the source was not CentOS compatible, so
> I grabbed the x86-64 RPM, extracted the startup files, and copied them into
> place. The RPM was small enough to make this easy.
>
> What I SHOULD have done is to grab the source RPM, replace the source with
> the latest source, build and install the source RPM, and then repackage the
> RPMs again for future consumption. Maybe I will try that at a future date, but
> I don't really have time today.
>
> -G.
>
> On Nov 26, 2013, at 11:11 AM, Glenn Eychaner <geychaner at mac.com> wrote:
>
>> On further, further investigation, it looks like according to the mcelog install
>> guide at http://www.mcelog.org/installation.html, I could "roll my own" for 32-bit
>> CentOS 6:
>>
>> "For bad page offlining you will need a 2.6.33+ kernel or a 2.6.32 kernel with
>> the soft offlining capability backported (like RHEL6 or SLES11-SP1)"
>> "The kernel has to have CONFIG_X86_MCE enabled. For 32bit kernels you
>> need at least a 2.6.30 kernel."
>>
>> The current kernel I am running is 2.6.32-358.23.2, but I can't tell whether it
>> has CONFIG_X86_MCE enabled. How can I find this out?
>>
>> JD writes:
>>
>>> yum info mcelog
>>> ...
>>> Description : mcelog is a daemon that collects and decodes Machine Check
>>>             : Exception data on x86-64 machines.
>>>
>>> So not for 32-bit...
>>
>> On Nov 26, 2013, at 9:25 AM, Glenn Eychaner <geychaner at mac.com> wrote:
>>
>>> Further investigation seems to indicate that these events should be handled
>>> by "mcelog" or "mced". However, there is no /var/log/mcelog, nor do I have a
>>> "mcelog" or "mced" binary, nor does yum seem to contain anything related
>>> (based on "yum whatprovides '*/mcelog'" and similar queries).
>>>
>>> Thus, I still don't know what to do with these errors. Ignore them? I am
>>> running 32-bit CentOS 6.4 (legacy software reasons).
>>>
>>> On Nov 25, 2013, at 11:05 AM, Glenn Eychaner <geychaner at mac.com> wrote:
>>>
>>>> On my new Haswell-based machines, I am occasionally seeing entries like the
>>>> following in /var/log/messages:
>>>> kernel: [Hardware Error]: Machine check events logged
>>>> (I would not have even noticed them, except that they get flagged by logwatch.)
>>>> These messages always occur alone, and don't seem to have a corresponding
>>>> entry in any other log file in /var/log. How can I get more info about these
>>>> messages?

--
Glenn Eychaner (geychaner at lco.cl)
Telescope Systems Programmer, Las Campanas Observatory
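
For anyone wanting to cross-check the mcelog record at the top of this thread, the STATUS word (90000040000f0005) can be unpacked by hand. The following is a minimal sketch, assuming the IA32_MCi_STATUS bit layout as I understand it from the Intel SDM (bit 63 valid, 62 overflow, 61 uncorrected, 60 enabled, bits 52:38 corrected error count, 31:16 model-specific code, 15:0 MCA error code). The function and field names are my own for illustration and are not part of mcelog.

```python
# Sketch: unpack an IA32_MCi_STATUS value as printed by mcelog.
# ASSUMPTION: bit positions follow the Intel SDM machine-check chapter;
# the function and field names are hypothetical, not mcelog's.

def decode_mci_status(status):
    """Return selected fields of a 64-bit MCi_STATUS value as a dict."""
    return {
        "valid": bool((status >> 63) & 1),            # register holds a logged error
        "overflow": bool((status >> 62) & 1),         # an earlier error was overwritten
        "uncorrected": bool((status >> 61) & 1),      # 0 means the hardware corrected it
        "enabled": bool((status >> 60) & 1),          # reporting was enabled for this error
        "corrected_count": (status >> 38) & 0x7FFF,   # bits 52:38
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_code": status & 0xFFFF,                  # architectural MCA error code
    }


fields = decode_mci_status(0x90000040000F0005)
for name, value in fields.items():
    print(name, value)
```

For the value above this reports a valid, enabled, corrected (not uncorrected) error with a corrected count of 1 and an MCA error code of 0x0005, which agrees with the "Corrected error", "Error enabled" and "Internal parity error" lines in the mcelog output.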