[CentOS] mce error

Wed Nov 14 06:22:20 UTC 2012
Banyan He <banyan at rootong.com>

1. ls /lib/modules/$(uname -r)/kernel/drivers/edac | grep mce
   If the module is listed there, go to step 2.
2. modprobe edac_mce_amd
3. lsmod | grep mce       # verify that it loaded

If the module is missing, you will need to build it. The stock CentOS
kernel should already include it.
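
Steps 1 to 3 can be rolled into a small script. This is only a sketch:
the function names module_file and module_loaded are made up for
illustration, and the usage lines take the module directory from
$(uname -r) so that the path follows the running kernel.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of
# mcelog or CentOS.

# Print the path of the module file for $1 under module tree $2, if any.
module_file() {
    find "$2" -name "$1.ko*" 2>/dev/null | head -n 1
}

# Succeed if module $1 appears in an lsmod-style listing read from stdin.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Usage (run as root):
#   f=$(module_file edac_mce_amd "/lib/modules/$(uname -r)/kernel/drivers/edac")
#   [ -n "$f" ] && modprobe edac_mce_amd
#   lsmod | module_loaded edac_mce_amd && echo "edac_mce_amd is loaded"
```

The helpers only inspect; the modprobe call is left in the usage lines
so nothing is loaded until you run it yourself.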

If the module is present and loaded and you still get the error, the
problem is in mcelog itself. I'm not 100% confident in this conclusion,
but the code looks wrong here:

             if (!strcmp(vendor,"AuthenticAMD")) {
                 if (family == 15)
                     cputype = CPU_K8;
                 if (family >= 15)
                     SYSERRprintf("AMD Processor family %d: Please load edac_mce_amd module.\n", family);
                 return 0;

Your CPU family is 15, so you will always end up in this branch: the
check runs right at the start of main().

     if (!cpu_forced && !is_cpu_supported()) {
         fprintf(stderr, "CPU is unsupported\n");
         exit(1);
     }

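The routine is_cpu_supported() reads the family number from
/proc/cpuinfo, so this is where you get stuck. You can try changing the
test from ">= 15" to "> 15" and rebuilding mcelog. Note, though, that in
the excerpt above the "return 0" is not guarded by either if, so that
change by itself may only silence the message; family 15 would probably
also have to skip that return.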

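To see what that check would find on your machine, you can pull the
vendor and family fields out of /proc/cpuinfo yourself. This is a sketch
only: cpu_vendor_family is a made-up name, the awk parsing is my own
approximation rather than mcelog's code, and I am assuming mcelog takes
the vendor string from the same file.

```shell
#!/bin/sh
# Hypothetical helper, not part of mcelog: print "vendor family" for the
# first processor entry in a cpuinfo-style file (default /proc/cpuinfo).
cpu_vendor_family() {
    awk -F': *' '
        $1 ~ /^vendor_id[ \t]*$/  && vendor == "" { vendor = $2 }
        $1 ~ /^cpu family[ \t]*$/ && family == "" { family = $2 }
        END { print vendor, family }
    ' "${1:-/proc/cpuinfo}"
}

# Usage:
#   cpu_vendor_family              # reads /proc/cpuinfo
#   cpu_vendor_family saved.txt    # reads a saved copy instead
```

With the cpuinfo quoted further down this would print
"AuthenticAMD 15", which is exactly the combination the branch above
tests for.
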
------------
Banyan He
Blog: http://www.rootong.com
Email: banyan at rootong.com

On 2012-11-14 10:58 AM, Ted Miller wrote:
> On 11/13/2012 09:21 AM, Johnny Hughes wrote:
>> On 11/13/2012 07:49 AM, Banyan He wrote:
>>> Just check the kernel config to see whether edac_mce is built in; if it is not, build the module.
>>>
>>> CONFIG_EDAC_MCE=y
>>>
>>> Make sure you have this in the /boot/config-xxxx.
>> If he is running a standard CentOS kernel then he should have
>> CONFIG_EDAC_MCE=y.
>>
>>>
>>> On 2012-11-13 8:12 PM, Ted Miller wrote:
>>>> While booting CentOS 6 I see an error message that goes something like:
>>>>
>>>> Starting mcelog daemon                                     [FAILED]
>>>> AMD Processor family 15: Please load edac_mce_amd module.
>>>> CPU is unsupported
>>>>
>>>> The only helpful information I have found is in the "preview" of
>>>> https://access.redhat.com/knowledge/solutions/158503.  I don't have a
>>>> RedHat account, so don't know if they have a real solution.
>>>>
>>>> I know that mce has to do with logging certain microprocessor errors.
>>>>
>>>> 1. How important is this
>>>> 2. Is there anything I should do, except wait for a bug fix sometime?
>>>>
>>>> Ted Miller
>>>> Elkhart, IN
>> What does this command say:
>>
>> uname -r
> Install is 100% stock, off Minimal Install disk, then added groups for
> Desktop.  Up to date.
>
>      [tmiller at office04]$uname -r
>      2.6.32-279.14.1.el6.x86_64
>
> Then I tried the command the web page has (I see my error during bootup)
>
>      [root at office04 Documents]# /etc/init.d/mcelogd start
>      [root at office04 Documents]# /etc/init.d/mcelogd status
>      Checking for mcelog
>      mcelog is stopped
>
>      [tmiller at office04]$ls /dev/mc*
>      /dev/mcelog
>
> so the device does exist
>
>      [root at office04 Documents]# locate edac_mci_amd
>
> returned nothing, but I don't know if it should or not.
>
> I was reading the man page and noticed "See mcelog --help for a list of
> valid CPUs.", so I tried it. It lists:
>      Valid CPUs: generic p6old core2 k8 p4 dunnington xeon74xx xeon7400
>      xeon5500 xeon5200 xeon5000 xeon5100 xeon3100 xeon3200 core_i7 core_i5
>      core_i3 nehalem westmere xeon71xx xeon7100 tulsa intel xeon75xx
>      xeon7500 xeon7200 xeon7100 sandybridge sandybridge-ep
> All the CPUs I recognize in there are Intel, though I don't know all the
> nicknames.
>
>      cat /proc/cpuinfo
>
> on my system shows (only first of two cores copied)
>
>      processor	: 0
>      vendor_id	: AuthenticAMD
>      cpu family	: 15
>      model		: 35
>      model name	: Dual Core AMD Opteron(tm) Processor 180
>      stepping	: 2
>      cpu MHz		: 1000.000
>      cache size	: 1024 KB
>      physical id	: 0
>      siblings	: 2
>      core id		: 0
>      cpu cores	: 2
>      apicid		: 0
>      initial apicid	: 0
>      fpu		: yes
>      fpu_exception	: yes
>      cpuid level	: 1
>      wp		: yes
>      flags		: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat
>      pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext
>      3dnow rep_good pni lahf_lm cmp_legacy
>      bogomips	: 2009.40
>      TLB size	: 1024 4K pages
>      clflush size	: 64
>      cache_alignment	: 64
>      address sizes	: 40 bits physical, 48 bits virtual
>      power management: ts fid vid ttp
>
> Not the latest and greatest, and old enough I expected it to be supported
> by now.
>
> Any clues in all this?
> Ted Miller