While booting CentOS 6 I see an error message that goes something like:
Starting mcelog daemon [FAILED]
AMD Processor family 15: Please load edac_mce_amd module.
CPU is unsupported
The only helpful information I have found is in the "preview" of https://access.redhat.com/knowledge/solutions/158503. I don't have a Red Hat account, so I don't know if they have a real solution.
I know that mce has to do with logging certain microprocessor errors.
1. How important is this?
2. Is there anything I should do, except wait for a bug fix sometime?
Ted Miller Elkhart, IN
Just check the config to build the edac_mce module if you don't build it in.
CONFIG_EDAC_MCE=y
Make sure you have this in the /boot/config-xxxx.
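A minimal sketch of that check, assuming the config for the running kernel is at /boot/config-$(uname -r), which is where stock CentOS kernels put it. The helper name kconfig_value is made up for illustration:

```shell
#!/bin/sh
# kconfig_value OPTION [FILE]: print the value of OPTION from a kernel config
# file ("y" = built in, "m" = module), or "unset" if the option is not enabled.
# kconfig_value is a hypothetical helper name, not part of any package.
kconfig_value() {
    opt=$1
    # ASSUMPTION: the config of the running kernel lives in /boot/config-<release>
    file=${2:-/boot/config-$(uname -r)}
    val=$(sed -n "s/^${opt}=//p" "$file" | head -n 1)
    echo "${val:-unset}"
}

# Example call:
#   kconfig_value CONFIG_EDAC_MCE
```

The optional second argument is only there so the function can be pointed at a saved copy of a config file.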
------------ Banyan He Blog: http://www.rootong.com Email: banyan@rootong.com
On 11/13/2012 07:49 AM, Banyan He wrote:
Just check the config to build the edac_mce module if you don't build it in.
CONFIG_EDAC_MCE=y
Make sure you have this in the /boot/config-xxxx.
If he is running a standard CentOS kernel then he should have CONFIG_EDAC_MCE=y.
What does this command say:
uname -r
On 11/13/2012 09:21 AM, Johnny Hughes wrote:
What does this command say:
uname -r
Install is 100% stock, off Minimal Install disk, then added groups for Desktop. Up to date.
[tmiller@office04]$ uname -r
2.6.32-279.14.1.el6.x86_64
Then I tried the command the web page has (I see my error during bootup)
[root@office04 Documents]# /etc/init.d/mcelogd start
[root@office04 Documents]# /etc/init.d/mcelogd status
Checking for mcelog
mcelog is stopped
[tmiller@office04]$ ls /dev/mc*
/dev/mcelog
so the device does exist
[root@office04 Documents]# locate edac_mci_amd
returned nothing, but I don't know if it should or not.
I was reading the man page, and noticed "See mcelog --help for a list of valid CPUs." so I tried it, and it lists:
Valid CPUs: generic p6old core2 k8 p4 dunnington xeon74xx xeon7400 xeon5500 xeon5200 xeon5000 xeon5100 xeon3100 xeon3200 core_i7 core_i5 core_i3 nehalem westmere xeon71xx xeon7100 tulsa intel xeon75xx xeon7500 xeon7200 xeon7100 sandybridge sandybridge-ep
All the CPUs I recognize in there are Intel, though I don't know all the nicknames.
cat /proc/cpuinfo
on my system shows (only first of two cores copied)
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 35
model name : Dual Core AMD Opteron(tm) Processor 180
stepping : 2
cpu MHz : 1000.000
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good pni lahf_lm cmp_legacy
bogomips : 2009.40
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp
Not the latest and greatest, and old enough I expected it to be supported by now.
Any clues in all this?
Ted Miller
1. ls /lib/modules/2.6.32-279.el6.i686/kernel/drivers/edac | grep mce
   If you can find the module there, go to step 2.
2. modprobe edac_mce_amd
3. lsmod | grep mce # verify if it loads
If you don't have the module, compile one. The default kernel from CentOS should have it.
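The path in step 1 names one specific kernel (2.6.32-279.el6.i686), which may not be the one that is actually running. Here is a sketch of the same check keyed off `uname -r` instead. edac_mce_modules is a hypothetical helper name, and its optional second argument (a root other than /lib/modules) is an assumption added only so it can be tried against a test directory:

```shell
#!/bin/sh
# edac_mce_modules [RELEASE] [ROOT]: list files in a kernel's edac driver
# directory whose names contain "mce". Exit status is non-zero if none match.
# Hypothetical helper, not part of CentOS or mcelog.
edac_mce_modules() {
    release=${1:-$(uname -r)}
    root=${2:-/lib/modules}
    ls "$root/$release/kernel/drivers/edac" 2>/dev/null | grep mce
}

# Steps 1-3 for the running kernel (modprobe needs root):
#   edac_mce_modules && modprobe edac_mce_amd && lsmod | grep mce
```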
If that is not your case, the problem is with mcelog itself. I'm not 100% confident in this conclusion, but the code seems wrong here:
if (!strcmp(vendor,"AuthenticAMD")) {
    if (family == 15)
        cputype = CPU_K8;
    if (family >= 15)
        SYSERRprintf("AMD Processor family %d: Please load edac_mce_amd module.\n", family);
    return 0;
Your CPU family is 15. Whatever you do, you will reach this point, since the check is called right after main() starts:
if (!cpu_forced && !is_cpu_supported()) {
    fprintf(stderr, "CPU is unsupported\n");
    exit(1);
}
The routine is_cpu_supported reads the data from /proc/cpuinfo for the family number. You got stuck here then. You can change the code from ">=15" to "> 15".
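For reference, the vendor and family values that is_cpu_supported is said to read from /proc/cpuinfo can be pulled out by hand. A rough sketch follows. This is not mcelog's parsing code; cpu_field is a hypothetical helper name, and the optional file argument is an assumption added so it can be run against a saved copy of cpuinfo:

```shell
#!/bin/sh
# cpu_field NAME [FILE]: print the value of the first "NAME : value" line in a
# /proc/cpuinfo-style file. cpu_field is a hypothetical helper name.
cpu_field() {
    awk -F ':' -v name="$1" '
        # strip the padding between the field name and the colon
        { key = $1; sub(/[ \t]+$/, "", key) }
        # on the first exact match, print the value without leading blanks
        key == name { val = $2; sub(/^[ \t]+/, "", val); print val; exit }
    ' "${2:-/proc/cpuinfo}"
}

# Example calls:
#   cpu_field vendor_id
#   cpu_field "cpu family"
```

On the Opteron 180 from the cpuinfo output earlier in the thread, these two calls should print AuthenticAMD and 15, which is the combination the quoted branch tests for.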
------------ Banyan He Blog: http://www.rootong.com Email: banyan@rootong.com
On 11/14/2012 01:22 AM, Banyan He wrote:
1. ls /lib/modules/2.6.32-279.el6.i686/kernel/drivers/edac | grep mce
It exists
If you can find the module there, go to step 2
2. modprobe edac_mce_amd
That works
3. lsmod | grep mce # verify if it loads
That verifies, even after a reboot. (Didn't try it before step 2, so don't know if it was already loaded.)
If that is not your case, the problem is with mcelog itself. I'm not 100% confident in this conclusion, but the code seems wrong here:
if (!strcmp(vendor,"AuthenticAMD")) {
    if (family == 15)
        cputype = CPU_K8;
    if (family >= 15)
        SYSERRprintf("AMD Processor family %d: Please load edac_mce_amd module.\n", family);
    return 0;
Your CPU family is 15. Whatever you do, you will reach this point, since the check is called right after main() starts.
I'm not much at C programming, but the way I read that, I will hit the "return 0" statement no matter what the family number, even if it is less than 15. Any CPU that matches the !strcmp(vendor,"AuthenticAMD") expression is going to get to the return 0 line eventually. The two intermediate if statements only determine if a value is set for 'cputype' and if the warning statement gets printed before you arrive at the return 0 line. You are going to get there whether your family number is 1 or 100.
I found source code online (had a comment about being edited two months ago) for the is_cpu_supported routine. Looking at the whole thing, I see what appear (to my inexperienced eye) to be two program flow errors.
1. The issue you pointed out, where the third 'if' statement looks like it should be '>', not '>='.
2. It looks like there should be braces around the two statements following the third 'if' statement. Then it would look like:
if (!strcmp(vendor,"AuthenticAMD")) {
    if (family == 15)
        cputype = CPU_K8;
    if (family > 15) {
        SYSERRprintf("AMD Processor family %d: Please load edac_mce_amd module.\n", family);
        return 0;
    }
That construction would allow family 15 to be supported. The mcelog --help output lists k8 as a valid CPU (but I wonder if it has ever been tested?)
Without these changes, my eye says that all AMD CPUs are rejected (return 0), and never get to the accepted criteria (return 1)
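To make the two readings concrete, here is a shell model of the AMD branch. It is not mcelog code, only a model of its control flow. Each function prints the value the C function would return (0 = unsupported, 1 = supported). The function names are invented, and the final `echo 1` assumes that falling out of the AMD branch reaches the routine's accepting `return 1`, which the quoted excerpt does not show:

```shell
#!/bin/sh
# Shell model of the AMD branch of is_cpu_supported as quoted in this thread.
# Each function prints what the C code would return: 0 = unsupported.

# As quoted: "return 0" is not inside either if, so it runs for every family.
amd_branch_as_quoted() {
    family=$1
    if [ "$family" -ge 15 ]; then
        echo "AMD Processor family $family: Please load edac_mce_amd module." >&2
    fi
    echo 0
}

# With ">" and braces: only families above 15 are rejected in this branch.
amd_branch_with_braces() {
    family=$1
    if [ "$family" -gt 15 ]; then
        echo "AMD Processor family $family: Please load edac_mce_amd module." >&2
        echo 0
        return
    fi
    echo 1    # ASSUMPTION: control falls through to a later "return 1"
}

# Compare both versions for a few family numbers (warnings suppressed).
for f in 14 15 16; do
    echo "family $f: as quoted=$(amd_branch_as_quoted "$f" 2>/dev/null)" \
         "with braces=$(amd_branch_with_braces "$f" 2>/dev/null)"
done
```

In this model the quoted version prints 0 for every family, while the braced version prints 1 for families 14 and 15 and 0 for family 16.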
if (!cpu_forced && !is_cpu_supported()) {
    fprintf(stderr, "CPU is unsupported\n");
    exit(1);
}
The routine is_cpu_supported reads the data from /proc/cpuinfo for the family number. You got stuck here then. You can change the code from ">=15" to "> 15".
I don't have source code downloaded, nor have I done much building/compiling, but I would be willing to try to solve this. Maybe I can contribute a little bit back to the project this way.
Does anyone else read the code the way I do, or am I missing something completely?
Ted Miller
Indiana, USA
Ted Miller wrote:
[root@office04 Documents]# locate edac_mci_amd
returned nothing, but I don't know if it should or not.
you have a typo, it should be locate edac_mce_amd
and it should return /lib/modules/*/kernel/drivers/edac/edac_mce_amd.ko for all the kernels you have installed
On 11/14/2012 05:41 AM, Nicolas Thierry-Mieg wrote:
you have a typo, it should be locate edac_mce_amd
Thanks for catching that. Good example of why copying and pasting the actual commands and responses is the best way. Wrong command->wrong response.
and it should return /lib/modules/*/kernel/drivers/edac/edac_mce_amd.ko for all the kernels you have installed
It does. See my other reply for more details on what I found.
Any help welcomed. Ted Miller