In order to support ZFS, we upgraded a backup server with a new ECC-capable motherboard. We're running CentOS 6 with ZFS on Linux, recently patched. Now I want to enable EDAC so we can check for memory errors (and maybe PCI errors as well), but so far repeatedly pounding on Google hasn't turned up exactly what I need to do to enable it.
One howto covered PCI and EDAC, but "modprobe edac_mc" didn't work. Some information is below. How do I get EDAC up and running? Many howtos cover how to use edac-ctl and edac-util, but none seem to cover how to determine which module to load into the kernel.
[root@hume ~]# modprobe edac_mc
FATAL: Module edac_mc not found.
[root@hume ~]# lsmod | grep edac
[root@hume ~]# cat /proc/version
Linux version 2.6.32-431.11.2.el6.x86_64 (mockbuild@c6b8.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Tue Mar 25 19:59:55 UTC 2014
[root@hume ~]# modprobe edac_mce
FATAL: Module edac_mce not found.
[root@hume ~]# edac-ctl --mainboard
edac-ctl: mainboard: Supermicro X9SCL/X9SCM
[root@hume ~]# edac-ctl --status
edac-ctl: drivers not loaded.
[root@hume ~]# lsmod
Module                Size    Used by
ext3                  240013  1
jbd                   80858   1 ext3
nfsd                  309196  13
lockd                 73662   1 nfsd
nfs_acl               2647    1 nfsd
auth_rpcgss           44949   1 nfsd
sunrpc                262864  20 nfsd,lockd,nfs_acl,auth_rpcgss
exportfs              4236    1 nfsd
bnx2fc                90539   0
fcoe                  23298   0
libfcoe               56791   2 bnx2fc,fcoe
8021q                 25349   0
garp                  7152    1 8021q
stp                   2218    1 garp
libfc                 108670  3 bnx2fc,fcoe,libfcoe
llc                   5546    2 garp,stp
scsi_transport_fc     55299   3 bnx2fc,fcoe,libfc
scsi_tgt              12077   1 scsi_transport_fc
ipt_REJECT            2351    2
nf_conntrack_ipv4     9506    15
nf_defrag_ipv4        1483    1 nf_conntrack_ipv4
iptable_filter        2793    1
ip_tables             17831   1 iptable_filter
ip6t_REJECT           4628    2
nf_conntrack_ipv6     8337    2
nf_defrag_ipv6        11156   1 nf_conntrack_ipv6
xt_state              1492    17
nf_conntrack          79758   3 nf_conntrack_ipv4,nf_conntrack_ipv6,xt_state
ip6table_filter       2889    1
ip6_tables            18732   1 ip6table_filter
iTCO_wdt              7115    0
iTCO_vendor_support   3056    1 iTCO_wdt
zfs                   1152935 53
zcommon               44698   1 zfs
znvpair               80460   2 zfs,zcommon
zavl                  6925    1 zfs
zunicode              323159  1 zfs
spl                   260832  5 zfs,zcommon,znvpair,zavl,zunicode
zlib_deflate          21629   1 spl
i2c_i801              11359   0
i2c_core              31084   1 i2c_i801
ses                   6475    0
enclosure             8438    1 ses
sg                    29350   0
lpc_ich               12803   0
mfd_core              1895    1 lpc_ich
shpchp                32778   0
ext4                  374902  3
jbd2                  93427   1 ext4
mbcache               8193    2 ext3,ext4
raid1                 32045   2
usb_storage           49068   5
sd_mod                39069   27
crc_t10dif            1541    1 sd_mod
ata_generic           3837    0
pata_acpi             3701    0
pata_jmicron          2813    2
video                 20674   0
output                2409    1 video
e1000e                267701  0
ptp                   9614    1 e1000e
pps_core              11458   1 ptp
ahci                  42247   8
xhci_hcd              148886  0
dm_mirror             14384   0
dm_region_hash        12085   1 dm_mirror
dm_log                9930    2 dm_mirror,dm_region_hash
dm_mod                84209   2 dm_mirror,dm_log
be2iscsi              99578   0
bnx2i                 48096   0
cnic                  57079   2 bnx2fc,bnx2i
uio                   10462   1 cnic
ipv6                  317829  56 ip6t_REJECT,nf_conntrack_ipv6,nf_defrag_ipv6,cnic
cxgb4i                28361   0
cxgb4                 104882  1 cxgb4i
cxgb3i                24491   0
libcxgbi              52202   2 cxgb4i,cxgb3i
cxgb3                 152922  1 cxgb3i
mdio                  4769    1 cxgb3
libiscsi_tcp          17020   3 cxgb4i,cxgb3i,libcxgbi
qla4xxx               257114  0
iscsi_boot_sysfs      9458    2 be2iscsi,qla4xxx
libiscsi              49836   7 be2iscsi,bnx2i,cxgb4i,cxgb3i,libcxgbi,libiscsi_tcp,qla4xxx
scsi_transport_iscsi  84241   5 be2iscsi,bnx2i,libcxgbi,qla4xxx,libiscsi
[root@hume ~]# rpm -qi edac-utils
Name         : edac-utils
Relocations  : (not relocatable)
Version      : 0.9
Vendor       : CentOS
Release      : 14.el6
Build Date   : Wed 20 Jul 2011 11:13:34 AM UTC
Install Date : Wed 25 Jun 2014 09:27:40 PM UTC
Build Host   : c6b6.bsys.dev.centos.org
Group        : System Environment/Base
Source RPM   : edac-utils-0.9-14.el6.src.rpm
Size         : 78637
License      : GPLv2+
Signature    : RSA/SHA1, Mon 26 Sep 2011 04:17:58 AM UTC, Key ID 0946fca2c105b9de
Packager     : CentOS BuildSystem http://bugs.centos.org
URL          : http://sourceforge.net/projects/edac-utils/
Summary      : Userspace helper for kernel EDAC drivers
Description  :
EDAC is the current set of drivers in the Linux kernel that handle detection of ECC errors from memory controllers for most chipsets on i386 and x86_64 architectures. This userspace component consists of an init script which makes sure EDAC drivers and DIMM labels are loaded at system startup, as well as a library and utility for reporting current error counts from the EDAC sysfs files.
On 06/25/14 18:08, Lists wrote:
<snip> Dumb question: is there a setting in the BIOS that enables or disables it?
mark
On 26.06.2014 at 00:08, Lists lists@benjamindsmith.com wrote:
[root@hume ~]# modprobe edac_mc
FATAL: Module edac_mc not found.
It seems to be compiled into the kernel. Check the kernel config:
$ grep -i -E 'mce|edac' /boot/config-2.6.32-431.11.2.el6.x86_64
[root@hume ~]# lsmod | grep edac
[root@hume ~]# cat /proc/version
Linux version 2.6.32-431.11.2.el6.x86_64
check the available modules
$ find /lib/modules/2.6.32-431.11.2.el6.x86_64/ | grep -i -E 'edac'
-- LF
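The config check suggested above can be wrapped in a small helper that also says what each hit means. This is only a sketch: the function name edac_config_summary is made up for illustration, and the default path assumes the distribution installs its kernel config as /boot/config-<release>, as the grep above does. An option set to "y" is built into the kernel, so there is nothing to modprobe; an option set to "m" exists as a loadable module.

```shell
# Summarize EDAC and MCE options from a kernel config file.
# edac_config_summary is a hypothetical helper name, not part of edac-utils.
# Argument: config file (assumed default: /boot/config-<running release>).
edac_config_summary() {
    config="${1:-/boot/config-$(uname -r)}"
    [ -r "$config" ] || { echo "cannot read $config" >&2; return 1; }
    # Keep only options that are enabled (=y) or modular (=m);
    # "# CONFIG_... is not set" lines are skipped by the anchor.
    grep -E '^CONFIG_(EDAC|X86_MCE)[A-Z0-9_]*=[ym]$' "$config" |
    while IFS='=' read -r name value; do
        case "$value" in
            y) echo "$name built-in" ;;
            m) echo "$name module" ;;
        esac
    done
}
```

Calling edac_config_summary with no argument reads the config of the running kernel; passing a path checks another kernel's config instead.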
See below
On 06/26/2014 08:11 AM, Leon Fauster wrote:
check the available modules
$ find /lib/modules/2.6.32-431.11.2.el6.x86_64/ | grep -i -E 'edac'
[root@hume ~]# find /lib/modules/2.6.32-431.11.2.el6.x86_64/ | grep -i -E 'edac'
/lib/modules/2.6.32-431.11.2.el6.x86_64/kernel/drivers/edac
/lib/modules/2.6.32-431.11.2.el6.x86_64/kernel/drivers/edac/i5400_edac.ko
/lib/modules/2.6.32-431.11.2.el6.x86_64/kernel/drivers/edac/i5000_edac.ko
/lib/modules/2.6.32-431.11.2.el6.x86_64/kernel/drivers/edac/i82975x_edac.ko
/lib/modules/2.6.32-431.11.2.el6.x86_64/kernel/drivers/edac/i7core_edac.ko
/lib/modules/2.6.32-431.11.2.el6.x86_64/kernel/drivers/edac/amd64_edac_mod.ko
/lib/modules/2.6.32-431.11.2.el6.x86_64/kernel/drivers/edac/i3000_edac.ko
/lib/modules/2.6.32-431.11.2.el6.x86_64/kernel/drivers/edac/edac_mce_amd.ko
/lib/modules/2.6.32-431.11.2.el6.x86_64/kernel/drivers/edac/i7300_edac.ko
/lib/modules/2.6.32-431.11.2.el6.x86_64/kernel/drivers/edac/x38_edac.ko
/lib/modules/2.6.32-431.11.2.el6.x86_64/kernel/drivers/edac/i3200_edac.ko
/lib/modules/2.6.32-431.11.2.el6.x86_64/kernel/drivers/edac/i5100_edac.ko
/lib/modules/2.6.32-431.11.2.el6.x86_64/kernel/drivers/edac/edac_core.ko
/lib/modules/2.6.32-431.11.2.el6.x86_64/kernel/drivers/edac/sb_edac.ko
Question is: which one do I use?
-Ben
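One way to narrow down a list like the one above is to print the description string each module carries, which modinfo can extract with -F description. The sketch below is an illustration only: list_edac_modules and describe_edac_modules are made-up names, and the default directory is simply the one that appears in the find output. A description may name the chipset family a driver targets, but it does not prove that the driver matches the board, so the result still has to be compared against the actual memory controller.

```shell
# Print the name of every EDAC driver module found in a directory.
# Hypothetical helper; argument defaults to the running kernel's EDAC directory.
list_edac_modules() {
    dir="${1:-/lib/modules/$(uname -r)/kernel/drivers/edac}"
    for path in "$dir"/*.ko*; do
        [ -e "$path" ] || continue   # the glob matched nothing
        name=${path##*/}             # drop the directory part
        echo "${name%%.ko*}"         # drop .ko (or a compressed .ko.* suffix)
    done
}

# Print each candidate module with the description embedded in it.
describe_edac_modules() {
    list_edac_modules "$1" | while read -r mod; do
        desc=$(modinfo -F description "$mod" 2>/dev/null)
        echo "$mod: ${desc:-no description available}"
    done
}
```

Running describe_edac_modules with no argument walks the modules of the running kernel; edac_core shows up as well, since it is the shared core that the hardware-specific drivers depend on.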
Lists lists@benjamindsmith.com writes:
Question is: which one do I use?
Try edac_core; it might load the module which is needed for the hardware you have.
On 06/27/2014 10:27 PM, lee wrote:
Try edac_core; it might load the module which is needed for the hardware you have.
Tried that, no luck so far...
[root@hume bin]# modprobe edac_core
[root@hume bin]# lsmod | grep -i edac
edac_core 46581 0
[root@hume bin]# edac-ctl --status
edac-ctl: drivers not loaded.
[root@hume bin]# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 58
model name      : Intel(R) Pentium(R) CPU G2020 @ 2.90GHz
stepping        : 9
cpu MHz         : 2900.103
cache size      : 3072 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer xsave lahf_lm arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
bogomips        : 5800.20
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:
-- SNIP --
What might I search for to find the right driver?
On 01.07.2014 at 23:42, Lists wrote:
/lib/modules/2.6.32-431.11.2.el6.x86_64/kernel/drivers/edac/i7core_edac.ko
model name : Intel(R) Pentium(R) CPU G2020 @ 2.90GHz
Try the i7core_edac module. If that does not fit, there is no EDAC support for your Ivy Bridge generation CPU with the built-in memory controller.
Alexander
On 07/01/2014 03:42 PM, Alexander Dalloz wrote:
Try the i7core_edac module. If that does not fit, there is no EDAC support for your Ivy Bridge generation CPU with the built-in memory controller.
Thank you very much. Now the module is loaded but I still don't see any devices.
[root@hume bin]# modprobe i7core_edac
[root@hume bin]# edac-ctl --status
edac-ctl: drivers are loaded.
[root@hume bin]# ls -s /sys/devices/system/edac/mc
total 0
Also, where would I go to find out what driver I should be using for other systems? The best introduction I've found so far doesn't seem to cover setup: http://www.admin-magazine.com/Articles/Monitoring-Memory-Errors
And the best documentation I've found so far doesn't mention the driver I just installed on its "supported hardware" chart. http://buttersideup.com/edacwiki/Main_Page
Am I being optimistic to think that I should be generally able to identify and/or log ECC error correction events with EL6?
Thanks,
Ben
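An empty /sys/devices/system/edac/mc, as in the listing above, means no memory controller has been registered there, even though a driver module is loaded. When a driver does bind, it creates mc0, mc1 and so on under that directory, each with ce_count (corrected errors) and ue_count (uncorrected errors) files. A minimal sketch for checking this follows; edac_counts is a hypothetical helper name, and it takes the sysfs base as an optional argument so it can be pointed elsewhere.

```shell
# Print corrected (ce) and uncorrected (ue) error counts for every memory
# controller registered in the EDAC sysfs tree.
# Hypothetical helper; argument defaults to the path used in the thread.
edac_counts() {
    base="${1:-/sys/devices/system/edac/mc}"
    found=0
    for mc in "$base"/mc[0-9]*; do
        [ -d "$mc" ] || continue     # no controllers: the glob stays literal
        found=1
        ce=$(cat "$mc/ce_count" 2>/dev/null || echo '?')
        ue=$(cat "$mc/ue_count" 2>/dev/null || echo '?')
        echo "${mc##*/}: ce=$ce ue=$ue"
    done
    if [ "$found" -eq 0 ]; then
        echo "no memory controllers registered under $base"
        return 1
    fi
}
```

On a machine in the state shown above, edac_counts would report that no controllers are registered and return a non-zero status, which makes it easy to use from a monitoring script.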
On 07/01/2014 06:41 PM, Lists wrote:
Am I being optimistic to think that I should be generally able to identify and/or log ECC error correction events with EL6?
I've found the answer to my question and am replying for future reference. EDAC really only applies to older systems. Use mcelog for newer (e.g. 64-bit) systems where the CPU has a built-in memory controller.
-Ben
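Since mcelog works from the CPU's machine check events, a quick sanity check before setting it up is to confirm that the CPU advertises the mce and mca flags, both of which appear in the /proc/cpuinfo output earlier in this thread. The sketch below assumes nothing about mcelog itself; has_cpu_flag is a made-up helper name, and the second argument exists only so the function can be run against a saved copy of cpuinfo.

```shell
# Succeed if the first "flags" line of a cpuinfo-style file lists the flag.
# Hypothetical helper; second argument defaults to /proc/cpuinfo.
has_cpu_flag() {
    flag="$1"
    cpuinfo="${2:-/proc/cpuinfo}"
    # Put each word of the flags line on its own line, then match exactly,
    # so that "mc" does not match "mce" or "mca".
    grep -m1 '^flags' "$cpuinfo" | tr -s ' \t' '\n\n' | grep -qxF "$flag"
}
```

For example, "has_cpu_flag mce && has_cpu_flag mca && echo 'machine check reporting available'" prints the message only when both flags are present.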
Lists wrote:
I've found the answer to my question and am replying for future reference. EDAC really only applies to older systems. Use mcelog for newer (e.g. 64-bit) systems where the CPU has a built-in memory controller.
Except for some OEMs who use memlog (like SGI).
mark