In order to support ZFS, we upgraded a backup server with a new motherboard that supports ECC memory. We're running CentOS 6 with ZFS on Linux, recently patched. Now I want to enable EDAC so we can check for memory errors (and maybe PCI errors as well), but so far repeated Googling hasn't turned up exactly what I need to do to enable it.
One howto covered PCI and EDAC, but "modprobe edac_mc" didn't work. How do I get EDAC up and running? Many howtos cover how to use edac-ctl and edac-util, but none seem to cover how to determine which module to load into the kernel. Here's some information from the server:
    [root@hume ~]# modprobe edac_mc
    FATAL: Module edac_mc not found.
    [root@hume ~]# lsmod | grep edac
    [root@hume ~]# cat /proc/version
    Linux version 2.6.32-431.11.2.el6.x86_64 (mockbuild@c6b8.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Tue Mar 25 19:59:55 UTC 2014
    [root@hume ~]# modprobe edac_mce
    FATAL: Module edac_mce not found.
    [root@hume ~]# edac-ctl --mainboard
    edac-ctl: mainboard: Supermicro X9SCL/X9SCM
    [root@hume ~]# edac-ctl --status
    edac-ctl: drivers not loaded.
    [root@hume ~]# lsmod
    Module                Size     Used by
    ext3                  240013   1
    jbd                   80858    1 ext3
    nfsd                  309196   13
    lockd                 73662    1 nfsd
    nfs_acl               2647     1 nfsd
    auth_rpcgss           44949    1 nfsd
    sunrpc                262864   20 nfsd,lockd,nfs_acl,auth_rpcgss
    exportfs              4236     1 nfsd
    bnx2fc                90539    0
    fcoe                  23298    0
    libfcoe               56791    2 bnx2fc,fcoe
    8021q                 25349    0
    garp                  7152     1 8021q
    stp                   2218     1 garp
    libfc                 108670   3 bnx2fc,fcoe,libfcoe
    llc                   5546     2 garp,stp
    scsi_transport_fc     55299    3 bnx2fc,fcoe,libfc
    scsi_tgt              12077    1 scsi_transport_fc
    ipt_REJECT            2351     2
    nf_conntrack_ipv4     9506     15
    nf_defrag_ipv4        1483     1 nf_conntrack_ipv4
    iptable_filter        2793     1
    ip_tables             17831    1 iptable_filter
    ip6t_REJECT           4628     2
    nf_conntrack_ipv6     8337     2
    nf_defrag_ipv6        11156    1 nf_conntrack_ipv6
    xt_state              1492     17
    nf_conntrack          79758    3 nf_conntrack_ipv4,nf_conntrack_ipv6,xt_state
    ip6table_filter       2889     1
    ip6_tables            18732    1 ip6table_filter
    iTCO_wdt              7115     0
    iTCO_vendor_support   3056     1 iTCO_wdt
    zfs                   1152935  53
    zcommon               44698    1 zfs
    znvpair               80460    2 zfs,zcommon
    zavl                  6925     1 zfs
    zunicode              323159   1 zfs
    spl                   260832   5 zfs,zcommon,znvpair,zavl,zunicode
    zlib_deflate          21629    1 spl
    i2c_i801              11359    0
    i2c_core              31084    1 i2c_i801
    ses                   6475     0
    enclosure             8438     1 ses
    sg                    29350    0
    lpc_ich               12803    0
    mfd_core              1895     1 lpc_ich
    shpchp                32778    0
    ext4                  374902   3
    jbd2                  93427    1 ext4
    mbcache               8193     2 ext3,ext4
    raid1                 32045    2
    usb_storage           49068    5
    sd_mod                39069    27
    crc_t10dif            1541     1 sd_mod
    ata_generic           3837     0
    pata_acpi             3701     0
    pata_jmicron          2813     2
    video                 20674    0
    output                2409     1 video
    e1000e                267701   0
    ptp                   9614     1 e1000e
    pps_core              11458    1 ptp
    ahci                  42247    8
    xhci_hcd              148886   0
    dm_mirror             14384    0
    dm_region_hash        12085    1 dm_mirror
    dm_log                9930     2 dm_mirror,dm_region_hash
    dm_mod                84209    2 dm_mirror,dm_log
    be2iscsi              99578    0
    bnx2i                 48096    0
    cnic                  57079    2 bnx2fc,bnx2i
    uio                   10462    1 cnic
    ipv6                  317829   56 ip6t_REJECT,nf_conntrack_ipv6,nf_defrag_ipv6,cnic
    cxgb4i                28361    0
    cxgb4                 104882   1 cxgb4i
    cxgb3i                24491    0
    libcxgbi              52202    2 cxgb4i,cxgb3i
    cxgb3                 152922   1 cxgb3i
    mdio                  4769     1 cxgb3
    libiscsi_tcp          17020    3 cxgb4i,cxgb3i,libcxgbi
    qla4xxx               257114   0
    iscsi_boot_sysfs      9458     2 be2iscsi,qla4xxx
    libiscsi              49836    7 be2iscsi,bnx2i,cxgb4i,cxgb3i,libcxgbi,libiscsi_tcp,qla4xxx
    scsi_transport_iscsi  84241    5 be2iscsi,bnx2i,libcxgbi,qla4xxx,libiscsi
    [root@hume ~]# rpm -qi edac-utils
    Name        : edac-utils                   Relocations: (not relocatable)
    Version     : 0.9                               Vendor: CentOS
    Release     : 14.el6                        Build Date: Wed 20 Jul 2011 11:13:34 AM UTC
    Install Date: Wed 25 Jun 2014 09:27:40 PM UTC      Build Host: c6b6.bsys.dev.centos.org
    Group       : System Environment/Base       Source RPM: edac-utils-0.9-14.el6.src.rpm
    Size        : 78637                            License: GPLv2+
    Signature   : RSA/SHA1, Mon 26 Sep 2011 04:17:58 AM UTC, Key ID 0946fca2c105b9de
    Packager    : CentOS BuildSystem http://bugs.centos.org
    URL         : http://sourceforge.net/projects/edac-utils/
    Summary     : Userspace helper for kernel EDAC drivers
    Description :
    EDAC is the current set of drivers in the Linux kernel that handle
    detection of ECC errors from memory controllers for most chipsets
    on i386 and x86_64 architectures. This userspace component consists
    of an init script which makes sure EDAC drivers and DIMM labels
    are loaded at system startup, as well as a library and utility
    for reporting current error counts from the EDAC sysfs files.
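To make the "which module" part of the question concrete, here is a minimal sketch of how the EDAC drivers shipped with the running kernel could be listed. It assumes the drivers are built as loadable modules whose filenames contain "edac" and that they live somewhere under /lib/modules/$(uname -r); the function name list_edac_modules is made up for illustration. It only shows what is available to modprobe, not which driver matches this chipset.

```shell
#!/bin/sh
# Hypothetical helper: list the names of EDAC driver modules found
# under a kernel module tree.
# Usage: list_edac_modules [module_root]
# module_root defaults to the module tree of the running kernel.
list_edac_modules() {
    root="${1:-/lib/modules/$(uname -r)}"
    # Nothing to report if the tree does not exist.
    [ -d "$root" ] || return 0
    # Match *edac*.ko as well as compressed variants such as .ko.xz,
    # then strip the directory and the extension to get module names.
    find "$root" -type f -name '*edac*.ko*' 2>/dev/null \
        | sed -e 's|.*/||' -e 's|\.ko.*$||' \
        | sort -u
}
```

Calling list_edac_modules with no argument prints one module name per line. An empty result would suggest this kernel ships no loadable EDAC drivers at all, which would be consistent with the modprobe failures above.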