[CentOS] CentOS 6.5: NFS server crashes with list_add corruption errors

Fri Jan 31 08:55:37 UTC 2014
Alessio Cecchi <alessio at skye.it>

Hi Jeffrey,

This PowerEdge has a replica of its data on another server.

Do you believe it is a hardware problem and not software?

What makes you think that?

Thanks

On 30/01/2014 16:36, Jeffrey Hass wrote:
> Alessio,
>
> Are these VMs? Did you move the VM files to a backup location named
> after each VM? e.g.:
> /DB
> /CORE business Server
> /etc
>
> Because it looks like your PowerEdge system choked and you may have to:
>
> A: get that back online and fire up the systems, or
> B: replace, or
> C: restore/replace
>
> The errors are pretty obvious to me at first pass, and if I were there
> I could tell in 5 minutes what is 'probably' wrong. That is only my
> first pass at this, though.
>
> I hope you had some kind of failover/redundancy with the "appliance".
>
> Good luck,
>
> JJ Hass
>
>
> On 1/30/2014 3:24 AM, Alessio Cecchi wrote:
>> Hi,
>>
>> I'm running CentOS 6.5 as NFS server (v3 and v4) and exporting Ext4 and
>> XFS filesystem.
>>
>> After many months of everything working fine, today the server crashed:
>>
>> Jan 30 09:46:13 qb-storage kernel: ------------[ cut here ]------------
>> Jan 30 09:46:13 qb-storage kernel: WARNING: at lib/list_debug.c:26
>> __list_add+0x6d/0xa0() (Not tainted)
>> Jan 30 09:46:13 qb-storage kernel: Hardware name: PowerEdge
>> Jan 30 09:46:13 qb-storage kernel: list_add corruption. next->prev
>> should be prev (ffff8804366c5df0), but was ffff8803f611fa68.
>> (next=ffff8803f611fa68).
>> Jan 30 09:46:13 qb-storage kernel: Modules linked in: nfsd lockd nfs_acl
>> auth_rpcgss sunrpc act_police cls_basic cls_flow cls_fw cls_u32 sch_tbf
>> sch_prio sch_htb sch_hfsc sch_ingress sch_sfq bridge stp llc
>> xt_statistic xt_time xt_connlimit xt_realm iptable_raw xt_comment
>> xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP
>> ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype xt_set
>> ip_set nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip
>> nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp
>> nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane
>> nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_udplite
>> nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre
>> nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast
>> nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY
>> nf_tproxy_core nf_defrag_ipv6 xt_tcpmss xt_pkttype xt_physdev xt_owner
>> xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_MARK xt_mark xt_mac
>> xt_limit xt_length xt_iprange xt_help
>> Jan 30 09:46:13 qb-storage kernel: er xt_hashlimit xt_DSCP xt_dscp
>> xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY xt_AUDIT
>> ipt_LOG xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
>> nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables ipv6 xfs
>> exportfs microcode power_meter iTCO_wdt iTCO_vendor_support dcdbas sg
>> bnx2 lpc_ich mfd_core usb_storage ext4 jbd2 mbcache raid1 sr_mod cdrom
>> sd_mod crc_t10dif ahci wmi dm_mirror dm_region_hash dm_log dm_mod [last
>> unloaded: speedstep_lib]
>> Jan 30 09:46:13 qb-storage kernel: Pid: 5759, comm: nfsd4 Not tainted
>> 2.6.32-431.1.2.0.1.el6.x86_64 #1
>> Jan 30 09:46:13 qb-storage kernel: Call Trace:
>> Jan 30 09:46:13 qb-storage kernel: [<ffffffff81071e27>] ?
>> warn_slowpath_common+0x87/0xc0
>> Jan 30 09:46:13 qb-storage kernel: [<ffffffff81071f16>] ?
>> warn_slowpath_fmt+0x46/0x50
>> Jan 30 09:46:13 qb-storage kernel: [<ffffffff81527920>] ?
>> thread_return+0x4e/0x76e
>> Jan 30 09:46:13 qb-storage kernel: [<ffffffff812944ed>] ?
>> __list_add+0x6d/0xa0
>> Jan 30 09:46:13 qb-storage kernel: [<ffffffffa05bd60a>] ?
>> laundromat_main+0x23a/0x3f0 [nfsd]
>> Jan 30 09:46:13 qb-storage kernel: [<ffffffffa05bd3d0>] ?
>> laundromat_main+0x0/0x3f0 [nfsd]
>> Jan 30 09:46:13 qb-storage kernel: [<ffffffff81094d30>] ?
>> worker_thread+0x170/0x2a0
>> Jan 30 09:46:13 qb-storage kernel: [<ffffffff8109b2b0>] ?
>> autoremove_wake_function+0x0/0x40
>> Jan 30 09:46:13 qb-storage kernel: [<ffffffff81094bc0>] ?
>> worker_thread+0x0/0x2a0
>> Jan 30 09:46:13 qb-storage kernel: [<ffffffff8109af06>] ? kthread+0x96/0xa0
>> Jan 30 09:46:13 qb-storage kernel: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
>> Jan 30 09:46:13 qb-storage kernel: [<ffffffff8109ae70>] ? kthread+0x0/0xa0
>> Jan 30 09:46:13 qb-storage kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
>> Jan 30 09:46:13 qb-storage kernel: ---[ end trace 13fa6e7d5ee2d668 ]---
>>
>> and:
>>
>> BUG: soft lockup - CPU#0 stuck for 67s! [nfsd4:3519]
>>
>> The error is exactly like this:
>>
>> https://access.redhat.com/site/solutions/166583
>>
>> Does anyone know whether this problem has been fixed, and how?
>> Thanks
>>
>
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> http://lists.centos.org/mailman/listinfo/centos
>


-- 
Alessio Cecchi is:
@ ILS -> http://www.linux.it/~alessice/
on LinkedIn -> http://www.linkedin.com/in/alessice
GNU/Linux Systems Support -> http://www.cecchi.biz
Cloud Email Hosting -> http://www.qboxmail.com
@ PLUG -> former president, now senator for life, http://www.prato.linux.it