[CentOS] corruption of in-memory data detected (xfs)

Wed Jul 2 02:22:55 UTC 2014
Eliezer Croitoru <eliezer at ngtech.co.il>

I had a similar issue:
an NFS server using XFS as the filesystem for backups of a very large system.
I have a 2TB RAID-1 volume. I started an rsync of the backup, and at some
point I hit this issue.
There were lots of files on it, and the system has 8GB of RAM and runs
CentOS 6.5 64-bit.
I didn't bother investigating, because ReiserFS handled the same workload
without any issues.

I never knew about the inode64 option. Is it only a mount option, or is it
also set through the mkfs.xfs command?

Also, in case I want to test it again, what would you recommend to avoid
crashing the system when a lot of memory is in use?

Thanks,
Eliezer

On 07/01/2014 11:57 AM, Alexandru Cardaniuc wrote:
>
> Hi All,
>
> I am having an issue with an XFS filesystem shutting down under high load with a very large number of small files.
> Currently I have around 3.5-4 million files on this filesystem. New files are written to it continuously, until
> it reaches 9-11 million small files (35KB on average).
>
> At some point I get the following in dmesg:
>
> [2870477.695512] Filesystem "sda5": XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c.  Caller 0xffffffff8826bb7d
> [2870477.695558]
> [2870477.695559] Call Trace:
> [2870477.695611]  [<ffffffff88262c28>] :xfs:xfs_trans_cancel+0x5b/0xfe
> [2870477.695643]  [<ffffffff8826bb7d>] :xfs:xfs_mkdir+0x57c/0x5d7
> [2870477.695673]  [<ffffffff8822f3f8>] :xfs:xfs_attr_get+0xbf/0xd2
> [2870477.695707]  [<ffffffff88273326>] :xfs:xfs_vn_mknod+0x1e1/0x3bb
> [2870477.695726]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
> [2870477.695736]  [<ffffffff802230e6>] __up_read+0x19/0x7f
> [2870477.695764]  [<ffffffff8824f8f4>] :xfs:xfs_iunlock+0x57/0x79
> [2870477.695776]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
> [2870477.695784]  [<ffffffff802230e6>] __up_read+0x19/0x7f
> [2870477.695791]  [<ffffffff80209f4c>] __d_lookup+0xb0/0xff
> [2870477.695803]  [<ffffffff8020cd4a>] _atomic_dec_and_lock+0x39/0x57
> [2870477.695814]  [<ffffffff8022d6db>] mntput_no_expire+0x19/0x89
> [2870477.695829]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
> [2870477.695837]  [<ffffffff802230e6>] __up_read+0x19/0x7f
> [2870477.695861]  [<ffffffff8824f8f4>] :xfs:xfs_iunlock+0x57/0x79
> [2870477.695887]  [<ffffffff882680af>] :xfs:xfs_access+0x3d/0x46
> [2870477.695899]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
> [2870477.695923]  [<ffffffff802df4a3>] vfs_mkdir+0xe3/0x152
> [2870477.695933]  [<ffffffff802dfa79>] sys_mkdirat+0xa3/0xe4
> [2870477.695953]  [<ffffffff80260295>] tracesys+0x47/0xb6
> [2870477.695963]  [<ffffffff802602f9>] tracesys+0xab/0xb6
> [2870477.695977]
> [2870477.695985] xfs_force_shutdown(sda5,0x8) called from line 1139 of file fs/xfs/xfs_trans.c.  Return address = 0xffffffff88262c46
> [2870477.696452] Filesystem "sda5": Corruption of in-memory data detected.  Shutting down filesystem: sda5
> [2870477.696464] Please umount the filesystem, and rectify the problem(s)
>
> # ls -l /store
> ls: /store: Input/output error
> ?--------- 0 root root 0 Jan  1  1970 /store
>
> The filesystem is ~1TB in size:
> # df -hT /store
> Filesystem    Type    Size  Used Avail Use% Mounted on
> /dev/sda5      xfs    910G  142G  769G  16% /store
>
>
> I am using CentOS 5.9 with kernel 2.6.18-348.el5xen.
>
>
> The filesystem is in a virtual machine (Xen) and on top of LVM.
>
> The filesystem was created with the mkfs.xfs defaults from xfsprogs-2.9.4-1.el5.centos (the version that ships
> with CentOS 5.x by default).
>
> These are the parameters the filesystem was created with (all mkfs.xfs defaults):
> # xfs_info /store
> meta-data=/dev/sda5              isize=256    agcount=32, agsize=7454720 blks
>          =                       sectsz=512   attr=0
> data     =                       bsize=4096   blocks=238551040, imaxpct=25
>          =                       sunit=0      swidth=0 blks, unwritten=1
> naming   =version 2              bsize=4096
> log      =internal               bsize=4096   blocks=32768, version=1
>          =                       sectsz=512   sunit=0 blks, lazy-count=0
> realtime =none                   extsz=4096   blocks=0, rtextents=0
>
> The problem is reproducible, and I don't think it's hardware related: it has been reproduced on multiple
> servers of the same type, so I doubt it's a memory issue or anything like that.
>
> Is this a known issue? If so, what's the fix? I went through the kernel updates for CentOS 5.10 (newer
> kernel), but didn't see any XFS-related fixes since CentOS 5.9.
>
> Any help will be greatly appreciated...
>
>
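The sketch below is not part of the original thread. It sanity-checks the xfs_info geometry quoted above. The parse_xfs_info helper is hypothetical and written only for this example. It assumes the xfsprogs 2.9.x output layout shown in the post, where section names sit at column 0 and continuation lines start with whitespace.

It recomputes three things:
- the data section is tiled exactly by its allocation groups (agcount x agsize = blocks);
- 238551040 blocks of 4096 bytes is 910 GiB (about 977 GB), which is in line with the 910G that df -h shows and the "~1TB" description;
- the internal log is 128 MiB.

```python
import re

# xfs_info output as quoted in the post (CentOS 5.9, xfsprogs-2.9.4)
XFS_INFO = """\
meta-data=/dev/sda5              isize=256    agcount=32, agsize=7454720 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=238551040, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0
"""


def parse_xfs_info(text):
    """Hypothetical helper: map each section (meta-data, data, naming, log,
    realtime) to its numeric key=value pairs. Assumes continuation lines
    begin with whitespace, as in the output above."""
    sections, current = {}, None
    for line in text.splitlines():
        head = line.split("=", 1)[0].strip()
        if head:  # a new section header such as "data     ="
            current = head
            sections[current] = {}
        # collect numeric pairs like "agcount=32," or "lazy-count=0"
        for key, value in re.findall(r"(\w[\w-]*)=(\d+)", line):
            sections[current][key] = int(value)
    return sections


info = parse_xfs_info(XFS_INFO)
meta, data, log = info["meta-data"], info["data"], info["log"]

# the allocation groups should cover the data section exactly
assert meta["agcount"] * meta["agsize"] == data["blocks"]

data_bytes = data["blocks"] * data["bsize"]
data_gib = data_bytes / 1024 ** 3
log_mib = log["blocks"] * log["bsize"] / 1024 ** 2

print(f"data section: {data_gib:.1f} GiB ({data_bytes / 1e9:.1f} GB)")  # 910.0 GiB (977.1 GB)
print(f"internal log: {log_mib:.0f} MiB")                               # 128 MiB
```

The parser keys values by section because bsize and blocks appear in several sections (data, naming, log, realtime). A flat key=value scan would silently overwrite them.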