[CentOS] corruption of in-memory data detected (xfs)

Mon Jul 7 03:54:56 UTC 2014
Alexandru Cardaniuc <cardaniuc at gmail.com>

Eliezer Croitoru <eliezer at ngtech.co.il> writes:

> I had a similar issue: an NFS server with XFS as the filesystem for
> backups of a very large system. I have a 2TB RAID-1 volume; I started
> rsyncing the backup and at some point hit this issue. There were lots
> of files there, and the system has 8GB of RAM and runs 64-bit CentOS
> 6.5. I didn't dig into the issue, since ReiserFS handled the same
> workload without any problems.
>
> I never knew about the inode64 option; is it only a mount option, or
> is it also passed to the mkfs.xfs command?
>
> Also, in case I want to test it again, what would you recommend to
> avoid crashing the system when a lot of memory is in use?


My systems have 17G of RAM and 1T XFS partitions. I was under the
impression that the inode64 option only applies to filesystems larger
than 1T in size?
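To answer your question, though: as far as I know, inode64 is purely a
mount-time option; mkfs.xfs has no equivalent flag. A minimal sketch of
how I would enable and then verify it (assuming the same /dev/sda5 on
/store layout as in the quoted output below; adjust for your setup):

    # /etc/fstab entry, adding inode64 to the mount options
    /dev/sda5   /store   xfs   defaults,inode64   0 0

    # or mount it by hand for a one-off test
    mount -o inode64 /dev/sda5 /store

    # confirm the option actually took effect
    grep /store /proc/mounts

Whether that would help with the shutdowns I'm seeing is another matter.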


> On 07/01/2014 11:57 AM, Alexandru Cardaniuc wrote:
>> Hi All,
>> I am having an issue with an XFS filesystem shutting down under high
>> load with very many small files. Basically, I have around 3.5 - 4
>> million files on this filesystem. New files are being written to the
>> FS all the time, until I get to 9-11 million small files (35 KB on
>> average). At some point I get the following in dmesg:
>> [2870477.695512] Filesystem "sda5": XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c. Caller 0xffffffff8826bb7d
>> [2870477.695558]
>> [2870477.695559] Call Trace:
>> [2870477.695611]  [<ffffffff88262c28>] :xfs:xfs_trans_cancel+0x5b/0xfe
>> [2870477.695643]  [<ffffffff8826bb7d>] :xfs:xfs_mkdir+0x57c/0x5d7
>> [2870477.695673]  [<ffffffff8822f3f8>] :xfs:xfs_attr_get+0xbf/0xd2
>> [2870477.695707]  [<ffffffff88273326>] :xfs:xfs_vn_mknod+0x1e1/0x3bb
>> [2870477.695726]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
>> [2870477.695736]  [<ffffffff802230e6>] __up_read+0x19/0x7f
>> [2870477.695764]  [<ffffffff8824f8f4>] :xfs:xfs_iunlock+0x57/0x79
>> [2870477.695776]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
>> [2870477.695784]  [<ffffffff802230e6>] __up_read+0x19/0x7f
>> [2870477.695791]  [<ffffffff80209f4c>] __d_lookup+0xb0/0xff
>> [2870477.695803]  [<ffffffff8020cd4a>] _atomic_dec_and_lock+0x39/0x57
>> [2870477.695814]  [<ffffffff8022d6db>] mntput_no_expire+0x19/0x89
>> [2870477.695829]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
>> [2870477.695837]  [<ffffffff802230e6>] __up_read+0x19/0x7f
>> [2870477.695861]  [<ffffffff8824f8f4>] :xfs:xfs_iunlock+0x57/0x79
>> [2870477.695887]  [<ffffffff882680af>] :xfs:xfs_access+0x3d/0x46
>> [2870477.695899]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
>> [2870477.695923]  [<ffffffff802df4a3>] vfs_mkdir+0xe3/0x152
>> [2870477.695933]  [<ffffffff802dfa79>] sys_mkdirat+0xa3/0xe4
>> [2870477.695953]  [<ffffffff80260295>] tracesys+0x47/0xb6
>> [2870477.695963]  [<ffffffff802602f9>] tracesys+0xab/0xb6
>> [2870477.695977]
>> [2870477.695985] xfs_force_shutdown(sda5,0x8) called from line 1139 of file fs/xfs/xfs_trans.c. Return address = 0xffffffff88262c46
>> [2870477.696452] Filesystem "sda5": Corruption of in-memory data detected. Shutting down filesystem: sda5
>> [2870477.696464] Please umount the filesystem, and rectify the problem(s)
>> # ls -l /store
>> ls: /store: Input/output error
>> ?--------- 0 root root 0 Jan  1  1970 /store
>>
>> The filesystem is ~1T in size:
>> # df -hT /store
>> Filesystem    Type  Size  Used Avail Use% Mounted on
>> /dev/sda5     xfs   910G  142G  769G  16% /store
>>
>> Using CentOS 5.9 with kernel 2.6.18-348.el5xen
>>
>> The filesystem is in a virtual machine (Xen) and on top of LVM.
>> Filesystem was created using mkfs.xfs defaults with
>> xfsprogs-2.9.4-1.el5.centos (that's the one that comes with CentOS
>> 5.x by default.)
>> These are the defaults with which the filesystem was created:
>> # xfs_info /store
>> meta-data=/dev/sda5              isize=256    agcount=32, agsize=7454720 blks
>>          =                       sectsz=512   attr=0
>> data     =                       bsize=4096   blocks=238551040, imaxpct=25
>>          =                       sunit=0      swidth=0 blks, unwritten=1
>> naming   =version 2              bsize=4096
>> log      =internal               bsize=4096   blocks=32768, version=1
>>          =                       sectsz=512   sunit=0 blks, lazy-count=0
>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>> The problem is reproducible and I don't think it's hardware-related:
>> it was reproduced on multiple servers of the same type, so I doubt
>> it's a memory issue or anything like that.
>> Is this a known issue? If it is, what's the fix? I went through the
>> kernel updates for CentOS 5.10 (newer kernel), but didn't see any
>> XFS-related fixes since CentOS 5.9.
>> Any help will be greatly appreciated...
>>
>>

-- 
"In language, clarity is everything."  
- Confucius