[CentOS] corruption of in-memory data detected (xfs)

James A. Peltier jpeltier at sfu.ca
Tue Jul 1 18:32:39 UTC 2014


----- Original Message -----
| 
| Hi All,
| 
| I am having an issue with an XFS filesystem shutting down under high
| load with a very large number of small files.
| Basically, I have around 3.5-4 million files on this filesystem.
| New files are being written to the FS all the time, until I get to
| 9-11 million small files (35k on average).
| 
| At some point I get the following in dmesg:
| 
| [2870477.695512] Filesystem "sda5": XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c. Caller 0xffffffff8826bb7d
| [2870477.695558]
| [2870477.695559] Call Trace:
| [2870477.695611]  [<ffffffff88262c28>] :xfs:xfs_trans_cancel+0x5b/0xfe
| [2870477.695643]  [<ffffffff8826bb7d>] :xfs:xfs_mkdir+0x57c/0x5d7
| [2870477.695673]  [<ffffffff8822f3f8>] :xfs:xfs_attr_get+0xbf/0xd2
| [2870477.695707]  [<ffffffff88273326>] :xfs:xfs_vn_mknod+0x1e1/0x3bb
| [2870477.695726]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
| [2870477.695736]  [<ffffffff802230e6>] __up_read+0x19/0x7f
| [2870477.695764]  [<ffffffff8824f8f4>] :xfs:xfs_iunlock+0x57/0x79
| [2870477.695776]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
| [2870477.695784]  [<ffffffff802230e6>] __up_read+0x19/0x7f
| [2870477.695791]  [<ffffffff80209f4c>] __d_lookup+0xb0/0xff
| [2870477.695803]  [<ffffffff8020cd4a>] _atomic_dec_and_lock+0x39/0x57
| [2870477.695814]  [<ffffffff8022d6db>] mntput_no_expire+0x19/0x89
| [2870477.695829]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
| [2870477.695837]  [<ffffffff802230e6>] __up_read+0x19/0x7f
| [2870477.695861]  [<ffffffff8824f8f4>] :xfs:xfs_iunlock+0x57/0x79
| [2870477.695887]  [<ffffffff882680af>] :xfs:xfs_access+0x3d/0x46
| [2870477.695899]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
| [2870477.695923]  [<ffffffff802df4a3>] vfs_mkdir+0xe3/0x152
| [2870477.695933]  [<ffffffff802dfa79>] sys_mkdirat+0xa3/0xe4
| [2870477.695953]  [<ffffffff80260295>] tracesys+0x47/0xb6
| [2870477.695963]  [<ffffffff802602f9>] tracesys+0xab/0xb6
| [2870477.695977]
| [2870477.695985] xfs_force_shutdown(sda5,0x8) called from line 1139 of file fs/xfs/xfs_trans.c.  Return address = 0xffffffff88262c46
| [2870477.696452] Filesystem "sda5": Corruption of in-memory data detected.  Shutting down filesystem: sda5
| [2870477.696464] Please umount the filesystem, and rectify the problem(s)
| 
| # ls -l /store
| ls: /store: Input/output error
| ?--------- 0 root root 0 Jan  1  1970 /store
| 
| The filesystem is ~1T in size:
| # df -hT /store
| Filesystem    Type    Size  Used Avail Use% Mounted on
| /dev/sda5      xfs    910G  142G  769G  16% /store
| 
| 
| Using CentOS 5.9 with kernel 2.6.18-348.el5xen
| 
| 
| The filesystem is in a virtual machine (Xen) and on top of LVM.
| 
| Filesystem was created using mkfs.xfs defaults with
| xfsprogs-2.9.4-1.el5.centos (that's the one that comes with
| CentOS 5.x by default.)
| 
| These are the defaults with which the filesystem was created:
| # xfs_info /store
| meta-data=/dev/sda5              isize=256    agcount=32, agsize=7454720 blks
|          =                       sectsz=512   attr=0
| data     =                       bsize=4096   blocks=238551040, imaxpct=25
|          =                       sunit=0      swidth=0 blks, unwritten=1
| naming   =version 2              bsize=4096
| log      =internal               bsize=4096   blocks=32768, version=1
|          =                       sectsz=512   sunit=0 blks, lazy-count=0
| realtime =none                   extsz=4096   blocks=0, rtextents=0
| 
| The problem is reproducible and I don't think it's hardware related.
| The problem was reproduced on multiple
| servers of the same type. So, I doubt it's a memory issue or
| something like that.
| 
| Is this a known issue? If it is, what's the fix? I went through
| the kernel updates for CentOS 5.10 (newer kernel), but didn't see
| any XFS-related fixes since CentOS 5.9.
| 
| Any help will be greatly appreciated...
| 
| 
| --
| "If we really understand the problem, the answer will come out of it,
| because the answer is not separate from the problem."
| - Krishnamurti
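
As a quick sanity check on the numbers above, the geometry reported by xfs_info can be multiplied out and compared with the df output. The following is only a rough sketch in Python; the variable names are mine, and the inode ceiling assumes that imaxpct is the maximum percentage of filesystem space that may be used for inodes.

```python
# Values copied from the xfs_info output quoted above.
agcount = 32            # allocation groups
agsize = 7454720        # blocks per allocation group
bsize = 4096            # filesystem block size in bytes
blocks = 238551040      # data blocks reported by xfs_info
isize = 256             # inode size in bytes
imaxpct = 25            # assumed: max % of space usable for inodes

# The allocation groups should add up to the reported block count.
total_blocks = agcount * agsize

# Size of the data section in bytes and in GiB (df -h uses powers of 1024).
size_bytes = total_blocks * bsize
size_gib = size_bytes / 2**30

# Rough upper bound on the number of inodes under the imaxpct assumption.
max_inodes = size_bytes * imaxpct // 100 // isize

print(total_blocks == blocks)   # do the AG figures match the block count?
print(size_gib)
print(max_inodes)
```

By this arithmetic the data section comes to 910 GiB, which agrees with df, and the inode ceiling is in the hundreds of millions, far above the 9-11 million files reported. With only 16% of the space in use, nothing in these figures points to the filesystem simply being full.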

Sorry, further to this: most XFS-related bugs are kernel bugs. I can see that you're running an older kernel, and just because you don't see the bugs listed in the errata doesn't mean they haven't been found as part of the backport process.
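
One way to look beyond the errata is to search the changelog of the installed kernel package for XFS entries. Below is a minimal sketch in Python, assuming an RPM-based host where `rpm -q --changelog` is available (for the Xen kernel above the package name may differ, e.g. kernel-xen). The function names are hypothetical, and the filter is a plain case-insensitive match on "xfs", not a parser for any particular changelog format.

```python
import subprocess


def read_kernel_changelog(package="kernel"):
    """Return the RPM changelog of an installed package as text.

    Assumes the rpm command is on PATH; raises if the query fails.
    """
    result = subprocess.run(
        ["rpm", "-q", "--changelog", package],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


def xfs_changelog_entries(changelog_text):
    """Return the changelog lines that mention XFS, in original order."""
    return [
        line.strip()
        for line in changelog_text.splitlines()
        if "xfs" in line.lower()
    ]
```

On the affected machine, `xfs_changelog_entries(read_kernel_changelog())` would list every changelog line that mentions XFS, and the result could then be compared between the 5.9 and 5.10 kernels.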

-- 
James A. Peltier
Manager, IT Services - Research Computing Group
Simon Fraser University - Burnaby Campus
Phone   : 778-782-6573
Fax     : 778-782-3045
E-Mail  : jpeltier at sfu.ca
Website : http://www.sfu.ca/itservices

To be original seek your inspiration from unexpected sources.


