[CentOS] Centos 6.6, apparent xfs corruption

Mon Sep 21 21:48:58 UTC 2015
James A. Peltier <jpeltier at sfu.ca>


----- Original Message -----
| -----BEGIN PGP SIGNED MESSAGE-----
| Hash: SHA1
| 
| I think you need to read this from the bottom up:
| 
| "Corruption of in-memory data detected.  Shutting down filesystem"
| so XFS calls xfs_do_force_shutdown to shut down the filesystem.  The
| call comes from fs/xfs/xfs_trans.c which fails, and so reports
| "Internal error xfs_trans_cancel".
| 
| In other words, I would look at the memory corruption first.  This
| _could_ be a kernel problem, but I would suggest starting with an
| extended memory check, it smells to me of a failing chip.
| 
| Just my 2d worth!
| 
| Martin
| 
| On 21/09/15 21:41, Nicholas Geovanis wrote:
| > Hi all - After several months of worry-free operation, we received
| > the following kernel messages about an xfs filesystem running under
| > CentOS 6.6. The proximate causes appear to be "Internal error
| > xfs_trans_cancel" and "Corruption of in-memory data detected.
| > Shutting down filesystem". The filesystem is back up, mounted,
| > appears to be working OK underlying a Splunk datastore. Does anyone
| > have a suggestion on diagnosis or known problems? Many
| > thanks.....Nick Geo
| > 
| > Sep 18 20:35:15 gries kernel: XFS (dm-2): Internal error xfs_trans_cancel at line 1948 of file fs/xfs/xfs_trans.c.  Caller 0xffffffffa01f1388
| > Sep 18 20:35:15 gries kernel:
| > Sep 18 20:35:15 gries kernel: Pid: 24005, comm: splunkd Not tainted 2.6.32-504.8.1.el6.x86_64 #1
| > Sep 18 20:35:15 gries kernel: Call Trace:
| > Sep 18 20:35:15 gries kernel: [<ffffffffa01d57bf>] ? xfs_error_report+0x3f/0x50 [xfs]
| > Sep 18 20:35:15 gries kernel: [<ffffffffa01f1388>] ? xfs_rename+0x2d8/0x720 [xfs]
| > Sep 18 20:35:15 gries kernel: [<ffffffffa01f2e55>] ? xfs_trans_cancel+0xf5/0x120 [xfs]
| > Sep 18 20:35:15 gries kernel: [<ffffffffa01f1388>] ? xfs_rename+0x2d8/0x720 [xfs]
| > Sep 18 20:35:15 gries kernel: [<ffffffff8114eef9>] ? __do_fault+0x469/0x530
| > Sep 18 20:35:15 gries kernel: [<ffffffffa02050d6>] ? xfs_vn_rename+0x66/0x70 [xfs]
| > Sep 18 20:35:15 gries kernel: [<ffffffff8119d149>] ? vfs_rename+0x419/0x480
| > Sep 18 20:35:15 gries kernel: [<ffffffff8119fab9>] ? sys_renameat+0x309/0x3a0
| > Sep 18 20:35:15 gries kernel: [<ffffffff8128c295>] ? _atomic_dec_and_lock+0x55/0x80
| > Sep 18 20:35:15 gries kernel: [<ffffffff811b07e0>] ? mntput_no_expire+0x30/0x110
| > Sep 18 20:35:15 gries kernel: [<ffffffff810e5c87>] ? audit_syscall_entry+0x1d7/0x200
| > Sep 18 20:35:15 gries kernel: [<ffffffff8119fb6b>] ? sys_rename+0x1b/0x20
| > Sep 18 20:35:15 gries kernel: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
| > Sep 18 20:35:15 gries kernel: XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 1949 of file fs/xfs/xfs_trans.c.  Return address = 0xffffffffa01f2e6e
| > Sep 18 20:35:15 gries kernel: XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
| > Sep 18 20:35:15 gries kernel: XFS (dm-2): Please umount the filesystem and rectify the problem(s)
| > Sep 18 20:35:27 gries kernel: XFS (dm-2): xfs_log_force: error 5 returned.

Do you have any XFS optimizations enabled in /etc/fstab, such as logbsize or nobarrier?  Is the filesystem full, and if not, what percentage of it is still available?  Some of these optimizations can cause a similar type of error when there is not enough free space for extent allocations or for filesystem rebalancing to take place.
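
If it helps to collect both answers in one go, here is a rough Python sketch, not a definitive tool.  It assumes the standard six-field fstab layout and only looks for the two options named above; the function names and the sample device and mount point are made up for illustration.

```python
import os

# Only the two options named in the question; extend the tuple as needed.
WATCHED_OPTIONS = ("logbsize", "nobarrier")

# Hypothetical fstab content, for illustration only.
SAMPLE_FSTAB = """\
# device                 mountpoint   type  options                        dump pass
/dev/mapper/vg-root      /            ext4  defaults                       1 1
/dev/mapper/vg-splunk    /opt/splunk  xfs   noatime,nobarrier,logbsize=256k 0 0
"""


def xfs_tuning_options(fstab_text, watched=WATCHED_OPTIONS):
    """Return {mount_point: [matching options]} for XFS entries."""
    found = {}
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split()
        if len(fields) < 4 or fields[2] != "xfs":
            continue  # not an XFS entry, or a malformed line
        # Options are comma-separated; some carry a value, e.g. logbsize=256k.
        hits = [opt for opt in fields[3].split(",")
                if opt.split("=", 1)[0] in watched]
        if hits:
            found[fields[1]] = hits
    return found


def percent_available(path):
    """Percentage of the filesystem at `path` available to unprivileged users."""
    st = os.statvfs(path)
    if st.f_blocks == 0:
        return 0.0
    return 100.0 * st.f_bavail / st.f_blocks


if __name__ == "__main__":
    for mount_point, opts in xfs_tuning_options(SAMPLE_FSTAB).items():
        print(mount_point, ",".join(opts))
    print("%.1f%% available on /" % percent_available("/"))
```

On a real host you would pass in the contents of /etc/fstab instead of the sample text.  /proc/mounts uses the same field order and shows the options actually in effect, so it can be fed to the same function.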

-- 
James A. Peltier
IT Services - Research Computing Group
Simon Fraser University - Burnaby Campus
Phone   : 604-365-6432
Fax     : 778-782-3045
E-Mail  : jpeltier at sfu.ca
Website : http://www.sfu.ca/itservices
Twitter : @sfu_rcg
Powering Engagement Through Technology