[CentOS] Centos 6.6, apparent xfs corruption

Mon Sep 21 21:23:37 UTC 2015
J Martin Rushton <martinrushton56 at btinternet.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I think you need to read this from the bottom up:

"Corruption of in-memory data detected.  Shutting down filesystem"
is what XFS logs when it calls xfs_do_force_shutdown to shut down the
filesystem.  That call comes from fs/xfs/xfs_trans.c, where the
transaction cancel fails and is reported as
"Internal error xfs_trans_cancel".
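
To make the shutdown reason in the trace concrete, here is a minimal
Python sketch that pulls the flag out of the xfs_do_force_shutdown line
and names its bits.  The flag table is my recollection of the SHUTDOWN_*
definitions in fs/xfs/xfs_mount.h for kernels of this era, so treat it as
an assumption and check it against your own kernel source.
decode_shutdown_flags is a hypothetical helper name, not anything XFS
ships.

```python
import re

# ASSUMPTION: values recalled from fs/xfs/xfs_mount.h; verify against your kernel source.
SHUTDOWN_FLAGS = {
    0x0001: "SHUTDOWN_META_IO_ERROR",
    0x0002: "SHUTDOWN_LOG_IO_ERROR",
    0x0004: "SHUTDOWN_FORCE_UMOUNT",
    0x0008: "SHUTDOWN_CORRUPT_INCORE",
    0x0010: "SHUTDOWN_REMOTE_REQ",
    0x0020: "SHUTDOWN_DEVICE_REQ",
}

# Matches the hex argument in e.g. "xfs_do_force_shutdown(0x8)".
SHUTDOWN_RE = re.compile(r"xfs_do_force_shutdown\((0x[0-9a-fA-F]+)\)")


def decode_shutdown_flags(log_text):
    """Return the names of the flag bits in the first shutdown line, or [] if none is found."""
    match = SHUTDOWN_RE.search(log_text)
    if match is None:
        return []
    flags = int(match.group(1), 16)
    names = [name for bit, name in sorted(SHUTDOWN_FLAGS.items()) if flags & bit]
    # Report any bits that are not in the (assumed) table instead of dropping them.
    unknown = flags & ~sum(SHUTDOWN_FLAGS)
    if unknown:
        names.append("UNKNOWN(%#x)" % unknown)
    return names


# The shutdown line from the log quoted in this thread.
line = ("XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 1949 "
        "of file fs/xfs/xfs_trans.c.")
print(decode_shutdown_flags(line))
```

If the table is right, the 0x8 in the quoted log maps to the in-core
corruption flag, which matches the "Corruption of in-memory data
detected" message that follows it.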

In other words, I would look at the memory corruption first.  This
_could_ be a kernel problem, but I would suggest starting with an
extended memory check; it smells to me of a failing chip.
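
Before taking the box down for a memory test, it may be worth checking
whether the kernel has already logged any hardware complaints.  The
sketch below is only a rough filter: the patterns are my assumption about
what machine-check and EDAC messages usually contain, and
find_hardware_hints is a hypothetical helper name.  A clean result proves
nothing about the RAM; it just means nothing obvious was logged.

```python
import re

# ASSUMPTION: substrings that commonly show up in kernel hardware-error reports.
HARDWARE_HINTS = re.compile(r"Machine check|Hardware Error|EDAC|mce:", re.IGNORECASE)


def find_hardware_hints(lines):
    """Return (line_number, text) pairs for lines that look like hardware error reports."""
    return [
        (number, text.rstrip("\n"))
        for number, text in enumerate(lines, start=1)
        if HARDWARE_HINTS.search(text)
    ]


# One line from the log quoted in this thread; it is an XFS message, not a
# hardware report, so the filter should not pick it up.
sample = [
    "Sep 18 20:35:15 gries kernel: XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem\n",
]
print(find_hardware_hints(sample))
```

To run it against a real log, pass an open file instead of the sample
list, for example find_hardware_hints(open("/var/log/messages")) on a
stock CentOS syslog setup.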

Just my 2d worth!

Martin

On 21/09/15 21:41, Nicholas Geovanis wrote:
> Hi all - After several months of worry-free operation, we received
> the following kernel messages about an xfs filesystem running under
> CentOS 6.6. The proximate causes appear to be "Internal error
> xfs_trans_cancel" and "Corruption of in-memory data detected.
> Shutting down filesystem". The filesystem is back up and mounted, and
> appears to be working OK underneath a Splunk datastore. Does anyone
> have a suggestion on diagnosis, or know of related problems? Many
> thanks. Nick Geo
> 
> Sep 18 20:35:15 gries kernel: XFS (dm-2): Internal error xfs_trans_cancel at line 1948 of file fs/xfs/xfs_trans.c.  Caller 0xffffffffa01f1388
> Sep 18 20:35:15 gries kernel:
> Sep 18 20:35:15 gries kernel: Pid: 24005, comm: splunkd Not tainted 2.6.32-504.8.1.el6.x86_64 #1
> Sep 18 20:35:15 gries kernel: Call Trace:
> Sep 18 20:35:15 gries kernel: [<ffffffffa01d57bf>] ? xfs_error_report+0x3f/0x50 [xfs]
> Sep 18 20:35:15 gries kernel: [<ffffffffa01f1388>] ? xfs_rename+0x2d8/0x720 [xfs]
> Sep 18 20:35:15 gries kernel: [<ffffffffa01f2e55>] ? xfs_trans_cancel+0xf5/0x120 [xfs]
> Sep 18 20:35:15 gries kernel: [<ffffffffa01f1388>] ? xfs_rename+0x2d8/0x720 [xfs]
> Sep 18 20:35:15 gries kernel: [<ffffffff8114eef9>] ? __do_fault+0x469/0x530
> Sep 18 20:35:15 gries kernel: [<ffffffffa02050d6>] ? xfs_vn_rename+0x66/0x70 [xfs]
> Sep 18 20:35:15 gries kernel: [<ffffffff8119d149>] ? vfs_rename+0x419/0x480
> Sep 18 20:35:15 gries kernel: [<ffffffff8119fab9>] ? sys_renameat+0x309/0x3a0
> Sep 18 20:35:15 gries kernel: [<ffffffff8128c295>] ? _atomic_dec_and_lock+0x55/0x80
> Sep 18 20:35:15 gries kernel: [<ffffffff811b07e0>] ? mntput_no_expire+0x30/0x110
> Sep 18 20:35:15 gries kernel: [<ffffffff810e5c87>] ? audit_syscall_entry+0x1d7/0x200
> Sep 18 20:35:15 gries kernel: [<ffffffff8119fb6b>] ? sys_rename+0x1b/0x20
> Sep 18 20:35:15 gries kernel: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
> Sep 18 20:35:15 gries kernel: XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 1949 of file fs/xfs/xfs_trans.c.  Return address = 0xffffffffa01f2e6e
> Sep 18 20:35:15 gries kernel: XFS (dm-2): Corruption of in-memory data detected.  Shutting down filesystem
> Sep 18 20:35:15 gries kernel: XFS (dm-2): Please umount the filesystem and rectify the problem(s)
> Sep 18 20:35:27 gries kernel: XFS (dm-2): xfs_log_force: error 5 returned.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iQIcBAEBAgAGBQJWAHVZAAoJEAF3yXsqtyBlT7IQAM45t0n8I7aQ203LjBjSUx39
9O4xu8gTYb1XFdoM2DkzPAygKuiVYRiN3dgcMO6KP2mgT+MNK8G2043lY3v6w5wK
HzgYQ0/GwyDkJiy5EqaG6JWRUDyF788BU3kiWLJUxclsTqXN9Aw9E58aiu2duNvj
+e5WSflUbN1DdLep0LdGe0QR4QzsQBiFUhgt4i3EU6oYPQvS3dJyByPAOnD9t7+s
dbJQ1i7fDmLpCaYGvon8DoDQSE8aA/ums94NJzkPYyIza/D5pBfFf6r3RH3Xrg85
6aYFfjIBXcEQgq4DyEccJviaJ5eOWMCLocvMni6oWKml3+u6PtEvnw6sqIWoKwiC
xhyUVOXmF3qgH3xhx8pXMag0eO5hGm9ApGNckaXLy/j0AinCV9APvE9rAtYG94j+
IL0x9WCvtgduJvXZaSnekPaKKbT9MS1G+Zohi+WlY8u7PZlZdXzjyAgC8BPJQAyZ
yNendFRl7WQB1rbWZQJJD4tlhlU/Nwpwy6BtHn/lbhiYlFaTP1ytS09vToGTJw0A
BwX0+f4PnnYJV58X7WtEm1jhdsO/u+hykHqqmsq7ATsX9I6bkFTNwm13+Khf88zy
ve4fLJ/JEtJi2nVwD6K9mEqTO+I1CiGhJnOnfrphPsLa0WSkBtjl+FWM0jYFTSwR
TAavAlYHzW5/9BP0eNmL
=UGN5
-----END PGP SIGNATURE-----