"James A. Peltier" <jpeltier at sfu.ca> writes: > | I am having an issue with an XFS filesystem shutting down under high > | load with very many small files. Basically, I have around 3.5 - 4 > | million files on this filesystem. New files are being written to the > | FS all the time, until I get to 9-11 mln small files (35k on > | average). > | > | at some point I get the following in dmesg: > | > | [2870477.695512] Filesystem "sda5": XFS internal error > | xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c. Caller > | 0xffffffff8826bb7d [2870477.695558] [2870477.695559] Call Trace: > | [2870477.695611] [<ffffffff88262c28>] > | :xfs:xfs_trans_cancel+0x5b/0xfe [2870477.695643] > | [<ffffffff8826bb7d>] :xfs:xfs_mkdir+0x57c/0x5d7 [2870477.695673] > | [<ffffffff8822f3f8>] :xfs:xfs_attr_get+0xbf/0xd2 [2870477.695707] > | [<ffffffff88273326>] :xfs:xfs_vn_mknod+0x1e1/0x3bb [2870477.695726] > | [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14 [2870477.695736] > | [<ffffffff802230e6>] __up_read+0x19/0x7f [2870477.695764] > | [<ffffffff8824f8f4>] :xfs:xfs_iunlock+0x57/0x79 [2870477.695776] > | [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14 [2870477.695784] > | [<ffffffff802230e6>] __up_read+0x19/0x7f [2870477.695791] > | [<ffffffff80209f4c>] __d_lookup+0xb0/0xff [2870477.695803] > | [<ffffffff8020cd4a>] _atomic_dec_and_lock+0x39/0x57 > | [2870477.695814] [<ffffffff8022d6db>] mntput_no_expire+0x19/0x89 > | [2870477.695829] [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14 > | [2870477.695837] [<ffffffff802230e6>] __up_read+0x19/0x7f > | [2870477.695861] [<ffffffff8824f8f4>] :xfs:xfs_iunlock+0x57/0x79 > | [2870477.695887] [<ffffffff882680af>] :xfs:xfs_access+0x3d/0x46 > | [2870477.695899] [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14 > | [2870477.695923] [<ffffffff802df4a3>] vfs_mkdir+0xe3/0x152 > | [2870477.695933] [<ffffffff802dfa79>] sys_mkdirat+0xa3/0xe4 > | [2870477.695953] [<ffffffff80260295>] tracesys+0x47/0xb6 > | [2870477.695963] [<ffffffff802602f9>] tracesys+0xab/0xb6 > | [2870477.695977] [2870477.695985] xfs_force_shutdown(sda5,0x8) > | called from line 1139 of file fs/xfs/xfs_trans.c. Return address = > | 0xffffffff88262c46 [2870477.696452] Filesystem "sda5": Corruption of > | in-memory data detected. Shutting down filesystem: sda5 > | [2870477.696464] Please umount the filesystem, and rectify the > | problem(s) > | > | # ls -l /store ls: /store: Input/output error ?--------- 0 root root > | 0 Jan 1 1970 /store > | > | Filesystems is ~1T in size # df -hT /store Filesystem Type > | Size Used Avail Use% Mounted on /dev/sda5 xfs 910G 142G > | 769G 16% /store > | > | > | Using CentOS 5.9 with kernel 2.6.18-348.el5xen > | > | > | The filesystem is in a virtual machine (Xen) and on top of LVM. > | > | Filesystem was created using mkfs.xfs defaults with > | xfsprogs-2.9.4-1.el5.centos (that's the one that comes with CentOS > | 5.x by default.) > | > | These are the defaults with which the filesystem was created: # > | xfs_info /store meta-data=/dev/sda5 isize=256 > | agcount=32, agsize=7454720 blks = > | sectsz=512 attr=0 data = bsize=4096 > | blocks=238551040, imaxpct=25 = > | sunit=0 swidth=0 blks, unwritten=1 naming =version > | 2 bsize=4096 log =internal > | bsize=4096 blocks=32768, version=1 > | = sectsz=512 sunit=0 blks, > | lazy-count=0 realtime =none extsz=4096 blocks=0, > | rtextents=0 > | > | The problem is reproducible and I don't think it's hardware related. > | The problem was reproduced on multiple servers of the same type. So, > | I doubt it's a memory issue or something like that. > | > | Is that a known issue? If it is then what's the fix? I went through > | the kernel updates for CentOS 5.10 (newer kernel), but didn't see > | any xfs related fixes since CentOS 5.9 > | > | Any help will be greatly appreciated... > | > | > | -- "If we really understand the problem, the answer will come out of > | it, because the answer is not separate from the problem." - > | Krishnamurti > > Sorry, further to this, most bugs related to XFS are related to kernel > bugs. I can see that you're running an older kernel and just because > you don't see the bugs listed in the errata doesn't mean the bugs > haven't been found as part of the backport process So, you suggest I try my luck with the newer kernel from CentOS 5.10? What's the proper way to open a bug for this against CentOS 5 / RHEL 5? -- "Individual rights are not subject to a public vote; a majority has no right to vote away the rights of a minority; the political function of rights is precisely to protect minorities from oppression by majorities (and the smallest minority on earth is the individual)." - Ayn Rand