Hi All,
I am having an issue with an XFS filesystem shutting down under high load with very many small files. Basically, I have around 3.5-4 million files on this filesystem, and new files are being written to the FS all the time, until I get to 9-11 million small files (35 KB on average).
at some point I get the following in dmesg:
[2870477.695512] Filesystem "sda5": XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c.  Caller 0xffffffff8826bb7d
[2870477.695558]
[2870477.695559] Call Trace:
[2870477.695611]  [<ffffffff88262c28>] :xfs:xfs_trans_cancel+0x5b/0xfe
[2870477.695643]  [<ffffffff8826bb7d>] :xfs:xfs_mkdir+0x57c/0x5d7
[2870477.695673]  [<ffffffff8822f3f8>] :xfs:xfs_attr_get+0xbf/0xd2
[2870477.695707]  [<ffffffff88273326>] :xfs:xfs_vn_mknod+0x1e1/0x3bb
[2870477.695726]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
[2870477.695736]  [<ffffffff802230e6>] __up_read+0x19/0x7f
[2870477.695764]  [<ffffffff8824f8f4>] :xfs:xfs_iunlock+0x57/0x79
[2870477.695776]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
[2870477.695784]  [<ffffffff802230e6>] __up_read+0x19/0x7f
[2870477.695791]  [<ffffffff80209f4c>] __d_lookup+0xb0/0xff
[2870477.695803]  [<ffffffff8020cd4a>] _atomic_dec_and_lock+0x39/0x57
[2870477.695814]  [<ffffffff8022d6db>] mntput_no_expire+0x19/0x89
[2870477.695829]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
[2870477.695837]  [<ffffffff802230e6>] __up_read+0x19/0x7f
[2870477.695861]  [<ffffffff8824f8f4>] :xfs:xfs_iunlock+0x57/0x79
[2870477.695887]  [<ffffffff882680af>] :xfs:xfs_access+0x3d/0x46
[2870477.695899]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
[2870477.695923]  [<ffffffff802df4a3>] vfs_mkdir+0xe3/0x152
[2870477.695933]  [<ffffffff802dfa79>] sys_mkdirat+0xa3/0xe4
[2870477.695953]  [<ffffffff80260295>] tracesys+0x47/0xb6
[2870477.695963]  [<ffffffff802602f9>] tracesys+0xab/0xb6
[2870477.695977]
[2870477.695985] xfs_force_shutdown(sda5,0x8) called from line 1139 of file fs/xfs/xfs_trans.c.  Return address = 0xffffffff88262c46
[2870477.696452] Filesystem "sda5": Corruption of in-memory data detected.  Shutting down filesystem: sda5
[2870477.696464] Please umount the filesystem, and rectify the problem(s)
# ls -l /store
ls: /store: Input/output error
?--------- 0 root root 0 Jan  1  1970 /store
The filesystem is ~1T in size:
# df -hT /store
Filesystem    Type   Size  Used Avail Use% Mounted on
/dev/sda5     xfs    910G  142G  769G  16% /store
Using CentOS 5.9 with kernel 2.6.18-348.el5xen
The filesystem is in a virtual machine (Xen) and on top of LVM.
Filesystem was created using mkfs.xfs defaults with xfsprogs-2.9.4-1.el5.centos (that's the one that comes with CentOS 5.x by default.)
These are the defaults with which the filesystem was created:
# xfs_info /store
meta-data=/dev/sda5              isize=256    agcount=32, agsize=7454720 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=238551040, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0
The problem is reproducible, and I don't think it's hardware related: it has been reproduced on multiple servers of the same type, so I doubt it's a memory issue or anything like that.
Is this a known issue? If it is, what's the fix? I went through the kernel updates for CentOS 5.10 (newer kernel) but didn't see any XFS-related fixes since CentOS 5.9.
Any help will be greatly appreciated...
Is this filesystem mounted with the inode64 option?
No, since the FS is slightly smaller than 1T in size. From my understanding, inode64 would only be required for XFS filesystems larger than 1T?
Sorry, further to this: most XFS problems come down to kernel bugs. I can see that you're running an older kernel, and just because you don't see the bugs listed in the errata doesn't mean they haven't been addressed as part of the backport process.
So, you suggest I try my luck with the newer kernel from CentOS 5.10?
What's the proper way to open a bug for this against CentOS 5 / RHEL 5?
If you try it with the latest kernel and it works, then I don't think there is any bug to file.
Have you seen this: http://marc.info/?l=linux-kernel&m=116476406605998&w=2
It might not even be a bug but a hardware issue...
- Jitse
----- Original Message -----
| So, you suggest I try my luck with the newer kernel from CentOS 5.10?
|
| What's the proper way to open a bug for this against CentOS 5 / RHEL 5?
The recommendation is to always run the latest kernel before filing a bug. Looking at the stack trace, it appears that this system is doing a lot of locking, IRQ handling, and XFS/VFS work. You're probably looking too closely for something XFS-specific rather than something that may be SCSI/FC related or VFS related. There have been seven CentOS 5 kernel updates since your currently running kernel, covering many facets of file systems, drivers and subsystems.
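For what it's worth, here's a rough sketch of checking what's available and pulling in a newer kernel (this assumes the kernel-xen variant you're already running; adjust the package name if yours differs):

# uname -r                               (kernel currently running)
# rpm -q kernel-xen                      (kernel packages currently installed)
# yum --showduplicates list kernel-xen   (newer builds available in the repos)
# yum update kernel-xen                  (install the newest one)

Then reboot into the new kernel and try to reproduce the problem again.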
That said, one way to possibly mitigate this is the noatime mount option, which may at least delay the problem.
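For example, something along these lines (a sketch based on the /store mount from your df output; merge noatime with whatever options you already use):

/dev/sda5   /store   xfs   defaults,noatime   0 0     (fstab entry)

# mount -o remount,noatime /store                     (applies it to the live mount without downtime)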
I had a similar issue: an NFS server with XFS as the FS for backups of a very large system. I have a 2TB RAID-1 volume, and I started rsyncing the backup, and at some point I hit this issue. There were lots of files there, and the system has 8GB of RAM and CentOS 6.5 64-bit. I didn't bother to look into the issue, since ReiserFS handled the same workload without any problems.
I never knew about the inode64 option; is it only a mount option, or is there also something for the mkfs.xfs command?
Also, in case I want to test it again, what would you recommend so as not to crash the system when a lot of memory is in use?
Thanks, Eliezer
inode64 is a mount-time option, and it is a one-way option as well: once you have mounted a filesystem with inode64, you can't go back. It has to do with inode allocation. If you have older operating systems, mounting a filesystem with inode64 will lead to "odd behaviour", because it allows inodes to be allocated anywhere in the filesystem instead of being "stuck" within the first 1TB. inode64 leads to better filesystem performance for large filesystems. Nothing needs to be done during the mkfs portion.
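As a sketch, using the OP's /store mount as the example:

/dev/sda5   /store   xfs   defaults,inode64   0 0     (fstab entry)

# umount /store
# mount /store

From then on, new inodes can be allocated above the 1TB boundary, which is exactly why the option is effectively one-way.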
If you don't use inode64, then once the first 1TB is completely filled there is no more room to allocate inodes.
I just noticed the OP is running a large XFS system on EL 5? I didn't think XFS was officially supported on 5; it was considered experimental. I would strongly urge installing the latest CentOS 6 ASAP and using that instead.
The OP only has a 1TB volume, so not large. XFS was supported in later versions of 5 (around 5.5, maybe), but kickstart didn't handle it because xfsprogs was not included in anaconda, so you couldn't format an XFS filesystem during install without a %post section. Moving to a later kernel at the very least is recommended, but yes, running 6 would be better. Running 7 would be best, since XFS is the default for 7... oh wait... umm... nvm... Soon ;)
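For reference, the usual workaround on 5 was a small %post section roughly like this (purely a sketch; the device name, mount point and options are made up for illustration, and whether yum can reach a repository from %post depends on how the install is set up):

%post
yum -y install xfsprogs                                (userspace tools)
mkfs.xfs -f /dev/sda5                                  (format the spare partition)
mkdir -p /store
echo "/dev/sda5  /store  xfs  defaults  0 0" >> /etc/fstab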
Yes, I run XFS on a ~1T (900G) partition, so I don't think I need to consider inode64 for that. What is the official situation with XFS and CentOS 5? It was a technology preview in CentOS 5.4, I think? How about now?
5 is very close to EOL now. I never considered XFS as anything other than a preview in 5, and I don't believe that changed in the later updates; the only mention is in the 5.4 release notes, not 5.5-5.10.
I only use XFS on CentOS 6, where it's very stable.
End of Production 3 (End of Production Phase) is on March 31, 2017 [1]. That's not all that close, in my opinion.
And regarding XFS, from the Release Notes of 5.7 [2]: "Usage of XFS in conjunction with Red Hat Enterprise Linux 5.7 High Availability Add-On/Clustering as a file system resource is now fully supported." Whatever that means.
[1] https://access.redhat.com/support/policy/updates/errata [2] https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/htm...
Official XFS support was added to RHEL in 5.7, so it is in our source code.
Although all that means is that you get to ask for help on this list; any support on CentOS is whatever the community can provide you, or what you can provide yourself.
It is officially supported. Update to the latest kernel and report back; otherwise we won't be able to continue to help.
My systems have 17G of RAM and 1T XFS partitions. I was under the impression that the inode64 option only applies to filesystems larger than 1T in size?
Your impression is correct. The OP's filesystem is less than 1TB, so inode64 is not the problem; it is likely a kernel bug.
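If you want to rule out inode allocation anyway, a couple of read-only checks (a sketch; run them against the affected mount/device) show inode headroom and how free space is spread out:

# df -i /store                            (inode usage and free inodes)
# xfs_db -r -c "freesp -s" /dev/sda5      (free space summary, read-only)

If IFree is still large and free space is plentiful, the shutdown points back at the kernel rather than at running out of inodes.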