Before trying to defragment my whole filesystem (see attached mail for the whole story), I figured "Let's try it on some file".
So I did
xfs_bmap /raid/Temp/someDiskimage.iso
[output shows 101 extents and 1 hole]
Then I defragmented the file
xfs_fsr /raid/Temp/someDiskimage.iso
extents before:101 after:3 DONE
xfs_bmap /raid/Temp/someDiskimage.iso
[output shows 3 extents and 1 hole]
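For reference, xfs_fsr can also be pointed at a whole mounted XFS filesystem rather than at a single file. A minimal sketch, assuming the /raid mount point that appears later in this thread (the explicit time limit is only an illustration; 7200 seconds is also xfs_fsr's default):

# Reorganize files on the mounted filesystem, verbosely,
# stopping after at most two hours:
xfs_fsr -v -t 7200 /raid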
And now comes the bummer: I wanted to check the fragmentation of the whole filesystem (just to check):
xfs_db -r /dev/mapper/VolGroup00-LogVol04
xfs_db: unexpected XFS SB magic number 0x00000000
xfs_db: read failed: Invalid argument
xfs_db: data size check failed
cache_node_purge: refcount was 1, not zero (node=0x2a25c20)
xfs_db: cannot read root inode (22)
THAT output was definitely not there when I did this the last time, and therefore the new fragmentation report does not make me happy either:
xfs_db> frag
actual 0, ideal 0, fragmentation factor 0.00%
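As an aside, the same check can be run non-interactively; a sketch, assuming the device given on the command line really is the one backing the XFS filesystem (which, as it turns out below, was not the case here):

# One-shot, read-only fragmentation report, no interactive prompt:
xfs_db -r -c frag /dev/mapper/VolGroup00-LogVol04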
The filesystem is still mounted and working, and I don't dare do anything about it (I am in a mild state of panic) because I think it might not come back if I do.
Any suggestions are most welcome (I am googling myself before I do anything about it).
I swear to god: I did not do anything with the xfs_* commands other than the stuff mentioned above.
Bernhard
Bernhard Gschaider wrote:
Before trying to defragment my whole filesystem (see attached mail for the whole story), I figured "Let's try it on some file".
Might be better to ask on the XFS list: xfs@oss.sgi.com - see:
http://oss.sgi.com/mailman/listinfo/xfs
James Pearson
On Tue, 13 Apr 2010 11:58:39 +0100, "JP" == James Pearson <james-p@moving-picture.com> wrote:
JP> Bernhard Gschaider wrote:
>> Before trying to defragment my whole filesystem (see attached
>> mail for the whole story), I figured "Let's try it on some file".
JP> Might be better to ask on the XFS list: xfs@oss.sgi.com - see:
JP> http://oss.sgi.com/mailman/listinfo/xfs
Thank you. I did. I just figured that if this is a problem specific to the version of the xfs-utils that comes with CentOS, somebody here might have encountered it too.
Just to close this thread and remove any doubt it might have raised about XFS: the problem was a PEBCAK[1]. It was pointed out to me on the XFS list that the device I used for xfs_db was inconsistent with the info from xfs_info (I was blindly copying the device from the output of df).
Footnotes:
[1] PEBCAK: Problem exists between chair and keyboard
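In other words, the fix is simply to cross-check which device actually backs the XFS mount point before handing it to xfs_db. A sketch using the paths that appear elsewhere in this thread:

df /raid                        # may report a different LVM volume than expected
xfs_info /raid                  # shows the device the filesystem was created on
xfs_db -r -c frag /dev/VolGroup00/LogVol05    # use the device that xfs_info reports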
On Tue, 13 Apr 2010 11:54:53 +0200, "BG" == Bernhard Gschaider <bgschaid_lists@ice-sf.at> wrote:
BG> Before trying to defragment my whole filesystem (see attached
BG> mail for the whole story), I figured "Let's try it on some file".
BG> So I did
>> xfs_bmap /raid/Temp/someDiskimage.iso
BG> [output shows 101 extents and 1 hole]
BG> Then I defragmented the file
>> xfs_fsr /raid/Temp/someDiskimage.iso
BG> extents before:101 after:3 DONE
>> xfs_bmap /raid/Temp/someDiskimage.iso
BG> [output shows 3 extents and 1 hole]
BG> And now comes the bummer: I wanted to check the fragmentation
BG> of the whole filesystem (just to check):
>> xfs_db -r /dev/mapper/VolGroup00-LogVol04
BG> xfs_db: unexpected XFS SB magic number 0x00000000
BG> xfs_db: read failed: Invalid argument
BG> xfs_db: data size check failed
BG> cache_node_purge: refcount was 1, not zero (node=0x2a25c20)
BG> xfs_db: cannot read root inode (22)
BG> THAT output was definitely not there when I did this the last
BG> time, and therefore the new fragmentation report does not make
BG> me happy either:
xfs_db> frag
BG> actual 0, ideal 0, fragmentation factor 0.00%
BG> The filesystem is still mounted and working, and I don't dare
BG> do anything about it (I am in a mild state of panic) because
BG> I think it might not come back if I do.
BG> Any suggestions are most welcome (I am googling myself before
BG> I do anything about it).
BG> I swear to god: I did not do anything with the xfs_* commands
BG> other than the stuff mentioned above.
BG> Bernhard
BG> From: Bernhard Gschaider <bgschaid_lists@ice-sf.at>
BG> Subject: Re: [CentOS] Performance problems with XFS on Centos 5.4
BG> To: CentOS mailing list <centos@centos.org>
BG> Date: Mon, 12 Apr 2010 18:22:24 +0200
BG> Organization: ICE Stroemungsforschung
BG> Reply-To: CentOS mailing list <centos@centos.org>
On Fri, 9 Apr 2010 10:59:02 -0400, "RW" == Ross Walker <rswwalker@gmail.com> wrote:
RW> On Apr 9, 2010, at 9:59 AM, Bernhard Gschaider
RW> <bgschaid_lists@ice-sf.at> wrote:
>>> Hi!
>>>
>>> During the last weeks I experienced some performance problems
>>> with a large XFS filesystem. Sometimes, for instance, ls is
>>> painfully slow; immediately afterwards, ls on the same directory
>>> is immediate. I used strace on this ls and found that during the
>>> first ls the lstat calls need approx. 0.02 s each, while during
>>> the second ls they are two orders of magnitude faster.
>>>
>>> Googling around I stumbled upon some messages similar to this:
>>>
>>> http://www.opensubscriber.com/message/linux-xfs@oss.sgi.com/1355060.html
>>>
>>> which have in common that a) they're from around 2006 and b)
>>> they suggest increasing a mount option, ihashsize. This mount
>>> option is listed as deprecated in the current kernel docs.
>>>
>>> So my question: does anyone have experience with that kind of
>>> performance problem? Do you think it is an XFS problem, or are
>>> there some other tuning parameters in the kernel that could be
>>> modified, for instance via /proc?
>>>
>>> The reason why I'm asking here is that it is a production
>>> filesystem, so I would be very unpopular if I experimented too
>>> much (a couple of reboots is OK ;) )
>>>
>>> Bernhard
>>>
>>> PS: the situation got worse during the last weeks as the
>>> filesystem increased in size, so the possibility that some kind
>>> of buffer is now too small and I'm experiencing some kind of
>>> thrashing seems very likely to me
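For anyone who wants to reproduce that measurement, something along these lines should work; the directory path is hypothetical, and on other systems the call may show up as lstat64 or newfstatat instead of lstat:

# -T prints the time spent in each syscall
strace -T -e trace=lstat ls -l /raid/some/dir 2>&1 | tail -20
# run it a second time right away to compare cold-cache vs. warm-cache times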
RW> Are you defragging the file system regularly?
BG> Oops. Never occurred to me ("Fragmentation is soooo Windoze").
BG> Had a look:
xfs_db> frag
BG> actual 6349355, ideal 4865683, fragmentation factor 23.37%
BG> This seems significant.
RW> How much memory do you have in the system and how big is the
RW> file system?
BG> Memory on the system is 4 GB (2 dual-core Xeons). The
BG> filesystem is 3.5 TB, of which 740 GB are used, which is the
BG> maximum amount used during the one year that the filesystem
BG> has been in use (that is why the high fragmentation amazes me).
RW> What are the XFS parameters for the file system?
BG> Is this sufficient?
BG> % xfs_info /raid
BG> meta-data=/dev/VolGroup00/LogVol05 isize=256    agcount=32, agsize=29434880 blks
BG>          =                         sectsz=512   attr=0
BG> data     =                         bsize=4096   blocks=941916160, imaxpct=25
BG>          =                         sunit=0      swidth=0 blks, unwritten=1
BG> naming   =version 2                bsize=4096
BG> log      =internal                 bsize=4096   blocks=32768, version=1
BG>          =                         sectsz=512   sunit=0 blks, lazy-count=0
BG> realtime =none                     extsz=4096   blocks=0, rtextents=0
RW> What is the storage setup?
BG> The filesystem is on an LVM volume which sits on a RAID 5
BG> (hardware RAID) array.
RW> Need the info.
BG> So the way to go forward would be to use xfs_fsr on that
BG> drive. I read some horror stories about lost files; are these
BG> to be taken seriously (I mean, they were in some Ubuntu
BG> forums ;) )?
BG> Any other thoughts on parameters?
BG> Thanks for your time
BG> Bernhard
On Apr 13, 2010, at 2:06 PM, Bernhard Gschaider <bgschaid_lists@ice-sf.at> wrote:
Just to close this thread and remove any doubt it might have raised about XFS: the problem was a PEBCAK[1]. It was pointed out to me on the XFS list that the device I used for xfs_db was inconsistent with the info from xfs_info (I was blindly copying the device from the output of df).
Footnotes:
[1] PEBCAK: Problem exists between chair and keyboard
Sorry, I wasn't reading the list for a few days, and when I saw the problem you had I was like "holy sh!t". I'm glad it turned out to be a non-issue.
Did defragging help?
Can you give me the layout of your disks (chunk size, how many disks; I think you said it was RAID5?) so I can verify that the sunit and swidth values are correct?
-Ross
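For reference, the usual rule of thumb (not specific to this array, whose geometry is not stated in the thread): the stripe unit corresponds to the RAID chunk size, and the stripe width to the chunk size times the number of data disks, which for RAID5 is the number of disks minus one. A sketch with a hypothetical 4-disk RAID5 and a 64 KiB chunk:

# Hypothetical geometry: 4-disk RAID5, 64 KiB chunk => 3 data disks
# At creation time (destroys existing data, shown only for illustration):
mkfs.xfs -d su=64k,sw=3 /dev/sdX
# For an existing filesystem, the same geometry can be passed as mount
# options in 512-byte sectors: sunit = 64k/512 = 128, swidth = 128*3 = 384
mount -o sunit=128,swidth=384 /dev/VolGroup00/LogVol05 /raid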