Hello All,
This was posted to the Postgres Admin list - it has some valuable tips wrt XFS usage. I haven't had time yet to see if the C5 kernel conforms to what Hannes is talking about, but it looks like XFS is *very* picky about having hardware working correctly (and raid arrays set up correctly).
Cheers, -J
---------- Forwarded message ----------
Date: Tue, 01 May 2007 16:35:28 +0200
From: Hannes Dorbath <light@theendofthetunnel.de>
To: Adam Witney <awitney@sgul.ac.uk>, pgsql-admin@postgresql.org
Subject: Re: [ADMIN] File systems linux !!!
Adam Witney wrote:
Could you give a couple of examples of things that could be done wrong? I have XFS running for my data partition, but I didn't really do much when I set it up...
Thanks for any advice
1.) Don't run XFS on any hardware that's not proven to be 100% fsync/fua safe. It's extremely unforgiving in that regard. Double check your raid controller settings and then test with something like http://www.faemalia.net/mysqlUtils/diskTest.pl
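The diskTest.pl script linked above is the thorough way to verify this. As a quick first pass, here is a rough sketch of the same idea in shell (file name and write count are arbitrary): a lone 7200 RPM disk can complete at most roughly 120 true synchronous writes per second (one per revolution), so a far higher rate on a setup without a battery-backed cache suggests something is acknowledging writes it has not made durable.

```shell
# Quick-and-dirty sync-write rate check -- a sketch, not a replacement
# for diskTest.pl. oflag=dsync forces each 512-byte write to stable storage.
TESTFILE=./fsync_test.dat   # arbitrary scratch file
COUNT=200
START=$(date +%s%N)
dd if=/dev/zero of="$TESTFILE" bs=512 count=$COUNT oflag=dsync conv=notrunc 2>/dev/null
END=$(date +%s%N)
ELAPSED_MS=$(( (END - START) / 1000000 ))
# ~120 writes/sec is the ceiling for a plain 7200 RPM disk; thousands per
# second without a BBU cache means the cache is likely lying about durability.
echo "$COUNT sync writes in ${ELAPSED_MS} ms"
rm -f "$TESTFILE"
```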
2.) Don't run it with a 4k-stacks kernel. Though most issues with 4k stacks were fixed long ago, there is yet another 4k-stack fix in the 2.6.21 release notes. I just wouldn't trust it 100% yet. Especially avoid running with 4k stacks in production if you tend to stack block devices on top of each other (LVM, EVMS, DRBD, GNBD, etc.).
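You can check how your running kernel was built; CONFIG_4KSTACKS is an i386-only build option (x86_64 kernels always use 8k stacks), and the config file path below is the usual distro convention:

```shell
# CONFIG_4KSTACKS is an i386 build option; absent on x86_64 kernels.
CFG="/boot/config-$(uname -r)"
if grep -q '^CONFIG_4KSTACKS=y' "$CFG" 2>/dev/null; then
    echo "WARNING: kernel built with 4k stacks"
else
    echo "4k stacks not enabled (or config file not found)"
fi
```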
3.) Use the deadline I/O scheduler; anticipatory and XFS don't like each other. This is true for almost any FS other than ext3. It makes a difference especially for OLTP workloads.
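Checking and switching the scheduler looks roughly like this (device names are placeholders; the runtime switch needs root):

```shell
# Show the active scheduler per disk -- the bracketed name is the active one.
cat /sys/block/*/queue/scheduler 2>/dev/null || true
# Switch one device to deadline at runtime (needs root; sda is a placeholder):
#   echo deadline > /sys/block/sda/queue/scheduler
# Or set it globally at boot via the kernel command line in grub.conf:
#   kernel /vmlinuz-... ro root=... elevator=deadline
```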
4.) Don't use stripe alignment unless you are 100% sure how to calculate those numbers for your RAID setup. No stripe alignment is always better than wrong alignment. Some controllers don't like it at all and degrade in performance.
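To make the calculation concrete, here is a sketch for a hypothetical array (a 4-disk RAID 5 with a 64 KiB per-disk chunk size; substitute your controller's real values, and skip the options entirely if in doubt):

```shell
# mkfs.xfs takes sunit/swidth in 512-byte sectors.
CHUNK_KB=64      # per-disk chunk (stripe unit) size -- hypothetical value
DATA_DISKS=3     # RAID 5 over 4 disks leaves 3 data disks -- hypothetical
SUNIT=$(( CHUNK_KB * 1024 / 512 ))   # 64 KiB -> 128 sectors
SWIDTH=$(( SUNIT * DATA_DISKS ))     # full stripe -> 384 sectors
echo "mkfs.xfs -d sunit=$SUNIT,swidth=$SWIDTH /dev/sdX"
```

Newer mkfs.xfs also accepts su= (bytes) and sw= (multiplier), which is somewhat less error-prone than working in sectors.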
5.) Make sure to use write barriers unless you run on a hardware controller with a BBU. This is actually the XFS default these days, but it gets disabled if any block device in your stack doesn't support it. An example is DRBD (though write barriers are on its roadmap).
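Whether barriers actually survived the mount is visible in the kernel log; XFS warns when it has to turn them off (exact wording varies by kernel version, and the device/mountpoint below are placeholders):

```shell
# Look for barrier-related messages after mounting:
dmesg | grep -i barrier || true
# The bad sign looks roughly like:
#   Filesystem "dm-0": Disabling barriers, not supported by the underlying device
# Request barriers explicitly when mounting (needs root):
#   mount -o barrier /dev/sdXn /mnt/pgdata
```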
6.) Flushing data is the sole responsibility of the application. XFS does nothing to help broken applications, like ext3 can do with data=ordered or data=journal.
XFS uses writeback exclusively. Don't run anything that does not conform to ACID on XFS. This is fine for PostgreSQL, but might not be fine for all your applications.
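As a tiny illustration of an application doing the flush itself (file names are arbitrary stand-ins): GNU dd's conv=fsync calls fsync() on the output file before exiting, which is exactly the step a broken application skips.

```shell
echo "critical row" > important.dat   # stand-in for application data
# conv=fsync makes dd fsync() the output before exiting; without an explicit
# flush the data may still sit in the page cache when the power fails.
dd if=important.dat of=copy.dat conv=fsync 2>/dev/null
cmp -s important.dat copy.dat && RESULT=ok
rm -f important.dat copy.dat
echo "$RESULT"
```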
7.) Check dmesg for XFS messages and be able to interpret them, especially anything mentioning "CORRUPTED_GOTO". If you see such a line, chances are high that point 1.) does not hold for your hardware. It is a cry from XFS to run xfs_repair ASAP; the file system was only mounted to keep your box online and will shut down immediately if any suspicious position is accessed. Take XFS messages seriously and Google them if you are not sure what they mean.
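A periodic check along these lines is cheap to run from cron (mountpoint and device are placeholders; the repair steps need root and an unmounted filesystem):

```shell
# Scan the kernel log for XFS distress signals:
dmesg | grep -iE 'xfs|corrupt' || true
# If CORRUPTED_GOTO (or similar) shows up, repair before XFS forces a shutdown:
#   umount /mnt/pgdata        # placeholder mountpoint
#   xfs_repair /dev/sdXn      # placeholder device
```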
8.) Grab all those nice PDFs at http://oss.sgi.com/projects/xfs/training/index.html These are essential readings for any XFS admin.
IMHO XFS is a mature and rock-stable file system, but you really need to obey the points above. It's just not the general-purpose mkfs-and-forget FS that ext3 is claimed to be.
What I recommend for a PostgreSQL production box is ext3 with data=journal for / and XFS for $PGDATA. This should give you a system that behaves nicely on power failure: ext3 data=journal for all non-ACID applications, XFS for ACID applications.
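In /etc/fstab terms that layout would look roughly like this (device names and the $PGDATA path are placeholders for your setup):

```shell
# /etc/fstab -- sketch only, devices and paths are placeholders
# /dev/sda1   /                 ext3   defaults,data=journal   1 1
# /dev/sdb1   /var/lib/pgsql    xfs    defaults,noatime        1 2
```

Note that for the root filesystem some kernels need the journal mode passed on the kernel command line as well (rootflags=data=journal), since / is mounted before fstab options apply.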
Personally I use XFS on / as well, but I have taken some steps to make it behave like I wish.
On Tuesday 01 May 2007 08:28, Feizhou wrote:
3.) Use the deadline I/O scheduler, anticipatory and XFS don't like each other. This is true for almost any FS != ext3. This makes a difference especially for OLTP.
The default on CentOS 5 should be neither; it is CFQ, IIRC.
Here's a quick (if slightly older) writeup of CFQ, which mentions some code was borrowed from the Anticipatory Scheduler.
http://lwn.net/Articles/114770/
And here's a post from what seems to be the developer stating it explicitly.
http://lwn.net/Articles/114773/
I have no idea whether the shared code contains the portions that XFS finds fault with.