[CentOS] XFS : Taking the plunge

Tue Jan 21 17:35:07 UTC 2014
James A. Peltier <jpeltier at sfu.ca>

Hi,

----- Original Message -----
| 
| Hi All,
| 
| I have been trying out XFS given it is going to be the file system of
| choice from upstream in el7, starting with an Adaptec ASR71605
| populated with sixteen 4TB WD enterprise hard drives. The OS is
| CentOS 6.4 x86_64 and the machine has 64G of RAM.

Good!  You're going to need it with a volume that large!

| This next part was not well researched, as I had a colleague bothering
| me late on Xmas Eve saying he needed 14 TB immediately to move data
| to from an HPC cluster. I built an XFS file system straight onto the
| (RAID 6) logical device made up of all sixteen drives with:
| 
| 
| > mkfs.xfs -d su=512k,sw=14 /dev/sda
| 
| 
| where "512k" is the Stripe-unit size of the single logical device
| built on
| the raid controller. "14" is from the total number of drives minus
| two
| (raid 6 redundancy).


Whoa!  What kind of data are you writing to disk?  I hope the files are typically large enough to justify such a large stripe unit, or you're going to lose a lot of the performance benefit.  With a 512k stripe unit it will write quite a bit of data to an individual drive in the RAID before moving on to the next.
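
If you want to sanity-check the geometry mkfs.xfs actually recorded, something like this should show it (mount point taken from your df output below; sunit/swidth are reported in filesystem blocks, so with a 4k block size a 512k su should show up as roughly sunit=128 and swidth=1792):

  xfs_info /raidstor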

| Any comments on the above from XFS users would be helpful!
| 
| I mounted the filesystem with the default options, assuming they
| would be sensible, but I now believe I should have specified the
| "inode64" mount option to avoid all the inodes being stuck in the
| first TB.
| 
| The filesystem, however, is at 87% and does not seem to have had any
| issues/problems.
| 
| > df -h | grep raid
| /dev/sda               51T   45T  6.7T  87% /raidstor
| 
| Another question is: could I now safely remount with the "inode64"
| option, or will this cause problems in the future? I read the below
| in the XFS FAQ but wondered if it has been fixed (backported?) in
| el6.4?
| 
| ""Starting from kernel 2.6.35, you can try and then switch back.
| Older
| kernels have a bug leading to strange problems if you mount without
| inode64 again. For example, you can't access files & dirs that have
| been
| created with an inode >32bit anymore.""

Changing to inode64 and back is no problem.  Keep in mind that inode64 may not work with clients running older operating systems.  This bit us when we had a mixture of Solaris 8/9 clients.
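
If you decide to switch, something along these lines should do it (mount point and device from your df output; depending on the kernel, inode64 may not take effect on a plain remount, in which case a full umount/mount cycle is needed, and you would also add the option to /etc/fstab to make it stick across reboots):

  # try a live remount first
  mount -o remount,inode64 /raidstor
  # verify the option actually took effect
  mount | grep raidstor
  # if it did not, remount from scratch with the option
  umount /raidstor
  mount -o inode64 /dev/sda /raidstor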

| I also noted that "xfs_check" ran out of memory, and after some
| reading found that it is recommended to use "xfs_repair -n -vv"
| instead as it uses far less memory. One remark: why is "xfs_check"
| there at all?

That's because a -n run doesn't actually do anything.  Trust me, when you go and run xfs_{check,repair} for real without the -n flag, you're going to need A LOT of memory.  For example, an 11TB filesystem that held medical imaging data used 24GB of memory for an xfs_repair.  Good luck!
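
If memory really is the constraint, newer xfsprogs also let you cap how much xfs_repair will try to use; roughly like this (device from your df output, the filesystem must be unmounted, and the -m value here is just an illustration - check whether your xfs_repair supports it):

  # dry run: -n makes no changes, -vv is just extra verbosity
  xfs_repair -n -vv /dev/sda
  # real repair with memory capped at roughly 16GB, if -m is available
  xfs_repair -m 16384 /dev/sda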

As for why xfs_check is there, there are various reasons.  For example, it's useful for spotting quota problems; we've had a couple of quota issues that xfs_check pointed out, which we could then fix by running xfs_repair.  Keep in mind that checks are not run automatically after an unclean shutdown: the XFS log is merely replayed, and you're advised to run xfs_check yourself to validate filesystem consistency.
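
So after an unclean shutdown the sequence looks roughly like this (mount point and device from your df output; both xfs_check and xfs_repair want the filesystem unmounted):

  mount /raidstor       # mounting replays the XFS log automatically
  umount /raidstor      # a proper consistency check needs it unmounted
  xfs_check /dev/sda    # or xfs_repair -n /dev/sda if memory is tight
  mount /raidstor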

| I do have the option of moving the data elsewhere and rebuilding, but
| this would cause some problems. Any advice much appreciated.

Do you REALLY need it to be a single volume that is so large?

-- 
James A. Peltier
Manager, IT Services - Research Computing Group
Simon Fraser University - Burnaby Campus
Phone   : 778-782-6573
Fax     : 778-782-3045
E-Mail  : jpeltier at sfu.ca
Website : http://www.sfu.ca/itservices

“A successful person is one who can lay a solid foundation from the bricks others have thrown at them.” -David Brinkley via Luke Shaw