There was a similar thread about which is the best FS for Centos.
I'm using ext3, and wondered if XFS would be more 'data safe' than ext3.
I had a 100GiB ext3 partition, and it took up 1.75GiB for FS administration purposes. I reformatted it to XFS, and it only used 50.8MB!
I now have a fresh new drive to install my root Centos system onto, and wondered about creating the partitions as XFS?
What about the XFS admin tools - do these get installed when you format a partition as XFS from anaconda, or are they a separate rpm package, installed later?
Kind Regards,
Keith Roberts
On Friday 03 December 2010 13:55:28 Keith Roberts wrote:
There was a similar thread about which is the best FS for Centos.
I'm using ext3, and wondered if XFS would be more 'data safe' than ext3.
'data safe' is certainly not something easy to define. Short answer: no, XFS is not better than ext3 here. Longer answer: both are journaled, ext3 typically pushes data to disk quicker, neither is checksummed, ext3 is more widely used, neither does replication, and XFS has some corner cases (I have seen strangeness with very full filesystems, and it's not recommended for 32-bit CentOS).
In the end the only thing that'll keep your data safe are backups.
I had a 100GiB ext3 partition, and it took up 1.75GiB for FS administration purposes. I reformatted it to XFS, and it only used 50.8MB!
Oversimplified: XFS sets data structures up as you go; ext3 does it all up front. Also, the default for ext3 is to reserve some space for root (see the -m option to mkfs.ext3/tune2fs).
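For example (a rough sketch - the device name is just a placeholder), you can inspect and change the reservation with the e2fsprogs tools:

tune2fs -l /dev/sdb1 | grep -i 'reserved block'   # show the current reservation
tune2fs -m 1 /dev/sdb1                            # drop the reserve from the 5% default to 1%
mkfs.ext3 -m 1 /dev/sdb1                          # or set it at mkfs time (this reformats the partition!)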
I now have a fresh new drive to install my root Centos system onto, and wondered about creating the partitions as XFS?
ext3 is default => extremely well tested => good choice (IMHO)
What about the XFS admin tools - do these get installed when you format a partition as XFS from anaconda, or are they a separate rpm package, installed later?
They are in a separate rpm (xfsprogs, repository: extras).
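A rough sketch of getting XFS going on CentOS 5 (package names from memory, so treat them as assumptions; /dev/sdb1 is a placeholder):

yum install xfsprogs kmod-xfs     # xfsprogs has mkfs.xfs and friends; the xfs module may come
                                  # from kmod-xfs in extras or the centosplus kernel, depending
                                  # on the point release - worth checking first
mkfs.xfs /dev/sdb1                # this wipes the partition
mkdir -p /srv/data
mount -t xfs /dev/sdb1 /srv/data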
/Peter
On Fri, 2010-12-03 at 14:20 +0100, Peter Kjellström wrote:
On Friday 03 December 2010 13:55:28 Keith Roberts wrote:
There was a similar thread about which is the best FS for Centos. I'm using ext3, and wondered if XFS would be more 'data safe' than ext3.
'data safe' is certainly not something easy to define.
+1
Short answer: no, XFS is not better than ext3 here.
+1. We'll all move to ext4 with CentOS 6; ext4 is a big improvement over the options available in CentOS 5.
In the end the only thing that'll keep your data safe are backups.
I had a 100GiB ext3 partition, and it took up 1.75GiB for FS administration purposes. I reformatted it to XFS, and it only used 50.8MB!
Oversimplified: XFS sets data structures up as you go; ext3 does it all up front. Also, the default for ext3 is to reserve some space for root (see the -m option to mkfs.ext3/tune2fs).
+1
Although equivalent issues can arise in XFS [vs. ext3]. http://www.whitemiceconsulting.com/2010/09/xfs-inodes.html
I now have a fresh new drive to install my root Centos system onto, and wondered about creating the partitions as XFS?
ext3 is default => extremely well tested => good choice (IMHO)
I'd stick with ext3 unless you have a compelling reason to use another FS.
What about the XFS admin tools - do these get installed when you format a partition as XFS from anaconda, or are they a separate rpm package, installed later?
They are in a separate rpm (xfsprogs, repository: extras).
On Fri, Dec 03, 2010 at 08:31:12AM -0500, Adam Tauno Williams wrote:
+1. We'll all move to ext4 with CentOS 6; ext4 is a big improvement over the options available in CentOS 5.
Does anyone have an update or status on the issues raised in http://lwn.net/Articles/322823/ or on Ts'o's response to the issue (https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/45)? Have all the apps been adjusted, or is ext4 still more vulnerable to data loss than ext3? Could you link to a reference?
On Dec 3, 2010, at 9:25 AM, cpolish@surewest.net wrote:
Does anyone have an update or status on the issues raised in http://lwn.net/Articles/322823/ or on Ts'o's response to the issue (https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/45)? Have all the apps been adjusted, or is ext4 still more vulnerable to data loss than ext3? Could you link to a reference?
Both ext4 and xfs are susceptible to this type of data loss.
These file systems excel at handling very large volumes of data (TBs), especially in fsck time and in handling very large files. Because of that, they tend to be used on server-class hardware with UPS power protection, or on video recorders where a little data loss isn't the end of the world.
If you are talking GBs of data, stick with ext3.
-Ross
2010/12/3 Peter Kjellström cap@nsc.liu.se:
What about the XFS admin tools - do these get installed when you format a partition as XFS from anaconda, or are they a separate rpm package, installed later?
They are in a separate rpm (xfsprogs, repository: extras).
There is a good chance that they are included in the distro as of CentOS 6 [1] and, therefore, are available during the installation. This remains to be seen/tested though.
Akemi
On 12/3/2010 6:20 AM, Peter Kjellström wrote:
What about the XFS admin tools - do these get installed when you format a partition as XFS from anaconda, or are they a separate rpm package, installed later?
They are in a separate rpm (xfsprogs, repository: extras).
On that topic, there are many arbitrary differences between the way the ext3 and XFS tools work. If you're the sort who has half the tune2fs command line interface memorized, you'll find yourself climbing a bit of a learning curve by switching to XFS.
IMHO, the best reason to use XFS is when you have to get past one of the ext3 limits. If your problem fits within ext3's limits, stick with it.
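To give a flavour of the differences, here are a few rough equivalents (the device and mount point are placeholders):

tune2fs -l /dev/sdb1            # ext3: dump superblock / filesystem parameters
xfs_info /srv/data              # xfs: show the geometry of a mounted filesystem
tune2fs -L backup /dev/sdb1     # ext3: set a volume label
xfs_admin -L backup /dev/sdb1   # xfs: set a label (filesystem must be unmounted)
resize2fs /dev/sdb1             # ext3: grow after enlarging the partition/LV
xfs_growfs /srv/data            # xfs: grow while mounted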
On 03.12.2010 13:55, Keith Roberts wrote:
There was a similar thread about which is the best FS for Centos.
I'm using ext3, and wondered if XFS would be more 'data safe' than ext3.
I had a 100GiB ext3 partition, and it took up 1.75GiB for FS administration purposes. I reformatted it to XFS, and it only used 50.8MB!
Just yesterday we hit one of the ext3 limits: a folder can only contain 32k subfolders. So I had to create an XFS container to hold the data.
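If you want to see the limit for yourself, something like this throwaway test (run in a scratch directory on an ext3 volume; the path is a placeholder) should stop with "Too many links" just short of 32000:

cd /mnt/ext3scratch && mkdir subdirtest && cd subdirtest
for i in $(seq 1 32000); do
    mkdir d$i || { echo "stopped at $i"; break; }
done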
Rainer
On Fri, Dec 3, 2010 at 7:55 AM, Keith Roberts keith@karsites.net wrote:
There was a similar thread about which is the best FS for Centos.
I'm using ext3, and wondered if XFS would be more 'data safe' than ext3.
If your workload doesn't dictate the use of XFS, I would personally stick with EXT3. It's the default file system on thousands and thousands of machines, so I would be willing to bet that the widespread deployment of EXT3 would flush out show-stopping bugs rather quickly. There was a thread discussing this on the LKML a while back. You might try searching there to get feedback from the folks who actually write the file systems.
- Ryan -- http://prefetch.net
From personal experience, the last three times I ran XFS on large volumes (4+ TB), they all became irrecoverably corrupted in some way or another.
The final occasion resulted in XFS being permanently banned from that establishment.
On 12/3/2010 2:08 PM, John Jasen wrote:
From personal experience, the last three times I ran XFS on large volumes (4+ TB), they all became irrecoverably corrupted in some way or another.
The final occasion resulted in XFS being permanently banned from that establishment.
Was this on 32-bit RH/Centos where the 4k stacks are a known problem for XFS?
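A quick way to check (assuming the stock kernel config is installed under /boot, as it is on CentOS):

uname -m                                        # i686 means 32-bit
grep CONFIG_4KSTACKS /boot/config-$(uname -r)   # =y means 4k stacks, the case people warn about with XFS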
Thanks to everyone that answered, for all the replies to my questions about XFS.
I've taken note of the points raised, and gone with ext3 for now (again).
I do have a backup strategy in place, and you can use my PHP script from here if you like:
http://forums.fedoraforum.org/showthread.php?t=248436
Having made regular backups to the hard drive, I then make CD/DVD backups from the backup drive as I feel is appropriate.
I've lost hard drives and data in the past, so now I take precautions not to lose any data I need.
Kind Regards,
Keith Roberts
On 12/3/2010 2:13 PM, Keith Roberts wrote:
Whenever anyone mentions backups, I like to plug the backuppc program (http://backuppc.sourceforge.net/index.html and packaged in EPEL). It uses compression and hardlinks all duplicate files to keep much more history than you'd expect on line with a nice web interface - and does pretty much everything automatically.
On 12/03/10 12:25 PM, Les Mikesell wrote:
Whenever anyone mentions backups, I like to plug the backuppc program (http://backuppc.sourceforge.net/index.html and packaged in EPEL). It uses compression and hardlinks all duplicate files to keep much more history than you'd expect on line with a nice web interface - and does pretty much everything automatically.
I'm curious how you back up backuppc itself, for disaster recovery, archival, etc.? Since all the files are in a giant mess of symlinks (for deduplication) with versioning, I'd have to assume the archive volume gets really messy after a while, and further, something like that is pretty darn hard to make a replica of.
this has kept me leery of it.
On Fri, 2010-12-03 at 12:51 -0800, John R Pierce wrote:
On 12/03/10 12:25 PM, Les Mikesell wrote:
Whenever anyone mentions backups, I like to plug the backuppc program (http://backuppc.sourceforge.net/index.html and packaged in EPEL). It uses compression and hardlinks all duplicate files to keep much more history than you'd expect on line with a nice web interface - and does pretty much everything automatically.
I'm curious how you backup backuppc, like for disaster recovery,
I know nothing about backuppc; I don't use it. But we use rsync with the same concept for a deduplicated archive.
archival, etc? since all the files are in a giant mess of symlinks
No, they are not symbolic links - they are *hard links*. That they are hard links is the actual magic. Symbolic links would not provide the automatic deallocation of expired files.
(for deduplication) with versioning, I'd have to assume the archive volume gets really messy after awhile, and further, something like that is pretty darn hard to make a replica of it.
I don't see why; only the archive is deduplicated in this manner, and it certainly isn't "messy". One simply makes a backup [for us that means to tape - a disk is not a backup] of the most current snapshot.
The script just looks like -
export ROOT="/srv/cifs/Arabis-Red"
export STAMP=`date +%Y%m%d%H`
export LASTSTAMP=`cat $ROOT/LAST.STAMP`
mkdir $ROOT/$STAMP
mkdir $ROOT/$STAMP/home

nice rsync --verbose --archive --delete --acls \
    --link-dest $ROOT/$LASTSTAMP/home/ \
    --numeric-ids \
    -e ssh \
    archivist@arabis-red:/home/ \
    $ROOT/$STAMP/home/ \
    2>&1 > $ROOT/$STAMP/home.log

echo $STAMP > $ROOT/LAST.STAMP
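To convince yourself the dedup is working, check link counts and disk usage across two runs. A quick sketch (the timestamps and file name below are made up):

stat -c '%h links, inode %i: %n' \
    $ROOT/2010120300/home/user/report.txt \
    $ROOT/2010120400/home/user/report.txt
# an unchanged file should show the same inode and a link count > 1
du -sh $ROOT/2010120300 $ROOT/2010120400   # du counts shared blocks only once
rm -rf $ROOT/2010120300                    # expiring a snapshot frees blocks only
df -h $ROOT                                # when a file's last link goes away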
On 12/3/2010 4:14 PM, Adam Tauno Williams wrote:
I don't see why; only the archive is deduplicated in this manner, and it certainly isn't "messy". One simply makes a backup [for us that means to tape - a disk is not a backup] of the most current snapshot.
Actually, making a backup of BackupPC's data pool (or just moving it to new disks) does get messy. With a large pool there are so many hardlinks that rsync has trouble dealing with it, eats all your memory, and takes forever. This is a frequent topic of conversation on the BackupPC list. However, the next major version of BackupPC is supposed to use a different method of deduplication that will not use hardlinks and will be much easier to back up.
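For the record, the kind of command people try (and that falls over on big pools) is simply something like this (the path is the EPEL default data directory; adjust to taste):

rsync -aH --delete /var/lib/BackupPC/ /mnt/newpool/
# -H preserves hard links, which is exactly what forces rsync to keep every
# inode/path pair in memory - fine for small pools, painful for big ones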
On 12/3/2010 3:14 PM, Adam Tauno Williams wrote:
I know nothing about backuppc; I don't use it. But we use rsync with the same concept for a deduplicated archive.
Backuppc is a couple of perl scripts, one of which happens to re-implement rsync in a way that lets it use stock rsync on the remote while transparently accessing a compressed copy on the server side. It can also use tar or samba to copy files in, then does the same compression/dedup operation.
(for deduplication) with versioning, I'd have to assume the archive volume gets really messy after awhile, and further, something like that is pretty darn hard to make a replica of it.
I don't see why; only the archive is deduplicated in this manner, and it certainly isn't "messy". One simply makes a backup [for us that means to tape - a disk is not a backup] of the most current snapshot.
It does get messy, because backuppc archives typically have millions of hardlinked files. It doesn't just hardlink between subsequent runs of the same machine; it hardlinks all files with identical content, whether from the same machine or others, using a pool directory of hashed filenames as a common link to match them up quickly.
The script just looks like -
export ROOT="/srv/cifs/Arabis-Red"
export STAMP=`date +%Y%m%d%H`
export LASTSTAMP=`cat $ROOT/LAST.STAMP`
mkdir $ROOT/$STAMP
mkdir $ROOT/$STAMP/home

nice rsync --verbose --archive --delete --acls \
    --link-dest $ROOT/$LASTSTAMP/home/ \
    --numeric-ids \
    -e ssh \
    archivist@arabis-red:/home/ \
    $ROOT/$STAMP/home/ \
    2>&1 > $ROOT/$STAMP/home.log

echo $STAMP > $ROOT/LAST.STAMP
But that won't match up multiple copies of the same file in different locations or help with many machines with mostly-duplicate content. The backuppc scheme works pretty well in normal usage, but most file-oriented approaches to copy the whole backuppc archive have scaling problems because they have to track all the inodes and names to match up the hard links.
On Fri, Dec 03, 2010 at 04:07:06PM -0600, Les Mikesell wrote:
The backuppc scheme works pretty well in normal usage, but most file-oriented approaches to copy the whole backuppc archive have scaling problems because they have to track all the inodes and names to match up the hard links.
That's been my experience with other hard-linked based backup schemes as well. For 'normal' sized backups they work pretty well, but for some value of 'large' backups the number of inodes and the tree traversal time starts to cause real performance problems.
I'd be interested to know how large people's backups are where they're still seeing decent performance using approaches like this? I believe we started seeing problems once we hit a few TB (on ext3)?
We've moved to brackup (http://code.google.com/p/brackup/) for these reasons, and are doing nightly backups of 18TB of data quite happily. Brackup does fancy chunk-based deduplication (somewhat like git), and so avoids the hard link approach entirely.
Cheers, Gavin
On 12/3/2010 4:32 PM, Gavin Carr wrote:
On Fri, Dec 03, 2010 at 04:07:06PM -0600, Les Mikesell wrote:
The backuppc scheme works pretty well in normal usage, but most file-oriented approaches to copy the whole backuppc archive have scaling problems because they have to track all the inodes and names to match up the hard links.
That's been my experience with other hard-linked based backup schemes as well. For 'normal' sized backups they work pretty well, but for some value of 'large' backups the number of inodes and the tree traversal time starts to cause real performance problems.
I'd be interested to know how large people's backups are where they're still seeing decent performance using approaches like this? I believe we started seeing problems once we hit a few TB (on ext3)?
You should probably ask this on the backuppc list. But note that the performance issue is not using backuppc itself, it is only a problem when you try to copy the whole archive by some file-oriented method.
We've moved to brackup (http://code.google.com/p/brackup/) for these reasons, and are doing nightly backups of 18TB of data quite happily. Brackup does fancy chunk-based deduplication (somewhat like git), and so avoids the hard link approach entirely.
Brackup looks more like a 'push out a backup from a single host' concept as opposed to backuppc's 'pull all backups from many targets to a common server with appropriate scheduling' so you'd probably use them in different scenarios. Or did you mean you are backing up backuppc's archive with brackup?
The author has announced plans to re-do the storage scheme in backuppc, but I'm not sure if it will be chunked. One down side of the current scheme is that small changes in big files result in a complete separate copy being stored. The rsync-based transfer will only send the differences, but the server ends up (like normal rsync) reconstructing a complete copy of the modified file.
On Fri, 03 Dec 2010 16:54:50 -0600 Les Mikesell wrote:
You should probably ask this on the backuppc list. But note that the performance issue is not using backuppc itself, it is only a problem when you try to copy the whole archive by some file-oriented method.
I don't use backuppc myself but I'm curious: what is the preferred method for copying the archive from one fileserver to another?
On 12/3/2010 5:03 PM, Frank Cox wrote:
You should probably ask this on the backuppc list. But note that the performance issue is not using backuppc itself, it is only a problem when you try to copy the whole archive by some file-oriented method.
I don't use backuppc myself but I'm curious: what is the preferred method for copying the archive from one fileserver to another?
Unmount it and image-copy the partition or whole drive. I use raid mirroring, because then you only have to unmount for the time it takes to fail the partition out of the raid. Realistically, though, you aren't going to get much else done on that partition while the copy runs, because the head is going to keep jumping back to the place it is trying to copy.
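As a sketch of that raid-rotation dance (the md device and partition names are just placeholders):

mdadm /dev/md0 --fail /dev/sdc1      # drop the rotating member out of the mirror
mdadm /dev/md0 --remove /dev/sdc1    # now the disk can be pulled and taken offsite
# ... swap in the disk coming back from offsite ...
mdadm /dev/md0 --add /dev/sdc1       # and let it resync
watch cat /proc/mdstat               # watch the resync progress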
On Fri, Dec 03, 2010 at 04:54:50PM -0600, Les Mikesell wrote:
On 12/3/2010 4:32 PM, Gavin Carr wrote:
We've moved to brackup (http://code.google.com/p/brackup/) for these reasons, and are doing nightly backups of 18TB of data quite happily. Brackup does fancy chunk-based deduplication (somewhat like git), and so avoids the hard link approach entirely.
Brackup looks more like a 'push out a backup from a single host' concept as opposed to backuppc's 'pull all backups from many targets to a common server with appropriate scheduling' so you'd probably use them in different scenarios.
Yeah, you're right, the brackup model is push, and backuppc is quite a different beast in that respect. I was really talking more generally about hard-linked based backups than backuppc in particular.
Cheers, Gavin
On 12/3/2010 2:51 PM, John R Pierce wrote:
I'm curious how you back up backuppc itself, for disaster recovery, archival, etc.? Since all the files are in a giant mess of symlinks (for deduplication) with versioning, I'd have to assume the archive volume gets really messy after a while, and further, something like that is pretty darn hard to make a replica of.
The usual way is some form of image copy of the whole drive or partition. In my case I made a 3-device raid1 mirror where I regularly rotate one of the drives offsite, letting the one coming back resync. I use a SATA drive in a trayless hot-swap enclosure. For disaster recovery it can be connected to my laptop - or to about anything with a USB cable adapter.
this has kept me leery of it.
If you are really leery, you just run two independent instances, one of which is in another location. It can run with rsync over ssh, so bandwidth needs are minimal and it pretty much takes care of itself.
It also has a way to generate tar archives from the last (or a specified) backup. You can roll those off to something else periodically but you lose the space savings.
On Friday 03 December 2010 21:13:37 Keith Roberts wrote: ...
Having made regular backups to the hard drive, I then as I feel is appropriate, make CD/DVD backups from the backup drive.
Just be careful with CD/DVDs, since their data integrity and longevity are questionable.
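One cheap safeguard (paths are placeholders): store checksums alongside the backup and verify the burned disc against them.

cd /backup/staging && find . -type f -print0 | xargs -0 sha1sum > /tmp/backup.sha1
# keep backup.sha1 somewhere safe (or burn it too), then after burning:
cd /media/cdrom && sha1sum -c /tmp/backup.sha1 | grep -v ': OK$'   # prints only mismatches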
/Peter