Hello.
After some reading, including the rsync man page, I am still not clear on this: when using rsync to back up and restore, when should one include hard links (by using the -H option switch), and when should one *not*?
A simple example case would be backups for one or just a few light-duty local workstations. Is there a simple, clear rule about that, or is it too complicated for that?
On Wed, Feb 11, 2015 at 11:02 AM, Francis Gerund ranrund@gmail.com wrote:
Use it unless the resulting backup run is too slow to be practical. That will only happen when there are vast numbers of hardlinks in the filesystem, which is pretty rare (backuppc's archive, for example). That is, if you don't know whether or not you need it, you are better off retaining as many of the original filesystem attributes as possible in your backups. But keep in mind that it can only reproduce the hardlinks that exist within the portion of the filesystem covered by one run. If you do multiple runs covering different subdirectories, it can't reproduce hardlinks that cross from one run's subdirectory into another's.
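The point about separate runs can be checked on a scratch directory. The following is a minimal sketch, not something posted in the thread: the function name demo_hardlink_scope and the temporary src/a, src/b layout are made up for illustration, and it assumes rsync and GNU stat (`stat -c %h` prints a file's link count) are installed.

```shell
# Sketch: -H only reproduces hard links whose names are all seen in ONE run.
# demo_hardlink_scope and the src/a, src/b layout are hypothetical.
demo_hardlink_scope() {
    work=$(mktemp -d) || return 1
    mkdir -p "$work/src/a" "$work/src/b" "$work/two_runs"
    echo data > "$work/src/a/file"
    ln "$work/src/a/file" "$work/src/b/file"    # one inode, two names

    # One run over the common parent: both names are in the same file list.
    rsync -aH "$work/src/" "$work/one_run/"

    # Two runs, one per subdirectory: each run sees only one of the names.
    rsync -aH "$work/src/a" "$work/two_runs/"
    rsync -aH "$work/src/b" "$work/two_runs/"

    # Link count of the copied file in each destination (GNU stat).
    echo "one_run=$(stat -c %h "$work/one_run/a/file")" \
         "two_runs=$(stat -c %h "$work/two_runs/a/file")"
    rm -rf "$work"
}
```

Calling demo_hardlink_scope should report a link count of 2 for the single run and 1 for the split runs, i.e. the split runs leave two independent copies behind.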
On 02/11/2015 11:13 AM, Les Mikesell wrote:
/var/lib/yum/yumdb and, to a lesser extent, /usr/share/zoneinfo are two places that use hard links a lot. If you _don't_ use "-H", rsync writes a separate, independent copy for every name of a hard-linked file, and you have no way to restore the original hard link structure. If all you care about is not losing data, then it's just a space issue. If the ability to restore the original hard link relationships is important, then using "-H" is a must, no matter the performance penalty.
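To put a rough number on the "space issue", one can add up the bytes that the extra names would occupy if every hard-linked name became its own copy. This is only a sketch: hardlink_overhead is a hypothetical helper name, it relies on GNU find's -printf, and it only considers names inside the directory it is given.

```shell
# Sketch: bytes a copy WITHOUT -H would add for a tree, compared with -H.
# hardlink_overhead is a hypothetical name; requires GNU find (-printf).
hardlink_overhead() {
    # %i = inode number, %s = size in bytes; -xdev stays on one filesystem
    # so that inode numbers are unambiguous.
    find "$1" -xdev -type f -links +1 -printf '%i %s\n' |
        awk '{ if (seen[$1]++) extra += $2 }   # every name after the first
             END { printf "%.0f\n", extra }'
}
```

For example, `hardlink_overhead /usr/share/zoneinfo` prints the number of additional bytes that tree would take up in a backup made without -H.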
We have been using rsync for backups for years with no issues. We back up Oracle archive logs with rsync every 15 minutes.
________________________________________
From: centos-bounces@centos.org [centos-bounces@centos.org] on behalf of Francis Gerund [ranrund@gmail.com]
Sent: Wednesday, February 11, 2015 12:02 PM
To: centos@centos.org
Subject: [CentOS] [OT] Using rsync to backup / restore - when to use (or not use) the -H option switch?
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
On 02/11/2015 09:02 AM, Francis Gerund wrote:
When using rsync to backup and restore, when should and when should one *not* include hard links (by using the -H option switch)?
It's probably too site- or application-specific to give any general advice.
Run this command across the filesystem you're going to back up:
find /path -type f -links +1
All of the files listed in find's output have multiple links, and will benefit from using -H.
The cost associated with -H is that rsync has to keep a table in memory of all of the inodes and paths that it processes. A large filesystem can cause rsync to consume a lot of RAM. If sufficient RAM is available, I would always recommend -H.
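A small variation on the find command groups the names by inode, which shows which paths are links to the same file. This is a sketch only, assuming GNU find for -printf; both function names are hypothetical. The count is just a rough indicator of how many hard-linked names exist, not a measurement of rsync's memory use.

```shell
# Sketch (GNU find assumed); function names are hypothetical.

# Print "inode link-count path" for every multiply-linked regular file,
# sorted so that names sharing an inode end up on adjacent lines.
list_link_groups() {
    find "$1" -xdev -type f -links +1 -printf '%i %n %p\n' | sort -n
}

# Count the multiply-linked names found (one dot per name, so odd
# characters in file names cannot skew the count).
count_linked_names() {
    find "$1" -xdev -type f -links +1 -printf '.' | wc -c
}
```

If count_linked_names prints 0 for the tree being backed up, -H has nothing to preserve there.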
On Wed, Feb 11, 2015 at 11:51 AM, Gordon Messmer gordon.messmer@gmail.com wrote:
I don't know about the actual implementation, but wouldn't it really only need to track the inodes/paths of the files with >1 link?
Okay, thanks guys. It seems that -H should be included by default, unless there is a specific reason not to.
Maybe the rsync -a option switch should include hard links by default. Rsync tutorials usually list generic examples such as:
sudo rsync -avz <source> <destination>
and do not address the subject of hard links.
And you weren't kidding about the number of entries in /var/lib/yum/yumdb. Wow!
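For reference, the rsync man page describes -a as equivalent to -rlptgoD and notes that it does not include -H, so hard links have to be requested explicitly. Below is a minimal sketch of a backup/restore pair that does so. sync_tree and the example paths are hypothetical, and -z (compression during transfer) is left out on the assumption that both sides are local disks.

```shell
# Sketch: archive mode plus hard links. sync_tree is a hypothetical helper.
sync_tree() {
    # Trailing slashes make rsync copy the CONTENTS of $1 into $2.
    rsync -aH "${1%/}/" "${2%/}/"
}

# Example use (hypothetical paths):
#   sync_tree /home /mnt/backup/home      # back up
#   sync_tree /mnt/backup/home /home      # restore
```

Using the same options in both directions means the restore recreates whatever hard links the backup run captured.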
On Wed, Feb 11, 2015 at 12:01 PM, Les Mikesell lesmikesell@gmail.com wrote: