On 02/04/20 21:14, Karl Vogel wrote:
[Replying privately because my messages aren't making it to the list]
In a previous message, Alessandro Baggi said:
A> Bacula works without any problem, well tested, solid but complex to
A> configure. Tested on a single server (with volumes on disk) and a
A> full backup of 810gb (~150000 files) took 6,30 hours (too much).
For a full backup, I'd use something like "scp -rp". Anything else has overhead you don't need for the first copy.
Also, pick a good cipher (-c) for the ssh/scp commands -- it can improve your speed by an order of magnitude. Here's an example where I copy my current directory to /tmp/bkup on my backup server:
Running on: Linux x86_64 Thu Apr 2 14:48:45 2020
  me% scp -rp -c aes128-gcm@openssh.com -i $HOME/.ssh/bkuphost_ecdsa \
        . bkuphost:/tmp/bkup
  Authenticated to remote-host ([remote-ip]:22).
  ansible-intro           100%   16KB  11.3MB/s   00:00 ETA
  nextgov.xml             100%   27KB  21.9MB/s   00:00 ETA
  building-VM-images      100% 1087     1.6MB/s   00:00 ETA
  sort-array-of-hashes    100% 1660     2.5MB/s   00:00 ETA
  ...
  ex1                     100%  910     1.9MB/s   00:00 ETA
  sitemap.m4              100% 1241     2.3MB/s   00:00 ETA
  contents                100% 3585     5.5MB/s   00:00 ETA
  ini2site                100%  489   926.1KB/s   00:00 ETA
  mkcontents              100% 1485     2.2MB/s   00:00 ETA
  Transferred: sent 6465548, received 11724 bytes, in 0.4 seconds
  Bytes per second: sent 18002613.2, received 32644.2
Thu Apr 02 14:48:54 2020
A> scripted rsync. Simple, through ssh protocol and private key. No agent
A> required on target. I use file level deduplication using hardlinks.
I avoid block-level deduplication as a general rule -- ZFS memory use goes through the roof if you turn that on.
rsync can do the hardlinks, but for me it's been much faster to create a list of SHA1 hashes and use a perl script to link the duplicates. I can send you the script if you're interested.
This way, you're not relying on the network for anything other than the copies; everything else takes place on the local or backup system.
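A rough sketch of that idea in plain shell (not the perl script itself; the paths are made up, and it assumes filenames without embedded newlines):

  # Build a SHA1 list for the freshly copied tree (run on the backup host).
  cd /backup/current && find . -type f -exec sha1sum {} + > /tmp/hashes.txt

  # Keep the first file seen for each hash; hardlink later duplicates to it.
  sort /tmp/hashes.txt | while read -r hash file; do
      if [ "$hash" = "$prev" ]; then
          ln -f "$keep" "$file"    # replace the duplicate with a hardlink
      else
          prev=$hash
          keep=$file
      fi
  done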
A> Using a scripted rsync is the simpler way but there is something that
A> could be leaved out by me (or undiscovered error). Simple to restore.
I've never had a problem with rsync, and I've used it to back up Linux workstations with ~600Gb or so. One caveat -- if you give it a really big directory tree, it can get lost in the weeds. You might want to do something like this:
- Make your original backup using scp.

- Get a complete list of file hashes on your production systems
  using SHA1 or whatever you like.

- Whenever you do a backup, get a (smaller) list of modified files
  using something like "find ./something -newer /some/timestamp/file"
  or just making a new list of file hashes and comparing that to the
  original list.

- Pass the list of modified files to rsync using the "--files-from"
  option so it doesn't have to walk the entire tree again (rough
  sketch below).
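Something along these lines would cover the incremental runs; the host, paths, and key name here are only placeholders, not a tested script:

  # Incremental run: list what changed since the last backup, then feed
  # that list to rsync so it doesn't have to walk the whole tree.
  cd /data
  find . -type f -newer /var/run/last-backup > /tmp/changed.txt

  rsync -a --files-from=/tmp/changed.txt \
        -e "ssh -i $HOME/.ssh/bkuphost_ecdsa -c aes128-gcm@openssh.com" \
        . bkuphost:/backup/current/

  touch /var/run/last-backup   # timestamp for the next "-newer" comparison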
Good luck!
-- Karl Vogel / vogelke@pobox.com / I don't speak for the USAF or my company
The best setup is having a wife and a mistress. Each of them will assume you're with the other, leaving you free to get some work done. --programmer with serious work-life balance issues
Hi Karl,
thank you for your answer. I'm trying scripted rsync over ssh with a faster cipher as you suggested, and the transfer of 10GB is faster than with the default cipher (129 sec vs. 116 sec using aes128-gcm; I tested this multiple times). Next I will run it against the entire dataset and see how much benefit I gain.
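For reference, forcing the cipher from rsync looks roughly like this (host, key name, and paths are placeholders):

  rsync -aH --delete \
        -e "ssh -i $HOME/.ssh/backup_key -c aes128-gcm@openssh.com" \
        /data/ backupserver:/backup/current/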
In the meantime, what do you think of Bacula as a backup solution?
Thank you in advance.