On 02/04/20 21:14, Karl Vogel wrote:
[Replying privately because my messages aren't making it to the list]
In a previous message, Alessandro Baggi said:
A> Bacula works without any problem, well tested, solid but complex to
A> configure. Tested on a single server (with volumes on disk) and a
A> full backup of 810gb (~150000 files) took 6,30 hours (too much).
For a full backup, I'd use something like "scp -rp". Anything else has overhead you don't need for the first copy.
Also, pick a good cipher (-c) for the ssh/scp commands -- it can improve your speed by an order of magnitude. Here's an example where I copy my current directory to /tmp/bkup on my backup server:
Running on: Linux x86_64 Thu Apr 2 14:48:45 2020
  me% scp -rp -c aes128-gcm@openssh.com -i $HOME/.ssh/bkuphost_ecdsa \
        . bkuphost:/tmp/bkup
  Authenticated to remote-host ([remote-ip]:22).
  ansible-intro           100%   16KB  11.3MB/s   00:00 ETA
  nextgov.xml             100%   27KB  21.9MB/s   00:00 ETA
  building-VM-images      100% 1087     1.6MB/s   00:00 ETA
  sort-array-of-hashes    100% 1660     2.5MB/s   00:00 ETA
  ...
  ex1                     100%  910     1.9MB/s   00:00 ETA
  sitemap.m4              100% 1241     2.3MB/s   00:00 ETA
  contents                100% 3585     5.5MB/s   00:00 ETA
  ini2site                100%  489   926.1KB/s   00:00 ETA
  mkcontents              100% 1485     2.2MB/s   00:00 ETA
  Transferred: sent 6465548, received 11724 bytes, in 0.4 seconds
  Bytes per second: sent 18002613.2, received 32644.2
Thu Apr 02 14:48:54 2020
A> scripted rsync. Simple, through ssh protocol and private key. No agent
A> required on target. I use file level deduplication using hardlinks.
I avoid block-level deduplication as a general rule -- ZFS memory use goes through the roof if you turn that on.
rsync can do the hardlinks, but for me it's been much faster to create a list of SHA1 hashes and use a perl script to link the duplicates. I can send you the script if you're interested.
This way, you're not relying on the network for anything other than the copies; everything else takes place on the local or backup system.
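A rough sketch of that idea in plain shell (not the perl script itself; the paths are made up, and it assumes filenames without embedded newlines):

  # Build a SHA1 list for the freshly copied tree (run on the backup host).
  cd /backup/current && find . -type f -exec sha1sum {} + > /tmp/hashes.txt

  # Keep the first file seen for each hash; hardlink later duplicates to it.
  sort /tmp/hashes.txt | while read -r hash file; do
      if [ "$hash" = "$prev" ]; then
          ln -f "$keep" "$file"    # replace the duplicate with a hardlink
      else
          prev=$hash
          keep=$file
      fi
  done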
A> Using a scripted rsync is the simpler way but there is something that
A> could be leaved out by me (or undiscovered error). Simple to restore.
I've never had a problem with rsync, and I've used it to back up Linux workstations with ~600Gb or so. One caveat -- if you give it a really big directory tree, it can get lost in the weeds. You might want to do something like this:
- Make your original backup using scp.

- Get a complete list of file hashes on your production systems
  using SHA1 or whatever you like.

- Whenever you do a backup, get a (smaller) list of modified files
  using something like "find ./something -newer /some/timestamp/file"
  or just making a new list of file hashes and comparing that to the
  original list.

- Pass the list of modified files to rsync using the "--files-from"
  option so it doesn't have to walk the entire tree again (rough
  sketch below).
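Something along these lines would cover the incremental runs; the host, paths, and key name here are only placeholders, not a tested script:

  # Incremental run: list what changed since the last backup, then feed
  # that list to rsync so it doesn't have to walk the whole tree.
  cd /data
  find . -type f -newer /var/run/last-backup > /tmp/changed.txt

  rsync -a --files-from=/tmp/changed.txt \
        -e "ssh -i $HOME/.ssh/bkuphost_ecdsa -c aes128-gcm@openssh.com" \
        . bkuphost:/backup/current/

  touch /var/run/last-backup   # timestamp for the next "-newer" comparison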
Good luck!
-- Karl Vogel / vogelke@pobox.com / I don't speak for the USAF or my company
The best setup is having a wife and a mistress. Each of them will assume you're with the other, leaving you free to get some work done. --programmer with serious work-life balance issues
Hi Karl,
thank you for your answer. I'm trying scripted rsync over ssh with a faster cipher as you suggested, and the transfer of 10GB is faster than with the default cipher (129 sec vs. 116 sec using aes128-gcm; I tested this multiple times). Next I will run it against the entire dataset and see how much benefit I gain.
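For reference, forcing the cipher from rsync looks roughly like this (host, key name, and paths are placeholders):

  rsync -aH --delete \
        -e "ssh -i $HOME/.ssh/backup_key -c aes128-gcm@openssh.com" \
        /data/ backupserver:/backup/current/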
In the meantime, what do you think of Bacula as a backup solution?
Thank you in advance.