[CentOS] strategy/technology to backup 20TB or more user's data

Wed Feb 6 05:16:36 UTC 2008
nate <centos at linuxpowered.net>

ankush grover wrote:

> There is a concept of snapshots of Samba with LVM where snapshots of
> samba are taken at the given interval but so far haven't found any
> good article or how-to on that and also what is the experience of
> users using this technology and also what other technologies are being
> to handle TBs of data.

Save yourself a bunch of trouble and buy a real storage system.
If you have 20TB of data, that's a serious amount of stuff to
back up. Network Appliance is pretty popular in that space. I've
been using 3PAR for my storage and really like its built-in
virtualization. Dell recently purchased Equallogic, which looks to
have some solid technology as well. I attended a little event
where they pitched their pooled storage iSCSI system. Looked
pretty cool.

With these sorts of systems, snapshotting is really easy and scalable.
In the 3PAR world, for example (not sure who else might have this
ability), they have thin copy-on-write technology. So say you
take a snapshot of a volume once a day for 30 days. In a traditional
snapshot environment, you use a lot of space on the array as it
keeps track of (up to) 30 different snapshots and the changes
from the original volume. In the 3PAR world it only writes the
changes once for the source volume. So if you change 10GB of
data on the base volume and you have 20 snapshots, only 10GB
of data is written as changed to the array, not 20*10GB.
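
To put rough numbers on that, here is a back-of-the-envelope sketch
in Python. It is only a toy model of the arithmetic above, not how
any array actually accounts for space. The function names are made
up for illustration, and it assumes the worst case where every
snapshot has to preserve the same changed blocks.

```python
def traditional_cow_gb(changed_gb, snapshots):
    """Space written if each snapshot keeps its own copy of the changes.

    Assumption: worst case, every snapshot preserves the same blocks.
    """
    return changed_gb * snapshots


def shared_cow_gb(changed_gb, snapshots):
    """Space written if the changed blocks are stored once and shared.

    Assumption: all snapshots reference the single preserved copy.
    """
    return changed_gb if snapshots > 0 else 0


if __name__ == "__main__":
    changed_gb = 10   # data changed on the base volume
    snapshots = 20    # snapshots currently held
    print("traditional:", traditional_cow_gb(changed_gb, snapshots), "GB")
    print("shared:     ", shared_cow_gb(changed_gb, snapshots), "GB")
```

With the figures from the example (10GB changed, 20 snapshots) the
first function gives 200GB and the second gives 10GB.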

I also love the ability to snapshot multiple volumes from
multiple systems at the same time, and the array ensures
they are all taken at the same instant.

Add on top of that the most advanced thin provisioning (dedicate
on write) technology around, the ability to dynamically grow
the array, change RAID levels on the fly with no downtime,
etc., and you've got yourself a nice system :)

I'd personally steer clear of any of the "old fashioned" arrays
(e.g. traditional EMC, HDS, IBM, though some of them are getting
thin provisioning as well).

I can only speak from personal experience with 3PAR, but I
believe NetApp and, from the looks of it, Equallogic are very
similar in ease of use. No need to be a storage engineer or have
fancy training to use them. No spending days/weeks planning
the layout of your storage system.

My storage array currently has 8.5TB of raw space, and I
am using about 6TB, but my servers think I have nearly
30TB. As I get closer to 8TB I can add more space on
the fly, re-balance the I/O for maximum performance,
and continue to grow as needed.
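
For anyone who wants to see what those thin provisioning numbers
work out to, here is a small Python sketch. The variable and
function names are my own for illustration, "nearly 30TB" is
rounded up to 30, and it ignores RAID overhead, so treat the
results as rough figures only.

```python
RAW_TB = 8.5         # raw space in the array
USED_TB = 6.0        # space actually consumed (approximate)
PRESENTED_TB = 30.0  # what the servers think they have (rounded)
GROW_AT_TB = 8.0     # usage level at which more space gets added


def oversubscription(presented_tb, raw_tb):
    """Ratio of capacity promised to servers versus raw capacity."""
    return presented_tb / raw_tb


def headroom_tb(threshold_tb, used_tb):
    """Space left before hitting the point where the array is grown."""
    return max(threshold_tb - used_tb, 0.0)


if __name__ == "__main__":
    ratio = oversubscription(PRESENTED_TB, RAW_TB)
    print("oversubscription: %.2fx" % ratio)
    print("raw space in use: %.0f%%" % (100 * USED_TB / RAW_TB))
    print("headroom before growing: %.1f TB"
          % headroom_tb(GROW_AT_TB, USED_TB))
```

In other words, the servers are promised roughly three and a half
times the raw space, with about 2TB to go before it is time to add
disks.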

nate