Hi folks,
My scenario is a bit different from what I imagine most people have. I've spent the last 60 to 90 minutes searching the Amanda list archives and googling, but did not come up with much. Then I went browsing around the Amanda website, found "vaulting", and wondered whether it would suit my needs.
I'm basically searching around for a backup solution and trying to decide whether to use something off the shelf or just roll my own with gtar. It is important to me that my solution use standard tools like dump/restore or gtar on the back end, which is how I ended up at Amanda. Looking through some of the initial configuration how-tos, it seemed massively over-complex for my application. But then I hit upon "vaulting":
http://wiki.zmanda.com/index.php/How_To:Copy_Data_from_Volume_to_Volume
This is not exactly my scenario, but maybe there is another way to roll a "vaulting" solution to suit me.
Basically I work in a scientific research lab (stem cell research) where the scientists produce a fair bit of raw data. We want to periodically archive that data to tape, remove it from disk, and store the tape in our archival facility. We'd need a record of what is on each tape, of course. This is not the same scenario as in the link above, because the data would not move from secondary to tertiary storage; it would go from primary to tertiary directly, i.e. straight from disk to tape, and not in an automated fashion like typical nightly dumps. On request, we'd copy the scientist's data over to our server that has the tape unit, dump it out to tape, and remove it from the disk there. Once the tape is verified, we could tell the scientist it is OK to remove their primary data, and then we'd store the tape.
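A minimal sketch of the "dump to tape and keep a record of what is on it" step, in Python, for illustration only. It is not Amanda's mechanism. The function names, the JSON manifest layout, and the tape label are all hypothetical, and the archive is written to an ordinary path here; pointing it at a tape device node is an untested assumption.

```python
import hashlib
import json
import tarfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_with_manifest(source_dir, archive_path, manifest_path, tape_label):
    """Write every file under source_dir to a tar archive and record
    path, size, and checksum for each one in a JSON manifest.

    tape_label is a hypothetical identifier written on the physical tape.
    """
    source = Path(source_dir)
    files = []
    # "w|" writes an uncompressed tar stream without seeking backwards.
    with tarfile.open(str(archive_path), "w|") as tar:
        for path in sorted(source.rglob("*")):
            if not path.is_file():
                continue
            name = path.relative_to(source).as_posix()
            tar.add(str(path), arcname=name)
            files.append({
                "path": name,
                "size": path.stat().st_size,
                "sha256": sha256_of(path),
            })
    manifest = {"tape_label": tape_label, "files": files}
    with open(manifest_path, "w") as handle:
        json.dump(manifest, handle, indent=2)
    return manifest
```

Keeping the manifest on disk (and in a second location) is what lets you answer "which tape holds this file?" later without loading any tape.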
Is Amanda suited to this? Or is there another application I should be looking at?
thanks, -Alan
Hey, Alan,
Alan McKay wrote: <snip>
gtar on the back end, which is how I ended up at Amanda. In looking through some of the initial configuration how-tos it seemed as though this was massively over-complex for my application. But then I hit upon "vaulting"
http://wiki.zmanda.com/index.php/How_To:Copy_Data_from_Volume_to_Volume
<snip>
Basically I work in a scientific research lab (stem cell research) where the scientists produce a fair bit of raw data. We want to periodically take the data and archive it to tape and then remove it from disk and store the tape in our archival facility. We'd need a record of what is
<snip>
For one thing, I think you seriously need to look at backing up to offline hard drives instead of tapes. Unless you really want or need to archive the tapes for seven years, or however long you're legally required to, tapes are not the preferred solution these days: they're very slow to use for recovery, while hard drives are large, fast, and still cheap.
We back up to backup servers, then, every couple of weeks, I run rsync backups (well, we have a locally-rolled system to run the rsync) onto offline drives - in our case, I swap large drives into an eSATA drive bay. When I'm done, they go in the fire safe.
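As a rough illustration of the kind of wrapper described above (not Mark's actual system, which isn't shown), here is a small Python sketch that runs rsync onto a swapped-in drive. The function names and paths are hypothetical. `-a`, `--delete`, and `--dry-run` are standard rsync options; using `--delete` to mirror removals is an assumption you may not want for an archive copy.

```python
import os
import subprocess


def build_rsync_command(source_dir, target_dir, dry_run=True):
    """Build an rsync command that mirrors source_dir into target_dir."""
    command = ["rsync", "-a", "--delete"]
    if dry_run:
        # Report what would change without touching the target.
        command.append("--dry-run")
    # A trailing slash on the source copies its contents, not the directory itself.
    command.append(source_dir.rstrip("/") + "/")
    command.append(target_dir.rstrip("/") + "/")
    return command


def sync_to_offline_drive(source_dir, mount_point, dry_run=True):
    """Run rsync only if the offline drive is actually mounted."""
    if not os.path.ismount(mount_point):
        raise RuntimeError(f"{mount_point} is not a mounted filesystem")
    command = build_rsync_command(source_dir, mount_point, dry_run)
    return subprocess.run(command, check=True)
```

The mount check guards against the easy mistake of filling the root filesystem when the drive bay is empty.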
I will note that I work for a US federal agency that I shouldn't mention (I do not speak for the agency or my employer), and our division also generates a lot of data: easily half a terabyte for one user, and a number for the group that does protein folding....
mark
For one thing, I think you seriously need to look at backing up to offline hard drives, instead of tapes. Unless you really want/need to archive the tapes for seven years
Well, the scientists are talking longer than 7 years, so HDs just are not going to cut it.
We back up to backup servers, then, every couple of weeks, I run rsync backups (well, we have a locally-rolled system to run the rsync) onto offline drives - in our case, I swap large drives into an eSATA drive bay. When I'm done, they go in the fire safe.
That's what I'd prefer to do :-)
On Wed, Jan 11, 2012 at 2:40 PM, Alan McKay alan.mckay@gmail.com wrote:
For one thing, I think you seriously need to look at backing up to offline hard drives, instead of tapes. Unless you really want/need to archive the tapes for seven years
Well, the scientists are talking longer than 7 years, so HDs just are not going to cut it.
I'd be hard pressed to find a tape drive that could read any tape I've written that long ago.
We back up to backup servers, then, every couple of weeks, I run rsync backups (well, we have a locally-rolled system to run the rsync) onto offline drives - in our case, I swap large drives into an eSATA drive bay. When I'm done, they go in the fire safe.
That's what I'd prefer to do :-)
You probably need to look first at how you identify things, to see whether any existing archive approach maps onto that well enough, or has a searchable online index, so that you know which tape to restore when someone asks for old data.
Personally I always think of backuppc first for backups because it can hold so much more online, but it isn't great at archiving. You can make a standard tar image out of anything it has stored, but for anything other than the latest run you would need some command line options or have to select it through a web browser.
--On Wednesday, January 11, 2012 03:40:20 PM -0500 Alan McKay alan.mckay@gmail.com wrote:
Well, the scientists are talking longer than 7 years so HDs just are not going to cut it
Regarding the use of hard drives, you might want to have a look at this: http://www.lockss.org/locksswiki/files/ISandT2008.pdf
At any rate, if you're concerned with archiving beyond seven years, you probably also need to add another dimension to your archival problem. In the professional archival/library industry they're quite aware of having to maintain information longer than the lifetime of any of:
- the individual media that it is stored on (eg: "the tape got too old and is now throwing errors")
- the media type (eg: "my 10 year backups have been stored on Exabyte tapes in a humidity/temperature controlled vault, but I can't find a working Exabyte tape drive anymore", or "does anyone have a drive for my 8-inch floppy? How about a computer that will talk to the drive?")
- the data format (eg: "the document I need is in AppleWriter format. I was able to retrieve it from backups, and the previously recorded checksums match, but I can't find a program that will read it!")
For long term storage, you may need to be able to not just put stuff away, but also have a policy (and the resources!) to periodically migrate data to newer media & formats. This can get expensive in time and money of course; your stakeholders may need to weigh in again periodically to evaluate the value of the data vs the cost of migration.
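The "previously recorded checksums" idea from the list above can be sketched as a fixity check: after restoring or migrating data, recompute each file's digest and compare it with what was recorded at archive time. This is an illustrative Python sketch, not any particular archival tool's method; the function names and the plain path-to-digest mapping are assumptions.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(restored_dir, recorded):
    """Compare restored files against recorded SHA-256 digests.

    recorded maps a relative path to the digest stored at archive time.
    Returns a dict of relative path -> problem; empty means all files match.
    """
    root = Path(restored_dir)
    problems = {}
    for relative_name, expected in recorded.items():
        candidate = root / relative_name
        if not candidate.is_file():
            problems[relative_name] = "missing"
        elif file_digest(candidate) != expected:
            problems[relative_name] = "checksum mismatch"
    return problems
```

Running a check like this on every migration to new media catches silent corruption, though it says nothing about whether the data format itself is still readable.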
I'm sure that there are some archivist-related mailing lists out there that can better explain the depth of the horror.
Depending on the value of the data, you may also need to look at multiple copies. And then there's disaster recovery ...
Just because you didn't have enough problems already ...
(BTW, the main LOCKSS site, http://www.lockss.org, looks interesting from an academic point of view, albeit not relevant here.)
Devin
For long term storage, you may need to be able to not just put stuff away, but also have a policy (and the resources!) to periodically migrate data to newer media & formats.
Yes, we've already begun this process - and we are taking into account the sorts of issues you mentioned.
On Thu, Jan 12, 2012 at 12:48 AM, Devin Reade gdr@gno.org wrote:
--On Wednesday, January 11, 2012 03:40:20 PM -0500 Alan McKay alan.mckay@gmail.com wrote:
Well, the scientists are talking longer than 7 years so HDs just are not going to cut it
[...]
For long term storage, you may need to be able to not just put stuff away, but also have a policy (and the resources!) to periodically migrate data to newer media & formats. This can get expensive in time and money of course; your stakeholders may need to weigh in again periodically to evaluate the value of the data vs the cost of migration.
What about LTFS?
http://en.wikipedia.org/wiki/Linear_Tape_File_System
Seems interesting. Anyone tried it?
Rafa