[CentOS] Disaster recovery recommendations

Sat Oct 31 19:17:30 UTC 2015
Valeri Galtsev <galtsev at kicp.uchicago.edu>

On Fri, October 30, 2015 9:31 pm, Mark LaPierre wrote:
> On 10/30/15 17:30, Max Pyziur wrote:
>>
>> Greetings,
>>
>> I have three drives; they are all SATA Seagate Barracudas; two are
>> 500GB; the third is a 2TB.
>>
>> I don't have a clear reason why they have failed (possibly due to a
>> deep, off-brand, flaky mobo, but it's still inconclusive). I would
>> like to find a disaster recovery service that can hopefully recover the
>> data.
>>
>> Much thanks for any and all suggestions,
>>
>> Max Pyziur
>> pyz at brama.com
>
> If you can get them mounted on a machine other than the one with the
> problem motherboard, then I suggest giving SpinRite a try.
>
> https://www.grc.com/sr/spinrite.htm

I listened to the guy's video. It sounds pretty much like what the
command-line utility

badblocks

does. The only viable addition I hear is the latest one, where the utility
flips all bits and writes them back to the same location. In fact,
anything (containing both 0's and 1's) written to the sector would do,
because on a write operation the drive firmware kicks in: the drive itself
reads back the written sector and compares it to what was sent to it, and
if it differs, it marks the sector (or rather block; I picked up the wrong
term from this guy as I was listening while typing) as bad. Anyway, this
forces discovery and re-allocation of bad blocks. Otherwise bad blocks are
discovered on some read operation. If the CRC (cyclic redundancy check) on
read doesn't match, the firmware reads the block many times and
superimposes the read results; if it finally gets a CRC match, it happily
writes what it came up with to the bad block relocation area and adds the
block to the bad block re-allocation table. If after some number of reads
the firmware still doesn't come up with a CRC match, it gives up and
writes whatever the superimposed data is. So these data are under
suspicion, as even a CRC match doesn't mean the data are correct. This is
why there are filesystems (ZFS, to name one) that store much stronger
checksums for every block of data.
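For illustration, the write-mode test that badblocks performs (its -w
option writes the patterns 0xaa, 0x55, 0xff and 0x00 over every block,
reads every block back and compares) can be sketched in Python as below.
Note that 0xaa and 0x55 are bitwise complements, which is the "flip all
bits" idea. This is only a minimal sketch, assuming a scratch regular file
rather than a real device; the function name pattern_test and the
4096-byte block size are hypothetical choices. Like badblocks -w, it
destroys the contents of what it tests, and since reads may be served from
the OS page cache, it does not prove the platters hold the data.

```python
import os

# Patterns badblocks uses in write mode (-w); 0xAA and 0x55 are bit-flips of each other.
PATTERNS = (0xAA, 0x55, 0xFF, 0x00)
BLOCK_SIZE = 4096  # hypothetical block size for this sketch


def pattern_test(path, block_size=BLOCK_SIZE, patterns=PATTERNS):
    """Destructively write each pattern over every whole block of `path`,
    read it back, and return the sorted indices of blocks that mismatched.

    Sketch only: works on a regular file, ignores a trailing partial
    block, and reads may come from the page cache rather than the disk.
    """
    nblocks = os.path.getsize(path) // block_size
    bad = set()
    with open(path, "r+b") as f:
        for pattern in patterns:
            data = bytes([pattern]) * block_size
            # Write pass: fill every block with the current pattern.
            f.seek(0)
            for _ in range(nblocks):
                f.write(data)
            f.flush()
            os.fsync(f.fileno())
            # Read pass: compare every block with what was just written.
            f.seek(0)
            for i in range(nblocks):
                if f.read(block_size) != data:
                    bad.add(i)
    return sorted(bad)
```

On a healthy scratch file pattern_test returns an empty list, and the
tested blocks are left filled with zeros (the last pattern).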

Two more things are worth mentioning here.

1. If you notice that the machine (I/O, actually) sometimes freezes when
accessing some file(s), it most likely means the drive firmware is
struggling to recover the content and re-allocate newly discovered bad
blocks. Time to check, and maybe replace, the drive.

2. Hardware RAIDs (and probably software RAIDs; someone chime in, I stay
away from software RAID) can schedule a "verify" task. This basically goes
over all sectors (or blocks) of all drives, thus:
   a. forcing the drive firmware to discover newly developed bad blocks;
   b. since a drive working on a bad block will often time out, the RAID
   firmware will kick that drive out and start rebuilding the RAID, thus
   re-writing the content of the bad block on the drive that developed
   it. In this case the information comes from the good drives, so it is
   less likely to be corrupted.
What I described is the best-case scenario; the drive will not always time
out... so even hardware RAIDs are prone to actual data corruption. Bottom
line: it is good to migrate to something like ZFS.
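Why separately stored checksums help can be shown with a toy model: each
block lives on two mirrored "drives", a SHA-256 checksum of each block is
kept apart from the data, and a scrub pass repairs any copy that fails its
checksum from a copy that passes. Without the checksum, a verify that finds
two differing copies cannot tell which one is right. This is a
hypothetical in-memory sketch (the names checksum and scrub are
illustrative), not how ZFS or any RAID firmware actually implements it.

```python
import hashlib


def checksum(block: bytes) -> bytes:
    # Strong checksum kept apart from the data (SHA-256 in this sketch).
    return hashlib.sha256(block).digest()


def scrub(mirrors, checksums):
    """Check every copy of every block against its stored checksum.

    mirrors:   list of drives, each a list of blocks (bytes)
    checksums: expected checksum for each block index
    Repairs bad copies in place from a copy that verifies. Returns
    (repaired, unrecoverable): repaired is a list of (drive, block)
    pairs that were rewritten, unrecoverable lists block indices for
    which no copy matched its checksum.
    """
    repaired, unrecoverable = [], []
    for i, expected in enumerate(checksums):
        good = next((d[i] for d in mirrors if checksum(d[i]) == expected), None)
        if good is None:
            # Every copy is corrupt: report it instead of guessing.
            unrecoverable.append(i)
            continue
        for drive_no, drive in enumerate(mirrors):
            if checksum(drive[i]) != expected:
                drive[i] = good
                repaired.append((drive_no, i))
    return repaired, unrecoverable


blocks = [b"alpha", b"bravo", b"charlie"]
sums = [checksum(b) for b in blocks]
mirrors = [list(blocks), list(blocks)]
mirrors[1][2] = b"charlix"   # silent corruption on the second drive
print(scrub(mirrors, sums))  # ([(1, 2)], [])
```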

Thanks.
Valeri

>
> It's inexpensive which makes it a low risk and not much of a loss if it
> doesn't work.
>
> Also consider this a lesson learned.  The cost of a second low capacity
> machine, including the electric bill to run it, is insignificant
> compared to paying for data recovery.
>
> http://www.tigerdirect.com/applications/SearchTools/item-details.asp?EdpNo=7841915&Sku=J001-10169
>
> If you insist on keeping personal control of your data, like I do, then
> that is the best way to go about it.  Use the second machine as your
> backup.  Set it up as a NAS device and use rsync to keep your data
> backed up.  If you're paranoid you could even locate the old clunker off
> site at a family/friend's home and connect to it using ssh over the
> internet.
>
> Your other option is to use a cloud storage service of some kind.  Be
> sure to encrypt anything you store in the cloud on your own machine
> first, before you send it, so that your data will be secure even
> if someone hacks your cloud service.  There's another drawback to using
> a cloud as your backup.  The risk is small, but you do have to realize
> that the cloud could blow away along with your data.  It's happened
> before.
>
> --
>     _
>    °v°
>   /(_)\
>    ^ ^  Mark LaPierre
> Registered Linux user No #267004
> https://linuxcounter.net/
> ****
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> https://lists.centos.org/mailman/listinfo/centos
>
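Mark's suggestion of a second machine kept in sync with rsync over ssh
could be scripted roughly as follows. This is a sketch under stated
assumptions: the user, host name and paths in the example are hypothetical
placeholders, rsync and ssh are assumed to be installed on both ends, and
only standard rsync options are used (-a for archive mode, -e ssh for the
transport, --delete to remove files that vanished from the source).

```python
import shlex
import subprocess


def rsync_command(src, user, host, dest, delete=True):
    """Build an rsync command that mirrors the contents of `src` to
    user@host:dest over ssh. All arguments are caller-supplied placeholders."""
    cmd = ["rsync", "-a", "-e", "ssh"]
    if delete:
        # Make the backup an exact mirror: drop files no longer in src.
        cmd.append("--delete")
    # A trailing slash on src copies its contents, not the directory itself.
    cmd.append(src.rstrip("/") + "/")
    cmd.append(f"{user}@{host}:{dest}")
    return cmd


def run_backup(src, user, host, dest, delete=True):
    cmd = rsync_command(src, user, host, dest, delete)
    print("running:", shlex.join(cmd))
    # check=True raises CalledProcessError if rsync exits non-zero.
    subprocess.run(cmd, check=True)
```

For example, run_backup("/home/max", "max", "backup.example.com",
"/srv/backup/max") could be run nightly from cron. Keep in mind that
--delete also propagates an accidental deletion to the backup on the next
run; keeping dated snapshots (e.g. with rsync's --link-dest option) is one
way around that.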


++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++