[CentOS] Disaster recovery recommendations

Sat Oct 31 21:00:04 UTC 2015
Valeri Galtsev <galtsev at kicp.uchicago.edu>

On Sat, October 31, 2015 3:30 pm, Mark LaPierre wrote:
> On 10/31/15 15:17, Valeri Galtsev wrote:
>>
>> On Fri, October 30, 2015 9:31 pm, Mark LaPierre wrote:
>>> On 10/30/15 17:30, Max Pyziur wrote:
>>>>
>>>> Greetings,
>>>>
>>>> I have three drives; they are all SATA Seagate Barracudas; two are
>>>> 500GB; the third is a 2TB.
>>>>
>>>> I don't have a clear reason why they have failed (possibly due to a
>>>> deep, off-brand, flaky mobo, but it's still inconclusive), but I
>>>> would like to find a disaster recovery service that can hopefully
>>>> recover the data.
>>>>
>>>> Much thanks for any and all suggestions,
>>>>
>>>> Max Pyziur
>>>> pyz at brama.com
>>>
>>> If you can get them mounted on a different machine, other than the one
>>> with the problem mother board, then I suggest giving SpinRite a try.
>>>
>>> https://www.grc.com/sr/spinrite.htm
>>
>> I listened to the guy's video. It sounds pretty much like what the
>> command line utility
>>
>> badblocks
>>
>> does. The only viable thing I hear is its latest addition, where the
>> utility flips all bits and writes them back to the same location. In
>> fact, anything (containing both 0's and 1's) can be written to the
>> sector: on a write, the drive firmware kicks in, as the drive itself
>> reads back the written sector and compares it to what was sent, and if
>> it differs the firmware marks the sector as bad (or rather the block; I
>> used the wrong term, following this guy, as I was listening while
>> typing). Anyway, this forces discovery and re-allocation of bad blocks.
>> Otherwise bad blocks are discovered on some read operation: if the CRC
>> (cyclic redundancy check) on read doesn't match, the firmware reads the
>> block many times and superimposes the read results. If it finally gets
>> a CRC match, it happily writes what it came up with to the bad block
>> relocation area and adds the block to the bad block re-allocation
>> table. If after some number of reads the firmware doesn't come up with
>> a CRC match, it gives up and writes whatever the superimposed data is.
>> So these data are under suspicion, as even a CRC match doesn't mean the
>> data is correct. This is why there are filesystems (ZFS, to name one)
>> that store really sophisticated checksums for the data in each file.
>>
>> Two things can be mentioned here.
>>
>> 1. If you notice that sometimes the machine (I/O actually) freezes on
>> access of some file(s), it most likely means the drive firmware is
>> struggling to do its magic on recovery of content and re-allocation of
>> newly discovered bad blocks. Time to check and maybe replace the drive.
>>
>> 2. Hardware RAIDs (and probably software RAIDs - someone chime in, I'm
>> staying away from software RAIDs) have the ability to schedule a
>> "verify" task. This basically goes over all sectors (or blocks) of all
>> drives, thus:
>> a. forcing the drive firmware to discover newly developed bad blocks;
>> b. as a drive working on a bad block will often time out, the RAID
>> firmware will kick this drive out and start rebuilding the RAID, thus
>> re-writing the content of the bad block on the drive that developed
>> it. In this case the information comes from the good drives, and is
>> thus less likely to be corrupted.
>> What I described is the best case scenario; a drive will not always
>> time out... so even hardware RAIDs are prone to actual data
>> corruption. Bottom line: it is good to migrate to something like ZFS.
>>
>> Thanks.
>> Valeri
>>
>>>
>>> It's inexpensive which makes it a low risk and not much of a loss if it
>>> doesn't work.
>>>
>>> Also consider this a lesson learned.  The cost of a second low capacity
>>> machine, including the electric bill to run it, is insignificant
>>> compared to paying for data recovery.
>>>
>>> http://www.tigerdirect.com/applications/SearchTools/item-details.asp?EdpNo=7841915&Sku=J001-10169
>>>
>>> If you insist on keeping personal control of your data, like I do, then
>>> that is the best way to go about it.  Use the second machine as your
>>> backup.  Set it up as a NAS device and use rsync to keep your data
>>> backed up.  If you're paranoid you could even locate the old clunker
>>> off site at a family/friend's home and connect to it using ssh over
>>> the internet.
>>>
>>> Your other option is to use a cloud storage service of some kind.  Be
>>> sure to encrypt anything you store on the cloud on your machine first,
>>> before you send it to the cloud, so that your data will be secure even
>>> if someone hacks your cloud service.  There's another drawback to using
>>> a cloud as your backup.  The risk is small, but you do have to realize
>>> that the cloud could blow away along with your data.  It's happened
>>> before.
>>>
>>> --
>>>     _
>>>    °v°
>>>   /(_)\
>>>    ^ ^  Mark LaPierre
>>> Registered Linux user No #267004
>>> https://linuxcounter.net/
>>> ****
>>> _______________________________________________
>>> CentOS mailing list
>>> CentOS at centos.org
>>> https://lists.centos.org/mailman/listinfo/centos
>>>
>>
>>
>> ++++++++++++++++++++++++++++++++++++++++
>> Valeri Galtsev
>> Sr System Administrator
>> Department of Astronomy and Astrophysics
>> Kavli Institute for Cosmological Physics
>> University of Chicago
>> Phone: 773-702-4247
>> ++++++++++++++++++++++++++++++++++++++++
>
> Hey Valeri,
>
> What you say is true and should be considered when he rebuilds his system.
>
> The point of my post was to suggest a way for the OP to recover his data
> at a reasonable cost using Spinrite.

It sounds like the OP has fried circuit boards on at least two, but likely
all three, drives (from what I read of the OP's description). If he were
able to get the drives visible on the bus, then reading the drive content
sector-by-sector or block-by-block would be the way to create a raw drive
image, which would be the first step of data recovery.
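
To illustrate what I mean by block-by-block imaging, here is a minimal
sketch in Python. It is not a replacement for a dedicated imaging tool;
the function name image_device, the 4096-byte block size, and the
injectable read_block parameter are my own assumptions for illustration.
It copies the source one block at a time and zero-fills any block that
cannot be read, so offsets in the image stay aligned with the drive:

```python
import os

def image_device(src_path, dst_path, block_size=4096, read_block=os.pread):
    """Copy src_path to dst_path block by block, zero-filling unreadable blocks.

    Returns the list of byte offsets that could not be read in full.
    read_block is a hypothetical hook (defaults to os.pread) so that read
    failures can be simulated without a failing drive.
    """
    bad_offsets = []
    src = os.open(src_path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size of a regular file or block device.
        size = os.lseek(src, 0, os.SEEK_END)
        with open(dst_path, "wb") as dst:
            offset = 0
            while offset < size:
                want = min(block_size, size - offset)
                try:
                    data = read_block(src, want, offset)
                except OSError:
                    data = b""  # read error: nothing usable from this block
                if len(data) < want:
                    # Record the failure and pad with zeros to keep alignment.
                    bad_offsets.append(offset)
                    data = data.ljust(want, b"\0")
                dst.write(data)
                offset += want
    finally:
        os.close(src)
    return bad_offsets
```

One would call it as image_device("/dev/sdX", "drive.img") with the real
device name substituted (the path here is a placeholder), ideally writing
the image to a different, healthy disk, and then do all further recovery
work on the image rather than on the failing drive.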

>
> One point you may be confused about is that Spinrite does not care what
> file system you have on your disk.

No, I understood that from the guy's oral presentation. That's why I
compared the earlier version of Spinrite with the command line "badblocks"
tool, which does the same: writes/reads/compares one block (read or write
unit) at a time. badblocks, if I remember correctly, has a flag (-n) to
not destroy data on the disk: with it, badblocks reads and remembers a
sector, does its read-write-compare test, then writes the original data
back in place.
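
The logic of such a non-destructive read-write-compare pass can be
sketched in a few lines of Python. This is only an illustration of the
idea, not how badblocks itself is implemented; the function name, the two
test patterns, and the block size are my assumptions. Note that it goes
through the OS page cache, so a real surface test would also need direct
I/O; try it on a scratch file or image, not on a disk holding data you
care about:

```python
import os

# Two complementary bit patterns (10101010 and 01010101); an assumption
# for this sketch, chosen so that every bit is written as both 0 and 1.
PATTERNS = (b"\xaa", b"\x55")

def nondestructive_test(path, block_size=4096):
    """Return offsets of blocks that fail a write/read-back comparison.

    Each block's original content is saved first and restored afterwards.
    """
    failed = []
    fd = os.open(path, os.O_RDWR)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)
        for offset in range(0, size, block_size):
            length = min(block_size, size - offset)
            original = os.pread(fd, length, offset)  # remember current data
            try:
                for byte in PATTERNS:
                    pattern = byte * length
                    os.pwrite(fd, pattern, offset)
                    os.fsync(fd)
                    if os.pread(fd, length, offset) != pattern:
                        failed.append(offset)
                        break
            finally:
                # Put the original data back, whatever the test result was.
                os.pwrite(fd, original, offset)
                os.fsync(fd)
    finally:
        os.close(fd)
    return failed
```

An empty list returned means every block read back exactly what was
written; either way the original content is written back, so the file
should be unchanged afterwards.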

> Spinrite does not mount the file
> system.  It accesses the disk storage media one sector at a time using the
> actual drive hardware/firmware to read the data from each sector.  If it
> does not succeed in reading the sector it keeps trying using various
> methods until it gets a read or until it is satisfied that the sector is
> unreadable.

Yes, thanks for the Spinrite reference. This adds to the arsenal of
recovery tools for when the drive is visible on the bus and only the
surface of the platters has gone bad. GUI tools are often more transparent
for humans, thus diminishing the chance of blunders compared to UNIX
command line tools.

>
> When it gets a read it writes it back to the center of the track where
> it's supposed to be and checks to be sure that it worked by reading it
> back again.
>
> As Spinrite progresses across the storage media the drive firmware
> manages the marking of truly unrecoverable sectors as bad and the other
> sectors as good.

I sent the OP, off the list, references to recovery services that people I
know personally have used with success (my own plan is: I have a good
backup ;-). Unless I'm misreading what he writes, his case is burned
circuit boards on the drives. Then the data on the platters is most likely
intact, which is most encouraging if a recovery company is involved: this
purely "mechanical" job is the most trivial for them (even though it
involves a clean lab, fancy equipment, and capable technicians).

Valeri



++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++