[CentOS] SATA Raid 5 and losing a drive

Tue Apr 11 18:07:37 UTC 2006
Andy Green <andy at warmcat.com>

Andy Green wrote:
> Joshua Baker-LePain wrote:
>> Did you try doing any I/O to the array?  In my limited experience with 
>> software RAID, it won't notice a drive missing until it tries to do 
>> something with said drive.
> Yes I did do this, I copied a file to the mountpoint and did a sync. 
> Nothing.

Hm, Googling around suggests that everyone with SATA RAID may be 
experiencing the same lack of warning that their safety net just blew a 
hole through the server farm roof in a bid to reach escape velocity.
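
For anyone wanting to check by hand what md itself believes about the 
array, the status field in /proc/mdstat is the place to look. Below is a 
minimal sketch in Python, assuming the usual layout in which a status such 
as "[3/2] [UU_]" marks an array with a missing member. The function name 
and the sample text are made up for illustration, not taken from a real 
box. Note that it only reports md's own view: in the situation described 
above, where md has not noticed the pulled drive, it would still come back 
empty.

```python
import re

# Illustrative /proc/mdstat contents for a 3-disk RAID 5 with one member gone.
# This sample is invented for the example, not captured from a real machine.
SAMPLE_DEGRADED = """Personalities : [raid5]
md0 : active raid5 sdb1[1] sda1[0]
      976767872 blocks level 5, 64k chunk, algorithm 2 [3/2] [UU_]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return {array_name: member_map} for arrays md reports as degraded.

    member_map is the "[UU_]"-style field without brackets; "_" stands for
    a member that md considers missing or failed.
    """
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        # A stanza starts with the array name, e.g. "md0 : active raid5 ..."
        header = re.match(r"^(md\S*)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The following status line ends with e.g. "[3/2] [UU_]"
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current is not None:
            if "_" in status.group(3):
                degraded[current] = status.group(3)
            current = None
    return degraded
```

To use it on a live system, read /proc/mdstat into a string and pass it to 
degraded_arrays(); an empty result means md thinks every array is whole, 
which is not the same thing as every drive actually being there.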

''...The error handling is very simple, but at this stage that is an 
advantage. Error handling code anywhere is inevitably both complex and 
sorely under-tested. libata error handling is intentionally simple. 
Positives: Easy to review and verify correctness. Never data corruption. 
Negatives: if an error occurs, libata will simply send the error back to 
the block layer. There are limited retries by the block layer, depending 
on the type of error, but there is never a bus reset.

Or in other words: "it's better to stop talking to the disk than 
compound existing problems with further problems."

As Serial ATA matures, and host- and device-side errata become apparent, 
the error handling will be slowly refined. I am planning to work with a 
few (kind!) disk vendors, to obtain special drives/firmwares that allow 
me to inject faults, and otherwise exercise error handling code.

Error handling improvements will almost certainly be required in order 
to implement features such as device hotplug.''
