On Sat, Feb 28, 2015 at 4:29 PM, Valeri Galtsev <galtsev at kicp.uchicago.edu> wrote:

> You are implying that firmware of hardware RAID cards is somehow buggier
> than software of software RAID plus Linux kernel (sorry if I
> misinterpreted your point).

"Drives, and hardware RAID cards are subject to firmware bugs, just as
we have software bugs in the kernel." makes no assessment of how
common such bugs are relative to each other.

> I disagree: embedded system of RAID card and
> RAID function they have to fulfill are much simpler than everything
> involved into software RAID. Therefore, with the same effort invested,
> firmware of (good) hardware is less buggy.

There's no evidence provided for this. All I've stated is bugs happen
in both software and the firmware on hardware RAID cards.

http://www.cs.toronto.edu/~bianca/papers/fast08.pdf

And further there's a widespread misperception that RAID56 (whether
software or hardware) is capable of detecting and correcting
corruption.

> And again, Linux kernel can be
> panicked more likely than trivial embedded system of hardware RAID
> card/box. At least my experience over decade and a half confirms that.

I'd say this is not a scientific sample and therefore unproven. I can
provide my own non-scientific sample: an XServe running OS X with
software raid1 which has never, in 8 years, kernel panicked. Its
longest uptime was over 500 days, and was only rebooted due to a
system upgrade that required it. There's nothing special about the
XServe that makes this magic, it's just good hardware with ECC memory,
enterprise SAS drives, and a capable though limited kernel. So there's
no good reason to expect kernel panics. Having them means something is
wrong.

> I have my raids verified once a week. If you don't
> verify them for a year, what happens then: you don't discover individual
> drive degradation until it is too late and larger number than the level of
> redundancy are kicked out because of fatal failures.

This is a common problem on software and hardware RAID alike, the lack
of scrubbing. Also recognize that software raid tends to bring along
cheaper drives that aren't well suited for RAID use, whereas people
spending money on hardware raid tend to invest in appropriate drives.
That prevents problems, thanks to proper SCT ERC settings on the
drive.

> Anyway, these
> horror stories were purely poor sysadmin's job IMHO.

I agree. This is common in any case.

--
Chris Murphy