On Tue, Feb 10, 2015 at 9:12 PM, John R Pierce pierce@hogranch.com wrote:
On 2/10/2015 6:54 PM, Chris Murphy wrote:
The reason I avoid swap on md raid 1/10 is the swap caveat described in man 4 md. It is possible for a page in memory to change between the writes to the two md devices, so that the mirrors end up different. The man page only suggests that this makes scrub check results unreliable, and that such a difference would never be read (?). But I don't understand this, so I just avoid it because I haven't thoroughly tested it.
If it's possible for that to happen, then the whole swapping AND mdraid mechanisms in Linux are badly broken.
I suggest not taking my word for it. Read man 4 md, starting with the paragraph "The most likely cause for an unexpected mismatch on RAID1 or RAID10 occurs if a swap partition or swap file is stored on the array" and including the following 4 paragraphs, and let me know what you think it's saying. It raised my eyebrows, but it seems to be saying that this doesn't actually result in corruption. The part I don't understand is how a page changing between the writes to the two mirrors holding swap translates into that swap being unused, and therefore into the (meaningful) mismatch between the mirrors not being a problem. If the page write to disk happened at all, it seems like this is used rather than unused swap.
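To make the scenario concrete, here is a toy model in Python of how I read those paragraphs. It is only a sketch of my understanding, not kernel behavior: the Page class, the swap_out function and the "dirty" flag handling are all made up for illustration. The idea is that the page is flagged clean before write-out, the data is copied to each mirror in turn, and if the page gets modified in between, the two mirrors differ and the swap side sees a dirty page when the write completes.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for a page of memory being swapped out."""
    data: bytearray
    dirty: bool = False

    def modify(self, offset: int, value: int) -> None:
        # Any change to the page after it was flagged clean marks it dirty.
        self.data[offset] = value
        self.dirty = True


def swap_out(page, mirrors, modify_mid_write=False):
    """Copy the page to each mirror in turn.

    Returns True if the swapped-out copy would be kept, i.e. the page
    is still clean when the write completes (assumption: a dirty page
    means the swap-out attempt is simply forgotten).
    """
    page.dirty = False  # flagged clean before the write-out starts
    for index, mirror in enumerate(mirrors):
        mirror[:] = page.data  # one separate copy per mirror device
        if modify_mid_write and index == 0:
            page.modify(0, ord("B"))  # page changes between the two writes
    return not page.dirty
```

With m0 and m1 as two 4-byte bytearrays, swap_out(Page(bytearray(b"AAAA")), [m0, m1], modify_mid_write=True) leaves m0 as AAAA and m1 as BAAA, and returns False. In this model the mismatch is real on disk, but the slot holding it is never treated as valid swap, which may be what the man page means.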
For data (not swap), there is a related set of known problems that applies to all raid 1 and 5 arrays. Regularly scheduled scrubs are necessary to make sure bad sectors are identified and corrected, yet this isn't the default behavior; it has to be configured. Further, a reported mismatch doesn't unambiguously tell us which copy is good (or bad); it only reports that the copies are different. Ergo, regularly scheduled "checks" are a good idea, while "repair" is sort of a last resort because it might cause the good copy to get overwritten.
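For what it's worth, a check can also be kicked off by hand through sysfs. Below is a minimal sketch in Python, assuming the array is md0 and that the sync_action and mismatch_cnt files described in man 4 md are under /sys/block/md0/md. The function names and the sysfs_root parameter are my own, added only so the code can be tried against a scratch directory; writing to the real sysfs file needs root.

```python
from pathlib import Path


def md_dir(device="md0", sysfs_root="/sys"):
    """Return the md sysfs directory for an array (md0 is an assumption)."""
    return Path(sysfs_root) / "block" / device / "md"


def start_check(device="md0", sysfs_root="/sys"):
    """Request a read-only scrub; "check" counts mismatches, it does not fix them."""
    (md_dir(device, sysfs_root) / "sync_action").write_text("check\n")


def current_action(device="md0", sysfs_root="/sys"):
    """Return the current sync action, e.g. "idle" or "check"."""
    return (md_dir(device, sysfs_root) / "sync_action").read_text().strip()


def read_mismatch_cnt(device="md0", sysfs_root="/sys"):
    """Return the mismatch counter as an integer."""
    return int((md_dir(device, sysfs_root) / "mismatch_cnt").read_text().strip())
```

Typical use would be to call start_check(), wait until current_action() reports idle again, then look at read_mismatch_cnt(). A nonzero count only says the copies differ somewhere; it doesn't say which copy is right, which is why I'd look before ever writing "repair" to the same file.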
This isn't broken. It's just the way it's designed. This is what DIF/DIX (now PI), Btrfs and ZFS are meant to address. There's also been some intermittent talk on linux-raid@ about whether and how to get checksums integrated there.