[CentOS] Race condition with mdadm at boot [still mystifying]
chuckm at seafoam.net
Fri Mar 11 03:25:22 UTC 2011
This is a bit long-winded, but I wanted to share some info ....
Regarding my earlier message about a possible race condition with mdadm,
I have been doing all sorts of poking around with the boot process.
Thanks to a tip from Steven Yellin at Stanford, I found where to add a
delay in the rc.sysinit script, which invokes mdadm to assemble the arrays.
Unfortunately it didn't help, so it likely wasn't a race condition after
all.
However, on close examination of dmesg, I found something very
interesting. There were missing 'bind<sd??>' statements for one or the
other hot spare drive (or sometimes both). These drives are connected
to the last PHYs in each SATA controller ... in other words they are the
last devices probed by the driver for a particular controller. It would
appear that the drivers are bailing out before managing to enumerate all
of the partitions on the last drive in a group, and missing partitions
occur quite randomly.
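
For anyone who wants to run the same check on their own box, here is a rough sketch of how the dmesg comparison could be automated. It is not what I actually ran. The function name missing_binds, the sample log text, and the list of expected partitions are all made up for illustration, and it assumes the md driver logs members in the 'bind<sdXN>' form described above.

```python
import re

# Matches the device name inside lines such as "md: bind<sdh1>".
BIND_RE = re.compile(r"bind<(sd[a-z]+\d+)>")


def missing_binds(dmesg_text, expected):
    """Return the expected partitions that have no bind<...> line, sorted."""
    seen = set(BIND_RE.findall(dmesg_text))
    return sorted(set(expected) - seen)


# Hypothetical dmesg excerpt: sdh is detected, but sdh1 is never bound.
sample = (
    "md: bind<sda1>\n"
    "md: bind<sdb1>\n"
    "sd 0:0:7:0: [sdh] Attached SCSI disk\n"
)

# Hypothetical member list; substitute the partitions of your own arrays.
print(missing_binds(sample, ["sda1", "sdb1", "sdh1"]))
```

To use it on a live system, feed it the saved output of dmesg from each boot instead of the sample string, and compare the results across several boots to see whether the same partitions go missing each time.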
So it may or may not be a timing issue between the WD Caviar Black
drives and both the LSI and Marvell SAS/SATA controller chips.
So, I replaced the two drives (SATA-300) with two faster drives
(SATA-600) on the off chance they might respond fast enough before the
drivers move on to other duties. That didn't help either.
Each group of arrays uses completely different drivers (mptsas and sata_mv) but
both exhibit the same problem, so I'm mystified as to where the real
issue lies. Anyone care to offer suggestions?