[CentOS] Problem with softwareraid

Sat Aug 19 00:47:52 UTC 2017
mad.scientist.at.large at tutanota.com <mad.scientist.at.large at tutanota.com>

18. Aug 2017 13:35 by euroregistrar at gmail.com:

> Hello all,
> i have already had a discussion on the software raid mailinglist and i
> want to switch to this one :)
> I am having a really strange problem with my md0 device running
> centos7. after a new start of my server the md0 was gone. now after
> trying to find the problem i detected the following:
> Booting any installed kernel gives me NO md0 device. (ls /dev/md*
> doesnt give anything). a 'cat /proc/partitions show me now
> /dev/sd[a-d]1 partition. partprobe and a mdadm assemble gives me "disk
> busy"
> [root at quad live]# cat mdstat
> Personalities : [raid6] [raid5] [raid4] [raid10]
> unused devices: <none>
> [root at quad ~]# partprobe
> device-mapper: remove ioctl on WDC_WD20EFRX-68AX9N0_WD-WMC301255087p1
> failed: Device or resource busy
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>snip

Are you definitely using cables rated for SATA III?  Have you checked the power connections?  Have you checked the power supply voltages during spin-up and afterwards?

Is there tension or major twisting force on the SATA cables?  I've seen this cause intermittent problems, which were solved by using a longer cable that reduced the stress at the connector.

Are the drives getting hot?  (Your model shouldn't have a heat issue under normal conditions.)  Are the drives bolted into the system?  Drives can be sensitive to vibration, and identical, unmounted drives will tend to shake each other and can produce rotational torque as well (especially when they are the same model, as they'll all have the same resonances in that case).  Either can cause problems with keeping the heads over the track reliably.

I'd definitely run all the SMART tests.  Start with the conveyance test, then the short self-test, and possibly the long test.  Do check the drive temperatures immediately after each test to make sure they aren't getting too hot.
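
A minimal sketch of that test order, assuming smartmontools (smartctl) is installed and the drives show up as /dev/sda to /dev/sdd.  The helper names and the sample attribute line are made up for illustration; the sketch only builds the command lines and parses the temperature out of `smartctl -A` output, it does not run anything on the drives.  Not every drive supports the conveyance test, and a self-test runs in the background, so the temperature should be read once the test has finished.

```python
# Hypothetical helpers, not part of smartmontools.
SELF_TESTS = ("conveyance", "short", "long")


def smart_test_commands(device):
    """Return the smartctl command lines in the suggested order.

    Each self-test start ("smartctl -t <type>") is followed by an
    attribute dump ("smartctl -A") that is meant to be run after the
    test has completed, so the temperature can be checked.
    """
    commands = []
    for test in SELF_TESTS:
        commands.append(["smartctl", "-t", test, device])
        commands.append(["smartctl", "-A", device])
    return commands


def parse_temperature(attr_output):
    """Extract the raw value of SMART attribute 194 (Temperature_Celsius).

    Returns the temperature as an int, or None if the attribute is
    missing or its raw value does not start with a number.
    """
    for line in attr_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == "194":
            try:
                return int(fields[9])
            except ValueError:
                return None
    return None


# Made-up sample in the usual "smartctl -A" table layout.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   075   075   000    Old_age   Always       -       18500
194 Temperature_Celsius     0x0022   117   103   000    Old_age   Always       -       30
"""

if __name__ == "__main__":
    for cmd in smart_test_commands("/dev/sda"):
        print(" ".join(cmd))
    print("temperature:", parse_temperature(SAMPLE))
```

The results of the self-tests themselves can be read with `smartctl -l selftest` on the same device.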

I assume you've done an fsck on the file systems?  If not, it might be good to check.

Are you using the motherboard's SATA interfaces or an add-on card?  If using a card, I'd check the firmware version on the card and what the manufacturer is offering for updates.

Are the drives still under warranty?  If so, try WD tech support.  Also check that all the RAID tools are properly installed with their dependencies met.  Other hardware or drivers could be interfering.  You might reset the BIOS to "optimized settings".  Which software RAID package are you using?
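
For what it's worth, the mdstat output quoted above can be read mechanically: it lists the RAID personalities the kernel has loaded and any assembled arrays.  A rough sketch, assuming the usual /proc/mdstat layout; the function name is made up, and the sample text is just the two lines from the quoted message.

```python
import re


def parse_mdstat(text):
    """Return (personalities, arrays) from /proc/mdstat-style text.

    personalities: RAID levels listed on the "Personalities" line.
    arrays: (name, state) pairs for lines such as "md0 : active raid5 ...".
    """
    personalities = []
    arrays = []
    for line in text.splitlines():
        if line.startswith("Personalities"):
            personalities = re.findall(r"\[(\w+)\]", line)
            continue
        match = re.match(r"^(md\S*)\s*:\s*(\S+)", line)
        if match:
            arrays.append((match.group(1), match.group(2)))
    return personalities, arrays


# The mdstat contents from the quoted message.
QUOTED = """\
Personalities : [raid6] [raid5] [raid4] [raid10]
unused devices: <none>
"""

if __name__ == "__main__":
    levels, arrays = parse_mdstat(QUOTED)
    print("personalities:", levels)
    print("arrays:", arrays)
```

On the quoted output this gives four loaded personalities and an empty array list, which matches the report that no md0 device exists.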

Other than that, I'd possibly suspect a software problem; I'm not familiar with software RAIDs myself (I haven't used one, but I know what they are).  Or possibly a problem with a drive that is intermittent or complex in how it fails.