I got the email that a drive in my 4-drive RAID10 setup failed. What are my options?
Drives are WD1000FYPS (Western Digital 1 TB 3.5" SATA).
mdadm.conf:
# mdadm.conf written out by anaconda
MAILADDR root
AUTO +imsm +1.x -all
ARRAY /dev/md/root level=raid10 num-devices=4 UUID=942f512e:2db8dc6c:71667abc:daf408c3
/proc/mdstat:

Personalities : [raid10]
md127 : active raid10 sdf1[2](F) sdg1[3] sde1[1] sdd1[0]
      1949480960 blocks super 1.2 512K chunks 2 near-copies [4/3] [UU_U]
      bitmap: 15/15 pages [60KB], 65536KB chunk
smartctl reports this for sdf:

197 Current_Pending_Sector  0x0012  200  200  000  Old_age  Always   -  1
198 Offline_Uncorrectable   0x0010  200  200  000  Old_age  Offline  -  6
So it's got 6 bad blocks, 1 pending for remapping.
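If I wanted more evidence before deciding, I assume a long SMART self-test would tell me whether the pending sector is actually unreadable (this assumes smartmontools is installed; device name as above):

  smartctl -t long /dev/sdf       # start a long offline self-test (a few hours on a 1 TB drive)
  smartctl -l selftest /dev/sdf   # read the self-test log afterwards; a failing LBA would show up here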
Can I clear the error and rebuild? (It's not clear what commands would do that.) Or should I buy a replacement drive? I'm considering a WDS100T1R0A (2.5" 1TB red drive), which Amazon has for $135, plus the 3.5" adapter.
The system serves primarily as a home mail server (it fetchmails from an outside VPS serving as my domain's MX) and archival file server.
Hi,
mdadm --remove /dev/md127 /dev/sdf1
and then the same with --add should hot-remove and re-add the device.
If it rebuilds fine it may again work for a long time.
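So something along these lines (just a sketch; the array and partition names are taken from your /proc/mdstat above):

  mdadm --remove /dev/md127 /dev/sdf1   # hot-remove the member that is already marked (F)
  mdadm --add /dev/md127 /dev/sdf1      # re-add it; md starts a rebuild/resync
  cat /proc/mdstat                      # recovery progress; [UU_U] should become [UUUU] when done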
Simon
(2.5" 1TB red drive), which Amazon has for $135, plus the 3.5" adapter.
The system serves primarily as a home mail server (it fetchmails from an outside VPS serving as my domain's MX) and archival file server.
CentOS mailing list CentOS@centos.org https://lists.centos.org/mailman/listinfo/centos
--On Friday, September 18, 2020 10:53 PM +0200 Simon Matter simon.matter@invoca.ch wrote:
mdadm --remove /dev/md127 /dev/sdf1
and then the same with --add should hot-remove and re-add the device.
If it rebuilds fine it may again work for a long time.
Thanks. That reminds me: If I need to replace it, is there some easy way to figure out which drive bay is sdf? It's an old Supermicro rack chassis with 6 drive bays. Perhaps a way to blink the drive light?
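I'm guessing something like the following might work, depending on the backplane (the ledctl commands assume the ledmon package and an SGPIO/SES-capable enclosure, which an old chassis may not have):

  smartctl -i /dev/sdf | grep -i serial   # note the serial number and match it to the label on the drive carrier
  ledctl locate=/dev/sdf                  # blink the locate LED on that slot, if supported
  ledctl locate_off=/dev/sdf              # turn the locate LED off again
  dd if=/dev/sdf of=/dev/null bs=1M       # fallback: steady reads keep the activity LED lit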
--On Friday, September 18, 2020 10:53 PM +0200 Simon Matter simon.matter@invoca.ch wrote:
mdadm --remove /dev/md127 /dev/sdf1
and then the same with --add should hot-remove and re-add the device.
If it rebuilds fine it may again work for a long time.
This worked like a charm. When I added it back, it told me it was "re-adding" the drive, so it recognized the drive I'd just removed. I checked /proc/mdstat and it showed rebuilding. It took about 90 minutes to finish and is now running fine.
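For the record, I assume the end state can be confirmed with something like:

  mdadm --detail /dev/md127   # should report State : clean, with all four members active sync
  cat /proc/mdstat            # [UUUU] and no (F) flags once recovery has finished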
I think it's usually like this: when a drive has a bad sector, the data is read from the mirror copy on another disk, but the failing disk gets marked faulty. Then, during the rebuild, the bad sector is written again and the drive remaps it to a spare sector, so all is well again. Note that drive firmware can handle such cases differently depending on the drive model.
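If you want to see whether the remap really happened, something like this should show it (attribute 5 is the reallocated sector count):

  smartctl -A /dev/sdf | grep -E 'Reallocated|Pending|Uncorrectable'
  # Current_Pending_Sector should drop back to 0; Reallocated_Sector_Ct may go up

Running a periodic scrub also re-reads every sector, so such problems surface early:

  echo check > /sys/block/md127/md/sync_action   # kick off an array check; progress shows in /proc/mdstat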
Regards, Simon