[CentOS] RAID rebuild time and disk utilization....

Tue Sep 28 00:16:42 UTC 2010
Tom Bishop <bishoptf at gmail.com>

Here are the iostats:


Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.15     2.47  0.41  0.82    13.01    26.36    31.97     0.01    6.98   1.01   0.12
sda1              0.02     0.00  0.00  0.00     0.04     0.00    24.50     0.00    5.38   4.82   0.00
sda2              0.01     0.00  0.00  0.00     0.03     0.00    37.79     0.00    6.77   5.85   0.00
sda3              0.12     2.47  0.40  0.82    12.93    26.36    31.98     0.01    6.96   1.01   0.12
sdb               1.48     0.00 315.21  0.01 40533.39     0.75   128.59    26.94   85.45   2.80  88.24
sdb1              1.47     0.00 315.21  0.01 40533.30     0.75   128.59    26.94   85.45   2.80  88.24
sdc               1.04     0.00 315.65  0.01 40533.48     1.09   128.41     1.92    6.07   0.88  27.69
sdc1              1.02     0.00 315.65  0.01 40533.39     1.09   128.41     1.92    6.07   0.88  27.68
md0               0.00     0.00  0.00  0.00     0.00     0.00     8.00     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00    16.80  0.00  1.00     0.00   142.40   142.40     0.00    1.00   0.60   0.06
sda1              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda2              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda3              0.00    16.80  0.00  1.00     0.00   142.40   142.40     0.00    1.00   0.60   0.06
sdb              56.40     0.00 189.60  0.00 30822.40     0.00   162.57    17.45   93.23   5.28 100.02
sdb1             56.40     0.00 189.60  0.00 30822.40     0.00   162.57    17.45   93.23   5.28 100.02
sdc               0.00     0.00 239.40  0.00 30643.20     0.00   128.00     0.69    2.89   0.68  16.24
sdc1              0.00     0.00 239.40  0.00 30643.20     0.00   128.00     0.69    2.89   0.68  16.24
md0               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00


The Samsung is on sdb; it is almost always at 100% utilization, and its
svctm stays above 2 ms (2.80 and 5.28 in the two samples above, versus
under 1 for the Seagate on sdc).
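
For anyone who wants to check this kind of output without eyeballing it,
here is a minimal sketch in Python. It assumes the extended iostat report
has one device row per line (not wrapped by a mail client), and the two
thresholds (90% util, svctm above 2 ms) are illustrative assumptions taken
from this thread, not recommended values. The function names are made up
for the example. The full-pass estimate simply divides the sector count
from dmesg by the observed rsec/s, assuming the rate stays constant and
that rsec/s counts 512-byte sectors.

```python
# Sketch: flag devices in `iostat -x` output that look saturated.
# Thresholds and helper names are illustrative assumptions.

SAMPLE = """\
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 16.80 0.00 1.00 0.00 142.40 142.40 0.00 1.00 0.60 0.06
sdb 56.40 0.00 189.60 0.00 30822.40 0.00 162.57 17.45 93.23 5.28 100.02
sdc 0.00 0.00 239.40 0.00 30643.20 0.00 128.00 0.69 2.89 0.68 16.24
"""


def parse_iostat(text):
    """Return {device: {column: value}} for a report with a 'Device:' header."""
    rows = {}
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Device:":
            header = fields[1:]
        elif header and len(fields) == len(header) + 1:
            # First field is the device name, the rest are numeric columns.
            rows[fields[0]] = dict(zip(header, map(float, fields[1:])))
    return rows


def saturated(rows, util_limit=90.0, svctm_limit=2.0):
    """Devices whose %util and svctm both exceed the (assumed) limits."""
    return sorted(
        dev for dev, r in rows.items()
        if r["%util"] >= util_limit and r["svctm"] > svctm_limit
    )


def full_pass_hours(total_sectors, rsec_per_s):
    """Hours to read every sector once at a constant rsec/s rate."""
    return total_sectors / rsec_per_s / 3600.0


rows = parse_iostat(SAMPLE)
print("saturated:", saturated(rows))
# rsec/s counts 512-byte sectors, so convert to MB/s for readability.
print("sdb read rate: %.1f MB/s" % (rows["sdb"]["rsec/s"] * 512 / 1e6))
# 1953525168 is the sector count dmesg reports for sdb.
print("full pass: %.1f hours" % full_pass_hours(1953525168, rows["sdb"]["rsec/s"]))
```

On the second sample above this singles out sdb only, which matches what
the raw numbers show: both disks read at about the same rate, but only the
Samsung is pinned near 100%.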



On Mon, Sep 27, 2010 at 7:05 PM, Tom Bishop <bishoptf at gmail.com> wrote:

> Thanks Ross, I pored through my dmesg logs and all looks well. Things
> appear fine, but I don't think the Samsung should be running at
> 100%... something is not right, but it hasn't bitten me yet. Here are my
> dmesg logs...
>
> scsi0 : ahci
> scsi1 : ahci
> scsi2 : ahci
> scsi3 : ahci
> scsi4 : ahci
> scsi5 : ahci
> ata1: SATA max UDMA/133 abar m1024 at 0xfe6ffc00 port 0xfe6ffd00 irq 225
> ata2: SATA max UDMA/133 abar m1024 at 0xfe6ffc00 port 0xfe6ffd80 irq 225
> ata3: SATA max UDMA/133 abar m1024 at 0xfe6ffc00 port 0xfe6ffe00 irq 225
> ata4: SATA max UDMA/133 abar m1024 at 0xfe6ffc00 port 0xfe6ffe80 irq 225
> ata5: SATA max UDMA/133 abar m1024 at 0xfe6ffc00 port 0xfe6fff00 irq 225
> ata6: SATA max UDMA/133 abar m1024 at 0xfe6ffc00 port 0xfe6fff80 irq 225
> ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> ata1.00: ATA-8: WDC WD3200AAJS-00L7A0, 01.03E01, max UDMA/133
> ata1.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 31/32)
> ata1.00: configured for UDMA/133
> ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> ata2.00: ATA-8: SAMSUNG HD103SJ, 1AJ10001, max UDMA/133
> ata2.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
> ata2.00: configured for UDMA/133
> ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> ata3.00: ATA-8: ST31000528AS, CC3E, max UDMA/133
> ata3.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
> ata3.00: configured for UDMA/133
> ata4: SATA link down (SStatus 0 SControl 300)
> ata5: SATA link down (SStatus 0 SControl 300)
> ata6: SATA link down (SStatus 0 SControl 300)
>   Vendor: ATA       Model: WDC WD3200AAJS-0  Rev: 01.0
>   Type:   Direct-Access                      ANSI SCSI revision: 05
> SCSI device sda: 625142448 512-byte hdwr sectors (320073 MB)
> sda: Write Protect is off
> sda: Mode Sense: 00 3a 00 00
> SCSI device sda: drive cache: write back
> SCSI device sda: 625142448 512-byte hdwr sectors (320073 MB)
> sda: Write Protect is off
> sda: Mode Sense: 00 3a 00 00
> SCSI device sda: drive cache: write back
>  sda: sda1 sda2 sda3
> sd 0:0:0:0: Attached scsi disk sda
>   Vendor: ATA       Model: SAMSUNG HD103SJ   Rev: 1AJ1
>   Type:   Direct-Access                      ANSI SCSI revision: 05
> SCSI device sdb: 1953525168 512-byte hdwr sectors (1000205 MB)
> sdb: Write Protect is off
> sdb: Mode Sense: 00 3a 00 00
> SCSI device sdb: drive cache: write back
> SCSI device sdb: 1953525168 512-byte hdwr sectors (1000205 MB)
> sdb: Write Protect is off
> sdb: Mode Sense: 00 3a 00 00
> SCSI device sdb: drive cache: write back
>  sdb: sdb1
> sd 1:0:0:0: Attached scsi disk sdb
>   Vendor: ATA       Model: ST31000528AS      Rev: CC3E
>   Type:   Direct-Access                      ANSI SCSI revision: 05
> SCSI device sdc: 1953525168 512-byte hdwr sectors (1000205 MB)
> sdc: Write Protect is off
> sdc: Mode Sense: 00 3a 00 00
> SCSI device sdc: drive cache: write back
> SCSI device sdc: 1953525168 512-byte hdwr sectors (1000205 MB)
> sdc: Write Protect is off
> sdc: Mode Sense: 00 3a 00 00
> SCSI device sdc: drive cache: write back
>  sdc: sdc1
> sd 2:0:0:0: Attached scsi disk sdc
> device-mapper: uevent: version 1.0.3
> device-mapper: ioctl: 4.11.5-ioctl (2007-12-12) initialised: dm-devel at redhat.com
> device-mapper: dm-raid45: initialized v0.2594l
>
>
> Also, smartctl output...
>
> smartctl -A /dev/sdb
> smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF READ SMART DATA SECTION ===
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       0
>   2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
>   3 Spin_Up_Time            0x0023   071   071   025    Pre-fail  Always       -       9011
>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       5
>   5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
>   8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
>   9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       64
>  10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
>  11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       5
> 191 G-Sense_Error_Rate      0x0022   252   252   000    Old_age   Always       -       0
> 192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
> 194 Temperature_Celsius     0x0002   057   053   000    Old_age   Always       -       43 (Lifetime Min/Max 20/47)
> 195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
> 196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
> 197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       0
> 223 Load_Retry_Count        0x0032   252   252   000    Old_age   Always       -       0
> 225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       5
>
> On Mon, Sep 27, 2010 at 5:54 PM, Ross Walker <rswwalker at gmail.com> wrote:
>
>> On Sep 27, 2010, at 1:24 PM, Les Mikesell <lesmikesell at gmail.com> wrote:
>>
>> > On 9/27/2010 12:19 PM, Benjamin Franz wrote:
>> >> On 09/27/2010 08:15 AM, Tom Bishop wrote:
>> >>> So I'm in the process of building and testing a RAID setup, and it
>> >>> appeared to take a long time to build. I came across some settings for
>> >>> setting the min amount of time and that helped, but it appears that one
>> >>> of the disks is struggling (100% utilization) vs. the other one... I was
>> >>> wondering if anyone else has seen this and, if so, is there a solution
>> >>> for it... my 2 disks are 1 Samsung F3 1TB /dev/sdb and 1 Seagate
>> >>> 7200.12 1TB /dev/sdc... smartctl looks good on both....
>> >>>
>> >>>
>> >>
>> >> [...]
>> >>
>> >> What is the output from 'cat /proc/mdstat'?
>> >
>> > Also, head position/motion is the usual limiting factor with disk
>> > performance.  Do you have other disk activity that will keep yanking the
>> > head on the active drive away from the track currently needed for the
>> > rebuild?
>>
>> Also, also, if any of the drives are operating in PIO mode (Legacy Mode)
>> then rebuild times will be horrific, as well as performance afterward. Make
>> sure all drives are in SATA/AHCI mode.
>>
>> -Ross
>>
>> PS look at svctm in iostat and make sure all drives are performing
>> correctly. Maybe a bad cable or flaky port expander...