Hey everyone,
I just got one of these today:
Nov 10 16:07:54 stormy kernel: sd 0:0:0:0: SCSI error: return code = 0x08000000
Nov 10 16:07:54 stormy kernel: sda: Current: sense key: Medium Error
Nov 10 16:07:54 stormy kernel:     Add. Sense: Unrecovered read error
Nov 10 16:07:54 stormy kernel: Info fld=0x0
Nov 10 16:07:54 stormy kernel: end_request: I/O error, dev sda, sector 3896150669
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743752)
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743760)
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743768)
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743776)
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743784)
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743792)
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743800)
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743808)
My question is this: I have RAID 0 set up, but I don't really understand it well. This is how my disks are set up:
Filesystem                      1K-blocks       Used   Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                               1886608544  296733484  1492495120  17% /
/dev/sda1                          101086      19877       75990  21% /boot
tmpfs                             1684312    1204416      479896  72% /dev/shm
Which one is having the trouble? Any ideas so I can swap it out?
On 11/10/10 6:58 PM, Gilbert Sebenste wrote:
Hey everyone,
I just got one of these today:
Nov 10 16:07:54 stormy kernel: sd 0:0:0:0: SCSI error: return code = 0x08000000
Nov 10 16:07:54 stormy kernel: sda: Current: sense key: Medium Error
Nov 10 16:07:54 stormy kernel:     Add. Sense: Unrecovered read error
Nov 10 16:07:54 stormy kernel: Info fld=0x0
Nov 10 16:07:54 stormy kernel: end_request: I/O error, dev sda, sector 3896150669
See where it says "dev sda"? That's physical drive zero, which has a read error on that sector.
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743752)
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743760)
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743768)
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743776)
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743784)
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743792)
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743800)
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743808)
My question is this: I have RAID 0 set up, but I don't really understand it well. This is how my disks are set up:
Filesystem                      1K-blocks       Used   Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                               1886608544  296733484  1492495120  17% /
/dev/sda1                          101086      19877       75990  21% /boot
tmpfs                             1684312    1204416      479896  72% /dev/shm
That is not how your disks are set up; that's how your FILE SYSTEMS are set up.
That /dev/mapper device is an LVM logical volume. You can display the physical volumes behind an LVM volume group with the command 'pvs'.
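For example, something like this should show the mapping (these are the standard LVM tools; the 253:1 in those swap-device errors is a device-mapper major:minor number, which by the look of it is your swap logical volume):

# list physical volumes, volume groups, and logical volumes
pvs
vgs
lvs -o +devices       # shows which physical volume(s) back each logical volume

# map the (253:1) from the swap-device errors back to a logical volume
ls -l /dev/mapper     # device nodes show their major:minor numbers
lvdisplay             # each LV's "Block device" line shows its major:minor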
Which one is having the trouble? Any ideas so I can swap it out?
RAID 0 is not suitable for reliability. If any one drive in the RAID 0 fails (or is removed), the whole volume has failed and becomes unusable.
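And if you want to see what the drive itself says about that sector, something along these lines should work, assuming the smartmontools package is installed (device name taken from the log above):

# overall SMART health verdict for the disk throwing the errors
smartctl -H /dev/sda

# full SMART attributes plus the drive's own error log
smartctl -a /dev/sda

# optionally, kick off an extended self-test (the drive runs it in the background)
smartctl -t long /dev/sda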
On Wed, 10 Nov 2010, John R Pierce wrote:
On 11/10/10 6:58 PM, Gilbert Sebenste wrote:
Hey everyone,
I just got one of these today:
Nov 10 16:07:54 stormy kernel: sd 0:0:0:0: SCSI error: return code = 0x08000000
Nov 10 16:07:54 stormy kernel: sda: Current: sense key: Medium Error
Nov 10 16:07:54 stormy kernel:     Add. Sense: Unrecovered read error
Nov 10 16:07:54 stormy kernel: Info fld=0x0
Nov 10 16:07:54 stormy kernel: end_request: I/O error, dev sda, sector 3896150669
See where it says "dev sda"? That's physical drive zero, which has a read error on that sector.
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743752)
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743760)
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743768)
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743776)
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743784)
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743792)
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743800)
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743808)
My question is this: I have RAID 0 set up, but I don't really understand it well. This is how my disks are set up:
Filesystem                      1K-blocks       Used   Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                               1886608544  296733484  1492495120  17% /
/dev/sda1                          101086      19877       75990  21% /boot
tmpfs                             1684312    1204416      479896  72% /dev/shm
That is not how your disks are set up; that's how your FILE SYSTEMS are set up.
Correct, apologies for the incorrect wording.
That /dev/mapper device is an LVM logical volume. You can display the physical volumes behind an LVM volume group with the command 'pvs'.
Thank you! That was helpful.
Which one is having the trouble? Any ideas so I can swap it out?
RAID 0 is not suitable for reliability. If any one drive in the RAID 0 fails (or is removed), the whole volume has failed and becomes unusable.
Thanks John, I appreciate it! Both drives are being replaced after a nearby 55 kV power line shorted to ground and blew a manhole cover 50 feet into the air, damaging a lot of equipment over here, even gear on UPSes. Nobody was hurt, thank goodness. But I'll be looking into RAID 5 in the future.
On 11/15/2010 10:41 AM, Gilbert Sebenste wrote:
Thanks John, I appreciate it! Both drives are being replaced after a nearby 55 kV power line shorted to ground and blew a manhole cover 50 feet into the air, damaging a lot of equipment over here, even gear on UPSes. Nobody was hurt, thank goodness. But I'll be looking into RAID 5 in the future.
In these days of multi-terabyte drives, you should be looking at RAID 6 instead. The chance of a 'double failure' during degraded operation/resync is too high to ignore.
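For what it's worth, if you build it with Linux software RAID rather than a hardware controller, a RAID 6 set is a single mdadm command; the device names below are only placeholders:

# 8-drive RAID 6: usable capacity of 6 drives, survives any two drive failures
mdadm --create /dev/md0 --level=6 --raid-devices=8 /dev/sd[b-i]1

# then put LVM or a filesystem on /dev/md0 as usual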
On November 16, 2010 08:31:05 am Benjamin Franz wrote:
On 11/15/2010 10:41 AM, Gilbert Sebenste wrote:
Thanks John, I appreciate it! Both drives are being replaced after a nearby 55 kV power line shorted to ground and blew a manhole cover 50 feet into the air, damaging a lot of equipment over here, even gear on UPSes. Nobody was hurt, thank goodness. But I'll be looking into RAID 5 in the future.
In these days of multi-terabyte drives, you should be looking at RAID 6 instead. The chance of a 'double failure' during degraded operation/resync is too high to ignore.
Like almost 100%...
On 11/16/10 8:31 AM, Benjamin Franz wrote:
In these days of multi-terabyte drives, you should be looking at RAID 6 instead. The chance of a 'double failure' during degraded operation/resync is too high to ignore.
These days of cheap drives, I use RAID 10 almost exclusively, and if it's at all mission-critical, I like to have 1-2 hot spares. If I were deploying a new server and its workload was at all database-centric, I'd want to use 2.5" SAS rather than 3.5" SATA.
With RAID 10, the rebuild time is how long it takes to copy the one drive. If you have 6 drives in a RAID 10 and one fails, leaving 5, and then another fails, there's only a 1 in 5 chance of that second failure hitting the mirror of the dead drive. If you have a hot spare, the rebuild starts immediately, reducing the window for that dreaded double failure to a minimum.
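As a sketch with Linux software RAID (mdadm) and made-up device names, a 6-drive RAID 10 with one hot spare would look something like this:

# 6 active drives striped over mirrors, plus a hot spare that is pulled in
# automatically the moment a member drive fails
mdadm --create /dev/md0 --level=10 --raid-devices=6 --spare-devices=1 \
      /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1

# watch a resync/rebuild in progress
cat /proc/mdstat
mdadm --detail /dev/md0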
On 11/16/2010 09:25 AM, John R Pierce wrote:
These days of cheap drives, I use RAID 10 almost exclusively, and if it's at all mission-critical, I like to have 1-2 hot spares. If I were deploying a new server and its workload was at all database-centric, I'd want to use 2.5" SAS rather than 3.5" SATA.
With RAID 10, the rebuild time is how long it takes to copy the one drive. If you have 6 drives in a RAID 10 and one fails, leaving 5, and then another fails, there's only a 1 in 5 chance of that second failure hitting the mirror of the dead drive. If you have a hot spare, the rebuild starts immediately, reducing the window for that dreaded double failure to a minimum.
Oh, I agree - and when price is no object, or if write performance is the bottleneck, or if you need huge numbers of drives, I love RAID 10. You can take it to crazy levels of redundancy plus performance by layering RAID 0 over multiple three-way RAID 1 arrays. Why have multiple hot spares when you can go for N>2 RAID 1 + 0 instead and get a hefty read-performance boost almost for free, at even higher reliability?
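With mdadm and placeholder device names again, that layering is just nested arrays:

# two three-way mirrors...
mdadm --create /dev/md1 --level=1 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm --create /dev/md2 --level=1 --raid-devices=3 /dev/sde1 /dev/sdf1 /dev/sdg1

# ...striped together; each mirror can lose two of its three drives,
# and reads are spread across all copies
mdadm --create /dev/md10 --level=0 --raid-devices=2 /dev/md1 /dev/md2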
On 11/16/10 10:41 AM, Benjamin Franz wrote:
Oh, I agree - and when price is no object, or if write performance is
The price spread isn't that big of a deal.
A 6-drive RAID 6 gives you 4 drives' worth of space, while a 6-drive RAID 10 gives you 3. Not that big of a deal.
An 8-drive RAID 6 gives you 6 drives' worth of space, while an 8-drive RAID 10 gives you 4. Not much bigger of a gap.
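(In general, an N-drive RAID 6 gives you N-2 drives' worth of usable space and an N-drive RAID 10 gives you N/2, so the difference works out to N/2 - 2 drives.)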
RAID sets really shouldn't be much bigger than about 8 drives anyway; rebuild times for a 12-drive RAID 6 would be astronomical.
On 11/16/2010 10:47 AM, John R Pierce wrote:
RAID sets really shouldn't be much bigger than about 8 drives anyway; rebuild times for a 12-drive RAID 6 would be astronomical.
You're OK up to there. Rebuild time for replacing a failed drive scales with drive size, not RAID set size, regardless of whether it's RAID 1, 5, 6, or 10. It remains roughly the time it takes to completely write one drive at full speed (at least until you run out of bus bandwidth, but that takes a lot of drives).
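Back-of-envelope, with assumed numbers: rewriting a 2 TB replacement drive at a sustained 100 MB/s is about 2,000,000 MB / 100 MB/s = 20,000 seconds, call it 5.5 hours, no matter how many other drives are in the set.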
However, system availability and performance during a rebuild are much better with RAID 10 than with the others, because the rebuild work is isolated to just the spindles involved.