I've got an i386 server running centos 4.2 with 3 3ware controllers in it -- an 8006-2 for the system disks and 2 7500-8s. On the 7500s, I'm running an all software RAID50. This morning I came in to find the system hung. Turns out a disk went overnight on one of the 7500s, and rather than a graceful failover I got this:
Jan 6 01:03:58 $SERVER kernel: 3w-xxxx: scsi2: Command failed: status = 0xc7,flags = 0x40, unit #3.
Jan 6 01:04:02 $SERVER kernel: 3w-xxxx: scsi2: AEN: ERROR: Drive error: Port #3.
Jan 6 01:04:10 $SERVER 3w-xxxx[2781]: ERROR: Drive error encountered on port 3 on controller ID:2. Check cables and drives for media errors. (0xa)
Jan 6 01:04:10 $SERVER kernel: Debug: sleeping function called from invalid context at include/asm/uaccess.h:556
Jan 6 01:04:10 $SERVER kernel: in_atomic():0[expected: 0], irqs_disabled():1
Jan 6 01:04:10 $SERVER kernel: [<c011fbe9>] __might_sleep+0x7d/0x88
Jan 6 01:04:10 $SERVER kernel: [<f885f056>] tw_ioctl+0x478/0xb07 [3w_xxxx]
Jan 6 01:04:10 $SERVER kernel: [<c011fec9>] autoremove_wake_function+0x0/0x2d
Jan 6 01:04:10 $SERVER kernel: [<f883f905>] scsi_done+0x0/0x16 [scsi_mod]
Jan 6 01:04:10 $SERVER kernel: [<f8860529>] tw_scsi_queue+0x163/0x1f1 [3w_xxxx]
Jan 6 01:04:10 $SERVER kernel: [<f883f748>] scsi_dispatch_cmd+0x1e9/0x24f [scsi_mod]
Jan 6 01:04:10 $SERVER kernel: [<f884417e>] scsi_request_fn+0x297/0x30d [scsi_mod]
Jan 6 01:04:10 $SERVER kernel: [<c0221f68>] __generic_unplug_device+0x2b/0x2d
Jan 6 01:04:10 $SERVER kernel: [<c0221f7f>] generic_unplug_device+0x15/0x21
Jan 6 01:04:10 $SERVER kernel: [<c022288d>] blk_execute_rq+0x88/0xb0
Jan 6 01:04:10 $SERVER kernel: [<c022096e>] elv_set_request+0xa/0x17
Jan 6 01:04:10 $SERVER kernel: [<c022251f>] get_request+0x1de/0x1e8
Jan 6 01:04:10 $SERVER kernel: [<c01a97c3>] task_has_capability+0x4a/0x52
Jan 6 01:04:10 $SERVER kernel: [<c0225cd5>] sg_scsi_ioctl+0x2bf/0x3c1
Jan 6 01:04:10 $SERVER kernel: [<c02261aa>] scsi_cmd_ioctl+0x3d3/0x475
Jan 6 01:04:10 $SERVER kernel: [<c014d41b>] handle_mm_fault+0xbd/0x175
Jan 6 01:04:10 $SERVER kernel: [<c011ad67>] do_page_fault+0x1ae/0x5c6
Jan 6 01:04:10 $SERVER kernel: [<c014e4a6>] vma_adjust+0x286/0x2d6
Jan 6 01:04:10 $SERVER kernel: [<f88228ea>] sd_ioctl+0xb3/0xd4 [sd_mod]
Jan 6 01:04:10 $SERVER kernel: [<c02246e8>] blkdev_ioctl+0x328/0x334
The 3ware boards are running the latest firmware and I'm using the stock driver in the kernel. I *did* have to upgrade mdadm to 1.12.0, as the stock version doesn't support stacked arrays. 3dmd is running.
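(For the record, "RAID50" here just means two md RAID5s with an md RAID0 striped across them -- roughly along these lines, with device names and member lists as placeholders rather than the exact commands used:

  mdadm --create /dev/md0 --level=5 --raid-devices=8 /dev/sd[b-i]1      # first 7500-8
  mdadm --create /dev/md1 --level=5 --raid-devices=8 /dev/sd[j-q]1      # second 7500-8
  mdadm --create /dev/md2 --level=0 --raid-devices=2 /dev/md0 /dev/md1  # stripe across both

The stacked md2-on-md0/md1 arrangement is what needed the newer mdadm.)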
Any ideas as to what I can do to prevent this in the future? Having the system hang every time a disk dies is, well, less than optimal.
Thanks.
Joshua Baker-LePain jlb17@duke.edu wrote:
I'm running an all software RAID50 ... This morning I came in to find the system hung. Turns out a disk went overnight on one of the 7500s, and rather than a graceful failover I got this: Jan 6 01:03:58 $SERVER kernel: 3w-xxxx: scsi2: Command failed: status = 0xc7,flags = 0x40, unit #3. Jan 6 01:04:02 $SERVER kernel: 3w-xxxx: scsi2: AEN: ERROR: Drive error: Port #3. Jan 6 01:04:10 $SERVER 3w-xxxx[2781]: ERROR: Drive error encountered on port 3 on controller ID:2. Check cables and drives for media errors. (0xa)
Yes, the drive failed.
Had you used the 3Ware's intelligent hardware RAID, it would have hidden the drive disconnect from the system. You'd see a log entry on the failure, and that the array was in a "downgraded" state.
Instead, you're using software RAID, and it's up to the kernel to not panic on itself because a disk is no longer available. The problem isn't the 3Ware controller, it's the software RAID logic in the kernel.
Any ideas as to what I can do to prevent this in the future?
Use the 3Ware card as it is intended, a hardware RAID card.
Having the system hang every time a disk dies is, well, less than optimal.
No joke. It wasn't until kernel 2.6 that hotplug support was even offered, and it still does _not_ work as advertised.
It's stuff like this that makes me want to strangle most advocates of using 3Ware cards with software RAID. There are countless issues like this -- far more than the alleged "hardware lock-in" negative of using hardware RAID.
On Fri, 2006-01-06 at 11:39, Bryan J. Smith wrote:
Jan 6 01:04:10 $SERVER 3w-xxxx[2781]: ERROR: Drive error encountered on port 3 on controller ID:2. Check cables and drives for media errors. (0xa)
Yes, the drive failed.
Had you used the 3Ware's intelligent hardware RAID, it would have hidden the drive disconnect from the system. You'd see a log entry on the failure, and that the array was in a "downgraded" state.
Instead, you're using software RAID, and it's up to the kernel to not panic on itself because a disk is no longer available. The problem isn't the 3Ware controller, it's the software RAID logic in the kernel.
That doesn't look like an md device error, it looks like an IDE driver kernel error that never made it up to the raid logic.
Les Mikesell lesmikesell@gmail.com wrote:
That doesn't look like an md device error, it looks like an IDE driver kernel error that never made it up to the raid logic.
Because you need some sort of hotplug or other logic to trap the error, instead of letting the storage device simply be reported as "removed" from the system.
When you use 3Ware card as a "dumb" JBOD device, you lose _all_ of its features for hot-swap. That's because it no longer hides the "raw" disk and its state from the system.
So when you use it in JBOD mode, and the volume = disk, you need to use _another_ facility to hide the "raw" disk status from the system. That's where hotplug comes in, and you need to set it up proper.
This has been a repeat theme. People don't understand that all of the 3Ware "advantages" like hot-swap and hiding the "raw" disc status _only_ work when you use its hardware arrays.
On Fri, 2006-01-06 at 14:37, Bryan J. Smith wrote:
That doesn't look like an md device error, it looks like an IDE driver kernel error that never made it up to the raid logic.
Because you need some sort of hotplug or other logic to trap the error, instead of letting the storage device simply be reported as "removed" from the system.
When you use 3Ware card as a "dumb" JBOD device, you lose _all_ of its features for hot-swap. That's because it no longer hides the "raw" disk and its state from the system.
OK, but "raw" scsi disks don't have this problem.
So when you use it in JBOD mode, and the volume = disk, you need to use _another_ facility to hide the "raw" disk status from the system. That's where hotplug comes in, and you need to set it up proper.
Why is this different than a scsi drive?
This has been a repeat theme. People don't understand that all of the 3Ware "advantages" like hot-swap and hiding the "raw" disc status _only_ work when you use its hardware arrays.
Of course we don't understand it. Is this documented somewhere?
Les Mikesell lesmikesell@gmail.com wrote:
OK, but "raw" scsi disks don't have this problem.
Huh? Unless the device is unmounted and not in use, you betcha you'll have the _same_ problem. The kernel panics because the device is no longer available.
Only when you have a SCSI hardware RAID array will you get the same functionality as 3Ware hardware RAID arrays.
Why is this different than a scsi drive?
It's not.
Of course we don't understand it. Is this documented somewhere?
Sigh. Please show me where it is documented that you can remove _any_, _active_ storage device from a system without it kernel panicking? The only time I can remove _any_ storage device (without configuring advanced hotplug features) is if I take the device off-line.
That's just how the kernel works, _period_.
3Ware physically "hides" the storage devices, but _only_ when you make them an array. As long as the array is intact (be it good or degraded), it is still usable by the OS. The 3Ware is controlling _all_ disc activies, and only reports itself as an array back to the OS.
When the OS sees the "raw" storage, then that's a problem if one part of the storage becomes in available. Such is the case of _any_ storage that is "removed" because it fails -- be it a physical ATA drive on an ATA controller, a physical SCSI drive on a SCSI controller, or any controller that presents a disk as a standalone JBOD volume.
You have to setup something like hotplug to take control of the device, so when it goes off-line, the system doesn't see it, while it's still trying to use it like it's there.
On Fri, 2006-01-06 at 14:56, Bryan J. Smith wrote:
OK, but "raw" scsi disks don't have this problem.
Huh? Unless the device is unmounted and not in use, you betcha you'll have the _same_ problem. The kernel panics because the device is no longer available.
No, I've had several SCSI drive failures in software raid and the kernel will gracefully log the errors, mark the drive failed in the md array and go on about its business. This doesn't happen with typical IDE devices, but the 3ware isn't a typical IDE controller.
If you do everything in the right order, you can replace a failed hot-swap scsi drive and rebuild the software raid without shutting the machine down.
Les Mikesell lesmikesell@gmail.com wrote:
No, I've had several SCSI drive failures in software raid and the kernel will gracefully log the errors, mark the drive failed in the md array and go on about its business. This doesn't happen with typical IDE devices, but the 3ware isn't a typical IDE controller.
Then the 3Ware card must not be doing something that a typical SCSI card does, and is behaving more like a "dumb" block ATA driver, even though it's using the SCSI subsystem.
If you do everything in the right order, you can replace a failed hot-swap scsi drive and rebuild the software raid without shutting the machine down.
Again, SCSI cards must have some added notification, or at least the ones you are using. I guess that would make sense, because there _is_ the "host adapter" on-board.
In any case, 3Ware cards do _not_ do it for JBOD.
On Fri, 2006-01-06 at 15:41, Bryan J. Smith wrote:
If you do everything in the right order, you can replace a failed hot-swap scsi drive and rebuild the software raid without shutting the machine down.
Again, SCSI cards must have some added notification, or at least the ones you are using. I guess that would make sense, because there _is_ the "host adapter" on-board.
Or at least the typical hardware/driver errors aren't fatal.
In any case, 3Ware cards do _not_ do it for JBOD.
I'm sure you are right about the behavior but it still seems surprising that the driver for what appears to be hot-swap devices actually isn't.
Les Mikesell lesmikesell@gmail.com wrote:
Or at least the typical hardware/driver errors aren't fatal.
I think you, and most software RAID users, continue to miss the _root_ cause. If you yank a drive out of a system, one that is being used _actively_, you are going to get a kernel panic. I've seen it on ATA and SCSI. It's _not_ a driver issue. It's the fact that you've lost a resource.
The MD code does _not_ handle this. You have to tie into the hotplug system for 2.6 to hide the device's status from the MD code.
Now maybe some SCSI drivers handle it differently. But it is _not_ a driver issue.
I can take down 3Ware arrays or JBODs and do it all-the-time. The key difference is that I'm _not_actively_ using the arrays/JBODs. You're getting the kernel panic because you _are_.
If you are actively using a device, it will tank the kernel if it suddenly becomes unavailable. I have _never_ seen MD handle this correctly, and some SCSI cards must just be more graceful.
Again, _regardless_ of how some SCSI cards might work, with SCSI, ATA and other cards I've used, unless I use hotplug's facilities (one of the reasons why many SCSI drivers were deprecated for 2.6), it does _not_ work.
And you will _not_ get such operation out of a 3Ware card in JBOD mode, _only_ when you use its hardware arrays.
In any case, 3Ware cards do _not_ do it for JBOD.
I'm sure you are right about the behavior but it still seems surprising that the driver for what appears to be hot-swap devices actually isn't.
On Fri, 2006-01-06 at 16:24, Bryan J. Smith wrote:
Or at least the typical hardware/driver errors aren't fatal.
I think you, and most software RAID users, continue to miss the _root_ cause. If you yank a drive out of a system, one that is being used _actively_, you are going to get a kernel panic. I've seen it on ATA and SCSI. It's _not_ a driver issue. It's the fact that you've lost a resource.
The MD code does _not_ handle this. You have to tie into the hotplug system for 2.6 to hide the device's status from the MD code.
Now maybe some SCSI drivers handle it differently. But it is _not_ a driver issue.
I don't understand this distinction. The kernel calls the driver which talks to the controller. There should be a timeout around this and the controller's response or the timeout should be fielded by the driver. How can it not be a driver issue unless the controller actually locks the PC bus (which may be the case with the motherboard IDE controllers - they generally won't boot with a bad drive either). You don't want to hide the status from the MD code - you want the md driver to kick the device out when it has problems.
Les Mikesell lesmikesell@gmail.com wrote:
I don't understand this distinction. The kernel calls the driver which talks to the controller. There should be a timeout around this and the controller's response or the timeout should be fielded by the driver. How can it not be a driver issue unless the controller actually locks the PC bus (which may be the case with the motherboard IDE controllers - they generally won't boot with a bad drive either). You don't want to hide the status from the MD code - you want the md driver to kick the device out when it has problems.
Correct. But even the MD code, from what I've seen, directly accesses the devices. That in turn causes the kernel panic, because it assumes the device is usable.
Maybe MD expects certain SCSI facilities, and 3Ware doesn't provide them (and ATA can't). Especially since the 3Ware appears as a SCSI device. But so do many SATA drivers currently, and they do _not_ implement the full SCSI command set. That could explain it.
I know the hotplug facility in kernel 2.6 is designed to address the issue of programs or other drivers accessing a device and expecting it to be there. So you should involve it for any such device. I haven't done it personally though, because I rely on hardware RAID.
In any case, my _original_point_ stands.
You can_not_ use 3Ware cards for hot-swap or handling failed drives _unless_ you use its array facilities, whereby an array stays active (even if degraded) rather than failed. The widely proliferated claim that 3Ware cards give you hot-swap under software RAID is only true of their hardware RAID arrays, and it needs to stop spreading. I regularly help people realize this when the software RAID support lists set them wrong.
On Fri, 6 Jan 2006 at 9:39am, Bryan J. Smith wrote
Had you used the 3Ware's intelligent hardware RAID, it would have hidden the drive disconnect from the system. You'd see a log entry on the failure, and that the array was in a "downgraded" state.
Instead, you're using software RAID, and it's up to the kernel to not panic on itself because a disk is no longer available. The problem isn't the 3Ware controller, it's the software RAID logic in the kernel.
Yes, I'm aware of all that. I've been using 3wares for *years* (as giggle would easily have revealed). But, as the archives of this list will attest to, using these boards in hardware RAID mode in centos 4 is bad news. Performance sucks. There's some sort of nasty interaction between the 3wares and ext3 which makes the combo unusable, really. And we all know the upstream provider's stance on XFS.
Having the system hang every time a disk dies is, well, less than optimal.
No joke. It wasn't until even kernel 2.6 that hotplug support was offered, and it still does _not_ work as advertised.
Hotplug worked just fine on this system when I tested (multiple times) via 'mdadm -f -r' and 'mdadm -a'. It's the actual disk failure handling that's at fault here.
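(Roughly, with the array and member names here as placeholders:

  mdadm /dev/md0 -f /dev/sdc1   # mark the member faulty
  mdadm /dev/md0 -r /dev/sdc1   # pull it out of the array
  mdadm /dev/md0 -a /dev/sdc1   # add it back and let md rebuild

and md handled that cleanly every time I ran it.)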
Friday, January 6, 2006, 11:16:31 AM, you wrote:
On Fri, 6 Jan 2006 at 9:39am, Bryan J. Smith wrote
Had you used the 3Ware's intelligent hardware RAID, it would have hidden the drive disconnect from the system. You'd see a log entry on the failure, and that the array was in a "downgraded" state.
Instead, you're using software RAID, and it's up to the kernel to not panic on itself because a disk is no longer available. The problem isn't the 3Ware controller, it's the software RAID logic in the kernel.
Yes, I'm aware of all that. I've been using 3wares for *years* (as giggle would easily have revealed). But, as the archives of this list will attest to, using these boards in hardware RAID mode in centos 4 is bad news. Performance sucks. There's some sort of nasty interaction between the 3wares and ext3 which makes the combo unusable, really. And we all know the upstream provider's stance on XFS.
I have a number of machines running 3ware cards on CentOS 3/4 and I haven't had any trouble with them in HW RAID.
Having the system hang every time a disk dies is, well, less than optimal.
No joke. It wasn't until even kernel 2.6 that hotplug support was offered, and it still does _not_ work as advertised.
Hotplug worked just fine on this system when I tested (multiple times) via 'mdadm -f -r' and 'mdadm -a'. It's the actual disk failure handling that's at fault here.
On Fri, 6 Jan 2006 at 11:47am, Mickael Maddison wrote
Friday, January 6, 2006, 11:16:31 AM, you wrote:
Yes, I'm aware of all that. I've been using 3wares for *years* (as giggle would easily have revealed). But, as the archives of this list will attest to, using these boards in hardware RAID mode in centos 4 is bad news. Performance sucks. There's some sort of nasty interaction between the 3wares and ext3 which makes the combo unusable, really. And we all know the upstream provider's stance on XFS.
I have a number of machines running 3ware cards on CentOS 3/4 and I haven't had any trouble with them in HW RAID.
I guess it depends on your definition of "trouble". These are benchmarks I ran on this system with the 2 3wares in hardware RAID5 and a software RAID0 across them:

        write  read   (MB/s)
        -----  ----
ext2       81   180
ext3       34   222
XFS       109   213
That ext3 write speed was something I just wasn't willing to live with on a system that used to perform so much better (running RH7.3, hw RAID, and XFS).
Joshua Baker-LePain wrote:
I guess it depends on your definition of "trouble". These are benchmarks I ran on this system with the 2 3wares in hardware RAID5 and a software RAID0 across them:

        write  read   (MB/s)
        -----  ----
ext2       81   180
ext3       34   222
XFS       109   213
That ext3 write speed was something I just wasn't willing to live with on a system that used to perform so much better (running RH7.3, hw RAID, and XFS).
Glad to know it's not just me. Using ext3 on a 3ware raid 5 of four 250gb disks. Writes are slow and seem to halt the server until they complete, but it's not a server where response time or write speed is critical.
David Finch david@mytsoftware.com wrote:
Glad to know it's not just me.
It's not. I've seen it too.
Using ext3 on a 3ware raid 5 of four 250gb disks.
Doesn't matter what the disks are. The problem is the small cache size on the 3Ware Escalade 7000/8000. They only have 1-4MB of 0 wait state SRAM.
1MB on the original 7200/7210/7400/7410/7800/7810 and 7000-2 and 8000-2, as well the 7006-2 and 8006-2 (xxx6 = 66MHz PCI).
2MB on the 7450/7850 which are, subsequently, the 7500-4/-8 for PATA now, with the 8500-4/-8 and 8506-4/-8 (xxx6 = 66MHz PCI).
4MB on the 7506-12.
SRAM is very expensive, in both die size and cost. It's the logic used in CPU cache and for networking ASICs. But it has little to no wait -- unlike DRAM which is still 40-70ns on reads (many, many wait cycles, typically 6-10 for today's 133-266MHz clocks). That's why 3Ware calls the Escalade 7000/8000 series a "storage switch." It's ideal for RAID-0, 1 and 10.
This size is a serious issue when it comes to Ext3's journal logic, especially pre-2.4.18 kernels IIRC (maybe it was 2.4.15?). With only 2MB typical (4MB on the 7506-12), the commit of the Ext3 journal exceeds that size -- so the card "stalls" on the write when just committing the journal from the data.
Writes are slow and seem to halt the server until they complete, but it's not a server where response time or write speed is critical.
You can play with the kernel buffer settings. It's highly recommended for many of the 3Ware Escalade cards, including the 9000 series.
But if performance is a consideration, do _not_ use RAID-5 on the 3Ware Escalade 7000/8000. Use RAID-10. You can break over 200MBps _writes_ with RAID-10 on the 7000/8000 series.
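(By way of illustration only -- these are the kinds of knobs usually meant by "kernel buffer settings"; the device name is a placeholder and the values are starting points to benchmark on your own box, not recommendations:

  blockdev --setra 16384 /dev/sda         # bigger read-ahead on the 3Ware block device
  sysctl -w vm.dirty_background_ratio=5   # start background writeback earlier
  sysctl -w vm.dirty_ratio=40             # let the VM batch more dirty data before blocking writers

The right numbers depend entirely on the card, the array type and the workload.)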
On Jan 6, 2006, at 6:42 PM, Bryan J. Smith wrote:
Writes are slow and seem to halt the server until they complete, but it's not a server where response time or write speed is critical.
You can play with the kernel buffer settings. It's highly recommended for many of the 3Ware Escalade cards, including the 9000 series.
would you be willing to recommend a specific way in which i should play with the kernel buffer settings, and which ones i should play with? some systems (with a 9500 card of some sort) i support have this issue, and i'd like to address it.
thanks, steve
--- If this were played upon a stage now, I could condemn it as an improbable fiction. - Fabian, Twelfth Night, III,v
On Fri, 6 Jan 2006 at 3:42pm, Bryan J. Smith wrote
You can play with the kernel buffer settings. It's highly recommended for many of the 3Ware Escalade cards, including the 9000 series.
But if performance is a consideration, do _not_ use RAID-5 on the 3Ware Escalade 7000/8000. Use RAID-10. You can break over 200MBps _writes_ with RAID-10 on the 7000/8000 series.
Wanting to get back to using hardware RAID on my 3wares without taking the crushing RAID5/ext3 performance hit, I took this advice and swapped out all 16 160GB drives in one of my servers for brand new 320GB drives. I configured both cards (7500-8s) in RAID-10 mode with 128KB stripe size. bonnie++ on 1 card only managed about 60 MB/s writing (and 165 MB/s reading). That's with 'blockdev --setra 16384' on the device. A software RAID0 stripe across the 2 arrays managed 90MB/s writes and 300MB/s reads.
What tricks do you have to pull to get the 200MB/s you quote above?
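(FWIW, the bonnie++ runs were along these lines -- the directory, size and user shown are illustrative rather than the exact invocation:

  blockdev --setra 16384 /dev/sda               # device name is a placeholder
  bonnie++ -d /mnt/array/test -s 8g -u nobody   # file size well beyond RAM so cache can't hide the disks

)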
Joshua Baker-LePain jlb17@duke.edu wrote:
I guess it depends on your definition of "trouble". These are benchmarks I ran on this system with the 2 3wares in hardware RAID5 and a software RAID0 across them:

        write  read   (MB/s)
        -----  ----
ext2       81   180
ext3       34   222
XFS       109   213

That ext3 write speed was something I just wasn't willing to live with on a system that used to perform so much better (running RH7.3, hw RAID, and XFS).
That's because the 3Ware Escalade 7000-8000 series only have 1-4MB of SRAM. The Ext3 journal logic is rather "dumb" (especially on old kernel 2.4) and bursts _more_ than 4MB at a time. As a result, you overflow the small SRAM cache of the 7000-8000 series, and the XOR stalls.
Newer Ext3 journaling code in newer 2.4/2.6 kernels is a little better, but I still do _not_ recommend you use RAID-5 on the 7000/8000 series. I have repeatedly recommended RAID-10 for 3Ware Escalade 7000-8000 series.
I'm still holding off on recommending the 3Ware Escalade 9500S for RAID-5, although most people are seeing good performance with the 9.2.1.1 firmware released ~9 months ago. The verdict is still out on the far newer 9550SX series (with PowerPC).
Lastly, and quite repeatedly, the 3Ware card does _not_ offer hot-swap capability when you use JBOD modes. I, among countless others, have been saying this for years. You have to use _another_ kernel facility to "hide" the status of the discs when you're using JBOD modes.
Joshua Baker-LePain jlb17@duke.edu wrote:
But, as the archives of this list will attest to, using these boards in hardware RAID mode in centos 4 is bad news. Performance sucks.
At RAID-5 writes? Of course on the 7000/8000 designs. They only have 1-4MB of SRAM, not enough buffer for RAID-5 writes.
Furthermore, software RAID-0 is _always_ going to be faster than hardware RAID-0. RAID-5 reads are basically RAID-0 reads (minus one stripe).
But at RAID-1 or RAID-10, 3Ware's 7000/8000 Storage Switch designs are very, very fast.
There's some sort of nasty interaction between the 3wares and ext3 which makes the combo unusable, really.
Huh? _Never_ heard of that. I'm using 7000/8000 series cards on RHEL3 and RHEL4 (as well as FC1-FC3), *0* issues. All Ext3 filesystems.
Hotplug worked just fine on this system when I tested (multiple times) via 'mdadm -f -r' and 'mdadm -a'. It's the actual disk failure handling that's at fault here.
Yes, that's ... tada ... hotplug!
You can't just have a fixed disk "remove itself" from the OS. That's causing your panic.
When you're using 3Ware in JBOD, all it can do is report the disk failure and report the fixed disk as unusable and remove it from the system. So for software RAID, it's up to the _kernel_ to handle that right.
And sure enough, it doesn't.
Has absolutely nothing to do with 3Ware's card. When you use JBOD and you remove or lose a disk, which is its own volume, the 3Ware removes the volume -- just as a "regular" ATA or SCSI card with a single disk would.
There is no way for 3Ware to "hide" the volume or continue using it -- because there is a 1:1 disc:volume relationship. The only way to "hide" the disk is to use its hardware RAID features, where multiple disks are a volume.
Until the kernel has standard, trusted features to handle failed disks, it's the reason why I refuse to use software RAID-1, 10 or 5. Hotplug in 2.6 is supposed to handle this when setup correctly, but I've yet to see it.
Hello Bryan,
Well said.
Joshua Baker-LePain wrote:
I've got an i386 server running centos 4.2 with 3 3ware controllers in it -- an 8006-2 for the system disks and 2 7500-8s. On the 7500s, I'm running an all software RAID50. This morning I came in to find the system hung. Turns out a disk went overnight on one of the 7500s, and rather than a graceful failover I got this:
Jan 6 01:03:58 $SERVER kernel: 3w-xxxx: scsi2: Command failed: status = 0xc7,flags = 0x40, unit #3. Jan 6 01:04:02 $SERVER kernel: 3w-xxxx: scsi2: AEN: ERROR: Drive error: Port #3. Jan 6 01:04:10 $SERVER 3w-xxxx[2781]: ERROR: Drive error encountered on port 3 on controller ID:2. Check cables and drives for media errors. (0xa) Jan 6 01:04:10 $SERVER kernel: Debug: sleeping function called from invalid context at include/asm/uaccess.h:556 Jan 6 01:04:10 $SERVER kernel: in_atomic():0[expected: 0], irqs_disabled():1 Jan 6 01:04:10 $SERVER kernel: [<c011fbe9>] __might_sleep+0x7d/0x88 Jan 6 01:04:10 $SERVER kernel: [<f885f056>] tw_ioctl+0x478/0xb07 [3w_xxxx] Jan 6 01:04:10 $SERVER kernel: [<c011fec9>] autoremove_wake_function+0x0/0x2d Jan 6 01:04:10 $SERVER kernel: [<f883f905>] scsi_done+0x0/0x16 [scsi_mod] Jan 6 01:04:10 $SERVER kernel: [<f8860529>] tw_scsi_queue+0x163/0x1f1 [3w_xxxx] Jan 6 01:04:10 $SERVER kernel: [<f883f748>] scsi_dispatch_cmd+0x1e9/0x24f [scsi_mod] Jan 6 01:04:10 $SERVER kernel: [<f884417e>] scsi_request_fn+0x297/0x30d [scsi_mod] Jan 6 01:04:10 $SERVER kernel: [<c0221f68>] __generic_unplug_device+0x2b/0x2d Jan 6 01:04:10 $SERVER kernel: [<c0221f7f>] generic_unplug_device+0x15/0x21 Jan 6 01:04:10 $SERVER kernel: [<c022288d>] blk_execute_rq+0x88/0xb0 Jan 6 01:04:10 $SERVER kernel: [<c022096e>] elv_set_request+0xa/0x17 Jan 6 01:04:10 $SERVER kernel: [<c022251f>] get_request+0x1de/0x1e8 Jan 6 01:04:10 $SERVER kernel: [<c01a97c3>] task_has_capability+0x4a/0x52 Jan 6 01:04:10 $SERVER kernel: [<c0225cd5>] sg_scsi_ioctl+0x2bf/0x3c1 Jan 6 01:04:10 $SERVER kernel: [<c02261aa>] scsi_cmd_ioctl+0x3d3/0x475 Jan 6 01:04:10 $SERVER kernel: [<c014d41b>] handle_mm_fault+0xbd/0x175 Jan 6 01:04:10 $SERVER kernel: [<c011ad67>] do_page_fault+0x1ae/0x5c6 Jan 6 01:04:10 $SERVER kernel: [<c014e4a6>] vma_adjust+0x286/0x2d6 Jan 6 01:04:10 $SERVER kernel: [<f88228ea>] sd_ioctl+0xb3/0xd4 [sd_mod] Jan 6 01:04:10 $SERVER kernel: [<c02246e8>] blkdev_ioctl+0x328/0x334
The 3ware boards are running the latest firmware and I'm using the stock driver in the kernel. I *did* have to upgrade mdadm to 1.12.0, as the stock version doesn't support stacked arrays. 3dmd is running.
Any ideas as to what I can do to prevent this in the future? Having the system hang every time a disk dies is, well, less than optimal.
I have a similar problem with using the hardware raid(mirror). Every time 3dmd started a scheduled verify at midnight... anywhere from 0 to 26 minutes later the kernel would crash. This happened every night at 12. I finally disabled the verify task in 3dmd and the crashes stopped. I now just use smartd to do extended tests which do not show any problems with the disks. The crash dump and log indicates that port0 is bad though.
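(For anyone curious, the smartd side of that is just scheduled long self-tests in /etc/smartd.conf -- something along these lines, with the device path, port numbers and schedule as examples only (the path is /dev/twe0 or /dev/sda depending on kernel and smartmontools version):

  /dev/twe0 -d 3ware,0 -a -s L/../../7/03   # long self-test on port 0, Sundays at 3am
  /dev/twe0 -d 3ware,1 -a -s L/../../7/04   # port 1 an hour later

)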
I have the crash dumps and it is reproducible if I enable verify again... Anyone know of a way to get to the bottom of the crash and find a fix? I keep getting the feeling of "See... you should have bought RHEL to get support!". Too expensive for my use of this system though.
3w-xxxx: scsi0: Command failed: status = 0xc4, flags = 0x3b, unit #0. 3w-xxxx: scsi0: AEN: INFO: Initialization started: Unit #0. 3w-xxxx: scsi0: AEN: INFO: Initialization started: Unit #0. 3w-xxxx: scsi0: AEN: INFO: Initialization complete: Unit #0. 3w-xxxx: scsi0: AEN: INFO: Verify started: Unit #0. 3w-xxxx: scsi0: AEN: INFO: Verify started: Unit #0. 3w-xxxx: scsi0: AEN: INFO: Verify complete: Unit #0. 3w-xxxx: scsi0: AEN: INFO: Verify started: Unit #0. 3w-xxxx: scsi0: AEN: INFO: Verify complete: Unit #0. 3w-xxxx: scsi0: AEN: INFO: Verify started: Unit #0. 3w-xxxx: scsi0: AEN: INFO: Verify started: Unit #0. 3w-xxxx: scsi0: AEN: ERROR: Verify failed: Port #0. 3w-xxxx: scsi0: AEN: INFO: Initialization started: Unit #0. Debug: sleeping function called from invalid context at include/asm/uaccess.h:556 in_atomic():0[expected: 0], irqs_disabled():1 [<c011df50>] __might_sleep+0x7d/0x89 [<d0840550>] tw_ioctl+0x466/0xd80 [3w_xxxx] [<d0858dfc>] scsi_done+0x0/0x16 [scsi_mod] [<d08422a0>] tw_scsi_queue+0x222/0x312 [3w_xxxx] [<d0858bee>] scsi_dispatch_cmd+0x2f6/0x3ad [scsi_mod] [<d085eab2>] scsi_request_fn+0x385/0x5b4 [scsi_mod] [<c024ee5a>] __generic_unplug_device+0x2b/0x2d [<c024eed7>] generic_unplug_device+0x7b/0xe0 [<c024fbd6>] blk_execute_rq+0xbb/0xe3 [<c011e57b>] autoremove_wake_function+0x0/0x2d [<c02573c5>] cfq_set_request+0x33/0x6b [<c0257392>] cfq_set_request+0x0/0x6b [<c024db99>] elv_set_request+0xa/0x17 [<c024f77b>] get_request+0x395/0x39f [<c0253181>] sg_scsi_ioctl+0x2bf/0x3c1 [<c0253656>] scsi_cmd_ioctl+0x3d3/0x475 [<c0157a48>] do_no_page+0x55/0x3bf [<c018103f>] dput+0x33/0x423 [<c0157f92>] handle_mm_fault+0xd5/0x1fd [<c011a8ed>] do_page_fault+0x1ac/0x4dc [<d081a8f4>] sd_ioctl+0xb6/0xd7 [sd_mod] [<c0251d86>] blkdev_ioctl+0x32b/0x337 [<c0171c2e>] block_ioctl+0x11/0x13 [<c017c0f1>] sys_ioctl+0x297/0x336 [<c030f8cb>] syscall_call+0x7/0xb [<c030007b>] xfrm_policy_gc_kill+0x39/0x68 Debug: sleeping function called from invalid context at include/asm/uaccess.h:531 in_atomic():0[expected: 0], irqs_disabled():1 [<c011df50>] __might_sleep+0x7d/0x89 [<d0840b85>] tw_ioctl+0xa9b/0xd80 [3w_xxxx] [<d0858dfc>] scsi_done+0x0/0x16 [scsi_mod] [<d08422a0>] tw_scsi_queue+0x222/0x312 [3w_xxxx] [<d0858bee>] scsi_dispatch_cmd+0x2f6/0x3ad [scsi_mod] [<d085eab2>] scsi_request_fn+0x385/0x5b4 [scsi_mod] [<c024ee5a>] __generic_unplug_device+0x2b/0x2d [<c024eed7>] generic_unplug_device+0x7b/0xe0 [<c024fbd6>] blk_execute_rq+0xbb/0xe3 [<c011e57b>] autoremove_wake_function+0x0/0x2d [<c02573c5>] cfq_set_request+0x33/0x6b [<c0257392>] cfq_set_request+0x0/0x6b [<c024db99>] elv_set_request+0xa/0x17 [<c024f77b>] get_request+0x395/0x39f [<c0253181>] sg_scsi_ioctl+0x2bf/0x3c1 [<c0253656>] scsi_cmd_ioctl+0x3d3/0x475 [<c0157a48>] do_no_page+0x55/0x3bf [<c018103f>] dput+0x33/0x423 [<c0157f92>] handle_mm_fault+0xd5/0x1fd [<c011a8ed>] do_page_fault+0x1ac/0x4dc [<d081a8f4>] sd_ioctl+0xb6/0xd7 [sd_mod] [<c0251d86>] blkdev_ioctl+0x32b/0x337 [<c0171c2e>] block_ioctl+0x11/0x13 [<c017c0f1>] sys_ioctl+0x297/0x336 [<c030f8cb>] syscall_call+0x7/0xb [<c030007b>] xfrm_policy_gc_kill+0x39/0x68 Kernel panic - not syncing: drivers/scsi/scsi_lib.c:1245: spin_lock(drivers/scsi/hosts.c:cf950034) already locked by drivers/scsi/3w-xxxx.c/1950
On Fri, 6 Jan 2006 at 2:47pm, Adam Gibson wrote
I have a similar problem with using the hardware raid(mirror). Every time 3dmd started a scheduled verify at midnight... anywhere from 0 to 26 minutes later the kernel would crash. This happened every night at 12. I finally disabled the verify task in 3dmd and the crashes stopped. I now just use smartd to do extended tests which do not show any problems with the disks. The crash dump and log indicates that port0 is bad though.
Did you have smartd set up to monitor the disks as well as 3dmd? Did you get the bad port error preceding every crash?
I have the crash dumps and it is reproducible if I enable verify again... Anyone know of a way to get to the bottom of the crash and find a fix? I keep getting the feeling of "See... you should have bought RHEL to get support!". Too expensive for my use of this system though.
There's always RH's bugzilla, but not if you're in a hurry, and they do seem to frown on centos derived bugs.
Joshua Baker-LePain wrote:
On Fri, 6 Jan 2006 at 2:47pm, Adam Gibson wrote
I have a similar problem with using the hardware raid(mirror). Every time 3dmd started a scheduled verify at midnight... anywhere from 0 to 26 minutes later the kernel would crash. This happened every night at 12. I finally disabled the verify task in 3dmd and the crashes stopped. I now just use smartd to do extended tests which do not show any problems with the disks. The crash dump and log indicates that port0 is bad though.
Did you have smartd set up to monitor the disks as well as 3dmd? Did you get the bad port error preceding every crash?
At first I had smartd running as well which was scheduled much later in the morning away from the 3dmd verify. I experimented by not running smartd but the crashes still occurred.
The port0 error was after every crash.
I have the crash dumps and it is reproducible if I enable verify again... Anyone know of a way to get to the bottom of the crash and find a fix? I keep getting the feeling of "See... you should have bought RHEL to get support!". Too expensive for my use of this system though.
There's always RH's bugzilla, but not if you're in a hurry, and they do seem to frown on centos derived bugs.
I would not feel right trying to report this to RH bugzilla. Is that something that has been done in the past? I am just really surprised that RH has not found this problem on their own. 3ware controllers are used by a lot of users I would think. To have a problem like this is a pretty big deal I would think.
Adam Gibson agibson@ptm.com wrote:
The port0 error was after every crash.
Check your cable. Check your tray if you have one.
The issue I've typically seen with 3Ware PATA setups (ones I didn't install myself) is a cheap tray. SATA solves much of this issue, but you can still run into it.
I would not feel right trying to report this to RH bugzilla. Is that something that has been done in the past?
Yes. Any issue should be reported to Bugzilla -- Fedora Core or Red Hat Enterprise Linux (CentOS). In fact, I recently had a RHEL issue and Red Hat explicitly told me to run a newer Fedora Core version (with a newer kernel or set of kernel patches) to see if it fixed the problem. If so, then they'd integrate those fixes into the next RHEL update, or at least tell me what to patch in.
I am just really surprised that RH has not found this problem on their own. 3ware controllers are used by a lot of users I would think. To have a problem like this is a pretty big deal I would think.
There are countless hardware and software RAID issues on a regular basis. Just hit Bugzilla and you'll see.
For the most part, knowing _how_ to deploy hardware or software RAID is the critical factor. If you use 3Ware, use its facilities. The biggest fallacy I see propagated is that you can use its hot-swap and fault-tolerance with software RAID. You can't any more than any other ATA or SCSI card I've used (although I have to investigate some of the SCSI cards people are using here).
If you need RAID-5 write performance, do _not_ use the 3Ware Escalade 7000/8000 cards. They only have a measly 1-4MB of 0 wait state SRAM (static RAM) and operate as a "storage switch," _not_ a "buffering controller." Consider RAID-10 instead, which it excels at, especially since its write performance is far, far better than any RAID-5 write I've seen (software or hardware).
A few people here are running the newer 3Ware Escalade 9500S cards with firmware 9.2.1.1, an updated driver to match (newer than what the stock/Red Hat kernels are running with) and RAID-5 with good results. I haven't personally used these, as I still prefer to deploy RAID-10 on the 8000 series. I've heard the 9550SX is still maturing.
On Fri, 2006-01-06 at 17:34, Bryan J. Smith wrote:
For the most part, knowing _how_ to deploy hardware or software RAID is the critical factor. If you use 3Ware, use its facilities. The biggest falicy I see propogated is that you can use its hot-swap and fault-tolerance with software RAID. You can't any more than any other ATA or SCSI card I've used (although I have to investigate some of the SCSI cards people are using here).
I have a non-critical IBM eserver with software raid running so I yanked a drive to see what happens. Basically nothing. All the other drive lights blinked while it reset the bus, it logged some scsi errors like:

SCSI error : <0 0 2 0> return code = 0x10000
end_request: I/O error, dev sdc, sector 71681855
md: write_disk_sb failed for device sdc1

then:

md: write_disk_sb failed for device sdc1
md: excessive errors occurred during superblock update, exiting
mptbase: ioc0: IOCStatus(0x0043): SCSI Device Not There
SCSI error : <0 0 2 0> return code = 0x10000
end_request: I/O error, dev sdc, sector 12791
raid1: Disk failure on sdc1, disabling device.
Operation continuing on 1 devices
Everything is still working normally.
Then I removed the failed device from the raid, did the echo 'remove-single-device ..." >/proc/scsi/scsi thing, reseated the drive, added it back as a scsi device and added it back to the raid and it is rebuilding now. Nothing else even blinked except the first 'cat /proc/mdstat' took several seconds after the disk was removed.
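(Spelled out, that's roughly the following -- the 0 0 2 0 host/channel/id/lun comes from the log above, and the md device name is a placeholder:

  mdadm /dev/md0 -r /dev/sdc1                                   # drop the failed member
  echo "scsi remove-single-device 0 0 2 0" > /proc/scsi/scsi    # detach it from the SCSI layer
  # ...reseat the drive...
  echo "scsi add-single-device 0 0 2 0" > /proc/scsi/scsi       # probe it back in
  mdadm /dev/md0 -a /dev/sdc1                                   # re-add and let md rebuild

)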
Les Mikesell lesmikesell@gmail.com wrote:
I have a non-critical IBM eserver with software raid running so I yanked a drive to see what happens.
Hold on a second ... Are you using a SCSI backplane? If so, that's the difference right there! ;->
SCSI backplanes and host adapters work very, very different on transient (or failure for that matter) than _any_ ATA or regular SCSI (without a backplane). They are still formulating similarly for SATA, and there are some SCSI adopted standards for SATA backplanes. But with SAS, much of that is becoming moot.
Okay, now things make far more sense. ;->
On Friday 06 January 2006 19:02, Bryan J. Smith wrote:
Les Mikesell lesmikesell@gmail.com wrote:
I have a non-critical IBM eserver with software raid running so I yanked a drive to see what happens.
Hold on a second ... Are you using a SCSI backplane? If so, that's the difference right there! ;->
SCSI backplanes and host adapters work very, very different on transient (or failure for that matter) than _any_ ATA or regular SCSI (without a backplane).
Hmmm... how is it a different matter? A scsi backplane has very little logic onboard other than what is required for the scsi id selection (unless that is hardwired too). I don't know any backplane that has logic that can understand the scsi protocol and could communicate drive removal or such to the controller... ?
Peter.
Peter Arremann loony@loonybin.org wrote:
Hmmm... how is it a different matter? A scsi backplane has very little logic onboard other than what is required for the scsi id selection (unless that is hardwired too).
Depends on the backplane. You should see what the Dell, IBM and other servers have. ;->
Furthermore, SCSI-2 is a very _rich_ protocol, including being able to handle transient, loss, etc... of nodes. Most cards, when used with a backplane that supports such logic, handle a lot. The rich SCSI driver for the OS then communicates much of that status back to the system.
People who expect a 3Ware, Areca or other card to provide similar don't realize that these cards' drivers are rather "dumb" from the OS standpoint. All of the "intelligence" is on-board, in the firmware, driven by the on-board ASIC or microcontroller. That's why these features _only_ work when the on-board ASIC or microcontroller is controlling the array.
And not when it's just presenting the disk as a JBOD back to the system, where it cannot manage it as such.
I don't know any backplane that has logic that can understand the scsi protocol and could communicate drive removal or such to the controller... ?
Huh? Read up on SCSI-2 with SCA. Now if you're using some cheap SCSI carriers, no, you're not going to get the same. But if you're using a standard server backplane -- especially what you'd get in an IBM eServer or a Dell PowerEdge, then yes, you're going to get various SCSI-2 control.
It's just a couple of key pins which the SCSI-2 host adapter then handles, and then reports back via its driver.
Again, 3Ware, Areca and other controllers don't have "broken" drivers per se, they just are _not_ delivering the full SCSI-2 capabilities that some other cards do.
Now it will be interesting to see if Serial Attached SCSI (SAS) cards, which can use SATA drives, will still report back SATA status via their SCSI-2 facilities. I know some of the SAF-TE standard is adopted in some SATA drives and backplanes.
On Friday 06 January 2006 19:18, Bryan J. Smith wrote:
Peter Arremann loony@loonybin.org wrote:
Hmmm... how is it a different matter? A scsi backplane has very little logic onboard other than what is required for the scsi id selection (unless that is hardwired too).
Depends on the backplane. You should see what the Dell, IBM and other servers have. ;->
*nods* Have done so - but never, when working on anything from HP, Sun or IBM, have I seen anything like what you're describing... SAF-TE is the furthest in that direction but also not even close...
Now it will be interesting to see if Serial Attached SCSI (SAS) cards, which can use SATA drives, will still report back SATA status via their SCSI-2 facilities. I know some of the SAF-TE standard is adopted in some SATA drives and backplanes.
Ok - so you're referring to SAF-TE with your above statements, or is there more?
Peter.
Peter Arremann loony@loonybin.org wrote:
*nods* Have done so - but never, when working on anything from HP, Sun or IBM, have I seen anything like what you're describing... SAF-TE is the furthest in that direction but also not even close... Ok - so you're referring to SAF-TE with your above statements, or is there more?
Well, SAF-TE is pretty anal-level, and it's more about enclosure and other details. I'd have to fully research _what_ is exactly required -- at the SCSI card (probably not much), at the SCSI backplane (probably not much) and at the SCSI driver (there's the majority of the beef ;-).
Now, case-in-point/back-to-original-focus:
3Ware Escalade, Areca ARC and other _true_ hardware ATA cards are _not_ SCSI-2 cards that provide SCSI-2 feedback to the kernel. They have added some capabilities, like smartd and other daemon support. But to get the failed/hot-swap capability, you _must_ let the internal ASIC/microcontroller -- where _all_ of the "meat" is -- manage the array.
At this point, anyone who doesn't believe what I'm saying can feel free to assume I'm pulling it all out of my rectum. Because I've said enough, dissected this all enough, and I'm sure most people who don't care are sick of hearing me try to explain why you can't use 3Ware cards with software RAID and handle failures or hot-swap.
And I still want to strangle anyone on the software RAID lists for continuing to proliferate such. ;->
On Friday 06 January 2006 19:49, Bryan J. Smith wrote:
Peter Arremann loony@loonybin.org wrote:
SAF-TE is the furthest in that direction but also not even close... Ok - so you're referring so saf-te with your above statemts or is there more?
Well, SAF-TE is pretty anal-level, and it's more about enclosure and other details. I'd have to fully research _what_ is exactly required -- at the SCSI card (probably not much), at the SCSI backplane (probably not much) and at the SCSI driver (there's the majority of the beef ;-).
Most manufacturers I'm familiar with are using std. LSI, adaptec, whatever chips on the controller side. Drives are standard and I've just never seen ICs mounted on an internal backplane that had even close to enough pins to be attached to the scsi bus... many external enclosures support saf-te but saf-te does not provide you the level of information you were talking about. The point about failing devices/disconnects is specifically mentioned as not being a responsibility of saf-te.
Now, case-in-point/back-to-original-focus:
Sorry - I should have marked that as off topic. It had nothing to do with my 3ware/ata question :-) I was simply wondering what exactly I missed...
Peter.
"Bryan J. Smith" thebs413@earthlink.net wrote:
Hold on a second ... Are you using a SCSI backplane? If so, that's the difference right there! ;-> SCSI backplanes and host adapters work very, very different on transient (or failure for that matter) than _any_ ATA or regular SCSI (without a backplane). They are still formulating similarly for SATA, and there are some SCSI adopted standards for SATA backplanes. But with SAS, much of that is becoming moot. Okay, now things make far more sense. ;->
Okay, let me put this summary out ...
1) Software RAID with SCSI
If you want reliable software RAID for failed drives or hot-swap drives, you want to get a host adapter _and_ a SCSI backplane that work together. The card must then have full SCSI-2 support via their driver for the SCSI subsystem to enable such disconnect and hot-swap features, which is then paired with the backplane hardware.
2) Hardware RAID with ATA
3Ware Escalade, Areca ARC and other ATA RAID controllers use _true_ hardware RAID by the way of ASIC or microcontrollers that _never_ let the OS see the "raw" disc. When the discs are managed into arrays, the on-board intelligence can handle failures and hot-swaps.
If the discs are not managed as arrays, they report the discs as they are to the OS, which means if they fail or are removed, you'll _lose_ the device. Although these cards' drivers might load via the SCSI subsystem, they are _not_ SCSI cards, and do _not_ have a full SCSI-2 feature set.
3) Software RAID with ATA, SCSI (non-backplane) or JBOD modes
You're on your own here. If there is not a full SCSI-2 driver for your controller, with associated hardware to handle loss or transient, then you're likely going to get a panic. The "new option" in kernel 2.6 is allegedly hotswap, but I have never configured a storage device for it -- other than USB, FireWire, CompactFlash, etc...
Now does this make more sense? ;->
"Bryan J. Smith" thebs413@earthlink.net wrote:
3) Software RAID with ATA, SCSI (non-backplane) or JBOD modes -- You're on your own here. If there is not a full SCSI-2 driver for your controller, with associated hardware to handle loss or transient, then you're likely going to get a panic. The "new option" in kernel 2.6 is allegedly hotswap, but I have never configured a storage device for it -- other than USB, FireWire, CompactFlash, etc...
One final note ...
"libata" is an effort to give a rich set of features to both ATA and SATA. Especially for SATA, where it does have staggered pins and can be considered the "SCA for ATA."
But that means the ATA controller, its driver and the end-device must all support the function in libata to work.
On Fri, 2006-01-06 at 18:02, Bryan J. Smith wrote:
I have a non-critical IBM eserver with software raid running so I yanked a drive to see what happens.
Hold on a second ... Are you using a SCSI backplane? If so, that's the difference right there! ;->
SCSI backplanes and host adapters work very, very different on transient (or failure for that matter) than _any_ ATA or regular SCSI (without a backplane). They are still formulating similarly for SATA, and there are some SCSI adopted standards for SATA backplanes. But with SAS, much of that is becoming moot.
Okay, now things make far more sense. ;->
The only relevant part of the backplane is that it uses SCA connectors which you need for hot swapping because they make the power and data connections happen in the right order. The controller doesn't know if there is a backplane or just a cable.
Les Mikesell wrote:
The only relevant part of the backplane is that it uses SCA connectors which you need for hot swapping because they make the power and data connections happen in the right order. The controller doesn't know if there is a backplane or just a cable.
That is my understanding as well, though I'm not a bona fide certified electrical engineer (don't even play one on the net). 8-)
Cheers,
Les Mikesell wrote:
The only relevant part of the backplane is that it uses SCA connectors which you need for hot swapping because they make the power and data connections happen in the right order. The controller doesn't know if there is a backplane or just a cable.
Chris Mauritz wrote:
That is my understanding as well, though I'm not a bona fide certified electrical engineer (don't even play one on the net). 8-)
Ummm, no, there's far more to a SCSI-2 backplane than just handling the transient. That is then fed back to the SCSI-2 host adapter. And depending on what the SCSI-2 driver can give the kernel, there's a lot more that _can_ be reported.
And, my latter comment of ...
"I'd have to fully research _what_ is exactly required -- at the SCSI card (probably not much), at the SCSI backplane (probably not much) and at the SCSI driver (there's the majority of the beef ;-)."
SCSI-2 is a very rich command set, with a lot of control. I'm sure the backplane and host adapter offer a wealth of information, something that can be used. Again, I have to research what is different.
But _regardless_, you're _not_ going to get that out of a 3Ware, an Areca, etc... They have that capability _on_card_ in their firmware, executed by their ASIC/microcontrollers for their arrays. But they aren't feeding back such a rich set of capabilities to the OS.
And that's before we even look at standard ATA controllers. There is a rich set of logic being developed for ATA, and "libata" attempts to support it. But nothing is well developed, much less well supported by various controllers, etc...
So please quit nit-picking my comments, which you guys _always_ do. I have given far more dissection than _anyone_ here has even _bothered_ to address. I have given repeated, _sound_advice_ on what you can and can't use 3Ware for. There are far too many people going, "oh yeah, you can use 3Ware and get hot-swap with software RAID" on a whim over on the MD lists. I at least know what I have used.
I'm honestly considering leaving this group because select people continue to want to over-simplify my comments just for the sake of disagreeing with them, and not seeing the larger issue of _all_ the factors involved. It's really not worth it if some of you guys don't want to work with me.
And damn it if it isn't the same characters over and over and over again. Especially the ones that want me to provide documentation, yet they haven't offered any insight or documentation themselves.
On Fri, 2006-01-06 at 23:22, Bryan J. Smith wrote:
So please quit nit-picking my comments, which you guys _always_ do. I have given far more dissection than _anyone_ here has even _bothered_ to address.
Nobody is nitpicking your comments about the 3ware board - just the ones about the kernel having to panic when you remove a device which isn't true. The error can and does get passed up to the md layer with reasonable hardware.