Greetings,
I am looking for a hardware raid card that supports up to 4 SATA II hard disks with hot swap (compatible raid cage)
I have shortlisted two LSI/3Ware cards:
1. 9750-4i
2. 9650SE-4LPML
Both appear to be well supported in Linux.
I would appreciate your personal experience with the CLI tools provided by LSI. Can they be configured to send email for disk failures or SMART errors? Is there a Web interface for monitoring?
Any preference between 1 and 2 above?
Thanks for your time and suggestions. -- Arun Khan
On 03/06/2013 08:35 AM, Arun Khan wrote:
Both appear to be well supported in Linux.
They are.
I would appreciate your personal experience with the CLI tools provided by LSI. Can they be configured to send email for disk failures or SMART errors?
Not the CLI tools. You'll need to run 3dm2.
Is there a Web interface for monitoring?
Yes, that is also provided by 3dm2.
Any preference between 1 and 2 above.
Based on about 10 years of running a hundred or so systems with 3ware controllers, I would say that you're better off with an LSI MegaRAID card, or with Linux software RAID. 3ware cards themselves have been the most problematic component of any system I've run in my entire professional career (starting in 1996). Even very recent cards fail in a wide variety of ways, and there is no guarantee that if the controller you buy now fails, you'll be able to connect the array to a controller that you buy later.
At this point, I deploy almost exclusively systems running Linux with KVM on top of software RAID. While I lose the battery backed write cache (which is great for performance unless you sustain enough writes to fill it completely, at which point the system grinds nearly to a halt), I gain a consistent set of management tools and the ability to move a disk array to any hardware that accepts the same form factor disk. The reliability of my systems has improved significantly since I moved to software RAID.
On Thu, Mar 7, 2013 at 12:07 AM, Gordon Messmer wrote:
On 03/06/2013 08:35 AM, Arun Khan wrote:
Any preference between 1 and 2 above.
Based on about 10 years of running a hundred or so systems with 3ware controllers, I would say that you're better off with an LSI MegaRAID card, or with Linux software RAID. 3ware cards themselves have been the most problematic component of any system I've run in my entire professional career (starting in 1996). Even very recent cards fail in a wide variety of ways, and there is no guarantee that if the controller you buy now fails, you'll be able to connect the array to a controller that you buy later.
@ Gordon - thanks for sharing this piece of info! In case of RAID card failure, it is important to be able to recover the data (RAID device) with a compatible replacement. Are the LSI MegaRAID controllers more reliable in this respect?
At this point, I deploy almost exclusively systems running Linux with KVM on top of software RAID. While I lose the battery backed write cache (which is great for performance unless you sustain enough writes to fill it completely, at which point the system grinds nearly to a halt), I gain a consistent set of management tools and the ability to move a disk array to any hardware that accepts the same form factor disk. The reliability of my systems has improved significantly since I moved to software RAID.
Software RAID is an option but I don't think hot swap is possible without some tinkering with the mdadm tool a priori. The systems will go to a client site (remote), so I would prefer to limit support calls to remove/replace hardware activity :(
Thanks, -- Arun Khan
On 2013-03-12, Arun Khan knura9@gmail.com wrote:
Software RAID is an option but I don't think hot swap is possible without some tinkering with the mdadm tool a priori.
Hot-swapping a failed drive is basically the same process AFAICT with a 3ware array or an md array. The primary issue would be to make sure the controller supports hot swap, and how you tell it to release a drive from the kernel (a la tw_cli /cx/px remove). And if you have one or two spares on a modest array then you can schedule site visits around your own schedule if a drive fails--either a hardware RAID or linux md will automatically rebuild with a spare when an active drive is marked as failed.
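For the md case, a hot spare can be configured up front so the rebuild starts without anyone touching the box. A minimal sketch, assuming a 4-disk RAID5 with one spare (the device names and the md array name are placeholders, not taken from this thread):

  mdadm --create /dev/md0 --level=5 --raid-devices=4 --spare-devices=1 \
        /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
  # or attach a spare to an already-running array; a disk added to a
  # non-degraded array simply sits as a spare until it is needed
  mdadm /dev/md0 --add /dev/sdg1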
Did you have other concerns about hot swapping? I may be confusing what you're hoping to do.
--keith
On Wed, Mar 13, 2013 at 12:01 AM, Keith Keller kkeller@wombat.san-francisco.ca.us wrote:
On 2013-03-12, Arun Khan knura9@gmail.com wrote:
Software RAID is an option but I don't think hot swap is possible without some tinkering with the mdadm tool a priori.
Hot-swapping a failed drive is basically the same process AFAICT with a 3ware array or an md array. The primary issue would be to make sure the controller supports hot swap, and how you tell it to release a drive from the kernel (a la tw_cli /cx/px remove). And if you have one or two spares on a modest array then you can schedule site visits around your own schedule if a drive fails--either a hardware RAID or linux md will automatically rebuild with a spare when an active drive is marked as failed.
Did you have other concerns about hot swapping? I may be confusing what you're hoping to do.
I have no experience with hardware RAID and what is entailed in "hot swapping" disks connected to an HBA.
My expectation with a hardware controller (that supports hot swap) is: (a) it identifies which drive has failed (via an LED in the RAID cage), (b) the local support guy removes that disk, (c) he inserts a new disk, and (d) the controller detects the new disk, adds it to the array, and automatically rebuilds the array. Is this possible with hardware controllers?
It looks like, even with a hardware controller, I would still have to use the CLI tools to remove the failed disk and add the new one. Please correct me if I misunderstood.
With a hardware controller, I was hoping the process of removing/adding a drive would not involve CLI tools.
-- Arun Khan
On Tue, Mar 12, 2013 at 4:30 AM, Arun Khan knura9@gmail.com wrote:
On Thu, Mar 7, 2013 at 12:07 AM, Gordon Messmer wrote:
On 03/06/2013 08:35 AM, Arun Khan wrote:
Any preference between 1 and 2 above.
Based on about 10 years of running a hundred or so systems with 3ware controllers, I would say that you're better off with an LSI MegaRAID card, or with Linux software RAID. 3ware cards themselves have been the most problematic component of any system I've run in my entire professional career (starting in 1996). Even very recent cards fail in a wide variety of ways, and there is no guarantee that if the controller you buy now fails, you'll be able to connect the array to a controller that you buy later.
@ Gordon - thanks for sharing this piece of info! In case of RAID card failure, it is important to be able to recover the data (RAID device) with a compatible replacement. Are the LSI MegaRAID controllers more reliable in this respect?
I've not had any MegaRAID controllers fail, so I can only say they've been reliable thus far!
At this point, I deploy almost exclusively systems running Linux with KVM on top of software RAID. While I lose the battery backed write cache (which is great for performance unless you sustain enough writes to fill it completely, at which point the system grinds nearly to a halt), I gain a consistent set of management tools and the ability to move a disk array to any hardware that accepts the same form factor disk. The reliability of my systems has improved significantly since I moved to software RAID.
Software RAID is an option but I don't think hot swap is possible without some tinkering with the mdadm tool a priori.
Hot swap really depends on what your HBA or RAID controller supports.
You start by failing/removing the drive via mdadm. Then hot remove the disk from the subsystem (ex: SCSI [0]) and finally physically remove it. Then work in the opposite direction ... hot add (SCSI [1]), clone the partition layout from one drive to the new with sfdisk, and finally add the new disk/partitions to your softraid array with mdadm.
You must hot remove the disk from the SCSI subsystem or the block device (ex: /dev/sdc) name is occupied and unavailable for the new disk you put in the system. I've used the above procedure many times to repair softraid arrays while keeping systems online.
[0] https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/5/ht... [1] https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/5/ht...
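As a rough sketch of that sequence (all device names, the md array, and the SCSI host number are placeholders to adjust for your own system):

  # fail and remove the dying member from the array
  mdadm /dev/md0 --fail /dev/sdc1
  mdadm /dev/md0 --remove /dev/sdc1
  # hot-remove the disk from the SCSI subsystem, then pull it physically
  echo 1 > /sys/block/sdc/device/delete
  # after inserting the new disk, rescan the adapter so the kernel sees it
  echo "- - -" > /sys/class/scsi_host/host0/scan
  # clone the partition layout from a surviving member, then re-add
  sfdisk -d /dev/sda | sfdisk /dev/sdc
  mdadm /dev/md0 --add /dev/sdc1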
The systems will go to a client site (remote), so I would prefer to limit support calls to remove/replace hardware activity :(
Thanks, -- Arun Khan
On 2013-03-12, SilverTip257 silvertip257@gmail.com wrote:
I've not had any MegaRAID controllers fail, so I can only say they've been reliable thus far!
I think that this is not a helpful comment for the OP. He wants to know, in the event the controller does fail, can he replace it with a similar-but-possibly-not-identical controller and have it recognize the original RAID containers. Just because you have not seen any failures so far does not mean the OP never will.
You start by failing/removing the drive via mdadm. Then hot remove the disk from the subsystem (ex: SCSI [0]) and finally physically remove it. Then work in the opposite direction ... hot add (SCSI [1]), clone the partition layout from one drive to the new with sfdisk, and finally add the new disk/partitions to your softraid array with mdadm.
You must hot remove the disk from the SCSI subsystem or the block device (ex: /dev/sdc) name is occupied and unavailable for the new disk you put in the system. I've used the above procedure many times to repair softraid arrays while keeping systems online.
This is basically the same procedure for replacing a failed drive in a hardware RAID array, except that there is no need to worry about drive names (since individual drives don't get assigned a name in the kernel). But the point is that replacing a failed drive is the same amount of on-site work in either scenario, so that should not deter the OP from choosing software RAID. (There may be other factors, such as the aforementioned write cache on many RAID cards.)
--keith
On Tue, Mar 12, 2013 at 10:10 PM, Keith Keller <kkeller@wombat.san-francisco.ca.us> wrote:
On 2013-03-12, SilverTip257 silvertip257@gmail.com wrote:
I've not had any MegaRAID controllers fail, so I can only say they've been reliable thus far!
I think that this is not a helpful comment for the OP. He wants to know, in the event the controller does fail, can he replace it with a similar-but-possibly-not-identical controller and have it recognize the original RAID containers. Just because you have not seen any failures so far does not mean the OP never will.
I've had no problem with various versions of Dell MegaRAID/PERC5i controllers. You can swap drives from a PERC5i into a PERC6i for example and things are peachy. But it is not possible to swap drives from a PERC6i into a PERC5i controller.
Avoid SAS6/iR controllers ... they are low-end controllers that only support hardware RAID0 and RAID1.
Ultimately hardware RAID controllers can be a big pain -- just like anything else it's a good business practice to have spares!
You start by failing/removing the drive via mdadm. Then hot remove the disk from the subsystem (ex: SCSI [0]) and finally physically remove it. Then work in the opposite direction ... hot add (SCSI [1]), clone the partition layout from one drive to the new with sfdisk, and finally add the new disk/partitions to your softraid array with mdadm.
You must hot remove the disk from the SCSI subsystem or the block device (ex: /dev/sdc) name is occupied and unavailable for the new disk you put in the system. I've used the above procedure many times to repair softraid arrays while keeping systems online.
This is basically the same procedure for replacing a failed drive in a hardware RAID array, except that there is no need to worry about drive names (since individual drives don't get assigned a name in the kernel). But the point is that replacing a failed drive is the same amount of on-site work in either scenario, so that should not deter the OP from choosing software RAID. (There may be other factors, such as the aforementioned write cache on many RAID cards.)
--keith
-- kkeller@wombat.san-francisco.ca.us
I'll argue that the software RAID process is slightly more complex. And it is crucial that one remember to hot-remove the disk ... after all one could panic their box by just yanking the drive.
I think that information will be useful to the OP and others, so I posted it all. I ought to check the CentOS wiki and see if any/all of those steps are documented.
On 2013-03-13, SilverTip257 silvertip257@gmail.com wrote:
I'll argue that the software RAID process is slightly more complex. And it is crucial that one remember to hot-remove the disk ... after all one could panic their box by just yanking the drive.
Agreed, but the OP specifically mentioned wanting to avoid creating more on-site work. He could do all of the steps you mentioned remotely, so the amount of on-site work for HW RAID or md RAID is equivalent, and therefore shouldn't be a factor in choosing between them. The added complexity might be an issue for a user new to RAID or to device management.
--keith
Keith Keller wrote:
On 2013-03-13, SilverTip257 silvertip257@gmail.com wrote:
I'll argue that the software RAID process is slightly more complex. And it is crucial that one remember to hot-remove the disk ... after all one could panic their box by just yanking the drive.
Agreed, but the OP specifically mentioned wanting to avoid creating more on-site work. He could do all of the steps you mentioned remotely, so the amount of on-site work for HW RAID or md RAID is equivalent, and therefore shouldn't be a factor in choosing between them. The added complexity might be an issue for a user new to RAID or to device management.
In that case, my feeling would be that if the server was purchased with a RAID controller [1], that's what should be used [2].
[1] With the exception of the PERC 310, which I really dislike; the first version of the PERC 7, which *only* allowed Dell hard drives; and Intel FakeRAID.
[2] Otherwise, that's a nice chunk o' change that was wasted in someone's budget.
mark
On Wed, Mar 13, 2013 at 11:04 PM, Keith Keller wrote:
On 2013-03-13, SilverTip257 wrote:
I'll argue that the software RAID process is slightly more complex. And it is crucial that one remember to hot-remove the disk ... after all one could panic their box by just yanking the drive.
Agreed, but the OP specifically mentioned wanting to avoid creating more on-site work. He could do all of the steps you mentioned remotely, so the amount of on-site work for HW RAID or md RAID is equivalent, and therefore shouldn't be a factor in choosing between them. The added complexity might be an issue for a user new to RAID or to device management.
From the discussions thus far, I have concluded that hardware RAID has its own issues.
Thanks to everyone for sharing your thoughts, suggestions and comments. I am still leaning towards a hardware raid controller but will look into the MegaRAID controllers as well.
-- Arun Khan
On 2013-03-13, Arun Khan knura9@gmail.com wrote:
Thanks to everyone for sharing your thoughts, suggestions and comments. I am still leaning towards a hardware raid controller but will look into the MegaRAID controllers as well.
All of the controllers mentioned so far--3ware, MegaRAID (both made by LSI), and Areca--are true hardware RAID controllers. If you intend to do md RAID, you would probably not purchase one of these, but would instead go with a simple SATA/SAS controller with sufficient ports and bandwidth to host all your drives (and make sure all hardware components support hot swap).
--keith
On Wed, Mar 13, 2013 at 6:37 PM, SilverTip257 wrote:
On Tue, Mar 12, 2013 at 10:10 PM, Keith Keller wrote:
On 2013-03-12, SilverTip257 silvertip257@gmail.com wrote:
I've not had any MegaRAID controllers fail, so I can only say they've been reliable thus far!
I think that this is not a helpful comment for the OP. He wants to know, in the event the controller does fail, can he replace it with a similar-but-possibly-not-identical controller and have it recognize the original RAID containers. Just because you have not seen any failures so far does not mean the OP never will.
I've had no problem with various versions of Dell MegaRAID/PERC5i controllers. You can swap drives from a PERC5i into a PERC6i for example and things are peachy. But it is not possible to swap drives from a PERC6i into a PERC5i controller.
No plans to go with Dell hardware but it is great to note that newer models (Dell OEM Megaraid) recognize arrays created with older models. I don't expect an older model to recognize an array created by a newer model.
Avoid SAS6/iR controllers ... they are low-end controllers that only support hardware RAID0 and RAID1.
My configuration will be RAID 5 or 6, depending on which option the client is willing to pay for.
Ultimately hardware RAID controllers can be a big pain -- just like anything else it's a good business practice to have spares!
You start by failing/removing the drive via mdadm. Then hot remove the disk from the subsystem (ex: SCSI [0]) and finally physically remove it. Then work in the opposite direction ... hot add (SCSI [1]), clone the partition layout from one drive to the new with sfdisk, and finally add the new disk/partitions to your softraid array with mdadm.
You must hot remove the disk from the SCSI subsystem or the block device (ex: /dev/sdc) name is occupied and unavailable for the new disk you put in the system. I've used the above procedure many times to repair softraid arrays while keeping systems online.
This is basically the same procedure for replacing a failed drive in a hardware RAID array, except that there is no need to worry about drive
I'll argue that the software RAID process is slightly more complex. And it is crucial that one remember to hot-remove the disk ... after all one could panic their box by just yanking the drive.
Yes, this could happen in spite of well-documented procedures. For this reason, hardware RAID has been a consideration. However, I have come to realize that it has its own pros and cons, as mentioned in this thread.
-- Arun Khan
On Wed, Mar 13, 2013 at 3:08 PM, Arun Khan knura9@gmail.com wrote:
I'll argue that the software RAID process is slightly more complex. And it is crucial that one remember to hot-remove the disk ... after all one could panic their box by just yanking the drive.
Yes, this could happen in spite of well-documented procedures. For this reason, hardware RAID has been a consideration. However, I have come to realize that it has its own pros and cons, as mentioned in this thread.
I've hot-swapped lots of SCA and SATA drives in and out of software md RAIDs and never had a problem. Assuming you have appropriate disk carriers or one of those trayless swappable SATA bays, hot swap is part of the hardware spec (there may be a few very old SATA controllers that don't notice, though). I'd always prefer software RAID for simple mirroring, but would use hardware for RAID 5, etc., where it offloads the parity computation.
On Wed, Mar 13, 2013 at 4:08 PM, Arun Khan knura9@gmail.com wrote:
On Wed, Mar 13, 2013 at 6:37 PM, SilverTip257 wrote:
On Tue, Mar 12, 2013 at 10:10 PM, Keith Keller wrote:
On 2013-03-12, SilverTip257 silvertip257@gmail.com wrote:
I've not had any MegaRAID controllers fail, so I can only say they've been reliable thus far!
I think that this is not a helpful comment for the OP. He wants to know, in the event the controller does fail, can he replace it with a similar-but-possibly-not-identical controller and have it recognize the original RAID containers. Just because you have not seen any failures so far does not mean the OP never will.
I've had no problem with various versions of Dell MegaRAID/PERC5i controllers. You can swap drives from a PERC5i into a PERC6i for example and things are peachy. But it is not possible to swap drives from a PERC6i into a PERC5i controller.
No plans to go with Dell hardware but it is great to note that newer models (Dell OEM Megaraid) recognize arrays created with older models. I don't expect an older model to recognize an array created by a newer model.
Doubtful. The 6i are newer than the 5i. Months ago I tested and can confirm 5i cannot read 6i metadata (Dell and others are not lying).
I've not tried swapping drives from a 5i into a 6i, then back to 5i to see if the 6i changed the metadata any. That's too much swapping for a hypothetical situation where an admin does not do a one-to-one swap (5i to 5i).
Avoid SAS6/iR controllers ... they are low-end controllers that only support hardware RAID0 and RAID1.
And to add more information ... the Dell PERC[56]i controllers are supported by the LSI SNMP daemon, which exports quite a bit of information via SNMP. That is useful with a Nagios plugin to keep tabs on array health.
The SAS6/iR controllers are not compatible with that LSI SNMP daemon and so far I've not found a way to monitor their array health efficiently. The newest version of that Nagios script claims to support MPTFusion-based controllers (which the SAS6/iR is), but again I've not found a way to export the data to SNMP.
[ OT: I confess...someone on this list pointed me at a Nagios plugin to check OpenManage that I've yet to test. :-/ ]
My configuration will be RAID 5 or 6, depending on which option the client is willing to pay for.
Ultimately hardware RAID controllers can be a big pain -- just like anything else it's a good business practice to have spares!
You start by failing/removing the drive via mdadm. Then hot remove the disk from the subsystem (ex: SCSI [0]) and finally physically remove it. Then work in the opposite direction ... hot add (SCSI [1]), clone the partition layout from one drive to the new with sfdisk, and finally add the new disk/partitions to your softraid array with mdadm.
You must hot remove the disk from the SCSI subsystem or the block device (ex: /dev/sdc) name is occupied and unavailable for the new disk you put in the system. I've used the above procedure many times to repair softraid arrays while keeping systems online.
This is basically the same procedure for replacing a failed drive in a hardware RAID array, except that there is no need to worry about drive
I'll argue that the software RAID process is slightly more complex. And it is crucial that one remember to hot-remove the disk ... after all one could panic their box by just yanking the drive.
Yes, this could happen in spite of well-documented procedures. For this reason, hardware RAID has been a consideration. However, I have come to realize that it has its own pros and cons, as mentioned in this thread.
-- Arun Khan
On Wed, Mar 13, 2013 at 7:40 AM, Keith Keller wrote:
On 2013-03-12, SilverTip257 wrote:
I've not had any MegaRAID controllers fail, so I can only say they've been reliable thus far!
I think that this is not a helpful comment for the OP. He wants to know, in the event the controller does fail, can he replace it with a similar-but-possibly-not-identical controller and have it recognize the original RAID containers. Just because you have not seen any failures so far does not mean the OP never will.
+1. Nothing is guaranteed in life. However, when the HBA fails, is it possible to replace it with the same model+firmware (assuming a spare card in stock) or a later model from the same OEM and recover the RAID array? (Assuming that none of the disks in the original array had any failure.)
Has this happened to anyone and have they been able to recover the array without losing any data?
You start by failing/removing the drive via mdadm. Then hot remove the disk from the subsystem (ex: SCSI [0]) and finally physically remove it. Then work in the opposite direction ... hot add (SCSI [1]), clone the partition layout from one drive to the new with sfdisk, and finally add the new disk/partitions to your softraid array with mdadm.
You must hot remove the disk from the SCSI subsystem or the block device (ex: /dev/sdc) name is occupied and unavailable for the new disk you put in the system. I've used the above procedure many times to repair softraid arrays while keeping systems online.
This is basically the same procedure for replacing a failed drive in a hardware RAID array, except that there is no need to worry about drive names (since individual drives don't get assigned a name in the kernel). But the point is that replacing a failed drive is the same amount of on-site work in either scenario, so that should not deter the OP from choosing software RAID. (There may be other factors, such as the aforementioned write cache on many RAID cards.)
Going slightly OT - how do the NAS boxes handle the hard disk failure scenario?
-- Arun Khan
On 03/12/2013 01:30 AM, Arun Khan wrote:
@ Gordon - thanks for sharing this piece of info! In case of RAID card failure, it is important to be able to recover the data (RAID device) with a compatible replacement. Are the LSI MegaRAID controllers more reliable in this respect?
I don't know. I would contact LSI support or sales and see what they have to say about it. I haven't had the occasion to move a disk array between one and another model of MegaRAID controller.
Software RAID is an option but I don't think hot swap is possible without some tinkering with the mdadm tool a priori. The systems will go to a client site (remote), so I would prefer to limit support calls to remove/replace hardware activity :(
You're correct that adding a new disk to an mdadm array will require some CLI work. Probably most hardware RAID would do this automatically. I frequently do the CLI work remotely over SSH so that a local tech can swap the disk.
2013/3/13 Gordon Messmer yinyang@eburg.com:
On 03/12/2013 01:30 AM, Arun Khan wrote:
@ Gordon - thanks for sharing this piece of info! In case of RAID card failure, it is important to be able to recover the data (RAID device) with a compatible replacement. Are the LSI MegaRAID controllers more reliable in this respect?
I don't know. I would contact LSI support or sales and see what they have to say about it. I haven't had the occasion to move a disk array between one and another model of MegaRAID controller.
Software RAID is an option but I don't think hot swap is possible without some tinkering with the mdadm tool a priori. The systems will go to a client site (remote), so I would prefer to limit support calls to remove/replace hardware activity :(
You're correct that adding a new disk to an mdadm array will require some CLI work. Probably most hardware RAID would do this automatically. I frequently do the CLI work remotely over SSH so that a local tech can swap the disk.
Areca also produces hardware RAID controllers that work nicely on CentOS.
-- Eero
On 2013-03-06, Arun Khan knura9@gmail.com wrote:
I am looking for a hardware raid card that supports up to 4 SATA II hard disks with hot swap (compatible raid cage)
I have short listed two LSI/3Ware cards:
- 9750-4i
- 9650SE-4LPML
I would appreciate your personal experience with the CLI tools provided by LSI. Can they be configured to send email for disk failures or SMART errors?
Yes, with the 3dm2 monitor. There was a bug (which may still exist) where using the hostname of the SMTP server did not work, so if a DNS name doesn't work, try the IP address instead. You may also want to adjust the EmailSeverity option in 3dm.conf; I use EmailSeverity 3, which sends me informational messages like array verifications and BBU charging events in addition to disk failures and SMART errors. I believe the default setting only sends out errors.
Is there a Web interface for monitoring?
3dm2 can also provide a web server, though I've never used it.
The command-line tools (tw_cli) are okay, but they don't have a fabulous API. In particular, the output is basically human-readable text, which means you need to do your own text parsing if you want to use the output in automated tools like Nagios.
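A crude example of the kind of wrapper that ends up being necessary (a sketch only: the controller number and the status keywords it greps for are assumptions to verify against your own tw_cli output before relying on it):

  #!/bin/sh
  # flag anything tw_cli reports as degraded or rebuilding on controller /c0
  BAD=$(tw_cli /c0 show | grep -E 'DEGRADED|REBUILDING|INOPERABLE|ECC-ERROR')
  if [ -n "$BAD" ]; then
      echo "CRITICAL: $BAD"
      exit 2    # Nagios CRITICAL
  fi
  echo "OK: no degraded units or ports on /c0"
  exit 0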
Any preference between 1 and 2 above.
I would have a slight preference for the 9750 series. You get a faster controller for a very similar price.
The only other thing I would recommend is that you not rely on the reshape feature of these 3ware cards. My first test many years ago actually destroyed the array, and I aborted my second test last year because the reshape was probably going to take many weeks. (This was, IIRC, a reshape from 4 3TB disks to 5. Linux md can do this reshape in 1-2 days.) If you need to add space you should use tools like LVM instead.
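Growing with LVM instead means presenting the new disks as a second RAID unit and adding it as another physical volume, rather than reshaping the existing unit. A minimal sketch with placeholder device, VG, and LV names:

  pvcreate /dev/sdc                       # the new unit exported by the controller
  vgextend datavg /dev/sdc                # add it to the existing volume group
  lvextend -l +100%FREE /dev/datavg/data  # grow the logical volume into the new space
  resize2fs /dev/datavg/data              # grow ext3/ext4 online (xfs_growfs for XFS)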
The other standard warnings about using these controllers apply (e.g., if using the write cache, have a BBU on the card and a UPS on the server; do regular verifies on your redundant arrays; RAID is not a backup system).
--keith
On 2013-03-06, Gordon Messmer yinyang@eburg.com wrote:
On 03/06/2013 10:56 AM, Keith Keller wrote:
The other standard warnings about using these controllers apply (e.g., if using the write cache, have a BBU on the card
A 3ware card will not enable write caching unless a BBU is present.
My man page for tw_cli implies that you can turn on write caching without a BBU:
Please Note: 1) The default of the unit creation sets write cache to "on" for performance reasons. However, if there is no BBU available for the controller, a warning is sent to standard error.
Perhaps newer controllers will refuse to enable the write cache anyway; the tw_cli man page is written not to be specific to a particular controller.
Since we're already off-topic, I asked around about 3ware vs. MegaRAID. The "consensus" (1 < n < 5) was that MegaRAIDs had better performance, but the CLI was daunting compared to the 3ware controllers. But if LSI starts phasing out the 3ware line, perhaps it makes sense for people to start looking at MegaRAID controllers to be prepared for this possibility. (The minimal web searching I've done seems to support the dauntiness of the MegaRAID tools.)
--keith
On 03/06/2013 12:28 PM, Keith Keller wrote:
Please Note: 1) The default of the unit creation sets write cache to "on" for performance reasons. However, if there is no BBU available for the controller, a warning is sent to standard error.
It is a warning that there is no BBU, not a warning that you'll get unsafe write caching without one.
To the best of my recollection, all controllers will turn off write caching if the battery fails and during the battery re-learn cycle. If there is no BBU present, caching will never be enabled.
Since we're already off-topic, I asked around about 3ware vs. MegaRAID. The "consensus" (1 < n < 5) was that MegaRAIDs had better performance, but the CLI was daunting compared to the 3ware controllers.
I don't know whether or not there's any noteworthy performance difference. MegaRAID cards have not been as prone to failure in my experience. The management software is definitely inferior.
But if LSI starts phasing out the 3ware line perhaps it makes sense for people to start looking at MegaRAID controllers to be prepared for this possibility. (The minimal web searching I've done seems to support the dauntiness of the MegaRAID tools.)
Whether or not LSI continues to make 3ware cards, I continue to strongly recommend against their use. They suck. They have always sucked. If a customer wanted me to manage a system with a 3ware card in it, I'd decline because at some point I'm going to be paged during off hours to fix the damn thing.
On 3/6/2013 1:05 PM, Gordon Messmer wrote:
I don't know whether or not there's any noteworthy performance difference. MegaRAID cards have not been as prone to failure in my experience. The management software is definitely inferior.
megacli doth sucketh mightily, but it's a little less annoying when you realize that A) the -'s are all optional (so don't use them) and B) the commands are case-insensitive, and I find them considerably less annoying to type as all lower case.
so.. this
megacli cfgldadd r6[20:0,20:1,20:2,20:3,20:4,20:5,20:6,20:7,20:8,20:9,20:10] a0
megacli pdhsp set physdrv[20:11,20:12] a0
instead of...
MegaCli64 -CfgLdAdd -r6[20:0,20:1,20:2,20:3,20:4,20:5,20:6,20:7,20:8,20:9,20:10] -a0
MegaCli64 -PdHsp -Set -PhysDrv[20:11,20:12] -a0
(where megacli is a symlink to /path/to/MegaCli64 ...)
Also, I found a Python script online called megaclisas-status and modified it to better suit my needs; this gives a MUCH nicer output for drive status than the native commands...
# lsi-raidinfo
-- Controllers --
-- ID | Model
c0 | LSI MegaRAID SAS 9261-8i

-- Volumes --
-- ID | Type | Size | Status | InProgress
volume c0u0 | RAID1 1x2 | 2727G | Optimal | None
volume c0u1 | RAID6 1x8 | 16370G | Optimal | None
volume c0u2 | RAID6 1x8 | 16370G | Optimal | None

-- Disks --
-- Encl:Slot | vol-span-unit | Model | Status
disk 8:0 | 0-0-0 | Z291VTS5ST33000650NS 0003 | Online, Spun Up
disk 8:1 | 0-0-1 | Z291VTRPST33000650NS 0003 | Online, Spun Up
disk 8:2 | 1-0-0 | Z291VTKWST33000650NS 0003 | Online, Spun Up
disk 8:3 | 1-0-1 | Z291VT9YST33000650NS 0003 | Online, Spun Up
disk 8:4 | 1-0-2 | Z291VTT6ST33000650NS 0003 | Online, Spun Up
disk 8:5 | 1-0-3 | Z291VT6CST33000650NS 0003 | Online, Spun Up
disk 8:6 | 1-0-4 | Z291VTLAST33000650NS 0003 | Online, Spun Up
disk 8:7 | 1-0-5 | Z291VTK1ST33000650NS 0003 | Online, Spun Up
disk 8:8 | 1-0-6 | Z291VTNGST33000650NS 0003 | Online, Spun Up
disk 8:9 | 1-0-7 | Z291VTRAST33000650NS 0003 | Online, Spun Up
disk 8:10 | 2-0-0 | Z291VV05ST33000650NS 0003 | Online, Spun Up
disk 8:11 | 2-0-1 | Z291VTW1ST33000650NS 0003 | Online, Spun Up
disk 8:12 | 2-0-2 | Z291VTRLST33000650NS 0003 | Online, Spun Up
disk 8:13 | 2-0-3 | Z291VTRXST33000650NS 0003 | Online, Spun Up
disk 8:14 | 2-0-4 | Z291VSZGST33000650NS 0003 | Online, Spun Up
disk 8:15 | 2-0-5 | Z291VSW1ST33000650NS 0003 | Online, Spun Up
disk 8:16 | 2-0-6 | Z291VTB5ST33000650NS 0003 | Online, Spun Up
disk 8:17 | 2-0-7 | Z291VSX8ST33000650NS 0003 | Online, Spun Up
disk 8:18 | x-x-x | Z291VTS7ST33000650NS 0003 | Hotspare, Spun down
disk 8:19 | x-x-x | Z291VT3HST33000650NS 0003 | Hotspare, Spun down
along with another script that runs this and emails alerts if there's any bad drive or volume status.
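That second script can be as small as a cron job like the following (a sketch assuming output in the format shown above; the script path and mail address are placeholders):

  #!/bin/sh
  # mail an alert if any volume is not Optimal or any disk is not Online/Hotspare
  OUT=$(/usr/local/sbin/lsi-raidinfo)
  BAD=$(echo "$OUT" | grep -E '^(volume|disk)' | grep -Ev 'Optimal|Online|Hotspare')
  if [ -n "$BAD" ]; then
      { echo "RAID problem on $(hostname):"; echo "$BAD"; } \
          | mail -s "RAID alert on $(hostname)" root@example.com
  fi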
On 2013-03-06, Gordon Messmer yinyang@eburg.com wrote:
On 03/06/2013 12:28 PM, Keith Keller wrote:
Please Note: 1) The default of the unit creation sets write cache to "on" for performance reasons. However, if there is no BBU available for the controller, a warning is sent to standard error.
It is a warning that there is no BBU, not a warning that you'll get unsafe write caching without one.
To the best of my recollection, all controllers will turn off write caching if the battery fails and during the battery re-learn cycle. If there is no BBU present, caching will never be enabled.
Unfortunately, I don't have an available machine on which to test. But at one point I believe that I did have a 9550 controller with no BBU which would allow me to turn on write caching for a redundant array. I know that this same machine does say that the write cache is "on" for the JBOD units it currently hosts. (It's a tertiary backup on a data center power grid, so the likelihood of sudden power loss is low, and if the filesystem is lost I wouldn't be all that upset.)
But if LSI starts phasing out the 3ware line perhaps it makes sense for people to start looking at MegaRAID controllers to be prepared for this possibility. (The minimal web searching I've done seems to support the dauntiness of the MegaRAID tools.)
Whether or not LSI continues to make 3ware cards, I continue to strongly recommend against their use. They suck. They have always sucked. If a customer wanted me to manage a system with a 3ware card in it, I'd decline because at some point I'm going to be paged during off hours to fix the damn thing.
This is surprising to me. I've had one 3ware controller fail in 10 years. Two caveats to this note are that the failure ended up destroying the filesystem, and I'm not happy with another 3ware controller I have (the aforementioned 9550, which is very old).
To throw Yet Another Monkey Wrench into the discussion, I know that some of the CentOS folks swear by Areca controllers. So YMMV in any case.
--keith
On 2013-03-06, Keith Keller kkeller@wombat.san-francisco.ca.us wrote:
This is surprising to me. I've had one 3ware controller fail in 10 years.
I realized that this is untrue. I have had three 3ware controllers fail on me in 10 years, out of about 25. That's actually a pretty poor failure rate. Two of those failures were not fatal--they would crash the disk subsystem (and for the one which hosted the / filesystem, cause a kernel panic), but not destroy the filesystems. One of those two, and the other which did trash the filesystems, were out of warranty at the time of failure. (But I have even older 3ware controllers which are still perfectly fine.)
--keith