I have no personal experience with rack mount chassis.
From the past postings, I reckon there are members on this list who have experience with rack mount setups, and I would like to get their advice.
To reduce the H/W cost, I am considering Linux mdadm RAID10 on a 2U chassis.
I would appreciate clarification on the following:
In rack mount chassis, do the cages that house the hard disks have the following feature?
(a) Indicate disk failure. LED lights up and/or audio alarm? (b) The failed HDD can be swapped.
TIA.
On 10/11/11 12:11 AM, Arun Khan wrote:
In rack mount chassis, do the cages that house the hard disks have the following feature?
(a) Indicate disk failure. LED lights up and/or audio alarm?
that requires specific configuration to suit whatever drive interconnect you have.
(b) The failed HDD can be swapped.
If they are in hotswap bays, yes. if they aren't, no.
they have 2U servers with as many as 25 2.5" SAS bays now.
John R Pierce wrote:
On 10/11/11 12:11 AM, Arun Khan wrote:
In rack mount chassis, do the cages that house the hard disks have the following feature?
(a) Indicate disk failure. LED lights up and/or audio alarm?
that requires specific configuration to suit whatever drive interconnect you have.
Sometimes, it's just looking at what drive is showing SMART (or other) errors. Other times, look at the light.
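A quick check with smartmontools might look like this (a sketch only, assuming the smartmontools package is installed and the drives show up as /dev/sda, /dev/sdb, ...):

  # health summary for one drive
  smartctl -H /dev/sda
  # full SMART attributes and the drive's error log
  smartctl -a /dev/sda
  # or let smartd watch every drive and mail root when something trips
  # (one line in /etc/smartd.conf)
  DEVICESCAN -a -m root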
(b) The failed HDD can be swapped.
If they are in hotswap bays, yes. if they aren't, no.
Don't waste time and money getting anything *other* than hot swap bays. You really, really don't want to have to pull the server out, and take it apart just to swap a bad drive.
they have 2U servers with as many as 25 2.5" SAS bays now.
mark
On Tue, Oct 11, 2011 at 6:36 PM, m.roth@5-cent.us wrote:
John R Pierce wrote:
On 10/11/11 12:11 AM, Arun Khan wrote:
In rack mount chassis, do the cages that house the hard disks have the following feature?
(a) Indicate disk failure. LED lights up and/or audio alarm?
that requires specific configuration to suit whatever drive interconnect you have.
Sometimes, it's just looking at what drive is showing SMART (or other) errors. Other times, look at the light.
Agreed, but it requires me to be in physical proximity to the system. My objective is to make the "HDD failure recovery" process as deterministic as possible. As stated in another response, this will be an appliance running 24x7 at sites where Linux admin knowledge is likely to be sparse, and some sites may not give me connectivity over the 'Net.
(b) The failed HDD can be swapped.
If they are in hotswap bays, yes. if they aren't, no.
Don't waste time and money getting anything *other* than hot swap bays. You really, really don't want to have to pull the server out, and take it apart just to swap a bad drive.
OK, I have understood the value of hot swap bays for the disk failure scenario.
Thanks, -- Arun Khan
On Tue, Oct 11, 2011 at 12:46 PM, John R Pierce pierce@hogranch.com wrote:
On 10/11/11 12:11 AM, Arun Khan wrote:
In rack mount chassis, do the cages that house the hard disks have the following feature?
(a) Indicate disk failure. LED lights up and/or audio alarm?
that requires specific configuration to suit whatever drive interconnect you have.
Do these bays have a connector (+ cable) that is connected to the motherboard or RAID card to control the HDD LEDs in the bay? (Sorry if this appears basic, but I have no experience with such hardware.)
(b) The failed HDD can be swapped.
If they are in hotswap bays, yes. if they aren't, no.
Are the hot swap bays compatible with Linux mdadm RAID? i.e., upon detection of a disk failure, can the respective HDD LED on the bay be turned ON?
they have 2U servers with as many as 25 2.5" SAS bays now.
The 2U system is for an appliance that I am building and it will be a commercial product. I plan to order the "integrated" system from "value add" SIs of Supermicro/Tyan (whoever is able to satisfy the h/w spec). My storage requirement is 2TB (4x1TB disks), so a 6-disk bay should be sufficient (or whatever is the smallest available option).
I am trying to reduce the cost if I can get by with mdadm RAID10, plus additional tools to detect a failed drive and then rebuild the s/w RAID10 when a new disk is inserted. In some cases it will not be possible for me to service the unit, and for that reason I am looking for a visual clue (LED ON) so that I can guide the local sysadmin. I may have to go with H/W RAID if it is not possible to do the same with mdadm RAID.
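A rough sketch of the sort of thing I have in mind (the mail address, alert script path, and use of the ledmon package are assumptions, not something I have in place yet; the array is assumed to be /dev/md0):

  # /etc/mdadm.conf: mail on events and run a hook script on each one
  MAILADDR admin@example.com
  PROGRAM /usr/local/sbin/md-alert.sh

  # run the monitor as a daemon (the mdmonitor service does this on CentOS)
  mdadm --monitor --scan --daemonise

  # the PROGRAM hook is called as: <event> <md device> [<component device>],
  # so on a "Fail" event it could light the locate LED on the affected slot,
  # e.g. ledctl locate=/dev/sdb (only works if the backplane speaks SES/SGPIO)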
Thanks for your help. -- Arun Khan
Arun Khan wrote:
On Tue, Oct 11, 2011 at 12:46 PM, John R Pierce pierce@hogranch.com wrote:
On 10/11/11 12:11 AM, Arun Khan wrote:
In rack mount chassis, do the cages that house the hard disks have the following feature?
<snip>
Do these bays have a connector (+ cable) that is connected to the motherboard or RAID card to control the HDD LEDs in the bay? (Sorry if this appears basic, but I have no experience with such hardware.)
They have sleds. You screw a std. drive into one, and shove it in - literally, that's all there is to it.
(b) The failed HDD can be swapped.
If they are in hotswap bays, yes. if they aren't, no.
Are the hot swap bays compatible with Linux mdadm RAID? i.e., upon detection of a disk failure, can the respective HDD LED on the bay be turned ON?
Everything understands hot swap bays these days, and certainly Linux, like every other version of Unix, does. Let's see: I have well over a hundred rackmounts in our server rooms and the data center, all with hot swap, and 90% are running CentOS (and a very few RHEL, a couple of odd things, and a few WinDoze servers, which have hot swap also). <snip>
The 2U system is for an appliance that I am building and it will be a commercial product. I plan to order the "integrated" system from "value add" SIs of Supermicro/Tyan (whoever is able to satisfy the h/w spec). My storage requirement is 2TB (4x1TB disks), so a 6-disk bay should be sufficient (or whatever is the smallest available option).
You absolutely do *NOT* want anything but hot swap. Take a look at the Dell R[468]10's. <snip> mark
On 10/11/11 7:29 AM, Arun Khan wrote:
that requires specific configuration to suit whatever drive interconnect you have.
Do these bays have a connector (+ cable) that is connected to the motherboard or RAID card to control the HDD LEDs in the bay? (Sorry if this appears basic, but I have no experience with such hardware.)
Typically, a server will have a SAS backplane which SAS/SATA drives hot plug into, and 1 or more 4-channel SAS ports that plug into the host bus adapter or RAID controller. This SAS backplane usually has a 'SES' controller[*] embedded on it, which appears to the host as another SAS device and manages the LEDs. If it's a brand name server (HP, Dell, IBM, etc.) using the vendor's RAID cards, the LEDs all just work. If it's whitebox stuff with JBOD, getting the right failure LEDs to come on may require some custom configuration.
[*] SES supersedes the earlier SAF-TE design for the same functionality.
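On whitebox Linux setups, the userspace side of this is typically the ledmon package (a sketch only, assuming an SES- or SGPIO-capable backplane that the kernel can see; the device names are examples):

  # enclosure/slot devices the kernel knows about
  ls /sys/class/enclosure/
  # light the locate/identify LED for the slot holding /dev/sdb, then clear it
  ledctl locate=/dev/sdb
  ledctl locate_off=/dev/sdb
  # or run the ledmon daemon, which watches md (software RAID) arrays
  # and drives the status/fault LEDs by itself
  ledmon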
On 10/11/2011 03:29 PM, Arun Khan wrote:
Are the hot swap bays compatible with Linux mdadm RAID? i.e., upon detection of a disk failure, can the respective HDD LED on the bay be turned ON?
No, not all are. Only a few work with mdadm (or rather, in a way that mdadm can work with them; even the basic mdadm hotswap capability is newish, so test it a few times to make sure it works for your setup).
I am trying to reduce the cost if I can get by with mdadm RAID10, plus additional tools to detect a failed drive and ...
Also, mdraid10 isn't the same as a normal raid-10, unless you meant to imply that you are doing a raid10 with the md-raid tools.
- KB
On Tue, Oct 18, 2011 at 8:52 PM, Karanbir Singh mail-lists@karan.org wrote:
Also, mdraid10 isn't the same as a normal raid-10, unless you meant to imply that you are doing a raid10 with the md-raid tools.
Yes, the plan is to create raid10 with the md tools.
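For reference, the creation step would be along these lines (a sketch only; the device names and the choice of ext4 are hypothetical):

  # 4-disk RAID10 from partitions on four drives
  mdadm --create /dev/md0 --level=10 --raid-devices=4 \
        /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
  # record the array so it assembles at boot
  mdadm --detail --scan >> /etc/mdadm.conf
  # make the filesystem, then keep an eye on the initial sync
  mkfs.ext4 /dev/md0
  cat /proc/mdstat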
-- Arun Khan
Dear Arun,
On Tuesday, October 11, 2011 you wrote:
I would appreciate clarification on the following:
(a) Indicate disk failure. LED lights up and/or audio alarm? (b) The failed HDD can be swapped.
Don't rely on the LED going on. I mark all my hot swap disks with labels showing their serial number. This label is visible from the outside without removing the HD. That way, I can double check that I remove the faulty disk. Pulling the wrong disk is the last thing you want to risk in a RAID setup; relying on a fault LED alone is close to that. Also make a list of the HD serial numbers and their positions within the RAID ahead of time. Store that in a safe place.
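Building that list is quick (a sketch, assuming smartmontools and a reasonably recent util-linux are installed; device and array names are examples):

  # serial numbers for all block devices
  lsblk -o NAME,MODEL,SERIAL
  # or per drive, if lsblk doesn't show the serial
  smartctl -i /dev/sdb | grep -i serial
  # which member drives sit in which positions of the array
  mdadm --detail /dev/md0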
I once pulled the wrong disk out of a RAID 5 array. :-( You know what that means?
best regards --- Michael Schumacher PAMAS Partikelmess- und Analysesysteme GmbH Dieselstr.10, D-71277 Rutesheim Tel +49-7152-99630 Fax +49-7152-996333 Geschäftsführer: Gerhard Schreck Handelsregister B Stuttgart HRB 252024
Hi Michael,
On Fri, Oct 14, 2011 at 5:35 PM, Michael Schumacher wrote:
On Tuesday, October 11, 2011 you wrote:
I would appreciate clarification on the following:
(a) Indicate disk failure. LED lights up and/or audio alarm? (b) The failed HDD can be swapped.
Don't rely on the LED going on. I mark all my hot swap disks with labels showing their serial number. This label is visible from the outside without removing the HD. That way, I can double check that I remove the faulty disk. Pulling the wrong disk is the last thing you want to risk in a RAID setup; relying on a fault LED alone is close to that. Also make a list of the HD serial numbers and their positions within the RAID ahead of time. Store that in a safe place.
Thanks for these very helpful suggestions - good admin practice.
I once pulled the wrong disk out of a RAID 5 array. :-( You know what that means?
You mean, it is not OK to pull out a "functioning" disk? Pulling one disk out of RAID 5 should be OK. Am I missing something?
Thanks, -- Arun Khan
On 10/18/2011 04:11 PM, Arun Khan wrote:
You mean, it is not OK to pull out a "functioning" disk? Pulling one disk out of RAID 5 should be OK. Am I missing something?
Grab yourself a bunch of USB keys + a USB hub, fire up mdadm on your laptop, use those keys as target disks, and see how things work with mdadm and hotswap. Much fun to be had there. I would also recommend using CentOS 6.
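Something along these lines, for instance (a sketch only; the /dev/sdX names for the USB keys are just examples):

  # build a small RAID10 across four USB keys
  mdadm --create /dev/md9 --level=10 --raid-devices=4 \
        /dev/sdc /dev/sdd /dev/sde /dev/sdf
  # yank one key and watch how md reports the degraded array
  cat /proc/mdstat
  mdadm --detail /dev/md9
  # plug it (or a fresh key) back in, re-add it, then watch the rebuild
  mdadm --manage /dev/md9 --add /dev/sdc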
Pulling a disk that isn't set bad and deactivated in mdadm can cause some very funky results; best case, the machine freezes and you can reinsert the disk, boot up, and carry on. Worst case, you lose all the data on the array.
BTW, don't think that these issues don't affect hardware raid; they do. It's just that the management for these things is slightly more abstracted away and the controllers are better integrated with the disk cages.
- KB
On Tue, Oct 18, 2011 at 8:55 PM, Karanbir Singh mail-lists@karan.org wrote:
On 10/18/2011 04:11 PM, Arun Khan wrote:
You mean, it is not OK to pull out a "functioning" disk? Pulling one disk out of RAID 5 should be OK. Am I missing something?
Grab yourself a bunch of USB keys + a USB hub, fire up mdadm on your laptop, use those keys as target disks, and see how things work with mdadm and hotswap. Much fun to be had there. I would also recommend using CentOS 6.
Thanks for the suggestion - a great way to experiment.
From the feedback on this thread, I am leaning towards a h/w RAID controller.
Pulling a disk that isn't set bad and deactivated in mdadm can cause some very funky results; best case, the machine freezes and you can reinsert the disk, boot up, and carry on. Worst case, you lose all the data on the array.
I agree.
BTW, don't think that these issues don't affect hardware raid; they do. It's just that the management for these things is slightly more abstracted away and the controllers are better integrated with the disk cages.
About 10 years ago, I had a h/w RAID controller go bad (HDDs connected via SCSI cable, no HDD bays involved). The replacement card re-created the RAID array and all data was lost. I did have a backup to restore most of the data.
-- Arun Khan
On Tue, Oct 18, 2011 at 11:05 AM, Arun Khan knura9@gmail.com wrote:
BTW, don't think that these issues don't affect hardware raid; they do. It's just that the management for these things is slightly more abstracted away and the controllers are better integrated with the disk cages.
About 10 years ago, I had a h/w RAID controller go bad (HDDs connected via SCSI cable, no HDD bays involved). The replacement card re-created the RAID array and all data was lost. I did have a backup to restore most of the data.
I don't think anything is immune to failure. Another fun case is a randomly-bad memory bit causing different things to be written to software raid mirrors. I had one that took 3+ days of running memtest86 to catch.
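A periodic consistency check is one way that kind of silent divergence shows up (a sketch, assuming the array is md0):

  # ask md to compare the mirror halves
  echo check > /sys/block/md0/md/sync_action
  # a non-zero count here means the copies disagree somewhere
  cat /sys/block/md0/md/mismatch_cnt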
On Tuesday, October 18, 2011 01:07:02 PM Les Mikesell wrote:
I don't think anything is immune to failure. Another fun case is a randomly-bad memory bit causing different things to be written to software raid mirrors. I had one that took 3+ days of running memtest86 to catch.
ECC RAM?
On Wed, Oct 19, 2011 at 2:33 PM, Lamar Owen lowen@pari.edu wrote:
On Tuesday, October 18, 2011 01:07:02 PM Les Mikesell wrote:
I don't think anything is immune to failure. Another fun case is a randomly-bad memory bit causing different things to be written to software raid mirrors. I had one that took 3+ days of running memtest86 to catch.
ECC RAM?
The server said it was one-bit-correcting or something like that. I thought it was supposed to stop if it had errors it couldn't correct. I swapped the whole set out at once without digging much more into the details.
On Tue, Oct 18, 2011 at 10:11 AM, Arun Khan knura9@gmail.com wrote:
I would appreciate clarification on the following:
(a) Indicate disk failure. LED lights up and/or audio alarm? (b) The failed HDD can be swapped.
Don't rely on the LED going on. I mark all my hot swap disks with labels showing their serial number. This label is visible from the outside without removing the HD. That way, I can double check that I remove the faulty disk. Pulling the wrong disk is the last thing you want to risk in a RAID setup; relying on a fault LED alone is close to that. Also make a list of the HD serial numbers and their positions within the RAID ahead of time. Store that in a safe place.
Thanks for these very helpful suggestions - good admin practice.
I once pulled the wrong disk out of a RAID 5 array. :-( You know what that means?
You mean, it is not OK to pull out a "functioning" disk? Pulling one disk out of RAID 5 should be OK. Am I missing something?
Usually you would be swapping drives to repair an already-broken RAID. Unless you have a hot spare and the RAID has already rebuilt onto it, pulling a working disk will take a second drive out of the degraded RAID 5 and kill it.
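For completeness, the usual mdadm replacement sequence looks like this (a sketch only; the device names are hypothetical):

  # mark the bad member failed (if md hasn't already) and remove it
  mdadm --manage /dev/md0 --fail /dev/sdc1
  mdadm --manage /dev/md0 --remove /dev/sdc1
  # physically swap the drive, partition it to match, then add it back
  mdadm --manage /dev/md0 --add /dev/sdc1
  # watch the rebuild
  cat /proc/mdstat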
On Tue, Oct 18, 2011 at 8:57 PM, Les Mikesell lesmikesell@gmail.com wrote:
On Tue, Oct 18, 2011 at 10:11 AM, Arun Khan knura9@gmail.com wrote:
I pulled ONCE the wrong disk out of a Raid5 array. :-( You know what that means?
You mean, it is not OK to pull out a "functioning" disk? Pulling one disk out of RAID 5 should be OK. Am I missing something?
Usually you would be swapping drives to repair an already-broken RAID. Unless you have a hot spare and the RAID has already rebuilt onto it, pulling a working disk will take a second drive out of the degraded RAID 5 and kill it.
Thanks, I get it now :)
-- Arun Khan