Hi All,
I know many of us here manage RAID on our CentOS-based servers, so this may be of interest to us all.
I ordered three new "Enterprise hard drives" this month from a well-known UK online retailer. The drives arrived as new in their anti-static packaging. Before using one of the drives in a mission-critical hardware RAID I checked the SMART attributes and was amazed at what I saw; a few of the attributes are listed below.
  1 Raw_Read_Error_Rate     0x002f   200   200   051   Pre-fail   -   2600
  9 Power_On_Hours          0x0032   098   097   000   Old_age    -   2106
 12 Power_Cycle_Count       0x0032   100   100   000   Old_age    -   80
198 Offline_Uncorrectable   0x0030   196   196   000   Old_age    -   398
200 Multi_Zone_Error_Rate   0x0008   180   180   000   Old_age    -   4077
So for a brand new packaged drive this was a bit of a surprise. "2106" power-on hours obviously should be zero for a new drive, and with "398" Offline_Uncorrectable sectors this is a well-used and faulty drive. I contacted the (very well known) manufacturer of the drive and asked for information on the serial number. I was told the serial number of the drive was region-specific to the USA and should not even be in the UK. I opened and tested the second and third drives, with similar results. I was told two of the drives had already been returned under warranty and replaced with new drives. Wow... I was also told by the online retailer that this is known as a grey import and is not that uncommon.
So it may be a good policy to check the SMART attributes of drives before deployment!
Cheers, Steve
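For anyone who wants to automate the check described above, here is a minimal Python sketch (not part of the original post). It parses attribute rows in the layout pasted above and flags raw values that look wrong for a drive sold as new. The limits in NEW_DRIVE_LIMITS and the function names are assumptions made up for this example, not manufacturer guidance.

```python
# Raw-value limits for a drive sold as new. These numbers are assumptions
# chosen for illustration; adjust them to your own burn-in expectations.
NEW_DRIVE_LIMITS = {
    "Power_On_Hours": 48,
    "Power_Cycle_Count": 20,
    "Offline_Uncorrectable": 0,
}


def parse_attributes(text):
    """Return {attribute_name: raw_value} from SMART attribute rows.

    A row is accepted when it starts with a numeric ID, has a hex flag in
    the third column, and ends in a plain integer raw value. Rows whose raw
    value is not a plain integer are skipped.
    """
    raw = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 4 or not parts[0].isdigit():
            continue
        if not parts[2].startswith("0x") or not parts[-1].isdigit():
            continue
        raw[parts[1]] = int(parts[-1])
    return raw


def suspicious(raw, limits=NEW_DRIVE_LIMITS):
    """List the attributes whose raw value exceeds the 'new drive' limit."""
    return [
        f"{name}={raw[name]} (limit {limit})"
        for name, limit in limits.items()
        if name in raw and raw[name] > limit
    ]


if __name__ == "__main__":
    # Three of the rows quoted in the post above.
    sample = """\
  9 Power_On_Hours        0x0032 098 097 000 Old_age - 2106
 12 Power_Cycle_Count     0x0032 100 100 000 Old_age - 80
198 Offline_Uncorrectable 0x0030 196 196 000 Old_age - 398
"""
    for finding in suspicious(parse_attributes(sample)):
        print(finding)
```

In practice the text would come from something like `smartctl -A /dev/sdX`, where /dev/sdX is a placeholder for the drive under test.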
On Wed, Oct 02, 2013 at 05:24:54PM +0100, Steve Brooks wrote:
> 9 Power_On_Hours 0x0032 098 097 000 Old_age - 2106
> 12 Power_Cycle_Count 0x0032 100 100 000 Old_age - 80
>
> replaced with new drives. Wow... I was also told by the online retailer this is known as a grey import and is not that uncommon..
Grey imports would not have been running for 87 days and power cycled 80 times in that period.
If the retailer doesn't refund your money then you need to escalate.
And name the retailer...
The retailer is certainly willing to refund and the manufacturer is also willing to replace. The worrying part is that drives that were replaced under warranty should *not* find their way back onto the shelves, re-packaged as new enterprise-class drives.
Steve
Wow, I'm of the belief that for every 1 that gets caught, 3 get away.
This news is alarming but not surprising.
SMART testing batches of 40 drives is a PITA but a good idea.
I better get krakin :)
- aurf
On Wed, Oct 2, 2013 at 11:51 PM, Steve Brooks steveb@mcs.st-and.ac.uk wrote:
> The retailer is certainly willing to refund and the manufacturer is also willing to replace. The worrying part is that drives that were replaced under warranty should *not* find their way back onto the shelves, re-packaged as new enterprise-class drives.
Thanks for the heads-up. After a slew of HDD failures, I now run smartctl and badblocks on every drive before putting it into production. However, this may not be practical when a storage system has many disks.
Usually repaired drives are marked "Refurbished" if the RMA is handled by the manufacturer directly. When the RMA is handled by the retailer, who knows what instructions management gives regarding returns.
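When there are many disks, the SMART part of the check can at least be run across all of them at once. The following is a minimal Python sketch of that idea, not something from the thread: it calls `smartctl -A` on each device from a thread pool and collects the raw Power_On_Hours value. The function names, the worker count, and the device paths in the usage note are assumptions for illustration, and drives that report the attribute under another name or in a non-integer format simply come back as None.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def read_power_on_hours(device, run=subprocess.run):
    """Return the raw Power_On_Hours value from `smartctl -A`, or None.

    `run` is injectable so the parsing can be exercised without real drives.
    """
    result = run(["smartctl", "-A", device], capture_output=True, text=True)
    for line in result.stdout.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1] == "Power_On_Hours" and parts[-1].isdigit():
            return int(parts[-1])
    return None


def survey(devices, run=subprocess.run, workers=8):
    """Query several drives concurrently; returns {device: hours or None}."""
    devices = list(devices)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hours = pool.map(lambda dev: read_power_on_hours(dev, run), devices)
        return dict(zip(devices, hours))
```

A call such as `survey(["/dev/sda", "/dev/sdb"])` (hypothetical device names, usually needing root) gives one number per drive, which is quick to eyeball even for a batch of 40. It does not replace badblocks, which still has to touch every sector.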
On 02/10/2013 17:28, Stephen Harris wrote:
> And name the retailer...
+1000
Come on, this isn't the BBC, name the retailer and the manufacturer...
In fact it is prudent to completely test each drive before use. I recommend scripting several rounds of dd, badblocks, all the SMART self-tests, and some hdparm interaction.
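As a rough illustration of what such a script might look like, here is a minimal Python sketch that only builds and prints the command sequence rather than running it. The ordering, the number of rounds, and the function name are assumptions of this example, not the poster's actual procedure. Note that `dd` to the raw device and `badblocks -w` both destroy all data on the drive, and that `smartctl -t` starts a self-test in the background, so a real script would have to wait for it to finish before moving on.

```python
import shlex


def burn_in_plan(device, rounds=2):
    """Return the argv lists for a DESTRUCTIVE burn-in of `device`, in order."""
    plan = [
        ["smartctl", "-a", device],           # record SMART data before testing
        ["hdparm", "-I", device],             # drive identification details
        ["smartctl", "-t", "short", device],  # short self-test (runs in background)
    ]
    for _ in range(rounds):
        # Overwrite the whole device, then run a write-mode surface scan.
        plan.append(["dd", "if=/dev/zero", f"of={device}", "bs=1M"])
        plan.append(["badblocks", "-wsv", device])
    plan.append(["hdparm", "-tT", device])    # quick read-timing check
    plan.append(["smartctl", "-t", "long", device])
    plan.append(["smartctl", "-a", device])   # compare against the first report
    return plan


if __name__ == "__main__":
    # /dev/sdX is a placeholder; nothing is executed here, only printed.
    for argv in burn_in_plan("/dev/sdX"):
        print(shlex.join(argv))
```

Printing the plan first makes it easy to review exactly which device is about to be wiped before handing the commands to a shell or to `subprocess.run`.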
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
From: Steve Brooks steveb@mcs.st-and.ac.uk
> I ordered three new "Enterprise hard drives" this month from a well known UK online retailer. The drives arrived as new in their anti-static packaging. Before using one of the drives in a mission critical hardware raid I checked the SMART attributes and was amazed at what I saw; see a few of the attributes listed below
There's also the grey area of "like new", refurbished (by the manufacturer, or even the vendor), etc., especially on eBay.
When I did some server support for a big name, I learned that the (very expensive) repair parts they sold to clients whose equipment was out of warranty were all refurbished parts (from other clients).
JD
On 10/3/2013 8:36 AM, John Doe wrote:
> There's also the grey area of the "like new", refurbished (by the manufacturer, or even the vendor), etc... especially on ebay.

A LOT of 'refurbished' stuff is refurbed by third parties, neither the OEM nor the reseller, and then reinserted into the grey-market retail stream. In these cases traceability and accountability can be hard to come by.
Now, it's a fact that something like 80% of product returns are 100% AOK; they were returned for stupid reasons, pilot error, etc. The refurbishers often do little more than a quick functional test, relabel (if you're lucky) and repack. Big discount resellers like Fry's are notorious for stocking this sort of junk (never mind the stuff they repack in-house).
Hey,
I was wondering about enterprise class drives: Do you really expect the drive to be shipped to you before even a basic validation test? Do you understand that a basic spin of the car is needed to make sure that all the parts are fine and the car actually works? I try to imagine it like this: "Hmm, OK, this is your new car. Bam! Oh, we forgot to start the engine and make sure that you have a bit of gas to make it to the next gas station. Oh, and sorry, this is the first time we have turned the key since the car was assembled, so feel free to test it for us."
Eliezer
On 10/05/2013 02:57 AM, Peter wrote:
> On 10/05/2013 11:39 AM, Eliezer Croitoru wrote:
>> Hey,
>> I was wondering about enterprise class drives: Do you really expect the drive to be shipped to you before even a basic validation test?
>
> I would expect 24 or maybe 48 hours for a burn-in, but not 87 days.

OK, so it is clear now that a new drive should be tested but not be *used* :D
Eliezer
On 07/10/2013 00:49, Eliezer Croitoru wrote:
> On 10/05/2013 02:57 AM, Peter wrote:
>> On 10/05/2013 11:39 AM, Eliezer Croitoru wrote:
>>> I was wondering about enterprise class drives: Do you really expect the drive to be shipped to you before even a basic validation test?

Hello,
I think any test should have nothing to do with these counters. If I buy a new hard drive, I expect the counters to be at zero. The tests are made (or should be) by the manufacturer, and after testing they can reset these counters to zero.
Levi
Hey Levi,
This is another angle that you are talking about. I would not worry about it that much if the drive is sealed with the manufacturer's stamp on it. What would have been done on the drive? Somebody transferred some data? These counters are there for a reason, and I would want the manufacturer to do a couple of tests. If the seal means that all tests were done on the motor/engine and the electronic board (which are put together from a couple of parts/places), I would want them to test the whole drive for me, to make sure that no screw is loose and that the hardware can complete a full run without failing at all. If the testing tools are accurate enough to remove the need for a *RUN* test, I do not mind them leaving the drive assembled as is, and that's it. The drive pin/head should be docked and locked for the whole time the drive is in delivery, etc.
I am still waiting for a WD or Seagate representative to describe for me the details of how a drive is made, from 0 to 100.
Eliezer
On 10/7/2013 5:59 AM, Eliezer Croitoru wrote:
> I am still waiting for a WD or Seagate representative to describe for me the details of how a drive is made, from 0 to 100.

I'm sure they both consider that information a trade secret.
It's my understanding that testing done on the factory floor leaves the counters cleared when the final firmware is installed. Ditto factory 'remanufactured' aka 'refurbished' drives that are tested and relabeled: they get cleared after test. The last one of these I got, sold as such, had a different colored label (green instead of silver) and clearly said remanufactured; I'm pretty sure its SMART data was also reset. What the OP got appears to be a drive that was returned, retested and resold somewhere in the distributor-retailer chain, NOT by the factory, hence what people refer to as 'grey market'.
On 07/10/2013 19:28, John R Pierce wrote:
> It's my understanding that testing done on the factory floor leaves the counters cleared when the final firmware is installed. Ditto factory 'remanufactured' aka 'refurbished' drives that are tested and relabeled: they get cleared after test. The last one of these I got, sold as such, had a different colored label (green instead of silver) and clearly said remanufactured; I'm pretty sure its SMART data was also reset. What the OP got appears to be a drive that was returned, retested and resold somewhere in the distributor-retailer chain, NOT by the factory, hence what people refer to as 'grey market'.
I've replaced a number of Seagate 1TB SAS drives (Constellations, I think), and at least 2 of the 3 replacements I've done were with drives clearly marked as for RMA REPLACEMENT ONLY, which I assume are previously 'failed' drives that have gone back, been re-assessed / refurbished and put back into the market.
I don't know much about SMART, but I get the impression that the drives decide to fail themselves when some metric goes anomalous, rather than continue running and potentially cause data corruption. Therefore there's likely to be a large number of drives that can be tweaked to go back into production after they have 'failed'.
If I buy a drive from a retailer, though, I expect a factory 'new' one, hence my request for the manufacturer and retailer to be named by the OP.
On 10/8/2013 2:17 AM, Giles Coochey wrote:
> I don't know much about SMART, but I get the impression that the drives decide to fail themselves when some metric goes anomalous, rather than continue running and potentially cause data corruption. Therefore there's likely to be a large number of drives that can be tweaked to go back into production after they have 'failed'

The metrics in SMART are there to tell the OS when failure is imminent. For various complex reasons they aren't as effective at this as one would like.