On Sun, Jan 22, 2012 at 1:34 PM, Keith Keller <kkeller@wombat.san-francisco.ca.us> wrote:
On 2012-01-22, Boris Epstein <borepstein@gmail.com> wrote:
Also, here's something else I have discovered. Apparently there is a potential intermittent RAID disk problem. At least, I found the following in the system log:
Jan 22 09:17:53 nrims-bs kernel: 3w-9xxx: scsi6: AEN: ERROR (0x04:0x0026): Drive ECC error reported:port=4, unit=0.
Jan 22 09:17:53 nrims-bs kernel: 3w-9xxx: scsi6: AEN: ERROR (0x04:0x002D): Source drive error occurred:port=4, unit=0.
Which 3ware controller is this? I have had lots of problems with the 3ware 9550SX controller and WD-EA[RD]S drives in a similar configuration. (Yes, I know all about the EARS drives, but they work mostly fine with the 3ware 9650 controller, so I suspect some weird interaction between the cheap drives and the old not-so-great controller. I also suspect an intermittently failing port, which I'll be testing more later this week.)
Jan 22 09:55:23 nrims-bs kernel: 3w-9xxx: scsi6: AEN: WARNING (0x04:0x000F): SMART threshold exceeded:port=9.
Jan 22 09:55:23 nrims-bs kernel: 3w-9xxx: scsi6: AEN: WARNING (0x04:0x000F): SMART threshold exceeded:port=9.
Jan 22 09:56:17 nrims-bs kernel: 3w-9xxx: scsi6: AEN: INFO (0x04:0x000B): Rebuild started:unit=0.
What does your RAID look like? Are you using the 3ware's RAID6 (in which case it's not a 9550) or mdraid? Are the 3ware errors in the logs across a large number of ports or just a few? Have you used the drive tester for your drives to verify that they're still good? On all my other systems, when the controller has reported a failure, and I've run it through the tester, it's reported a failure. (Often when my 9550 reports a failure the drive passes all tests.)
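To answer the "how many ports" question quickly, something like the following rough Python sketch could tally the AEN messages per port. It assumes the log lines look like the ones quoted above; the function name count_aen_by_port and the regular expressions are my own illustration, not anything shipped by 3ware or LSI, and other driver versions may format these messages differently.

```python
import re
from collections import Counter

# Assumed layout, based only on the lines quoted in this thread:
# "3w-9xxx: scsi6: AEN: ERROR (0x04:0x0026): Drive ECC error reported:port=4, unit=0."
AEN_RE = re.compile(
    r"3w-9xxx:\s+scsi\d+:\s+AEN:\s+(?P<level>\w+)\s+"
    r"\(0x[0-9A-Fa-f]+:0x[0-9A-Fa-f]+\):\s*"
    r"(?P<message>[^:]+):(?P<fields>[^.\n]*)"
)
PORT_RE = re.compile(r"port=(\d+)")


def count_aen_by_port(log_text):
    """Return a Counter keyed by (port, level) for every AEN entry naming a port.

    Entries without a port= field (e.g. "Rebuild started:unit=0") are skipped.
    """
    counts = Counter()
    # finditer over the whole text also copes with entries fused onto one line.
    for match in AEN_RE.finditer(log_text):
        port = PORT_RE.search(match.group("fields"))
        if port:
            counts[(int(port.group(1)), match.group("level"))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python aen_ports.py /var/log/messages   (path is an example)
    with open(sys.argv[1], errors="replace") as handle:
        for (port, level), n in sorted(count_aen_by_port(handle.read()).items()):
            print(f"port {port}: {n} x {level}")
```

If the counts cluster on one or two ports, that points toward specific drives or ports; a spread across many ports would make the controller or cabling more suspect.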
If you happen to have real RAID drive models, you may also try contacting LSI support. They will steadfastly refuse to help if you have desktop-edition drives, but can be at least somewhat helpful if you have enterprise drives.
--keith
-- kkeller@wombat.san-francisco.ca.us
Keith, thanks!
The RAID is at the controller level. Yes, I believe the controller is a 3ware 9xxx series - I don't recall the details right now.
What are you referring to as "drive tester"?
Boris.