[CentOS] weird XFS problem

Sun Jan 22 21:44:57 UTC 2012
Ross Walker <rswwalker at gmail.com>

On Jan 22, 2012, at 4:41 PM, Ross Walker <rswwalker at gmail.com> wrote:

> On Jan 22, 2012, at 10:00 AM, Boris Epstein <borepstein at gmail.com> wrote:
> 
>> Jan 22 09:17:53 nrims-bs kernel: 3w-9xxx: scsi6: AEN: ERROR (0x04:0x0026):
>> Drive ECC error reported:port=4, unit=0.
>> Jan 22 09:17:53 nrims-bs kernel: 3w-9xxx: scsi6: AEN: ERROR (0x04:0x002D):
>> Source drive error occurred:port=4, unit=0.
>> Jan 22 09:17:53 nrims-bs kernel: 3w-9xxx: scsi6: AEN: ERROR (0x04:0x0004):
>> Rebuild failed:unit=0.
>> Jan 22 09:17:53 nrims-bs kernel: 3w-9xxx: scsi6: AEN: INFO (0x04:0x003B):
>> Rebuild paused:unit=0.
> 
> From 3ware's site:
> 004h Rebuild failed
> 
> The 3ware RAID controller was unable to complete a rebuild operation. This error can be caused by drive errors on either the source or the destination of the rebuild. However, due to ATA drives' ability to reallocate sectors on write errors, the rebuild failure is most likely caused by the source drive of the rebuild detecting some sort of read error. The default operation of the 3ware RAID controller is to abort a rebuild if an error is encountered. If it is desired to continue on error, you can set the Continue on Source Error During Rebuild policy for the unit on the Controller Settings page in 3DM.
> 
> 026h Drive ECC error reported
> 
> This AEN may be sent when a drive returns the ECC error response to a 3ware RAID controller command. The AEN may or may not be associated with a host command. Internal operations such as Background Media Scan post this AEN whenever drive ECC errors are detected.
> 
> Drive ECC errors are an indication of a problem with grown defects on a particular drive. For redundant arrays, this typically means that dynamic sector repair would be invoked (see AEN 023h). For non-redundant arrays (JBOD, RAID 0 and degraded arrays), drive ECC errors result in the 3ware RAID controller returning failed status to the associated host command.
> 
> Sounds awfully like a hardware error on one of the drives. Replace the failed drive and try rebuilding.
> 

This error code does not bode well.

02Dh Source drive error occurred

If an error is encountered during a rebuild operation, this AEN is generated if the error was on a source drive of the rebuild. Knowing if the error occurred on the source or the destination of the rebuild is useful for troubleshooting.
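
To make it easier to see which port and unit each AEN refers to, here is a rough Python sketch that pulls the fields out of 3w-9xxx kernel log lines like the ones Boris posted. It assumes each message sits on a single line (the mail client wrapped them above), and the regular expression is inferred only from that excerpt, so other driver versions may format things differently. The function name parse_aen and the AEN_NAMES table are made up for illustration; the table holds just the four codes that appear in this thread.

```python
import re

# Codes and names exactly as they appear in the log excerpt in this thread.
# This is NOT a complete list of 3ware AEN codes.
AEN_NAMES = {
    0x0004: "Rebuild failed",
    0x0026: "Drive ECC error reported",
    0x002D: "Source drive error occurred",
    0x003B: "Rebuild paused",
}

# Pattern inferred from the quoted log lines (an assumption, not a spec).
AEN_RE = re.compile(
    r"3w-9xxx: scsi(?P<host>\d+): AEN: (?P<severity>\w+) "
    r"\(0x(?P<cls>[0-9A-Fa-f]+):0x(?P<code>[0-9A-Fa-f]+)\): "
    r"(?P<message>[^:]+):(?P<fields>.*)$"
)


def parse_aen(line):
    """Return a dict describing one AEN log line, or None if it does not match."""
    m = AEN_RE.search(line)
    if m is None:
        return None

    # Trailing part looks like "port=4, unit=0." or just "unit=0."
    fields = {}
    for part in m.group("fields").rstrip(".").split(","):
        key, sep, value = part.strip().partition("=")
        if sep and value.isdigit():
            fields[key] = int(value)

    code = int(m.group("code"), 16)
    return {
        "severity": m.group("severity"),
        "code": code,
        # Fall back to the text in the log line for codes not in the table.
        "name": AEN_NAMES.get(code, m.group("message")),
        "port": fields.get("port"),
        "unit": fields.get("unit"),
    }


if __name__ == "__main__":
    sample = (
        "Jan 22 09:17:53 nrims-bs kernel: 3w-9xxx: scsi6: AEN: ERROR "
        "(0x04:0x002D): Source drive error occurred:port=4, unit=0."
    )
    print(parse_aen(sample))
```

Run over the full log, something like this would show whether the source drive errors all point at the same port or at several.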

It's possible the whole RAID6 is corrupt.

-Ross