[CentOS] CentOS 6.2 + areca raid + xfs problems

Crunch numbercruncher245 at gmail.com
Wed Apr 4 09:26:33 UTC 2012


On 04/03/2012 05:58 PM, Tony Schreiner wrote:
> Two weeks ago I (clean-)installed CentOS 6.2 on a server which had been running 5.7.
>
> There is a 16 disk = ~11 TB data volume running on an Areca ARC-1280 raid card with LVM + xfs filesystem on it. The included arcmsr driver module is loaded.
>
> At first it seemed ok, but within a few hours I started getting I/O error messages on directory listings, and then a bit later when I did a vgdisplay command there was garbage in that.

The file system data are being corrupted. Assuming the original 
installation was okay, this can only happen through human intervention 
or hardware failure. That seems a safe assumption to make, considering 
you've reinstalled and it now seems to be okay.

>
> I then ran the volume check on the RAID card bios, it flagged 3 errors. When I restarted the system, things were ok, but then the problem reappeared.
> I ran another volume check and no errors were flagged (I should note, the check takes about 9 hours). Upon restarting, the file system was ok, but then it went bad again.

Presumably the card BIOS checks only the firmware and/or the hardware, 
i.e. the disks and the card itself. The reported errors therefore point 
to those components.

>
> Another symptom was that the cli64 raid management utility, which I got from the Areca site, would just hang.

I would guess the utility is a piece of client code that queries the 
firmware. Assuming nothing is wrong with the client code, a hang implies 
some kind of fault at the firmware level. It could be unresponsive 
hardware or corrupt firmware code.

>
> After a couple of days of this, I decided I could not afford to have this system unavailable, and I reinstalled CentOS 5.8. Everything has been fine since.

The firmware and file system may well have corrected the errors on your 
first volume check. But for the corruption to return without any errors 
being detected sounds inconsistent; something is missing here. Maybe the 
card silently corrected the errors itself the second time, leaving 
corruption behind.
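If the problem shows up again, it may help to pull the relevant lines 
out of the kernel log to see whether the driver, the file system, or the 
block layer complains first. Below is a rough sketch only, not an Areca 
or CentOS tool: the function name and the keyword list are my own 
assumptions, and the keywords will likely need adjusting for your logs.

```python
import re

# Keywords are assumptions picked for illustration (driver name,
# file system name, generic I/O error text); adjust as needed.
PATTERN = re.compile(r"arcmsr|xfs|i/o error", re.IGNORECASE)


def suspicious_lines(log_text):
    """Return the lines of log_text that mention any keyword, in order."""
    return [line for line in log_text.splitlines() if PATTERN.search(line)]
```

You could feed it saved kernel log text, for example the contents of 
/var/log/messages, and look at which kind of message appears first. 
This only filters text; it does not diagnose anything by itself.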



