[CentOS] Aide error "Caught SIGBUS/SEGV"

Fri May 7 17:56:10 UTC 2010
m.roth at 5-cent.us <m.roth at 5-cent.us>

> m.roth at 5-cent.us wrote:
>>> Bowie Bailey wrote:
>>>> m.roth at 5-cent.us wrote:
>>>>> Brian wrote:
>>>>>>> [mailto:centos-bounces at centos.org] On Behalf Of Bowie Bailey
>>>>>>> Bowie Bailey wrote:
>>>>>>>
>>>>>>>> One of my servers has recently started giving an error every time
>>>>>>>> I run "aide --check".  I ran it manually twice today with the
>>>>>>>> same results. The second time, I added the -V flag, but that
>>>>>>>> didn't give me anything useful.  The system is currently running
>>>>>>>> CentOS 5.3.
>>>>>>>>
>> <snip>
>>
>>>>>> Suggest: Rename your current database, run aide -i to build a new
>>>>>> one, then aide -C to check it.
>>>>>> If that works (aide -C on new database) I'd suspect (pulling stray
>>>>>> thoughts out of /dev/chaos) that your current data base is corrupt
>>>>>> enough that you can't check it.
>>>>>>
>>>> I will try re-initializing the database.  That's a good idea that
>>>> hadn't occurred to me for whatever reason...  :)
>>>>
>>> No dice.  I tried running 'aide --init' and it died with the exact same
>>> error.
>>>
>>> Maybe I should just try reinstalling it.  Any other ideas?
>>>
>> mysqldump. Have you looked at the logs for mysql itself?
>>
> What does mysql have to do with it?  I don't have mysql installed on
> this machine.

Sorry, don't know aide, but you mentioned a database. I was suggesting, in
a broader sense, dumping the database to a backup and rebuilding the
*entire* d/b, including the control files.
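For AIDE itself, the rebuild suggested earlier in the thread comes down to a short sequence. The sketch below only prints the commands (a dry run). The paths /var/lib/aide/aide.db.gz and /var/lib/aide/aide.db.new.gz are assumed CentOS defaults for the database and database_out settings in /etc/aide.conf. The ".old" backup name and the reinit_plan helper are hypothetical, so check your own config first. Since aide --init builds a new database without reading the old one, a crash during --init (as reported here) suggests the old database is unlikely to be the culprit.

```python
from __future__ import annotations

import shlex
from pathlib import Path

# Assumed CentOS defaults (database / database_out in /etc/aide.conf).
DB = Path("/var/lib/aide/aide.db.gz")
DB_NEW = Path("/var/lib/aide/aide.db.new.gz")


def reinit_plan(db: Path = DB, db_new: Path = DB_NEW) -> list[list[str]]:
    """Return the commands that rebuild the AIDE database and re-check it."""
    backup = db.with_name(db.name + ".old")  # hypothetical backup name
    return [
        ["mv", str(db), str(backup)],   # set the suspect database aside
        ["aide", "--init"],             # build a fresh database (written to db_new)
        ["mv", str(db_new), str(db)],   # make the fresh database the active one
        ["aide", "--check"],            # compare the filesystem against it
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in reinit_plan():
        print(shlex.join(cmd))
```

To actually run the steps, each command could be passed to subprocess.run(cmd, check=True) as root, which stops at the first command that fails.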
>
> However, the comment about looking at the logs pointed me to a related
> issue.  I am seeing this in my logs:
>
>         kernel: attempt to access beyond end of device
>         kernel: dm-0: rw=0, want=4344463064, limit=126550016
>
> Looks like I may have some corruption on the disk.  When I get a chance,
> I'll take it down and run fsck to see if that will help.

Ack! No, that doesn't look good at all. It's almost as though the disk is
full, or there's something that makes the kernel think it is.
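The two numbers in that kernel message are counts of 512-byte sectors: limit is the size of dm-0, and want is the end of the requested read (rw=0). A quick conversion, with the values copied from the log above, shows how far past the end of the device the request reached. The sectors_to_gib helper is just an illustrative name.

```python
SECTOR_BYTES = 512  # the block layer reports want/limit in 512-byte sectors


def sectors_to_gib(sectors: int) -> float:
    """Convert a sector count to GiB (2**30 bytes)."""
    return sectors * SECTOR_BYTES / 2**30


want, limit = 4344463064, 126550016  # from "dm-0: rw=0, want=..., limit=..."

print(f"device size: {sectors_to_gib(limit):.2f} GiB")  # device size: 60.34 GiB
print(f"requested  : {sectors_to_gib(want):.2f} GiB")   # requested  : 2071.60 GiB
print(f"overshoot  : {want / limit:.1f}x")              # overshoot  : 34.3x
```

A read ending roughly 34 times beyond a ~60 GiB volume suggests a bogus block address, for example one read from damaged filesystem metadata; a full filesystem would normally report "No space left on device" instead of this message. That is only an inference from the numbers, which fsck should confirm or rule out.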

        mark