[CentOS] Looking for Insights

Shawn Everett

shawn at tandac.com
Sat Dec 1 18:46:59 UTC 2007


Hi Guys,

I had a strange problem yesterday and I'm curious as to what everyone 
thinks.

I have a client with a Red Hat Enterprise 2.1 cluster.  It is all quality
HP equipment, with an MSA 500 storage array acting as the shared storage
between the two nodes in the cluster.

This cluster is configured for reliability and not load balancing.  All
work is handled by one node or the other, not both.

There are two 100GB RAID 5 logical drives in the MSA 500.  Linux sees them
as /dev/md2 and /dev/md3 respectively.  Running cat /proc/mdstat shows 
them as "active multipath" and otherwise healthy.
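For anyone wanting to run the same check, here is a minimal sketch that
pulls the device name and status words out of mdstat-style text.  The
function name md_status is hypothetical, and it assumes the usual
"mdN : active <personality> ..." line layout.

```shell
# Print "<device> <state> <personality>" for each md device found in
# mdstat-format text on stdin.
# md_status is a hypothetical helper name, not part of any package.
md_status() {
    # Device lines are assumed to look like:
    #   md3 : active multipath <member devices...>
    # so field 1 is the device, field 3 the state, field 4 the personality.
    awk '/^md[0-9]+ :/ { print $1, $3, $4 }'
}

# Usage on a live system:
#   md_status < /proc/mdstat
```

This only reports what the md layer believes about the device.  It says
nothing about whether writes actually reached the array.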

There is a nightly shell script that runs and backs up information via tar 
to a USB external drive.  The last thing the script does before unmounting 
the USB drive is to run the sync command.
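The script itself is not reproduced here.  A rough sketch of that shape,
with a hypothetical function name and hypothetical paths, would be:

```shell
# Archive a directory to a file, then flush buffered writes to disk.
# backup_to_usb is a hypothetical name; both paths come from the caller.
backup_to_usb() {
    src=$1
    archive=$2
    # -C changes into the parent directory first, so the archive stores
    # paths relative to it rather than absolute paths.
    tar -cf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    # sync flushes dirty buffers for every mounted filesystem, not just
    # the USB drive, so a stall on any device can hold it up.
    sync
}

# Usage, assuming the USB drive is mounted at /mnt/usb (an assumption):
#   backup_to_usb /home/shared /mnt/usb/backup.tar && umount /mnt/usb
```

Note that sync is system-wide, so it also has to wait on the shared
storage, not only on the USB drive.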

Yesterday it was noticed that the backup script was hung.  A quick check 
via "ps aux" showed the backup script and a sync process still hanging 
around from Monday night.
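One thing worth checking in this situation is whether the stuck processes
are in uninterruptible sleep (state D in the STAT column), since signals
are not delivered to a process until it leaves that state.  I did not
confirm that this was the case here.  A minimal sketch, with a
hypothetical function name, that filters "ps aux" output for that state:

```shell
# List PID, STAT and command line of processes in uninterruptible sleep,
# reading "ps aux"-format text on stdin.
# blocked_procs is a hypothetical helper name.
blocked_procs() {
    # Assumes the standard "ps aux" column order, where field 2 is the
    # PID, field 8 is STAT and the command starts at field 11.
    awk 'NR > 1 && $8 ~ /^D/ {
        cmd = $11
        for (i = 12; i <= NF; i++) cmd = cmd " " $i
        print $2, $8, cmd
    }'
}

# Usage:
#   ps aux | blocked_procs
```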

All attempts to stop these processes failed and it was decided a reboot was
the best fix.  All production services were shut down as far as
possible.  Because of the locked processes the machine would not shut down
properly and was physically turned off.

On reboot the drives in the server were checked via e2fsck with no 
problems.

The shared storage, md2 and md3, also mounted without errors.

All services started properly.

No errors about md2 or md3 were reported in dmesg or /var/log/messages.

Strangely, all files added via Samba after Monday are gone. This is limited 
to only one device: md3.  Everything else is fine.

Checking the two drives/partitions that make up md3 shows none of the
missing files.

Any brilliant thoughts as to where those files might have gone would be 
appreciated.  The files lost are not critical so there are no major 
problems but it's a puzzle I can't quite figure out.

Shawn 


