Hi Guys,
I had a strange problem yesterday and I'm curious as to what everyone thinks.
I have a client with a Red Hat Enterprise 2.1 cluster. The hardware is all quality HP equipment, with an MSA 500 storage array acting as the shared storage between the two nodes in the cluster.
This cluster is configured for reliability, not load balancing. All work is handled by one node or the other, never both.
There are two 100GB RAID 5 logical drives in the MSA 500. Linux sees them as /dev/md2 and /dev/md3 respectively. Running "cat /proc/mdstat" shows them as "active multipath" and otherwise healthy.
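For anyone wanting to repeat that check, here is a rough sketch of pulling just the array name, state and personality out of mdstat-style text. The function name md_summary is made up for illustration, and it assumes array lines of the form "md2 : active multipath sdb1[0] sdc1[1]" (the member device names there are placeholders, not the real ones on this cluster).

```shell
# Summarise md arrays from mdstat-style text read on stdin.
# Assumption: each array line looks like "mdN : <state> <personality> <members...>".
# md_summary is a hypothetical helper name, not an existing tool.
md_summary() {
    # Keep only lines whose first field is an md device followed by ":",
    # then print the device, its state, and its personality.
    awk '$1 ~ /^md[0-9]+$/ && $2 == ":" { print $1, $3, $4 }'
}

# Live usage on a system with md devices:
# md_summary < /proc/mdstat
```

Reading from stdin rather than opening /proc/mdstat directly keeps the helper easy to try against saved output from either node.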
There is a nightly shell script that runs and backs up information via tar to a USB external drive. The last thing the script does before unmounting the USB drive is to run the sync command.
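The actual script is not reproduced here. As a loose sketch only, one way to make a stuck final step noticeable is to run it under a simple watchdog that gives up waiting after a deadline and reports it. The function name run_with_deadline and the exit code 124 are arbitrary choices for this illustration. Note that it only stops waiting; it does not try to kill the command.

```shell
# Run a command in the background and stop waiting after a deadline.
# Hypothetical helper: the name and the 124 return code are illustrative choices.
# Usage: run_with_deadline SECONDS command [args...]
run_with_deadline() {
    deadline=$1
    shift
    "$@" &                      # start the command in the background
    pid=$!
    elapsed=0
    # Poll once per second while the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$deadline" ]; then
            echo "still running after ${deadline}s: $*" >&2
            return 124          # give up waiting; the command is left running
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    wait "$pid"                 # pass the command's own exit status through
}

# Possible use at the end of a backup script (paths are placeholders):
# run_with_deadline 300 sync || echo "sync did not finish in time" >&2
```

A report like this would at least have flagged the hang on Monday night rather than the following day.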
Yesterday it was noticed that the backup script was hung. A quick check via "ps aux" showed the backup script and a sync process still hanging around from Monday night.
All attempts to stop these processes failed, and it was decided a reboot was the best fix. All production services were shut down as far as possible. Because of the locked processes the machine would not shut down properly and was physically turned off.
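Processes that ignore every signal, including SIGKILL, are often blocked in uninterruptible sleep (state "D" in ps) waiting on I/O. Whether that was the case here was not recorded, so the following is only a sketch of how one could check next time. The helper name filter_dstate is invented for this example, and it assumes a procps-style ps that accepts "-eo pid,stat,args".

```shell
# Print processes whose state begins with "D" (uninterruptible sleep).
# Reads "PID STAT COMMAND..." lines on stdin, with a header line first.
# filter_dstate is a hypothetical helper name.
filter_dstate() {
    # NR > 1 skips the header; the second column is the process state.
    awk 'NR > 1 && $2 ~ /^D/ { print }'
}

# Live usage, assuming a procps-style ps:
# ps -eo pid,stat,args | filter_dstate
```

If the sync and tar processes show up in that list, the hang is below the process level, in the kernel or the storage path, and a reboot is usually the only way to clear them.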
On reboot the drives in the server were checked via e2fsck with no problems.
The shared storage (md2 and md3) also mounted without errors.
All services started properly.
No errors about md2 or md3 were reported in dmesg or /var/log/messages.
Strangely, all files added via Samba after Monday are gone. This is limited to only one device: md3. Everything else is fine.
Checking the two drives/partitions that make up md3 shows none of the missing files.
Any brilliant thoughts as to where those files might have gone would be appreciated. The files lost are not critical so there are no major problems but it's a puzzle I can't quite figure out.
Shawn
Shawn Everett wrote:
> No errors about md2 or md3 were reported in dmesg or /var/log/messages
Anything about the underlying physical devices? Do you have diagnostic tools for the hardware RAID underneath the software array?
> Strangely, all files added via Samba after Monday are gone. This is limited to only one device: md3. Everything else is fine.
> Checking the two drives/partitions that make up md3 show none of the missing files.
> Any brilliant thoughts as to where those files might have gone would be appreciated. The files lost are not critical so there are no major problems but it's a puzzle I can't quite figure out.
It sounds like the hardware device this was writing to was not responding, but I'd expect some sort of timeout and log message, unless /var/log is on this device too.
> Strangely, all files added via Samba after Monday are gone. This is limited to only one device: md3. Everything else is fine.
> Checking the two drives/partitions that make up md3 show none of the missing files.
> Any brilliant thoughts as to where those files might have gone would be appreciated. The files lost are not critical so there are no major problems but it's a puzzle I can't quite figure out.
Is there any information available in the Samba log files?
On Saturday 01 December 2007, Barry Brimer wrote:
> > Strangely, all files added via Samba after Monday are gone. This is limited to only one device: md3. Everything else is fine.
> > Checking the two drives/partitions that make up md3 show none of the missing files.
> > Any brilliant thoughts as to where those files might have gone would be appreciated. The files lost are not critical so there are no major problems but it's a puzzle I can't quite figure out.
> Is there any information available in the samba log files?
None of the logs show anything unusual.
Shawn