Hi List,
We have a very irritating problem with CentOS 5.3 x86_64 running on a Dell PowerEdge SC1435: frequent kernel panics while using GlusterFS and FUSE. Across the cluster of servers, we are seeing roughly one panic every 1-2 days. This wasn't a problem on our earlier servers, which ran Fedora 6.
Here's a kernel panic screenshot: http://imagehost.gr/images/c5ad2d5jzgpgoq91v24y.png
Here's some general info:
  Linux server6 2.6.18-128.1.10.el5 #1 SMP Thu May 7 10:35:59 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
Using:
  fuse-2.7.4-8_10.el5
  fuse-kmdl-2.6.18-128.1.10.el5-2.7.4-8_10.el5
  fuse-libs-2.7.4-8_10.el5
  glusterfs-common-2.0.1-1.el5
  glusterfs-client-2.0.1-1.el5
  glusterfs-server-2.0.1-1.el5
I've straced glusterfs while it dies, and there's nothing obviously suspicious in the output; it simply stops working as soon as the kernel locks up.
A little background: Gluster is used to share some directories from which Apache serves files. I've managed to replicate the live environment inside a virtual machine, and also to replicate the kernel panic by loading the virtual machine's Apache with ApacheBench; at as few as 3 concurrent requests, the kernel locks up. However, I have been unable to reproduce this exact behavior on the live cluster, and have tried up to 10,000 concurrent requests, which max out the network more than anything.
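For anyone who wants to try reproducing this without ApacheBench, the kind of load described above can be approximated with a short script. This is only a rough sketch, not the tool actually used here: the `fetch` and `run_load` names, the request count, and the target URL are all hypothetical and would need adapting to the test VM.

```python
from concurrent.futures import ThreadPoolExecutor
from http.client import HTTPException
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Issue one GET and return the HTTP status, or None if it failed."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            resp.read()  # pull the whole body so the file is really served
            return resp.status
    except (OSError, HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return None


def run_load(url, total=1000, concurrency=3, fetch_fn=fetch):
    """Send `total` GETs to `url`, at most `concurrency` at a time.

    Returns a small summary dict; a request counts as ok only on HTTP 200.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        statuses = list(pool.map(fetch_fn, [url] * total))
    ok = sum(1 for status in statuses if status == 200)
    return {"total": total, "ok": ok, "failed": total - ok}
```

Calling something like `run_load("http://test-vm/somefile", total=1000, concurrency=3)` (URL assumed) from a second machine mirrors the 3-concurrent-request case; a sudden run of failed requests would coincide with the point where the kernel locks up.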
I've tried the latest versions of Gluster and FUSE from both development snapshots and stable releases, as well as the patched versions of FUSE released by Gluster. Nothing seems to improve the problem.
If anyone has any ideas for further debugging, or other routes for support, please let me know. I'm running out of ideas.
Thanks in advance
Tom O'Connor
On Tue, Sep 15, 2009 at 03:24:52PM +0100, Tom O'Connor wrote:
> If anyone has any ideas for further debugging, or other routes for support, please let me know. I'm running out of ideas.
Enterprise Linux 5.4, which includes official FUSE support, seems like the next place to look.
Matthew Miller wrote:
> On Tue, Sep 15, 2009 at 03:24:52PM +0100, Tom O'Connor wrote:
> > If anyone has any ideas for further debugging, or other routes for support, please let me know. I'm running out of ideas.
> Enterprise Linux 5.4, which includes official FUSE support, seems like the next place to look.
Possibly, but I'd rather try to fix the problem without saying "oh well, just upgrade to the latest release". It's quite a lot of effort to fully upgrade a whole bunch of servers, whereas upgrading individual packages would be far more realistic.
Tom
Tom O'Connor wrote:
> Matthew Miller wrote:
> > On Tue, Sep 15, 2009 at 03:24:52PM +0100, Tom O'Connor wrote:
> > > If anyone has any ideas for further debugging, or other routes for support, please let me know. I'm running out of ideas.
> > Enterprise Linux 5.4, which includes official FUSE support, seems like the next place to look.
> Possibly, but I'd rather try to fix the problem without saying "oh well, just upgrade to the latest release". It's quite a lot of effort to fully upgrade a whole bunch of servers, whereas upgrading individual packages would be far more realistic.
Good luck tracking down the problem yourself then. The reason people use RHEL, and therefore CentOS, is that much effort has been put into making sure the entire set of toolchains works well together. Which do you think will be more effort: upgrading a whole bunch of servers, or tracking down the problem and, if you are successful, building your own RPMs and your own repository? Besides, 'upgrading to 5.4' is just that... upgrading individual packages. :-|
On Tue, Sep 15, 2009 at 03:53:52PM +0100, Tom O'Connor wrote:
> Possibly, but I'd rather try to fix the problem without saying "oh well, just upgrade to the latest release". It's quite a lot of effort to fully upgrade a whole bunch of servers, whereas upgrading individual packages would be far more realistic.
I understand your point in general, but in this specific case the suggestion is to upgrade from a release in which the feature you are using is unsupported to a release in which it is supported.