Here are some suggestions:

1. Enable and configure kdump.
2. Enable Magic SysRq.
3. Consider enabling "kernel.softlockup_panic" and "vm.panic_on_oom". Doing so will cause your server to crash sooner than it normally would, so it depends on whether you want to capture the first instance (the smoking gun) or wait until the system is completely hosed (which may leave more evidence of the issue).

Then test and verify that Magic SysRq can be used to generate a kernel core dump.

Then sit back and wait.

I do this on all my production servers. It saves the pain of having to set this up under pressure, and capturing the vmcore on the first instance is very much worth the effort.

HTH
-rak-
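
For reference, the sysctl side of the suggestions above might look roughly like the fragment below. This is only a sketch: the file name is made up, the kdump service name and the /var/crash location are assumptions based on RHEL-style systems, and kdump also needs memory reserved with a crashkernel= boot parameter, which is not shown here.

```conf
# /etc/sysctl.d/90-crash-capture.conf   (hypothetical file name)

# Enable all Magic SysRq functions (1 = everything allowed).
kernel.sysrq = 1

# Optional: panic on a soft lockup or on out-of-memory instead of limping on.
# These make the box crash sooner, which is the trade-off described above.
kernel.softlockup_panic = 1
vm.panic_on_oom = 1

# Apply without rebooting:        sysctl --system
# Check that kdump is running:    systemctl status kdump   (service name is an assumption)
# Test crash, in a maintenance window only, since it panics the machine at once:
#     echo c > /proc/sysrq-trigger
# After the reboot, look for a vmcore, commonly under /var/crash (assumed default).
```

Running the test crash once before you need it is the important part, since it confirms that a vmcore actually gets written.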