On 04/04/2012 09:16 AM, Jonathan Alstead wrote:
> Hello,
>
> Recently our dell sc1425 server has been locking up with kernel freezes
> and required a hard reboot on each occasion. I've looked on the centos
> forums with limited success - each problem seems slightly different
> (some failure on high load, some not). Our kernel is 2.6.18-274.17.1.el5
> and /var/log/messages show the following errors:
>
> Apr 3 12:41:25 sp2 kernel: INFO: task mysqld:15345 blocked for more than 120 seconds.
> Apr 3 12:41:25 sp2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Apr 3 12:41:25 sp2 kernel: mysqld D 00000CEB 2524 15345 32083 15346 15167 (NOTLB)
> Apr 3 12:41:25 sp2 kernel: c50c7f54 00000082 bf379c08 00000ceb ca9b1648 f43c6c5c 00000000 00000001
> Apr 3 12:41:25 sp2 kernel: d9d18000 bf384f01 00000ceb 0000b2f9 00000001 d9d1810c c2013ac4 edc5de40
> Apr 3 12:41:25 sp2 kernel: 08515c98 c6cb37b8 c2014464 c200cc80 00000020 00000000 00000000 00000000
> Apr 3 12:41:25 sp2 kernel: Call Trace:
> Apr 3 12:41:25 sp2 kernel: [<c0622f16>] rwsem_down_write_failed+0x126/0x141
> Apr 3 12:41:25 sp2 kernel: [<c0439989>] .text.lock.rwsem+0x2b/0x3a
> Apr 3 12:41:25 sp2 kernel: [<c046aa6a>] sys_mprotect+0xbd/0x1eb
> Apr 3 12:41:25 sp2 kernel: [<c0404f4b>] syscall_call+0x7/0xb
> Apr 3 12:41:25 sp2 kernel: =======================
> Apr 3 12:41:25 sp2 kernel: INFO: task clamd:15721 blocked for more than 120 seconds.
> Apr 3 12:41:26 sp2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Apr 3 12:41:26 sp2 kernel: clamd D 00000D49 2528 15721 1 16416 15449 (NOTLB)
> Apr 3 12:41:26 sp2 kernel: e848cf74 00000086 8f107b57 00000d49 30ea2005 e848cf44 c08259d0 00000007
> Apr 3 12:41:26 sp2 kernel: e8c6aaa0 8f117848 00000d49 0000fcf1 00000000 e8c6abac c200cc80 f4f5f3c0
> Apr 3 12:41:26 sp2 kernel: c041f863 00000184 c200d620 c2013ac4 00000020 00000000 d887f0a8 f766f0c0
> Apr 3 12:41:26 sp2 kernel: Call Trace:
> Apr 3 12:41:26 sp2 kernel: [<c041f863>] default_wake_function+0x0/0xc
> Apr 3 12:41:26 sp2 kernel: [<c048e994>] destroy_inode+0x38/0x47
> Apr 3 12:41:26 sp2 kernel: [<c0622f16>] rwsem_down_write_failed+0x126/0x141
> Apr 3 12:41:26 sp2 kernel: [<c0439989>] .text.lock.rwsem+0x2b/0x3a
> Apr 3 12:41:26 sp2 kernel: [<c046a32b>] sys_munmap+0x24/0x41
> Apr 3 12:41:26 sp2 kernel: [<c0404f4b>] syscall_call+0x7/0xb
>

It sounds like some kind of I/O or memory problem. I would probably start
by running memtest and the basic diagnostic tests provided by Dell. If you
don't have the Dell diagnostics installed on your disk, they can be
downloaded as a CentOS-based OpenManage live CD from somewhere on the Dell
site.

It could also be a disk problem, but from the output you provide I would
look for memory or I/O bus problems first, and only then look for disk
problems if the first two turn up nothing. It almost looks like a memory
controller problem.

Nataraj
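
Before running hardware diagnostics, it may help to see which tasks are getting stuck and in which kernel functions. The following is a minimal sketch in Python, assuming the log lines look like the ones quoted above (one unwrapped message per line, as in /var/log/messages). The function name `summarize_hung_tasks` and the `SAMPLE` list are made up for illustration and are not part of any existing tool.

```python
import re

# Header of a hung-task report, e.g.
# "... kernel: INFO: task mysqld:15345 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"kernel: INFO: task (\S+):(\d+) blocked for more than (\d+) seconds"
)
# One call-trace frame, e.g. "[<c046aa6a>] sys_mprotect+0xbd/0x1eb"
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\S+)\+0x[0-9a-f]+/0x[0-9a-f]+", re.IGNORECASE
)


def summarize_hung_tasks(lines):
    """Return a list of (task, pid, frames) tuples, one per hung-task report.

    frames is the list of function names from the call trace that follows
    the report header, in the order they appear in the log.
    """
    reports = []
    for line in lines:
        header = HUNG_RE.search(line)
        if header:
            # Start a new report; frames seen later attach to it.
            reports.append((header.group(1), int(header.group(2)), []))
            continue
        frame = FRAME_RE.search(line)
        if frame and reports:
            reports[-1][2].append(frame.group(1))
    return reports


# Hypothetical sample, copied from the mysqld report quoted above.
SAMPLE = [
    "Apr 3 12:41:25 sp2 kernel: INFO: task mysqld:15345 blocked for more than 120 seconds.",
    "Apr 3 12:41:25 sp2 kernel: Call Trace:",
    "Apr 3 12:41:25 sp2 kernel: [<c0622f16>] rwsem_down_write_failed+0x126/0x141",
    "Apr 3 12:41:25 sp2 kernel: [<c0439989>] .text.lock.rwsem+0x2b/0x3a",
    "Apr 3 12:41:25 sp2 kernel: [<c046aa6a>] sys_mprotect+0xbd/0x1eb",
    "Apr 3 12:41:25 sp2 kernel: [<c0404f4b>] syscall_call+0x7/0xb",
]

if __name__ == "__main__":
    for task, pid, frames in summarize_hung_tasks(SAMPLE):
        print(task, pid, " <- ".join(frames))
```

To run it against a real log, pass an open file instead of `SAMPLE`, for example `summarize_hung_tasks(open("/var/log/messages"))`. In the two reports quoted above, both mysqld and clamd are waiting in `rwsem_down_write_failed`, one from `sys_mprotect` and the other from `sys_munmap`.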