I'm having problems with CentOS 6.4 x64 on a Dell R815 that is connected to a Dell MD3200 storage array. This machine is a database server (PostgreSQL 9.2.4).
After it has been running for a few hours, the following error starts to occur:
Jul 7 04:16:42 olosdb01 kernel: BUG: soft lockup - CPU#9 stuck for 67s! [multipathd:2175]
Jul 7 04:16:42 olosdb01 kernel: Modules linked in: dell_rbu mptctl mptbase vfat fat nls_utf8 autofs4 sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 power_meter dcdbas microcode serio_raw fam15h_power k10temp amd64_edac_mod edac_core edac_mce_amd i2c_piix4 i2c_core sg ses enclosure bnx2 ext4 mbcache jbd2 dm_round_robin scsi_dh_rdac sr_mod cdrom sd_mod crc_t10dif usb_storage ahci mpt2sas scsi_transport_sas raid_class megaraid_sas dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dell_rbu]
Jul 7 04:16:42 olosdb01 kernel: CPU 9
Jul 7 04:16:42 olosdb01 kernel: Modules linked in: dell_rbu mptctl mptbase vfat fat nls_utf8 autofs4 sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 power_meter dcdbas microcode serio_raw fam15h_power k10temp amd64_edac_mod edac_core edac_mce_amd i2c_piix4 i2c_core sg ses enclosure bnx2 ext4 mbcache jbd2 dm_round_robin scsi_dh_rdac sr_mod cdrom sd_mod crc_t10dif usb_storage ahci mpt2sas scsi_transport_sas raid_class megaraid_sas dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dell_rbu]
Jul 7 04:16:42 olosdb01 kernel:
Jul 7 04:16:42 olosdb01 kernel: Pid: 2175, comm: multipathd Not tainted 2.6.32-358.el6.x86_64 #1 Dell Inc. PowerEdge R815/0272WF
Jul 7 04:16:42 olosdb01 kernel: RIP: 0010:[<ffffffff810aeb6a>] [<ffffffff810aeb6a>] smp_call_function_many+0x1ea/0x260
Jul 7 04:16:42 olosdb01 kernel: RSP: 0018:ffff880e7c67bc18 EFLAGS: 00000202
Jul 7 04:16:42 olosdb01 kernel: RAX: 0000000000000010 RBX: ffff880e7c67bc58 RCX: 0000000000000020
Jul 7 04:16:42 olosdb01 kernel: RDX: 000000000000000f RSI: 0000000000000040 RDI: 0000000000000286
Jul 7 04:16:42 olosdb01 kernel: RBP: ffffffff8100bb8e R08: ffffffff81c07528 R09: 0000000000000040
Jul 7 04:16:42 olosdb01 kernel: R10: 0000000000000001 R11: 0000000000000000 R12: ffff880e7c67bbf0
Jul 7 04:16:42 olosdb01 kernel: R13: 0000000000000003 R14: 0000000000000296 R15: ffff880e7c67bbc8
Jul 7 04:16:42 olosdb01 kernel: FS: 00007f88d8261700(0000) GS:ffff880c87440000(0000) knlGS:00000000aecffb70
Jul 7 04:16:42 olosdb01 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 7 04:16:42 olosdb01 kernel: CR2: 00007fffa74e3828 CR3: 000000047c7c5000 CR4: 00000000000407e0
Jul 7 04:16:42 olosdb01 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 7 04:16:42 olosdb01 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 7 04:16:42 olosdb01 kernel: Process multipathd (pid: 2175, threadinfo ffff880e7c67a000, task ffff880e7c283540)
Jul 7 04:16:42 olosdb01 kernel: Stack:
Jul 7 04:16:42 olosdb01 kernel: 01ff880e7c67be34 0000000000000000 ffff880e7c67be10 ffffffff8104c740
Jul 7 04:16:42 olosdb01 kernel: <d> 0000000000000000 ffff880e7c67bcc8 ffffffff81ac2430 0000000000000000
Jul 7 04:16:42 olosdb01 kernel: <d> ffff880e7c67bc68 ffffffff810aec02 ffff880e7c67bc98 ffffffff81076bc4
Jul 7 04:16:42 olosdb01 kernel: Call Trace:
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff8104c740>] ? do_flush_tlb_all+0x0/0x60
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff810aec02>] ? smp_call_function+0x22/0x30
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81076bc4>] ? on_each_cpu+0x24/0x50
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff8104c4bc>] ? flush_tlb_all+0x1c/0x20
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff811508ba>] ? __purge_vmap_area_lazy+0xea/0x1e0
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81150a10>] ? free_vmap_area_noflush+0x60/0x70
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81150ac5>] ? free_unmap_vmap_area+0x25/0x30
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81150b10>] ? remove_vm_area+0x40/0xa0
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81150c2e>] ? __vunmap+0x2e/0x120
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81150d9a>] ? vfree+0x2a/0x40
Jul 7 04:16:42 olosdb01 kernel: [<ffffffffa00099ea>] ? ctl_ioctl+0x1ca/0x270 [dm_mod]
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff8109b1a1>] ? lock_hrtimer_base+0x31/0x60
Jul 7 04:16:42 olosdb01 kernel: [<ffffffffa0007fd0>] ? dev_status+0x0/0x50 [dm_mod]
Jul 7 04:16:42 olosdb01 kernel: [<ffffffffa0009aa3>] ? dm_ctl_ioctl+0x13/0x20 [dm_mod]
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81194ed2>] ? vfs_ioctl+0x22/0xa0
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81195074>] ? do_vfs_ioctl+0x84/0x580
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff811955f1>] ? sys_ioctl+0x81/0xa0
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff810dc565>] ? __audit_syscall_exit+0x265/0x290
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Jul 7 04:16:42 olosdb01 kernel: Code: 14 46 00 0f ae f0 48 8b 7b 30 ff 15 59 7a 9e 00 80 7d c7 00 0f 84 9f fe ff ff f6 43 20 01 0f 84 95 fe ff ff 0f 1f 44 00 00 f3 90 <f6> 43 20 01 75 f8 e9 83 fe ff ff 0f 1f 00 4c 89 ea 4c 89 f6 44
Jul 7 04:16:42 olosdb01 kernel: Call Trace:
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff810aeb4f>] ? smp_call_function_many+0x1cf/0x260
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff8104c740>] ? do_flush_tlb_all+0x0/0x60
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff810aec02>] ? smp_call_function+0x22/0x30
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81076bc4>] ? on_each_cpu+0x24/0x50
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff8104c4bc>] ? flush_tlb_all+0x1c/0x20
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff811508ba>] ? __purge_vmap_area_lazy+0xea/0x1e0
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81150a10>] ? free_vmap_area_noflush+0x60/0x70
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81150ac5>] ? free_unmap_vmap_area+0x25/0x30
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81150b10>] ? remove_vm_area+0x40/0xa0
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81150c2e>] ? __vunmap+0x2e/0x120
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81150d9a>] ? vfree+0x2a/0x40
Jul 7 04:16:42 olosdb01 kernel: [<ffffffffa00099ea>] ? ctl_ioctl+0x1ca/0x270 [dm_mod]
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff8109b1a1>] ? lock_hrtimer_base+0x31/0x60
Jul 7 04:16:42 olosdb01 kernel: [<ffffffffa0007fd0>] ? dev_status+0x0/0x50 [dm_mod]
Jul 7 04:16:42 olosdb01 kernel: [<ffffffffa0009aa3>] ? dm_ctl_ioctl+0x13/0x20 [dm_mod]
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81194ed2>] ? vfs_ioctl+0x22/0xa0
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81195074>] ? do_vfs_ioctl+0x84/0x580
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff811955f1>] ? sys_ioctl+0x81/0xa0
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff810dc565>] ? __audit_syscall_exit+0x265/0x290
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Any idea what this could be? Cheers!