I'm having problems with CentOS 6.4 x64 on a Dell R815 that is connected
to a Dell MD3200 storage array.
This machine is a database server (PostgreSQL 9.2.4).
After a few hours of uptime, the following error starts to occur:
Jul 7 04:16:42 olosdb01 kernel: BUG: soft lockup - CPU#9 stuck for
67s! [multipathd:2175]
Jul 7 04:16:42 olosdb01 kernel: Modules linked in: dell_rbu mptctl
mptbase vfat fat nls_utf8 autofs4 sunrpc ipt_REJECT
nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables
ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack
ip6table_filter ip6_tables ipv6 power_meter dcdbas microcode
serio_raw fam15h_power k10temp amd64_edac_mod edac_core edac_mce_amd
i2c_piix4 i2c_core sg ses enclosure bnx2 ext4 mbcache jbd2
dm_round_robin scsi_dh_rdac sr_mod cdrom sd_mod crc_t10dif
usb_storage ahci mpt2sas scsi_transport_sas raid_class megaraid_sas
dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
dell_rbu]
Jul 7 04:16:42 olosdb01 kernel: CPU 9
Jul 7 04:16:42 olosdb01 kernel: Modules linked in: dell_rbu mptctl
mptbase vfat fat nls_utf8 autofs4 sunrpc ipt_REJECT
nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables
ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack
ip6table_filter ip6_tables ipv6 power_meter dcdbas microcode
serio_raw fam15h_power k10temp amd64_edac_mod edac_core edac_mce_amd
i2c_piix4 i2c_core sg ses enclosure bnx2 ext4 mbcache jbd2
dm_round_robin scsi_dh_rdac sr_mod cdrom sd_mod crc_t10dif
usb_storage ahci mpt2sas scsi_transport_sas raid_class megaraid_sas
dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
dell_rbu]
Jul 7 04:16:42 olosdb01 kernel:
Jul 7 04:16:42 olosdb01 kernel: Pid: 2175, comm: multipathd Not
tainted 2.6.32-358.el6.x86_64 #1 Dell Inc. PowerEdge R815/0272WF
Jul 7 04:16:42 olosdb01 kernel: RIP: 0010:[<ffffffff810aeb6a>]
[<ffffffff810aeb6a>] smp_call_function_many+0x1ea/0x260
Jul 7 04:16:42 olosdb01 kernel: RSP: 0018:ffff880e7c67bc18 EFLAGS:
00000202
Jul 7 04:16:42 olosdb01 kernel: RAX: 0000000000000010 RBX:
ffff880e7c67bc58 RCX: 0000000000000020
Jul 7 04:16:42 olosdb01 kernel: RDX: 000000000000000f RSI:
0000000000000040 RDI: 0000000000000286
Jul 7 04:16:42 olosdb01 kernel: RBP: ffffffff8100bb8e R08:
ffffffff81c07528 R09: 0000000000000040
Jul 7 04:16:42 olosdb01 kernel: R10: 0000000000000001 R11:
0000000000000000 R12: ffff880e7c67bbf0
Jul 7 04:16:42 olosdb01 kernel: R13: 0000000000000003 R14:
0000000000000296 R15: ffff880e7c67bbc8
Jul 7 04:16:42 olosdb01 kernel: FS: 00007f88d8261700(0000)
GS:ffff880c87440000(0000) knlGS:00000000aecffb70
Jul 7 04:16:42 olosdb01 kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Jul 7 04:16:42 olosdb01 kernel: CR2: 00007fffa74e3828 CR3:
000000047c7c5000 CR4: 00000000000407e0
Jul 7 04:16:42 olosdb01 kernel: DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
Jul 7 04:16:42 olosdb01 kernel: DR3: 0000000000000000 DR6:
00000000ffff0ff0 DR7: 0000000000000400
Jul 7 04:16:42 olosdb01 kernel: Process multipathd (pid: 2175,
threadinfo ffff880e7c67a000, task ffff880e7c283540)
Jul 7 04:16:42 olosdb01 kernel: Stack:
Jul 7 04:16:42 olosdb01 kernel: 01ff880e7c67be34 0000000000000000
ffff880e7c67be10 ffffffff8104c740
Jul 7 04:16:42 olosdb01 kernel: <d> 0000000000000000
ffff880e7c67bcc8 ffffffff81ac2430 0000000000000000
Jul 7 04:16:42 olosdb01 kernel: <d> ffff880e7c67bc68
ffffffff810aec02 ffff880e7c67bc98 ffffffff81076bc4
Jul 7 04:16:42 olosdb01 kernel: Call Trace:
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff8104c740>] ?
do_flush_tlb_all+0x0/0x60
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff810aec02>] ?
smp_call_function+0x22/0x30
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81076bc4>] ?
on_each_cpu+0x24/0x50
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff8104c4bc>] ?
flush_tlb_all+0x1c/0x20
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff811508ba>] ?
__purge_vmap_area_lazy+0xea/0x1e0
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81150a10>] ?
free_vmap_area_noflush+0x60/0x70
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81150ac5>] ?
free_unmap_vmap_area+0x25/0x30
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81150b10>] ?
remove_vm_area+0x40/0xa0
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81150c2e>] ?
__vunmap+0x2e/0x120
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81150d9a>] ? vfree+0x2a/0x40
Jul 7 04:16:42 olosdb01 kernel: [<ffffffffa00099ea>] ?
ctl_ioctl+0x1ca/0x270 [dm_mod]
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff8109b1a1>] ?
lock_hrtimer_base+0x31/0x60
Jul 7 04:16:42 olosdb01 kernel: [<ffffffffa0007fd0>] ?
dev_status+0x0/0x50 [dm_mod]
Jul 7 04:16:42 olosdb01 kernel: [<ffffffffa0009aa3>] ?
dm_ctl_ioctl+0x13/0x20 [dm_mod]
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81194ed2>] ?
vfs_ioctl+0x22/0xa0
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81195074>] ?
do_vfs_ioctl+0x84/0x580
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff811955f1>] ?
sys_ioctl+0x81/0xa0
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff810dc565>] ?
__audit_syscall_exit+0x265/0x290
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff8100b072>] ?
system_call_fastpath+0x16/0x1b
Jul 7 04:16:42 olosdb01 kernel: Code: 14 46 00 0f ae f0 48 8b 7b 30
ff 15 59 7a 9e 00 80 7d c7 00 0f 84 9f fe ff ff f6 43 20 01 0f 84 95
fe ff ff 0f 1f 44 00 00 f3 90 <f6> 43 20 01 75 f8 e9 83 fe ff ff 0f
1f 00 4c 89 ea 4c 89 f6 44
Jul 7 04:16:42 olosdb01 kernel: Call Trace:
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff810aeb4f>] ?
smp_call_function_many+0x1cf/0x260
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff8104c740>] ?
do_flush_tlb_all+0x0/0x60
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff810aec02>] ?
smp_call_function+0x22/0x30
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81076bc4>] ?
on_each_cpu+0x24/0x50
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff8104c4bc>] ?
flush_tlb_all+0x1c/0x20
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff811508ba>] ?
__purge_vmap_area_lazy+0xea/0x1e0
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81150a10>] ?
free_vmap_area_noflush+0x60/0x70
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81150ac5>] ?
free_unmap_vmap_area+0x25/0x30
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81150b10>] ?
remove_vm_area+0x40/0xa0
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81150c2e>] ?
__vunmap+0x2e/0x120
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81150d9a>] ? vfree+0x2a/0x40
Jul 7 04:16:42 olosdb01 kernel: [<ffffffffa00099ea>] ?
ctl_ioctl+0x1ca/0x270 [dm_mod]
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff8109b1a1>] ?
lock_hrtimer_base+0x31/0x60
Jul 7 04:16:42 olosdb01 kernel: [<ffffffffa0007fd0>] ?
dev_status+0x0/0x50 [dm_mod]
Jul 7 04:16:42 olosdb01 kernel: [<ffffffffa0009aa3>] ?
dm_ctl_ioctl+0x13/0x20 [dm_mod]
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81194ed2>] ?
vfs_ioctl+0x22/0xa0
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff81195074>] ?
do_vfs_ioctl+0x84/0x580
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff811955f1>] ?
sys_ioctl+0x81/0xa0
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff810dc565>] ?
__audit_syscall_exit+0x265/0x290
Jul 7 04:16:42 olosdb01 kernel: [<ffffffff8100b072>] ?
system_call_fastpath+0x16/0x1b
Any idea what this could be?
Cheers!