[CentOS-virt] Problems with Windows on KVM machine

Thu Mar 20 09:41:03 UTC 2014
engineer at colocat.ru <engineer at colocat.ru>

OK, new info here. Tuning is done, and we now get the following on a node
running a Windows Server 2008 guest. The other nodes, with *nix guests, are
working fine. Nothing shows up in the logs on the storage server; on the node:


Mar 20 09:42:22 v0004 kernel: rpcrdma: connection to 192.168.1.1:20049 closed (-103)
Mar 20 09:42:42 v0004 kernel: rpcrdma: connection to 192.168.1.1:20049 on mlx4_0, memreg 5 slots 32 ird 16
Mar 20 09:42:49 v0004 kernel: ------------[ cut here ]------------
Mar 20 09:42:49 v0004 kernel: WARNING: at kernel/softirq.c:159 local_bh_enable_ip+0x7d/0xb0() (Not tainted)
Mar 20 09:42:49 v0004 kernel: Hardware name: S2600WP
Mar 20 09:42:49 v0004 kernel: Modules linked in: act_police cls_u32 sch_ingress cls_fw sch_sfq sch_htb ebt_arp ebt_ip ebtable_nat ebtables xprtrdma nfs lockd fscache auth_rpcgss nfs_acl sunrpc bridge stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 openvswitch(U) vhost_net macvtap macvlan tun kvm_intel kvm iTCO_wdt iTCO_vendor_support sr_mod cdrom sb_edac edac_core lpc_ich mfd_core igb i2c_algo_bit ptp pps_core sg i2c_i801 i2c_core ioatdma dca mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core ext4 jbd2 mbcache usb_storage sd_mod crc_t10dif ahci isci libsas scsi_transport_sas wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Mar 20 09:42:49 v0004 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-431.5.1.el6.x86_64 #1
Mar 20 09:42:49 v0004 kernel: Call Trace:
Mar 20 09:42:49 v0004 kernel: <IRQ>  [<ffffffff81071e27>] ? warn_slowpath_common+0x87/0xc0
Mar 20 09:42:49 v0004 kernel: [<ffffffff81071e7a>] ? warn_slowpath_null+0x1a/0x20
Mar 20 09:42:49 v0004 kernel: [<ffffffff8107a3ed>] ? local_bh_enable_ip+0x7d/0xb0
Mar 20 09:42:49 v0004 kernel: [<ffffffff8152a7fb>] ? _spin_unlock_bh+0x1b/0x20
Mar 20 09:42:49 v0004 kernel: [<ffffffffa04554f0>] ? rpc_wake_up_status+0x70/0x80 [sunrpc]
Mar 20 09:42:49 v0004 kernel: [<ffffffffa044e79c>] ? xprt_wake_pending_tasks+0x2c/0x30 [sunrpc]
Mar 20 09:42:49 v0004 kernel: [<ffffffffa05322fc>] ? rpcrdma_conn_func+0x9c/0xb0 [xprtrdma]
Mar 20 09:42:49 v0004 kernel: [<ffffffffa0535450>] ? rpcrdma_qp_async_error_upcall+0x40/0x80 [xprtrdma]
Mar 20 09:42:49 v0004 kernel: [<ffffffffa01c11cb>] ? mlx4_ib_qp_event+0x8b/0x100 [mlx4_ib]
Mar 20 09:42:49 v0004 kernel: [<ffffffffa0166c54>] ? mlx4_qp_event+0x74/0xf0 [mlx4_core]
Mar 20 09:42:49 v0004 kernel: [<ffffffffa0154057>] ? mlx4_eq_int+0x557/0xcb0 [mlx4_core]
Mar 20 09:42:49 v0004 kernel: [<ffffffffa0455396>] ? rpc_wake_up_task_queue_locked+0x186/0x270 [sunrpc]
Mar 20 09:42:49 v0004 kernel: [<ffffffffa01547c4>] ? mlx4_msi_x_interrupt+0x14/0x20 [mlx4_core]
Mar 20 09:42:49 v0004 kernel: [<ffffffff810e6eb0>] ? handle_IRQ_event+0x60/0x170
Mar 20 09:42:49 v0004 kernel: [<ffffffff810e980e>] ? handle_edge_irq+0xde/0x180
Mar 20 09:42:49 v0004 kernel: [<ffffffffa0153362>] ? mlx4_cq_completion+0x42/0x90 [mlx4_core]
Mar 20 09:42:49 v0004 kernel: [<ffffffff8100faf9>] ? handle_irq+0x49/0xa0
Mar 20 09:42:49 v0004 kernel: [<ffffffff815312ec>] ? do_IRQ+0x6c/0xf0
Mar 20 09:42:49 v0004 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Mar 20 09:42:49 v0004 kernel: [<ffffffff8107a893>] ? __do_softirq+0x73/0x1e0
Mar 20 09:42:49 v0004 kernel: [<ffffffff810e6eb0>] ? handle_IRQ_event+0x60/0x170
Mar 20 09:42:49 v0004 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Mar 20 09:42:49 v0004 kernel: [<ffffffff8100fa75>] ? do_softirq+0x65/0xa0
Mar 20 09:42:49 v0004 kernel: [<ffffffff8107a795>] ? irq_exit+0x85/0x90
Mar 20 09:42:49 v0004 kernel: [<ffffffff815312f5>] ? do_IRQ+0x75/0xf0
Mar 20 09:42:49 v0004 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Mar 20 09:42:49 v0004 kernel: <EOI>  [<ffffffff812e09ae>] ? intel_idle+0xde/0x170
Mar 20 09:42:49 v0004 kernel: [<ffffffff812e0991>] ? intel_idle+0xc1/0x170
Mar 20 09:42:49 v0004 kernel: [<ffffffff814268f7>] ? cpuidle_idle_call+0xa7/0x140
Mar 20 09:42:49 v0004 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Mar 20 09:42:49 v0004 kernel: [<ffffffff8150cf1a>] ? rest_init+0x7a/0x80
Mar 20 09:42:49 v0004 kernel: [<ffffffff81c26f8f>] ? start_kernel+0x424/0x430
Mar 20 09:42:49 v0004 kernel: [<ffffffff81c2633a>] ? x86_64_start_reservations+0x125/0x129
Mar 20 09:42:49 v0004 kernel: [<ffffffff81c26453>] ? x86_64_start_kernel+0x115/0x124
Mar 20 09:42:49 v0004 kernel: ---[ end trace ddc1b92aa1d57ab7 ]---
Mar 20 09:42:49 v0004 kernel: rpcrdma: connection to 192.168.1.1:20049 closed (-103)
Mar 20 09:43:19 v0004 kernel: rpcrdma: connection to 192.168.1.1:20049 on mlx4_0, memreg 5 slots 32 ird 16

and so on.
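
For what it's worth, the -103 in the rpcrdma messages is a negated errno,
i.e. ECONNABORTED; easy to double-check on any box with Python 2 on it:

  $ python -c 'import errno, os; print errno.errorcode[103], os.strerror(103)'
  ECONNABORTED Software caused connection abort

So the RDMA transport is being aborted, and the WARNING looks like a side
effect rather than the cause: per the trace, the mlx4 hard-IRQ handler calls
up through rpcrdma_qp_async_error_upcall() into rpc_wake_up_status(), whose
spin_unlock_bh() re-enables bottom halves while still in interrupt context -
exactly the condition local_bh_enable_ip() warns about.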

> Done. The problem still exists, but only with Windows. A much heavier load
> on Linux/FreeBSD VMs doesn't trigger anything.
>> On Thu, Feb 20, 2014 at 1:53 PM, <engineer at colocat.ru> wrote:
>>
>>> Sometimes there are messages like
>>> Feb 17 04:11:28 stor1 rpc.idmapd[3116]: nss_getpwnam: name '0' does not
>>> map into domain 'localdomain'
>>> and nothing more. We've been tailing the logs on both the storage server
>>> and the node - nothing. In debug mode we got around 10 GB of messages,
>>> but nobody can spot the problem in that flood :(
>>>
>>
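>> The rpc.idmapd message is most likely a side issue: it means the NFSv4 ID
>> mapper couldn't map the name '0' into the domain, which usually comes from
>> a Domain mismatch in /etc/idmapd.conf between server and client. It makes
>> files show up as owned by nobody, but it shouldn't drop connections. Still
>> worth checking that both ends agree, e.g.:
>>
>>   [General]
>>   Domain = localdomain
>>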
>> The problem is with the NFS server.
>>
>> Try the links below to tune the NFS server for better performance (a quick
>> example follows the links):
>>
>> 1. http://www.tldp.org/HOWTO/NFS-HOWTO/performance.html
>>
>> 2. http://www.techrepublic.com/blog/linux-and-open-source/tuning-nfs-for-better-performance/
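>>
>> For example (the numbers are purely illustrative - the right values depend
>> on your hardware and workload, and the export path and mount point below
>> are placeholders):
>>
>>   # on the storage server: raise the nfsd thread count
>>   # (RPCNFSDCOUNT in /etc/sysconfig/nfs on CentOS), then restart nfs
>>   RPCNFSDCOUNT=32
>>
>>   # on the node: mount over RDMA with larger transfer sizes
>>   mount -t nfs -o rdma,port=20049,rsize=65536,wsize=65536 \
>>       192.168.1.1:/export /mnt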
>>
>>
>>>
>>> > 1. Check the logs on the storage server for more information. What kind
>>> > of errors are you getting?