OK, new info. The tuning is done, and we got this on the node with the Windows Server 2008 VM. The others, running *nix, work fine. There is nothing in the logs on the storage side; on the node:

Mar 20 09:42:22 v0004 kernel: rpcrdma: connection to 192.168.1.1:20049 closed (-103)
Mar 20 09:42:42 v0004 kernel: rpcrdma: connection to 192.168.1.1:20049 on mlx4_0, memreg 5 slots 32 ird 16
Mar 20 09:42:49 v0004 kernel: ------------[ cut here ]------------
Mar 20 09:42:49 v0004 kernel: WARNING: at kernel/softirq.c:159 local_bh_enable_ip+0x7d/0xb0() (Not tainted)
Mar 20 09:42:49 v0004 kernel: Hardware name: S2600WP
Mar 20 09:42:49 v0004 kernel: Modules linked in: act_police cls_u32 sch_ingress cls_fw sch_sfq sch_htb ebt_arp ebt_ip ebtable_nat ebtables xprtrdma nfs lockd fscache auth_rpcgss nfs_acl sunrpc bridge stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 openvswitch(U) vhost_net macvtap macvlan tun kvm_intel kvm iTCO_wdt iTCO_vendor_support sr_mod cdrom sb_edac edac_core lpc_ich mfd_core igb i2c_algo_bit ptp pps_core sg i2c_i801 i2c_core ioatdma dca mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core ext4 jbd2 mbcache usb_storage sd_mod crc_t10dif ahci isci libsas scsi_transport_sas wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Mar 20 09:42:49 v0004 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-431.5.1.el6.x86_64 #1
Mar 20 09:42:49 v0004 kernel: Call Trace:
Mar 20 09:42:49 v0004 kernel: <IRQ> [<ffffffff81071e27>] ? warn_slowpath_common+0x87/0xc0
Mar 20 09:42:49 v0004 kernel: [<ffffffff81071e7a>] ? warn_slowpath_null+0x1a/0x20
Mar 20 09:42:49 v0004 kernel: [<ffffffff8107a3ed>] ? local_bh_enable_ip+0x7d/0xb0
Mar 20 09:42:49 v0004 kernel: [<ffffffff8152a7fb>] ? _spin_unlock_bh+0x1b/0x20
Mar 20 09:42:49 v0004 kernel: [<ffffffffa04554f0>] ? rpc_wake_up_status+0x70/0x80 [sunrpc]
Mar 20 09:42:49 v0004 kernel: [<ffffffffa044e79c>] ? xprt_wake_pending_tasks+0x2c/0x30 [sunrpc]
Mar 20 09:42:49 v0004 kernel: [<ffffffffa05322fc>] ? rpcrdma_conn_func+0x9c/0xb0 [xprtrdma]
Mar 20 09:42:49 v0004 kernel: [<ffffffffa0535450>] ? rpcrdma_qp_async_error_upcall+0x40/0x80 [xprtrdma]
Mar 20 09:42:49 v0004 kernel: [<ffffffffa01c11cb>] ? mlx4_ib_qp_event+0x8b/0x100 [mlx4_ib]
Mar 20 09:42:49 v0004 kernel: [<ffffffffa0166c54>] ? mlx4_qp_event+0x74/0xf0 [mlx4_core]
Mar 20 09:42:49 v0004 kernel: [<ffffffffa0154057>] ? mlx4_eq_int+0x557/0xcb0 [mlx4_core]
Mar 20 09:42:49 v0004 kernel: [<ffffffffa0455396>] ? rpc_wake_up_task_queue_locked+0x186/0x270 [sunrpc]
Mar 20 09:42:49 v0004 kernel: [<ffffffffa01547c4>] ? mlx4_msi_x_interrupt+0x14/0x20 [mlx4_core]
Mar 20 09:42:49 v0004 kernel: [<ffffffff810e6eb0>] ? handle_IRQ_event+0x60/0x170
Mar 20 09:42:49 v0004 kernel: [<ffffffff810e980e>] ? handle_edge_irq+0xde/0x180
Mar 20 09:42:49 v0004 kernel: [<ffffffffa0153362>] ? mlx4_cq_completion+0x42/0x90 [mlx4_core]
Mar 20 09:42:49 v0004 kernel: [<ffffffff8100faf9>] ? handle_irq+0x49/0xa0
Mar 20 09:42:49 v0004 kernel: [<ffffffff815312ec>] ? do_IRQ+0x6c/0xf0
Mar 20 09:42:49 v0004 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Mar 20 09:42:49 v0004 kernel: [<ffffffff8107a893>] ? __do_softirq+0x73/0x1e0
Mar 20 09:42:49 v0004 kernel: [<ffffffff810e6eb0>] ? handle_IRQ_event+0x60/0x170
Mar 20 09:42:49 v0004 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Mar 20 09:42:49 v0004 kernel: [<ffffffff8100fa75>] ? do_softirq+0x65/0xa0
Mar 20 09:42:49 v0004 kernel: [<ffffffff8107a795>] ? irq_exit+0x85/0x90
Mar 20 09:42:49 v0004 kernel: [<ffffffff815312f5>] ? do_IRQ+0x75/0xf0
Mar 20 09:42:49 v0004 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Mar 20 09:42:49 v0004 kernel: <EOI> [<ffffffff812e09ae>] ? intel_idle+0xde/0x170
Mar 20 09:42:49 v0004 kernel: [<ffffffff812e0991>] ? intel_idle+0xc1/0x170
Mar 20 09:42:49 v0004 kernel: [<ffffffff814268f7>] ? cpuidle_idle_call+0xa7/0x140
Mar 20 09:42:49 v0004 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Mar 20 09:42:49 v0004 kernel: [<ffffffff8150cf1a>] ? rest_init+0x7a/0x80
Mar 20 09:42:49 v0004 kernel: [<ffffffff81c26f8f>] ? start_kernel+0x424/0x430
Mar 20 09:42:49 v0004 kernel: [<ffffffff81c2633a>] ? x86_64_start_reservations+0x125/0x129
Mar 20 09:42:49 v0004 kernel: [<ffffffff81c26453>] ? x86_64_start_kernel+0x115/0x124
Mar 20 09:42:49 v0004 kernel: ---[ end trace ddc1b92aa1d57ab7 ]---
Mar 20 09:42:49 v0004 kernel: rpcrdma: connection to 192.168.1.1:20049 closed (-103)
Mar 20 09:43:19 v0004 kernel: rpcrdma: connection to 192.168.1.1:20049 on mlx4_0, memreg 5 slots 32 ird 16

...and so on.

> Done so. The problem still exists, but only with Windows. A much heavier load
> on the Linux/FreeBSD VMs doesn't cause anything.
>
>> On Thu, Feb 20, 2014 at 1:53 PM, <engineer at colocat.ru> wrote:
>>
>>> Sometimes there are messages like
>>> Feb 17 04:11:28 stor1 rpc.idmapd[3116]: nss_getpwnam: name '0' does not map into domain 'localdomain'
>>> and nothing more. We've tailed the logs on both the storage and the node:
>>> nothing. With debugging on we've collected around 10 GB of messages, but
>>> nothing in them points to the problem :(
>>>
>>
>> The problem is with the NFS server.
>>
>> Try the links below to tune the NFS server for better performance.
>>
>> 1. http://www.tldp.org/HOWTO/NFS-HOWTO/performance.html
>>
>> 2. http://www.techrepublic.com/blog/linux-and-open-source/tuning-nfs-for-better-performance/
>>
>>>
>>> > 1. Check the logs on the storage server for more information. What kind
>>> > of errors are you getting?
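The pattern in the excerpt is a close with -103 (ECONNABORTED on Linux) followed by a reconnect 20 to 30 seconds later. To line those outages up against what the Windows VM was doing, the rpcrdma lines can be pulled out of /var/log/messages and paired. Below is a minimal sketch in Python 3. It assumes the RHEL 6 syslog format shown above and a year of 2014, since syslog omits it. The regex, parse_events and outages are hypothetical helpers written for this excerpt, not part of any NFS or RDMA tool.

```python
import errno
import re
from datetime import datetime

# Matches rpcrdma connect/close lines in the syslog format seen above.
# Assumption: the regex is fitted to that excerpt, not to an official format.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel: rpcrdma: "
    r"connection to (?P<peer>\S+) (?:closed \((?P<err>-?\d+)\)|on (?P<dev>\w+))"
)


def parse_events(lines, year=2014):
    """Hypothetical helper: return (time, peer, kind, errno_name) tuples.

    syslog omits the year, so it is passed in (2014 is assumed here).
    """
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not an rpcrdma connect/close line
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        if m["err"] is not None:
            code = abs(int(m["err"]))
            # Decode the negative kernel errno with this host's errno table
            # (103 is ECONNABORTED on Linux).
            name = errno.errorcode.get(code, f"errno {code}")
            events.append((ts, m["peer"], "closed", name))
        else:
            events.append((ts, m["peer"], "connected", None))
    return events


def outages(events):
    """Hypothetical helper: pair each close with the next reconnect to the same peer."""
    pending = {}  # peer -> (time of first unmatched close, errno name)
    result = []
    for ts, peer, kind, name in events:
        if kind == "closed":
            pending.setdefault(peer, (ts, name))
        elif peer in pending:
            closed_at, reason = pending.pop(peer)
            result.append((closed_at, (ts - closed_at).total_seconds(), reason))
    return result


sample = [
    "Mar 20 09:42:22 v0004 kernel: rpcrdma: connection to 192.168.1.1:20049 closed (-103)",
    "Mar 20 09:42:42 v0004 kernel: rpcrdma: connection to 192.168.1.1:20049 on mlx4_0, memreg 5 slots 32 ird 16",
    "Mar 20 09:42:49 v0004 kernel: ------------[ cut here ]------------",
    "Mar 20 09:42:49 v0004 kernel: rpcrdma: connection to 192.168.1.1:20049 closed (-103)",
    "Mar 20 09:43:19 v0004 kernel: rpcrdma: connection to 192.168.1.1:20049 on mlx4_0, memreg 5 slots 32 ird 16",
]

for closed_at, seconds, reason in outages(parse_events(sample)):
    print(f"{closed_at:%b %d %H:%M:%S}  down {seconds:.0f}s  ({reason})")
```

On a Linux host the sample reports two outages, 20 s and 30 s, both ECONNABORTED. Feeding it the whole log (for example parse_events(open("/var/log/messages"))) gives a list of times that can be checked against the Windows guest's own event log.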