On 31/01/17 21:00, Jinesh Choksi wrote:
On 30 January 2017 at 22:17, Adi Pircalabu wrote:
May I chip in here? In our environment we're randomly seeing:
Jan 17 23:40:14 xen01 kernel: ixgbe 0000:04:00.1 eth6: Detected Tx Unit Hang
Someone in this thread: https://sourceforge.net/p/e1000/bugs/530/#2855 reported that /"With these kernels I was only able to work around the issue by disabling tx-checksumming offload with ethtool."/
However, that was reported for kernels 4.2.6, 4.2.8, 4.4.8 and 4.4.10. I mention it only as something you could rule out:
ethtool --offload eth6 rx off tx off
Another thing to rule out in case it's a regression with Intel NICs and TSO:
# tso => tcp-segmentation-offload
# gso => generic-segmentation-offload
# gro => generic-receive-offload
# sg => scatter-gather
# ufo => udp-fragmentation-offload (Cannot change)
# lro => large-receive-offload (Cannot change)
ethtool -K eth6 tso off gso off gro off sg off
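If several interfaces need the same treatment, the two commands above could be wrapped in a small loop. This is only a sketch: the function name disable_offloads and the DRY_RUN switch are made up for illustration, and it defaults to printing the commands rather than running them. Note that --offload and -K are the same ethtool option.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print, or with DRY_RUN=0 run,
# the offload-disabling commands for every interface passed as an argument.
DRY_RUN="${DRY_RUN:-1}"

disable_offloads() {
    for ifc in "$@"; do
        for cmd in \
            "ethtool -K $ifc rx off tx off" \
            "ethtool -K $ifc tso off gso off gro off sg off"
        do
            if [ "$DRY_RUN" = "1" ]; then
                # Dry run: only show what would be executed.
                echo "$cmd"
            else
                # Relies on word splitting; interface names contain no spaces.
                $cmd || echo "failed: $cmd" >&2
            fi
        done
    done
}
```

For example, "disable_offloads eth6" prints the two commands for eth6, and "DRY_RUN=0 disable_offloads eth6" (as root) would apply them. Afterwards "ethtool -k eth6" (lowercase k) lists the current offload settings so the change can be checked. Settings made this way normally do not survive a reboot or driver reload.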
Nice, useful information. I've just disabled tx & rx checksumming on all the 10Gb interfaces on the affected servers, and we'll see how it goes. But as I said yesterday, in our environment it takes months to replicate.
Thanks,
Adi Pircalabu