[CentOS-virt] Problems with scsi-target-utils when hosted on dom0 centos 7 xen box

Thu Apr 14 14:10:34 UTC 2016
Hans Loots <hans.loots at webpower.nl>

Hello Nathan, dear all,

> We were attempting to use scsi-target-utils, hosted on a xen dom0 vm
> using localhost, and running into some problems.  I was not able to
> reproduce this on a centos 7.2 server using the default kernel.

I am seeing comparable things on our CentOS 6 Xen servers running 3.18
kernels. We have about 20 of those machines and started upgrading them
from 3.10.68 to 3.18 a couple of weeks ago. Now, about three quarters of
the way through, I'm having second thoughts and am thinking about rolling
back because of reliability issues.

So far I've made sure that all machines run the latest BIOS and Ethernet
firmware. The servers in question are Dell PowerEdges from different
generations, talking to an EqualLogic disk array over 1 Gbit copper.
Dell's toolset is installed, OMSA as well as the HIT kit.

The errors I'm seeing look like this:

Apr 13 23:03:43 xen15-2 iscsid: Kernel reported iSCSI connection 25:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (1)
Apr 13 23:03:43 xen15-2 iscsid: Connection25:0 to [target: iqn.xxxxx, portal: a.b.c.d,3260] through [iface: eql.em2] is operational now
Apr 13 23:03:48 xen15-2 iscsid: Connection9:0 to [target: iqn.xxxxx, portal: a.b.c.d,3260] through [iface: eql.em2] is shutdown.
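
To get an overview of how often each connection drops, and with which
error code, something like the following could be run over the syslog
file. This is only a rough sketch: the function name
summarise_iscsi_errors is made up for illustration, and it assumes each
iscsid message sits on a single line in the log file rather than being
wrapped as in this mail.

```shell
#!/bin/sh
# Summarise iscsid "Kernel reported iSCSI connection" errors read from stdin.
# Prints one line per (connection, error code) pair: "<conn> <code> <count>".
# summarise_iscsi_errors is a hypothetical helper name, not part of open-iscsi.
summarise_iscsi_errors() {
    # Keep only the connection id (e.g. 25:0) and the numeric error code.
    sed -n 's/.*Kernel reported iSCSI connection \([0-9]*:[0-9]*\) error (\([0-9]*\).*/\1 \2/p' |
        LC_ALL=C sort |
        uniq -c |
        awk '{ print $2, $3, $1 }'
}
```

It could be used as "summarise_iscsi_errors < /var/log/messages",
assuming that is where the iscsid messages end up on the machine in
question.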

The only noticeable difference in dmesg output is stuff like this:
(on 3.18)
pci 0000:02:00.0: can't claim BAR 6 [mem 0xfff00000-0xffffffff pref]: no compatible bridge window
pci 0000:01:00.0: can't claim BAR 6 [mem 0xfff80000-0xffffffff pref]: no compatible bridge window
pci 0000:01:00.0: BAR 6: assigned [mem 0x91e80000-0x91efffff pref]
pci 0000:01:00.1: BAR 6: no space for [mem size 0x00080000 pref]
pci 0000:01:00.1: BAR 6: failed to assign [mem size 0x00080000 pref]
pci 0000:01:00.2: BAR 6: no space for [mem size 0x00080000 pref]
pci 0000:01:00.2: BAR 6: failed to assign [mem size 0x00080000 pref]
pci 0000:01:00.3: BAR 6: no space for [mem size 0x00080000 pref]
pci 0000:01:00.3: BAR 6: failed to assign [mem size 0x00080000 pref]
(and on 3.10)
pci 0000:00:03.0: BAR 15: assigned [mem 0xd5200000-0xd53fffff pref]
pci 0000:01:00.1: BAR 6: assigned [mem 0xd5000000-0xd507ffff pref]
pci 0000:01:00.2: BAR 6: assigned [mem 0xd5080000-0xd50fffff pref]
pci 0000:01:00.3: BAR 6: assigned [mem 0xd5100000-0xd517ffff pref]
pci 0000:00:01.0: PCI bridge to [bus 01]
pci 0000:00:01.0:   bridge window [mem 0xd8000000-0xd8ffffff]
pci 0000:00:01.0:   bridge window [mem 0xd5000000-0xd51fffff pref]
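
To compare the two kernels across all machines without reading the whole
dmesg by eye, the failed assignments can be pulled out with a small
filter. A minimal sketch, assuming the message format shown above; the
name bar_failures is invented here, and the filter only picks up the
"failed to assign" lines, not the "can't claim" or "no space" ones.

```shell
#!/bin/sh
# List PCI devices for which the kernel logged "BAR N: failed to assign".
# Reads dmesg-style text on stdin and prints "<device> BAR <n>" once per
# device/BAR pair. bar_failures is a hypothetical helper name.
bar_failures() {
    # The leading .* also swallows an optional "[  timestamp]" prefix.
    sed -n 's/.*pci \([0-9a-f:.]*\): BAR \([0-9]*\): failed to assign.*/\1 BAR \2/p' |
        LC_ALL=C sort -u
}
```

Running "dmesg | bar_failures" on a 3.18 box and on a 3.10 box and
diffing the two outputs should show whether the failures are limited to
the same devices everywhere.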

But to be honest, I don't know enough to say what might be causing this.
Is this just a small ACPI-related glitch, or is it a sign that the
Ethernet cards are misbehaving somehow?

Is anyone else seeing errors in this area?

Thx and regards,
-- Hans (just trying to make sense of it all)


2016-04-11 22:14 GMT+02:00 Nathan Coulson <nathan at bravenet.com>:

> Hello
>
> We were attempting to use scsi-target-utils, hosted on a xen dom0 vm using
> localhost, and running into some problems.  I was not able to reproduce
> this on a centos 7.2 server using the default kernel.
>
>
> (From dmesg)
> Apr  4 11:18:42 funk kernel: [  596.511204]  connection2:0: detected conn
> error (1022)
> Apr  4 11:18:42 funk kernel: connection2:0: ping timeout of 5 secs
> expired, recv timeout 5, last rx 4295253788, last ping 4295258790, now
> 4295263808
> Apr  4 11:18:42 funk kernel: connection2:0: detected conn error (1022)
> Apr  4 11:18:42 funk iscsid: Kernel reported iSCSI connection 2:0 error
> (1022 - Invalid or unknown error code) state (3)
> Apr  4 11:18:44 funk iscsid: connection2:0 is operational after recovery
> (1 attempts)
>
> Repeated a few times, until eventually
>
>
> Apr  4 11:19:44 funk kernel: Result: hostbyte=DID_TRANSPORT_DISRUPTED
> driverbyte=DRIVER_OK
> Apr  4 11:19:44 funk kernel: sd 7:0:0:1: [sdd] CDB:
> Apr  4 11:19:44 funk kernel: Write(10): 2a 00 01 df c7 e8 00 00 18 00
> Apr  4 11:19:44 funk kernel: blk_update_request: I/O error, dev sdd,
> sector 31442920
> Apr  4 11:19:44 funk kernel: [  658.127596] sd 7:0:0:1: [sdd]
> Apr  4 11:19:44 funk kernel: [  658.127688] Result:
> hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
> Apr  4 11:19:44 funk kernel: [  658.127761] sd 7:0:0:1: [sdd] CDB:
> Apr  4 11:19:44 funk kernel: [  658.127826] Write(10): 2a 00 01 df c7 e8
> 00 00 18 00
> Apr  4 11:19:44 funk kernel: [  658.127927] blk_update_request: I/O error,
> dev sdd, sector 31442920
> Apr  4 11:19:44 funk kernel: [  658.128040] sd 7:0:0:1: [sdd]
> Apr  4 11:19:44 funk kernel: sd 7:0:0:1: [sdd]
> Apr  4 11:19:44 funk kernel: [  658.128105] Result:
> hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
> Apr  4 11:19:44 funk kernel: [  658.128177] sd 7:0:0:1: [sdd] CDB:
> Apr  4 11:19:44 funk kernel: [  658.128241] Write(10): 2a 00 00 00 08 00
> 00 00 18 00
> Apr  4 11:19:44 funk kernel: [  658.128339] blk_update_request: I/O error,
> dev sdd, sector 2048
> Apr  4 11:19:44 funk kernel: Result: hostbyte=DID_TRANSPORT_DISRUPTED
> driverbyte=DRIVER_OK
> Apr  4 11:19:44 funk kernel: sd 7:0:0:1: [sdd] CDB:
> Apr  4 11:19:44 funk kernel: Write(10): 2a 00 00 00 08 00 00 00 18 00
> Apr  4 11:19:44 funk kernel: blk_update_request: I/O error, dev sdd,
> sector 2048
>
>
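
The Write(10) CDBs in the log above can be decoded by hand to cross-check
them against the sector numbers in the I/O errors. A small sketch,
assuming the standard Write(10) layout (byte 0 is the opcode 2a, bytes
2-5 are the big-endian LBA, bytes 7-8 the transfer length in blocks);
the function name decode_write10 is made up for illustration.

```shell
#!/bin/sh
# Decode a Write(10) CDB given as ten space-separated lowercase hex bytes.
# Prints "lba=<decimal> blocks=<decimal>".
# decode_write10 is a hypothetical helper name.
decode_write10() {
    # Re-split the arguments so that "$1".."$10" are the individual bytes,
    # whether the CDB was passed as one string or as ten words.
    set -- $*
    [ "$#" -eq 10 ] && [ "$1" = "2a" ] || {
        echo "not a Write(10) CDB" >&2
        return 1
    }
    lba=$(( 0x$3$4$5$6 ))      # bytes 2-5: logical block address
    blocks=$(( 0x$8$9 ))       # bytes 7-8: transfer length in blocks
    echo "lba=$lba blocks=$blocks"
}
```

For the first CDB, decode_write10 "2a 00 01 df c7 e8 00 00 18 00" prints
"lba=31442920 blocks=24", which matches the sector reported by
blk_update_request. The second CDB decodes to LBA 2048 in the same way.
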
> (Test Setup)
> scsi-target-utils installed via yum, default config
> /etc/tgt/conf.d/xenguests.conf
> <target iqn.2016-02.com.bravenet:test>
>     backing-store //mnt/vmdisk/test # vm image
> </target>
>
> systemctl restart tgtd
>
> iscsiadm -m discovery -t sendtargets -p localhost
>
> iscsiadm -m node -T iqn.2016-02.com.bravenet:test -l
>
>
> Add it to LVM (pvcreate, vgcreate); let's call it /dev/vmdisk.vg/test.lv
>
> Then use libvirt to attempt to install an OS on /dev/vmdisk.vg/test.lv
> (using anaconda).
>
>
>
>
> The conn errors start around the time it tries to create the disk label,
> and continue until it eventually gives up trying to create it.
>
>
>
> We tested a similar setup on a centos 7.2 host that we use for KVM-based
> virtual machine hosting (default 3.10 kernel), and it worked fine.  It
> may be similar to what was reported on
> https://bugzilla.redhat.com/show_bug.cgi?id=1245990, but I never saw a
> resolution on what they discovered (other than a reference to comment 18,
> which does not appear to exist).
>
> Testing over the network also appears to work (where another machine
> connects to scsi-target-utils on the funk server above).
>
>
>
>
>
> The long-term purpose of the above setup was to get direct access to a
> filesystem image hosted on a gluster setup, using bs-type glfs on
> scsi-target-utils.
>
> --
> Nathan Coulson
> www.bravenet.com
> nathan at bravenet.com