[CentOS-virt] Problems with scsi-target-utils when hosted on dom0 centos 7 xen box

Fri Apr 15 20:27:08 UTC 2016
Nathan Coulson <nathan at bravenet.com>

On 2016-04-14 07:10 AM, Hans Loots wrote:
> Hello Nathan, dear all,
>
> > We were attempting to use scsi-target-utils, hosted on a xen dom0 vm
> > using localhost, and running into some problems.  I was not able to
> > reproduce this on a centos 7.2 server using the default kernel.
>
> I am seeing comparable things on our centos6 xen servers running 3.18
> kernels. We have about 20 of those machines and started upgrading
> them from 3.10.68 to 3.18 a couple of weeks ago. Now, about three
> quarters of the way through, I'm having second thoughts and am
> thinking about rolling back because of reliability issues.
>
> So far I have made sure that all machines run the latest BIOS and
> ethernet firmware. The servers in question are Dell PowerEdges from
> different generations, talking to an EqualLogic disk array over
> 1Gbit copper. Dell's toolset is installed, OMSA as well as the HIT Kit.
>
> The errors I'm seeing are looking like this:
>
> Apr 13 23:03:43 xen15-2 iscsid: Kernel reported iSCSI connection 25:0 
> error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (1)
> Apr 13 23:03:43 xen15-2 iscsid: Connection25:0 to [target: iqn.xxxxx, 
> portal: a.b.c.d,3260] through [iface: eql.em2] is operational now
> Apr 13 23:03:48 xen15-2 iscsid: Connection9:0 to [target: iqn.xxxxx, 
> portal: a.b.c.d,3260] through [iface: eql.em2] is shutdown.
>

I did not see interface shutdowns in my tests (NetworkManager was doing
something there, but I disabled it for my tests). The hardware is an
old Tyan S2882D motherboard with 8GB RAM and 2x Opteron 275 (dual-core)
processors.



> The only noticeable difference in dmesg output is stuff like this:
> (on 3.18)
> pci 0000:02:00.0: can't claim BAR 6 [mem 0xfff00000-0xffffffff pref]: 
> no compatible bridge window
> pci 0000:01:00.0: can't claim BAR 6 [mem 0xfff80000-0xffffffff pref]: 
> no compatible bridge window
> pci 0000:01:00.0: BAR 6: assigned [mem 0x91e80000-0x91efffff pref]
> pci 0000:01:00.1: BAR 6: no space for [mem size 0x00080000 pref]
> pci 0000:01:00.1: BAR 6: failed to assign [mem size 0x00080000 pref]
> pci 0000:01:00.2: BAR 6: no space for [mem size 0x00080000 pref]
> pci 0000:01:00.2: BAR 6: failed to assign [mem size 0x00080000 pref]
> pci 0000:01:00.3: BAR 6: no space for [mem size 0x00080000 pref]
> pci 0000:01:00.3: BAR 6: failed to assign [mem size 0x00080000 pref]
> (and on 3.10)
> pci 0000:00:03.0: BAR 15: assigned [mem 0xd5200000-0xd53fffff pref]
> pci 0000:01:00.1: BAR 6: assigned [mem 0xd5000000-0xd507ffff pref]
> pci 0000:01:00.2: BAR 6: assigned [mem 0xd5080000-0xd50fffff pref]
> pci 0000:01:00.3: BAR 6: assigned [mem 0xd5100000-0xd517ffff pref]
> pci 0000:00:01.0: PCI bridge to [bus 01]
> pci 0000:00:01.0:   bridge window [mem 0xd8000000-0xd8ffffff]
> pci 0000:00:01.0:   bridge window [mem 0xd5000000-0xd51fffff pref]
>
> But to be honest, my knowledge of the possible cause of this is
> lacking. Is this just a small ACPI-related glitch, or is it a sign
> that the ethernet cards are misbehaving somehow?
>
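On the BAR 6 lines: as far as I know, resource 6 is what Linux uses for a device's expansion ROM, so these messages are about option ROM space and not the register BARs the NIC driver uses. I can't say whether that has anything to do with the iSCSI errors. A minimal sketch to turn the quoted [mem start-end] ranges into sizes, so the two kernels are easier to compare (mem_range_size is just a helper name I made up):

```python
import re

# Matches the "[mem 0xSTART-0xEND" part of the kernel's PCI resource messages.
MEM_RANGE = re.compile(r"\[mem 0x([0-9a-f]+)-0x([0-9a-f]+)")

def mem_range_size(line):
    """Return the size in bytes of the first [mem start-end] range, or None."""
    m = MEM_RANGE.search(line)
    if m is None:
        return None  # e.g. "[mem size 0x00080000 pref]" lines carry no range
    start, end = (int(x, 16) for x in m.groups())
    return end - start + 1  # the printed ranges are inclusive

# Lines copied from the dmesg excerpts quoted above.
lines = [
    "pci 0000:02:00.0: can't claim BAR 6 [mem 0xfff00000-0xffffffff pref]",
    "pci 0000:01:00.0: BAR 6: assigned [mem 0x91e80000-0x91efffff pref]",
    "pci 0000:00:01.0:   bridge window [mem 0xd5000000-0xd51fffff pref]",
]
for line in lines:
    print(f"{mem_range_size(line) // 1024:5d} KiB  {line}")
```

The failed requests are for 0x00080000 bytes each, i.e. 512 KiB, the same size as the ROM ranges that 3.10 managed to assign.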
> Are more people seeing errors in this area?
>
> Thx and regards,
> -- Hans (just trying to make sense of it all)
>
>
> 2016-04-11 22:14 GMT+02:00 Nathan Coulson <nathan at bravenet.com>:
>
>     Hello
>
>     We were attempting to use scsi-target-utils, hosted on a xen dom0
>     vm using localhost, and running into some problems.  I was not
>     able to reproduce this on a centos 7.2 server using the default
>     kernel.
>
>
>     (From dmesg)
>     Apr  4 11:18:42 funk kernel: [  596.511204]  connection2:0:
>     detected conn error (1022)
>     Apr  4 11:18:42 funk kernel: connection2:0: ping timeout of 5 secs
>     expired, recv timeout 5, last rx 4295253788, last ping 4295258790,
>     now 4295263808
>     Apr  4 11:18:42 funk kernel: connection2:0: detected conn error (1022)
>     Apr  4 11:18:42 funk iscsid: Kernel reported iSCSI connection 2:0
>     error (1022 - Invalid or unknown error code) state (3)
>     Apr  4 11:18:44 funk iscsid: connection2:0 is operational after
>     recovery (1 attempts)
>
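For what it's worth, the three counters in the ping timeout line look like jiffies. A rough check, assuming HZ=1000 (the log does not state the tick rate, so that is my assumption), shows the connection sat idle for about 5 seconds before the ping went out and then waited about 5 more seconds for a reply. If I read the kernel's iscsi_if.h correctly, 1022 is the NOP timeout error, which iscsid here apparently does not have a name for.

```python
import re

# Ping timeout line from the dmesg output quoted above.
LINE = ("connection2:0: ping timeout of 5 secs expired, recv timeout 5, "
        "last rx 4295253788, last ping 4295258790, now 4295263808")

HZ = 1000  # assumption: kernel tick rate, not stated anywhere in the log

def ping_gaps(line, hz=HZ):
    """Return (seconds from last rx to last ping, seconds from last ping to now)."""
    m = re.search(r"last rx (\d+), last ping (\d+), now (\d+)", line)
    if m is None:
        raise ValueError("not a ping timeout line")
    last_rx, last_ping, now = (int(x) for x in m.groups())
    return (last_ping - last_rx) / hz, (now - last_ping) / hz

idle, waited = ping_gaps(LINE)
print(f"idle before ping: {idle:.3f}s, waited for reply: {waited:.3f}s")
# idle before ping: 5.002s, waited for reply: 5.018s
```

Since the target is on localhost, an unanswered ping would suggest the target side stalled for those 5 seconds, but that is only a guess.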
>     Repeated a few times, until eventually
>
>
>     Apr  4 11:19:44 funk kernel: Result:
>     hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
>     Apr  4 11:19:44 funk kernel: sd 7:0:0:1: [sdd] CDB:
>     Apr  4 11:19:44 funk kernel: Write(10): 2a 00 01 df c7 e8 00 00 18 00
>     Apr  4 11:19:44 funk kernel: blk_update_request: I/O error, dev
>     sdd, sector 31442920
>     Apr  4 11:19:44 funk kernel: [  658.127596] sd 7:0:0:1: [sdd]
>     Apr  4 11:19:44 funk kernel: [  658.127688] Result:
>     hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
>     Apr  4 11:19:44 funk kernel: [  658.127761] sd 7:0:0:1: [sdd] CDB:
>     Apr  4 11:19:44 funk kernel: [  658.127826] Write(10): 2a 00 01 df
>     c7 e8 00 00 18 00
>     Apr  4 11:19:44 funk kernel: [  658.127927] blk_update_request:
>     I/O error, dev sdd, sector 31442920
>     Apr  4 11:19:44 funk kernel: [  658.128040] sd 7:0:0:1: [sdd]
>     Apr  4 11:19:44 funk kernel: sd 7:0:0:1: [sdd]
>     Apr  4 11:19:44 funk kernel: [  658.128105] Result:
>     hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
>     Apr  4 11:19:44 funk kernel: [  658.128177] sd 7:0:0:1: [sdd] CDB:
>     Apr  4 11:19:44 funk kernel: [  658.128241] Write(10): 2a 00 00 00
>     08 00 00 00 18 00
>     Apr  4 11:19:44 funk kernel: [  658.128339] blk_update_request:
>     I/O error, dev sdd, sector 2048
>     Apr  4 11:19:44 funk kernel: Result:
>     hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
>     Apr  4 11:19:44 funk kernel: sd 7:0:0:1: [sdd] CDB:
>     Apr  4 11:19:44 funk kernel: Write(10): 2a 00 00 00 08 00 00 00 18 00
>     Apr  4 11:19:44 funk kernel: blk_update_request: I/O error, dev
>     sdd, sector 2048
>
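The CDB bytes in those errors can be decoded by hand. A small sketch (decode_write10 is a name I made up) that pulls the logical block address and transfer length out of a WRITE(10) command, to confirm the failing writes line up with the sectors that blk_update_request reports:

```python
def decode_write10(cdb_hex):
    """Decode a WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, blocks

# The two distinct CDBs from the kernel log quoted above.
for cdb in ("2a 00 01 df c7 e8 00 00 18 00", "2a 00 00 00 08 00 00 00 18 00"):
    lba, blocks = decode_write10(cdb)
    print(f"LBA {lba}, {blocks} blocks")
# LBA 31442920, 24 blocks
# LBA 2048, 24 blocks
```

Both are 24-block writes (12 KiB, assuming 512-byte blocks), and the LBAs match the reported sectors 31442920 and 2048.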
>
>     (Test Setup)
>     scsi-target-utils installed via yum, default config
>     /etc/tgt/conf.d/xenguests.conf
>     <target iqn.2016-02.com.bravenet:test>
>         backing-store //mnt/vmdisk/test # vm image
>     </target>
>
>     systemctl restart tgtd
>
>     iscsiadm -m discovery -t sendtargets -p localhost
>
>     iscsiadm -m node -T iqn.2016-02.com.bravenet:test -l
>
>
>     add it to lvm (pvcreate, vgcreate), let's call it
>     /dev/vmdisk.vg/test.lv
>
>     and then use libvirt to attempt to install an os on
>     /dev/vmdisk.vg/test.lv (using anaconda)
>
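To make the test setup easier to repeat on another box, the steps can be written down as a dry-run list. This is only a sketch under assumptions: the device name /dev/sdd is taken from the kernel log, and the lvcreate arguments are my guess, since the original steps only mention pvcreate and vgcreate. It prints the commands and executes nothing.

```python
import shlex

TARGET = "iqn.2016-02.com.bravenet:test"  # from xenguests.conf in the test setup
PORTAL = "localhost"
DISK = "/dev/sdd"                  # assumption: device name seen in the kernel log
VG, LV = "vmdisk.vg", "test.lv"

def setup_commands(target=TARGET, portal=PORTAL, disk=DISK, vg=VG, lv=LV):
    """Return the setup steps as argument lists, in the order they were run."""
    return [
        ["systemctl", "restart", "tgtd"],  # systemctl takes the verb before the unit
        ["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", portal],
        ["iscsiadm", "-m", "node", "-T", target, "-l"],
        ["pvcreate", disk],
        ["vgcreate", vg, disk],
        # assumption: the original message does not say how the LV was sized
        ["lvcreate", "-n", lv, "-l", "100%FREE", vg],
    ]

for cmd in setup_commands():
    # Dry run: print each command, quoted for the shell, without running it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```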
>
>
>
>     The conn errors start around the time it tries to create the
>     disk label, and continue until it eventually gives up trying to
>     create the disk label.
>
>
>
>     We tested a similar setup on a centos 7.2 host that we use for
>     kvm-based virtual machine hosting (default 3.10 kernel), and it
>     worked fine.  It may be similar to what was reported on
>     https://bugzilla.redhat.com/show_bug.cgi?id=1245990, but I never
>     saw a resolution on what they discovered (other than a reference
>     to comment18 which does not appear to exist).
>
>     Testing over the network also appears to work (where another
>     machine connects to scsi-target-utils on the funk server above).
>
>
>
>
>
>     The long-term purpose of the above setup was to get direct access
>     to a filesystem image hosted on a gluster setup, using bs-type
>     glfs on scsi-target-utils.
>
>     -- 
>     Nathan Coulson
>     www.bravenet.com
>     nathan at bravenet.com
>     _______________________________________________
>     CentOS-virt mailing list
>     CentOS-virt at centos.org
>     https://lists.centos.org/mailman/listinfo/centos-virt


-- 
Nathan Coulson
System Administrator for Bravenet
www.bravenet.com
nathan at bravenet.com