[CentOS-virt] Xen vs. iSCSI

Tue Jun 16 03:11:45 UTC 2009
Bill McGonigle <bill at bfccomputing.com>

[previously sent to rhelv5 list, apologies to those on both]

I've got a problem I can reproduce easily enough, but I really don't 
understand what's going wrong.

I've got a 5.3 Dom0 running three guests.  One is Fedora 10, which runs 
from local flat files and works fine.  One is Nexenta 2 
(OpenSolaris-based), which runs off physical partitions and seems to 
work great.  The third runs Fedora 11 and uses as its disks iSCSI 
devices exported from Nexenta (ZFS-backed).

The Dom0 maps the two iSCSI devices, one for /boot and one for /.  
They initially show up as /dev/sdc and /dev/sdd.

If I go after the iSCSI devices from Dom0, with dd for instance, they 
work fine all day: I can read and write the entire devices to and from 
local files without error, at about 38 MB/s, so iSCSI seems to work 
properly in that regard.  I've scrubbed the disk pool with no errors 
found, and a long SMART self-test passed on each of the disks.  I've 
also been able to mount both iSCSI devices and run bonnie++ on them 
successfully from Dom0.

So I specify those devices in the Xen config for the DomU (I've tried 
both the real device names and the /dev/disk/by-path/ names), and the 
DomU boots and operates as I'd expect.  Installation worked fine, and 
typical low-volume operations work.  However, when I try something 
presumably more disk-intensive, like running a yum update, iSCSI seems 
to fall over.

In the DomU, I'll see a lock-up, and then filesystem errors.  e.g.:

   Installing     : kernel [############################################ ]  1/33EXT3-fs error (device xvda1) in ext3_ordered_writepage: IO failure

In the Dom0, I'll see:

   sd 6:0:0:0: timing out command, waited 360s
   sd 6:0:0:0: SCSI error: return code = 0x06050000
   end_request: I/O error, dev sdc, sector 37319
   sd 7:0:0:0: timing out command, waited 360s
   sd 7:0:0:0: SCSI error: return code = 0x06000000
   end_request: I/O error, dev sdd, sector 29792137
   sd 7:0:0:0: timing out command, waited 360s
   sd 7:0:0:0: SCSI error: return code = 0x06000000
   end_request: I/O error, dev sdd, sector 29792313
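
For reference, the return codes above can be unpacked using the layout 
of the 32-bit SCSI result word in 2.6-era Linux kernels (assumed here 
from include/scsi/scsi.h): driver byte in bits 24-31, host byte in bits 
16-23, message byte in bits 8-15, and SCSI status byte in bits 0-7.  The 
sketch below is a hypothetical helper (decode_scsi_result is my own 
name, and the tables name only a few relevant values).  Under that 
assumption, 0x06050000 reads as DRIVER_TIMEOUT with DID_ABORT and 
0x06000000 as DRIVER_TIMEOUT with DID_OK, which is consistent with the 
initiator giving up after the 360s wait rather than the target 
returning an error status.

```python
# Hypothetical helper: split a Linux SCSI "return code" (the 32-bit
# result printed by the sd driver) into its four bytes, assuming the
# 2.6-era layout: driver byte | host byte | message byte | status byte.

# Only a few values are named here (assumed from 2.6 include/scsi/scsi.h);
# anything else is shown as hex.
HOST_BYTES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x03: "DID_TIME_OUT",
    0x05: "DID_ABORT",
    0x07: "DID_ERROR",
}
DRIVER_BYTES = {
    0x00: "DRIVER_OK",
    0x06: "DRIVER_TIMEOUT",
    0x08: "DRIVER_SENSE",
}


def decode_scsi_result(result: int) -> dict:
    """Return the named fields of a SCSI result word."""
    driver = (result >> 24) & 0xFF
    host = (result >> 16) & 0xFF
    msg = (result >> 8) & 0xFF
    status = result & 0xFF
    return {
        "driver": DRIVER_BYTES.get(driver, hex(driver)),
        "host": HOST_BYTES.get(host, hex(host)),
        "msg": msg,
        "status": status,
    }


if __name__ == "__main__":
    # Return codes copied from the Dom0 log messages in this thread.
    for code in (0x06050000, 0x06000000, 0x00010000):
        fields = decode_scsi_result(code)
        print(f"{code:#010x}: driver={fields['driver']} host={fields['host']}")
```

The same decoding applied to the 0x00010000 code in the later log gives 
DID_NO_CONNECT, which fits the "rejecting I/O to dead device" message 
that follows it.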

Both (all) iSCSI devices end up failed.  Under iostat I see activity to 
the iSCSI block devices, and the whole machine behaves as if it's mostly 
I/O-blocked (even the Fedora 10 DomU running on flat files starts 
throwing Nagios into a tizzy).  If I do 'service iscsi stop', everything 
picks right back up (though the DomU using them as its disks is 
obviously unhappy).

When I start iscsi again I can pick right back up (after repairing the 
filesystems in the DomU), and I can repeat the process at will.  
Sometimes the disks come back as, e.g., sdd and sde, which makes me 
think something still has a handle on sdc.  But lsof shows nothing in 
Dom0.

One thing that stood out is that some of the errors fall right on 
power-of-two block and sector boundaries:

  scsi 7:0:0:0: SCSI error: return code = 0x00010000
  end_request: I/O error, dev sdd, sector 32768
  Buffer I/O error on device sdd, logical block 4096
  Buffer I/O error on device sdd, logical block 4097
  Buffer I/O error on device sdd, logical block 4098
  Buffer I/O error on device sdd, logical block 4099
  Buffer I/O error on device sdd, logical block 4100
  Buffer I/O error on device sdd, logical block 4101
  Buffer I/O error on device sdd, logical block 4102
  Buffer I/O error on device sdd, logical block 4103
  Buffer I/O error on device sdd, logical block 4104
  Buffer I/O error on device sdd, logical block 4105
  scsi 7:0:0:0: rejecting I/O to dead device
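
The alignment can be checked with a little arithmetic, assuming 
512-byte sectors in the end_request messages and the 4096-byte blocks 
implied by the Buffer I/O errors on sdd: sector 32768 is 2^15, and 
32768 x 512 / 4096 = 4096, so the failed request starts exactly at 
logical block 4096 (2^12).  The sketch below (sector_to_block and 
is_power_of_two are hypothetical names) runs the same check over the 
sector numbers reported in these logs; only 32768 lands on a power of 
two, while 37319, 29792137 and 29792313 do not.

```python
# Hypothetical sketch: check which reported sectors sit on a power-of-two
# boundary, and convert a 512-byte sector number into the block number
# printed by "Buffer I/O error" messages.

SECTOR_SIZE = 512   # bytes per sector in end_request messages
BLOCK_SIZE = 4096   # assumed: inferred from sector 32768 -> block 4096 on sdd


def is_power_of_two(n: int) -> bool:
    """True if n is a positive power of two (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0


def sector_to_block(sector: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of the block containing the given 512-byte sector."""
    return sector * SECTOR_SIZE // block_size


if __name__ == "__main__":
    # Sector numbers copied from the end_request lines in the Dom0 logs.
    # The block conversion assumes the same 4096-byte block size for all.
    for sector in (37319, 29792137, 29792313, 32768):
        block = sector_to_block(sector)
        print(f"sector {sector}: block {block}, "
              f"power-of-two sector: {is_power_of_two(sector)}")
```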

But as I said at the start, I'm sort of at a loss as to what is actually 
causing the problem.  Any suggestions for further troubleshooting, or 
ideas about what's happening, would be appreciated.

Thanks,
-Bill

-- 
Bill McGonigle, Owner           Work: 603.448.4440
BFC Computing, LLC              Home: 603.448.1668
http://www.bfccomputing.com/    Cell: 603.252.2606
Twitter, etc.: bill_mcgonigle   Page: 603.442.1833
Email, IM, VOIP: bill at bfccomputing.com
Blog: http://blog.bfccomputing.com/
VCard: http://bfccomputing.com/vcard/bill.vcf