[CentOS-devel] [Users] 6.4 CR: oVirt 3.1 breaks with missing cpu features after update to CentOS 6.4 (6.3 + CR)

Thu Mar 7 14:59:27 UTC 2013
Patrick Hurrelmann <patrick.hurrelmann at lobster.de>

On 05.03.2013 13:49, Dan Kenigsberg wrote:
> On Tue, Mar 05, 2013 at 12:32:31PM +0100, Patrick Hurrelmann wrote:
>> On 05.03.2013 11:14, Dan Kenigsberg wrote:
>> <snip>
>>>>>>
>>>>>> My version of vdsm as stated by Dreyou:
>>>>>> v 4.10.0-0.46 (.15), built from
>>>>>> b59c8430b2a511bcea3bc1a954eee4ca1c0f4861 (branch ovirt-3.1)
>>>>>>
>>>>>> I can't see that Ia241b09c96fa16441ba9421f61a2f9a417f0d978 was merged to
>>>>>> 3.1 Branch?
>>>>>>
>>>>>> I applied that patch locally and restarted vdsmd but this does not
>>>>>> change anything. Supported cpu is still as low as Conroe instead of
>>>>>> Nehalem. Or is there more to do than patching libvirtvm.py?
>>>>>
>>>>> What is libvirt's opinion about your cpu compatibility?
>>>>>
>>>>>      virsh -r cpu-compare <(echo '<cpu match="minimum"><model>Nehalem</model><vendor>Intel</vendor></cpu>')
>>>>>
>>>>> If you do not get "Host CPU is a superset of CPU described in bla", then
>>>>> the problem is within libvirt.
>>>>>
>>>>> Dan.
>>>>
>>>> Hi Dan,
>>>>
>>>> virsh -r cpu-compare <(echo '<cpu
>>>> match="minimum"><model>Nehalem</model><vendor>Intel</vendor></cpu>')
>>>> Host CPU is a superset of CPU described in /dev/fd/63
>>>>
>>>> So libvirt is obviously fine. Anything else would have surprised
>>>> me, as virsh capabilities seemed correct anyway.
>>>
>>> So maybe, just maybe, libvirt has changed their cpu_map, a map that
>>> ovirt-3.1 had a bug reading.
>>>
>>> Would you care to apply http://gerrit.ovirt.org/5035 to see if this is
>>> it?
>>>
>>> Dan.
>>
>> Hi Dan,
>>
>> success! Applying that patch made the cpu recognition work again. The
>> cpu type in admin portal shows again as Nehalem. Output from getVdsCaps:
>>
>>    cpuCores = 4
>>    cpuFlags = fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,
>>               mca,cmov,pat,pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,
>>               ss,ht,tm,pbe,syscall,nx,rdtscp,lm,constant_tsc,
>>               arch_perfmon,pebs,bts,rep_good,xtopology,nonstop_tsc,
>>               aperfmperf,pni,dtes64,monitor,ds_cpl,vmx,smx,est,tm2,
>>               ssse3,cx16,xtpr,pdcm,sse4_1,sse4_2,popcnt,lahf_lm,ida,
>>               dts,tpr_shadow,vnmi,flexpriority,ept,vpid,model_Nehalem,
>>               model_Conroe,model_coreduo,model_core2duo,model_Penryn,
>>               model_n270
>>    cpuModel = Intel(R) Xeon(R) CPU           X3430  @ 2.40GHz
>>    cpuSockets = 1
>>    cpuSpeed = 2393.769
>>
>>
>> I compared libvirt's cpu_map.xml on both CentOS 6.3 and CentOS 6.4 and
>> they do indeed differ in large portions. So this patch should probably
>> be merged to the 3.1 branch? I will contact Dreyou and request that this
>> patch also be included in his builds. I guess otherwise there will
>> be quite some fallout once people start picking CentOS 6.4 for oVirt 3.1.
>>
>> Thanks again and best regards
> 
> Thank you for reporting this issue and verifying its fix.
> 
> I'm not completely sure that we should keep maintaining the ovirt-3.1
> branch upstream - but a build destined for el6.4 must have it.
> 
> If you believe we should release a fix version for 3.1, please verify
> that http://gerrit.ovirt.org/12723 has no ill effects.
> 
> Dan.

I did some additional tests and the new CentOS 6.4 host failed to start or
migrate any VM. It always boils down to:

Thread-43::ERROR::2013-03-07
15:02:51,950::task::853::TaskManager.Task::(_setError)
Task=`52a9f96f-3dfd-4bcf-8d7a-db14e650b4c1`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2551, in getVolumeSize
    apparentsize = str(volume.Volume.getVSize(sdUUID, imgUUID, volUUID,
bs=1))
  File "/usr/share/vdsm/storage/volume.py", line 283, in getVSize
    return mysd.getVolumeClass().getVSize(mysd, imgUUID, volUUID, bs)
  File "/usr/share/vdsm/storage/blockVolume.py", line 101, in getVSize
    return int(int(lvm.getLV(sdobj.sdUUID, volUUID).size) / bs)
  File "/usr/share/vdsm/storage/lvm.py", line 772, in getLV
    lv = _lvminfo.getLv(vgName, lvName)
  File "/usr/share/vdsm/storage/lvm.py", line 567, in getLv
    lvs = self._reloadlvs(vgName)
  File "/usr/share/vdsm/storage/lvm.py", line 419, in _reloadlvs
    self._lvs.pop((vgName, lvName), None)
  File "/usr/lib64/python2.6/contextlib.py", line 34, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/share/vdsm/storage/misc.py", line 1219, in acquireContext
    yield self
  File "/usr/share/vdsm/storage/lvm.py", line 404, in _reloadlvs
    lv = makeLV(*fields)
  File "/usr/share/vdsm/storage/lvm.py", line 218, in makeLV
    attrs = _attr2NamedTuple(args[LV._fields.index("attr")],
LV_ATTR_BITS, "LV_ATTR")
  File "/usr/share/vdsm/storage/lvm.py", line 188, in _attr2NamedTuple
    attrs = Attrs(*values)
TypeError: __new__() takes exactly 9 arguments (10 given)

and followed by:

Thread-43::ERROR::2013-03-07
15:02:51,987::dispatcher::69::Storage.Dispatcher.Protect::(run)
__new__() takes exactly 9 arguments (10 given)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/dispatcher.py", line 61, in run
    result = ctask.prepare(self.func, *args, **kwargs)
  File "/usr/share/vdsm/storage/task.py", line 1164, in prepare
    raise self.error
TypeError: __new__() takes exactly 9 arguments (10 given)
Thread-43::DEBUG::2013-03-07
15:02:51,987::vm::580::vm.Vm::(_startUnderlyingVm)
vmId=`7db86f12-8c57-4d2b-a853-a6fd6f7ee82d`::_ongoingCreations released
Thread-43::ERROR::2013-03-07
15:02:51,987::vm::604::vm.Vm::(_startUnderlyingVm)
vmId=`7db86f12-8c57-4d2b-a853-a6fd6f7ee82d`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 570, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/libvirtvm.py", line 1289, in _run
    devices = self.buildConfDevices()
  File "/usr/share/vdsm/vm.py", line 431, in buildConfDevices
    self._normalizeVdsmImg(drv)
  File "/usr/share/vdsm/vm.py", line 358, in _normalizeVdsmImg
    drv['truesize'] = res['truesize']
KeyError: 'truesize'

In webadmin the start and migrate operations fail with 'truesize'.

I found BZ#876958, which reports the very same error. So I tried
applying the patch from http://gerrit.ovirt.org/9317. I had to apply it
manually (I guess the patch would need a rebase for 3.1), but it works.
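
For context, a minimal sketch of why the first traceback ends in that
TypeError: judging by the frames in lvm.py, each character of the LV
attribute string is mapped onto one field of a named tuple, so an
attribute string that is one character longer than the field list fails
in exactly this way. The field names, the sample attribute strings and
the tolerant variant below are assumptions for illustration only; they
are not taken from the gerrit change, and the exact wording of the
exception differs between Python versions.

```python
from collections import namedtuple

# Assumed field names (8 of them); vdsm's real LV_ATTR_BITS may differ.
LV_ATTR_BITS = ("voltype", "permission", "allocations", "fixedminor",
                "state", "devopen", "target", "zero")


def attr_to_namedtuple_strict(attr, field_names, name):
    """Map one character per field; fails if len(attr) != len(field_names)."""
    Attrs = namedtuple(name, field_names)
    return Attrs(*attr)


def attr_to_namedtuple_tolerant(attr, field_names, name):
    """Hypothetical variant: ignore trailing characters that have no field."""
    Attrs = namedtuple(name, field_names)
    return Attrs(*attr[:len(field_names)])


if __name__ == "__main__":
    old_style = "-wi-ao--"    # 8 characters, matches the 8 fields
    new_style = "-wi-ao---"   # 9 characters, assumed newer LVM output

    print(attr_to_namedtuple_strict(old_style, LV_ATTR_BITS, "LV_ATTR"))
    try:
        attr_to_namedtuple_strict(new_style, LV_ATTR_BITS, "LV_ATTR")
    except TypeError as exc:
        print("strict mapping failed:", exc)
    print(attr_to_namedtuple_tolerant(new_style, LV_ATTR_BITS, "LV_ATTR"))
```

Truncating to the known fields is only one possible way to tolerate a
longer attribute string; extending the field list would be another.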

I can now start new virtual machines successfully on a CentOS 6.4 /
oVirt 3.1 host. Migration of VMs from CentOS 6.3 hosts works, but not the
other way around. Migration from 6.4 to 6.3 fails:

Thread-1296::ERROR::2013-03-07 15:55:24,845::vm::176::vm.Vm::(_recover)
vmId=`c978cbf8-6b4d-4d6f-9435-480d9fed31c4`::internal error Process
exited while reading console log output: Supported machines are:
pc         RHEL 6.3.0 PC (alias of rhel6.3.0)
rhel6.3.0  RHEL 6.3.0 PC (default)
rhel6.2.0  RHEL 6.2.0 PC
rhel6.1.0  RHEL 6.1.0 PC
rhel6.0.0  RHEL 6.0.0 PC
rhel5.5.0  RHEL 5.5.0 PC
rhel5.4.4  RHEL 5.4.4 PC
rhel5.4.0  RHEL 5.4.0 PC

Thread-1296::ERROR::2013-03-07 15:55:24,988::vm::240::vm.Vm::(run)
vmId=`c978cbf8-6b4d-4d6f-9435-480d9fed31c4`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 223, in run
    self._startUnderlyingMigration()
  File "/usr/share/vdsm/libvirtvm.py", line 451, in
_startUnderlyingMigration
    None, maxBandwidth)
  File "/usr/share/vdsm/libvirtvm.py", line 491, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/vdsm/libvirtconnection.py",
line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1178, in
migrateToURI2
    if ret == -1: raise libvirtError ('virDomainMigrateToURI2() failed',
dom=self)
libvirtError: internal error Process exited while reading console log
output: Supported machines are:
pc         RHEL 6.3.0 PC (alias of rhel6.3.0)
rhel6.3.0  RHEL 6.3.0 PC (default)
rhel6.2.0  RHEL 6.2.0 PC
rhel6.1.0  RHEL 6.1.0 PC
rhel6.0.0  RHEL 6.0.0 PC
rhel5.5.0  RHEL 5.5.0 PC
rhel5.4.4  RHEL 5.4.4 PC
rhel5.4.0  RHEL 5.4.0 PC

But I guess this is fine, as migration from a higher host version to a
lower one is probably not supported, right?
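
To make the failure above easier to check, here is a small sketch that
extracts the machine types listed by the destination host from such an
error message and tests whether a given machine type is among them. The
helper name parse_supported_machines is hypothetical, and the machine
type "rhel6.4.0" used in the check is an assumption about what a VM
started on the 6.4 host would request; the log itself does not name it.

```python
def parse_supported_machines(message):
    """Return the machine type names listed after 'Supported machines are:'."""
    marker = "Supported machines are:"
    _, found, tail = message.partition(marker)
    if not found:
        return []
    machines = []
    for line in tail.splitlines():
        parts = line.split()
        if not parts:
            if machines:
                break       # a blank line ends the list
            continue        # skip the remainder of the marker line
        machines.append(parts[0])
    return machines


# Error text as reported by the CentOS 6.3 destination host.
ERROR_TEXT = """internal error Process exited while reading console log
output: Supported machines are:
pc         RHEL 6.3.0 PC (alias of rhel6.3.0)
rhel6.3.0  RHEL 6.3.0 PC (default)
rhel6.2.0  RHEL 6.2.0 PC
rhel6.1.0  RHEL 6.1.0 PC
rhel6.0.0  RHEL 6.0.0 PC
rhel5.5.0  RHEL 5.5.0 PC
rhel5.4.4  RHEL 5.4.4 PC
rhel5.4.0  RHEL 5.4.0 PC
"""

if __name__ == "__main__":
    supported = parse_supported_machines(ERROR_TEXT)
    wanted = "rhel6.4.0"  # assumed machine type of the VM on the 6.4 host
    print(wanted, "supported on destination:", wanted in supported)
```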

Regards
Patrick
