[CentOS-devel] [Users] 6.4 CR: oVirt 3.1 breaks with missing cpu features after update to CentOS 6.4 (6.3 + CR)

Thu Mar 7 15:50:16 UTC 2013
Patrick Hurrelmann <patrick.hurrelmann at lobster.de>

On 07.03.2013 16:18, Dan Kenigsberg wrote:
> On Thu, Mar 07, 2013 at 03:59:27PM +0100, Patrick Hurrelmann wrote:
>> On 05.03.2013 13:49, Dan Kenigsberg wrote:
>>> On Tue, Mar 05, 2013 at 12:32:31PM +0100, Patrick Hurrelmann wrote:
>>>> On 05.03.2013 11:14, Dan Kenigsberg wrote:
>>>> <snip>
>>>>>>>>
>>>>>>>> My version of vdsm as stated by Dreyou:
>>>>>>>> v 4.10.0-0.46 (.15), built from
>>>>>>>> b59c8430b2a511bcea3bc1a954eee4ca1c0f4861 (branch ovirt-3.1)
>>>>>>>>
>>>>>>>> I can't see that Ia241b09c96fa16441ba9421f61a2f9a417f0d978 was merged to
>>>>>>>> 3.1 Branch?
>>>>>>>>
>>>>>>>> I applied that patch locally and restarted vdsmd but this does not
>>>>>>>> change anything. Supported cpu is still as low as Conroe instead of
>>>>>>>> Nehalem. Or is there more to do than patching libvirtvm.py?
>>>>>>>
>>>>>>> What is libvirt's opinion about your cpu compatibility?
>>>>>>>
>>>>>>>      virsh -r cpu-compare <(echo '<cpu match="minimum"><model>Nehalem</model><vendor>Intel</vendor></cpu>')
>>>>>>>
>>>>>>> If you do not get "Host CPU is a superset of CPU described in bla", then
>>>>>>> the problem is within libvirt.
>>>>>>>
>>>>>>> Dan.
>>>>>>
>>>>>> Hi Dan,
>>>>>>
>>>>>> virsh -r cpu-compare <(echo '<cpu
>>>>>> match="minimum"><model>Nehalem</model><vendor>Intel</vendor></cpu>')
>>>>>> Host CPU is a superset of CPU described in /dev/fd/63
>>>>>>
>>>>>> So libvirt is obviously fine. Anything else would have surprised
>>>>>> me, as virsh capabilities seemed correct anyway.
>>>>>
>>>>> So maybe, just maybe, libvirt has changed their cpu_map, a map that
>>>>> ovirt-3.1 had a bug reading.
>>>>>
>>>>> Would you care to apply http://gerrit.ovirt.org/5035 to see if this is
>>>>> it?
>>>>>
>>>>> Dan.
>>>>
>>>> Hi Dan,
>>>>
>>>> success! Applying that patch made the cpu recognition work again.
>>>> The cpu type in the admin portal shows as Nehalem again. Output
>>>> from getVdsCaps:
>>>>
>>>>    cpuCores = 4
>>>>    cpuFlags = fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,
>>>>               mca,cmov,pat,pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,
>>>>               ss,ht,tm,pbe,syscall,nx,rdtscp,lm,constant_tsc,
>>>>               arch_perfmon,pebs,bts,rep_good,xtopology,nonstop_tsc,
>>>>               aperfmperf,pni,dtes64,monitor,ds_cpl,vmx,smx,est,tm2,
>>>>               ssse3,cx16,xtpr,pdcm,sse4_1,sse4_2,popcnt,lahf_lm,ida,
>>>>               dts,tpr_shadow,vnmi,flexpriority,ept,vpid,model_Nehalem,
>>>>               model_Conroe,model_coreduo,model_core2duo,model_Penryn,
>>>>               model_n270
>>>>    cpuModel = Intel(R) Xeon(R) CPU           X3430  @ 2.40GHz
>>>>    cpuSockets = 1
>>>>    cpuSpeed = 2393.769
>>>>
>>>>
>>>> I compared libvirt's cpu_map.xml on CentOS 6.3 and CentOS 6.4, and
>>>> indeed they differ substantially. So this patch should probably be
>>>> merged to the 3.1 branch? I will contact Dreyou and request that
>>>> this patch also be included in his builds. I guess otherwise there
>>>> will be quite some fallout once people start picking CentOS 6.4 for
>>>> oVirt 3.1.
>>>>
>>>> Thanks again and best regards
>>>
>>> Thank you for reporting this issue and verifying its fix.
>>>
>>> I'm not completely sure that we should keep maintaining the ovirt-3.1
>>> branch upstream - but a build destined for el6.4 must have it.
>>>
>>> If you believe we should release a fix version for 3.1, please verify
>>> that http://gerrit.ovirt.org/12723 has no ill effects.
>>>
>>> Dan.
>>
>> I did some additional tests, and the new CentOS 6.4 host failed to
>> start or migrate any vm. It always boils down to:
>>
>> Thread-43::ERROR::2013-03-07
>> 15:02:51,950::task::853::TaskManager.Task::(_setError)
>> Task=`52a9f96f-3dfd-4bcf-8d7a-db14e650b4c1`::Unexpected error
>> Traceback (most recent call last):
>>   File "/usr/share/vdsm/storage/task.py", line 861, in _run
>>     return fn(*args, **kargs)
>>   File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
>>     res = f(*args, **kwargs)
>>   File "/usr/share/vdsm/storage/hsm.py", line 2551, in getVolumeSize
>>     apparentsize = str(volume.Volume.getVSize(sdUUID, imgUUID, volUUID,
>> bs=1))
>>   File "/usr/share/vdsm/storage/volume.py", line 283, in getVSize
>>     return mysd.getVolumeClass().getVSize(mysd, imgUUID, volUUID, bs)
>>   File "/usr/share/vdsm/storage/blockVolume.py", line 101, in getVSize
>>     return int(int(lvm.getLV(sdobj.sdUUID, volUUID).size) / bs)
>>   File "/usr/share/vdsm/storage/lvm.py", line 772, in getLV
>>     lv = _lvminfo.getLv(vgName, lvName)
>>   File "/usr/share/vdsm/storage/lvm.py", line 567, in getLv
>>     lvs = self._reloadlvs(vgName)
>>   File "/usr/share/vdsm/storage/lvm.py", line 419, in _reloadlvs
>>     self._lvs.pop((vgName, lvName), None)
>>   File "/usr/lib64/python2.6/contextlib.py", line 34, in __exit__
>>     self.gen.throw(type, value, traceback)
>>   File "/usr/share/vdsm/storage/misc.py", line 1219, in acquireContext
>>     yield self
>>   File "/usr/share/vdsm/storage/lvm.py", line 404, in _reloadlvs
>>     lv = makeLV(*fields)
>>   File "/usr/share/vdsm/storage/lvm.py", line 218, in makeLV
>>     attrs = _attr2NamedTuple(args[LV._fields.index("attr")],
>> LV_ATTR_BITS, "LV_ATTR")
>>   File "/usr/share/vdsm/storage/lvm.py", line 188, in _attr2NamedTuple
>>     attrs = Attrs(*values)
>> TypeError: __new__() takes exactly 9 arguments (10 given)
>>
>> and followed by:
>>
>> Thread-43::ERROR::2013-03-07
>> 15:02:51,987::dispatcher::69::Storage.Dispatcher.Protect::(run)
>> __new__() takes exactly 9 arguments (10 given)
>> Traceback (most recent call last):
>>   File "/usr/share/vdsm/storage/dispatcher.py", line 61, in run
>>     result = ctask.prepare(self.func, *args, **kwargs)
>>   File "/usr/share/vdsm/storage/task.py", line 1164, in prepare
>>     raise self.error
>> TypeError: __new__() takes exactly 9 arguments (10 given)
>> Thread-43::DEBUG::2013-03-07
>> 15:02:51,987::vm::580::vm.Vm::(_startUnderlyingVm)
>> vmId=`7db86f12-8c57-4d2b-a853-a6fd6f7ee82d`::_ongoingCreations released
>> Thread-43::ERROR::2013-03-07
>> 15:02:51,987::vm::604::vm.Vm::(_startUnderlyingVm)
>> vmId=`7db86f12-8c57-4d2b-a853-a6fd6f7ee82d`::The vm start process failed
>> Traceback (most recent call last):
>>   File "/usr/share/vdsm/vm.py", line 570, in _startUnderlyingVm
>>     self._run()
>>   File "/usr/share/vdsm/libvirtvm.py", line 1289, in _run
>>     devices = self.buildConfDevices()
>>   File "/usr/share/vdsm/vm.py", line 431, in buildConfDevices
>>     self._normalizeVdsmImg(drv)
>>   File "/usr/share/vdsm/vm.py", line 358, in _normalizeVdsmImg
>>     drv['truesize'] = res['truesize']
>> KeyError: 'truesize'
>>
>> In webadmin the start and migrate operations fail with 'truesize'.
>>
>> I found BZ#876958, which shows the very same error. So I tried to
>> apply patch http://gerrit.ovirt.org/9317. I had to apply it manually
>> (I guess the patch would need a rebase for 3.1), but it works.
> 
> Thanks for the report. I've made a public backport for this in
> http://gerrit.ovirt.org/12836/ and would ask you once more to mark it
> as verified by you.
> 
>>
>> I can now start new virtual machines successfully on a CentOS 6.4 /
>> oVirt 3.1 host. Migration of vms from CentOS 6.3 hosts works, but not
>> the other way around. Migration from 6.4 to 6.3 fails:
>>
>> Thread-1296::ERROR::2013-03-07 15:55:24,845::vm::176::vm.Vm::(_recover)
>> vmId=`c978cbf8-6b4d-4d6f-9435-480d9fed31c4`::internal error Process
>> exited while reading console log output: Supported machines are:
>> pc         RHEL 6.3.0 PC (alias of rhel6.3.0)
>> rhel6.3.0  RHEL 6.3.0 PC (default)
>> rhel6.2.0  RHEL 6.2.0 PC
>> rhel6.1.0  RHEL 6.1.0 PC
>> rhel6.0.0  RHEL 6.0.0 PC
>> rhel5.5.0  RHEL 5.5.0 PC
>> rhel5.4.4  RHEL 5.4.4 PC
>> rhel5.4.0  RHEL 5.4.0 PC
>>
>> Thread-1296::ERROR::2013-03-07 15:55:24,988::vm::240::vm.Vm::(run)
>> vmId=`c978cbf8-6b4d-4d6f-9435-480d9fed31c4`::Failed to migrate
>> Traceback (most recent call last):
>>   File "/usr/share/vdsm/vm.py", line 223, in run
>>     self._startUnderlyingMigration()
>>   File "/usr/share/vdsm/libvirtvm.py", line 451, in
>> _startUnderlyingMigration
>>     None, maxBandwidth)
>>   File "/usr/share/vdsm/libvirtvm.py", line 491, in f
>>     ret = attr(*args, **kwargs)
>>   File "/usr/lib/python2.6/site-packages/vdsm/libvirtconnection.py",
>> line 82, in wrapper
>>     ret = f(*args, **kwargs)
>>   File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1178, in
>> migrateToURI2
>>     if ret == -1: raise libvirtError ('virDomainMigrateToURI2() failed',
>> dom=self)
>> libvirtError: internal error Process exited while reading console log
>> output: Supported machines are:
>> pc         RHEL 6.3.0 PC (alias of rhel6.3.0)
>> rhel6.3.0  RHEL 6.3.0 PC (default)
>> rhel6.2.0  RHEL 6.2.0 PC
>> rhel6.1.0  RHEL 6.1.0 PC
>> rhel6.0.0  RHEL 6.0.0 PC
>> rhel5.5.0  RHEL 5.5.0 PC
>> rhel5.4.4  RHEL 5.4.4 PC
>> rhel5.4.0  RHEL 5.4.0 PC
>>
>> But I guess this is fine, and migration from a higher host version to
>> a lower one is probably just not supported, right?
> 
> Well, I suppose that qemu would allow migration if you begin with a
> *guest* of version rhel6.3.0. Please try it out.
> 
> Dan.

Alright, just verified it. A vm started on a 6.3 host can be
successfully migrated to the new 6.4 host and then back to any other 6.3
host. A vm started on a 6.4 host, however, won't migrate to any host
running 6.3.

Regards
Patrick

-- 
Lobster LOGsuite GmbH, Münchner Straße 15a, D-82319 Starnberg

HRB 178831, Amtsgericht München
Geschäftsführer: Dr. Martin Fischer, Rolf Henrich