[CentOS-devel] [cloud] Features of a cloud VM template

Wed Apr 9 15:12:32 UTC 2014
Nux! <nux at li.nux.ro>

On 09.04.2014 07:48, Jimmy Kaplowitz wrote:
>> I was wondering about LVM. It makes reconfiguration much easier (like
>> adding swap). But growroot doesn't support LVM.
> Single partition is good for both simplicity and LVM support. LVM also adds
> a performance overhead which is fine for some customer use cases but not
> others.

Swap could be easily added via a file if really needed.

>>>      - dracut-modules-growroot included so the template partition will
>>> expand to match target, cloud-init in charge of resize2fs
>> Only required for kernel < 3.8. Later kernels can do online partition
>> resizing (handled by cloud-init post initrd).
> Unless we plan to ignore CentOS 6, we need to handle kernels before 3.8 as
> well as CentOS 7's 3.10.

Yes, EL6 is our main concern here I guess, though it'd be good if we 
could reuse as much as possible for future versions, I imagine.

>>> B - To swap or not to swap?
>> Some service providers charge for disk IOs and nobody wants to pay for
>> swap activity, so I vote against swap.
> I also don't see a need for swap in a cloud image.

Cool, nobody likes the swap.
As I said above, it's trivial to add a swap file later on.
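
For reference, a minimal sketch of adding a swap file after the fact. The path /swapfile and the 512 MiB size are illustrative assumptions, not something agreed in this thread. The real commands need root; with DRY_RUN=1 the function only prints what it would run.

```shell
#!/bin/sh
# Sketch: create and enable a swap file. Path and size are example values.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

add_swapfile() {
    swapfile=${1:-/swapfile}   # hypothetical default path
    size_mb=${2:-512}          # hypothetical default size in MiB

    # dd rather than a sparse file: swap files must not contain holes
    run dd if=/dev/zero of="$swapfile" bs=1M count="$size_mb"
    run chmod 600 "$swapfile"
    run mkswap "$swapfile"
    run swapon "$swapfile"
}

# Example (prints the four commands without touching the disk):
#   DRY_RUN=1; add_swapfile /swapfile 512
```

To keep the swap file across reboots, a matching entry in /etc/fstab would be needed as well.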

>>> C - "tuned-adm profile virtual-host" which translates to:
>>>      - kernel.sched_min_granularity_ns 10ms
>>>      - kernel.sched_wakeup_granularity_ns 15ms
>>>      - vm.dirty_ratio 40%
>>>      - vm.swappiness 30
>>>      - IO scheduler "deadline"
>>>      - fs barriers off
>>>      - CPU governor "performance"
>>>      - disk readahead 4x
>> Where do these come from? What's the rationale?

They come from Red Hat; maybe Sam Kottler or some other RH dev can
clarify some of this for us. I would have expected to see the NOOP
scheduler. Maybe it's worth opening another thread to discuss this
profile. I imagine they must have some reasons for choosing this since
they build both the guest/host OS and the hypervisor.

> This might be a good place to link to GCE's recommendations for image
> settings, assembled from several different teams inside Google, with a bent
> toward maximal security but also discussing other areas:
> https://developers.google.com/compute/docs/building-image

Yes, many of the modifications do make sense, but once we start 
"optimising" where do we stop? This could lead to a slippery slope.
Maybe KB can weigh in on this.

>> D - tso and gso off on the network interfaces http://s.nux.ro/gsotso
>> These seem to be settings on the host, not the guest.

These settings should be off on the guest, but seeing as there is no
mention of this for newer versions, maybe it's not necessarily needed
any more. AFAIK the virtio device can't do "hardware" TCP segmentation
offloading and so on, but perhaps this is forwarded to the hypervisor.
To be looked at later on; it doesn't seem to be of great importance.

>>> E - network interface remapping (75-persistent-net-generator.rules, BZ
>>> 912801)
>> Not authorized to access that bug.
> Same.

It's about preventing udev from mapping MACs to NICs, so that an instance
created from the template does not retain the mapping and end up with its
NIC called eth1 or whatever name is available next. I'm sure everyone has
hit this problem when building templates.
"echo explanation > /etc/udev/rules.d/70-persistent-cd.rules" should do
the trick.
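
A minimal sketch of that cleanup as it might look in a template-preparation script. It assumes the file to mask is the 75-persistent-net-generator.rules named in the heading of item E, and that any 70-persistent-net.rules already written for the build VM should be removed. The function name and the root-directory argument are made up for illustration; the argument allows a trial run against a scratch directory.

```shell
#!/bin/sh
# Sketch: stop udev from recording MAC-to-name mappings in a template.
# ASSUMPTION: masking the generator rule named in item E is what is wanted.
disable_persistent_net() {
    root=${1:-}                       # empty means the live system
    rules_dir="$root/etc/udev/rules.d"

    mkdir -p "$rules_dir"
    # A file in /etc/udev/rules.d overrides the same-named file in
    # /lib/udev/rules.d, so this masks the stock generator rule.
    echo "# intentionally empty: do not persist NIC names in templates" \
        > "$rules_dir/75-persistent-net-generator.rules"
    # Drop any mapping already recorded for the build VM's MAC address.
    rm -f "$rules_dir/70-persistent-net.rules"
}

# Example: disable_persistent_net /mnt/template-root
```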

>>> F - Selinux on. Do we relabel for uniqueness? Seen small VMs run out of
>>> memory while relabelling..
>> Ack.
> I don't think GCE's current image does anything specific here beyond
> leaving SELinux on and ensuring some of our environment-specific hacks get
> properly labeled. No opinion on what's optimal, but we do offer small VMs
> as well as normal-sized ones, so handling both use cases is good.

Ok, so this needs further debate.

>> G - PERSISTENT_DHCLIENT="1" (BZ 1011013)
>> Ack.
> Seems reasonable based on the RHBA linked from the BZ - we haven't noticed
> a problem without this but it could be useful.

I have seen the problem first hand in Cloudstack; if the virtual router 
(dhcp provider) goes away the instance loses its IP and becomes 
unreachable ...
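
A minimal sketch of setting this in the template, assuming the interface is configured through /etc/sysconfig/network-scripts/ifcfg-eth0 (the function name and default path are illustrative). It removes any existing PERSISTENT_DHCLIENT line and appends the wanted one, so it is safe to run twice.

```shell
#!/bin/sh
# Sketch: ensure PERSISTENT_DHCLIENT="1" is set exactly once in an ifcfg file.
set_persistent_dhclient() {
    cfg=${1:-/etc/sysconfig/network-scripts/ifcfg-eth0}  # assumed interface file
    tmp="$cfg.tmp.$$"

    # Copy everything except an existing setting, then append the new value
    sed '/^PERSISTENT_DHCLIENT=/d' "$cfg" > "$tmp"
    echo 'PERSISTENT_DHCLIENT="1"' >> "$tmp"

    # Write back through the original file so its ownership and mode are kept
    cat "$tmp" > "$cfg"
    rm -f "$tmp"
}

# Example: set_persistent_dhclient /etc/sysconfig/network-scripts/ifcfg-eth0
```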

>>> H - Bundle all the paravirt drivers in the ramdisk
>>> (virtio/xen/vmware/hyperv) so the same image can boot everywhere?
>> Seems reasonable. What's the impact on the initrd size?
> Seems good to me too. The ones GCE cares about are virtio-scsi, virtio-net,
> and virtio-pci/virtio-blk, but no objection to the others in the initrd if
> the result is reasonably sized.

The default initrd already carries most of them, here's a normal initrd 
on my workstation:

17595362 Feb 12 13:13 initramfs-2.6.32-431.5.1.el6.x86_64.img

and here's another one based on the same kernel, but with: 
add_drivers+="vmw_pvscsi vmxnet3 hv_vmbus hv_utils hv_storvsc hv_netvsc 
xenfs xen-netfront xen-blkfront virtio_scsi virtio_net virtio_console 
virtio-rng virtio_blk virtio_pci"

17688533 Apr  9 15:34 paravirt.img

So they are almost identical in size.
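
To put a number on "almost identical", here is the arithmetic on the two sizes listed above; only the variable names are mine.

```shell
#!/bin/sh
# Sizes in bytes, copied from the two listings above
plain=17595362
paravirt=17688533

diff_bytes=$((paravirt - plain))
# awk for the percentage, since POSIX shell arithmetic is integer-only
pct=$(awk -v d="$diff_bytes" -v p="$plain" 'BEGIN { printf "%.2f", d * 100 / p }')

echo "extra drivers add $diff_bytes bytes (${pct}%)"
# prints: extra drivers add 93171 bytes (0.53%)
```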

>>> I - Per "stack" requirements (e.g. cloudstack relies a lot on root user
>>> and password logins, openstack tends not to, SSH key only logins etc
>>> etc)
>> Can we have a single image that fits all the different requirements?

It would require building some logic into it so the instance is aware 
it's running in ACS/OS/AWS/GCE ... possible, not sure how feasible.

> We are unlikely to have that in the end, but we can certainly start with
> one base and customize the output slightly for each environment.


> Examples: GCE currently has two (Apache2-licensed Python) daemons running
> in our instances: one handles SSH keys via our metadata server in a way
> that's tied in to Google accounts and Google Cloud project access control
> lists, the other one facilitates some of our advanced networking features.
> We also ship gcutil and gsutil, two (Apache2-licensed Python) command-line
> utilities which are useful for interacting with the environment. The
> container format varies across environments too.

No chance to actually get involved with cloud-init instead of running 
different scripts?
Either way, it looks like %post will have a lot of work to do for all 
these images. :-)

>> K - No firewall. Handled by the service provider.

+1 for default-open iptables

>> L - Timezone is set to UTC, Hostname is set to 'centos', lang is
>> en_US.UTF-8, keyboard is us (or whatever you guys think makes sense).

+1 - The hostname is not very important as most people use DHCP.

NTP/ntpdate is, of course, a must.


No problem with that.

>> N - Along with the image, we'll also provide md5/sha1/sha256 checksums,
>> gpg signed files and a manifest (list of installed packages and their
>> versions).
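
A minimal sketch of how the checksum files could be produced next to an image. The function name and file naming are illustrative assumptions, not existing CentOS build tooling. The manifest and signature steps are only shown as comments, since they depend on where the image is built and which key is used.

```shell
#!/bin/sh
# Sketch: write md5/sha1/sha256 checksum files next to an image.
write_checksums() {
    image=$1
    dir=$(dirname "$image")
    name=$(basename "$image")

    # Run inside the image directory so the checksum files hold bare
    # file names and can be verified wherever the files are downloaded.
    (
        cd "$dir" || exit 1
        md5sum    "$name" > "$name.md5"
        sha1sum   "$name" > "$name.sha1"
        sha256sum "$name" > "$name.sha256"
    )
}

# Manifest and signature would be produced separately, for example:
#   rpm -qa --qf '%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n' | sort > manifest.txt
#   gpg --armor --detach-sign image.sha256
```

Users can then check a download with "sha256sum -c" against the published file.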


It looks like we need to allow enough room in %post for the
customisations required by the various platforms, but if we can
have a common base, that'd be great.

KB, what's your opinion on the above and what should we do next?

