Hi,

On Tue, Apr 8, 2014 at 11:03 PM, Juerg Haefliger <juergh@gmail.com> wrote:
On Tue, Apr 8, 2014 at 2:24 PM, Nux! <nux@li.nux.ro> wrote:
>
> Hello,
>
> While the Cloud SIG is still being established, let's get to actual
> work and think of a set of features for a CentOS cloud template.
> I am referring here to VMs, not containers (e.g. docker).
>
> This is how I see it so far, please feel free to come with
> suggestions/comments/questions.
>
> A - Single partition for simplicity (and lack of good arguments against
> it)

I was wondering about LVM. It makes reconfiguration much easier (like adding swap). But growroot doesn't support LVM.

A single plain partition is good for simplicity and keeps growroot working. LVM also adds a performance overhead which is fine for some customer use cases but not others.
 
>      - dracut-modules-growroot included so the template partition will
> expand to match target, cloud-init in charge of resize2fs

Only required for kernel < 3.8. Later kernels can do online partition resizing (handled by cloud-init post initrd).

Unless we plan to ignore CentOS 6, we need to handle kernels before 3.8 as well as CentOS 7's 3.10. 
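For the newer-kernel path, the cloud-init side could be sketched as a cloud-config fragment along these lines (module names per cloud-init's growpart/resize_rootfs modules; whether both are available depends on the cloud-init version the image ships):

```yaml
#cloud-config
# Sketch: grow the root partition at boot (growpart or, on older
# kernels, the dracut growroot module), then have cloud-init run
# resize2fs on the root filesystem.
growpart:
  mode: auto
  devices: ['/']
resize_rootfs: true
```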
 
> B - To swap or not to swap?

Some service providers charge for disk IOs and nobody wants to pay for swap activity, so I vote against swap.

I also don't see a need for swap in a cloud image. 

> C - "tuned-adm profile virtual-host" which translates to:
>      - kernel.sched_min_granularity_ns 10ms
>      - kernel.sched_wakeup_granularity_ns 15ms
>      - vm.dirty_ratio 40%
>      - vm.swappiness 30
>      - IO scheduler "deadline"
>      - fs barriers off
>      - CPU governor "performance"
>      - disk readahead 4x

Where do these come from? What's the rationale?

This might be a good place to link to GCE's recommendations for image settings, assembled from several different teams inside Google, with a bent toward maximal security but also discussing other areas:
https://developers.google.com/compute/docs/building-image

Some of them are more important than others, and clearly distributions will make the decisions that are right for them. One example is compiling virtio-scsi/virtio-net support as kernel modules for generality, even though the most locked-down, single-vendor, security-minded kernel would disable module loading entirely.

Very few of these beyond the bare hardware support are strictly mandatory, but e.g. disabling password-based SSH, disabling root SSH login, and having root's password field locked are good cloud defaults except anywhere a specific vendor's environment needs otherwise.
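Concretely, those SSH defaults would look something like this (a sketch, not a full hardening guide):

```
# /etc/ssh/sshd_config excerpts for the cloud defaults above
PasswordAuthentication no
PermitRootLogin no
```

plus locking root's password field with `passwd -l root` during image build.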

We should also consider installing yum-cron by default; that adds a lot of automatic security protection for hands-off cloud users, but some behaviors or software versions occasionally change between 6.x and 6.{x+1}. Interesting tradeoff, and one that many users of a configuration management system handle through that software. GCE's image currently does preinstall yum-cron, though of course the CentOS community will eventually own the image and have the final say.
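If yum-cron does go in, the kickstart side is small (sketch, CentOS 6 SysV style):

```
%post
# Preinstall yum-cron so security updates apply automatically
# for hands-off users; config-management users can disable it.
yum -y install yum-cron
chkconfig yum-cron on
%end
```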

More recommendations might surface over time through things like performance testing or advice from our hypervisor or kernel hackers.
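For reference, the sysctl portion of the quoted profile corresponds roughly to the following drop-in (a sketch; the exact parameter names and values should be checked against the tuned profile on the target release):

```
# /etc/sysctl.d/90-virtual-guest.conf (sketch of the values above)
kernel.sched_min_granularity_ns = 10000000
kernel.sched_wakeup_granularity_ns = 15000000
vm.dirty_ratio = 40
vm.swappiness = 30
```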

> D - tso and gso off on the network interfaces http://s.nux.ro/gsotso

These seem to be settings on the host, not the guest.

No opinion here, though if this is a guest-side setting, I can ask around within Google to give a well-informed GCE perspective.
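If it does turn out to be a guest-side setting, it would presumably be applied from an ifup hook, something like (sketch; /sbin/ifup-local is the initscripts hook point on EL6):

```
#!/bin/sh
# /sbin/ifup-local (sketch): disable TSO/GSO on each interface as it
# comes up. $1 is the interface name passed in by the ifup scripts.
ethtool -K "$1" tso off gso off
```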
 
> E - network interface remapping (75-persistent-net-generator.rules, BZ
> 912801)

Not authorized to access that bug.

Same.
 
> F - Selinux on. Do we relabel for uniqueness? Seen small VMs run out of
> memory while relabelling..

Ack.

I don't think GCE's current image does anything specific here beyond leaving SELinux on and ensuring some of our environment-specific hacks get properly labeled. No opinion on what's optimal, but we do offer small VMs as well as normal-sized ones, so handling both use cases is good.

> G - PERSISTENT_DHCLIENT="1" (BZ 1011013)

Ack.

Seems reasonable based on the RHBA linked from the BZ - we haven't noticed a problem without this but it could be useful.
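For concreteness, that's one extra line in the interface config (sketch):

```
# /etc/sysconfig/network-scripts/ifcfg-eth0 (sketch)
DEVICE=eth0
BOOTPROTO=dhcp
ONBOOT=yes
PERSISTENT_DHCLIENT=1
```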

> H - Bundle all the paravirt drivers in the ramdisk
> (virtio/xen/vmware/hyperv) so the same image can boot everywhere?

Seems reasonable. What's the impact on the initrd size?

Seems good to me too. The ones GCE cares about are virtio-scsi, virtio-net, and virtio-pci/virtio-blk, but no objection to the others in the initrd if the result is reasonably sized.
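A dracut drop-in along these lines would pull them all in (sketch; the driver list is my assumption of what each hypervisor family needs, trim to taste):

```
# /etc/dracut.conf.d/01-cloud-drivers.conf (sketch)
add_drivers+=" virtio_blk virtio_net virtio_pci virtio_scsi \
xen-blkfront xen-netfront vmw_pvscsi vmxnet3 \
hv_vmbus hv_storvsc hv_netvsc "
```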
 
> I - Per "stack" requirements (e.g. cloudstack relies a lot on root user
> and password logins, openstack tends not to, SSH key only logins etc
> etc)

Can we have a single image that fits all the different requirements?

We are unlikely to have that in the end, but we can certainly start with one base and customize the output slightly for each environment.

Examples: GCE currently has two (Apache2-licensed Python) daemons running in our instances: one handles SSH keys via our metadata server in a way that's tied in to Google accounts and Google Cloud project access control lists, the other one facilitates some of our advanced networking features. We also ship gcutil and gsutil, two (Apache2-licensed Python) command-line utilities which are useful for interacting with the environment. The container format varies across environments too.

> That's about all that crosses my mind for now.

K - No firewall. Handled by the service provider.

Mostly the same in GCE too. To avoid breaking configs which expect the firewall on by default, we're currently going with a default-open iptables firewall (at least for TCP/UDP - I'd have to check for ICMP). If CentOS prefers to disable it entirely, no strong objection from me.

L - Timezone is set to UTC, Hostname is set to 'centos', lang is en_US.UTF-8, keyboard is us (or whatever you guys think makes sense).

Agreed, although in the GCE case the hostname is set dynamically via DHCP based on the instance name given to the API. We also set the NTP server to metadata.google.internal, served by the host the VM is running on. While this is baked into our images via kickstart, the DHCP server also recently started providing this via the NTP option.
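On CentOS 6 those defaults land in the usual sysconfig files (sketch):

```
# /etc/sysconfig/clock
ZONE="Etc/UTC"

# /etc/sysconfig/i18n
LANG="en_US.UTF-8"

# /etc/sysconfig/keyboard
KEYTABLE="us"
```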
 
M - NOZEROCONF=yes

No opinion from me here. The same RHBA as before makes this seem wise to enable, although I haven't noticed a problem without it (our metadata server is at 169.254.169.254).
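For reference, that setting sits alongside the other network globals (sketch):

```
# /etc/sysconfig/network (sketch)
NETWORKING=yes
HOSTNAME=centos
NOZEROCONF=yes
```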
 
N - Along with the image, we'll also provide md5/sha1/sha256 checksums, gpg signed files and a manifest (list of installed packages and their versions).

Sounds reasonable.
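Generating those artifacts is a few commands; here's a hedged sketch where a dummy file stands in for the real image ("CentOS-cloud.qcow2" is a placeholder name):

```shell
# Placeholder image name; a dummy file stands in for the real qcow2.
IMG=CentOS-cloud.qcow2
echo "dummy image contents" > "$IMG"

# The three proposed checksum files:
md5sum    "$IMG" > "$IMG.md5sum"
sha1sum   "$IMG" > "$IMG.sha1sum"
sha256sum "$IMG" > "$IMG.sha256sum"

# What a user would run to verify:
sha256sum -c "$IMG.sha256sum"

# A detached, armored GPG signature would be added with:
#   gpg --armor --detach-sign "$IMG.sha256sum"
# and the package manifest generated inside the guest with:
#   rpm -qa | sort > manifest.txt
```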
 
- Jimmy