Sharing notes from a visit to the IT4Innovations center in Ostrava.
Supercomputer hardware parameters are available at [1] - a mixture of Xeon, Xeon Phi and GPU cores. There are two clusters - Anselm runs RHEL 6, Salomon runs CentOS 6.
Each node runs on a specific piece of hardware. If a project is built on a builder with different hardware, it is built with different flags and configuration, and the resulting binaries are not properly optimized. Some projects need special compiler features in order to run efficiently, and some of those features are available only in the latest compiler versions - by the time the latest versions reach the builders, it is usually too late. Or a project needs to be built with a proprietary compiler that is not publicly available. Or, since each project in general needs different versions of system libraries, the HPC infrastructure has to offer several versions of the same library.

The software therefore needs to be available in many flavors, i.e. the same software built with multiple compilers and in multiple versions. Different versions have different properties/features, and different compilers emphasize different optimizations - the result is a matrix of software to offer the users. In the end the software needs to be built for the end-system architecture so it can use all the available instructions and not slow down the computation. For that reason (and many others) all the projects need to be built locally inside the cluster, which makes most of the binary packages in the CentOS distribution unusable for HPC use cases. The only remaining use case for RPM is proof-of-concept work on development laptops, not final deployments.

Currently, the EasyBuild project [2] is used as a replacement for RPM-based spec files. Operators use the Fedora upstream monitoring tool to track the latest and greatest software (at the moment ~400+ projects). InfiniBand is used to connect the nodes; there are sometimes issues with 3rd-party software drivers (Bull/Atos and/or HPE).
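To make the "matrix of software" point concrete, here is a rough sketch of the EasyBuild-plus-environment-modules workflow; the package name, versions and toolchains are illustrative and not taken from the IT4I installation:

    # build the same software with two different toolchains (two cells of the matrix)
    eb zlib-1.2.11-foss-2017a.eb --robot     # GCC-based toolchain, --robot resolves dependencies
    eb zlib-1.2.11-intel-2017a.eb --robot    # Intel toolchain

    # users then pick the flavor they need through environment modules
    module avail zlib
    module load zlib/1.2.11-intel-2017a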
Other notes:
- A lot of service providers are still running on CentOS 6, which blocks the upgrade to CentOS 7 - a shutdown of the cluster is not possible, and the infrastructure for the clusters changed between RHEL 6 and RHEL 7.
- Puppet and Ansible are used to deploy the clusters (Ansible for disk-free nodes, Puppet for nodes with disks). Still, each deployment is unique (e.g. unexpected situations) and thus not fully automated. On the Ansible side no roles from Ansible Galaxy are used - just Core modules and custom roles and playbooks.
- Experiments with containers as well, via Singularity [3] (Docker is not fully supported on CentOS 6, needs a privileged user account).
- The demand is for packaging and tooling for HPC rather than for the libraries themselves. If possible, provide a full HPC stack that is upstream and distribution supported/maintained (including full-stack upgrades), with CI/CD supported as well.
- The HPC community is unfortunately security-free - security fix deployment can take several months, with dependencies on specific minor releases or kernel versions. Third-party driver vendors should be advised to follow the kernel KABI whitelist to prevent hard version dependencies.
- Each assigned set of nodes is expected to be vanilla new. Given it takes some time to reboot a node (on the order of minutes), all the tooling running inside a node must clean up everything a user task left behind - thus minimizing the number of times a node really has to be rebooted (a sketch of such a cleanup hook follows after this list).
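Roughly the kind of per-job cleanup hook a scheduler could run between jobs instead of rebooting - only a sketch; the script, the assumption that the job owner arrives as the first argument, and the cleaned paths are illustrative, not the IT4I implementation:

    #!/bin/bash
    # Hypothetical per-job epilog: return the node to a clean state without a reboot.
    # Assumes $1 is the account that owned the finished job.
    user="$1"

    pkill -KILL -u "$user" 2>/dev/null                                # kill anything the job left running
    find /tmp /dev/shm -user "$user" -exec rm -rf {} + 2>/dev/null    # drop leftover scratch files and shm segments
    exit 0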
[1] https://docs.it4i.cz/salomon/hardware-overview/
[2] https://github.com/hpcugent/easybuild
[3] http://singularity.lbl.gov/
On Mon, Jun 19, 2017 at 1:32 PM, Jan Chaloupka jchaloup@redhat.com wrote:
- Experiments with containers as well via Singularity [3] (Docker is not
fully supported on CentOS 6, needs privileged user account)
Even if Docker were supported, the kernel on the compute nodes of a cluster will stay fixed and old (due to e.g. built-in InfiniBand support). Won't that break containers if someone creates a Docker image that assumes access to a very recent kernel on the Docker host? https://forums.docker.com/t/libc-incompatibilities-when-will-they-emerge/9895/4
Marcin
Hi Marcin,
With Singularity you can run a CentOS/RHEL 7.x container on a CentOS/RHEL 6.x OS easily and smoothly.
Regards, DH
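A minimal sketch of what DH describes, using Singularity 2.x-era commands (the resulting image file name is illustrative):

    # on a CentOS 6 host: pull a CentOS 7 userspace as a Singularity image
    singularity pull docker://centos:7
    # run a command from the CentOS 7 userspace on top of the old host kernel
    singularity exec centos-7.img cat /etc/redhat-release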
Forgive me if this thread is considered OT - I think it contains some cross-cultural insights that may be valuable.
Supercomputer hardware parameters are available at [1] - a mixture of Xeon, Xeon Phi and GPU cores. There are two clusters - Anselm runs RHEL 6, Salomon runs CentOS 6.
It's quite common to find a mixture of node configurations - sometimes within a single cluster, sometimes partitioned. Running a commercial Linux is relatively uncommon, though, since most centres prefer CentOS or perhaps Scientific Linux. I haven't found vendor expertise particularly helpful in HPC situations, especially considering that a non-small centre really must maintain its own skilled personnel.
Each node runs on a specific piece of hardware. If a project is built on a builder with different hardware, it is built with different flags and configuration, and the resulting binaries are not properly optimized. Some
This matters sometimes. There is a continuum of codes, from those that don't really care about (machine-specific) optimization to those where it makes a lot of difference. A convenient middle ground is generic apps that select optimized matrix libraries. But for an app that is not vector-friendly, those things don't matter much.
project needs to be built with a proprietary compiler that is not publicly available. Or, since each project in general needs different versions of system libraries
The phenomenon is really that the various tools/libraries/pipelines come from a variety of different development organizations, which vary in how aggressively they pursue recent versions. The worst are sloppy coders who don't follow standards and so require extremely specific versions of many component packages ("only foo-3.14 is known to work"). That's the main motive for containerization (and, to some extent, for solutions like environment modules and Nix). There really is no such thing as "system libraries" when you view software this way...
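As a concrete illustration of the version pinning that environment modules give you (reusing the hypothetical "foo-3.14" from above; "bar" and the run script are likewise made up):

    # pin exactly the versions a picky pipeline insists on
    module purge                    # start from a clean environment
    module load foo/3.14 bar/2.7    # only these versions are "known to work"
    ./run_pipeline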
architecture so it can use all the available instructions and not slow down the computation.
Besides vector length (i.e. SSE vs AVX vs AVX-512), there have not been many public, solid demonstrations that compiler- or flag-tweaking matters much. The main scaling dimension is "more nodes", and may be driven by memory footprint as much as CPU speed, so few people sweat the flags to deliver some 7.34% improvement in single-core performance (in my experience).
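For concreteness, the kind of flag tweaking being discussed looks like this (GCC shown; the source and binary names are placeholders):

    gcc -O2 -o app_generic app.c                # portable baseline
    gcc -O3 -march=native -o app_tuned app.c    # tuned for the build host, e.g. enables AVX/AVX-512 if available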
- Puppet and Ansible are used to deploy the clusters (Ansible for disk-free nodes,
Puppet for nodes with disks). Still, each deployment is unique (e.g.
I claim that HPC clusters do not benefit much from these tools: their main value is in handling widely diverse environments that need to change frequently, whereas most HPC clusters are just rack after rack of the same nodes that need to behave the same this year as last year. (My organization uses oneSIS, which permits stateless nodes that run a read-only NFS root; to upgrade a package, you just "chroot /var/lib/oneSIS/image yum update foo".)
- Experiments with containers as well via Singularity [3] (Docker is not
fully supported on CentOS 6, needs privileged user account)
The key point is that Docker is simply inappropriate for HPC, where the norm is a large, shared cluster, and jobs are nothing like Docker's raison d'être (web servers, Redis, etc.). In a sense, the normal cluster scheduler is already automating resource management (memory, cores, GPUs), and jobs don't need e.g. exposed IP addresses. Your storage is a quota in a many-PB /project filesystem, not dynamically provisioned S3 buckets.
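To illustrate "the normal cluster scheduler is already automating resource management": a minimal batch job, assuming Slurm (the thread does not say which scheduler is in use; the binary name is a placeholder):

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --cpus-per-task=8
    #SBATCH --mem=32G
    #SBATCH --gres=gpu:1        # the scheduler hands out the GPU, memory and cores
    #SBATCH --time=02:00:00
    srun ./simulate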
- The HPC community is unfortunately security-free, security fix deployment can
That's a bit of an overstatement. Most published fixes are simply irrelevant, since very little desktop-ish user space is even available on compute nodes (Firefox, for instance). Kernel updates may be delayed if there are constraints from drivers or out-of-tree filesystems. Mitigation (like simply disabling SCTP) is common. But the concern for repeatability also makes it unattractive to apply, for instance, glibc updates.
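As an example of the kind of mitigation mark mentions, SCTP can be disabled on a RHEL/CentOS node without touching any package (the .conf file name is arbitrary):

    # prevent the sctp kernel module from ever being loaded
    echo "install sctp /bin/true" > /etc/modprobe.d/disable-sctp.conf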
- Each assigned set of nodes is expected to be vanilla new. Given it takes
This varies between centres - we don't reboot compute nodes voluntarily, and a single node may be shared by jobs from multiple users. You trade increased utilization/efficiency against some exposure to security and performance interference. It's really a question of how diverse your user base is...
regards, mark hahn.