Hi,
I would like to propose the start of a High Performance Computing (HPC) SIG. I see it is already mentioned among the Future SIGs on https://wiki.centos.org/SpecialInterestGroup. The primary reason for the SIG's existence will be to improve the state of HPC-related packages on CentOS and similar distributions, with a special focus on the stability of builds, on improvements related to CentOS (and similar distributions) for the OpenHPC project, and on getting new HPC packages packaged for CentOS and/or Fedora.
Initial members would be me (ovasik@redhat.com, CentOS FAS account: Reset), Adrian Reber (areber@redhat.com, CentOS FAS account: areber), Stanislav Kozina (skozina@redhat.com, CentOS FAS account: ersin) and Jan Chaloupka (jchaloup@fedoraproject.org, CentOS FAS account: jchaloup). Of course, anyone is welcome to join.
Thanks in advance for approving/sponsoring the SIG.
Regards, Ondrej Vasik
On Sun, Apr 23, 2017 at 2:09 PM, Ondřej Vašík ovasik@redhat.com wrote:
Hi,
I would like to propose the start of a High Performance Computing (HPC) SIG. I see it is already mentioned among the Future SIGs on https://wiki.centos.org/SpecialInterestGroup. The primary reason for the SIG's existence will be to improve the state of HPC-related packages on CentOS and similar distributions, with a special focus on the stability of builds, on improvements related to CentOS (and similar distributions) for the OpenHPC project, and on getting new HPC packages packaged for CentOS and/or Fedora.
It is good to see an initiative to get the HPC-specific tools packaged, but I have a comment. Under https://github.com/openhpc/ohpc/tree/obs/OpenHPC_1.3_Factory/components/io-l... I see spec files for software like netcdf or hdf5.
On a cluster one needs access to **many** versions of libraries (that includes compilers, Python, MPI, etc.), and packaging them as RPMs is not the correct model, unless the HPC system uses VM golden images or container images and allows the users to start them on demand. What is usually used is a setup based on Lmod/environment-modules, like https://github.com/hpcugent/easybuild-easyconfigs
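As a minimal sketch of the workflow such a module-based setup gives users (the module names and versions below are hypothetical, not taken from any specific site):

    module avail hdf5                        # list the hdf5 builds installed side by side
    module load gcc/7 openmpi/3.1 hdf5/1.10  # pick one compiler/MPI/library combination
    module list                              # show what is currently active
    module purge                             # drop back to the bare OS environment

Each version lives in its own prefix, and loading a module only adjusts PATH, LD_LIBRARY_PATH and friends for the current shell or job.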
I would therefore prefer that the OpenHPC project focus in the first place on the tools of which a single version is installed on the operating system.
Best regards,
Marcin
On a cluster one needs access to **many** versions of libraries (that includes compilers, Python, MPI, etc.) and
Well, on larger, more-shared, more-diverse clusters, that's true. But there are plenty of clusters that don't customize the stack much, if at all, even if they use locally-developed codes.
Really, the point is: the very nature of a distribution is that it reduces flexibility in favor of convenience. If "cloud is someone else's computer", then "distribution is someone else's build/test/packaging". I think it's still useful to have a baseline HPC cluster distro, even if many people, especially at larger sites, resort to modules to produce other combinations of middleware.
packaging them as RPMs is not the correct model, unless the HPC system uses VM golden images or container images, and allows the users to start them
Many, many clusters do use RPMs, whether that means NFSroot approaches or stateful node installs (often just kickstart, though there are many who use devops approaches like puppet).
One sticky aspect of the modules approach is illustrated by Nix: either you use it sparingly, or you go all the way and replace everything about the node install (all the way down to ld.so and glibc...)
regards, mark hahn (sharcnet/computecanada)
Marcin Dulak wrote on Sun 23. 04. 2017 at 15:13 +0200:
It is good to see an initiative to get the HPC-specific tools packaged, but I have a comment. Under https://github.com/openhpc/ohpc/tree/obs/OpenHPC_1.3_Factory/components/io-l... I see spec files for software like netcdf or hdf5.
On a cluster one needs access to **many** versions of libraries (that includes compilers, Python, MPI, etc.), and packaging them as RPMs is not the correct model, unless the HPC system uses VM golden images or container images and allows the users to start them on demand. What is usually used is a setup based on Lmod/environment-modules, like https://github.com/hpcugent/easybuild-easyconfigs
Yes, understood, thanks for the comment. With containers becoming more and more popular, I think packaging even these applications and libraries makes sense. Of course, optimizing the build for the specific system would be even better, but a package still gives you an easy way to install/update/remove an application with all its dependencies.
For many versions of libraries and compilers: sometimes it may make sense to create a matrix of RPMs, as is done in the OpenHPC initiative; sometimes software collections (SCLs) can probably be used to get multiple versions of a library/dependency installed on the system in parallel.
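As a minimal sketch of how the SCL model allows parallel versions on CentOS 7 (assuming the stock SCL repositories; devtoolset-7 is just one example collection):

    yum install centos-release-scl    # enable the Software Collections repositories
    yum install devtoolset-7          # newer gcc toolchain, installed alongside the base gcc
    scl enable devtoolset-7 bash      # start a shell with the collection's paths in front
    gcc --version                     # inside that shell, the devtoolset-7 gcc is used

The collection installs under /opt and is only activated on demand, so the base system compiler stays untouched.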
The goal is not to solve everything - that is of course out of scope - but to improve the current situation and maybe to start discussions like this one: how to proceed, what is missing, and what is expected to be missing (because it doesn't make sense to have it as a distribution package).
Regards, Ondrej
Dear Ondrej et al.,
Good to read such an email. The initiative is very much needed. HPC infrastructure and environments are very specific, and there's no need to have HPC SW in RPM packages. I do not want to go into too much detail here; it would be a long email. Anyway, I'm responsible for production/operations of the national HPC clusters here in Ostrava, so near to Brno. I'd like to invite you all to visit us, so we can explain to you the needs of HPC: how we run the services, what we need from the OS, and how we distribute the software to end users within the clusters.
Looking forward to hearing from you soon. DH
So I believe one of the goals that Ondrej and others have in bringing this SIG up is to work on providing a common HPC baseline, initially by delivering things similar to (and expanding on) OpenHPC. Since CentOS is an RPM-based distribution, the 'package management' way to easily install the required software is via yum/rpm. As was mentioned in the other reply, multiple versions are often needed/required, but this could in theory be handled with SCLs.
On Sun, Apr 23, 2017 at 11:22:46AM -0700, Jim Perrin wrote:
So I believe one of the goals that Ondrej and others have in bringing this SIG up is to work on providing a common HPC baseline, initially by delivering things similar to (and expanding on) OpenHPC. Since CentOS is an RPM-based distribution, the 'package management' way to easily install the required software is via yum/rpm. As was mentioned in the other reply, multiple versions are often needed/required, but this could in theory be handled with SCLs.
HPC covers a lot of ground, imho (cf. the Beowulf mailing list; from research-group size to national/multi-country setups):
- compute part: building the software (tuned for your CPU/GPU, i.e. openblas/atlas vs generic), SCL, Lmod/modules
- easybuild/spack/nix/...
- hardware (IB, dedicated HW such as FPGAs, ...), ARM vs x86_64, ...
- management (puppet/ansible/salt/...)
- scaling on 10s of nodes vs 1000 vs more... (network/rack/datacenter management at scale)
- user management (from plain /etc/{passwd|shadow} to FreeIPA, or Active Directory...)
- shared storage: NFSv3/v4, pNFS, proprietary (cf. panasas, gpfs, ...) - and managing 100 TB or 100 PB is not the same (cf. robinhood.sf.net)
- distributed storage (client/server): tuned for different workloads and requirements (quotas, ACLs, streaming vs IOPS, locking?, cheap?, expandability) - lustre, beegfs, rozofs, moosefs, ..., ceph, glusterfs
- archiving/long-term storage (irods?)
- batch queuing: slurm and friends (a minimal job-script sketch follows below)
- containers (docker, singularity, ...)
- web interfaces for non-IT-fluent users
- remote visualisation (to avoid moving TBs of data)
- UEFI vs plain PXE/legacy booting
- cloud expansion, or cloud-based for embarrassingly parallel workloads?
- hadoop?
- what framework? warewulf as in openhpc, xcat, kickstart (foreman or DIY), ...
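To make the batch-queuing item concrete, here is a minimal Slurm job-script sketch (the resource requests, module names and program name are only illustrative):

    #!/bin/bash
    #SBATCH --job-name=example
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=16
    #SBATCH --time=01:00:00
    module load gcc openmpi     # pick a toolchain via environment modules
    srun ./my_mpi_program       # launch the MPI ranks across the allocated nodes

Submitted with 'sbatch', this is the kind of user-facing interface a queuing system layers on top of whatever the OS and the SIG's packages provide.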
Cheers
Tru
Tru +1
DH
Hi to all,
We had a telco on 2017-04-26. Here are brief notes of the meeting.
- We will provide credentials for SIG members to have access to a real HPC system.
- We hope to meet in person soon, so we can show the infrastructure and discuss the needs and workflows in the HPC domain.
- SIG members have been provided with a project ID at IT4Innovations and an email describing how to apply for the credentials.
- We hope to have another telco; the date is not set yet.
Regards, DH
On Sun, Apr 23, 2017 at 02:09:55PM +0200, Ondřej Vašík wrote:
Initial members would be me (ovasik at redhat.com, CentOS FAS account: Reset), Adrian Reber (areber at redhat.com, CentOS FAS account: areber),
Almost, my CentOS FAS account is 'adrian'
Adrian