Tru

+1
DH

2017-04-24 9:27 GMT+02:00 Tru Huynh <tru@centos.org>:
HPC covers a lot of ground, imho (cf. the beowulf mailing list, from
research-group size to national/multi-country setups).
- compute part: building the software (tuned for your cpu/gpu, ie openblas/atlas VS generic), SCL, Lmod/modules
- easybuild/spack/nix/...
- hardware (IB, dedicated hw such as FPGA, ...), ARM VS x86_64, ...
- management (puppet/ansible/salt/...)
- scaling on 10s of nodes VS 1000 VS more... (network/rack/datacenter management at scale)
- user management (from plain /etc/{passwd,shadow} to FreeIPA, or Active Directory...)
- shared storage, NFSv3/v4, pNFS, proprietary (cf panasas, gpfs,...)
- and managing 100 TB or 100 PB is not the same (cf robinhood.sf.net)
- distributed storage (client/server), tuned for different workloads and requirements (quotas, ACLs, streaming VS IOPS, locking?, cheap?, expandability): lustre, beegfs, rozofs, moosefs, ..., ceph, glusterfs
- archiving/long term storage (irods?)
- batch queuing: slurm and friends
- containers (docker, singularity, ...)
- web interfaces for non-IT-fluent users
- remote visualisation (to avoid moving TB of data)
- UEFI vs plain PXE/legacy booting
- cloud expansion or cloud based for embarrassingly parallel workloads?
- hadoop?
- what framework? warewulf as in openhpc, xcat, ks (foreman or DIY), ...
Cheers
Tru