<div dir="ltr">Tru +1<div><br></div><div>DH<br><div class="gmail_extra"><br><div class="gmail_quote">2017-04-24 9:27 GMT+02:00 Tru Huynh <span dir="ltr"><<a href="mailto:tru@centos.org" target="_blank">tru@centos.org</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><br>

</span>HPC covers a lot of ground, IMHO (cf. the Beowulf mailing list), from<br>

research-group-sized setups to national or multi-country ones.<br>

- compute part: building the software (tuned for your CPU/GPU, i.e. OpenBLAS/ATLAS vs generic), SCL, Lmod/modules<br>

- EasyBuild/Spack/Nix/...<br>

- hardware (IB, dedicated hardware such as FPGAs, ...), ARM vs x86_64, ...<br>

- management (puppet/ansible/salt/...)<br>

- scaling to 10s of nodes vs 1000s vs more... (network/rack/datacenter management at scale)<br>

- user management (from plain /etc/{passwd|shadow} to FreeIPA or Active Directory...)<br>

- shared storage: NFSv3/v4, pNFS, proprietary (cf. Panasas, GPFS, ...)<br>

- managing 100 TB is not the same as managing 100 PB (cf. <a href="http://robinhood.sf.net" rel="noreferrer" target="_blank">robinhood.sf.net</a>)<br>

- distributed storage (client/server), tuned for different workloads and requirements (quotas, ACLs, streaming vs IOPS, locking?, cost?, expandability): Lustre, BeeGFS, RozoFS, MooseFS, ..., Ceph, GlusterFS, ...<br>

- archiving/long-term storage (iRODS?)<br>

- batch queuing: Slurm and friends<br>

- containers (Docker, Singularity, ...)<br>

- web interfaces for users who are not IT-fluent<br>

- remote visualisation (to avoid moving TBs of data)<br>

- UEFI vs plain PXE/legacy booting<br>

- cloud expansion, or fully cloud-based, for embarrassingly parallel workloads?<br>

- Hadoop?<br>

- which framework? Warewulf (as in OpenHPC), xCAT, kickstart (Foreman or DIY), ...<br>

<br>

Cheers<br>

<span class="HOEnZb"><font color="#888888"><br>

Tru</font></span></blockquote></div></div></div></div>