On Tue, 18 Jul 2017 09:01:07 -0400
Jonathan Billings <billings at negate.org> wrote:

> On Sun, Jul 16, 2017 at 06:02:15PM +0100, Pete Biggs wrote:
> > >
> > > The physicists and mathematicians who count there need high
> > > durations.
> >
> > Yes. I too run HPC clusters and I have had uptimes of over 1000
> > days - clusters that are turned on when they are delivered and
> > turned off when they are obsolete. It is crucial for long running
> > calculations that you have a stable OS - you have never seen wrath
> > like a computational scientist whose 200 day calculation has just
> > failed because you needed to reboot the node it was running on.
>
> I too was a HPC admin, and I knew people who believed the above, and
> their clusters were compromised. You're running a service where the
> weakest link are the researchers who use your cluster -- they're able
> to run code on your nodes, so local exploits are possible. They often
> have poor security practices (share passwords, use them for multiple
> accounts).

I work at quite a large HPC site and fully agree. HPC resources
arguably need smarter and more active security work than your average
server. With 1000+ users who can compile and run jobs, misplace their
credentials, and so on, we typically move even faster than CentOS
updates to fix, half-patch, or mitigate security vulnerabilities.

/Peter