On Tue, July 18, 2017 8:01 am, Jonathan Billings wrote:
On Sun, Jul 16, 2017 at 06:02:15PM +0100, Pete Biggs wrote:
The physicists and mathematicians who do their computing there need long uptimes.
Yes. I too run HPC clusters, and I have had uptimes of over 1000 days - clusters that are turned on when they are delivered and turned off when they are obsolete. A stable OS is crucial for long-running calculations - you have never seen wrath like that of a computational scientist whose 200-day calculation has just failed because you needed to reboot the node it was running on.
I too was an HPC admin, and I knew people who believed the above, and their clusters were compromised. You're running a service where the weakest link is the researchers who use your cluster -- they're able to run code on your nodes, so local exploits are possible. They often have poor security practices (sharing passwords, reusing them across multiple accounts).
Also, if your researchers can't write code that checkpoints its progress, they're going to be awfully unhappy when a bug in their code makes it segfault 199 days into a 200-day run.
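As a rough illustration of what that means in practice, here is a minimal checkpoint/restart loop in Python; the file name, state contents, step count, and checkpoint interval are all invented for the sketch, not taken from anyone's actual code.

    # Illustrative only: file name, state contents, and checkpoint
    # interval below are made up for the example.
    import os
    import pickle

    CHECKPOINT = "state.pkl"

    def load_state():
        # Resume from the last checkpoint if one exists, else start fresh.
        if os.path.exists(CHECKPOINT):
            with open(CHECKPOINT, "rb") as f:
                return pickle.load(f)
        return {"step": 0, "result": 0.0}

    def save_state(state):
        # Write to a temp file and rename, so a crash mid-write
        # cannot leave a corrupt checkpoint behind.
        tmp = CHECKPOINT + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp, CHECKPOINT)

    state = load_state()
    for step in range(state["step"], 1_000_000):
        state["result"] += 1e-9 * step      # placeholder for real work
        state["step"] = step + 1
        if state["step"] % 10_000 == 0:
            save_state(state)
    save_state(state)

A job killed at any point simply restarts and resumes from the last checkpoint instead of from step zero.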
Scheduled downtime and rolling cluster upgrades are a necessity of HPC cluster administration. I do wish that the ksplice/kpatch stuff were available in CentOS.
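For the rolling-upgrade side, a drain/reboot/resume loop might look like the following sketch. It assumes a Slurm-managed cluster, passwordless root SSH to the nodes, and made-up node names, so treat it as an outline rather than a recipe.

    # Sketch only: assumes a Slurm-managed cluster and passwordless root
    # SSH to the nodes; the node names are hypothetical.
    import subprocess
    import time

    NODES = ["node01", "node02", "node03"]

    def node_is_idle(node):
        # True once Slurm reports no jobs left running on the node.
        out = subprocess.run(["squeue", "-h", "-w", node],
                             capture_output=True, text=True, check=True)
        return out.stdout.strip() == ""

    for node in NODES:
        # Stop new jobs from landing on the node; let running ones finish.
        subprocess.run(["scontrol", "update", f"NodeName={node}",
                        "State=DRAIN", "Reason=kernel update"], check=True)
        while not node_is_idle(node):
            time.sleep(600)
        # Reboot into the patched kernel; don't check the exit status,
        # since the SSH session may drop as the node goes down.
        subprocess.run(["ssh", node, "reboot"])
        # In practice, wait for the node to come back before resuming it.
        subprocess.run(["scontrol", "update", f"NodeName={node}",
                        "State=RESUME"], check=True)

Draining one node at a time keeps most of the cluster in production while every node eventually picks up the new kernel.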
Thanks, Jonathan! Before your reply I had a bad feeling that I was the only one in this world who still respects security considerations... The only thing is: I still shy away from ksplice/kpatch, and reboot machines instead of patching the running kernel on the fly.
Valeri
++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++