[CentOS] clustering

Wed Nov 16 08:37:47 UTC 2011
James A. Peltier <jpeltier at sfu.ca>

----- Original Message -----
| Hey folks,
| I just went through the archives trying to find some info on this but
| did not come up with much other than it seems there are a few experts
| here on the list.
| I have no experience with clustering and have just taken over a Stem
| Cell Research Lab that has a Grid Engine cluster. I have not yet dug
| into the details of Grid Engine (only been here a week now) but am
| just trying to get up to speed on clustering in general.
| I was just looking at Red Hat's site and they have this HPC thing
| http://www.redhat.com/promo/mrg/ but damned if I can find any actual
| details on it there - that data sheet they link to is just a bunch of
| marketing gobble-de-gook as far as I can make sense of it anyway.
| Quick question : what are Red Hat using to do that, and can CentOS do
| the same thing? How hard is it to configure? How does it compare to
| Grid Engine?
| I have to say I'm a bit hesitant about Grid Engine because of the
| whole Oracle takeover. I just don't trust Oracle.
| Basically I'd like to get up to speed really quickly on different
| clustering technologies, and maybe even set up a CentOS (or
| Scientific) based cluster in a sandbox to play with.
| I guess - looking for reading to get up to speed on clustering, and
| wondering what my options are with CentOS, RHEL and Scientific.
| thanks,
| -Alan

I'm not sure what is going to happen with SGE, but we use Torque and Maui for our departmental HPC clusters and Torque and Moab for our Western Canada HPC environment (WestGrid).  There are a *lot* of aspects of HPC clusters that you need to be familiar with.  The resource managers and schedulers are the least of your problems.  The software toolchain and optimization are *the most important*: understanding how to optimize for the processors, troubleshooting inefficient code, etc.  That's where you should focus.

FWIW: MRG is based around Condor.  Aeolus, the new cloud product (CloudForms), is also based around Condor.

James A. Peltier
IT Services - Research Computing Group
Simon Fraser University - Burnaby Campus
Phone   : 778-782-6573
Fax     : 778-782-3045
E-Mail  : jpeltier at sfu.ca
Website : http://www.sfu.ca/itservices
I will do the best I can with the talent I have