[CentOS] maximum cpus/cores in CentOS 4.1

Sat Sep 10 01:38:06 UTC 2005
Bryan J. Smith <b.j.smith at ieee.org>

Lamar Owen <lowen at pari.edu> wrote:
> You know, my name is not hard to spell.

Actually, many people seem to have a hell of a time spelling
mine around here.  But if you note, I don't mind one bit.  I
don't pick on etiquette, misspelling, etc...  The only things
that get to me are hypocrisy and lack of first-hand
experience.

> Stop.  Go back and reread the original post.  I said:
> "Certainly the 8x Opteron will be faster on many things;
> but under heavy multiuser load the 14-way SPARC does a
> surprisingly good job, with around three quarters the 
> performance of a dual 3GHz Xeon (that outclasses the SPARC
> box in every way possible except interconnect) at a load
> average of 30 or so."
> I mentioned nothing there that is patently wrong.

First off, I still don't see how you could move from talking
about the Opteron to comparing SPARC to Xeon.  There are
absolutely no similarities other than the ISA, which is _not_
a performance issue.

Secondly, I think someone else was focusing on your _latter_
comments, which prompted me to ask, "are you for real?"  I
think it was that which prompted the various responses.

> I did a simple benchmark that showed the E6500 held up
> nicely under load.  Made a simple statement about it
> performing AROUND (that is, approximately) 75% of
> a different box's speed.

And that's _fine_ for comparing the 6500 SPARC UPA system to
a dual-Xeon FSB-MCH clusterfsck at a _specific_ application.

How this benchmark translates to Opteron, I have no idea.

> This was not an engineering-type post;

I'm still trying to figure out what post it was.
But then you really went off "the deep end" with your
follow-up -- _that's_ what prompted others to respond AFAICT.

> while my degree IS in engineering, I made a very simple
> general observation of the capabilities of a crossbar 
> interconnected system versus a bus-type system.

And Opteron is _neither_.  It's a partial mesh.

> Exactly where is that completely wrong?  Where is that
> off-base?  I said the E6500 was outclassed in every way 
> EXCEPT interconnect BY a Xeon box: I said NOTHING about
> an Opteron box except the very broadest generalization
> (since I don't have an Opteron on hand to try it out on).

Again, first you made the relationship.
Then you expanded on it with quite incorrect information.
I don't know your reasons, but your statements were
technically specific at some points, then overly general at
others.

In the end, I just asked you not to explain the performance
of the Opteron by your benchmark.  Then you really went off.

> Further, this very E6500 is the one that I'm offering as a
> build box for CentOS SPARC; this makes that portion on topic
> for, if not this list, the -devel list.

Actually, the thread was on processor support.

I merely pointed out that you can set the kernel's processor
limit to 1,024 on Opteron and it doesn't make a damn bit of
difference if the hardware configuration isn't supported.

Same deal with Xeon.  Up to 32 is supported, but I've booted
some 8-way Xeon systems and only gotten 2 processors because
the bridging logic wasn't supported.
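
To make that concrete -- a rough sketch, not a tool, assuming
the stock CentOS/RHEL layout where the build config sits in
/boot/config-<kernel release> -- compare the compiled-in
ceiling with what the kernel actually brought up:

#!/usr/bin/env python3
# Rough sketch: compare the kernel's compiled-in CPU ceiling
# (CONFIG_NR_CPUS) with the processors it actually brought up,
# per /proc/cpuinfo.  Assumes the stock CentOS/RHEL layout with
# the build config in /boot/config-<kernel release>.
import os
import re

config_path = "/boot/config-" + os.uname().release

nr_cpus = "unknown"
try:
    with open(config_path) as f:
        for line in f:
            m = re.match(r"CONFIG_NR_CPUS=(\d+)", line)
            if m:
                nr_cpus = m.group(1)
                break
except OSError:
    pass  # config not installed; the ceiling stays "unknown"

with open("/proc/cpuinfo") as f:
    online = sum(1 for line in f if line.startswith("processor"))

print("CONFIG_NR_CPUS (compiled-in ceiling):", nr_cpus)
print("processors the kernel actually brought up:", online)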

> You can benchmark the dual Xeon against an 8x Opteron
> yourself.

I'm still scratching my head on what you were saying.
All I know is that it was wholly inapplicable to the
performance of Opteron.

> As to research on I/O interconnects in a high I/O
> environment, been there, done that.  There is more
> to a server's load than I/O.

Don't forget memory and memory mapped I/O, let alone the
_system_ interconnect to support any CPU-memory-I/O alongside
the _peripheral_ interconnect.  Servers are data pushers and
manipulators in many, many cases.

> I have an application, IRAF, that is very compute intensive.
> Raw FPU gigaflops matters to this application, which runs in
> a client-server mode.

Then it's far more CPU-bound, although if it's being fed
data, memory and system interconnect can affect that.
Especially if there is client-server communication.

> Raw FPU gigaflops rises in standard stored program
> architectures roughly linearly with clock speed

On the _same_ core, _not_ different cores.
You can_not_ compare different cores by clock.

Otherwise people wouldn't still be running 667-733MHz Alpha
264s, let alone Itanium2 733-800MHz systems with 3.8GHz Xeons
out there.  Clock is only a measure in the _same_ core
design, _not_ different designs.

Heck, a P3 at 1.4GHz is _better_ than a P4 at 2.8GHz when it
comes to many FPU operations.  Throw the SSE switch to gain
a P4 "advantage" and kiss your precision goodbye!  Sadly
enough, the Pentium M at 2.1GHz is Intel's _fastest_ x86 FPU
engine -- you have to go to Itanium at, ironically, sub-1GHz
to get faster from Intel.
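
Back-of-the-envelope, with made-up per-cycle figures for
hypothetical cores (not vendor specs): peak FPU throughput is
clock times FP results retired per cycle, so clock alone tells
you nothing across different cores.

# Peak FPU throughput = clock rate * FP results retired per
# cycle, so clock alone says nothing across different cores.
# The per-cycle figures are illustrative assumptions, not
# vendor specs.

def peak_gflops(clock_ghz, fp_results_per_cycle):
    """Theoretical peak, ignoring memory and issue limits."""
    return clock_ghz * fp_results_per_cycle

cores = {
    "hypothetical core A (3.8 GHz, 1 FP result/cycle)": (3.8, 1),
    "hypothetical core B (1.0 GHz, 4 FP results/cycle)": (1.0, 4),
}

for name, (ghz, per_cycle) in cores.items():
    print(f"{name}: {peak_gflops(ghz, per_cycle):.1f} GFLOPS peak")
# Core B's 4.0 GFLOPS edges out core A's 3.8 at about a quarter
# of the clock.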

> (this is of course not true in massively parallel and
> DEL-type architectures (such as SRC's MAP processor
> (getting a MAPstation here for real-time cross-correlation
> interferometry))); and, given a particular 
> processor (say, SPARC) getting more clock speed will
> usually (but of course not always) get you more FPU power. 


When compared to the _same_ core.
Clock is _incomparable_ between cores.
I honestly don't think you understand how superscalar
architectures work.
Some have more FPU units than others, some have FPU units
staged out so they take far more cycles than others.
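
Same kind of rough sketch for the latency side, again with
illustrative cycle counts rather than measured values: a
dependent chain of FP operations is bound by latency-in-cycles
over clock, so a deeply staged FPU can lose to a slower-clocked
one with a shorter pipeline.

# A dependent chain of FP operations is bound by
# (FPU latency in cycles) / clock, not by clock alone.
# Cycle counts are illustrative assumptions, not measurements.

def chain_time_us(n_ops, fpu_latency_cycles, clock_ghz):
    """Microseconds for n_ops back-to-back dependent FP ops."""
    return n_ops * fpu_latency_cycles / (clock_ghz * 1e3)

N = 1_000_000
print("deeply staged FPU, 3.0 GHz, 6-cycle latency: %d us"
      % chain_time_us(N, 6, 3.0))
print("short-pipeline FPU, 1.4 GHz, 2-cycle latency: %d us"
      % chain_time_us(N, 2, 1.4))
# 6/3.0 = 2.0 ns per dependent op versus 2/1.4 = ~1.43 ns, so
# the slower-clocked core finishes the dependent chain first.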

> But that's not relevant.
> The original post was simply that the Linux kernel does
> well with a large number of processors, at least on SPARC.
> Good grief.

You went there too, dude.

And my _original_ point continues to be that you can set the
kernel's x86-64 processor limit to 1,024, and you might still
only see 4-8 processors on a 32-way configuration.

Hardware support in the kernel for the system interconnect
design is what matters.  There is no "generic, transparent
scalable system interconnect," although HyperTransport comes
as close as you can get.


-- 
Bryan J. Smith                | Sent from Yahoo Mail
mailto:b.j.smith at ieee.org     |  (please excuse any
http://thebs413.blogspot.com/ |   missing headers)