[CentOS] maximum cpus/cores in CentOS 4.1

Fri Sep 9 21:48:41 UTC 2005
Bryan J. Smith <b.j.smith at ieee.org>

Lamar Owen <lowen at pari.edu> wrote:
> The world that shows that, if 3 is less than 4, then 3 is
> also less than 10.

SPARC and Xeon are not comparable in generalities.
Xeon and Opteron are not comparable in generalities.

> No, they are not completely different.  Last time I
> checked, they both executed instructions,

Are you for real?

> they both used RAM to store those instructions,

Are you for real?

> and they behaved as a Von Neumann machine.

NOTE:  The definition of "Von Neumann" is not as generalized
in the EE world.  ;->  It's like saying E=mc^2 to a
physicist.

> They are only different in the details that you wish
> to emphasize.

But those details _mean_everything_!

> Xeon != GTL+ and Opteron != HT.
> GTL+ and HT are merely the interconnect, 
> which, while it very much does impact the performance of a
> box, it is just the interconnect; having a different
> interconnect does not make two CPU's 
> _completely_ different, and you should know that.

No, it's _everything_ when it comes to a server!

> Is aluminum _completely_ different from steel?

Poor analogy.
It's like calling a pickup and an 18-wheeler both "trucks."

> The thread was about number of CPUs;

Yes!  And when you use more than 1 Xeon or 1 Opteron,
_everything_ changes.  And Xeon differs drastically from
Opteron.  Heck, Xeon and Itanium are about the same in this
regard.

Once you pass 2-4 Xeons or Itaniums, then you're talking
hardware-specific kernel hacks.  Such implementations are
typically proprietary -- especially beyond 4 sockets.

Opteron uses a more flexible inter-CPU interconnect.  Custom
bridging is not required.  But going beyond 8 sockets is not
quite standardized yet.  Since such systems are rare, the hacks
have not been added to the kernel.

I don't know how much more "real world" I can be on this.
Everyone is talking in generalizations, and I'm trying to say
exactly what the differences are.

> I answered in a fashion that indicated that I knew that
> this was only indirectly applicable

No, it's utterly inapplicable.
If you want to compare Xeon and SPARC, that's one thing.
But every single assumption I've read on Opteron in this
thread has been so inapplicable, I just can't understand it.

Stock Xeon and Itanium do not support >4 nodes.
You have to use non-standard/non-commodity bridging.
Several vendors offer such bridging, ranging from a few
semi-consistent 8x S604 (Xeon) designs to 32x S604 (HP Xeon)
to up-to-64-way Itanium systems from SGI and others.  They are
_not_ standard implementations, and additional hardware
support is required.

Opteron has yet to have such non-commodity implementations,
at least in volume.  Although HyperTransport does make some
things transparent, to this point only 8x S940 is standard.  I
haven't seen a "standard" reference for more than 8x S940, and
the modular approaches used with HT extenders are still being
worked out.

That's why the hardware support hasn't been added to the
kernel.

CASE-IN-POINT:  Merely boosting the x86-64 kernel to support
more than 8 processors, or even 16 processors, will do
_little_ to support that many if the hardware support for
those designs isn't there.

I've seen the same thing with 8-way Xeon systems, where
people only see 2 processors.  Why?  Because the bridging
approach was not supported in the kernel.
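A quick way to check how many processors the running kernel actually brought up (the "only seeing 2 processors" symptom) is to count the `processor` entries in /proc/cpuinfo. A minimal sketch, assuming the usual x86 Linux "processor : N" layout; the function name `count_cpuinfo_processors` is hypothetical, not part of any tool.

```python
import os


def count_cpuinfo_processors(cpuinfo_text):
    """Count "processor : N" entries in /proc/cpuinfo-style text (x86 layout assumed)."""
    count = 0
    for line in cpuinfo_text.splitlines():
        key, sep, _ = line.partition(":")
        # The kernel emits one "processor" line per logical CPU it brought up.
        if sep and key.strip() == "processor":
            count += 1
    return count


if __name__ == "__main__":
    path = "/proc/cpuinfo"
    if os.path.exists(path):
        with open(path) as f:
            print("kernel sees", count_cpuinfo_processors(f.read()), "processors")
```

If this reports fewer CPUs than are physically installed, the kernel either wasn't built for that many or doesn't support the board's bridging.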

> (but since when have you  paid attention to what someone
> actually said?).

Actually, I paid very close attention!
At first I was interested.
Then I was disgusted at your poor application after I saw
where you went with it.

> I did a comparison that cast the E6500 (a five year old
> box) against a decent server (by today's standards)

You made a comparison to Xeon.  It is wholly inapplicable to
make assumptions about the Opteron from that -- period.

> (BTW: not running x86_64 code, either, but i386 code)

There's actually little difference from a server standpoint
when it comes to EM64T, but that's another thread.

> in a pretty decent light.  I made a fairly simple comment
> that you have blown completely out of proportion,

Because you feel compelled to discuss a platform you have
never used, and seemingly do not understand.

I have seen several posts on this board from people who think
they understand Opteron -- several even comparing the
HyperTransport interconnect approach of non-Opteron platforms
to Opteron's, which is a bit different.

> and you have made this worse than useless.  You have told 
> me nothing I did not already know, other than that doing a
> *PLONK* on your incoming e-mail to my servers would be a
> Big Win.

It would really help if you comment on what you have
first-hand experience with, and not try to make assumptions
based on poor third-hand experience.

I often get lambasted because what most -- possibly a great
majority -- people see as an "anal little difference" is
actually a very, very _big_ difference.  That's why I am very
strict on correcting such things.

Not to "be a jerk" -- but to point out the fact that a
"common assumption" is very, very _incorrect_.  I find I'm
actually in the _very_small_minority_ when it comes to
several things -- from semiconductor concepts to
file/database server design to server performance -- on
various IT lists.  And I have to say I will _not_ join the
majority anytime soon.

I like to think that's why I get repeat business.  If you
want to take offense at my comments, then that's your choice.
But I really do "put my foot down" on things that are wholly
inaccurate -- and 9 times out of 10, it's the difference
between first-hand experience and third-hand.
 
> You once again fail to read what I wrote.  The UltraSPARC
> CPU (not interconnect) architecture is difficult to get
> to run at higher clock speeds.

Why does clock speed matter one iota when it comes to
servers?
Why?  Processors don't even need a clock (but that's another
story)!

Get off clock speed.  The only thing clock speed is good for
is measuring performance of the _exact_same_ core design. 
Otherwise, it's rather useless.
  
> The Opteron, both by being lower cost at a given speed
> (and speed does matter, even though raw clock speed doesn't
> matter as much as many think; speed is a big factor for 
> number crunching)

Those statements are *DEAD*WRONG*.

Clock speed in a clocked boolean logic (CBL) circuit only
governs when the gates switch.  In fact, because signals can't
cross the whole die in one cycle (the speed of light is too
slow), clocking is quite regionalized below 0.25um feature
sizes.

The number of execution units, the type of execution units
and their number of stages are what matter -- assuming the
design is even superscalar!  I've seen 500MHz SGS-Thomson
embedded processors that are so-called "P3 class" get killed
by a decade-old superscalar NexGen Nx586 at 84MHz.

And proprietary implementations of the Itanium, let alone
the standard Alpha 264, which is years older and cheaper, make
the Xeon dog-meat at 1/4-1/5th the clock in many, many server
applications.  In fact, before the Opteron, proprietary
Itanium was better than Xeon for many, many server
applications if you could afford it.

You can_not_ compare performance by clock speed _between_
products -- period.  In fact, clock is slowly but surely
being removed from significant portions of the processor. 
Asynchronous is returning because the clock is a _very_bad_
thing.
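The reason clock can't be compared across designs is usually summarized by the classic "iron law" of processor performance: run time = instruction count x cycles per instruction (CPI) / clock frequency. A minimal sketch with made-up numbers; the two cores below are hypothetical illustrations, not measurements of any real chip, and both are assumed to run the same binary (same instruction count).

```python
def run_time_seconds(instructions, cpi, clock_hz):
    """Iron law: time = instruction count * cycles per instruction / clock rate."""
    return instructions * cpi / clock_hz


# Hypothetical figures for illustration only, not measurements of real parts.
WORKLOAD = 1_000_000_000  # instructions executed by some fixed program

# A high-clock core that stalls a lot (high average CPI).
fast_clock = run_time_seconds(WORKLOAD, cpi=4.0, clock_hz=2.0e9)
# A wider superscalar core at 1/4 the clock that retires more per cycle.
wide_core = run_time_seconds(WORKLOAD, cpi=0.8, clock_hz=0.5e9)

print(f"high-clock core: {fast_clock:.2f} s")
print(f"wide core:       {wide_core:.2f} s")
```

With these assumed numbers the 500MHz core finishes in 1.6 s versus 2.0 s for the 2GHz one: CPI, not clock, decided the outcome.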

But I won't go there.  The stupidest thing introduced in a
microprocessor was the clock, let alone the operand+operator
approach to instruction sets.  But that has more to do with
the fact that CS majors controlled IC design in the '70s, and
engineers didn't come around until the '80s (which is where
the "RISC hack" came about).

The good news is the next generation of microprocessor
designs does away with the clock, the instruction set
architecture (ISA) and other '70s legacy concepts from
CS-driven design.  Physicists and engineers now control
design, and software-based binary translation solves the
compatibility issue.

> outclasses the SPARC systems these days.

It has more to do with interconnect than clock.
Trust me on this.

Proprietary Itanium systems are a great example.
Just like proprietary Xeon systems before that.

But the Opteron is the first to do it with commodity parts.

> Economic reasons as much as any other probably
> played a significant role here,

Of course.  Why do SPARC when Opteron has a better
interconnect for less money?

And, BTW, SPARC _is_ available at about the _same_ clock as
Opteron for the price.

> and I personally do not agree that interconnect 
> technology was the major factor.

Then you would be in the minority among system designers.
I'm not talking about solution providers, I mean engineers.

It's the commodity systems interconnect of the Opteron that
has made it the "cost king."  You can't get its interconnect
without some non-standard interconnect design with Xeon,
Itanium, etc., or without using a less commodity RISC platform
like SPARC.

The problem is that going beyond 8-way S940 hasn't really
taken off ... yet.  Some non-commodity approaches have yet to
be standardized, but it's happening.  And when it does, the
kernel support will be added.

That's why enabling 32+ way support for Opteron won't do squat
right now.

> Of course, a statement to the contrary by someone in Sun
> who helped make that decision would prove me wrong.

Feel free to assume I'm pulling everything out of my @$$.
;->

> The Sun Starfire and Gigaplane architectures both are
> impacted by processor-RAM affinity, since access to the
> local RAM on any given CPU/memory card is via the local
> UPA and doesn't have to hit the board-external
> interconnect.

Yes.  And there is some I/O affinity too.  Opteron adds a few
more things with its direct, partial-mesh HyperTransport
approach, instead of the crossbar of UPA.  But both have
NUMA.

Which is why Solaris' maturity in this regard makes it an
_ideal_ operating system for Opteron _today_.  It's grown up
with not only NUMA, but crossbar interconnects, which are
half-way to a partial mesh.
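On a Linux NUMA kernel, which CPUs sit on which node (the processor/memory affinity discussed here) is exposed under sysfs, e.g. /sys/devices/system/node/node0/cpulist, as a range list like "0-3,8-11". A minimal sketch for parsing that format; the helper names `parse_cpulist` and `node_cpus` are hypothetical, and whether a given kernel (CentOS 4's 2.6.9 included) provides the `cpulist` file rather than only the `cpumap` bitmask is an assumption to verify on the target box.

```python
def parse_cpulist(text):
    """Expand a Linux cpulist string such as "0-3,8-11" into a sorted list of CPU ids."""
    cpus = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue  # tolerate empty input or a trailing comma
        if "-" in chunk:
            lo, hi = chunk.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(chunk))
    return sorted(cpus)


def node_cpus(node):
    """Return the CPUs local to NUMA node `node` (assumes the sysfs cpulist file exists)."""
    path = f"/sys/devices/system/node/node{node}/cpulist"
    with open(path) as f:
        return parse_cpulist(f.read())
```

Pinning a process to the CPUs of one node keeps its memory accesses local, which is the whole point of NUMA-aware scheduling.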

Linux, by contrast, grew up on the "front side bottleneck":
a shared front side bus into a "memory controller hub."

> In a manner of speaking, Gigaplane is a type of NUMA,
> even though it isn't 'true' NUMA.

Yes.  But it still has to hit the crossbar to get to I/O.
Opteron has processor affinity for I/O too.

> Starfire OTOH can be true NUMA (the architecture came
> from SGI).

Yes, I know.  SGI even transferred some of the OS code to
Microsoft for NT in their short-lived NT move.

> Don't bother to 'correct' me (I know I'm being 
> somewhat generic in those statements, and, if I had time
> and wanted to do so, I could delve into the nitty-gritty);

As could I.

> I won't see your reply.

Ignorance is bliss.



-- 
Bryan J. Smith                | Sent from Yahoo Mail
mailto:b.j.smith at ieee.org     |  (please excuse any
http://thebs413.blogspot.com/ |   missing headers)