[CentOS] Re: Is a new HP-dl380 dual XEON 64 bit or 32 bit -- AMD64 v. EM64T

Thu Jun 16 03:44:41 UTC 2005

On Wed, 2005-06-15 at 22:21 -0500, Bryan J. Smith wrote:
> Yes and no.  In fact, it has to do with the fact that Intel is still
> relying on a chipset to do what most everyone else is doing at the CPU
> interconnect ...
> Not true.  PowerPC implementations, like the 970, that use
> HyperTransport do _not_ use it to the CPU.  They still use an Intel like
> "memory hub" and single-point-of-contention.  I.e., they only use it as
> a system interconnect for I/O, but not CPU itself.  They use their own
> bus for CPU/memory.
> Same deal for inter-bridge connections between chipsets in even AGTL+
> platform like in nVidia and SiS chipsets.  The value of HyperTransport
> is not realized.  There is still only a _single_ point on interconnect
> to the CPU(s).
> Athlon64/Opteron is the first, commodity platform to bring something of
> "partial mesh" to the system interconnect.

Other than the IA-64 commentary, this may not seem completely Linux-
related, but it goes to the _heart_ of building _quality_ Linux/x86-64
servers that have excellent I/O throughput and, to a lesser extent,
latency considerations.

Here is what the Intel MCH architecture looks like:  
http://www.samag.com/documents/sam0411b/0411b_f2.htm  

This is also what the PowerPC 970 looks like -- but it uses its own
system interconnect instead of AGTL+ between CPUs/memory.  It does _not_
talk HyperTransport to the CPU.  It's basically like having an nVidia or
SiS chipset on an Intel platform:  the HyperTransport is only on "one
side" of the MCH.
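
To make the "single point-of-contention" idea concrete, here is a rough
sketch that models a hub design as a graph and counts how many device-to-
device paths cross the hub.  The two-CPU layout and the node names (cpu0,
cpu1, memory, io, mch) are my own assumptions for illustration, not taken
from the linked figure.

```python
from collections import deque
from itertools import combinations

# Hypothetical 2-way hub layout: every device hangs off the one MCH.
HUB_LINKS = {
    "cpu0": ["mch"],
    "cpu1": ["mch"],
    "memory": ["mch"],
    "io": ["mch"],
    "mch": ["cpu0", "cpu1", "memory", "io"],
}


def shortest_path(links, src, dst):
    """Breadth-first search; returns the node list from src to dst, or None."""
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in links[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


def pairs_crossing(links, middle, endpoints):
    """Count endpoint pairs whose shortest path passes through `middle`."""
    crossing = 0
    for a, b in combinations(endpoints, 2):
        path = shortest_path(links, a, b)
        if path is not None and middle in path[1:-1]:
            crossing += 1
    return crossing


if __name__ == "__main__":
    devices = ["cpu0", "cpu1", "memory", "io"]
    total = len(list(combinations(devices, 2)))
    print(pairs_crossing(HUB_LINKS, "mch", devices), "of", total, "pairs cross the hub")
```

In this toy model all 6 device pairs route through the hub, which is the
whole problem in a nutshell.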

Athlon MP / Alpha 21264 (EV6) is also a variant of this.  But instead of
a "hub" between CPU, memory and I/O, there is a 3-16 port "switch."  But
it's still a single point-of-connection at the EV6 "switch."

The "stock" IA-64 Itanium2 actually still uses a single
point-of-contention as well in its Scalable Node Architecture (SNA).  But
most companies that implement Itanium2 solutions don't use stock SNA.

Now let's look at one of the proprietary Non-Uniform Memory
Architecture (NUMA) offerings for earlier Xeon and, more commonly today,
Itanium (I use Xeon as the example, even though NUMA Xeon
implementations are rare):  
http://www.samag.com/documents/sam0411b/0411b_f3.htm  

They solve the memory contention issues, but still _not_ the I/O ones.
Worse yet, there are some "processor affinity" deficiencies that could be
addressed with a better design beyond just NUMA.
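
For anyone unfamiliar with the term, "processor affinity" here means
keeping a process on the CPUs closest to the memory it uses.  A minimal
sketch of doing that from user space follows.  It assumes a current Linux
kernel that exposes /sys/devices/system/node/nodeN/cpulist and a Python
with os.sched_setaffinity; the helper names (parse_cpulist, node_cpus,
pin_to_node) are hypothetical.

```python
import os


def parse_cpulist(text):
    """Parse a Linux cpulist string such as "0-3,8" into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def node_cpus(node, sysfs_root="/sys/devices/system/node"):
    """Return the CPUs belonging to a NUMA node, or an empty set if unknown."""
    path = os.path.join(sysfs_root, f"node{node}", "cpulist")
    try:
        with open(path) as handle:
            return parse_cpulist(handle.read())
    except OSError:
        return set()


def pin_to_node(node):
    """Try to restrict the current process to one node's CPUs."""
    cpus = node_cpus(node)
    if not cpus or not hasattr(os, "sched_setaffinity"):
        return False
    try:
        os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
    except OSError:
        return False  # e.g. those CPUs are outside the allowed cpuset
    return True
```

Pinning the CPUs only helps if the memory the process touches is also
allocated on that node, which this sketch does not control.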

Now let's look at the reference design 4-way AMD Opteron 800 series as
implemented in the HP ProLiant DL585:  
http://www.samag.com/documents/sam0411b/0411b_f5.htm  

Now look at that -- a partial mesh!  You've got independent
HyperTransport interconnects to I/O _and_ other CPUs, as well as NUMA
DDR channels directly on each CPU.  So not only can you have "processor
affinity" when it comes to programs and data, but you can also have
"processor affinity" when it comes to memory-mapped I/O too!  And with
the I/O MMU on-chip (really just an overgrown AGP GART controller from
Athlon MP, long story ;-), you've got maximum throughput with minimum
context overhead for I/O.
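
A rough way to see what the partial mesh buys you is to count hops in a
toy model.  The layout below is an assumption on my part (four CPUs in a
square, one memory bank per CPU, I/O bridges hanging off cpu0 and cpu1),
not a transcription of the linked figure, and the names are hypothetical.

```python
from collections import deque


def link(graph, a, b):
    """Add a bidirectional link between two nodes."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def build_mesh():
    """Assumed 4-way layout: CPUs in a square, local memory, two I/O bridges."""
    graph = {}
    for a, b in [("cpu0", "cpu1"), ("cpu1", "cpu3"),
                 ("cpu3", "cpu2"), ("cpu2", "cpu0")]:
        link(graph, a, b)               # CPU-to-CPU links
    for n in range(4):
        link(graph, f"cpu{n}", f"mem{n}")  # memory attached to each CPU
    link(graph, "cpu0", "io0")          # I/O bridge on cpu0
    link(graph, "cpu1", "io1")          # I/O bridge on cpu1
    return graph


def hops(graph, src, dst):
    """Breadth-first search; returns the number of links from src to dst."""
    distance = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return distance[node]
        for neighbor in graph[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return None


if __name__ == "__main__":
    mesh = build_mesh()
    for target in ("mem0", "mem1", "mem3", "io0", "io1"):
        print("cpu0 ->", target, ":", hops(mesh, "cpu0", target), "hop(s)")
```

In this model cpu0 reaches its own memory and its own I/O bridge in one
hop, with no shared hub in the way, while the far corner's memory costs
three.  That difference is why keeping a process near its memory and its
I/O matters on this kind of platform.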

Because servers are all about I/O.

-- 
Bryan J. Smith                                     b.j.smith at ieee.org 
--------------------------------------------------------------------- 
It is mathematically impossible for someone who makes more than you
to be anything but richer than you.  Any tax rate that penalizes them
will also penalize you similarly (to those below you, and then below
them).  Linear algebra, let alone differential calculus or even ele-
mentary concepts of limits, is mutually exclusive with US journalism.
So forget even attempting to explain how tax cuts work.  ;->