On Wed, 2005-06-15 at 22:21 -0500, Bryan J. Smith wrote:
> Yes and no.  In fact, it has to do with the fact that Intel is still
> relying on a chipset to do what most everyone else is doing at the
> CPU interconnect ...

Not true. PowerPC implementations, like the 970, that use HyperTransport do _not_ use it to the CPU. They still use an Intel-like "memory hub" and a single point of contention. I.e., they only use HyperTransport as a system interconnect for I/O, not for the CPU itself; they use their own bus for CPU/memory. The same goes for the inter-bridge connections between chipsets, even on AGTL+ platforms like the nVidia and SiS chipsets. The value of HyperTransport is not realized -- there is still only a _single_ point of interconnect to the CPU(s).

Athlon64/Opteron is the first commodity platform to bring something of a "partial mesh" to the system interconnect.
Other than the IA-64 commentary, this may not seem completely Linux-related, but it goes to the _heart_ of building _quality_ Linux/x86-64 servers with excellent I/O throughput and, to a lesser extent, good latency.
Here is what the Intel MCH architecture looks like: http://www.samag.com/documents/sam0411b/0411b_f2.htm
This is also what the PowerPC 970 looks like -- but it uses its own system interconnect, instead of AGTL+, between CPUs/memory. The CPU does _not_ talk HyperTransport. It's basically like having an nVidia or SiS chipset on an Intel platform; the HyperTransport is only on "one side" of the MCH.
The Athlon MP / Alpha 21264 EV6 is also a variant of this. But instead of a "hub" between CPU, memory and I/O, there is a 3-16 port "switch." It's still a single point of connection at the EV6 "switch," though.
The "stock" IA-64 Itanium2 actually still uses a single point-of- contention as well in it's Scalable Node Architecture (SNA). But most companies that implement Itanium2 solutions don't use stock SNA.
Now let's look at one of the proprietary Non-Uniform Memory Access (NUMA) offerings for previous Xeon and, more commonly today, Itanium (I use Xeon as the example, even though NUMA Xeon implementations are rare): http://www.samag.com/documents/sam0411b/0411b_f3.htm
They solve the memory contention issues, but still _not_ the I/O ones. Worse yet, there are some "processor affinity" deficiencies that could be addressed with a better design beyond just NUMA.
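Just to make "processor affinity" concrete on the Linux side: the 2.6 kernel exposes it through the sched_setaffinity() syscall. A minimal sketch -- the pick of CPU 0 is purely illustrative, and the error handling is bare-bones:

  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>

  int main(void)
  {
      cpu_set_t mask;

      CPU_ZERO(&mask);
      CPU_SET(0, &mask);      /* run this process on CPU 0 only */

      if (sched_setaffinity(0, sizeof(mask), &mask) == -1) {
          perror("sched_setaffinity");
          return 1;
      }

      /* On a NUMA box, with the kernel's default "first touch" page
       * placement, memory this process touches from here on lands on
       * the node local to CPU 0. */
      return 0;
  }

But on a shared-bus Xeon, pinning the process buys you cache warmth and little else -- memory and I/O still funnel through the one hub.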
Now let's look at the reference design for the 4-way AMD Opteron 800 series as implemented in the HP ProLiant DL585: http://www.samag.com/documents/sam0411b/0411b_f5.htm
Now look at that -- a partial mesh! You've got independent HyperTransport interconnects to I/O _and_ to the other CPUs, as well as NUMA DDR channels directly on each CPU. So not only can you have "processor affinity" when it comes to programs and data, but you can also have "processor affinity" when it comes to memory-mapped I/O too! And with the I/O MMU on-chip (really just an overgrown AGPgart controller from the Athlon MP, long story ;-), you've got maximum throughput with minimal context overhead for I/O.
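You can exploit that locality from user space, too. A rough sketch with libnuma (from the numactl package) -- node 0 is just an assumption for illustration, a real program would first look up which node its CPU and device actually hang off of:

  #include <numa.h>
  #include <stdio.h>

  int main(void)
  {
      char *buf;

      if (numa_available() == -1) {
          fprintf(stderr, "kernel has no NUMA support\n");
          return 1;
      }

      numa_run_on_node(0);    /* schedule this process on node 0's CPUs */

      /* 1 MB taken straight from node 0's local DDR banks */
      buf = numa_alloc_onnode(1 << 20, 0);
      if (buf == NULL) {
          fprintf(stderr, "numa_alloc_onnode failed\n");
          return 1;
      }

      /* ... fill buf and hand it to the I/O path; both the CPU touches
       * and the DMA stay local to node 0 ... */

      numa_free(buf, 1 << 20);
      return 0;
  }

Build with "gcc -o onnode onnode.c -lnuma" (onnode.c being whatever you name it). Or skip the code entirely and wrap an unmodified binary with numactl, e.g. "numactl --cpubind=0 --membind=0 ./your_server" -- exact flag spellings vary between numactl versions.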
Because servers are all about I/O.