[CentOS] [OT] Memory Models and Multi/Virtual-Cores -- WAS: 4.0 -> 4.1 update failing

Sat Jun 25 22:06:30 UTC 2005
Bryan J. Smith <b.j.smith@ieee.org> <thebs413 at earthlink.net>

From: Peter Arremann <loony at loonybin.org>
> Then enlighten me - if I have 40 address bits - transmit only the higher 37 
> since I don't need the lower end. Timing schemas show only one Input hold 
> time per address transfer for all available pins - how can that be a 32bit 
> bus? (ftp://download.intel.com/design/Xeon/datashts/30675401.pdf) 

Once again, stop assuming a trace and its specifications for board layout
means that memory is directly addressable over those pins.  We could
go on about that all night and for days, and I can give you countless examples
of embedded, PC and other memory controllers, memory technology, etc...
where paging and other "swap" is required.

Heck, as an EE, I would ass-u-me you would have been exposed to
memory controller design.  It is a complex mess whereby each new
change in addressing exponentially increases the transistor count.

> And, if timing diagrams, pinouts and so on lie about the size of the bus,
> ... cut derogatory nonsense ...

*STOP*  This is why I can't even begin.  I _never_ said that Intel didn't
offer 64GiB (36-bit) memory addressing.  The traces _must_ exist so
Intel can do so on [A]GTL+ platforms, but it does not mean that there is not
some "paging" going on at the memory _logic_.  I just said it cannot
address it linearly -- directly by 36-bit -- in the GTL design.
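
To put numbers on the address widths being argued about here, a minimal
sketch in Python, assuming plain byte addressing.  The function name is made
up for illustration and does not come from any datasheet.

```python
def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable with the given number of address bits (byte-addressed)."""
    return 1 << address_bits


GIB = 1 << 30

# 32-bit is the classic IA-32 limit, 36-bit is the PAE/PSE-36 range,
# and 40-bit is the width discussed for the Athlon core.
for bits in (32, 36, 40):
    print(f"{bits}-bit: {addressable_bytes(bits) // GIB} GiB")
```

This only shows the size of each address range.  It says nothing about
whether a given platform reaches that range linearly or through paging,
which is the actual point of contention.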

It all goes back to Intel's belief that IA-64/Itanium would have taken
over the i686/GTL world by now, which is why it decided not to build a new
architecture for IA-32 back in the '90s.  It is a belief that Intel has now
been paying the price for, and it is quickly retrofitting everything it can
ASAP.

AMD decided to switch to EV6 instead as its foundation for _all_
current processors back in 1996+, and that includes tunneling EV6
over HyperTransport in Athlon64/HyperTransport.  This has to do
with the fact that the Athlon core is a true 40-bit addressing
processor, and not 32-bit with PAE36 to page in the "overhang"
from a segment+offset that is normalized above 4GB.  Athlon just
_emulates_ PAE36 for compatibility.

If you hit "/proc/cpuinfo" on _any_ Athlon, it will show that the
PAE36 flag is supported.  PAE36 support has _always_ been in
_every_ Athlon because the core design is 40-bit EV6.  AMD
took the time, effort and transistors to emulate PAE36 at the
control, and put in the logic in its TLB and memory controller.
If they didn't, then AMD wouldn't be able to run any PAE36
OSes or applications.
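
The "/proc/cpuinfo" check can be sketched in Python as below.  This is
a rough sketch, not a definitive tool: the helper name and the sample text
are hypothetical, and it assumes the Linux convention of a "flags" line with
space-separated feature names.  As far as I know, Linux lists the relevant
features as separate "pae" and "pse36" flags rather than one "PAE36" entry.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


# Hypothetical, abbreviated sample; it is not output from a real machine.
SAMPLE = """\
processor\t: 0
model name\t: (example CPU)
flags\t\t: fpu vme de pse tsc msr pae mce cx8 pse36 mmx
"""

flags = cpu_flags(SAMPLE)
print("pae" in flags, "pse36" in flags)
```

On a real Linux box, pass the contents of /proc/cpuinfo instead of SAMPLE,
e.g. cpu_flags(open("/proc/cpuinfo").read()).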

And even if you don't have more than 4GB of RAM, there are
still the memory organization issues.  E.g., Red Hat currently
ships i386/i686 kernels with the 4G+4G model which hurts
performance over the 1G+3G model.  Why does it hurt
performance?  Because it either relies on the PAE36 paging
logic, or a software emulation of it (if the processor does
not support PAE36).  With the BIOS hack and kernel, this
allows Athlon MP to linearly address directly above 4GB.
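
For reference, the two memory models differ in how the 32-bit virtual
address space is divided.  A minimal sketch in Python, assuming the usual
boundary of 0xC0000000 for the 1G+3G model on 32-bit x86 Linux (the constant
and function names are illustrative only):

```python
# Assumed boundary for the 1G+3G model: user space sits below 0xC0000000
# (3 GiB) and the kernel takes the top 1 GiB of the same virtual space.
PAGE_OFFSET_3G1G = 0xC0000000
FOUR_GIB = 1 << 32


def split_sizes(page_offset: int) -> tuple:
    """Return (user_bytes, kernel_bytes) for one shared 32-bit virtual space."""
    return page_offset, FOUR_GIB - page_offset


user, kernel = split_sizes(PAGE_OFFSET_3G1G)
print(user >> 30, "GiB user,", kernel >> 30, "GiB kernel")
```

In the 4G+4G model there is no such shared boundary, since user space and
the kernel each get their own full 4 GiB virtual space and the system has to
switch between them.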

Intel is _finally_ just coming out with its first, true 40-bit
physical interconnect that breaks the limitations of GTL.
People should be careful _not_ to go with the majority of
Intel's existing platforms because of this, even those with
EM64T, and should look only at these newer platforms.

> And if it's not, then your whole speech about the pae36
> differences between a gtl+ or ev6 connected device is
> wrong, which then in turn makes the only real difference
> the iommu the newer athlon cores provide (so dma can go
> above the first 4GB rather than having to be bounced)...
> This is also supported by the intel, amd and redhat docs
> (see links posted above and in previous mails), the post
> Feizhou made in this thread covering the LKML references about 
> using the agpgart and even microsoft 
> (http://www.microsoft.com/whdc/system/platform/server/PAE/pae_os.mspx)

I know, I also posted it, and there are even _more_ comments in the
LKML.  There are comments on how AMD has extra GARTs in the
Athlon (yes, even the so-called "32-bit" Athlon) to handle _all_
I/O, not just AGP (long story).

But I'm not talking just about that.  I'm talking about the serious
limitations with GTL itself -- even on Socket-604 and Xeon EM64T.
It's only the latest

> *shrugs* Intel, AMD and any other spec sheet you can find down
> to VIA chipset docs agrees with that... But I guess I'm still
> wrong though?

Dude, you are using _board_level_ spec sheets.  I'm talking about
the _internal_ design of the CPU/interconnect and how it handles
the addressing between the systems software and interconnect.

You get that from _neither_ of the "board level" spec sheets _nor_
the "programmer" guides.  You have to find more esoteric docs, and
many times they are not on-line.  Intel is not going to boast how
even their AGTL+ chipsets with first-gen EM64T can't directly
address above 4GiB because of legacy design limitations in the
platform.



--
Bryan J. Smith   mailto:b.j.smith at ieee.org