From: alex@milivojevic.org
Well, this is how I interpreted Bryan's emails. He'll probably correct me if I'm wrong (yeah, I have EE, but haven't done much EE work since I graduated, it just happened to be mostly "software" things for me, so I'm rather rusty here :-(
If you haven't noticed, I am _not_ big on "credentials." As far as I'm concerned, as long as someone has the knowledge, I could care less about the paper**. Even most state BoPEs will allow you to replace an ABET Accredited BSE with 8-12 years of experience (plus another 4-5 years experience post-degree) to qualify to become a PE.
There are semiconductor engineering mindsets, and they differ greatly from a programmer's or even a technologist's.
[ **NOTE: Being a consultant, I finally had to "give in" to the paper. I also decided to major in engineering "just in case I needed it" (and, later on, got to use it for 2 years in the semiconductor industry). ]
I don't think Bryan is talking about "external" width of the address bus, how many lines you see printed on the motherboard.
That's because it is _irrelevant_ in many cases. You can mux lines, which is exactly what the original EV6 Crossbar Athlon does.
He's talking about how things are organized internally, and about the way the bus logic works. That would be what implementations might theoretically use, not what some specific implementation is limiting itself to.
Exactly. Until the new 40-bit Xeon MPs came out, they were pretty much "slap on" designs with extended ALUs and microcoding to be x86-64 compatible to a point.
Sorry for mixing software and hardware from now on,
It's hard not to.
First you have to consider the programmer aspects. Then you have to realize those will influence hardware compatibility. Then you have to remember that with every new addressing model, you _exponentially_ increase the external logic to drive it.
just trying to make Peter see where his misconception is (the best way I can, which might not be a good way at all).
It's fairly difficult to do it without breaking out a basic memory controller circuit at least at the transistor level, or possibly the combinational NAND gates it somewhat represents.
The programming model for 32-bit userland applications is obviously limited to 32 bits -- the sizeof(void *) will tell you that. So a single process can see only a 32-bit logical linear address space.
To a point. PAE36 is what allows you to break that, because even in the i386, the 16-bit segment register that is offset 4-bits from the 32-bit offset register results in a 36-bit "normalized" address (it can actually be 37-bit, but that's another story). How the processor handles that in a way that is compatible with the OS is the problem.
PAE36 uses "paging" from above 32-bit (up to 36-bit/64GiB) down to below 32-bit (up to 4GiB). PAE36 processors can support this paging, and then have at least 36-bit traces on the platform. GTL+ logic was designed with this "slapped on," and never bothered with direct, linear access until just recently (with the new 40-bit redesign for Xeon MP).
Athlon, on the other hand, has always been 40-bit EV6. AMD decided to support the PAE36 paging, which added logic. But they _always_ had the "40-bit linear addressing for free" inherently in the platform. That's where these few BIOS hacks come in, combined with an OS/model that can take advantage of it. If you enable this mode for Windows, it will _not_ work!
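For reference, the sizes behind the 32-bit, 36-bit and 40-bit figures above are plain powers of two. A minimal sketch (the helper name address_space_bytes is made up for illustration, and byte-addressable memory is assumed):

```python
GIB = 1 << 30  # bytes in one GiB


def address_space_bytes(bits):
    """Number of distinct byte addresses reachable with `bits` address bits."""
    return 1 << bits


# The three widths that come up in this thread.
for bits in (32, 36, 40):
    print(f"{bits}-bit: {address_space_bytes(bits) // GIB} GiB")
```

So 36 bits gives 16 times the 4GiB of a 32-bit space, and 40 bits gives 256 times.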
On the other hand, the processor (the hardware) doesn't need to have such limits, if its internal organization is wider. So, AMD (the processor) is able to see the 40-bit linear physical address space as one single big chunk of memory. 32-bit applications will have their 32-bit *logical* address space mapped into this 40-bit linear *physical* address space of the processor. I don't know if the programming model of "32-bit" AMD processors allows you to have wider-than-32-bit pointers (even if it did, you would need a compiler that can generate such code; gcc can't do it for sure).
Not true. The model is _still_ PAE36, up to 36-bit/64GiB. But instead of paging, the combination of Athlon's "inherent 40-bit" with an OS that can do it will not use paging. Normally PAE36 does paging, which is what PAE36 OSes normally do.
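The general idea of a 32-bit logical address landing somewhere in a wider physical space can be sketched roughly as follows. This is a toy, single-level lookup, not the real x86 PAE table layout; the 4 KiB page size, the translate helper and the mapping values are all assumptions for illustration:

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages, for illustration only


def translate(page_table, logical_addr):
    """Map a 32-bit logical address to a physical address via a toy page table."""
    if not 0 <= logical_addr < (1 << 32):
        raise ValueError("logical address must fit in 32 bits")
    page, offset = divmod(logical_addr, PAGE_SIZE)
    frame = page_table[page]  # a KeyError here models an unmapped page
    return frame * PAGE_SIZE + offset


# Hypothetical mapping: logical page 0x10 is backed by a frame above 4 GiB.
page_table = {0x10: 0x123456}
phys = translate(page_table, 0x10ABC)
print(hex(phys))
```

The process only ever handles the 32-bit value; the wider physical address exists only on the far side of the table, which is why the OS has to be the one that knows about it.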
Obviously, the kernel needs to know how to manage things in this wider physical address space, the reason why you need patched Linux kernel to take advantage of it.
The actual amount of space doesn't change -- at least not under the PAE36 model. It's just how the OS commands the memory logic to use it, and that also requires the firmware (BIOS) to pre-configure it at POST. Under a nominal PAE36 OS and board, the memory logic and OS always do paging to access above 32-bit/4GiB.
But I don't think this hack is able to linearly address the entire 40-bit because of the limitations of what PAE36 can address.
Intel (the processor), on the other hand, is able to physically address only a 32-bit address space. Anything wider than that, it needs to page. Dealing with paging will obviously be additional work for the OS, hence lower performance.
Paging is a significant hit. Anyone running the 4G+4G "HIGHMEM" model of their Linux kernel who recompiles for the 1G+3G model will notice a performance gain (as long as they have only 960MiB of memory).
So while both processors will have more than 32 address lines on the packaging (and printed on the motherboard, minus a couple of the lowest ones that are not needed), as you can see in the various specifications that you quoted, that doesn't mean the processor's core actually sees all that address space.
How else to try explaining this... Hm... Remember the Intel 8086? It could address only a 16-bit address space, but it had more than 16 address lines on the packaging. It used segmenting (hopefully the right term) to see a wider-than-16-bit address space. Now try to make an analogy with what was discussed so far ;-)
It's somewhat correct, although it gets interesting.
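For anyone who doesn't remember the 8086 arithmetic: a 16-bit segment is shifted left 4 bits and added to a 16-bit offset, and the result goes out on 20 address lines. A small sketch (the function name real_mode_address is made up here):

```python
def real_mode_address(segment, offset):
    """8086 real-mode physical address: segment * 16 + offset, wrapped to 20 bits."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # The 8086 has only 20 address lines, so the sum wraps at 1 MiB.
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(real_mode_address(0xB800, 0x0000)))  # segment shifted left by 4 bits
print(hex(real_mode_address(0x1234, 0x0010)))  # same byte as 0x1235:0x0000
print(hex(real_mode_address(0xFFFF, 0x0010)))  # wraps around past 1 MiB
```

Two 16-bit registers thus reach a 1MiB space, and many segment:offset pairs alias the same physical byte.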
It's more like EMS than XMS, because XMS used to shunt the processor in and out of Protected286/386, and that required a 286/386 to access it. You could run a true 24-bit/32-bit, respectively, to avoid that.
EMS worked on old 8088/8086 (as well as 80286/386), because pages above 1MB could be mapped in. The 80386 and some 80286 could emulate this as well without a special card with special addressing.
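The EMS idea of mapping pages into a small window can be modelled in a few lines. This is only a toy model, assuming the classic layout of a 64KiB page frame made of four 16KiB slots; the class and method names are invented for the sketch and do not correspond to the real EMS driver interface:

```python
EMS_PAGE = 16 * 1024  # 16 KiB logical pages
WINDOW_SLOTS = 4      # four slots form the 64 KiB page frame


class BankedMemory:
    """Toy bank-switched memory: a small window onto a larger pool of pages."""

    def __init__(self, num_pages):
        self.pages = [bytearray(EMS_PAGE) for _ in range(num_pages)]
        self.slots = [0] * WINDOW_SLOTS  # which page each slot currently shows

    def map_page(self, slot, page):
        """Make `page` visible through window slot `slot`."""
        if not 0 <= page < len(self.pages):
            raise ValueError("no such page")
        self.slots[slot] = page

    def _locate(self, window_offset):
        slot, offset = divmod(window_offset, EMS_PAGE)
        return self.pages[self.slots[slot]], offset

    def read(self, window_offset):
        page, offset = self._locate(window_offset)
        return page[offset]

    def write(self, window_offset, value):
        page, offset = self._locate(window_offset)
        page[offset] = value


mem = BankedMemory(256)          # 256 pages * 16 KiB = 4 MiB behind the window
mem.map_page(0, 200)
mem.write(5, 0xAA)               # lands in page 200
mem.map_page(0, 3)
hidden = mem.read(5)             # page 3 is still zeroed
mem.map_page(0, 200)
restored = mem.read(5)           # the earlier write is visible again
```

The CPU-side address never changes (offset 5 in the window); only the mapping does, and every remap is extra work the software has to do.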
In the GTL+ bus, this is exactly what it does. It offers special addressing lines for a memory logic that pages, because that's all the OS does. Even the early EM64T processors had to deal with this "limitation" of GTL+ platform.
You want to ensure you get a new EM64T platform that doesn't have that approach.
-- Bryan J. Smith mailto:b.j.smith@ieee.org