On Tuesday 28 June 2005 10:12, alex@milivojevic.org wrote
I don't think Bryan is talking about the "external" width of the address bus, i.e. how many lines you see printed on the motherboard. He's talking about how things are organized internally, and about the way the bus logic works. That would be what implementations might theoretically use, not what some specific implementation limits itself to.
Alright - as I posted before, I went to bus level only to find common ground. But anyway, let's look at the CPU internals then...
Sorry for mixing software and hardware from now on, just trying to make Peter see where his misconception is (the best way I can, which might not be a good way at all). The programming model for 32-bit userland applications is obviously limited to 32 bits -- the sizeof(void *) will tell you that. So a single process can see only a 32-bit logical linear address space.
Yes, very much agreed.
On the other hand, the processor (the hardware) doesn't need to have such limits, if its internal organization is wider. So, AMD (the processor) is able to see a 40-bit linear physical address space as one single big chunk of memory.
If you have a 32bit address, there is no way to magically make 8 more bits appear. So you need to translate using a translation table... 32bit address A generates 36/40/64/whatever bit address B.... that's a simple table lookup. With that, you can map 4GB of 32bit address space into a larger space - but each process, like you said before, is still only 4GB.
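To make that lookup concrete, here is a minimal sketch in Python. The table contents, the 4 KB page size and the names (page_table, translate) are made up for illustration; this is not how any particular CPU or kernel stores its tables, only the arithmetic of "32bit address A generates wider address B".

```python
PAGE_SHIFT = 12                       # assumed 4 KB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1     # low 12 bits = offset inside the page

# Hypothetical per-process table: virtual page number -> physical frame number.
# A frame number may be larger than 2**20, so the result may need > 32 bits.
page_table = {
    0x00000: 0x0000042,   # frame below 4 GB
    0x00001: 0x0123456,   # frame above 4 GB
}

def translate(vaddr, table):
    """Map a 32-bit virtual address to a (possibly wider) physical address."""
    if not 0 <= vaddr < 2**32:
        raise ValueError("the process only has a 32-bit address")
    vpn = vaddr >> PAGE_SHIFT         # which page
    offset = vaddr & PAGE_MASK        # where inside the page
    pfn = table[vpn]                  # the table lookup
    return (pfn << PAGE_SHIFT) | offset

print(hex(translate(0x00000010, page_table)))
print(hex(translate(0x00001ABC, page_table)))
```

The second address comes out 33 bits wide even though the process only ever handed over 32 bits; the extra bits come from the table entry, not from the pointer.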
So, how do you define a linear space there? Only 1 entry per 4GB? Intel can do that in their newest Xeons. Being able to use any address in the 36/40 bit address space? 36bit on the PPro and newer, 40bit on the newest Xeons too. So that can't be the big difference either.
And about PAE: all the PAE calls are just there to manipulate the page table entries - so if AMD has their own method to write entries into the page table, great, but you still need a page table somewhere to translate from 32 to 40 bits... And if that's the case, then first, AMD omitted it from all their documentation, and second, it would have nothing to do with EV6 - you could do the same on a 40bit AGTL+ bus without issues.
So again - if you look before the page table (PT) lookup, then both are 32bit... If you look post translation, then both produce a >32 bit linear address - any address can be accessed.
Obviously, the kernel needs to know how to manage things in this wider physical address space, which is why you need a patched Linux kernel to take advantage of it.
What patch exactly is that?
Intel (the processor), on the other hand, is able to physically address only a 32-bit address space. Anything wider than that, it needs to page. Dealing with paging will obviously be additional work for the OS, hence lower performance.
Unfortunately that's where you go wrong. The TLBs (that translate the 32bit address that you have into the real physical address, MMU, yada yada...) have more than 32bit wide entries even on the Intel hardware. And unlike our friend Bryan, I don't mind backing that up with references to official documentation: ftp://download.intel.com/design/Pentium4/manuals/25366816.pdf Look on page 3-31. There you will see that for PAE the PTEs have been extended to 64 bit (only 36 used for PAE36)...
This address is then used to access memory. This is where the bus comes in - after the translation. The application has a 32bit pointer, the page table lookup converts it into a 36bit one if PAE36 is enabled. No difference here between EV6 and GTL+ - the bus simply does not matter there.
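A rough sketch of what that means for a single entry, in Python. The layout used here is a simplification I am assuming for illustration (bit 0 = present, bits 12..35 = page base for a 36bit physical address, 4 KB pages); check the manual above for the real field definitions. The names pte_to_phys, ADDR_MASK and PRESENT are made up.

```python
PAGE_SHIFT = 12                        # assumed 4 KB pages
PHYS_BITS = 36                         # assumed: PAE36
OFFSET_MASK = (1 << PAGE_SHIFT) - 1
# Bits 12..35 of the 64-bit entry hold the page base address (assumption).
ADDR_MASK = ((1 << PHYS_BITS) - 1) & ~OFFSET_MASK
PRESENT = 1 << 0                       # bit 0: page present

def pte_to_phys(pte, vaddr):
    """Combine a 64-bit PTE with the page offset of a 32-bit address."""
    if not pte & PRESENT:
        raise LookupError("page not present")
    return (pte & ADDR_MASK) | (vaddr & OFFSET_MASK)

pte = 0xABCDEF000 | 0x067              # hypothetical entry: base + flag bits
phys = pte_to_phys(pte, 0xC0000123)    # 32-bit pointer from the application
print(hex(phys), phys.bit_length())
```

The pointer going in is 32 bits, the address coming out is 36 bits, and nothing in this step depends on which bus the address is later put on.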
Oh, if you don't believe Intel and you want unofficial things, look at http://www.prism.gatech.edu/~gte213x/LinuxMM/rpt.html#cPGH or even page.h of your kernel source code... i.e. line 71 that refers to the 63rd bit in the PT... (2.4.21 kernel source, haven't checked 2.6..) but I guess that doesn't count for Bryan either :-)
So while both processors will have more than 32 address lines on the packaging (and printed on the motherboard) minus a couple of the lowest ones (that are not needed), as you can see in the various specifications that you quoted, that doesn't mean the processor's core actually sees all that address space.
Yes, they don't see the upper bits in the page table - but that's the same no matter what :-)
How else to try explaining this... Hm... Remember the Intel 8086? It could address only a 16-bit address space, but it had more than 16 address lines on the packaging. It used segmenting (hopefully the right term) to see a wider-than-16-bit address space. Now try to make an analogy with what was discussed so far ;-)
It generated 20 bit addresses out of segment and offset... but the data width is 16 bit, or 8 bit on the 8088... That's what they used to define the bus width... If you read the thread, I mentioned somewhere that parallel busses are defined either by their address width (now common) or data width (used to be common)... For the 8086/88 Intel did the latter to show the difference between the CPUs.
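For reference, the segment:offset arithmetic is small enough to write down. A minimal sketch in Python; the function name is made up, and the wrap at 20 bits reflects the 8086 having only 20 address lines.

```python
def real_mode_address(segment, offset):
    """8086-style address: 16-bit segment * 16 + 16-bit offset, 20 bits wide."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are 16-bit values")
    return ((segment << 4) + offset) & 0xFFFFF   # only 20 address lines

print(hex(real_mode_address(0xB800, 0x0000)))    # 0xb8000
print(hex(real_mode_address(0xF000, 0xFFF0)))    # 0xffff0
print(hex(real_mode_address(0xFFFF, 0x0010)))    # 0x0, wraps past 1 MB
```

Two 16-bit registers produce a 20-bit address, while the data path stays 16 bit (8 bit on the 8088).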
The PIII and the like are 32 bit CPUs but the bus address width is up to 40 bits. Think of it the other way around - look at the chipset or another device connected to the bus. They need to speak the bus protocol without knowing about applications, without knowing about protected mode or anything. So any CPU knowledge can't be used to argue about the basic bus structure.
Anyway, in the end what it comes down to is the point that the GTL+ or EV6 bus has nothing to do with PAE36 as Bryan claimed. HyperTransport of the Athlon64s also has very little to do with EV6 and so on.
I don't disagree that there might be a hidden super secret extension in the 32bit Athlons. But it has nothing to do with EV6. It has nothing to do with voodoo magic or Bryan's chest beating either. It's bits and bytes - and sorry, you can argue about religion, but in engineering, if the manufacturer doesn't document it, you can't offer any proof and - most importantly - it goes against the docs of at least 5 other manufacturers, then sorry, you're just wrong.
Peter.