From: Peter Arremann loony@loonybin.org
And that's exactly the part I don't get - if you have a 32bit address model then you have to use PAE of some sort (compatible to Intel PAE36 or not) to get to address more than that memory...
The BIOS hack basically says "don't be stupid" >4GB. It tells the Athlon MP to treat the PAE36 addresses linearly, instead of paging in GTL compatible fashion.
With that said, do you mean _beyond_ 36-bit/64GiB?
First off, I don't think there ever was an Athlon MP mainboard with more than 64GiB (or more than 32GiB for that matter). Secondly, programs are written to PAE36, so they can only address 64GiB (if not 32-bit/4GiB).
*BUT* is it possible for the Athlon MP to still address more than 64GB and "window" (not page) PAE36/64GiB programs? That's a good question, and it's the #1 reason I'm trying to find that post in the LKML.
In reality, if the hack uses Linux/x86-64, it could very well be that it puts the Athlon MP in "48-bit/256TiB Long Mode" and translates addresses linearly. The Athlon MP wouldn't offer PAE52, no. But it _could_ offer PAE36 windows in a 48-bit (40-bit physical EV6) space.
Remember, the Athlon/MP and Athlon64/Opteron are the _exact_same_ core design, including the 40-bit/1TiB EV6 interconnect for addressing outside the chip (even if the latter tunnels over HyperTransport between CPUs and I/O to other memory, instead of using a "crossbar switch").
again - how do I generate a >32bit address when using a 32bit address model without pages? :-)
Remember, in i386 (and, subsequently, the i486 TLB), the segment register is offset at bit 20. That means the most significant bit (MSB) of the segment register actually has a value of 2^35 (32GiB)!
Segment: 1111 1111 1111 1111 ____ ____ ____ ____ ____
Offset:  ____ 1111 1111 1111 1111 1111 1111 1111 1111
Before PAE36 in the i686, Intel would throw an exception if you set _any_ of the top 4-bits of the segment register, or if the 5th MSB of the segment was set and either the MSB of the offset, or the previous bits "overflowed" when "normalized" into the physical address. That would result in a physical address >32-bit/>4GiB.
In i686 with PAE36, Intel now uses those >32-bit/>4GiB addresses as "pages" into <32-bit/<4GiB addresses.
In EV6 with the incompatible "BIOS hack" and a supporting kernel, the Athlon MP's TLB doesn't page, it allows _direct_, _linear_ access above 32-bit/4GiB.
Digital EV6 is capable of physically addressing 40-bit. Intel GTL is only capable of 32-bit.
When Athlon is in GTL compatibility mode, it only does 32-bit for full compatibility. But you can tell it to use native, 40-bit EV6 with this BIOS hack. The BIOS hack sets up the crossbar registers which the OS must then support.
I'll confirm how this is being done. I believe it actually leverages the Linux/x86-64 kernel.
Once I have that address, throwing it out on the bus is easy - but how do you generate that address without a PAE?
That's where the hack comes in.
Remember, _all_ kernels in _all_ OSes use the TLB, hence why an i486 ISA compatible is virtually required these days (or they do it in the kernel's software).
Normally the Athlon TLB will do the same thing as any Intel GTL, so any such i686 kernel with PAE36 will normally do the same thing. But the BIOS hack puts the Athlon TLB into its _true_ 40-bit EV6 addressing, and the kernel then knows of this and uses it.
So you're simply talking about the ability to output an address longer than 32bit on the bus?
It's more than just that.
It's the combination of presenting PAE36 to _user_ software, while using _linear_ addressing physically.
Intel i686+GTL already does PAE36, and pages in addressings above 32-bit. Athlon+EV6 normally just emulates that, right down to the TLB.
This hack doesn't, it unleashes the Athlon as it's meant to work on a 40-bit interconnect designed for 64-bit Alpha processors. And that includes having the kernel drive the TLB to just take those >32-bit "normalized" addresses that would normally be "paged" and just act like it's linear space.
As far as above 36-bit, I'm not sure. I'll check on that.
on AMD64, yes, thats for sure... but you were referring to 32bit athlons in the statement I'm trying to understand.
And what I'm saying is that Athlon is Athlon. It's the same core, same 40-bit EV6 interconnect. Everything else is compatibility.
On Athlon 64, you can run a PAE52 kernel (on its 40-bit physical platform) and run PAE36 applications, giving linear address space to them all while they think paging is being used.
What this hack does is do the same thing for Athlon MP -- marketed as "32-bit," but the _same_ 40-bit capable, _physical_ platform. EV6 is EV6, designed for 64-bit Alpha, and AMD couldn't "cripple" it. They use the same board logic and cores, whether a 32-bit Athlon or 64-bit Alpha are used.
But doing anything but traditional, GTL 32-bit/PAE36 would normally _break_ hardware and OSes. Unless you have a hack to the BIOS which enables this "Linux" memory mode, and a kernel which supports it.
Again, I'll find the post, which will lead me to the tech info.
-- Bryan J. Smith mailto:b.j.smith@ieee.org
On Friday 24 June 2005 18:44, Bryan J. Smith b.j.smith@ieee.org wrote:
From: Peter Arremann loony@loonybin.org
And that's exactly the part I don't get - if you have a 32bit address model then you have to use PAE of some sort (compatible to Intel PAE36 or not) to get to address more than that memory...
The BIOS hack basically says "don't be stupid" >4GB. It tells the Athlon MP to treat the PAE36 addresses linearly, instead of paging in GTL compatible fashion.
Ok - so the PAE36 mechanism is the same, the 36bit addresses are the same just the address ranges that are reserved for special use are different?
With that said, do you mean _beyond_ 36-bit/64GiB?
Sorry meant beyond 32bit address space / 4GB.
In reality, if the hack uses Linux/x86-64, it could very well be that it puts the Athlon MP in "48-bit/256TiB Long Mode" and translates addresses linearly. The Athlon MP wouldn't offer PAE52, no. But it _could_ offer PAE36 windows in a 48-bit (40-bit physical EV6) space.
That would require an additional level of page translation though or a really substantial redesign - so it's in my opinion unlikely that they do...
Intel GTL is only capable of 32-bit.
GTL (P5) yes - but that doesn't have PAE36 either... But we're talking PAE here - so GTL+ era... The address pins are even labeled up to A35. Download any of the PPro and newer spec sheets (except mobile cpus that don't support PAE). The Pentium II 300MHz Specs is a good example. Quote from page 75: "The Address signals define a 2^36-byte physical memory address space." document id 24365702.pdf
Peter.
On Fri, 2005-06-24 at 20:16 -0400, Peter Arremann wrote:
Ok - so the PAE36 mechanism is the same, the 36bit addresses are the same just the address ranges that are reserved for special use are different?
Ugh, sigh, you're thinking like a programmer. ;-> You have to separate the logical (programmer) from the physical (engineer) concepts.
Sorry meant beyond 32bit address space / 4GB.
Right, I understood that. I was just wondering if you meant beyond 36-bit/64GiB as well? Because EV6 is 40-bit/1TiB.
That would require an additional level of page translation though or a really substantial redesign - so it's in my opinion unlikely that they do...
Again, thinking like a programmer. ;->
The Athlon was _already_ designed for 40-bit EV6 _physically_ and eventual "Long Mode" programmatically. The TLB in the Athlon is _not_ designed for GTL, but EV6. It only works like a GTL when so commanded. ;->
GTL (P5) yes - but that doesn't have PAE36 either... But we're talking PAE here - so GTL+ era...
I meant GTL and all derivatives ... GTL+, AGTL+, etc...
The address pins are even labeled up to A35. Download any of the PPro and newer spec sheets (except mobile cpus that don't support PAE). The Pentium II 300MHz Specs is a good example. Quote from page 75: "The Address signals define a 2^36-byte physical memory address space." document id 24365702.pdf
Of course! Because if it didn't, it wouldn't be able to page in beyond 4GiB. Yes, but the TLB compatibility and other issues are involved, things that GTL and all derivatives _never_ addressed.
In a nutshell, Intel PAE36 processors on even AGTL+ are _incapable_ of the combination of both _physical_ and _logical_ addressing to do anything but paging above 4GiB. That's my point.
AMD solved the problem by using EV6, which is _nothing_ like GTL-based busses. It goes far deeper than you realize.
On Fri, 2005-06-24 at 20:14 -0500, Bryan J. Smith wrote:
In a nutshell, Intel PAE36 processors on even AGTL+ are _incapable_ of the combination of both _physical_ and _logical_ addressing to do anything but paging above 4GiB. That's my point.
BTW, there are also issues with Intel's EM64T (IA-32e) processors in PAE52 (Long) mode with "windowing" PAE36 processes on an AGTL+ platform.
Things solved _long_ago_ on the original Athlon, and more formally supported in AMD64 (x86-64), including being enabled on the original Athlon MP with that BIOS-based mode change.
On Friday 24 June 2005 21:14, Bryan J. Smith wrote:
On Fri, 2005-06-24 at 20:16 -0400, Peter Arremann wrote:
Ok - so the PAE36 mechanism is the same, the 36bit addresses are the same just the address ranges that are reserved for special use are different?
Ugh, sigh, you're thinking like a programmer. ;-> You have to separate the logical (programmer) from the physical (engineer) concepts.
Sorry I guess. Don't know how that happened - went to a electrical engineering school, specialized in micro electronics :-)
That would require an additional level of page translation though or a really substantial redesign - so it's in my opinion unlikely that they do...
Again, thinking like a programmer. ;->
*whistles innocently*
The Athlon was _already_ designed for 40-bit EV6 _physically_ and eventual "Long Mode" programmatically. The TLB in the Athlon is _not_ designed for GTL, but EV6. It only works like a GTL when so commanded. ;->
Ok, now I have to ask even more questions :-) If the physical bus is EV6, the internals are EV6 and the programming model doesn't matter as you said before - then why would the TLB on the athlon emulate a GTL?
Intel GTL is only capable of 32-bit.
GTL (P5) yes - but that doesn't have PAE36 either... But we're talking PAE here - so GTL+ era...
I meant GTL and all derivatives ... GTL+, AGTL+, etc...
Then you're clearly wrong calling it a 32bit addressed bus...
The address pins are even labeled up to A35. Download any of the PPro and newer spec sheets (except mobile cpus that don't support PAE). The Pentium II 300MHz Specs is a good example. Quote from page 75: "The Address signals define a 2^36-byte physical memory address space." document id 24365702.pdf
Of course! Because if it didn't, it wouldn't be able to page in beyond 4GiB. Yes, but the TLB compatibility and other issues are involved, things that GTL and all derivatives _never_ addressed.
In a nutshell, Intel PAE36 processors on even AGTL+ are _incapable_ of the combination of both _physical_ and _logical_ addressing to do anything but paging above 4GiB. That's my point.
Define paging then - I can access >4GB on an EM64T - it just lacks an IOMMU to coordinate DMA from a 32bit device to a memory region > 4GB... but if I just do a memory access, then there is no issue - no paging necessary...
AMD solved the problem by using EV6, which is _nothing_ like GTL-based busses. It goes far deeper than you realize.
they solved the 32bit IO issue by morphing their agpgart into a minimalistic io mmu ...
Peter.
On Fri, 2005-06-24 at 21:35 -0400, Peter Arremann wrote:
Sorry I guess. Don't know how that happened - went to a electrical engineering school, specialized in micro electronics :-)
So you understand how memory controllers work then, correct? You understand how memory can show up as half, quarter or incompatible based on differences in IC, controller logic, etc..., correct?
Ok, now I have to ask even more questions :-) If the physical bus is EV6, the internals are EV6 and the programming model doesn't matter as you said before - then why would the TLB on the athlon emulate a GTL?
Compatibility. Win32/PAE36 expects to be able to page 512MiB from above 4GiB into under 4GiB using the TLB.
Then you're clearly wrong calling it a 32bit addressed bus...
_All_ GTL designs are physically 32-bit addressing buses. The use of 4-bit extra bits to do PAE36 is _not_ direct.
Define paging then - I can access >4GB on an EM64T - it just lacks an IOMMU to coordinate DMA from a 32bit device to a memory region > 4GB... but if I just do a memory access, then there is no issue - no paging necessary...
I'm not even talking about the I/O MMU anymore. I'm talking about paging 512MB of memory from above 4GiB (PAE36) into 4GiB (32-bit).
GTL has to do it. EV6, natively, does not.
they solved the 32bit IO issue by morphing their agpgart into a minimalistic io mmu ...
I'm not even talking about the I/O MMU anymore.
On Friday 24 June 2005 22:22, Bryan J. Smith wrote:
Then you're clearly wrong calling it a 32bit addressed bus...
_All_ GTL designs are physically 32-bit addressing buses. The use of 4-bit extra bits to do PAE36 is _not_ direct.
Hmmm... GTL is a parallel bus... Therefore the width of the bus is defined by either address or data width. GTL and all variants are 64bit data, so that can't be the reason for calling it a 32bit bus. Since the accesses have to be aligned on 8 byte boundaries, the bus does not carry the lower bits A0, A1, A2.
On the Pentium (GTL bus) there are the address pins A3 through A31. [Pentium MMX embedded version, couldn't find the regular P5/P54C docs anymore. www.intel.com/design/intarch/applnots/27320602.pdf]
Then, GTL+, support for PAE36, has address bits for 36bit addressing - A3 through A35. [Pentium II processor specs: ftp://download.intel.com/design/intarch/datashts/27326801.pdf]
Finally you've got the high end stuff like a Xeon MP, 64bit... Address lines A3 through A39 - so space for 40 bit physical addresses. [http://download.intel.com/design/Xeon/datashts/30675401.pdf&e=7152]
So, if the different number of address lines doesn't change the width of a bus, what does? :-)
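The pin ranges cited above translate directly into address-space sizes. What follows is a minimal sketch of that arithmetic only, assuming (as the post says) that A0-A2 are implied by 8-byte alignment and still count toward the address width. The function names `addressable_bytes` and `human` are made up for this illustration.

```python
def addressable_bytes(highest_pin: int) -> int:
    """Bytes addressable when the highest address pin is A<highest_pin>.

    Pins A0..A2 are not routed because transfers are 8-byte aligned,
    but those low bits are still part of the address.
    """
    return 1 << (highest_pin + 1)


def human(n: int) -> str:
    """Format a power-of-two byte count with binary units."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if n < 1024 or unit == "TiB":
            return f"{n} {unit}"
        n //= 1024


# Pin ranges as quoted in the message above.
for name, pin in (("A3-A31", 31), ("A3-A35", 35), ("A3-A39", 39)):
    print(name, "->", human(addressable_bytes(pin)))
```

This says nothing about how a given CPU or chipset uses the pins. It only shows what range of addresses each pin count can express.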
Peter.
On Fri, 2005-06-24 at 23:22 -0400, Peter Arremann wrote:
Hmmm... GTL is a parallel bus... Therefore the width of the bus is defined by either address or data width. GTL and all variants are 64bit data, so that can't be the reason for calling it a 32bit bus.
Sigh, I _explicitly_ used the adjective "address." If I slipped once or twice, I apologize. This is a pretty good sign to end the thread, because it's just argumentative at this point. ;->
BTW, GTL can be _wider_ than 64-bit. It can be multiples of 64-bit, but it is still a "shared" bus/hub configuration. Many 2 and 4-way Xeon servers use 128-bit or even 256-bit shared buses to the MCH, and then the memory.
EV6 is an up to 16 multi-point cross-bar switch configuration of independent, 64-bit wide buses. E.g., the dual Athlon MP uses 2 _independent_ buses from its crossbar switch "northbridge."
Since the accesses have to be aligned on 8 byte boundaries, the bus does not carry the lower bits A0, A1, A2.
Of course. Paragraphs are 256 bytes. Paging is typically 4KiB (unless 4MiB is enabled, which gives not-well-regression-tested results on Athlon, but it doesn't do anything for the Athlon anyway).
On the Pentium (GTL bus) there are the address pins A3 through A31. [Pentium MMX embedded version, couldn't find the regular P5/P54C docs anymore. www.intel.com/design/intarch/applnots/27320602.pdf] Then, GTL+, support for PAE36, has address bits for 36bit addressing - A3 through A35. [Pentium II processor specs: ftp://download.intel.com/design/intarch/datashts/27326801.pdf] Finally you've got the high end stuff like a Xeon MP, 64bit... Address lines A3 through A39 - so space for 40 bit physical addresses. [http://download.intel.com/design/Xeon/datashts/30675401.pdf&e=7152] So, if the different number of address lines doesn't change the width of a bus, what does? :-)
Socket-A/462 via EV6 doesn't have "fixed" address lines if you bothered to check (it only has 13 x 2). The existence of an actual, physical trace is not how addressing works.
But at this point, let's just forget this whole thread because it's obvious that you're not interested in hearing me out, just hearing yourself out.
On Sat, 2005-06-25 at 01:27, Bryan J. Smith wrote:
But at this point, let's just forget this whole thread because it's obvious that you're not interested in hearing me out, just hearing yourself out.
Some real-world benchmark numbers would make the case more convincing. Does anyone have some? I'm particularly interested in anything with AMD vs. IBM's 64-bit xeon boxes.
Also, on the hardware topic: are there differences in NICs in working with 802.1q VLANs?
On Saturday 25 June 2005 13:02, Les Mikesell wrote:
On Sat, 2005-06-25 at 01:27, Bryan J. Smith wrote:
But at this point, let's just forget this whole thread because it's obvious that you're not interested in hearing me out, just hearing yourself out.
Some real-world benchmark numbers would make the case more convincing. Does anyone have some? I'm particularly interested in anything with AMD vs. IBM's 64-bit xeon boxes.
The talk we had wasn't really about 64bit...
There are a few benchmarks out there comparing the two - but none I've seen actually test 64bit linux tuned for Opteron/Xeon (instead of the stock RHEL kernels, see below) with large memory (>4GB) running heavy IO. That's where you'd see the biggest issue with intels implementation because of the lack of IOMMU.
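To make the IOMMU point concrete: without one, a buffer that lies above what a 32-bit device can address has to be copied through low memory (a "bounce buffer"). The sketch below only shows the address check behind that decision. `needs_bounce` and `DMA_MASK_32` are hypothetical names for this illustration, not kernel APIs.

```python
# Highest physical address a 32-bit DMA-capable device can reach.
DMA_MASK_32 = (1 << 32) - 1


def needs_bounce(phys_addr: int, length: int, dma_mask: int = DMA_MASK_32) -> bool:
    """Return True if any byte of the buffer lies beyond the device's reach.

    In that case, a system without an IOMMU has to copy the data through
    a buffer below the mask instead of letting the device access it directly.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    last_byte = phys_addr + length - 1
    return last_byte > dma_mask


print(needs_bounce(0x1000_0000, 4096))   # buffer well below 4 GiB
print(needs_bounce(5 << 30, 4096))       # buffer at 5 GiB
```

With an IOMMU the device would instead see a remapped address below the mask, so no copy is needed. That is the difference the paragraph above expects to show up under heavy IO with more than 4GB of memory.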
Here is a list of benchmarks and why they are not ideal to demonstrate the differences in how the 64bit extensions work:
http://www.anandtech.com/IT/showdoc.aspx?i=2447 - No heavy IO, no opteron tuned kernel but EM64T kernel for both.
http://www.gamepc.com/labs/view_content.asp?id=pciews&page=1&cookie%... - Windows, 32bit.
http://www.anandtech.com/IT/showdoc.aspx?i=2347 - windows, 32bit
http://www17.tomshardware.com/cpu/20040927/index.html - windows, 32bit.
More interesting in my opinion is the tuning though... There were a couple of threads about mtune differences between k8 and nocona (e.g. https://www.redhat.com/archives/fedora-devel-list/2005-June/thread.html#0071...) and the RHEL3U2 release notes where EM64T was first supported (http://www.redhat.com/docs/manuals/enterprise/RHEL-3-Manual/release-notes/as... - look in the kernel info section).
Peter.
On Sat, 2005-06-25 at 12:49, Peter Arremann wrote:
Some real-world benchmark numbers would make the case more convincing. Does anyone have some? I'm particularly interested in anything with AMD vs. IBM's 64-bit xeon boxes.
The talk we had wasn't really about 64bit...
There are a few benchmarks out there comparing the two - but none I've seen actually test 64bit linux tuned for Opteron/Xeon (instead of the stock RHEL kernels, see below) with large memory (>4GB) running heavy IO. That's where you'd see the biggest issue with intels implementation because of the lack of IOMMU.
Most of the situations where I would use them involve some disk update activity but much more reading, so what really interests me is how well a huge amount of memory works as a disk buffer to avoid doing heavy physical I/O.
On Saturday 25 June 2005 15:53, Les Mikesell wrote:
Most of the situations where I would use them involve some disk update activity but much more reading, so what really interests me is how well a huge amount of memory works as a disk buffer to avoid doing heavy physical I/O.
Then the first AnandTech benchmark article (http://www.anandtech.com/IT/showdoc.aspx?i=2447) is exactly what you want to look at. Huge amount of memory (when compared to the size of the database running on the system) on a 64bit linux kernel... We're doing the same for one of our apps called IPM. It's a PHP app running against a quad opteron with 16GB ram. Heavy on network IO (during business hours it's rare that we don't saturate the main 100mbit link) but little disk activity. DB size is about 2.5GB and we end up with a couple of gig for disk buffers. CentOS4 of course... anything specific you're looking for?
Peter.
On Saturday 25 June 2005 02:27, Bryan J. Smith wrote:
The existence of an actual, physical trace is not how addressing works.
Then how does it work? You always just tell me that's not how it is - without an exact reason why or offering an alternative that does not contradict the manufacturers documentation. You call it a 32bit bus, the rest of the world, including the guys that developed the bus, the guys that build mainboards based on it, the guys that illegally make chipsets with a compatible protocol without a license - they all see it differently.
But at this point, let's just forget this whole thread because it's obvious that you're not interested in hearing me out,
Valid to point out what you perceive is the problem in us finding a solution.
just hearing yourself out.
Personal attack - don't think that has anything to do with the technical problem. Its not even a very good attack. I'm the one who in almost all emails wrote less than you, plus I backed up my arguments with links to official standards or manufacturers documentation.
Then again, I might have been using a technique that you're unfamiliar with. It's called divide and conquer. You start with a large problem that you're trying to solve. You then split this problem into several smaller ones and try to solve them. If necessary, repeat this until you've reached a point where you can solve the simple problems and then you use these solutions as building blocks to go after the larger problem. This method is taught to, among others, programmers, electrical engineers and managers - so I assume it's a more valid approach than personal attacks :-)
But in the end I agree - let's drop this. Sorry if I came across that way - for me it was simply frustration. I started with a statement I did not understand and then each time all I got back was a "that's not how it works" - without any pointers to more documentation or anything. So I had to - using the method outlined above - dig down another layer, then another and so on until we're at a level that we can both agree on. Then, at least that was the plan, go from a common ground up to what you claim is the way it works so we can figure out exactly who is wrong and where the error in thinking lies.
Unfortunately I guess that won't be happening, because I simply can't accept that changes in the physical layout of a parallel bus do not have any effect on its width. SCSI did it when going from 8bit narrow to 16 bit wide. It worked for them. [http://pinouts.ru/data/info-scsi_pinout.shtml] MCA, usually a 32bit bus, had a low cost 16bit variant that did the same. [http://pinouts.ru/data/mca_32bit_pinout.shtml] PCI? 32bit addressing but with the optional 64bit extension, has 64 address lines... [http://pinouts.ru/data/PCI_pinout.shtml]
Peter.
On Sat, 2005-06-25 at 13:19 -0400, Peter Arremann wrote:
Then how does it work? You always just tell me that's not how it is - without an exact reason why or offering an alternative that does not contradict the manufacturers documentation. You call it a 32bit bus, the rest of the world, including the guys that developed the bus, the guys that build mainboards based on it, the guys that illegally make chipsets with a compatible protocol without a license - they all see it differently.
Dude, you're totally mis-appropriating simple board layout specifications to how the logic of the bus works. That's why I'm not even going to discuss this any longer.
On Saturday 25 June 2005 16:18, Bryan J. Smith wrote:
Dude, you're totally mis-appropriating simple board layout specifications to how the logic of the bus works. That's why I'm not even going to discuss this any longer.
Then enlighten me - if I have 40 address bits - transmit only the higher 37 since I don't need the lower end. Timing schemas show only one Input hold time per address transfer for all available pins - how can that be a 32bit bus? (ftp://download.intel.com/design/Xeon/datashts/30675401.pdf) And, if timing diagrams, pinouts and so on lie about the size of the bus, what is actually going on, then I guess it's true - Intel uses voodoo magic to design their chips and they added extra address pins, never ever use them and the MCH figures out the missing address bits by some more occult means (http://www.amazon.com/exec/obidos/ASIN/B00001ZWV7/104-8776547-8655150)
And if it's not, then your whole speech about the pae36 differences between a gtl+ or ev6 connected device is wrong, which then in turn makes the only real difference the iommu the newer athlon cores provide (so dma can go above the first 4GB rather than having to be bounced)... This is also supported by the intel, amd and redhat docs (see links posted above and in previous mails), the post Feizhou made in this thread covering the LKML references about using the agpgart and even microsoft (http://www.microsoft.com/whdc/system/platform/server/PAE/pae_os.mspx)
*shrugs* Intel, AMD and any other spec sheet you can find down to VIA chipset docs agrees with that... But I guess I'm still wrong though?
Peter.
Quoting Peter Arremann loony@loonybin.org:
On Saturday 25 June 2005 16:18, Bryan J. Smith wrote:
Dude, you're totally mis-appropriating simple board layout specifications to how the logic of the bus works. That's why I'm not even going to discuss this any longer.
Then enlighten me - if I have 40 address bits - transmit only the higher 37 since I don't need the lower end. Timing schemas show only one Input hold time per address transfer for all available pins - how can that be a 32bit bus? (ftp://download.intel.com/design/Xeon/datashts/30675401.pdf) And, if timing diagrams, pinouts and so on lie about the size of the bus, what is actually going on, then I guess it's true - Intel uses voodoo magic to design their chips and they added extra address pins, never ever use them and the MCH figures out the missing address bits by some more occult means (http://www.amazon.com/exec/obidos/ASIN/B00001ZWV7/104-8776547-8655150)
Well, this is how I interpreted Bryan's emails. He'll probably correct me if I'm wrong (yeah, I have EE, but haven't done much EE work since I graduated, it just happened to be mostly "software" things for me, so I'm rather rusty here :-(
I don't think Bryan is talking about "external" width of the address bus, how many lines you see printed on the motherboard. He's talking how the things are organized internally, and about the way the bus logic works. That would be what theoretically implementations might use, not what some specific implementation is limiting itself to.
Sorry for mixing software and hardware from now on, just trying to make Peter see where his misconception is (the best way I can, which might not be good way at all). The programming model for 32-bit userland applications is obviously limited to 32 bits -- the sizeof(void *) will tell you that. So a single process can see only 32-bit logical linear address space. On the other hand, processor (the hardware) doesn't need to have such limits, if its internal organization is wider. So, AMD (processor) is able to see 40-bit linear physical address space as one single big chunk of memory. 32-bit applications will have their 32-bit *logical* address space mapped into this 40-bit linear *physical* address space of processor. I don't know if programming model of "32-bit" AMD processors allows you to have wider-than-32-bit pointers (even if it did, you would have to have compiler that can generate such code, gcc can't do it for sure). Obviously, the kernel needs to know how to manage things in this wider physical address space, the reason why you need patched Linux kernel to take advantage of it. Intel (the processor) on the other hand, is able to physically address only 32-bit address space. Anything wider than that, it needs to page. Dealing with paging will obviously be additional work for OS, hence lower performance.
So while both processors will have more than 32 address lines on the packaging (and printed on motherboard) minus couple of lowest one (that are not needed) as you can see in various specifications that you quoted, that doesn't mean processor's core actually sees all that address space.
How else to try explaining this... Hm... Remember Intel 8086? It could address only 16-bit address space, but it had more than 16 address lines on the packaging. It used segmenting (hopefully the right term) to see wider-than-16-bit address space. Now try to make analogy with what was discussed so far ;-)
From: Steven Vishoot sir_funzone@yahoo.com
As Carol Brady said "what was all that about" I don't know if anyone else feels this way, but I think this subject has been beaten to death and beyond. Not sure if I remember right, but wasn't the original post about upgrading from 4.0 to 4.1? How did this topic become an engineering course? sorry for being a crab...but come on....
[lots snipped]
I agree completely. Unfortunately, even though I've killfiled the biggest offender, I'm still getting a lot of the noise since people continue to feed him. If people want to feed the trolls, please, for the love of all that's holy, trim your posts. 8-) Sigh, this is the first technical list filter I've ever had to implement (for email...usenet is another story).
Cheers,
C
On Tuesday 28 June 2005 10:12, alex@milivojevic.org wrote
I don't think Bryan is talking about "external" width of the address bus, how many lines you see printed on the motherboard. He's talking how the things are organized internally, and about the way the bus logic works. That would be what theoretically implementations might use, not what some specific implementation is limiting itself to.
alright - as I posted before, I went to bus level only to find common ground. But anyway, lets look at the cpu internals then...
Sorry for mixing software and hardware from now on, just trying to make Peter see where his misconception is (the best way I can, which might not be good way at all). The programming model for 32-bit userland applications is obviously limited to 32 bits -- the sizeof(void *) will tell you that. So a single process can see only 32-bit logical linear address space.
Yes, very much agreed.
On the other hand, processor (the hardware) doesn't need to have such limits, if its internal organization is wider. So, AMD (processor) is able to see 40-bit linear physical address space as one single big chunk of memory.
If you have a 32bit address, there is no way to magically make 8 more bits appear. So you need to translate using a translation table... 32bit address A generates 36/40/64/whatever bit address B.... that's a simple table lookup. With that, you can map 4GB of 32bit address space into a larger space - but each process, like you said before, is still only 4GB.
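The "simple table lookup" described above can be sketched in a few lines. This is a toy single-level table with 4 KiB pages, not how any real MMU lays out its tables. `translate` and the example mapping are invented for the illustration.

```python
PAGE_SHIFT = 12                 # 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT


def translate(page_table: dict, vaddr: int) -> int:
    """Map a 32-bit virtual address to a (possibly wider) physical address.

    page_table maps virtual page number -> physical frame number.
    A missing entry raises KeyError, standing in for a page fault.
    """
    if not 0 <= vaddr < (1 << 32):
        raise ValueError("virtual address must fit in 32 bits")
    vpn = vaddr >> PAGE_SHIFT
    offset = vaddr & (PAGE_SIZE - 1)
    frame = page_table[vpn]
    return (frame << PAGE_SHIFT) | offset


# Hypothetical mapping: virtual page 0x12345 lives in the frame at 5 GiB.
table = {0x12345: (5 << 30) >> PAGE_SHIFT}
paddr = translate(table, 0x12345678)
print(hex(paddr), "needs", paddr.bit_length(), "address bits")
```

The process only ever holds a 32-bit pointer. The extra bits come from the table entry, which is why each process stays limited to 4GB even when the machine has more.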
So, how do you define a linear space there? Only 1 entry per 4GB? Intel can do that in their newest Xeons. Be able to use any address in the 36/40 bit address space? 36bit on PPro and newer, 40bit on newest xeons too. So that can't be the big difference either.
And about PAE. All PAE calls are just there to manipulate the PTE entries - so if AMD has their own method there to write entries into the page table great, but you still need a page table to translate from 32 to 40 bits somewhere... And if that's the case, then first AMD did omit it in all their documentation and it would have nothing to do with EV6 - you could do the same on a 40bit AGTL+ bus without issues.
So again - if you look before PT, then either one is 32bit.... If you look post translation, then both produce >32 bit linear - any address can be accessed.
Obviously, the kernel needs to know how to manage things in this wider physical address space, the reason why you need patched Linux kernel to take advantage of it.
What patch exactly is that?
Intel (the processor) on the other hand, is able to physically address only 32-bit address space. Anything wider than that, it needs to page. Dealing with paging will obviously be additional work for OS, hence lower performance.
Unfortunately that's where you go wrong. The TLB buffers (that translate the 32bit address that you have into the real physical address, mmu jada jada...) have more than 32bit wide entries even on the Intel hardware. And unlike our friend Bryan, I don't mind backing that up with references to official documentation: ftp://download.intel.com/design/Pentium4/manuals/25366816.pdf Look on page 3-31. There you will see that for PAE the PTE have been extended to 64 bit (only 36 used for PAE36)...
This address is then used to access memory. This is where the bus comes in - after the translation. The application has a 32bit pointer, the page table lookup converts it into a 36bit one if PAE36 is enabled. No difference here between EV6 and GTL+ - the bus simply does not matter there.
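As a rough illustration of that translation step, here is how a 32-bit linear address splits under PAE with 4 KiB pages (2-bit directory-pointer index, 9-bit directory index, 9-bit table index, 12-bit offset), and how a 64-bit page table entry yields a 36-bit physical address. This is a simplified sketch: the flag bits in the entry are ignored, and the helper names and the example PTE value are made up.

```python
def split_pae_linear(addr: int):
    """Split a 32-bit linear address into PAE indices (4 KiB pages)."""
    pdpt_index = (addr >> 30) & 0x3     # bits 31:30
    pd_index = (addr >> 21) & 0x1FF     # bits 29:21
    pt_index = (addr >> 12) & 0x1FF     # bits 20:12
    offset = addr & 0xFFF               # bits 11:0
    return pdpt_index, pd_index, pt_index, offset


def frame_base_from_pte(pte: int, phys_bits: int = 36) -> int:
    """Extract the page frame base address from a 64-bit PTE.

    Only bits 12..phys_bits-1 are kept; the low 12 bits hold flags.
    """
    mask = ((1 << phys_bits) - 1) & ~0xFFF
    return pte & mask


linear = 0xC0801234
indices = split_pae_linear(linear)

# Hypothetical PTE: frame number 0xABCDEF plus some low flag bits.
pte = (0xABCDEF << 12) | 0x067
physical = frame_base_from_pte(pte) | indices[3]
print(indices, hex(physical))
```

Nothing in this arithmetic depends on which bus carries the resulting address, which is the point being made above.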
Oh, if you don't believe intel and you want unofficial things look at http://www.prism.gatech.edu/~gte213x/LinuxMM/rpt.html#cPGH or even page.h of your kernel source code... i.e. line 71 that refers to the 63rd bit in the PT... (2.4.21 kernel source, haven't checked 2.6..) but I guess that doesn't count to Bryan either :-)
So while both processors will have more than 32 address lines on the packaging (and printed on motherboard) minus couple of lowest one (that are not needed) as you can see in various specifications that you quoted, that doesn't mean processor's core actually sees all that address space.
Yes, they don't see the upper bits in the page table - but that's the same no matter what :-)
How else to try explaining this... Hm... Remember Intel 8086? It could address only 16-bit address space, but it had more than 16 address lines on the packaging. It used segmenting (hopefully the right term) to see wider-than-16-bit address space. Now try to make analogy with what was discussed so far ;-)
It generated 20 bit addresses out of segment and offset... but the data width is 16 bit or 8 bit on the 8088... That's what they used to define the bus width... If you read the thread, I mentioned it somewhere that parallel busses are defined either by their address width (now common) or data width (used to be common)... For the 8086/88 Intel did the later to show the difference between the cpus.
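For reference, the 8086 calculation mentioned here is short enough to write out: the 16-bit segment is shifted left by 4 and added to the 16-bit offset, and the result wraps at 20 bits. The function name is just for this sketch.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """8086 physical address: segment * 16 + offset, truncated to 20 bits."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are 16-bit values")
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(real_mode_address(0xB800, 0x0000)))   # 0xb8000
print(hex(real_mode_address(0x1234, 0x5678)))   # 0x179b8
print(hex(real_mode_address(0xFFFF, 0x0010)))   # wraps to 0x0 on a 20-bit bus
```

Two 16-bit registers produce a 20-bit address, yet the part was still called a 16-bit CPU because of its data width.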
The PIII and the like are 32 bit cpus but the bus address width is up to 40 bits. Think of it the other way around - look at the chipset or another device connected to the bus. They need to speak the bus protocol without knowing about applications, without knowing about protected mode or anything. So any CPU knowledge can't be used to argue about the basic bus structure.
Anyway, in the end what it comes down to is the point that the GTL+ or EV6 bus has nothing to do with PAE36 as Bryan claimed. HyperTransport of the Athlon64s also has very little to do with EV6 and so on.
I don't disagree that there might be a hidden super secret extension in the 32bit athlons. But it has nothing to do with EV6. It has nothing to do with voodoo magic or Bryan's chest beating either. It's bits and bytes - and sorry, you can argue about religion but in engineering, if the manufacturer doesn't document it, you can't offer any proof and - most importantly - it goes against the docs of at least 5 other manufacturers, then sorry, you're just wrong.
Peter.