From: Maciej Żenczykowski maze@cela.pl
That's a good point - does anyone know what the new Intel Virtualization thingamajig in the new dual core pentium D's is about?
It's all speculation at this point. But there are _several_ factors.
But I'm sure the first time Intel saw AMD's x86-64/PAE52 presentation, the same thing popped into my mind that popped into Intel's mind ... Virtualization
- The 48-bit/256TiB limitation of x86-64 "Long Mode"
There is a "programmer's limit" of 48-bit/256TiB in x86-64 "Long Mode." This limitation is due to how the i386/i486 TLB works -- 16-bit segment, 32-bit offset. Had AMD chosen to ignore such compatibility, it would have been near-impossible for 32-bit/PAE36 programs to run under a kernel of a different model. But "Long Mode" was designed so its PAE52 model could run both 32-bit (and PAE36) as well as new 48-bit programs.
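As a quick arithmetic check on the sizes quoted in this thread (32-bit, PAE36, 40-bit EV6, 48-bit "Long Mode", PAE52), here is a minimal sketch that converts an address width into a byte count. The function name `address_space` is made up for illustration.

```python
# Address-space size for each address width mentioned in this thread.
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]

def address_space(bits: int) -> str:
    """Return 2**bits bytes as a human-readable binary-unit string."""
    size = 2 ** bits
    unit = 0
    # Divide by 1024 until the value fits the next unit down the list.
    while size >= 1024 and unit < len(UNITS) - 1:
        size //= 1024
        unit += 1
    return f"{size} {UNITS[unit]}"

for bits in (32, 36, 40, 48, 52):
    print(f"{bits}-bit -> {address_space(bits)}")
```

So 48 bits gives the 256 TiB figure used above, and 40 bits gives the 1 TiB figure quoted later for EV6.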
We'll revisit that in a bit. Now, let's talk about Intel/AMD design lineage.
- Intel IA-32 Complete Design Lineage
IA-32 Gen 1 (1986): i386, including i486
  - Non-superscalar: ALU + optional FPU (std. in 486DX), TLB added in i486
IA-32 Gen 2 (1992): i586, Pentium/MMX (defunct, redesigned in i686)
  - Superscalar 2+1 ALU+FPU (pipelined)
IA-32 Gen 3 (1994): i686, Pentium Pro, II, III, 4 (partial refit)
  - Superscalar: 2+2 ALU+FPU (pipelined), FPU 1 complex or 2 ADD
  - P3 = +1 SSE pipe, P4 = +2 SSE pipe
Intel hasn't revamped its aging i686 architecture in almost 12 years. The Pentium Pro through Pentium III are the _exact_same_ 7-issue (2+2+3 ALU+FPU+control) design (the P3 slaps on one SSE unit), and the Pentium 4 was a quick, 18-month refit of longer pipes (with associated reduction in ALU/FPU performance MHz for MHz) that extended pipes for clock (and added a 2nd SSE unit).
I'm sure Intel's reasoning for not bothering with a complete generation redesign beyond i686 is because it thought EPIC/Predication would have taken over by now. The reality has been quite the opposite (which I won't get back into).
Since then, Intel has made a number of "hacks" to the i686 architecture. One is HyperThreading which tries to keep its pipes full by using its control units to virtualize two instruction schedulers, registers, etc... In a nutshell, it's a nice way to get "out-of-order and register renaming for almost free." Other than basic coherency checking as necessary in silicon, it "passes the buck" to the OS, leveraging its context switching (and associated overhead) to manage some details.
That's why HyperThreading can actually be slower for some applications: they do not thread, and the added overhead in _software_ leaves less processing time for the applications.
"Yamhill" IA-32e aka "EM64T" was just a P4 ALU refit for x86-64/PAE52, but it lacks many design considerations that the Athlon has -- especially outside the programmer/software considerations, and definitely more at the core interconnect/platform.. I.e., because Intel continues to use a single-point-of-contention "memory controller hub" (MCH), memory interconnect and I/O management, among other details, are still left to the MCH. This is going to become more and more of a headache. The reality is that the Intel IA-32e platform _must_ get past the "northbridge outside the CPU" attitude to compete with AMD.
As such, I have _always_ theorized that "Yamhill" is a 2-part project. Part 2 is the first redesign of an x86 core in almost (now) 12 years, which goes beyond merely adding true register renaming and out-of-order execution (which are largely hacks in the P4/HT), but goes directly to the concept of virtualizing cores. More on that in a bit, now AMD ...
- AMD x86 Complete Design Lineage
AMD Gen 1 (1992*): i386/486 ISA -- 386, 486, 5x86, K5*
  - Non-superscalar: ALU + optional FPU (std. in K5)
AMD Gen 2 (1994*): i486/686 ISA -- Nx586+FPU/K5*, Nx686/K6
  - Superscalar: 3+1 ALU+FPU (ALUs pipelined, FPU _not_ pipelined)
AMD Gen 3 (1999): i686/x86-64 ISA -- Athlon, Athlon64/Opteron
  - Superscalar: 3+3 ALU+FPU (pipelined), FPU 2 _and_ 1 ADD/MULT
  - Extensions are microcoded and leverage ALU/FPU as appropriate
*NOTE: The NexGen Nx586 released in 1994 forms the basis for the later K5 (i486) and the K6 (i686). AMD had scalability issues with its original non-superscalar K5 design and purchased NexGen.
SIDE NOTE: SSE Comparison
  - P4 can do 3 MULT SSE (1 FPU complex + 2 SSE pipes)
  - Athlon can do 3 MULT SSE (2 FPU complex + 1 FPU MULT)
Contrary to popular opinion, Athlon64/Opteron is the _same_core_ design as the 32-bit Athlon platform. It is still the same, ultra-powerful 3+3 ALU+FPU core, with its 2 complex + 1 ADD/MULT FPU able to equal Intel's 1 complex _or_ 2 ADD FPU plus 2 SSE pipes at doing the majority of matrix transforms (which are MULT -- hence why Intel's FPU can't do 2 simultaneously, and relies heavily on its precision-lacking SSE pipes).
Also contrary to popular opinion, 40-bit/1TiB Digital Alpha EV6 interconnect forms the basis for _all_ addressing in _all_ Athlon releases, including the 32-bit. There are a few mainboards that allow even 32-bit Athlons to safely address above 4GB with _no_ paging or issues (with an OS that offers such a supporting kernel, like Linux). The 3-16 point EV6 crossbar (not "hub") architecture forced Athlon MP to put any I/O coherency logic in the chip, so the AGPgart control is actually on the Athlon MP, and not in the northbridge. This has evolved into a full I/O MMU in Athlon64/Opteron.
Because Athlon is 5 years newer than Intel i686, and there was a wealth of talent influx from Digital (even though Intel did get some as well, they haven't redesigned i686 completely), Athlon has some of the latest run-time register renaming and out-of-order execution control in the core itself. This is why doing something like HyperThreading would benefit AMD _very_little_ and largely introduce self-defeating (and even performance reducing) overhead.
In addition to the design of PAE52, the #1 reason why you can safely assume AMD is moving towards virtualization is because of the design limits they put on Athlon64/Opteron. E.g., although the original 32-bit Athlon platform used logic that allowed up to the full EV6 8MB SRAM addressing (cache), Athlon64/Opteron has been artificially limited to 1MB SRAM (saving many considerations and other benefits). This clearly indicates AMD did not consider Athlon64/Opteron
- The Evolution to Virtual Cores
AMD's adoption of '90s concepts of register renaming and out-of-order execution are great for a single core. And Intel's HyperThreading with the minor P4 run-time additions passes-the-buck decently in lieu of a complete core redesign (which they haven't done since 1994). But the concept of extending the pipes any further for performance has been largely broken in the P4, and Intel is actually falling back to its last rev of the i686 original, P3.
Multiple, _physical_ cores have been the first step. This is little more than slapping in a second set of all the non-SRAM transistors, plus any additional bridging logic, if necessary. AMD HyperTransport requires none -- as HyperTransport can "tunnel" anything, EV6 memory/addressing, I/O tunnels/bridges, inter-CPU, etc... all "gluelessly." Intel MCH GTL+ cannot, and requires bridges between the "chipset MCH" and the "multi-core MCH," adding latency. And there are nagging 32-bit limitations with GTL+ as well (long story).
The next logical evolution in microprocessor design is to blur the physical separation between cores. It's the best way without tearing down the entire '70s-induced concept of machine code (operator+operand, possibly control, at least microcoded internally) and the resulting instruction sets. Instead of discrete, superscalar units of a half-dozen to a dozen, pipelined units, there will be numerous, independent pipes, possibly with their own registers or a number of generic registers, as a single unit. Other than the controlling firmware and/or OS, this is _not_ what software will use.
What the software will use are the virtual instantiations that partition this set of pipes and registers, which may very well be dynamic in nature. Let's say I boot Windows, I might instantiate a virtual i686/PAE36 core guaranteeing 100% full Win32 compatibility. Depending on what resources the chip physically has, I will likely even instantiate multiple i686 processors. The concept of multi-CPU and multi-threading has evolved into virtual-cores with virtual-threading. Virtualizing more CPUs with a total number of more pipes/registers than is actual will allow more registers and pipelines to be executing instead of the common 40-50% for superscalar CISC or 60-70% for superscalar RISC.
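To make the utilization argument concrete, here is a back-of-the-envelope sketch. It is a toy model, not a description of any real chip: it assumes each virtual core keeps a fixed fraction of its pipes busy, that demand from virtual cores simply adds up, and that scheduling conflicts and variance can be ignored. The function name and the numbers (6 pipes, 45% utilization) are illustrative assumptions.

```python
def pool_utilization(virtual_cores: int, pipes_per_core: int,
                     per_core_utilization: float,
                     physical_pipes: int) -> float:
    """Expected fraction of physical pipes kept busy in the toy model.

    Demand is the number of pipes the virtual cores would keep busy in
    total; it cannot exceed what the physical pool actually has.
    """
    demand = virtual_cores * pipes_per_core * per_core_utilization
    return min(1.0, demand / physical_pipes)

# One virtual core on a 6-pipe pool, then the same pool oversubscribed.
for cores in (1, 2, 3):
    share = pool_utilization(cores, 6, 0.45, 6)
    print(f"{cores} virtual core(s): {share:.0%} of the pool busy")
```

Under these assumptions, presenting more virtual cores than the pool could back one-for-one is what pushes utilization above the single-core figure, up to the point where the pool saturates.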
As an "added bonus," this means the 48-bit/256TiB constraint for PAE36 compatibility is _removed_. I.e., you can have a much larger, true memory pool, and any required windowing/segmentation is done with_out_ paging by the "host" memory model, even though the OS is virtually running in a PAE36 or PAE52 model.
This also gives rise to an entirely new platform for virtualization of simultaneous OSes -- be it the same OS, or different OSes. Because cores are virtual, you can have multiple, independent processors with their own registers, memory windows into physical RAM, etc... On the more "consumer" front, this will allow it to work with existing OSes as-is. On the more "load balancing server" front, this will often be paired with software (think EMC/VMWare *SX products) so numerous instances can be dynamically load-balanced across virtual cores -- but with far more of the overhead moved onto the chip, for increased efficiency. But it is still managed by software (just with reduced context switching overhead in the software).
Again, it's really just a consolidation of all the run-time optimizations we have now, along with both multi-core and multi-threading approaches, into a general pool of pipes, registers and organization. Additionally, it breaks the physical constraints of the memory model for the physical hardware, which is a very big issue for our future. To ensure x86/PAE36 and x86-64/PAE52 compatibility in the future, such machines will need to be virtualized or we'll be stuck at 48-bit/256TiB.
As in is it worth anything?
Yes -- and almost everything to the future of Microsoft being able to sustain much of their existing Win32 codebase, which does _not_ port to PAE52 very easily and definitely _not_ with full compatibility.
And we have to break the 48-bit/256TiB limitations of PAE52, while still ensuring PAE52 OSes/applications, as well as some legacy PAE36 OSes/applications, still run. The only way is to virtualize the whole freak'n chip so we can instantiate a processor, registers and its memory model -- even if dynamically assigned/shared. And that's just for end-users, possibly workstations and entry servers.
For load-balancing servers, you'll still need a software solution for management. It will be that the hardware just offers far greater efficiency and reduced context switching. In fact, the next consolidation is these virtual core chips in blades, where you not only manage the virtual cores in the individual chips/blades, but an entire rack of blades as a single unit with multiple OSes spread across. This already exists, but this takes it one step further -- because the processors themselves are virtualized with greatly reduced overhead on the part of the software.
Will it allow a dual simultaneous boot of Linux+WinXP+MacOS under Xen or something along those lines?
Yes.
It will both give more virtualized processors to a single executing OS, as well as create segmented, virtualized processors for independently and simultaneously operating processors.
Even on an SMP machine?
First off, remove the Intel-centric notion of "Symmetric" MP (SMP).
Secondly, multi-processing and multi-threading are going to merge with traditional register renaming and out-of-order execution. So the traditional concept of "MP" is _dying_. In fact, in the '90s, it really died in general.
I know it's hard to think outside the box and traditional thought, but most users don't understand superscalar design in the first place. Those who do understand why AMD has _not_ bothered to adopt Intel SMT (HyperThreading) in Athlon, because it won't benefit (because AMD's cores are 5 years newer in design, and put far more optimizations in the chip to keep pipes full and registers used than to virtualize two sets for the OS to use).
Anyone have any experience/knowledge about this?
I can only speculate based on the history of the players involved, as well as AMD's PAE52 design and the limitations of the current Athlon core (which is largely the _same_ between both the 32-bit and newer 64-bit versions).
But the concept of adding more pipes with lots of stages for timing is only leaving more and more stages in pipes empty, or doing little. There has to be a consolidation of many run-time optimizations inside of the chip, and the best way to do that is to create a bank of pipes, registers, etc... and virtually assemble them into virtual cores that are partitioned with memory as a traditional PAE36 or PAE52 processor (or multi-processor).
It's going to solve a _lot_ of issues -- both semiconductor and software.
What level of CPU/hardware(?) does the virt-core support? And is the virt-core 32bit?
You can be certain that the "host" OS (possibly firmware-based?) will be able to instantiate multiple PAE36 and/or PAE52 virtual systems with their own and -- I'll use legacy terminology here (even if it's not technically correct) -- "Ring 0" access. So, technically, it should be possible to run any PAE36 or PAE52 OS simultaneously on the same hardware as any other PAE36 or PAE52 OS.
The larger issues of firmware-OS interoperability as well as partitioning resources (memory, disk, etc...) is really more of a political/market issue. I.e., AMD and Intel can provide the platform, but people have to work together to use it. Furthermore, it also means that Intel can continue to best AMD in funding of OEMs and firmware/software vendors, so it still has an advantage in that capacity.
I'm sure Apple will be protective of its firmware, and Intel's new, supposed "open" firmware is rather proprietary. As I've repeatedly commented elsewhere, the 2 "most open" hardware vendors right now are AMD and Sun, x86-64 and SPARC, respectively. Intel has not only protected non-programmer aspects of IA-64 heavily, but most of their new platform developments for even IA-32e (EM64T) are _very_proprietary_. IBM is partially doing the same with Power in a microelectronics offering, but it is _not_ the same in its branded Power solutions (among others).
So it's not going to solve vendors who require firmware and data organization that is not open and standardized. We're fine on legacy Win32 platforms, but it's not going to address Mactel, nor solve the problem of existing OSes that don't run under current virtualization solutions because of such proprietary requirements.
-- Bryan J. Smith mailto:b.j.smith@ieee.org
On Tuesday 21 June 2005 11:54, Bryan J. Smith b.j.smith@ieee.org wrote:
There are a few mainboards that allow even 32-bit Athlons to safely address above 4GB with _no_ paging or issues (with an OS that offers such a supporting kernel, like Linux).
How does that work? :-)
Peter.
On Fri, 2005-06-24 at 01:37 -0400, Peter Arremann wrote:
How does that work? :-)
It works on the reality that the 32-bit Athlon and 64-bit Athlon use the _same_ 40-bit/1TiB EV6 addressing to memory (as well as tunneled over HyperTransport to other CPUs and I/O in the case of the latter). In reality, they are the same core designs too (the latter just being revamped ALU with 64-bit features, more 128-bit XMM registers and the evolution of its on-CPU AGPgart to the I/O MMU).
Normally the 32-bit Athlon is limited in its addressing to 32-bit/PAE36 (4/64GiB) for Intel GTL compatibility at the BIOS, OS, etc... If you have a BIOS that lets the 32-bit Athlon break 32-bit/PAE36 Intel GTL compatibility, and pair it with an OS that does the same, then you can have the _full_ support of 32-bit Athlon's EV6 addressing architecture. In fact, EV6 is _nothing_ like GTL, but it just emulates it. That includes it looking like a "SMP bus" in the case of Athlon MP, when -- in fact -- it's an "MP switch."
If you want to know more about the non-PAE36 >4GB Linux hack and the few Athlon MP mainboards with BIOSes that support it, read up on the LKML circa February 2004.
On Friday 24 June 2005 02:09, Bryan J. Smith wrote:
Normally the 32-bit Athlon is limited in its addressing to 32-bit/PAE36 (4/64GiB) for Intel GTL compatibility at the BIOS, OS, etc... If you have a BIOS that lets the 32-bit Athlon break 32-bit/PAE36 Intel GTL compatibility, and pair it with an OS that does the same, then you can have the _full_ support of 32-bit Athlon's EV6 addressing architecture. In fact, EV6 is _nothing_ like GTL, but it just emulates it. That includes it looking like a "SMP bus" in the case of Athlon MP, when -- in fact -- it's an "MP switch."
Hmmm - The GTL bus uses 36bit for address... So if you get a license from Intel and build your own device, it can address 36bits directly without any games. PAE36 is a mmu concept that allows a 32 bit OS to have 16 4GB pages (hence the name Page Address Extensions 36 bits...)
So if you have a 32 bit athlon, you get 4GB ram... To go above that, you need more bits. Those bits need to be stored in a separate register since all your apps and os only have 32bit - and then you need to have a mmu that can combine those two into the physical address... And the combining of an offset and a segment is what PAE36 is all about... See where my confusion comes in? :-)
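For reference, the "combining" step described above can be sketched for standard x86 PAE paging with 4 KiB pages: a 32-bit linear address is split into table indices plus a page offset, and the page-table entry supplies a frame number wide enough to reach above 4 GiB. This is a simplified illustration only. The nested-dict tables, the function names and the sample mapping are hypothetical, flag bits are ignored, and it says nothing about the Athlon "Linux" BIOS mode under discussion.

```python
PAGE_SHIFT = 12   # 4 KiB pages
PHYS_BITS = 36    # physical address width assumed for this sketch

def split_linear(addr: int):
    """Split a 32-bit linear address into PAE indices and page offset."""
    assert 0 <= addr < 2 ** 32
    offset = addr & 0xFFF              # bits 11:0
    pt_index = (addr >> 12) & 0x1FF    # bits 20:12, 512-entry page table
    pd_index = (addr >> 21) & 0x1FF    # bits 29:21, 512-entry page directory
    pdpt_index = (addr >> 30) & 0x3    # bits 31:30, 4-entry pointer table
    return pdpt_index, pd_index, pt_index, offset

def translate(tables: dict, addr: int) -> int:
    """Walk toy nested-dict tables; the leaf is a physical frame number."""
    i, j, k, offset = split_linear(addr)
    frame = tables[i][j][k]            # a KeyError here models a page fault
    phys = (frame << PAGE_SHIFT) | offset
    assert phys < 2 ** PHYS_BITS
    return phys

# Hypothetical mapping: one linear page backed by a frame above 4 GiB.
tables = {0: {2: {1: 0x123456}}}
phys = translate(tables, 0x00401234)
print(hex(phys), phys >= 2 ** 32)
```

The program only ever handles the 32-bit linear address; the extra physical bits live in the table entries, which is why some form of translation is needed to reach beyond 4 GiB from a 32-bit address model.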
If you want to know more about the non-PAE36 >4GB Linux hack and the few Athlon MP mainboards with BIOSes that support it, read up on the LKML circa February 2004.
Sorry to make you even more work but I searched the LKML archives and couldn't find anything :-( Could you please send me a direct link?
Peter.
On Fri, 2005-06-24 at 02:51 -0400, Peter Arremann wrote:
Hmmm - The GTL bus uses 36bit for address... So if you get a license from Intel and build your own device, it can address 36bits directly without any games. PAE36 is a mmu concept that allows a 32 bit OS to have 16 4GB pages (hence the name Page Address Extensions 36 bits...)
Yes, the key word there is that it "pages." BTW, it's not 16 x 4GB pages, but 120 x 512MB pages into the lower 8 x 512MB memory.
Intel GTL is _still_ a 4GB platform with paging above 4GB up to 64GB. It is literally like using old EMS in DOS.
So if you have a 32 bit athlon, you get 4GB ram... To go above that, you need more bits.
Sigh ... the 32-bit Athlon uses _40-bit_ EV6. I only used the term "32-bit Athlon" to market differentiate from the "64-bit Athlon/Opteron." In reality, _both_ products use the same core with 40-bit EV6 addressing. The latter just also offers a 48-bit/PAE52 programming/register "Long Mode", whereas the former only offers a 32-bit/PAE36 programming/register mode.
Those bits need to be stored in a separate register since all your apps and os only have 32bit
36-bit -> 16-bit segment + 32-bit offset = 36-bit (4-bits overhang).
On Intel GTL, it pages above 4GB, as you mentioned.
On AMD EV6, it also does it linearly in hardware for GTL compatibility. _Unless_ you have an Athlon MP mainboard with the BIOS and a Linux kernel that offers _true_ access. That _avoids_ paging in hardware above 4GB, significantly improving performance.
The Athlon64/Opteron just now have a formal mode called "Long Mode" where the 16-bit segment is the "top bits 33-48" with the 32-bit offset = 48-bit/256TiB. _Physically_ Athlon64/Opteron are still limited to EV6's 40-bit/1TiB addressing of the platform, same as 32-bit Athlon.
People often get confused on _physical_ platform (board engineering-level) addressing versus _logical_ "programmer" addressing.
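One way to see the physical/logical split on a running Linux system is the "address sizes" line that x86 kernels print in /proc/cpuinfo (on kernels that report it). The sketch below parses that line. The sample text and its 40/48 values are illustrative, chosen to match the figures in this thread, and `parse_address_sizes` is a made-up helper name.

```python
import re

def parse_address_sizes(cpuinfo_text: str):
    """Return (physical_bits, virtual_bits), or None if the line is absent."""
    match = re.search(
        r"address sizes\s*:\s*(\d+) bits physical, (\d+) bits virtual",
        cpuinfo_text,
    )
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

# Illustrative sample, not captured from a real machine.
SAMPLE = (
    "model name\t: (illustrative)\n"
    "address sizes\t: 40 bits physical, 48 bits virtual\n"
)

sizes = parse_address_sizes(SAMPLE)
if sizes is not None:
    physical_bits, virtual_bits = sizes
    print(f"physical: {physical_bits} bits, virtual: {virtual_bits} bits")
```

On a real system the same function could be fed the contents of /proc/cpuinfo instead of the sample string.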
- and then you need to have a mmu that can combine those two into the physical address... And the combining of an offset and a segment is what PAE36 is all about... See where my confusion comes in? :-)
Yes, reference above.
Sorry to make you even more work but I searched the LKML archives and couldn't find anything :-( Could you please send me a direct link?
I'll find it. It was from a gentleman from AMD that discussed it in a thread right after Intel announced EM64T. Linus & co. were talking about how EM64T is still using a 32-bit platform underneath. The AMD gentleman commented that a few vendors had a BIOS option "Linux" for the memory access, and the Linux kernel could support linear addressing above 4GB.
On Fri, 2005-06-24 at 07:42 -0500, Bryan J. Smith wrote:
I'll find it. It was from a gentleman from AMD that discussed it in a thread right after Intel announced EM64T. Linus & co. were talking about how EM64T is still using a 32-bit platform underneath. The AMD gentleman commented that a few vendors had a BIOS option "Linux" for the memory access, and the Linux kernel could support linear addressing above 4GB.
Ack, it doesn't appear to be directly in the 2004Feb17 thread: http://kerneltrap.org/node/2466
There are a few comments on how the Athlon differs from Intel because of the Alpha EV6, but not the more technical detail I was referring to.
Again, to get this "mode" you have to:
  a) Tell the BIOS to enable a "Linux" memory mode
  b) Have a Linux kernel (Linux/x86-64 I believe?) that supports it
Yes, that's a modification of the Linux/x86-64 kernel running on a "32-bit" Athlon, because it offers the extended, linear memory addressing and other memory management.
BTW, the official name for Intel EM64T products _is_ IA-32e for a number of reasons. ;-ppp
On 6/24/05, Bryan J. Smith b.j.smith@ieee.org wrote:
On Fri, 2005-06-24 at 07:42 -0500, Bryan J. Smith wrote:
I'll find it. It was from a gentleman from AMD that discussed it in a thread right after Intel announced EM64T. Linus & co. were talking about how EM64T is still using a 32-bit platform underneath. The AMD gentleman commented that a few vendors had a BIOS option "Linux" for the memory access, and the Linux kernel could support linear addressing above 4GB.
Ack, it doesn't appear to be directly in the 2004Feb17 thread: http://kerneltrap.org/node/2466
There are a few comments on how the Athlon differs from Intel because of the Alpha EV6, but not the more technical detail I was referring to.
Again, to get this "mode" you have to:
  a) Tell the BIOS to enable a "Linux" memory mode
  b) Have a Linux kernel (Linux/x86-64 I believe?) that supports it
Any idea what the BIOS actually does to enable "linux" memory mode? That is what registers are poked in the MCH (I assume it would be the MCH)? Do you think this could be done post bios by say a boot loader?
Just curious...james
Bryan J. Smith wrote:
On Fri, 2005-06-24 at 07:42 -0500, Bryan J. Smith wrote:
I'll find it. It was from a gentleman from AMD that discussed it in a thread right after Intel announced EM64T. Linus & co. were talking about how EM64T is still using a 32-bit platform underneath. The AMD gentleman commented that a few vendors had a BIOS option "Linux" for the memory access, and the Linux kernel could support linear addressing above 4GB.
Ack, it doesn't appear to be directly in the 2004Feb17 thread: http://kerneltrap.org/node/2466
There are a few comments on how the Athlon differs from Intel because of the Alpha EV6, but not the more technical detail I was referring to.
http://marc.theaimsgroup.com/?l=linux-kernel&m=107759901509280&w=2
The Linux option in bios is mentioned here.
Again, to get this "mode" you have to:
  a) Tell the BIOS to enable a "Linux" memory mode
  b) Have a Linux kernel (Linux/x86-64 I believe?) that supports it
Yes, that's a modification of the Linux/x86-64 kernel running on a "32-bit" Athlon, because it offers the extended, linear memory addressing and other memory management.
http://marc.theaimsgroup.com/?l=linux-kernel&m=107757492125437&w=2
I think what Bryan here is talking about is called IOMMU by the kernel guys.
On Friday 24 June 2005 09:59, Feizhou wrote:
There are a few comments on how the Athlon differs from Intel because of the Alpha EV6, but not the more technical detail I was referring to.
http://marc.theaimsgroup.com/?l=linux-kernel&m=107759901509280&w=2
The Linux option in bios is mentioned here.
Again, to get this "mode" you have to:
  a) Tell the BIOS to enable a "Linux" memory mode
  b) Have a Linux kernel (Linux/x86-64 I believe?) that supports it
Yes, that's a modification of the Linux/x86-64 kernel running on a "32-bit" Athlon, because it offers the extended, linear memory addressing and other memory management.
http://marc.theaimsgroup.com/?l=linux-kernel&m=107757492125437&w=2
I think what Bryan here is talking about is called IOMMU by the kernel guys.
Thanks - but both links again talk about a 32bit/4GB schema and don't talk at all about addressing >4GB without the need for paging - that was the statement Bryan made when posting.
Peter.
On Friday 24 June 2005 08:42, Bryan J. Smith wrote:
On Fri, 2005-06-24 at 02:51 -0400, Peter Arremann wrote:
Hmmm - The GTL bus uses 36bit for address... So if you get a license from Intel and build your own device, it can address 36bits directly without any games. PAE36 is a mmu concept that allows a 32 bit OS to have 16 4GB pages (hence the name Page Address Extensions 36 bits...)
Yes, the key word there is that it "pages." BTW, it's not 16 x 4GB pages, but 120 x 512MB pages into the lower 8 x 512MB memory.
I knew that except at 2:51 am *yawns* Sorry, my fault.
So if you have a 32 bit athlon, you get 4GB ram... To go above that, you need more bits.
Sigh ... the 32-bit Athlon uses _40-bit_ EV6. I only used the term "32-bit Athlon" to market differentiate from the "64-bit Athlon/Opteron." In reality, _both_ products use the same core with 40-bit EV6 addressing. The latter just also offers a 48-bit/PAE52 programming/register "Long Mode", whereas the former only offers a 32-bit/PAE36 programming/register mode.
And that's exactly the part I don't get - if you have a 32bit address model then you have to use PAE of some sort (compatible to Intel PAE36 or not) to get to address more than that memory...
Those bits need to be stored in a separate register since all your apps and os only have 32bit
36-bit -> 16-bit segment + 32-bit offset = 36-bit (4-bits overhang).
On Intel GTL, it pages above 4GB, as you mentioned.
On AMD EV6, it also does it linearly in hardware for GTL compatibility. _Unless_ you have an Athlon MP mainboard with the BIOS and a Linux kernel that offers _true_ access. That _avoids_ paging in hardware above 4GB, significantly improving performance.
again - how do I generate a >32bit address when using a 32bit address model without pages? :-) Once I have that address, throwing it out on the bus is easy - but how do you generate that address without a PAE?
The Athlon64/Opteron just now have a formal mode called "Long Mode" where the 16-bit segment is the "top bits 33-48" with the 32-bit offset = 48-bit/256TiB. _Physically_ Athlon64/Opteron are still limited to EV6's 40-bit/1TiB addressing of the platform, same as 32-bit Athlon.
People often get confused on _physical_ platform (board engineering-level) addressing versus _logical_ "programmer" addressing.
- and then you need to have a mmu that can combine those two into the physical address... And the combining of an offset and a segment is what PAE36 is all about... See where my confusion comes in? :-)
Yes, reference above.
So you're simply talking about the ability to output an address longer than 32bit on the bus?
Sorry to make you even more work but I searched the LKML archives and couldn't find anything :-( Could you please send me a direct link?
I'll find it. It was from a gentleman from AMD that discussed it in a thread right after Intel announced EM64T. Linus & co. were talking about how EM64T is still using a 32-bit platform underneath. The AMD gentleman commented that a few vendors had a BIOS option "Linux" for the memory access, and the Linux kernel could support linear addressing above 4GB.
on AMD64, yes, that's for sure... but you were referring to 32bit athlons in the statement I'm trying to understand.
Peter.