Where does the performance hit for 4G/4G on Intel (whether IA-32e or not) come from?
The performance hit is there for _all_ IA-32 compatible architectures running Linux/x86, not just for Intel parts.
There's a hit for the 4G+4G HIGHMEM model, and another, bigger one if you go with the 64G model (more than 4GiB user).
On _both_ Intel IA-32 under Linux/x86 _and_ Intel IA-32e (EM64T) under Linux/x86-64, you _always_ get "bounce buffers" (c/o the Soft I/O MMU, i.e. the Soft IOTLB in Linux/x86-64 on EM64T) whenever you transfer between two memory areas -- e.g., user memory and memory mapped I/O -- and _one_ of them is above 4GiB. There is no way around that, and it is a major problem with Intel right now.
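To make the "one area above 4GiB" condition concrete, here is a rough sketch of the check that decides whether a transfer has to go through a bounce buffer. This is a simplified model, not kernel code: the names needs_bounce and DMA_32BIT_MASK are made up for illustration, and it assumes the device on the other end can only address 32 bits.

```python
# Simplified model of the bounce-buffer decision (not actual kernel code).
# Assumption: the device can only reach physical addresses below 4 GiB.
DMA_32BIT_MASK = 0xFFFF_FFFF  # highest address a 32-bit-only device can reach


def needs_bounce(phys_addr: int, length: int, dma_mask: int = DMA_32BIT_MASK) -> bool:
    """Return True if any byte of the buffer lies above what the device can address."""
    last_byte = phys_addr + length - 1
    return last_byte > dma_mask


GiB = 1 << 30

# Buffer entirely below 4 GiB: the device can reach it directly.
print(needs_bounce(1 * GiB, 4096))

# Buffer above 4 GiB: data must be copied through a low buffer first.
print(needs_bounce(5 * GiB, 4096))

# Buffer straddling the 4 GiB line also needs bouncing.
print(needs_bounce(4 * GiB - 2048, 4096))
```

The cost comes from the extra copy: every transfer that fails this check is staged through memory below 4GiB before or after the device touches it.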
Right, so if I have 2G of RAM, I want a 2G/2G (kernel/user) split instead of 1G/3G, so that I don't have to turn on HIGHMEM and can avoid its penalty.
Ugh... RHEL4 kernels do not provide a 2G/2G split, only a 4G/4G option. The documentation says to expect a performance hit between 0% and 30%, and to treat it as 20%.
Sorry about the 'no need for HIGHMEM' part... HIGHMEM is still needed to see more than 1G; my mistake.
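For anyone following the arithmetic, here is a back-of-the-envelope sketch of how much RAM ends up directly mapped (lowmem) versus in HIGHMEM for a given kernel/user split. It assumes the kernel keeps roughly 128 MiB of its virtual address space back for vmalloc and other mappings, which I believe is the usual i386 default; the function name lowmem_highmem is made up for illustration, and real kernels will differ by a few MiB.

```python
MiB = 1 << 20
GiB = 1 << 30

# Assumption: about 128 MiB of kernel virtual space is reserved for
# vmalloc/ioremap and similar mappings, so it cannot map RAM directly.
VMALLOC_RESERVE = 128 * MiB


def lowmem_highmem(ram_bytes, kernel_virtual_bytes, reserve=VMALLOC_RESERVE):
    """Return (lowmem, highmem) in bytes for a given kernel virtual size."""
    max_lowmem = kernel_virtual_bytes - reserve
    lowmem = min(ram_bytes, max_lowmem)
    return lowmem, ram_bytes - lowmem


# Splits are written kernel/user, as in the posts above.
for label, kernel_space in [("1G/3G", 1 * GiB), ("2G/2G", 2 * GiB)]:
    low, high = lowmem_highmem(2 * GiB, kernel_space)
    print(f"{label}: lowmem {low // MiB} MiB, highmem {high // MiB} MiB")
```

Under these assumptions a 2G box on the 1G/3G split has only about 896 MiB of lowmem, so most of its RAM sits in HIGHMEM. Even a 2G/2G split would leave a small slice of a full 2G above the direct mapping.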