Hi,
HP says that what they call "NUMA split mode" should be disabled in the BIOS of the Z800 workstation when running Linux. Their reasoning is that Linux kernels do not support this feature and might not even boot if it's enabled.
Since they apparently made this statement years ago, I'm wondering whether I should still leave this feature disabled. More recent kernels might support it, and it's supposed to improve performance.
Could someone explain what this feature actually is or does, and whether CentOS kernels support it?
On 10/1/2017 8:38 AM, hw wrote:
> HP says that what they call "NUMA split mode" should be disabled in the BIOS of the Z800 workstation when running Linux. Their reasoning is that Linux kernels do not support this feature and might not even boot if it's enabled.
Hmm, that workstation is a dual Xeon 56xx (Westmere-EP, derived from Nehalem), new in 2010.
> Since they apparently made this statement years ago, I'm wondering whether I should still leave this feature disabled. More recent kernels might support it, and it's supposed to improve performance.
> Could someone explain what this feature actually is or does, and whether CentOS kernels support it?
On these sorts of dual-socket hardware architectures, half of the memory is directly attached to each CPU, and the two CPUs are linked with a QPI bus. All the memory appears in one unified address space, but the memory belonging to the 'other' CPU has somewhat higher access latency since requests have to go across the QPI. In non-NUMA mode this is ignored, and all memory is treated as equal from the OS's perspective. In NUMA mode, an attempt is made to keep a process's memory on one CPU's memory and to prefer scheduling that process on the cores of that CPU.

This can get messy: say you have a process running on core 0 (in CPU 0) which allocates a big block of shared memory, then spawns 8 worker threads which all run concurrently and use this same shared working memory. There are only 4 or 6 cores on each of the two CPUs, so either those worker threads have to wait for an available core on the same CPU as the memory allocation, or some of them end up running across the QPI bus anyway.
I believe Linux, even RHEL 6, does support NUMA configurations, but it's very questionable whether a random typical workload would actually gain much from it, and it adds significant overhead in keeping track of all this.
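To see what the kernel actually makes of the topology, the numactl package has a couple of handy commands; just a sketch, the node sizes and CPU lists will obviously differ on your box:

    # show the nodes the kernel detected, their CPUs, memory sizes and distances
    numactl --hardware

    # lscpu also reports the NUMA node count and which CPUs belong to which node
    lscpu | grep -i numa

    # per-node allocation statistics (numa_hit, numa_miss, numa_foreign, ...)
    numastat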
On 10/1/2017 9:39 AM, John R Pierce wrote:
> I believe Linux, even RHEL 6, does support NUMA configurations, but it's very questionable whether a random typical workload would actually gain much from it, and it adds significant overhead in keeping track of all this.
Here is a technical paper examining exactly this, on exactly the sort of architecture you have, using RHEL 6:
http://iopscience.iop.org/article/10.1088/1742-6596/664/9/092010/pdf
John R Pierce <pierce@hogranch.com> writes:
> On 10/1/2017 8:38 AM, hw wrote:
>> HP says that what they call "NUMA split mode" should be disabled in the BIOS of the Z800 workstation when running Linux. Their reasoning is that Linux kernels do not support this feature and might not even boot if it's enabled.
> Hmm, that workstation is a dual Xeon 56xx (Westmere-EP, derived from Nehalem), new in 2010.
>> Since they apparently made this statement years ago, I'm wondering whether I should still leave this feature disabled. More recent kernels might support it, and it's supposed to improve performance.
>> Could someone explain what this feature actually is or does, and whether CentOS kernels support it?
> On these sorts of dual-socket hardware architectures, half of the memory is directly attached to each CPU, and the two CPUs are linked with a QPI bus. All the memory appears in one unified address space, but the memory belonging to the 'other' CPU has somewhat higher access latency since requests have to go across the QPI. In non-NUMA mode this is ignored, and all memory is treated as equal from the OS's perspective. In NUMA mode, an attempt is made to keep a process's memory on one CPU's memory and to prefer scheduling that process on the cores of that CPU.
>
> This can get messy: say you have a process running on core 0 (in CPU 0) which allocates a big block of shared memory, then spawns 8 worker threads which all run concurrently and use this same shared working memory. There are only 4 or 6 cores on each of the two CPUs, so either those worker threads have to wait for an available core on the same CPU as the memory allocation, or some of them end up running across the QPI bus anyway.
> I believe Linux, even RHEL 6, does support NUMA configurations, but it's very questionable whether a random typical workload would actually gain much from it, and it adds significant overhead in keeping track of all this.
Is it possible that you are confusing enabling/disabling NUMA with NUMA split mode?
It is possible to disable/enable NUMA, and when NUMA is enabled, you can also enable the mysterious NUMA split mode.
I'm trying to download the PDF you pointed me to, but the download is stalled. I'm running CentOS 7.4, but perhaps there's an explanation in the PDF that might tell me what NUMA split mode is supposed to be.
So far, I have found out that KSM is disabled by default and would probably be a disadvantage here, so I'm using numad and probably gaining something from most, if not all, things using local memory instead of going across nodes. This will need some further investigation, though.
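For reference, this is roughly what I'm doing on CentOS 7 to check KSM and to get numad running; treat it as a sketch rather than a recipe:

    # KSM state: 0 means disabled, 1 means it is scanning and merging pages
    cat /sys/kernel/mm/ksm/run

    # install and run numad as a service so it keeps migrating processes
    # and their memory onto a single node where possible
    yum install numad
    systemctl enable numad
    systemctl start numad

    # later on, numastat -p <pid> shows on which node a process's memory ended up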
On 10/1/2017 9:10 PM, hw wrote:
> I'm trying to download the PDF you pointed me to, but the download is stalled. I'm running CentOS 7.4, but perhaps there's an explanation in the PDF that might tell me what NUMA split mode is supposed to be.
It loaded fine here again tonight. Huh.
The gist of the article is that they got at best 2-4% improvements with RHEL 6/SLES 6 on dual Nehalem/Westmere Xeons when NUMA was enabled. I see no mention of NUMA split mode.
John R Pierce <pierce@hogranch.com> writes:
> On 10/1/2017 9:10 PM, hw wrote:
>> I'm trying to download the PDF you pointed me to, but the download is stalled. I'm running CentOS 7.4, but perhaps there's an explanation in the PDF that might tell me what NUMA split mode is supposed to be.
> It loaded fine here again tonight. Huh.
Internet in this country sucks; it's almost the worst in the world.
> The gist of the article is that they got at best 2-4% improvements with RHEL 6/SLES 6 on dual Nehalem/Westmere Xeons when NUMA was enabled. I see no mention of NUMA split mode.
I've been able to get the PDF with the Tor browser today.
I'd say the gist is that using numad can improve performance, depending on the hardware used and on the workload it runs. That, as usual, leaves everyone to do their own testing with their hardware and their workload.
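To put a number on the local vs. remote memory penalty on this particular box before deciding anything, something along these lines should work (assuming sysbench from EPEL is installed; the exact option syntax depends on the sysbench version):

    # memory test with both the CPUs and the memory forced onto node 0 (all local)
    numactl --cpunodebind=0 --membind=0 sysbench --test=memory --memory-total-size=4G run

    # same test, but with the memory deliberately placed on the other node (all remote)
    numactl --cpunodebind=0 --membind=1 sysbench --test=memory --memory-total-size=4G run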
In my case, besides using numad (which won't hurt anything), it might be best to use numactl to pin the particular application I want to tune the most to one node and its memory. There is more than enough local memory for it. In theory, that should give the best overall performance, and the particular application can only benefit from using local memory. Since it's also doing disk I/O, I need to find out which of the two nodes might be preferable. Benchmarking would be really difficult.
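A rough sketch of what I have in mind; "myapp" and the PCI address of the disk controller are only placeholders, the real address has to be looked up with lspci first, and the numa_node file may report -1 if the firmware doesn't expose the information:

    # pin the application and its memory allocations to node 0
    numactl --cpunodebind=0 --membind=0 /path/to/myapp

    # find out which node the disk controller hangs off, so the I/O stays local too
    lspci | grep -i -e sata -e raid
    cat /sys/bus/pci/devices/0000:00:1f.2/numa_node

    # verify afterwards where the application's memory actually ended up
    numastat -p $(pidof myapp)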
However, that still leaves the mystery of what NUMA split mode is supposed to be.