Hi,

I have a virtual machine running 64-bit CentOS 7 with 6 CPUs (Intel). I have a process with 10 threads. Two of the threads are pinned to two CPUs using CPU affinity. These threads process a lot of network messages via sockets. The other threads wait for other events using pthread condition waits. When the CPU usage of the two pinned threads reaches 75 to 80%, the scheduler spends most of its time scheduling those two threads, even though the remaining 4 CPUs are idle. Because of this, event processing in the other threads is delayed a lot. Is this mostly due to more signals from kernel space to user space for those threads? Is there a way to avoid this issue?

Regards,
Radha
On 09/02/2015 08:17 PM, Radha krishna wrote:
I have a virtual machine running 64-bit CentOS 7 with 6 CPUs (Intel). I have a process with 10 threads. Two of the threads are pinned to two CPUs using CPU affinity. These threads process a lot of network messages via sockets. The other threads wait for other events using pthread condition waits. When the CPU usage of the two pinned threads reaches 75 to 80%, the scheduler spends most of its time scheduling those two threads, even though the remaining 4 CPUs are idle. Because of this, event processing in the other threads is delayed a lot. Is this mostly due to more signals from kernel space to user space for those threads? Is there a way to avoid this issue?
Red Hat has an excellent guide on performance tuning libvirt. I suggest you start there: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/htm...
This is a very complex topic, and it would be difficult to provide specific guidance without a lot of information about the system being tuned. We would need to know the specifics of the hardware configuration (memory and CPU layout), the CPU affinity of the qemu process hosting the VM, the CPU topology of the VM, and probably a lot more about the process you're scheduling. For instance, how much CPU time are the extra threads using, and how do you know which CPUs they're running on?
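One way to gather the per-thread numbers asked about above is to read `/proc/<pid>/task/<tid>/stat` inside the guest. The following is only a rough sketch, assuming a Linux guest with the usual procfs layout described in proc(5), where field 14 is `utime`, field 15 is `stime`, and field 39 is the CPU the thread last ran on. The helper names `parse_task_stat` and `thread_report` are made up for illustration and are not part of any existing tool.

```python
import os
import sys


def parse_task_stat(line):
    """Parse one /proc/<pid>/task/<tid>/stat line.

    Returns (comm, utime_ticks, stime_ticks, last_cpu).
    The thread name (field 2) is wrapped in parentheses and may itself
    contain spaces or parentheses, so split on the LAST ')' first.
    """
    head, _, tail = line.rpartition(")")
    comm = head.split("(", 1)[1]
    fields = tail.split()  # fields[0] is field 3 (state)
    utime = int(fields[11])     # field 14: user-mode time, in clock ticks
    stime = int(fields[12])     # field 15: kernel-mode time, in clock ticks
    last_cpu = int(fields[36])  # field 39: CPU the thread last ran on
    return comm, utime, stime, last_cpu


def thread_report(pid):
    """Return a list of (tid, comm, utime, stime, last_cpu) for every thread of pid."""
    task_dir = f"/proc/{pid}/task"
    rows = []
    for tid in sorted(os.listdir(task_dir), key=int):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                comm, utime, stime, cpu = parse_task_stat(f.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # the thread exited while we were scanning
        rows.append((int(tid), comm, utime, stime, cpu))
    return rows


if __name__ == "__main__":
    # Hypothetical usage: python3 thread_report.py <pid>
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    print(f"{'TID':>8} {'CPU':>4} {'UTIME':>10} {'STIME':>10}  NAME")
    for tid, comm, utime, stime, cpu in thread_report(pid):
        print(f"{tid:>8} {cpu:>4} {utime:>10} {stime:>10}  {comm}")
```

Running it twice a few seconds apart and comparing the tick counts gives a rough idea of how much CPU time each thread is using, and the CPU column shows where each thread was last scheduled. Note that this only reflects the guest's view of its virtual CPUs; it says nothing about which host cores the qemu vCPU threads are running on.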
My uneducated guess as to your question: At 75% of CPU time, it may be ideal to run the other threads on the same cores to minimize latency to the memory they share with the network processing threads.