I am experiencing an issue where my process does not wake out of a select() call when a single character is received on an input file descriptor, when running as a VMware guest.
Anyone ever experienced this?
I can run tshark and see the character arrive, but my process does not wake up and see that character. Most of the time it works - but once in a while it does not.
So I made a change in my code - instead of just waiting on select(), I try to read the buffer all the time and print the results. Once in a while that character is "delayed" getting to my input buffer. top reports the machine is 99% idle.
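For reference, the pattern being described is roughly this (a minimal sketch with a hypothetical, already-connected descriptor; not the real code):

    #include <stdio.h>
    #include <sys/socket.h>
    #include <sys/select.h>
    #include <sys/types.h>

    /* Sketch: block in select() until the (hypothetical) connected
       descriptor `fd` is readable, then recv() the single byte. */
    static int wait_for_one_byte(int fd)
    {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);

        if (select(fd + 1, &rfds, NULL, NULL, NULL) < 0) {
            perror("select");
            return -1;
        }

        char c;
        ssize_t r = recv(fd, &c, 1, 0);
        if (r == 1)
            printf("got byte 0x%02x\n", (unsigned char)c);
        else if (r == 0)
            printf("peer closed the connection\n");
        else
            perror("recv");
        return (int)r;
    }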
Any thoughts?
Jerry
On 12/3/19 8:46 AM, Jerry Geis wrote:
I am experiencing an issue where my process does not wake out of a select() call when a single character is received on an input file descriptor, when running as a VMware guest. [...]
You don't say what the app is written in, but I ran into this with Perl. Perl apps can either be line buffered or character buffered ($|, if I remember right, is the switch). Line buffered means the buffer is not delivered until a newline character is received. If nothing else, try "<char>\n" and see if that gets consistently delivered.
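If the sending end happens to be a C program, the stdio-level knob analogous to Perl's $| is setvbuf(); a rough sketch, illustrative only:

    #include <stdio.h>

    int main(void)
    {
        /* Unbuffered stdout (_IONBF) is roughly what Perl's $| = 1 does
           for the selected handle; _IOLBF keeps line buffering, where
           nothing is flushed until a newline. */
        setvbuf(stdout, NULL, _IONBF, 0);

        fputc('\r', stdout);   /* a lone CR goes out immediately */
        return 0;
    }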
Cheers, Dave
On Dec 3, 2019, at 9:18 AM, David G. Miller dave@davenjudy.org wrote:
On 12/3/19 8:46 AM, Jerry Geis wrote:
I am experiencing an issue where my process does not wake out of a select() call when a single character is received on an input file descriptor, when running as a VMware guest.
You imply but don’t say that this doesn’t happen when the app is running on bare metal. Is that the case?
Anyone ever experienced this?
No, and I’ve been writing sockets-type code since the days when it wasn’t clear whether BSD sockets would win out over AT&T TLI/XTI/STREAMS.
once in a while that character is "delayed" getting to my input buffer.
That’s probably the Nagle algorithm:
https://en.wikipedia.org/wiki/Nagle%27s_algorithm
It’s intentional. You almost never want to disable it.
Perl apps can either be line buffered or character buffered
I think that’s controlled by the kernel’s terminal driver code, not by Perl. Perl is just giving you an alternate configuration to the underlying termios() or whatever call controls this.
Anyway, you have to go out of your way to get line-buffered sockets on Linux. One way is to bind a socket to a pty, as ssh does, bringing the terminal I/O code into it again, but I doubt Jerry’s doing that.
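For what it's worth, the tty-layer control I mean is the canonical-mode flag; a sketch, only relevant if the descriptor really is a terminal or pty, which a plain TCP socket is not:

    #include <termios.h>

    /* Sketch: disable canonical (line-buffered) mode on a terminal
       descriptor so read() returns as soon as one byte arrives.
       Irrelevant for a plain TCP socket. */
    static int tty_raw_reads(int tty_fd)
    {
        struct termios t;
        if (tcgetattr(tty_fd, &t) < 0)
            return -1;
        t.c_lflag &= ~ICANON;   /* stop waiting for a newline */
        t.c_cc[VMIN]  = 1;      /* read() completes after 1 byte */
        t.c_cc[VTIME] = 0;      /* no inter-byte timeout */
        return tcsetattr(tty_fd, TCSANOW, &t);
    }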
I’d bet Jerry's app is making assumptions about the way TCP works that just aren’t true.
Jerry, please show your sockets setup code and the skeletonized read loop. I’m talking socket(), bind(), setsockopt(), etc. I want to see every sockets call. Your app logic you’re free to keep hidden away.
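To be concrete, something of this shape is what I'm after - every sockets call visible, app logic elided. All names and the port here are hypothetical:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>

    /* Hypothetical skeleton of the setup being asked for; error checks
       trimmed for brevity. */
    static int make_listener(void)
    {
        int listener = socket(AF_INET, SOCK_STREAM, 0);

        int one = 1;
        setsockopt(listener, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family      = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port        = htons(12345);     /* hypothetical port */

        bind(listener, (struct sockaddr *)&addr, sizeof(addr));
        listen(listener, 1);

        return accept(listener, NULL, NULL);     /* connected fd to recv() on */
    }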
You don't say what the app is written in, but I ran into this with Perl. Perl apps can either be line buffered or character buffered ($|, if I remember right, is the switch). Line buffered means the buffer is not delivered until a newline character is received. If nothing else, try "<char>\n" and see if that gets consistently delivered.
That is the funny thing also - the character I'm waiting on is a CR. My program is written in C.
vmtoolsd is running.
Jerry
Seems like it's the single-byte thing... I tried adding:
    int flag = 1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag)) < 0)
        perror("setsockopt(TCP_NODELAY)");
but it did not have any effect. I also did "echo 1 > /proc/sys/net/ipv4/tcp_low_latency"; that seems to have no effect either.
Jerry
https://sysctl-explorer.net/net/ipv4/tcp_low_latency/ According to this, the option has "no effect". What is the modern way to turn on low-latency behavior?
Jerry
On Dec 3, 2019, at 12:20 PM, Jerry Geis jerry.geis@gmail.com wrote:
    int flag = 1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag)) < 0)
        perror("setsockopt(TCP_NODELAY)");
So first, I said “don’t do that,” and then you went and did that. :)
But second, I’m guessing you did this on the receiving side, where it has no effect under any conditions. The Nagle algorithm is about delaying the first packet on the sender’s side in anticipation of shortly receiving more data that can go in the same packet.
Additionally, Nagle’s algorithm only kicks in when there’s already unacknowledged data in flight, not on the first packet out on a new connection.
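If anyone does want TCP_NODELAY, it belongs on the sending socket; a sketch of a hypothetical sender, not the code under discussion:

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    /* Sketch: Nagle is a sender-side behavior, so TCP_NODELAY only means
       anything on the socket doing the send(). */
    static int make_nodelay_sender(void)
    {
        int s = socket(AF_INET, SOCK_STREAM, 0);
        int one = 1;
        setsockopt(s, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
        /* ... connect(), then e.g. send(s, "\r", 1, 0) ... */
        return s;
    }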
Now that we’ve dispensed with Nagle, let’s get down to the actual issue.
Warren,
Now that we’ve dispensed with Nagle, let’s get down to the actual issue.
Correct. I was trying to find something... Agreed, that is on the sending side - I am on the receiving side. Are there other reasons why this single-byte CR over the socket is not getting seen by my application? tshark shows it's been received. I have tried to skip my select() call and just call recv() non-blocking directly - the byte is not seen. Very odd.
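Roughly, the kind of loop being described here (a simplified sketch with a hypothetical descriptor, not the real code):

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Sketch: mark the (hypothetical) connected descriptor non-blocking
       and poll recv() directly; EAGAIN/EWOULDBLOCK means "nothing yet". */
    static void poll_for_byte(int fd)
    {
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

        for (;;) {
            char c;
            ssize_t r = recv(fd, &c, 1, 0);
            if (r == 1) {
                printf("got byte 0x%02x\n", (unsigned char)c);
                break;
            }
            if (r == 0 || (r < 0 && errno != EAGAIN && errno != EWOULDBLOCK)) {
                perror("recv");          /* closed or real error */
                break;
            }
            usleep(1000);                /* brief pause between polls */
        }
    }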
Jerry
On Dec 3, 2019, at 1:11 PM, Jerry Geis jerry.geis@gmail.com wrote:
Are there other reasons why this single-byte CR over the socket is not getting seen by my application?
Sure, but without the code, you’re reducing me to blind speculation. I’m offering free debugging services here.
You also haven’t answered the question of whether the VM qualifier in the subject line actually affects the symptom.
If the problem only occurs under VMware, are we talking about ESXi or the Workstation/Fusion flavor? (Type 1 vs Type 2 hypervisor?)
And if it also occurs when you run the receiver on C7 on bare metal, then you’ve got a generic sockets problem, not a VM problem.