Hi,
I'm not sure where to look for the cause of my issue. I hope someone can point me in the right direction.
I have been working on a bespoke server package for more than twenty years. It was originally developed on Solaris (Unix), was ported to Windows, and has been running on Linux for the last five years. This system is in live production under heavy usage every day. The servers are all written in C++ and use a version of encoded ONC RPC (without bind) to communicate server to server, and Java for the client displays.
For about six months I have been experiencing a weird issue with the sockets on my test system. My dev environment is CentOS 7.7 running on VirtualBox 6 on a Windows 10 machine. The VM has a bridged network interface to my LAN using a static IP. Our servers talk on that VirtualBox interface to other servers, possibly on other hosts, via my real network. All works well until I do a massive reliability and soak test of one of our servers. I send a series of large data messages every 15 seconds or so to one of our servers (say Y), expecting that I might see a lockup and find bugs to fix in server Y. But instead of server Y failing, what I see is that the well-known port our system uses for name lookup requests (i.e. 2323) blocks, and I then see timeouts on that socket (this is a different server, say Z). All the other servers (A..Y) get timeouts communicating with Z from then on. I don't see this effect on other OSs with similar tests.
If I stop our service with systemctl and then restart it, servers A-Y start but continue to fail with timeouts to Z. A reboot does the same. I have changed the well-known port to 23232 and it still fails. I have run the servers from systemctl as a new user and it still fails. As a mad idea, I changed the interface so the servers talk on the VirtualBox internal network, and the system returned to normal operation. Also, if I run the servers manually on the command line under my own user account, they work.
It kind of looks like a firewall/anti-virus/trojan block rule on our well-known port 2323 or on the Z server. As far as I can see, the CentOS firewall is not running. The Norton firewall on my PC does not seem to have any rules or warnings about my VirtualBox IPs or ports. Our servers don't cache any IP data.
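One way to narrow down the firewall theory is a tiny connect probe run from another host (or from the VM itself) against Z's port. The sketch below is only an illustration and makes assumptions that are not stated above: that the lookup service on Z accepts TCP connections (if it is UDP-only, this does not apply), and the function name `probe` is made up for this example. A "refused" result generally means a reset came back, so something answered; a "timeout" generally means the connection attempt got no reply at all, which is what a silent drop rule or a full listen backlog on Z would tend to produce.

```python
import socket

def probe(host, port, timeout=3.0):
    """Try one TCP connect and classify the outcome (hypothetical helper)."""
    try:
        # create_connection resolves the host and attempts a TCP connect
        with socket.create_connection((host, port), timeout=timeout):
            return "open"        # three-way handshake completed
    except ConnectionRefusedError:
        return "refused"         # a reset came back: nothing listening, or a reject rule
    except socket.timeout:
        return "timeout"         # no reply within the timeout
    except OSError as exc:
        return f"error: {exc}"   # e.g. no route to host, network unreachable
```

Calling `probe("192.168.1.50", 2323)` (the address is a placeholder, not my real one) on both the bridged and the internal interface, before and after the failure, might show whether the behaviour changes from "open" to "timeout" only on the bridged path.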
The first time this happened I was too busy to look at it and just restored the VM from a backup. It then happened a second time a month ago; I spent a day looking at the issue, found nothing, and restored from backup again, putting it down to the CentOS security update I had done earlier that day. It happened for a third time on Friday (24th). This time I had done no updates since the last restore, so I can be sure it's not a CentOS update. I checked again and could find nothing wrong, then did all the updates, and still nothing worked. I investigated the firewall and the interfaces, and they all appear to be working. I need the system to work on the external bridged network interface and I can't think of anything else to do. The socket error messages are just "Timeout"; there is nothing in dmesg or the journal that suggests anything.
I am now at a complete loss as to what is happening and why.
Regards,
David Finch