On Sunday, January 26, 2020 11:18:36 PM CET Pete Biggs wrote:
First of all - disclaimer - I'm no network specialist, I just read and am interested in it. I may get things wrong!!
Both physical interfaces show the same. But does this mean it's on as in "rx-checksumming: on" or off as in "tx-checksum-ipv4: off [fixed]"?
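For reference, the two states quoted above can be told apart mechanically. This is a minimal sketch that assumes the feature lines look like the two quoted here, i.e. `name: on|off` with an optional `[fixed]` marker for settings the driver does not allow to be changed. The function name `parse_offload_features`, the interface name `eth0`, and the sample text are made up for illustration.

```python
def parse_offload_features(text):
    """Parse `ethtool -k`-style lines into {name: (enabled, fixed)}."""
    features = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        parts = value.split()
        if not parts or parts[0] not in ("on", "off"):
            continue  # skips e.g. the "Features for eth0:" header line
        features[name.strip()] = (parts[0] == "on", "[fixed]" in parts)
    return features


# Hypothetical sample, modelled on the two lines quoted above.
sample = """Features for eth0:
rx-checksumming: on
tx-checksum-ipv4: off [fixed]
"""

features = parse_offload_features(sample)
for name, (enabled, fixed) in features.items():
    state = "on" if enabled else "off"
    print(f"{name}: {state}{' (cannot be changed)' if fixed else ''}")
```

Read that way, "rx-checksumming: on" is simply enabled, while "off [fixed]" is disabled and cannot be switched on.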
As far as I understand it rx-checksum is the underlying wire checksumming - and from what I've read about it, disabling that disables the UDP checksums.
You mean layer 1 checksumming? Is there such a thing with Ethernet? When I was trying to understand what "bandwidth" actually means, I think I read that encoding is involved in signal transmission. I seem to remember that no checksumming was involved, and that the encoding is about being able to identify signals at all, which is a prerequisite for transmitting anything.
Assuming that I do not receive packets with invalid UDP checksums, the packets must somehow be altered and their UDP checksums recalculated for them to arrive here. Does bad hardware etc. do that? Why would the UDP checksums happen to get recalculated correctly, seemingly at random and without intent?
I'm not sure I understand what you are asking.
It is about VoIP calls via SRTP being interrupted at irregular intervals. The intervals appear to depend on the time of day: calls last about 5 to 25 minutes during the day, and up to 1.5 hours at around 3am, before being interrupted.
Asterisk says that a packet is being replayed, meaning that libsrtp has already seen and processed the packet earlier. That can happen a couple of times until Asterisk reports authentication failures. The result is that the call is interrupted in that I cannot hear the opposite end, while the other end can sometimes still hear me and sometimes not. The interruption can last for minutes and the audio can come back after that, though usually I either hang up or the call ends by itself before the audio is back.
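To make "replayed" concrete: as far as I understand RFC 3711, the receiver keeps a sliding window of packet indices it has already accepted and rejects anything it has seen before or that is too old. The following is only a sketch of that idea, not libsrtp's actual code. The class name `ReplayWindow` is made up, and the window size of 64 is an assumption taken from the minimum the RFC asks for.

```python
class ReplayWindow:
    """Sliding-window replay check over packet indices (illustrative only)."""

    def __init__(self, size=64):
        self.size = size
        self.highest = None  # highest index accepted so far
        self.bitmap = 0      # bit i set => index (highest - i) was accepted

    def check_and_update(self, index):
        """Return True if the index is new, False if replayed or too old."""
        if self.highest is None:
            self.highest, self.bitmap = index, 1
            return True
        if index > self.highest:
            shift = index - self.highest
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.highest = index
            return True
        offset = self.highest - index
        if offset >= self.size:
            return False  # fell out of the window: too old
        if self.bitmap & (1 << offset):
            return False  # already seen: replay
        self.bitmap |= 1 << offset
        return True


window = ReplayWindow()
results = [window.check_and_update(i) for i in (100, 101, 100, 99, 37)]
print(results)
```

In a real receiver the window would only be updated after the packet has been authenticated. A second stream with overlapping sequence numbers arriving on the same port would plausibly trip a check like this one first and fail authentication later.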
IIUC, authentication failures mean that libsrtp figures that the authentication tag of an SRTP packet does not match the rest of the data in the packet. The authentication tag is computed on the sender side as a keyed hash (HMAC-SHA1 by default in RFC 3711), using session keys that are derived as needed from master keys exchanged between sender and receiver beforehand. The key exchange can go over SIP (using TLS) when SDES is used, which it is in this case.
The receiver recomputes the authentication tag with its own copy of the keys and verifies that it matches the tag carried in the packet. Only when the packet has been successfully authenticated this way is the RTP payload decrypted.
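A minimal sketch of that verification step, assuming the RFC 3711 default transform (HMAC-SHA1 truncated to an 80-bit tag, computed over the packet plus the 32-bit rollover counter) and assuming no MKI field is present. The function names, the all-zero keys, and the dummy packet bytes are made up for illustration. This is not how libsrtp is structured internally.

```python
import hashlib
import hmac
import struct

TAG_LEN = 10  # 80-bit tag; an assumption (the RFC 3711 default)


def srtp_auth_tag(auth_key, authenticated_portion, roc):
    """HMAC-SHA1 over header + encrypted payload + rollover counter."""
    message = authenticated_portion + struct.pack("!I", roc)
    return hmac.new(auth_key, message, hashlib.sha1).digest()[:TAG_LEN]


def verify_srtp_packet(auth_key, packet, roc):
    """Recompute the tag and compare it with the one at the packet's end."""
    body, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    return hmac.compare_digest(tag, srtp_auth_tag(auth_key, body, roc))


# Dummy data: a 12-byte RTP-like header followed by some "encrypted" bytes.
key = bytes(20)
body = bytes.fromhex("800000010000000011111111") + b"encrypted-audio"
packet = body + srtp_auth_tag(key, body, roc=0)

ok = verify_srtp_packet(key, packet, roc=0)
damaged = verify_srtp_packet(key, b"\x81" + packet[1:], roc=0)
other_stream = verify_srtp_packet(b"\x01" * 20, packet, roc=0)
print(ok, damaged, other_stream)
```

Note that a packet which is perfectly intact but belongs to a different stream (different keys) fails in exactly the same way as a damaged one.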
The SRTP packet seems to be the entire payload of the UDP packet, so if the data of the SRTP packet got damaged or were intentionally altered, the UDP checksum would have to be recalculated on purpose.
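To illustrate why a damaged payload should not get past the UDP checksum: for IPv4 the checksum is a 16-bit one's-complement sum over a pseudo-header, the UDP header, and the payload. Below is a minimal sketch; the addresses, ports, and payloads are made-up documentation values.

```python
import socket
import struct


def ones_complement_sum(data):
    """16-bit one's-complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload):
    """UDP checksum over IPv4 pseudo-header, UDP header, and payload."""
    length = 8 + len(payload)
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 17, length))  # 17 = UDP
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    checksum = ~ones_complement_sum(pseudo + header + payload) & 0xFFFF
    return checksum or 0xFFFF  # 0 is reserved for "no checksum"


good = udp_checksum("192.0.2.1", "198.51.100.2", 5004, 5004, b"srtp-bytes")
bad = udp_checksum("192.0.2.1", "198.51.100.2", 5004, 5004, b"srtp-bytez")
print(hex(good), hex(bad), good != bad)
```

Any single changed bit in the payload changes the sum, so a corrupted packet would normally be dropped before Asterisk ever sees it unless something on the path recomputed the checksum afterwards.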
Two independent installations of asterisk at physically different locations are showing the same error messages, both connecting to the same VOIP provider.
As you can imagine, this is really fun to debug ...
It's unlikely (very unlikely) that the checksums are randomly correct. But packet checksums are recalculated when packets are forwarded by layer 4 switches: the contents of the packet are inspected as part of the switching process.
Yes, I thought so, IIRC it's required for routing and changing the TTL maybe.
Now, that someone would intentionally alter the SRTP packets and recalculate the checksums seems rather unlikely, all the more so since they would need to do that at two different places.
Only when Asterisk (i.e. libsrtp) finally verifies the authentication tag of an SRTP packet against the authenticated part of the packet (which, according to RFC 3711, seems to be the entire payload of the UDP packet) does the verification fail.
How is that possible?
If it's an SRTP checksum error, then that checksum is part of the packet payload at the application level, whereas the UDP checksum is for the whole packet. Presumably the contents of the application payload were altered after the SRTP checksum was calculated but before the UDP packet checksum was. It could be a bad layer 4 switch, I suppose.
Right, or the SRTP packet has been created incorrectly by their phone system, either because it is overloaded at busy times or because it's buggy.
My favorite theory is that I am sometimes suddenly receiving the wrong SRTP stream. I think it would fit the symptoms. Perhaps the VOIP provider is experiencing interesting NAT issues when their connection tracking is getting messed up at times when there are more connections than they can handle.
That defective hardware is causing the same problem at both places at the same time seems rather unlikely.
So I've been trying to figure out what the problem might be. After learning all this, I'm sufficiently sure that the problem is on their side.
Probably your best bet is to use wireshark to decode the packets to see what the raw data looks like.
Hm, I tried that, and Wireshark doesn't seem to like SRTP packets very much. Apparently it doesn't have a way to decrypt SRTP packets at all, even if I could get the initial keys. Maybe someone who is much more proficient with Wireshark could find something. To me, it has been useless so far.
If Wireshark could do stuff with SRTP packets, what could it possibly show other than that some packets either carry a damaged payload or that the encryption keys don't fit, which is something I already know? If the problem were with Asterisk or libsrtp, it would be much more common.
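One thing a capture might still show without any keys: SRTP only encrypts the RTP payload, so the 12-byte RTP header (sequence number, timestamp, SSRC) stays readable. If the "wrong stream" theory holds, the SSRC or the sequence numbers should jump around the time the errors start. This is a rough sketch under that assumption; the function names and the fabricated packets are for illustration only, and getting the raw UDP payloads out of a capture file is left out.

```python
import struct
from collections import namedtuple

RtpHeader = namedtuple(
    "RtpHeader", "version payload_type sequence timestamp ssrc")


def parse_rtp_header(packet):
    """Decode the fixed 12-byte RTP header, which SRTP leaves in the clear."""
    if len(packet) < 12:
        raise ValueError("too short for an RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(b0 >> 6, b1 & 0x7F, seq, ts, ssrc)


def find_stream_changes(packets):
    """Return (position, old_ssrc, new_ssrc) wherever the SSRC changes."""
    changes, previous = [], None
    for position, packet in enumerate(packets):
        header = parse_rtp_header(packet)
        if previous is not None and header.ssrc != previous.ssrc:
            changes.append((position, previous.ssrc, header.ssrc))
        previous = header
    return changes


def fake_packet(seq, ssrc):
    """Fabricated packet: RTP version 2, payload type 8, dummy payload."""
    return struct.pack("!BBHII", 0x80, 8, seq, seq * 160, ssrc) + b"x" * 20


packets = [fake_packet(1, 0x11111111), fake_packet(2, 0x11111111),
           fake_packet(900, 0x22222222)]
changes = find_stream_changes(packets)
print(changes)
```

If the SSRC stays the same across the point where authentication starts failing, that would count against the wrong-stream theory and point more towards damaged or badly built packets.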