[CentOS] CentOS 7: UDP packet checksum verification?

Mon Jan 27 01:44:58 UTC 2020
hw <hw at gc-24.de>

On Sunday, January 26, 2020 11:18:36 PM CET Pete Biggs wrote:
> First of all - disclaimer - I'm no network specialist, I just read and
> am interested in it.  I may get things wrong!!
> 
> > Both physical interfaces show the same.  But does this mean it's on as in
> > "rx- checksumming: on" or off as in "tx-checksum-ipv4: off [fixed]"?
> 
> As far as I understand it rx-checksum is the underlying wire
> checksumming - and from what I've read about it, disabling that
> disables the UDP checksums.

You mean layer 1 checksumming?  Is there such a thing with Ethernet?  When I
was trying to understand what "bandwidth" actually means, I think I read
something about encoding being involved in signal transmission.  I seem to
remember that no checksumming was involved, and that the encoding had to do
with identifying signals in the first place, as a precondition for
transmitting anything at all.

> > Assuming that I do not receive packets with invalid UDP checksums, then
> > the packets must be somehow altered and their UDP checksums recalculated
> > to arrive here.  Does bad hardware etc. do that?  Why would the UDP
> > checksums just happen to get recalculated correctly but like randomly
> > without intent?
> 
> I'm not sure I understand what you are asking.

It is about VOIP calls via SRTP being interrupted at irregular intervals.  The
intervals appear to depend on the time of day:  Such calls last about 5 to 25
minutes during the day, and up to 1.5 hours at around 3am, before being
interrupted.

Asterisk says that a packet is being replayed, meaning that libsrtp has
already seen and processed the packet earlier.  That can happen a couple of
times until Asterisk reports authentication failures.  The result is that the
call is interrupted in that I cannot hear the opposite end, while the other
end can sometimes still hear me and sometimes not.  The interruption can last
for minutes and the audio can continue after that, though usually I either
hang up, or the call ends by itself before the audio is back.
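
To illustrate what "replayed" means here, this is a minimal sketch of a
sliding-window replay check, loosely modelled on my reading of RFC 3711
section 3.3.2.  The class name, the window size and the plain integer packet
index are assumptions for illustration; this is not libsrtp's code, and as far
as I understand it a real implementation only updates the window after the
packet has been authenticated.

```python
class ReplayWindow:
    """Sliding-window replay check (illustrative sketch, not libsrtp)."""

    def __init__(self, size=64):
        self.size = size      # how many past indices are remembered
        self.highest = -1     # highest packet index accepted so far
        self.bitmap = 0       # bit i set => index (highest - i) was seen

    def check_and_update(self, index):
        """Return True if index is new, False if replayed or too old."""
        if index > self.highest:
            # Newer than anything seen: slide the window forward.
            shift = index - self.highest
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.highest = index
            return True
        offset = self.highest - index
        if offset >= self.size:
            return False      # fell out of the window: treated as too old
        if self.bitmap & (1 << offset):
            return False      # already seen: this is a replay
        self.bitmap |= 1 << offset   # late but new: remember it
        return True
```

With such a check, a packet whose index has already been seen is reported as a
replay even if its content differs from the first one, which would also fit
two streams with overlapping sequence numbers getting mixed up.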

IIUC, authentication failures mean that libsrtp figures that the
authentication tag of an SRTP packet does not match the rest of the data in
the packet.  The sender computes the authentication tag with a key derived
from the keys that sender and receiver exchanged initially; new keys are
derived from those as needed.  The key exchange can go over SIP (using TLS)
when SDES is used, which it is in this case.

The receiver recomputes the authentication tag and verifies that it matches
the tag carried in the packet.  Only when the packet has been successfully
authenticated is the RTP payload decrypted.
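
To make the check concrete, here is a minimal sketch of the tag verification
as I understand it from RFC 3711.  It assumes the default HMAC-SHA1 transform
with an 80-bit tag and no MKI field; the function names and the way the
rollover counter (ROC) is passed in are my own choices for illustration, not
libsrtp's API.

```python
import hashlib
import hmac
import struct

TAG_LEN = 10  # assumption: 80-bit tag (HMAC-SHA1 truncated), no MKI present


def srtp_auth_tag(auth_key: bytes, authenticated_portion: bytes, roc: int) -> bytes:
    """HMAC-SHA1 over (RTP header + encrypted payload) || ROC, truncated."""
    data = authenticated_portion + struct.pack("!I", roc)
    return hmac.new(auth_key, data, hashlib.sha1).digest()[:TAG_LEN]


def verify_srtp_packet(auth_key: bytes, srtp_packet: bytes, roc: int) -> bool:
    """Recompute the tag and compare it with the one at the end of the packet."""
    body, tag = srtp_packet[:-TAG_LEN], srtp_packet[-TAG_LEN:]
    expected = srtp_auth_tag(auth_key, body, roc)
    return hmac.compare_digest(tag, expected)
```

In this sketch the check fails if a single bit of the packet changes, if the
receiver uses a different key, or if sender and receiver disagree about the
ROC.  From the error message alone I cannot tell these cases apart.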

The SRTP packet seems to be the entire payload of the UDP packet, so if the
data of the SRTP packet got damaged or were intentionally altered, the UDP
checksum would have to be recalculated for the packet to still arrive here.
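
For reference, this is a minimal sketch of how the UDP checksum over IPv4 is
computed per RFC 768: a 16-bit one's complement sum over a pseudo-header, the
UDP header and the payload.  The function names and the example addresses in
the usage note are made up for illustration.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum of data, padded to an even length."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:                      # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return total


def udp_checksum(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                 payload: bytes) -> int:
    """UDP checksum over IPv4 pseudo-header, UDP header and payload."""
    length = 8 + len(payload)               # UDP header is 8 bytes
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 17, length))   # 17 = protocol UDP
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    csum = ~ones_complement_sum(pseudo + header + payload) & 0xFFFF
    return csum or 0xFFFF                   # 0 is reserved for "no checksum"
```

Calling udp_checksum("192.0.2.1", "192.0.2.2", 10000, 20000, payload) with two
payloads that differ in one byte gives two different checksums, so damage to
the SRTP data between sender and receiver should normally show up as a bad UDP
checksum unless something on the path recalculates it.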

Two independent installations of Asterisk at physically different locations
are showing the same error messages, both connecting to the same VOIP
provider.

As you can imagine, this is really fun to debug ...

> But it's unlikely (very
> unlikely) that the checksums are randomly correct. But packet checksums
> are recalculated when packets are forwarded by layer 4 switches - the
> contents of the packet are inspected as part of the switching process.

Yes, I thought so; IIRC it's required for routing, maybe because the TTL
changes.

That someone would intentionally alter the SRTP packets and recalculate the
checksums seems rather unlikely, all the more so since they would need to do
that at two different places.

> > Only when asterisk (i. e. libsrtp) finally verifies the authentication tag
> > of an SRTP packet against the authenticated part of the packet ---
> > which, according to RFC 3711, seems to be the entire payload of the UDP
> > packet --- the verification fails.
> > 
> > How is that possible?
> 
> If it's SRTP checksum error, then that checksum is part of the packet
> payload at the application level - the UDP checksum is for the whole
> packet.  Presumably the contents of the application payload were
> altered after the SRTP checksum was calculated but before the UDP
> packet checksum.  It could be a bad layer 4 switch I suppose.

Right, or the SRTP packet has been created incorrectly by their phone system
because it is overloaded at busy times, or it's buggy.

My favorite theory is that I am sometimes suddenly receiving the wrong SRTP 
stream.  I think it would fit the symptoms.  Perhaps the VOIP provider is 
experiencing interesting NAT issues when their connection tracking is getting 
messed up at times when there are more connections than they can handle.

That defective hardware is causing the same problem at both places at the same 
time seems rather unlikely.

So I've been trying to figure out what the problem might be.  After learning 
all this, I'm sufficiently sure that the problem is on their side.

> Probably your best bet is to use wireshark to decode the packets to see
> what the raw data looks like.

Hm, I tried that and Wireshark doesn't seem to like SRTP packets very much.
Apparently it doesn't have a way to decrypt SRTP packets at all, even if I
could get the initial keys.  Maybe someone who is much more proficient with
Wireshark could find something.  To me, it has been useless so far.

If Wireshark could do stuff with SRTP packets, what could it possibly show
other than that some packets either carry a damaged payload, or that the
encryption keys don't fit, which is something I already know?  If the problem
were with Asterisk or libsrtp, it would be much more common.