Summary: ASTERISK-29128: res_srtp: Authentication failure after hold/unhold
Reporter: laszlovl (lvl)
Labels: patch
Date Opened: 2020-10-16 06:47:57
Date Closed:
Priority: Major
Regression? Yes
Status: Open/New
Components: Resources/res_srtp
Versions: 16.13.0
Frequency of Occurrence: Constant
Related Issues: is caused by ASTERISK-28903 res_srtp: Answered Crypto Suite might be wrong in SDP/SDES.
Environment:
Attachments:
( 0) filtered.log
( 1) snom_changed_srtp_suite_13_256.patch
( 2) snom_changed_srtp_suite_13.patch
( 3) snom_changed_srtp_suite_16.patch
( 4) snom-srtp-debug-filtered.log

Description: As simple as the title indicates. Put an SRTP call on hold, unhold it, and Asterisk starts logging "SRTP unprotect failed on SSRC 1509410849 because of authentication failure" afterwards. No more audio is transmitted.

Traced the problem to commit https://github.com/asterisk/asterisk/commit/c00b032bbfc14f40537989477229f189a1b529d7 (ASTERISK-28903), without it everything works fine.

Asterisk 16.13, libsrtp 1.5.4.
Comments:

By: Asterisk Team (asteriskteam) 2020-10-16 06:47:58.641-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

Please note that by submitting data, code, or documentation to Sangoma through JIRA, you accept the Terms of Use present at [https://www.asterisk.org/terms-of-use/|https://www.asterisk.org/terms-of-use/].

By: Alexander Traud (traud) 2020-12-09 12:05:35.838-0600

Although the issue looks simple … I re-visited my change for ASTERISK-28903 and the cause is not obvious to me. Therefore, I need the state and transitions. Consequently, I tried to reproduce your scenario -- and failed. Looking at your SIP logger output, you use a Snom desk phone. That phone is connected via PJSIP. That phone dialed another party, connected via PJSIP. Then, you use the software (?) button on the Snom to hold that call. Immediately (?) you press that button again to un-hold the call.

* I am using Asterisk 16.13, its bundled PJ Project, and libSRTP 1.5.4 just like you.
* I am using not a Snom D715 but Snom D725. That should not matter.
* I am not using firmware 8.9.3.*8* but 8.9.3.*60*, because I do not have any older version.

Therefore, please, re-run your setup and go for {{core set debug 1}}. Furthermore, if possible, run a packet trace like Wireshark in the background. What is the SSRC and SEQ of the RTP stream before and after the hold? After that, can you do me a favor and update one of your devices to firmware [8.9.3.60|http://wiki.snom.com/Firmware/V8_9_3_60]? You do not need anything newer; that makes sure we have exactly the same setup. Then, please, re-run your setup and record the RTP-SSRCs and RTP-SEQs again.

Simply out of curiosity:
Although you are using the latest Asterisk version, why are you using a libSRTP version and Snom firmware from the year 2016?

By: laszlovl (lvl) 2020-12-15 13:10:27.414-0600

Thanks for looking into this.

The phone was actually running firmware 8.9.3.80 (I believe it's not running a newer version because all subsequent firmwares were found to contain showstopper bugs); my script to scrub sensitive information from a debug log is recognizing the firmware version as an IP address. As for libSRTP, if I understand correctly, Asterisk developers recommend to stick with 1.5.4 (https://wiki.asterisk.org/wiki/display/AST/libsrtp) instead of 2.x.

I first downgraded the phone to 8.9.3.60 and was able to reproduce the problem in exactly the same fashion. I then upgraded it back to 8.9.3.80 for the subsequent reproduction, you'll be able to find that one on Snom's servers as well.

As it concerns SRTP, a wireshark trace won't tell you much. But I created another debug log with "rtp set debug on" (as well as core debug level 1), which will show info about the sequence numbering.

By: Alexander Traud (traud) 2020-12-16 03:08:16.557-0600

bq. wireshark trace won't tell you much

Asterisk logs are terribly difficult to read because I cannot filter. Who started the call, who is involved:
* You used Blink on a Linux machine and started the call.
* You use the channel driver PJSIP and the usual ‘Dial’ command in your dialplan (extensions.conf).
* You call a Snom D715 with firmware 8.9.3.80.
* On your Snom, you put the call on hold.
* On your Snom, you unhold the call again.
* Now, that RTP stream coming from the Snom is marked as unauthenticated; and just that stream (not the one coming from Blink).

*First question*: Is that correct?
*Second question*: What happens when you start the call the other way around: the Snom calls Blink, and the Snom (not Blink) still puts the call on hold?

In your last log, I found something fascinating:
In the beginning, your Snom is accepting SHA1_80. When the Snom puts the call on hold, it reuses the existing crypto-key but changes the crypto-suite to SHA1_32. I have to find a way to reproduce that (today is not test day here).
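The observation above (same crypto-key, different crypto-suite) can be checked mechanically on the SDP. The following is only a minimal sketch in Python, not code from Asterisk: the helper names {{parse_crypto}} and {{suite_changed_with_same_key}} are hypothetical, the key is made up, and the {{a=crypto}} lines are illustrative rather than copied from the attached logs.

```python
import base64
import re
from typing import NamedTuple


class CryptoAttr(NamedTuple):
    tag: int     # the SDES tag number after "a=crypto:"
    suite: str   # e.g. AES_CM_128_HMAC_SHA1_80
    key: str     # base64 master key and salt as found after "inline:"


# Matches "a=crypto:<tag> <suite> inline:<key>"; optional lifetime/MKI
# parameters after a "|" are ignored by this sketch.
_CRYPTO_RE = re.compile(r"^a=crypto:(\d+)\s+(\S+)\s+inline:([^|\s]+)")


def parse_crypto(line: str) -> CryptoAttr:
    """Parse a single SDES a=crypto attribute line (hypothetical helper)."""
    match = _CRYPTO_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not an a=crypto line: {line!r}")
    return CryptoAttr(int(match.group(1)), match.group(2), match.group(3))


def suite_changed_with_same_key(old: CryptoAttr, new: CryptoAttr) -> bool:
    """True when the key is reused but the crypto-suite differs."""
    return old.key == new.key and old.suite != new.suite


# Made-up 30-byte master key and salt, NOT taken from the logs.
EXAMPLE_KEY = base64.b64encode(b"ABCDEFGHIJKLMNOPQRSTUVWXYZ0123").decode()

initial = parse_crypto(f"a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:{EXAMPLE_KEY}")
on_hold = parse_crypto(f"a=crypto:1 AES_CM_128_HMAC_SHA1_32 inline:{EXAMPLE_KEY}")
hold_changes_suite = suite_changed_with_same_key(initial, on_hold)
```

Running such a check over the SDP of the initial answer and of the hold re-INVITE would show whether a given phone behaves like the Snom here.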

*Third question*: On your Snom → Web interface → Identity x → RTP → SRTP Auth-Tag: [AES-80|https://service.snom.com/display/wiki/user_auth_tag]: Does that work around the issue? If yes, let us undo the authentication tag on your Snom to its default AES-32. Then,
*Fourth question*: On your Asterisk, in the configuration file {{pjsip.conf}}, in the endpoint(s) of your Snom(s), set {{srtp_tag_32=yes}}: Does that work around the issue as well?

I know, these are just (possible) workarounds and not a solution. Nevertheless, those four answers would help me to reproduce your scenario.

bq. 8.9.3.80 … scrub sensitive information

I see. Thanks for double-checking firmware .60. But no worries, I will try to catch up with your scenario.
Not related to this issue, but just for your information: Newer firmware versions introduce security updates as well. Actually, the firmware released on the 1st of December 2020 included a security fix for SIP over TLS. Since day one, Snom was using a wrong, possibly manipulated domain to check the server name in the TLS certificate. If you face showstopper bugs, did you discuss those with the Snom support already? Anyway, for the sake of simplicity and because I am able to reproduce that scenario, let us stay with Snom 8.9.3.80 for now.

bq. recommend to stick with 1.5.4

I know that Wiki entry. However, it is bad advice because it does not explain what was wrong with older versions exactly; and it does not explain why libSRTP 2.x is not tested. Furthermore, it is uncertain whether libSRTP 1.5.x is still maintained and ‘secure’. Actually, it would be the job of the one giving such advice to double-check that (again and again because that can change on a daily basis). libSRTP 2.3 works here without problems. And the community should use the latest to make sure the code in Asterisk stays compatible. Anyway, for the sake of simplicity and because I am able to reproduce that scenario, let us stay with libSRTP 1.5.4 for now.

By: Alexander Traud (traud) 2020-12-16 07:38:28.392-0600

No need to answer my questions because I was able to reproduce the issue with my production system (Asterisk 13.38.0/chan_sip, libSRTP 2.3.0, Snom 10.1.64.14). Thank you for reporting this issue! The trick is, the Snom must be called. Snom accepts either authentication-tag length but puts the call on hold with the configured tag length. If those tag lengths differ, Asterisk gets ‘confused’.
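Why differing tag lengths end in an authentication failure can be sketched in a few lines. This is a simplified model, not libSRTP or Asterisk code: it assumes the SRTP authentication of RFC 3711 (HMAC-SHA1 over the packet followed by the 32-bit ROC, truncated to 80 or 32 bits), it leaves out encryption and key derivation, and the function names and the key are made up.

```python
import hashlib
import hmac


def auth_tag(auth_key: bytes, packet: bytes, roc: int, tag_len: int) -> bytes:
    """HMAC-SHA1 over packet || ROC, truncated to tag_len bytes (10 or 4)."""
    message = packet + roc.to_bytes(4, "big")
    return hmac.new(auth_key, message, hashlib.sha1).digest()[:tag_len]


def protect(auth_key: bytes, packet: bytes, roc: int, tag_len: int) -> bytes:
    """Append the authentication tag (the payload stays unencrypted here)."""
    return packet + auth_tag(auth_key, packet, roc, tag_len)


def unprotect(auth_key: bytes, wire: bytes, roc: int, tag_len: int) -> bytes:
    """Split off tag_len bytes as the tag and verify them."""
    if len(wire) <= tag_len:
        raise ValueError("packet too short")
    packet, tag = wire[:-tag_len], wire[-tag_len:]
    if not hmac.compare_digest(tag, auth_tag(auth_key, packet, roc, tag_len)):
        raise ValueError("authentication failure")
    return packet


# Made-up key and packet: 12 header bytes plus a dummy payload.
key = bytes(range(20))
rtp_packet = b"\x80" + bytes(11) + b"dummy audio payload"

# The phone resumes with a 32-bit tag (4 bytes) ...
wire = protect(key, rtp_packet, roc=0, tag_len=4)

# ... a receiver that also expects 4 bytes accepts the packet,
accepted = unprotect(key, wire, roc=0, tag_len=4)

# while a receiver that still expects an 80-bit tag (10 bytes) cuts the
# packet at the wrong offset and rejects it.
try:
    unprotect(key, wire, roc=0, tag_len=10)
    rejected = False
except ValueError:
    rejected = True
```

The key is identical on both sides in this sketch; only the expected tag length differs, which matches what the receiver in this issue ends up with after the resume.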

At least, we have two possible workarounds. Now, I am investigating if such a mid-call tag-length-change-with-the-same-crypto-key is ‘allowed’ by RFCs. Even if not, the next step is to determine whether this can be ‘accepted’ by Asterisk. By the way, the latter is not as easy as it sounds either because there might be scenarios (sRTP-ROC larger than zero) in which this might never have worked. In any case, I am going to file a feature request with Snom not to change the tag-length mid-call.

By: laszlovl (lvl) 2020-12-16 10:38:30.542-0600

Good to hear, thanks for digging! If there's anything else I can do, ping me.

By: Alexander Traud (traud) 2021-01-11 08:14:34.441-0600

Well, you can do one thing: Could you not rely on me? This issue here is a perfect example of what is wrong within the VoIP/SIP/SDP/RTP ‘industry’. For example, I am not an employee but just a contributor to the Asterisk project. Furthermore, I am not a professional but just a hobbyist, not gaining a penny. From an objective point of view, I have no motivation here. Therefore, I cannot be expected to be the driver of this issue through the various layers. Is it a problem for the Asterisk project, is it a problem for Digium/Sangoma, is it a problem for Snom? Or is it even an issue in the specification itself, which layer (SDP or SDES-sRTP), and therefore which RFC exactly? Or is this not a software bug but an interoperability issue for SIP implementers in general, and we would need clarification via a ‘best practice’ RFC? Something is wrong for sure; the call has no audio after resume. Now, what is the correct way to fix this? Should an end-user (like me) be the driver because he is affected by this?

Back to some facts: Over the holidays, I went through [my collection|https://www.traud.de/voip] of SDES-sRTP enabled desk phones. The (bad or good) news, I did not face this issue with any other software platform. The (good or bad) news, I found a lot of other issues because of this call scenario. And I understand even less how the protocol designers thought about this scenario. Nearly every implementation handles the very same call scenario differently.

The worse news, the change for ASTERISK-28903 is not the cause. To test my statement, undo the change as you did already, configure your Asterisk to send a 32-bit tag instead of an 80-bit tag, and configure your Snom to use an 80-bit tag instead of a 32-bit tag by default. Now, when you resume, you have exactly the same outcome: Your Asterisk cannot authenticate the sRTP packets from your Snom anymore.

It was pure luck that the scenario Call Hold/Resume with SDES-sRTP worked. The change for ASTERISK-28903 just unveiled this software bug. Before ASTERISK-28903, {{srtp->flags}} contained the flag for
* AST_SRTP_CRYPTO_TAG_80 and
* AST_SRTP_CRYPTO_TAG_32

at the end of {{ast_sdp_crypto_process(.)}}. This was and is silly because both flags are mutually exclusive. When you set debug level 1, you see ‘SRTP remote key unchanged; maintaining current policy’. The key did not change. That is correct. However, the crypto suite changed.
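The two points above (mutually exclusive tag flags, and a policy that must be refreshed when the suite changes although the key did not) can be illustrated with a small model. This is a sketch only and not the attached patch: the bit values, {{set_tag_flag}} and {{needs_reactivation}} are hypothetical and do not mirror the real definitions in Asterisk.

```python
# Hypothetical bit values; the real AST_SRTP_CRYPTO_TAG_* values may differ.
TAG_32 = 1 << 0
TAG_80 = 1 << 1
TAG_MASK = TAG_32 | TAG_80


def set_tag_flag(flags: int, new_tag: int) -> int:
    """Clear both tag flags before setting the new one, so they stay exclusive."""
    return (flags & ~TAG_MASK) | new_tag


def needs_reactivation(old_key: bytes, new_key: bytes,
                       old_flags: int, new_flags: int) -> bool:
    """An unchanged key alone is not enough to keep the current policy;
    the tag length must be unchanged as well."""
    key_changed = old_key != new_key
    tag_changed = (old_flags & TAG_MASK) != (new_flags & TAG_MASK)
    return key_changed or tag_changed


# Initial answer used an 80-bit tag; the hold re-INVITE asks for 32 bits.
flags_before = set_tag_flag(0, TAG_80)
flags_after = set_tag_flag(flags_before, TAG_32)

same_key = b"made-up key material"
must_refresh = needs_reactivation(same_key, same_key, flags_before, flags_after)
```

With plain OR-ing instead of {{set_tag_flag}}, both bits would end up set, which is the state described above.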

Now, let us change the key when the suite changed. Attached, you find three patches, one for
* Asterisk 13,
* Asterisk 13 with the patch for ASTERISK-26190 applied,
* Asterisk 16 and newer

It worked here in all scenarios: Asterisk with 32 or 80 bit as default, Snom with 32 or 80 bit as default. Please, give it a try. However, keep the call going for more than 22 minutes (0xffff × 20 milliseconds), then do the first Call Hold/Resume. Ta-da! Your Asterisk cannot authenticate the sRTP packets from your Snom anymore. The cause is the roll-over counter (sRTP-ROC). In the Snom desk phone, the ROC is 1 now because the RTP-SEQ wrapped once. In Asterisk, the ROC gets reverted to 0 because (with the attached patch) Asterisk calls crypto_activate(.) on Call Hold. Again, does a specification exist which states what to do with the sRTP-ROC in such a scenario: Is 0 or 1 correct?
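The 22 minutes and the effect of a ROC mismatch can be reproduced with a short calculation. This is a sketch under assumptions: one RTP packet every 20 milliseconds, the packet index and authentication as defined in RFC 3711 (index = 65536 × ROC + SEQ, tag computed over the packet followed by the ROC), and a made-up key and packet. Because the initial RTP sequence number is usually random, the first wrap can also happen earlier than after 22 minutes.

```python
import hashlib
import hmac

SEQ_MODULUS = 1 << 16      # RTP sequence numbers are 16 bits wide
PACKET_INTERVAL_MS = 20    # assumption: one RTP packet every 20 ms


def minutes_until_seq_wrap(start_seq: int = 0) -> float:
    """Call duration until the 16-bit RTP-SEQ wraps and the ROC increments."""
    packets = SEQ_MODULUS - start_seq
    return packets * PACKET_INTERVAL_MS / 1000 / 60


def packet_index(roc: int, seq: int) -> int:
    """SRTP packet index: 2^16 * ROC + SEQ."""
    return roc * SEQ_MODULUS + seq


def auth_tag(auth_key: bytes, packet: bytes, roc: int, tag_len: int = 10) -> bytes:
    """HMAC-SHA1 over packet || ROC, truncated to tag_len bytes."""
    message = packet + roc.to_bytes(4, "big")
    return hmac.new(auth_key, message, hashlib.sha1).digest()[:tag_len]


wrap_minutes = minutes_until_seq_wrap()   # roughly 21.8 minutes

# Made-up key and packet. After the wrap the phone signs with ROC 1, while
# a receiver whose context was re-created verifies with ROC 0.
key = bytes(range(20))
packet = bytes(12) + b"dummy audio payload"
phone_tag = auth_tag(key, packet, roc=1)
asterisk_tag = auth_tag(key, packet, roc=0)
tags_match = hmac.compare_digest(phone_tag, asterisk_tag)
```

In this model {{tags_match}} is false although key, packet, and tag length are identical, which is the situation described above once the receiver has fallen back to ROC 0.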

I have not decided yet whether to submit that patch into Asterisk (if you like it, go for it and go for the review process). Although the patch is correct and does not do any other harm, it just hides the issue further. Now, you have to call at least 22 minutes, put the call on hold, and when you resume, you face the issue. We are just hiding a software issue that has been there for more than 15 years.

*Long story short*:
Who is the driver who goes through all those questions and answers them together with the VoIP/SIP/SDP/RTP community? With Snom, I filed ticket [#42056|https://helpdesk.snom.com/support/tickets/42056].
Until resolved, I recommend configuring your Snom to use the same authentication bit length by default as your Asterisk does.

By: Alexander Traud (traud) 2021-02-10 08:37:17.992-0600

Yeah, that is the SIP standard. Snom does not go further but did not tell me their conclusion. By the way, the ticket-id is [HEU-8749|https://jira.snom.com/servicedesk/customer/portal/2/HEU-8749] now. The Asterisk Team does not have anyone who escalates this to [SIP Implementors|https://lists.cs.columbia.edu/mailman/listinfo/sip-implementors], [SIP Forum|https://en.wikipedia.org/wiki/SIP_Forum], and/or [IETF WG mmusic|https://tools.ietf.org/wg/mmusic/] either.

[~lvl]:
# did you report this to Snom as well (and mention those IDs)? Do you have access to their ticket system at all?
# are you going to drive this further, for example ask the SIP Implementors mailing list?

By: laszlovl (lvl) 2021-02-10 10:21:25.789-0600

Thanks for your time, Alexander. I am no longer working for the company where I encountered this problem, so I don't have the necessary hardware/access anymore to continue this.