ASTERISK-27056: codec_opus responds with incorrect RTP Payload Type

[Home]

Summary: ASTERISK-27056: codec_opus responds with incorrect RTP Payload Type

Reporter: Luke Escude (lukeescude) Labels: patch

Date Opened: 2017-06-13 16:41:00 Date Closed: 2021-09-17 13:08:50

Priority: Major Regression?

Status: Closed/Complete Components: Core/CodecInterface Core/General

Versions: 13.16.0 Frequency of
Occurrence

Related
Issues:
is related to ASTERISK-25166 No audio when using direct media and a codec with a dynamic payload

Environment: x64 CentOS Attachments: ( 0) asterisk_opus.txt
( 1) asterisk.pcap
( 2) grandstream.pcap
( 3) issue-27056-1.patch
( 4) payload_dynamic_reINVITE.patch

Description: A phone call to an Opus extension works just fine (Asterisk sends an invite with 107 opus). Two-way audio and all that jazz.

However, if the Asterisk endpoint puts the call on hold, then retrieves the call (the endpoint sends an INVITE to retrieve the call with 123 opus) Asterisk agrees to use 123 opus, however it continues to send 107 opus instead.

Comments: By: Asterisk Team (asteriskteam) 2017-06-13 16:41:00.625-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].
By: Luke Escude (lukeescude) 2017-06-13 16:42:20.770-0500

PCAP of the same call, from the POV of Asterisk and the Grandstream endpoint.
By: Sean Bright (seanbright) 2017-06-14 09:02:20.121-0500

Reattached PCAPs after filtering
By: Sean Bright (seanbright) 2017-06-14 09:03:02.061-0500

[~lukeescude], this is {{chan_sip}} or {{chan_pjsip}}?
By: Sean Bright (seanbright) 2017-06-14 09:19:55.738-0500

We require additional debug to continue with triage of your issue. Please follow the instructions on the wiki [1] for how to collect debugging information from Asterisk. For expediency, where possible, attach the debug with a '.txt' file extension so that the debug will be usable for further analysis.

Thanks!

[1] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

By: Luke Escude (lukeescude) 2017-06-14 09:22:29.648-0500

It is PJSIP.

No problem I'll do some more testing today - additionally, I am not sure if this is an issue with Asterisk or Grandstream so I'm working with them as well.
By: Jared Hull (fortytwo) 2017-06-14 09:26:42.751-0500

We are using chan_pjsip here.
By: Sean Bright (seanbright) 2017-06-14 09:30:25.441-0500

[~lukeescude], it doesn't appear that the GS is doing anything technically wrong, but changing the payload arbitrarily certainly makes things more difficult. See the related issue ASTERISK-25166.
By: Luke Escude (lukeescude) 2017-06-14 09:36:00.313-0500

Sean, why is the payload integer even required?

The codec names/text are all standardized, but it seems the integers are not. So why use them?
By: Sean Bright (seanbright) 2017-06-14 09:46:31.290-0500

[~lukeescude], RTP uses a fixed size header of 96 bytes. The payload type is a 7 bit member of that header. Encoding the codec name into the header would require 1 byte (8 bits) for each character of the codec name plus an indication of the length, so throw in another 2 bytes. So "opus" would require 6 bytes (48 bits), whereas 107 or 123 only requires 7 bits. You'd be adding at least 41 bytes of overhead for every single RTP packet (50 packets a second) if you encoded the name instead of the payload type.
By: Jared Hull (fortytwo) 2017-06-14 11:24:12.163-0500

Attached our debug+verbose log, interestingly we could not reproduce this on 14.4.1 and 14.5.0

Line 10284:
rtp_engine.c: Setting tx payload type 107 based on m type on 0x7f0e02057c90
By: Alexander Traud (traud) 2021-09-17 12:54:15.181-0500

I am able to reproduce this consistently in Asterisk 13 with both SIP channel drivers, chan_sip and pjsip_sip, even with {{directmedia=no}}, even with any dynamic payload type, not only the Opus Codec. Consequently, this affects all media formats which do not have a static payload type [assigned|https://www.iana.org/assignments/rtp-parameters]. In the world of Asterisk, those are G.726-32, iLBC, Speex and the optional Codec2, SiLK, Siren, GSM-EFR, AMR, AMR-WB, 3GPP-EVS, and Opus Codec.

Again direct media is disabled. However, a re-iNVITE from the remote party is required. For example, Asterisk offers the call, the remote party answers the call, and then the remote party puts the call on hold by sending a re-INVITE (see [RFC 5359 Section 2.1|https://datatracker.ietf.org/doc/html/rfc5359#section-2.1]). In that re-INVITE, the remote party does not re-use the dynamic payloads of the SDP offer from Asterisk but uses its own. In that re-INVITE, the payload type is higher than it was in the previous SDP transaction. Finally, the remote party has no other payload type which overwrites the payload type used in the previous SDP transaction. With chan_sip, the problem is {{ast_rtp_codecs_payload_code}}.

Example, which is broken:
# Asterisk offers a call with Opus Codec at payload type 107.
# Unify answers that call with Opus Codec at payload type 107.
# Unify puts the call on hold.
# Unify unholds the call, with an SDP offer with Opus Codec at payload type 124.

Although the Unify expects RTP frames on payload type 124, Asterisk sends its RTP frames with payload type 107.

Example, which works:
# Asterisk offers a call with Opus Codec at payload type 107.
# Unify answers that call with Opus Codec at payload type 107.
# Unify puts the call on hold.
# Unify unholds the call, with an SDP offer with Opus Codec at payload type 106.

Now, Asterisk uses 106 because that is the first Opus Codec it has in its internal list.

Example, which works:
# Asterisk offers a call with Opus Codec at payload type 107.
# Unify answers that call with Opus Codec at payload type 107.
# Unify puts the call on hold.
# Unify unholds the call, with an SDP offer with Opus Codec at payload type 124 and iLBC at payload type 107

Now, Asterisk uses 124 because its previous Opus Codec was overwritten by iLBC.

There are many more examples that simply work, showing how impossible derivation is in the world of SIP and SDP:
* Although it works with many phones, the concept was broken in Asterisk.
* Although it does not work with just a few phones, those phones are not to blame but Asterisk.

*Workaround A*
One workaround is to use the payload types, Asterisk is using. For example, if you configure your phone to send Opus Codec at 107, no issue arises. This is possible in Polycom UC and Grandstream.

*Workaround B*
Another workaround for the phone is to re-use the payload types of the previous session. For example, if the call was established with Opus Codec at 107, the phone continues to use that payload type even in a re-INVITE.

*Workaround C*
Another workaround for the phone is to start at the lowest possible payload type (96) and have no gaps in the types. For example, all types from 96 to 124 are used, then the phone can use payload type 125 for Opus Codec because Asterisk overwrote all its previous assignments.

Many of my phones use the latter two strategies. Of mine, the relatively new ALE Cloud Edition and Unify, which added Opus Codec just months ago, were not fixable. However, more platforms are expected to be affected. Actually, when manually increasing the default payload type for Opus Codec of a Digium D phone, even that platform is affected.

*Solution A*
One solution in Asterisk is to delete all assignments when a re-INVITE arrives; see the attached patch {{payload_dynamic_reINVITE.patch}}. This does not fix another issue: When the remote party answers the call with a different, higher payload type, Asterisk is still sending with its lower one.

*Solution B*
As [~fortytwo] mentioned, another solution is to use Asterisk 14 or newer because that uses two lists, one for sending aka transmission (TX) and one for expecting aka receiving (RX), see [commit 1a549ed|https://github.com/asterisk/asterisk/commit/1a549ed]. That is just one part of the change. Over the years, at least three changes have been made around this. So backporting that is not so easy.

@ Bug Marshals
Because Asterisk 13 receives no bug fixes anymore, just security fixes, and because this issue is believed to be fixed since Asterisk 14, please, consider to close this issue here. For those forced to use Asterisk 13, like me, have a look at my attached patch.
By: Kevin Harwell (kharwell) 2021-09-17 13:08:29.462-0500

[~traud] Many thanks for the analysis, testing, workarounds, and solutions for this issue. It is much appreciated.
By: Alex Hermann (gaaf) 2022-01-11 05:08:26.448-0600

Hi [~traud],

In your patch, is just destroying the payloads considered safe? I'm testing your patch (amongst others) on a system and I see crashes with unlocking the codecs_lock. Could it be your patch causes the payloads (including the lock) to be ripped away under some thread (or maybe even the same thread) still holding the codecs_lock?

<edit>
It seems it is generally not safe to destroy locks that are still reachable and/or could still be locked. [https://stackoverflow.com/a/55625308].
</edit>

I have not fully traced the crash yet, but below seems to occur on receiving early media RTP.

{code}
#1 0x0000556b15153fa4 in __ast_rwlock_unlock (filename=0x556b1528986b "rtp_engine.c", line=1073, func=0x556b1528ad50 <__PRETTY_FUNCTION__.16814> "ast_rtp_codecs_payload_code",
t=0x7f4e6a81c2e0, name=0x556b15289acc "&codecs->codecs_lock") at lock.c:823
res = 0
#2 0x0000556b151b9652 in ast_rtp_codecs_payload_code (codecs=0x7f4e6a81c2c0, asterisk_format=1, format=0x556b17765510, code=0) at rtp_engine.c:1073
type = 0x7f4e69037880
i = 8
payload = 8
__PRETTY_FUNCTION__ = "ast_rtp_codecs_payload_code"
#3 0x00007f4eea2b1f10 in ast_rtp_write (instance=0x7f4e6a81c0f8, frame=0x7f4f192e0088) at res_rtp_asterisk.c:4344
rtp = 0x7f4f1a0befe0
remote_address = {ss = {ss_family = 2, __ss_padding = "?\300\271`\224\211", '\000' <repeats 111 times>, __ss_align = 0}, len = 16}
format = 0x3f860f0fa707e425
codec = 0
__PRETTY_FUNCTION__ = "ast_rtp_write"
#4 0x0000556b151b721a in ast_rtp_instance_write (instance=0x7f4e6a81c0f8, frame=0x7f4f192e0088) at rtp_engine.c:520
res = 32591
__PRETTY_FUNCTION__ = "ast_rtp_instance_write"
#5 0x00007f4eecb49b3f in chan_pjsip_write (ast=0x7f4f1b13bd78, frame=0x7f4f192e0088) at chan_pjsip.c:880
channel = 0x7f4f1a02e6d8
pvt = 0x7f4f1afda7e8
media = 0x7f4f188f6838
res = 0
__PRETTY_FUNCTION__ = "chan_pjsip_write"
#6 0x0000556b150ca02a in ast_write (chan=0x7f4f1b13bd78, fr=0x7f4f192e0088) at channel.c:5630
res = -1
f = 0x7f4f192e0088
count = 0
hooked = 0
__PRETTY_FUNCTION__ = "ast_write"
#7 0x00007f4eec9347cb in wait_for_answer (in=0x7f4f1b13bd78, out_chans=0x7f4e6d1b2c40, to=0x7f4e6d1b28f8, peerflags=0x7f4e6d1b3790, opt_args=0x7f4e6d1b2fb0, pa=0x7f4e6d1b3050,
num_in=0x7f4e6d1b2c60, result=0x7f4e6d1b2900, dtmf_progress=0x0, ignore_cc=1, forced_clid=0x7f4e6d1b2c80, stored_clid=0x7f4e6d1b2cd0) at app_dial.c:1679
f = 0x7f4f192e0088
c = 0x7f4f2802c508
o = 0x7f4f28021390
pos = 2

<snipped the rest>
{code}
By: Alexander Traud (traud) 2022-01-11 12:47:36.043-0600

Puh. Good question. I recommend to escalate this question to the [mailing list for developers|http://lists.digium.com/mailman/listinfo/asterisk-dev] (and do not mention you still use Asterisk 13 because it is end-of-life, or you are about this issue here because it is closed – but ask more general). Sounds like a generic question to me. {{ast_rtp_codecs_payloads_destroy}} is a public function. There is no note in its Doxygen about this. And yes, I assumed it does not destroy if there is a lock still active. Looking at its code, on the first glance, it looks like it does destroy anyway, exactly as mentioned in your StackOverflow question. It calls ast_rwlock_destroy which calls pthread_rwlock_destroy.

If you enable {{DEBUG_THREADS}} in the compiler options via {{make menuselect}}, you might see more (afterwards you have to {{make clean}} and re-make). So, please, ask this on the mailing list to get other opinions or in-sights. Then, please, update here. Might be a bug in {{ast_rtp_codecs_payloads_destroy}}. Or might be a bug in my patch. In the latter case,
{code}
ast_rtp_codecs_payloads_destroy(dest);
ast_rtp_codecs_payloads_initialize(dest);
{code}
have to be moved not before but after the while loop. In any case, great finding, thanks for reporting!
By: Alex Hermann (gaaf) 2022-01-11 14:13:19.227-0600

Thanks for looking into it.

I think the easiest solution is to just empty the codecs list manually while holding the codecs_lock, and not call {{ast_rtp_codecs_payloads_destroy()}} at all. That way the lock will survive and there is guaranteed no other thread accessing the codec list. I'll see if I can cook up some patch, but don't hold your breath.
By: Alexander Traud (traud) 2022-01-12 03:06:53.971-0600

Actually, I copied {{ast_rtp_codecs_payloads_clear}}. Not sure, why I did not use it. {{ast_rtp_codecs_payloads_destroy}} is a public API used even in Asterisk itself, in the channel drivers SIP, PJSIP, MGCP, Motif, as destructor for an RTP engine, and the newly added Stream API. Consequently, this should be looked at first. It could be the cause for zillion of other issues not investigated that well, yet. If you are not planning to contact the mailing list, please say so. Because it is your finding, the honors are yours.
By: Alex Hermann (gaaf) 2022-04-15 09:15:20.464-0500

Simply clearing the list of payload types didn't do the trick so I have created an alternative way to clear the codecs not present in the new negotiation.

Initiated by a non-Opus related issue, but I think it should handle this case as well.