[Home]

Summary:ASTERISK-27441: PJSIP: Forked INVITE SDP negotiation gets one way audio.
Reporter:laszlovl (lvl)Labels:patch pjsip
Date Opened:2017-11-22 09:00:57.000-0600Date Closed:2018-03-12 07:24:58
Priority:MajorRegression?Yes
Status:Closed/CompleteComponents:Channels/chan_pjsip
Versions:15.1.2 Frequency of
Occurrence
Constant
Related
Issues:
Environment:Attachments:( 0) asterisk-sdpversion-log.txt
( 1) early-media-patch.diff
Description:The scenario:

1. Asterisk dials outbound using chan_pjsip
2. The remote party sends an 183, with IP address 1.2.3.4 in the SDP *1
3. RTP is now flowing between Asterisk and 1.2.3.4
4. The remote party sends a 200, with a *new* IP address 5.6.7.8 in the SDP *2
5. Asterisk is not applying the new SDP details. 5.6.7.8 will start sending RTP to Asterisk, which is rejected:

{code}
[Nov 22 14:29:48] DEBUG[8280][C-00000001]: res_rtp_asterisk.c:5743 ast_rtp_read: 0x7f021c0286e0 -- Received RTP packet from 5.6.7.8:9898, dropping due to strict RTP protection.
{code}

The following log lines hint at the root cause:

{code}
[Nov 22 14:29:46] DEBUG[8237]: res_pjsip_session.c:3187 handle_incoming: Received response
[Nov 22 14:29:46] DEBUG[8237]: res_pjsip_session.c:3171 handle_incoming_response: Response is 200 OK
[Nov 22 14:29:46] DEBUG[8237]: res_pjsip_session.c:798 handle_negotiated_sdp: Pending topology was NULL for channel 'PJSIP/proxy-00000001'
{code}

The message is triggered by the code in https://github.com/asterisk/asterisk/blob/15.1/res/res_pjsip_session.c#L796. Because there is no "session->pending_media_state->topology" at this point, the *handle_negotiated_sdp* function is aborted and RTP from the new IP address is ignored as the session is not moved to the STRICT_RTP_LEARN state.

This behavior was introduced in commit https://github.com/asterisk/asterisk/commit/40de3a12e0caddec0be31aa4ad996c22fc716be5. Reverting that code will however cause Asterisk to segfault on the scenario as described.

It's not clear to me yet why *session->pending_media_state->topology* is null in this scenario, as opposed to all other scenarios where SDP is updated for an existing session.

*1
{code}
SIP/2.0 183 Session Progress

v=0
o=root 1910284739 1910284739 IN IP4 1.2.3.4
c=IN IP4 1.2.3.4
{code}

*2
{code}
SIP/2.0 200 OK

v=0
o=root 4067 4067 IN IP4 5.6.7.8
c=IN IP4 5.6.7.8
{code}
Comments:By: Asterisk Team (asteriskteam) 2017-11-22 09:00:57.663-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Richard Mudgett (rmudgett) 2017-11-22 14:02:07.212-0600

Could you attach the pjsip set logger on output showing the call [1].  I suspect that the SDP version is not incremented so Asterisk assumes that the SDP has already been processed by an earlier message.

[1] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

By: laszlovl (lvl) 2017-11-22 16:05:43.417-0600

Fair point. The two snippets I added are from an actual pjsip log, but I'll add a longer log below for reference. As you can see, while the 200's SDP version is different from the 180's, it's not increased by one but it's a random (lower) number. Would that be grounds for Asterisk to silently ignore the 200's SDP contents? (Even though it does in fact trigger a _session_inv_on_media_update_ and _handle_negotiated_sdp_)

{code}
<--- Transmitting SIP request (1019 bytes) to UDP:1.6.1.6:8065 --->
INVITE sip:user@dev.com:8065 SIP/2.0
Call-ID: d205d496-618d-4c70-98ee-b4269c8e15eb

v=0
o=- 1120527176 1120527176 IN IP4 1.6.1.6
s=Asterisk
c=IN IP4 1.6.1.6
t=0 0
m=audio 31640 RTP/AVP 9 8 0 3 101
a=rtpmap:9 G722/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:3 GSM/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=ptime:20
a=maxptime:150
a=sendrecv

<--- Received SIP response (1010 bytes) from UDP:1.6.1.6:8065 --->
SIP/2.0 183 Session Progress
To: <sip:user@dev.com>;tag=VhNaFWijiHt
Call-ID: d205d496-618d-4c70-98ee-b4269c8e15eb

v=0
o=root 1828559483 1828559483 IN IP4 1.1.1.2
s=Mediaserver
c=IN IP4 1.1.1.2
t=0 0
m=audio 16944 RTP/AVP 8 0 3 101
a=rtpmap:8 PCMA/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:3 GSM/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=ptime:20
a=sendrecv


<--- Received SIP response (1062 bytes) from UDP:1.6.1.6:8065 --->
SIP/2.0 200 OK
To: <sip:user@dev.com>;tag=as66679d74
Call-ID: d205d496-618d-4c70-98ee-b4269c8e15eb

v=0
o=root 4067 4067 IN IP4 8.2.1.8
s=session
c=IN IP4 8.2.1.8
t=0 0
m=audio 30874 RTP/AVP 8 0 3 101
a=rtpmap:8 PCMA/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:3 GSM/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=silenceSupp:off - - - -
a=ptime:20
a=sendrecv


<--- Transmitting SIP request (635 bytes) to UDP:1.6.1.6:8065 --->
ACK sip:user@8.2.1.8 SIP/2.0
Call-ID: d205d496-618d-4c70-98ee-b4269c8e15eb
{code}

By: Richard Mudgett (rmudgett) 2017-11-22 19:02:50.490-0600

Yep.  The SDP version is not increased between the 183 Progress and the 200 OK response messages.  In fact they seems to have *no* relation to each other and the 200 OK's version is less than the 183 Progress message.

The SDP version is the third token after the '='.

The 183 Progress's version is {{1828559483}}
{noformat}
o=root 1828559483 1828559483 IN IP4 1.1.1.2
{noformat}
The 200 OK's version is {{4067}}
{noformat}
o=root 4067 4067 IN IP4 8.2.1.8
{noformat}

Please attach the log file of the call \[1] (verbose, debug and SIP messages included).  No shortcuts this time. :)  We need to see what Asterisk is thinking about the call and if the strictrtp learning is restarted when the 200 OK comes in.  Unless you have forced it otherwise, v15 uses bundled pjproject by default.  Are you using bundled?

\[1] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

P.S.  A pcap would also be good as we may need to see when the RTP streams switch to the new addresses in relation to the SIP messages.  The strictrtp learning only gives 1.5 seconds to learn a new address before locking to any learned address.


By: laszlovl (lvl) 2017-11-23 08:20:38.699-0600

Right, I thought you only wanted the pjsip logger part. I'll attach the full debug log, although I'm pretty sure the "Pending topology was NULL for channel" message is the only relevant part. Asterisk is learning RTP when the 183 comes in but won't restart learning after the 200. That's in line with what I can see in the code, because "Pending topology was NULL for channel" indicates that the 200's SDP won't be applied.

A few questions at this stage:

1. Where is the code that ignores an SDP when the version number is not higher than the previous version number? (Actually, it's not completely ignored because at least _handle_negotiated_sdp_ is called)
2. Wouldn't it at the least warrant a debug message when this occurs?
3. I see some relevant PJ_LOG statements in pjsip's code but these messages don't appear in asterisk's debug log. Is it possible to see them?
4. Chan_sip has an option to gracefully deal with broken clients in this exact scenario: ignoresdpversion. Would it be possible to add this in Chan_Pjsip as well?

I indeed use the default of with-pjproject-bundled.

Thanks for your help so far!

By: Richard Mudgett (rmudgett) 2017-11-23 14:12:29.141-0600

{quote}
1. Where is the code that ignores an SDP when the version number is not higher than the previous version number? (Actually, it's not completely ignored because at least _handle_negotiated_sdp_ is called)
{quote}
I think that is checked in pjproject.  Another place it could be checked is in res_pjsip_session.c.
{quote}
2. Wouldn't it at the least warrant a debug message when this occurs?
{quote}
Yes.  A debug message would be a good thing.
{quote}
3. I see some relevant PJ_LOG statements in pjsip's code but these messages don't appear in asterisk's debug log. Is it possible to see them?
{quote}
See CLI "pjproject set log level" and {{pjproject.conf.sample}}.  Beware that those are pjproject's log levels that are routed through Asterisk's logging system.  Also level 5 and 6 output a rediculous amount and is rarely useful.
{quote}
4. Chan_sip has an option to gracefully deal with broken clients in this exact scenario: ignoresdpversion. Would it be possible to add this in Chan_Pjsip as well?
{quote}
See issue ASTERISK-27309 about about getting this option incorporated into chan_pjsip.

By: laszlovl (lvl) 2017-11-23 17:46:59.507-0600

Thanks, that's providing some useful additional debug information.

{code}
[Nov 23 23:29:08] DEBUG[30326]: pjproject:0 <?>:                sip_ua_layer.c ...Received forked Response msg 200/INVITE/cseq=15621 (rdata0x7f492001f4d8) for existing dialog dlg0x7f4928072848
[Nov 23 23:29:08] DEBUG[30326]: pjproject:0 <?>:                sip_ua_layer.c ...Unhandled forked Response msg 200/INVITE/cseq=15621 (rdata0x7f492001f4d8), response will be handed over to the first dialog
[Nov 23 23:29:08] DEBUG[30326]: res_pjsip_session.c:3359 session_inv_on_state_changed: Source of transaction state change is RX_MSG
[Nov 23 23:29:08] DEBUG[30326]: res_pjsip_session.c:3204 handle_incoming: Received response
[Nov 23 23:29:08] DEBUG[30326]: res_pjsip_session.c:3188 handle_incoming_response: Response is 200 OK
[Nov 23 23:29:08] DEBUG[30326]: pjproject:0 <?>:             inv0x7f4928072848 ....Received forked final response after SDP negotiation has been done in early media. Renegotiating SDP..
[Nov 23 23:29:08] DEBUG[30326]: pjproject:0 <?>:             inv0x7f4928072848 ....Got SDP answer in Response msg 200/INVITE/cseq=15621 (rdata0x7f492001f4d8)
[Nov 23 23:29:08] DEBUG[30326]: pjproject:0 <?>:             inv0x7f4928072848 ....SDP negotiation done, status=0
[Nov 23 23:29:08] DEBUG[30326]: res_pjsip_session.c:798 handle_negotiated_sdp: Pending topology was NULL for channel 'PJSIP/proxy-0000000a'
[Nov 23 23:29:08] DEBUG[30326]: pjproject:0 <?>:             inv0x7f4928072848 ....Received Response msg 200/INVITE/cseq=15621 (rdata0x7f492001f4d8), sending ACK
{code}

Even though Asterisk might be doing the technically correct thing, I suspect it's not doing so for the right reasons.

As far as I can see, the 200's SDP is not explicitly discarded by Pjsip nor Asterisk based on the SDP version number. In fact, Pjsip appears to indicate that it's handling the SDP answer normally and is calling the on_media_update callback in Asterisk. The only thing blocking the SDP from being handled (which would re-start strict RTP learning thus fix the problem) is _session->pending_media_state->topology_ being NULL in _handle_negotiated_sdp_.

That brings me back to the original question: which code is normally responsible for creating a _pending_media_state->topology_ that isn't triggered in this specific scenario?

By: laszlovl (lvl) 2017-11-27 15:12:32.230-0600

Could you clarify - does the change in title mean that you believe the SDP version is the root cause after all?

By: Richard Mudgett (rmudgett) 2017-11-27 16:45:39.009-0600

Reattaching [^asterisk-sdpversion-log.txt] to add the .txt exten for easy viewing.

Yes.  I think this is due to the 200 OK SDP version being lower than the 183 Progress SDP version.  I did not look into the code.  However, we don't seem to be processing the SDP associated with the 200 OK so we don't restart strictrtp learning after locking onto the 183 Progress RTP stream.  As a result you get one way audio because strict RTP is discarding the media coming to Asterisk.

To eliminate the possibility that this is v15 media stream support related you could try it with v13 or v14.

By: laszlovl (lvl) 2017-11-27 17:00:49.617-0600

Just adding to highlight that the outbound INVITE is forked by the remote party - as indicated by the separate to-tag for the 183 and the 200 responses. Since that makes them separate dialogs, you could argue that there are two separate SDP sessions as well where the version check shouldn't apply.

If you look at the pjproject debug lines, it appears to realise as much ("Unhandled forked Response msg, response will be handed over to the first dialog" - see https://trac.pjsip.org/repos/ticket/657) and process the SDP normally. I did look at the code and couldn't find any indication that Asterisk (chan Pjsip) is performing SDP version checks for itself.

That still leads me to believe that the problem is caused by something else.

By: Richard Mudgett (rmudgett) 2017-11-27 18:06:59.989-0600

That does put a different light on the problem.  I hadn't even noticed that the call was forked.  I wonder which dialog is used when Asterisk terminates the call.

Could you attach a log of the full call from the call setup to teardown with verbose=4, debug=4, pjsip set logger on, and the pjproject logging set to 4?  The call does not need to stay connected for more than a few seconds.

The reason v15 is not a LTS release is because of the media stream support allowing multiple audio/video streams.  To eliminate the possibility that this is v15 media stream support related you could try it with v13 or v14.

By: laszlovl (lvl) 2017-11-28 06:51:02.118-0600

You're right, the problem doesn't exist in Asterisk 14.6. I'll upload the full debug logs in a bit.

By: laszlovl (lvl) 2017-11-28 10:42:54.663-0600

I can now 100% confirm that this has nothing to do with SDP versioning. The Asterisk 15 architecture simply doesn't account for a scenario where SDP is re-negotiated between a 183 and 200.

The 183 is processed as a regular negotiation. This means that if the negotiation is succesful, _pending_media_state_ will become _active_media_state_, and the actual _pending_media_state_ is cleared:

{code}
handle_negotiated_sdp:

SWAP(session->active_media_state, session->pending_media_state);
ast_sip_session_media_state_reset(session->pending_media_state);
{code}

When the 200 is received, Pjsip will map it onto the existing session:

{code}
sip_ua_layer.c ...Unhandled forked Response msg 200/INVITE/cseq=15621 (rdata0x7f492001f4d8), response will be handed over to the first dialog
{code}

... and correctly trigger a _on_media_update_ for that session, which is relayed to asterisk as _session_inv_on_media_update_ -> _handle_negotiated_sdp_.

There, the negotiation will be blocked by this code:

{code}
handle_negotiated_sdp:

if (!session->pending_media_state->topology) {
ast_debug(1, "Pending topology was NULL for channel '%s'\n",
session->channel ? ast_channel_name(session->channel) : "unknown");
return 0;
}
{code}

... because - remember? - we cleared the session's _session->pending_media_state->topology_ after processing the 183.

I think the fix is for _handle_negotiated_sdp_ to use _session->active_media_state_ when _session->pending_media_state_ is empty. The 200's SDP should be fed into _handle_negotiated_sdp_session_media_ together with the *already negotiated* state, which is now in _session->active_media_state_ instead. Only if both the session's pending & active media are empty, should _handle_negotiated_sdp_ break.

Coming up is a proof of concept patch, which is verified to fix the scenario described in this issue.

By: Joshua C. Colp (jcolp) 2017-11-28 10:51:38.764-0600

[~lvl] You can't modify active media state. Once a media state transitions to active it is treated as immutable and is supposed to be guaranteed to not change (and thus not need protection from other threads modifying it, which removes deadlock potential). You can only create a new media state and replace the active media state.

By: laszlovl (lvl) 2017-11-28 11:11:09.345-0600

Attached is the patch I spoke of.

~jcolp I believe that _session->active_media_state_ is only used for reading here and isn't modified, but it's hard to oversee completely. Feel free to use the attached patch in any way, or not :-)

By: laszlovl (lvl) 2017-12-05 15:41:38.065-0600

Any thoughts on the proposed patch? Should I submit it to Gerrit or is someone working on a different approach?

By: Joshua C. Colp (jcolp) 2017-12-05 15:44:22.833-0600

This issue is not currently assigned or being worked on. It has been accepted and is in queue. You can submit it to Gerrit if you wish. I don't think it is the correct fix in the end, but coming to a conclusion requires looking deeper into the SDP negotiation and the result of that on things which I can't do right now.

By: Jeroen (doggie) 2017-12-07 09:14:28.230-0600

We have exactly the same issue. Is there a planning when this issue will be fixed. Is there a way to speedup this proces?

By: laszlovl (lvl) 2018-02-20 11:58:19.529-0600

I managed to create a testcase in the Asterisk testsuite to reproduce the problem and submitted it to Gerrit: https://gerrit.asterisk.org/#/c/8303/

I verified that the testcase fails in the current version and succeeds with the proposed patch applied. The patch is submitted to Gerrit: https://gerrit.asterisk.org/#/c/8304/

By: Friendly Automation (friendly-automation) 2018-03-12 07:25:00.040-0500

Change 8379 merged by Jenkins2:
res_pjsip_session: properly handle SDP from a forked call with early media

[https://gerrit.asterisk.org/8379|https://gerrit.asterisk.org/8379]

By: Friendly Automation (friendly-automation) 2018-03-12 09:06:35.592-0500

Change 8305 merged by Jenkins2:
res_pjsip_session: properly handle SDP from a forked call with early media

[https://gerrit.asterisk.org/8305|https://gerrit.asterisk.org/8305]