[Home]

Summary:ASTERISK-26490: res_pjsip: sends 481 Call/Transaction Does Not Exist when transaction branch parameter contains "_"
Reporter:Juris Breicis (jbreicis)Labels:
Date Opened:2016-10-20 16:15:52Date Closed:2016-12-07 13:38:27.000-0600
Priority:MajorRegression?
Status:Closed/CompleteComponents:pjproject/pjsip Resources/res_pjsip
Versions:13.11.2 Frequency of
Occurrence
Related
Issues:
Environment:Linux 4.4.19-1-pve #1 SMP x86_64 x86_64 x86_64 GNU/Linux Attachments:( 0) 2_mss_smex_capture.log.pcap
( 1) asterisk_log.txt
( 2) extensions.conf
( 3) extensions.conf
( 4) MSS_SIP_TRACE_ISSUES.txt
( 5) PJPROJECT-trace-03112016.txt
( 6) pjsip.conf
Description:In this particular setup asterisk 13.11.2 [10.40.6.100] is connected to NSN MSS through SIP trunk [212.210.2.2:5060]. The calls are received and routed back through the same trunk. Randomly the PJSIP will reply 481 Call/Transaction Does Not Exist to the trunk's CANCEL request of ongoing call in Ringing state. And sometimes (for no apparent reason) it just processes CANCEL request as expected.
I have attached console dump with:
core set debug 10
core set verbose 10
pjsip set logging on

and corresponding sections of pjsip.conf and extensions.conf

Please note although it looks like internal network 10.X.X.X is communicating through nat with public one 212.X.X.X - the routers are configured so, that this communication happens without NAT directly machine to machine.
Comments:By: Asterisk Team (asteriskteam) 2016-10-20 16:15:52.608-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Juris Breicis (jbreicis) 2016-10-20 16:23:28.861-0500

Relevant sections of pjsip.conf

By: Juris Breicis (jbreicis) 2016-10-20 16:24:03.093-0500

Relevant sections of extensions.conf

By: Rusty Newton (rnewton) 2016-10-20 20:36:32.559-0500

Can you attach an additional log, just like you have done, but along with a wireshark viewable packet capture? That way we can correlate between the two. Thanks!

By: Juris Breicis (jbreicis) 2016-10-20 23:42:27.848-0500

Okay - so tcpdump on port udp:5060 should be fine?

By: Walter Doekes (wdoekes) 2016-10-21 02:45:44.594-0500

{{tcpdump -w}} yes

By: Juris Breicis (jbreicis) 2016-10-21 05:53:25.916-0500

Okay, requested files added. For this - I even tested a simpler scenario (pjsip.conf unchanged, new extensions.conf added as well):
1. The call is originated behind the trunk.
2. Call comes in through trunk connection.
3. In the dialplan we first Wait(3) then Send Ringing() progress then wait again 30 seconds.
4. The originator of the call hangups on ongoing connection attempt (while hearing ringing).

The included log and pcap shows two sequential calls originated under same conditions.
First one - CANCELS correctly
Second one - asterisk pjsip responds with: SIP/2.0 481 Call/Transaction Does Not Exist

By: Juris Breicis (jbreicis) 2016-10-24 06:37:54.407-0500

Hi, do you have everything what you need at the moment, or should i provide any additional information?

By: Juris Breicis (jbreicis) 2016-10-25 03:44:58.530-0500

Aditotnal information which might help: when chan_sip is used instead of chan_pjsip on the same machine: no such behaviour occurs. Please let me know if you would like to see sip.conf as well.

By: Rusty Newton (rnewton) 2016-10-28 08:55:27.932-0500

Thanks. I think you have provided enough information. We will dig into this when we get a moment and see what we can find. It may be that we have to attempt reproduction of it in order to track it down.

By: Rusty Newton (rnewton) 2016-11-02 15:20:14.659-0500

Well, I see what is happening, but I'm not sure why it is happening.

Some further information would help.

We'd like to get another log file that includes the pjproject debug. This will show us when the INVITE transaction was destroyed so we can see if the transaction was actually destroyed before the CANCEL was received and possibly see whether it was destroyed for a proper reason. There may be something killing the transaction before it should be killed. Maybe the system is just overloaded and can't handle the transaction somehow.

*If you are not using bundled PJPROJECT* then you can simply provide an Asterisk log with the following log channel types: (warning,notice,error,verbose,debug) and be sure to turn verbose and debug up to level 5.

Be sure to include the output of "pjsip set logger on" in that same log.

*If you are using bundled PJPROJECT* then you'll need to make a source code modification in addition to getting a log with the channels and levels as described above.

In your Asterisk source make sure to re-run configure, then open up and edit: third-party/pjproject/source/pjlib/include/pj/config_site.h

Set the PJ_LOG_MAX_LEVEL define to 5 instead of 3

{noformat}
#define PJ_LOG_MAX_LEVEL 3
{noformat}
Change that to
{noformat}
#define PJ_LOG_MAX_LEVEL 5
{noformat}

Then, make and make install of course.

Now grab the debug described and make sure it includes messages that look like these:

{noformat}
[Nov  2 13:14:08] DEBUG[8893] pjproject: sip_endpoint.c Distributing rdata to modules: Request msg INVITE/cseq=311 (rdata0x7f7e54005c88)
[Nov  2 13:14:08] DEBUG[8893] pjproject:      endpoint .Response msg 401/INVITE/cseq=311 (tdta0x7f7e0c001c90) created
[Nov  2 13:14:08] DEBUG[8893] pjproject: tdta0x7f7e0c00 .Destroying txdata Response msg 401/INVITE/cseq=311 (tdta0x7f7e0c001c90)
[Nov  2 13:14:08] DEBUG[8893] pjproject: tsx0x7f7e0c00f ..Transaction created for Request msg INVITE/cseq=311 (rdata0x7f7e54005c88)
{noformat}

That is, DEBUG messages that begin with "pjproject:"

The log needs to include everything from the very beginning of the call in question so that we can hopefully see where the transaction are created and where they are destroyed.

We should be able to use the INVITE's cseq number to find its tdta* numbers then use those to follow the transactions.

By: Richard Mudgett (rmudgett) 2016-11-02 15:34:12.862-0500

We'll also need the version of PJPROJECT used if you are not using bundled pjproject.

By: Juris Breicis (jbreicis) 2016-11-02 15:59:14.204-0500

This was with the bundled PJPROJECT. I will try to get all the requested necessary information tomorrow morning. Should I also provide tcpdump files as well? And which would be the best scenario? The last one I tested?

By: Juris Breicis (jbreicis) 2016-11-03 03:13:57.440-0500

Attached is log which has 4 identically initiated and processed calls; 3 of them terminate as expected, but the last one raises 481 Call/Transaction Does not Exist.
The log includes extended debug output from bundled pjproject.
The pjsip configuration is the same as in original posting, except for different context name for these two trunks.

By: Rusty Newton (rnewton) 2016-11-09 20:24:15.729-0600

Thanks for the logs. I'm looking at your debug again. I'll let you know what we find.

By: Mark Michelson (mmichelson) 2016-11-16 17:52:35.367-0600

I had a look through the log and noticed that on failed calls, the branch parameter for the transaction had "\_" and "." in them. The successful cases had purely alphanumeric branch parameters. I put together a SIPp scenario, and I figured out that specifically it's the "\_" in the branch that is causing this to happen. I'm not sure why that would cause an issue since branch parameters should be completely opaque identifiers, but that's the cause of the issue right now.

By: Mark Michelson (mmichelson) 2016-11-16 18:10:46.137-0600

Adding some debug to PJSIP's transaction code, I see when we process the initial INVITE:
{noformat}
sip_transactio .Transaction 0x7fa62c007fe8 registered with hkey=0x0xc9c405bd and key=s$z9hG4bKa_D22ZYAjgaeh26
{noformat}

And then when the CANCEL arrives:
{noformat}
sip_transactio .Finding tsx with hkey=0x0x61bcd19d and key=s$z9hG4bKa_D22ZYAjgaeh26
{noformat}

You can see that the hkey (hash table key) is different between the two, even though the transaction key is identical. For some reason, the same key is getting hashed to two different values.

By: Friendly Automation (friendly-automation) 2016-12-07 13:38:29.144-0600

Change 4566 merged by Joshua Colp:
Bundled pjproject:  Fix finding SIP transactions.

[https://gerrit.asterisk.org/4566|https://gerrit.asterisk.org/4566]

By: Friendly Automation (friendly-automation) 2016-12-07 13:38:47.380-0600

Change 4565 merged by Joshua Colp:
Bundled pjproject:  Fix finding SIP transactions.

[https://gerrit.asterisk.org/4565|https://gerrit.asterisk.org/4565]

By: Friendly Automation (friendly-automation) 2016-12-07 13:39:01.785-0600

Change 4567 merged by Joshua Colp:
Bundled pjproject:  Fix finding SIP transactions.

[https://gerrit.asterisk.org/4567|https://gerrit.asterisk.org/4567]

By: Friendly Automation (friendly-automation) 2016-12-07 13:39:17.873-0600

Change 4568 merged by Joshua Colp:
Bundled pjproject:  Fix finding SIP transactions.

[https://gerrit.asterisk.org/4568|https://gerrit.asterisk.org/4568]