[Home]

Summary:ASTERISK-25295: res_pjsip crash - pjsip_uri_get_uri at /usr/include/pjsip/sip_uri.h
Reporter:Dmitriy Serov (Demon)Labels:
Date Opened:2015-07-30 07:50:35Date Closed:2015-09-09 05:55:35
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Resources/res_pjsip
Versions:13.4.0 Frequency of
Occurrence
Constant
Related
Issues:
Environment:Attachments:( 0) 2015_08_10__20_38_07.backtrace-threads.txt
( 1) 2015_08_10__20_38_07.full.tail.txt
( 2) 2015_08_10__20_58_07.backtrace-threads.txt
( 3) 2015_08_10__20_58_07.full.tail.txt
( 4) 2015_09_05__15_12_07.backtrace-threads.txt
( 5) 2015_09_05__15_12_07.full.tail.txt
( 6) backtrace.2015-07-30-1.txt
( 7) backtrace.2015-07-30-2.txt
( 8) backtrace.2015-07-30-3.txt
( 9) backtrace.2015-07-30-5.txt
(10) core.back-trace.txt
(11) debuglog.txt
(12) full.2015-07-30-1.txt
(13) full.2015-07-30-2.txt
(14) full.2015-07-30-3.txt
(15) full.2015-07-30-5.txt
Description:Using last git branch 13. Now is 13.5-rc.
Tired of fighting with deadlock when using chan_sip. Decided to convert all the devices and the gateways to res_pjsip.
The result was crash very often. When asterisk boots. Sometimes asterisk successfully loaded and crash in a few minutes.
In the configs, nothing has changed except transfer chan_sip devices to res_pjsip.
Comments:By: Asterisk Team (asteriskteam) 2015-07-30 07:50:36.230-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Dmitriy Serov (Demon) 2015-07-30 07:54:57.256-0500

backtraces and full debug logs (level 8) for 4 crashes.

By: Rusty Newton (rnewton) 2015-07-31 09:26:05.950-0500

I see so many optimized out values. Is there anyway you can recompile with "DONT_OPTIMIZE" and "BETTER_BACKTRACES" compiler flags? We generally want those on for any backtrace.

DONT_OPTIMIZE really won't affect speed that much in a production instance.

By: Rusty Newton (rnewton) 2015-07-31 09:27:25.196-0500

In addition if you can get a debug log that includes "pjsip set logger on" output so we can see what SIP packets were read right before the crash that would be helpful.

By: Dmitriy Serov (Demon) 2015-08-10 06:57:25.902-0500

crash after 4 hours from start with PJSIP devices (instead of chan_sip).

By: Dmitriy Serov (Demon) 2015-08-10 13:14:35.473-0500

I already accumulated a lot of files with the logs and backtraces. Crashes every 20 minutes to 2 hours. I guess I'll have to go back to chan_sip.

By: Rusty Newton (rnewton) 2015-08-10 19:22:35.477-0500

Sorry you are having such a hard time with 13 between chan_sip and res_pjsip. Hopefully we'll be able to identify the source of these crashes with the latest data you provided.

By: Rusty Newton (rnewton) 2015-08-18 14:52:41.913-0500

[~Demon] can you provide the exact PJPROJECT version being used, along with your OS environment information?  Thanks!

By: Mark Michelson (mmichelson) 2015-08-18 18:00:32.592-0500

It looks like there are a couple of different types of crashes that are occurring here.

# A crash occurs when attempting to send a response to a REGISTER request with a strange Contact URI. This is what I see before the crash occurs: {noformat}[2015-07-30 12:18:20] VERBOSE[22319] res_pjsip_registrar.c: Added contact 'sip:wiseua@195.69.202.35:31382;rinstance=e770d0cd5972231a;X-PUSH-URI=' to AOR 'wiseua' with expiration of 7200 seconds{noformat} That URI has a stray '=' at the end of it. I think what is happening is that the attempt to parse the URI is failing, and we are ignoring the failure. This leads to creation of a Contact header with a NULL URI, which leads to the crash seen here. When I try to reproduce this crash using SIPp, I can't make it happen. Instead, I see this: {noformat}Added contact 'sip:wiseua@127.0.0.1:5061;rinstance=e770d0cd5972231a;X-PUSH-URI' to AOR 'sipp' with expiration of 3600 seconds{noformat} Notice how in my version, the stray '=' at the end is removed. However, my SIPp scenario has the '=' at the end of the Contact URI. Another similar crash has this: {noformat}[2015-08-10 20:36:18] VERBOSE[7386] res_pjsip_registrar.c: Added contact 'sip:16923@195.16.110.63:51677;app-id=s.notify.live.net;pn-type=wp;pn-tok= S' to AOR '16923' with expiration of 7200 seconds
{noformat}. When I try to put this URI into a SIPp scenario, this fails to even reach the registrar because of a detected syntax error in PJSIP. So this means either that the version of PJSIP that you are using is different than the one I am using, and it therefore parses the URI differently, or it means that the Contact URI format is not what I was expecting it to be, and so I am not properly reproducing the problem. In either case, I believe the correct action to take here is to ensure that we do not ignore the return value when attempting to parse contact URIs.
# A second crash appears to be due to attempting to use a task serializer that has already been destroyed. This appears to happen due to a SIP subscription being destroyed, then attempting to handle a 200 OK to an outgoing NOTIFY that we sent. There appears to be a reference counting issue here, so this is going to be more difficult to diagnose and fix.

By: Dmitriy Serov (Demon) 2015-08-19 01:10:05.109-0500

cat /usr/lib/pkgconfig/libpjproject.pc
{noformat}
Name: libpjproject
Description: Multimedia communication library
URL: http://www.pjsip.org
Version: 2.4
Libs: -L${libdir} -lpjsua2 -lstdc++ -lpjsua -lpjsip-ua -lpjsip-simple -lpjsip -lpjmedia-codec -lpjmedia -lpjmedia-videodev -lpjmedia-audiodev -lpjmedia -lpjnath -lpjlib-util  -lilbccodec -lg7221codec  -lsrtp -lgsm -lspeex -lspeexdsp -lpj -lssl -lcrypto -ldl -lz -luuid -lm -lrt -lpthread
Cflags: -I${includedir} -I/usr/include -DPJ_AUTOCONF=1  -O2 -DNDEBUG -DPJ_IS_BIG_ENDIAN=0 -DPJ_IS_LITTLE_ENDIAN=1 -fPIC
{noformat}

At different times I have used both 2.4 and trunk. But these logs were obtained using 2.4. PJProject trunk was not compiled without Video at the time.
{noformat}
uname -a
Linux talk37 3.13.6 #4 SMP Wed Aug 20 17:52:10 MSK 2014 x86_64 Intel(R) Core(TM) i5-2310 CPU @ 2.90GHz GenuineIntel GNU/Linux
{noformat}


By: Mark Michelson (mmichelson) 2015-08-19 09:25:15.442-0500

Thank you very much for the detailed answer. I suspect that there is something else about the URI that causes it to be mis-parsed. I'll try some more experiments and will report back when I have some sort of fix.

By: Mark Michelson (mmichelson) 2015-08-19 16:04:17.757-0500

I have been unable to "trick" PJSIP into being able to accept a URI that looks like either of the ones that I quoted earlier. Unfortunately, none of the logs from when the registrar crash occurred have PJSIP logging enabled, so I cannot see what the REGISTER packet that caused the crash looked like.

I am still looking into the second crash to try to find out how to fix it.

By: Dmitriy Serov (Demon) 2015-08-20 04:19:59.237-0500

chan_sip: sip show peer 16923:
 Useragent    : LinphoneWP8KeepAlive/ (belle-sip/1.3.2)
 Reg. Contact : sip:16923@195.16.110.23:29230;app-id=s.notify.live.net;pn-type=wp;pn-tok=/u/1/hk2/H2QAAADXKbJW5qha12tAYboqAjIAKWDWJ4KaBWmMhMLehaBrOeFgFqEFfsl18pRPQ-cGpykPK-N0PszqaXcfZzqJLqFnEqHFSg-NhfE0cOmS6KSTq7UxS41Q90fLyQxeWsI_Fog/d2luZG

Length important? :)
Sorry, can't provide a dump of REGISTER packet, because the device is offline. If I catch the moment when the owner it will connect, then...


By: Mark Michelson (mmichelson) 2015-08-20 09:40:35.714-0500

That seemed to work properly for me:

{noformat}
Added contact 'sip:16923@127.0.0.1:5061;app-id=s.notify.live.net;pn-type=wp;pn-tok=/u/1/hk2/H2QAAADXKbJW5qha12tAYboqAjIAKWDWJ4KaBWmMhMLehaBrOeFgFqEFfsl18pRPQ-cGpykPK-N0PszqaXcfZzqJLqFnEqHFSg-NhfE0cOmS6KSTq7UxS41Q90fLyQxeWsI_Fog/d2luZG' to AOR 'sipp' with expiration of 7200 seconds
{noformat}

By: Kevin Harwell (kharwell) 2015-08-27 17:03:51.884-0500

I've been looking into this issue some and like Mark have been unable to reproduce the REGISTER problem. Does the contact uri for the following case actually contain the "=" at the end after "X-PUSH-URI"?
{noformat}
[2015-07-30 12:18:20] VERBOSE[22319] res_pjsip_registrar.c: Added contact 'sip:wiseua@195.69.202.35:31382;rinstance=e770d0cd5972231a;X-PUSH-URI=
{noformat}
I ask because when copying that line into another editor right after the "=" there is also a non printable character (you can also see it in the verbose log output when looking at the file). If those are added characters then it's likely that a buffer is not being properly NULL terminated or as Mark mentioned the buffer is being used improperly after a failure has occurred.

[~Demon] any luck on getting a REGISTER packet? It would hopefully show us if there is anything odd in the incoming message.

Also you asked:

{quote}
Length important?
{quote}

It looks like the max length for the contact URI is 256

By: Dmitriy Serov (Demon) 2015-08-28 07:19:34.931-0500

bq. Does the contact uri for the following case actually contain the "=" at the end after "X-PUSH-URI"?
Yes. I don't have REGISTER packet dump (user is offline), but "sip show peer 16923" (now is chan_sip) displays:
{noformat}
 Reg. Contact : sip:16923@87.228.126.146:61276;app-id=s.notify.live.net;pn-type=wp;pn-tok=/u/1/hk2/H2QAAADXKbJW5qha12tAYboqAjIAKWDWJ4KaBWmMhMLehaBrOeFgFqEFfsl18pRPQ-cGpykPK-N0PszqaXcfZzqJLqFnEqHFSg-NhfE0cOmS6KSTq7UxS41Q90fLyQxeWsI_Fog/d2lu
{noformat}

in logs i found such packet:
{noformat}
REGISTER sip:talk37.ru;transport=udp SIP/2.0
Via: SIP/2.0/UDP 192.168.1.66:50985;rport;branch=z9hG4bKPjmQXgOIC3W.
Max-Forwards: 70
From: "ivmaster" <sip:ivmaster@talk37.ru>;tag=a0ZwQBBkHDataga30
To: "ivmaster" <sip:ivmaster@talk37.ru>
Call-ID: dAe1tzVT1G
CSeq: 18018 REGISTER
User-Agent: Join 3.2.4(10958)
Contact: "ivmaster" <sip:ivmaster@192.168.1.66:50985>;reg-id=1;+sip.instance="<urn:uuid:C427A9B9-ECA3-8383-4AE6-BA41B88A180F>"
Expires: 900
Allow: PRACK, INVITE, ACK, BYE, CANCEL, UPDATE, INFO, SUBSCRIBE, NOTIFY, REFER, MESSAGE, OPTIONS, MESSAGE
Content-Length:  0
{noformat}

The users of LinphoneWP8KeepAlive/ (belle-sip/1.3.2) makes asterisk (chan_sip) to generate such packet:
{noformat}
OPTIONS sip:bondima@109.188.124.22;app-id=s.notify.live.net;pn-type=wp;pn-tok=/u/1/sin/H2QAAABQox0UYTJcaRqfCbZudHMkHygU_K-GcGBrFvf89_n6chf18lP4JeQHUKX-5yPDO49eNoA260qgDtBfGFAIXvnrJVLJ_G_7uzbBTGUlQHSGnImJFsAn-i3KYxyp8pF6LHg/d2luZG93c3Bob25lZGVmYXVsdA/ZhRmmVyMm0ilZ SIP/2.0
Via: SIP/2.0/UDP 85.142.148.80:5060;branch=z9hG4bK5eff2789;rport
Max-Forwards: 70
From: "asterisk" <sip:asterisk@talk37.ru>;tag=as249c56dc
To: <sip:bondima@109.188.124.22;app-id=s.notify.live.net;pn-type=wp;pn-tok=/u/1/sin/H2QAAABQox0UYTJcaRqfCbZudHMkHygU_K-GcGBrFvf89_n6chf18lP4JeQHUKX-5yPDO49eNoA260qgDtBfGFAIXvnrJVLJ_G_7uzbBTGUlQHSGnImJFsAn-i3KYxyp8pF6LHg/d2luZG93c3Bob25lZGVmYXVsdA/ZhRmmVyMm0il
Contact: <sip:asterisk@85.142.148.80:5060>
Call-ID: 61d6e12e28403b324a4af91e5a697de2@talk37.ru
CSeq: 102 OPTIONS
User-Agent: ruVoIP.net PBX
Date: Fri, 28 Aug 2015 10:35:34 GMT
Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, SUBSCRIBE, NOTIFY, INFO, PUBLISH, MESSAGE
Supported: replaces, timer
Content-Length: 0
{noformat}

Users of winphone quite rare. I turned on sip debug and waiting "REGISTER" packet from.

Questions:
1. What version of pjproject better now to use?
2. What configure params to use?
I could make another attempt to switch devices on PJSIP with debug enabled and a full dump of the packets.

By: Dmitriy Serov (Demon) 2015-08-28 08:01:32.098-0500

By the way, make attension to field "to" compare with request line:
To: <sip:bondima@109.188.124.22;app-id=s.notify.live.net;pn-type=wp;pn-tok=/u/1/sin/H2QAAABQox0UYTJcaRqfCbZudHMkHygU_K-GcGBrFvf89_n6chf18lP4JeQHUKX-5yPDO49eNoA260qgDtBfGFAIXvnrJVLJ_G_7uzbBTGUlQHSGnImJFsAn-i3KYxyp8pF6LHg/d2luZG93c3Bob25lZGVmYXVsdA/ZhRmmVyMm0il
OPTIONS sip:bondima@109.188.124.22;app-id=s.notify.live.net;pn-type=wp;pn-tok=/u/1/sin/H2QAAABQox0UYTJcaRqfCbZudHMkHygU_K-GcGBrFvf89_n6chf18lP4JeQHUKX-5yPDO49eNoA260qgDtBfGFAIXvnrJVLJ_G_7uzbBTGUlQHSGnImJFsAn-i3KYxyp8pF6LHg/d2luZG93c3Bob25lZGVmYXVsdA/ZhRmmVyMm0ilZ SIP/2.0

255 chars. Char "Z" disappeared and there is no ">" in "To" line. Chan_sip logs :)

By: Kevin Harwell (kharwell) 2015-08-28 11:45:02.699-0500

{quote}
Questions:
1. What version of pjproject better now to use?
{quote}

I'd recommend using the latest version of [pjproject|http://www.pjsip.org/] if possible, which is [currently|http://www.pjsip.org/download.htm] [2.4.5|http://www.pjsip.org/release/2.4.5/pjproject-2.4.5.tar.bz2].

{quote}
2. What configure params to use?
{quote}

See the following wiki page on how to configure it for Asterisk: [Building and Installing pjproject|https://wiki.asterisk.org/wiki/display/AST/Building+and+Installing+pjproject]. Depending on your system you might need to enable/disable certain features.

Also, as a side note I was wrong earlier about the length of a contact. Apparently, there is another constraint when writing to the ast_db upon adding a contact:

{noformat}
db.c:318 ast_db_put: Family and key length must be less than 253 bytes
{noformat}

By: Joshua C. Colp (jcolp) 2015-08-29 10:45:59.054-0500

A fix for the second crash issue is now up for review.

By: Dmitriy Serov (Demon) 2015-09-05 14:51:20.757-0500

new backtrace and debug logs with SIP debug.

{noformat}
REGISTER sip:talk37.ru;transport=UDP SIP/2.0
Via: SIP/2.0/UDP 100.104.148.1:27797;branch=z9hG4bK-d8754z-6cdfafec6cf606bc-1---d8754z-;rport
Max-Forwards: 70
Contact: <sip:velikin@100.104.148.1:27797;rinstance=f676c97c42c7beec;transport=UDP;X-PUSH-URI=http://s.notify.live.net/u/1/db3/H2QAAADfWyZhFZDO0gf0n6K3sFKAvWRESxB7f0R0w8U4-dbUr-pxPenfrZ12qybyPFbi9LTtk-IYx3z3dyhTifqDI_CVcjK5GxllwyNcRkxN_mNAZoZIUNQ79EGVr_zvisatB_k/d2luZG93c3Bob25lZGVmYXVsdA/EW_BnIu3fUOH7FePoWYHNw/wQmdoDSphqYQ57yXZVXIQ7nLVW4>
To: <sip:velikin@talk37.ru;transport=UDP>
{noformat}

By: Joshua C. Colp (jcolp) 2015-09-08 07:47:32.246-0500

Fixes are up at https://gerrit.asterisk.org/1212 and https://gerrit.asterisk.org/1213 for the Asterisk side of this but you will still need to change PJSIP to allow a larger URI size, as it defaults to 256. This can be done by modifying pjsip/include/pjsip/sip_config.h and changing PJSIP_MAX_URL_SIZE to be 1024. After doing so PJSIP *and* Asterisk has to be rebuilt.

I've confirmed that doing this and having the above fixes allows things to work as expected.