[Home]

Summary:ASTERISK-28882: res_pjsip: Contact not completely removed on transport closure
Reporter:Robert Sutton (rsutton@noojee.com.au)Labels:
Date Opened:2020-05-11 00:22:05Date Closed:2020-05-13 05:55:09
Priority:MinorRegression?
Status:Closed/CompleteComponents:Resources/res_pjsip
Versions:16.10.0 Frequency of
Occurrence
Constant
Related
Issues:
Environment:ubuntu 18.04 docker realtime jssip over WSAttachments:( 0) asterisk.log.gz
Description:The logs are full of the below ERROR, they increase in frequency as time goes by.

It appears that when a remote sip client disconnects abruptly (VPN disconnect) the contact is replaced but some part of the qualify code is still retaining the old contact id and trying to qualify it.

As the code to timeout the contact has already removed it's reference to this dead contact it is never cleaned up.
{noformat}
ERROR[506]: res_pjsip.c:4278 endpt_send_request: Error 171060 'Unsupported transport (PJSIP_EUNSUPTRANSPORT)' sending OPTIONS request to endpoint 113
{noformat}
I've acquired the back trace for this error message...
{noformat}
# 0: [0x5db45d] asterisk utils.c:2404 __ast_assert_failed()
# 1: [0x7fe8771b0ea8] res_pjsip.so res_pjsip.c:4285 endpt_send_request()
# 2: [0x7fe8771b1625] res_pjsip.so res_pjsip.c:4477 ast_sip_send_out_of_dialog_request()
# 3: [0x7fe8771b73ce] res_pjsip.so pjsip_options.c:908 sip_options_qualify_contact()
# 4: [0x4645a7] asterisk astobj2_container.c:328 internal_ao2_traverse()
# 5: [0x46487f] asterisk astobj2_container.c:414 __ao2_callback()
# 6: [0x7fe8771b7561] res_pjsip.so pjsip_options.c:930 sip_options_qualify_aor()
# 7: [0x7fe8771e671c] res_pjsip.so pjsip_scheduler.c:98 run_task()
# 8: [0x5c5736] asterisk taskprocessor.c:1239 ast_taskprocessor_execute()
# 9: [0x5cf3e0] asterisk threadpool.c:1354 execute_tasks()
#10: [0x5c5736] asterisk taskprocessor.c:1239 ast_taskprocessor_execute()
#11: [0x5cd02a] asterisk threadpool.c:367 threadpool_execute()
#12: [0x5cec43] asterisk threadpool.c:1137 worker_active()
#13: [0x5ce9dd] asterisk threadpool.c:1057 worker_start()
#14: [0x5d83b7] asterisk utils.c:1249 dummy_start()
#15: [0x7fe8c2e4e6ba] libpthread.so.0 :0 __pthread_get_minstack()
#16: [0x7fe8c20e341d] libc.so.6 :0 clone()
{noformat}
here is the contact that "pjsip show contacts" lists
{noformat}
 Contact:  113/sip:na89q83p@127.0.0.1:35636;transport=WS  f21857cece Avail        23.087
{noformat}


and here you can see it is qualifying 2 contacts,
{noformat}
DEBUG[506]: res_pjsip/pjsip_options.c:857 sip_options_qualify_contact: Qualifying contact '113;@0f05565353fc038d2259556e7a9d87e1' on AOR '113'
DEBUG[506]: res_pjsip/pjsip_options.c:857 sip_options_qualify_contact: Qualifying contact '113;@f21857cece07658a2e96e470283f3ea1' on AOR '113'
{noformat}
Comments:By: Asterisk Team (asteriskteam) 2020-05-11 00:22:06.482-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

By: Joshua C. Colp (jcolp) 2020-05-11 07:12:00.842-0500

In the environment field you specify "realtime jssip over WS". What does this mean? Are you using realtime configuration with PJSIP?

Can you also please provide console output with debug enabled in logger.conf and "core set debug 3" when this occurs?

By: Robert Sutton (rsutton@noojee.com.au) 2020-05-11 19:36:23.297-0500

Debug level 3 from startup to first occurance of the error plus a little more.

There is a back trace as I added  __ast_assert_failed to the appropriate place in my build.

The endpoint with the issue in this log is 113

for my own reference if we need to reproduce this my build number is 16.10.0_02

By: Robert Sutton (rsutton@noojee.com.au) 2020-05-11 19:38:51.979-0500

We are using realtime configuration with PJSIP.
The sip client is jssip which connects via WebSockets

I've attached logs with debug set to level 3 from startup to first occurrence of the error plus a little more.

There is a back trace as I added  __ast_assert_failed to the appropriate place in my build.

The endpoint with the issue in the log is 113


By: Robert Sutton (rsutton@noojee.com.au) 2020-05-12 18:47:22.409-0500

I'm a little more concerned about this issue after finding asterisk consuming 50% CPU on a system with no active calls that had been up for 3 weeks

By: Joshua C. Colp (jcolp) 2020-05-13 04:19:01.201-0500

Looking at the log shows the disconnection being properly handled in cases, but not in the one you pointed out. On the issue details you state this is constant - is it truly that? Is it happening for all endpoints? Is it specific endpoints? Is it just the specific scenario you've outlined? Are there other scenarios where it does work as expected?

By: Robert Sutton (rsutton@noojee.com.au) 2020-05-13 05:24:35.895-0500

Is it constant? - Given that I don't have 100% repeatable way of reproducing it, it is probably fair to downgrade it to frequent.

Is it happening for all endpoints? - I've only noticed it on endpoints using jssip (Web Sockets)

Is it specific endpoints? - It's more frequent on some endpoints than others, but I assume that is just a reflection of the reliability of the network in each endpoints location (many working from home)

Is it just the specific scenario I've outlined. - These are the ones I know of; Disconnect network (at remote endpoint), terminate VPN (at remote endpoint), delete contact from realtime DB (I removed the code that was deleting contacts from realtime last week and the problem persists, although less frequent)

My 2 cents worth here, it feels like a race condition.



By: Robert Sutton (rsutton@noojee.com.au) 2020-05-13 05:53:50.274-0500

I just went back and double check and found a remaining piece of code deleting contacts - can't believe I missed it!

Apologies and lets close this for the time being.

By: Robert Sutton (rsutton@noojee.com.au) 2020-05-13 05:55:09.280-0500

I had external code deleting contacts - thought I had removed that code!

apologies