[Home]

Summary:ASTERISK-25602: chan_sip deadlocks after INVITE processing while calling sip_report_security_event
Reporter:Antonis Psaras (apsaras)Labels:
Date Opened:2015-12-02 03:50:26.000-0600Date Closed:2016-09-07 07:46:09
Priority:MajorRegression?
Status:Closed/CompleteComponents:Channels/chan_sip/General Channels/chan_sip/Security Framework
Versions:13.5.0 Frequency of
Occurrence
Frequent
Related
Issues:
is duplicated byASTERISK-21228 Deadlock in pbx_find_extension when attempting an autoservice stop due to holding the context lock
Environment:Centos 6 64bit on XenServer VM,Attachments:( 0) backtrace-threads.txt
( 1) backtrace-threads.txt
( 2) core-show-locks.txt
( 3) core-show-locks.txt
( 4) full.bz2
( 5) full.txt
( 6) myDebugLog.txt
Description:Asterisk is freezing randomly and not responding to cli commands or SIP messages.

Restarting the service the system comes up
Comments:By: Asterisk Team (asteriskteam) 2015-12-02 03:50:29.219-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Antonis Psaras (apsaras) 2015-12-02 03:52:08.345-0600

The locks and backtrace

By: Rusty Newton (rnewton) 2015-12-02 11:55:22.589-0600

We require additional debug to continue with triage of your issue. Please follow the instructions on the wiki [1] for how to collect debugging information from Asterisk. For expediency, where possible, attach the debug with a '.txt' file extension so that the debug will be usable for further analysis.

Thanks!

[1] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information



By: Rusty Newton (rnewton) 2015-12-02 11:56:05.508-0600

Thanks for the trace and core show locks. Can you also provide a debug log as described in the previous comment to demonstrate what is happening at the time of the locking situation?

Are you able to reproduce the issue at will?

By: Antonis Psaras (apsaras) 2015-12-03 02:02:00.725-0600

Will do next time it happens.

Thank you for your quick response.

By: Rusty Newton (rnewton) 2015-12-07 18:43:03.530-0600

Thanks!

By: Antonis Psaras (apsaras) 2015-12-22 04:00:02.859-0600

New backtrace files and locks

By: Antonis Psaras (apsaras) 2015-12-22 04:04:54.167-0600

DebugLog

By: Antonis Psaras (apsaras) 2015-12-22 04:11:39.900-0600

Full Trace

By: Rusty Newton (rnewton) 2015-12-22 18:58:06.275-0600

Looks like it locked up within the first few minutes after starting. Is that the case? Did the log output halt then?

By: Matt Jordan (mjordan) 2015-12-24 08:10:30.037-0600

The issue is this thread:

{code}
=== Thread ID: 0x7f8cd17f3700 LWP:32073 (do_monitor           started at [28915] chan_sip.c restart_monitor())
=== ---> Lock #0 (chan_sip.c): MUTEX 28289 handle_request_do &netlock 0x7f8cee317de0 (1)
main/backtrace.c:59 __ast_bt_get_addresses() (0x467928+1D)
main/lock.c:258 __ast_pthread_mutex_lock() (0x536a5b+BA)
channels/chan_sip.c:28292 handle_request_do()
channels/chan_sip.c:28251 sipsock_read()
main/io.c:292 ast_io_wait() (0x52e197+19C)
channels/chan_sip.c:28882 do_monitor()
main/utils.c:1237 dummy_start()
pthread_create.c:0 start_thread()
:0 __clone() (0x7f8d33c158d0+6D)
=== ---> Lock #1 (chan_sip.c): MUTEX 8049 sip_new tmp 0x7f8d2c374400 (1)
main/backtrace.c:59 __ast_bt_get_addresses() (0x467928+1D)
main/lock.c:258 __ast_pthread_mutex_lock() (0x536a5b+BA)
main/astobj2.c:167 __ao2_lock() (0x45c6d2+96)
channels/chan_sip.c:8050 sip_new()
channels/chan_sip.c:25678 handle_request_invite()
channels/chan_sip.c:28110 handle_incoming()
channels/chan_sip.c:28319 handle_request_do()
channels/chan_sip.c:28251 sipsock_read()
main/io.c:292 ast_io_wait() (0x52e197+19C)
channels/chan_sip.c:28882 do_monitor()
main/utils.c:1237 dummy_start()
pthread_create.c:0 start_thread()
:0 __clone() (0x7f8d33c158d0+6D)
=== ---> Lock #2 (chan_sip.c): MUTEX 8050 sip_new i 0x7f8d2c2f2a70 (1)
main/backtrace.c:59 __ast_bt_get_addresses() (0x467928+1D)
main/lock.c:258 __ast_pthread_mutex_lock() (0x536a5b+BA)
main/astobj2.c:167 __ao2_lock() (0x45c6d2+96)
channels/chan_sip.c:8051 sip_new()
channels/chan_sip.c:25678 handle_request_invite()
channels/chan_sip.c:28110 handle_incoming()
channels/chan_sip.c:28319 handle_request_do()
channels/chan_sip.c:28251 sipsock_read()
main/io.c:292 ast_io_wait() (0x52e197+19C)
channels/chan_sip.c:28882 do_monitor()
main/utils.c:1237 dummy_start()
pthread_create.c:0 start_thread()
:0 __clone() (0x7f8d33c158d0+6D)
=== ---> Waiting for Lock #3 (pbx.c): MUTEX 12251 ast_rdlock_contexts &conlock 0x979f20 (1)
main/backtrace.c:59 __ast_bt_get_addresses() (0x467928+1D)
main/lock.c:258 __ast_pthread_mutex_lock() (0x536a5b+BA)
main/pbx.c:12252 ast_rdlock_contexts() (0x58cfee+23)
main/pbx.c:4940 pbx_extension_helper()
main/pbx.c:6034 ast_spawn_extension() (0x578c1a+65)
main/pbx.c:6450 __ast_pbx_run()
main/pbx.c:6771 pbx_thread()
main/utils.c:1237 dummy_start()
pthread_create.c:0 start_thread()
:0 __clone() (0x7f8d33c158d0+6D)
=== --- ---> Locked Here: pbx.c line 12251 (ast_rdlock_contexts)
{code}

We can't hold the channel/pvt lock while accessing the dialplan. Interestingly, it appears as if we're doing that not via one of the usual places - in this case, we're looking up the peer via a SIP security event due to the INVITE request, apparently after creating the channel but while the channel lock is still held:

{code}
Thread 48 (Thread 0x7f8cd17f3700 (LWP 32073)):
#0  0x00007f8d348942e4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f8d3488f5a3 in _L_lock_892 () from /lib64/libpthread.so.0
#2  0x00007f8d3488f487 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000536bef in __ast_pthread_mutex_lock (filename=0x6d612b "pbx.c", lineno=12251, func=0x6dc410 "ast_rdlock_contexts", mutex_name=0x6dc23d "&conlock", t=0x979f20) at lock.c:313
#4  0x000000000058d011 in ast_rdlock_contexts () at pbx.c:12251
#5  0x0000000000575937 in pbx_extension_helper (c=0x0, con=0x0, context=0x7f8cee31a2f0 "sipregistrations", exten=0x7f8cd17eef40 "6004", priority=1, label=0x0, callerid=0x0, action=E_MATCH, found=0x0, combined_find_spawn=0) at pbx.c:4939
#6  0x0000000000578a65 in ast_exists_extension (c=0x0, context=0x7f8cee31a2f0 "sipregistrations", exten=0x7f8cd17eef40 "6004", priority=1, callerid=0x0) at pbx.c:6008
#7  0x00007f8cee04d51c in register_peer_exten (peer=0x7f8d2c29e7b8, onoff=1) at chan_sip.c:5007
#8  0x00007f8cee083206 in reg_source_db (peer=0x7f8d2c29e7b8) at chan_sip.c:16069
#9  0x00007f8cee0d6835 in build_peer (name=0x7f8d2c3273b4 "6004", v=0x0, alt=0x0, realtime=1, devstate_only=0) at chan_sip.c:31112
#10 0x00007f8cee04e9dc in realtime_peer (newpeername=0x7f8d2c3273b4 "6004", addr=0x0, callbackexten=0x0, devstate_only=0, which_objects=2) at chan_sip.c:5450
#11 0x00007f8cee04f017 in sip_find_peer_full (peer=0x7f8d2c3273b4 "6004", addr=0x0, callbackexten=0x0, realtime=1, which_objects=2, devstate_only=0, transport=0) at chan_sip.c:5545
#12 0x00007f8cee04f0d9 in sip_find_peer (peer=0x7f8d2c3273b4 "6004", addr=0x0, realtime=1, which_objects=2, devstate_only=0, transport=0) at chan_sip.c:5584
#13 0x00007f8cee0f0900 in sip_report_security_event (p=0x7f8d2c2f2ac8, req=0x7f8cd17f21b0, res=0) at sip/security_events.c:291
#14 0x00007f8cee0c1baa in handle_incoming (p=0x7f8d2c2f2ac8, req=0x7f8cd17f21b0, addr=0x7f8cd17f2120, recount=0x7f8cd17f20cc, nounlock=0x7f8cd17f20c8) at chan_sip.c:28113
#15 0x00007f8cee0c26d8 in handle_request_do (req=0x7f8cd17f21b0, addr=0x7f8cd17f2120) at chan_sip.c:28319
{code}

I don't think this would happen on a nominal path; I suspect that something off nominal happens and we don't properly unlock the channel/pvt on one of the code paths.

[~apsaras]: can you provide a debug log with 'sip set debug' enabled? That would show the INVITE request and its processing.

By: Asterisk Team (asteriskteam) 2016-01-07 12:00:24.055-0600

Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidlines [1].

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines

By: Rusty Newton (rnewton) 2016-01-21 09:01:57.654-0600

This was closed too early. Reopening.

By: Antonis Psaras (apsaras) 2016-01-21 13:35:55.385-0600

Will  provide a debug log with 'sip set debug' enabled next time it happens. Thank you

By: Antonis Psaras (apsaras) 2016-01-26 10:52:45.735-0600

Please find attached a new log with 'sip set debug on'

By: Antonis Psaras (apsaras) 2016-05-11 04:30:26.430-0500

Any update?

By: Matt Jordan (mjordan) 2016-09-07 07:45:39.862-0500

This deadlock is being caused by usage of {{pbx_realtime}}, as shown in this part of the backtrace:

{noformat}
#12 0x000000000056b612 in pbx_exec (c=0x7f8d2c19e7a8, app=0x29754d0, data=0x7f8cce7f1ea0 "SIP/eps01/6007,45,b(11)") at pbx.c:1720
#13 0x00007f8cde23de70 in realtime_exec (chan=0x7f8d2c19e7a8, context=0x7f8d2c19f160 "DefaultPlan", exten=0x7f8d2c19f1b0 "6007", priority=2, callerid=0x7f8d2c0a36c0 "2111057800", data=0x296f6a1 "DefaultPlan") at pbx_realtime.c:352
#14 0x0000000000575f25 in pbx_extension_helper (c=0x7f8d2c19e7a8, con=0x0, context=0x7f8d2c19f160 "DefaultPlan", exten=0x7f8d2c19f1b0 "6007", priority=2, label=0x0, callerid=0x7f8d2c0a36c0 "2111057800", action=E_SPAWN, found=0x7f8cce7f4b00, combined_find_spawn=1) at pbx.c:5005
{noformat}

Closing out as a duplicate.