Summary: ASTERISK-27023: res_rtp_asterisk: Deadlock when TURN session in use
Reporter: Jatin Jain (jatinjain)
Labels:
Date Opened: 2017-05-29 03:14:13
Date Closed: 2017-07-11 18:51:49
Priority: Minor
Regression?:
Status: Closed/Complete
Components: Resources/res_rtp_asterisk
Versions: 13.13.1
Frequency of Occurrence: Frequent
Related Issues: is duplicated by ASTERISK-27058 Deadlock in ICE / SRTP
Environment: Linux 2.6.32-431.el6.x86_64
Attachments:
(0) gstack-asterisk-pjproject-deadlock.txt
(1) res_rtp_asterisk-turn-deadlock-fix.patch
(2) thread_apply_all_bt.txt
Description: I am seeing a deadlock in Asterisk when it is used with pjproject. I am using AMI, and the AMI connection becomes blocked.

One of the threads takes mon_lock in do_monitor and then ends up calling pj_ice_sess_send_data in the pjproject library. Inside that call the thread tries to acquire another lock through pj_mutex_lock, never gets it, and keeps waiting while still holding mon_lock.

{code:xml}
#0  0x0000003dc280e264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003dc2809523 in _L_lock_892 () from /lib64/libpthread.so.0
#2  0x0000003dc2809407 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f8f2ee17d44 in pj_mutex_lock () from /usr/lib/libasteriskpj.so
#4  0x00007f8f2ee1eb72 in pj_lock_acquire () from /usr/lib/libasteriskpj.so
#5  0x00007f8f2ee1ecd8 in grp_lock_acquire () from /usr/lib/libasteriskpj.so
#6  0x00007f8f2ee1f21e in pj_grp_lock_acquire () from /usr/lib/libasteriskpj.so
#7  0x00007f8f2edae862 in pj_ice_sess_send_data () from /usr/lib/libasteriskpj.so
#8  0x00007f8e9fea1afd in __rtp_sendto.clone.4 () from /usr/lib/asterisk/modules/res_rtp_asterisk.so
#9  0x00007f8e9fea8f98 in ast_rtcp_write_report () from /usr/lib/asterisk/modules/res_rtp_asterisk.so
#10 0x00007f8e9fea972d in ast_rtcp_write () from /usr/lib/asterisk/modules/res_rtp_asterisk.so
#11 0x00000000005c3abe in ast_sched_runq ()
#12 0x00007f8ebc1ed8df in do_monitor () from /usr/lib/asterisk/modules/chan_sip.so
#13 0x00000000006043b8 in dummy_start ()
#14 0x0000003dc28079d1 in start_thread () from /lib64/libpthread.so.0
#15 0x0000003dc20e8b6d in clone () from /lib64/libc.so.6
{code}

No other thread can then acquire mon_lock, which leads to the deadlock.

As mentioned in [this|https://community.asterisk.org/t/help-with-asterisk-deadlock-possible-bug/65439] post, I initially thought it was a duplicate of ASTERISK-25275 manifesting in a different way. However, I am using pjproject version 2.5.5, which already has the fix mentioned there, and the issue still persists.

Comments:

By: Asterisk Team (asteriskteam) 2017-05-29 03:14:14.876-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Jatin Jain (jatinjain) 2017-05-30 02:11:26.446-0500

I have attached another thread analysis. Some observations:

The following threads were waiting for a self lock:
101 - (LWP 5043)
96 - (LWP 23266)
92 - (LWP 20724)
80 - (LWP 6867)
75 - (LWP 29646)
44 - (LWP 17584)

This lock is held by another thread, 9 (LWP 17451), which in turn needs a lock held by thread 65 (LWP 5454).

From here the problem is inside pjproject. There are 4 threads using pjproject:

Thread 65: Sending RTP packets to the remote end; stuck acquiring the group lock on the ICE session.
Thread 28: Sending RTCP packets to the remote end; stuck acquiring the group lock on the ICE session.
Thread 50: Has received a STUN indication message, has acquired a group lock on the TURN and STUN sessions, and is stuck acquiring the group lock on the ICE session.
Thread 26: Is sending a STUN indication message. It has acquired the ICE session lock and is stuck acquiring the group lock on the TURN session.

Threads 50 and 26 each hold the lock the other one needs, so these threads appear to be deadlocked.
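
The thread analysis above can be checked mechanically by treating it as a wait-for graph. The following is only a minimal sketch in plain C: the `waiter` struct, the `lock_id` enum, and `in_deadlock_cycle` are hypothetical names made up for illustration and are not part of Asterisk or pjproject. It assumes each thread holds at most one of the two locks, which simplifies thread 50 (it holds the TURN and STUN session locks) down to "holds TURN".

```c
#include <assert.h>

/* Hypothetical model of the gdb snapshot; not Asterisk or pjproject code. */
enum lock_id { LOCK_NONE = -1, LOCK_ICE = 0, LOCK_TURN = 1 };

struct waiter {
    int thread_no;        /* thread number as shown in the gdb dump */
    enum lock_id holds;   /* lock the thread already owns */
    enum lock_id wants;   /* lock the thread is blocked on */
};

/* Simplified snapshot taken from the comment above. */
static const struct waiter snapshot[] = {
    { 65, LOCK_NONE, LOCK_ICE  },   /* sending RTP */
    { 28, LOCK_NONE, LOCK_ICE  },   /* sending RTCP */
    { 50, LOCK_TURN, LOCK_ICE  },   /* received STUN indication */
    { 26, LOCK_ICE,  LOCK_TURN },   /* sending STUN indication */
};
#define SNAPSHOT_LEN ((int)(sizeof(snapshot) / sizeof(snapshot[0])))

/*
 * Follow "wanted lock -> owner of that lock" starting at w[start].
 * Returns 1 if the chain leads back to w[start] (the thread is part of
 * a deadlock cycle), 0 if the chain ends or only reaches other threads.
 */
static int in_deadlock_cycle(const struct waiter *w, int n, int start)
{
    int cur = start;

    assert(start >= 0 && start < n);
    for (int steps = 0; steps < n; steps++) {
        enum lock_id wanted = w[cur].wants;
        int owner = -1;

        if (wanted == LOCK_NONE) {
            return 0;               /* thread is not blocked */
        }
        for (int i = 0; i < n; i++) {
            if (w[i].holds == wanted) {
                owner = i;
                break;
            }
        }
        if (owner < 0) {
            return 0;               /* wanted lock is free */
        }
        if (owner == start) {
            return 1;               /* chain closed on itself */
        }
        cur = owner;
    }
    return 0;                       /* blocked behind a cycle, not in it */
}
```

In this model threads 50 and 26 form the cycle, while threads 65 and 28 are merely queued behind it on the ICE session lock.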



By: Richard Mudgett (rmudgett) 2017-06-02 11:06:48.382-0500

This is likely fixed by the reentrancy locking patch done for ASTERISK-26835 and ASTERISK-26853.  That fix is released in v13.16.0.

By: Richard Mudgett (rmudgett) 2017-06-02 11:10:02.599-0500

Please test v13.16.0, which contains the fix mentioned above, to see if the problem is resolved.

By: Jatin Jain (jatinjain) 2017-06-12 04:03:36.674-0500

The problem still occurs with Asterisk 13.16.

There are two threads which are in a deadlock.

The first thread is receiving and processing a STUN indication from the TURN server. This thread acquires a lock on the TURN session and wants a lock on the ICE session.
The second thread is sending BINDING indications to the peer. This thread acquires a lock on the ICE session and wants a lock on the TURN session.

Steps to reproduce a similar trace:

1. Add a sleep of 5 seconds in the function "stun_on_rx_indication" in turn_session.c.
2. Run Asterisk with a TURN server, enabled through rtp.conf. (The TURN server is sending STUN binding indications.)
3. Make a WebRTC call.

FYI, I am using coturn as the TURN server.



By: Michael Walton (mike@farsouthnet.com) 2017-06-20 16:24:01.515-0500

Patch from duplicate ASTERISK-27058. This patch ensures that the TURN session is passed the shared group lock of the ICE session, removing the possibility of deadlock due to unordered acquisition of multiple locks. Patch is against 13.16.0.

By: Jatin Jain (jatinjain) 2017-06-29 05:20:09.489-0500

I can't apply and test the patch because I am using Asterisk 13.13, and upgrading to 13.16 is not feasible at the moment. For now I have stopped using the TURN server, which avoids the problem, but I need to use TURN in my other setups.

By: Richard Mudgett (rmudgett) 2017-07-06 16:24:06.614-0500

[~mike@farsouthnet.com] - The patch is up on gerrit for review.  I modified your patch slightly to guarantee that the ice pointer will be valid.

[~jatinjain] and [~mike@farsouthnet.com] - Please test as I don't have an environment to test with.  I have a note in the commit message about a concern I have where the group locks of the ICE/STUN and TURN sessions could be different while the sessions are recreated.  To eliminate the window entirely would require a more invasive patch that I would not be able to test.