Summary: ASTERISK-21040: Deadlock involving chan_sip.c, pbx.c and autoservice.c, locking on chan and &conclock
Reporter: Andrew Nowrot (andrutto)
Labels:
Date Opened: 2013-02-06 04:13:05.000-0600
Date Closed: 2013-11-10 20:30:57.000-0600
Priority: Major
Regression?
Status: Closed/Complete
Components: Channels/chan_sip/General, Core/PBX, PBX/pbx_realtime
Versions: 11.2.0, 11.2.1
Frequency of Occurrence: Occasional
Related Issues:
is duplicated by ASTERISK-21228 Deadlock in pbx_find_extension when attempting an autoservice stop due to holding the context lock
is duplicated by ASTERISK-20537 Asterisk deadlocks between looking up extension from process_sdp and bridge execution from pbx_realtime
is duplicated by ASTERISK-22835 pbx_realtime: deadlock with channel in autoservice while calling realtime switch
Environment: Linux Debian wheezy, kernel 3.2.34, x86_64 GNU/Linux
Attachments:
( 0) backtrace-threads.txt
( 1) core-show-locks.txt
Description: Occasionally Asterisk deadlocks (calls get no response, and no SIP INVITEs or REGISTERs are processed). Sometimes it runs for a week, sometimes only for several hours. System load, CPU, and RAM usage are fine.
Comments:
By: Rusty Newton (rnewton) 2013-02-14 19:39:45.506-0600

Thanks. What else can you tell us about the system? What channel types are being used? Outbound/Inbound only? Etc.

Is this a really high volume scenario?

Could you provide an Asterisk full log excerpt with VERBOSE and DEBUG at level 5 showing right when the deadlock occurs?

By: Andrew Nowrot (andrutto) 2013-02-15 04:54:02.445-0600

It is Debian Linux running on an Intel platform. The average load is between 0.02 and 0.1, with a maximum of 5 simultaneous calls, so it is not a high-volume system. I am using only SIP channels, in both directions (inbound/outbound). The next time it occurs I will send some logs. It has now been running for six days, so my guess is that it can happen at any time.



By: Modulus (modulus) 2013-02-25 10:41:08.652-0600

We ran into a similar situation (a deadlock with no response to calls and no SIP INVITEs or REGISTERs processed) on an Asterisk 10.12.1 system running Debian Linux wheezy.

Since this happened on a production system we had to restart immediately, without taking any backtraces.
This happened only once (after 8 hours of Asterisk running) and until now (2-3 days since the event) it has not occurred again.
We have now written a script that checks the logs for SIP registrations; if none appear for some time, it runs gdb to take backtraces and then restarts Asterisk.
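
A watchdog of the kind described above could be sketched as follows. This is a minimal illustration, not the script actually used in this report. It assumes that logger.conf sets dateformat=%F %T (so log lines start with a timestamp such as [2013-02-25 10:41:08]), that successful registrations are logged with the text "Registered SIP", and that the Asterisk binary lives at /usr/sbin/asterisk. The function names and the ten-minute threshold are hypothetical, and the restart step is left out because it is site-specific.

```python
import re
import subprocess
from datetime import datetime, timedelta

# Assumption: logger.conf uses dateformat=%F %T, e.g. "[2013-02-25 10:41:08]".
STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]")
# Assumption: the text chan_sip logs on a successful registration.
MARKER = "Registered SIP"


def last_registration(lines):
    """Return the timestamp of the newest registration line, or None."""
    latest = None
    for line in lines:
        if MARKER not in line:
            continue
        match = STAMP.match(line)
        if match:
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            if latest is None or ts > latest:
                latest = ts
    return latest


def looks_stuck(lines, now, max_silence=timedelta(minutes=10)):
    """True if no registration was logged within max_silence before now."""
    latest = last_registration(lines)
    return latest is None or now - latest > max_silence


def gdb_backtrace_command(pid, binary="/usr/sbin/asterisk"):
    """Build a gdb invocation that dumps a backtrace of every thread."""
    return ["gdb", "-batch", "-ex", "thread apply all bt", binary, str(pid)]


def capture_backtrace(pid, out_path):
    """Attach gdb to the running process and write the backtraces to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(gdb_backtrace_command(pid), stdout=out,
                       stderr=subprocess.STDOUT, check=False)
```

A cron job could read the tail of the log, call looks_stuck with the current time, and only then call capture_backtrace before restarting. Capturing the backtrace before the restart is the important ordering, since the thread state is lost once the process exits.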

We are using SIP channels, as well as Local channels.

At the time of the event, there were some active calls that hung up after a while, except for one that remained stuck in the system indefinitely (until the restart). However, according to our upstream provider's CDRs, that particular call should have ended 16 seconds after being answered, exactly when SIP dialogs stopped arriving.
When we reproduced that call after the restart, 'core show channels' showed two SIP channels and four Local channels, as expected. Just before the restart, it had shown only two SIP channels and one Local channel (the other three Local channels were missing).

We think a deadlock happened at the moment the stuck call should have hung up, and that it may be related to the Local channels associated with that call. It would be useful to know whether Local channels are also used in Andrew Nowrot's case.

We will come back with backtraces when it happens again.

By: Rusty Newton (rnewton) 2013-02-25 18:23:13.508-0600

Are either of you using Asterisk Realtime and if so, in what way, and with what backend?

Are you using realtime for dialplan?

If you are using realtime, do you have a way to attempt reproduction without realtime in the mix?

By: Modulus (modulus) 2013-02-25 18:48:42.375-0600

We are using realtime for SIP users and peers with a MySQL backend:

sipusers => mysql,general,sip_buddies
sippeers => mysql,general,sip_buddies

but not for dialplan (extensions.conf is a static file).

We are also currently using the rtcachefriends=yes option in sip.conf.

By: Andrew Nowrot (andrutto) 2013-02-26 02:56:24.478-0600

We are using realtime for SIP and for extensions with a PostgreSQL backend.

sipusers => pgsql,asteriskdb,sip
sippeers => pgsql,asteriskdb,sip
extensions => pgsql,asteriskdb,extensions

The rtcachefriends=yes option is set in sip.conf.

The system has now been running for 17 days without any problems.

By: Modulus (modulus) 2013-03-02 06:11:06.286-0600

Finally, after one week, our installation (Asterisk 10.12.1) had a new deadlock.

We are attaching the backtraces.
If our deadlock is unrelated to this issue, please feel free to split it off.

By: Modulus (modulus) 2013-03-07 03:09:29.677-0600

After analyzing our backtrace, it appears to be a fax gateway problem.

I removed our backtrace from this issue to avoid confusion.
We will open a new issue for the bug we found.

By: Dare Awktane (awktane) 2013-04-22 00:33:22.077-0500

Is this related to ASTERISK-21228?

By: Matt Jordan (mjordan) 2013-11-10 20:30:57.175-0600

Closing out as a duplicate of ASTERISK-21228. Since that issue has received more traffic, this issue will be tracked there. Thanks!