
Summary: ASTERISK-22859: Deadlock random
Reporter: Walter Klomp (walter.klomp)
Labels:
Date Opened: 2013-11-16 20:44:11.000-0600
Date Closed: 2013-12-07 20:55:10.000-0600
Priority: Major
Regression?:
Status: Closed/Complete
Components: Channels/chan_sip/Subscriptions
Versions: 11.6.0
Frequency of Occurrence: Frequent
Related Issues:
Environment: Ubuntu 12.04 latest updates
Attachments: ( 0) backtrace-threads-201311162357.txt
( 1) backtrace-threads-201311170115.txt
( 2) backtrace-threads-201311170140.txt
( 3) backtrace-threads-201311170208.txt
( 4) backtrace-threads-201311170305.txt
( 5) backtrace-threads-201311170413.txt
( 6) core-show-locks-201311162357.txt
( 7) core-show-locks-201311170115.txt
( 8) core-show-locks-201311170140.txt
( 9) core-show-locks-201311170208.txt
(10) core-show-locks-201311170305.txt
(11) core-show-locks-201311170413.txt
Description: Happened with 11.6.0.x and now also with SVN of asterisk-11...

Comments:

By: Walter Klomp (walter.klomp) 2013-11-16 20:48:35.503-0600

I have auto-backtrace and "core show locks" output generated whenever the server stops writing anything to full.log (indicating a deadlock, as I register with sipsak every minute to check...).
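
For reference, a minimal sketch of such a watchdog, assuming sipsak, gdb, and a DEBUG_THREADS build are available on the box; the OPTIONS ping, SIP URI, file paths, and interval below are illustrative stand-ins for the actual registration check:

{noformat}
#!/bin/sh
# Illustrative watchdog: ping Asterisk with sipsak once a minute; if it stops
# answering, capture "core show locks" and a backtrace of every thread.
while true; do
    if ! sipsak -s sip:check@127.0.0.1 >/dev/null 2>&1; then
        ts=$(date +%Y%m%d%H%M)
        asterisk -rx "core show locks" > /tmp/core-show-locks-$ts.txt
        gdb -p "$(pidof asterisk)" -batch -ex "thread apply all bt" \
            > /tmp/backtrace-threads-$ts.txt 2>&1
    fi
    sleep 60
done
{noformat}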

When this happens (at the weirdest hours of the day/night), customers can't register anymore. Previous crashes occurred because of locks caused by rtupdate in res_config_mysql; I have since changed to config_odbc and set rtupdate to "no", but the deadlocks are not going away.

Please have a look at a few reports attached.

By: Walter Klomp (walter.klomp) 2013-11-16 20:51:33.068-0600

Except for the deadlock at 23:57 (due to res_config_mysql, I think), all the other ones seem to have a common denominator in chan_sip.so... But I may be wrong.

By: Walter Klomp (walter.klomp) 2013-11-18 00:02:30.704-0600

Just now it happened again and this time I saw something in the full.log...

[Nov 18 13:49:00] ERROR[25224][C-000029b7] res_timing_timerfd.c: Failed to create timerfd timer: Too many open files
[Nov 18 13:49:00] NOTICE[25224][C-000029b7] chan_sip.c: Unable to create/find SIP channel for this INVITE

and a lot of these:
[Nov 18 13:49:00] ERROR[25224] acl.c: Cannot create socket
[Nov 18 13:49:00] ERROR[25224] acl.c: Cannot create socket
[Nov 18 13:49:00] ERROR[25224] acl.c: Cannot create socket
[Nov 18 13:49:01] ERROR[25224] acl.c: Cannot create socket
[Nov 18 13:49:01] ERROR[25224] acl.c: Cannot create socket
[Nov 18 13:49:01] ERROR[25224] acl.c: Cannot create socket
[Nov 18 13:49:01] ERROR[25224] acl.c: Cannot create socket
[Nov 18 13:49:01] ERROR[25224] acl.c: Cannot create socket
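
When "Too many open files" and socket-creation failures turn up together, a quick look at the process's descriptor count usually confirms file-descriptor exhaustion. A minimal check, assuming a single asterisk process (generic Linux commands, not taken from the report):

{noformat}
# How many descriptors the Asterisk process currently holds
ls /proc/$(pidof asterisk)/fd | wc -l

# Break the open descriptors down by type to spot what is accumulating
lsof -p $(pidof asterisk) | awk '{print $5}' | sort | uniq -c | sort -rn | head
{noformat}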

Here is my ulimit -a output
root@asterisk1:/var/log/asterisk# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 31501
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 31501
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

What can I do?


By: Matt Jordan (mjordan) 2013-11-18 09:00:38.474-0600

Hi Walter:

My guess is this is not a deadlock.

{noformat}
[Nov 18 13:49:00] ERROR[25224][C-000029b7] res_timing_timerfd.c: Failed to create timerfd timer: Too many open files
{noformat}

There are two scenarios that could be occurring here:

# You have a rather small number of file descriptors allowed on your system, 1024.
{quote}
open files (-n) 1024
{quote}
Asterisk tends to require a rather substantial number of file descriptors. You should probably increase this limit.
# If increasing the limit of file descriptors does not resolve the problem, then you may be running into a more systemic problem wherein the file descriptors are not getting released by Asterisk. With such a low limit currently in place, however, it is difficult to tell.

I would increase the number of open files allowed and see if this resolves your problem. If not, we can look at how to get information about what is occurring.
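
For what it's worth, asterisk.conf also has a maxfiles option in its [options] section that makes Asterisk raise the limit itself at startup; alternatively, the limit can be raised in the environment that launches the daemon. The value below is illustrative, and a change only applies to a freshly started process:

{noformat}
# Raise the limit in the shell or init script that launches Asterisk
ulimit -n 65536

# Verify what the *running* Asterisk process actually got
# (a shell ulimit does not affect an already-running daemon)
grep "Max open files" /proc/$(pidof asterisk)/limits
{noformat}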

By: Walter Klomp (walter.klomp) 2013-11-18 09:31:33.823-0600

Yes, I did some snooping around after I saw the error messages in full.log; I also noticed the ulimit was small, so I increased it to 99999... Let's monitor what happens next...

Meanwhile, Asterisk insists on staying in the media stream and taking up CPU; it isn't doing any transcoding, yet 20 concurrent calls take 100% CPU...
Would directrtpsetup help in this matter?

Clients are behind NAT...
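
On the direct-media question: the relevant chan_sip options live in sip.conf, but with clients behind NAT Asterisk will generally stay in the media path regardless, so this may not reduce CPU here. A hedged illustration (the option values are examples, not a recommendation from this report):

{noformat}
; sip.conf [general] excerpt (illustrative)
directmedia=nonat      ; re-invite endpoints to exchange RTP directly, but only when NAT is not involved
directrtpsetup=yes     ; attempt to set up direct media already on the initial INVITE
{noformat}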


By: Matt Jordan (mjordan) 2013-12-07 20:54:48.165-0600

Suspended due to lack of activity. Please request a bug marshal in #asterisk-bugs on the IRC network irc.freenode.net to reopen the issue should you have the additional information requested.  Further information can be found at http://www.asterisk.org/developers/bug-guidelines