
Summary: ASTERISK-22614: Asterisk 12 using 308.000 handles with 60 open calls
Reporter: Private Name (falves11)
Date Opened: 2013-09-28 02:51:29
Date Closed: 2013-10-27 14:05:15
Priority: Major
Status: Closed/Complete
Components: Channels/chan_pjsip
Versions: 12.0.0-alpha1
Related Issues:
is related to ASTERISK-22709 crash: atxfer threeway call results in crash while creating channel snapshot
is related to ASTERISK-22731 Crash on incoming chan_pjsip call where dialplan hangs up before ACK is received for INVITE
Environment: Debian 7
Attachments:
(0) backtrace.txt
(1) backtrace.txt
(2) backtrace.txt
(3) dialplan.txt
Description: My application crashes when it reaches 100 channels, and I have just compiled it without optimization. With 67 open calls:
lsof | grep asterisk | wc -l
308261
but:
lsof -i | grep asterisk | wc -l
1024
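Note that `lsof | grep asterisk | wc -l` counts output lines, not descriptors. lsof also prints rows for the working directory, the program text and memory-mapped libraries, may (depending on the lsof version) repeat rows per thread, and `grep asterisk` matches any line that merely contains the word. On Linux, a tighter count is the number of entries in `/proc/<pid>/fd`. The sketch below is a hypothetical helper (`count_fds` is not part of Asterisk or lsof) that returns the total descriptor count and the socket count; the socket figure is a superset of what `lsof -i` lists, because it also includes Unix-domain sockets.

```python
import os
import sys


def count_fds(pid: int) -> tuple[int, int]:
    """Return (total descriptors, socket descriptors) for a Linux process.

    /proc/<pid>/fd holds one symlink per open descriptor. Inspecting a
    process owned by another user usually requires root.
    """
    fd_dir = f"/proc/{pid}/fd"
    total = 0
    sockets = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except FileNotFoundError:
            # The descriptor was closed between listdir() and readlink().
            continue
        total += 1
        if target.startswith("socket:"):
            sockets += 1
    return total, sockets


if __name__ == "__main__" and len(sys.argv) > 1:
    pid = int(sys.argv[1])
    total, sockets = count_fds(pid)
    print(f"pid {pid}: {total} descriptors, {sockets} sockets")
```

Run it as, for example, `python3 count_fds.py $(pidof asterisk)` and compare the result with the lsof line count.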

There is a huge handle leak. Even 1024 network handles for 68 calls is unheard of. Another problem is that the command "core restart now" does not work: it hangs forever, and no new calls are accepted.
I am not an engineer, so if somebody graciously wants to log into my box, I will point the traffic to it. If nobody can help, I understand.
But I think we can take advantage of my testing with lots of traffic, and thus debug the app.
If I compile Asterisk with "debug threads", the app never finishes loading.
The number of handles keeps going up while the number of channels stays the same.
Within half an hour it reaches:
lsof | grep asterisk | wc -l
613388
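Handles that keep growing while the channel count stays flat point to a leak rather than to load. The hypothetical helper below makes that comparison explicit by looking at handles per channel across two samples. The 10% tolerance is an arbitrary assumption, and the test values reuse the counts reported above, with 67 channels assumed for the second sample because the report says the channel count did not change.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    channels: int  # active channels when the sample was taken
    handles: int   # descriptor (or lsof line) count at the same moment


def handles_per_channel(s: Sample) -> float:
    """Handles per active channel; infinite when no channels are up."""
    return s.handles / s.channels if s.channels else float("inf")


def looks_like_leak(first: Sample, later: Sample, tolerance: float = 0.10) -> bool:
    """Heuristic: flag a leak when handles per channel grow by more than
    `tolerance` (10% by default) between the two samples."""
    before = handles_per_channel(first)
    after = handles_per_channel(later)
    return after > before * (1 + tolerance)
```

With the numbers above, 613388 / 67 is roughly twice 308261 / 67, so the check fires; doubling both the channels and the handles would not.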

Comments:By: Private Name (falves11) 2013-09-28 02:58:59.267-0500

This is my dialplan.

By: Matt Jordan (mjordan) 2013-09-28 20:16:46.917-0500

There's a good chance you were running into an RTP port leak that just got fixed. Try with a fresh checkout of the Asterisk 12 branch, post r399924.

By: Private Name (falves11) 2013-09-28 21:15:28.277-0500

This is a trace of a crash after I updated my SVN checkout to r400121.
The handle leak is gone, but it is very unstable.

By: Private Name (falves11) 2013-09-28 21:16:49.812-0500

I updated to revision 400121 and it crashed after 5 minutes with a few dozen calls.

By: Private Name (falves11) 2013-09-28 23:33:54.457-0500

I switched to regular SIP (chan_sip) instead of PJSIP, and with 160 open calls:
lsof -i | grep asterisk | wc -l
502
lsof | grep asterisk | wc -l
769230

I think 769230 handles is absurd. The 502 network handles, however, are fine, so there is still another leak.
If it keeps growing, the machine will be dead.
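Since the network descriptors look reasonable under chan_sip, the growth has to be in some other kind of handle, and grouping the process's descriptors by kind narrows that down. This is a hypothetical Linux-only sketch (`classify_target` and `fd_kinds` are illustrative names): it relies on `/proc/<pid>/fd` symlinks showing sockets as `socket:[…]`, pipes as `pipe:[…]` and eventfd/epoll/inotify objects as `anon_inode:…`, and it treats everything else as a regular file.

```python
import os
from collections import Counter


def classify_target(target: str) -> str:
    """Map a /proc/<pid>/fd symlink target to a coarse descriptor kind."""
    for prefix, kind in (
        ("socket:", "socket"),
        ("pipe:", "pipe"),
        ("anon_inode:", "anon_inode"),
    ):
        if target.startswith(prefix):
            return kind
    return "file"


def fd_kinds(pid: int) -> Counter:
    """Count a process's open descriptors by kind (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    kinds: Counter = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except FileNotFoundError:
            continue  # descriptor closed while we were scanning
        kinds[classify_target(target)] += 1
    return kinds
```

Taking two readings a few minutes apart and comparing them shows which kind is growing.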

By: Private Name (falves11) 2013-09-29 07:45:32.208-0500

It crashes once per hour. Here is the largest dump.


By: Private Name (falves11) 2013-09-29 09:33:45.093-0500

The issue is in the CDR portion. After reading the trace, I disabled the hangup handler and it is holding steady with 400 channels.
If you read my dialplan, you will see that the only purpose of the hangup handler is to populate a CDR(field) with HANGUPCAUSE and DIALSTATUS. The CDR is not there, and that is the cause of the crash (I am guessing here).
However, I still see this error on the console every 10 or so calls:
[Sep 29 10:24:22] ERROR[14822][C-000060d0]: cdr.c:2958 ast_cdr_getvar: Unable to find CDR for channel SIP/8.19.245.37-14-0000c0e7
It may mean that the CDR records are lost, the memory is not there, etc. I am afraid I am losing records and tons of money. Could somebody look into this?
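To put a number on how often this happens, the log can be scanned for that exact message. The sketch below assumes the log line layout shown above (`[timestamp] LEVEL[thread][call-id]: file:line function: message`); the regular expression and the `missing_cdr_channels` name are hypothetical. On its own, a match only shows that `ast_cdr_getvar` could not find a CDR for that channel at that moment; it does not by itself prove that a record was lost.

```python
import re
from collections import Counter
from typing import Iterable

# Matches lines such as:
# [Sep 29 10:24:22] ERROR[14822][C-000060d0]: cdr.c:2958 ast_cdr_getvar:
#   Unable to find CDR for channel SIP/8.19.245.37-14-0000c0e7
MISSING_CDR = re.compile(
    r"ERROR\[\d+\](?:\[(?P<callid>[^\]]+)\])?: \S+ ast_cdr_getvar: "
    r"Unable to find CDR for channel (?P<channel>\S+)"
)


def missing_cdr_channels(lines: Iterable[str]) -> Counter:
    """Count 'Unable to find CDR' errors per channel name."""
    counts: Counter = Counter()
    for line in lines:
        match = MISSING_CDR.search(line)
        if match:
            counts[match.group("channel")] += 1
    return counts
```

For example, assuming the default `messages` log file, `sum(missing_cdr_channels(open("/var/log/asterisk/messages")).values())` gives the total number of failed lookups, which can then be compared with the number of calls handled.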


By: Matt Jordan (mjordan) 2013-09-29 17:20:30.390-0500

{quote}
It may mean that the CDR records are lost, the memory is not there, etc. I am afraid I am losing records and tons of money. Could somebody look into this?
{quote}

First, if you need support, I highly suggest you look to the asterisk-biz mailing list to contact interested developers in the Asterisk Developer Community. The issue tracker is not a place to solicit for support.

Second, did you actually deploy an alpha release for live customers?

Someone will look at this issue - but if you actually have a system deployed using an alpha release of Asterisk 12, then you should probably go back and re-read the [release announcement|http://lists.digium.com/pipermail/asterisk-announce/2013-August/000482.html].

{quote}
The first preliminary test release of Asterisk 12 is an alpha release, not a
beta release. Due to the size and scope of the changes in Asterisk 12, both an
alpha test cycle and a beta test cycle will be performed. While users are
encouraged to participate in both test cycles, users who choose to participate
in the alpha release testing should understand that an alpha release has not
undergone all of the community testing that a beta release goes through.
{quote}

In case that isn't clear, neither an alpha nor a beta release is a production-ready release. That has never been the intent in any release of Asterisk, much less a [Standard release|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Versions] (which Asterisk 12 is) containing numerous architectural changes. Alpha and beta releases are done so that the community can help the Asterisk Developer Community test the release; not to deploy in situations where you may have live customers impacted.

By: Private Name (falves11) 2013-09-29 19:44:52.578-0500

Well, I guess that when I read about the changes in Asterisk 12, I saw that this time it was going to work right. Until now, Asterisk has crashed easily under pressure: to handle thousands of calls I need OpenSIPS as a load balancer in front of several (up to 12) independent Asterisk instances on the same box. But Asterisk 12 may be a game changer, since we may no longer need a load balancer in the middle. I guess I was right: my single Asterisk 12 instance has been working fine for more than 14 hours with 400 to 500 channels. Something works well in Asterisk 12. I am not using PJSIP yet, because it crashes.
The thing that still needs to be worked out in Asterisk 12 is the CDR.
Also, I think PJSIP will put Asterisk on par with any softswitch.

By: Private Name (falves11) 2013-09-30 08:03:02.477-0500

It crashed after 16 hours of running well. This is the trace.

By: Rusty Newton (rnewton) 2013-10-18 18:20:12.531-0500

We appreciate you testing 12, but we definitely don't recommend running an Alpha or Beta on a production site. You are going to have glitches as the new features in 12 get smoothed out.

Regarding your latest crash, quite a few crashes have been fixed since the alpha was released. Please try the latest SVN revision of the 12 branch and re-test. The beta has been released, and a number of fixes have gone in even since then.

By: Matt Jordan (mjordan) 2013-10-27 14:05:05.880-0500

The most recent backtrace devolves this issue into a duplicate of ASTERISK-22709 - that is, there is a race condition in creating a channel snapshot for a channel on a thread other than the PBX thread. The channel is not locked; hence, there is no guarantee that the channel is not disposed of during creation of the snapshot.
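The race described here is a general pattern: one thread copies fields out of a shared object while another thread may be tearing it down. The usual remedy is to hold the object's lock for the whole copy and to check that the object is still alive before copying. The Python sketch below illustrates only that discipline. `Channel`, `ChannelSnapshot`, `snapshot` and `hangup` are hypothetical stand-ins, not Asterisk's API (in Asterisk's C code the corresponding primitives are `ast_channel_lock()` and `ast_channel_unlock()`), and it is not a description of how ASTERISK-22709 was fixed.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChannelSnapshot:
    """Immutable copy of a channel's fields at one instant."""
    name: str
    state: str
    caller: str


class Channel:
    """Hypothetical shared channel object guarded by its own lock."""

    def __init__(self, name: str, caller: str) -> None:
        self._lock = threading.Lock()
        self.name = name
        self.state = "Up"
        self.caller = caller
        self.disposed = False

    def snapshot(self) -> Optional[ChannelSnapshot]:
        # Hold the lock for the entire copy so hangup() cannot interleave
        # and leave us with a mix of live and torn-down fields.
        with self._lock:
            if self.disposed:
                return None  # channel already gone: no snapshot
            return ChannelSnapshot(self.name, self.state, self.caller)

    def hangup(self) -> None:
        # Tear-down takes the same lock, so it waits for any copy in progress.
        with self._lock:
            self.state = "Down"
            self.caller = ""
            self.disposed = True
```

Without the lock in `snapshot()`, a copy could read `name` before `hangup()` runs and `caller` after it, producing a snapshot of a channel that is already being disposed of.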

As such, I'm going to go ahead and close this issue out as a duplicate of ASTERISK-22709. If you are able to reproduce the handle usage problems with the latest from the 12 branch, please comment on this issue and we'll reopen it. Thanks!