[Home]

Summary:ASTERISK-25797: app_queue: Crash when calling a queue with a member with a forward to an nonexistent extension
Reporter:Etienne Lessard (hexanol)Labels:
Date Opened:2016-02-16 11:08:47.000-0600Date Closed:2016-08-12 08:54:37
Priority:MajorRegression?
Status:Closed/CompleteComponents:Applications/app_queue
Versions:13.6.0 13.7.2 Frequency of
Occurrence
Constant
Related
Issues:
is duplicated byASTERISK-26286 ERROR[9264][C-0000072f] astobj2.c: user_data is NULL
is duplicated byASTERISK-26381 app_queue: Crash possibly related to very long dial string
is duplicated byASTERISK-26486 Forwarded call from Digium Phone d40/d70 may crash asterisk
Environment:Debian 8.3 amd64Attachments:( 0) AST-25797.backtrace.txt
( 1) AST-25797.debug.log
( 2) AST-25797.log
( 3) AST-25797.pcap
Description:Given I have a queue with 1 member: Alice (SIP/alice)
Given Alice's phone has a forward (SIP native) to the 444 extension, and the 444 extension isn't defined in the dialplan
When Bob calls the queue
Then asterisk get stuck in what seems to be an infinite loop, consuming a lot of CPU time, and rapidly consuming (in the matter of seconds) all the available memory in the system, until the kernel OOM killer terminate the process

In my example, I used the following queues.conf:
{code}
[sales]
strategy = ringall
member => SIP/alice
{code}
And the following extensions.conf:
{code}
[default]
exten = 101,1,NoOp()
same  =   n,Queue(sales)
same  =   n,Hangup()
{code}
When Alice's phone receives a SIP INVITE from asterisk, since it has an unconditional forward enabled, it respond with a "302 Moved Temporarily" with the header "Contact: <sip:444@10.34.1.11>".

I've attached the interesting part of the log.

Thank you
Comments:By: Asterisk Team (asteriskteam) 2016-02-16 11:08:50.055-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Etienne Lessard (hexanol) 2016-02-16 11:11:44.644-0600

I forgot to mention that this bug seems to have some resemblance with ASTERISK-25384 .

By: Rusty Newton (rnewton) 2016-02-16 19:37:40.512-0600

Thank you for the crash report. However, we need more information to investigate the crash. Please provide:

1. A backtrace generated from a core dump using the instructions provided on the Asterisk wiki [1].
2. Specific steps taken that lead to the crash.
3. All configuration information necesary to reproduce the crash.

Thanks!

[1]: https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace



By: Rusty Newton (rnewton) 2016-02-16 19:41:43.196-0600

That canned message asks for some things you already provided.  More specifically we want the backtrace.  Thanks Etienne!

By: Etienne Lessard (hexanol) 2016-02-17 07:09:59.273-0600

Note that this is not a "typical" crash where the process receives a SIGSEGV or SIGABRT signal and generates a core dump. The problem is that the asterisk process try to allocate an infinite amount of memory, which leads to an out-of-memory situation. The process terminates because the kernel out-of-memory killer kills it. By default no core dump is generated.

Still, I've generated a core dump by manually issuing a SIGABRT to the process while it was in the "inifite loop, eating all system memory" state (but before reaching an out-of-memory situation). I've attached the corresponding backtrace. (the core dump was generated using a slightly different setup than described in my original comment; the extension called is 1001 and the queue name is "file1").

By: Rusty Newton (rnewton) 2016-02-26 09:55:17.687-0600

Thanks again for the info.

I haven't been able to reproduce this yet, can you provide some additional information by following the instructions here:

https://wiki.asterisk.org/wiki/display/AST/Memory+Leak+Debugging

Be sure to get the log with the 'debug' channel on if possible.

By: Etienne Lessard (hexanol) 2016-02-26 14:16:17.411-0600

I'm having some trouble running Asterisk with the MALLOC_DEBUG compiler option, so I'm only attaching a debug log and a network capture.

But I've taken another look at the code, and in my case, in app_queue.c, the instruction "o->chan = ast_request(tech, ast_channel_nativeformats(in), NULL, in, stuff, &status);" set o->chan to NULL;  then, still in app_queue, it logs "Forwarding failed to create channel to dial '%s/%s'\n","; then app_queue calls "ast_channel_publish_dial(qe->chan, o->chan, stuff, NULL);", but o->chan is NULL at this time (we see it in the backtrace), and then ast_channel_publish_dial doesn't seem to handle the case were the second argument (peer) is NULL, entering an infinite loop (oh and maybe memory consumption goes up because the logs are produced by this thread at an higher rate that can be consumed by the thread responsible to write it to the files).

By: Rusty Newton (rnewton) 2016-08-11 09:27:05.293-0500

ASTERISK-26286 duplicates this issue. The information from it should be useful for resolving this.

[~mjordan] had some analysis as well on the community forum: https://community.asterisk.org/t/asterisk-error-astobj2-c-user-data-is-null/49125/9

{quote}
@david551 is right in that fixing the issue where the forward fails would prevent the error from occurring. That being said, I think the continual errors that occur after failing to forward the Local channel occur are due to an improperly handled off-nominal condition.

I took a look at the log statement just prior to the errors being reported from astobj2:
{noformat}
if (ast_call(o->chan, stuff, 0)) {
ast_log(LOG_NOTICE, "Forwarding failed to dial '%s/%s'\n",
tech, stuff);
do_hang(o);
numnochan++;
}
}

ast_channel_publish_dial(qe->chan, o->chan, stuff, NULL);
ast_channel_publish_dial_forward(qe->chan, original, o->chan, NULL,
"CANCEL", ast_channel_call_forward(original));
{noformat}
You can see that if we log out Forwarding failed to dial ..., we immediately hangup the o channel:
{noformat}
static void do_hang(struct callattempt *o)
{
o->stillgoing = 0;
ast_hangup(o->chan);
o->chan = NULL;
}
{noformat}
Note that o->chan is now NULL.

Immediately after this, we attempt to raise events about the now cancelled dial:
{noformat}
ast_channel_publish_dial(qe->chan, o->chan, stuff, NULL);
ast_channel_publish_dial_forward(qe->chan, original, o->chan, NULL,
"CANCEL", ast_channel_call_forward(original));
{noformat}
Those function calls are going to try and interact with o->chan, which is now quite NULL.
{quote}

By: Jaco van Niekerk (faqterson) 2016-09-16 07:58:46.301-0500

I am experiencing the same problem and I have been able to replicated the fault:

Connected to Asterisk 13.11.2 currently running on localhost (pid = 30307)

   -- Got SIP response 302 "Moved Temporarily" back from 192.168.97.71:5060
   -- Now forwarding DAHDI/i1/0871350000-1 to 'Local/444@internal' (thanks to SIP/4000-00000000)
[2016-09-16 14:49:44] NOTICE[30388][C-00000001]: core_local.c:740 local_call: No such extension/context 444@internal while calling Local channel
[2016-09-16 14:49:44] NOTICE[30388][C-00000001]: app_queue.c:4962 wait_for_answer: Forwarding failed to dial 'Local/444@internal'
[2016-09-16 14:49:44] ERROR[30388][C-00000001]: astobj2.c:119 INTERNAL_OBJ: user_data is NULL
[2016-09-16 14:49:44] ERROR[30388][C-00000001]: astobj2.c:119 INTERNAL_OBJ: user_data is NULL
[2016-09-16 14:49:44] ERROR[30388][C-00000001]: astobj2.c:119 INTERNAL_OBJ: user_data is NULL



By: Jaco van Niekerk (faqterson) 2016-09-16 08:00:22.559-0500

Never mind I found a way to resolve the problem.

i: Ignore call forward requests from queue members and do nothing when they are requested.

{{exten => s,10,Queue($QUEUE,$OPTIONStTi,,,$TIME)}}

By: Joshua C. Colp (jcolp) 2016-09-16 08:00:58.822-0500

This issue has been fixed but is not yet in a release.