[Home]

Summary:ASTERISK-22644: Crash with app queue and DND set on SIP agent phone
Reporter:Marco Signorini (marcoloway)Labels:
Date Opened:2013-10-03 05:08:28Date Closed:2013-11-06 16:04:01.000-0600
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Applications/app_queue Core/Stasis
Versions:12.0.0-alpha1 Frequency of
Occurrence
Constant
Related
Issues:
Environment:Centos 6.4 i386 + Grandstream GXP2000 phonesAttachments:( 0) backtrace_101413.txt
( 1) backtrace.txt
( 2) extensions.conf
( 3) queues.conf
( 4) sip_trace_wshark.dump
( 5) sip_trace.txt
( 6) sip.conf
Description:I'm facing with a problem running a queue on Asterisk 12. It seems always replicable in my box.
This is my setup:

Asterisk SVN-branch-12-r400356M running on a CentOS 6.4 fresh install on bare Celeron machine.

A queue defined as:

{noformat}
[300]
ringinuse=yes
strategy=ringall
timeout=15
weight=0
wrapuptime=0
{noformat}

Two agents logged on that queue:

{noformat}
*CLI> queue show 300
300 has 0 calls (max unlimited) in 'ringall' strategy (0s holdtime, 0s talktime), W:0, C:0, A:0, SL:0.0% within 0s
  Members:
     SIP/204 (ringinuse enabled) (dynamic) (Not in use) has taken no calls yet
     SIP/200 (ringinuse enabled) (dynamic) (Not in use) has taken no calls yet
  No Callers
{noformat}

The SIP/204 is a Grandstream GXP2000 and has the DND set so it will never ring.

As soon as a call enters in the queue, the SIP/200 starts ringing and the SIP/204 answers with a Busy back message. If the call is not answered, after the timeout period the PBX tries to contact the two agents but as soon as it contacts the SIP/204 it crashes.

{noformat}
*CLI>   == Using SIP RTP CoS mark 5
   -- Executing [300@from-internal:1] NoOp("SIP/201-00000000", ""Called Queue 300") in new stack
   -- Executing [300@from-internal:2] Queue("SIP/201-00000000", "300,t,,") in new stack
   -- Started music on hold, class 'default', on SIP/201-00000000
 == Using SIP RTP CoS mark 5
   -- Called SIP/200
 == Using SIP RTP CoS mark 5
   -- Called SIP/204
   -- SIP/204-00000002 connected line has changed. Saving it until answer for SIP/201-00000000
   -- SIP/200-00000001 connected line has changed. Saving it until answer for SIP/201-00000000
   -- Got SIP response 486 "Busy" back from 10.10.5.117:5066
   -- SIP/204-00000002 is busy
   -- Nobody picked up in 0 ms
   -- SIP/200-00000001 is ringing
   -- SIP/200-00000001 is ringing
   -- Nobody picked up in 15000 ms
   -- Nobody picked up in 15000 ms
Segmentation fault (core dumped)
{noformat}

I've generated a core dump and run the dgb to have the stack trace. Seems the PBX tries to manage a NULL channel.

Here are the details:

Core was generated by `asterisk -cvvvg'.
Program terminated with signal 11, Segmentation fault.

{noformat}
#0  0x080f4a6f in ast_channel_tech (chan=0x0) at channel_internal_api.c:876
876     {
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6_4.4.i686 libuuid-2.17.2-12.9.el6_4.3.i686 libxml2-2.7.6-12.el6_4.1.i686 ncurses-libs-5.7-3.20090208.el6.i686 nss-softokn-freebl-3.14.3-3.el6_4.i686 sqlite-3.6.20-1.el6.i686 zlib-1.2.3-29.el6.i686
(gdb) bt
#0  0x080f4a6f in ast_channel_tech (chan=0x0) at channel_internal_api.c:876
#1  0x081d230d in ast_channel_snapshot_create (chan=0x0) at stasis_channels.c:184
#2  0x00a6614f in queue_publish_multi_channel_blob (caller=<value optimized out>, agent=0x0, type=0x900491c, blob=0xb6b231a0) at app_queue.c:1955
#3  0x00a77fc4 in rna (rnatime=15000, qe=0xb6867c88, peer=0x0, interface=0xb6b45964 "SIP/204", membername=0x8ff5c9c "SIP/204", autopause=1) at app_queue.c:4267
#4  0x00a798b5 in wait_for_answer (qe=0xb6867c88, outgoing=0xb6b6f020, to=0xb6867bb8, digit=0xb6867bbf "", prebusies=0, caller_disconnect=0, forwardsallowed=1, ringing=0) at app_queue.c:4841
#5  0x00a7b18a in try_calling (qe=0xb6867c88, opts=..., opt_args=0xb6868e04, announceoverride=0xb6867c27 "", url=0xb6867c26 "", tries=0xb6868e10, noption=0xb6868e0c, agi=0x0, macro=0x0, gosub=0x0, ringing=0) at app_queue.c:6239
#6  0x00a7f25d in queue_exec (chan=0xb7421564, data=<value optimized out>) at app_queue.c:7624
#7  0x08188a44 in pbx_exec (c=0xb7421564, app=0x8ff9c40, data=0xb6868ef8 "300,t,,") at pbx.c:1588
#8  0x08193822 in pbx_extension_helper (c=0xb7421564, con=0xb6b7ef7a, context=0xb74221c8 "from-internal", exten=0xb7422218 "300", priority=2, label=0x0, callerid=0xb7407830 "201", action=E_SPAWN, found=0xb686b29c,
   combined_find_spawn=1) at pbx.c:4697
#9  0x0819a1b1 in ast_spawn_extension (c=0xb7421564, args=0x0) at pbx.c:5706
#10 __ast_pbx_run (c=0xb7421564, args=0x0) at pbx.c:6121
#11 0x0819ba0d in pbx_thread (data=0xb7421564) at pbx.c:6440
#12 0x081ede4a in dummy_start (data=0xb7422b58) at utils.c:1169
#13 0x00285a49 in start_thread () from /lib/libpthread.so.0
#14 0x00ec5aae in clone () from /lib/libc.so.6
{noformat}

The issue was already posted on the dev-list and this is the comment from Matthew Jordan:

{quote}
It's a crash, so it's clearly a bug. Please go ahead and file an issue in the issue tracker.

Apparently o->chan can be NULL in the last ditch NOANSWER handler in wait_for_answer:

{noformat}
if (!*to) {
for (o = start; o; o = o->call_next) {
rna(orig, qe, o->chan, o->interface, o->member->membername, 1);
}

publish_dial_end_event(qe->chan, outgoing, NULL, "NOANSWER");
}
{noformat}

Since rna does more than just raise the Stasis messages regarding the status of the queue, the code that raises the message should probably be made tolerant to a NULL peer channel.
{quote}

Thanks!
Marco Signorini.
Comments:By: Rusty Newton (rnewton) 2013-10-03 09:44:27.002-0500

Thanks Marco. Can you also go ahead and attach complete output from the following command ran on your core file?

"gdb -se "asterisk" -ex "bt full" -ex "thread apply all bt" --batch -c core > /tmp/backtrace.txt"

By: Matt Jordan (mjordan) 2013-10-03 09:55:23.956-0500

So I just tried reproducing this and can't.

My queue set up:

{noformat}
[sales]
strategy=ringall
announce=sales
musicclass = default
monitor-type=MixMonitor
monitor-format=wav
ringinuse = yes
weight = 0
wrapuptime = 0
{noformat}

Two dynamic agents login:

{noformat}
*CLI>   == Using SIP RTP CoS mark 5
   -- Executing [6000@default:1] NoOp("SIP/digium02-00000007", "") in new stack
   -- Executing [6000@default:2] AddQueueMember("SIP/digium02-00000007", "sales") in new stack
[Oct  3 09:51:15] NOTICE[29070][C-00000001]: app_queue.c:7309 aqm_exec: Added interface 'SIP/digium02' to queue 'sales'
   -- Executing [6000@default:3] Hangup("SIP/digium02-00000007", "") in new stack
 == Spawn extension (default, 6000, 3) exited non-zero on 'SIP/digium02-00000007'
 == Using SIP RTP CoS mark 5
   -- Executing [6000@default:1] NoOp("SIP/digium01-00000008", "") in new stack
   -- Executing [6000@default:2] AddQueueMember("SIP/digium01-00000008", "sales") in new stack
[Oct  3 09:51:20] NOTICE[29071][C-00000002]: app_queue.c:7309 aqm_exec: Added interface 'SIP/digium01' to queue 'sales'
   -- Executing [6000@default:3] Hangup("SIP/digium01-00000008", "") in new stack
 == Spawn extension (default, 6000, 3) exited non-zero on 'SIP/digium01-00000008'
{noformat}

This shows up in the sales queue:

{noformat}
*CLI> queue show sales
sales has 0 calls (max unlimited) in 'ringall' strategy (0s holdtime, 0s talktime), W:0, C:0, A:1, SL:0.0% within 0s
  Members:
     SIP/digium02 (ringinuse enabled) (dynamic) (Not in use) has taken no calls yet
     SIP/digium01 (ringinuse enabled) (dynamic) (Not in use) has taken no calls yet
  No Callers
{noformat}

SIP/digium01 sets themselves to Busy. SIP/phone_a_2 then calls into the {{sales}} queue:

{noformat}
*CLI>   == Using SIP RTP CoS mark 5
   -- Executing [5000@default:1] NoOp("SIP/phone_a_2-00000009", "") in new stack
   -- Executing [5000@default:2] Queue("SIP/phone_a_2-00000009", "sales,t,,") in new stack
   -- Started music on hold, class 'default', on SIP/phone_a_2-00000009
 == Using SIP RTP CoS mark 5
   -- Called SIP/digium01
 == Using SIP RTP CoS mark 5
   -- Called SIP/digium02
   -- SIP/digium02-0000000b connected line has changed. Saving it until answer for SIP/phone_a_2-00000009
   -- SIP/digium01-0000000a connected line has changed. Saving it until answer for SIP/phone_a_2-00000009
   -- Got SIP response 486 "Do Not Disturb" back from 10.24.19.80:5060
   -- SIP/digium01-0000000a is busy
   -- Nobody picked up in 0 ms
   -- SIP/digium02-0000000b is ringing
   -- Nobody picked up in 15000 ms
 == Using SIP RTP CoS mark 5
   -- Called SIP/digium01
 == Using SIP RTP CoS mark 5
   -- Called SIP/digium02
   -- SIP/digium02-0000000d connected line has changed. Saving it until answer for SIP/phone_a_2-00000009
   -- SIP/digium01-0000000c connected line has changed. Saving it until answer for SIP/phone_a_2-00000009
   -- Got SIP response 486 "Do Not Disturb" back from 10.24.19.80:5060
   -- SIP/digium01-0000000c is busy
   -- Nobody picked up in 0 ms
   -- SIP/digium02-0000000d is ringing
   -- Nobody picked up in 15000 ms
 == Using SIP RTP CoS mark 5
   -- Called SIP/digium01
 == Using SIP RTP CoS mark 5
   -- Called SIP/digium02
   -- SIP/digium02-0000000f connected line has changed. Saving it until answer for SIP/phone_a_2-00000009
   -- SIP/digium01-0000000e connected line has changed. Saving it until answer for SIP/phone_a_2-00000009
   -- Got SIP response 486 "Do Not Disturb" back from 10.24.19.80:5060
   -- SIP/digium01-0000000e is busy
   -- Nobody picked up in 0 ms
   -- SIP/digium02-0000000f is ringing
   -- Nobody picked up in 15000 ms
   -- Stopped music on hold on SIP/phone_a_2-00000009
 == Spawn extension (default, 5000, 2) exited non-zero on 'SIP/phone_a_2-00000009'
{noformat}

One thing that is different about your trace is that {{SIP/digium01}} is still dialed on the second time through. This probably keeps the interface from being NULL in the "last ditch effort" code at the end of Queue.

I noticed that your version of Asterisk is modified. Can you reproduce this with a fresh checkout of Asterisk that is unmodified?

By: Marco Signorini (marcoloway) 2013-10-04 02:41:06.482-0500

Hi Rusty and Matt.
I can reproduce with a fresh install of Asterisk SVN-branch-12-r400508.
Below is the CLI output.

{noformat}
queue show
300 has 0 calls (max unlimited) in 'ringall' strategy (0s holdtime, 0s talktime), W:0, C:0, A:0, SL:0.0% within 0s
  Members:
     SIP/204 (ringinuse enabled) (dynamic) (Not in use) has taken no calls yet
     SIP/200 (ringinuse enabled) (dynamic) (Not in use) has taken no calls yet
  No Callers

*CLI>   == Using SIP RTP CoS mark 5
   -- Executing [300@from-internal:1] NoOp("SIP/201-00000000", ""Called Queue 300") in new stack
   -- Executing [300@from-internal:2] Queue("SIP/201-00000000", "300,t,,") in new stack
   -- Started music on hold, class 'default', on SIP/201-00000000
 == Using SIP RTP CoS mark 5
   -- Called SIP/200
 == Using SIP RTP CoS mark 5
   -- Called SIP/204
   -- SIP/204-00000002 connected line has changed. Saving it until answer for SIP/201-00000000
   -- SIP/200-00000001 connected line has changed. Saving it until answer for SIP/201-00000000
   -- Got SIP response 486 "Busy" back from 10.10.5.117:5066
   -- SIP/204-00000002 is busy
   -- Nobody picked up in 1000 ms
   -- SIP/200-00000001 is ringing
   -- SIP/200-00000001 is ringing
   -- Nobody picked up in 15000 ms
   -- Nobody picked up in 15000 ms
Segmentation fault (core dumped)
{noformat}

SIP/204 is a Grandstream GXP2000 with FW1.2.5.3 but SIP/200 is an unbranded cheap SIP phone. The problem is no more replicable if there is only SIP/204 agent in the queue.

I'll try to upload the full backtrace.
Thanks!

Marco Signorini.



By: Marco Signorini (marcoloway) 2013-10-04 02:59:49.407-0500

Result of
{noformat}
gdb -se "asterisk" -ex "bt full" -ex "thread apply all bt" --batch -c core.30350 > backtrace.txt
{noformat}

By: Matt Jordan (mjordan) 2013-10-04 08:59:18.073-0500

The backtrace is helpful, but the configuration I'm using still doesn't reproduce the problem.

Can you please attach the relevant portions of your {{sip.conf}}, {{queues.conf}}, and {{extensions.conf}}? I'd prefer to use your exact set up to try and reproduce the problem.

By: Marco Signorini (marcoloway) 2013-10-14 04:47:20.952-0500

Hi Matt.
I've updated to Asterisk SVN-branch-12-r400872 and I can replicate the problem.
I'm uploading the whole SIP TCPDump capture (either in text format than in the binary format to be read with wireshark) and my relevant configuration files.
Thanks!

Marco Signorini.

By: Marco Signorini (marcoloway) 2013-10-14 04:51:20.317-0500

Core dump backtrace, configuration files and SIP tracelog dump with Asterisk SVN-branch-12-r400872

By: Matt Jordan (mjordan) 2013-11-04 11:59:22.164-0600

Marco: Kevin has a patch up on Review Board that fixes this issue. Can you test it?

https://reviewboard.asterisk.org/r/2983/diff/raw/

By: Marco Signorini (marcoloway) 2013-11-06 02:40:20.080-0600

Hi Matt and Kevin.
I can confirm that the proposed patch fixed the crash.
The fixed version I've tried reports the tag SVN-branch-12-r402452M.
Thank you!