[Home]

Summary:ASTERISK-27006: app_queue: Crash when hanging up with realtime queues
Reporter:Niklas Larsson (pnlarsson)Labels:
Date Opened:2017-05-18 01:28:04Date Closed:
Priority:MajorRegression?Yes
Status:Open/NewComponents:Applications/app_queue Core/Stasis
Versions:13.15.0 14.6.0 Frequency of
Occurrence
Occasional
Related
Issues:
is duplicated byASTERISK-27567 app_queue.so segfault
is duplicated byASTERISK-28178 Program terminated with signal 11, Segmentation fault.
is duplicated byASTERISK-27166 app_queue: Crash when handling hangup with incomplete data
is duplicated byASTERISK-27707 Segfault after hanging up a queue call where Bridge() was used
is duplicated byASTERISK-27750 app_queue.so: crash when hanging up
is related toASTERISK-27567 app_queue.so segfault
Environment:Debian 8, Asterisk 13.15 (13.14 did not present issue) Attachments:( 0) backtrace_core.uc02.nguc.weblink.se-2017-05-17T08-02-12+0200.txt
( 1) backtrace_core.uc02.nguc.weblink.se-2017-05-17T13-38-19+0200.txt
( 2) backtrace_core.uc02.nguc.weblink.se-2017-05-23T13-31-52+0200.txt
( 3) backtrace_core.uc02.nguc.weblink.se-2017-05-26T09-10-53+0200.txt
( 4) backtrace_core.uc02.nguc.weblink.se-2017-05-30T16-04-54+0200.txt
( 5) C-00000345.txt
( 6) log.txt
( 7) queue_4.txt
Description:Segfault in at hangup:

[2017-05-17 08:02:12] VERBOSE[12389][C-000003ea] bridge_channel.c: Channel PJSIP/ngcore-00000cd3 left 'simple_bridge' basic-bridge <098609d0-4ae7-47e3-980b-06a3b24f3014>

Happens around ones a day.

Running asterisk 13 branch, commit ddbc68b (Thu May 11 10:49:04 2017)
Comments:By: Asterisk Team (asteriskteam) 2017-05-18 01:28:05.395-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Niklas Larsson (pnlarsson) 2017-05-18 01:28:40.443-0500

Latest backtraces

By: Joshua C. Colp (jcolp) 2017-05-22 07:05:04.659-0500

What's the queues.conf that is showing this? Are you using realtime queues? Do you have a console log leading up to the crash?

By: Niklas Larsson (pnlarsson) 2017-05-23 06:53:33.076-0500

13 branch at commit 722ec0671e7a9d452872c766fc8a9017f37c666e Date:   Wed May 17 11:31:27 2017 -0500

By: Niklas Larsson (pnlarsson) 2017-05-23 07:00:53.079-0500

Log leading up to the segfault in todays backtrace:
{noformat}
[2017-05-23 13:31:51] VERBOSE[8992][C-000002ab] bridge_channel.c: Channel PJSIP/wid12005-00000923 left 'simple_bridge' basic-bridge <c916b04f-c30a-4f41-a9a4-67c8c30e48f9>
[2017-05-23 13:31:51] VERBOSE[8980][C-000002ab] bridge_channel.c: Channel Local/c5u43q4@company_5-queue-to-agent-0000041e;2 left 'simple_bridge' basic-bridge <c916b04f-c30a-4f41-a9a4-67c8c30e48f9>
[2017-05-23 13:31:51] VERBOSE[8980][C-000002ab] pbx.c: Spawn extension (app-queue-to-agent-real, 4870, 4) exited non-zero on 'Local/c5u43q4@company_5-queue-to-agent-0000041e;2'
[2017-05-23 13:31:51] VERBOSE[8980][C-000002ab] pbx.c: Executing [h@app-queue-to-agent-real:1] NoOp("Local/c5u43q4@company_5-queue-to-agent-0000041e;2", "") in new stack
[2017-05-23 13:31:51] VERBOSE[8980][C-000002ab] pbx.c: Executing [h@app-queue-to-agent-real:2] Goto("Local/c5u43q4@company_5-queue-to-agent-0000041e;2", "ngpbx-hangup-call,hangup,1") in new stack
[2017-05-23 13:31:51] VERBOSE[8980][C-000002ab] pbx_builtins.c: Goto (ngpbx-hangup-call,hangup,1)
[2017-05-23 13:31:51] VERBOSE[8980][C-000002ab] pbx.c: Executing [hangup@ngpbx-hangup-call:1] NoOp("Local/c5u43q4@company_5-queue-to-agent-0000041e;2", "") in new stack
[2017-05-23 13:31:51] VERBOSE[8980][C-000002ab] pbx.c: Executing [hangup@ngpbx-hangup-call:2] NoOp("Local/c5u43q4@company_5-queue-to-agent-0000041e;2", "QOS=") in new stack
[2017-05-23 13:31:51] VERBOSE[8980][C-000002ab] pbx.c: Executing [hangup@ngpbx-hangup-call:3] Hangup("Local/c5u43q4@company_5-queue-to-agent-0000041e;2", "") in new stack
[2017-05-23 13:31:51] VERBOSE[8980][C-000002ab] pbx.c: Spawn extension (ngpbx-hangup-call, hangup, 3) exited non-zero on 'Local/c5u43q4@company_5-queue-to-agent-0000041e;2'
[2017-05-23 13:31:51] VERBOSE[8993][C-000002ab] bridge_channel.c: Channel Local/c5u43q4@company_5-queue-to-agent-0000041e;1 left 'simple_bridge' basic-bridge <61e21e60-1a0f-4b63-ad9d-4b0ec315de87>
[2017-05-23 13:31:51] VERBOSE[8910][C-000002ab] bridge_channel.c: Channel Local/c5u32q3@company_5-queue-to-agent-00000416;2 left 'simple_bridge' basic-bridge <61e21e60-1a0f-4b63-ad9d-4b0ec315de87>
[2017-05-23 13:31:51] VERBOSE[8910][C-000002ab] pbx.c: Spawn extension (app-queue-to-agent-real, 042168280, 4) exited non-zero on 'Local/c5u32q3@company_5-queue-to-agent-00000416;2'
[2017-05-23 13:31:51] VERBOSE[8910][C-000002ab] pbx.c: Executing [h@app-queue-to-agent-real:1] NoOp("Local/c5u32q3@company_5-queue-to-agent-00000416;2", "") in new stack
[2017-05-23 13:31:51] VERBOSE[8910][C-000002ab] pbx.c: Executing [h@app-queue-to-agent-real:2] Goto("Local/c5u32q3@company_5-queue-to-agent-00000416;2", "ngpbx-hangup-call,hangup,1") in new stack
[2017-05-23 13:31:51] VERBOSE[8910][C-000002ab] pbx_builtins.c: Goto (ngpbx-hangup-call,hangup,1)
[2017-05-23 13:31:51] VERBOSE[8910][C-000002ab] pbx.c: Executing [hangup@ngpbx-hangup-call:1] NoOp("Local/c5u32q3@company_5-queue-to-agent-00000416;2", "") in new stack
[2017-05-23 13:31:51] VERBOSE[8910][C-000002ab] pbx.c: Executing [hangup@ngpbx-hangup-call:2] NoOp("Local/c5u32q3@company_5-queue-to-agent-00000416;2", "QOS=") in new stack
[2017-05-23 13:31:51] VERBOSE[8910][C-000002ab] pbx.c: Executing [hangup@ngpbx-hangup-call:3] Hangup("Local/c5u32q3@company_5-queue-to-agent-00000416;2", "") in new stack
[2017-05-23 13:31:51] VERBOSE[8910][C-000002ab] pbx.c: Spawn extension (ngpbx-hangup-call, hangup, 3) exited non-zero on 'Local/c5u32q3@company_5-queue-to-agent-00000416;2'
[2017-05-23 13:31:51] VERBOSE[8923][C-000002ab] bridge_channel.c: Channel Local/c5u32q3@company_5-queue-to-agent-00000416;1 left 'simple_bridge' basic-bridge <e3deef7a-3245-47dc-8403-3ff66435483a>
[2017-05-23 13:31:51] VERBOSE[8836][C-000002ab] bridge_channel.c: Channel PJSIP/ngcore-00000918 left 'simple_bridge' basic-bridge <e3deef7a-3245-47dc-8403-3ff66435483a>
{noformat}

We are using realtime queues, attached conf from db.

And we are using hangup-handlers, and this segfault was not happening in Asterisk 13.14.0

By: Rusty Newton (rnewton) 2017-05-24 12:29:25.461-0500

Niklas, in your environment would it be possible to turn up Asterisk debugging so we can get a more comprehensive log before the crash?

https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

It may generate a sizeable log, so you may have to rotate and delete chunks to keep it from getting out of hand.

If possible, grab a log before the next crash with warning,error,notice,verbose,debug where both verbose and debug levels are turned up to 5.

Please attach your logs to the issue rather than posting them in comments. Just as you did with the other attachments such as the queue config. Thanks!

By: Niklas Larsson (pnlarsson) 2017-05-30 10:10:48.218-0500

Tried setting debug to 9 but got a new kind of segfault, after 3 in a few hours I had to revert the debug 5 - added one backtrace (backtrace_core.uc02.nguc.weblink.se-2017-05-26T09-10-53+0200.txt).

Added console trace for latest segfault, PJSIP/wid123-00000a4e is the channel I think causes the fault (log.txt), full concole log in C-00000345



By: Benjamin Keith Ford (bford) 2017-06-08 11:23:48.550-0500

Hello [~pnlarsson], can you attach your extensions.conf being used when this crash happens, as well as your pjsip.conf? Is your PJSIP configuration also in realtime? Does this crash happen _only_ during a hangup, or have you seen it happen outside of a hangup too?

By: Niklas Larsson (pnlarsson) 2017-06-27 03:32:54.807-0500

Hi, extension.conf is a bit intresting - lots of AGI and stuff. Basically I add some hangup-handlers and when they are called the segfault happens.

Predial handler for agent:
{noformat}
exten => agent_call,1,Noop()
same => n,ExecIf($[${DEBUG_DUMPCHAN} = 1]?DumpChan())
same => n,GotoIf($[${LEN(${NGPBX_QUEUE_PICKUP_INTERFACE})}]?pickup)
same => n,ExecIf($[${LEN(${NGPBX_QUEUE_INTERFACE})}]?UserEvent(AgentCalled,u_Queue: ${NGPBX_QUEUE_NAME},u_Interface: ${NGPBX_QUEUE_INTERFACE}))
same => n,Set(CHANNEL(hangup_handler_push)=queue-agent-hangup-handler,s,1)
same => n,Return()
{noformat}
And queue-agent-hangup-handler:
{noformat}
[queue-agent-hangup-handler]
exten => s,1,Noop()
same => n,ExecIf($[${DEBUG_DUMPCHAN} = 1]?DumpChan())
same => n,ExecIf($[${LEN(${CDR(answer)})} = 0]?Return)
same => n,Set(ANSWEREDTIME=$[${EPOCH} - ${STRPTIME(${CDR(start)},Europe/Stockholm,%Y-%m-%d %H:%M:%S)}])
same => n,Set(DIALEDTIME=$[${EPOCH} - ${STRPTIME(${CDR(answer)},Europe/Stockholm,%Y-%m-%d %H:%M:%S)}])
same => n,UserEvent(Cdr,u_start: ${CDR(start)})
same => n,Return()
{noformat}

The pjsip stuff is also using realtime

And the crash only happens at hangup - always at the same spot.

Since this didn't happen before - I could start trying to find which commit that caused this - do you have any idea where I should start looking? (13.14 did not have this issue)

By: Benjamin Keith Ford (bford) 2017-07-10 15:03:31.507-0500

Thanks for posting your debug info. An issue will be created that one of our developers will work on when they are available. In the mean time, if there is any new or additional information you think would be useful to help solve the problem, please post it on this issue!
{quote}
Since this didn't happen before - I could start trying to find which commit that caused this - do you have any idea where I should start looking? (13.14 did not have this issue)
{quote}
One of our devs will look into this. If you want to look for yourself, start with your commit and look for any commits after that that may have modified this segment of Asterisk. If you can track down the commit that caused this issue, please post it here!

By: Rusty Newton (rnewton) 2017-08-01 16:58:54.057-0500

ASTERISK-27166 seems to be the same issue.

By: Etienne Allovon (etienne_pf) 2017-08-18 07:34:30.585-0500

We have a similar problem (at least similar backtrace) :

{code}
...
[New LWP 12021]
[New LWP 12031]
[New LWP 12256]
[New LWP 12238]
[New LWP 12258]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/asterisk -g -U asterisk'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fb86aa1d029 in handle_hangup (userdata=0x13854f0, sub=<optimized out>, msg=<optimized out>) at app_queue.c:7321
7321    app_queue.c: No such file or directory.
(gdb) bt
#0  0x00007fb86aa1d029 in handle_hangup (userdata=0x13854f0, sub=<optimized out>, msg=<optimized out>) at app_queue.c:7321
#1  0x00000000005bdb88 in router_dispatch (data=0xa4edc0, sub=0xa1efe0, message=0x7fb88002ec80) at stasis_message_router.c:201
#2  0x00000000005b226a in subscription_invoke (message=0x7fb88002ec80, sub=0xa1efe0) at stasis.c:434
#3  dispatch_exec_async (local=<optimized out>) at stasis.c:715
#4  0x00000000005ca2c8 in ast_taskprocessor_execute (tps=tps@entry=0xa1f130) at taskprocessor.c:963
#5  0x00000000005d144e in execute_tasks (data=0xa1f130) at threadpool.c:1322
#6  0x00000000005ca21d in ast_taskprocessor_execute (tps=0x9daa90) at taskprocessor.c:965
#7  0x00000000005d1cc0 in threadpool_execute (pool=0x9dbde0) at threadpool.c:351
#8  worker_active (worker=0x7fb8ac000c80) at threadpool.c:1105
#9  worker_start (arg=arg@entry=0x7fb8ac000c80) at threadpool.c:1024
#10 0x00000000005db26c in dummy_start (data=<optimized out>) at utils.c:1235
#11 0x00007fb8c3c55064 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#12 0x00007fb8c2f3d62d in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt full
#0  0x00007fb86aa1d029 in handle_hangup (userdata=0x13854f0, sub=<optimized out>, msg=<optimized out>) at app_queue.c:7321
       queue_data = 0x13854f0
       channel_blob = <optimized out>
       caller_snapshot = 0x0
       member_snapshot = 0x7fb8b40115f8
       chan = 0x0
       reason = CALLER
       __PRETTY_FUNCTION__ = "handle_hangup"
#1  0x00000000005bdb88 in router_dispatch (data=0xa4edc0, sub=0xa1efe0, message=0x7fb88002ec80) at stasis_message_router.c:201
       router = 0xa4edc0
       route = {message_type = <optimized out>, callback = 0x7fb86aa1ce2b <handle_hangup>, data = 0x13854f0}
#2  0x00000000005b226a in subscription_invoke (message=0x7fb88002ec80, sub=0xa1efe0) at stasis.c:434
No locals.
#3  dispatch_exec_async (local=<optimized out>) at stasis.c:715
       sub = 0xa1efe0
       message = 0x7fb88002ec80
#4  0x00000000005ca2c8 in ast_taskprocessor_execute (tps=tps@entry=0xa1f130) at taskprocessor.c:963
       local = {local_data = 0xa1efe0, data = 0x7fb88002ec80}
       t = 0x7fb88001d7c0
       __PRETTY_FUNCTION__ = "ast_taskprocessor_execute"
#5  0x00000000005d144e in execute_tasks (data=0xa1f130) at threadpool.c:1322
       tps = 0xa1f130
#6  0x00000000005ca21d in ast_taskprocessor_execute (tps=0x9daa90) at taskprocessor.c:965
       local = {local_data = 0x7fb8ac000cb8, data = 0x90}
       t = 0x7fb880033fc0
       __PRETTY_FUNCTION__ = "ast_taskprocessor_execute"
#7  0x00000000005d1cc0 in threadpool_execute (pool=0x9dbde0) at threadpool.c:351
No locals.
#8  worker_active (worker=0x7fb8ac000c80) at threadpool.c:1105
No locals.
#9  worker_start (arg=arg@entry=0x7fb8ac000c80) at threadpool.c:1024
       worker = 0x7fb8ac000c80
       saved_state = <optimized out>
       __PRETTY_FUNCTION__ = "worker_start"
#10 0x00000000005db26c in dummy_start (data=<optimized out>) at utils.c:1235
       __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {140431140109520, -471374274666827947, 0, 140431139705760, 20, 140429859870464, 434008350720126805, -471374776567468203}, __mask_was_saved = 0}}, __pad = {0x7fb85fe9ceb0,
           0x0, 0x7fb8c2fa1812 <__libc_thread_freeres+34>, 0x0}}
       __cancel_arg = 0x7fb85fe9d700
       __not_first_call = <optimized out>
       ret = <optimized out>
       a = {start_routine = 0x5d193d <worker_start>, data = 0x7fb8ac000c80, name = 0x7fb8ac38bcd0 "worker_start         started at [ 1079] threadpool.c worker_thread_start()"}
#11 0x00007fb8c3c55064 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#12 0x00007fb8c2f3d62d in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
{code}

The crash occurs in {{handle_hangup}} in {{app_queue.c}} at :
{code}
       ast_queue_log(queue_data->queue->name, caller_snapshot->uniqueid, queue_data->member->membername,
                       reason == CALLER ? "COMPLETECALLER" : "COMPLETEAGENT", "%ld|%ld|%d",
               (long) (queue_data->starttime - queue_data->holdstart),
               (long) (time(NULL) - queue_data->starttime), queue_data->caller_pos);
{code}
because caller_snapshot is null.

Our problem looks more like ASTERISK-27166. But in our case the problem occurs after a channel redirect followed by a direct transfer.

Here's how to reproduce :

Given three users SIP : U1, U2, U3
Given one queue Q1 with one member U2 (added as a SIP member : SIP/U2)

U1 calls Q1
U2 rings
U2 answers
U2 *redirects* (seems important) U1 in Q1 (in asterisk CLI) {{channel redirect <SIP/U1> <Q1 context>,<Q1 exten>,1}}
U2 rings (he is called back by the queue Q1 again)
U2 answers
U2 direct transfer (via his phone) to U3
U3 rings
U3 answers
U3 hangup
-> then *asterisk crashes*


I tested it on vanilla asterisk :
- asterisk 13.13.1 vanilla : OK
- asterisk 13.14.0 vanilla : OK
- asterisk 13.14.1 vanilla : OK
- asterisk 13.15.0 vanilla : *crashes*
- asterisk 13.16.0 vanilla : *crashes*
- asterisk 13.17.0 vanilla : *crashes*


Below is the backtrace on a an asterisk 13.15.0 vanilla :
{code}
[New LWP 20876]
[New LWP 20870]
[New LWP 20886]
[New LWP 20889]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `asterisk -cvvvg'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f649520afec in handle_hangup (userdata=0x7f64cc002890, sub=0x0, sub@entry=0x7f64cc2264f0, msg=0x7f649522268e, msg@entry=0x7f64c80108a0) at app_queue.c:6221
6221            ast_queue_log(queue_data->queue->name, caller_snapshot->uniqueid, queue_data->member->membername,
(gdb) bt
#0  0x00007f649520afec in handle_hangup (userdata=0x7f64cc002890, sub=0x0, sub@entry=0x7f64cc2264f0, msg=0x7f649522268e, msg@entry=0x7f64c80108a0) at app_queue.c:6221
#1  0x00000000005c2ece in router_dispatch (data=0x7f64cc226440, sub=0x7f64cc2264f0, message=0x7f64c80108a0) at stasis_message_router.c:201
#2  0x00000000005b6b3b in subscription_invoke (message=0x7f64c80108a0, sub=0x7f64cc2264f0) at stasis.c:434
#3  dispatch_exec_async (local=<optimized out>) at stasis.c:715
#4  0x00000000005cf16e in ast_taskprocessor_execute (tps=tps@entry=0x7f64cc226670) at taskprocessor.c:963
#5  0x00000000005d6400 in execute_tasks (data=0x7f64cc226670) at threadpool.c:1322
#6  0x00000000005cf0ce in ast_taskprocessor_execute (tps=0x16b0750) at taskprocessor.c:965
#7  0x00000000005d6cb8 in threadpool_execute (pool=<optimized out>) at threadpool.c:351
#8  worker_active (worker=<optimized out>) at threadpool.c:1105
#9  worker_start (arg=0x7f64d0000c70) at threadpool.c:1024
#10 0x00000000005e010a in dummy_start (data=<optimized out>) at utils.c:1235
#11 0x00007f64e5fbf064 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#12 0x00007f64e52a762d in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt full
#0  0x00007f649520afec in handle_hangup (userdata=0x7f64cc002890, sub=0x0, sub@entry=0x7f64cc2264f0, msg=0x7f649522268e, msg@entry=0x7f64c80108a0) at app_queue.c:6221
       channel_blob = 0x1
       caller_snapshot = 0x0
       member_snapshot = 0x7f64bc005f78
       chan = 0x0
       reason = CALLER
       __PRETTY_FUNCTION__ = "handle_hangup"
#1  0x00000000005c2ece in router_dispatch (data=0x7f64cc226440, sub=0x7f64cc2264f0, message=0x7f64c80108a0) at stasis_message_router.c:201
       router = 0x7f64cc226440
       route = {message_type = <optimized out>, callback = 0x7f649520adf0 <handle_hangup>, data = 0x7f64cc002890}
#2  0x00000000005b6b3b in subscription_invoke (message=0x7f64c80108a0, sub=0x7f64cc2264f0) at stasis.c:434
No locals.
#3  dispatch_exec_async (local=<optimized out>) at stasis.c:715
       sub = 0x7f64cc2264f0
       message = 0x7f64c80108a0
#4  0x00000000005cf16e in ast_taskprocessor_execute (tps=tps@entry=0x7f64cc226670) at taskprocessor.c:963
       local = {local_data = 0x7f64cc2264f0, data = 0x7f64c80108a0}
       t = 0x7f64c8001af0
       __PRETTY_FUNCTION__ = "ast_taskprocessor_execute"
#5  0x00000000005d6400 in execute_tasks (data=0x7f64cc226670) at threadpool.c:1322
       tps = 0x7f64cc226670
#6  0x00000000005cf0ce in ast_taskprocessor_execute (tps=0x16b0750) at taskprocessor.c:965
       local = {local_data = 0x7f64b4003780, data = 0x7f64d0000c78}
       t = 0x7f64c8051880
       __PRETTY_FUNCTION__ = "ast_taskprocessor_execute"
#7  0x00000000005d6cb8 in threadpool_execute (pool=<optimized out>) at threadpool.c:351
No locals.
#8  worker_active (worker=<optimized out>) at threadpool.c:1105
No locals.
#9  worker_start (arg=0x7f64d0000c70) at threadpool.c:1024
       __PRETTY_FUNCTION__ = "worker_start"
#10 0x00000000005e010a in dummy_start (data=<optimized out>) at utils.c:1235
       __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {140070963126432, -8086888111099764391, 0, 140070963126832, 17, 140069867263744, 8146034564801864025, -8086887509823217319}, __mask_was_saved = 0}}, __pad = {
           0x7f648eae8ef0, 0x0, 0x7f64e530b812 <__libc_thread_freeres+34>, 0x0}}
       __cancel_arg = 0x7f648eae9700
       __not_first_call = <optimized out>
       ret = <optimized out>
       a = {start_routine = 0x5d6950 <worker_start>, data = 0x7f64d0000c70, name = 0x7f64d00020a0 "worker_start         started at [ 1079] threadpool.c worker_thread_start()"}
#11 0x00007f64e5fbf064 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#12 0x00007f64e52a762d in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
{code}

By: Rusty Newton (rnewton) 2017-08-18 08:54:29.696-0500

[~etienne_pf], thanks for the additional information!

By: Carlos Oliva (coliva) 2017-08-22 05:55:53.114-0500

Same crash and backtrace here after update from 13.11.2 to 13.17.0

By: Niklas Larsson (pnlarsson) 2017-08-24 05:14:21.903-0500

Testing to return if caller_snapshot is empty from:

caller_snapshot = ast_channel_snapshot_get_latest(queue_data->caller_uniqueid);

but queue_data->caller_uniqueid is not empty:

(gdb) print queue_data->member_uniqueid
$5 = (const ast_string_field) 0x2d2cdd8 "s2-1503500285.2784"
(gdb) print caller_snapshot
$6 = (struct ast_channel_snapshot *) 0x0

So I did a small change - return if empty caller_snapshot after ao2_unlock(queue_data); - is it bad to skip the rest of the function?

By: Rusty Newton (rnewton) 2017-08-25 07:36:28.849-0500

[~pnlarsson], if you don't get an answer on here then you should try asking on the Asterisk development list: http://lists.digium.com/mailman/listinfo/asterisk-dev or in the #asterisk-dev channel on irc.freenode.net.

By: Niklas Larsson (pnlarsson) 2017-08-28 08:40:00.408-0500

Today it tripped the patch above - and while asterisk is still running it har two channels loose:
{noformat}
Channel              Context              Extension        Prio State   Application  Data                      CallerID        Duration Accountcode PeerAccount BridgeID            
Local/c5u43q4@compan company_5-queue-to-a enter               1 Up      AppQueue     (Outgoing Line)           4870            01:07:16                                            
Local/6000@from-comp app-queue            enter              10 Up      Return       (Empty)                   0920245915      01:07:16                                            
{noformat}
They where bridged togheter in: 'simple_bridge' basic-bridge <c4d2ea6a-2064-493e-8c85-ea00d16b2803>:
{noformat}
# grep c4d2ea6a-2064-493e-8c85-ea00d16b2803 messages
[2017-08-28 14:22:42] VERBOSE[21621][C-00000168] bridge_channel.c: Channel Local/c5u43q4@company_5-queue-to-agent-000001b6;1 joined 'simple_bridge' basic-bridge <c4d2ea6a-2064-493e-8c85-ea00d16b2803>
[2017-08-28 14:22:42] VERBOSE[21614][C-00000168] bridge_channel.c: Channel Local/6000@from-company_5-atxfer-000001b5;2 joined 'simple_bridge' basic-bridge <c4d2ea6a-2064-493e-8c85-ea00d16b2803>
[2017-08-28 14:22:42] VERBOSE[21615][C-00000168] bridge_channel.c: Channel Local/6000@from-company_5-atxfer-000001b5;2 left 'simple_bridge' basic-bridge <c4d2ea6a-2064-493e-8c85-ea00d16b2803>
[2017-08-28 14:22:42] VERBOSE[21615][C-00000168] bridge_channel.c: Channel Local/c5u32q3@company_5-queue-to-agent-000001ad;2 swapped with Local/6000@from-company_5-atxfer-000001b5;2 into 'simple_bridge' basic-bridge <c4d2ea6a-2064-493e-8c85-ea00d16b2803>
[2017-08-28 14:23:25] VERBOSE[21592][C-00000168] bridge_channel.c: Channel Local/c5u32q3@company_5-queue-to-agent-000001ad;2 left 'simple_bridge' basic-bridge <c4d2ea6a-2064-493e-8c85-ea00d16b2803>
[2017-08-28 14:23:25] VERBOSE[21621][C-00000168] bridge_channel.c: Channel Local/c5u43q4@company_5-queue-to-agent-000001b6;1 left 'simple_bridge' basic-bridge <c4d2ea6a-2064-493e-8c85-ea00d16b2803>
{noformat}

By: Niklas Larsson (pnlarsson) 2017-08-29 02:56:57.672-0500

I can now reproduce it - and changed the patch to:
{noformat}
--- a/apps/app_queue.c
+++ b/apps/app_queue.c
@@ -6255,13 +6255,16 @@ static void handle_hangup(void *userdata, struct stasis_subscription *sub,
       ast_debug(3, "Detected hangup of queue %s channel %s\n", reason == CALLER ? "caller" : "member",
                       channel_blob->snapshot->name);

-       ast_queue_log(queue_data->queue->name, caller_snapshot->uniqueid, queue_data->member->membername,
+       ast_queue_log(queue_data->queue->name, queue_data->caller_uniqueid, queue_data->member->membername,
                       reason == CALLER ? "COMPLETECALLER" : "COMPLETEAGENT", "%ld|%ld|%d",
               (long) (queue_data->starttime - queue_data->holdstart),
               (long) (time(NULL) - queue_data->starttime), queue_data->caller_pos);

-       send_agent_complete(queue_data->queue->name, caller_snapshot, member_snapshot, queue_data->member,
+       if(caller_snapshot){
+               send_agent_complete(queue_data->queue->name, caller_snapshot, member_snapshot, queue_data->member,
                       queue_data->holdstart, queue_data->starttime, reason);
+       }
+
       update_queue(queue_data->queue, queue_data->member, queue_data->callcompletedinsl,
                       queue_data->starttime);
       remove_stasis_subscriptions(queue_data);
{noformat}
I will miss the AMI event AgentComplete - but asterisk is not segfaulting and not leaving any channels around.

The bug must be in ast_channel_snapshot_get_latest and missing data in Stasis Message Bus API cache.


By: Alfredo (alfredopastor) 2018-01-22 18:13:45.648-0600

Hi,

I have the same problem with the queue. In my queue I use a AGI but after time the Asterisk restart.

I am using the node ari library and I want to tell you how I have my code. First, the call go to the Stasis and next continue in the dialplan where is the queue with the AGI. If a client hangup the call in this time, the event may occur.

This is critical for the company, so I am very intersting in will resolve this bug, I could pay for this.

I don't know if this is place rigth where I must write. Please, help me.

My Asterisk version is: 13.18.4

ari_00 kernel: asterisk[1833]: segfault at 10 ip 00007f343fdde9cc sp 00007f342cb3ab90 error 4 in app_queue.so[7f343fdc6000+35000]

By: Nathan Bruning (nathanb) 2018-03-07 06:31:46.141-0600

Our scenario is a AMI Redirect on a queue call (reported as ASTERISK-27707).

I've done a little digging, this is the sequence of events that cause our segfault:

- AMI redirect causes masquerade: ??Actually Masquerading PJSIP/xxx-00000008(6) into the structure of Surrogate/PJSIP/xxx-128.199.61.178-00000008(0)??
- {{ast_channel_destructor}} clears the stasis cache for the channel using {{publish_cache_clear}}: unique_id xxx.16, name PJSIP/xxx-0008, the refcount of the channel is 0.
- {{handle_hangup}} in app_queue.c receives a message with {{queue_data->caller_uniqueid}} = xxx.16, but name is now "Surrogate/xxx", and refcount of the chan is 1. Stasis cache is empty for this unique id, and asterisk segfaults.

Hope this helps in finding the root cause.

By: tatiana tatiana (m.hippy.c@gmail.com) 2018-04-02 02:38:39.838-0500

I have the same problem with the queue and crashing Asterisk. Version Asterisk 13.20
When the solution to this problem is solved?

By: Joshua C. Colp (jcolp) 2018-04-02 05:05:57.945-0500

There is no time frame on when this will get looked into.

By: Pedro Dalston (Pedro-D) 2021-01-19 10:28:44.416-0600



I have this issue and would like to know how to fix or a workaround ? What version could I upgrade to fix it ?

By: Joshua C. Colp (jcolp) 2021-01-19 10:32:20.028-0600

This issue remains open, so as far as is known the issue remains.