Summary: | ASTERISK-27006: app_queue: Crash when hanging up with realtime queues | ||||||||||||||
Reporter: | Niklas Larsson (pnlarsson) | Labels: | |||||||||||||
Date Opened: | 2017-05-18 01:28:04 | Date Closed: | |||||||||||||
Priority: | Major | Regression? | Yes | ||||||||||||
Status: | Open/New | Components: | Applications/app_queue Core/Stasis | ||||||||||||
Versions: | 13.15.0 14.6.0 | Frequency of Occurrence | Occasional | ||||||||||||
Related Issues: |
| ||||||||||||||
Environment: | Debian 8, Asterisk 13.15 (13.14 did not present issue) | Attachments: | ( 0) backtrace_core.uc02.nguc.weblink.se-2017-05-17T08-02-12+0200.txt ( 1) backtrace_core.uc02.nguc.weblink.se-2017-05-17T13-38-19+0200.txt ( 2) backtrace_core.uc02.nguc.weblink.se-2017-05-23T13-31-52+0200.txt ( 3) backtrace_core.uc02.nguc.weblink.se-2017-05-26T09-10-53+0200.txt ( 4) backtrace_core.uc02.nguc.weblink.se-2017-05-30T16-04-54+0200.txt ( 5) C-00000345.txt ( 6) log.txt ( 7) queue_4.txt | ||||||||||||
Description: | Segfault in at hangup:
[2017-05-17 08:02:12] VERBOSE[12389][C-000003ea] bridge_channel.c: Channel PJSIP/ngcore-00000cd3 left 'simple_bridge' basic-bridge <098609d0-4ae7-47e3-980b-06a3b24f3014> Happens around ones a day. Running asterisk 13 branch, commit ddbc68b (Thu May 11 10:49:04 2017) | ||||||||||||||
Comments: | By: Asterisk Team (asteriskteam) 2017-05-18 01:28:05.395-0500 Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report. Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process]. By: Niklas Larsson (pnlarsson) 2017-05-18 01:28:40.443-0500 Latest backtraces By: Joshua C. Colp (jcolp) 2017-05-22 07:05:04.659-0500 What's the queues.conf that is showing this? Are you using realtime queues? Do you have a console log leading up to the crash? By: Niklas Larsson (pnlarsson) 2017-05-23 06:53:33.076-0500 13 branch at commit 722ec0671e7a9d452872c766fc8a9017f37c666e Date: Wed May 17 11:31:27 2017 -0500 By: Niklas Larsson (pnlarsson) 2017-05-23 07:00:53.079-0500 Log leading up to the segfault in todays backtrace: {noformat} [2017-05-23 13:31:51] VERBOSE[8992][C-000002ab] bridge_channel.c: Channel PJSIP/wid12005-00000923 left 'simple_bridge' basic-bridge <c916b04f-c30a-4f41-a9a4-67c8c30e48f9> [2017-05-23 13:31:51] VERBOSE[8980][C-000002ab] bridge_channel.c: Channel Local/c5u43q4@company_5-queue-to-agent-0000041e;2 left 'simple_bridge' basic-bridge <c916b04f-c30a-4f41-a9a4-67c8c30e48f9> [2017-05-23 13:31:51] VERBOSE[8980][C-000002ab] pbx.c: Spawn extension (app-queue-to-agent-real, 4870, 4) exited non-zero on 'Local/c5u43q4@company_5-queue-to-agent-0000041e;2' [2017-05-23 13:31:51] VERBOSE[8980][C-000002ab] pbx.c: Executing [h@app-queue-to-agent-real:1] NoOp("Local/c5u43q4@company_5-queue-to-agent-0000041e;2", "") in new stack [2017-05-23 13:31:51] VERBOSE[8980][C-000002ab] pbx.c: Executing [h@app-queue-to-agent-real:2] Goto("Local/c5u43q4@company_5-queue-to-agent-0000041e;2", "ngpbx-hangup-call,hangup,1") in new stack [2017-05-23 13:31:51] VERBOSE[8980][C-000002ab] pbx_builtins.c: Goto (ngpbx-hangup-call,hangup,1) [2017-05-23 13:31:51] VERBOSE[8980][C-000002ab] pbx.c: Executing [hangup@ngpbx-hangup-call:1] NoOp("Local/c5u43q4@company_5-queue-to-agent-0000041e;2", "") in new stack [2017-05-23 13:31:51] VERBOSE[8980][C-000002ab] pbx.c: Executing [hangup@ngpbx-hangup-call:2] NoOp("Local/c5u43q4@company_5-queue-to-agent-0000041e;2", "QOS=") in new stack [2017-05-23 13:31:51] VERBOSE[8980][C-000002ab] pbx.c: Executing [hangup@ngpbx-hangup-call:3] Hangup("Local/c5u43q4@company_5-queue-to-agent-0000041e;2", "") in new stack [2017-05-23 13:31:51] VERBOSE[8980][C-000002ab] pbx.c: Spawn extension (ngpbx-hangup-call, hangup, 3) exited non-zero on 'Local/c5u43q4@company_5-queue-to-agent-0000041e;2' [2017-05-23 13:31:51] VERBOSE[8993][C-000002ab] bridge_channel.c: Channel Local/c5u43q4@company_5-queue-to-agent-0000041e;1 left 'simple_bridge' basic-bridge <61e21e60-1a0f-4b63-ad9d-4b0ec315de87> [2017-05-23 13:31:51] VERBOSE[8910][C-000002ab] bridge_channel.c: Channel Local/c5u32q3@company_5-queue-to-agent-00000416;2 left 'simple_bridge' basic-bridge <61e21e60-1a0f-4b63-ad9d-4b0ec315de87> [2017-05-23 13:31:51] VERBOSE[8910][C-000002ab] pbx.c: Spawn extension (app-queue-to-agent-real, 042168280, 4) exited non-zero on 'Local/c5u32q3@company_5-queue-to-agent-00000416;2' [2017-05-23 13:31:51] VERBOSE[8910][C-000002ab] pbx.c: Executing [h@app-queue-to-agent-real:1] NoOp("Local/c5u32q3@company_5-queue-to-agent-00000416;2", "") in new stack [2017-05-23 13:31:51] VERBOSE[8910][C-000002ab] pbx.c: Executing [h@app-queue-to-agent-real:2] Goto("Local/c5u32q3@company_5-queue-to-agent-00000416;2", "ngpbx-hangup-call,hangup,1") in new stack [2017-05-23 13:31:51] VERBOSE[8910][C-000002ab] pbx_builtins.c: Goto (ngpbx-hangup-call,hangup,1) [2017-05-23 13:31:51] VERBOSE[8910][C-000002ab] pbx.c: Executing [hangup@ngpbx-hangup-call:1] NoOp("Local/c5u32q3@company_5-queue-to-agent-00000416;2", "") in new stack [2017-05-23 13:31:51] VERBOSE[8910][C-000002ab] pbx.c: Executing [hangup@ngpbx-hangup-call:2] NoOp("Local/c5u32q3@company_5-queue-to-agent-00000416;2", "QOS=") in new stack [2017-05-23 13:31:51] VERBOSE[8910][C-000002ab] pbx.c: Executing [hangup@ngpbx-hangup-call:3] Hangup("Local/c5u32q3@company_5-queue-to-agent-00000416;2", "") in new stack [2017-05-23 13:31:51] VERBOSE[8910][C-000002ab] pbx.c: Spawn extension (ngpbx-hangup-call, hangup, 3) exited non-zero on 'Local/c5u32q3@company_5-queue-to-agent-00000416;2' [2017-05-23 13:31:51] VERBOSE[8923][C-000002ab] bridge_channel.c: Channel Local/c5u32q3@company_5-queue-to-agent-00000416;1 left 'simple_bridge' basic-bridge <e3deef7a-3245-47dc-8403-3ff66435483a> [2017-05-23 13:31:51] VERBOSE[8836][C-000002ab] bridge_channel.c: Channel PJSIP/ngcore-00000918 left 'simple_bridge' basic-bridge <e3deef7a-3245-47dc-8403-3ff66435483a> {noformat} We are using realtime queues, attached conf from db. And we are using hangup-handlers, and this segfault was not happening in Asterisk 13.14.0 By: Rusty Newton (rnewton) 2017-05-24 12:29:25.461-0500 Niklas, in your environment would it be possible to turn up Asterisk debugging so we can get a more comprehensive log before the crash? https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information It may generate a sizeable log, so you may have to rotate and delete chunks to keep it from getting out of hand. If possible, grab a log before the next crash with warning,error,notice,verbose,debug where both verbose and debug levels are turned up to 5. Please attach your logs to the issue rather than posting them in comments. Just as you did with the other attachments such as the queue config. Thanks! By: Niklas Larsson (pnlarsson) 2017-05-30 10:10:48.218-0500 Tried setting debug to 9 but got a new kind of segfault, after 3 in a few hours I had to revert the debug 5 - added one backtrace (backtrace_core.uc02.nguc.weblink.se-2017-05-26T09-10-53+0200.txt). Added console trace for latest segfault, PJSIP/wid123-00000a4e is the channel I think causes the fault (log.txt), full concole log in C-00000345 By: Benjamin Keith Ford (bford) 2017-06-08 11:23:48.550-0500 Hello [~pnlarsson], can you attach your extensions.conf being used when this crash happens, as well as your pjsip.conf? Is your PJSIP configuration also in realtime? Does this crash happen _only_ during a hangup, or have you seen it happen outside of a hangup too? By: Niklas Larsson (pnlarsson) 2017-06-27 03:32:54.807-0500 Hi, extension.conf is a bit intresting - lots of AGI and stuff. Basically I add some hangup-handlers and when they are called the segfault happens. Predial handler for agent: {noformat} exten => agent_call,1,Noop() same => n,ExecIf($[${DEBUG_DUMPCHAN} = 1]?DumpChan()) same => n,GotoIf($[${LEN(${NGPBX_QUEUE_PICKUP_INTERFACE})}]?pickup) same => n,ExecIf($[${LEN(${NGPBX_QUEUE_INTERFACE})}]?UserEvent(AgentCalled,u_Queue: ${NGPBX_QUEUE_NAME},u_Interface: ${NGPBX_QUEUE_INTERFACE})) same => n,Set(CHANNEL(hangup_handler_push)=queue-agent-hangup-handler,s,1) same => n,Return() {noformat} And queue-agent-hangup-handler: {noformat} [queue-agent-hangup-handler] exten => s,1,Noop() same => n,ExecIf($[${DEBUG_DUMPCHAN} = 1]?DumpChan()) same => n,ExecIf($[${LEN(${CDR(answer)})} = 0]?Return) same => n,Set(ANSWEREDTIME=$[${EPOCH} - ${STRPTIME(${CDR(start)},Europe/Stockholm,%Y-%m-%d %H:%M:%S)}]) same => n,Set(DIALEDTIME=$[${EPOCH} - ${STRPTIME(${CDR(answer)},Europe/Stockholm,%Y-%m-%d %H:%M:%S)}]) same => n,UserEvent(Cdr,u_start: ${CDR(start)}) same => n,Return() {noformat} The pjsip stuff is also using realtime And the crash only happens at hangup - always at the same spot. Since this didn't happen before - I could start trying to find which commit that caused this - do you have any idea where I should start looking? (13.14 did not have this issue) By: Benjamin Keith Ford (bford) 2017-07-10 15:03:31.507-0500 Thanks for posting your debug info. An issue will be created that one of our developers will work on when they are available. In the mean time, if there is any new or additional information you think would be useful to help solve the problem, please post it on this issue! {quote} Since this didn't happen before - I could start trying to find which commit that caused this - do you have any idea where I should start looking? (13.14 did not have this issue) {quote} One of our devs will look into this. If you want to look for yourself, start with your commit and look for any commits after that that may have modified this segment of Asterisk. If you can track down the commit that caused this issue, please post it here! By: Rusty Newton (rnewton) 2017-08-01 16:58:54.057-0500 ASTERISK-27166 seems to be the same issue. By: Etienne Allovon (etienne_pf) 2017-08-18 07:34:30.585-0500 We have a similar problem (at least similar backtrace) : {code} ... [New LWP 12021] [New LWP 12031] [New LWP 12256] [New LWP 12238] [New LWP 12258] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Core was generated by `/usr/sbin/asterisk -g -U asterisk'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x00007fb86aa1d029 in handle_hangup (userdata=0x13854f0, sub=<optimized out>, msg=<optimized out>) at app_queue.c:7321 7321 app_queue.c: No such file or directory. (gdb) bt #0 0x00007fb86aa1d029 in handle_hangup (userdata=0x13854f0, sub=<optimized out>, msg=<optimized out>) at app_queue.c:7321 #1 0x00000000005bdb88 in router_dispatch (data=0xa4edc0, sub=0xa1efe0, message=0x7fb88002ec80) at stasis_message_router.c:201 #2 0x00000000005b226a in subscription_invoke (message=0x7fb88002ec80, sub=0xa1efe0) at stasis.c:434 #3 dispatch_exec_async (local=<optimized out>) at stasis.c:715 #4 0x00000000005ca2c8 in ast_taskprocessor_execute (tps=tps@entry=0xa1f130) at taskprocessor.c:963 #5 0x00000000005d144e in execute_tasks (data=0xa1f130) at threadpool.c:1322 #6 0x00000000005ca21d in ast_taskprocessor_execute (tps=0x9daa90) at taskprocessor.c:965 #7 0x00000000005d1cc0 in threadpool_execute (pool=0x9dbde0) at threadpool.c:351 #8 worker_active (worker=0x7fb8ac000c80) at threadpool.c:1105 #9 worker_start (arg=arg@entry=0x7fb8ac000c80) at threadpool.c:1024 #10 0x00000000005db26c in dummy_start (data=<optimized out>) at utils.c:1235 #11 0x00007fb8c3c55064 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #12 0x00007fb8c2f3d62d in clone () from /lib/x86_64-linux-gnu/libc.so.6 (gdb) bt full #0 0x00007fb86aa1d029 in handle_hangup (userdata=0x13854f0, sub=<optimized out>, msg=<optimized out>) at app_queue.c:7321 queue_data = 0x13854f0 channel_blob = <optimized out> caller_snapshot = 0x0 member_snapshot = 0x7fb8b40115f8 chan = 0x0 reason = CALLER __PRETTY_FUNCTION__ = "handle_hangup" #1 0x00000000005bdb88 in router_dispatch (data=0xa4edc0, sub=0xa1efe0, message=0x7fb88002ec80) at stasis_message_router.c:201 router = 0xa4edc0 route = {message_type = <optimized out>, callback = 0x7fb86aa1ce2b <handle_hangup>, data = 0x13854f0} #2 0x00000000005b226a in subscription_invoke (message=0x7fb88002ec80, sub=0xa1efe0) at stasis.c:434 No locals. #3 dispatch_exec_async (local=<optimized out>) at stasis.c:715 sub = 0xa1efe0 message = 0x7fb88002ec80 #4 0x00000000005ca2c8 in ast_taskprocessor_execute (tps=tps@entry=0xa1f130) at taskprocessor.c:963 local = {local_data = 0xa1efe0, data = 0x7fb88002ec80} t = 0x7fb88001d7c0 __PRETTY_FUNCTION__ = "ast_taskprocessor_execute" #5 0x00000000005d144e in execute_tasks (data=0xa1f130) at threadpool.c:1322 tps = 0xa1f130 #6 0x00000000005ca21d in ast_taskprocessor_execute (tps=0x9daa90) at taskprocessor.c:965 local = {local_data = 0x7fb8ac000cb8, data = 0x90} t = 0x7fb880033fc0 __PRETTY_FUNCTION__ = "ast_taskprocessor_execute" #7 0x00000000005d1cc0 in threadpool_execute (pool=0x9dbde0) at threadpool.c:351 No locals. #8 worker_active (worker=0x7fb8ac000c80) at threadpool.c:1105 No locals. #9 worker_start (arg=arg@entry=0x7fb8ac000c80) at threadpool.c:1024 worker = 0x7fb8ac000c80 saved_state = <optimized out> __PRETTY_FUNCTION__ = "worker_start" #10 0x00000000005db26c in dummy_start (data=<optimized out>) at utils.c:1235 __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {140431140109520, -471374274666827947, 0, 140431139705760, 20, 140429859870464, 434008350720126805, -471374776567468203}, __mask_was_saved = 0}}, __pad = {0x7fb85fe9ceb0, 0x0, 0x7fb8c2fa1812 <__libc_thread_freeres+34>, 0x0}} __cancel_arg = 0x7fb85fe9d700 __not_first_call = <optimized out> ret = <optimized out> a = {start_routine = 0x5d193d <worker_start>, data = 0x7fb8ac000c80, name = 0x7fb8ac38bcd0 "worker_start started at [ 1079] threadpool.c worker_thread_start()"} #11 0x00007fb8c3c55064 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 No symbol table info available. #12 0x00007fb8c2f3d62d in clone () from /lib/x86_64-linux-gnu/libc.so.6 No symbol table info available. {code} The crash occurs in {{handle_hangup}} in {{app_queue.c}} at : {code} ast_queue_log(queue_data->queue->name, caller_snapshot->uniqueid, queue_data->member->membername, reason == CALLER ? "COMPLETECALLER" : "COMPLETEAGENT", "%ld|%ld|%d", (long) (queue_data->starttime - queue_data->holdstart), (long) (time(NULL) - queue_data->starttime), queue_data->caller_pos); {code} because caller_snapshot is null. Our problem looks more like ASTERISK-27166. But in our case the problem occurs after a channel redirect followed by a direct transfer. Here's how to reproduce : Given three users SIP : U1, U2, U3 Given one queue Q1 with one member U2 (added as a SIP member : SIP/U2) U1 calls Q1 U2 rings U2 answers U2 *redirects* (seems important) U1 in Q1 (in asterisk CLI) {{channel redirect <SIP/U1> <Q1 context>,<Q1 exten>,1}} U2 rings (he is called back by the queue Q1 again) U2 answers U2 direct transfer (via his phone) to U3 U3 rings U3 answers U3 hangup -> then *asterisk crashes* I tested it on vanilla asterisk : - asterisk 13.13.1 vanilla : OK - asterisk 13.14.0 vanilla : OK - asterisk 13.14.1 vanilla : OK - asterisk 13.15.0 vanilla : *crashes* - asterisk 13.16.0 vanilla : *crashes* - asterisk 13.17.0 vanilla : *crashes* Below is the backtrace on a an asterisk 13.15.0 vanilla : {code} [New LWP 20876] [New LWP 20870] [New LWP 20886] [New LWP 20889] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Core was generated by `asterisk -cvvvg'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x00007f649520afec in handle_hangup (userdata=0x7f64cc002890, sub=0x0, sub@entry=0x7f64cc2264f0, msg=0x7f649522268e, msg@entry=0x7f64c80108a0) at app_queue.c:6221 6221 ast_queue_log(queue_data->queue->name, caller_snapshot->uniqueid, queue_data->member->membername, (gdb) bt #0 0x00007f649520afec in handle_hangup (userdata=0x7f64cc002890, sub=0x0, sub@entry=0x7f64cc2264f0, msg=0x7f649522268e, msg@entry=0x7f64c80108a0) at app_queue.c:6221 #1 0x00000000005c2ece in router_dispatch (data=0x7f64cc226440, sub=0x7f64cc2264f0, message=0x7f64c80108a0) at stasis_message_router.c:201 #2 0x00000000005b6b3b in subscription_invoke (message=0x7f64c80108a0, sub=0x7f64cc2264f0) at stasis.c:434 #3 dispatch_exec_async (local=<optimized out>) at stasis.c:715 #4 0x00000000005cf16e in ast_taskprocessor_execute (tps=tps@entry=0x7f64cc226670) at taskprocessor.c:963 #5 0x00000000005d6400 in execute_tasks (data=0x7f64cc226670) at threadpool.c:1322 #6 0x00000000005cf0ce in ast_taskprocessor_execute (tps=0x16b0750) at taskprocessor.c:965 #7 0x00000000005d6cb8 in threadpool_execute (pool=<optimized out>) at threadpool.c:351 #8 worker_active (worker=<optimized out>) at threadpool.c:1105 #9 worker_start (arg=0x7f64d0000c70) at threadpool.c:1024 #10 0x00000000005e010a in dummy_start (data=<optimized out>) at utils.c:1235 #11 0x00007f64e5fbf064 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #12 0x00007f64e52a762d in clone () from /lib/x86_64-linux-gnu/libc.so.6 (gdb) bt full #0 0x00007f649520afec in handle_hangup (userdata=0x7f64cc002890, sub=0x0, sub@entry=0x7f64cc2264f0, msg=0x7f649522268e, msg@entry=0x7f64c80108a0) at app_queue.c:6221 channel_blob = 0x1 caller_snapshot = 0x0 member_snapshot = 0x7f64bc005f78 chan = 0x0 reason = CALLER __PRETTY_FUNCTION__ = "handle_hangup" #1 0x00000000005c2ece in router_dispatch (data=0x7f64cc226440, sub=0x7f64cc2264f0, message=0x7f64c80108a0) at stasis_message_router.c:201 router = 0x7f64cc226440 route = {message_type = <optimized out>, callback = 0x7f649520adf0 <handle_hangup>, data = 0x7f64cc002890} #2 0x00000000005b6b3b in subscription_invoke (message=0x7f64c80108a0, sub=0x7f64cc2264f0) at stasis.c:434 No locals. #3 dispatch_exec_async (local=<optimized out>) at stasis.c:715 sub = 0x7f64cc2264f0 message = 0x7f64c80108a0 #4 0x00000000005cf16e in ast_taskprocessor_execute (tps=tps@entry=0x7f64cc226670) at taskprocessor.c:963 local = {local_data = 0x7f64cc2264f0, data = 0x7f64c80108a0} t = 0x7f64c8001af0 __PRETTY_FUNCTION__ = "ast_taskprocessor_execute" #5 0x00000000005d6400 in execute_tasks (data=0x7f64cc226670) at threadpool.c:1322 tps = 0x7f64cc226670 #6 0x00000000005cf0ce in ast_taskprocessor_execute (tps=0x16b0750) at taskprocessor.c:965 local = {local_data = 0x7f64b4003780, data = 0x7f64d0000c78} t = 0x7f64c8051880 __PRETTY_FUNCTION__ = "ast_taskprocessor_execute" #7 0x00000000005d6cb8 in threadpool_execute (pool=<optimized out>) at threadpool.c:351 No locals. #8 worker_active (worker=<optimized out>) at threadpool.c:1105 No locals. #9 worker_start (arg=0x7f64d0000c70) at threadpool.c:1024 __PRETTY_FUNCTION__ = "worker_start" #10 0x00000000005e010a in dummy_start (data=<optimized out>) at utils.c:1235 __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {140070963126432, -8086888111099764391, 0, 140070963126832, 17, 140069867263744, 8146034564801864025, -8086887509823217319}, __mask_was_saved = 0}}, __pad = { 0x7f648eae8ef0, 0x0, 0x7f64e530b812 <__libc_thread_freeres+34>, 0x0}} __cancel_arg = 0x7f648eae9700 __not_first_call = <optimized out> ret = <optimized out> a = {start_routine = 0x5d6950 <worker_start>, data = 0x7f64d0000c70, name = 0x7f64d00020a0 "worker_start started at [ 1079] threadpool.c worker_thread_start()"} #11 0x00007f64e5fbf064 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 No symbol table info available. #12 0x00007f64e52a762d in clone () from /lib/x86_64-linux-gnu/libc.so.6 No symbol table info available. {code} By: Rusty Newton (rnewton) 2017-08-18 08:54:29.696-0500 [~etienne_pf], thanks for the additional information! By: Carlos Oliva (coliva) 2017-08-22 05:55:53.114-0500 Same crash and backtrace here after update from 13.11.2 to 13.17.0 By: Niklas Larsson (pnlarsson) 2017-08-24 05:14:21.903-0500 Testing to return if caller_snapshot is empty from: caller_snapshot = ast_channel_snapshot_get_latest(queue_data->caller_uniqueid); but queue_data->caller_uniqueid is not empty: (gdb) print queue_data->member_uniqueid $5 = (const ast_string_field) 0x2d2cdd8 "s2-1503500285.2784" (gdb) print caller_snapshot $6 = (struct ast_channel_snapshot *) 0x0 So I did a small change - return if empty caller_snapshot after ao2_unlock(queue_data); - is it bad to skip the rest of the function? By: Rusty Newton (rnewton) 2017-08-25 07:36:28.849-0500 [~pnlarsson], if you don't get an answer on here then you should try asking on the Asterisk development list: http://lists.digium.com/mailman/listinfo/asterisk-dev or in the #asterisk-dev channel on irc.freenode.net. By: Niklas Larsson (pnlarsson) 2017-08-28 08:40:00.408-0500 Today it tripped the patch above - and while asterisk is still running it har two channels loose: {noformat} Channel Context Extension Prio State Application Data CallerID Duration Accountcode PeerAccount BridgeID Local/c5u43q4@compan company_5-queue-to-a enter 1 Up AppQueue (Outgoing Line) 4870 01:07:16 Local/6000@from-comp app-queue enter 10 Up Return (Empty) 0920245915 01:07:16 {noformat} They where bridged togheter in: 'simple_bridge' basic-bridge <c4d2ea6a-2064-493e-8c85-ea00d16b2803>: {noformat} # grep c4d2ea6a-2064-493e-8c85-ea00d16b2803 messages [2017-08-28 14:22:42] VERBOSE[21621][C-00000168] bridge_channel.c: Channel Local/c5u43q4@company_5-queue-to-agent-000001b6;1 joined 'simple_bridge' basic-bridge <c4d2ea6a-2064-493e-8c85-ea00d16b2803> [2017-08-28 14:22:42] VERBOSE[21614][C-00000168] bridge_channel.c: Channel Local/6000@from-company_5-atxfer-000001b5;2 joined 'simple_bridge' basic-bridge <c4d2ea6a-2064-493e-8c85-ea00d16b2803> [2017-08-28 14:22:42] VERBOSE[21615][C-00000168] bridge_channel.c: Channel Local/6000@from-company_5-atxfer-000001b5;2 left 'simple_bridge' basic-bridge <c4d2ea6a-2064-493e-8c85-ea00d16b2803> [2017-08-28 14:22:42] VERBOSE[21615][C-00000168] bridge_channel.c: Channel Local/c5u32q3@company_5-queue-to-agent-000001ad;2 swapped with Local/6000@from-company_5-atxfer-000001b5;2 into 'simple_bridge' basic-bridge <c4d2ea6a-2064-493e-8c85-ea00d16b2803> [2017-08-28 14:23:25] VERBOSE[21592][C-00000168] bridge_channel.c: Channel Local/c5u32q3@company_5-queue-to-agent-000001ad;2 left 'simple_bridge' basic-bridge <c4d2ea6a-2064-493e-8c85-ea00d16b2803> [2017-08-28 14:23:25] VERBOSE[21621][C-00000168] bridge_channel.c: Channel Local/c5u43q4@company_5-queue-to-agent-000001b6;1 left 'simple_bridge' basic-bridge <c4d2ea6a-2064-493e-8c85-ea00d16b2803> {noformat} By: Niklas Larsson (pnlarsson) 2017-08-29 02:56:57.672-0500 I can now reproduce it - and changed the patch to: {noformat} --- a/apps/app_queue.c +++ b/apps/app_queue.c @@ -6255,13 +6255,16 @@ static void handle_hangup(void *userdata, struct stasis_subscription *sub, ast_debug(3, "Detected hangup of queue %s channel %s\n", reason == CALLER ? "caller" : "member", channel_blob->snapshot->name); - ast_queue_log(queue_data->queue->name, caller_snapshot->uniqueid, queue_data->member->membername, + ast_queue_log(queue_data->queue->name, queue_data->caller_uniqueid, queue_data->member->membername, reason == CALLER ? "COMPLETECALLER" : "COMPLETEAGENT", "%ld|%ld|%d", (long) (queue_data->starttime - queue_data->holdstart), (long) (time(NULL) - queue_data->starttime), queue_data->caller_pos); - send_agent_complete(queue_data->queue->name, caller_snapshot, member_snapshot, queue_data->member, + if(caller_snapshot){ + send_agent_complete(queue_data->queue->name, caller_snapshot, member_snapshot, queue_data->member, queue_data->holdstart, queue_data->starttime, reason); + } + update_queue(queue_data->queue, queue_data->member, queue_data->callcompletedinsl, queue_data->starttime); remove_stasis_subscriptions(queue_data); {noformat} I will miss the AMI event AgentComplete - but asterisk is not segfaulting and not leaving any channels around. The bug must be in ast_channel_snapshot_get_latest and missing data in Stasis Message Bus API cache. By: Alfredo (alfredopastor) 2018-01-22 18:13:45.648-0600 Hi, I have the same problem with the queue. In my queue I use a AGI but after time the Asterisk restart. I am using the node ari library and I want to tell you how I have my code. First, the call go to the Stasis and next continue in the dialplan where is the queue with the AGI. If a client hangup the call in this time, the event may occur. This is critical for the company, so I am very intersting in will resolve this bug, I could pay for this. I don't know if this is place rigth where I must write. Please, help me. My Asterisk version is: 13.18.4 ari_00 kernel: asterisk[1833]: segfault at 10 ip 00007f343fdde9cc sp 00007f342cb3ab90 error 4 in app_queue.so[7f343fdc6000+35000] By: Nathan Bruning (nathanb) 2018-03-07 06:31:46.141-0600 Our scenario is a AMI Redirect on a queue call (reported as ASTERISK-27707). I've done a little digging, this is the sequence of events that cause our segfault: - AMI redirect causes masquerade: ??Actually Masquerading PJSIP/xxx-00000008(6) into the structure of Surrogate/PJSIP/xxx-128.199.61.178-00000008(0)?? - {{ast_channel_destructor}} clears the stasis cache for the channel using {{publish_cache_clear}}: unique_id xxx.16, name PJSIP/xxx-0008, the refcount of the channel is 0. - {{handle_hangup}} in app_queue.c receives a message with {{queue_data->caller_uniqueid}} = xxx.16, but name is now "Surrogate/xxx", and refcount of the chan is 1. Stasis cache is empty for this unique id, and asterisk segfaults. Hope this helps in finding the root cause. By: tatiana tatiana (m.hippy.c@gmail.com) 2018-04-02 02:38:39.838-0500 I have the same problem with the queue and crashing Asterisk. Version Asterisk 13.20 When the solution to this problem is solved? By: Joshua C. Colp (jcolp) 2018-04-02 05:05:57.945-0500 There is no time frame on when this will get looked into. By: Pedro Dalston (Pedro-D) 2021-01-19 10:28:44.416-0600 I have this issue and would like to know how to fix or a workaround ? What version could I upgrade to fix it ? By: Joshua C. Colp (jcolp) 2021-01-19 10:32:20.028-0600 This issue remains open, so as far as is known the issue remains. |