[Home]

Summary:ASTERISK-14558: [patch] WARNING[23025]: channel.c:952 __ast_queue_frame: Exceptionally long voice queue length queuing to Local
Reporter:David Brillert (aragon)Labels:
Date Opened:2009-07-29 09:26:25Date Closed:2011-06-04 10:11:32
Priority:BlockerRegression?No
Status:Closed/CompleteComponents:Core/Channels
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) 01102009cli.txt
( 1) 12292009log.txt
( 2) 12292009log2.txt
( 3) 14092009crashvalgrind.txt
( 4) 14092009malloc_debug.txt
( 5) 14092009valgrind.txt
( 6) 15609_13082009_cli_output_with_lock_debug.txt
( 7) 15609_13082009_gdb_thread_apply_all.txt
( 8) 15609cli_locked_in_autoservice_line89.txt
( 9) 15609sipdebug_cli14082009.txt
(10) 15609threadapplyall_locked_in_autoservice_line89.txt
(11) 15609threadapplyallbt_coreshowlocks10082009.txt
(12) 20091221__issue15609__1.4__3.diff.txt
(13) 20091228__issue15609.diff.txt
(14) 20091229__issue15609__1.4__3.diff.txt
(15) 20091229__issue15609__1.4__4.diff.txt
(16) 20091229__issue15609__1.4.diff.txt
(17) 20091230__issue16521__1.4__chan_local_only.diff.txt
(18) 22122009valgrind.txt
(19) agent1606trace.txt
(20) bt.asterisk-dev-mc-2009-09-28T14:33:33-0700.1491.txt
(21) core.17805.txt
(22) core.18716.txt
(23) core.187162.txt
(24) core.28287.txt
(25) core.4977.txt
(26) core.5134.txt
(27) core21428.txt
(28) core628.txt
(29) coreshowlocks14092009.txt
(30) database_show_valgrind_malloc_debug.txt
(31) database_show_valgrind.txt
(32) filestream_frisolate-1.4.diff.txt
(33) filestream_frisolate-1.4.diff2.txt
(34) gdb_asterisk_pid_database_show.txt
(35) logger_repeating_14.patch
(36) messages.zip
(37) mysqlcheck_cdr_gdb_pid_ofasterisk11_12_2009.txt
(38) segfault14082009.txt
(39) threadapplyallbt.txt
(40) threadapplyallbt2.txt
(41) threadapplyallbt3.txt
(42) threadapplyallbt4.txt
(43) valgrind.30092009.txt
(44) valgrind09102009ast_queue_frame.txt
Description:Since upgrading to 1.4 SVN 206273 I see LOTS these errors when paging or when calls are processed by app_queue.  When I see the messages during a page I don't hear any paging (my Polycom phones continue to ring but no paging audio).

I have no idea where the message is coming from how to reproduce, or collect debug information for this specific issue.  I need help to find root cause.
I think it could be caused by locking in autoservice since I see this lock every time I see the warning message

=== Currently Held Locks ==============================================
=======================================================================
===
=== <file> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: 3057154960 (autoservice_run      started at [  238] autoservice.c ast_autoservice_start())
=== ---> Waiting for Lock #0 (autoservice.c): MUTEX 89 autoservice_run &(&aslist)->lock 0x81798c8 (1)
=== --- ---> Locked Here: autoservice.c line 89 (autoservice_run)
=== -------------------------------------------------------------------

Comments:By: Darrin Henshaw (ayth) 2009-08-04 06:29:41

I'm noticing this issue as well. Running 1.4.26 on CentOS 4.7, on a Dell server.

We are trunking calls using IAX, however, it's happening over our pri line as well. Seems sporadic, I've been woken up multiple times lately due to this, channels are left open within asterisk, and are causing issues. The logs/console are being spammed with this:

[Aug  2 08:08:52] WARNING[9235] channel.c: Exceptionally long voice queue length queuing to IAX2/Company-Country-6580
[Aug  2 08:08:52] WARNING[9232] channel.c: Exceptionally long voice queue length queuing to IAX2/Company-Country-1819
[Aug  2 08:08:52] WARNING[9234] channel.c: Exceptionally long voice queue length queuing to IAX2/Company-Country-6580
[Aug  2 08:08:52] WARNING[9238] channel.c: Exceptionally long voice queue length queuing to IAX2/Company-Country-1819
[Aug  2 08:08:52] WARNING[9235] channel.c: Exceptionally long voice queue length queuing to IAX2/Company-Country-6580
[Aug  2 08:08:52] WARNING[9233] channel.c: Exceptionally long voice queue length queuing to IAX2/Company-Country-1819
[Aug  2 08:08:52] WARNING[9234] channel.c: Exceptionally long voice queue length queuing to IAX2/Company-Country-6580
[Aug  2 08:08:52] WARNING[9238] channel.c: Exceptionally long voice queue length queuing to IAX2/Company-Country-1819

Any ideas?

By: Christoph Stadlmann (cstadlmann) 2009-08-04 07:03:05

This phenomenon also occurs if you have a queue with 'ringall' strategy and the members are all on 'Local/' channels.
If two members answer the call at the same time, this deadlock occurs and no further calls can be processed.

By: Mark Michelson (mmichelson) 2009-08-04 13:19:59

That "core show locks" output is just plain bizarre. It's showing that the autoservice thread cannot lock the aslist because the aslist was already locked by another autoservice thread. The problem with this condition is that there is supposed to only be *one* autoservice thread running, ever.

Was the core show locks output repeated many times, or was it just showing that one thread? For anyone that is able to make this happen, it would be helpful to see a gdb backtrace of all threads from an unoptimized build so that I or someone else can verify that there are two autoservice threads running.

Also for Ayth and cstadlmann, do you also see the same "core show locks" output aragon is reporting, or are you just seeing the "exceptionally long queue" message?

By: Darrin Henshaw (ayth) 2009-08-04 13:25:28

As far as I know for sure I'm just getting the "exceptionally long queue" message. However, next time it happens, I'll check the output from "core show locks" and post it. Thanks.

By: David Brillert (aragon) 2009-08-04 16:35:55

mmichelson

Was the core show locks output repeated many times, or was it just showing that one thread? For anyone that is able to make this happen, it would be helpful to see a gdb backtrace of all threads from an unoptimized build so that I or someone else can verify that there are two autoservice threads running.

I can get this message to appear every thirty seconds or so under high load.
I only see one autoservice lock (not spamming).

Which command should I run to get the gdb output(s) you are looking for?
I have attached the gdb from gdb asterisk --pid=28898 to notes as well (threadapllyallbt.txt) and some CLI output prior to warning and core show locks output



By: Mark Michelson (mmichelson) 2009-08-04 16:38:16

The "thread apply all bt" output was what I wanted. Sorry, I should have been more specific. Thanks for the quick feedback.

By: Christoph Stadlmann (cstadlmann) 2009-08-05 00:55:04

Sorry guys, I have to withdraw my post. The suggestion provided in Bug 14112 seems to fix the issue for me, at least for queues.
Maybe also compiling with DONT_OPTIMIZE, DEBUG_CHANNEL_LOCKS, DEBUG_THREADS, DEBUG_FD_LEAKS and MALLOX_DEBUG solved the issue.

This is how I patched chan_local.c:

--- channels/chan_local.c.orig  2009-07-29 12:34:47.000000000 +0200
+++ channels/chan_local.c       2009-08-05 06:47:19.000000000 +0200
@@ -539,10 +539,9 @@
       if (!p)
               return -1;

+       /* SMR - Use the macro rather */
       while (ast_mutex_trylock(&p->lock)) {
-               ast_channel_unlock(ast);
-               usleep(1);
-               ast_channel_lock(ast);
+               DEADLOCK_AVOIDANCE(&ast->lock);
       }

       isoutbound = IS_OUTBOUND(ast, p);


mmichelson: maybe you should link bug 14112 and this one.

By: David Brillert (aragon) 2009-08-05 08:56:19

I'm not using chan_agent in any queues and 14112 was suspended.
Also I am running a non optimized build

By: Christoph Stadlmann (cstadlmann) 2009-08-05 09:03:07

The title of this issue is '...queuing to Local', so I thought you are using Local channels. I'm also not using chan_agent, the second part of the patch is not implemented in my system. The patch I provided above seems to solve issues with queues and chan_local.
Regardless of this, I have deadlocks with queues posted in another ticket (15637) which you already monitor.

By: Mark Michelson (mmichelson) 2009-08-05 12:53:16

aragon: The text file you uploaded has a slightly different output for "core show locks" than what you originally wrote in the description, so the conditions may not be exactly the same as the last time you started seeing this message. Specifically, the following lines are missing from the new text file you uploaded:

=== ---> Waiting for Lock #0 (autoservice.c): MUTEX 89 autoservice_run &(&aslist)->lock 0x81798c8 (1)
=== --- ---> Locked Here: autoservice.c line 89 (autoservice_run)

This means that when you captured the output for the text file you uploaded, there was no actual deadlock. I found the thread that was printing all the "exceptionally long queue" messages, and it appears to be a typical channel thread. Are you using /n on your local channels?

cstadlmann: I would not expect for that patch you added to chan_local to actually help in this situation. The reason is the DEADLOCK_AVOIDANCE macro contains the same statements that were removed. The SMR comment alludes to the fact that he's just replacing the in-lined code with the macro version. The only thing that makes sense to me is that having DEBUG_THREADS enabled in menuselect is somehow causing a slight timing variation which causes the issue to be non-existent. Also, are you using /n on your local channels?

By: David Brillert (aragon) 2009-08-05 14:19:07

Are you using /n on your local channels? = yes

By: David Brillert (aragon) 2009-08-05 16:55:58

mmichelson

The lock comes and goes (not permanent until restart).
I have uploaded some more pid bt's (2-4) while I was watching the warnings.
Also I might be mistaken but the core show locks commands may be responsible for some of the locks, since when I issue the command I see the autoservice becomes actually locked (could be a coincidence).

By: David Brillert (aragon) 2009-08-05 16:58:26

Also it is very tough to get the core show locks output and the thread apply all bt output while the lock actually exists.  Maybe this means the locking is normal but as you said "is just plain bizarre" so I am taking your word for it.
Are the warnings normal or is something wrong here?

By: Christoph Stadlmann (cstadlmann) 2009-08-06 00:34:52

/n on local channels: YES.

By: David Brillert (aragon) 2009-08-10 09:15:19

mmichelson?

Any relation to https://issues.asterisk.org/view.php?id=15594

By: Mark Michelson (mmichelson) 2009-08-10 09:49:37

I don't think issue ASTERISK-14546 is related, but it's hard to tell. Not all of the backtraces from those crashes are the same, so it's hard to pinpoint the causes of those. Also, that issue is about crashes that are happening, as opposed to periods of potential deadlock, so I think it is unrelated.

I tried running some tests, including cstadlmann's test of having two queue members answer at the same time, but I could not make the warning message appear. I also tried some pages, as were suggested by aragon, but that did not work either.

My next plan is to try some other things, like placing transfers when using Local channels, to see if that might trigger the condition. Also, seeing as how cstadlmann's patch to chan_local appeared to fix his problem, I'm also trying the same set of tests as before but with no special compilation flags turned on, in case this is something which requires precise timing.

Basically, I'm trying to think of ways that can cause a large queue of audio frames to build up on a channel without the frames being read. Most of the time, a channel is put into autoservice so that such a condition cannot happen. One working theory I have is that since the calls are daisy-chained across local channels, one end of the local channel may be autoserviced correctly, but the other end is not, resulting in the large queue of voice frames.

As for Ayth's problem with IAX2 channels, I have no idea what's going on there. Sorry.

By: Christoph Stadlmann (cstadlmann) 2009-08-10 09:59:03

mmichelson,

after I have read your explanation, I don't think my 'patch' solved the issue. I assume it has something to do with the compiler flags, as you mentioned. The only thing I can say right now is that the messages containing 'Exceptionally long voice queue...' are gone, even when two queue members answer a call concurrently.

I kindly ask you to look at the issue https://issues.asterisk.org/view.php?id=15637. Maybe it's all about what's happending and described there.

By: David Brillert (aragon) 2009-08-10 09:59:41

I managed to get simultaneous core show locks output and thread apply all bt during an actual lock.
Uploading 15609threadapplyallbt_coreshowlocks10082009.txt

By: David Brillert (aragon) 2009-08-11 09:47:12

the warning messages always seem to appear when starting moh or when stopping moh

By: David Brillert (aragon) 2009-08-12 15:00:26

mmichelson,

Would anything else help such as enabling debug in logger ?



By: Mark Michelson (mmichelson) 2009-08-12 15:04:32

It may help, just to see when during the call the problem occurs. I can't guarantee it will be helpful, but it never hurts to have too much information :)

By: David Brillert (aragon) 2009-08-12 15:31:39

OK I enabled debug to console and this just spams the CLI so bad I cannot issue any commands.
Same lines over and over ad nausem.

[Aug 12 16:21:47] DEBUG[19046]: channel.c:4998 __ast_channel_unlock: ::::==== Unlocking AST channel SIP/siptrunk-09fea1b8
[Aug 12 16:21:47] DEBUG[19046]: channel.c:5019 __ast_channel_unlock: ::::==== Channel SIP/siptrunk-09fea1b8 was unlocked
[Aug 12 16:21:47] DEBUG[19046]: channel.c:5077 __ast_channel_trylock: ====:::: Trying to lock AST channel SIP/siptrunk-09fea1b8
[Aug 12 16:21:47] DEBUG[19046]: channel.c:5088 __ast_channel_trylock: :::=== Now have 1 locks (recursive)
[Aug 12 16:21:47] DEBUG[19046]: channel.c:5091 __ast_channel_trylock: ::::==== Channel SIP/siptrunk-09fea1b8 was locked

Every so often I see something a little different.

nlocking AST channel SIP/siptrunk-b4546790
[Aug 12 16:21:46] DEBUG[19705]: channel.c:5019 __ast_channel_unlock: [Aug 12 16:21:47] NOTICE[23647]: chan_sip.c:13163 handle_response_peerpoke: Peer '1671' is now Reachable. (622ms / 2000ms)
[Aug 12 16:21:47] DEBUG[23647]: devicestate.c:302 ast_device_state_changed_literal: Notification of state change to be queued on device/channel SIP/1671
[Aug 12 16:21:47] DEBUG[23647]: chan_sip.c:2258 __sip_ack: Stopping retransmission on '0d26e45e50b8438a2f2f8f4a586cab75@192.168.30.165' of Request 102: Match Found

[Aug 12 16:21:47] DEBUG[23647]: devicestate.c:302 ast_device_state_changed_literal: Notification of state change to be queued on device/channel SIP/1616

[Aug 12 16:21:46] DEBUG[23647]: channel.c:5095 __ast_channel_trylock: ::::==== Channel SIP/siptrunk-b4504708 failed to lock. Not waiting around...

[Aug 12 16:21:43] DEBUG[19730]: rtp.c:2712 ast_rtp_raw_write: Difference is 1312, ms is 184

By: Mark Michelson (mmichelson) 2009-08-12 17:03:17

Turn off DEBUG_CHANNEL_LOCKS. That's what's causing all of those spammy messages. Honestly, I don't even know why we have that option since it's mostly useless anyway.

By: David Brillert (aragon) 2009-08-13 08:50:37

I turned off locks.
Here is some CLI with debug enabled and I will try to get a text file uploaded with core show locks and simultaneous gdb output when there is an autoservice lock within the hour.

[Aug 13 09:47:28] WARNING[15052]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1610@default-agent-2b17,1
[Aug 13 09:47:28] DEBUG[15758]: app_queue.c:2448 is_our_turn: There are 0 available members.
[Aug 13 09:47:28] DEBUG[15758]: app_queue.c:2466 is_our_turn: It's not our turn (SIP/siptrunk-088e7130).
[Aug 13 09:47:28] WARNING[15052]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1610@default-agent-2b17,1
[Aug 13 09:47:28] WARNING[15052]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1610@default-agent-2b17,1
[Aug 13 09:47:28] DEBUG[15754]: rtp.c:2712 ast_rtp_raw_write: Difference is 3144, ms is 413
[Aug 13 09:47:28] DEBUG[15811]: rtp.c:2712 ast_rtp_raw_write: Difference is 3176, ms is 417
[Aug 13 09:47:28] DEBUG[15636]: rtp.c:2712 ast_rtp_raw_write: Difference is 3160, ms is 415
[Aug 13 09:47:28] DEBUG[15479]: rtp.c:2712 ast_rtp_raw_write: Difference is 3200, ms is 420
[Aug 13 09:47:28] DEBUG[15903]: rtp.c:2712 ast_rtp_raw_write: Difference is 3184, ms is 418
[Aug 13 09:47:28] DEBUG[15651]: rtp.c:2712 ast_rtp_raw_write: Difference is 3208, ms is 421
[Aug 13 09:47:28] DEBUG[15641]: rtp.c:2712 ast_rtp_raw_write: Difference is 3192, ms is 419
[Aug 13 09:47:28] DEBUG[15636]: app_queue.c:2448 is_our_turn: There are 0 available members.
[Aug 13 09:47:28] DEBUG[15636]: app_queue.c:2466 is_our_turn: It's not our turn (SIP/siptrunk-0889bd18).
[Aug 13 09:47:28] DEBUG[15764]: rtp.c:2712 ast_rtp_raw_write: Difference is 3200, ms is 420
[Aug 13 09:47:28] WARNING[15052]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1610@default-agent-2b17,1
[Aug 13 09:47:28] DEBUG[15902]: rtp.c:2712 ast_rtp_raw_write: Difference is 3216, ms is 422
[Aug 13 09:47:28] DEBUG[15899]: rtp.c:2712 ast_rtp_raw_write: Difference is 3248, ms is 426
[Aug 13 09:47:28] DEBUG[15492]: rtp.c:2712 ast_rtp_raw_write: Difference is 3224, ms is 423
[Aug 13 09:47:28] DEBUG[15696]: rtp.c:2712 ast_rtp_raw_write: Difference is 3256, ms is 427
[Aug 13 09:47:28] DEBUG[15901]: rtp.c:2712 ast_rtp_raw_write: Difference is 3256, ms is 427
[Aug 13 09:47:28] DEBUG[15645]: rtp.c:2712 ast_rtp_raw_write: Difference is 3256, ms is 427
[Aug 13 09:47:28] DEBUG[15900]: rtp.c:2712 ast_rtp_raw_write: Difference is 3272, ms is 429
[Aug 13 09:47:28] DEBUG[15759]: rtp.c:2712 ast_rtp_raw_write: Difference is 3280, ms is 430
[Aug 13 09:47:28] DEBUG[15484]: rtp.c:2712 ast_rtp_raw_write: Difference is 3296, ms is 432
[Aug 13 09:47:28] DEBUG[15642]: rtp.c:2712 ast_rtp_raw_write: Difference is 3296, ms is 432
[Aug 13 09:47:28] DEBUG[15657]: rtp.c:2712 ast_rtp_raw_write: Difference is 3328, ms is 436
[Aug 13 09:47:28] DEBUG[15758]: rtp.c:2712 ast_rtp_raw_write: Difference is 3320, ms is 435
[Aug 13 09:47:28] WARNING[15052]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1610@default-agent-2b17,1
[Aug 13 09:47:28] DEBUG[15653]: rtp.c:2712 ast_rtp_raw_write: Difference is 3328, ms is 436
[Aug 13 09:47:28] DEBUG[15766]: rtp.c:2712 ast_rtp_raw_write: Difference is 3352, ms is 439
[Aug 13 09:47:28] DEBUG[15752]: rtp.c:2712 ast_rtp_raw_write: Difference is 3384, ms is 443
[Aug 13 09:47:28] DEBUG[15698]: rtp.c:2712 ast_rtp_raw_write: Difference is 3384, ms is 443
[Aug 13 09:47:28] DEBUG[15904]: rtp.c:2712 ast_rtp_raw_write: Difference is 3416, ms is 447
[Aug 13 09:47:28] DEBUG[15760]: rtp.c:2712 ast_rtp_raw_write: Difference is 3440, ms is 450
[Aug 13 09:47:28] WARNING[15052]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1610@default-agent-2b17,1
[Aug 13 09:47:28] WARNING[15052]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1610@default-agent-2b17,1
[Aug 13 09:47:28] WARNING[15052]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1610@default-agent-2b17,1
[Aug 13 09:47:28] WARNING[15052]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1610@default-agent-2b17,1
[Aug 13 09:47:28] WARNING[15052]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1610@default-agent-2b17,1
[Aug 13 09:47:28] WARNING[15052]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1610@default-agent-2b17,1
[Aug 13 09:47:28] WARNING[15052]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1610@default-agent-2b17,1
[Aug 13 09:47:28] WARNING[15052]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1610@default-agent-2b17,1
[Aug 13 09:47:28] WARNING[15052]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1610@default-agent-2b17,1
[Aug 13 09:47:28] WARNING[15052]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1610@default-agent-2b17,1
[Aug 13 09:47:28] WARNING[15052]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1610@default-agent-2b17,1
[Aug 13 09:47:28] DEBUG[15408]: rtp.c:923 ast_rtcp_read: Got RTCP report of 64 bytes
fireworx*CLI> core show locks
fireworx*CLI>
=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <file> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: 3051977616 (autoservice_run      started at [  238] autoservice.c ast_autoservice_start())
=== ---> Waiting for Lock #0 (autoservice.c): MUTEX 89 autoservice_run &(&aslist)->lock 0x819d4e8 (1)
=== --- ---> Locked Here: autoservice.c line 89 (autoservice_run)
=== -------------------------------------------------------------------
===
=======================================================================

[Aug 13 09:47:28] WARNING[15052]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1610@default-agent-2b17,1
[Aug 13 09:47:28] WARNING[15052]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1610@default-agent-2b17,1
[Aug 13 09:47:28] DEBUG[15492]: app_queue.c:2448 is_our_turn: There are 0 available members.
[Aug 13 09:47:28] DEBUG[15492]: app_queue.c:2466 is_our_turn: It's not our turn (SIP/siptrunk-089cdf50).
[Aug 13 09:47:28] WARNING[15052]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1610@default-agent-2b17,1
[Aug 13 09:47:28] DEBUG[15904]: app_queue.c:2448 is_our_turn: There are 0 available members.
[Aug 13 09:47:28] DEBUG[15904]: app_queue.c:2466 is_our_turn: It's not our turn (SIP/siptrunk-b5d6eba8).
[Aug 13 09:47:28] DEBUG[15903]: app_queue.c:2448 is_our_turn: There are 0 available members.
[Aug 13 09:47:28] DEBUG[15903]: app_queue.c:2466 is_our_turn: It's not our turn (SIP/siptrunk-b5d67300).

By: David Brillert (aragon) 2009-08-13 09:04:45

uploading 15609_13082009_gdb_thread_apply_all.txt
and 15609_13082009_cli_output with_locl_debug.txt

These logs were captured on Asterisk revision 211807



By: David Brillert (aragon) 2009-08-13 09:28:40

previous uploads showed lock in a different line of autoservice
Uploading two more txt files where Asterisk was locked in autoservice line 89
15609cli_locked in autoservice_line89.txt
15609threadapplyall_locked in autoservice_line89.txt

By: David Brillert (aragon) 2009-08-14 12:46:52

This time a crash so I uploaded segfault14082009.txt
I opened a ticket for this crash 15544

I saw some similarities in the core show locks output 15609threadapplyall_locked in autoservice_line89.txt from and the segfault gdb output segfault14082009.txt so I think there may be some relation.

By: David Brillert (aragon) 2009-08-14 13:21:25

Since there is some possiblity that this ticket is related to 15544 and I suspect that root cause of 15544 is somewhere in https://reviewboard.asterisk.org/r/308/
I have uploaded another CLI output with debug and sip debug enabled.
15609sipdebug_cli14082009.txt uploaded

By: David Brillert (aragon) 2009-08-14 15:06:57

Also if I replace the SIP trunk I am using to send calls between Asterisk with PRI spans I still get the warning messages.

By: David Brillert (aragon) 2009-08-17 09:46:35

Yikers

I'm now starting to see this in the field.
It appears that during the lock SIP phones lose their registrations and become unreachable.  This is also tearing down active calls.
Example:

[Aug 13 18:19:04] WARNING[7440] channel.c: Exceptionally long voice queue length queuing to Local/153@default-local-d4b4,1
[Aug 13 18:19:04] WARNING[7440] channel.c: Exceptionally long voice queue length queuing to Local/153@default-local-d4b4,1
[Aug 13 18:19:04] WARNING[7440] channel.c: Exceptionally long voice queue length queuing to Local/153@default-local-d4b4,1
[Aug 13 18:19:04] WARNING[7440] channel.c: Exceptionally long voice queue length queuing to Local/153@default-local-d4b4,1
[Aug 13 18:19:04] WARNING[7440] channel.c: Exceptionally long voice queue length queuing to Local/153@default-local-d4b4,1
[Aug 13 18:19:04] WARNING[7440] channel.c: Exceptionally long voice queue length queuing to Local/153@default-local-d4b4,1
[Aug 13 18:19:04] WARNING[7440] channel.c: Exceptionally long voice queue length queuing to Local/153@default-local-d4b4,1
[Aug 13 18:19:04] WARNING[7440] channel.c: Exceptionally long voice queue length queuing to Local/153@default-local-d4b4,1
[Aug 13 18:19:04] WARNING[7440] channel.c: Exceptionally long voice queue length queuing to Local/153@default-local-d4b4,1
[Aug 13 18:19:04] WARNING[7440] channel.c: Exceptionally long voice queue length queuing to Local/153@default-local-d4b4,1
[Aug 13 18:19:04] WARNING[7440] channel.c: Exceptionally long voice queue length queuing to Local/153@default-local-d4b4,1
[Aug 13 18:19:04] WARNING[7440] channel.c: Exceptionally long voice queue length queuing to Local/153@default-local-d4b4,1
[Aug 13 18:19:04] WARNING[7440] channel.c: Exceptionally long voice queue length queuing to Local/153@default-local-d4b4,1
[Aug 13 18:28:40] WARNING[19883] chan_dahdi.c: Unable to request echo training on channel 49: Invalid argument
[Aug 13 18:34:40] NOTICE[7445] chan_sip.c: Peer '127' is now UNREACHABLE!  Last qualify: 19
[Aug 13 18:34:59] NOTICE[7445] chan_sip.c: Peer '127' is now Reachable. (16ms / 2000ms)
[Aug 13 18:45:43] NOTICE[7445] chan_sip.c: Peer '128' is now UNREACHABLE!  Last qualify: 21
[Aug 13 18:46:03] NOTICE[7445] chan_sip.c: Peer '128' is now Reachable. (15ms / 2000ms)
[Aug 13 18:53:28] NOTICE[7445] chan_sip.c: Peer '147' is now UNREACHABLE!  Last qualify: 21
[Aug 13 18:53:58] NOTICE[7445] chan_sip.c: Peer '147' is now Reachable. (15ms / 2000ms)
[Aug 13 19:09:27] WARNING[5209] chan_dahdi.c: Unable to request echo training on channel 76: Invalid argument
[Aug 13 19:13:27] NOTICE[7445] chan_sip.c: Peer '125' is now UNREACHABLE!  Last qualify: 21
[Aug 13 19:14:26] NOTICE[7445] chan_sip.c: Peer '125' is now Reachable. (16ms / 2000ms)
[Aug 13 19:28:51] WARNING[28325] file.c: Failed to write frame
[Aug 13 19:31:52] NOTICE[7445] chan_sip.c: Peer '129' is now UNREACHABLE!  Last qualify: 18
[Aug 13 19:32:16] NOTICE[7445] chan_sip.c: Peer '129' is now Reachable. (16ms / 2000ms)
[Aug 13 19:46:42] NOTICE[7445] chan_sip.c: Peer '150' is now UNREACHABLE!  Last qualify: 20
[Aug 13 19:47:14] NOTICE[7445] chan_sip.c: Peer '150' is now Reachable. (16ms / 2000ms)
[Aug 13 19:48:51] WARNING[20954] file.c: Failed to write frame
[Aug 13 19:58:21] NOTICE[7445] chan_sip.c: Peer '129' is now UNREACHABLE!  Last qualify: 21
[Aug 13 19:58:50] NOTICE[7445] chan_sip.c: Peer '129' is now Reachable. (15ms / 2000ms)
[Aug 13 20:07:01] NOTICE[7445] chan_sip.c: Peer '108' is now UNREACHABLE!  Last qualify: 22
[Aug 13 20:07:59] NOTICE[7445] chan_sip.c: Peer '108' is now Reachable. (16ms / 2000ms)
[Aug 13 20:17:05] NOTICE[7445] chan_sip.c: Peer '108' is now UNREACHABLE!  Last qualify: 20
[Aug 13 20:17:52] NOTICE[7445] chan_sip.c: Peer '108' is now Reachable. (17ms / 2000ms)
[Aug 13 21:15:50] NOTICE[7445] chan_sip.c: Peer '105' is now UNREACHABLE!  Last qualify: 18

By: David Brillert (aragon) 2009-08-17 09:55:23

IMHO the severity of this bug report must be raised.
It is very service affecting in the field.

By: David Brillert (aragon) 2009-08-17 10:07:04

Just prior to each bank of warnings for WARNING[7440] channel.c: Exceptionally long voice queue length queuing to Local/

I see warnings like these:

[Aug 17 08:57:41] WARNING[21276] file.c: Failed to write frame
[Aug 17 08:57:41] WARNING[21276] file.c: Failed to write frame

By: David Brillert (aragon) 2009-08-25 10:55:19

This is still taking my phones down daily.
The warning message always prefixes a bunch of phones becoming unreachable by qualify.
The phones and server are on dedicated LAN switching hardware.

By: Russell Bryant (russell) 2009-08-25 14:56:26

I am wondering if ASTERISK-14129 is related to this.  It will be good to know if this still happens after we get that issue resolved.

By: David Brillert (aragon) 2009-08-25 18:05:02

I patched asterisk-addons and tested on 1.4.26.1 but I still get these warnings, and the lock in autoservice

long voice queue length queuing to Local/1636@default-agent-6981,1
[Aug 25 19:01:32] WARNING[3465]: channel.c:952 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1636@default-agent-6981,1
[Aug 25 19:01:32] WARNING[3465]: channel.c:952 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1636@default-agent-6981,1
fireworx*CLI> core show locks
fireworx*CLI>
=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <file> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: 3062971280 (autoservice_run      started at [  238] autoservice.c ast_autoservice_start())
=== ---> Waiting for Lock #0 (autoservice.c): MUTEX 89 autoservice_run &(&aslist)->lock 0x81798c8 (1)
=== --- ---> Locked Here: autoservice.c line 89 (autoservice_run)
=== -------------------------------------------------------------------
===
=======================================================================

[Aug 25 19:01:32] WARNING[3465]: channel.c:952 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1636@default-agent-6981,1
[Aug 25 19:01:32] WARNING[3465]: channel.c:952 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1636@default-agent-6981,1
[Aug 25 19:01:32] WARNING[3465]: channel.c:952 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1636@default-agent-6981,1
[Aug 25 19:01:32] WARNING[3465]: channel.c:952 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1636@default-agent-6981,1
[Aug 25 19:01:32] WARNING[3465]: channel.c:952 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1636@default-agent-6981,1

By: David Brillert (aragon) 2009-08-25 18:08:42

fyi compiler options in my traces have always been:
DON'T_OPTIMIZE MALLOC_DEBUG DEBUG_THREADS

By: David Brillert (aragon) 2009-08-26 07:23:34

The lack of audio during a page is not reproducible under normal circumstances.  I think that was caused during high call volume load tests while debugging bug 15109.
I'm pretty sure that anomaly was caused by https://issues.asterisk.org/view.php?id=15627
and I ran out of sockets while testing 1.4.26.1
However I am still seeing the warnings and autoservice locks under normal circumstances while testing the patch from 15109

By: David Brillert (aragon) 2009-08-31 13:25:47

Russell,
I'm not sure about any relationship to 15109 since there are no crashes just locks, also I applied the patch from 15109 but that does not help remove the locks or warnings.
These warnings and locks did not appear in 1.4.24.1 and I don't know which revision introduced this behavior... I miss viewVNC :(

I am now testing SVN-branch-1.4-r214940 with patched asterisk-addons(15109 patch) and I still see WARNING[3465]: channel.c:952 __ast_queue_frame: Exceptionally long voice queue length queuing to Local and Asterisk is still locking

[Aug 31 14:23:43] WARNING[23269]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1602@default-agent-4027,1
[Aug 31 14:23:43] WARNING[23269]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1602@default-agent-4027,1
fireworx*CLI> core show locks
fireworx*CLI>
=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <file> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: 3062037392 (autoservice_run      started at [  238] autoservice.c ast_autoservice_start())
=== ---> Waiting for Lock #0 (autoservice.c): MUTEX 89 autoservice_run &(&aslist)->lock 0x8178448 (1)
=== --- ---> Locked Here: autoservice.c line 89 (autoservice_run)
=== -------------------------------------------------------------------
===
=======================================================================

[Aug 31 14:23:43] WARNING[23269]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1602@default-agent-4027,1

By: David Brillert (aragon) 2009-09-01 08:41:32

The CLI displays lots of these SIP dialog messages before and after the locks.

Really destroying SIP dialog '388df4933543d0cd1f4088ca18d95533@192.168.30.165' Method: OPTIONS
Really destroying SIP dialog '2ce9de5a6cce8cfc5f8e0c293d10d42f@192.168.30.165' Method: OPTIONS
Really destroying SIP dialog '62a213322864fee614dede9742872a96@192.168.30.165' Method: OPTIONS
Really destroying SIP dialog '40c642ba4875f3f76e1d1fc568056d86@192.168.30.165' Method: OPTIONS
Really destroying SIP dialog '05e75c10329c38e7147a6e372276ed4a@192.168.30.165' Method: OPTIONS
Really destroying SIP dialog '6048208a0c4f028b29468ee47f1bc093@192.168.30.165' Method: OPTIONS
Really destroying SIP dialog '7a1eb3581d202a9b5f622aa83a7d7fe8@192.168.30.165' Method: OPTIONS
   -- Executing [h@default-application-acd-new-english:1] Goto("SIP/siptrunk-b5ba72b0", "all-hangup|s|1") in new stack
   -- Goto (all-hangup,s,1)
   -- Executing [s@all-hangup:1] GotoIf("SIP/siptrunk-b5ba72b0", "0?all-faxnotify|s|1:2") in new stack
   -- Goto (all-hangup,s,2)
   -- Executing [s@all-hangup:2] ResetCDR("SIP/siptrunk-b5ba72b0", "w") in new stack
   -- Executing [s@all-hangup:3] NoCDR("SIP/siptrunk-b5ba72b0", "") in new stack
   -- Executing [s@all-hangup:4] System("SIP/siptrunk-b5ba72b0", "/var/www/scopserv/telephony/scripts/billing/cdr.sh 1251811800.23097") in new stack
Really destroying SIP dialog '68dfbcb92cdc4c79038aa09463d2c11e@192.168.30.165' Method: OPTIONS
Really destroying SIP dialog '066e4cf4238e779002f0a2383d6e2221@192.168.30.165' Method: OPTIONS
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
[Sep  1 09:31:31] WARNING[31458]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1620@default-agent-4558,1
fireworx*CLI> core show locks
fireworx*CLI>
=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <file> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: 3062037392 (autoservice_run      started at [  238] autoservice.c ast_autoservice_start())
=== ---> Waiting for Lock #0 (autoservice.c): MUTEX 89 autoservice_run &(&aslist)->lock 0x8178448 (1)
=== --- ---> Locked Here: autoservice.c line 89 (autoservice_run)
=== -------------------------------------------------------------------

By: David Brillert (aragon) 2009-09-04 07:51:17

This time I have a core dump (memory abort).
Attaching core21428.txt
I think this issue might be related to 15109 but asterisk-addons is patched with Russel's commit from that issue.

By: David Brillert (aragon) 2009-09-04 08:59:51

Russell,

I'm patching with your diff from 15719 and retesting
https://issues.asterisk.org/file_download.php?file_id=23674&type=bug

By: David Brillert (aragon) 2009-09-07 13:21:04

Russell,

Crashed (memory abort) with SVN-branch-1.4-r214940M and patch from 15719 core628.txt uploaded after crash



By: David Brillert (aragon) 2009-09-07 13:50:37

Could an admin please raise priority level of this bug report.
It is not minor since Asterisk is crashing often.  Looks more like a 1.4.27 block...

By: Amilcar S Silvestre (amilcar) 2009-09-09 07:54:10

I keep having crashes similar to this last core dump uploaded by aragon. I think 15109 was closed prematured. I don't use mp3, neighter have format_mp3.so loaded.

By: David Brillert (aragon) 2009-09-10 09:28:45

Another core dump (averaging one per day) running SVN r217156 including patch issue_15179.debug1.diff.txt with patched Asterisk-addons revision 1027
New backtrace core.5134.txt uploaded

By: godsyn (godsyn) 2009-09-10 12:07:03

This looks like the issue we are having. ~25000 calls daily, and it has crashed about three times in the last week. Constantly getting the above warnings in our console.

By: David Brillert (aragon) 2009-09-11 08:03:19

Uploading my daily crash dump.
core.28287.txt

I'll try running Valgrind...

By: David Brillert (aragon) 2009-09-14 08:04:01

Running under Valgrind over the weekend but no crash yet as of Monday.
Lots of locks though... so I have to restart Asterisk since it is no longer processing any calls.  Uploaded coreshowlocks14092009.txt, 14092009valgrind.txt, 14092009malloc_debug.txt but I can't do a thread apply all bt to Valgrind thread.

By: David Brillert (aragon) 2009-09-14 10:37:28

Also I get millions of messages in the CLI (fills my 2 million line scrollback in putty) when running in Valgrind.
Warnings like these:
[Sep 13 05:15:18] WARNING[28750]: pbx.c:4711 add_pri: Unable to register extension 's', priority 8 in 'default-aa-gmmesseng', already in use
[Sep 13 05:15:18] WARNING[26789]: pbx.c:4711 add_pri: Unable to register extension 'rc-CONGESTION', priority 1 in 'macro-dial-result', already in use
[Sep 13 05:15:18] WARNING[26789]: pbx.c:4711 add_pri: Unable to register extension 'rc-CONGESTION', priority 2 in 'macro-dial-result', already in use
[Sep 13 05:15:18] WARNING[26789]: pbx.c:4711 add_pri: Unable to register extension 'rc-NOANSWER', priority 1 in 'macro-dial-result', already in use
[Sep 13 05:15:18] WARNING[26789]: pbx.c:4711 add_pri: Unable to register extension 'rc-NOANSWER', priority 2 in 'macro-dial-result', already in use
[Sep 13 05:15:18] WARNING[26789]: pbx.c:4711 add_pri: Unable to register extension 'rc-0', priority 1 in 'macro-dial-result', already in use
[Sep 13 05:15:18] WARNING[26789]: pbx.c:4711 add_pri: Unable to register extension 'rc-0', priority 2 in 'macro-dial-result', already in use
[Sep 13 05:15:18] WARNING[32534]: pbx.c:4711 add_pri: Unable to register extension '3010', priority 1 in 'default-local-directory', already in use
[Sep 13 05:15:18] WARNING[32534]: pbx.c:4711 add_pri: Unable to register extension '3011', priority 1 in 'default-local-directory', already in use
[Sep 13 05:15:18] WARNING[32534]: pbx.c:4711 add_pri: Unable to register extension '3012', priority 1 in 'default-local-directory', already in use
[Sep 13 05:15:18] WARNING[32534]: pbx.c:4711 add_pri: Unable to register extension '3013', priority 1 in 'default-local-directory', already in use
[Sep 13 05:15:18] WARNING[32534]: pbx.c:4711 add_pri: Unable to register extension '3014', priority 1 in 'default-local-directory', already in use
[Sep 13 05:15:18] WARNING[32534]: pbx.c:4711 add_pri: Unable to register extension '3015', priority 1 in 'default-local-directory', already in use
[Sep 13 05:15:18] WARNING[32534]: pbx.c:4711 add_pri: Unable to register extension '3016', priority 1 in 'default-local-directory', already in use
[Sep 13 05:15:18] WARNING[32534]: pbx.c:4711 add_pri: Unable to register extension '3017', priority 1 in 'default-local-directory', already in use
[Sep 13 05:15:18] WARNING[32534]: pbx.c:4711 add_pri: Unable to register extension '3018', priority 1 in 'default-local-directory', already in use

By: David Brillert (aragon) 2009-09-14 16:05:14

I restarted asterisk under valgrind and asterisk crashed before it could even start processing calls. Uploading the valgrind trace anyway...
14092009crashvalgrind.txt

By: Leif Madsen (lmadsen) 2009-09-16 08:30:58

Just upgraded this to Major, then realized it is a Crash severity. Correcting that now.

By: Russell Bryant (russell) 2009-09-16 08:50:40

Note to self on a file to go back and look at: coreshowlocks14092009.txt contains full information on a deadlock that needs to be fixed

By: Tilghman Lesher (tilghman) 2009-09-16 12:57:38

While coreshowlocks14092009.txt contains a deadlock, 1.4 code does not contain that deadlock.

=== Thread ID: 186456976 (session_do           started at [ 2524] manager.c accept_thread())
=== ---> Lock #0 (manager.c): RDLOCK 2283 process_message &actionlock 0x81851e0 (1)
=== ---> Lock #1 (pbx.c): WRLOCK 6169 ast_wrlock_contexts &conlock 0x8185280 (1)
=== ---> Lock #2 (pbx.c): MUTEX 3959 ast_merge_contexts_and_delete &(&hints)->lock 0x81809c8 (1)
=== ---> Waiting for Lock #3 (pbx.c): MUTEX 5319 __ast_context_destroy &tmp->lock 0x6db42c0 (1)
=== --- ---> Locked Here: pbx.c line 4844 (ast_add_extension2)
=== -------------------------------------------------------------------
=== -------------------------------------------------------------------
===
=== Thread ID: 187440016 (session_do           started at [ 2524] manager.c accept_thread())
=== ---> Lock #0 (manager.c): RDLOCK 2283 process_message &actionlock 0x81851e0 (1)
=== ---> Lock #1 (pbx.c): MUTEX 4844 ast_add_extension2 &con->lock 0x6db42c0 (1)
=== ---> Waiting for Lock #2 (pbx.c): MUTEX 2154 ast_add_hint &(&hints)->lock 0x81809c8 (1)
=== --- ---> Locked Here: pbx.c line 3959 (ast_merge_contexts_and_delete)
=== -------------------------------------------------------------------

but in 1.4, we unlock &con->lock before calling ast_add_hint (changed in revision 29051, which was prior to 1.4 being branched).
aragon:  are you running a modified version of Asterisk?



By: David Brillert (aragon) 2009-09-16 13:01:45

SVN r217156 including patch issue_15179.debug1.diff.txt with patched Asterisk-addons revision 1027

By: Tilghman Lesher (tilghman) 2009-09-16 13:06:29

I have no idea where you got that patch, because ASTERISK-14188 has no patches uploaded to it.

By: David Brillert (aragon) 2009-09-16 13:15:44

A typo in my notes...issue_15719.debug1.diff.txt

Here is the link to the actual patch
https://issues.asterisk.org/file_download.php?file_id=23674&type=bug

By: Russell Bryant (russell) 2009-09-17 21:49:03

I have uploaded a patch, filestream-frisolate-1.4.diff.txt, which is an attempt at a workaround for this filestream crash.  I still don't know what the problem is, but I have modified frame handling from filestreams in hopes that it avoids whatever the problem is.

Let me know what happens ...

By: David Brillert (aragon) 2009-09-18 08:56:52

OK, now testing SVN-branch-1.4-r219320M with filestream-frisolate-1.4.diff.txt patch, and asterisk-addons revision 1027.

By: David Brillert (aragon) 2009-09-18 09:04:44

Right away I notice that I am still gettting these warnings and I still don't know if they are normal/benign?

[Sep 18 09:55:34] WARNING[24164]: channel.c:953 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1625@default-agent-61bf,1
[Sep 18 09:55:34] WARNING[24164]: channel.c:953 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1625@default-agent-61bf,1
[Sep 18 09:55:34] WARNING[24164]: channel.c:953 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1625@default-agent-61bf,1

By: Russell Bryant (russell) 2009-09-18 09:08:32

It shouldn't happen.  However, I would not expect my patch to change it.

The patch is to address the crashes that you (and many others) have seen, where you see it crashing when trying to free a frame, and then destroying a filestream.  It has often been seen from musiconhold code.

This bug report includes information on a few problems at this point:

1) The issue where you see the "Exceptionally long voice queue ..."

2) The filestream crash (a number of backtraces posted showing that crash).

3) At least one deadlock (shown by the core show locks output that has been posted)

#3 may be the cause of #1.  However, the patch is only for #2.

By: David Brillert (aragon) 2009-09-18 15:43:51

Crashed already...  looks like a iax2 bug.
Either this is a new issue in revision 219320+patch or the crash is related to problem 2)
I'm not really using IAX2... there is no traffic here.

Uploaded core.17805.txt

By: Digium Subversion (svnbot) 2009-09-18 21:53:07

Repository: asterisk
Revision: 219586

U   branches/1.4/channels/chan_iax2.c

------------------------------------------------------------------------
r219586 | russell | 2009-09-18 21:53:06 -0500 (Fri, 18 Sep 2009) | 6 lines

Make sure the iax_pvt exists before dereferencing it.

This fixes the latest crash posted on issue 15609.

(issue ASTERISK-14558)

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=219586

By: Digium Subversion (svnbot) 2009-09-18 22:01:45

Repository: asterisk
Revision: 219587

_U  trunk/
U   trunk/channels/chan_iax2.c

------------------------------------------------------------------------
r219587 | russell | 2009-09-18 22:01:45 -0500 (Fri, 18 Sep 2009) | 13 lines

Merged revisions 219586 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
 r219586 | russell | 2009-09-18 21:51:13 -0500 (Fri, 18 Sep 2009) | 6 lines
 
 Make sure the iax_pvt exists before dereferencing it.
 
 This fixes the latest crash posted on issue 15609.
 
 (issue ASTERISK-14558)
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=219587

By: Digium Subversion (svnbot) 2009-09-18 22:08:43

Repository: asterisk
Revision: 219588

_U  branches/1.6.0/
U   branches/1.6.0/channels/chan_iax2.c

------------------------------------------------------------------------
r219588 | russell | 2009-09-18 22:08:42 -0500 (Fri, 18 Sep 2009) | 20 lines

Merged revisions 219587 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r219587 | russell | 2009-09-18 21:59:52 -0500 (Fri, 18 Sep 2009) | 13 lines
 
 Merged revisions 219586 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r219586 | russell | 2009-09-18 21:51:13 -0500 (Fri, 18 Sep 2009) | 6 lines
   
   Make sure the iax_pvt exists before dereferencing it.
   
   This fixes the latest crash posted on issue 15609.
   
   (issue ASTERISK-14558)
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=219588

By: Digium Subversion (svnbot) 2009-09-18 22:12:14

Repository: asterisk
Revision: 219589

_U  branches/1.6.1/
U   branches/1.6.1/channels/chan_iax2.c

------------------------------------------------------------------------
r219589 | russell | 2009-09-18 22:12:14 -0500 (Fri, 18 Sep 2009) | 20 lines

Merged revisions 219587 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r219587 | russell | 2009-09-18 21:59:52 -0500 (Fri, 18 Sep 2009) | 13 lines
 
 Merged revisions 219586 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r219586 | russell | 2009-09-18 21:51:13 -0500 (Fri, 18 Sep 2009) | 6 lines
   
   Make sure the iax_pvt exists before dereferencing it.
   
   This fixes the latest crash posted on issue 15609.
   
   (issue ASTERISK-14558)
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=219589

By: Digium Subversion (svnbot) 2009-09-18 22:16:20

Repository: asterisk
Revision: 219590

_U  branches/1.6.2/
U   branches/1.6.2/channels/chan_iax2.c

------------------------------------------------------------------------
r219590 | russell | 2009-09-18 22:16:20 -0500 (Fri, 18 Sep 2009) | 20 lines

Merged revisions 219587 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r219587 | russell | 2009-09-18 21:59:52 -0500 (Fri, 18 Sep 2009) | 13 lines
 
 Merged revisions 219586 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r219586 | russell | 2009-09-18 21:51:13 -0500 (Fri, 18 Sep 2009) | 6 lines
   
   Make sure the iax_pvt exists before dereferencing it.
   
   This fixes the latest crash posted on issue 15609.
   
   (issue ASTERISK-14558)
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=219590

By: Leif Madsen (lmadsen) 2009-09-22 09:27:56

Looks like Russell made an update here and got it committed. Can you verify you're still getting the crash on the latest version?

I suspect it may still do it, but if you're not using chan_iax2, then perhaps you can just do a noload => chan_iax2.so to make sure it isn't related -- or at least not a separate issue that needs to be opened in a new report.

Thanks!

By: David Brillert (aragon) 2009-09-22 09:44:38

Leif,

I'll be back in the office tomorrow and will update Asterisk with Russell's IAX2 patch.
Since Friday I have 16 core dumps that I must analyze for relevance.
IAX2 config does exist for a small test trunk but there is only traffic on a SIP trunk.
The WARNING messages are generating regardless of trunk type.
I have tested the following trunk types:
SIP
IAX2
PRI

The test scripts do the following:
1. Multiple Asterisk servers hammer my test server through a SIP trunk. Some of those Asterisk servers act as SIP clients to emulate many SIP phones with an ACD agent login script.
2. Multiple agents answer calls while logged into queue with dynamic agent login and /n option using Polycom IP550 phones with 3.1.3revC firmware.
3. After agent answer caller is transferred to another SIP extension with native Asterisk attended transfer code. Once a Asterisk SIP client receives a transferred call we auto answer with "weasels" prompt.

When the exceptionally long warning message appears multiple SIP extensions become unavailable with SIP qualify in Asterisk for each extension.
Total number of extensions on test server is about 100

By: David Brillert (aragon) 2009-09-22 10:05:47

Patch filestream-frisolate-1.4.diff.txt patch won't patch
SVN-branch-1.4-r219720

wget 'https://issues.asterisk.org/file_download.php?file_id=23845&type=bug' -O - | patch -p0

patching file include/asterisk/frame.h
patching file include/asterisk/file.h
patching file main/frame.c
patching file main/file.c
Hunk ASTERISK-2 FAILED at 907.
Hunk ASTERISK-3 succeeded at 1339 (offset 16 lines).
1 out of 7 hunks FAILED -- saving rejects to file main/file.c.rej

By: David Brillert (aragon) 2009-09-22 11:47:32

Leif,

1.4 SVN r219720 will include Russell's IAX2 patch
So once I can patch r219720 with Russell's filestream patch I will do the following.
Remove all core dumps and re-run all tests and wait for crash then post back traces.

Waiting for updated filestream patch.

By: David Brillert (aragon) 2009-09-22 15:13:07

For now I am testing rev219720 and for a while I disabled IAX2 to see if the WARNINGS would go away.  They did not. I tested this because there seems to be a duplicate bug report of ASTERISK-14558 at ASTERISK-14834 (but without any reported crashes).
I have re-enabled IAX2 to keep testing Russell's 219856 commit.
Still waiting for new filestream patch for rev219720 but at least rev219720 will allow me to test https://reviewboard.asterisk.org/r/372/ commit 219653 and see if that helps at all.

By: Leif Madsen (lmadsen) 2009-09-22 15:50:44

Note that I have just seen this same message on a clients system running 1.6.1.x.

I'll be looking into this tomorrow.

By: David Brillert (aragon) 2009-09-23 16:04:40

Another memory abort today on rev219720
the filestream patch ready for testing won't insert on rev219720 so I can't test that patch right now while using Russell's IAX2 commit.

Regardless the backtrace seems to be pointing some blame on autoservice.c in the latest back trace uploaded core.18716.txt

Could it be that more than one autoservice thread is running at a time as implied by mmichelson?  It seems to be responsible for a lock and a memory abort.

By: Russell Bryant (russell) 2009-09-23 18:58:42

aragon: Can you please post the output of "thread apply all bt" and then "thread apply all bt full" from that last core dump that is associated with core.18716.txt?

Also, I'll update that patch shortly.

By: Russell Bryant (russell) 2009-09-23 19:01:00

New patch for latest 1.4 uploaded, ilestream_frisolate-1.4.diff2.txt

By: David Brillert (aragon) 2009-09-23 19:09:15

Uploaded back trace as requested core.18716(2).txt

By: David Brillert (aragon) 2009-09-23 19:31:16

new patch applied, test results soon

By: David Brillert (aragon) 2009-09-23 19:42:53

Edit: Total brain fart, deleting my last note.
Asterisk is running with diff2 patch.
Will post test results soon



By: David Brillert (aragon) 2009-09-24 15:44:49

I tracked the call history of one call resulting in the WARNING and uploaded the filtered CLI output as agent1606trace.txt

-A call enters queue over SIP trunk
-Starts MOH
-Calls Local/1606@default-agent/n
-Local/1606@default-agent-4274,1 answers
-MOH stops
-h extension executes a normal all-hangup
-Some CDR stuff
-core show locks display lock in autoservice.c line 89
-The WARNINGS are displayed in the CLI

By: David Brillert (aragon) 2009-09-25 08:15:05

In my previous note I mentioned the circumstances of the lock.
Then I remembered something I saw in one of the latest committed revisions.
Reviewboard r362 committed in revision 219136.
I am currently testing SVN-branch-1.4-r219816M and I still see the warnings and locks.

This quote from the review board made me think of the original issue posted in my bug report about the WARNINGS and locks.  It maybe irrelevant but I wouldn't know since I have no programming skills.

Russell Bryant
Posted 1 week ago (September 17th, 2009, 9:24 a.m.)

I have one last comment, but it's really about code that was already there.  It seems like we hold the channel list locked a whole lot longer than we need to when freeing a channel.  However, I guess we should just leave it alone, since in the majority of cases, the channel list isn't going to be touched in ast_channel_free() anymore, since you have it in ast_hangup().  Ship it!

/tags/1.6.1.1/main/channel.c (Diff revision 4)


1399

if (inlist)

1393

AST_RWLIST_UNLOCK(&channels);

1400

AST_RWLIST_UNLOCK(&channels);

   Can you think of any reason we wait this long to unlock the channel list?

      1.

         Matthew Nicholson 1 week ago (September 17th, 2009, 9:29 a.m.)

             I noticed this as well.  I have no idea why the code is like this.  We should be able to unlock the list once we remove our channel from it without any problems.  Thanks for the feedback.

By: David Brillert (aragon) 2009-09-27 10:14:35

New crash.
Uploaded core.4977.txt

By: newborn (newborn) 2009-09-28 06:36:53

i cannot reproduce the error, but it repeated once more. the CLI history:

[Sep 28 15:07:33]     -- Accepting call from '' to '0840' on channel 0/27, span 3
[Sep 28 15:07:33]     -- Executing [0840@from-qsig:1] Set("DAHDI/89-1", "CELL_PHONE=") in new stack
[Sep 28 15:07:33]     -- Executing [0840@from-qsig:2] Set("DAHDI/89-1", "CELL_TIMEOUT=") in new stack
[Sep 28 15:07:33]     -- Executing [0840@from-qsig:3] GotoIf("DAHDI/89-1", "1?4:6") in new stack
[Sep 28 15:07:33]     -- Goto (from-qsig,0840,4)
[Sep 28 15:07:33]     -- Executing [0840@from-qsig:4] Dial("DAHDI/89-1", "Local/m4-1129@external/n/n,") in new stack
[Sep 28 15:07:33]     -- Executing [m4-1129@external:1] Set("Local/m4-1129@external-9791;2", "TIMEOUT(absolute)=3600") in new stack
[Sep 28 15:07:33]     -- Called m4-1129@external/n/n
[Sep 28 15:07:33] Channel will hangup at 2009-09-28 16:07:33.000 MSD.
[Sep 28 15:07:33]     -- Executing [m4-1129@external:2] Dial("Local/m4-1129@external-9791;2", "DAHDI/G1/812345673,,rD(www1129)") in new stack
[Sep 28 15:07:33]     -- Requested transfer capability: 0x00 - SPEECH
[Sep 28 15:07:33]     -- Called G1/812345673
[Sep 28 15:07:33]     -- Local/m4-1129@external-9791;1 is ringing
[Sep 28 15:07:33]     -- DAHDI/44-1 is proceeding passing it to Local/m4-1129@external-9791;2
[Sep 28 15:07:33]     -- DAHDI/44-1 is ringing
[Sep 28 15:07:33]     -- Local/m4-1129@external-9791;1 is ringing
[Sep 28 15:07:33]     -- DAHDI/44-1 answered Local/m4-1129@external-9791;2
[Sep 28 15:07:33]     -- Sending DTMF 'www1129' to the called party.
...

[Sep 28 15:09:23] WARNING[20156]: channel.c:1038 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/m4-1129@external-9791;2
[Sep 28 15:09:23] WARNING[20156]: channel.c:1038 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/m4-1129@external-9791;2
[Sep 28 15:09:23] WARNING[20156]: channel.c:1038 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/m4-1129@external-9791;2
[Sep 28 15:09:23] WARNING[20156]: channel.c:1038 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/m4-1129@external-9791;2
[Sep 28 15:09:23] WARNING[20156]: channel.c:1038 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/m4-1129@external-9791;2
[Sep 28 15:09:23] WARNING[20156]: channel.c:1038 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/m4-1129@external-9791;2

etc

my asterisk version is 1.6.0.14

also, i cannot restart asterisk with /etc/init.d/asterisk restart

i have to kill -9 "..."



By: newborn (newborn) 2009-09-28 07:43:34

it seems to be a "Local" channel bug, or probably "/n" suffix. i've permanently disabled it now, so i'll keep monitoring my system and this thread =)



By: David Brillert (aragon) 2009-09-28 08:14:01

core.4977.txt captured while running SVN-branch-1.4-r219816M with filestreamdiff2 patch. A slightly different back trace than before since "Program terminated with signal 11, Segmentation fault"
I uploaded this back trace to ASTERISK-14558 since I have no idea if it is related to ASTERISK-14558 or is a new issue.
The text warnings show "out of bounds" errors so I am running tests again under Valgrind.

By: David Brillert (aragon) 2009-09-28 08:22:48

Just a quick note that under Valgrind I still see the WARNINGS.
I'm hoping for a crash under Valgrind but not confident it will do so.  I'll keep my fingers crossed that it will crash within the next few days.

By: Julien Leboeuf (yuenk) 2009-09-28 08:32:38

Same problem as Newborn but with 1.6.0.15

I sometimes get thoses messages :
[Sep 28 14:22:10] WARNING[25978] channel.c: Exceptionally long voice queue length queuing to IAX2/iaxasatna01-9990
[Sep 28 14:22:10] WARNING[25976] channel.c: Exceptionally long voice queue length queuing to IAX2/iaxasatna01-9990
[Sep 28 14:22:10] WARNING[25977] channel.c: Exceptionally long voice queue length queuing to IAX2/iaxasatna01-9990
[Sep 28 14:22:10] WARNING[25979] channel.c: Exceptionally long voice queue length queuing to IAX2/iaxasatna01-9990
[Sep 28 14:22:10] WARNING[25976] channel.c: Exceptionally long voice queue length queuing to IAX2/iaxasatna01-9990
[Sep 28 14:22:10] WARNING[25977] channel.c: Exceptionally long voice queue length queuing to IAX2/iaxasatna01-9990
[Sep 28 14:36:27] WARNING[25975] channel.c: Exceptionally long voice queue length queuing to IAX2/iaxasatna01-9952
[Sep 28 14:36:27] WARNING[25971] channel.c: Exceptionally long voice queue length queuing to IAX2/iaxasatna01-9952
[Sep 28 14:36:27] WARNING[25970] channel.c: Exceptionally long voice queue length queuing to IAX2/iaxasatna01-9952
[Sep 28 14:36:27] WARNING[25972] channel.c: Exceptionally long voice queue length queuing to IAX2/iaxasatna01-9952
[Sep 28 14:36:27] WARNING[25979] channel.c: Exceptionally long voice queue length queuing to IAX2/iaxasatna01-9952
[Sep 28 14:36:27] WARNING[25977] channel.c: Exceptionally long voice queue length queuing to IAX2/iaxasatna01-9952
[Sep 28 14:36:27] WARNING[25974] channel.c: Exceptionally long voice queue length queuing to IAX2/iaxasatna01-9952
[Sep 28 14:36:27] WARNING[25973] channel.c: Exceptionally long voice queue length queuing to IAX2/iaxasatna01-9952
[Sep 28 14:36:27] WARNING[25975] channel.c: Exceptionally long voice queue length queuing to IAX2/iaxasatna01-9952
[Sep 28 14:36:27] WARNING[25971] channel.c: Exceptionally long voice queue length queuing to IAX2/iaxasatna01-9952
[Sep 28 14:36:27] WARNING[25978] channel.c: Exceptionally long voice queue length queuing to IAX2/iaxasatna01-9952
[Sep 28 14:36:27] WARNING[25972] channel.c: Exceptionally long voice queue length queuing to IAX2/iaxasatna01-9952
[Sep 28 14:36:27] WARNING[25979] channel.c: Exceptionally long voice queue length queuing to IAX2/iaxasatna01-9952
[Sep 28 14:36:27] WARNING[25976] channel.c: Exceptionally long voice queue length queuing to IAX2/iaxasatna01-9952
[Sep 28 14:36:27] WARNING[25970] channel.c: Exceptionally long voice queue length queuing to IAX2/iaxasatna01-9952
[Sep 28 14:42:54] WARNING[25972] channel.c: Exceptionally long voice queue length queuing to IAX2/iaxasatna01-7084
[Sep 28 14:42:54] WARNING[25970] channel.c: Exceptionally long voice queue length queuing to IAX2/iaxasatna01-7084
[Sep 28 14:42:54] WARNING[25977] channel.c: Exceptionally long voice queue length queuing to IAX2/iaxasatna01-7084
[Sep 28 14:42:54] WARNING[25975] channel.c: Exceptionally long voice queue length queuing to IAX2/iaxasatna01-7084
[Sep 28 14:42:54] WARNING[25976] channel.c: Exceptionally long voice queue length queuing to IAX2/iaxasatna01-7084
[Sep 28 14:42:54] WARNING[25972] channel.c: Exceptionally long voice queue length queuing to IAX2/iaxasatna01-7084
But it doesn't immediatly make asterisk crash.


Anyway, twice a week, asterisk crashs :
Sep 27 16:57:47 asw-na-01 kernel: asterisk[25381] general protection rip:2aaac37876af rsp:2aab6f905cb0 error:0

/etc/init.d/asterisk restart, seem to restart correctly asterisk but I'm unable to get into CLI (with asterisk -r), it freezes before the prompt.
I have to "Ctrl + C" and "kill -9" the duplicated safe_asterisk, before.

Even though I don't see those "Exceptionally long voice queue" message just before the crash (they only appear several minutes before), I think they are related.

I initially reported this problem in ticket 0015900.

I really don't know how to reproduce it and I can't really play with the server, cause it processes about thousands calls a day.

Tell me if you want more infos or some tests.

By: David Brillert (aragon) 2009-09-28 09:18:43

Another user reported the crash in bug ASTERISK-14886 and that bug was just closed out as a duplicate.  However the bug reporter claims he can reproduce the crash at will.
I have not tested that claim but it seems worth checking out. I won't be testing this scenario since my lab is tied up with my Valgrind tests.

Summary   ASTERISK-14886: Asterisk crashed after transfer
Description The asterisk always crashed after transfer. I have duplicated this issue in 1.6.1.x and 1.6.2.x.

You can duplication this issue with following steps.

1.User A call to Asterisk and join meetme
2.User A active callout function and call to second user
3.User B answer User A's call
4.User A transfer User B into meetme
5.User B drop the call from meetme
6.User A drop the call from meetme <- Asterisk crashed in this step

By: Atis Lezdins (atis) 2009-09-29 06:14:40

I have tried filestream_frisolate-1.4.diff2.txt patch and have the following backtrace:

# 0  0x00000000004abc4c in ast_frame_free (frame=0x323931203a6d6f72, cache=1) at frame.c:377
# 1  0x00000000004607d6 in ast_write (chan=0x2aaac8309258, fr=0x2aaac41fda88) at channel.c:3486
# 2  0x00002aaab82c54b1 in moh_files_generator (chan=0x2aaac8309258, data=0x2aaac8584238, len=160, samples=160) at res_musiconhold.c:298
# 3  0x000000000045cc05 in ast_read_generator_actions (chan=0x2aaac8309258, f=0x2aaac839ad80) at channel.c:2516
# 4  0x000000000045ebc7 in __ast_read (chan=0x2aaac8309258, dropaudio=0) at channel.c:2996
# 5  0x000000000045ef6a in ast_read (chan=0x2aaac8309258) at channel.c:3034
# 6  0x0000000000447bbc in autoservice_run (ign=0x0) at autoservice.c:116
(snip)

full backtrace attached.

I can reproduce this at least two times a day, using load tests. Initial issue was ASTERISK-14899

By: Leif Madsen (lmadsen) 2009-09-29 09:26:16

Well I can now confirm this issue on a customers box running 1.6.1.5:

[Sep 29 10:22:00] WARNING[23202]: channel.c:1037 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/14169878046@incoming-6dbe;1

I'm not getting a crash, but this is probably because we're not running this configuration in production yet.

By: David Brillert (aragon) 2009-09-29 09:37:49

Hi Leif

The crashing is rare related to the warnings and locks but the locking lasts long enough to kill lots of SIP phone registrations due to lost qualify packets caused by the locks (I assume delayed qualify packets caused by the locks is causing the lost registrations).  In my test environment I have seen the locks last for several minutes at a time.  The system either recovers from the lock or it will crash (usually will recover after a service affecting delay).

By: David Brillert (aragon) 2009-09-29 10:22:28

I often get this error when doing a reload.
The same error appears often when I try to start Valgrind which normally is enough to crash Valgrind while loading.

ERROR[11308] manager.c: Could not obtain lock on manager list

By: Atis Lezdins (atis) 2009-09-29 16:24:23

Related to my last backtrace (with filestream_frisolate-1.4.diff2.txt patch) , i also have the following in mmlog:

WARNING: Freeing unused memory at 0x2aaad16a7f88, in __frame_free of frame.c, line 368
..repeating lot of times..
WARNING: Freeing unused memory at 0x2aaad16a7f88, in __frame_free of frame.c, line 368
WARNING: Freeing unused memory at 0x2336778, in __frame_free of frame.c, line 368
WARNING: Freeing unused memory at 0x2aaad16a7f88, in __frame_free of frame.c, line 368
WARNING: Freeing unused memory at 0x2aaad16a7f88, in __frame_free of frame.c, line 368
WARNING: Freeing unused memory at 0x2aaad16a7f88, in frame_cache_cleanup of frame.c, line 325
1254234079 - New session

By: David Brillert (aragon) 2009-09-29 16:29:32

I checked my mmlog and there is only one entry and it could be quite old.
WARNING: Freeing unused memory at 0xb1898a8, in local_pvt_destroy of chan_local.c, line 159

I'm still waiting for a valgrind crash to debug the out of bounds stuff.

By: Atis Lezdins (atis) 2009-09-29 16:39:46

aragon: you should have started time in previous line of mmlog, it's a unix timestamp, so you should make sure it's not from some old version of Asterisk.

By: David Brillert (aragon) 2009-09-30 08:20:45

atis: I wiped mmlog to start fresh I didn't see a timestamp in the file

By: David Brillert (aragon) 2009-09-30 08:53:42

Crash in valgrind.
valgrind.30092009.txt uploading in a few minutes.

By: David Brillert (aragon) 2009-10-01 12:23:48

I found these lines in the CLI after enabling debug.
The lines are interesting because I don't have allow reinvite enabled.
This line especially "chan_sip.c:1694 initialize_initreq: Initializing already initialized SIP dialog 46bb75636cb1464f1be500845642bde3@192.168.30.165 (presumably reinvite)"
Uploading a snippet of the CLI as 01102009cli.txt showing the warnings and debug lines.

[Oct  1 13:02:47] DEBUG[21537]: chan_sip.c:4761 sip_alloc: Allocating new SIP dialog for 46bb75636cb1464f1be500845642bde3@192.168.30.165 - REGISTER (No RTP)
[Oct  1 13:02:47] DEBUG[21537]: chan_sip.c:8038 transmit_register: Scheduled a registration timeout for 192.168.30.254 id  #1106439
[Oct  1 13:02:47] DEBUG[21537]: chan_sip.c:2300 __sip_semi_ack: (Provisional) Stopping retransmission (but retaining packet) on '46bb75636cb1464f1be500845642bde3@192.168.30.165' Request 1955: Found
[Oct  1 13:02:47] DEBUG[21537]: chan_sip.c:2258 __sip_ack: Stopping retransmission on '46bb75636cb1464f1be500845642bde3@192.168.30.165' of Request 1955: Match Found
[Oct  1 13:02:47] DEBUG[21537]: chan_sip.c:1694 initialize_initreq: Initializing already initialized SIP dialog 46bb75636cb1464f1be500845642bde3@192.168.30.165 (presumably reinvite)
[Oct  1 13:02:47] DEBUG[21537]: chan_sip.c:2300 __sip_semi_ack: (Provisional) Stopping retransmission (but retaining packet) on '46bb75636cb1464f1be500845642bde3@192.168.30.165' Request 1956: Found
[Oct  1 13:02:48] DEBUG[21537]: chan_sip.c:2258 __sip_ack: Stopping retransmission on '46bb75636cb1464f1be500845642bde3@192.168.30.165' of Request 1956: Match Found
[Oct  1 13:02:48] DEBUG[21537]: chan_sip.c:13138 handle_response_register: Registration successful
[Oct  1 13:02:48] DEBUG[21537]: chan_sip.c:13141 handle_response_register: Cancelling timeout 1106439

By: alexb (alexb) 2009-10-01 16:56:30

I am testing Asterisk 1.6 from SVN trunk and have noticed locks ("channel.c: Exceptionally long voice queue length queuing to IAX2") when an incoming call from a IAX channel is transferred (attended transfer) from a SIP extension to another SIP extension, and they both have directmedia=yes in sip.conf.

FYI, Asterisk 1.6.2.0-rc1 works fine.

By: Bartosz Supczinski (supczinskib) 2009-10-01 21:11:24

I have the same problem (1.4.26.2)

[Sep 24 10:11:25] NOTICE[20654] chan_agent.c: Agent 'Agent 6466' didn't answer/confirm within 19 seconds (waited 21)
[Sep 24 10:11:27] WARNING[20657] channel.c: Exceptionally long voice queue length queuing to Local/6466@agents-45e7,1
...
...
...

[Oct  1 13:15:11] NOTICE[20654] chan_agent.c: Agent 'Agent 6465' didn't answer/confirm within 19 seconds (waited 20)
[Oct  1 13:15:13] WARNING[10944] channel.c: Exceptionally long voice queue length queuing to Local/6465@agents-417a,1
...
...
...

CLI> sip show channels

Peer             User/ANR    Call ID      Seq (Tx/Rx)  Format           Hold     Last Message
10.129.160.136   5467        efca187b-bb  00101/00003  0x4 (ulaw)       No       Rx: BYE
10.129.160.189   5465        f4faf744-96  00101/00003  0x4 (ulaw)       No       Rx: BYE
10.129.160.189   5465        a48b4772-97  00101/00011  0x4 (ulaw)       No       Rx: BYE
10.129.160.136   5467        bb03d25d-2d  00101/00003  0x4 (ulaw)       No       Rx: BYE
10.129.160.189   5465        5b1feeca-16  00101/00009  0x4 (ulaw)       No       Rx: BYE
10.129.160.189   5465        950c7e0e-30  00101/00003  0x4 (ulaw)       No       Rx: BYE
10.129.160.189   5465        d0e2668a-f7  00101/00005  0x4 (ulaw)       No       Rx: BYE
10.129.160.189   5465        268f19ce-86  00101/00003  0x4 (ulaw)       No       Rx: BYE
10.129.160.189   5465        b8fe3ed2-8e  00101/00003  0x4 (ulaw)       No       Rx: BYE
10.129.160.189   (None)      8f2ab9d1-14  00101/00024  0x0 (nothing)    No       Rx: REGISTER
10.129.160.189   (None)      24eb9050-8d  00101/00024  0x0 (nothing)    No       Rx: REGISTER
10.129.160.189   5465        27a38034-d7  00101/00003  0x4 (ulaw)       No       Rx: BYE
10.129.160.189   5465        2b7e9208-1c  00101/00003  0x4 (ulaw)       No       Rx: BYE
10.129.160.251   5466        4346510f-54  00101/00005  0x4 (ulaw)       No       Rx: BYE
10.129.160.251   5466        b398dc47-e6  00101/00003  0x4 (ulaw)       No       Rx: BYE
10.129.160.251   5466        a5ffc8ac-ca  00101/00003  0x4 (ulaw)       No       Rx: BYE
10.129.160.189   5465        a3525a60-4e  00101/00005  0x4 (ulaw)       No       Rx: BYE
10.129.160.189   5465        94ab85f2-4a  00101/00005  0x4 (ulaw)       No       Rx: BYE
10.129.160.189   5465        d0652fae-4f  00101/00003  0x4 (ulaw)       No       Rx: BYE
10.129.160.189   5465        eadf29b8-fa  00101/00003  0x4 (ulaw)       No       Rx: BYE
10.129.160.189   5465        97700be0-cd  00101/00005  0x4 (ulaw)       No       Rx: BYE
10.129.160.189   5465        bf18207c-17  00101/00003  0x4 (ulaw)       No       Rx: BYE
10.129.160.251   (None)      8bd47672-27  00101/05232  0x0 (nothing)    No       Rx: REGISTER
10.129.160.136   (None)      353383f0-a5  00101/05236  0x0 (nothing)    No       Rx: REGISTER
10.129.160.251   (None)      9b524df5-71  00101/05230  0x0 (nothing)    No       Rx: REGISTER
10.129.160.136   (None)      a0b4cd05-81  00101/05234  0x0 (nothing)    No       Rx: REGISTER

CLI> sip show peers

Name/username              Host            Dyn Nat ACL Port     Status              
6467/6467                  10.129.160.136   D          5060     Unmonitored          
6466/6466                  10.129.160.251   D          5060     Unmonitored          
6465/6465                  10.129.160.189   D          5060     Unmonitored          
5563/5563                  10.129.160.169   D          5060     Unmonitored          
5511/5511                  10.129.160.155   D          5060     Unmonitored          
5467/5467                  10.129.160.136   D          5060     Unmonitored          
5466/5466                  10.129.160.251   D          5060     Unmonitored          
5465/5465                  10.129.160.189   D          5060     Unmonitored          
5456/5456                  10.129.160.204   D          5060     Unmonitored          
5455/5455                  10.129.160.223   D          5060     Unmonitored



By: newborn (newborn) 2009-10-02 06:16:56

[Oct  2 15:01:08] WARNING[9234]: channel.c:1038 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/11749@internal-ca34;1
[Oct  2 15:01:08] WARNING[9234]: channel.c:1038 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/11749@internal-ca34;1
[Oct  2 15:01:08] WARNING[9234]: channel.c:1038 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/11749@internal-ca34;1
[Oct  2 15:01:08] WARNING[9234]: channel.c:1038 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/11749@internal-ca34;1
[Oct  2 15:01:08] WARNING[9234]: channel.c:1038 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/11749@internal-ca34;1
[Oct  2 15:01:08] WARNING[9234]: channel.c:1038 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/11749@internal-ca34;1
[Oct  2 15:01:08] WARNING[9234]: channel.c:1038 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/11749@internal-ca34;1
[Oct  2 15:01:08] WARNING[9234]: channel.c:1038 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/11749@internal-ca34;1
[Oct  2 15:01:09] WARNING[9234]: channel.c:1038 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/11749@internal-ca34;1


another bug, now without /n...

has anyone solution for this problem?

By: David Brillert (aragon) 2009-10-05 10:11:33

I have no solution.  I can provide no solution, I can only test patches in an environment that will produce the WARNINGS/locks and crashes.
Even with Russell's patch I still get crashes within 24 hours with or without Valgrind.  I posted Valgrind dump, but I have no idea if its related to filestream crash or WARNINGS/deadlocks.  Sorry but I really don't know how to make sense of Valgrind dumps or GDB dumps.  I do know that when Asterisk crashed under Valgrind I had run out of memory but since I don't know how to read Valgrind dump I don't know what is leaking...

By: David Brillert (aragon) 2009-10-06 15:48:54

As of this morning I started testing SVN-branch-1.4-r222026M which does not appear to have memory leak and CPU load average is hugely improved over an 8 hour monitor period.
https://issues.asterisk.org/view.php?id=15865
commit 221200
I think that commit might have fixed the memory leak but I'll be watching for a few more days.

I still get WARNINGS and locks about Exceptionally long voice queue length queuing to Local.

Still monitoring for crash.

By: David Brillert (aragon) 2009-10-08 08:15:02

Results after 3 days of testing SVN-branch-1.4-r222152M patched with filestream_frisolate-1.4.diff2.txt

-No more memory leak during reload
-No more CPU increase over time caused by reload
-No more crash in filestream

Bad news
-ast_queue_frame: Exceptionally long voice queue length queuing to Local is still very frequent (every 30-60 seconds)
-This lock is still very frequent and appears related to hangup event (possibly related to destroying SIP dialogs based on CLI events surrounding this lock)

   -- Executing [s@all-hangup:1] GotoIf("SIP/siptrunk-09e1c438", "0?all-faxnotify|s|1:2") in new stack
   -- Goto (all-hangup,s,2)
   -- Executing [s@all-hangup:2] ResetCDR("SIP/siptrunk-09e1c438", "w") in new stack
   -- Executing [s@all-hangup:3] NoCDR("SIP/siptrunk-09e1c438", "") in new stack
   -- Executing [s@all-hangup:4] System("SIP/siptrunk-09e1c438", "/var/www/scopserv/telephony/scripts/billing/cdr.sh 1255006969.57632") in new stack
scrubbed*CLI> core show locks
scrubbed*CLI>
=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <file> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: -1235027056 (autoservice_run      started at [  238] autoservice.c ast_autoservice_start())
=== ---> Lock #0 (autoservice.c): MUTEX 89 autoservice_run &(&aslist)->lock 0x81797a8 (1)
=== -------------------------------------------------------------------
===
=======================================================================

Really destroying SIP dialog '265d10a72082ace36d707d1d5dcc519f@127.0.0.1' Method: REGISTER
Really destroying SIP dialog '6fda56727fcb6f7328cd0af159fde10d@127.0.0.1' Method: REGISTER
Really destroying SIP dialog '1aff780a6c1956ce575991ec202f712b@127.0.0.1' Method: REGISTER
Really destroying SIP dialog '19e2afab3e262eed65ac06eb682baa1d@127.0.0.1' Method: REGISTER
Really destroying SIP dialog '492ce9c433393bfc733c61cf7fee4a34@127.0.0.1' Method: REGISTER
Really destroying SIP dialog '56a0b20e6d3a07dc6a02f44367faef4d@127.0.0.1' Method: REGISTER

There are relevant debug infos for those warnings in my previous notes on this bug report.
They are:
15609sipdebug_cli14082009.txt
15609threadapplyall_locked in autoservice_line89.txt
15609threadapplyallbt_coreshowlocks10082009.txt
15609_13082009_cli_output with_lock_debug.txt

By: David Brillert (aragon) 2009-10-08 13:02:40

Just before the autoservice.c mutel 89 autoservice_run warnings I see this.  It seems to be the pre-cursor before each actual autoservice lock.
It's very hard to catch in realtime with a core show locks command in order to gdb that thread...

=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <file> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: -1235027056 (autoservice_run      started at [  238] autoservice.c ast_autoservice_start())
=== -------------------------------------------------------------------
===
=======================================================================

By: Russell Bryant (russell) 2009-10-08 14:26:24

atis, I see your debug output with the last patch, but what version of Asterisk was that from?  The line numbers don't match up to the latest 1.4 branch code, so I'm not sure where it's coming from.  I'm about to merge the patch, because if there is a problem, I don't think it's with this patch, but elsewhere.  Let me know if it still happens after I merge it.

By: Digium Subversion (svnbot) 2009-10-08 14:49:12

Repository: asterisk
Revision: 222878

U   branches/1.4/include/asterisk/file.h
U   branches/1.4/include/asterisk/frame.h
U   branches/1.4/main/file.c
U   branches/1.4/main/frame.c

------------------------------------------------------------------------
r222878 | russell | 2009-10-08 14:49:10 -0500 (Thu, 08 Oct 2009) | 44 lines

Make filestream frame handling safer by isolating frames before returning them.

This patch is related to a number of issues on the bug tracker that show
crashes related to freeing frames that came from a filestream.  A number of
fixes have been made over time while trying to figure out these problems, but
there re still people seeing the crash.  (Note that some of these bug reports
include information about other problems.  I am specifically addressing
the filestream frame crash here.)

I'm still not clear on what the exact problem is.  However, what is _very_
clear is that we have seen quite a few problems over time related to unexpected
behavior when we try to use embedded frames as an optimization.  In some cases,
this optimization doesn't really provide much due to improvements made in other
areas.

In this case, the patch modifies filestream handling such that the embedded frame
will not be returned.  ast_frisolate() is used to ensure that we end up with a
completely mallocd frame.  In reality, though, we will not actually have to malloc
every time.  For filestreams, the frame will almost always be allocated and freed
in the same thread.  That means that the thread local frame cache will be used.
So, going this route doesn't hurt.

With this patch in place, some people have reported success in not seeing the
crash anymore.

(SWP-150)
(AST-208)
(ABE-1834)

(issue ASTERISK-14558)
Reported by: aragon
Patches:
     filestream_frisolate-1.4.diff2.txt uploaded by russell (license 2)
Tested by: aragon, russell

(closes issue ASTERISK-14756)
Reported by: zerohalo
Tested by: zerohalo

(closes issue ASTERISK-14784)
Reported by: marhbere

Review: https://reviewboard.asterisk.org/r/386/

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=222878

By: Atis Lezdins (atis) 2009-10-08 14:52:16

I tested last patch few times on 1.6.1 branch, and since i got identical backtraces i posted one of them here and reverted, since those were more often than previous crashes.

However, somehow I stopped getting those crashes (currently using 1.6.1.6), maybe it's the same reason as for aragon. Other significant things I've done could be:
* removing res_timing_pthread (which isn't present on 1.4),
* totally avoiding usage of "core show channels" and somehow avoiding "queue show",
* suppressing logger flood by my own patch (ASTERISK-14896),
* and updating func_lock with latest patches (as in SVN)

By: Digium Subversion (svnbot) 2009-10-08 14:55:26

Repository: asterisk
Revision: 222880

_U  trunk/
U   trunk/include/asterisk/file.h
U   trunk/include/asterisk/frame.h
U   trunk/main/file.c
U   trunk/main/frame.c

------------------------------------------------------------------------
r222880 | russell | 2009-10-08 14:55:26 -0500 (Thu, 08 Oct 2009) | 51 lines

Merged revisions 222878 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
 r222878 | russell | 2009-10-08 14:45:47 -0500 (Thu, 08 Oct 2009) | 44 lines
 
 Make filestream frame handling safer by isolating frames before returning them.
 
 This patch is related to a number of issues on the bug tracker that show
 crashes related to freeing frames that came from a filestream.  A number of
 fixes have been made over time while trying to figure out these problems, but
 there re still people seeing the crash.  (Note that some of these bug reports
 include information about other problems.  I am specifically addressing
 the filestream frame crash here.)
 
 I'm still not clear on what the exact problem is.  However, what is _very_
 clear is that we have seen quite a few problems over time related to unexpected
 behavior when we try to use embedded frames as an optimization.  In some cases,
 this optimization doesn't really provide much due to improvements made in other
 areas.
 
 In this case, the patch modifies filestream handling such that the embedded frame
 will not be returned.  ast_frisolate() is used to ensure that we end up with a
 completely mallocd frame.  In reality, though, we will not actually have to malloc
 every time.  For filestreams, the frame will almost always be allocated and freed
 in the same thread.  That means that the thread local frame cache will be used.
 So, going this route doesn't hurt.
 
 With this patch in place, some people have reported success in not seeing the
 crash anymore.
 
 (SWP-150)
 (AST-208)
 (ABE-1834)
 
 (issue ASTERISK-14558)
 Reported by: aragon
 Patches:
       filestream_frisolate-1.4.diff2.txt uploaded by russell (license 2)
 Tested by: aragon, russell
 
 (closes issue ASTERISK-14756)
 Reported by: zerohalo
 Tested by: zerohalo
 
 (closes issue ASTERISK-14784)
 Reported by: marhbere
 
 Review: https://reviewboard.asterisk.org/r/386/
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=222880

By: Digium Subversion (svnbot) 2009-10-08 14:57:44

Repository: asterisk
Revision: 222881

_U  branches/1.6.0/
U   branches/1.6.0/include/asterisk/file.h
U   branches/1.6.0/include/asterisk/frame.h
U   branches/1.6.0/main/file.c
U   branches/1.6.0/main/frame.c

------------------------------------------------------------------------
r222881 | russell | 2009-10-08 14:57:44 -0500 (Thu, 08 Oct 2009) | 58 lines

Merged revisions 222880 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r222880 | russell | 2009-10-08 14:52:03 -0500 (Thu, 08 Oct 2009) | 51 lines
 
 Merged revisions 222878 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r222878 | russell | 2009-10-08 14:45:47 -0500 (Thu, 08 Oct 2009) | 44 lines
   
   Make filestream frame handling safer by isolating frames before returning them.
   
   This patch is related to a number of issues on the bug tracker that show
   crashes related to freeing frames that came from a filestream.  A number of
   fixes have been made over time while trying to figure out these problems, but
   there re still people seeing the crash.  (Note that some of these bug reports
   include information about other problems.  I am specifically addressing
   the filestream frame crash here.)
   
   I'm still not clear on what the exact problem is.  However, what is _very_
   clear is that we have seen quite a few problems over time related to unexpected
   behavior when we try to use embedded frames as an optimization.  In some cases,
   this optimization doesn't really provide much due to improvements made in other
   areas.
   
   In this case, the patch modifies filestream handling such that the embedded frame
   will not be returned.  ast_frisolate() is used to ensure that we end up with a
   completely mallocd frame.  In reality, though, we will not actually have to malloc
   every time.  For filestreams, the frame will almost always be allocated and freed
   in the same thread.  That means that the thread local frame cache will be used.
   So, going this route doesn't hurt.
   
   With this patch in place, some people have reported success in not seeing the
   crash anymore.
   
   (SWP-150)
   (AST-208)
   (ABE-1834)
   
   (issue ASTERISK-14558)
   Reported by: aragon
   Patches:
         filestream_frisolate-1.4.diff2.txt uploaded by russell (license 2)
   Tested by: aragon, russell
   
   (closes issue ASTERISK-14756)
   Reported by: zerohalo
   Tested by: zerohalo
   
   (closes issue ASTERISK-14784)
   Reported by: marhbere
   
   Review: https://reviewboard.asterisk.org/r/386/
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=222881

By: Digium Subversion (svnbot) 2009-10-08 15:00:51

Repository: asterisk
Revision: 222882

_U  branches/1.6.1/
U   branches/1.6.1/include/asterisk/file.h
U   branches/1.6.1/include/asterisk/frame.h
U   branches/1.6.1/main/file.c
U   branches/1.6.1/main/frame.c

------------------------------------------------------------------------
r222882 | russell | 2009-10-08 15:00:51 -0500 (Thu, 08 Oct 2009) | 58 lines

Merged revisions 222880 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r222880 | russell | 2009-10-08 14:52:03 -0500 (Thu, 08 Oct 2009) | 51 lines
 
 Merged revisions 222878 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r222878 | russell | 2009-10-08 14:45:47 -0500 (Thu, 08 Oct 2009) | 44 lines
   
   Make filestream frame handling safer by isolating frames before returning them.
   
   This patch is related to a number of issues on the bug tracker that show
   crashes related to freeing frames that came from a filestream.  A number of
   fixes have been made over time while trying to figure out these problems, but
   there re still people seeing the crash.  (Note that some of these bug reports
   include information about other problems.  I am specifically addressing
   the filestream frame crash here.)
   
   I'm still not clear on what the exact problem is.  However, what is _very_
   clear is that we have seen quite a few problems over time related to unexpected
   behavior when we try to use embedded frames as an optimization.  In some cases,
   this optimization doesn't really provide much due to improvements made in other
   areas.
   
   In this case, the patch modifies filestream handling such that the embedded frame
   will not be returned.  ast_frisolate() is used to ensure that we end up with a
   completely mallocd frame.  In reality, though, we will not actually have to malloc
   every time.  For filestreams, the frame will almost always be allocated and freed
   in the same thread.  That means that the thread local frame cache will be used.
   So, going this route doesn't hurt.
   
   With this patch in place, some people have reported success in not seeing the
   crash anymore.
   
   (SWP-150)
   (AST-208)
   (ABE-1834)
   
   (issue ASTERISK-14558)
   Reported by: aragon
   Patches:
         filestream_frisolate-1.4.diff2.txt uploaded by russell (license 2)
   Tested by: aragon, russell
   
   (closes issue ASTERISK-14756)
   Reported by: zerohalo
   Tested by: zerohalo
   
   (closes issue ASTERISK-14784)
   Reported by: marhbere
   
   Review: https://reviewboard.asterisk.org/r/386/
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=222882

By: Digium Subversion (svnbot) 2009-10-08 15:03:42

Repository: asterisk
Revision: 222883

_U  branches/1.6.2/
U   branches/1.6.2/include/asterisk/file.h
U   branches/1.6.2/include/asterisk/frame.h
U   branches/1.6.2/main/file.c
U   branches/1.6.2/main/frame.c

------------------------------------------------------------------------
r222883 | russell | 2009-10-08 15:03:42 -0500 (Thu, 08 Oct 2009) | 58 lines

Merged revisions 222880 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r222880 | russell | 2009-10-08 14:52:03 -0500 (Thu, 08 Oct 2009) | 51 lines
 
 Merged revisions 222878 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r222878 | russell | 2009-10-08 14:45:47 -0500 (Thu, 08 Oct 2009) | 44 lines
   
   Make filestream frame handling safer by isolating frames before returning them.
   
   This patch is related to a number of issues on the bug tracker that show
   crashes related to freeing frames that came from a filestream.  A number of
   fixes have been made over time while trying to figure out these problems, but
   there re still people seeing the crash.  (Note that some of these bug reports
   include information about other problems.  I am specifically addressing
   the filestream frame crash here.)
   
   I'm still not clear on what the exact problem is.  However, what is _very_
   clear is that we have seen quite a few problems over time related to unexpected
   behavior when we try to use embedded frames as an optimization.  In some cases,
   this optimization doesn't really provide much due to improvements made in other
   areas.
   
   In this case, the patch modifies filestream handling such that the embedded frame
   will not be returned.  ast_frisolate() is used to ensure that we end up with a
   completely mallocd frame.  In reality, though, we will not actually have to malloc
   every time.  For filestreams, the frame will almost always be allocated and freed
   in the same thread.  That means that the thread local frame cache will be used.
   So, going this route doesn't hurt.
   
   With this patch in place, some people have reported success in not seeing the
   crash anymore.
   
   (SWP-150)
   (AST-208)
   (ABE-1834)
   
   (issue ASTERISK-14558)
   Reported by: aragon
   Patches:
         filestream_frisolate-1.4.diff2.txt uploaded by russell (license 2)
   Tested by: aragon, russell
   
   (closes issue ASTERISK-14756)
   Reported by: zerohalo
   Tested by: zerohalo
   
   (closes issue ASTERISK-14784)
   Reported by: marhbere
   
   Review: https://reviewboard.asterisk.org/r/386/
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=222883

By: David Brillert (aragon) 2009-10-08 15:59:19

Atis' comments about core show channels made me think about possible sources of the WARNINGS.  Atis' bug report ASTERISK-14896

I correlated the *WARNINGS* and *core show channels* with *ngrep -d any port 5038 |grep Command*

When I saw the WARNINGS I saw always something like this with ngrep:
Action: Command..Command: SHOW CHANNEL SIP/siptrunk-b6a19778....
 Response: Follows..Privilege: Command..
 Action: Command..Command: SHOW CHANNEL Local/1605@default-agent-0cf3,1....
 Response: Follows..Privilege: Command..
 Action: Command..Command: SHOW CHANNEL Local/1605@default-agent-0cf3,2....
 Response: Follows..Privilege: Command..
 Action: Command..Command: SHOW CHANNEL SIP/1605-099dc040....
 Response: Follows..Privilege: Command..
 Action: Command..Command: SHOW CHANNEL Local/1605@default-agent-0cf3,2....
 Response: Follows..Privilege: Command..
 Action: Command..Command: SHOW CHANNEL SIP/siptrunk-b6a19778....
 Response: Follows..Privilege: Command..
 Action: Command..Command: SHOW CHANNEL Local/1605@default-agent-0cf3,1....
 Response: Follows..Privilege: Command..
 Action: Command..Command: SHOW CHANNEL Local/1605@default-agent-0cf3,2....
 Response: Follows..Privilege: Command..
 Action: Command..Command: SHOW CHANNEL SIP/1605-099dc040....
 Response: Follows..Privilege: Command..
 Action: Command..Command: SHOW CHANNEL SIP/siptrunk-b6a19778....
 Response: Follows..Privilege: Command..
 Action: Command..Command: SHOW CHANNEL Local/1605@default-agent-0cf3,1....
 Response: Follows..Privilege: Command..
 Action: Command..Command: SHOW CHANNEL Local/1648@default-agent-7558,2....
 Response: Follows..Privilege: Command..
 Action: Command..Command: SHOW CHANNEL SIP/1648-099e4b48....
 Response: Follows..Privilege: Command..
 Action: Command..Command: SHOW CHANNEL Local/1648@default-agent-7558,2....
 Response: Follows..Privilege: Command..
 Action: Command..Command: SHOW CHANNEL SIP/1648-099e4b48....

By: Russell Bryant (russell) 2009-10-08 17:30:56

Now I would like to work on the problem causing the "Exceptionally long voice queue length" message.  I have some theories as to what is going on here, but access to a test environment that demonstrates the issue would be _extremely_ helpful.

aragon, if you (or anyone else) can grant me shell access to a test system where I can work on this, I would appreciate it.  Otherwise, I can attempt to reproduce it.

Feel free to contact me via email at russell (AT) digium (DOT) com.  However, I'm going to be traveling for the next week and a half, so responses may be delayed.

By: Russell Bryant (russell) 2009-10-08 17:37:20

aragon, how heavily are you using those commands?  Also, under what system load are you using those commands?

The code involved in doing a "show channel" is _very_ expensive locking wise.  It is entirely feasible that there are delays caused by the excessive locking going on, which frame queue processing would be affected by.

If this is what is going on, this is not really fixable.  You just have to minimize usage of that command.

I suppose one positive note is that the code responsible for this has been completely rewritten in Asterisk trunk, so at least in the next major version of Asterisk, channel lookup performance has been drastically improved.

By: David Brillert (aragon) 2009-10-08 19:22:12

Russell: I sent you an email with the ssh info.
Load average is about average between 3-4 (development server under very high traffic).  I didn't see any of these warnings or locks until I started testing 1.4 SVN 206273 previously 1.4.24 did not show these locks or warnings.

I'm not sure how heavily the commands are used since they are issued by a daemon that is used to provide realtime graphical ACD queue/agent reports for call center managers etc... However if I shut down the daemon I can reproduce the warnings and locks by manually issuing core show channels command in the CLI.  It is much easier to reproduce using the daemon.

By: David Brillert (aragon) 2009-10-09 08:35:13

Russell, I know it is not sanctioned but how hard would that trunk channel lookup be to backport to 1.4 (assuming this would never be committed to 1.4)?

By: David Brillert (aragon) 2009-10-09 13:54:18

Russell,

I shut down my daemon that does the core show channels.
Also I did not run any core show * commands manually.
I verified through ngrep that the daemon was not sending any more show commands Afterwards I still see the warnings and locks occasionally.
Approximately 3-5 minutes between WARNINGS.
But yesterday I definitely saw some correlation between the WARNINGS/locks and core show channels.

Something else is going on and I have no idea what... hopefully you have a better understanding of what is going on.



By: Atis Lezdins (atis) 2009-10-09 13:54:28

I also provided Russell with access info to my testing environment.

To my observing , the "Exceptionally long voice queue" could be related to amount of load, but I still don't have enough details to prove that (I could try to write munin plugin to graph amount of those messages and compare with server load).

Aragon, I took a look at that channel fix last week, and changes are massive. I wouldn't want to do backport even for 1.6.1.

To my knowledge "show channels" never have been stable for production use, only occasional debug. So, You would be much better with handling some AMI events (or tss.. my realtime channels patches)

By: David Brillert (aragon) 2009-10-09 14:09:25

The average load is about 3.3 on my test system
Memory usage about 400 MB
CPU load does spike to to 4.5 when I see the WARNINGS and then returns to normal.
Memory spikes like crazy until WARNINGS go away.

The system is going crazy right this second and the WARNINGS have not disappeared.

By: David Brillert (aragon) 2009-10-09 14:14:15

I just had a crazy spike and the WARNINGS were spamming the CLI.
Memory usage went nuts and CPU usage reached 13.
Once the system recovers from the warnings and locking the load and memory return to normal.
I don't think the load average causes the problem, I think the problem causes the crazy CPU and memory usage.

By: Atis Lezdins (atis) 2009-10-09 14:19:22

aragon, you could try logger_repeating.patch from ASTERISK-14896 - i guess, you have that many of those messages, that logger thread can't simply write them fast enough. Adjust the LOG_SUPPRESS_IDENTICAL_MESSAGES to your needs (i'm going to make it 30 for production use). That should solve CPU and memory.

Btw, how many cores does your system have?

I'm running a quad core system with 70-80 parallel channels and it keeps within l 1-2.5 with occasional peaks to 3.5. When i was running 120 parallel channels, it was 3.5-5 with pikes to 6



By: David Brillert (aragon) 2009-10-09 14:19:27

I am growing more and more certain that the problem is in the hangup

By: David Brillert (aragon) 2009-10-09 14:22:58

atis: Pentium Xeon 2.8 GHz dual core, 1GB ECC RAM

I'm not sure to to use the logger patch, I am hoping for someone to find root cause.  IMHO finding root cause will be easier if it is easier to reproduce the problem and the messages.

By: Atis Lezdins (atis) 2009-10-09 14:28:21

aragon, that logger patch helped me find cause of that issue. All it does - is avoid warnings flood and corresponding memory/cpu spikes, and that makes debugging much much easier. You will just see "Message repeats 180 times" and won't spend time on waiting for machine to recover.

Of course, that one is not intended for commit, i probably have to open separate issue with some config options.

By: David Brillert (aragon) 2009-10-09 14:35:24

atis: I tried to patch today's latest 1.4 branch with your logger patch

patching file main/logger.c
Hunk #1 FAILED at 94.
Hunk #2 FAILED at 1094.
Hunk #3 FAILED at 1128.
3 out of 3 hunks FAILED -- saving rejects to file main/logger.c.rej

By: Atis Lezdins (atis) 2009-10-09 15:00:00

this one is for latest 1.4

By: David Brillert (aragon) 2009-10-09 15:24:01

Thanks atis, Installed
SVN-branch-1.4-r223225M patched with logger_repeating_14.patch and flushed my messages log

I haven't noticed much change in memory or CPU usage but LOG_SUPPRESS_IDENTICAL_MESSAGES 30 and I don't think I have exceeded 30 yet since restarting Asterisk.

By: David Brillert (aragon) 2009-10-15 14:43:47

Still seeing some autoservice locks although the line numbers have changed in SVN-branch-1.4-r223550M

=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <file> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: -1231324272 (autoservice_run      started at [  239] autoservice.c ast_autoservice_start())
=== ---> Waiting for Lock #0 (autoservice.c): MUTEX 94 autoservice_run &(&aslist)->lock 0x8173648 (1)
=== --- ---> Locked Here: autoservice.c line 94 (autoservice_run)

I only mention the autoservice lock because mmichelson said there should only ever be one instance of autoservice ever and I think that might be important.  Maybe the autoservice lock is meaningless...  It does not always appear before the warnings.



By: David Brillert (aragon) 2009-10-19 08:39:54

I am not seeing any more filestream crashes since SVN-branch-1.4-r223550M
So the crashing is no longer an issue for me.

The long voice queue messages are still a big issue.
I ran SVN-branch-1.4-r223550M under Valgrind long enough to capture some Valgrind info since that might help.
Uploading valgrind09102009ast_queue_frame.txt

By: Nathan Stocks (nathan) 2009-10-20 14:06:30

We're also seeing the "Exceptionally long voice queue length queuing to IAX2/..." log spewing problem on 1.4.26.2.  I included a snippet below.

We did some research and found that both the IAX2 channels mentioned were connected to the voicemail that we send callers to after they time out of one of our queues--but that doesn't make a lot of sense as they spewed for about a half hour before we stopped them and our voicemail shouldn't take 30-minute long messages...  Doing a soft hangup on each of those channels stopped the log spewing.

"core show channels concise" gave this description of one of the channels before we hung it up:

IAX2/connectfromphoneserver-6013!exit_supportqueue_digits!supportqueue_voicemail!1!Up!(None)!!2394154690!!3!1687!(None)

[Oct 20 12:50:49] WARNING[22380]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to IAX2/connectfromphoneserver-6013
[Oct 20 12:50:49] WARNING[22378]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to IAX2/connectfromphoneserver-4049
[Oct 20 12:50:49] WARNING[22375]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to IAX2/connectfromphoneserver-6013

[[ Repeated many times ]]

[Oct 20 12:50:51] WARNING[22377]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to IAX2/connectfromphoneserver-4049
[Oct 20 12:50:51] WARNING[22375]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to IAX2/connectfromphoneserver-6013
[Oct 20 12:50:51] WARNING[22382]: channel.c:951 __ast_queue_frame: Exceptionally long voice queue length queuing to IAX2/connectfromphoneserver-4049



By: Nathan Stocks (nathan) 2009-10-20 14:09:52

Jeez, sorry about the length of that snippet.  I didn't realize that my just-over-a-second of logs was quite so long, but I guess it serves to underscore the floodiness of the logging...

By: David Brillert (aragon) 2009-10-22 14:21:23

OK I found root cause.
Now I am way out of my element and someone else will have to fix/patch.

I installed each revision of SVN until I found the bad revision.
ASTERISK-14558 introduced by r200991
https://reviewboard.asterisk.org/r/175/

4) Some minor optimizations in ast_frisolate(), and fix two bugs where it would copy pointers to malloc'd data from the source frame but not clear those pointers in the source frame (resulting in a likely crash if the source frame was ever freed). In essence, the output frame was *not* fully isolated from the source frame. There were only two users in the tree and neither of them could cause this bug, though.

5) Use ast_frisolate() in the autoservice run thread, to avoid actually duplicating a frame if it is not necessary.

By: Russell Bryant (russell) 2009-10-23 10:13:15

Excellent work, aragon!  *goes back to look at code*

By: Doug (doug) 2009-11-02 02:04:02.000-0600

Hi Guys

Any news on this error yet? We getting this on 1.4.27 and happens on all types of media. We have tried using mISDN, IAX and Zaptel.


[Nov  2 09:48:10] WARNING[4212]: channel.c:954 __ast_queue_frame: Exceptionally long voice queue length queuing to IAX2/8000-4646
[Nov  2 09:48:10] WARNING[4213]: channel.c:954 __ast_queue_frame: Exceptionally long voice queue length queuing to IAX2/8000-4646
[Nov  2 09:48:10] WARNING[4216]: channel.c:954 __ast_queue_frame: Exceptionally long voice queue length queuing to IAX2/8000-4646
[Nov  2 09:48:10] WARNING[4208]: channel.c:954 __ast_queue_frame: Exceptionally long voice queue length queuing to IAX2/8000-4646
[Nov  2 09:48:10] WARNING[4211]: channel.c:954 __ast_queue_frame: Exceptionally long voice queue length queuing to IAX2/8000-4646
[Nov  2 09:48:10] WARNING[4212]: channel.c:954 __ast_queue_frame: Exceptionally long voice queue length queuing to IAX2/8000-4646
[Nov  2 09:48:10] WARNING[4213]: channel.c:954 __ast_queue_frame: Exceptionally long voice queue length queuing to IAX2/8000-4646
[Nov  2 09:48:10] WARNING[4216]: channel.c:954 __ast_queue_frame: Exceptionally long voice queue length queuing to IAX2/8000-4646
[Nov  2 09:48:10] WARNING[4208]: channel.c:954 __ast_queue_frame: Exceptionally long voice queue length queuing to IAX2/8000-4646
[Nov  2 09:48:10] WARNING[4211]: channel.c:954 __ast_queue_frame: Exceptionally long voice queue length queuing to IAX2/8000-4646

By: Doug (doug) 2009-11-04 01:41:57.000-0600

Hi all

This problem is on our side service affecting as it hangs up the channel after a bunch or these warnings. I am able to reproduce this easily but cannot locate the reason for the problem. We have no locks at any time so Im sure the above lock info is unrelated.

By: David Brillert (aragon) 2009-11-11 16:22:06.000-0600

We have been converting agi to fastagi and have reduced CPU loads a lot.
With latest SVN-branch-1.4-r229091M  I have yet to see autoservice problem
Load average at approx 15cps is about .3 on test rig.
WARNINGS only occur during hangups
With fastagi and some deadAGI scripts to execute hangups WARNINGS appear to have been replaced with a new type of problem which I think is still indicative of the underlying problem. I think the deadAGI warning below is benign but I don't understand why I see this warning about a live channel AFTER executing h extension (channel should not be active).

  -- Executing [s@all-hangup:2] ResetCDR("SIP/siptrunk-000001bd", "w") in new stack
   -- AGI Script agi://127.0.0.1:4573/cdr?arg1=1257994331.883 completed, returning 0
 == Spawn extension (default-default, 1692, 6) exited non-zero on 'Local/1649@default-agent-525a,2'
   -- Executing [s@all-hangup:3] NoCDR("SIP/siptrunk-000001bd", "") in new stack
   -- Executing [s@all-hangup:4] DeadAGI("SIP/siptrunk-000001bd", "agi://127.0.0.1:4573/cdr?arg1=1257994353.889") in new stack
   -- AGI Script agi://127.0.0.1:4573/cdr?arg1=1257994353.889 completed, returning 0
 == Spawn extension (all-hangup, s, 14) exited non-zero on 'SIP/siptrunk-000001bd'
 == Spawn extension (default-local-devices, 1605, 4) exited non-zero on 'Local/1605@default-agent-cba7,2'
asterisk*CLI> core show locks
asterisk*CLI>
=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <file> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: -1246450800 (pbx_thread           started at [ 2624] pbx.c ast_pbx_start())
=== ---> Lock #0 (channel.c): MUTEX 2836 ast_write &chan->lock 0x9e19148 (1)
=== ---> Lock #1 (chan_local.c): MUTEX 329 local_write &p->lock 0x9e4b560 (1)
=== ---> Lock #2 (chan_local.c): MUTEX 185 local_queue_frame &other->lock 0x9d9a910 (2)
=== -------------------------------------------------------------------
===
=== Thread ID: -1246205040 (pbx_thread           started at [ 2624] pbx.c ast_pbx_start())
=== ---> Tried and failed to get Lock #0 (channel.c): MUTEX 2078 __ast_read &chan->lock 0x9d9a910 (0)
=== -------------------------------------------------------------------
===



By: David Brillert (aragon) 2009-11-11 21:19:14.000-0600

Oops I forgot the new WARNING message

-- Goto (all-hangup,s,1)
   -- Executing [s@all-hangup:1] GotoIf("SIP/siptrunk-00000263", "0?all-faxnotify|s|1:2") in new stack
   -- Goto (all-hangup,s,2)
   -- Executing [s@all-hangup:2] ResetCDR("SIP/siptrunk-00000263", "w") in new stack
   -- Executing [s@all-hangup:3] NoCDR("SIP/siptrunk-00000263", "") in new stack
   -- Executing [s@all-hangup:4] DeadAGI("SIP/siptrunk-00000263", "agi://127.0.0.1:4573/cdr?arg1=1257995368.1222") in new stack
[Nov 11 22:11:22] WARNING[30237]: res_agi.c:2203 deadagi_exec: Running DeadAGI on a live channel will cause problems, please use AGI
   -- AGI Script agi://127.0.0.1:4573/cdr?arg1=1257995368.1222 completed, returning 0
 == Spawn extension (all-hangup, s, 14) exited non-zero on 'SIP/siptrunk-00000263'

By: David Brillert (aragon) 2009-11-12 06:56:53.000-0600

Are others who experience this bug also using AGI?
I think this bug might be related to ast_carefulwrite in AGI.

By: David Brillert (aragon) 2009-11-12 15:35:19.000-0600

After two days of testing on test rig replacing agi with fastagi and deadagi (for hangups) has completely removed all of the WARNINGS.



By: David Brillert (aragon) 2009-11-13 09:56:34.000-0600

I understand the relationship added to 16081, 16229
Something is still wrong here though since I found root cause of the WARNING in r200991.  Converting to fastAGI seems more like a workaround.
Does agi code have anything in common with r200911 (ast_frisolate)?
I think it is already established there is relation of agi problems to ast_carefulwrite because of ASTERISK-14994...
Perhaps there is relationship between ast_frisolate and ast_carefulwrite?
I have no idea how to read code so I am just speculating...

By: Leif Madsen (lmadsen) 2009-11-13 10:38:59.000-0600

Oh we're not saying something isn't wrong here, but rather that it appears you've somewhat narrowed down the issue.

There certainly is still something here to be fixed.

By: David Brillert (aragon) 2009-11-13 13:01:13.000-0600

Thanks Leif
I was able to update ASTERISK-14994 with some info for Russell after he supplied me with a debug patch. I hope that helps narrow this down some more.



By: Nathan Stocks (nathan) 2009-11-13 15:47:22.000-0600

We aren't using agi at all.  We're still seeing the "Exceptionally long voice queue length queuing to IAX2/..." errors.

By: David Brillert (aragon) 2009-11-16 08:39:41.000-0600

Well... I spoke to soon.
The warnings are far less frequent after replacing agi with fastagi but I see them in my logs after more testing this weekend.
Also I'm getting some badness in app_queue.c which I haven't seen for a long time.  I am using local agents with /n

[Nov 15 02:07:42] WARNING[8260] app_queue.c: The device state of this queue member, Local/1644@default-agent/n, is still 'Not in Use' when it probably should not be! Please check UPGRADE.txt for correct configuration settings.
[Nov 15 02:30:16] WARNING[11971] app_queue.c: The device state of this queue member, Local/1614@default-agent/n, is still 'Not in Use' when it probably should not be! Please check UPGRADE.txt for correct configuration settings.
[Nov 15 07:24:51] WARNING[27182] channel.c: Exceptionally long voice queue length queuing to Local/1643@default-agent-f1ad,1
[Nov 15 07:24:51] WARNING[27182] channel.c: Exceptionally long voice queue length queuing to Local/1643@default-agent-f1ad,1
[Nov 15 07:24:51] WARNING[27182] channel.c: Exceptionally long voice queue length queuing to Local/1643@default-agent-f1ad,1
[Nov 15 07:24:51] WARNING[27182] channel.c: Exceptionally long voice queue length queuing to Local/1643@default-agent-f1ad,1
[Nov 15 07:24:51] WARNING[27182] channel.c: Exceptionally long voice queue length queuing to Local/1643@default-agent-f1ad,1
[Nov 15 07:24:51] WARNING[27182] channel.c: Exceptionally long voice queue length queuing to Local/1643@default-agent-f1ad,1
[Nov 15 07:24:51] WARNING[27182] channel.c: Exceptionally long voice queue length queuing to Local/1643@default-agent-f1ad,1
[Nov 15 08:38:58] WARNING[7120] app_queue.c: The device state of this queue member, Local/1642@default-agent/n, is still 'Not in Use' when it probably should not be! Please check UPGRADE.txt for correct configuration settings.
[Nov 15 09:01:45] WARNING[10653] app_queue.c: The device state of this queue member, Local/1648@default-agent/n, is still 'Not in Use' when it probably should not be! Please check UPGRADE.txt for correct configuration settings.
[Nov 15 09:10:45] WARNING[12065] app_queue.c: The device state of this queue member, Local/1611@default-agent/n, is still 'Not in Use' when it probably should not be! Please check UPGRADE.txt for correct configuration settings.

And

[Nov 15 20:43:22] WARNING[3652] res_features.c: Did not read data.

By: Doug (doug) 2009-11-17 01:26:44.000-0600

Hi Guys

We also get a problem that seems to be caused by FastAGI, we had the following errors on all calls internal and external until the FastAGI script was restarted.

-- Executing [s@macro-default-dial:7] AGI("SIP/9304-b5e0e3f0", "agi://127.0.0.1:4573/dial") in new stack
[Nov 17 08:44:26] ERROR[2978]: utils.c:966 ast_carefulwrite: write() returned error: Connection refused
[Nov 17 08:44:26] WARNING[2978]: res_agi.c:238 launch_netscript: Connect to 'agi://127.0.0.1:4573/dial' failed: Connection refused

By: pzhy (pzhy) 2009-11-19 03:36:24.000-0600

I apologize for my English

We have the system:
SIP -> Asterisk (Queue) -> IAX2 (Queue members)

About 2 weeks the system worked stably. After that, for queue members parameter "qualify" to iax.conf was set to "no" (this was before the "yes"). There immediately appeared this problems.

In the log there are following errors:
WARNING[3080] chan_iax2.c: Unable to open IAX timing interface: No such file or directory
...
WARNING[3080] utils.c: trying to reset empty pool
WARNING[3080] utils.c: trying to reset empty pool
...
ERROR[3086] pbx.c: Received invalid event that had no device IE
ERROR[3086] app_queue.c: Received invalid event that had no device IE
...
app_dial.c: Unable to forward voice frame
...
WARNING[3112] chan_iax2.c: Unable to cancel schedule ID 432921.  This is probably a bug (chan_iax2.c: iax2_sched_replace, line 1149).
...
WARNING[3118] channel.c: Exceptionally long voice queue length queuing to IAX2/40139-14715
WARNING[3115] channel.c: Exceptionally long voice queue length queuing to IAX2/40107-412
...
WARNING[3111] channel.c: Exceptionally long voice queue length queuing to IAX2/40111-6242
WARNING[3112] channel.c: Exceptionally long voice queue length queuing to IAX2/40139-14715

version of Asterisk 1.6.0.14



By: David Brillert (aragon) 2009-11-20 15:10:44.000-0600

Database show reproduces NOTICE under Valgrind with agi or fastAGI

By: David Brillert (aragon) 2009-11-20 15:26:18.000-0600

valgrind and gdb pid files uploaded

By: David Brillert (aragon) 2009-11-24 17:11:37.000-0600

This bug is getting intolerable and my customers are tearing me apart.
I just can't stand any more beatings.
Please help get this closed ASAP, I am reduced to begging... and I am forced to use very old Asterisk versions in busy call centers (1.4.21).
I have contributed greatly to stabilize Asterisk 1.4.27 and now I humbly ask for someone to return the favour.

By: David Brillert (aragon) 2009-11-25 15:42:05.000-0600

Definitely related to ASTERISK-14994
I can reproduce both issues with database show
The locking on db.c during database show lasts for over 7 seconds on average and sometimes much longer. Database show for example generates both ast_carefulwrite warning and exceptionally long voice queuing events. They both spew so fast out of the CLI you can only see both sets of warnings in /var/log/asterisk/messages

To reproduce on our busy system is now easy.
* CLI do
database show
core show locks

core show locks
fireworx*CLI>
=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <file> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: -1251673200 (netconsole started at [ 1036] asterisk.c listener())
=== ---> Lock #0 (db.c): MUTEX 325 database_show &dblock 0x8180300 (1)
=== -------------------------------------------------------------------
===
=== Thread ID: -1252164720 (pbx_thread started at [ 2646] pbx.c ast_pbx_start())
=== ---> Waiting for Lock #0 (db.c): MUTEX 178 ast_db_get &dblock 0x8180300 (1)
=== --- ---> Locked Here: db.c line 325 (database_show)
=== -------------------------------------------------------------------
===
=======================================================================

[Nov 20 15:42:50] NOTICE[4422]: utils.c:938 __ast_carefulwrite: Timed out trying to write at line 73 of ast_cli in file cli.c
[Nov 20 15:42:50] NOTICE[4422]: utils.c:938 __ast_carefulwrite: Timed out trying to write at line 73 of ast_cli in file cli.c
[Nov 20 15:42:50] NOTICE[4422]: utils.c:938 __ast_carefulwrite: Timed out trying to write at line 73 of ast_cli in file cli.c

By: Joel Vandal (jvandal) 2009-11-26 16:38:58.000-0600

Related to " Exceptionally long voice queue length queuing " messages, I've made several tests in order to reproduce this issue and I think I've found a possible cause...

- We are using MySQL backend for CDR
- CDR table is very large
- CDR table is not optimized <<<<<<<<<

We simply do, from MySQL, " optimize table cdr " and all errors related to " Exceptionally long voice ... " are gone.

By: David Brillert (aragon) 2009-11-26 18:52:11.000-0600

jvandal is correct.
With mysql cdr table optimized no more "long voice..." warnings.
Even if I do database show now the warnings are gone.
What is unclear to me is why database show would lock Asterisk so bad with small corruption in mysql cdr database?

By: Atis Lezdins (atis) 2009-11-26 19:28:12.000-0600

I have very small CDR table named cdr_post it has only last few unprocessed CDR records. On every web access i retrieve everything from it and process to my actual data tables, so Asterisk can post to CDR instantly.

By: eupla (eupla) 2009-11-27 02:02:14.000-0600

We don't have a MySQL database and we get those "Exceptionally long voice..." errors too with version 1.6.0.15.

The problem seems to be gone using 1.6.0.5, at least for us.

We tried version 1.6.0.18 when it was released, and it worked even worse than 1.6.0.15. We didn't have audio in outgoing calls through a Vega 50 Europa gateway.

By: David Brillert (aragon) 2009-11-27 12:06:42.000-0600

The problem from a high level analysis is that Asterisk delays RTP while something is locked.
In our case Asterisk locks while doing some CDR reads/writes and apparently during hangup phase when we try to write to MySQL CDR record. If that database is corrupt then the Asterisk lock can take a very long time while RTP is delayed. Other events like astdb get and put can also be delayed to a point where the entire event is delayed past the ast_carefulwrite threshold and not all info is written to the astdb file. That results in agent login corruption.

After repairing the MySQL CDR database the warnings have disappeared and I can no longer reproduce with a database show command.  We have also added a midnight routine to optimize MySQL database daily to prevent further instances.

gdb_asterisk_pid_database_show.txt should have the required debug info to determine whether the locking is normal or not.  I attached gdb to Asterisk while reproducing the error with a database show command.  The same things applies to database_show_valgrind.txt

By: David Brillert (aragon) 2009-11-30 14:19:27.000-0600

The "Exceptionally long voice..." warnings are back.
We no more do any request from AMI
We no more use AstDB
We have very small CDR database (thanks for the idea atis)
We no more use AGI
We have no more ideas on how to fix this problem.

By: David Brillert (aragon) 2009-11-30 14:20:54.000-0600

And since repairing the CDR database can no longer reproduce the warnings at will with database show command.

By: David Brillert (aragon) 2009-11-30 15:02:14.000-0600

IMHO still related to the following:
introduced by r200991
https://reviewboard.asterisk.org/r/175/

4) Some minor optimizations in ast_frisolate(), and fix two bugs where it would copy pointers to malloc'd data from the source frame but not clear those pointers in the source frame (resulting in a likely crash if the source frame was ever freed). In essence, the output frame was *not* fully isolated from the source frame. There were only two users in the tree and neither of them could cause this bug, though.

5) Use ast_frisolate() in the autoservice run thread, to avoid actually duplicating a frame if it is not necessary.

By: David Brillert (aragon) 2009-12-01 11:07:43.000-0600

What does this info in mmlog mean?

WARNING: Freeing unused memory at 0x9ab62e8, in complete_fn_2 of cli.c, line 1268
WARNING: Freeing unused memory at 0xa02e970, in complete_fn_2 of cli.c, line 1268
WARNING: Freeing unused memory at 0x8960440, in complete_fn_2 of cli.c, line 1268
WARNING: Freeing unused memory at 0x9db6648, in complete_fn_2 of cli.c, line 1268
WARNING: Freeing unused memory at 0x9df55f0, in complete_fn_2 of cli.c, line 1268
WARNING: Freeing unused memory at 0xa025518, in complete_fn_2 of cli.c, line 1268

By: Stéphane HENRY (stef) 2009-12-02 10:16:34.000-0600

I experience the same problem for a while now (sevral months).

My asterisk version is 1.4.26.1
The CDR is connected to odbc/mysql and I use Astdb (for agents). I tried the Mysql table optimization but the issue is still there.
I have this problem on 3 asterisk servers (same version) where the 'Queue' application is used. I have many other asterisk servers without this issue, with no app_queue installed.

Stéphane

By: Leif Madsen (lmadsen) 2009-12-02 13:40:36.000-0600

aragon: can you please explain what is going wrong here, what the symptoms are, and why this issue is such a problem?

Based on the information in this ticket, it is hard to determine what problem you are experiencing other than just a lot of logging information on the screen. What issues are your customers experiencing when this happens, and what appears to be causing this to happen?

By: Stéphane HENRY (stef) 2009-12-02 15:15:13.000-0600

Leif : on my side, we experiment micro cuts in the VoIP calls when these warnings are logged.

By: David Brillert (aragon) 2009-12-02 16:22:25.000-0600

Leif: No audio on any call can be heard during warning periods. And many times the warnings last for over 30 seconds. This is unacceptable in a call center.  I assume it is caused by some locking introduced in r200991. I can't reproduce it like I used to with a show database command since optimizing MySQL tables but the problem is still there. It generates about 12 MB of text in /var/log/asterisk/messages daily and drives up CPU usage.
It only appears to happen in app_queue and it only happens after a hangup involving music on hold. I have tried native asterisk moh format and mp3 with no difference in frequency of warning.
I thought the last batch of debug uploads would make the root cause of the problem obvious to a developer.
Sorry the bug report is such a mess but my testing exposed so many other crashes and bugs it was hard to remain focused on the original problem. I could not know if any of those other problems were related to the original issue posted.



By: Leif Madsen (lmadsen) 2009-12-03 12:06:57.000-0600

Just wanted to let you know this issue is not being ignored as we discussed it today. We hope to get this resolved reasonably soon.

By: David Brillert (aragon) 2009-12-03 14:28:59.000-0600

Thanks

By: newborn (newborn) 2009-12-04 08:07:08.000-0600

I have no problems about a month or two, and today asterisk console starts flooding

[Dec  4 16:44:41] WARNING[1233]: channel.c:1038 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/m2-464@external-3372;2


and so on

i also tried to execute core show channels about a half an hour ago. i think that this bug is related to core show channels command.

i haven't executed this command before (for a long time)

By: David Brillert (aragon) 2009-12-08 14:25:39.000-0600

Hmmm,
I am getting the warning at 2 minutes after the hour each hour with some exceptions.
/var/log/asterisk/messages
I get this message
[Dec  7 20:38:06] WARNING[21383] app_queue.c: The device state of this queue member, Local/1609@default-agent/n, is still 'Not in Use' when it probably should not be! Please check UPGRADE.txt for correct configuration settings.

Then later at 2 minutes after the hour...
Dec  7 21:02:26] WARNING[30286] channel.c: Exceptionally long voice queue length queuing to Local/1609@default-agent-2ccf,1
Which keeps spamming the CLI until
[Dec  7 21:02:41] WARNING[30286] channel.c: Exceptionally long voice queue length queuing to Local/1609@default-agent-2ccf,1

etc...

show queues
default-queue- has 6 calls (max unlimited) in 'rrmemory' strategy (47s holdtime), W:0, C:1168, A:5502, SL:0.2% within 10s
  Members: >
     Local/1609@default-agent/n (dynamic) (In use) has taken 1168 calls (last was 127 secs ago)
  Callers:
     1. SIP/siptrunk-00001ebc (wait: 1:00, prio: 0)
     2. SIP/siptrunk-00001ebd (wait: 0:55, prio: 0)
     3. SIP/siptrunk-00001ebe (wait: 0:46, prio: 0)
     4. SIP/siptrunk-00001ebf (wait: 0:37, prio: 0)
     5. SIP/siptrunk-00001ec0 (wait: 0:32, prio: 0)
     6. SIP/siptrunk-00001ec1 (wait: 0:09, prio: 0)


1609/1609                  192.168.30.146   D          5060     OK (151 ms)

So agent 1609 is the only logged agent and qualify = yes

Therefore only a logged in use agent appears to be affected by the warning. And potentially this only happens when SIP extension re-registers after 3600 second timeout.
I'm going to re-test this with qualify=no for ext 1609

By: David Brillert (aragon) 2009-12-09 08:48:27.000-0600

No change with qualify=no
This is unrelated.

By: David Brillert (aragon) 2009-12-09 15:20:35.000-0600

warnings appear if there is corruption in mysql cdr table.
warnings disappear if we do mysqlcheck -A -r to repair the cdr table.
jvandal added mysqlcheck -A -r to crond at 2 minutes after the hour which explains warnings I mentioned in previous post.
So now I have a method to reproduce the warnings by manually running mysqlcheck -A -r

Here is what I know now:
Manually invoking mysqlcheck -A -r will spike the CPU and create massive warnings. Also SIP peers become unreachable due to qualify delays.
That could be a result of the high CPU or some locking (I don't know which if either).
Normal CPU load is 2.0 (busy test system processing approx 15 cps).
during scopserv.cdr optimization CPU climbs to 3.9
during scopserv-ivr optimization CPU climbs to 7
during other database optimizations CPU is normal and no warnings.

I can't read source code etc... but if certain Asterisk revisions are reverted we don't have any problems.  So the question now is how can a database optimization of some cdr and ivr data tables freeze up app_queue with these warnings and cause qualify timeouts?
If the cdr database was corrupt then the warnings occurred during hangups and that is when we try to write to cdr.  I could also reproduce on a corrupt cdr database with database show.
So what is the relationship between the cdr data and ast_carefulwrite and ast_frisolate?

By: David Brillert (aragon) 2009-12-11 10:01:52.000-0600

Since it is very easy to reproduce the warings with a mysqlcheck -A -r command.
I reproduced the warnings and attached to Asterisk with gdb asterisk --pid=`pidof asterisk`

The gdb output of that test is uploaded as mysqlcheck_cdr_gdb_pid_ofasterisk11_12_2009.txt

By: David Brillert (aragon) 2009-12-11 10:20:15.000-0600

Holy crap,

I figured out the configuration which causes the warnings during mysqlcheck etc...
In my queue configuration I have an option to allow the caller to exit the queue by pressing a digit in an IVR configuration. The IVR context is included in the queue and a key press will follow rules to transfer the caller to another destination.

Here is an example configuration (this example actually comes from another ticket that was opened with a similar but non-related problem). ASTERISK-13635

To reproduce it is easy enough to set "context" commzilla-aa-mainivr"
Where that context is equal to an IVR menu.
The purpose of that menu is to allow a caller to enter DTMF during wait time to transfer out of queue.

queues.conf
[commzilla-family]
music = default
strategy = rrmemory
servicelevel = 45
context = commzilla-aa-mainivr
timeout = 20
retry = 5
maxlen = 0
weight = 0
reportholdtime = no
reportwaitcall = no
memberdelay = 0
timeoutrestart = no
autofill = yes
autopause = no
ringinuse = no
setinterfacevar = yes
wrapuptime = 0
joinempty = yes
leavewhenempty = no
eventwhencalled = vars
eventmemberstatus = yes
periodic-announce-frequency = 20
periodic-announce = custom/0002
monitor-format = wav49
monitor-type = MixMonitor
queue-youarenext = queue-youarenext
queue-thereare = queue-thereare
queue-callswaiting = queue-callswaiting
queue-holdtime = queue-holdtime
queue-minutes = queue-minutes
queue-seconds = queue-seconds
queue-thankyou = queue-thankyou
queue-lessthan = queue-less-than

extensions-ivr.conf
[commzilla-aa-mainivr]
include => commzilla-local
exten => s,1,Answer
exten => s,2,AGI(/var/www/scopserv/telephony/scripts/agi/ivr.php,3438)
exten => s,3,Wait(1)
exten => s,4,Set(__IVRNAME=commzilla-aa-mainivr)
exten => s,5,Set(TIMEOUT(digit)=3)
exten => s,6,Set(TIMEOUT(response)=5)
exten => s,7,Set(CHANNEL(language)=en)
exten => s,8,Background(custom/0013)
exten => s,9,NoOp()
exten => 0,1,AGI(/var/www/scopserv/telephony/scripts/agi/ivr.php,3438,0)
exten => 0,2,Set(__CALLDIVERT=YES)
exten => 0,3,Dial(Local/200@commzilla-local/n,,o)
exten => 0,4,Hangup
exten => #,1,AGI(/var/www/scopserv/telephony/scripts/agi/ivr.php,3438,pound)
exten => #,2,Directory(commzilla|commzilla-local-directory|)
exten => t,1,AGI(/var/www/scopserv/telephony/scripts/agi/ivr.php,3438,none)
exten => t,2,Set(__CALLDIVERT=YES)
exten => t,3,Dial(Local/200@commzilla-local/n,,o)
exten => t,4,Hangup
exten => i,1,AGI(/var/www/scopserv/telephony/scripts/agi/ivr.php,3438,invalid)
exten => i,2,Set(__CALLDIVERT=YES)
exten => i,3,Dial(Local/200@commzilla-local/n,,o)
exten => i,4,Hangup
exten => h,1,Goto(all-hangup,s,1)

If I remove the context then the warnings no longer appear during mysqlcheck
Here is that configuration

queues.conf
[commzilla-family]
music = default
strategy = rrmemory
servicelevel = 45
# context = commzilla-aa-mainivr
timeout = 20
retry = 5
maxlen = 0
weight = 0
reportholdtime = no
reportwaitcall = no
memberdelay = 0
timeoutrestart = no
autofill = yes
autopause = no
ringinuse = no
setinterfacevar = yes
wrapuptime = 0
joinempty = yes
leavewhenempty = no
eventwhencalled = vars
eventmemberstatus = yes
periodic-announce-frequency = 20
periodic-announce = custom/0002
monitor-format = wav49
monitor-type = MixMonitor
queue-youarenext = queue-youarenext
queue-thereare = queue-thereare
queue-callswaiting = queue-callswaiting
queue-holdtime = queue-holdtime
queue-minutes = queue-minutes
queue-seconds = queue-seconds
queue-thankyou = queue-thankyou
queue-lessthan = queue-less-than

By: David Brillert (aragon) 2009-12-11 10:26:21.000-0600

All of this leads me to believe that ASTERISK-14558 and ASTERISK-14994 are related since both appear to relate to the IVR menu.
It was actually the gdb pid output that made me look at the IVR context included in the queue configuration and prompted me to test the queue without the IVR context included.

This is by no means fixed and I must include the IVR context for the queues so that is no workaround but at least I have provided a configuration and reproducible scenario to finally get this fixed.

By: David Brillert (aragon) 2009-12-11 10:43:20.000-0600

This is the thread I was talking about

Thread 27 (Thread 0xb7851b90 (LWP 6996)):
#0  0x00ea2402 in __kernel_vsyscall ()
#1  0x00a4b023 in poll () from /lib/libc.so.6
#2  0x080d16c6 in get_input ()
#3  0x080d1932 in do_message ()

The same kind of thread can be found in both ASTERISK-14558 and ASTERISK-14994

By: David Brillert (aragon) 2009-12-15 14:43:18.000-0600

So I made some changes to reduce some spamminess in /varlog/asterisk/messages
After maximum wait time expires in queue we now execute a hangup.
The IVR context is re-enabled in queues.conf in order to bring the bug back so I can continue to debug.
With these changes I have new WARNINGS in Asterisk logs.
By deleting some very large CDR tables I have removed the unusually long voice queue length warnings.

These are the new WARNINGS

[Dec 15 15:10:53] WARNING[17888] ast_expr2.fl: ast_yyerror():  syntax error: syntax error, unexpected '+', expecting $end; Input:
+ 1
^
[Dec 15 15:10:53] WARNING[17888] ast_expr2.fl: If you have questions, please refer to doc/channelvariables.txt in the asterisk source.
[Dec 15 15:10:53] DEBUG[17888] pbx.c: Launching 'Set'
[Dec 15 15:10:53] WARNING[17888] ast_expr2.fl: ast_yyerror():  syntax error: syntax error, unexpected '>', expecting $end; Input:
>
^
[Dec 15 15:10:53] WARNING[17888] ast_expr2.fl: If you have questions, please refer to doc/channelvariables.txt in the asterisk source.
[Dec 15 15:10:53] DEBUG[17888] pbx.c: Launching 'GotoIf'
[Dec 15 15:10:53] DEBUG[17888] pbx.c: Not taking any branch
[Dec 15 15:10:53] DEBUG[17888] pbx.c: Launching 'Goto'
[Dec 15 15:10:53] DEBUG[17888] pbx.c: Launching 'BackGround'

I am attaching the latest /var/log/asterisk/messages file with debug enabled so a developer can see the latest warnings and context of these warnings.

I have heard of other reporters seeing the unusually long queue length warnings when executing background...

By: David Brillert (aragon) 2009-12-15 15:16:27.000-0600

So if I understand the background application properly it is waiting for for some input and could be causing the warnings and processing delays.  The IVR context is included in queues.conf but we never send any DTMF (DTMF logging was enabled in my logger.conf but did not see any DTMF).  So I assume that the original warnings are caused by background waiting for input.  In an ideal scenario we would never want the caller to dial DTMF and exit the queue but we include the IVR context just in case.

And if I remove the IVR context from queues.conf then I no longer see the background application being executed hence no more warnings.
I think the bug is in the background application.

By: David Brillert (aragon) 2009-12-15 15:43:05.000-0600

Here is example debug output when I see ast_carefulwrite dialing into IVR so I can bypass the queue for a quick test.

I believe the problem is in the background application.

-- Launched AGI Script /var/www/scopserv/telephony/scripts/agi/ivr.php
[Dec 15 16:36:03] NOTICE[26225]: utils.c:938 ast_carefulwrite: Timed out trying to write
[Dec 15 16:36:03] NOTICE[26225]: utils.c:938 ast_carefulwrite: Timed out trying to write
   -- AGI Script /var/www/scopserv/telephony/scripts/agi/ivr.php completed, returning 0

[Dec 15 16:36:06] WARNING[26225]: file.c:764 ast_readaudio_callback: Failed to write frame
   -- Executing [h@scm-aa-mainaa:1] Goto("Zap/11-1", "all-hangup|s|1") in new stack
   -- Goto (all-hangup,s,1)
   -- Executing [s@all-hangup:1] GotoIf("Zap/11-1", "0?all-faxnotify|s|1:2") in new stack
   -- Goto (all-hangup,s,2)
   -- Executing [s@all-hangup:2] ResetCDR("Zap/11-1", "w") in new stack
   -- Executing [s@all-hangup:3] NoCDR("Zap/11-1", "") in new stack
   -- Executing [s@all-hangup:4] System("Zap/11-1", "/var/www/scopserv/telephony/scripts/billing/cdr.sh 1260912963.4643") in new stack


Here is the debug output

[Dec 15 16:27:30] WARNING[17320] file.c: Failed to write frame
[Dec 15 16:27:30] DEBUG[17320] channel.c: Set channel Zap/17-1 to write format ulaw
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Extension ÿ, priority 0 returned normally even though call was hung up
[Dec 15 16:27:30] DEBUG[17320] channel.c: Soft-Hanging up channel 'Zap/17-1'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Launching 'Goto'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Expression result is '0'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Expression result is '2'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Launching 'GotoIf'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Launching 'ResetCDR'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Function result is '"private" <5555555555>'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Function result is '5555555555'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Function result is '4350'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Function result is 'zap-incoming'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Function result is 'Zap/17-1'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Function result is '(null)'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Function result is 'BackGround'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Function result is 'custom/0002'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Function result is '2009-12-15 16:27:26'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Function result is '2009-12-15 16:27:26'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Function result is '2009-12-15 16:27:30'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Function result is '4'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Function result is '4'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Function result is 'ANSWERED'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Function result is 'DOCUMENTATION'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Function result is '_NONE_'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Function result is '1260912446.4587'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Function result is '4350'
[Dec 15 16:27:30] DEBUG[17320] cdr_addon_mysql.c: cdr_mysql: inserting a CDR record.
[Dec 15 16:27:30] DEBUG[17320] cdr_addon_mysql.c: cdr_mysql: SQL command as follows: INSERT INTO cdr_pre (calldate,clid,src,dst,dcontext,channel,dstchannel,lastapp,lastdata,duration,billsec,disposition,amaflags,accountcode,uniqueid,userfield) VALUES ('2009-12-15 16:27:26','\"private\" <5555555555>','5555555555','4350','zap-incoming', 'Zap/17-1','','BackGround','custom/0002',4,4,'ANSWERED',3,'_NONE_','1260912446.4587','4350')
[Dec 15 16:27:30] DEBUG[17320] cdr_radius.c: Unable to create RADIUS record. CDR not recorded!
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Launching 'NoCDR'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Launching 'System'

By: David Brillert (aragon) 2009-12-15 15:49:07.000-0600

Specifically when background is executed by agi

By: Leif Madsen (lmadsen) 2009-12-15 16:35:44.000-0600

These warnings are standard -- you're missing some data that is being passed to an expression:


[Dec 15 15:10:53] WARNING[17888] ast_expr2.fl: ast_yyerror(): syntax error: syntax error, unexpected '+', expecting $end; Input:
+ 1
^
[Dec 15 15:10:53] WARNING[17888] ast_expr2.fl: If you have questions, please refer to doc/channelvariables.txt in the asterisk source.
[Dec 15 15:10:53] DEBUG[17888] pbx.c: Launching 'Set'
[Dec 15 15:10:53] WARNING[17888] ast_expr2.fl: ast_yyerror(): syntax error: syntax error, unexpected '>', expecting $end; Input:
>
^
[Dec 15 15:10:53] WARNING[17888] ast_expr2.fl: If you have questions, please refer to doc/channelvariables.txt in the asterisk source.
[Dec 15 15:10:53] DEBUG[17888] pbx.c: Launching 'GotoIf'
[Dec 15 15:10:53] DEBUG[17888] pbx.c: Not taking any branch
[Dec 15 15:10:53] DEBUG[17888] pbx.c: Launching 'Goto'
[Dec 15 15:10:53] DEBUG[17888] pbx.c: Launching 'BackGround'

By: David Brillert (aragon) 2009-12-16 13:42:34.000-0600

So by wiping out my mysql cdr and IVR tables and starting from scratch the warnings are gone.

There appears to be a processing delay when function result is background and we try to write to mysql table with cdr_addon_mysql.c:
The delay occurs when the mysql table is large or corrupt or being optimized.

[Dec 15 16:27:30] DEBUG[17320] pbx.c: Function result is 'BackGround'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Function result is 'custom/0002'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Function result is '2009-12-15 16:27:26'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Function result is '2009-12-15 16:27:26'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Function result is '2009-12-15 16:27:30'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Function result is '4'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Function result is '4'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Function result is 'ANSWERED'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Function result is 'DOCUMENTATION'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Function result is '_NONE_'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Function result is '1260912446.4587'
[Dec 15 16:27:30] DEBUG[17320] pbx.c: Function result is '4350'
[Dec 15 16:27:30] DEBUG[17320] cdr_addon_mysql.c: cdr_mysql: inserting a CDR record.



By: Tilghman Lesher (tilghman) 2009-12-18 18:43:35.000-0600

aragon:  have you considered turning on CDR batching?  If you turn that on, do these warnings all disappear?  If the lock is caused by database table delays (and it certainly seems like it has been), then CDR batching should completely eliminate the issue.

One thing that we could possibly do is automatically activate batching when this problem occurs in the future, and that could be the permanent fix.  However, we should test whether turning on CDR batching really does solve this before heading down that path.

By: Joel Vandal (jvandal) 2009-12-18 20:49:31.000-0600

tilghman: I have a single question... if we enable CDR batching, any scripts executed on the hangup (h) exten will not be able to retrieve CDR data ?

By: Tilghman Lesher (tilghman) 2009-12-19 02:54:35.000-0600

aragon:  please concentrate on ONE issue in this report.  As you pointed out, you have another issue with ast_carefulwrite.  Let's keep details associated with that problem in that other issue.  This issue has already grown excessively long, and we confuse matters further by associating separate problems in the same issue.

Whether the two issues have a common cause is another matter that we can determine in due time, but please stick to the subject of this report.



By: Tilghman Lesher (tilghman) 2009-12-19 02:59:39.000-0600

This is correct.  If you enable CDR batching, you may not be able to directly operate on the CDR data.  Would it be possible for you to operate on the CDR _before_ it is posted?  Or add a trigger to your database and operate at the time of insertion?

By: Tilghman Lesher (tilghman) 2009-12-21 12:03:56.000-0600

I have just uploaded a patch.  This patch will SOMEWHAT reduce the contention between postings of multiple CDRs.  It may not be a complete solution, especially if you use one of the CDR backends (like cdr_addon_mysql) which serializes all of its CDR posts.  In any case, it is a prerequisite before making changes to cdr_addon_mysql in an attempt to reduce the contention in that driver.

Since that first patch, I have made additional changes which SHOULD make the warning associated with this issue disappear entirely.



By: David Brillert (aragon) 2009-12-21 12:14:47.000-0600

Thanks Tilghman:
I'll try to get this patch put into service ASAP and update this ticket.

By: David Brillert (aragon) 2009-12-21 13:27:59.000-0600

tilghman: does the uploaded patch contain all additional changes you mentioned?

By: Tilghman Lesher (tilghman) 2009-12-21 14:30:00.000-0600

aragon:  yes.  If you review the issue history, below, you'll see that 2 previous patches were uploaded and deleted.

By: David Brillert (aragon) 2009-12-21 14:35:48.000-0600

tilghman: thanks again, I wasn't sure because you did not mention which patch version I should test.
I thought it better to ask since I was unsure.

By: David Brillert (aragon) 2009-12-21 15:04:11.000-0600

tilghman: I am just the messenger here and jvandal writes all of the backend cdr mysql/cdr scripts for my servers. He states that the patch will not be helpful because it is for cdr and he only uses mysql cdr.

By: David Brillert (aragon) 2009-12-21 15:18:10.000-0600

tilghman/jvandal: regardless of sql vs mysql cdr I am going to test tilghman's patch on fresh svn checkout in my lab. At least I can confirm if anything that it breaks nothing else...

By: David Brillert (aragon) 2009-12-21 17:29:16.000-0600

tilghman: your patch does not appear to break anything. I assume by your previous notes that you are working on something for cdr_addon_mysql.  I guess I'll sit back and wait for that patch for testing.

By: Tilghman Lesher (tilghman) 2009-12-21 21:38:01.000-0600

Actually, I think you'd be better off to go ahead and test this, as is.  As I said, it should get rid of the error to which this report is linked, solving the frame queue length issue completely.  A possible change of that measure to cdr_addon_mysql is rather extensive and will probably go into trunk only.  The driver isn't really broken; it's just that the way that you're posting CDRs is a little out of the ordinary.  However, as I said, I think I've found the root cause of this issue, and this patch should solve it.

By: pzhy (pzhy) 2009-12-22 02:54:38.000-0600

We have the problem, but the record of any CDRs is off.

By: Amilcar S Silvestre (amilcar) 2009-12-22 04:06:02.000-0600

I don't think this is strictly related to CDRs. We have this same problem in many machines without recording any CDR but the very basic cdr-csv/Master.csv.

Reading this thread, it seems that cdr processing could be one of the things that may cause this behaviour. But it's not the root cause.

By: alexb (alexb) 2009-12-22 04:24:02.000-0600

amilcar: I fully agree. Please see my previous post at https://issues.asterisk.org/view.php?id=15609#111760

No CDR recording but the .csv one, and tons of warnings anyway! I'd also like to highlight that I was using 1.6 from SVN trunk, so this issue is not related with the 1.4 branch only.

By: Julien Leboeuf (yuenk) 2009-12-22 04:35:18.000-0600

Same here, I don't use any particular CDR (only cdr-csv/Master.csv) but I frequently receive those messages.
I applied the new patch on asterisk-1.6.0.18 (only pbx.c needed to be modified) and it doesn't seem to change anything.

By: David Brillert (aragon) 2009-12-22 07:52:57.000-0600

Tilghman: Tested your patch overnight
Still getting the warnings.

[Dec 22 08:01:43] WARNING[1257] channel.c: Exceptionally long voice queue length queuing to Local/1650@default-agent-1a7f,2
[Dec 22 08:01:43] WARNING[1257] channel.c: Exceptionally long voice queue length queuing to Local/1650@default-agent-1a7f,2
[Dec 22 08:01:43] WARNING[1257] channel.c: Exceptionally long voice queue length queuing to Local/1650@default-agent-1a7f,2

By: Tilghman Lesher (tilghman) 2009-12-22 09:27:52.000-0600

aragon:  are your warnings still clustered around the end of the call?  I really need to see the context of where the warnings appear, not just the warnings themselves.

By: David Brillert (aragon) 2009-12-22 10:08:54.000-0600

tilghman: Since blowing away my cdr and ivr mysql tables I have not found a way to reliably reproduce this warnings. I should have made backups of my old tables so I could keep reproducing this warning when I needed to.

I'll have to log all cli to a file and get back to you.

By: Tilghman Lesher (tilghman) 2009-12-22 13:13:23.000-0600

aragon:  even the messages around last night's output would be helpful.

By: David Brillert (aragon) 2009-12-22 13:43:09.000-0600

tilghman: Nothing else in logs except queue length stuff and this...
[Dec 22 14:05:52] ERROR[7062] utils.c: write() returned error: Broken pipe
Your patch is not responsible for broken pipe ERROR. I'm pretty sure this is caused by a much higher CPU utilization since installing 1.4 SVN.  I haven't tracked down which revision causes the higher CPU load.

I'll try to get you more details about exact CLI surrounding queue length warnings.

I can also give you root access if you want.

By: David Brillert (aragon) 2009-12-22 14:17:16.000-0600

tilghman: I ran under Valgrind for a few minutes and almost immediately got the warnings.  They have moved from hangup to when we execute into the acd queue.
This is Asterisk SVN-branch-1.4-r235940M with your patch.

-- Executing [5000@customer-siptrunk-incoming:21] GotoIf("SIP/siptrunk-00000007", "0?22:24") in new stack
 == Parsing '/etc/asterisk/manager.conf': Found
   -- Goto (customer-siptrunk-incoming,5000,24)
   -- Executing [5000@customer-siptrunk-incoming:24] Set("SIP/siptrunk-00000007", "__INCOMINGLINE=1") in new stack
   -- Executing [5000@customer-siptrunk-incoming:4] Set("SIP/siptrunk-00000008", "CHANNEL(musicclass)=default") in new stack
   -- Executing [5000@customer-siptrunk-incoming:25] GotoIf("SIP/siptrunk-00000007", "0?26:27") in new stack
   -- Executing [5000@customer-siptrunk-incoming:5] NoOp("SIP/siptrunk-00000008", ""INCOMING CALL FROM CALLER ID: 1027 (1027)"") in new stack
   -- Executing [5000@customer-siptrunk-incoming:6] Set("SIP/siptrunk-00000008", "__INCOMING_DNIS=5000") in new stack
 == Parsing '/etc/asterisk/manager.conf': Found
   -- Goto (customer-siptrunk-incoming,5000,27)
   -- Executing [5000@customer-siptrunk-incoming:27] GotoIf("SIP/siptrunk-00000007", "0?28:29") in new stack
   -- Executing [5000@customer-siptrunk-incoming:7] Set("SIP/siptrunk-00000008", "CALLERID(dnid)=5000") in new stack
   -- Goto (customer-siptrunk-incoming,5000,29)
   -- Executing [5000@customer-siptrunk-incoming:29] GotoIf("SIP/siptrunk-00000007", "0?30:31") in new stack
[Dec 22 14:57:19] ERROR[23231]: utils.c:966 ast_carefulwrite: write() returned error: Broken pipe
   -- Executing [5000@customer-siptrunk-incoming:8] Set("SIP/siptrunk-00000008", "CDR(userfield)=5000") in new stack
   -- Goto (customer-siptrunk-incoming,5000,31)
   -- Executing [5000@customer-siptrunk-incoming:31] GotoIf("SIP/siptrunk-00000007", "0?32:33") in new stack
   -- Executing [5000@customer-siptrunk-incoming:9] Goto("SIP/siptrunk-00000008", "20") in new stack
   -- Goto (customer-siptrunk-incoming,5000,33)
   -- Executing [5000@customer-siptrunk-incoming:33] Set("SIP/siptrunk-00000007", "CHANNEL(language)=en") in new stack
   -- Goto (customer-siptrunk-incoming,5000,20)
   -- Executing [5000@customer-siptrunk-incoming:20] AGI("SIP/siptrunk-00000008", "/var/www/scopserv/telephony/scripts/agi/fixcidname.php") in new stack
 == Parsing '/etc/asterisk/manager.conf': Found
   -- Executing [5000@customer-siptrunk-incoming:34] Set("SIP/siptrunk-00000007", "QUEUE_PRIO=0") in new stack
   -- Executing [5000@customer-siptrunk-incoming:35] Gosub("SIP/siptrunk-00000007", "default-application-acd-queues-new-english|s|1") in new stack
[Dec 22 14:57:21] ERROR[23234]: utils.c:966 ast_carefulwrite: write() returned error: Broken pipe
   -- Executing [s@default-application-acd-queues-new-english:1] Set("SIP/siptrunk-00000007", "AUDIOHOOK_INHERIT(MixMonitor)=yes") in new stack
 == Parsing '/etc/asterisk/manager.conf':     -- Launched AGI Script /var/www/scopserv/telephony/scripts/agi/fixcidname.php
   -- Executing [s@default-application-acd-queues-new-english:2] NoOp("SIP/siptrunk-00000007", ""=== START QUEUE (default-queues-new-english) ==="") in new stack
Found
 == Parsing '/etc/asterisk/manager.conf':   == Parsing '/etc/asterisk/manager.conf': Found
Found
   -- Executing [s@default-application-acd-queues-new-english:3] Set("SIP/siptrunk-00000007", "__QUEUENAME=default-queues-new-english") in new stack
Really destroying SIP dialog '3618b2963c524fc64ccf79b6729c4c93@192.168.30.165' Method: OPTIONS
Really destroying SIP dialog '3b4740201d56bbc868121ef21568d466@192.168.30.165' Method: OPTIONS
[Dec 22 14:57:24] WARNING[22889]: channel.c:954 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1609@default-agent-2b0b,1
[Dec 22 14:57:24] WARNING[22889]: channel.c:954 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1609@default-agent-2b0b,1
[Dec 22 14:57:24] WARNING[22889]: channel.c:954 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1609@default-agent-2b0b,1
[Dec 22 14:57:24] WARNING[22889]: channel.c:954 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1609@default-agent-2b0b,1
[Dec 22 14:57:24] WARNING[22889]: channel.c:954 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1609@default-agent-2b0b,1
[Dec 22 14:57:24] WARNING[22889]: channel.c:954 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1609@default-agent-2b0b,1
[Dec 22 14:57:24] WARNING[22889]: channel.c:954 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1609@default-agent-2b0b,1
[Dec 22 14:57:24] WARNING[22889]: channel.c:954 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1609@default-agent-2b0b,1
[Dec 22 14:57:24] WARNING[22889]: channel.c:954 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1609@default-agent-2b0b,1
[Dec 22 14:57:24] WARNING[22889]: channel.c:954 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1609@default-agent-2b0b,1
[Dec 22 14:57:24] WARNING[22889]: channel.c:954 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1609@default-agent-2b0b,1
[Dec 22 14:57:24] WARNING[22889]: channel.c:954 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1609@default-agent-2b0b,1
[Dec 22 14:57:24] WARNING[22889]: channel.c:954 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1609@default-agent-2b0b,1
[Dec 22 14:57:24] WARNING[22889]: channel.c:954 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1609@default-agent-2b0b,1
[Dec 22 14:57:24] WARNING[22889]: channel.c:954 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1609@default-agent-2b0b,1
[Dec 22 14:57:24] WARNING[22889]: channel.c:954 __ast_queue_frame: Really destroying SIP dialog '0ed6424822a86f7a5f52c5826bca5316@192.168.30.165' Method: OPTIONS
Exceptionally long voice queue length queuing to Local/1609@default-agent-2b0b,1
Really destroying SIP dialog '1ccc3d0a11832a280f62a2ef708bdf49@192.168.30.165' Method: OPTIONS
Really destroying SIP dialog '1fef18144edb808a420b622b19e54031@192.168.30.165' Method: OPTIONS
Really destroying SIP dialog '5350a10157e2fcf914f6df1305a90789@192.168.30.165' Method: OPTIONS
Really destroying SIP dialog '17f9a54b1764839305372b0f4d8e9464@192.168.30.165' Method: OPTIONS
Really destroying SIP dialog '110823d747584bb809e1d76e31977351@192.168.30.165' Method: OPTIONS
Really destroying SIP dialog '5dbc4e6626b1711b26c5aeb50c1dce47@192.168.30.165' Method: OPTIONS
Really destroying SIP dialog '58b4c0725d26934602124c6f0114a4dc@192.168.30.165' Method: OPTIONS
Really destroying SIP dialog '0fd17fd07bc174c86814ddab7e14e8fb@192.168.30.165' Method: OPTIONS
Really destroying SIP dialog '6a91b07703d7cdff179f43f46c924b44@192.168.30.165' Method: OPTIONS
Really destroying SIP dialog '08f276a25ee8fbcd7b3d56a662378b48@192.168.30.165' Method: OPTIONS



By: David Brillert (aragon) 2009-12-22 14:43:48.000-0600

tilghman: Also I noticed I had forgotten to include the ivr context in queues.conf during this round of testing and I still see the exceptionally long... warnings and the "utils.c:938 ast_carefulwrite: Timed out trying to write" had disappeared.  I put the ivr context back into queues.conf and "utils.c:938 ast_carefulwrite: Timed out trying to write" re-appeared during a mysqlcheck.  This tells me that ASTERISK-14558 and ASTERISK-14994 are not related; they are in fact two separate issues.

By: Tilghman Lesher (tilghman) 2009-12-28 16:13:14.000-0600

Okay, let's try using this patch.  This is not a fix, but it may help me track down the actual problem.  This should output a 'core show locks' to the WARNING log level when this problem occurs, so that I can better understand where the locking problem is occurring.  Note that you'll need to have DEBUG_THREADS turned on.

By: David Brillert (aragon) 2009-12-29 10:33:31.000-0600

Testing Asterisk SVN-branch-1.4-r236585M with 12282009_issue15609.diff.txt

By: David Brillert (aragon) 2009-12-29 10:40:00.000-0600

Didn't take long to reproduce the warnings with patch installed.
messages 12292009log.txt attached

By: Tilghman Lesher (tilghman) 2009-12-29 14:11:20.000-0600

Okay, yet another patch to test.  This one hopefully will also output to the messages log (or fix it, maybe).

By: David Brillert (aragon) 2009-12-29 15:16:28.000-0600

Tested the latest patch and I still see the exceptionally long.. warnings.
They appear during hangup still.
I only patched for 12292009_issue.15609.txt
I didn't see any lock info written to messages file... or any text resembling hopeful output in latest patch.

By: Tilghman Lesher (tilghman) 2009-12-29 15:53:01.000-0600

Anything in your logs, like:  "Wha? Can't tell the channel that a frame is pending?!" ?

By: David Brillert (aragon) 2009-12-29 16:29:58.000-0600

No

By: Tilghman Lesher (tilghman) 2009-12-29 16:45:15.000-0600

Okay, yet another patch uploaded.

By: David Brillert (aragon) 2009-12-29 17:41:40.000-0600

Patched and still warnings; new log uploaded 12292009log2.txt

By: Tilghman Lesher (tilghman) 2009-12-29 18:37:37.000-0600

Excellent; that was useful information.  I have still yet another patch uploaded for you.

By: David Brillert (aragon) 2009-12-29 19:11:35.000-0600

Excellent: I don't see any warnings with this patch.
Are you finished? ;)



By: Tilghman Lesher (tilghman) 2009-12-30 00:33:15.000-0600

Almost.  If you revert every change in that patchset except for the change to chan_local.c, do you get any warnings?  Put another way, does that one simple change fix the entire problem, or are the previous fixes also needed?

By: David Brillert (aragon) 2009-12-30 08:23:41.000-0600

I have been testing each patch by itself. So diff4 is the only patch installed and the warnings are gone. I don't know how to revert any changes to the patch.  If you would like me to just test changes to chan_local.c can you please post another patch and I will test that.

By: Tilghman Lesher (tilghman) 2009-12-30 08:54:13.000-0600

aragon: patch posted.

By: David Brillert (aragon) 2009-12-30 09:37:43.000-0600

tilghman: No warnings with chan_local.c only patch.
I tested with very high load, valgrind, database show, core show channels, and mysqlcheck

By: David Brillert (aragon) 2009-12-30 14:54:26.000-0600

tilghman: 7 hours now and no warnings with chan_local.c only patch in use.
This patch looks good.
What are next steps?



By: David Brillert (aragon) 2009-12-30 15:02:24.000-0600

tilghman: I see you have requested review
https://reviewboard.asterisk.org/r/452/

Thanks for the help, I like working with you and I really appreciate the effort.
I'd like to work with you again to fix ASTERISK-14994 :)

By: Digium Subversion (svnbot) 2009-12-30 15:57:11.000-0600

Repository: asterisk
Revision: 236981

U   branches/1.4/channels/chan_local.c

------------------------------------------------------------------------
r236981 | tilghman | 2009-12-30 15:57:10 -0600 (Wed, 30 Dec 2009) | 9 lines

Don't queue frames to channels that have no means to process them.
(closes issue ASTERISK-14558)
Reported by: aragon
Patches:
      20091230__issue16521__1.4__chan_local_only.diff.txt uploaded by tilghman (license 14)
Tested by: aragon

Review: https://reviewboard.asterisk.org/r/452/

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=236981

By: Digium Subversion (svnbot) 2009-12-30 15:59:19.000-0600

Repository: asterisk
Revision: 236982

_U  trunk/
U   trunk/channels/chan_local.c

------------------------------------------------------------------------
r236982 | tilghman | 2009-12-30 15:59:18 -0600 (Wed, 30 Dec 2009) | 16 lines

Merged revisions 236981 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
 r236981 | tilghman | 2009-12-30 15:57:10 -0600 (Wed, 30 Dec 2009) | 9 lines
 
 Don't queue frames to channels that have no means to process them.
 (closes issue ASTERISK-14558)
  Reported by: aragon
  Patches:
        20091230__issue16521__1.4__chan_local_only.diff.txt uploaded by tilghman (license 14)
  Tested by: aragon
 
 Review: https://reviewboard.asterisk.org/r/452/
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=236982

By: Digium Subversion (svnbot) 2009-12-30 16:00:26.000-0600

Repository: asterisk
Revision: 236983

_U  branches/1.6.0/
U   branches/1.6.0/channels/chan_local.c

------------------------------------------------------------------------
r236983 | tilghman | 2009-12-30 16:00:26 -0600 (Wed, 30 Dec 2009) | 23 lines

Merged revisions 236982 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r236982 | tilghman | 2009-12-30 15:59:18 -0600 (Wed, 30 Dec 2009) | 16 lines
 
 Merged revisions 236981 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r236981 | tilghman | 2009-12-30 15:57:10 -0600 (Wed, 30 Dec 2009) | 9 lines
   
   Don't queue frames to channels that have no means to process them.
   (closes issue ASTERISK-14558)
    Reported by: aragon
    Patches:
          20091230__issue16521__1.4__chan_local_only.diff.txt uploaded by tilghman (license 14)
    Tested by: aragon
   
   Review: https://reviewboard.asterisk.org/r/452/
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=236983

By: Digium Subversion (svnbot) 2009-12-30 16:00:45.000-0600

Repository: asterisk
Revision: 236984

_U  branches/1.6.1/
U   branches/1.6.1/channels/chan_local.c

------------------------------------------------------------------------
r236984 | tilghman | 2009-12-30 16:00:45 -0600 (Wed, 30 Dec 2009) | 23 lines

Merged revisions 236982 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r236982 | tilghman | 2009-12-30 15:59:18 -0600 (Wed, 30 Dec 2009) | 16 lines
 
 Merged revisions 236981 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r236981 | tilghman | 2009-12-30 15:57:10 -0600 (Wed, 30 Dec 2009) | 9 lines
   
   Don't queue frames to channels that have no means to process them.
   (closes issue ASTERISK-14558)
    Reported by: aragon
    Patches:
          20091230__issue16521__1.4__chan_local_only.diff.txt uploaded by tilghman (license 14)
    Tested by: aragon
   
   Review: https://reviewboard.asterisk.org/r/452/
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=236984

By: Digium Subversion (svnbot) 2009-12-30 16:17:13.000-0600

Repository: asterisk
Revision: 236985

_U  branches/1.6.2/
U   branches/1.6.2/channels/chan_local.c

------------------------------------------------------------------------
r236985 | tilghman | 2009-12-30 16:17:13 -0600 (Wed, 30 Dec 2009) | 23 lines

Merged revisions 236982 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r236982 | tilghman | 2009-12-30 15:59:18 -0600 (Wed, 30 Dec 2009) | 16 lines
 
 Merged revisions 236981 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r236981 | tilghman | 2009-12-30 15:57:10 -0600 (Wed, 30 Dec 2009) | 9 lines
   
   Don't queue frames to channels that have no means to process them.
   (closes issue ASTERISK-14558)
    Reported by: aragon
    Patches:
          20091230__issue16521__1.4__chan_local_only.diff.txt uploaded by tilghman (license 14)
    Tested by: aragon
   
   Review: https://reviewboard.asterisk.org/r/452/
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=236985

By: Aziz (azizzoaib786) 2014-07-18 04:47:24.437-0500

I am new to this forum please let me know which patch to download to overcome this issue ? and How can I download patch ?

Regards,
Aziz