Summary: ASTERISK-04144: * deadlocks after att. transfer into queue
Reporter: nrb (nrb)
Labels:
Date Opened: 2005-05-12 03:37:06
Date Closed: 2011-06-07 14:00:39
Priority: Blocker
Regression?: No
Status: Closed/Complete
Components: Applications/app_queue
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) bug_queue_log.txt
( 1) dump7.dat
( 2) queue_log_cvshead.txt
( 3) queue_log_stable.txt
Description: If agents make an attended transfer of an incoming call or queue call (Zap) to another queue, the queue application stops responding after a while.
This will eventually bring all communication to a halt.


****** ADDITIONAL INFORMATION ******

Incoming calls from Zap interface
Agents are logged on using AgentCallbackLogin on a Cisco 7960 (SIP), from which the att. transfer is also initiated.
The time between the attended transfer and the queue going into "non-responding mode" seems to be anywhere between "immediately" and several hours.
This behaviour has been observed on stable releases 1.0.2, 1.0.5 and 1.0.7.
This deadlock was triggered by extension 8162.
Comments:

By: Mark Spencer (markster) 2005-05-14 19:37:29

Does this occur with cvs head?

By: nrb (nrb) 2005-05-18 07:33:26

Tested against CVS HEAD (05.17.) with no errors observed. This is only an indication that the problem does not occur in CVS HEAD, since the deadlock has been observed kicking in hours after completion of the att. transfer.
The zombie channel involved in the att. transfer does not occur in the queue_log in cvs-head.

By: mhardeman (mhardeman) 2005-05-18 23:09:46

I'm afraid that this may actually be part of a larger SIP transfers issue.  I've noticed periodic deadlocks in * (stable, 1.0.7) that have been related to SIP transfers both blind and attended.  The issue seems to occur fairly randomly.  In particular, in 1.0.7, it can be frequently observed by performing blind transfers into the parking lot extension.  As for attended transfers, I've seen issues where a ghost sip channel remains after a failed transfer and the system frequently reports "Avoided deadlock on SIP/xxxx-yyyy...".  If anyone is interested in following up on these issues and you think this may be a related issue, drop me a line at mhardemn@papersoft.com.  I may be able to reproduce these scenarios and provide a dump.

By: nrb (nrb) 2005-05-20 06:22:43

I am not able to say whether there is a connection to any general transfer problems in 1.0.7.
When my * 1.0.7 deadlocks, I'm always able to find a zombie channel in the queue_log from somewhere between the last minute and the last 2 hours.
I'm aware that the zombie channels occur in relation to the transfers, and I guess that the problem is related to the way the queue application handles these zombie channels.
Is there any kind of additional information/data I could provide to make a clearer picture of the problem?

By: yamez (yamez) 2005-06-07 07:31:49

This happened on my office PBX yesterday, running CVS-v1-0-05/17/05-08:06:04.
I did not get a dump but did see "Avoided deadlock on SIP/xxxx-yyyy..."
in my logs. One thing to note is that my PBX is SIP-only, so you do not need a Zap device to reproduce this issue.

By: Clod Patry (junky) 2005-06-08 06:58:44

Even with the newest addition related to ast_channel_walk_locked and channel_find_locked (during last week-end), could you confirm it is still a problem?

And that's only on CVS-HEAD.

By: Michael Jerris (mikej) 2005-06-19 09:09:51

nrb, can you please test this on CVS HEAD, for a period of time and with the behavior that would normally produce a deadlock? We need to confirm whether this issue is in HEAD or only in 1.0.x before we can move forward.

By: nrb (nrb) 2005-06-20 03:13:32

During the last month or so, we haven't been able to reproduce this error on CVS HEAD. Therefore I'm almost certain that this issue is only related to the stable release.
The error is still 100% reproducible on stable.

By: Mark Spencer (markster) 2005-06-20 13:57:18

Not an issue in head.  This presumably was related to the ordering of locking.
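
For readers unfamiliar with lock-ordering deadlocks, here is a minimal generic sketch of the idea. It is not the change that went into HEAD, which this report does not show. The struct chan type and the lock_pair, unlock_pair and record_transfer helpers are hypothetical names made up for illustration. The point is that if every code path takes two locks in the same global order (here, by address), two threads can never each hold one lock while waiting for the other.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for a channel; not Asterisk's struct ast_channel. */
struct chan {
    pthread_mutex_t lock;
    int transfers;
};

/* Lock two distinct channels in a fixed global order (lowest address first),
 * regardless of the order in which the caller names them. */
static void lock_pair(struct chan *a, struct chan *b)
{
    assert(a != NULL && b != NULL && a != b);
    if ((uintptr_t)a > (uintptr_t)b) {
        struct chan *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(&a->lock);
    pthread_mutex_lock(&b->lock);
}

/* Release both locks; the unlock order does not matter for correctness. */
static void unlock_pair(struct chan *a, struct chan *b)
{
    pthread_mutex_unlock(&a->lock);
    pthread_mutex_unlock(&b->lock);
}

/* Update both channels while holding both locks.
 * Returns the combined transfer count after the update. */
static int record_transfer(struct chan *from, struct chan *to)
{
    int total;

    lock_pair(from, to);
    from->transfers++;
    to->transfers++;
    total = from->transfers + to->transfers;
    unlock_pair(from, to);
    return total;
}
```

With this scheme, record_transfer(&x, &y) and record_transfer(&y, &x) running in different threads both lock the lower-addressed channel first, so the classic A-then-B versus B-then-A deadlock cannot arise between them.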

By: Jennifer Hales (jennifer hales) 2005-06-24 03:38:52

Yes it does occur with cvs head.  We just had our system go nuts.

By: Michael Jerris (mikej) 2005-06-26 01:02:25

Jennifer, can you please open a bug for HEAD with the appropriate debug materials required (see the bug guidelines)? Due to the testing done on this, I do not believe you have the exact same issue.
Thanks.

By: nicolasg (nicolasg) 2005-06-30 14:37:29

I'm experiencing a similar problem with CVS-HEAD as of yesterday:

channel_find_locked: Avoided deadlock for '0x30603280', 10 retries!

The channel got stuck, the only way to solve this is by restarting asterisk.

I'm not sure what causes this. It might be a call pickup issue or a native sip redirect. We are using sipuras. I can't reproduce it yet... it's kind of random, but it happens every day.

Should I open a new bug for HEAD? What info is needed to track down the issue?

By: Olle Johansson (oej) 2005-08-14 11:46:48

Where are we with this bug report? Has anyone opened a new bug for HEAD? Anyone that works with this?

/Housekeeping

By: raarts (raarts) 2005-09-08 06:57:38

I am seeing what I think is the same problem on a production 1.0.2 system about twice a day. Symptoms: 'avoided deadlock' warning and all calls stop. You have to restart asterisk to get things going. This is a Dual Xeon 2.8 system with 2 x TE411P (6 E1 ports actually in use), and a GDT SCSI RAID controller.

The fun is: it started suddenly. We had changed two things on the system when it started:

- from a webpage people could query queue statuses (this used the Manager interface)
- We started using Local/xxx channels in queues.conf (instead of SIP/xxxxx)

First I disabled querying queue statuses, because the system seemed to lock up after this feature was used too much. That didn't seem to be helping.

I just reverted to using SIP/xxxx in queues.conf, I'll keep you posted.

By: Michael Jerris (mikej) 2005-09-08 07:03:54

I have seen issues like this related to chan_local in the past, and I have a feeling it may be the culprit. We need someone to create a backtrace off of cvs v1-0 (current, not 1.0.2, not 1.0.9) when this happens. Asterisk will need to be built with "make valgrind" so that it is not optimized. Initially we will need at least a "bt full" and a "thread apply all bt". Thanks.

By: lters (lters) 2005-09-08 09:01:48

The post by raarts sounds exactly like our setup.
Except that we still use the local channels.
We run CVS 11-15-04; the lock warning happens sometimes, but the system does not hang.
Raarts, how do you use sip clients in the queue.conf? How do you set up agents that way? I would love to use a newer * but can't because of this problem.

By: Michael Jerris (mikej) 2005-09-08 09:11:50

This issue is in regards to the cvs v1-0 branch, not cvs head.  Why would using a newer cvs head be an issue?

By: lters (lters) 2005-09-08 09:17:52

I tried it with cvs head of August 5 and had the exact same problem.
It was very serious, too: it would hang Asterisk. I opened a trouble ticket with Digium, but they could not fix it, since I did not have a core dump available. :(
As the queue tried to send calls to Local/ agents, it would start logging "avoided deadlock" warnings, and then within a matter of minutes all channels would lock. If you ran "show channels", the CLI would hang.

By: Michael Jerris (mikej) 2005-09-08 09:20:26

Then that is a separate issue from this one, as it occurs in HEAD, and this issue is specifically known not to. If you have another issue, please open a proper, complete bug report with backtraces (from an unoptimized build) so that it can be addressed.

By: raarts (raarts) 2005-09-08 09:32:42

> How do you use sip clients in the queue.conf?

We are using the brute force technique: for every agent that logs in or out, we rewrite the queues.conf, and issue an asterisk reload. We don't use agents.conf nor the AgentLogin etc apps.

By: raarts (raarts) 2005-09-08 09:40:23

MikeJ, I will try to create a backtrace, but how exactly do you want it, because asterisk does not segfault. This will be a lot of work for me, because we have patched asterisk 1.0.2 with bri-stuff, and are running production with that.

I have been flamed recently for entering issues for such a system, so I am planning to revert the bri-stuff patches and move to a recent 1.0.x. But this means a lot of work for us.

But in the meantime, can you tell me how I can produce the stuff you want? Should I wait until the system seems to not accept any more calls, and then send a SIGSEGV to the asterisk process to let it dump core, do a bt on that, and send you the dump as well? Or just attach gdb and do the bt?

By: Michael Jerris (mikej) 2005-09-08 09:46:39

There is a readme in CVS HEAD that explains how to attach to a running Asterisk process and what to do when it deadlocks. Preferably I would want to see a backtrace off of current unpatched CVS HEAD, not the 1.0 branch.

By: nrb (nrb) 2005-09-29 11:04:59

This issue seems to be solved in CVS and beta1, and it's probably not going to be fixed in stable.
If it's not going to be fixed in stable, we might as well close it.
/nrb

By: Vivian (k9p4) 2005-10-01 22:34:52

I have installed 1.2.0beta1.
Get the following errors : channel.c:709 channel_find_locked: Avoided initial deadlock for '0x818b468', 10 retries!
We have noticed that this happens only when the caller enters the queue.
The Local/xxxx channel dials the respective agent and this error occurs.
Any pointers or patches are appreciated; I presume that this will be resolved before 1.2 goes stable.

By: Armando Leal (arleal) 2005-10-15 07:21:30

I also installed 1.2.0 beta1 and have the same problem:
Oct 11 13:24:54 WARNING[4859] channel.c: Avoided initial deadlock for '0x81fef40', 10 retries!
This problem happens when people enter queues and agents answer. After a while it locks everything, and I have been restarting.



By: Armando Leal (arleal) 2005-10-15 07:58:00

I get this every time the queue sends a call to an available agent.
Oct 15 07:40:31 DEBUG[2499]: channel.c:699 channel_find_locked: Avoiding initial deadlock for 'SIP/506-e34b'
Oct 15 07:40:31 DEBUG[2499]: channel.c:699 channel_find_locked: Avoiding initial deadlock for 'SIP/506-e34b'
Oct 15 07:40:31 DEBUG[2499]: channel.c:699 channel_find_locked: Avoiding initial deadlock for 'SIP/506-e34b'
Oct 15 07:40:31 DEBUG[2499]: channel.c:699 channel_find_locked: Avoiding initial deadlock for 'SIP/506-e34b'
Oct 15 07:40:31 DEBUG[2499]: channel.c:699 channel_find_locked: Avoiding initial deadlock for 'SIP/506-e34b'
Oct 15 07:40:31 DEBUG[2499]: channel.c:699 channel_find_locked: Avoiding initial deadlock for 'SIP/506-e34b'
Oct 15 07:40:31 DEBUG[2499]: channel.c:699 channel_find_locked: Avoiding initial deadlock for 'SIP/506-e34b'
Oct 15 07:40:31 DEBUG[2499]: channel.c:699 channel_find_locked: Avoiding initial deadlock for 'SIP/506-e34b'
Oct 15 07:40:31 DEBUG[2499]: channel.c:699 channel_find_locked: Avoiding initial deadlock for 'SIP/506-e34b'
Oct 15 07:40:31 DEBUG[2499]: channel.c:699 channel_find_locked: Avoiding initial deadlock for 'SIP/506-e34b'
Oct 15 07:40:31 WARNING[2499]: channel.c:709 channel_find_locked: Avoided initial deadlock for '0x817da18', 10 retries!
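
The log pattern above (ten "Avoiding initial deadlock" DEBUG lines followed by one "10 retries!" WARNING) is what a bounded try-lock loop looks like from the outside. The following is a minimal sketch of that general technique, not Asterisk's channel_find_locked(). The lock_with_retries helper, its messages, and the choice to yield the CPU between attempts are all assumptions made for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Hypothetical helper: try to take `lock` without ever blocking.
 * Returns the number of failed attempts before success (0 means the lock
 * was free on the first try), or -1 if every attempt found it busy.
 * On success the caller holds the lock and must unlock it. */
static int lock_with_retries(pthread_mutex_t *lock, int max_retries)
{
    int attempt;

    assert(lock != NULL && max_retries > 0);
    for (attempt = 0; attempt < max_retries; attempt++) {
        if (pthread_mutex_trylock(lock) == 0)
            return attempt;
        /* Someone else holds the lock: log, back off, and try again
         * instead of sleeping inside pthread_mutex_lock(). */
        fprintf(stderr, "DEBUG: avoiding deadlock, attempt %d\n", attempt + 1);
        sched_yield(); /* assumption: a real implementation may sleep instead */
    }
    fprintf(stderr, "WARNING: gave up after %d retries\n", max_retries);
    return -1;
}
```

A loop like this keeps the calling thread from blocking forever, but it does not remove the underlying contention: if the lock holder never releases, every caller simply gives up after its retries, which would be consistent with the "system keeps running but no calls are processed" symptom reported in this thread.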

By: Vivian (k9p4) 2005-10-15 19:11:12

I have noticed that this problem 'avoided deadlocks' occurs only after the queue has run for a while, no specific duration though.
I am now testing with a version of Asterisk for which I have no CVS date.
We have successfully tested 1000 calls into the queue over the last 48 hours without restarting Asterisk, with only 3 dynamic agents handling calls.
I am trying to narrow down the problem and will post results soon.

By: Armando Leal (arleal) 2005-10-17 04:19:28

I think the error is with callbackagent. I set up queues without agents, just adding them dynamically, and the deadlock warning does not appear when calling someone inside the queue, so it is more related to agents and callbackagents than to queues.

By: Armando Leal (arleal) 2005-10-20 10:08:30

I just noticed something today: if I call queues with SIP/ members there are no deadlocks, but if I call them with Local/ members the warnings appear each time.


messages:Oct 20 07:45:33 WARNING[2324] channel.c: Avoided initial deadlock for '0x41000c10', 10 retries!
messages:Oct 20 09:39:28 WARNING[2324] channel.c: Avoided initial deadlock for '0x81d2f10', 10 retries!
messages:Oct 20 09:51:09 WARNING[2324] channel.c: Avoided initial deadlock for '0x81cea68', 10 retries!
messages:Oct 20 09:55:02 WARNING[2324] channel.c: Avoided initial deadlock for '0x81c2d20', 10 retries!
messages:Oct 20 09:55:20 WARNING[2324] channel.c: Avoided initial deadlock for '0x81cb4c8', 10 retries!

score*CLI> show queues
cobranza     has 0 calls (max unlimited) in 'rrmemory' strategy (12s holdtime), W:3, C:37, A:53, SL:2.7% within 0s
  Members:
     Local/504 (dynamic) (available) has taken 5 calls (last was 66 secs ago)
  No Callers


So maybe that is where the problem is.

By: raarts (raarts) 2005-10-20 10:31:43

Yes, there definitely is a problem with Local. When we were using
the Local channel, we experienced deadlocks twice a day. Since we
stopped using them we experience them once every three days.

So there are multiple causes for deadlocks.

By: Armando Leal (arleal) 2005-11-02 09:08:29.000-0600

Is this bug fixed in beta-2? Let me try it out!

By: Chris A. Icide (cicide) 2005-11-04 11:57:06.000-0600

I'm getting this error under CVS Head from 2005-11-03 (pulled it around 8pm Pacific time).

I was getting it before under HEAD from 2005-08-28. I've not been able to track it down to anything specific yet. I have a pretty complicated dialplan and a high traffic load. It seems to start intermittently with a few 'channel.c:783 channel_find_locked:' messages; shortly thereafter I get them any time a call is attempted. The system continues to run, but no calls can be processed, and issuing a stop or restart command at the CLI results in a CLI that takes commands but doesn't do anything.

By: Russell Bryant (russell) 2005-11-15 14:03:24.000-0600

If anyone is still seeing a problem in cvs head or 1.2, please open a new bug.