
Summary: ASTERISK-13302: Agent shows "(In use)" and will not receive queue calls while agent is logged in waiting for queue calls (1.4.22)
Reporter: Nathan Stocks (nathan)
Labels:
Date Opened: 2009-01-05 13:54:07.000-0600
Date Closed: 2009-01-27 16:01:13.000-0600
Priority: Major
Regression? No
Status: Closed/Complete
Components: Applications/app_queue
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments: ( 0) 14173.patch
             ( 1) last1.75m.bz2
Description: I have a queue with approximately 50 total agents, about 20 of which will be logged in at any given time.  Seemingly randomly (if there's a pattern, I haven't discovered it), sometimes after an agent completes an inbound call the agent status (as displayed in "show queue x") changes to "(In use)" and the agent never again receives a call until that agent logs off and logs back into the queue.

This will happen about 15-20 times every day with seemingly random agents at seemingly random times.  It happens when there's only one agent logged in or when we have a full 20 logged in.  I can't identify a pattern that would cause this problem.

This causes a huge problem, because we are missing calls and breaking our SLA agreements with clients even though we have enough capacity and people sitting waiting for calls.  As a workaround we are trying to get each agent to log off the queue and log back on the queue after each call, but they often forget.

****** ADDITIONAL INFORMATION ******

The phones are all Polycom 550's.  They are provisioned via dhcp and tftp and communicate with Asterisk via SIP.

Here is a sample of my sip.conf:

[general]
context=internal
allowguest=no
bindport=5060
bindaddr=0.0.0.0
srvlookup=no
localnet=10.0.0.0/255.0.0.0
mohinterpret=default
useragent=Phone Server
callevents=yes
canreinvite=no
allowtransfer=yes
;rfc2833compensate=on
tos_sip=cs3 ; recommended by docs/ip-
g726nonstandard=yes
allowsubscribe=yes
;subscribecontext = default
notifyringing = yes
notifyhold = yes
limitonpeers = yes
type=friend

[authentication]

[5704]
mailbox=5704@default
type=friend
secret=register5704
nat=never
host=dynamic
port=5060
dtmfmode=rfc2833
callerid="Ben Erickson" <5704>
qualify=2000
call-limit=500
progressinband=no

(snip)

Here's a sample of my agents.conf:

[general]
persistentagents=yes
;multiplelogin=yes

[agents]
;maxlogintries=3
;autologoff=15
;autologoffunavail=no
;ackcall=no
;endcall=yes
wrapuptime=5000
musiconhold => silence
;agentgoodbye => goodbye_file
updatecdr=yes

agent => 3003,5704,5704-Ben Erickson (SPQ)

(snip)


Here's my queue config:

[general]
persistentmembers = yes
autofill = yes
monitor-type = MixMonitor

[supportq]
musicclass = default
strategy = leastrecent
servicelevel = 20
context = exit_supportqueue_digits
timeout = 5
wrapuptime=0
announce-frequency = 0
periodic-announce=sm/support-queue-repeating-message
periodic-announce-frequency=60
announce-holdtime = no
announce-round-seconds = 1
monitor-format = wav
monitor-type = MixMonitor
joinempty = yes
leavewhenempty = no
eventwhencalled = yes
eventmemberstatus = yes
reportholdtime = no
ringinuse = no
memberdelay = 0

member => Agent/3003,0
(snip)


Please let me know if there is any other useful information I could provide.
Comments:

By: David Brillert (aragon) 2009-01-05 16:00:13.000-0600

You should probably look at this bug report
http://bugs.digium.com/view.php?id=12127

And check for similar bugs already reported in the bug tracker.

By: Leif Madsen (lmadsen) 2009-01-06 08:47:35.000-0600

You will probably also want to look into this:

http://reviewboard.digium.com/r/116/

By: Leif Madsen (lmadsen) 2009-01-06 08:48:18.000-0600

Assigning this to putnopvut in case he wishes to do something with this bug report (closed? keep open until r116 makes it in?). Thanks!

By: Nathan Stocks (nathan) 2009-01-06 10:59:44.000-0600

I read both the bug report and patch review above.  Though my symptoms are quite different from theirs (agents getting no calls instead of devices getting multiple calls at once), I'm assuming that you're indicating that the "queue state" fix should fix my problem as well.

So my question is:  How do I get the changes?  Should I just wait for 1.4.23?  Should I download the linked patch and apply it against 1.4.22?  What's recommended?  I need to get this fixed...

By: Leif Madsen (lmadsen) 2009-01-06 11:23:37.000-0600

This would not go into 1.4.23 as it has already been marked with release candidates. The earliest this would go in would be 1.4.24-rc1, assuming the development community wishes it to be so.

You can get the patch from the reviewboard link I mention a couple notes up. There is a 'View Diff' in the upper right hand corner, then there is a link for "Download Patch" on that page.

By: Mark Michelson (mmichelson) 2009-01-06 13:20:35.000-0600

I'm not 100% sure if that patch on reviewboard is applicable to this issue, to be honest. The issues it attempts to solve are typically 100% reproducible, whereas this one occurs randomly. That makes it a bit difficult to debug, since it can't be predicted when it will happen.

The best thing we can get right now is a debug trace which shows device state changes. In order to get this, you'll need to set the core debug level to 3. Given the amount of traffic that you sound like you have, this will probably result in a huge logfile of messages, but this is probably our best bet. When you upload the log file, please be sure to tell which agent was the one that got "stuck" in use.

In the meantime, I'll do some code inspection, but this sounds like it may be a race condition that will not be easy to detect through static analysis.

By: Nathan Stocks (nathan) 2009-01-06 14:40:52.000-0600

Ok, I patched my 1.4.22 with the state-interface patch linked above (all hunks succeeded with varying offsets) and have compiled and installed it.  I'll restart asterisk tonight and see if the issues disappear tomorrow.

By: Nathan Stocks (nathan) 2009-01-06 14:42:42.000-0600

@putnopvut

If the issue is not fixed tomorrow, I'll get you that debug trace.

(I wish I could get mantis to send me email notifications when comments are added to my bugs.  I've turned on ALL the email options in preferences, yet...nothing)

By: Leif Madsen (lmadsen) 2009-01-06 21:53:37.000-0600

nathan: you may want to file a bug under the 'Mantis' project then with as much information as possible about the non-emailing thing.

By: Mark Michelson (mmichelson) 2009-01-09 08:54:29.000-0600

Mantis should be properly sending out e-mails to you now, nathan. In fact, you should see this note in your Inbox when I get done typing it :)

By: Nathan Stocks (nathan) 2009-01-09 11:23:53.000-0600

putnopvut: I got the email that time!

By: Nathan Stocks (nathan) 2009-01-09 11:29:12.000-0600

After testing the patch for two full business days, there doesn't seem to be any noticeable change in behaviour at all.  We are still experiencing queue agents who, after completing a queue call, will never receive another call until they logout and login.  :-(

How do I produce a useful debug trace?  I know I have to set core debug level to 3, but I assume that increases output to /var/log/asterisk/messages, which is already 250MB on my machine...  Is there a way for me to roll that log file and start fresh so that it's not such a nasty upload for you guys?

By: David Woolley (davidw) 2009-01-09 12:04:18.000-0600

Rename it and do a module reload on logging.
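
The rename-then-reload step can be scripted. The following is a minimal sketch, not an official procedure: the helper name `rotate_log` and its injectable `run` parameter are hypothetical, and it assumes that the `logger reload` CLI command is what "a module reload on logging" refers to and that it makes Asterisk reopen its log files.

```python
import os
import subprocess
import time


def rotate_log(path, run=subprocess.check_call):
    """Rename the current log file, then ask Asterisk to reopen its logs.

    `run` is injectable so the sketch can be exercised without Asterisk.
    """
    archived = "%s.%s" % (path, time.strftime("%Y%m%d-%H%M%S"))
    os.rename(path, archived)
    # Assumption: "logger reload" makes Asterisk reopen (recreate) its log files.
    run(["asterisk", "-rx", "logger reload"])
    return archived
```

Calling `rotate_log("/var/log/asterisk/messages")` as a user allowed to run the Asterisk CLI would leave the old log under a timestamped name and start a fresh one.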

By: Alex Villacís Lasso (a_villacis) 2009-01-09 16:42:52.000-0600

I am one of the members of the Elastix development team (www.elastix.org) that packages Asterisk 1.4.x as the core of the Elastix telephony distro.

We have a package that implements a call center dialer with configuration web interface, and we have seen this problem with at least two different customers. Our dialer program checks the output of "queue show" to decide whether one or more agents are idle, and then generates calls with the Originate() command through the Asterisk manager interface. When this bug happens, the agent is stuck with the "in use" flag while really idle, but "agent show" lists it as idle. We have tried a workaround (SVN only) that queries both sources and only lists an agent as busy when both sources agree that a particular agent is busy. While this does route the call regardless of "queue show" status, sometimes (most of the time) this results in the call being considered as connected by Asterisk, but the agent just hears the queue background music instead of the connected call. However, relevant events (Join, Link, UnLink, Hangup) are delivered normally to the dialer application.
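
The "both sources must agree" check described above can be sketched roughly as follows. This is only an illustration, not the Elastix dialer's actual code: the function names are hypothetical, and plain sets stand in for whatever the dialer parses out of "queue show" and "agent show".

```python
def busy_agents(queue_busy, agent_busy):
    """Return agents that BOTH status sources report as busy.

    queue_busy: agents flagged as in use by "queue show" (hypothetical parsed input)
    agent_busy: agents flagged as busy by "agent show" (hypothetical parsed input)
    """
    return set(queue_busy) & set(agent_busy)


def idle_agents(logged_in, queue_busy, agent_busy):
    """Agents the dialer may call: logged in and not busy in both sources."""
    return set(logged_in) - busy_agents(queue_busy, agent_busy)
```

An agent stuck "in use" in only one of the two sources is treated as idle, which is exactly why the dialer can still route a call to it.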

This bug sounds similar to this bug report, except that this is for 1.6.0.3: http://bugs.digium.com/bug_view_advanced_page.php?bug_id=14139 . However this one is marked as fixed in SVN. Maybe this fix is worth investigating...

nathan, could you please try a different protocol for the IP phone (other than SIP, that is), and check whether the bug occurs in that other protocol as well?

By: Leif Madsen (lmadsen) 2009-01-10 13:00:05.000-0600

nathan:  just add a new entry to logger.conf with the required debugging options, i.e.

queue_tracker => warning,notice,verbose,debug

By: Leif Madsen (lmadsen) 2009-01-13 12:06:49.000-0600

Waiting on some additional information here, so switching this status to Feedback.

By: Nathan Stocks (nathan) 2009-01-13 16:19:29.000-0600

Ok, so I added the queue_tracker entry to logger.conf as was suggested.  Further, I created a little script[1] to look at my queue "supportq" every 5 seconds and spit out some output[2] whenever the number of agents marked "In use" changes (which is the best indicator of an agent suffering the symptom that I can find).

I am also attaching (or will attach after this note) the "queue_tracker" log file that covers the same time period of the output[2] below (and more).


[1] The script I wrote...
#!/usr/bin/env python3

import subprocess
import time

last_count = 0


def get_count():
    """Print the "In use" members of supportq whenever their number changes."""
    global last_count
    # Ask the running Asterisk for the queue status.
    result = subprocess.run(
        ["asterisk", "-r", "-x", "queue show supportq"],
        capture_output=True, text=True)
    # Keep only the "In use" lines (replaces the external grep).
    in_use = [line for line in result.stdout.splitlines() if "In use" in line]
    new_count = len(in_use)  # 0 when no member is marked "In use"
    if last_count != new_count:
        print("--------------------")
        print(time.strftime("%Y-%m-%d %H:%M:%S"))
        print("Lines: %d -> %d" % (last_count, new_count))
        print("New Output:\n%s" % "\n".join(in_use))
        last_count = new_count


while True:
    get_count()
    time.sleep(5)


[2] The output of running the script for a little while...
--------------------
2009-01-13 15:12:08
Lines: 0 -> 1
New Output:
     Agent/3044 (In use) has taken 3 calls (last was 117 secs ago)
--------------------
2009-01-13 15:12:13
Lines: 1 -> 2
New Output:
     Agent/3044 (In use) has taken 3 calls (last was 123 secs ago)
     Agent/3056 (In use) has taken 5 calls (last was 15 secs ago)
--------------------
2009-01-13 15:12:25
Lines: 2 -> 3
New Output:
     Agent/3044 (In use) has taken 3 calls (last was 134 secs ago)
     Agent/3055 (In use) has taken 6 calls (last was 62 secs ago)
     Agent/3056 (In use) has taken 5 calls (last was 26 secs ago)
--------------------
2009-01-13 15:12:36
Lines: 3 -> 4
New Output:
     Agent/3037 (In use) has taken 3 calls (last was 450 secs ago)
     Agent/3044 (In use) has taken 3 calls (last was 145 secs ago)
     Agent/3055 (In use) has taken 6 calls (last was 73 secs ago)
     Agent/3056 (In use) has taken 5 calls (last was 37 secs ago)
--------------------
2009-01-13 15:12:58
Lines: 4 -> 3
New Output:
     Agent/3044 (In use) has taken 3 calls (last was 167 secs ago)
     Agent/3055 (In use) has taken 6 calls (last was 95 secs ago)
     Agent/3056 (In use) has taken 5 calls (last was 59 secs ago)
--------------------
2009-01-13 15:14:09
Lines: 3 -> 4
New Output:
     Agent/3044 (In use) has taken 3 calls (last was 239 secs ago)
     Agent/3041 (In use) has taken 2 calls (last was 5 secs ago)
     Agent/3055 (In use) has taken 6 calls (last was 167 secs ago)
     Agent/3056 (In use) has taken 5 calls (last was 131 secs ago)
--------------------
2009-01-13 15:14:20
Lines: 4 -> 5
New Output:
     Agent/3044 (In use) has taken 3 calls (last was 250 secs ago)
     Agent/3041 (In use) has taken 2 calls (last was 16 secs ago)
     Agent/3055 (In use) has taken 6 calls (last was 178 secs ago)
     Agent/3042 (In use) has taken 7 calls (last was 2 secs ago)
     Agent/3056 (In use) has taken 5 calls (last was 142 secs ago)
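
To narrow down which agent is stuck, the snapshots above can be compared mechanically. This is a rough sketch with hypothetical helper names: it only assumes the line format shown in the output above and treats "marked In use in every snapshot" as a hint, not as proof that the agent is stuck.

```python
import re

# Matches lines such as: "Agent/3044 (In use) has taken 3 calls (last was 117 secs ago)"
MEMBER_RE = re.compile(r"(Agent/\d+) \(In use\) has taken (\d+) calls")


def in_use_agents(snapshot):
    """Return the set of agents marked "In use" in one block of script output."""
    return {m.group(1) for m in MEMBER_RE.finditer(snapshot)}


def always_in_use(snapshots):
    """Agents that appear as "In use" in every snapshot (candidates for 'stuck')."""
    sets = [in_use_agents(s) for s in snapshots]
    return set.intersection(*sets) if sets else set()
```

Feeding it the blocks printed above, one string per timestamped block, singles out the agents that never left the "In use" state during the run.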

By: Nathan Stocks (nathan) 2009-01-13 16:43:27.000-0600

Ok, I attached the file last1.75m.bz2, which is the last 1.75 million lines of queue_tracker, which is about 10 minutes including the time that I ran my little debug script.

I would have included more, but that was as big as I could get it while still getting the bzipped file to fit under the 2000K limit.  (not 2048K=2MB, mind you--sort of annoying).

@a_villacis: I would prefer getting an actual fix over trying to migrate to a different protocol at this point, but if we can't get the problem resolved any other way then I will certainly try changing communication protocols.

By: Mark Michelson (mmichelson) 2009-01-13 16:58:04.000-0600

nathan: Thank you very much for your input here. What I need to know before I dive into this large log file is if the problem occurred during the time that you were logging and if it did, which agent was the one that got "stuck."

Based on the output from your python script, I'm going on the assumption that this happened for Agent/3044, since he was consistently "In use" throughout the run, but I just want to be certain. Thanks!

By: Mark Michelson (mmichelson) 2009-01-13 17:44:26.000-0600

Ah, I think I realize what you are showing me now. Agents don't typically ever report "In use" as their status. They are typically either "Unavailable," "Not in Use," or "Busy." If they're reported as "In use" at all, it's probably indicative of something wrong occurring.

...that is unless you have installed the state_interface patch that blitzrage directed you to early on in this process. If you have installed it and are getting your device state updates from the specific SIP endpoints instead of the Agent channels, then seeing an Agent marked as "in use" would be perfectly valid since SIP channels will report "in use."

By: Mark Michelson (mmichelson) 2009-01-13 18:06:02.000-0600

Wow, the device state code for chan_agent.c changed quite a bit between 1.4.21.2 and 1.4.22 and it wouldn't surprise me if the new "inherited device state" logic added in there might have some flaws. I'll take a closer look at its operation and see what I can figure out.

By: Mark Michelson (mmichelson) 2009-01-13 18:11:20.000-0600

nathan: I have some somewhat good news for you. I was able to reproduce the problem myself. The bad news was that I was totally unprepared for this to happen to me since all I was trying to do was make a simple setup to test this. I had been under the assumption that this problem was some sort of load-related problem, but if I can make it happen with just a few calls to a single logged-in agent, this may be easier to track down than I thought.

By: Mark Michelson (mmichelson) 2009-01-13 18:39:07.000-0600

After doing several tests, here's some data I've come up with.

1. With my current setup, an Agent can take exactly one call. After that, he will never get a "beep" from the queue. Using "queue show" I see that his status is "not in use."

2. I see the Agent's status permanently change to "in use" if the phone he is using re-registers with Asterisk.

Have you also seen the behavior that an agent can take exactly one call before not being able to accept any more?

By: Mark Michelson (mmichelson) 2009-01-13 18:44:52.000-0600

Okay, after further testing, the reason my agent would only be served with one call was because I was using the leastrecent strategy. After taking a call, he was no longer the least recent, so my call was being repeatedly offered to the only other queue member I had defined.

So, the phone registering is the only solid lead I have right now, but it is very consistent.
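
The leastrecent effect described above is easy to see in a toy model. This is a deliberately simplified sketch with hypothetical member names: it ignores penalties and everything else app_queue takes into account, and only picks the member whose last call lies furthest in the past.

```python
def pick_leastrecent(members):
    """Pick the member whose last call ended longest ago.

    members: dict mapping member name -> timestamp of last call (0 = never called).
    """
    return min(members, key=lambda name: members[name])


members = {"Agent/1": 0, "Agent/2": 0}  # hypothetical members, neither called yet
first = pick_leastrecent(members)       # tie: the first entry wins
members[first] = 100                    # that member takes a call ending at t=100
second = pick_leastrecent(members)      # the other member is now least recent
```

As long as the second member never answers, its timestamp never advances, so every later call keeps being offered to it.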

By: Mark Michelson (mmichelson) 2009-01-13 18:48:36.000-0600

By the way, since agents that are on a call report as "busy" rather than "in use," a workaround for your problem would be to set ringinuse=yes for supportq. I found that agents who are busy will still not be called, and if their device state becomes stuck at "in use," they can still receive calls.

This is not intended to be a permanent solution, but it should help you until this problem is resolved.
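
Why the workaround helps can be shown with a small decision function. This is a toy model of the behaviour reported in this comment, not the real app_queue logic, and the function name is hypothetical.

```python
def may_ring(state, ringinuse):
    """Toy model: should the queue offer a call to a member in this device state?"""
    if state == "not in use":
        return True
    if state == "in use":
        return ringinuse  # only rung when ringinuse=yes
    return False  # "busy", "unavailable", ...: not rung either way


# An agent wrongly stuck at "in use" is skipped with ringinuse=no ...
stuck_skipped = not may_ring("in use", ringinuse=False)
# ... but is rung again with ringinuse=yes, while a busy agent still is not.
stuck_rung = may_ring("in use", ringinuse=True)
```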

By: Nathan Stocks (nathan) 2009-01-14 12:39:42.000-0600

putnopvut: Thanks for all your work!  I have changed ringinuse=yes on the queue.  I've done "module reload" and asked all of my agents to log off and then log on again and then let me know if they experience any further problems.  I would be satisfied with a working workaround for now :-)

In response to your earlier questions/comments:

> Ah, I think I realize what you are showing me now. Agents don't typically ever report "In use" as their status. They are typically either "Unavailable," "Not in Use," or "Busy." If they're reported as "In use" at all, it's probably indicative of something wrong occurring.

Exactly.  They should never report "In use" at all in my experience.  That doesn't correspond to any known valid state that I know of [agents are either logged off (unavailable), logged on and not on a call (available), or logged on and on a call (unavailable)], and all of the people who have ever reported the problem have been in the "In use" state when I've looked at them.

> ...that is unless you have installed the state_interface patch that blitzrage directed you to early on in this process. If you have installed it and are getting your device state updates from the specific SIP endpoints instead of the Agent channels, then seeing an Agent marked as "in use" would be perfectly valid since SIP channels will report "in use."

I _do_ have the state_interface patch installed currently, but it has had no noticeable effect on the bug or the frequency of "In use" showing up.  I can easily unpatch and reinstall if you'd like me to.

> I had been under the assumption that this problem was some sort of load-related problem...

It doesn't appear to be load-related.  It is most devastating when it occurs to my 2:00am shift when there's only a couple of people on the phones.  We had one stretch where the only person on the phones for a couple hours experienced this bug and inadvertently missed over 20 calls thinking it was just a quiet night...

Thanks for all your help so far!

~ Nathan

By: Nathan Stocks (nathan) 2009-01-14 12:41:46.000-0600

Oops, in my first response to one of your comments:

...or logged on and on a call (busy)...

By: Mark Michelson (mmichelson) 2009-01-14 13:16:41.000-0600

Based on what I was seeing here, it makes perfect sense that this would happen to your 2:00 AM shift people more often, since I assume they are called much less frequently than your other workers. What I see is that if a SIP phone registers while an agent is not on a call, then the new logic in the agent channel driver to "inherit" device state from the underlying SIP device causes the Agent to become "in use." If an agent is currently on a call when this occurs, then it is not a problem because when the agent ends the call by pressing '*' he effectively overrides this "in use" status. Thus it is my opinion that this bug is inversely load-based.

The reason for this is that to the SIP channel driver, since the person is off hook and on a call to the AgentLogin application, the phone appears to be "in use." While this is perfectly valid for describing the SIP endpoint, it is not accurate when describing the Agent channel. When the SIP phone re-registers, this is an instance where the SIP channel driver tells the core that the device state has changed. The Agent channel driver is hooked into this state change and thus inherits the "in use" status.

So first off, the workaround I told you should work, but I also have another one in mind in case you don't like the first one I told you. The other workaround is to remove the call-limit from all your sip peers. This will cause the SIP channel driver to never report an "in use" device state and therefore never cause the Agent channel driver to inherit such a state.

My thinking when solving this is to turn this device state inheritance off for Agents who have called in using the AgentLogin application (as opposed to those who call in with AgentCallbackLogin). The reason why is that at all times, no matter what, the underlying channel driver will think that the phone is in use, so inheriting this device state makes no sense at all. The tricky part about this is that the Agent channel driver is written so badly that it is not easy to distinguish between the two types of agents. As soon as I have a patch ready, I will test it and put it up here for you to try as well.
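
The failure mode and the proposed direction can be illustrated with a toy model. This is not chan_agent.c code and not the contents of any patch: the class, its attributes, and its method names are hypothetical and only mirror the reasoning in this comment.

```python
class Agent:
    """Toy agent channel; not the real chan_agent.c data structure."""

    def __init__(self, callback, inherit_for_always_on):
        self.callback = callback  # True: AgentCallbackLogin, False: AgentLogin
        self.inherit_for_always_on = inherit_for_always_on  # True models 1.4.22
        self.state = "not in use"

    def on_sip_state_change(self, sip_state):
        """Called when the underlying SIP device reports a state change."""
        if self.callback or self.inherit_for_always_on:
            self.state = sip_state  # inherit the device state

    def end_call(self):
        """Agent finishes a queue call; the agent's own state wins again."""
        self.state = "not in use"


# The phone re-registers while an always-on agent sits idle in AgentLogin.
buggy = Agent(callback=False, inherit_for_always_on=True)
buggy.on_sip_state_change("in use")   # stuck: inherits "in use"
fixed = Agent(callback=False, inherit_for_always_on=False)
fixed.on_sip_state_change("in use")   # ignored: stays "not in use"
```

In the buggy case nothing resets the state until the agent takes and ends another call, which can never happen because the queue no longer offers calls to that agent.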

Thanks for all the information you have provided on this.

By: Mark Michelson (mmichelson) 2009-01-14 13:17:10.000-0600

Since the ball is clearly in my court now, I'm moving this out of the "feedback" status.

By: Nathan Stocks (nathan) 2009-01-14 13:53:59.000-0600

What's the purpose of the call-limit parameter on SIP configs?  Back when I first started on Asterisk I inherited a 1.0 system that I cleaned up, then migrated to 1.2, and now to 1.4, and I'm not sure what purpose the call-limit serves in the first place, or if I need it at all.

Here's what one of my typical SIP configs looks like:

[5621]
mailbox=5621@default
type=friend
secret=somepassword
nat=never
host=dynamic
port=5060
dtmfmode=rfc2833
callerid="John Doe" <5621>
qualify=2000
call-limit=500
progressinband=no

(If it's not really relevant to the problem, feel free to ignore this comment or just address it later -- I'd rather not distract from the real issue)

By: Mark Michelson (mmichelson) 2009-01-14 13:59:05.000-0600

The call-limit is essentially exactly what it sounds like. It's the number of simultaneous calls that that SIP endpoint can have at any given time. For most SIP phones, a good number to use for this is the number of "lines" that the SIP phone supports. If you have a SIP provider listed as one of your SIP peers in sip.conf, then the call-limit would limit how many calls you can have to/from that provider at any given time.

The call-limit also serves a second purpose. It is used to determine the device state of a SIP endpoint. The basic rule is that if no calls are happening, then the endpoint is "not in use." If the endpoint has some calls toward the limit, then the endpoint is "in use." If the endpoint is using all of the calls toward the limit, then it is "busy."

For your situation in particular, by setting a call-limit, it meant that when an agent was on the phone called into the AgentLogin application, it was using one call towards its limit of 500, so it would report as being "in use."

By: Mark Michelson (mmichelson) 2009-01-14 14:00:07.000-0600

I forgot to mention that if no call-limit is specified in sip.conf, then the SIP phones will always report as being "not in use."
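
The rule from the last two comments can be written down as a small function. This is a sketch of the rule as stated here, not chan_sip code; the function name and the use of `None` for "no call-limit configured" are assumptions.

```python
def sip_device_state(active_calls, call_limit=None):
    """Device state of a SIP endpoint, following the rule described above."""
    if call_limit is None:
        return "not in use"   # no call-limit: always reported as not in use
    if active_calls == 0:
        return "not in use"
    if active_calls >= call_limit:
        return "busy"         # every slot toward the limit is taken
    return "in use"           # some, but not all, slots are taken


# An agent parked in AgentLogin uses 1 call toward call-limit=500.
parked_agent_state = sip_device_state(1, 500)
```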

By: Mark Michelson (mmichelson) 2009-01-15 13:56:03.000-0600

I believe I have isolated the necessary information within the agent channel driver to limit device state inheritance to only callback agents.

I have attached a patch to this issue called 14173.patch which has the change which I believe will solve your problem. Please test it and let me know if it works. Thanks!

By: Mark Michelson (mmichelson) 2009-01-19 11:19:42.000-0600

Hi nathan. Sorry if I'm being a bother, but I was wondering if you were able to try out the patch I have attached here? Thanks!

By: Nathan Stocks (nathan) 2009-01-19 23:25:29.000-0600

Sorry for the slow response.  I've been absolutely swamped with stuff at work.

First, the "ringinuse = yes" on the queue configs has worked great as a workaround!  I haven't had a single complaint since I implemented that, which is a good sign.

Second, I checked out 1.4.22 and applied your patch (was it really only one line!?), compiled and installed it just now.  I also reverted the queues.conf to "ringinuse = no".  I sent an email to my call-center managers to report back whether or not the problem reappears.  If I don't let you know in a couple days, just ping me.  Work tends to bury me...

By: Nathan Stocks (nathan) 2009-01-20 09:04:22.000-0600

It's only been a few hours and I've already had two reports of the problem again...I don't think the patch was effective :-(  I'll wait another couple hours, but I'm planning on reverting to the workaround today (ringinuse = yes)...

By: Mark Michelson (mmichelson) 2009-01-20 09:59:05.000-0600

Thanks for the feedback. I'll try to rework it and see if I can get it right this time.

By: Nathan Stocks (nathan) 2009-01-20 16:14:29.000-0600

I'm not convinced that you've "got it wrong" yet.  I've got conflicting reports from my two call-center groups.  One group (my more technical one) claims zero problems in the last 24 hours.  The other group thinks the problem occurred this morning, but the person testing it _didn't get any hold music_, which makes me suspect it might have been an agent who left his headset logged in after he left last night and the early shift just didn't notice it...

I've asked for each group to continue watching for another day and keep me posted.  If the problem is really reoccurring, it will show itself again.

By: Nathan Stocks (nathan) 2009-01-20 16:15:21.000-0600

Oh, and just to be clear, I have NOT reverted to the workaround yet.

By: Mark Michelson (mmichelson) 2009-01-20 16:19:06.000-0600

Thanks for the update!

By: Nathan Stocks (nathan) 2009-01-20 17:39:21.000-0600

Another update:  Good news, my call center manager did some research and tracked down this morning's issues to an Agent who left, but did not log off the queue (so calls kept going to his unmanned headset).  So thus far I have no reports of problems related to this issue.

I'll give another update tomorrow.

By: David Brillert (aragon) 2009-01-25 15:17:40.000-0600

ping

By: Leif Madsen (lmadsen) 2009-01-27 14:08:30.000-0600

I'm setting this to Ready For Review because usually a lack of response after testing of a patch tends to mean the patch resolved the issue and the reporter got too busy to update us :)

By: Nathan Stocks (nathan) 2009-01-27 14:34:05.000-0600

You're right!  I moved on to another work-related crisis.  Sorry :-/

* We've remained on the 1.4.22 + 1-line-patch-from-this-issue for over a week now.

I went back and talked to one of my call-center group managers today (I have two separate groups) and he says they haven't had any problems over the last week!

I can't get ahold of the other manager right this second, but they tend to escalate problems straight to me, and I haven't heard from him in a week either.  So that's a good sign too.

I'm calling this fixed from my standpoint.  Thanks for all the awesome help on this!  (One less thing for me to worry about...)

By: Alvaro Ramirez (aramirez) 2009-01-27 14:53:24.000-0600

Last Thursday we installed the patch and the problem has been fixed.
Thank you for your help in this matter.

By: Mark Michelson (mmichelson) 2009-01-27 15:03:38.000-0600

Great. Thanks for the feedback. In that case I am going to get this merged as soon as possible.

By: Digium Subversion (svnbot) 2009-01-27 15:55:04.000-0600

Repository: asterisk
Revision: 171689

U   branches/1.4/channels/chan_agent.c

------------------------------------------------------------------------
r171689 | mmichelson | 2009-01-27 15:55:03 -0600 (Tue, 27 Jan 2009) | 39 lines

Fix devicestate problems for "always-on" agent channels

A revision to chan_agent attempted to "inherit" the device
state of the underlying channel in order to report the
device state of an agent channel more accurately.

The problem with the logic here is that it makes no sense to
use this for always-on agents. If the agent is logged in, then
to the underlying channel, the agent will always appear to be
"in use," no matter if the agent is on a call or not. The reason
is that to the underlying channel, the channel is currently in use
on a call to the AgentLogin application.

The most common cause that I found for this issue to occur was for
a SIP channel to be the underlying channel type for an Agent channel.
If the SIP phone re-registers, then the registration will cause the
device state core to query the device state of the SIP channel. Since the
SIP channel is in use, the Agent channel would also inherit this status.
Once the agent channel was set to "in use" there was no way that the device
state could change on that channel unless the agent logged out.

The solution for this problem is a bit different in 1.4 than it is in the
other branches. In 1.4, there will be a one-line fix to make sure that only
callback agents will inherit device state from their underlying channel type.
For the other branches of Asterisk, since callback support has been removed, there
is also no need for device state inheritance in chan_agent, so I will simply be
removing it from the code.

In addition, the 1.4 source is getting a new comment to help the next person who
edits chan_agent.c. I'm adding a comment that an agent_pvt's loginchan field may be
used to determine if the agent is a callback agent or not.

(closes issue ASTERISK-13302)
Reported by: nathan
Patches:
     14173.patch uploaded by putnopvut (license 60)
Tested by: nathan, aramirez


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=171689
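
For readers following the reasoning in r171689, here is a minimal C sketch of the device-state decision before and after the 1.4 fix. It is not the code from chan_agent.c. The enum, the struct, and every function name are hypothetical. Only the idea that a non-empty loginchan marks a callback agent comes from the commit message. The "before" function is an assumption about how unconditional inheritance would behave, reconstructed from the description above.

```c
#include <assert.h>

/* Simplified stand-ins for device states (hypothetical names, not Asterisk's). */
enum dev_state { DEV_NOT_INUSE, DEV_INUSE };

/* Hypothetical, trimmed-down agent record. Only "loginchan" mirrors a field
 * named in the commit message; the other members are illustrative. */
struct agent {
    char loginchan[80];        /* non-empty only for callback agents */
    int on_queue_call;         /* 1 while the agent is talking to a caller */
    enum dev_state underlying; /* state of the underlying (e.g. SIP) channel */
};

/* A callback agent is one whose loginchan has been set. */
int is_callback_agent(const struct agent *a)
{
    return a->loginchan[0] != '\0';
}

/* Assumed pre-fix behaviour: every agent inherits the underlying state.
 * An always-on agent sits in AgentLogin, so its underlying channel is
 * always "in use" and the agent looks busy even when idle. */
enum dev_state agent_state_before(const struct agent *a)
{
    if (a->underlying == DEV_INUSE)
        return DEV_INUSE;
    return a->on_queue_call ? DEV_INUSE : DEV_NOT_INUSE;
}

/* Sketch of the 1.4 fix: only callback agents inherit the underlying state. */
enum dev_state agent_state_after(const struct agent *a)
{
    if (is_callback_agent(a) && a->underlying == DEV_INUSE)
        return DEV_INUSE;
    return a->on_queue_call ? DEV_INUSE : DEV_NOT_INUSE;
}

/* Small self-check covering the cases described in the commit message. */
void run_checks(void)
{
    struct agent always_on = { "", 0, DEV_INUSE };
    struct agent callback = { "1001@internal", 0, DEV_INUSE };

    /* Idle always-on agent: wrongly busy before, available after. */
    assert(agent_state_before(&always_on) == DEV_INUSE);
    assert(agent_state_after(&always_on) == DEV_NOT_INUSE);

    /* Callback agent whose phone is busy: still reported as in use. */
    assert(agent_state_after(&callback) == DEV_INUSE);
}
```

In this model, a device-state query triggered by a SIP re-registration corresponds to calling the state function while `underlying` is `DEV_INUSE`. With the "before" variant, an idle always-on agent is reported as in use, which matches the stuck "(In use)" status described in this issue.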

By: Digium Subversion (svnbot) 2009-01-27 15:58:01.000-0600

Repository: asterisk
Revision: 171691

U   trunk/channels/chan_agent.c

------------------------------------------------------------------------
r171691 | mmichelson | 2009-01-27 15:58:00 -0600 (Tue, 27 Jan 2009) | 47 lines

Merged revisions 171689 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=171691

By: Digium Subversion (svnbot) 2009-01-27 16:00:03.000-0600

Repository: asterisk
Revision: 171692

_U  branches/1.6.0/
U   branches/1.6.0/channels/chan_agent.c

------------------------------------------------------------------------
r171692 | mmichelson | 2009-01-27 16:00:02 -0600 (Tue, 27 Jan 2009) | 55 lines

Merged revisions 171691 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=171692

By: Digium Subversion (svnbot) 2009-01-27 16:01:11.000-0600

Repository: asterisk
Revision: 171693

_U  branches/1.6.1/
U   branches/1.6.1/channels/chan_agent.c

------------------------------------------------------------------------
r171693 | mmichelson | 2009-01-27 16:01:11 -0600 (Tue, 27 Jan 2009) | 55 lines

Merged revisions 171691 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=171693