
Summary: ASTERISK-28072: app_agent_pool: Crash when heavily manipulated externally using AMI
Reporter: Jeff Hoppe (jhoppebugs)
Labels:
Date Opened: 2018-09-25 10:17:29
Date Closed:
Priority: Minor
Regression?
Status: Open/New
Components: Applications/app_agent_pool
Versions: 13.20.0 13.23.0
Frequency of Occurrence: Occasional
Related Issues:
Environment: CentOS 6
Attachments:
( 0) Agent_Functionality.pdf
( 1) backtrace0925.txt
( 2) backtrace1002.txt
( 3) Crash_Case_#3.pdf
( 4) Full_Log_for_Thead_15274.txt
( 5) Issue_narrowed_down_a_bit.pdf
( 6) xaa.gz
( 7) xab.gz
( 8) xac.gz
( 9) xad.gz
(10) xae.gz
(11) xaf.gz
(12) xag.gz
(13) xah.gz
Description: When an agent leaves the app pool to join a conference call, they need to be logged back in to the app pool when done. I use AMI to redirect the agent channel to this context:
{noformat}
; ------ when agent leaves the realm they will call this to get back in.
[agentrelogin]
exten => 11,1,NoOp()
exten => 11,n,AgentLogin(${agentid},s)
{noformat}
Occasionally we get an Asterisk crash as specified in the attached log.
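
For illustration, the redirect described above could be sent as an AMI Redirect action like the one built below. This is a minimal sketch, not the reporter's actual code: the helper name `build_ami_action` and the channel name are hypothetical, and only the context `agentrelogin` and extension `11` come from the dialplan above. It only builds the message text; opening the AMI socket and logging in are left out.

```python
def build_ami_action(action, **fields):
    """Serialize one AMI action as CRLF-terminated 'Key: Value' lines.

    Hypothetical helper for illustration; not part of Asterisk.
    """
    lines = [f"Action: {action}"]
    lines += [f"{key}: {value}" for key, value in fields.items()]
    # An AMI message is terminated by an empty line (CRLF CRLF).
    return "\r\n".join(lines) + "\r\n\r\n"


# Redirect the agent channel to the re-login extension.
redirect = build_ami_action(
    "Redirect",
    Channel="PJSIP/1001-00000042",  # hypothetical agent channel name
    Context="agentrelogin",         # context from the dialplan above
    Exten="11",
    Priority="1",
)
```

The resulting string would be written to an authenticated AMI connection. How `${agentid}` gets set on the channel before `AgentLogin` runs is not shown in this report, so it is not modeled here.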

Comments:By: Asterisk Team (asteriskteam) 2018-09-25 10:17:30.658-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Jeff Hoppe (jhoppebugs) 2018-09-25 10:18:35.088-0500

Here is the back trace for the error.

By: Jeff Hoppe (jhoppebugs) 2018-09-25 11:02:47.026-0500

I attached the FULL log entries for thread 15274.

I had debug logging set up to gather more information but failed to actually turn it on with 'core set debug 5'.

Next time this crash happens I should have debug logging.  Hopefully you can see something wrong without it.

By: Joshua C. Colp (jcolp) 2018-09-25 12:05:28.316-0500

I'm going to put this into waiting for feedback until the log with full debug is available so we can see what was going on internally at the time.

By: Jeff Hoppe (jhoppebugs) 2018-10-02 15:21:34.210-0500

Here is the backtrace (backtrace1002.txt) for a crash.
The Crash_Case_#3.pdf attachment has information about what is happening.
xah is the debug file, with the crash on line 960920.


By: Jeff Hoppe (jhoppebugs) 2018-10-03 10:14:51.014-0500

If there is anything else you need from this incident, let me know. As other incidents happen (that are slightly different), I will add the information here as well.

By: Joshua C. Colp (jcolp) 2018-10-03 11:25:37.539-0500

Can you provide information/a test case of how exactly you are using the Agent functionality?

By: Jeff Hoppe (jhoppebugs) 2018-10-03 12:48:03.040-0500

I have attached a document describing how we are using the Agent functionality.   Let me know if more is needed and in what areas and detail level.


By: Jeff Hoppe (jhoppebugs) 2018-10-17 14:39:13.443-0500

Do you want more examples of crashes, or are you fairly confident you know the root cause based on this one example (Crash case #3)?

By: Joshua C. Colp (jcolp) 2018-10-17 14:52:36.024-0500

You can provide more examples to help when someone looks at this. There is no time frame on that, though.

By: Jeff Hoppe (jhoppebugs) 2018-11-08 14:24:53.563-0600

I have narrowed down the scope of this crash to the following (I don't believe this is a heavy use of the AMI in manipulating channels):


See the attached Issue_narrowed_down_a_bit.pdf.


Agent_Functionality.pdf describes what we do, but in all of those scenarios the error always happens on the BRIDGEKICK. Since I first uploaded that document, the first scenario has been changed to just a BRIDGEKICK, and it still fails at that point.
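
As a rough illustration of the step where the crash occurs, the sketch below builds a kick request, assuming that BRIDGEKICK here refers to the AMI `BridgeKick` action. The helper name and the channel name are hypothetical and are not taken from the attached documents; the code only formats the message and does not talk to Asterisk.

```python
def build_ami_action(action, **fields):
    """Serialize one AMI action as CRLF-terminated 'Key: Value' lines.

    Hypothetical helper for illustration; not part of Asterisk.
    """
    lines = [f"Action: {action}"]
    lines += [f"{key}: {value}" for key, value in fields.items()]
    # An AMI message is terminated by an empty line (CRLF CRLF).
    return "\r\n".join(lines) + "\r\n\r\n"


# Kick the agent channel out of whatever bridge it is currently in.
kick = build_ami_action(
    "BridgeKick",
    Channel="PJSIP/1001-00000042",  # hypothetical agent channel name
)
```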