Summary: ASTERISK-07043: MixMonitor() related segfault on agent hangup
Reporter: Jonathan Towne (jontow)
Labels:
Date Opened: 2006-05-26 13:34:21
Date Closed: 2006-09-18 10:23:16
Priority: Critical
Regression? No
Status: Closed/Complete
Components: Applications/app_mixmonitor
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments: ( 0) 2006-06-27-channel.c-M7230.patch
Description: See bug id 7182 for additional background information. It is somehow closely related, although that bug looks to be resolved.

To reproduce: "monitor_format" must be set to enable monitoring, and mixmonitor must be turned on.

I'm using realtime configuration for queue and queue_member.

Log an agent in on a test queue, call the queue, and observe the MixMonitor log message. Then hang up one end; once MixMonitor finishes, a segfault occurs.

I cannot tell if this is a mixmonitor issue or an app_queue issue; it appears as though mixmonitor finishes its job before the crash happens?

****** ADDITIONAL INFORMATION ******

Full backtrace (made with 'dont-optimize') available for all threads at:

http://ppx.slic.com/~jontow/mixmonitor_core_all_threads.txt


Here is a short snippet from the CLI log messages:
------------------------------------------------------------------
 == Begin MixMonitor Recording IAX2/192.168.2.13:4569-1
   -- doing lookup for '192.168.2.13'
 == Spawn extension (default, 5001, 2) exited non-zero on 'Local/5001@default-bb23,2'
   -- Hungup 'IAX2/jonjail-4'
 == Spawn extension (ppxtestq, 1, 2) exited non-zero on 'IAX2/192.168.2.13:4569-1'
zsh: segmentation fault (core dumped)  (asterisk -vvvvvvvvvvvc)
------------------------------------------------------------------

Now the debug-log snippet:
------------------------------------------------------------------
May 26 14:13:24 VERBOSE[31380] logger.c:     -- Hungup 'IAX2/jonjail-4'
May 26 14:13:24 DEBUG[31380] channel.c: Didn't get a frame from channel: Agent/5001
May 26 14:13:24 DEBUG[31380] channel.c: Bridge stops bridging channels IAX2/192.168.2.13:4569-1 and Agent/5001
May 26 14:13:24 DEBUG[31380] chan_agent.c: Hangup called for state Up
May 26 14:13:24 VERBOSE[31380] logger.c:   == Spawn extension (ppxtestq, 1, 2) exited non-zero on 'IAX2/192.168.2.13:4569-1'
May 26 14:13:24 DEBUG[31380] channel.c: Spy MixMonitor removed from channel IAX2/192.168.2.13:4569-1
------------------------------------------------------------------
Comments:

By: Jonathan Towne (jontow) 2006-05-26 14:12:19

As per vecher's request:

This bug is not applicable to the 1.2 branch (which has no MixMonitor), and the previously reported bug (7182) does not seem to be applicable either.

It looks as though these bugs (7182 & 7230) were added with MixMonitor.

By: Serge Vecher (serge-v) 2006-06-06 13:19:20

jontow: could you please test again with the latest trunk? There was a fix applied last night that may impact this: http://lists.digium.com/pipermail/svn-commits/2006-June/014324.html

Thanks.

By: Jonathan Towne (jontow) 2006-06-22 16:50:26

vechers: according to one of my developers who had some free moments, it still happens on trunk as of today.  Let me know if you need more details.

By: Jonathan Towne (jontow) 2006-06-23 14:25:34

Observe the following AMI log snippet from earlier when I was taking another look at this:

#################################################################
Event: Unlink
Privilege: call,all
Channel1: IAX2/192.168.2.13:4569-1
Channel2: Agent/5000
Uniqueid1: 1151089033.25
Uniqueid2: 1151089035.28
CallerID1: 3152653253
CallerID2: 3152653253

Event: AgentComplete
Privilege: agent,all
Queue: testq
Uniqueid: 1151089033.25
Channel: Agent/5000
HoldTime: 14
TalkTime: 5
Reason: agent
Variable: MIXMONITOR_FILENAME=/var/spool/asterisk/monitor/1151089033.25.w

Event: Hangup
Privilege: call,all
Channel: Agent/5000
Uniqueid: 1151089035.28
Cause: 0
Cause-txt: Unknown

Connection closed by foreign host.

#################################################################

Note the Variable: MIXMONITOR_FILENAME with the lack of "av" on the end of the string.  What's interesting here is:

-rw-r--r--  1 root  wheel  78124 Jun 23 14:57 /var/spool/asterisk/monitor/1151089033.25.wav

The file on disk is named correctly, and I can't tell whether the AMI is truncating the name as it's printed, or whether it's overflowing a buffer somewhere. I'll keep digging, but this might put someone quicker than myself on the right track.
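
One observation on the truncated value above: the string "MIXMONITOR_FILENAME=/var/spool/asterisk/monitor/1151089033.25.w" is exactly 63 characters long, which is what a bounded copy into a 64-byte buffer would leave behind. The sketch below only demonstrates that arithmetic. The 64-byte size, the VAR_BUF_LEN name and the format_variable() helper are assumptions made for illustration; nothing in this report confirms that Asterisk formats the variable this way.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical buffer size, chosen because it matches the observed output. */
#define VAR_BUF_LEN 64

/*
 * Hypothetical helper: write "name=value" into dst.
 * Returns 1 if the result had to be truncated to fit, 0 otherwise.
 */
int format_variable(char *dst, size_t dst_len, const char *name, const char *value)
{
    int needed;

    assert(dst != NULL && dst_len > 0);

    /* snprintf() never writes more than dst_len bytes, including the NUL. */
    needed = snprintf(dst, dst_len, "%s=%s", name, value);

    return needed >= 0 && (size_t)needed >= dst_len;
}
```

With the full path from the directory listing as the value, this reports truncation and leaves the string ending in ".w". That is consistent with a display-side truncation rather than a bad filename, but it does not prove it.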

By: Jonathan Towne (jontow) 2006-06-27 14:20:26

Now on r36170 (a clean checkout from just now), I've reproduced the bug from scratch with both a flat-file setup and a realtime setup.

With the patch just added (2006-06-27-channel.c-M7230.patch) applied, I can no longer get it to segfault, and MixMonitor seems to end cleanly.

It looked like a race condition where ast_channel_spy_remove() was being called twice (once in app_mixmonitor.c's stopmon() and once in channel.c's detach_spies()).  I would like someone to glance through this again and make sure it doesn't cause other issues, i.e., in other use cases of MixMonitor or ChanSpy.

Other than that, it Works For Me(tm) now.
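
The double removal described in this comment can be sketched in isolation. The code below is not Asterisk's code and not the attached patch: the struct layouts, the attached flag, and the spy_add()/spy_remove() names are hypothetical, and it is single-threaded, so it shows only why a second removal must be a harmless no-op. It does not model the locking a real fix would need.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for a channel and its list of spies. */
struct spy {
    const char *type;
    struct spy *next;
    int attached;          /* 1 while the spy is linked into a channel */
};

struct channel {
    struct spy *spies;     /* head of a singly linked list */
};

void spy_add(struct channel *chan, struct spy *spy)
{
    assert(chan != NULL && spy != NULL);
    spy->next = chan->spies;
    chan->spies = spy;
    spy->attached = 1;
}

/*
 * Idempotent removal: returns 1 if the spy was unlinked by this call,
 * 0 if it was already gone. A second caller therefore does nothing.
 */
int spy_remove(struct channel *chan, struct spy *spy)
{
    struct spy **cur;

    assert(chan != NULL && spy != NULL);

    if (!spy->attached)
        return 0;

    for (cur = &chan->spies; *cur != NULL; cur = &(*cur)->next) {
        if (*cur == spy) {
            *cur = spy->next;   /* unlink */
            spy->next = NULL;
            spy->attached = 0;
            return 1;
        }
    }
    return 0;
}
```

In this sketch it does not matter whether the stopmon()-like path or the detach_spies()-like path runs first; whichever comes second gets 0 back and leaves the list untouched, and other spies on the channel stay attached.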

By: Jonathan Towne (jontow) 2006-06-27 14:26:43

Considering this a bit more: I think removing this block from detach_spies() is incorrect.  It is perhaps a much better idea to remove it from stopmon() so that it will only affect MixMonitor?  I think right now, the 2006-06-27-channel.c-M7230 patch will break ChanSpy in incredible ways.

Feedback?  (I can easily generate the patch for either case, but I was hasty and posted what I now consider, in theory, to be the wrong one.)

By: Serge Vecher (serge-v) 2006-08-18 08:20:13

ok, what's up with this issue?

By: Serge Vecher (serge-v) 2006-09-05 15:37:23

A major fix was committed to trunk over the weekend. Please test the latest revision of trunk (> 42000 currently). Thanks

By: Andrew Thompson (andrewt) 2006-09-12 13:36:03

I'm currently having issues with Asterisk (r42827) segfaulting on any agent hangup, regardless of whether MixMonitor is on. If/when I resolve this issue I'll try to test MixMonitor again.

By: Joshua C. Colp (jcolp) 2006-09-18 10:23:15

andrewt: when/if you get into a position where you can test this again, please do. Otherwise, anyone else can reopen this if there is still an issue.