[Home]

Summary:ASTERISK-26297: Delayed SIP signalling and RTP distortion caused by DEBUG_THREADS compile option
Reporter:David Brillert (aragon)Labels:
Date Opened:2016-08-15 20:00:43Date Closed:2016-08-19 12:44:35
Priority:MinorRegression?No
Status:Closed/CompleteComponents:Channels/chan_sip/General
Versions:11.23.0 Frequency of
Occurrence
Constant
Related
Issues:
Environment:CentOS 5 i386 and x64Attachments:( 0) ast1116.pcap
( 1) ast11svn.pcap
( 2) messages.rar
Description:After upgrading to 11.23 on el5. Extension to extension calls and external PSTN calls are badly distorted and signalling/call set up is delayed between 2-30 seconds.  The problem is also reproduced on x64 el5
This was not an issue in 11.16 or 11.20 so I am marking this a regression.  
I don't know where to start debugging this so I am attaching messages file with sip set debug on rtp set debug on verbose and debug logging options
Timertest was clean
res_timing_dahdi.so 
Comments:By: Asterisk Team (asteriskteam) 2016-08-15 20:00:44.290-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: David Brillert (aragon) 2016-08-15 20:02:55.278-0500

I also have the same issue with svn checkout built 8/15/2016

By: Rusty Newton (rnewton) 2016-08-16 09:39:03.327-0500

Can you attach a tcpdump/wireshark of the extension to extension calls?

If possible, a working call from the older version and a distorted call from the newer version.

Can you also try using a different module just temporarily to see if that affects extension to extension calls?

By: David Brillert (aragon) 2016-08-16 23:19:05.081-0500

I need to use the dahdi timing module for the PRI spans.
But here are the pcaps
ast1116.pcap is the 11.16 with good quality audio
ast11svn.pcap is the 11 svn with delayed audio

The INVITE you are looking for is 6367

By: David Brillert (aragon) 2016-08-16 23:34:50.206-0500

I should have mentioned the traces are through the PRI to a NAT'd extension 6367
This is the best I could do remotely. Local extension to extension calls over the LAN are also affected.

By: David Brillert (aragon) 2016-08-17 13:12:36.917-0500

I just edited the bug report details because I just received confirmation that the problem is also reproduced on x64 el5 with 11.23 and 11 svn.

By: Rusty Newton (rnewton) 2016-08-18 09:11:44.417-0500

If local extension to extension calls on the LAN are affected then please include pcaps of those calls as it is always easier to spot what is going on in the most simple call path.
Thanks!

By: David Brillert (aragon) 2016-08-18 14:59:10.148-0500

Hi Rusty,

The pcap only involves a single call and was taken after hours while I was able to reproduce the issue remotely. There shouldn't be much SIP clutter in the trace. A LAN trace won't be possible in this case and isn't required to further investigate this issue IMHO.

By: David Brillert (aragon) 2016-08-19 10:24:18.119-0500

I found root cause.
The new releases were compiled with DEBUG_THREADS
I found this while executing core show locks trying to investigate a possible locking issue. I was surprised when the command returned output.  Normally this flag is disabled on the build servers.
Recompiling without DEBUG_THREADS fixed the reported issues.
It would be nice if we could run in production with this flag enabled so we could debug deadlocks right away and without affecting audio on a single channel.  The flag has become useless so I still think this is a bug but its not a regression since things have been this way for a long time.


By: Richard Mudgett (rmudgett) 2016-08-19 12:17:55.821-0500

DEBUG_THREADS is not recommended for normal production because it impacts performance by essentially serializing all locking in Asterisk.

By: David Brillert (aragon) 2016-08-19 12:26:42.566-0500

Yup, its evil. Got any suggestions to debug locking in a production environment?  I can't think of any customer environment where this flag actually helps.

Aside from that I can't think of any reason to keep this bug report open.

By: Richard Mudgett (rmudgett) 2016-08-19 12:38:28.478-0500

It is a debugging tool to find a problem you are *currently* having and like a lot of debugging tools leaving it enabled for normal operation is going to kill your performance.  The only other way to debug locking problems is to attach gdb to the Asterisk process and get a backtrace.  It is just much harder to find deadlocks that way.

By: Rusty Newton (rnewton) 2016-08-19 12:44:35.786-0500

Closing this out as Not A Bug. There is probably room for improvement with DEBUG_THREADS, if possible at all. If someone wants to provide a patch for that then we can open an issue for it. Otherwise it is a known limitation of the tool.

By: David Brillert (aragon) 2016-08-19 12:46:02.545-0500

If you can execute 'core show locks' in your Asterisk console then you had DEBUG_THREADS enabled when you compiled Asterisk. This flag should never be enabled in a production environment.  If you want to debug a lock without this flag enabled during compile options then install gdb and execute "'gdb asterisk --pid=`pidof asterisk`" then "thread apply all bt" should print the needed debug information.