Summary: ASTERISK-24477: segfault in ast_translate at translate.c during calls with codec_siren7
Reporter: Daniel Ammann (damm)
Labels:
Date Opened: 2014-10-31 02:04:46
Date Closed: 2018-01-02 08:30:26.000-0600
Priority: Major
Regression?:
Status: Closed/Complete
Components: Codecs/codec_siren7
Versions: 11.13.0
Frequency of Occurrence: Occasional
Related Issues:
Environment: Debian 3.2.63-2 i686 GNU/Linux Asterisk 11.13.0~dfsg-1~bpo70+1 Digium Siren7 Module Version 11.0_1.0.5 (optimized for i686_32)
Attachments:
( 0) backtrace.txt
( 1) backtrace1.txt
( 2) backtrace2.txt
( 3) backtrace3.txt
Description: Using the Siren7 codec causes a segfault at random times during a call. The endpoints involved are various Polycom IP6000 and IP7000 phones.

Segfault
{noformat}
kernel: [754930.655998] asterisk[22492]: segfault at b63ff000 ip b5d336a9 sp b1948610 error 4 in codec_siren7.so[b5d2f000+f000]
kernel: [827976.561636] asterisk[2558]: segfault at b1e00000 ip b64386a9 sp b16ba610 error 4 in codec_siren7.so[b6434000+f000]
kernel: [927814.320410] asterisk[27330]: segfault at b0b00000 ip b61086a9 sp af2a0610 error 4 in codec_siren7.so[b6104000+f000]
{noformat}
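
The three kernel lines can be cross-checked with a little arithmetic. The sketch below is not part of the original report; the names `CRASHES`, `module_offset`, and `decode_error_code` are hypothetical helpers, and it assumes 4 KiB pages and the usual x86 page-fault error-code bits (bit 0: page present, bit 1: write access, bit 2: user mode). It subtracts the module base from the instruction pointer and checks whether each fault address sits on a page boundary.

```python
# Values copied from the kernel log lines above:
# (fault address, instruction pointer, codec_siren7.so base address)
CRASHES = [
    (0xB63FF000, 0xB5D336A9, 0xB5D2F000),
    (0xB1E00000, 0xB64386A9, 0xB6434000),
    (0xB0B00000, 0xB61086A9, 0xB6104000),
]

PAGE_SIZE = 4096  # assumption: standard 4 KiB pages on i686


def module_offset(ip, base):
    """Offset of the faulting instruction inside the loaded module."""
    return ip - base


def decode_error_code(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 1),
        "write": bool(code & 2),
        "user_mode": bool(code & 4),
    }


offsets = [module_offset(ip, base) for _, ip, base in CRASHES]
page_aligned = [fault % PAGE_SIZE == 0 for fault, _, _ in CRASHES]

for (fault, _, _), off, aligned in zip(CRASHES, offsets, page_aligned):
    print(f"fault={fault:#010x} offset_in_module={off:#x} page_aligned={aligned}")

print(decode_error_code(4))
```

All three crashes resolve to the same offset inside the module (0x46a9), every fault address is page-aligned, and error code 4 decodes to a user-mode read of a page that is not present. That pattern would be consistent with the same instruction reading past the end of a mapped region each time, though this is only an inference from the log lines and not a confirmed diagnosis.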

At the time of the crash, several Polycom endpoints were using this Asterisk instance with the siren7 codec in parallel. In the typical usage scenario, those endpoints participate in app_conference-based conferences, and Asterisk transcodes for them (siren7 -> slin16).

This problem occurs in an estimated 1 out of 20 calls, minutes to hours after the call has been established.

I can make core dumps available if helpful
Comments:

By: Matt Jordan (mjordan) 2014-11-03 13:48:07.360-0600

Thank you for your bug report. In order to move your issue forward, we require a backtrace[1] from the core file produced after the crash. Also, be sure you have DONT_OPTIMIZE enabled in menuselect within the Compiler Flags section, then:

make install

After enabling, reproduce the crash, and then execute the backtrace[1] instructions. When complete, attach that file to this issue report.

[1] https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace



By: Daniel Ammann (damm) 2014-11-03 14:50:33.585-0600

Backtrace as requested.

This is from one of the crashes. If backtraces from the other segfault are helpful, let me know.



By: Richard Mudgett (rmudgett) 2014-11-03 15:30:43.693-0600

That backtrace does not contain much in the way of symbols to see what happened.  Did you strip the symbols from the files?

By: Daniel Ammann (damm) 2014-11-04 00:43:22.511-0600

Richard, yes, sorry, I was missing the debug symbols. I am using a precompiled Debian package from wheezy-backports, but fortunately, they also carry a special package with the debug symbols.

I have recreated the backtraces for all three crashes; this time they should include the symbol information. See the attached files.

By: Rusty Newton (rnewton) 2014-11-05 16:36:49.442-0600

Daniel, can you provide your complete sip.conf configuration and the dialplan needed to reproduce the issue?

By: Daniel Ammann (damm) 2014-11-06 04:34:57.428-0600

Yes, I could. Is there a way to share this only with Digium staff? These files are somewhat sensitive, as they contain hundreds of SIP extensions, real usernames, etc. I checked the different groups in Jira that I can choose from when uploading the file, but none of those seem to be sufficiently restrictive. Please let me know how I can make this available without the whole world seeing it.

If you intend to recreate the problem, I believe this will be rather tedious and hard, as you would have to have several siren7-capable endpoints, have them perform a number of calls using that codec in parallel, and hope for the crash to occur.

Also, I wonder what the relevance of sip.conf and the dialplan is in this crash. It is quite clearly codec-related, and I believe that it might have to do with multi-threaded use of the codec, a race condition of some sort, or transcoding that fails...


By: Mark Michelson (mmichelson) 2014-11-20 17:18:19.584-0600

I think the backtraces are giving a decent indication of what's going on:

{noformat}
#1  0x081a12a3 in ast_translate (path=0x8d67eb0, f=f@entry=0xb0afebd0, consume=consume@entry=1) at translate.c:522
       p = 0x1
       out = <optimized out>
       delivery = {tv_sec = 1414676786, tv_usec = 803225}
       has_timing_info = 1
       ts = 39523299
       len = 20
       seqno = 49545
       __PRETTY_FUNCTION__ = "ast_translate"
{noformat}

The code at line 522 of translate.c is {{out = p->t->frameout(p);}}. If p is actually 0x1, then that would go a long way toward explaining why a crash is occurring. However, it is hard to determine why p is 0x1 in this case.

By: Rusty Newton (rnewton) 2014-11-20 17:37:21.239-0600

{quote}
Yes, I could. Is there a way to share this only with Digium staff? These files are somewhat sensitive, as they contain hundreds of SIP extensions, real usernames, etc. I checked the different groups in Jira that I can choose from when uploading the file, but none of those seem to be sufficiently restrictive. Please let me know how I can make this available without the whole world seeing it.
{quote}

We can lock the issue down to only Digium, Bug Marshals and the Reporter, but that would hide it from everyone else, which we don't want to do unless we really have to. That would typically be the case for a security vulnerability.

{quote}
Also, I wonder what the relevance of sip.conf and the dialplan is in this crash. It is quite clearly codec-related, and I believe that it might have to do with multi-threaded use of the codec, a race condition of some sort, or transcoding that fails...
{quote}

sip.conf and the dialplan would be useful for reproduction and for understanding how the system got to the state it was in at the time of the crash. If you think we are unlikely to reproduce the issue, then let's forget about that for the moment.

[~mmichelson] looked at the traces for us, and it turns out we may be able to investigate the issue further with only the traces. However, what may also help is an Asterisk log showing what is happening up to the crash. Can you provide a log, including the output of "sip set debug on" and the DEBUG type logger channel? You can find some specific instructions on the wiki: https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

Once you have that log, it should be trivial for you to scrub out all IP addresses if necessary. If it comes down to it, then we can probably lock down the issue.

By: Roberto (tel.medola) 2014-11-25 04:20:59.582-0600

I also care about this issue. I have the same problem with 11.12.

By: Rusty Newton (rnewton) 2014-12-09 08:55:26.000-0600

[~tel.medola] if you can provide the same debug as requested of Daniel, that would be very helpful.

By: Daniel Ammann (damm) 2014-12-10 09:19:06.337-0600

I am sorry, but I cannot provide further debug logs on this. Doing so would mean re-enabling siren7, and I would then risk unexpectedly crashing a production system.

I have worked around this differently, by reconfiguring the Polycom endpoints to use L16.16, which corresponds to slin16 on the Asterisk side. That way, we get siren7-like wideband audio, but without the risk of running into the segfault. It also avoids extensive transcoding on the Asterisk side. The only downside is the high bandwidth requirement (256 kbps), but since the endpoints in my case are on a local LAN, this is acceptable.
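
The 256 kbps figure follows directly from the sample format: 16,000 samples per second at 16 bits per sample, mono. The sketch below is not from the original comment; `pcm_bitrate_kbps` and `with_rtp_overhead_kbps` are hypothetical helpers, and the on-the-wire estimate assumes 20 ms packetization and 40 bytes of IPv4 + UDP + RTP headers per packet, neither of which is stated in the thread.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Payload bitrate of uncompressed linear PCM, in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def with_rtp_overhead_kbps(payload_kbps, ptime_ms=20, header_bytes=40):
    """Add per-packet header overhead.

    Assumptions (not from the thread): 20 ms packets and
    40 header bytes (IPv4 20 + UDP 8 + RTP 12) per packet.
    """
    packets_per_second = 1000 / ptime_ms
    overhead_kbps = header_bytes * 8 * packets_per_second / 1000
    return payload_kbps + overhead_kbps


l16_payload = pcm_bitrate_kbps(16000, 16)       # L16 at 16 kHz, mono
l16_on_wire = with_rtp_overhead_kbps(l16_payload)

print(f"L16/16000 payload: {l16_payload:.0f} kbps")
print(f"L16/16000 incl. assumed headers: {l16_on_wire:.0f} kbps")
```

Under those assumptions the payload alone is 256 kbps per direction, and header overhead adds roughly 16 kbps on top.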



By: Roberto (tel.medola) 2014-12-10 10:22:20.117-0600

Hi Rusty Newton, thank you for your answer.
To collect the log, I would have to enable the codec, and the crash could then happen. I cannot take that chance right now.

By: Joshua C. Colp (jcolp) 2017-12-18 06:42:20.303-0600

I fixed a crash in the codec_siren7 module in the last release. Can you please try it on a supported version of Asterisk and see if this is resolved?

By: Asterisk Team (asteriskteam) 2018-01-02 08:30:26.785-0600

Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened, please create a new issue instead. If the new issue is related to this one, a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidelines [1].
[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines