Summary: ASTERISK-21893: Segfault after call hangup, in ast_channel_hangupcause_set, at channel_internal_api.c
Reporter: Aleksandr Gordeev (axonaro)
Labels:
Date Opened: 2013-06-10 02:21:50
Date Closed: 2015-05-13 12:26:09
Priority: Critical
Regression?
Status: Closed/Complete
Components: Channels/chan_dahdi
Versions: 11.2.1
Frequency of Occurrence:
Related Issues:
Environment:
Attachments: ( 0) backtrace_20130514.txt
( 1) debug_20130514.txt
( 2) extensions.conf.txt
Description:
{code}
#0  0x080d0e68 in ast_channel_hangupcause_set (chan=0x0, value=16) at channel_internal_api.c:580
#1  0xb5c244d2 in pri_dchannel (vpri=0xb5c40198) at sig_pri.c:7137
#2  0x081b8735 in dummy_start (data=0x894c060) at utils.c:1028
#3  0xb7257955 in start_thread () from /lib/i686/cmov/libpthread.so.0
#4  0xb767e58e in clone () from /lib/i686/cmov/libc.so.6
{code}
Comments:
By: Matt Jordan (mjordan) 2013-06-10 12:06:48.067-0500

Well this is no good.

In {{sig_pri}}, we explicitly check for the existence of the owning channel before attempting to set the hangup cause:

{noformat}
if (pri->pvts[chanpos]->owner) {
    int do_hangup = 0;

    snprintf(cause_str, sizeof(cause_str), "PRI PRI_EVENT_HANGUP_REQ (%d)", e->hangup.cause);
    pri_queue_pvt_cause_data(pri, chanpos, cause_str, e->hangup.cause);

    ast_channel_hangupcause_set(pri->pvts[chanpos]->owner, e->hangup.cause);
{noformat}

If {{pri->pvts[chanpos]->owner}} is not NULL when it is checked but has been set to NULL by the time it is passed to {{ast_channel_hangupcause_set}}, there's a race condition somewhere, and someone is dereferencing the channel when they shouldn't be.
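
To illustrate the check-then-use pattern described above, here is a minimal sketch of one common way to close such a window: the NULL check and the write both happen under the same lock that guards the owner pointer. All names here ({{demo_channel}}, {{demo_pvt}}, {{demo_set_hangupcause}}, {{demo_clear_owner}}) are hypothetical stand-ins, not Asterisk structures or functions, and this is not the patch that was eventually submitted for this issue.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a channel; not the real struct ast_channel. */
struct demo_channel {
    int hangupcause;
};

/* Hypothetical stand-in for a per-B-channel private structure. */
struct demo_pvt {
    pthread_mutex_t lock;        /* guards the owner pointer */
    struct demo_channel *owner;  /* may be cleared by another thread */
};

/*
 * Set the hangup cause only if an owner is still attached.
 * The check and the write share one critical section, so the owner
 * cannot be cleared between them by a thread that takes the same lock.
 * Returns 1 if the cause was stored, 0 if the owner was already gone.
 */
static int demo_set_hangupcause(struct demo_pvt *pvt, int cause)
{
    int updated = 0;

    assert(pvt != NULL);
    pthread_mutex_lock(&pvt->lock);
    if (pvt->owner) {
        pvt->owner->hangupcause = cause;
        updated = 1;
    }
    pthread_mutex_unlock(&pvt->lock);
    return updated;
}

/* Detach the owner, as a hangup running in another thread might do. */
static void demo_clear_owner(struct demo_pvt *pvt)
{
    assert(pvt != NULL);
    pthread_mutex_lock(&pvt->lock);
    pvt->owner = NULL;
    pthread_mutex_unlock(&pvt->lock);
}
```

The sketch only helps if every code path that clears the owner takes the same lock, and if no helper called between the check and the use releases that lock temporarily.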

Can you provide the relevant portions of your dialplan, as well as a DEBUG log leading up to the crash? That might provide some insight into what occurred and where the race condition lies.


By: Rusty Newton (rnewton) 2013-06-25 16:48:20.079-0500

Aleksandr, can you provide the additional information?

By: Aleksandr Gordeev (axonaro) 2013-06-30 04:49:06.194-0500

Sorry for the delay.

By: Rusty Newton (rnewton) 2013-07-01 17:34:09.135-0500

Aleksandr, thanks for the debug log and backtrace. Can you also provide your extensions.conf, or the snippets of extensions.conf relevant to the debug log? You can sort of work it out from the debug log, but it really helps to have both. If you are affecting the calls with scripts and such, please explain those as well.

The idea is to understand what's happening to the channels all the way up to the crash. Thanks!

By: Aleksandr Gordeev (axonaro) 2013-07-02 01:04:45.031-0500

Uploaded extensions.conf.txt.

By: Nikola Ciprich (nikola.ciprich) 2014-12-02 00:05:29.442-0600

Hello, just wanted to let you know that we're experiencing exactly the same issue with Asterisk 11.14.0, DAHDI 2.10.0.1, and libpri 1.4.15.
The affected box is quite heavily used: 4x PRI, ~60-70 concurrent calls. We use AMI-originated calls a lot, CEL logging to PostgreSQL, and a dynamic dialplan backed by an SQLite database (via ODBC); otherwise nothing special. If I can help debug this issue, I'll be glad to provide any needed info. However, because the box is so heavily used, I'm afraid I won't be able to run valgrind or similar tools :-(

By: Jan Havelka (dzavy) 2015-05-06 07:02:43.435-0500

Hi, we're experiencing the same issue, too. Asterisk version 11.13.0, DAHDI 2.10.0.1.
{noformat}
May  5 08:16:52 asterisk1 kernel: [4829351.974306] asterisk[3306]: segfault at 9f0 ip 000000000048b856 sp 00007f6c0314d028 error 6 in asterisk[400000+1dd000]
{noformat}
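
For readers unfamiliar with this kernel log format, the following sketch decodes the fields of the line above. It relies on the x86 page-fault error code bits that Linux reports (bit 0: the page was present, bit 1: the access was a write, bit 2: the fault came from user mode). The helper names {{decode_fault_error}} and {{image_offset}} are made up for this illustration. With these values, {{error 6}} decodes to a user-mode write to an unmapped page, and the low fault address {{9f0}} is consistent with writing to a structure field through a NULL pointer, which would match the {{chan=0x0}} in the original backtrace.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded view of the "error N" field from a kernel segfault line. */
struct fault_info {
    int page_present;  /* bit 0: 1 = protection fault, 0 = page not mapped */
    int is_write;      /* bit 1: 1 = write access, 0 = read access */
    int user_mode;     /* bit 2: 1 = fault raised from user space */
};

/* Split the x86 page-fault error code into its three low bits. */
static struct fault_info decode_fault_error(unsigned int error)
{
    struct fault_info info;

    info.page_present = (error & 0x1) ? 1 : 0;
    info.is_write = (error & 0x2) ? 1 : 0;
    info.user_mode = (error & 0x4) ? 1 : 0;
    return info;
}

/*
 * Offset of the faulting instruction inside the mapped image,
 * computed from "ip" and the base in "asterisk[base+size]".
 */
static uint64_t image_offset(uint64_t ip, uint64_t base)
{
    assert(ip >= base);
    return ip - base;
}
```

For the log line above, {{image_offset(0x48b856, 0x400000)}} gives {{0x8b856}}. Whether a symbolizer such as addr2line needs that offset or the absolute {{ip}} value depends on how the binary was linked.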

The time of the crash matches the end of a standard answered, bridged E1-to-E1 call; dstcontext was outgoing-primary and lastapp was Dial.

{noformat}
| 2015-05-05 08:15:40 | "xxxxxx666" <xxxxxx666> | xxxxxx666      | xxxxxx167      | outgoing-primary  | DAHDI/i1/1156-12383           | DAHDI/i2/xxxxxx167-129b7      | Dial    | DAHDI/G2/xxxxxx167      |       72 |      64 | ANSWERED    |        2 | 662         | 1-1430806540.163384 | 1156      |
{noformat}

{noformat}
[outgoing-primary]
exten => _X.,1,Macro(check-agent)
exten => _X.,2,Macro(notify-yesl,${CALLERID(num)},${EXTEN})
exten => _X.,3,Macro(set-route,${EXTEN})
exten => _X.,4,Macro(set-callerid)
exten => _X.,5,Macro(set-callerid-public)
exten => _X.,6,Macro(call-recording)
exten => _X.,7,Macro(outgoing-route,${EXTEN},outgoing-failover)

[macro-outgoing-route]
exten => s,1,Set(TIMEOUT(absolute)=7200)
exten => s,2,Set(ALTER_ARG=${ARG1})
exten => s,3,Dial(${ODBC_ROUTEPEER(${ROUTEID})}/${EVAL(${ODBC_ROUTEDIAL(${ROUTEID})})})
exten => s,n,Goto(s-${DIALSTATUS},1)
exten => s-NOANSWER,1,Hangup
exten => s-BUSY,1,Hangup
exten => s-CONGESTION,1,GoTo(${ARG2},${ARG1},1)
exten => s-CHANUNAVAIL,1,GoTo(${ARG2},${ARG1},1)
exten => _s-.,1,NoOp
{noformat}

By: Richard Mudgett (rmudgett) 2015-05-12 17:47:21.431-0500

Patches are up on Gerrit for code review:
v11 https://gerrit.asterisk.org/#/c/445/
v13 https://gerrit.asterisk.org/#/c/446/
master https://gerrit.asterisk.org/#/c/447/