Asterisk
  1. Asterisk
  2. ASTERISK-21064

Crash when handling ACK on dialog that has no channel

    Details

    • Frequency of Occurrence:
      Frequent

      Description

      Asterisk is frequently segfaulting (several times per week) in __ast_queue_frame at at channel.c:1450.

      This is a fairly busy call routing server with up to 120 calls at peak times. Calls come in on a SIP channel and are routed straight back out on another SIP channel. directmedia=yes.

      There are about eleven lines of dialplan for each call:

      Set("SIP/xxxx-0000044c", "TAG=xxxxx") 
      NoOp("SIP/xxxx-0000044c", "Call from x to y") 
      Gosub("SIP/xxxx-0000044c", "sub-set-group,s,1(xxxxx)
      Set("SIP/xxxx-0000044c", "GROUP()=xxxxx") 
      NoOp("SIP/xxxx-0000044c", "xxxxx: Call 27") 
      Return("SIP/xxxx-0000044c", "") 
      Set("SIP/xxxx-0000044c", "CDR(accountcode)=xxxx")
      Gosub("SIP/xxxx-0000044c", "sub-set-cli,xxx,1") 
      Set("SIP/xxxx-0000044c", "CALLERID(num)=nnnnnnnnnnnn")
      Return("SIP/xxxx-0000044c", "") 
      Dial("SIP/xxxx-0000044c", "SIP/trunk/yyyy,300,C")
      

      The problem does not appear to be load related. The segfault has happened with only 26 calls in progress.

      The segfault has also happened after only 30 minutes of uptime and then not again for a few days.

      We have seen the problem on three different VM's.

      I have a feeling that the problem is only happening with calls from one source SIP trunk but I have been unable to prove this.

      Program terminated with signal 11, Segmentation fault.
      #0  0x000000000046a961 in __ast_queue_frame (chan=0x0, fin=0x41d72f60, head=0, after=0x0) at channel.c:1450
      1450            cur = AST_LIST_LAST(&chan->readq);
      

      I have attached a backtrace. Please let me know if any further information is required.

      1. AST-2013-004-1.8.15-cert.diff
        4 kB
        Matt Jordan
      2. AST-2013-004-1.8.diff
        0.4 kB
        Matt Jordan
      3. AST-2013-004-11.2-cert.diff
        1 kB
        Matt Jordan
      4. AST-2013-004-11.diff
        15 kB
        Matt Jordan
      5. backtrace_20130211.txt
        231 kB
        Colin Cuthbertson
      6. backtrace_20130212.txt
        126 kB
        Colin Cuthbertson
      7. debug_20130212.tar.gz
        108 kB
        Colin Cuthbertson
      8. issueA21064_fix.patch
        0.6 kB
        Walter Doekes
      9. issueA21064_late_ack_of_death.xml
        2 kB
        Walter Doekes
      10. pcap_20130212.pcap
        5.72 MB
        Colin Cuthbertson

        Issue Links

          Activity

          Hide
          Walter Doekes added a comment -

          Is that ACK (from readbuf in backtrace_20130212.txt) supposed to be in the pcap (pcap_20130212.pcap), because I don't see it.

          You might set directmedia to no, to avoid the issue.

          Show
          Walter Doekes added a comment - Is that ACK (from readbuf in backtrace_20130212.txt ) supposed to be in the pcap ( pcap_20130212.pcap ), because I don't see it. You might set directmedia to no, to avoid the issue.
          Hide
          Walter Doekes added a comment -
          $ sudo asterisk -V
          Asterisk SVN-branch-1.8-r396745
          
          $ cat /etc/asterisk/sip.conf
          [general]
          allowguest=yes
          
          $ cat /etc/asterisk/extensions.conf
          [default]
          exten => answer,1,Answer()
          	same => n,Hangup()
          
          $ sudo gdb asterisk
          (gdb) run -c
          
          $ sipp -m 1 -sf issueA21064_late_ack_of_death.xml 127.0.0.1 -key tel answer
          
          # OBSERVE: you may need to tweak the pause in the sipp scenario a bit
          # In my case 510ms was 100% reliable. Asterisk had started the Hangup but
          # hadn't sent the BYE yet.
          
          ...
          [Aug 20 10:23:04] DEBUG[23620]: chan_sip.c:3946 retrans_pkt: ** SIP timers: Rescheduling retransmission 2 to 1000 ms (t1 500 ms (Retrans id #2)) 
          Retransmitting #1 (NAT) to 127.0.0.1:5061:
          SIP/2.0 200 OK
          ...
              -- Executing [answer@default:2] Hangup("SIP/127.0.1.1-00000000", "") in new stack
            == Spawn extension (default, answer, 2) exited non-zero on 'SIP/127.0.1.1-00000000'
          [Aug 20 10:23:04] DEBUG[23628]: chan_sip.c:6682 sip_hangup: Hangup call SIP/127.0.1.1-00000000, SIP callid 1-23627@127.0.1.1
          Scheduling destruction of SIP dialog '1-23627@127.0.1.1' in 32000 ms (Method: INVITE)
          
          <--- SIP read from UDP:127.0.0.1:5061 --->
          ACK sip:answer@127.0.0.1:5060 SIP/2.0
          ...
          Found audio description format PCMA for ID 8
          [Aug 20 10:23:04] DEBUG[23620]: chan_sip.c:9617 process_sdp: Processing media-level (audio) SDP a=rtpmap:8 PCMA/8000... OK.
          Found audio description format PCMU for ID 0
          [Aug 20 10:23:04] DEBUG[23620]: chan_sip.c:9617 process_sdp: Processing media-level (audio) SDP a=rtpmap:0 PCMU/8000... OK.
          Capabilities: us - 0x80000008000e (gsm|ulaw|alaw|h263|testlaw), peer - audio=0xc (ulaw|alaw)/video=0x0 (nothing)/text=0x0 (nothing), combined - 0xc (ulaw|alaw)
          Non-codec capabilities (dtmf): us - 0x1 (telephone-event|), peer - 0x0 (nothing), combined - 0x0 (nothing)
          Peer audio RTP is at port 127.0.1.1:6000
          [Aug 20 10:23:04] DEBUG[23620]: chan_sip.c:9871 process_sdp: We're settling with these formats: 0xc (ulaw|alaw)
          [Aug 20 10:23:04] ERROR[23620]: astobj2.c:115 INTERNAL_OBJ: user_data is NULL
          
          Program received signal SIGSEGV, Segmentation fault.
          [Switching to Thread 0x7ffff4184700 (LWP 23620)]
          0x0000000000475d60 in __ast_queue_frame (chan=0x0, fin=0x7ffff4182da0, head=0, after=0x0) at channel.c:1454
          1454		cur = AST_LIST_LAST(&chan->readq);
          
          Show
          Walter Doekes added a comment - $ sudo asterisk -V Asterisk SVN-branch-1.8-r396745 $ cat /etc/asterisk/sip.conf [general] allowguest=yes $ cat /etc/asterisk/extensions.conf [default] exten => answer,1,Answer() same => n,Hangup() $ sudo gdb asterisk (gdb) run -c $ sipp -m 1 -sf issueA21064_late_ack_of_death.xml 127.0.0.1 -key tel answer # OBSERVE: you may need to tweak the pause in the sipp scenario a bit # In my case 510ms was 100% reliable. Asterisk had started the Hangup but # hadn't sent the BYE yet. ... [Aug 20 10:23:04] DEBUG[23620]: chan_sip.c:3946 retrans_pkt: ** SIP timers: Rescheduling retransmission 2 to 1000 ms (t1 500 ms (Retrans id #2)) Retransmitting #1 (NAT) to 127.0.0.1:5061: SIP/2.0 200 OK ... -- Executing [answer@default:2] Hangup("SIP/127.0.1.1-00000000", "") in new stack == Spawn extension (default, answer, 2) exited non-zero on 'SIP/127.0.1.1-00000000' [Aug 20 10:23:04] DEBUG[23628]: chan_sip.c:6682 sip_hangup: Hangup call SIP/127.0.1.1-00000000, SIP callid 1-23627@127.0.1.1 Scheduling destruction of SIP dialog '1-23627@127.0.1.1' in 32000 ms (Method: INVITE) <--- SIP read from UDP:127.0.0.1:5061 ---> ACK sip:answer@127.0.0.1:5060 SIP/2.0 ... Found audio description format PCMA for ID 8 [Aug 20 10:23:04] DEBUG[23620]: chan_sip.c:9617 process_sdp: Processing media-level (audio) SDP a=rtpmap:8 PCMA/8000... OK. Found audio description format PCMU for ID 0 [Aug 20 10:23:04] DEBUG[23620]: chan_sip.c:9617 process_sdp: Processing media-level (audio) SDP a=rtpmap:0 PCMU/8000... OK. Capabilities: us - 0x80000008000e (gsm|ulaw|alaw|h263|testlaw), peer - audio=0xc (ulaw|alaw)/video=0x0 (nothing)/text=0x0 (nothing), combined - 0xc (ulaw|alaw) Non-codec capabilities (dtmf): us - 0x1 (telephone-event|), peer - 0x0 (nothing), combined - 0x0 (nothing) Peer audio RTP is at port 127.0.1.1:6000 [Aug 20 10:23:04] DEBUG[23620]: chan_sip.c:9871 process_sdp: We're settling with these formats: 0xc (ulaw|alaw) [Aug 20 10:23:04] ERROR[23620]: astobj2.c:115 INTERNAL_OBJ: user_data is NULL Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7ffff4184700 (LWP 23620)] 0x0000000000475d60 in __ast_queue_frame (chan=0x0, fin=0x7ffff4182da0, head=0, after=0x0) at channel.c:1454 1454 cur = AST_LIST_LAST(&chan->readq);
          Hide
          Walter Doekes added a comment - - edited

          I just noticed that this bug wasn't in my 10.7 version. It's relatively new:

          $ svn log -r 371337
          ------------------------------------------------------------------------
          r371337 | jrose | 2012-08-16 17:46:26 +0200 (do, 16 aug 2012) | 11 lines
          
          chan_sip: Trigger reinvite if the SDP answer is included in the SIP ACK
          
          Under certain conditions, a SIP transaction involving directmedia wouldn't
          trigger a re-invite because the SDP answer was included in an ACK instead
          of in a message that we would have triggered the invite with. This patch
          just queues a source change control frame if the dialog is using
          directmedia when we find sdp for an ACK.
          
          (closes issue AST-913)
          Reported by: Thomas Arimont
          
          ------------------------------------------------------------------------
          
          Index: channels/chan_sip.c
          ===================================================================
          --- channels/chan_sip.c	(revision 371336)
          +++ channels/chan_sip.c	(revision 371337)
          @@ -25445,8 +25445,12 @@
           			p->pendinginvite = 0;
           			acked = __sip_ack(p, seqno, 1 /* response */, 0);
           			if (find_sdp(req)) {
          -				if (process_sdp(p, req, SDP_T38_NONE))
          +				if (process_sdp(p, req, SDP_T38_NONE)) {
           					return -1;
          +				}
          +				if (ast_test_flag(&p->flags[0], SIP_DIRECT_MEDIA)) {
          +					ast_queue_control(p->owner, AST_CONTROL_SRCCHANGE);
          +				}
           			}
           			check_pendings(p);
           		} else if (p->glareinvite == seqno) {
          

          process_sdp is sprinkled with p->owner checks. This one wasn't.

          /me mumbles something about private AST-* issues.

          Show
          Walter Doekes added a comment - - edited I just noticed that this bug wasn't in my 10.7 version. It's relatively new: $ svn log -r 371337 ------------------------------------------------------------------------ r371337 | jrose | 2012-08-16 17:46:26 +0200 (do, 16 aug 2012) | 11 lines chan_sip: Trigger reinvite if the SDP answer is included in the SIP ACK Under certain conditions, a SIP transaction involving directmedia wouldn't trigger a re-invite because the SDP answer was included in an ACK instead of in a message that we would have triggered the invite with. This patch just queues a source change control frame if the dialog is using directmedia when we find sdp for an ACK. (closes issue AST-913) Reported by: Thomas Arimont ------------------------------------------------------------------------ Index: channels/chan_sip.c =================================================================== --- channels/chan_sip.c (revision 371336) +++ channels/chan_sip.c (revision 371337) @@ -25445,8 +25445,12 @@ p->pendinginvite = 0; acked = __sip_ack(p, seqno, 1 /* response */, 0); if (find_sdp(req)) { - if (process_sdp(p, req, SDP_T38_NONE)) + if (process_sdp(p, req, SDP_T38_NONE)) { return -1; + } + if (ast_test_flag(&p->flags[0], SIP_DIRECT_MEDIA)) { + ast_queue_control(p->owner, AST_CONTROL_SRCCHANGE); + } } check_pendings(p); } else if (p->glareinvite == seqno) { process_sdp is sprinkled with p->owner checks. This one wasn't. /me mumbles something about private AST-* issues.
          Hide
          Colin Cuthbertson added a comment -

          Thanks for your work on this.

          I ran your tests on two of my Asterisk servers and re-created the problem. I then applied the patch to each and was unable to re-create the segfault again.

          I will keep you posted.

          Show
          Colin Cuthbertson added a comment - Thanks for your work on this. I ran your tests on two of my Asterisk servers and re-created the problem. I then applied the patch to each and was unable to re-create the segfault again. I will keep you posted.
          Hide
          Matt Jordan added a comment -

          We've got everything lined up to get the security release out. These should go out today, 08/27.

          Show
          Matt Jordan added a comment - We've got everything lined up to get the security release out. These should go out today, 08/27.

            People

            • Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development