Asterisk
  1. Asterisk
  2. ASTERISK-19455

SIP channels permanently stuck in system after BYE message received

    Details

    • Type: Bug Bug
    • Status: Closed
    • Severity: Major Major
    • Resolution: Fixed
    • Affects Version/s: 10.1.0, 10.1.1, 10.1.2, 10.1.3
    • Target Release Version/s: None
    • Security Level: None
    • Labels:
      None
    • Environment:
      Linux: 2.6.32-220.4.1.el6.x86_64 #1 SMP Tue Jan 24 02:13:44 GMT 2012 x86_64
    • Frequency of Occurrence:
      Constant
    • Regression:
      No

      Description

      We are seeing some percentage of our SIP channels stuck in the system after BYE is received. We see the following in our logs:

      [Mar 1 11:11:23] WARNING[16927]: chan_sip.c:3879 __sip_autodestruct: Autodestruct on dialog '4f59fe4879706230286df93b21335ea9@216.40.252.146' with owner in place (Method: BYE)
      [Mar 1 11:11:24] WARNING[16927]: chan_sip.c:3879 __sip_autodestruct: Autodestruct on dialog '7edfb438409765196966d7550515d8d3@76.164.225.138' with owner in place (Method: BYE)
      [Mar 1 11:11:24] WARNING[16927]: chan_sip.c:3879 __sip_autodestruct: Autodestruct on dialog '0e1e400609c5925b292ad8ae10fe9942@216.40.252.146' with owner in place (Method: BYE)

      The channels do NOT clear. Asterisk must be totally restarted. If you do a sip show channels in the CLI you see lines that look like this:

      76.164.225.138 XXXXXXXXXX 16e63b231707cb9 (nothing) No Rx: BYE p3
      76.164.225.138 XXXXXXXXXX 569511d04c78b4d (nothing) No Rx: BYE p3

      If you inspect a specific stuck channel with sip show channel you see this:

      • SIP Call
        Curr. trans. direction: Outgoing
        Call-ID: 569511d04c78b4d319fb2914183bc17a@76.164.225.138
        Owner channel ID: <none>
        Our Codec Capability: (ulaw)
        Non-Codec Capability (DTMF): 0
        Their Codec Capability: (gsm|ulaw|alaw)
        Joint Codec Capability: (ulaw)
        Format: (nothing)
        T.38 support No
        Video support No
        MaxCallBR: 384 kbps
        Theoretical Address: XX.XX.XX.XX:5060
        Received Address: XX.XX.XX.XX:5060
        SIP Transfer mode: open
        Force rport: Yes
        Audio IP: XX.XX.XX.XX (local)
        Our Tag: as2f5a8f37
        Their Tag: as65be27db
        SIP User agent: Asterisk PBX 1.6.1.20
        Peername: p3
        Original uri: sip:XXXXXXXXXX@XX.XX.XX.XX
        Caller-ID: XXXXXXXXXX
        Need Destroy: No
        Last Message: Rx: BYE
        Promiscuous Redir: No
        Route: sip:XXXXXXXXXX@XX.XX.XX.XX
        DTMF Mode: inband
        SIP Options: replaces replace timer
        Session-Timer: Inactive

      We do NOT have a hangup extension in our dialplan. It is incredibly simple. Here it is:

      exten => _cXX#.,1,Answer()
      exten => _cXX#.,n,AGI(confagi.php,$

      {EXTEN}

      )
      exten => _cXX#.,n,ConfBridge($

      {INSTANCE}

      ,default_bridge,default_user,$

      {MENU_NAME}

      )
      exten => _cXX#.,n,Hangup()

      The AGI script is ALWAYS already complete when this issue happens. It only takes about 300ms to run. The issue occurs when the user hangs up while in the ConfBridge app.

      In the log I've uploaded, you can see the entire situation as regards the following SIP channel: 7fa1b98e207951a85328a51c7d539e58

      The file had a zillion copies of the following two messages, which I redacted for readability:

      [Mar 1 11:27:06] DEBUG[2668] bridging.c: Going into a multithreaded waitfor for bridge channel 0x7f39d50388c8 of bridge 0x7f39d40d0878

      and

      [Mar 1 11:27:06] DEBUG[2934] audiohook.c: Failed to get 160 samples from read factory 0x7f39d41fc2b8
      [Mar 1 11:27:06] DEBUG[2934] audiohook.c: Failed to get 160 samples from write factory 0x7f39d41fcdf8
      [Mar 1 11:27:06] DEBUG[2934] audiohook.c: Read factory 0x7f39d41fc2b8 and write factory 0x7f39d41fcdf8 both fail to provide 160 samples

      One thing I noticed is that the autodestruct time for this channel gets set to 6400ms. I had thought it was supposed to be 32000ms. Maybe that's the problem?

      1. 13034709861-filter.txt
        20 kB
        Dan McKeehan
      2. errorlog.gz
        333 kB
        Dean Vesvuio

        Issue Links

          Activity

          Hide
          Dan McKeehan added a comment - - edited

          Added an example of the logs from my stuck calls.

          Show
          Dan McKeehan added a comment - - edited Added an example of the logs from my stuck calls.
          Hide
          Dean Vesvuio added a comment -

          @Mark

          I will apply the patch tonight and update you tomorrow midday.

          Thanks!!!!

          Show
          Dean Vesvuio added a comment - @Mark I will apply the patch tonight and update you tomorrow midday. Thanks!!!!
          Hide
          Dean Vesvuio added a comment -

          That fixed it! Many thanks! You should definitely try to get that patch put into 10.5.0 if possible. We had zero stuck calls today on the server where we deployed it (down from dozens).

          Show
          Dean Vesvuio added a comment - That fixed it! Many thanks! You should definitely try to get that patch put into 10.5.0 if possible. We had zero stuck calls today on the server where we deployed it (down from dozens).
          Hide
          Mark Michelson added a comment -

          Okay, great to hear. I'm going to commit the patch and get this and ASTERISK-19425 closed.

          @Dan: It's hard to see much from the debug you published. The message

          [May 1 13:10:20] DEBUG[9170] channel.c: Didn't get a frame from channel: SIP/2.2.2.84-00027be3

          means that the bridge detected the hangup on the SIP channel properly. However, it's hard to tell what happened beyond that, if anything. The fact that the channel still showed up in your "core show channel SIP/2.2.2.84-00027be3" command indicates the Asterisk channel is still in the system. It's not just the SIP dialog that got stuck. I suspect that there is either

          1) A channel reference leak
          2) A deadlock

          that is preventing the channel from being freed properly. If I were a betting man, I'd guess that the use of chan_local has something to do with the issue.

          I suggest that you open a new bug report since it has turned out this issue is not directly related to yours. On that issue, please upload a full log, not just selected bits. Also be sure to indicate the version of Asterisk you are using so that the problem can be more easily reproduced if possible.

          Show
          Mark Michelson added a comment - Okay, great to hear. I'm going to commit the patch and get this and ASTERISK-19425 closed. @Dan: It's hard to see much from the debug you published. The message [May 1 13:10:20] DEBUG [9170] channel.c: Didn't get a frame from channel: SIP/2.2.2.84-00027be3 means that the bridge detected the hangup on the SIP channel properly. However, it's hard to tell what happened beyond that, if anything. The fact that the channel still showed up in your "core show channel SIP/2.2.2.84-00027be3" command indicates the Asterisk channel is still in the system. It's not just the SIP dialog that got stuck. I suspect that there is either 1) A channel reference leak 2) A deadlock that is preventing the channel from being freed properly. If I were a betting man, I'd guess that the use of chan_local has something to do with the issue. I suggest that you open a new bug report since it has turned out this issue is not directly related to yours. On that issue, please upload a full log, not just selected bits. Also be sure to indicate the version of Asterisk you are using so that the problem can be more easily reproduced if possible.
          Hide
          Virtua Klem added a comment -

          Hi,

          I have the same problem with 10.5.1 versions.

          And i can't apply the patch.

          Thanks

          Show
          Virtua Klem added a comment - Hi, I have the same problem with 10.5.1 versions. And i can't apply the patch. Thanks

            People

            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development