Asterisk issue ASTERISK-25468

Deadlock in chan_sip - core show locks shows do_monitor lock

    Details

    • Type: Bug
    • Status: Closed
    • Severity: Major
    • Resolution: Fixed
    • Affects Version/s: 13.6.0
    • Target Release Version/s: 13.12.0, 14.1.0, 15.0.0
    • Security Level: None
    • Labels:
      None
    • Environment:
      Debian 8 and Ubuntu 14.04.3.
      Asterisk 13.6.0 (latest), built from Git.
      Realtime using ODBC/MySQL.
    • Regression:
      Yes

      Description

      I am trying to bring a new server into an existing cluster of Asterisk boxes, and I keep hitting the same problem.

      All servers sit behind a Kamailio load balancer. When I add the new server to the dispatcher group, Kamailio starts sending REGISTER and SUBSCRIBE requests to it. After a few minutes chan_sip hangs and stops processing traffic entirely. Nothing shows up in the logs, and Asterisk itself keeps running. I can see the incoming SIP packets with sngrep, but Asterisk does not see them.

      I have tried this on KVM and OpenVZ virtual servers as well as on physical servers, running both Debian 8 and Ubuntu 14.04, with exactly the same results.

      When chan_sip freezes, 'core show locks' shows the following every time:

      =======================================================================
      === GIT-13-f8707ae
      === Currently Held Locks
      =======================================================================
      ===
      === <pending> <lock#> (<file>): <lock type> <line num> <function> <lock name> <lock addr> (times locked)
      ===
      === Thread ID: 0x7fba21a4c700 LWP:13423 (do_monitor           started at [28932] chan_sip.c restart_monitor())
      === ---> Lock #0 (chan_sip.c): MUTEX 28903 do_monitor &monlock 0x7fba319054a0 (1)
              main/backtrace.c:59 __ast_bt_get_addresses() (0x46777f+1D)
              main/lock.c:258 __ast_pthread_mutex_lock() (0x5379ef+C7)
              channels/chan_sip.c:28904 do_monitor()
              main/utils.c:1237 dummy_start()
              :0 start_thread()
              libc.so.6 clone() (0x7fbab9e25410+6D)
      === -------------------------------------------------------------------
      

      There is no core dump when this happens. SIP simply stops responding, peers do not expire, and so on.
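      The lock report only shows that the do_monitor thread holds &monlock; it does not show what that thread is itself waiting for. One common way for a thread to stay parked indefinitely while holding a lock is a lock-ordering (ABBA) deadlock, where two threads take the same two locks in opposite orders. The sketch below assumes that scenario purely for illustration and is not a diagnosis of this bug. It uses Python's threading.Lock as a stand-in for the pthread mutexes Asterisk uses in C; other_lock and the function names are hypothetical. If, as the sngrep observation suggests, do_monitor is the thread that services the SIP socket, a monitor thread stuck this way would match packets arriving on the wire while chan_sip never processes them.

```python
import threading


def lock_is_stuck(lock, timeout):
    """Return True if `lock` cannot be acquired within `timeout` seconds."""
    if lock.acquire(timeout=timeout):
        lock.release()  # it was free; hand it straight back
        return False
    return True


def run_abba_demo(timeout=1.0):
    """Reproduce an ABBA deadlock and report whether monlock is stuck.

    `monlock` mirrors the lock name in the report; `other_lock` is a
    hypothetical second lock standing in for whatever the holder waits on.
    """
    monlock = threading.Lock()
    other_lock = threading.Lock()
    # Three parties: both threads plus this checker. Once the barrier trips,
    # each thread is known to hold its first lock.
    first_locks_held = threading.Barrier(3)

    def monitor():
        with monlock:                # takes monlock first ...
            first_locks_held.wait()
            with other_lock:         # ... then blocks forever on other_lock
                pass

    def worker():
        with other_lock:             # takes the locks in the opposite order
            first_locks_held.wait()
            with monlock:            # blocks forever on monlock
                pass

    for target in (monitor, worker):
        # Daemon threads, so the deadlocked pair does not keep the process alive.
        threading.Thread(target=target, daemon=True).start()

    first_locks_held.wait()
    return lock_is_stuck(monlock, timeout)


if __name__ == "__main__":
    print("monlock stuck:", run_abba_demo())
```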

      I have managed to get a gdb backtrace from the running process using

      gdb -ex "thread apply all bt" --batch /usr/sbin/asterisk <pid>
      

      Hopefully that will give some clue. I will upload it as an attachment.
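      The attached backtraces run to roughly 150 kB each, so filtering them can help. The helper below is a hedged sketch, not an Asterisk tool: build_gdb_command mirrors the gdb invocation above, and threads_waiting_on_mutex assumes gdb's usual "Thread N (Thread 0x... (LWP n) ...)" header line. It treats any frame containing __lll_lock_wait or pthread_mutex_lock (which also matches Asterisk's __ast_pthread_mutex_lock wrapper) as a thread blocked on a mutex. All function names are hypothetical.

```python
import re
import subprocess


def build_gdb_command(pid, binary="/usr/sbin/asterisk"):
    """Hypothetical helper: mirrors the gdb command quoted in the report."""
    return ["gdb", "-ex", "thread apply all bt", "--batch", binary, str(pid)]


def capture_backtrace(pid):
    """Run gdb against a live process and return its output (needs gdb; usually root)."""
    result = subprocess.run(build_gdb_command(pid), capture_output=True, text=True, check=True)
    return result.stdout


# Assumed header format, e.g. "Thread 5 (Thread 0x7fba21a4c700 (LWP 13423)):".
# Newer gdb versions may append a thread name after the LWP, so only the prefix is matched.
THREAD_HEADER = re.compile(r"^Thread \d+ \(Thread 0x[0-9a-fA-F]+ \(LWP (\d+)\)", re.MULTILINE)

# Frame substrings that usually mean "blocked acquiring a mutex".
MUTEX_WAIT_MARKERS = ("__lll_lock_wait", "pthread_mutex_lock")


def threads_waiting_on_mutex(bt_text):
    """Return the LWP ids whose backtrace contains a mutex-wait frame."""
    headers = list(THREAD_HEADER.finditer(bt_text))
    waiting = []
    for i, header in enumerate(headers):
        # This thread's frames run until the next header (or the end of the text).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(bt_text)
        frames = bt_text[header.end():end]
        if any(marker in frames for marker in MUTEX_WAIT_MARKERS):
            waiting.append(int(header.group(1)))
    return waiting
```

      For example, threads_waiting_on_mutex(capture_backtrace(pid)) returns LWP ids, which correspond to the LWP field shown by 'core show locks' (LWP:13423 above), so the two reports can be cross-referenced.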

      Any help much appreciated.

      Attachments

      1. ast-13.5.0-gdb.txt (147 kB, Barry Flanagan)
      2. ASTERISK-25468_gdb-output.txt (144 kB, Barry Flanagan)
      3. backtrace-threads.txt (146 kB, Leandro Dardini)
      4. backtrace-threads-13.10.0-rc2.txt (50 kB, Leandro Dardini)
      5. backtrace-threads1409.txt (150 kB, Antonis Psaras)
      6. core-show-locks.txt (26 kB, Leandro Dardini)
      7. coreshowlocks-13.10.0-rc2.txt (3 kB, Leandro Dardini)
      8. core-show-locks-1409.txt (0.9 kB, Antonis Psaras)
      9. full-20160914.txt (3.34 MB, Antonis Psaras)
      10. full-log.txt.gz (371 kB, Barry Flanagan)


          Activity

          Leandro Dardini added a comment (edited)

          Backtrace for a deadlock on Asterisk 13.10.0-rc2. This backtrace and the "core show locks" output were captured while running the FollowMe application with multiple channels. The same dialplan and setup work perfectly on Asterisk 13.2.0.

          Joshua Colp added a comment

          Leandro Dardini: your issue appears to be separate and is actually ODBC-related. It should be filed as a separate issue.

          Joshua Colp added a comment

          Leandro Dardini: the complete console output (with debug output sent to the console in logger.conf and 'core set debug 3' enabled) and your configuration would also be useful on the new issue.
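          As a hedged example of the setup requested above, a logger.conf fragment along these lines sends debug messages to the console. Only the console entry is shown, and the exact level list is an assumption about what is useful rather than a requirement from this issue.

```ini
[logfiles]
; Log debug output (plus the usual levels) to the Asterisk console.
console => notice,warning,error,debug,verbose
; Then, at the Asterisk CLI:
;   logger reload
;   core set debug 3
```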

          Antonis Psaras added a comment

          We have the same issue on Asterisk 13.11.
          Please find the backtrace and full log attached.

          George Joseph added a comment

          I've submitted a patch that should fix this issue.

          https://gerrit.asterisk.org/#/c/3962/1

          If someone could try the patch on 13.11 or the current 13 Git branch and provide feedback, that would be great.
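          Gerrit publishes every patch set under a ref of the form refs/changes/<last two digits of the change>/<change>/<patch set>, so change 3962, patch set 1 maps to refs/changes/62/3962/1. The sketch below builds that ref and the git commands to fetch it and cherry-pick it onto an existing checkout (for example 13.11 or the 13 branch). The remote is deliberately left as a placeholder because the comment does not name one, and the function names are hypothetical.

```python
def gerrit_change_ref(change, patchset):
    """Gerrit ref for a patch set: refs/changes/<last two digits>/<change>/<patchset>."""
    return f"refs/changes/{change % 100:02d}/{change}/{patchset}"


def apply_patch_commands(remote, change, patchset):
    """git commands to fetch a Gerrit patch set and apply it to the current checkout.

    `remote` is a placeholder: the issue does not name the Gerrit remote.
    """
    return [
        ["git", "fetch", remote, gerrit_change_ref(change, patchset)],
        ["git", "cherry-pick", "FETCH_HEAD"],
    ]


if __name__ == "__main__":
    # Change 3962, patch set 1, as linked in the comment above.
    for cmd in apply_patch_commands("<gerrit-remote>", 3962, 1):
        print(" ".join(cmd))
```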


            People

            • Votes: 0
            • Watchers: 18
