Asterisk
  1. Asterisk
  2. ASTERISK-18345

[patch] sips connection dropped by asterisk with a large INVITE

    Details

    • Type: Bug Bug
    • Status: Closed
    • Severity: Major Major
    • Resolution: Fixed
    • Affects Version/s: SVN, 1.8.4, 11.4.0, 11.5.0
    • Target Release Version/s: 1.8.30.0, 11.12.0, 12.5.0
    • Security Level: None
    • Labels:
      None
    • Frequency of Occurrence:
      Constant

      Description

      When using jitsi (http://jitsi.org) (debian amd64 one) as sip-tls extension, one can see the SSL connection to asterisk being dropped (abnormally, but that seems due to ASTERISK-18342) during the registration and placing calls don't work.

      I first thought it was a SSL method issue as jitsi doesn't seem to support SSLv3 or TLSv1 and I was able to make it work by using a MitM that proxied the connection through socat: jitsi was able to talk to socat OK and socat to asterisk OK.

      But it looks more like a timing/undeterministic issue. I then had a look at the code, added a little logging and found out that the connection was closed because of fgets() returning NULL in _sip_tcp_helper_thread().

      I then added logging to ssl_read() to see if SSL_read() ever failed, but it doesn't so I don't understand how that fgets could return eof/error. In that case. Then, I had a hard time understanding that business of need_poll/after_poll.

      If I understand correctly, tcptls_session->fd is the network socket that carries the encrypted data and other ssl out-of-band stuff and has been made non-blocking, and tcptls_session->f which is a funopen(tcptls_session->ssl, ssl_read, ssl_write, NULL, ssl_close) (or fopencookie Linux equivalent). polls are made on the fd before doing fgets that eventually call SSL_read. That sounds to me like a recipe for catastrophy, deadlocks and the like but I have to admit I have not understood/seen the design fully.

      I still don't get how fgets() can return NULL here but I tried to bring the need_poll/after_poll trick further by doing:

      @@ -2659,7 +2637,7 @@ static void *_sip_tcp_helper_thread(stru
                                       * TLS layer */
                                      if (!tcptls_session->ssl || need_poll) {
                                              need_poll = 0;
      -                                       after_poll = 1;
      +                                       after_poll++;
                                              res = ast_wait_for_input(tcptls_session->fd, timeout);
                                              if (res < 0) {
                                                      ast_debug(2, "SIP TCP server :: ast_wait_for_input returned %d\n", res);
      @@ -2674,7 +2654,7 @@ static void *_sip_tcp_helper_thread(stru
                                      ast_mutex_lock(&tcptls_session->lock);
                                      if (!fgets(buf, sizeof(buf), tcptls_session->f)) {
                                              ast_mutex_unlock(&tcptls_session->lock);
      -                                       if (after_poll) {
      +                                       if (after_poll > 1) {
                                                      goto cleanup;
                                              } else {
                                                      need_poll = 1;
      

      and it fixed the issue.

      So, there's something definitely wrong though I couldn't tell exactly what.

      1. tcptls_poll.diff
        7 kB
        Elazar Broad
      2. tcptls_pollv2.diff
        7 kB
        Elazar Broad
      3. tls_read_fix_try1_1.8.11.1.diff
        2 kB
        Steve Davies
      4. tls_read_fix_try2_1.8.11.1.diff
        2 kB
        Steve Davies
      5. tls_read_fix_try3_1.8.11.1.diff
        2 kB
        Steve Davies
      6. tls_read.patch
        0.5 kB
        Filip Jenicek
      7. tlsBigSDP.patch
        0.8 kB
        Alexander Traud
      8. tlsBigSDPdebug.patch
        2 kB
        Alexander Traud

        Issue Links

          Activity

          Hide
          Patrick Laimbock added a comment -

          I have used the tlsBigSDP.patch on Asterisk 11.10.2 with TLS/SRTP enforced on many hour+ long calls and I no longer see the issue in CSipSimple and both CSipSimple and Bria (on Android & iPhone) work fine. Kudos to Alexander Traud for finding the fix!

          Isn't it time tlsBigSDP.patch went up for review for inclusion in Asterisk 1.8 and 11?

          Show
          Patrick Laimbock added a comment - I have used the tlsBigSDP.patch on Asterisk 11.10.2 with TLS/SRTP enforced on many hour+ long calls and I no longer see the issue in CSipSimple and both CSipSimple and Bria (on Android & iPhone) work fine. Kudos to Alexander Traud for finding the fix! Isn't it time tlsBigSDP.patch went up for review for inclusion in Asterisk 1.8 and 11?
          Hide
          Matt Jordan added a comment -

          It did go up for review:

          https://reviewboard.asterisk.org/r/3653/

          Unfortunately, this patch does not fix the root cause of the problem. As mentioned on that review, there is an appropriate way to fix this that would fix it for good, instead of merely making the problem highly unlikely to occur.

          It would be great if a patch was put together that fixes the root cause of the problem as recommended on that code review.

          Show
          Matt Jordan added a comment - It did go up for review: https://reviewboard.asterisk.org/r/3653/ Unfortunately, this patch does not fix the root cause of the problem. As mentioned on that review, there is an appropriate way to fix this that would fix it for good, instead of merely making the problem highly unlikely to occur. It would be great if a patch was put together that fixes the root cause of the problem as recommended on that code review.
          Hide
          Elazar Broad added a comment -

          Hi All,
          It's been a while. I have run into this same issue and attached a patch(tcptls_poll.diff) against trunk which starts the work of replacing sip_tls_read/write() and resolves the issue(see the discussion on the previous reviewboard link). My only concern is the while loop which handles the EAGAIN error has the potential to block infinitely. This patch was tested with CSipSimple utilizing TLS(with full SIP headers, multiple codecs and SRTP enabled) and TCP.

          Thanks,
          Elazar

          Show
          Elazar Broad added a comment - Hi All, It's been a while. I have run into this same issue and attached a patch(tcptls_poll.diff) against trunk which starts the work of replacing sip_tls_read/write() and resolves the issue(see the discussion on the previous reviewboard link). My only concern is the while loop which handles the EAGAIN error has the potential to block infinitely. This patch was tested with CSipSimple utilizing TLS(with full SIP headers, multiple codecs and SRTP enabled) and TCP. Thanks, Elazar
          Hide
          Elazar Broad added a comment -

          This version removes the separate while loop for the read and instead continues the main while loop which already has the timeout implemented.

          Show
          Elazar Broad added a comment - This version removes the separate while loop for the read and instead continues the main while loop which already has the timeout implemented.
          Hide
          Elazar Broad added a comment -
          Show
          Elazar Broad added a comment - ReviewBoard Link: https://reviewboard.asterisk.org/r/3882/

            People

              Dates

              • Created:
                Updated:
                Resolved:

                Development