Asterisk
  1. Asterisk
  2. ASTERISK-18345

[patch] sips connection dropped by asterisk with a large INVITE

    Details

    • Type: Bug Bug
    • Status: Open
    • Severity: Major Major
    • Resolution: Unresolved
    • Affects Version/s: SVN, 1.8.4, 11.4.0, 11.5.0
    • Target Release Version/s: None
    • Security Level: None
    • Labels:
      None
    • Frequency of Occurrence:
      Constant

      Description

      When using jitsi (http://jitsi.org) (debian amd64 one) as sip-tls extension, one can see the SSL connection to asterisk being dropped (abnormally, but that seems due to ASTERISK-18342) during the registration and placing calls don't work.

      I first thought it was a SSL method issue as jitsi doesn't seem to support SSLv3 or TLSv1 and I was able to make it work by using a MitM that proxied the connection through socat: jitsi was able to talk to socat OK and socat to asterisk OK.

      But it looks more like a timing/undeterministic issue. I then had a look at the code, added a little logging and found out that the connection was closed because of fgets() returning NULL in _sip_tcp_helper_thread().

      I then added logging to ssl_read() to see if SSL_read() ever failed, but it doesn't so I don't understand how that fgets could return eof/error. In that case. Then, I had a hard time understanding that business of need_poll/after_poll.

      If I understand correctly, tcptls_session->fd is the network socket that carries the encrypted data and other ssl out-of-band stuff and has been made non-blocking, and tcptls_session->f which is a funopen(tcptls_session->ssl, ssl_read, ssl_write, NULL, ssl_close) (or fopencookie Linux equivalent). polls are made on the fd before doing fgets that eventually call SSL_read. That sounds to me like a recipe for catastrophy, deadlocks and the like but I have to admit I have not understood/seen the design fully.

      I still don't get how fgets() can return NULL here but I tried to bring the need_poll/after_poll trick further by doing:

      @@ -2659,7 +2637,7 @@ static void *_sip_tcp_helper_thread(stru
                                       * TLS layer */
                                      if (!tcptls_session->ssl || need_poll) {
                                              need_poll = 0;
      -                                       after_poll = 1;
      +                                       after_poll++;
                                              res = ast_wait_for_input(tcptls_session->fd, timeout);
                                              if (res < 0) {
                                                      ast_debug(2, "SIP TCP server :: ast_wait_for_input returned %d\n", res);
      @@ -2674,7 +2654,7 @@ static void *_sip_tcp_helper_thread(stru
                                      ast_mutex_lock(&tcptls_session->lock);
                                      if (!fgets(buf, sizeof(buf), tcptls_session->f)) {
                                              ast_mutex_unlock(&tcptls_session->lock);
      -                                       if (after_poll) {
      +                                       if (after_poll > 1) {
                                                      goto cleanup;
                                              } else {
                                                      need_poll = 1;
      

      and it fixed the issue.

      So, there's something definitely wrong though I couldn't tell exactly what.

      1. tls_read_fix_try1_1.8.11.1.diff
        2 kB
        Steve Davies
      2. tls_read_fix_try2_1.8.11.1.diff
        2 kB
        Steve Davies
      3. tls_read_fix_try3_1.8.11.1.diff
        2 kB
        Steve Davies
      4. tls_read.patch
        0.5 kB
        Filip Jenicek
      5. tlsBigSDP.patch
        0.8 kB
        Alexander Traud
      6. tlsBigSDPdebug.patch
        2 kB
        Alexander Traud

        Issue Links

          Activity

          Hide
          Tzafrir Cohen added a comment -

          I can reproduce this issue easily on Asterisk trunk and 11.5 on Centos 5 (server) with Asterisk 11.5 on Debian 7 as the client.

          When I have 'allow=all' and a TLS connection, it blows up. When I either switch to TCP or use 'allow=alaw', all's well. The patch tls_read.patch applied well on trunk and seems to have fixed the issue.

          Show
          Tzafrir Cohen added a comment - I can reproduce this issue easily on Asterisk trunk and 11.5 on Centos 5 (server) with Asterisk 11.5 on Debian 7 as the client. When I have 'allow=all' and a TLS connection, it blows up. When I either switch to TCP or use 'allow=alaw', all's well. The patch tls_read.patch applied well on trunk and seems to have fixed the issue.
          Hide
          Alex Khokhlov added a comment - - edited

          I also have this issue in my system and it is 100% reproducable in the following environment:
          Server: CentOS 6.4, OpenSSL 1.0.0-27.el6_4.2, Asterisk 11.5.1
          Client: Counterpath Bria 2.3.6.61985, Android 4.3, Samsung Galaxy Nexus
          Ways to reproduce: connect with TLS enabled, enable all codecs in the client, register with the server and try to make a call.

          The connection is closed by the server side because it receives SSL_ERROR_WANT_READ from OpenSSL and then immediately returns -1 from ssl_read(). However, it does not actually signal the problem, it is merely a signal to repeat read (see https://www.openssl.org/docs/ssl/SSL_read.html ).

          On the network level that happens because of TCP packet fragmentation. Diagnosing packets using wireshark shows that client sends one big TLS packet that is fragmented into two or more TCP packets. The server receives the first TCP packet and immediately decides to close connection (because of -1 from ssl_read). That happens just before the second packet comes to the server side and OpenSSL is able to process/decode TLS packet.

          This is definitely a bug on the Asterisk side.

          Show
          Alex Khokhlov added a comment - - edited I also have this issue in my system and it is 100% reproducable in the following environment: Server: CentOS 6.4, OpenSSL 1.0.0-27.el6_4.2, Asterisk 11.5.1 Client: Counterpath Bria 2.3.6.61985, Android 4.3, Samsung Galaxy Nexus Ways to reproduce: connect with TLS enabled, enable all codecs in the client, register with the server and try to make a call. The connection is closed by the server side because it receives SSL_ERROR_WANT_READ from OpenSSL and then immediately returns -1 from ssl_read(). However, it does not actually signal the problem, it is merely a signal to repeat read (see https://www.openssl.org/docs/ssl/SSL_read.html ). On the network level that happens because of TCP packet fragmentation. Diagnosing packets using wireshark shows that client sends one big TLS packet that is fragmented into two or more TCP packets. The server receives the first TCP packet and immediately decides to close connection (because of -1 from ssl_read). That happens just before the second packet comes to the server side and OpenSSL is able to process/decode TLS packet. This is definitely a bug on the Asterisk side.
          Hide
          Shlomi Gutman added a comment -

          Had problems with outgoing calls from snom with SRTP mandatory asterisk 11.6.0 on Debian 7 (wheezy) 3.2.0-4-amd64(3.2.51-1 x86_64) with openssl 1.0.1e-2.
          After applying patch the problem was resolved.

          Show
          Shlomi Gutman added a comment - Had problems with outgoing calls from snom with SRTP mandatory asterisk 11.6.0 on Debian 7 (wheezy) 3.2.0-4-amd64(3.2.51-1 x86_64) with openssl 1.0.1e-2. After applying patch the problem was resolved.
          Hide
          Alexander Traud added a comment -

          This is a follow-up of ASTERISK-17753. My attached tlsBigSDP.patch enables Asterisk to poll a second time even on the first TLS-message part. Would be cool, if you guys could test it because that is a one-line patch.

          With some large SDP, a second poll is required on the first part of the message. The current code did not poll a second time because the variable need_poll was inited with yes (1). That poll was a no-operation because there was a socket event already (which mandates fgets without poll). In the current code, poll returned immediately, fgets returned NULL, after_poll was yes (1), sip_tls_read returned failed (-1), _sip_tcp_helper_thread went to cleanup, called ast_tcptls_close_session_file, which closes the TLS connection.

          Steps to reproduce

          1. Asterisk sip.conf with tlsenable=yes and tlscertfile set.
          2. CounterPath Bria Android Edition
          3. Bria » Settings » Accounts » your Asterisk account » Account Advanced
            • SIP transport: TLS
            • Encrypt Audio: Best Effort (not required, but increases SDP size)
          4. Bria » Settings » Advanced Settings » Audio Codecs » enable all
          5. start a call from Bria

          Expected Results
          The call should be established or rejected with an SIP status code.

          Results
          with both verbosity/debug set to 9, and SIP debugging enabled
          ERROR tcptls.c: SSL_shutdown() failed: 5

          Notes
          The same happens with CSipSimple if a lot of media codecs are enabled. With CounterPath Bria iPhone Edition—even if the SDP message is bigger—it does not happen. I did not find a difference in Wireshark (same TLS version, same cipher suite, no renegotiation around). To double-check we really have an Asterisk issue here, I created the attached debug-patch (tlsBigSDPdebug.patch): In the above cases, SSL_read returns SSL_ERROR_WANT_READ. This is allowed by the OpenSSL API at any time. Therefore, we have an Asterisk issue.

          Show
          Alexander Traud added a comment - This is a follow-up of ASTERISK-17753 . My attached tlsBigSDP.patch enables Asterisk to poll a second time even on the first TLS-message part. Would be cool, if you guys could test it because that is a one-line patch. With some large SDP, a second poll is required on the first part of the message. The current code did not poll a second time because the variable need_poll was inited with yes (1). That poll was a no-operation because there was a socket event already (which mandates fgets without poll). In the current code, poll returned immediately, fgets returned NULL, after_poll was yes (1), sip_tls_read returned failed (-1), _sip_tcp_helper_thread went to cleanup, called ast_tcptls_close_session_file, which closes the TLS connection. Steps to reproduce Asterisk sip.conf with tlsenable=yes and tlscertfile set. CounterPath Bria Android Edition Bria » Settings » Accounts » your Asterisk account » Account Advanced SIP transport: TLS Encrypt Audio: Best Effort (not required, but increases SDP size) Bria » Settings » Advanced Settings » Audio Codecs » enable all start a call from Bria Expected Results The call should be established or rejected with an SIP status code. Results with both verbosity/debug set to 9, and SIP debugging enabled ERROR tcptls.c: SSL_shutdown() failed: 5 Notes The same happens with CSipSimple if a lot of media codecs are enabled. With CounterPath Bria iPhone Edition —even if the SDP message is bigger—it does not happen. I did not find a difference in Wireshark (same TLS version, same cipher suite, no renegotiation around). To double-check we really have an Asterisk issue here, I created the attached debug-patch (tlsBigSDPdebug.patch): In the above cases, SSL_read returns SSL_ERROR_WANT_READ. This is allowed by the OpenSSL API at any time. Therefore, we have an Asterisk issue.
          Hide
          Michael Kuron added a comment -

          I've used the tlsBigSDP.patch for a few days now on my Asterisk 1.8.28.2 and it does seem to resolve the issue with my Snom phones and the Media5Phone iPhone app.

          Show
          Michael Kuron added a comment - I've used the tlsBigSDP.patch for a few days now on my Asterisk 1.8.28.2 and it does seem to resolve the issue with my Snom phones and the Media5Phone iPhone app.

            People

            • Watchers:
              14 Start watching this issue

              Dates

              • Created:
                Updated:

                Development