ASTERISK-18345

[patch] sips connection dropped by asterisk with a large INVITE

    Details

    • Type: Bug
    • Status: Open
    • Severity: Major
    • Resolution: Unresolved
    • Affects Version/s: SVN, 1.8.4, 11.4.0, 11.5.0
    • Target Release Version/s: None
    • Security Level: None
    • Labels:
      None
    • Frequency of Occurrence:
      Constant

      Description

      When using Jitsi (http://jitsi.org, Debian amd64 build) as a SIP-TLS extension, the SSL connection to Asterisk is dropped during registration (abnormally, though that seems to be due to ASTERISK-18342), and placing calls doesn't work.

      I first thought it was an SSL method issue, since Jitsi doesn't seem to support SSLv3 or TLSv1, and I was able to make it work with a MitM that proxied the connection through socat: Jitsi talked to socat fine, and socat talked to Asterisk fine.

      But it looks more like a timing/nondeterministic issue. I then looked at the code, added some logging, and found that the connection was closed because fgets() returned NULL in _sip_tcp_helper_thread().

      I then added logging to ssl_read() to see whether SSL_read() ever failed, but it doesn't, so I don't understand how that fgets() could return EOF/error in that case. After that, I had a hard time understanding the need_poll/after_poll business.

      If I understand correctly, tcptls_session->fd is the network socket that carries the encrypted data and other SSL out-of-band traffic, and it has been made non-blocking; tcptls_session->f is a funopen(tcptls_session->ssl, ssl_read, ssl_write, NULL, ssl_close) stream (or the fopencookie equivalent on Linux). The fd is polled before each fgets(), which eventually calls SSL_read(). That sounds to me like a recipe for catastrophe, deadlocks and the like, but I have to admit I have not fully understood the design.
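
      The layering described above can be sketched without OpenSSL. The following is a minimal, glibc-only illustration, not Asterisk code: mock_tls, mock_tls_read and demo_first_read_fails are hypothetical names, and the SIP request line is made up. It wraps a read callback in a FILE * via fopencookie() and shows that when the callback returns -1 once, fgets() returns NULL even though the data is available on the next attempt.

```c
#define _GNU_SOURCE /* fopencookie() is a glibc extension */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for the TLS layer: the first read reports
 * "nothing decoded yet" as -1, later reads deliver the payload. */
struct mock_tls {
    const char *payload; /* data that becomes available after the first read */
    size_t offset;       /* bytes already handed to stdio */
    int reads;           /* number of times the callback was invoked */
};

/* Read callback with the signature fopencookie() expects. */
static ssize_t mock_tls_read(void *cookie, char *buf, size_t size)
{
    struct mock_tls *t = cookie;
    size_t left;

    if (t->reads++ == 0) {
        return -1; /* analogous to returning -1 on a "want read" condition */
    }
    left = strlen(t->payload) - t->offset;
    if (left > size) {
        left = size;
    }
    memcpy(buf, t->payload + t->offset, left);
    t->offset += left;
    return (ssize_t)left; /* 0 means end of stream */
}

/* Returns 1 if the first fgets() fails and a second one,
 * issued after clearerr(), succeeds and fills `line`. */
static int demo_first_read_fails(char *line, int len)
{
    struct mock_tls t = { "INVITE sip:100@example.com SIP/2.0\r\n", 0, 0 };
    cookie_io_functions_t io = { .read = mock_tls_read };
    FILE *f = fopencookie(&t, "r", io);
    int first_failed, second_ok;

    assert(f != NULL);
    first_failed = (fgets(line, len, f) == NULL); /* callback returned -1 */
    clearerr(f);                                  /* reset the stream error flag */
    second_ok = (fgets(line, len, f) != NULL);    /* callback now has data */
    fclose(f);
    return first_failed && second_ok;
}
```

      On glibc, demo_first_read_fails(line, sizeof(line)) is expected to return 1: stdio has no notion of "try again later", so any -1 from the callback surfaces as a failed fgets().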

      I still don't get how fgets() can return NULL here, but I tried to take the need_poll/after_poll trick further with:

      @@ -2659,7 +2637,7 @@ static void *_sip_tcp_helper_thread(stru
                                       * TLS layer */
                                      if (!tcptls_session->ssl || need_poll) {
                                              need_poll = 0;
      -                                       after_poll = 1;
      +                                       after_poll++;
                                              res = ast_wait_for_input(tcptls_session->fd, timeout);
                                              if (res < 0) {
                                                      ast_debug(2, "SIP TCP server :: ast_wait_for_input returned %d\n", res);
      @@ -2674,7 +2654,7 @@ static void *_sip_tcp_helper_thread(stru
                                      ast_mutex_lock(&tcptls_session->lock);
                                      if (!fgets(buf, sizeof(buf), tcptls_session->f)) {
                                              ast_mutex_unlock(&tcptls_session->lock);
      -                                       if (after_poll) {
      +                                       if (after_poll > 1) {
                                                      goto cleanup;
                                              } else {
                                                      need_poll = 1;
      

      and it fixed the issue.

      So something is definitely wrong, though I couldn't tell exactly what.

      1. tls_read_fix_try1_1.8.11.1.diff (2 kB, Steve Davies)
      2. tls_read_fix_try2_1.8.11.1.diff (2 kB, Steve Davies)
      3. tls_read_fix_try3_1.8.11.1.diff (2 kB, Steve Davies)
      4. tls_read.patch (0.5 kB, Filip Jenicek)

          Activity

          Ben Chavet added a comment -

          I have been trying to chase this issue down for DAYS!

          My initial testing shows that the patch supplied here also fixes the issue for asterisk-11.4

          Deo added a comment -

          Thank you very much for the patch! I spent several FULL days trying to understand the nature of this bug, and the patch seems to work for my 11.5 installation on FreeBSD!

          +1 for getting it into an official release, and please don't forget to update the FreeBSD port.

          Tzafrir Cohen added a comment -

          I can reproduce this issue easily on Asterisk trunk and 11.5 on CentOS 5 (server) with Asterisk 11.5 on Debian 7 as the client.

          When I have 'allow=all' and a TLS connection, it blows up. When I either switch to TCP or use 'allow=alaw', all's well. The patch tls_read.patch applied well on trunk and seems to have fixed the issue.

          Alex Khokhlov added a comment - edited

          I also have this issue on my system, and it is 100% reproducible in the following environment:
          Server: CentOS 6.4, OpenSSL 1.0.0-27.el6_4.2, Asterisk 11.5.1
          Client: Counterpath Bria 2.3.6.61985, Android 4.3, Samsung Galaxy Nexus
          Steps to reproduce: connect with TLS enabled, enable all codecs in the client, register with the server and try to make a call.

          The server closes the connection because it gets SSL_ERROR_WANT_READ from OpenSSL and ssl_read() then immediately returns -1. However, SSL_ERROR_WANT_READ does not indicate a failure; it only means the read should be repeated once more data is available (see https://www.openssl.org/docs/ssl/SSL_read.html ).

          At the network level this happens because the TLS data is fragmented across TCP packets. Inspecting the traffic with Wireshark shows that the client sends one big TLS packet that is split into two or more TCP packets. The server receives the first TCP packet and immediately decides to close the connection (because of the -1 from ssl_read()). That happens just before the second packet reaches the server, at which point OpenSSL would have been able to decode the TLS packet.

          This is definitely a bug on the Asterisk side.
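
          The retry behaviour described above can be illustrated with plain POSIX descriptors. This is a sketch under assumptions, not the attached patch: EAGAIN on a non-blocking descriptor stands in for SSL_ERROR_WANT_READ, and make_nonblocking and read_with_retry are hypothetical helper names. The point is only that "no data yet" leads to waiting with poll() and retrying, not to closing the connection.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: switch a descriptor to non-blocking mode.
 * Returns 0 on success, -1 on error. */
static int make_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: read from a non-blocking fd, treating "no data yet"
 * as a reason to wait rather than as a fatal error.
 * Returns bytes read, 0 on end of stream, -1 on a real error or timeout
 * (errno is set to ETIMEDOUT in the timeout case). */
static ssize_t read_with_retry(int fd, void *buf, size_t len, int timeout_ms)
{
    assert(buf != NULL);
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0) {
            return n;               /* data, or 0 for end of stream */
        }
        if (errno == EINTR) {
            continue;               /* interrupted by a signal: just retry */
        }
        if (errno != EAGAIN && errno != EWOULDBLOCK) {
            return -1;              /* genuine failure */
        }

        /* Nothing to read yet (the analogue of "want read"): wait, then retry. */
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int res = poll(&pfd, 1, timeout_ms);
        if (res == 0) {
            errno = ETIMEDOUT;      /* peer sent nothing within the timeout */
            return -1;
        }
        if (res < 0 && errno != EINTR) {
            return -1;
        }
    }
}
```

          With OpenSSL, the corresponding check would presumably be made on the value SSL_get_error() reports after SSL_read(), but that wiring is left out here to keep the sketch self-contained.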

          Shlomi Gutman added a comment -

          I had problems with outgoing calls from snom with mandatory SRTP, using Asterisk 11.6.0 on Debian 7 (wheezy), kernel 3.2.0-4-amd64 (3.2.51-1 x86_64), with OpenSSL 1.0.1e-2.
          After applying the patch, the problem was resolved.

