Summary: ASTERISK-27347: [patch] pjproject_bundled: Disable TCP/TLS keep-alives.
Reporter: Alexander Traud (traud)
Labels: patch pjsip
Date Opened: 2017-10-16 03:04:33
Date Closed: 2018-07-13 07:23:04
Priority: Major
Regression?:
Status: Closed/Complete
Components: Resources/res_pjsip_keepalive
Versions: 13.17.2 14.6.2 15.0.0
Frequency of Occurrence:
Related Issues:
is related to ASTERISK-24644 res_pjsip_keepalive: Add keepalive module for connection-oriented transports.
is related to ASTERISK-26686 res_pjsip: Lock inversion in transport management
Environment:
Attachments:
( 0) pjsip_keep_alive.diff
( 1) pjsip_keep_not_alive.patch
Description: PJSIP tries to keep TCP (and TLS) based SIP connections open. For this, a double-CRLF is sent every 90 seconds, even if PJSIP is the User-Agent Server (UAS). Many User-Agent Clients (UAC) do not support this, especially when it arrives in the middle of a transaction, although support is mandated by [RFC 3261 Section 7.5|https://tools.ietf.org/html/rfc5626#section-3.5.1].
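
To make the wire behaviour concrete, here is a minimal sketch of how a receiver on a stream transport could tolerate such a keep-alive: it skips any CRLF that precedes the start-line of the next SIP message. The function name {{split_keep_alives}} and the sample bytes are hypothetical and only serve as illustration; this is not code from PJSIP, Asterisk, or any phone firmware.

```python
from typing import Tuple

KEEP_ALIVE = b"\r\n\r\n"  # the double-CRLF keep-alive described above


def split_keep_alives(stream: bytes) -> Tuple[int, bytes]:
    """Count and drop CRLF sequences that precede a SIP start-line.

    Hypothetical helper for illustration only.
    Returns (number_of_leading_crlfs, remaining_bytes).
    """
    count = 0
    while stream.startswith(b"\r\n"):
        stream = stream[2:]
        count += 1
    return count, stream


# A keep-alive that arrives on the same TCP stream right before a SIP message.
incoming = KEEP_ALIVE + b"SIP/2.0 407 Proxy Authentication Required\r\n"
crlfs, rest = split_keep_alives(incoming)
print(crlfs, rest.split(b"\r\n", 1)[0].decode())
```

A peer that feeds the raw stream straight into its message parser, without a step like this, would see the empty lines as a malformed message, which matches the kind of failure described in this report.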

For example, my Gigaset DE900 IP Pro re-registers every 90 seconds. The Gigaset sends a REGISTER, gets a Proxy-Authentication-Required, and then receives the keep-alive message. That halted the SIP stack of the Gigaset, the Gigaset closed the underlying TCP connection, and the whole Gigaset had to be restarted to be usable again.

In PJProject, this keep-alive mechanism can be disabled (only) at compile time, like {{CFLAGS="-DNDEBUG=1 -DPJ_HAS_IPV6=1 -DPJSIP_TCP_KEEP_ALIVE_INTERVAL=0 -DPJSIP_TLS_KEEP_ALIVE_INTERVAL=0" ./configure --enable-shared}}

Since version 13.2.0 (ASTERISK-24644, Commit [915bb88|http://github.com/asterisk/asterisk/commit/915bb88d3e973f647eb9d9e560688d6a02af2c2a]), Asterisk replaced this compile-time feature with the runtime setting {{keep_alive_interval}}, which can be changed via the configuration file {{pjsip.conf}}. By default, this setting is zero, which means off. However, to make this work, PJProject must be compiled without its own keep-alive mechanism. This was never mentioned in the [Asterisk Wiki|http://wiki.asterisk.org/wiki/display/AST/PJSIP-pjproject#PJSIP-pjproject-externalBuildingandInstallingpjprojectfromSource].

Since version 13.8.0 (Commit [b59956a|http://github.com/asterisk/asterisk/commit/b59956a875817367834431e7f1fa02486b5aed7f]), Asterisk allows {{./configure --with-pjproject-bundled}}, which takes all the required DEFINEs and flags from the Wiki and sets them automatically. Again, the keep-alive mechanism of PJProject is not disabled.

The attached patch rectifies these omissions for the bundled PJProject. Hopefully somebody with write-access adds those DEFINEs to the Asterisk Wiki as well.

Comments:
By: Ross Beer (rossbeer) 2017-10-16 03:36:24.088-0500

There is also a related issue with PJSIP 'keep_alive_interval' which causes asterisk to deadlock (ASTERISK-26686).

Would the PJSIP keep alive mechanism cause this lock due to a conflict?

By: Alexander Traud (traud) 2017-10-16 13:11:57.662-0500

Ross, I cannot comment on that because I have no overview of the threads/locks in that case. I found this issue thanks to a Wireshark trace. My only goal was to disable that unwanted double-CRLF, which requires a re-compilation of the PJProject libraries.

By: Ian Gilmour (tuxian) 2017-10-17 02:37:17.273-0500

Alexander suggested I try adding his patch to see if it improved the TLS problems I reported in ASTERISK-27001 (I still see very occasional TLS port closures and reopens even with my ASTERISK-27001 patch applied).

With Alexander's patch (and my own) I still see TLS errors being reported, and the TLS connection being closed by Asterisk and reopened on another port. Test conditions were similar to those described in ASTERISK-27001. I ran 40 concurrent SIPp generated calls of varying duration, ~40,000 SIPp calls in total. During the test I saw 1 x TLS port change (with an "ssl3_read_bytes-sslv3 alert bad record mac" error being reported). This is similar to what I see without Alexander's patch applied, so I don't think this issue is related to my own.

Note: in my tests I have pjsip.conf keep_alive_interval set to 20 seconds.


By: Alexander Traud (traud) 2017-10-19 06:20:31.456-0500

Ian, thanks for reporting. It could have been the cause for you as well; it was the cause in my scenario. By the way, just out of curiosity, why do you use keep_alive_interval? Asterisk 13.18 is the first version which allows me to run long-term tests (because a NAT/dynamic address issue was fixed) with chan_pjsip. So I am a newbie when it comes to long-term experience with chan_pjsip.

By: Sergej Kasumovic (sergej) 2018-01-04 03:54:16.762-0600

We used the Asterisk option keep_alive_interval; however, as per the comment above (ASTERISK-26686), Asterisk will deadlock because of the way the code is written.

As per this ticket, pjproject already has the same mechanism, and in addition to the compile-time option it can be configured at run time as well:
https://trac.pjsip.org/repos/ticket/1851
http://www.pjsip.org/pjsip/docs/html/group__PJSIP__CONFIG.htm#ga02217f4919a7c575d71eed407be63d04

Hence I am attaching a quick patch which exports two new options in the type=system section:
[system]
type=system
tcp_keep_alive=90
tls_keep_alive=90

By default each option is set to 90 seconds, and the options cannot be changed on 'reload'.
You may wish to try it as well.

By: Friendly Automation (friendly-automation) 2018-07-12 18:27:16.637-0500

Change 9383 merged by Jenkins2:
Bundled PJPROJECT: Disable internal connection oriented keep-alive.

[https://gerrit.asterisk.org/9383|https://gerrit.asterisk.org/9383]

By: Friendly Automation (friendly-automation) 2018-07-12 18:27:18.383-0500

Change 9384 merged by Jenkins2:
Bundled PJPROJECT: Disable internal connection oriented keep-alive.

[https://gerrit.asterisk.org/9384|https://gerrit.asterisk.org/9384]

By: Friendly Automation (friendly-automation) 2018-07-13 13:13:08.104-0500

Change 9385 merged by George Joseph:
Bundled PJPROJECT: Disable internal connection oriented keep-alive.

[https://gerrit.asterisk.org/9385|https://gerrit.asterisk.org/9385]

By: Friendly Automation (friendly-automation) 2018-08-28 11:58:54.548-0500

Change 10004 merged by Kevin Harwell:
Bundled PJPROJECT: Disable internal connection oriented keep-alive.

[https://gerrit.asterisk.org/10004|https://gerrit.asterisk.org/10004]