[Home]

Summary:ASTERISK-19003: asterisk closes TLS connection after receiving ACK for 401 following INVITE sent by Snom m9
Reporter:Gregor Schaffrath (kodomo)Labels:
Date Opened:2011-12-09 19:46:22.000-0600Date Closed:2014-08-07 20:05:29
Priority:MajorRegression?
Status:Closed/CompleteComponents:Channels/chan_sip/TCP-TLS
Versions:1.8.7.1 Frequency of
Occurrence
Constant
Related
Issues:
duplicatesASTERISK-18345 [patch] sips connection dropped by asterisk with a large INVITE
Environment:Debian squeeze amd64Attachments:( 0) Asterisk-TLS.zip
( 1) Asterisk-TLS.zip
( 2) relevant_bits.log
( 3) snom_fail_noansi.log
( 4) snom_fail.log
( 5) snom_fail.log
Description:Registering the Snom m9 via TLS works fine.
The initial INVITE triggers a 401 Unauthorized response from asterisk.
The received ACK is handled by handle_incoming, and then asterisk states that it did not find the CSeq used for all previous messages and disconnects without waiting for the new INVITE:

---
(last lines)
[Dec 10 00:08:33] ^[[1;32mDEBUG^[[0m[3292]: ^[[1;37mchan_sip.c^[[0m:^[[1;37m7835^[[0m ^[[1;37mfind_call^[[0m: = Looking for  Call ID: povyizfse8 (Checking From) --From tag 9fs5t8 --To-tag as791817a8
[Dec 10 00:08:33] ^[[1;32mDEBUG^[[0m[3292]: ^[[1;37mchan_sip.c^[[0m:^[[1;37m24471^[[0m ^[[1;37mhandle_incoming^[[0m: **** Received ACK (6) - Command in SIP ACK
[Dec 10 00:08:33] ^[[1;32mDEBUG^[[0m[3292]: ^[[1;37mchan_sip.c^[[0m:^[[1;37m4011^[[0m ^[[1;37m__sip_ack^[[0m: Stopping retransmission on 'povyizfse8' of Response 29123: Match Not Found
---

It seems as if asterisk just forgot about the CSeq after sending the 401?
The new INVITE sent immediately after the ACK never reaches asterisk, but merely triggers TCP RST packets from the server.
Comments:By: Gregor Schaffrath (kodomo) 2011-12-09 20:36:17.652-0600

log (asterisk verbose/debug + sip debug output ; tcpdump (tcp 5061) ; snom m9 log)

By: Gregor Schaffrath (kodomo) 2011-12-12 15:57:56.493-0600

log without annoying boxes

By: Gregor Schaffrath (kodomo) 2011-12-12 16:11:27.321-0600

important lines (referring to a 1-day-pastebin, but they may be giving a close enough hint to get a better orientation in the log):
22:45 < kodomo>  most is regular stuff... as described above... the snom registers... becomes
               'AVAILABLE'... and then the dialing starts... at line... 776
22:46 < kodomo>  asterisk prepares a response around 879 (401)
22:46 < kodomo>  the snom responds by ACK at 911
22:47 < kodomo>  asterisk acknowledges receipt of the ACK at 948  
22:47 < kodomo> ...and then ... closes the connection
22:47 < kodomo> as a result, the snom is deregistered after a timeout
22:47 < kodomo> and then reregisters
22:48 < kodomo>  starting from line 468, you can see the tcpdump
22:49 < kodomo>  it is the asterisk initiating a FIN sequence, and the INVITE msg. receives a RST
               reaction
22:49 < kodomo> so it really appears to be asterisk closing the connection
22:50 < kodomo> the snom immediately drops the 'registered' LED upon TLS teardown
22:50 < kodomo>  starting from line 1541, you can see the snom's side of the story (SIP dump)


By: Gregor Schaffrath (kodomo) 2011-12-13 20:11:43.495-0600

Quick check with svn trunk version: same result, although I did not see the 'Match Not Found' message per se.

Still, asterisk did not wait for the authenticated INVITE, but just closed the TLS connection.

Is anyone reading this, or am I just talking to myself?

By: Sean Allison (cakerack) 2011-12-14 00:21:37.860-0600

Hi Gregor

I am reading this as I have the exact same problem!  SNOM M9 with Asterisk 1.8.5.0 has the exact same symptoms as you describe.  Anyone in Asterisk have any comments on this?

Sean


By: Walter Doekes (wdoekes) 2011-12-14 02:06:11.496-0600

{noformat}
$ sed -e 's/^M\?^[[^Km]*[Km]//g' snom_fail.log > snom_fail_noansi.log
{noformat}

By: Walter Doekes (wdoekes) 2011-12-14 02:50:46.947-0600

Your times are a bit off. I believe this might match, but your logs aren't that easy to read like this.

(The snom log is even worse, with times jumping backwards and packets in the wrong order.)

It does indeed look like the TCP layer closes the connection (FIN and a RST) right after receiving the ACK.

However, nothing points to that asterisk has forgotten about the dialog. The dialog is still kept until after 6400ms.

{noformat}
[Dec 10 03:07:26] DEBUG[4113]: chan_sip.c:4011 __sip_ack: Stopping retransmission on 'kmb5046ts5' of Response 12631: Match Not Found
{noformat}

The dialog has already been matched (even pedantically).

As far as I can see, this is in __sip_ack() which is called by handle_incoming() which is called by handle_request_do() which is called by _sip_tcp_helper_thread().

So.. it should be in the infinite for(;;):

I see no reason why this should abort:

{noformat}
if (fds[1].revents) { ... }
{noformat}

So I see only candidates that abort in:

{noformat}
if (fds[0].revents) { ... }
{noformat}

And then those that "goto cleanup" without printing a DEBUG message.

Or did you strip those messages out?

- Could we be looking at an authentication timeout?
- Or after_poll being 1 at the wrong moment?

By: Gregor Schaffrath (kodomo) 2011-12-14 05:25:13.034-0600

Hi folks.

Yes, I'm sorry for the ANSI clutter, I tried to strip (most of) it in the updated log file.
The Snom logfile is in reverse temporal order.

If it matches the session, then I don't understand why it does not find the CSeq number... (the number in front of 'Match Not Found' always matches the CSeq number)
I concur about the location of the DEBUG message.

I checked for DEBUG messages above the 'goto cleanup' locations in the uncut logfile. None of the messages appear in the log.
The only messages I stripped out were clutter of other SIP connections going on with other peers.
It therefore should be one of the few gotos without a message.

As I see it, 'if (fds[1].revents) {}' has only the 'TCPTLS_ALERT_STOP' alert case candidate... but I don't see why or where such an alert would be set.

As for 'if (fds[0].revents) {}':
The authentication timeout for the INVITE seems to happen several seconds after the closing of the socket at 03:07:33 (__sip_autodestruct).

I'd therefore reckon that one of the locations where after_poll is checked is the one where the socket gets closed.


By: Gregor Schaffrath (kodomo) 2011-12-14 05:36:59.072-0600

I added a debug statement to the candidate locations, and it turns out that it's actually closing the socket at the first of the two 'if (after_poll) {}' sections:

[Dec 14 12:34:16] DEBUG[21813]: chan_sip.c:4014 __sip_ack: Stopping retransmission on 'amfrmhkhqx' of Response 3388: Match Not Found
[Dec 14 12:34:16] DEBUG[21813]: chan_sip.c:2658 _sip_tcp_helper_thread: grsch: 1st after_poll: after_poll=1 ; going to cleanup

(the one after 'if (!fgets(buf, sizeof(buf), tcptls_session->f)) {')


By: Gregor Schaffrath (kodomo) 2011-12-14 06:28:57.000-0600

Quickly browsing for possibly related reports, I found this one:
ASTERISK-18345
(unresolved... proposed workaround not working reliably according to the reporter)

Could this relate to our problem at hand?

By: Walter Doekes (wdoekes) 2011-12-14 06:34:39.636-0600

You're right. This is probably the same issue.

By: Stephane Chazelas (stephane.chazelas) 2011-12-14 07:54:56.296-0600

(I'm the reporter of that other issue and suggested the work around).

The work around has been working fine for us here for a while. I was just pointing out that it is not guaranteed to work in all circumstances. Asterisk design is flawed here (because it mixes interactions with the socket underneath SSL, SSL functions, non-block IO in a very dangerous way where one can't expect the behavior to be deterministic). The work around makes it even more dirty but at least makes it work for me.

You can safely try out the patch, it shouldn't have any negative impact.

By: Stephane Chazelas (stephane.chazelas) 2011-12-14 08:05:19.844-0600

I've not read this report fully, but I can summarise the issue in the other report I submitted.

Basically: fgets() (on the FILE that is a stdio wrapper around SSL library functions) returns with 0 (causing the connection to be torn down) even though a poll on the underlying socket had indicated that some data was available. However that could have been data at the socket level, but not at the SSL level (out of band data like alerts...). IIRC, there was already a work around to give it another chance. My work around is to give it yet another chance for actual data to arrive.

By: Gregor Schaffrath (kodomo) 2011-12-14 08:26:35.081-0600

Update: applying your patch manually got me one step farther: Now I see the authenticated INVITE on asterisk, and the SRTP setup starts.

I still don't get an SRTP connection, but this may be a configuration issue (or relate to strict firewall rules at my current location).
I hope to be able to test it again at home tonight. (Will give a new update, then)


By: Leif Madsen (lmadsen) 2011-12-14 12:30:57.088-0600

OK I've made this a duplicate of the issue that was originally marked as related.

By: Gregor Schaffrath (kodomo) 2011-12-14 15:58:30.172-0600

The workaround is working for me... so I guess it really is the same issue.

By: Thomas Arimont (tomaso) 2011-12-15 07:56:57.411-0600

I had the same problem too using a SNOM360, version 8.4.32. If the TCP segmentation border of the SNOM (SIP Invite is TCP segmented) was reached, I couldn't place any TLS/SRTP calls from the phone to asterisk anymore. I've found this out by re-configuring the audio codecs on the SNOM to reduce the SIP Invite message length.
Fortunately the patch from ASTERISK-18345 did solve the problem for me too ...

By: Michael Kuron (mkuron) 2012-03-09 03:51:11.208-0600

I'm still having this issue on Asterisk 1.8.10.0 and a Snom 870 with firmware 8.4.35. Thomas' workaround (reducing the number of codecs so the INVITE message is shorter) works fine for me.
I did some Wireshark analysis: My problem is indeed that the TLS connection gets closed after the ACK and before the authenticated INVITE is made.

By: Pietro Bertera (pbertera) 2012-04-26 08:26:42.708-0500

Same issue with Asterisk 1.8.11.1, snom820 and snom720 with firmware 8.7.3.7
The workaround to remove audio codecs works done.


By: Pietro Bertera (pbertera) 2012-04-26 08:39:03.360-0500

Pcap trace between asterisk and snom phone.
At packet 29 Asterisk closes the connection.

In the attachment you can find the the TLS private key.

By: Steve Davies (one47) 2012-05-11 11:12:16.658-0500

Uploaded experimental patch against #ASTERISK-18345, that might apply to issues seen here. Feel free to give it a go in case it works.

By: Pietro Bertera (pbertera) 2012-09-19 05:17:10.850-0500

Another workaround: in order to decrease the size of SIP request you can also enable "Compact Headers" in snom phones: http://wiki.snom.com/wiki/index.php/Settings/short_form