
Summary: ASTERISK-17753: [patch] [regression] Asterisk drops sip messages and/or response codes if SIP/TLS is used
Reporter: Stefan Tichy (st)
Labels:
Date Opened: 2011-04-26 10:11:19
Date Closed: 2011-05-23 09:29:39
Priority: Blocker
Regression?: Yes
Status: Closed/Complete
Components: Channels/chan_sip/TCP-TLS
Versions: 1.8.3
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) chan_sip-TLS_20110516.patch
( 1) chan_sip-TLS.patch
( 2) extensions.conf
( 3) sipp-test
( 4) ssl-poll-fix1.diff
( 5) ssl-poll-fix2.diff
( 6) ssl-poll-fix3.diff
Description: When a Snom 360 (firmware 7.3.7) tries to register, it opens a new TCP connection. Sometimes the Asterisk CLI shows the REGISTER and the 401 response, but the phone does not get a response. Sometimes the CLI shows nothing, but the phone logs many attempts to register. It's strange and not easy to reproduce with the phone, but sipp makes it reproducible.



****** ADDITIONAL INFORMATION ******

sipp -d 5000  -trace_err -rtp_echo -sn uac -t l1 -r 1 'host15:5061'

Just run this command and check the sipp output.

Comments:

By: David Hajek (hajekd) 2011-04-26 17:24:17

Not sure if it's related, but our TLS phones are not able to register after upgrading to 1.8.3.3. After downgrading to 1.8.3.2 everything is back to normal. We even tried 1.8.3.4rc, but it has the same TLS issues. Something broke in TLS after 1.8.3.2.

By: bas (bas) 2011-05-02 09:28:36

Yes, I can confirm that TLS is broken from 1.8.3.3 onward.
1.8.3.2 works ok.

By: Leif Madsen (lmadsen) 2011-05-05 08:19:45

st: can you confirm if things work for you on 1.8.3.2?

By: Stefan Tichy (st) 2011-05-06 11:38:01

Yes it works. There are some Snom 3X0 connected since 1.8.3.2 was released and the simple sipp test works without problems.

Some patches have been applied to Asterisk 1.8.3.2

Issue 17544:

https://issues.asterisk.org/file_download.php?file_id=29266&type=bug

https://issues.asterisk.org/file_download.php?file_id=26709&type=bug

Connection to PostgreSQL without encryption

Others not related to TCP or encryption

By: Franco Lanza (nextime) 2011-05-08 03:37:27

I also have the same issue, using the csipsimple (based on pjsip) Android client.

While trying to debug the issue I discovered something strange: it seems that * answers
REGISTER requests with data from "older" requests.

For example, I send a REGISTER request with CSeq 1, and * answers with a 401 and CSeq N.

After a few minutes, I send another REGISTER request with CSeq 2. * answers with a 401 and CSeq X.

After a few more minutes, I send another REGISTER request with CSeq 3, and * answers with a 401 and CSeq 1, with the other headers also copied from my first REGISTER request.

In practice it seems to be something like "desynced".

NOTE: in the Asterisk debug log everything appears to work correctly, but the debug log on the other side (the client) clearly shows the "desync" I am describing. So even in the Asterisk debug log, the "received" SIP message isn't the one that was actually just received, but an old one.



By: kasomaro (kasomaro) 2011-05-13 02:26:44

TLS/SRTP in the new 1.8.4 appears to be broken too. 1.8.3.2 works perfectly (same config). How can I get more logging from the TLS process, which results in: "SIP/2.0 401 Unauthorized"?

By: Michael Kuron (mkuron) 2011-05-14 05:04:41

I also have issues with TLS on 1.8.3.3 and 1.8.4 (Debian packages from packages.asterisk.org) on a variety of hardware and software phones. Disabling TLS and forcing the phones back to TCP returns everything back to a working state.
When looking at the SIP packet traces while on 1.8.3.3, I can see that Asterisk receives TLS packets from the client and replies to them, but these replies never make it to the client.
When looking at the traces on 1.8.4, I can see that the phone is getting those 401 Unauthorized messages.
So it seems that while 1.8.4 fixes the general TLS communication issue, it now screws up the messages sent across TLS.



By: Marcello Ceschia (marcelloceschia) 2011-05-14 13:40:49

revision 314628 breaks TLS support

By: Leif Madsen (lmadsen) 2011-05-16 08:33:22

Please test the patch on ASTERISK-17761 as that is the expected patch to be merged to resolve this issue.

By: Marcello Ceschia (marcelloceschia) 2011-05-16 08:52:14

ASTERISK-1895192 does not solve the TLS issue



By: Matthew Nicholson (mnicholson) 2011-05-17 13:50:43

I am able to reproduce this issue on my test box (although it is not quite as severe).  I am looking into it.

By: Matthew Nicholson (mnicholson) 2011-05-17 15:26:19

So far this seems to be specific to TLS sockets (TCP sockets are not affected).  There appears to be a problem using poll on ssl sockets.  I am working on a fix.
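One way poll can mislead on a socket wrapped by a buffering layer can be simulated without TLS at all. The sketch below is not Asterisk code: `buffered_reader` and `br_read` are hypothetical stand-ins for a TLS layer that reads ahead, and a pipe stands in for the socket. Two messages arrive in one write; reading the first pulls both into the buffer, after which poll reports nothing even though a full message is still waiting.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for a TLS layer: it reads ahead from the fd
 * into its own buffer, much as a TLS library consumes a whole record. */
struct buffered_reader {
    int fd;
    char buf[256];
    size_t len; /* bytes currently held in buf */
    size_t pos; /* next unread byte in buf */
};

/* Return up to n bytes, refilling from the fd only when buf is empty. */
static ssize_t br_read(struct buffered_reader *br, char *out, size_t n)
{
    if (br->pos == br->len) {
        ssize_t got = read(br->fd, br->buf, sizeof(br->buf));
        if (got <= 0)
            return got;
        br->len = (size_t)got;
        br->pos = 0;
    }
    size_t avail = br->len - br->pos;
    if (n > avail)
        n = avail;
    memcpy(out, br->buf + br->pos, n);
    br->pos += n;
    return (ssize_t)n;
}

/* Returns 1 if poll() sees nothing on the fd while the buffering
 * layer still holds an unread message, 0 otherwise, -1 on setup error. */
int poll_misses_buffered_data(void)
{
    int fds[2];
    char out[4];
    struct buffered_reader br = { .len = 0, .pos = 0 };
    struct pollfd pfd;

    if (pipe(fds) != 0)
        return -1;
    br.fd = fds[0];

    /* Two 4-byte "messages" arrive together. */
    if (write(fds[1], "msg1msg2", 8) != 8)
        return -1;

    /* Reading the first message drains the fd into the buffer. */
    if (br_read(&br, out, sizeof(out)) != 4)
        return -1;

    pfd.fd = fds[0];
    pfd.events = POLLIN;
    pfd.revents = 0;
    int rc = poll(&pfd, 1, 0); /* zero timeout: just ask */

    int missed = (rc == 0 && br.len - br.pos == 4);
    close(fds[0]);
    close(fds[1]);
    return missed;
}
```

A loop that waits in poll before every read would stall at this point until the peer sends something else.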



By: Matthew Nicholson (mnicholson) 2011-05-18 15:35:11

I have uploaded a patch that fixes this issue.

By: Sébastien Couture (sysreq) 2011-05-18 16:48:29

I've been able to successfully register a device using mnicholson's latest patch; although now Asterisk segfaults a couple seconds after said registration. I'll upload a backtrace of the dumped core.

By: Sébastien Couture (sysreq) 2011-05-18 16:58:18

Oops, nevermind, I just saw what issue ASTERISK-17761 was about. I applied that patch as well and it seems to resolve the segfault issue.

By: Marcello Ceschia (marcelloceschia) 2011-05-19 02:30:43

mnicholson: sorry, but why do you poll during message parsing?
What happens if you get an event on fds[1]? It seems this isn't always handled as it should be.

By: Matthew Nicholson (mnicholson) 2011-05-19 06:58:39

Polling is done while reading a message to allow the tcpauthtimeout to fire while receiving a message. Otherwise clients could circumvent that timeout by sending a partial message leaving asterisk sitting in the middle of the read loop. Polling is not done on the event fd while reading a message, only between messages.
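The idea of letting a timeout fire in the middle of a message can be sketched as a read loop that polls with the time remaining until a fixed deadline. This is an illustration on a plain file descriptor, not the chan_sip code: `read_line_with_deadline`, `now_ms` and the 50 ms value are hypothetical, and a newline stands in for the end of a message.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock in milliseconds. */
static long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}

/* Read one '\n'-terminated line. The total time spent is bounded by
 * timeout_ms, no matter how slowly the peer trickles bytes in.
 * Returns the line length, -1 on error/EOF/overflow, -2 on timeout. */
int read_line_with_deadline(int fd, char *buf, size_t cap, int timeout_ms)
{
    long deadline = now_ms() + timeout_ms;
    size_t len = 0;

    while (len + 1 < cap) {
        long remaining = deadline - now_ms();
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int rc;

        if (remaining <= 0)
            return -2;
        rc = poll(&pfd, 1, (int)remaining);
        if (rc == 0)
            return -2; /* deadline passed mid-message */
        if (rc < 0)
            return -1;
        if (read(fd, buf + len, 1) != 1)
            return -1;
        if (buf[len++] == '\n') {
            buf[len] = '\0';
            return (int)len;
        }
    }
    return -1; /* line does not fit in buf */
}

/* Demo: a peer that sends only part of a message cannot hold the
 * reader forever. Returns 1 if the read timed out as expected. */
int partial_message_times_out(void)
{
    int fds[2];
    char line[64];
    int rc;

    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], "REGISTER", 8) != 8) /* no terminating newline */
        return -1;
    rc = read_line_with_deadline(fds[0], line, sizeof(line), 50);
    close(fds[0]);
    close(fds[1]);
    return rc == -2;
}
```

On a plain descriptor, polling before each read is reliable. On a TLS socket the same loop runs into the buffering problem discussed in this issue.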

By: Matthew Nicholson (mnicholson) 2011-05-20 12:05:45

Please test the new ssl-poll-fix2.diff patch.  It employs a fancy polling scheme that only polls after a failed non-blocking read, ensuring that internal buffers have been cleared out.
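The "read first, poll only after a failed non-blocking read" scheme can be sketched roughly as follows. This is not the contents of the patch: `buffered_reader`, `br_read` and `read_message` are hypothetical names, a pipe stands in for the socket, and a simple read-ahead buffer stands in for the TLS layer. The timeout here applies to each individual poll call, which is a simplification.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for a TLS layer that reads ahead. */
struct buffered_reader {
    int fd; /* must be non-blocking */
    char buf[256];
    size_t len, pos;
};

/* Returns bytes copied, 0 on EOF, or -1 with errno == EAGAIN when
 * neither the buffer nor the fd has anything to offer. */
static ssize_t br_read(struct buffered_reader *br, char *out, size_t n)
{
    if (br->pos == br->len) {
        ssize_t got = read(br->fd, br->buf, sizeof(br->buf));
        if (got <= 0)
            return got;
        br->len = (size_t)got;
        br->pos = 0;
    }
    size_t avail = br->len - br->pos;
    if (n > avail)
        n = avail;
    memcpy(out, br->buf + br->pos, n);
    br->pos += n;
    return (ssize_t)n;
}

/* Read exactly n bytes. The read is attempted first; poll() is called
 * only after the read reports that it would block, so the buffer is
 * known to be empty. Returns 0 on success, -1 on error/EOF/timeout. */
int read_message(struct buffered_reader *br, char *out, size_t n,
                 int timeout_ms)
{
    size_t done = 0;

    while (done < n) {
        ssize_t got = br_read(br, out + done, n - done);
        if (got > 0) {
            done += (size_t)got;
            continue;
        }
        if (got == 0)
            return -1; /* EOF */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;

        /* Buffer drained and fd empty: waiting in poll is safe now. */
        struct pollfd pfd = { .fd = br->fd, .events = POLLIN };
        if (poll(&pfd, 1, timeout_ms) <= 0)
            return -1;
    }
    return 0;
}

/* Demo: two messages arrive in one write; both are read without the
 * second one getting stuck in the buffer. Returns 1 on success. */
int reads_both_buffered_messages(void)
{
    int fds[2];
    char a[5] = "", b[5] = "";
    struct buffered_reader br = { .len = 0, .pos = 0 };

    if (pipe(fds) != 0)
        return -1;
    if (fcntl(fds[0], F_SETFL, fcntl(fds[0], F_GETFL) | O_NONBLOCK) != 0)
        return -1;
    br.fd = fds[0];
    if (write(fds[1], "msg1msg2", 8) != 8)
        return -1;

    int ok = read_message(&br, a, 4, 100) == 0
          && read_message(&br, b, 4, 100) == 0
          && strcmp(a, "msg1") == 0
          && strcmp(b, "msg2") == 0;
    close(fds[0]);
    close(fds[1]);
    return ok;
}
```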

By: Matthew Nicholson (mnicholson) 2011-05-20 12:09:13

Test with ssl-poll-fix3.diff.

By: Matthew Nicholson (mnicholson) 2011-05-20 14:21:38

I have committed this patch.  Reopen the issue if it is still broken.

By: Digium Subversion (svnbot) 2011-05-23 09:27:53

Repository: asterisk
Revision: 320179

U   branches/1.6.2/channels/chan_sip.c

------------------------------------------------------------------------
r320179 | mnicholson | 2011-05-23 09:27:52 -0500 (Mon, 23 May 2011) | 16 lines

This commit modifies the way polling is done on TLS sockets.

Because of the buffering the TLS layer does, polling is unreliable. If poll is
called while there is data waiting to be read in the TLS layer but not at the
network layer, the messaging processing engine will not proceed until something
else writes data to the socket, which may not occur. This change modifies the
logic around TLS sockets to only poll after a failed read on a non-blocking
socket. This way we know that there is no data waiting to be read from the
buffering layer.

(closes issue ASTERISK-17753)
Reported by: st
Patches:
     ssl-poll-fix3.diff uploaded by mnicholson (license 96)
Tested by: mnicholson

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=320179

By: Digium Subversion (svnbot) 2011-05-23 09:28:00

Repository: asterisk
Revision: 320180

U   branches/1.8/channels/chan_sip.c

------------------------------------------------------------------------
r320180 | mnicholson | 2011-05-23 09:27:59 -0500 (Mon, 23 May 2011) | 16 lines

This commit modifies the way polling is done on TLS sockets.

Because of the buffering the TLS layer does, polling is unreliable. If poll is
called while there is data waiting to be read in the TLS layer but not at the
network layer, the messaging processing engine will not proceed until something
else writes data to the socket, which may not occur. This change modifies the
logic around TLS sockets to only poll after a failed read on a non-blocking
socket. This way we know that there is no data waiting to be read from the
buffering layer.

(closes issue ASTERISK-17753)
Reported by: st
Patches:
     ssl-poll-fix3.diff uploaded by mnicholson (license 96)
Tested by: mnicholson

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=320180

By: Digium Subversion (svnbot) 2011-05-23 09:28:07

Repository: asterisk
Revision: 320181

_U  trunk/
U   trunk/channels/chan_sip.c

------------------------------------------------------------------------
r320181 | mnicholson | 2011-05-23 09:28:06 -0500 (Mon, 23 May 2011) | 23 lines

Merged revisions 320180 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.8

........
 r320180 | mnicholson | 2011-05-20 13:48:46 -0500 (Fri, 20 May 2011) | 16 lines
 
 This commit modifies the way polling is done on TLS sockets.
 
 Because of the buffering the TLS layer does, polling is unreliable. If poll is
 called while there is data waiting to be read in the TLS layer but not at the
 network layer, the messaging processing engine will not proceed until something
 else writes data to the socket, which may not occur. This change modifies the
 logic around TLS sockets to only poll after a failed read on a non-blocking
 socket. This way we know that there is no data waiting to be read from the
 buffering layer.
 
 (closes issue ASTERISK-17753)
 Reported by: st
 Patches:
       ssl-poll-fix3.diff uploaded by mnicholson (license 96)
 Tested by: mnicholson
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=320181

By: Digium Subversion (svnbot) 2011-05-23 09:29:39

Repository: asterisk
Revision: 320221

U   tags/1.8.4.1/ChangeLog
U   tags/1.8.4.1/channels/chan_sip.c

------------------------------------------------------------------------
r320221 | lmadsen | 2011-05-23 09:29:39 -0500 (Mon, 23 May 2011) | 1 line

Merge changes for issue ASTERISK-17753 and update the ChangeLog.
------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=320221