Summary: ASTERISK-20934: Crash in res_srtp.so when SIP channel is bridged with non-optimizing Local channel
Reporter: tootai (tootai)
Labels:
Date Opened: 2013-01-14 11:49:58.000-0600
Date Closed: 2014-01-14 14:40:06.000-0600
Priority: Critical
Regression?
Status: Closed/Complete
Components: Channels/chan_sip/SRTP Resources/res_srtp
Versions: 10.8.0 10.11.1 10.12.0
Frequency of Occurrence: Frequent
Related Issues: is a clone of ASTERISK-20499 Crash in libsrtp srtp_unprotect_rtcp when SIP channel is bridged with non-optimizing Local channel
Environment: RHEL 5.8 on IBM X3650 M4 - 12 core - Xeon E5-2640 @ 2.50 GHz
Attachments:
( 0) backtrace_10.12.0_1601131158.txt
( 1) backtrace.txt
( 2) gdb_10.12.0_1601131158.txt
( 3) gdb.txt
Description: A call from a snom320 in SRTP mode to the echo test or to another phone *NOT* using SRTP is OK. We then installed the PhonerLite softphone with TLS/SRTP and tested it with the echo test: everything is OK too.

Now PhonerLite calls the snom: Asterisk dumps core after 3~5 seconds, and after this we are NOT able to make any more SRTP calls; they all crash Asterisk. We had this issue with 10.7.0 and 10.8.0.

We have a log file from strace as well as a core dump.
Comments:

By: tootai (tootai) 2013-01-14 12:05:01.213-0600

Hi, me again,

The problem appeared again on the IBM physical machine. On 13 January I deleted our certificates and created new ones with the ast_tls_cert script, with create_ca at 4096 bits and create_cert at 2048 (the first certificate was 4098/1024).

On 14/01/2013, customers called with the new certificate and the server crashed around 30 times (safe_asterisk takes control of Asterisk). Attached you will find the backtrace and gdb output from one crash.

I installed Asterisk 10.11.1: same problem. I applied srtp_unprotect_patch.diff (see issue #20499) with some difficulty, and started calling the demo again.

In the console logs I see:

[2013-01-14 21:48:14] NOTICE[4642]: res_rtp_asterisk.c:380 __rtp_recvfrom: We are going to run unprotect on 0x2aaac8092900
[2013-01-14 21:48:14] NOTICE[4642]: res_srtp.c:354 ast_srtp_unprotect: SRTP unprotect failed with replay check failed (index too old), retrying
[2013-01-14 21:48:14] NOTICE[4642]: res_srtp.c:393 ast_srtp_unprotect: Forcefully setting the session to NULL. This should cause the call to hangup.
[2013-01-14 21:48:14] ERROR[4642]: res_srtp.c:402 ast_srtp_unprotect: SRTP session was destroyed and could not be recovered.
[2013-01-14 21:48:14] NOTICE[4642]: res_rtp_asterisk.c:2174 ast_rtp_read: errno = Invalid argument
[2013-01-14 21:48:14] WARNING[4642]: res_rtp_asterisk.c:2177 ast_rtp_read: RTP Read error: Invalid argument. Hanging up.

No more crashes, but the call hangs up.

Also, I see lots of

[...]
[2013-01-14 21:53:21] ERROR[4758]: tcptls.c:436 ast_tcptls_client_start: Unable to connect SIP socket to xyz.zzz.134.24:4963: Connection refused
[2013-01-14 21:53:31] ERROR[4706]: tcptls.c:436 ast_tcptls_client_start: Unable to connect SIP socket to xxx.yyy.192.41:3162: Connection timed out
[...]

Could this explain the "index too old" error? The above IPs are those of customers' phones (all are SNOM 870; remember mine is a 320; all with firmware 8.7.3.15).
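
For context on the log message above: libsrtp reports "replay check failed (index too old)" when a packet's index falls behind its replay window. The following is a minimal sketch of that kind of sliding-window check, loosely following the replay protection described in RFC 3711. It is not libsrtp's implementation: the class name, the 64-packet window, and the plain integer index (real SRTP derives a 48-bit index from the rollover counter and the sequence number) are all assumptions made for illustration.

```python
# Simplified sliding-window replay check, in the spirit of RFC 3711.
# All names here are hypothetical; this is not libsrtp's code.

WINDOW_SIZE = 64  # assumed window width; real stacks may use another size


class ReplayWindow:
    def __init__(self):
        self.highest = -1  # highest packet index accepted so far
        self.seen = 0      # bitmask: bit d set => index (highest - d) was seen

    def check_and_update(self, index):
        """Classify a packet index as 'ok', 'replayed', or 'too old'."""
        if index > self.highest:
            # Newer than anything seen: slide the window forward.
            shift = index - self.highest
            self.seen = ((self.seen << shift) | 1) & ((1 << WINDOW_SIZE) - 1)
            self.highest = index
            return "ok"
        delta = self.highest - index
        if delta >= WINDOW_SIZE:
            # Fell off the back of the window: cannot be verified as fresh.
            return "too old"
        if self.seen & (1 << delta):
            # Inside the window and already marked: a duplicate.
            return "replayed"
        # Inside the window and not yet seen: a late but valid packet.
        self.seen |= 1 << delta
        return "ok"
```

In this model, a sender whose packet indices restart from a low value while the receiver still holds state from an earlier stream would have every packet classified as too old. That may be one way stale session state on either side leads to this error, though the sketch alone cannot confirm it is what happened here.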

Thanks for your support.

Daniel


By: Matt Jordan (mjordan) 2013-01-14 16:28:42.646-0600

Your issue description and subsequent comment make the situation unclear.

# Jonathan's patch from ASTERISK-20499 should be in Asterisk 1.8.20.0/10.12.0/11.2.0. Running with an unpatched version of Asterisk and reporting a crash isn't useful - his patch fixed the off-nominal condition where {{libsrtp}} cannot be set up.
# The issue tracker is not a support forum. If {{libsrtp}} reports back an error that it cannot be initialized; cannot unprotect a packet, etc. - we do not have the capability to fix that. The Asterisk development team is not going to take on the burden of modifying {{libsrtp}}'s source. On that path lies embedding libraries into Asterisk and madness.

If you are running with a version of Asterisk with the patch Jonathan developed, *and* it appears as if Asterisk is interoperating with {{libsrtp}} incorrectly or is unable to protect/unprotect an RTP packet that it should be able to handle, then there is an issue.

The fact that some of your devices are apparently working should indicate, however, that the problem isn't with Asterisk.

[libsrtp bug tracker|http://sourceforge.net/tracker/?group_id=38894&atid=423799]

By: tootai (tootai) 2013-01-16 04:59:03.588-0600

Matt, sorry for not being clear; my apologies.

1. I didn't know I had to wait for version 10.12.0 to see Jonathan's patch included, so I applied it myself by adapting it to version 10.11.1.
2. I am not using the issue tracker for support. I am facing a problem with Asterisk crashing; I'm not involved in development, so I can't know whether it's an Asterisk issue, a libsrtp one, or something else.

That said, I installed 10.12.0 and the crashes disappeared until today. I am attaching the gdb output and backtrace from the first crash.

Worth noting: after a crash, the snom phones have to be rebooted so that they do not trigger a new crash on the next call. And as I see lots of

[2013-01-16 14:15:26] ERROR[18375] tcptls.c: Unable to connect SIP socket to xxx.yyy.134.24:3165: Connection refused  

as well as

[2013-01-16 14:20:12] WARNING[18351] chan_sip.c: sip_xmit of 0x167528f0 (len 634) to xxx.yyy.134.24:3393 returned -2: Success

or

[2013-01-16 14:21:52] WARNING[18351] chan_sip.c: sip_xmit of 0x16764a80 (len 634) to xxx.yyy.134.24:3393 returned -2: No such file or directory

after a crash, is it possible that Asterisk is keeping some data from the previous phone connection? This would perhaps explain why rebooting the phone makes things work again.

If I understand the gdb output, the problem lies in res_srtp. Is it an Asterisk issue, or should I contact the libsrtp bug tracker as you suggest?

Other information: Asterisk is running outside any LAN, on the Internet in a data center, and all customer phones are behind NAT. This perhaps explains why Jonathan couldn't reproduce the problem.

I'm sorry if you feel that I am taking your time and using this tracker for support; believe me, that's not the case. Again, my apologies.

Regards

Daniel