[Home]

Summary:ASTERISK-20499: Crash in libsrtp srtp_unprotect_rtcp when SIP channel is bridged with non-optimizing Local channel
Reporter:tootai (tootai)Labels:
Date Opened:2012-10-01 12:19:32Date Closed:2013-01-04 16:56:52.000-0600
Priority:CriticalRegression?
Status:Closed/CompleteComponents:Channels/chan_sip/SRTP
Versions:10.8.0 Frequency of
Occurrence
Constant
Related
Issues:
is the original version of this clone:ASTERISK-20934 Crash in res_srtp.so when SIP channel is bridged with non-optimizing Local channel
Environment:RHEL 5.8 on IBM X3650 M4 - 12 core - Xeon E5-2640 @ 2,50 ghzAttachments:( 0) asterisk_srtp_unprotect_patch.diff
( 1) asterisk-20499_20121127.log
( 2) asterisk-20499_20121127.pcap
( 3) asterisk-20499_20121129_01.txt
( 4) asterisk-20499.txt
( 5) backtrace.txt
( 6) backtrace.txt
( 7) backtrace.txt
( 8) backtrace.txt
( 9) backtrace1213_1.8.txt
(10) backtrace1222_1.8.txt
(11) backtrace1251.txt
(12) backtrace1436.txt
(13) backtrace1442.txt
(14) backtrace1932.txt
(15) backtrace20121205.txt
(16) coredump20121001205609.txt
(17) gdb.txt
(18) gdb.txt
(19) gdb.txt
(20) gdb.txt
(21) gdb1213_1.8.txt
(22) gdb1222_1.8.txt
(23) gdb1251.txt
(24) gdb1436.txt
(25) gdb1442.txt
(26) gdb1932.txt
(27) gdb20121205.txt
(28) libsrtp-1.4.4-fix_crash_on_rtcp_decode.patch
(29) srtp_diagnostic_patch_policy_breakdown.diff
(30) srtp_diagnostic_with_sleep.diff
(31) srtp_fixes_asterisk10.diff
(32) srtp_fixes_it_maybe.diff
Description:A call from snom320 in SRTP mode to echo test or to another phone *NOT* using SRTP is OK. Now we installed PhonerLite softphone with TLS/SRTP stuf and test with echo test: everything is OK too.

Now PhonerLite calls the snom: asterisk coredump after 3~5 seconds and we are NOT able to make anymore SRTP calls after this, they all crash asterisk. We had this issue with 10.7.0 and 10.8.0

We have logfiel from strace as well as coredump.
Comments:By: tootai (tootai) 2012-10-01 12:20:25.014-0500

strace logs

By: Matt Jordan (mjordan) 2012-10-01 14:00:40.546-0500

Thank you for your bug report. In order to move your issue forward, we require a backtrace[1] from the core file produced after the crash. Also, be sure you have DONT_OPTIMIZE enabled in menuselect within the Compiler Flags section, then:

make install

After enabling, reproduce the crash, and then execute the backtrace[1] instructions. When complete, attach that file to this issue report.

[1] https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace



By: tootai (tootai) 2012-10-02 04:02:21.769-0500

Crash with don't optimize and better backtrace. Peer astphone115 is calling echo test, crash after few seconds.

By: tootai (tootai) 2012-10-17 04:17:36.913-0500

Any news about this matter?

By: Matt Jordan (mjordan) 2012-10-17 06:56:33.345-0500

Thanks for providing the backtrace.  Without hitting the 'Send Back' after you've provided feedback, an issue may not show up in bug marshal's filters.

There's a few interesting things about your scenario:
# The crash is occurring on what appears to be a normal RTCP read.  Unfortunately, the crash isn't happening in Asterisk, but in a libsrtp - which means either its a bug in libsrtp, or the SRTP session wasn't initialized properly prior to the RTCP read.  Looking through the backtrace however it _appears_ as if the session is initialized; without seeing a DEBUG log leading up to the crash its a bit hard to tell.  Can you provide a DEBUG log leading up to the crash?
# You're using a non-optimizing Local channel in the bridge.  Does the crash occur when you have a SIP channel to SIP channel bridge?  Can you provide the appropriate portion of the dialplan that leads up to the crash?

By: tootai (tootai) 2012-10-17 09:39:32.811-0500

Sorry for the missing action concerning feedback, will remember now.

Dialplan to get core dump is easy:

exten => 800,1,Dial(Local/600@demo,,) ;call echo test
exten => 800,n,Hangup

I replace this sequence with

exten => 800,1,Playback(echo-test)
exten => 800,n,Echo
exten => 800,n,Playback(demo-echodone)
exten => 800,n,Hangup

and guess what, no more core dump!

So Local channel seems to be the cause. Have to stop to use it as I already have an open bug with him: ASTERISK-19763 ;-)

Thanks for your support, feel free to ask me if you want further details, logs or tests.

--
Daniel

By: Matt Jordan (mjordan) 2012-10-17 09:51:21.469-0500

Interesting... I'm going to try and lab this up with a SNOM to see what happens.  My guess is that since nothings on the 'other side' of the Local channel, something is getting out of wack.  I'll post here when I have a bit more info.

By: Jonathan Rose (jrose) 2012-11-08 16:54:26.568-0600

So am I following this correctly and you are saying you can get a crash to occur just by calling

exten => 800,1,Dial(Local/600@demo,,) ;call echo test
exten => 800,n,Hangup

from the SNOM 320 with SRTP and TLS enabled?  Without fail?  I've been trying to reproduce this today and have been unable to do so, also with a SNOM 320. I'm not 100% sure SRTP and TLS are working like they should be, but the following dialplan seems to show that I have at bare minimum secure signalling and secure media of some type and I'm pretty sure the only ones they could be are tls and srtp considering my setup.

exten => 800,1,NoOp($[${CHANNEL(secure_signaling)}])
exten => 800,n,NoOp($[${CHANNEL(secure_media)}])

I'd give this a shot with PhonerLite calling the SNOM as you initially suggested, but I lack a Windows machine to run it on at the moment.

I'm also using 10.8.0 of course.

By: tootai (tootai) 2012-11-12 05:48:14.494-0600

Your guesses are OK.

On our side was the problem solved by using resiprocate proxy for srtp calls which uses 1.4.3 libsrtp and we had 1.4.2 with our asterisk test servers. Our idea was to compile asterisk with the same libsrtp as repro but I didn't yet find time to do it.


By: Jonathan Rose (jrose) 2012-11-14 10:34:32.953-0600

So the latest version of libsrtp is actually 1.4.4 and it has been so for a good while... since 2007 I think.  Please upgrade your libsrtp to 1.4.4 rather than 1.4.3 and if you can reproduce this problem with that we can look into it some more.

By: abelbeck (abelbeck) 2012-11-14 17:24:34.098-0600

Upgrade to libsrtp 1.4.4 and apply the above "libsrtp-1.4.4-fix_crash_on_rtcp_decode.patch".


By: tootai (tootai) 2012-11-15 11:56:50.646-0600

Asterisk updated to 10.10.0, srtp updated to 1.4.4 and includes the crash_on_rtcp_decode.patch

As soon as snom320 calls the 800 extension I have this from peer debug:

zone-s*CLI> sip set debug peer 105
SIP Debugging Enabled for IP: xxx.xxx.xxx.xxx

<--- SIP read from TLS:xxx.xxx.xxx.xxx:3855 --->
INVITE sip:800@sip.domain.net;user=phone SIP/2.0
Via: SIP/2.0/TLS 192.168.10.105:3855;branch=z9hG4bK-tyrj4ftddmlr;rport
From: "TOOTAi" <sip:105@sip.domain.net>;tag=43aodlfc4v
To: <sip:800@sip.domain.net;user=phone>
Call-ID: 3c4b9074492e-0wv4806eilsm
CSeq: 1 INVITE
Max-Forwards: 70
Contact: <sip:105@192.168.10.105:3855;transport=tls>;reg-id=1
X-Serialnumber: 00041324C312
P-Key-Flags: keys="3"
User-Agent: snom320/8.4.35
Accept: application/sdp
Allow: INVITE, ACK, CANCEL, BYE, REFER, OPTIONS, NOTIFY, SUBSCRIBE, PRACK, MESSAGE, INFO, UPDATE
Allow-Events: talk, hold, refer, call-info
Supported: timer, 100rel, replaces, from-change
Call-Info: <sip:sip.domain.net>;appearance-index=1
Session-Expires: 3600;refresher=uas
Min-SE: 90
Content-Type: application/sdp
Content-Length: 478

v=0
o=root 1358164647 1358164647 IN IP4 192.168.10.105
s=call
c=IN IP4 192.168.10.105
t=0 0
m=audio 49152 RTP/SAVP 0 8 9 2 3 18 4 101
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:OnkcIu8z6xpkMPJszqpY4ThWidSG9FyWEPsFnniR
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:9 G722/8000
a=rtpmap:2 G726-32/8000
a=rtpmap:3 GSM/8000
a=rtpmap:18 G729/8000
a=fmtp:18 annexb=no
a=rtpmap:4 G723/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=ptime:30
a=sendrecv
<------------->
--- (20 headers 19 lines) ---
Sending to xxx.xxx.xxx.xxx:3855 (NAT)
Using INVITE request as basis request - 3c4b9074492e-0wv4806eilsm
Found peer '105' for '105' from xxx.xxx.xxx.xxx:3855
 == Using SIP RTP CoS mark 5
Found RTP audio format 0
Found RTP audio format 8
Found RTP audio format 9
Found RTP audio format 2
Found RTP audio format 3
Found RTP audio format 18
Found RTP audio format 4
Found RTP audio format 101

and the crash occurs.

By: tootai (tootai) 2012-11-15 11:59:52.382-0600

Forgot to say that I attached the new gdb and backtrace files

By: Jonathan Rose (jrose) 2012-11-16 16:41:19.278-0600

Alright, I'm still unable to reproduce this so far, but I've got a vague idea of what could be causing the crash based on the documentation for lib_srtp. I've attached a diagnostic patch which I'd like you to reproduce the crash with and then post the resulting logs to this issue.

By: tootai (tootai) 2012-11-17 05:30:00.356-0600

I applied Jonathan's patch, attached are the gdb and backtrace files.

On the screen I got

NOTICE[18771]: res_srtp.c:445 ast_srtp_create: srtp_diagnostic_patch: attempting srtp_create at 0xd2eaa40


By: tootai (tootai) 2012-11-17 05:30:47.999-0600

Patch applied and debug files sended

By: Jonathan Rose (jrose) 2012-11-19 09:22:57.385-0600

Is that the only srtp_diagnostic_patch message in the log? I was kinda hoping you'd post a log file containing everything from the start of the run to the crash :P

EDIT:  Also, the gdb.txt you posted most recently appears to not be using the patch. The backtrace.txt is however, so I'm guessing it's just an old core dump or something? It doesn't really matter since what I'm trying to look at are log messages anyway.

I'm a little bothered that the log messages might be being deferred before the crash occurs, so I'm going to update the patch real quick to include a sleep before what appears to be the offending line.

I kinda have a feeling that we shouldn't be attempting to srtp_dealloc when srtp_alloc failed and that might be causing the crash here, but that still wouldn't explain why your srtp_alloc is failing in the first place.  That said, I think it might be a 64-bit OS compatibility problem (assuming you are using 64 bit Fedora).  I'll have to try reproducing this again on my other machine since the one I tried to reproduce it on the first time is using a 32 bit operating system.

By: Jonathan Rose (jrose) 2012-11-19 09:57:27.664-0600

Reposting the diagnostic patch with sleep again. Now we should get better information about problems occurring in srtp_create.

By: tootai (tootai) 2012-11-19 10:31:10.787-0600

With sleep patch

By: tootai (tootai) 2012-11-19 10:32:44.238-0600

New debug files uploaded.

FYI those tests are done with Debian Wheezy amd64, the CentOS server being a production one.

By: Jonathan Rose (jrose) 2012-11-19 11:31:15.867-0600

Um er... I still don't see a log file in the attachments list (which was really the whole point behind the diagnostic patch). You can usually find those in /var/log/asterisk in text files like full or messages depending on how your logger.conf is set.


That aside, I think I've tracked down a particular commit that introduced this problem anyway and it really does appear to be a double free of the session. I've attached a patch that I think will fix it, though there is still the question of why srtp_create is failing in the first place. Since it is failing though, I would imagine it would be doing so in 1.8 as well... I'm not too sure about that side of things, but let's at least see if this one fixes the crash.

the patch is srtp_fixes_it_maybe.diff
give it a shot and get back to me.

By: tootai (tootai) 2012-11-19 11:58:41.088-0600

Didn't realize you where also talking about logs, sorry.

The 3 last lines before crash:

[2012-11-19 17:20:18] NOTICE[15989] res_srtp.c: srtp_diagnostic_patch: attempting srtp_create at 0x7fd93c223930                                                                
[2012-11-19 17:20:18] NOTICE[15989] res_srtp.c: srtp_diagnosic_patch: result of srtp_create at 0x7fd93c223930 was failure                                                      
[2012-11-19 17:20:18] NOTICE[15989] res_srtp.c: srtp_diagnostic_patch: attempting srtp_destroy at 0x7fd93c223930                          

I will test your new patch

By: tootai (tootai) 2012-11-19 12:11:24.487-0600

No more crash but no connection. I get this:

[2012-11-19 19:08:37] WARNING[24798]: res_srtp.c:498 ast_srtp_add_stream: Failed to add SRTP stream for SSRC 308061996
[2012-11-19 19:08:37] WARNING[24798]: sip/sdp_crypto.c:170 sdp_crypto_activate: Could not set SRTP policies
[2012-11-19 19:08:37] WARNING[24798]: chan_sip.c:9517 process_sdp: Can't provide secure audio requested in SDP offer


By: Jonathan Rose (jrose) 2012-11-19 14:18:55.816-0600

In the 11:58 post with the log you mentioned something is missing.  There should have been a line with something like...
NOTICE[15989] res_srtp.c: srtp_diagnostic_patch: srtp_errstr = _______
where _______ is something like "couldn't allocate memory" or "unspecified failure" or some other message contained within srtp_errstr.  I'm guessing you didn't use the last diagnostic patch for that one, but it would be very helpful to have that to figure out which failure you are actually getting, so
please apply the last diagnostic patch (without the one that actually fixes the crash) and regenerate that log.


Then once you have those log messages for us, I would like you to apply the crash fix patch and provide the following:

your sip.conf (you can strip out contexts that aren't relevant to the call as well as private information such as secret and whatnot, but please provide at least the general section and the relevant peers intact)

a full debug log (information on generating that can be seen here: https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information) reproducing the problem

A pcap showing the SIP dialog between Asterisk and the phone making the call.

By: Jonathan Rose (jrose) 2012-11-19 14:38:58.155-0600

On a separate note:  It would also be nice to know if this worked for you in Asterisk 1.8 or any earlier branches

By: tootai (tootai) 2012-11-20 04:10:15.761-0600

Below the result: first call 10:56:11 which just hangup, second one 10:56:37 crash.

[2012-11-20 10:56:11] NOTICE[22473] res_srtp.c: srtp_diagnostic_patch: attempting srtp_create at 0xcffffa0                                                                    
[2012-11-20 10:56:11] NOTICE[22473] res_srtp.c: srtp_diagnostic_patch: result of srtp_create at 0xcffffa0 was success                                                          
[2012-11-20 10:56:11] NOTICE[22434] res_srtp.c: srtp_diagnostic_patch: attempting srtp_destroy at 0xcffffa0                                                                    
[2012-11-20 10:56:11] NOTICE[22434] res_srtp.c: srtp_diagnostic_patch: Made it through srtp_destroy at 0xcffffa0                                                              
[2012-11-20 10:56:37] NOTICE[22473] res_srtp.c: srtp_diagnostic_patch: attempting srtp_create at 0xcfd8c80                                                                    
[2012-11-20 10:56:37] NOTICE[22473] res_srtp.c: srtp_diagnosic_patch: result of srtp_create at 0xcfd8c80 was failure                                                          
[2012-11-20 10:56:37] NOTICE[22473] res_srtp.c: srtp_diagnostic_patch: attempting srtp_destroy at 0xcfd8c80    

From WARNING

[2012-11-20 10:56:11] WARNING[22473] res_srtp.c: Failed to add SRTP stream for SSRC 1552782857                                                                                
[2012-11-20 10:56:11] WARNING[22473] sip/sdp_crypto.c: Could not set SRTP policies                                                                                            
[2012-11-20 10:56:11] WARNING[22473] chan_sip.c: Can't provide secure audio requested in SDP offer

Nothing about the call from 10:56:37

I will do next steps perhaps later today but sure tomorrow. I can also test with 1.8 version.


By: tootai (tootai) 2012-11-27 06:40:05.308-0600

I verify that the patch is the diagnostic and it is. Asterisk is 10.10.0

Only logs in notice:

[2012-11-27 12:48:59] NOTICE[23781] res_srtp.c: srtp_diagnostic_patch: attempting srtp_create at 0x17bb6e0                                                                    
[2012-11-27 12:48:59] NOTICE[23781] res_srtp.c: srtp_diagnosic_patch: result of srtp_create at 0x17bb6e0 was failure                                                          
[2012-11-27 12:48:59] NOTICE[23781] res_srtp.c: srtp_diagnostic_patch: attempting srtp_destroy at 0x17bb6e0

and then the crash

I also applyed manualy your patch to 1.8.10.1

[2012-11-27 13:37:52] NOTICE[14468]: res_srtp.c:442 ast_srtp_create: srtp_diagnostic_patch: attempting srtp_create at 0x2c242b0
[2012-11-27 13:37:52] NOTICE[14468]: res_srtp.c:444 ast_srtp_create: srtp_diagnosic_patch: result of srtp_create at 0x2c242b0 was failure
[2012-11-27 13:37:52] WARNING[14468]: sip/sdp_crypto.c:167 sdp_crypto_activate: Could not set local SRTP policy
[2012-11-27 13:37:52] WARNING[14468]: chan_sip.c:9128 process_sdp: Can't provide secure audio requested in SDP offer

Will now apply the working patch and provide you the asked informations


By: tootai (tootai) 2012-11-27 09:02:12.026-0600

All details in attached documents.

By: Jonathan Rose (jrose) 2012-11-28 14:51:40.458-0600

Once again, I need you to use the latest (emphasis on latest) diagnostic patch (posted at 19/Nov/12 9:57 AM). That is to say srtp_diagnostic_with_sleep.diff - I am quite certain that you are still not using this patch because a log message between 'attempting srtp_create at %p' and 'result of srtp_create at %p' is missing from the results you are giving me, and there are no conditions for this log message to appear. To make things a little easier, I've deleted the old patch which didn't contain this log message.

If you don't use that particular version of the diagnostic patch, I won't be able to see why srtp_create is failing.

In the meantime, I'll go over the pcap and your configurations to see if I can't figure out why I've been having so much trouble reproducing this locally.

EDIT: Oh, and one other thing, was the crash reproducible in 1.8 before any patching?

By: tootai (tootai) 2012-11-28 15:08:55.018-0600

Jonathan,

I'm sure that the applied patch is the diagnostic_with_sleep as I told above. Perhaps the multi applying of patches created a misconfiguration. Tomorrow I will start from a fresh 10.10.0 version.

By: Jonathan Rose (jrose) 2012-11-28 16:18:08.399-0600

Well, from the patch...

+ ast_log(LOG_NOTICE, "srtp_diagnostic_patch: attempting srtp_create at %p\n", &temp->session);
+ res = srtp_create(&temp->session, &policy->sp);
+ ast_log(LOG_NOTICE, "srtp_diagnostic_patch: srtp_errstr = %s\n", srtp_errstr(res));

For you to have received the first log message there and not the second without something that just stops the whole process from occurring happening in srtp_create isn't possible. And we know that this didn't happen because you received the log message from the following elements of the patch:

+ if (res != err_status_ok) {
+ ast_log(LOG_NOTICE, "srtp_diagnosic_patch: result of srtp_create at %p was failure\n", &temp->session);


Also in your 1.8 log messages, it is plainly visible that there is only one line between the first log message you received in and the second (occuring on 442 and 444 respectively). In the most recent version of the patch there are 3 lines between those two log messages. Since there was only one line between the log messages mentioned in the original patch, you should pretty easily be able to see why I believe you applied the wrong patch.

While I'm working on it, the extensions you included with your file showing sip settings and such omitted some macros that seem to be essential to completing the call. Could I please get the contents of macro-filterExten, macro-setDialingExtenFromSipHeader, macro-monitor, and macro-OFFICE-Numbers?

By: tootai (tootai) 2012-11-29 06:04:09.673-0600

From a fresh install, indeed patch was wrong. Sorry.

[2012-11-29 12:59:26] NOTICE[25858]: res_srtp.c:446 ast_srtp_create: srtp_diagnostic_patch: attempting srtp_create at 0x1ea63f0
[2012-11-29 12:59:26] NOTICE[25858]: res_srtp.c:448 ast_srtp_create: srtp_diagnostic_patch: srtp_errstr = unsupported parameter
[2012-11-29 12:59:26] NOTICE[25858]: res_srtp.c:450 ast_srtp_create: srtp_diagnosic_patch: result of srtp_create at 0x1ea63f0 was failure
[2012-11-29 12:59:26] NOTICE[25858]: res_srtp.c:475 ast_srtp_destroy: srtp_diagnostic_patch: attempting srtp_destroy at 0x1ea63f0

Other datas you asked for are in last attached file

By: Jonathan Rose (jrose) 2012-11-29 12:25:33.633-0600

Alright, that actually helps to narrow things down a bit. I have a suspicion right now that the key in the SRTP policy isn't being set for some reason but we are still reaching the srtp_create function with that policy for some reason. I'm attaching a new diagnostic patch (srtp_diagnostic_patch_policy_breakdown.diff) to confirm that. If I'm correct, then I'll have to trace that to where the policy key is generated and find out what cases allow for the policy to be allocated while a key pointer isn't assigned.

Please apply the patch and tell me what the new log messages say when you make the call. They should look roughly like this:

[Nov 29 12:16:59] NOTICE[1172]: res_srtp.c:439 ast_srtp_create: session pointer: 0x8f7a9e0
[Nov 29 12:16:59] NOTICE[1172]: res_srtp.c:440 ast_srtp_create: policy pointer: 0x8f77908
[Nov 29 12:16:59] NOTICE[1172]: res_srtp.c:441 ast_srtp_create: Just for grins: 0x8f77908
[Nov 29 12:16:59] NOTICE[1172]: res_srtp.c:445 ast_srtp_create: ssrc type = 2
[Nov 29 12:16:59] NOTICE[1172]: res_srtp.c:446 ast_srtp_create: ssrc value = 0
[Nov 29 12:16:59] NOTICE[1172]: res_srtp.c:448 ast_srtp_create: key pointer = 0x8f77978
[Nov 29 12:16:59] NOTICE[1172]: res_srtp.c:449 ast_srtp_create: key value = 29

Though if my suspicion is correct the key pointer will be NULL and the value will either be 0 or some garbage data.

By: tootai (tootai) 2012-11-29 15:22:11.103-0600

Here the output:

[2012-11-29 22:18:12] NOTICE[17952]: res_srtp.c:439 ast_srtp_create: session pointer: 0x15344d0
[2012-11-29 22:18:12] NOTICE[17952]: res_srtp.c:440 ast_srtp_create: policy pointer: 0x16a03f8
[2012-11-29 22:18:12] NOTICE[17952]: res_srtp.c:441 ast_srtp_create: Just for grins: 0x16a03f8
[2012-11-29 22:18:12] NOTICE[17952]: res_srtp.c:445 ast_srtp_create: ssrc type = 2
[2012-11-29 22:18:12] NOTICE[17952]: res_srtp.c:446 ast_srtp_create: ssrc value = 0
[2012-11-29 22:18:12] NOTICE[17952]: res_srtp.c:448 ast_srtp_create: key pointer = 0xdb6cc0

key pointer is not NULL and there is no key value

By: Jonathan Rose (jrose) 2012-11-29 17:09:37.600-0600

No key value? That doesn't make sense. There has to be a value even if it's 0. Do you mean the log message wasn't generated? I suppose that could happen if the log output got deferred and then it crashed.

Could you clarify? Thanks for sticking this out so far.

By: tootai (tootai) 2012-11-30 01:56:43.081-0600

No key value. Below the logs from notice file. As you can see, the last line is the first information recorded when I restarted asterisk.

[2012-11-29 22:18:12] NOTICE[17952] res_srtp.c: session pointer: 0x15344d0                                                                                                    
[2012-11-29 22:18:12] NOTICE[17952] res_srtp.c: policy pointer: 0x16a03f8                                                                                                      
[2012-11-29 22:18:12] NOTICE[17952] res_srtp.c: Just for grins: 0x16a03f8                                                                                                      
[2012-11-29 22:18:12] NOTICE[17952] res_srtp.c: ssrc type = 2                                                                                                                  
[2012-11-29 22:18:12] NOTICE[17952] res_srtp.c: ssrc value = 0                                                                                                                
[2012-11-29 22:18:12] NOTICE[17952] res_srtp.c: key pointer = 0xdb6cc0                                                                                                        
[2012-11-29 22:23:43] NOTICE[18113] cdr.c: CDR simple logging enabled.


By: Jonathan Rose (jrose) 2012-11-30 09:17:24.171-0600

Ok, it sounds like Asterisk just crashed before that log message went through then. Could you please apply the patch that stopped the crashing on top of that one and tell me what you get?

By: tootai (tootai) 2012-11-30 11:09:53.465-0600

Now we have the key value ;-):

[2012-11-30 18:08:10] NOTICE[16645]: res_srtp.c:439 ast_srtp_create: session pointer: 0xea63b90
[2012-11-30 18:08:10] NOTICE[16645]: res_srtp.c:440 ast_srtp_create: policy pointer: 0xe956018
[2012-11-30 18:08:10] NOTICE[16645]: res_srtp.c:441 ast_srtp_create: Just for grins: 0xe956018
[2012-11-30 18:08:10] NOTICE[16645]: res_srtp.c:445 ast_srtp_create: ssrc type = 2
[2012-11-30 18:08:10] NOTICE[16645]: res_srtp.c:446 ast_srtp_create: ssrc value = 0
[2012-11-30 18:08:10] NOTICE[16645]: res_srtp.c:448 ast_srtp_create: key pointer = 0xe93e910
[2012-11-30 18:08:10] NOTICE[16645]: res_srtp.c:449 ast_srtp_create: key value = 7
[2012-11-30 18:08:10] WARNING[16645]: res_srtp.c:511 ast_srtp_add_stream: Failed to add SRTP stream for SSRC 1512096749
[2012-11-30 18:08:10] WARNING[16645]: sip/sdp_crypto.c:170 sdp_crypto_activate: Could not set SRTP policies
[2012-11-30 18:08:10] WARNING[16645]: chan_sip.c:9509 process_sdp: Can't provide secure audio requested in SDP offer



By: Jonathan Rose (jrose) 2012-12-03 09:47:31.898-0600

Sorry about this, but I've basically exhausted the angles I can think of to approach this problem from without hacking on the library itself to isolate it further and at present, I can't devote any more time to this issue. I've tried to recreate every detail of your setup from configurations and the phone I'm using to actually making sure my SIP invite is a perfect mirror of yours, and everything seems nearly identical. The difference probably lies in some crypto-specific stuff within the policy struct received by libsrtp, but I can't confirm that.

I'm going to go ahead and post a review for the crash itself. The problems you are having might be related to specifics of your operating system (including differences in libraries), so if you want to go further down the rabbit hole with this issue, reproducing it (or failing to) on a different machine would be a good start. You should open a new issue now that the crash is resolved, though a link to this one would be a good idea.

By: tootai (tootai) 2012-12-05 11:58:52.347-0600

Hi Jonathan,

I tried your solution against the usual test server with previous patch applied and get:

[2012-12-05 18:52:40] NOTICE[21308]: res_srtp.c:441 ast_srtp_create: session pointer: 0x2ef2530
[2012-12-05 18:52:40] NOTICE[21308]: res_srtp.c:442 ast_srtp_create: policy pointer: 0xdd28fb8
[2012-12-05 18:52:40] NOTICE[21308]: res_srtp.c:443 ast_srtp_create: Just for grins: 0xdd28fb8
[2012-12-05 18:52:40] NOTICE[21308]: res_srtp.c:447 ast_srtp_create: ssrc type = 2
[2012-12-05 18:52:40] NOTICE[21308]: res_srtp.c:448 ast_srtp_create: ssrc value = 0
[2012-12-05 18:52:40] NOTICE[21308]: res_srtp.c:450 ast_srtp_create: key pointer = 0x2dbf030
[2012-12-05 18:52:40] NOTICE[21308]: res_srtp.c:451 ast_srtp_create: key value = 41
[2012-12-05 18:52:40] WARNING[21308]: sip/sdp_crypto.c:170 sdp_crypto_activate: Could not set SRTP policies
[2012-12-05 18:52:40] WARNING[21308]: chan_sip.c:9509 process_sdp: Can't provide secure audio requested in SDP offer

I tried on another server -the one for which I opened this bug, see details in my first message- asterisk is still crashing

I will send you the backtrace

By: tootai (tootai) 2012-12-05 12:13:25.162-0600

backtrace from IBM server with your last patch from version 10 repo (applied manualy to a 10.10.0 asterisk)

By: Matt Jordan (mjordan) 2012-12-05 12:19:19.084-0600

The fact that you're getting a 'bad memory access' error when it crashes - and the the location that causes this error is in libsrtp - makes me think that this is a product of the system you're running on.  As such, we're highly unlikely to be able to reproduce or resolve this issue.

By: tootai (tootai) 2012-12-05 12:50:53.559-0600

I remain that todays tests where done on 2 different servers: physical server running RH5.8 and virtual machine running debian wheezy on an HP DL server.

I made another test on a third server with asterisk 1.8 under wheezy in a virtual machine on a Dell R210 II server, call finish also in crash. If you still have some doubts concerning the system ... ;-)

Am I the only one trying to get SRTP/TLS to work with asterisk? Hope not!

If you see something else I could test, let me know.

By: Jonathan Rose (jrose) 2012-12-05 13:27:01.963-0600

No, you are just the only one we are seeing who is having these particular problems. It's weird because I had no trouble setting up a working srtp+tls enabled peer using the same phone and most of the same settings as you are.

EDIT:
It looks like the patch did cause a different crash also occuring with srtp_create for you though.  I'll poke at that and try to fix it real quick.

EDIT:
Ehhhh, looking over it again I'm not so sure.  Are you using fresh checkouts of 10 from svn to cause this crash?  I noticed you posted the various diagnostic patch messages in this last log dump you were talking about, but that had nothing to do with the the actual patch that went in.

EDIT:
I found out why the crash in unprotect is happening... and this is messier than I'd anticipated. A lot of stuff isn't handling the srtp_create failures well at all and getting things to do so is certainly proving tricky.

By: tootai (tootai) 2012-12-05 16:34:03.953-0600

Will install tomorrow a fresh 10.10 version with your last patch and give atry.

By: Jonathan Rose (jrose) 2012-12-05 16:58:57.250-0600

Before you do that, can you confirm for me that you are in fact using a completely fresh 10.10.0 for the purpose of the backtraces from before?  I've been trying to match some line numbers to some calls in 10.10.0 against the backtrace you provided and one of the lines in particular doesn't seem to make a great deal of sense:

#6  0x00007f78259425aa in sdp_crypto_process (p=0xd2e6060, attr=0xd28f339 "crypto:1 AES_CM_128_HMAC_SHA1_80 inline:PHWJdnd8Sx9fa8a7SkqGRxYxYwiEpXZrXI8w1Y7Y", rtp=0xd2d12c8,
   srtp=0xfe9830) at sip/sdp_crypto.c:275
       str = 0x0
       tag = 0x7f7808fd4b27 "1"
       suite = 0x7f7808fd4b29 "AES_CM_128_HMAC_SHA1_80"
       key_params = 0x0
       key_param = 0x0
       session_params = 0x0
       key_salt = 0x7f7808fd4b48 "PHWJdnd8Sx9fa8a7SkqGRxYxYwiEpXZrXI8w1Y7Y"
       lifetime = 0x0
       found = 1
       attr_len = 80
       key_len = 30
       suite_val = 1
       remote_key = "<u\211vw|K\037_kƻJJ\206G\026\061c\b\204\245vk\\\217\060", <incomplete sequence \330>
       __PRETTY_FUNCTION__ = "sdp_crypto_process"
#7  0x00007f7825934384 in process_crypto (p=0xd2bdcf8, rtp=0xd2d12c8, srtp=0xd2c13b0,
   a=0xd28f339 "crypto:1 AES_CM_128_HMAC_SHA1_80 inline:PHWJdnd8Sx9fa8a7SkqGRxYxYwiEpXZrXI8w1Y7Y") at chan_sip.c:30938
       __PRETTY_FUNCTION__ = "process_crypto"

When I checked line 30938 I noticed that that the nearest call to sdp_crypto_process was actually occurring at 30930 8 lines before that one and that was as close as it came in any of the versions we've been looking at for this issue and that would seem to suggest that there are local modifications to chan_sip (which hasn't been the subject of any of my patches) for all of the first three backtraces.

Following that further back I saw more line number discrepencies such as right here:

#9  0x00007f782590d873 in handle_request_invite (p=0xd2bdcf8, req=0x7f7808ff9150, debug=0, seqno=1, addr=0xffa4a8, recount=0x7f7808ff81ec,
   e=0xd28ef9f "sip:800@sip3.tootai.net;user=phone", nounlock=0x7f7808ff81e8) at chan_sip.c:23571

which should have been process_sdp, but the closest instance of that call was at 23563 (again, 8 lines off).

Aside from that though, I'm curious, is the new crash occurring on all three of your systems?  I'm a little confused by how you still could have gotten a crash on the system that generated this backtrace (without the new patches applied) since no reading or writing of frames should have been handled at this point. process_sdp should have failed here and the call should have ended... or at least it does on my box when I force ast_srtp_create to fail.

By: tootai (tootai) 2012-12-06 04:56:08.948-0600

1436 is 10.10.0 fresh install
1442 is 10.10.0 with patch 377261 manually applied

By: tootai (tootai) 2012-12-06 05:02:37.031-0600

It's an 10.7.0 against which 10.8.0 10.9.0 and 10.10.0 patch where applied. Then I applied diff 377261 manually. srtp is 1.4.4.

I made a fresh reinstall on this server RH el5 with 16G memory, 10.10.0.tar.gz downloaded from asterisk.org I attach 4 new debug files, crash with stock version as well as with patch 377261 applied.

Will do the same with 1.8 server

By: tootai (tootai) 2012-12-06 05:35:45.336-0600

Fresh 1.8 install to a virtual machine running under Wheezy (Copy of our production server). 1213 is crash from stock version, 1222 is crash with 377261 patch apllied

Edit: patch 377256

By: tootai (tootai) 2012-12-06 06:13:05.959-0600

This is 10.10.0 from another VM running under Wheezy on a third server (the one generally used to debug this issue). One patch 377261 is applied, there is no more crash unlike the first test on the physical IBM machine in the same conditions. Instead, I have

[2012-12-06 13:04:21] WARNING[5706]: sip/sdp_crypto.c:170 sdp_crypto_activate: Could not set SRTP policies
[2012-12-06 13:04:21] WARNING[5706]: chan_sip.c:9509 process_sdp: Can't provide secure audio requested in SDP offer


By: tootai (tootai) 2012-12-07 09:43:10.497-0600

I installed a new snom870 with firmware 8.7.3.15 and connect it to the IBM server running 10.10: no more crash! I then update the firmware from snom320 to the same firmware version (previous version was 8.4.35): no more crash too with the IBM server as well as with the VM running asterisk 1.8

Only the VM running 10.10 still gives me

[2012-12-07 16:31:12] WARNING[17433]: res_srtp.c:500 ast_srtp_add_stream: Failed to add SRTP stream for SSRC 97310990
[2012-12-07 16:31:12] WARNING[17433]: sip/sdp_crypto.c:170 sdp_crypto_activate: Could not set SRTP policies
[2012-12-07 16:31:12] WARNING[17433]: chan_sip.c:9509 process_sdp: Can't provide secure audio requested in SDP offer

Well, don't know what to say :-(

By: tootai (tootai) 2012-12-07 10:04:13.271-0600

Enjoy to early. 3 calls after, again core dump.

By: tootai (tootai) 2012-12-07 10:18:02.368-0600

Debug after the first core dump. Before this, 3 calls (2 from phone to phone, one from phone to echo test) passed smoothly using TLS/SRTP.

By: Jonathan Rose (jrose) 2012-12-07 11:29:32.044-0600

Hey Tootai, for the most recent core dump you are posting, have you applied the patch from reviewboard that I haven't actually mentioned in this issue anywhere?

https://reviewboard.asterisk.org/r/2228/

I'm guessing no (because you had no way to know to do so), but what I'm getting at is that this should resolve the crashes from ast_srtp_unprotect that you are seeing in there.

As per most of my patches in this issue, there is no expectation for this to actually resolve your SRTP session problems, just the crash. What I expect to see is for the call involved to immediately be terminated but for normal operation to continue instead of getting a segfault.

By: tootai (tootai) 2012-12-12 06:09:46.608-0600

Jonathan: an all core dump was your patch applied.

Anyway, I found out what happends: after having upgraded the snom320 firmware, I try secure calls to 2 servers without having any problem, and the third one had the failure described on 07/Dec/12 5:18PM. From now, doesn't matter which server I call, all secure calls will generate a core dump.

Now I restart the snom320 and ... everything is coming back to a normal behaviour, no more crashes. This bring two questions:

- why asterisk crashes on the third server. I will investigate ...
- why the snom is able to crash other servers *only in SRTP mode* after the first crash

Daniel

By: Jonathan Rose (jrose) 2012-12-12 09:41:04.353-0600

Can you recreate that with a sip debug enabled on all servers?  If you can, I'd like you to show me the following:

*Incoming log messages when either of the first two servers gets a call from the SNOM and it works.
*Incoming log messages on the third server you mentioned when it triggers the initial crash.
*Incoming log messages on whichever of the first two you tested above when the SNOM will now trigger a crash.

Please test a fresh checkout from the asterisk 10 branch (don't use an already released version, get it from http://svn.asterisk.org/svn/asterisk/branches/10) and apply the patch from reviewboard.  For your convenience, I will post said patch to the issue as: asterisk_srtp_unprotect_patch.diff

EDIT:  Hold up for a few minutes actually.  I'm going to be adding some extra log messages to the patch when for you to test with.

By: Jonathan Rose (jrose) 2012-12-12 10:40:26.409-0600

Ok, I'm attaching asterisk_srtp_unprotect_patch right now.

Here's what I want you to do:
For each box you are testing with, check out a fresh version of Asterisk 10 from the svn repository.

$ svn co http://svn.asterisk.org/svn/asterisk/branches/10

$ cd 10

<at this point download the patch to the working directory>

$ patch -p0 < asterisk_srtp_unprotect_patch.diff

$ ./configure --enable-dev-mode

$ make

$ make uninstall

$ make install

All the above is just to make sure everything is cleared out and we are at the same starting point. There should be absolutely no modifications to chan_sip.c or anything.

Start asterisk and turn on sip debug.

From here do whatever it is you are doing to cause the crash. Give me all log messages and a fresh backtrace for each system that crashes.

Just to simplify things, only use one of the systems that normally works and the box that seems to start causing the problems.

I want to see the normal case and the case where crashes start occurring for the box that you are describing to work normally.

By: tootai (tootai) 2012-12-29 09:09:49.697-0600

Hi Jonathan,

sorry for delay, work to finish before christmas :-)

Well, I started to do what you asked me for, but face a new problem: in the mean time I upgraded my snom from 8.4.35 firmware to 8.7.3.15 and now no more TLS/STRP stuff :-(

I opened a issue on snom forum, you can find it here http://forum.snom.com/index.php?showtopic=11011 with all details and logs.

This story brings me crazy, don't know what to think (and to do)

Daniel

By: Jonathan Rose (jrose) 2013-01-04 16:43:29.264-0600

Well, for now we can go ahead and set this aside with the current patch which was approved the other day which resolves the issues (or at bare minimum pushes them deeper into the abyss) we were able to identify. I never was able to reproduce your actual crash, so if you can't do it now either we are pretty effectively stuck on that one. If you get everything back in order and are still encountering these crashes related to srtp, it'll be appropriate to open a new issue at that time coupled with new backtraces and such.

By: tootai (tootai) 2013-01-06 10:54:25.925-0600

Hi Jonathan,

I installed a fresh 10.11.1 on the server that causes problem, reset my snom320 to factory value (asked by SNOM support) and enter again the datas.

FYI, in snom320, ID 11 is the one connected to the server in VM, ID 12 the one connected to physical IBM. I have both certificats loaded in the phone, first in the list is the one from IBM server, second the one from VM server.

I pass a call in TLS/SRP on 10.11.1 stock version: server crashed.

I then apply the below patch I created from a diff from previous 10.10.0 asterisk version.

--- res/res_srtp.c      2012-05-18 16:01:56.000000000 +0200
+++ res/res_srtp.c      2012-12-06 13:00:22.593095998 +0100
@@ -37,7 +37,7 @@

#include "asterisk.h"

-ASTERISK_FILE_VERSION(__FILE__, "$Revision: 366881 $")
+ASTERISK_FILE_VERSION(__FILE__, "$Revision: 377256 $")

#include <srtp/srtp.h>

@@ -383,6 +383,8 @@
                                       retry++;
                                       ao2_iterator_destroy(&it);
                                       goto tryagain;
+                               } else {
+                                       srtp->session = NULL;
                               }
                               ao2_t_ref(policy, -1, "Unreffing first policy after srtp_create failed");
                       }
@@ -437,6 +439,8 @@

       /* Any failures after this point can use ast_srtp_destroy to destroy the instance */
       if (srtp_create(&temp->session, &policy->sp) != err_status_ok) {
+               /* Session either wasn't created or was created and dealloced. */
+               temp->session = NULL;
               ast_srtp_destroy(temp);
               return -1;
       }

Calls are no more crashing asterisk but they still don't get through, staying with an error asterisk-10.11.1/main/rtp_engine.c->int ast_rtp_instance_add_srtp_policy(struct ast_rtp_instance *instance, struct ast_srtp_policy *remote_policy, struct ast_srtp_policy *local_policy) where

if (!instance->srtp) {                                                                                                                                                
            res = res_srtp->create(&instance->srtp, instance, remote_policy);

returns -1 as you can see here (I added my own logs):

[2013-01-06 17:08:48] WARNING[3526] rtp_engine.c: create instance result: -1 in ast_rtp_instance add srtp policy                                                              
[2013-01-06 17:08:48] WARNING[3526] rtp_engine.c: returned result: -1 in ast_rtp_instance add srtp policy                                                                      
[2013-01-06 17:08:48] WARNING[3526] sip/sdp_crypto.c: Could not set SRTP policies    

On the second server (physical IBM) everything is OK, asterisk 10.10.0 with the above patch applied.

This bring me to a question: would it be possible that the problem on the VM server to setup SRTP is coming from certificat, eg first one used instead of the second one (if certificat is used for SRTP)?

Daniel

By: tootai (tootai) 2013-01-06 10:58:21.555-0600

The working patch to avoid crash