Summary: ASTERISK-16665: Asterisk Crash on RTCP package in SRTP mode
Reporter: Bernhard S (bernhards)
Labels:
Date Opened: 2010-09-10 01:52:08
Date Closed: 2011-10-05 17:33:29
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Resources/res_srtp
Versions:
Frequency of Occurrence:
Related Issues: is related to ASTERISK-18570 Crashes in RTCP handling
Environment:
Attachments:
( 0) bt_full_not_optimized.txt
( 1) bt_full.txt
( 2) bt_not_optimized.txt
( 3) bt.txt
( 4) bt-full.txt
( 5) srtp-crash-after-5sek.pcap
Description: A "snom360-SIP 8.4.18 42570" is connected to Asterisk over TLS. The snom makes an outbound call to another phone (without SRTP). The other phone rings, then Asterisk crashes.

libsrtp version 1.4.4 was used, unmodified.

****** ADDITIONAL INFORMATION ******

Last Output of the "full-log":
[Sep 10 08:26:34] DEBUG[22299] res_rtp_asterisk.c: Ooh, format changed from unknown to ulaw
[Sep 10 08:26:34] DEBUG[22299] res_rtp_asterisk.c: Created smoother: format: ulaw ms: 20 len: 160
[Sep 10 08:26:34] DEBUG[22299] res_rtp_asterisk.c: Starting RTCP transmission on RTP instance '0x9bab4d8'
[Sep 10 08:26:34] DEBUG[22299] res_rtp_asterisk.c: Setting RTCP address on RTP instance '0x9bab4d8'
[Sep 10 08:26:34] DEBUG[22299] res_rtp_asterisk.c: RTP NAT: Got audio from other end. Now sending to address 212.47.191.70:30456
[Sep 10 08:26:34] DEBUG[22299] res_rtp_asterisk.c: Ooh, format changed from unknown to alaw
[Sep 10 08:26:34] DEBUG[22299] res_rtp_asterisk.c: Created smoother: format: alaw ms: 20 len: 160
[Sep 10 08:26:34] DEBUG[22299] res_rtp_asterisk.c: Starting RTCP transmission on RTP instance '0xb4334c58'

GDB backtrace:

#0  0x0085d612 in rdb_add_index () from /usr/lib/asterisk/modules/res_srtp.so
(gdb) bt
#0  0x0085d612 in rdb_add_index () from /usr/lib/asterisk/modules/res_srtp.so
#1  0x008577a1 in srtp_unprotect_rtcp () from /usr/lib/asterisk/modules/res_srtp.so
#2  0x00856199 in ast_srtp_unprotect (srtp=0x9baee20, buf=0xb4184668, len=0xb418c6f4, rtcp=1) at res_srtp.c:328
#3  0x0074855a in __rtp_recvfrom (instance=0x9bab4d8) at res_rtp_asterisk.c:328
#4  rtcp_recvfrom (instance=0x9bab4d8) at res_rtp_asterisk.c:337
#5  ast_rtcp_read (instance=0x9bab4d8) at res_rtp_asterisk.c:1637
#6  0x0074b4af in ast_rtp_read (instance=0x9bab4d8, rtcp=1) at res_rtp_asterisk.c:1975
#7  0x045a84c5 in sip_rtp_read (ast=0xb432c260) at chan_sip.c:6839
#8  sip_read (ast=0xb432c260) at chan_sip.c:6920
#9  0x080bb5b3 in __ast_read (chan=0xb432c260, dropaudio=0) at channel.c:3742
#10 0x0089666b in wait_for_answer (in=0xb432c260, outgoing=0xb4331890, to=0xb418e170, peerflags=0xb418e1a0,
   opt_args=0xb418e0b0, pa=0xb418d7f0, num_in=0xb418e104, result=0xb418e15c, dtmf_progress=0x0, ignore_cc=1)
   at app_dial.c:1334
#11 0x00898a11 in dial_exec_full (chan=0xb432c260, data=<value optimized out>, peerflags=0xb418e1a0, continue_exec=0x0)
   at app_dial.c:2236
#12 0x0089d199 in dial_exec (chan=0xb432c260, data=0xb4190254 "SIP/sip07/0041445633931:::,120,t") at app_dial.c:2733
#13 0x08137fab in pbx_exec (c=0xb432c260, app=0x9adf560, data=0xb4190254 "SIP/sip07/0041445633931:::,120,t")
Comments:

By: Marcello Ceschia (marcelloceschia) 2010-09-10 08:17:33

Workaround for this:
- Disable RTCP support on the phone under Advanced->SIP/RTP->RTCP Support

By: Bernhard S (bernhards) 2010-09-10 08:46:09

Yes I know. But this does not solve the main problem :-)

By: Marcello Ceschia (marcelloceschia) 2010-09-10 09:24:54

But it looks like a firmware issue on the snom.
I had the same issue with 8.4.18.
I downgraded to 8.2.35 -> no problem
Upgraded back to 8.4.18 -> no problem any more

By: Bernhard S (bernhards) 2010-09-10 09:34:51

That might be a timing issue, but neither the server nor the client should crash if the other party does something wrong.

By: Marcello Ceschia (marcelloceschia) 2010-09-10 09:44:25

You are right, but first of all we should find out why this happens.

By: Paul Belanger (pabelanger) 2010-09-10 16:10:40

We will need an unoptimized (see below) backtrace.

---
Thank you for your bug report. In order to move your issue forward, we require a backtrace from the core file produced after the crash. Please see the doc/backtrace.txt file in your Asterisk source directory.

Also, be sure you have DONT_OPTIMIZE enabled in menuselect within the Compiler Flags section, then:

make install

after enabling, reproduce the crash, and then execute the instructions in doc/backtrace.txt.

When complete, attach that file to this issue report. Thanks!

By: Bernhard S (bernhards) 2010-09-10 17:03:17

I already added the last ~20 lines of the backtrace, which show what happened within the RTCP flow. As marcelloceschia said, he also sees the bug, so I think it should be possible to reproduce this ticket. I think this is a timing issue related to the establishment of the SRTP session. Maybe the RTCP packet is the VERY first SRTP packet that comes in and some SRTP state is not yet initialized.

By: Paul Belanger (pabelanger) 2010-09-10 19:04:15

Yes, however we still need you to follow the instructions in doc/backtrace.txt. More information is better than less.

By: Bernhard S (bernhards) 2010-09-11 01:48:09

Hi,

I added the backtrace and backtrace_full. I don't know whether they were created on an optimized Asterisk. I have to check that and, if they were, try to get non-optimized ones.

Maybe you can already find something interesting, or someone else can reproduce the failure, too.

Thanks for your help and your effort.

Best regards,
Bernhard

By: Marcello Ceschia (marcelloceschia) 2010-09-11 02:26:37

I hope I can reproduce this issue.
I always got the crash after 5 seconds of connection time, but it looks like it only happens if the phone has an uptime > 12 hours.

By: Paul Belanger (pabelanger) 2010-09-11 08:44:22

bernhards: Yes, they are still optimized, you can tell when you see "<value optimized out>" within a backtrace.

By: Bernhard S (bernhards) 2010-09-13 01:23:26

Please find attached the not-optimized backtraces of the segfault.

Best regards,
Bernhard

By: Marcello Ceschia (marcelloceschia) 2010-09-13 08:03:23

I added a non-optimized backtrace: bt-full.txt. After 5 seconds of connection I got this core dump.
Wireshark trace for this: srtp-crash-after-5sek.pcap

Uptime of the snom360: 1 day, 2 hours, 33 minutes
I use the res_srtp.c.patch from issue ASTERISK-1732563



By: Leif Madsen (lmadsen) 2010-09-13 10:43:27

Also, issues that are somewhat hardware related require information to be produced by the reporter as the developers do not necessarily have access to the same hardware you are using to produce the crash.

By: Bernhard S (bernhards) 2010-09-27 01:50:06

What information do you need to fix this issue for rc3? I think this is a major bug in the SRTP stack and possibly a release blocker.

By: Bernhard S (bernhards) 2010-10-21 01:30:01

Will there be a bugfix before Asterisk 1.8 or not?

By: 1stbs (1stbs) 2010-10-26 12:25:24

Hello, I have the same problem.
I tried to fix it and came up with the following analysis.

The error occurs in svn/trunk/srtp/crypto/replay/rdb.c,
specifically in the function rdb_add_index when it calls
v128_set_bit(&rdb->bitmask, rdb_bits_in_bitmask-delta);
I added some debugging output to the function:
index is : 3010
rdb->window_start is : 0
rdb_bits_in_bitmask is : 128
delta in else tree is : 2883

rdb_bits_in_bitmask-delta= -2755
and that negative value causes the core dump.

For now I have just commented out the else branch to avoid the crash. It seems to work, but replay protection is partly broken as a result.
Perhaps somebody here has a better idea for solving this issue.
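
The arithmetic in the debug output above can be checked in isolation. The following is only a sketch: `else_branch_bit_position` is a hypothetical helper (not part of libsrtp) that repeats the computation of the else branch of rdb_add_index, and the window size of 128 is taken from the debug output.

```c
#include <assert.h>
#include <stdint.h>

/* Window size reported in the debug output above (rdb_bits_in_bitmask). */
#define RDB_BITS_IN_BITMASK 128

/*
 * Hypothetical helper: returns the bit position that the else branch of
 * rdb_add_index() hands to v128_set_bit(), widened to a signed type so
 * that an out-of-range result shows up as a negative number.
 */
static int64_t else_branch_bit_position(uint32_t index, uint32_t window_start)
{
    uint32_t delta = index - window_start;

    /* Only the else branch (index beyond the current window) is modeled. */
    assert(delta >= RDB_BITS_IN_BITMASK);

    delta -= RDB_BITS_IN_BITMASK - 1;
    return (int64_t)RDB_BITS_IN_BITMASK - (int64_t)delta;
}
```

With index 3010 and window_start 0 this yields -2755, matching the debug output. The result is a valid top-of-window position (127) only when the index is exactly one past the window, e.g. index 128; for index 129 it is already 126.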

By: Birger "WIMPy" Harzenetter (wimpy) 2010-11-08 14:10:51.000-0600

I've got the same issue in Asterisk 1.8.0.
With a freshly started phone everything is ok.
After some uptime of the phone, about half an hour seems to be enough, Asterisk segfaults for me as well.
Interesting points for me are:
- According to the phone's display it's an insecure connection. (yes, I'm using TLS as well)
- Asterisk always crashes after nearly 5 seconds. But during that time audio works in both directions.

What can we do to help pinpoint the issue?

By: Andrew Fried (afried) 2010-12-06 10:01:47.000-0600

This problem also exists in 1.8.1-rc1.

I'm also using Snom 300 phones with the 8.4.18 firmware. As marcelloceschia stated, disabling RTCP Support on the phone itself did seem to solve the immediate problem.

I'd like to echo bernhards' concerns that this is a very serious bug and hope that Digium elevates this bug in their queue accordingly.

By: Gianluca Varisco (gvarisco) 2010-12-22 03:14:13.000-0600

I confirm the behavior also on 1.8.x trunk

By: Andrew Fried (afried) 2010-12-22 14:46:05.000-0600

Is there any way we can get this issue elevated to a priority above "regular" and assigned to someone for remediation?

The default configuration of all Snom phones has "RTCP Support" turned on, and any phone accepting a secure call with that setting causes Asterisk to core dump within 5 seconds of receiving the call when TLS/SRTP is active.

The bug surfaced in 1.8.0 (which added SRTP support) and remains in both 1.8.1 and 1.8.2-rc1.

By: Leif Madsen (lmadsen) 2011-01-05 15:03:18.000-0600

Issues are prioritized and assigned to developers as time and resources allow.

By: Bob Beers (bbeers) 2011-02-09 09:40:22.000-0600

I have a similar issue, but I did not get an Asterisk crash, just a call hangup.
I would get this warning from Asterisk when RTCP packet arrived, about 5 seconds after call started:

WARNING[3181]: res_rtp_asterisk.c:1817 ast_rtcp_read: RTCP Read error: Success.  Hanging up.

My problem occurs with Asterisk 1.8.0-beta2 and libsrtp-1.4.4.  The other end is Avaya Aura AS5300 13.0.0.4.
TEO TSG-6 <-- no TLS --> Asterisk <-- TLS --> Avaya <-???-> ???
No, I don't have a wireshark trace, but I have it on my todo list.
I worked around the issue by patching res_rtp_asterisk.c to ignore RTCP packets if SRTP was active. Then calls would stay up.
I don't recommend the patch, but I'll attach it to help developers isolate the issue.



By: Bob Beers (bbeers) 2011-02-09 09:43:17.000-0600

Well, truth is I feel so sure that the patch is a bad idea,
I don't even want to attach it as a submission,
but I'll paste it in-line here for informational purposes ...

Index: res/res_rtp_asterisk.c
===================================================================
--- res/res_rtp_asterisk.c      (revision 303637)
+++ res/res_rtp_asterisk.c      (working copy)
@@ -1640,6 +1640,7 @@
static struct ast_frame *ast_rtcp_read(struct ast_rtp_instance *instance)
{
       struct ast_rtp *rtp = ast_rtp_instance_get_data(instance);
+       struct ast_srtp *srtp = ast_rtp_instance_get_srtp(instance);
       struct ast_sockaddr addr;
       unsigned int rtcpdata[8192 + AST_FRIENDLY_OFFSET];
       unsigned int *rtcpheader = (unsigned int *)(rtcpdata + AST_FRIENDLY_OFFSET);
@@ -1652,6 +1653,11 @@
                               0, &addr)) < 0) {
               ast_assert(errno != EBADF);
               if (errno != EAGAIN) {
+                       if (srtp != NULL) {
+                               /* If we have an srtp stream, ignore srtcp (for now). */
+                               ast_log(LOG_WARNING, "RTCP Read error: %s.  We ignore during SRTP.\n", strerror(errno));
+                               return &ast_null_frame;
+                       }
                       ast_log(LOG_WARNING, "RTCP Read error: %s.  Hanging up.\n", strerror(errno));
                       return NULL;
               }



By: Roman Shpount (rshpount) 2011-04-07 23:33:45

There is an invalid index access in rdb.c when adding an index. When the sequence number of the SRTCP packet is outside of the window by more than the window size, a bit is set outside of the window bitmap. When the sequence number is outside of the window by more than one, a wrong bit is set. Here is the proposed fix.

In the current code in rdb.c:
err_status_t
rdb_add_index(rdb_t *rdb, uint32_t index) {
 uint32_t delta;  

 /* here we *assume* that index > rdb->window_start */

 delta = (index - rdb->window_start);    
 if (delta < rdb_bits_in_bitmask) {

   /* if the index is within the window, set the appropriate bit */
   v128_set_bit(&rdb->bitmask, delta);

 } else {
   
   delta -= rdb_bits_in_bitmask - 1;

   /* shift the window forward by delta bits*/
   v128_left_shift(&rdb->bitmask, delta);
   v128_set_bit(&rdb->bitmask, rdb_bits_in_bitmask-delta);
   rdb->window_start += delta;

 }    

 return err_status_ok;
}

It should be:

err_status_t
rdb_add_index(rdb_t *rdb, uint32_t index) {
 uint32_t delta;  

 /* here we *assume* that index > rdb->window_start */

 delta = (index - rdb->window_start);    
 if (delta < rdb_bits_in_bitmask) {

   /* if the index is within the window, set the appropriate bit */
   v128_set_bit(&rdb->bitmask, delta);

 } else {
   
   delta -= rdb_bits_in_bitmask - 1;

   /* shift the window forward by delta bits*/
   v128_left_shift(&rdb->bitmask, delta);
   v128_set_bit(&rdb->bitmask, rdb_bits_in_bitmask-1);
   rdb->window_start += delta;

 }    

 return err_status_ok;
}
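
To illustrate why the new index always belongs in the top position of the window after the shift, here is a minimal self-contained model of a sliding replay window. It is a sketch under stated assumptions, not libsrtp code: `replay_window_t`, `window_slide`, `window_add` and `window_contains` are hypothetical names, and the 128-bit bitmask is modeled as a plain byte array for readability.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WINDOW_BITS 128u  /* same size as rdb_bits_in_bitmask */

/* Hypothetical model of rdb_t: seen[i] tracks index window_start + i. */
typedef struct {
    uint32_t window_start;
    uint8_t  seen[WINDOW_BITS];
} replay_window_t;

/* Move the window forward by `shift` positions, dropping the oldest entries. */
static void window_slide(replay_window_t *w, uint32_t shift)
{
    if (shift >= WINDOW_BITS) {
        /* Everything previously tracked falls out of the window. */
        memset(w->seen, 0, sizeof w->seen);
    } else {
        memmove(w->seen, w->seen + shift, WINDOW_BITS - shift);
        memset(w->seen + (WINDOW_BITS - shift), 0, shift);
    }
    w->window_start += shift;
}

/* Record `index`; like rdb_add_index, assumes index >= window_start. */
static void window_add(replay_window_t *w, uint32_t index)
{
    uint32_t delta = index - w->window_start;

    if (delta < WINDOW_BITS) {
        /* Index is inside the window: mark its slot. */
        w->seen[delta] = 1;
    } else {
        /* Slide just far enough that `index` becomes the last slot ... */
        window_slide(w, delta - (WINDOW_BITS - 1));
        /* ... so the slot to mark is always WINDOW_BITS - 1, never
         * WINDOW_BITS - delta as in the original code. */
        w->seen[WINDOW_BITS - 1] = 1;
    }
}

/* Returns nonzero if `index` is inside the window and already recorded. */
static int window_contains(const replay_window_t *w, uint32_t index)
{
    uint32_t delta = index - w->window_start;
    return delta < WINDOW_BITS && w->seen[delta];
}
```

Starting from a zeroed window, adding index 3010 moves window_start to 2883 and marks the last slot, so no position outside the 128 slots is ever touched, however large the jump in sequence numbers is.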

By: Gregory Hinton Nietsky (irroot) 2011-05-23 04:16:39

This is not an Asterisk problem; the problem is in srtp. The patch above should not be on Mantis; it is available on the SRTP SourceForge page.

By: Vladimir Mikhelson (vmikhelson) 2011-05-24 15:47:34

rshpount, thank you for posting here. It definitely helps with troubleshooting Asterisk crashes. Can you please post a diff? I would like to test your patch.

Here is the link to the SourceForge LibSRTP Bug Tracker irroot referred to:
http://sourceforge.net/tracker/?func=detail&aid=3280295&group_id=38894&atid=423799

-Vladimir

By: Roman Shpount (rshpount) 2011-05-26 00:50:56

Please see the difference between versions 1.4 and 1.5 of rdb.c in libsrtp:
http://srtp.cvs.sourceforge.net/viewvc/srtp/srtp/crypto/replay/rdb.c?r1=1.4&r2=1.5

This is the fix for this bug.

By: Leif Madsen (lmadsen) 2011-10-05 17:33:29.269-0500

Closing as it was an issue outside of Asterisk.

By: Thomas Arimont (tomaso) 2011-12-07 03:06:10.438-0600

The patch referenced here did not work in my environment (srtp lib 1.4.4, Asterisk 1.8.8.0-rc4, SNOM Version 7). I had to use this one (found it somewhere, but can't remember where):

{noformat}
--- srtp-1.4.4-r0/crypto/replay/rdb.c 2006/06/08 17:00:28
+++ srtp-1.4.4-r0/crypto/replay/rdb.c 2010/05/14 22:15:12
@@ -115,7 +115,7 @@

    /* shift the window forward by delta bits*/
    v128_left_shift(&rdb->bitmask, delta);
-    v128_set_bit(&rdb->bitmask, rdb_bits_in_bitmask-delta);
+    v128_set_bit(&rdb->bitmask, 127);
    rdb->window_start += delta;

  }    
{noformat}


By: Iñaki Baz Castillo (ibc) 2012-03-29 07:42:16.801-0500

Assuming that this bug is fixed in libsrtp 1.5 (as Roman Shpount clarifies in a previous comment), note that the current libsrtp version on most systems is still 1.4.X.

So, wouldn't it be better if Asterisk included libsrtp 1.5 within its sources rather than depending on an external library that will probably be a 1.4.X version?