[Home]

Summary:ASTERISK-24291: res_srtp module stops working after about 35.000 processed calls
Reporter:Robert Hirschmann (roberth)Labels:
Date Opened:2014-08-30 07:25:29Date Closed:2014-10-28 09:29:13
Priority:CriticalRegression?
Status:Closed/CompleteComponents:Channels/chan_sip/SRTP Resources/res_srtp
Versions:1.8.23.1 11.11.0 11.12.0 Frequency of
Occurrence
Related
Issues:
Environment:Ubuntu 12.04.5 LTS (GNU/Linux 3.13.0-34-generic x86_64) running on HP DL360 G6/7, latest libsrtp0 version 1.4.4+20100615~dfsg-1build, SIP only environmentAttachments:( 0) issue_24291_full_log.14.txt
Description:When using encryption for RTP streams, asterisk does not accept any calls after about 35k calls (reproducable) have been processed correctly.
All further inbound and outbound calls are rejected with a
"488 - Not Acceptable Here".

When this happens, one asterisk machine shows:

{noformat}
[2014-08-29 17:32:23.807] DEBUG[28500][C-00009387]: chan_sip.c:10530 process_sdp: Processing media-level (audio) SDP a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:cYswzW2zYpdgsVkKgQWvdbUSLedzlE8nByMqEYiI... UNSUPPORTED OR FAILED.
[2014-08-29 17:32:23.807] WARNING[28500][C-00009387]: chan_sip.c:10535 process_sdp: Rejecting secure audio stream without encryption details: audio 11070 RTP/SAVP 8 0 101
{noformat}

the destination asterisk shows:
{noformat}
WARNING[10222][C-0000883a]: chan_sip.c:12925 get_crypto_attrib: No SRTP key management enabled
{noformat}

Active srtp calls are not affected when this issue occurs, but all further Invites are rejected with the 488 response, so no more calls can be processed.

The only solution at the moment is to restart asterisk or to wait until no more SRTP calls are active and then unload res_srtp.so followed by loading the module again.

add info:
- problem occurs regardless of using SIP over TLS or SIP without TLS
- other (unencrypted) RTP connections are still working  

If you need further info, just let me know.

Thanks for checking into this!
Robert

   
Comments:By: Matt Jordan (mjordan) 2014-08-30 14:34:51.094-0500

We require a complete debug log to help triage the issue. This document will provide instructions on how to collect debugging logs from an Asterisk machine for the purpose of helping bug marshals troubleshoot an issue: https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

When the issue occurs, please make sure you have both 'sip set debug on' and 'rtp set debug on' enabled.

By: Robert Hirschmann (roberth) 2014-08-31 11:17:17.072-0500

Hello Matt,

thank you for your quick feedback.
Attached you will find the trace you have requested.
The trace starts around 60-70 calls before.
Until call [C-00008824] everything works fine.
At call [C-00008825] the problem starts and SRTP no longer works until the res_srtp.so module has been reloaded.
After reloading the module the issue occurs again after about 35k SRTP calls (to be exactly: 34852).

These are the important trace lines:

last working call:
{noformat}
[2014-08-31 17:38:28.656] DEBUG[12804][C-00008824] sip/sdp_crypto.c: local_key64 7zftbmFk8DEQVnePF8a2tbyTW8wC3TgkH1CK4cz+ len 40
[2014-08-31 17:38:28.656] DEBUG[12804][C-00008824] sip/sdp_crypto.c: Crypto line: a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:7zftbmFk8DEQVnePF8a2tbyTW8wC3TgkH1CK4cz+
{noformat}

first call where the issue occurs and the crypto line is no longer added to the sdp:
{noformat}
[2014-08-31 17:38:28.974] WARNING[12805][C-00008825] chan_sip.c: No SRTP key management enabled
{noformat}

Do you need further info/traces to find the cause for this?

Thanks
Robert

By: Robert Hirschmann (roberth) 2014-09-07 23:59:03.543-0500

trace has been uploaded. do you need further details?

By: Rusty Newton (rnewton) 2014-09-11 13:15:13.466-0500

Your trace didn't have RTP debug enabled, however doing so with your call volume could be massive (depending on your concurrent call state).

If you can get that and a PCAP showing those last few calls including the first failure, that would be excellent.  This is a strange issue.

Oh, additionally

* configuration for the extensions involved. That is, sip.conf.

P.S. If your scenario is really a load test and not a production system, could you provide us all the configuration you are using with instructions to perform your load test so that we could reproduce it?

By: Robert Hirschmann (roberth) 2014-09-11 16:59:08.534-0500

Thanks for your feedback Rusty.

This is a production system with heavy load, but I can switch main traffic to another prod machine tomorrow to reproduce the issue with the loadtest-script.
As I know that it happens every 34-35k calls I can enable rtp debug some few hundred calls before the issue occurs.
The test scenario simulates nearly the same behavior which we have in production (busy calls, answered calls with different call lengths, cancelled calls an so on).

For load testing and to reproduce the issue I use 2 additional Asterisk machines that behave like the clients behind the Asterisk Routing Servers:

Server 1: Dial Out Asterisk Test Server (calls Asterisk Server 2 with unencrypted RTP traffic)
Server 2+3: Asterisk Production Servers. They route traffic between Server 2+3 over WAN with SIP-TLS and SRTP. The issue happens on both servers.
Server 4: Asterisk Test call receiver (gets calls from Server 3 unencrypted)

Asterisk 1+2 and 3+4 are on separate physical networks. Link between server 2+3 is a dedicated symmetric 100MBit WAN Link.
Servers 2 has two network cards with different vlans for internal/external traffic.
Server 3 has one network card with different vlans for internal/external traffic.

sip.conf on Server 2 as well as Server 3 is quite simple:
{noformat}
;--- start sip.conf
[general]
context=default        
match_auth_username=yes
allowoverlap=yes        
udpbindaddr=0.0.0.0                
nat=auto_force_rport,auto_comedia
tcpenable=no            
tcpbindaddr=0.0.0.0    
tlsenable=yes
tlsbindaddr=0.0.0.0
tlscertfile=/etc/asterisk/keys/asterisk.pem
tlscafile=/etc/asterisk/keys/ca.crt
tlscipher=ALL
tlsclientmethod=tlsv1
tlsdontverifyserver=yes
transport=udp      
srvlookup=yes      
tos_sip=cs3            
tos_audio=ef          
tos_video=af41        
tos_text=af41          
cos_sip=3              
cos_audio=5            
cos_video=4            
cos_text=3            
useragent=Asterisk PBX
legacy_useroption_parsing=yes
use_q850_reason = yes
session-timers=refuse
sdpsession=Asterisk PBX

[authentication]

[tpl_ivr](!)
context=trunk_ivrs
type=peer
dtmfmode=rfc2833
qualify=no
port=5060
usereqphone=yes
callcounter=yes
disallow=all
allow=alaw
allow=ulaw
sendrpid = no
trustrpid = no
disallowed_methods = UPDATE
accountcode=ivr_not_set
directmedia=no
deny=0.0.0.0/0.0.0.0
permit=10.100.1.0/27

[ivr3vlan54](tpl_ivr)
host=10.100.1.23 ; internal dial out IVR on vlan54

[bccc]
context=trunk_bccc
secret=none ; not used. we have insecure=invite with an acl
type=friend  
dtmfmode=info
host=X.X.X.X ; WAN IP of Asterisk 3 Server
directmedia=no
nat=no
qualify=yes
disallow=all
allow=alaw        
allow=ulaw        
accountcode=bccc
callcounter=yes
deny=0.0.0.0/0.0.0.0
permit=x.x.x.x/32 ; WAN IP of Asterisk 3 Server
transport=tls
encryption=yes ; CAUTION! when enabled for SRTP, jira issue 24291 occurs
progressinband=yes
tonezone=de
insecure=invite

;--- end of sip.conf
{noformat}

I will reply tomorrow with traces when I have finished load tests with rtp debugging enabled.
In addition, I will send you the involved extensions.conf parts to reproduce it.

Thank you,
Robert



By: Robert Hirschmann (roberth) 2014-09-13 12:19:37.123-0500

Hello Rusty,

I just reproduced the issue.
Everyime it happens exactly between srtp call 34852 and 34853.
I have set "rtp set debug on" as well, but the logfile does not contain the rtp debug output.
Is there any additional keyword in logger.conf I need to enable?
Currently I use:
{noformat}
issue_24291_full_log => notice,warning,error,debug,verbose,dtmf
{noformat}

The other debug output in the file is the same as in the last trace I've sent.
The first thing I see is the "no srtp key management" message but nothing else:
{noformat}
[2014-09-13 19:00:03.425] DEBUG[29254][C-00008825] chan_sip.c: ** Our capability: (ulaw|alaw) Video flag: False Text flag: False
[2014-09-13 19:00:03.425] DEBUG[29254][C-00008825] chan_sip.c: ** Our prefcodec: (alaw)
[2014-09-13 19:00:03.425] VERBOSE[29254][C-00008825] chan_sip.c: [2014-09-13 19:00:03.425] Audio is at 18496
[2014-09-13 19:00:03.425] WARNING[29254][C-00008825] chan_sip.c: No SRTP key management enabled
[2014-09-13 19:00:03.425] VERBOSE[29254][C-00008825] chan_sip.c: [2014-09-13 19:00:03.425] Adding codec 100004 (alaw) to SDP
[2014-09-13 19:00:03.425] VERBOSE[29254][C-00008825] chan_sip.c: [2014-09-13 19:00:03.425] Adding codec 100003 (ulaw) to SDP
[2014-09-13 19:00:03.425] DEBUG[29254][C-00008825] chan_sip.c: -- Done with adding codecs to SDP
[2014-09-13 19:00:03.425] DEBUG[29254][C-00008825] chan_sip.c: Done building SDP. Settling with this capability: (ulaw|alaw)
{noformat}

Do I need to increase the debug level to something higher than 5?
I could also trace RTP with tcpdump, but it might be useless because it's encrypted...

Thanks for your feedback,
Robert

By: Dwayne Hubbard (dwayne) 2014-09-25 13:44:35.827-0500

I don't see how to add to the Affected Version list but I'm seeing this issue on 1.8.23.1.  Trying to reproduce the issue now.

By: Rusty Newton (rnewton) 2014-09-25 14:20:21.013-0500

[~roberth]
{quote}
I have set "rtp set debug on" as well, but the logfile does not contain the rtp debug output.
Is there any additional keyword in logger.conf I need to enable?
{quote}
After you have turned the debug on, you may have to run "logger reload". Or perhaps nothing shows because it is SRTP. I haven't looked at the output of "rtp set debug on" with SRTP.

[~dwayne]
{quote}
I don't see how to add to the Affected Version list but I'm seeing this issue on 1.8.23.1. Trying to reproduce the issue now.
{quote}
I added the affected version for you.



By: Hiroaki Komatsu (Hiroaki Komatsu) 2014-10-15 20:58:34.074-0500

Similar problems in my environment has occurred.

◎Environment
・OS
 centOS 6.5 64bit
・asterisk version
 certified-asterisk-11.2-cert1
・srtp library version
 srtp-1.4.2
 
◎How to reproduce
・our uac connects to asterisk using TLS/SRTP.
 Asterisk can process this offer at first, but when asterisk asterisk processed over
 17,000 calls(we checked by 'core show channels'),
 asterisk responds with the return code 488, and call invoked by the uac failed.
 but other (unencrypted) RTP connections are still working.

When this happens, one asterisk machine shows:
{noformat}
[2014-10-09 01:51:11] WARNING[19056][C-00004422] chan_sip.c: No SRTP key management enabled
[2014-10-09 01:51:11] WARNING[1533][C-00004423] chan_sip.c: Can't provide secure audio requested in SDP offer
{noformat}

{noformat}
CLI>core show channels
Channel              Location             State   Application(Data)
0 active channels
0 of 500 max active calls ( 0.00% of capacity)
17415 calls processed
{noformat}

◎analysis
・As a result of investigating the problem areas to add logs to the srtp library and asterisk,
 errors occurred at the following processing route of srtp library.
 {code:title=srtp/crypto/cipher/aes_icm.c|borderStyle=solid}
  err_status_t
  aes_icm_encrypt_ismacryp(aes_icm_ctx_t *c,
              unsigned char *buf, unsigned int *enc_len,
              int forIsmacryp) {
  ・
  ・
   /* check that there's enough segment left but not for ismacryp*/
  if (!forIsmacryp && (bytes_to_encr + htons(c->counter.v16[7])) > 0xffff)
   return err_status_terminus;
 {code}
 As a result, it becomes NULL return by the following processing
 of sdp_crypto_setup within the function as asterisk,
 it had been an error output above.
 {code:title=certified-asterisk-11.2-cert1/channels/sip/sdp_crypto.c|borderStyle=solid}
  struct sdp_crypto *sdp_crypto_setup(void)
  {
  ・
  ・
    if (res_srtp->get_random(p->local_key, sizeof(p->local_key)) < 0) {
               sdp_crypto_destroy(p);
               return NULL;
       }
 {code}
I think the buffer area of the encryption key has been depleted.
But we do not know whether in the library or at a asterisk is the cause.
I am currently checking the buffer release opportunity and buffer release function.
Please share it if something information.

By: Hiroaki Komatsu (Hiroaki Komatsu) 2014-10-16 01:25:34.435-0500

We have applied the SRTP library of 1.50pre version. As a result, the problem no longer occurs.
https://github.com/cisco/libsrtp
Therefore, I think there was a bug in the SRTP library.

I think the following problems and had become the trigger probably
https://github.com/cisco/libsrtp/issues/5

please try.

By: Matt Jordan (mjordan) 2014-10-28 09:29:07.313-0500

[~Hiroaki Komatsu]: thank you *very* much for diagnosing and debugging this problem. Your analysis was excellent and it is greatly appreciated.

I've also upgraded a local instance with the 1.5.0 release of {{libsrtp}}. After fixing a minor compatibility issue reported by another user in ASTERISK-24436, I was able to test this issue and was unable to reproduce it with that version of {{libsrtp}}.

As such, I'm going to close out this issue as "Not a bug" in Asterisk, with the recommendation that anyone who does come across it upgrade to the latest version of {{libsrtp}}.

Thanks again for analysing the issue!