Summary: | ASTERISK-24291: res_srtp module stops working after about 35.000 processed calls | ||
Reporter: | Robert Hirschmann (roberth) | Labels: | |
Date Opened: | 2014-08-30 07:25:29 | Date Closed: | 2014-10-28 09:29:13 |
Priority: | Critical | Regression? | |
Status: | Closed/Complete | Components: | Channels/chan_sip/SRTP Resources/res_srtp |
Versions: | 1.8.23.1 11.11.0 11.12.0 | Frequency of Occurrence | |
Related Issues: | |||
Environment: | Ubuntu 12.04.5 LTS (GNU/Linux 3.13.0-34-generic x86_64) running on HP DL360 G6/7, latest libsrtp0 version 1.4.4+20100615~dfsg-1build, SIP only environment | Attachments: | ( 0) issue_24291_full_log.14.txt |
Description: | When using encryption for RTP streams, asterisk does not accept any calls after about 35k calls (reproducable) have been processed correctly.
All further inbound and outbound calls are rejected with a "488 - Not Acceptable Here". When this happens, one asterisk machine shows: {noformat} [2014-08-29 17:32:23.807] DEBUG[28500][C-00009387]: chan_sip.c:10530 process_sdp: Processing media-level (audio) SDP a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:cYswzW2zYpdgsVkKgQWvdbUSLedzlE8nByMqEYiI... UNSUPPORTED OR FAILED. [2014-08-29 17:32:23.807] WARNING[28500][C-00009387]: chan_sip.c:10535 process_sdp: Rejecting secure audio stream without encryption details: audio 11070 RTP/SAVP 8 0 101 {noformat} the destination asterisk shows: {noformat} WARNING[10222][C-0000883a]: chan_sip.c:12925 get_crypto_attrib: No SRTP key management enabled {noformat} Active srtp calls are not affected when this issue occurs, but all further Invites are rejected with the 488 response, so no more calls can be processed. The only solution at the moment is to restart asterisk or to wait until no more SRTP calls are active and then unload res_srtp.so followed by loading the module again. add info: - problem occurs regardless of using SIP over TLS or SIP without TLS - other (unencrypted) RTP connections are still working If you need further info, just let me know. Thanks for checking into this! Robert | ||
Comments: | By: Matt Jordan (mjordan) 2014-08-30 14:34:51.094-0500 We require a complete debug log to help triage the issue. This document will provide instructions on how to collect debugging logs from an Asterisk machine for the purpose of helping bug marshals troubleshoot an issue: https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information When the issue occurs, please make sure you have both 'sip set debug on' and 'rtp set debug on' enabled. By: Robert Hirschmann (roberth) 2014-08-31 11:17:17.072-0500 Hello Matt, thank you for your quick feedback. Attached you will find the trace you have requested. The trace starts around 60-70 calls before. Until call [C-00008824] everything works fine. At call [C-00008825] the problem starts and SRTP no longer works until the res_srtp.so module has been reloaded. After reloading the module the issue occurs again after about 35k SRTP calls (to be exactly: 34852). These are the important trace lines: last working call: {noformat} [2014-08-31 17:38:28.656] DEBUG[12804][C-00008824] sip/sdp_crypto.c: local_key64 7zftbmFk8DEQVnePF8a2tbyTW8wC3TgkH1CK4cz+ len 40 [2014-08-31 17:38:28.656] DEBUG[12804][C-00008824] sip/sdp_crypto.c: Crypto line: a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:7zftbmFk8DEQVnePF8a2tbyTW8wC3TgkH1CK4cz+ {noformat} first call where the issue occurs and the crypto line is no longer added to the sdp: {noformat} [2014-08-31 17:38:28.974] WARNING[12805][C-00008825] chan_sip.c: No SRTP key management enabled {noformat} Do you need further info/traces to find the cause for this? Thanks Robert By: Robert Hirschmann (roberth) 2014-09-07 23:59:03.543-0500 trace has been uploaded. do you need further details? By: Rusty Newton (rnewton) 2014-09-11 13:15:13.466-0500 Your trace didn't have RTP debug enabled, however doing so with your call volume could be massive (depending on your concurrent call state). If you can get that and a PCAP showing those last few calls including the first failure, that would be excellent. This is a strange issue. Oh, additionally * configuration for the extensions involved. That is, sip.conf. P.S. If your scenario is really a load test and not a production system, could you provide us all the configuration you are using with instructions to perform your load test so that we could reproduce it? By: Robert Hirschmann (roberth) 2014-09-11 16:59:08.534-0500 Thanks for your feedback Rusty. This is a production system with heavy load, but I can switch main traffic to another prod machine tomorrow to reproduce the issue with the loadtest-script. As I know that it happens every 34-35k calls I can enable rtp debug some few hundred calls before the issue occurs. The test scenario simulates nearly the same behavior which we have in production (busy calls, answered calls with different call lengths, cancelled calls an so on). For load testing and to reproduce the issue I use 2 additional Asterisk machines that behave like the clients behind the Asterisk Routing Servers: Server 1: Dial Out Asterisk Test Server (calls Asterisk Server 2 with unencrypted RTP traffic) Server 2+3: Asterisk Production Servers. They route traffic between Server 2+3 over WAN with SIP-TLS and SRTP. The issue happens on both servers. Server 4: Asterisk Test call receiver (gets calls from Server 3 unencrypted) Asterisk 1+2 and 3+4 are on separate physical networks. Link between server 2+3 is a dedicated symmetric 100MBit WAN Link. Servers 2 has two network cards with different vlans for internal/external traffic. Server 3 has one network card with different vlans for internal/external traffic. sip.conf on Server 2 as well as Server 3 is quite simple: {noformat} ;--- start sip.conf [general] context=default match_auth_username=yes allowoverlap=yes udpbindaddr=0.0.0.0 nat=auto_force_rport,auto_comedia tcpenable=no tcpbindaddr=0.0.0.0 tlsenable=yes tlsbindaddr=0.0.0.0 tlscertfile=/etc/asterisk/keys/asterisk.pem tlscafile=/etc/asterisk/keys/ca.crt tlscipher=ALL tlsclientmethod=tlsv1 tlsdontverifyserver=yes transport=udp srvlookup=yes tos_sip=cs3 tos_audio=ef tos_video=af41 tos_text=af41 cos_sip=3 cos_audio=5 cos_video=4 cos_text=3 useragent=Asterisk PBX legacy_useroption_parsing=yes use_q850_reason = yes session-timers=refuse sdpsession=Asterisk PBX [authentication] [tpl_ivr](!) context=trunk_ivrs type=peer dtmfmode=rfc2833 qualify=no port=5060 usereqphone=yes callcounter=yes disallow=all allow=alaw allow=ulaw sendrpid = no trustrpid = no disallowed_methods = UPDATE accountcode=ivr_not_set directmedia=no deny=0.0.0.0/0.0.0.0 permit=10.100.1.0/27 [ivr3vlan54](tpl_ivr) host=10.100.1.23 ; internal dial out IVR on vlan54 [bccc] context=trunk_bccc secret=none ; not used. we have insecure=invite with an acl type=friend dtmfmode=info host=X.X.X.X ; WAN IP of Asterisk 3 Server directmedia=no nat=no qualify=yes disallow=all allow=alaw allow=ulaw accountcode=bccc callcounter=yes deny=0.0.0.0/0.0.0.0 permit=x.x.x.x/32 ; WAN IP of Asterisk 3 Server transport=tls encryption=yes ; CAUTION! when enabled for SRTP, jira issue 24291 occurs progressinband=yes tonezone=de insecure=invite ;--- end of sip.conf {noformat} I will reply tomorrow with traces when I have finished load tests with rtp debugging enabled. In addition, I will send you the involved extensions.conf parts to reproduce it. Thank you, Robert By: Robert Hirschmann (roberth) 2014-09-13 12:19:37.123-0500 Hello Rusty, I just reproduced the issue. Everyime it happens exactly between srtp call 34852 and 34853. I have set "rtp set debug on" as well, but the logfile does not contain the rtp debug output. Is there any additional keyword in logger.conf I need to enable? Currently I use: {noformat} issue_24291_full_log => notice,warning,error,debug,verbose,dtmf {noformat} The other debug output in the file is the same as in the last trace I've sent. The first thing I see is the "no srtp key management" message but nothing else: {noformat} [2014-09-13 19:00:03.425] DEBUG[29254][C-00008825] chan_sip.c: ** Our capability: (ulaw|alaw) Video flag: False Text flag: False [2014-09-13 19:00:03.425] DEBUG[29254][C-00008825] chan_sip.c: ** Our prefcodec: (alaw) [2014-09-13 19:00:03.425] VERBOSE[29254][C-00008825] chan_sip.c: [2014-09-13 19:00:03.425] Audio is at 18496 [2014-09-13 19:00:03.425] WARNING[29254][C-00008825] chan_sip.c: No SRTP key management enabled [2014-09-13 19:00:03.425] VERBOSE[29254][C-00008825] chan_sip.c: [2014-09-13 19:00:03.425] Adding codec 100004 (alaw) to SDP [2014-09-13 19:00:03.425] VERBOSE[29254][C-00008825] chan_sip.c: [2014-09-13 19:00:03.425] Adding codec 100003 (ulaw) to SDP [2014-09-13 19:00:03.425] DEBUG[29254][C-00008825] chan_sip.c: -- Done with adding codecs to SDP [2014-09-13 19:00:03.425] DEBUG[29254][C-00008825] chan_sip.c: Done building SDP. Settling with this capability: (ulaw|alaw) {noformat} Do I need to increase the debug level to something higher than 5? I could also trace RTP with tcpdump, but it might be useless because it's encrypted... Thanks for your feedback, Robert By: Dwayne Hubbard (dwayne) 2014-09-25 13:44:35.827-0500 I don't see how to add to the Affected Version list but I'm seeing this issue on 1.8.23.1. Trying to reproduce the issue now. By: Rusty Newton (rnewton) 2014-09-25 14:20:21.013-0500 [~roberth] {quote} I have set "rtp set debug on" as well, but the logfile does not contain the rtp debug output. Is there any additional keyword in logger.conf I need to enable? {quote} After you have turned the debug on, you may have to run "logger reload". Or perhaps nothing shows because it is SRTP. I haven't looked at the output of "rtp set debug on" with SRTP. [~dwayne] {quote} I don't see how to add to the Affected Version list but I'm seeing this issue on 1.8.23.1. Trying to reproduce the issue now. {quote} I added the affected version for you. By: Hiroaki Komatsu (Hiroaki Komatsu) 2014-10-15 20:58:34.074-0500 Similar problems in my environment has occurred. ◎Environment ・OS centOS 6.5 64bit ・asterisk version certified-asterisk-11.2-cert1 ・srtp library version srtp-1.4.2 ◎How to reproduce ・our uac connects to asterisk using TLS/SRTP. Asterisk can process this offer at first, but when asterisk asterisk processed over 17,000 calls(we checked by 'core show channels'), asterisk responds with the return code 488, and call invoked by the uac failed. but other (unencrypted) RTP connections are still working. When this happens, one asterisk machine shows: {noformat} [2014-10-09 01:51:11] WARNING[19056][C-00004422] chan_sip.c: No SRTP key management enabled [2014-10-09 01:51:11] WARNING[1533][C-00004423] chan_sip.c: Can't provide secure audio requested in SDP offer {noformat} {noformat} CLI>core show channels Channel Location State Application(Data) 0 active channels 0 of 500 max active calls ( 0.00% of capacity) 17415 calls processed {noformat} ◎analysis ・As a result of investigating the problem areas to add logs to the srtp library and asterisk, errors occurred at the following processing route of srtp library. {code:title=srtp/crypto/cipher/aes_icm.c|borderStyle=solid} err_status_t aes_icm_encrypt_ismacryp(aes_icm_ctx_t *c, unsigned char *buf, unsigned int *enc_len, int forIsmacryp) { ・ ・ /* check that there's enough segment left but not for ismacryp*/ if (!forIsmacryp && (bytes_to_encr + htons(c->counter.v16[7])) > 0xffff) return err_status_terminus; {code} As a result, it becomes NULL return by the following processing of sdp_crypto_setup within the function as asterisk, it had been an error output above. {code:title=certified-asterisk-11.2-cert1/channels/sip/sdp_crypto.c|borderStyle=solid} struct sdp_crypto *sdp_crypto_setup(void) { ・ ・ if (res_srtp->get_random(p->local_key, sizeof(p->local_key)) < 0) { sdp_crypto_destroy(p); return NULL; } {code} I think the buffer area of the encryption key has been depleted. But we do not know whether in the library or at a asterisk is the cause. I am currently checking the buffer release opportunity and buffer release function. Please share it if something information. By: Hiroaki Komatsu (Hiroaki Komatsu) 2014-10-16 01:25:34.435-0500 We have applied the SRTP library of 1.50pre version. As a result, the problem no longer occurs. https://github.com/cisco/libsrtp Therefore, I think there was a bug in the SRTP library. I think the following problems and had become the trigger probably https://github.com/cisco/libsrtp/issues/5 please try. By: Matt Jordan (mjordan) 2014-10-28 09:29:07.313-0500 [~Hiroaki Komatsu]: thank you *very* much for diagnosing and debugging this problem. Your analysis was excellent and it is greatly appreciated. I've also upgraded a local instance with the 1.5.0 release of {{libsrtp}}. After fixing a minor compatibility issue reported by another user in ASTERISK-24436, I was able to test this issue and was unable to reproduce it with that version of {{libsrtp}}. As such, I'm going to close out this issue as "Not a bug" in Asterisk, with the recommendation that anyone who does come across it upgrade to the latest version of {{libsrtp}}. Thanks again for analysing the issue! |