Summary: | ASTERISK-28132: res_pjsip_registrar: Asterisk crashing with large number of PJSIP registration | ||
Reporter: | Muhammad Yousuf (myousuf) | Labels: | fax pjsip |
Date Opened: | 2018-10-25 13:10:33 | Date Closed: | 2020-01-14 11:13:47.000-0600 |
Priority: | Major | Regression? | |
Status: | Closed/Complete | Components: | Resources/res_pjsip_registrar |
Versions: | 13.23.1 | Frequency of Occurrence | |
Related Issues: | |||
Environment: | CentOS Linux release 7.5.1804 (Core); Linux NASHASTERISK1 3.10.0-862.2.3.el7.x86_64 #1 SMP Wed May 9 18:05:47 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux; PJSIP version bundled 2.7.2 | Attachments: | ( 0) backtrace.txt |
Description: | Asterisk crashes frequently whenever a large number of PJSIP AORs try to register with Asterisk. | ||
Comments: | By: Asterisk Team (asteriskteam) 2018-10-25 13:10:34.848-0500
Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report. Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: George Joseph (gjoseph) 2018-10-25 14:43:11.661-0500
Can you give us some sample endpoint and aor configurations? In the community post, you said "85" contacts. Are they all trying to register at the same time? Were there any calls in progress? Can you provide logs leading up to the crash? What's the load like on the system and how much memory, CPU, etc are available?

By: Muhammad Yousuf (myousuf) 2018-10-26 00:07:05.963-0500
#######AOR###################
id: 1508-qs
contact: NULL
default_expiration: NULL
mailboxes:
max_contacts: 99
minimum_expiration: NULL
remove_existing: NULL
qualify_frequency: 30
authenticate_qualify: NULL
maximum_expiration: NULL
outbound_proxy: NULL
support_path: NULL
qualify_timeout: 30
voicemail_extension: NULL
#######ENDPOINT##############
tech_id: 9122
id: 1508-qs
te_id: 18
transport:
aors: 1508-qs
auth: 1508-qs
context: authenticated
disallow: all
allow: alaw:20;ulaw:20;gsm:20;h264:20
direct_media: no
connected_line_method: NULL
direct_media_method: NULL
direct_media_glare_mitigation: NULL
disable_direct_media_on_nat: NULL
dtmf_mode: auto
external_media_address: NULL
force_rport: yes
ice_support: NULL
identify_by: NULL
mailboxes:
moh_suggest: NULL
outbound_auth: NULL
outbound_proxy:
rewrite_contact: yes
rtp_ipv6: NULL
rtp_symmetric: yes
send_diversion: NULL
send_pai: NULL
send_rpid: NULL
timers_min_se: NULL
timers: NULL
timers_sess_expires: NULL
callerid: Muhammad Yousuf <1508>
callerid_privacy: NULL
callerid_tag: NULL
100rel: NULL
aggregate_mwi: NULL
trust_id_inbound: NULL
trust_id_outbound: NULL
use_ptime: NULL
use_avpf: NULL
media_encryption: sdes
inband_progress:
call_group: NULL
pickup_group: NULL
named_call_group: NULL
named_pickup_group: NULL
device_state_busy_at: NULL
fax_detect: NULL
t38_udptl: NULL
t38_udptl_ec: NULL
t38_udptl_maxdatagram: NULL
t38_udptl_nat: NULL
t38_udptl_ipv6: NULL
tone_zone: NULL
language:
one_touch_recording: NULL
record_on_feature: NULL
record_off_feature: NULL
rtp_engine: NULL
allow_transfer: NULL
allow_subscribe: yes
sdp_owner: NULL
sdp_session: NULL
tos_audio: NULL
tos_video: NULL
sub_min_expiry: NULL
from_domain: NULL
from_user: NULL
mwi_from_user: NULL
dtls_verify: NULL
dtls_rekey: NULL
dtls_cert_file: NULL
dtls_private_key: NULL
dtls_cipher: NULL
dtls_ca_file: NULL
dtls_ca_path: NULL
dtls_setup: NULL
srtp_tag_32: NULL
media_address: NULL
redirect_method: NULL
set_var: NULL
message_context: astsms
force_avp: NULL
media_use_received_transport: NULL
accountcode: qs
media_encryption_optimistic: NULL
user_eq_phone: NULL
rpid_immediate: NULL
g726_non_standard: NULL
rtp_keepalive: NULL
rtp_timeout: NULL
rtp_timeout_hold: NULL
bind_rtp_to_media_address: NULL
cos_audio: NULL
cos_video: NULL
deny: NULL
permit: NULL
acl: NULL
contact_deny: NULL
contact_permit: NULL
contact_acl: NULL
voicemail_extension: NULL
mwi_subscribe_replaces_unsolicited: NULL
subscribe_context: NULL
fax_detect_timeout: NULL
contact_user: NULL
asymmetric_rtp_codec: NULL
########### System Info INFO ####################
Manufacturer: Dell Inc.
Product Name: PowerEdge M620
40-core processor with 192GB RAM
System load never goes above 3 percent, as we currently don't have many clients on this server.
########## Calls INFO ###########################
Yes, there were calls, but not more than 10 to 15.

By: George Joseph (gjoseph) 2018-10-26 08:18:05.897-0500
You seem to have max_contacts set to 99 on the AOR. Are there multiple devices registering to this same AOR? If not, can you try setting max_contacts to 1 and also set remove_existing to yes? These would be the normal settings for a single phone registering as an extension. This isn't meant to be a solution, just a troubleshooting step. If the crashes go away, then we know where to look.

By: Muhammad Yousuf (myousuf) 2018-10-26 08:47:05.819-0500
Yes, we are using PJSIP for multiple devices, and it's a must-have requirement for us.

By: George Joseph (gjoseph) 2018-10-26 13:08:11.603-0500
So were the 80-odd 'aors' really 80-odd contacts on the same aor? At the time of the crash, how many contacts were registered for each aor?

By: Muhammad Yousuf (myousuf) 2018-10-26 14:57:47.833-0500
No, a maximum of 3 contacts against one aor, and that was not a frequent case either, so mostly two contacts against each aor.

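The troubleshooting step George Joseph suggests above (max_contacts set to 1, remove_existing set to yes) could be written as follows in a file-based pjsip.conf. This is only a sketch: the reporter's setup is realtime, so the same two values would go into the matching columns of the AOR row instead, and the section name simply reuses the AOR id from the posted dump.

```ini
; Sketch only: single-device AOR settings used as a troubleshooting step.
[1508-qs]
type=aor
max_contacts=1        ; down from 99 in the reported configuration
remove_existing=yes   ; a new registration replaces the existing contact
qualify_frequency=30  ; carried over from the reported AOR
```

As noted in the thread, this is meant to narrow down where the crash comes from, not to serve as a fix for deployments that need several contacts per AOR.
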
By: Muhammad Yousuf (muyousif) 2018-11-05 08:03:29.804-0600
Hi, please let us know if you need any other info, as we are still facing this issue very badly and I am sure it correlates with https://trac.pjsip.org/repos/ticket/2099
These are the errors:
[2018-11-05 08:39:51] ERROR[37159] tcptls.c: SSL_shutdown() failed: error:00000005:lib(0):func(0):DH lib, Underlying BIO error: Broken pipe
[2018-11-05 08:39:52] ERROR[33034] tcptls.c: SSL_shutdown() failed: error:00000005:lib(0):func(0):DH lib, Underlying BIO error: Broken pipe
[2018-11-05 08:51:52] VERBOSE[18890] res_pjsip_registrar.c: Removed contact 'sip:107@xx.xx.xx.xx:xx;transport=TLS;rinstance=7bd76dc3a779f1cc' from AOR '107' due to transport shutdown

By: Kevin Harwell (kharwell) 2018-11-16 16:32:59.506-0600
I have been unable to replicate the problem, but I've only been able to use config files so far. It looks like you are using a realtime configuration. As a test, is there any way you could move your configuration into just the _pjsip.conf_ file to see if the problem still occurs?

By: Muhammad Yousuf (muyousif) 2018-11-19 08:51:16.905-0600
With how many PJSIP endpoints did you test it? It may be fine with plain conf files, but our environment is entirely realtime, so we need to find the cause that way. Please let me know if you need any kind of info or setup to test it; we could provide you with a test environment if that is possible for you.

By: Joshua C. Colp (jcolp) 2018-11-27 11:23:23.752-0600
Can you provide a database dump that can be used to reproduce this problem as well as all the .conf files? I looked back and I also didn't see it mentioned how many registrations per second you commonly see, which would be needed to reproduce the problem.

By: Muhammad Yousuf (muyousif) 2018-11-27 12:06:07.035-0600
Could you confirm which tables you need a dump of? Aors, auth, contacts only? In the real scenario the issue was happening with more than 60 or 70 AORs, but in the test scenario I had to register more than 150, 200 and 300 SIP clients. I tested using the pjsua SIP client.

By: Joshua C. Colp (jcolp) 2018-11-27 12:23:36.300-0600
Everything needed to reproduce the problem. Anything we have to guess ourselves is another variable that can alter the result.

By: Asterisk Team (asteriskteam) 2018-12-12 12:00:01.079-0600
Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidelines [1].
[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines |
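
For the file-based test Kevin Harwell asks for in the thread, the non-NULL realtime values the reporter posted might map onto a pjsip.conf along these lines. This is a sketch under stated assumptions, not the reporter's actual file: the auth row never appears in the thread, so its username and password are placeholders; the codec list drops the :20 framing suffixes; and the TLS transport section is left out because its certificate paths are unknown.

```ini
; Hypothetical file-based equivalent of the realtime rows for 1508-qs.
[1508-qs]
type=endpoint
aors=1508-qs
auth=1508-qs
context=authenticated
disallow=all
allow=alaw,ulaw,gsm,h264
direct_media=no
dtmf_mode=auto
force_rport=yes
rewrite_contact=yes
rtp_symmetric=yes
media_encryption=sdes
allow_subscribe=yes
message_context=astsms
accountcode=qs
callerid=Muhammad Yousuf <1508>

[1508-qs]
type=aor
max_contacts=99
qualify_frequency=30
qualify_timeout=30

[1508-qs]
type=auth
auth_type=userpass
username=1508-qs      ; placeholder, not taken from the thread
password=CHANGE_ME    ; placeholder, not taken from the thread
```

Reusing one section name across the endpoint, aor and auth types mirrors the shared id in the realtime rows. Columns that were NULL or empty in the dump are omitted so that Asterisk falls back to its defaults.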