Summary: ASTERISK-28132: res_pjsip_registrar: Asterisk crashing with large number of PJSIP registration
Reporter: Muhammad Yousuf (myousuf)
Labels: fax pjsip
Date Opened: 2018-10-25 13:10:33
Date Closed: 2020-01-14 11:13:47.000-0600
Priority: Major
Regression?:
Status: Closed/Complete
Components: Resources/res_pjsip_registrar
Versions: 13.23.1
Frequency of Occurrence:
Related Issues:
Environment: CentOS Linux release 7.5.1804 (Core) Linux NASHASTERISK1 3.10.0-862.2.3.el7.x86_64 #1 SMP Wed May 9 18:05:47 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux PJSIP version bundled 2.7.2
Attachments: (0) backtrace.txt
Description: Asterisk crashes frequently whenever a large number of PJSIP AORs try to register with it.
Comments:
By: Asterisk Team (asteriskteam) 2018-10-25 13:10:34.848-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: George Joseph (gjoseph) 2018-10-25 14:43:11.661-0500

Can you give us some sample endpoint and aor configurations?
In the community post, you said "85" contacts.  Are they all trying to register at the same time?
Were there any calls in progress?
Can you provide logs leading up to the crash?
What's the load like on the system and how much memory, CPU, etc are available?


By: Muhammad Yousuf (myousuf) 2018-10-26 00:07:05.963-0500

#######AOR###################

                 id: 1508-qs
            contact: NULL
 default_expiration: NULL
          mailboxes:
       max_contacts: 99
 minimum_expiration: NULL
    remove_existing: NULL
  qualify_frequency: 30
authenticate_qualify: NULL
 maximum_expiration: NULL
     outbound_proxy: NULL
       support_path: NULL
    qualify_timeout: 30
voicemail_extension: NULL

#######ENDPOINT##############

                         tech_id: 9122
                               id: 1508-qs
                            te_id: 18
                        transport:
                             aors: 1508-qs
                             auth: 1508-qs
                          context: authenticated
                         disallow: all
                            allow: alaw:20;ulaw:20;gsm:20;h264:20
                     direct_media: no
            connected_line_method: NULL
              direct_media_method: NULL
    direct_media_glare_mitigation: NULL
      disable_direct_media_on_nat: NULL
                        dtmf_mode: auto
           external_media_address: NULL
                      force_rport: yes
                      ice_support: NULL
                      identify_by: NULL
                        mailboxes:
                      moh_suggest: NULL
                    outbound_auth: NULL
                   outbound_proxy:
                  rewrite_contact: yes
                         rtp_ipv6: NULL
                    rtp_symmetric: yes
                   send_diversion: NULL
                         send_pai: NULL
                        send_rpid: NULL
                    timers_min_se: NULL
                           timers: NULL
              timers_sess_expires: NULL
                         callerid: Muhammad Yousuf <1508>
                 callerid_privacy: NULL
                     callerid_tag: NULL
                           100rel: NULL
                    aggregate_mwi: NULL
                 trust_id_inbound: NULL
                trust_id_outbound: NULL
                        use_ptime: NULL
                         use_avpf: NULL
                 media_encryption: sdes
                  inband_progress:
                       call_group: NULL
                     pickup_group: NULL
                 named_call_group: NULL
               named_pickup_group: NULL
             device_state_busy_at: NULL
                       fax_detect: NULL
                        t38_udptl: NULL
                     t38_udptl_ec: NULL
            t38_udptl_maxdatagram: NULL
                    t38_udptl_nat: NULL
                   t38_udptl_ipv6: NULL
                        tone_zone: NULL
                         language:
              one_touch_recording: NULL
                record_on_feature: NULL
               record_off_feature: NULL
                       rtp_engine: NULL
                   allow_transfer: NULL
                  allow_subscribe: yes
                        sdp_owner: NULL
                      sdp_session: NULL
                        tos_audio: NULL
                        tos_video: NULL
                   sub_min_expiry: NULL
                      from_domain: NULL
                        from_user: NULL
                    mwi_from_user: NULL
                      dtls_verify: NULL
                       dtls_rekey: NULL
                   dtls_cert_file: NULL
                 dtls_private_key: NULL
                      dtls_cipher: NULL
                     dtls_ca_file: NULL
                     dtls_ca_path: NULL
                       dtls_setup: NULL
                      srtp_tag_32: NULL
                    media_address: NULL
                  redirect_method: NULL
                          set_var: NULL
                  message_context: astsms
                        force_avp: NULL
     media_use_received_transport: NULL
                      accountcode: qs
      media_encryption_optimistic: NULL
                    user_eq_phone: NULL
                   rpid_immediate: NULL
                g726_non_standard: NULL
                    rtp_keepalive: NULL
                      rtp_timeout: NULL
                 rtp_timeout_hold: NULL
        bind_rtp_to_media_address: NULL
                        cos_audio: NULL
                        cos_video: NULL
                             deny: NULL
                           permit: NULL
                              acl: NULL
                     contact_deny: NULL
                   contact_permit: NULL
                      contact_acl: NULL
              voicemail_extension: NULL
mwi_subscribe_replaces_unsolicited: NULL
                subscribe_context: NULL
               fax_detect_timeout: NULL
                     contact_user: NULL
             asymmetric_rtp_codec: NULL

########### System Info INFO ####################

       Manufacturer: Dell Inc.
       Product Name: PowerEdge M620

40-core processor with 192 GB RAM

System load never goes above 3 percent, as we currently don't have many clients on this server.

########## Calls INFO ###########################

Yes, there were calls, but no more than 10 to 15.

By: George Joseph (gjoseph) 2018-10-26 08:18:05.897-0500

You seem to have max_contacts set to 99 on the AOR.  Are there multiple devices registering to this same AOR?  If not, can you try setting max_contacts to 1 and also set remove_existing to yes?  These would be the normal settings for a single phone registering as an extension.

This isn't meant to be a solution, just a troubleshooting step.  If the crashes go away, then we know where to look.
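
Expressed as a pjsip.conf AOR section, the troubleshooting settings suggested above would look roughly like the sketch below. The section name and the qualify values are taken from the AOR dump earlier in this thread. In a realtime setup, which is an assumption about how this reporter would apply it, the same two values would be set as columns on the AOR row rather than in a file.

```
; Troubleshooting sketch only, not a recommended production setting.
; One contact per AOR; a new registration replaces the existing one.
[1508-qs]
type=aor
max_contacts=1
remove_existing=yes
; Carried over unchanged from the posted AOR row.
qualify_frequency=30
qualify_timeout=30
```

If the crashes stop with these values, that points at the handling of multiple contacts per AOR rather than at registration volume alone.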


By: Muhammad Yousuf (myousuf) 2018-10-26 08:47:05.819-0500

Yes, we use PJSIP for multiple devices, and it is a hard requirement for us.

By: George Joseph (gjoseph) 2018-10-26 13:08:11.603-0500

So were the 80-odd "aors" really 80-odd contacts on the same aor?  At the time of the crash, how many contacts were registered for each aor?

By: Muhammad Yousuf (myousuf) 2018-10-26 14:57:47.833-0500

No, a maximum of 3 contacts per AOR, and even that was infrequent; mostly there were two contacts per AOR.

By: Muhammad Yousuf (muyousif) 2018-11-05 08:03:29.804-0600

Hi,

Please let us know if you need any other information, as we are still badly affected by this issue, and I am sure it correlates with

https://trac.pjsip.org/repos/ticket/2099

These are the errors.
[2018-11-05 08:39:51] ERROR[37159] tcptls.c: SSL_shutdown() failed: error:00000005:lib(0):func(0):DH lib, Underlying BIO error: Broken pipe
[2018-11-05 08:39:52] ERROR[33034] tcptls.c: SSL_shutdown() failed: error:00000005:lib(0):func(0):DH lib, Underlying BIO error: Broken pipe
[2018-11-05 08:51:52] VERBOSE[18890] res_pjsip_registrar.c: Removed contact 'sip:107@xx.xx.xx.xx:xx;transport=TLS;rinstance=7bd76dc3a779f1cc' from AOR '107' due to transport shutdown

By: Kevin Harwell (kharwell) 2018-11-16 16:32:59.506-0600

I have been unable to replicate the problem, but I've only been able to use config files so far.

It looks like you are using a realtime configuration. As a test, is there any way you could move your configuration into just the _pjsip.conf_ file to see if the problem still occurs?
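
For anyone attempting this test, the realtime rows posted earlier in the thread might translate into pjsip.conf roughly as follows. This is a sketch under several assumptions: only columns with non-NULL, non-empty values in the dump are carried over; the tech_id and te_id columns are left out because they look specific to the reporter's schema; no auth row was posted, so the auth section's username and password are placeholders; and the ":20" suffixes in the realtime allow column are dropped, with the codecs listed comma-separated because ";" starts a comment in .conf files.

```
; Static sketch of the posted realtime AOR and endpoint rows.
[1508-qs]
type=aor
max_contacts=99
qualify_frequency=30
qualify_timeout=30

; No auth row appears in the report; these values are placeholders.
[1508-qs]
type=auth
auth_type=userpass
username=1508-qs
password=CHANGE_ME

[1508-qs]
type=endpoint
aors=1508-qs
auth=1508-qs
context=authenticated
disallow=all
allow=alaw,ulaw,gsm,h264
direct_media=no
dtmf_mode=auto
force_rport=yes
rewrite_contact=yes
rtp_symmetric=yes
callerid=Muhammad Yousuf <1508>
media_encryption=sdes
allow_subscribe=yes
message_context=astsms
accountcode=qs
```

The transport section is not shown because no transport configuration was posted in this report.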

By: Muhammad Yousuf (muyousif) 2018-11-19 08:51:16.905-0600

How many PJSIP endpoints did you test with? It may work with plain conf files, but our environment is entirely realtime, so we need to investigate it that way. Please let me know if you need any information or setup to test it; we could provide you with a test environment if that works for you.

By: Joshua C. Colp (jcolp) 2018-11-27 11:23:23.752-0600

Can you provide a database dump that can be used to reproduce this problem as well as all the .conf files? I looked back and I also didn't see it mentioned how many registrations per second you commonly see, which would be needed to reproduce the problem.

By: Muhammad Yousuf (muyousif) 2018-11-27 12:06:07.035-0600

Could you confirm which tables you need a dump of? AORs, auths, and contacts only? In the real scenario the issue happened with more than 60 or 70 AORs, but in the test scenario I had to register more than 150, 200, and 300 SIP clients. I tested using the pjsua SIP client.



By: Joshua C. Colp (jcolp) 2018-11-27 12:23:36.300-0600

Everything needed to reproduce the problem. Anything we have to guess ourselves is another variable that can alter the result.

By: Asterisk Team (asteriskteam) 2018-12-12 12:00:01.079-0600

Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidelines [1].

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines