[Home]

Summary:ASTERISK-28997: res_pjsip: Asterisk locks up and stops processing any SIP requests
Reporter:Yury Kirsanov (lt_flash)Labels:fax
Date Opened:2020-07-21 03:36:34Date Closed:2020-07-31 05:41:45
Priority:MinorRegression?
Status:Closed/CompleteComponents:Resources/res_pjsip Resources/res_sorcery_memory_cache
Versions:16.11.1 Frequency of
Occurrence
Frequent
Related
Issues:
Environment:Ubuntu Linux 18.04.4 LTS, kernel 4.15.0-111-genericAttachments:( 0) backtrace-threads.txt
( 1) taskprocessors.txt
( 2) threads.txt
Description:Hi,
We have a problem with one of many of our Asterisk-based servers. From time to time it just locks up and stops processing SIP messages, complaining about pjsip/distributor queue has reached 500 scheduled tasks. We have several servers based on same Asterisk version with exactly the same environment, but different Asterisk configuration files, some have more SIP devices and more complex dial plans, some has less.

This issue happens at any time of the day and I don't see any way to reproduce it. There is no segfault generated as Asterisk keeps running, the only thing is that all distributor queues are overflown for some reason.

After Asterisk restart it starts to work like nothing happened. I've checked all network interfaces and there is no DDoS attack or anything like that.

Our Asterisk is patched with ASTERISK-28923 patch and it's the same server that was segfaulting before. Other similar servers are running unpatched version of Asterisk 16.11.1.

I'm attaching debug logs that I was able to grab during this lockup. Asterisk was compiled with 'NO_DEBUG' option.

Please let me know if I can provide any additional information or recompile Asterisk with some additional debug versions. Thanks!
Comments:By: Asterisk Team (asteriskteam) 2020-07-21 03:36:35.751-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

By: Yury Kirsanov (lt_flash) 2020-07-21 03:38:08.519-0500

This is GDB output as per debug documentation, command used:

gdb -ex "thread apply all bt" --batch /opt/asterisk/sbin/asterisk `pidof asterisk` >/tmp/backtrace-threads.txt

By: Yury Kirsanov (lt_flash) 2020-07-21 03:38:41.517-0500

'asterisk -rx "core show taskprocessors"' output

By: Yury Kirsanov (lt_flash) 2020-07-21 03:39:16.591-0500

'asterisk -rx "core show threads"' output

By: Joshua C. Colp (jcolp) 2020-07-21 10:36:41.221-0500

What is the configuration for this and the deployment? I see a sorcery memory cache in use? Is realtime in use? How many auths/endpoints/etc?

By: Yury Kirsanov (lt_flash) 2020-07-21 18:25:45.831-0500

Hi Joshua,
This server has a complex call-flow setup with a lot of dialplans and endpoints, we have 2195 endpoints created currently though not all of them are in use. Asterisk was compiled with all modules except for all channels rather than pjsip. It uses both UDP, TCP and TLS. Realtime is not used and not loaded. It uses FAX in T.38 and V30 modes via res_fax_spandsp.  Server is using Asterisk DB to store a lot of parameters.

I'm not sure why sorcery memory cache is in use, probably because it's compiled in by default?

Here's a list of loaded modules:

asterisk -rx "module show"
Module                         Description                              Use Count  Status      Support Level
acl                            Named ACL system                         2          Running              core
app_authenticate.so            Authentication Application               0          Running              core
app_bridgewait.so              Place the channel into a holding bridge  0          Running              core
app_cdr.so                     Tell Asterisk to not maintain a CDR for  0          Running              core
app_confbridge.so              Conference Bridge Application            0          Running              core
app_db.so                      Database Access Functions                0          Running              core
app_dial.so                    Dialing Application                      70         Running              core
app_directed_pickup.so         Directed Call Pickup Application         0          Running              core
app_disa.so                    DISA (Direct Inward System Access) Appli 0          Running              core
app_echo.so                    Simple Echo Application                  0          Running              core
app_exec.so                    Executes dialplan applications           0          Running              core
app_macro.so                   Extension Macros                         0          Running        deprecated
app_mixmonitor.so              Mixed Audio Monitoring Application       26         Running              core
app_playback.so                Sound File Playback Application          2          Running              core
app_queue.so                   True Call Queueing                       7          Running              core
app_read.so                    Read Variable Application                1          Running              core
app_readexten.so               Read and evaluate extension validity     0          Running              core
app_record.so                  Trivial Record Application               0          Running              core
app_softhangup.so              Hangs up the requested channel           0          Running              core
app_stack.so                   Dialplan subroutines (Gosub, Return, etc 109        Running              core
app_system.so                  Generic System() application             0          Running              core
app_userevent.so               Custom User Event Application            0          Running              core
app_verbose.so                 Send verbose output                      0          Running              core
app_voicemail.so               Comedian Mail (Voicemail System)         0          Running              core
app_while.so                   While Loops and Conditional Execution    0          Running              core
bridge_builtin_features.so     Built in bridging features               1          Running              core
bridge_builtin_interval_features.so Built in bridging interval features      0          Running              core
bridge_holding.so              Holding bridge module                    5          Running              core
bridge_native_rtp.so           Native RTP bridging module               16         Running              core
bridge_simple.so               Simple two channel bridging module       45         Running              core
bridge_softmix.so              Multi-party software based channel mixin 0          Running              core
ccss                           Call Completion Supplementary Services   2          Running              core
cdr                            CDR Engine                               2          Running              core
cel                            CEL Engine                               1          Running              core
chan_pjsip.so                  PJSIP Channel Driver                     910        Running              core
codec_a_mu.so                  A-law and Mulaw direct Coder/Decoder     41         Running              core
codec_adpcm.so                 Adaptive Differential PCM Coder/Decoder  0          Running              core
codec_alaw.so                  A-law Coder/Decoder                      71         Running              core
codec_g722.so                  ITU G.722-64kbps G722 Transcoder         39         Running              core
codec_g723.so                  g723 Coder/Decoder, based on Intel IPP   0          Running           unknown
codec_g726.so                  ITU G.726-32kbps G726 Transcoder         0          Running              core
codec_g729.so                  g729 Coder/Decoder, based on Intel IPP   7          Running           unknown
codec_gsm.so                   GSM Coder/Decoder                        0          Running              core
codec_ilbc.so                  iLBC Coder/Decoder                       0          Running              core
codec_lpc10.so                 LPC10 2.4kbps Coder/Decoder              0          Running              core
codec_resample.so              SLIN Resampling Codec                    52         Running              core
codec_ulaw.so                  mu-Law Coder/Decoder                     2          Running              core
dnsmgr                         DNS Manager                              2          Running              core
dsp                            DSP                                      1          Running              core
enum                           ENUM Support                             1          Running              core
extconfig                      Configuration                            14         Running              core
features                       Call Features                            123        Running              core
format_g719.so                 ITU G.719                                0          Running              core
format_g723.so                 G.723.1 Simple Timestamp File Format     0          Running              core
format_g726.so                 Raw G.726 (16/24/32/40kbps) data         0          Running              core
format_g729.so                 Raw G.729 data                           0          Running              core
format_gsm.so                  Raw GSM data                             0          Running              core
format_h263.so                 Raw H.263 data                           0          Running              core
format_h264.so                 Raw H.264 data                           0          Running              core
format_ilbc.so                 Raw iLBC data                            0          Running              core
format_mp3.so                  MP3 format [Any rate but 8000hz mono is  0          Running          extended
format_pcm.so                  Raw/Sun uLaw/ALaw 8KHz (PCM,PCMA,AU), G. 2          Running              core
format_siren14.so              ITU G.722.1 Annex C (Siren14, licensed f 0          Running              core
format_siren7.so               ITU G.722.1 (Siren7, licensed from Polyc 0          Running              core
format_sln.so                  Raw Signed Linear Audio support (SLN) 8k 0          Running              core
format_vox.so                  Dialogic VOX (ADPCM) File Format         0          Running          extended
format_wav.so                  Microsoft WAV/WAV16 format (8kHz/16kHz S 17         Running              core
format_wav_gsm.so              Microsoft WAV format (Proprietary GSM)   0          Running              core
func_callerid.so               Party ID related dialplan functions (Cal 0          Running              core
func_cdr.so                    Call Detail Record (CDR) dialplan functi 0          Running              core
func_channel.so                Channel information dialplan functions   0          Running              core
func_config.so                 Asterisk configuration file variable acc 0          Running              core
func_curl.so                   Load external URL                        1          Running              core
func_cut.so                    Cut out information from a string        0          Running              core
func_db.so                     Database (astdb) related dialplan functi 0          Running              core
func_devstate.so               Gets or sets a device state in the dialp 0          Running              core
func_env.so                    Environment/filesystem dialplan function 0          Running              core
func_global.so                 Variable dialplan functions              0          Running              core
func_groupcount.so             Channel group dialplan functions         0          Running              core
func_hangupcause.so            HANGUPCAUSE related functions and applic 0          Running              core
func_logic.so                  Logical dialplan functions               0          Running              core
func_math.so                   Mathematical dialplan function           0          Running              core
func_periodic_hook.so          Periodic dialplan hooks.                 2          Running              core
func_strings.so                String handling dialplan functions       0          Running              core
func_timeout.so                Channel timeout dialplan functions       0          Running              core
http                           Built-in HTTP Server                     2          Running              core
indications                    Indication Tone Handling                 1          Running              core
logger                         Logger                                   1          Running              core
manager                        Asterisk Manager Interface               1          Running              core
pbx_config.so                  Text Extension Configuration             0          Running              core
pbx_loopback.so                Loopback Switch                          0          Running              core
pbx_spool.so                   Outgoing Spool Support                   0          Running              core
plc                            PLC                                      1          Running              core
res_adsi.so                    ADSI Resource                            1          Running        deprecated
res_agi.so                     Asterisk Gateway Interface (AGI)         1          Running              core
res_clialiases.so              CLI Aliases                              0          Running              core
res_clioriginate.so            Call origination and redirection from th 0          Running              core
res_config_curl.so             Realtime Curl configuration              0          Running              core
res_convert.so                 File format conversion CLI command       0          Running              core
res_crypto.so                  Cryptographic Digital Signatures         0          Running              core
res_curl.so                    cURL Resource Module                     2          Running              core
res_fax.so                     Generic FAX Applications                 26         Running              core
res_fax_spandsp.so             Spandsp G.711 and T.38 FAX Technologies  25         Running          extended
res_format_attr_celt.so        CELT Format Attribute Module             1          Running              core
res_format_attr_g729.so        G.729 Format Attribute Module            1          Running              core
res_format_attr_h263.so        H.263 Format Attribute Module            1          Running              core
res_format_attr_h264.so        H.264 Format Attribute Module            1          Running              core
res_format_attr_ilbc.so        iLBC Format Attribute Module             1          Running              core
res_format_attr_opus.so        Opus Format Attribute Module             1          Running              core
res_format_attr_silk.so        SILK Format Attribute Module             1          Running              core
res_format_attr_siren14.so     Siren14 Format Attribute Module          1          Running              core
res_format_attr_siren7.so      Siren7 Format Attribute Module           1          Running              core
res_format_attr_vp8.so         VP8 Format Attribute Module              1          Running              core
res_limit.so                   Resource limits                          0          Running              core
res_monitor.so                 Call Monitoring Resource                 2          Running        deprecated
res_musiconhold.so             Music On Hold Resource                   13         Running              core
res_parking.so                 Call Parking Resource                    3          Running              core
res_pjproject.so               PJPROJECT Log and Utility Support        4          Running              core
res_pjsip.so                   Basic SIP resource                       211        Running              core
res_pjsip_authenticator_digest.so PJSIP authentication resource            0          Running              core
res_pjsip_caller_id.so         PJSIP Caller ID Support                  183        Running              core
res_pjsip_dialog_info_body_generator.so PJSIP Extension State Dialog Info+XML Pr 0          Running              core
res_pjsip_diversion.so         PJSIP Add Diversion Header Support       183        Running              core
res_pjsip_dlg_options.so       SIP OPTIONS in dialog handler            182        Running              core
res_pjsip_dtmf_info.so         PJSIP DTMF INFO Support                  182        Running              core
res_pjsip_empty_info.so        PJSIP Empty INFO Support                 182        Running              core
res_pjsip_endpoint_identifier_anonymous.so PJSIP Anonymous endpoint identifier      0          Running              core
res_pjsip_endpoint_identifier_ip.so PJSIP IP endpoint identifier             0          Running              core
res_pjsip_endpoint_identifier_user.so PJSIP username endpoint identifier       0          Running              core
res_pjsip_exten_state.so       PJSIP Extension State Notifications      0          Running              core
res_pjsip_header_funcs.so      PJSIP Header Functions                   182        Running              core
res_pjsip_logger.so            PJSIP Packet Logger                      0          Running              core
res_pjsip_messaging.so         PJSIP Messaging Support                  182        Running              core
res_pjsip_nat.so               PJSIP NAT Support                        182        Running              core
res_pjsip_outbound_publish.so  PJSIP Outbound Publish Support           1          Running              core
res_pjsip_pidf_body_generator.so PJSIP Extension State PIDF Provider      0          Running              core
res_pjsip_pidf_digium_body_supplement.so PJSIP PIDF Digium presence supplement    0          Running              core
res_pjsip_pidf_eyebeam_body_supplement.so PJSIP PIDF Eyebeam supplement            0          Running              core
res_pjsip_pubsub.so            PJSIP event resource                     4116       Running              core
res_pjsip_refer.so             PJSIP Blind and Attended Transfer Suppor 183        Running              core
res_pjsip_registrar.so         PJSIP Registrar Support                  0          Running              core
res_pjsip_rfc3326.so           PJSIP RFC3326 Support                    182        Running              core
res_pjsip_sdp_rtp.so           PJSIP SDP RTP/AVP stream handler         182        Running              core
res_pjsip_session.so           PJSIP Session resource                   14         Running              core
res_pjsip_sips_contact.so      UAC SIPS Contact support                 0          Running              core
res_pjsip_t38.so               PJSIP T.38 UDPTL Support                 311        Running              core
res_pjsip_xpidf_body_generator.so PJSIP Extension State PIDF Provider      0          Running              core
res_rtp_asterisk.so            Asterisk RTP Stack                       128        Running              core
res_security_log.so            Security Event Logging                   0          Running              core
res_smdi.so                    Simplified Message Desk Interface (SMDI) 1          Running          extended
res_sorcery_astdb.so           Sorcery Astdb Object Wizard              2          Running              core
res_sorcery_config.so          Sorcery Configuration File Object Wizard 12         Running              core
res_sorcery_memory.so          Sorcery In-Memory Object Wizard          1          Running              core
res_sorcery_memory_cache.so    Sorcery Memory Cache Object Wizard       6          Running              core
res_speech.so                  Generic Speech Recognition API           1          Running              core
res_srtp.so                    Secure RTP (SRTP)                        6          Running              core
res_statsd.so                  StatsD client support                    1          Running          extended
res_stun_monitor.so            STUN Network Monitor                     0          Running              core
res_timing_pthread.so          pthread Timing Interface                 164        Running          extended
sounds                         Sounds Index                             1          Running              core
udptl                          UDPTL                                    2          Running              core
161 modules loaded


By: Joshua C. Colp (jcolp) 2020-07-21 18:32:57.598-0500

The sorcery memory cache has to be explicitly configured in sorcery.conf to be used. What is the contents of that file?

By: Yury Kirsanov (lt_flash) 2020-07-21 18:36:35.584-0500

cat /etc/asterisk/sorcery.conf
[res_pjsip]

; maximum_objects:  How many object to allow in the cache at 1 time.
; expire_on_reload:  If res_pjsip is reloaded, should the cache be flushed?
; object_lifetime_maximum; How long should an object remain in the cache before it's flushed.

; There is only ever 1 row in the ps_globals table but it's referenced heavily and rarely
; changes.  You may choose to leave this in pjsip.conf and comment out these 2 lines.
; On recent versions of Asterisk, the global section is only read on a pjsip reload which
; effectively caches the settings without an expiration time.
;global/cache=memory_cache,maximum_objects=2,expire_on_reload=yes,object_lifetime_maximum=3600
;global=config,pjsip.conf,criteria=type=global

; There is only ever 1 row in the ps_systems table and it's not referenced after startup.
; You may chose to leave this in pjsip.conf and comment out these 2 lines.
;system/cache=memory_cache,maximum_objects=2,expire_on_reload=yes,object_lifetime_maximum=3600
;system=config,pjsip.conf,criteria=type=system

; endpoints, aors, and auths are heavily read objects but are only written to when their
; configuration is changed.  Set the maximum_objects to the number of extensions, plus the
; number of peered PBXes, plus the number of hosts defined for all providers (a provider
; with 10 hosts defined will use 10 endpoints, 10 aors and 1 auth).  Add a few to spare.
; When a configuration change is made to an object, the specific object is flushed from the
; cache so the object_lifetime_maximum of 15 minutes is just a fail-safe.
endpoint/cache=memory_cache,maximum_objects=3000,expire_on_reload=yes,object_lifetime_maximum=900
endpoint=config,pjsip.conf,criteria=type=endpoint

aor/cache=memory_cache,maximum_objects=3000,expire_on_reload=yes,object_lifetime_maximum=900
aor=config,pjsip.conf,criteria=type=aor

auth/cache=memory_cache,maximum_objects=3000,expire_on_reload=yes,object_lifetime_maximum=900
auth=config,pjsip.conf,criteria=type=auth

; contacts are both read from and written to regularly by Asterisk.
contact/cache=memory_cache,maximum_objects=3000,expire_on_reload=yes,object_lifetime_maximum=900
contact=astdb,registrator


[res_pjsip_endpoint_identifier_ip]
; There will be 1 ip identifier for each host across all providers plus 1 for each peered PBX.
identify/cache=memory_cache,maximum_objects=150,expire_on_reload=yes,object_lifetime_maximum=900
identify=config,pjsip.conf,criteria=type=identify


[res_pjsip_outbound_registration]
; There could be 1 outbound registration for each host across all providers depending on whether
; the provider requires them.
registration/cache=memory_cache,maximum_objects=150,expire_on_reload=yes,object_lifetime_maximum=900
registration=config,pjsip.conf,criteria=type=registration

[res_pjsip_outbound_publish]
outbound-publish/cache=memory_cache,maximum_objects=150,expire_on_reload=yes,object_lifetime_maximum=900
outbound-publish=config,pjsip.conf,criteria=type=outbound-publish


By: Joshua C. Colp (jcolp) 2020-07-21 18:43:11.026-0500

You're caching stuff that doesn't need to be cached, in fact placing a cache in front of the config stuff actually makes it less performant. The config support optimizes storage of things after loading is complete. What happens if you remove those caches?

By: Yury Kirsanov (lt_flash) 2020-07-21 18:47:43.049-0500

I'm happy to give it a go, should I just remove that module from loading? I believe this cache configuration comes with default/sample Asterisk configuration files as we definitely didn't modify it.

By: Yury Kirsanov (lt_flash) 2020-07-21 19:03:06.283-0500

Joshua,
I've removed that module from loading, it looks that we had sorcery.conf from very old Asterisk version unchanged, I can confirm that current Asterisk release don't have sorcery.conf configured for PJSIP. We had to load res_sorcery_memory_cache.so because of the fact we had old sorcery.conf configured for PJSIP and PJSIP wasn't loading without this memory cache. Now I have disabled sorcery configuration by putting a blank sorcery.conf and was able to load Asterisk with PJSIP successfully. I will monitor it and report here if it crashes/locks up. Thanks a lot for your help!


By: Yury Kirsanov (lt_flash) 2020-07-28 22:43:53.761-0500

This far we didn't have any outages on affected Asterisk server.

By: Joshua C. Colp (jcolp) 2020-07-30 14:32:21.873-0500

How often did it occur before? Do you do reloads often?

By: Yury Kirsanov (lt_flash) 2020-07-31 05:13:26.018-0500

Hi Joshua,
This was occurring every day or so and yes, we are doing a lot of reloads each day, basically, this Asterisk server has a custom GUI that allows customers to create their own dial plans and apply changes by reloading Asterisk with new configs. We have about 8 such servers and all the rest are working fine. On this server it happened to be a coincidence - we had issues with Asterisk segfaulting due to T.38 fax, this was happening only to this server as one of the customers used fax a lot. It turned out to be a bug and we had it fixed by Asterisk team in ticket 28923. But before that patch one of our admins was trying to do everything possible to fix the issue so he has taken an example for Asterisk sorcery.conf file from official Asterisk tuning manual (https://wiki.asterisk.org/wiki/display/AST/Performance+Tuning) and applied it only to this server. That didn't help with segfaults but after I have applied patch 28923 configuration of sorcery.conf stayed on that server, of which I wasn't aware. Since that time segfaults have gone away but we started to experience lockups. Now after I have removed that sorcery.conf server behaves perfectly fine. I can confirm that we didn't have any lock-ups since 22nd of July. Thanks a lot for your help!