Summary: | ASTERISK-28997: res_pjsip: Asterisk locks up and stops processing any SIP requests | ||
Reporter: | Yury Kirsanov (lt_flash) | Labels: | fax |
Date Opened: | 2020-07-21 03:36:34 | Date Closed: | 2020-07-31 05:41:45 |
Priority: | Minor | Regression? | |
Status: | Closed/Complete | Components: | Resources/res_pjsip Resources/res_sorcery_memory_cache |
Versions: | 16.11.1 | Frequency of Occurrence | Frequent |
Related Issues: | |||
Environment: | Ubuntu Linux 18.04.4 LTS, kernel 4.15.0-111-generic | Attachments: | ( 0) backtrace-threads.txt ( 1) taskprocessors.txt ( 2) threads.txt |
Description: | Hi,
We have a problem with one of many of our Asterisk-based servers. From time to time it just locks up and stops processing SIP messages, complaining about pjsip/distributor queue has reached 500 scheduled tasks. We have several servers based on same Asterisk version with exactly the same environment, but different Asterisk configuration files, some have more SIP devices and more complex dial plans, some has less. This issue happens at any time of the day and I don't see any way to reproduce it. There is no segfault generated as Asterisk keeps running, the only thing is that all distributor queues are overflown for some reason. After Asterisk restart it starts to work like nothing happened. I've checked all network interfaces and there is no DDoS attack or anything like that. Our Asterisk is patched with ASTERISK-28923 patch and it's the same server that was segfaulting before. Other similar servers are running unpatched version of Asterisk 16.11.1. I'm attaching debug logs that I was able to grab during this lockup. Asterisk was compiled with 'NO_DEBUG' option. Please let me know if I can provide any additional information or recompile Asterisk with some additional debug versions. Thanks! | ||
Comments: | By: Asterisk Team (asteriskteam) 2020-07-21 03:36:35.751-0500 Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed. A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report. Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process]. Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur. By: Yury Kirsanov (lt_flash) 2020-07-21 03:38:08.519-0500 This is GDB output as per debug documentation, command used: gdb -ex "thread apply all bt" --batch /opt/asterisk/sbin/asterisk `pidof asterisk` >/tmp/backtrace-threads.txt By: Yury Kirsanov (lt_flash) 2020-07-21 03:38:41.517-0500 'asterisk -rx "core show taskprocessors"' output By: Yury Kirsanov (lt_flash) 2020-07-21 03:39:16.591-0500 'asterisk -rx "core show threads"' output By: Joshua C. Colp (jcolp) 2020-07-21 10:36:41.221-0500 What is the configuration for this and the deployment? I see a sorcery memory cache in use? Is realtime in use? How many auths/endpoints/etc? By: Yury Kirsanov (lt_flash) 2020-07-21 18:25:45.831-0500 Hi Joshua, This server has a complex call-flow setup with a lot of dialplans and endpoints, we have 2195 endpoints created currently though not all of them are in use. Asterisk was compiled with all modules except for all channels rather than pjsip. It uses both UDP, TCP and TLS. Realtime is not used and not loaded. It uses FAX in T.38 and V30 modes via res_fax_spandsp. Server is using Asterisk DB to store a lot of parameters. I'm not sure why sorcery memory cache is in use, probably because it's compiled in by default? Here's a list of loaded modules: asterisk -rx "module show" Module Description Use Count Status Support Level acl Named ACL system 2 Running core app_authenticate.so Authentication Application 0 Running core app_bridgewait.so Place the channel into a holding bridge 0 Running core app_cdr.so Tell Asterisk to not maintain a CDR for 0 Running core app_confbridge.so Conference Bridge Application 0 Running core app_db.so Database Access Functions 0 Running core app_dial.so Dialing Application 70 Running core app_directed_pickup.so Directed Call Pickup Application 0 Running core app_disa.so DISA (Direct Inward System Access) Appli 0 Running core app_echo.so Simple Echo Application 0 Running core app_exec.so Executes dialplan applications 0 Running core app_macro.so Extension Macros 0 Running deprecated app_mixmonitor.so Mixed Audio Monitoring Application 26 Running core app_playback.so Sound File Playback Application 2 Running core app_queue.so True Call Queueing 7 Running core app_read.so Read Variable Application 1 Running core app_readexten.so Read and evaluate extension validity 0 Running core app_record.so Trivial Record Application 0 Running core app_softhangup.so Hangs up the requested channel 0 Running core app_stack.so Dialplan subroutines (Gosub, Return, etc 109 Running core app_system.so Generic System() application 0 Running core app_userevent.so Custom User Event Application 0 Running core app_verbose.so Send verbose output 0 Running core app_voicemail.so Comedian Mail (Voicemail System) 0 Running core app_while.so While Loops and Conditional Execution 0 Running core bridge_builtin_features.so Built in bridging features 1 Running core bridge_builtin_interval_features.so Built in bridging interval features 0 Running core bridge_holding.so Holding bridge module 5 Running core bridge_native_rtp.so Native RTP bridging module 16 Running core bridge_simple.so Simple two channel bridging module 45 Running core bridge_softmix.so Multi-party software based channel mixin 0 Running core ccss Call Completion Supplementary Services 2 Running core cdr CDR Engine 2 Running core cel CEL Engine 1 Running core chan_pjsip.so PJSIP Channel Driver 910 Running core codec_a_mu.so A-law and Mulaw direct Coder/Decoder 41 Running core codec_adpcm.so Adaptive Differential PCM Coder/Decoder 0 Running core codec_alaw.so A-law Coder/Decoder 71 Running core codec_g722.so ITU G.722-64kbps G722 Transcoder 39 Running core codec_g723.so g723 Coder/Decoder, based on Intel IPP 0 Running unknown codec_g726.so ITU G.726-32kbps G726 Transcoder 0 Running core codec_g729.so g729 Coder/Decoder, based on Intel IPP 7 Running unknown codec_gsm.so GSM Coder/Decoder 0 Running core codec_ilbc.so iLBC Coder/Decoder 0 Running core codec_lpc10.so LPC10 2.4kbps Coder/Decoder 0 Running core codec_resample.so SLIN Resampling Codec 52 Running core codec_ulaw.so mu-Law Coder/Decoder 2 Running core dnsmgr DNS Manager 2 Running core dsp DSP 1 Running core enum ENUM Support 1 Running core extconfig Configuration 14 Running core features Call Features 123 Running core format_g719.so ITU G.719 0 Running core format_g723.so G.723.1 Simple Timestamp File Format 0 Running core format_g726.so Raw G.726 (16/24/32/40kbps) data 0 Running core format_g729.so Raw G.729 data 0 Running core format_gsm.so Raw GSM data 0 Running core format_h263.so Raw H.263 data 0 Running core format_h264.so Raw H.264 data 0 Running core format_ilbc.so Raw iLBC data 0 Running core format_mp3.so MP3 format [Any rate but 8000hz mono is 0 Running extended format_pcm.so Raw/Sun uLaw/ALaw 8KHz (PCM,PCMA,AU), G. 2 Running core format_siren14.so ITU G.722.1 Annex C (Siren14, licensed f 0 Running core format_siren7.so ITU G.722.1 (Siren7, licensed from Polyc 0 Running core format_sln.so Raw Signed Linear Audio support (SLN) 8k 0 Running core format_vox.so Dialogic VOX (ADPCM) File Format 0 Running extended format_wav.so Microsoft WAV/WAV16 format (8kHz/16kHz S 17 Running core format_wav_gsm.so Microsoft WAV format (Proprietary GSM) 0 Running core func_callerid.so Party ID related dialplan functions (Cal 0 Running core func_cdr.so Call Detail Record (CDR) dialplan functi 0 Running core func_channel.so Channel information dialplan functions 0 Running core func_config.so Asterisk configuration file variable acc 0 Running core func_curl.so Load external URL 1 Running core func_cut.so Cut out information from a string 0 Running core func_db.so Database (astdb) related dialplan functi 0 Running core func_devstate.so Gets or sets a device state in the dialp 0 Running core func_env.so Environment/filesystem dialplan function 0 Running core func_global.so Variable dialplan functions 0 Running core func_groupcount.so Channel group dialplan functions 0 Running core func_hangupcause.so HANGUPCAUSE related functions and applic 0 Running core func_logic.so Logical dialplan functions 0 Running core func_math.so Mathematical dialplan function 0 Running core func_periodic_hook.so Periodic dialplan hooks. 2 Running core func_strings.so String handling dialplan functions 0 Running core func_timeout.so Channel timeout dialplan functions 0 Running core http Built-in HTTP Server 2 Running core indications Indication Tone Handling 1 Running core logger Logger 1 Running core manager Asterisk Manager Interface 1 Running core pbx_config.so Text Extension Configuration 0 Running core pbx_loopback.so Loopback Switch 0 Running core pbx_spool.so Outgoing Spool Support 0 Running core plc PLC 1 Running core res_adsi.so ADSI Resource 1 Running deprecated res_agi.so Asterisk Gateway Interface (AGI) 1 Running core res_clialiases.so CLI Aliases 0 Running core res_clioriginate.so Call origination and redirection from th 0 Running core res_config_curl.so Realtime Curl configuration 0 Running core res_convert.so File format conversion CLI command 0 Running core res_crypto.so Cryptographic Digital Signatures 0 Running core res_curl.so cURL Resource Module 2 Running core res_fax.so Generic FAX Applications 26 Running core res_fax_spandsp.so Spandsp G.711 and T.38 FAX Technologies 25 Running extended res_format_attr_celt.so CELT Format Attribute Module 1 Running core res_format_attr_g729.so G.729 Format Attribute Module 1 Running core res_format_attr_h263.so H.263 Format Attribute Module 1 Running core res_format_attr_h264.so H.264 Format Attribute Module 1 Running core res_format_attr_ilbc.so iLBC Format Attribute Module 1 Running core res_format_attr_opus.so Opus Format Attribute Module 1 Running core res_format_attr_silk.so SILK Format Attribute Module 1 Running core res_format_attr_siren14.so Siren14 Format Attribute Module 1 Running core res_format_attr_siren7.so Siren7 Format Attribute Module 1 Running core res_format_attr_vp8.so VP8 Format Attribute Module 1 Running core res_limit.so Resource limits 0 Running core res_monitor.so Call Monitoring Resource 2 Running deprecated res_musiconhold.so Music On Hold Resource 13 Running core res_parking.so Call Parking Resource 3 Running core res_pjproject.so PJPROJECT Log and Utility Support 4 Running core res_pjsip.so Basic SIP resource 211 Running core res_pjsip_authenticator_digest.so PJSIP authentication resource 0 Running core res_pjsip_caller_id.so PJSIP Caller ID Support 183 Running core res_pjsip_dialog_info_body_generator.so PJSIP Extension State Dialog Info+XML Pr 0 Running core res_pjsip_diversion.so PJSIP Add Diversion Header Support 183 Running core res_pjsip_dlg_options.so SIP OPTIONS in dialog handler 182 Running core res_pjsip_dtmf_info.so PJSIP DTMF INFO Support 182 Running core res_pjsip_empty_info.so PJSIP Empty INFO Support 182 Running core res_pjsip_endpoint_identifier_anonymous.so PJSIP Anonymous endpoint identifier 0 Running core res_pjsip_endpoint_identifier_ip.so PJSIP IP endpoint identifier 0 Running core res_pjsip_endpoint_identifier_user.so PJSIP username endpoint identifier 0 Running core res_pjsip_exten_state.so PJSIP Extension State Notifications 0 Running core res_pjsip_header_funcs.so PJSIP Header Functions 182 Running core res_pjsip_logger.so PJSIP Packet Logger 0 Running core res_pjsip_messaging.so PJSIP Messaging Support 182 Running core res_pjsip_nat.so PJSIP NAT Support 182 Running core res_pjsip_outbound_publish.so PJSIP Outbound Publish Support 1 Running core res_pjsip_pidf_body_generator.so PJSIP Extension State PIDF Provider 0 Running core res_pjsip_pidf_digium_body_supplement.so PJSIP PIDF Digium presence supplement 0 Running core res_pjsip_pidf_eyebeam_body_supplement.so PJSIP PIDF Eyebeam supplement 0 Running core res_pjsip_pubsub.so PJSIP event resource 4116 Running core res_pjsip_refer.so PJSIP Blind and Attended Transfer Suppor 183 Running core res_pjsip_registrar.so PJSIP Registrar Support 0 Running core res_pjsip_rfc3326.so PJSIP RFC3326 Support 182 Running core res_pjsip_sdp_rtp.so PJSIP SDP RTP/AVP stream handler 182 Running core res_pjsip_session.so PJSIP Session resource 14 Running core res_pjsip_sips_contact.so UAC SIPS Contact support 0 Running core res_pjsip_t38.so PJSIP T.38 UDPTL Support 311 Running core res_pjsip_xpidf_body_generator.so PJSIP Extension State PIDF Provider 0 Running core res_rtp_asterisk.so Asterisk RTP Stack 128 Running core res_security_log.so Security Event Logging 0 Running core res_smdi.so Simplified Message Desk Interface (SMDI) 1 Running extended res_sorcery_astdb.so Sorcery Astdb Object Wizard 2 Running core res_sorcery_config.so Sorcery Configuration File Object Wizard 12 Running core res_sorcery_memory.so Sorcery In-Memory Object Wizard 1 Running core res_sorcery_memory_cache.so Sorcery Memory Cache Object Wizard 6 Running core res_speech.so Generic Speech Recognition API 1 Running core res_srtp.so Secure RTP (SRTP) 6 Running core res_statsd.so StatsD client support 1 Running extended res_stun_monitor.so STUN Network Monitor 0 Running core res_timing_pthread.so pthread Timing Interface 164 Running extended sounds Sounds Index 1 Running core udptl UDPTL 2 Running core 161 modules loaded By: Joshua C. Colp (jcolp) 2020-07-21 18:32:57.598-0500 The sorcery memory cache has to be explicitly configured in sorcery.conf to be used. What is the contents of that file? By: Yury Kirsanov (lt_flash) 2020-07-21 18:36:35.584-0500 cat /etc/asterisk/sorcery.conf [res_pjsip] ; maximum_objects: How many object to allow in the cache at 1 time. ; expire_on_reload: If res_pjsip is reloaded, should the cache be flushed? ; object_lifetime_maximum; How long should an object remain in the cache before it's flushed. ; There is only ever 1 row in the ps_globals table but it's referenced heavily and rarely ; changes. You may choose to leave this in pjsip.conf and comment out these 2 lines. ; On recent versions of Asterisk, the global section is only read on a pjsip reload which ; effectively caches the settings without an expiration time. ;global/cache=memory_cache,maximum_objects=2,expire_on_reload=yes,object_lifetime_maximum=3600 ;global=config,pjsip.conf,criteria=type=global ; There is only ever 1 row in the ps_systems table and it's not referenced after startup. ; You may chose to leave this in pjsip.conf and comment out these 2 lines. ;system/cache=memory_cache,maximum_objects=2,expire_on_reload=yes,object_lifetime_maximum=3600 ;system=config,pjsip.conf,criteria=type=system ; endpoints, aors, and auths are heavily read objects but are only written to when their ; configuration is changed. Set the maximum_objects to the number of extensions, plus the ; number of peered PBXes, plus the number of hosts defined for all providers (a provider ; with 10 hosts defined will use 10 endpoints, 10 aors and 1 auth). Add a few to spare. ; When a configuration change is made to an object, the specific object is flushed from the ; cache so the object_lifetime_maximum of 15 minutes is just a fail-safe. endpoint/cache=memory_cache,maximum_objects=3000,expire_on_reload=yes,object_lifetime_maximum=900 endpoint=config,pjsip.conf,criteria=type=endpoint aor/cache=memory_cache,maximum_objects=3000,expire_on_reload=yes,object_lifetime_maximum=900 aor=config,pjsip.conf,criteria=type=aor auth/cache=memory_cache,maximum_objects=3000,expire_on_reload=yes,object_lifetime_maximum=900 auth=config,pjsip.conf,criteria=type=auth ; contacts are both read from and written to regularly by Asterisk. contact/cache=memory_cache,maximum_objects=3000,expire_on_reload=yes,object_lifetime_maximum=900 contact=astdb,registrator [res_pjsip_endpoint_identifier_ip] ; There will be 1 ip identifier for each host across all providers plus 1 for each peered PBX. identify/cache=memory_cache,maximum_objects=150,expire_on_reload=yes,object_lifetime_maximum=900 identify=config,pjsip.conf,criteria=type=identify [res_pjsip_outbound_registration] ; There could be 1 outbound registration for each host across all providers depending on whether ; the provider requires them. registration/cache=memory_cache,maximum_objects=150,expire_on_reload=yes,object_lifetime_maximum=900 registration=config,pjsip.conf,criteria=type=registration [res_pjsip_outbound_publish] outbound-publish/cache=memory_cache,maximum_objects=150,expire_on_reload=yes,object_lifetime_maximum=900 outbound-publish=config,pjsip.conf,criteria=type=outbound-publish By: Joshua C. Colp (jcolp) 2020-07-21 18:43:11.026-0500 You're caching stuff that doesn't need to be cached, in fact placing a cache in front of the config stuff actually makes it less performant. The config support optimizes storage of things after loading is complete. What happens if you remove those caches? By: Yury Kirsanov (lt_flash) 2020-07-21 18:47:43.049-0500 I'm happy to give it a go, should I just remove that module from loading? I believe this cache configuration comes with default/sample Asterisk configuration files as we definitely didn't modify it. By: Yury Kirsanov (lt_flash) 2020-07-21 19:03:06.283-0500 Joshua, I've removed that module from loading, it looks that we had sorcery.conf from very old Asterisk version unchanged, I can confirm that current Asterisk release don't have sorcery.conf configured for PJSIP. We had to load res_sorcery_memory_cache.so because of the fact we had old sorcery.conf configured for PJSIP and PJSIP wasn't loading without this memory cache. Now I have disabled sorcery configuration by putting a blank sorcery.conf and was able to load Asterisk with PJSIP successfully. I will monitor it and report here if it crashes/locks up. Thanks a lot for your help! By: Yury Kirsanov (lt_flash) 2020-07-28 22:43:53.761-0500 This far we didn't have any outages on affected Asterisk server. By: Joshua C. Colp (jcolp) 2020-07-30 14:32:21.873-0500 How often did it occur before? Do you do reloads often? By: Yury Kirsanov (lt_flash) 2020-07-31 05:13:26.018-0500 Hi Joshua, This was occurring every day or so and yes, we are doing a lot of reloads each day, basically, this Asterisk server has a custom GUI that allows customers to create their own dial plans and apply changes by reloading Asterisk with new configs. We have about 8 such servers and all the rest are working fine. On this server it happened to be a coincidence - we had issues with Asterisk segfaulting due to T.38 fax, this was happening only to this server as one of the customers used fax a lot. It turned out to be a bug and we had it fixed by Asterisk team in ticket 28923. But before that patch one of our admins was trying to do everything possible to fix the issue so he has taken an example for Asterisk sorcery.conf file from official Asterisk tuning manual (https://wiki.asterisk.org/wiki/display/AST/Performance+Tuning) and applied it only to this server. That didn't help with segfaults but after I have applied patch 28923 configuration of sorcery.conf stayed on that server, of which I wasn't aware. Since that time segfaults have gone away but we started to experience lockups. Now after I have removed that sorcery.conf server behaves perfectly fine. I can confirm that we didn't have any lock-ups since 22nd of July. Thanks a lot for your help! |