Summary: | ASTERISK-29340: stasis: PJSIP endpoint taskprocessor congestion after upgrading | ||
Reporter: | Jan Blom (jblom) | Labels: | |
Date Opened: | 2021-03-10 11:23:59.000-0600 | Date Closed: | |
Priority: | Minor | Regression? | |
Status: | Open/New | Components: | Core/Stasis Resources/res_pjsip |
Versions: | 18.2.2 | Frequency of Occurrence | Constant |
Related Issues: | |||
Environment: | 8 core vm, lots of RAM, CentOS Linux release 7.5, Asterisk 18.2.2 | Attachments: | ( 0) modules_18.2.2.txt ( 1) stasis_statistics_18.2.2.txt ( 2) taskprocessors_16.16.2.txt ( 3) taskprocessors_16.6.1.txt ( 4) taskprocessors_18.2.2.txt ( 5) taskprocessors_decline_ast_channel_varset_type.txt |
Description: | After upgrading from Asterisk 16.6.1 to 18.2.2 I notice a difference in how tasks are processed by the taskprocessors.
My setup is that the asterisk box receives inbound calls (using chan_pjsip) from a single endpoint (SIP proxy). There is one taskprocessor {{stasis/p:endpoint:PJSIP/<endpoint name>}} that handles a magnitude more messages than any other taskprocessor and the max depth is well above the high watermark. Using the older Asterisk 16.6.1 and identical configuration, the same taskprocessor receives no messages. This is the taksprocessor in question, after a testrun with 4170 calls at a rate of 100 calls per second. bq.stasis/p:endpoint:PJSIP/proxy-00000019 1579171 0 1597 450 500 I have attached the complete list of taskprocessors (core show taskprocessors) after a testrun. For both Asterisk versions. The dialplan is quite simple, only answering, playing a prompt and hangup. It also involves a few ODBC database calls. However, changing the dialplan to just answer, wait a few seconds and hangup will result in the same behavior,. Even though the number of messages processed by the taskprocessor will decrease a bit, it will still be a huge number. I am not entirely sure this is a problem. Calls seem to be handles properly and audio is fine. However, it is a big change in behavior compared with Asterisk 16 and a potential bottleneck if a single task is going to process a huge amount of messages. Except from this taskprocessor, all other message handling is well distributed over the various taskprocessors. Another annoyance is that the log gets filled with these warnings: bq.taskprocessor.c:1160 taskprocessor_push: The 'stasis/p:endpoint:PJSIP/proxy-00000019' task processor queue reached 500 scheduled tasks again. Most bells and whistles are disabled, no ARI, AMI, AGI. SIP UDP only. The list of loaded modules in Asterisk 18.2.2 is attached. | ||
Comments: | By: Asterisk Team (asteriskteam) 2021-03-10 11:24:00.148-0600 Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed. A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report. Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process]. Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur. Please note that by submitting data, code, or documentation to Sangoma through JIRA, you accept the Terms of Use present at [https://www.asterisk.org/terms-of-use/|https://www.asterisk.org/terms-of-use/]. By: Joshua C. Colp (jcolp) 2021-03-11 04:33:16.114-0600 Can you please build Asterisk using: ./configure --enable-dev-mode Once built and installed, perform your test and then attach the results of the following from the Asterisk CLI: stasis statistics show messages stasis statistics show subscriptions stasis statistics show topics This will provide statistics information on messages being queued up and what is using them. By: Jan Blom (jblom) 2021-03-12 05:39:54.197-0600 Asterisk 18.2.2 built with: ./configure --enable-dev-mode Running a test with 4165 calls @ ~100 call per second Log contains result from: stasis statistics show messages stasis statistics show subscriptions stasis statistics show topics By: Jan Blom (jblom) 2021-03-12 05:40:47.814-0600 Log has been uploaded. By: Joshua C. Colp (jcolp) 2021-03-12 05:51:28.662-0600 If you add "decline=ast_channel_varset_type" under the [declined_message_types] section of stasis.conf does this alter the result? By: Jan Blom (jblom) 2021-03-12 06:04:20.549-0600 Added decline=ast_channel_varset_type By: Jan Blom (jblom) 2021-03-12 06:09:45.255-0600 Uploaded a new log after adding *decline=ast_channel_varset_type* The console is flooded with these messages now: {quote} stasis.c:2295 stasis_message_type_declined: Declining to allocate Stasis message type 'ast_channel_varset_type' due to configuration {quote} By: Benjamin Keith Ford (bford) 2021-03-15 14:13:43.392-0500 Are you seeing the traskprocessor messages after doing this, or just the "Declining to allocate" messages? If the taskprocessor messages are gone, I think we have enough information to work with. By: Jan Blom (jblom) 2021-03-18 02:09:24.115-0500 I still get the taskprocessor messages as well. The max depth of the taskprocessor is still above the limit. From the latest log file: {noformat} Processor Processed In Queue Max Depth Low water High water stasis/p:endpoint:PJSIP/proxy-00000019 1584102 0 1859 450 500 {noformat} By: Benjamin Keith Ford (bford) 2021-03-18 10:44:26.727-0500 Are you able to test other versions out easily? If so, does this happen on the latest release of 16 (16.16.2) as well? This is just to confirm it's happening on both versions and not some change just in 18. By: Jan Blom (jblom) 2021-03-18 13:47:40.071-0500 Sure. I built 16.16.2 and ran the same test as I initially did with 18.2.2, removing "decline=ast_channel_varset_type" again, since that didn't change anything. Taskprocessor log has been attached. On 16.16.2 I don't get any taskprocessor warnings. Most taskprcessor numbers are just about the same in 16.16.2 and 18.2.2 after running a test. The only lines that are significantly different are these: {code:title=18.2.2} Processor Processed In Queue Max Depth Low water High water stasis/p:endpoint:PJSIP/proxy-00000019 1577245 0 1567 450 500 stasis/pool 86146 0 2 450 500 stasis/pool-control 172313 0 14 450 500 {code} {code:title=16.16.2} Processor Processed In Queue Max Depth Low water High water stasis/p:endpoint:PJSIP/proxy-00000019 4158 0 51 450 500 stasis/pool 2143 0 1 450 500 stasis/pool-control 4307 0 16 450 500 {code} |