[Home]

Summary:ASTERISK-29340: stasis: PJSIP endpoint taskprocessor congestion after upgrading
Reporter:Jan Blom (jblom)Labels:
Date Opened:2021-03-10 11:23:59.000-0600Date Closed:
Priority:MinorRegression?
Status:Open/NewComponents:Core/Stasis Resources/res_pjsip
Versions:18.2.2 Frequency of
Occurrence
Constant
Related
Issues:
Environment:8 core vm, lots of RAM, CentOS Linux release 7.5, Asterisk 18.2.2Attachments:( 0) modules_18.2.2.txt
( 1) stasis_statistics_18.2.2.txt
( 2) taskprocessors_16.16.2.txt
( 3) taskprocessors_16.6.1.txt
( 4) taskprocessors_18.2.2.txt
( 5) taskprocessors_decline_ast_channel_varset_type.txt
Description:After upgrading from Asterisk 16.6.1 to 18.2.2 I notice a difference in how tasks are processed by the taskprocessors.

My setup is that the asterisk box receives inbound calls (using chan_pjsip) from a single endpoint (SIP proxy).

There is one taskprocessor {{stasis/p:endpoint:PJSIP/<endpoint name>}} that handles a magnitude more messages than any other taskprocessor and the max depth is well above the high watermark.

Using the older Asterisk 16.6.1 and identical configuration, the same taskprocessor receives no messages.

This is the taksprocessor in question, after a testrun with 4170 calls at a rate of 100 calls per second.
bq.stasis/p:endpoint:PJSIP/proxy-00000019                                    1579171          0       1597        450        500

I have attached the complete list of taskprocessors (core show taskprocessors) after a testrun. For both Asterisk versions.

The dialplan is quite simple, only answering, playing a prompt and hangup. It also involves a few ODBC database calls. However, changing the dialplan to just answer, wait a few seconds and hangup will result in the same behavior,. Even though the number of messages processed by the taskprocessor will decrease a bit, it will still be a huge number.

I am not entirely sure this is a problem. Calls seem to be handles properly and audio is fine. However, it is a big change in behavior compared with Asterisk 16 and a potential bottleneck if a single task is going to process a huge amount of messages. Except from this taskprocessor, all other message handling is well distributed over the various taskprocessors.

Another annoyance is that the log gets filled with these warnings:
bq.taskprocessor.c:1160 taskprocessor_push: The 'stasis/p:endpoint:PJSIP/proxy-00000019' task processor queue reached 500 scheduled tasks again.

Most bells and whistles are disabled, no ARI, AMI, AGI. SIP UDP only.
The list of loaded modules in Asterisk 18.2.2 is attached.
Comments:By: Asterisk Team (asteriskteam) 2021-03-10 11:24:00.148-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

Please note that by submitting data, code, or documentation to Sangoma through JIRA, you accept the Terms of Use present at [https://www.asterisk.org/terms-of-use/|https://www.asterisk.org/terms-of-use/].

By: Joshua C. Colp (jcolp) 2021-03-11 04:33:16.114-0600

Can you please build Asterisk using:

./configure --enable-dev-mode

Once built and installed, perform your test and then attach the results of the following from the Asterisk CLI:

stasis statistics show messages
stasis statistics show subscriptions
stasis statistics show topics

This will provide statistics information on messages being queued up and what is using them.

By: Jan Blom (jblom) 2021-03-12 05:39:54.197-0600

Asterisk 18.2.2 built with:
./configure --enable-dev-mode

Running a test with 4165 calls @ ~100 call per second

Log contains result from:
stasis statistics show messages
stasis statistics show subscriptions
stasis statistics show topics


By: Jan Blom (jblom) 2021-03-12 05:40:47.814-0600

Log has been uploaded.

By: Joshua C. Colp (jcolp) 2021-03-12 05:51:28.662-0600

If you add "decline=ast_channel_varset_type" under the [declined_message_types] section of stasis.conf does this alter the result?

By: Jan Blom (jblom) 2021-03-12 06:04:20.549-0600

Added
decline=ast_channel_varset_type


By: Jan Blom (jblom) 2021-03-12 06:09:45.255-0600

Uploaded a new log after adding
*decline=ast_channel_varset_type*


The console is flooded with these messages now:
{quote}
stasis.c:2295 stasis_message_type_declined: Declining to allocate Stasis message type 'ast_channel_varset_type' due to configuration
{quote}

By: Benjamin Keith Ford (bford) 2021-03-15 14:13:43.392-0500

Are you seeing the traskprocessor messages after doing this, or just the "Declining to allocate" messages? If the taskprocessor messages are gone, I think we have enough information to work with.

By: Jan Blom (jblom) 2021-03-18 02:09:24.115-0500

I still get the taskprocessor messages as well. The max depth of the taskprocessor is still above the limit.

From the latest log file:

{noformat}
Processor                                                               Processed   In Queue  Max Depth  Low water High water
stasis/p:endpoint:PJSIP/proxy-00000019                                    1584102          0       1859        450        500
{noformat}

By: Benjamin Keith Ford (bford) 2021-03-18 10:44:26.727-0500

Are you able to test other versions out easily? If so, does this happen on the latest release of 16 (16.16.2) as well? This is just to confirm it's happening on both versions and not some change just in 18.

By: Jan Blom (jblom) 2021-03-18 13:47:40.071-0500

Sure. I built 16.16.2 and ran the same test as I initially did with 18.2.2, removing "decline=ast_channel_varset_type" again, since that didn't change anything.
Taskprocessor log has been attached.

On 16.16.2 I don't get any taskprocessor warnings.

Most taskprcessor numbers are just about the same in 16.16.2 and 18.2.2 after running a test. The only lines that are significantly different are these:

{code:title=18.2.2}
Processor                                                               Processed   In Queue  Max Depth  Low water High water
stasis/p:endpoint:PJSIP/proxy-00000019                                    1577245          0       1567        450        500
stasis/pool                                                                 86146          0          2        450        500
stasis/pool-control                                                        172313          0         14        450        500
{code}

{code:title=16.16.2}
Processor                                                               Processed   In Queue  Max Depth  Low water High water
stasis/p:endpoint:PJSIP/proxy-00000019                                       4158          0         51        450        500
stasis/pool                                                                  2143          0          1        450        500
stasis/pool-control                                                          4307          0         16        450        500
{code}