[Home]

Summary:ASTERISK-28368: Low performance. Many errors taskprocessor
Reporter:Nikolay shakin (post@itprofit32.ru)Labels:pjsip
Date Opened:2019-04-02 04:27:12Date Closed:2019-04-08 11:36:47
Priority:MajorRegression?
Status:Closed/CompleteComponents:pjproject/pjsip
Versions:16.2.1 Frequency of
Occurrence
Related
Issues:
Environment:Centos7 kernel 3.10.0-862.9.1.el7.x86_64 FreePBX 14.0.5.25 FOP2 pjsip 2.8Attachments:( 0) debug_log_taskproc.zip
( 1) full.zip
( 2) Screenshot_44.png
( 3) Screenshot_45.png
( 4) Screenshot_47.png
( 5) Screenshot_48.png
Description:I use Asterisk with FreePBX for call-center about 50 users and 30-50 calls.
A Dialplan of the server is ordinary(generated by FreePBX) + few web-hooks added by me, I use queues with rrmemory strategy.
I have many warnings at log, and a problem with pjsip, sometime we don't answer to invites at the time (delay more then 3 seconds), and other side think we are dead. If I ping taskprocessor of the trunk (core ping  taskprocessor subp:PJSIP/trunk_name-22222) at the moment, it shows 5 seconds delay.

What did I tried:
-change hardware (another server)
-disable CDR
-disable CEL
-enable all possible AMI filters
-increase threadpool pjsip
But it does not help, I still have the problem.

And one more strange thing, CPU load is too high for 30 calls, I have almost  the same configuration on asterisk 13(chan_sip) and there are no similar problems or CPU usage.


I have warnings like this:
{quote}
[2019-04-02 10:29:16] WARNING[61361][C-00001f86]: taskprocessor.c:913 taskprocessor_push: The 'stasis-core-control' task processor queue reached 500scheduled tasks again.
[2019-04-02 10:30:04] WARNING[16800]: taskprocessor.c:913 taskprocessor_push: The 'subm:manager_topic-00000006' task processor queue reached 3000 scheduled tasks again.
[2019-04-02 10:30:14] WARNING[62357][C-00001fa4]: taskprocessor.c:913 taskprocessor_push: The 'subm:manager_topic-00000006' task processor queue reached 3000 scheduled tasks again.
[2019-04-02 10:30:50] WARNING[63620][C-00001fbd]: taskprocessor.c:913 taskprocessor_push: The 'subp:ast_channel_topic_all-000023d8' task processor queue reached 500 scheduled tasks.
[2019-04-02 10:30:50] WARNING[63620][C-00001fbd]: taskprocessor.c:913 taskprocessor_push: The 'subp:ast_channel_topic_all-0000209e' task processor queue reached 500 scheduled tasks again.
[2019-04-02 10:32:34] WARNING[65533][C-00001fec]: taskprocessor.c:913 taskprocessor_push: The 'subm:manager_topic-00000006' task processor queue reached 3000 scheduled tasks again.
{quote}

The worst thing is I can't even get normal log, because if I enable pjsip logger and core debug I have logger errors like these:
{quote}
[2019-04-02 12:01:47] WARNING[24275]: logger: ***: Log queue threshold (1000) exceeded.  Discarding new messages.
[2019-04-02 12:01:47] WARNING[16799]: logger: ***: Logging resumed.  220 messages discarded.{quote}
Comments:By: Asterisk Team (asteriskteam) 2019-04-02 04:27:13.353-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

By: Nikolay shakin (post@itprofit32.ru) 2019-04-02 04:32:35.469-0500

log-file without debug

By: Nikolay shakin (post@itprofit32.ru) 2019-04-02 04:58:00.460-0500

Logger warnings If debug enabled

By: Nikolay shakin (post@itprofit32.ru) 2019-04-02 05:12:44.715-0500

Log-file with debug enabled.
But when I was getting it I had warnings:

[2019-04-02 12:55:11] WARNING[16847]: logger: ***: Log queue threshold (1000) exceeded.  Discarding new messages.
[2019-04-02 12:55:11] WARNING[16799]: logger: ***: Logging resumed.  7794 messages discarded.


By: Joshua C. Colp (jcolp) 2019-04-02 05:33:39.882-0500

We need details about the system itself and your usage patterns. Generally these messages indicate that something is using up resources, so Asterisk itself can't keep up with the load which can cause a cascading failure in some cases. I've also seen this occur when the underlying storage is having problems.

By: Nikolay shakin (post@itprofit32.ru) 2019-04-02 07:32:52.592-0500

Nothing unusual, server is on a hyper-v VM. It's only one VM on this  server, and I tried use another hardware hyper-v server. Installed from FreePBX-Distro , asterisk compiled  from source.  There is one idea about storage, I'm using ide-controller  on this VM, may be it affects.

By: Nikolay shakin (post@itprofit32.ru) 2019-04-02 07:34:27.501-0500

atop

By: Joshua C. Colp (jcolp) 2019-04-02 07:49:40.725-0500

And what else is being used? What is the usage pattern?

By: Nikolay shakin (post@itprofit32.ru) 2019-04-02 08:04:54.607-0500

I'm sorry, unfortunately I afraid,  I don't understand the question. If you mean other process on the server,  it's  LAMP.
I checked write speed --- (839 MB) copied, 1.21064 s, 693 MB/s


By: Joshua C. Colp (jcolp) 2019-04-02 08:21:53.075-0500

I mean how are you using Asterisk and FreePBX? Are they simple calls? Large IVRs? Conference bridges? Etc.

Without this it's hard to answer because Asterisk is extremely flexible in what it can be used for.

By: Nikolay shakin (post@itprofit32.ru) 2019-04-02 08:33:05.226-0500

Oh, now  I see, sorry.
It is a call-center , 5 trunks------ > playback(hello)-----> 10 queues ------> 30 sip-users
Mostly(90%) it's  incoming calls

By: Nikolay shakin (post@itprofit32.ru) 2019-04-02 09:30:49.481-0500

90% of calls processed by this context.

By: Joshua C. Colp (jcolp) 2019-04-08 10:08:31.189-0500

I'm continuing to work through the log and data to try to identify things, but can you try upgrading to 16.3.0 which should help with the manager task processor.

By: Joshua C. Colp (jcolp) 2019-04-08 10:35:41.275-0500

Can you also please state what the queue configuration is? What are the members?

By: Joshua C. Colp (jcolp) 2019-04-08 10:37:33.966-0500

Can you also provide just a verbose log so I can see the dialplan execution? It almost seems as though there's a dialplan loop or tons of calls per second to the dialplan.

By: Joshua C. Colp (jcolp) 2019-04-08 10:43:24.867-0500

There is also a newly added option[1] to PJSIP which can control PJSIP's behavior when a taskprocessor is in an overloaded state. This can be used to disable the behavior that causes chan_pjsip to stop responding to new traffic.

[1] https://github.com/asterisk/asterisk/blob/master/configs/samples/pjsip.conf.sample#L1161

By: Nikolay shakin (post@itprofit32.ru) 2019-04-08 11:16:36.421-0500

Just yesterday I reinstalled the system without hyper-v, directly to the same hardware. And today I have no any warnings with the same dialplan , and more number of  calls than it was earlier. The version I use is 16.3.0 however , I don't think the update  helped, probably it was hyper-v.
No warnings , no sip delay, no abnormal CPU usage
Work fine as designed.

By: Nikolay shakin (post@itprofit32.ru) 2019-04-08 11:17:37.839-0500

It is as windows 2012 hyper-v generation 1

By: Joshua C. Colp (jcolp) 2019-04-08 11:18:18.127-0500

That's certainly possible. It really did look like Asterisk was overloaded and just couldn't process things fast enough...

By: Nikolay shakin (post@itprofit32.ru) 2019-04-08 11:33:57.538-0500

I tried two different servers(different hardware but the same hiper-v version). Same config gives different results on nacked hardware and hypervisor.  I tried to make a stress test, and I saw similar problems but number of calls was 500+ and CPS about 30. So there are not too many chances that problem was with astetisk.  

By: Joshua C. Colp (jcolp) 2019-04-08 11:36:47.917-0500

Per your comment I'm suspending this out.

By: Alexandre Keller (alexandrekeller) 2021-04-19 07:14:28.812-0500

I  do have the same issues with Asterisk 16.17.0 using WebRTC + PJSIP endpoints.

Any ideas?