Summary: ASTERISK-28560: do_monitor Thread hangs on 99% cpu and doesn't respond
Reporter: Eike Maier (infy1801)
Labels:
Date Opened: 2019-10-01 07:19:16
Date Closed:
Priority: Major
Regression?:
Status: Open/New
Components: Channels/chan_sip/Registration
Versions: 13.29.0
Frequency of Occurrence: Frequent
Related Issues:
Environment: CentOS 7.5, 2 Cores, 16GB RAM, Gigabit ethernet, XCP hosted VM
Attachments: ( 0) gdb_do_monitor_thread_hanging.txt
Description: We run a small Asterisk PBX with about 50 to 70 extensions and about 5 simultaneous calls, and we use the Queue application.
We use dynamic realtime for queues, queue members, the queue log, music on hold, and voicemail.
Most of the other configuration files are loaded via static realtime.

When calls are active on the PBX and registrations arrive at the same time, the do_monitor thread of Asterisk sometimes hangs at 99% CPU and no longer responds to any input. The chan_sip module does nothing after that. Most of the time, only a restart clears this state.

This happens at least twice a day, but it is awkward to reproduce on demand.

We use a small modification to ensure that the sippeer table is updated as it would be with dynamic realtime, but the hang also occurs without the modification.

I have the output of core show locks and core show threads; I can capture a gdb trace the next time this happens.
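
For anyone who needs to collect the same three diagnostics while the thread is hanging, the following is a minimal sketch, not part of the original report. The function names `diagnostic_commands` and `collect` are hypothetical. It assumes `asterisk` and `gdb` are on the PATH, that the script runs with enough privileges to attach to the process, and that Asterisk was built with DEBUG_THREADS (otherwise `core show locks` is not available).

```python
import shlex
import subprocess


def diagnostic_commands(pid: int) -> list[list[str]]:
    """Build the argv lists for the diagnostics named in the report.

    `pid` is the process ID of the running Asterisk instance.
    """
    return [
        # Lock and thread listings from the Asterisk CLI.
        ["asterisk", "-rx", "core show locks"],
        ["asterisk", "-rx", "core show threads"],
        # Batch-mode gdb: attach, print a backtrace for every thread, exit.
        ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"],
    ]


def collect(pid: int, timeout_s: float = 60.0) -> dict[str, str]:
    """Run each diagnostic and return its combined output keyed by command."""
    results: dict[str, str] = {}
    for argv in diagnostic_commands(pid):
        key = shlex.join(argv)
        try:
            proc = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout_s
            )
            results[key] = proc.stdout + proc.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            # A missing binary or a stuck command is recorded, not raised,
            # so the remaining diagnostics are still collected.
            results[key] = f"failed: {exc}"
    return results
```

Calling `collect(pid)` with the Asterisk process ID returns a dictionary whose values can be written to a text file and attached to the issue. The per-command timeout is there because a CLI command might itself block while the process is in this state.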
Comments:
By: Asterisk Team (asteriskteam) 2019-10-01 07:19:16.956-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

By: George Joseph (gjoseph) 2019-10-02 06:03:48.968-0500

The chan_sip channel driver is in 'extended' support status and is supported only by community members.  Your issue is in the queue. Your patience is appreciated as a community developer may work the issue when time and resources become available.

Asterisk is an open source project and community members work the issues on a voluntary basis. You are welcome to develop your own patches and submit them to the project.[1]

If you are not a programmer and you are in a hurry to see a patch provided then you might try rallying support on the Asterisk users mailing list or forums.[2] Another alternative is offering a bug bounty on the asterisk-dev mailing list.[3] Often a little incentive can go a long way.

[1]: https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process
[2]: http://www.asterisk.org/community/discuss
[3]: https://wiki.asterisk.org/wiki/display/AST/Asterisk+Bug+Bounties



By: Eike Maier (infy1801) 2019-10-04 00:07:55.193-0500

Thanks for the information. I'm a developer myself and am trying to resolve this problem, too.
I've been working on this for about two months.

Some further information from my side:
I found that the ODBC connector first attempts an SSL handshake and then proceeds with a normal connection. When the system is under some load, the SSL handshake apparently does not complete and the connection is shut down. Because no response ever arrives, the Asterisk ODBC thread waits indefinitely, which causes the issues described above.
(See the gdb trace.)

After setting the option "SSLMode=Disabled" in odbc.ini, the issue has not recurred so far.
I'll keep watching this for a while and will report back if it happens again.
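
As an illustration of where that option goes, here is a sketch of a DSN entry in odbc.ini. Only the `SSLMode=Disabled` line comes from this report; the DSN name, driver name, server, and database are hypothetical placeholders, and the report does not say which ODBC driver is in use, so the option name may differ for other drivers.

```ini
; Sketch of an odbc.ini DSN entry. Everything except SSLMode is a
; hypothetical placeholder and must be replaced with real values.
[asterisk-connector]
Driver   = MySQL
Server   = 127.0.0.1
Database = asterisk
; From the report: skip the SSL handshake when connecting.
SSLMode  = Disabled
```

With this setting the connection to the database is unencrypted, so it is probably best suited to setups where the database is reached over a local or otherwise trusted link.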

I think this option could be helpful not only in my case but also in other setups where Asterisk realtime is used.

By: Eike Maier (infy1801) 2019-10-04 00:08:58.218-0500

gdb trace of the do_monitor thread while it hangs at 99% CPU