Summary: ASTERISK-17221: Asterisk SVN 1.8 running at 99% CPU
Reporter: BrettH (zeero)
Labels:
Date Opened: 2011-01-10 21:45:11.000-0600
Date Closed: 2019-04-09 12:32:57
Priority: Major
Regression?: No
Status: Closed/Complete
Components: Core/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) asterisk-info.txt
( 1) asterisk-info--2.txt
( 2) asterisk-logs.txt
( 3) asterisk-logs--2.txt
( 4) capture3.txt
( 5) core-show-locks--2.txt
( 6) gdb-trace.txt
( 7) gdb-trace--2.txt
( 8) snmp-cpu.jpg
( 9) snmp-cpu--2.jpg
(10) system-info.txt
(11) system-info--2.txt
Description: The Asterisk process (SVN-branch-1.8-r299449) is processing calls (10 per minute) but running at 99% CPU utilisation. It runs at low CPU utilisation for approximately 2 weeks before pegging at 99%.
I have tried several Asterisk releases, all with the same issue.

****** ADDITIONAL INFORMATION ******

config, logs and gdb trace attached.
Comments:

By: Stefan Schmidt (schmidts) 2011-01-11 04:51:21.000-0600

I am not sure, but maybe you should try deactivating the full log at verbose level 5 and debug level 5. That's a large amount of data to write to disk. Do you really need it?

Maybe it's something completely different, but from the gdb trace I don't see any "hanging" thread that looks like a CPU killer.

regards

stefan

By: BrettH (zeero) 2011-01-11 19:02:47.000-0600

Hi, thanks for response.

No, the full log is not required. I initially experienced the high CPU symptoms with only "error" logging enabled at verbose level=3. I then increased the logging levels in an attempt to identify the root cause of the CPU utilisation, but it did not help.

Rgds
Zee

By: Stefan Schmidt (schmidts) 2011-01-12 05:36:12.000-0600

And you didn't see any locks, right?
Normally a high CPU load is only caused by a looping or hanging thread, but even your gdb trace doesn't show anything like this.

Can you reproduce this, or is this just the currently running process, which you haven't restarted?

regards
stefan
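
Since the question here is whether a single looping thread accounts for the CPU load, one way to check outside gdb is to compare per-thread CPU time over a short interval (the same information `top -H -p <pid>` shows). The following is only a rough sketch for Linux, assuming the standard `/proc/<pid>/task/<tid>/stat` layout; the function names are made up for illustration and nothing like this was used in the ticket.

```python
import os


def parse_thread_ticks(stat_line):
    """Return (tid, name, cpu_ticks) from one /proc/<pid>/task/<tid>/stat line."""
    # The thread name is wrapped in parentheses and may contain spaces,
    # so split around the last ')' instead of on whitespace alone.
    head, _, tail = stat_line.rpartition(")")
    tid_text, _, name = head.partition("(")
    fields = tail.split()
    # fields[0] is stat field 3 (state); utime and stime are fields 14 and 15.
    return int(tid_text), name, int(fields[11]) + int(fields[12])


def read_thread_ticks(pid):
    """Snapshot {tid: (name, cpu_ticks)} for every thread of the given process."""
    ticks = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as stat_file:
                thread_id, name, cpu = parse_thread_ticks(stat_file.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # the thread exited between listing and reading
        ticks[thread_id] = (name, cpu)
    return ticks


def busiest_threads(before, after, top=5):
    """Rank threads by CPU ticks consumed between two snapshots."""
    deltas = []
    for tid, (name, cpu) in after.items():
        # Threads that appeared after the first snapshot count from zero.
        previous = before.get(tid, (name, 0))[1]
        deltas.append((cpu - previous, tid, name))
    deltas.sort(reverse=True)
    return [(tid, name, delta) for delta, tid, name in deltas[:top]]
```

Taking `read_thread_ticks(pid)` twice, a few seconds apart, and passing both snapshots to `busiest_threads` gives the thread IDs worth looking up in the gdb trace (gdb reports them as LWP numbers).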

By: Leif Madsen (lmadsen) 2011-01-12 08:48:48.000-0600

I don't see a 'core show locks' which would certainly help here.

By: Stefan Schmidt (schmidts) 2011-01-12 08:58:18.000-0600

Leif, it's in the asterisk-info file, but empty.
@zeero, could you please keep retrying "core show locks" until you catch some information?

thx
regards
stefan

By: BrettH (zeero) 2011-01-12 19:19:36.000-0600

When I detach gdb it seems to kill the asterisk process. I have restarted asterisk with "/usr/sbin/asterisk -f -vvvg &" and the processor utilisation is back to 1%. I expect the high CPU utilisation to recur within the next 3 weeks.

When the issue recurs, I can create a cron job to collect "asterisk -rx 'core show locks'" every minute. Are there any other compile flags or commands we can use to gather additional information?

Thanks
Zee

By: Stefan Schmidt (schmidts) 2011-01-13 03:43:28.000-0600

Your SNMP graph looks like you have caught a deadlock, but not one bad enough to kill the whole system.

Why do you use the do-not-fork option -f? Any special reason for this?

You should try using safe_asterisk; then, IMHO, gdb will not kill the process.

If this happens again, try to catch the "core show locks" output in the console rather than with a cron job, because once a minute will not be often enough.

thx

stefan
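
As an illustration of sampling more often than cron's one-minute granularity allows, here is a minimal sketch that runs the command quoted earlier in the thread at a fixed interval and appends timestamped snapshots to a file. It assumes the `asterisk` binary is on PATH; the function names, the `runner` and `sleep` parameters, and the output file are hypothetical. As far as I know, "core show locks" only returns data when Asterisk was built with DEBUG_THREADS.

```python
import subprocess
import time
from datetime import datetime

# Command as quoted in the thread; assumes the asterisk binary is on PATH.
LOCKS_CMD = ["asterisk", "-rx", "core show locks"]


def run_command(cmd):
    """Run cmd and return its output as text, with stderr folded into stdout."""
    result = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return result.stdout


def sample_locks(samples, interval, out_path, runner=run_command, sleep=time.sleep):
    """Append `samples` timestamped snapshots to out_path, `interval` seconds apart."""
    written = 0
    with open(out_path, "a") as log:
        for i in range(samples):
            stamp = datetime.now().isoformat()
            log.write(f"=== {stamp} ===\n{runner(LOCKS_CMD)}\n")
            log.flush()  # keep earlier snapshots even if the run is interrupted
            written += 1
            if i < samples - 1:
                sleep(interval)
    return written
```

For example, `sample_locks(samples=300, interval=1.0, out_path="core-show-locks.log")` would collect roughly five minutes of once-per-second snapshots.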

By: BrettH (zeero) 2011-01-13 21:12:24.000-0600

Hi Stefan,

The issue has recurred after 3 days. I had no special reason for using the -f switch when starting asterisk; however, I have now restarted it using safe_asterisk and the default switches. I have attached all the new debug and log information; these files are suffixed with "--2.txt". I collected the output of "core show locks" by spamming the command for approximately 5 minutes.

Rgds
Zee

By: BrettH (zeero) 2011-01-22 00:29:15.000-0600

Hi,

Any other suggestions/recommendations?

Thanks
Zee

By: BrettH (zeero) 2011-02-01 07:30:17.000-0600

I'm at a loss as to what else I can try to resolve this issue, as this is a production server. Is there a production-ready version available (1.6 perhaps?) that is easier to troubleshoot, or do I need to contact a developer to consult for a fee?

Cheers
Zee

By: BrettH (zeero) 2011-02-07 23:36:18.000-0600

Attached "capture3.txt"

I may be grasping at straws but the output of:
"core show threads"
"lsof -p 25127"
"netstat -napo"

^ These seem to indicate that some TCP connections are not getting closed down or cleaned up correctly?
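
One way to put a number on this suspicion is to tally TCP socket states for the asterisk PID from saved "netstat -napo" output and watch whether states such as CLOSE_WAIT keep growing. The sketch below assumes the usual Linux net-tools column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name, Timer); `count_tcp_states` is a hypothetical helper, not something taken from capture3.txt.

```python
from collections import Counter


def count_tcp_states(netstat_text, pid=None):
    """Count TCP sockets per state, optionally restricted to one PID."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers, UDP and UNIX sockets, and anything too short to parse.
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        state, owner = fields[5], fields[6]  # owner looks like "25127/asterisk"
        if pid is not None and not owner.startswith(f"{pid}/"):
            continue
        counts[state] += 1
    return counts
```

Calling `count_tcp_states(text, pid=25127)` on captures taken a few hours apart would show whether the number of half-closed sockets owned by the process rises along with the CPU load.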

By: Sean Bright (seanbright) 2019-04-09 12:32:57.283-0500

I'm not able to reproduce this with Asterisk 13 (the oldest version of Asterisk still supported). If you are able to reproduce this in Asterisk 13, please re-open by commenting on this ticket.