[Home]

Summary:ASTERISK-19867: asterisk fails to lower its priority when astcanary dies
Reporter:Xavier Hienne (xhienne)Labels:
Date Opened:2012-05-14 09:35:34Date Closed:2016-10-26 08:59:16
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Utilities/astcanary
Versions:1.8.7.2 1.8.11.1 1.8.12.0 11.23.1 Frequency of
Occurrence
Related
Issues:
is caused byASTERISK-11045 [patch] Tweet, tweet
is related toASTERISK-26352 Astcanary dies when doing "core restart"
Environment:Debian GNU/Linux 6 (squeeze) Linux 2.6.32-5-686 and 2.6.32-5-686-bigmem GNU libc 2.11Attachments:
Description:When astcanary dies, asterisk is supposed to lower its priority and return to the SCHED_OTHER scheduling policy.

This is not what I'm observing on Linux: apparently, the death of astcanary is detected by the watching thread, but it fails to change the priority of all other asterisk threads. I suspect that only the watching thread sees its priority changed (just before dying) and this might be the expected behavior of sched_setscheduler() with pid=0.

Here is how to reproduce this issue:

$ ps u $(pidof astcanary asterisk)
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     31854  6.8  3.3  34732 17120 ?        Ssl  15:59   0:00 /usr/sbin/asterisk -p -M 1000
root     31855  0.0  0.1   2892   772 ?        S    15:59   0:00 astcanary /var/run/asterisk/alt.asterisk.canary.tweet.tweet.tweet 31854

$ chrt -p $(pidof asterisk)
pid 31854's current scheduling policy: SCHED_RR
pid 31854's current scheduling priority: 10

$ killall astcanary
(after some time => WARNING[31856] asterisk.c: The canary is no more...)

$ chrt -p $(pidof asterisk)
pid 31854's current scheduling policy: SCHED_RR
pid 31854's current scheduling priority: 10

=> asterisk is still at its highest priority
Comments:By: Rusty Newton (rnewton) 2016-09-15 17:26:43.068-0500

[~xhienne], I don't think we were ever able to reproduce this.

Does this issue still occur for you on supported versions of Asterisk?

By: Walter Doekes (wdoekes) 2016-09-16 08:44:13.301-0500

Same happens over here.

The canary detection takes quite some time too..

{noformat}
# asterisk -V
Asterisk 11.23.1-osso1~jessie

# chrt -p $(pidof asterisk)
pid 16560's current scheduling policy: SCHED_RR
pid 16560's current scheduling priority: 10

# date; pkill astcanary
Fri Sep 16 15:39:58 CEST 2016

# tail -n1 /var/log/asterisk/messages.log
[Sep 16 15:41:38] WARNING[16566] asterisk.c: The canary is no more.  He has ceased to be!  He's expired and gone to meet his maker!  He's a stiff!  Bereft of life, he rests in peace.  His metabolic processes are now history!  He's off the twig!  He's kicked the bucket.  He's shuffled off his mortal coil, run down the curtain, and joined the bleeding choir invisible!!  THIS is an EX-CANARY.  (Reducing priority)

# date; chrt -p $(pidof asterisk)
Fri Sep 16 15:42:20 CEST 2016
pid 16560's current scheduling policy: SCHED_RR
pid 16560's current scheduling priority: 10
{noformat}

Note that I'm running as user asterisk, so that may affect how easily the process can switch priority..

By: Walter Doekes (wdoekes) 2016-09-16 09:17:49.231-0500

Okay.. the problem was that sched_setscheduler fixed the astcanary-checking thread, and not the entire process.

If I change it to change {{getpid()}} instead of {{0}} it appears to behave well, but it still doesn't really:
{noformat}
# chrt -p `pidof asterisk`
pid 7220's current scheduling policy: SCHED_RR
pid 7220's current scheduling priority: 10
# date; pkill astcanary
Fri Sep 16 16:01:01 CEST 2016
...
[Sep 16 16:02:49] WARNING[7222] asterisk.c: The canary is no more.  He has ceased to be!  He's expired and gone to meet his maker!  He's a stiff!  Bereft of life, he rests in peace.  His metabolic processes are now history!  He's off the twig!  He's kicked the bucket.  He's shuffled off his mortal coil, run down the curtain, and joined the bleeding choir invisible!!  THIS is an EX-CANARY.  (Reducing priority)

# chrt -p `pidof asterisk`
pid 7220's current scheduling policy: SCHED_OTHER
pid 7220's current scheduling priority: 0

# ps -eLf | grep sbin/asterisk | awk '{print $4}' | xargs -I{} chrt -p {}
pid 7220's current scheduling policy: SCHED_OTHER
pid 7220's current scheduling priority: 0
pid 7223's current scheduling policy: SCHED_RR
pid 7223's current scheduling priority: 10
pid 7224's current scheduling policy: SCHED_RR
pid 7224's current scheduling priority: 10
pid 7225's current scheduling policy: SCHED_RR
pid 7225's current scheduling priority: 10
...
{noformat}

{quote}
Specifying pid as 0 will operate on the attributes of the calling thread, and passing the value returned from a call to getpid(2) will operate on the attributes of the main thread of the thread group.
{quote}
So, neither 0/gettid() nor getpid() is good enough for our purposes.

Perhaps it's easiest to cheat and execve {{chrt -a -p GETPID}}, as I don't see an easy way to find the thread IDs of the running program.
Or, if this is portable across linux kernels, check {{ls /proc/GETPID/task}} and call sched_setscheduler for every tid/numeric-dir found there.

By: Walter Doekes (wdoekes) 2016-09-16 10:15:17.922-0500

This should probably fix things:
https://gerrit.asterisk.org/#/c/3916

By: Walter Doekes (wdoekes) 2016-09-17 11:33:46.042-0500

I'm not sure this has ever worked...

By: Walter Doekes (wdoekes) 2016-10-21 01:44:13.332-0500

[~rnewton] (repeated after lost msg on irc): ASTERISK-26352, ASTERISK-19867, ASTERISK-26358, ASTERISK-23989 aren't closed and tagged as fixed in 11.24.0; probably auto-close-by-jira failure and hence a lack of tagging for release.