Summary: ASTERISK-29609: Subsequent 'ael reload' will cause a lock up
Reporter: Mark Murawski (kobaz)
Labels:
Date Opened: 2021-08-23 19:59:07
Date Closed: 2021-09-02 14:16:00
Priority: Major
Regression?: No
Status: Closed/Complete
Components: PBX/pbx_ael
Versions: GIT 16.15.1 17.9.3 18.6.0
Frequency of Occurrence: Frequent
Related Issues:
Environment:
Attachments:
( 0) locks-after-15-seconds.txt
( 1) locks-after-30-seconds.txt
( 2) thread-all-bt.txt
Description:
Requirements for reproduction of bug:
- Have a 'decent amount' of AEL. In this case, I have 650 lines of AEL in a variety of files.

Issuing 10 reloads at once is obviously not a 'normal' use case, but this could be triggered when multiple scripts/systems ask Asterisk to reload at the same time.

5 to 10 reloads should do it!
{code}
vbox-markm-x64 {/etc/asterisk/ael} root# for i in `seq 1 10`; do asterisk -rx "ael reload" &  done
[1] 12335
[2] 12336
[3] 12337
[4] 12338
[5] 12339
[6] 12340
[7] 12341
[8] 12342
[9] 12343
[10] 12344
vbox-markm-x64 {/etc/asterisk/ael} root# asterisk -rx "core show locks" | grep "Thread ID" | wc -l
12
{code}

Depending on what else is running at the time (perhaps pjsip adding hints to the dialplan), this will escalate quickly and lead to a very locked-up Asterisk.

The fix is to address the missing reload mutex in pbx_ael.  
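
To illustrate the idea of a reload mutex, here is a minimal sketch in plain POSIX threads. It is not the merged patch and does not use Asterisk's own locking wrappers; `reload_lock`, `do_reload_work`, `reload_thread`, and `run_concurrent_reloads` are hypothetical names invented for this example. It only shows that when every reload takes the same mutex, simultaneous requests run one at a time.

```c
#include <pthread.h>

#define MAX_THREADS 64

/* Hypothetical lock guarding the whole reload path (assumption, not Asterisk code). */
static pthread_mutex_t reload_lock = PTHREAD_MUTEX_INITIALIZER;

static int active_reloads; /* reloads currently inside the critical section */
static int max_active;     /* highest concurrency observed so far */

/* Hypothetical placeholder for parsing the AEL files and merging contexts. */
static void do_reload_work(void)
{
    active_reloads++;
    if (active_reloads > max_active) {
        max_active = active_reloads;
    }
    /* ... the real parse and merge would happen here ... */
    active_reloads--;
}

/* Each simulated 'ael reload' request takes the lock before touching shared state. */
static void *reload_thread(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&reload_lock);
    do_reload_work();
    pthread_mutex_unlock(&reload_lock);
    return NULL;
}

/* Fire n reloads at once; return the peak number that ran concurrently. */
int run_concurrent_reloads(int n)
{
    pthread_t threads[MAX_THREADS];
    int i, started = 0;

    if (n > MAX_THREADS) {
        n = MAX_THREADS;
    }
    active_reloads = 0;
    max_active = 0;

    for (i = 0; i < n; i++) {
        if (pthread_create(&threads[i], NULL, reload_thread, NULL) != 0) {
            break; /* stop launching if the system refuses another thread */
        }
        started++;
    }
    for (i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
    }
    return max_active;
}
```

With the lock in place, `run_concurrent_reloads(10)` is expected to report a peak of 1, mirroring the ten simultaneous `ael reload` requests in the reproduction above. Removing the lock/unlock pair would leave the counters (and, in the real module, the parser state) open to concurrent modification.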
Comments:By: Asterisk Team (asteriskteam) 2021-08-23 19:59:10.549-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

Please note that by submitting data, code, or documentation to Sangoma through JIRA, you accept the Terms of Use present at [https://www.asterisk.org/terms-of-use/|https://www.asterisk.org/terms-of-use/].

By: Benjamin Keith Ford (bford) 2021-08-24 10:18:43.431-0500

Can you get a backtrace [1] for this so we can investigate the locking? DEBUG_THREADS would be helpful too if you are able to turn that option on.

[1]: https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace

By: Mark Murawski (kobaz) 2021-08-24 10:59:25.320-0500

I have a fix for this pending upload to Gerrit.  But certainly I'll be attaching the locks and core dump and logs as well.

By: Benjamin Keith Ford (bford) 2021-08-24 11:37:09.977-0500

Oh, great! I'll assign the issue to you. Thanks for the contribution!

By: Mark Murawski (kobaz) 2021-08-24 11:42:01.400-0500

Also... this issue may cause a crash due to critical code running at the same time without a mutex.

By: Mark Murawski (kobaz) 2021-08-24 11:43:25.317-0500

Also... I've discussed with some of the crew, such as Joshua, the idea of my taking over maintenance of the AEL module, since there hasn't been much activity on it in quite some time.

I have a laundry list of AEL fixes and patches to push in the next few months.

By: Benjamin Keith Ford (bford) 2021-08-24 11:52:53.184-0500

Josh is currently on vacation, but I'm sure that would not be an issue. I'll bring this to his attention when he returns, if this is something you would be OK doing.

By: Mark Murawski (kobaz) 2021-08-24 12:12:42.269-0500

Crash reproduction:

Terminal 1
{code}
for i in `seq 1 20`; do asterisk -rx "ael reload" &  done
{code}

Terminal 2
{code}
Asterisk Ready.
*CLI>     -- Remote UNIX connection
   -- Remote UNIX connection
   -- Remote UNIX connection
   -- Remote UNIX connection
   -- Remote UNIX connection
   -- Remote UNIX connection
   -- Remote UNIX connection
   -- Remote UNIX connection
   -- Remote UNIX connection
   -- Remote UNIX connection
[2021-08-24 13:11:47.262-0400] WARNING[22726]: ael/pval.c:2525 check_pval_item: Warning: file /etc/asterisk/ael/AgiAel/CallQueue/CallQueue.ael, line 53-53: application call to GoSub affects flow of control, and needs to be re-written using AEL if, while, goto, etc. keywords instead!
[2021-08-24 13:11:47.267-0400] ERROR[22728]: ael.flex:489 ael_yylex: File=, line=219, column=12: Mismatched ')' in expression!
[2021-08-24 13:11:47.267-0400] ERROR[22728]: ael.y:840 ael_yyerror: ==== File: (null), Line 275, Cols: 1-1: Error: syntax error, unexpected ';', expecting ')' or ','
[2021-08-24 13:11:47.268-0400] ERROR[22725]: ael.flex:489 ael_yylex: File=, line=219, column=15: Mismatched ')' in expression!
[2021-08-24 13:11:47.268-0400] ERROR[22725]: ael.y:840 ael_yyerror: ==== File: (null), Line 356, Cols: 172-172: Error: syntax error, unexpected ';', expecting ')' or ','
[2021-08-24 13:11:47.268-0400] ERROR[22725]: ael.y:840 ael_yyerror: ==== File: (null), Line 366, Cols: 3-6: Error: syntax error, unexpected word
Segmentation fault

{code}

By: Mark Murawski (kobaz) 2021-08-24 13:26:47.914-0500

Updated review link to pending review

By: Benjamin Keith Ford (bford) 2021-08-27 12:18:18.144-0500

[~kobaz] are you still interested in being the maintainer for AEL? We would be adding your name to the wiki page [1] for extended support. Looks like everyone listed there has an account on the wiki, so if you don't have one, creating one so that others can contact you would be the way to move forward.

[1]: https://wiki.asterisk.org/wiki/display/AST/Asterisk+Open+Source+Maintainers

By: Mark Murawski (kobaz) 2021-08-27 13:59:31.986-0500

My account on the wiki is: kobaz

Thanks.

By: Benjamin Keith Ford (bford) 2021-08-27 14:40:24.135-0500

Perfect. We've got you on the list. Thanks!

By: Friendly Automation (friendly-automation) 2021-09-02 14:16:01.505-0500

Change 16356 merged by George Joseph:
pbx_ael:  Fix crash and lockup issue regarding 'ael reload'

[https://gerrit.asterisk.org/c/asterisk/+/16356|https://gerrit.asterisk.org/c/asterisk/+/16356]

By: Friendly Automation (friendly-automation) 2021-09-02 14:16:10.012-0500

Change 16357 merged by George Joseph:
pbx_ael:  Fix crash and lockup issue regarding 'ael reload'

[https://gerrit.asterisk.org/c/asterisk/+/16357|https://gerrit.asterisk.org/c/asterisk/+/16357]

By: Friendly Automation (friendly-automation) 2021-09-02 14:16:18.837-0500

Change 16358 merged by George Joseph:
pbx_ael:  Fix crash and lockup issue regarding 'ael reload'

[https://gerrit.asterisk.org/c/asterisk/+/16358|https://gerrit.asterisk.org/c/asterisk/+/16358]

By: Friendly Automation (friendly-automation) 2021-09-02 14:16:29.625-0500

Change 16348 merged by George Joseph:
pbx_ael:  Fix crash and lockup issue regarding 'ael reload'

[https://gerrit.asterisk.org/c/asterisk/+/16348|https://gerrit.asterisk.org/c/asterisk/+/16348]