Summary: ASTERISK-28845: segfault and then crash
Reporter: Mike (forfx)
Labels: webrtc
Date Opened: 2020-04-20 13:53:18
Date Closed: 2020-05-06 12:00:01
Priority: Major
Regression?:
Status: Closed/Complete
Components: Applications/app_mixmonitor Channels/chan_pjsip PBX/pbx_lua PBX/pbx_realtime
Versions: 16.3.0 16.8.0
Frequency of Occurrence: Frequent
Related Issues:
Environment: 4cpu 4Gb CentOS Linux release 7.7.1908 (Core) vmware ESXi 6.7 A01
Attachments:
( 0) 16_3_details.tar.gz
( 1) 16_8_details.tar.gz
( 2) coredumper1.tar.gz
( 3) crash1.tar.gz
( 4) crash2.tar.gz
Description: several times per day asterisk crashes
Apr 19 21:46:08 localhost kernel: asterisk[16076]: segfault at 4 ip 0000000000000004 sp 00007ffb03c01c78 error 14 in asterisk[400000+2bb000]
Apr 20 09:41:59 localhost kernel: asterisk[26332]: segfault at 7f510c270b7c ip 00007f52149e5f1c sp 00007f5191578be0 error 6 in libasteriskpj.so.2[7f521489f000+18f000]
Apr 20 09:42:23 localhost kernel: asterisk[3549]: segfault at 18 ip 00007fbdcccf6b91 sp 00007fbd0ec30930 error 4 in libasteriskpj.so.2[7fbdccbc0000+18f000]
Apr 20 09:42:43 localhost kernel: asterisk[3746]: segfault at 18 ip 00007f29acfdeb91 sp 00007f29290e9930 error 4 in libasteriskpj.so.2[7f29acea8000+18f000]
Apr 20 10:35:55 localhost kernel: asterisk[8300]: segfault at 7fb5f7f2d31c ip 00007fb82cc29f4a sp 00007fb76cf618b0 error 4 in libasteriskpj.so.2[7fb82cae3000+18f000]
Apr 20 11:53:26 localhost kernel: asterisk[11131]: segfault at 7eff1f663b7c ip 00007f0028054f1c sp 00007effa49e8be0 error 6 in libasteriskpj.so.2[7f0027f0e000+18f000]
Apr 20 13:48:45 localhost kernel: asterisk[26108]: segfault at 7f64961a8f0c ip 00007f66d2f12f4a sp 00007f66135bd640 error 4 in libasteriskpj.so.2[7f66d2dcc000+18f000]
Apr 19 11:40:05 localhost kernel: asterisk[24850]: segfault at 0 ip 00007f32739be3f9 sp 00007f3208bba528 error 4 in liblua-5.3.so[7f32739b6000+38000]
Apr 19 12:13:49 localhost kernel: asterisk[27989]: segfault at fffffffffffffff0 ip 00007f35477b3ac4 sp 00007f34d6abf528 error 7 in liblua-5.3.so[7f35477ab000+38000]
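
Editor's note: the kernel lines above can be decoded a little further. The following is a minimal Python sketch, not part of the original report, that extracts the faulting object, computes the instruction pointer's offset inside the mapping (ip minus base), and decodes the x86 page-fault error code, which the kernel prints in hex. The function name `decode_segfault` and the returned dictionary keys are hypothetical, chosen only for illustration.

```python
import re

# Matches the x86-64 Linux kernel segfault message format seen in the log lines above.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

# x86 page-fault error-code bits: (mask, meaning if set, meaning if clear).
ERROR_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write", "read"),
    (0x04, "user mode", "kernel mode"),
]


def decode_segfault(line):
    """Return a dict describing one kernel segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    err = int(m["err"], 16)  # the kernel prints the error code in hex
    flags = [on if err & bit else off for bit, on, off in ERROR_BITS]
    if err & 0x10:
        flags.append("instruction fetch")
    # The offset is only meaningful when ip actually lies inside the reported mapping.
    in_range = base <= ip < base + size
    return {
        "object": m["obj"],
        "fault_addr": int(m["addr"], 16),
        "offset": ip - base if in_range else None,
        "flags": flags,
    }
```

Applied to the 09:42:23 and 09:42:43 lines, this yields the same offset inside libasteriskpj.so.2 for both, which suggests the same crash site. Such an offset can often be passed to a tool like `addr2line` against the library with debug symbols, assuming the reported mapping starts at file offset 0; a backtrace from a core dump remains the more reliable source.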
Comments:By: Asterisk Team (asteriskteam) 2020-04-20 13:53:19.650-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

By: Joshua C. Colp (jcolp) 2020-04-20 13:57:27.180-0500

Thank you for the crash report. However, we need more information to investigate the crash. Please provide:

1. A backtrace generated from a core dump using the instructions provided on the Asterisk wiki [1].
2. Specific steps taken that lead to the crash.
3. All configuration information necessary to reproduce the crash.

Thanks!

[1]: https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace



By: Mike (forfx) 2020-04-20 13:58:02.661-0500

from ast_coredumper

By: Joshua C. Colp (jcolp) 2020-04-20 14:03:33

What is the load on the system? What is it being used for? How many PJSIP endpoints, AORs, and contacts?

By: Mike (forfx) 2020-04-20 14:10:45.679-0500

There are no specific steps.
I have several servers divided into two groups: an IVR group and an operators group.

The servers that segfault in liblua-5.3.so are from the IVR group (the same crash occurs with liblua-5.1.so).
There are queues without agents.
Incoming calls go to the queues.
An AMI command creates a call to an operators server, which is then bridged with the call from the queue.

The servers that segfault in libasteriskpj.so.2 are from the operators group.
Endpoints are connected via WebRTC.
Incoming calls are redirected to an endpoint.

By: Mike (forfx) 2020-04-20 14:16:21.856-0500

Load average is less than 1 or 2.
On the operators servers there are 5 to 30 endpoints per server.
On the IVR servers there may be up to 100 calls; most of them are redirected to another server, and fewer to the operators servers.
All servers use realtime with MariaDB on a Galera cluster.

By: Kevin Harwell (kharwell) 2020-04-20 15:43:54.152-0500

What's the output of the following:
{noformat}
*CLI> core show settings
*CLI> pjsip show version
*CLI> pjsip dump endpt details
{noformat}
Also can you attach your _config.log_ (should be in the top level source directory if you built from source)?
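
Editor's note: one way to gather the three outputs requested above into a single file is sketched below in Python. It assumes the `asterisk` binary is on the PATH and that a running instance can be reached with `asterisk -rx`; the helper names `build_argv` and `collect` and the output file name are hypothetical.

```python
import subprocess

# The three CLI commands requested in the comment above.
CLI_COMMANDS = [
    "core show settings",
    "pjsip show version",
    "pjsip dump endpt details",
]


def build_argv(command, asterisk_bin="asterisk"):
    """Return the argv that runs one CLI command against a running Asterisk."""
    return [asterisk_bin, "-rx", command]


def collect(out_path="cli_details.txt"):  # hypothetical output file name
    """Run each command and write its output, with a header line, to out_path."""
    with open(out_path, "w") as out:
        for command in CLI_COMMANDS:
            result = subprocess.run(
                build_argv(command), capture_output=True, text=True
            )
            out.write(f"### {command}\n{result.stdout}{result.stderr}\n")
```

Calling `collect()` on the affected server would produce one text file that can be attached to the issue alongside _config.log_.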

By: Mike (forfx) 2020-04-21 03:09:00.039-0500

16_3_details.tar.gz
crash1.tar.gz
crash2.tar.gz
This data is from just after the segfault. Asterisk tried to start 7 times and crashed each time, then started on the 8th try. After 13 minutes it crashed again.

By: Mike (forfx) 2020-04-21 03:13:24.587-0500

I have another 2 virtual machines, clones of the "operators asterisk". They are running without crashes. The only difference is that there is no audio recording on them.

By: Mike (forfx) 2020-04-21 06:48:46.423-0500

16_8_details.tar.gz
From an IVR server: no audio recording, no endpoints.

By: Kevin Harwell (kharwell) 2020-04-21 12:23:26.764-0500

Regarding the "operator group" crashes within pjproject.

Similar crashes have been reported, and for the most part have been fixed in later versions than what you are currently running (certified/16.3-cert1). Please upgrade to the latest version of Asterisk to see if that fixes that problem. If you are still having similar problems after that then there is already an open issue for that (ASTERISK-28161). Please comment there if your crash is related.

By: Kevin Harwell (kharwell) 2020-04-21 12:54:58.059-0500

Regarding the "ivr group" crashes within lua.

Those crashes appear to be down in lua itself, and seem to occur at different places. Makes me think either bad data is being passed in, or some memory is getting corrupted.

Did this just start happening? What might have changed from before? Did you upgrade _lua_? Asterisk (although according to the log _pbx_lua_ has not been modified in a couple of years)? The dialplan?

Anything in the log? Errors, warnings, or notices? If you haven't already, try enabling debug and verbose, and attach those logs and the associated crash.

By: Mike (forfx) 2020-04-21 13:03:38.889-0500

Regarding the "operator group" crashes within pjproject.
First I got the crash in the latest Asterisk, 16.8.0; then I set up another server with Asterisk certified/16.3-cert1 and got the same.
Now I have both 16.8 and 16.3, and they both crash with the same error.


By: Kevin Harwell (kharwell) 2020-04-21 15:18:27.143-0500

Ah, okay. For that crash, please comment on, or add any relevant information to, ASTERISK-28161.

Since there are two separate issues here, we'll use this one to track the _pbx_lua_ crash.

By: Asterisk Team (asteriskteam) 2020-05-06 12:00:00.956-0500

Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidelines [1].

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines