Summary: ASTERISK-29210: res_pjsip: Crash when examining transport
Reporter: N GM (ngm)
Labels:
Date Opened: 2020-12-10 09:55:48.000-0600
Date Closed: 2021-01-06 18:21:50.000-0600
Priority: Minor
Regression?:
Status: Closed/Complete
Components: Channels/chan_pjsip
Versions: 18.1.0
Frequency of Occurrence: Constant
Related Issues: is duplicated by ASTERISK-29218 res_pjsip: segfault during UDP registration when flow transport is configured
Environment: Debian 10 (buster) on armv7l, dual core, 1400 MHz CPU, single SIP line.
Attachments:
(0) backtrace.txt
(1) debug_log_12_11_20
(2) gdb.txt
(3) pjsip.conf
Description: When starting Asterisk in daemon mode, using /usr/sbin/asterisk or /etc/init.d/asterisk start, the daemon starts with no errors, but within a few seconds ps shows that the daemon has crashed and that the process /usr/sbin/asterisk is no longer present. When started with the -g option, it produces a core dump. When started with /usr/sbin/safe_asterisk, it produces continuous core dumps in /tmp.

Interestingly, when started in CLI mode in foreground with /usr/sbin/asterisk -cvvvv, so far it has remained stable with no auto-shutdown or no core dump. SIP connection to SIP provider is established and voice calls work. More testing is perhaps needed to see if core dumps can be replicated in foreground.

/var/log/asterisk/messages does not show any errors/warnings at the point of daemon crash/core dump.

All testing is currently at halt and rollout to production status is on hold, due to these continuous core dumps.
Comments:

By: Asterisk Team (asteriskteam) 2020-12-10 09:55:50.667-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

Please note that by submitting data, code, or documentation to Sangoma through JIRA, you accept the Terms of Use present at [https://www.asterisk.org/terms-of-use/|https://www.asterisk.org/terms-of-use/].

By: Asterisk Team (asteriskteam) 2020-12-10 09:55:51.769-0600

The severity of this issue has been automatically downgraded from "Blocker" to "Major". The "Blocker" severity is reserved for issues which have been determined to block the next release of Asterisk. This severity can only be set by privileged users. If this issue is deemed to block the next release it will be updated accordingly during the triage process.

By: Joshua C. Colp (jcolp) 2020-12-10 10:01:59.836-0600

Thank you for the crash report. However, we need more information to investigate the crash. Please provide:

1. A backtrace generated from a core dump using the instructions provided on the Asterisk wiki [1].
2. Specific steps taken that lead to the crash.
3. All configuration information necessary to reproduce the crash.

Thanks!

[1]: https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace



By: N GM (ngm) 2020-12-10 10:04:19.499-0600

When trying to create a backtrace from the core dump, I get the following error

/var/lib/asterisk/scripts/ast_coredumper asterisk.1607612408.4454.11.core
/var/lib/asterisk/scripts/ast_coredumper: line 362: /dev/fd/62: No such file or directory

Please advise on how to fix this issue with ast_coredumper, or alternate ways to get you details of the core dump. I currently have 3 back to back core dumps produced by safe_asterisk in /tmp. Thanks!

By: Joshua C. Colp (jcolp) 2020-12-10 10:04:55.933-0600

As well, OpenWrt on ARM is not a supported platform for Asterisk; it is community supported, so if the problem is isolated to that, the response time would reflect that.

By: Joshua C. Colp (jcolp) 2020-12-10 10:06:00.092-0600

If ast_coredumper isn't working you can try older instructions[1].

[1] https://wiki.asterisk.org/wiki/pages/viewpage.action?pageId=5243139#GettingaBacktrace(Asteriskversions%3C13.14.0and14.3.0)-GettingInformationAfterACrash
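
Editor's note: the manual route generally comes down to a single batch-mode gdb run against the Asterisk binary and the core file. The sketch below only builds and prints such a command line. The binary path and the /tmp location are assumptions based on the paths mentioned in this issue, and the core file name is the one from the earlier comment. The wiki page linked above remains the authoritative procedure.

```python
import shlex


def gdb_backtrace_command(binary, core_file):
    """Build a gdb argv that prints backtraces from a core file and exits."""
    return [
        "gdb",
        "-se", binary,                 # use this file for symbols and as the executable
        "-ex", "bt full",              # full backtrace of the current (crashing) thread
        "-ex", "thread apply all bt",  # backtrace of every thread
        "--batch",                     # run the commands above, then exit
        "-c", core_file,               # core dump to examine
    ]


# Assumed paths: adjust to where the binary and the core actually live.
cmd = gdb_backtrace_command("/usr/sbin/asterisk",
                            "/tmp/asterisk.1607612408.4454.11.core")
print(shlex.join(cmd))
```

Redirecting the output of the printed command to a file gives something that can be attached to the issue.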

By: N GM (ngm) 2020-12-10 10:24:41.726-0600

This is the output of the manual backtrace process you sent me, for the first core dump.

Please let me know if you need this information from the other two core dumps I have in /tmp, which were created back-to-back within seconds of this first core dump.

Thanks!

By: Joshua C. Colp (jcolp) 2020-12-10 10:37:15.263-0600

Do you have the complete console output at startup?

By: N GM (ngm) 2020-12-10 20:19:57.235-0600

Can you please let me know where I would find this console output? This crash/core dump so far appears to only happen when in background.

By: Joshua C. Colp (jcolp) 2020-12-11 03:52:37.449-0600

Asterisk can log to the /var/log/asterisk/messages and /var/log/asterisk/full files. There is documentation on the wiki[1] for how to configure it to get this information.

[1] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

By: N GM (ngm) 2020-12-11 09:20:08.269-0600

Thank you for the instructions on how to capture console details to a log.

After changing logger.conf as per the instructions, I started asterisk in foreground (which typically does not and in this case did not crash), updated verbose levels at CLI as per the instructions, and then exited normally.

After that I started asterisk 3 times in background, and in all 3 cases it crashed.

Attached is the log file capturing that one foreground normal start/exit, and the subsequent three background start/crash.

By: N GM (ngm) 2020-12-11 09:22:47.072-0600

BTW, I just noticed that this issue's title is currently "res_pjsip: Crash when examining transport", which was not what I had put in yesterday, since I had no idea if the crash was related to res_pjsip and I am not sure what "examining transport" is. Did you update the title? Thanks.

By: Joshua C. Colp (jcolp) 2020-12-11 09:26:50.327-0600

I did update the title based on initial analysis of the crash.

By: Joshua C. Colp (jcolp) 2020-12-11 09:28:08.276-0600

Are you sure this log was done according to the documentation? It is usually substantially larger and contains a ton more messages.

By: N GM (ngm) 2020-12-11 09:41:47.997-0600

Yes. I too was surprised by how short it was. Since it crashes almost immediately after starting in background, I assumed that may be the reason for it.

Here are the steps I followed. Please let me know if I made any errors.

1. Update logger.conf in /etc/asterisk with the line "debug_log_12_11_20 => notice,warning,error,debug,verbose,dtmf" in [logger] section.

2. Start asterisk in foreground and enter the following 4 commands.

core set verbose 5
core set debug 5
module reload logger
pjsip set logger on

3. Exit CLI with "core stop now"

4. Start asterisk in background with "asterisk -g", check if it crashes with ps. Repeat 3 times.

5. Copy /var/log/asterisk/debug_log_12_11_20 to send to you.
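
Editor's note: the "check if it crashes with ps" part of step 4 can be scripted. The following is a minimal sketch, assuming a procps-style `ps` that accepts `-eo comm=` (command names only, no header). The function names are illustrative and not part of Asterisk.

```python
import subprocess


def asterisk_running(ps_output):
    """Return True if a process named exactly 'asterisk' is listed.

    `ps_output` is expected to be the output of `ps -eo comm=`,
    i.e. one command name per line. The exact match means a
    'safe_asterisk' wrapper on its own does not count.
    """
    return any(line.strip() == "asterisk" for line in ps_output.splitlines())


def check_once():
    """Ask ps for the command name of every process and look for asterisk."""
    result = subprocess.run(["ps", "-eo", "comm="],
                            capture_output=True, text=True, check=True)
    return asterisk_running(result.stdout)
```

Calling `check_once()` a few seconds after `asterisk -g` returns `False` if the daemon has already gone away.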

By: Sean Bright (seanbright) 2020-12-11 09:43:49.507-0600

Try starting asterisk like this: {{asterisk -cvvvvvvvvvvvvvvvvvvvvvvvvvvvdddddddddddddddg}}

By: N GM (ngm) 2020-12-11 09:46:01.153-0600

Sure, I can try that. But wouldn't that command start asterisk in foreground? It does not crash in foreground, only in background. Do you want to capture the console log in the debug file with this command, for the situation where it does not crash?

By: N GM (ngm) 2020-12-11 09:53:37.707-0600

I ran the command "asterisk -cvvvvvvvvvvvvvvvvvvvvvvvvvvvdddddddddddddddg" and it crashed with a "Segmentation fault (core dumped)" after a ton of messages!

Let me upload the updated debug file. It is quite large this time, compared to last time.

By: N GM (ngm) 2020-12-11 09:57:43.571-0600

Attached is the updated debug file, after the Segmentation Fault (core dumped) message.

By: Sean Bright (seanbright) 2020-12-11 10:21:23.875-0600

Please attach your {{pjsip.conf}}. If it contains usernames/passwords you should redact those before attaching.

By: N GM (ngm) 2020-12-11 11:38:58.216-0600

Attached is pjsip.conf with personal information replaced with "****"

By: Joshua C. Colp (jcolp) 2020-12-15 04:19:45.794-0600

If you remove Google Voice, for testing purposes, does the crash disappear?

By: N GM (ngm) 2020-12-15 12:27:26.434-0600

How do I realistically test that scenario? I do not have any other SIP provider account to test it with.

Also, can you share some technical details about what you have found so far from the dump details I have provided? I am a programmer, and may be of more help to you in testing, if I understand your preliminary analysis.

Thx.

By: Joshua C. Colp (jcolp) 2020-12-15 12:30:42.968-0600

I don't have anything concrete. The question was to narrow down the specific cause. This issue isn't being experienced by others, and Google Voice is barely used by anyone any longer - so that would stick out. If not using it causes it to go away, then it's something in that area.

By: Joshua C. Colp (jcolp) 2020-12-15 12:31:21.323-0600

I should also add that I am not actively working on this issue. I am working to triage it so that someone could work on it in the future.

By: N GM (ngm) 2020-12-15 13:03:49.106-0600

Thanks for that update. Can @Sean Bright (who appears to be a programmer) and who has interacted with me on this issue, perhaps comment on what his preliminary thoughts are?

For what it's worth, I checked with some other folks who compiled the exact same source code that I compiled and are using it with Google Voice, but on an x86 platform; they do not have this issue. I also compiled the exact same source code under WSL/Ubuntu (i.e. x86), with the exact same pjsip.conf that I sent you, and connected to GV, and again there were no issues.

So I don't think from a preliminary perspective, that it may be the GV part which is the problem - feels more like a platform/library issue. Hence, I was curious if a programmer like @Sean Bright can shed some light from the dumps, to narrow it down.

Thx.

By: Joshua C. Colp (jcolp) 2020-12-16 04:05:08.543-0600

A platform/usage patterns/CPU can all contribute to exposing an issue that is possible on everything but may not be easily caused on other systems. Disregarding Google Voice as a factor just because it's not happening on other systems is a bad idea for that reason.

By: N GM (ngm) 2020-12-16 12:39:18.197-0600

Understood. I was just trying to share additional information, if that is of any help to you. At the same time, I was looking for some insights from your end, on preliminary analysis of the dumps that I sent to you/@Sean Bright.

Can you give me some guidance on how to test removing of GV, since I don't have another SIP provider to configure in its place?

Thanks.



By: Joshua C. Colp (jcolp) 2020-12-16 12:42:02.329-0600

If it's as easy as starting Asterisk with Google Voice configured, then starting Asterisk without it would narrow it down.

By: N GM (ngm) 2020-12-16 13:36:48.118-0600

Got it. Let me try it out.

By: N GM (ngm) 2020-12-16 14:27:47.728-0600

I replaced pjsip.conf with the pjsip.conf which comes in the sample folder (which appears to be essentially an empty pjsip.conf, since every line appears to be commented in it)

I then restarted asterisk in background, and it has not crashed in the last 30 minutes.

By: Joshua C. Colp (jcolp) 2020-12-16 14:38:46.188-0600

That eliminates too much, since you've now eliminated your phone registering as well. This is why I specifically mentioned removing Google Voice, since it was a specific aspect that was uncommon.

By: N GM (ngm) 2020-12-16 18:43:54.320-0600

Sorry, I misunderstood earlier.

I removed the gvsip1 sections in my pjsip.conf and restarted asterisk in background. It has not crashed in the last 30 minutes. Phone is registered in Asterisk, under "pjsip show contacts".
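
Editor's note: removing a group of sections like this can be done by hand or with a small filter. The sketch below is one possible approach, assuming the Google Voice sections all have names beginning with `gvsip1`. Only that prefix comes from this issue; the sample configuration and its other section names are hypothetical.

```python
import re

# A section header looks like "[name]", optionally followed by "(template)".
SECTION_RE = re.compile(r"^\s*\[([^\]]+)\]")


def drop_sections(conf_text, prefix):
    """Remove every [section] whose name starts with `prefix`, with its body."""
    kept = []
    skipping = False
    for line in conf_text.splitlines(keepends=True):
        match = SECTION_RE.match(line)
        if match:
            # A new header decides whether the following lines are kept.
            skipping = match.group(1).startswith(prefix)
        if not skipping:
            kept.append(line)
    return "".join(kept)


# Hypothetical sample; only the "gvsip1" prefix is taken from the issue.
sample = """\
[transport-udp]
type=transport
protocol=udp

[gvsip1]
type=endpoint
transport=transport-tls

[gvsip1-auth]
type=auth

[phone1]
type=endpoint
"""

print(drop_sections(sample, "gvsip1"))
```

Sections that do not share the prefix, such as a TLS transport used only by the removed endpoint, are left in place and would need to be removed separately.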

By: N GM (ngm) 2020-12-23 16:03:20.590-0600

I am not sure if you got my last feedback from 7 days ago. I did as you said and removed Google Voice. After starting in background, it did not crash. FYI, the GV setup I have uses TLS for transport, so when I removed Google Voice for this test, TLS also got removed. So we now have two variables changed, GV and TLS, and the segfault has disappeared. Hopefully this helps you narrow down the issue to these two, in addition to the other material I have provided. Please let me know if you need any additional information or tests to be done. Thanks and happy holidays!

By: Joshua C. Colp (jcolp) 2020-12-26 18:10:57.090-0600

[~naf] has a change up for review to resolve the issue. It was isolated to Google Voice functionality.