[Home]

Summary:ASTERISK-27155: core dump might be pjsip related?
Reporter:Tony Guadagno (tonyg)Labels:
Date Opened:2017-07-25 14:14:23Date Closed:2017-10-20 12:17:06
Priority:MajorRegression?
Status:Closed/CompleteComponents:pjproject/pjsip
Versions:13.17.0 Frequency of
Occurrence
Occasional
Related
Issues:
Environment:freepbx 14.0.1.1Attachments:( 0) asterisk-21755-testconfig.tar.gz
( 1) asterisk-27155-results.tar.gz
( 2) asterisk-27155-results.tar.gz
( 3) ASTERISK-27155-results.tar.gz
( 4) asterisk-27155-results-2017-07-27-20-22.tar.gz
( 5) asterisk-ASTERISK-27155-withsrtp144-results.tar.gz
( 6) issue-27155.log
( 7) issue-27155.log
Description:hi, after running stable for about 10 days, i got 2 core dumps 3 minutes apart.  i  will upload the tarballs for both
Comments:By: Asterisk Team (asteriskteam) 2017-07-25 14:14:24.796-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Tony Guadagno (tonyg) 2017-07-25 14:22:09.083-0500

back traces using ast_coredumper

By: Tony Guadagno (tonyg) 2017-07-25 18:03:22.636-0500

here is some additional info.  at 15:00:40, the first core dump happened.  in the full log, there seems to be nothing special going on:

[2017-07-25 15:00:26] VERBOSE[24280][C-00000494] pbx.c: Executing [s@macro-hangupcall:4] Hangup("PJSIP/Level3Primary-000005e2", "") in new stack
[2017-07-25 15:00:26] VERBOSE[24280][C-00000494] app_macro.c: Spawn extension (macro-hangupcall, s, 4) exited non-zero on 'PJSIP/Level3Primary-000005e2' in macro 'hangupcall'
[2017-07-25 15:00:26] VERBOSE[24280][C-00000494] pbx.c: Spawn extension (ext-local, h, 1) exited non-zero on 'PJSIP/Level3Primary-000005e2'
[2017-07-25 15:00:28] VERBOSE[24282][C-00000495] app_dial.c: PJSIP/Level3Primary-000005e4 is making progress passing it to PJSIP/5358-000005e3
[2017-07-25 15:00:29] VERBOSE[24282][C-00000495] app_dial.c: PJSIP/Level3Primary-000005e4 is ringing
[2017-07-25 15:00:29] VERBOSE[24282][C-00000495] app_dial.c: PJSIP/Level3Primary-000005e4 answered PJSIP/5358-000005e3
[2017-07-25 15:00:29] VERBOSE[24304][C-00000495] bridge_channel.c: Channel PJSIP/Level3Primary-000005e4 joined 'simple_bridge' basic-bridge <58c5a3c4-15fe-4c70-b022-6318ec5c58c5>
[2017-07-25 15:00:29] VERBOSE[24282][C-00000495] bridge_channel.c: Channel PJSIP/5358-000005e3 joined 'simple_bridge' basic-bridge <58c5a3c4-15fe-4c70-b022-6318ec5c58c5>
[2017-07-25 15:00:41] Asterisk 13.17.0 built by mockbuild @ jenkins7.sangoma.net on a x86_64 running Linux on 2017-07-18 15:22:51 UTC
[Jul 25 15:00:41] VERBOSE[24552] config.c: Parsing '/etc/asterisk/logger.conf': Found
[Jul 25 15:00:41] VERBOSE[24552] config.c: Parsing '/etc/asterisk/logger_general_additional.conf': Found
[Jul 25 15:00:41] VERBOSE[24552] config.c: Parsing '/etc/asterisk/logger_general_custom.conf': Found
[Jul 25 15:00:41] VERBOSE[24552] config.c: Parsing '/etc/asterisk/logger_logfiles_additional.conf': Found
[Jul 25 15:00:41] VERBOSE[24552] config.c: Parsing '/etc/asterisk/logger_logfiles_custom.conf': Found
[2017-07-25 15:00:41] VERBOSE[24552] message.c: Message handler 'dialplan' registered.


By: Tony Guadagno (tonyg) 2017-07-25 18:16:48.841-0500

here is the section of the log from the second core dump, again, nothing special

[2017-07-25 15:03:38] VERBOSE[27123][C-0000002c] pbx.c: Executing [s@macro-dialout-trunk-predial-hook:1] MacroExit("PJSIP/5306-00000032", "") in new stack
[2017-07-25 15:03:38] VERBOSE[27123][C-0000002c] pbx.c: Executing [s@macro-dialout-trunk:18] GotoIf("PJSIP/5306-00000032", "0?bypass,1") in new stack
[2017-07-25 15:03:38] VERBOSE[27123][C-0000002c] pbx.c: Executing [s@macro-dialout-trunk:19] ExecIf("PJSIP/5306-00000032", "1?Set(CONNECTEDLINE(num,i)=16199240338)") in new stack
[2017-07-25 15:03:38] VERBOSE[27123][C-0000002c] pbx.c: Executing [s@macro-dialout-trunk:20] ExecIf("PJSIP/5306-00000032", "1?Set(CONNECTEDLINE(name,i)=CID:+15859244000)") in new stack
[2017-07-25 15:03:38] VERBOSE[27123][C-0000002c] pbx.c: Executing [s@macro-dialout-trunk:21] ExecIf("PJSIP/5306-00000032", "0?Set(CONNECTEDLINE(name,i)=CID:(Hidden)+15859244000)") in new stack
[2017-07-25 15:03:38] VERBOSE[27123][C-0000002c] pbx.c: Executing [s@macro-dialout-trunk:22] GotoIf("PJSIP/5306-00000032", "0?customtrunk") in new stack
[2017-07-25 15:03:38] VERBOSE[27123][C-0000002c] pbx.c: Executing [s@macro-dialout-trunk:23] Dial("PJSIP/5306-00000032", "PJSIP/16199240338@Level3Primary,300,T") in new stack
[2017-07-25 15:03:38] VERBOSE[27123][C-0000002c] app_dial.c: Called PJSIP/16199240338@Level3Primary
[2017-07-25 15:03:41] VERBOSE[27123][C-0000002c] app_dial.c: PJSIP/Level3Primary-00000033 is making progress passing it to PJSIP/5306-00000032
[2017-07-25 15:03:41] VERBOSE[27123][C-0000002c] app_dial.c: PJSIP/Level3Primary-00000033 answered PJSIP/5306-00000032
[2017-07-25 15:03:41] VERBOSE[27139][C-0000002c] bridge_channel.c: Channel PJSIP/Level3Primary-00000033 joined 'simple_bridge' basic-bridge <daba7829-ec5a-49b9-8106-6a4414b7ac46>
[2017-07-25 15:03:41] VERBOSE[27123][C-0000002c] bridge_channel.c: Channel PJSIP/5306-00000032 joined 'simple_bridge' basic-bridge <daba7829-ec5a-49b9-8106-6a4414b7ac46>
[2017-07-25 15:03:47] Asterisk 13.17.0 built by mockbuild @ jenkins7.sangoma.net on a x86_64 running Linux on 2017-07-18 15:22:51 UTC
[Jul 25 15:03:47] VERBOSE[27286] config.c: Parsing '/etc/asterisk/logger.conf': Found
[Jul 25 15:03:47] VERBOSE[27286] config.c: Parsing '/etc/asterisk/logger_general_additional.conf': Found
[Jul 25 15:03:47] VERBOSE[27286] config.c: Parsing '/etc/asterisk/logger_general_custom.conf': Found
[Jul 25 15:03:47] VERBOSE[27286] config.c: Parsing '/etc/asterisk/logger_logfiles_additional.conf': Found
[Jul 25 15:03:47] VERBOSE[27286] config.c: Parsing '/etc/asterisk/logger_logfiles_custom.conf': Found
[2017-07-25 15:03:47] VERBOSE[27286] message.c: Message handler 'dialplan' registered.
[2017-07-25 15:03:47] VERBOSE[27286] pbx_functions.c: Registered custom function 'MESSAGE'
[2017-07-25 15:03:47] VERBOSE[27286] pbx_functions.c: Registered custom function 'MESSAGE_DATA'
[2017-07-25 15:03:47] VERBOSE[27286] pbx_app.c: Registered application 'MessageSend'


By: Joshua C. Colp (jcolp) 2017-07-25 18:23:10.702-0500

These backtraces show the problem being in libsrtp, a third party library. What version is in use?

By: Tony Guadagno (tonyg) 2017-07-25 18:37:08.824-0500

joshua, I am embarrassed to say i am not sure, judging by this, i would say 1.4.4, is there a more scientific way to tell?

[root@uepbx1 ~]# find / -name libsrtp\* -type f
/usr/lib64/libsrtp.so.0.0.0
/usr/share/doc/libsrtp-1.4.4/libsrtp.pdf


By: Tony Guadagno (tonyg) 2017-07-25 22:57:51.745-0500

1.4.4 confirmed

[root@uepbx1 ~]# yum list installed | grep -i srtp
libsrtp.x86_64                          1.4.4-10.20101004cvs.el7  @anaconda/1703


By: Andrew Nagy (tm1000) 2017-07-26 11:01:29.093-0500

Standard Redhat release http://rpm.pbone.net/index.php3/stat/4/idpl/35489540/dir/centos_7/com/libsrtp-1.4.4-10.20101004cvs.el7.x86_64.rpm.html

By: Tony Guadagno (tonyg) 2017-07-27 14:13:41.066-0500

Hi, I just got reports that we core dumped twice today. In both cases, the reporting user said that they were on a conference(external) call for about 15 minutes.. I think this has something to do with how long the call is. Any help would be greatly appreciate

By: Tony Guadagno (tonyg) 2017-07-27 14:24:36.957-0500

more core dumps

By: Tony Guadagno (tonyg) 2017-07-27 22:28:24.680-0500

hi, i have some new info that might shed light on whats going on.  tonight, i conducted some tests.  i noticed that one brand of phone seemed to be involved in all the core dumps we have been experiencing.  most of our phones are grandstream but we have 6 vtech vcs-752's.  after trying different functions, i found that asterisk will core dump every time i resume from hold...but only on the vtech's, this does not happen on the grandstreams.

While i understand the desire to close this ticket and call it a vtech issue, i am asking you not to.  Although, clearly, vtech is part of the issue, i feel that whatever vtech is doing should not result in asterisk core dumping....  i am hoping you will look into this further, if for no other reason than this is a security issue...this would be a very nice DoS attack.  please let me know what you think, here is the log excerpt and the new back trace that goes with it, as you can see, there is not much in the log
thanks

[2017-07-27 20:21:29] VERBOSE[31931][C-00000000] res_musiconhold.c: Started music on hold, class 'default', on channel 'PJSIP/Level3Primary-00000001'
[2017-07-27 20:21:34] VERBOSE[27319] res_pjsip_registrar.c: Added contact 'sips:5371@172.30.2.91:41326;transport=TLS' to AOR '5371' with expiration of 3600 seconds
[2017-07-27 20:21:34] VERBOSE[27324] res_pjsip/pjsip_configuration.c: Contact 5371/sips:5371@172.30.2.91:41326;transport=TLS has been created
[2017-07-27 20:21:34] VERBOSE[27324] res_pjsip/pjsip_configuration.c: Endpoint 5371 is now Reachable
[2017-07-27 20:21:34] VERBOSE[27324] res_pjsip/pjsip_configuration.c: Contact 5371/sips:5371@172.30.2.91:41313;transport=TLS has been deleted
[2017-07-27 20:21:34] VERBOSE[27324] res_pjsip/pjsip_configuration.c: Contact 5371/sips:5371@172.30.2.91:41326;transport=TLS is now Reachable.  RTT: 62.179 msec
[2017-07-27 20:22:03] VERBOSE[31931][C-00000000] res_musiconhold.c: Stopped music on hold on PJSIP/Level3Primary-00000001
[2017-07-27 20:22:08] Asterisk 13.17.0 built by mockbuild @ jenkins7.sangoma.net on a x86_64 running Linux on 2017-07-27 18:40:15 UTC
[Jul 27 20:22:08] VERBOSE[32619] config.c: Parsing '/etc/asterisk/logger.conf': Found
[Jul 27 20:22:08] VERBOSE[32619] config.c: Parsing '/etc/asterisk/logger_general_additional.conf': Found


By: Tony Guadagno (tonyg) 2017-07-27 22:29:02.169-0500

new core dump produce by resuming from hold

By: Rusty Newton (rnewton) 2017-07-28 10:15:31.532-0500

bq. i am hoping you will look into this further, if for no other reason than this is a security issue..

If you think this is a security issue - it is really bad form to report it in a public tracker.

https://wiki.asterisk.org/wiki/display/AST/Asterisk+Security+Vulnerabilities

In the future please follow the guidelines for reporting security issues to the Asterisk team. I've locked down the issue in terms of visibility until we have a chance to triage the issue.

By: Tony Guadagno (tonyg) 2017-07-28 10:18:36.422-0500

rusty, of course you are right, sorry, i did not think it threw.

thanks


By: Rusty Newton (rnewton) 2017-07-28 10:27:32.006-0500

It would help to have a debug log captured during reproduction of the issue. Be sure that the verbose and debug log levels are set to 5 or above. Also make sure the pjsip logger output is included in the log.

https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

In addition please attach the minimal necessary configuration to reproduce the issue, dialplan, endpoint configuration and so on.

Please include a debug log/pjsip trace for a call/hold resume situation from a phone that doesn't trigger the issue and then one from a phone that does trigger the issue.

Are you able to trigger the issue on an unauthenticated call?

By: Tony Guadagno (tonyg) 2017-08-07 09:26:35.282-0500

hi, sorry for the delay, it took me a while to create a test environment.  i have create a new install of freepbx 14.0.1.4 using asterisk 13.17.  in this environment i have 2 phones. one is a yealink (extension 100) and the other is the vtech vcs 752 (extension 101).  Initially, i set them both up with simple udp (no encryption) and could not reproduce the error.  I then enabled encryption on the vtech, I called ext 100 from 101, placed that call on hold and then pressed resume.  at that point asterisk core dumped.  I have the logs you requested, please let me know if you need anything else.

i have attached, a debug log with pjsip logging
core dump back trace.
tar of the /etc/asterisk folder

By: Tony Guadagno (tonyg) 2017-08-07 09:29:39.553-0500

tar file for /etc/asterisk
back trace of core dump
debug log with pjsip on


By: George Joseph (gjoseph) 2017-08-07 10:40:51.109-0500

Unfortunately, there is no debug info in any of the stack traces and that version of libsrtp is 7 years old.  

Can you uninstall the packaged libsrtp and install libsrtp 1.5.x from source and see if you can reproduce the issue.  You _may_ have to recompile asterisk as well.  On our CentOS systems we always compile libsrtp 1.5.x from source.  Earlier versions have had issues.






By: Tony Guadagno (tonyg) 2017-08-07 11:20:35.942-0500

George, I thought I followed the instructions on how to get the logs, I will try again

On the issue of libsrtp Andrew commented that 1.4.4 was the redhat standard, is this not the case?

Andrew can you clarify?

By: George Joseph (gjoseph) 2017-08-07 11:44:36.543-0500

Make sure that when asterisk is compiled, you have "DONT_OPTIMIZE" and "BETTER_BACKTRACES" turned on.

1.4.4 is what's currently in the repo for CentOS and probably RHEL.  For Fedora, it's been 1.5.4 for a while.   In fact, the 1.4.x branch isn't even on the libsrtp github site.  It's only in the archives.






By: Tony Guadagno (tonyg) 2017-08-07 12:40:26.882-0500

updated log with verbose and debug 5

By: Tony Guadagno (tonyg) 2017-08-10 13:10:49.971-0500

this is a new backtrace, with compiler flags for better backtraces

By: Tony Guadagno (tonyg) 2017-08-11 13:57:00.320-0500

george, did this backtrace have everything you needed?
please let me know
thanks

By: George Joseph (gjoseph) 2017-08-14 08:44:15.405-0500

Sorry Tony.  I'm scrum master this sprint and we're down a few folks so things are a bit stretched.  I'll try and look this afternoon.


By: Tony Guadagno (tonyg) 2017-08-14 08:54:11.174-0500

george, thanks for checking in, i appreciate all your efforts.

By: Rusty Newton (rnewton) 2017-10-16 16:11:53.205-0500

[~gjoseph] Was there anything else we can do here? I think last time we discussed this one it looked like it was a libsrtp issue and not an issue in Asterisk itself.



By: Tony Guadagno (tonyg) 2017-10-17 10:40:13.398-0500

rusty, if I could add a comment, while it is clear that there is an issue with that version (1.44) of libsrtp, i think that asterisk needs to do something to protect itself.  freepbx and others are putting asterisk out there with libsrtp 1.44 and i believe it is a security risk (as stated above) not to defend against this core dump.

thanks


By: Joshua C. Colp (jcolp) 2017-10-20 12:16:47.223-0500

Unfortunately there is nothing within Asterisk we can explicitly do to protect against this issue, besides requiring everyone to update (and based on what I've seen this issue impacts very few people). The issue itself is due to a problem inside of libsrtp itself, in areas we do not have access to in Asterisk.

By: Richard Mudgett (rmudgett) 2017-11-14 17:15:27.161-0600

See https://wiki.asterisk.org/wiki/display/AST/libsrtp about libsrtp and Asterisk support.

By: xrobau (xrobau) 2017-11-14 18:13:53.586-0600

RHEL bug: https://bugzilla.redhat.com/show_bug.cgi?id=1301202


By: Corey Farrell (coreyfarrell) 2017-11-15 13:25:47.463-0600

* RHEL/CentOS 6 don't directly include libsrtp, EPEL6 has 1.5.4.
* All current Fedora's have 1.5.4.
* debian backported both fixes to 1.4.4 and 1.4.5 for older debians.

Redhat 7 includes libsrtp in the base distribution and both CVE's are listed as 'will not fix': [CVE-2013-2139|https://access.redhat.com/security/cve/cve-2013-2139] and [CVE-2015-6360|https://access.redhat.com/security/cve/cve-2015-6360].

By: Andrew Nagy (tm1000) 2017-11-15 13:29:11.509-0600

In the Sangoma/FreePBX distro (as CentOS 7 Variant) world we are now distributing libsrtp 1.6.