
Summary: ASTERISK-25439: Segfault in find_entry () from /usr/lib/libpj.so.2 (dns_resolver, qualify_contact)
Reporter: Dmitriy Serov (Demon)
Labels:
Date Opened: 2015-10-01 04:08:26
Date Closed: 2018-10-09 17:08:01
Priority: Major
Regression? No
Status: Closed/Complete
Components: Resources/res_pjsip
Versions: 13.5.0
Frequency of Occurrence: Frequent
Related Issues:
duplicates ASTERISK-25638 pjsip: Deadlock between monitor thread and worker threads
is duplicated by ASTERISK-26310 Crash occurs every 24 - 48 hours with backtrace log showing fault related to pjsip hash
Environment:
# Package Information for pkg-config
Name: libpjproject
Description: Multimedia communication library
URL: http://www.pjsip.org
Version: 2.4.5
Libs: -L${libdir} -lpjsua2 -lstdc++ -lpjsua -lpjsip-ua -lpjsip-simple -lpjsip -lpjmedia-codec -lpjmedia -lpjmedia-videodev -lpjmedia-audiodev -lpjmedia -lpjnath -lpjlib-util -lilbccodec -lg7221codec -lsrtp -lgsm -lspeex -lspeexdsp -lpj -lssl -lcrypto -luuid -lm -lrt -lpthread
Cflags: -I${includedir} -I/usr/include -DPJ_AUTOCONF=1 -O2 -DNDEBUG -DPJ_IS_BIG_ENDIAN=0 -DPJ_IS_LITTLE_ENDIAN=1 -fPIC
Attachments:
( 0) 2015_09_30__20_02_07.backtrace-threads.txt
( 1) 2015_09_30__20_02_07.full.tail.txt
( 2) 2015_10_01__11_50_08.backtrace-threads.txt
( 3) 2015_10_01__11_50_08.full.tail.txt
( 4) 2015_10_01__13_14_07.backtrace-threads.txt
( 5) 2015_10_01__13_14_07.full.tail.txt
( 6) 2016_01_10__13_08_08.backtrace-threads.txt
( 7) 2016_01_10__13_08_08.full.tail.txt
( 8) 2016_01_10__22_20_01.backtrace-threads.txt
( 9) 2016_01_10__22_20_01.full.tail.txt
(10) 2016_01_10__22_20_01.locks.txt
(11) 2016_01_11__22_56_01.full.tail.txt
(12) 2016_01_11__22_56_01.locks.txt
(13) 2016_01_12__00_04_07.backtrace-threads.txt
(14) 2016_01_12__13_41_01.locks.txt
(15) 2016_01_12__13_42_01.full.tail.txt
(16) 2016_01_12__13_42_01.locks.txt
(17) 2016_01_12__15_43_01.full.tail.txt
(18) 2016_01_12__15_43_01.locks.txt
(19) 2016_01_13__20_18_07.backtrace-threads.txt
(20) 2016_01_13__20_18_07.full.tail.txt
(21) 2016_02_11__05_56_08.backtrace-threads.txt
Description: Segfault in find_entry().
pjproject: 2.4.5
PJSIP_MAX_URL_SIZE modified to 1024
Comments:

By: Asterisk Team (asteriskteam) 2015-10-01 04:08:28.655-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Dmitriy Serov (Demon) 2015-10-01 04:09:37.115-0500

Backtraces and log tails for two cases are attached.

By: Dmitriy Serov (Demon) 2015-10-01 06:14:38.305-0500

Same segfault stack, but judging by the log there were Internet connectivity problems at the time.
I am using a local BIND daemon with forwarders { 8.8.8.8; 8.8.4.4; }

By: Dmitriy Serov (Demon) 2016-01-10 08:01:32.699-0600

Asterisk segfaults every day :(
Today's logs are attached.

By: Dmitriy Serov (Demon) 2016-01-10 13:37:26.842-0600

From some tests, it seems the segfault is the result of a deadlock (file: 2016_01_10__22_20_01.locks.txt).
Backtrace: 2016_01_10__22_20_01.backtrace-threads.txt
Log tail: 2016_01_10__22_20_01.full.tail.txt

Maybe the cause is:
[2016-01-10 22:18:28] ERROR[367] netsock2.c: getaddrinfo("sip.ukrtel.net", "(null)", ...): System error
[2016-01-10 22:18:29] ERROR[367] netsock2.c: getaddrinfo("sip.ukrtel.net", "(null)", ...): System error
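With glibc, "System error" is the string gai_strerror() returns for EAI_SYSTEM, which means the real cause was left in errno and is not shown in these two lines. A minimal sketch of a resolve wrapper that logs errno alongside the gai_strerror() text, assuming (not verified against netsock2.c) that Asterisk builds this message with gai_strerror(); describe_gai_error() and resolve_verbose() are hypothetical names, not Asterisk functions.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: format a getaddrinfo() failure into buf.
 * For EAI_SYSTEM, gai_strerror() only says "System error"; the actual
 * cause is in errno, so append it. */
void describe_gai_error(int rc, int saved_errno, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    if (rc == EAI_SYSTEM) {
        snprintf(buf, len, "%s (errno %d: %s)", gai_strerror(rc),
                 saved_errno, strerror(saved_errno));
    } else {
        snprintf(buf, len, "%s", gai_strerror(rc));
    }
}

/* Hypothetical wrapper: resolve host, printing a detailed error on failure.
 * Returns 0 on success, the getaddrinfo() error code otherwise. */
int resolve_verbose(const char *host)
{
    struct addrinfo hints, *res = NULL;
    char msg[256];
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_DGRAM;

    errno = 0;
    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0) {
        /* Capture errno immediately; later library calls may overwrite it. */
        describe_gai_error(rc, errno, msg, sizeof(msg));
        fprintf(stderr, "getaddrinfo(\"%s\", ...): %s\n", host, msg);
        return rc;
    }
    freeaddrinfo(res);
    return 0;
}
```

Calling resolve_verbose("sip.ukrtel.net") during an outage would then print the errno detail instead of a bare "System error"; numeric hosts such as "127.0.0.1" are parsed without any DNS query.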


By: Dmitriy Serov (Demon) 2016-01-11 15:30:23.902-0600

Once again :(
2016_01_12__00_04_07.backtrace-threads.txt
In this case there are no locks, just segmentation faults.

By: Dmitriy Serov (Demon) 2016-01-11 16:04:58.525-0600

Another recurring problem (maybe related):
Asterisk periodically hangs: 2016_01_11__22_56_01.locks.txt
In locks:
__ast_bt_get_addresses
res_pjsip/pjsip_options.c:391 qualify_contact()

In 2016_01_11__22_56_01.full.tail.txt:
before [2016-01-11 22:55:47], contacts are both created and deleted;
after [2016-01-11 22:55:47], contacts are only deleted.
Asterisk hung and was killed with kill -9 one minute later.

By: George Joseph (gjoseph) 2016-01-11 17:26:42.349-0600

Possible cause...
pjproject defines a maximum hostname size of 128 (which is way too short in my opinion). If it's passed a hostname longer than that, it segfaults.

Can you try compiling pjproject with PJ_MAX_HOSTNAME set to something very large and see if the issue still happens?

I'm going to put a check in Asterisk anyway to prevent us from sending the hostname if it's too long.
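A minimal sketch of the kind of guard described here, not the check actually committed to Asterisk. It assumes the limit counts the terminating NUL (not confirmed here), takes the default of 128 from the comment above, and uses the hypothetical function name hostname_fits().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed limit: 128 is the pjproject default quoted in this issue.
 * Compiling with -DPJ_MAX_HOSTNAME=1024 would raise it here as well. */
#ifndef PJ_MAX_HOSTNAME
#define PJ_MAX_HOSTNAME 128
#endif

/* Hypothetical guard: true if host fits in a PJ_MAX_HOSTNAME-byte buffer,
 * assuming the limit includes the terminating NUL. */
bool hostname_fits(const char *host)
{
    assert(host != NULL);
    /* strnlen() stops at the limit, so an oversized string is never
     * scanned past PJ_MAX_HOSTNAME bytes. */
    return strnlen(host, PJ_MAX_HOSTNAME) < PJ_MAX_HOSTNAME;
}
```

A caller would log a warning and skip the lookup when hostname_fits() returns false, rather than handing the string to pjproject.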

By: Dmitriy Serov (Demon) 2016-01-12 03:21:23.420-0600

pjproject was configured with:
./configure --prefix=/usr --enable-shared --disable-sound --disable-resample --disable-video --disable-opencore-amr --with-external-speex \
--with-external-srtp=/usr/src/programs/srtp --with-external-gsm CFLAGS="-O2 -DNDEBUG -DPJSIP_MAX_URL_SIZE=1024 -DPJ_MAX_HOSTNAME=1024"

Asterisk was rebuilt and restarted. Monitoring is ongoing :) If the segfault recurs, I will comment immediately.

I very much doubt that a hostname actually exceeded 128 characters. Is there an example of such a host in the log?
In any case, I'll be glad if this change helps.

By: George Joseph (gjoseph) 2016-01-12 09:35:05.766-0600

The length was just something I ran across while looking at pjproject, and I thought it might be worth a try.
Still looking.


By: George Joseph (gjoseph) 2016-01-12 10:46:07.235-0600

Do you have DETECT_DEADLOCKS turned on when you compile Asterisk? If not, can you try it? Maybe it'll give us more info.


By: Dmitriy Serov (Demon) 2016-01-12 15:08:24.117-0600

More lock files attached.
Two hangs, at 13:41 and 15:43.

By: Dmitriy Serov (Demon) 2016-01-13 12:44:54.887-0600

Segfaults in the same place.
pjproject 2.4.5, configured with:
./configure --prefix=/usr --enable-shared --disable-sound --disable-resample --disable-video --disable-opencore-amr --with-external-speex \
--with-external-srtp=/usr/src/programs/srtp --with-external-gsm CFLAGS="-O2 -DNDEBUG -DPJSIP_MAX_URL_SIZE=1024 -DPJ_MAX_HOSTNAME=1024"

2016_01_13__20_18_07.backtrace-threads.txt
2016_01_13__20_18_07.full.tail.txt

By: Dmitriy Serov (Demon) 2016-01-22 12:43:42.282-0600

I guess ASTERISK-25638 is related.

By: Dmitriy Serov (Demon) 2016-02-04 03:20:58.825-0600

Disabling the "response cache" in "pjproject-2.4.5/pjlib-util/src/pjlib-util/resolver.c" completely eliminated the problem.
I'm sure this cache in pjproject and its ref_cnt do not work correctly in a multithreaded environment under even slightly higher load.
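That disabling the cache removes the crash is consistent with a use-after-free: one thread finds a cached entry while another drops its last reference and frees it, so the first thread then touches freed memory. A minimal sketch of the usual fix pattern, under stated assumptions: a linked list stands in for the resolver's real hash table, and every name (cache_entry, cache_put(), cache_get(), cache_evict(), cache_release()) is hypothetical, not pjproject's resolver.c code. The two rules it shows are that lookup and ref_cnt++ happen under one lock, and that an entry is freed only after it has been unlinked and its count reaches zero.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cached DNS response, one per query name. */
struct cache_entry {
    char name[128];
    int ref_cnt;                 /* protected by cache_lock */
    struct cache_entry *next;
};

static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static struct cache_entry *cache_head = NULL;

/* Find and take a reference in ONE critical section, so no other thread
 * can drop the last reference between "found" and "ref_cnt++". */
struct cache_entry *cache_get(const char *name)
{
    struct cache_entry *e;

    pthread_mutex_lock(&cache_lock);
    for (e = cache_head; e != NULL; e = e->next) {
        if (strcmp(e->name, name) == 0) {
            e->ref_cnt++;
            break;
        }
    }
    pthread_mutex_unlock(&cache_lock);
    return e;
}

/* Insert a new entry; the list itself owns one reference. */
struct cache_entry *cache_put(const char *name)
{
    struct cache_entry *e = calloc(1, sizeof(*e));

    if (e == NULL)
        return NULL;
    strncpy(e->name, name, sizeof(e->name) - 1); /* calloc keeps it NUL-terminated */
    e->ref_cnt = 1;

    pthread_mutex_lock(&cache_lock);
    e->next = cache_head;
    cache_head = e;
    pthread_mutex_unlock(&cache_lock);
    return e;
}

/* Drop one reference; free only when nobody (including the list) holds it. */
void cache_release(struct cache_entry *e)
{
    int remaining;

    pthread_mutex_lock(&cache_lock);
    remaining = --e->ref_cnt;
    pthread_mutex_unlock(&cache_lock);
    assert(remaining >= 0);
    if (remaining == 0)
        free(e);
}

/* Unlink name (e.g. on TTL expiry), then drop the list's reference.
 * Readers that still hold the entry keep it alive. */
void cache_evict(const char *name)
{
    struct cache_entry **pp, *victim = NULL;

    pthread_mutex_lock(&cache_lock);
    for (pp = &cache_head; *pp != NULL; pp = &(*pp)->next) {
        if (strcmp((*pp)->name, name) == 0) {
            victim = *pp;
            *pp = victim->next;
            break;
        }
    }
    pthread_mutex_unlock(&cache_lock);
    if (victim != NULL)
        cache_release(victim);
}
```

A reader calls cache_get(), uses the entry, then calls cache_release(); eviction may run concurrently without freeing memory that is still in use.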

By: Dmitriy Serov (Demon) 2016-02-12 01:20:41.932-0600

2016_02_11__05_56_08.backtrace-threads.txt

Backtrace of the same find_entry() in the hash table, but from a different call stack.
I guess this indicates multithreading issues in the hash table code.
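If these crashes come from two threads walking and modifying the same hash table without a common lock, the usual remedy is to serialize every lookup against every modification. A minimal sketch using a read/write lock around a hypothetical chained hash table (locked_hash, lh_init(), lh_find(), lh_set()); it is not pjlib's pj_hash API, and whether pjproject's own callers hold the right lock is exactly the open question in this issue.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64u

/* Hypothetical chained hash table node. */
struct hnode {
    char *key;
    void *value;
    struct hnode *next;
};

struct locked_hash {
    pthread_rwlock_t lock;            /* guards every bucket chain */
    struct hnode *buckets[NBUCKETS];
};

/* FNV-1a string hash. */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

void lh_init(struct locked_hash *t)
{
    memset(t->buckets, 0, sizeof(t->buckets));
    pthread_rwlock_init(&t->lock, NULL);
}

/* Lookups may run in parallel with each other, but never while a writer
 * relinks a chain; an unsynchronized find racing a writer can follow a
 * dangling pointer. */
void *lh_find(struct locked_hash *t, const char *key)
{
    void *result = NULL;
    struct hnode *n;

    pthread_rwlock_rdlock(&t->lock);
    for (n = t->buckets[fnv1a(key) % NBUCKETS]; n != NULL; n = n->next) {
        if (strcmp(n->key, key) == 0) {
            result = n->value;
            break;
        }
    }
    pthread_rwlock_unlock(&t->lock);
    return result;
}

/* Insert or overwrite; returns 0 on success, -1 on allocation failure. */
int lh_set(struct locked_hash *t, const char *key, void *value)
{
    uint32_t b = fnv1a(key) % NBUCKETS;
    struct hnode *n;

    pthread_rwlock_wrlock(&t->lock);
    for (n = t->buckets[b]; n != NULL; n = n->next) {
        if (strcmp(n->key, key) == 0) {
            n->value = value;
            pthread_rwlock_unlock(&t->lock);
            return 0;
        }
    }
    n = malloc(sizeof(*n));
    if (n == NULL || (n->key = strdup(key)) == NULL) {
        free(n);
        pthread_rwlock_unlock(&t->lock);
        return -1;
    }
    n->value = value;
    n->next = t->buckets[b];
    t->buckets[b] = n;
    pthread_rwlock_unlock(&t->lock);
    return 0;
}
```

A read/write lock lets concurrent lookups proceed in parallel; a plain mutex would also be correct. The pointer returned by lh_find() remains safe to use after the lock is released only if the value's lifetime is managed separately, for example by reference counting.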

By: Sean Bright (seanbright) 2018-09-17 17:24:46.046-0500

[There have been many changes to {{resolver.c}} since 2.4.5 was released|https://trac.pjsip.org/repos/changeset?reponame=&new=5826%40pjproject%2Ftrunk%2Fpjlib-util%2Fsrc%2Fpjlib-util%2Fresolver.c&old=4649%40pjproject%2Ftrunk%2Fpjlib-util%2Fsrc%2Fpjlib-util%2Fresolver.c]. Is this still reproducible with the latest Asterisk 13 release with bundled PJSIP?

By: Dmitriy Serov (Demon) 2018-10-09 17:02:41.907-0500

I am using 15.6.1 and no longer see segfaults in the resolver.
I think the issue can be closed.

By: Richard Mudgett (rmudgett) 2018-10-09 17:08:01.317-0500

Closed per reporter.  Reporter no longer using Asterisk 13 versions.