
Summary: ASTERISK-24731: res_pjsip_session cannot be unloaded
Reporter: Corey Farrell (coreyfarrell)
Labels:
Date Opened: 2015-01-28 08:17:46.000-0600
Date Closed: 2015-03-26 12:48:43
Priority: Major
Regression?
Status: Closed/Complete
Components: Resources/res_pjsip_session
Versions: SVN 13.1.0
Frequency of Occurrence
Related Issues:
Environment:
Attachments:
( 0) backtrace_15784.txt
( 1) chan_pjsip-frack.txt
( 2) chan_pjsip-ref-fixes.patch
( 3) chan_pjsip-ref-fixes-r2.patch
Description: res_pjsip_session cannot be unloaded or shut down, causing huge numbers of leaks to be reported by REF_DEBUG or valgrind. This makes it impossible to do automated checks for memory leaks against chan_pjsip.  All testsuite tests fail if REF_DEBUG is enabled and res_pjsip_session is loaded.

This is a follow-up to ASTERISK-24485.  As with that bug, it's important for the module to clean itself up on graceful shutdown; it is less important to allow users to unload the module without shutdown.
Comments:

By: Corey Farrell (coreyfarrell) 2015-03-12 08:41:56.761-0500

First attempt at a patch to allow all pjsip modules to load and unload with no reference leaks.  Unfortunately it results in many segmentation faults with the testsuite.

By: Corey Farrell (coreyfarrell) 2015-03-12 08:47:42.218-0500

All backtraces so far look the same.  {{pjsip_endpt_destroy(ast_pjsip_endpoint);}} segfaults during unload of res_pjsip.

Also I noticed AO2 FRACKs from tests/channels/pjsip/ami/show_registrations_outbound.  Not sure what to do from here.

By: Corey Farrell (coreyfarrell) 2015-03-12 10:09:33.533-0500

Just completed a run of tests/channels/pjsip with the patch.  Out of 148 tests I got 67 total failures.  64 of those failures had reference leaks, and 27 had backtraces.

I've just attached a second (different) backtrace caused by my patch.

By: Corey Farrell (coreyfarrell) 2015-03-12 20:03:47.654-0500

In case it will help anyone to know the sources, here's a list of the 27 backtraces I got.
Segfaults from {{unload_pjsip}}:
{noformat}
logs/channels/pjsip/basic_calls/two_parties/nominal/alice_initiated/bob_hangs_up/backtrace_9915.txt
logs/channels/pjsip/basic_calls/outgoing/nominal/auth/backtrace_7724.txt
logs/channels/pjsip/basic_calls/outgoing/nominal/echo/backtrace_14690.txt
logs/channels/pjsip/basic_calls/outgoing/nominal/nat/backtrace_10822.txt
logs/channels/pjsip/basic_calls/outgoing/off-nominal/bob_incompatible_codecs/backtrace_2804.txt
logs/channels/pjsip/endpoint_identify/backtrace_859.txt
logs/channels/pjsip/user_eq_phone/backtrace_1818.txt
logs/channels/pjsip/hold_inactive/backtrace_1655.txt
logs/channels/pjsip/call_pickup/backtrace_5239.txt
logs/channels/pjsip/transfers/blind_transfer/caller_refer_only/backtrace_13185.txt
logs/channels/pjsip/transfers/blind_transfer/caller_direct_media/backtrace_10342.txt
logs/channels/pjsip/transfers/blind_transfer/callee_direct_media/backtrace_15444.txt
logs/channels/pjsip/transfers/blind_transfer/callee_refer_only/backtrace_10715.txt
logs/channels/pjsip/accountcode/backtrace_3296.txt
logs/channels/pjsip/hold/backtrace_9193.txt
logs/channels/pjsip/sdp_offer_answer/attribute_passthrough/backtrace_11103.txt
logs/channels/pjsip/message/message_in_dialog/backtrace_12521.txt
logs/channels/pjsip/hold_ice/backtrace_1852.txt
logs/channels/pjsip/diversion/diversion_basic/backtrace_9554.txt
logs/channels/pjsip/diversion/diversion_request/backtrace_1883.txt
logs/channels/pjsip/diversion/diversion_caller_id/backtrace_7468.txt
logs/channels/pjsip/diversion/diversion_response/backtrace_9173.txt
{noformat}

Segfaults from {{pjsip_endpt_destroy(ast_pjsip_endpoint)}}:
{noformat}
logs/channels/pjsip/transfers/attended_transfer/nominal/callee_remote/backtrace_15608.txt
logs/channels/pjsip/transfers/attended_transfer/nominal/caller_local/backtrace_7937.txt
logs/channels/pjsip/transfers/attended_transfer/nominal/callee_local/backtrace_10844.txt
logs/channels/pjsip/transfers/blind_transfer/callee_with_hold/backtrace_15381.txt
logs/channels/pjsip/refer_send_to_vm/backtrace_9935.txt
{noformat}

By: Matt Jordan (mjordan) 2015-03-12 20:07:28.967-0500

You're getting crashes due to calling PJSIP functions from a non-PJSIP registered thread:

{noformat}
#0  0x00007fd5ac56bcc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
#0  0x00007fd5ac56bcc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
       resultvar = 0
       pid = 30776
       selftid = 30876
#1  0x00007fd5ac56f0d8 in __GI_abort () at abort.c:89
       save_stage = 2
       act = {__sigaction_handler = {sa_handler = 0x7fff58f2f2db, sa_sigaction = 0x7fff58f2f2db}, sa_mask = {__val = {140555697465532, 140555312626520, 692, 4294967295, 140555696104675, 4294967296, 140555736862448, 38654705664, 0, 3519, 0, 0, 0, 21474836480, 140555738025984, 140555697480656}}, sa_flags = -1787101968, sa_restorer = 0x7fd5957aff01 <__PRETTY_FUNCTION__.5427>}
       sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x00007fd5ac564b86 in __assert_fail_base (fmt=0x7fd5ac6b63d0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7fd5957afcf0 "!\"Calling pjlib from unknown/external thread. You must \" \"register external threads with pj_thread_register() \" \"before calling any pjlib functions.\"", file=file@entry=0x7fd5957afb58 "../src/pj/os_core_unix.c", line=line@entry=692, function=function@entry=0x7fd5957aff01 <__PRETTY_FUNCTION__.5427> "pj_thread_this") at assert.c:92
       str = 0x7fd5a400da50 "\200", <incomplete sequence \333>
       total = 4096
#3  0x00007fd5ac564c32 in __GI___assert_fail (assertion=0x7fd5957afcf0 "!\"Calling pjlib from unknown/external thread. You must \" \"register external threads with pj_thread_register() \" \"before calling any pjlib functions.\"", file=0x7fd5957afb58 "../src/pj/os_core_unix.c", line=692, function=0x7fd5957aff01 <__PRETTY_FUNCTION__.5427> "pj_thread_this") at assert.c:101
No locals.
#4  0x00007fd59579758c in pj_thread_this () from /usr/lib/libpj.so.2
No symbol table info available.
#5  0x00007fd5957a115a in pj_log () from /usr/lib/libpj.so.2
No symbol table info available.
#6  0x00007fd5957a169b in pj_log_4 () from /usr/lib/libpj.so.2
No symbol table info available.
#7  0x00007fd595c40da5 in unload_module () from /usr/lib/libpjsip.so.2
No symbol table info available.
#8  0x00007fd595c40c3c in pjsip_endpt_unregister_module () from /usr/lib/libpjsip.so.2
No symbol table info available.
{noformat}

You'll need to marshal the call to {{pjsip_endpt_unregister_module}} onto a PJSIP thread, synchronize on its completion, then continue the unload process.
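
The marshal-and-wait pattern described here can be sketched in plain C with pthreads. This is an illustration only, not Asterisk or PJSIP code: the worker thread stands in for a thread that is already registered with the library, and every name in it ({{run_on_worker_sync}}, {{worker_start}}, {{worker_stop}}, {{record_thread}}, {{struct sync_task}}) is hypothetical. The caller queues a function, blocks on a condition variable until the worker has run it, and only then continues.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* One queued call plus its completion state (hypothetical type). */
struct sync_task {
    int (*fn)(void *);       /* function to run on the worker thread */
    void *data;              /* argument passed to fn */
    int result;              /* return value of fn, valid once done is set */
    bool done;
    struct sync_task *next;
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t queue_cond = PTHREAD_COND_INITIALIZER;
static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;
static struct sync_task *head, *tail;
static bool stopping;
static pthread_t worker;

/* Worker loop: stands in for a thread the library already knows about. */
static void *worker_main(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&lock);
    for (;;) {
        while (!head && !stopping)
            pthread_cond_wait(&queue_cond, &lock);
        if (!head)
            break;                      /* stopping and queue drained */
        struct sync_task *t = head;
        head = t->next;
        if (!head)
            tail = NULL;
        pthread_mutex_unlock(&lock);
        int r = t->fn(t->data);         /* run the task without holding the lock */
        pthread_mutex_lock(&lock);
        t->result = r;
        t->done = true;
        pthread_cond_broadcast(&done_cond);
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

static int worker_start(void)
{
    stopping = false;
    return pthread_create(&worker, NULL, worker_main, NULL);
}

static void worker_stop(void)
{
    pthread_mutex_lock(&lock);
    stopping = true;
    pthread_cond_signal(&queue_cond);
    pthread_mutex_unlock(&lock);
    pthread_join(worker, NULL);
}

/* Queue fn(data) for the worker and block until it has finished. */
static int run_on_worker_sync(int (*fn)(void *), void *data)
{
    assert(fn != NULL);
    struct sync_task t = { .fn = fn, .data = data };
    pthread_mutex_lock(&lock);
    if (tail)
        tail->next = &t;
    else
        head = &t;
    tail = &t;
    pthread_cond_signal(&queue_cond);
    while (!t.done)                     /* synchronize on completion */
        pthread_cond_wait(&done_cond, &lock);
    pthread_mutex_unlock(&lock);
    return t.result;
}

/* Example task: records which thread actually executed it. */
static int record_thread(void *data)
{
    *(pthread_t *)data = pthread_self();
    return 0;
}
```

Under these assumptions, the task function would wrap the {{pjsip_endpt_unregister_module}} call, and the rest of the unload would proceed only after {{run_on_worker_sync}} returns. The task record lives on the caller's stack, which is safe here because the caller does not return until the worker has marked it done.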

By: Corey Farrell (coreyfarrell) 2015-03-12 20:39:25.087-0500

So this PJSIP thread issue affected 5 of the 27 backtraces.  I just moved most of {{res_pjsip.c:module_unload}} to {{unload_pjsip}}, which resolved the {{pjsip_endpt_unregister_module}} crash.  Unfortunately 4 of the 5 went on to crash at {{pjsip_endpt_destroy(ast_pjsip_endpoint)}}.

By: Corey Farrell (coreyfarrell) 2015-03-15 00:45:27.010-0500

Revision 2 of the patch.

Contains fixes to res_pjsip_outbound_registration.  Each change seems to fix a FRACK, but about every other run of tests/channels/pjsip/ami/show_registrations_outbound still has a FRACK.  There is still an extra or missing ao2_ref somewhere, or something is going out of order with a task processor.

All other tests now succeed, though some still have leaks.

By: Corey Farrell (coreyfarrell) 2015-03-15 00:54:56.336-0500

Attached are the refs entries for an object that FRACK'ed (unref after free) and the backtrace.  I added the last ref line (error) using information in the backtrace.