Summary: ASTERISK-25183: PJSIP: Crash on NULL channel in chan_pjsip_incoming_response despite previous checks for NULL channel
Reporter: Matt Jordan (mjordan)
Labels:
Date Opened: 2015-06-22 09:39:08
Date Closed: 2015-07-27 11:32:41
Priority: Major
Regression?
Status: Closed/Complete
Components: Channels/chan_pjsip
Versions:
Frequency of Occurrence:
Related Issues: is related to ASTERISK-25201 Crash in PJSIP distributor on already free'd threadpool
Environment:
Attachments: ( 0) backtrace_2003.txt
( 1) full.txt
( 2) messages.txt
Description:
Note that this was caught by the {{channels/pjsip/basic_calls/outgoing/off-nominal/bob_incompatible_codecs}} test in the Test Suite.

The test crashed because {{ast_channel_name}} was called on a NULL channel:

{code}
#0  0x000000000054494c in ast_channel_name (chan=0x0) at channel_internal_api.c:476
476 DEFINE_STRINGFIELD_GETTER_FOR(name);
#0  0x000000000054494c in ast_channel_name (chan=0x0) at channel_internal_api.c:476
No locals.
#1  0x00007f42987ebce1 in chan_pjsip_incoming_response (session=0x7f42d000d3b8, rdata=0x7f4308022d98) at chan_pjsip.c:2224
       status = {code = 200, reason = {ptr = 0x7f4308024100 "OK", slen = 2}}
       cause_code = 0x7f42da36d670
       data_size = 102
       __PRETTY_FUNCTION__ = "chan_pjsip_incoming_response"
#2  0x00007f42de1a9078 in handle_incoming_response (session=0x7f42d000d3b8, rdata=0x7f4308022d98, type=PJSIP_EVENT_TSX_STATE, response_priority=AST_SIP_SESSION_AFTER_MEDIA) at res_pjsip_session.c:2187
       supplement = 0x7f42d000f7c0
       status = {code = 200, reason = {ptr = 0x7f4308024100 "OK", slen = 2}}
       __PRETTY_FUNCTION__ = "handle_incoming_response"
#3  0x00007f42de1a923f in handle_incoming (session=0x7f42d000d3b8, rdata=0x7f4308022d98, type=PJSIP_EVENT_TSX_STATE, response_priority=AST_SIP_SESSION_AFTER_MEDIA) at res_pjsip_session.c:2201
       __PRETTY_FUNCTION__ = "handle_incoming"
{code}

In Asterisk 13, this corresponds to this line of code:

{code}
/* Build and send the tech-specific cause information */
/* size of the string making up the cause code is "SIP " number + " " + reason length */
data_size += 4 + 4 + pj_strlen(&status.reason);
cause_code = ast_alloca(data_size);
memset(cause_code, 0, data_size);

ast_copy_string(cause_code->chan_name, ast_channel_name(session->channel), AST_CHANNEL_NAME); // THIS LINE HERE
{code}

However, earlier in this function we explicitly check that the channel is non-NULL before proceeding:
{code}
if (!session->channel) {
	return;
}
{code}

Which doesn't make much sense at first glance. Even if we had a reference counting issue, {{session->channel}} should have pointed to garbage rather than being NULL.

However, we can see that we are hanging up a channel at this moment in time:

{code}
Thread 70 (Thread 0x7f42da3ea700 (LWP 7686)):
#0  0x00000000005fee68 in __ast_pthread_mutex_lock (filename=0x7fa55b "astmm.c", lineno=360, func=0x7fb2a7 "region_free", mutex_name=0x7fa5cb "&reglock", t=0xadfb40) at lock.c:313
#1  0x000000000047bd8e in region_free (freed=0xb17040, reg=0x7f42f800a580) at astmm.c:360
#2  0x000000000047c4e3 in __ast_free_region (ptr=0x7f42f800a610, file=0x7fb5ab "astobj2.c", lineno=461, func=0x7fb840 "internal_ao2_ref") at astmm.c:479
#3  0x000000000047c81e in __ast_free (ptr=0x7f42f800a610, file=0x7fb5ab "astobj2.c", lineno=461, func=0x7fb840 "internal_ao2_ref") at astmm.c:532
#4  0x000000000048142d in internal_ao2_ref (user_data=0x7f42f800a668, delta=-1, file=0x7fb5ab "astobj2.c", line=516, func=0x7fb823 "__ao2_ref") at astobj2.c:461
#5  0x0000000000481969 in __ao2_ref (user_data=0x7f42f800a668, delta=-1) at astobj2.c:516
#6  0x0000000000481a4a in __ao2_cleanup (obj=0x7f42f800a668) at astobj2.c:529
#7  0x00007f42987e9ced in hangup (data=0x7f4314006578) at chan_pjsip.c:1744
#8  0x000000000072a453 in ast_taskprocessor_execute (tps=0x7f42d000e698) at taskprocessor.c:768
#9  0x000000000073dba0 in execute_tasks (data=0x7f42d000e698) at threadpool.c:1157
#10 0x000000000072a453 in ast_taskprocessor_execute (tps=0x1484fa8) at taskprocessor.c:768
#11 0x000000000073b0c5 in threadpool_execute (pool=0x1653518) at threadpool.c:351
#12 0x000000000073d677 in worker_active (worker=0x7f42cc001cc8) at threadpool.c:1075
#13 0x000000000073d2c2 in worker_start (arg=0x7f42cc001cc8) at threadpool.c:995
#14 0x0000000000750540 in dummy_start (data=0x7f42cc001f10) at utils.c:1237
#15 0x00000034ac6079d1 in start_thread () from /lib64/libpthread.so.0
#16 0x00000034ac2e89dd in clone () from /lib64/libc.so.6
{code}

This means we can probably still get past the first check on line {{2195}}, and then have the {{hangup}} callback, running in another thread, set the {{session->channel}} pointer to NULL before line {{2224}} reads it again. Egads.
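
To make the suspected interleaving concrete, here is a minimal single-threaded sketch of the check-then-use pattern. It is not the Asterisk code and not the fix that was applied to this issue. All names ({{demo_session}}, {{demo_channel}}, {{demo_hangup}}, the {{interleave}} hook) are hypothetical stand-ins, and the hook only simulates the point at which another thread could run the {{hangup}} callback.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the real session and channel structures. */
struct demo_channel {
	const char *name;
};

struct demo_session {
	struct demo_channel *channel;
};

/* Called between the NULL check and the use, to simulate another thread. */
typedef void (*demo_interleave_fn)(struct demo_session *session);

/* Simulates the hangup callback clearing the session's channel pointer. */
static void demo_hangup(struct demo_session *session)
{
	session->channel = NULL;
}

/* Simulates the case where nothing else touches the session. */
static void demo_no_interleave(struct demo_session *session)
{
	(void) session;
}

/*
 * Racy shape: the shared pointer is read once for the check and again for
 * the use. Returns 1 if there was no channel, 0 if the name was copied, and
 * -1 where the real code would have dereferenced NULL.
 */
static int demo_copy_name_racy(struct demo_session *session,
	demo_interleave_fn interleave, char *out, size_t out_size)
{
	if (!session->channel) {
		return 1;
	}
	interleave(session);
	if (!session->channel) {
		return -1; /* second read sees NULL: this is the crash site */
	}
	snprintf(out, out_size, "%s", session->channel->name);
	return 0;
}

/*
 * Snapshot shape: the shared pointer is read exactly once into a local.
 * Assumption: the channel object outlives this call. Real code would also
 * need to hold a reference and synchronize the read itself.
 */
static int demo_copy_name_snapshot(struct demo_session *session,
	demo_interleave_fn interleave, char *out, size_t out_size)
{
	struct demo_channel *chan = session->channel;

	if (!chan) {
		return 1;
	}
	interleave(session);
	snprintf(out, out_size, "%s", chan->name);
	return 0;
}
```

With {{demo_hangup}} as the hook, the racy variant reaches its second read with a NULL pointer, while the snapshot variant still copies the name. Reading the pointer once only avoids the NULL dereference; it does not by itself keep the channel alive, so a real change would presumably also need to hold a channel reference or serialize this code with {{hangup}}.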

Logs and backtrace attached.
Comments:
By: Asterisk Team (asteriskteam) 2015-07-26 11:53:12.558-0500

This issue has been reopened as a result of your commenting on it as the reporter. It will be triaged once again as applicable.