[Home]

Summary:ASTERISK-28810: Segmentation fault in ast_manager_build_channel_state_string_prefix
Reporter:Robert Sutton (rsutton@noojee.com.au)Labels:
Date Opened:2020-04-06 19:35:08Date Closed:
Priority:MajorRegression?
Status:In Progress/In ProgressComponents:Core/Channels
Versions:16.9.0 Frequency of
Occurrence
Frequent
Related
Issues:
is related toASTERISK-25366 Segmentation fault - in ast_manager_build_channel_state_string_prefix at manager_channels.c:417
Environment:docker ubuntu 18.04Attachments:( 0) patch.txt
Description:We are having daily core dumps.

ast_manager_build_channel_state_string_prefix was passed a null snapshot, upon looking around the code base there are many paths where it is called with out first checking.

This problem will keep happening if it is reliant on callers of this method to first check the arg. The simple solution is to do a null check on the snapshot in ast_manager_build_channel_state_string_prefix and return NULL.

I will attach a patch shortly.

#0  ast_manager_build_channel_state_string_prefix (snapshot=0x0, prefix=0x62f514 "") at manager_channels.c:496
       out = <error reading variable out (Cannot access memory at address 0x7f794f496cd0)>
       caller_name = <optimized out>
       connected_name = <optimized out>
       res = <optimized out>
       __PRETTY_FUNCTION__ = "ast_manager_build_channel_state_string_prefix"
Comments:By: Asterisk Team (asteriskteam) 2020-04-06 19:35:10.299-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

By: Robert Sutton (rsutton@noojee.com.au) 2020-04-06 19:39:09.299-0500

Util I can attach the patch here it is as a comment...

diff --git a/main/manager_channels.c b/main/manager_channels.c
index c964033..a50c4f4 100644
--- a/main/manager_channels.c
+++ b/main/manager_channels.c
@@ -493,6 +493,12 @@ struct ast_str *ast_manager_build_channel_state_string_prefix(
       char *connected_name;
       int res;

+       if (!snapshot)
+       {
+               ast_log(AST_LOG_WARNING, "Call to ast_manager_build_channel_state_string_prefix with NULL snapshot!\n");
+               return NULL;
+       }
+
       if (snapshot->tech_properties & AST_CHAN_TP_INTERNAL) {
               return NULL;
       }


By: George Joseph (gjoseph) 2020-04-07 08:06:19.093-0500

Thanks for the patch.   I'll get it merged.

Can you attach the output produced by ast_coredumper so we can track down where the NULL snapshots are coming from?


By: Robert Sutton (rsutton@noojee.com.au) 2020-04-07 19:40:57.563-0500

Unfortunately I was still working towards getting the core dump configuration correct and only managed to acquire a partial core dump due to the core dump size being limited to 1MB.

This partial core dump only yielded the single line of back trace which I've already posted.

I've since rebuilt with my patch and deployed to production - so far 1 day without a core dump.

Getting a full back trace will require pushing a known to core dump build back to production.

So the question is how important is it to get the full back trace?

By: George Joseph (gjoseph) 2020-04-08 13:11:39.860-0500

It was only for future reference so no big deal.  
I'll get the patch submitted tomorrow unless you wish to submit it yourself to https://gerrit/asterisk.org.  It's always better if the contributor can do it themselves.
https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process


By: Robert Sutton (rsutton@noojee.com.au) 2020-04-08 19:25:12.612-0500

I've committed the patch on the 16 branch.

I tried cherry picking it in gerrit to 17 but it failed with a merge conflict. As I couldn't find any instructions on how to proceed from there, I have stopped at that point.

By: George Joseph (gjoseph) 2020-04-09 09:30:52.762-0500

When you get a merge conflict in Gerrit, the best thing to do is...
* In you local development tree, checkout the branch you were trying to cherry-pick to.
* Do a {{git cherry-pick <commitid>}} or a {{git review -x <gerrit_change_num>}}

Sometimes Gerrit will indicate a conflict but a local cherry-pick will work.  If the cherry-pick succeeds, you only have to do a {{git review -t <topic> <new_branch>}} to push the cherry-pick back up to gerrit.  

If it failed, do a {{git status}} to see the files in conflict, fix them, do a {{git add <file>}}, then do a {{git cherry-pick --continue}}.  When it finally succeeds, you can do a {{git review -t <topic> <new_branch>}} to push the cherry-pick back up to gerrit.  

One other thing I'll say is that it's probably better to get at least 1 "+1" code review before cherry-picking.  This way if changes are necessary, you only have to make them in one branch.  Usually, one of us will comment on the review "OK to cherry-pick to ..." when it's safe.

I'll add these instructions to the wiki page.

Thanks for submitting the patch yourself!  We appreciate that.



By: Sean Bright (seanbright) 2020-04-09 11:13:19.587-0500

[~rsutton@noojee.com.au], I would really like to see a full backtrace to understand where the {{NULL}} snapshot is coming from.

By: Robert Sutton (rsutton@noojee.com.au) 2020-04-13 18:52:20.578-0500

I'll push a build without my patch back into production in an attempt to get a full backtrace.



By: Robert Sutton (rsutton@noojee.com.au) 2020-04-13 20:10:12.856-0500

Management really don't want a broken build put back in to production. Is it essential to get a backtrace, or can we just get the Null check merged?

By: Robert Sutton (rsutton@noojee.com.au) 2020-05-18 18:42:15.970-0500

I've acquired a couple identical backtraces from 2 systems

Running asterisk 16.10
{noformat}
[May 15 17:16:11] ERROR[33]:   Got 15 backtrace records
# 0: [0x5db45d] asterisk utils.c:2404 __ast_assert_failed()
# 1: [0x51d3fd] asterisk manager_channels.c:498 ast_manager_build_channel_state_string_prefix()
# 2: [0x51d818] asterisk manager_channels.c:570 ast_manager_build_channel_state_string()
# 3: [0x7feccd91dff5] res_agi.so res_agi.c:1427 agi_channel_to_ami()
# 4: [0x7feccd91e107] res_agi.so res_agi.c:1443 agi_exec_start_to_ami()
# 5: [0x5b414b] asterisk stasis_message.c:226 stasis_message_to_ami()
# 6: [0x628343] asterisk manager.c:1848 manager_default_msg_cb()
# 7: [0x5b48a6] asterisk stasis_message_router.c:202 router_dispatch()
# 8: [0x59fccd] asterisk stasis.c:780 subscription_invoke()
# 9: [0x5a0f7f] asterisk stasis.c:1260 dispatch_exec_async()
#10: [0x5c5720] asterisk taskprocessor.c:1235 ast_taskprocessor_execute()
#11: [0x5c2755] asterisk taskprocessor.c:201 default_tps_processing_function()
#12: [0x5d83b7] asterisk utils.c:1249 dummy_start()
#13: [0x7fed1a66a6ba] libpthread.so.0 :0 __pthread_get_minstack()
#14: [0x7fed198ff41d] libc.so.6 :0 clone()
{noformat}
with this patch applied
{noformat}
diff --git a/main/manager_channels.c b/main/manager_channels.c
index c964033..25b5c65 100644
--- a/main/manager_channels.c
+++ b/main/manager_channels.c
@@ -493,6 +493,11 @@ struct ast_str *ast_manager_build_channel_state_string_prefix(
       char *connected_name;
       int res;

+       if (!snapshot) {
+               __ast_assert_failed(0, "snapshot != NULL in ast_manager_build_channel_state_string_prefix",__FILE__,__LINE__,__PRETTY_FUNCTION__);
+               return NULL;
+       }
+
       if (snapshot->tech_properties & AST_CHAN_TP_INTERNAL) {
               return NULL;
       }
{noformat}
and this one too
{noformat}
[May 15 15:19:03] ERROR[33]:   Got 15 backtrace records
# 0: [0x5db45d] asterisk utils.c:2404 __ast_assert_failed()
# 1: [0x51d3fd] asterisk manager_channels.c:498 ast_manager_build_channel_state_string_prefix()
# 2: [0x51d818] asterisk manager_channels.c:570 ast_manager_build_channel_state_string()
# 3: [0x7feccd91dff5] res_agi.so res_agi.c:1427 agi_channel_to_ami()
# 4: [0x7feccd91e128] res_agi.so res_agi.c:1448 agi_exec_end_to_ami()
# 5: [0x5b414b] asterisk stasis_message.c:226 stasis_message_to_ami()
# 6: [0x628343] asterisk manager.c:1848 manager_default_msg_cb()
# 7: [0x5b48a6] asterisk stasis_message_router.c:202 router_dispatch()
# 8: [0x59fccd] asterisk stasis.c:780 subscription_invoke()
# 9: [0x5a0f7f] asterisk stasis.c:1260 dispatch_exec_async()
#10: [0x5c5720] asterisk taskprocessor.c:1235 ast_taskprocessor_execute()
#11: [0x5c2755] asterisk taskprocessor.c:201 default_tps_processing_function()
#12: [0x5d83b7] asterisk utils.c:1249 dummy_start()
#13: [0x7fed1a66a6ba] libpthread.so.0 :0 __pthread_get_minstack()
#14: [0x7fed198ff41d] libc.so.6 :0 clone()
{noformat}


By: Robert Sutton (rsutton@noojee.com.au) 2020-05-25 20:09:44.904-0500

I've now acquired a dozen occurrences of this and have identified the circumstances of the channels involved.

Firstly this always occurs after a line like this (For clarity this line was executed 26374 times resulting in 12 crashes over an 8 day period).
{noformat}
   -- AGI Script Executing Application: (bridge) Options: (PJSIP/123-000009cb,xpF)
{noformat}

There are 2 channels both in AGI:
     One in a loop stopping and starting playtones every 3 seconds.
     The other channel is orginated into dialplan and then into AGI and immediately answered and bridge is invoked with the first channel as the target

In this example PJSIP/711-000009cb is in the AGI invoking playtones and PJSIP/trunk-00000da2 is the originated channel. I've left blank lines where I've edited out activity from other channels.
{noformat}
  -- Executing [agent-agi@dialer:4] Goto("PJSIP/711-000009cb", "agent-agi,1") in new stack
   -- Goto (dialer,agent-agi,1)
   -- Executing [agent-agi@dialer:1] Set("PJSIP/711-000009cb", "AGIEXITONHANGUP=yes") in new stack
   -- Executing [agent-agi@dialer:2] AGI("PJSIP/711-000009cb", "agi://127.0.0.1:4578/DialerAgentHold?DialJobId=&InternalDialerId=2506") in new stack
   -- AGI Script Executing Application: (Playtones) Options: (beep)
[May 18 16:27:14] ERROR[21047][C-0000084b]: indications.c:340 ast_playtones_start: Failed to parse tone part 'beep'
[May 18 16:27:14] ERROR[21047][C-0000084b]: indications.c:381 ast_playtones_start: No valid tone parts
[May 18 16:27:14] NOTICE[21047][C-0000084b]: app_playtones.c:98 handle_playtones: Unable to start playtones

   -- Executing [agent-agi@dialer:1] Set("PJSIP/trunk-00000da2", "AGIEXITONHANGUP=yes") in new stack
   -- Executing [agent-agi@dialer:2] AGI("PJSIP/trunk-00000da2", "agi://127.0.0.1:4578/DialerGateWay?DialJobId=1379&InternalDialerId=3489") in new stack

   -- AGI Script Executing Application: (bridge) Options: (PJSIP/711-000009cb,xpF)
 0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0FRACK!, Failed assertion snapshot != NULL in ast_manager_build_channel_state_string_prefix (0) at line 497 in ast_manager_build_channel_state_string_prefix of manager_channels.c
 == Spawn extension (dialer, agent-agi, 2) exited non-zero on 'Surrogate/PJSIP/711-000009cb'
[May 18 16:27:24] ERROR[34]: manager_channels.c:497 ast_manager_build_channel_state_string_prefix: FRACK!, Failed assertion snapshot != NULL in ast_manager_build_channel_state_string_prefix (0)
   -- Channel PJSIP/711-000009cb joined 'simple_bridge' basic-bridge <7c366a46-c3cf-437a-8632-530462118869>
   -- Channel PJSIP/trunk-00000da2 joined 'simple_bridge' basic-bridge <7c366a46-c3cf-437a-8632-530462118869>
   -- <PJSIP/711-000009cb> Playing 'beep.alaw' (language 'en_AU')

[May 18 16:27:24] ERROR[34]:   Got 15 backtrace records
# 0: [0x5db45d] asterisk utils.c:2404 __ast_assert_failed()
# 1: [0x51d3fd] asterisk manager_channels.c:498 ast_manager_build_channel_state_string_prefix()
# 2: [0x51d818] asterisk manager_channels.c:570 ast_manager_build_channel_state_string()
# 3: [0x7f922d10dff5] res_agi.so res_agi.c:1427 agi_channel_to_ami()
# 4: [0x7f922d10e107] res_agi.so res_agi.c:1443 agi_exec_start_to_ami()
# 5: [0x5b414b] asterisk stasis_message.c:226 stasis_message_to_ami()
# 6: [0x628343] asterisk manager.c:1848 manager_default_msg_cb()
# 7: [0x5b48a6] asterisk stasis_message_router.c:202 router_dispatch()
# 8: [0x59fccd] asterisk stasis.c:780 subscription_invoke()
{noformat}