[Home]

Summary:ASTERISK-29377: cpool_release_pool "double free or corruption (out)"
Reporter:Robert Sutton (rsutton@noojee.com.au)Labels:patch
Date Opened:2021-03-30 18:49:32Date Closed:2021-05-17 08:54:15
Priority:MinorRegression?No
Status:Closed/CompleteComponents:pjproject/pjsip
Versions:16.16.1 Frequency of
Occurrence
Frequent
Related
Issues:
Environment:ubuntu 18.04, Docker Attachments:( 0) 0090-fix-stun-cache-crash.patch
( 1) 1617056270.asterisk.core.1.txt
Description:Two threads appear to be freeing the same resource leading to a crash as can be seen below..

thread 1

{quote}
1617056270.asterisk.core.1.txt-Thread 1 (Thread 0x7fd335e21700 (LWP 28196)):
1617056270.asterisk.core.1.txt-#0  __GI_abort () at abort.c:107
1617056270.asterisk.core.1.txt-#1  0x00007fd3c92b8967 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7fd3c93e5b0d "%s\n") at ../sysdeps/posix/libc_fatal.c:181
1617056270.asterisk.core.1.txt-#2  0x00007fd3c92bf9da in malloc_printerr (str=str@entry=0x7fd3c93e7818 "double free or corruption (out)") at malloc.c:5342
1617056270.asterisk.core.1.txt-#3  0x00007fd3c92c6f6a in _int_free (have_lock=0, p=0x7fd2f1db62e0, av=0x7fd3c961ac40 <main_arena>) at malloc.c:4308
1617056270.asterisk.core.1.txt-#4  __GI___libc_free (mem=0x7fd2f1db62f0) at malloc.c:3134
1617056270.asterisk.core.1.txt-#5  0x000055ab6487ef6e in __ast_free (ptr=0x7fd2f1db62f0, file=0x7fd3cbb608b0 "../src/pj/pool_policy_malloc.c", lineno=78, func=0x7fd3cbb608f0 <__PRETTY_FUNCTION__.3957> "default_block_free") at astmm.c:1588
1617056270.asterisk.core.1.txt-#6  0x00007fd3cbb2a5d7 in default_block_free (factory=0x7fd36d3c2560 <cachingpool>, mem=0x7fd2f1db62f0, size=512) at ../src/pj/pool_policy_malloc.c:78
1617056270.asterisk.core.1.txt-#7  0x00007fd3cbb33723 in pj_pool_destroy_int (pool=0x7fd2f1db62f0) at ../src/pj/pool.c:296
1617056270.asterisk.core.1.txt:#8  0x00007fd3cbb33f22 in cpool_release_pool (pf=0x7fd36d3c2560 <cachingpool>, pool=0x7fd2f1db62f0) at ../src/pj/pool_caching.c:238
1617056270.asterisk.core.1.txt-#9  0x00007fd3cbb32e7c in pj_pool_release (pool=0x7fd2f1db62f0) at ../include/pj/pool_i.h:103
1617056270.asterisk.core.1.txt-#10 0x00007fd3cbb30d17 in grp_lock_destroy (p=0x7fd2f1db6398) at ../src/pj/lock.c:397
1617056270.asterisk.core.1.txt-#11 0x00007fd3cbb31150 in grp_lock_dec_ref (glock=0x7fd2f1db6398) at ../src/pj/lock.c:555
1617056270.asterisk.core.1.txt-#12 0x00007fd3cbb31190 in pj_grp_lock_dec_ref (glock=0x7fd2f1db6398) at ../src/pj/lock.c:646
1617056270.asterisk.core.1.txt-#13 0x00007fd3cbb30b5d in grp_lock_release (p=0x7fd2f1db6398) at ../src/pj/lock.c:335
1617056270.asterisk.core.1.txt-#14 0x00007fd3cbb30f83 in pj_grp_lock_release (grp_lock=0x7fd2f1db6398) at ../src/pj/lock.c:488
1617056270.asterisk.core.1.txt-#15 0x00007fd3cbaa51ee in destroy_ice (ice=0x7fd2f0219318, reason=0) at ../src/pjnath/ice_session.c:507
1617056270.asterisk.core.1.txt-#16 0x00007fd3cbaa521b in pj_ice_sess_destroy (ice=0x7fd2f0219318) at ../src/pjnath/ice_session.c:517
1617056270.asterisk.core.1.txt-#17 0x00007fd36d1980e3 in ice_wrap_dtor (vdoomed=0x7fd2f233d8e8) at res_rtp_asterisk.c:896
1617056270.asterisk.core.1.txt-#18 0x000055ab6488077b in __ao2_ref (user_data=0x7fd2f233d8e8, delta=-1, tag=0x7fd36d1b93d0 "", file=0x7fd36d1b93b2 "res_rtp_asterisk.c", line=874, func=0x7fd36d1bd3d0 <__PRETTY_FUNCTION__.40143> "ast_rtp_ice_stop") at astobj2.c:614
1617056270.asterisk.core.1.txt-#19 0x00007fd36d197fd1 in ast_rtp_ice_stop (instance=0x7fd2f143a0d0) at res_rtp_asterisk.c:874
1617056270.asterisk.core.1.txt-#20 0x00007fd36d1a1e9a in rtp_deallocate_transport (instance=0x7fd2f143a0d0, rtp=0x7fd2f05e44d0) at res_rtp_asterisk.c:3947
1617056270.asterisk.core.1.txt-#21 0x00007fd36d1a25c1 in ast_rtp_destroy (instance=0x7fd2f143a0d0) at res_rtp_asterisk.c:4050
1617056270.asterisk.core.1.txt-#22 0x000055ab64988031 in instance_destructor (obj=0x7fd2f143a0d0) at rtp_engine.c:434
1617056270.asterisk.core.1.txt-#23 0x000055ab6488077b in __ao2_ref (user_data=0x7fd2f143a0d0, delta=-1, tag=0x55ab64ac7159 "", file=0x55ab64ac71ea "rtp_engine.c", line=460, func=0x55ab64ac8470 <__PRETTY_FUNCTION__.18829> "ast_rtp_instance_destroy") at astobj2.c:614
1617056270.asterisk.core.1.txt-#24 0x000055ab64880af3 in __ao2_cleanup_debug (obj=0x7fd2f143a0d0, tag=0x55ab64ac7159 "", file=0x55ab64ac71ea "rtp_engine.c", line=460, function=0x55ab64ac8470 <__PRETTY_FUNCTION__.18829> "ast_rtp_instance_destroy") at astobj2.c:670
1617056270.asterisk.core.1.txt-#25 0x000055ab649882a0 in ast_rtp_instance_destroy (instance=0x7fd2f143a0d0) at rtp_engine.c:460
1617056270.asterisk.core.1.txt-#26 0x00007fd3655ae2a9 in stream_destroy (session_media=0x7fd2f0040718) at res_pjsip_sdp_rtp.c:2149
1617056270.asterisk.core.1.txt-#27 0x00007fd367425e2a in stream_destroy (obj=0x55ab65f40df0, arg=0x7fd2f0040718, flags=0) at res_pjsip_session.c:462
1617056270.asterisk.core.1.txt-#28 0x000055ab648823ce in internal_ao2_traverse (self=0x55ab65fe76c0, flags=OBJ_SEARCH_NONE, cb_fn=0x7fd367425de9 <stream_destroy>, arg=0x7fd2f0040718, data=0x0, type=AO2_CALLBACK_DEFAULT, tag=0x7fd36743f684 "", file=0x7fd36743f670 "res_pjsip_session.c", line=477, func=0x7fd367442af0 <__PRETTY_FUNCTION__.33873> "session_media_dtor") at astobj2_container.c:328
1617056270.asterisk.core.1.txt-#29 0x000055ab648826b7 in __ao2_callback (c=0x55ab65fe76c0, flags=OBJ_SEARCH_NONE, cb_fn=0x7fd367425de9 <stream_destroy>, arg=0x7fd2f0040718, tag=0x7fd36743f684 "", file=0x7fd36743f670 "res_pjsip_session.c", line=477, func=0x7fd367442af0 <__PRETTY_FUNCTION__.33873> "session_media_dtor") at astobj2_container.c:414
1617056270.asterisk.core.1.txt-#30 0x00007fd367425e95 in session_media_dtor (obj=0x7fd2f0040718) at res_pjsip_session.c:477
1617056270.asterisk.core.1.txt-#31 0x000055ab6488077b in __ao2_ref (user_data=0x7fd2f0040718, delta=-1, tag=0x7fd36743f684 "", file=0x7fd36743f670 "res_pjsip_session.c", line=285, func=0x7fd3674429c0 <__PRETTY_FUNCTION__.33758> "ast_sip_session_media_state_reset") at astobj2.c:614
1617056270.asterisk.core.1.txt-#32 0x000055ab64880af3 in __ao2_cleanup_debug (obj=0x7fd2f0040718, tag=0x7fd36743f684 "", file=0x7fd36743f670 "res_pjsip_session.c", line=285, function=0x7fd3674429c0 <__PRETTY_FUNCTION__.33758> "ast_sip_session_media_state_reset") at astobj2.c:670
1617056270.asterisk.core.1.txt-#33 0x00007fd3674253a4 in ast_sip_session_media_state_reset (media_state=0x7fd3bcb07670) at res_pjsip_session.c:285
1617056270.asterisk.core.1.txt-#34 0x00007fd367436e32 in ast_sip_session_terminate (session=0x7fd3bcb2b790, response=603) at res_pjsip_session.c:3538
1617056270.asterisk.core.1.txt-#35 0x00007fd366bfb63e in hangup (data=0x7fd39404d620) at chan_pjsip.c:2468
1617056270.asterisk.core.1.txt-#36 0x000055ab649f689f in ast_taskprocessor_execute (tps=0x7fd3bc81ad20) at taskprocessor.c:1237
1617056270.asterisk.core.1.txt-#37 0x000055ab64a00b1e in execute_tasks (data=0x7fd3bc81ad20) at threadpool.c:1354
1617056270.asterisk.core.1.txt-#38 0x000055ab649f689f in ast_taskprocessor_execute (tps=0x55ab65a5e1f0) at taskprocessor.c:1237
1617056270.asterisk.core.1.txt-#39 0x000055ab649fe4e7 in threadpool_execute (pool=0x55ab65a5cb40) at threadpool.c:367
1617056270.asterisk.core.1.txt-#40 0x000055ab64a002e7 in worker_active (worker=0x7fd33c0011b0) at threadpool.c:1137
1617056270.asterisk.core.1.txt-#41 0x000055ab64a00057 in worker_start (arg=0x7fd33c0011b0) at threadpool.c:1056
1617056270.asterisk.core.1.txt-#42 0x000055ab64a0a53b in dummy_start (data=0x7fd2f05f7740) at utils.c:1299
1617056270.asterisk.core.1.txt-#43 0x00007fd3ca15e6db in start_thread (arg=0x7fd335e21700) at pthread_create.c:463
1617056270.asterisk.core.1.txt-#44 0x00007fd3c935071f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

{quote}


and thread 96

{quote}
1617056270.asterisk.core.1.txt-Thread 96 (Thread 0x7fd33669f700 (LWP 24420)):
1617056270.asterisk.core.1.txt-#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
1617056270.asterisk.core.1.txt-#1  0x00007fd3ca161025 in __GI___pthread_mutex_lock (mutex=0x7fd36d3c27b0 <cachingpool+592>) at ../nptl/pthread_mutex_lock.c:80
1617056270.asterisk.core.1.txt-#2  0x00007fd3cbb283b9 in pj_mutex_lock (mutex=0x7fd36d3c27b0 <cachingpool+592>) at ../src/pj/os_core_unix.c:1281
1617056270.asterisk.core.1.txt-#3  0x00007fd3cbb308a6 in pj_lock_acquire (lock=0x7fd36d3c2788 <cachingpool+552>) at ../src/pj/lock.c:180
1617056270.asterisk.core.1.txt:#4  0x00007fd3cbb33ebd in cpool_release_pool (pf=0x7fd36d3c2560 <cachingpool>, pool=0x7fd394035d90) at ../src/pj/pool_caching.c:213
1617056270.asterisk.core.1.txt-#5  0x00007fd3cbb32e7c in pj_pool_release (pool=0x7fd394035d90) at ../include/pj/pool_i.h:103
1617056270.asterisk.core.1.txt-#6  0x00007fd3cbb32eb4 in pj_pool_safe_release (ppool=0x7fd394035e48) at ../include/pj/pool_i.h:112
1617056270.asterisk.core.1.txt-#7  0x00007fd3cbab8389 in tdata_on_destroy (arg=0x7fd394035e38) at ../src/pjnath/stun_session.c:170
1617056270.asterisk.core.1.txt-#8  0x00007fd3cbb30cc9 in grp_lock_destroy (p=0x7fd39402f088) at ../src/pj/lock.c:390
1617056270.asterisk.core.1.txt-#9  0x00007fd3cbb31150 in grp_lock_dec_ref (glock=0x7fd39402f088) at ../src/pj/lock.c:555
1617056270.asterisk.core.1.txt-#10 0x00007fd3cbb31190 in pj_grp_lock_dec_ref (glock=0x7fd39402f088) at ../src/pj/lock.c:646
1617056270.asterisk.core.1.txt-#11 0x00007fd3cbb437d8 in pj_timer_heap_poll (ht=0x55ab65cd89d8, next_delay=0x0) at ../src/pj/timer.c:916
1617056270.asterisk.core.1.txt-#12 0x00007fd36d19dd8a in timer_worker_thread (data=0x0) at res_rtp_asterisk.c:2708
1617056270.asterisk.core.1.txt-#13 0x00007fd3cbb278f5 in thread_main (param=0x55ab65cec540) at ../src/pj/os_core_unix.c:541
1617056270.asterisk.core.1.txt-#14 0x00007fd3ca15e6db in start_thread (arg=0x7fd33669f700) at pthread_create.c:463
1617056270.asterisk.core.1.txt-#15 0x00007fd3c935071f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
{quote}

Comments:By: Asterisk Team (asteriskteam) 2021-03-30 18:49:35.503-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

Please note that by submitting data, code, or documentation to Sangoma through JIRA, you accept the Terms of Use present at [https://www.asterisk.org/terms-of-use/|https://www.asterisk.org/terms-of-use/].

By: Joshua C. Colp (jcolp) 2021-03-31 03:50:56.649-0500

Those are separate resources, with thread 96 waiting for the lock so it can release its pool into the caching pool.

Please attach the complete backtrace and not specific threads.

By: Robert Sutton (rsutton@noojee.com.au) 2021-03-31 05:02:48.590-0500

back trace as request

By: Robert Sutton (rsutton@noojee.com.au) 2021-03-31 05:44:18.842-0500

back traces, cleaned up

By: George Joseph (gjoseph) 2021-03-31 08:04:05.785-0500

Can you run ast_coredumper and attach all of the .txt files?

By: Robert Sutton (rsutton@noojee.com.au) 2021-03-31 23:34:07.864-0500

I've sent a link to the ast_coredumper output via email to asteriskteam@digium.com

By: George Joseph (gjoseph) 2021-04-01 07:57:45.748-0500

I should have asked this yesterday but can you re-run ast_coredumper with the --tarball-coredumps option and re-upload the results to Google Drive?  This will include the actual coredump file and the asterisk binaries and will allow me to examine data structures not in the text files.


By: Robert Sutton (rsutton@noojee.com.au) 2021-04-05 18:15:11.333-0500

I've sent a fresh link to the output of ast_coredumper with the requested options

By: Joshua C. Colp (jcolp) 2021-04-13 04:29:37.285-0500

Are you running PJSIP with any additional patches, such as the fix for the crash in STUN[1]?

[1] https://github.com/pjsip/pjproject/issues/2505

By: Robert Sutton (rsutton@noojee.com.au) 2021-04-13 07:32:17.733-0500

these are the patches I'm running.

{noformat}

diff --git a/main/manager_channels.c b/main/manager_channels.c
index c964033..25b5c65 100644
--- a/main/manager_channels.c
+++ b/main/manager_channels.c
@@ -493,6 +493,11 @@ struct ast_str *ast_manager_build_channel_state_string_prefix(
       char *connected_name;
       int res;

+       if (!snapshot) {
+               __ast_assert_failed(0, "snapshot != NULL in ast_manager_build_channel_state_string_prefix",__FILE__,__LINE__,__PRETTY_FUNCTION__);
+               return NULL;
+       }
+
       if (snapshot->tech_properties & AST_CHAN_TP_INTERNAL) {
               return NULL;
       }
diff --git a/main/stringfields.c b/main/stringfields.c
index 0a9e599..28ebb05 100644
--- a/main/stringfields.c
+++ b/main/stringfields.c
@@ -80,6 +80,10 @@ static int add_string_pool(struct ast_string_field_mgr *mgr, struct ast_string_f

static void reset_field(const char **p)
{                                                                                                                                            
+       if (!p) {                                                                                                                            
+               __ast_assert_failed(0, "p == NULL in stringfield.reset_field",__FILE__,__LINE__,__PRETTY_FUNCTION__);                        
+               return;                                                                                                                      
+       }
       *p = __ast_string_field_empty;
}



{noformat}

By: Joshua C. Colp (jcolp) 2021-04-19 04:13:48.707-0500

I'm attaching a patch from Teluu which may improve this issue. Personally I haven't been able to reproduce the problem, but I have been running the patch through many calls to ensure that it doesn't cause a regression in ICE support itself.

The file should be placed in the "third-party/pjproject/patches" directory of Asterisk. This should be done in a fresh copy of the Asterisk source code, to ensure that pjproject is built with the patch. You can also do a "make distclean" in Asterisk and rebuild it completely to ensure the patch is incorporated.

If a crash still occurs after using the patch, then a fresh backtrace will be necessary in order to investigate further.

By: Joshua C. Colp (jcolp) 2021-04-27 12:23:18.107-0500

Have you been able to test the attached patch and provide feedback?

By: Robert Sutton (rsutton@noojee.com.au) 2021-04-27 23:14:06.161-0500

No, I haven't had a chance as I am in the middle of moving home.

I'll be back at work next week, I'll do a build with this patch but it may be another week before the build gets to the system which is commonly affected by this bug.


By: Robert Sutton (rsutton@noojee.com.au) 2021-05-13 06:02:28.886-0500

This patch was deployed yesterday, no crashes yet. Give it a couple more days to be sure.

By: Friendly Automation (friendly-automation) 2021-05-17 08:54:16.538-0500

Change 15869 merged by Joshua Colp:
pjsip: Add patch for resolving STUN packet lifetime issues.

[https://gerrit.asterisk.org/c/asterisk/+/15869|https://gerrit.asterisk.org/c/asterisk/+/15869]

By: Friendly Automation (friendly-automation) 2021-05-17 08:54:37.788-0500

Change 15868 merged by Joshua Colp:
pjsip: Add patch for resolving STUN packet lifetime issues.

[https://gerrit.asterisk.org/c/asterisk/+/15868|https://gerrit.asterisk.org/c/asterisk/+/15868]

By: Friendly Automation (friendly-automation) 2021-05-17 08:54:54.450-0500

Change 15882 merged by Joshua Colp:
pjsip: Add patch for resolving STUN packet lifetime issues.

[https://gerrit.asterisk.org/c/asterisk/+/15882|https://gerrit.asterisk.org/c/asterisk/+/15882]