[Home]

Summary:ASTERISK-23119: [patch] main/bridging.c sometimes fails to pthread_join the bridge thread
Reporter:Corey Farrell (coreyfarrell)Labels:
Date Opened:2014-01-08 05:14:37.000-0600Date Closed:2017-12-18 11:16:51.000-0600
Priority:MajorRegression?
Status:Closed/CompleteComponents:Core/Bridging
Versions:1.8.26.0 11.8.0 Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) bridging-empty_thread_wait.patch
( 1) leak.txt
Description:When Hangup is received from the last channel in a ConfBridge conference, bridge_array_remove() sets bridge->array_num to 0.  This can cause bridge_thread() to clear bridge->thread before ast_bridge_destroy() is run.  When that happens ast_bridge_destroy() doesn't call pthread_join().

The only ConfBridge setting used was quiet=yes.  I used a SIP phone to place a call to ConfBridge, waited a few seconds then hung up from the phone.  This causes the issue almost every time with 1.8 and 11 in valgrind.  I have not tested 12 since it doesn't use bridge->array_num as a thread exit condition, and doesn't clear bridge->thread.
Comments:By: Corey Farrell (coreyfarrell) 2014-01-08 05:38:27.447-0600

[^leak.txt] shows the thread being leaked.

By: Matt Jordan (mjordan) 2014-01-08 07:55:51.960-0600

While I agree with the problem, I'm not sure this patch is the right approach. Any time {{usleep}} gets introduced with another looping condition, it feels like we're trying to hack around the problem. As it is, this merely hopes that {{ast_bridge_destroy}} will get called within that time period, which while very likely, doesn't feel like the right approach.

The real problem here feels like {{bridge_thread}} setting {{bridge->thread}} to AST_PTHREAD_NULL. It probably shouldn't.
* Calling {{pthread_join}} on a terminated thread is acceptable, as it will merely return automatically
* Calling {{pthread_kill}} on a terminated thread should simply fail with ESRCH, which is acceptable

Not clearing the thread ID in that location should allow {{ast_bridge_destroy}} to join in an acceptable manner.




By: Corey Farrell (coreyfarrell) 2014-01-09 02:46:11.782-0600

From bridge_channel_join:
{code}
case AST_BRIDGE_CHANNEL_STATE_FEATURE:
bridge_channel_suspend(bridge_channel->bridge, bridge_channel);
ao2_unlock(bridge_channel->bridge);
bridge_channel_feature(bridge_channel->bridge, bridge_channel);
ao2_lock(bridge_channel->bridge);
bridge_channel_unsuspend(bridge_channel->bridge, bridge_channel);
break;
{code}

If the only channel in a bridge get's suspended, this could cause bridge_thread() to return.  Without any patch a thread handle is leaked every time a bridge feature (menu option) is used by the only participant.  If bridge->thread wasn't cleared then no new thread would be started.

I agree that the usleep loop is a hack, but I think that ending the thread when bridge->array_num==0 is bad.

By: Joshua C. Colp (jcolp) 2017-12-18 11:01:40.931-0600

Have you seen this after the tons of work that Richard put in for Asterisk 12 and above?

By: Corey Farrell (coreyfarrell) 2017-12-18 11:16:31.729-0600

I have not tested newer versions but I just looked at the code and I believe this will no longer be an issue as the bridge thread does not clear it's own thread variable.

By: Corey Farrell (coreyfarrell) 2017-12-18 11:16:51.205-0600

I believe this was fixed in 12.0.0