Summary: | ASTERISK-23119: [patch] main/bridging.c sometimes fails to pthread_join the bridge thread | ||
Reporter: | Corey Farrell (coreyfarrell) | Labels: | |
Date Opened: | 2014-01-08 05:14:37.000-0600 | Date Closed: | 2017-12-18 11:16:51.000-0600 |
Priority: | Major | Regression? | |
Status: | Closed/Complete | Components: | Core/Bridging |
Versions: | 1.8.26.0 11.8.0 | Frequency of Occurrence | |
Related Issues: | |||
Environment: | Attachments: | ( 0) bridging-empty_thread_wait.patch ( 1) leak.txt | |
Description: | When Hangup is received from the last channel in a ConfBridge conference, bridge_array_remove() sets bridge->array_num to 0. This can cause bridge_thread() to clear bridge->thread before ast_bridge_destroy() is run. When that happens ast_bridge_destroy() doesn't call pthread_join().
The only ConfBridge setting used was quiet=yes. I used a SIP phone to place a call to ConfBridge, waited a few seconds then hung up from the phone. This causes the issue almost every time with 1.8 and 11 in valgrind. I have not tested 12 since it doesn't use bridge->array_num as a thread exit condition, and doesn't clear bridge->thread. | ||
Comments: | By: Corey Farrell (coreyfarrell) 2014-01-08 05:38:27.447-0600 [^leak.txt] shows the thread being leaked. By: Matt Jordan (mjordan) 2014-01-08 07:55:51.960-0600 While I agree with the problem, I'm not sure this patch is the right approach. Any time {{usleep}} gets introduced with another looping condition, it feels like we're trying to hack around the problem. As it is, this merely hopes that {{ast_bridge_destroy}} will get called within that time period, which while very likely, doesn't feel like the right approach. The real problem here feels like {{bridge_thread}} setting {{bridge->thread}} to AST_PTHREAD_NULL. It probably shouldn't. * Calling {{pthread_join}} on a terminated thread is acceptable, as it will merely return automatically * Calling {{pthread_kill}} on a terminated thread should simply fail with ESRCH, which is acceptable Not clearing the thread ID in that location should allow {{ast_bridge_destroy}} to join in an acceptable manner. By: Corey Farrell (coreyfarrell) 2014-01-09 02:46:11.782-0600 From bridge_channel_join: {code} case AST_BRIDGE_CHANNEL_STATE_FEATURE: bridge_channel_suspend(bridge_channel->bridge, bridge_channel); ao2_unlock(bridge_channel->bridge); bridge_channel_feature(bridge_channel->bridge, bridge_channel); ao2_lock(bridge_channel->bridge); bridge_channel_unsuspend(bridge_channel->bridge, bridge_channel); break; {code} If the only channel in a bridge get's suspended, this could cause bridge_thread() to return. Without any patch a thread handle is leaked every time a bridge feature (menu option) is used by the only participant. If bridge->thread wasn't cleared then no new thread would be started. I agree that the usleep loop is a hack, but I think that ending the thread when bridge->array_num==0 is bad. By: Joshua C. Colp (jcolp) 2017-12-18 11:01:40.931-0600 Have you seen this after the tons of work that Richard put in for Asterisk 12 and above? By: Corey Farrell (coreyfarrell) 2017-12-18 11:16:31.729-0600 I have not tested newer versions but I just looked at the code and I believe this will no longer be an issue as the bridge thread does not clear it's own thread variable. By: Corey Farrell (coreyfarrell) 2017-12-18 11:16:51.205-0600 I believe this was fixed in 12.0.0 |