Summary: | ASTERISK-21406: [patch] chan_sip deadlock on monlock between unload_module and do_monitor | ||
Reporter: | Corey Farrell (coreyfarrell) | Labels: | |
Date Opened: | 2013-04-10 19:01:45 | Date Closed: | 2014-03-07 16:59:03.000-0600 |
Priority: | Major | Regression? | |
Status: | Closed/Complete | Components: | Channels/chan_sip/General |
Versions: | 1.8.24.0 11.4.0 | Frequency of Occurrence | Occasional |
Related Issues: | |||
Environment: | Ubuntu/quantal, eglibc-2.15-0ubuntu20 | Attachments: | ( 0) chan_sip-unload-deadlock-backtrace.txt ( 1) chan_sip-unload-deadlock-debug.patch ( 2) chan_sip-unload-testfix.patch |
Description: | unload_module cancels and joins the monitor thread while holding monlock. If do_monitor attempts to lock monlock while unload_module already holds it, the two threads deadlock: do_monitor waits for monlock, while unload_module waits (in pthread_join) for do_monitor to exit.
I've experienced this issue a couple of times in production when attempting to shut down. I found the cause while running valgrind tests; I believe valgrind slowed things down enough to make the deadlock occur fairly reliably. I could not reproduce the issue with lock debugging enabled. I added ast_log messages to unload_module and found that they stopped while monlock was held. The valgrind testing was done with 'make samples', with no changes to /etc/asterisk. I tried attaching gdb once the deadlock occurred, but it could not find symbols (probably because of valgrind). | ||
Comments: | By: Corey Farrell (coreyfarrell) 2013-04-10 19:38:03.436-0500
[^chan_sip-unload-testfix.patch] is a possible fix. At first I did not use sched_yield(); the ast_debug message was printed, but the deadlock was avoided. After adding sched_yield() I have not been able to reproduce either the deadlock or the ast_mutex_trylock failure message. This patch has not been tested with any SIP peers or activity; it was only tested as a fix for this specific issue.

By: David Brillert (aragon) 2013-07-18 08:07:13.596-0500
I might be experiencing the same deadlock. Do you have a gdb trace you can upload so I can compare traces?

By: Corey Farrell (coreyfarrell) 2013-07-31 02:58:57.528-0500
The gdb backtrace is from the 1.8 branch. Thread 5 is do_monitor() waiting for monlock. Thread 16 is attempting to unload chan_sip; it holds monlock and is waiting for do_monitor() to exit (pthread_join). It was built without thread debugging and run within valgrind. I've been unable to reproduce this issue with thread debugging enabled. Thread debugging / deadlock detection adds a lot of code to ast_mutex_lock, and one of those calls must react to pthread_cancel.

By: Corey Farrell (coreyfarrell) 2014-02-25 18:02:37.029-0600
[^chan_sip-unload-deadlock-debug.patch] is not meant to be committed. If you attempt to unload chan_sip while do_monitor is in its delay, it will deadlock every time.

By: Corey Farrell (coreyfarrell) 2014-03-03 13:55:26.426-0600
Review reposted to https://reviewboard.asterisk.org/r/3284/ to switch to my new Review Board username. |
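The lock-ordering problem in the description can be illustrated with a minimal, standalone pthreads sketch. This is not chan_sip code: monlock, do_monitor_sketch(), unload_module_sketch() and the flags are hypothetical stand-ins, and a stop flag checked under the lock replaces pthread_cancel(). The point is the ordering. The unloader publishes the stop request under the lock, then releases the lock before pthread_join(). pthread_mutex_lock() is not among the POSIX cancellation points, so a monitor blocked on the lock cannot be cancelled out of it, and a join performed while still holding that lock can never complete.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins for chan_sip's monlock and monitor thread. */
static pthread_mutex_t monlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_t monitor_thread;
static bool monitor_running = false; /* protected by monlock */
static bool monitor_stop = false;    /* protected by monlock */

/* Monitor loop: checks the stop flag under monlock, then sleeps unlocked. */
static void *do_monitor_sketch(void *arg)
{
    const struct timespec delay = { 0, 1000000 }; /* 1 ms, stands in for the poll delay */
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&monlock);
        if (monitor_stop) {
            pthread_mutex_unlock(&monlock);
            break;
        }
        /* ... periodic work that needs monlock would go here ... */
        pthread_mutex_unlock(&monlock);
        nanosleep(&delay, NULL);
    }
    return NULL;
}

static int start_monitor_sketch(void)
{
    int rc;
    pthread_mutex_lock(&monlock);
    monitor_stop = false;
    rc = pthread_create(&monitor_thread, NULL, do_monitor_sketch, NULL);
    monitor_running = (rc == 0);
    pthread_mutex_unlock(&monlock);
    return rc;
}

/* Unload: set the stop flag and copy the thread handle under monlock, then
   release monlock BEFORE pthread_join(), so the monitor can always acquire
   it once more, see the flag, and exit. */
static int unload_module_sketch(void)
{
    pthread_t thread;
    bool running;

    pthread_mutex_lock(&monlock);
    monitor_stop = true;
    running = monitor_running;
    thread = monitor_thread;
    monitor_running = false;
    pthread_mutex_unlock(&monlock);

    return running ? pthread_join(thread, NULL) : 0;
}

/* Usage: start the monitor, let it run briefly, then unload it. */
static int run_unload_demo(void)
{
    const struct timespec settle = { 0, 20000000 }; /* 20 ms */
    int rc = start_monitor_sketch();
    if (rc != 0)
        return rc;
    nanosleep(&settle, NULL);
    rc = unload_module_sketch();
    assert(!monitor_running); /* unload always leaves the monitor marked stopped */
    return rc;
}
```

Because the monitor only ever waits for monlock while the unloader is not holding it across the join, the unload completes regardless of where the monitor is in its loop, including its delay.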
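The test fix described in the comments is built around ast_mutex_trylock() and sched_yield(). The patch itself is not reproduced here, so the following is only a hypothetical sketch of that general idea in plain pthreads, not its contents: monlock2, stop_requested and the *_sketch functions are invented names, and a stop flag stands in for pthread_cancel(). The monitor never blocks on the lock. When pthread_mutex_trylock() fails it yields and re-checks an atomic stop flag, so an unloader that keeps the reported ordering (joining while holding the lock) can still finish.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins; not the contents of chan_sip-unload-testfix.patch. */
static pthread_mutex_t monlock2 = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool stop_requested = false;
static atomic_int trylock_failures = 0;

static void *monitor_trylock_sketch(void *arg)
{
    const struct timespec delay = { 0, 1000000 }; /* 1 ms */
    (void)arg;
    while (!atomic_load(&stop_requested)) {
        if (pthread_mutex_trylock(&monlock2) != 0) {
            /* Lock busy (possibly held by an unloader that is joining us):
               count it, yield the CPU, and re-check the stop flag instead
               of blocking forever in pthread_mutex_lock(). */
            atomic_fetch_add(&trylock_failures, 1);
            sched_yield();
            continue;
        }
        /* ... periodic work that needs monlock2 would go here ... */
        pthread_mutex_unlock(&monlock2);
        nanosleep(&delay, NULL);
    }
    return NULL;
}

/* Keeps the reported ordering: join while holding the lock. This only
   terminates because the monitor never blocks on monlock2. */
static int unload_holding_lock_sketch(pthread_t monitor)
{
    int rc;
    pthread_mutex_lock(&monlock2);
    atomic_store(&stop_requested, true);
    rc = pthread_join(monitor, NULL);
    pthread_mutex_unlock(&monlock2);
    return rc;
}

/* Usage: start the monitor, let it run briefly, then unload it. */
static int run_trylock_demo(void)
{
    const struct timespec settle = { 0, 20000000 }; /* 20 ms */
    pthread_t monitor;
    int rc;

    atomic_store(&stop_requested, false);
    rc = pthread_create(&monitor, NULL, monitor_trylock_sketch, NULL);
    if (rc != 0)
        return rc;
    nanosleep(&settle, NULL);
    rc = unload_holding_lock_sketch(monitor);
    assert(atomic_load(&stop_requested)); /* the monitor exited on this flag */
    return rc;
}
```

The trade-off is that the monitor spins (yielding) for as long as someone else holds the lock. Releasing the lock before joining avoids that spin entirely, at the cost of changing the unload ordering.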