Summary: | ASTERISK-24451: chan_iax2: reference leak in sched_delay_remove | ||
Reporter: | Corey Farrell (coreyfarrell) | Labels: | |
Date Opened: | 2014-10-26 18:43:10 | Date Closed: | 2015-02-18 20:00:15.000-0600 |
Priority: | Minor | Regression? | |
Status: | Closed/Complete | Components: | Channels/chan_iax2 |
Versions: | 11.16.0 13.0.0-beta3 | Frequency of Occurrence | |
Related Issues: | |||
Environment: | Attachments: | ( 0) refs.txt | |
Description: | chan_iax2 leaks references to {{peercnt}} when Asterisk is shutdown within 60 seconds of calling {{sched_delay_remove}}. This issue was found with the testsuite using ASTERISK-24379. Many tests produce the leak.
I don't know chan_iax2 but am reporting this in the hopes that someone can address it so we can get the full testsuite leak free. | ||
Comments: | By: Corey Farrell (coreyfarrell) 2014-10-26 18:46:26.103-0500 Output from refcounter.py -n for {{tests/callparking}}. The full refs log is 57mb, if it is needed we will need to find a way for me to post it (JIRA limit is 10MB). By: Matt Jordan (mjordan) 2014-10-28 11:48:05.179-0500 So the weird thing is that reference bump is balanced. The variable in question is {{peercnt}}: {code} 3033: if ((peercnt = ao2_find(peercnts, &tmp, OBJ_POINTER))) { 3034: /* refcount is incremented with ao2_find. keep that ref for the scheduler */ 3035: ast_debug(1, "schedule decrement of callno used for %s in %d seconds\n", ast_sockaddr_stringify_addr(addr), MIN_REUSE_TIME); 3036: i = iax2_sched_add(sched, MIN_REUSE_TIME * 1000, peercnt_remove_cb, peercnt); 3037: if (i == -1) { 3038: ao2_ref(peercnt, -1); 3039: } 3040: } {code} The variable leaking is {{peercnt}}. The function that should drop the reference is {{peercnt_remove_cb}}, which should be called so long as {{iax2_sched_add}} returns anything but {{-1}}, in which case we drop the reference ourselves. {{iax2_sched_add}} calls {{ast_sched_add}} - which returns {{-1}} on failure and the scheduled ID if the scheduler succeeded. So the question is - why is the callback function {{peercnt_remove_cb}} not getting called? Does a {{gdb backtrace}} of the running Asterisk instance show any weird locking situations? The other option would be to put some debug into {{chan_iax2}} and try to find out why that callback isn't being run, or if the scheduler isn't firing for some reason. By: Corey Farrell (coreyfarrell) 2014-11-04 07:16:16.350-0600 {code} #define MIN_REUSE_TIME 60 /* Don't reuse a call number within 60 seconds */ {code} We could be seeing the result of shutdown happening less than 60 seconds after the call to sched_delay_remove. This still poses an issue for the {{REF_DEBUG}} testsuite, causing 10 tests to fail. I realize this probably isn't a "real" production leak, but it's existence reduces the effectiveness of brute force leak checking. Maybe this could be solved by saving the result of {{iax2_sched_add}} to a field in {{struct peercnt}}, iterate peercnts at shutdown and decrement? By: Y Ateya (yateya) 2015-01-17 12:52:59.192-0600 This might be the cause for ASTERISK-24478. The callback is not called after stressing IAX connection. By: Corey Farrell (coreyfarrell) 2015-02-15 13:29:15.596-0600 I believe I have a fix for this issue. I'm currently doing some tests. [~yateya]: I don't think your issue is related. This issue resolves itself after a wait of 60 seconds, but that is a problem for automated testing where Asterisk is quickly shutdown. By: Corey Farrell (coreyfarrell) 2015-02-15 14:05:43.805-0600 Posted to reviewboard: https://reviewboard.asterisk.org/r/4425/ |