Summary: | ASTERISK-16900: [patch] VoicemailMain Exits Without Warning | ||
Reporter: | Leo Brown (netfuse) | Labels: | |
Date Opened: | 2010-11-02 06:46:30 | Date Closed: | 2010-11-12 14:53:11.000-0600 |
Priority: | Minor | Regression? | No |
Status: | Closed/Complete | Components: | Applications/app_voicemail |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) app_voicemail-issue18240-1.6.2.diff.txt ( 1) bug18240.1-6-2.diff.txt ( 2) bug18240-1.6.2-diff2.txt ( 3) bug18240-trunk-2.diff.txt | |
Description: | Hi I've seen problems with VoicemailMain crashing in the past, from not being able to lock paths, open audio files etc. However, I have a VoiceMailMain which is opening a mailbox, but existing after verifying a correct password. The console trace is below, but because Asterisk is not crashing, I can not provide a backtrace. Could you please give me some guidance on how to further debug, as debug level 10 (shown below) only shows it locking/unlocking paths and then exiting. I have checked all permissions and even removed mailboxes and let Asterisk create the paths itself. Voicemail() works fine, but VoicemailMain does not. The relevant entry from voicemail.conf is also Leo ****** ADDITIONAL INFORMATION ****** Console output: [Nov 2 07:22:43] DEBUG[19478]: app_voicemail.c:9355 vm_execmain: After vm_authenticate [Nov 2 07:22:43] DEBUG[19478]: app_voicemail.c:9398 vm_execmain: Before open_mailbox [Nov 2 07:22:43] DEBUG[19478]: app.c:1249 ast_lock_path_lockfile: Locked path '/var/spool/asterisk/voicemail/default/10350/Old' [Nov 2 07:22:43] DEBUG[19478]: app.c:1266 ast_unlock_path_lockfile: Unlocked path '/var/spool/asterisk/voicemail/default/10350/Old' [Nov 2 07:22:43] DEBUG[19478]: app.c:1249 ast_lock_path_lockfile: Locked path '/var/spool/asterisk/voicemail/default/10350/Old' [Nov 2 07:22:43] DEBUG[19478]: app.c:1266 ast_unlock_path_lockfile: Unlocked path '/var/spool/asterisk/voicemail/default/10350/Old' [Nov 2 07:22:43] DEBUG[19478]: pbx.c:4333 __ast_pbx_run: Spawn extension (from-internal,*98,11) exited non-zero on 'Local/*98@from-internal-c32e;2' voicemaill.conf: [default] 10350 => 1350,VM - GBG,test@example.com,,attach=yes|saycid=no|envelope=no|delete=yes CLI output: pbx*CLI> voicemail show users Context Mbox User Zone NewMsg default 10350 VM - GBG 0 | ||
Comments: | By: Leo Brown (netfuse) 2010-11-02 09:23:47 Hi I have found that it's this code that is causing the return: /* Retrieve urgent, old and new message counts */ ast_debug(1, "Before open_mailbox\n"); res = open_mailbox(&vms, vmu, OLD_FOLDER); /* Count all messages, even Urgent */ if (res < 0){ goto out; } I.e. res < 0. That's returned from open_mailbox, here: /* for local storage, checks directory for messages up to maxmsg limit */ last_msg = last_message_index(vmu, vms->curdir); ast_unlock_path(vms->curdir); if (last_msg < 0) { return last_msg; } Which of course, is from last_message_index, which is returning -1 because there are no files in the Old/ directory - which seems to be the correct operation. So either I'm reading the source wrong, or this is intentionally dying because there are no old messages due to some sort of strange bug uploaded into 1.6.2 branch? :| By: Leif Madsen (lmadsen) 2010-11-02 09:37:10 You can produce a backtrace still by attaching to the existing process (per Google: http://gcc.gnu.org/onlinedocs/gnat_ugn_unw/Attaching-to-a-Running-Process.html) You should also provide 'core show locks' as well, which will require you to have Compiler Flags -> DEBUG_THREADS enabled. ~~~~~~~~~~~~~~~~~~~~~~~ Thank you for your bug report. In order to move your issue forward, we require a backtrace from the core file produced after the crash. Please see the doc/backtrace.txt file in your Asterisk source directory. Also, be sure you have DONT_OPTIMIZE enabled in menuselect within the Compiler Flags section, then: make install after enabling, reproduce the crash, and then execute the instructions in doc/backtrace.txt. When complete, attach that file to this issue report. Thanks! ~~~~~~~~~~~~~~~~~~~~~~~~~~ Debugging deadlocks: Please select DEBUG_THREADS and DONT_OPTIMIZE in the Compiler Flags section of menuselect. Recompile and install Asterisk (i.e. make install) This will then give you the console command: core show locks When the symptoms of the deadlock present themselves again, please provide output of the deadlock via: # asterisk -rx "core show locks" | tee /tmp/core-show-locks.txt # gdb -se "asterisk" <pid of asterisk> | tee /tmp/backtrace.txt gdb> bt gdb> bt full gdb> thread apply all bt Then attach the core-show-locks.txt and backtrace.txt files to this issue. Thanks! By: Leo Brown (netfuse) 2010-11-02 09:41:43 Hi Thanks for the info - I have reverted branches/1.6.2 to revision 272642 (where I have a stable system) and this functions fine. Please see my logic breakdown above - I determined this by placing debug info in the relevant areas (there was not much debug code in there). It seems that the issue is that the app_voicemail is detecting -1 as a problem with locking the path, rather than there being no messages to read. The (working) code in revision 272642 shows instead: /* Retrieve urgent, old and new message counts */ ast_debug(1, "Before open_mailbox\n"); res = open_mailbox(&vms, vmu, OLD_FOLDER); /* Count all messages, even Urgent */ if (res == ERROR_LOCK_PATH) goto out; Which does not falsely exit. Leo By: Michael L. Young (elguero) 2010-11-02 15:32:30 This looks like a regression to me. It appears there was a change in logic resulting in the exiting from app_voicemail when a folder has no messages. 1.4 r293004 1.6.2 r293118 1.8 r293119 trunk r293120 I am not sure if the attached patch is the right solution but it seemed like the most logical fix, unless there is some other part of the code that I am not seeing at this time. last_msg will always be 0 or greater than, or -1 when there is no index (no messages). By: Freddi Hansen (freddi_fonet) 2010-11-07 18:16:34.000-0600 Hi, I had same problem, used your patch on 1.6.2. rel 293969, it works fine now. thanks. By: Alec Davis (alecdavis) 2010-11-09 15:56:29.000-0600 When using FILE based storage, VoicemailMain asks for boxnumber and password, then abrutly terminates call. Confirm with trunk and 1.6.2svn uploaded bug18240.1-6-2.diff.txt findings so far, not necessarily correct. As I understand it, last_message_index returns. > -1 for last message (0 if only 1 message exists) -1 for 0 messages < -1 for errors Also in the patch, is a possible missed unlock, IMAP code path. By: Alec Davis (alecdavis) 2010-11-10 15:29:41.000-0600 With a simple debug print in the last_msg < 0 test it's failing on checking the urgent mailbox. Nov 11 10:06:16] NOTICE[26106]: app_voicemail.c:7420 open_mailbox: ALEC Mailbox: /var/spool/asterisk/voicemail/default/8612/Urgent, last_msg -1 By: Alec Davis (alecdavis) 2010-11-10 16:44:54.000-0600 in 1.6.2 branch r43829, this is where the logic started going wrong. @@ -7408,7 +7409,7 @@ if (vm_lock_path(vms->curdir)) { ast_log(AST_LOG_ERROR, "Could not open mailbox %s: mailbox is locked\n", vms->curdir); - return -1; + return ERROR_LOCK_PATH; } Then from r293118 everywhere else; res = open_mailbox(&vms, vmu, 11); /*11 is the Urgent folder */ - if (res < 0) + if (res == ERROR_LOCK_PATH) By: Alec Davis (alecdavis) 2010-11-10 18:42:45.000-0600 as elguero in ~128751 points out, regression since trunk r293120 uploaded 1.6.2 patch bug18240-1.6.2-diff2.txt in open_mailbox, if vm_lock_path fails return ERROR_LOCK_PATH instead of -1 reverts the ERROR_LOCK_PATH changes unrelated missed unlock By: Jeff Peeler (jpeeler) 2010-11-11 18:17:04.000-0600 Sorry about the mistake! The second patch, bug18240.1-6-2.diff.txt, will be the one committed. But just in case I missed something has anybody found any problems with that one? By: Michael L. Young (elguero) 2010-11-11 18:51:22.000-0600 Ah... I see what I missed.. I failed to see that count_messages could return ERROR_LOCK_PATH inside open_mailbox. I would say that bug18240.1-6-2.diff.txt would be the correct patch. I am going to test this out right now and let you know the results. By: Michael L. Young (elguero) 2010-11-11 19:16:35.000-0600 I applied alecdavis' changes to trunk (bug18240-trunk-2.diff.txt) and it was working properly. Thanks alecdavis By: Alec Davis (alecdavis) 2010-11-11 20:06:21.000-0600 jpeeler: I hope you meant this one 1.6.2 patch bug18240-1.6.2-diff2.txt sorry for the very similar names. The only negative value at the moment that open_mailbox returns is -1 for 'no messages', so checking for < -1, which I initially suggested was wrong. open_mailbox is supposed to return ERROR_LOCK_PATH (-100) if the vm_lock_path fails, and -1 for no messages. By: Alec Davis (alecdavis) 2010-11-11 20:13:47.000-0600 elguero: Looks like you concur, bug18240-trunk-2.diff.txt is from bug18240-1.6.2-diff2.txt, not as ~128807 suggests bug18240.1-6-2.diff.txt jpeeler: maybe actually both are right - confuse everyone. if < -1 then an error (currently only -100). if -1 then 0 messages if >= 0 then 1 or more messages By: Michael L. Young (elguero) 2010-11-11 21:14:03.000-0600 alecdavis: You are correct, I copied the filename in jpeeler's note but I actually applied the second one as you already noted. What part of the code are you talking about in regards to the if < -1? The different code paths depending on the backend does seem to get confusing. Inside open_mailbox function I understand the following: count_messages will return: -100 if there is a locking error (filesystem path) 0 or greater (filesystem path) 0 or greater (odbc path) last_message_index will only return: 0 or greater than for index (1 or more messages) -1 to indicate no index (no message) filesystem path - appears that -1 returned when there is an error with opendir() but it will just be treated the same as not having an index instead of determining locking error because that function should only be called once a lock has already been set. So, I am not sure if that should be changed or not since the locking should have already been handled. At least this is how I am understanding it. So, I think your second patch was the correct fix if what we are checking for is locking errors for the filesystem storage path and exiting. By: Alec Davis (alecdavis) 2010-11-11 22:40:03.000-0600 sorry I was too vague, lets forget about the < -1. Where I was coming from: The existing code is res = open_mailbox(.... if (res < 0) Where the reverted code should be res = open_mailbox(.... if (res == ERROR_LOCK_PATH) But could be, to allow for future errors. res = open_mailbox(.... if (res < -1 ) By: Jeff Peeler (jpeeler) 2010-11-12 10:49:44.000-0600 Yes, the old code with ERROR_LOCK_PATH did not account for things that it should have. And I added more code that is critical, so just going back to that is not the proper fix. I really did mean the patch I listed. Examining the different behaviors of the backends is indeed confusing and I don't even want to type it out. Basically there is confusion between -1 meaning error and the mail index returning -1 for 0 messages. However, given that -1 is a widely used convention of meaning error (and is used here in voicemail) the checking of < -1 should be limited as the second patch has done. By: Digium Subversion (svnbot) 2010-11-12 14:49:11.000-0600 Repository: asterisk Revision: 294903 U branches/1.4/apps/app_voicemail.c ------------------------------------------------------------------------ r294903 | jpeeler | 2010-11-12 14:49:10 -0600 (Fri, 12 Nov 2010) | 16 lines Fix regression causing abort in voicemail after opening a mailbox with no mesgs. In order to be more safe, some error handling code was changed to respect more error conditions including the potential memory allocation failure for deleted and heard message tracking introduced in 293004. However, last_message_index returns -1 for zero messages (perhaps as expected) and was triggering the stricter error checking. Because last_message_index is only called directly in one place, just return 0 from open_mailbox (for file based storage) when no messages are detected unless a real error has occurred. (closes issue ASTERISK-16900) Reported by: leobrown Patches: bug18240.1-6-2.diff.txt uploaded by alecdavis (license 585) Tested by: pabelanger ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=294903 By: Digium Subversion (svnbot) 2010-11-12 14:51:16.000-0600 Repository: asterisk Revision: 294904 _U branches/1.6.2/ U branches/1.6.2/apps/app_voicemail.c ------------------------------------------------------------------------ r294904 | jpeeler | 2010-11-12 14:51:16 -0600 (Fri, 12 Nov 2010) | 23 lines Merged revisions 294903 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r294903 | jpeeler | 2010-11-12 14:49:09 -0600 (Fri, 12 Nov 2010) | 16 lines Fix regression causing abort in voicemail after opening a mailbox with no mesgs. In order to be more safe, some error handling code was changed to respect more error conditions including the potential memory allocation failure for deleted and heard message tracking introduced in 293004. However, last_message_index returns -1 for zero messages (perhaps as expected) and was triggering the stricter error checking. Because last_message_index is only called directly in one place, just return 0 from open_mailbox (for file based storage) when no messages are detected unless a real error has occurred. (closes issue ASTERISK-16900) Reported by: leobrown Patches: bug18240.1-6-2.diff.txt uploaded by alecdavis (license 585) Tested by: pabelanger ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=294904 By: Digium Subversion (svnbot) 2010-11-12 14:52:07.000-0600 Repository: asterisk Revision: 294905 _U branches/1.8/ U branches/1.8/apps/app_voicemail.c ------------------------------------------------------------------------ r294905 | jpeeler | 2010-11-12 14:52:06 -0600 (Fri, 12 Nov 2010) | 30 lines Merged revisions 294904 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.6.2 ................ r294904 | jpeeler | 2010-11-12 14:51:15 -0600 (Fri, 12 Nov 2010) | 23 lines Merged revisions 294903 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r294903 | jpeeler | 2010-11-12 14:49:09 -0600 (Fri, 12 Nov 2010) | 16 lines Fix regression causing abort in voicemail after opening a mailbox with no mesgs. In order to be more safe, some error handling code was changed to respect more error conditions including the potential memory allocation failure for deleted and heard message tracking introduced in 293004. However, last_message_index returns -1 for zero messages (perhaps as expected) and was triggering the stricter error checking. Because last_message_index is only called directly in one place, just return 0 from open_mailbox (for file based storage) when no messages are detected unless a real error has occurred. (closes issue ASTERISK-16900) Reported by: leobrown Patches: bug18240.1-6-2.diff.txt uploaded by alecdavis (license 585) Tested by: pabelanger ........ ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=294905 By: Digium Subversion (svnbot) 2010-11-12 14:53:09.000-0600 Repository: asterisk Revision: 294907 _U trunk/ U trunk/apps/app_voicemail.c ------------------------------------------------------------------------ r294907 | jpeeler | 2010-11-12 14:53:09 -0600 (Fri, 12 Nov 2010) | 37 lines Merged revisions 294905 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.8 ................ r294905 | jpeeler | 2010-11-12 14:52:06 -0600 (Fri, 12 Nov 2010) | 30 lines Merged revisions 294904 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.6.2 ................ r294904 | jpeeler | 2010-11-12 14:51:15 -0600 (Fri, 12 Nov 2010) | 23 lines Merged revisions 294903 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r294903 | jpeeler | 2010-11-12 14:49:09 -0600 (Fri, 12 Nov 2010) | 16 lines Fix regression causing abort in voicemail after opening a mailbox with no mesgs. In order to be more safe, some error handling code was changed to respect more error conditions including the potential memory allocation failure for deleted and heard message tracking introduced in 293004. However, last_message_index returns -1 for zero messages (perhaps as expected) and was triggering the stricter error checking. Because last_message_index is only called directly in one place, just return 0 from open_mailbox (for file based storage) when no messages are detected unless a real error has occurred. (closes issue ASTERISK-16900) Reported by: leobrown Patches: bug18240.1-6-2.diff.txt uploaded by alecdavis (license 585) Tested by: pabelanger ........ ................ ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=294907 |