[Home]

Summary:ASTERISK-28992: app_voicemail: Deadlock in ODBC when retrieving file
Reporter:Schneur Rosenberg (thesipguy)Labels:patch
Date Opened:2020-07-16 03:40:24Date Closed:
Priority:MajorRegression?
Status:Waiting for Feedback/In ProgressComponents:Applications/app_voicemail/ODBC
Versions:17.5.1 Frequency of
Occurrence
Frequent
Related
Issues:
Environment:Debian 9Attachments:( 0) 0001-app_voicemail-Attempt-to-optimize-mailbox-resequenci.patch
( 1) debug_log_28992
( 2) locks.zip
Description:I'm having a major issue since I upgraded from Asterisk 11 that both voicemail and voicemailmain ocasionally crash, the call remains connected but there is silence, and the channel cannot be killed, hanging up wont help and neither would  "hangup request" and the every few seconds there would be a message in the CLI like WARNING[31353]: chan_sip.c:4410 __sip_autodestruct: Autodestruct on dialog '1216767442-5060-95@BJC.BGI.A.BA' with owner SIP/9174342412-0000016a in place (Method: BYE). Rescheduling destruction for 10000 ms.

Usually once it crashes no other users can get into their voicemail boxes and no new voicemails can be left until i restart asterisk
Comments:By: Asterisk Team (asteriskteam) 2020-07-16 03:40:25.290-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

By: Joshua C. Colp (jcolp) 2020-07-16 05:37:07.117-0500

We suspect that you have a deadlock occurring within Asterisk. Please follow the instructions on the wiki [1] for obtaining debug relevant to a deadlock. Once you have that information, attach it to the issue. Be sure the instructions are followed exactly as the debug may otherwise not be useful.

Thanks!

[1] https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace#GettingaBacktrace-GettingInformationForADeadlock



By: Joshua C. Colp (jcolp) 2020-07-16 05:38:25.073-0500

Additionally - be aware that ODBC support in app_voicemail and chan_sip are both community supported. If the backtrace does show issues there is no timeframe on when they would get looked into if the problem is in those areas.

I would also ask that you specify what database you are using and which connector. If using MySQL then ensure you are using the MariaDB connector, as that works the best for people. The MySQL connector can cause crashes and deadlocks.

By: Schneur Rosenberg (thesipguy) 2020-07-16 07:17:20.053-0500

I enabaled DEBUG_THREADS, MALLOC_DEBUG, DONT_OPTIMIZE and restarted asterisk with -g, when I run sudo /var/lib/asterisk/scripts/ast_coredumper --running --no-default-search, I get a message

No suitable gdb was found in /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

I'm using Mariadb 10.5.4 and the mariadb drivers, I had the same issue with Mariadb 10.1

By: Joshua C. Colp (jcolp) 2020-07-16 07:19:03.682-0500

Do you have the gdb package installed on your system?

By: Schneur Rosenberg (thesipguy) 2020-07-16 07:22:52.302-0500

One more note which might help, the first time I enter the mailbox it tells me the count of the messages and when I press one to listen to the messages it gets stuck after it says "First" but afterwards all future attempts get stuck right after entering the password.

Each call I make becomes a stuck call that cant be released only via a restart

By: Benjamin Keith Ford (bford) 2020-07-17 08:40:20.075-0500

Echoing Josh here, but do you have gdb installed? A backtrace will be required for any additional work to be done on this.

By: Asterisk Team (asteriskteam) 2020-07-31 12:00:02.089-0500

Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidlines [1].

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines

By: Schneur Rosenberg (thesipguy) 2020-08-03 09:26:00.299-0500

Sorry for the delay, here is the relevant files

By: Asterisk Team (asteriskteam) 2020-08-03 09:26:00.917-0500

This issue has been reopened as a result of your commenting on it as the reporter. It will be triaged once again as applicable.

By: Schneur Rosenberg (thesipguy) 2020-11-25 05:10:52.485-0600

Any update? this is rediculous, Voicemail ODBC is completely broken an d no one gives a hoot?

By: Joshua C. Colp (jcolp) 2020-11-25 07:35:43.375-0600

If there are any updates they will be posted on this issue. ODBC support in Voicemail itself is not frequently used by the Asterisk community, most individuals use the normal file based support.

By: Sean Bright (seanbright) 2020-11-25 08:40:25.172-0600

[~thesipguy], please attach your {{res_odbc.conf}} file.

By: Sean Bright (seanbright) 2020-11-25 09:00:43.544-0600

[~thesipguy] - you can also try [this patch|^0001-app_voicemail-Prevent-deadlocks-when-out-of-ODBC-dat.patch] which I believe will resolve the problem regardless of your config. It's against 18 as 17 is no longer supported, so it may not apply cleanly for you.

By: Schneur Rosenberg (thesipguy) 2020-11-26 03:24:53.309-0600

[ENV]

[voicemailodbc]
enabled => yes
dsn => voicemailodbc
username => changedForSecurity
password => changedForSecurity
pre-connect => yes
share_connections => yes
forcecommit => yes


By: Schneur Rosenberg (thesipguy) 2020-11-26 04:38:18.098-0600

Hi, I downloaded the latest version of astersik 18, I ran the patch via
patch -p1 < ../0001-app_voicemail-Prevent-ODBC-related-deadlock-out-of-d.patch    
(the patch was in one folder above) and I still have the same issue, I saw this in the logs
app_voicemail_odbc.c:9121 open_mailbox: Resequencing Mailbox: /var/spool/asterisk/voicemail/default/webcustomer96741/INBOX, expected 129 but found 75 message(s) in box with max threshold of 100.

but then the channel hung.

By: Sean Bright (seanbright) 2020-11-26 10:30:02.801-0600

[~thesipguy] - I've updated [the patch|^0001-app_voicemail-Prevent-deadlocks-when-out-of-ODBC-dat.patch]. Can you try it?

By: Schneur Rosenberg (thesipguy) 2020-11-27 08:32:50.708-0600

After I applied the new patch I get this error when compiling

  [CC] app_voicemail_odbc.c -> app_voicemail_odbc.o
app_voicemail_odbc.c: In function ‘count_messages_in_folder’:
app_voicemail_odbc.c:5799:23: error: ‘obj’ undeclared (first use in this function)
 ast_odbc_release_obj(obj);
                      ^~~
app_voicemail_odbc.c:5799:23: note: each undeclared identifier is reported only once for each function it appears in
app_voicemail_odbc.c: At top level:
cc1: warning: unrecognized command line option ‘-Wno-format-truncation’
/usr/src/asterisk/Makefile.rules:164: recipe for target 'app_voicemail_odbc.o' failed
make[1]: *** [app_voicemail_odbc.o] Error 1
Makefile:387: recipe for target 'apps' failed
make: *** [apps] Error 2


By: Sean Bright (seanbright) 2020-11-27 08:40:42.620-0600

Yes, the new patch is definitely broken, sorry about that.

Try setting {{max_connections = 8}} under {{[voicemailodbc]}} in your config and see if that clears it up.

By: Sean Bright (seanbright) 2020-11-27 11:17:45.734-0600

I reverted the patch to the previous version. I will need an up-to-date backtrace from you while the application is deadlocked to further debug.

By: Sean Bright (seanbright) 2020-12-01 14:48:37.074-0600

Are you also using realtime for voicemail configuration (extconfig.conf mentions "voicemail")?

By: Sean Bright (seanbright) 2020-12-01 15:38:15.229-0600

[~thesipguy] - I've updated [the patch|^0001-app_voicemail-Prevent-deadlocks-when-out-of-ODBC-dat.patch] after identifying the only other place I can see a potential deadlock in app_voicemail (and only if you are also using realtime ODBC). Can you try it? Assuming it works I'll see about getting it merged.

By: Schneur Rosenberg (thesipguy) 2020-12-03 08:05:35.746-0600

Hi Sean, I applied the new patch and it does compile and the voicemail still gets stuck but it does get released after a few minutes, so its only semi fixed, but I did lots of testing and and I'm a bit smarter now, you can now reproduce my issue very easily, the system will only crash when the msgnum isnt sequencial, so if I delete a voicemail from the DB without reordering the msgnum I will see a message in the CLI something like "app_voicemail_odbc.c:9146 open_mailbox: Resequencing Mailbox: /var/spool/asterisk/voicemail/default/odbctest/INBOX, expected 3 but found 2 message(s) in box with max threshold of 100." this could go on for a few minutes depending on how many messages I have in the VM box, and while its reordering the msgnum field the channel will hang, prior to the new patch it would remain hung now it does get released but way after I hang up.

During my testing the system just deleted 50 messages one time when I hung up while it was hanging, also it didnt always delete the .lock file when the channel get release and no new VMs could be left until I manually deleted it

By: Sean Bright (seanbright) 2020-12-03 08:10:57.935-0600

So it sounds like this issue is fixed then? The resequencing issue is the result of manipulating the database directly instead of letting Asterisk do it. Unless I misunderstand.

By: Schneur Rosenberg (thesipguy) 2020-12-03 12:04:58.117-0600

system hanging for 5 minutes when trying to leave or retrieve a messsage, messages getting deleted when system is trying to reorder, mailboxes being locked with the .lock file isnt called fixed, the system should know how to reorder the msgnum without all the above problems, the only thing that is fixed is that I dont need to restart asterisk after everytime it hangs, it releases itself after a few minutes.

You're correct that the problem is that I manipulated the DB directly, but Asterisk should  be able to handle issues like this, not all our customers listen to the VM's through asterisk, my portal pulls the messages directly from the DB and the customers can delete messages from the portal.

I will reorder the msgnum from now on after a delete, but its still a bug that the system gets locked for a few minutes while trying to reorder.

By: Sean Bright (seanbright) 2020-12-03 12:35:59.122-0600

I am not able to reproduce this slow down following the instructions you provided. Can you reproduce and then attach your debug logs as described?

----

We require additional debug to continue with triage of your issue. Please follow the instructions on the wiki [1] for how to collect debugging information from Asterisk. For expediency, where possible, attach the debug with a '.txt' file extension so that the debug will be usable for further analysis.

Thanks!

[1] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information



By: Sean Bright (seanbright) 2020-12-03 13:59:59.560-0600

[~thesipguy], I've attempted to optimizing mailbox resequencing as well. If you could apply both of the attached patches (in order) and then test again and let me know your results that would be helpful.

By: Schneur Rosenberg (thesipguy) 2020-12-21 06:29:45.689-0600

Its not perfect, but its definetly alot better

By: Sean Bright (seanbright) 2020-12-21 15:38:19.862-0600

OK then I need a debug log as described above.

By: Schneur Rosenberg (thesipguy) 2020-12-22 05:25:10.758-0600

Hi, there were only a few relevant lines, I placed a call and when it was silent I hung up,  you can see Autodestruct messages after I hung up then after 35 seconds you can see it trying to play the "vm-youhave" message but its still sending the Autodestruct message, then after 52 seconds it finally hangs up.

its only 11 lines so I'm pasting it instead of attaching a file

[2020-12-22 10:48:31] WARNING[29015] chan_sip.c: Autodestruct on dialog '1812013694-5060-206@BJC.BGI.A.J' with owner SIP/solhome9-0007b73e in place (Method: BYE). Resch
eduling destruction for 10000 ms
[2020-12-22 10:48:38] WARNING[29015] chan_sip.c: Autodestruct on dialog '1812013694-5060-206@BJC.BGI.A.J' with owner SIP/solhome9-0007b73e in place (Method: BYE). Resch
eduling destruction for 10000 ms
[2020-12-22 10:48:44] WARNING[29015] chan_sip.c: Autodestruct on dialog '1812013694-5060-206@BJC.BGI.A.J' with owner SIP/solhome9-0007b73e in place (Method: BYE). Resch
eduling destruction for 10000 ms
[2020-12-22 10:48:50] WARNING[29015] chan_sip.c: Autodestruct on dialog '1812013694-5060-206@BJC.BGI.A.J' with owner SIP/solhome9-0007b73e in place (Method: BYE). Resch
eduling destruction for 10000 ms
[2020-12-22 10:48:57] WARNING[29015] chan_sip.c: Autodestruct on dialog '1812013694-5060-206@BJC.BGI.A.J' with owner SIP/solhome9-0007b73e in place (Method: BYE). Resch
eduling destruction for 10000 ms
[2020-12-22 10:49:03] WARNING[29015] chan_sip.c: Autodestruct on dialog '1812013694-5060-206@BJC.BGI.A.J' with owner SIP/solhome9-0007b73e in place (Method: BYE). Resch
eduling destruction for 10000 ms
[2020-12-22 10:49:10] WARNING[29015] chan_sip.c: Autodestruct on dialog '1812013694-5060-206@BJC.BGI.A.J' with owner SIP/solhome9-0007b73e in place (Method: BYE). Resch
eduling destruction for 10000 ms
[2020-12-22 10:49:16] VERBOSE[10764][C-00067ed5] file.c: <SIP/solhome9-0007b73e> Playing 'vm-youhave.ulaw' (language 'en')
[2020-12-22 10:49:16] WARNING[29015] chan_sip.c: Autodestruct on dialog '1812013694-5060-206@BJC.BGI.A.J' with owner SIP/solhome9-0007b73e in place (Method: BYE). Resch
eduling destruction for 10000 ms
[2020-12-22 10:49:22] WARNING[29015] chan_sip.c: Autodestruct on dialog '1812013694-5060-206@BJC.BGI.A.J' with owner SIP/solhome9-0007b73e in place (Method: BYE). Resch
eduling destruction for 10000 ms
[2020-12-22 10:49:23] VERBOSE[10764][C-00067ed5] pbx.c: Spawn extension (sub_dialing_smart, a, 2) exited non-zero on 'SIP/solhome9-0007b73e'


By: Sean Bright (seanbright) 2020-12-22 07:46:20.451-0600

None of those lines are relevant. You didn't include any debug output (just VERBOSE and WARNING) and I do not see the "Resequencing Mailbox" message that you reported.

By: Schneur Rosenberg (thesipguy) 2020-12-22 08:08:18.805-0600

debug

By: Schneur Rosenberg (thesipguy) 2020-12-22 08:09:23.665-0600

Oh sorry, there was lots of garbage and private info so I filtered out too much, I just attached a file hopefully you can find what you're looking for in the file

By: Sean Bright (seanbright) 2020-12-22 08:22:07.505-0600

And this is with or without the patches I have attached to this issue?

By: Schneur Rosenberg (thesipguy) 2020-12-22 09:05:12.697-0600

With the patches, without the patches it hung the voicemail and users couldn't recieve new voicemails or retrieve the VM, now it hangs but gets released once the reordering is done

By: Sean Bright (seanbright) 2020-12-22 09:09:38.152-0600

And you've not removed anything between these two lines?

{noformat}
[2020-12-22 10:48:21] DEBUG[10764][C-00067ed5] res_odbc.c: Reusing ODBC handle 0x7fc1d470ea20 from class 'voicemailodbc'
[2020-12-22 10:48:25] DEBUG[29015][C-00067ed5] chan_sip.c: **** Received BYE (8) - Command in SIP BYE
{noformat}

By: Schneur Rosenberg (thesipguy) 2020-12-22 09:56:27.693-0600

Nothing with C-00067ed5 in the line, the server is a active server I couldnt bring it down at the moment, so I filtered by C-00067ed5

By: Schneur Rosenberg (thesipguy) 2020-12-22 10:15:46.692-0600

debug 10 seconds

By: Schneur Rosenberg (thesipguy) 2020-12-22 10:17:06.395-0600

I just uploaded a new file which has all the logs between 10:48:20 and 10:48:29, please delete both files from the ticket once you looked at them, as they have some sensitive info

By: Sean Bright (seanbright) 2020-12-22 11:01:59.773-0600

I am going to have to provide another patch that adds more debug output. It is not clear to me why the resequencing step is never completing.

By: Sean Bright (seanbright) 2020-12-22 12:53:59.167-0600

[~thesipguy], [a new patch|^0003-app_voicemail-Add-some-additional-debugging.patch] for you to apply (on top of the existing two) at your convenience. Then I will need a new debug log.

By: Schneur Rosenberg (thesipguy) 2020-12-23 03:47:31.978-0600

The resequencing does complete, my issue is when it takes long and the caller hangs up it wont release the channel until it finished, when there are lots of files in the voicemail and a few are missing it can take a minute or two while its silent and the caller would probably hang up, but asterisk keeps the channel tied up till it eventually finishes.

This is alot better than before the patches, because then the entire Voicemail on the system hung forever

By: Sean Bright (seanbright) 2020-12-23 07:12:42.047-0600

It doesn’t complete according to the logs that I’ve seen, which is why I need you to apply the new patch and then supply logs.

By: Schneur Rosenberg (thesipguy) 2020-12-24 07:08:40.507-0600

I need to revert the patches, there is a new issue with the patched system, it renames voicemail boxes and the names and the timezone to random strings, like

hByS6pTVqBqOOF15
PBfuBrlfiYuV2K13
etc,

it started happeneing after I patched the system, I'm not sure what trigers this change

By: Sean Bright (seanbright) 2020-12-24 09:07:22.609-0600

Alright. I am going to just fix the deadlock since that is the only problem I am able to reliably reproduce. Should have the review up shortly.

By: Schneur Rosenberg (thesipguy) 2021-01-06 04:05:28.070-0600

I ended up blaming the patch on messing up mailbox names, turns out one of our developers was doing tests on a live database and that renamed the fields, I'm ready to apply the patches again and I'll report back

By: Schneur Rosenberg (thesipguy) 2021-02-24 09:00:52.493-0600

I would like to test the patches again but I cant find the patches

By: Sean Bright (seanbright) 2021-02-24 09:18:59.359-0600

The deadlock fix already went into 16, 18, and trunk. The [other patch that attempted to optimize resequencing|^0001-app_voicemail-Attempt-to-optimize-mailbox-resequenci.patch] is now attached.

By: Sean Bright (seanbright) 2021-04-23 11:18:16.705-0500

Do you have any feedback?