Summary: ASTERISK-26551: Voicemail deadlock when under load with ODBC backend
Reporter: David Moore (dmoore)
Labels:
Date Opened: 2016-11-03 00:58:55
Date Closed: 2016-11-11 08:43:13.000-0600
Priority: Major
Regression?:
Status: Closed/Complete
Components: Applications/app_voicemail Applications/app_voicemail/ODBC
Versions: 13.11.2
Frequency of Occurrence: Frequent
Related Issues:
Environment: Up-to-date Debian 8 with chan_pjsip. Cloned a production server, enabled debug, then set up automated tests to keep the voicemail system under load in order to reproduce this bug.
Attachments:
( 0) 2016-11-09-00_27_31-backtrace
( 1) 2016-11-09-00_27_31-channels
( 2) 2016-11-09-00_27_31-debug
( 3) 2016-11-09-00_27_31-locks
( 4) 2016-11-09-23_28_59-backtrace
( 5) 2016-11-09-23_28_59-channels
( 6) 2016-11-09-23_28_59-debug
( 7) 2016-11-09-23_28_59-locks
( 8) 2016-11-11-01_28_46-314-backtrace
( 9) 2016-11-11-01_28_46-314-channels
(10) 2016-11-11-01_28_46-314-debug
(11) 2016-11-11-01_28_46-314-locks
(12) ASTERISK-26551.diff
(13) backtrace-threads10.txt
(14) core-show-locks10.txt
(15) dialplan.txt
(16) odbc_show_all.txt
Description: Voicemail-only server (answering incoming calls from a peer with chan_pjsip)

The bug can be reproduced by placing the voicemail system under load; I did this by leaving many simultaneous voicemails through automated means. After running for several hours, the voicemail app locks up. The system will still accept incoming calls and enter the Voicemail app, but no audio is returned and the call does not progress further.

The system eventually recovers and resumes normal operation; this generally takes between 3 and 20 minutes.

I took ten backtraces and ten copies of 'core show locks' during this deadlock on my debug server. I can upload all of them if necessary; I will begin by attaching the latest data set and will upload more upon request, to avoid cluttering things up unnecessarily.
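For reference, a minimal dialplan consistent with this call flow might look like the following (a hypothetical sketch only; the actual dialplan is attached as dialplan.txt):

{code}
[VoiceMail]
; Hypothetical five-digit mailbox pattern; Answer/Wait/VoiceMail at
; priorities 1-3 matches the call flow seen in the attached backtraces.
exten => _XXXXX,1,Answer()
exten => _XXXXX,2,Wait(1)
exten => _XXXXX,3,VoiceMail(${EXTEN}@DMOORETEST,u)
exten => _XXXXX,4,Hangup()
{code}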
Comments:

By: Asterisk Team (asteriskteam) 2016-11-03 00:58:56.278-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: David Moore (dmoore) 2016-11-03 01:01:03.126-0500

Backtrace during the deadlock, and 'core show locks' output from immediately after the backtrace.

By: Rusty Newton (rnewton) 2016-11-03 08:45:53.709-0500

Thanks! It is great that you can reproduce it.

The thing that helps a lot with deadlocks, in addition to "core show locks" output and traces, is a debug log from the time the deadlock occurs.

https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information
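As a minimal sketch (the wiki page above is authoritative), a debug-enabled channel in logger.conf might look like this; then run "core set debug 5" and "logger reload" from the Asterisk CLI:

{code}
; /etc/asterisk/logger.conf -- hypothetical minimal channel setup
[logfiles]
; capture debug and verbose messages alongside the usual levels
full => notice,warning,error,debug,verbose
{code}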

If you can additionally attach and explain the dialplan used for the call flow, as well as any relevant config files, that will help tremendously.

By: David Moore (dmoore) 2016-11-03 08:56:57.888-0500

Dialplan attached; I will work on getting debug output.

By: Joshua C. Colp (jcolp) 2016-11-08 05:48:17.352-0600

Which MySQL ODBC driver are you using? The deadlock itself appears to be a result of the driver:

{code}
#0  0x00007fc5952a4add in read () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fc530946cc4 in vio_read_buff () from /usr/lib/x86_64-linux-gnu/libmysqlclient.so.18
#2  0x00007fc53093680e in ?? () from /usr/lib/x86_64-linux-gnu/libmysqlclient.so.18
#3  0x00007fc53093748d in my_net_read () from /usr/lib/x86_64-linux-gnu/libmysqlclient.so.18
#4  0x00007fc5309308bc in cli_safe_read () from /usr/lib/x86_64-linux-gnu/libmysqlclient.so.18
#5  0x00007fc530934243 in cli_advanced_command () from /usr/lib/x86_64-linux-gnu/libmysqlclient.so.18
#6  0x00007fc53092c571 in mysql_ping () from /usr/lib/x86_64-linux-gnu/libmysqlclient.so.18
#7  0x00007fc530e61e24 in MySQLGetConnectAttr () from /usr/lib/x86_64-linux-gnu/odbc/libmyodbc.so
#8  0x00007fc530e4f8a3 in SQLGetConnectAttrImpl () from /usr/lib/x86_64-linux-gnu/odbc/libmyodbc.so
#9  0x00007fc53292fac8 in SQLGetConnectAttr () from /usr/lib/x86_64-linux-gnu/libodbc.so.2
#10 0x00007fc532b83c6d in connection_dead (connection=0x7fc50c149e18, class=0x2d38cf8) at res_odbc.c:778
{code}

We're doing a connection check to make sure the connection is still alive, and it is hanging while waiting to read data from the remote server.
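To illustrate what that check looks like at the API level, here is a minimal sketch (not the actual res_odbc.c source): the attribute being queried is SQL_ATTR_CONNECTION_DEAD, which this driver services with mysql_ping(), a blocking round trip to the server.

{code}
#include <sql.h>
#include <sqlext.h>

/* Sketch of an ODBC dead-connection check. The MySQL driver answers
 * SQL_ATTR_CONNECTION_DEAD by calling mysql_ping(), which blocks
 * until the server responds. */
static int connection_is_dead(SQLHDBC con)
{
    SQLINTEGER dead = SQL_CD_FALSE;

    if (!SQL_SUCCEEDED(SQLGetConnectAttr(con, SQL_ATTR_CONNECTION_DEAD,
            &dead, 0, NULL))) {
        /* Couldn't retrieve the attribute; assume the worst. */
        return 1;
    }

    return dead == SQL_CD_TRUE;
}
{code}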

By: David Moore (dmoore) 2016-11-08 08:41:55.434-0600

The unixODBC version is 2.3.1-3 from the Debian Jessie repositories; should I be using something else?

By: Joshua C. Colp (jcolp) 2016-11-08 08:45:53.326-0600

We'll support any UnixODBC version, but the MySQL driver itself is versioned and distributed separately, and may have issues of its own that can't be worked around on the Asterisk side.

By: David Moore (dmoore) 2016-11-08 09:00:30.585-0600

The MySQL ODBC driver is v5.1.10-3.

Is there a recommended version of this driver?

By: Joshua C. Colp (jcolp) 2016-11-08 14:24:33.883-0600

The one I used during testing while fixing ODBC support was the Oracle MySQL ODBC connector. It worked fine.

I tried to use the distro-provided one, but it would crash sporadically.

By: David Moore (dmoore) 2016-11-09 00:36:13.625-0600

A more complete example of the deadlock.

By: David Moore (dmoore) 2016-11-09 00:39:00.566-0600

I have uploaded a new backtrace, along with 'core show locks', 'core show channels verbose', the last few minutes of the debug log, and the output of 'odbc show all'.

This is after replacing the stock libmyodbc and unixODBC with the latest from their respective sources, as follows:

mysql-connector-odbc-5.3.6-linux-debian8-x86-64bit from Oracle
unixODBC-2.3.4 from the unixODBC website
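For reference, registering the new connector with unixODBC would hypothetically look something like this in odbcinst.ini (the driver name and library path are assumptions and vary by install):

{code}
[MySQL ODBC 5.3 Unicode Driver]
Description = Oracle MySQL Connector/ODBC 5.3.6
Driver      = /usr/lib/x86_64-linux-gnu/odbc/libmyodbc5w.so
{code}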

Can you confirm that this is still a problem with the MySQL driver?

By: Joshua C. Colp (jcolp) 2016-11-09 08:10:08.329-0600

Seems to be:

{code}
Thread 198 (Thread 0x7fb41b2be700 (LWP 13868)):
#0  0x00007fb4c49afddb in __libc_recv (fd=43, buf=0x7fb43cd6bd10, n=16384, flags=-1) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
#1  0x00007fb46045da8d in inline_mysql_socket_recv (flags=<optimized out>, n=<optimized out>, buf=<optimized out>, mysql_socket=..., src_line=<optimized out>, src_file=<optimized out>) at /export/home/pb2/build/sb_0-17781605-1454370718.35/mysql-5.7.11/include/mysql/psi/mysql_socket.h:823
#2  vio_read (vio=0x7fb43c062780, buf=<optimized out>, size=<optimized out>) at /export/home/pb2/build/sb_0-17781605-1454370718.35/mysql-5.7.11/vio/viosocket.c:123
#3  0x00007fb46045db12 in vio_read_buff (vio=0x2b, buf=0x7fb41630c030 "\001", size=4) at /export/home/pb2/build/sb_0-17781605-1454370718.35/mysql-5.7.11/vio/viosocket.c:166
#4  0x00007fb4604b8333 in net_read_raw_loop (count=4, net=<optimized out>) at /export/home/pb2/build/sb_0-17781605-1454370718.35/mysql-5.7.11/sql/net_serv.cc:665
#5  net_read_packet_header (net=<optimized out>) at /export/home/pb2/build/sb_0-17781605-1454370718.35/mysql-5.7.11/sql/net_serv.cc:755
#6  net_read_packet (net=0x7fb43c8fc8d8, complen=0x7fb41b2ac0c8) at /export/home/pb2/build/sb_0-17781605-1454370718.35/mysql-5.7.11/sql/net_serv.cc:815
#7  0x00007fb4604b869c in my_net_read (net=0x2b) at /export/home/pb2/build/sb_0-17781605-1454370718.35/mysql-5.7.11/sql/net_serv.cc:892
#8  0x00007fb46044a6bb in cli_safe_read_with_ok (mysql=0x7fb43c8fc8d8, parse_ok=1 '\001', is_data_packet=0x0) at /export/home/pb2/build/sb_0-17781605-1454370718.35/mysql-5.7.11/sql-common/client.c:1035
#9  0x00007fb46044f11c in cli_advanced_command (mysql=0x7fb43c8fc8d8, command=<optimized out>, header=<optimized out>, header_length=0, arg=<optimized out>, arg_length=0, skip_check=0 '\000', stmt=0x0) at /export/home/pb2/build/sb_0-17781605-1454370718.35/mysql-5.7.11/sql-common/client.c:1316
#10 0x00007fb460440506 in mysql_ping (mysql=0x2b, mysql@entry=0x7fb43c8fc8d8) at /export/home/pb2/build/sb_0-17781605-1454370718.35/mysql-5.7.11/libmysql/libmysql.c:935
#11 0x00007fb46042fe86 in MySQLGetConnectAttr (hdbc=hdbc@entry=0x7fb43c8fc8d0, attrib=1209, char_attr=char_attr@entry=0x7fb41b2ac2c8, num_attr=num_attr@entry=0x7fb41b2ac480) at /export/home/pb2/build/sb_0-18248528-1457730093.52/mysql-connector-odbc-5.3.6-src/driver/options.c:486
#12 0x00007fb46043dc9d in SQLGetConnectAttrImpl (hdbc=0x7fb43c8fc8d0, attribute=<optimized out>, value=0x7fb41b2ac480, value_max=0, value_len=0x0) at /export/home/pb2/build/sb_0-18248528-1457730093.52/mysql-connector-odbc-5.3.6-src/driver/ansi.c:584
#13 0x00007fb46043dd8b in SQLGetConnectAttr (hdbc=<optimized out>, attribute=<optimized out>, value=<optimized out>, value_max=<optimized out>, value_len=<optimized out>) at /export/home/pb2/build/sb_0-18248528-1457730093.52/mysql-connector-odbc-5.3.6-src/driver/ansi.c:567
#14 0x00007fb4621016f2 in SQLGetConnectAttr (connection_handle=0x2b, attribute=1020706064, value=0x4000, buffer_length=-1, string_length=0x0) at SQLGetConnectAttr.c:742
#15 0x00007fb462357c6d in connection_dead (connection=0x7fb43cb04df8, class=0x38b1478) at res_odbc.c:778
#16 0x00007fb462358087 in _ast_odbc_request_obj2 (name=0x7fb43389f200 <odbc_database> "asterisk2", flags=..., file=0x7fb43369340b "app_voicemail.c", function=0x7fb43369a672 <__PRETTY_FUNCTION__.18641> "messagecount", lineno=5817) at res_odbc.c:851
#17 0x00007fb462358294 in _ast_odbc_request_obj (name=0x7fb43389f200 <odbc_database> "asterisk2", check=0, file=0x7fb43369340b "app_voicemail.c", function=0x7fb43369a672 <__PRETTY_FUNCTION__.18641> "messagecount", lineno=5817) at res_odbc.c:882
#18 0x00007fb43367215e in messagecount (mailbox_id=0x7fb41b2ad630 "80283@DMOORETEST", folder=0x7fb433693d00 "INBOX") at app_voicemail.c:5817
#19 0x00007fb433672469 in has_voicemail (mailboxes=0x7fb41b2ad750 "80283@DMOORETEST", folder=0x0) at app_voicemail.c:5867
#20 0x000000000043e4c5 in ast_app_has_voicemail (mailboxes=0x7fb41b2ad750 "80283@DMOORETEST", folder=0x0) at app.c:707
#21 0x00007fb433679658 in notify_new_message (chan=0x7fb43c23e638, vmu=0x7fb41b2b2a10, vms=0x0, msgnum=0, duration=97, fmt=0x7fb41b2ad6d0 "wav", cidnum=0x7fb43c90c9f0 "+14194389010", cidname=0x7fb43c230bd0 "4194389010", flag=0x7fb41b2b29c0 "") at app_voicemail.c:8073
#22 0x00007fb43367619f in leave_voicemail (chan=0x7fb43c23e638, ext=0x7fb43c84d098 "80283", options=0x7fb41b2bb4e0) at app_voicemail.c:6995
#23 0x00007fb43368685c in vm_exec (chan=0x7fb43c23e638, data=0x7fb41b2bb630 "80283@DMOORETEST,u") at app_voicemail.c:12086
#24 0x000000000057fab8 in pbx_exec (c=0x7fb43c23e638, app=0x4d72190, data=0x7fb41b2bb630 "80283@DMOORETEST,u") at pbx_app.c:485
#25 0x000000000056da04 in pbx_extension_helper (c=0x7fb43c23e638, con=0x0, context=0x7fb43c23eff0 "VoiceMail", exten=0x7fb43c23f040 "80283", priority=3, label=0x0, callerid=0x7fb43c90c9f0 "+14194389010", action=E_SPAWN, found=0x7fb41b2bdcd4, combined_find_spawn=1) at pbx.c:2834
#26 0x0000000000570e3d in ast_spawn_extension (c=0x7fb43c23e638, context=0x7fb43c23eff0 "VoiceMail", exten=0x7fb43c23f040 "80283", priority=3, callerid=0x7fb43c90c9f0 "+14194389010", found=0x7fb41b2bdcd4, combined_find_spawn=1) at pbx.c:4060
#27 0x0000000000571aa4 in __ast_pbx_run (c=0x7fb43c23e638, args=0x0) at pbx.c:4235
#28 0x00000000005731df in pbx_thread (data=0x7fb43c23e638) at pbx.c:4555
#29 0x00000000005f54a1 in dummy_start (data=0x7fb43dc65c10) at utils.c:1235
#30 0x00007fb4c49a90a4 in start_thread (arg=0x7fb41b2be700) at pthread_create.c:309
#31 0x00007fb4c395b62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
{code}

It's blocked waiting on a response from the MySQL server. Since you've updated to the latest UnixODBC, you can also use multiple connections, which may help. This is done using the "max_connections" option in res_odbc.conf.
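For reference, a hypothetical res_odbc.conf class with pooling enabled (the class name matches the "asterisk2" seen in the backtrace; the DSN and credentials are placeholders):

{code}
[asterisk2]
enabled => yes
dsn => asterisk2          ; DSN defined in /etc/odbc.ini
username => asterisk
password => secret
pre-connect => yes
max_connections => 20     ; allow up to 20 pooled connections
{code}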

By: David Moore (dmoore) 2016-11-09 23:51:42.977-0600

We were using a max_connections of 20 for each instance of the connector; we bumped that to 200, and while I left the automated voicemails running, the system deadlocked again. This lasted more than 30 minutes; by the time I got to the console it had 1867 active channels. The system placing the calls was doing no more than 40 simultaneously, yet all the channels stayed open on my debug server.

I am assuming this is another ODBC blockage. I am uploading the same files as before; note that the backtrace is around 4 MB.

These errors were logged in connection with the deadlock; the system stopped responding a minute or two prior to the first:

{code}
[2016-11-09 23:12:36] WARNING[18754] taskprocessor.c: The 'subm:manager_topic-00000007' task processor queue reached 3000 scheduled tasks.
[2016-11-09 23:15:16] WARNING[18754] taskprocessor.c: The 'subm:rtp_topic-000001e6' task processor queue reached 500 scheduled tasks.
{code}

By: David Moore (dmoore) 2016-11-09 23:53:49.211-0600

Backtrace, 'core show channels', 'core show locks', and debug log for the last deadlock, taken after more than 30 minutes of lockup.

'core show locks' does not have useful information; Asterisk did not output it.

All channels listed in the 'core show channels' output are defunct and long since gone away.

By: Joshua C. Colp (jcolp) 2016-11-10 05:18:12.736-0600

Same issue. We're trying to check the connectivity of an ODBC connection to make sure it is still alive, and the MySQL driver is just blocking. I'm not sure there's anything we could try in res_odbc to work around or improve this; we are literally using the method provided by ODBC for doing this. Hmm. I'm attaching a patch which disables using the ODBC method of detecting a dead connection and falls back to doing a test query.

You can apply it using:
{code}
patch -p1 < ASTERISK-26551.diff
{code}
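For illustration, the general shape of a test-query liveness check (a sketch only; the attached ASTERISK-26551.diff is the authoritative change):

{code}
#include <sql.h>
#include <sqlext.h>

/* Sketch: probe the connection with a throwaway statement instead of
 * asking the driver for SQL_ATTR_CONNECTION_DEAD. */
static int connection_is_dead_by_query(SQLHDBC con)
{
    SQLHSTMT stmt = SQL_NULL_HSTMT;
    SQLRETURN res;

    if (!SQL_SUCCEEDED(SQLAllocHandle(SQL_HANDLE_STMT, con, &stmt))) {
        return 1;
    }

    /* Any cheap statement works; "SELECT 1" is the usual choice. */
    res = SQLExecDirect(stmt, (SQLCHAR *) "SELECT 1", SQL_NTS);
    SQLFreeHandle(SQL_HANDLE_STMT, stmt);

    return SQL_SUCCEEDED(res) ? 0 : 1;
}
{code}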

By: Joshua C. Colp (jcolp) 2016-11-10 05:18:32.912-0600

Give this a try and if it works we may be able to make it an option.

By: David Moore (dmoore) 2016-11-11 08:19:42.660-0600

No luck; this deadlocked for about 14 minutes. Uploading an additional backtrace and associated files.

By: Joshua C. Colp (jcolp) 2016-11-11 08:25:00.726-0600

Yet again it's MySQL blocking while waiting to read from the server; this time it's because of a query. From the Asterisk side there's nothing else I can do or adjust. We're simply using ODBC as we should, and the MySQL connector is blocking.

By: David Moore (dmoore) 2016-11-11 08:35:15.872-0600

Thank you very much for your assistance with this issue. Currently I'm planning to try switching from InnoDB to MyISAM for the voicemessages table, and to try the MariaDB connector instead of the MySQL connector; I don't have any ideas beyond those.
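For the connector swap, a hypothetical odbcinst.ini entry for MariaDB Connector/ODBC might look like this (the driver path is an assumption and varies by distribution; the DSN in odbc.ini would then reference Driver = MariaDB):

{code}
[MariaDB]
Description = MariaDB Connector/ODBC
Driver      = /usr/lib/x86_64-linux-gnu/odbc/libmaodbc.so
{code}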

Are there any Asterisk ODBC best practices enumerated somewhere?

By: Joshua C. Colp (jcolp) 2016-11-11 08:42:20.887-0600

Not really; there used to be some for pooling, but we ended up implementing our own pooling.

By: Joshua C. Colp (jcolp) 2016-11-11 08:43:14.043-0600

I'm closing this out since it's not a bug on the Asterisk side.

You might also try posting on the asterisk-users mailing list[1] or the community site[2] to see if others are running a similar configuration and what they are using.

[1] http://lists.digium.com/pipermail/asterisk-users/
[2] https://community.asterisk.org/