[Home]

Summary:ASTERISK-22563: Realtime database connections dropping
Reporter:Ben Smithurst (bensmithurst)Labels:
Date Opened:2013-09-20 07:20:45Date Closed:2013-12-09 18:16:00.000-0600
Priority:MinorRegression?
Status:Closed/CompleteComponents:Channels/chan_sip/General Resources/res_odbc
Versions:11.5.1 Frequency of
Occurrence
Occasional
Related
Issues:
is related toASTERISK-22687 Core Dump On DB Connection Failing with MariaDB
Environment:Attachments:( 0) odbc.ini
( 1) res_odbc.conf
( 2) SIGURG-strace.txt
Description:We are seeing a periodic problem with our database connections dropping, with a log message along these lines:

{code}
[2013-09-20 10:58:24] WARNING[35309] res_odbc.c: SQL Execute returned an error -1: 08S01: [MySQL][ODBC 5.1 Driver][mysqld-5.1.49-1~bpo50+1-log]Lost connection to MySQL server during query (97)
{code}

After finally catching it with strace (attached), it appears to be due to some SIGURG signals being received, interrupting the {{read}} system call.

We are at a bit of a loss to know why this is happening, and are wondering if anyone has seen anything similar and/or can just shed any light on this.

Our investigation so far has concluded that Asterisk sets up a no-op handler for SIGURG, with SA_RESTART, so the {{read}} should be restarted, and indeed it is, but only 3 times -- after that, the socket is closed with {{shutdown}} and then {{close}} (I believe this is inside libmysqlclient).  If the system call *wasn't* restarted, then it looks from the libmysqlclient source, that it should do so anyway, on an EINTR return value.

A tcpdump also confirms that Asterisk is sending a query, and then closing the socket before data is even received back from the server - so it is not caused by a bad query.

There appears to be no easy way to reproduce this, it is *very* intermittent - we are doing some testing of a new server and have various automated tests running 24/7, and on average it happens maybe 2 or 3 times per day.  The problem seems to be often, but not always, on an {{UPDATE sippeers}} query - I am not sure if that is relevant.  We are only using SIP, no other channel types (except Local).

We are not averse to digging into the code to help solve this problem, however if anyone has any input on how the different threads interact in terms of what SIGURG is being used for, it might help us know where to look.  For example, would it be appropriate to block SIGURG using {{sigprocmask}} or similar while doing realtime database queries, or could that cause other problems?

Finally, another oddity is that it takes Asterisk 5 seconds to reconnect - compare the timestamps on e.g. lines 19 and 48 of the strace, this appears to consistent, maybe due to the timeout on the futex call?

Thanks for any input

(internal ref AST-133)
Comments:By: Walter Doekes (wdoekes) 2013-09-23 13:04:55.895-0500

I believe those 5 seconds might stem from this:

http://lists.mysql.com/commits/16150?f=plain
{noformat}
uint my_thread_end_wait_time= 5;
{noformat}

http://bazaar.launchpad.net/~mysql/myodbc/5.1/view/head:/driver/dll.c#L97

^-- libmyodbc is compiled with {{-DNONTHREADSAFE}}, so the seconds is not set to 0.

Finally, in {{my_end()}}, those seconds are slept.

Try this:
{noformat}
#!/bin/sh
set -e
[ -f "`basename "$0"`" ] || exit 2 # run this from current dir only

echo '[*] Getting dependencies'
if ! dpkg -l libmysqlclient-dev | grep -q ^ii; then
( sleep 2; yes ) | apt-get install libmysqlclient-dev
fi

echo '[*] Getting myodbc source'
myodbc="`find . -maxdepth 1 -name 'myodbc-*' -type d`"
if [ ! -d "$myodbc" ]; then
apt-get source libmyodbc
myodbc="`find . -maxdepth 1 -name 'myodbc-*' -type d`"
[ ! -d "$myodbc" ] && echo "Fail!" >&2
fi

echo '[*] Entering source'
cd "$myodbc"

echo '[*] Patching source'
if grep -q '=[^#]*-DNONTHREADSAFE' driver/Makefile.in; then
if ! patch driver/Makefile.in << __EOF__
--- driver/Makefile.in.orig 2012-09-25 16:32:48.282174060 +0200
+++ driver/Makefile.in 2012-09-25 16:32:49.752408865 +0200
@@ -312,7 +312,7 @@
# Linker and preprocessor options                                 #
#                                                                 #
###################################################################
-libmyodbc3_la_CPPFLAGS = -DNONTHREADSAFE
+libmyodbc3_la_CPPFLAGS = # -DNONTHREADSAFE
libmyodbc3_la_LDFLAGS = -release @myodbc_version@ -module
libmyodbc3_r_la_LDFLAGS = -release @myodbc_version@ -module
@WITH_DEBUG_FALSE@LTLIBS_DEPS = ../util/libmyodbc3u.la
__EOF__
then patch driver/Makefile.in << __EOF__
--- driver/Makefile.in.orig 2009-11-05 22:35:15.000000000 +0100
+++ driver/Makefile.in 2013-09-06 09:20:53.000000000 +0200
@@ -290,7 +290,7 @@
# Linker and preprocessor options                                 #
#                                                                 #
###################################################################
-libmyodbc5_la_CPPFLAGS = -DNONTHREADSAFE
+libmyodbc5_la_CPPFLAGS = # -DNONTHREADSAFE
libmyodbc5_la_LDFLAGS = -release @NUMERIC_VERSION@ -module
@WITH_DEBUG_FALSE@LTLIBS_DEPS = ../util/libmyodbc3u.la
@WITH_DEBUG_TRUE@LTLIBS_DEPS = \$(LTLIBS_DEBUG_DEPS)
__EOF__
fi
fi

echo '[*] Configuring source'
if [ ! -f Makefile ]; then
./configure --disable-gui
fi

echo '[*] Building package'
make --silent

echo '[*] Installing package'
sudo make install --silent
sudo ldconfig # probably not needed, but doesn't hurt

echo '[*] Fetch shortest lib and repopulate odbcinst.ini (prefer _r version)'
lib="`find /usr/local/lib -name 'libmyodbc*_r.so' | perl -e 'push @x,$_ for(<>);@x=sort{length $a cmp length $b}@x;print "$x[0]";'`"
[ -z "$lib" ] && lib="`find /usr/local/lib -name 'libmyodbc*.so' | perl -e 'push @x,$_ for(<>);@x=sort{length $a cmp length $b}@x;print "$x[0]";'`"
echo '    Make sure your odbcinst.ini contains this:'
echo
printf '[mysql]\n'
printf 'Description\t= MySQL driver\n'
printf 'Driver\t\t= %s\n' "$lib"
printf 'UsageCount\t= 1\n'
printf 'FileCount\t= 1\n'
printf 'Threading\t= 0\n'
echo
{noformat}

By: Ben Smithurst (bensmithurst) 2013-09-27 10:09:04.432-0500

Thanks Walter, that's interesting to know about the 5 second delay.

Our current workaround for this is to use the MySQL driver, instead of ODBC, and to manually overwrite the {{net.retry_count}} field within the {{MYSQL}} object.  This feels like a dirty hack at best.

I'd be interested to hear any other views on this problem, I don't think it's necessarily a bug in the Asterisk code, but it does cause problems using Realtime, so it would be good to find a way of solving it.

By: Rusty Newton (rnewton) 2013-10-17 19:04:02.157-0500

Ben can you also attach an Asterisk full log excerpt with VERBOSE and DEBUG enabled showing around the time of your log line: {noformat}[2013-09-20 10:58:24] WARNING[35309] res_odbc.c: SQL Execute returned an error -1: 08S01: [MySQL][ODBC 5.1 Driver][mysqld-5.1.49-1~bpo50+1-log]Los{noformat} ?
Probably want to include quite a bit of the log before and after that line depending on how much you have going on.

Thanks!

By: Rusty Newton (rnewton) 2013-10-17 19:06:43.252-0500

Additionally attach the versions of all related ODBC and MySQL libraries involved.

By: JoshE (n8ideas) 2013-10-21 11:04:11.893-0500

@Ben - we are fighting a similar issue.  Are you seeing a core when this disconnect occurs?  If so, see the bug we submitted here: ASTERISK-22687.  Would be interested to know if what's happening is similar.

Is your database server remote to your Asterisk server?

We have been able to repro this in some cases by dropping mysql port between Asterisk and Mysql in iptables and running a simple 'odbc show all'.  Does that scenario cause the same issue on your end?  Can you post your res_odbc.comf?  Also curious if the connection timeout value may cause this or change the failure path.  We had this value set very low, and it seemed to occasionally make things much worse.

Also willing to dig into code here, but it seems like a few of us are fighting this same issue... just wanted to offer the possibility we all sync up on handling this.

Josh

By: Rusty Newton (rnewton) 2013-11-20 09:37:24.893-0600

Ben, it has been a while since we received feedback from you. Is the issue still occurring for you? Are you able to provide the requested logs and information?

By: Rusty Newton (rnewton) 2013-12-09 18:15:53.577-0600

Suspended due to lack of activity. Please request a bug marshal in #asterisk-bugs on the IRC network irc.freenode.net to reopen the issue should you have the additional information requested.  Further information can be found at http://www.asterisk.org/developers/bug-guidelines



By: Ben Smithurst (bensmithurst) 2013-12-10 03:21:43.653-0600

Sorry for the lack of response - our dirty workaround of setting {{retry_count}} is still working for us (and bypassing ODBC completely - not that I think ODBC is the problem, just that using MySQL directly makes it easier to do the workaround).