Summary: ASTERISK-15776: [patch] Crash in app_voicemail.c in function retrieve_file (Read out in small chunks)
Reporter: Kristijan Vrban (vrban)
Labels:
Date Opened: 2010-03-09 03:08:29.000-0600
Date Closed: 2015-02-25 21:53:04.000-0600
Priority: Minor
Regression? No
Status: Closed/Complete
Components: Applications/app_voicemail/ODBC
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) bt_full.txt
( 1) core_with_libmyodbc_5_1_6.txt
( 2) crasher2.c
( 3) odbc_limit_retry.patch
( 4) p_crasher.c
( 5) p_crasher2.c
( 6) with_libmyodbc_debug.txt
Description:
Every few days, one of my Asterisk 1.4.29 servers crashes with this error. See bt_full.txt
Asterisk was compiled with DONT_OPTIMIZE, so all needed information should be visible.

****** ADDITIONAL INFORMATION ******

OS: Ubuntu 8.04 LTS
Package libmyodbc is version: 3.51.27r695

Comments:

By: Tilghman Lesher (tilghman) 2010-03-10 00:25:25.000-0600

Please upgrade to the latest version of the Mysql-ODBC-Connector from http://dev.mysql.com/downloads/connector/odbc/

This will almost always fix crash issues related to old packages that have already been fixed upstream.

By: Kristijan Vrban (vrban) 2010-03-10 03:03:50.000-0600

Hello tilghman, thanks for the quick answer. But 3.51.27 is already the latest release of version 3: http://dev.mysql.com/downloads/connector/odbc/3.51.html

Do I need version 5.1.6?

I specifically compiled Asterisk with DONT_OPTIMIZE and put it into production to get an informative core dump. Have you taken a quick look at it? Just to be sure that it's really a libmyodbc issue, and not app_voicemail.

By: Tilghman Lesher (tilghman) 2010-03-10 09:38:28.000-0600

It's worth a try; however, I still would recommend downloading the source from MySQL and compiling it, as distributions have a history of adding patches to the MySQL ODBC connector which compromise the stability of that driver.

I have taken a quick look at your backtrace, and I am of the opinion that this is a driver issue, not an Asterisk issue.

By: Kristijan Vrban (vrban) 2010-03-17 04:23:44

OK, I have now built and installed a libmyodbc deb with the 5.1.6 Mysql-ODBC-Connector lib.

If I don't get this core dump in the next 30 days, then this really was an issue with the Debian/Ubuntu libmyodbc package. I will then inform the Debian maintainer of this lib, Steve Langasek <vorlon@debian.org>, so that he can reconsider the patches that are added to the MySQL ODBC connector.

By: Kristijan Vrban (vrban) 2010-03-17 10:21:37

Hmm, it also crashes with libmyodbc 5.1.6; see core_with_libmyodbc_5_1_6.txt
But now I can reproduce the issue, so if I should include any patch to get more information, let me know.

I suspect that this issue happens only with a higher number of concurrent ODBC connections, because it crashes only when my sipp stress test makes more than 20 concurrent calls into voicemail.



By: Tilghman Lesher (tilghman) 2010-03-17 11:56:34

Okay, so now that you've verified the issue in upstream MySQL, please file a bug with them:

http://bugs.mysql.com/

Please set a link to the issue filed there, so we can follow progress on that issue from this issue.

By: Kristijan Vrban (vrban) 2010-03-19 04:31:21

I uploaded with_libmyodbc_debug.txt, where libmyodbc was also compiled with -O0 and not stripped. Perhaps someone can read some useful information from it. I also opened a parallel bug report at: http://bugs.mysql.com/bug.php?id=52212



By: Kristijan Vrban (vrban) 2010-04-09 05:33:21

JFYI:
As a workaround, when I use:

pooling=>yes
limit=>1

then it does not crash.

By: Kristijan Vrban (vrban) 2010-04-09 08:00:30

I attached a small test tool that also uses ODBC to fetch the same busy blob file from the same DB I use. With this test tool it does not crash, although it fetches many more blobs from the DB than res_odbc does.

What is the difference between the way the test tool fetches the blob and the way res_odbc/app_voicemail_odbc does? The test tool gets the blobs in sequence. Perhaps res_odbc/app_voicemail_odbc fetches them in parallel, and this is the problem? (my assumption)



By: Tilghman Lesher (tilghman) 2010-04-09 10:05:26

Yes, the Asterisk client could indeed fetch results in parallel.  This might be exactly the issue that is occurring, and the fact that turning on pooling works around the problem is an indication that the MySQL library is not thread-safe when sharing a connection.

By: Kristijan Vrban (vrban) 2010-04-09 10:09:24

I also attached p_crasher.c, where I fetch 20 blobs in parallel, using pthreads on the same ODBC connection. That also runs with no problems.

So something _must_ be special about res_odbc/app_voicemail_odbc.

In the meantime I also tested with Asterisk trunk; no difference.

By: Tilghman Lesher (tilghman) 2010-04-09 11:18:57

That does not necessarily create parallel instances.  If the operation can be completed in a single context switch (and it most likely can), then while you are creating 20 separate threads, they are still done in sequence.  You'll need to use condition variables to pause all threads until all are started, and a pthread_cond_broadcast after all are started, and sched_yield() within each thread at various intervals to correctly simulate parallel queries.
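The start gate described in this comment could look roughly like the following C sketch using POSIX threads. It is not taken from p_crasher.c or from Asterisk: the names `start_gate`, `gate_worker`, `run_gated_workers` and the two constants are assumptions made for illustration, and the ODBC fetch is replaced by a shared counter so that the example runs without a database.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

#define GATE_THREADS 20  /* matches the 20 parallel fetches mentioned above */
#define GATE_CHUNKS  8   /* hypothetical number of chunks per blob */

struct start_gate {
    pthread_mutex_t lock;
    pthread_cond_t  all_ready;
    int ready;        /* workers that have reached the gate */
    int open;         /* set once every worker is waiting */
    int chunks_done;  /* total chunks processed by all workers */
};

static void *gate_worker(void *arg)
{
    struct start_gate *g = arg;
    assert(g != NULL);

    /* Park at the gate; the last worker to arrive releases everyone. */
    pthread_mutex_lock(&g->lock);
    if (++g->ready == GATE_THREADS) {
        g->open = 1;
        pthread_cond_broadcast(&g->all_ready);
    }
    while (!g->open)
        pthread_cond_wait(&g->all_ready, &g->lock);
    pthread_mutex_unlock(&g->lock);

    for (int i = 0; i < GATE_CHUNKS; i++) {
        /* A real test would fetch one chunk of the blob here. */
        pthread_mutex_lock(&g->lock);
        g->chunks_done++;
        pthread_mutex_unlock(&g->lock);
        sched_yield();  /* invite other threads to run between chunks */
    }
    return NULL;
}

/* Returns the total number of chunks processed by all workers. */
int run_gated_workers(void)
{
    struct start_gate g = { PTHREAD_MUTEX_INITIALIZER,
                            PTHREAD_COND_INITIALIZER, 0, 0, 0 };
    pthread_t tid[GATE_THREADS];
    int started = 0;

    for (int i = 0; i < GATE_THREADS; i++) {
        if (pthread_create(&tid[i], NULL, gate_worker, &g) != 0)
            break;
        started++;
    }
    if (started < GATE_THREADS) {
        /* Not every thread could be created: open the gate anyway. */
        pthread_mutex_lock(&g.lock);
        g.open = 1;
        pthread_cond_broadcast(&g.all_ready);
        pthread_mutex_unlock(&g.lock);
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return g.chunks_done;
}
```

Calling `run_gated_workers()` from a test program's `main` releases all workers at once. Note that `sched_yield()` is only a hint to the scheduler, so this sketch makes interleaving more likely but does not guarantee it.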

By: Tilghman Lesher (tilghman) 2010-04-09 11:21:43

You're also only retrieving a single chunk.  You should be iterating through the entire size, and sched_yield() after each chunk, to allow other threads to start pulling data.
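Reading the whole blob in small chunks, with a yield after each chunk, might be sketched in C as below. This is an assumption-laden illustration and not the code from app_voicemail or p_crasher2.c: `chunk_reader` is a hypothetical callback standing in for a driver call such as `SQLGetData()`, and `mem_source` is an in-memory stand-in for the database so that the loop can run on its own.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback standing in for the driver's fetch call: copies up
 * to `cap` bytes into `buf` and returns the number of bytes copied, 0 at
 * the end of the data, or -1 on error. */
typedef long (*chunk_reader)(void *src, unsigned char *buf, size_t cap);

/* Read until the source is exhausted or `out` is full, at most `chunk_size`
 * bytes per call, yielding the CPU after every chunk.
 * Returns the total number of bytes read, or -1 on error. */
long read_blob_in_chunks(chunk_reader next, void *src,
                         unsigned char *out, size_t out_cap,
                         size_t chunk_size)
{
    size_t total = 0;

    assert(next != NULL && out != NULL && chunk_size > 0);
    while (total < out_cap) {
        size_t want = out_cap - total;
        if (want > chunk_size)
            want = chunk_size;

        long got = next(src, out + total, want);
        if (got < 0)
            return -1;
        if (got == 0)
            break;          /* end of blob */
        total += (size_t)got;
        sched_yield();      /* let other threads pull their chunks */
    }
    return (long)total;
}

/* In-memory stand-in for a blob column (illustration only). */
struct mem_source {
    const unsigned char *data;
    size_t len;
    size_t pos;
};

long mem_source_read(void *src, unsigned char *buf, size_t cap)
{
    struct mem_source *m = src;
    size_t left = m->len - m->pos;
    size_t n = left < cap ? left : cap;

    memcpy(buf, m->data + m->pos, n);
    m->pos += n;
    return (long)n;
}
```

In a real reproduction attempt the callback would wrap the ODBC fetch for one statement handle, and each worker thread would run this loop against the shared connection.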

By: Kristijan Vrban (vrban) 2010-04-13 06:27:05

hi, see p_crasher2.c

i re-wrote it now with pthread_cond_broadcast to start all threads simultaneously
and i read out in small chunks with sched_yield()

It still does not crash...

By: Kristijan Vrban (vrban) 2010-04-13 11:05:24

I made an ugly hack, "odbc_limit_retry.patch".

This hack, together with pooling=>yes and limit=>1 in res_odbc.conf,
prevents this crash and still gives a good chance that app_voicemail gets the blob from the DB.

Again, very ugly, but a working makeshift.

By: Tilghman Lesher (tilghman) 2010-04-13 12:00:35

All that patch does is to serialize the queries.

By: Kristijan Vrban (vrban) 2010-04-14 04:45:40

Did you really read my comments?

1. Again, the "patch", as I described, is just a hack to use with pooling=>yes and limit=>1 in res_odbc.conf as a workaround. With pooling=>yes and limit=>1, res_odbc runs only one ODBC query at a time, and that prevents the crash. The disadvantage is that if a second ODBC request from app_voicemail for a busy/not_available file comes in, res_odbc rejects it, and app_voicemail plays the default busy/not_available sound file. The hack just retries the ODBC request from app_voicemail a few times, which is OK as a workaround in my setup. It is not a permanent solution. You can delete it, because it's just my private workaround.

2. More important is p_crasher2.c. Again, I have now rewritten it with pthread_cond_broadcast to start all threads simultaneously, and I read out in small chunks with sched_yield(). And libmyodbc still does not crash. So there is still something special about res_odbc/app_voicemail_odbc, and I don't know what it is.
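The retry idea from point 1 can be illustrated with a small C sketch. It does not reproduce odbc_limit_retry.patch, whose contents are not shown in this thread: `retry_with_delay`, `try_fn`, and the `flaky_resource` stand-in are hypothetical names, and the request that res_odbc may reject is modelled as a callback that fails a fixed number of times.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical attempt callback: returns 0 on success, nonzero when the
 * resource (for example a pooled connection) is currently unavailable. */
typedef int (*try_fn)(void *ctx);

/* Call `attempt` up to `max_attempts` times, sleeping `delay_ms`
 * milliseconds between failures. Returns the 1-based attempt number that
 * succeeded, or -1 if every attempt failed. */
int retry_with_delay(try_fn attempt, void *ctx, int max_attempts,
                     long delay_ms)
{
    struct timespec pause;

    assert(attempt != NULL && max_attempts > 0 && delay_ms >= 0);
    pause.tv_sec = delay_ms / 1000;
    pause.tv_nsec = (delay_ms % 1000) * 1000000L;

    for (int i = 1; i <= max_attempts; i++) {
        if (attempt(ctx) == 0)
            return i;
        if (i < max_attempts)
            nanosleep(&pause, NULL);  /* back off before the next try */
    }
    return -1;
}

/* Stand-in for a busy connection pool (illustration only): fails until
 * the call counter reaches `succeed_on`. */
struct flaky_resource {
    int calls;
    int succeed_on;
};

int flaky_acquire(void *ctx)
{
    struct flaky_resource *r = ctx;

    r->calls++;
    return r->calls >= r->succeed_on ? 0 : 1;
}
```

A bounded retry like this only hides the rejection from the caller. As noted in the thread, the queries are still serialized, so it is a workaround rather than a fix.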



By: Tilghman Lesher (tilghman) 2010-05-12 13:03:20

What happens when you turn pooling on and set a higher limit, such as 30?

By: Kristijan Vrban (vrban) 2010-05-14 08:32:36

There is no difference whether pooling is on with a limit of 5 or 30, or whether pooling is disabled. The crash happens when a higher number of blobs are fetched in parallel.

Only pooling with limit 1 prevented the crash, because then res_odbc runs only one query at a time.

By: Paul Belanger (pabelanger) 2010-06-25 09:06:22

Ping, what is the status here?

By: Tilghman Lesher (tilghman) 2011-01-05 23:38:17.000-0600

vrban: p_crasher2 still does not correctly simulate parallel queries.  A condition broadcast may allow all the threads to start at the broadcast time at the earliest, but as I stated before, if the entire set of commands can occur in a single context switch, then it is still only serializing the queries.  You need to ensure that each step is completed in tandem (and at various offsets) using condition variables before you have a valid test of multiple queries existing in various states in parallel.

pabelanger: the issue is still in triage state, as a cause has not been determined.
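One way to read the suggestion that each step be completed in tandem is a reusable barrier built from a mutex and a condition variable, so that no thread begins step N+1 until every thread has finished step N. The C sketch below is an interpretation and not code from this issue: `lockstep`, `lockstep_wait`, `lockstep_worker`, `run_lockstep` and the constants are assumed names, the per-step ODBC work is replaced by a counter, and the "various offsets" part of the suggestion is not covered.

```c
#include <assert.h>
#include <pthread.h>

#define LS_THREADS 4  /* hypothetical number of parallel queries */
#define LS_STEPS   3  /* hypothetical stages, e.g. execute, fetch, read */

struct lockstep {
    pthread_mutex_t lock;
    pthread_cond_t  step_done;
    int parties;             /* threads taking part in the barrier */
    int waiting;             /* threads parked at the current barrier */
    unsigned generation;     /* bumped each time the barrier releases */
    int finished[LS_STEPS];  /* threads that completed each step */
    int out_of_step;         /* set if a thread passed a barrier early */
};

/* Block until all parties have arrived. Caller must hold ls->lock. */
static void lockstep_wait(struct lockstep *ls)
{
    unsigned gen = ls->generation;

    if (++ls->waiting == ls->parties) {
        ls->waiting = 0;
        ls->generation++;
        pthread_cond_broadcast(&ls->step_done);
    } else {
        while (gen == ls->generation)
            pthread_cond_wait(&ls->step_done, &ls->lock);
    }
}

static void *lockstep_worker(void *arg)
{
    struct lockstep *ls = arg;

    assert(ls != NULL);
    for (int step = 0; step < LS_STEPS; step++) {
        /* A real test would run one stage of the query here, unlocked. */
        pthread_mutex_lock(&ls->lock);
        ls->finished[step]++;
        lockstep_wait(ls);
        /* Past the barrier, every thread must have finished this step. */
        if (ls->finished[step] != ls->parties)
            ls->out_of_step = 1;
        pthread_mutex_unlock(&ls->lock);
    }
    return NULL;
}

/* Returns 0 if all threads ran and stayed in step, -1 otherwise. */
int run_lockstep(void)
{
    struct lockstep ls = { .lock = PTHREAD_MUTEX_INITIALIZER,
                           .step_done = PTHREAD_COND_INITIALIZER };
    pthread_t tid[LS_THREADS];
    int started = 0;

    /* Hold the lock while creating threads so that `parties` is known
     * before any worker reaches the barrier. */
    pthread_mutex_lock(&ls.lock);
    for (int i = 0; i < LS_THREADS; i++) {
        if (pthread_create(&tid[i], NULL, lockstep_worker, &ls) != 0)
            break;
        started++;
    }
    ls.parties = started;
    pthread_mutex_unlock(&ls.lock);

    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return (started == LS_THREADS && !ls.out_of_step) ? 0 : -1;
}
```

With the real stage work placed before each barrier, all statements would sit in the same state at the same time, which is closer to the condition this comment asks the test to create.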

By: Matt Jordan (mjordan) 2015-02-25 21:52:54.635-0600

Per the Asterisk versions page [1], the maintenance (bug fix) support for the Asterisk branch you are using has ended. For continued maintenance support please move to a supported branch of Asterisk. After testing with a supported branch, if you find this problem has not been resolved, please open a new issue against the latest version of that Asterisk branch.

Thanks!

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Versions

Note that the bug report in the MySQL bug tracker was also closed out due to lack of a response from the issue reporter.

As it is, it is unclear where the issue really was, but given the lack of feedback here or other reports of this in Asterisk, I would suspect that this is an issue in the ODBC connector. If someone reproduces it in a supported version of Asterisk, comment here and I'll be happy to reopen this.