[Home]

Summary:ASTERISK-16815: [patch] segfault when performing "module reload res_odbc.so"
Reporter:Bruce McAlister (asgaroth)Labels:
Date Opened:2010-10-18 11:10:45Date Closed:2011-07-27 13:08:50
Priority:CriticalRegression?No
Status:Closed/CompleteComponents:Resources/res_odbc
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) 20110214__issue18156.diff.txt
( 1) backtrace_20-Nov-2010.txt
( 2) backtrace_frame_4_p_obj.txt
( 3) backtrace.txt
( 4) console-log_20-Nov-2010.txt
Description:Every now and again it appears that asterisk opens more connections to the database that is defined for the dsn within asterisk. When this occurs, I attempt to reload the odbc module. When I do this asterisk crashed with a segfault.

****** STEPS TO REPRODUCE ******

This is extremely difficult to reproduce, I've been trying to find out what causes asterisk to open multiple connections to the database (over and above the dsn definition), and it's proving extremely difficult to track down. Either way, I would assume that the proper behaviour of "module reload res_odbc.so" should not cause asterisk to crash.

This issue occurs if I have pooling enabled and disabled at the unixODBC layer, I've not attempted disabling pooling from within the asterisk environment yet.

When I come in in the morning I run and "odbc show" to check the connections from within asterisk. Some mornings things appear "normal", however, some mornings, things dont look right at all, whereby the limit connections that are actually defined for the dsn are exceeded, for example, I have defined 15 connections to our database for the "services" dsn, some mornings I come in and this count is at 52 connected connections.

****** ADDITIONAL INFORMATION ******

odbcinst.ini
--------------------------------------------------------------------------------
[ODBC]
Trace = No
Trace File = /tmp/odbc_sql_trace.log
Pooling = Yes

[PostgreSQL]
Description = Official PostgreSQL ODBC Driver v08.04.0200
Driver      = /usr/lib/psqlodbcw.so
Debug       = 0
CommLog     = 0
FileUsage   = 1
DontDLClose = 1
CPTimeout   = 0

[PostgreSQL-Pooling]
Description = Official PostgreSQL ODBC Driver v08.04.0200 (Pooling Enabled)
Driver      = /usr/lib/psqlodbcw.so
Debug       = 0
CommLog     = 0
FileUsage   = 1
DontDLClose = 1
CPTimeout   = 120
--------------------------------------------------------------------------------


odbc.ini
--------------------------------------------------------------------------------
[prod-services]
Description = Blueface Services Database
#Driver = PostgreSQL
Driver = PostgreSQL-Pooling
Database = blueface-service
Servername = database-services
Port = 5432
UserName = username
Password = password
--------------------------------------------------------------------------------

res_odbc.conf
--------------------------------------------------------------------------------
[services]
enabled     => yes
dsn         => prod-services
pooling     => yes
limit       => 15
pre-connect => yes
--------------------------------------------------------------------------------


I have marked this issue as private as the backtrace includes customer names.
Comments:By: Bruce McAlister (asgaroth) 2010-10-19 03:43:56

Happened again this morning, this is what the output of "odbc show" looks like:

bfieast01*CLI> odbc show

ODBC DSN Settings
-----------------

 Name:   services
 DSN:    prod-services
 Pooled: Yes
 Limit:  15
 Connections in use: 50
   - Connection 1: connected
   - Connection 2: connected
   - Connection 3: connected
   - Connection 4: connected
   - Connection 5: connected
   - Connection 6: connected
   - Connection 7: connected
   - Connection 8: connected
   - Connection 9: connected
   - Connection 10: connected
   - Connection 11: connected
   - Connection 12: connected
   - Connection 13: connected
   - Connection 14: connected
   - Connection 15: connected
   - Connection 16: connected
   - Connection 17: connected
   - Connection 18: connected
   - Connection 19: connected
   - Connection 20: connected
   - Connection 21: connected
   - Connection 22: connected
   - Connection 23: connected
   - Connection 24: connected
   - Connection 25: connected
   - Connection 26: connected
   - Connection 27: connected
   - Connection 28: connected
   - Connection 29: connected
   - Connection 30: connected
   - Connection 31: connected
   - Connection 32: connected
   - Connection 33: connected
   - Connection 34: connected
   - Connection 35: connected
   - Connection 36: connected
   - Connection 37: connected
   - Connection 38: connected
   - Connection 39: connected
   - Connection 40: connected
   - Connection 41: connected
   - Connection 42: connected
   - Connection 43: connected
   - Connection 44: connected
   - Connection 45: connected
   - Connection 46: connected
   - Connection 47: connected
   - Connection 48: connected
   - Connection 49: connected
   - Connection 50: connected

By: Tilghman Lesher (tilghman) 2010-10-21 10:02:52

Within the backtrace:

frame 4
p *obj

By: Leif Madsen (lmadsen) 2010-10-29 09:07:21

Requesting feedback from the reporter per tilghman.

By: Bruce McAlister (asgaroth) 2010-10-29 09:17:51

I'm not too sure what feedback Tilghman is requesting from me. Can we elaborate a little please and I'd be glad to provide feedback.

By: Tilghman Lesher (tilghman) 2010-10-29 09:59:56

I need more information from the backtrace.  Please type those commands into gdb, and paste the output here.

By: Bruce McAlister (asgaroth) 2010-10-29 16:07:40

I've now attached the file with the required gdb commands run.

By: Tilghman Lesher (tilghman) 2010-10-31 05:31:20

This does not appear to be a problem within Asterisk, but appears to be an issue either in UnixODBC or the PostgreSQL ODBC driver.  I would recommend upgrading both to the very latest from source as your next step.

By: Bruce McAlister (asgaroth) 2010-11-15 10:56:21.000-0600

I will attempt to update our versions of unixodbc (v2.2.14) to the latest (v2.3.0) and the pgsql driver from our current version (v08.04.0200) to the latest (v09.00.0200).

I will need to schedule downtime to upgrade our production environment, I will update you as soon as I have done this.

By: Bruce McAlister (asgaroth) 2010-11-20 04:51:47.000-0600

I have now upgraded our ODBC environment to the following:

unixODBC v2.3.0
PostgreSQL ODBC driver v 09.00.0200

However, the issue is still present, but slightly different this time.

When I login to the asterisk console and perform a:

"module reload res_odbc.so"

The module appears to reload successfully, however a short while later asterisk attempts to create a new connection to the database, and then it crashes.

I've attached the console log of what I was doing, and the backtrace of the crash (when the console disconnected at the end of console session as attached)

By: Bruce McAlister (asgaroth) 2010-11-20 07:00:16.000-0600

just another update, what apears to be happening, although it only occurs 95% of the time, is that once I reload the res_odbc.so module, the first attempt asterisk makes to create another session to the database crashes asterisk.

Not sure if I mentioned that I have the following lines in modules.conf as well.

preload => res_odbc.so
preload => res_config_odbc.so

The odbc subsystem would not load for without these 2 lines in modules.conf, I was getting the following error:

Error loading module 'res_config_odbc.so': ld.so.1: asterisk: fatal: relocation error: file /opt/asterisk/lib/modules/res_config_odbc.so: symbol ast_odbc_clear_cache: referenced symbol not found

I'm not sure if it is related in any way, just thought I would mention it.

By: Bruce McAlister (asgaroth) 2010-11-25 18:32:11.000-0600

Another update, it still appears that asterisk is not honouring the max number of connections settings defined in the odbc setup, see below for current listing of odbc connections after the upgrade to v2.3.0 of unixDBC.

ODBC DSN Settings
-----------------

 Name:   services
 DSN:    prod-services
 Pooled: Yes
 Limit:  15
 Connections in use: 16
   - Connection 1: connected
   - Connection 2: connected
   - Connection 3: connected
   - Connection 4: connected
   - Connection 5: connected
   - Connection 6: connected
   - Connection 7: connected
   - Connection 8: connected
   - Connection 9: connected
   - Connection 10: connected
   - Connection 11: connected
   - Connection 12: connected
   - Connection 13: connected
   - Connection 14: connected
   - Connection 15: connected
   - Connection 16: connected

 Name:   vmmsgs
 DSN:    prod-voicemail
 Pooled: Yes
 Limit:  10
 Connections in use: 3
   - Connection 1: connected
   - Connection 2: connected
   - Connection 3: connected

 Name:   cdr
 DSN:    prod-cdr
 Pooled: Yes
 Limit:  10
 Connections in use: 23
   - Connection 1: connected
   - Connection 2: connected
   - Connection 3: connected
   - Connection 4: connected
   - Connection 5: connected
   - Connection 6: connected
   - Connection 7: connected
   - Connection 8: connected
   - Connection 9: connected
   - Connection 10: connected
   - Connection 11: connected
   - Connection 12: connected
   - Connection 13: connected
   - Connection 14: connected
   - Connection 15: connected
   - Connection 16: connected
   - Connection 17: connected
   - Connection 18: connected
   - Connection 19: connected
   - Connection 20: connected
   - Connection 21: connected
   - Connection 22: connected
   - Connection 23: connected

By: Bruce McAlister (asgaroth) 2010-11-30 05:08:34.000-0600

Hi, just checking in to see if there is any update to this bug, or if you need anything from me? I've provided feedback, can we change the state of the bug?

By: Bruce McAlister (asgaroth) 2011-02-04 23:32:28.000-0600

Another update, some additional messages I spotted this morning which may help narrow it down:

On the asterisk console we see this:

[Feb  5 05:25:18] NOTICE[20191]: res_odbc.c:1427 odbc_obj_connect: Connecting services
[Feb  5 05:25:18] NOTICE[20191]: res_odbc.c:1427 odbc_obj_connect: Connecting services
[Feb  5 05:25:18] NOTICE[20191]: res_odbc.c:1455 odbc_obj_connect: res_odbc: Connected to services [prod-services]
[Feb  5 05:25:18] NOTICE[20191]: res_odbc.c:1455 odbc_obj_connect: res_odbc: Connected to services [prod-services]
[Feb  5 05:25:20] ERROR[20191]: res_config_odbc.c:185 realtime_odbc: No database handle available with the name of 'services' (check res_odbc.conf)
[Feb  5 05:25:20] ERROR[20191]: res_config_odbc.c:185 realtime_odbc: No database handle available with the name of 'services' (check res_odbc.conf)
[Feb  5 05:25:20] ERROR[20191]: res_config_odbc.c:185 realtime_odbc: No database handle available with the name of 'services' (check res_odbc.conf)
[Feb  5 05:25:20] ERROR[20191]: res_config_odbc.c:185 realtime_odbc: No database handle available with the name of 'services' (check res_odbc.conf)

The odd thing is that "res_odbc.conf" actually does have the definition:

root@bfieast01:/opt/asterisk/etc # more res_odbc.conf
[services]
enabled     => yes
dsn         => prod-services
pooling     => yes
limit       => 15
pre-connect => yes

[vmmsgs]
enabled     => yes
dsn         => prod-voicemail
pooling     => yes
limit       => 10
pre-connect => yes

[cdr]
enabled     => yes
dsn         => prod-cdr
pooling     => yes
limit       => 10
pre-connect => yes

This issue is still ongoing. If you need me to do any additional stuff, please let me know.

By: Tilghman Lesher (tilghman) 2011-02-14 01:08:22.000-0600

You misunderstand the message.  It says that there's no database handle AVAILABLE.  That may mean that the handles are all presently in use.

By: Tilghman Lesher (tilghman) 2011-02-14 01:12:35.000-0600

BTW, your next step should be to report this upstream to the Postgres ODBC driver team.

By: Tilghman Lesher (tilghman) 2011-02-14 01:25:45.000-0600

Patch uploaded should prevent the number allocated from ever going past the limit.  This is a clear race condition that needed to be fixed.

By: Digium Subversion (svnbot) 2011-02-14 14:10:30.000-0600

Repository: asterisk
Revision: 307792

U   branches/1.6.2/res/res_odbc.c

------------------------------------------------------------------------
r307792 | tilghman | 2011-02-14 14:10:29 -0600 (Mon, 14 Feb 2011) | 8 lines

Increment usage count at first reference, to avoid a race condition with many threads creating connections all at once.

(issue ASTERISK-16815)
Reported by: asgaroth
Patches:
      20110214__issue18156.diff.txt uploaded by tilghman (license 14)
Tested by: tilghman

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=307792

By: Digium Subversion (svnbot) 2011-02-14 14:16:56.000-0600

Repository: asterisk
Revision: 307793

_U  branches/1.8/
U   branches/1.8/res/res_odbc.c

------------------------------------------------------------------------
r307793 | tilghman | 2011-02-14 14:16:55 -0600 (Mon, 14 Feb 2011) | 15 lines

Merged revisions 307792 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.6.2

........
 r307792 | tilghman | 2011-02-14 14:10:28 -0600 (Mon, 14 Feb 2011) | 8 lines
 
 Increment usage count at first reference, to avoid a race condition with many threads creating connections all at once.
 
 (issue ASTERISK-16815)
  Reported by: asgaroth
  Patches:
        20110214__issue18156.diff.txt uploaded by tilghman (license 14)
  Tested by: tilghman
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=307793

By: Digium Subversion (svnbot) 2011-02-14 14:18:03.000-0600

Repository: asterisk
Revision: 307795

_U  trunk/
U   trunk/res/res_odbc.c

------------------------------------------------------------------------
r307795 | tilghman | 2011-02-14 14:18:02 -0600 (Mon, 14 Feb 2011) | 22 lines

Merged revisions 307793 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.8

................
 r307793 | tilghman | 2011-02-14 14:16:55 -0600 (Mon, 14 Feb 2011) | 15 lines
 
 Merged revisions 307792 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.6.2
 
 ........
   r307792 | tilghman | 2011-02-14 14:10:28 -0600 (Mon, 14 Feb 2011) | 8 lines
   
   Increment usage count at first reference, to avoid a race condition with many threads creating connections all at once.
   
   (issue ASTERISK-16815)
    Reported by: asgaroth
    Patches:
          20110214__issue18156.diff.txt uploaded by tilghman (license 14)
    Tested by: tilghman
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=307795

By: Bruce McAlister (asgaroth) 2011-03-08 12:59:19.000-0600

Hi Tilghman,

Thanks for the update, I'll implement this half of the fix into our production environment and see if it does aleviate the issue in some way shape or form. I'll then also get on to the unixODBC dev team to see if they can help in trying to get to the bottom of the other half of this issue.

Just a quick question, we are looking to upgrade our 1.6.2 environment to 1.8.x, I assume this fix has also been applied to the 1.8 branch as well?

We are currently testing with v1.8.3, did this fix make it in to that release, or will it go in to 1.8.4?

By: Tilghman Lesher (tilghman) 2011-03-08 17:09:57.000-0600

Yes, it made it into the 1.8 branch.  Whether it makes it into any particular release, you'll need to review the changelog for that release to confirm.

By: Tilghman Lesher (tilghman) 2011-05-03 18:43:42

asgaroth:  given the lack of response here, is it safe to assume that this issue is now fixed for you?

By: Bruce McAlister (asgaroth) 2011-05-04 04:32:23

This issue still occurs on asterisk version "Asterisk SVN-branch-1.6.2-r315393"

I've not tried the latest version of 1.8 yet.

By: Tilghman Lesher (tilghman) 2011-05-04 13:21:54

asgaroth:  so you're still able to open more connections than is configured for the DSN in res_odbc.conf?  Or the crash still occurs when you type 'odbc reload'?

By: Bruce McAlister (asgaroth) 2011-05-05 03:51:03

tilghman, yes, correct on both counts.

The odbc connection issue (opening more connections than is defined) occurred last week, the first time in a long time. I had 15 connections defined, * opened 18 connections.

* crashed when I perform an odbc reload after the dsn has more connections in it as defined. The odbc reload seems to work properly otherwise.

This issue does not occur as frequently anymore since your last patch, but it has happened 2 or 3 times since the patch was applied.

By: Bruce McAlister (asgaroth) 2011-05-05 03:53:56

I will be building a box today that I will use to install the pg 9 drivers on, that we will then switch over to production to see if the postgres driver has anything to do with this issue.

By: Russell Bryant (russell) 2011-07-27 13:08:44.090-0500

Per the Asterisk maintenance timeline page at http://www.asterisk.org/asterisk-versions maintenance (bug) support for the 1.4 and 1.6.x branches has ended. For continued maintenance support please move to the 1.8 branch which is a long term support (LTS) branch. For more information about branch support, please see https://wiki.asterisk.org/wiki/display/AST/Asterisk+Versions

If this is still an issue, please open a new issue so it can be re-triaged appropriately. Thanks!