[Home]

Summary:ASTERISK-11165: Asterisk crashes when connecting the 10th simultaneous ODBC pooled connections
Reporter:Private Name (falves11)Labels:
Date Opened:2008-01-06 09:44:11.000-0600Date Closed:2008-02-29 17:57:35.000-0600
Priority:CriticalRegression?No
Status:Closed/CompleteComponents:Resources/res_odbc
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:
Description:After so much work now I nailed the ODBC problem.
I traced how many open connection I have in my SQL server from Asterisk every second, and inf act, I see 1,2,3,4,5,6,7,8,9,0, and Asterisk crashes. The capture below is obtained by setting verbose level to 1, and it can be seen clearly that at the 10th ODBC connection Asterisk is unable to connect. Yet, the command odbc show gives this output, which proves that theoretically there is a 1000 limit on the connections.

Name:   global
 DSN:    mssql
 Pooled: Yes
 Limit:  1000
 Connections in use: 1
   - Connection 1: connected


****** ADDITIONAL INFORMATION ******

[Jan  6 10:33:06] NOTICE[1973]: res_odbc.c:562 odbc_obj_connect: Connecting global
[Jan  6 10:33:07] NOTICE[1973]: res_odbc.c:576 odbc_obj_connect: res_odbc: Connected to global [mssql]
[Jan  6 10:33:07] NOTICE[1973]: res_odbc.c:562 odbc_obj_connect: Connecting global
[Jan  6 10:33:08] NOTICE[1973]: res_odbc.c:576 odbc_obj_connect: res_odbc: Connected to global [mssql]
[Jan  6 10:33:08] NOTICE[1973]: res_odbc.c:562 odbc_obj_connect: Connecting global
[Jan  6 10:33:08] NOTICE[1973]: res_odbc.c:576 odbc_obj_connect: res_odbc: Connected to global [mssql]
[Jan  6 10:33:08] NOTICE[1973]: res_odbc.c:562 odbc_obj_connect: Connecting global
[Jan  6 10:33:09] NOTICE[1973]: res_odbc.c:576 odbc_obj_connect: res_odbc: Connected to global [mssql]
[Jan  6 10:33:09] NOTICE[1973]: res_odbc.c:562 odbc_obj_connect: Connecting global
[Jan  6 10:33:10] NOTICE[1973]: res_odbc.c:576 odbc_obj_connect: res_odbc: Connected to global [mssql]
[Jan  6 10:33:10] NOTICE[1973]: res_odbc.c:562 odbc_obj_connect: Connecting global
[Jan  6 10:33:11] NOTICE[1973]: res_odbc.c:576 odbc_obj_connect: res_odbc: Connected to global [mssql]
[Jan  6 10:33:11] NOTICE[1973]: res_odbc.c:562 odbc_obj_connect: Connecting global
[Jan  6 10:33:12] NOTICE[1973]: res_odbc.c:576 odbc_obj_connect: res_odbc: Connected to global [mssql]
[Jan  6 10:33:13] NOTICE[1973]: res_odbc.c:562 odbc_obj_connect: Connecting global
[Jan  6 10:33:13] WARNING[1973]: res_odbc.c:573 odbc_obj_connect: res_odbc: Error SQLConnect=-1 errno=0 [unixODBC][FreeTDS][SQL Server]Unable to connect to data source
[Jan  6 10:33:13] WARNING[1973]: res_odbc.c:471 ast_odbc_request_obj: Failed to connect to global
[Jan  6 10:33:13] ERROR[1973]: func_odbc.c:320 acf_odbc_read: Unable to execute query [Routing 'S220-1199633587.10','66.28.190.220','19544447408','5058406824','']
[Jan  6 10:33:13] WARNING[1973]: ast_expr2.fl:428 ast_yyerror: ast_yyerror():  syntax error: syntax error, unexpected '<', expecting $end; Input:
< 0
^
[Jan  6 10:33:13] WARNING[1973]: ast_expr2.fl:432 ast_yyerror: If you have questions, please refer to doc/channelvariables.txt in the asterisk source.
[Jan  6 10:33:14] WARNING[1973]: app_dial.c:1266 dial_exec_full: Dial requires an argument (technology/number)
[Jan  6 10:33:14] WARNING[1973]: ast_expr2.fl:428 ast_yyerror: ast_yyerror():  syntax error: syntax error, unexpected '>=', expecting $end; Input:
>= 0
^
[Jan  6 10:33:14] WARNING[1973]: ast_expr2.fl:432 ast_yyerror: If you have questions, please refer to doc/channelvariables.txt in the asterisk source.
[Jan  6 10:33:14] NOTICE[1973]: res_odbc.c:562 odbc_obj_connect: Connecting global
Comments:By: Private Name (falves11) 2008-01-06 14:11:36.000-0600

I get thousand of line like this when compiling with all the debugs.

res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 75 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 5 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 5 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 70 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 40 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 80 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 85 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 25 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 90 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 90 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 90 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 10 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 90 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 65 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 90 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 75 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 90 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 90 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 30 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 30 sec for mutex '&(&class->odbc_obj)->lock'?

By: Tilghman Lesher (tilghman) 2008-01-07 11:34:59.000-0600

To report a crash, you MUST upload a stack backtrace (as a separate file, NOT pasted in a bugnote), as documented in doc/backtrace.txt.

By: Tilghman Lesher (tilghman) 2008-01-07 14:08:25.000-0600

Also, have you checked on your MS SQL Server as to whether that particular user is limited to 9 simultaneous connections?

By: Private Name (falves11) 2008-01-07 14:15:34.000-0600

Yes, the user is a sysdamin, and has all rights. There are no limits. I believe there was a problem with the ODBC Driver, and my vendor, Easysoft is checking it right now. He might find an additional problem in the Asterisk code since it keep blowing up under pressure. He is using Valgrind to find the culprit. I will publish his findings here. Meanwhile, we should increase the max amount of events to monitor by default from the current value to a much higher number. The code is in /main/autoservice.c I am using #define MAX_AUTOMONS 10256.

By: Tilghman Lesher (tilghman) 2008-01-07 15:30:52.000-0600

How many channels are you running at once?  You should only hit MAX_AUTOMONS when you have 256 simultaneous channels, minimum.

By: Private Name (falves11) 2008-01-07 15:35:40.000-0600

I  my tests I hit way over 256 simultaneous calls. This is a 64 Bit RHEl 5. It works. The only issue is the ODBC subsistem, but it seems to be a driver issue. Freetds is not designed to handle the pressure. Away from that, Asterisk can handle any volume of calls. I am not touching the media.

By: Private Name (falves11) 2008-01-08 11:24:11.000-0600

Conclusions from this case:
in /main/util.c we need to double the AST_MAX_LOCKS to 64
in /main/autoservice.c  we need to change MAX_AUTOMONS to 1500

With the new Driver version 10.43 from Easysoft, I can not take it down. However, Nick Gorgham submitted some very important information to the ast_development thread, gathered using valgrind. Here is the copy.

"My reading of whats going on is that the scheduler can call

   static int retrans_pkt(const void *data)

And this can run as far as the

   /* Lock channel PVT */
   sip_pvt_lock(pkt->owner);

Then it sits there waiting for the lock to be released. The lock I suspect was put on this in

   static void __sip_ack(struct sip_pvt *p, int seqno, int resp, int
sipmethod)

which runs its course, and eventually calls

   UNLINK(cur, p->packets, prev);
   dialog_unref(cur->owner);
   ast_free(cur);
   break;

The ast_free releases the memory, but its still referenced by the pkt in the thread in retransmit_pkt. Then it wakes up the locked thread by caling

   sip_pvt_unlock(p);

But now when the call in retransmit_pkt continues, the memory its using has actually been free'd by now, so it leads open the chance of corrupting the heap with the expected entertaining problems."

I hope that from this case we may get a much better Asteris, one that can rival and beat any commercial soft switch out there.

By: Digium Subversion (svnbot) 2008-01-08 14:43:19.000-0600

Repository: asterisk
Revision: 97194

U   branches/1.4/main/autoservice.c
U   branches/1.4/main/utils.c

------------------------------------------------------------------------
r97194 | tilghman | 2008-01-08 14:43:18 -0600 (Tue, 08 Jan 2008) | 3 lines

Increase constants to where we're less likely to hit them while debugging.
(Closes issue ASTERISK-11165)

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=97194

By: Digium Subversion (svnbot) 2008-01-08 14:52:49.000-0600

Repository: asterisk
Revision: 97198

_U  trunk/
U   trunk/main/autoservice.c
U   trunk/main/utils.c

------------------------------------------------------------------------
r97198 | tilghman | 2008-01-08 14:52:49 -0600 (Tue, 08 Jan 2008) | 11 lines

Merged revisions 97194 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r97194 | tilghman | 2008-01-08 14:47:07 -0600 (Tue, 08 Jan 2008) | 3 lines

Increase constants to where we're less likely to hit them while debugging.
(Closes issue ASTERISK-11165)

........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=97198

By: Digium Subversion (svnbot) 2008-01-09 12:20:55.000-0600

Repository: asterisk
Revision: 97535

_U  team/murf/bug11210/
U   team/murf/bug11210/CHANGES
U   team/murf/bug11210/apps/app_queue.c
U   team/murf/bug11210/apps/app_voicemail.c
U   team/murf/bug11210/channels/chan_console.c
U   team/murf/bug11210/channels/chan_mgcp.c
U   team/murf/bug11210/channels/chan_sip.c
U   team/murf/bug11210/configs/queues.conf.sample
U   team/murf/bug11210/funcs/func_groupcount.c
U   team/murf/bug11210/main/asterisk.c
U   team/murf/bug11210/main/autoservice.c
U   team/murf/bug11210/main/utils.c
U   team/murf/bug11210/res/res_jabber.c

------------------------------------------------------------------------
r97535 | murf | 2008-01-09 12:20:51 -0600 (Wed, 09 Jan 2008) | 140 lines

Merged revisions 97125,97153-97154,97193,97196-97200,97203,97208,97248 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
r97125 | tilghman | 2008-01-08 12:06:27 -0700 (Tue, 08 Jan 2008) | 11 lines

Merged revisions 97077 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r97077 | tilghman | 2008-01-08 12:02:13 -0600 (Tue, 08 Jan 2008) | 3 lines

Apply multiple crash fixes, found in issue ASTERISK-10897, but not completely
closing that issue.

........

................
r97153 | file | 2008-01-08 12:55:56 -0700 (Tue, 08 Jan 2008) | 12 lines

Merged revisions 97152 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r97152 | file | 2008-01-08 15:53:52 -0400 (Tue, 08 Jan 2008) | 4 lines

If no group has been provided to the GROUP_COUNT dialplan function then use the first one specific to the channel.
(closes issue ASTERISK-10610)
Reported by: m4him

........

................
r97154 | file | 2008-01-08 13:06:52 -0700 (Tue, 08 Jan 2008) | 6 lines

Move common code for setting T38 capabilities and fix a bug with fax detection in the SIP RTP read callback. It's still sort of silly... but more on that later.
(closes issue ASTERISK-10758)
Reported by: dimas
Patches:
     sipt38prop.patch uploaded by dimas (license 88)

................
r97193 | mmichelson | 2008-01-08 13:45:59 -0700 (Tue, 08 Jan 2008) | 17 lines

Merged revisions 97192 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r97192 | mmichelson | 2008-01-08 14:42:07 -0600 (Tue, 08 Jan 2008) | 9 lines

Making some changes designed to not allow for a corrupted mailstream for a vm_state.

1. Add locking to the vm_state retrieval functions so that no linked list corruption occurs.
2. Make sure to always grab the persistent vm_state when mailstream access is necessary.
3. Correct an incorrect return value in the init_mailstream function.

(closes issue ASTERISK-10818, reported by dwhite)


........

................
r97196 | file | 2008-01-08 13:50:57 -0700 (Tue, 08 Jan 2008) | 14 lines

Merged revisions 97195 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r97195 | file | 2008-01-08 16:48:20 -0400 (Tue, 08 Jan 2008) | 6 lines

Fix various DTMF issues in chan_mgcp.
(closes issue ASTERISK-10954)
Reported by: eferro
Patches:
     dtmf_control_hybrid-inband-mode.patch uploaded by eferro (license 337)

........

................
r97197 | file | 2008-01-08 13:52:07 -0700 (Tue, 08 Jan 2008) | 2 lines

One line documentation ftw!

................
r97198 | tilghman | 2008-01-08 13:56:38 -0700 (Tue, 08 Jan 2008) | 11 lines

Merged revisions 97194 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r97194 | tilghman | 2008-01-08 14:47:07 -0600 (Tue, 08 Jan 2008) | 3 lines

Increase constants to where we're less likely to hit them while debugging.
(Closes issue ASTERISK-11165)

........

................
r97199 | oej | 2008-01-08 13:58:56 -0700 (Tue, 08 Jan 2008) | 2 lines

Iksemel is alive in a new home. Release 1.3 is out with bug fixes.

................
r97200 | oej | 2008-01-08 14:01:46 -0700 (Tue, 08 Jan 2008) | 4 lines

Change reference to external library so it appears on the extref listing
http://www.asterisk.org/doxygen/trunk/extref.html


................
r97203 | mmichelson | 2008-01-08 14:18:32 -0700 (Tue, 08 Jan 2008) | 8 lines

Adding the option of specifying a second interface in a member definition for a queue. app_queue
will monitor this second device's state for the member, even though it actually calls the first
interface. This ability has been added for statically defined queue members, realtime queue members,
and dynamic queue members added through the CLI, dialplan, or manager.

(closes issue ASTERISK-11085, reported by acidv)


................
r97208 | mmichelson | 2008-01-08 14:28:04 -0700 (Tue, 08 Jan 2008) | 10 lines

Blocked revisions 97206 via svnmerge

........
r97206 | mmichelson | 2008-01-08 15:24:48 -0600 (Tue, 08 Jan 2008) | 3 lines

Some coding guidelines-related cleanup


........

................
r97248 | twilson | 2008-01-08 14:56:57 -0700 (Tue, 08 Jan 2008) | 2 lines

Initialize new variable to NULL

................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=97535

By: Private Name (falves11) 2008-01-09 23:38:46.000-0600

We should raise this constant
#define MAX_AUTOMONS 1500

#define MAX_AUTOMONS 10256

under heavy load, I still get the error related to this variable. I am using SVN-trunk-r97623M

By: Tilghman Lesher (tilghman) 2008-01-10 12:26:39.000-0600

russell: could you take a look at this?

By: Russell Bryant (russell) 2008-01-11 15:55:04.000-0600

If a channel is in autoservice while attempting to log something to a database, then it's possible that 1500 slots in autoservice could get filled up if the database is hanging for some reason.  The question is, where is it hanging, and why ...

By: Digium Subversion (svnbot) 2008-02-29 17:30:56.000-0600

Repository: asterisk
Revision: 105409

U   branches/1.4/main/autoservice.c

------------------------------------------------------------------------
r105409 | russell | 2008-02-29 17:30:48 -0600 (Fri, 29 Feb 2008) | 23 lines

Fix a major bug in autoservice.  There was a race condition in the handling of
the list of channels in autoservice.  The problem was that it was possible for
a channel to get removed from autoservice and destroyed, while the autoservice
was still messing with the channel.  This led to memory corruption, and caused
crashes.  This explains multiple backtraces I have seen that have references
to autoservice, but do to the nature of the issue (memory corruption), could
cause crashes in a number of areas.

(fixes the crash in BE-386)
(closes issue ASTERISK-11165)
(closes issue ASTERISK-11391)

The following issues could be related.  If you are the reporter of one of these,
please update to include this fix and try again.

(potentially fixes issue ASTERISK-10713)
(potentially fixes issue ASTERISK-11545)
(potentially fixes issue ASTERISK-11058)
(potentially fixes issue ASTERISK-11453)
(potentially fixes issue ASTERISK-10713)
(potentially fixes issue ASTERISK-11437)
(potentially fixes issue ASTERISK-11259)

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=105409

By: Digium Subversion (svnbot) 2008-02-29 17:33:02.000-0600

Repository: asterisk
Revision: 105410

_U  trunk/
U   trunk/main/autoservice.c

------------------------------------------------------------------------
r105410 | russell | 2008-02-29 17:33:00 -0600 (Fri, 29 Feb 2008) | 31 lines

Merged revisions 105409 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r105409 | russell | 2008-02-29 17:34:32 -0600 (Fri, 29 Feb 2008) | 23 lines

Fix a major bug in autoservice.  There was a race condition in the handling of
the list of channels in autoservice.  The problem was that it was possible for
a channel to get removed from autoservice and destroyed, while the autoservice
was still messing with the channel.  This led to memory corruption, and caused
crashes.  This explains multiple backtraces I have seen that have references
to autoservice, but do to the nature of the issue (memory corruption), could
cause crashes in a number of areas.

(fixes the crash in BE-386)
(closes issue ASTERISK-11165)
(closes issue ASTERISK-11391)

The following issues could be related.  If you are the reporter of one of these,
please update to include this fix and try again.

(potentially fixes issue ASTERISK-10713)
(potentially fixes issue ASTERISK-11545)
(potentially fixes issue ASTERISK-11058)
(potentially fixes issue ASTERISK-11453)
(potentially fixes issue ASTERISK-10713)
(potentially fixes issue ASTERISK-11437)
(potentially fixes issue ASTERISK-11259)

........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=105410

By: Digium Subversion (svnbot) 2008-02-29 17:57:04.000-0600

Repository: asterisk
Revision: 105409

U   branches/1.4/main/autoservice.c

------------------------------------------------------------------------
r105409 | russell | 2008-02-29 17:34:32 -0600 (Fri, 29 Feb 2008) | 23 lines

Fix a major bug in autoservice.  There was a race condition in the handling of
the list of channels in autoservice.  The problem was that it was possible for
a channel to get removed from autoservice and destroyed, while the autoservice
thread was still messing with the channel.  This led to memory corruption, and
caused crashes.  This explains multiple backtraces I have seen that have
references to autoservice, but do to the nature of the issue (memory corruption),
could cause crashes in a number of areas.

(fixes the crash in BE-386)
(closes issue ASTERISK-11165)
(closes issue ASTERISK-11391)

The following issues could be related.  If you are the reporter of one of these,
please update to include this fix and try again.

(potentially fixes issue ASTERISK-10713)
(potentially fixes issue ASTERISK-11545)
(potentially fixes issue ASTERISK-11058)
(potentially fixes issue ASTERISK-11453)
(potentially fixes issue ASTERISK-10713)
(potentially fixes issue ASTERISK-11437)
(potentially fixes issue ASTERISK-11259)

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=105409

By: Digium Subversion (svnbot) 2008-02-29 17:57:35.000-0600

Repository: asterisk
Revision: 105410

_U  trunk/
U   trunk/main/autoservice.c

------------------------------------------------------------------------
r105410 | russell | 2008-02-29 17:36:46 -0600 (Fri, 29 Feb 2008) | 31 lines

Merged revisions 105409 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r105409 | russell | 2008-02-29 17:34:32 -0600 (Fri, 29 Feb 2008) | 23 lines

Fix a major bug in autoservice.  There was a race condition in the handling of
the list of channels in autoservice.  The problem was that it was possible for
a channel to get removed from autoservice and destroyed, while the autoservice
thread was still messing with the channel.  This led to memory corruption, and
caused crashes.  This explains multiple backtraces I have seen that have
references to autoservice, but do to the nature of the issue (memory corruption),
could cause crashes in a number of areas.

(fixes the crash in BE-386)
(closes issue ASTERISK-11165)
(closes issue ASTERISK-11391)

The following issues could be related.  If you are the reporter of one of these,
please update to include this fix and try again.

(potentially fixes issue ASTERISK-10713)
(potentially fixes issue ASTERISK-11545)
(potentially fixes issue ASTERISK-11058)
(potentially fixes issue ASTERISK-11453)
(potentially fixes issue ASTERISK-10713)
(potentially fixes issue ASTERISK-11437)
(potentially fixes issue ASTERISK-11259)

........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=105410