Summary: | ASTERISK-11165: Asterisk crashes when connecting the 10th simultaneous ODBC pooled connections | ||
Reporter: | Private Name (falves11) | Labels: | |
Date Opened: | 2008-01-06 09:44:11.000-0600 | Date Closed: | 2008-02-29 17:57:35.000-0600 |
Priority: | Critical | Regression? | No |
Status: | Closed/Complete | Components: | Resources/res_odbc |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ||
Description: | After so much work now I nailed the ODBC problem. I traced how many open connection I have in my SQL server from Asterisk every second, and inf act, I see 1,2,3,4,5,6,7,8,9,0, and Asterisk crashes. The capture below is obtained by setting verbose level to 1, and it can be seen clearly that at the 10th ODBC connection Asterisk is unable to connect. Yet, the command odbc show gives this output, which proves that theoretically there is a 1000 limit on the connections. Name: global DSN: mssql Pooled: Yes Limit: 1000 Connections in use: 1 - Connection 1: connected ****** ADDITIONAL INFORMATION ****** [Jan 6 10:33:06] NOTICE[1973]: res_odbc.c:562 odbc_obj_connect: Connecting global [Jan 6 10:33:07] NOTICE[1973]: res_odbc.c:576 odbc_obj_connect: res_odbc: Connected to global [mssql] [Jan 6 10:33:07] NOTICE[1973]: res_odbc.c:562 odbc_obj_connect: Connecting global [Jan 6 10:33:08] NOTICE[1973]: res_odbc.c:576 odbc_obj_connect: res_odbc: Connected to global [mssql] [Jan 6 10:33:08] NOTICE[1973]: res_odbc.c:562 odbc_obj_connect: Connecting global [Jan 6 10:33:08] NOTICE[1973]: res_odbc.c:576 odbc_obj_connect: res_odbc: Connected to global [mssql] [Jan 6 10:33:08] NOTICE[1973]: res_odbc.c:562 odbc_obj_connect: Connecting global [Jan 6 10:33:09] NOTICE[1973]: res_odbc.c:576 odbc_obj_connect: res_odbc: Connected to global [mssql] [Jan 6 10:33:09] NOTICE[1973]: res_odbc.c:562 odbc_obj_connect: Connecting global [Jan 6 10:33:10] NOTICE[1973]: res_odbc.c:576 odbc_obj_connect: res_odbc: Connected to global [mssql] [Jan 6 10:33:10] NOTICE[1973]: res_odbc.c:562 odbc_obj_connect: Connecting global [Jan 6 10:33:11] NOTICE[1973]: res_odbc.c:576 odbc_obj_connect: res_odbc: Connected to global [mssql] [Jan 6 10:33:11] NOTICE[1973]: res_odbc.c:562 odbc_obj_connect: Connecting global [Jan 6 10:33:12] NOTICE[1973]: res_odbc.c:576 odbc_obj_connect: res_odbc: Connected to global [mssql] [Jan 6 10:33:13] NOTICE[1973]: res_odbc.c:562 odbc_obj_connect: Connecting global [Jan 6 10:33:13] WARNING[1973]: res_odbc.c:573 odbc_obj_connect: res_odbc: Error SQLConnect=-1 errno=0 [unixODBC][FreeTDS][SQL Server]Unable to connect to data source [Jan 6 10:33:13] WARNING[1973]: res_odbc.c:471 ast_odbc_request_obj: Failed to connect to global [Jan 6 10:33:13] ERROR[1973]: func_odbc.c:320 acf_odbc_read: Unable to execute query [Routing 'S220-1199633587.10','66.28.190.220','19544447408','5058406824',''] [Jan 6 10:33:13] WARNING[1973]: ast_expr2.fl:428 ast_yyerror: ast_yyerror(): syntax error: syntax error, unexpected '<', expecting $end; Input: < 0 ^ [Jan 6 10:33:13] WARNING[1973]: ast_expr2.fl:432 ast_yyerror: If you have questions, please refer to doc/channelvariables.txt in the asterisk source. [Jan 6 10:33:14] WARNING[1973]: app_dial.c:1266 dial_exec_full: Dial requires an argument (technology/number) [Jan 6 10:33:14] WARNING[1973]: ast_expr2.fl:428 ast_yyerror: ast_yyerror(): syntax error: syntax error, unexpected '>=', expecting $end; Input: >= 0 ^ [Jan 6 10:33:14] WARNING[1973]: ast_expr2.fl:432 ast_yyerror: If you have questions, please refer to doc/channelvariables.txt in the asterisk source. [Jan 6 10:33:14] NOTICE[1973]: res_odbc.c:562 odbc_obj_connect: Connecting global | ||
Comments: | By: Private Name (falves11) 2008-01-06 14:11:36.000-0600 I get thousand of line like this when compiling with all the debugs. res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 75 sec for mutex '&(&class->odbc_obj)->lock'? res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 5 sec for mutex '&(&class->odbc_obj)->lock'? res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 5 sec for mutex '&(&class->odbc_obj)->lock'? res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 70 sec for mutex '&(&class->odbc_obj)->lock'? res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 40 sec for mutex '&(&class->odbc_obj)->lock'? res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 80 sec for mutex '&(&class->odbc_obj)->lock'? res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 85 sec for mutex '&(&class->odbc_obj)->lock'? res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 25 sec for mutex '&(&class->odbc_obj)->lock'? res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 90 sec for mutex '&(&class->odbc_obj)->lock'? res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 90 sec for mutex '&(&class->odbc_obj)->lock'? res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 90 sec for mutex '&(&class->odbc_obj)->lock'? res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 10 sec for mutex '&(&class->odbc_obj)->lock'? res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 90 sec for mutex '&(&class->odbc_obj)->lock'? res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 65 sec for mutex '&(&class->odbc_obj)->lock'? res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 90 sec for mutex '&(&class->odbc_obj)->lock'? res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 75 sec for mutex '&(&class->odbc_obj)->lock'? res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 90 sec for mutex '&(&class->odbc_obj)->lock'? res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 90 sec for mutex '&(&class->odbc_obj)->lock'? res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 30 sec for mutex '&(&class->odbc_obj)->lock'? res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 30 sec for mutex '&(&class->odbc_obj)->lock'? By: Tilghman Lesher (tilghman) 2008-01-07 11:34:59.000-0600 To report a crash, you MUST upload a stack backtrace (as a separate file, NOT pasted in a bugnote), as documented in doc/backtrace.txt. By: Tilghman Lesher (tilghman) 2008-01-07 14:08:25.000-0600 Also, have you checked on your MS SQL Server as to whether that particular user is limited to 9 simultaneous connections? By: Private Name (falves11) 2008-01-07 14:15:34.000-0600 Yes, the user is a sysdamin, and has all rights. There are no limits. I believe there was a problem with the ODBC Driver, and my vendor, Easysoft is checking it right now. He might find an additional problem in the Asterisk code since it keep blowing up under pressure. He is using Valgrind to find the culprit. I will publish his findings here. Meanwhile, we should increase the max amount of events to monitor by default from the current value to a much higher number. The code is in /main/autoservice.c I am using #define MAX_AUTOMONS 10256. By: Tilghman Lesher (tilghman) 2008-01-07 15:30:52.000-0600 How many channels are you running at once? You should only hit MAX_AUTOMONS when you have 256 simultaneous channels, minimum. By: Private Name (falves11) 2008-01-07 15:35:40.000-0600 I my tests I hit way over 256 simultaneous calls. This is a 64 Bit RHEl 5. It works. The only issue is the ODBC subsistem, but it seems to be a driver issue. Freetds is not designed to handle the pressure. Away from that, Asterisk can handle any volume of calls. I am not touching the media. By: Private Name (falves11) 2008-01-08 11:24:11.000-0600 Conclusions from this case: in /main/util.c we need to double the AST_MAX_LOCKS to 64 in /main/autoservice.c we need to change MAX_AUTOMONS to 1500 With the new Driver version 10.43 from Easysoft, I can not take it down. However, Nick Gorgham submitted some very important information to the ast_development thread, gathered using valgrind. Here is the copy. "My reading of whats going on is that the scheduler can call static int retrans_pkt(const void *data) And this can run as far as the /* Lock channel PVT */ sip_pvt_lock(pkt->owner); Then it sits there waiting for the lock to be released. The lock I suspect was put on this in static void __sip_ack(struct sip_pvt *p, int seqno, int resp, int sipmethod) which runs its course, and eventually calls UNLINK(cur, p->packets, prev); dialog_unref(cur->owner); ast_free(cur); break; The ast_free releases the memory, but its still referenced by the pkt in the thread in retransmit_pkt. Then it wakes up the locked thread by caling sip_pvt_unlock(p); But now when the call in retransmit_pkt continues, the memory its using has actually been free'd by now, so it leads open the chance of corrupting the heap with the expected entertaining problems." I hope that from this case we may get a much better Asteris, one that can rival and beat any commercial soft switch out there. By: Digium Subversion (svnbot) 2008-01-08 14:43:19.000-0600 Repository: asterisk Revision: 97194 U branches/1.4/main/autoservice.c U branches/1.4/main/utils.c ------------------------------------------------------------------------ r97194 | tilghman | 2008-01-08 14:43:18 -0600 (Tue, 08 Jan 2008) | 3 lines Increase constants to where we're less likely to hit them while debugging. (Closes issue ASTERISK-11165) ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=97194 By: Digium Subversion (svnbot) 2008-01-08 14:52:49.000-0600 Repository: asterisk Revision: 97198 _U trunk/ U trunk/main/autoservice.c U trunk/main/utils.c ------------------------------------------------------------------------ r97198 | tilghman | 2008-01-08 14:52:49 -0600 (Tue, 08 Jan 2008) | 11 lines Merged revisions 97194 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r97194 | tilghman | 2008-01-08 14:47:07 -0600 (Tue, 08 Jan 2008) | 3 lines Increase constants to where we're less likely to hit them while debugging. (Closes issue ASTERISK-11165) ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=97198 By: Digium Subversion (svnbot) 2008-01-09 12:20:55.000-0600 Repository: asterisk Revision: 97535 _U team/murf/bug11210/ U team/murf/bug11210/CHANGES U team/murf/bug11210/apps/app_queue.c U team/murf/bug11210/apps/app_voicemail.c U team/murf/bug11210/channels/chan_console.c U team/murf/bug11210/channels/chan_mgcp.c U team/murf/bug11210/channels/chan_sip.c U team/murf/bug11210/configs/queues.conf.sample U team/murf/bug11210/funcs/func_groupcount.c U team/murf/bug11210/main/asterisk.c U team/murf/bug11210/main/autoservice.c U team/murf/bug11210/main/utils.c U team/murf/bug11210/res/res_jabber.c ------------------------------------------------------------------------ r97535 | murf | 2008-01-09 12:20:51 -0600 (Wed, 09 Jan 2008) | 140 lines Merged revisions 97125,97153-97154,97193,97196-97200,97203,97208,97248 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ................ r97125 | tilghman | 2008-01-08 12:06:27 -0700 (Tue, 08 Jan 2008) | 11 lines Merged revisions 97077 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r97077 | tilghman | 2008-01-08 12:02:13 -0600 (Tue, 08 Jan 2008) | 3 lines Apply multiple crash fixes, found in issue ASTERISK-10897, but not completely closing that issue. ........ ................ r97153 | file | 2008-01-08 12:55:56 -0700 (Tue, 08 Jan 2008) | 12 lines Merged revisions 97152 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r97152 | file | 2008-01-08 15:53:52 -0400 (Tue, 08 Jan 2008) | 4 lines If no group has been provided to the GROUP_COUNT dialplan function then use the first one specific to the channel. (closes issue ASTERISK-10610) Reported by: m4him ........ ................ r97154 | file | 2008-01-08 13:06:52 -0700 (Tue, 08 Jan 2008) | 6 lines Move common code for setting T38 capabilities and fix a bug with fax detection in the SIP RTP read callback. It's still sort of silly... but more on that later. (closes issue ASTERISK-10758) Reported by: dimas Patches: sipt38prop.patch uploaded by dimas (license 88) ................ r97193 | mmichelson | 2008-01-08 13:45:59 -0700 (Tue, 08 Jan 2008) | 17 lines Merged revisions 97192 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r97192 | mmichelson | 2008-01-08 14:42:07 -0600 (Tue, 08 Jan 2008) | 9 lines Making some changes designed to not allow for a corrupted mailstream for a vm_state. 1. Add locking to the vm_state retrieval functions so that no linked list corruption occurs. 2. Make sure to always grab the persistent vm_state when mailstream access is necessary. 3. Correct an incorrect return value in the init_mailstream function. (closes issue ASTERISK-10818, reported by dwhite) ........ ................ r97196 | file | 2008-01-08 13:50:57 -0700 (Tue, 08 Jan 2008) | 14 lines Merged revisions 97195 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r97195 | file | 2008-01-08 16:48:20 -0400 (Tue, 08 Jan 2008) | 6 lines Fix various DTMF issues in chan_mgcp. (closes issue ASTERISK-10954) Reported by: eferro Patches: dtmf_control_hybrid-inband-mode.patch uploaded by eferro (license 337) ........ ................ r97197 | file | 2008-01-08 13:52:07 -0700 (Tue, 08 Jan 2008) | 2 lines One line documentation ftw! ................ r97198 | tilghman | 2008-01-08 13:56:38 -0700 (Tue, 08 Jan 2008) | 11 lines Merged revisions 97194 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r97194 | tilghman | 2008-01-08 14:47:07 -0600 (Tue, 08 Jan 2008) | 3 lines Increase constants to where we're less likely to hit them while debugging. (Closes issue ASTERISK-11165) ........ ................ r97199 | oej | 2008-01-08 13:58:56 -0700 (Tue, 08 Jan 2008) | 2 lines Iksemel is alive in a new home. Release 1.3 is out with bug fixes. ................ r97200 | oej | 2008-01-08 14:01:46 -0700 (Tue, 08 Jan 2008) | 4 lines Change reference to external library so it appears on the extref listing http://www.asterisk.org/doxygen/trunk/extref.html ................ r97203 | mmichelson | 2008-01-08 14:18:32 -0700 (Tue, 08 Jan 2008) | 8 lines Adding the option of specifying a second interface in a member definition for a queue. app_queue will monitor this second device's state for the member, even though it actually calls the first interface. This ability has been added for statically defined queue members, realtime queue members, and dynamic queue members added through the CLI, dialplan, or manager. (closes issue ASTERISK-11085, reported by acidv) ................ r97208 | mmichelson | 2008-01-08 14:28:04 -0700 (Tue, 08 Jan 2008) | 10 lines Blocked revisions 97206 via svnmerge ........ r97206 | mmichelson | 2008-01-08 15:24:48 -0600 (Tue, 08 Jan 2008) | 3 lines Some coding guidelines-related cleanup ........ ................ r97248 | twilson | 2008-01-08 14:56:57 -0700 (Tue, 08 Jan 2008) | 2 lines Initialize new variable to NULL ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=97535 By: Private Name (falves11) 2008-01-09 23:38:46.000-0600 We should raise this constant #define MAX_AUTOMONS 1500 #define MAX_AUTOMONS 10256 under heavy load, I still get the error related to this variable. I am using SVN-trunk-r97623M By: Tilghman Lesher (tilghman) 2008-01-10 12:26:39.000-0600 russell: could you take a look at this? By: Russell Bryant (russell) 2008-01-11 15:55:04.000-0600 If a channel is in autoservice while attempting to log something to a database, then it's possible that 1500 slots in autoservice could get filled up if the database is hanging for some reason. The question is, where is it hanging, and why ... By: Digium Subversion (svnbot) 2008-02-29 17:30:56.000-0600 Repository: asterisk Revision: 105409 U branches/1.4/main/autoservice.c ------------------------------------------------------------------------ r105409 | russell | 2008-02-29 17:30:48 -0600 (Fri, 29 Feb 2008) | 23 lines Fix a major bug in autoservice. There was a race condition in the handling of the list of channels in autoservice. The problem was that it was possible for a channel to get removed from autoservice and destroyed, while the autoservice was still messing with the channel. This led to memory corruption, and caused crashes. This explains multiple backtraces I have seen that have references to autoservice, but do to the nature of the issue (memory corruption), could cause crashes in a number of areas. (fixes the crash in BE-386) (closes issue ASTERISK-11165) (closes issue ASTERISK-11391) The following issues could be related. If you are the reporter of one of these, please update to include this fix and try again. (potentially fixes issue ASTERISK-10713) (potentially fixes issue ASTERISK-11545) (potentially fixes issue ASTERISK-11058) (potentially fixes issue ASTERISK-11453) (potentially fixes issue ASTERISK-10713) (potentially fixes issue ASTERISK-11437) (potentially fixes issue ASTERISK-11259) ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=105409 By: Digium Subversion (svnbot) 2008-02-29 17:33:02.000-0600 Repository: asterisk Revision: 105410 _U trunk/ U trunk/main/autoservice.c ------------------------------------------------------------------------ r105410 | russell | 2008-02-29 17:33:00 -0600 (Fri, 29 Feb 2008) | 31 lines Merged revisions 105409 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r105409 | russell | 2008-02-29 17:34:32 -0600 (Fri, 29 Feb 2008) | 23 lines Fix a major bug in autoservice. There was a race condition in the handling of the list of channels in autoservice. The problem was that it was possible for a channel to get removed from autoservice and destroyed, while the autoservice was still messing with the channel. This led to memory corruption, and caused crashes. This explains multiple backtraces I have seen that have references to autoservice, but do to the nature of the issue (memory corruption), could cause crashes in a number of areas. (fixes the crash in BE-386) (closes issue ASTERISK-11165) (closes issue ASTERISK-11391) The following issues could be related. If you are the reporter of one of these, please update to include this fix and try again. (potentially fixes issue ASTERISK-10713) (potentially fixes issue ASTERISK-11545) (potentially fixes issue ASTERISK-11058) (potentially fixes issue ASTERISK-11453) (potentially fixes issue ASTERISK-10713) (potentially fixes issue ASTERISK-11437) (potentially fixes issue ASTERISK-11259) ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=105410 By: Digium Subversion (svnbot) 2008-02-29 17:57:04.000-0600 Repository: asterisk Revision: 105409 U branches/1.4/main/autoservice.c ------------------------------------------------------------------------ r105409 | russell | 2008-02-29 17:34:32 -0600 (Fri, 29 Feb 2008) | 23 lines Fix a major bug in autoservice. There was a race condition in the handling of the list of channels in autoservice. The problem was that it was possible for a channel to get removed from autoservice and destroyed, while the autoservice thread was still messing with the channel. This led to memory corruption, and caused crashes. This explains multiple backtraces I have seen that have references to autoservice, but do to the nature of the issue (memory corruption), could cause crashes in a number of areas. (fixes the crash in BE-386) (closes issue ASTERISK-11165) (closes issue ASTERISK-11391) The following issues could be related. If you are the reporter of one of these, please update to include this fix and try again. (potentially fixes issue ASTERISK-10713) (potentially fixes issue ASTERISK-11545) (potentially fixes issue ASTERISK-11058) (potentially fixes issue ASTERISK-11453) (potentially fixes issue ASTERISK-10713) (potentially fixes issue ASTERISK-11437) (potentially fixes issue ASTERISK-11259) ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=105409 By: Digium Subversion (svnbot) 2008-02-29 17:57:35.000-0600 Repository: asterisk Revision: 105410 _U trunk/ U trunk/main/autoservice.c ------------------------------------------------------------------------ r105410 | russell | 2008-02-29 17:36:46 -0600 (Fri, 29 Feb 2008) | 31 lines Merged revisions 105409 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r105409 | russell | 2008-02-29 17:34:32 -0600 (Fri, 29 Feb 2008) | 23 lines Fix a major bug in autoservice. There was a race condition in the handling of the list of channels in autoservice. The problem was that it was possible for a channel to get removed from autoservice and destroyed, while the autoservice thread was still messing with the channel. This led to memory corruption, and caused crashes. This explains multiple backtraces I have seen that have references to autoservice, but do to the nature of the issue (memory corruption), could cause crashes in a number of areas. (fixes the crash in BE-386) (closes issue ASTERISK-11165) (closes issue ASTERISK-11391) The following issues could be related. If you are the reporter of one of these, please update to include this fix and try again. (potentially fixes issue ASTERISK-10713) (potentially fixes issue ASTERISK-11545) (potentially fixes issue ASTERISK-11058) (potentially fixes issue ASTERISK-11453) (potentially fixes issue ASTERISK-10713) (potentially fixes issue ASTERISK-11437) (potentially fixes issue ASTERISK-11259) ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=105410 |