Summary: | ASTERISK-19837: Asterisk crashing regularly in 1.8.11.1 due to memory corruption | ||||
Reporter: | David Cunningham (dcunningham) | Labels: | |||
Date Opened: | 2012-05-03 18:42:33 | Date Closed: | 2012-10-02 17:18:50 | ||
Priority: | Major | Regression? | |||
Status: | Closed/Complete | Components: | Core/General | ||
Versions: | 1.8.11.1 | Frequency of Occurrence | Frequent | ||
Related Issues: |
| ||||
Environment: | CentOS release 5.6 (Final), i386 | Attachments: | ( 0) full-19837-3.log.gz | ||
Description: | Since we upgraded a system to Asterisk 1.8.11.1 we are getting regular crashes. The backtrace is below and the core file will be attached. Thanks.

(gdb) bt full
#0 0x007f5175 in _int_free () from /lib/libc.so.6
No symbol table info available.
#1 0x007f5b09 in free () from /lib/libc.so.6
No symbol table info available.
#2 0x0818cc5c in destroy (pvt=0xa055a08) at translate.c:145
        t = 0x4071f520
#3 0x0818cf40 in ast_translator_free_path (p=0xa055a08) at translate.c:233
        pn = 0x0
#4 0x080ada79 in free_translation (clonechan=0x44b958a8) at channel.c:2705
No locals.
#5 0x080ae11e in ast_hangup (chan=0x44b958a8) at channel.c:2794
        extra_str = "xX\271DxX\271D\000\000\000\000\000\000\000\000\024\\\271D\000\000\000\000\230o\216B\350c\b\bÔ‹\034\b\364\017\216\000(GT\n@!\216\000\270o\216B\t[\177\000\000\000\000\000\000\002\002"
        was_zombie = 0
        __PRETTY_FUNCTION__ = "ast_hangup"
#6 0x08144811 in __ast_pbx_run (c=0x44b958a8, args=0x0) at pbx.c:5235
        found = 1
        res = -1
        autoloopflag = 0
        error = 1
        __PRETTY_FUNCTION__ = "__ast_pbx_run"
#7 0x08144b3d in pbx_thread (data=0x44b958a8) at pbx.c:5327
        c = 0x44b958a8
#8 0x08196782 in dummy_start (data=0x44a0b2f8) at utils.c:1004
        __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {9715700, 0, 1116634000, 1116631992, -422965372, -1403184250}, __mask_was_saved = 0}}, __pad = {0x428e7470, 0x0, 0x0, 0x0}}
        __cancel_routine = 0x807ae77 <ast_unregister_thread>
        __cancel_arg = 0x428e7b90
        not_first_call = 0
        ret = 0x89e46e
        a = {start_routine = 0x8144b1e <pbx_thread>, data = 0x44b958a8, name = 0x44a79568 "pbx_thread", ' ' <repeats 11 times>, "started at [ 5353] pbx.c ast_pbx_start()"}
#9 0x00933832 in start_thread () from /lib/libpthread.so.0
No symbol table info available.
#10 0x0085e45e in clone () from /lib/libc.so.6
No symbol table info available. | ||||
Comments: |
By: David Cunningham (dcunningham) 2012-05-03 18:45:00.705-0500
The core file is 104Mb in size. How can we provide this to you please?

By: Richard Mudgett (rmudgett) 2012-05-03 19:27:02.753-0500
Core files are not wanted because:
1) They are huge.
2) They are only useful on the machine that generated them.
https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace
What do you need to do to cause this crash?

By: David Cunningham (dcunningham) 2012-05-03 19:32:09.296-0500
From our view it appears to be random - in other words, we have not identified any particular action that causes the crash. Thanks.

By: Matt Jordan (mjordan) 2012-05-04 08:13:05.303-0500
We require a complete debug log to help triage the issue. This document will provide instructions on how to collect debugging logs from an Asterisk machine for the purpose of helping bug marshals troubleshoot an issue: https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information
Please collect a DEBUG log illustrating the actions Asterisk takes leading up to the crash.

By: David Cunningham (dcunningham) 2012-05-04 18:24:46.024-0500
Asterisk full log is attached. The crash happened on line 18993 at 16:20:14. Thanks.

By: Russell Bryant (russell) 2012-06-01 06:50:42.825-0500
Does this happen in a test environment? I was wondering if it would be possible to test under valgrind: https://wiki.asterisk.org/wiki/display/AST/Valgrind

By: David Cunningham (dcunningham) 2012-06-04 04:24:59.002-0500
This is happening on a production environment. I think the customer would be unhappy to run it under valgrind unless we were very certain it wouldn't cause any issues noticeable by their customers. We could upgrade a test system to this version of Asterisk, but since we don't know how to reproduce the problem it may not happen. Can you advise if a valgrind trace is absolutely required, and if there's any risk associated with it?

By: Russell Bryant (russell) 2012-06-04 04:45:09.666-0500
There is certainly a risk. I wouldn't try to use it in production.

By: David Cunningham (dcunningham) 2012-06-04 04:52:24.973-0500
OK thanks, please advise if we can help with anything else.

By: Russell Bryant (russell) 2012-06-04 19:34:55.892-0500
After looking into this some more, I'm not sure how to proceed unless we're able to reproduce this in a test environment. If the customer has any sort of downtime where the system isn't used when valgrind could be used, that would be great. Otherwise, I think trying as similar a setup as possible on a test system would be the best next step. Let me know if I can help.

By: David Cunningham (dcunningham) 2012-06-05 11:17:17.150-0500
Russell - thanks, will check with the customer and get back to you.

By: Igor Goncharovsky (igorg) 2012-06-07 01:15:37.249-0500
I have access to the customer's server, so I can look at the core files. Almost all core files have different backtraces, failing both when allocating and when freeing memory. The core files are created mostly during working hours (crashing approximately once every 3 days), and I see no specific way to reproduce the issue.

#0 0x40000410 in __kernel_vsyscall ()
#1 0x007b4df0 in raise () from /lib/libc.so.6
#2 0x007b6701 in abort () from /lib/libc.so.6
#3 0x007ed3ab in __libc_message () from /lib/libc.so.6
#4 0x007f6d96 in _int_malloc () from /lib/libc.so.6
#5 0x007f7cca in calloc () from /lib/libc.so.6
#6 0x081954c8 in _ast_calloc (num=1, len=402, file=0x81f57dc "/usr/src/asterisk-1.8.11.1/include/asterisk/strings.h", lineno=420, func=0x81f57cc "ast_str_create") at /usr/src/asterisk-1.8.11.1/include/asterisk/utils.h:480
#7 0x081958db in ast_str_create (init_len=390) at /usr/src/asterisk-1.8.11.1/include/asterisk/strings.h:405
#8 0x4057c0e8 in __sip_reliable_xmit (p=0x9eb68a0, seqno=102, resp=0, data=0x9ee1510, fatal=0, sipmethod=14) at chan_sip.c:3770
#9 0x4057e2a1 in send_request (p=0x9eb68a0, req=0x4216fdd8, reliable=XMIT_RELIABLE, seqno=102) at chan_sip.c:4229
#10 0x405ab9e0 in transmit_request (p=0x9eb68a0, sipmethod=14, seqno=102, reliable=XMIT_RELIABLE, newbranch=0) at chan_sip.c:13600
#11 0x40588a09 in sip_hangup (ast=0xa247338) at chan_sip.c:6336
#12 0x080ae3bb in ast_hangup (chan=0xa247338) at channel.c:2831
#13 0x4055ad52 in hanguptree (outgoing=0xa65e8b0, exception=0x0, answered_elsewhere=0) at app_dial.c:700

(gdb) bt
#0 0x40000410 in __kernel_vsyscall ()
#1 0x007b4df0 in raise () from /lib/libc.so.6
#2 0x007b6701 in abort () from /lib/libc.so.6
#3 0x007ed3ab in __libc_message () from /lib/libc.so.6
#4 0x007f6721 in _int_malloc () from /lib/libc.so.6
#5 0x007f7fb7 in malloc () from /lib/libc.so.6
#6 0x08195461 in _ast_malloc (len=229, file=0x81e05f4 "manager.c", lineno=5029, func=0x81e322a "append_event") at /usr/src/asterisk-1.8.11.1/include/asterisk/utils.h:457
#7 0x0812d971 in append_event ( str=0x8e05744 "Event: AGIExec\r\nPrivilege: agi,all\r\nTimestamp: 1337788773.143723\r\nSubEvent: Start\r\nChannel: Local/2@enswitch-call-exten-9e55;2\r\nCommandId: 656304319\r\nCommand: SET VARIABLE TIMEOUT(absolute) \"86398\"\r\n\r"..., category=8192) at manager.c:5029
#8 0x0812de3d in __ast_manager_event_multi

By: David Cunningham (dcunningham) 2012-06-14 02:56:32.713-0500
We've had to downgrade another customer due to regular crashes in 1.8.12.0. Can we be of any assistance with fixing this?

By: Russell Bryant (russell) 2012-06-14 11:07:38.712-0500
You could try running Asterisk under valgrind in a test environment similar to what your customers are using and see if you get any output when running test calls. :-)

By: Russell Bryant (russell) 2012-06-19 16:21:23.572-0500
Is Voicemail being used? If so, give the patch on this issue a try: ASTERISK-19923

By: Matt Jordan (mjordan) 2012-06-25 08:45:39.280-0500
David: Have you had a chance to try the patch on ASTERISK-19923? Matt

By: David Cunningham (dcunningham) 2012-06-25 18:42:33.451-0500
We're waiting to hear from the customer. Will update you, thanks.

By: Igor Goncharovsky (igorg) 2012-07-08 23:59:42.605-0500
The patch helps with this issue, as reported by the customer.

By: David Cunningham (dcunningham) 2012-07-09 00:15:42.086-0500
Sorry, yes, the customer has reported no crashes since the patch was applied. What version of Asterisk will this patch be put into please?

By: David Cunningham (dcunningham) 2012-07-09 00:15:55.107-0500
Sending back.

By: Igor Goncharovsky (igorg) 2012-07-09 00:26:32.092-0500
It is in 1.8.14.0-rc2 and should be released in 1.8.14.0. |
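
The backtraces in this report all end inside the libc allocator (`_int_free`, `_int_malloc`) but are reached from unrelated callers, which matches the observation in the comments that almost every core file looks different. The following is a minimal sketch, not part of the original report, of how pasted gdb `bt` output could be tallied by the first frame outside the allocator and abort path. The names `parse_frames`, `first_caller`, `tally_callers`, and `DETECTION_FRAMES` are hypothetical, and the set of frames to skip is an assumption based only on the function names that appear in the backtraces above.

```python
import re
from collections import Counter

# One gdb frame header, e.g. "#6 0x08195461 in _ast_malloc (len=229, ...".
# The "0x... in" part is optional because gdb omits it for some frames.
FRAME_RE = re.compile(r"#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([A-Za-z_]\w*)\s*\(")

# Assumption: these frames only show where the damage was detected
# (allocator, abort path, Asterisk allocation wrappers), not what caused it.
DETECTION_FRAMES = frozenset({
    "__kernel_vsyscall", "raise", "abort", "__libc_message",
    "_int_malloc", "_int_free", "malloc", "calloc", "free",
    "_ast_malloc", "_ast_calloc",
})


def parse_frames(backtrace):
    """Return (frame number, function name) pairs in the order gdb printed them."""
    return [(int(num), name) for num, name in FRAME_RE.findall(backtrace)]


def first_caller(backtrace):
    """Return the innermost function outside the detection frames, or None."""
    for _, name in parse_frames(backtrace):
        if name not in DETECTION_FRAMES:
            return name
    return None


def tally_callers(backtraces):
    """Count how often each first caller appears across several backtraces."""
    return Counter(first_caller(bt) for bt in backtraces)
```

Applied to the three backtraces in this report, the first callers would be `destroy`, `ast_str_create`, and `append_event`. A tally spread across unrelated callers like this is consistent with the heap being damaged somewhere else and only detected later, which is why a valgrind run was requested rather than more core files.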