Summary: | ASTERISK-29415: Crash in PJSIP TLS transport | ||
Reporter: | Andrew Yager (andrewyager) | Labels: | patch security |
Date Opened: | 2021-05-05 09:28:37 | Date Closed: | 2021-07-22 16:07:38 |
Priority: | Major | Regression? | |
Status: | Closed/Complete | Components: | Channels/chan_pjsip |
Versions: | 18.3.0 | Frequency of Occurrence | |
Related Issues: | |||
Environment: | Ubuntu 20.04 OpenSSL 1.1.1f | Attachments: | ( 0) ASTERISK-29415.diff ( 1) CoreDump-brief.txt ( 2) CoreDump-full.txt ( 3) CoreDump-info.txt.zip ( 4) CoreDump-locks.txt ( 5) CoreDump-thread1.txt |
Description: | We've been able to reproduce a repeatable crash on a system with a large number of endpoints where there are mixed TLS versions in use by the clients, and Asterisk will reliably crash within 2 - 3 minutes of startup.
We have not been able to verify if the crash is simply related to unsupported TLS version traffic being recieved, or bad TLS traffic, or other memory management issues, but the stack trace seems to be pretty reliably: {code} #0 0x00007f34451d0e74 in on_accept_complete2 (ssock=0x555f4fabe688, new_ssock=0x7f3211d0dc38, src_addr=0x7f3211d0e31c, src_addr_len=16, accept_status=0) at ../src/pjsip/sip_transport_tls.c:1352 #1 0x00007f344527953e in on_handshake_complete (ssock=0x7f3211d0dc38, status=0) at ../src/pj/ssl_sock_imp_common.c:290 #2 0x00007f3445279f89 in asock_on_data_read (asock=0x7f320a7d90a0, data=0x7f320a7c9078, size=395, status=0, remainder=0x7f33bfffeb10) at ../src/pj/ssl_sock_imp_common.c:681 #3 0x00007f344526cc1c in ioqueue_on_read_complete (key=0x555f4f83b7e0, op_key=0x7f327bfffbc8, bytes_read=395) at ../src/pj/activesock.c:504 #4 0x00007f3445265180 in ioqueue_dispatch_read_event (ioqueue=0x7f33f85940a0, h=0x555f4f83b7e0) at ../src/pj/ioqueue_common_abs.c:605 #5 0x00007f34452672d5 in pj_ioqueue_poll (ioqueue=0x7f33f85940a0, timeout=0x7f33bfffedf0) at ../src/pj/ioqueue_epoll.c:720 #6 0x00007f34451bb759 in pjsip_endpt_handle_events2 (endpt=0x555f4f7fcf28, max_timeout=0x7f33bfffee50, p_count=0x0) at ../src/pjsip/sip_endpoint.c:745 #7 0x00007f34451bb832 in pjsip_endpt_handle_events (endpt=0x555f4f7fcf28, max_timeout=0x7f33bfffee50) at ../src/pjsip/sip_endpoint.c:777 #8 0x00007f33fb6cc8d6 in monitor_thread_exec (endpt=0x0) at res_pjsip.c:5166 #9 0x00007f3445268298 in thread_main (param=0x555f4f7b0cc8) at ../src/pj/os_core_unix.c:541 #10 0x00007f3444a68609 in start_thread (arg=<optimized out>) at pthread_create.c:477 #11 0x00007f34446a3293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 {code} with the full thread1 trace as: {code} !@!@!@! thread1.txt !@!@!@! $1 = {si_signo = 11, si_errno = 0, si_code = 1, _sifields = {_pad = {176, 0 <repeats 27 times>}, _kill = {si_pid = 176, si_uid = 0}, _timer = {si_tid = 176, si_overrun = 0, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _rt = {si_pid = 176, si_uid = 0, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _sigchld = {si_pid = 176, si_uid = 0, si_status = 0, si_utime = 0, si_stime = 0}, _sigfault = {si_addr = 0xb0, _addr_lsb = 0, _addr_bnd = {_lower = 0x0, _upper = 0x0}}, _sigpoll = {si_band = 176, si_fd = 0}}} Signal Stop Print Pass to program Description SIGSEGV Yes Yes Yes Segmentation fault Thread 1 (Thread 0x7f33bffff700 (LWP 59868)): #0 0x00007f34451d0e74 in on_accept_complete2 (ssock=0x555f4fabe688, new_ssock=0x7f3211d0dc38, src_addr=0x7f3211d0e31c, src_addr_len=16, accept_status=0) at ../src/pjsip/sip_transport_tls.c:1352 listener = 0x0 tls = 0x0 ssl_info = {established = -1073747744, proto = 32563, cipher = 1674508032, local_addr = {addr = {sa_family = 64638}, ipv4 = {sin_family = 64638, sin_port = 60033, sin_addr = {s_addr = 2816287024}, sin_zero = "3177000000000367316c"}, ipv6 = {sin6_family = 64638, sin6_port = 60033, sin6_flowinfo = 2816287024, sin6_addr = {s6_addr = "3177000000000367316c~374201352260315324021", u6_addr32 = {32563, 1674508032, 3934387326, 299158960}}, sin6_scope_id = 32562}}, remote_addr = {addr = {sa_family = 0}, ipv4 = {sin_family = 0, sin_port = 0, sin_addr = {s_addr = 0}, sin_zero = "360367062246063177000"}, ipv6 = {sin6_family = 0, sin6_port = 0, sin6_flowinfo = 0, sin6_addr = {s6_addr = "360367062246063177000000276026sR376177000", u6_addr32 = {2788358128, 32563, 1383274174, 32766}}, sin6_scope_id = 1383274175}}, local_cert_info = 0x7f344527cd07 <sk_X509_num+24>, remote_cert_info = 0x7f33bfffef80, verify_status = 1010030128, last_native_err = 139860241279344, grp_lock = 0x7f3445280424 <ssl_update_remote_cert_chain_info+412>} addr = "330022270061001000000000060326063<2177000000350337320021062177000000260310324342062177000000p351377277063177000000000E'E002000000000060!ݧ3177000" state_cb = 0x7f344461b00b <_int_free+1611> tmp_src_addr = {addr = {sa_family = 5}, ipv4 = {sin_family = 5, sin_port = 0, sin_addr = {s_addr = 0}, sin_zero = "x356N023062177000"}, ipv6 = {sin6_family = 5, sin6_port = 0, sin6_flowinfo = 0, sin6_addr = {s6_addr = "x356N023062177000000356065*E4177000", u6_addr32 = {323939960, 32562, 1160394222, 32564}}, sin6_scope_id = 1674508032}} is_shutdown = 32564 status = 1153712788 addr_buf = "360367062246063177000000000367316c~374201352260351377277063177000000n005(E4177000000240351377277063177000000070334320021062177000000070334320021000000000" #1 0x00007f344527953e in on_handshake_complete (ssock=0x7f3211d0dc38, status=0) at ../src/pj/ssl_sock_imp_common.c:290 ret = 1383274174 #2 0x00007f3445279f89 in asock_on_data_read (asock=0x7f320a7d90a0, data=0x7f320a7c9078, size=395, status=0, remainder=0x7f33bfffeb10) at ../src/pj/ssl_sock_imp_common.c:681 ret = 1 ssock = 0x7f3211d0dc38 #3 0x00007f344526cc1c in ioqueue_on_read_complete (key=0x555f4f83b7e0, op_key=0x7f327bfffbc8, bytes_read=395) at ../src/pj/activesock.c:504 remainder = 0 ret = 1 flags = 32563 asock = 0x7f320a7d90a0 r = 0x7f327bfffbc8 loop = 0 status = -1073747152 #4 0x00007f3445265180 in ioqueue_dispatch_read_event (ioqueue=0x7f33f85940a0, h=0x555f4f83b7e0) at ../src/pj/ioqueue_common_abs.c:605 read_op = 0x7f327bfffbc8 bytes_read = 395 has_lock = 1 rc = 0 #5 0x00007f34452672d5 in pj_ioqueue_poll (ioqueue=0x7f33f85940a0, timeout=0x7f33bfffedf0) at ../src/pj/ioqueue_epoll.c:720 i = 0 count = 1 event_cnt = 1 processed_cnt = 0 msec = 10 events = {{events = 1, data = {ptr = 0x555f4f83b7e0, fd = 1334032352, u32 = 1334032352, u64 = 93867844286432}}, {events = 32563, data = {ptr = 0x7f33bfffec10, fd = -1073746928, u32 = 3221220368, u64 = 139860241280016}}, {events = 1160201472, data = {ptr = 0xbfffec1000000000, fd = 0, u32 = 0, u64 = 13835036133769084928}}, {events = 32563, data = {ptr = 0x7f34452716c5 <pj_lock_release+50>, fd = 1160189637, u32 = 1160189637, u64 = 139862475216581}}, {events = 3805720496, data = {ptr = 0xfb72400800007f32, fd = 32562, u32 = 32562, u64 = 18118614653968875314}}, {events = 32563, data = {ptr = 0x7f33bfffec60, fd = -1073746848, u32 = 3221220448, u64 = 139860241280096}}, {events = 1160204127, data = {ptr = 0xe2d6afb000007f34, fd = 32564, u32 = 32564, u64 = 16345445068036931380}}, {events = 32562, data = {ptr = 0x7f33fb723de0 <caching_pool>, fd = -76399136, u32 = 4218568160, u64 = 139861238627808}}, {events = 3221220448, data = {ptr = 0xe2d6b0f800007f33, fd = 32563, u32 = 32563, u64 = 16345446476786204467}}, {events = 1, data = {ptr = 0x7f33bfffed08, fd = -1073746680, u32 = 3221220616, u64 = 139860241280264}}, {events = 3221220624, data = {ptr = 0xbfffec9000007f33, fd = 32563, u32 = 32563, u64 = 13835036683524931379}}, {events = 32563, data = {ptr = 0x7f344526abbd <elapsed_msec+125>, fd = 1160162237, u32 = 1160162237, u64 = 139862475189181}}, {events = 3221220616, data = {ptr = 0xbfffed1000007f33, fd = 32563, u32 = 32563, u64 = 13835037233280745267}}, {events = 32563, data = {ptr = 0x3b9aca00, fd = 1000000000, u32 = 1000000000, u64 = 1000000000}}, {events = 0, data = {ptr = 0x963e5b8541cdcd65, fd = 1104006501, u32 = 1104006501, u64 = 10826191182138035557}}, {events = 1093863564, data = {ptr = 0xea81fc7e63cef700, fd = 1674508032, u32 = 1674508032, u64 = 16898064896641398528}}} queue = {{key = 0x555f4f83b7e0, event_type = READABLE_EVENT}, {key = 0x7f33bfffed08, event_type = 3221220624}, {key = 0x7f33bfffece0, event_type = 1160154766}, {key = 0x40937d7d876adcb1, event_type = 4166598744}, {key = 0x4df, event_type = 372}, {key = 0x7f33bfffed00, event_type = 1160189637}, {key = 0x0, event_type = 4166598704}, {key = 0x7f33bfffed20, event_type = 1160262370}, {key = 0x0, event_type = 1333776912}, {key = 0x7f33bfffeda0, event_type = 1160267970}, {key = 0x7f33bfffedf0, event_type = 1333776912}, {key = 0x555f4f9c83c8, event_type = READABLE_EVENT}, {key = 0x1, event_type = 4167558488}, {key = 0x7f341dc3f4b8, event_type = 3805720664}, {key = 0x4df, event_type = 372}, {key = 0x4df, event_type = 398}} t1 = {u32 = {lo = 1832071522, hi = 290}, u64 = 1247372587362} t2 = {u32 = {lo = 1835242228, hi = 290}, u64 = 1247375758068} #6 0x00007f34451bb759 in pjsip_endpt_handle_events2 (endpt=0x555f4f7fcf28, max_timeout=0x7f33bfffee50, p_count=0x0) at ../src/pjsip/sip_endpoint.c:745 timeout = {sec = 0, msec = 10} count = 0 net_event_count = 0 c = 0 #7 0x00007f34451bb832 in pjsip_endpt_handle_events (endpt=0x555f4f7fcf28, max_timeout=0x7f33bfffee50) at ../src/pjsip/sip_endpoint.c:777 No locals. #8 0x00007f33fb6cc8d6 in monitor_thread_exec (endpt=0x0) at res_pjsip.c:5166 delay = {sec = 0, msec = 10} #9 0x00007f3445268298 in thread_main (param=0x555f4f7b0cc8) at ../src/pj/os_core_unix.c:541 rec = 0x555f4f7b0cc8 result = 0x0 rc = 0 #10 0x00007f3444a68609 in start_thread (arg=<optimized out>) at pthread_create.c:477 ret = <optimized out> pd = <optimized out> unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139860241282816, -4867490839655063574, 140730281694910, 140730281694911, 140730281695120, 139860241280896, 4761515510424952810, 4763607274558480362}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call = 0 #11 0x00007f34446a3293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 No locals. {code} While we can "reliably reproduce" this, we need to migrate a large portion of endpoints to our testing environment, which is not particularly easy to schedule to allow us to reproduce, so I'm hoping the above or the core dumps we have may be useful to help identify the likely candidate. | ||
Comments: | By: Asterisk Team (asteriskteam) 2021-05-05 09:28:44.974-0500 Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed. A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report. Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process]. Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur. Please note that by submitting data, code, or documentation to Sangoma through JIRA, you accept the Terms of Use present at [https://www.asterisk.org/terms-of-use/|https://www.asterisk.org/terms-of-use/]. By: Asterisk Team (asteriskteam) 2021-05-05 09:28:45.976-0500 This issue has been automatically restricted and set to a blocker due to being a security type issue. If this is not a security vulnerability issue it will be moved to the appropriate issue type when triaged. Please DO NOT put a code review up for this change at this time. Attach any applicable patches to this issue. By: Benjamin Keith Ford (bford) 2021-05-05 10:48:05.813-0500 Unfortunately there's not much to go off of in the backtrace other than seeing it completes the handshake. Can you provide the following as attachments: # More thread 1s. Also, when you say that they mostly seem to be the one that you have in the description, do you mean that some of them are slightly different but end up at the handshake still, or do you mean you are getting entirely different backtraces? # What does your endpoint config look like? How many are TLS vs. non-TLS? # The output of {{asterisk/contrib/scripts/ast_coredumper --tarball-coredumps --no-default-search <path_to_coredump>}}. This should include the core file itself and the associated backtrace information so we can look through memory and get some more information. If you don't have the core file, the info txt file would be helpful. If any of the attachments are too large, you can email them to us directly if you are able. By: Andrew Yager (andrewyager) 2021-05-05 11:24:52.215-0500 Ubuntu seems to have written over the older core dumps, so I'll organise some time over the weekend to bring the load back over and create the crash again. There is about 2k TLS endpoints and about 700 of UDP. Endpoint configs are relatively boring, and live in pjsip realtime with static transports. I can do specific dumps of config parameters if it's helpful and you tell me what you are after. I'll upload the outputs of the ast_coredumper in a minute. By: Andrew Yager (andrewyager) 2021-05-05 11:25:36.201-0500 Actually… attachments will definitely be too large; coreball is 280M. I'll attach the other files, but let me know how you want me to dump the coreball. By: Andrew Yager (andrewyager) 2021-05-05 11:30:03.795-0500 debug files. By: Benjamin Keith Ford (bford) 2021-05-05 11:59:45.384-0500 Don't worry about the endpoint configs, we're mostly just curious about TLS load. The core file will definitely be too large. If you can upload to something like Google Drive and share the link in an email to asteriskteam@digium.com, we can copy it over to internal storage so you can remove it after. By: Andrew Yager (andrewyager) 2021-05-05 19:42:11.767-0500 I've sent this over now. By: Benjamin Keith Ford (bford) 2021-05-06 09:53:56.858-0500 Thanks! I've opened an issue for this to investigate further and moved the core file over. Feel free to remove it whenever you'd like. By: Kevin Harwell (kharwell) 2021-05-25 13:31:19.418-0500 [~andrewyager] Could you please test the attached patch ([^ASTERISK-29415.diff]) from pjproject, and let us know if it fixes the issue? To apply the patch do the following (assumes patch is downloaded to top level Asterisk source directory): {noformat} $ git apply ASTERISK-29415.diff $ make -C third-party/pjproject/ distclean $ make $ make install {noformat} When rebuilding you should see that pjproject gets rebuilt. Something like the following: {noformat} make[1]: Leaving directory '/home/kharwell/src/asterisk/18/menuselect' [pjproject] Downloading https://raw.githubusercontent.com/asterisk/third-party/master/pjproject/2.10/pjproject-2.10.tar.bz2 to /tmp/pjproject-2.10.tar.bz2 [pjproject] Verifying /tmp/pjproject-2.10.tar.bz2 [pjproject] Verify successful [pjproject] Verifying /tmp/pjproject-2.10.tar.bz2 [pjproject] Verify successful [pjproject] Unpacking /tmp/pjproject-2.10.tar.bz2 [pjproject] Applying patches /home/kharwell/src/asterisk/18/third-party/pjproject/patches /home/kharwell/src/asterisk/18/third-party/pjproject/source [pjproject] Applying user.mak [pjproject] Applying custom include file patches/config_site.h [pjproject] Applying custom include file patches/asterisk_malloc_debug.h [pjproject] Rebuilding [pjproject] Configuring with --prefix=/opt/pjproject --disable-speex-codec --disable-speex-aec --disable-bcg729 --disable-gsm-codec --disable-ilbc-codec --disable-l16-codec --disable-g722-codec --disable-g7221-codec --disable-opencore-amr --disable-silk --disable-opus --disable-video --disable-v4l2 --disable-sound --disable-ext-sound --disable-sdl --disable-libyuv --disable-ffmpeg --disable-openh264 --disable-ipp --disable-libwebrtc --without-external-pa --without-external-srtp --enable-epoll [pjproject] Copying patches/asterisk_malloc_debug.c to source/pjsip-apps/src/asterisk_malloc_debug.c [pjproject] Rebuilding [pjproject] Compiling asterisk debug malloc stubs ..... {noformat} If you run into a problem please let us know. Thanks! By: Kevin Harwell (kharwell) 2021-06-07 11:06:21.359-0500 [~andrewyager] Any chance you might be able to apply the patch ([^ASTERISK-29415.diff]) and test it out to see if it fixes your issue? By: Andrew Yager (andrewyager) 2021-06-16 07:25:01.844-0500 We're testing this now and will feedback shortly. By: Kevin Harwell (kharwell) 2021-06-16 10:49:57.959-0500 I've updated the diff to include another patch by pjproject. While the original patch should fix the issue they felt like this other patch was related enough to add it in as well. By: Andrew Yager (andrewyager) 2021-06-22 17:17:49.396-0500 Since applying the patch to four servers we have been unable to replicate the sag fault - so it looks good! By: Kevin Harwell (kharwell) 2021-06-22 18:10:52.580-0500 [~andrewyager] Excellent news. Thanks so much for testing things out. By: Friendly Automation (friendly-automation) 2021-07-22 16:07:39.576-0500 Change 16186 merged by Friendly Automation: AST-2021-009 - pjproject-bundled: Avoid crash during handshake for TLS [https://gerrit.asterisk.org/c/asterisk/+/16186|https://gerrit.asterisk.org/c/asterisk/+/16186] By: Friendly Automation (friendly-automation) 2021-07-22 16:19:47.516-0500 Change 16187 merged by Friendly Automation: AST-2021-009 - pjproject-bundled: Avoid crash during handshake for TLS [https://gerrit.asterisk.org/c/asterisk/+/16187|https://gerrit.asterisk.org/c/asterisk/+/16187] By: Friendly Automation (friendly-automation) 2021-07-22 16:19:53.395-0500 Change 16188 merged by Friendly Automation: AST-2021-009 - pjproject-bundled: Avoid crash during handshake for TLS [https://gerrit.asterisk.org/c/asterisk/+/16188|https://gerrit.asterisk.org/c/asterisk/+/16188] By: Friendly Automation (friendly-automation) 2021-07-22 16:19:56.578-0500 Change 16185 merged by Friendly Automation: AST-2021-009 - pjproject-bundled: Avoid crash during handshake for TLS [https://gerrit.asterisk.org/c/asterisk/+/16185|https://gerrit.asterisk.org/c/asterisk/+/16185] By: Friendly Automation (friendly-automation) 2021-07-22 16:20:00.671-0500 Change 16189 merged by Friendly Automation: AST-2021-009 - pjproject-bundled: Avoid crash during handshake for TLS [https://gerrit.asterisk.org/c/asterisk/+/16189|https://gerrit.asterisk.org/c/asterisk/+/16189] By: Friendly Automation (friendly-automation) 2021-07-22 16:20:07.133-0500 Change 16190 merged by Friendly Automation: AST-2021-009 - pjproject-bundled: Avoid crash during handshake for TLS [https://gerrit.asterisk.org/c/asterisk/+/16190|https://gerrit.asterisk.org/c/asterisk/+/16190] By: Friendly Automation (friendly-automation) 2021-07-22 16:20:10.893-0500 Change 16207 merged by Friendly Automation: AST-2021-009 - pjproject-bundled: Avoid crash during handshake for TLS [https://gerrit.asterisk.org/c/asterisk/+/16207|https://gerrit.asterisk.org/c/asterisk/+/16207] By: Friendly Automation (friendly-automation) 2021-07-22 16:20:16.166-0500 Change 16210 merged by Friendly Automation: AST-2021-009 - pjproject-bundled: Avoid crash during handshake for TLS [https://gerrit.asterisk.org/c/asterisk/+/16210|https://gerrit.asterisk.org/c/asterisk/+/16210] By: Friendly Automation (friendly-automation) 2021-07-22 16:20:20.465-0500 Change 16208 merged by Friendly Automation: AST-2021-009 - pjproject-bundled: Avoid crash during handshake for TLS [https://gerrit.asterisk.org/c/asterisk/+/16208|https://gerrit.asterisk.org/c/asterisk/+/16208] By: Friendly Automation (friendly-automation) 2021-07-22 16:20:25.370-0500 Change 16209 merged by Friendly Automation: AST-2021-009 - pjproject-bundled: Avoid crash during handshake for TLS [https://gerrit.asterisk.org/c/asterisk/+/16209|https://gerrit.asterisk.org/c/asterisk/+/16209] By: Friendly Automation (friendly-automation) 2021-07-23 08:22:56.263-0500 Change 16213 merged by Friendly Automation: AST-2021-009 - pjproject-bundled: Avoid crash during handshake for TLS [https://gerrit.asterisk.org/c/asterisk/+/16213|https://gerrit.asterisk.org/c/asterisk/+/16213] |