[Home]

Summary:ASTERISK-29415: Crash in PJSIP TLS transport
Reporter:Andrew Yager (andrewyager)Labels:patch security
Date Opened:2021-05-05 09:28:37Date Closed:2021-07-22 16:07:38
Priority:MajorRegression?
Status:Closed/CompleteComponents:Channels/chan_pjsip
Versions:18.3.0 Frequency of
Occurrence
Related
Issues:
Environment:Ubuntu 20.04 OpenSSL 1.1.1fAttachments:( 0) ASTERISK-29415.diff
( 1) CoreDump-brief.txt
( 2) CoreDump-full.txt
( 3) CoreDump-info.txt.zip
( 4) CoreDump-locks.txt
( 5) CoreDump-thread1.txt
Description:We've been able to reproduce a repeatable crash on a system with a large number of endpoints where there are mixed TLS versions in use by the clients, and Asterisk will reliably crash within 2 - 3 minutes of startup.

We have not been able to verify if the crash is simply related to unsupported TLS version traffic being recieved, or bad TLS traffic, or other memory management issues, but the stack trace seems to be pretty reliably:

{code}
#0  0x00007f34451d0e74 in on_accept_complete2 (ssock=0x555f4fabe688, new_ssock=0x7f3211d0dc38, src_addr=0x7f3211d0e31c, src_addr_len=16, accept_status=0) at ../src/pjsip/sip_transport_tls.c:1352
#1  0x00007f344527953e in on_handshake_complete (ssock=0x7f3211d0dc38, status=0) at ../src/pj/ssl_sock_imp_common.c:290
#2  0x00007f3445279f89 in asock_on_data_read (asock=0x7f320a7d90a0, data=0x7f320a7c9078, size=395, status=0, remainder=0x7f33bfffeb10) at ../src/pj/ssl_sock_imp_common.c:681
#3  0x00007f344526cc1c in ioqueue_on_read_complete (key=0x555f4f83b7e0, op_key=0x7f327bfffbc8, bytes_read=395) at ../src/pj/activesock.c:504
#4  0x00007f3445265180 in ioqueue_dispatch_read_event (ioqueue=0x7f33f85940a0, h=0x555f4f83b7e0) at ../src/pj/ioqueue_common_abs.c:605
#5  0x00007f34452672d5 in pj_ioqueue_poll (ioqueue=0x7f33f85940a0, timeout=0x7f33bfffedf0) at ../src/pj/ioqueue_epoll.c:720
#6  0x00007f34451bb759 in pjsip_endpt_handle_events2 (endpt=0x555f4f7fcf28, max_timeout=0x7f33bfffee50, p_count=0x0) at ../src/pjsip/sip_endpoint.c:745
#7  0x00007f34451bb832 in pjsip_endpt_handle_events (endpt=0x555f4f7fcf28, max_timeout=0x7f33bfffee50) at ../src/pjsip/sip_endpoint.c:777
#8  0x00007f33fb6cc8d6 in monitor_thread_exec (endpt=0x0) at res_pjsip.c:5166
#9  0x00007f3445268298 in thread_main (param=0x555f4f7b0cc8) at ../src/pj/os_core_unix.c:541
#10 0x00007f3444a68609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#11 0x00007f34446a3293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
{code}

with the full thread1 trace as:

{code}
!@!@!@! thread1.txt !@!@!@!

$1 = {si_signo = 11, si_errno = 0, si_code = 1, _sifields = {_pad = {176, 0 <repeats 27 times>}, _kill = {si_pid = 176, si_uid = 0}, _timer = {si_tid = 176, si_overrun = 0, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _rt = {si_pid = 176, si_uid = 0, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _sigchld = {si_pid = 176, si_uid = 0, si_status = 0, si_utime = 0, si_stime = 0}, _sigfault = {si_addr = 0xb0, _addr_lsb = 0, _addr_bnd = {_lower = 0x0, _upper = 0x0}}, _sigpoll = {si_band = 176, si_fd = 0}}}
Signal        Stop Print Pass to program Description
SIGSEGV       Yes Yes Yes Segmentation fault

Thread 1 (Thread 0x7f33bffff700 (LWP 59868)):
#0  0x00007f34451d0e74 in on_accept_complete2 (ssock=0x555f4fabe688, new_ssock=0x7f3211d0dc38, src_addr=0x7f3211d0e31c, src_addr_len=16, accept_status=0) at ../src/pjsip/sip_transport_tls.c:1352
       listener = 0x0
       tls = 0x0
       ssl_info = {established = -1073747744, proto = 32563, cipher = 1674508032, local_addr = {addr = {sa_family = 64638}, ipv4 = {sin_family = 64638, sin_port = 60033, sin_addr = {s_addr = 2816287024}, sin_zero = "3177000000000367316c"}, ipv6 = {sin6_family = 64638, sin6_port = 60033, sin6_flowinfo = 2816287024, sin6_addr = {s6_addr = "3177000000000367316c~374201352260315324021", u6_addr32 = {32563, 1674508032, 3934387326, 299158960}}, sin6_scope_id = 32562}}, remote_addr = {addr = {sa_family = 0}, ipv4 = {sin_family = 0, sin_port = 0, sin_addr = {s_addr = 0}, sin_zero = "360367062246063177000"}, ipv6 = {sin6_family = 0, sin6_port = 0, sin6_flowinfo = 0, sin6_addr = {s6_addr = "360367062246063177000000276026sR376177000", u6_addr32 = {2788358128, 32563, 1383274174, 32766}}, sin6_scope_id = 1383274175}}, local_cert_info = 0x7f344527cd07 <sk_X509_num+24>, remote_cert_info = 0x7f33bfffef80, verify_status = 1010030128, last_native_err = 139860241279344, grp_lock = 0x7f3445280424 <ssl_update_remote_cert_chain_info+412>}
       addr = "330022270061001000000000060326063<2177000000350337320021062177000000260310324342062177000000p351377277063177000000000E'E002000000000060!ݧ3177000"
       state_cb = 0x7f344461b00b <_int_free+1611>
       tmp_src_addr = {addr = {sa_family = 5}, ipv4 = {sin_family = 5, sin_port = 0, sin_addr = {s_addr = 0}, sin_zero = "x356N023062177000"}, ipv6 = {sin6_family = 5, sin6_port = 0, sin6_flowinfo = 0, sin6_addr = {s6_addr = "x356N023062177000000356065*E4177000", u6_addr32 = {323939960, 32562, 1160394222, 32564}}, sin6_scope_id = 1674508032}}
       is_shutdown = 32564
       status = 1153712788
       addr_buf = "360367062246063177000000000367316c~374201352260351377277063177000000n005(E4177000000240351377277063177000000070334320021062177000000070334320021000000000"
#1  0x00007f344527953e in on_handshake_complete (ssock=0x7f3211d0dc38, status=0) at ../src/pj/ssl_sock_imp_common.c:290
       ret = 1383274174
#2  0x00007f3445279f89 in asock_on_data_read (asock=0x7f320a7d90a0, data=0x7f320a7c9078, size=395, status=0, remainder=0x7f33bfffeb10) at ../src/pj/ssl_sock_imp_common.c:681
       ret = 1
       ssock = 0x7f3211d0dc38
#3  0x00007f344526cc1c in ioqueue_on_read_complete (key=0x555f4f83b7e0, op_key=0x7f327bfffbc8, bytes_read=395) at ../src/pj/activesock.c:504
       remainder = 0
       ret = 1
       flags = 32563
       asock = 0x7f320a7d90a0
       r = 0x7f327bfffbc8
       loop = 0
       status = -1073747152
#4  0x00007f3445265180 in ioqueue_dispatch_read_event (ioqueue=0x7f33f85940a0, h=0x555f4f83b7e0) at ../src/pj/ioqueue_common_abs.c:605
       read_op = 0x7f327bfffbc8
       bytes_read = 395
       has_lock = 1
       rc = 0
#5  0x00007f34452672d5 in pj_ioqueue_poll (ioqueue=0x7f33f85940a0, timeout=0x7f33bfffedf0) at ../src/pj/ioqueue_epoll.c:720
       i = 0
       count = 1
       event_cnt = 1
       processed_cnt = 0
       msec = 10
       events = {{events = 1, data = {ptr = 0x555f4f83b7e0, fd = 1334032352, u32 = 1334032352, u64 = 93867844286432}}, {events = 32563, data = {ptr = 0x7f33bfffec10, fd = -1073746928, u32 = 3221220368, u64 = 139860241280016}}, {events = 1160201472, data = {ptr = 0xbfffec1000000000, fd = 0, u32 = 0, u64 = 13835036133769084928}}, {events = 32563, data = {ptr = 0x7f34452716c5 <pj_lock_release+50>, fd = 1160189637, u32 = 1160189637, u64 = 139862475216581}}, {events = 3805720496, data = {ptr = 0xfb72400800007f32, fd = 32562, u32 = 32562, u64 = 18118614653968875314}}, {events = 32563, data = {ptr = 0x7f33bfffec60, fd = -1073746848, u32 = 3221220448, u64 = 139860241280096}}, {events = 1160204127, data = {ptr = 0xe2d6afb000007f34, fd = 32564, u32 = 32564, u64 = 16345445068036931380}}, {events = 32562, data = {ptr = 0x7f33fb723de0 <caching_pool>, fd = -76399136, u32 = 4218568160, u64 = 139861238627808}}, {events = 3221220448, data = {ptr = 0xe2d6b0f800007f33, fd = 32563, u32 = 32563, u64 = 16345446476786204467}}, {events = 1, data = {ptr = 0x7f33bfffed08, fd = -1073746680, u32 = 3221220616, u64 = 139860241280264}}, {events = 3221220624, data = {ptr = 0xbfffec9000007f33, fd = 32563, u32 = 32563, u64 = 13835036683524931379}}, {events = 32563, data = {ptr = 0x7f344526abbd <elapsed_msec+125>, fd = 1160162237, u32 = 1160162237, u64 = 139862475189181}}, {events = 3221220616, data = {ptr = 0xbfffed1000007f33, fd = 32563, u32 = 32563, u64 = 13835037233280745267}}, {events = 32563, data = {ptr = 0x3b9aca00, fd = 1000000000, u32 = 1000000000, u64 = 1000000000}}, {events = 0, data = {ptr = 0x963e5b8541cdcd65, fd = 1104006501, u32 = 1104006501, u64 = 10826191182138035557}}, {events = 1093863564, data = {ptr = 0xea81fc7e63cef700, fd = 1674508032, u32 = 1674508032, u64 = 16898064896641398528}}}
       queue = {{key = 0x555f4f83b7e0, event_type = READABLE_EVENT}, {key = 0x7f33bfffed08, event_type = 3221220624}, {key = 0x7f33bfffece0, event_type = 1160154766}, {key = 0x40937d7d876adcb1, event_type = 4166598744}, {key = 0x4df, event_type = 372}, {key = 0x7f33bfffed00, event_type = 1160189637}, {key = 0x0, event_type = 4166598704}, {key = 0x7f33bfffed20, event_type = 1160262370}, {key = 0x0, event_type = 1333776912}, {key = 0x7f33bfffeda0, event_type = 1160267970}, {key = 0x7f33bfffedf0, event_type = 1333776912}, {key = 0x555f4f9c83c8, event_type = READABLE_EVENT}, {key = 0x1, event_type = 4167558488}, {key = 0x7f341dc3f4b8, event_type = 3805720664}, {key = 0x4df, event_type = 372}, {key = 0x4df, event_type = 398}}
       t1 = {u32 = {lo = 1832071522, hi = 290}, u64 = 1247372587362}
       t2 = {u32 = {lo = 1835242228, hi = 290}, u64 = 1247375758068}
#6  0x00007f34451bb759 in pjsip_endpt_handle_events2 (endpt=0x555f4f7fcf28, max_timeout=0x7f33bfffee50, p_count=0x0) at ../src/pjsip/sip_endpoint.c:745
       timeout = {sec = 0, msec = 10}
       count = 0
       net_event_count = 0
       c = 0
#7  0x00007f34451bb832 in pjsip_endpt_handle_events (endpt=0x555f4f7fcf28, max_timeout=0x7f33bfffee50) at ../src/pjsip/sip_endpoint.c:777
No locals.
#8  0x00007f33fb6cc8d6 in monitor_thread_exec (endpt=0x0) at res_pjsip.c:5166
       delay = {sec = 0, msec = 10}
#9  0x00007f3445268298 in thread_main (param=0x555f4f7b0cc8) at ../src/pj/os_core_unix.c:541
       rec = 0x555f4f7b0cc8
       result = 0x0
       rc = 0
#10 0x00007f3444a68609 in start_thread (arg=<optimized out>) at pthread_create.c:477
       ret = <optimized out>
       pd = <optimized out>
       unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139860241282816, -4867490839655063574, 140730281694910, 140730281694911, 140730281695120, 139860241280896, 4761515510424952810, 4763607274558480362}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
       not_first_call = 0
#11 0x00007f34446a3293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.
{code}

While we can "reliably reproduce" this, we need to migrate a large portion of endpoints to our testing environment, which is not particularly easy to schedule to allow us to reproduce, so I'm hoping the above or the core dumps we have may be useful to help identify the likely candidate.
Comments:By: Asterisk Team (asteriskteam) 2021-05-05 09:28:44.974-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

Please note that by submitting data, code, or documentation to Sangoma through JIRA, you accept the Terms of Use present at [https://www.asterisk.org/terms-of-use/|https://www.asterisk.org/terms-of-use/].

By: Asterisk Team (asteriskteam) 2021-05-05 09:28:45.976-0500

This issue has been automatically restricted and set to a blocker due to being a security type issue. If this is not a security vulnerability issue it will be moved to the appropriate issue type when triaged.

Please DO NOT put a code review up for this change at this time. Attach any applicable patches to this issue.

By: Benjamin Keith Ford (bford) 2021-05-05 10:48:05.813-0500

Unfortunately there's not much to go off of in the backtrace other than seeing it completes the handshake. Can you provide the following as attachments:
# More thread 1s. Also, when you say that they mostly seem to be the one that you have in the description, do you mean that some of them are slightly different but end up at the handshake still, or do you mean you are getting entirely different backtraces?
# What does your endpoint config look like? How many are TLS vs. non-TLS?
# The output of {{asterisk/contrib/scripts/ast_coredumper --tarball-coredumps --no-default-search <path_to_coredump>}}. This should include the core file itself and the associated backtrace information so we can look through memory and get some more information. If you don't have the core file, the info txt file would be helpful.

If any of the attachments are too large, you can email them to us directly if you are able.

By: Andrew Yager (andrewyager) 2021-05-05 11:24:52.215-0500

Ubuntu seems to have written over the older core dumps, so I'll organise some time over the weekend to bring the load back over and create the crash again.

There is about 2k TLS endpoints and about 700 of UDP.

Endpoint configs are relatively boring, and live in pjsip realtime with static transports. I can do specific dumps of config parameters if it's helpful and you tell me what you are after.

I'll upload the outputs of the ast_coredumper in a minute.

By: Andrew Yager (andrewyager) 2021-05-05 11:25:36.201-0500

Actually… attachments will definitely be too large; coreball is 280M. I'll attach the other files, but let me know how you want me to dump the coreball.

By: Andrew Yager (andrewyager) 2021-05-05 11:30:03.795-0500

debug files.

By: Benjamin Keith Ford (bford) 2021-05-05 11:59:45.384-0500

Don't worry about the endpoint configs, we're mostly just curious about TLS load. The core file will definitely be too large. If you can upload to something like Google Drive and share the link in an email to asteriskteam@digium.com, we can copy it over to internal storage so you can remove it after.

By: Andrew Yager (andrewyager) 2021-05-05 19:42:11.767-0500

I've sent this over now.

By: Benjamin Keith Ford (bford) 2021-05-06 09:53:56.858-0500

Thanks! I've opened an issue for this to investigate further and moved the core file over. Feel free to remove it whenever you'd like.

By: Kevin Harwell (kharwell) 2021-05-25 13:31:19.418-0500

[~andrewyager] Could you please test the attached patch ([^ASTERISK-29415.diff]) from pjproject, and let us know if it fixes the issue?

To apply the patch do the following (assumes patch is downloaded to top level Asterisk source directory):
{noformat}
$ git apply ASTERISK-29415.diff
$ make -C third-party/pjproject/ distclean
$ make
$ make install
{noformat}
When rebuilding you should see that pjproject gets rebuilt. Something like the following:
{noformat}
make[1]: Leaving directory '/home/kharwell/src/asterisk/18/menuselect'
[pjproject]  Downloading https://raw.githubusercontent.com/asterisk/third-party/master/pjproject/2.10/pjproject-2.10.tar.bz2 to /tmp/pjproject-2.10.tar.bz2
[pjproject]  Verifying /tmp/pjproject-2.10.tar.bz2
[pjproject]  Verify successful
[pjproject]  Verifying /tmp/pjproject-2.10.tar.bz2
[pjproject]  Verify successful
[pjproject]  Unpacking /tmp/pjproject-2.10.tar.bz2
[pjproject]  Applying patches /home/kharwell/src/asterisk/18/third-party/pjproject/patches /home/kharwell/src/asterisk/18/third-party/pjproject/source
[pjproject]  Applying user.mak
[pjproject]  Applying custom include file patches/config_site.h
[pjproject]  Applying custom include file patches/asterisk_malloc_debug.h
[pjproject]  Rebuilding
[pjproject]  Configuring with --prefix=/opt/pjproject --disable-speex-codec --disable-speex-aec --disable-bcg729 --disable-gsm-codec --disable-ilbc-codec --disable-l16-codec --disable-g722-codec --disable-g7221-codec --disable-opencore-amr --disable-silk --disable-opus --disable-video --disable-v4l2 --disable-sound --disable-ext-sound --disable-sdl --disable-libyuv --disable-ffmpeg --disable-openh264 --disable-ipp --disable-libwebrtc --without-external-pa --without-external-srtp --enable-epoll
[pjproject]  Copying patches/asterisk_malloc_debug.c to source/pjsip-apps/src/asterisk_malloc_debug.c
[pjproject]  Rebuilding
[pjproject]  Compiling asterisk debug malloc stubs
.....
{noformat}
If you run into a problem please let us know.

Thanks!


By: Kevin Harwell (kharwell) 2021-06-07 11:06:21.359-0500

[~andrewyager] Any chance you might be able to apply the patch ([^ASTERISK-29415.diff]) and test it out to see if it fixes your issue?

By: Andrew Yager (andrewyager) 2021-06-16 07:25:01.844-0500

We're testing this now and will feedback shortly.

By: Kevin Harwell (kharwell) 2021-06-16 10:49:57.959-0500

I've updated the diff to include another patch by pjproject. While the original patch should fix the issue they felt like this other patch was related enough to add it in as well.

By: Andrew Yager (andrewyager) 2021-06-22 17:17:49.396-0500

Since applying the patch to four servers we have been unable to replicate the sag fault - so it looks good!

By: Kevin Harwell (kharwell) 2021-06-22 18:10:52.580-0500

[~andrewyager] Excellent news. Thanks so much for testing things out.

By: Friendly Automation (friendly-automation) 2021-07-22 16:07:39.576-0500

Change 16186 merged by Friendly Automation:
AST-2021-009 - pjproject-bundled: Avoid crash during handshake for TLS

[https://gerrit.asterisk.org/c/asterisk/+/16186|https://gerrit.asterisk.org/c/asterisk/+/16186]

By: Friendly Automation (friendly-automation) 2021-07-22 16:19:47.516-0500

Change 16187 merged by Friendly Automation:
AST-2021-009 - pjproject-bundled: Avoid crash during handshake for TLS

[https://gerrit.asterisk.org/c/asterisk/+/16187|https://gerrit.asterisk.org/c/asterisk/+/16187]

By: Friendly Automation (friendly-automation) 2021-07-22 16:19:53.395-0500

Change 16188 merged by Friendly Automation:
AST-2021-009 - pjproject-bundled: Avoid crash during handshake for TLS

[https://gerrit.asterisk.org/c/asterisk/+/16188|https://gerrit.asterisk.org/c/asterisk/+/16188]

By: Friendly Automation (friendly-automation) 2021-07-22 16:19:56.578-0500

Change 16185 merged by Friendly Automation:
AST-2021-009 - pjproject-bundled: Avoid crash during handshake for TLS

[https://gerrit.asterisk.org/c/asterisk/+/16185|https://gerrit.asterisk.org/c/asterisk/+/16185]

By: Friendly Automation (friendly-automation) 2021-07-22 16:20:00.671-0500

Change 16189 merged by Friendly Automation:
AST-2021-009 - pjproject-bundled: Avoid crash during handshake for TLS

[https://gerrit.asterisk.org/c/asterisk/+/16189|https://gerrit.asterisk.org/c/asterisk/+/16189]

By: Friendly Automation (friendly-automation) 2021-07-22 16:20:07.133-0500

Change 16190 merged by Friendly Automation:
AST-2021-009 - pjproject-bundled: Avoid crash during handshake for TLS

[https://gerrit.asterisk.org/c/asterisk/+/16190|https://gerrit.asterisk.org/c/asterisk/+/16190]

By: Friendly Automation (friendly-automation) 2021-07-22 16:20:10.893-0500

Change 16207 merged by Friendly Automation:
AST-2021-009 - pjproject-bundled: Avoid crash during handshake for TLS

[https://gerrit.asterisk.org/c/asterisk/+/16207|https://gerrit.asterisk.org/c/asterisk/+/16207]

By: Friendly Automation (friendly-automation) 2021-07-22 16:20:16.166-0500

Change 16210 merged by Friendly Automation:
AST-2021-009 - pjproject-bundled: Avoid crash during handshake for TLS

[https://gerrit.asterisk.org/c/asterisk/+/16210|https://gerrit.asterisk.org/c/asterisk/+/16210]

By: Friendly Automation (friendly-automation) 2021-07-22 16:20:20.465-0500

Change 16208 merged by Friendly Automation:
AST-2021-009 - pjproject-bundled: Avoid crash during handshake for TLS

[https://gerrit.asterisk.org/c/asterisk/+/16208|https://gerrit.asterisk.org/c/asterisk/+/16208]

By: Friendly Automation (friendly-automation) 2021-07-22 16:20:25.370-0500

Change 16209 merged by Friendly Automation:
AST-2021-009 - pjproject-bundled: Avoid crash during handshake for TLS

[https://gerrit.asterisk.org/c/asterisk/+/16209|https://gerrit.asterisk.org/c/asterisk/+/16209]

By: Friendly Automation (friendly-automation) 2021-07-23 08:22:56.263-0500

Change 16213 merged by Friendly Automation:
AST-2021-009 - pjproject-bundled: Avoid crash during handshake for TLS

[https://gerrit.asterisk.org/c/asterisk/+/16213|https://gerrit.asterisk.org/c/asterisk/+/16213]