ASTERISK-29000: internationalization: UTF-8 character in channel variables causes crashes


Summary: ASTERISK-29000: internationalization: UTF-8 character in channel variables causes crashes

Reporter: Gregory Massel (gmza)
Labels:
Date Opened: 2020-07-21 11:25:59
Date Closed:
Priority: Minor
Regression? No
Status: Open/New
Components: Core/General
Versions: 16.11.1
Frequency of Occurrence: Constant
Related Issues: is duplicated by ASTERISK-29112 Invalid UTF-8 string (problem with umlauts in callerid)
Environment: Asterisk 16.11.1, Ubuntu 18.04.4 LTS
Attachments:

Description: An unexpected character in a channel variable (in the example below, the byte 0xE8, which is not valid UTF-8 on its own) will either cause Asterisk to segfault or to generate a harmless backtrace, depending on circumstances.

For example, when using Set(), the result is a harmless backtrace:

[Jul 21 08:29:12] VERBOSE[1283][C-000022ef] pbx.c: Executing [s@swvpbx-sub-get-pin-auth:4] Set("PJSIP/mongenaglodge205-00005993", "PIN_CALLER_ID=Simon<E8>") in new stack
[Jul 21 08:29:12] ERROR[1283][C-000022ef] json.c: Error building JSON from '{s: s, s: s}': Invalid UTF-8 string.
[Jul 21 08:29:12] ERROR[1283][C-000022ef] : Got 13 backtrace records
# 0: [0x563b7793a45f] asterisk json.c:613 ast_json_vpack()
# 1: [0x563b7793a361] asterisk json.c:596 ast_json_pack()
# 2: [0x563b779e0384] asterisk stasis_channels.c:831 ast_channel_publish_varset()
# 3: [0x563b77983c3a] asterisk pbx_variables.c:1118 pbx_builtin_setvar_helper()
# 4: [0x563b77983e64] asterisk pbx_variables.c:1154 pbx_builtin_setvar()
# 5: [0x563b7797811f] asterisk pbx_app.c:492 pbx_exec()
# 6: [0x563b77961a1f] asterisk pbx.c:2947 pbx_extension_helper()
# 7: [0x563b77965eb7] asterisk pbx.c:4197 ast_spawn_extension()
# 8: [0x563b77966c6b] asterisk pbx.c:4371 __ast_pbx_run()
# 9: [0x563b779685e8] asterisk pbx.c:4696 pbx_thread()
#10: [0x563b77a0a05e] asterisk utils.c:1249 dummy_start()
#11: [0x7fe7554e16db] libpthread.so.0 pthread_create.c:463 start_thread()
#12: [0x7fe7546d2a3f] libc.so.6 clone.S:97 clone()
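
The JSON error above is presumably caused by the raw byte 0xE8 at the end of "Simon<E8>". In ISO-8859-1 that byte is "è", but in UTF-8 it is the lead byte of a three-byte sequence, so a lone 0xE8 followed by the end of the string is invalid (the UTF-8 encoding of "è" is C3 A8). A minimal sketch of the check, assuming NUL-terminated input; the function names are hypothetical, and this is a hand-written byte-range check, not Asterisk's own code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: length (1-4) of the well-formed UTF-8 sequence that
 * starts at a non-NUL byte s[0], or 0 if it is invalid. A NUL terminator
 * always fails the continuation-byte range checks, so it is never skipped. */
static size_t utf8_seq_len(const unsigned char *s)
{
    unsigned char lo = 0x80, hi = 0xBF; /* allowed range for the 2nd byte */
    size_t len, i;

    if (s[0] < 0x80) {
        return 1;                          /* ASCII */
    } else if (s[0] >= 0xC2 && s[0] <= 0xDF) {
        len = 2;
    } else if (s[0] >= 0xE0 && s[0] <= 0xEF) {
        len = 3;
        if (s[0] == 0xE0) lo = 0xA0;       /* reject overlong encodings */
        if (s[0] == 0xED) hi = 0x9F;       /* reject UTF-16 surrogates */
    } else if (s[0] >= 0xF0 && s[0] <= 0xF4) {
        len = 4;
        if (s[0] == 0xF0) lo = 0x90;       /* reject overlong encodings */
        if (s[0] == 0xF4) hi = 0x8F;       /* reject code points above U+10FFFF */
    } else {
        return 0;                          /* stray continuation byte, C0/C1 or F5..FF */
    }

    if (s[1] < lo || s[1] > hi) {
        return 0;
    }
    for (i = 2; i < len; i++) {
        if (s[i] < 0x80 || s[i] > 0xBF) {
            return 0;
        }
    }
    return len;
}

/* Hypothetical helper: 1 if the NUL-terminated string is valid UTF-8, else 0. */
static int utf8_is_valid(const char *str)
{
    const unsigned char *s = (const unsigned char *) str;

    assert(str != NULL);
    while (*s) {
        size_t n = utf8_seq_len(s);

        if (n == 0) {
            return 0;
        }
        s += n;
    }
    return 1;
}
```

With this sketch, utf8_is_valid("Simon\xE8") returns 0 because 0xE8 is followed by the NUL terminator instead of two continuation bytes, while utf8_is_valid("Simon\xC3\xA8") returns 1.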

However, I have previously had cases where the UTF-8 character was pulled into the "callerid=" setting of a pjsip.conf endpoint and, when that endpoint was dialled, Asterisk segfaulted.

This is minor, as I have mitigated it by stripping all UTF-8 characters; however, in the long term it would be better for Asterisk to ignore, strip, or otherwise handle these characters rather than segfault.
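
Stripping loses the characters entirely. If the offending values are known to be ISO-8859-1 (an assumption: the report does not say where they originate, although the duplicate ASTERISK-29112 mentions umlauts), they could instead be re-encoded as UTF-8 before being stored. A minimal sketch with a hypothetical function name; it must only be applied to data that is not already UTF-8, otherwise valid UTF-8 input would be double-encoded:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: re-encode an ISO-8859-1 (Latin-1) string as UTF-8.
 * Writes at most size bytes including the terminating NUL, never splits a
 * two-byte sequence, and returns the number of bytes written (excluding NUL). */
static size_t latin1_to_utf8(char *dst, const char *src, size_t size)
{
    const unsigned char *s = (const unsigned char *) src;
    unsigned char *d = (unsigned char *) dst;
    size_t out = 0;

    assert(dst != NULL && src != NULL && size > 0);
    for (; *s; s++) {
        if (*s < 0x80) {
            /* ASCII maps to itself; keep room for the NUL. */
            if (out + 1 >= size) {
                break;
            }
            d[out++] = *s;
        } else {
            /* U+0080..U+00FF become two bytes: 110000xx 10xxxxxx. */
            if (out + 2 >= size) {
                break;
            }
            d[out++] = (unsigned char) (0xC0 | (*s >> 6));
            d[out++] = (unsigned char) (0x80 | (*s & 0x3F));
        }
    }
    d[out] = '\0';
    return out;
}
```

For example, the Latin-1 input "Simon\xE8" becomes the 7-byte UTF-8 string "Simon\xC3\xA8". Unlike stripping, this preserves the caller's name, but only under the Latin-1 assumption above.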

Comments:

By: Asterisk Team (asteriskteam) 2020-07-21 11:26:01.006-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed.

A good first step is for you to review the Asterisk Issue Guidelines (https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines) if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the Patch Contribution Process (https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process).

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

By: Gregory Massel (gmza) 2020-10-20 07:36:02.611-0500

I'm wondering whether this issue has been resolved by the following commit:

2020-07-13 15:06 +0000 [e9e441c399] Sean Bright <sean.bright@gmail.com>

* utf8.c: Add UTF-8 validation and utility functions

There are various places in Asterisk - specifically in regards to
database integration - where having some kind of UTF-8 validation would
be beneficial. This patch adds:

* Functions to validate that a given string contains only valid UTF-8
sequences.

* A function to copy a string (similar to ast_copy_string) stopping when
an invalid UTF-8 sequence is encountered.

* A UTF-8 validator that allows for progressive validation.

All of this is based on the excellent UTF-8 decoder by Björn Höhrmann.
More information is available here:

https://bjoern.hoehrmann.de/utf-8/decoder/dfa/

The API was written in such a way that should allow us to replace the
implementation later should we determine that we need something more
comprehensive.

Change-Id: I3555d787a79e7c780a7800cd26e0b5056368abf9
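
The "progressive validation" item in the commit message refers to validating data that arrives in pieces, where a chunk boundary may fall in the middle of a multi-byte character. A minimal sketch of the idea, using hypothetical names and a hand-written byte-range state machine rather than Höhrmann's table-driven DFA, so it is not the API that utf8.c actually exports:

```c
#include <assert.h>
#include <stddef.h>

enum utf8_result { UTF8_VALID, UTF8_INVALID, UTF8_UNKNOWN };

/* Hypothetical validator state, carried across chunks. */
struct utf8_validator {
    int remaining;        /* continuation bytes still expected */
    unsigned char lo, hi; /* allowed range for the next continuation byte */
    int failed;           /* sticky error flag */
};

static void utf8_validator_reset(struct utf8_validator *v)
{
    v->remaining = 0;
    v->lo = 0x80;
    v->hi = 0xBF;
    v->failed = 0;
}

/* Feed len bytes. UTF8_UNKNOWN means the data so far ends mid-sequence. */
static enum utf8_result utf8_validator_feedn(struct utf8_validator *v,
                                             const char *data, size_t len)
{
    const unsigned char *p = (const unsigned char *) data;
    size_t i;

    assert(v != NULL && data != NULL);
    for (i = 0; i < len && !v->failed; i++) {
        unsigned char b = p[i];

        if (v->remaining > 0) {
            if (b < v->lo || b > v->hi) {
                v->failed = 1;
            } else {
                v->remaining--;
                v->lo = 0x80; /* only the 2nd byte can have a narrower range */
                v->hi = 0xBF;
            }
        } else if (b < 0x80) {
            /* ASCII: nothing to track */
        } else if (b >= 0xC2 && b <= 0xDF) {
            v->remaining = 1;
        } else if (b >= 0xE0 && b <= 0xEF) {
            v->remaining = 2;
            if (b == 0xE0) v->lo = 0xA0; /* no overlong forms */
            if (b == 0xED) v->hi = 0x9F; /* no surrogates */
        } else if (b >= 0xF0 && b <= 0xF4) {
            v->remaining = 3;
            if (b == 0xF0) v->lo = 0x90; /* no overlong forms */
            if (b == 0xF4) v->hi = 0x8F; /* nothing above U+10FFFF */
        } else {
            v->failed = 1;
        }
    }
    if (v->failed) {
        return UTF8_INVALID;
    }
    return v->remaining ? UTF8_UNKNOWN : UTF8_VALID;
}
```

Feeding "Simon\xC3" yields UTF8_UNKNOWN and a following "\xA8" completes it to UTF8_VALID. Note that "Simon\xE8" on its own also yields UTF8_UNKNOWN, since more bytes could still complete the sequence, so a caller that has reached the end of its data has to treat UTF8_UNKNOWN as invalid.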

By: Joshua C. Colp (jcolp) 2020-10-20 07:38:19.631-0500

That commit only provides the API for doing such validation; it still has to be used in the relevant places, which has not yet been done.
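
As an illustration of using such a function at a call site: ast_channel_publish_varset(), which appears in the backtrace above, could sanitize the value before handing it to ast_json_pack(). The sketch below mimics the commit's "copy a string ... stopping when an invalid UTF-8 sequence is encountered" behavior with hypothetical names; it is not the real utf8.c function, and the choice to also avoid splitting a character at the buffer boundary is an assumption:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: length (1-4) of the well-formed UTF-8 sequence that
 * starts at a non-NUL byte s[0], or 0 if it is invalid. */
static size_t utf8_seq_len(const unsigned char *s)
{
    unsigned char lo = 0x80, hi = 0xBF; /* allowed range for the 2nd byte */
    size_t len, i;

    if (s[0] < 0x80) {
        return 1;
    } else if (s[0] >= 0xC2 && s[0] <= 0xDF) {
        len = 2;
    } else if (s[0] >= 0xE0 && s[0] <= 0xEF) {
        len = 3;
        if (s[0] == 0xE0) lo = 0xA0;
        if (s[0] == 0xED) hi = 0x9F;
    } else if (s[0] >= 0xF0 && s[0] <= 0xF4) {
        len = 4;
        if (s[0] == 0xF0) lo = 0x90;
        if (s[0] == 0xF4) hi = 0x8F;
    } else {
        return 0;
    }
    if (s[1] < lo || s[1] > hi) {
        return 0;
    }
    for (i = 2; i < len; i++) {
        if (s[i] < 0x80 || s[i] > 0xBF) {
            return 0;
        }
    }
    return len;
}

/* Hypothetical helper: copy src into dst (at most size bytes including the
 * NUL), stopping at the first invalid sequence or at the last complete
 * sequence that still fits. dst is always NUL-terminated. */
static void utf8_copy_valid(char *dst, const char *src, size_t size)
{
    const unsigned char *s = (const unsigned char *) src;
    size_t out = 0, n, i;

    assert(dst != NULL && src != NULL && size > 0);
    while (*s && (n = utf8_seq_len(s)) > 0 && out + n < size) {
        for (i = 0; i < n; i++) {
            dst[out++] = (char) s[i];
        }
        s += n;
    }
    dst[out] = '\0';
}
```

Here utf8_copy_valid(buf, "Simon\xE8", sizeof(buf)) leaves "Simon" in buf, so the JSON event can always be built. The trade-off is that the rest of the value is silently dropped, which a real fix might want to log.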