ASTERISK-30500: Caller name corruption in encodings other than UTF-8

Summary: ASTERISK-30500: Caller name corruption in encodings other than UTF-8

Reporter: Basil Mi (BaMi)
Labels:
Date Opened: 2023-04-20 11:09:49
Date Closed:
Priority: Major
Regression?
Status: Triage/New
Components: Resources/res_pjsip
Versions: 18.17.0
Frequency of Occurrence: Constant
Related Issues:
Environment: FreeBSD 13.2
Attachments: ( 0) sip-capture.txt
( 1) win-1251_text_example_1.txt

Description: After this change: ASTERISK-27830
===================================
{quote}
2023-02-16 10:05 +0000 [1ddfb7551a] George Joseph <gjoseph@sangoma.com>
* res_pjsip: Replace invalid UTF-8 sequences in callerid name
* Added a new function ast_utf8_replace_invalid_chars() to
utf8.c that copies a string replacing any invalid UTF-8
sequences with the Unicode specified U+FFFD replacement
character. For example: "abc\xffdef" becomes "abc\uFFFDdef".
Any UTF-8 compliant implementation will show that character
as a � character.
* Updated res_pjsip:set_id_from_hdr() to use
ast_utf8_replace_invalid_chars and print a warning if any
invalid sequences were found during the copy.
* Updated stasis_channels:ast_channel_publish_varset to use
ast_utf8_replace_invalid_chars and print a warning if any
invalid sequences were found during the copy.
ASTERISK-27830
{quote}
===================================
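The replacement behaviour described in the commit message can be reproduced outside Asterisk. The following is a minimal Python sketch, not the C implementation in utf8.c: it relies on Python's built-in UTF-8 decoder with errors="replace", and the helper name replace_invalid_utf8 is made up for illustration.

```python
from typing import Tuple


def replace_invalid_utf8(raw: bytes) -> Tuple[str, bool]:
    """Return the scrubbed text and a flag telling whether anything was replaced."""
    try:
        # Valid UTF-8 passes through untouched.
        return raw.decode("utf-8"), False
    except UnicodeDecodeError:
        # Invalid sequences are replaced with U+FFFD.
        return raw.decode("utf-8", errors="replace"), True


text, replaced = replace_invalid_utf8(b"abc\xffdef")
print(repr(text), replaced)
```

The flag plays the role of the warning mentioned in the commit message: a caller can log when it is set.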
Some legacy devices transmit the caller name in encodings other than UTF-8. For example, the Panasonic KX-TDE600 PBX uses WINDOWS-1251, and this is not configurable.
In this case we use the ICONV function in the dialplan to convert the caller name to UTF-8 for incoming calls, and from UTF-8 back to WINDOWS-1251 for outgoing calls.

The new function {color:red} "ast_utf8_replace_invalid_chars" {color} turns the caller name into "�" characters before it can be converted to UTF-8 in the dialplan.
Users see “��” on their devices instead of the valid caller name.
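The effect can be illustrated with a short Python sketch. The caller name "Иван" is a hypothetical example, and Python's codecs stand in for both ICONV and the scrubbing step, so this only models the order of operations, not Asterisk's actual code path.

```python
name = "Иван"                  # hypothetical caller name
wire = name.encode("cp1251")   # bytes as a WINDOWS-1251 device would send them

# Order that worked before the change: the raw bytes reach the conversion intact.
converted = wire.decode("cp1251")

# Order after the change: the bytes are scrubbed as if they were UTF-8 first.
scrubbed = wire.decode("utf-8", errors="replace")

print(converted)        # the original name
print(repr(scrubbed))   # only U+FFFD replacement characters remain
```

Once the bytes have been replaced with U+FFFD, the original WINDOWS-1251 values are gone, so a later conversion in the dialplan has nothing left to work with.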

This logic worked for almost 10 years and broke in 18.17.0.
We need to be able to turn off the replacement of invalid UTF-8 sequences (e.g. from the config), or to be able to use ICONV before the replacement (before the call to {color:red} "ast_utf8_replace_invalid_chars" {color}).

Comments: By: Asterisk Team (asteriskteam) 2023-04-20 11:09:53.389-0500

WARNING

JIRA will be going read-only at the end of April, 2023. We will be starting fresh on Github at https://github.com/asterisk/asterisk at that time. No issues or patches will be copied to Github. If you file an issue on JIRA at this time you will need to recreate it on Github after this date. The same applies if you have a patch.

WARNING

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

Please note that by submitting data, code, or documentation to Sangoma through JIRA, you accept the Terms of Use present at [https://www.asterisk.org/terms-of-use/|https://www.asterisk.org/terms-of-use/].
By: George Joseph (gjoseph) 2023-04-21 07:38:05.036-0500

Can you attach a file with some example input strings in WINDOWS-1251 that I can test with?

By: Sean Bright (seanbright) 2023-04-22 11:57:28.088-0500

Packet captures of the invalid SIP messages would also be beneficial.
By: Basil Mi (BaMi) 2023-04-25 09:27:50.232-0500

Examples of input strings in WINDOWS-1251
By: Basil Mi (BaMi) 2023-04-25 09:32:39.323-0500

Packet capture with the caller name in win-1251
By: Sean Bright (seanbright) 2023-04-25 09:54:22.110-0500

We already have an iconv check in autoconf so we could potentially leverage that before scrubbing invalid byte sequences? I really hate the idea of bending over backwards for broken devices though. And adding an option to disable means that we might then generate non-compliant messages?

[~BaMi], do you have a ticket open with your vendor for them to fix this on their end?
By: Basil Mi (BaMi) 2023-04-25 12:57:12.512-0500

It might not be the best idea to forcibly change characters in the caller name.
The name may contain the correct characters, just not in the correct encoding (not UTF-8).
Some thoughts:
Use "ast_utf8_replace_invalid_chars" only if the string does not match any of the common/known encodings. If the caller name is valid in one of those encodings (win-1251, koi-8 and so on), do not correct it and leave further conversion to the user.
Or briefly: if we find that the caller name is valid win-1251, then do not call "ast_utf8_replace_invalid_chars". :-)

When a call goes to the "broken device", we use the inverse transformation UTF-8->WIN-1251: {code} Set(CALLERID(name)=${ICONV(UTF-8,WINDOWS-1251,${CALLERID(name)})});{code}
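The two directions of this conversion can be modelled in a few lines of Python. This is only a sketch of the byte-level transformation; the helper name iconv_bytes is hypothetical and is not the dialplan function itself.

```python
def iconv_bytes(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode data from the src charset to the dst charset."""
    return data.decode(src).encode(dst)


utf8_name = "Иван".encode("utf-8")   # hypothetical caller name on the UTF-8 side

# Call TO the legacy device: UTF-8 -> WINDOWS-1251.
to_device = iconv_bytes(utf8_name, "utf-8", "cp1251")

# Call FROM the legacy device: WINDOWS-1251 -> UTF-8.
from_device = iconv_bytes(to_device, "cp1251", "utf-8")

print(to_device, from_device == utf8_name)
```

In this sketch a name containing characters that WINDOWS-1251 cannot represent raises UnicodeEncodeError in the outbound direction.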

Ticket to Panasonic Corp.? :-) I think it would take them years to solve the problem. This is a large family of hardware PBXs and proprietary telephone sets for them (for example https://www.kx-td.com/telephone-systems/).

By: Sean Bright (seanbright) 2023-04-25 13:57:15.605-0500

RFC 3261 mandates UTF-8, so there should be no need to try and guess the encoding.

Do you have a SIP proxy between the non-compliant devices and Asterisk that can translate from Windows-1251 to UTF-8? If you aren't able to get your vendor to fix the bug then that would be the next best place to fix this.

-Otherwise I agree we should just have a flag to disable it on a per endpoint basis.-

Edit: We actually can't have a flag to disable it on a per endpoint basis, it would have to be done globally (gross).
By: Sean Bright (seanbright) 2023-04-25 14:05:22.695-0500

There's actually another option - don't upgrade to 18.17.0+ until your vendor fixes this issue.
By: Basil Mi (BaMi) 2023-04-27 08:14:39.186-0500

I know about RFC 3261, but it's not an argument for business. 😊
That is why Asterisk is used as a proxy between the non-compliant devices (Panasonic PBXs) and the RFC-compliant devices.
On calls FROM Panasonic, the WIN-1251->UTF-8 conversion is performed in the dialplan; on calls TO Panasonic, UTF-8->WIN-1251. It's a perfectly working and very flexible solution.
Maybe the function “ast_utf8_replace_invalid_chars” could do a check? Try to convert the input string from WIN-1251 to UTF-8. If there are no errors, leave the original input string unchanged (in WIN-1251). Otherwise, replace the bad characters. This is a partial solution.
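The proposed check can be sketched in Python to see how it would behave. This is an illustration under assumptions, not a patch: the function name scrub_unless_cp1251 is hypothetical, and Python's cp1251 codec stands in for iconv.

```python
def scrub_unless_cp1251(raw: bytes) -> bytes:
    """Leave valid UTF-8 or valid WINDOWS-1251 input alone; scrub anything else."""
    try:
        raw.decode("utf-8")
        return raw                      # already valid UTF-8
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("cp1251")
        return raw                      # converts cleanly, keep the original bytes
    except UnicodeDecodeError:
        # Neither encoding fits: replace invalid sequences with U+FFFD.
        return raw.decode("utf-8", errors="replace").encode("utf-8")


print(scrub_unless_cp1251(b"\xc8\xe2\xe0\xed"))   # WINDOWS-1251 name, unchanged
print(scrub_unless_cp1251(b"abc\xffdef"))         # also unchanged: 0xFF is a letter in WINDOWS-1251
print(scrub_unless_cp1251(b"\x98\xff"))           # 0x98 is unassigned in WINDOWS-1251, so this is scrubbed
```

In Python's cp1251 codec only byte 0x98 is undefined, so almost any byte string converts without error. Under this check nearly all invalid UTF-8 would pass through unscrubbed, including the "abc\xffdef" example from the commit message, which is one reason the check can only be a partial solution.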

By: Basil Mi (BaMi) 2023-04-27 08:19:11.368-0500

I contacted the distributor, and they contacted the local Panasonic office. They are aware of this problem. Their answer is that this is not a bug, but a feature. They use local encodings in their stations. They are different in every region. They don't plan to make any changes.