[Home]

Summary:ASTERISK-28069: Dropping CDRs records with local languages
Reporter:Kirill Nikitin (dj_kill)Labels:
Date Opened:2018-09-23 09:12:55Date Closed:
Priority:MinorRegression?
Status:Open/NewComponents:CDR/cdr_pgsql
Versions:13.19.0 13.23.0 13.23.1 Frequency of
Occurrence
Constant
Related
Issues:
Environment:Debian 3.16.51-2 psql (PostgreSQL) 9.4.15Attachments:
Description:If in CDR record appears local language Asterisk may corrupt data and fail to write them in database.

How to reproduce:

1. Create context in extensions in the database with some line in local language.
Encoding of database must be UTF-8.
{noformat}
Example (with Russian):
173705
text-machine dbd54025c33844fbe672835e5960c0b7 NoOp ------------- Text Machine -------------
1
173706
text-machine dbd54025c33844fbe672835e5960c0b7 Answer
2
173707
text-machine dbd54025c33844fbe672835e5960c0b7 Hangup
9
173708
text-machine dbd54025c33844fbe672835e5960c0b7 AGI say.php,"Уважаемый Иванов Иван Иванович"
3
173709
text-machine dbd54025c33844fbe672835e5960c0b7 AGI say.php,"приглашаем вас пройти профосмотр первого августа"
4
173710
text-machine dbd54025c33844fbe672835e5960c0b7 AGI say.php,"по адресу улица казахстанская, дом номер один, кабинет номер двадцать"
5
173711
text-machine dbd54025c33844fbe672835e5960c0b7 AGI say.php,"При себе иметь удостоверение личности"
6
173712
text-machine dbd54025c33844fbe672835e5960c0b7 Wait 5
7
{noformat}
Since database accept this no special characters in that context exists.

Initiate a call and drop a call to force asterisk write Russian characters in lastapp field in CDR.

You will receive:
{noformat}
[2018-09-18 19:15:04] ERROR[10904]: cdr_pgsql.c:439 pgsql_log: Failed to insert call detail record into database!
[2018-09-18 19:15:04] ERROR[10904]: cdr_pgsql.c:440 pgsql_log: Reason: ERROR:  invalid byte sequence for encoding "UTF8": 0xd0 0x20
[2018-09-18 19:15:04] ERROR[10904]: cdr_pgsql.c:441 pgsql_log: Connection may have been lost... attempting to reconnect.
[2018-09-18 19:15:04] ERROR[10904]: cdr_pgsql.c:444 pgsql_log: Connection reestablished.
[2018-09-18 19:15:04] ERROR[10904]: cdr_pgsql.c:452 pgsql_log: HARD ERROR!  Attempted reconnection failed.  DROPPING CALL RECORD!
[2018-09-18 19:15:04] ERROR[10904]: cdr_pgsql.c:453 pgsql_log: Reason: ERROR:  invalid byte sequence for encoding "UTF8": 0xd0 0x20
{noformat}
Since CDR and context are in the same database - I think this is not database locale settings.
We test it on different 13.XX versions
We test in on PostgreSQL 9.4.XX and 9.6.XX

Byte sequence in error message varies a little, but first byte always 0xd0
Comments:By: Asterisk Team (asteriskteam) 2018-09-23 09:12:56.943-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Sean Bright (seanbright) 2018-09-25 12:59:10.009-0500

Please provide the {{CREATE TABLE}} statements of both your extensions table and your CDR table.

Also please attach your {{cdr_pgsql.conf}} file with sensitive information redacted.

By: Sean Bright (seanbright) 2018-09-25 13:34:53.021-0500

Actually, it's pretty obvious what is happening here. {{lastdata}} has enough room for 79 bytes and is truncated by {{ast_copy_string()}}. We are truncating in the middle of a UTF-8 sequence, which causes the issue.

By: Kirill Nikitin (dj_kill) 2018-09-25 14:36:28.821-0500

{{cdr_pgsql.conf}}
{noformat}
[global]
hostname=/var/run/postgresql/
port=5432
dbname=asterisk
password=password_was_here
user=asterisk
table=cdr
encoding=UTF8
{noformat}

{{global database settings}}
{noformat}
client_encoding UTF8
lc_collate en_US.UTF-8
lc_ctype en_US.UTF-8
lc_messages en_US.UTF-8
lc_monetary en_US.UTF-8
lc_numeric en_US.UTF-8
lc_time en_US.UTF-8
server_encoding UTF8
{noformat}

{{CDR table definition}}
{noformat}
--
-- PostgreSQL database dump
--

SET statement_timeout = 0;
SET lock_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SET check_function_bodies = false;
SET client_min_messages = warning;

SET search_path = public, pg_catalog;

SET default_tablespace = '';

SET default_with_oids = false;

--
-- Name: cdr; Type: TABLE; Schema: public; Owner: postgres; Tablespace:
--

CREATE TABLE cdr (
   calldate timestamp without time zone NOT NULL,
   clid character varying(80) NOT NULL,
   src character varying(80) NOT NULL,
   dst character varying(80) NOT NULL,
   dcontext character varying(80) NOT NULL,
   channel character varying(80) NOT NULL,
   dstchannel character varying(80) NOT NULL,
   lastapp character varying(255) NOT NULL,
   lastdata character varying(255) COLLATE pg_catalog."ru_RU" NOT NULL,
   duration integer NOT NULL,
   billsec integer NOT NULL,
   disposition character varying(45) NOT NULL,
   amaflags integer NOT NULL,
   accountcode character varying(20) NOT NULL,
   uniqueid character varying(150) NOT NULL,
   peeraccount character varying(20) NOT NULL,
   linkedid character varying(150) NOT NULL,
   sequence integer NOT NULL,
   userfield character varying(255) NOT NULL,
   recordingpath character varying(255)
);


ALTER TABLE cdr OWNER TO postgres;

--
-- Name: cdr_dcontext_idx; Type: INDEX; Schema: public; Owner: postgres; Tablespace:
--

CREATE INDEX cdr_dcontext_idx ON cdr USING btree (dcontext);


--
-- Name: cdr_uniqueid_idx; Type: INDEX; Schema: public; Owner: postgres; Tablespace:
--

CREATE INDEX cdr_uniqueid_idx ON cdr USING btree (uniqueid);


--
-- Name: cdr; Type: ACL; Schema: public; Owner: postgres
--

REVOKE ALL ON TABLE cdr FROM PUBLIC;
REVOKE ALL ON TABLE cdr FROM postgres;
GRANT ALL ON TABLE cdr TO postgres;
GRANT ALL ON TABLE cdr TO asterisk;
GRANT SELECT ON TABLE cdr TO monitor;


--
-- PostgreSQL database dump complete
--
{noformat}

{{extensions table definition}}

{noformat}
--
-- PostgreSQL database dump
--

SET statement_timeout = 0;
SET lock_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SET check_function_bodies = false;
SET client_min_messages = warning;

SET search_path = public, pg_catalog;

SET default_tablespace = '';

SET default_with_oids = false;

--
-- Name: extensions; Type: TABLE; Schema: public; Owner: postgres; Tablespace:
--

CREATE TABLE extensions (
   id bigint NOT NULL,
   context character varying(40) NOT NULL,
   exten character varying(40) NOT NULL,
   app character varying(40) NOT NULL,
   appdata character varying(256) NOT NULL,
   priority integer
);


ALTER TABLE extensions OWNER TO postgres;

--
-- Name: extensions_id_seq; Type: SEQUENCE; Schema: public; Owner: postgres
--

CREATE SEQUENCE extensions_id_seq
   START WITH 1
   INCREMENT BY 1
   NO MINVALUE
   NO MAXVALUE
   CACHE 1;


ALTER TABLE extensions_id_seq OWNER TO postgres;

--
-- Name: extensions_id_seq; Type: SEQUENCE OWNED BY; Schema: public; Owner: postgres
--

ALTER SEQUENCE extensions_id_seq OWNED BY extensions.id;


--
-- Name: id; Type: DEFAULT; Schema: public; Owner: postgres
--

ALTER TABLE ONLY extensions ALTER COLUMN id SET DEFAULT nextval('extensions_id_seq'::regclass);


--
-- Name: extensions_context_exten_priority_key; Type: CONSTRAINT; Schema: public; Owner: postgres; Tablespace:
--

ALTER TABLE ONLY extensions
   ADD CONSTRAINT extensions_context_exten_priority_key UNIQUE (context, exten, priority);


--
-- Name: extensions_pkey; Type: CONSTRAINT; Schema: public; Owner: postgres; Tablespace:
--

ALTER TABLE ONLY extensions
   ADD CONSTRAINT extensions_pkey PRIMARY KEY (id);


--
-- Name: extensions; Type: ACL; Schema: public; Owner: postgres
--

REVOKE ALL ON TABLE extensions FROM PUBLIC;
REVOKE ALL ON TABLE extensions FROM postgres;
GRANT ALL ON TABLE extensions TO postgres;
GRANT SELECT,INSERT,DELETE,UPDATE ON TABLE extensions TO asterisk;
GRANT SELECT ON TABLE extensions TO monitor;


--
-- Name: extensions_id_seq; Type: ACL; Schema: public; Owner: postgres
--

REVOKE ALL ON SEQUENCE extensions_id_seq FROM PUBLIC;
REVOKE ALL ON SEQUENCE extensions_id_seq FROM postgres;
GRANT ALL ON SEQUENCE extensions_id_seq TO postgres;
GRANT SELECT,UPDATE ON SEQUENCE extensions_id_seq TO asterisk;


--
-- PostgreSQL database dump complete
--
{noformat}

Sean Bright, I think you are right. But this caused CDR drops and incorrect billing data.
We didn't test with MySQL and ODBC drivers, but for all "file" format it works fine and for failed (in database) records file copy looks quite fine.

Some comments on configs above:
Connection via unix socket or via TCP socket doesn't change anything (quite obvious why)
COLLATE definition on lastdata was added later. Also nothing changed after that.


By: Sean Bright (seanbright) 2018-09-25 14:47:53.477-0500

[~dj_kill], a quick work around if you don't actually need to store the {{lastapp}} data in your CDR table is to just drop that column and then {{module reload cdr_pgsql.so}}. A true fix for this is non-trivial.

By: Kirill Nikitin (dj_kill) 2018-09-25 14:59:58.034-0500

Unfortunately we use {{lastdata}} for some applications and cannot drop that column.
But I'm curious: you said "79 bytes"
Since asterisk can save full string to file (not just first 79 bytes) I think this is something related to {{cdr_pgsql}} module.
Since {{lastdata}} always UTF-8 character string and CDR definition also {{character varying(255)}} why module operate this as byte sequence?
Decoding this sequence from bytes to UTF-8 sting may solve the problem I think.

By: Sean Bright (seanbright) 2018-09-25 15:17:33.779-0500

Then maybe you can change the {{lastdata}} column to a {{BINARY}} type (or whatever the equivalent in PostgreSQL is) so that it is stored as a string of bytes and does not assume a specific encoding.

The {{lastdata}} member of the CDR structure is defined as {{char lastdata[80]}}. It can only store 79 bytes worth of data and a {{0}} terminator. The size of the column in your table isn't relevant to what Asterisk is doing internally.

Asterisk is pulling in the UTF-8 data from your realtime table and then copying it into an 80 character byte (not character) array. Because UTF-8 characters can be 1-4 bytes, one of them is being "cut off" at the end, resulting in a damaged string.