ASTERISK-27857: Attended Transfer: Attended transfer has failed if using AMI terminal to send.

[Home]

Summary: ASTERISK-27857: Attended Transfer: Attended transfer has failed if using AMI terminal to send.

Reporter: Cao Minh Hiep (hiepcm) Labels:

Date Opened: 2018-05-16 00:38:17 Date Closed:

Priority: Minor Regression?

Status: Open/New Components: Core/Bridging

Versions: 13.21.0 Frequency of
Occurrence Frequent

Related
Issues:

Environment: Asterisk v13.21.0, AMI Terminal, Telephone. Attachments: ( 0) attended_transfer.jpg
( 1) attended_transfer_failed_with_full-log_13.21.0.txt
( 2) attended_transfer_failed_with_AMI_13.21.0.txt
( 3) attended_transfer_failed_with_AMI.txt
( 4) CLI&AMI-LOGS_ver13.21.0.txt

Description: We have made an attended transfer with the following scenario:
1.Two phones(A and B) in one work-group.
2. Make a call to one of them(phone A) from an outside phone.
3. On AMI interface(used Tera Term terminal) to make an attended transfer from phone A to other(phone B).

We could not make an attended transfer from A phone to B phone.
It outputs a beep sound also.
When we tried to investigate the causing of this problem.
We found the difference logs between bug log and a normal log as below:
=>*2 201 (*2: attended transfer, 201: Extention)
We do that by the following AMI command:
{noformat}
Action: Atxfer
ActionID: 1
Channel: SIP/100002-0000008b
Exten: 201
{noformat}

We found the logs of "Channel Local/20@a_context_01-0000006e"
instead of "Channel Local/201@a_context_01-0000006e".

We also found there are different process ID in progress of "*2 201"
It's same process ID(30450) with "*2 20" and It turns to 3584 ID with 1.
Note: In the normal attended transfer log we found the same process ID for "*2201".

{noformat}
DTMF[30450][C-0000002d]: channel.c:3972 __ast_read: DTMF end '*' received on SIP/100002-0000008b, duration 0 ms
DTMF[30450][C-0000002d]: channel.c:3999 __ast_read: DTMF begin emulation of '*' with duration 100 queued on SIP/100002-0000008b
DTMF[30450][C-0000002d]: channel.c:4092 __ast_read: DTMF end emulation of '*' queued on SIP/100002-0000008b
DTMF[30450][C-0000002d]: channel.c:3972 __ast_read: DTMF end '2' received on SIP/100002-0000008b, duration 0 ms
DTMF[30450][C-0000002d]: channel.c:3999 __ast_read: DTMF begin emulation of '2' with duration 100 queued on SIP/100002-0000008b
DTMF[30450][C-0000002d]: channel.c:4092 __ast_read: DTMF end emulation of '2' queued on SIP/100002-0000008b
-- <SIP/100002-0000008b> Playing 'pbx-transfer.gsm' (language 'ja')
DTMF[30450][C-0000002d]: channel.c:3972 __ast_read: DTMF end '2' received on SIP/100002-0000008b, duration 0 ms
DTMF[30450][C-0000002d]: channel.c:4031 __ast_read: DTMF end accepted without begin '2' on SIP/100002-0000008b
DTMF[30450][C-0000002d]: channel.c:4042 __ast_read: DTMF end passthrough '2' on SIP/100002-0000008b
DTMF[30450][C-0000002d]: channel.c:3972 __ast_read: DTMF end '0' received on SIP/100002-0000008b, duration 0 ms
[May 16 11:57:37] DTMF[30450][C-0000002d]: channel.c:4031 __ast_read: DTMF end accepted without begin '0' on SIP/100002-0000008b
DTMF[30450][C-0000002d]: channel.c:4042 __ast_read: DTMF end passthrough '0' on SIP/100002-0000008b
DTMF[3584][C-0000002d]: channel.c:3972 __ast_read: DTMF end '1' received on SIP/100002-0000008b, duration 0 ms
DTMF[3584][C-0000002d]: channel.c:4031 __ast_read: DTMF end accepted without begin '1' on SIP/100002-0000008b
DTMF[3584][C-0000002d]: channel.c:4042 __ast_read: DTMF end passthrough '1' on SIP/100002-0000008b
{noformat}
Please have a look at attached test logs file.
And could you please show us the causing of the problem and fixed patch for it?

Comments: By: Asterisk Team (asteriskteam) 2018-05-16 00:38:18.420-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].
By: Joshua C. Colp (jcolp) 2018-05-16 00:47:54.625-0500

It appears the bug you have submitted is against a rather old version of a supported branch of Asterisk. There have been many issues fixed between the version you are using and the current version of your branch. Please test with the latest version in your Asterisk branch and report whether the issue persists.

Please see the Asterisk Versions [1] wiki page for info on which versions of Asterisk are supported.
[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Versions

By: Cao Minh Hiep (hiepcm) 2018-05-16 01:24:07.036-0500

Thanks for your comment.
We have just re-confirmed this bug on latest version v13.21.0.
We found the same bug in this version too.
Please take a look at attached test logs file for details.

Thank you and Best regards,

By: Benjamin Keith Ford (bford) 2018-05-17 11:19:15.926-0500

[~hiepcm], I can successfully make an attended transfer following this scenario. The steps I took are:
# Make an external call to phone A (with "t" in Dial options)
# Pick up phone A
# Do the attended transfer AMI action
# Phone B will ring - answer it
# Hang up phone A
# The outside phone is now bridged with phone B
Are these the exact steps you took? Did phone B ever ring when doing the Atxfer action via AMI?
By: Cao Minh Hiep (hiepcm) 2018-05-17 22:56:39.237-0500

Hi Benjamin Keith Ford
Thanks for your feedback.

Our scenario is the following (quite same to you).
1. Make an external call to phone A (phone A and phone B are in the same word-group)
2. Pick up phone A -> Answer it.
3. Do the attended transfer AMI action from Tera Term Terminal
4. Phone B did not ring - I heard a beep sound at phone A
5. Hang up phone A
6. The outside phone was not bridged with phone B.

We also successfully with the attended transfer AMI action on another test environment.
Note: We built asterisk and run it in the different servers.
Here is our test environment:
・Asterisk Build Server
　　- Virtual machine on a Cloud service:
　　　　CentOS Linux release 7.3.1611 (Core)
　　-Disable BUILD_NATIVE by menu-select.

・Working Server1: (Attended transfer works well on this server)
　　- Virtual machine on an other Cloud service:
　　　　CentOS Linux release 7.3.1611 (Core)
・Working Server2: (The issue occurs on this server)
　　- VMWare Esxi 6.5
　　　　CentOS Linux release 7.3.1611 (Core)
OS are same for all. Basically we used packages that has installed in CentOS7.3 ISO image.

We also can successfully make an attended transfer on real desktop phone without AMI action like this:
1. Make an external call to phone A (phone A and phone B are in the same word-group)
2. Pick up phone A -> Answer it.
3. Press *2201 on phone A for doing the attended transfer to phone B(201)
4. Phone B will ring - Answer it.
5. Hang up phone A.
6. The outside phone is now bridged with phone B -Answer it.
7. Hang up phone B and external phone.

Please let us know if you need more information about issue.
Thank you
By: Richard Mudgett (rmudgett) 2018-05-22 15:29:34.385-0500

It looks like the last digit is getting stuck in the read queue for some reason. What timing modules are installed and in use? CLI "module show like res_timing".
By: Cao Minh Hiep (hiepcm) 2018-05-22 20:54:09.495-0500

Hi Richard Mudgett
Thanks for your feedback.

Here is the information of the timing modules we have installed and in use:

On our Working Server2 which the issue occurs, we have:
"CLI> module show like res_timing
Module Description Use Count Status Support Level
res_timing_dahdi.so DAHDI Timing Interface 0 Running core
res_timing_pthread.so pthread Timing Interface 0 Running extended
res_timing_timerfd.so Timerfd Timing Interface 1 Running core
3 modules loaded"

On our Working Server1 which the attended transfer works well aslo have the same above.

Please let us know if you need more information about the issue.
Thank you

By: Cao Minh Hiep (hiepcm) 2018-05-31 03:44:13.990-0500

Hi,
We would like to update some more information about the result that we have investigated in our environment recently.
We have the following results:
No. ---- WORKING SERVER -----------BUILD SERVER -------- RESULT
1. --------- 1CPU --------------------------------- 2CPUs ------------------ X
2. --------- 2CPUs --------------------------------2CPUs ------------------ O
3. --------- 1CPU ---------------------------------1CPU ------------------- X
4. --------- 2CPUs -------------------------------1CPU -------------------- O
It depends on the number of CPU are working on each server.
Here is the result we have confirmed on our VMware server.
From this result, it seems good on the WORKING SERVER with 2CPUs and not good on 1CPU.
However, it works well on the cloud server with 1 CPU.
So We could not say it happens on a server with 1 CPU.

We hope the information will help you to find out the causing of the issue.

Thank you.

By: Joshua C. Colp (jcolp) 2018-06-12 07:59:49.267-0500

Can you please also enable debug logging (it can be done in logger.conf to a file or console) and do core set debug 9 and attach the result of a failure?
By: Cao Minh Hiep (hiepcm) 2018-06-12 22:12:15.607-0500

Hello,
Thanks for your feedback.

Here are the test logs after we enabled debug logging at logger.conf
and did core set debug 9 at CLI.
In this test, we made a call from outside to 4444 and then tried to transfer to 8888 extension.

Please let us know if you need more information about that.

Thank you.

By: Joshua C. Colp (jcolp) 2018-06-19 04:30:47.407-0500

Your debug log is against 13.12.0. Please ensure it is the latest version. We can't fix old versions, and trying to figure out things based on that takes extra work which can consume a lot of time.
By: Cao Minh Hiep (hiepcm) 2018-06-20 00:53:51.414-0500

Hello,

I am sorry for my mistake.
I reconfirmed on ver13.21.0 and attach the test logs after we enabled debug logging at logger.conf and also did core set debug 9 at CLI.
This time, we made a call from outside to 202 and then tried to transfer to 100007 extensions.

Thank you.
By: Richard Mudgett (rmudgett) 2018-06-22 12:00:45.842-0500

Some of the patches committed for ASTERISK-27625 yesterday may fix this issue. Several of those patches fix scenarios where we had two threads reading frames from a channel at the same time. One thread could read and discard frames that the other needs for the attended transfer. This ties in with an earlier comment of mine where it looks like some digits are getting stuck in the read queue. Do you have another AMI or ARI connection getting channel variables with the same channel that you initiate the atxfer?
By: Cao Minh Hiep (hiepcm) 2018-07-02 21:24:54.273-0500

Hello Richard Mudgett
Thanks for your feedback.
I am sorry to be late.

I have added these patches then tested it on our server test environment.
The result is these patches could not fix our issue.
And I don't have another AMI or ARI connection getting channel variables with the same channel that I initiate the atxfer.

Thank you.
Hiep.

By: Cao Minh Hiep (hiepcm) 2018-08-29 04:14:25.954-0500

Hello Richard Mudgett, Joshua Colp

I found the following issue that has the same situation with our issue.
ASTERISK-27943
Please have a look at the fixed patch. It might be a good way to fix our issue.

Please note that It will be ok if transfer to an agent number just has two digits.
And It works fine on Server with two cores, the issue will happen if I disable one of them.

Thank you.

By: Cao Minh Hiep (hiepcm) 2018-09-30 22:23:12.707-0500

Hello Richard Mudgett, Joshua Colp
Do you think the way to fix the problem in ASTERISK-27943 is good for this issue (ASTERISK-27857)?
Although, I understand that we don't use the same function for sending a message at 27943 in sending digits at this issue.

By: Cao Minh Hiep (hiepcm) 2019-09-02 20:47:11.247-0500

Hello, We've investigated this problem again recently.
About investigation, the result is we summarized it in attended_transfer.jpg as an attachment.

In the attended_transfer.jpg, We realize that between two digits we need at least 45 milliseconds for pending. We saw between "0" and "4" of an exten number(204),we have 45 milliseconds. this time, the process of exten matching and autoservice_run will be called so we need time for it.

As we know it, autoservice_run will run under a child process which created before.
it will read __ast_read() to get data from each frame to clean up or reorganize data.

With 2 cores CPU or more, the process of exten matching and autoservice_run will take time less than 45 ms, so it's fine to get next digit "4".

But with 1 core CPU, the process of exten matching and autoservice_run will take a time more than 45 ms, this time the digit "4" will be ready to read, so it will be clean up by autoservice_run.

To solve this problem, we try to take time to send each digit more than that as below:
main/manager.c:
action_atxfer():
for (digit = exten; *digit; ++digit) {
+ usleep(150000);
struct ast_frame f = { AST_FRAME_DTMF, .subclass.integer = *digit };
ast_queue_frame(chan, &f);
}

Could you please take a look at these and give us comment to solve the problem?.

By: Cao Minh Hiep (hiepcm) 2019-10-01 04:48:39.734-0500

Hello,
Could you give us some advice about the last report from us?.
Although We know we shouldn't use "sleep" to fix bug source code.
If It's not, Could you tell us another way to fix it base on the information we had investigated?
Let us know if you need more information.
Thank you very much.