[Home]

Summary:ASTERISK-19603: Asterisk AMI truncates long responses over medium latency connections
Reporter:WRP (wrp)Labels:
Date Opened:2012-03-28 15:57:05Date Closed:2015-02-26 09:46:52.000-0600
Priority:MajorRegression?
Status:Closed/CompleteComponents:Core/ManagerInterface
Versions:1.8.7.2 Frequency of
Occurrence
Constant
Related
Issues:
Environment:CentOS release 5.5 (Final) - 2.6.18-194.el5 #1 SMP Fri Apr 2 14:58:14 EDT 2010 x86_64 x86_64 x86_64 GNU/LinuxAttachments:
Description:Asterisk Manager is truncating long responses when it is connected to over a WAN (with reasonable latency ~50ms). This issue is not occurring when connected locally (127.0.0.1) or from a machine on the same switch/subnet.

The particular command being issued is "Action: Command\r\nCommand: database show", which produces about 5000 lines of text when functioning correctly. It ends with "--END COMMAND--".

When it is being truncated, the command returns a significantly shorter response, though it still ends with "--END COMMAND--".

A packet capture on the server side reveals that, at a certain point, the Manager has sent out around 10 packets without receiving an ACK, and then sends out a FIN. The ACKs from the client arrive shortly after, followed by a few retransmissions, and a FIN from the client. In a low-latency connection, the ACKs from the client keep up with the Manager and the entire database is shown before "--END COMMAND--" and FIN are sent from the server.

The problem appears to lie somewhere around astman_append() below, with a buffer of sorts being fully consumed and returning when the client isn't able to read quickly enough:

{code}
static int action_command(struct mansession *s, const struct message *m)
{
       const char *cmd = astman_get_header(m, "Command");
       const char *id = astman_get_header(m, "ActionID");
       char *buf, *final_buf;
       char template[] = "/tmp/ast-ami-XXXXXX";        /* template for temporary file */
       int fd;
       off_t l;

       if (ast_strlen_zero(cmd)) {
               astman_send_error(s, m, "No command provided");
               return 0;
       }

       if (check_blacklist(cmd)) {
               astman_send_error(s, m, "Command blacklisted");
               return 0;
       }

       fd = mkstemp(template);

       astman_append(s, "Response: Follows\r\nPrivilege: Command\r\n");
       if (!ast_strlen_zero(id)) {
               astman_append(s, "ActionID: %s\r\n", id);
       }
       /* FIXME: Wedge a ActionID response in here, waiting for later changes */
       ast_cli_command(fd, cmd);       /* XXX need to change this to use a FILE * */
       l = lseek(fd, 0, SEEK_END);     /* how many chars available */

       /* This has a potential to overflow the stack.  Hence, use the heap. */
       buf = ast_calloc(1, l + 1);
       final_buf = ast_calloc(1, l + 1);
       if (buf) {
               lseek(fd, 0, SEEK_SET);
               if (read(fd, buf, l) < 0) {
                       ast_log(LOG_WARNING, "read() failed: %s\n", strerror(errno));
               }
               buf[l] = '\0';
               if (final_buf) {
                       term_strip(final_buf, buf, l);
                       final_buf[l] = '\0';
               }
               astman_append(s, "%s", S_OR(final_buf, buf));
               ast_free(buf);
       }
       close(fd);
       unlink(template);
       astman_append(s, "--END COMMAND--\r\n\r\n");
       if (final_buf) {
               ast_free(final_buf);
       }
       return 0;
}
{code}

To test my theory, I created a TCP proxy on the server which connects to the Asterisk Manager, allows tcp connections to itself, and buffers Manager responses to an array instead of directly sending them to the client. This works. Removing the buffering component and making the proxy wait for each line to send to the client before reading the next line from the Manager causes the truncated behavior.

I could only find one ticket that seemed to relate to this issue, though it's from '04. Perhaps it will provide some insight to the person able to solve this issue:

[patch] Allow Manager API Command to return big amount of output data +++
ASTERISK-1948
Comments:By: Matt Jordan (mjordan) 2012-03-30 08:16:21.246-0500

While there may be improvements that could be made in the processing here, this sounds more like a client-side problem.  If the client isn't sending an ACK after 10 packets have been sent by the server, that sounds suspiciously like the client is doing a significant amount of processing on the actual data, and blocking its transport layer acknowledgements.

By: WRP (wrp) 2012-03-30 12:03:39.401-0500

To address your comment about the client, I am simply appending the data from the socket to an array as it arrives.

The speed at which the client processes is besides the point. The TCP layer is able to handle ACKs at any (reasonable) response rate, and is (and as seen in my packet captures) even taking care of retransmissions. The problem is occurring at the server application layer. Asterisk Manager is sending a "--END COMMAND--" before the TCP connection has even been terminated. Because of this behavior, I highly suspect a buffer on Asterisk's end is filling. It isn't that the socket is closing (because of slow ACKs or otherwise) and Asterisk is failing to send the entire response because of that.

By: Matt Jordan (mjordan) 2012-04-02 12:58:47.310-0500

Please attach a pcap demonstrating the scenario.  Please also attach a DEBUG log that illustrates this during the packet capture.

By: WRP (wrp) 2012-04-03 19:21:21.097-0500

The packet capture contains sensitive data. Will a screenshot of the higher-level packet conversation in Wire Shark suffice?

By: David Woolley (davidw) 2012-04-17 05:51:22.942-0500

To me this sounds like writetimout is set too low, i.e. a configuration error.



By: Matt Jordan (mjordan) 2012-05-02 08:33:00.006-0500

In order for notifications to be sent to bug marshals that you've provided feedback, you have to use the "Enter Feedback" button in JIRA - otherwise your comment can be missed.

Its up to you whether or not you'd like to provide a DEBUG log or a packet capture, both of which would aid immensely in debugging this issue.  You can try providing a screenshot, but if we are unable to reproduce this issue it will be closed.  

By: WRP (wrp) 2012-05-15 17:49:24.151-0500

Re: David Woolley's comment:

"writetimeout" appears to be deprecated.

[2012-05-15 15:47:11] NOTICE[22905]: manager.c:6690 __init_manager: Invalid keyword <writetimeout> = <2000> in manager.conf [general]

By: David Woolley (davidw) 2012-05-16 05:35:21.316-0500

From http://svn.digium.com/svn/asterisk/tags/1.8.7.2/main/manager.c:

{noformat}
} else if (!strcasecmp(var->name, "writetimeout")) {
int value = atoi(var->value);
if (value < 100) {
ast_log(LOG_WARNING, "Invalid writetimeout value '%s' at line %d\n", var->value, var->lineno);
} else {
user->writetimeout = value;
}
{noformat}

The error you quote relates to the general section, but this is a per user option.  Also, if it were deprecated, it would still be accepted, but possibly with a warning.

By: WRP (wrp) 2012-05-16 19:26:01.342-0500

Thank you, David. I'll try it in the user section and see if it resolves this issue.

By: mbrevda (lazytt) 2012-05-23 10:47:30.702-0500

a. this issue isn't just over medium latency connections, we have seen it on the same (local) host (with large datasets)
b. uping the value of writetimeout in manager.conf seems to resolve the issue

By: WRP (wrp) 2012-12-18 11:37:08.394-0600

@mbrevda You seem to be spot on. IMO writetimeout shouldn't be needed when data is actively being sent, but setting writetimeout to something higher does resolve the issue.

By: Malcolm Davenport (mdavenport) 2015-02-26 09:46:52.829-0600

I believe [~wrp] is correct.  If data's being sent, one shouldn't have to manually set a parameter to ensure that it actually all gets sent.

But, I'm going to close this, for now, as "workaround available."  If someone wants to move forward on a better solution, I'm certain that would be helpful.