
Summary: ASTERISK-29137: pjsip: Broken pipe when sending requests
Reporter: Luke Escude (lukeescude)
Labels:
Date Opened: 2020-10-20 14:03:51
Date Closed: 2023-02-20 12:00:16.000-0600
Priority: Minor
Regression?:
Status: Closed/Complete
Components: pjproject/pjsip
Versions: 16.13.0, 16.30.0
Frequency of Occurrence:
Related Issues:
Environment: CentOS x64
Attachments:
Description: We have about 450 Asterisk instances in production, and I've only seen this happen a few times:

{noformat}
[Oct 20 18:57:18] ERROR[11455]: res_pjsip.c:4290 endpt_send_request: Error 120032 'Broken pipe' sending NOTIFY request to endpoint 10801
[Oct 20 18:57:18] WARNING[302]: pjproject: <?>:     tsx0x7f29403de688 .......Error sending Response msg 200/SUBSCRIBE/cseq=1135 (tdta0x7f29403dee98): Broken pipe
[Oct 20 18:57:18] ERROR[4089]: res_pjsip.c:4290 endpt_send_request: Error 120032 'Broken pipe' sending NOTIFY request to endpoint 10801
[Oct 20 18:57:18] ERROR[13740]: res_pjsip.c:4290 endpt_send_request: Error 120032 'Broken pipe' sending OPTIONS request to endpoint 723
{noformat}
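
For reference, the status 120032 in the messages above can be read as an operating system errno wrapped by PJLIB. The following is a minimal Python sketch, assuming the PJLIB default values for `PJ_ERRNO_START_SYS` (120000) and `PJ_ERRNO_SPACE_SIZE` (50000); the helper name `pj_status_to_errno` is made up for illustration and is not part of Asterisk or pjproject.

```python
import errno
import os

# Assumption: PJLIB defaults; a custom pjproject build could differ.
PJ_ERRNO_START_SYS = 120000   # first status value in the OS error range
PJ_ERRNO_SPACE_SIZE = 50000   # width of each PJLIB error range


def pj_status_to_errno(status):
    """Return the OS errno wrapped in a PJLIB status code.

    Returns None when the status is outside the OS error range.
    """
    if PJ_ERRNO_START_SYS <= status < PJ_ERRNO_START_SYS + PJ_ERRNO_SPACE_SIZE:
        return status - PJ_ERRNO_START_SYS
    return None


os_errno = pj_status_to_errno(120032)
# Print the raw errno, its symbolic name, and the platform's message for it.
print(os_errno, errno.errorcode.get(os_errno), os.strerror(os_errno))
```

Under these assumptions the wrapped value is errno 32, which on Linux is `EPIPE` ("Broken pipe"), matching the text in the log lines.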

Currently I am fixing a small bug in our proxies that causes Asterisk to add an x-ast-orig-host parameter to contacts... Could that cause something like this?

The transport is TCP for all communications.
Comments:

By: Asterisk Team (asteriskteam) 2020-10-20 14:03:52.518-0500

The severity of this issue has been automatically downgraded from "Blocker" to "Major". The "Blocker" severity is reserved for issues which have been determined to block the next release of Asterisk. This severity can only be set by privileged users. If this issue is deemed to block the next release it will be updated accordingly during the triage process.

By: Joshua C. Colp (jcolp) 2020-10-21 03:48:49.319-0500

Without a full trace leading up to that, it's extremely hard to say or investigate what has happened. There may be an off-nominal path in a TCP closure, for example, and the underlying TCP connection was dropped. It could have happened due to conditions or because of the off-nominal path. This is all theoretical.

Did this impact everything? A specific IP+port?

It is unlikely the parameter would cause it.

By: Luke Escude (lukeescude) 2020-10-21 16:33:00.026-0500

Yeah, there's not much to go off of. I have been spending the whole day getting a Graylog server up and ready to receive all Asterisk logging, so within the next few days I'll be pushing out an update across all instances that will send me all ERROR and WARNING messages.

It impacted the entire Asterisk instance, yes. No SIP packets would go out to any endpoints or trunks anymore. It didn't seem to want to create a new TCP socket either, so I had to restart that customer's phone system, which is why I couldn't dig into it further.

Hopefully after I get cluster-wide logging going, I can get more info next time.
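
As an illustration of what such centralized logging could look for, here is a minimal Python sketch that counts the `Broken pipe` errors per SIP method and endpoint. It assumes log lines shaped exactly like the `res_pjsip.c` errors quoted in this issue; the function name `count_broken_pipes` and the pattern are hypothetical and are not part of Asterisk or Graylog.

```python
import re
from collections import Counter

# Matches lines shaped like the endpt_send_request errors quoted in this issue.
BROKEN_PIPE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+ERROR\[(?P<thread>\d+)\]:\s+"
    r"res_pjsip\.c:\d+\s+endpt_send_request:\s+"
    r"Error\s+(?P<status>\d+)\s+'Broken pipe'\s+"
    r"sending\s+(?P<method>[A-Z]+)\s+request\s+to\s+endpoint\s+(?P<endpoint>.+)$"
)


def count_broken_pipes(lines):
    """Count 'Broken pipe' send errors keyed by (SIP method, endpoint)."""
    counts = Counter()
    for line in lines:
        match = BROKEN_PIPE_RE.match(line.strip())
        if match:
            counts[(match.group("method"), match.group("endpoint"))] += 1
    return counts


# Sample input: the four log lines from the issue description.
sample = [
    "[Oct 20 18:57:18] ERROR[11455]: res_pjsip.c:4290 endpt_send_request: Error 120032 'Broken pipe' sending NOTIFY request to endpoint 10801",
    "[Oct 20 18:57:18] WARNING[302]: pjproject: <?>:     tsx0x7f29403de688 .......Error sending Response msg 200/SUBSCRIBE/cseq=1135 (tdta0x7f29403dee98): Broken pipe",
    "[Oct 20 18:57:18] ERROR[4089]: res_pjsip.c:4290 endpt_send_request: Error 120032 'Broken pipe' sending NOTIFY request to endpoint 10801",
    "[Oct 20 18:57:18] ERROR[13740]: res_pjsip.c:4290 endpt_send_request: Error 120032 'Broken pipe' sending OPTIONS request to endpoint 723",
]

counts = count_broken_pipes(sample)
for (method, endpoint), n in sorted(counts.items()):
    print(method, endpoint, n)
```

The pjproject WARNING line is deliberately not matched, since it has a different layout; a sudden rise in these counts across many endpoints at once would be the signal worth alerting on.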

By: Asterisk Team (asteriskteam) 2020-11-05 12:00:00.985-0600

Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidelines [1].

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines

By: Luke Escude (lukeescude) 2023-02-03 12:04:18.867-0600

This appears to have happened again, but I have new info. Log filled up with these:

{noformat}
[Feb  3 17:57:31] ERROR[10738]: res_pjsip.c:1579 endpt_send_request: Error 120032 'Broken pipe' sending OPTIONS request to endpoint (SIP PROXY ADDRESS)
{noformat}

This is on Asterisk 16.30.

The interesting thing is that this particular customer was NOT using TCP. They've been switched over to our new UDP-only proxies. So, the whole TCP issue seems to be more of a symptom.

In this case, ALL SIP processing stopped: UDP, TCP, didn't matter which. This customer is pretty high-volume, so maybe there's some limit in pjsip that we're hitting over time.

By: Asterisk Team (asteriskteam) 2023-02-03 12:04:19.802-0600

This issue has been reopened as a result of your commenting on it as the reporter. It will be triaged once again as applicable.

By: Luke Escude (lukeescude) 2023-02-03 12:21:24.083-0600

Out of curiosity, do you think the AMI would be frozen during this time as well? If so, I can implement a Ping/Pong-style health check to restart Asterisk when this occurs.
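
A Ping/Pong check of this kind could be sketched roughly as follows in Python. This is a minimal sketch under stated assumptions, not a tested production monitor: it assumes AMI is enabled on the instance, that a manager user exists, and that the `Ping` action answers with a `Ping: Pong` header. The helper names (`build_action`, `read_response`, `ami_ping`) and the credentials in the usage comment are hypothetical.

```python
import socket


def build_action(name, **headers):
    """Serialize an AMI action: 'Key: Value' lines ended by a blank line."""
    lines = ["Action: " + name]
    lines += [f"{key}: {value}" for key, value in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("utf-8")


def read_response(reader):
    """Read AMI messages from a binary file object until one carries a
    'Response' header, skipping any events that arrive first."""
    while True:
        fields = {}
        while True:
            line = reader.readline()
            if not line:
                raise ConnectionError("AMI closed the connection")
            text = line.decode("utf-8", errors="replace").strip()
            if not text:
                break  # blank line ends the current message
            key, sep, value = text.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "Response" in fields:
            return fields


def ami_ping(host, port, username, secret, timeout=5.0):
    """Return True if AMI accepts a login and answers Ping within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            reader = sock.makefile("rb")
            reader.readline()  # discard the greeting banner line
            sock.sendall(build_action("Login", Username=username,
                                      Secret=secret, Events="off"))
            if read_response(reader).get("Response") != "Success":
                return False
            sock.sendall(build_action("Ping"))
            return read_response(reader).get("Ping") == "Pong"
    except OSError:
        # Covers connection refusal, timeouts, and a dropped connection.
        return False


# Hypothetical usage; host, port, and credentials are placeholders:
# healthy = ami_ping("127.0.0.1", 5038, "monitor", "secret")
```

A check like this only shows that the manager interface is answering; it does not by itself show that SIP traffic is flowing.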



By: Joshua C. Colp (jcolp) 2023-02-03 12:30:13.802-0600

If it's only impacting SIP sockets then no. If it's impacting all sockets then possibly. There is no answer, and nothing else that can really be said.

By: Luke Escude (lukeescude) 2023-02-03 12:38:36.150-0600

That's OK, just wanted to update this ticket as I hadn't seen it happen since opening it.

I should have run a 'manager show connected' while the error was occurring to test that theory. Unfortunately this only happens to high-volume customer systems where there are at least 400 SIP endpoints registered, so I don't have the luxury of taking a while to diagnose it.

By: Joshua C. Colp (jcolp) 2023-02-06 04:23:59.723-0600

As previously mentioned, without a full log this is like finding a needle in a haystack: effectively impossible.

By: Asterisk Team (asteriskteam) 2023-02-20 12:00:15.824-0600

Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidelines [1].

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines