[Home]

Summary:ASTERISK-27371: Serious Network Trouble Errors with TCP Enabled
Reporter:Steven Sedory (stevensedory)Labels:
Date Opened:2017-10-23 18:44:17Date Closed:2020-01-14 11:14:05.000-0600
Priority:MajorRegression?
Status:Closed/CompleteComponents:Channels/chan_sip/General
Versions:13.17.2 Frequency of
Occurrence
Related
Issues:
Environment:FreePBX 13.0.192.19 and Asterisk 13.17.2, proxmox 4.4 on Dell R720, local RAID volume. Using TCP and obscure port for SIP. UDP 5060 still open/enabled, but firewalled to only allow Anveo Direct servers.Attachments:
Description:Per the environment explained above we are setup. The TCP port we use for our endpoints is 57875. The only other SIP traffic is UDP 5060 to Anveo Direct servers. UDP 10,000-20,000 is allowed from any for RTP. We also set qualify to 25 seconds to help keep NAT holes open.

That said, we get lots of errors like below (X.X.X.X has replaced IP's):
{noformat}
[2017-10-23 08:13:05] WARNING[10930] chan_sip.c: sip_xmit of 0x7f6058024ba0 (len 670) to X.X.X.X:50664 returned -2: Interrupted system call
[2017-10-23 08:13:05] ERROR[10930] chan_sip.c: Serious Network Trouble; __sip_xmit returns error for pkt data

[2017-10-23 05:01:57] WARNING[21567] chan_sip.c: sip_xmit of 0x7f60b0005150 (len 714) to X.X.X.X:50567 returned -2: No such file or directory
[2017-10-23 05:01:57] ERROR[21567] chan_sip.c: Serious Network Trouble; __sip_xmit returns error for pkt data

[2017-10-23 08:12:37] WARNING[10930] chan_sip.c: sip_xmit of 0x7f6058023360 (len 670) to X.X.X.X:50664 returned -2: Broken pipe
[2017-10-23 08:12:37] ERROR[10930] chan_sip.c: Serious Network Trouble; __sip_xmit returns error for pkt data
{noformat}
Now, if I have the endpoints switch to UDP 5060, the errors go away. That said, shouldn't we be able to use TCP and almost any port and work just fine?

We use an obscure port so we can open an obscure port to any, allowing endpoints to be registered from essentially any internet connection with much less security risk. And we use TCP, as it seems to eliminate most NAT traversal issues.
Comments:By: Asterisk Team (asteriskteam) 2017-10-23 18:44:17.550-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Joshua C. Colp (jcolp) 2017-10-24 18:25:11.123-0500

TCP should work fine even on a different port, but I'd suggest providing console output and "sip set debug on". As well - have you confirmed that the TCP connection is still open? Is it possible it was closed for some reason?

By: Steven Sedory (stevensedory) 2017-10-24 19:17:36.003-0500

Hi Josh. Will provide output. Should I just let the log add up from now till tomorrow sometime with "sip set debug on" on?

It seems the TCP connection either is still open or re-opens right after, as we don't have any known issues with users endpoints not working.

By: Joshua C. Colp (jcolp) 2017-10-25 05:00:01.663-0500

Yes, you could.

By: Steven Sedory (stevensedory) 2017-10-25 09:25:29.404-0500

So I let that run, and about an hour later, we got FRACKs and a crash, which are related to astobj2.c (like in our other thread here ASTERISK-27321), but not identical. We let debug run again a few hours later, replicating the FRACK and crash, but took a few more hours than the first crash.

Here's a link to the entire log (was far too big to upload): https://cloud.verticalcomputers.com/index.php/s/igDEHxLFcWnw6KV

Here's a snip of it:

<--- Transmitting (NAT) to 47.145.213.138:5176 --->
SIP/2.0 200 OK
Via: SIP/2.0/TCP 192.168.1.2:5176;branch=z9hG4bK-ff7f9fcb;received=47.145.213.138;rport=5176
From: <sip:303@dubois.ringvertical.com>;tag=SP25fb38ba7f9fae97a
To: <sip:303@dubois.ringvertical.com>;tag=as5424874d
Call-ID: 65c67b3c@192.168.1.2
CSeq: 213162 REGISTER
Server: FPBX-13.0.192.19(13.17.1)
Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, SUBSCRIBE, NOTIFY, INFO, PUBLISH, MESSAGE
Supported: replaces, timer
Expires: 60
Contact: <sip:303@47.145.213.138:5176;transport=tcp>;expires=60
Date: Wed, 25 Oct 2017 01:09:07 GMT
Content-Length: 0


<------------>
Scheduling destruction of SIP dialog '65c67b3c@192.168.1.2' in 32000 ms (Method: REGISTER)
Really destroying SIP dialog '231a334b31c9638b7979bfd5696944da@23.187.0.112:57875' Method: OPTIONS

<--- SIP read from TCP:47.145.213.138:5176 --->
SIP/2.0 200 OK
Call-ID: 3f9a3b9065ea248c1f99eb074fb462f1@23.187.0.112:57875
CSeq: 102 OPTIONS
Content-Length: 0
From: "Unknown" <sip:Unknown@23.187.0.112:57875>;tag=as4b7c1172
To: <sip:303@47.145.213.138:5176;transport=tcp>
Via: SIP/2.0/TCP 23.187.0.112:57875;branch=z9hG4bK67cde5df;received=0.62.224.144;rport=1

<------------->
--- (7 headers 0 lines) ---
[2017-10-24 18:09:07] NOTICE[3003]: chan_sip.c:30191 sip_poke_peer: Still have a QUALIFY dialog active, deleting
[2017-10-24 18:09:07] ERROR[3003]: astobj2.c:121 INTERNAL_OBJ: FRACK!, Failed assertion user_data is NULL (0)
Got 15 backtrace records
#0: [0x60614a] /usr/sbin/asterisk(__ast_assert_failed+0x88) [0x60614a]
#1: [0x45da45] /usr/sbin/asterisk() [0x45da45]
#2: [0x45dae3] /usr/sbin/asterisk() [0x45dae3]
#3: [0x45f0ef] /usr/sbin/asterisk(__ao2_unlink+0x20) [0x45f0ef]
#4: [0x7fbd88cb6c1e] /usr/lib64/asterisk/modules/chan_sip.so(+0x17c1e) [0x7fbd88cb6c1e]
#5: [0x7fbd88d3a0c6] /usr/lib64/asterisk/modules/chan_sip.so(+0x9b0c6) [0x7fbd88d3a0c6]
#6: [0x7fbd88cf486c] /usr/lib64/asterisk/modules/chan_sip.so(+0x5586c) [0x7fbd88cf486c]
#7: [0x5c23de] /usr/sbin/asterisk(ast_sched_runq+0x13f) [0x5c23de]
#8: [0x7fbd88d38405] /usr/lib64/asterisk/modules/chan_sip.so(+0x99405) [0x7fbd88d38405]
#9: [0x602d4c] /usr/sbin/asterisk() [0x602d4c]
[2017-10-24 18:09:07] ERROR[3003]: astobj2.c:121 INTERNAL_OBJ: FRACK!, Failed assertion user_data is NULL (0)
Got 15 backtrace records
#0: [0x60614a] /usr/sbin/asterisk(__ast_assert_failed+0x88) [0x60614a]
#1: [0x45da45] /usr/sbin/asterisk() [0x45da45]
#2: [0x45dae3] /usr/sbin/asterisk() [0x45dae3]
#3: [0x45f0ef] /usr/sbin/asterisk(__ao2_unlink+0x20) [0x45f0ef]
#4: [0x7fbd88cb6c3c] /usr/lib64/asterisk/modules/chan_sip.so(+0x17c3c) [0x7fbd88cb6c3c]
#5: [0x7fbd88d3a0c6] /usr/lib64/asterisk/modules/chan_sip.so(+0x9b0c6) [0x7fbd88d3a0c6]
#6: [0x7fbd88cf486c] /usr/lib64/asterisk/modules/chan_sip.so(+0x5586c) [0x7fbd88cf486c]
#7: [0x5c23de] /usr/sbin/asterisk(ast_sched_runq+0x13f) [0x5c23de]
#8: [0x7fbd88d38405] /usr/lib64/asterisk/modules/chan_sip.so(+0x99405) [0x7fbd88d38405]
#9: [0x602d4c] /usr/sbin/asterisk() [0x602d4c]
[2017-10-24 18:09:07] ERROR[3003]: astobj2.c:121 INTERNAL_OBJ: FRACK!, Failed assertion user_data is NULL (0)
Got 15 backtrace records
#0: [0x60614a] /usr/sbin/asterisk(__ast_assert_failed+0x88) [0x60614a]
#1: [0x45da45] /usr/sbin/asterisk() [0x45da45]
#2: [0x45dae3] /usr/sbin/asterisk() [0x45dae3]
#3: [0x45f0ef] /usr/sbin/asterisk(__ao2_unlink+0x20) [0x45f0ef]
#4: [0x7fbd88cb6c5a] /usr/lib64/asterisk/modules/chan_sip.so(+0x17c5a) [0x7fbd88cb6c5a]
#5: [0x7fbd88d3a0c6] /usr/lib64/asterisk/modules/chan_sip.so(+0x9b0c6) [0x7fbd88d3a0c6]
#6: [0x7fbd88cf486c] /usr/lib64/asterisk/modules/chan_sip.so(+0x5586c) [0x7fbd88cf486c]
#7: [0x5c23de] /usr/sbin/asterisk(ast_sched_runq+0x13f) [0x5c23de]
#8: [0x7fbd88d38405] /usr/lib64/asterisk/modules/chan_sip.so(+0x99405) [0x7fbd88d38405]
#9: [0x602d4c] /usr/sbin/asterisk() [0x602d4c]
[2017-10-24 18:09:07] ERROR[3003]: astobj2.c:121 INTERNAL_OBJ: FRACK!, Failed assertion user_data is NULL (0)
Got 15 backtrace records
v12*CLI>
Disconnected from Asterisk server
Asterisk cleanly ending (0).
Executing last minute cleanups

By: Steven Sedory (stevensedory) 2017-10-25 09:26:59.181-0500

Also, the FreePBX team updated their RPMs with BETTER_BACKTRACES last night. I just updated this morning, so we'll have better debug info on the next crash (I'm running sip debug on a couple less critical servers now, during business hours, to hopefully get another crash or FRACK).

By: Kevin Harwell (kharwell) 2017-10-26 15:28:31.669-0500

Once you get a better backtrace please attach it to the issue along with the debug log at the time of the crash. I realize the log may be quite large, so I suggest trimming it down to contain the last few minutes before the crash. Or to at least a size where you can attach it to this issue. Otherwise as this is chan_sip (it is in extended support which means it is supported by the Asterisk community and it may be a while before someone looks at this) the logs may be gone if hosted externally.

By: Steven Sedory (stevensedory) 2017-11-09 16:21:13.178-0600

So we still get lots of these serious network trouble errors, but we haven't had any FRACKs recently. Andrew Nagy from Sangoma says they're unrelated. Anything conclusive from my post on the 25th?

If not, what else can we do to find the cause of these errors?

Again, we use TCP and and port 57875 for SIP. I've noticed on one of our servers that is mostly VPN users, and has about 20 or so NAT'd users, and uses defautl UDP 5060, it has none of these errors.

By: Joshua C. Colp (jcolp) 2017-12-05 11:31:47.266-0600

There does appear to be a problem with reference counting in the qualify support potentially. Without the extra backtrace information and log it's very much guessing though. If a community member has any hope of looking into this then they'd need to see the additional information.

By: Asterisk Team (asteriskteam) 2017-12-19 12:00:01.926-0600

Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidlines [1].

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines