Summary: | ASTERISK-27371: Serious Network Trouble Errors with TCP Enabled | ||
Reporter: | Steven Sedory (stevensedory) | Labels: | |
Date Opened: | 2017-10-23 18:44:17 | Date Closed: | 2020-01-14 11:14:05.000-0600 |
Priority: | Major | Regression? | |
Status: | Closed/Complete | Components: | Channels/chan_sip/General |
Versions: | 13.17.2 | Frequency of Occurrence | |
Related Issues: | |||
Environment: | FreePBX 13.0.192.19 and Asterisk 13.17.2, proxmox 4.4 on Dell R720, local RAID volume. Using TCP and obscure port for SIP. UDP 5060 still open/enabled, but firewalled to only allow Anveo Direct servers. | Attachments: | |
Description: | Per the environment explained above we are setup. The TCP port we use for our endpoints is 57875. The only other SIP traffic is UDP 5060 to Anveo Direct servers. UDP 10,000-20,000 is allowed from any for RTP. We also set qualify to 25 seconds to help keep NAT holes open.
That said, we get lots of errors like below (X.X.X.X has replaced IP's): {noformat} [2017-10-23 08:13:05] WARNING[10930] chan_sip.c: sip_xmit of 0x7f6058024ba0 (len 670) to X.X.X.X:50664 returned -2: Interrupted system call [2017-10-23 08:13:05] ERROR[10930] chan_sip.c: Serious Network Trouble; __sip_xmit returns error for pkt data [2017-10-23 05:01:57] WARNING[21567] chan_sip.c: sip_xmit of 0x7f60b0005150 (len 714) to X.X.X.X:50567 returned -2: No such file or directory [2017-10-23 05:01:57] ERROR[21567] chan_sip.c: Serious Network Trouble; __sip_xmit returns error for pkt data [2017-10-23 08:12:37] WARNING[10930] chan_sip.c: sip_xmit of 0x7f6058023360 (len 670) to X.X.X.X:50664 returned -2: Broken pipe [2017-10-23 08:12:37] ERROR[10930] chan_sip.c: Serious Network Trouble; __sip_xmit returns error for pkt data {noformat} Now, if I have the endpoints switch to UDP 5060, the errors go away. That said, shouldn't we be able to use TCP and almost any port and work just fine? We use an obscure port so we can open an obscure port to any, allowing endpoints to be registered from essentially any internet connection with much less security risk. And we use TCP, as it seems to eliminate most NAT traversal issues. | ||
Comments: | By: Asterisk Team (asteriskteam) 2017-10-23 18:44:17.550-0500 Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report. Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process]. By: Joshua C. Colp (jcolp) 2017-10-24 18:25:11.123-0500 TCP should work fine even on a different port, but I'd suggest providing console output and "sip set debug on". As well - have you confirmed that the TCP connection is still open? Is it possible it was closed for some reason? By: Steven Sedory (stevensedory) 2017-10-24 19:17:36.003-0500 Hi Josh. Will provide output. Should I just let the log add up from now till tomorrow sometime with "sip set debug on" on? It seems the TCP connection either is still open or re-opens right after, as we don't have any known issues with users endpoints not working. By: Joshua C. Colp (jcolp) 2017-10-25 05:00:01.663-0500 Yes, you could. By: Steven Sedory (stevensedory) 2017-10-25 09:25:29.404-0500 So I let that run, and about an hour later, we got FRACKs and a crash, which are related to astobj2.c (like in our other thread here ASTERISK-27321), but not identical. We let debug run again a few hours later, replicating the FRACK and crash, but took a few more hours than the first crash. Here's a link to the entire log (was far too big to upload): https://cloud.verticalcomputers.com/index.php/s/igDEHxLFcWnw6KV Here's a snip of it: <--- Transmitting (NAT) to 47.145.213.138:5176 ---> SIP/2.0 200 OK Via: SIP/2.0/TCP 192.168.1.2:5176;branch=z9hG4bK-ff7f9fcb;received=47.145.213.138;rport=5176 From: <sip:303@dubois.ringvertical.com>;tag=SP25fb38ba7f9fae97a To: <sip:303@dubois.ringvertical.com>;tag=as5424874d Call-ID: 65c67b3c@192.168.1.2 CSeq: 213162 REGISTER Server: FPBX-13.0.192.19(13.17.1) Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, SUBSCRIBE, NOTIFY, INFO, PUBLISH, MESSAGE Supported: replaces, timer Expires: 60 Contact: <sip:303@47.145.213.138:5176;transport=tcp>;expires=60 Date: Wed, 25 Oct 2017 01:09:07 GMT Content-Length: 0 <------------> Scheduling destruction of SIP dialog '65c67b3c@192.168.1.2' in 32000 ms (Method: REGISTER) Really destroying SIP dialog '231a334b31c9638b7979bfd5696944da@23.187.0.112:57875' Method: OPTIONS <--- SIP read from TCP:47.145.213.138:5176 ---> SIP/2.0 200 OK Call-ID: 3f9a3b9065ea248c1f99eb074fb462f1@23.187.0.112:57875 CSeq: 102 OPTIONS Content-Length: 0 From: "Unknown" <sip:Unknown@23.187.0.112:57875>;tag=as4b7c1172 To: <sip:303@47.145.213.138:5176;transport=tcp> Via: SIP/2.0/TCP 23.187.0.112:57875;branch=z9hG4bK67cde5df;received=0.62.224.144;rport=1 <-------------> --- (7 headers 0 lines) --- [2017-10-24 18:09:07] NOTICE[3003]: chan_sip.c:30191 sip_poke_peer: Still have a QUALIFY dialog active, deleting [2017-10-24 18:09:07] ERROR[3003]: astobj2.c:121 INTERNAL_OBJ: FRACK!, Failed assertion user_data is NULL (0) Got 15 backtrace records #0: [0x60614a] /usr/sbin/asterisk(__ast_assert_failed+0x88) [0x60614a] #1: [0x45da45] /usr/sbin/asterisk() [0x45da45] #2: [0x45dae3] /usr/sbin/asterisk() [0x45dae3] #3: [0x45f0ef] /usr/sbin/asterisk(__ao2_unlink+0x20) [0x45f0ef] #4: [0x7fbd88cb6c1e] /usr/lib64/asterisk/modules/chan_sip.so(+0x17c1e) [0x7fbd88cb6c1e] #5: [0x7fbd88d3a0c6] /usr/lib64/asterisk/modules/chan_sip.so(+0x9b0c6) [0x7fbd88d3a0c6] #6: [0x7fbd88cf486c] /usr/lib64/asterisk/modules/chan_sip.so(+0x5586c) [0x7fbd88cf486c] #7: [0x5c23de] /usr/sbin/asterisk(ast_sched_runq+0x13f) [0x5c23de] #8: [0x7fbd88d38405] /usr/lib64/asterisk/modules/chan_sip.so(+0x99405) [0x7fbd88d38405] #9: [0x602d4c] /usr/sbin/asterisk() [0x602d4c] [2017-10-24 18:09:07] ERROR[3003]: astobj2.c:121 INTERNAL_OBJ: FRACK!, Failed assertion user_data is NULL (0) Got 15 backtrace records #0: [0x60614a] /usr/sbin/asterisk(__ast_assert_failed+0x88) [0x60614a] #1: [0x45da45] /usr/sbin/asterisk() [0x45da45] #2: [0x45dae3] /usr/sbin/asterisk() [0x45dae3] #3: [0x45f0ef] /usr/sbin/asterisk(__ao2_unlink+0x20) [0x45f0ef] #4: [0x7fbd88cb6c3c] /usr/lib64/asterisk/modules/chan_sip.so(+0x17c3c) [0x7fbd88cb6c3c] #5: [0x7fbd88d3a0c6] /usr/lib64/asterisk/modules/chan_sip.so(+0x9b0c6) [0x7fbd88d3a0c6] #6: [0x7fbd88cf486c] /usr/lib64/asterisk/modules/chan_sip.so(+0x5586c) [0x7fbd88cf486c] #7: [0x5c23de] /usr/sbin/asterisk(ast_sched_runq+0x13f) [0x5c23de] #8: [0x7fbd88d38405] /usr/lib64/asterisk/modules/chan_sip.so(+0x99405) [0x7fbd88d38405] #9: [0x602d4c] /usr/sbin/asterisk() [0x602d4c] [2017-10-24 18:09:07] ERROR[3003]: astobj2.c:121 INTERNAL_OBJ: FRACK!, Failed assertion user_data is NULL (0) Got 15 backtrace records #0: [0x60614a] /usr/sbin/asterisk(__ast_assert_failed+0x88) [0x60614a] #1: [0x45da45] /usr/sbin/asterisk() [0x45da45] #2: [0x45dae3] /usr/sbin/asterisk() [0x45dae3] #3: [0x45f0ef] /usr/sbin/asterisk(__ao2_unlink+0x20) [0x45f0ef] #4: [0x7fbd88cb6c5a] /usr/lib64/asterisk/modules/chan_sip.so(+0x17c5a) [0x7fbd88cb6c5a] #5: [0x7fbd88d3a0c6] /usr/lib64/asterisk/modules/chan_sip.so(+0x9b0c6) [0x7fbd88d3a0c6] #6: [0x7fbd88cf486c] /usr/lib64/asterisk/modules/chan_sip.so(+0x5586c) [0x7fbd88cf486c] #7: [0x5c23de] /usr/sbin/asterisk(ast_sched_runq+0x13f) [0x5c23de] #8: [0x7fbd88d38405] /usr/lib64/asterisk/modules/chan_sip.so(+0x99405) [0x7fbd88d38405] #9: [0x602d4c] /usr/sbin/asterisk() [0x602d4c] [2017-10-24 18:09:07] ERROR[3003]: astobj2.c:121 INTERNAL_OBJ: FRACK!, Failed assertion user_data is NULL (0) Got 15 backtrace records v12*CLI> Disconnected from Asterisk server Asterisk cleanly ending (0). Executing last minute cleanups By: Steven Sedory (stevensedory) 2017-10-25 09:26:59.181-0500 Also, the FreePBX team updated their RPMs with BETTER_BACKTRACES last night. I just updated this morning, so we'll have better debug info on the next crash (I'm running sip debug on a couple less critical servers now, during business hours, to hopefully get another crash or FRACK). By: Kevin Harwell (kharwell) 2017-10-26 15:28:31.669-0500 Once you get a better backtrace please attach it to the issue along with the debug log at the time of the crash. I realize the log may be quite large, so I suggest trimming it down to contain the last few minutes before the crash. Or to at least a size where you can attach it to this issue. Otherwise as this is chan_sip (it is in extended support which means it is supported by the Asterisk community and it may be a while before someone looks at this) the logs may be gone if hosted externally. By: Steven Sedory (stevensedory) 2017-11-09 16:21:13.178-0600 So we still get lots of these serious network trouble errors, but we haven't had any FRACKs recently. Andrew Nagy from Sangoma says they're unrelated. Anything conclusive from my post on the 25th? If not, what else can we do to find the cause of these errors? Again, we use TCP and and port 57875 for SIP. I've noticed on one of our servers that is mostly VPN users, and has about 20 or so NAT'd users, and uses defautl UDP 5060, it has none of these errors. By: Joshua C. Colp (jcolp) 2017-12-05 11:31:47.266-0600 There does appear to be a problem with reference counting in the qualify support potentially. Without the extra backtrace information and log it's very much guessing though. If a community member has any hope of looking into this then they'd need to see the additional information. By: Asterisk Team (asteriskteam) 2017-12-19 12:00:01.926-0600 Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidlines [1]. [1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines |