
Summary: ASTERISK-29138: Asterisk doesn't process ACK on high CPS
Reporter: Sergio (53rg10)
Labels:
Date Opened: 2020-10-20 16:55:29
Date Closed: 2020-10-21 06:43:16
Priority: Major
Regression? No
Status: Closed/Complete
Components: pjproject/pjsip
Versions: 16.3.0, 17.7.0
Frequency of Occurrence: Constant
Related Issues:
Environment:
  OS/kernels: CentOS 8 (5.8.11-1.el8.elrepo.x86_64), CentOS 8 (4.18.0-193.19.1.el8_2.x86_64), Ubuntu (5.4.0-51-generic #56-Ubuntu)
  Hardware: Ryzen 2700, 2 x EPYC 7742, 10 Gb/s LAN connection
  sipp: SIPp v3.6.1-TLS-SCTP-PCAP-RTPSTREAM
  sipp command: sipp -s uac -rtp_echo -nr -trace_msg -trace_err -d 60s -r 75 -l 5000 -i 10.40.1.36 -p 5070 -s 505 10.40.1.58:5060
  Asterisk compiled with the default configuration; only DONT_OPTIMIZE selected in menuselect.
Attachments:
Description: As explained in https://community.asterisk.org/t/pjsip-stack-resends-messages-on-high-cps/86102, under moderately high load Asterisk appears to fail to recognize the ACK and resends the 200 OK during call setup.
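The resent 200 OK is the expected UAS behavior when no matching ACK is processed: RFC 3261 (section 13.3.1.4) has the UAS retransmit a 2xx response on a timer that starts at T1 and doubles up to T2, giving up after 64*T1. The minimal sketch below computes that schedule, assuming the RFC default values (T1 = 500 ms, T2 = 4 s) rather than anything read from this Asterisk/PJSIP setup; `retransmit_times` and `ack_at` are hypothetical names for illustration.

```python
T1 = 0.5  # RFC 3261 default RTT estimate in seconds (assumed, not read from config)
T2 = 4.0  # RFC 3261 default maximum retransmit interval in seconds (assumed)


def retransmit_times(ack_at=None, t1=T1, t2=T2):
    """Return the times (seconds after the initial 200 OK) at which a UAS
    retransmits the 2xx per RFC 3261 13.3.1.4. Retransmission stops once an
    ACK has arrived (ack_at) or after 64*T1 seconds without one."""
    times = []
    interval = t1
    t = t1
    deadline = 64 * t1
    while t < deadline:
        if ack_at is not None and ack_at <= t:
            break  # ACK matched before this retransmission was due
        times.append(t)
        interval = min(interval * 2, t2)  # double the interval, capped at T2
        t += interval
    return times


if __name__ == "__main__":
    print("no ACK:      ", retransmit_times())
    print("ACK at 1.0 s:", retransmit_times(ack_at=1.0))
```

With no ACK the 200 OK goes out again at 0.5, 1.5, 3.5 and 7.5 s, then every 4 s until the 32 s limit, which is the kind of repeated 200 OK that shows up in a capture when ACKs are not being handled in time.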

I've uploaded the core dump, tcpdump capture, config files and SIP/debug logs here: https://drive.google.com/drive/folders/1qvg5nNFa32egMss_0SjFubgLaWUBuVmO?usp=sharing

The tcpdump capture was taken on the machine running Asterisk to rule out network issues.
The issue is consistent and easy to reproduce.
Comments:

By: Asterisk Team (asteriskteam) 2020-10-20 16:55:30.673-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

Please note that by submitting data, code, or documentation to Sangoma through JIRA, you accept the Terms of Use present at [https://www.asterisk.org/terms-of-use/|https://www.asterisk.org/terms-of-use/].

By: Joshua C. Colp (jcolp) 2020-10-21 03:59:12.205-0500

Your system appears to be hung in DNS lookups while looking up the hostname "gladiator1.chaos.hr" which I believe is your local hostname. This is used by both chan_sip and chan_pjsip to get the local IP address for RTP/SDP purposes. Is this configured in /etc/hosts to remove any reliance on an external DNS server?
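One way to check this diagnosis from outside Asterisk is to time how long the local hostname takes to resolve: with an /etc/hosts entry the answer should come back almost immediately, while a slow or unreachable DNS server shows up as a long delay or a failure. The sketch below uses Python's standard `socket` module; it is an assumption for illustration, not the code path Asterisk itself uses, and `time_hostname_lookup` is a hypothetical helper name.

```python
import socket
import time


def time_hostname_lookup(hostname=None):
    """Resolve `hostname` (default: this machine's hostname) through the
    system resolver and return (sorted list of addresses, seconds taken).
    An unresolvable name yields an empty address list."""
    if hostname is None:
        hostname = socket.gethostname()
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        infos = []
    elapsed = time.monotonic() - start
    # sockaddr is the 5th element; its first field is the address string
    addresses = sorted({info[4][0] for info in infos})
    return addresses, elapsed


if __name__ == "__main__":
    name = socket.gethostname()
    addrs, secs = time_hostname_lookup(name)
    print(f"{name} -> {addrs or 'unresolved'} in {secs * 1000:.1f} ms")
```

On a typical Linux system `getaddrinfo` consults /etc/hosts according to nsswitch.conf, so rerunning the script after adding the hostname there should show the delay disappear.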

By: Sergio (53rg10) 2020-10-21 06:25:19.683-0500

Joshua, you sir deserve a medal ;)
After looking at the tcpdump and DNS traffic I can confirm that was the issue.
After adding the hostname to the hosts file, I got to 10k calls with 100 cps on the first run.

Could you let me know how you found out it was DNS? Logs, backtrace, experience? Just asking so I don't have to bother you with future issues ;)

Also, while exploring the limits of Asterisk I ran into "FRACK!, Failed assertion Excessive refcount". Is it appropriate to discuss this further here, or on the forums?

By: Joshua C. Colp (jcolp) 2020-10-21 06:30:46.234-0500

The backtrace stated it was in DNS lookup code, and the hostname was one of the parameters.

That FRACK may be fine for you - it really depends on the load of the system and usage. It exists for systems with reasonable usage (at least according to us) where it would never be reached. If you are pushing things extremely, then it could certainly be hit. In that scenario it alone would not indicate an issue.

It's configured using a define in the code[1].

You'd need to really dig into the specifics - what object and such - to understand where the FRACK came from.
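To illustrate the idea of a fixed threshold set by a define, here is a toy model of a reference-counted object that reports once when its count climbs to or past a limit. It is a sketch, not Asterisk's code: the threshold name and value are hypothetical, so check the linked astobj2.c line for the actual define. `RefCounted` and `EXCESSIVE_REF_THRESHOLD` are illustrative names.

```python
import logging

# Hypothetical threshold: astobj2.c uses its own compile-time constant;
# see the linked source for the real name and value.
EXCESSIVE_REF_THRESHOLD = 100_000


class RefCounted:
    """Toy reference-counted object that reports when its count crosses
    a fixed threshold (a model of the check, not Asterisk's code)."""

    def __init__(self, name, threshold=EXCESSIVE_REF_THRESHOLD):
        self.name = name
        self.threshold = threshold
        self.count = 1
        self.warnings = []

    def ref(self, delta=1):
        """Adjust the count by delta and return the new value."""
        previous = self.count
        self.count += delta
        # Report only on the upward transition across the threshold,
        # not on every increment while the count stays above it.
        if previous < self.threshold <= self.count:
            msg = f"Excessive refcount {self.count} reached on {self.name}"
            logging.error(msg)
            self.warnings.append(msg)
        return self.count
```

Because the model only reports on the crossing, a heavily loaded system that legitimately holds many references to one object produces a single report per crossing, which is consistent with the point above that hitting the limit under extreme load does not by itself indicate a bug.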

[1] https://github.com/asterisk/asterisk/blob/master/main/astobj2.c#L574

By: Sergio (53rg10) 2020-10-21 06:41:44.944-0500

Thx a lot for your fast response & support!

By: Asterisk Team (asteriskteam) 2020-10-21 06:41:45.069-0500

This issue has been reopened as a result of your commenting on it as the reporter. It will be triaged once again as applicable.