
Summary: ASTERISK-29692: Asterisk stops responding to SIP OPTIONS requests from other SIP servers or asterisk servers via LAN or WAN with PJSIP module
Reporter: Julio Guarniz (jguarniz04)
Labels:
Date Opened: 2021-10-15 13:07:17
Date Closed: 2021-10-26 09:27:01
Priority: Minor
Regression?
Status: Closed/Complete
Components: Resources/res_pjsip
Versions: 16.21.0
Frequency of Occurrence: Frequent
Related Issues: duplicates ASTERISK-29286 chan_pjsip: Deadlock between sending response and transaction layer
Environment: CentOS Linux release 7.9.2009 (Core), 16 vCPU + 64 GB RAM
Attachments:
( 0) core-asterisk-running-2021-10-15T09-24-08-0500-brief.txt
( 1) core-asterisk-running-2021-10-15T09-24-08-0500-full.txt
( 2) core-asterisk-running-2021-10-15T09-24-08-0500-info.txt
( 3) core-asterisk-running-2021-10-15T09-24-08-0500-locks.txt
( 4) core-asterisk-running-2021-10-15T09-24-08-0500-thread1.txt
( 5) options_not_response.PNG
( 6) pjsip.conf
Description: Good morning, I have a problem that has been happening more frequently recently.
After working normally, an Asterisk server suddenly stops responding to SIP OPTIONS messages from other Asterisk SIP servers, so those servers mark this Asterisk server as down at the SIP level.

I am using Asterisk 16.21.0 with the PJSIP channel driver. The Asterisk service keeps running and keeps sending SIP OPTIONS to the other servers, so the trunks on this server's side always appear available, but the other servers see this Asterisk as UNREACHABLE.

I have checked the system logs and found no errors or warnings indicating that the problem is related to a particular action or configuration.

I also enabled CORE SET DEBUG to see what happens when the problem occurs, and I only get these messages:

#########################################
[Oct 5 12:40:02] DEBUG[7979]: res_pjsip/pjsip_distributor.c:503 distributor: Searching for serializer associated with dialog dlg0x7f379823d348 for Response msg 481/ACK/cseq=15893 (rdata0x7f37852a0f38)
[Oct 5 12:40:02] DEBUG[7979]: res_pjsip/pjsip_distributor.c:520 distributor: No dialog serializer for Response msg 481/ACK/cseq=15893 (rdata0x7f37852a0f38). Using request transaction as basis.
[Oct 5 12:40:02] DEBUG[7979]: res_pjsip/pjsip_distributor.c:128 find_request_serializer: Found transaction tsx0x7f3798e68298 for Response msg 481/ACK/cseq=15893 (rdata0x7f37852a0f38).
[Oct 5 12:40:02] DEBUG[7979]: res_pjsip/pjsip_distributor.c:138 find_request_serializer: Found serializer pjsip/distributor-000026e9 on transaction tsx0x7f3798e68298

############################################

It should be noted that restarting Asterisk resolves the problem every time and everything returns to normal. The problem then recurs, in some cases after 2 or 3 days and in others up to 1 week later, so it is very difficult to reproduce or to know exactly when it will happen.

The problem happened again today. After a previous occurrence, I had already recompiled Asterisk with the DONT_OPTIMIZE option, which is recommended for obtaining a usable backtrace and a better view of what may be happening.
I am attaching the files I obtained after the problem happened, while Asterisk was still running, by executing the following:
/var/lib/asterisk/scripts/ast_coredumper --running


I am also attaching sngrep captures of the traces on the affected Asterisk server (192.168.0.5). They show that it receives SIP OPTIONS requests but does not respond to them, so the other servers declare it UNREACHABLE or UNAVAILABLE.

I would really appreciate your help in locating the problem, and I hope the attachments are useful in solving it.

Greetings and thanks.
Comments:
By: Asterisk Team (asteriskteam) 2021-10-15 13:07:19.052-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

Please note that by submitting data, code, or documentation to Sangoma through JIRA, you accept the Terms of Use present at [https://www.asterisk.org/terms-of-use/|https://www.asterisk.org/terms-of-use/].

By: Julio Guarniz (jguarniz04) 2021-10-15 13:11:22.199-0500

I am attaching the files mentioned in the description.

By: Benjamin Keith Ford (bford) 2021-10-18 09:36:53.480-0500

Were you on an older version before using 16.21.0? If so, was the problem still present there? Have you tried testing with any other versions?

By: Julio Guarniz (jguarniz04) 2021-10-18 12:40:33.228-0500

Yes. Before this I was running Asterisk certified/16.8-cert9, which is where the problem first appeared. I then updated to the latest version, Asterisk 16.21.0, and the problem persists. It is difficult to know when it will happen because there is no pattern: it can appear twice in the same day, and at other times it takes up to 5 days to happen again, although lately it has become more frequent.

I would greatly appreciate your support. Please tell me if you need more information to help me locate the origin of the problem and find a solution.

Thanks.

By: Joshua C. Colp (jcolp) 2021-10-18 12:41:56.006-0500

Just to set expectations - if accepted there is no time frame on when a fix would occur for this or when it would get looked into further.

By: Julio Guarniz (jguarniz04) 2021-10-18 16:11:33.729-0500

I understand, thank you very much for your reply.
Just for reference: could we hire a commercial support plan to evaluate this case and help us locate or solve the problem? If so, what would be the process for acquiring it, and what are its conditions?

By: Joshua C. Colp (jcolp) 2021-10-18 17:33:32.921-0500

You would need to contact Sangoma Sales to see if such a thing is possible, or look for a consultant.

By: Benjamin Keith Ford (bford) 2021-10-19 11:25:27.622-0500

Can you attach logs from both instances? One from the instance sending the OPTIONS request, the other from the instance that the request is being sent to? Debug and verbose will need to be turned up (9 is ideal).
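A minimal sketch of how the requested logging might be captured on each instance, assuming a stock Asterisk 16 `logger.conf`. The channel name `debug_log` is a hypothetical choice; any file channel that includes the `debug` and `verbose` levels would serve. After adding it, running `logger reload`, `core set verbose 9`, and `core set debug 9` at the Asterisk CLI raises both levels, and `pjsip set logger on` additionally writes the SIP messages (including the OPTIONS requests and any responses) to the log.

```
; logger.conf (excerpt), hypothetical channel for capturing this issue
[logfiles]
; writes to a file named debug_log in the Asterisk log directory
debug_log => notice,warning,error,debug,verbose
```

Using a separate file channel keeps the verbose capture apart from the regular `messages` log, so it can be removed once the logs have been collected.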

There's a chance this may not yield any additional information. If we were to supply a patch with debug output, would you be able to apply the patch and provide additional information from that?

As Josh mentioned, there's no time frame on when this would get worked, so if this is an issue that needs to get resolved immediately, other options such as the mailing lists, the forums, a bounty, or direct support might help. We can provide additional information if you decide to go that route.

By: Julio Guarniz (jguarniz04) 2021-10-19 12:10:53.274-0500

Thank you very much for your answer, Benjamin.

I understand that I would need to keep debugging enabled permanently in order to attach these logs. Could this impact the server's performance? I ask because the server carries a regular load and I do not want to affect its users. If it does not, I could keep it enabled while waiting for the problem to happen again.

As I mentioned above, it is very difficult to know when it will happen. In addition, many servers send SIP OPTIONS and get no response when the problem occurs. As I understand it, you would only need the DEBUG and VERBOSE logs from one of them, plus those from the server that receives the requests and fails to answer them. Is that correct?

Regarding the patch with debug output, could it impact the server's operation? My understanding is that recompiling Asterisk with debugging options affects the server's performance, and this server is constantly in production. If it does not, I could apply it following your recommendations and provide you with the resulting information.

Regarding the time frame you mention, that is understandable. However, since this problem is very difficult to locate and directly impacts operations, we are considering support from a specialist who can review it thoroughly. It would be very helpful if you could provide more information about direct support from Digium or Sangoma, or otherwise the information needed to evaluate its conditions, time frames, and costs.

Thank you very much. I look forward to your comments with the information about direct support and the answers to my additional questions.

By: Joshua C. Colp (jcolp) 2021-10-25 05:19:47.251-0500

This is a duplicate of ASTERISK-29286. This is an issue in PJSIP itself, and on that issue the reporter filed a report on the PJSIP issue tracker and received a potential fix. Unfortunately, they never reported back whether it worked. You can test it if you wish.

By: Julio Guarniz (jguarniz04) 2021-10-26 09:20:44.509-0500

Thank you very much for the information. I had been looking for cases related to mine but could not find any, so what you shared is very helpful. I will review and test it, and I hope it solves the problem. I will share feedback and results once I have applied it.

By: Asterisk Team (asteriskteam) 2021-10-26 09:20:45.081-0500

This issue has been reopened as a result of your commenting on it as the reporter. It will be triaged once again as applicable.