Summary: ASTERISK-28634: Invite loop within PJSIP
Reporter: Joeran Vinzens (jvinzens)
Labels: patch
Date Opened: 2019-11-26 03:21:45.000-0600
Date Closed: 2020-05-20 10:00:28
Priority: Major
Regression?:
Status: Closed/Complete
Components: Resources/res_pjsip
Versions: 17.0.0
Frequency of Occurrence: Occasional
Related Issues:
Environment: Asterisk 17.0.0 with bundled PJSIP on Debian Buster
Attachments:
( 0) asterisk_invite_loop_other_side.pcap
( 1) asterisk.conf
( 2) ASTERISK-28871.diff
( 3) invite_loop_small.pcap
( 4) pjsip.conf
( 5) rtp.conf
( 6) Screenshot_from_2019-11-26_10-19-59.png
( 7) sipp_scenario.tar.xz
Description: From time to time we see a traffic increase on our Asterisk systems. After some debugging we see that Asterisk sends a kind of Invite loop over and over. Since there is no way to deny re-invites in PJSIP, there is no way out of it. From Asterisk's perspective, it sends these re-invites as fast as it can, which results in an increase of 3 Mbit/s of signalling traffic from this machine. So far we do not know how to reproduce the issue, but we captured the scenario from our production system (trace attached). The Asterisk log itself just tells us the Invite is going out, but nothing further. There was no trigger on the other call leg.
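
One way to spot a loop like the one described above in captured signalling is to count INVITE requests per Call-ID. The following is a minimal sketch, assuming the SIP messages have already been extracted from a trace as raw text. The helper names, the threshold of 10, and the sample messages are all hypothetical and are not taken from the attached pcaps.

```python
import re
from collections import Counter

# Matches the request line of an INVITE (responses start with "SIP/2.0" instead).
REQUEST_LINE = re.compile(r"^INVITE\s+\S+\s+SIP/2\.0", re.IGNORECASE)
# Matches the Call-ID header, including its compact form "i".
CALL_ID = re.compile(r"^(?:Call-ID|i)\s*:\s*(\S+)", re.IGNORECASE | re.MULTILINE)


def count_invites(messages):
    """Return a Counter mapping Call-ID -> number of INVITE requests seen."""
    counts = Counter()
    for msg in messages:
        if not REQUEST_LINE.match(msg):
            continue  # skip responses and non-INVITE requests
        found = CALL_ID.search(msg)
        if found:
            counts[found.group(1)] += 1
    return counts


def suspected_loops(messages, threshold=10):
    """Return Call-IDs with more INVITEs than `threshold` (an arbitrary cutoff)."""
    return sorted(cid for cid, n in count_invites(messages).items() if n > threshold)


def make_invite(call_id, cseq):
    """Build a stripped-down, made-up INVITE for the demonstration below."""
    return (
        "INVITE sip:customer@example.invalid SIP/2.0\r\n"
        f"Call-ID: {call_id}\r\n"
        f"CSeq: {cseq} INVITE\r\n\r\n"
    )


# Made-up sample: one ordinary call and one dialog that is re-invited 50 times.
sample = [make_invite("normal-call", 1)]
sample += [make_invite("looping-call", n) for n in range(1, 51)]
sample.append("SIP/2.0 200 OK\r\nCall-ID: looping-call\r\nCSeq: 50 INVITE\r\n\r\n")

print(suspected_loops(sample))
```

A count alone does not explain why the re-invites are sent; it only narrows down which dialogs are worth inspecting.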
Comments:
By: Asterisk Team (asteriskteam) 2019-11-26 03:21:47.216-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

By: Joeran Vinzens (jvinzens) 2019-11-26 03:22:14.679-0600

Invite Loop

By: Joeran Vinzens (jvinzens) 2019-11-26 03:23:57.379-0600

Behavior of network load

By: Benjamin Keith Ford (bford) 2019-11-26 11:20:57.807-0600

Can you provide your pjsip.conf file? Do you know what's going on at the time when the re-invites begin, such as channels being put on hold, or topology changes that could result in the need for a re-invite?

I'm going over the pcap currently to see if there's any additional information to be gathered from there. Can you specify which IP address belongs to what? And is the scenario Asterisk and another phone, or some other setup?

By: Joeran Vinzens (jvinzens) 2019-11-27 03:28:58.237-0600

pjsip config file without username and passwords

By: Joeran Vinzens (jvinzens) 2019-11-27 03:40:53.185-0600

Asterisk Server IP 217.10.77.25. There is no other IP Address on this system configured.

So it is an outgoing Invite from Asterisk towards our customer.
There are some Kamailio instances in between, but these are just passing the traffic through.


By: Joeran Vinzens (jvinzens) 2019-11-27 03:41:52.099-0600

Asterisk SIP Trace A side. No re-invites are triggered here

By: Benjamin Keith Ford (bford) 2019-11-27 09:20:33.999-0600

So, since the re-invites are not seen in the pcap for the other side, I'm wondering if it is a valid re-invite, but it's never making it to the customer and that's causing it to loop. Can you check the kamailio instances to see if the re-invite is being seen and sent to the customer?

By: Asterisk Team (asteriskteam) 2019-12-11 12:00:02.101-0600

Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidelines [1].

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines

By: Joeran Vinzens (jvinzens) 2019-12-16 03:54:21.582-0600

This was a misunderstanding, I guess.

The trace "asterisk_invite_loop_other_side.pcap" is the incoming call to Asterisk. In this case we got this call from an inbound carrier.
The first trace I attached, "invite_loop_small.pcap", is the call leg to the customer. Here we can see the messages from Asterisk to an internal load balancer (Kamailio). From there out to the customer I can see the re-invites. Our customer gets all of the messages you can see in the trace "invite_loop_small.pcap".

Even if Kamailio blocked the messages, it would be a problem. In total we have 100+ Asterisk instances, and if all of them were Asterisk 17 we would see this issue all over the place, and then Kamailio would get into trouble.


By: Asterisk Team (asteriskteam) 2019-12-16 03:54:22.413-0600

This issue has been reopened as a result of your commenting on it as the reporter. It will be triaged once again as applicable.

By: Kevin Harwell (kharwell) 2019-12-16 18:30:13.067-0600

You mentioned there is nothing of note in the Asterisk log. Is that with debug and verbose enabled? If not, would it be possible to enable them [1], set both to at least level 3 along with SIP debug, and then attach the full Asterisk log of an occurrence?

Thanks!

[1] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

By: Kevin Harwell (kharwell) 2019-12-16 18:33:10.049-0600

Also was this a server that was upgraded to Asterisk 17.0.0? What was the previous version that did not exhibit the problem?

By: Asterisk Team (asteriskteam) 2019-12-31 12:00:01.074-0600

Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidelines [1].

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines

By: Joeran Vinzens (jvinzens) 2020-02-07 07:00:20.585-0600

We have some problems reproducing the issue and providing logs (without any customer-relevant content).

We see this issue frequently but sporadically in our production environment. We took the SIP messages from this call and configured SIPp with them, but we were not able to reproduce it.

Are there any more debug flags we can switch on to see why this occurs? The logs show no reason for it; the PJSIP logger only shows that a re-invite is sent.

At the moment we use Asterisk 17.1 in production, where we see this issue.

By: Asterisk Team (asteriskteam) 2020-02-07 07:00:22.050-0600

This issue has been reopened as a result of your commenting on it as the reporter. It will be triaged once again as applicable.

By: George Joseph (gjoseph) 2020-02-10 08:36:04.401-0600

I'm trying to reproduce.


By: George Joseph (gjoseph) 2020-02-10 13:43:36.263-0600

Hi Joeran, we're taking another stab at trying to reproduce this and just have a few more questions...

Can you attach the asterisk.conf and rtp.conf files?

I'm assuming that, since the incoming call lasted 20 minutes, audio was flowing in both directions, yes? On the outgoing side, did the loop ever stop or did it just continue until the call ended? I can't tell from the pcap because it only shows the first 15 seconds or so.

Thanks.


By: Joeran Vinzens (jvinzens) 2020-02-11 01:55:25.803-0600

asterisk and rtp config as requested

By: Joeran Vinzens (jvinzens) 2020-02-11 01:58:04.073-0600

Hi George Joseph,

many thanks for your help.

The RTP was flowing the entire time until the client hung up. And yes, the re-invite loop went on until the call ended. It does not stop at a certain point.

BR
Jöran

By: Joeran Vinzens (jvinzens) 2020-02-11 02:04:07.548-0600

We tried to reproduce the scenario using SIPp, but we failed. I will attach my scenario files. We used the basic PBX configuration from the Asterisk make-config to have a clean setup.

On the SIP level it looks exactly the same, except that we have some Kamailio load balancers in our production environment.

the sipp commands we used are:
outgoing call:
```
sipp 127.0.0.1 -sf "sipp_outgoing.xml" -inf "sip-credentials.csv" -m 1 -s 1107 -p 5080
```

Incoming call and a register beforehand:
```
sipp 127.0.0.1 -sf sipp_register.xml -inf "sip-credentials.csv" -m 1 -p 5070
sipp 127.0.0.1 -sf sipp_accept.xml -inf "sip-credentials.csv" -m 1 -p 5070
```

By: Joeran Vinzens (jvinzens) 2020-02-11 02:04:48.311-0600

SIPp scenario files from the unsuccessful attempt to reproduce the issue

By: Joeran Vinzens (jvinzens) 2020-02-11 02:10:06.856-0600

While trying to reproduce the issue, the logs show exactly the same in the test and in production. You cannot see anything regarding the re-invites except in the PJSIP logger.

Both logs say the call gets established, both channels enter the bridge and the bridge type changes to native, and at the end of the call the bridge changes back and the call gets hung up.

By: George Joseph (gjoseph) 2020-02-11 09:57:18.305-0600

One more question... which pjsip endpoint was used for the incoming call and which was used for the outgoing call that looped?


By: George Joseph (gjoseph) 2020-02-11 14:04:15.704-0600

OK, I was able to reproduce this...

We believe what was happening was either the original caller or the outbound carrier was sending media in a format that wasn't actually negotiated.  When that happens, Asterisk tries to send a re-invite with a new SDP offer to get the topologies synchronized.  Since the carrier just kept sending the same SDP answer back the topologies could never be synchronized hence the loop.   The good news is that the loop was fixed in Asterisk 17.1.
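
The mechanism described above can be illustrated with a toy model. This is only a sketch of the described behaviour, not Asterisk or PJSIP code: the function and peer names, the codec names, the `stop_if_unchanged` guard, and the `safety_cap` are all hypothetical, and the guard is not a description of the actual fix.

```python
def count_reinvites(received, negotiated, peer_answer, stop_if_unchanged, safety_cap=1000):
    """Count re-INVITEs sent while trying to bring the topologies in sync.

    received:          format of the media that is actually arriving
    negotiated:        frozenset of formats from the last offer/answer
    peer_answer:       function mapping an offered set to the answered set
    stop_if_unchanged: give up when the peer answers with what was already agreed
    safety_cap:        hard limit so this toy model always terminates
    """
    sent = 0
    while received not in negotiated and sent < safety_cap:
        offer = negotiated | {received}   # new SDP offer including the observed format
        answer = peer_answer(offer)       # remote side answers the re-INVITE
        sent += 1
        if stop_if_unchanged and answer == negotiated:
            break                         # nothing changed, so retrying cannot help
        negotiated = answer
    return sent


def stubborn_peer(offer):
    """Always sends back the same answer, whatever was offered."""
    return frozenset({"alaw"})


def cooperative_peer(offer):
    """Accepts everything that was offered."""
    return frozenset(offer)


start = frozenset({"alaw"})  # illustrative codec names only

# Without a guard the model only stops at the safety cap.
print(count_reinvites("g722", start, stubborn_peer, stop_if_unchanged=False))
# With the guard a single re-INVITE is sent.
print(count_reinvites("g722", start, stubborn_peer, stop_if_unchanged=True))
# A peer that accepts the new format also ends the exchange after one re-INVITE.
print(count_reinvites("g722", start, cooperative_peer, stop_if_unchanged=False))
```

In the model, the loop condition never becomes false as long as the peer repeats its previous answer, which mirrors the "topologies could never be synchronized" situation.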

I'm closing this issue for now but if, after upgrading to 17.1 or 17.2, the loop continues, you can re-open the issue just by commenting in it.




By: Jonas (boettner) 2020-04-23 09:48:04.643-0500

We were able to reproduce this behavior on Asterisk 17.3.0 (built with the patch suggested in ASTERISK-28839). We'll send logfiles and traces via e-mail due to data privacy regulations.

As Joeran mentioned before, the logfiles do not indicate any errors.

By: Joeran Vinzens (jvinzens) 2020-05-05 06:43:50.937-0500

Some further information: with Asterisk 17.3 we have now observed two crashes where Asterisk continues the invite loop for over an hour, and after more than 500000 messages it seems to produce a segfault (at least in our case) within the initial LWP.

We have backtraces and logs if you need them. We would again provide these via e-mail due to privacy concerns.

We will update our systems to 17.4 today, but since the issue does not seem to have been fixed (according to the changelog), we do not expect any changes. We will update this ticket again as soon as we see this on the current version.

By: Joshua C. Colp (jcolp) 2020-05-05 06:47:01.905-0500

They're not needed, the already provided information remains on the internal issue.

By: Joshua C. Colp (jcolp) 2020-05-18 09:09:47.366-0500

Please try the attached patch. This returns the behavior to that of past versions. Improved codec negotiation is occurring for Asterisk 18, and this functionality could be revisited then to be improved.

By: Jonas (boettner) 2020-05-20 09:59:55.821-0500

Many thanks for your help! After applying the patch we were not able to reproduce the former behavior. The patch has been running in production for about 24h and everything seems to work fine.