ASTERISK-19365: Remote SIP Call legs are frequently not released in a cross-linked Asterisk scenario (directmedia & sendrpid)

Summary: ASTERISK-19365: Remote SIP Call legs are frequently not released in a cross-linked Asterisk scenario (directmedia & sendrpid)

Reporter: Thomas Arimont (tomaso)
Labels:
Date Opened: 2012-02-15 03:47:01.000-0600
Date Closed: 2012-03-21 08:20:19
Priority: Major
Regression?
Status: Closed/Complete
Components: Channels/chan_sip/General
Versions: 1.8.8.2
Frequency of Occurrence: Frequent
Related Issues:
Environment: i568
Attachments:
( 0) BasicCallCrosslinkedAsteriskDirectMedia_NOK.pcap
( 1) bugASTERISK-19365_2012_03_08.patch
( 2) core_debug_6_10.52.26.1.log
( 3) core_debug_6_10.52.26.2.log

Description: Actually referring to 1.8.8.0-rc4_Rev:345544

As already reported in ASTERISK-19355, 'remote' SIP call legs are frequently not released in a cross-linked Asterisk scenario with active directmedia when the 'local' phone goes on-hook. The behaviour seems to depend on the machine used, or rather on its performance, i.e. it is a timing issue. On one of our lower-performance systems the problem occurs frequently; on a server machine it is hard to reproduce. The interconnecting OpenSER is probably also relevant to the timing of this issue.

See the attached Wireshark trace for SIP details.

Traced scenario (call from subscr. 62 to subscr. 321, 321 releases the call):

{noformat}
subscr 62 (10.52.26.6, SNOM 360) <-> Asterisk 1 (10.52.26.1, headnumber=1000, dialoutPrefix=66) <- sip trunk ->
<-> OpenSER (10.52.17.132, SIP router & registrar for the asterisks) <->
<- sip trunk -> Asterisk 2 (10.52.26.2, headnumber=2000, dialoutPrefix=66) <-> subscr 321 (10.52.26.5, OpenStage 60)
{noformat}

Comments: By: Matt Jordan (mjordan) 2012-02-16 16:47:53.985-0600

Just to confirm in the scenario you outlined above:
* 10.52.26.5 is the 'local phone', which hangs up by sending a BYE to Asterisk 2
* 10.52.26.6 is the 'remote phone', and never receives a BYE from Asterisk 1

By: Thomas Arimont (tomaso) 2012-02-17 02:42:53.658-0600

Matt,
yes, this is exactly right. Because I see at least a timing component in this issue (perhaps especially due to the OpenSER in between), please feel free to ask for further tests or further information. It might be difficult to reproduce this issue in a different environment.
Thomas
By: Matt Jordan (mjordan) 2012-02-20 11:08:46.331-0600

Thomas,

I was thinking the same thing. If you can provide the SIP configurations for the two phones and a DEBUG log illustrating the problem (particularly the part where the BYE is sent from one Asterisk box but not handled by the other), that would help.

There is a patch r349339 that went into Asterisk 1.8.8.1, that has some potential for impacting this situation. It dealt with a local bridge loop getting broken too soon in the presence of a certain control frame - when that occurred, interesting behavior would sometimes happen as the SIP channel was no longer in a correct state. You may want to try applying that patch and see if it solves this situation.
By: Thomas Arimont (tomaso) 2012-02-22 04:18:30.065-0600

Matt,
enclosed are the requested "core set debug 6" log outputs from both Asterisk systems during the call release. Same scenario: phone 10.52.26.6 calls phone 10.52.26.5 via Asterisk 10.52.26.1, OpenSER 10.52.17.132 and Asterisk 10.52.26.2.
Phone 10.52.26.6 at Asterisk 10.52.26.1 releases the active call. Phone 10.52.26.5 stays active forever ...

By: Thomas Arimont (tomaso) 2012-02-22 05:10:57.363-0600

Matt,
as you recommended, I tested the mohproblem.patch from reviewboard r/1640/.
There may be some progress (the number of unsuccessful call releases seems to be smaller), but the problem is still there. I can no longer reproduce the problem when I wait 5 sec before going on-hook. But when the call is released quickly after it is answered, the failure occurs almost every time. Maybe this is not really the same problem, or does not have the same cause?
Thomas
By: Matt Jordan (mjordan) 2012-02-22 08:37:35.626-0600

Thanks for getting the logs. Given the complexity of this problem, I figured trying the patch for r349339 was worth a shot. The behavior that could occur was sometimes difficult to pin down (the lack of MOH was not the only symptom that could be displayed), and is a known issue in 1.8.8.0. If it didn't resolve it, then we'll just have to do it the hard way :-)
By: Thomas Arimont (tomaso) 2012-02-23 07:58:11.452-0600

How can you smile when you're talking about doing it the hard way? ;-)
By: Matt Jordan (mjordan) 2012-03-08 12:25:07.162-0600

Thomas:

I've attached a patch (bugASTERISK-19365_2012_03_08.patch) that I believe will address this issue.

The root cause appears to be the clearing of the pending BYE flag during a CANCEL for a re-INVITE. There's a strong timing element to this problem, as the pending BYE flag has to be set by sip_hangup before a final response for the re-INVITE, and before some provisional response, the handling of which triggers the CANCEL. In your particular case, it most likely occurs due to the authentication required by the proxy, which causes the re-INVITE to be sent again. In most cases, a UA would respond with a final response to the re-INVITE before sip_hangup is called.
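
To make the ordering above concrete, here is a minimal toy model of the race in Python. It is only a sketch under stated assumptions: it is not chan_sip code and not the attached patch. The names Dialog, preserve_pending_bye and run_scenario are hypothetical, and the model assumes that a hangup during an open re-INVITE only records a pending BYE, which is then sent once the final response arrives.

```python
from dataclasses import dataclass, field


@dataclass
class Dialog:
    """Toy model of one SIP dialog (hypothetical, not chan_sip's structure)."""
    # Hypothetical switch: True models a fix that keeps the flag across CANCEL.
    preserve_pending_bye: bool = False
    reinvite_outstanding: bool = False  # re-INVITE still awaits a final response
    pending_bye: bool = False           # hangup requested, BYE deferred
    sent: list = field(default_factory=list)

    def send_reinvite(self):
        self.reinvite_outstanding = True
        self.sent.append("INVITE")

    def hangup(self):
        # Assumption: while a re-INVITE is open the BYE is deferred.
        if self.reinvite_outstanding:
            self.pending_bye = True
        else:
            self.sent.append("BYE")

    def on_provisional(self):
        # With a hangup pending, a provisional response triggers a CANCEL.
        if self.pending_bye and self.reinvite_outstanding:
            self.sent.append("CANCEL")
            if not self.preserve_pending_bye:
                self.pending_bye = False  # models the suspected bug

    def on_final(self):
        # The final response closes the re-INVITE; a deferred BYE goes out now.
        self.reinvite_outstanding = False
        if self.pending_bye:
            self.pending_bye = False
            self.sent.append("BYE")


def run_scenario(preserve_pending_bye):
    """Replay: re-INVITE, hangup, provisional response, final response."""
    d = Dialog(preserve_pending_bye=preserve_pending_bye)
    d.send_reinvite()
    d.hangup()
    d.on_provisional()
    d.on_final()
    return d.sent


if __name__ == "__main__":
    print("flag cleared on CANCEL:", run_scenario(False))
    print("flag kept on CANCEL:   ", run_scenario(True))
```

In this model, clearing the flag on CANCEL means no BYE is ever sent and the remote leg stays up, while keeping the flag lets the BYE go out after the final response. If the final response arrives before hangup() is called, the BYE is sent immediately and the race does not arise.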

If you could test this and see if it resolves the issue, that'd be great. If it doesn't, please attach a DEBUG log from the Asterisk instance that is handling the remote bridge.

Thanks!

Matt
By: Thomas Arimont (tomaso) 2012-03-09 09:41:01.369-0600

Matt,
this sounds good! I will test your patch at the beginning of next week.
Thomas
By: Maurice Winkels (maurice) 2012-03-21 04:15:09.509-0500

Hi Matt,
sorry, my regular JIRA account has been blocked for more than a week now, so I couldn't answer.
Now I'll try it with this private JIRA account.
Your patch seems to solve the problem. I can no longer reproduce the issue.
Thanks, good job!

Regards
Thomas