Summary: ASTERISK-10105: SIP with canreinvite=yes through multiple Asterisk instances fails
Reporter: Edwin Groothuis (mavetju)
Labels:
Date Opened: 2007-08-17 08:50:04
Date Closed: 2008-12-10 10:53:36.000-0600
Priority: Major
Regression?: No
Status: Closed/Complete
Components: Channels/chan_sip/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments: (0) sip_reinvite_bug.txt
Description: The story at http://www.mavetju.org/~edwin/asterisk-sip-reinvite.html
describes a problem I experienced with calls coming from one of our
providers: when our equipment re-invited the SIP session during the
SIP handshake, the RTP stream was never set up. We first saw this
after the upgrade from 1.2 to 1.4 (the latest SVN version); before
that it had always worked.

To simulate this problem, I have set up one SIP phone, three identical
Asterisk instances and a connection towards the end-point, a Cisco
Call Manager. The only varying factor in the experiments was the
option "canreinvite": with "canreinvite=no" it always worked
fine, but with "canreinvite=yes" it broke down after two
hops.

I have written down the whole setup, the configurations, the scenarios
and the results at http://www.mavetju.org/~edwin/c2-flow.txt.
Attached to each scenario are the SIP packets (captured with ngrep
and processed into a flow visualiser).

****** ADDITIONAL INFORMATION ******

This is happening with Asterisk SVN-branch-1.4-r79553.

I have three Asterisk boxes, all with the same configuration. For
this exercise, I made three identical VMware instances to rule out
the OS, the network, or an Asterisk version inconsistency as the cause.

The first Asterisk instance has IP address 10.252.15.24.
The second Asterisk instance has IP address 10.252.15.25.
The third Asterisk instance has IP address 10.252.15.26.

The final leg of the call is the delivery on a Cisco Call Manager.
The SIP handler is "ccm-publisher.barnet.com.au", the end of the
RTP stream is "202.83.178.13", a Cisco 2821 media gateway.

The IP Address of the CCM is 10.252.11.130

This is the sip.conf of the first Asterisk instance:

   [general]
   context=default
   allowoverlap=no
   bindport=5060
   bindaddr=0.0.0.0
   srvlookup=yes
   context=from-sip
   canreinvite=yes

   [edwin]
   type=friend
   host=dynamic
   username=edwin
   secret=edwin
   context=from-sip
   canreinvite=no

   [asterisk1]
   type=friend
   host=asterisk1.int.barnet.com.au
   context=from-sip
   canreinvite=yes

   [asterisk2]
   type=friend
   host=asterisk2.int.barnet.com.au
   context=from-sip
   canreinvite=yes

   [asterisk3]
   type=friend
   host=asterisk3.int.barnet.com.au
   context=from-sip
   canreinvite=yes

   [ccm-publisher]
   type=friend
   host=ccm-publisher.barnet.com.au
   context=from-sip
   canreinvite=yes

This is the sip.conf of the second and third Asterisk instances:

   [general]
   context=default
   allowoverlap=no
   bindport=5060
   bindaddr=0.0.0.0
   srvlookup=yes
   context=from-sip
   canreinvite=yes

   [asterisk1]
   type=friend
   host=asterisk1.int.barnet.com.au
   context=from-sip
   canreinvite=yes

   [asterisk2]
   type=friend
   host=asterisk2.int.barnet.com.au
   context=from-sip
   canreinvite=yes

   [asterisk3]
   type=friend
   host=asterisk3.int.barnet.com.au
   context=from-sip
   canreinvite=yes

   [ccm-publisher]
   type=friend
   host=ccm-publisher.barnet.com.au
   context=from-sip
   canreinvite=yes

These are the extensions.conf files of the three Asterisk instances:

Number one:
   [from-sip]
   exten => 71,1,Dial(SIP/9096@ccm-publisher.barnet.com.au)
   exten => 81,1,Dial(SIP/81@asterisk2.int.barnet.com.au)
   exten => 91,1,Dial(SIP/91@asterisk2.int.barnet.com.au)

Number two:
   [from-sip]
   exten => 81,1,Dial(SIP/9096@ccm-publisher.barnet.com.au)
   exten => 91,1,Dial(SIP/91@asterisk3.int.barnet.com.au)

Number three:
   [from-sip]
   exten => 91,1,Dial(SIP/9096@ccm-publisher.barnet.com.au)
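
The three dialplans above chain into each other, so the extension dialled determines how many Asterisk instances sit in the signaling path. The following is only a sketch of that chaining: the `DIALPLANS` table and the `signaling_path` helper are hypothetical names introduced for illustration, and the short host names stand in for the full host names in the Dial() lines.

```python
# Hypothetical model of the three extensions.conf files above.
# Each instance maps a dialled extension to
# (next hop, extension dialled on that next hop).
DIALPLANS = {
    "asterisk1": {
        "71": ("ccm-publisher", "9096"),
        "81": ("asterisk2", "81"),
        "91": ("asterisk2", "91"),
    },
    "asterisk2": {
        "81": ("ccm-publisher", "9096"),
        "91": ("asterisk3", "91"),
    },
    "asterisk3": {
        "91": ("ccm-publisher", "9096"),
    },
}


def signaling_path(exten, start="asterisk1"):
    """Return the hops a call from the phone "edwin" traverses for `exten`."""
    path = ["edwin", start]
    node = start
    # Follow Dial() targets until we leave the set of Asterisk instances.
    while node in DIALPLANS:
        node, exten = DIALPLANS[node][exten]
        path.append(node)
    return path


if __name__ == "__main__":
    for exten in ("71", "81", "91"):
        print(exten, " -> ".join(signaling_path(exten)))
```

Under this model, extension 71 passes through one Asterisk instance, 81 through two and 91 through all three before reaching the CCM, which matches the SIP handshake paths described in the scenarios.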

And I have one SIP phone (Polycom SoundPoint IP650), which places
its calls at the first Asterisk instance. Its peer name is "edwin".

The IP address of the SIP phone is 10.251.1.21.


Scenario A: "canreinvite=no"
----------------------------
In this main scenario, "canreinvite" is set to "no" for all entries
in sip.conf.

Scenario A-1: Call 71
---------------------
The SIP handshake goes from the Polycom to the first Asterisk
instance to the CCM.

The RTP session ("debug rtp") goes from the phone via the
first Asterisk instance to the C2821.

Scenario A-2: Call 81
---------------------
The SIP handshake goes from the Polycom to the first Asterisk
instance to the second Asterisk instance to the CCM.

The RTP session goes from the phone via the first Asterisk instance
to the second Asterisk instance to the C2821.

Scenario A-3: Call 91
---------------------
The SIP handshake goes from the Polycom to the first Asterisk
instance to the second Asterisk instance to the third Asterisk
instance to the CCM.

The RTP session goes from the phone via the first Asterisk instance
to the second Asterisk instance to the C2821.

Scenario B: "canreinvite=yes/no"
--------------------------------
In this main scenario, "canreinvite" is set to "yes" for the Asterisk
instances and the ccm-publisher in sip.conf. The value in the Polycom
phone section is still "no".

Scenario B-1: Call 71
---------------------
The SIP handshake goes from the Polycom to the first Asterisk
instance to the CCM.

The RTP session ("debug rtp") goes from the phone via the
first Asterisk instance to the C2821.

Scenario B-2: Call 81
---------------------
The SIP handshake goes from the Polycom to the first Asterisk
instance to the second Asterisk instance to the CCM.

The RTP session goes from the phone via the first Asterisk instance
to the C2821.

Scenario B-3: Call 91
---------------------
The SIP handshake goes from the Polycom to the first Asterisk
instance to the second Asterisk instance to the third Asterisk
instance to the CCM.

No RTP stream is set up.

Scenario C: "canreinvite=yes"
-----------------------------
In this main scenario, "canreinvite" is set to "yes" for every
entry in sip.conf.

Scenario C-1: Call 71
---------------------
The SIP handshake goes from the Polycom to the first Asterisk
instance to the CCM.

The RTP session ("debug rtp") goes from the phone via the
first Asterisk instance to the C2821.

Scenario C-2: Call 81
---------------------
The SIP handshake goes from the Polycom to the first Asterisk
instance to the second Asterisk instance to the CCM.

No RTP stream is set up.

Scenario C-3: Call 91
---------------------
The SIP handshake goes from the Polycom to the first Asterisk
instance to the second Asterisk instance to the third Asterisk
instance to the CCM.

No RTP stream is set up.

Comments:

By: Edwin Groothuis (mavetju) 2007-08-17 08:51:14

According to Raj Jain (at http://lists.digium.com/pipermail/asterisk-dev/2007-August/029063.html):

You are running into a RE-INVITE "glare" scenario. The Asterisk boxes
facing each other are racing to send RE-INVITE to each other to drop
the RTP hairpin. Asterisk 1.4 does not retransmit a RE-INVITE on
receiving a 491 response. It is treating 491 as a permanent failure and
therefore dropping the call.

I don't know what changed between 1.2 and 1.4 to explain why you are
seeing this only in 1.4 and not in 1.2. Since this is a race
condition, I'd imagine that this could occur in 1.2 as well and has
just not occurred in your setup yet. Maybe some additional processing
in 1.4 is causing the race condition to occur more frequently.

Your Mantis bug report would basically ask for correct 491 processing
in Asterisk SIP channel. Here is how RFC 3261 (Section 14.1)
recommends this condition should be resolved:

  If a UAC receives a 491 response to a re-INVITE, it SHOULD start a
  timer with a value T chosen as follows:

     1. If the UAC is the owner of the Call-ID of the dialog ID
        (meaning it generated the value), T has a randomly chosen value
        between 2.1 and 4 seconds in units of 10 ms.

     2. If the UAC is not the owner of the Call-ID of the dialog ID, T
        has a randomly chosen value of between 0 and 2 seconds in units
        of 10 ms.

  When the timer fires, the UAC SHOULD attempt the re-INVITE once more,
  if it still desires for that session modification to take place.  For
  example, if the call was already hung up with a BYE, the re-INVITE
  would not take place.
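
The timer rule quoted above can be sketched in a few lines. This is only an illustration of the RFC text, not code from Asterisk: the function name `reinvite_retry_delay` and its parameters are hypothetical.

```python
import random


def reinvite_retry_delay(owns_call_id, rng=random):
    """Pick the delay (in seconds) before retrying a re-INVITE after a 491.

    Follows the two cases of RFC 3261 section 14.1 quoted above; the
    value is drawn in units of 10 ms. `owns_call_id` is True when this
    side generated the Call-ID of the dialog.
    """
    if owns_call_id:
        ticks = rng.randint(210, 400)  # 2.1 s to 4 s, inclusive
    else:
        ticks = rng.randint(0, 200)    # 0 s to 2 s, inclusive
    return ticks / 100.0


if __name__ == "__main__":
    print("Call-ID owner retries after", reinvite_retry_delay(True), "s")
    print("other side retries after", reinvite_retry_delay(False), "s")
```

Because the two ranges do not overlap, the side that does not own the Call-ID always retries first, so the two re-INVITEs should not collide a second time.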

By: Alex Coseru (alexcos) 2007-08-17 10:57:03

I have the same problem (I think):

Huawei IP phone -> asterisk1 -> asterisk2     -> patton
10.36.2.85      -> 10.36.2.1 -> 10.105.177.21 -> 10.105.177.14

The RTP setup is a mess: the Huawei phone is trying to send the RTP to 10.105.177.14, and the Patton expects it from 10.36.2.1.
Asterisk2 seems to be the problem, but it is not using a reinvite.

Asterisk1 and Asterisk2 run the same version: Asterisk 1.4.8.

If you need more info, please contact me.

I have attached a debug file.

Regards



By: Alex Coseru (alexcos) 2007-08-17 12:19:09

Related to bug ASTERISK-997037 - http://bugs.digium.com/view.php?id=10037



By: Edwin Groothuis (mavetju) 2007-08-29 20:22:28

Is there more information you require for handling this issue?

By: marvy (marvy) 2007-09-15 20:46:21

See http://svn.digium.com/view/asterisk?view=rev&revision=47418 for the reason this is handled differently in 1.4 than in 1.2. Reverting this patch seems to re-establish a working system.

By: Raj Jain (rjain) 2007-10-11 14:52:12

This bug report is a duplicate of 9431. Either 9431 or this report should be closed as a duplicate (perhaps 9431 should be closed as a duplicate of this one, because this report provides more detailed logs, etc.).

http://bugs.digium.com/view.php?id=9431

By: Olle Johansson (oej) 2007-11-19 13:18:22.000-0600

The issue here is really that we're sending a 491 without supporting it on the receiving end. I might have to disable the 491 totally and just refuse the offer with another code until we have proper 491 support, which isn't easy to do in Asterisk. The re-invite is forced from deep inside the RTP subsystem. If we stall it in one server, we will stall it in a range of servers, and that will probably cause more havoc.

Food for thought. Thanks for a good and detailed bug report.

By: Olle Johansson (oej) 2007-12-05 11:17:19.000-0600

Ok, I've seen this too in labs.

By: Digium Subversion (svnbot) 2007-12-06 14:00:33.000-0600

Repository: asterisk
Revision: 91539

A   team/oej/reinvite-racing/

------------------------------------------------------------------------
r91539 | oej | 2007-12-06 14:00:32 -0600 (Thu, 06 Dec 2007) | 13 lines

Branch to try to track down 491 handling on Asterisk 1.4 reinvite races.

Asterisk 1.4 actually denies an INVITE if we have an outstanding
INVITE in the same call. Asterisk 1.2 happily accepts anything
regardless.

This causes a problem where the 491 on a reinvite is handled
as an error. We need to remember that there's a re-invite
and not an invite that we're dealing with and postpone it.

Related to bug ASTERISK-10105


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=91539
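
The race described in this commit message can be illustrated with a toy model. The sketch below is not the chan_sip.c code: the `Dialog` class, its attributes and the `postpone_on_491` flag are all hypothetical. It only shows the two behaviours discussed in this issue: answering 491 while one's own INVITE is outstanding, and then either treating a received 491 as a failure or remembering the re-INVITE for a later retry.

```python
class Dialog:
    """Toy stand-in for one side of a SIP call (hypothetical, for illustration)."""

    def __init__(self, name):
        self.name = name
        self.invite_outstanding = False  # we sent a re-INVITE, no final answer yet
        self.pending_reinvite = False    # re-INVITE refused with 491, retry later
        self.alive = True

    def send_reinvite(self):
        self.invite_outstanding = True

    def receive_invite(self):
        """Return the status code used to answer an incoming re-INVITE."""
        # Refuse with 491 while our own INVITE is still in progress.
        return 491 if self.invite_outstanding else 200

    def receive_response(self, status, postpone_on_491):
        self.invite_outstanding = False
        if status == 491:
            if postpone_on_491:
                self.pending_reinvite = True  # retry once a timer fires
            else:
                self.alive = False            # 491 treated as a permanent failure


def glare(postpone_on_491):
    """Let two peers re-INVITE each other at the same moment."""
    a, b = Dialog("asterisk1"), Dialog("asterisk2")
    a.send_reinvite()
    b.send_reinvite()
    status_for_a = b.receive_invite()  # b is busy with its own re-INVITE
    status_for_b = a.receive_invite()  # and so is a
    a.receive_response(status_for_a, postpone_on_491)
    b.receive_response(status_for_b, postpone_on_491)
    return a, b


if __name__ == "__main__":
    for flag in (False, True):
        a, b = glare(flag)
        print(flag, a.alive, b.alive, a.pending_reinvite, b.pending_reinvite)
```

With `postpone_on_491=False` both sides of the model drop the call, which corresponds to the failure reported here; with `postpone_on_491=True` both calls survive with a re-INVITE waiting to be retried.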

By: Olle Johansson (oej) 2007-12-06 14:29:38.000-0600

Trying to find a solution in the branch "reinvite-racing"

By: Olle Johansson (oej) 2007-12-07 05:09:06.000-0600

Ok, seems like the branch "reinvite-racing" solves the issue for normal calls. I need to check the impact on T.38 calls too. Please test and report back here. Thank you!

By: Olle Johansson (oej) 2007-12-10 07:48:30.000-0600

Thank you for all of the test reports! Impressive...

Going ahead and merging this with 1.4.

By: Digium Subversion (svnbot) 2007-12-10 08:02:09.000-0600

Repository: asterisk
Revision: 92158

U   branches/1.4/channels/chan_sip.c

------------------------------------------------------------------------
r92158 | oej | 2007-12-10 08:02:08 -0600 (Mon, 10 Dec 2007) | 16 lines

Avoid reinvite race situations with two Asterisks trying
to reinvite each other in 1.4 and trunk.

This patch implements support for the 491 error code that
Asterisk 1.4 generates on situations where we get an
incoming INVITE and already has one in progress.

Thanks to mavetju for reporting and to Raj Jain for an
excellent explanation of the problem.

Patch by myself. Tested with 8 Asterisk servers connected
to each other in a training network.

Closes issue ASTERISK-10105


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=92158

By: Digium Subversion (svnbot) 2007-12-10 08:07:29.000-0600

Repository: asterisk
Revision: 92159

_U  trunk/
U   trunk/channels/chan_sip.c

------------------------------------------------------------------------
r92159 | oej | 2007-12-10 08:07:28 -0600 (Mon, 10 Dec 2007) | 24 lines

Merged revisions 92158 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r92158 | oej | 2007-12-10 15:04:44 +0100 (Mon, 10 Dec 2007) | 16 lines

Avoid reinvite race situations with two Asterisks trying
to reinvite each other in 1.4 and trunk.

This patch implements support for the 491 error code that
Asterisk 1.4 generates on situations where we get an
incoming INVITE and already has one in progress.

Thanks to mavetju for reporting and to Raj Jain for an
excellent explanation of the problem.

Patch by myself. Tested with 8 Asterisk servers connected
to each other in a training network.

Closes issue ASTERISK-10105


........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=92159

By: Digium Subversion (svnbot) 2007-12-11 09:47:23.000-0600

Repository: asterisk
Revision: 92303

_U  team/file/bridging/
U   team/file/bridging/Makefile
U   team/file/bridging/Makefile.moddir_rules
U   team/file/bridging/apps/Makefile
U   team/file/bridging/apps/app_queue.c
U   team/file/bridging/apps/app_voicemail.c
U   team/file/bridging/build_tools/make_version
U   team/file/bridging/build_tools/make_version_h
U   team/file/bridging/cdr/Makefile
U   team/file/bridging/channels/Makefile
U   team/file/bridging/channels/chan_sip.c
U   team/file/bridging/channels/chan_zap.c
U   team/file/bridging/codecs/Makefile
U   team/file/bridging/doc/CODING-GUIDELINES
U   team/file/bridging/doc/manager_1_1.txt
U   team/file/bridging/formats/Makefile
U   team/file/bridging/funcs/Makefile
U   team/file/bridging/include/asterisk/_private.h
U   team/file/bridging/include/asterisk/adsi.h
U   team/file/bridging/include/asterisk/ael_structs.h
U   team/file/bridging/include/asterisk/aes.h
U   team/file/bridging/include/asterisk/agi.h
U   team/file/bridging/include/asterisk/alaw.h
U   team/file/bridging/include/asterisk/app.h
U   team/file/bridging/include/asterisk/ast_expr.h
U   team/file/bridging/include/asterisk/astdb.h
U   team/file/bridging/include/asterisk/astobj2.h
U   team/file/bridging/include/asterisk/callerid.h
U   team/file/bridging/include/asterisk/causes.h
U   team/file/bridging/include/asterisk/cdr.h
U   team/file/bridging/include/asterisk/devicestate.h
U   team/file/bridging/include/asterisk/doxyref.h
U   team/file/bridging/include/asterisk/dsp.h
U   team/file/bridging/include/asterisk/event.h
U   team/file/bridging/include/asterisk/extconf.h
U   team/file/bridging/include/asterisk/frame.h
U   team/file/bridging/include/asterisk/hashtab.h
U   team/file/bridging/include/asterisk/io.h
U   team/file/bridging/include/asterisk/localtime.h
U   team/file/bridging/include/asterisk/logger.h
U   team/file/bridging/include/asterisk/mod_format.h
U   team/file/bridging/main/rtp.c
U   team/file/bridging/pbx/Makefile
U   team/file/bridging/res/Makefile
U   team/file/bridging/res/res_agi.c
U   team/file/bridging/utils/Makefile
U   team/file/bridging/utils/check_expr.c
U   team/file/bridging/utils/clicompat.c

------------------------------------------------------------------------
r92303 | file | 2007-12-11 09:47:21 -0600 (Tue, 11 Dec 2007) | 193 lines

Merged revisions 92082-92084,92103-92104,92122,92140,92159-92160,92199,92201,92203,92205-92206,92243,92267,92285 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
r92082 | rizzo | 2007-12-09 23:50:38 -0400 (Sun, 09 Dec 2007) | 23 lines

Put into Makefile.moddir_rules the common instructions used to
generate loadable and embedded module lists.

Individual Makefiles now are a lot simpler, possibly as simple as this:

   -include $(ASTTOPDIR)/menuselect.makeopts $(ASTTOPDIR)/menuselect.makedeps
   MODULE_PREFIX=cdr_
   all: _all
   include $(ASTTOPDIR)/Makefile.moddir_rules

and also more flexible because in a single directory we can combine
various types of modules (app_, cdr_, func_, ... ) by simply
listing them in the MODULE_PREFIX variable.

The individual Makefiles can also create list of modules to be
excluded by listing them in the variable MODULE_EXCLUDE (see an
example in channels/Makefile).

With this change it becomes trivial to integrate a directory with
locally created/modified sources into the main build.



................
r92083 | rizzo | 2007-12-10 00:18:07 -0400 (Mon, 10 Dec 2007) | 7 lines

Fix the detection of modules installed from this build.

You can now add the path of local module subdirs from the command line with
  make LOCAL_MOD_SUBDIRS= ....



................
r92084 | rizzo | 2007-12-10 00:38:49 -0400 (Mon, 10 Dec 2007) | 3 lines

add a bit of info on the build infrastructure


................
r92103 | rizzo | 2007-12-10 04:35:35 -0400 (Mon, 10 Dec 2007) | 2 lines

simplify this file

................
r92104 | rizzo | 2007-12-10 04:40:59 -0400 (Mon, 10 Dec 2007) | 12 lines

remove relative paths and use ASTTOPDIR instead.

Give a default value to ASTTOPDIR if unset so we can at least
do a 'make clean' without too much trouble.

The proper fix, however, is to partition the top level
Makefile in a 'setup' and a 'main' part, in a way that the
'setup' part can be included from subdirs' Makefiles and
allow targets to be built without going through the
top level Makefile.


................
r92122 | rizzo | 2007-12-10 05:00:44 -0400 (Mon, 10 Dec 2007) | 2 lines

simplify/cleanup the scripts

................
r92140 | oej | 2007-12-10 09:29:57 -0400 (Mon, 10 Dec 2007) | 8 lines

Add a few extra headers in the voicemail users listing in
manager 1.1. Update documentation too.

(closes issue ASTERISK-10996)
Reported by: caio1982
Patches:
     extra_vm_manager_info1.diff uploaded by caio1982 (license 22)

................
r92159 | oej | 2007-12-10 10:10:24 -0400 (Mon, 10 Dec 2007) | 24 lines

Merged revisions 92158 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r92158 | oej | 2007-12-10 15:04:44 +0100 (Mon, 10 Dec 2007) | 16 lines

Avoid reinvite race situations with two Asterisks trying
to reinvite each other in 1.4 and trunk.

This patch implements support for the 491 error code that
Asterisk 1.4 generates on situations where we get an
incoming INVITE and already has one in progress.

Thanks to mavetju for reporting and to Raj Jain for an
excellent explanation of the problem.

Patch by myself. Tested with 8 Asterisk servers connected
to each other in a training network.

Closes issue ASTERISK-10105


........

................
r92160 | oej | 2007-12-10 10:18:21 -0400 (Mon, 10 Dec 2007) | 2 lines

Removing some LOG_DEBUG items

................
r92199 | file | 2007-12-10 12:07:33 -0400 (Mon, 10 Dec 2007) | 4 lines

Only send a SIGHUP if the pid is greater than -1, otherwise all PIDs greater than -1 will get the SIGHUP... and that is bad.
(closes issue ASTERISK-10962)
Reported by: alanmcmillan

................
r92201 | file | 2007-12-10 12:15:06 -0400 (Mon, 10 Dec 2007) | 11 lines

Blocked revisions 92200 via svnmerge

........
r92200 | file | 2007-12-10 12:13:43 -0400 (Mon, 10 Dec 2007) | 4 lines

It is possible for nativeformats to contain more than one codec, so print out multiple ones.
(closes issue ASTERISK-10877)
Reported by: ovi

........

................
r92203 | mmichelson | 2007-12-10 12:30:46 -0400 (Mon, 10 Dec 2007) | 15 lines

Merged revisions 92202 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r92202 | mmichelson | 2007-12-10 10:29:44 -0600 (Mon, 10 Dec 2007) | 7 lines

If there are no members in a queue, then the loop where the datastore for detecting
duplicate dialed numbers will be skipped, meaning the datastore isn't created. This means
that when we try to free it, there's a crash. This stops that crash from occurring.

(closes issue ASTERISK-10998, reported by slavon, patched by eliel)


........

................
r92205 | file | 2007-12-10 12:37:35 -0400 (Mon, 10 Dec 2007) | 14 lines

Merged revisions 92204 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r92204 | file | 2007-12-10 12:36:15 -0400 (Mon, 10 Dec 2007) | 6 lines

Add G729A as another possible payload name for G729. Some devices use this instead of G729, which is perfectly normal since the payload number itself is defined and can't be used by anything else so the name doesn't matter that much.
(closes issue ASTERISK-10985)
Reported by: revolution
Patches:
     rtp.diff uploaded by revolution (license 346)

........

................
r92206 | file | 2007-12-10 12:48:18 -0400 (Mon, 10 Dec 2007) | 4 lines

Add ast_atomic_fetchadd_int_slow to check_expr for platforms that need it.
(closes issue ASTERISK-10986)
Reported by: snuffy

................
r92243 | dbailey | 2007-12-10 16:18:25 -0400 (Mon, 10 Dec 2007) | 2 lines

Add CLI commands to dynamically set hw and sw gains

................
r92267 | oej | 2007-12-11 05:26:25 -0400 (Tue, 11 Dec 2007) | 2 lines

Doxygen updates

................
r92285 | oej | 2007-12-11 10:17:29 -0400 (Tue, 11 Dec 2007) | 2 lines

A lot of doxygen updates

................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=92303