Summary: | ASTERISK-10105: SIP with canreinvite=yes through multiple Asterisk instances fails | ||
Reporter: | Edwin Groothuis (mavetju) | Labels: | |
Date Opened: | 2007-08-17 08:50:04 | Date Closed: | 2008-12-10 10:53:36.000-0600 |
Priority: | Major | Regression? | No |
Status: | Closed/Complete | Components: | Channels/chan_sip/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) sip_reinvite_bug.txt | |
Description: | The story at http://www.mavetju.org/~edwin/asterisk-sip-reinvite.html describes a problem I experienced with calls coming from one of our providers: during the SIP handshake our equipment was reinviting the SIP session, and the RTP stream was never set up. We experienced this after the upgrade from 1.2 to 1.4 (the latest SVN version); before that it had always worked.

To simulate this problem, I have set up one SIP phone, three identical Asterisk instances and a connection towards the end-point: a Cisco Call Manager. The only varying factor in the experiments was the option "canreinvite": with "canreinvite=no" it always worked fine, but with "canreinvite=yes" it broke down after two hops.

I have written down the whole setup, the configurations, the scenarios and the results at http://www.mavetju.org/~edwin/c2-flow.txt. Attached to each scenario are the SIP packets (captured with ngrep and processed into a flow visualiser).

****** ADDITIONAL INFORMATION ******

This is happening with Asterisk SVN-branch-1.4-r79553.

I have three Asterisk boxes, all with the same configuration. For this exercise, I made three identical VMware instances so that the cause could not be the OS, the network, or an Asterisk version inconsistency.

The first Asterisk instance has IP address 10.252.15.24.
The second Asterisk instance has IP address 10.252.15.25.
The third Asterisk instance has IP address 10.252.15.26.

The final leg of the call is the delivery on a Cisco Call Manager. The SIP handler is "ccm-publisher.barnet.com.au", the end of the RTP stream is "202.83.178.13", a Cisco 2821 media gateway.
The IP address of the CCM is 10.252.11.130.

This is the sip.conf of the first Asterisk instance:

[general]
context=default
allowoverlap=no
bindport=5060
bindaddr=0.0.0.0
srvlookup=yes
context=from-sip
canreinvite=yes

[edwin]
type=friend
host=dynamic
username=edwin
secret=edwin
context=from-sip
canreinvite=no

[asterisk1]
type=friend
host=asterisk1.int.barnet.com.au
context=from-sip
canreinvite=yes

[asterisk2]
type=friend
host=asterisk2.int.barnet.com.au
context=from-sip
canreinvite=yes

[asterisk3]
type=friend
host=asterisk3.int.barnet.com.au
context=from-sip
canreinvite=yes

[ccm-publisher]
type=friend
host=ccm-publisher.barnet.com.au
context=from-sip
canreinvite=yes

This is the sip.conf of the second and third Asterisk instances:

[general]
context=default
allowoverlap=no
bindport=5060
bindaddr=0.0.0.0
srvlookup=yes
context=from-sip
canreinvite=yes

[asterisk1]
type=friend
host=asterisk1.int.barnet.com.au
context=from-sip
canreinvite=yes

[asterisk2]
type=friend
host=asterisk2.int.barnet.com.au
context=from-sip
canreinvite=yes

[asterisk3]
type=friend
host=asterisk3.int.barnet.com.au
context=from-sip
canreinvite=yes

[ccm-publisher]
type=friend
host=ccm-publisher.barnet.com.au
context=from-sip
canreinvite=yes

These are the extensions.conf files of the three Asterisk instances.

Number one:

[from-sip]
exten => 71,1,Dial(SIP/9096@ccm-publisher.barnet.com.au)
exten => 81,1,Dial(SIP/81@asterisk2.int.barnet.com.au)
exten => 91,1,Dial(SIP/91@asterisk2.int.barnet.com.au)

Number two:

[from-sip]
exten => 81,1,Dial(SIP/9096@ccm-publisher.barnet.com.au)
exten => 91,1,Dial(SIP/91@asterisk3.int.barnet.com.au)

Number three:

[from-sip]
exten => 91,1,Dial(SIP/9096@ccm-publisher.barnet.com.au)

And I have one SIP phone (Polycom SoundPoint IP650), which delivers its calls to the first Asterisk instance. Its name is "edwin". The IP address of the SIP phone is 10.251.1.21.

Scenario A: "canreinvite=no"
----------------------------

In this main scenario we have set "canreinvite" to "no" for all entries in the sip.conf.

Scenario A-1: Call 71
---------------------

The SIP handshake goes from the Polycom to the first Asterisk instance to the CCM.
The RTP session ("debug rtp") goes from the phone via the first Asterisk instance to the C2821.

Scenario A-2: Call 81
---------------------

The SIP handshake goes from the Polycom to the first Asterisk instance to the second Asterisk instance to the CCM.
The RTP session goes from the phone via the first Asterisk instance to the second Asterisk instance to the C2821.

Scenario A-3: Call 91
---------------------

The SIP handshake goes from the Polycom to the first Asterisk instance to the second Asterisk instance to the third Asterisk instance to the CCM.
The RTP session goes from the phone via the first Asterisk instance to the second Asterisk instance to the C2821.

Scenario B: "canreinvite=yes/no"
--------------------------------

In this main scenario we have set "canreinvite" to "yes" for the Asterisk instances and the ccm-publisher in the sip.conf. The value in the Polycom phone section is still "no".

Scenario B-1: Call 71
---------------------

The SIP handshake goes from the Polycom to the first Asterisk instance to the CCM.
The RTP session ("debug rtp") goes from the phone via the first Asterisk instance to the C2821.

Scenario B-2: Call 81
---------------------

The SIP handshake goes from the Polycom to the first Asterisk instance to the second Asterisk instance to the CCM.
The RTP session goes from the phone via the first Asterisk instance to the C2821.

Scenario B-3: Call 91
---------------------

The SIP handshake goes from the Polycom to the first Asterisk instance to the second Asterisk instance to the third Asterisk instance to the CCM.
No RTP stream is set up.

Scenario C: "canreinvite=yes"
-----------------------------

In this main scenario we have set the value of "canreinvite" to "yes" for everything.

Scenario C-1: Call 71
---------------------

The SIP handshake goes from the Polycom to the first Asterisk instance to the CCM.
The RTP session ("debug rtp") goes from the phone via the first Asterisk instance to the C2821.

Scenario C-2: Call 81
---------------------

The SIP handshake goes from the Polycom to the first Asterisk instance to the second Asterisk instance to the CCM.
No RTP stream is set up.

Scenario C-3: Call 91
---------------------

The SIP handshake goes from the Polycom to the first Asterisk instance to the second Asterisk instance to the third Asterisk instance to the CCM.
No RTP stream is set up. | ||
Comments: | By: Edwin Groothuis (mavetju) 2007-08-17 08:51:14

According to Raj Jain (at http://lists.digium.com/pipermail/asterisk-dev/2007-August/029063.html):

You are running into a RE-INVITE "glare" scenario. The Asterisk boxes facing each other are racing to send RE-INVITE to each other to drop the RTP hairpin. Asterisk 1.4 does not retransmit a RE-INVITE on receiving a 491 response. It is treating 491 as a permanent failure and therefore dropping the call.

I don't know what changed between 1.2 and 1.4 to explain why you are seeing this only in 1.4 and not in 1.2. Since this is a race condition, I'd imagine that this could occur in 1.2 as well and has just not occurred in your setup yet. Maybe some additional processing in 1.4 is causing the race condition to occur more frequently.

Your Mantis bug report would basically ask for correct 491 processing in the Asterisk SIP channel. Here is how RFC 3261 (Section 14.1) recommends this condition should be resolved:

If a UAC receives a 491 response to a re-INVITE, it SHOULD start a timer with a value T chosen as follows:

1. If the UAC is the owner of the Call-ID of the dialog ID (meaning it generated the value), T has a randomly chosen value between 2.1 and 4 seconds in units of 10 ms.

2. If the UAC is not the owner of the Call-ID of the dialog ID, T has a randomly chosen value of between 0 and 2 seconds in units of 10 ms.

When the timer fires, the UAC SHOULD attempt the re-INVITE once more, if it still desires for that session modification to take place. For example, if the call was already hung up with a BYE, the re-INVITE would not take place.
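
The timer selection quoted from RFC 3261 above can be sketched in Python as follows. This is only an illustration of the rule, not the code Asterisk uses: the function name `glare_retry_delay` and the `owns_call_id` flag are hypothetical, and the caller is assumed to check that the dialog is still alive before resending the re-INVITE.

```python
import random
from typing import Optional


def glare_retry_delay(owns_call_id: bool,
                      rng: Optional[random.Random] = None) -> float:
    """Pick the wait (in seconds) before retrying a re-INVITE after a 491.

    Follows the rule quoted from RFC 3261 section 14.1: the delay is chosen
    at random in units of 10 ms, with a range that depends on whether this
    side generated the Call-ID of the dialog.
    """
    rng = rng or random.Random()
    if owns_call_id:
        # Owner of the Call-ID: 2.1 s .. 4.0 s, i.e. 210 .. 400 ticks of 10 ms.
        ticks = rng.randint(210, 400)
    else:
        # Not the owner: 0 s .. 2.0 s, i.e. 0 .. 200 ticks of 10 ms.
        ticks = rng.randint(0, 200)
    return ticks / 100.0
```

Because the two ranges do not overlap, the two sides of a glare cannot pick the same delay, so the side that does not own the Call-ID always retries first.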

By: Alex Coseru (alexcos) 2007-08-17 10:57:03

I have the same problem (I think).

Huawei IpPhone -> asterisk1 -> asterisk2 -> patton
10.36.2.85 -> 10.36.2.1 -> 10.105.177.21 -> 10.105.177.14

The RTP setup is a mess: the Huawei phone is trying to send the RTP to 10.105.177.14, and the Patton expects it from 10.36.2.1. Asterisk2 seems to be the problem, but not using a reinvite. Asterisk1 and Asterisk2 are the same version: Asterisk 1.4.8.

If you need more info, please contact me. I have attached a debug file.

Regards

By: Alex Coseru (alexcos) 2007-08-17 12:19:09

Related with bugASTERISK-997037 - http://bugs.digium.com/view.php?id=10037

By: Edwin Groothuis (mavetju) 2007-08-29 20:22:28

Is there more information you require for handling this issue?

By: marvy (marvy) 2007-09-15 20:46:21

See http://svn.digium.com/view/asterisk?view=rev&revision=47418 for the reason this is handled differently in 1.4 than in 1.2. Reversing this patch seems to re-establish a working system.

By: Raj Jain (rjain) 2007-10-11 14:52:12

This bug report is a duplicate of 9431. Either 9431 or this report should be closed as duplicate (perhaps 9431 should be closed as a duplicate of this, because this report provides more detailed logs etc).

http://bugs.digium.com/view.php?id=9431

By: Olle Johansson (oej) 2007-11-19 13:18:22.000-0600

The issue here is really that we're sending a 491 without supporting it on the receiving end. I might have to disable the 491 totally and just refuse the offer with another code until we have proper 491 support, which isn't easy to do in Asterisk. The re-invite is forced from deep inside the RTP subsystem. If we stall it in one server, we will stall it in a range of servers and that will probably cause more havoc.

Food for thought. Thanks for a good and detailed bug report.

By: Olle Johansson (oej) 2007-12-05 11:17:19.000-0600

Ok, I've seen this too in labs.

By: Digium Subversion (svnbot) 2007-12-06 14:00:33.000-0600

Repository: asterisk
Revision: 91539

A team/oej/reinvite-racing/

------------------------------------------------------------------------
r91539 | oej | 2007-12-06 14:00:32 -0600 (Thu, 06 Dec 2007) | 13 lines

Branch to try to track down 491 handling on Asterisk 1.4 reinvite races.

Asterisk 1.4 actually denies an INVITE if we have an outstanding INVITE in the same call. Asterisk 1.2 happily accepts anything regardless.

This causes a problem where the 491 on a reinvite is handled as an error. We need to remember that there's a re-invite and not an invite that we're dealing with and postpone it.

Related to bug ASTERISK-10105
------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=91539

By: Olle Johansson (oej) 2007-12-06 14:29:38.000-0600

Trying to find a solution in the branch "reinvite-racing"

By: Olle Johansson (oej) 2007-12-07 05:09:06.000-0600

Ok, seems like the branch "reinvite-racing" solves the issue for normal calls. I need to check the impact on T.38 calls too. Please test and report back here. Thank you!

By: Olle Johansson (oej) 2007-12-10 07:48:30.000-0600

Thank you for all of the test reports! Impressive... Going ahead and merging this with 1.4.

By: Digium Subversion (svnbot) 2007-12-10 08:02:09.000-0600

Repository: asterisk
Revision: 92158

U branches/1.4/channels/chan_sip.c

------------------------------------------------------------------------
r92158 | oej | 2007-12-10 08:02:08 -0600 (Mon, 10 Dec 2007) | 16 lines

Avoid reinvite race situations with two Asterisks trying to reinvite each other in 1.4 and trunk.

This patch implements support for the 491 error code that Asterisk 1.4 generates on situations where we get an incoming INVITE and already has one in progress.

Thanks to mavetju for reporting and to Raj Jain for an excellent explanation of the problem.

Patch by myself.
Tested with 8 Asterisk servers connected to each other in a training network.

Closes issue ASTERISK-10105
------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=92158

By: Digium Subversion (svnbot) 2007-12-10 08:07:29.000-0600

Repository: asterisk
Revision: 92159

_U trunk/
U trunk/channels/chan_sip.c

------------------------------------------------------------------------
r92159 | oej | 2007-12-10 08:07:28 -0600 (Mon, 10 Dec 2007) | 24 lines

Merged revisions 92158 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r92158 | oej | 2007-12-10 15:04:44 +0100 (Mon, 10 Dec 2007) | 16 lines

Avoid reinvite race situations with two Asterisks trying to reinvite each other in 1.4 and trunk.

This patch implements support for the 491 error code that Asterisk 1.4 generates on situations where we get an incoming INVITE and already has one in progress.

Thanks to mavetju for reporting and to Raj Jain for an excellent explanation of the problem.

Patch by myself.

Tested with 8 Asterisk servers connected to each other in a training network.

Closes issue ASTERISK-10105
........
------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=92159 By: Digium Subversion (svnbot) 2007-12-11 09:47:23.000-0600 Repository: asterisk Revision: 92303 _U team/file/bridging/ U team/file/bridging/Makefile U team/file/bridging/Makefile.moddir_rules U team/file/bridging/apps/Makefile U team/file/bridging/apps/app_queue.c U team/file/bridging/apps/app_voicemail.c U team/file/bridging/build_tools/make_version U team/file/bridging/build_tools/make_version_h U team/file/bridging/cdr/Makefile U team/file/bridging/channels/Makefile U team/file/bridging/channels/chan_sip.c U team/file/bridging/channels/chan_zap.c U team/file/bridging/codecs/Makefile U team/file/bridging/doc/CODING-GUIDELINES U team/file/bridging/doc/manager_1_1.txt U team/file/bridging/formats/Makefile U team/file/bridging/funcs/Makefile U team/file/bridging/include/asterisk/_private.h U team/file/bridging/include/asterisk/adsi.h U team/file/bridging/include/asterisk/ael_structs.h U team/file/bridging/include/asterisk/aes.h U team/file/bridging/include/asterisk/agi.h U team/file/bridging/include/asterisk/alaw.h U team/file/bridging/include/asterisk/app.h U team/file/bridging/include/asterisk/ast_expr.h U team/file/bridging/include/asterisk/astdb.h U team/file/bridging/include/asterisk/astobj2.h U team/file/bridging/include/asterisk/callerid.h U team/file/bridging/include/asterisk/causes.h U team/file/bridging/include/asterisk/cdr.h U team/file/bridging/include/asterisk/devicestate.h U team/file/bridging/include/asterisk/doxyref.h U team/file/bridging/include/asterisk/dsp.h U team/file/bridging/include/asterisk/event.h U team/file/bridging/include/asterisk/extconf.h U team/file/bridging/include/asterisk/frame.h U team/file/bridging/include/asterisk/hashtab.h U team/file/bridging/include/asterisk/io.h U team/file/bridging/include/asterisk/localtime.h U team/file/bridging/include/asterisk/logger.h U 
team/file/bridging/include/asterisk/mod_format.h U team/file/bridging/main/rtp.c U team/file/bridging/pbx/Makefile U team/file/bridging/res/Makefile U team/file/bridging/res/res_agi.c U team/file/bridging/utils/Makefile U team/file/bridging/utils/check_expr.c U team/file/bridging/utils/clicompat.c ------------------------------------------------------------------------ r92303 | file | 2007-12-11 09:47:21 -0600 (Tue, 11 Dec 2007) | 193 lines Merged revisions 92082-92084,92103-92104,92122,92140,92159-92160,92199,92201,92203,92205-92206,92243,92267,92285 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ................ r92082 | rizzo | 2007-12-09 23:50:38 -0400 (Sun, 09 Dec 2007) | 23 lines Put into Makefile.moddir_rules the common instructions used to generate loadable and embedded module lists. Individual Makefiles now are a lot simpler, possibly as simple as this: -include $(ASTTOPDIR)/menuselect.makeopts $(ASTTOPDIR)/menuselect.makedeps MODULE_PREFIX=cdr_ all: _all include $(ASTTOPDIR)/Makefile.moddir_rules and also more flexible because in a single directory we can combine various types of modules (app_, cdr_, func_, ... ) by simply listing them in the MODULE_PREFIX variable. The individual Makefiles can also create list of modules to be excluded by listing them in the variablel MODULE_EXCLUDE (see an example in channels/Makefile). With this change it becomes trivial to integrate a directory with locally created/modified sources into the main build. ................ r92083 | rizzo | 2007-12-10 00:18:07 -0400 (Mon, 10 Dec 2007) | 7 lines Fix the detection of modules installed from this build. You can now add the path of local module subdirs from the command line with make LOCAL_MOD_SUBDIRS= .... ................ r92084 | rizzo | 2007-12-10 00:38:49 -0400 (Mon, 10 Dec 2007) | 3 lines add a bit of info on the build infrastructure ................ 
r92103 | rizzo | 2007-12-10 04:35:35 -0400 (Mon, 10 Dec 2007) | 2 lines simplify this file ................ r92104 | rizzo | 2007-12-10 04:40:59 -0400 (Mon, 10 Dec 2007) | 12 lines remove relative paths and use ASTTOPDIR instead. Give a default value to ASTTOPDIR if unset so we can at least do a 'make clean' without too much trouble. The proper fix, however, is to partition the top level Makefile in a 'setup' and a 'main' part, in a way that the 'setup' part can be included from subdirs' Makefiles and allow targets to be built without going through the top level Makefile. ................ r92122 | rizzo | 2007-12-10 05:00:44 -0400 (Mon, 10 Dec 2007) | 2 lines simplify/cleanup the scripts ................ r92140 | oej | 2007-12-10 09:29:57 -0400 (Mon, 10 Dec 2007) | 8 lines Add a few extra headers in the voicemail users listing in manager 1.1. Update documentation too. (closes issue ASTERISK-10996) Reported by: caio1982 Patches: extra_vm_manager_info1.diff uploaded by caio1982 (license 22) ................ r92159 | oej | 2007-12-10 10:10:24 -0400 (Mon, 10 Dec 2007) | 24 lines Merged revisions 92158 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r92158 | oej | 2007-12-10 15:04:44 +0100 (Mon, 10 Dec 2007) | 16 lines Avoid reinvite race situations with two Asterisks trying to reinvite each other in 1.4 and trunk. This patch implements support for the 491 error code that Asterisk 1.4 generates on situations where we get an incoming INVITE and already has one in progress. Thanks to mavetju for reporting and to Raj Jain for an excellent explanation of the problem. Patch by myself. Tested with 8 Asterisk servers connected to each other in a training network. Closes issue ASTERISK-10105 ........ ................ r92160 | oej | 2007-12-10 10:18:21 -0400 (Mon, 10 Dec 2007) | 2 lines Removing some LOG_DEBUG items ................ 
r92199 | file | 2007-12-10 12:07:33 -0400 (Mon, 10 Dec 2007) | 4 lines Only send a SIGHUP if the pid is greater than -1, otherwise all PIDs greater than -1 will get the SIGHUP... and that is bad. (closes issue ASTERISK-10962) Reported by: alanmcmillan ................ r92201 | file | 2007-12-10 12:15:06 -0400 (Mon, 10 Dec 2007) | 11 lines Blocked revisions 92200 via svnmerge ........ r92200 | file | 2007-12-10 12:13:43 -0400 (Mon, 10 Dec 2007) | 4 lines It is possible for nativeformats to contain more then one codec, so print out multiple ones. (closes issue ASTERISK-10877) Reported by: ovi ........ ................ r92203 | mmichelson | 2007-12-10 12:30:46 -0400 (Mon, 10 Dec 2007) | 15 lines Merged revisions 92202 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r92202 | mmichelson | 2007-12-10 10:29:44 -0600 (Mon, 10 Dec 2007) | 7 lines If there are no members in a queue, then the loop where the datastore for detecting duplicate dialed numbers will be skipped, meaning the datastore isn't created. This means that when we try to free it, there's a crash. This stops that crash from occurring. (closes issue ASTERISK-10998, reported by slavon, patched by eliel) ........ ................ r92205 | file | 2007-12-10 12:37:35 -0400 (Mon, 10 Dec 2007) | 14 lines Merged revisions 92204 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r92204 | file | 2007-12-10 12:36:15 -0400 (Mon, 10 Dec 2007) | 6 lines Add G729A as another possible payload name for G729. Some devices use this instead of G729, which is perfectly normal since the payload number itself is defined and can't be used by anything else so the name doesn't matter that much. (closes issue ASTERISK-10985) Reported by: revolution Patches: rtp.diff uploaded by revolution (license 346) ........ ................ r92206 | file | 2007-12-10 12:48:18 -0400 (Mon, 10 Dec 2007) | 4 lines Add ast_atomic_fetchadd_int_slow to check_expr for platforms that need it. 
(closes issue ASTERISK-10986) Reported by: snuffy ................ r92243 | dbailey | 2007-12-10 16:18:25 -0400 (Mon, 10 Dec 2007) | 2 lines Add CLI commands to dynamically set hw and sw gains ................ r92267 | oej | 2007-12-11 05:26:25 -0400 (Tue, 11 Dec 2007) | 2 lines Doxygen updates ................ r92285 | oej | 2007-12-11 10:17:29 -0400 (Tue, 11 Dec 2007) | 2 lines A lot of doxygen updates ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=92303 |