
Summary: ASTERISK-17249: SIP registration message sequencing issue causes inability to register
Reporter: Ed Brundage (foxnetradio)
Labels:
Date Opened: 2011-01-14 11:06:44.000-0600
Date Closed: 2013-06-20 11:35:11
Priority: Major
Regression?: No
Status: Closed/Complete
Components: Channels/chan_sip/Registration
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments: (0) case1-success.txt, (1) case2-failure.txt
Description: We have been having problems with our SIP trunks losing connectivity, so I rolled up my sleeves and did some debugging.

At first glance, it looks as if Asterisk is ignoring SIP REGISTER OK messages from the provider.  But, after closer inspection (with all debugging and verbosity enabled), I realized it looks more like a problem with message sequencing in the SIP registration and/or registry routines.

The problem occurs when the trunk registration expires (during re-registration). However, the issue must be explained via two separate scenarios, which differ in whether the provider accepts Asterisk's saved re-auth information during registration. In the first case, where the re-auth information is still valid, the issue costs little beyond unneeded bandwidth. In the second case, however, where the provider requires an authentication challenge/response exchange before registering, the issue becomes very serious. In both cases, though, there still appears to be a logic error in message sequencing.

The behavior is as follows:

Case 1)  Asterisk sends a REGISTER message, and within a few milliseconds the provider responds with the expected 200 OK. However, Asterisk then reports that the response's sequence number is one higher than expected and drops the packet (i.e. "Ignoring out of order response 128 (expecting 127)"). This cycle repeats: Asterisk retransmits the same REGISTER, the provider responds appropriately with the same OK, and each time Asterisk drops the packet with the same "out of order" complaint. After repeating this cycle five times, Asterisk gives up on retransmitting the packet, then recognizes a REGISTER OK carrying the correct (later) sequence number and completes registration successfully.
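The "out of order" message in Case 1 comes from a sanity check on the CSeq of incoming responses. The sketch below is hypothetical and is not Asterisk's code: it assumes a dialog object that remembers the CSeq of its last outgoing request (the field name `ocseq` is an assumption) and discards any response carrying a different CSeq. It shows why a response to the new REGISTER (CSeq 128) is dropped if it is matched against a dialog that last sent CSeq 127.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical dialog record; not chan_sip's struct sip_pvt. */
struct dialog {
    const char *call_id;
    unsigned int ocseq; /* CSeq of the last request this dialog sent */
};

/* Returns true if the response is processed, false if it is discarded.
 * Simplification: only an exact CSeq match is accepted. */
static bool accept_response(const struct dialog *d, unsigned int resp_cseq)
{
    if (resp_cseq != d->ocseq) {
        printf("Ignoring out of order response %u (expecting %u)\n",
               resp_cseq, d->ocseq);
        return false;
    }
    return true;
}
```

With a stale dialog (`ocseq` 127) and a fresh one (`ocseq` 128) sharing a Call-ID, a 200 OK for CSeq 128 is rejected by the first and accepted by the second; which dialog the response is routed to is the crux of this issue.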

Case 2)  If the provider has decided that our standing authorization is no longer valid, it answers the REGISTER with a "401 Unauthorized" challenge, and again this repeats a number of times. Because Asterisk considers the response out of order, it never answers the challenge; it drops the message and sends another REGISTER. Our provider, after sending four "401 Unauthorized" messages without receiving a valid challenge response, concludes that the UA attempting registration is fraudulent and blocks its incoming traffic (i.e. via firewall) for approximately an hour.
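The Case 2 failure can be expressed as a toy model. This is a hypothetical sketch, not Asterisk or provider code: `on_response()` and `provider_blocks_client()` are invented names, and the threshold of four unanswered challenges is taken from the behavior described above for this particular provider. It shows that when every 401 is discarded, the client only retransmits, so the provider's lockout fires before a challenge is ever answered.

```c
#include <stdbool.h>

/* Threshold reported for this provider; other providers may differ. */
#define PROVIDER_MAX_UNANSWERED_401 4

enum uac_action { RETRANSMIT_REGISTER, SEND_AUTHENTICATED_REGISTER, REGISTERED };

/* Client reaction to a response, depending on whether it was matched to
 * the current transaction (i.e. passed the CSeq check). */
static enum uac_action on_response(int status, bool matched)
{
    if (!matched)
        return RETRANSMIT_REGISTER;         /* response dropped; timer fires */
    if (status == 401)
        return SEND_AUTHENTICATED_REGISTER; /* answer the digest challenge */
    return REGISTERED;                      /* e.g. 200 OK */
}

/* Simulates up to max_attempts REGISTERs, each answered with a 401.
 * Returns true if the provider blocks the client before the client
 * ever answers a challenge. */
static bool provider_blocks_client(bool responses_matched, int max_attempts)
{
    int unanswered = 0;
    for (int i = 0; i < max_attempts; i++) {
        enum uac_action a = on_response(401, responses_matched);
        if (a == SEND_AUTHENTICATED_REGISTER)
            return false;                   /* challenge answered in time */
        if (++unanswered >= PROVIDER_MAX_UNANSWERED_401)
            return true;                    /* provider firewalls the client */
    }
    return false;
}
```

If responses are matched correctly, the first 401 triggers an authenticated REGISTER and no lockout occurs; if they are discarded, the fourth unanswered 401 triggers the block.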

The logs clearly indicate the correct sequence numbers are being sent and received throughout the entire message flow, so this must be a variable increment issue or something of that nature.

Of Interest:

1)  This issue only manifests on re-registration.  It does not occur on initial trunk creation (i.e. when Asterisk starts).
2)  Our provider, Broadvoice, recommends "pedantic=no" be set in sip.conf.  Setting "pedantic=yes" on our current production version of Asterisk (1.6.2.13) has no effect on this issue.  However, on our test PBX, running Asterisk 1.8.1.1, setting "pedantic=yes" resolves the issue perfectly.
Comments:

By: Ed Brundage (foxnetradio) 2011-01-14 11:08:12.000-0600

Attaching log output for case one and two.  Both illustrate the sequencing issue.

By: Ed Brundage (foxnetradio) 2011-01-14 11:23:58.000-0600

I apologize for the mismatch in the version selections in the dropdowns. We are using a layered product (Elastix), and the underlying Asterisk version is 1.6.2.13, which was not among the selections.

Using another version of Asterisk in our production environment is certainly possible, but we are doing our best to let Elastix control the versioning of Asterisk and the other products built into the solution, so we're not sure how to proceed here. Waiting until Elastix releases an Asterisk 1.8 edition wouldn't really be an option for us. If a patch becomes available, I will let Palosanto know so it can be included in an upcoming edition.

I am also considering putting a patch together myself. However, although I have been an experienced developer for almost two decades, this would be my first time working in the Asterisk codebase. Since the pedantic setting seems to fix the problem in 1.8, this may end up being a simple backport. Finding the sequence issue may be a little tougher.

By: Ed Brundage (foxnetradio) 2011-01-24 12:44:02.000-0600

Just FYI -- been incredibly busy the last week or so -- still trying to find some time to dig into the code.

I realize this is probably a more difficult ticket to reproduce, as it requires a specific trunk provider for testing, but I'd love ideas, if anyone has any.

Otherwise, I'll keep everyone posted...

By: roman sidler (rsidler) 2013-06-20 11:02:31.896-0500

A little bit late, but the issue is still open. We've encountered the same problem on 1.6.1.
The situation is reproducible when the REGISTER expiration time is set below 64*T1 (typically 32 s).
In that case, the lookup function for existing dialogs (find_call()) picks up the previous dialog.
A dialog is identified by its "Call-ID:", and for registrations the Call-ID stays the same value across registrations (as the RFC recommends).
pedantic=yes does not work correctly for REGISTER messages.
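The 32 s figure follows from RFC 3261's default T1 of 500 ms: 64 × 500 ms = 32 000 ms. The helper below simply encodes the condition stated in this comment; the names are hypothetical, and it assumes (as the comment implies) that the previous registration dialog stays alive for 64*T1 before it is destroyed.

```c
#include <stdbool.h>

/* RFC 3261 default value of T1 (round-trip time estimate). */
#define RFC3261_DEFAULT_T1_MS 500u

static unsigned int sixty_four_t1_ms(unsigned int t1_ms)
{
    return 64u * t1_ms;
}

/* Hypothetical check: true if a re-REGISTER driven by this expiry would be
 * sent while the previous dialog (same Call-ID) is still alive, assuming
 * that dialog lingers for 64*T1. */
static bool reregister_overlaps_old_dialog(unsigned int expiry_s, unsigned int t1_ms)
{
    return expiry_s * 1000u < sixty_four_t1_ms(t1_ms);
}
```

For example, an expiry of 20 s overlaps the lingering dialog, while an expiry of 3600 s does not.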
I modified the 1.6.1 code so that both the "From:" tag AND the "Call-ID" are always considered.
It works, but it is a rather invasive code change; the 1.8 solution is much better.
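The difference between the two lookups described in this comment can be sketched as follows. This is hypothetical code, not chan_sip's find_call() and not the 1.8 fix: the table, field names and tag values are invented for illustration. Both registration dialogs share a Call-ID but carry different local From tags (in a response, the From tag is the one we generated), so matching on Call-ID alone returns whichever dialog comes first, here the stale one.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical dialog record for illustration only. */
struct dialog {
    const char *call_id;
    const char *from_tag; /* tag we placed in From: of our requests */
    unsigned int ocseq;   /* CSeq of our last outgoing request */
};

/* Call-ID only: returns the first dialog with a matching Call-ID. */
static const struct dialog *find_by_callid(const struct dialog *tbl, size_t n,
                                           const char *call_id)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tbl[i].call_id, call_id) == 0)
            return &tbl[i];
    return NULL;
}

/* Call-ID plus From tag: distinguishes dialogs that share a Call-ID. */
static const struct dialog *find_by_callid_and_tag(const struct dialog *tbl, size_t n,
                                                   const char *call_id,
                                                   const char *from_tag)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tbl[i].call_id, call_id) == 0 &&
            strcmp(tbl[i].from_tag, from_tag) == 0)
            return &tbl[i];
    return NULL;
}
```

With the previous dialog (CSeq 127) still in the table ahead of the current one (CSeq 128), the Call-ID-only lookup hands the response to the stale dialog, which then rejects it as out of order; adding the From tag selects the current dialog.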

By: Matt Jordan (mjordan) 2013-06-20 11:35:05.513-0500

Unfortunately, 1.6 Asterisk is no longer supported. Since this issue is already solved in Asterisk 1.8 with pedantic=yes (which is the default), I don't expect any further action will be taken on this issue.