[Home]

Summary:ASTERISK-07824: Calls CANCEL'ed or BYE'ed before 180+183 result in 487 retries and timeouts
Reporter:Jeff Pyle (jpyle)Labels:
Date Opened:2006-09-26 12:17:25Date Closed:2007-10-30 10:20:15
Priority:MinorRegression?No
Status:Closed/CompleteComponents:Core/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) 487_trace.txt
( 1) 487_trace1.txt
( 2) 487test_invitestate.txt
( 3) 487test_sipp.txt
( 4) 487test_uas.txt
( 5) 487test.txt
( 6) customerbox-cap.txt
( 7) customerbox-debugs.txt
( 8) networkbox-cap.txt
( 9) networkbox-debugs.txt
Description:One Asterisk box sending another a 180+183 seems to break the second's ability to CANCEL/BYE a call properly before it is answered.

****** ADDITIONAL INFORMATION ******

There are 2 Asterisk boxes in play here; we'll call them the Network and the Customer.  The Network box is sending calls on to a carrier that sends back 180 Ringing w/ SDP.  When the Network box receives that, it relays it to the Customer box as a plain 180, then a 183 w/ SDP right after.

When the Customer box receives the 180+183, it seems to break its ability to properly CANCEL (or is it BYE?) the call before its answered.  The result is the Network box sending 487s to the customer box that go unanswered.

I don't know if the problem is in the Network or Customer side; I don't know which is mishandling something.

Network box is 1.2.12.1, Customer box is SVN-branch-1.2-r43634.
Comments:By: Serge Vecher (serge-v) 2006-09-26 12:26:15

jpyle: we need to see the debug output also. Please redo the logs as per following, paying special attention to item 2):

ollowing:
1) Prepare test environment (reduce the amount of unrelated traffic on the server);
2) Make sure your logger.conf has the following line:
  console => notice,warning,error,debug
3) restart Asterik.
4) Enable SIP transaction logging with the following CLI commands:
set debug 4
set verbose 4
sip debug
5) Save complete console log to file and _attach_ said file to the bug.
 i.e. asterisk -Tvvvvvdddddgc | tee /tmp/sipdebug.txt

By: Jeff Pyle (jpyle) 2006-09-26 13:12:30

My apologies.  The new debugs files should have everything necessary.

A test environment in this case is not possible because the carrier is attached to the production box, and I cannot get the Network box to send on a 180 w/ SDP, so debugs must be taken from the production box.  As such, I've manually stripped some irrelevant debug info out, including anything with res_config_mysql, cdr population, other clients' registration traffic, OPTIONS qualifies, and Executing/Goto/Macro lines.

02015 is the Polycom handset that placed the call to the Customer box, which passed it along to the Network box, which passed it to a peer called carrier-out in this case.

I think this should be it.  Let me know if I forgot anything this time.

By: Jeff Pyle (jpyle) 2006-10-02 07:32:39

Initial efforts to get the carrier to send me 183 Session Progress w/ SDP instead of 180 Ringing w/ SDP have failed.  They say they're following RFC 3261 and doing it "my way" would be violating it.  Grr.  When I read the RFC, I don't see it specified either way.

It looks like we're treading into some of the same territory that bug ASTERISK-4880 did.  The current behavior of an Asterisk UAS sending an empty 180 *and* a 183 w/ SDP to an Asterisk UAC seems to cause the error I've described above.  Would a simple solution be to send only the 183?

By: Jeff Pyle (jpyle) 2006-10-03 13:45:43

The carrier's being stubborn (imagine that).  It looks like I'm stuck with 180 w/ SDP.  But that doesn't seem to be the root cause anymore.  I changed chan_sip.c to send only a 183 when it receives a 180 w/ SDP and it's still causing the 487 Request Terminated from the UAS Asterisk to timeout to the UAC Asterisk.

Even if the Polycom behind the Customer box cancels the call before the  Customer box receives a 18x response from the Network box, the Customer box waits to send a BYE to the Network box until it receives a 18x response from the network box.  Then the Network box sends a 487 Request Terminated to the Customer box, which the customer box ignores.  It eventually times out on the Network box.  Is that because the Customer box has already distroyed the call?

Shouldn't there be a CANCEL somewhere in there?  My Polycom phone sends a CANCEL to the Customer box between the 100 trying and the 18x response.  The Customer box then sends a 487 Request Terminated, and a 200 OK.  The Polycom responds with an ACK.  The Cisco 7940 I have here does the same thing.  When I read the RFC, the way I understand it validates the Cisco and Polycom behavior.

By: mikma (mikma) 2006-10-24 10:45:32

It looks like The Network box fails to process the ACK request according to RFC 3261. The ACK request has correct CSeq number, but it's not accepted anyway.

Sep 26 13:41:33 DEBUG[2934]: chan_sip.c:11178 handle_request: **** Received ACK (6) - Command in SIP ACK
Sep 26 13:41:33 DEBUG[2934]: chan_sip.c:11182 handle_request: Ignoring too old SIP packet packet 103 (expecting >= 104)
Sep 26 13:41:33 DEBUG[2934]: chan_sip.c:11379 sipsock_read: SIP message could not be handled, bad request: 074d7ae656a856146ce5540672e1be87@10.2.1.1

By: Olle Johansson (oej) 2006-10-26 13:17:14

Wonder if this is the same From/to tag/name mix problem as in a lot of other bugs?

By: Olle Johansson (oej) 2006-11-12 13:56:45.000-0600

Please check changes in the branch mentioned in my bugnote in issue ASTERISK-7293. Thank you.

All testing helps so we can solve these issues.

By: Olle Johansson (oej) 2006-11-15 15:33:25.000-0600

Any test results? Please check out the "invitestate" branch and test, test, test.

By: Jeff Pyle (jpyle) 2006-11-20 10:00:06.000-0600

I have been around very little lately, hence no test results.  I'll have to give this a shot next week (after Thanksgiving).

By: Juan Alvarez (juanxarg) 2006-11-30 18:04:46.000-0600

Hi, I think I've seen an issue related to this.

See the logs I send. They were produced with sipp (sipp.sf.net). The scenario I prepared looks like this:

                                Messages  Retrans   Timeout   Unexpected-Msg
     INVITE ---------->         1         0         0
        100 <----------         1         0                   0
        180 <----------         1         0                   0
        183 <----------         0         0                   0
        200 <----------  E-RTD1 1         0                   0
        BYE ---------->         1         0         0
        487 <----------         1         0                   0
        ACK ---------->         1         0
      Pause [      0ms]         1                             0
        200 <----------         1         0                   0

I know it breaks the RFC, but it's for testing purposes. I've also attached the sipp logs since y think they are clearer... (see 487test*.txt)

What I've noticed is that although Asterisk has sent an 487, it still send OKs for both BYE and INVITE requests. I understand this is not an acceptable behaviour.

Am I right?

Hope it helps

Juan

By: Olle Johansson (oej) 2006-12-01 03:45:40.000-0600

Can you please try the same with the invitestate branch? I also would like to see logs not from SIPP but from Asterisk. To identify the problem within Asterisk, I need to see what's going on. Thanks.

By: Olle Johansson (oej) 2006-12-01 03:46:32.000-0600

We should send 200 OK on the bye, since that's a separate transaction. Multiple responses on the INVITE is no good.

By: Juan Alvarez (juanxarg) 2006-12-01 06:26:59.000-0600

oej, the Asterisk log is there, with debug flags and sip debug enabled. The file is 487test.txt. Unfortunately, I forgot to "prune" it, so it's the full thing since Asterisk started.

Yet another issue: I discovered this issue while forcing an Asterisk server to its limit with sipp. Apparently, under heavy load, when the UAS leg delays the response of a call, * messes the messages, responding the UAC's INVITE request with a 487. However, the call wasn't BYEd prior to the termination of the INVITE transaction. It just might be an issue with sipp's Call-Id hash function, since it's probably too simple an on high load could be repeated, but I'm not sure how to debug it. Under that load, Asterisk's logs will be enormous.

Anyway, looking at the messages, I don't think it's a problem with sipp but with Asterisk not getting the ACK for the INVITE and therefore responding as stated in my last post.

Finally, I've uploaded 487test_uas.txt that is the callee leg as seen by sipp. Shouldn't the 487 message be retransmited to the callee? Anyway, the call completes normally in this case with no retransmissions.

Let me know if I can do anything else to help!

Juan

By: Serge Vecher (serge-v) 2006-12-01 10:57:03.000-0600

Alright, to help the debugging process along, these are the instructions that will produce the Asterisk log oej is looking for. Please make sure you are testing with the branch he has mentioned.

1) Prepare test environment (reduce the amount of unrelated traffic on the server);
2) Make sure your logger.conf has the following line:
  console => notice,warning,error,debug
3) restart Asterisk with the following command:
  'asterisk -Tvvvvvdddddgc | tee /tmp/verbosedebug.txt'
4) Enable SIP transaction logging with the following CLI commands:
set debug 4
set verbose 4
sip debug
5) Attach verbosedebug.txt to the issue.

By: Olle Johansson (oej) 2006-12-01 11:27:38.000-0600

I think part of the problem is SIPP. I've tested myself with SIPP and have seen errors where Asterisk acts properly, but SIPP does not. The scenarios are not very complete, usually. I don't know if I can take the time to go through all 200 kb of logging - if you can pick out the particular dialogs that fail I can look at them.

By: Juan Alvarez (juanxarg) 2006-12-01 17:33:44.000-0600

oej, the file uploaded has only one call. The rest is only the log from Asterisk's start up. When i get a little free time, I'll do some more detailed testing. BTW, sipp is a test tool, the scenarios are not supposed to be a complete sip state machine, but a specific test application. I know mine is terribly bad formed, I'll try to make a better implementation and send it.

By: Olle Johansson (oej) 2006-12-02 14:35:38.000-0600

This was related to 183 Session progress.

Please test latest version of the branch of your choice
- invitestate for 1.2
- invitestate-1.4 for 1.4
- trunk for trunk

I believe this is fixed. Your urgent reply is really appreciated, so we can close this issue and merge
the branches. Thanks.

By: Juan Alvarez (juanxarg) 2006-12-04 16:13:09.000-0600

487test_invitestate.txt is Asterisk's debug for invitestate branch.
487 is no longer recieved. The BYE transaction completes OK, but Asterisk keep retransmitting 200 OK for INVITE transaction. Shouldn't the call have been destroyed already?

This scenario is not RFC-Compliant, since calling peer should send CANCEL instead of BYE in order to interrupt INVITE in progress, but I think it could happen if ACK message for INVITE gets lost and calling peer hangs up.

By: Olle Johansson (oej) 2006-12-05 11:43:32.000-0600

juanxarg: That's a different bug. Won't affect any real calls, but definitely a bug. That BYE is wrong but we are nice and honor it. The call is removed after the retransmits, at the end of your log file, but it would be nicer if we cancelled all retransmits at that point.

By: Olle Johansson (oej) 2006-12-05 11:57:04.000-0600

jpyle: can you confirm if the "invitestate" branch solves the problem you've reported in this bug report? Thanks.

By: Anthony LaMantia (alamantia) 2006-12-15 19:33:19.000-0600

jpyle: can you confirm that invite state resolves the problem per oej's request?

By: bcoppens (bcoppens) 2007-01-15 10:06:48.000-0600

Noticed the same problem when calls are CANCEL'ed (from the A-side) before 180/183. Asterisk returns a 487 followed by a 200 OK (resulting in unanserwered 487's). Have been testing with SVN-branch-1.4-r50468.
Looks like the above mentioned scenario's provoke a memory leak in the chan_sip.
Enclosed you will find the corresponding log 487_trace.txt. Please check for call-ID 6994090221311200803-1168871705.

By: bcoppens (bcoppens) 2007-01-16 04:13:42.000-0600

same behavior with invitestate-1.4 ( SVN-oej-invitestate-1.4-r51106)
See log 487_trace1.txt Call_ID 6994090221311298763-1168941506



By: Serge Vecher (serge-v) 2007-01-17 08:21:34.000-0600

since this bug is marked as 1.2.x, do you mind checking in the invitestate branch?

By: bcoppens (bcoppens) 2007-01-18 05:05:24.000-0600

Can't compile invitestate branch

make[1]: Entering directory `/usr/src/asterisk_invitestate/channels'
gcc  -pipe  -Wall -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -g3  -Iinclude -I../include -D_REENTRANT -D_GNU_SOURCE  -O6 -march=i686 -DZAPTEL_OPTIMIZATIONS         -fomit-frame-pointer  -Wno-missing-prototypes -Wno-missing-declarations -DZAPATA_PRI -DIAX_TRUNKING -DCRYPTO -fPIC    -c -o chan_sip.o chan_sip.c
chan_sip.c: In function `handle_response_invite':
chan_sip.c:9753: `SIP_CAN_BYE' undeclared (first use in this function)
chan_sip.c:9753: (Each undeclared identifier is reported only once
chan_sip.c:9753: for each function it appears in.)
make[1]: *** [chan_sip.o] Error 1
make[1]: Leaving directory `/usr/src/asterisk_invitestate/channels'
make: *** [subdirs] Error 1

By: Serge Vecher (serge-v) 2007-03-13 13:53:24

oej: looks like the invitestate branch needs some attention ...

By: bcoppens (bcoppens) 2007-03-20 05:10:06

The test I have performed with the latest 1.4 branch shows that a CANCEL instruction prior to 200-OK (as described above) does not clear the SIP channel on asterisk.
The call is released on both legs but the a "sip show channels" still responds with an active call state:

193.220.188.27   0022898550  3040a299338  00102/00000  unkn  No       Tx: ACK
193.220.188.27   0022508203  7bb2080b020  00102/00000  unkn  No       Tx: ACK
193.220.188.27   0022507223  4810992a57b  00102/00000  unkn  No       Tx: ACK


These entries remains there forever :-)

If you want me to do perform more tests with a 1.2 branch, please feel free to ask.

Same description as in 0009235, at a certain moment we run out of UDP ports.



By: Bluce Ree (tasker) 2007-03-22 18:01:34

Shouldn't these related issues be bumped up to critical given that they can cripple a high-volume production environment?

By: Bluce Ree (tasker) 2007-03-26 11:48:59

Bump ... this is a critical bug.

By: Bluce Ree (tasker) 2007-04-07 14:22:42

After 4 days:

"sip show channels - 4153 active sip channels"

Screen also full of these:

"[Apr  7 15:07:55] ERROR[15759]: rtp.c:1913 ast_rtp_new_with_bindaddr: No RTP ports remaining. Can't setup media stream for this call.
[Apr  7 15:07:55] WARNING[15759]: udptl.c:819 ast_udptl_new_with_bindaddr: No UDPTL ports remaining"

Asterisk rejects calls at this point and required a reboot.

I maintain my recommendation that this be bumped to critical.

By: Olle Johansson (oej) 2007-05-30 02:38:08

I've done a lot of changes recently. Please re-test with 1.2 from svn - latest version. Thank you!

By: Joshua C. Colp (jcolp) 2007-10-30 10:20:12

Closing as this is a 1.2 bug that has already been fixed in 1.4.