[Home]

Summary:ASTERISK-14953: [patch] Autocreated peers not deleted when unregister explicitly, become zombies
Reporter:Kirill Katsnelson (kkm)Labels:
Date Opened:2009-10-07 04:50:12Date Closed:2011-04-27 15:54:31
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Channels/chan_sip/Registration
Versions:Frequency of
Occurrence
Related
Issues:
causesASTERISK-22428 [patch] SIP unregister does not fully unregister when using Realtime sip peers and Expires not 0 on 200ok
Environment:Attachments:( 0) 016033-chansip-autocreateunreg-1.6.1.12.diff
( 1) 016033-chansip-autocreateunreg-1.6.1.16.diff
( 2) 016033-chansip-autocreateunreg-1.6.2.4.diff
( 3) 016033-chansip-autocreateunreg-trunk.240319.diff
( 4) 016033-tilgman-fixed-refcount.diff
( 5) 20100425__issue16033.diff.txt
( 6) chansip-autocreate-unregfix.diff
Description:When a SIP peer is autocreated (per `autocreatepeer=yes' in sip.conf), and later explicitly un-registers, the peer record in Asterisk is never deleted.

The `sip show peer' command output indicated `Transport : UNKNOWN'. This changes the moment the peer unregisters; before that, the transport is indicated as UDP.

The peer stays in the list forever, but registration with the same name becomes impossible. Each REGISTER message is rejected by Asterisk, and the following error is printed:

[Oct  7 02:21:31] ERROR[26490]: chan_sip.c:11650 register_verify: 'UDP' is not a valid transport for 'dyn-Q-N5'. we only use 'UNKNOWN'! ending call.
[Oct  7 02:21:31] NOTICE[26490]: chan_sip.c:19958 handle_request_register: Registration from '<sip:dyn-Q-N5@192.168.0.91>' failed for '192.168.0.103' - Device not configured to use this transport type

If, however, the peer fails to register within its registration expiration time, the record is deleted as expected.

****** STEPS TO REPRODUCE ******

1. Add a line `autocreatepeer=yes' into the General section of sip.conf and issue `sip reload' CLI command.
2. Register a SIP device with a name not among static peers/users defined in sip.conf. The device used UDP transport. See that `sip show peers' command lists the peer, and `sip show peer <name>' shows Transport: UDP
3. Have the device unregister with the PBX.

Expected behavior:
- Device disappears from the `sip show peers' command output

Observed:
1. Device never goes away from that list.
2. An attempt to register with the same name fails.

****** ADDITIONAL INFORMATION ******

A device in unregistered in three places in chan/chan_sip.c:

1. Upon expiration, in the function expire_register(), near line 10665. There, if the condition `peer->selfdestruct ||    ast_test_flag(&peer->flags[1], SIP_PAGE2_RTAUTOCLEAR)' is true, the device is unlinked from 2 internal device lists. This, as I understand, is the correct behavior.

2. From CLI command `sip unregister <peer-name>'. This calls the above function.

3. From explicit unregistration by the device, in the function parse_register_contact(), near line 10903. There most of the logic of expire_register() is repeated, except the device is not unlinked. This  behavior appears incorrect to me.

I am going to work on a fix and sumbit a patch in a day or two, because correct working autoregistration is critical to me.
Comments:By: Kirill Katsnelson (kkm) 2009-10-08 00:17:55

I am attaching a patch that fixes the issue. The patch is against vanilla 1.6.1.6 release. Has been abused for a day by application developers, but within a narrowly limited usage pattern. Not tested with “realitme” configuration, for one.

By: Kirill Katsnelson (kkm) 2010-01-14 23:53:09.000-0600

Adding a patch against the latest trunk (rev 240319)

By: Olle Johansson (oej) 2010-02-17 04:20:39.000-0600

"THanks for contributing. Please stand by." And apologies for no one responding here.

By: Olle Johansson (oej) 2010-02-17 04:23:47.000-0600

We need to separate the addition of the "Cause" to the manager event from the bug fixes. And we propably need a bug fix for 1.4 too.

By: Kirill Katsnelson (kkm) 2010-02-17 18:47:12.000-0600

The 1.6.1.6 patch does not apply cleanly to neither 1.6.1.12 nor the 1.6.2 branch. I have a patch for 1.6.1.12. Will that help?

The patch against 1.6.1 is not readable. I rearranged functions in the 1.8 trunk patch with the only purpose that the diff could be understood.

I do not understand the comment about the "Cause" in the manager. I never attempted to fix the problem in 1.4.

By: Olle Johansson (oej) 2010-02-18 01:46:50.000-0600

If I understand it right, your patch adds a new header called "Cause" to an existing manager event. We can't do that in release code, only in trunk.

By: Kirill Katsnelson (kkm) 2010-02-18 09:06:30.000-0600

No, this is not the case. There are 2 slightly mutated clones of the same code currently; I rearranged them into one function unregister_peer that is called from 2 places. One place (expire_peer, if I remember correctly) has the Cause field:

- manager_event(EVENT_FLAG_SYSTEM, "PeerStatus", "ChannelType: SIP\r\nPeer: SIP/%s\r\nPeerStatus: Unregistered\r\nCause: Expired\r\n", peer->name);

and the second does not

- manager_event(EVENT_FLAG_SYSTEM, "PeerStatus", "ChannelType: SIP\r\nPeer: SIP/%s\r\nPeerStatus: Unregistered\r\n", peer->name);

The behavior is preserved in the new function through its second argument `cause':

+ if (mgr_cause)
+ manager_event(EVENT_FLAG_SYSTEM, "PeerStatus", "ChannelType: SIP\r\nPeer: SIP/%s\r\nPeerStatus: Unregistered\r\nCause: %s\r\n", peer->name, mgr_cause);
+ else
+ manager_event(EVENT_FLAG_SYSTEM, "PeerStatus", "ChannelType: SIP\r\nPeer: SIP/%s\r\nPeerStatus: Unregistered\r\n", peer->name);



By: Kirill Katsnelson (kkm) 2010-02-18 09:16:03.000-0600

Unrelated question: is there any way to keep code formatted in tickets? Can't find any information even on mantisbt.org

By: Kirill Katsnelson (kkm) 2010-02-19 21:00:10.000-0600

Uploaded patches against 1.6.1.12 (working version here), 1.6.1.16 (latest 1.6.1 - compiles, not tested), 1.6.2.4 (same).

By: Kirill Katsnelson (kkm) 2010-03-03 13:31:31.000-0600

We have upgraded to 1.6.2.4, so the patch 016033-chansip-autocreateunreg-1.6.2.4.diff above is confirmed working (in case anybody uses it already).

By: Tilghman Lesher (tilghman) 2010-04-25 12:52:59

The patch seems to be a little overcomplicated, based upon your report.  I've uploaded a patch based upon your description (for trunk), which is a lot simpler.  Does this not work for your needs?

By: Kirill Katsnelson (kkm) 2010-04-26 01:08:06

I think it should work (I'll test tomorrow to be completely sure). I just tried to collect the 2 almost identical copies of code into one function with a clear semantics. This is just my habit of eliminating such things early from my codebase, but of course I am not insisting on imposing the same policy onto that of yours. :-)

By: Kirill Katsnelson (kkm) 2010-04-30 20:41:42

I am sure you noticed that, but your patch changes event semantics, albeit a little. Hope it wont break regression or make somebody's life harder. Namely, the PeerStatus event is now sent with 'Reason: Expired' even though the registration ended on UAC's request. I tried to preserve the behavior completely in my patch -- but really I do not know how important it is too keep it fully backward-compatible.

Anyhow, the patch has been on the production machine for quite a few hours, and I do not see any ill effects. Agents coming and going, and there are no zombified registrations.

By: Kirill Katsnelson (kkm) 2010-07-31 05:06:15

*** STOP THE SHOW ***

tilghman's patch has a reference count problem (and mine likely had, too). Please do not merge. I have a reliable real-life crash scenario. I am working on a fix (there are other refcounting problems with autopeers).

By: Kirill Katsnelson (kkm) 2010-07-31 22:40:58

Refcounting problem fixed in the latest attached patch. Show must go on.

NB: This fix has not been tested separately from ASTERISK-1752774. Both are required together to fix refcounting properly. Please add a xref, I do not have the privilege.



By: ibercom (ibercom) 2011-04-22 12:58:24

This bug is related to 0018699, although it is about asterisk 1.4 and realtime.
The bug description is similar. Additional information -3- is my problem.
I will try patch 20100425__issue16033.diff.txt. I don't know about patch  016033-tilgman-fixed-refcount.diff and asterisk 1.4



By: Terry Wilson (twilson) 2011-04-26 16:08:04

This looks good to me. I've tested and it appears to work. I'll commit it.

By: Digium Subversion (svnbot) 2011-04-26 17:47:58

Repository: asterisk
Revision: 315671

U   branches/1.4/channels/chan_sip.c

------------------------------------------------------------------------
r315671 | twilson | 2011-04-26 17:47:58 -0500 (Tue, 26 Apr 2011) | 11 lines

Make sure unregistering a peer unlinks it from the peer container

Instead of mostly copying the code from expire_register, just use the function
that "does the right thing".

(closes issue ASTERISK-14953)
Reported by: kkm
Patches:
     016033-tilgman-fixed-refcount.diff uploaded by kkm (license 888)
Tested by: kkm, tilghman, twilson

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=315671

By: Digium Subversion (svnbot) 2011-04-26 17:52:26

Repository: asterisk
Revision: 315672

_U  branches/1.6.2/
U   branches/1.6.2/channels/chan_sip.c

------------------------------------------------------------------------
r315672 | twilson | 2011-04-26 17:52:26 -0500 (Tue, 26 Apr 2011) | 18 lines

Merged revisions 315671 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
 r315671 | twilson | 2011-04-26 15:47:56 -0700 (Tue, 26 Apr 2011) | 11 lines
 
 Make sure unregistering a peer unlinks it from the peer container
 
 Instead of mostly copying the code from expire_register, just use the function
 that "does the right thing".
 
 (closes issue ASTERISK-14953)
 Reported by: kkm
 Patches:
       016033-tilgman-fixed-refcount.diff uploaded by kkm (license 888)
 Tested by: kkm, tilghman, twilson
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=315672

By: Digium Subversion (svnbot) 2011-04-26 17:56:20

Repository: asterisk
Revision: 315673

_U  branches/1.8/
U   branches/1.8/channels/chan_sip.c

------------------------------------------------------------------------
r315673 | twilson | 2011-04-26 17:56:20 -0500 (Tue, 26 Apr 2011) | 25 lines

Merged revisions 315672 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.6.2

................
 r315672 | twilson | 2011-04-26 15:52:25 -0700 (Tue, 26 Apr 2011) | 18 lines
 
 Merged revisions 315671 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r315671 | twilson | 2011-04-26 15:47:56 -0700 (Tue, 26 Apr 2011) | 11 lines
   
   Make sure unregistering a peer unlinks it from the peer container
   
   Instead of mostly copying the code from expire_register, just use the function
   that "does the right thing".
   
   (closes issue ASTERISK-14953)
   Reported by: kkm
   Patches:
         016033-tilgman-fixed-refcount.diff uploaded by kkm (license 888)
   Tested by: kkm, tilghman, twilson
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=315673

By: Digium Subversion (svnbot) 2011-04-26 18:10:59

Repository: asterisk
Revision: 315675

_U  trunk/
U   trunk/channels/chan_sip.c

------------------------------------------------------------------------
r315675 | twilson | 2011-04-26 18:10:59 -0500 (Tue, 26 Apr 2011) | 32 lines

Merged revisions 315673 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.8

................
 r315673 | twilson | 2011-04-26 15:56:19 -0700 (Tue, 26 Apr 2011) | 25 lines
 
 Merged revisions 315672 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.6.2
 
 ................
   r315672 | twilson | 2011-04-26 15:52:25 -0700 (Tue, 26 Apr 2011) | 18 lines
   
   Merged revisions 315671 via svnmerge from
   https://origsvn.digium.com/svn/asterisk/branches/1.4
   
   ........
     r315671 | twilson | 2011-04-26 15:47:56 -0700 (Tue, 26 Apr 2011) | 11 lines
     
     Make sure unregistering a peer unlinks it from the peer container
     
     Instead of mostly copying the code from expire_register, just use the function
     that "does the right thing".
     
     (closes issue ASTERISK-14953)
     Reported by: kkm
     Patches:
           016033-tilgman-fixed-refcount.diff uploaded by kkm (license 888)
     Tested by: kkm, tilghman, twilson
   ........
 ................
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=315675

By: Digium Subversion (svnbot) 2011-04-27 15:54:31

Repository: asterisk
Revision: 315989

U   branches/1.4/channels/chan_sip.c

------------------------------------------------------------------------
r315989 | seanbright | 2011-04-27 15:54:31 -0500 (Wed, 27 Apr 2011) | 6 lines

Partial revert of r315671 which removed a logging statement and not a manager event.

Reported by ibercom in #asterisk-bugs.

(issue ASTERISK-14953)

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=315989