[Home]

Summary:ASTERISK-30469: res_pjsip_pubsub: Regression for subscription shutdowns
Reporter:N A (InterLinked)Labels:
Date Opened:2023-03-15 17:06:05Date Closed:2023-04-03 08:04:15
Priority:MajorRegression?
Status:Closed/CompleteComponents:Resources/res_pjsip_pubsub
Versions:20.2.0 Frequency of
Occurrence
Constant
Related
Issues:
Environment:pjproject 2.13Attachments:
Description:ASTERISK-30325 caused a regression with subscription shutdowns in res_pjsip_pubsub.c.

On the upside, it does appear to have fixed a crash issue I was having with PJSIP which I now see was a bug in pjproject the whole time.

On the downside, when HAVE_PJSIP_EVSUB_PENDING_NOTIFY is defined, clean_sub_tree is *NEVER* called (as opposed to called in a different place as the code allegedly says it should be), so the refactoring of this code seems to be wrong or incomplete. This is confirmed by adding debugs and code that previously ran before this patch to notify a pubsub module of a subscription module is no longer invoked at all (see below). This can cause all kinds of problems in pubsub modules since they never learn a subscription no longer exists. In particular, the case I'm testing is when an endpoint sends a SUBSCRIBE with an Expires of 0 to cancel the subscription (not sure if that matters). Reverting to a version of res_pjsip_pubsub prior to this change makes things work properly.

This comment that was added recently seems suspect to me, though it doesn't elaborate specifically on which invocation is being referenced:

{noformat}
/* for pjproject <2.13, this cleanup occurs here.  For >=2.13, pubsub_on_evsub_state
  is called before pubsub_on_rx_refresh and so must be cleaned there.*/
{noformat}

If we look at the trace below, this does not seem to match up. pubsub_on_evsub_state is called before pubsub_on_rx_refresh, but it is also called AFTER it as well. So I can see that this would not be the right place to call clean_sub_tree necessarily, as you wouldn't want it firing twice for TERMINATED events. Nonetheless, it needs to be called somewhere.

A patch that resolves this issue and restores the correct shutdown behavior is attached, but further scrutiny is probably warranted.

{noformat}
[2023-03-15 17:35:32.803] DEBUG[206558]: res_pjsip_pubsub.c:3959 pubsub_on_evsub_state: evsub 0x7f4824060bf8 state TERMINATED event TSX_STATE sub_tree 0x  7f48240601e0 sub_tree state Normal
[2023-03-15 17:35:32.803] DEBUG[206558]: res_pjsip_pubsub.c:3981 pubsub_on_evsub_state: Called here2
[2023-03-15 17:35:32.803] DEBUG[206558]: res_pjsip_pubsub.c:4109 pubsub_on_rx_refresh: evsub 0x7f4824060bf8 sub_tree 0x7f48240601e0 sub_tree state Normal
[2023-03-15 17:35:32.803] DEBUG[206558]: res_pjsip_pubsub.c:4202 pubsub_on_rx_refresh: XXXXXXX else
[2023-03-15 17:35:32.803] DEBUG[206558]: res_pjsip_pubsub.c:686 subscription_persistence_update: Updating persistence for 'PolycomSCA1->PolycomSCA1'  pru  ne on boot: no
[2023-03-15 17:35:32.804] DEBUG[206558]: res_pjsip_pubsub.c:3990 pubsub_on_refresh_timeout: sub_tree 0x7f48240601e0 sub_tree state TerminatePending
[2023-03-15 17:35:32.804] DEBUG[206558]: res_pjsip_pubsub.c:686 subscription_persistence_update: Updating persistence for 'PolycomSCA1->PolycomSCA1'  pru  ne on boot: no
[2023-03-15 17:35:32.846] DEBUG[206558]: res_pjsip_pubsub.c:4235 pubsub_on_rx_refresh: Called here
[2023-03-15 17:35:32.846] DEBUG[206558]: res_pjsip_pubsub.c:4237 pubsub_on_rx_refresh: Called here2
[2023-03-15 17:35:32.846] <--- Transmitting SIP response (581 bytes) to UDP:192.168.10.106:5060 --->
[2023-03-15 17:35:32.846] SIP/2.0 200 OK
[2023-03-15 17:35:32.846] Via: SIP/2.0/UDP 192.168.10.106;rport=5060;received=192.168.10.106;branch=z9hG4bK489e9c0651EAD9AD
[2023-03-15 17:35:32.846] Call-ID: e9b6fbd1-d4bb662e-1c7a7f25@192.168.10.106
[2023-03-15 17:35:32.846] From: "401" <sip:PolycomSCA1@192.168.10.143>;tag=19AED809-557F37D6
[2023-03-15 17:35:32.846] To: <sip:PolycomSCA1@192.168.10.143>;tag=86490954-62cb-4ea1-bcc6-d1297cfe3351
[2023-03-15 17:35:32.846] CSeq: 3 SUBSCRIBE
[2023-03-15 17:35:32.846] Expires: 0
[2023-03-15 17:35:32.846] Contact: <sip:192.168.10.143:14444>
[2023-03-15 17:35:32.846] Allow: OPTIONS, REGISTER, SUBSCRIBE, NOTIFY, PUBLISH, INVITE, ACK, BYE, CANCEL, UPDATE, PRACK, MESSAGE, REFER
[2023-03-15 17:35:32.846] Supported: 100rel, timer, replaces, norefersub
[2023-03-15 17:35:32.846] Server: Asterisk
[2023-03-15 17:35:32.846] Content-Length:  0
[2023-03-15 17:35:32.846]
[2023-03-15 17:35:32.846]
[2023-03-15 17:35:32.847] <--- Transmitting SIP request (677 bytes) to UDP:192.168.10.106:5060 --->
<snip>
[2023-03-15 17:35:32.847]
[2023-03-15 17:35:32.847]
[2023-03-15 17:35:32.847] DEBUG[206558]: res_pjsip_pubsub.c:3959 pubsub_on_evsub_state: evsub 0x7f4824060bf8 state TERMINATED event USER sub_tree 0x7f482  40601e0 sub_tree state TerminateInProgress
[2023-03-15 17:35:32.847] DEBUG[206558]: res_pjsip_pubsub.c:3981 pubsub_on_evsub_state: Called here2
[2023-03-15 17:35:32.898] <--- Received SIP response (545 bytes) from UDP:192.168.10.106:5060 --->
[2023-03-15 17:35:32.898] SIP/2.0 200 OK

PJPROJECT compile time config currently running against:
PJ_CC_NAME/VER_(1,2,3)    : gcc-10.2.1
PJ_DEBUG                  : 0
PJ_EXCEPTION_USE_WIN32_SEH: 0
PJ_FUNCTIONS_ARE_INLINED  : 0
PJ_HAS_EVENT_OBJ          : 1
PJ_HAS_EXCEPTION_NAMES    : 1
PJ_HAS_FLOATING_POINT     : 1
PJ_HAS_HIGH_RES_TIMER     : 1
PJ_HAS_INT64              : 1
PJ_HAS_IPV6               : 1
PJ_HAS_PENTIUM            : 0
PJ_HAS_POOL_ALT_API       : 0
PJ_HAS_SEMAPHORE          : 1
PJ_HAS_TCP                : 1
PJ_HAS_THREADS            : 1
PJ_IOQUEUE_HAS_SAFE_UNREG : 1
PJ_IOQUEUE_MAX_HANDLES    : 5000
PJ_IS_(BIG/LITTLE)_ENDIAN : little-endian
PJ_LOG_MAX_LEVEL          : 6
PJ_LOG_MAX_SIZE           : 4000
PJ_LOG_USE_STACK_BUFFER   : 1
PJ_LOG_USE_STACK_BUFFER   : 1
PJ_MAX_EXCEPTION_ID       : 16
PJ_MAX_HOSTNAME           : 256
PJ_M_NAME                 : x86_64
PJ_OS_HAS_CHECK_STACK     : 0
PJ_OS_NAME                : x86_64-unknown-linux-gnu
PJ_POOL_DEBUG             : 0
PJ_TIMESTAMP_USE_RDTSC:   : 0
PJ_VERSION                : 2.13
ioqueue type              : epoll[0x3]
{noformat}
Comments:By: Asterisk Team (asteriskteam) 2023-03-15 17:06:08.281-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

Please note that by submitting data, code, or documentation to Sangoma through JIRA, you accept the Terms of Use present at [https://www.asterisk.org/terms-of-use/|https://www.asterisk.org/terms-of-use/].

By: Pavel Šidlo (lbpsidlo) 2023-03-23 02:10:00.349-0500

We are having a memory leak related to this issue. This patch helped us a lot in Asterisk 18.17. Still, according to the 'pjsip show subscriptions inbound' listing, non-expiring subscriptions sporadically appear.

By: Joshua C. Colp (jcolp) 2023-03-23 03:57:53.026-0500

Yes, the patch appears to be incomplete and introduces a different memory leak by fixing this one. [~mbradeen] tested the original patch and determined that, and is experimenting/working on a patch to fix both.

By: Michael Bradeen (mbradeen) 2023-03-29 16:40:31.272-0500

I posted a new patch/review, please let me know if this fully resolves the leaks or causes any other issues.

https://gerrit.asterisk.org/c/asterisk/+/20028

By: N A (InterLinked) 2023-03-29 17:00:33.307-0500

I just tested it and the correctness issue is resolved.
I don't have MALLOC_DEBUG on that system currently so I can't say anything immediately about leaks, though that wasn't my initial concern.
I'll test this at another site as well that gets more traffic which should expose any correctness issues.

By: Friendly Automation (friendly-automation) 2023-04-03 08:04:16.564-0500

Change 20028 merged by Friendly Automation:
res_pjsip_pubsub: subscription cleanup changes

[https://gerrit.asterisk.org/c/asterisk/+/20028|https://gerrit.asterisk.org/c/asterisk/+/20028]

By: Friendly Automation (friendly-automation) 2023-04-03 08:06:16.042-0500

Change 20018 merged by Friendly Automation:
res_pjsip_pubsub: subscription cleanup changes

[https://gerrit.asterisk.org/c/asterisk/+/20018|https://gerrit.asterisk.org/c/asterisk/+/20018]

By: Friendly Automation (friendly-automation) 2023-04-03 09:43:46.438-0500

Change 20017 merged by George Joseph:
res_pjsip_pubsub: subscription cleanup changes

[https://gerrit.asterisk.org/c/asterisk/+/20017|https://gerrit.asterisk.org/c/asterisk/+/20017]

By: Friendly Automation (friendly-automation) 2023-04-03 09:44:00.587-0500

Change 20040 merged by George Joseph:
res_pjsip_pubsub: subscription cleanup changes

[https://gerrit.asterisk.org/c/asterisk/+/20040|https://gerrit.asterisk.org/c/asterisk/+/20040]

By: Friendly Automation (friendly-automation) 2023-04-03 09:44:11.563-0500

Change 20019 merged by George Joseph:
res_pjsip_pubsub: subscription cleanup changes

[https://gerrit.asterisk.org/c/asterisk/+/20019|https://gerrit.asterisk.org/c/asterisk/+/20019]