
Summary: ASTERISK-27408: Identify causes and fix pjsip/resolver/srv/failover/in_dialog/transport_tcp
Reporter: Corey Farrell (coreyfarrell)
Labels: patch pjsip
Date Opened: 2017-11-09 11:44:40.000-0600
Date Closed: 2017-12-14 18:21:55.000-0600
Priority: Major
Regression?: Yes
Status: Closed/Complete
Components: Third-Party/pjproject
Versions: GIT 15.1.1
Frequency of Occurrence: Constant
Related Issues:
Environment:
Attachments:
( 0) 0020-sip_parser-Add-validity-checking-for-numeric-header-.patch
( 1) fails.pcapng
( 2) works.pcapng
Description: With the recent security fix for the sip_parser, 2 of the resolver unbound tests fail.
transport_unspecified also fails but transport_udp passes.
It's possible that the patch exposed some other issue.
Two pcaps are attached (fails.pcapng and works.pcapng).
Comments:

By: Corey Farrell (coreyfarrell) 2017-11-09 12:17:20.489-0600

Without pjsip_find_msg seems to work for me.

By: George Joseph (gjoseph) 2017-11-10 10:59:36.265-0600

There are actually 3 root causes.

The 503 response XML scenarios were missing a Content-Length header, which is required for TCP connections.
Fixed in https://gerrit.asterisk.org/#/c/7181/
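
As a rough illustration of that first point, the sketch below builds a bodiless 503 with an explicit `Content-Length: 0`. It is written in Python rather than the XML scenario format the tests use, and `build_503` and its parameters are hypothetical names chosen for the example, not part of the testsuite.

```python
def build_503(via, from_hdr, to_hdr, call_id, cseq):
    """Return a bodiless 503 response as a string (hypothetical helper)."""
    lines = [
        "SIP/2.0 503 Service Unavailable",
        f"Via: {via}",
        f"From: {from_hdr}",
        f"To: {to_hdr}",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq}",
        # On a stream transport the receiver needs this header to know
        # where the message ends, even when there is no body.
        "Content-Length: 0",
    ]
    # Header lines are CRLF-terminated; an empty line ends the header section.
    return "\r\n".join(lines) + "\r\n\r\n"
```

On UDP each datagram carries one message, so a missing Content-Length is easier to tolerate there, which may be why transport_udp kept passing.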

The pjsip_find_msg function in sip_parser was returning PJ_SUCCESS when the required Content-Length header was missing.
Fixed in https://gerrit.asterisk.org/#/c/7180/
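
To make the second point concrete, here is a simplified Python model of framing one message out of a TCP byte stream. It is only a sketch of the idea and not the pjproject implementation: `find_msg` and the status strings `"ok"`, `"partial"` and `"missing_content_length"` are hypothetical stand-ins for the real function and its return codes.

```python
import re

# Matches a Content-Length header (or its compact form "l") at the start of a line.
_CONTENT_LENGTH = re.compile(
    rb"^(?:Content-Length|l)[ \t]*:[ \t]*(\d+)[ \t]*\r\n",
    re.IGNORECASE | re.MULTILINE,
)

def find_msg(buf: bytes):
    """Return (status, size) for the first message in buf (hypothetical model)."""
    end = buf.find(b"\r\n\r\n")
    if end < 0:
        # The header section has not fully arrived yet.
        return ("partial", 0)
    head = buf[:end + 2]  # start line and headers, each ending in CRLF
    match = _CONTENT_LENGTH.search(head)
    if match is None:
        # Without a length the end of the message cannot be determined,
        # so this must not be reported as success.
        return ("missing_content_length", 0)
    total = end + 4 + int(match.group(1))
    if len(buf) < total:
        # Headers are complete but the body is still arriving.
        return ("partial", 0)
    return ("ok", total)
```

The middle branch is the one that matters here: a complete header section with no Content-Length is reported as an error instead of as a successfully framed message.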

After fixing those, we still have an issue in pjsip_distributor.
When we receive a 503, we remove the dialog object from the dialog_associations container BUT we continue using the same dialog object for the new INVITE.
When we get the 180 and 200 on the new INVITE, we don't find the dialog, but there's a fallback: since these are responses, we can get the dialog from the request.
When we get a BYE, we still don't find the dialog, but there's no fallback since this is a new request, so we return a 481.

We could create a new dialog for the new request, but a safer alternative is to just not delete the object from the container on a 503 while failover processing is in progress.
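
The behavior described above, and the effect of keeping the association during failover, can be modeled with a small Python sketch. This is a toy model under stated assumptions, not Asterisk code: the `DialogAssociations` class, its methods and the `keep_during_failover` flag are all hypothetical, and 200 is used only as a stand-in for "the request was matched to a dialog".

```python
class DialogAssociations:
    """Toy model of a dialog lookup container (hypothetical, not Asterisk's API)."""

    def __init__(self, keep_during_failover):
        self.keep_during_failover = keep_during_failover
        self.by_id = {}  # dialog id -> dialog object

    def associate(self, dialog_id, dialog):
        self.by_id[dialog_id] = dialog

    def on_503(self, dialog_id, failover_in_progress):
        # Proposed alternative: leave the entry in place while failover runs.
        if failover_in_progress and self.keep_during_failover:
            return
        self.by_id.pop(dialog_id, None)

    def dialog_for_response(self, dialog_id, request_dialog):
        # Responses can fall back to the dialog of the request they answer.
        found = self.by_id.get(dialog_id)
        return found if found is not None else request_dialog

    def status_for_request(self, dialog_id):
        # A new in-dialog request such as BYE has no fallback.
        return 200 if dialog_id in self.by_id else 481


dialog = object()
for keep in (False, True):
    assoc = DialogAssociations(keep_during_failover=keep)
    assoc.associate("dlg-1", dialog)
    assoc.on_503("dlg-1", failover_in_progress=True)
    print(keep, assoc.status_for_request("dlg-1"))
```

With `keep_during_failover=False` the 180 and 200 still resolve through the fallback while the BYE is rejected with a 481; with `True` the BYE finds the dialog. The model does not address when the entry is eventually removed.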




By: Corey Farrell (coreyfarrell) 2017-11-10 12:38:33.327-0600

bq. We could create a new dialog for the new request, but a safer alternative is to just not delete the object from the container on a 503 while failover processing is in progress.

My only concern with this is making sure we eventually clean up the object, i.e. if all failover options produce a 503 or the rest don't respond. I'm saying this without having looked at the code; maybe the required checks/timers are already in place to deal with this. I just want to make sure this is considered.

By: Friendly Automation (friendly-automation) 2017-12-14 18:21:55.666-0600

Change 7546 merged by Jenkins2:
pjsip: Ignore state changes from old transactions.

https://gerrit.asterisk.org/7546

By: Friendly Automation (friendly-automation) 2017-12-14 18:26:30.199-0600

Change 7543 merged by Jenkins2:
pjsip: Ignore state changes from old transactions.

https://gerrit.asterisk.org/7543

By: Friendly Automation (friendly-automation) 2017-12-14 18:40:26.894-0600

Change 7544 merged by Jenkins2:
pjsip: Ignore state changes from old transactions.

https://gerrit.asterisk.org/7544