
Summary: ASTERISK-28048: res_pjsip fails to migrate endpoint devstate from Unavailable to Not in use after restart until pjsip reload (or reregister)
Reporter: Jaco Kroon (jkroon)
Labels: pjsip
Date Opened: 2018-09-11 14:23:28
Date Closed: 2020-01-14 11:14:01.000-0600
Priority: Major
Regression?:
Status: Closed/Complete
Components: Resources/res_pjsip
Versions: 13.22.0 13.23.0
Frequency of Occurrence: Constant
Related Issues:
Environment:
Attachments:
Description: Directly after "core restart now", my PJSIP endpoints report as follows:

Endpoint:  100/100                                              Unavailable   0 of inf
    InAuth:  100/100
       Aor:  100                                               10
     Contact:  100/sip:100@165.16.203.126:5060            85f65a7816 Created       0.000

This remains as-is, with no indication at the network level of attempted qualifications ("core show hints" also shows Unavailable at this point). A given endpoint only starts to work once it re-REGISTERs, or once I execute pjsip reload.

The fact that pjsip reload fixes things seems to imply an ordering issue, so for the sake of eliminating that, here is my module load order (pjsip related):

load => chan_pjsip.so ; this one def doesn't matter, even loading it after all other modules ...

load => res_odbc.so
load => res_odbc_transaction.so
load => res_config_odbc.so

load => res_sorcery_config.so
load => res_sorcery_memory.so
load => res_sorcery_astdb.so
load => res_sorcery_realtime.so

load => res_pjproject.so
load => res_pjsip.so
load => res_pjsip_transport_management.so
load => res_pjsip_session.so
load => res_pjsip_authenticator_digest.so
load => res_pjsip_endpoint_identifier_user.so
load => res_pjsip_registrar.so
load => res_pjsip_refer.so
load => res_pjsip_nat.so
load => res_pjsip_pubsub.so
load => res_pjsip_mwi_body_generator.so
load => res_pjsip_mwi.so
load => res_pjsip_sdp_rtp.so
load => res_pjsip_header_funcs.so
load => res_pjsip_caller_id.so
load => res_pjsip_transport_websocket.so
load => res_http_websocket.so

This always happens from a clean start, and pjsip reload resolves the issue. One client reported this recurring, and for that particular customer even waiting for the extensions to re-REGISTER didn't fix it; I had to execute pjsip reload for him. Eventually I added this to cli.conf:

[startup_commands]
core waitfullybooted = yes
pjsip reload = yes

The former is required so that pjsip reload does not execute too early; if it runs too early, it doesn't actually "solve" the issue.
Comments:

By: Asterisk Team (asteriskteam) 2018-09-11 14:23:30.568-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Jaco Kroon (jkroon) 2018-09-11 14:49:52.505-0500

A quick perusal of the pjsip code reveals that during pjsip initialization these two functions are called in this specific order in load_module:

ast_res_pjsip_initialize_configuration
ast_res_pjsip_init_options_handling

The latter installs a sorcery hook for (as I understand it) sending OPTIONS requests for qualify support.

The former calls load_all_endpoints which loads all the initial endpoints (and presumably existing contacts), thus skipping the hooks.

I would suggest that the last two calls (load_all_endpoints and ast_sip_location_prune_boot_contacts) in ast_res_pjsip_initialize_configuration be split into a separate function, to be called after all other subsystems of PJSIP have been initialized and have registered their hooks.  It does look like there is quite a bit of work in ast_res_pjsip_initialize_configuration that needs to happen before other subsystems can initialize, but those two calls should probably be delayed a bit.

By: Joshua C. Colp (jcolp) 2018-09-11 14:59:26.933-0500

ast_res_pjsip_init_options_handling does not just add sorcery hooks to know when things occur, it also retrieves all AORs (and contacts on them) and endpoints from the configuration itself. It uses that to set up the associations so things can be fed with the correct data. I don't know if moving things around will resolve the problem because I don't know the answer to why it is seemingly not getting all the data it should after the configuration is loaded.

By: Richard Mudgett (rmudgett) 2018-09-11 15:35:41.788-0500

Use of the allow_unqualified_fetch sorcery realtime option can block fetching the endpoints and AOR records with the new OPTIONS qualify code.  See ASTERISK-28003

By: Joshua C. Colp (jcolp) 2018-09-14 09:39:15.798-0500

This needs further information, including configuration details. For example, on IRC you mentioned realtime, but it's not mentioned in this post at all. A complete log with debug enabled would also be needed to see what the OPTIONS logic is actually doing.

By: Asterisk Team (asteriskteam) 2018-09-28 12:00:01.990-0500

Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened, please create a new issue instead. If the new issue is related to this one, a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidelines [1].

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines