Summary: ASTERISK-28003: Qualifying non-authenticated endpoints on startup
Reporter: Jason Hord (jhord)
Labels: fax patch pjsip
Date Opened: 2018-08-03 12:54:27
Date Closed: 2020-01-14 11:14:03.000-0600
Priority: Minor
Regression?:
Status: Closed/Complete
Components: Resources/res_pjsip
Versions: 15.5.0
Frequency of Occurrence: Constant
Related Issues:
Environment: CentOS 7.5
Attachments:
( 0) debug-28003-patched-aor.txt
( 1) debug-28003-pjsip-aor.txt
( 2) debug-28003-pjsip-endpoint.txt
( 3) debug-28003-realtime-only.txt
( 4) extconfig.conf
( 5) jira_asterisk_28003_debug_v15.5.0.patch
( 6) modules.conf
( 7) sorcery.conf
Description: It would appear as though something has changed after Asterisk version 15.2.2 related to manual/persistent endpoints being qualified on startup.

At the company I work for, we currently run 15.2.2 with AORs defined in pjsip.conf.  When Asterisk starts up it will create endpoints and contacts for these based on settings from our realtime database and qualify them at regular intervals.  We use this to ensure our outbound SIP proxies are always in a known state.

While testing upgrades to 15.4 and 15.5 I have found this to no longer be the case.  The same configuration we are using for 15.2.2 will create the endpoints but they are never qualified and the contacts always just show 'Created'.  Manually qualifying these endpoints using 'pjsip qualify $endpoint' doesn't even appear to send SIP traffic.

Is this expected behavior with 15.4+?  What is the correct way to configure static endpoints/contacts in the realtime database such that they will be qualified on startup?  Since we have a large, distributed infrastructure we would like to avoid using pjsip.conf completely.
Comments:

By: Asterisk Team (asteriskteam) 2018-08-03 12:54:30.509-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Richard Mudgett (rmudgett) 2018-08-03 19:12:13.525-0500

How to configure endpoints and aors, as well as where they are retrieved from, hasn't changed.  A few new endpoint options have likely been added since v15.2.2, which would require updating your database schema.  These would be mentioned in the CHANGES and UPGRADE.txt files.  There are Alembic scripts available (see the contrib/ast-db-manage directory in the source tree) to manage the database schema updates.

Qualifying endpoints was rewritten (ASTERISK-26806) and first released in the 15 branch in v15.5.0.

Are you sure you are having these issues in v15.4.0 too?
Are the aors and endpoints loaded?
Does the CLI "pjsip qualify $endpoint" give an error?
What is a typical endpoint and aor configuration?

By: Jason Hord (jhord) 2018-08-06 13:02:42.869-0500

Hi Richard,

Thank-you for the response.  I have been looking through the CHANGES and UPGRADE.txt files and I do see a couple of options that we might need to add but I don't think they are relevant here.  These are the 'follow_early_media_forked' and 'accept_multiple_sdp_answers' options in pjsip.conf.  I will make sure we have the schema completely up-to-date, though.

Beyond that I have answers to your individual questions below:


x) Are you sure you are having these issues in v15.4.0 too?

I did some testing this morning and 15.4.1 appears to qualify endpoints as expected using the same config from 15.2.2.  We are working multiple issues in the 15.x code and I just got my version numbers mixed up.  This appears to only be an issue with 15.5.0 as tested this morning.


x) Are the aors and endpoints loaded?

Here is output from 'pjsip list aors':
{noformat}
     Aor:  FC-NYC-PROXY                                         0
{noformat}
and 'pjsip list endpoints':
{noformat}
Endpoint:  FC-NYC-PROXY                                         Unavailable   0 of inf
{noformat}
So it looks like they are loaded, but not qualifying from my perspective.


x) Does the CLI "pjsip qualify $endpoint" give an error?

No error and it tells me that it is qualifying the endpoint.  I can run a sipgrep alongside it and I don't see any outbound SIP traffic from Asterisk.  Here is the command output:
{noformat}
*CLI> pjsip qualify FC-NYC-PROXY
Qualifying AOR 'FC-NYC-PROXY' on endpoint 'FC-NYC-PROXY'
{noformat}

x) What is a typical endpoint and aor configuration?

Sorry to be so verbose, but here are database dumps of the AOR and endpoint config for FC-NYC-PROXY as we have it configured:

AOR:
{noformat}
[FC-NYC-PROXY]
authenticate_qualify =
contact = sip:69.55.55.125:5060
default_expiration = 300
id = FC-NYC-PROXY
mailboxes =
max_contacts = 10
maximum_expiration = 1800
minimum_expiration = 300
outbound_proxy =
qualify_frequency = 25
qualify_timeout = 5
remove_existing =
support_path =
{noformat}

Endpoint:
{noformat}
[FC-NYC-PROXY]
100rel = no
aggregate_mwi =
allow = ulaw;g722
allow_subscribe = yes
allow_transfer =
aors = FC-NYC-PROXY
asymmetric_rtp_codec = no
auth =
call_group =
callerid =
callerid_privacy =
callerid_tag =
connected_line_method =
context = outside-in
cos_audio = 5
cos_video = 4
device_state_busy_at =
direct_media = no
direct_media_glare_mitigation =
direct_media_method =
disable_direct_media_on_nat =
disallow = all
dtls_ca_file = /etc/asterisk/keys/mediassl.pem
dtls_ca_path =
dtls_cert_file = /etc/asterisk/keys/mediacrt.pem
dtls_cipher =
dtls_fingerprint =
dtls_private_key = /etc/asterisk/keys/mediakey.pem
dtls_rekey =
dtls_setup = actpass
dtls_verify =
dtmf_mode = rfc4733
fax_detect = no
force_avp = no
force_rport = yes
from_domain = fluentcloud.com
from_user =
ice_support = no
id = FC-NYC-PROXY
identify_by =
inband_progress = no
incoming_mwi_mailbox =
language =
mailboxes =
max_audio_streams =
max_video_streams =
media_address =
media_encryption = no
media_encryption_optimistic = no
media_use_received_transport = yes
message_context =
moh_suggest =
mwi_from_user =
mwi_subscribe_replaces_unsolicited = 1
named_call_group =
named_pickup_group =
notify_early_inuse_ringing = yes
one_touch_recording =
outbound_auth =
outbound_proxy =
pickup_group =
preferred_codec_only = yes
record_off_feature =
record_on_feature =
redirect_method =
refer_blind_progress = no
rewrite_contact = no
rtcp_mux = no
rtp_engine =
rtp_ipv6 =
rtp_keepalive = 30
rtp_symmetric = yes
rtp_timeout = 300
rtp_timeout_hold = 0
sdp_owner = genie
sdp_session = genie
send_diversion =
send_pai = no
send_rpid = no
set_var =
srtp_tag_32 =
sub_min_expiry =
subscribe_context =
t38_udptl = yes
t38_udptl_ec = redundancy
t38_udptl_ipv6 = no
t38_udptl_maxdatagram = 176
t38_udptl_nat =
timers = no
timers_min_se =
timers_sess_expires =
tone_zone =
tos_audio = ef
tos_video = af41
transport =
trust_id_inbound =
trust_id_outbound =
use_avpf = no
use_ptime =
{noformat}

Thank-you!

By: Richard Mudgett (rmudgett) 2018-08-06 14:19:35.744-0500

There seems to be a discrepancy between the FC-NYC-PROXY AOR you dumped from the database and the CLI "pjsip list aors" output.  The max_contacts value is different.

Does the loaded AOR and endpoint config for FC-NYC-PROXY match your database?
pjsip show aor FC-NYC-PROXY
pjsip show endpoint FC-NYC-PROXY

What is the output of
pjsip show contacts
Does it have any FC-NYC-PROXY AOR contacts to qualify?

What is the output of
pjsip show scheduled_tasks


By: Jason Hord (jhord) 2018-08-07 08:57:35.792-0500

With 15.2.2 we currently define AORs in pjsip.conf.  Prior to this we were able to rely solely on the realtime database, but we needed to make this change in order to have qualification work.  The relevant section is here:

{noformat}
[FC-NYC-PROXY]
type=aor
contact=sip:69.55.55.125:5060
qualify_frequency=30
{noformat}

The output of 'pjsip show aor FC-NYC-PROXY' appears to be incorrect.  It shows different values for several settings.  I am assuming that's because it is only pulling the entry from pjsip.conf and using defaults for the values not provided.

The output of 'pjsip show endpoint FC-NYC-PROXY' does appear correct.  The values reflect the settings in the realtime database.

Here is the relevant output from 'pjsip show contacts':

{noformat}
 Contact:  FC-NYC-PROXY/sip:69.55.55.125:5060             58dc908c3c Created       0.000
{noformat}

And here is the output of 'pjsip show scheduled_tasks':

{noformat}
pjsip/options/FC-NYC-PROXY-00000042              30.000      2294 wait   2018-08-06 11:49:55  2018-08-07 06:56:53  2018-08-07 06:57:22 (   20)
{noformat}

Just looking at all of this, I'm guessing that the default max_contacts of 0 with the way we have AORs defined in pjsip.conf could be a problem.  I would still like to know if there is a way to define the AOR in the realtime database and not rely on the pjsip.conf values.

By: Richard Mudgett (rmudgett) 2018-08-07 15:56:57.059-0500

The scheduled tasks output shows that the FC-NYC-PROXY AOR is scheduled to poll the contacts.  Now we need to see some debug output, collected as described in the link \[1], to see why the OPTIONS requests are not going out.  You will need to turn on the pjsip message logging too.  It will reduce the extraneous clutter in the log if there is no other activity going on while collecting the debug output.  (This output should have been supplied initially to demonstrate your problem but oh well.)

\[1] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information
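
In practice the steps on that page come down to roughly the following CLI commands. This is only a sketch: it assumes a debug log file is already enabled in logger.conf, and the levels shown are a suggestion rather than a requirement.

{noformat}
*CLI> core set verbose 5
*CLI> core set debug 5
*CLI> pjsip set logger on
{noformat}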

By: Jason Hord (jhord) 2018-08-08 11:14:42.864-0500

Debug logs.

realtime-only: No configuration in pjsip.conf for AORs or endpoints

pjsip-aor: With configuration for FC-NYC-PROXY AOR in pjsip.conf

pjsip-endpoint: With configuration for FC-NYC-PROXY AOR and endpoint in pjsip.conf


By: Jason Hord (jhord) 2018-08-08 11:24:57.790-0500

I have attached some debug logs.  I ran Asterisk in three different configurations.

The first is with nothing defined in pjsip.conf relying completely on the realtime database.  I can see an ODBC query to ps_endpoints for 'FC-NYC-PROXY', but nothing else.

The next was with the AOR for FC-NYC-PROXY defined in pjsip.conf.  There I see the same ODBC query but when it goes to qualify, it says no endpoint is found.

The last is with both the AOR and endpoint for FC-NYC-PROXY defined.  That one works and I can see the qualify packet being sent in the PJSIP debug.


By: Jason Hord (jhord) 2018-08-09 16:21:17.694-0500

I just wanted to update the ticket with some new information.  I have found a process to produce the behavior I'd like, but I'm a little confused on why it works the way it does.

I have completely emptied pjsip.conf in order to rely on our ODBC realtime only.  I have also modified sorcery.conf to only use memory_cache and realtime sources.

When Asterisk starts, I still have no AORs or endpoints loaded.  However, if I run 'pjsip reload qualify endpoint FC-NYC-PROXY', the AOR is created, the endpoint is created, the contact is created, and it qualifies successfully.

What is odd to me is that prior to that, if I run 'pjsip show qualify endpoint FC-NYC-PROXY', I get no output.  Running 'pjsip show qualify aor FC-NYC-PROXY' gives the following error:

{noformat}
$ asterisk -rx 'pjsip show qualify aor FC-NYC-PROXY'
Unable to retrieve aor 'FC-NYC-PROXY' qualify options
Command 'pjsip show qualify aor FC-NYC-PROXY' failed.
{noformat}

After performing the reload operation, however, I get normal output for both commands.  It would appear as though the AOR and/or endpoint aren't loaded from any source until referenced in some way.  Is that correct behavior or would this indicate a configuration problem in our realtime database?

By: Richard Mudgett (rmudgett) 2018-08-09 19:22:31.961-0500

[^jira_asterisk_28003_debug_v15.5.0.patch] - This patch adds some debugging messages when realtime asks to fetch all the endpoints.

Could you apply the patch and re-collect the debug output you did for debug-28003-pjsip-aor.txt and attach it as a new file.

Also please attach the modules.conf, sorcery.conf, and extconfig.conf files you use.

By: Richard Mudgett (rmudgett) 2018-08-09 19:43:49.791-0500

To answer your question, no that is not correct behavior as all those records were asked for when Asterisk started.

When Asterisk starts, right after res_pjsip.so registers endpoint, contact, and aor option parameters with the sorcery configuration framework, res_pjsip attempts to load all endpoints from the database.  When res_pjsip sets up the qualify options ping sub-module it will also attempt to get all aors and endpoints.  When chan_pjsip.so loads after res_pjsip.so it also reloads all endpoints to update their device state from invalid.  Something seems to not be ready to handle the database requests until later.

You get the 'pjsip show qualify aor FC-NYC-PROXY' error because the AORs didn't get loaded when the qualify options sub-module couldn't load the aor records.

By: Jason Hord (jhord) 2018-08-10 10:41:13.286-0500

Attached is the new debug log and config files as asked.

By: Richard Mudgett (rmudgett) 2018-08-10 11:58:52.949-0500

Ahha, here is the problem.  You are blocking the new qualify options code from fetching the endpoints and aors.
{noformat}
endpoint=realtime,ps_endpoints,allow_unqualified_fetch=no
aor=realtime,ps_aors,allow_unqualified_fetch=no
{noformat}
You need to either remove the {{allow_unqualified_fetch}} option or set it to yes.
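
A minimal sketch of the corrected mappings, assuming they sit in the {{[res_pjsip]}} section of sorcery.conf (the section name is an assumption here; the object types and table names are taken from the lines above):

{noformat}
; sorcery.conf (sketch; section name assumed)
[res_pjsip]
; Leaving allow_unqualified_fetch out, or setting it to yes as below,
; permits the qualify options code to fetch all endpoints and aors.
endpoint=realtime,ps_endpoints,allow_unqualified_fetch=yes
aor=realtime,ps_aors,allow_unqualified_fetch=yes
{noformat}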

Let me know if it fixes the problem.

By: Jason Hord (jhord) 2018-08-10 15:59:59.386-0500

Thank-you for the response, Richard.  I was able to play around with a few settings and am seeing some different behavior now.  If I remove {{allow_unqualified_fetch}} or set it to 'yes', all of the AORs and endpoints are loaded as expected.  The contacts also qualify properly and everything appears to be working as expected.

However, I'm also seeing a tremendous amount of load on the system.  The asterisk process will use 95%+ CPU continuously and never seems to drop.  I was able to attempt a test call and audio was completely choppy and lagged.

Our realtime database has over 24,000 endpoints in total and it looks like it may be loading all of these on startup.  The vast majority of these are dynamic endpoints that only need to be present when a device registers.  We need to load around 50 static endpoints, though.  I'm still a little unclear on all the options that can be set in sorcery.conf.  Is there a magical combination that will only load our static endpoints but not dynamic ones until registration?

By: Richard Mudgett (rmudgett) 2018-08-10 16:42:42.945-0500

Yes, to reduce expensive database lookups the new OPTIONS code pulls in all endpoints and aors on startup, as I mentioned in an earlier comment.

The new OPTIONS code was tested with 3000 endpoints in a config file.  It was able to load and run in a fraction of the time it took before the rewrite.  A few seconds vs a few minutes.

By: Joshua C. Colp (jcolp) 2018-08-14 10:12:14.484-0500

The current OPTIONS code was written for a case of a few thousand (or fewer) endpoints, be it in the configuration file or from realtime. 24,000 is outside of the scope of its implementation. If you'd like to contribute a change which improves this it would be welcome, but it cannot cause problems for the case I originally provided. This was a problem before, where changes were made to improve the realtime case and they ended up substantially reducing performance (2-8X slower) for everyone else and for the normal cases.

By: Jason Hord (jhord) 2018-08-14 12:16:34.991-0500

Joshua and Richard,

Thanks for the updates.  We have been anxious to test the new OPTIONS code because one of our servers has over 3,500 contacts registered and we have been fighting edge-case performance issues.  We don't run 24,000 endpoints on any single server and 3,500-4,000 is probably all we need.  We start seeing call quality drop off after that due to CPU contention.

Is there a document that explicitly describes the startup process for Asterisk with respect to this new patch?  I have looked over the changelog and got a sense for what changed but I'm still having trouble getting the configuration I would like.  Perhaps you could point me to a couple of source files to look over?

I'm thinking what I might try to do is add functionality to only start qualifying static endpoints.  I'm guessing these would either be endpoints with a contact specified on the AOR or perhaps just flagged in the config.  From our perspective this should allow us to quickly load the 40-50 endpoints we need at service startup and then let normal Asterisk processing proceed for dynamic endpoints as they re-register.  I'm not sure how well this would play with astdb so any opinion would be welcome.


By: Joshua C. Colp (jcolp) 2018-08-14 12:25:11.096-0500

The design of the new OPTIONS and how it works is detailed at the top of the file[1]. It doesn't dive into the loading part because it assumes that its view of the universe is consistent and not partial. The sip_options_apply_aor_configuration function[2] is what actually applies the AOR config with contacts.

[1] https://github.com/asterisk/asterisk/blob/master/res/res_pjsip/pjsip_options.c#L38
[2] https://github.com/asterisk/asterisk/blob/master/res/res_pjsip/pjsip_options.c#L1239

By: Asterisk Team (asteriskteam) 2018-08-29 12:00:01.582-0500

Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidelines [1].

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines