
Summary: ASTERISK-29441: Core reload making TCP endpoints go offline
Reporter: Luke Escude (lukeescude)
Labels: patch
Date Opened: 2021-05-20 10:50:46
Date Closed: 2021-05-26 11:24:18
Priority: Major
Regression?
Status: Closed/Complete
Components: Core/PBX
Versions: 16.18.0
Frequency of Occurrence:
Related Issues:
Environment: CentOS 7 x64
Attachments: ( 0) ASTERISK-29441.diff
( 1) debug_log_29441.txt
Description: So I updated our Asterisk image last night from 16.14 to the newest, 16.18.

All of our endpoints use transport-tcp, and the image booted fresh just fine: everything connected, including outbound registrations (from Asterisk to an upstream SIP proxy).

But, every time we perform a 'core reload' all TCP endpoints disconnect and won't re-register (Asterisk begins sending the REGISTER packets over UDP instead of TCP).

The following errors appear in the console when reloading:

[May 20 15:45:52] WARNING[8794]: res_pjsip/config_transport.c:559 transport_apply: Transport 'transport-udp' is not fully reloadable, not reloading: protocol, bind, TLS, TCP, ToS, or CoS options.
[May 20 15:45:52] WARNING[8794]: res_pjsip/config_transport.c:559 transport_apply: Transport 'transport-tcp' is not fully reloadable, not reloading: protocol, bind, TLS, TCP, ToS, or CoS options.


Restarting Asterisk fixes the issue, but we cannot reload.
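
To see which transports hit this warning on a given reload, the names can be pulled out of the log. This is a minimal sketch, assuming the two warning lines above are representative of the format; the temporary file stands in for the real Asterisk log, whose location depends on the installation.

```shell
#!/bin/sh
# Sample lines copied from the report above; in practice the input would be
# the Asterisk log file (its path is installation-specific, not assumed here).
log=$(mktemp)
cat > "$log" <<'EOF'
[May 20 15:45:52] WARNING[8794]: res_pjsip/config_transport.c:559 transport_apply: Transport 'transport-udp' is not fully reloadable, not reloading: protocol, bind, TLS, TCP, ToS, or CoS options.
[May 20 15:45:52] WARNING[8794]: res_pjsip/config_transport.c:559 transport_apply: Transport 'transport-tcp' is not fully reloadable, not reloading: protocol, bind, TLS, TCP, ToS, or CoS options.
EOF

# Keep only the warning lines, extract the quoted transport name,
# and print each name once.
affected=$(grep 'is not fully reloadable' "$log" \
  | sed "s/.*Transport '\([^']*\)'.*/\1/" \
  | sort -u)
echo "$affected"
```

Both transports are reported here, so the warning by itself does not single out TCP; it only shows that neither transport was fully reloaded.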
Comments:

By: Asterisk Team (asteriskteam) 2021-05-20 10:50:46.796-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

Please note that by submitting data, code, or documentation to Sangoma through JIRA, you accept the Terms of Use present at [https://www.asterisk.org/terms-of-use/|https://www.asterisk.org/terms-of-use/].

By: Luke Escude (lukeescude) 2021-05-20 11:01:07.564-0500

Just as further clarification, Asterisk is performing OUTBOUND registrations to a proxy that uses TCP. This is the process that is failing, since for some reason it starts using UDP instead.

I am not referring to inbound devices registering to Asterisk, we don't have any of those.

By: Joshua C. Colp (jcolp) 2021-05-20 11:03:46.363-0500

We require additional debug to continue with triage of your issue. Please follow the instructions on the wiki [1] for how to collect debugging information from Asterisk. For expediency, where possible, attach the debug with a '.txt' file extension so that the debug will be usable for further analysis.

Thanks!

[1] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information



By: Joshua C. Colp (jcolp) 2021-05-20 11:05:03.794-0500

Configuration would also be good to see - for example, is the transport explicitly specified? Is everything stored in .conf files? What are the transport definitions?

By: Luke Escude (lukeescude) 2021-05-20 11:16:25.015-0500

Cool, I'll grab debug info. Here's pjsip.conf:

{code}
[global]
send_contact_status_on_update_registration=yes
user_agent=<Customer Identifier>
endpoint_identifier_order=username,ip

[transport-tcp]
type=transport
protocol=tcp
bind=0.0.0.0
tos=104
cos=3

[transport-udp]
type=transport
protocol=udp
bind=0.0.0.0
tos=104
cos=3

; Outbound register to a proxy:
[proxy1]
type=registration
transport=transport-tcp
outbound_auth=proxy1
server_uri=sip:<PROXY_IP>:<PROXY_PORT>
client_uri=sip:<CUSTID>@<PROXY_DNSNAME>
outbound_proxy=sip:<CUSTID>@<PROXY_IP>:<PROXY_PORT>
contact_user=<CUSTID>
expiration=120
retry_interval=2
forbidden_retry_interval=2
fatal_retry_interval=2
auth_rejection_permanent=no
max_retries=9999999

[proxy1]
type=endpoint
transport=transport-tcp
context=from-trunk
disallow=all
allow=g729
allow=ulaw
outbound_auth=proxy1
aors=proxy1
t38_udptl=no
rtp_keepalive=1
rtp_symmetric=yes
preferred_codec_only=yes
force_rport=yes
direct_media=no
rtp_timeout=20
rtp_timeout_hold=20
tos_audio=184
cos_audio=5
100rel=no

[proxy1]
type=auth
auth_type=userpass
username=<CUSTID>
password=<SIP_PASSWORD>

[proxy1]
type=aor
max_contacts=1
remove_existing=yes
contact=sip:<PROXY_IP>:<PROXY_PORT>
qualify_frequency=15
qualify_timeout=3
default_expiration=120
minimum_expiration=1
maximum_expiration=600

{code}

Besides the main issue we're discussing, if you see any wonkiness with my config, please do let me know.

By: Luke Escude (lukeescude) 2021-05-20 11:26:37.129-0500

Okay, I have figured it out.

If transport-tcp is defined in pjsip.conf BEFORE transport-udp, the issue doesn't occur when performing a reload.

But, if transport-tcp is defined AFTER transport-udp, Asterisk starts using UDP for all the TCP endpoints.

Please see the attached debug log. It begins with Asterisk starting up, and everything works normally. Then I perform a core reload with TCP defined before UDP, and everything continues to work. Then I perform a second core reload with UDP defined before TCP, and you'll notice all the outbound registrations become dysfunctional.

So, pjsip.conf transport definition order matters.

See attached - Thanks!

By: Luke Escude (lukeescude) 2021-05-20 11:26:53.063-0500

Attaching debug log.

By: Luke Escude (lukeescude) 2021-05-20 11:29:47.894-0500

Well, never mind about transport order mattering... The order doesn't seem to matter.

By: Joshua C. Colp (jcolp) 2021-05-20 11:35:17.299-0500

Does the attached patch resolve the issue?

By: Luke Escude (lukeescude) 2021-05-20 11:38:02.341-0500

I had to bump back down to 16.14 so my team can continue with their dial plan testing, but I will recompile with this patch and test it as soon as I can!

By: Luke Escude (lukeescude) 2021-05-24 12:53:04.048-0500

Hey Joshua, I haven't compiled a patch in a long time... Apparently it doesn't like the following command in the root of the asterisk source folder:

patch -p0 < ASTERISK-29441.diff

Do I need to be in some other root directory, or do I need to use a git specific diff command?

Thanks

By: Richard Mudgett (rmudgett) 2021-05-24 12:55:24.884-0500

You need to use -p1 instead of -p0 in the patch command.
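
For background, git-style diffs prefix every path with a/ and b/, and -p1 strips that first path component so the remaining path resolves inside the source tree. The following is a minimal sketch with a throwaway file; res/example.c and example.diff are made-up names for illustration and are not part of the real patch.

```shell
#!/bin/sh
# Demonstrates applying a git-style diff with -p1.
# All file names below are hypothetical.
set -e
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p res
printf 'old line\n' > res/example.c

# A git-style diff: both paths carry an extra leading component (a/ and b/).
cat > example.diff <<'EOF'
--- a/res/example.c
+++ b/res/example.c
@@ -1 +1 @@
-old line
+new line
EOF

# -p1 strips the leading "a/" or "b/" so patch looks for res/example.c.
# With -p0 it would look for a/res/example.c, which does not exist.
patch -p1 < example.diff
cat res/example.c
```

Against the real tree, the equivalent would presumably be `patch -p1 < ASTERISK-29441.diff`, run from the root of the Asterisk source folder.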

By: Luke Escude (lukeescude) 2021-05-24 13:27:11.904-0500

Thanks Richard!

Josh, it looks like your patch fixes the issue! My team is going to continue their testing of our version 2 dial plan, so if any more bugs pop up, we'll know.

By: Friendly Automation (friendly-automation) 2021-05-26 11:24:18.973-0500

Change 15936 merged by Friendly Automation:
res_pjsip: On partial transport reload also move factories.

[https://gerrit.asterisk.org/c/asterisk/+/15936|https://gerrit.asterisk.org/c/asterisk/+/15936]

By: Friendly Automation (friendly-automation) 2021-05-26 11:31:31.326-0500

Change 15949 merged by Friendly Automation:
res_pjsip: On partial transport reload also move factories.

[https://gerrit.asterisk.org/c/asterisk/+/15949|https://gerrit.asterisk.org/c/asterisk/+/15949]

By: Friendly Automation (friendly-automation) 2021-05-26 11:37:09.389-0500

Change 15935 merged by Joshua Colp:
res_pjsip: On partial transport reload also move factories.

[https://gerrit.asterisk.org/c/asterisk/+/15935|https://gerrit.asterisk.org/c/asterisk/+/15935]