
Summary: ASTERISK-29187: stasis.c: FRACK!, Failed assertion bad magic number 0x0 for object in publish_msg
Reporter: Diogo Hartmann (Mauri de Souza Meneguzzo (3CPlus))
Labels:
Date Opened: 2020-11-30 12:48:59.000-0600
Date Closed:
Priority: Major
Regression?: No
Status: Open/New
Components: Resources/res_stasis
Versions: 17.9.0, 18.1.0, 18.3.0
Frequency of Occurrence: Constant
Related Issues:
    is duplicated by ASTERISK-29331 publish_msg: FRACK!, Failed assertion bad magic number
    is related to ASTERISK-28931 stasis.c:1475 publish_msg: FRACK!, Failed assertion bad magic number 0x0 for object
    is related to ASTERISK-29296 ari: Bridge is partially invalid when it shouldn't be
Environment: Kubernetes pod running on a Container Optimized OS (GCE) instance
Attachments: (0) ast-coredumper-files.tar.gz
Description: This is a follow-up to ASTERISK-28931, which we opened some time ago. It seems the same problem is happening again.

This problem occurs in versions as old as Asterisk 17.3.0 and has happened in every version since. We are currently running version 18.1.0 and the segfault still occurs.

Our Asterisk instances are crashing at least once per day, and in some cases multiple times, with the following backtrace being displayed:

{noformat}
FRACK!, Failed assertion bad magic number 0x0 for object 0x7fa328020c60 (0) at line 1474 in publish_msg of stasis.c
#0: asterisk __ast_assert_failed()
#1: asterisk <unknown>()
#2: asterisk __ao2_ref()
#3: asterisk <unknown>()
#4: asterisk stasis_publish()
#5: asterisk bridge_topics_destroy()
#6: asterisk <unknown>()
#7: asterisk __ao2_ref()
#8: asterisk <unknown>()
#9: asterisk __ao2_ref()
#10: asterisk ast_bridge_depart()
#11: [0x7fa33f7c16c0] res_stasis.so res_stasis.c:1567 stasis_app_exec()
#12: [0x7fa33f02e310] app_stasis.so app_stasis.c:105 app_exec()
#13: asterisk pbx_exec()
#14: asterisk <unknown>()
#15: asterisk ast_spawn_extension()
#16: asterisk <unknown>()
#17: asterisk <unknown>()
#18: asterisk <unknown>()
#19: [0x7fa350359ea7] libpthread.so.0 pthread_create.c:478 start_thread()
#20: [0x7fa34ffb0d4f] libc.so.6 clone.S:97 __GI___clone()

26 Segmentation fault (core dumped) asterisk  -fg
{noformat}

GDB's "bt" output:
{noformat}
(gdb) bt
#0  0x000055d23a2f787a in dispatch_message (sub=0x30ec8348e5894855,
   message=0x7fa2740b6fd8, synchronous=0) at stasis.c:1335
#1  0x000055d23a2f7d63 in publish_msg (topic=0x7fa328020c60,
   message=0x7fa2740b6fd8, sync_sub=0x0) at stasis.c:1484
#2  0x000055d23a2f7df6 in stasis_publish (topic=0x7fa328020c60,
   message=0x7fa2740b6fd8) at stasis.c:1509
#3  0x000055d23a2fc765 in bridge_topics_destroy (bridge=0x7fa32807af10)
   at stasis_bridges.c:344
#4  0x000055d23a1c2afc in destroy_bridge (obj=0x7fa32807af10) at bridge.c:666
#5  0x000055d23a1b41d8 in __ao2_ref (user_data=0x7fa32807af10, delta=-1,
   tag=0x0,
   file=0x55d23a3c7960 "\r\n<title>%d %s</title>\r\n</head><body>\r\n<h1>%s</h1>\r\n<p>%s</p>\r\n<hr />\r\n%s</body></html>\r\n", line=3105,
   func=0x55d23a3c8f00 <__PRETTY_FUNCTION__.8> "playtones_generator")
   at astobj2.c:614
#6  0x000055d23a1e3ef1 in bridge_channel_destroy (obj=0x7fa274023ce0)
   at bridge_channel.c:3105
#7  0x000055d23a1b41d8 in __ao2_ref (user_data=0x7fa274023ce0, delta=-1,
   tag=0x0, file=0x55d23a3c2bd9 "", line=1984,
   func=0x55d23a3c4d60 <__PRETTY_FUNCTION__.130> "ent call progress message %d\n") at astobj2.c:614
#8  0x000055d23a1c70ac in ast_bridge_depart (chan=0x7fa26c11da60)
   at bridge.c:1984
{noformat}
Comments:

By: Asterisk Team (asteriskteam) 2020-11-30 12:49:00.939-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

Please note that by submitting data, code, or documentation to Sangoma through JIRA, you accept the Terms of Use present at [https://www.asterisk.org/terms-of-use/|https://www.asterisk.org/terms-of-use/].

By: Diogo Hartmann (dhart) 2020-11-30 13:09:41.021-0600

Thank you Mauri for reopening this case.

Just to be clear, this problem has in fact been occurring for a long time, I believe since before version 17.3.0, and it has never stopped. We have been living with the crashes until now because we lacked the time to adjust the image to generate the required core dump.

I believe that now with the core dump files and some kind help of the Asterisk community we will finally be able to eliminate this problem that has been hindering us for the past few years.

Thanks!

By: Diogo Hartmann (Mauri de Souza Meneguzzo (3CPlus)) 2020-11-30 13:21:43.544-0600

Attached text files generated by ast_coredumper.

By: Kevin Harwell (kharwell) 2020-11-30 13:45:17.293-0600

From within gdb could you execute the following and show the output for the "sub" structure:
{noformat}
(gdb) p *sub
{noformat}
As well for the "topic":
{noformat}
(gdb) up
(gdb) p *topic
{noformat}
Thanks!

By: Diogo Hartmann (Mauri de Souza Meneguzzo (3CPlus)) 2020-11-30 14:10:02.469-0600

{noformat}
(gdb) p *sub
Cannot access memory at address 0x30ec8348e5894855
(gdb) up
#1  0x000055d23a2f7d63 in publish_msg (topic=0x7fa328020c60, message=0x7fa2740b6fd8, sync_sub=0x0) at stasis.c:1484
1484    in stasis.c
(gdb) p *topic
$1 = {subscribers = {elems = 0x55d23a1b75bd <hash_ao2_node_destructor>, max = 0, current = 12036734452215316481}, upstream_topics = {elems = 0x7fa32804b780,
   max = 140338728017104, current = 1}, subscriber_id = 0, name = 0x0, detail = 0x2 <error: Cannot access memory at address 0x2>, creationtime = 0x35}
{noformat}

By: Diogo Hartmann (Mauri de Souza Meneguzzo (3CPlus)) 2020-11-30 14:17:31.736-0600

I can also provide you the core dump if necessary.

Thank you!


By: Kevin Harwell (kharwell) 2020-11-30 17:58:25.235-0600

If you have the Asterisk log still from right before the crash please attach to this issue.

As well please attach your _pjsip.conf_, or post relevant portions from it.

By: Kevin Harwell (kharwell) 2020-11-30 18:00:31.541-0600

Also, is this easily repeatable? Or more sporadic? Like does it happen often for a given configured endpoint, or say once after so many days of running?

By: Diogo Hartmann (Mauri de Souza Meneguzzo (3CPlus)) 2020-11-30 22:20:49.180-0600

It happens two or three times per day, every day. Each Asterisk instance runs inside a container, so the environment is consistent between them. Different instances crash at random; it is not always the same instance, or instances running on the same node, etc.

Here's the contents of pjsip.conf

{noformat}
[system]
type=system
threadpool_initial_size=50
threadpool_auto_increment=10
threadpool_idle_timeout=120
threadpool_max_size=500
timer_t1=150
timer_b=9600

[global]
type=global
mwi_tps_queue_high=5000

;================================ TRANSPORTS ==
; Our primary transport definition for UDP communication behind NAT.
[transport-external]
type = transport
protocol = udp
bind = 0.0.0.0
; NAT settings
local_net = 10.0.0.0/8
external_media_address = 10.142.0.49
external_signaling_address = 10.142.0.49

[transport-udp]
type = transport
protocol = udp
bind = 0.0.0.0:5061

[transport-tcp]
type = transport
protocol = tcp
bind = 0.0.0.0:5061
{noformat}

The contents of the log before the crash including the crash itself:

{noformat}
[Nov 30 13:08:27] NOTICE[131]: res_pjsip_sdp_rtp.c:149 rtp_check_timeout: Disconnecting channel 'PJSIP/kamailio_endpoint_tls-0000b356' for lack of audio RTP activity in 15 seconds
[Nov 30 13:08:30] NOTICE[131]: res_pjsip_sdp_rtp.c:149 rtp_check_timeout: Disconnecting channel 'PJSIP/kamailio_endpoint_tls-0000cc3b' for lack of audio RTP activity in 15 seconds
[Nov 30 13:08:56] NOTICE[131]: res_pjsip_sdp_rtp.c:149 rtp_check_timeout: Disconnecting channel 'PJSIP/kamailio_endpoint_tls-0000cb1b' for lack of audio RTP activity in 15 seconds
[Nov 30 13:08:58] WARNING[23472]: res_pjsip_pubsub.c:787 subscription_get_handler_from_rdata: No registered subscribe handler for event presence.winfo from kamailio_endpoint
[Nov 30 13:10:39] NOTICE[131]: res_pjsip_sdp_rtp.c:149 rtp_check_timeout: Disconnecting channel 'PJSIP/kamailio_endpoint_tls-0000cccb' for lack of audio RTP activity in 15 seconds
[Nov 30 13:11:06] ERROR[346332]: channel.c:724 does_id_conflict: Channel Unique ID '6KZBoSNqME' already in use by channel PJSIP/kamailio_endpoint_tls-0000cbbe(0x7fa25c102160)
[Nov 30 13:11:17] NOTICE[131]: res_pjsip_sdp_rtp.c:149 rtp_check_timeout: Disconnecting channel 'PJSIP/kamailio_endpoint_tls-00009b44' for lack of audio RTP activity in 15 seconds
[Nov 30 13:11:42] NOTICE[131]: res_pjsip_sdp_rtp.c:149 rtp_check_timeout: Disconnecting channel 'PJSIP/kamailio_endpoint_tls-0000c6a6' for lack of audio RTP activity in 15 seconds
[Nov 30 13:12:20] WARNING[108989]: res_pjsip_pubsub.c:787 subscription_get_handler_from_rdata: No registered subscribe handler for event presence.winfo from kamailio_endpoint
[Nov 30 13:14:27] NOTICE[131]: res_pjsip_sdp_rtp.c:149 rtp_check_timeout: Disconnecting channel 'PJSIP/kamailio_endpoint_tls-0000cde5' for lack of audio RTP activity in 15 seconds
[Nov 30 13:16:17] NOTICE[131]: res_pjsip_sdp_rtp.c:149 rtp_check_timeout: Disconnecting channel 'PJSIP/kamailio_endpoint_tls-0000b392' for lack of audio RTP activity in 15 seconds
[Nov 30 13:16:26] WARNING[108989]: res_pjsip_pubsub.c:787 subscription_get_handler_from_rdata: No registered subscribe handler for event presence.winfo from kamailio_endpoint
[Nov 30 13:17:08] NOTICE[131]: res_pjsip_sdp_rtp.c:149 rtp_check_timeout: Disconnecting channel 'PJSIP/kamailio_endpoint_tls-0000cc63' for lack of audio RTP activity in 15 seconds
[Nov 30 13:20:10] NOTICE[131]: res_pjsip_sdp_rtp.c:149 rtp_check_timeout: Disconnecting channel 'PJSIP/kamailio_endpoint_tls-0000cf01' for lack of audio RTP activity in 15 seconds
[Nov 30 13:20:16] NOTICE[131]: res_pjsip_sdp_rtp.c:149 rtp_check_timeout: Disconnecting channel 'PJSIP/kamailio_endpoint_tls-0000cf64' for lack of audio RTP activity in 15 seconds
[Nov 30 13:20:44] NOTICE[131]: res_pjsip_sdp_rtp.c:149 rtp_check_timeout: Disconnecting channel 'PJSIP/kamailio_endpoint_tls-0000cfb9' for lack of audio RTP activity in 15 seconds
[Nov 30 13:21:21] WARNING[23472]: res_pjsip_pubsub.c:787 subscription_get_handler_from_rdata: No registered subscribe handler for event presence.winfo from kamailio_endpoint
[Nov 30 13:22:00] NOTICE[131]: res_pjsip_sdp_rtp.c:149 rtp_check_timeout: Disconnecting channel 'PJSIP/kamailio_endpoint_tls-0000cfe3' for lack of audio RTP activity in 15 seconds
[Nov 30 13:22:07] NOTICE[131]: res_pjsip_sdp_rtp.c:149 rtp_check_timeout: Disconnecting channel 'PJSIP/kamailio_endpoint_tls-0000c5f2' for lack of audio RTP activity in 15 seconds
[Nov 30 13:22:39] NOTICE[131]: res_pjsip_sdp_rtp.c:149 rtp_check_timeout: Disconnecting channel 'PJSIP/kamailio_endpoint_tls-0000ced5' for lack of audio RTP activity in 15 seconds
[Nov 30 13:23:23] NOTICE[131]: res_pjsip_sdp_rtp.c:149 rtp_check_timeout: Disconnecting channel 'PJSIP/kamailio_endpoint_tls-000099b1' for lack of audio RTP activity in 15 seconds
[Nov 30 13:26:08] NOTICE[131]: res_pjsip_sdp_rtp.c:149 rtp_check_timeout: Disconnecting channel 'PJSIP/kamailio_endpoint_tls-0000d024' for lack of audio RTP activity in 15 seconds
[Nov 30 13:26:55] WARNING[350449][C-0000cd30]: res_rtp_asterisk.c:2932 dtls_srtp_setup: Fingerprint provided by remote party does not match that of peer certificate on RTP instance '0x7fa26c2daf30'
[Nov 30 13:26:55] WARNING[350449][C-0000cd30]: res_rtp_asterisk.c:7422 ast_rtp_read: RTP Read error: Unspecified.  Hanging up.
[Nov 30 13:26:55] WARNING[350449][C-0000cd30]: res_stasis_playback.c:276 playback_final_update: 10.8.9.18-1606753614.53586: Playback failed for sound:login
FRACK!, Failed assertion bad magic number 0x0 for object 0x7fa328020c60 (0) at line 1474 in publish_msg of stasis.c
[Nov 30 13:27:27] ERROR[350426][C-0000cd2c]: stasis.c:1474 publish_msg: FRACK!, Failed assertion bad magic number 0x0 for object 0x7fa328020c60 (0)
[Nov 30 13:27:27] ERROR[350426][C-0000cd2c]:   Got 21 backtrace records
FRACK!, Failed assertion bad magic number 0x0 for object 0x7fa328020c60 (0) at line 1478 in publish_msg of stasis.c
# 0: asterisk __ast_assert_failed()
# 1: asterisk <unknown>()
# 2: asterisk __ao2_ref()
# 3: asterisk <unknown>()
# 4: asterisk stasis_publish()
# 5: asterisk bridge_topics_destroy()
# 6: asterisk <unknown>()
# 7: asterisk __ao2_ref()
# 8: asterisk <unknown>()
# 9: asterisk __ao2_ref()
#10: asterisk ast_bridge_depart()
#11: [0x7fa33f7c16c0] res_stasis.so res_stasis.c:1567 stasis_app_exec()
#12: [0x7fa33f02e310] app_stasis.so app_stasis.c:105 app_exec()
#13: asterisk pbx_exec()
#14: asterisk <unknown>()
#15: asterisk ast_spawn_extension()
#16: asterisk <unknown>()
#17: asterisk <unknown>()
#18: asterisk <unknown>()
#19: [0x7fa350359ea7] libpthread.so.0 pthread_create.c:478 start_thread()
#20: [0x7fa34ffb0d4f] libc.so.6 clone.S:97 __GI___clone()
[Nov 30 13:27:27] ERROR[350426][C-0000cd2c]: stasis.c:1478 publish_msg: FRACK!, Failed assertion bad magic number 0x0 for object 0x7fa328020c60 (0)
[Nov 30 13:27:27] ERROR[350426][C-0000cd2c]:   Got 21 backtrace records
# 0: asterisk __ast_assert_failed()
# 1: asterisk <unknown>()
# 2: asterisk __ao2_lock()
# 3: asterisk <unknown>()
# 4: asterisk stasis_publish()
# 5: asterisk bridge_topics_destroy()
# 6: asterisk <unknown>()
# 7: asterisk __ao2_ref()
# 8: asterisk <unknown>()
# 9: asterisk __ao2_ref()
#10: asterisk ast_bridge_depart()
#11: [0x7fa33f7c16c0] res_stasis.so res_stasis.c:1567 stasis_app_exec()
#12: [0x7fa33f02e310] app_stasis.so app_stasis.c:105 app_exec()
#13: asterisk pbx_exec()
#14: asterisk <unknown>()
#15: asterisk ast_spawn_extension()
#16: asterisk <unknown>()
#17: asterisk <unknown>()
#18: asterisk <unknown>()
#19: [0x7fa350359ea7] libpthread.so.0 pthread_create.c:478 start_thread()
#20: [0x7fa34ffb0d4f] libc.so.6 clone.S:97 __GI___clone()
/usr/bin/entrypoint.sh: line 104:    26 Segmentation fault      (core dumped) asterisk -fg
{noformat}

If you need more info please let me know.

By: Kevin Harwell (kharwell) 2020-12-03 11:27:15.339-0600

{quote}I can also provide you the core dump if necessary.{quote}
Can't hurt to have it in this case.

{panel:title=Private Submission of Information Disclaimer}You have indicated that you wish to submit unredacted information privately. It is not recommended to do this as it will substantially restrict the number of individuals who can help with your issue, as submitted information is only available to Sangoma. Note that submission of such information does not change the priorization of this issue. If you still wish to proceed you may do so by sending it to asteriskteam@digium.com with the issue number in the subject. For large files please send a link where they can then be downloaded. By sending this information you agree to the Website Terms of Use available on the Sangoma website at https://www.sangoma.com/legal/. Any exchange of private information between you and an Asterisk community member outside of the Asterisk JIRA is not subject to the Website Terms of Use and should be privately discussed between yourself and the Asterisk community member.{panel}

Please execute the following to get the needed files [1]:
{noformat}
$ /var/lib/asterisk/scripts/ast_coredumper --tarball-coredumps --no-default-search <path_to_coredump>
{noformat}

[1] https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace#GettingaBacktrace-Reportingcrashesanddeadlocks

By: Diogo Hartmann (Mauri de Souza Meneguzzo (3CPlus)) 2020-12-03 15:14:40.334-0600

I already have the core dumps in this format; we have a routine that uploads them to a Google Storage bucket.

Do I need to upload as an attachment to this issue or share a private link with you?


By: Kevin Harwell (kharwell) 2020-12-03 15:40:54.527-0600

Up to you. The file is usually too large to attach here, but if you don't mind the data and/or google link being public then you can supply here on the issue.

That said if you'd rather it be private you can email a link for download to asteriskteam@digium.com with this issue number as the subject.

Thanks!

By: Diogo Hartmann (Mauri de Souza Meneguzzo (3CPlus)) 2020-12-04 05:10:41.113-0600

I've sent an email with a link to the core dump tarball.

I hope you have enough info to identify the issue, our service is highly impacted by this bug.

If that helps, we have an AMI+ARI integration constantly listening for the following events:

{noformat}
agent-login
connection
connection-answered
connection-spy
{noformat}
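
For reference, an ARI integration listening for named applications like these typically subscribes via the ARI events WebSocket. The sketch below only shows how such a subscription URL is usually built; the host, port, and credentials are assumptions, and only the app names come from this report.

```python
# Hedged sketch of an ARI events WebSocket subscription URL.
# Host, port, and credentials below are assumed example values.
from urllib.parse import urlencode

def ari_events_url(apps, host="localhost", port=8088,
                   user="asterisk", password="secret"):
    """Build the ARI /events WebSocket URL subscribing to the given apps."""
    qs = urlencode({
        "app": ",".join(apps),          # ARI accepts a comma-separated app list
        "api_key": f"{user}:{password}",
    })
    return f"ws://{host}:{port}/ari/events?{qs}"

apps = ["agent-login", "connection", "connection-answered", "connection-spy"]
url = ari_events_url(apps)
```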

I don't know if this is relevant, but it may be related.

Thanks!

By: Joshua C. Colp (jcolp) 2020-12-04 05:15:20.196-0600

Just to set expectations - there is no timeline on when this issue would get looked into.

By: Kevin Harwell (kharwell) 2020-12-07 13:35:10.681-0600

What OS (and version number) is the container running?

By: Diogo Hartmann (Mauri de Souza Meneguzzo (3CPlus)) 2020-12-07 13:59:16.441-0600

We use a modified version of this image https://github.com/lucasvs/docker-asterisk

It's running Debian 11 (Bullseye).

By: Lucas Souza (lucasvs) 2021-05-27 14:26:13.076-0500

I found a solution. In my code I was making one ARI call to delete the channel (while the channel was in the bridge) and another to delete the bridge, back to back. To fix it I put a 5 second sleep after deleting the channel, and the FRACK crash stopped.

I think there is a race condition in Asterisk when the channel delete (while the channel is in a bridge) and the bridge delete run back to back.
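
The workaround described above can be sketched roughly as follows. This is a minimal illustration, not code from the report: the ARI base URL, credentials, and the 5 second settle delay are assumptions, and channels and bridges are torn down with HTTP DELETE in ARI.

```python
# Hedged sketch of the "delay between channel delete and bridge delete"
# workaround. ARI_BASE and the credentials are assumed example values.
import base64
import time
import urllib.request

ARI_BASE = "http://localhost:8088/ari"  # assumed ARI HTTP endpoint

def ari_url(path):
    """Build the full ARI URL for a resource path like /channels/{id}."""
    return f"{ARI_BASE}{path}"

def ari_delete(path, user="asterisk", password="secret"):
    """Issue an ARI DELETE request with HTTP Basic auth."""
    req = urllib.request.Request(ari_url(path), method="DELETE")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return urllib.request.urlopen(req)

def teardown_call(channel_id, bridge_id, settle_seconds=5):
    """Delete the channel, wait, then delete the bridge (the workaround)."""
    ari_delete(f"/channels/{channel_id}")
    time.sleep(settle_seconds)  # avoid the suspected bridge-topic race
    ari_delete(f"/bridges/{bridge_id}")
```

Note the sleep only papers over the suspected race in bridge_topics_destroy; it does not fix the underlying use-after-free.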

By: Lucas Souza (lucasvs) 2021-07-13 08:03:13.499-0500

Just to update: one month without FRACK crashes after applying the above solution. We have also been testing 18.5.0 for a week, with no FRACK either.