[Home]

Summary:ASTERISK-29116: Asterisk hang in pbx.c on thread exit
Reporter:Dennis Haney (CineCine)Labels:
Date Opened:2020-10-09 04:04:27Date Closed:2020-11-11 12:00:01.000-0600
Priority:MinorRegression?Yes
Status:Closed/CompleteComponents:PBX/General
Versions:17.5.1 Frequency of
Occurrence
Constant
Related
Issues:
Environment:Attachments:( 0) core-asterisk-running-2020-10-26T16-04-34+0100.gz
Description:I have hundreds of threads stuck with this stack trace, which eventually causes the server to run out of resources and is unable to make further calls.

I dont see any other threads with dl_open in the stack that isnt this, nor any other threads with pbx_thread that isnt stuck like this.

FYI, we are stuck on chan_sip if that makes a difference.
{noformat}
Thread 686 (Thread 0x7f863d3d6700 (LWP 14068)):
#0  0x00007f867b73129c in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
#1  0x00007f867b72a7d1 in __GI___pthread_mutex_lock (mutex=0x7f867c080968 <_rtld_global+2312>) at ../nptl/pthread_mutex_lock.c:115
       id = 14068
       type = <optimized out>
       __PRETTY_FUNCTION__ = "__pthread_mutex_lock"
       id = <optimized out>
#2  0x00007f867c06ab56 in _dl_open (file=0x7f867b735a0f "libgcc_s.so.1", mode=-2147483646, caller_dlopen=0x7f867b732c7b <pthread_cancel_init+43>, nsid=-2, argc=1, argv=0x7ffda
dfe0b08, env=0x563a65927300) at dl-open.c:548
       args = {file = 0x0, mode = -1, caller_dlopen = 0x0, map = 0x563a64441e93 <ast_unref_namedgroups+35>, nsid = 140215383486384, argc = 1682268464, argv = 0x7f86701fefb0,
env = 0x7f86701fefb0}
       exception = {objname = 0x563a646873e0 <change_pending> "376036", errstring = 0x563a645d5c30 "devicestate.c", message_buffer = 0x7f86701fef98 "340263Cd:V"}
       errcode = <optimized out>
       __PRETTY_FUNCTION__ = "_dl_open"
#3  0x00007f867b33906d in do_dlopen (ptr=ptr@entry=0x7f863d3d5d40) at dl-libc.c:96
       args = 0x7f863d3d5d40
#4  0x00007f867b339b2f in __GI__dl_catch_exception (exception=exception@entry=0x7f863d3d5cc0, operate=operate@entry=0x7f867b339030 <do_dlopen>, args=args@entry=0x7f863d3d5d40)
at dl-error-skeleton.c:196
       errcode = 22074
       c = {exception = 0x7f863d3d5cc0, errcode = 0x7f863d3d5bcc, env = {{__jmpbuf = {0, -5657367600969769489, 140214529776920, 140214527728383, 140214529779456, 140215116599
168, -5657367600885883409, -5657521440193339921}, __mask_was_saved = 0, __saved_mask = {__val = {140215383488960, 0, 94808791407956, 140215383486384, 94808791900464, 140215383
488960, 140215383486384, 0, 140215383488960, 0, 0, 140214529776828, 94808790813846, 140214529776832, 0, 0}}}}}
       old = <optimized out>
#5  0x00007f867b339bbf in __GI__dl_catch_error (objname=objname@entry=0x7f863d3d5d18, errstring=errstring@entry=0x7f863d3d5d20, mallocedp=mallocedp@entry=0x7f863d3d5d17, opera
te=operate@entry=0x7f867b339030 <do_dlopen>, args=args@entry=0x7f863d3d5d40) at dl-error-skeleton.c:215
       exception = {objname = 0x563a645f2200 <__PRETTY_FUNCTION__.17003+32> "he", errstring = 0xd2646d2b5cddd100 <error: Cannot access memory at address 0xd2646d2b5cddd100>,
message_buffer = 0x20 <error: Cannot access memory at address 0x20>}
       errorcode = <optimized out>
#6  0x00007f867b339147 in dlerror_run (operate=operate@entry=0x7f867b339030 <do_dlopen>, args=args@entry=0x7f863d3d5d40) at dl-libc.c:46
       objname = 0x563a645e0713 "pbx.c"
       last_errstring = 0x0
       malloced = false
       result = <optimized out>
#7  0x00007f867b3391d6 in __GI___libc_dlopen_mode (name=name@entry=0x7f867b735a0f "libgcc_s.so.1", mode=mode@entry=-2147483646) at dl-libc.c:195
       args = {name = 0x7f867b735a0f "libgcc_s.so.1", mode = -2147483646, caller_dlopen = 0x7f867b732c7b <pthread_cancel_init+43>, map = 0x7f8674000020}
#8  0x00007f867b732c7b in pthread_cancel_init () at ../sysdeps/nptl/unwind-forcedunwind.c:53
       resume = <optimized out>
       personality = <optimized out>
       forcedunwind = <optimized out>
       getcfa = <optimized out>
       handle = <optimized out>
#9  0x00007f867b732e94 in _Unwind_ForcedUnwind (exc=0x7f863d3d6d70, stop=0x7f867b730f30 <unwind_stop>, stop_argument=0x7f863d3d5e30) at ../sysdeps/nptl/unwind-forcedunwind.c:127
       forcedunwind = <optimized out>
#10 0x00007f867b7310a0 in __GI___pthread_unwind (buf=<optimized out>) at unwind.c:121
       ibuf = <optimized out>
       self = <optimized out>
#11 0x00007f867b7291f5 in __do_cancel () at pthreadP.h:305
#12 0x00007f867b7291f5 in __pthread_exit (value=value@entry=0x0) at pthread_exit.c:28
#13 0x0000563a644c59f1 in pbx_thread (data=data@entry=0x7f86701fefb0) at pbx.c:4698
       c = 0x7f86701fefb0
#14 0x0000563a64542fec in dummy_start (data=<optimized out>) at utils.c:1249
       __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {140215317300000, 5652163074533400047, 140214527728382, 140214527728383, 140214529779456, 140215116599168, -5657367601009615377, -2160517297729790481}, __mask_was_saved = 0}}, __pad = {0x7f863d3d5ed0, 0x0, 0x7f863d3d6700, 0x7c000}}
       __cancel_arg = 0x7f863d3d6700
       __not_first_call = <optimized out>
       ret = <optimized out>
       a = {start_routine = 0x563a644c5980 <pbx_thread>, data = 0x7f86701fefb0, name = <optimized out>}
       __PRETTY_FUNCTION__ = "dummy_start"
#15 0x00007f867b727fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
       ret = <optimized out>
       pd = <optimized out>
       now = <optimized out>
       unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140214529779456, 5652163074533400047, 140214527728382, 140214527728383, 140214529779456, 140215116599168, -5657367600982352401, -5657520887340556817}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
       not_first_call = <optimized out>
#16 0x00007f867b2fe4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
{noformat}
Comments:By: Asterisk Team (asteriskteam) 2020-10-09 04:04:28.508-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

Please note that by submitting data, code, or documentation to Sangoma through JIRA, you accept the Terms of Use present at [https://www.asterisk.org/terms-of-use/|https://www.asterisk.org/terms-of-use/].

By: Sean Bright (seanbright) 2020-10-09 15:26:03.181-0500

When did this start? Were you running a different version of Asterisk before that was not having this problem?

By: Dennis Haney (CineCine) 2020-10-11 23:41:21.625-0500

Not sure, it may have been there for a while.
We copied this VM from another hosting environment, but that is less busy, so the daily reboot may have hidden the problem.
I can reproduce it with the following:

{noformat}
 == Using SIP RTP CoS mark 5
   -- Called TESTUSER
   -- SIP/TESTUSER-00000003 is ringing
   -- SIP/TESTUSER-00000003 is ringing
   -- SIP/TESTUSER-00000003 answered
   -- Executing [4111239@MeetMeInvite:1] Answer("SIP/TESTUSER-00000003", "") in new stack
   -- Executing [4111239@MeetMeInvite:2] Set("SIP/TESTUSER-00000003", "CONFBRIDGE(user,quiet)=no") in new stack
   -- Executing [4111239@MeetMeInvite:3] Set("SIP/TESTUSER-00000003", "CONFBRIDGE(user,marked)=yes") in new stack
   -- Executing [4111239@MeetMeInvite:4] Set("SIP/TESTUSER-00000003", "CONFBRIDGE(user,music_on_hold_when_empty)=yes") in new stack
   -- Executing [4111239@MeetMeInvite:5] Set("SIP/TESTUSER-00000003", "CONFBRIDGE(user,dtmf_passthrough)=yes") in new stack
   -- Executing [4111239@MeetMeInvite:6] Set("SIP/TESTUSER-00000003", "CONFBRIDGE(user,dsp_drop_silence)=yes") in new stack
   -- Executing [4111239@MeetMeInvite:7] UserEvent("SIP/TESTUSER-00000003", "EnteredMeetMe, Channel: SIP/TESTUSER-00000003, MeetMeRoom: 4111239") in new stack
   -- Executing [4111239@MeetMeInvite:8] ConfBridge("SIP/TESTUSER-00000003", "4111239") in new stack
   -- Channel CBAnn/4111239-00000002;2 joined 'softmix' base-bridge <1144e8e1-a7de-429c-b21c-49b4967d0e67>
   -- Started music on hold, class 'default', on channel 'SIP/TESTUSER-00000003'
   -- Channel SIP/TESTUSER-00000003 joined 'softmix' base-bridge <1144e8e1-a7de-429c-b21c-49b4967d0e67>
   -- <CBAnn/4111239-00000002;1> Playing 'beep.gsm' (language 'en')
   -- Called PRIVACY@USRCall/n
   -- Executing [PRIVACY@USRCall:1] Set("Local/PRIVACY@USRCall-00000003;2", "USR=PRIVACY") in new stack
   -- Executing [PRIVACY@USRCall:2] Verbose("Local/PRIVACY@USRCall-00000003;2", "0,USRCall") in new stack
USRCall
   -- Executing [PRIVACY@USRCall:3] Verbose("Local/PRIVACY@USRCall-00000003;2", "0,") in new stack

   -- Executing [PRIVACY@USRCall:4] Dial("Local/PRIVACY@USRCall-00000003;2", "SIP/mysipprovider-out/00PRIVACY,30") in new stack
 == Using SIP RTP CoS mark 5
   -- Called SIP/mysipprovider-out/00PRIVACY
   -- SIP/mysipprovider-out-00000004 is ringing
   -- Local/PRIVACY@USRCall-00000003;1 is ringing
   -- SIP/mysipprovider-out-00000004 answered Local/PRIVACY@USRCall-00000003;2
   -- Local/PRIVACY@USRCall-00000003;1 answered
   -- Executing [4111239@MeetMeUSR:1] Answer("Local/PRIVACY@USRCall-00000003;1", "") in new stack
   -- Executing [4111239@MeetMeUSR:2] Set("Local/PRIVACY@USRCall-00000003;1", "CONFBRIDGE(user,quiet)=yes") in new stack
   -- Executing [4111239@MeetMeUSR:3] Set("Local/PRIVACY@USRCall-00000003;1", "CONFBRIDGE(user,end_marked)=yes") in new stack
   -- Executing [4111239@MeetMeUSR:4] Set("Local/PRIVACY@USRCall-00000003;1", "CONFBRIDGE(user,dtmf_passthrough)=yes") in new stack
   -- Executing [4111239@MeetMeUSR:5] Set("Local/PRIVACY@USRCall-00000003;1", "CONFBRIDGE(user,dsp_drop_silence)=yes") in new stack
   -- Executing [4111239@MeetMeUSR:6] UserEvent("Local/PRIVACY@USRCall-00000003;1", "USREnteredMeetMe, CreatorChannel: Local/PRIVACY@USRCall-00000003;1, Channel: SIP/TESTUSER-00000003, MeetMeRoom: 4111239, RpqId: 147703783, USRNumber: PRIVACY,USRPrimaryKey:4111239") in new stack
   -- Executing [4111239@MeetMeUSR:7] ConfBridge("Local/PRIVACY@USRCall-00000003;1", "4111239") in new stack
   -- Stopped music on hold on SIP/TESTUSER-00000003
   -- Channel Local/PRIVACY@USRCall-00000003;1 joined 'softmix' base-bridge <1144e8e1-a7de-429c-b21c-49b4967d0e67>
   -- Channel SIP/mysipprovider-out-00000004 joined 'simple_bridge' basic-bridge <1aec3a06-d34d-4d80-a4e0-11fd03fe7c3d>
   -- Channel Local/PRIVACY@USRCall-00000003;2 joined 'simple_bridge' basic-bridge <1aec3a06-d34d-4d80-a4e0-11fd03fe7c3d>
   -- Channel SIP/mysipprovider-out-00000004 left 'simple_bridge' basic-bridge <1aec3a06-d34d-4d80-a4e0-11fd03fe7c3d>
   -- Channel Local/PRIVACY@USRCall-00000003;2 left 'simple_bridge' basic-bridge <1aec3a06-d34d-4d80-a4e0-11fd03fe7c3d>
 == Spawn extension (USRCall, PRIVACY, 4) exited non-zero on 'Local/PRIVACY@USRCall-00000003;2'
   -- Executing [h@USRCall:1] UserEvent("Local/PRIVACY@USRCall-00000003;2", "USRLeavedMeetMe, Channel: SIP/TESTUSER-00000003") in new stack
   -- Executing [h@USRCall:2] UserEvent("Local/PRIVACY@USRCall-00000003;2", "USRHangup, USR: PRIVACY, Reason: ANSWER, Channel: SIP/TESTUSER-00000003") in new stack
   -- Channel Local/PRIVACY@USRCall-00000003;1 left 'softmix' base-bridge <1144e8e1-a7de-429c-b21c-49b4967d0e67>
   -- Started music on hold, class 'default', on channel 'SIP/TESTUSER-00000003'
   -- Stopped music on hold on SIP/TESTUSER-00000003
   -- Channel SIP/TESTUSER-00000003 left 'softmix' base-bridge <1144e8e1-a7de-429c-b21c-49b4967d0e67>
   -- <CBAnn/4111239-00000002;1> Playing 'beep.gsm' (language 'en')
   -- Channel CBAnn/4111239-00000002;2 left 'softmix' base-bridge <1144e8e1-a7de-429c-b21c-49b4967d0e67>
   -- Executing [h@MeetMeInvite:1] Hangup("SIP/TESTUSER-00000003", "") in new stack
 == Spawn extension (MeetMeInvite, h, 1) exited non-zero on 'SIP/TESTUSER-00000003'
{noformat}

{noformat}
[MeetMeInvite]
exten => _X!,1,Answer()
same => n,Set(CONFBRIDGE(user,quiet)=no)
same => n,Set(CONFBRIDGE(user,marked)=yes)
same => n,Set(CONFBRIDGE(user,music_on_hold_when_empty)=yes)
same => n,Set(CONFBRIDGE(user,dtmf_passthrough)=yes)
same => n,Set(CONFBRIDGE(user,dsp_drop_silence)=yes)
same => n,UserEvent(EnteredMeetMe, Channel: ${CHANNEL}, MeetMeRoom: ${EXTEN})
same => n,ConfBridge(${EXTEN})

exten => s,1,Hangup()
exten => h,1,Hangup()
exten => failed,1,Hangup()

[USRCall]
exten => _X!,1,Set(USR=${EXTEN})
same => n,Verbose(0,USRCall)
same => n,Verbose(0,${SIPDOMAIN})
exten => _X!,n,Dial(SIP/mysipprovider-out/00${EXTEN},30)

exten => h,1,UserEvent(USRLeavedMeetMe, Channel: ${MyChannel})
exten => h,n,UserEvent(USRHangup, USR: ${USR}, Reason: ${DIALSTATUS}, Channel: ${MyChannel})

exten => failed,1,Hangup()
{noformat}


By: Kevin Harwell (kharwell) 2020-10-12 14:53:48.003-0500

Please attach associated configuration files (sip.conf, confbridge.conf, etc..), or relevant sections from those.

Also please execute the {{ast_coredumper}} script [1] against the core file and/or running Asterisk at the time, and attach its output here.

Lastly, please attach a full Asterisk log with debug level 5 enabled if possible from an occurrence.

[1] https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace

By: Dennis Haney (CineCine) 2020-10-16 04:57:48.730-0500

We have solved it temporarily by giving the machine 6gig of memory, which is enough to have it survive a single day and can recover during the nightly reboot.
I will attempt to get the above for you with minimal interference sometime next week.

By: Dennis Haney (CineCine) 2020-10-26 10:16:53.213-0500

here is dump from production. I havent had time to repro in test environment

By: Kevin Harwell (kharwell) 2020-10-26 10:30:12.995-0500

Please execute the {{ast_coredumper}} script [1] on the core file, and then attach that output here.

[1] https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace

Thanks!

By: Dennis Haney (CineCine) 2020-10-27 23:25:34.054-0500

What is the problem with the attachment I sent yesterday?

By: Sean Bright (seanbright) 2020-10-28 09:46:44.321-0500

[~CineCine] - A core file by itself is not sufficient for debugging anything. {{ast_coredumper}} bundles everything up that we need to determine the cause of the crash.

By: Kevin Harwell (kharwell) 2020-10-28 09:52:48.618-0500

The attachment just contains the core file. We cannot make use of that locally without all the backing symbol files. Even then it can be a tricky endeavor to load correctly.

The simplest thing is to use the core file from the system it was produced on. The script will automatically extract information (e.g. backtrace) from the core that could be helpful.

Edit: Or what [~seanbright] said :-)

By: Asterisk Team (asteriskteam) 2020-11-11 12:00:00.976-0600

Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidlines [1].

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines