[Home]

Summary:ASTERISK-26679: Crash on invalid contact domain (pjsip aor)
Reporter:Dmitriy (kasper)Labels:
Date Opened:2016-12-27 13:02:36.000-0600Date Closed:2017-02-02 09:36:28.000-0600
Priority:MajorRegression?
Status:Closed/CompleteComponents:Resources/res_pjsip
Versions:13.13.1 Frequency of
Occurrence
Frequent
Related
Issues:
is duplicated byASTERISK-26746 pjsip: Repeated segfaults with not responding nameserver
Environment:debian 8.1Attachments:( 0) backtrace.txt
Description:contact=sip:sipuser

{quote}
[Dec 27 21:38:38] WARNING[24817]: pjproject:0 <?>:      tsx0x7f5494015 Failed to send Request msg OPTIONS/cseq=20579 (tdta0x7f549401d650)! err=320053 (DNS "Name Error" (PJLIB_UTIL_EDNS_NXDOMAIN))
[Dec 27 21:39:38] WARNING[24818]: pjproject:0 <?>:      tsx0x7f5494015 .Failed to send Request msg OPTIONS/cseq=59404 (tdta0x7f549401d650)! err=320053 (DNS "Name Error" (PJLIB_UTIL_EDNS_NXDOMAIN))
[Dec 27 21:39:38] ERROR[24818]: res_pjsip.c:3591 endpt_send_request: Error 320053 'DNS "Name Error" (PJLIB_UTIL_EDNS_NXDOMAIN)' sending OPTIONS request to endpoint extline-42
[Dec 27 21:39:38] ERROR[24818]: astobj2.c:131 INTERNAL_OBJ: FRACK!, Failed assertion bad magic number 0x0 for object 0x7f549401d5b0 (0)
Got 19 backtrace records
#0: [0x5f9365] main/utils.c:2459 __ast_assert_failed() (0x5f92d8+8D)
#1: [0x45c06c] main/astobj2.c:132 INTERNAL_OBJ()
#2: [0x45c0c7] main/astobj2.c:152 __ao2_lock() (0x45c09c+2B)
#3: [0x7f547f66cb6a] res/res_pjsip.c:3600 endpt_send_request()
#4: [0x7f547f66d022] res/res_pjsip.c:3720 ast_sip_send_out_of_dialog_request() (0x7f547f66ce71+1B1)
#5: [0x7f547f670ffb] res_pjsip/pjsip_options.c:444 qualify_contact()
#6: [0x7f547f671155] res_pjsip/pjsip_options.c:519 qualify_contact_task()
#7: [0x5e0e83] main/taskprocessor.c:969 ast_taskprocessor_execute() (0x5e0d73+110)
#8: [0x5e9942] main/threadpool.c:1322 execute_tasks()
#9: [0x5e0e83] main/taskprocessor.c:969 ast_taskprocessor_execute() (0x5e0d73+110)
#10: [0x5e7c6f] main/threadpool.c:351 threadpool_execute()
#11: [0x5e92c6] main/threadpool.c:1105 worker_active()
#12: [0x5e907f] main/threadpool.c:1025 worker_start()
#13: [0x5f6477] main/utils.c:1235 dummy_start()
Pbx*CLI>
Disconnected from Asterisk server
Asterisk cleanly ending (0).
Executing last minute cleanups
root@Pbx:/tmp#
{quote}
Comments:By: Asterisk Team (asteriskteam) 2016-12-27 13:02:37.155-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Mark Michelson (mmichelson) 2016-12-28 12:49:21.521-0600

Linking this with ASTERISK-26675 and ASTERISK-26669 since they may originate from the same issue.

By: Mark Michelson (mmichelson) 2016-12-28 13:04:29.112-0600

We require additional debug to continue with triage of your issue. Please follow the instructions on the wiki [1] for how to collect debugging information from Asterisk. For expediency, where possible, attach the debug with a '.txt' file extension so that the debug will be usable for further analysis.

Thanks!

[1] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

The backtrace is incredibly helpful, but a full debug will provide us with more context for the crash, and we might be able to reproduce or solve the issue better.

By: Jacek Konieczny (jkonieczny) 2017-01-10 02:47:26.872-0600

I have experienced similar crashes due to a misconfigured contact hostname or broken nameservers. The last one:

{noformat}
# coredumpctl  gdb -1                                                                                                                                        
          PID: 8722 (asterisk)
          UID: 188 (asterisk)
          GID: 188 (asterisk)
       Signal: 11 (SEGV)
    Timestamp: Tue 2017-01-10 01:29:19 CET (8h ago)
 Command Line: /usr/sbin/asterisk -U asterisk -G asterisk -g -f
   Executable: /usr/sbin/asterisk
Control Group: /
        Slice: -.slice
      Boot ID: ec363aa2f81a46d998f3dec80b9b10f3
   Machine ID: 1b7ba458f42347e897c9190248b35978
     Hostname: x12c-4puntnuldemo
     Coredump: /var/lib/systemd/coredump/core.asterisk.188.ec363aa2f81a46d998f3dec80b9b10f3.8722.1484008159000000.lz4
      Message: Process 8722 (asterisk) of user 188 dumped core.
               
               Stack trace of thread 8754:
               #0  0x00000000f63b06e1 cancel (libpj.so.2)
               #1  0x00000000f63b0d2b cancel_timer (libpj.so.2)
               #2  0x00000000f635db90 endpt_send_request (res_pjsip.so)
               #3  0x00000000f6360a43 ast_sip_send_out_of_dialog_request (res_pjsip.so)
               #4  0x00000000f636460c qualify_contact (res_pjsip.so)
               #5  0x00000000f63648eb qualify_contact_task (res_pjsip.so)
               #6  0x00000000081d741c ast_taskprocessor_execute (asterisk)
               #7  0x00000000081de52f execute_tasks (asterisk)
               #8  0x00000000081d741c ast_taskprocessor_execute (asterisk)
               #9  0x00000000081dde9f threadpool_execute (asterisk)
               #10 0x00000000081e6910 dummy_start (asterisk)
               #11 0x00000000f70e5220 start_thread (libpthread.so.0)
               #12 0x00000000f6f885ee __clone (libc.so.6)
               
[...]
Program terminated with signal SIGSEGV, Segmentation fault.
#0  cancel (ht=ht@entry=0x866280c, entry=entry@entry=0x6478302b, flags=flags@entry=7) at ../src/pj/timer.c:331
331  if (entry->_timer_id < 0 || (pj_size_t)entry->_timer_id > ht->max_size) {
$1 = (pj_timer_entry *) 0x6478302b
(gdb) print entry->_timer_id
Cannot access memory at address 0x64783037
(gdb) up
#1  0xf63b0d2b in cancel_timer (id_val=id_val@entry=0, flags=6, entry=0x6478302b, ht=0x866280c) at ../src/pj/timer.c:581
581    count = cancel(ht, entry, flags | F_DONT_CALL);
(gdb) print *entry
Cannot access memory at address 0x6478302b
(gdb) up
#2  pj_timer_heap_cancel_if_active (ht=0x866280c, entry=entry@entry=0x6478302b, id_val=id_val@entry=0) at ../src/pj/timer.c:605
605    return cancel_timer(ht, entry, F_SET_ID | F_DONT_ASSERT, id_val);
(gdb) up
#3  0xf635db90 in endpt_send_request (endpoint=endpoint@entry=0x89bc080, tdata=tdata@entry=0xf3b3e91c, timeout=timeout@entry=3000, token=token@entry=0x8979600,
   cb=0xf635dc09 <send_request_cb>) at res_pjsip.c:3600
3600 timers_cancelled = pj_timer_heap_cancel_if_active(
(gdb) print req_wrapper
$2 = (struct send_request_wrapper *) 0x89a64f0
(gdb) print *req_wrapper
$3 = {
 token = 0x726f7373,
 callback = 0x6578655f,
 cb_called = 1702131043,
 timeout_timer = 0x6478302b,
 timeout = 1528834402,
 tdata = 0x31387830
}
(gdb) print *req_wrapper->timeout_timer
Cannot access memory at address 0x6478302b
{noformat}

And the log fragment:
{noformat}
Jan 10 01:29:19 x12c-4puntnuldemo asterisk[8722]: ERROR[8754]: res_pjsip.c:3591 in endpt_send_request: Error 320053 'DNS "Name Error" (PJLIB_UTIL_EDNS_NXDOMAIN)' sending OPTIONS request
to endpoint GSMINBOUND
Jan 10 01:29:19 x12c-4puntnuldemo asterisk[8722]: [Jan 10 01:29:19] ERROR[8754]: res_pjsip.c:3591 endpt_send_request: Error 320053 'DNS "Name Error" (PJLIB_UTIL_EDNS_NXDOMAIN)' sending
OPTIONS request to endpoint GSMINBOUND
Jan 10 01:29:19 x12c-4puntnuldemo asterisk[8722]: FRACK!, Failed assertion bad magic number 0x0 for object 0x89a64f0 (0) at line 131 in INTERNAL_OBJ of astobj2.c
Jan 10 01:29:19 x12c-4puntnuldemo asterisk[8722]: ERROR[8754]: astobj2.c:131 in INTERNAL_OBJ: FRACK!, Failed assertion bad magic number 0x0 for object 0x89a64f0 (0)
Jan 10 01:29:19 x12c-4puntnuldemo asterisk[8722]: [Jan 10 01:29:19] ERROR[8754]: astobj2.c:131 INTERNAL_OBJ: FRACK!, Failed assertion bad magic number 0x0 for object 0x89a64f0 (0)
{noformat}

There are more 'DNS "Name Error"' and 'FRACK!' in the logs, it does not end with an Asterisk crash every time.

By: George Joseph (gjoseph) 2017-01-31 10:24:45.336-0600

I can reproduce this at will.

Simply create an aor with a bad hostname and either wait for it to qualify or do a 'pjsip qualify' from the CLI.    May take 2 tries.

An outbound call to the aor seems to fail gracefully so it may have to do with an out-of-dialog request.
.


By: George Joseph (gjoseph) 2017-01-31 11:52:21.521-0600

patch up courtesy of [~jcolp]
https://gerrit.asterisk.org/4846

By: Friendly Automation (friendly-automation) 2017-02-02 09:36:28.722-0600

Change 4847 merged by Joshua Colp:
res_pjsip: Handle invocation of callback on outgoing request when error occurs.

[https://gerrit.asterisk.org/4847|https://gerrit.asterisk.org/4847]

By: Friendly Automation (friendly-automation) 2017-02-02 10:45:55.948-0600

Change 4848 merged by zuul:
res_pjsip: Handle invocation of callback on outgoing request when error occurs.

[https://gerrit.asterisk.org/4848|https://gerrit.asterisk.org/4848]

By: Friendly Automation (friendly-automation) 2017-02-02 10:50:55.463-0600

Change 4846 merged by Joshua Colp:
res_pjsip: Handle invocation of callback on outgoing request when error occurs.

[https://gerrit.asterisk.org/4846|https://gerrit.asterisk.org/4846]

By: Friendly Automation (friendly-automation) 2017-02-02 11:52:20.447-0600

Change 4868 merged by zuul:
res_pjsip: Handle invocation of callback on outgoing request when error occurs.

[https://gerrit.asterisk.org/4868|https://gerrit.asterisk.org/4868]