[Home]

Summary:ASTERISK-30380: res_pjsip: Crash in simple
Reporter:Reuben Farrelly (reuben)Labels:
Date Opened:2022-12-28 18:28:51.000-0600Date Closed:2022-12-29 04:53:01.000-0600
Priority:MinorRegression?Yes
Status:Closed/CompleteComponents:General
Versions:20.0.0 20.1.0 Frequency of
Occurrence
Constant
Related
Issues:
Environment:Gentoo Linux x86_64 - Asterisk ebuild in portage treeAttachments:( 0) core-asterisk-2022-12-28T23-40-58Z-brief.txt
( 1) core-asterisk-2022-12-28T23-40-58Z-full.txt
( 2) core-asterisk-2022-12-28T23-40-58Z-info.txt
( 3) core-asterisk-2022-12-28T23-40-58Z-locks.txt
( 4) core-asterisk-2022-12-28T23-40-58Z-thread1.txt
( 5) core-asterisk-2022-12-29T01-51-49Z-brief.txt
( 6) core-asterisk-2022-12-29T01-51-49Z-full.txt
( 7) core-asterisk-2022-12-29T01-51-49Z-info.txt
( 8) core-asterisk-2022-12-29T01-51-49Z-locks.txt
( 9) core-asterisk-2022-12-29T01-51-49Z-thread1.txt
Description:Asterisk is crashing frequently for me with the following printed in the system dmesg:

[463264.572879] Code: 49 89 f0 8b b7 80 00 00 00 48 8d 97 88 00 00 00 31 c9 e9 f9 d3 ff ff 66 0f 1f 84 00 00 00 00 00 41 54 41 bc 0d 9c 02 00 55 53 <48> 8b 86 88 01 00 00 8b 10 85 d2 75 5b 48 89 fb 48 8b 7f 30 48 89
[465079.782571] asterisk[5122]: segfault at 188 ip 00007f785395ed9a sp 00007f78514b5890 error 4 in libpjsip-simple.so.2[7f785395c000+a000] likely on CPU 1 (core 1, socket 0)
[465079.782580] Code: 49 89 f0 8b b7 80 00 00 00 48 8d 97 88 00 00 00 31 c9 e9 f9 d3 ff ff 66 0f 1f 84 00 00 00 00 00 41 54 41 bc 0d 9c 02 00 55 53 <48> 8b 86 88 01 00 00 8b 10 85 d2 75 5b 48 89 fb 48 8b 7f 30 48 89
[466895.025012] asterisk[5307]: segfault at 188 ip 00007fdd6445cd9a sp 00007fdd15f8a890 error 4 in libpjsip-simple.so.2[7fdd6445a000+a000] likely on CPU 3 (core 3, socket 0)
[466895.025028] Code: 49 89 f0 8b b7 80 00 00 00 48 8d 97 88 00 00 00 31 c9 e9 f9 d3 ff ff 66 0f 1f 84 00 00 00 00 00 41 54 41 bc 0d 9c 02 00 55 53 <48> 8b 86 88 01 00 00 8b 10 85 d2 75 5b 48 89 fb 48 8b 7f 30 48 89

This is with Asterisk-20.0.0 and 20.1.0-rc1.  Gentoo policy is that bundled libraries are not used so I am using the Gentoo portage pjsip (2.13).

I have followed the instructions around generating a backtrace and will attach these soon.  However in short this is what is printed:

Thread 71 (Thread 0x7f7865b6f6c0 (LWP 5096)):
#0  0x00007f7865cc70ca in fdatasync () at /lib64/libc.so.6
#1  0x00007f786643bc81 in unixSync () at /usr/lib64/libsqlite3.so.0
#2  0x00007f7866489aab in syncJournal.part.0 () at /usr/lib64/libsqlite3.so.0
#3  0x00007f7866495786 in sqlite3PagerCommitPhaseOne.part.0 () at /usr/lib64/libsqlite3.so.0
#4  0x00007f7866496a3b in sqlite3BtreeCommitPhaseOne.part.0 () at /usr/lib64/libsqlite3.so.0
#5  0x00007f7866498c28 in sqlite3VdbeHalt () at /usr/lib64/libsqlite3.so.0
#6  0x00007f78664d09a4 in sqlite3VdbeExec () at /usr/lib64/libsqlite3.so.0
#7  0x00007f78664d3ca6 in sqlite3_step () at /usr/lib64/libsqlite3.so.0
#8  0x00007f78664d47ad in sqlite3_exec () at /usr/lib64/libsqlite3.so.0
#9  0x000055f1869c29ec in db_execute_sql (sql=0x55f186b8563b "COMMIT", callback=0x0, arg=0x0) at db.c:315
       errmsg = 0x0
       res = 0
       __FUNCTION__ = "db_execute_sql"
#10 0x000055f1869c2a9b in ast_db_commit_transaction () at db.c:331
#11 0x000055f1869c576c in db_sync_thread (data=0x0) at db.c:1148
       __PRETTY_FUNCTION__ = "db_sync_thread"

sqlite was updated on this system from 3.39 to 3.40 on November 25th which could be related.  Either way, Asterisk should not be crashing and restarting like this.

These crashes happen every 30 minutes or so.

I do not have sqlite configured anywhere in asterisk.
Comments:By: Asterisk Team (asteriskteam) 2022-12-28 18:28:52.440-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

Please note that by submitting data, code, or documentation to Sangoma through JIRA, you accept the Terms of Use present at [https://www.asterisk.org/terms-of-use/|https://www.asterisk.org/terms-of-use/].

By: Reuben Farrelly (reuben) 2022-12-28 18:33:38.384-0600

supercell /var/lib/asterisk/coredump # /var/lib/asterisk/scripts/ast_coredumper core
Processing /var/lib/asterisk/coredump/core
Renaming /var/lib/asterisk/coredump/core to /var/lib/asterisk/coredump/core-asterisk-2022-12-28T23-40-58Z
Creating /tmp/core-asterisk-2022-12-28T23-40-58Z-thread1.txt
Creating /tmp/core-asterisk-2022-12-28T23-40-58Z-brief.txt
Creating /tmp/core-asterisk-2022-12-28T23-40-58Z-full.txt
Creating /tmp/core-asterisk-2022-12-28T23-40-58Z-locks.txt
Creating /tmp/core-asterisk-2022-12-28T23-40-58Z-info.txt
supercell

By: Joshua C. Colp (jcolp) 2022-12-28 18:54:16.458-0600

This has nothing to do with sqlite. It's crashing here:

{noformat}
Thread 1 (Thread 0x7f78514b66c0 (LWP 5122)):
#0  0x00007f785395ed9a in pjsip_evsub_send_request () at /usr/lib64/libpjsip-simple.so.2
#1  0x00007f785395f372 in mod_evsub_on_tsx_state () at /usr/lib64/libpjsip-simple.so.2
#2  0x00007f7853934fd9 in pjsip_dlg_on_tsx_state () at /usr/lib64/libpjsip.so.2
#3  0x00007f785392f00d in tsx_set_state () at /usr/lib64/libpjsip.so.2
#4  0x00007f785392ffa6 in tsx_on_state_null () at /usr/lib64/libpjsip.so.2
#5  0x00007f785393247e in pjsip_tsx_recv_msg () at /usr/lib64/libpjsip.so.2
#6  0x00007f7853934cef in pjsip_dlg_on_rx_request () at /usr/lib64/libpjsip.so.2
#7  0x00007f7853936391 in mod_ua_on_rx_request () at /usr/lib64/libpjsip.so.2
#8  0x00007f785391a1a3 in pjsip_endpt_process_rx_data () at /usr/lib64/libpjsip.so.2
#9  0x00007f78539cee7e in distribute (data=0x7f780405a898) at res_pjsip/pjsip_distributor.c:955
#10 0x000055f186ac593f in ast_taskprocessor_execute (tps=0x55f1891948c0) at taskprocessor.c:1302
#11 0x000055f186ad0316 in execute_tasks (data=0x55f1891948c0) at threadpool.c:1350
#12 0x000055f186ac593f in ast_taskprocessor_execute (tps=0x55f18859add0) at taskprocessor.c:1302
#13 0x000055f186acdbcb in threadpool_execute (pool=0x55f1885998d0) at threadpool.c:367
#14 0x000055f186acfa66 in worker_active (worker=0x7f77fc000be0) at threadpool.c:1137
#15 0x000055f186acf7c5 in worker_start (arg=0x7f77fc000be0) at threadpool.c:1056
#16 0x000055f186adad9a in dummy_start (data=0x55f188598910) at utils.c:1574
#17 0x00007f7865c51625 in start_thread () at /lib64/libc.so.6
#18 0x00007f7865cd1bec in clone3 () at /lib64/libc.so.6
{noformat}

Unfortunately PJSIP is not built with symbols, so it doesn't give any further information.

What version of PJSIP is in use? Was it updated without Asterisk being rebuilt?

By: Joshua C. Colp (jcolp) 2022-12-28 18:57:32.364-0600

This could even be happening because PJSIP was built in developer mode, and not release mode, which enables assertions that will cause a crash.

By: Joshua C. Colp (jcolp) 2022-12-28 18:59:03.987-0600

Oh, you stated PJSIP 2.13. We haven't tested/verified PJSIP 2.13 as of yet. ASTERISK-30325 is open to do so, which I hope to get to soon. I have no idea what could happen with Asterisk when used with it.

By: Reuben Farrelly (reuben) 2022-12-28 20:18:17.961-0600

Updated core outputs this time with pjsip debug symbols.   Acknowledge re: this being an untested version of pjsip.

The other updated core outputs will be uploaded shortly.

Thread 1 (Thread 0x7fc94032f6c0 (LWP 7904)):
#0  0x00007fc940bfef71 in pjsip_evsub_send_request (sub=0x7fc8cc0466f8, tdata=0x0) at ../src/pjsip-simple/evsub.c:1396
       status = 0
#1  0x00007fc940c00916 in on_tsx_state_uas (sub=0x7fc8cc0466f8, tsx=0x7fc8cc026fe8, event=0x7fc94032e9c0) at ../src/pjsip-simple/evsub.c:2228
       expires = 0x7fc8dc05ad98
       tdata = 0x7fc8cc01d978
       st_code = 200
       st_text = 0x0
       body = 0x0
       old_state = PJSIP_EVSUB_STATE_ACTIVE
       old_state_str = {ptr = 0x7fc940c091bc "ACTIVE", slen = 6}
       reason = {ptr = 0x0, slen = 0}
       rdata = 0x7fc8dc05b0e8
       msg = 0x7fc8dc06b440
       res_hdr = {prev = 0x7fc94032e890, next = 0x7fc94032e890, type = 3422840568, name = {ptr = 0x7fc941f6ede3 <grp_lock_acquire+100> "slen = 140502342232368}, sname = {ptr = 0x7fc8d0017188 "210q0017", slen = 140500459942360}, vptr = 0x7fc8d0017188}
       status = 0
       event_hdr = 0x7fc8dc05ade0
#2  0x00007fc940c00cd2 in mod_evsub_on_tsx_state (tsx=0x7fc8cc026fe8, event=0x7fc94032e9c0) at ../src/pjsip-simple/evsub.c:2333
       sub = 0x7fc8cc0466f8
#3  0x00007fc940bd602d in pjsip_dlg_on_tsx_state (dlg=0x7fc8cc03a368, tsx=0x7fc8cc026fe8, e=0x7fc94032e9c0) at ../src/pjsip/sip_dialog.c:2143
       i = 0
#4  0x00007fc940bd691e in mod_ua_on_tsx_state (tsx=0x7fc8cc026fe8, e=0x7fc94032e9c0) at ../src/pjsip/sip_ua_layer.c:186
       dlg = 0x7fc8cc03a368
#5  0x00007fc940bce0f3 in tsx_set_state (tsx=0x7fc8cc026fe8, state=PJSIP_TSX_STATE_TRYING, event_src_type=PJSIP_EVENT_RX_MSG, event_src=0x7fc8dc05b0e8, flag=0) at ../src/pjsip/sip_transaction.c:1339
       e = {prev = 0x7fc94032e9e0, next = 0x7fc941f6f523 <pj_grp_lock_add_ref+24>, type = PJSIP_EVENT_TSX_STATE, body = {timer = {entry = 0x7fc8dc05b0e8}, tsx_state = {src = {rdata = 0x7fc8dc05b0e8, tdata = 0x7fc8dc05b0e8, timer = 0x7fc8dc05b0e8, status = -603606808, data = 0x7fc8dc05b0e8}, tsx = 0x7fc8cc026fe8, prev_state = 0, type = PJSIP_EVENT_RX_MSG}, tx_msg = {tdata = 0x7fc8dc05b0e8}, tx_error = {tdata = 0x7fc8dc05b0e8, tsx = 0x7fc8cc026fe8}, rx_msg = {rdata = 0x7fc8dc05b0e8}, user = {user1 = 0x7fc8dc05b0e8, user2 = 0x7fc8cc026fe8, user3 = 0x300000000, user4 = 0x7fc94032ea00}}}
       prev_state = PJSIP_TSX_STATE_NULL
#6  0x00007fc940bd04f8 in tsx_on_state_null (tsx=0x7fc8cc026fe8, event=0x7fc94032ea50) at ../src/pjsip/sip_transaction.c:2540
       status = 32713
#7  0x00007fc940bcf124 in pjsip_tsx_recv_msg (tsx=0x7fc8cc026fe8, rdata=0x7fc8dc05b0e8) at ../src/pjsip/sip_transaction.c:1902
       event = {prev = 0x7fc8cc03c440, next = 0xdc06b468, type = PJSIP_EVENT_RX_MSG, body = {timer = {entry = 0x7fc8dc05b0e8}, tsx_state = {src = {rdata = 0x7fc8dc05b0e8, tdata = 0x7fc8dc05b0e8, timer = 0x7fc8dc05b0e8, status = -603606808, data = 0x7fc8dc05b0e8}, tsx = 0x7fc94032eaa0, prev_state = 1086136050, type = 32713}, tx_msg = {tdata = 0x7fc8dc05b0e8}, tx_error = {tdata = 0x7fc8dc05b0e8, tsx = 0x7fc94032eaa0}, rx_msg = {rdata = 0x7fc8dc05b0e8}, user = {user1 = 0x7fc8dc05b0e8, user2 = 0x7fc94032eaa0, user3 = 0x7fc940bd1ef2 <pjsip_uri_cmp+49>, user4 = 0x7fc900000006}}}
#8  0x00007fc940bd56b6 in pjsip_dlg_on_rx_request (dlg=0x7fc8cc03a368, rdata=0x7fc8dc05b0e8) at ../src/pjsip/sip_dialog.c:1785
       status = 0
       tsx = 0x7fc8cc026fe8
       processed = 0
       i = 1

By: Reuben Farrelly (reuben) 2022-12-28 20:19:53.345-0600

Updated core outputs with pjsip now rebuilt with debug symbols

By: Reuben Farrelly (reuben) 2022-12-28 20:28:55.883-0600

Confirming pjproject was built with -DNDEBUG=1 - which I gather disables developer mode - this is observed all through the build output of the pjproject package.

By: Joshua C. Colp (jcolp) 2022-12-29 03:44:42.655-0600

The assertion that is failing:

{code}
   PJ_ASSERT_RETURN(tdata->msg->type == PJSIP_REQUEST_MSG,
                    PJSIP_ENOTREQUESTMSG);
{code}

Which I would expect to turn into a return in release mode. The documentation says to use "-DNDEBUG" and not "-DNDEBUG=1" so I don't know if that would make a difference. I believe this is fundamentally a build issue outside of Asterisk and is strictly with how PJSIP is being built. I'm not that familiar with it, we determined the ideal build for Asterisk for bundled years ago at this point and it rarely changes. Our config_site.h is here[1] which generally controls things.

[1] https://github.com/asterisk/asterisk/blob/20/third-party/pjproject/patches/config_site.h

By: Reuben Farrelly (reuben) 2022-12-29 04:50:53.455-0600

I think you're right - the NDEBUG value is not defined in the config_site.h used in the Gentoo installation of pjproject but is handled (or possibly mis-handled in this case) in the ebuild via a debug flag.  I will take this up with the Gentoo maintainer.  Thanks for your help.

By: Jaco Kroon (jkroon) 2022-12-29 10:04:38.071-0600

ACK'ed and referenced here:  https://bugs.gentoo.org/888879

I believe this issue can be closed on asterisk side as "gentoo specific", and I'll sort it out on the above.