[Home]

Summary:ASTERISK-24727: PJSIP: Crash experienced during multi-Asterisk transfer scenario.
Reporter:Mark Michelson (mmichelson)Labels:
Date Opened:2015-01-27 11:35:00.000-0600Date Closed:2015-02-13 11:23:43.000-0600
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Resources/res_pjsip
Versions:13.1.0 Frequency of
Occurrence
Occasional
Related
Issues:
Environment:Attachments:( 0) extensions.conf.levin
( 1) extensions.conf.santigold
( 2) gdb.txt
( 3) pjsip.conf.levin
( 4) pjsip.conf.santigold
Description:While trying to reproduce a reported deadlock, I managed to make a crash occur from time to time.

The scenario involves two Asterisk boxes (called "levin" and "santigold" since those are my server names) and three phones (200, 201, and 202)
Phones 201 and 202 are registered to levin (10.24.20.249)
Phone 200 is registered to santigold (10.24.16.50)

The execution is simple. Phone 201 places a call to 200. 200 performs an attended transfer to phone 202. Once 201 and 202 are bridged, 201 hangs up the call. Once the call is hung up, Asterisk will sometimes crash. The last message printed is:

{noformat}
[Jan 27 10:57:22] WARNING[12002]: res_pjsip_session.c:534 send_delayed_request: Unexpected delayed  request with no existing request structure
*** Error in `asterisk': double free or corruption (fasttop): 0x00007f22800431f0 ***
{noformat}

On one instance of the crash, I also got this printed at crash time:
{noformat}
[Jan 27 10:20:00] WARNING[11390]: res_pjsip_session.c:534 send_delayed_request: Unexpected delayed  request with no existing request structure
[Jan 27 10:20:00] ERROR[11390]: astobj2.c:126 INTERNAL_OBJ: bad magic number for object 0x7f2e100537a0. Object is likely destroyed.
[Jan 27 10:20:00] ERROR[11390]: astobj2.c:130 INTERNAL_OBJ: FRACK!, Failed assertion 0 (0)
Got 19 backtrace records
#0: [0x5fe5ef] asterisk(__ast_assert_failed+0x84) [0x5fe5ef]
#1: [0x45be90] asterisk() [0x45be90]
#2: [0x45bf7e] asterisk() [0x45bf7e]
#3: [0x45c641] asterisk() [0x45c641]
#4: [0x45ca21] asterisk(__ao2_ref+0x2e) [0x45ca21]
#5: [0x45ca91] asterisk(__ao2_cleanup+0x24) [0x45ca91]
#6: [0x7f2e5698a39f] /usr/lib/asterisk/modules/res_pjsip_session.so(+0x939f) [0x7f2e5698a39f]
#7: [0x5e63a1] asterisk(ast_taskprocessor_execute+0x10d) [0x5e63a1]
#8: [0x5ef1f6] asterisk() [0x5ef1f6]
#9: [0x5e63a1] asterisk(ast_taskprocessor_execute+0x10d) [0x5e63a1]
#10: [0x5ed9d5] asterisk() [0x5ed9d5]
#11: [0x5eeef3] asterisk() [0x5eeef3]
#12: [0x5eecb0] asterisk() [0x5eecb0]
#13: [0x5fae5a] asterisk() [0x5fae5a]
FRACK!, Failed assertion 0 (0) at line 409 in internal_ao2_ref of astobj2.c
[Jan 27 10:20:00] ERROR[11390]: astobj2.c:409 internal_ao2_ref: FRACK!, Failed assertion 0 (0)
Got 18 backtrace records
#0: [0x5fe5ef] asterisk(__ast_assert_failed+0x84) [0x5fe5ef]
#1: [0x45be90] asterisk() [0x45be90]
#2: [0x45c66b] asterisk() [0x45c66b]
#3: [0x45ca21] asterisk(__ao2_ref+0x2e) [0x45ca21]
#4: [0x45ca91] asterisk(__ao2_cleanup+0x24) [0x45ca91]
#5: [0x7f2e5698a39f] /usr/lib/asterisk/modules/res_pjsip_session.so(+0x939f) [0x7f2e5698a39f]
#6: [0x5e63a1] asterisk(ast_taskprocessor_execute+0x10d) [0x5e63a1]
#7: [0x5ef1f6] asterisk() [0x5ef1f6]
#8: [0x5e63a1] asterisk(ast_taskprocessor_execute+0x10d) [0x5e63a1]
#9: [0x5ed9d5] asterisk() [0x5ed9d5]
#10: [0x5eeef3] asterisk() [0x5eeef3]
#11: [0x5eecb0] asterisk() [0x5eecb0]
#12: [0x5fae5a] asterisk() [0x5fae5a]
*** Error in `asterisk': free(): invalid pointer: 0x00007f2e100537a0 ***
{noformat}

It appears that a race condition is sometimes causing a double-free of a ref-counted object. Based on the send_delayed_request warning message and the backtrace, I think that a delayed request is getting double-freed.

I am going to attach a backtrace, as well as pjsip.conf and extensions.conf files necessary to reproduce the scenario. Note that the crash is pretty infrequent, about 1 in 15-20 attempts.
Comments:By: Mark Michelson (mmichelson) 2015-01-27 11:47:04.810-0600

I have attached 5 files:

pjsip.conf.levin: pjsip.conf from the "levin" Asterisk server
pjsip.conf.santigold: pjsip.conf from the "santigold" Asterisk server
extensions.conf.levin: extensions.conf from the "levin" Asterisk server
extensions.conf.santigold: extensions.conf from the "santigold" Asterisk server.

These files have been stripped down a bit since both servers actually have a lot more content in them than the involved endpoints and extensions. The main things to note here are that all call legs involved use ulaw as the only allowed codec. Also note that phone 201 on levin and phone 200 on santigold have direct_media disabled. This may play a factor in the crash, but it may not.

I have also attached gdb.txt. This has the bt, bt full, thread apply bt, and thread apply all bt full output from a gdb session. For the time being, I still have the core file from the latest crash on this system and can get further information if it is required.