
Summary: ASTERISK-24212: testsuite: Sporadic crash due to assert on stopping RTP engine
Reporter: Matt Jordan (mjordan)
Date Opened: 2014-08-12 09:36:58
Date Closed: 2014-09-02 13:18:13
Priority: Major
Status: Closed/Complete
Components: Channels/chan_pjsip, Resources/res_rtp_asterisk, Tests/testsuite
Related Issues: causes ASTERISK-25023 Deadlock in chan_sip in update_provisional_keepalive
Attachments: (0) backtrace_12706.txt, (1) full.txt
Description: Periodically, when an RTP instance is stopped, the scheduler triggers an assertion:

{noformat}
#4  0x00000000006c6761 in _ast_assert (con=0x20885f0, id=4, file=0x7fc5de39cbb1 "res_rtp_asterisk.c", line=4590, function=0x7fc5de39edf3 "ast_rtp_stop") at /srv/bamboo/xml-data/build-dir/AST-ATSF4-C664TE/asterisk/include/asterisk/utils.h:810
No locals.
#5  _ast_sched_del (con=0x20885f0, id=4, file=0x7fc5de39cbb1 "res_rtp_asterisk.c", line=4590, function=0x7fc5de39edf3 "ast_rtp_stop") at sched.c:489
       buf = "s != NULL, id=4\000\322\002\000\000\001\000\000\000\060V\004\004\306\177\000\000\020V\004\004\306\177\000\000`\363\337\351\305\177\000\000xF\000\020\306\177\000\000P\364\337\351\305\177\000\000\330\004:\353\061\063\000\000\000\365\337\351\305\177\000\000\377\000\000\000\000\000\000\000\227\367\000\020\306\177\000\000\367\365\337", <incomplete sequence \351>
       s = 0x0
       tmp = {list = {next = 0x0}, id = 4, when = {tv_sec = 0, tv_usec = 0}, resched = 0, variable = 0, data = 0x0, callback = 0, __heap_index = 0}
       last_id = 0x7fc6100b9390
       __PRETTY_FUNCTION__ = "_ast_sched_del"
#6  0x00007fc5de38e815 in ast_rtp_stop (instance=0x7fc604045ab8) at res_rtp_asterisk.c:4590
       rtp = 0x7fc60404a510
       addr = {ss = {ss_family = 0, __ss_align = 0, __ss_padding = '\000' <repeats 111 times>}, len = 0}
       __PRETTY_FUNCTION__ = "ast_rtp_stop"
#7  0x000000000068e7ec in ast_rtp_instance_stop (instance=0x7fc604045ab8) at rtp_engine.c:1037
No locals.
#8  0x00007fc5d8eb3bcf in stream_destroy (session_media=0x7fc610045568) at res_pjsip_sdp_rtp.c:1170
No locals.
#9  0x00007fc5eb39cc41 in session_media_dtor (obj=<value optimized out>) at res_pjsip_session.c:1038
       session_media = 0x7fc610045568
{noformat}

Somehow we appear to have a valid RTCP scheduler ID, but when we try to delete it, the scheduler context has no entry corresponding to it (note `s = 0x0` in frame #5).
Comments:

By: Mark Michelson (mmichelson) 2014-08-21 18:31:00.501-0500

The cause of this is actually pretty simple. The session happens to be destroyed at the same moment that a scheduled RTCP transmission is running. While its callback is executing, the RTCP entry is not in the scheduler heap, so the scheduler cannot find it to delete it. In the testsuite, DO_CRASH is enabled, so the failed assertion crashes Asterisk and the test fails.

Fixing this will be interesting. A good start would be to have the scheduler context track which task it is currently running, so that an attempt to delete that task's entry can be detected. That alone lets us recognize the situation instead of failing an assertion. What we do once we detect it is a different story. I think the easiest thing to do would be to mark the scheduler entry so that it does not re-enter the heap, and to report the deletion as successful. The existing locking in the scheduler should prevent race conditions, and the party deleting the scheduler entry should presumably also be unreffing or freeing the data attached to it.