
Summary: ASTERISK-23012: crash in pjsip_transport_dec_ref when called from rx_task_data_destroy in res_pjsip_registrar
Reporter: Matt Jordan (mjordan)
Labels:
Date Opened: 2013-12-17 06:44:54.000-0600
Date Closed: 2014-09-24 12:17:33
Priority: Major
Regression?:
Status: Closed/Complete
Components: Resources/res_pjsip_registrar
Versions: 12.0.0-beta2
Frequency of Occurrence:
Related Issues: is related to ASTERISK-22994 pjsip/incoming_calls_without_auth: test bounces
Environment:
Attachments: (0) backtrace_6600.txt
(1) backtrace_9154.txt
Description: Found by the Test Suite in channels/pjsip/registration/inbound/nominal/mixed/unauthed:

https://bamboo.asterisk.org/bamboo/browse/AST-ATSF4-C632TE-100/test/case/1413778

Backtrace is attached.
Comments:

By: Matt Jordan (mjordan) 2014-02-03 09:19:34.259-0600

Another more recent crash: https://bamboo.asterisk.org/bamboo/browse/AST-ATSF-C632TE-157/artifact

By: Matt Jordan (mjordan) 2014-03-13 15:55:04.019-0500

We decided this was most likely being caused by the same problem as ASTERISK-22994. Once that is fixed, this should be fixed as well.

This was a lot harder to reproduce than the other issue. If it turns out we were wrong, we'll reopen this.

By: Kevin Harwell (kharwell) 2014-03-13 16:03:42.387-0500

At this time I have been unable to reproduce the problem after 1000+ runs of the test that originally uncovered it, and the test has not crashed on the build agents in over a month, so for now the issue is going to be closed out. It is hard to fix something that no longer shows up as broken.

But just in case this gets reopened:

Looking at the _pjsip_transport_dec_ref_ function (in sip_transport.c), there is an assert check on the transport ref count. It expects the count to be greater than zero upon entering the function. So the crash implies that the ref on the transport was decremented one too many times before entering this function. At a quick glance, all transport dec refs seem to be paired with a corresponding add ref.
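To make the failure mode concrete, here is a minimal sketch of a reference count whose decrement rejects a count that is already zero. It is not the PJSIP code: the {{demo_transport}} type and both {{demo_}} functions are hypothetical names invented for illustration, and where the real function asserts, this sketch returns -1 so the condition can be observed without aborting.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a ref-counted transport; NOT the PJSIP struct. */
typedef struct demo_transport {
    atomic_int ref_cnt;
} demo_transport;

/* Take a reference and return the new count. */
static int demo_transport_add_ref(demo_transport *tp)
{
    assert(tp != NULL);
    return atomic_fetch_add(&tp->ref_cnt, 1) + 1;
}

/*
 * Drop a reference and return the new count.
 * Returns -1 if the count was already zero on entry, which is the
 * condition the assert described above is guarding against.
 */
static int demo_transport_dec_ref(demo_transport *tp)
{
    assert(tp != NULL);
    int cur = atomic_load(&tp->ref_cnt);
    while (cur > 0) {
        /* On failure, cur is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(&tp->ref_cnt, &cur, cur - 1)) {
            return cur - 1;
        }
    }
    return -1; /* one decrement too many */
}
```

With a count initialised to zero, one add ref followed by one dec ref is balanced, and a second dec ref hits the error path. That corresponds to the "decremented one too many times" situation.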

However (this is only a guess at a direction, after a cursory look at the code), in the _pjsip_rx_data_clone_ function a shallow copy of the transport takes place, and the message headers are then copied before the transport ref is bumped. Perhaps a race condition? I am not sure. There seems to be no locking going on, but I may have missed something.
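If the ordering in the clone path did turn out to matter, one defensive arrangement would be to take the transport reference immediately after the shallow pointer copy, before any slower copying work. The sketch below only illustrates that ordering. The {{sketch_transport}} and {{sketch_rdata}} types and both functions are hypothetical, the fixed-size header buffer is a simplification, and none of this reflects how PJSIP actually implements the clone.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ref-counted transport; NOT the PJSIP struct. */
typedef struct sketch_transport {
    atomic_int ref_cnt;
} sketch_transport;

/* Hypothetical received-message record holding a transport pointer. */
typedef struct sketch_rdata {
    sketch_transport *transport;
    char headers[64]; /* simplified stand-in for parsed message headers */
} sketch_rdata;

/* Clone src into dst, pinning the transport before the header copy. */
static void sketch_rdata_clone(const sketch_rdata *src, sketch_rdata *dst)
{
    assert(src != NULL && dst != NULL);
    dst->transport = src->transport;               /* shallow pointer copy */
    atomic_fetch_add(&dst->transport->ref_cnt, 1); /* take the ref right away */
    memcpy(dst->headers, src->headers, sizeof dst->headers);
}

/* Release the clone's reference on the transport. */
static void sketch_rdata_free(sketch_rdata *rd)
{
    assert(rd != NULL && rd->transport != NULL);
    atomic_fetch_sub(&rd->transport->ref_cnt, 1);
    rd->transport = NULL;
}
```

This narrows the window between the shallow copy and the ref bump. It does not help if the source itself no longer holds a valid reference when the clone starts.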

By: Matt Jordan (mjordan) 2014-04-28 12:00:27.754-0500

Another failure was caught in the {{channels/pjsip/registration/inbound/off-nominal/no_contact_header}} test:

https://bamboo.asterisk.org/bamboo/browse/AST-ATSF-C632TE-233

Backtrace attached