Summary: | ASTERISK-23012: crash in pjsip_transport_dec_ref when called from rx_task_data_destroy in res_pjsip_registrar |
Reporter: | Matt Jordan (mjordan) | Labels: | |
Date Opened: | 2013-12-17 06:44:54.000-0600 | Date Closed: | 2014-09-24 12:17:33 |
Priority: | Major | Regression? | |
Status: | Closed/Complete | Components: | Resources/res_pjsip_registrar |
Versions: | 12.0.0-beta2 | Frequency of Occurrence | |
Related Issues: | |
Environment: | | Attachments: | (0) backtrace_6600.txt (1) backtrace_9154.txt |
Description: | Found by the Test Suite in channels/pjsip/registration/inbound/nominal/mixed/unauthed:
https://bamboo.asterisk.org/bamboo/browse/AST-ATSF4-C632TE-100/test/case/1413778
Backtrace is attached. |
Comments: |
By: Matt Jordan (mjordan) 2014-02-03 09:19:34.259-0600
Another more recent crash: https://bamboo.asterisk.org/bamboo/browse/AST-ATSF-C632TE-157/artifact

By: Matt Jordan (mjordan) 2014-03-13 15:55:04.019-0500
We decided this was most likely being caused by the same problem as ASTERISK-22994. Once that is fixed, this should be fixed as well. This was a lot harder to reproduce than the other issue. If it turns out we were wrong, we'll reopen this.

By: Kevin Harwell (kharwell) 2014-03-13 16:03:42.387-0500
At this time I have been unable to reproduce the problem after 1000+ runs of the test that uncovered the problem to begin with and the test has not crashed on the build agents in over a month, so for now the issue is going to be closed out. Hard to fix something that is not showing to be broken (well any more).

But just in case this gets reopened:

Looking at the _pjsip_transport_dec_ref_ function (in sip_transport.c) there is an assert check on the transport ref count. It expects it to be greater than zero upon entering the function. So the ref on the transport is decremented one too many times before entering this function. At a quick glance all transport dec refs seem to be associated with a corresponding add ref. However (this is just a guess at some direction and only a guess after a cursory look at the code), in the _pjsip_rx_data_clone_ function a shallow copy of the transport takes place and then before bumping transport ref message headers are copied. Perhaps a race condition? I am not sure, although there seems to be no locking going on, but I may have missed something.

By: Matt Jordan (mjordan) 2014-04-28 12:00:27.754-0500
Another failure was caught in the {{channels/pjsip/registration/inbound/off-nominal/no_contact_header}} test: https://bamboo.asterisk.org/bamboo/browse/AST-ATSF-C632TE-233
Backtrace attached |
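
For anyone picking this up if it is reopened, the failure mode described in the 2014-03-13 comment (a transport reference released one more time than it was taken) can be sketched in isolation. This is a minimal illustration only and does not use PJSIP code or its API: every `demo_*` name is hypothetical, the counter is a plain C11 atomic, and the release function returns `false` where, according to the comment, the real function asserts. It also shows a clone that takes its reference in the same step as the pointer copy, which is the ordering the comment speculates about.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a ref-counted transport (not pjsip_transport). */
struct demo_transport {
    atomic_int ref_cnt;
};

/* Hypothetical stand-in for received data that points at a transport. */
struct demo_rdata {
    struct demo_transport *tp;
};

static void demo_transport_add_ref(struct demo_transport *tp)
{
    atomic_fetch_add(&tp->ref_cnt, 1);
}

/* Drops one reference. Returns false if the count was already zero,
 * which is the condition the real assert is reported to guard against. */
static bool demo_transport_dec_ref(struct demo_transport *tp)
{
    int cur = atomic_load(&tp->ref_cnt);
    while (cur > 0) {
        /* On failure, cur is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak(&tp->ref_cnt, &cur, cur - 1)) {
            return true;
        }
    }
    return false;
}

/* Clone that takes its reference immediately after copying the pointer,
 * before doing any other (potentially slow) copying work. */
static void demo_rdata_clone(const struct demo_rdata *src, struct demo_rdata *dst)
{
    dst->tp = src->tp;
    demo_transport_add_ref(dst->tp);
}

static bool demo_rdata_destroy(struct demo_rdata *rdata)
{
    return demo_transport_dec_ref(rdata->tp);
}

/* One add_ref followed by two dec_refs: the second release is the extra one.
 * Returns a bitmask: bit 0 = first release ok, bit 1 = second release ok. */
static int demo_unbalanced_release(void)
{
    struct demo_transport tp;
    atomic_init(&tp.ref_cnt, 0);
    demo_transport_add_ref(&tp);

    bool first = demo_transport_dec_ref(&tp);
    bool second = demo_transport_dec_ref(&tp);
    return (first ? 1 : 0) | (second ? 2 : 0);
}

/* Original plus clone, each destroyed once: both releases succeed
 * and the count ends at zero. */
static bool demo_balanced_clone(void)
{
    struct demo_transport tp;
    struct demo_rdata original = { &tp };
    struct demo_rdata copy;

    atomic_init(&tp.ref_cnt, 0);
    demo_transport_add_ref(&tp);          /* reference held by original */
    demo_rdata_clone(&original, &copy);   /* reference held by copy */

    bool ok = demo_rdata_destroy(&copy) && demo_rdata_destroy(&original);
    return ok && atomic_load(&tp.ref_cnt) == 0;
}
```

In this sketch `demo_unbalanced_release()` reports that only the first release succeeds, while `demo_balanced_clone()` shows the paired add/release sequence leaving the count at zero. Whether the real crash comes from an unpaired release or from the ordering inside the clone is still unconfirmed, as noted in the comments.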