
Summary: ASTERISK-20553: Locks and linked list corruption
Reporter: Octavio Ruiz (tacvbo)
Labels:
Date Opened: 2012-10-10 23:20:31
Date Closed: 2012-11-27 17:27:35.000-0600
Priority: Critical
Regression?
Status: Closed/Complete
Components: Core/PBX
Versions: 11.0.0
Frequency of Occurrence: Frequent
Related Issues:
Environment: CentOS release 6.3 (Final) Linux 2.6.32-279.5.2.el6.x86_64
Attachments:
( 0) core.1.bt
( 1) core.2.bt
( 2) core.3.bt
( 3) core.4.bt
( 4) core.5.bt
( 5) core.6.bt
( 6) core.7.bt
( 7) core.8.bt
Description: When using Asterisk 11 beta1, beta2, and release candidate 1 as a SIP-to-SIP gateway transcoding calls from G.711 to G.729 with a Sangoma D100 card, it segfaults randomly. The backtraces do not seem to tie the segfaults to the use of this card. The last version known to run stably was 1.8.12.
Comments:

By: Octavio Ruiz (tacvbo) 2012-10-10 23:44:54.006-0500

bt, bt full, thread apply all bt.

By: Matt Jordan (mjordan) 2012-10-11 08:39:55.333-0500

In each of these backtraces, {{codec_sangoma}} (most often) is holding a lot of locks around the format list.  It appears as if something has either locked the reentrancy lock and not unlocked it (very bad), or there is a recursive case with the read-write locks that isn't shown in the backtraces (also very bad).
# Does this problem occur if {{codec_sangoma}} is not used?
# Has Sangoma provided a {{codec_sangoma}} for Asterisk 11?  If not, I would not expect one built for Asterisk 1.8 to be compatible with Asterisk 11.
# Does this occur when you don't have {{DEBUG_THREADS}} enabled?  Since things are blocking on the reentrancy lock, you have it defined, and not only will this slow down the system significantly, but it may actually be what's causing this problem.

By: Octavio Ruiz (tacvbo) 2012-10-12 20:23:01.091-0500

Thank you for your response, Matt.

1. I can't tell whether this problem occurs when codec_sangoma is not used, because transcoding is the main purpose of the system.

2. Sangoma provided a codec_sangoma for Asterisk 11. They're also taking a look at this. There is no esoteric dialplan or Asterisk usage, and the same setup worked fine with Asterisk 1.8.12.

3. Yes, there is no difference between DEBUG_THREADS enabled or disabled. I actually enabled it in order to get proper debug info.

I'll post more information, new backtraces and Sangoma's resolution later.

By: Octavio Ruiz (tacvbo) 2012-10-12 20:30:40.652-0500

bt, bt full, thread apply all bt.

By: Matt Jordan (mjordan) 2012-10-15 09:27:40.593-0500

I can say that I haven't seen any other issues remotely like this reported against Asterisk 11.  As there were *major* changes to the media format architecture in Asterisk 10, I would have imagined that any codec module would require some retooling in order to interoperate properly with the new media format architecture.  Without further evidence to the contrary, it appears as if something in {{codec_sangoma}} is corrupting the memory for a lock or is in some way causing a mutex to be put into an invalid state.  The fact that the mutex in {{astmm}}, which is a static lock that protects memory allocations and is only available when {{MALLOC_DEBUG}} is enabled, is also causing a segmentation fault (core.7 and core.8) is disturbing: that mutex should exist throughout the lifetime of Asterisk and, other than being locked/unlocked, its state should not be altered.
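
One way to check whether the memory around a lock is being overwritten is to surround the mutex with guard words and verify them before each use. The following is a minimal sketch under stated assumptions, not anything present in Asterisk or {{codec_sangoma}}: {{struct guarded_mutex}}, {{GUARD_PATTERN}}, and the helper functions are hypothetical names. It only catches overwrites that happen to hit a guard word, so it is a coarse check rather than proof that the lock is intact.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Arbitrary marker value; any unlikely bit pattern would do. */
#define GUARD_PATTERN UINT64_C(0xDEADBEEFCAFEF00D)

/* Hypothetical wrapper: a mutex with a guard word on each side. */
struct guarded_mutex {
    uint64_t guard_before;
    pthread_mutex_t lock;
    uint64_t guard_after;
};

/* Initialize the guards and the mutex; returns 0 on success. */
static int guarded_mutex_init(struct guarded_mutex *g)
{
    assert(g != NULL);
    g->guard_before = GUARD_PATTERN;
    g->guard_after = GUARD_PATTERN;
    return pthread_mutex_init(&g->lock, NULL);
}

/* Returns 1 if both guard words still hold the marker, 0 otherwise. */
static int guarded_mutex_intact(const struct guarded_mutex *g)
{
    assert(g != NULL);
    return g->guard_before == GUARD_PATTERN
        && g->guard_after == GUARD_PATTERN;
}

/*
 * Lock only if the guards are intact.
 * Returns 0 on success, -1 if a guard word was overwritten, or the
 * (positive) pthread error code if locking itself fails.
 */
static int guarded_mutex_lock(struct guarded_mutex *g)
{
    if (!guarded_mutex_intact(g)) {
        return -1;
    }
    return pthread_mutex_lock(&g->lock);
}
```

If a stray write from another module lands on {{guard_before}} or {{guard_after}}, {{guarded_mutex_lock()}} returns -1 at the next lock attempt, which places the corruption somewhere between the last successful lock and that call.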

Note that in each of the backtraces I've looked at, {{codec_sangoma}} is on the critical path where the segmentation fault occurs.

It may be worthwhile running the system under valgrind for a short period of time to see if there are any memory corruptions occurring prior to the crash.

By: Rusty Newton (rnewton) 2012-11-27 17:27:35.373-0600

Closing this as it appears to not be a bug in Asterisk. If the valgrind output discussed above is captured, this issue may be reopened. The reporter can contact a developer in #asterisk-bugs (irc.freenode.net) to get it reopened and looked at.