
Summary: ASTERISK-25374: Crash in CDR handle_dial_message where peer is null
Reporter: Scott Griepentrog (sgriepentrog)
Labels:
Date Opened: 2015-09-04 07:34:00
Date Closed:
Priority: Major
Regression?
Status: Open/New
Components: CDR/General
Versions: 13.5.0
Frequency of Occurrence: One Time
Related Issues:
Environment: CentOS 7 VPS with Asterisk 13 current GIT
Attachments: (0) backtrace-core.20280.txt
(1) full-log-core.20280.txt
Description: Crash in handle_dial_message() when a NULL peer is passed to filter_channel_snapshot().

The crash happened under load caused by a rapid influx of SIP INVITE exploit attempts.

Note: the dialplan involved can pass the h extension to Dial(), because the _. pattern also matches h: {{exten => _.,1,Dial(PJSIP/100,6)}}


Comments:

By: Scott Griepentrog (sgriepentrog) 2015-09-04 07:35:00.317-0500

Attached [^backtrace-core.20280.txt] and [^full-log-core.20280.txt]

By: Scott Griepentrog (sgriepentrog) 2015-09-04 07:38:18.989-0500

Notes from [~mjordan]:
{noformat}
caller can legitimately be NULL
peer should never be NULL
When we get a Dial message, one of two things has happened
It's a normal Dial, in which case you have a caller and a peer
Or it is an Originate, in which case you only have a peer
Soo.....
peer being NULL is actually terrifying
Someone made a VERY bad Dial message.
So, it really isn't the CDR code's fault.
My guess is that in our app_dial code, we have a bug on receiving a CANCEL
That could be a race
Things are definitely the most wonky there. Anyway, I'd check to see if we have code in the construction of the Stasis Dial message that makes sure that we have a peer, and doesn't publish if we don't
CDR was the first one to crash, but I'd bet AMI or others would as well
{noformat}
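The publisher-side check suggested in these notes can be sketched in C. This is a minimal illustration under stated assumptions, not a patch against Asterisk: `struct chan_snapshot`, `struct dial_msg`, `dial_msg_is_valid()` and `publish_dial_msg()` are hypothetical stand-ins, not Asterisk's real Stasis types or functions. The sketch encodes the two legal shapes of a Dial message (a normal Dial with caller and peer, an Originate with only a peer) and drops anything without a peer before any subscriber can see it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; NOT Asterisk's real snapshot or Dial message types. */
struct chan_snapshot {
	const char *name;
};

struct dial_msg {
	const struct chan_snapshot *caller; /* may be NULL: Originate has no caller */
	const struct chan_snapshot *peer;   /* must never be NULL */
};

/* A Dial message is well formed if it has a peer. The caller is optional:
 * a normal Dial carries both, an Originate carries only the peer. */
static bool dial_msg_is_valid(const struct dial_msg *msg)
{
	return msg != NULL && msg->peer != NULL;
}

/* Hypothetical publish hook: refuses malformed messages so no subscriber
 * (CDR, AMI, ...) ever receives a NULL peer.
 * Returns 0 if the message would be published, -1 if it was dropped. */
static int publish_dial_msg(const struct dial_msg *msg)
{
	if (!dial_msg_is_valid(msg)) {
		fprintf(stderr, "Refusing to publish Dial message without a peer\n");
		return -1;
	}
	/* ...hand the message to subscribers here... */
	return 0;
}
```

Validating once at the publisher, rather than in every subscriber, fits the observation that AMI and other consumers would likely crash the same way the CDR code did.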

By: Scott Griepentrog (sgriepentrog) 2015-09-04 10:27:03.744-0500

The test system in this case has only 1 GB of RAM and no swap, so a failed allocation for the peer that was not detected immediately could be the culprit here.
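If the NULL peer came from a failed allocation, the remedy is to check the snapshot result right where it is produced. The sketch below assumes a hypothetical `snapshot_create()` that returns NULL on any allocation failure, a hypothetical `publish_dial()` path, and a small `alloc_budget` fault-injection counter (also hypothetical, not an Asterisk facility) so the out-of-memory path can be exercised without exhausting memory.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in; NOT Asterisk's real snapshot type. */
struct chan_snapshot {
	char *name;
};

/* Fault injection: when alloc_budget reaches 0, allocations fail.
 * -1 means unlimited. */
static int alloc_budget = -1;

static void *checked_alloc(size_t size)
{
	if (alloc_budget == 0) {
		return NULL;
	}
	if (alloc_budget > 0) {
		alloc_budget--;
	}
	return malloc(size);
}

/* Returns NULL if any allocation fails, leaking nothing. */
static struct chan_snapshot *snapshot_create(const char *name)
{
	size_t len = strlen(name) + 1;
	struct chan_snapshot *snap = checked_alloc(sizeof(*snap));

	if (!snap) {
		return NULL;
	}
	snap->name = checked_alloc(len);
	if (!snap->name) {
		free(snap);
		return NULL;
	}
	memcpy(snap->name, name, len);
	return snap;
}

static void snapshot_destroy(struct chan_snapshot *snap)
{
	if (snap) {
		free(snap->name);
		free(snap);
	}
}

/* Each snapshot is checked as soon as it is created, so an allocation
 * failure aborts the publish instead of surfacing later as a NULL peer
 * inside a subscriber. Returns 0 on success, -1 on failure. */
static int publish_dial(const char *caller_name, const char *peer_name)
{
	struct chan_snapshot *caller = NULL;
	struct chan_snapshot *peer = snapshot_create(peer_name);

	if (!peer) {
		fprintf(stderr, "Unable to snapshot peer; Dial message not published\n");
		return -1;
	}
	/* A requested caller that fails to allocate is also an error: publishing
	 * with a NULL caller would make a normal Dial look like an Originate. */
	if (caller_name) {
		caller = snapshot_create(caller_name);
		if (!caller) {
			snapshot_destroy(peer);
			return -1;
		}
	}
	/* ...build and publish the Dial message here... */
	snapshot_destroy(caller);
	snapshot_destroy(peer);
	return 0;
}
```

Setting `alloc_budget` to 0, 1 or 2 before a call makes the peer struct, the peer name, or the caller struct allocation fail, respectively, and in each case the function reports failure instead of publishing.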