I accidentally discovered a situation where Asterisk is guaranteed to deadlock the PJSIP monitor thread, thereby no longer allowing SIP traffic to be processed.
Have a device subscribe to a resource in Asterisk (in my case, the device subscribed to presence), then immediately kill the connection between Asterisk and the device.
Asterisk sends a 200 OK in response to the incoming SUBSCRIBE, and then follows that up with a NOTIFY to give the initial state. Since the device is unreachable, Asterisk keeps retransmitting the NOTIFY until the transaction times out. When that happens, the monitor thread in res_pjsip.c handles the transaction timeout event, which bubbles up through the stack. When the dialog layer handles the event, it locks the dialog before passing the event further up. When the event reaches our subscription code, we terminate the subscription and attempt to destroy the ast_sip_subscription object. The destructor pushes a synchronous task to the threadpool to destroy the serializer on the dialog, which requires locking the dialog. Since the dialog is already locked by the monitor thread handling the transaction timeout, the task blocks waiting for the dialog lock, while the monitor thread blocks waiting for the synchronous task to complete. Neither thread can make progress, and from this point Asterisk is no longer capable of handling SIP traffic.
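The lock cycle described above can be modelled in a few lines. This is only an illustrative sketch in Python, not Asterisk or PJSIP code: dialog_lock, destroy_serializer and on_transaction_timeout are hypothetical names standing in for the dialog lock, the synchronous threadpool task and the monitor thread's event handling. The wait on the task is given a timeout so the example terminates. The behaviour described in this report corresponds to waiting with no timeout.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError

# Hypothetical stand-in for the dialog lock (not the real PJSIP object).
dialog_lock = threading.Lock()


def destroy_serializer():
    """Threadpool task: it needs the dialog lock before it can do its work."""
    with dialog_lock:
        return True


def on_transaction_timeout(pool, hold_dialog_lock, wait_timeout=0.5):
    """Models the monitor thread handling the transaction timeout event.

    With hold_dialog_lock=True the caller holds the dialog lock while it
    waits for the pushed task, which is the situation in this report.
    """
    if hold_dialog_lock:
        dialog_lock.acquire()  # dialog layer locks the dialog first
    try:
        future = pool.submit(destroy_serializer)  # "synchronous" task push
        try:
            # A wait with no timeout here would block forever while the
            # lock is held. The timeout exists only so the sketch returns.
            future.result(timeout=wait_timeout)
            return "completed"
        except TimeoutError:
            return "would deadlock"
    finally:
        if hold_dialog_lock:
            dialog_lock.release()


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=1) as pool:
        print(on_transaction_timeout(pool, hold_dialog_lock=True))
        print(on_transaction_timeout(pool, hold_dialog_lock=False))
```

In this model the cycle disappears as soon as the waiting thread does not hold the lock the task needs, for example by releasing the lock before waiting or by pushing the task asynchronously. Whether either approach is appropriate for the real code is a separate question.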
I am attaching two files to assist in debugging.
deadlock.txt is a backtrace during the deadlock. Threads 81 and 82 are the threads of interest.
uac-subscribe-presence-noreply.xml is a SIPp scenario that can trigger the deadlock. The scenario creates a subscription and then ends after receiving the initial NOTIFY from Asterisk. You'll have to wait about a minute after the scenario finishes executing for the deadlock to actually occur. You can invoke the SIPp scenario as follows:
I'm keeping this issue private for now since this may be exploitable as a DoS attack. To exploit it, though, an attacker would need to authenticate properly and successfully subscribe to a real resource.