ASTERISK-21385: SIP Channel Lock

[Home]

Summary: ASTERISK-21385: SIP Channel Lock

Reporter: Ross Beer (rossbeer) Labels:

Date Opened: 2013-04-06 13:36:00 Date Closed: 2013-04-26 11:33:30

Priority: Major Regression?

Status: Closed/Complete Components: Channels/chan_sip/General

Versions: 1.8.21.0 11.4.0 Frequency of
Occurrence Constant

Related
Issues:
is related to ASTERISK-21386 SIP Channel Locks

Environment: Cent OS Attachments: ( 0) Asterisk_1.8.19.0_Lock_at_10CPS_460_Concurrent_Calls.txt
( 1) Asterisk_1.8.21.0_Lock_at_10CPS_360_Concurrnet_Calls.txt
( 2) Asterisk_10.12.2_Lock_at_10CPS_560_Concurrent_Calls.txt
( 3) Asterisk_11.2.0_Lock_at_10CPS_50_Concurrent_Calls.txt
( 4) Asterisk_11.3.0_Lock_at_10CPS_50_Concurrent_Calls.txt
( 5) ThreadLock2.rtf
( 6) ThreadLocks.txt

Description: When above 10 calls per second are placed there is a thread lock which means that the SIP channel stops processing calls. There is a fix for the general SIP channel in 11.4 however this does not appear to resolve Realtime ODBC issues.

RealTime peers only seam to use a single connection to the ODBC database even when configured to use more:

ODBC DSN Settings
-----------------

Name: mysql
DSN: MySql
Last connection attempt: 1970-01-01 01:00:00
Pooled: Yes
Limit: 100
Connections in use: 3
- Connection 1: in use (res_config_odbc.c:180 realtime_odbc)
- Connection 2: connected (:0 )
- Connection 3: connected (:0 )

The RealTime ODBC process should allow as many connections as possible else the backlog causes the SIP channel to stop responding.

I believe the SIP channel is threaded and therefore should be able to use more connections as needed.

This is a major issue as SIP registrations stop when there are 100's of devices registered and calls are being made via the SIP channel.

On each outbound call, the database is access to lookup the peer making the call. The peer is not stored in cache for any period of time and therefore this also increases the need for accessing the database.

Comments: By: Ross Beer (rossbeer) 2013-04-06 13:37:05.458-0500

Core Show Locks
By: Joshua C. Colp (jcolp) 2013-04-06 13:39:06.994-0500

This is not a problem with ODBC but a problem with chan_sip. It is *not* multithreaded for UDP traffic. All traffic uses a single thread so a blocking database call will cause problems. Unfortunately changing this is rather drastic and would not be placed into a release branch.
By: Ross Beer (rossbeer) 2013-04-06 14:35:36.198-0500

I have removed the realtime element here and the issue still persists.

I think this may be more related to the chan_sip it's self, please see ThreadLocks2.txt

By: Ross Beer (rossbeer) 2013-04-07 11:10:45.685-0500

I created another ticket when you closed this one which has the same information. Can you please merge the ticket ASTERISK-21386?

I have been testing with different asterisk versions and have some interesting results:

Asterisk 1.8.19.0 460 Concurrent Calls Before Lock

Asterisk 1.2.21.0 360 Concurrent Calls Before Lock

Asterisk 10.12.2 560 Concurrent Calls Before Lock

Asterisk 11.2.0 50 Concurrent Calls Before Lock

Asterisk 11.3.0 50 Concurrent Calls Before Lock

I have attached the lock status for each version. The best performing version is Asterisk 10.

All of the locks appear to be in the same place.
By: Matt Jordan (mjordan) 2013-04-11 09:57:59.610-0500

Ross:

I've taken a look at the various backtraces you've provided. While the symptoms may feel like a deadlock, none of the backtraces actually indicate a deadlock situation.

When a deadlock occurs, you will have some form of circular waiting. Take two threads - 0x01 and 0x02. A 'core show locks' output that indicates a deadlock will look something like this:

{noformat}
=== Thread ID: 0x01 (pbx_thread started at [ 5597] pbx.c ast_pbx_start())
=== ---> Lock #0 (channel.c): MUTEX 4880 ast_write chan 0x7f883c4eb700 (1)
/usr/sbin/asterisk(ast_bt_get_addresses+0xe) [0x4d6eae]
/usr/sbin/asterisk(__ast_pthread_mutex_trylock+0xb6) [0x4d2636]
/lib64/libpthread.so.0(+0x7851) [0x7f88c5b73851]
/lib64/libc.so.6(clone+0x6d) [0x7f88c686190d]
=== ---> Waiting for Lock #1 (chan_sip.c): MUTEX 12978 cb_extensionstate p 0xac477790 (1)
/usr/sbin/asterisk(ast_bt_get_addresses+0x19) [0x8105607]
/usr/sbin/asterisk() [0x8080295]
/usr/sbin/asterisk(_ao2_lock+0x48) [0x8080e3a]
/lib/libpthread.so.0(+0x5afe) [0xb7477afe]
/lib/libc.so.6(clone+0x5e) [0xb76b564e]
=== — ---> Locked Here: chan_sip.c line 7514 (find_call)

=== Thread ID: 0x02 (pbx_thread started at [ 5597] pbx.c ast_pbx_start())
=== ---> Lock #0 (chan_sip.c): MUTEX 4880 sip_dosomething p 0xac477790 (1)
/usr/sbin/asterisk(ast_bt_get_addresses+0xe) [0x4d6eae]
/usr/sbin/asterisk(__ast_pthread_mutex_trylock+0xb6) [0x4d2636]
/lib64/libpthread.so.0(+0x7851) [0x7f88c5b73851]
/lib64/libc.so.6(clone+0x6d) [0x7f88c686190d]
=== ---> Waiting for Lock #1 (channel.c): MUTEX 12978 chan_do_stuffs c 0x7f883c4eb700 (1)
/usr/sbin/asterisk(ast_bt_get_addresses+0x19) [0x8105607]
/usr/sbin/asterisk() [0x8080295]
/usr/sbin/asterisk(_ao2_lock+0x48) [0x8080e3a]
/lib/libpthread.so.0(+0x5afe) [0xb7477afe]
/lib/libc.so.6(clone+0x5e) [0xb76b564e]
=== — ---> Locked Here: channel.c line 12397 (ast_write)
{noformat}

In this contrived example, thread 0x01 has locked a channel (mutex 0x7f883c4eb700) and is attempting to lock a SIP private data structure (mutex 0xac477790). At the same time, thread 0x02 has locked the SIP private data structure (mutex 0xac477790) and is attempting to lock the channel owned by thread 0x01 (mutex 0x7f883c4eb700). Neither can win, and so both threads are stuck. Deadlock!

In your backtraces, there is no indication of this condition. Rather, the 'core show locks' output merely shows that some locks are held. In many of them, the locks that are held are related to the ODBC connection, which isn't surprising - a database connection can take some time to complete, and there can be lots of contention for that lock. All the output verifies is that there are threads that hold locks - not that any of them are blocked indefinitely.

So why is the system apparently 'stalling'?

{{DEBUG_THREADS}} is a performance killer. I can't overstate that. The {{DEBUG_THREADS}} compile time option essentially turns every lock into a single global mutex. This means that every time any one thread grabs a lock, all mutexes are blocked for a period of time. This practically single threads the entire system.

In the past, {{DEBUG_THREADS}} was not as aggressive as it is in 1.8+. It was made more aggressive because in prior versions of Asterisk, the data it gave back was inaccurate - which was less than useful when debugging deadlocks. As a result, the setting *will* help provide information regarding a deadlock, but it will kill a production system (and in some situations, is simply too aggressive for production systems).

If you compile Asterisk without {{DEBUG_THREADS}} and re-run some of your tests, do you encounter the same situation at the same level of load?
By: Rusty Newton (rnewton) 2013-04-26 09:41:20.643-0500

Ross I'm pinging you again on this issue to make sure you saw Matt's question. We didn't see a response back from you in ~2 weeks.
By: Ross Beer (rossbeer) 2013-04-26 09:48:39.817-0500

I can confirm that running with 'DEBUG_THREADS' disabled does improve the performance. Please close this ticket and thank you for your assistance.